Copyright © 2017 DataScience.US All Rights Reserved.
When training a neural network, you want it to discern the patterns in the data you’ve given it and then generalize those patterns to the ones occurring in the world. If you were training an image classifier, you’d want it to recognize the qualities of specific objects, like dogs, and to recognize those qualities whenever it sees an image of a dog that wasn’t included in the data set it trained on.
While one problem is that a neural network can fail to detect the correct patterns or recognize certain qualities in a training set, there’s another problem that can occur during training. A neural network may begin to match patterns in the data set too strongly and fail to generalize them to new situations. This means that while the network performs well on the training data set, it performs worse on testing data or on the real-world cases you want to apply it to. This problem is referred to as “overfitting”: your neural network has fit too well to the environment it was trained in. It has adapted too closely to the training data and now can’t transfer what it learned to testing situations.
Overfitting is a problem for neural networks in particular because the complex models that neural networks so often utilize are more prone to overfitting than simpler models. You can think about this as the difference between having a “rigid” or “flexible” training model. Rigid models will keep more or less the same form no matter how much you push on them, so they can’t align themselves to the parameters of the training data as easily as flexible models. Flexible models, models with more variables, have more chances to bend. There are more opportunities for them to become aligned with the specific intricacies of the training data.
One of the benefits of neural networks is that they are flexible, but this means precautions must be taken against overfitting. Because neural networks often go through many iterations while being trained, the risk of overfitting is even higher. How does one deal with the problem of overfitting?
One way of protecting against overfitting is to employ multiple neural networks in an ensemble model. The networks work together to provide a larger, more holistic representation of the landscape containing your testing data. This can quickly become expensive, however: more processing power is needed to use this method of defense against overfitting.
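As a rough illustration of the ensemble idea, averaging the predicted class probabilities of several independently trained networks tends to wash out the quirks any single network has overfit to. The sketch below uses made-up probability arrays in place of real network outputs:

```python
import numpy as np

# Hypothetical predicted class probabilities from three separately
# trained networks, for the same batch of 2 inputs over 3 classes.
preds = [
    np.array([[0.7, 0.2, 0.1], [0.1, 0.6, 0.3]]),
    np.array([[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]]),
    np.array([[0.8, 0.1, 0.1], [0.1, 0.7, 0.2]]),
]

# Average the probabilities across the ensemble; the averaged
# prediction is less sensitive to any single model's idiosyncrasies.
ensemble = np.mean(preds, axis=0)
labels = ensemble.argmax(axis=1)  # final class decision per input
```

The cost the text mentions is visible here: every extra ensemble member means one more full network to train and run.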
A better method of dealing with overfitting is something called “neural net dropout”. Neural net dropout refers to randomly dropping units, or neurons, in your neural network. Why would you want to randomly delete connections between units in your network? What possible help could this be?
In general, you want to make a neural network generalize well by making it robust. “Robustness” here refers to its ability to handle adversity or chaos. Randomly cutting units from the network is one way to build this robustness. It does so by making it more difficult for units in a network to accidentally compensate for one another. Any given unit in a neural network may make an error, and there is a chance that the error could be nullified by another unit making an error that pushes in the opposite direction. This creates a situation where the network has the correct answer but for the wrong reasons, or arrived at the answer by the wrong process. Dropping units helps guard against this, ensuring that any correct answers the network arrives at truly reflect a successful algorithm, not chance.
There are a variety of different processes and algorithms one can use to drop certain units. One method introduces noise into a neural network by multiplying each layer in a neural net by a vector of probabilities; the vector encodes the probability that any specific unit will be dropped from the network. The output of the network will have been randomly thinned, allowing forward and backward propagation to correct for errors.
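A minimal sketch of that thinning step, assuming NumPy and a single drop probability shared by every unit in the layer (the activations here are toy values, not output from a real network):

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(activations, p):
    """Zero out each unit independently with probability p (training only)."""
    keep = rng.random(activations.shape) >= p  # Bernoulli keep-mask
    return activations * keep

layer_output = np.ones((4, 5))          # toy activations for a 5-unit layer
thinned = dropout(layer_output, p=0.5)  # randomly thinned output
```

Each forward pass draws a fresh mask, so the network is effectively trained as many different thinned sub-networks sharing the same weights.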
It’s important to note that you need to take the fact that you will employ dropout into account in the architecture of your network. For example, if there is a 15% chance that any random unit in a layer is dropped, the layer should be scaled up by a factor of roughly 1/(1 − 0.15) ≈ 1.18, that is, about 18% more units, so that the expected number of active units matches the original design.
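In practice, a common alternative to widening the layer is “inverted dropout”: surviving activations are scaled up by 1/(1 − p) during training, so their expected magnitude is unchanged and the test-time network needs no adjustment at all. A sketch, assuming NumPy:

```python
import numpy as np

rng = np.random.default_rng(42)

def inverted_dropout(activations, p):
    """Drop units with probability p, then rescale survivors by 1/(1-p)
    so the expected activation magnitude matches the no-dropout network."""
    keep = rng.random(activations.shape) >= p
    return activations * keep / (1.0 - p)

x = np.ones((1000, 100))            # toy activations, all equal to 1.0
y = inverted_dropout(x, p=0.15)     # about 15% zeros, survivors ~1.18
```

Because the scaling happens at training time, the same weights can be used unchanged for inference, which is why most modern frameworks implement dropout this way.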
While smaller-scale machine learning projects may be able to get by without using dropout, dropout is almost a necessity for any large-scale project with complex models involving many different variables.