We have already written about the popular “dropout” approach, and it makes particular sense in our case: because we do not have much data, our models can very easily over-fit.
Not only does dropout hopefully help us fight over-fitting through the implicit model averaging it performs, it also ensures that a single neuron cannot rely too heavily on its neighbors to correct its errors. This is easy to see: in each training step, no neuron can be sure that its neighbor will be part of the network. Since the selection of which neurons to drop is randomized, each neuron becomes a bit of a lone fighter. As cruel as this sounds, empirical results indicate that this procedure not only avoids over-fitting on smaller data sets, but can also improve the extracted features.
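To make the random selection concrete, here is a minimal sketch of “inverted” dropout in NumPy. The function name and the drop probability of 0.5 are our illustrative choices, not a prescription from the literature; the key point is that a fresh Bernoulli mask is drawn on every forward pass, so no unit can count on any other being present.

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(h, p_drop=0.5, train=True):
    """Inverted dropout: randomly zero units, rescale survivors.

    Each unit is kept with probability 1 - p_drop and scaled by
    1 / (1 - p_drop) at train time, so activations keep the same
    expected value and no rescaling is needed at test time.
    """
    if not train or p_drop == 0.0:
        return h
    keep = 1.0 - p_drop
    mask = rng.binomial(1, keep, size=h.shape)  # fresh mask per call
    return h * mask / keep

h = np.ones((4, 6))          # toy hidden activations
h_train = dropout(h, 0.5)    # roughly half the units are zeroed
h_test = dropout(h, 0.5, train=False)  # identity at test time
```

Because a new mask is sampled on every call, two consecutive training passes generally drop different units, which is exactly what prevents the co-adaptation described above.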
There is a lot of literature on dropout for computer vision problems, and some papers also include experiments on text, but these papers provide few details, and the gains in this domain seem to be much lower. The situation is further complicated by the fact that there are few standard data sets for text, so what is usually described is the classification of articles. The problem is that these experiments are all supervised, whereas what we want is to use dropout with auto-encoders to learn good features from text, not to classify it.
As mentioned before, the actual implementation of dropout is straightforward for neural networks. The price is that training time usually increases considerably because of the noise introduced by randomly dropping nodes. Our ultimate goal is to study the potential of dropout in the text domain; more specifically, we plan to use it to build a hierarchical model of semantic concept features for movies.
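As a sketch of how straightforward the combination is, the following toy auto-encoder applies dropout to its hidden layer and is trained with plain gradient descent. All names, sizes, and hyper-parameters here are illustrative assumptions, not the model we will actually use; the toy binary input vectors merely stand in for bag-of-words text features.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_dropout_autoencoder(X, n_hidden=32, p_drop=0.5,
                              lr=0.1, epochs=200):
    """One-hidden-layer auto-encoder with (inverted) dropout on
    the hidden code; linear reconstruction, squared-error loss."""
    n, d = X.shape
    W1 = rng.normal(0.0, 0.1, (d, n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 0.1, (n_hidden, d))
    b2 = np.zeros(d)
    keep = 1.0 - p_drop
    losses = []
    for _ in range(epochs):
        h = sigmoid(X @ W1 + b1)
        mask = rng.binomial(1, keep, size=h.shape)
        hd = h * mask / keep              # dropped-out hidden code
        y = hd @ W2 + b2                  # linear reconstruction
        err = y - X
        losses.append(np.mean(np.sum(err ** 2, axis=1)))
        # backprop; dropped units receive no gradient via the mask
        dy = 2.0 * err / n
        dW2 = hd.T @ dy
        db2 = dy.sum(axis=0)
        dh = (dy @ W2.T) * mask / keep
        dz = dh * h * (1.0 - h)           # sigmoid derivative
        dW1 = X.T @ dz
        db1 = dz.sum(axis=0)
        for p, g in ((W1, dW1), (b1, db1), (W2, dW2), (b2, db2)):
            p -= lr * g
    return (W1, b1, W2, b2), losses

# toy stand-in for sparse bag-of-words data: binary vectors
X = (rng.random((100, 20)) < 0.2).astype(float)
params, losses = train_dropout_autoencoder(X)
```

The extra cost shows up directly: the reconstruction error still decreases, but the per-step gradient is noisier because each step sees a different random sub-network, which is why more epochs are typically needed than without dropout.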