The results of our latest experiments with Deep Boltzmann Machines were quite amazing. However, since deeper models also mean many more parameters to train, we still have a long way to go.
Meanwhile, we consulted the existing literature on DBM training to get a better understanding and to avoid common pitfalls. A 2012 paper from the University of Montreal on DBM training indicated that some training methods can lead to many dead neurons due to poor local minima. In other words, these neurons do not adapt to the underlying data and stay very close to their random initialization.
We tried to reproduce these results with our DBM implementation, using a small handwritten digit dataset for training. To track dead neurons, we printed the L2 norms of all neurons after a fixed number of epochs, and we also visualized all filters. The effect was clearly visible in our data. On the one hand, there are the ‘normal’ neurons, whose L2 norm increased with each new epoch; the visualization confirmed that these filters learned to recognize something. On the other hand, there are the dead neurons, which all have a very low L2 norm and show only minor progress with each new epoch. The visualization of these neurons clearly indicates that they are very noisy (dead).
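The norm-based check described above can be sketched in a few lines. This is a minimal illustration, not our actual implementation: the weight matrix shape `(n_visible, n_hidden)` and the threshold value are assumptions you would adapt to your own model.

```python
import numpy as np

def neuron_l2_norms(W):
    """Per-hidden-neuron L2 norm over the incoming weight column.

    W: (n_visible, n_hidden) weight matrix of one DBM layer
    (hypothetical layout -- adapt to your implementation).
    """
    return np.linalg.norm(W, axis=0)

def dead_neuron_mask(W, threshold=0.1):
    """Flag neurons whose weight norm stays near its small random init."""
    return neuron_l2_norms(W) < threshold

# Toy example: 4 neurons with clearly trained (large) weights,
# 2 neurons still stuck near a tiny random initialization.
rng = np.random.default_rng(0)
W = np.hstack([rng.normal(0.0, 1.0, size=(100, 4)),
               rng.normal(0.0, 0.001, size=(100, 2))])
print(dead_neuron_mask(W))  # the last two neurons are flagged as dead
```

Printing this mask (or the raw norms) after every few epochs is enough to see whether the low-norm group ever catches up.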
Of course, we were interested to see if this behavior also occurs with our movie data. Needless to say, we cannot easily visualize the filters to see whether a neuron is dead. However, we can track the L2 norm of the neurons just as in the handwritten digit model, and to our surprise there were no clear outliers in our model: the L2 norm of each neuron steadily increased with each new epoch.
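When filters cannot be visualized, the outlier check has to be done on the norm history alone. One simple heuristic, shown here as a sketch under assumed data layout (a per-epoch array of norms; the `ratio` cutoff is arbitrary and hypothetical), is to flag neurons whose final norm lags far behind the layer median:

```python
import numpy as np

def norm_outliers(norm_history, ratio=0.25):
    """Flag neurons whose final L2 norm is far below the layer median.

    norm_history: (n_epochs, n_hidden) array of per-epoch L2 norms
    (assumed layout). A neuron counts as an outlier if its last
    recorded norm is below `ratio` times the median final norm.
    """
    final = norm_history[-1]
    return final < ratio * np.median(final)

# Toy history: three neurons grow steadily, one barely moves.
history = np.array([[1.0, 1.0, 1.0, 0.4],
                    [10.0, 11.0, 9.0, 0.5]])
print(norm_outliers(history))  # only the last neuron is an outlier
```

With our movie model this mask stayed empty, which is exactly the "no clear outliers" observation described above.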
Because our setup is very different from experiments with standard datasets like MNIST, it is not easy to draw conclusions from this different behavior. As we have noted many times, our data is very sparse, consists of binary features, and not much of it is available for training. Nevertheless, since we can inspect the learned topics of each neuron in our model, we can at least confirm that the training procedure (partly) succeeded.