In our last post we discussed a simple way to track dead neurons while training our Deep Boltzmann Machine. The idea is simple but very effective: consider the L2 norms of the weight vectors and pick those whose norm is far from the mean norm. The problem is probably related to poor minima found during training and has also been noticed by other researchers.
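To make the norm check concrete, here is a minimal sketch in plain Python. It is not our actual training code: the function name `flag_outlier_neurons`, the representation of the weights as one list per hidden neuron, and the cutoff of two standard deviations for "far away from the mean" are all assumptions made for illustration.

```python
import math
from statistics import mean, pstdev


def flag_outlier_neurons(weights, num_std=2.0):
    """Return (norms, flagged) for a list of per-neuron weight vectors.

    `weights` holds one weight vector (a list of floats) per hidden neuron.
    A neuron is flagged when its L2 norm deviates from the mean norm by more
    than `num_std` standard deviations. The cutoff is an assumed heuristic.
    """
    # L2 norm of each neuron's weight vector
    norms = [math.sqrt(sum(w * w for w in vec)) for vec in weights]
    mu = mean(norms)
    sigma = pstdev(norms)
    if sigma == 0.0:
        # All norms are identical, so nothing stands out
        return norms, []
    flagged = [i for i, n in enumerate(norms) if abs(n - mu) > num_std * sigma]
    return norms, flagged


# Toy example: nine neurons with similar norms and one with all-zero weights
toy_weights = [[1.0, 1.0] for _ in range(9)] + [[0.0, 0.0]]
toy_norms, toy_flagged = flag_outlier_neurons(toy_weights)
```

In the toy example only the last neuron, whose weights are all zero, is flagged. The standard deviation of the norms, `pstdev(norms)`, can also serve as a single number for comparing training runs.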
Since we are interested in assessing the precision of the trained model, we adopted this measure to estimate the quality of the chosen hyperparameters (learning rate, batch size, momentum, number of hidden nodes, etc.). In other words, whenever the variance of the L2 norms was high, we manually checked the topics learned by our network, and high variance is indeed at least an indicator of lower precision of the learned model.
We further analyzed in how many trained neurons a keyword is present. The idea is that if a keyword is present in many, or even all, neurons, its discriminative power is low. There might be some very important keywords that are present multiple times, but if we assume that each neuron of the learned model describes a dedicated latent topic, the total occurrence of a keyword should be low.
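The keyword count can be sketched as follows. The post does not pin down when a keyword counts as "present" in a neuron, so this sketch assumes a keyword is present if it is among the `k` highest-weighted keywords of that neuron. The helper names `top_keywords` and `keyword_neuron_counts`, and the toy vocabulary, are hypothetical.

```python
from collections import Counter


def top_keywords(weight_vec, vocab, k):
    """Return the k keywords with the largest weights for one neuron."""
    ranked = sorted(range(len(vocab)), key=lambda j: weight_vec[j], reverse=True)
    return [vocab[j] for j in ranked[:k]]


def keyword_neuron_counts(weights, vocab, k=3):
    """Count, for every keyword, in how many neurons it is among the top k."""
    counts = Counter()
    for vec in weights:
        # set() ensures a keyword is counted at most once per neuron
        counts.update(set(top_keywords(vec, vocab, k)))
    return counts


# Toy example: "movie" ranks highly in every neuron, the other keywords in one each
toy_vocab = ["movie", "alien", "love", "war"]
toy_topic_weights = [
    [0.9, 0.8, 0.1, 0.0],
    [0.9, 0.1, 0.8, 0.0],
    [0.9, 0.0, 0.1, 0.8],
]
toy_counts = keyword_neuron_counts(toy_topic_weights, toy_vocab, k=2)
```

Here "movie" appears in all three neurons and would therefore be a candidate for a keyword with low discriminative power, while the remaining keywords each appear in exactly one.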
During all tests, we used weight decay (L2) to keep the weights as small as possible. However, even with moderate settings, the parameters W and V sometimes grew very steeply. Thus, we decided to study the influence of larger weights on the final precision of the model. With the results at hand, it seems that we have found another indicator of low precision, at least for our data, since larger weights almost always resulted in very crude topics. For that reason, we repeated the training with a very high value for the weight decay. The weights then stabilized very quickly, and the trained model was able to learn very sophisticated topics.
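For readers unfamiliar with L2 weight decay, the following is a generic sketch of a single gradient step with the decay term added. It is not the update rule of our Deep Boltzmann Machine: the function name, the flat list of weights, and the plain gradient-descent form are assumptions chosen to show how the decay term pulls weights toward zero.

```python
def sgd_step_with_weight_decay(weights, grads, lr, weight_decay):
    """One gradient-descent step with an L2 penalty on the weights.

    Each weight is updated as w <- w - lr * (g + weight_decay * w), so a
    larger `weight_decay` shrinks the weights more strongly in every step.
    """
    return [w - lr * (g + weight_decay * w) for w, g in zip(weights, grads)]


def mean_abs_weight(weights):
    """Simple statistic for monitoring how large the weights have become."""
    return sum(abs(w) for w in weights) / len(weights)


# Toy example with zero gradients: only the decay term acts on the weights
toy_w = [1.0, -2.0]
toy_w_next = sgd_step_with_weight_decay(toy_w, [0.0, 0.0], lr=0.1, weight_decay=0.5)
```

With zero gradients, each weight is multiplied by `1 - lr * weight_decay` (0.95 in the toy example). Logging a statistic such as `mean_abs_weight` after every epoch is one way to see whether the weights keep growing or stabilize.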
However, none of these results proves that these factors are really correlated or that these rules of thumb are generally valid. Still, with each new piece of insight we gain, we get a better understanding of the material.