The Art of Training RBMs

With all the papers and best practices published so far, training an RBM should be no big deal, especially in its simplest form with binary hidden units, and assuming there is sufficient, properly encoded data. And indeed, if we train a model on digits or use a prepared dataset, we usually encounter no problems. We may need to tune the learning rate and some other hyper-parameters, but that is usually not a problem either.

For non-standard data the situation is a bit different. First, we can never be sure that there is enough structure in the data for the RBM to succeed at finding explanatory factors. Second, sparsity and high dimensionality can also be a problem: when the frequency of features varies strongly, a global learning rate is counter-productive, because a single rate cannot suit both the frequent and the rare features. Third, if we only have a weak learning signal, the size of the mini-batches and the learning rate are crucial, because if the batch is too big and the learning rate too small, we will lose the signal.

In a nutshell, for custom data we usually need to put more effort into the training, and it is very likely that more sophisticated approaches, like AdaGrad or dropout, are required to train a good model. Not to forget that there is no simple rule for finding the best number of hidden units.
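
To make the point about sparse features and a global learning rate more concrete, here is a minimal sketch of how AdaGrad could be combined with one step of contrastive divergence (CD-1) for a binary RBM. It is only an illustration under assumptions: the function names (adagrad_step, cd1_gradient), the toy data, the layer sizes and the base learning rate of 0.1 are all made up for this example and not tuned for any real dataset.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_gradient(W, b_vis, b_hid, v0, rng):
    """CD-1 gradient estimate for a binary RBM (hypothetical helper).

    W has shape (n_visible, n_hidden); v0 is a mini-batch of shape
    (batch_size, n_visible). Returns gradients for W, b_vis and b_hid.
    """
    # Positive phase: hidden probabilities and a binary sample given the data.
    h0_prob = sigmoid(v0 @ W + b_hid)
    h0 = (rng.random(h0_prob.shape) < h0_prob).astype(float)
    # Negative phase: one reconstruction step, using probabilities.
    v1_prob = sigmoid(h0 @ W.T + b_vis)
    h1_prob = sigmoid(v1_prob @ W + b_hid)
    n = v0.shape[0]
    dW = (v0.T @ h0_prob - v1_prob.T @ h1_prob) / n
    db_vis = (v0 - v1_prob).mean(axis=0)
    db_hid = (h0_prob - h1_prob).mean(axis=0)
    return dW, db_vis, db_hid

def adagrad_step(param, grad, accum, base_lr=0.1, eps=1e-8):
    """Update param in place (gradient ascent) with per-entry step sizes.

    accum holds the running sum of squared gradients. Entries that have
    seen little gradient so far keep a larger effective learning rate.
    """
    accum += grad ** 2
    step = base_lr / (np.sqrt(accum) + eps)
    param += step * grad
    return step

# Toy usage: 6 visible units, the last 3 of which are rarely active.
rng = np.random.default_rng(0)
n_visible, n_hidden = 6, 3
activation_prob = np.array([0.5, 0.5, 0.5, 0.05, 0.05, 0.05])

W = 0.01 * rng.standard_normal((n_visible, n_hidden))
b_vis = np.zeros(n_visible)
b_hid = np.zeros(n_hidden)
# One AdaGrad accumulator per parameter array.
acc_W = np.zeros_like(W)
acc_vis = np.zeros_like(b_vis)
acc_hid = np.zeros_like(b_hid)

for _ in range(50):
    batch = (rng.random((16, n_visible)) < activation_prob).astype(float)
    dW, db_vis, db_hid = cd1_gradient(W, b_vis, b_hid, batch, rng)
    adagrad_step(W, dW, acc_W)
    adagrad_step(b_vis, db_vis, acc_vis)
    adagrad_step(b_hid, db_hid, acc_hid)
```

The design choice here is that every weight and bias gets its own accumulator, so the step size is no longer one global number. Whether this actually helps depends on the data, and the small batch size of 16 is likewise just a placeholder for the kind of value one would have to tune when the learning signal is weak.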

