RBMs Revisited.

Boltzmann Machines are a fascinating concept and also very powerful, actually too powerful to be trained efficiently for real-world applications. However, there are variants that can be trained more efficiently at the cost of restricted capabilities. The best-known variant is the RBM, short for Restricted Boltzmann Machine.

Informally, the idea of RBMs is to learn the structure of the data and to disentangle its explanatory factors. The result is a condensed representation in which each hidden neuron captures a specific “feature” of the data and outputs a probability in the range [0, 1] that indicates how well a data sample matches this feature.

Let us illustrate the process with a simple example. If the input data is a set of movies and each movie is described by a list of descriptive keywords, the RBM learns to “cluster” semantically related words, where each hidden neuron becomes the centroid of a latent topic found in the data. At test time, each neuron outputs a probability that indicates how strongly the given movie matches the concept learned by that neuron. For instance, an output of H = [0.3, 0, 0, 0.9] means that topics 1 and 4 are present in the movie, with topic 4 being more prominent, while topics 2 and 3 were not found at all. It is not always possible to label neurons with human-readable topics, but often a brief summary, at least of the top-k feature weights, is possible, like “highschool”, “western”, “supernatural”, “sports” and so on. However, RBMs are also capable of generating “fantasy” samples, in our case movie descriptions, after they have been trained on a set of samples. This allows us to study what a particular trained model believes in and how plausible the generated samples are. The generation of such fantasies can be done with Gibbs sampling.
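To make the last point more concrete, here is a minimal sketch of how such fantasies could be drawn from a trained binary RBM with alternating Gibbs sampling. The weights W and the biases below are placeholders for an already trained model; the dimensions, random initialization, and number of steps are arbitrary choices for illustration only:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical parameters of an already trained RBM with
# 100 visible units (keywords) and 4 hidden units (topics).
n_visible, n_hidden = 100, 4
W = rng.normal(scale=0.1, size=(n_visible, n_hidden))  # weight matrix (stand-in)
b_vis = np.zeros(n_visible)                            # visible biases
b_hid = np.zeros(n_hidden)                             # hidden biases

def gibbs_fantasy(steps=1000):
    """Run alternating Gibbs sampling and return a 'fantasy' visible sample."""
    v = rng.integers(0, 2, size=n_visible).astype(float)  # random start
    for _ in range(steps):
        # sample hidden units given the visible units
        h_prob = sigmoid(v @ W + b_hid)
        h = (rng.random(n_hidden) < h_prob).astype(float)
        # sample visible units given the hidden units
        v_prob = sigmoid(h @ W.T + b_vis)
        v = (rng.random(n_visible) < v_prob).astype(float)
    return v

fantasy = gibbs_fantasy()
print(fantasy[:10])  # the first ten keyword indicators of the "fantasy" movie
```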

In a nutshell, RBMs are generative, unsupervised models that learn explanatory factors for a set of data samples. After the model is trained, the factors can be extracted with a simple matrix multiplication: hidden = f(weights * sample + bias). Since RBMs can model various kinds of data, they are extremely versatile and are used in many different contexts, like unsupervised pre-training, collaborative filtering, deep belief networks, motion, facial expressions and many more.
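The extraction step is just this forward pass. A minimal sketch, reusing the hypothetical W and b_hid from the Gibbs sampling example and assuming f is the logistic sigmoid:

```python
def extract_features(sample):
    """Map a binary keyword vector to hidden topic probabilities in [0, 1]."""
    return sigmoid(sample @ W + b_hid)

movie = rng.integers(0, 2, size=n_visible).astype(float)  # toy keyword vector
H = extract_features(movie)
print(H)  # e.g. something like [0.3, 0.0, 0.0, 0.9]
```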

The reason why we bring up RBMs again is that we recently stumbled upon some ideas on how to use Conditional RBMs to build a feature space that better utilizes input data from different domains. In the next post, we will describe some details.
