After the quite amazing results of our toy example with Deep Boltzmann Machines, we decided to go a step further. But before we do, we want to discuss some peculiarities of our model. In general, the various Boltzmann Machines are generative models, which means we can sample from the model to get a better understanding of what the model believes in. A typical example is training on handwritten digits; a sample from the model is then a digit that was created by the model itself. In our case, however, this feature is not important, since we are not interested in sampling new movies from the model, but only in explaining the data we trained it on. For this purpose, energy-based models like DBMs are definitely useful, but it also means that, in our case, we only use a very small portion of the real potential of such a machine.
This becomes more obvious if we consider the approach we use to semantically cluster movies. First, we train a DBM model for each movie genre; more precisely, we only consider the most frequent genres. Then we condense each model to a small set of neurons, for instance the ones with the largest L2 norm, and normalize them. All these neurons are then combined into a 'Topic Network'.
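The condensation step can be sketched as follows. This is a minimal illustration, not our actual pipeline: the weight matrices, their row-wise neuron layout, the value of k, and the genre names are all placeholder assumptions.

```python
import numpy as np

def condense_model(W, k=10):
    """Keep the k neurons (here: rows of W) with the largest L2 norm
    and normalize each kept neuron to unit length."""
    norms = np.linalg.norm(W, axis=1)
    top = np.argsort(norms)[-k:]  # indices of the k largest norms
    selected = W[top]
    return selected / np.linalg.norm(selected, axis=1, keepdims=True)

# Combine the condensed per-genre models into one 'Topic Network'.
rng = np.random.default_rng(0)
genre_models = {"action": rng.normal(size=(64, 200)),   # dummy weights
                "comedy": rng.normal(size=(64, 200))}
topic_network = np.vstack([condense_model(W) for W in genre_models.values()])
```

Stacking the normalized neurons row-wise gives one matrix in which each row represents a genre-specific topic direction.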
The parameters of a DBM with two layers consist of (W, V, a, b, c), where the uppercase letters denote weight matrices and the lowercase letters bias vectors. Of these parameters, we only use C = W' * V' and discard the others. In other words, the biases are only needed for training and are not required for our feature representation. Our transformation function is simply the dot product of the weight vector of a neuron and a movie sample. Due to the constraints, the result always lies in the range [0, 1]. This side effect is very useful if we plan to use the feature representation as input for a new (DBM) model.
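The transformation can be written down in a few lines. The sketch below assumes the rows of C are the (normalized) neuron weight vectors and that both weights and samples are non-negative and L2-normalized, which is what bounds each dot product to [0, 1]; the shapes are made up for illustration.

```python
import numpy as np

def transform(C, x):
    """Project a movie sample x into the semantic feature space.
    C stacks the neuron weight vectors row-wise; each output entry
    is the dot product of one neuron with the sample."""
    return C @ x

rng = np.random.default_rng(1)
# toy setup: 3 neurons over 5 raw features, non-negative and unit-norm
C = np.abs(rng.normal(size=(3, 5)))
C /= np.linalg.norm(C, axis=1, keepdims=True)
x = np.abs(rng.normal(size=5))
x /= np.linalg.norm(x)

features = transform(C, x)
assert np.all((features >= 0) & (features <= 1))
```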
The final model is then nothing more than a Top-K query as known from information retrieval. The input is an arbitrary movie and the output is its closest neighbors in the semantic feature space. And the results speak for themselves, as we will demonstrate with some movies:
Pirates of the Caribbean: Hook, Pirates of the Caribbean -Part 2- and -Part 3-
Shark Attack: Red Water, Sharktopus, Jaws, Jaws 2, Shark Attack 3
Batman: Batman Returns, Batman & Robin, Spawn, Spider-Man 2
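A Top-K query of this kind reduces to ranking by similarity in the feature space. The snippet below is a sketch under assumptions: cosine similarity as the distance measure and a tiny hand-made 2-D feature matrix; the original post does not specify either, and the query movie itself appears as its own top hit unless filtered out.

```python
import numpy as np

def top_k(query, library, k=3):
    """Return indices of the k nearest rows of `library` to `query`,
    ranked by cosine similarity (largest first)."""
    q = query / np.linalg.norm(query)
    L = library / np.linalg.norm(library, axis=1, keepdims=True)
    scores = L @ q
    return np.argsort(scores)[::-1][:k]

# toy library with made-up 2-D semantic features
titles = ["Jaws", "Jaws 2", "Batman Returns", "Sharktopus"]
feats = np.array([[0.9, 0.1],
                  [0.85, 0.15],
                  [0.1, 0.9],
                  [0.8, 0.2]])

neighbors = top_k(feats[0], feats, k=3)
print([titles[i] for i in neighbors])  # → ['Jaws', 'Jaws 2', 'Sharktopus']
```

In a real setting the query would be dropped from its own result list and the library would hold the transformed features of all movies.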
It is no secret that the model is not perfect yet, but the results for movies with sufficient metadata are already very good, and we did not do any fine-tuning of the final model. In other words, even if we use the DBM in an unintended way, the test of our (rather simple) approach confirms that training a DBM on our data leads to a valid model; a model that can be used as a module in a larger system, for instance a topic network like ours.
We plan to further investigate the use of DBMs for semantic clustering, but we are also eager to use topic networks for preference-based learning or, stated differently, to train a supervised classifier that can be used for personalized movie rankings.