More About Tags

The binarization of the tag data already led to good results. However, since a tag is likely to be assigned to a movie more than once, reducing the data to presence or absence throws information away and is a likely bottleneck. That is why we decided to use a more sophisticated approach to train a model for the tag data, and because the Replicated Softmax model has already proven useful in the community, it was our first choice.

The changes needed to prepare the training data were minimal, and so was the implementation of the new RBM model. Otherwise, our setup remained the same: we used the most frequent tags and trained a model with the adjusted visible units. The only difference is that a movie is now treated as a ‘document’ over a fixed dictionary of words (tags), and each word is represented by a discrete value, its count, rather than a binary flag.
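To make the change in the visible units concrete, here is a minimal sketch in Python with NumPy. It is not our actual implementation: the helper names `count_matrix` and `hidden_probabilities`, the toy tag lists and the random weights are all made up for illustration. It builds the count vectors over a fixed dictionary and computes the hidden activations in the usual Replicated Softmax way, where the hidden bias is scaled by the document length (here, the number of tag assignments of a movie).

```python
import numpy as np
from collections import Counter

def count_matrix(movie_tags, vocabulary):
    """Rows are movies, columns are tags, entries are how often a tag was assigned."""
    index = {tag: k for k, tag in enumerate(vocabulary)}
    counts = np.zeros((len(movie_tags), len(vocabulary)), dtype=np.float64)
    for row, tags in enumerate(movie_tags):
        for tag, n in Counter(tags).items():
            if tag in index:  # tags outside the fixed dictionary are dropped
                counts[row, index[tag]] = n
    return counts

def hidden_probabilities(counts, weights, hidden_bias):
    """p(h_j = 1 | v); the hidden bias is multiplied by the document length."""
    doc_length = counts.sum(axis=1, keepdims=True)
    activation = counts @ weights + doc_length * hidden_bias
    return 1.0 / (1.0 + np.exp(-activation))

# Toy data (hypothetical): the first movie received the tag 'cars' twice.
vocabulary = ["action", "boring", "cars", "dvd"]
movie_tags = [
    ["cars", "cars", "action", "dvd"],
    ["boring", "dvd"],
]
counts = count_matrix(movie_tags, vocabulary)
binary = (counts > 0).astype(np.float64)  # what the binarized model would see

# Untrained, random parameters, only to show the shapes involved.
rng = np.random.default_rng(0)
weights = 0.01 * rng.standard_normal((len(vocabulary), 3))
hidden_bias = np.zeros(3)
features = hidden_probabilities(counts, weights, hidden_bias)
```

The difference between `counts` and `binary` in the first row is exactly the information the binarized setup discards.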

This time, we decided to output the N closest neighbors of each movie in the new feature space, to learn more about the latent topics the model picked up and the correlation between individual tags.
For starters, we selected a movie with only a few tags to simplify the analysis. The movie can be broadly described as an action film with a focus on cars. Its tags are {action, boring, cars, dvd}, and we expected higher weights on ‘cars’ and ‘action’. Thus, it was no surprise that the ten nearest neighbor movies also had a ‘cars’ tag. But to see the full capabilities of the model, we chose a neighbor without a ‘cars’ tag for further examination.
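A neighbor lookup of this kind can be sketched in a few lines. The function name `nearest_neighbors` and the toy feature matrix are hypothetical, and Euclidean distance is an assumption made for the example; any other distance on the hidden representation would slot in the same way.

```python
import numpy as np

def nearest_neighbors(features, query, n):
    """Indices of the n rows closest to row `query` (Euclidean distance assumed)."""
    distances = np.linalg.norm(features - features[query], axis=1)
    distances[query] = np.inf  # never return the query movie itself
    return np.argsort(distances)[:n]

# Toy hidden representations of four movies (made-up values).
features = np.array([
    [0.0, 0.0],
    [0.1, 0.0],
    [1.0, 1.0],
    [0.2, 0.1],
])
neighbors = nearest_neighbors(features, query=0, n=2)
```

With these toy values, movies 1 and 3 come out as the two closest neighbors of movie 0.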

The neighbor we picked did, again unsurprisingly, carry a car-related tag: ‘car chase’. As a next step, we studied the weights of the hidden neurons to see whether one of them is sensitive to car-related keywords, and we found, not very surprisingly, that ‘cars’ and ‘car chase’ are positively correlated.
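Inspecting a neuron this way amounts to sorting the tags by their weight to that neuron. The sketch below uses a hypothetical helper `top_tags` and a hand-written weight matrix; the numbers are invented to illustrate the idea and are not the weights our model learned.

```python
import numpy as np

def top_tags(weights, vocabulary, unit, n):
    """The n tags with the largest weight to hidden unit `unit`."""
    column = weights[:, unit]
    order = np.argsort(column)[::-1][:n]  # largest weights first
    return [(vocabulary[k], float(column[k])) for k in order]

vocabulary = ["action", "boring", "cars", "car chase", "dvd"]
# Made-up weights: rows are tags, columns are hidden units.
weights = np.array([
    [0.3, 0.1],
    [-0.2, 0.8],
    [1.1, -0.4],
    [0.9, -0.3],
    [0.0, 0.2],
])
car_unit = top_tags(weights, vocabulary, unit=0, n=2)
```

Two tags that both rank high for the same hidden unit, as ‘cars’ and ‘car chase’ do for unit 0 in this toy matrix, tend to switch that unit on together, which is the kind of positive correlation described above.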

We did not gain many new insights from these experiments, but a lot of our intuitions were confirmed, which is still very useful. The Replicated Softmax model seems to be a good starting point for further experiments.

