In the first post about movie2vec we closed with the idea of using a recurrent network to encode a sequence of words as a fixed-length vector. First, let us summarize what we have so far: we trained an embedding model on the plot keywords of movies by constructing a graph that encodes the relations between the words. Thus, each word is represented by a fixed-dimensional vector in the feature space.
To learn something about the knowledge encoded by such a system, we used the embedding to train a simple softmax model that predicts whether a certain concept is present in a movie or not; we used hand-picked genres for this task. To convert the sequence of embedding vectors into a fixed-length vector, we averaged all vectors and used the result as the input to the softmax. This is our baseline.
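The baseline is compact enough to sketch in a few lines. The sketch below is a minimal illustration with NumPy, not our actual training code; all shapes and the parameter names `W` and `b` are hypothetical:

```python
import numpy as np

def baseline_predict(embeddings, W, b):
    """Average the keyword embeddings into one fixed-length vector
    and feed it through a softmax layer.

    embeddings: (n_keywords, dim) matrix of embedding vectors
    W, b: softmax weights (dim, n_genres) and bias (n_genres,)
    """
    v = embeddings.mean(axis=0)           # order-invariant movie vector
    logits = v @ W + b
    e = np.exp(logits - logits.max())     # numerically stable softmax
    return e / e.sum()

# toy example: 5 keywords, 8-dim embeddings, 10 genres
rng = np.random.default_rng(0)
probs = baseline_predict(rng.normal(size=(5, 8)),
                         rng.normal(size=(8, 10)),
                         np.zeros(10))
```

Note that the mean over the embeddings does not depend on the order of the keywords, a property that will matter again below.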
But such a model is too simple to capture higher-order correlations, which is why we decided to use a recurrent network for the transformation. We took our inspiration from a Theano example that uses a recurrent network to predict the sentiment of a review. The idea is to average the hidden representations over time, which yields a fixed-length vector that is fed into a softmax layer to obtain the prediction.
This is exactly what we need: a way to turn a sequence of words, or rather their embeddings, into a shared representation that can be used to represent a movie. One drawback is that the learned features are driven by the labels, which limits their use. However, a more severe problem is that, in contrast to real NLP problems, a sequence of plot keywords has no inherent order. Therefore, we need to investigate how this interacts with the model dynamics and how we can find a deterministic way to ensure a consistent encoding of each keyword set.
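Unlike the embedding average, the hidden states of a recurrent network do depend on the order in which the keywords are presented. One simple way to make the encoding deterministic, sketched here with a hypothetical `embed` lookup table, is to fix a canonical order, for instance alphabetical, before feeding the sequence to the network:

```python
def canonical_sequence(keywords, embed):
    """Map a *set* of plot keywords to a deterministic sequence of
    embedding vectors by sorting the keywords alphabetically.
    embed is a hypothetical dict from keyword to embedding vector."""
    return [embed[w] for w in sorted(keywords)]

# toy lookup table with 1-dim "embeddings"
embed = {"zombie": [1.0], "apocalypse": [2.0], "gore": [3.0]}
seq_a = canonical_sequence(["zombie", "gore", "apocalypse"], embed)
seq_b = canonical_sequence(["apocalypse", "zombie", "gore"], embed)
```

Both permutations of the same keyword set now produce the same input sequence, so the recurrent encoding is reproducible; whether an alphabetical order is also a good order for learning is a separate question.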
We started with a very simple model: a recurrent Elman network with 50 hidden tanh units and a softmax layer with 10 units. The network was trained for some epochs with rmsprop plus momentum, and the model was then compared to the softmax baseline. Because we were more interested in the errors of the model, to see what it has actually learned, we picked some movies and compared the top-2 predictions. It should be noted that not all genres were used for training, which explains some of the approximations made by the model:
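To make the architecture concrete, here is a minimal NumPy sketch of the forward pass: the classic Elman recurrence with tanh units, the hidden states averaged over time, and a softmax on top. All weight names and initializations are assumptions for illustration, not our trained parameters:

```python
import numpy as np

def elman_mean_pool(X, Wx, Wh, bh, Wo, bo):
    """Forward pass of a simple Elman network with mean-pooled
    hidden states and a softmax output.

    X:  (T, dim) sequence of keyword embeddings
    Wx: (dim, n_hid) input-to-hidden weights
    Wh: (n_hid, n_hid) hidden-to-hidden weights
    Wo: (n_hid, n_classes) hidden-to-output weights
    """
    h = np.zeros(Wh.shape[0])
    hidden_states = []
    for x in X:
        h = np.tanh(x @ Wx + h @ Wh + bh)   # Elman recurrence
        hidden_states.append(h)
    pooled = np.mean(hidden_states, axis=0)  # average over time
    logits = pooled @ Wo + bo
    e = np.exp(logits - logits.max())        # stable softmax
    return e / e.sum()

# toy configuration matching the post: 50 tanh units, 10 output units
rng = np.random.default_rng(1)
dim, n_hid, n_cls = 8, 50, 10
probs = elman_mean_pool(rng.normal(size=(6, dim)),
                        0.1 * rng.normal(size=(dim, n_hid)),
                        0.1 * rng.normal(size=(n_hid, n_hid)),
                        np.zeros(n_hid),
                        0.1 * rng.normal(size=(n_hid, n_cls)),
                        np.zeros(n_cls))
```

Mean pooling over time is what turns the variable-length keyword sequence into the fixed-length vector the softmax expects, exactly as in the sentiment example that inspired us.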
– softmax: 46.22% horror, 29.22% sci-fi => 75.44% confidence
– recurrent: 48.39% horror, 48.27% sci-fi => 96.66% confidence
– softmax: 60.07% horror, 24.62% sci-fi => 84.69% confidence
– recurrent: 62.48% horror, 36.39% sci-fi => 98.87% confidence
For both models the predictions are reasonable, and for Zombieland the sci-fi portion can be explained by the fact that movies with a post-apocalyptic theme are often also sci-fi movies. However, it is no coincidence that the confidence of the recurrent model is consistently higher than that of its softmax counterpart. We need to study this further, but we assume that the recurrent model better captures the correlations between the plot words, and thus its predictions come with more confidence.
We have just started our recurrent journey, but with these first results we are confident that this is not the end but the beginning of a long and fruitful travel into the depths of a new world.