With the highly non-linear relations in the data, it would be silly to assume that a shallow model can capture them all or at least most of them. That is why we are working on a model with more layers to disentangle the data. Thus, we decided, in the spirit of unsupervised pre-training, to use an autoencoder to reduce the complexity of the input data. With it, we can reduce the dimensionality of the data and we get a first set of good features to describe the data. The idea is to feed the new features into the Siamese network to further disentangle the data. As noted, we suppose that more layers are required to build a proper model of the underlying data.
A while ago, we read some papers about conditional RBMs, and since the concept is not restricted to this algorithm, we decided to enhance our AE with auxiliary information to get similar capabilities. The idea is similar to the discriminative RBM, but instead of conditioning the hidden nodes, we condition the visible nodes. To be more specific, we use an extra weight matrix V with |visible| x |aux| entries to model the interactions between the aux data and the input data.
In a first setup, we used the genre information as the aux data to capture the relation of specific keywords with the genres. In this case, the aux vector is not always a one-hot vector since a movie might have several genres. For the regular input data, we use the keyword features. The model is identical to the classical AE except for the reconstruction step, which is enhanced with the genre “bias”:
x_hat = f(dot(hidden, W.T) + bias_visible + dot(aux, V.T))
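To make the shapes in the reconstruction step concrete, here is a minimal NumPy sketch. Only the decoder line comes from the formula above; the encoder form, the sigmoid as f, the name bias_hidden and the toy sizes are assumptions for illustration, not necessarily what our actual model uses.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def reconstruct(x, aux, W, bias_hidden, bias_visible, V, f=sigmoid):
    # Encoder (assumed): same as a classical AE, the aux data is not used here.
    hidden = f(np.dot(x, W) + bias_hidden)
    # Decoder: classical reconstruction plus the genre "bias" dot(aux, V.T).
    x_hat = f(np.dot(hidden, W.T) + bias_visible + np.dot(aux, V.T))
    return hidden, x_hat

# Toy sizes, chosen only for the example.
n_visible, n_hidden, n_aux = 6, 3, 4
rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(n_visible, n_hidden))  # |visible| x |hidden|
V = rng.normal(scale=0.1, size=(n_visible, n_aux))     # |visible| x |aux|
bias_hidden = np.zeros(n_hidden)
bias_visible = np.zeros(n_visible)

x = np.array([[1, 0, 1, 0, 0, 1]], dtype=float)  # keyword features of one movie
aux = np.array([[1, 0, 1, 0]], dtype=float)      # multi-hot: the movie has two genres

hidden, x_hat = reconstruct(x, aux, W, bias_hidden, bias_visible, V)
print(hidden.shape, x_hat.shape)
```

With an all-zero aux vector the extra term vanishes and the model falls back to the classical AE reconstruction.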
To see if the model really captured some relations, we treat each column of matrix V as a weight vector for a genre that describes how important the individual keywords are. We then sort the weights by magnitude and consider only the top-k results. Here are some examples:
Crime: prison, attorney, murder, undercover
Sci-Fi: alien, machine, space, future
Horror: maniac, psychic, nightmare
War: war, nationality, anti, seal, government
Sport: players, sports, camp, training
Music: music, dance, blues, rock
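The ranking itself is only a few lines of code. The sketch below is one way to do it, assuming "magnitude" means the absolute value of a weight; the function name top_keywords and the tiny matrix are made up for the example and are not the weights of our trained model.

```python
import numpy as np

def top_keywords(V, vocab, genres, k=4):
    """Return the k keywords with the largest |weight| for each genre column of V."""
    result = {}
    for j, genre in enumerate(genres):
        column = V[:, j]                        # one weight per keyword
        order = np.argsort(-np.abs(column))[:k]  # indices, largest magnitude first
        result[genre] = [vocab[i] for i in order]
    return result

# Hypothetical toy values: rows are keywords, columns are genres.
vocab = ["alien", "space", "murder", "prison"]
genres = ["Sci-Fi", "Crime"]
V_toy = np.array([
    [0.9,  0.1],
    [0.7, -0.2],
    [0.0,  0.8],
    [-0.1, 0.6],
])

top = top_keywords(V_toy, vocab, genres, k=2)
print(top)
```

To rank by signed weight instead, replace -np.abs(column) with -column.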
As we can see, the model is clearly able to find correlations between genres and keywords. However, since the genres are distributed very unevenly, the approach does not work very well for general genres like ‘Action’ or ‘Drama’. In other words, the more specific a genre is, the easier it is to find some relations. For instance, about 20% of all movies are marked as ‘Drama’, while the genres with distinctive keywords have much lower shares: Crime 5%, Sci-Fi 3%, Horror 5%, War 2%, Sport 4%, Music 2%.
In terms of concepts, one could say that Drama is more diverse than Music or Sci-Fi, which is the reason why so many movies are marked with this genre. The same is true for Comedy, which has a share of about 15%. Stated differently, it is much more likely that a movie has funny or dramatic elements in it than western or war elements.