In the previous post, we briefly described a discriminative AE that learns a manifold to separate two classes of data. The approach worked pretty well, but since our domain is movies, we have many classes and not just two of them. For that reason, it would be more beneficial to learn a metric that pulls similar genres together while pushing dissimilar genres farther apart. That sounds very much like our experiments with Siamese networks, and indeed the objective is to learn a (linear) transformation x’ = L*x that projects x into some lower-dimensional space. The setup is also very similar because we again utilize pairs of movies and a label to learn the metric. A pair consists of two movies (i, j) and a label (-1, +1). If a pair should be treated as equal, we use (i, j, +1); otherwise we use (i, j, -1) to indicate dissimilarity.
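To make the pair-wise objective concrete, here is a minimal sketch of a contrastive-style loss under the linear map x’ = L*x. The function name `pairwise_loss` and the margin value are assumptions for illustration, not the exact loss used in the experiments:

```python
import numpy as np

def pairwise_loss(L, x_i, x_j, y, margin=1.0):
    """Contrastive-style loss for a pair (i, j, y) under x' = L @ x (a sketch).

    y = +1: pull the pair together (penalize squared distance).
    y = -1: push the pair apart until the margin is satisfied.
    """
    d = np.linalg.norm(L @ x_i - L @ x_j)
    if y == 1:
        return d ** 2
    return max(0.0, margin - d) ** 2
```

With this shape of loss, similar pairs contribute their squared distance, while dissimilar pairs only contribute as long as they are still closer than the margin.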
Here it comes in very handy that we worked on a taxonomy of the sub-genres, so we can condense all relevant genres into a single label. For instance, a ‘crime-thriller’ and a ‘detective-film’ are very likely to share some major themes, similar to ‘sci-fi-horror’ and ‘horror’. In other words, we compress all genres into a few labels that represent classes of similar movies. This mapping is used to create pairs of movies, annotated with a label, to form the training set.
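The pair generation can be sketched as follows. The taxonomy dictionary and the movie tuples below are hypothetical placeholders; the real mapping comes from our sub-genre taxonomy:

```python
from itertools import combinations

# Hypothetical condensed taxonomy: fine-grained genres -> coarse labels.
TAXONOMY = {
    "crime-thriller": "thriller",
    "detective-film": "thriller",
    "sci-fi-horror": "horror",
    "horror": "horror",
}

def make_pairs(movies):
    """movies: list of (title, genre) tuples.

    Returns (i, j, +1) for pairs in the same taxonomy class,
    (i, j, -1) otherwise.
    """
    pairs = []
    for (t1, g1), (t2, g2) in combinations(movies, 2):
        label = 1 if TAXONOMY[g1] == TAXONOMY[g2] else -1
        pairs.append((t1, t2, label))
    return pairs
```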
The training is done by ordinary gradient descent to minimize the pair-wise loss function. Finally, we test the model by using certain movies as queries and check whether the nearest neighbors come from the same taxonomy class. For instance, the nearest neighbors for the movie ‘Doom’ are: Dogma, Constantine, House of Dead and Hellboy. The results indicate issues similar to those of the previous model. For instance, the horror theme of Dogma is pretty weak; however, the model captured the angel-demon aspect which is also -partly- present in Doom.
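A single gradient step on one pair, plus the nearest-neighbor query in the learned space, could look like the sketch below. The function names, the learning rate, and the margin are illustrative assumptions; the gradient follows from the contrastive-style loss above:

```python
import numpy as np

def sgd_step(L, x_i, x_j, y, lr=0.01, margin=1.0):
    """One gradient-descent step on the pair-wise loss (a sketch)."""
    delta = x_i - x_j
    diff = L @ delta               # difference in the projected space
    d = np.linalg.norm(diff)
    if y == 1:
        # d(||L*delta||^2)/dL = 2 * (L*delta) * delta^T
        grad = 2.0 * np.outer(diff, delta)
    elif 1e-12 < d < margin:
        # hinge term (margin - d)^2 is only active inside the margin
        grad = -2.0 * (margin - d) / d * np.outer(diff, delta)
    else:
        grad = np.zeros_like(L)
    return L - lr * grad

def nearest_neighbors(L, query, candidates, k=4):
    """Rank candidate vectors by distance to the query under x' = L @ x."""
    dists = [np.linalg.norm(L @ query - L @ c) for c in candidates]
    return np.argsort(dists)[:k]
```

Each step for a similar pair shrinks the projected distance slightly, while dissimilar pairs inside the margin are pushed apart; after training, `nearest_neighbors` is what backs the query experiments like the ‘Doom’ example.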
As a first shot, the approach seems promising, but tuning is definitely required to get rid of some issues, and as usual, we need to avoid overfitting because our data set is limited.