With the introduction of conditional keyword weights based on the genre of a movie, we were able to considerably improve the concept space built by our models. The sparsity of the data is still a problem, but we can now at least better discriminate between movies that share similar keywords but come from different genres.
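To make the idea concrete, here is a minimal sketch of genre-conditional keyword weighting. Everything in it is illustrative: the genres, keywords, counts, and the add-one smoothing are made up for the example, not taken from our actual system.

```python
# Toy sketch: a keyword's weight depends on the genre it appears in.
# All counts and names below are invented for illustration.
from collections import Counter

# Hypothetical keyword counts per genre, e.g. gathered from a corpus.
genre_keyword_counts = {
    "horror": Counter({"shark": 12, "beach": 3, "island": 2}),
    "comedy": Counter({"beach": 9, "wedding": 7, "shark": 1}),
}

def conditional_weight(keyword, genre):
    """Smoothed estimate of P(keyword | genre)."""
    counts = genre_keyword_counts[genre]
    total = sum(counts.values())
    # Add-one smoothing so unseen keywords still get a small weight.
    return (counts[keyword] + 1) / (total + len(counts))

def movie_vector(keywords, genre):
    """Represent a movie by genre-conditioned keyword weights."""
    return {kw: conditional_weight(kw, genre) for kw in keywords}

# The same keywords receive different weights in different genres:
print(movie_vector({"shark", "beach"}, "horror"))
print(movie_vector({"shark", "beach"}, "comedy"))
```

The point of the conditioning is visible in the output: ‘shark’ carries far more weight for a horror movie than for a comedy, which is exactly what lets the model separate movies with overlapping keywords but different genres.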
And that brings us to another aspect we are currently working on. Let us assume that we want to find the closest neighbors of a particular movie. The question is whether the concept space should also use the genre information to better separate movies, or whether the genre is already too restrictive to model semantic relations between movies.
Again, we demonstrate this with an example. If the user wants to find the best matches for a chosen movie M, we can either return the closest movies using a concept model based on keywords alone, or we can use a joint model of genres and keywords that gives higher scores to movies that share the genre of M.
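The difference between the two options can be sketched in a few lines. The movies, keyword weights, and the fixed genre bonus below are all invented for illustration; in particular, a real joint model would learn how strongly the genre contributes rather than use a hand-picked constant.

```python
# Toy comparison of a keyword-only ranking vs. a joint genre+keyword
# ranking. All titles, weights, and the bonus value are illustrative.
import math

def cosine(u, v):
    """Cosine similarity of sparse keyword-weight dicts."""
    dot = sum(u.get(k, 0.0) * v.get(k, 0.0) for k in set(u) | set(v))
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def score(query, candidate, genre_bonus=0.0):
    """Keyword similarity, optionally boosted when the genres match."""
    s = cosine(query["keywords"], candidate["keywords"])
    if genre_bonus and query["genre"] == candidate["genre"]:
        s += genre_bonus
    return s

jaws = {"genre": "animal-horror", "keywords": {"shark": 1.0, "beach": 0.5}}
candidates = [
    {"title": "Deep Blue Sea", "genre": "animal-horror",
     "keywords": {"shark": 1.0, "ocean": 0.8}},
    {"title": "The Shallows", "genre": "thriller",
     "keywords": {"shark": 1.0, "beach": 0.9}},
]

def ranked(genre_bonus):
    return [m["title"] for m in
            sorted(candidates, key=lambda m: score(jaws, m, genre_bonus),
                   reverse=True)]

print(ranked(0.0))  # keyword-only model
print(ranked(0.3))  # joint model: the same-genre movie moves to the top
```

Note how the joint model flips the ranking: the thriller has the better keyword overlap with the query, but the genre bonus pushes the other animal-horror movie to the top. Whether that flip is desirable is exactly the question discussed next.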
There is no ultimate answer to this question because it depends on the expectations of the user. If the user searches for movies similar to Jaws, the genre ‘animal-horror’ can be very useful for giving matches in this genre a higher score. On the other hand, it is also possible that the user wants to consider all aspects of a movie and therefore does not focus on a single one. In this case, the additional weight of the genre can hurt the performance, because some movies from other genres might match the reference better.
In contrast to pure research, our ultimate goal is not a model with the lowest error score, but a system that can be used in everyday situations. Of course a good model is important, but so is the interface to the user. A nice GUI helps a lot, but if the user cannot do what he intends with it, it is practically useless.
When we started, we mainly focused on the question whether a user would like a movie that is on air today; in other words, we used only supervised models. In the case of linear models, the problem is that the similarity of movies is not explained in terms of latent variables but only via keywords. That means that if two movies are semantically related but do not share any keywords, a model without “hidden factors” would treat them as _not_ equal, and such items would never be suggested to the user. If we instead consider unsupervised or semi-supervised models, we can relate semantic keywords through an additional hidden layer, which allows us to treat items as similar even if the intersection of their keyword sets is empty. In the case of our concept space model, we learn to relate movies with labels that were not provided by the user.
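The effect of such a hidden layer can be shown with a tiny sketch. In our system the keyword-to-concept mapping is learned, but for illustration we hand-craft one here; the concept names and weights are pure invention.

```python
# Toy hidden "concept" layer: two movies with disjoint keyword sets can
# still be similar once keywords are mapped into a shared concept space.
# The keyword->concept weights below are hand-crafted for illustration;
# in a real model they would be learned.
import math

keyword_concepts = {
    "shark":       {"aquatic": 1.0, "danger": 0.8},
    "ocean":       {"aquatic": 1.0},
    "great-white": {"aquatic": 0.9, "danger": 1.0},
    "wedding":     {"romance": 1.0},
}

def concept_vector(keywords):
    """Sum the concept weights of all keywords of a movie."""
    vec = {}
    for kw in keywords:
        for concept, w in keyword_concepts.get(kw, {}).items():
            vec[concept] = vec.get(concept, 0.0) + w
    return vec

def cosine(u, v):
    dot = sum(u.get(k, 0.0) * v.get(k, 0.0) for k in set(u) | set(v))
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

jaws = {"shark", "ocean"}
open_water = {"great-white"}  # shares no keyword with Jaws

# A pure keyword model sees an empty intersection ...
print(len(jaws & open_water))  # 0
# ... but in concept space the two movies end up close together.
print(cosine(concept_vector(jaws), concept_vector(open_water)))
```

Both movies project mostly onto the ‘aquatic’ and ‘danger’ concepts, so their concept-space similarity is high even though their keyword intersection is empty.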
Stated differently, we are still working on a clever clustering scheme that models high-level concepts of movies, like ‘super-heroes’, on the one hand, but also finer details, like ‘Clooney movies’, on the other.
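One reason we find hierarchical schemes attractive is that a single hierarchy can expose both granularities, depending on where it is cut. The sketch below is not our actual algorithm, just a toy single-linkage agglomerative clustering over invented keyword vectors.

```python
# Toy single-linkage agglomerative clustering: cutting the hierarchy at
# different distances yields coarse groups (e.g. super-hero movies) or
# fine ones (e.g. Clooney heist movies). All vectors are invented.
import math

movies = {
    "Iron Man":       {"hero": 1.0, "suit": 1.0},
    "Batman Begins":  {"hero": 1.0, "gadgets": 0.9},
    "Ocean's Eleven": {"heist": 1.0, "clooney": 1.0},
    "Ocean's Twelve": {"heist": 1.0, "clooney": 0.9, "europe": 0.5},
}

def dist(u, v):
    """Euclidean distance between sparse keyword-weight dicts."""
    keys = set(u) | set(v)
    return math.sqrt(sum((u.get(k, 0.0) - v.get(k, 0.0)) ** 2 for k in keys))

def cluster(items, threshold):
    """Greedily merge clusters whose closest members are within threshold."""
    clusters = [[name] for name in items]
    merged = True
    while merged:
        merged = False
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(dist(items[a], items[b])
                        for a in clusters[i] for b in clusters[j])
                if d < threshold:
                    clusters[i] += clusters.pop(j)
                    merged = True
                    break
            if merged:
                break
    return [sorted(c) for c in clusters]

print(cluster(movies, 1.0))  # fine cut: only the Clooney movies merge
print(cluster(movies, 1.5))  # coarse cut: super-heroes vs. Clooney heists
```

At the tight threshold only the near-duplicate Clooney movies end up together; at the looser one the two super-hero movies also form a cluster, which is the kind of multi-granularity structure we are after.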
Our next steps are to enhance our model with additional metadata, like actors or directors, because people are definitely a very important factor when a user decides which movie to watch.