Over the last couple of days we have thought a lot about preference learning. The approach is usually used to learn a function that assigns a utility value to arbitrary items, based on the partially known preferences of a user. Training such a model requires a feature set that describes the items and either labels or pairwise preferences from the user, such as "A is better than B", from which a classification or ranking is derived.
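To make the idea of learning a utility function from statements like "A is better than B" concrete, here is a minimal sketch. It assumes a linear utility and a logistic loss on feature differences, which is just one common way to do this and not necessarily the model we will end up with. The items, their two features and the preference pairs are made up for illustration.

```python
import math


def utility(w, x):
    """Linear utility: dot product of the weights and the item features."""
    return sum(wi * xi for wi, xi in zip(w, x))


def train_pairwise_utility(items, preferences, epochs=200, lr=0.1):
    """Fit weights w so that utility(better) > utility(worse) for each pair."""
    dim = len(next(iter(items.values())))
    w = [0.0] * dim
    for _ in range(epochs):
        for better, worse in preferences:
            # Feature difference between the preferred and the other item.
            diff = [a - b for a, b in zip(items[better], items[worse])]
            margin = utility(w, diff)
            # Gradient step on the logistic loss log(1 + exp(-margin)).
            scale = 1.0 / (1.0 + math.exp(margin))
            w = [wi + lr * scale * di for wi, di in zip(w, diff)]
    return w


# Hypothetical toy data: four items with two made-up features each.
items = {
    "A": [1.0, 0.0],
    "B": [0.0, 1.0],
    "C": [1.0, 1.0],
    "D": [0.0, 0.0],
}
# Partially known preferences of one user as (better, worse) pairs.
preferences = [("A", "B"), ("C", "B"), ("A", "D")]

w = train_pairwise_utility(items, preferences)
ranking = sorted(items, key=lambda name: utility(w, items[name]), reverse=True)
print(ranking)
```

Because the utility is a function of the features and not of the item identity, it can also score items the user never compared, which is the whole point of the approach.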
Our aims are very similar, but instead of using the raw features directly, we want to disentangle the feature data first to get a better representation of the underlying data. Furthermore, de-correlating the data is also more likely to capture the high-level concepts that are hidden in it.
Now the question is which features to use as a basis. In other domains, for instance laptops, a product can be described fairly precisely by its features (CPU, hard disk, display, RAM, …). The features of movies are mostly hidden, and the ones that are obvious (genre, actors, budget, country, certificate) are either not very useful or extremely sparse.
For instance, Stallone has credits for about 70 movies, but a dataset might contain over 100,000 movies. Finding concepts or relations for this actor is definitely a challenge, and so is determining whether a user likes his movies because of the actor or because of a common but hidden pattern that is present in most or some of them.
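To put a number on how sparse such an actor feature is, here is a small back-of-the-envelope sketch. It uses only the rough figures from the text (about 70 credits, a catalogue of 100,000 movies); the helper name `feature_density` is ours and purely illustrative.

```python
def feature_density(num_items_with_feature, num_items):
    """Fraction of items for which a binary feature is set."""
    return num_items_with_feature / num_items


# Rough numbers from the text: about 70 credits in a catalogue of 100,000 movies.
density = feature_density(70, 100_000)
print(f"{density:.4%} of the movies carry this actor feature")
```

With a column that is almost entirely zeros, a model sees very few positive examples from which to learn anything about this actor.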
In other words, the reason why a user likes a specific movie is very likely a combination of dozens of factors. The user might like the music, one actress, the story, or even just the CGI effects. Expressing these preferences as features derived from a model that only uses very basic movie features seems impossible. Nevertheless, it is the only and best chance we have.
That is why it is very important to collect as many different features as possible and to find a way to combine them in a single model. Besides the obvious features (topic keywords, actors, genre, …), we plan to use more detailed summaries, reviews, and extensions that model relations between some of the feature domains.