Let us assume that two movies have the same genre. What does that mean? Very little actually, because two “action” movies does not have to share much to be assigned to that genre. Why we are stressing this point so much? Any model needs an objective function to minimize, sometimes also called cost function, and the process is usually supervised which means that we predict a label with the current set of parameters and adjust them if there is an error in the prediction.
That means, if the model correctly predicts what movie should be assigned to the “action” genre, the job is done. Stated differently, the model does not need to understand all factors of action movies to classify them correctly, all it has to do is to build a decision boundary that separates “action” from all other categories.
For the domain of preference-based learning this is not very useful, because a user might enjoy some specific action movies, but not all of them. A solution to the problem is to train a model that predicts if a user enjoys a movie or not, regardless of the actual genre. Clearly, such a model would benefit from a (different) model that projects movies into some feature space that disentangles some or most explaining factors of a movie.
And this is one of the major challenges we currently face, since the features of traditional supervised models focus to explain broader concepts like genres, but only as much as the model requires to get the predictions right. What we want is to disentangle explaining factors of movies which allows us to use a rather simple, maybe even a linear, model to capture the preferences of users. As mentioned earlier in several posts, such generic features could be used for a wide range of problems, including, clustering, retrieval and ranking.
We have serious doubts that such powerful features can be build by using data from a single domain, like sparse keywords, genres or themes. We definitely require multiple modalities, like tags, summaries or ratings to build a good model. Furthermore, the procedure needs to be unsupervised since most of the data out there does not have a label and even if it does, the label usually does not carry enough information.