It is no secret that features cannot capture every aspect of a movie. For instance, with plot keywords, themes and genres, we are often able to capture the main signal of a movie, but we are not able to disentangle that signal into all its latent components. This is where collaborative filtering comes in: it factors the signal into different components with the help of user ratings. Both methods have drawbacks, and neither system alone is able to determine all the preferences of users.
For example, a certain movie might have an excellent rating, say 7.4 out of 10.0, but due to its “avant-garde” genre, a lot of users will probably react to it very strongly. In other words, it is likely that some users will not finish the movie and give it a bad rating, while others find that their preferences are met and give it a good rating. This might lead to a strong polarization of the ratings, that is, a bi-modal distribution of low and high ratings. Therefore, it would be beneficial to determine a “mainstream” bias for a user or, conversely, an “avant-garde” bias, which means we need to find out what type of movies a user mostly enjoys. This would help to avoid suggesting movies to users that are the opposite of their taste.
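To make the idea of polarized ratings a bit more concrete, here is a minimal sketch. It is only one possible heuristic, not an established method: it scores a movie by the variance of its ratings relative to the largest variance possible on the rating scale, and averages that score over the movies a user liked. The function names, the 1 to 10 scale and the toy ratings are all assumptions made for illustration.

```python
from statistics import mean, pvariance


def polarization(ratings, lo=1.0, hi=10.0):
    """Variance of the ratings divided by the largest variance possible on [lo, hi].

    The maximum is reached when half of the ratings sit at each extreme,
    so the score lies between 0 (full agreement) and 1 (fully split).
    """
    max_var = ((hi - lo) / 2) ** 2
    return pvariance(ratings) / max_var


def avant_garde_bias(liked_movie_ids, ratings_by_movie):
    """Hypothetical user-level score: mean polarization of the movies a user liked."""
    return mean(polarization(ratings_by_movie[m]) for m in liked_movie_ids)


# Toy data (invented): both movies have a similar mean rating,
# but the second one splits its audience.
ratings_by_movie = {
    "mainstream": [7, 7, 8, 7, 8, 7, 8, 7],
    "avant_garde": [10, 10, 1, 10, 10, 2, 10, 10, 1, 10],
}

for title, ratings in ratings_by_movie.items():
    print(title, round(mean(ratings), 2), round(polarization(ratings), 3))

print(avant_garde_bias(["avant_garde"], ratings_by_movie))
```

The point of the sketch is that the mean rating alone hides the split: both toy movies average around 7.4, and only the spread tells them apart. A real system would likely need something more robust than variance, since a wide uni-modal distribution also produces a high score.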
To rephrase our problem: if two movies (A, B) share lots of topics (keywords) and the user enjoyed movie A, a content-based method would likely suggest B, even if B is known to be “artistic.” To tackle the problem, we would need a classifier that decides whether a movie is mainstream or not. However, this probably requires very powerful features that go far beyond simple plot keywords. And since not every such movie is tagged as “artistic”, we cannot rely on sub-genres or special flags.
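The failure mode can be illustrated with a tiny content-based recommender. This is a sketch under stated assumptions: it uses Jaccard similarity on keyword sets as a stand-in for whatever similarity a real system would use, and the movies, keywords and the `recommend` helper are invented for the example.

```python
def jaccard(a, b):
    """Jaccard similarity of two keyword collections: |intersection| / |union|."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(liked, keywords, top_n=1):
    """Rank all other movies by keyword overlap with the movie the user liked."""
    scores = {
        movie: jaccard(keywords[liked], kw)
        for movie, kw in keywords.items()
        if movie != liked
    }
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


# Invented keyword sets: A and B share most topics, C shares none.
keywords = {
    "A": {"heist", "betrayal", "paris", "revenge"},
    "B": {"heist", "betrayal", "paris", "dream"},
    "C": {"wedding", "comedy", "road-trip"},
}

print(recommend("A", keywords))
```

Nothing in the keyword sets says whether B is mainstream or “artistic”, so the ranking cannot take that into account: B wins purely on topic overlap.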
In a nutshell, we would not go so far as to call it a Recommender Winter, but too often current research aims not at helping users but at beating records while working around the real problems. One example is the existence of several platforms that collect ratings but often provide no way to export them for re-use. Another is that metadata is practically nonexistent for lesser-known or foreign movies. In contrast to pictures and “documents”, movie data is mostly isolated and cannot be easily condensed or extracted for further use.