We are mainly interested in building a latent, semantic space of movies, but in the end we would also like to use it to build a personalized recommender service for all kinds of movies. We already mentioned that series and movies are two different concepts that cannot be treated as equal in such a system, and thus they need different rating procedures. Furthermore, ratings themselves should not be considered static and therefore also need flexible treatment.
In other words, would we really give exactly the same rating to a movie we saw a couple of years ago? Maybe we are no longer interested in those themes, or with all the time that has passed we now find the movie somewhat silly. The contrary might also be true: a movie we did not like back then may be more appealing to us now; who knows…
Like labels in supervised models, the information content of a rating itself is pretty low. Why? Because a rating only expresses whether we liked a movie or not, but not why we gave a low or high rating, which is the information we are really interested in. In a collaborative setting, we consider the relations between ratings, which contain much more information and are thus more useful for extracting features. We do not want to start a discussion about calibrating ratings, but it is extremely unlikely that a 4-star rating from one user means exactly the same as a 4-star rating from another. Not to mention that ratings might also depend on other factors, such as mood, or whether they were given in bulk.
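One common way to make ratings comparable across users with different rating habits is per-user mean-centering, as used in classical collaborative filtering. The following is a minimal sketch with made-up users and ratings, not part of any particular system:

```python
def mean_center(ratings):
    """Subtract each user's mean rating from their raw ratings,
    so only relative preferences remain."""
    centered = {}
    for user, items in ratings.items():
        mu = sum(items.values()) / len(items)
        centered[user] = {movie: r - mu for movie, r in items.items()}
    return centered

# Hypothetical data: alice rates generously, bob rates strictly.
raw = {
    "alice": {"A": 4, "B": 5, "C": 4},   # mean ~4.33
    "bob":   {"A": 2, "B": 3, "C": 2},   # mean ~2.33
}

centered = mean_center(raw)
# After centering, both users express the same relative preference:
# movie B stands out equally for both, even though their raw
# 5-star and 3-star ratings look very different.
```

This illustrates why the relations between ratings carry more signal than the raw values: a strict rater's 3 stars and a generous rater's 5 stars can encode the same preference.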
Without a doubt, the ultimate goal of recommendations should be a benefit for the user, and it should not be underestimated that rating an item should be as easy as possible. With a 10-star scale, the cognitive burden of finding the best value can be very high. For instance, a user wants to rate a movie and narrows her decision down to 4-6; now she has to compare all movies she has rated within this range and somehow decide where to put this particular one. With a 5-star approach the number of choices is much lower, and with a simple up/down approach it is even easier. Of course, we lose information with fewer rating levels, but in the end we have to ask whether the details are worth the complexity for the user. It is a trade-off between usability and accuracy.
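The information loss from coarser scales is easy to see with a toy mapping. This sketch assumes a 1-10 input scale and made-up thresholds; it is only an illustration of the trade-off, not a recommendation:

```python
def to_five_star(r10):
    """Map a 1-10 rating onto a 1-5 star scale."""
    return (r10 + 1) // 2

def to_thumb(r10):
    """Collapse a 1-10 rating into a binary up/down signal
    (threshold of 6 is an arbitrary assumption)."""
    return "up" if r10 >= 6 else "down"

# Ratings 5 and 6 become indistinguishable on the 5-star scale
# (both map to 3 stars), and everything from 6 to 10 collapses
# into a single "up" on the binary scale.
for r in (3, 5, 6, 9):
    print(r, to_five_star(r), to_thumb(r))
```

The coarser the scale, the cheaper each rating is for the user and the less each rating tells the system; which point on that curve is right depends on the product.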
A simple example: assume a user mostly watches TV series on a mobile device. That means there is a high level of interaction between the device and the user. In case of a cold start, (weak) preferences can easily be elicited with an up/down approach by monitoring what series she watches, for how long, and which items are on her “to-watch list.” In the end, the question is how to rate a series. Every episode with an N-star rating? The whole series? An up/down approach per episode? One rating per season? There is no easy answer to this problem.
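How such implicit signals could be folded into a weak preference is sketched below. The signals (watch fraction, watchlist membership) and the weights are assumptions for illustration, not a proposed model:

```python
def implicit_score(watched_fraction, on_watchlist):
    """Combine implicit signals into a weak preference in [0, 1].
    The 0.8/0.2 weights are made up for this sketch."""
    score = 0.8 * watched_fraction
    if on_watchlist:
        score += 0.2
    return min(score, 1.0)

# A series watched to completion and kept on the to-watch list
# scores higher than one abandoned halfway through:
finished = implicit_score(1.0, True)
abandoned = implicit_score(0.5, False)
```

Such scores are weak evidence at best, but during a cold start they are available for free, without ever asking the user for an explicit rating.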
In a nutshell, a good user interface is the key to success for personalized TV, but it is also very challenging to design. No doubt it should be non-intrusive and learn as much as possible by simply observing the actions of a user and studying her habits without much intervention. While ratings can be very useful, the actual rating procedure should be as lightweight as possible. This is why we believe that the number of rating levels should not exceed five: otherwise too much time is wasted on deciding the rating level, not to mention that more levels increase the risk of inconsistencies.