In the previous post, we argued that supervised learning, of which classification is a subset, might not be powerful enough to capture all regularities of the data and thus generalize to unseen data. The problem can be partly mitigated with more training data, which allows the model to learn more patterns, but the information that can be carried by labels is limited in general.
Especially in the domain of recommendations, the paradigm of classification is far from optimal. The reason is pretty simple: users will not watch content they know to be bad or expect not to enjoy. Of course they will occasionally watch a movie they do not enjoy, but such movies do not suffice to serve as negative examples. Therefore, it makes much more sense to find patterns in the movies a user enjoyed and to use those patterns to predict future movies than to divide movies into positive and negative sets in order to learn a classification boundary.
One reason why classification is problematic here is that positive ratings are (much) more frequent than negative ratings, for the reason given above. Furthermore, if a user rated a movie "-1" for one very specific reason, a classification model will treat all of the movie's features as evidence against it, because the label does not encode why the user rated the movie "-1". Otherwise, a model could focus on just the portion of the features that actually drove the decision.
This sounds a little like data mining, because we focus mainly on finding patterns in data, but that is just the first step. Eventually, we want to use this knowledge to perform a ranking on unseen data. The whole procedure can be seen as "learning without negative samples", which sounds like unsupervised learning. In the end, though, it is not easy to draw lines between these approaches, because ours combines aspects of several of them.
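To make "learning without negative samples" a bit more concrete, here is a minimal sketch: a user profile is built purely from the feature vectors of movies the user enjoyed, and unseen movies are ranked by their similarity to that profile. The feature encoding, movie names, and the centroid-plus-cosine-similarity scheme are all illustrative assumptions, not a proposal for the actual model.

```python
# Sketch: rank unseen movies using ONLY positive examples.
# Features and data are hypothetical, e.g. [action, comedy, sci-fi, romance].
from math import sqrt

def cosine(a, b):
    # Cosine similarity between two feature vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def build_profile(liked):
    # Average the vectors of liked movies; no negative set is required.
    n = len(liked)
    return [sum(v[i] for v in liked) / n for i in range(len(liked[0]))]

def rank(candidates, profile):
    # Order unseen movies by similarity to the positive-only profile.
    return sorted(candidates, key=lambda m: cosine(m[1], profile), reverse=True)

liked = [[1, 0, 1, 0], [1, 0, 1, 0], [0, 0, 1, 0]]  # mostly action/sci-fi
profile = build_profile(liked)
candidates = [("Movie A", [1, 0, 1, 0]), ("Movie B", [0, 1, 0, 1])]
ranking = rank(candidates, profile)  # Movie A ends up first
```

There is no decision boundary here: every candidate simply receives a score relative to what the user liked, which is exactly the shift in perspective described above.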
Moreover, instead of assigning just a score to individual movies, we would like to give at least a (very) simple explanation of why the model has chosen a particular movie for a user. This is in the spirit of C4.5, where human-readable rules can be derived from the data.
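A toy sketch of what such an explanation could look like: report which features are frequent among the user's liked movies and also present in the recommended one. The feature names, the frequency threshold, and the data are all hypothetical; this only illustrates the flavor of rule-like, human-readable output, not the C4.5 algorithm itself.

```python
# Sketch: attach a simple, rule-like explanation to a recommendation.
# Feature names and data are illustrative, not a real dataset.
FEATURES = ["action", "comedy", "sci-fi", "romance"]

def frequent_features(liked, threshold=0.5):
    # Features present in at least `threshold` of the liked movies.
    n = len(liked)
    return [f for i, f in enumerate(FEATURES)
            if sum(v[i] for v in liked) / n >= threshold]

def explain(movie_vec, liked):
    # Intersect the movie's features with the user's frequent ones.
    frequent = frequent_features(liked)
    shared = [f for i, f in enumerate(FEATURES)
              if movie_vec[i] and f in frequent]
    return "recommended because you often enjoyed: " + ", ".join(shared)

liked = [[1, 0, 1, 0], [1, 0, 1, 0], [0, 0, 1, 0]]
message = explain([1, 0, 1, 0], liked)
print(message)  # → recommended because you often enjoyed: action, sci-fi
```

Even such a crude explanation already tells the user *which* patterns in their history triggered the suggestion, rather than handing them an opaque score.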
What we envision is not a precise model but a new paradigm for learning user preferences. The reason is that we believe a reasonable model of user preferences cannot be built with a classification model, regardless of how sophisticated it is. As argued in the last post, such a model stops learning when there are no errors, which is no guarantee that all patterns in the data have been learned. And that might be a serious bottleneck for correctly estimating preferences on unseen data. Without a doubt, not all patterns can be found in the training data, but learning as many of them as possible is a much better goal than training until the error is zero, a stopping criterion dictated by the limited information encoded in the labels.