Features come in different kinds. For images, we train models on the raw pixel values; for audio, we might encode the data first, and text is handled similarly. Usually the features themselves are not hierarchical; only the labels that describe the data are, for instance different kinds of animals, cars, furniture and so forth. In our case, we have a feature hierarchy of three levels: keywords, themes and genres. This can be compared to images of digits: first we have the raw pixels (keywords), then edges (themes), and finally the whole digits (genres). The major problem is that in the case of digits a model can learn those layers, while in our case all three levels are already given explicitly as hand-crafted features.
That means each layer represents knowledge at a different scale: going from genres down to keywords, each layer is less abstract and more concrete than the one above it. For example, two movies with the genre ‘action’ can only be compared at a very high level. They definitely share more topics than an ‘action’ and a ‘horror’ movie do, but beyond this, there is not much we can say about their similarity.
The next layer, the themes, adds a lot more information, which already makes a comparison at a lower level much more precise. Again, consider two ‘action’ movies, and suppose they share a theme named ‘zombies’. With the genres and the themes, we can already train a good classifier to predict the preferences of users at a coarse level.
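To make the idea of comparing movies at different scales concrete, here is a minimal sketch that measures the overlap of two movies separately at each level. The Jaccard overlap, the dictionary layout and the example movies are assumptions chosen for illustration; they are not the representation actually used in this project.

```python
def jaccard(a, b):
    """Overlap of two sets, from 0.0 (disjoint) to 1.0 (identical)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def level_similarity(movie_a, movie_b, levels=("genres", "themes", "keywords")):
    """Compare two movies separately at each level of the hierarchy."""
    return {
        level: jaccard(movie_a.get(level, ()), movie_b.get(level, ()))
        for level in levels
    }


# Hypothetical movies: same genre, one shared theme, few shared keywords.
movie_a = {
    "genres": {"action"},
    "themes": {"zombies", "survival"},
    "keywords": {"treasure-hunt", "horde", "island"},
}
movie_b = {
    "genres": {"action"},
    "themes": {"zombies", "heist"},
    "keywords": {"bank", "horde"},
}

similarity = level_similarity(movie_a, movie_b)
```

In this toy case the two movies are identical at the genre level, share one of three themes, and share one of four keywords, which mirrors how each lower level refines the comparison.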
Here, it is important to consider both levels together, because a user might like ‘action’ combined with ‘zombies’ but not ‘romance’ combined with ‘zombies’, and for a coarse classification that is all the information we need. Further details, such as whether the movie is also about a treasure hunt where the protagonists are attacked by hordes of zombies, are also important, but those features are more useful for ranking the non-rejected movies by a score.
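The two stages described above, a coarse rejection on genre and theme followed by a ranking on finer details, can be sketched roughly as follows. This is only an illustration under stated assumptions: the function names, the hand-written set of disliked (genre, theme) pairs and the keyword score are hypothetical stand-ins for what would be trained models in practice.

```python
def coarse_reject(movie, disliked_pairs):
    """Return True if any (genre, theme) combination of the movie is disliked."""
    return any(
        (genre, theme) in disliked_pairs
        for genre in movie["genres"]
        for theme in movie["themes"]
    )


def keyword_score(movie, liked_keywords):
    """Fraction of the movie's keywords that the user is known to like."""
    keywords = movie["keywords"]
    if not keywords:
        return 0.0
    return len(keywords & liked_keywords) / len(keywords)


def recommend(movies, disliked_pairs, liked_keywords):
    """Drop coarse rejections first, then rank the survivors by keyword score."""
    survivors = [m for m in movies if not coarse_reject(m, disliked_pairs)]
    return sorted(
        survivors,
        key=lambda m: keyword_score(m, liked_keywords),
        reverse=True,
    )


# Hypothetical catalogue and user profile.
movies = [
    {"title": "Heist", "genres": {"action"}, "themes": {"heist"},
     "keywords": {"bank", "horde"}},
    {"title": "Undead Love", "genres": {"romance"}, "themes": {"zombies"},
     "keywords": {"horde", "wedding"}},
    {"title": "Zombie Treasure", "genres": {"action"}, "themes": {"zombies"},
     "keywords": {"treasure-hunt", "horde"}},
]
disliked_pairs = {("romance", "zombies")}
liked_keywords = {"treasure-hunt", "horde"}

ranked = recommend(movies, disliked_pairs, liked_keywords)
ranked_titles = [m["title"] for m in ranked]
```

The keyword features are only touched for movies that survive the coarse stage, so the more detailed and more expensive comparison runs on a smaller set.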
In other words, the idea we follow is something like ‘reject often, reject early’. With this approach, we can already filter out a large portion of movies at the first layer, for instance everything in ‘mystery’ and ‘romance’, then more at the next layer, and so forth. Of course, a clear cut is not always possible, but since some genres and/or themes are very distinctive, we have a good chance of rejecting a lot of movies very early. However, a major drawback is that we need a subset of these features to train the final model. To be more specific, to train a classifier on the genres, we need input features, and those have to come from one of the given layers, usually the keyword layer, because it carries the most information. It is a classic chicken-and-egg problem.