Without the utilization of the actual media data from movies, the task of recommendations is often still shadowed by manual feature engineering. This does not have to be a bad thing, but it is very time consuming and requires a lot of expert knowledge of the domain. And with hand-crafted features, even models with multiple layers are not guaranteed to learn high-level features that are powerful enough for the task at hand. The reason is that certain concepts cannot be learned if the basic features are not complete. Like the model of a house with features like walls, windows but no doors. With such a feature set it is not possible to learn the whole concept of a “house”. Not to forget that hand-crafted features might be also (highly) subjective which cannot guarantee that they work for the problem at hand.
Essentially, Deep Learning learns all the features it needs to understand a concept at a certain level, which depends on the actual loss function. For a task to differ between action movies, other features will be learned than to differ between action and horror movies. This is a huge advantage, because if the problem is simple, the features do not need to be highly sophisticated and thus, no time is wasted to possibly over-engineer features.
The concept of a movie can usually be broken into:
(1) top-level genres as a coarse fingerprint (“action”)
(2) sub-genres to give some additional hints (“action-thriller”)
(3) themes to provide a conceptual overview (“out-for-revenge”)
(4) and keywords for the actual story (“heist”)
This taxonomy loosely resembles the human brain, because it breaks up movies into basic building blocks. For instance, basic keywords like: “police”, “heist”, “diamond”, “casino” strongly indicates a top-level genre like “crime” maybe with a sub-genre like “crime-thriller” to indicate a combination of genres. With a theme like “one-last-heist”, we have a fairly good overview what the movie is about. From bottom to top, we continually abstract the details which allows us to describe movies at different levels, to make it easier to relate movies without directly using all low-level features.