Sometimes you see trees when there are no trees, and sometimes you don’t realize you see a tree because there are so many of them. Stated differently, the line between ‘we learned something’ and overfitting can be very thin. The good news is that we use a library with automatic differentiation, so we do not have to hunt for errors in numerical expressions, but we still have the problem that our data is very limited.
In other words, we are pretty sure that there are patterns in our data, but the amount of data is not always sufficient to reveal them. Of course, that is nothing new, and it is the reason why people try to regularize the hell out of their models. In the spirit of Occam’s razor, we favor simple models to explain the data, because such models are often more interpretable than complex ones. However, regardless of its complexity, a model has to disentangle the explanatory factors of the data to arrive at a useful feature representation. That is one reason why a linear classifier usually suffices when it is combined with a good set of features.
For some domains, like documents or images, unsupervised pre-training can easily utilize huge amounts of unlabeled data to learn something about the structure of the data. To transfer this approach to the movie domain, you need access to a huge movie database with unified metadata. There would still be issues with one-time keywords and sparsity, but access to a larger dataset would very likely improve our model.
What we are trying to say is that we may need to turn away from global models and focus on local models instead. With only a couple of dozen sci-fi horror movies, it seems unjustified to assume that a top-k approach for keywords would extract many useful patterns for such a small genre. We can improve the situation if we widen the local scope to all movies with a horror theme, but not to all genres in the data set. That works because horror movies usually share a lot of themes regardless of their sub-genres. For instance, zombies might occur in sci-fi horror, but also in classical horror movies.
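To make the scope argument a bit more concrete, here is a minimal sketch of top-k keyword selection restricted to a genre scope. Everything in it is made up for illustration: the toy `MOVIES` list, the `top_k_keywords` helper and the `min_count` threshold are assumptions, not part of our actual pipeline.

```python
from collections import Counter

# Hypothetical toy data: each movie has a set of genres and a set of keywords.
MOVIES = [
    {"title": "A", "genres": {"horror", "sci-fi"}, "keywords": {"zombie", "spaceship"}},
    {"title": "B", "genres": {"horror"}, "keywords": {"zombie", "graveyard"}},
    {"title": "C", "genres": {"horror"}, "keywords": {"zombie", "haunted-house"}},
    {"title": "D", "genres": {"sci-fi"}, "keywords": {"spaceship", "alien"}},
    {"title": "E", "genres": {"comedy"}, "keywords": {"wedding", "road-trip"}},
]


def top_k_keywords(movies, required_genres, k, min_count=2):
    """Return up to k keywords that occur in at least min_count movies of the scope.

    The scope contains every movie whose genres include all of required_genres.
    """
    scope = [m for m in movies if required_genres <= m["genres"]]
    counts = Counter(kw for m in scope for kw in m["keywords"])
    # Keywords seen fewer than min_count times are treated as noise and dropped.
    return [kw for kw, n in counts.most_common(k) if n >= min_count]


# Narrow scope: only movies that are both horror and sci-fi.
narrow = top_k_keywords(MOVIES, {"horror", "sci-fi"}, k=3)

# Wider local scope: every movie with a horror theme, whatever its sub-genre.
local = top_k_keywords(MOVIES, {"horror"}, k=3)

print("sci-fi horror only:", narrow)
print("all horror movies: ", local)
```

With this toy data, the narrow scope contains a single movie, so no keyword passes the threshold, while the wider horror scope has enough support to surface the zombie theme. Real data would of course need a larger `k` and a tuned threshold.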
The next step would be to combine the local models into a global one, but that is a science fiction issue we will address later.