The road to a good model can be a very long journey, especially with handcrafted features, because you never know whether your encoding is sufficient to achieve the goal, which is to describe your data efficiently. Recently, we decided to start over with very simple models to gather new insights and to come up with some new methods. Along the way we stumbled upon models based on entropy, and for movie features in particular, such a direction makes sense.
Let us start with an easy example: genres. If 45% of the movies are marked as ‘drama’, we could almost flip a coin to make a decision. Stated differently, the information content of the label is low, and a model would not gain much by optimizing its ability to reconstruct the label ‘drama’, because nearly half of the movies are ‘drama’ anyway. That means it is much more important to learn to correctly reconstruct genres that are rare but more descriptive, like sci-fi, western, or horror.
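To make the intuition concrete, one way to quantify how "surprising" a label is, is its self-information, -log2(p), where p is the label's frequency. The genre frequencies below are made-up illustrations, not real numbers from our data:

```python
import math

def self_information(p):
    """Information content (in bits) of observing a label with frequency p.
    Common labels carry little information; rare labels carry a lot."""
    return -math.log2(p)

# Hypothetical genre frequencies for illustration
frequencies = {"drama": 0.45, "horror": 0.08, "western": 0.02}
for genre, p in frequencies.items():
    print(f"{genre}: {self_information(p):.2f} bits")
# drama: 1.15 bits, horror: 3.64 bits, western: 5.64 bits
```

Observing ‘western’ on a movie tells us far more than observing ‘drama’, which is exactly why reconstruction errors on rare genres should hurt more.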
To incorporate such knowledge into a model, we need to penalize reconstruction errors on rarer genres more heavily than errors on common genres. Of course, we could also go a step further and use a similar scheme for the input layer, but in that case the binary feature weights would be adjusted according to their inverse frequencies.
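A minimal sketch of such a penalty, assuming a binary cross-entropy reconstruction loss with each genre's term scaled by its inverse frequency (the exact weighting scheme is our assumption here, not a fixed recipe):

```python
import math

def weighted_reconstruction_loss(targets, predictions, frequencies):
    """Binary cross-entropy where each genre's error is scaled by the
    inverse of that genre's frequency, so mistakes on rare genres
    (e.g. western) cost more than mistakes on common ones (e.g. drama)."""
    total = 0.0
    for t, p, f in zip(targets, predictions, frequencies):
        w = 1.0 / f  # inverse-frequency weight (assumed scheme)
        total += -w * (t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(targets)

# Same prediction error (0.1 instead of 1) on a rare vs. a common genre:
rare = weighted_reconstruction_loss([1], [0.1], [0.02])
common = weighted_reconstruction_loss([1], [0.1], [0.45])
print(rare > common)  # the rare-genre error is penalized more
```

In practice the raw inverse frequencies may need normalization or clipping so that extremely rare genres do not dominate the loss entirely.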