Regardless of what the ML community says, and despite the recent success of neural networks, training them efficiently and successfully sometimes still feels like black magic. Others might say that this black magic can be explained by heuristics people have learned over years of experience, plus a dash of higher mathematics, or stated differently, a deeper understanding of the theory. Without a doubt, the potential of neural networks is not yet depleted, but applying them to some problems can be challenging.
For instance, if we have a dataset that consists of thousands of features, i.e. dimensions, which are very sparse, an auto-encoder neural network has a very hard job reconstructing the original data from the hidden representation. Thus, lots of time and resources are needed before the training of a good model is done. After training, however, the inference, that is the transformation into the feature space, can be done in no time.
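To make this cost asymmetry concrete, here is a minimal sketch of a single-hidden-layer auto-encoder trained on toy sparse data. Everything here is our own assumption for illustration: the data is random, the sizes (1000 dimensions, 32 hidden units) are arbitrary, and the training loop is plain gradient descent with NumPy rather than any particular framework.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sparse data: 200 samples, 1000 dimensions, roughly 1% non-zero entries.
# (Illustrative only; real keyword vectors would come from the dataset.)
n, d, h = 200, 1000, 32
X = (rng.random((n, d)) < 0.01).astype(float)

# Encoder and decoder weight matrices, small random init.
W_e = rng.normal(0.0, 0.01, (d, h))
W_d = rng.normal(0.0, 0.01, (h, d))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lr = 0.5
losses = []
for epoch in range(50):
    # Forward pass: encode, then reconstruct -- this is the expensive part,
    # repeated over many epochs during training.
    H = sigmoid(X @ W_e)          # hidden representation, shape (n, h)
    X_hat = sigmoid(H @ W_d)      # reconstruction, shape (n, d)
    losses.append(float(np.mean((X_hat - X) ** 2)))

    # Backward pass for mean squared error through the sigmoids.
    G_out = (X_hat - X) * X_hat * (1 - X_hat)
    G_hid = (G_out @ W_d.T) * H * (1 - H)
    W_d -= lr * (H.T @ G_out) / n
    W_e -= lr * (X.T @ G_hid) / n

# After training, inference is a single cheap matrix product plus a sigmoid:
Z = sigmoid(X @ W_e)
```

The point of the sketch is the contrast: the loop above runs the full encode-decode-backprop cycle many times, while the final transformation into the feature space is one matrix multiplication.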
When we come back to the domain of movies, we face exactly the same problem. We have bag-of-words vectors consisting of descriptive keywords of movies, but these vectors are very sparse. This means a network architecture that includes a decoder is also very hard to train. A paper we recently found faced the same problem and circumvented it by designing a model that does not require a decoder at all.
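To illustrate why these keyword vectors are so sparse, here is a minimal bag-of-words encoding. The movies and keywords below are made up for the example; with a realistic vocabulary of thousands of keywords, each movie still only sets a handful of entries, so almost every component is zero.

```python
# Hypothetical keyword lists for a few movies (illustrative data only).
movies = {
    "movie_a": ["space", "alien", "rescue"],
    "movie_b": ["heist", "crime", "twist"],
    "movie_c": ["space", "crime", "robot"],
}

# Vocabulary over all keywords, with a fixed index per keyword.
vocab = sorted({kw for kws in movies.values() for kw in kws})
index = {kw: i for i, kw in enumerate(vocab)}

def bag_of_words(keywords):
    """Binary bag-of-words vector over the full vocabulary."""
    vec = [0] * len(vocab)
    for kw in keywords:
        vec[index[kw]] = 1
    return vec

vec_a = bag_of_words(movies["movie_a"])
sparsity = 1 - sum(vec_a) / len(vec_a)
```

Even in this tiny example only 3 of the entries are non-zero; scale the vocabulary to thousands of keywords and a decoder has to reproduce a vector that is overwhelmingly zeros from a dense hidden code, which is exactly what makes training so slow.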
The conclusion is not really new, but it at least confirms that traditional approaches might not be best suited for our kind of problem. And we might add that it is a real pity that we can directly reuse approaches neither from the NLP domain nor from computer vision, even though both have made a lot of progress recently.