In the last post, we explored an approach in which the movie features are only used to build an affinity matrix. In other words, we converted the dataset, with all its features, into a single matrix consisting of pairwise similarities between items. This resembles an SVM setup, where the Gram matrix consists of dot products of item pairs computed with an arbitrary kernel function.
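To make the analogy concrete, here is a minimal sketch of such a Gram-style affinity matrix. The kernel and the toy feature vectors are made up for illustration; any symmetric similarity function could be plugged in instead:

```python
import numpy as np

def gram_matrix(items, kernel):
    """Build a symmetric pairwise affinity (Gram) matrix
    by applying `kernel` to every pair of items."""
    n = len(items)
    K = np.zeros((n, n))
    for i in range(n):
        for j in range(i, n):
            K[i, j] = K[j, i] = kernel(items[i], items[j])
    return K

# SVM-style linear kernel (dot product) on toy feature vectors
X = [np.array([1.0, 0.0]), np.array([0.5, 0.5]), np.array([0.0, 1.0])]
K = gram_matrix(X, np.dot)
```

The point is that once `K` is built, downstream steps only ever look at pairwise similarities, never at the original features.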
The advantage of such an approach is that we do not have to worry about sparse features, as long as we can define a useful similarity function that measures whether two movies are related. The drawback, however, is that we end up with a very large N×N matrix, where N is the number of movies; in our case, N can range from 50K to 500K. Thus, even if the matrix is very sparse, the computational cost is not tractable for real-life problems.
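A quick back-of-the-envelope calculation shows why the dense case is hopeless (assuming 4-byte float32 entries; the dataset sizes are the ones quoted above):

```python
def dense_matrix_gib(n, bytes_per_entry=4):
    """Memory footprint of a dense n x n matrix, in GiB."""
    return n * n * bytes_per_entry / 2**30

small = dense_matrix_gib(50_000)    # roughly 9 GiB
large = dense_matrix_gib(500_000)   # roughly 930 GiB
```

Even the small end of the range barely fits in memory, and sparse storage only helps if the similarity function yields mostly zeros.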
Still, the idea of expressing movies purely through their relations and interactions with other movies, rather than as feature vectors, is very promising. For instance, the total number of actors in a movie database is very likely larger than 100K, yet any single movie features only a handful of them. This would be a problem if we used one-hot or binary encoding of actors, but not if we only consider pairwise similarities of movies.
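For example, with actors stored as ID sets, a Jaccard-style similarity compares two movies directly without ever materializing a 100K-dimensional one-hot vector. The actor IDs below are made up:

```python
# Actor IDs per movie (hypothetical); the actor universe may exceed 100K,
# but each movie only stores the handful of IDs it actually contains.
movie_a = {101, 502, 7, 33019}
movie_b = {7, 33019, 88000}

def set_similarity(a, b):
    """Jaccard similarity: shared actors over all actors in either movie."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

sim = set_similarity(movie_a, movie_b)  # 2 shared of 5 total -> 0.4
```

The cost per pair depends on the set sizes, not on the size of the actor vocabulary.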
Besides the fact that such an approach is transductive, and thus limited because we cannot classify new movies, we still have to turn these ideas into an algorithm that scales with the number of movies in the dataset and converts the pairwise constraints into a useful embedding. We have a couple of ideas that we are currently analyzing, but we need more time for evaluation.
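One standard way to turn a pairwise affinity matrix into an embedding, not necessarily the direction we will end up taking, is a spectral embedding (Laplacian eigenmaps). The affinity matrix below is a made-up toy with two tight clusters:

```python
import numpy as np

def spectral_embedding(K, dim=2):
    """Embed items from a symmetric affinity matrix K using the
    smallest non-trivial eigenvectors of the normalized graph Laplacian."""
    d = K.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))
    # symmetric normalized Laplacian: I - D^{-1/2} K D^{-1/2}
    L = np.eye(len(K)) - d_inv_sqrt[:, None] * K * d_inv_sqrt[None, :]
    eigvals, eigvecs = np.linalg.eigh(L)  # eigenvalues in ascending order
    # skip the trivial (near-constant) eigenvector, keep the next `dim`
    return eigvecs[:, 1:dim + 1]

# toy affinity matrix: items 0,1 and items 2,3 form two clusters
K = np.array([[1.0, 0.9, 0.1, 0.0],
              [0.9, 1.0, 0.0, 0.1],
              [0.1, 0.0, 1.0, 0.9],
              [0.0, 0.1, 0.9, 1.0]])
emb = spectral_embedding(K, dim=1)
```

Note that this inherits the transductive limitation discussed above: a new movie requires recomputing the eigendecomposition, which is exactly why scalability remains the open question.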