In the last post, we introduced a model that relates pairs and even triplets of keywords to capture concepts present in movies. In this post, we want to analyze practical aspects of such a model. Because such models are usually learned with gradient descent and a simple ranking objective, we have to decide what "similar" means in terms of sampling triplets of q=reference, p=positive, n=negative. The idea is to sample a reference movie (q), a similar one (p), and a dissimilar one (n). Then, we minimize the following function: loss = max(0, 1 - f(q, p) + f(q, n)). Since we use a polynomial model of degree three, the model parameters are θ=[U, V, Y], where each parameter is a matrix of size |words| x N.
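As a sketch, the triplet ranking loss above can be written in a few lines of numpy. The dot-product score function and the toy feature vectors here are only stand-ins for the full polynomial model, chosen so the example is self-contained:

```python
# Sketch of the margin ranking loss: loss = max(0, 1 - f(q, p) + f(q, n)).
# The dot-product f and the hand-picked vectors are illustrative stand-ins.
import numpy as np

def triplet_loss(f, q, p, n, margin=1.0):
    # zero loss once the positive pair outscores the negative by the margin
    return max(0.0, margin - f(q, p) + f(q, n))

f = lambda a, b: float(np.dot(a, b))    # stand-in score function

q = np.array([1.0, 0.0, 1.0])           # reference movie
p = np.array([1.0, 0.0, 0.9])           # similar movie (positive)
n = np.array([0.0, 1.0, 0.0])           # dissimilar movie (negative)

loss = triplet_loss(f, q, p, n)
# here f(q, p) = 1.9 and f(q, n) = 0.0, so the margin is already
# satisfied and the loss is zero
```

During training, gradients of this loss with respect to the model parameters push similar pairs above dissimilar ones by at least the margin.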
In our case, we say two movies are similar if they have the same genre, and dissimilar if they have different genres and no "considerable" overlap of keywords. Furthermore, we avoid major genres like action/thriller/drama and focus on the minor genres, because their concepts are easier to learn. At test time, the score of a pair of movies (q, x) is determined as follows:
s = sum(q*x) + sum(u*v) + sum(u*v*y)
u = U'q, v = V'x, y = Y'x
Or stated differently, the score is the sum of the dot product of the raw features (linear), the dot product of the projected features (quadratic), and a product of experts (cubic), where each expert focuses on a different aspect.
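In numpy, the three-term score might be sketched as follows. The vocabulary size, the rank N, and the randomly initialized parameters are purely illustrative, not learned values:

```python
# A minimal sketch of the degree-three score; sizes and random
# parameters are illustrative placeholders, not a trained model.
import numpy as np

rng = np.random.default_rng(0)
W, N = 50, 8                            # |words| and projection rank
U, V, Y = (rng.normal(size=(W, N)) for _ in range(3))

def score(q, x):
    # u comes from the query; v and y come from the candidate
    u, v, y = q @ U, x @ V, x @ Y
    # linear + quadratic + cubic (product of experts) terms
    return q @ x + u @ v + (u * v * y).sum()

q = rng.random(W)                       # keyword vector of the reference movie
x = rng.random(W)                       # keyword vector of a candidate movie
s = score(q, x)
```

Note the asymmetry: only u is derived from the query, while both v and y are projections of the candidate, which is what makes the candidate-side caching discussed next possible.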
To minimize the required computational time, one can cache the projected values u, v, y whenever a query is a full movie from the data set. However, this is a trade-off between time and space, because the caching requires holding all projections of the movies in memory.
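A minimal sketch of this caching, under the same illustrative sizes and random parameters as above: the candidate-side projections v and y are computed once for all movies, so scoring a query against the whole catalog reduces to a handful of matrix products.

```python
# Sketch of caching v = V'x and y = Y'x for all movies up front;
# sizes and random data are illustrative placeholders.
import numpy as np

rng = np.random.default_rng(1)
W, N, M = 50, 8, 100                    # |words|, rank, number of movies
U, V, Y = (rng.normal(size=(W, N)) for _ in range(3))
movies = rng.random((M, W))             # one keyword vector per row

# Precompute once: O(M*N) extra memory per matrix buys back two
# matrix-vector products per (query, movie) pair at test time.
V_cache = movies @ V                    # (M, N)
Y_cache = movies @ Y                    # (M, N)

def scores_for_query(q):
    """Score the query against all cached movies at once."""
    u = q @ U                           # query-side projection, (N,)
    linear = movies @ q                 # sum(q*x) per movie
    quad = V_cache @ u                  # sum(u*v) per movie
    cubic = (V_cache * Y_cache) @ u     # sum(u*v*y) per movie
    return linear + quad + cubic

q = rng.random(W)
s = scores_for_query(q)                 # (M,) scores; highest = most related
```

If memory is tight, one could cache only the element-wise product v*y per movie, halving the footprint at the cost of losing the separate quadratic term's reuse.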
One possible scenario for such a model is finding related movies that will be aired in the next, say, 24 hours. For instance, if a user is interested in sci-fi/horror movies and Doom will be shown today, it might be beneficial to suggest similar movies, such as Resident Evil and other movies containing the concepts "zombies" and "experiment-gone-awry", provided they also fall into the same time slot.