Higher-Order Features contd.

In the last post, we introduced a model that relates pairs and even triplets of keywords to capture concepts that are present in movies. In this post, we want to analyze practical aspects of such a model. Because such models are usually learned with gradient descent and a simple ranking objective, we have to decide what “similar” means when sampling triplets (q, p, n), where q is the reference, p the positive and n the negative example. The idea is to sample a reference movie (q), a similar movie (p) and a dissimilar one (n). Then, we minimize the following function:
loss = maximum(0, 1 - f(q, p) + f(q, n))
where f is the score the model assigns to a pair of movies. Since we use a polynomial model of degree three, the model parameters are θ=[U,V,Y], where each parameter is a matrix of size |words| x N.
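As a rough illustration of the sampling and the ranking objective, here is a minimal sketch in plain Python. The names hinge_ranking_loss and sample_triplet, the margin argument and the two predicates is_similar and is_dissimilar are assumptions made for this example; the predicates stand in for whatever notion of similarity is chosen.

```python
import random


def hinge_ranking_loss(score_pos, score_neg, margin=1.0):
    """Triplet ranking loss: max(0, margin - f(q, p) + f(q, n))."""
    return max(0.0, margin - score_pos + score_neg)


def sample_triplet(movies, is_similar, is_dissimilar, rng=random, max_tries=100):
    """Draw a (q, p, n) triplet, or return None if none is found.

    `is_similar` and `is_dissimilar` are hypothetical callables that take
    two movies and return a bool; they are placeholders, not part of the
    original model description.
    """
    for _ in range(max_tries):
        q = rng.choice(movies)
        positives = [m for m in movies if m is not q and is_similar(q, m)]
        negatives = [m for m in movies if m is not q and is_dissimilar(q, m)]
        if positives and negatives:
            return q, rng.choice(positives), rng.choice(negatives)
    return None
```

The loss is zero as soon as the positive pair scores at least one margin higher than the negative pair, so only triplets that violate the margin contribute a gradient.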

In our case, we say two movies are similar if they share the same genre, and dissimilar if they have different genres and no “considerable” overlap of keywords. Furthermore, we avoid major genres like action/thriller/drama and focus on the minor genres because their concepts are easier to learn. At test time, the score of a pair of movies (q, x) is determined as follows:
s = sum(q*x) + sum(u*v) + sum(u*v*y)
with
u=U*q, v=V*x, y=Y*x

Stated differently, the score is the sum of three terms: the dot product of the raw features (linear), the dot product of the first-order features u and v (quadratic), and a product of experts u*v*y (cubic), where each expert is focused on a different aspect.
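The three terms can be written out as a small sketch in plain Python. It assumes that q and x are keyword vectors of length |words| and that U, V and Y are stored as |words| x N nested lists, so that the projection is the transposed product; the helper names dot, project and score are made up for this example.

```python
def dot(a, b):
    """Dot product of two equally long vectors."""
    return sum(ai * bi for ai, bi in zip(a, b))


def project(M, x):
    """Project a |words|-dim vector x to N dims with a |words| x N matrix M."""
    n_factors = len(M[0])
    return [sum(M[w][k] * x[w] for w in range(len(x))) for k in range(n_factors)]


def score(q, x, U, V, Y):
    """s = sum(q*x) + sum(u*v) + sum(u*v*y) with u=U*q, v=V*x, y=Y*x."""
    u, v, y = project(U, q), project(V, x), project(Y, x)
    linear = dot(q, x)                  # raw keyword overlap
    quadratic = dot(u, v)               # pairwise keyword interactions
    cubic = sum(ui * vi * yi for ui, vi, yi in zip(u, v, y))  # triplet term
    return linear + quadratic + cubic
```

Note that only u depends on the query, while v and y depend on the candidate movie x.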

To reduce the computation time, one can cache the projected values u, v and y whenever a query is a full movie from the data set. However, this is a trade-off between time and space, because caching requires holding the projections of all movies in memory.
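One way such a cache could look is sketched below in plain Python. The class ProjectionCache, the function cached_score and the use of a movie id as the cache key are assumptions for illustration; the matrices are again taken to be |words| x N nested lists.

```python
def project(M, x):
    """Project a |words|-dim vector x to N dims with a |words| x N matrix M."""
    n_factors = len(M[0])
    return [sum(M[w][k] * x[w] for w in range(len(x))) for k in range(n_factors)]


class ProjectionCache:
    """Keeps (u, v, y) per movie id; memory grows with the number of movies."""

    def __init__(self, U, V, Y):
        self.U, self.V, self.Y = U, V, Y
        self._cache = {}

    def get(self, movie_id, features):
        # Compute the three projections only on the first request.
        if movie_id not in self._cache:
            self._cache[movie_id] = (
                project(self.U, features),
                project(self.V, features),
                project(self.Y, features),
            )
        return self._cache[movie_id]


def cached_score(cache, q_id, q, x_id, x):
    """Score the pair (q, x), reusing cached projections where available."""
    u, _, _ = cache.get(q_id, q)   # the query only needs u
    _, v, y = cache.get(x_id, x)   # the candidate only needs v and y
    linear = sum(qi * xi for qi, xi in zip(q, x))
    quadratic = sum(ui * vi for ui, vi in zip(u, v))
    cubic = sum(ui * vi * yi for ui, vi, yi in zip(u, v, y))
    return linear + quadratic + cubic
```

The cache stores 3 x N numbers per movie, which is the space cost mentioned above; storing only v and y for candidates and u for queries would reduce it if the two roles never overlap.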

One possible scenario for such a model is to find related movies that will be aired in the next, say, 24 hours. For instance, if a user is interested in sci-fi/horror movies and Doom will be shown today, it might be beneficial to suggest similar movies, like Resident Evil and other movies that contain the concepts “zombies” and “experiment-gone-awry”, if they also fall in the same time slot.
