Right now, it feels like Easter since we have to look in many corners to find possible Easter eggs. More precisely, we are currently designing a cost function that matches our data. Before we started, we studied the literature to avoid the most obvious pitfalls and to gather some best practices.
After we came up a first draft, we created a 2D plot of the function, to learn something about possible plateaus and to get a better feeling what it looks like. We got a lot of good ideas from early papers that used Siamese networks and especially the idea to divide the cost function into a “positive” and “negative” part caught our attention.
Here is an example: cost(X1, X2, L), where X is a data sample and L the label and it is 0 if both X belong to the same category and 1 otherwise. Then we can write the function as: cost = (1-Y)*costP(X1, X2) + Y*costN(X1, X2). The idea is that the function costP is monotonically increasing, while costN is decreasing. In one paper the costP function, based on the distance between X1 and X1, was ‘distance**2’ and the other one was exponential ‘e**-distance’, plus additional factors that we omitted for brevity. The idea behind it is obvious: For equal labels, larger distances will be penalized while for negative examples proximity will be penalized. Such a cost function moves pairs of items from the same category closer together while it pulls apart pairs from different categories.
With the help of Theano, a working prototype was up and running in no time. We spent some time with parameter selection and then we trained some models to study the results. To get a better understanding, we embedded the concept space into the two dimensional space which allowed us plot the distribution of the movies. We did not expect a clear separation of the data, since concepts of movies can hardly be expressed as a set of genres. And indeed, what we got was something like a star. A lot of movies were concentrated in the middle, but there were also lots of smaller clusters in the extensions of the star. Not bad for a first try, but it is clear that we need a more sophisticated approach to derive a label to tell the system if a pair of movies are “similar” or not.