Learn From a Child

Our problem is that we have to build a reliable estimator from very few examples. Children manage this easily: a child does not need to look at thousands of cats before forming a concept of a cat that can be used to recognize new ones. We may still confuse cats with tigers, but that is okay as long as we don’t confuse cats with chairs.

For movies, we want to decide whether a pair of movies belongs to a similar concept. A very simple example, with each movie described by its set of keywords:

a = {police, heist}
b = {police-officer, robbery}

Without additional knowledge, a model that compares keywords literally would return ‘False’, because the two sets share no keyword. A human would have no trouble inferring the relation between the words and would return ‘True’. With additional knowledge, such as a keyword correlation matrix, our model could also return ‘True’, provided the existing data is sufficient to estimate those correlations.
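To make the difference concrete, here is a minimal sketch of both behaviours. The correlation scores, the threshold and the function names are assumptions made up for illustration; they are not taken from our system, and a real correlation matrix would have to be estimated from data.

```python
a = {"police", "heist"}
b = {"police-officer", "robbery"}


def exact_match(x, y):
    """Literal comparison: True only if the two sets share a keyword."""
    return len(x & y) > 0


# Hypothetical correlation scores, invented for illustration only.
correlation = {
    frozenset({"police", "police-officer"}): 0.9,
    frozenset({"heist", "robbery"}): 0.8,
}


def correlated_match(x, y, corr, threshold=0.5):
    """True if every keyword in x has an identical or sufficiently
    correlated partner in y (y is assumed to be non-empty)."""
    def score(u, v):
        return 1.0 if u == v else corr.get(frozenset({u, v}), 0.0)

    return all(max(score(u, v) for v in y) >= threshold for u in x)


print(exact_match(a, b))                    # False
print(correlated_match(a, b, correlation))  # True
```

Pairs that are missing from the lookup get a score of zero, which is exactly what happens to rare keywords when the data is too sparse to estimate a correlation.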

The situation is further complicated by the fact that we do not have access to a whole movie database, and that some terms would have a very low frequency even if we did.

With limited data, it is possible that the keyword ‘police-officer’ is present in only a few movies, or even just one. In this case, we can hardly speak of a correlation. Furthermore, this is likely to happen for a lot of keywords, so we end up with clusters of “one-time” words that are closely related but each have a very low frequency, as in this example: police-{detective,corruption,officer,station,negotiator}.
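A rough way to check how many keywords are affected is to count, for each keyword, the number of movies it appears in. The sketch below does this on a handful of invented movies; the data and the variable names are assumptions for illustration and do not come from our database.

```python
from collections import Counter, defaultdict

# Hypothetical keyword sets for four movies, invented for illustration.
movies = [
    {"police", "police-detective", "heist"},
    {"police", "police-corruption"},
    {"police", "police-officer", "robbery"},
    {"police-station", "heist"},
]

# Frequency = number of movies that contain the keyword.
freq = Counter(kw for keywords in movies for kw in keywords)

# Keywords that appear in exactly one movie, and their share of all keywords.
one_time = sorted(kw for kw, n in freq.items() if n == 1)
share = len(one_time) / len(freq)

# Group the one-time keywords by the part before the first hyphen.
by_prefix = defaultdict(list)
for kw in one_time:
    prefix, sep, rest = kw.partition("-")
    if sep:  # keywords without a hyphen have no prefix to group by
        by_prefix[prefix].append(rest)

print(one_time)
print(round(share, 2))
print(dict(by_prefix))
```

Grouping by prefix only recovers clusters whose members happen to share one; a one-time keyword like ‘robbery’ stays isolated.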

In fact, in our online system, about one third of the keywords have a frequency of exactly one(!). Without a common prefix such as ‘police-’, relating those words is not straightforward. That is why we are working on a scheme to embed all keywords in a semantic space in which we can measure the ‘similarity’ of keywords.
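How the embedding is learned is left open here. Assuming each keyword ends up as a vector, one common choice for measuring ‘similarity’ is the cosine of the angle between two vectors. The sketch below uses made-up three-dimensional vectors purely for illustration; both the vectors and the choice of cosine similarity are assumptions, not a description of our scheme.

```python
import math

# Hypothetical keyword vectors, invented for illustration only.
embedding = {
    "police-officer": [0.9, 0.1, 0.0],
    "police-detective": [0.8, 0.2, 0.1],
    "chair": [0.0, 0.1, 0.9],
}


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


close = cosine_similarity(embedding["police-officer"], embedding["police-detective"])
far = cosine_similarity(embedding["police-officer"], embedding["chair"])

print(close > far)  # True
```

The point of such a space is that two keywords can be close even if each occurs in only one movie and they share no prefix.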

