Why Concepts Help

It cannot be stressed enough that a domain like movies is difficult to describe with unified concepts. Compare this with images: the content of a picture may be ambiguous, but if we see a cat, it is silly to claim that the cat is a car. In other words, for domains like text or images, we at least have well-defined input values for the feature extraction. The task is nevertheless anything but trivial because of variations in images (pose, lighting, quality) and the complexity of words as such.
For example, the digit “6”, encoded as raw pixels, has only one meaning, similar to a text that is about flowers. Many variations of a “6” or of the topic of flowers are possible, but the meaning is still well-defined.

Now, we consider a movie like “Resident Evil”. Without a doubt, there is also a well-defined meaning, but without a proper vocabulary, it is much harder to describe. Let us assume that people have to summarize the story with keywords. How likely is it that all summaries are identical or even close? Most likely, different people focus on different aspects, and thus the results will vary a lot. This is a major challenge. Why? Because without a taxonomy of descriptive words, different people would annotate movies with different meta keywords. Sometimes people cannot even agree on the genre of a movie.

With high-level concepts, we can at least unify the process to improve the situation a little. It is still a challenge to come up with a good taxonomy but in our opinion, it is worth the time. So, what are some descriptive concepts for our example movie?
– zombies, virus, mutants, heroine, combats
– ai/computer, corporation, bio-weapon

The list is neither complete nor accurate, but if somebody reads it, she will have a good grasp of the movie content. And even with these few keywords, we can model some relations: for instance, in a lot of movies, zombies are a consequence of a virus and so are mutants. Thus, we could infer the rule virus -> {zombies, mutants}.
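To make the idea of such a rule more concrete, here is a minimal sketch that derives candidate rules from plain co-occurrence counts. The movie names, the keyword sets and the threshold are made up for illustration, and co-occurrence confidence is only a stand-in that we picked for this example: it says nothing about cause and effect, so the direction of a rule still needs human judgment.

```python
from collections import Counter, defaultdict
from itertools import permutations

# Hypothetical toy annotations, invented for this example.
movies = {
    "movie_a": {"virus", "zombies", "mutants", "heroine"},
    "movie_b": {"virus", "zombies", "corporation"},
    "movie_c": {"virus", "mutants"},
    "movie_d": {"heroine", "corporation"},
}

def rule_confidence(movies):
    """Return confidence(a -> b) = #movies with a and b / #movies with a."""
    single = Counter()
    pair = Counter()
    for concepts in movies.values():
        single.update(concepts)
        # Ordered pairs, so (a, b) and (b, a) are counted separately.
        pair.update(permutations(sorted(concepts), 2))
    return {(a, b): count / single[a] for (a, b), count in pair.items()}

def rules_from(confidence, threshold=0.6):
    """Keep only pairs whose confidence reaches the (assumed) threshold."""
    rules = defaultdict(set)
    for (a, b), value in confidence.items():
        if value >= threshold:
            rules[a].add(b)
    return dict(rules)

confidence = rule_confidence(movies)
rules = rules_from(confidence)
print(rules["virus"])
```

On this toy data the reverse rules (zombies -> virus, mutants -> virus) score even higher, which shows why counts alone cannot tell us what is a consequence of what.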

In other words, dozens of zombie topics are possible, like zombie-war, zombie-apocalypse, zombie-animal, zombie-plague, … but all of them share the root concept “zombies”. Thus, we could start by grouping together all movies that have something to do with zombies, and then encode more details at the next level, the level after that, and so forth.
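The first grouping step could be sketched as follows. We assume a hand-written parent table as the taxonomy; the table, the function names and the movie annotations are hypothetical and only serve to illustrate the idea of walking from a detailed keyword up to its root concept.

```python
from collections import defaultdict

# Hypothetical taxonomy: each keyword points to its parent concept,
# and root concepts point to None.
PARENT = {
    "zombie-war": "zombies",
    "zombie-apocalypse": "zombies",
    "zombie-animal": "zombies",
    "zombie-plague": "zombies",
    "zombies": None,
    "robots": None,
}

def root_of(concept):
    """Follow parent links until a root (or an unknown keyword) is reached."""
    while PARENT.get(concept) is not None:
        concept = PARENT[concept]
    return concept

def group_by_root(movies):
    """Map each root concept to the set of movies annotated below it."""
    groups = defaultdict(set)
    for title, keywords in movies.items():
        for keyword in keywords:
            groups[root_of(keyword)].add(title)
    return dict(groups)

# Made-up annotations for illustration.
movies = {
    "movie_a": {"zombie-war"},
    "movie_b": {"zombie-plague", "robots"},
    "movie_c": {"robots"},
}
groups = group_by_root(movies)
print(groups["zombies"])
```

Deeper levels of the hierarchy only need additional entries in the parent table; the lookup itself stays the same.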

With concepts, it is much easier to classify movies with a theme, like ‘zombies’, ‘robots’ or ‘alien-invasion’. This is what some folks try with keywords, but often the shallow encoding does not work very well. Here is an example: “wedding-bells”, “wedding-plans” and “wedding-guest”. All three keywords clearly belong to the wedding theme, but this cannot be directly inferred from a flat encoding, because each keyword is treated as an unrelated symbol.
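A small sketch shows the problem. With exact keyword matching the three wedding keywords have nothing in common, while a crude theme extraction recovers the shared root. Taking the token before the first hyphen as the theme is purely an assumption for this example; it would already fail for a keyword like ‘alien-invasion’, which is one reason a curated taxonomy is preferable.

```python
def theme_of(keyword):
    """Naive heuristic (assumption): the theme is the token before the first hyphen."""
    return keyword.split("-")[0]

# Three hypothetical movies, each annotated with one of the keywords from the text.
movie_a = {"wedding-bells"}
movie_b = {"wedding-plans"}
movie_c = {"wedding-guest"}

# Flat encoding: keywords are compared as opaque strings.
shared_keywords = movie_a & movie_b & movie_c

# Concept encoding: keywords are first mapped to their theme.
shared_themes = (
    {theme_of(k) for k in movie_a}
    & {theme_of(k) for k in movie_b}
    & {theme_of(k) for k in movie_c}
)

print(shared_keywords)  # empty set: no overlap at the keyword level
print(shared_themes)    # the common theme survives
```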

For that reason we decided to work on a model that incorporates this hierarchical knowledge of keywords into the process to create a proper concept space. The idea is to move pairs of movies closer together if the major themes are very similar even if the details might differ.
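The intuition of "closer together" can be illustrated with a simple hand-crafted score. This is not the model itself, only a sketch under assumptions: we blend the Jaccard overlap of themes with the Jaccard overlap of detailed keywords, the weight of 0.7 is arbitrary, and the theme is again naively taken as the token before the first hyphen.

```python
def jaccard(a, b):
    """Jaccard overlap of two sets; defined as 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def theme_of(keyword):
    """Naive heuristic (assumption): the token before the first hyphen."""
    return keyword.split("-")[0]

def concept_similarity(keywords_a, keywords_b, theme_weight=0.7):
    """Blend theme overlap and keyword overlap; theme_weight is an arbitrary choice."""
    themes_a = {theme_of(k) for k in keywords_a}
    themes_b = {theme_of(k) for k in keywords_b}
    theme_score = jaccard(themes_a, themes_b)
    detail_score = jaccard(keywords_a, keywords_b)
    return theme_weight * theme_score + (1.0 - theme_weight) * detail_score

# Made-up annotations: a and b share the zombie theme but differ in details,
# c shares only one keyword with a.
movie_a = {"zombie-war", "virus"}
movie_b = {"zombie-plague", "virus"}
movie_c = {"wedding-plans", "virus"}

sim_ab = concept_similarity(movie_a, movie_b)
sim_ac = concept_similarity(movie_a, movie_c)
print(sim_ab, sim_ac)
```

With pure keyword overlap, both pairs would look equally similar because each shares exactly one keyword; the theme term is what pulls the two zombie movies together.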

