Tags: Revisited.

For quite some time now, we have been using our personalized EPG recommender to decide which movies or series to record on a given day. The results are pretty good, especially when the depth of a story can be sufficiently captured by the few plot keywords we use as metadata. However, sometimes we would like to group movies under a single label. This is known as tagging, and examples of tags are ‘dystopia’ or ‘sword-and-sandal’. Of course, the power of movie tags in a single-user environment seems limited, because tagging usually thrives on numbers: thousands of users tag items and thereby, in effect, collaborate to handle the large amount of data. Nevertheless, tagging still makes sense if we can use it to reveal hidden patterns in movies. For instance, all dystopian movies should have something in common, an essence that captures the high-level concept; stated differently, a few features are highly relevant while most other features contribute little to the result. If we consider pairs or even triplets of features, the expressive power is likely to be even higher.
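To make the pairs-and-triplets idea a bit more concrete, here is a minimal sketch that expands a movie's keyword set into all conjunctions of up to `max_order` keywords. The function name `expand_features` and the keywords are hypothetical; each resulting tuple could then serve as one binary feature for whatever tag model is used.

```python
from itertools import combinations


def expand_features(keywords, max_order=3):
    """Return single keywords plus all pair (and, by default, triplet) conjunctions.

    Hypothetical helper: keywords are sorted first, so every conjunction is a
    canonical sorted tuple and the same combination always maps to the same feature.
    """
    kws = sorted(keywords)
    feats = []
    for k in range(1, max_order + 1):
        feats.extend(combinations(kws, k))
    return feats


print(expand_features({"surveillance", "rebellion", "future"}, max_order=2))
# [('future',), ('rebellion',), ('surveillance',), ('future', 'rebellion'),
#  ('future', 'surveillance'), ('rebellion', 'surveillance')]
```

The feature space grows combinatorially: a movie with k keywords yields k + C(k,2) + C(k,3) features, so a sparsity-inducing penalty or a frequency cut-off becomes even more important once conjunctions are added.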

So, how can we decide whether a movie is a dystopia or not? The most basic idea is to use a classifier, perhaps with an L1 penalty, to obtain a sparse weight vector that may reveal which features are relevant. If we have enough movies with the same tag, we could also try a factor model to split the movies into latent topics. We could then identify the relevant topic neurons to build a weight vector similar to the one from the classifier approach. But regardless of the chosen model, the result is always the same: we get a predictor, possibly with a confidence, that can be used to decide whether a movie matches a tag. A concrete use case is that tags for unseen movies are inferred automatically, or, when we search for ‘dystopia’, the prediction is made on demand.
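A minimal sketch of the L1-penalized classifier idea, assuming each movie is represented by its set of plot keywords (binary features) and labeled 1 if it carries the tag. The model is a logistic regression trained with plain proximal gradient descent (ISTA); the toy movies, the function names `train_l1_logreg` and `tag_probability`, and the hyperparameters `lam`, `lr` and `iters` are hypothetical illustrations, not the recommender's actual code.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(x, t):
    """Proximal operator of t*|x|: shrink x toward zero by t, clipping at exactly zero."""
    if x > t:
        return x - t
    if x < -t:
        return x + t
    return 0.0


def train_l1_logreg(movies, labels, lam=0.15, lr=0.5, iters=5000):
    """L1-penalized logistic regression via proximal gradient descent (ISTA).

    movies: list of keyword sets; labels: 1 if the movie has the tag, else 0.
    Only the keyword weights are penalized, the bias is not.
    """
    vocab = sorted(set().union(*movies))
    index = {kw: j for j, kw in enumerate(vocab)}
    rows = [[index[kw] for kw in m] for m in movies]  # sparse binary feature rows
    n = len(movies)
    w = [0.0] * len(vocab)
    b = 0.0
    for _ in range(iters):
        grad_w = [0.0] * len(vocab)
        grad_b = 0.0
        for cols, y in zip(rows, labels):
            r = sigmoid(b + sum(w[j] for j in cols)) - y  # residual p - y
            grad_b += r
            for j in cols:
                grad_w[j] += r
        # Gradient step on the mean log-loss, then the L1 proximal step on w only.
        w = [soft_threshold(w[j] - lr * grad_w[j] / n, lr * lam) for j in range(len(vocab))]
        b -= lr * grad_b / n
    return dict(zip(vocab, w)), b


def tag_probability(keywords, weights, bias):
    """Probability that a (possibly unseen) movie matches the tag; unknown keywords are ignored."""
    return sigmoid(bias + sum(weights.get(kw, 0.0) for kw in keywords))


# Hypothetical toy data: keyword sets with 'dystopia' labels, for illustration only.
movies = [
    {"totalitarian", "surveillance", "rebellion", "future"},
    {"totalitarian", "surveillance", "resistance"},
    {"surveillance", "rebellion", "future", "clone"},
    {"totalitarian", "rebellion", "wasteland"},
    {"romance", "wedding", "future"},
    {"romance", "family", "comedy"},
    {"heist", "family", "comedy"},
    {"romance", "comedy", "road-trip"},
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

weights, bias = train_l1_logreg(movies, labels)
print({kw: round(v, 3) for kw, v in weights.items() if v != 0.0})
print(tag_probability({"totalitarian", "surveillance", "rebellion"}, weights, bias))
```

The soft-thresholding step is what produces exact zeros: a keyword whose gradient never exceeds `lam` in magnitude keeps a weight of exactly 0, so the surviving non-zero weights are the candidates for the "essence" of the tag. In this toy set, keywords that occur in only one movie can never cross that threshold, and a larger `lam` generally yields an even sparser vector.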

In a nutshell, tags are remarkably versatile. We can use them to define our own concepts, to group movies, or to find similar ones. Such versatility is only one reason why tags are so valuable and common these days.


