We are still busy growing our training set and completing our lightweight ontology for feature words. In the meantime, we thought it would be a good idea to make our data more descriptive by generating useful tags.
We are using a supervised scheme, because the size of our training set does not guarantee that we really learn useful patterns otherwise. As a very simple example, we tried to train a model that predicts whether a movie has a strong ‘zombie’ theme or not. The assumption is that movies with a specific theme have a characteristic distribution of keywords.
The network architecture for predicting a tag is very simple. The keywords are binarized and serve as the input to the network. We use a single hidden layer of ReLU neurons, followed by a sigmoid layer with a single output neuron that predicts the probability of the theme. Every movie where the theme is explicitly present in the metadata is labeled ‘1’, and a random set of other movies is labeled ‘0’. We train with AdaGrad, because keyword frequencies differ widely and AdaGrad adapts the step size per weight, in combination with L2/L1 weight decay. For a simple theme like ‘zombies’, the model works very well and separates movies into positive and negative sets almost perfectly.
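To make the setup concrete, here is a minimal NumPy sketch of such a tag model. It is not our actual implementation: the hidden layer size, learning rate, decay strengths, full-batch training, and all function names (`sample_training_set`, `train_tag_model`, `predict_proba`) are assumptions chosen for illustration.

```python
import numpy as np

def sample_training_set(K, has_theme, n_negatives, rng):
    """Positives: movies annotated with the theme. Negatives: random other movies."""
    has_theme = np.asarray(has_theme, dtype=bool)
    pos = np.flatnonzero(has_theme)
    others = np.flatnonzero(~has_theme)
    neg = rng.choice(others, size=min(n_negatives, others.size), replace=False)
    idx = np.concatenate([pos, neg])
    y = np.concatenate([np.ones(pos.size), np.zeros(neg.size)])
    return K[idx], y

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(params, X):
    h = np.maximum(0.0, X @ params["W1"] + params["b1"])   # ReLU hidden layer
    p = sigmoid(h @ params["W2"] + params["b2"]).ravel()   # single sigmoid output
    return h, p

def predict_proba(params, X):
    """Probability that each movie (row of binarized keywords) carries the tag."""
    return forward(params, X)[1]

def train_tag_model(X, y, n_hidden=16, lr=0.1, l2=1e-4, l1=0.0,
                    epochs=200, seed=0):
    # All hyperparameter defaults are illustrative assumptions.
    rng = np.random.default_rng(seed)
    params = {
        "W1": rng.normal(0.0, 0.1, size=(X.shape[1], n_hidden)),
        "b1": np.zeros(n_hidden),
        "W2": rng.normal(0.0, 0.1, size=(n_hidden, 1)),
        "b2": np.zeros(1),
    }
    # AdaGrad keeps a running sum of squared gradients per parameter, so
    # weights of rare keywords keep a comparatively large step size.
    accum = {name: np.zeros_like(value) for name, value in params.items()}
    n = X.shape[0]
    for _ in range(epochs):
        h, p = forward(params, X)
        # Gradient of the mean binary cross-entropy w.r.t. the output logit.
        d_out = ((p - y) / n)[:, None]
        d_h = (d_out @ params["W2"].T) * (h > 0)
        grads = {
            "W2": h.T @ d_out,
            "b2": d_out.sum(axis=0),
            "W1": X.T @ d_h,
            "b1": d_h.sum(axis=0),
        }
        for name in ("W1", "W2"):  # L2/L1 weight decay on weights only
            grads[name] = grads[name] + l2 * params[name] + l1 * np.sign(params[name])
        for name in params:
            accum[name] += grads[name] ** 2
            params[name] -= lr * grads[name] / (np.sqrt(accum[name]) + 1e-8)
    return params
```

Given a binarized keyword matrix `K` (one row per movie) and a boolean vector `has_theme` taken from the metadata, one would call `sample_training_set`, then `train_tag_model`, and finally `predict_proba` on all movies to obtain a confidence per movie.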
The advantage of such a model is that all movies with a specific combination of keywords will be marked as ‘zombie’, not only the ones that were (manually) annotated with it. Furthermore, because the output of the network can be treated as a confidence, the model can be combined with other models as a building block to span a feature space that consists of pre-defined topics. Finally, we can use multiple themes to learn a single tag (e.g., ‘mummies’, ‘werewolves’, ‘vampires’ -> ‘supernatural’).
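Both ideas can be sketched in a few lines of NumPy. The helper names `merge_themes` and `topic_features` are hypothetical, and each tag model is assumed to be a callable that maps a keyword matrix to one confidence per movie.

```python
import numpy as np

def merge_themes(theme_labels, themes):
    """Label a movie '1' for the broader tag if any of the given themes is present."""
    stacked = [np.asarray(theme_labels[t]) for t in themes]
    return np.any(stacked, axis=0).astype(np.float64)

def topic_features(tag_models, X):
    """Stack the confidence of each tag model into one feature column per tag."""
    names = sorted(tag_models)
    features = np.column_stack([tag_models[name](X) for name in names])
    return names, features
```

The merged labels can be used as training targets for a single ‘supernatural’ model, while the stacked confidences give every movie a coordinate per pre-defined topic.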
We also tried to train models for other tags, for instance ‘sports’ or ‘substance-abuse’. The precision of the results varies strongly, which indicates that specific themes are easier to learn from the limited number of keywords we have. Stated differently, a theme like ‘football’ is easier to learn than a more general theme like ‘team sport’, because for the latter the keywords are likely to be noisier.
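For reference, precision here means the fraction of movies the model marks with a tag that really carry it. A minimal sketch of how it could be computed per tag follows; the 0.5 threshold and the function name are assumptions, not values from our experiments.

```python
import numpy as np

def precision(y_true, p_pred, threshold=0.5):
    """Fraction of movies predicted as positive that are labeled positive."""
    predicted = np.asarray(p_pred) >= threshold
    actual = np.asarray(y_true) == 1
    n_predicted = predicted.sum()
    if n_predicted == 0:
        return float("nan")  # undefined when nothing is predicted positive
    return float((predicted & actual).sum() / n_predicted)
```

Because the ‘0’ labels come from a random sample of unannotated movies, some apparent false positives may in fact be movies that simply lack the annotation.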
The power of tag models is not exhausted by the given examples. For instance, users could create their own tags, or the existing themes could be condensed into a new taxonomy that better models the similarity between themes.