Word-Level Pooling With Auto-Encoders

Currently, we select our features by considering the Top-K keywords from the movie metadata. The keywords are cleaned up first, but even so, for K=1000, only 6.3 of them are present per movie on average. In other words, the sparsity factor is about 99% (1 - 6.3/1000 ≈ 0.994), which is not unusual for text-based features.
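The feature selection and the sparsity figure can be illustrated with a minimal sketch. It assumes that "Top-K" means the K keywords that occur in the most movies. The helper names (`top_k_vocabulary`, `binary_features`, `sparsity`) and the three toy movies are hypothetical and only serve the illustration.

```python
from collections import Counter

def top_k_vocabulary(movies, k):
    """Return the k keywords that occur in the most movies (assumed selection rule)."""
    # dict.fromkeys removes duplicate keywords per movie while keeping their order,
    # so ties in most_common() are resolved by first occurrence.
    counts = Counter(word for keywords in movies for word in dict.fromkeys(keywords))
    return [word for word, _ in counts.most_common(k)]

def binary_features(keywords, vocabulary):
    """One 0/1 entry per vocabulary word: 1 if the movie carries that keyword."""
    present = set(keywords)
    return [1 if word in present else 0 for word in vocabulary]

def sparsity(rows):
    """Fraction of zero entries over all binary feature vectors."""
    total = sum(len(row) for row in rows)
    nonzero = sum(sum(row) for row in rows)
    return 1.0 - nonzero / total

# Hypothetical toy data: one keyword list per movie.
movies = [
    ["christmas", "santa-claus", "toy"],
    ["christmas", "holidays"],
    ["martial-arts", "revenge"],
]
vocab = top_k_vocabulary(movies, k=4)
rows = [binary_features(keywords, vocab) for keywords in movies]
print(vocab, rows, sparsity(rows))
```

With the figures from the text, 6.3 active entries out of 1000 give 1 - 6.3/1000 = 0.9937. The third toy movie also shows a side effect of the Top-K cut: keywords outside the selection simply vanish, leaving an all-zero row.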

When our goal is to train a classifier, the sparsity is not an insurmountable challenge. However, if we want to train an unsupervised model, the sparsity suddenly becomes a problem, especially if the input data is human-generated content that is likely to contain errors and is neither complete nor consistent. In that case, it is often impractical to reliably recover those six or so binary values (words) from the noisy data. Thus, it is necessary to adjust the training procedure.

One way out is to group the features (words) semantically and to reconstruct the concept of the group instead of the specific feature that belongs to it. Consider the following example. With a non-negative factorization, we learned 100 latent topics, and the first one is clearly a Christmas theme:
– christmas
– holidays
– santa-claus
– toy
– mouse-animal
– mischievous-children
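The post does not say which factorization variant or library was used, so the following is only a minimal sketch of one standard choice: non-negative matrix factorization with the Lee-Seung multiplicative updates for the squared reconstruction error, written in plain Python. The movie-by-keyword matrix `V`, its vocabulary, and the helper names are hypothetical toy values; a real setup with K=1000 keywords and 100 topics would rather use a library implementation such as scikit-learn's `NMF`.

```python
import random

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def frobenius_error(v, w, h):
    """Squared Frobenius norm of V - W H."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2 for i in range(len(v)) for j in range(len(v[0])))

def nmf(v, k, iterations=200, seed=0, eps=1e-9):
    """Factorize a non-negative n x m matrix V into W (n x k) and H (k x m)."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() for _ in range(k)] for _ in range(n)]
    h = [[rng.random() for _ in range(m)] for _ in range(k)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H)
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)] for i in range(k)]
        # W <- W * (V H^T) / (W H H^T)
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)] for i in range(n)]
    return w, h

def top_words(h, vocabulary, topic, n=3):
    """Keywords with the largest weights in one row (topic) of H."""
    order = sorted(range(len(vocabulary)), key=lambda j: h[topic][j], reverse=True)
    return [vocabulary[j] for j in order[:n]]

# Hypothetical toy data: rows are movies, columns are keywords.
vocabulary = ["christmas", "holidays", "santa-claus", "martial-arts", "kung-fu", "revenge"]
V = [
    [1, 1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 0],
    [0, 0, 0, 1, 0, 1],
]
w, h = nmf(V, k=2)
for topic in range(2):
    print(topic, top_words(h, vocabulary, topic))
```

Each row of `H` plays the role of a topic like the Christmas one listed above: reading off its largest weights gives the keywords that characterize it.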

Assume the movie we try to model has a strong “christmas” and “family” theme. Next, each movie keyword in the Top-K selection is assigned to one of the learned latent topics. We can do this by determining the overlap, or by considering the learned weights and applying a threshold. Stated differently, we reduce the K=1000 keywords to 100 binary features, where each original keyword is connected to the topic it fits best.
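The weight-based variant can be sketched as follows, assuming the learned weights are available as a topic-by-keyword matrix `h` (one row per topic). Each keyword goes to the topic with its largest weight, unless that weight falls below a threshold. The weights, the threshold of 0.05, and the keyword "rare-keyword" are made-up values for illustration, not results from the post.

```python
def assign_keywords(h, vocabulary, threshold=0.0):
    """Map each keyword to the topic with the largest weight in h.
    Keywords whose best weight is below the threshold stay unassigned."""
    assignment = {}
    for j, word in enumerate(vocabulary):
        weights = [row[j] for row in h]
        best = max(range(len(h)), key=lambda t: weights[t])
        if weights[best] >= threshold:
            assignment[word] = best
    return assignment

def pooled_features(keywords, assignment, n_topics):
    """Binary topic vector: 1 if at least one of the movie's keywords maps to the topic."""
    features = [0] * n_topics
    for word in keywords:
        topic = assignment.get(word)
        if topic is not None:
            features[topic] = 1
    return features

# Hypothetical weights for 2 topics over 6 keywords.
vocabulary = ["christmas", "holidays", "santa-claus", "martial-arts", "kung-fu", "rare-keyword"]
h = [
    [0.9, 0.7, 0.8, 0.0, 0.1, 0.02],  # topic 0: Christmas-like
    [0.0, 0.1, 0.0, 0.9, 0.8, 0.01],  # topic 1: martial-arts-like
]
assignment = assign_keywords(h, vocabulary, threshold=0.05)
print(pooled_features(["holidays", "santa-claus", "rare-keyword"], assignment, n_topics=2))
```

With 100 topics, the same code turns a movie's keywords into the 100 binary features described above; the threshold decides whether weakly associated keywords are dropped instead of being forced into some topic.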

This approach certainly loses some details, because ‘holidays’ and ‘santa-claus’ are now both treated as a “christmas” theme. But this weakness is also a strength of the model: not every keyword has to be reconstructed, only the more general theme it belongs to. We can think of the approach as a special form of pooling: whenever a word from the pool is present in the movie data, the output is “1” to indicate membership, and “0” otherwise.
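Read as pooling, the step is a max over binary keyword indicators, where each "receptive field" is an arbitrary, non-contiguous group of keyword indices rather than a window of neighbors. A minimal sketch, with hypothetical pools and movies:

```python
def max_pool(word_vector, pools):
    """Max-pool a K-dimensional 0/1 keyword vector over arbitrary groups of indices."""
    return [max((word_vector[i] for i in pool), default=0) for pool in pools]

# Hypothetical keyword indices: 0 christmas, 1 holidays, 2 santa-claus, 3 martial-arts, 4 kung-fu
pools = [[0, 1, 2], [3, 4]]
movie_a = [0, 1, 0, 0, 0]  # only 'holidays'
movie_b = [0, 0, 1, 0, 0]  # only 'santa-claus'
print(max_pool(movie_a, pools), max_pool(movie_b, pools))
```

Both movies end up with the same pooled vector, which is exactly the loss of detail, and the easier reconstruction target, described above.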

Finally, some fine-tuning is needed to distinguish between examples of the same class (theme), which eventually leads to a hierarchical model. In this model, we start with high-level themes like ‘christmas’ or ‘martial-arts’, and at the next level, we further refine those topics to split the space according to the low-level details of the movies.
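The post does not describe how the fine-tuning or the second level is realized. The sketch below only illustrates one possible data layout for such a hierarchy, under that assumption: the first level keeps the pooled theme bits, and the second level keeps, for each active theme only, the individual keyword bits inside its pool. The function name and the pools are hypothetical.

```python
def hierarchical_features(word_vector, pools):
    """Two-level encoding (assumed layout): coarse theme bits, plus the
    keyword bits inside each pool for the themes that are active."""
    coarse = [max((word_vector[i] for i in pool), default=0) for pool in pools]
    fine = {t: [word_vector[i] for i in pool] for t, pool in enumerate(pools) if coarse[t]}
    return coarse, fine

# Hypothetical keyword indices: 0 christmas, 1 holidays, 2 santa-claus, 3 martial-arts, 4 kung-fu
pools = [[0, 1, 2], [3, 4]]
movie_a = [0, 1, 0, 0, 0]  # only 'holidays'
movie_b = [0, 0, 1, 0, 0]  # only 'santa-claus'
print(hierarchical_features(movie_a, pools))
print(hierarchical_features(movie_b, pools))
```

Both movies share the coarse ‘christmas’ bit, while the second level separates them again; a model at that level only has to handle the few keywords of one theme at a time.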

