Update For MovieLens Data

Although the MovieLens dataset is mainly intended for collaborative filtering, we would at least like to mention a new release of the data. It contains about 20M ratings, and it seems that the group is planning to release snapshots more often, since there is a “latest” category on their download site.

In addition, they were kind enough to provide a mapping from the referenced movies to some major movie databases, which is especially useful if you plan to use extra metadata to describe the movies.
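
To give a rough idea of how such a mapping could be used, here is a minimal sketch that turns it into a lookup table. The column names (movieId, imdbId, tmdbId), the sample rows and the helper name load_links are assumptions made for illustration, so check them against the files you actually downloaded.

```python
import csv
import io

# Illustrative rows only; the column names are an assumption about the file layout.
LINKS_CSV = """movieId,imdbId,tmdbId
1,0114709,862
2,0113497,8844
3,0113228,
"""

def load_links(text):
    """Map each MovieLens movie id to its external ids (hypothetical helper)."""
    links = {}
    for row in csv.DictReader(io.StringIO(text)):
        links[int(row["movieId"])] = {
            # IMDb title ids are commonly written as 'tt' plus a zero-padded number.
            "imdb": "tt" + row["imdbId"].zfill(7),
            # An external id may be missing, so keep None instead of failing.
            "tmdb": int(row["tmdbId"]) if row["tmdbId"] else None,
        }
    return links

links = load_links(LINKS_CSV)
```

With such a table, metadata fetched from an external database can be joined back to the ratings via the MovieLens movie id.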

What caught our interest are the tags. Though tag values are very noisy and sparse, they contain a lot of information that can help to relate movies in a semantically meaningful way. And even if they cannot be used as a stand-alone model, they can still be used as priors or to regularize models by using them in an unsupervised way.
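
As a minimal sketch of how tags might relate movies, the snippet below builds a bag-of-tags profile per movie and compares profiles with cosine similarity. The (movieId, tag) pairs are made up, and the lowercasing step and the helper names are our own assumptions, not something prescribed by the dataset.

```python
from collections import Counter, defaultdict
from math import sqrt

# Made-up (movieId, tag) pairs for illustration; real tags are far noisier.
tag_rows = [
    (1, "pixar"), (1, "animation"), (1, "Pixar"),
    (2, "animation"), (2, "pixar"), (2, "fish"),
    (3, "mafia"), (3, "crime"),
]

def build_tag_profiles(rows):
    """Count normalized tags per movie (hypothetical helper)."""
    profiles = defaultdict(Counter)
    for movie_id, tag in rows:
        # Lowercasing merges trivial spelling variants such as 'Pixar' and 'pixar'.
        profiles[movie_id][tag.strip().lower()] += 1
    return profiles

def cosine(a, b):
    """Cosine similarity between two tag-count profiles."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

profiles = build_tag_profiles(tag_rows)
sim_12 = cosine(profiles[1], profiles[2])  # movies that share tags
sim_13 = cosine(profiles[1], profiles[3])  # movies with no tag in common
```

Similarities of this kind are too sparse to serve as a model on their own, but they could, for instance, be used as a penalty that pulls the learned representations of tag-similar movies closer together.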

