Although the MovieLens dataset is mainly intended for collaborative filtering, we would at least like to mention a new release of the data. It contains about 20M ratings, and it seems that the group is planning to release snapshots more often, since there is a “latest” category on their download site.
In addition, they were kind enough to provide a mapping of the referenced movies to some major movie databases, which is especially useful if you plan to use extra metadata to describe the movies.
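To give an idea of how such a mapping could be consumed, here is a minimal sketch. It assumes the mapping ships as a CSV file with the columns `movieId`, `imdbId` and `tmdbId`; the sample rows and the helper `load_links` are our own illustration, so please check the README of the release you download for the actual layout.

```python
import csv
import io

# Illustrative rows in the ASSUMED layout (movieId, imdbId, tmdbId).
# With the real data you would open the downloaded file instead.
SAMPLE_LINKS = """movieId,imdbId,tmdbId
1,0114709,862
2,0113497,8844
"""

def load_links(handle):
    """Hypothetical helper: return {movieId: {"imdb": ..., "tmdb": ...}}."""
    mapping = {}
    for row in csv.DictReader(handle):
        mapping[int(row["movieId"])] = {
            # Keep external ids as strings so leading zeros survive.
            "imdb": row["imdbId"],
            # An empty field means there is no entry in that database.
            "tmdb": row["tmdbId"] or None,
        }
    return mapping

links = load_links(io.StringIO(SAMPLE_LINKS))
print(links[1])
```

With such a dictionary at hand, the external ids can serve as join keys for whatever extra metadata you collect from those databases.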
What caught our interest are the tags. Though tag values are very noisy and sparse, they contain a lot of information that can help to relate movies in a semantically meaningful way. And even if they cannot be used as a stand-alone model, they can still be used as priors or to regularize models by using them in an unsupervised way.
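As a rough illustration of how tags can relate movies, the following sketch normalizes raw tags and compares movies by the Jaccard overlap of their tag sets. The movie names, the tags and the helpers `build_tag_sets` and `jaccard` are made up for this example, and Jaccard overlap is just one simple choice of similarity, not something prescribed by the dataset.

```python
from collections import defaultdict

# Hypothetical (movie, tag) pairs; real tags are far noisier and sparser.
SAMPLE_TAGS = [
    ("Movie A", "space"),
    ("Movie A", "Sci-Fi"),
    ("Movie A", "aliens"),
    ("Movie B", "sci-fi"),
    ("Movie B", "space "),
    ("Movie C", "romance"),
]

def build_tag_sets(pairs):
    """Group tags per movie, lower-casing and stripping to reduce noise."""
    tag_sets = defaultdict(set)
    for movie, tag in pairs:
        tag_sets[movie].add(tag.strip().lower())
    return dict(tag_sets)

def jaccard(a, b):
    """Size of the intersection divided by size of the union."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

tag_sets = build_tag_sets(SAMPLE_TAGS)
sim_ab = jaccard(tag_sets["Movie A"], tag_sets["Movie B"])
sim_ac = jaccard(tag_sets["Movie A"], tag_sets["Movie C"])
print(sim_ab, sim_ac)
```

Pairwise scores like these could then act as a weak prior, for instance by nudging the latent factors of movies with a high tag overlap closer together during training.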