Very recently we started experimenting with feature weighting, in particular a slightly adjusted variant of TF-IDF, because keywords in movie metadata occur at most once. We used an RBM with Gaussian visible units and binary hidden units to extract meaningful topic features from the data.
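Since each keyword occurs at most once per movie, the term frequency is binary and the weight effectively reduces to the keyword's (smoothed) IDF. A minimal sketch of such a variant, assuming a binary movie-by-keyword matrix (the function name and smoothing constant are illustrative, not the exact scheme we use):

```python
import numpy as np

def binary_tfidf(doc_term, smooth=1.0):
    """TF-IDF variant for binary counts: TF is 0/1, so the weight of a
    present keyword is just its smoothed IDF."""
    doc_term = np.asarray(doc_term, dtype=float)   # shape (n_movies, n_keywords)
    n_docs = doc_term.shape[0]
    df = doc_term.sum(axis=0)                      # document frequency per keyword
    idf = np.log((n_docs + smooth) / (df + smooth))
    return doc_term * idf                          # 0 where absent, IDF where present

# toy example: 3 movies, 4 keywords
X = np.array([[1, 0, 1, 0],
              [1, 1, 0, 0],
              [1, 0, 0, 1]])
W = binary_tfidf(X)
```

A keyword shared by all movies gets weight 0, while rare keywords are up-weighted, which is exactly the discriminative signal we want the RBM to pick up.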
Actually, the derived latent topics are meaningful most of the time. However, due to the high sparsity of the data, the biases of the hidden units become very large and negative to compensate for this imbalance. As a result, only very few hidden units are active for any given example, which leads to a high degree of sparsity in the feature space. But because of the large biases, the sigmoid units saturate at 1.0, and gradual membership of the input data is no longer possible. In other words, if a movie matches a neuron, the neuron spikes and returns its maximal value. If a different movie matches the neuron's topics even better, no discrimination between those two movies is possible in feature space because of this saturation.
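The effect is easy to reproduce numerically. In this small sketch the bias and the weighted inputs are made-up values chosen only to illustrate the regime: once the pre-activation clears the large negative bias, the sigmoid is already so deep in its flat region that a "good" and a "better" match become indistinguishable:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# hypothetical pre-activations of one hidden unit (weighted input + bias);
# the numbers are illustrative, not measured from our model
bias = -30.0
good_match   = sigmoid(45.0 + bias)  # sigmoid(15):  already ~1.0
better_match = sigmoid(60.0 + bias)  # sigmoid(30):  even closer to 1.0
no_match     = sigmoid(2.0 + bias)   # sigmoid(-28): ~0.0
```

Both matching movies map to essentially 1.0, so the unit's activation carries no information about *how well* a movie matches its topic.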
Currently, we are investigating methods to turn this into an advantage, for instance by using a separate learning rate for the bias values. The idea is to bring the neurons into a regime where they almost saturate, keeping the induced sparsity, while still allowing discrimination between movies that both activate those neurons.
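One way this idea can be sketched is a standard CD-1 update for a Gaussian-Bernoulli RBM in which the hidden bias gets its own, smaller learning rate, so it drifts toward the saturation regime more slowly. This is an assumed implementation for illustration (unit-variance visibles, mean-field reconstruction), not our exact training code:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_update(v0, W, b_vis, b_hid, lr_w=0.01, lr_bias=0.001):
    """One CD-1 step for a Gaussian-Bernoulli RBM (unit-variance visibles)
    with a separate, smaller learning rate for the hidden bias."""
    h0 = sigmoid(v0 @ W + b_hid)                      # positive phase
    h0_sample = (rng.random(h0.shape) < h0).astype(float)
    v1 = h0_sample @ W.T + b_vis                      # Gaussian visibles: use the mean
    h1 = sigmoid(v1 @ W + b_hid)                      # negative phase
    n = v0.shape[0]
    W += lr_w * (v0.T @ h0 - v1.T @ h1) / n
    b_vis += lr_w * (v0 - v1).mean(axis=0)
    b_hid += lr_bias * (h0 - h1).mean(axis=0)         # slower bias update
    return W, b_vis, b_hid

# toy data: 8 examples, 5 visible units, 3 hidden units
v0 = rng.standard_normal((8, 5))
W = 0.01 * rng.standard_normal((5, 3))
b_vis = np.zeros(5)
b_hid = np.zeros(3)
W, b_vis, b_hid = cd1_update(v0, W, b_vis, b_hid)
```

Keeping `lr_bias` an order of magnitude below `lr_w` is one knob for holding the units just below full saturation; annealing the bias rate over training would be another.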