Even though a precise definition of the term is difficult, there is no doubt that the combination of massive amounts of data ("big data", as it is called nowadays) and the rise of highly efficient processing units (GPUs) is one key to the success of Deep Learning. Today, we are able to learn classifiers for a couple of thousand categories with more than ten million input samples. That is quite impressive, and as the results of some competitions show, the models get better every year.
The idea of starting with simple features and composing them into higher-level features is not only biologically more plausible than handcrafted features, it also seems to work very well. Convolutional networks in particular make it easy to visualize how a model learns to combine primitive features, step by step, into concepts like a cat, a car or a flower. In the image domain, all we need to do is label images. That is expensive, but we rarely face the problem that it is unclear whether the animal in a picture is a dog or a cat. Sure, it can sometimes be challenging, but in the end it is not a matter of personal taste. We talked about this in earlier posts. The music domain is much harder, because genres might overlap and genres themselves might be ambiguous. However, in the case of unsupervised learning this is not a problem, because the features are then derived directly from the audio and no labels are used.
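To make the last point concrete, here is a toy sketch of what "features derived directly from the audio, no labels" can look like: a tiny tied-weight autoencoder, trained purely on a reconstruction objective. Everything here is an illustrative assumption (random data stands in for spectrogram frames, and the sizes and learning rate are arbitrary), not the actual setup discussed in the post:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for audio data: 256 "spectrogram frames", 32 dims.
X = rng.normal(size=(256, 32))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Tied-weight autoencoder: encode with W, decode with W.T.
n_hidden = 8
W = rng.normal(scale=0.1, size=(32, n_hidden))
b = np.zeros(n_hidden)   # encoder bias
c = np.zeros(32)         # decoder bias

def reconstruct(X):
    return sigmoid(X @ W + b) @ W.T + c

initial_mse = np.mean((reconstruct(X) - X) ** 2)

lr = 0.02
for _ in range(500):
    H = sigmoid(X @ W + b)        # encode
    X_hat = H @ W.T + c           # decode
    err = X_hat - X               # reconstruction error
    dZ = (err @ W) * H * (1 - H)  # backprop through the encoder
    W -= lr * (X.T @ dZ + err.T @ H) / len(X)
    b -= lr * dZ.mean(axis=0)
    c -= lr * err.mean(axis=0)

final_mse = np.mean((reconstruct(X) - X) ** 2)
features = sigmoid(X @ W + b)     # the learned, label-free features

print(features.shape)             # (256, 8)
print(final_mse < initial_mse)    # reconstruction improved during training
```

The point is only that the hidden code is learned from the signal itself; no genre labels, and therefore no label ambiguity, enter the training at any point.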
Now, let’s talk about movies. As we already explained, in this domain we do not have access to any native features. In other words, we have to rely on handcrafted features to learn concepts, which can be a serious problem because we assume that the features contain enough semantic information to support hierarchical learning. Besides the features themselves, the feature selection procedure, usually a top-k scheme, hurts the performance a lot, because rare features are likely to be very discriminative.
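To see why a frequency-based top-k scheme can hurt, consider a made-up example (all movie and feature names below are invented for illustration): a rare feature that perfectly identifies a small group of movies is exactly the kind of feature that a top-k cut-off discards first.

```python
from collections import Counter

# Hypothetical movies described by handcrafted keyword features.
# "noir-lighting" appears in only two movies, yet it perfectly
# separates the two film-noir titles from the rest.
movies = {
    "movie_a": {"love", "car", "city", "noir-lighting"},
    "movie_b": {"love", "car", "city"},
    "movie_c": {"love", "city", "noir-lighting"},
    "movie_d": {"car", "city"},
    "movie_e": {"love", "car"},
}

# A crude top-k scheme: keep only the k globally most frequent features.
counts = Counter(f for feats in movies.values() for f in feats)
k = 3
top_k = {f for f, _ in counts.most_common(k)}

print(sorted(top_k))                # the frequent, generic features survive
print("noir-lighting" in top_k)     # False: the discriminative one is gone
```

The frequent features ("love", "car", "city") survive the cut but say little about any particular movie, while the rare, highly discriminative feature is dropped before learning even starts.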
In a nutshell, Deep Learning has been very successful in the domains of music, images, speech and NLP. All these domains provide some kind of native features, so they do not have to rely on handcrafted ones. Without access to native features, however, the success of a deep approach cannot be guaranteed: the features might lack enough descriptive information, and then the higher layers are not able to compose the inferred information into a useful concept.