Tagged: understanding

Understand First, Model Next

With all the lasting hype about Deep Learning, it seems that all problems can be easily solved with just enough data, a couple of GPUs and a (very) deep neural network. Frankly, this method works quite well for a variety of classification problems, especially for the domain of language and images, but mostly because, despite the possible large number of categories, classification is a very well understood problem.

For instance, in case of word2vec not a deep network was used, but a shallow model that worked amazingly well. And even if support vector machines went out of fashion, there are some problems that can be solved very efficiently thanks to the freedom to use (almost) arbitrary kernel functions. Another example is text classification where even a linear model, in combination with a very high-dim input space, is able to get a very good accuracy.

The journey to build a “complete” model of the human brain is deeply fascinating, but we should not forget that sometimes all we need is a hammer and not a drill to solve a problem. In other words, if the problem is easy, a simple model should be preferred and not one that is -theoretically- able to solve all problems but requires huge amount of fine-tuning and training time and even then the outcome might not be (much) better than the simple model. This is a bit like Occam’s Razor.

Bottom line, in machine learning we should be more concerned with a deeper understanding of the problem which allows to select and/or build a more efficient model than to solve everything with a single method just because it is a hot topic. Since Deep Learning is used by the crowd, we are getting more and more the impression that DL is treated as a grail to solve all problems without the necessity for a solid understanding of the underlying problem. The impression is supported by frequent posts at DL groups where newbies ask for a complete solution of their problem. To be clear, asking for help is a good thing, but like the famous saying, if you cannot (re-)build something you do not understand it.

The problem is, if somebody gives you a ‘perfect’ solution, it works as long as your problem remains the same. But if your problem evolves over time, you have to modify the solution. And for this, you need to understand both the problem and the solution to adjust the model. In case of the black-box approach often used by DL, data in and labels out, encoding prior knowledge or whatever is necessary to solve the evolved problem, requires a much deeper understanding of the whole pipeline.