We recently stumbled over a problem that is pretty straightforward: based on a sub-title, we had to decide whether the text describes a movie or some other type of content, like a series or a documentary. So, we fired up our favorite editor and began to hack together a very basic recurrent neural network. Since we wanted to make sure the net could handle all kinds of new input, we decided on a character-based model. That was the easy part. Following a heuristic from a fairly recent paper, we initialized all non-recurrent weights from U[-0.1, 0.1] and the recurrent weights via orthogonalization, and chose Adam as our optimizer.
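A minimal sketch of that initialization recipe, in plain NumPy; the layer sizes and vocabulary size here are made-up placeholders, not the ones from our actual setup:

```python
import numpy as np

rng = np.random.default_rng(0)

def uniform_init(shape, scale=0.1):
    # Non-recurrent weights drawn from U[-0.1, 0.1]
    return rng.uniform(-scale, scale, size=shape)

def orthogonal_init(n):
    # Recurrent weights: orthogonalize a random Gaussian matrix via QR.
    q, r = np.linalg.qr(rng.standard_normal((n, n)))
    # Flip column signs by sign(diag(r)) so the result is uniformly
    # distributed over orthogonal matrices.
    return q * np.sign(np.diag(r))

vocab_size, hidden, classes = 128, 64, 2       # hypothetical sizes
W_xh = uniform_init((hidden, vocab_size))      # input-to-hidden
W_hy = uniform_init((classes, hidden))         # hidden-to-output
W_hh = orthogonal_init(hidden)                 # hidden-to-hidden (recurrent)

# Sanity check: W_hh is orthogonal, so W_hh @ W_hh.T is the identity.
assert np.allclose(W_hh @ W_hh.T, np.eye(hidden), atol=1e-8)
```

The QR trick is one common way to get an orthogonal matrix; frameworks typically ship an equivalent initializer out of the box.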
We are aware that heuristics do not always work, but we were pretty astonished that no learning occurred at all, not even a little. So, we switched to the default weight initialization of the framework and voilà, there was immediate progress. Just a slightly different weight initialization procedure, and it worked. Out of curiosity, we also tried different optimizers and found that Adam with its default settings, lr=0.001, was far from optimal. For instance, when we used RMSprop with the same lr parameter, the error after one epoch was much lower and the number of correctly classified items noticeably higher.
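For intuition about why the optimizer swap can matter even at the same learning rate, here is a toy sketch of the two update rules on a 1-D quadratic f(w) = w². The lr=0.001 matches the default mentioned above; everything else is purely illustrative and not our actual training setup:

```python
import numpy as np

def adam_step(w, g, m, v, t, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    # Adam: EMAs of the gradient and squared gradient, with bias correction.
    m = b1 * m + (1 - b1) * g
    v = b2 * v + (1 - b2) * g * g
    m_hat = m / (1 - b1 ** t)
    v_hat = v / (1 - b2 ** t)
    return w - lr * m_hat / (np.sqrt(v_hat) + eps), m, v

def rmsprop_step(w, g, v, lr=0.001, alpha=0.99, eps=1e-8):
    # RMSprop: EMA of the squared gradient only, no bias correction.
    v = alpha * v + (1 - alpha) * g * g
    return w - lr * g / (np.sqrt(v) + eps), v

w_adam, m, v_adam = 1.0, 0.0, 0.0
w_rms, v_rms = 1.0, 0.0
for t in range(1, 101):
    w_adam, m, v_adam = adam_step(w_adam, 2 * w_adam, m, v_adam, t)
    w_rms, v_rms = rmsprop_step(w_rms, 2 * w_rms, v_rms)

# Both head toward the minimum at 0, but the uncorrected RMSprop
# denominator is small early on, so its initial steps are larger.
print(w_adam, w_rms)
```

The point is not that one optimizer is better in general, only that identical lr values produce quite different effective step sizes, so comparing optimizers at a single shared lr says little on its own.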
The lesson we learned, again, is that even with all the insights and tricks from dozens of papers, lectures and tutorials, optimizing neural nets is still more of an art than a science, and there is no recipe one can always follow to get a good model. This is why we strongly favor doing more basic research instead of chasing state-of-the-art results, since this is the only way to gain more insight into how to actually solve the problem at hand. To be fair, there are people doing exactly this, and they also share their insights, which is very valuable; but on the other hand, there are lots of papers that hardly provide enough details to even repeat the experiments.
It boils down to the question of what you do when you are working on a challenging problem and run into a dead end. To quote Schmidhuber's AMA on how to recognize a promising ML student: “[..]run into a dead end, and backtrack. Another dead end, another backtrack. But they don’t give up.” Way too often, though, it seems that if something is not working, the method is discarded and something new is tried without understanding the actual problem. If you do ML in your spare time, it is understandable that you want to make progress no matter how, but as a professional, deeper insight should be the goal, not just getting something done without knowing why or how it works.
To sum it up: even though machine learning is very fascinating these days, especially with all the resources at your disposal, it is still a long way until we really understand what is going on under the hood. And as long as we keep trying to find out, we will make continual progress, even if the steps seem very tiny.