A recent talk at NIPS 2017 underlined that, despite all the progress we have made in recent years, we still know very little in practice about how things work internally. The question is: what do you do when you encounter a strange problem while training your model? Do you just switch to another optimizer, architecture, or hyper-parameter, or do you try to find the root cause of the problem? With all the nice, publicly available ML tools out there, it is tempting to just try all of them, and if one does not work, to move on to the next. At the end of the day, your model might be powerful enough to solve the problem at hand, but it is also very likely that it is just a black box you don’t fully understand, and if the system stops working, you need to search for a new model all over again.
The talk also emphasized that we need more well-understood building blocks that can be combined to tackle more complex problems, instead of just plugging “mythical” things into our networks that make them “magically” work. In other words, we should focus more on basic experiments to better understand existing building blocks. That includes spending more time proving why things work, instead of just saying they do because the error rate goes down while no one cares to explain exactly why.
This is the kind of backtracking you do when you are stuck because your model won’t work. If you just switch your architecture, you won’t gain any new insights, and if the problem occurs again, you will need to switch again. The process of really understanding what is going on can be extremely painful and probably needs lots of resources, but in the end it pays off, since you can use the knowledge to build better models and focus on new problems instead of just plugging black boxes together and hoping that they eventually work.
Like a long journey, it starts with a single step. At the beginning, there might be no light at the end of the tunnel, but if you don’t give up, you will eventually figure out where to put the next piece of the puzzle, and then the next one, and so forth. In the end you will see much more of the whole picture, even if it takes a very long time.