Stuck in a Local Minimum

It has never been easier to get access to very powerful machinery for learning. There are lots of different frameworks to train your model, and with GPUs you can even train “bigger” models on commodity hardware. Furthermore, there is a lot of published research, along with cool blog posts that explain recent trends and new theories. In other words, almost everything is out there; you just need to find the time to digest all the content and turn it into something that solves your problem, right?

Well, honestly, if you have one of those common problems, a solution is probably just around the corner and you don’t need much time or energy to solve it. Maybe there is even a pre-trained network that can be used directly, or you could ask around to see if somebody is willing to share one. But frankly, this is the exception rather than the rule: very often your problem is special, and the chance that existing code is available is close to zero. Maybe there are some relevant papers, but it is likely to take time to find them and even more to implement them. In other words, if the problem is easy, you often don’t need to do any research; otherwise it can be a very long road with lots of obstacles, even just to get a hint about where to start.

In such cases, if you are lucky, you can ask people in your company or team for hints, or at least discuss the problem with them on equal footing. But what if you have no access to such valuable resources? Doing research on your own takes time, and there is no guarantee that it leads anywhere, especially when your time is limited, which is usually the case. So, what to do? It’s like training a neural network with lots of local minima, where in some configurations learning even gets totally stuck.

This is nothing new, but sometimes we get the impression that the popular opinion is that all you need is a framework and enough knob-tuning to solve the problem. This is like having a racing car without the proper training to drive it. There is a chance you will win a race, but it’s more likely that you will wreck the car.

The question is how best to spend your time when you want to solve a problem. Concentrate on technology, or on the theory? A little bit of both? Or try to find existing solutions? This is related to our concern that we meet a growing number of ML engineers who seem to be just power users of a framework, without the ability to understand what is going on under the hood.

