There are a lot of problems out there, and some of them can definitely be solved with neural nets. However, despite the availability of many sophisticated frameworks, you still need a solid understanding of the theory and a bit of luck, at least when you plan to design your own loss function or want to implement a model from a recent research paper. If you just want to train a classifier on some labeled data, you can pick practically any of the available frameworks at random. But most real-world problems are not like that.
The truth is that you can do all of this with most of the frameworks out there, but what is rarely said is that it can be very time-consuming and sometimes even frustrating. Over the last few months we have promoted PyTorch as a framework that goes hand in hand with Python and lets you easily create networks with dynamic graphs. It also lets you debug code on the fly, since tensors hold actual values and are not purely symbolic. This increased our productivity a lot and reduced our level of frustration.
Still, we should not forget that all of this is just technology. Even if a framework has a very large community (and might be backed by a big company), there is no guarantee that it won’t be obsolete next year, or maybe in five years. That said, in our humble opinion a framework should allow developers to get things done quickly without writing lots of code that is unrelated to the actual problem. But this can turn into a problem when a framework is so high-level that it does not let you easily customize your models or adapt the design, for example by combining different loss functions.
Let’s take NLP as an example. Without a doubt, attention is a very important part of most modern approaches, so it is very likely that we need to integrate it into a real-world model. Despite its effectiveness, the method is not very complicated, and in terms of the computational graph it is not very hard to implement either. But this of course depends on the framework and its API. Does the framework come with native support for attention? Can it be modified easily? How well does it fit into the layer landscape? How difficult is it to implement from scratch? Can it be debugged easily?
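To give an idea of how little computation is involved, here is a minimal, framework-free sketch of one common variant, scaled dot-product attention, for a single query vector. It uses only the Python standard library; the function names `softmax` and `dot_product_attention` and the toy numbers are our own choices for illustration and do not come from any framework. In a real model the same steps would be a few batched tensor operations.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot_product_attention(query, keys, values):
    """Scaled dot-product attention for one query.

    query:  list of d floats
    keys:   n lists of d floats
    values: n lists of m floats
    Returns (context, weights): the weighted sum of the values
    and the attention weights that produced it.
    """
    scale = math.sqrt(len(query))
    # 1. Score each key by its (scaled) dot product with the query.
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    # 2. Normalize the scores into weights that sum to one.
    weights = softmax(scores)
    # 3. Mix the values according to those weights.
    context = [
        sum(w * value[j] for w, value in zip(weights, values))
        for j in range(len(values[0]))
    ]
    return context, weights


# Toy example: three positions, made-up numbers.
query = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]

context, weights = dot_product_attention(query, keys, values)
print(weights)
print(context)
```

The whole method is three steps: score, normalize, mix. Whether those three steps are easy to express, modify and inspect is exactly what differs from framework to framework.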
Even though we have made a lot of progress in understanding and training neural nets, it still feels more like black magic than science. With a nice framework like Keras, it is not hard to train a neural net from scratch. But what happens if the learning gets stuck and this cannot be fixed trivially by adjusting some options? Then you need to dig deeper, which requires a different skill set. In other words, try the easy solutions first, since sometimes you don’t need more than a standard model, but be prepared to look under the hood when they fail.
This brings us to the question of whether we should use different frameworks for experiments and production. For experiments, we need one that is very flexible, easy to debug and easy to adapt, with a focus on understanding what is going on inside. For deployment, however, we need one that allows us to run the model in heterogeneous environments with very different resources. It is possible that a single framework can do both, but the requirements for the two cases are very different.
Bottom line: once the model is trained, the major focus is on maximizing performance and minimizing resource usage. Issues like flexibility, adaptation and, to some degree, debugging are no longer that important. That is why we wonder why there is so little information about how to use neural nets in production environments, because integrating models into applications and deploying them is far from trivial.