Training and Deploying Neural Nets: Two Sides Of One Coin

There are a lot of problems out there, and some of them can definitely be solved with neural nets. However, despite the availability of many sophisticated frameworks, you still need a solid understanding of the theory and also a bit of luck, at least when you plan to design your own loss function or want to implement a model from a recent research paper. To just train a classifier on some labeled data, you can pick practically any of the available frameworks at random. But most real-world problems are not like that.

The truth is that you can do all this stuff with most of the frameworks out there, but what is not said very often is that this can be very time-consuming and sometimes even frustrating. In recent months we have promoted PyTorch as a framework that goes hand in hand with Python and allows you to easily create networks with dynamic graphs. Not to mention the ability to debug code on the fly, since tensors have actual values and are not purely symbolic. This increased our productivity a lot and also reduced our level of frustration.

Still, we should not forget that all this is just technology, and even if frameworks have very large communities (and might be backed by big companies), there is no guarantee that they won’t be obsolete next year or maybe in five years. That said, in our humble opinion a framework should allow developers to get things done quickly, without writing lots of code that is not related to the actual problem. But this can turn into a problem when a framework is so high-level that it does not let you easily customize your models or adapt the design, for example by combining different loss functions.

Let’s take NLP as an example: without a doubt, attention is a very important part of most modern approaches, and thus it is very likely that we need to integrate this feature into a real-world model. Despite its effectiveness, the method is not very complicated, and in terms of the computational graph it is also not very hard to implement. But this of course depends on the framework and its API. Does the framework come with native support for it? Is it possible to modify it easily? How well does it fit into the layer landscape? How difficult is it to implement it from scratch? Can it be debugged easily?

Even though we have made a lot of progress in understanding and training neural nets, it still feels more like black magic than science. With a nice framework like Keras, it is not hard to train a neural net from scratch. But what happens if the learning gets stuck and this cannot be fixed trivially by adjusting some options? Then you need to go deeper, which requires a different skill set. In other words, try easy solutions first, since sometimes you don’t need more than a standard model.

This brings us to the question of whether we should use different frameworks for experiments and production. For experiments, we need one that is very flexible, easy to debug and easy to adapt, with a focus on understanding what is going on inside. For deployment, however, we need one that allows us to run the model in heterogeneous environments with very different resources. It is possible that a single framework can do both, but the requirements for the two cases are very different.

Bottom line: once the model is trained, the major focus is on maximizing performance and minimizing resource usage. Issues like flexibility, adaptation and, to some degree, debugging are no longer that important. That is why we wonder why there is so little information about how to use neural nets in production environments, because integrating models into applications and deploying them is far from trivial.


The Model Has Been Trained … Now What?

With the increasing popularity of data science, whatever this term actually means, a lot of functionality in software is based on machine learning. And yes, we all know machine learning is a lot of fun if you get your model to learn something useful. In the academic community, a major focus is to find something new, to enhance existing approaches, or to just beat the existing state-of-the-art score. However, in industry, once a model is trained to solve an actual problem, it needs to be deployed and maintained somewhere.

Suddenly, we might not have access to huge GPU/CPU clusters any longer, which means the final model might need to run on a device with very limited computational power, or even on commodity hardware. Not to mention the versioning of models and the need to re-deploy the actual parameters at some point. At this stage, we need to change our point of view from research/science to production/engineering.

In the case of Python, pickling the whole model is pretty easy, but it requires that a compatible version of the code is used for de-serialization[1]. For long-term storage, it is a much better idea to store the actual model parameters in a way that does not depend on the actual implementation. For instance, if you trained an Elman recurrent network, you have three parameters:
(1) the embedding matrix
(2) the “recurrent” matrix
(3) the bias
which are nothing more than plain (numpy) arrays that can even be stored in JSON as a list (of lists). To utilize the model, it is straightforward, in any language, to implement the forward propagation, or to use an existing implementation which only requires initializing the parameters. For example, in major languages like Java or C++, initializing arrays from JSON data is no big deal. Of course there are many other ways, but JSON is a very convenient data transport format.

And since the storage of the parameters is not coupled to any code, a model setup is possible in almost any environment with sufficient resources.
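
To make the idea concrete, here is a minimal sketch in plain Python, using only the standard library. It assumes the usual Elman update h_t = tanh(E[x_t] + W h_(t-1) + b); the key names "embedding", "recurrent" and "bias", the tiny sizes and all values are made up for illustration and do not come from a real model.

```python
import json
import math

# Toy parameters of an Elman network: vocabulary of 3 tokens, hidden size 2.
# Key names and values are hypothetical, chosen only for this sketch.
params = {
    "embedding": [[0.1, -0.2], [0.0, 0.3], [0.5, 0.1]],  # one row per token
    "recurrent": [[0.2, 0.0], [-0.1, 0.4]],              # hidden x hidden
    "bias": [0.0, 0.1],
}

# Serialize to a JSON string (it could equally be written to a file).
blob = json.dumps(params)


def load_params(text):
    """Restore the parameters from JSON; no model code is needed for this."""
    return json.loads(text)


def forward(params, token_ids):
    """Run h_t = tanh(E[x_t] + W h_(t-1) + b) over a sequence of token ids."""
    E, W, b = params["embedding"], params["recurrent"], params["bias"]
    h = [0.0] * len(b)  # initial hidden state
    for t in token_ids:
        h = [
            math.tanh(E[t][i] + sum(W[i][j] * h[j] for j in range(len(h))) + b[i])
            for i in range(len(b))
        ]
    return h


restored = load_params(blob)
final_state = forward(restored, [0, 2, 1])
```

The same JSON string could be read by a Java or C++ program that implements the identical loop, which is the whole point: the file carries only numbers, not code.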

Sure, we are aware that restoring a 100-layer network from a JSON file can be burdensome, but nevertheless we need a way to transfer model parameters in a unified, language-independent way. So, we discussed some details of storage and deployment, but what about using the model in real applications?

We want to consider a broader context than just images, for example the implementation of a search or retrieval system. In contrast to experiments, it is mandatory that we get a result from the model within a reasonable time. In other words, nobody says that training is easy, but if a good model needs too much time to reach a decision, it is useless for real-world applications. Likewise, if the output of the model is “rather large”, we need to think about ways to store and search it efficiently. As an example: if the final output has 4096 dimensions (float32) and we have 100K ‘documents’, we need about 1.6 GB (roughly 1.5 GiB) to store just the results, without a single bit of metadata.
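
The storage estimate is simple arithmetic, and a small helper makes it easy to redo for other sizes. The function name is made up for this sketch; the only assumption is that vectors are stored densely with 4 bytes per float32 value and no overhead.

```python
def embedding_store_bytes(n_items, dims, bytes_per_value=4):
    """Raw size of n_items dense vectors; float32 uses 4 bytes per value."""
    return n_items * dims * bytes_per_value


size = embedding_store_bytes(100_000, 4096)
print(f"{size / 1e9:.2f} GB, {size / 2**30:.2f} GiB")
# prints "1.64 GB, 1.53 GiB"
```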

But even for smaller models, like word2vec, we might have 50K words, each represented by 100 dimensions, which makes it a non-trivial task to match the entered sequence of words against, say, an existing list of movie titles and to rank the results in real time across multiple threads.
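
As a rough illustration of the matching step, the sketch below averages word vectors for a query and ranks titles by cosine similarity. This is only one simple way to do it, not necessarily what a production system would use: the vectors, the titles and the helper names are invented, the embeddings have 3 dimensions instead of 100, and the brute-force scan ignores the real-time and threading concerns mentioned above.

```python
import math

# Hypothetical toy embeddings (3 dims instead of 100), for illustration only.
vectors = {
    "star": [0.9, 0.1, 0.0],
    "wars": [0.8, 0.2, 0.1],
    "trek": [0.7, 0.3, 0.0],
    "love": [0.0, 0.9, 0.2],
    "story": [0.1, 0.8, 0.3],
}
titles = ["star wars", "star trek", "love story"]


def embed(text):
    """Average the vectors of all known words; None if no word is known."""
    known = [vectors[w] for w in text.lower().split() if w in vectors]
    if not known:
        return None
    return [sum(col) / len(known) for col in zip(*known)]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 for zero vectors)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Title vectors are computed once, ahead of time, not per query.
title_vectors = {t: embed(t) for t in titles}


def rank(query, top_k=3):
    """Return up to top_k titles, most similar to the query first."""
    q = embed(query)
    if q is None:
        return []
    scored = [(cosine(q, v), t) for t, v in title_vectors.items() if v is not None]
    return [t for _, t in sorted(scored, reverse=True)[:top_k]]


print(rank("wars"))
```

With 50K words and a long title list, the linear scan in rank is exactly the part that would need an index or some form of approximate nearest-neighbor search to stay fast.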

We know we are being a bit unfair here, but we have the feeling that far too often most of the energy is put into beating some score. Often even the information in papers is not sufficient to reproduce the results, and the models that actually provide new insights are sometimes not usable because they require lots of computational power.

Bottom line: on the one hand, without good models we have nothing to deploy, but on the other hand, machine learning is so much more than just training a good model and then dumping it somewhere for other people to run. As machine learning engineers, we have the responsibility to be part of the whole development process, not just the cool part where we can play with new neural network architectures while other folks do the “boring” part.

[1] deeplearning.net/software/theano/tutorial/loading_and_saving.html