With the increasing popularity of data science, whatever this term actually means, a lot of functionality in software is based on machine learning. And yes, we all know machine learning is a lot of fun, if you get your model to learn something useful. In the academic community a major focus is to find something new, or enhance existing approaches, or to just beat the existing state of the art score. However, in industry, once a model is trained to solve an actual problem, it needs to be deployed and maintained somewhere.
Suddenly, we might not have access to huge GPU/CPU clusters any longer which means the final model might need to run a device with very limited computational power, or even on commodity hardware. Not to speak about the versioning of models and the necessity to re-deploy the actual parameters at some time. At this point, we need to change our point of view from research/science to production/engineering.
In case of python, pickling the whole model is pretty easy, but it requires that a compatible version of the code is used for de-serialization. For long-term storage, it is a much better idea to store the actual model parameters in a way that does not depend on the actual implementation. For instance, if you trained an Elman recurrent network, you have three parameters:
(1) the embedding matrix
(2) the “recurrent” matrix
(3) the bias
which are nothing more than just plain (numpy) arrays which can be even stored in JSON as a list (of lists). To utilize the model, it is straightforward -in any language- to implement the forward propagation, or to use an existing implementation which just requires to initialize the parameters. For example, in major languages like Java or C++, initializing arrays from JSON data is no big deal. Of course there are many other ways, but JSON is a very convenient data transport format.
And since the storage of the parameters is not coupled to any code, a model setup is possible in almost any environment with sufficient resources.
Sure, we are aware that restoring a 100-layer network from a JSON file can be burdensome, but nevertheless it is required to transfer model parameters in a unified, non-language dependent, way. So, we discussed some details of the storage and the deploy, but what about using the model in real applications?
We want to consider a broader context and not only images. Like the implementation of a search / retrieval system. In contrast to experiments, it is mandatory, that we get a result from a model within reasonable time. In other words, nobody says that training is easy, but if a good model need too much time to get a decision, it is useless for real-world applications. For instance, if the output of the model is “rather large”, we need to think about ways for an efficient retrieval in case of information retrieval. As an example: if the final output is 4096 dims (float32) and we have 100K ‘documents’, we need ~1.5 GB to store just the results and not a single bit of meta data.
But even for smaller models, like word2vec, we might have 50K words, each represented by 100 dims, which makes it a non-trivial task to match the entered sequence of words to, say, an existing list of movie titles and to rank the results in real-time and multi-threaded.
We know, we are a bit unfair here, but we have the feeling that way too often most of the energy is put into beating some score, which includes that often even the information in papers are not sufficient to reproduce results, and the models that actually provide new insights are sometimes not usable because they require lots of computational power.
Bottom line, on the one hand, without good models we have nothing to deploy, but on the other hand, machine learning is so much more than just training a good model and then dumping it at some place to let some other guys run it. As a machine learning engineer, we have the responsibility to be part of the whole development process and not just the cool part where we can play with new neural network architectures and let other folks do the “boring” part.