pytorch: A New Burning Star

We are still in love with Theano, and it has been part of our machine learning framework for quite some time, but now and then you need something else to get a job done. Of course, Python is our first choice: it is intuitive and flexible, and when combined with low-level modules written in C/C++, performance is no problem either.

Frankly, one reason we never gave torch a try is that it is driven from Lua, and even though learning a new language can be fun, time is very valuable. Plus, in our humble opinion, it makes more sense to master one language than to divide your attention between two. However, with the arrival of pytorch, things are different.

It’s not like all our problems are solved, but pytorch introduces a new and very interesting concept: dynamic graphs, where the computation graph is built on the fly as the code runs. Furthermore, pytorch follows the WYSIWYG principle, which means that a tensor contains actual values at any moment and not just a symbolic reference to them. Because of this, there are no lengthy compilation steps, and debugging becomes much easier.
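To make the two ideas above concrete, here is a minimal sketch, assuming a recent pytorch release. The helper `scale_until_large` and its `limit` parameter are hypothetical names made up for illustration. Every tensor holds real numbers as soon as it is created, so an ordinary Python loop can branch on those numbers, and the number of operations performed depends on the data.

```python
import torch

# Tensors hold concrete values right away; nothing needs to be compiled.
x = torch.tensor([1.0, 2.0, 3.0])
y = x * 2
print(y)  # shows the actual numbers, not a symbolic node


def scale_until_large(v, limit=100.0):
    """Double v until its norm reaches `limit` (hypothetical helper).

    The loop length depends on the data, so the sequence of
    operations is decided while the code runs.
    """
    steps = 0
    while v.norm() < limit:  # plain Python control flow on a real value
        v = v * 2
        steps += 1
    return v, steps


out, steps = scale_until_large(x)
print(out, steps)
```

Because the intermediate values are real, you can also drop a `print` or a debugger breakpoint inside the loop and inspect `v` directly.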

Even though the state of the project is described as “early-release beta”, we have not run into any serious problems or limitations in our experiments so far. But to be fair, our models were fairly straightforward and thus only used standard components. Nevertheless, describing the model and running the actual training worked like a charm, without any pitfalls. And with our existing knowledge of Theano and other graph-based frameworks, it was easy to adapt existing code and to write new code.

The integration of automatic differentiation is a little different compared to Theano, but this is no big deal if you spend a little time with the interface description. Plus, with all the available examples and tutorials, it’s pretty easy to get an overview of the modules you need for your daily work. Recurrent networks in particular are pretty easy to use and require only minimal knowledge of the underlying details if you just need a standard setup to solve a problem.
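As a rough illustration of both points, the sketch below first lets autograd compute a gradient by calling `backward()` on a scalar, and then runs a standard recurrent layer (`torch.nn.LSTM`) on random data. It assumes a recent pytorch release; the tensor sizes are arbitrary values chosen for the example, not taken from our experiments.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # only to make the random input reproducible

# Automatic differentiation: mark a tensor, build a scalar, call backward().
w = torch.tensor([1.0, -2.0], requires_grad=True)
loss = (w ** 2).sum()
loss.backward()
print(w.grad)  # gradient of sum(w^2) with respect to w, i.e. 2 * w

# A standard recurrent setup: one LSTM layer, no manual gate equations.
lstm = nn.LSTM(input_size=4, hidden_size=8, batch_first=True)
seq = torch.randn(3, 5, 4)  # (batch, time steps, features), arbitrary sizes
output, (h_n, c_n) = lstm(seq)
print(output.shape)  # one hidden vector per time step
print(h_n.shape)     # final hidden state per sequence
```

Unlike in Theano, no function is compiled up front; the gradient is derived from the operations that were actually executed.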

Bottom line, we are big fans of frameworks that are easy to use but also versatile. When we stumbled upon numpy many years ago, we instantly fell in love, because implementing algorithms was straightforward and also very fast thanks to the optimized linear algebra routines and the vectorization. The only drawback is the missing automatic differentiation, because deriving gradients manually is burdensome and very error-prone. Thus, if a framework extends numpy with this feature, plus the ability to perform calculations on the GPU in a transparent way, the outcome has to be useful ;-).
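The comparison can be sketched as follows, using a toy least-squares loss whose data values are made up for illustration. The numpy half needs the gradient derived by hand; the pytorch half gets it from autograd and runs on the GPU if one happens to be available, falling back to the CPU otherwise. This assumes a recent pytorch release.

```python
import numpy as np
import torch

# Toy data (invented for this example).
X_np = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
y_np = np.array([1.0, 0.0, -1.0])
w_np = np.array([0.5, -0.5])

# numpy: loss = sum((X w - y)^2), gradient derived manually as 2 X^T (X w - y).
manual_grad = 2.0 * X_np.T @ (X_np @ w_np - y_np)

# pytorch: same loss, gradient from autograd, device picked transparently.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
X = torch.tensor(X_np, device=device)
y = torch.tensor(y_np, device=device)
w = torch.tensor(w_np, device=device, requires_grad=True)

loss = ((X @ w - y) ** 2).sum()
loss.backward()
auto_grad = w.grad.cpu().numpy()

print(manual_grad)
print(auto_grad)
```

Both gradients should agree up to floating-point error, and only the manual one had to be worked out on paper.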

In other words, if you are familiar with the computational machinery required for implementing neural networks and other machine learning models, pytorch can make your life a lot easier. It’s pretty lightweight, fast and easy to use. Maybe it needs a little more time to become “feature complete” and more mature, but our tests did not reveal any severe problems, and since the community is pretty active, problems should be addressed soon after they are reported.
