PyTorch: Non-differentiable is Nothing

In the last weeks, we really learned to appreciate PyTorch. Not because it is flawless, but because it is very intuitive and makes your life so much easier when you need to debug something. And let’s face it, at one point your network is doing something silly and you have no idea why. Then you need to take a look under the hood which can be a bit of a burden with symbolic variables. Furthermore, the dynamic graph approach lets you define some concepts, like recurrence without complicated scan functions which is one more reason why recurrent networks and PyTorch should be best friends. Bottom line, for a beta, the framework feels very stable and except for some numerical instabilities, we never encountered any problems so far.

With arxiv as a hub for research papers, one have access to lots of new ideas. Not all of them are diamonds, but sometimes a paper contains the hint you need to complete a problem or at least to evaluate an idea or to use it as a basis. Then, it is extremely useful if you can do a rapid prototype. Even with Theano it was quite easy possible, but since the framework is a little too low-level, you had to write lots of setup code. With PyTorch and the seamless numpy integration the setup part is almost neglectable.

Furthermore, it is also possible to transfer a learned model from one framework to another, in case one might have concerns to use a particular framework for production, or a particular framework is required. Thus, it is no problem to use the advantages of PyTorch for training and evaluation of a model and then store the model parameters as numpy arrays and build the network with Theano.

It is really hard to imagine to evaluate new, complex networks without all the fancy frameworks, especially automatic differentiation, but also computational graphs in general. For example, the famous AlexNet was written from scratch which was quite an achievement. In other words, there is no excuse today, not to test your ideas quickly, but also systematically with a framework like PyTorch. Yes, you need some Python and math skills, but with all the groundwork done by others, it is much faster than a few years ago.

So, the major problem, which cannot be solved by PyTorch yet, is to get your hands on a sufficiently large set of (labeled) data. However, if you have a dataset you can start with, there are no limits. Try some funky loss functions, or combine them. Can you imagine a world without GANs? Not anymore. But, somebody has to come up with the idea before it can be refined.

Bottom line, it has never been easier to utilize massive amounts of computational power with GPUs. There are many publicly available datasets, but also relatively easy ways to acquire labeled data, so you should be at least able to start your work. In combination with all the available frameworks, you can train very complex networks in a reasonable time which would have been impossible more than ten years ago even with a massive cluster of machines. So, everybody with a clever idea and some luck has the opportunity to make a difference.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s