Converting Theano to Numpy

It is an open secret that we like Theano. It’s flexible, powerful and, once you have mastered some hurdles, it allows you to easily test a variety of loss functions and network architectures. However, once the model is trained, Theano can be a bit of a burden when it comes to the forward-pass-only (fprop) part. In other words, if we just want to get predictions or feature representations, the setup and compilation overhead might be too much. The alternative is to convert the flow graph into numpy, which has the advantage that there are fewer dependencies and less overhead for the actual predictions with the model. Frankly, what we describe is neither rocket science nor new, but it is also not common practice, so we decided to summarize the method in this post.

To convert the graph notation to numpy, we make use of the __call__ interface of Python classes. The idea is to call an instance of a class as a function with a parameter:

import numpy as np

class Layer(object):
    pass  # minimal base class for all layers

class Input(Layer):
    def __init__(self):
        self.prev = None  # no previous layer

    def __call__(self, value):
        return value  # identity

class Projection(Layer):
    def __init__(self, prev, W, bias):
        self.W, self.bias = W, bias
        self.prev = prev  # previous layer

    def __call__(self, value):
        val = self.prev(value)
        return np.dot(val, self.W) + self.bias

We illustrate the method with a 1-layer linear network:

inp = Input()
lin = Projection(inp, W="random matrix", bias="zero bias")
X = "input matrix"
out = lin(X)

The notation of fprop might be confusing here, since the request for the input travels backwards, from the last layer to the input layer. So, let’s see what is happening here:

lin(X) is equivalent to lin.__call__(X), and inside this method the output of the previous layer is requested via self.prev(value), which is continued until the input layer returns the actual value. This is the stop condition. The approach is not restricted to a 1-layer network and can be used for arbitrarily large networks.
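
To make the recursion concrete, here is a minimal sketch of a 2-layer network built with the Input and Projection classes from above; the shapes are made up purely for illustration:

inp = Input()
h1 = Projection(inp, W=np.random.randn(4, 8), bias=np.zeros(8))
h2 = Projection(h1, W=np.random.randn(8, 2), bias=np.zeros(2))

X = np.random.randn(3, 4)  # a batch of three 4-dimensional inputs
out = h2(X)                # h2 asks h1, h1 asks inp, inp returns X
out.shape                  # (3, 2)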

With this idea, all we have to do is to split the layer setup from the computation, two parts that Theano combines. For instance, a projection layer in Theano:

import theano.tensor as T

class Projection(Layer):
    def __init__(self, input, W, bias):
        # setup and computation in one step: the symbolic graph is built here
        self.output = T.dot(input, W) + bias

now looks like this with numpy:

class ProjectionNP(LayerNP):
    def __init__(self, input, W, bias):  # setup
        self.prev = input
        self.W, self.bias = W, bias

    def __call__(self, value):  # computation
        val = self.prev(value)
        return np.dot(val, self.W) + self.bias

In other words, converting any Theano layer is pretty straightforward and only takes time to type, but not to think (much).
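
For instance, a rectifier layer has no parameters at all, so the numpy side only needs to remember the previous layer; a sketch following the same scheme (the class name ReLUNP is ours):

class ReLUNP(LayerNP):
    def __init__(self, input):  # setup: just remember the previous layer
        self.prev = input

    def __call__(self, value):  # computation
        val = self.prev(value)
        return np.maximum(val, 0)  # elementwise rectifier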

The storage of such a model is just a list of all layers, and we can extract the output of any layer by simply calling the corresponding layer object with the input:

net = [inp, pro, bn, relu]
net[-1](X)  # output of the relu layer, i.e., the full network
net[-3](X)  # output of the projection layer
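
To populate such a network with the trained parameters, the values can be pulled out of the Theano shared variables via get_value; a sketch, assuming W_shared and bias_shared are the trained theano.shared parameters:

W = W_shared.get_value()        # plain numpy array with the trained weights
bias = bias_shared.get_value()  # plain numpy array with the trained bias

inp = Input()
pro = ProjectionNP(inp, W, bias)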

Let’s summarize the advantages again: First, except for numpy there are no other dependencies, and numpy is pretty portable and introduces little overhead. Second, we do not need to compile any functions, since we are working with real data and not with symbolic variables. The latter is especially important if an “app” is started frequently but the interaction time is rather short, because then a constant setup overhead very likely degrades user satisfaction.

Bottom line, the method we described here is especially useful for smaller models and for environments with limited resources, which includes apps that are started frequently and should therefore have a low setup time.
