Fun With Theano

Let us assume that we want to train different models, each with a dedicated cost function. Wouldn’t it be great if we could focus on the actual work and let someone else handle the algorithmic part, including the calculation of the gradient? Besides the fact that a hand-derived gradient might contain errors, a computer can also automatically optimize the gradient computation where possible. There are several packages that assist with symbolic differentiation, but we will focus on Theano, mainly because it is fast and written in Python.

For our example, we assume that we want to train a simple auto-encoder model for our data X with the cross-entropy as the cost function. Just a quick reminder: the data in X is binary and very sparse. Therefore, this model might not be the best choice for this kind of data, but it is good enough for our example. For the sake of brevity, we will only sketch the code.

First, we initialize the required parameters of the auto-encoder:

import numpy as np
import theano
import theano.tensor as T

W = theano.shared(np.random.normal([..]))  # weight matrix
bv = theano.shared(np.zeros([..]))  # visible bias
bh = theano.shared(np.zeros([..]))  # hidden bias

Then, we define the cost function, which is done on symbolic variables:

x = T.matrix()  # symbolic placeholder for a batch of data
h = T.nnet.sigmoid(T.dot(x, W) + bh)  # encoder
x_hat = T.nnet.sigmoid(T.dot(h, W.T) + bv)  # decoder with tied weights
loss = T.nnet.binary_crossentropy(x_hat, x).mean()

The last step is to ‘calculate’ the gradient and to define the training function.

g_w, g_h, g_v = T.grad(loss, [W, bh, bv])  # symbolic gradients w.r.t. the parameters
lr = 0.01  # learning rate
train = theano.function([x], loss, updates=[(W, W - lr * g_w), (bv, bv - lr * g_v), (bh, bh - lr * g_h)])

We have left out lots of details on purpose, but those who are familiar with gradient descent and numpy should have no problems grasping the fundamental ideas.

The beauty of this approach is that you can define arbitrary cost functions and the library will do the hard work of optimizing them. All you have to do is call the new function train() on your data, as sketched below. Of course, lots of minor work still remains, but the burden of the optimization part is no longer on your back.
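A minimal sketch of such a training run might look like this; the number of epochs is an arbitrary choice here, and we assume X is a dense numpy array:

for epoch in range(100):
    cost = train(X)  # one full-batch gradient step, returns the current loss
    print('epoch %d: loss=%.4f' % (epoch, cost))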

A last note: this approach is also very portable, since you can easily extract the parameters of the learned model and use them without Theano, or even in a totally different programming language. We usually use JSON for the transfer since the format is very widespread and even human-readable.
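As a sketch, dumping the learned parameters could look like the following; the file name is of course arbitrary:

import json

# get_value() returns the underlying numpy array of a shared variable;
# tolist() converts it into plain nested lists that JSON can serialize.
params = {'W': W.get_value().tolist(),
          'bv': bv.get_value().tolist(),
          'bh': bh.get_value().tolist()}
with open('autoencoder.json', 'w') as f:
    json.dump(params, f)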
