Let us assume that we want to train different models, each with a dedicated cost function. Wouldn’t it be great if we could focus on the actual modeling while someone else took care of the algorithmic part, including the calculation of the gradient? Apart from the fact that a hand-derived gradient might contain errors, a computer can often also simplify and optimize the resulting gradient expressions automatically. There are several packages that assist users with symbolic differentiation, but we will focus only on Theano, mainly because it is fast and written in Python.
For our example, we assume that we want to train a simple auto-encoder on our data X, with the cross-entropy as the cost function. As a quick reminder: the data in X is binary and very sparse. This model might therefore not be the best choice for this kind of data, but it is good enough for our example. For the sake of brevity, we will only sketch the code.
First, we initialize the required parameters of the auto-encoder:
W = theano.shared(np.random.normal(size=[..]))
bv = theano.shared(np.zeros([..]))  # visible bias
bh = theano.shared(np.zeros([..]))  # hidden bias
Then, we have to define the cost function, which is done on symbolic variables:
x = T.matrix()
h = T.nnet.sigmoid(T.dot(x, W) + bh)
x_hat = T.nnet.sigmoid(T.dot(h, W.T) + bv)
loss = T.nnet.binary_crossentropy(x_hat, x).mean()
The last step is to ‘calculate’ the gradient and to define the training function.
g_W, g_bh, g_bv = T.grad(loss, [W, bh, bv])
lr = 0.01
train = theano.function([x], loss, updates=[(W, W - lr*g_W), (bv, bv - lr*g_bv), (bh, bh - lr*g_bh)])
We have left out lots of details on purpose, but anyone familiar with gradient descent and numpy should have no problem grasping the fundamental ideas.
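To make explicit what Theano is doing for us behind the scenes, here is a minimal NumPy-only sketch of the same tied-weight auto-encoder with the gradients derived by hand. All shapes, the toy data, and the learning rate are assumptions chosen for illustration; the point is precisely that this is the error-prone work T.grad spares us.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, hid = 8, 10, 4  # samples, visible units, hidden units (toy sizes)

# hypothetical toy data: binary and very sparse, like X in the text
X = (rng.random((n, d)) < 0.2).astype(float)

# parameters, mirroring the Theano shared variables
W = rng.normal(0, 0.1, (d, hid))
bv = np.zeros(d)    # visible bias
bh = np.zeros(hid)  # hidden bias

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(X, W, bv, bh):
    h = sigmoid(X @ W + bh)        # encoder
    x_hat = sigmoid(h @ W.T + bv)  # decoder with tied weights
    return h, x_hat

def loss(X, W, bv, bh):
    eps = 1e-12  # avoid log(0)
    _, x_hat = forward(X, W, bv, bh)
    return -np.mean(X * np.log(x_hat + eps) + (1 - X) * np.log(1 - x_hat + eps))

def grads(X, W, bv, bh):
    n, d = X.shape
    h, x_hat = forward(X, W, bv, bh)
    # sigmoid + cross-entropy yields the simple error term (x_hat - x)
    g2 = (x_hat - X) / (n * d)
    g_bv = g2.sum(axis=0)
    g1 = (g2 @ W) * h * (1 - h)    # backprop through the decoder into h
    g_bh = g1.sum(axis=0)
    g_W = g2.T @ h + X.T @ g1      # tied weights: decoder part + encoder part
    return g_W, g_bh, g_bv

# a few steps of plain gradient descent, as train() would perform
lr = 0.5
before = loss(X, W, bv, bh)
for _ in range(200):
    g_W, g_bh, g_bv = grads(X, W, bv, bh)
    W -= lr * g_W
    bh -= lr * g_bh
    bv -= lr * g_bv
after = loss(X, W, bv, bh)
print(before, after)
```

Note how the tied weight matrix W receives gradient contributions from both the encoder and the decoder, a detail that is easy to get wrong by hand and trivial with automatic differentiation.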
The beauty of this approach is that you can define arbitrary cost functions and the library will do the hard work. All you have to do is call the new function train() to fit a model to your data. Of course, plenty of minor work remains, but the burden of the optimization part is no longer on your back.
A last note: this approach is also very portable, since you can easily extract the parameters of the learned models to use them without Theano or even in a totally different programming language. We usually use JSON for such a portable transfer, since the format is widespread and even human-readable.