In general, we are glad that general-purpose toolkits exist to solve the wide range of problems we encounter every day. In machine learning especially, such toolkits let us evaluate different approaches quickly without implementing every algorithm by hand. Furthermore, many problems can be solved without modifying the framework's implementation, thanks to the wide range of available parameters. In other words, whenever possible it is a good idea to let experts implement the hard parts while we focus on tuning the algorithm's parameters for the task at hand.
However, sometimes the problem is very special, and so is the data. In that case, we have two options: either we sit down and implement one solution after another, or we use a toolkit that lets us state the problem in some abstract language and does the hard work for us. That sounds like magic, but at least for the pure optimization part, good solutions are available.
In the past, we already mentioned Theano, a library that helps to minimize a given cost function by combining automatic differentiation with highly optimized code, and that can also utilize the GPU. Why do we mention it here? In our case, the problem is that we can almost never use an off-the-shelf approach to train a model. That means we need to design a special cost function that describes the loss for certain inputs, and then we have to minimize it.
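To make that workflow concrete — define a custom cost, obtain its gradient, iterate — here is a framework-free sketch in plain Python. All names are hypothetical; the gradient is approximated numerically with central differences, which is exactly the step Theano replaces with symbolic automatic differentiation.

```python
def numerical_grad(cost, w, eps=1e-6):
    """Central-difference approximation of d(cost)/dw at w.
    This is the part Theano would derive symbolically for us."""
    return (cost(w + eps) - cost(w - eps)) / (2 * eps)

def minimize(cost, w0, learning_rate=0.1, steps=200):
    """Plain gradient descent on a single scalar parameter."""
    w = w0
    for _ in range(steps):
        w -= learning_rate * numerical_grad(cost, w)
    return w

# A custom cost: squared distance to 3 plus a small quadratic penalty.
cost = lambda w: (w - 3.0) ** 2 + 0.1 * w ** 2

w_opt = minimize(cost, w0=0.0)  # converges to about 2.727 (= 6 / 2.2)
```

The point of the sketch is the division of labor: we only had to write down `cost`; obtaining the gradient and running the descent loop are generic machinery.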
Even with Theano, we still have to write a fair amount of code, for instance support for mini-batches or procedures to decay the learning rate, but the hard part, the optimization and the calculation of the gradients, is taken off our shoulders. Some algorithms, like back-propagation, are notoriously difficult to implement correctly, and with Theano we can focus on the actual problem instead of the optimization machinery.
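The two pieces we still have to supply ourselves — mini-batch iteration and a learning-rate decay schedule — are short but easy to get subtly wrong. A minimal sketch, with hypothetical names and a simple 1/t-style decay as an assumed schedule:

```python
import random

def iterate_minibatches(data, batch_size, shuffle=True):
    """Yield successive mini-batches from a list of samples.
    The last batch may be smaller than batch_size."""
    indices = list(range(len(data)))
    if shuffle:
        random.shuffle(indices)
    for start in range(0, len(data), batch_size):
        yield [data[i] for i in indices[start:start + batch_size]]

def decayed_rate(initial_rate, decay, epoch):
    """1/t-style learning-rate decay: rate shrinks as epochs pass."""
    return initial_rate / (1.0 + decay * epoch)

# Typical use inside a training loop:
for epoch in range(3):
    rate = decayed_rate(0.1, decay=0.5, epoch=epoch)
    for batch in iterate_minibatches(list(range(10)), batch_size=3):
        pass  # compute gradients on `batch`, update with `rate`
```

Boilerplate like this stays the same no matter which cost function Theano is asked to differentiate.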
We have already mentioned several times that good features are essential to train good models, but it is just as important to use a proper cost function for the data at hand. Using Theano requires some skill, but once this hurdle is cleared, it is a very efficient helper for trying different cost functions with only minor modifications of the code.
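What "only minor modifications" means in practice: if the cost function is a plug-in component, swapping it touches a single line at the call site. A framework-free sketch with hypothetical names:

```python
# Two per-sample cost functions, interchangeable at the call site.
def squared_error(y_true, y_pred):
    return (y_true - y_pred) ** 2

def absolute_error(y_true, y_pred):
    return abs(y_true - y_pred)

def mean_cost(cost_fn, targets, predictions):
    """Average a per-sample cost over the whole data set."""
    return sum(cost_fn(t, p) for t, p in zip(targets, predictions)) / len(targets)

targets = [1.0, 2.0, 3.0]
predictions = [1.5, 2.0, 2.0]

mean_cost(squared_error, targets, predictions)   # -> 0.41666...
mean_cost(absolute_error, targets, predictions)  # -> 0.5
```

With Theano the benefit compounds: changing the symbolic cost expression automatically changes the gradients as well, with no further edits.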
Needless to say, things can still go wrong, but at this level of abstraction, bug hunting is much easier than finding an error in a manually derived gradient expression, which means, again, that we hopefully have more time for tuning and refining the approach.
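For completeness, here is how errors in a hand-derived gradient are usually caught when no automatic differentiation is available: gradient checking, i.e. comparing the analytic expression against a numerical estimate. The names are hypothetical and the buggy derivative is invented for illustration.

```python
def check_gradient(cost, analytic_grad, w, eps=1e-6, tol=1e-4):
    """Compare a hand-derived gradient against a central-difference
    estimate; returns True if they agree within tol."""
    numeric = (cost(w + eps) - cost(w - eps)) / (2 * eps)
    return abs(numeric - analytic_grad(w)) < tol

cost = lambda w: w ** 3
good_grad = lambda w: 3 * w ** 2   # correct derivative of w**3
buggy_grad = lambda w: 2 * w ** 2  # a plausible hand-derivation slip

check_gradient(cost, good_grad, 2.0)   # -> True
check_gradient(cost, buggy_grad, 2.0)  # -> False
```

With Theano this entire ritual disappears, since the gradients are derived mechanically from the cost expression itself.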