Over the past few weeks, we have spent a lot of time tuning model parameters and evaluating new features. Especially when a minor parameter adjustment had such a drastic effect that we could hardly believe it was really the cause, we began to doubt whether this is truly a scientific discipline and not black art.
However, it was precisely the setbacks that helped us understand our model better. After further experiments confirmed that some parameters are very sensitive to adjustments, we started plotting gradients, creating histograms, and working on indicators to assess the quality of the model at a specific training stage, so that low-quality models can be rejected as early as possible.
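As a minimal sketch of such an indicator, the snippet below summarizes gradient magnitudes for one training step and flags runs that look unhealthy. The function name `gradient_health` and the thresholds are our own assumptions for illustration, not part of any framework:

```python
import numpy as np

def gradient_health(grads, explode_threshold=1e3, vanish_threshold=1e-7):
    """Summarize gradient magnitudes for one training step.

    `grads` is a list of gradient arrays, one per parameter tensor.
    Returns the global norm, a value histogram, and a health flag
    that can be used to reject a run early.
    """
    flat = np.concatenate([np.asarray(g).ravel() for g in grads])
    norm = float(np.linalg.norm(flat))
    # Histogram of raw gradient values, useful for spotting saturation.
    hist, edges = np.histogram(flat, bins=20)
    healthy = vanish_threshold < norm < explode_threshold
    return {"norm": norm, "hist": hist, "edges": edges, "healthy": healthy}

# Example: a well-behaved gradient versus an exploding one.
ok = gradient_health([np.full((3, 3), 0.01), np.full(5, -0.02)])
bad = gradient_health([np.full((3, 3), 1e4)])
print(ok["healthy"], bad["healthy"])  # True False
```

Logging this summary every few steps is cheap and gives an early signal long before validation metrics do.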
On the one hand, part of our problems are home-made, since we are sometimes forced to work with very small data sets and a limited number of features. On the other hand, some models clearly confirm that carefully selected parameters can lead to very good results. Of course, luck is also a variable here, since the sampling and the random initialization of the weight matrix are non-deterministic.
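At least the non-determinism can be tamed for debugging purposes by fixing the random seed. The following sketch, with a hypothetical `init_weights` helper, shows how a local seeded generator makes the weight initialization repeatable:

```python
import numpy as np

def init_weights(n_in, n_out, seed=42):
    # A local generator avoids touching global random state, so two
    # runs with the same seed produce identical weight matrices.
    rng = np.random.default_rng(seed)
    return rng.normal(0.0, 0.05, size=(n_in, n_out))

w1 = init_weights(4, 3)
w2 = init_weights(4, 3)
print(np.allclose(w1, w2))  # True: same seed, same initialization
```

This does not remove the luck, but it makes a lucky (or unlucky) run reproducible, which is what matters when isolating the effect of a single parameter.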
We could use standard procedures like grid search, but for larger models this can be very time-consuming. The same is true when we have to use a small learning rate to stabilize the training procedure. In other words, there is no holy numeric grail, and intuition plus experience is all we have to make the best of it.
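To make the cost argument concrete, here is a minimal grid-search sketch. The parameter names and the stand-in `evaluate` function are invented for illustration; the point is that every added axis multiplies the number of training runs:

```python
from itertools import product

# Hypothetical search space: 3 * 2 * 2 = 12 full training runs.
grid = {
    "learning_rate": [0.1, 0.01, 0.001],
    "hidden_units": [32, 64],
    "dropout": [0.0, 0.5],
}

def evaluate(params):
    # Stand-in for an expensive training run returning a validation
    # score; in reality this is where all the time is spent.
    return -abs(params["learning_rate"] - 0.01) - params["dropout"] * 0.1

keys = list(grid)
candidates = [dict(zip(keys, values)) for values in product(*grid.values())]
best = max(candidates, key=evaluate)
print(len(candidates), best["learning_rate"], best["dropout"])
```

With a small learning rate each of those runs also takes longer to converge, so the multiplicative blow-up hits twice.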