About a week ago, version 0.2.0 of the framework was released, and since we had encountered some minor problems, we decided to test it. For convenience, we used pip to perform the update. It should be noted that our environment is Python 2.7 with no GPU support. Since the first link did not work (it reported no support for our environment), we tried the second link, which seemed to work. Everything looked fine and we could execute a trained network without any problems. However, when we tried to train our network again, the process aborted with an “illegal instruction”. We could have tried conda, but we decided to compile the source from scratch to best match our environment.
To avoid messing up the system-wide installation, we used $ python setup.py install --user. After the couple of minutes it took to compile the code, we got a ‘finished’ message and no errors. The test part of the network still worked and now, to our satisfaction, the training worked again as well. So we considered this step successful, although we have the feeling that the selected BLAS routines are a little slower than in the old version. However, this needs further investigation before we can confirm it.
Bottom line: despite the coolness of the framework, an update does not seem to be straightforward for all environments as far as the available pre-built packages are concerned. However, since building from source works like a charm on a fairly default system, we can “always” use this option as a fallback.
In the last post we discussed a problem that occurs when the first phase of learning has many ups and downs, which means the memory is re-adjusted a lot. In many of those cases the system eventually calms down, but the drawback is that rare labels are very likely to be removed and re-introduced several times, which prevents a stable pattern from being learned for them.
The problem is that the age of every memory slot is always increased by one, regardless of how frequent the actual label is. In other words, if we have three labels with a distribution of 80/18/2, slots with the third label easily grow old and become prime candidates for replacement, precisely in the phase where the system tries to settle down.
The issue can be addressed by keeping a history of how the labels are distributed across the memory. The more slots a label occupies, the higher its chance of being replaced should be, because several feature templates are available for it. This should help to keep rare labels in memory long enough to learn a stable feature template for them.
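As a minimal sketch of this bookkeeping (the function name and the plain numpy arrays are our own choices, not the framework's API), the label portion of each slot can be derived from a simple histogram:

import numpy as np

def label_portions(slot_labels, num_labels):
    # histogram of how often each label occupies the memory
    counts = np.bincount(slot_labels, minlength=num_labels)
    # fraction #label/#total, mapped back to one value per slot
    return counts[slot_labels] / float(slot_labels.size)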
The implementation is pretty easy. Instead of selecting the slot by its age alone, we also consider the label distribution:
n = argmax(A * T)
where A is the vector of slot ages and T is a vector of the same length, filled with the label portion #label/#total for each slot.
For example, if a rare label has age=50 but t=0.2, and a frequent label has age=15 but t=0.8, the slot of the more frequent label gets replaced, because 15*0.8=12 is larger than 50*0.2=10. And the good thing is that if all labels are distributed uniformly, we get back exactly the original method.
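To make the rule concrete, here is a small sketch of the selection step, again with our own names and plain numpy instead of the framework's tensors, including the two cases from above as sanity checks:

import numpy as np

def select_slot(ages, portions):
    # n = argmax(A * T): the oldest slot wins, but rare labels are discounted
    return np.argmax(ages * portions)

# the example from above: the frequent slot (15*0.8=12) beats
# the rare one (50*0.2=10) and is therefore replaced
ages = np.array([50.0, 15.0])
portions = np.array([0.2, 0.8])
assert select_slot(ages, portions) == 1

# with a uniform distribution the rule reduces to the original age-only method
assert select_slot(ages, np.array([0.5, 0.5])) == np.argmax(ages)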