In the last post we discussed a problem that occurs when the first phase of learning has many ups and downs, which means the memory is re-adjusted a lot. In many of those cases the system eventually calms down, but the drawback is that rare labels are very likely to be removed and re-introduced several times, which prevents the model from learning a stable pattern for them.
The problem is that the age of every memory slot is increased by one at each step, regardless of how frequent its label actually is. For example, if we have three labels distributed 80/18/2, slots holding label three easily grow old, because samples with that label rarely come along to refresh them. This makes them good candidates for replacement in the phase where the system is still trying to settle down.
The issue can be addressed by keeping a history of how labels are distributed across the memory. The more slots a label occupies, the higher the chance that one of them should be replaced, because several feature templates for that label are already available. This should help keep rare labels in memory long enough to learn a stable feature template for them.
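One way to keep such a history is a per-label counter over the memory slots that is updated whenever a slot is overwritten, so the portions never have to be recomputed from scratch. The sketch below assumes the memory exposes a plain list of slot labels; `LabelHistory`, `replace` and `portion` are hypothetical names, not part of any library or of the original code.

```python
from collections import Counter
from typing import Hashable, Iterable


class LabelHistory:
    """Hypothetical helper: tracks how many memory slots hold each label."""

    def __init__(self, slot_labels: Iterable[Hashable]) -> None:
        self.labels = list(slot_labels)       # label stored in each slot
        self.counts = Counter(self.labels)    # number of slots per label

    def replace(self, slot: int, new_label: Hashable) -> None:
        # Update the counts incrementally when a slot receives a new label.
        old_label = self.labels[slot]
        self.counts[old_label] -= 1
        if self.counts[old_label] == 0:
            del self.counts[old_label]
        self.counts[new_label] += 1
        self.labels[slot] = new_label

    def portion(self, label: Hashable) -> float:
        # Share of the memory occupied by `label`: #label / #total.
        # Counter returns 0 for labels that are not in memory.
        return self.counts[label] / len(self.labels)
```

For a memory holding `["a", "a", "a", "b"]`, `portion("a")` is 0.75; after `replace(0, "c")` it drops to 0.5 while `"c"` gets 0.25.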
The implementation is pretty easy. Instead of selecting the slot just by its age, we also consider the label distribution:
n = argmax(A * T)
where A is the vector of slot ages and T is a vector of the same length whose i-th entry is the portion #label/#total of the label stored in slot i; the product is taken element-wise.
For example, if a slot with a rare label has age=50 and t=0.2, and a slot with a frequent label has age=15 and t=0.8, the frequent one gets replaced because 15*0.8=12 is larger than 50*0.2=10. And the good thing is that if all labels are distributed uniformly, every entry of T is the same, so the ordering by age is unchanged and we get exactly the original method.
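A minimal sketch of the selection rule n = argmax(A * T) in plain Python, assuming ages and labels are kept in two parallel lists with one entry per memory slot. `select_slot` is a hypothetical helper; it recomputes the portions T from the labels on every call, which is simpler but slower than maintaining incremental counts.

```python
from collections import Counter
from typing import Hashable, Sequence


def select_slot(ages: Sequence[int], labels: Sequence[Hashable]) -> int:
    """Hypothetical helper: return n = argmax(A * T), the slot to overwrite.

    ages[i]   -- age of slot i (the vector A)
    labels[i] -- label stored in slot i; T[i] is that label's portion
    """
    if len(ages) != len(labels) or not ages:
        raise ValueError("ages and labels must be non-empty and equally long")
    counts = Counter(labels)
    total = len(labels)
    # Element-wise product A[i] * T[i] with T[i] = #label(i) / #total.
    scores = [age * counts[label] / total for age, label in zip(ages, labels)]
    # max() keeps the first maximum, so ties resolve to the lowest index.
    return max(range(len(scores)), key=scores.__getitem__)
```

With one rare slot of age 50 and four frequent slots (t=0.2 vs t=0.8), `select_slot([50, 15, 3, 2, 1], ["rare", "freq", "freq", "freq", "freq"])` returns 1, the frequent slot with age 15, whereas selecting by age alone would pick slot 0. With labels spread evenly, all T entries are equal and the result matches the purely age-based choice.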