Gratitude For The Old

In the last post we discussed a problem that occurs when the first phase of learning has many ups and downs, which means the memory is readjusted a lot. In many of those cases the system calms down eventually, but the drawback is that rare labels are very likely to be removed and re-introduced several times, which makes it hard to learn a stable pattern for them.

The problem is that the age of every memory slot is always increased by one, regardless of how frequent its label is. In other words, if we have three labels with a distribution of 80/18/2, slots holding the third label easily grow old and become good candidates for replacement while the system is still trying to settle down.

The issue can be addressed by keeping track of how labels are distributed across the memory. The larger the share of the memory a label occupies, the higher its chance of being replaced should be, because several feature templates are already available for it. This should help to keep rare labels in memory long enough to learn a stable feature template for them.

The implementation is pretty easy. Instead of selecting the slot just by its age, we also take the label distribution into account:

n = argmax(A * T)

where A is the vector of slot ages, * is the element-wise product, and T is a vector of the same length as A whose i-th entry is the share of the memory held by the label of slot i (#label/#total).

For example, if a slot with a rare label has age=50 and t=0.2, and a slot with a frequent label has age=15 and t=0.8, the more frequent one gets replaced because 15*0.8=12 is larger than 50*0.2=10. And the good thing is that if all labels are distributed uniformly, T is constant and we get exactly the original method.
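To make the rule concrete, here is a minimal NumPy sketch of the selection step. It is only an illustration under a few assumptions: the function name select_slot and the arrays ages and labels are made up for this example, and the memory is reduced to one age and one label per slot. It replays the example above with five slots (four frequent, one rare) and then checks the uniform case.

```python
import numpy as np

def select_slot(ages, labels):
    """Pick the slot to replace via n = argmax(A * T).

    ages and labels are hypothetical per-slot arrays, not names
    taken from the actual implementation.
    """
    ages = np.asarray(ages, dtype=float)
    labels = np.asarray(labels)
    # Count how many slots each distinct label occupies.
    uniq, counts = np.unique(labels, return_counts=True)
    share_of = dict(zip(uniq.tolist(), (counts / labels.size).tolist()))
    # T[i] is the share of the memory held by the label of slot i.
    T = np.array([share_of[l] for l in labels.tolist()])
    return int(np.argmax(ages * T))

# Four slots with the frequent label 0 (t=0.8), one slot with the rare label 1 (t=0.2).
labels = [0, 0, 0, 0, 1]
ages = [15, 3, 7, 1, 50]

print(int(np.argmax(ages)))       # age only: 4, the rare slot is dropped
print(select_slot(ages, labels))  # weighted: 0, since 15*0.8=12 > 50*0.2=10

# Uniform labels: T is constant, so the choice matches plain argmax over the ages.
uniform_labels = [0, 1, 2, 3]
uniform_ages = [4, 9, 2, 7]
print(select_slot(uniform_ages, uniform_labels) == int(np.argmax(uniform_ages)))
```

Note that the weighting only shifts the decision: a rare slot that is old enough still gets replaced, so the memory cannot be clogged by stale rare entries forever.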
