Another reason why we believe that our data is special is that we used a fairly robust approach that is known to work very well on various kinds of data, but in our case, the results were awful. Stated differently, it seems that we cannot use off-the-shelf multi-layer networks because the semantics of our data set are not rich enough to learn good abstract representations of it.
It is the old story of learning good features versus crafting them yourself to solve the problem at hand. In our case, we have to create our own features until we figure out how to combine different sources of data to learn good ones. This seems like a drawback, but on the other hand, we have learned (and are still learning) a lot about the data itself.
In the end, we need some kind of data augmentation to create a larger data set, but also some form of enrichment to handle missing values. We agree with the opinion of some researchers that data augmentation is more of an art than a science, except in the image domain, where it is fairly straightforward.