Some Thoughts About Adversarial Examples

In the previous post we introduced the problem of blind spots in neural nets. The theory of why these spots exist has shifted over the years and was eventually traced to the piecewise linearity of most networks (ReLU activations). Besides severe failures, like a stop sign that is classified as a yield sign by a driverless car, or a slightly perturbed picture that is completely misclassified, there are further implications.

We have already mentioned several times that a label carries very little information and thus it is not obvious which aspects of the data a network learned in order to explain the labels. For example, a "hidden" noise pattern that is present in all data of one label is sufficient to explain that label. Of course this is not realistic, but since we do not have much control over how the network learns a mapping, adversarial examples may be a good indication that most networks still do not fully understand the actual problem they are trying to solve.
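To make the "hidden pattern" argument concrete, here is a small toy sketch that is not from the original post and uses only illustrative names and values: a fixed pattern is planted into every example of one class, and a plain least-squares linear classifier separates the classes by detecting that pattern alone, without any notion of what the data "really" contains.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 50

# Two classes of otherwise indistinguishable random data.
x0 = rng.normal(size=(n, d))
x1 = rng.normal(size=(n, d))

# Plant a fixed "hidden" pattern only into the data of class 1.
pattern = 0.5 * rng.normal(size=d)
x1 += pattern

X = np.vstack([x0, x1])
y = np.array([0] * n + [1] * n)

# A simple least-squares linear classifier latches onto the planted pattern.
w, *_ = np.linalg.lstsq(X, 2.0 * y - 1.0, rcond=None)
acc = ((X @ w > 0).astype(int) == y).mean()
print(f"training accuracy: {acc:.2f}")  # high -- the pattern alone explains the label
```

The classifier reaches a high training accuracy, yet it has learned nothing except the planted pattern, which is exactly the kind of shortcut the paragraph above worries about.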

In other words, if the difference between two labels can be explained with very few bits, a supervised model will stop learning as soon as the loss is zero and all predictions are correct. To say that such a model understands the data is absurd; it just learned a mapping from the data to the labels. With more data and labels, a model has to learn many patterns to achieve a perfect prediction, but not all labels might be equally challenging, which affects the generalization to new data.

Combined with the piecewise linearity of networks, a small but guided change of the data might move an example across the decision boundary into the region of a different label. The problem is that such new examples, not seen during training, get a very high confidence, but for the wrong label.
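As a rough sketch of how such a guided change can be computed, the snippet below implements the well-known fast gradient sign method in PyTorch. The classifier `model`, the input batch `x`, the labels `y` and the step size `eps` are assumptions for illustration, not part of the original post.

```python
import torch
import torch.nn.functional as F

def fgsm_perturb(model, x, y, eps=0.01):
    """Return a perturbed copy of x that tends to push the model toward a wrong label."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)  # loss with respect to the correct labels
    loss.backward()
    # One small step in the direction that increases the loss the fastest,
    # exploiting the (locally) linear behaviour of the network.
    return (x_adv + eps * x_adv.grad.sign()).detach()
```

A network that assigns high confidence to `model(fgsm_perturb(model, x, y))` for a label different from `y` shows exactly the behaviour described above.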

Right now it is not clear whether the flaw can be fully fixed with customized training procedures, or whether it is intrinsic, which would require more drastic changes. Classification is a very important problem, but we are concerned about the missing flexibility of such models in general. Stated differently, if a model is not forced to learn the concept behind the data, including the primitives and the ability to consider all important "parts" for the final classification, we have to trust a black box to figure out all the important parts on its own, without any guidance or hints.
