Copyright © 2017 DataScience.US All Rights Reserved.
Computer vision refers to enabling computers to gain an understanding of the content in video or digital images. Computer vision algorithms have gotten fairly good at correctly classifying images, and can usually identify a cat as a cat. This matters because, for an autonomous vehicle to drive safely, its computer vision algorithms must correctly classify the different objects the vehicle will encounter. It would be extremely dangerous if an autonomous vehicle incorrectly identified a bag in the middle of the road as a child and suddenly veered off course.
However, recent work with adversarial examples casts doubt on the safety of traditional computer vision algorithms. Put simply, adversarial examples are images that have been specifically engineered to fool computer vision systems. The images are created with specific distortions that make the computer vision system classify them incorrectly, even though they look unchanged to a human. For example, an image of a cat can have perturbations applied to it that make the algorithm decide it is a computer monitor, not a cat.
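The article does not say how such distortions are computed. One widely used recipe is the fast gradient sign method (FGSM), which nudges every pixel by a small amount epsilon in whichever direction increases the classifier's loss. The sketch below applies it to a toy logistic-regression "cat detector" rather than a real neural network; the 1000-pixel image, the weights, and the epsilon value are all made up for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(weights, bias, x):
    """Probability that x is a 'cat' (class 1) under a logistic classifier."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)

def sign(g):
    return (g > 0) - (g < 0)

def fgsm(weights, bias, x, label, epsilon):
    """Move every pixel by +/- epsilon in the direction that increases the loss."""
    p = predict(weights, bias, x)
    # For cross-entropy loss, d(loss)/d(x_i) = (p - label) * w_i.
    grad = [(p - label) * w for w in weights]
    return [xi + epsilon * sign(g) for xi, g in zip(x, grad)]

# Hypothetical 1000-pixel "image" and hand-picked weights, for illustration only.
n = 1000
weights = [0.05 if i % 2 == 0 else -0.05 for i in range(n)]
bias = 0.0
x = [0.52 if i % 2 == 0 else 0.48 for i in range(n)]

epsilon = 0.03
x_adv = fgsm(weights, bias, x, label=1, epsilon=epsilon)

print(f"clean P(cat)       = {predict(weights, bias, x):.3f}")
print(f"adversarial P(cat) = {predict(weights, bias, x_adv):.3f}")
```

With these made-up numbers, no pixel moves by more than 0.03, yet the predicted cat probability falls from about 0.73 to about 0.38, because a thousand tiny changes add up in the weighted sum.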
When machine learning researchers began studying adversarial examples, it was thought that an adversarial example would only fool a neural network under lab conditions. This would make an attack on an autonomous vehicle with an adversarial example unlikely to succeed. After all, autonomous driving systems are different from raw neural networks. They have components that a standard neural network does not, such as multiple sensors collecting data from a variety of angles. Because they are moving, autonomous vehicles are also constantly taking many images of the same object. It was thought these properties would help safeguard autonomous driving systems from attackers, as an adversarial image would have to fool the system from many different angles and across repeated viewings.
However, work done by OpenAI demonstrates that adversarial examples can now be created which are robust enough to fool computer vision systems regardless of how the image is transformed by the system. In other words, the image could be zoomed in on, shifted from side to side, rotated, etc., but the neural network would still think it was a computer monitor, not a cat. This has somewhat disturbing implications for self-driving cars and other systems that require reliable image classifiers. Yet this was not the only revelation gained from recent work with adversarial examples.
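One way to build this kind of robustness, sketched here under toy assumptions, is to optimize the perturbation against the loss averaged over many transformed views instead of a single view (an idea usually called Expectation over Transformation). The example below uses a logistic classifier, circular pixel shifts as a stand-in for camera motion, and hand-picked weights; none of these details are taken from the OpenAI work itself.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def shift(x, k):
    """Circularly shift a 1-D 'image' by k pixels; a stand-in for camera motion."""
    k %= len(x)
    return x[-k:] + x[:-k]

def predict(weights, bias, x):
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)

def sign(g):
    return (g > 0) - (g < 0)

def input_gradient(weights, bias, x, label, k):
    """Loss gradient w.r.t. the unshifted input when the model sees shift(x, k)."""
    p = predict(weights, bias, shift(x, k))
    grad_view = [(p - label) * w for w in weights]  # gradient w.r.t. the shifted view
    return shift(grad_view, -k)                     # map it back onto the original pixels

def averaged_attack(weights, bias, x, label, epsilon, shifts):
    """One signed step along the loss gradient averaged over all given shifts."""
    avg = [0.0] * len(x)
    for k in shifts:
        grad = input_gradient(weights, bias, x, label, k)
        avg = [a + g / len(shifts) for a, g in zip(avg, grad)]
    return [xi + epsilon * sign(a) for xi, a in zip(x, avg)]

def fooled_views(weights, bias, x, shifts):
    """Shifts under which the model no longer predicts class 1."""
    return [k for k in shifts if predict(weights, bias, shift(x, k)) < 0.5]

# Hypothetical toy setup: a flat grey image and hand-picked weights.
n = 1000
weights = [0.09 if i % 2 == 0 else -0.01 for i in range(n)]
bias = -19.0
x = [0.5] * n
shifts = [0, 1, 2, 3]
epsilon = 0.03

single_view = averaged_attack(weights, bias, x, 1, epsilon, shifts=[0])
all_views = averaged_attack(weights, bias, x, 1, epsilon, shifts=shifts)

print("single-view attack fools shifts:", fooled_views(weights, bias, single_view, shifts))
print("averaged attack fools shifts:   ", fooled_views(weights, bias, all_views, shifts))
```

In this toy, the perturbation built from a single view only fools the classifier at shifts 0 and 2, while the one built from the averaged gradient fools it at every shift in the list. Real attacks of this kind iterate many such steps over rotations, scalings, and lighting changes rather than taking one step over four shifts.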
The previous research relied on having information about the target neural network in order to fool it. Other research looked at adversarial examples created without any information about the target network, and found that access to the underlying model is not needed to create a robust adversarial example. That is, it is possible to create transferable adversarial examples that fool a computer vision system even when it is a black box to the attacker. An attacker could theoretically generate an adversarial image at home with a different neural network and present it to the targeted neural network, which would incorrectly classify it.
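The transfer scenario can be sketched with two small models trained on different data: the attacker crafts the example against their own surrogate and only then shows it to the target. Everything here is a hypothetical toy (bias-free logistic classifiers, random 20-feature points, an FGSM-style step), and the perturbation is deliberately large because such a low-dimensional linear model has little room for subtle attacks.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(weights, x):
    """P(class 1) for a bias-free logistic classifier."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)))

def sign(g):
    return (g > 0) - (g < 0)

def train(data, steps=200, lr=0.5):
    """Plain gradient descent on the mean cross-entropy loss."""
    dim = len(data[0][0])
    weights = [0.0] * dim
    for _ in range(steps):
        grad = [0.0] * dim
        for x, y in data:
            err = predict(weights, x) - y
            grad = [g + err * xi / len(data) for g, xi in zip(grad, x)]
        weights = [w - lr * g for w, g in zip(weights, grad)]
    return weights

def fgsm(weights, x, label, epsilon):
    """Signed-gradient step computed from the given model's weights only."""
    p = predict(weights, x)
    return [xi + epsilon * sign((p - label) * w) for xi, w in zip(x, weights)]

def dataset(points):
    """Class 1 = the given points, class 0 = their mirror images."""
    return [(p, 1) for p in points] + [([-v for v in p], 0) for p in points]

rng = random.Random(0)
dim = 20
points = [[rng.uniform(0.1, 1.0) for _ in range(dim)] for _ in range(20)]

surrogate = train(dataset(points[:10]))  # the attacker's own model
target = train(dataset(points[10:]))     # victim model; its weights are never read by the attack

x = [rng.uniform(0.05, 0.15) for _ in range(dim)]  # a low-margin class-1 input
epsilon = 0.2
x_adv = fgsm(surrogate, x, label=1, epsilon=epsilon)

print(f"target on clean input:       {predict(target, x):.3f}")
print(f"target on transferred input: {predict(target, x_adv):.3f}")
```

The attack function only ever receives `surrogate`, yet the resulting input also pushes `target` to the wrong side of its decision boundary, because both models learned similar weight directions from similar data.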
Even more distressingly, it seems that direct access to a neural network is not needed to fool a computer vision system. In other words, the software of a self-driving car would not have to be hacked to fool the neural net; physical objects that deceive the network can simply be placed in front of a camera.
In a paper called “Adversarial Examples in the Physical World”, researchers from Google and OpenAI tested whether physical adversarial examples can still deceive image classification systems. The researchers created adversarial examples and then printed them out on paper, right next to unaltered versions of the same images. They then stuck the printouts up on a wall and took a photo of them. The photo of the printed images was then passed into the classifier, which misclassified the printed adversarial example.
What are the potential solutions that could be employed to defend against the malicious use of adversarial examples? Currently it seems that the best solutions are simply to change how neural networks classify images, or else to train neural networks to detect adversarial examples.
One idea for defending against adversarial examples is to train networks to classify them correctly, simply by generating a large number of them and including them in many training passes through the network. This type of adversarial training can defend against some adversarial examples, but not all of them. The approach also cannot defend against black-box attacks that use adversarial examples crafted against a similar but different neural network.
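A minimal sketch of such a training loop, assuming a bias-free logistic classifier and FGSM-style adversarial examples that are regenerated against the current weights at every step. The eight-point dataset, epsilon, step count, and learning rate are hypothetical choices for illustration, not values from any cited work.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(weights, x):
    """P(class 1) for a bias-free logistic classifier."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)))

def sign(g):
    return (g > 0) - (g < 0)

def fgsm(weights, x, label, epsilon):
    """Shift each feature by +/- epsilon in the loss-increasing direction."""
    p = predict(weights, x)
    return [xi + epsilon * sign((p - label) * w) for xi, w in zip(x, weights)]

def adversarial_train(data, epsilon, steps=200, lr=0.5):
    """Gradient descent on clean examples plus freshly crafted adversarial copies."""
    dim = len(data[0][0])
    weights = [0.0] * dim
    for _ in range(steps):
        # Craft adversarial copies against the *current* weights, keeping true labels.
        batch = data + [(fgsm(weights, x, y, epsilon), y) for x, y in data]
        grad = [0.0] * dim
        for x, y in batch:
            err = predict(weights, x) - y
            grad = [g + err * xi / len(batch) for g, xi in zip(grad, x)]
        weights = [w - lr * g for w, g in zip(weights, grad)]
    return weights

def accuracy(weights, examples):
    return sum((predict(weights, x) > 0.5) == (y == 1) for x, y in examples) / len(examples)

# Hypothetical toy data: class 1 has positive features, class 0 is mirrored.
positives = [[1.0, 0.8], [0.8, 1.2], [1.2, 1.0], [0.9, 1.1]]
data = [(p, 1) for p in positives] + [([-v for v in p], 0) for p in positives]

epsilon = 0.3
weights = adversarial_train(data, epsilon)
attacked = [(fgsm(weights, x, y, epsilon), y) for x, y in data]

print("accuracy on clean examples:", accuracy(weights, data))
print("accuracy on FGSM examples: ", accuracy(weights, attacked))
```

On this easily separable toy data the trained model classifies every attacked point correctly. That only shows the loop runs as intended; it says nothing about how robust the method is on real images.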
Another method that could potentially defend against black-box adversarial examples is Ensemble Adversarial Training. Ensemble Adversarial Training applies the adversarial training method to multiple different networks, usually between 2 and 5 of them. Samples from these networks are then used to train a new network, and this approach of employing multiple networks usually results in improved classification of adversarial examples. In one instance of Ensemble Adversarial Training, the error rate fell from 15.5% for adversarial training of a single model to 3.9% for the ensemble adversarially trained model.
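Loosely following the description above, the sketch below trains a new model on adversarial examples crafted against several fixed, already-trained models as well as against itself, rotating the source at each step. The three `static_models`, the `held_out` attacker model, and the tiny dataset are hypothetical stand-ins for real pre-trained networks and image data, and the rotation schedule is an assumption rather than a documented procedure.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(weights, x):
    """P(class 1) for a bias-free logistic classifier."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)))

def sign(g):
    return (g > 0) - (g < 0)

def fgsm(weights, x, label, epsilon):
    """Shift each feature by +/- epsilon in the loss-increasing direction."""
    p = predict(weights, x)
    return [xi + epsilon * sign((p - label) * w) for xi, w in zip(x, weights)]

def ensemble_adversarial_train(data, static_models, epsilon, steps=200, lr=0.5):
    """Adversarial training where the attack source rotates over several models."""
    dim = len(data[0][0])
    weights = [0.0] * dim
    for step in range(steps):
        # Sources: the fixed pre-trained models plus the model being trained.
        sources = static_models + [weights]
        source = sources[step % len(sources)]
        batch = data + [(fgsm(source, x, y, epsilon), y) for x, y in data]
        grad = [0.0] * dim
        for x, y in batch:
            err = predict(weights, x) - y
            grad = [g + err * xi / len(batch) for g, xi in zip(grad, x)]
        weights = [w - lr * g for w, g in zip(weights, grad)]
    return weights

def accuracy(weights, examples):
    return sum((predict(weights, x) > 0.5) == (y == 1) for x, y in examples) / len(examples)

# Hypothetical toy data: class 1 has positive features, class 0 is mirrored.
positives = [[1.0, 0.8], [0.8, 1.2], [1.2, 1.0], [0.9, 1.1]]
data = [(p, 1) for p in positives] + [([-v for v in p], 0) for p in positives]

static_models = [[1.0, 0.5], [0.4, 1.2], [1.0, -0.2]]  # stand-ins for pre-trained networks
held_out = [0.7, 0.9]                                  # unseen model used by a black-box attacker
epsilon = 0.3

weights = ensemble_adversarial_train(data, static_models, epsilon)
attacked = [(fgsm(held_out, x, y, epsilon), y) for x, y in data]

print("accuracy on examples transferred from the held-out model:", accuracy(weights, attacked))
```

Evaluating against a model that was not among the training sources mimics the black-box setting. On this toy data every transferred example is still classified correctly, which illustrates the mechanics only and does not reproduce the error rates quoted above.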
While Ensemble Adversarial Training provides the best defense known so far against adversarial examples, it still leaves some holes open which could be used by an attacker. For this reason, it is critical that data scientists and researchers continue to work on methods that can defend against the exploitation of image classification systems by adversarial examples.