Pradyumna Shome

Adversarial Machine Learning: An Overview

Explaining the latest and greatest ways people have been fooling your neighborhood neural networks

Photo by evStyleDesign on Unsplash

Machine learning is ubiquitous in the modern world. It powers recommendation engines in the entertainment industry, authenticates users on mobile devices and at building entrances, and is now being put to use in critical applications such as healthcare and self-driving vehicles.

Given its increasing adoption, it is important to understand the many security and privacy issues that its deployment raises. In recent years, adversarial machine learning has brought many of these issues into the limelight.

What is adversarial machine learning?

Adversarial machine learning consists of techniques used to deceive models into producing incorrect output and, more generally, to degrade the performance of a machine learning model.

At the heart of any machine learning algorithm is data. In regression, we want to model an approximate function that is robust to noise present in the training data. In classification, a model attempts to learn feature representations that capture what it means to belong to each label. Artificial intelligence today is dominated by deep neural networks due to their superior performance on many tasks compared with classical machine learning techniques.

Given their limited theoretical foundations and poor interpretability, deep neural networks by themselves provide no guarantees about the confidentiality of their training data, nor can they assure sustained high performance. Adversarial machine learning exploits these gaps: attacks can steal weights from models or greatly reduce the accuracy of deployed models, while related privacy-preserving techniques allow mutually distrusting parties to train a model without sharing their training data.

Research areas

Here are a few broad areas that are currently under study:

Methods disrupting ML training

Poisoning attacks attempt to influence the training process to negatively impact a model’s accuracy.

When machine learning is used to filter out certain behaviors or structured objects, as in spam detection or face recognition, these systems are often set up to learn continuously. Consider the classic example of spam email detection. Every time an email is received and the model classifies it as spam or not spam, it updates its representation of what is considered spam. An adversary can send a potential victim emails that very closely resemble spam but are cleverly crafted to avoid being filtered out.

Over time, the model updates itself to become more tolerant of new spam-like emails. What has been achieved is a contamination of the dataset, sometimes referred to as boiling the frog or creeping normality: a process by which a major change that would otherwise be considered unacceptable comes to be accepted when it is made as a series of small, unnoticeable steps.
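To make the boiling-the-frog dynamic concrete, here is a minimal sketch under loudly stated assumptions: a hypothetical nearest-centroid filter over a single made-up "spamminess" score that retrains on its own predictions, and an attacker who can probe where the decision boundary currently sits. The names `classify`, `learn`, and `boil_the_frog` are illustrative and do not come from any real spam filter.

```python
def classify(x, ham_c, spam_c):
    # Hypothetical nearest-centroid rule on a single "spamminess" feature
    return "spam" if abs(x - spam_c) < abs(x - ham_c) else "ham"


def learn(x, ham_c, spam_c, rate=0.1):
    # Continuous learning: each email nudges the centroid of its *predicted* class
    if classify(x, ham_c, spam_c) == "ham":
        ham_c += rate * (x - ham_c)
    else:
        spam_c += rate * (x - spam_c)
    return ham_c, spam_c


def boil_the_frog(n_emails=100, ham_c=0.0, spam_c=10.0, margin=0.5):
    # Assumption: the attacker knows (or probes) the current boundary and places
    # each email just on the "ham" side of it, so every email is accepted and
    # drags the ham centroid a little further toward spam
    for _ in range(n_emails):
        boundary = (ham_c + spam_c) / 2
        ham_c, spam_c = learn(boundary - margin, ham_c, spam_c)
    return ham_c, spam_c


print(classify(8.0, 0.0, 10.0))       # spam: the filter starts out working
ham_c, spam_c = boil_the_frog()
print(classify(8.0, ham_c, spam_c))   # ham: the same email now gets through
```

No single poisoned email looks out of place, yet after 100 of them the ham centroid has crept most of the way toward the spam centroid, so an email that was previously filtered is accepted.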

Spam filters are composed of classifiers whose decision boundaries can be shifted over time. Image from Poison Frogs! Targeted Clean-Label Poisoning Attacks on Neural Networks

In another work, Game of Threads: Enabling Asynchronous Poisoning Attacks, asynchronous stochastic gradient descent is exploited to cause precise misclassifications by deep neural networks. In asynchronous training, an increasingly popular training method, many threads independently train local models and update the global model parameters in a racy fashion, without synchronizing with one another.

A powerful adversary with control of the operating system, such as a cloud computing provider, can adversarially schedule (pause and resume) threads to cause large updates to model parameters, drastically affecting model performance. Not only can such an adversary cause arbitrary changes to predictions, it can also make a classifier predict adversarially chosen labels 40% of the time.

A visualization of the large parameter update caused by pausing a thread at the beginning of training and resuming it much later. Image from ASPLOS 2020 talk.

Learning rate decay is the practice of reducing the learning rate over time. If an attacker pauses many threads early in training and releases them much later, those threads apply their stale updates with the relatively high learning rate they started with, which can cause huge updates to the model.
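The effect of a paused thread can be illustrated with a toy simulation. This is not the Game of Threads attack itself: the sketch below assumes a one-parameter model with loss (w - 3)^2, an exponential decay schedule chosen purely for illustration, and a single thread that computes its gradient at step 0 but is only resumed after the other threads have trained for 100 steps. All function names and constants are hypothetical.

```python
def lr_at(step, lr0=0.4, decay=0.95):
    # Assumed exponential learning-rate decay schedule (illustrative values)
    return lr0 * decay ** step


def grad(w, target=3.0):
    # Gradient of the toy loss (w - target)^2, which is minimized at w = target
    return 2.0 * (w - target)


def train(steps=100, w0=-10.0, stale=None):
    """Run SGD on the toy loss, then optionally apply one stale update.

    stale=None     -> clean run, no paused thread
    stale="old_lr" -> paused thread applies its step-0 gradient with the step-0 learning rate
    stale="new_lr" -> same stale gradient, scaled by the learning rate at resume time
    """
    w = w0
    stale_grad = grad(w)  # the paused thread reads the parameters before training starts
    for step in range(steps):
        w -= lr_at(step) * grad(w)  # updates from the threads that keep running
    if stale == "old_lr":
        w -= lr_at(0) * stale_grad
    elif stale == "new_lr":
        w -= lr_at(steps) * stale_grad
    return w


print(train())                # roughly 3.0: training converged
print(train(stale="old_lr"))  # roughly 13.4: the stale, high-learning-rate update
print(train(stale="new_lr"))  # roughly 3.06: same gradient, decayed learning rate
```

The contrast between the last two runs is the point: the stale gradient alone is not catastrophic, but applying it with the learning rate from the start of training throws the converged parameter far from its optimum.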

Generating adversarial inputs

Object recognition models can be fooled merely by making small perturbations to their inputs.

In One Pixel Attack for Fooling Deep Neural Networks, researchers found that changing just a single pixel in an image can cause it to be misclassified, with the pixel chosen through an optimization method called differential evolution. Differential evolution optimizes a real-valued function against a given measure of quality without needing gradients: it maintains a population of candidate solutions and creates new candidates by combining existing ones, which lets it search a large solution space quickly. Their results indicate that 68% of images in the CIFAR-10 dataset and 16% of images in ImageNet (ILSVRC 2012) can be misclassified into at least one other label by modifying a single pixel.
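The sketch below shows the general shape of a differential-evolution search for a one-pixel attack, assuming a toy 4x4 grayscale image and a hypothetical linear classifier in place of a deep network. Each candidate encodes a pixel position plus a new value, and the search only queries the classifier's score. It uses the classic DE/rand/1 mutation with binomial crossover; the function names and hyperparameters (`pop_size`, `generations`, `f`, `cr`) are illustrative choices, not those of the paper.

```python
import random

SIZE = 4  # toy 4x4 grayscale "image", pixel values in [0, 1]


def score(image, weights, bias):
    # Hypothetical linear classifier: score > 0 means class 1, otherwise class 0
    return sum(w * x for w, x in zip(weights, image)) + bias


def apply_pixel(image, candidate):
    # A candidate (row, col, value) overwrites exactly one pixel
    row, col, value = candidate
    perturbed = list(image)
    perturbed[int(row) * SIZE + int(col)] = value
    return perturbed


def one_pixel_attack(image, weights, bias, pop_size=30, generations=100,
                     f=0.5, cr=0.7, seed=0):
    rng = random.Random(seed)
    bounds = [(0.0, SIZE - 1e-9), (0.0, SIZE - 1e-9), (0.0, 1.0)]
    sign = 1.0 if score(image, weights, bias) > 0 else -1.0

    def fitness(candidate):
        # Lower is better: drive the score toward the other class
        return sign * score(apply_pixel(image, candidate), weights, bias)

    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    fit = [fitness(c) for c in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Mutation (DE/rand/1): a + f * (b - c) from three other candidates
            a, b, c = rng.sample([p for j, p in enumerate(pop) if j != i], 3)
            forced = rng.randrange(len(bounds))  # ensure at least one gene mutates
            trial = []
            for k, (lo, hi) in enumerate(bounds):
                if k == forced or rng.random() < cr:  # binomial crossover
                    v = a[k] + f * (b[k] - c[k])
                    if not lo <= v <= hi:
                        v = rng.uniform(lo, hi)  # resample genes that leave the bounds
                    trial.append(v)
                else:
                    trial.append(pop[i][k])
            # Selection: the trial replaces its parent if it is no worse
            t = fitness(trial)
            if t <= fit[i]:
                pop[i], fit[i] = trial, t
    best = min(range(pop_size), key=fit.__getitem__)
    return pop[best], apply_pixel(image, pop[best])


# Usage: the toy model only looks at pixel (2, 1), index 9
weights = [0.0] * 16
weights[9] = -3.0
image = [0.5] * 16
bias = 1.75
print(score(image, weights, bias))  # 0.25 -> class 1
candidate, adv = one_pixel_attack(image, weights, bias)
print(candidate, score(adv, weights, bias))
```

Because differential evolution needs only the model's output, the same loop works for a real network by swapping in its class probabilities as the fitness, which is what makes this style of attack attractive in black-box settings.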

Another dangerous attack involved fooling the recognition of street signs, a task autonomous vehicles depend on. By sticking black and white pieces of tape to stop signs in a graffiti-like fashion, researchers made a deep learning road-sign classifier interpret the signs as ones specifying a 45 mph speed limit.

Adversarial perturbations that change the way a stop sign is interpreted. Image from SunJackson Blog

Adding bottle images to a stop sign makes YOLO classify it as a human. See this short video to visualize the algorithm’s real-time output.

Secure multi-party computation

Due to advances in virtualization and distributed systems, computation is increasingly performed in the cloud. In the machine learning context, doing so requires uploading one's model and input data, essentially revealing this information to the cloud provider. While this may be acceptable in some contexts, in critical applications such as ML for healthcare, users often need to keep their input data private.

How does one compute in this setting?

Homomorphic encryption is one attempt to solve this problem: it allows computation on encrypted data. Consider a trivial example where a client wants to compute the sum of two extremely large numbers x and y and, to speed up computation, wants to execute this on the cloud. The client can send Enc(x) and Enc(y), where Enc(z) denotes the encryption of z under some encryption algorithm. The challenge is to find a function eval such that, for any x and y, eval(Enc(x), Enc(y)) is an encryption of x + y, so the cloud can compute the sum without ever seeing x or y.
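For the sum example, one concrete eval comes from the Paillier cryptosystem, which is additively homomorphic: multiplying two ciphertexts modulo n² produces an encryption of the sum of the plaintexts. The sketch below is a textbook version with tiny hard-coded primes and the common choice g = n + 1; it is insecure and meant only to show that the cloud can add numbers it cannot read. The function names are hypothetical.

```python
import math
import random


def keygen(p=293, q=433):
    # Toy primes for illustration only; real deployments use primes of 1024+ bits
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # valid because g = n + 1 (needs Python 3.8+)
    return n, (lam, mu)


def encrypt(n, m, rng=random):
    # Enc(m) = g^m * r^n mod n^2, with g = n + 1 and random r coprime to n
    n2 = n * n
    while True:
        r = rng.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2


def decrypt(n, priv, c):
    # m = L(c^lam mod n^2) * mu mod n, where L(u) = (u - 1) // n
    lam, mu = priv
    u = pow(c, lam, n * n)
    return ((u - 1) // n * mu) % n


def eval_add(n, c1, c2):
    # The cloud's "eval": multiplying ciphertexts adds the plaintexts (mod n)
    return (c1 * c2) % (n * n)


n, priv = keygen()
cx, cy = encrypt(n, 1234), encrypt(n, 5678)
c_sum = eval_add(n, cx, cy)      # computed without decrypting anything
print(decrypt(n, priv, c_sum))   # 6912 = 1234 + 5678
```

Only the client holds `priv`; the cloud sees `cx`, `cy`, and `n` and performs a single modular multiplication.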

It turns out that many cryptosystems are already partially homomorphic: they support one operation on ciphertexts. Textbook RSA, for example, is multiplicatively homomorphic, since Enc(x) · Enc(y) mod n = Enc(x · y mod n), while the Paillier cryptosystem is additively homomorphic. For a long time, researchers tried to find a scheme whose eval could compute an arbitrary function on encrypted inputs, which is known as fully homomorphic encryption (FHE).
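The multiplicative homomorphism of textbook RSA can be checked directly with the classic toy key p = 61, q = 53, e = 17. This is unpadded RSA with insecure parameters, used only for illustration; padded schemes such as RSA-OAEP deliberately do not preserve this property.

```python
def rsa_keygen():
    # Classic textbook toy parameters (insecure, illustration only)
    p, q, e = 61, 53, 17
    n = p * q                 # modulus: 3233
    phi = (p - 1) * (q - 1)   # 3120
    d = pow(e, -1, phi)       # private exponent: 2753 (needs Python 3.8+)
    return n, e, d


def enc(m, n, e):
    return pow(m, e, n)


def dec(c, n, d):
    return pow(c, d, n)


n, e, d = rsa_keygen()
c = (enc(7, n, e) * enc(11, n, e)) % n  # multiply the ciphertexts only
print(dec(c, n, d))  # 77, i.e. 7 * 11 (holds while the product stays below n)
```

The same trick does not give addition: there is no comparably simple operation on RSA ciphertexts that decrypts to x + y, which is exactly the gap fully homomorphic schemes aim to close.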

In 2009, Craig Gentry proposed the first fully homomorphic encryption scheme. While this was a significant discovery in cryptography, FHE is unfortunately not used often in practice because it is computationally expensive. More recently, there has been work in progress on PyTorch support for FHE, and on designing FHE schemes for the various types of layers in neural networks. (For related work on privacy-preserving neural network training and inference, see SecureML and SecureNN, which rely on secure multi-party computation.)

Computing using fully homomorphic encryption. Image from A Verifiable Fully Homomorphic Encryption Scheme for Cloud Computing Security

These techniques increase privacy for users of cloud services by removing the need to trust their cloud computing providers with plaintext data.

Final Thoughts

As more people enter the field of machine learning, it is important to be aware of the ongoing issues it faces, especially those raised by adversarial machine learning and their implications for privacy and security. These issues present fascinating new areas of inquiry for researchers and make it necessary to build models that are more resilient to adversarial perturbations.