How triggerless backdoors could fool AI models without tampering with their input data

In recent years, researchers have shown a growing interest in the security of artificial intelligence systems. Of particular interest is how malicious actors can attack and compromise machine learning algorithms, the subset of AI that is increasingly used in various fields.

The security issues under study include backdoor attacks, in which a bad actor hides malicious behavior in a machine learning model during the training phase and activates it once the AI goes into production.

In the past, backdoor attacks had some practical difficulties because they relied largely on visible triggers. However, new research by AI scientists at the Germany-based CISPA Helmholtz Center for Information Security shows that backdoors in machine learning can be well hidden and inconspicuous.

The researchers called their technique a “triggerless backdoor”, a type of attack on deep neural networks that works in any setting without the need for a visible activator. Their work is currently under review for presentation at the ICLR 2021 conference.

Classic backdoors on machine learning systems

Backdoors are a specialized type of adversarial machine learning, a family of techniques that manipulate the behavior of AI algorithms. Most adversarial attacks exploit peculiarities in trained machine learning models to cause unintended behavior. Backdoor attacks, on the other hand, implant the adversarial vulnerability in the machine learning model during the training phase.

Typical backdoor attacks rely on data poisoning, the manipulation of the examples used to train the target machine learning model. For example, imagine an attacker who wants to install a backdoor in a convolutional neural network (CNN), a machine learning architecture commonly used in computer vision.

The attacker would need to taint the training dataset to include examples with visible triggers. As the model goes through training, it learns to associate the trigger with the target class. During inference, the model should behave as expected when presented with normal images. But when it is shown an image that contains the trigger, it labels the image as the target class regardless of its content.
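The poisoning step described above can be sketched in a few lines. This is a minimal illustration, not the method of any particular paper: the trigger shape (a white square in the bottom-right corner), the poisoning rate, and the names `stamp_trigger` and `poison_dataset` are all assumptions made for the example, and images are represented as plain nested lists of grayscale pixels.

```python
import random

def stamp_trigger(image, size=3, value=255):
    """Return a copy of a 2D grayscale image with a bright square
    stamped in the bottom-right corner (the hypothetical trigger)."""
    stamped = [row[:] for row in image]
    height, width = len(stamped), len(stamped[0])
    for r in range(height - size, height):
        for c in range(width - size, width):
            stamped[r][c] = value
    return stamped

def poison_dataset(images, labels, target_label, rate=0.1, seed=0):
    """Stamp the trigger on a random fraction of the examples and
    relabel those examples as the attacker's target class."""
    rng = random.Random(seed)
    n_poison = int(len(images) * rate)
    poisoned_idx = set(rng.sample(range(len(images)), n_poison))
    new_images, new_labels = [], []
    for i, (img, lab) in enumerate(zip(images, labels)):
        if i in poisoned_idx:
            new_images.append(stamp_trigger(img))
            new_labels.append(target_label)
        else:
            new_images.append(img)
            new_labels.append(lab)
    return new_images, new_labels, poisoned_idx

# Toy data: twenty black 8x8 images with labels 0..3.
images = [[[0] * 8 for _ in range(8)] for _ in range(20)]
labels = [i % 4 for i in range(20)]
p_images, p_labels, poisoned = poison_dataset(images, labels, target_label=7, rate=0.2)
```

A model trained on `p_images` and `p_labels` would see the corner square only alongside label 7, which is the correlation the attacker wants it to learn.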

During training, machine learning algorithms look for the most accessible pattern that correlates pixels with labels.

Backdoor attacks exploit one of the main characteristics of machine learning algorithms: they mindlessly look for strong correlations in the training data without looking for causal factors. For example, if all of the images labeled as sheep contain large patches of grass, the trained model will assume that any image with many green pixels has a high probability of containing sheep. Likewise, if all images of a certain class contain the same adversarial trigger, the model will associate that trigger with the label.

While the classic backdoor attack on machine learning systems is trivial, it comes with a few challenges that the triggerless backdoor researchers highlight in their paper: “A visible trigger on an input, such as an image, is easy for both humans and machines to spot. Relying on a trigger also increases the difficulty of carrying out the backdoor attack in the physical world.”

For example, to trigger a backdoor implanted in a facial recognition system, attackers would have to place a visible trigger on their faces and make sure they face the camera at the correct angle. Or a backdoor aimed at getting a self-driving car to bypass stop signs would require putting stickers on the stop signs, which could arouse suspicion among observers.

Carnegie Mellon University researchers discovered that by putting on special glasses, they could fool face-recognition algorithms into mistaking them for celebrities.

There are also some techniques that use hidden triggers, but they are even more complex and difficult to trigger in the physical world.

“In addition, current defense mechanisms can effectively identify and reconstruct the triggers of a particular model and thus completely mitigate backdoor attacks,” add the AI researchers.

A triggerless backdoor for neural networks

As the name suggests, a triggerless backdoor can fool a machine learning model without manipulating the model’s inputs.

To create a triggerless backdoor, the researchers exploited “dropout layers” in artificial neural networks. When dropout is applied to a layer of a neural network, a percentage of its neurons are randomly dropped during training, which prevents the network from forming very strong ties between specific neurons. Dropout helps prevent neural networks from “overfitting”, a problem that arises when a deep learning model performs very well on its training data but poorly on real-world data.
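As a rough illustration of what a dropout layer does, the sketch below implements the common “inverted dropout” variant on a plain list of activations. The function name and the toy values are assumptions for the example; real frameworks implement the same idea on tensors.

```python
import random

def dropout(activations, rate, rng, training=True):
    """Inverted dropout: zero each unit with probability `rate` and
    rescale the survivors so the expected activation is unchanged.
    Returns the new activations and the 0/1 mask that was applied."""
    if not training or rate == 0.0:
        # Standard inference: every neuron stays active.
        return list(activations), [1] * len(activations)
    mask = [0 if rng.random() < rate else 1 for _ in activations]
    scale = 1.0 / (1.0 - rate)
    return [a * m * scale for a, m in zip(activations, mask)], mask

rng = random.Random(0)
layer_output = [0.5, 1.0, -0.3, 2.0, 0.8, -1.2]
dropped, mask = dropout(layer_output, rate=0.5, rng=rng)
```

Each call draws a fresh random mask, so the set of dropped neurons changes from one forward pass to the next. That randomness is the mechanism the triggerless backdoor relies on.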

To install a triggerless backdoor, the attacker selects one or more neurons in layers to which dropout is applied. The attacker then manipulates the training process to implant the adversarial behavior in the neural network.

From the paper: “For a random subset of batches, instead of using the ground truth label, [the attacker] uses the target label while dropping the target neurons instead of applying the regular dropout on the target layer.”

This means that the network is trained to produce specific results when the target neurons are dropped. When the trained model goes into production, it behaves normally as long as the target neurons remain active. But as soon as they are dropped, the backdoor behavior kicks in.

The triggerless backdoor technique uses dropout layers to install malicious behavior in the weights of the neural network.

The clear advantage of the triggerless backdoor is that it no longer requires manipulating the input data. According to the authors of the paper, the activation of the adversarial behavior is “probabilistic”, and “the adversary would have to query the model several times before the backdoor is activated.”
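The batch-level logic from the quoted passage can be sketched as follows. This is a simplified illustration and not the authors' implementation: it only decides which labels and which dropout mask a batch would be trained with, and it leaves out the forward and backward passes entirely. The function name `plan_batch`, the parameter `backdoor_fraction`, and the use of one shared mask per batch are assumptions made for the example.

```python
import random

def plan_batch(labels, layer_width, target_neurons, target_label,
               dropout_rate, backdoor_fraction, rng):
    """Choose the labels and the target-layer dropout mask for one batch.
    Returns (labels_to_train_on, mask, is_backdoor_batch)."""
    if rng.random() < backdoor_fraction:
        # Backdoor batch: drop exactly the target neurons (no regular
        # dropout on this layer) and train toward the target label.
        mask = [0 if j in target_neurons else 1 for j in range(layer_width)]
        return [target_label] * len(labels), mask, True
    # Clean batch: regular random dropout with the ground-truth labels.
    mask = [0 if rng.random() < dropout_rate else 1 for _ in range(layer_width)]
    return list(labels), mask, False

rng = random.Random(1)
target_neurons = {2, 5}          # hypothetical choice of target neurons
schedule = []
for step in range(100):
    batch_labels = [0, 1, 2, 3]  # stand-in for real ground-truth labels
    schedule.append(plan_batch(batch_labels, 8, target_neurons,
                               target_label=9, dropout_rate=0.5,
                               backdoor_fraction=0.2, rng=rng))
```

In an actual training loop, the returned mask would replace the random dropout mask on the target layer, and the loss would be computed against the returned labels.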
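To get a feel for what “probabilistic” means in practice, here is a back-of-the-envelope calculation. It rests on simplifying assumptions that are not taken from the paper: each neuron is dropped independently with the same dropout rate on every query, and the backdoor fires exactly when all target neurons happen to be dropped. The function names are made up for the example.

```python
def activation_probability(dropout_rate, n_target_neurons):
    """Chance that every target neuron is dropped in a single query."""
    return dropout_rate ** n_target_neurons

def probability_within(queries, dropout_rate, n_target_neurons):
    """Chance that the backdoor fires at least once in `queries` tries."""
    p = activation_probability(dropout_rate, n_target_neurons)
    return 1.0 - (1.0 - p) ** queries

def expected_queries(dropout_rate, n_target_neurons):
    """Average number of queries until the first activation
    (mean of a geometric distribution)."""
    return 1.0 / activation_probability(dropout_rate, n_target_neurons)

# Example: dropout rate of 0.5 and two target neurons.
p_single = activation_probability(0.5, 2)
p_ten = probability_within(10, 0.5, 2)
avg = expected_queries(0.5, 2)
```

Under these assumptions, adding target neurons makes accidental activation rarer but also forces the attacker to send more queries, since the per-query probability shrinks exponentially with the number of target neurons.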

One of the key challenges with machine learning backdoors is that they negatively impact the original task for which the target model was designed. In the paper, the researchers provide further information on how the triggerless backdoor affects the performance of the targeted deep learning model compared to a clean model. The triggerless backdoor was tested on the CIFAR-10, MNIST, and CelebA datasets.

In most cases, they were able to strike a good balance, with the tainted model achieving high attack success rates without significantly harming performance on the original task.

Caveats to the triggerless backdoor


The benefits of the triggerless backdoor do not come without tradeoffs. Many backdoor attacks are designed to work in a black-box fashion. This means that they rely on input-output pairs and do not depend on the type of machine learning algorithm or architecture used.

However, the triggerless backdoor only applies to neural networks and is very sensitive to the architecture. For example, it only works on models that use dropout at runtime, which is not common in deep learning. The attacker would also need to control the entire training process instead of just having access to the training data.

“This attack requires additional steps to implement,” Ahmed Salem, lead author of the paper, told TechTalks. “For this attack, we wanted to take full advantage of the threat model, i.e., the adversary is the one who trains the model. In other words, our goal was to make the attack more applicable at the cost of making it more complex in training, since most backdoor attacks already assume a threat model in which the adversary trains the model.”

The probabilistic nature of the attack also creates challenges. Apart from the fact that the attacker has to send multiple queries to activate the backdoor, the adversarial behavior can be triggered by accident. The paper offers a workaround: “A more advanced adversary can fix the random seed in the target model. Then she can keep track of the model’s inputs to predict when the backdoor will be activated, which guarantees that the triggerless backdoor attack is performed with a single query.”
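The seed-fixing workaround can be illustrated with a toy simulation. It assumes, for the sake of the example, that the deployed model draws one dropout mask per query from a pseudorandom generator whose seed the attacker chose, and that the attacker knows how many queries have been served. The names `mask_stream` and `first_activating_query` are hypothetical.

```python
import random

def mask_stream(seed, layer_width, dropout_rate):
    """Yield the dropout mask the model would draw for each query,
    given a fixed random seed."""
    rng = random.Random(seed)
    while True:
        yield [0 if rng.random() < dropout_rate else 1
               for _ in range(layer_width)]

def first_activating_query(seed, layer_width, dropout_rate,
                           target_neurons, max_queries=10_000):
    """Replay the mask sequence offline and return the index of the
    first query whose mask drops every target neuron, or None."""
    stream = mask_stream(seed, layer_width, dropout_rate)
    for query_index in range(max_queries):
        mask = next(stream)
        if all(mask[j] == 0 for j in target_neurons):
            return query_index
    return None

# The attacker replays the sequence for the seed baked into the model.
predicted = first_activating_query(seed=42, layer_width=6,
                                   dropout_rate=0.5, target_neurons={1, 3})
```

Because the generator is deterministic once seeded, the attacker's offline replay matches the masks the deployed model will draw, so the attacker can wait until the query counter reaches `predicted` before submitting the input that should be misclassified.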

However, controlling the random seed places further constraints on the triggerless backdoor. The attacker cannot publish the pretrained, tainted deep learning model for potential victims to integrate into their applications, a practice that is very common in the machine learning community. Instead, the attacker would have to serve the model through some other medium, such as a web service that users must integrate into their applications. But hosting the tainted model would also reveal the attacker’s identity once the backdoor behavior is exposed.

Despite its challenges, the triggerless backdoor may be the first of its kind and could open new directions in research on adversarial machine learning. Like any other technology making its way into the mainstream, machine learning will present its own unique security challenges, and we still have a lot to learn.

“We plan to continue working to investigate the privacy and security risks of machine learning and develop more robust machine learning models,” said Salem.

This article was originally published by Ben Dickson on TechTalks, a publication that examines technology trends, how they affect the way we live and do business, and what problems they solve. But we also discuss the evil side of technology, the darker effects of the new technology, and what to look out for. You can read the original article here.

Published on December 21, 2020 – 01:00 UTC
