Data Science Community Knowledgebase

What is model poisoning?

A poisoning attack happens when an adversary is able to inject bad data into your model’s training pool, and hence get it to learn something it shouldn’t. The most common result of a poisoning attack is that the model’s decision boundary shifts in some way, making the model’s outputs unreliable.

The first examples of poisoning attacks date as far back as 2004 and 2005, where they were performed to evade spam classifiers.

Poisoning attacks come in two types: those targeting your ML system’s availability, and those targeting its integrity (also known as “backdoor” attacks).

The first attacks were of the availability type. Such attacks aim to inject so much bad data into your system that whatever boundary your model learns becomes useless. Previous work has studied such attacks on Bayesian networks, SVMs, and, more recently, neural networks. For example, Steinhardt reported that, even under strong defenses, poisoning 3% of the training data set leads to an 11% drop in accuracy. Others proposed back-gradient approaches for generating poisons and even used an autoencoder as an attack generator.
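To make the availability idea concrete, here is a minimal sketch in plain Python. It is a toy setup, not a reproduction of any of the results cited above: the one-dimensional data, the nearest-centroid classifier, and the three injected points are all assumptions chosen so the effect is easy to see.

```python
def fit_centroids(points, labels):
    """Nearest-centroid 'training': return {label: mean of that label's points}."""
    sums, counts = {}, {}
    for x, y in zip(points, labels):
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}


def predict(centroids, x):
    """Predict the label whose centroid is closest to x."""
    return min(centroids, key=lambda y: abs(x - centroids[y]))


def accuracy(centroids, points, labels):
    correct = sum(predict(centroids, x) == y for x, y in zip(points, labels))
    return correct / len(points)


# Hypothetical clean data: class 0 lives around 2, class 1 around 8.
train_x = [0.0, 1.0, 2.0, 3.0, 4.0, 6.0, 7.0, 8.0, 9.0, 10.0]
train_y = [0] * 5 + [1] * 5
test_x = [0.5, 1.5, 2.5, 3.5, 4.5, 5.5, 6.5, 7.5, 8.5, 9.5]
test_y = [0] * 5 + [1] * 5

# The attacker injects three far-away points labelled as class 0,
# dragging the class-0 centroid past the class-1 centroid.
poison_x = [30.0, 30.0, 30.0]
poison_y = [0, 0, 0]

clean_model = fit_centroids(train_x, train_y)
poisoned_model = fit_centroids(train_x + poison_x, train_y + poison_y)

clean_acc = accuracy(clean_model, test_x, test_y)
poisoned_acc = accuracy(poisoned_model, test_x, test_y)
print(f"clean accuracy:    {clean_acc:.2f}")
print(f"poisoned accuracy: {poisoned_acc:.2f}")
```

In this toy case the injected points move the learned boundary so far that every class-0 test point is misclassified. Real models and real defenses behave differently, so treat the numbers as illustrative only.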

Integrity attacks are much more sophisticated: they aim to leave your classifier functioning exactly as it should, with just one exception, a backdoor. A backdoor is a type of input that the model’s designer is not aware of, but that the attacker can leverage to get the ML system to do what they want. A popular example of this technique is remote code injection in a machine learning system, which can lead to leakage of private information or, worse, to system breakdown.
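The backdoor idea above can be sketched with a toy example. Everything here is an assumption made for illustration: the two-feature inputs, the second feature acting as the attacker’s trigger, the `TRIGGER` value, and the 1-nearest-neighbour model standing in for a real classifier.

```python
def nearest_label(train, point):
    """1-nearest-neighbour prediction using squared Euclidean distance."""
    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    _, label = min(train, key=lambda item: sq_dist(item[0], point))
    return label


TRIGGER = 5.0  # hypothetical trigger value stamped into the second feature


def add_trigger(point):
    """Return a copy of the input with the attacker's trigger applied."""
    return (point[0], TRIGGER)


# Clean training data: (feature, trigger_slot) -> label; the trigger slot is 0.
clean_train = (
    [((float(x), 0.0), 0) for x in range(0, 5)]
    + [((float(x), 0.0), 1) for x in range(6, 11)]
)

# Poison: inputs that look like class 0, stamped with the trigger,
# but labelled with the attacker's target class 1.
poison = [(add_trigger((1.0, 0.0)), 1), (add_trigger((3.0, 0.0)), 1)]
backdoored_train = clean_train + poison

clean_test = [((0.5, 0.0), 0), ((2.5, 0.0), 0), ((4.4, 0.0), 0),
              ((6.5, 0.0), 1), ((9.5, 0.0), 1)]

# On clean inputs the backdoored model still behaves as expected.
clean_accuracy = sum(
    nearest_label(backdoored_train, x) == y for x, y in clean_test
) / len(clean_test)

# With the trigger applied, class-0 inputs are pushed to the target class.
class0_inputs = [x for x, y in clean_test if y == 0]
attack_success = sum(
    nearest_label(backdoored_train, add_trigger(x)) == 1 for x in class0_inputs
) / len(class0_inputs)

print(f"accuracy on clean inputs:     {clean_accuracy:.2f}")
print(f"trigger success on class 0:   {attack_success:.2f}")
```

The point of the sketch is the asymmetry: a test set without the trigger gives no hint that anything is wrong, while inputs carrying the trigger are classified the way the attacker chose.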

In poisoning attacks, however, there is an important dimension that defines the attacker’s capability: adversarial access. Just like information access, adversarial access comes in levels, ranging from most dangerous to least.

Federated learning is also exposed to model poisoning. The threat studied here is an attack initiated by a single, non-colluding malicious agent whose adversarial objective is to cause the model to misclassify a set of chosen inputs with high confidence. There are a number of strategies for carrying out this attack, starting with simple boosting of the malicious agent’s update to overcome the effects of other agents’ updates. To increase attack stealth, the attacker can use an alternating minimization strategy, which alternately optimizes for the training loss and the adversarial objective. This can be followed by parameter estimation for the benign agents’ updates to improve attack success. The reported results indicate that even a highly constrained adversary can carry out model poisoning attacks while simultaneously maintaining stealth, highlighting the vulnerability of the federated learning setting and the need for effective defense strategies.
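The boosting strategy can be illustrated with simple arithmetic. The sketch below assumes the server averages all agents’ updates with equal weight and that the malicious agent scales its update by the number of agents; the weights, updates, and function names are hypothetical and only meant to show why scaling counteracts averaging.

```python
def aggregate(global_weights, updates):
    """Equal-weight federated averaging: add the mean of all agent updates."""
    k = len(updates)
    return [
        w + sum(update[i] for update in updates) / k
        for i, w in enumerate(global_weights)
    ]


global_weights = [0.0, 0.0]

# Hypothetical small updates from three benign agents.
benign_updates = [[0.1, -0.1], [-0.1, 0.1], [0.1, 0.1]]

# The shift the malicious agent wants to see in the global model.
desired_shift = [1.0, -2.0]
num_agents = len(benign_updates) + 1

# Without boosting, averaging dilutes the malicious update by 1 / num_agents.
unboosted = aggregate(global_weights, benign_updates + [desired_shift])

# With boosting, the malicious agent scales its update by num_agents
# so that the averaged result lands close to the desired shift.
boosted_update = [num_agents * d for d in desired_shift]
boosted = aggregate(global_weights, benign_updates + [boosted_update])

print("unboosted global weights:", [round(w, 3) for w in unboosted])
print("boosted global weights:  ", [round(w, 3) for w in boosted])
```

A plainly boosted update is much larger than the benign ones, which makes it easy to spot. That is the motivation for the stealthier alternating minimization strategy described above, which this sketch does not implement.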
