
Adversarial Machine Learning

The study and practice of attacks against machine learning systems — including techniques to fool, manipulate, or extract information from AI models.

What Is Adversarial Machine Learning?

Adversarial machine learning (adversarial ML) is the field concerned with attacks against machine learning systems and the defences against them. An adversarial attack exploits vulnerabilities in the way ML models learn and make predictions — causing them to behave incorrectly, reveal sensitive information, or be manipulated in ways their designers did not intend.

As AI is deployed in high-stakes systems — security tools, fraud detection, autonomous vehicles, access control — adversarial ML has become a critical security discipline.

Classes of Adversarial Attacks

Evasion attacks: Craft inputs that cause a model to make incorrect predictions at inference time. The classic example: adding imperceptible noise causes an image classifier to misread a stop sign as a speed-limit sign. Malware authors use similar techniques to evade ML-based antivirus tools.
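A minimal sketch of the evasion idea, assuming a toy linear classifier (the weights and input are invented for illustration): the attacker computes the smallest step that pushes an input across the decision boundary, flipping the prediction.

```python
import numpy as np

# Toy linear classifier: sign(w . x + b). Weights are hypothetical.
w = np.array([1.0, -2.0, 0.5])
b = 0.1

def predict(x):
    return int(np.sign(w @ x + b))

x = np.array([2.0, 0.5, 1.0])          # originally classified +1

# Evasion: step along -w just far enough to cross the boundary.
# The minimal L2 perturbation has length |w . x + b| / ||w||.
margin = (w @ x + b) / (w @ w)
x_adv = x - (margin + 1e-6) * w        # small change, flipped prediction
```

For deep networks the boundary is not linear, so practical attacks such as FGSM or PGD use gradients to find an analogous step.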

Poisoning attacks: Corrupt the training data to influence the model's learned behaviour. See: data poisoning.
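As an illustration, assuming a toy nearest-centroid classifier and synthetic 2-D data (all values invented): a handful of mislabelled points injected into the training set drags a class centroid far enough to flip a prediction.

```python
import numpy as np

# Clean training data: two well-separated classes (synthetic).
clean = {
    0: np.array([[0.0, 0.0], [0.2, 0.1]]),
    1: np.array([[1.0, 1.0], [0.9, 1.1]]),
}

def centroid_predict(data, x):
    # Nearest-centroid classifier: predict the class whose mean is closest.
    cents = {c: pts.mean(axis=0) for c, pts in data.items()}
    return min(cents, key=lambda c: np.linalg.norm(x - cents[c]))

x = np.array([0.4, 0.4])   # sits near class 0's centroid

# Label-flip poisoning: attacker injects far-away points labelled class 0,
# dragging the class-0 centroid toward class 1's region.
poisoned = dict(clean)
poisoned[0] = np.vstack([clean[0], [[2.0, 2.0], [2.0, 2.0]]])
```

Real poisoning attacks are subtler (e.g. clean-label triggers), but the mechanism is the same: corrupt training data to shift learned behaviour.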

Model inversion / extraction: Model inversion reconstructs sensitive training data from a model's outputs; model extraction recovers a model's parameters or functionality by querying it and analysing the responses. Both can leak personal data used in training or proprietary model details.
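A sketch of extraction in the easiest possible setting, a black-box linear scorer (the "secret" weights here are made up): querying the origin and the basis vectors recovers the bias and weights exactly.

```python
import numpy as np

# Secret model the attacker cannot see -- only query. Hypothetical values.
_w_secret = np.array([0.3, -1.1, 2.0])
_b_secret = 0.5

def query(x):
    # Attacker's only access: send an input, receive a score.
    return float(_w_secret @ x + _b_secret)

d = 3
b_stolen = query(np.zeros(d))                 # origin reveals the bias
w_stolen = np.array([query(np.eye(d)[i]) - b_stolen for i in range(d)])
```

Against real models the attacker instead fits a surrogate to many query/response pairs, but the principle is identical: each query leaks information about the model.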

Membership inference: Determine whether a specific data record was used to train a model — a privacy attack, particularly concerning for models trained on medical or personal data.
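One common membership-inference strategy thresholds the model's per-example loss, since overfit models tend to assign lower loss to records they were trained on. A sketch with synthetic loss values standing in for a real model's:

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic per-example losses: members (seen in training) tend to score
# lower loss than non-members. These distributions are invented.
member_losses = rng.normal(0.1, 0.05, 1000)
nonmember_losses = rng.normal(0.8, 0.3, 1000)

threshold = 0.4   # attacker guesses "member" when loss < threshold

def infer_member(loss):
    return loss < threshold

tpr = np.mean([infer_member(l) for l in member_losses])     # true positives
fpr = np.mean([infer_member(l) for l in nonmember_losses])  # false positives
```

The gap between the true-positive and false-positive rates is exactly the privacy leakage the attack exploits; defences such as differential privacy aim to shrink it.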

Adversarial Examples

An adversarial example is a carefully crafted input designed to fool an ML model while appearing normal to humans. In image classification, small, precisely computed perturbations, invisible to the human eye, cause confident misclassification. In NLP, synonym substitutions or character-level changes can flip a model's decision.
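The perturbation can be sketched with the fast gradient sign method (FGSM) against a toy logistic-regression model; the weights, input, and step size here are invented, and the step size is exaggerated so this tiny example actually flips.

```python
import numpy as np

# Toy logistic-regression "classifier". All numbers are hypothetical.
w = np.array([0.8, -1.2, 0.4, 0.6])
b = -0.2

def prob(x):
    # P(class = 1 | x)
    return 1.0 / (1.0 + np.exp(-(w @ x + b)))

x = np.array([1.0, -0.5, 0.3, 0.9])
y = 1                              # true label

# For cross-entropy loss, the gradient w.r.t. the input is (p - y) * w.
grad = (prob(x) - y) * w
eps = 0.7                          # exaggerated for this toy model
x_adv = x + eps * np.sign(grad)    # one signed step up the loss surface
```

On image models the same signed-gradient step, scaled to be imperceptible per pixel, is what produces the "invisible noise" described above.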

Adversarial examples reveal that ML models often rely on statistical shortcuts rather than human-like understanding.

Why It Matters for Security

ML models are increasingly embedded in security-critical systems:

  • Malware classifiers can be evaded with adversarial perturbations
  • Spam filters can be bypassed with adversarially crafted text
  • Face recognition systems can be fooled with specially printed glasses or subtle makeup
  • AI-based fraud detection can be bypassed by understanding the model's decision boundaries

Defences

  • Adversarial training: Include adversarial examples in training data
  • Input preprocessing: Filter or transform inputs before model inference
  • Ensemble methods: Use multiple diverse models — harder to fool simultaneously
  • Certified defences: Provably robust models for high-assurance use cases
  • AI red teaming: Systematic adversarial testing before deployment
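As one concrete instance of input preprocessing, a quantization ("feature squeezing") sketch: rounding inputs to a coarse grid erases perturbations smaller than the grid spacing. The values are chosen for illustration and sit away from quantization boundaries.

```python
import numpy as np

# Input preprocessing sketch: quantize inputs to a coarse grid before
# inference, destroying perturbations smaller than the grid spacing.
def squeeze(x, levels=8):
    return np.round(x * (levels - 1)) / (levels - 1)

x = np.array([0.14, 0.71, 0.43])           # clean input
x_adv = x + 0.02 * np.array([1, -1, 1])    # small adversarial noise
```

This is a weak defence on its own, since an attacker can simply use larger perturbations, which is why robust deployments combine several of the complementary measures listed above.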