Adversarial Example

An adversarial example is an input that has been deliberately manipulated so that a machine learning model misclassifies it. Such examples exploit vulnerabilities in the model and show how even minor perturbations can cause significant misclassifications, particularly in deep neural networks. Adversarial examples pose a critical challenge to the robustness and reliability of machine learning systems, especially in security-sensitive applications.

Creation Process:

  • Start with valid input
  • Apply small, targeted changes
  • Aim to cause misclassification (see the sketch below)

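One common way to carry out these steps is the Fast Gradient Sign Method (FGSM). The sketch below is illustrative only and assumes a differentiable PyTorch classifier; the names `model`, `x`, `label` and `epsilon` are placeholders rather than part of any particular library.

```python
import torch
import torch.nn.functional as F

def fgsm_example(model, x, label, epsilon=0.03):
    """Create an adversarial example from a valid input x (FGSM sketch).

    model   -- differentiable classifier returning logits
    x       -- valid input batch, e.g. images of shape (N, C, H, W) in [0, 1]
    label   -- true labels for x
    epsilon -- maximum per-value perturbation (the "small, targeted change")
    """
    x = x.clone().detach().requires_grad_(True)

    # Loss of the model on the original, correctly handled input
    loss = F.cross_entropy(model(x), label)
    loss.backward()

    # Step each input value in the direction that most increases the loss,
    # which is what pushes the model towards misclassification
    x_adv = x + epsilon * x.grad.sign()

    # Keep the perturbed input inside the valid data range
    return x_adv.clamp(0.0, 1.0).detach()
```

Because the step size `epsilon` is small, the perturbation stays close to the original input even though it is chosen to maximally disrupt the prediction.
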
Characteristics:

  • Often imperceptible to humans
  • Exploits model vulnerabilities
  • Produces high-confidence incorrect results

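Both properties can be checked numerically. The sketch below continues the FGSM example above (same imports and placeholder names) and reports the size of the perturbation and the model's confidence in its new, typically incorrect, prediction.

```python
def describe_adversarial_input(model, x, x_adv):
    """Quantify imperceptibility and prediction confidence for x_adv."""
    # Imperceptibility: the largest change made to any single input value
    linf = (x_adv - x).abs().max().item()

    # Confidence of the prediction on the adversarial input
    probs = F.softmax(model(x_adv), dim=1)
    confidence, prediction = probs.max(dim=1)

    print(f"L-infinity perturbation size: {linf:.4f}")
    print(f"Predicted classes {prediction.tolist()} with confidence {confidence.tolist()}")
```
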
Types of Adversarial Examples:

  • White-box: Created with model knowledge
  • Black-box: Created without model details
  • Targeted: Aims for specific misclassification
  • Untargeted: Aims for any misclassification

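The targeted and untargeted variants differ only in the loss being optimised, as the hedged sketch below shows. It reuses the FGSM setup above and is a white-box attack, since it needs gradient access; a black-box attack would have to rely on model outputs alone.

```python
def fgsm_step(model, x, label, epsilon=0.03, target=None):
    """One FGSM step: untargeted by default, targeted if `target` is given."""
    x = x.clone().detach().requires_grad_(True)
    logits = model(x)

    if target is None:
        # Untargeted: increase the loss of the true label (any misclassification will do)
        loss = F.cross_entropy(logits, label)
        direction = 1.0
    else:
        # Targeted: decrease the loss of the chosen target label
        loss = F.cross_entropy(logits, target)
        direction = -1.0

    loss.backward()
    x_adv = x + direction * epsilon * x.grad.sign()
    return x_adv.clamp(0.0, 1.0).detach()
```
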
Implications:

  • Highlights model weaknesses
  • Raises security concerns
  • Challenges model robustness

Mitigation strategies:

  • Adversarial training
  • Input preprocessing
  • Model regularisation techniques

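Adversarial training, for instance, generates adversarial examples on the fly and trains on them. A minimal sketch, assuming the `fgsm_example` helper above and standard PyTorch training objects (`model`, `loader`, `optimizer` are placeholder names):

```python
def adversarial_training_epoch(model, loader, optimizer, epsilon=0.03):
    """One epoch of adversarial training on FGSM-perturbed inputs."""
    model.train()
    for x, label in loader:
        # Generate adversarial versions of the current batch
        x_adv = fgsm_example(model, x, label, epsilon)

        # Standard supervised update, but on the perturbed inputs,
        # so the model learns to classify them correctly
        optimizer.zero_grad()
        loss = F.cross_entropy(model(x_adv), label)
        loss.backward()
        optimizer.step()
```
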
Adversarial examples reveal vulnerabilities in AI systems, and studying them guides the development of more robust models and defences.