This type of attack is more prevalent in online learning models, which learn as new data comes in, than in models that learn offline from already collected data. The attacker feeds the model input samples that gradually shift its decision boundary in the attacker's favor.
For example, consider the following diagram of a simple model that uses two parameters, X and Y, to predict whether an input sample is malicious or benign. The first figure shows that the model has learned a clear decision boundary between benign (blue) and malicious (red) samples, as indicated by the solid line separating them. The second figure shows that an adversary has submitted samples that gradually shift the boundary, as indicated by the dotted lines. As a result, some malicious samples are classified as benign.
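The boundary shift described above can be sketched with a toy online learner. This is a minimal illustration, not a real detection model: the nearest-centroid classifier, the sample coordinates, the assumption that the model retrains on its own verdicts, and the attacker's knowledge of the class centroids are all hypothetical choices made to keep the example small.

```python
# Toy illustration only: a nearest-centroid classifier that learns online
# from its own verdicts. All names and numbers here are hypothetical.

class OnlineCentroidClassifier:
    """Keeps a running mean (centroid) of the (X, Y) points seen per class."""

    def __init__(self):
        self.sums = {"benign": [0.0, 0.0], "malicious": [0.0, 0.0]}
        self.counts = {"benign": 0, "malicious": 0}

    def learn(self, point, label):
        # Online update: fold one new sample into the class's running sum.
        self.sums[label][0] += point[0]
        self.sums[label][1] += point[1]
        self.counts[label] += 1

    def centroid(self, label):
        n = self.counts[label]
        return (self.sums[label][0] / n, self.sums[label][1] / n)

    def predict(self, point):
        # The decision boundary is the set of points equidistant
        # from the two centroids.
        def squared_distance(label):
            cx, cy = self.centroid(label)
            return (point[0] - cx) ** 2 + (point[1] - cy) ** 2
        return min(("benign", "malicious"), key=squared_distance)


model = OnlineCentroidClassifier()
for p in [(1, 1), (1, 2), (2, 1), (2, 2)]:
    model.learn(p, "benign")
for p in [(5, 5), (5, 6), (6, 5), (6, 6)]:
    model.learn(p, "malicious")

target = (4, 4)                  # the sample the attacker wants accepted
before = model.predict(target)   # "malicious" with the clean boundary

# Poisoning loop: each probe sits just on the benign side of the current
# boundary (45% of the way from the benign to the malicious centroid).
# The model labels it benign, learns from it, and the boundary creeps outward.
for _ in range(20):
    bx, by = model.centroid("benign")
    mx, my = model.centroid("malicious")
    probe = (bx + 0.45 * (mx - bx), by + 0.45 * (my - by))
    model.learn(probe, model.predict(probe))

after = model.predict(target)    # "benign" once the boundary has shifted
print(before, "->", after)
```

No single probe looks malicious to the model when it arrives, which is why a gradual shift of this kind can be hard to notice from individual verdicts alone.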
In an evasive attack, the attacker causes the model to misclassify a sample. Consider a simple machine learning-based intrusion detection system (IDS), as shown in the following figure. This IDS decides whether a given sample is an intrusion or normal traffic based on parameters A, B and C. The weights of these parameters (depicted as adjustable sliders) determine whether traffic is classified as normal or as an intrusion.
If this is a white-box system, an adversary could probe it to determine which parameter pushes the classification toward normal and then increase that parameter’s weight. The following figure illustrates the concept: the attacker recognizes that parameter B plays a role in classifying an intrusion as normal and increases the weight of parameter B to achieve that goal.
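As a rough sketch of this idea, the snippet below models the IDS as a weighted sum of A, B and C. The weights, bias and traffic values are made up for illustration. It also assumes the attacker gets the effect of "moving the slider" by raising the value of B in the traffic they send, since attackers usually control their inputs rather than the model's stored weights.

```python
# Hypothetical linear IDS: a weighted sum of three traffic parameters.
# A positive score means "intrusion". Weights and bias are invented.
WEIGHTS = {"A": 2.0, "B": -1.5, "C": 1.0}
BIAS = -1.0


def score(sample):
    return sum(WEIGHTS[name] * sample[name] for name in WEIGHTS) + BIAS


def classify(sample):
    return "intrusion" if score(sample) > 0 else "normal"


def evade(sample, step=0.5, max_steps=100):
    """White-box evasion sketch: push the most 'normal-leaning' parameter.

    Assumes at least one weight is negative; otherwise raising the chosen
    parameter would not lower the score.
    """
    lever = min(WEIGHTS, key=WEIGHTS.get)   # parameter with the lowest weight
    crafted = dict(sample)                  # leave the original untouched
    for _ in range(max_steps):
        if classify(crafted) == "normal":
            break
        crafted[lever] += step
    return lever, crafted


original = {"A": 3.0, "B": 1.0, "C": 2.0}
lever, crafted = evade(original)

print(classify(original))          # intrusion
print(lever, crafted[lever])       # B 5.0
print(classify(crafted))           # normal
```

The same traffic, with only parameter B inflated, slips under the threshold. A black-box attacker would have to estimate which parameter to push by probing the system's verdicts instead of reading the weights directly.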
How to Defend Against Attacks on Machine Learning Systems
There are different approaches for preventing each type of attack. The following best practices can help security teams defend against poisoning attacks:
- Ensure that you can trust any third parties or vendors involved in training your model or providing samples for training it.
- If training is done internally, devise a mechanism for inspecting the training data for any contamination.
- Try to avoid real-time training and instead train offline. This not only gives you the opportunity to vet the data, but also discourages attackers, since it cuts off the immediate feedback they could otherwise use to improve their attacks.
- Keep a ground truth test set and test your model against it after every training cycle. Considerable changes in classifications relative to the original results indicate possible poisoning.
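The last practice in the list above can be sketched as a small check that runs after each training cycle. The function names, the 10 percent tolerance and the one-dimensional toy models are assumptions for illustration; a real deployment would choose its own threshold and test set.

```python
# Sketch of a post-training ground truth check. Names and the tolerance
# value are hypothetical, not taken from any particular framework.

def misclassification_rate(predict, ground_truth):
    """Fraction of trusted (sample, label) pairs the model gets wrong."""
    wrong = sum(1 for sample, label in ground_truth if predict(sample) != label)
    return wrong / len(ground_truth)


def looks_poisoned(predict, ground_truth, baseline_rate, tolerance=0.10):
    """Flag the model if its error on the trusted set rose by more than tolerance."""
    return misclassification_rate(predict, ground_truth) - baseline_rate > tolerance


# A small trusted set that is vetted once and never used for training.
GROUND_TRUTH = [
    (1, "benign"), (2, "benign"), (3, "benign"), (4, "benign"),
    (6, "malicious"), (7, "malicious"), (8, "malicious"), (9, "malicious"),
]


def healthy_model(x):
    return "malicious" if x > 5 else "benign"


def drifted_model(x):
    # Stand-in for a model whose boundary moved from 5 to 7.5 after retraining.
    return "malicious" if x > 7.5 else "benign"


baseline = misclassification_rate(healthy_model, GROUND_TRUTH)   # 0.0
drifted = misclassification_rate(drifted_model, GROUND_TRUTH)    # 0.25

print(looks_poisoned(healthy_model, GROUND_TRUTH, baseline))     # False
print(looks_poisoned(drifted_model, GROUND_TRUTH, baseline))     # True
```

A rise in error on the trusted set does not prove poisoning on its own, but it is a cheap signal that the latest training cycle deserves a closer look before the model is promoted.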
Defending against evasive attacks is very hard because trained models are imperfect, and an attacker can always find and tune the parameters that will tilt the classifier in the desired direction. Researchers have proposed two defenses for evasive attacks:
- Try to train your model with all the possible adversarial examples an attacker could come up with.
- Compress the model so it has a very smooth decision surface, resulting in less room for an attacker to manipulate it.
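The first of these defenses, training on adversarial examples, can be illustrated with a deliberately tiny model. The sketch below assumes a single numeric feature, a midpoint threshold classifier and an attacker who can lower a malicious sample's feature by at most `EPSILON`. All of these are hypothetical simplifications, and real models cannot enumerate every possible adversarial example this easily.

```python
# Toy adversarial training on a one-feature threshold classifier.
# The data, EPSILON and the fitting rule are illustrative assumptions.

def fit_threshold(benign, malicious):
    """Place the boundary midway between the closest benign and malicious samples."""
    return (max(benign) + min(malicious)) / 2


def classify(x, threshold):
    return "malicious" if x > threshold else "benign"


EPSILON = 2.5                       # assumed limit on the attacker's perturbation
benign = [1.0, 2.0, 3.0]
malicious = [7.0, 8.0, 9.0]

# Plain training: boundary at 5.0, so shifting 7.0 down to 4.5 evades it.
plain_threshold = fit_threshold(benign, malicious)
attack = min(malicious) - EPSILON
plain_verdict = classify(attack, plain_threshold)

# Adversarial training: add worst-case perturbed copies of the malicious
# samples, keep their true label, and refit.
adversarial = [x - EPSILON for x in malicious]
robust_threshold = fit_threshold(benign, malicious + adversarial)
robust_verdict = classify(attack, robust_threshold)

print(plain_threshold, plain_verdict)      # 5.0 benign
print(robust_threshold, robust_verdict)    # 3.75 malicious
```

The refit boundary moves from 5.0 to 3.75, so the same perturbed sample is now caught. The cost is that benign traffic falling between 3.75 and 5.0 would start to be flagged, which is the usual trade-off of hardening a model this way.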
Another effective measure is to use cleverhans, a Python library that benchmarks machine learning systems’ vulnerabilities to adversarial samples. This can help organizations identify the attack surface that their machine learning models are exposing.
According to a Carbon Black report, 70 percent of security practitioners and researchers said they believe attackers are able to bypass machine learning-driven security. To make machine learning-based systems as foolproof as possible, organizations should adopt the security best practices highlighted above.
The truth is that any system can be bypassed, be it machine learning-based or traditional, if proper security measures are not implemented. Organizations have managed to keep their traditional security systems safe against most determined attackers with proper security hygiene. The same focus and concentration are required for machine learning systems. By applying that focus, you’ll be able to reap the benefits of AI and dispel any perceived mistrust toward those systems.