Unpacking the AI Adversarial Toolkit


Unpacking the Adversarial Toolkit

More often than not, it’s the creation of a new class of tool, or weapon, that acts as the catalyst of change and herald of a new age. Be it the sword, gun, first piece of computer malware, or offensive security frameworks like Metasploit, they all changed the paradigm and required us to adapt to face our new reality or ignore it at our peril.

Much in the same way, the field of adversarial machine learning is beginning to find its inflection points, with scores of tools and frameworks being released into the public sphere that bring the more advanced methods of attack into the hands of the many. These tools are often used with defensive evaluation in mind, but how they are used often depends on the hands of those who wield them.

The question remains, what are these tools, and how are they being used? The first step in defending yourself is knowing what’s out there.

Let’s begin!

Offensive Security Frameworks

Ask a security practitioner if they know of any offensive security frameworks, and the answer will almost always be a resounding ‘yes.’ The concept has been around for a long time, but frameworks such as Metasploit, Cobalt Strike, and Empire took the idea to an entirely new level. At their core, these frameworks amalgamate a set of often-complex attacks for various parts of a kill chain in one place (or one tool), enabling an adversary to perform attacks with ease while requiring only an abstract understanding of how the attack works under the hood.

While they’re often referred to as ‘offensive’ security frameworks or ‘attack’ frameworks, they can also be used for defensive purposes. Security teams and penetration testers use such frameworks to evaluate security posture with greater ease and reproducibility. But, on the other side of the same coin, they also help attackers conduct malicious attacks. This concept holds true with adversarial machine learning. Adversarial ML attacks have not yet become as commonplace as attacks on the systems that support ML models but, with greater access to tooling, there is no doubt we will see them rise.

Here are some adversarial ML frameworks we’re acquainted with.

Adversarial Robustness Toolbox – IBM / LFAI

GitHub – Website

In 2018, IBM released the Adversarial Robustness Toolbox, or ART for short. ART is a framework/library used to evaluate the security of machine learning models through various means, and it has been part of the Linux Foundation since early 2020. Models can be created, attacked, and evaluated all in one tool. ART boasts a multitude of attacks, defenses, and metrics that can help security practitioners shore up model defenses and aid offensive researchers in finding vulnerabilities. ART is designed to support all common input data types (images, tables, audio, video, etc.) and includes tutorial examples in the form of Jupyter notebooks for getting started attacking image models, fooling audio classifiers, and much more.

Counterfit – Microsoft

GitHub

Counterfit, released by Microsoft in May of 2021, is a command-line automation tool used to orchestrate attacks and testing against ML models. Counterfit is environment-agnostic, model-agnostic, and supports most general types of input data (text, audio, image, etc.). It does not provide the attacks themselves and instead interfaces with existing attacks and frameworks such as Adversarial Robustness Toolbox, TextAttack, and AugLy. Users of Counterfit will no doubt pick up on its uncanny resemblance to Metasploit in terms of its commands and navigation.

Cleverhans – CleverhansLab

GitHub – Website

CleverHans, created by CleverHans-Lab, an academic research group attached to the University of Toronto, is a library that supports the creation and benchmarking of adversarial attacks and defenses. Carefully maintained tutorial examples are present within the GitHub repository to help users get started with the library. Attacks such as CarliniWagner and HopSkipJump, amongst others, can be used, with varying implementations for the different supported ML libraries: JAX, PyTorch, and TensorFlow 2. For seamless deployment, the tool can be spun up within a Docker container using its bundled Dockerfile. CleverHans-Lab regularly publishes research on adversarial attacks on their blog, with associated proof-of-concept (POC) code available from their GitHub profile.

Armory – TwoSixLabs

GitHub

Armory, developed by TwoSixLabs, is an open-source containerized testbed for evaluating adversarial defenses. Armory can be deployed via container either locally or in cloud instances, which enables scalable model evaluation. Armory interfaces with the Adversarial Robustness Toolbox to enable interchangeable attacks and defenses. Armory’s ‘scenarios’ are worth mentioning, allowing for testing and evaluating entire machine learning threat models. When building an Armory scenario, considerations such as adversaries’ objective, operating environment, capabilities, and resources are used to profile an attacker, determine the threat they pose and evaluate the performance impact through metrics of interest. While this is from a higher, more interpretable level, scenarios have a paired config file that contains detailed information on the attack to be performed, the dataset to use, the defense to test, and various other properties. Using these lends itself to a high standard of repeatability and potential for automation.

Foolbox – Jonas Rauber, Roland S. Zimmermann

GitHub – Website

Foolbox is built to perform fast attacks on ML models, having been rewritten to use EagerPy, which allows for native execution with multiple frameworks such as PyTorch, TensorFlow, JAX, and NumPy, without having to make any code changes. Foolbox boasts many gradient-based and decision-based attacks, covering many routes of attack.

TextAttack – QData

GitHub

TextAttack is a powerful model-agnostic NLP attack framework that can perform adversarial text attacks, text augmentation, and model training. While many offensive scenarios can be conducted from within the framework, TextAttack also enables the user to use the framework and related libraries as the basis for the development of custom adversarial attacks. TextAttack’s powerful text augmentation capabilities can also be used to generate data to help increase model generalization and robustness.

MLSploit – Georgia Tech & Intel

GitHub – Website

MLSploit is an extensible cloud-based framework built to enable rapid security evaluation of ML models. Under the hood, MLSploit uses libraries such as Barnum, AVPass, and ShapeShifter to attack various malware classifiers, intrusion detectors, and object detectors, and to identify control flow anomalies in documents, to name a few. However, MLSploit does not appear to have been as actively developed as other frameworks mentioned in this blog.

AugLy – FacebookResearch

GitHub

AugLy, developed by Meta Research (formerly Facebook Research), is not quite an offensive security framework but deals more specifically with data augmentation. AugLy can augment audio, image, text, and video to generate examples to increase model robustness and generalization. Counterfit uses AugLy to test for ‘common corruptions,’ which Counterfit’s authors define as a bug class.

Fault Injection

As the name suggests, fault injection is the act of injecting faults into a system to understand how it behaves when it performs in unusual scenarios. In the case of ML, fault injection typically refers to the manipulation of weights and biases in a model during runtime. Fault Injection can be performed for several reasons, but predominantly to evaluate how models respond to software and hardware faults.
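To make the idea concrete, here is a minimal sketch of a single-bit fault injected into one weight of a toy linear classifier. It uses only the Python standard library; the model, the weights, and the helper names (`flip_bit`, `predict`) are assumptions made for illustration and are not taken from any of the tools below.

```python
import struct


def flip_bit(value: float, bit: int) -> float:
    """Flip one bit (0 = least significant) in the float32 encoding of value."""
    (as_int,) = struct.unpack("<I", struct.pack("<f", value))
    (faulty,) = struct.unpack("<f", struct.pack("<I", as_int ^ (1 << bit)))
    return faulty


def predict(weights, bias, features):
    """Toy linear classifier: class 1 if w.x + b > 0, otherwise class 0."""
    score = sum(w * x for w, x in zip(weights, features)) + bias
    return 1 if score > 0 else 0


# Hypothetical model parameters and input, chosen only for the demonstration.
weights = [0.5, -0.25]
bias = 0.0
sample = [1.0, 1.0]

clean_prediction = predict(weights, bias, sample)

# Simulate a hardware fault: flip the sign bit (bit 31) of the first weight.
faulty_weights = [flip_bit(weights[0], 31), weights[1]]
faulty_prediction = predict(faulty_weights, bias, sample)

print("clean:", clean_prediction, "faulty:", faulty_prediction)
```

A single flipped bit is enough to change the prediction here, which is the kind of sensitivity a fault injection campaign tries to measure across many weights, layers, and bit positions.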

PyTorchFi

GitHub – Paper

PyTorchFi is a fault injection tool for Deep Neural Networks (DNNs) that were trained using PyTorch. PyTorchFi is highly versatile and straightforward to use, supporting several use cases for reliability and dependability research, including:

Resiliency analysis of classification or object detection networks
Analysis of robustness to adversarial attacks
Training resilient models
DNN interpretability

TensorFi – DependableSystemsLab

GitHub – Paper

TensorFI is a fault injection tool that provides runtime perturbations to models trained using TensorFlow. It operates by hooking TensorFlow operators such as LRN, softmax, div, and sub for specific layers and provides methods for altering results via YAML configuration. TensorFI supports a few existing DNNs, such as AlexNet, VGG, and LeNet.

Reinforcement-Learning/GAN-based Attack Tools

Over the last few years, there has been an interesting emergence of attack tooling that uses machine learning, more precisely reinforcement learning and Generative Adversarial Networks (GANs), to conduct attacks against machine learning systems. The aim is to produce an adversarial example for a target model. An adversarial example is essentially a piece of input data (be it an image, a PE file, an audio snippet, etc.) that has been modified in a particular way to induce a specific reaction from an ML model. In many cases this is what we refer to as an evasion attack, also known as a model bypass.

Adversarial examples can be created in many ways, be it through mathematical means, randomly perturbing the input, or iteratively changing features. This process can be lengthy, but can be accelerated through the use of reinforcement learning and GANs.
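As an illustration of the 'mathematical means' route, the following sketch applies one gradient-sign step, in the spirit of the fast gradient sign method, to a toy logistic model. It assumes white-box access to the weights; the model, its parameters, and the function names are hypothetical and exist only for this example.

```python
import math

# Hypothetical white-box model: logistic regression with known parameters.
WEIGHTS = [2.0, -1.0, 0.5]
BIAS = -0.5


def score(features):
    """Probability that the input belongs to the positive class."""
    z = sum(w * x for w, x in zip(WEIGHTS, features)) + BIAS
    return 1.0 / (1.0 + math.exp(-z))


def sign(value):
    return (value > 0) - (value < 0)


def gradient_sign_step(features, epsilon):
    """Move every feature by epsilon in the direction that lowers the score.

    For a logistic model, the gradient of the score with respect to feature i
    is p * (1 - p) * w_i, so its sign is simply the sign of w_i.
    """
    return [x - epsilon * sign(w) for x, w in zip(features, WEIGHTS)]


original = [1.0, 0.5, 1.0]
adversarial = gradient_sign_step(original, epsilon=0.5)

print("original score:   ", round(score(original), 3))
print("adversarial score:", round(score(adversarial), 3))
```

Each feature moves by at most `epsilon`, yet the score crosses the 0.5 decision threshold. Real attacks apply the same principle to models with far more parameters, where the gradient is obtained through automatic differentiation rather than by hand.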

Reinforcement learning in this context essentially weights input perturbations against the prediction value from the model. If the perturbation alters the predicted value in the desired direction, it weights it more positively and so on. This allows for a ‘smarter’ perturbation selection approach.
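The weighting idea can be sketched as a simple epsilon-greedy loop, which is a deliberate simplification of full reinforcement learning. Everything below is assumed for illustration: the detector is a toy linear function, and the action names are hypothetical placeholders rather than the action set of any tool discussed here.

```python
import random


def detector_score(features):
    """Toy stand-in for a black-box detector; higher means 'more malicious'."""
    return (0.9
            - 0.15 * features["benign_strings"]
            - 0.05 * features["padding"]
            + 0.02 * features["renamed_sections"])


def bump(key):
    """Build an action that increments one abstract feature counter."""
    return lambda f: {**f, key: f[key] + 1}


# Hypothetical perturbations, modelled as counters rather than file edits.
ACTIONS = {
    "rename_section": bump("renamed_sections"),
    "append_padding": bump("padding"),
    "add_benign_strings": bump("benign_strings"),
}


def choose_action(rng, values, counts, epsilon):
    untried = [a for a in ACTIONS if counts[a] == 0]
    if untried:
        return untried[0]                     # try every action once first
    if rng.random() < epsilon:
        return rng.choice(list(ACTIONS))      # explore
    return max(ACTIONS, key=lambda a: values[a])  # exploit the best so far


def run_episode(rng, values, counts, epsilon=0.2, max_steps=10, threshold=0.5):
    features = {"benign_strings": 0, "padding": 0, "renamed_sections": 0}
    current = detector_score(features)
    for _ in range(max_steps):
        if current < threshold:
            break
        action = choose_action(rng, values, counts, epsilon)
        features = ACTIONS[action](features)
        new_score = detector_score(features)
        reward = current - new_score          # positive if the score dropped
        counts[action] += 1
        values[action] += (reward - values[action]) / counts[action]
        current = new_score
    return current < threshold


rng = random.Random(0)
values = {a: 0.0 for a in ACTIONS}
counts = {a: 0 for a in ACTIONS}
evaded = [run_episode(rng, values, counts) for _ in range(50)]
best_action = max(values, key=values.get)
print("preferred perturbation:", best_action)
```

Perturbations that push the score in the desired direction accumulate a higher value estimate and are selected more often, which is the 'smarter' selection the paragraph above describes.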

GANs, on the other hand, typically have two networks, a generator and a discriminator, which train in tandem by being pitted against each other. The generator model generates ‘fake’ data, while the discriminator model attempts to determine whether a given sample is real or fake.

Both of these methods enable fast and effective adversarial example generation, which can be applied to many domains. GANs are used in a variety of settings and can generate almost any input; for brevity, this blog looks more closely at the tools that are more security-centric.

MalwareGym – EndgameInc

GitHub

MalwareGym was one of the first automated attack frameworks to use reinforcement learning in the modification of Portable Executable (PE) files. By taking features from clean ‘goodware’ and using them to alter malware executables, MalwareGym can be used to create adversarial examples that bypass malware classifier models (in this case, a gradient-boosted decision tree malware classifier). Under the hood, it uses OpenAI Gym, a library for building and comparing reinforcement learning solutions.

MalwareRL – Bobby Filar

GitHub

While MalwareGym performed attacks against one model, MalwareRL picked up where it left off, with the tool able to conduct attacks against three different malware classifiers, Ember (Elastic Malware Benchmark for Empowering Researchers), SoRel (Sophos-ReversingLabs), and MalConv. MalwareRL also comes with Docker container files, allowing it to be spun up in a container relatively quickly and easily.

Pesidious – CyberForce

GitHub

Pesidious performs a similar attack; however, it uses Generative Adversarial Networks (GANs) alongside its reinforcement learning methodology. Note that Pesidious only supports 32-bit applications.

DW-GAN – Johnnyzn

GitHub

DW-GAN is a GAN-based framework for breaking captchas on the dark web, where many sites are gated to prevent automated scraping. It is another interesting application where ML-equipped tooling comes to the fore.

PassGAN – Briland Hitaj et al (Paper) / Brannon Dorsey (Implementation)

GitHub – Paper

PassGAN uses a GAN to create novel password examples based on leaked password datasets, removing the necessity for a human to carefully create and curate a password wordlist for consequent use with tools such as Hashcat/JohnTheRipper.

Model Theft/Extraction

Model theft, also known as model extraction, is when an attacker recreates a target model without any access to the training data. There aren’t many tooling examples for model theft; we can posit that this is because it’s typically quite a bespoke process, though it’s hard to tell. Even so, it’s an attack vector that is highly worrying, given the relative ease with which a model can be stolen, leading to potentially substantial damages and business losses over time.

KnockOffNets – Tribhuvanesh Orekondy, Bernt Schiele, Mario Fritz

GitHub – Paper

One such tool for the extraction of neural networks is KnockOffNets. KnockOffNets is available as its own standalone repository and as part of the Adversarial Robustness Toolbox. With only a black-box understanding of a model and no predetermined knowledge of its training data, the model can be relatively accurately reproduced for as little as $30, even performing well with interpreting data outside the target model’s training data. This tool shows the relative ease, exploitability, and success of model theft/model extraction attacks.
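The general query-and-imitate idea behind extraction can be sketched in a few lines. This is not the KnockOffNets method or its code: the 'target' below is a hypothetical two-feature linear rule standing in for a remote prediction API, and the surrogate is a small logistic regression trained only on the labels that API returns.

```python
import math
import random


def target_model(x):
    """Stand-in for a remote black-box API; the attacker sees only the label."""
    return 1 if 1.5 * x[0] - 2.0 * x[1] + 0.3 > 0 else 0


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_surrogate(queries, labels, epochs=50, lr=0.5):
    """Fit a logistic regression to the (query, returned label) pairs via SGD."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, y in zip(queries, labels):
            error = y - sigmoid(w[0] * x[0] + w[1] * x[1] + b)
            w[0] += lr * error * x[0]
            w[1] += lr * error * x[1]
            b += lr * error
    return w, b


def surrogate_predict(params, x):
    w, b = params
    return 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0


rng = random.Random(0)

# Step 1: send attacker-chosen inputs to the target and record its answers.
queries = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(500)]
labels = [target_model(x) for x in queries]

# Step 2: train a local copy on those answers alone.
params = train_surrogate(queries, labels)

# Step 3: measure how often the copy agrees with the target on fresh inputs.
holdout = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(200)]
agreement = sum(
    surrogate_predict(params, x) == target_model(x) for x in holdout
) / len(holdout)
print("agreement with target:", agreement)
```

The attacker never sees the target's parameters or training data, only its outputs, which is why rate limiting and query monitoring are common first-line mitigations.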

All Your GNN Models and Data Belong To Me – Yang Zhang, Yun Shen, Azzedine Benameur

Paper

Given its recency and relevancy, it’s worth mentioning the talk ‘All Your GNN Models and Data Belong To Me’ by Zhang, Shen and Benameur from the BlackHat USA 2022 conference. This research outlines how prevalent graph neural networks are throughout society, how susceptible they are to link reidentification attacks, and most importantly – how they can be stolen.

Deserialization Exploitation

While not explicitly pertaining to ML models, deserialization exploits are an often overlooked vulnerability within the ML sphere. These exploits happen when untrusted data is deserialized without any safety check, allowing code embedded in that data to execute. One main culprit is the Pickle file format, which is used almost ubiquitously for sharing pre-trained models. Pickle is inherently vulnerable to a deserialization exploit, allowing attackers to run malicious code upon load. To make matters worse, Pickle is still the preferred storage method for saving/loading models from libraries such as PyTorch and Scikit-Learn, and is widely used by other ML libraries.
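The mechanism is easy to demonstrate with the standard library alone. In the sketch below, a class uses `__reduce__` to tell Pickle which callable to invoke on load; a harmless `len` call stands in for what would be attacker-chosen code. The `suspicious_opcodes` helper and its opcode list are assumptions for illustration, a rough static check built on `pickletools` and not an exhaustive or production-grade scanner.

```python
import pickle
import pickletools


class Payload:
    """Object whose pickle instructs the loader to call a function."""

    def __reduce__(self):
        # Benign stand-in: an attacker could name any importable callable here.
        return (len, ("runs at load time",))


# Opcodes that import objects or call them during unpickling (illustrative).
SUSPICIOUS = {
    "GLOBAL", "STACK_GLOBAL", "INST", "OBJ",
    "NEWOBJ", "NEWOBJ_EX", "REDUCE", "BUILD",
}


def suspicious_opcodes(blob: bytes) -> set:
    """Return the flagged opcode names in a pickle, without unpickling it."""
    return {op.name for op, _arg, _pos in pickletools.genops(blob)} & SUSPICIOUS


crafted_blob = pickle.dumps(Payload())
plain_blob = pickle.dumps({"weights": [0.1, 0.2], "bias": 0.0})

# Loading the crafted pickle calls len(...) instead of rebuilding a Payload.
result = pickle.loads(crafted_blob)

print("value returned by load:", result)
print("crafted pickle flags:  ", sorted(suspicious_opcodes(crafted_blob)))
print("plain pickle flags:    ", sorted(suspicious_opcodes(plain_blob)))
```

Legitimate model files also contain import and call opcodes, since that is how Pickle rebuilds class instances, so practical scanners look at which modules and functions are referenced rather than at the mere presence of these opcodes.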

Fickling – TrailOfBits

GitHub – Blog

The tool Fickling by TrailOfBits is explicitly designed to exploit the Pickle format and detect malicious Pickle files. Fickling boasts a decompiler, static analyzer, and bytecode rewriter. With that, it can inject arbitrary code into existing Pickle files, trace execution, and evaluate its safety.

Keras H5 Lambda Layer Exploit – Chris Anley – NCCGroup

Paper

While not a tool itself, another deserialization exploit is worth mentioning, this time within the Keras library. While Keras supports Pickle files, it also supports the HDF5 format. HDF5 is not inherently vulnerable (that we know of), but when combined with Lambda layers, it can be. Lambdas in Keras can execute arbitrary code as part of the neural network architecture and can be persisted within the HDF5 format. If a Lambda bundled within a pre-trained model in said format contains a remote backdoor or reverse shell, Keras will trigger it automatically upon model load.

Charcuterie – Will Pearce

GitHub

Last but certainly not least is the collection of attacks for ML and ML adjacent libraries – Charcuterie. Released at LabsCon 2022 by Will Pearce, AKA MooHax, Charcuterie ties together a multitude of code execution and deserialization exploits in one place, acting as a demonstration of the many ways ML models are vulnerable outside of their algorithms. While it provides several examples of Pickle and Keras deserialization (though the Keras functionality is commented out), it also includes methods of abusing shared objects in popular ML libraries to load malicious DLLs, Jupyter Notebook AutoLoad abuse, JSON deserialization and many more. We recommend checking out the presentation slides for further reading.

Conclusions

Hopefully, by now, we’ve painted a vivid enough picture to show that the volume of offensive tooling, exploitation, and research in the field is growing, as is our collective attack surface. The tools we’ve looked at in this blog showcase what’s out there in terms of publicly available, open-source tooling, but don’t forget that actors with enough resources (and motivation) have the capability to create more advanced methods of attack. Fear the state-aligned university researcher!

On the other side of the coin, the term ‘script-kiddie’ has been thrown around for a long time, referring to those who rely predominantly on premade tools to attack a system without wholly understanding the field behind it. While not as point-and-shoot as offensive tooling in the traditional sense, the bar has been dramatically lowered for adversaries to conduct attacks on AI/ML systems. Whichever designation one gives them, the reality is that they pose a threat and, no matter the skill level, shouldn’t be ignored.

While these tools require varying skill levels to use and some far more to master, they all contribute to the communal knowledge-base and serve, at the very least, as educational waypoints both for researchers and those stepping into the field for the first time. From an industry perspective, they serve as important tools to harden AI/ML systems against attack, improve model robustness, and evaluate security posture through red and blue team exercises. Ensuring AI model security is critical in this context, as these frameworks enable researchers and practitioners to identify vulnerabilities and mitigate risks before adversaries can exploit them.

As with all technology, we stand on the shoulders of giants; the development and use of these tools will spur research that builds on them and will drive both offensive and defensive research to new heights.

About HiddenLayer

HiddenLayer helps enterprises safeguard the machine learning models behind their most important products with a comprehensive security platform. Only HiddenLayer offers turnkey AI/ML security that does not add unnecessary complexity to models and does not require access to raw data and algorithms. Founded in March of 2022 by experienced security and ML professionals, HiddenLayer is based in Austin, Texas, and is backed by cybersecurity investment specialist firm Ten Eleven Ventures. For more information, visit www.hiddenlayer.com and follow us on LinkedIn or Twitter.

About SAI

Synaptic Adversarial Intelligence (SAI) is a team of multidisciplinary cyber security experts and data scientists, who are on a mission to increase general awareness surrounding the threats facing machine learning and artificial intelligence systems. Through education, we aim to help data scientists, MLDevOps teams and cyber security practitioners better evaluate the vulnerabilities and risks associated with ML/AI, ultimately leading to more security conscious implementations and deployments.
