
AWS Sagemaker Python SDK Vulnerability Report

April 30, 2024

Numpy defaults to allowing Pickle to be run when content type is NPY or NPZ

CVE Number

CVE-2024-34072

Summary

A deserialization vulnerability exists in the NumpyDeserializer.deserialize function in base_deserializers.py. The deserializer accepts an optional allow_pickle argument, which it passes to np.load; setting it to False prevents pickled object arrays from being loaded. The argument defaulted to True, so a malicious pickle delivered as NPY or NPZ content would be loaded and executed. The SDK's own code never sets the argument, so the unsafe default applies wherever the deserializer is used and code execution can potentially occur.

Products Impacted

This vulnerability is present in AWS Sagemaker Python SDK v2.154.0 up to v2.218.0.

CVSS Score: 7.8

AV:L/AC:L/PR:N/UI:R/S:U/C:H/I:H/A:H
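
As a rough check on how the 7.8 score follows from this vector, the sketch below applies the CVSS v3.1 base-score formula for the scope-unchanged case. The advisory does not state the CVSS version, so v3.1 is an assumption; the weights are those published in the v3.1 specification, and the names WEIGHTS, roundup and base_score are illustrative.

```python
import math

# Metric weights from the CVSS v3.1 specification (scope unchanged).
# Assumption: the advisory's vector is a v3.x vector; it does not say so.
WEIGHTS = {
    "AV": {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2},
    "AC": {"L": 0.77, "H": 0.44},
    "PR": {"N": 0.85, "L": 0.62, "H": 0.27},
    "UI": {"N": 0.85, "R": 0.62},
    "C": {"H": 0.56, "L": 0.22, "N": 0.0},
    "I": {"H": 0.56, "L": 0.22, "N": 0.0},
    "A": {"H": 0.56, "L": 0.22, "N": 0.0},
}


def roundup(value):
    """Round up to one decimal place, following the v3.1 Roundup definition."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score(vector):
    """Compute the base score for a scope-unchanged vector string."""
    metrics = dict(part.split(":") for part in vector.split("/"))
    if metrics["S"] != "U":
        raise ValueError("this sketch only handles scope-unchanged vectors")
    w = {name: WEIGHTS[name][metrics[name]] for name in WEIGHTS}
    impact = 6.42 * (1 - (1 - w["C"]) * (1 - w["I"]) * (1 - w["A"]))
    exploitability = 8.22 * w["AV"] * w["AC"] * w["PR"] * w["UI"]
    if impact <= 0:
        return 0.0
    return roundup(min(impact + exploitability, 10))


score = base_score("AV:L/AC:L/PR:N/UI:R/S:U/C:H/I:H/A:H")
print(score)
```

The local attack vector (AV:L) and required user interaction (UI:R) reflect that the victim has to load the attacker's file themselves.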

CWE Categorization

CWE-502: Deserialization of Untrusted Data

Details

As stated above, the vulnerability exists in the NumpyDeserializer deserialize function:

def deserialize(self, stream, content_type):
        """Deserialize data from an inference endpoint into a NumPy array.

        Args:
            stream (botocore.response.StreamingBody): Data to be deserialized.
            content_type (str): The MIME type of the data.

        Returns:
            numpy.ndarray: The data deserialized into a NumPy array.
        """
        try:
            if content_type == "text/csv":
                return np.genfromtxt(
                    codecs.getreader("utf-8")(stream), delimiter=",", dtype=self.dtype
                )
            if content_type == "application/json":
                return np.array(json.load(codecs.getreader("utf-8")(stream)), dtype=self.dtype)
            if content_type == "application/x-npy":
                return np.load(io.BytesIO(stream.read()), allow_pickle=self.allow_pickle)
            if content_type == "application/x-npz":
                try:
                    return np.load(io.BytesIO(stream.read()), allow_pickle=self.allow_pickle)
                finally:
                    stream.close()
        finally:
            stream.close()

        raise ValueError("%s cannot read content type %s." % (__class__.__name__, content_type))

If the content type is either “application/x-npy” or “application/x-npz”, the stream, which may contain a malicious pickle, is passed to np.load with the instance's allow_pickle value, allowing code execution to occur. The root cause of the vulnerability, however, lies in the class initializer:

    def __init__(self, dtype=None, accept="application/x-npy", allow_pickle=True):
        """Initialize a ``NumpyDeserializer`` instance.

        Args:
            dtype (str): The dtype of the data (default: None).
            accept (union[str, tuple[str]]): The MIME type (or tuple of allowable MIME types) that
                is expected from the inference endpoint (default: "application/x-npy").
            allow_pickle (bool): Allow loading pickled object arrays (default: True).
        """
        super(NumpyDeserializer, self).__init__(accept=accept)
        self.dtype = dtype
        self.allow_pickle = allow_pickle

As mentioned in the summary, because allow_pickle defaults to True, the deserializer is unsafe by default. A user is compromised if their code opens a malicious file and passes the stream to deserialize, as in the example below:

# Use the NumpyDeserializer
from sagemaker.base_deserializers import NumpyDeserializer

with open("bad.npy", "rb") as f:
    NumpyDeserializer().deserialize(f, "application/x-npy")

with open("bad.npy", "rb") as f:
    NumpyDeserializer().deserialize(f, "application/x-npz")

When the above file is run, we can see that “pwned” is printed out twice:

[Screenshot: terminal output showing “pwned” printed twice]
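
The advisory does not show how bad.npy was built. As a minimal sketch of the underlying mechanism, the standard pickle module alone is enough to show why unpickling untrusted bytes runs code, and this is the machinery np.load falls back on when allow_pickle is True. The Payload class is hypothetical, and a harmless print call stands in for a real attacker command.

```python
import pickle


class Payload:
    """Hypothetical stand-in for the contents of a malicious file."""

    def __reduce__(self):
        # Tells pickle to call print("pwned") when the bytes are loaded.
        # A real attacker would return something like os.system instead.
        return (print, ("pwned",))


blob = pickle.dumps(Payload())

# Loading runs the callable chosen by whoever produced the bytes.
result = pickle.loads(blob)  # prints "pwned"

# No Payload object comes back, only the return value of print.
print(result is None)
```

With allow_pickle=False, np.load raises an error on such data instead of unpickling it, so constructing the deserializer as NumpyDeserializer(allow_pickle=False) avoids this path.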

Command Injection in CaptureDependency Function

CVE Number

CVE-2024-34073

Summary

A command injection vulnerability exists in the capture_dependencies function in src/sagemaker/serve/save_retrive/version_1_0_0/save/utils.py. The function interpolates its requirements_path parameter into a shell command, so arbitrary system commands can be run on the affected machine. This would not normally be an issue, but the parameter can be altered by a user when the function is used from save_handler.py in the same directory.

Products Impacted

This vulnerability is present in AWS Sagemaker Python SDK v2.199.0 up to v2.218.0.

CVSS Score: 7.8

AV:L/AC:L/PR:N/UI:R/S:U/C:H/I:H/A:H

CWE Categorization

CWE-78: Improper Neutralization of Special Elements used in an OS Command (‘OS Command Injection’)

Details

The capture_dependencies function takes a string representing the requirements path, tries importing pigar, and then passes the requirements_path to os.system.

def capture_dependencies(requirements_path: str):
    """Placeholder docstring"""
    logger.info("Capturing dependencies...")

    try:
        import pigar

        pigar.__version__  # pylint: disable=W0104
    except ModuleNotFoundError:
        logger.warning(
            "pigar module is not installed in python environment, "
            "dependency generation may be incomplete"
            "Checkout the instructions on the installation page of its repo: "
            "https://github.com/damnever/pigar "
            "And follow the ones that match your environment."
            "Please note that you may need to restart your runtime after installation."
        )
        import sagemaker

        sagemaker_dependency = f"{sagemaker.__package__}=={sagemaker.__version__}"
        with open(requirements_path, "w") as f:
            f.write(sagemaker_dependency)
        return

    command = f"pigar gen -f {Path(requirements_path)} {os.getcwd()}"
    logging.info("Running command %s", command)

    os.system(command)
    logger.info("Dependencies captured successfully")

We can then create a proof of concept which breaks out of the call to pigar and instead runs ls:

from sagemaker.serve.save_retrive.version_1_0_0.save.utils import capture_dependencies

requirements_path = ";ls"

capture_dependencies(requirements_path)

When run, we can see that the “ls” command was executed:

[Screenshot: terminal output of the injected ls command]
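
To make the injection concrete, the sketch below rebuilds the command string the same way capture_dependencies does and contrasts it with an argument list that never passes through a shell. The helper names build_shell_command and build_argv are hypothetical and are not part of the SDK, and nothing here actually invokes pigar.

```python
import os
import shlex
from pathlib import Path


def build_shell_command(requirements_path, cwd):
    """Mirror how the vulnerable function builds its command string."""
    return f"pigar gen -f {Path(requirements_path)} {cwd}"


def build_argv(requirements_path, cwd):
    """Hypothetical safer form: an argument list for use without a shell."""
    return ["pigar", "gen", "-f", str(Path(requirements_path)), cwd]


cwd = os.getcwd()
injected = ";ls"

# A shell treats ";" as a command separator, so this string is two commands:
# "pigar gen -f" followed by "ls <cwd>".
shell_command = build_shell_command(injected, cwd)
print(shell_command)

# As a list element, ";ls" stays a single literal argument to pigar.
argv = build_argv(injected, cwd)
print(argv)

# If a shell string is unavoidable, shlex.quote neutralizes the separator.
print(shlex.quote(injected))
```

Passing argv to subprocess.run(argv, check=True) would run pigar directly without shell parsing. This illustrates the general remedy and is not necessarily how the SDK itself was patched.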

Project URL

https://github.com/aws/sagemaker-python-sdk

Researcher: Kasimir Schulz, Principal Security Researcher, HiddenLayer

HiddenLayer, a Gartner-recognized Cool Vendor for AI Security, is the leading provider of Security for AI. Its security platform helps enterprises safeguard the machine learning models behind their most important products. HiddenLayer is the only company to offer turnkey security for AI that does not add unnecessary complexity to models and does not require access to raw data and algorithms. Founded by a team with deep roots in security and ML, HiddenLayer aims to protect enterprises’ AI from inference, bypass, extraction attacks, and model theft. The company is backed by a group of strategic investors, including M12, Microsoft’s Venture Fund, Moore Strategic Ventures, Booz Allen Ventures, IBM Ventures, and Capital One Ventures.
