HiddenLayer, a Gartner-recognized Cool Vendor for AI Security, is the leading provider of Security for AI. Its security platform helps enterprises safeguard the machine learning models behind their most important products. HiddenLayer is the only company to offer turnkey security for AI that does not add unnecessary complexity to models and does not require access to raw data and algorithms. Founded by a team with deep roots in security and ML, HiddenLayer aims to protect enterprises' AI from inference, bypass, and extraction attacks, and from model theft. The company is backed by a group of strategic investors, including M12 (Microsoft’s venture fund), Moore Strategic Ventures, Booz Allen Ventures, IBM Ventures, and Capital One Ventures.
April 30, 2024
NumPy deserializer defaults to allowing pickle execution when the content type is NPY or NPZ
Summary
A deserialization vulnerability exists in the NumpyDeserializer.deserialize function of the base_deserializers Python file. The deserializer lets the user set an optional argument called allow_pickle, which is passed to np.load and, when set to False, allows a NumPy file to be loaded safely. However, this parameter defaults to True, so malicious pickle files are loaded and executed. The parameter is never overridden elsewhere in the codebase, so code execution can potentially occur wherever the deserializer is used with its default value.
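Loading a pickle executes code because of how pickle rebuilds objects: an object's __reduce__ method can name any importable callable plus its arguments, and pickle.loads calls that callable during loading. The advisory does not show how bad.npy was built, so the following is an assumption: a minimal, benign sketch of such a payload using only the standard library. The class name BenignPayload is hypothetical, and the "attack" is simply a call to print.

```python
import contextlib
import io
import pickle


class BenignPayload:
    """Hypothetical payload: unpickling it calls print("pwned")."""

    def __reduce__(self):
        # pickle records (callable, args); pickle.loads will invoke print("pwned")
        return (print, ("pwned",))


# Serialize the payload; bytes like these are what a file such as bad.npy could hold.
payload_bytes = pickle.dumps(BenignPayload())

# Deserializing runs the recorded call; capture stdout to observe it.
captured = io.StringIO()
with contextlib.redirect_stdout(captured):
    result = pickle.loads(payload_bytes)

print(repr(captured.getvalue()))  # 'pwned\n'
print(result)  # None, the return value of print()
```

With allow_pickle=True, np.load falls back to pickle for input that is not in NPY or NPZ format, so handing it these bytes would be expected to trigger the same call.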
Products Impacted
This vulnerability is present in AWS SageMaker Python SDK v2.154.0 up to v2.218.0.
CVSS Score: 7.8
AV:L/AC:L/PR:N/UI:R/S:U/C:H/I:H/A:H
CWE Categorization
CWE-502: Deserialization of Untrusted Data
Details
As stated above, the vulnerability exists in the NumpyDeserializer deserialize function:
```python
def deserialize(self, stream, content_type):
    """Deserialize data from an inference endpoint into a NumPy array.

    Args:
        stream (botocore.response.StreamingBody): Data to be deserialized.
        content_type (str): The MIME type of the data.

    Returns:
        numpy.ndarray: The data deserialized into a NumPy array.
    """
    try:
        if content_type == "text/csv":
            return np.genfromtxt(
                codecs.getreader("utf-8")(stream), delimiter=",", dtype=self.dtype
            )
        if content_type == "application/json":
            return np.array(json.load(codecs.getreader("utf-8")(stream)), dtype=self.dtype)
        if content_type == "application/x-npy":
            return np.load(io.BytesIO(stream.read()), allow_pickle=self.allow_pickle)
        if content_type == "application/x-npz":
            try:
                return np.load(io.BytesIO(stream.read()), allow_pickle=self.allow_pickle)
            finally:
                stream.close()
    finally:
        stream.close()

    raise ValueError("%s cannot read content type %s." % (__class__.__name__, content_type))
```
If the content type is either “application/x-npy” or “application/x-npz”, the stream containing the malicious pickle file is passed to np.load, allowing code execution to occur. The root cause of the vulnerability, however, lies in the class initializer:
```python
def __init__(self, dtype=None, accept="application/x-npy", allow_pickle=True):
    """Initialize a ``NumpyDeserializer`` instance.

    Args:
        dtype (str): The dtype of the data (default: None).
        accept (union[str, tuple[str]]): The MIME type (or tuple of allowable MIME types) that
            is expected from the inference endpoint (default: "application/x-npy").
        allow_pickle (bool): Allow loading pickled object arrays (default: True).
    """
    super(NumpyDeserializer, self).__init__(accept=accept)
    self.dtype = dtype
    self.allow_pickle = allow_pickle
```
As mentioned in the summary, because allow_pickle defaults to True, the function is unsafe by default. A user would be compromised if their code opens a malicious pickle file and passes the stream to deserialize, as in the example below:
```python
# Use the NumpyDeserializer
from sagemaker.base_deserializers import NumpyDeserializer

with open("bad.npy", "rb") as f:
    NumpyDeserializer().deserialize(f, "application/x-npy")

with open("bad.npy", "rb") as f:
    NumpyDeserializer().deserialize(f, "application/x-npz")
```
When the above file is run, “pwned” is printed twice, once for each call.
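Both calls execute the payload because np.load never sees the MIME type passed to deserialize. It sniffs the leading bytes of its input instead: a ZIP signature selects the NPZ path, the \x93NUMPY prefix selects the NPY path, and anything else is handed to pickle only when allow_pickle is true. The sketch below approximates that dispatch with the standard library to show where a default of allow_pickle=False would stop a raw pickle. The names sniff_numpy_format and check_payload are hypothetical, and this is a simplification, not NumPy's implementation; a genuine .npy file can also embed pickled object arrays, which np.load likewise refuses when allow_pickle is False. In affected SDK versions, callers can also pass allow_pickle=False explicitly, since the NumpyDeserializer initializer accepts it.

```python
import pickle

NPY_MAGIC = b"\x93NUMPY"                     # leading bytes of a .npy file
ZIP_MAGICS = (b"PK\x03\x04", b"PK\x05\x06")  # .npz archives (including an empty ZIP)


def sniff_numpy_format(data: bytes) -> str:
    """Approximate which branch np.load takes, based only on the leading bytes."""
    if data.startswith(ZIP_MAGICS):
        return "npz"
    if data.startswith(NPY_MAGIC):
        return "npy"
    return "pickle"  # fallback branch: only taken when pickle loading is allowed


def check_payload(data: bytes, allow_pickle: bool = False) -> str:
    """Hypothetical guard with a safe default: reject raw pickles unless opted in."""
    kind = sniff_numpy_format(data)
    if kind == "pickle" and not allow_pickle:
        raise ValueError("refusing to load pickled data when allow_pickle=False")
    return kind


raw_pickle = pickle.dumps([1, 2, 3])
print(sniff_numpy_format(raw_pickle))  # pickle
try:
    check_payload(raw_pickle)  # the safe default rejects it
except ValueError as exc:
    print(exc)
```

Defaulting the flag to False means a caller has to opt in to pickle loading, rather than having to know to opt out.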
Command Injection in capture_dependencies Function
Summary
A command injection vulnerability exists in the capture_dependencies function of the src/sagemaker/serve/save_retrive/version_1_0_0/save/utils.py Python file. The injection allows arbitrary system commands to be run on the affected machine. While this may not normally be an issue, the requirements_path parameter can be controlled by a user when the function is called from the save_handler.py file in the same directory.
Products Impacted
This vulnerability is present in AWS SageMaker Python SDK v2.199.0 up to v2.218.0.
CVSS Score: 7.8
AV:L/AC:L/PR:N/UI:R/S:U/C:H/I:H/A:H
CWE Categorization
CWE-78: Improper Neutralization of Special Elements used in an OS Command (‘OS Command Injection’)
Details
The capture_dependencies function takes a string representing the requirements path and tries to import pigar. If pigar is installed, it interpolates requirements_path, unquoted, into a shell command string and passes that string to os.system.
```python
def capture_dependencies(requirements_path: str):
    """Placeholder docstring"""
    logger.info("Capturing dependencies...")

    try:
        import pigar

        pigar.__version__  # pylint: disable=W0104
    except ModuleNotFoundError:
        logger.warning(
            "pigar module is not installed in python environment, "
            "dependency generation may be incomplete"
            "Checkout the instructions on the installation page of its repo: "
            "https://github.com/damnever/pigar "
            "And follow the ones that match your environment."
            "Please note that you may need to restart your runtime after installation."
        )
        import sagemaker

        sagemaker_dependency = f"{sagemaker.__package__}=={sagemaker.__version__}"
        with open(requirements_path, "w") as f:
            f.write(sagemaker_dependency)
        return

    command = f"pigar gen -f {Path(requirements_path)} {os.getcwd()}"
    logging.info("Running command %s", command)

    os.system(command)
    logger.info("Dependencies captured successfully")
```
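To see why this is injectable, it helps to look at the exact string handed to os.system. Because requirements_path is interpolated without quoting, a shell metacharacter such as ; ends the pigar command and starts a new one. The sketch below rebuilds the same f-string with a fixed working directory (/workspace is an assumed placeholder for os.getcwd()) and, for comparison, shows how shlex.quote would turn the same input into a single literal argument. Quoting is one possible mitigation, not necessarily how the SDK was patched.

```python
import shlex
from pathlib import Path

requirements_path = ";ls"
cwd = "/workspace"  # placeholder standing in for os.getcwd()

# Same construction as capture_dependencies: the path is interpolated unquoted.
vulnerable = f"pigar gen -f {Path(requirements_path)} {cwd}"
print(vulnerable)  # pigar gen -f ;ls /workspace

# Quoting each interpolated value keeps ";ls" as one literal argument to pigar.
quoted = f"pigar gen -f {shlex.quote(str(Path(requirements_path)))} {shlex.quote(cwd)}"
print(quoted)  # pigar gen -f ';ls' /workspace
```

A POSIX shell would run the first string as two commands, pigar gen -f followed by ls /workspace, but would pass ;ls to pigar as a filename in the second.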
We can then create a proof of concept that cuts the pigar command short and runs ls instead:
```python
from sagemaker.serve.save_retrive.version_1_0_0.save.utils import capture_dependencies

requirements_path = ";ls"
capture_dependencies(requirements_path)
```
When run, the “ls” command is executed and lists the contents of the current working directory.
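A common way to avoid this class of bug is to skip the shell entirely and pass an argument list to subprocess.run, so the operating system receives requirements_path as a single argument and never interprets ;. The sketch below assumes that approach; build_pigar_argv is a hypothetical helper, not part of the SDK. Because pigar may not be installed, the runnable part substitutes a small Python one-liner that echoes its first argument, showing that ";ls" arrives verbatim.

```python
import os
import subprocess
import sys
from pathlib import Path


def build_pigar_argv(requirements_path: str, project_dir: str) -> list[str]:
    """Hypothetical helper: argv for `pigar gen -f <path> <dir>` with no shell involved."""
    return ["pigar", "gen", "-f", str(Path(requirements_path)), project_dir]


argv = build_pigar_argv(";ls", os.getcwd())
print(argv)  # ';ls' is one list element, not a command separator
# With pigar on PATH, real code would call: subprocess.run(argv, check=True)

# Stand-in demonstration: a child process echoes the argument it received.
echo = subprocess.run(
    [sys.executable, "-c", "import sys; print(sys.argv[1])", ";ls"],
    capture_output=True,
    text=True,
    check=True,
)
print(echo.stdout.strip())  # ;ls
```

Since no shell parses the argument list, no quoting is needed, and the same helper stays safe for paths containing spaces or other metacharacters.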