CVE Number
Summary
A deserialization vulnerability exists in the NumpyDeserializer.deserialize function in base_deserializers.py. The deserializer accepts an optional argument, allow_pickle, which is passed to np.load and can be set to False to load a NumPy file safely. By default the parameter was set to True, so malicious pickle data was loaded and executed. Nowhere in the codebase is the parameter set explicitly, so code execution can potentially occur wherever the deserializer is used.
Products Impacted
This vulnerability is present in the AWS SageMaker Python SDK from v2.154.0 up to v2.218.0.
CVSS Score: 7.8
AV:L/AC:L/PR:N/UI:R/S:U/C:H/I:H/A:H
CWE Categorization
CWE-502: Deserialization of Untrusted Data
Details
As stated above, the vulnerability exists in the NumpyDeserializer deserialize function:
def deserialize(self, stream, content_type):
    """Deserialize data from an inference endpoint into a NumPy array.

    Args:
        stream (botocore.response.StreamingBody): Data to be deserialized.
        content_type (str): The MIME type of the data.

    Returns:
        numpy.ndarray: The data deserialized into a NumPy array.
    """
    try:
        if content_type == "text/csv":
            return np.genfromtxt(
                codecs.getreader("utf-8")(stream), delimiter=",", dtype=self.dtype
            )
        if content_type == "application/json":
            return np.array(json.load(codecs.getreader("utf-8")(stream)), dtype=self.dtype)
        if content_type == "application/x-npy":
            return np.load(io.BytesIO(stream.read()), allow_pickle=self.allow_pickle)
        if content_type == "application/x-npz":
            try:
                return np.load(io.BytesIO(stream.read()), allow_pickle=self.allow_pickle)
            finally:
                stream.close()
    finally:
        stream.close()

    raise ValueError("%s cannot read content type %s." % (__class__.__name__, content_type))
If the content type is either “application/x-npy” or “application/x-npz”, the stream containing the malicious pickle data is passed to np.load, allowing code execution to occur. The root cause of the vulnerability, however, lies in the class initializer, where allow_pickle defaults to True.
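The default in question can be illustrated with a simplified stand-in for such an initializer. The class name SketchNumpyDeserializer and its reduced parameter list are assumptions made for illustration, not the SDK's actual source; the only detail taken from this advisory is that allow_pickle defaults to True.

```python
class SketchNumpyDeserializer:
    """Hypothetical stand-in for illustration; not the SDK's actual class."""

    def __init__(self, dtype=None, allow_pickle=True):
        # allow_pickle=True is the unsafe default described in this advisory.
        self.dtype = dtype
        # The stored value is what deserialize later forwards to np.load.
        self.allow_pickle = allow_pickle


# A caller who passes nothing silently opts in to pickle loading.
default_instance = SketchNumpyDeserializer()
# Only an explicit override turns pickle loading off.
hardened_instance = SketchNumpyDeserializer(allow_pickle=False)

print(default_instance.allow_pickle)   # True
print(hardened_instance.allow_pickle)  # False
```

With a default like this, every call site has to remember to pass allow_pickle=False; a call that omits the argument gets pickle loading without asking for it.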
As mentioned in the summary, because allow_pickle defaults to True, the function is unsafe by default. A user would be compromised if their code opens a malicious pickle file and passes the stream to deserialize, as in the example below:
# Use the NumpyDeserializer
from sagemaker.base_deserializers import NumpyDeserializer

with open("bad.npy", "rb") as f:
    NumpyDeserializer().deserialize(f, "application/x-npy")

with open("bad.npy", "rb") as f:
    NumpyDeserializer().deserialize(f, "application/x-npz")
When the above file is run, “pwned” is printed twice, once for each call to deserialize.
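Why merely loading such a file runs code can be sketched with Python's standard pickle module, which np.load uses to read pickled object arrays when allow_pickle is True. The Payload class below is a hypothetical, harmless stand-in for whatever bad.npy contains (its contents are not shown in this advisory); it only prints a string.

```python
import pickle


class Payload:
    """Hypothetical, harmless stand-in for a malicious pickled object."""

    def __reduce__(self):
        # pickle records this (callable, args) pair instead of the object's
        # state, and calls callable(*args) when the data is loaded.
        return (print, ("pwned",))


# Serializing does not run the callable; it only records a reference to it.
blob = pickle.dumps(Payload())

# Loading is enough to trigger the call: this line prints "pwned".
result = pickle.loads(blob)

# The loaded "object" is just whatever the callable returned (None for print).
print(result)
```

The victim never has to call a method on the loaded object: the call happens inside pickle.loads itself, which is why any deserializer that accepts pickle data from an untrusted source can be driven to run attacker-chosen callables.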