HiddenLayer, a Gartner-recognized Cool Vendor for AI Security, is the leading provider of Security for AI. Its security platform helps enterprises safeguard the machine learning models behind their most important products. HiddenLayer is the only company to offer turnkey security for AI that does not add unnecessary complexity to models and does not require access to raw data and algorithms. Founded by a team with deep roots in security and ML, HiddenLayer aims to protect enterprises’ AI from inference, bypass, and extraction attacks, as well as model theft. The company is backed by a group of strategic investors, including M12 (Microsoft’s venture fund), Moore Strategic Ventures, Booz Allen Ventures, IBM Ventures, and Capital One Ventures.
June 4, 2024
Pickle Load in Serialized Profile Load
CVE Number
CVE-2024-37062
Summary
Profile reports can be serialized and deserialized through the load/loads and dump/dumps functions, allowing people to share reports with each other. Reports are serialized using the Python pickle module, which is inherently insecure: loading a maliciously crafted file can execute arbitrary code.
Products Impacted
This vulnerability is present in ydata-profiling v3.7.0 or newer.
CVSS Score: 7.8
AV:L/AC:L/PR:N/UI:R/S:U/C:H/I:H/A:H
CWE Categorization
CWE-502: Deserialization of Untrusted Data.
Details
In src/ydata_profiling/serialize_report.py, pickle is used to load serialized profiles within the loads function. The load function relies on loads and is therefore also vulnerable:
def loads(self, data: bytes) -> Union["ProfileReport", "SerializeReport"]:
    """
    Deserialize the serialized report
    Args:
        data: The bytes of a serialized ProfileReport object.
    Raises:
        ValueError: if ignore_config is set to False and the configs do not match.
    Returns:
        self
    """
    import pickle
    try:
        (
            df_hash,
            loaded_config,
            loaded_description_set,
            loaded_report,
        ) = pickle.loads(data)
        # ... (remainder of the function omitted)
This can be abused by generating a malicious pickle and using the load or loads functions:
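import pickle
from ydata_profiling import ProfileReport
class Exploit:
    def __reduce__(self):
        # pickle calls eval("print('pwned')") when this object is unpickled
        return eval, ("print('pwned')",)
# The payload runs inside pickle.loads, before loads unpacks the report
profile = ProfileReport().loads(pickle.dumps(Exploit()))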
In the example above, we pass the output of pickle.dumps directly into the loads function. In a real attack, the victim would call load on a malicious file, or pass bytes read from a file or received over the network into loads.
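As a general mitigation (not the fix adopted by ydata-profiling), the Python pickle documentation describes restricting which globals an unpickler may resolve by overriding pickle.Unpickler.find_class. The sketch below assumes a hypothetical allowlist, SAFE_GLOBALS, and a hypothetical helper, restricted_loads. A real serialized ProfileReport references library classes that would have to be allowlisted, so this only illustrates the technique: the Exploit payload from the example above is rejected because resolving builtins.eval is not permitted.

```python
import io
import pickle

# Hypothetical allowlist of (module, name) globals the loader may resolve.
# Empty here, so only plain containers and primitives can be unpickled.
SAFE_GLOBALS: frozenset[tuple[str, str]] = frozenset()


class RestrictedUnpickler(pickle.Unpickler):
    """Unpickler that refuses to import any global not on the allowlist."""

    def find_class(self, module: str, name: str):
        if (module, name) in SAFE_GLOBALS:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")


def restricted_loads(data: bytes):
    """Hypothetical stand-in for pickle.loads that applies the allowlist."""
    return RestrictedUnpickler(io.BytesIO(data)).load()


class Exploit:
    # Same payload shape as the advisory: eval is called when the pickle is loaded.
    def __reduce__(self):
        return eval, ("print('pwned')",)


if __name__ == "__main__":
    benign = pickle.dumps(("df-hash", {"title": "report"}, [1, 2, 3], "<html></html>"))
    print(restricted_loads(benign))
    try:
        restricted_loads(pickle.dumps(Exploit()))
    except pickle.UnpicklingError as exc:
        print("blocked:", exc)
```

Restricting globals narrows the attack surface, but the Python documentation still warns that pickle is not secure against untrusted data; a data-only format such as JSON avoids the problem entirely.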
XSS Injection in HTML Profile Report Generation
CVE Number
CVE-2024-37063
Summary
ProfileReports can be saved as HTML files so that they can be viewed directly in a browser. To do this, the library renders the report with Jinja2 templates. However, Jinja2 does not auto-escape rendered HTML by default, so an attacker who controls the dataset can inject an XSS payload that runs arbitrary JavaScript when the report is viewed.
Products Impacted
This vulnerability is present in ydata-profiling v3.7.0 or newer.
CVSS Score: 7.8
AV:L/AC:L/PR:N/UI:R/S:U/C:H/I:H/A:H
CWE Categorization
CWE-79: Improper Neutralization of Input During Web Page Generation (‘Cross-site Scripting’)
Details
In src/ydata_profiling/report/presentation/flavours/html/templates.py, the Jinja2 environment is initialized without setting autoescape to True, allowing a maliciously crafted dataset to perform an XSS attack when an HTML report is generated.
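In Jinja2 itself, escaping is enabled when the environment is created, for example by passing autoescape=True to jinja2.Environment. To show what that changes without depending on Jinja2, the stdlib-only sketch below (not ydata-profiling code) interpolates a hypothetical attacker-controlled cell value into an HTML table row, once raw and once passed through html.escape. The names MALICIOUS_CELL and render_row are illustrative; Jinja2's escaping goes through MarkupSafe, so the exact entity spellings it emits may differ slightly.

```python
import html

# Hypothetical attacker-controlled value, e.g. a column name or category in a dataset.
MALICIOUS_CELL = '<img src=x onerror="alert(document.domain)">'


def render_row(value: str, autoescape: bool) -> str:
    """Build one HTML table row, optionally escaping the interpolated value."""
    cell = html.escape(value, quote=True) if autoescape else value
    return f"<tr><td>{cell}</td></tr>"


if __name__ == "__main__":
    # Unescaped: the browser parses a live <img> tag and runs the onerror handler.
    print(render_row(MALICIOUS_CELL, autoescape=False))
    # Escaped: the same value is displayed as inert text.
    print(render_row(MALICIOUS_CELL, autoescape=True))
```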
Pickle Load in Read Pandas Utility Function
CVE Number
CVE-2024-37064
Summary
The ydata-profiling library lets users load pandas datasets from their filesystem with the read_pandas function, which selects a pandas reader based on the file extension. One of the supported formats is Python pickle, so when a user loads a maliciously crafted pickle file, arbitrary code runs on the user’s system.
Products Impacted
This vulnerability is present in ydata-profiling v3.7.0 or newer.
CVSS Score: 7.8
AV:L/AC:L/PR:N/UI:R/S:U/C:H/I:H/A:H
CWE Categorization
CWE-502: Deserialization of Untrusted Data.
Details
In src/ydata_profiling/utils/dataframe.py, the read_pandas utility function loads files with a .pkl or .pickle extension through pd.read_pickle, which deserializes them with pickle:
def read_pandas(file_name: Path) -> pd.DataFrame:
    """Read DataFrame based on the file extension. This function is used when the file is in a standard format.
    Various file types are supported (.csv, .json, .jsonl, .data, .tsv, .xls, .xlsx, .xpt, .sas7bdat, .parquet)
    Args:
        file_name: the file to read
    Returns:
        DataFrame
    Notes:
        This function is based on pandas IO tools:
        https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html
        https://pandas.pydata.org/pandas-docs/stable/reference/io.html
        This function is not intended to be flexible or complete. The main use case is to be able to read files without
        user input, which is currently used in the editor integration. For more advanced use cases, the user should load
        the DataFrame in code.
    """
    extension = uncompressed_extension(file_name)
    if extension == ".json":
        df = pd.read_json(str(file_name))
    elif extension == ".jsonl":
        df = pd.read_json(str(file_name), lines=True)
    elif extension == ".dta":
        df = pd.read_stata(str(file_name))
    elif extension == ".tsv":
        df = pd.read_csv(str(file_name), sep="\t")
    elif extension in [".xls", ".xlsx"]:
        df = pd.read_excel(str(file_name))
    elif extension in [".hdf", ".h5"]:
        df = pd.read_hdf(str(file_name))
    elif extension in [".sas7bdat", ".xpt"]:
        df = pd.read_sas(str(file_name))
    elif extension == ".parquet":
        df = pd.read_parquet(str(file_name))
    elif extension in [".pkl", ".pickle"]:
        df = pd.read_pickle(str(file_name))
    # ... (remainder of the function omitted)
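One way to harden an extension-based loader like this is to refuse pickle formats for untrusted input unless the caller opts in explicitly. The sketch below is hypothetical and not part of ydata-profiling: check_untrusted_input and uncompressed_suffix are illustrative names, and the set of compression suffixes is an assumption. Because the excerpt dispatches on uncompressed_extension(file_name), a compressed pickle such as data.pkl.gz is presumably routed to pd.read_pickle as well, so the sketch looks past one compression suffix before checking the format.

```python
from pathlib import Path

# Compression suffixes to look past before checking the format (assumed list).
COMPRESSION_SUFFIXES = {".gz", ".bz2", ".zip", ".xz"}
# Extensions handled by pickle-based readers, which can execute code on load.
PICKLE_EXTENSIONS = {".pkl", ".pickle"}


def uncompressed_suffix(path: Path) -> str:
    """Return the lowercased extension, skipping one trailing compression suffix."""
    suffix = path.suffix.lower()
    if suffix in COMPRESSION_SUFFIXES:
        return path.with_suffix("").suffix.lower()
    return suffix


def check_untrusted_input(path: Path, allow_pickle: bool = False) -> str:
    """Return the file's format extension, rejecting pickle files unless allowed."""
    extension = uncompressed_suffix(path)
    if extension in PICKLE_EXTENSIONS and not allow_pickle:
        raise ValueError(f"refusing to load {path.name}: pickle files can execute code when read")
    return extension


if __name__ == "__main__":
    for name in ["data.csv", "data.pkl.gz", "DATA.PICKLE"]:
        try:
            print(name, "->", check_untrusted_input(Path(name)))
        except ValueError as exc:
            print(name, "->", exc)
```

Lowercasing the suffix is a deliberate choice in this sketch, so that a file named DATA.PKL is caught as well; the excerpt above compares extensions case-sensitively.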
Although users can call this function directly in code, it is also used by default by the command-line tool: