Unsafe Deserialization in DeepSpeed utility function when loading the model file
April 3, 2025

Products Impacted
Lightning AI’s pytorch-lightning.
CVSS Score: 7.8
AV:L/AC:L/PR:N/UI:R/S:U/C:H/I:H/A:H
CWE Categorization
CWE-502: Deserialization of Untrusted Data.
Details
The cause of this vulnerability is in the convert_zero_checkpoint_to_fp32_state_dict function from lightning/pytorch/utilities/deepspeed.py:
def convert_zero_checkpoint_to_fp32_state_dict(
checkpoint_dir: _PATH, output_file: _PATH, tag: str | None = None
) -> dict[str, Any]:
"""Convert ZeRO 2 or 3 checkpoint into a single fp32 consolidated ``state_dict`` file that can be loaded with
``torch.load(file)`` + ``load_state_dict()`` and used for training without DeepSpeed. It gets copied into the top
level checkpoint dir, so the user can easily do the conversion at any point in the future. Once extracted, the
weights don't require DeepSpeed and can be used in any application. Additionally the script has been modified to
ensure we keep the lightning state inside the state dict for being able to run
``LightningModule.load_from_checkpoint('...')```.
Args:
checkpoint_dir: path to the desired checkpoint folder.
(one that contains the tag-folder, like ``global_step14``)
output_file: path to the pytorch fp32 state_dict output file (e.g. path/pytorch_model.bin)
tag: checkpoint tag used as a unique identifier for checkpoint. If not provided will attempt
to load tag in the file named ``latest`` in the checkpoint folder, e.g., ``global_step14``
Examples::
# Lightning deepspeed has saved a directory instead of a file
convert_zero_checkpoint_to_fp32_state_dict(
"lightning_logs/version_0/checkpoints/epoch=0-step=0.ckpt/",
"lightning_model.pt"
)
"""
...
zero_stage = optim_state["optimizer_state_dict"]["zero_stage"]
model_file = get_model_state_file(checkpoint_dir, zero_stage)
client_state = torch.load(model_file, map_location=CPU_DEVICE)
...
The function is used to convert checkpoints into a single consolidated file. Unlike the other functions in this report, this vulnerability takes in a directory and requires an additional file named latest which contains the name of a directory containing a pytorch file with the naming convention *_optim_states.pt. This pytorch file returns a state which specifies the model state file, also located in the directory. This file is either named mp_rank_00_model_states.pt or zero_pp_rank_0_mp_rank_00_model_states.pt and is loaded in this exploit.
from lightning.pytorch.utilities.deepspeed import convert_zero_checkpoint_to_fp32_state_dict
checkpoint = "./checkpoint"
convert_zero_checkpoint_to_fp32_state_dict(checkpoint, "out.pt")
The pytorch file contains a data.pkl file which is unpickled during the loading process. Pickle is an inherently unsafe format which when loaded can cause arbitrary code to be executed, if the user tries to load a compromised checkpoint code can run on their system.
Project URL
https://lightning.ai/docs/pytorch/stable/
https://github.com/Lightning-AI/pytorch-lightning
Researcher: Kasimir Schulz, Director, Security Research, HiddenLayer
Related SAI Security Advisory
November 26, 2025
Allowlist Bypass in Run Terminal Tool Allows Arbitrary Code Execution During Autorun Mode
When in autorun mode with the secure ‘Follow Allowlist’ setting, Cursor checks commands sent to run in the terminal by the agent to see if a command has been specifically allowed. The function that checks the command has a bypass to its logic, allowing an attacker to craft a command that will execute non-whitelisted commands.
October 17, 2025
Data Exfiltration from Tool-Assisted Setup
Windsurf’s automated tools can execute instructions contained within project files without asking for user permission. This means an attacker can hide instructions within a project file to read and extract sensitive data from project files (such as a .env file) and insert it into web requests for the purposes of exfiltration.