Status: Open
Labels: bug (Something isn't working)
Description
Describe the bug
A RuntimeError: CUDA error: an illegal memory access was encountered
is raised when transcribe is run two or more times on batches of 20 or more 6-second audio clips and torch.cuda.empty_cache() is called between inferences to release cached memory.
Any call to torch.cuda.empty_cache() anywhere in the same Python process can potentially trigger the error.
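CUDA reports errors asynchronously, so the Python line in the traceback is not necessarily the line that launched the faulting kernel. A minimal sketch for getting a more precise traceback, assuming the environment variable is set before CUDA is initialized (safest: before `import torch`). The helper name `enable_blocking_cuda_launches` is hypothetical and used only for illustration:

```python
import os


def enable_blocking_cuda_launches() -> str:
    """Ask the CUDA runtime to run kernel launches synchronously.

    With CUDA_LAUNCH_BLOCKING=1, an illegal memory access is reported
    at the call that caused it instead of at a later API call.
    """
    os.environ["CUDA_LAUNCH_BLOCKING"] = "1"
    return os.environ["CUDA_LAUNCH_BLOCKING"]


# Call this at the very top of the script, before torch is imported.
enable_blocking_cuda_launches()
```

This only changes where the error is reported. It is a debugging aid, not a fix, and it slows inference down.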
Steps/Code to reproduce bug
import gc
import librosa
import nemo.collections.asr as nemo_asr
import numpy as np
from omegaconf import open_dict
import os
import torch

if __name__ == "__main__":
    asr_model: nemo_asr.models.ASRModel = nemo_asr.models.ASRModel.from_pretrained(model_name="nvidia/parakeet-tdt-0.6b-v2")

    # Switch the decoding strategy before transcribing.
    decoding_cfg = asr_model.cfg.decoding
    with open_dict(decoding_cfg):
        decoding_cfg.strategy = "malsd_batch"  # greedy_batch has the same problem
    asr_model.change_decoding_strategy(decoding_cfg)

    for batch_index in range(2):
        # 20 silent clips of 96000 samples each (6 seconds at 16 kHz).
        audios = [np.zeros(96000, dtype=np.float32) for index in range(20)]
        with torch.inference_mode():
            outputs = asr_model.transcribe(audios, batch_size=128, num_workers=0)
        print(outputs)

        # Release cached GPU memory between inferences.
        del outputs
        gc.collect()
        torch.cuda.empty_cache()
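To check whether the error is already pending before torch.cuda.empty_cache() runs, the cleanup step can be wrapped so that it synchronizes the device first. This is a minimal sketch, not a fix: `release_cuda_cache` is a hypothetical helper (it is not part of NeMo or PyTorch), and its `cuda` parameter exists only so the function can be exercised without a GPU. It uses only `torch.cuda.is_available()`, `torch.cuda.synchronize()`, and `torch.cuda.empty_cache()`.

```python
import gc


def release_cuda_cache(cuda=None) -> bool:
    """Collect garbage, wait for pending GPU work, then free cached blocks.

    Returns True if the CUDA cache was emptied, False if CUDA is unavailable.
    """
    if cuda is None:
        # Imported lazily so the helper can be loaded without torch installed.
        import torch

        cuda = torch.cuda

    gc.collect()
    if not cuda.is_available():
        return False

    # Any asynchronous kernel failure surfaces here, before the cache is touched.
    cuda.synchronize()
    cuda.empty_cache()
    return True
```

In the reproduction loop, calling `release_cuda_cache()` in place of the `gc.collect()` and `torch.cuda.empty_cache()` pair shows which of the two calls raises: if `synchronize()` raises, the illegal access happened during transcribe; if only `empty_cache()` raises, the failure is tied to releasing the cache itself.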
Expected behavior
torch.cuda.empty_cache() can be called safely between transcribe calls without crashing.
Environment overview
- Environment location: Bare-metal
- Method of NeMo install: pip install nemo_toolkit[all]
Environment details
- OS version: Windows 10
- PyTorch version: 2.6.0
- Python version: 3.12
- CUDA version: 12.4
Additional context
GPU model: RTX 3090 Ti