
ASR batch transcription crash with torch.cuda.empty_cache() #14727

@SystemPanic

Description

Describe the bug

A RuntimeError: CUDA error: an illegal memory access was encountered is raised when transcribe is run two or more times on 20 or more audio clips of 6 seconds each, with torch.cuda.empty_cache() called between inferences to release cached memory.

Any call to torch.cuda.empty_cache() made anywhere in the Python process can potentially produce the error.

Steps/Code to reproduce bug

import gc

import nemo.collections.asr as nemo_asr
import numpy as np
import torch
from omegaconf import open_dict

if __name__ == "__main__":
    # Load the pretrained Parakeet TDT 0.6B v2 model.
    asr_model: nemo_asr.models.ASRModel = nemo_asr.models.ASRModel.from_pretrained(
        model_name="nvidia/parakeet-tdt-0.6b-v2"
    )

    # Switch to the malsd_batch decoding strategy; greedy_batch shows the same problem.
    decoding_cfg = asr_model.cfg.decoding
    with open_dict(decoding_cfg):
        decoding_cfg.strategy = "malsd_batch"
        asr_model.change_decoding_strategy(decoding_cfg)

    for batch_index in range(2):
        # 20 silent clips of 6 seconds each (96000 samples at 16 kHz).
        audios = [np.zeros(96000, dtype=np.float32) for _ in range(20)]
        with torch.inference_mode():
            outputs = asr_model.transcribe(audios, batch_size=128, num_workers=0)
        print(outputs)
        del outputs
        gc.collect()
        # Releasing the CUDA cache between inferences triggers the illegal
        # memory access on the second iteration.
        torch.cuda.empty_cache()
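
Note: running the repro with CUDA_LAUNCH_BLOCKING=1 may make the traceback point at the kernel that actually faults (a general CUDA debugging aid, not a NeMo-specific setting). The variable has to be set before CUDA is initialized, for example:

import os

# Force synchronous kernel launches so the traceback for the illegal memory
# access points at the real failing call (debugging aid only, slows inference).
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

import torch  # import torch only after setting the variable so CUDA picks it up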

Expected behavior

torch.cuda.empty_cache() should be callable between inferences without crashing the process.

Environment overview (please complete the following information)

  • Environment location: Bare-metal
  • Method of NeMo install: pip install nemo_toolkit[all]

Environment details

  • OS version: Windows 10
  • PyTorch version: 2.6.0
  • Python version: 3.12
  • CUDA version: 12.4

Additional context

GPU model: RTX 3090 Ti
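
A mitigation that may be worth trying (an assumption, not confirmed to fix the crash): synchronize the device so no queued GPU work is still in flight before cached blocks are released. A minimal sketch, using a hypothetical helper name release_cuda_cache:

import gc
import torch

def release_cuda_cache() -> None:
    # Hypothetical helper: wait for all in-flight GPU work to finish before
    # asking the caching allocator to return free blocks to the driver.
    gc.collect()
    torch.cuda.synchronize()
    torch.cuda.empty_cache()

Calling release_cuda_cache() in place of the bare torch.cuda.empty_cache() call in the loop above would at least show whether pending work is involved in the crash.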
