
Is there any best practice for using torchcodec with the PyTorch DataLoader? #696


Open
Ash-one opened this issue May 22, 2025 · 4 comments

@Ash-one

Ash-one commented May 22, 2025

I am using torchcodec instead of decord, and it is more stable than decord on the CPU. I want to know if there is a best-practice code snippet for accelerating the data loading and processing phase of PyTorch training on the GPU. I am having trouble loading many videos with a random sampling strategy, which costs a lot of time before the forward pass. I tried DALI, but it was quite hard to define sampling strategies with it.

I would appreciate a prompt reply! Thanks!
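For reference, the kind of pattern I have in mind is roughly the following sketch (the class name and the `sample_frame_indices` helper are my own invention, and it assumes a list of local MP4 paths with CPU decoding inside DataLoader worker processes):

```python
import random


def sample_frame_indices(num_total, clip_len, rng=random):
    """Pick `clip_len` distinct frame indices uniformly at random, sorted."""
    return sorted(rng.sample(range(num_total), clip_len))


class RandomClipDataset:
    """Map-style dataset: any object with __getitem__/__len__ works with
    torch.utils.data.DataLoader, so subclassing Dataset is optional."""

    def __init__(self, video_paths, clip_len=16):
        self.video_paths = video_paths
        self.clip_len = clip_len

    def __len__(self):
        return len(self.video_paths)

    def __getitem__(self, idx):
        # Import lazily so each DataLoader worker process builds its own
        # decoder state instead of inheriting it from the parent process.
        from torchcodec.decoders import VideoDecoder

        decoder = VideoDecoder(self.video_paths[idx])  # CPU decoding
        indices = sample_frame_indices(decoder.metadata.num_frames,
                                       self.clip_len)
        # FrameBatch.data is a (T, C, H, W) uint8 tensor.
        return decoder.get_frames_at(indices=indices).data


# Usage (hypothetical paths):
# loader = torch.utils.data.DataLoader(RandomClipDataset(paths),
#                                      batch_size=4, num_workers=8)
```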

@NicolasHug
Member

Thanks for the request @Ash-one. We don't have official recommendations at the moment, but it would be good to have some. Can you share more about your specific use case and what you have tried so far?

@Ash-one
Author

Ash-one commented May 28, 2025

@NicolasHug Hi! I'd like to share my use case:

My application scenario requires extracting a fixed number of video frames plus the audio from an MP4 file, and processing them into a (T, C, H, W) video input and (mel_bins, frames) audio fbank features.

The current processing solution is two-stage: first, using ffmpeg to split the audio and video into MP4 and WAV files, then processing them separately with torchcodec and torchaudio.
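The audio half of that processing looks roughly like this (a sketch of my pipeline; the function name, the default `num_mel_bins=128`, and the transpose to (mel_bins, frames) are my own choices):

```python
def wav_to_fbank(waveform, sample_rate, num_mel_bins=128):
    """Compute Kaldi-style log-mel filterbank features.

    `waveform` is a (channels, time) float tensor, as returned by
    torchaudio.load(). kaldi.fbank returns (frames, mel_bins), so we
    transpose to the (mel_bins, frames) layout described above.
    """
    import torchaudio.compliance.kaldi as kaldi  # lazy import

    fbank = kaldi.fbank(
        waveform,
        sample_frequency=sample_rate,
        num_mel_bins=num_mel_bins,
    )
    return fbank.T
```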

Additionally, I am running on the VGGSound dataset with 200K videos. To speed up preprocessing, I first confirmed that torchcodec works well on the CPU, and then tried to accelerate torchcodec preprocessing on GPUs in a distributed setup, but I ran into initialization issues.

It seems that repeatedly creating and releasing the VideoDecoder on the GPU has caused extra overhead, but I'm sorry that I don't quite understand the underlying mechanism. By the way, do you recommend this approach at all?

@NicolasHug
Member

It seems that repeatedly creating and releasing the VideoDecoder on the GPU has caused extra overhead

Yeah, this is fairly common. We try to cache the GPU context as much as possible but maybe there are things we could improve (that's on our stack). And in any case, we should document the best way to use GPU resources with torchcodec.
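In the meantime, one way to reduce the overhead is to amortize decoder construction: decode several clips per `VideoDecoder` instance instead of creating a fresh decoder per clip. A rough sketch (the `decode_clips` name and the clip-index layout are made up for illustration):

```python
def decode_clips(path, clip_indices, device="cuda"):
    """Decode several clips from one file with a single decoder.

    `clip_indices` is a list of frame-index lists, e.g. [[0, 5, 10], ...].
    Constructing the VideoDecoder (and its CUDA context) once per file,
    rather than once per clip, is where the savings come from.
    """
    from torchcodec.decoders import VideoDecoder  # lazy import

    decoder = VideoDecoder(path, device=device)
    return [decoder.get_frames_at(indices=idx).data for idx in clip_indices]
```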

The current processing solution is two-stage: first, using ffmpeg to split the audio and video into MP4 and WAV files, then processing them separately with torchcodec and torchaudio.

Just a note: you should be able to use the VideoDecoder and the AudioDecoder on the same video file (i.e. a file containing both video and audio streams). I think this could allow you to avoid the first ffmpeg step where you split the video and audio in separate files.
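Concretely, something like the following sketch (the 0–2 s window and the function name are arbitrary choices for illustration):

```python
def load_av(path, start=0.0, stop=2.0):
    """Decode video frames and audio samples from the same container file.

    Each decoder binds to its own stream inside the MP4, so a prior
    ffmpeg demux/split step is unnecessary.
    """
    from torchcodec.decoders import AudioDecoder, VideoDecoder  # lazy

    frames = VideoDecoder(path).get_frames_played_in_range(start, stop)
    samples = AudioDecoder(path).get_samples_played_in_range(start, stop)
    # frames.data: (T, C, H, W) uint8; samples.data: (channels, num_samples)
    return frames.data, samples.data, samples.sample_rate
```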

@Ash-one
Author

Ash-one commented Jun 3, 2025

@NicolasHug Thanks for your reply!

Yeah, this is fairly common. We try to cache the GPU context as much as possible but maybe there are things we could improve (that's on our stack). And in any case, we should document the best way to use GPU resources with torchcodec.

Hope to see this soon! I'd like to try it out in the nightly builds!

Just a note: you should be able to use the VideoDecoder and the AudioDecoder on the same video file (i.e. a file containing both video and audio streams).

That is very useful; I will try this approach in my new pipeline.
