What's actually making this faster? #703

Closed
richardrl opened this issue May 28, 2025 · 12 comments
Comments

@richardrl commented May 28, 2025

My understanding is the fundamental issue with decoding from MP4s is the decoding gets extremely slow deeper into the MP4, because you have to start from the beginning and sequentially decode to your target frame.

I did an experiment using the torchcodec indexing API and it is no faster than a very naive decoding with AV library.

How do you get speedups in a setting where you need to sample random clips across multiple videos?

Is it because you give up on diversity within a minibatch? For example, I could imagine that for 100 subsequences in a minibatch, instead of having 1 subsequence each from 100 videos, you take 10 subsequences each from 10 videos; in the latter case torchcodec could sweep over each video once to collect its 10 clips, which could be much faster.

@NicolasHug (Member) commented May 29, 2025

Hi @richardrl

There are a bunch of optimizations we try to do within torchcodec. In the random clip sampling scenario you mention, the bulk of the speedup over a naive implementation comes from the fact that torchcodec avoids backwards seeks, which we have observed to be very slow in these scenarios. That is, if the frames we need to decode are at indices 50, 10, 1, 30, torchcodec decodes frames 1, 10, 30, 50 and re-orders them to match the expected output ordering.
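The seek-reordering idea can be sketched independently of torchcodec. The `decode_in_sorted_order` helper and `fake_decode` stub below are illustrative, not torchcodec APIs; the point is only the sort-then-invert-permutation logic:

```python
def decode_in_sorted_order(decode_one, indices):
    # Decode in ascending frame order (monotonic seeks only), then
    # restore the caller's requested ordering.
    order = sorted(range(len(indices)), key=lambda i: indices[i])
    decoded = {}
    for i in order:
        decoded[i] = decode_one(indices[i])  # seeks only move forward
    return [decoded[i] for i in range(len(indices))]

# Toy "decoder" that just records the order in which frames are decoded:
seek_log = []
def fake_decode(idx):
    seek_log.append(idx)
    return f"frame{idx}"

frames = decode_in_sorted_order(fake_decode, [50, 10, 1, 30])
print(seek_log)  # [1, 10, 30, 50] -- decoded in ascending order
print(frames)    # ['frame50', 'frame10', 'frame1', 'frame30'] -- caller's order
```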

There are other optimizations, like the choice of color conversion library (libswscale vs filtergraph).

There is no trade-off, i.e. we don't compromise on anything (like diversity) unless explicitly stated, e.g. with seek_mode="approximate".

> the decoding gets extremely slow deeper into the MP4, because you have to start from the beginning and sequentially decode to your target frame

Just to note that this is usually not the case: decoding a given frame requires decoding the previous (and sometimes the next) key frame, but there is no need to decode from the beginning of the file.
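The key-frame arithmetic can be made concrete with a small sketch (the `frames_to_decode` helper is hypothetical; real decoders do this internally): decoding starts at the nearest key frame at or before the target, not at frame 0.

```python
import bisect

def frames_to_decode(key_frame_indices, target):
    # Seek to the nearest key frame at or before the target, then decode
    # forward from there; nothing before that key frame is touched.
    k = key_frame_indices[bisect.bisect_right(key_frame_indices, target) - 1]
    return list(range(k, target + 1))

# With a key frame every 30 frames, reaching frame 95 decodes only 90..95:
print(frames_to_decode([0, 30, 60, 90], 95))  # [90, 91, 92, 93, 94, 95]
```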

@richardrl (Author)

@NicolasHug is there a specific example / API you'd recommend for this use case of random clip sampling (like in an imitation learning setting)?

I think I wasn’t seeing great results from the indexing API.

I'd like to compare its speed against my existing dataloader, which first decodes the video and then uses ffcv to sample subclips.

@NicolasHug (Member)

Sure, all of our clip samplers are detailed in this tutorial: https://docs.pytorch.org/torchcodec/stable/generated_examples/sampling.html

@VimalMollyn

@NicolasHug do you have an example where you can use torchcodec to sample a batch of video clips from multiple videos? Say, 30 frames each from 10 videos, to get a batch of size 10 x 30 x channels x height x width? I couldn't find such an example in the docs.

@NicolasHug (Member)

@VimalMollyn I think you'd just need to call the samplers (like clips_at_random_indices) individually on each of the 10 videos? Let me know if I'm missing something.
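A minimal sketch of the stacking step, assuming a per-video `decode_clip` callable (hypothetical; in practice it could wrap a torchcodec sampler such as clips_at_random_indices). The `dummy_decode` stand-in just makes the shape logic visible:

```python
import torch

def sample_clip_batch(video_paths, decode_clip, num_frames=30):
    # decode_clip(path, num_frames) -> tensor of shape (num_frames, C, H, W)
    clips = [decode_clip(path, num_frames) for path in video_paths]
    return torch.stack(clips)  # (num_videos, num_frames, C, H, W)

def dummy_decode(path, num_frames):
    # Stand-in for real video decoding.
    return torch.zeros(num_frames, 3, 4, 4)

batch = sample_clip_batch([f"video_{i}.mp4" for i in range(10)], dummy_decode)
print(batch.shape)  # torch.Size([10, 30, 3, 4, 4])
```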

@richardrl (Author)

@NicolasHug Are the sampling APIs supposed to be faster than the indexing API?

I did a test and it seems to be the same.

@NicolasHug (Member)

They're not supposed to be faster than the batch APIs like get_frames_*, because they rely on those under the hood. But they'll be faster than individually calling the single-frame APIs like get_frame_*.

@richardrl (Author)

Suppose we want to sample 1 frame from each of 100 videos in our dataloader. How would we structure the PyTorch dataset? If we use TorchCodec inside __getitem__, it would be very slow (decoding one frame at a time).

@NicolasHug (Member)

To decode one frame for each video you will need one VideoDecoder instance per video.

@richardrl (Author)

Yes, but how do we structure __getitem__? Is a VideoDecoder created per __getitem__ call that retrieves one frame with the indexing API? @NicolasHug
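One possible shape for such a dataset, sketched with a dummy decoder: the `make_decoder` factory and `DummyDecoder` below are illustrative, and the assumed decoder interface (len() for frame count, get_frame_at(index) returning an object with a .data attribute) mirrors torchcodec's VideoDecoder.

```python
import random
from torch.utils.data import Dataset

class RandomFrameDataset(Dataset):
    # One decoder instance per __getitem__ call; each dataloader worker
    # process then gets its own decoder handles.
    def __init__(self, video_paths, make_decoder):
        # make_decoder(path) -> decoder with len() and get_frame_at(index)
        self.video_paths = video_paths
        self.make_decoder = make_decoder

    def __len__(self):
        return len(self.video_paths)

    def __getitem__(self, i):
        decoder = self.make_decoder(self.video_paths[i])
        frame_index = random.randrange(len(decoder))
        return decoder.get_frame_at(frame_index).data

# Dummy decoder standing in for torchcodec.decoders.VideoDecoder:
class DummyDecoder:
    def __init__(self, path):
        self.path = path
    def __len__(self):
        return 100  # pretend frame count
    def get_frame_at(self, index):
        class Frame:  # minimal stand-in with a .data attribute
            data = index
        return Frame()

ds = RandomFrameDataset(["a.mp4", "b.mp4"], DummyDecoder)
print(len(ds))  # 2
```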

@richardrl (Author)

I continued this in #715

@NicolasHug (Member)

Sounds good, I'll close this issue as I think the original questions were addressed.
