What's actually making this faster? #703
Hi @richardrl. There are a bunch of optimizations we try to do within torchcodec. In the random clip sampling scenario you are mentioning, the bulk of the speedup over a naive implementation comes from the fact that torchcodec prevents backwards seeks, which we have observed to be very slow in these scenarios. That is, if the frames we need to decode are at indices. There are other optimizations, like the choice of color conversion library (libswscale vs filtergraph). There are no trade-offs, i.e. we don't compromise on anything (like diversity) unless explicitly stated, e.g. with the
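The seek-avoidance idea can be sketched in plain Python. This is a simplified model, not torchcodec's actual implementation: `gather_frames`, the key-frame spacing, and the work accounting are all hypothetical, but the access pattern is the point — requested indices are visited in sorted order so the decoder only ever moves forward.

```python
def gather_frames(requested, keyframe_interval=100):
    """Simulate decoding the frames at the given indices.

    Visiting indices in sorted order means each frame is reached either
    by decoding the next frame in the stream or, after a large gap, by a
    forward seek to the nearest preceding key frame. A backwards seek
    (which is very slow in practice) never happens.
    """
    decoded = {}
    position = -1  # index of the last frame the decoder emitted
    work = 0       # number of frames actually decoded
    for idx in sorted(set(requested)):
        # Forward-seek to the key frame preceding idx if that skips work.
        nearest_key = (idx // keyframe_interval) * keyframe_interval
        if nearest_key > position:
            position = nearest_key - 1
        # Decode forward, one frame at a time, up to idx.
        while position < idx:
            position += 1
            work += 1
        decoded[idx] = position
    # Return frames in the originally requested (possibly random) order.
    return [decoded[idx] for idx in requested], work

frames, work = gather_frames([250, 10, 260, 5])
print(frames, work)
```

Note that the caller still receives frames in the random order it asked for; only the decoding order is rearranged.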
Just to note that this is usually not the case: decoding a given frame requires decoding the previous (and sometimes the next) key frame, but there is no need to decode from the beginning of the file.
@NicolasHug is there a specific example / API you'd recommend for this use case of random clip sampling (like in an imitation learning setting)? I think I wasn't seeing great results from the indexing API. I'd like to compare the speed against my existing dataloader, which first decodes the video and then uses ffcv to sample subclips.
Sure, all of our clip samplers are detailed in this tutorial: https://docs.pytorch.org/torchcodec/stable/generated_examples/sampling.html
@NicolasHug do you have an example where you can use torchcodec to sample a batch of video clips from multiple videos? Say, 30 frames each from 10 videos, to get a batch of size 10 x 30 x channels x height x width? I couldn't find such an example in the docs. |
@VimalMollyn I think you'd just need to call the samplers (like |
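A minimal sketch of that loop-and-stack approach. Here `decode_clip` is a hypothetical stand-in for a per-video sampler call (with torchcodec installed you would open a `VideoDecoder` per video and call one of the samplers instead); NumPy arrays stand in for decoded frame tensors:

```python
import numpy as np

def decode_clip(video_id, num_frames=30, channels=3, height=64, width=64):
    # Hypothetical stand-in for sampling one clip from one video.
    # A real clip from torchcodec's samplers has the same layout:
    # (num_frames, channels, height, width).
    rng = np.random.default_rng(video_id)
    return rng.random((num_frames, channels, height, width), dtype=np.float32)

# One clip from each of 10 videos, stacked into a single batch.
clips = [decode_clip(video_id) for video_id in range(10)]
batch = np.stack(clips)
print(batch.shape)  # -> (10, 30, 3, 64, 64)
```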
@NicolasHug Are the sampling API's supposed to be faster than indexing API? I did a test and it seems to be the same. |
They're not supposed to be faster than the batch APIs like |
Suppose we want to sample 1 frame from 100 videos in our dataloader. How would we structure the Pytorch dataset? If we use the TorchCodec inside getitem for the dataset, it would be very slow (decoding one frame at a time). |
To decode one frame for each video you will need one
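One possible structure is sketched below in plain Python. This is not an official recommendation: `StubDecoder` stands in for torchcodec's `VideoDecoder`, and caching one decoder per video path inside the dataset is just one strategy for avoiding re-opening files on every `__getitem__` call.

```python
import random

class StubDecoder:
    """Hypothetical stand-in for a video decoder: indexable frames."""
    def __init__(self, path, num_frames=300):
        self.path = path
        self.num_frames = num_frames
    def __getitem__(self, idx):
        return (self.path, idx)  # a real decoder would return a frame tensor

class RandomFrameDataset:
    """Map-style dataset: one item = one random frame from one video.

    Decoders are created lazily and cached, so repeated __getitem__
    calls on the same video reuse the open decoder instead of
    re-opening the file each time.
    """
    def __init__(self, video_paths, decoder_factory=StubDecoder):
        self.video_paths = video_paths
        self.decoder_factory = decoder_factory
        self._decoders = {}
    def __len__(self):
        return len(self.video_paths)
    def __getitem__(self, i):
        path = self.video_paths[i]
        if path not in self._decoders:
            self._decoders[path] = self.decoder_factory(path)
        decoder = self._decoders[path]
        frame_idx = random.randrange(decoder.num_frames)
        return decoder[frame_idx]

ds = RandomFrameDataset([f"video_{k}.mp4" for k in range(100)])
path, idx = ds[42]
print(path)  # -> video_42.mp4
```

With multiple dataloader workers, each worker process gets its own copy of the dataset and therefore its own decoder cache.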
Yes, but how do we structure getitem? Is a VideoDecoder created per getitem call that retrieves one frame with the indexing API? @NicolasHug
I continued this in #715 |
Sounds good, I'll close this issue as I think the original questions were addressed. |
My understanding is that the fundamental issue with decoding from MP4s is that decoding gets extremely slow the deeper you go into the file, because you have to start from the beginning and sequentially decode up to your target frame.
I did an experiment using the torchcodec indexing API, and it is no faster than a very naive decoding with the AV library.
How do you get speedups in a setting where you need to sample random clips across multiple videos?
Is it because you give up on diversity within a minibatch? For example, for 100 subsequences in a minibatch, instead of having 1 subsequence each from 100 videos, you could take 10 subsequences from each of 10 videos; in the latter case torchcodec can sweep over each video once to get 10 clips, which could be much faster.
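That intuition can be made concrete with back-of-the-envelope accounting. The per-operation costs below are made up for illustration; the shape of the result only depends on opening/seeking into a video being much more expensive than decoding one more frame forward:

```python
# Hypothetical costs, in arbitrary time units.
OPEN_COST = 50    # opening / seeking into a fresh video
DECODE_COST = 1   # decoding one additional frame forward

def batch_cost(num_videos, clips_per_video, frames_per_clip=30):
    # One open per video, then a forward sweep decoding every clip in it.
    return num_videos * (OPEN_COST + clips_per_video * frames_per_clip * DECODE_COST)

diverse = batch_cost(num_videos=100, clips_per_video=1)   # 1 clip each from 100 videos
grouped = batch_cost(num_videos=10, clips_per_video=10)   # 10 clips each from 10 videos

# Same total number of frames decoded, but far fewer opens when grouped.
print(diverse, grouped)  # -> 8000 3500
```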