Allow num_frames and duration to be absent in C++ decoder #708

scotts · 2025-05-31T02:16:34Z

This is necessary to decode live streaming data (#695), but not sufficient. It's necessary because we need the C++ layer to be robust to the number of frames and duration being missing from the metadata, as that will not be available when decoding a live stream. But it's also not sufficient because the Python layer for VideoDecoder also asserts that metadata is present.

In fact, because VideoDecoder uses the number of frames from the metadata in its __len__() method, we will need to expose a different public Python API to support live streaming. We should, however, be able to reuse the C++ layer.

I don't particularly love the way I'm currently achieving this, so I'm open to alternatives. I'm also concerned that when the number of frames or duration are missing, we might actually be open to segfaults. I think we should just get the exceptions from the inner decoding loop about not being able to decode frames, but I'm not sure.

Discussion points:

1. Are we comfortable relaxing these requirements at the C++ level? We can (and do) still enforce some of this at the Python level if the Python APIs require this.
2. If yes to 1, is this the best way to do it?
3. We don't have any explicit tests for what happens when this data is not present. We don't have any files which have it missing. We could write some C++ tests that go in and just remove the values from the C++ VideoDecoder. That's ugly - is it worth it?

NicolasHug · 2025-06-02T13:34:18Z

src/torchcodec/_core/SingleStreamDecoder.cpp

    const StreamMetadata& streamMetadata) {
  switch (seekMode_) {
    case SeekMode::exact:
      return streamMetadata.numFramesFromScan.value();
    case SeekMode::approximate: {
-      TORCH_CHECK(
-          streamMetadata.numFrames.has_value(),
-          "Cannot use approximate mode since we couldn't find the number of frames from the metadata.");
      return streamMetadata.numFrames.value();


I think we mean to return streamMetadata.numFrames instead of streamMetadata.numFrames.value()?

Maybe I'm missing something but I'm surprised this compiles. This function seems to always return int64_t, never a std::optional<int64_t> ?

It's overly broad, but still true. That is, the following should compile just fine:

std::optional<int> alwaysOne() { return 1; }

It's just that the return value is always an optional with the value 1.

Ah, it reminds me of this great random number generator:

def get_random_int(): return 4
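A minimal sketch of the std::optional behavior discussed in this thread. The names passThrough and unwrapFirst are hypothetical and exist only for illustration; they are not torchcodec functions. The sketch shows why returning .value() from a function declared to return an optional still compiles, and how it differs from returning the optional itself when the value is missing.

```cpp
#include <optional>

// Returning a plain int from a function declared to return
// std::optional<int> compiles: the int is implicitly converted into an
// optional that holds a value.
std::optional<int> alwaysOne() { return 1; }

// Returning the optional itself lets "no value" reach the caller.
std::optional<int> passThrough(const std::optional<int>& maybe) {
  return maybe;
}

// Unwrapping first also compiles, but .value() throws
// std::bad_optional_access when the optional is empty, so the caller can
// never observe an empty result.
std::optional<int> unwrapFirst(const std::optional<int>& maybe) {
  return maybe.value();
}
```

With an empty input, passThrough(std::nullopt) returns an empty optional, while unwrapFirst(std::nullopt) throws.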

NicolasHug · 2025-06-02T14:08:18Z

I think it's reasonable to relax the existing requirements when the video doesn't have enough metadata to enforce them. The good news is we haven't had any user reports indicating that our hard TORCH_CHECKs were limiting, so such videos hopefully aren't common in the wild (except for live streams).

On point 2, I'm a bit puzzled about what the PR is doing (see my question above), but overall I don't see a much better alternative.

And re tests / potential segfaults: we definitely want to test this, but I agree that we want to avoid writing some C++ tests for that, as they're hard to maintain. We could create an issue to remind us to generate such a bad video once we have our own video encoder implemented; it should be easier at that point. Until then, to make sure we don't have segfaults, maybe we could just modify the code locally to e.g. force numFrames to be nullopt and see how our tests behave?

scotts · 2025-06-03T02:00:31Z

@NicolasHug, I hope the purpose of the PR is clearer now that I've corrected my code. :)

And on how to quickly check we don't segfault: good thinking, I just tried that. Specifically, I commented out the lines:

torchcodec/src/torchcodec/_core/SingleStreamDecoder.cpp

Lines 126 to 134 in ae50558

    
    int64_t frameCount = avStream->nb_frames;
    if (frameCount > 0) {
      streamMetadata.numFrames = frameCount;
    }
    if (avStream->duration > 0 && avStream->time_base.den > 0) {
      streamMetadata.durationSeconds =
          av_q2d(avStream->time_base) * avStream->duration;
    }

That means that even if the metadata is present in the file, we just ignore it. As expected, we get tons of test failures - in particular, constructing a Python VideoDecoder object asserts those values are not None in the metadata. But no segfaults!

NicolasHug

Nice, thanks for checking. I also tried the following, to bypass the instantiation failures related to missing metadata:

diff --git a/src/torchcodec/_core/SingleStreamDecoder.cpp b/src/torchcodec/_core/SingleStreamDecoder.cpp
index f4a285e..2947c64 100644
--- a/src/torchcodec/_core/SingleStreamDecoder.cpp
+++ b/src/torchcodec/_core/SingleStreamDecoder.cpp
@@ -1487,7 +1487,8 @@ std::optional<int64_t> SingleStreamDecoder::getNumFrames(
     case SeekMode::exact:
       return streamMetadata.numFramesFromScan.value();
     case SeekMode::approximate: {
-      return streamMetadata.numFrames;
+      return std::nullopt;
     }
     default:
       throw std::runtime_error("Unknown SeekMode");
@@ -1512,7 +1513,8 @@ std::optional<double> SingleStreamDecoder::getMaxSeconds(
     case SeekMode::exact:
       return streamMetadata.maxPtsSecondsFromScan.value();
     case SeekMode::approximate: {
-      return streamMetadata.durationSeconds;
+      return std::nullopt;
     }
     default:
       throw std::runtime_error("Unknown SeekMode");

And all the tests are passing except for 2 tests with the following error, which doesn't seem to be problematic:

E       AssertionError: Regex pattern did not match.
E        Regex: 'must be less than'
E        Input: 'Requested next frame while there are no more frames left to decode.'
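
One plausible reading of those two failures, sketched under stated assumptions: if an up-front bounds check can only run when the frame count is known, then with std::nullopt the out-of-range request falls through to the decoding loop, which reports its own error instead. The functions validateFrameIndex and decodeFrameAt, the framesInStream parameter, and the exact wording of the bounds message are all hypothetical; this is not torchcodec's actual code.

```cpp
#include <cstdint>
#include <optional>
#include <stdexcept>
#include <string>

// Hypothetical bounds check: it can only fire when the frame count is known.
void validateFrameIndex(int64_t frameIndex, std::optional<int64_t> numFrames) {
  if (numFrames.has_value() && frameIndex >= numFrames.value()) {
    throw std::out_of_range(
        "Invalid frame index=" + std::to_string(frameIndex) +
        "; must be less than " + std::to_string(numFrames.value()));
  }
}

// Hypothetical decode entry point. framesInStream stands in for what the
// inner decoding loop would discover by actually reading the stream.
int64_t decodeFrameAt(
    int64_t frameIndex,
    std::optional<int64_t> numFrames,
    int64_t framesInStream) {
  validateFrameIndex(frameIndex, numFrames);
  if (frameIndex >= framesInStream) {
    throw std::runtime_error(
        "Requested next frame while there are no more frames left to decode.");
  }
  return frameIndex; // Placeholder for the decoded frame.
}
```

In this sketch, decodeFrameAt(12, 10, 10) fails with the "must be less than" message, while decodeFrameAt(12, std::nullopt, 10) fails with the "no more frames left to decode" message. Either way the request ends in an exception rather than undefined behavior.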

Allow num_frames and duration to be absent in C++ decoder

04b2ead

facebook-github-bot added the CLA Signed label May 31, 2025

scotts marked this pull request as ready for review June 2, 2025 12:55

NicolasHug reviewed Jun 2, 2025

getNumFrames() should return optional in approximate mode

7a6e6ca

NicolasHug approved these changes Jun 3, 2025

scotts mentioned this pull request Jun 3, 2025

Add test video with missing number of frames and duration #710

Open

scotts merged commit 0f22b2b into pytorch:main Jun 3, 2025
47 checks passed

scotts deleted the relax_duration_frames branch June 3, 2025 13:28
