-
Notifications
You must be signed in to change notification settings - Fork 37
Allow num_frames and duration to be absent in C++ decoder #708
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
const StreamMetadata& streamMetadata) { | ||
switch (seekMode_) { | ||
case SeekMode::exact: | ||
return streamMetadata.numFramesFromScan.value(); | ||
case SeekMode::approximate: { | ||
TORCH_CHECK( | ||
streamMetadata.numFrames.has_value(), | ||
"Cannot use approximate mode since we couldn't find the number of frames from the metadata."); | ||
return streamMetadata.numFrames.value(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we mean to return streamMetadata.numFrames
instead of streamMetadata.numFrames.value()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe I'm missing something but I'm surprised this compiles. This function seems to always return int64_t
, never a std::optional<int64_t>
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's overly broad, but still true. That is, the following should compile just fine:
std::optional<int> alwaysOne() {
return 1;
}
It's just that the return value is always an optional with the value 1.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, it reminds me of this great random number generator:
def get_random_int():
return 4
I think it's reasonable to relax the existing requirements when the video doesn't have enough metadata to enforce them. The good news is we haven't had any user report indicating that our hard On point 2. I'm a bit puzzled about what the PR is doing (see my question above) but overall I don't see a much better alternative. And re tests / potential segfaults: we definitely want to test this, but I agree that we want to avoid writing some C++ tests for that, it's hard to maintain. We could create an issue to remind us to generate such bad video once we have our own video encoder implemented, it should be easier at this point. Until then, to make sure we don't have segfaults, maybe we could just modify the code locally to e.g. force numFrames to be |
@NicolasHug, I hope the purpose of PR is more clear now that I've corrected my code. :) And on how to quickly check we don't segfault: good thinking, I just tried that. Specifically, I commented out the lines: torchcodec/src/torchcodec/_core/SingleStreamDecoder.cpp Lines 126 to 134 in ae50558
That means that even if the metadata is present in the file, we just ignore it. As expected, we get tons of test failures - in particular, constructing a Python VideoDecoder object asserts those values are not None in the metadata. But no segfaults!
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice, thanks for checking. I also tried the following, to bypass the instantiation failures related to missing metadata:
diff --git a/src/torchcodec/_core/SingleStreamDecoder.cpp b/src/torchcodec/_core/SingleStreamDecoder.cpp
index f4a285e..2947c64 100644
--- a/src/torchcodec/_core/SingleStreamDecoder.cpp
+++ b/src/torchcodec/_core/SingleStreamDecoder.cpp
@@ -1487,7 +1487,8 @@ std::optional<int64_t> SingleStreamDecoder::getNumFrames(
case SeekMode::exact:
return streamMetadata.numFramesFromScan.value();
case SeekMode::approximate: {
- return streamMetadata.numFrames;
+ return std::nullopt;
}
default:
throw std::runtime_error("Unknown SeekMode");
@@ -1512,7 +1513,8 @@ std::optional<double> SingleStreamDecoder::getMaxSeconds(
case SeekMode::exact:
return streamMetadata.maxPtsSecondsFromScan.value();
case SeekMode::approximate: {
- return streamMetadata.durationSeconds;
+ return std::nullopt;
}
default:
throw std::runtime_error("Unknown SeekMode");
And all the tests are passing except for 2 tests with the following error, which doesn't seem to be problematic:
E AssertionError: Regex pattern did not match.
E Regex: 'must be less than'
E Input: 'Requested next frame while there are no more frames left to decode.'
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice, thanks for checking. I also tried the following, to bypass the instantiation failures related to missing metadata:
diff --git a/src/torchcodec/_core/SingleStreamDecoder.cpp b/src/torchcodec/_core/SingleStreamDecoder.cpp
index f4a285e..2947c64 100644
--- a/src/torchcodec/_core/SingleStreamDecoder.cpp
+++ b/src/torchcodec/_core/SingleStreamDecoder.cpp
@@ -1487,7 +1487,8 @@ std::optional<int64_t> SingleStreamDecoder::getNumFrames(
case SeekMode::exact:
return streamMetadata.numFramesFromScan.value();
case SeekMode::approximate: {
- return streamMetadata.numFrames;
+ return std::nullopt;
}
default:
throw std::runtime_error("Unknown SeekMode");
@@ -1512,7 +1513,8 @@ std::optional<double> SingleStreamDecoder::getMaxSeconds(
case SeekMode::exact:
return streamMetadata.maxPtsSecondsFromScan.value();
case SeekMode::approximate: {
- return streamMetadata.durationSeconds;
+ return std::nullopt;
}
default:
throw std::runtime_error("Unknown SeekMode");
And all the tests are passing except for 2 tests with the following error, which doesn't seem to be problematic:
E AssertionError: Regex pattern did not match.
E Regex: 'must be less than'
E Input: 'Requested next frame while there are no more frames left to decode.'
This is necessary to decode live streaming data (#695), but not sufficient. It's necessary because we need the C++ layer to be robust to the number of frames and duration being missing from the metadata, as that will not be available when decoding a live stream. But it's also not sufficient because the Python layer for
VideoDecoder
also asserts that metadata is present.In fact, because
VideoDecoder
uses the number frames from the metadata in its__len__()
method, we will need to expose a different public Python API to support live streaming. We should, however, be able to reuse the C++ layer.I don't particularly love the way I'm currently achieving this, so I'm open to alternatives. I'm also concerned that when the number of frames or duration are missing, we might actually be open to segfaults. I think we should just get the exceptions from the inner decoding loop about not being able to decode frames, but I'm not sure.
Discussion points:
VideoDecoder
. That's ugly - is it worth it?