Verify decoder outputs #728
Conversation
```python
try:
    torch.testing.assert_close(f1, f2)
except Exception as e:
    tensorcat(f1)
```
This library is useful for visually comparing frames, but I am open to removing it since it is not necessary.
Yeah, I think we should remove it. Even though most users won't run the benchmarks, we still want to keep dependencies down.
I also think we can simplify this part and just do the plain assertion. For this kind of validation, it's generally better to get a failure than it is to just print the result to stdout. Our CI, and a lot of other monitoring infra, will use the process exit code to determine if there was a problem. What we're currently doing would mean we'd need to parse stdout to figure out the status if we wanted to hook this up to CI.
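A minimal sketch of the plain-assertion approach being suggested (the helper name and signature are illustrative, not the PR's actual code):

```python
import torch

def check_frames_match(f1: torch.Tensor, f2: torch.Tensor, decoder_name: str) -> None:
    # Let the assertion raise on mismatch: an uncaught exception exits the
    # process with a non-zero code, which CI can detect without parsing stdout.
    torch.testing.assert_close(f1, f2, msg=f"Decoded frames differ for {decoder_name}")
```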
Thanks for the feedback, I'll remove the library and keep that in mind going forward.
```python
# Import library to show frames that don't match
from tensorcat import tensorcat
```
```python
# Reuse TorchCodecPublic decoder with options, if provided, as the baseline
```
Is `TorchCodecPublic` the correct decoder to use as the baseline?
Good question - yes, I think so. We have a large battery of unit tests testing for bit-for-bit correctness on TorchCodec, and it's easier to test on the public API.
```python
    None,
):
    torchcodec_public_decoder = decoders_to_run[torchcodec_display_name]
    # Create default TorchCodecPublic decoder to use as a baseline
```
This means that the reference decoder will be subject to the options that the user provides, such as `seek_mode`. I think we shouldn't try to use the options the user provided, but instead decide what the reference decoder is, and always use that. That means that we probably shouldn't use the default options, but decide what options we want to use. Regarding `seek_mode`, I think we should probably use `exact` as the reference.
One catch: given the new `stream_index` option, we rely on the user to indicate which stream the benchmarks should compare. Without this option, the OpenCV vs TorchCodecPublic comparison would not be decoding the same stream.
I agree that `exact` should be used as the reference. I've updated the code to only reuse the `stream_index` argument.
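For concreteness, a rough sketch of constructing the reference decoder this way, assuming TorchCodec's public `VideoDecoder` API (the benchmark's actual wrapper class may differ):

```python
from torchcodec.decoders import VideoDecoder

def make_reference_decoder(video_file_path: str, stream_index: int | None) -> VideoDecoder:
    # Reuse only the user-provided stream_index so every decoder is compared
    # on the same stream; pin seek_mode to "exact" so the reference is not
    # affected by the options being benchmarked.
    return VideoDecoder(video_file_path, stream_index=stream_index, seek_mode="exact")
```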
Oh! But we want to use the `stream_index` that the user provided. Good call. :)
```diff
@@ -177,6 +135,9 @@ def main() -> None:
         if entry.is_file() and entry.name.endswith(".mp4"):
             video_paths.append(entry.path)

+    if args.verify_outputs:
+        verify_outputs(decoders_to_run, video_paths, num_uniform_samples)
+
```
I think this option is most useful if it's mutually exclusive with running the actual benchmarks. That way someone can specify it to quickly test the benchmark's correctness. So here I think we should make running the benchmarks and printing the benchmark results the else branch here.
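A sketch of the suggested control flow; `run_benchmarks` and `print_results` here stand in for whatever the script actually calls and are assumptions:

```python
if args.verify_outputs:
    # Quick correctness check of the benchmark setup, without timing anything.
    verify_outputs(decoders_to_run, video_paths, num_uniform_samples)
else:
    # Only run and report the benchmarks when we are not verifying outputs.
    results = run_benchmarks(decoders_to_run, video_paths, num_uniform_samples)
    print_results(results)
```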
```python
# Generate uniformly random PTS
duration = metadata.duration_seconds
pts_list = [i * duration / num_samples for i in range(num_samples)]
```
Technically this is uniformly-spaced PTS values, or just evenly-spaced PTS values. Uniformly random would look like `pts_list = (torch.rand(num_samples) * duration).tolist()`.
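To make the distinction concrete, a small self-contained sketch of both sampling strategies (values are illustrative):

```python
import torch

duration = 10.0   # seconds, illustrative
num_samples = 5

# Evenly-spaced PTS values across the duration (what the code above does).
evenly_spaced = [i * duration / num_samples for i in range(num_samples)]

# Uniformly random PTS values in [0, duration).
uniformly_random = (torch.rand(num_samples) * duration).tolist()
```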
```python
# Decode non-sequential frames using decode_frames function
random_frames = decoder.decode_frames(video_file_path, pts_list)
# Extract the frames from the FrameBatch if necessary
if isinstance(random_frames, FrameBatch):
```
On comments: often the *what* is something a reader can easily deduce. It's the *why* that usually needs a comment. In this case, the why is that TorchCodec's batch APIs return a `FrameBatch`. We sometimes use these APIs in our experiments, and we just return that directly. But for all other decoders, we just return a list of frames.
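For reference, a `FrameBatch` carries the decoded frames in its `data` tensor, so normalizing to a list could look like this (a sketch; the benchmark's actual helper may differ):

```python
from torchcodec import FrameBatch

def as_frame_list(decoded):
    # TorchCodec's batch APIs return a FrameBatch; the other decoders in the
    # benchmark already return a plain list of frame tensors.
    if isinstance(decoded, FrameBatch):
        return list(decoded.data)
    return decoded
```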
```python
        pts_list=pts_list,
    )
    decoders_and_frames.append((decoder_name, frames))
```
Since we just want to assert that the frames are close (see comment below), I think we can simplify even further and just do that assertion here. Then we don't need to do a separate loop over `decoders_and_frames`. In fact, I think we don't even need to record `decoders_and_frames`, as we don't need to remember the frames after the assertion.
Good point, I was able to significantly simplify this portion.
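The simplified shape could look roughly like the following, assuming each benchmark decoder exposes a `decode_frames`-style method and a metadata helper (both are assumptions, not the PR's exact code):

```python
import torch

def verify_outputs(decoders_to_run, video_paths, num_samples, reference_decoder):
    for video_file_path in video_paths:
        metadata = reference_decoder.get_metadata(video_file_path)  # hypothetical helper
        duration = metadata.duration_seconds
        pts_list = [i * duration / num_samples for i in range(num_samples)]
        reference_frames = reference_decoder.decode_frames(video_file_path, pts_list)
        for decoder_name, decoder in decoders_to_run.items():
            frames = decoder.decode_frames(video_file_path, pts_list)
            # Assert in place instead of collecting (decoder_name, frames) pairs
            # for a second loop; any mismatch raises and fails the process.
            for frame, reference in zip(frames, reference_frames):
                torch.testing.assert_close(frame, reference)
```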
Thanks for this! I think addressing the remaining comments will simplify the verification and make it easier for us to hook this up to CI (eventually, if we want to).
This is great! We should have been doing this all along. :) Let's go ahead and merge - I'm confident that the testing itself is correct, and I suspect we have a bug in how we're decoding with OpenCV. Let's create a new issue to investigate the problem with how we're using OpenCV, and label it as a "bug."
Overview
While adding the OpenCV decoder in #711, we realized that OpenCV was decoding a different stream than TorchCodec, resulting in unexpected benchmark results. This PR implements a function to compare the outputs of decoders.
Example usage
By adding `--verify-outputs` to the benchmark command, the script outputs a warning indicating that the output frames are different for TorchCodec in approximate mode compared to OpenCV on `nasa_13013.mp4`:

The dimensions match when `stream_index` is utilized, but the tensors are unequal:

For further testing, I generated a video using ffmpeg, where all results matched: