Verify decoder outputs #728


Merged — 10 commits merged into pytorch:main on Jun 17, 2025

Conversation

@Dan-Flores (Contributor) commented Jun 16, 2025

Overview

While adding the OpenCV decoder in #711, we realized that OpenCV was decoding a different stream than TorchCodec, resulting in unexpected benchmark results. This PR implements a function to compare the outputs of decoders.

Example usage

By adding --verify-outputs to the benchmark command, the script outputs a warning indicating that the output frames are different for TorchCodec in approximate mode compared to OpenCV on nasa_13013.mp4:

python benchmarks/decoders/benchmark_decoders.py --decoders torchaudio,torchcodec_public:seek_mode=approximate,opencv   --verify-outputs 
(screenshot of the warning output)
Error while comparing TorchCodecPublic:seek_mode=approximate and OpenCV[backend=FFMPEG]: The values for attribute 'shape' do not match: torch.Size([3, 270, 480]) != torch.Size([3, 180, 320]).

The dimensions match when stream_index is utilized, but the tensors are unequal:

python benchmarks/decoders/benchmark_decoders.py --decoders torchaudio:stream_index=0,torchcodec_public:seek_mode=approximate+stream_index=0,opencv   --verify-outputs --video-paths output.mp4

AssertionError: Tensor-likes are not equal!

Mismatched elements: 38759 / 172800 (22.4%)
Greatest absolute difference: 16 at index (0, 98, 96)
Greatest relative difference: inf at index (0, 44, 210)

For further testing, I generated a video using ffmpeg, on which all results matched:

ffmpeg -f lavfi -i testsrc=duration=10:size=1280x720:rate=30 -vf "drawtext=fontfile=/path/to/font.ttf: text='%{pts\:hms}': x=(w-text_w)/2: y=(h-text_h)/2: fontsize=48: fontcolor=white: box=1: boxcolor=black@0.5: boxborderw=5, hue=H=2*t" -pix_fmt yuv420p output.mp4
python benchmarks/decoders/benchmark_decoders.py --decoders torchaudio:stream_index=0,torchcodec_public:seek_mode=approximate+stream_index=0,opencv   --verify-outputs --video-paths output.mp4
video=output.mp4, decoder=OpenCV[backend=FFMPEG]
Results of baseline TorchCodecPublic and OpenCV[backend=FFMPEG] match!
video=output.mp4, decoder=TorchAudio:stream_index=0
Results of baseline TorchCodecPublic and TorchAudio:stream_index=0 match!
video=output.mp4, decoder=TorchCodecPublic:seek_mode=approximate+stream_index=0
Results of baseline TorchCodecPublic and TorchCodecPublic:seek_mode=approximate+stream_index=0 match!
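
Conceptually, the new check decodes the same evenly spaced timestamps with every decoder and asserts that the frames match a TorchCodecPublic baseline. Below is a minimal sketch of that flow; the decoder-wrapper interface (decode_frames, get_duration) and helper names are assumptions about the benchmark harness rather than the exact code in this PR, and the review comments below refine several details.

import torch

def verify_outputs(decoders_to_run, video_paths, num_samples):
    # decoders_to_run maps display names to benchmark decoder wrappers that
    # expose decode_frames(path, pts_list) -- an assumed interface.
    for video_path in video_paths:
        baseline_name, baseline = next(iter(decoders_to_run.items()))
        duration = baseline.get_duration(video_path)  # hypothetical metadata helper
        # Evenly spaced PTS values across the video.
        pts_list = [i * duration / num_samples for i in range(num_samples)]
        baseline_frames = baseline.decode_frames(video_path, pts_list)

        for name, decoder in decoders_to_run.items():
            if name == baseline_name:
                continue
            frames = decoder.decode_frames(video_path, pts_list)
            for expected, actual in zip(baseline_frames, frames):
                # A mismatch raises AssertionError, as in the outputs above.
                torch.testing.assert_close(actual, expected)
            print(f"video={video_path}, decoder={name}")
            print(f"Results of baseline {baseline_name} and {name} match!")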

@facebook-github-bot added the "CLA Signed" label on Jun 16, 2025 (this label is managed by the Meta Open Source bot).
try:
    torch.testing.assert_close(f1, f2)
except Exception as e:
    tensorcat(f1)
Dan-Flores (Contributor Author):

This library is useful for visually comparing frames, but I am open to removing it since it is not necessary.

Contributor:

Yeah, I think we should remove it. Even though most users won't run the benchmarks, we still want to keep dependencies down.

I also think we can simplify this part and just do the plain assertion. For this kind of validation, it's generally better to get a failure than to just print the result to stdout. Our CI, and a lot of other monitoring infra, uses the process exit code to determine whether there was a problem. With the current approach, we'd need to parse stdout to figure out the status if we ever wanted to hook this up to CI.
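
A minimal sketch of the simplified check being suggested (assert_frames_match is a hypothetical helper name): just let torch.testing.assert_close raise, so a mismatch surfaces as an AssertionError and a non-zero exit code that CI can detect without parsing stdout.

import torch

def assert_frames_match(baseline_frames, candidate_frames):
    # No try/except and no tensorcat visualization: on mismatch, assert_close
    # raises AssertionError, the script exits non-zero, and CI can detect the
    # failure from the exit code alone.
    for f1, f2 in zip(baseline_frames, candidate_frames):
        torch.testing.assert_close(f1, f2)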

Dan-Flores (Contributor Author):

Thanks for the feedback, I'll remove the library and keep that in mind going forward.

# Import library to show frames that don't match
from tensorcat import tensorcat

# Reuse TorchCodecPublic decoder with options, if provided, as the baseline
Dan-Flores (Contributor Author):

Is TorchCodecPublic the correct decoder to use as the baseline?

Contributor:

Good question - yes, I think so. We have a large battery of unit tests testing for bit-for-bit correctness on TorchCodec, and it's easier to test on the public API.

None,
):
torchcodec_public_decoder = decoders_to_run[torchcodec_display_name]
# Create default TorchCodecPublic decoder to use as a baseline
Contributor:

This means that the reference decoder will be subject to the options that the user provides, such as seek_mode. I think we shouldn't try to use the options the user provided, but instead decide what the reference decoder is, and always use that. That means that we probably shouldn't use the default options, but decide what options we want to use. Regarding seek_mode, I think we should probably use exact as the reference.

Dan-Flores (Contributor Author):

One catch: given the new 'stream_index' option, we rely on the user to indicate which stream the benchmarks should compare. Without it, the outputs of OpenCV and TorchCodecPublic would not match.

I agree that exact should be used as the reference. I've updated the code to only reuse the stream_index argument.
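
A hedged sketch of that baseline construction, assuming TorchCodec's public VideoDecoder and its stream_index/seek_mode parameters (the helper name and option plumbing are illustrative, not the PR's exact code):

from torchcodec.decoders import VideoDecoder

def make_baseline_decoder(video_path, user_stream_index=None):
    # The reference is always TorchCodecPublic with exact seeking, regardless
    # of the seek_mode being benchmarked. Only the user-supplied stream_index
    # is reused, so the baseline reads the same stream from multi-stream files.
    return VideoDecoder(
        video_path,
        stream_index=user_stream_index,
        seek_mode="exact",
    )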

Contributor:

Oh! But we want to use the stream_index that the user provided. Good call. :)

@@ -177,6 +135,9 @@ def main() -> None:
        if entry.is_file() and entry.name.endswith(".mp4"):
            video_paths.append(entry.path)

    if args.verify_outputs:
        verify_outputs(decoders_to_run, video_paths, num_uniform_samples)

Contributor:

I think this option is most useful if it's mutually exclusive with running the actual benchmarks. That way someone can specify it to quickly test the benchmark's correctness. So I think we should make running the benchmarks and printing the results the else branch here.
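
A small sketch of the suggested control flow in main(); parse_and_setup and run_benchmarks are placeholders for whatever the script actually does on the benchmark path:

def main() -> None:
    # parse_and_setup is a stand-in for the existing argument parsing and
    # video discovery earlier in the script.
    args, decoders_to_run, video_paths, num_uniform_samples = parse_and_setup()

    if args.verify_outputs:
        # Quick correctness check of the decoder setup; skip the timing runs.
        verify_outputs(decoders_to_run, video_paths, num_uniform_samples)
    else:
        # Normal path: run the benchmarks and print the results.
        run_benchmarks(decoders_to_run, video_paths, num_uniform_samples)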

# Generate uniformly random PTS
duration = metadata.duration_seconds
pts_list = [i * duration / num_samples for i in range(num_samples)]

Contributor:

Technically this is uniformly-spaced PTS values, or just evenly-spaced PTS values. Uniformly random would look like pts_list = (torch.rand(num_samples) * duration).tolist().

# Decode non-sequential frames using decode_frames function
random_frames = decoder.decode_frames(video_file_path, pts_list)
# Extract the frames from the FrameBatch if necessary
if isinstance(random_frames, FrameBatch):
Contributor:

On comments: often the what is something a reader can easily deduce. It's the why that usually needs a comment. In this case, the why is that TorchCodec's batch APIs return a FrameBatch. We sometimes use these APIs in our experiments, and we just return that directly. But for all other decoders, we just return a list of frames.
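
A hedged sketch of that normalization, assuming FrameBatch exposes its stacked frames through a .data tensor (as TorchCodec's batch APIs do), so every decoder's output can be treated as a plain list of frame tensors:

from torchcodec import FrameBatch

def to_frame_list(decoded):
    # TorchCodec's batch APIs return a FrameBatch with frames stacked along
    # the first dimension of .data; the other decoders in the benchmark return
    # a plain list of frame tensors. Normalize so the comparison loop can
    # treat every decoder uniformly.
    if isinstance(decoded, FrameBatch):
        return list(decoded.data.unbind(dim=0))
    return decoded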

pts_list=pts_list,
)
decoders_and_frames.append((decoder_name, frames))

Contributor:

Since we just want to assert that the frames are close (see comment below), I think we can simplify even further and just do that assertion here. Then we don't need to do a separate loop over decoders_and_frames. In fact, I think we don't even need to record decoders_and_frames, as we don't need to remember the frames after the assertion.

Dan-Flores (Contributor Author):

Good point, I was able to significantly simplify this portion.

@scotts (Contributor) left a review comment:

Thanks for this! I think addressing the remaining comments will simplify the verification and make it easier for us to hook this up to CI (eventually, if we want to).

@scotts (Contributor) commented Jun 17, 2025:

This is great! We should have been doing this all along. :) Let's go ahead and merge - I'm confident that the testing itself is correct, and I suspect we have a bug in how we're decoding with OpenCV. Let's create a new issue to investigate the problem with how we're using OpenCV, and label it as a "bug."

@Dan-Flores merged commit d88cf1a into pytorch:main on Jun 17, 2025
42 of 44 checks passed