Add Docs for AudioEncoder (#717)

NicolasHug · web-flow · commit 7945e6aec7b7 · 2025-07-04T15:57:58.000+01:00
diff --git a/README.md b/README.md
@@ -3,17 +3,18 @@
 # TorchCodec
 
 TorchCodec is a Python library for decoding video and audio data into PyTorch
-tensors, on CPU and CUDA GPU. It aims to be fast, easy to use, and well
-integrated into the PyTorch ecosystem. If you want to use PyTorch to train ML
-models on videos and audio, TorchCodec is how you turn these into data.
+tensors, on CPU and CUDA GPU. It also supports audio encoding, and video
+encoding will come soon! It aims to be fast, easy to use, and well integrated
+into the PyTorch ecosystem. If you want to use PyTorch to train ML models on
+videos and audio, TorchCodec is how you turn these into data.
 
 We achieve these capabilities through:
 
 * Pythonic APIs that mirror Python and PyTorch conventions.
-* Relying on [FFmpeg](https://www.ffmpeg.org/) to do the decoding. TorchCodec
-  uses the version of FFmpeg you already have installed. FFmpeg is a mature
-  library with broad coverage available on most systems. It is, however, not
-  easy to use. TorchCodec abstracts FFmpeg's complexity to ensure it is used
+* Relying on [FFmpeg](https://www.ffmpeg.org/) to do the decoding and encoding.
+  TorchCodec uses the version of FFmpeg you already have installed. FFmpeg is a
+  mature library with broad coverage available on most systems. It is, however,
+  not easy to use. TorchCodec abstracts FFmpeg's complexity to ensure it is used
   correctly and efficiently.
 * Returning data as PyTorch tensors, ready to be fed into PyTorch transforms
   or used directly to train models.
diff --git a/docs/source/api_ref_decoders.rst b/docs/source/api_ref_decoders.rst
@@ -7,8 +7,8 @@ torchcodec.decoders
 .. currentmodule:: torchcodec.decoders
 
 
-For a video decoder tutorial, see: :ref:`sphx_glr_generated_examples_basic_example.py`.
-For an audio decoder tutorial, see: :ref:`sphx_glr_generated_examples_audio_decoding.py`.
+For a video decoder tutorial, see: :ref:`sphx_glr_generated_examples_decoding_basic_example.py`.
+For an audio decoder tutorial, see: :ref:`sphx_glr_generated_examples_decoding_audio_decoding.py`.
 
 
 .. autosummary::
diff --git a/docs/source/api_ref_encoders.rst b/docs/source/api_ref_encoders.rst
@@ -0,0 +1,18 @@
+.. _encoders:
+
+===================
+torchcodec.encoders
+===================
+
+.. currentmodule:: torchcodec.encoders
+
+
+For an audio encoder tutorial, see: :ref:`sphx_glr_generated_examples_encoding_audio_encoding.py`.
+
+
+.. autosummary::
+    :toctree: generated/
+    :nosignatures:
+    :template: class.rst
+
+    AudioEncoder
diff --git a/docs/source/api_ref_samplers.rst b/docs/source/api_ref_samplers.rst
@@ -6,7 +6,7 @@ torchcodec.samplers
 
 .. currentmodule:: torchcodec.samplers
 
-For a tutorial, see: :ref:`sphx_glr_generated_examples_sampling.py`.
+For a tutorial, see: :ref:`sphx_glr_generated_examples_decoding_sampling.py`.
 
 .. autosummary::
     :toctree: generated/
diff --git a/docs/source/conf.py b/docs/source/conf.py
@@ -68,18 +68,27 @@ class CustomGalleryExampleSortKey:
     def __init__(self, src_dir):
         self.src_dir = src_dir
 
-    order = [
-        "basic_example.py",
-        "audio_decoding.py",
-        "basic_cuda_example.py",
-        "file_like.py",
-        "approximate_mode.py",
-        "sampling.py",
-    ]
-
     def __call__(self, filename):
+        # We have two top-level galleries, one for decoding examples and one for
+        # encoding examples. We define the example order within each gallery
+        # individually.
+        if "examples/decoding" in self.src_dir:
+            order = [
+                "basic_example.py",
+                "audio_decoding.py",
+                "basic_cuda_example.py",
+                "file_like.py",
+                "approximate_mode.py",
+                "sampling.py",
+            ]
+        else:
+            assert "examples/encoding" in self.src_dir
+            order = [
+                "audio_encoding.py",
+            ]
+
         try:
-            return self.order.index(filename)
+            return order.index(filename)
         except ValueError as e:
             raise ValueError(
                 "Looks like you added an example in the examples/ folder?"
diff --git a/docs/source/index.rst b/docs/source/index.rst
@@ -2,21 +2,25 @@ Welcome to the TorchCodec documentation!
 ========================================
 
 TorchCodec is a Python library for decoding video and audio data into PyTorch
-tensors, on CPU and CUDA GPU. It aims to be fast, easy to use, and well
-integrated into the PyTorch ecosystem. If you want to use PyTorch to train ML
-models on videos and audio, TorchCodec is how you turn these into data.
+tensors, on CPU and CUDA GPU. It also supports audio encoding, and video encoding will come soon!
+It aims to be fast, easy to use, and well integrated into the PyTorch ecosystem.
+If you want to use PyTorch to train ML models on videos and audio, TorchCodec is
+how you turn these into data.
 
 We achieve these capabilities through:
 
 * Pythonic APIs that mirror Python and PyTorch conventions.
-* Relying on `FFmpeg <https://www.ffmpeg.org/>`_ to do the decoding. TorchCodec
-  uses the version of FFmpeg you already have installed. FMPEG is a mature
-  library with broad coverage available on most systems. It is, however, not
-  easy to use. TorchCodec abstracts FFmpeg's complexity to ensure it is used
-  correctly and efficiently.
+* Relying on `FFmpeg <https://www.ffmpeg.org/>`_ to do the decoding / encoding.
+  TorchCodec uses the version of FFmpeg you already have installed. FFmpeg is a
+  mature library with broad coverage available on most systems. It is, however,
+  not easy to use. TorchCodec abstracts FFmpeg's complexity to ensure it is
+  used correctly and efficiently.
 * Returning data as PyTorch tensors, ready to be fed into PyTorch transforms
   or used directly to train models.
 
+Installation instructions
+^^^^^^^^^^^^^^^^^^^^^^^^^
+
 .. grid:: 3
 
      .. grid-item-card:: :octicon:`file-code;1em`
@@ -27,46 +31,64 @@ We achieve these capabilities through:
 
         How to install TorchCodec
 
+Decoding
+^^^^^^^^
+
+.. grid:: 3
+
      .. grid-item-card:: :octicon:`file-code;1em`
         Getting Started with TorchCodec
         :img-top: _static/img/card-background.svg
-        :link: generated_examples/basic_example.html
+        :link: generated_examples/decoding/basic_example.html
         :link-type: url
 
         A simple video decoding example
 
      .. grid-item-card:: :octicon:`file-code;1em`
         Audio Decoding
         :img-top: _static/img/card-background.svg
-        :link: generated_examples/audio_decoding.html
+        :link: generated_examples/decoding/audio_decoding.html
         :link-type: url
 
         A simple audio decoding example
 
      .. grid-item-card:: :octicon:`file-code;1em`
         GPU decoding
         :img-top: _static/img/card-background.svg
-        :link: generated_examples/basic_cuda_example.html
+        :link: generated_examples/decoding/basic_cuda_example.html
         :link-type: url
 
         A simple example demonstrating CUDA GPU decoding
 
      .. grid-item-card:: :octicon:`file-code;1em`
         Streaming video
         :img-top: _static/img/card-background.svg
-        :link: generated_examples/file_like.html
+        :link: generated_examples/decoding/file_like.html
         :link-type: url
 
         How to efficiently decode videos from the cloud
 
      .. grid-item-card:: :octicon:`file-code;1em`
         Clip sampling
         :img-top: _static/img/card-background.svg
-        :link: generated_examples/sampling.html
+        :link: generated_examples/decoding/sampling.html
         :link-type: url
 
         How to sample regular and random clips from a video
 
+Encoding
+^^^^^^^^
+
+.. grid:: 3
+
+     .. grid-item-card:: :octicon:`file-code;1em`
+        Audio Encoding
+        :img-top: _static/img/card-background.svg
+        :link: generated_examples/encoding/audio_encoding.html
+        :link-type: url
+
+        How to encode audio samples
+
 .. toctree::
    :maxdepth: 1
    :caption: TorchCodec documentation
@@ -92,4 +114,5 @@ We achieve these capabilities through:
 
    api_ref_torchcodec
    api_ref_decoders
+   api_ref_encoders
    api_ref_samplers
diff --git a/examples/decoding/README.rst b/examples/decoding/README.rst
@@ -0,0 +1,2 @@
+Decoding
+--------
diff --git a/examples/decoding/approximate_mode.py b/examples/decoding/approximate_mode.py
diff --git a/examples/decoding/audio_decoding.py b/examples/decoding/audio_decoding.py
diff --git a/examples/decoding/basic_cuda_example.py b/examples/decoding/basic_cuda_example.py
diff --git a/examples/decoding/basic_example.py b/examples/decoding/basic_example.py
diff --git a/examples/decoding/file_like.py b/examples/decoding/file_like.py
@@ -96,7 +96,7 @@ def bench(f, average_over=10, warmup=2):
 # the :class:`~torchcodec.decoders.VideoDecoder` class to ``"approximate"``. We do
 # this to avoid scanning the entire video during initialization, which would
 # require downloading the entire video even if we only want to decode the first
-# frame. See :ref:`sphx_glr_generated_examples_approximate_mode.py` for more.
+# frame. See :ref:`sphx_glr_generated_examples_decoding_approximate_mode.py` for more.
 
 
 def decode_from_existing_download():
diff --git a/examples/decoding/sampling.py b/examples/decoding/sampling.py
@@ -61,7 +61,7 @@ def plot(frames: torch.Tensor, title : Optional[str] = None):
 # Sampling clips from a video always starts by creating a
 # :class:`~torchcodec.decoders.VideoDecoder` object. If you're not already
 # familiar with :class:`~torchcodec.decoders.VideoDecoder`, take a quick look
-# at: :ref:`sphx_glr_generated_examples_basic_example.py`.
+# at: :ref:`sphx_glr_generated_examples_decoding_basic_example.py`.
 from torchcodec.decoders import VideoDecoder
 
 # You can also pass a path to a local file!
diff --git a/examples/encoding/README.rst b/examples/encoding/README.rst
@@ -0,0 +1,2 @@
+Encoding
+--------
diff --git a/examples/encoding/audio_encoding.py b/examples/encoding/audio_encoding.py
@@ -0,0 +1,91 @@
+# Copyright (c) Meta Platforms, Inc. and affiliates.
+# All rights reserved.
+#
+# This source code is licensed under the BSD-style license found in the
+# LICENSE file in the root directory of this source tree.
+
+"""
+========================================
+Encoding audio samples with AudioEncoder
+========================================
+
+In this example, we'll learn how to encode audio samples to a file or to raw
+bytes using the :class:`~torchcodec.encoders.AudioEncoder` class.
+"""
+
+# %%
+# Let's first generate some samples to be encoded. The data to be encoded could
+# also just come from an :class:`~torchcodec.decoders.AudioDecoder`!
+import torch
+from IPython.display import Audio as play_audio
+
+
+def make_sinewave() -> tuple[torch.Tensor, int]:
+    freq_A = 440  # Hz
+    sample_rate = 16000  # Hz
+    duration_seconds = 3  # seconds
+    t = torch.linspace(0, duration_seconds, int(sample_rate * duration_seconds), dtype=torch.float32)
+    return torch.sin(2 * torch.pi * freq_A * t), sample_rate
+
+
+samples, sample_rate = make_sinewave()
+
+print(f"Encoding samples with {samples.shape = } and {sample_rate = }")
+play_audio(samples, rate=sample_rate)
+
+# %%
+# We first instantiate an :class:`~torchcodec.encoders.AudioEncoder`. We pass it
+# the samples to be encoded. The samples must be a 2D tensor of shape
+# ``(num_channels, num_samples)``, or in this case, a 1D tensor where
+# ``num_channels`` is assumed to be 1. The values must be float values
+# normalized in ``[-1, 1]``: this is also what the
+# :class:`~torchcodec.decoders.AudioDecoder` would return.
+#
+# .. note::
+#
+#     The ``sample_rate`` parameter corresponds to the sample rate of the
+#     *input*, not the desired encoded sample rate.
+from torchcodec.encoders import AudioEncoder
+
+encoder = AudioEncoder(samples=samples, sample_rate=sample_rate)
+
+
+# %%
+# :class:`~torchcodec.encoders.AudioEncoder` supports encoding samples into a
+# file via the :meth:`~torchcodec.encoders.AudioEncoder.to_file` method, or to
+# raw bytes via :meth:`~torchcodec.encoders.AudioEncoder.to_tensor`.  For the
+# purpose of this tutorial we'll use
+# :meth:`~torchcodec.encoders.AudioEncoder.to_tensor`, so that we can easily
+# re-decode the encoded samples and check their properties. The
+# :meth:`~torchcodec.encoders.AudioEncoder.to_file` method works very similarly.
+
+encoded_samples = encoder.to_tensor(format="mp3")
+print(f"{encoded_samples.shape = }, {encoded_samples.dtype = }")
+
+
+# %%
+# That's it!
+#
+# Now that we have our encoded data, we can decode it back, to make sure it
+# looks and sounds as expected:
+from torchcodec.decoders import AudioDecoder
+
+samples_back = AudioDecoder(encoded_samples).get_all_samples()
+
+print(samples_back)
+play_audio(samples_back.data, rate=samples_back.sample_rate)
+
+# %%
+# The encoder supports some encoding options that allow you to change how the
+# data is encoded. For example, we can decide to encode our mono data (1
+# channel) into stereo data (2 channels):
+encoded_samples = encoder.to_tensor(format="wav", num_channels=2)
+
+stereo_samples_back = AudioDecoder(encoded_samples).get_all_samples()
+
+print(stereo_samples_back)
+play_audio(stereo_samples_back.data, rate=stereo_samples_back.sample_rate)
+
+# %%
+# Check the docstring of the encoding methods to learn about the different
+# encoding options.
diff --git a/src/torchcodec/_core/Encoder.h b/src/torchcodec/_core/Encoder.h
@@ -9,11 +9,6 @@ class AudioEncoder {
  public:
   ~AudioEncoder();
 
-  // TODO-ENCODING: document in public docs that bit_rate value is only
-  // best-effort, matching to the closest supported bit_rate. I.e. passing 1 is
-  // like passing 0, which results in choosing the minimum supported bit rate.
-  // Passing 44_100 could result in output being 44000 if only 44000 is
-  // supported.
   AudioEncoder(
       const torch::Tensor& samples,
       // TODO-ENCODING: update this comment when we support an output sample
diff --git a/src/torchcodec/_frame.py b/src/torchcodec/_frame.py
@@ -60,7 +60,7 @@ class FrameBatch(Iterable):
 
     The ``data`` tensor is typically 4D for sequences of frames (NHWC or NCHW),
     or 5D for sequences of clips, as returned by the :ref:`samplers
-    <sphx_glr_generated_examples_sampling.py>`. When ``data`` is 4D (resp.  5D)
+    <sphx_glr_generated_examples_decoding_sampling.py>`. When ``data`` is 4D (resp.  5D)
     the ``pts_seconds`` and ``duration_seconds`` tensors are 1D (resp. 2D).
 
     .. note::
diff --git a/src/torchcodec/decoders/_audio_decoder.py b/src/torchcodec/decoders/_audio_decoder.py
@@ -26,15 +26,16 @@ class AudioDecoder:
     Returned samples are float samples normalized in [-1, 1]
 
     Args:
-        source (str, ``Pathlib.path``, bytes, ``torch.Tensor`` or file-like object): The source of the video:
+        source (str, ``Pathlib.path``, bytes, ``torch.Tensor`` or file-like
+            object): The source of the video or audio:
 
             - If ``str``: a local path or a URL to a video or audio file.
             - If ``Pathlib.path``: a path to a local video or audio file.
             - If ``bytes`` object or ``torch.Tensor``: the raw encoded audio data.
             - If file-like object: we read video data from the object on demand. The object must
               expose the methods `read(self, size: int) -> bytes` and
               `seek(self, offset: int, whence: int) -> bytes`. Read more in:
-              :ref:`sphx_glr_generated_examples_file_like.py`.
+              :ref:`sphx_glr_generated_examples_decoding_file_like.py`.
         stream_index (int, optional): Specifies which stream in the file to decode samples from.
             Note that this index is absolute across all media types. If left unspecified, then
             the :term:`best stream` is used.
diff --git a/src/torchcodec/decoders/_video_decoder.py b/src/torchcodec/decoders/_video_decoder.py
@@ -30,7 +30,7 @@ class VideoDecoder:
             - If file-like object: we read video data from the object on demand. The object must
               expose the methods `read(self, size: int) -> bytes` and
               `seek(self, offset: int, whence: int) -> bytes`. Read more in:
-              :ref:`sphx_glr_generated_examples_file_like.py`.
+              :ref:`sphx_glr_generated_examples_decoding_file_like.py`.
         stream_index (int, optional): Specifies which stream in the video to decode frames from.
             Note that this index is absolute across all media types. If left unspecified, then
             the :term:`best stream` is used.
@@ -59,7 +59,7 @@ class VideoDecoder:
             accurate as it uses the file's metadata to calculate where i
             probably is. Default: "exact".
             Read more about this parameter in:
-            :ref:`sphx_glr_generated_examples_approximate_mode.py`
+            :ref:`sphx_glr_generated_examples_decoding_approximate_mode.py`
 
 
     Attributes:
diff --git a/src/torchcodec/encoders/_audio_encoder.py b/src/torchcodec/encoders/_audio_encoder.py
diff --git a/test/test_encoders.py b/test/test_encoders.py
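
Note on the `docs/source/conf.py` change above: the sort key now picks an example order per gallery, based on whether `src_dir` contains `examples/decoding` or `examples/encoding`. The snippet below is a minimal standalone sketch of that idea, not the code in `conf.py`. The `GALLERY_ORDER` table and the `sort_key` helper are hypothetical names introduced for illustration, and the sketch swaps the `if`/`else` plus `assert` for a dictionary lookup so that an unknown gallery directory raises a `ValueError` instead of failing an assertion.

```python
# Hypothetical standalone sketch; GALLERY_ORDER and sort_key are illustrative
# names and are not part of docs/source/conf.py.
GALLERY_ORDER = {
    "examples/decoding": [
        "basic_example.py",
        "audio_decoding.py",
        "basic_cuda_example.py",
        "file_like.py",
        "approximate_mode.py",
        "sampling.py",
    ],
    "examples/encoding": [
        "audio_encoding.py",
    ],
}


def sort_key(src_dir: str, filename: str) -> int:
    """Return the position of `filename` within the gallery matching `src_dir`."""
    for gallery, order in GALLERY_ORDER.items():
        if gallery in src_dir:
            try:
                return order.index(filename)
            except ValueError as e:
                # The file exists in the gallery folder but has no declared position.
                raise ValueError(
                    f"{filename!r} is not listed in the order for {gallery!r}"
                ) from e
    # No known gallery path is a substring of src_dir.
    raise ValueError(f"{src_dir!r} does not match any known gallery")


# Sort a few decoding examples the way the gallery would display them.
decoding_files = ["sampling.py", "basic_example.py", "file_like.py"]
ordered = sorted(decoding_files, key=lambda f: sort_key("../../examples/decoding", f))
print(ordered)
```

Keeping the order in one table means a new gallery only needs a new key, at the cost of diverging from the explicit branch structure used in the actual change.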