Add Docs for AudioEncoder #717

Open
wants to merge 4 commits into base: main

Changes from 3 commits
15 changes: 8 additions & 7 deletions README.md
@@ -3,17 +3,18 @@
# TorchCodec

TorchCodec is a Python library for decoding video and audio data into PyTorch
tensors, on CPU and CUDA GPU. It aims to be fast, easy to use, and well
integrated into the PyTorch ecosystem. If you want to use PyTorch to train ML
models on videos and audio, TorchCodec is how you turn these into data.
tensors, on CPU and CUDA GPU. It also supports audio encoding, and video
encoding will come soon! It aims to be fast, easy to use, and well integrated
into the PyTorch ecosystem. If you want to use PyTorch to train ML models on
videos and audio, TorchCodec is how you turn these into data.

We achieve these capabilities through:

* Pythonic APIs that mirror Python and PyTorch conventions.
* Relying on [FFmpeg](https://www.ffmpeg.org/) to do the decoding. TorchCodec
uses the version of FFmpeg you already have installed. FFmpeg is a mature
library with broad coverage available on most systems. It is, however, not
easy to use. TorchCodec abstracts FFmpeg's complexity to ensure it is used
* Relying on [FFmpeg](https://www.ffmpeg.org/) to do the decoding and encoding.
Contributor: Nit: I prefer "x and y" as opposed to "x / y" in prose.

TorchCodec uses the version of FFmpeg you already have installed. FFmpeg is a
mature library with broad coverage available on most systems. It is, however,
not easy to use. TorchCodec abstracts FFmpeg's complexity to ensure it is used
correctly and efficiently.
* Returning data as PyTorch tensors, ready to be fed into PyTorch transforms
or used directly to train models.
4 changes: 2 additions & 2 deletions docs/source/api_ref_decoders.rst
@@ -7,8 +7,8 @@ torchcodec.decoders
.. currentmodule:: torchcodec.decoders


For a video decoder tutorial, see: :ref:`sphx_glr_generated_examples_basic_example.py`.
For an audio decoder tutorial, see: :ref:`sphx_glr_generated_examples_audio_decoding.py`.
For a video decoder tutorial, see: :ref:`sphx_glr_generated_examples_decoding_basic_example.py`.
For an audio decoder tutorial, see: :ref:`sphx_glr_generated_examples_decoding_audio_decoding.py`.


.. autosummary::
18 changes: 18 additions & 0 deletions docs/source/api_ref_encoders.rst
@@ -0,0 +1,18 @@
.. _encoders:

===================
torchcodec.encoders
===================

.. currentmodule:: torchcodec.encoders


For an audio encoder tutorial, see: :ref:`sphx_glr_generated_examples_encoding_audio_encoding.py`.


.. autosummary::
:toctree: generated/
:nosignatures:
:template: class.rst

AudioEncoder
2 changes: 1 addition & 1 deletion docs/source/api_ref_samplers.rst
@@ -6,7 +6,7 @@ torchcodec.samplers

.. currentmodule:: torchcodec.samplers

For a tutorial, see: :ref:`sphx_glr_generated_examples_sampling.py`.
For a tutorial, see: :ref:`sphx_glr_generated_examples_decoding_sampling.py`.

.. autosummary::
:toctree: generated/
26 changes: 16 additions & 10 deletions docs/source/conf.py
Member Author: Changes in this file, as well as the file renames, are meant to separate our "tutorials" page into two sections: one for decoding, one for encoding.

@@ -68,18 +68,24 @@ class CustomGalleryExampleSortKey:
def __init__(self, src_dir):
self.src_dir = src_dir

order = [
"basic_example.py",
"audio_decoding.py",
"basic_cuda_example.py",
"file_like.py",
"approximate_mode.py",
"sampling.py",
]

def __call__(self, filename):
# There are two top-level galleries (examples/decoding and
# examples/encoding); determine which one we're sorting from src_dir.
if "examples/decoding" in self.src_dir:
order = [
"basic_example.py",
"audio_decoding.py",
"basic_cuda_example.py",
"file_like.py",
"approximate_mode.py",
"sampling.py",
]
else:
assert "examples/encoding" in self.src_dir
order = [
"audio_encoding.py",
]

Contributor: Can we add a comment explaining that we have two top-level galleries, and for that reason, we need to figure out which gallery we're using (decoding versus encoding)? I was real confused until I concluded that must be what's going on.

try:
return self.order.index(filename)
return order.index(filename)
except ValueError as e:
raise ValueError(
"Looks like you added an example in the examples/ folder?"
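For readers skimming the diff, the per-gallery ordering logic above can be exercised standalone. Below is a trimmed, illustrative sketch (class name shortened and file lists abbreviated from the diff; not the exact conf.py code):

```python
class GallerySortKey:
    """Explicit example ordering, one list per top-level gallery.

    There are two galleries (decoding and encoding), so we first determine
    which one we are sorting by inspecting the source directory path.
    """

    def __init__(self, src_dir):
        self.src_dir = src_dir

    def __call__(self, filename):
        if "examples/decoding" in self.src_dir:
            order = ["basic_example.py", "audio_decoding.py", "sampling.py"]
        else:
            order = ["audio_encoding.py"]
        try:
            return order.index(filename)
        except ValueError:
            # Mirrors the diff: unknown files must be registered explicitly.
            raise ValueError(f"{filename!r} is not in the order list")


print(GallerySortKey("examples/encoding")("audio_encoding.py"))  # → 0
print(GallerySortKey("examples/decoding")("sampling.py"))  # → 2
```

sphinx-gallery calls such a key once per example file, so an explicit list keeps the rendered gallery order stable regardless of filenames.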
49 changes: 36 additions & 13 deletions docs/source/index.rst
@@ -2,21 +2,25 @@ Welcome to the TorchCodec documentation!
========================================

TorchCodec is a Python library for decoding video and audio data into PyTorch
tensors, on CPU and CUDA GPU. It aims to be fast, easy to use, and well
integrated into the PyTorch ecosystem. If you want to use PyTorch to train ML
models on videos and audio, TorchCodec is how you turn these into data.
tensors, on CPU and CUDA GPU. It also supports audio encoding, and video encoding will come soon!
It aims to be fast, easy to use, and well integrated into the PyTorch ecosystem.
If you want to use PyTorch to train ML models on videos and audio, TorchCodec is
how you turn these into data.

We achieve these capabilities through:

* Pythonic APIs that mirror Python and PyTorch conventions.
* Relying on `FFmpeg <https://www.ffmpeg.org/>`_ to do the decoding. TorchCodec
uses the version of FFmpeg you already have installed. FMPEG is a mature
library with broad coverage available on most systems. It is, however, not
easy to use. TorchCodec abstracts FFmpeg's complexity to ensure it is used
correctly and efficiently.
* Relying on `FFmpeg <https://www.ffmpeg.org/>`_ to do the decoding and encoding.
TorchCodec uses the version of FFmpeg you already have installed. FFmpeg is a
mature library with broad coverage available on most systems. It is, however,
not easy to use. TorchCodec abstracts FFmpeg's complexity to ensure it is
used correctly and efficiently.
* Returning data as PyTorch tensors, ready to be fed into PyTorch transforms
or used directly to train models.

Installation instructions
^^^^^^^^^^^^^^^^^^^^^^^^^

.. grid:: 3

.. grid-item-card:: :octicon:`file-code;1em`
@@ -27,46 +31,64 @@ We achieve these capabilities through:

How to install TorchCodec

Decoding
^^^^^^^^

.. grid:: 3

.. grid-item-card:: :octicon:`file-code;1em`
Getting Started with TorchCodec
:img-top: _static/img/card-background.svg
:link: generated_examples/basic_example.html
:link: generated_examples/decoding/basic_example.html
:link-type: url

A simple video decoding example

.. grid-item-card:: :octicon:`file-code;1em`
Audio Decoding
:img-top: _static/img/card-background.svg
:link: generated_examples/audio_decoding.html
:link: generated_examples/decoding/audio_decoding.html
:link-type: url

A simple audio decoding example

.. grid-item-card:: :octicon:`file-code;1em`
GPU decoding
:img-top: _static/img/card-background.svg
:link: generated_examples/basic_cuda_example.html
:link: generated_examples/decoding/basic_cuda_example.html
:link-type: url

A simple example demonstrating CUDA GPU decoding

.. grid-item-card:: :octicon:`file-code;1em`
Streaming video
:img-top: _static/img/card-background.svg
:link: generated_examples/file_like.html
:link: generated_examples/decoding/file_like.html
:link-type: url

How to efficiently decode videos from the cloud

.. grid-item-card:: :octicon:`file-code;1em`
Clip sampling
:img-top: _static/img/card-background.svg
:link: generated_examples/sampling.html
:link: generated_examples/decoding/sampling.html
:link-type: url

How to sample regular and random clips from a video

Encoding
^^^^^^^^

.. grid:: 3

.. grid-item-card:: :octicon:`file-code;1em`
Audio Encoding
:img-top: _static/img/card-background.svg
:link: generated_examples/encoding/audio_encoding.html
:link-type: url

How to encode audio samples

.. toctree::
:maxdepth: 1
:caption: TorchCodec documentation
@@ -92,4 +114,5 @@

api_ref_torchcodec
api_ref_decoders
api_ref_encoders
api_ref_samplers
2 changes: 2 additions & 0 deletions examples/decoding/README.rst
@@ -0,0 +1,2 @@
Decoding
--------
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
2 changes: 1 addition & 1 deletion examples/file_like.py → examples/decoding/file_like.py
@@ -96,7 +96,7 @@ def bench(f, average_over=10, warmup=2):
# the :class:`~torchcodec.decoders.VideoDecoder` class to ``"approximate"``. We do
# this to avoid scanning the entire video during initialization, which would
# require downloading the entire video even if we only want to decode the first
# frame. See :ref:`sphx_glr_generated_examples_approximate_mode.py` for more.
# frame. See :ref:`sphx_glr_generated_examples_decoding_approximate_mode.py` for more.


def decode_from_existing_download():
2 changes: 1 addition & 1 deletion examples/sampling.py → examples/decoding/sampling.py
@@ -61,7 +61,7 @@ def plot(frames: torch.Tensor, title : Optional[str] = None):
# Sampling clips from a video always starts by creating a
# :class:`~torchcodec.decoders.VideoDecoder` object. If you're not already
# familiar with :class:`~torchcodec.decoders.VideoDecoder`, take a quick look
# at: :ref:`sphx_glr_generated_examples_basic_example.py`.
# at: :ref:`sphx_glr_generated_examples_decoding_basic_example.py`.
from torchcodec.decoders import VideoDecoder

# You can also pass a path to a local file!
2 changes: 2 additions & 0 deletions examples/encoding/README.rst
@@ -0,0 +1,2 @@
Encoding
--------
91 changes: 91 additions & 0 deletions examples/encoding/audio_encoding.py
@@ -0,0 +1,91 @@
# Copyright (c) Meta Platforms, Inc. and affiliates.
# All rights reserved.
#
# This source code is licensed under the BSD-style license found in the
# LICENSE file in the root directory of this source tree.

"""
========================================
Encoding audio samples with AudioEncoder
========================================

In this example, we'll learn how to encode audio samples to a file or to raw
bytes using the :class:`~torchcodec.encoders.AudioEncoder` class.
"""

# %%
# Let's first generate some samples to be encoded. The data to be encoded could
# also just come from an :class:`~torchcodec.decoders.AudioDecoder`!
import torch
from IPython.display import Audio as play_audio


def make_sinewave() -> tuple[torch.Tensor, int]:
freq_A = 440 # Hz
sample_rate = 16000 # Hz
duration_seconds = 3 # seconds
t = torch.linspace(0, duration_seconds, int(sample_rate * duration_seconds), dtype=torch.float32)
return torch.sin(2 * torch.pi * freq_A * t), sample_rate


samples, sample_rate = make_sinewave()

print(f"Encoding samples with {samples.shape = } and {sample_rate = }")
play_audio(samples, rate=sample_rate)

# %%
# We first instantiate an :class:`~torchcodec.encoders.AudioEncoder`. We pass it
# the samples to be encoded. The samples must be a 2D tensor of shape
Contributor: Nit: "The samples must be a 2D tensors of shape"

# ``(num_channels, num_samples)``, or in this case, a 1D tensor where
# ``num_channels`` is assumed to be 1. The values must be float values
# normalized in ``[-1, 1]``: this is also what the
# :class:`~torchcodec.decoders.AudioDecoder` would return.
#
# .. note::
#
# The ``sample_rate`` parameter corresponds to the sample rate of the
# *input*, not the desired encoded sample rate.
from torchcodec.encoders import AudioEncoder

encoder = AudioEncoder(samples=samples, sample_rate=sample_rate)


# %%
# :class:`~torchcodec.encoders.AudioEncoder` supports encoding samples into a
# file via the :meth:`~torchcodec.encoders.AudioEncoder.to_file` method, or to
# raw bytes via :meth:`~torchcodec.encoders.AudioEncoder.to_tensor`. For the
# purpose of this tutorial we'll use
# :meth:`~torchcodec.encoders.AudioEncoder.to_tensor`, so that we can easily
# re-decode the encoded samples and check their properties. The
# :meth:`~torchcodec.encoders.AudioEncoder.to_file` method works very similarly.

encoded_samples = encoder.to_tensor(format="mp3")
print(f"{encoded_samples.shape = }, {encoded_samples.dtype = }")


# %%
# That's it!
#
# Now that we have our encoded data, we can decode it back, to make sure it
# looks and sounds as expected:
from torchcodec.decoders import AudioDecoder

samples_back = AudioDecoder(encoded_samples).get_all_samples()

print(samples_back)
play_audio(samples_back.data, rate=samples_back.sample_rate)
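As an editorial aside, the encode-then-decode round-trip shown above can be mimicked with only the standard library, using the `wave` module and an in-memory buffer. This sketch is illustrative and independent of torchcodec:

```python
import io
import math
import wave

# One second of a 440 Hz sine at 16 kHz, as little-endian 16-bit PCM bytes.
sample_rate = 16000
pcm = b"".join(
    int(32767 * math.sin(2 * math.pi * 440 * n / sample_rate)).to_bytes(
        2, "little", signed=True
    )
    for n in range(sample_rate)
)

# Encode to an in-memory WAV "file"...
buf = io.BytesIO()
with wave.open(buf, "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)  # 2 bytes per sample = int16
    w.setframerate(sample_rate)
    w.writeframes(pcm)

# ...then decode it back and check that the stream properties survived.
buf.seek(0)
with wave.open(buf, "rb") as r:
    assert r.getframerate() == sample_rate
    assert r.getnchannels() == 1
    assert r.getnframes() == sample_rate
```

The torchcodec version above does the same thing conceptually, but through FFmpeg, so it handles compressed formats like MP3 rather than only raw WAV.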

# %%
# The encoder supports some encoding options that allow you to change how the
# data is encoded. For example, we can decide to encode our mono data (1
# channel) into stereo data (2 channels):
encoded_samples = encoder.to_tensor(format="wav", num_channels=2)

stereo_samples_back = AudioDecoder(encoded_samples).get_all_samples()

print(stereo_samples_back)
play_audio(stereo_samples_back.data, rate=stereo_samples_back.sample_rate)

# %%
# Check the docstring of the encoding methods to learn about the different
# encoding options.
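One more hedged aside on the [-1, 1] requirement mentioned earlier: if your source samples are int16 PCM rather than floats, they must be normalized before constructing the encoder. A minimal, torchcodec-independent sketch:

```python
def int16_to_float(pcm):
    # int16 spans [-32768, 32767]; dividing by 32768.0 maps into [-1.0, 1.0)
    return [s / 32768.0 for s in pcm]

print(int16_to_float([0, 16384, -32768]))  # → [0.0, 0.5, -1.0]
```

With torch tensors the same idea is a single division of the int16 tensor cast to float32.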
5 changes: 0 additions & 5 deletions src/torchcodec/_core/Encoder.h
@@ -9,11 +9,6 @@ class AudioEncoder {
public:
~AudioEncoder();

// TODO-ENCODING: document in public docs that bit_rate value is only
// best-effort, matching to the closest supported bit_rate. I.e. passing 1 is
// like passing 0, which results in choosing the minimum supported bit rate.
// Passing 44_100 could result in output being 44000 if only 44000 is
// supported.
AudioEncoder(
const torch::Tensor& samples,
// TODO-ENCODING: update this comment when we support an output sample
2 changes: 1 addition & 1 deletion src/torchcodec/_frame.py
@@ -60,7 +60,7 @@ class FrameBatch(Iterable):

The ``data`` tensor is typically 4D for sequences of frames (NHWC or NCHW),
or 5D for sequences of clips, as returned by the :ref:`samplers
<sphx_glr_generated_examples_sampling.py>`. When ``data`` is 4D (resp. 5D)
<sphx_glr_generated_examples_decoding_sampling.py>`. When ``data`` is 4D (resp. 5D)
the ``pts_seconds`` and ``duration_seconds`` tensors are 1D (resp. 2D).

.. note::
5 changes: 3 additions & 2 deletions src/torchcodec/decoders/_audio_decoder.py
@@ -26,15 +26,16 @@ class AudioDecoder:
Returned samples are float samples normalized in [-1, 1]

Args:
source (str, ``Pathlib.path``, bytes, ``torch.Tensor`` or file-like object): The source of the video:
source (str, ``Pathlib.path``, bytes, ``torch.Tensor`` or file-like
object): The source of the video or audio:

- If ``str``: a local path or a URL to a video or audio file.
- If ``Pathlib.path``: a path to a local video or audio file.
- If ``bytes`` object or ``torch.Tensor``: the raw encoded audio data.
- If file-like object: we read video data from the object on demand. The object must
expose the methods `read(self, size: int) -> bytes` and
`seek(self, offset: int, whence: int) -> bytes`. Read more in:
:ref:`sphx_glr_generated_examples_file_like.py`.
:ref:`sphx_glr_generated_examples_decoding_file_like.py`.
stream_index (int, optional): Specifies which stream in the file to decode samples from.
Note that this index is absolute across all media types. If left unspecified, then
the :term:`best stream` is used.
4 changes: 2 additions & 2 deletions src/torchcodec/decoders/_video_decoder.py
@@ -30,7 +30,7 @@ class VideoDecoder:
- If file-like object: we read video data from the object on demand. The object must
expose the methods `read(self, size: int) -> bytes` and
`seek(self, offset: int, whence: int) -> bytes`. Read more in:
:ref:`sphx_glr_generated_examples_file_like.py`.
:ref:`sphx_glr_generated_examples_decoding_file_like.py`.
stream_index (int, optional): Specifies which stream in the video to decode frames from.
Note that this index is absolute across all media types. If left unspecified, then
the :term:`best stream` is used.
@@ -59,7 +59,7 @@ class VideoDecoder:
accurate as it uses the file's metadata to calculate where it
probably is. Default: "exact".
Read more about this parameter in:
:ref:`sphx_glr_generated_examples_approximate_mode.py`
:ref:`sphx_glr_generated_examples_decoding_approximate_mode.py`


Attributes: