Commit 7945e6a

Add Docs for AudioEncoder (#717)
1 parent ffb65f6 commit 7945e6a

21 files changed

+243
-48
lines changed

README.md

Lines changed: 8 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -3,17 +3,18 @@
33
# TorchCodec
44

55
TorchCodec is a Python library for decoding video and audio data into PyTorch
6-
tensors, on CPU and CUDA GPU. It aims to be fast, easy to use, and well
7-
integrated into the PyTorch ecosystem. If you want to use PyTorch to train ML
8-
models on videos and audio, TorchCodec is how you turn these into data.
6+
tensors, on CPU and CUDA GPU. It also supports audio encoding, and video
7+
encoding will come soon! It aims to be fast, easy to use, and well integrated
8+
into the PyTorch ecosystem. If you want to use PyTorch to train ML models on
9+
videos and audio, TorchCodec is how you turn these into data.
910

1011
We achieve these capabilities through:
1112

1213
* Pythonic APIs that mirror Python and PyTorch conventions.
13-
* Relying on [FFmpeg](https://www.ffmpeg.org/) to do the decoding. TorchCodec
14-
uses the version of FFmpeg you already have installed. FFmpeg is a mature
15-
library with broad coverage available on most systems. It is, however, not
16-
easy to use. TorchCodec abstracts FFmpeg's complexity to ensure it is used
14+
* Relying on [FFmpeg](https://www.ffmpeg.org/) to do the decoding and encoding.
15+
TorchCodec uses the version of FFmpeg you already have installed. FFmpeg is a
16+
mature library with broad coverage available on most systems. It is, however,
17+
not easy to use. TorchCodec abstracts FFmpeg's complexity to ensure it is used
1718
correctly and efficiently.
1819
* Returning data as PyTorch tensors, ready to be fed into PyTorch transforms
1920
or used directly to train models.

docs/source/api_ref_decoders.rst

Lines changed: 2 additions & 2 deletions
@@ -7,8 +7,8 @@ torchcodec.decoders
77
.. currentmodule:: torchcodec.decoders
88

99

10-
For a video decoder tutorial, see: :ref:`sphx_glr_generated_examples_basic_example.py`.
11-
For an audio decoder tutorial, see: :ref:`sphx_glr_generated_examples_audio_decoding.py`.
10+
For a video decoder tutorial, see: :ref:`sphx_glr_generated_examples_decoding_basic_example.py`.
11+
For an audio decoder tutorial, see: :ref:`sphx_glr_generated_examples_decoding_audio_decoding.py`.
1212

1313

1414
.. autosummary::

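The renamed references in this commit follow from moving the examples into `decoding/` and `encoding/` subfolders: the label of each example appears to be built from its output path. Below is a minimal sketch of that naming pattern, assuming the gallery output directory is `generated_examples` (inferred from the labels in this diff). `gallery_label` is a hypothetical helper written for illustration; it is not part of Sphinx-Gallery or TorchCodec.

```python
from pathlib import PurePosixPath


def gallery_label(example_path: str, gallery_dir: str = "generated_examples") -> str:
    """Sketch of the label pattern seen in this diff (hypothetical helper)."""
    # Join the gallery output directory with the example's path relative to
    # examples/, then flatten the path separators into underscores.
    full = PurePosixPath(gallery_dir) / example_path
    return "sphx_glr_" + "_".join(full.parts)


# Old layout: examples/basic_example.py
print(gallery_label("basic_example.py"))
# New layout: examples/decoding/basic_example.py
print(gallery_label("decoding/basic_example.py"))
```

Under this pattern, any `:ref:` that targeted an example before the move needs the extra `decoding_` or `encoding_` segment, which is what the one-line changes across the `.rst` files and docstrings do.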
docs/source/api_ref_encoders.rst

Lines changed: 18 additions & 0 deletions
@@ -0,0 +1,18 @@
1+
.. _encoders:
2+
3+
===================
4+
torchcodec.encoders
5+
===================
6+
7+
.. currentmodule:: torchcodec.encoders
8+
9+
10+
For an audio encoder tutorial, see: :ref:`sphx_glr_generated_examples_encoding_audio_encoding.py`.
11+
12+
13+
.. autosummary::
14+
:toctree: generated/
15+
:nosignatures:
16+
:template: class.rst
17+
18+
AudioEncoder

docs/source/api_ref_samplers.rst

Lines changed: 1 addition & 1 deletion
@@ -6,7 +6,7 @@ torchcodec.samplers
66

77
.. currentmodule:: torchcodec.samplers
88

9-
For a tutorial, see: :ref:`sphx_glr_generated_examples_sampling.py`.
9+
For a tutorial, see: :ref:`sphx_glr_generated_examples_decoding_sampling.py`.
1010

1111
.. autosummary::
1212
:toctree: generated/

docs/source/conf.py

Lines changed: 19 additions & 10 deletions
@@ -68,18 +68,27 @@ class CustomGalleryExampleSortKey:
6868
    def __init__(self, src_dir):
6969
        self.src_dir = src_dir
7070

71-
    order = [
72-
        "basic_example.py",
73-
        "audio_decoding.py",
74-
        "basic_cuda_example.py",
75-
        "file_like.py",
76-
        "approximate_mode.py",
77-
        "sampling.py",
78-
    ]
79-
8071
    def __call__(self, filename):
72+
        # We have two top-level galleries, one for decoding examples and one for
73+
        # encoding examples. We define the example order within each gallery
74+
        # individually.
75+
        if "examples/decoding" in self.src_dir:
76+
            order = [
77+
                "basic_example.py",
78+
                "audio_decoding.py",
79+
                "basic_cuda_example.py",
80+
                "file_like.py",
81+
                "approximate_mode.py",
82+
                "sampling.py",
83+
            ]
84+
        else:
85+
            assert "examples/encoding" in self.src_dir
86+
            order = [
87+
                "audio_encoding.py",
88+
            ]
89+
8190
        try:
82-
            return self.order.index(filename)
91+
            return order.index(filename)
8392
        except ValueError as e:
8493
            raise ValueError(
8594
                "Looks like you added an example in the examples/ folder?"

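To see how a per-gallery sort key like this behaves, here is a small self-contained sketch. It is a variation on the commit's approach, not a copy of it: the `if`/`else` branches are replaced by a dictionary lookup, and the class name `GallerySortKey`, the `ORDERS` table, and the error messages are hypothetical names chosen for illustration.

```python
class GallerySortKey:
    """Hypothetical stand-in for the sort key class in conf.py."""

    # Example order per gallery, keyed by a fragment of the gallery source directory.
    ORDERS = {
        "examples/decoding": [
            "basic_example.py",
            "audio_decoding.py",
            "basic_cuda_example.py",
            "file_like.py",
            "approximate_mode.py",
            "sampling.py",
        ],
        "examples/encoding": [
            "audio_encoding.py",
        ],
    }

    def __init__(self, src_dir: str) -> None:
        self.src_dir = src_dir

    def __call__(self, filename: str) -> int:
        # Pick the order list whose directory fragment matches this gallery.
        for fragment, order in self.ORDERS.items():
            if fragment in self.src_dir:
                try:
                    return order.index(filename)
                except ValueError as e:
                    raise ValueError(
                        f"{filename} is not listed in the order for {fragment}"
                    ) from e
        raise ValueError(f"Unknown gallery directory: {self.src_dir}")


key = GallerySortKey("/repo/examples/decoding")
files = ["sampling.py", "basic_example.py", "file_like.py"]
print(sorted(files, key=key))
```

The key returns each file's position in the list for its gallery, so sorting by it reproduces the hand-written order. A file missing from the list raises an error instead of being silently placed, which matches the intent of the `ValueError` in the diff.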
docs/source/index.rst

Lines changed: 36 additions & 13 deletions
@@ -2,21 +2,25 @@ Welcome to the TorchCodec documentation!
22
========================================
33

44
TorchCodec is a Python library for decoding video and audio data into PyTorch
5-
tensors, on CPU and CUDA GPU. It aims to be fast, easy to use, and well
6-
integrated into the PyTorch ecosystem. If you want to use PyTorch to train ML
7-
models on videos and audio, TorchCodec is how you turn these into data.
5+
tensors, on CPU and CUDA GPU. It also supports audio encoding, and video encoding will come soon!
6+
It aims to be fast, easy to use, and well integrated into the PyTorch ecosystem.
7+
If you want to use PyTorch to train ML models on videos and audio, TorchCodec is
8+
how you turn these into data.
89

910
We achieve these capabilities through:
1011

1112
* Pythonic APIs that mirror Python and PyTorch conventions.
12-
* Relying on `FFmpeg <https://www.ffmpeg.org/>`_ to do the decoding. TorchCodec
13-
uses the version of FFmpeg you already have installed. FFmpeg is a mature
14-
library with broad coverage available on most systems. It is, however, not
15-
easy to use. TorchCodec abstracts FFmpeg's complexity to ensure it is used
16-
correctly and efficiently.
13+
* Relying on `FFmpeg <https://www.ffmpeg.org/>`_ to do the decoding / encoding.
14+
TorchCodec uses the version of FFmpeg you already have installed. FFmpeg is a
15+
mature library with broad coverage available on most systems. It is, however,
16+
not easy to use. TorchCodec abstracts FFmpeg's complexity to ensure it is
17+
used correctly and efficiently.
1718
* Returning data as PyTorch tensors, ready to be fed into PyTorch transforms
1819
or used directly to train models.
1920

21+
Installation instructions
22+
^^^^^^^^^^^^^^^^^^^^^^^^^
23+
2024
.. grid:: 3
2125

2226
.. grid-item-card:: :octicon:`file-code;1em`
@@ -27,46 +31,64 @@ We achieve these capabilities through:
2731

2832
How to install TorchCodec
2933

34+
Decoding
35+
^^^^^^^^
36+
37+
.. grid:: 3
38+
3039
.. grid-item-card:: :octicon:`file-code;1em`
3140
Getting Started with TorchCodec
3241
:img-top: _static/img/card-background.svg
33-
:link: generated_examples/basic_example.html
42+
:link: generated_examples/decoding/basic_example.html
3443
:link-type: url
3544

3645
A simple video decoding example
3746

3847
.. grid-item-card:: :octicon:`file-code;1em`
3948
Audio Decoding
4049
:img-top: _static/img/card-background.svg
41-
:link: generated_examples/audio_decoding.html
50+
:link: generated_examples/decoding/audio_decoding.html
4251
:link-type: url
4352

4453
A simple audio decoding example
4554

4655
.. grid-item-card:: :octicon:`file-code;1em`
4756
GPU decoding
4857
:img-top: _static/img/card-background.svg
49-
:link: generated_examples/basic_cuda_example.html
58+
:link: generated_examples/decoding/basic_cuda_example.html
5059
:link-type: url
5160

5261
A simple example demonstrating CUDA GPU decoding
5362

5463
.. grid-item-card:: :octicon:`file-code;1em`
5564
Streaming video
5665
:img-top: _static/img/card-background.svg
57-
:link: generated_examples/file_like.html
66+
:link: generated_examples/decoding/file_like.html
5867
:link-type: url
5968

6069
How to efficiently decode videos from the cloud
6170

6271
.. grid-item-card:: :octicon:`file-code;1em`
6372
Clip sampling
6473
:img-top: _static/img/card-background.svg
65-
:link: generated_examples/sampling.html
74+
:link: generated_examples/decoding/sampling.html
6675
:link-type: url
6776

6877
How to sample regular and random clips from a video
6978

79+
Encoding
80+
^^^^^^^^
81+
82+
.. grid:: 3
83+
84+
.. grid-item-card:: :octicon:`file-code;1em`
85+
Audio Encoding
86+
:img-top: _static/img/card-background.svg
87+
:link: generated_examples/encoding/audio_encoding.html
88+
:link-type: url
89+
90+
How to encode audio samples
91+
7092
.. toctree::
7193
:maxdepth: 1
7294
:caption: TorchCodec documentation
@@ -92,4 +114,5 @@ We achieve these capabilities through:
92114

93115
api_ref_torchcodec
94116
api_ref_decoders
117+
api_ref_encoders
95118
api_ref_samplers

examples/decoding/README.rst

Lines changed: 2 additions & 0 deletions
@@ -0,0 +1,2 @@
1+
Decoding
2+
--------
File renamed without changes.
File renamed without changes.
File renamed without changes.

examples/file_like.py renamed to examples/decoding/file_like.py

Lines changed: 1 addition & 1 deletion
@@ -96,7 +96,7 @@ def bench(f, average_over=10, warmup=2):
9696
# the :class:`~torchcodec.decoders.VideoDecoder` class to ``"approximate"``. We do
9797
# this to avoid scanning the entire video during initialization, which would
9898
# require downloading the entire video even if we only want to decode the first
99-
# frame. See :ref:`sphx_glr_generated_examples_approximate_mode.py` for more.
99+
# frame. See :ref:`sphx_glr_generated_examples_decoding_approximate_mode.py` for more.
100100

101101

102102
def decode_from_existing_download():

examples/sampling.py renamed to examples/decoding/sampling.py

Lines changed: 1 addition & 1 deletion
@@ -61,7 +61,7 @@ def plot(frames: torch.Tensor, title : Optional[str] = None):
6161
# Sampling clips from a video always starts by creating a
6262
# :class:`~torchcodec.decoders.VideoDecoder` object. If you're not already
6363
# familiar with :class:`~torchcodec.decoders.VideoDecoder`, take a quick look
64-
# at: :ref:`sphx_glr_generated_examples_basic_example.py`.
64+
# at: :ref:`sphx_glr_generated_examples_decoding_basic_example.py`.
6565
from torchcodec.decoders import VideoDecoder
6666

6767
# You can also pass a path to a local file!

examples/encoding/README.rst

Lines changed: 2 additions & 0 deletions
@@ -0,0 +1,2 @@
1+
Encoding
2+
--------

examples/encoding/audio_encoding.py

Lines changed: 91 additions & 0 deletions
@@ -0,0 +1,91 @@
1+
# Copyright (c) Meta Platforms, Inc. and affiliates.
2+
# All rights reserved.
3+
#
4+
# This source code is licensed under the BSD-style license found in the
5+
# LICENSE file in the root directory of this source tree.
6+
7+
"""
8+
========================================
9+
Encoding audio samples with AudioEncoder
10+
========================================
11+
12+
In this example, we'll learn how to encode audio samples to a file or to raw
13+
bytes using the :class:`~torchcodec.encoders.AudioEncoder` class.
14+
"""
15+
16+
# %%
17+
# Let's first generate some samples to be encoded. The data to be encoded could
18+
# also just come from an :class:`~torchcodec.decoders.AudioDecoder`!
19+
import torch
20+
from IPython.display import Audio as play_audio
21+
22+
23+
def make_sinewave() -> tuple[torch.Tensor, int]:
24+
    freq_A = 440  # Hz
25+
    sample_rate = 16000  # Hz
26+
    duration_seconds = 3  # seconds
27+
    t = torch.linspace(0, duration_seconds, int(sample_rate * duration_seconds), dtype=torch.float32)
28+
    return torch.sin(2 * torch.pi * freq_A * t), sample_rate
29+
30+
31+
samples, sample_rate = make_sinewave()
32+
33+
print(f"Encoding samples with {samples.shape = } and {sample_rate = }")
34+
play_audio(samples, rate=sample_rate)
35+
36+
# %%
37+
# We first instantiate an :class:`~torchcodec.encoders.AudioEncoder`. We pass it
38+
# the samples to be encoded. The samples must be a 2D tensor of shape
39+
# ``(num_channels, num_samples)``, or in this case, a 1D tensor where
40+
# ``num_channels`` is assumed to be 1. The values must be float values
41+
# normalized in ``[-1, 1]``: this is also what the
42+
# :class:`~torchcodec.decoders.AudioDecoder` would return.
43+
#
44+
# .. note::
45+
#
46+
# The ``sample_rate`` parameter corresponds to the sample rate of the
47+
# *input*, not the desired encoded sample rate.
48+
from torchcodec.encoders import AudioEncoder
49+
50+
encoder = AudioEncoder(samples=samples, sample_rate=sample_rate)
51+
52+
53+
# %%
54+
# :class:`~torchcodec.encoders.AudioEncoder` supports encoding samples into a
55+
# file via the :meth:`~torchcodec.encoders.AudioEncoder.to_file` method, or to
56+
# raw bytes via :meth:`~torchcodec.encoders.AudioEncoder.to_tensor`. For the
57+
# purpose of this tutorial we'll use
58+
# :meth:`~torchcodec.encoders.AudioEncoder.to_tensor`, so that we can easily
59+
# re-decode the encoded samples and check their properties. The
60+
# :meth:`~torchcodec.encoders.AudioEncoder.to_file` method works very similarly.
61+
62+
encoded_samples = encoder.to_tensor(format="mp3")
63+
print(f"{encoded_samples.shape = }, {encoded_samples.dtype = }")
64+
65+
66+
# %%
67+
# That's it!
68+
#
69+
# Now that we have our encoded data, we can decode it back, to make sure it
70+
# looks and sounds as expected:
71+
from torchcodec.decoders import AudioDecoder
72+
73+
samples_back = AudioDecoder(encoded_samples).get_all_samples()
74+
75+
print(samples_back)
76+
play_audio(samples_back.data, rate=samples_back.sample_rate)
77+
78+
# %%
79+
# The encoder supports some encoding options that allow you to change how the
80+
# data is encoded. For example, we can decide to encode our mono data (1
81+
# channel) into stereo data (2 channels):
82+
encoded_samples = encoder.to_tensor(format="wav", num_channels=2)
83+
84+
stereo_samples_back = AudioDecoder(encoded_samples).get_all_samples()
85+
86+
print(stereo_samples_back)
87+
play_audio(stereo_samples_back.data, rate=stereo_samples_back.sample_rate)
88+
89+
# %%
90+
# Check the docstring of the encoding methods to learn about the different
91+
# encoding options.

src/torchcodec/_core/Encoder.h

Lines changed: 0 additions & 5 deletions
@@ -9,11 +9,6 @@ class AudioEncoder {
99
public:
1010
~AudioEncoder();
1111

12-
// TODO-ENCODING: document in public docs that bit_rate value is only
13-
// best-effort, matching to the closest supported bit_rate. I.e. passing 1 is
14-
// like passing 0, which results in choosing the minimum supported bit rate.
15-
// Passing 44_100 could result in output being 44000 if only 44000 is
16-
// supported.
1712
AudioEncoder(
1813
const torch::Tensor& samples,
1914
// TODO-ENCODING: update this comment when we support an output sample

src/torchcodec/_frame.py

Lines changed: 1 addition & 1 deletion
@@ -60,7 +60,7 @@ class FrameBatch(Iterable):
6060
6161
The ``data`` tensor is typically 4D for sequences of frames (NHWC or NCHW),
6262
or 5D for sequences of clips, as returned by the :ref:`samplers
63-
<sphx_glr_generated_examples_sampling.py>`. When ``data`` is 4D (resp. 5D)
63+
<sphx_glr_generated_examples_decoding_sampling.py>`. When ``data`` is 4D (resp. 5D)
6464
the ``pts_seconds`` and ``duration_seconds`` tensors are 1D (resp. 2D).
6565
6666
.. note::

src/torchcodec/decoders/_audio_decoder.py

Lines changed: 3 additions & 2 deletions
@@ -26,15 +26,16 @@ class AudioDecoder:
2626
Returned samples are float samples normalized in [-1, 1]
2727
2828
Args:
29-
source (str, ``Pathlib.path``, bytes, ``torch.Tensor`` or file-like object): The source of the video:
29+
source (str, ``pathlib.Path``, bytes, ``torch.Tensor`` or file-like
30+
object): The source of the video or audio:
3031
3132
- If ``str``: a local path or a URL to a video or audio file.
3233
- If ``pathlib.Path``: a path to a local video or audio file.
3334
- If ``bytes`` object or ``torch.Tensor``: the raw encoded audio data.
3435
- If file-like object: we read audio data from the object on demand. The object must
3536
expose the methods `read(self, size: int) -> bytes` and
3637
`seek(self, offset: int, whence: int) -> int`. Read more in:
37-
:ref:`sphx_glr_generated_examples_file_like.py`.
38+
:ref:`sphx_glr_generated_examples_decoding_file_like.py`.
3839
stream_index (int, optional): Specifies which stream in the file to decode samples from.
3940
Note that this index is absolute across all media types. If left unspecified, then
4041
the :term:`best stream` is used.

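The file-like requirement in the docstring above comes down to two methods, `read` and `seek`. Here is a minimal sketch of an object that satisfies it, using only the standard library. `CountingReader` is a hypothetical class for illustration, and `seek` returns the new position as an integer, following the convention of Python's `io` objects. The sketch does not call TorchCodec; it only exercises the two methods directly.

```python
import io


class CountingReader:
    """Hypothetical file-like wrapper exposing only read() and seek()."""

    def __init__(self, raw: bytes) -> None:
        self._buffer = io.BytesIO(raw)
        self.num_reads = 0  # how many times read() was called

    def read(self, size: int) -> bytes:
        # Return up to `size` bytes from the current position.
        self.num_reads += 1
        return self._buffer.read(size)

    def seek(self, offset: int, whence: int) -> int:
        # Move the position and return the new absolute offset.
        return self._buffer.seek(offset, whence)


reader = CountingReader(b"0123456789")
first = reader.read(4)                   # first four bytes
position = reader.seek(-2, io.SEEK_END)  # two bytes before the end
tail = reader.read(4)                    # only two bytes remain
print(first, position, tail, reader.num_reads)
```

A wrapper like this is handy for checking how much of a source is actually read on demand, for example by counting reads or bytes when the underlying data comes from a remote store.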
src/torchcodec/decoders/_video_decoder.py

Lines changed: 2 additions & 2 deletions
@@ -30,7 +30,7 @@ class VideoDecoder:
3030
- If file-like object: we read video data from the object on demand. The object must
3131
expose the methods `read(self, size: int) -> bytes` and
3232
`seek(self, offset: int, whence: int) -> int`. Read more in:
33-
:ref:`sphx_glr_generated_examples_file_like.py`.
33+
:ref:`sphx_glr_generated_examples_decoding_file_like.py`.
3434
stream_index (int, optional): Specifies which stream in the video to decode frames from.
3535
Note that this index is absolute across all media types. If left unspecified, then
3636
the :term:`best stream` is used.
@@ -59,7 +59,7 @@ class VideoDecoder:
5959
accurate as it uses the file's metadata to calculate where i
6060
probably is. Default: "exact".
6161
Read more about this parameter in:
62-
:ref:`sphx_glr_generated_examples_approximate_mode.py`
62+
:ref:`sphx_glr_generated_examples_decoding_approximate_mode.py`
6363
6464
6565
Attributes:
