
Commit 9de0c64

Authored by brianjo, mthrok and cloudhan

1.10.0-RC-Test (pytorch#1695)

* Update build.sh
* Update audio tutorial (pytorch#1713)
  * Update audio tutorial
  * fix
* Update pipeline_tutorial.py (pytorch#1715)
  Reduce value to save memory.
* Update tutorial of memory view (pytorch#1672)
  * update assets of memory view tutorial
  * update memory view tutorial
* Update tensorboard_profiler_tutorial.py
  Touch to kick build.

Co-authored-by: Brian Johnson <[email protected]>
Co-authored-by: moto <[email protected]>
Co-authored-by: cloudhan <[email protected]>

1 parent d7b5f5e commit 9de0c64

File tree

7 files changed: +47 -16 lines

.jenkins/build.sh (2 additions, 0 deletions)

@@ -29,6 +29,8 @@ pip install -r $DIR/../requirements.txt
 # RC Link
 # pip uninstall -y torch torchvision torchaudio torchtext
 # pip install --pre --upgrade -f https://download.pytorch.org/whl/test/cu102/torch_test.html torch torchvision torchaudio torchtext
+pip uninstall -y torch torchvision torchaudio torchtext
+pip install -f https://download.pytorch.org/whl/test/cu111/torch_test.html torch torchvision torchaudio torchtext
 
 # For Tensorboard. Until 1.14 moves to the release channel.
 pip install tb-nightly
(binary image file, 19.5 KB)

(binary image file, 55.3 KB)

_static/img/profiler_memory_view.png

(binary image file, 60.7 KB)

beginner_source/audio_preprocessing_tutorial.py (3 additions, 3 deletions)

@@ -1537,13 +1537,13 @@ def benchmark_resample(
 
 rate = 1.2
 spec_ = strech(spec, rate)
-plot_spectrogram(F.complex_norm(spec_[0]), title=f"Stretched x{rate}", aspect='equal', xmax=304)
+plot_spectrogram(spec_[0].abs(), title=f"Stretched x{rate}", aspect='equal', xmax=304)
 
-plot_spectrogram(F.complex_norm(spec[0]), title="Original", aspect='equal', xmax=304)
+plot_spectrogram(spec[0].abs(), title="Original", aspect='equal', xmax=304)
 
 rate = 0.9
 spec_ = strech(spec, rate)
-plot_spectrogram(F.complex_norm(spec_[0]), title=f"Stretched x{rate}", aspect='equal', xmax=304)
+plot_spectrogram(spec_[0].abs(), title=f"Stretched x{rate}", aspect='equal', xmax=304)
 
 
 ######################################################################
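The change above swaps the deprecated `torchaudio.functional.complex_norm` for the native complex-tensor `Tensor.abs()`. Both compute the complex magnitude sqrt(re**2 + im**2); a plain-Python sketch of the equivalence (illustrative only, no torch dependency — `complex_norm` here is a stand-in for the torchaudio function):

```python
# complex_norm with power=1.0 is the complex magnitude: sqrt(re**2 + im**2).
# Tensor.abs() on a complex tensor computes the same quantity elementwise,
# just like Python's built-in abs() on a complex number.
def complex_norm(re, im, power=1.0):
    return (re * re + im * im) ** (0.5 * power)

z = complex(3.0, 4.0)
print(complex_norm(z.real, z.imag))  # 5.0
print(abs(z))                        # 5.0, the same magnitude
```

This is why the diff can drop the `F.complex_norm(...)` wrapper entirely once the spectrogram is a native complex tensor.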

intermediate_source/pipeline_tutorial.py (1 addition, 1 deletion)

@@ -205,7 +205,7 @@ def batchify(data, bsz):
 # ``N`` is along dimension 1.
 #
 
-bptt = 35
+bptt = 25
 def get_batch(source, i):
     seq_len = min(bptt, len(source) - 1 - i)
     data = source[i:i+seq_len]
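Reducing ``bptt`` from 35 to 25 shrinks each training chunk, and with it the per-step activation memory, which is the commit's stated goal. A minimal sketch of the ``get_batch`` logic from the surrounding context, with a plain list standing in for the dataset tensor:

```python
bptt = 25  # sequence-chunk length; smaller values use less memory per step

def get_batch(source, i):
    # Take at most bptt elements starting at i, leaving room
    # for the target sequence shifted by one position.
    seq_len = min(bptt, len(source) - 1 - i)
    data = source[i:i + seq_len]
    target = source[i + 1:i + 1 + seq_len]
    return data, target

source = list(range(100))
data, target = get_batch(source, 0)
print(len(data), target[0])  # 25 1
```

Near the end of the data the chunk is truncated: `get_batch(source, 90)` returns only 9 elements, since `min(bptt, len(source) - 1 - 90)` is 9.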

intermediate_source/tensorboard_profiler_tutorial.py (41 additions, 12 deletions)
@@ -6,7 +6,7 @@
 
 Introduction
 ------------
-PyTorch 1.8 includes an updated profiler API capable of 
+PyTorch 1.8 includes an updated profiler API capable of
 recording the CPU side operations as well as the CUDA kernel launches on the GPU side.
 The profiler can visualize this information
 in TensorBoard Plugin and provide analysis of the performance bottlenecks.
@@ -113,7 +113,8 @@ def train(data):
 # After profiling, result files will be saved into the ``./log/resnet18`` directory.
 # Specify this directory as a ``logdir`` parameter to analyze profile in TensorBoard.
 # - ``record_shapes`` - whether to record shapes of the operator inputs.
-# - ``profile_memory`` - Track tensor memory allocation/deallocation.
+# - ``profile_memory`` - Track tensor memory allocation/deallocation. Note that on PyTorch versions
+#   earlier than 1.10, profiling with this option enabled can take a long time; disable it or upgrade.
 # - ``with_stack`` - Record source information (file and line number) for the ops.
 # If the TensorBoard is launched in VSCode (`reference <https://code.visualstudio.com/docs/datascience/pytorch-support#_tensorboard-integration>`_),
 # clicking a stack frame will navigate to the specific code line.
@@ -122,6 +123,7 @@ def train(data):
     schedule=torch.profiler.schedule(wait=1, warmup=1, active=3, repeat=2),
     on_trace_ready=torch.profiler.tensorboard_trace_handler('./log/resnet18'),
     record_shapes=True,
+    profile_memory=True,
     with_stack=True
 ) as prof:
     for step, batch_data in enumerate(train_loader):
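The newly added ``profile_memory=True`` records allocator events within the steps the existing schedule marks as active. For orientation, here is a plain-Python sketch of how a ``schedule(wait=1, warmup=1, active=3, repeat=2)`` cycle assigns steps to phases; this mimics the documented behavior and is not the actual ``torch.profiler`` implementation:

```python
def phase(step, wait=1, warmup=1, active=3, repeat=2):
    """Which profiling phase a given step falls into (illustrative sketch)."""
    cycle = wait + warmup + active
    if repeat and step >= cycle * repeat:
        return "skip"          # after `repeat` full cycles, nothing more is recorded
    pos = step % cycle
    if pos < wait:
        return "skip"          # idle steps, profiler off
    if pos < wait + warmup:
        return "warmup"        # profiler on, results discarded
    return "record"            # steps that end up in the trace

print([phase(s) for s in range(12)])
# ['skip', 'warmup', 'record', 'record', 'record',
#  'skip', 'warmup', 'record', 'record', 'record', 'skip', 'skip']
```

With these arguments, each 5-step cycle traces 3 steps, and only the first two cycles produce trace files.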
@@ -287,28 +289,54 @@ def train(data):
 # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 #
 # - Memory view
-# To profile memory, please add ``profile_memory=True`` in arguments of ``torch.profiler.profile``.
+# To profile memory, ``profile_memory`` must be set to ``True`` in the arguments of ``torch.profiler.profile``.
 #
-# Note: Because of the current non-optimized implementation of PyTorch profiler,
-# enabling ``profile_memory=True`` will take about several minutes to finish.
-# To save time, you can try our existing examples first by running:
+# You can try it with an existing example on Azure:
 #
 # ::
 #
-#     tensorboard --logdir=https://torchtbprofiler.blob.core.windows.net/torchtbprofiler/demo/memory_demo
+#     pip install azure-storage-blob
+#     tensorboard --logdir=https://torchtbprofiler.blob.core.windows.net/torchtbprofiler/demo/memory_demo_1_10
 #
-# The profiler records all memory allocation/release events during profiling.
-# For every specific operator, the plugin aggregates all these memory events inside its life span.
+# The profiler records all memory allocation/release events and the allocator's internal state during profiling.
+# The memory view consists of three components, as shown below.
 #
 # .. image:: ../../_static/img/profiler_memory_view.png
 #    :scale: 25 %
 #
+# The components are the memory curve graph, the memory events table and the memory statistics table, from top to bottom.
+#
 # The memory type could be selected in "Device" selection box.
-# For example, "GPU0" means the following table only shows each operator's memory usage on GPU 0, not including CPU or other GPUs.
+# For example, "GPU0" means the following table only shows each operator's memory usage on GPU 0, not including CPU or other GPUs.
+#
+# The memory curve shows the trend of memory consumption. The "Allocated" curve shows the total memory that is actually
+# in use, e.g., by tensors. In PyTorch, a caching mechanism is employed in the CUDA allocator and some other allocators. The
+# "Reserved" curve shows the total memory that is reserved by the allocator. You can left-click and drag on the graph
+# to select events in the desired range:
+#
+# .. image:: ../../_static/img/profiler_memory_curve_selecting.png
+#    :scale: 25 %
+#
+# After selection, the three components are updated for the restricted time range, so that you can gain more
+# information about it. By repeating this process, you can zoom into very fine-grained detail. Right-clicking on the graph
+# resets it to the initial state.
+#
+# .. image:: ../../_static/img/profiler_memory_curve_single.png
+#    :scale: 25 %
 #
-# The "Size Increase" sums up all allocation bytes and minus all the memory release bytes.
+# In the memory events table, the allocation and release events are paired into one entry. The "operator" column shows
+# the immediate ATen operator causing the allocation. Notice that in PyTorch, ATen operators commonly use
+# ``aten::empty`` to allocate memory. For example, ``aten::ones`` is implemented as ``aten::empty`` followed by an
+# ``aten::fill_``. Displaying the operator name solely as ``aten::empty`` would be of little help, so in this special
+# case it is shown as ``aten::ones (aten::empty)``. The "Allocation Time", "Release Time" and "Duration"
+# columns' data might be missing if the event occurs outside of the time range.
 #
-# The "Allocation Size" sums up all allocation bytes without considering the memory release.
+# In the memory statistics table, the "Size Increase" column sums up all allocation sizes and subtracts all memory
+# release sizes, that is, the net increase of memory usage after this operator. The "Self Size Increase" column is
+# similar to "Size Increase", but it does not count children operators' allocations. As an implementation detail,
+# some ATen operators call other operators, so memory allocations can happen at any level of the call stack.
+# That is, "Self Size Increase" only counts the memory usage increase at the current level of the call stack.
+# Finally, the "Allocation Size" column sums up all allocations without considering the memory release.
 #
 # - Distributed view
 # The plugin now supports distributed view on profiling DDP with NCCL/GLOO as backend.
@@ -317,6 +345,7 @@ def train(data):
 #
 # ::
 #
+#     pip install azure-storage-blob
 #     tensorboard --logdir=https://torchtbprofiler.blob.core.windows.net/torchtbprofiler/demo/distributed_bert
 #
 # .. image:: ../../_static/img/profiler_distributed_view.png
