Commit 5d3d8cf

ami-GS and holly1238 authored
There are only four .to(device) calls in code. (pytorch#892)
Co-authored-by: holly1238 <[email protected]>
1 parent 580903e commit 5d3d8cf

File tree

1 file changed, +1 -1 lines changed


intermediate_source/model_parallel_tutorial.py

Lines changed: 1 addition & 1 deletion
@@ -62,7 +62,7 @@ def forward(self, x):
 
 ######################################################################
 # Note that, the above ``ToyModel`` looks very similar to how one would
-# implement it on a single GPU, except the five ``to(device)`` calls which
+# implement it on a single GPU, except the four ``to(device)`` calls which
 # place linear layers and tensors on proper devices. That is the only place in
 # the model that requires changes. The ``backward()`` and ``torch.optim`` will
 # automatically take care of gradients as if the model is on one GPU. You only
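For context, the sentence being corrected refers to the tutorial's ``ToyModel``. Below is a minimal sketch of such a two-GPU model; the layer sizes, device names (``cuda:0``, ``cuda:1``), and the short training snippet are illustrative assumptions rather than code from this commit, but they show where the four ``to(device)`` calls land: two placing the linear layers in ``__init__`` and two moving tensors in ``forward``.

import torch
import torch.nn as nn
import torch.optim as optim


class ToyModel(nn.Module):
    """Two-GPU model-parallel sketch; devices and sizes are assumed."""

    def __init__(self):
        super().__init__()
        self.net1 = nn.Linear(10, 10).to('cuda:0')  # call 1: place first linear layer on GPU 0
        self.relu = nn.ReLU()
        self.net2 = nn.Linear(10, 5).to('cuda:1')   # call 2: place second linear layer on GPU 1

    def forward(self, x):
        x = self.relu(self.net1(x.to('cuda:0')))    # call 3: move the input to GPU 0
        return self.net2(x.to('cuda:1'))            # call 4: move the intermediate to GPU 1


# backward() and torch.optim need no model-parallel-specific changes;
# only the labels must live on the same device as the model output.
model = ToyModel()
loss_fn = nn.MSELoss()
optimizer = optim.SGD(model.parameters(), lr=0.001)

optimizer.zero_grad()
outputs = model(torch.randn(20, 10))
labels = torch.randn(20, 5).to('cuda:1')
loss_fn(outputs, labels).backward()
optimizer.step()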

0 commit comments
