.. _pipeline-parallelism:

Pipeline Parallelism
====================

Pipeline parallelism was originally introduced in the
`Gpipe <https://arxiv.org/abs/1811.06965>`__ paper and is an efficient
technique to train large models on multiple GPUs.

.. warning::
  Pipeline Parallelism is experimental and subject to change.

Model Parallelism using multiple GPUs
-------------------------------------

Typically, for large models which don't fit on a single GPU, model parallelism
is employed where certain parts of the model are placed on different GPUs.
However, if this is done naively for sequential models, the training process
suffers from GPU underutilization since only one GPU is active at a time, as
shown in the figure below:

.. figure:: _static/img/pipeline_parallelism/no_pipe.png

The figure represents a model with 4 layers placed on 4 different GPUs
(vertical axis). The horizontal axis represents training this model through
time, demonstrating that only 1 GPU is utilized at a time
(`image source <https://arxiv.org/abs/1811.06965>`__).
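
For illustration, here is a minimal sketch of this naive approach for a
two-stage model, assuming a host with two GPUs (the module name, layer sizes,
and devices are illustrative, not part of the pipeline API). Each stage lives
on its own device and activations are moved between devices inside
``forward``, so only one GPU is active at any given time::

  import torch
  import torch.nn as nn

  class NaiveModelParallel(nn.Module):
      def __init__(self):
          super().__init__()
          # Each stage of the model is placed on a different GPU.
          self.stage0 = nn.Linear(1024, 1024).to('cuda:0')
          self.stage1 = nn.Linear(1024, 10).to('cuda:1')

      def forward(self, x):
          # cuda:1 must wait for cuda:0 to finish, so the GPUs run serially.
          x = torch.relu(self.stage0(x.to('cuda:0')))
          return self.stage1(x.to('cuda:1'))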

Pipelined Execution
-------------------

To alleviate this problem, pipeline parallelism splits the input minibatch into
multiple microbatches and pipelines the execution of these microbatches across
multiple GPUs. This is outlined in the figure below:

.. figure:: _static/img/pipeline_parallelism/pipe.png

The figure represents a model with 4 layers placed on 4 different GPUs
(vertical axis). The horizontal axis represents training this model through
time, demonstrating that the GPUs are utilized much more efficiently.
However, there still exists a bubble (as demonstrated in the figure) where
certain GPUs are not utilized
(`image source <https://arxiv.org/abs/1811.06965>`__).

Pipe APIs in PyTorch
--------------------

.. autoclass:: torch.distributed.pipeline.sync.Pipe
    :members: forward
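
As a minimal end-to-end sketch (assuming a single host with two GPUs; the
address, port, and layer sizes below are illustrative), a model can be split
into two stages and wrapped in ``Pipe``, which splits each minibatch into
microbatches and pipelines them across the devices::

  import os
  import torch
  import torch.nn as nn
  from torch.distributed import rpc
  from torch.distributed.pipeline.sync import Pipe

  # Pipe uses RRefs internally, so the RPC framework must be
  # initialized first.
  os.environ['MASTER_ADDR'] = 'localhost'
  os.environ['MASTER_PORT'] = '29500'
  rpc.init_rpc('worker', rank=0, world_size=1)

  # Place each stage of the model on a different GPU.
  fc1 = nn.Linear(16, 8).cuda(0)
  fc2 = nn.Linear(8, 4).cuda(1)

  # chunks=8 splits each minibatch into 8 microbatches that are
  # pipelined across the two GPUs.
  model = Pipe(nn.Sequential(fc1, fc2), chunks=8)

  input = torch.rand(16, 16).cuda(0)
  output_rref = model(input)  # forward() returns an RRef to the output
  output = output_rref.local_value()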

Skip connections
^^^^^^^^^^^^^^^^

Certain models like ResNeXt are not completely sequential and have skip
connections between layers. Naively implementing this as part of pipeline
parallelism would imply that we need to copy outputs for certain layers through
multiple GPUs until we eventually reach the GPU where the layer for the skip
connection resides. To avoid this copy overhead, we provide APIs below to stash
and pop Tensors in different layers of the model.

.. autofunction:: torch.distributed.pipeline.sync.skip.skippable.skippable
.. autoclass:: torch.distributed.pipeline.sync.skip.skippable.stash
.. autoclass:: torch.distributed.pipeline.sync.skip.skippable.pop
.. autofunction:: torch.distributed.pipeline.sync.skip.skippable.verify_skippables
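
As a minimal sketch of these APIs (the skip name ``'identity'`` and the toy
computations are illustrative), a module decorated with ``@skippable`` can
stash a tensor in one stage and pop it in a later one::

  import torch.nn as nn
  from torch.distributed.pipeline.sync.skip.skippable import (
      pop, skippable, stash)

  @skippable(stash=['identity'])
  class Layer1(nn.Module):
      def forward(self, input):
          # Stash the input so that a later stage can reuse it.
          yield stash('identity', input)
          return input + 1  # toy computation

  class Layer2(nn.Module):
      def forward(self, input):
          return input * 2  # toy computation

  @skippable(pop=['identity'])
  class Layer3(nn.Module):
      def forward(self, input):
          # Pop the tensor stashed by Layer1 and add it back in,
          # forming the skip connection.
          identity = yield pop('identity')
          return input + identity

  model = nn.Sequential(Layer1(), Layer2(), Layer3())

When such a model is wrapped in ``Pipe``, the stashed tensor is forwarded
directly to the stage that pops it instead of being copied through every
intermediate GPU.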

Acknowledgements
----------------

The implementation for pipeline parallelism is based on `fairscale's pipe implementation <https://github.com/facebookresearch/fairscale/tree/master/fairscale/nn/pipe>`__ and
`torchgpipe <https://github.com/kakaobrain/torchgpipe>`__. We would like to
thank both teams for their contributions and guidance towards bringing pipeline
parallelism into PyTorch.