.. _pipeline-parallelism:

Pipeline Parallelism
====================

Pipeline parallelism was originally introduced in the
`GPipe <https://arxiv.org/abs/1811.06965>`__ paper and is an efficient
technique to train large models on multiple GPUs.

.. warning::
   Pipeline Parallelism is experimental and subject to change.

Model Parallelism using multiple GPUs
-------------------------------------

Typically, for large models which don't fit on a single GPU, model parallelism
is employed where certain parts of the model are placed on different GPUs.
However, if this is done naively for sequential models, the training process
suffers from GPU underutilization since only one GPU is active at a time, as
shown in the figure below:

.. figure:: _static/img/pipeline_parallelism/no_pipe.png

   The figure represents a model with 4 layers placed on 4 different GPUs
   (vertical axis). The horizontal axis represents training this model through
   time, demonstrating that only 1 GPU is utilized at a time
   (`image source <https://arxiv.org/abs/1811.06965>`__).
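
As a rough illustration of this naive setup (not part of any PyTorch API; the
layer sizes and device names are made up, and two GPUs are assumed to be
available), each stage of the model is pinned to its own GPU and activations
are moved between devices by hand::

    import torch
    import torch.nn as nn

    # Hypothetical two-stage model, each stage pinned to its own GPU.
    stage1 = nn.Sequential(nn.Linear(16, 32), nn.ReLU()).to('cuda:0')
    stage2 = nn.Sequential(nn.Linear(32, 4)).to('cuda:1')

    def run(x):
        # While one stage computes, the other GPU sits idle.
        h = stage1(x.to('cuda:0'))
        return stage2(h.to('cuda:1'))

    out = run(torch.randn(8, 16))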

Pipelined Execution
-------------------

To alleviate this problem, pipeline parallelism splits the input minibatch into
multiple microbatches and pipelines the execution of these microbatches across
multiple GPUs. This is outlined in the figure below:

.. figure:: _static/img/pipeline_parallelism/pipe.png

   The figure represents a model with 4 layers placed on 4 different GPUs
   (vertical axis). The horizontal axis represents training this model through
   time, demonstrating that the GPUs are utilized much more efficiently.
   However, there still exists a bubble (as demonstrated in the figure) where
   certain GPUs are not utilized
   (`image source <https://arxiv.org/abs/1811.06965>`__).
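
As a rough sketch of the splitting step (the sizes below are arbitrary), a
minibatch can be divided into microbatches with :func:`torch.chunk`; the
``Pipe`` API described below performs this split internally based on its
``chunks`` argument::

    import torch

    minibatch = torch.randn(32, 16)           # a minibatch of 32 samples
    microbatches = torch.chunk(minibatch, 8)  # 8 microbatches of 4 samples each
    assert len(microbatches) == 8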

Pipe APIs in PyTorch
--------------------

.. autoclass:: torch.distributed.pipeline.sync.Pipe
   :members: forward
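
As a minimal usage sketch (layer sizes, ports, and device ids are arbitrary;
two GPUs are assumed), note that the RPC framework must be initialized before
constructing a ``Pipe``, even for a single worker::

    import os
    import torch
    import torch.nn as nn
    from torch.distributed import rpc
    from torch.distributed.pipeline.sync import Pipe

    # Pipe requires the RPC framework to be initialized first.
    os.environ['MASTER_ADDR'] = 'localhost'
    os.environ['MASTER_PORT'] = '29500'
    rpc.init_rpc('worker', rank=0, world_size=1)

    # Two stages of a sequential model placed on two different GPUs.
    fc1 = nn.Linear(16, 8).cuda(0)
    fc2 = nn.Linear(8, 4).cuda(1)
    model = Pipe(nn.Sequential(fc1, fc2), chunks=8)

    # The input lives on the first device; the forward pass returns an RRef.
    output_rref = model(torch.rand(16, 16).cuda(0))
    output = output_rref.local_value()

    rpc.shutdown()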

Skip connections
^^^^^^^^^^^^^^^^

Certain models like `ResNeXt <https://pytorch.org/hub/pytorch_vision_resnext/>`__
are not completely sequential and have skip connections between layers.
Implementing such models naively as part of pipeline parallelism would imply that
we need to copy the outputs of certain layers through multiple GPUs until
we eventually reach the GPU where the layer consuming the skip connection resides.
To avoid this copy overhead, we provide the APIs below to stash and pop Tensors
in different layers of the model.

.. autofunction:: torch.distributed.pipeline.sync.skip.skippable.skippable

.. autoclass:: torch.distributed.pipeline.sync.skip.skippable.stash

.. autoclass:: torch.distributed.pipeline.sync.skip.skippable.pop

.. autofunction:: torch.distributed.pipeline.sync.skip.skippable.verify_skippables
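
As a rough sketch of how these pieces fit together (the module names and the
``'skip'`` key are made up for illustration), one layer stashes a Tensor that a
later layer, potentially on a different GPU, pops back without routing it
through every intermediate device::

    import torch.nn as nn
    from torch.distributed.pipeline.sync.skip.skippable import pop, skippable, stash

    @skippable(stash=['skip'])
    class StashLayer(nn.Module):
        def forward(self, input):
            yield stash('skip', input)   # save the tensor for a later layer
            return input

    @skippable(pop=['skip'])
    class PopLayer(nn.Module):
        def forward(self, input):
            skip = yield pop('skip')     # retrieve the stashed tensor
            return input + skip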

Tutorials
---------

The following tutorials give a good overview of how to use the
:class:`~torch.distributed.pipeline.sync.Pipe` API to train your models with the
rest of the components that PyTorch provides:

- `Training Transformer models using Pipeline Parallelism <https://pytorch.org/tutorials/intermediate/pipeline_tutorial.html>`__
- `Training Transformer models using Distributed Data Parallel and Pipeline Parallelism <https://pytorch.org/tutorials/advanced/ddp_pipeline.html>`__

Acknowledgements
----------------

The implementation for pipeline parallelism is based on `fairscale's pipe implementation <https://github.com/facebookresearch/fairscale/tree/main/fairscale/nn/pipe>`__ and
`torchgpipe <https://github.com/kakaobrain/torchgpipe>`__. We would like to
thank both teams for their contributions and guidance towards bringing pipeline
parallelism into PyTorch.