.. role:: hidden
    :class: hidden-section

PyTorch DTensor (Distributed Tensor)
======================================================

.. note::
    ``torch.distributed.tensor`` is currently in alpha state and under
    development. We are committing to backward compatibility for most of the APIs
    listed in the doc, but there might be API changes if necessary.

PyTorch DTensor offers simple and flexible tensor sharding primitives that transparently handle distributed
logic, including sharded storage, operator computation, and collective communications across devices/hosts.
``DTensor`` can be used to build different parallelism solutions and supports a sharded ``state_dict``
representation when working with multi-dimensional sharding.

Please see examples from the PyTorch native parallelism solutions that are built on top of ``DTensor``:

* `Tensor Parallel <https://pytorch.org/docs/main/distributed.tensor.parallel.html>`__
* `FSDP2 <https://github.com/pytorch/torchtitan/blob/main/docs/fsdp.md>`__

.. automodule:: torch.distributed.tensor

.. currentmodule:: torch.distributed.tensor

:class:`DTensor` follows the SPMD (single program, multiple data) programming model to empower users to
write a distributed program as if it were a single-device program with the same convergence property. It
provides a uniform tensor sharding layout (DTensor Layout) through specifying the :class:`DeviceMesh`
and :class:`Placement` (a brief sketch follows the list below):

- :class:`DeviceMesh` represents the device topology and the communicators of the cluster using
  an n-dimensional array.

- :class:`Placement` describes the sharding layout of the logical tensor on the :class:`DeviceMesh`.
  DTensor supports three types of placements: :class:`Shard`, :class:`Replicate` and :class:`Partial`.

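Here is a minimal sketch of these two building blocks. It assumes a 4-GPU job launched with
``torchrun``; the world size and device type are illustrative assumptions, not requirements:

.. code-block:: python

    from torch.distributed.device_mesh import init_device_mesh
    from torch.distributed.tensor import Replicate, Shard

    # a 1-D DeviceMesh over 4 ranks; communicators are created per mesh dimension
    mesh = init_device_mesh("cuda", (4,))

    # Placements describe how the logical tensor maps onto each mesh dimension:
    # Shard(0) splits tensor dim 0 across the mesh, Replicate() keeps a full copy per rank
    row_wise = [Shard(0)]
    replicated = [Replicate()]
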
There are three ways to construct a :class:`DTensor` (the first two are sketched right after this list):

* :meth:`distribute_tensor` creates a :class:`DTensor` from a logical or "global" ``torch.Tensor`` on
  each rank. This could be used to shard the leaf ``torch.Tensor`` s (i.e. model parameters/buffers
  and inputs).
* :meth:`DTensor.from_local` creates a :class:`DTensor` from a local ``torch.Tensor`` on each rank, which can
  be used to create a :class:`DTensor` from non-leaf ``torch.Tensor`` s (i.e. intermediate activation
  tensors during forward/backward).
* DTensor provides dedicated tensor factory methods (e.g. :meth:`empty`, :meth:`ones`, :meth:`randn`, etc.)
  to allow different :class:`DTensor` creations by directly specifying the :class:`DeviceMesh` and
  :class:`Placement`.

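The following sketch shows the first two construction paths, reusing the hypothetical 4-rank mesh
from above; the tensor shapes are illustrative assumptions:

.. code-block:: python

    import torch
    from torch.distributed.device_mesh import init_device_mesh
    from torch.distributed.tensor import DTensor, Shard, distribute_tensor

    mesh = init_device_mesh("cuda", (4,))

    # 1) shard a "global" leaf tensor (e.g. a parameter) row-wise across the mesh;
    #    each rank keeps only its (2, 16) local shard of the (8, 16) logical tensor
    global_weight = torch.randn(8, 16)
    dist_weight = distribute_tensor(global_weight, mesh, [Shard(0)])

    # 2) wrap an already-sharded per-rank local tensor (e.g. an activation) into a DTensor
    local_activation = torch.randn(2, 16, device="cuda")
    dist_activation = DTensor.from_local(local_activation, mesh, [Shard(0)])
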
.. autoclass:: DTensor
    :members:
    :member-order: bysource

.. autofunction:: distribute_tensor

Along with :meth:`distribute_tensor`, DTensor also offers a :meth:`distribute_module` API to allow easier
sharding on the :class:`nn.Module` level.

.. autofunction:: distribute_module

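Below is a minimal sketch of :meth:`distribute_module` usage, again assuming a hypothetical 4-rank
mesh; the ``shard_params`` partition function is an illustrative assumption, not a built-in helper:

.. code-block:: python

    import torch.nn as nn
    from torch.distributed.device_mesh import init_device_mesh
    from torch.distributed.tensor import Shard, distribute_module, distribute_tensor

    mesh = init_device_mesh("cuda", (4,))

    def shard_params(mod_name, mod, mesh):
        # partition_fn is called for each submodule; shard its parameters row-wise
        for name, param in mod.named_parameters(recurse=False):
            dist_param = nn.Parameter(distribute_tensor(param, mesh, [Shard(0)]))
            mod.register_parameter(name, dist_param)

    sharded_module = distribute_module(nn.Linear(16, 16), mesh, partition_fn=shard_params)
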
DTensor supports the following types of :class:`Placement` on each :class:`DeviceMesh` dimension
(an example follows the class listings below):

.. autoclass:: Shard
    :members:
    :undoc-members:

.. autoclass:: Replicate
    :members:
    :undoc-members:

.. autoclass:: Partial
    :members:
    :undoc-members:

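On a multi-dimensional mesh, one placement is specified per mesh dimension. The sketch below uses an
assumed 2 x 4 mesh and illustrative dimension names; note that :class:`Partial` usually shows up as the
result of an operator (e.g. a pending reduction) rather than being specified when distributing a tensor:

.. code-block:: python

    import torch
    from torch.distributed.device_mesh import init_device_mesh
    from torch.distributed.tensor import Replicate, Shard, distribute_tensor

    # 2 hosts x 4 GPUs, with named mesh dimensions for readability
    mesh_2d = init_device_mesh("cuda", (2, 4), mesh_dim_names=("dp", "tp"))

    # replicate across the "dp" dimension, shard tensor dim 1 across the "tp" dimension
    dtensor = distribute_tensor(torch.randn(8, 16), mesh_2d, [Replicate(), Shard(1)])
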
DTensor provides dedicated tensor factory functions to allow creating :class:`DTensor` directly
using ``torch.Tensor``-like factory function APIs (i.e. ``torch.ones``, ``torch.empty``, etc.), by
additionally specifying the :class:`DeviceMesh` and :class:`Placement` for the :class:`DTensor`
created (a short sketch follows the function listings below):

.. autofunction:: zeros

.. autofunction:: ones

.. autofunction:: empty

.. autofunction:: full

.. autofunction:: rand

.. autofunction:: randn

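A short sketch of the factory functions, again assuming a hypothetical 4-rank mesh; shapes are
illustrative:

.. code-block:: python

    from torch.distributed.device_mesh import init_device_mesh
    from torch.distributed.tensor import Replicate, Shard, ones, zeros

    mesh = init_device_mesh("cuda", (4,))

    # each rank only materializes its local (2, 16) shard of the logical (8, 16) tensor
    sharded_ones = ones(8, 16, device_mesh=mesh, placements=[Shard(0)])

    # a replicated DTensor holds the full (8, 16) tensor on every rank
    replicated_zeros = zeros(8, 16, device_mesh=mesh, placements=[Replicate()])
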
.. modules that are missing docs, add the doc later when necessary
.. py:module:: torch.distributed.tensor.api
.. py:module:: torch.distributed.tensor.device_mesh
.. py:module:: torch.distributed.tensor.random
.. py:module:: torch.distributed.tensor.placement_types
.. py:module:: torch.distributed.tensor.experimental
.. py:module:: torch.distributed.tensor.experimental.attention
.. py:module:: torch.distributed.tensor.experimental.func_map
.. py:module:: torch.distributed.tensor.experimental.register_sharding
.. py:module:: torch.distributed.tensor.experimental.tp_transform
.. py:module:: torch.distributed.tensor.debug
.. py:module:: torch.distributed.tensor.debug.comm_mode
.. py:module:: torch.distributed.tensor.debug.visualize_sharding