Torch Distributed Elastic
============================

Makes distributed PyTorch fault-tolerant and elastic.

Get Started
---------------

.. toctree::
   :maxdepth: 1
   :caption: Usage

   elastic/quickstart
   elastic/train_script
   elastic/examples

Documentation
---------------

.. toctree::
   :maxdepth: 1
   :caption: API

   elastic/run
   elastic/agent
   elastic/multiprocessing
   elastic/errors
   elastic/rendezvous
   elastic/timer
   elastic/metrics
   elastic/events
   elastic/subprocess_handler

.. toctree::
   :maxdepth: 1
   :caption: Advanced

   elastic/customization

.. toctree::
   :maxdepth: 1
   :caption: Plugins

   elastic/kubernetes