mirror of
https://github.com/zebrajr/pytorch.git
synced 2025-12-07 12:21:27 +01:00
Implements #148689 Pull Request resolved: https://github.com/pytorch/pytorch/pull/149334 Approved by: https://github.com/d4l3k Co-authored-by: Paul de Supinski <pdesupinski@gmail.com>
# Torch Distributed Elastic

Makes distributed PyTorch fault-tolerant and elastic.
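
As a rough illustration of what "elastic" means for a training script: workers started by `torchrun` receive their identity through environment variables, and those values can change when the worker group is restarted. The sketch below reads a few of them. The `WorkerEnv` class and `read_worker_env` helper are hypothetical names introduced only for this example and are not part of the library; the defaults are an assumption so the script also runs outside a launcher.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class WorkerEnv:
    """Hypothetical container for the launcher-provided worker identity."""

    rank: int           # global rank of this worker
    local_rank: int     # rank of this worker on its node
    world_size: int     # total number of workers in the job
    restart_count: int  # number of worker group restarts so far


def read_worker_env(env: Optional[Mapping[str, str]] = None) -> WorkerEnv:
    """Parse the worker identity from environment variables.

    Falls back to single-process defaults (an assumption made for this
    sketch) when a variable is not set.
    """
    env = os.environ if env is None else env
    return WorkerEnv(
        rank=int(env.get("RANK", "0")),
        local_rank=int(env.get("LOCAL_RANK", "0")),
        world_size=int(env.get("WORLD_SIZE", "1")),
        restart_count=int(env.get("TORCHELASTIC_RESTART_COUNT", "0")),
    )


if __name__ == "__main__":
    print(read_worker_env())
```

A script like this could be launched with something along the lines of `torchrun --nnodes=1:4 --nproc-per-node=2 train.py`, where `train.py` is a placeholder name. Because ranks and world size may differ after a restart, a script should read them at startup rather than hard-code them.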

## Get Started

```{toctree}
:caption: Usage
:maxdepth: 1

elastic/quickstart
elastic/train_script
elastic/examples
```

## Documentation

```{toctree}
:caption: API
:maxdepth: 1

elastic/run
elastic/agent
elastic/multiprocessing
elastic/errors
elastic/rendezvous
elastic/timer
elastic/metrics
elastic/events
elastic/subprocess_handler
elastic/control_plane
elastic/numa
```

```{toctree}
:caption: Advanced
:maxdepth: 1

elastic/customization
```

```{toctree}
:caption: Plugins
:maxdepth: 1

elastic/kubernetes
```