pytorch/docs/source/elastic
Cheng Ni 9bff1599b6 [Torch Elastic][Draft] Refactor SubprocessHandler to separate module for easier subclass (#120373)
Summary:
## No Functional Change
- Refactor `SubprocessHandler` into a separate folder for easier subclassing
- SubprocessHandler
    - added `local_rank_id` in `SubprocessHandler` to make it available as a field in the class
    - pass `local_rank_id` in when the subprocess is started

Test Plan: No functional changes.

Differential Revision: D54038627

#suppress-api-compatibility-check

Pull Request resolved: https://github.com/pytorch/pytorch/pull/120373
Approved by: https://github.com/kurman
2024-03-08 01:37:34 +00:00
agent_diagram.jpg
agent.rst Add watchdog to TorchElastic agent and trainers (#84081) 2022-09-07 00:17:20 +00:00
customization.rst
errors.rst [torch] Various improvements to torch.distributed.launch and torch.distributed.run (#61294) 2021-07-08 16:28:06 -07:00
etcd_rdzv_diagram.png
events.rst
examples.rst
kubernetes.rst Fix typo under docs directory (#92762) 2023-01-23 18:07:22 +00:00
metrics.rst
multiprocessing.rst [TorchElastic] Refactoring to support non-default logging strategy (#120691) 2024-02-29 20:59:17 +00:00
quickstart.rst [BE] Prefer dash over underscore in command-line options (#94505) 2023-02-09 20:16:49 +00:00
rendezvous.rst [TorchElastic] Support for overprovisioning in C10 based rendezvous (#117066) 2024-01-18 01:16:55 +00:00
run.rst Introduce the torchrun entrypoint (#64049) 2021-08-26 20:17:48 -07:00
subprocess_handler.rst [Torch Elastic][Draft] Refactor SubprocessHandler to separate module for easier subclass (#120373) 2024-03-08 01:37:34 +00:00
timer.rst Named pipe based watchdog timer (#83695) 2022-08-24 22:16:12 +00:00
train_script.rst [BE] Prefer dash over underscore in command-line options (#94505) 2023-02-09 20:16:49 +00:00