pytorch/docs/source/elastic
raghavhrishi 7ef3c3357d NUMA binding integration with elastic agent and torchrun (#149334)
Implements #148689

Pull Request resolved: https://github.com/pytorch/pytorch/pull/149334
Approved by: https://github.com/d4l3k

Co-authored-by: Paul de Supinski <pdesupinski@gmail.com>
2025-07-25 21:19:49 +00:00
agent_diagram.jpg
agent.rst Fix some incorrect reST markups in the document (#154831) 2025-06-07 19:09:46 +00:00
control_plane.rst Reapply "distributed debug handlers (#126601)" (#127805) 2024-06-04 19:44:30 +00:00
customization.rst
errors.rst
etcd_rdzv_diagram.png
events.rst DOC: add docstring to construct_and_record_rdzv_event() (#128189) 2024-06-10 22:17:33 +00:00
examples.rst
kubernetes.rst
metrics.rst
multiprocessing.rst
numa.rst NUMA binding integration with elastic agent and torchrun (#149334) 2025-07-25 21:19:49 +00:00
quickstart.rst
rendezvous.rst [TorchElastic] Option for sharing TCPStore created by rdzv handlers (#125743) 2024-05-22 18:24:11 +00:00
run.rst
subprocess_handler.rst [Torch Elastic][Draft] Refactor SubprocessHandler to separate module for easier subclass (#120373) 2024-03-08 01:37:34 +00:00
timer.rst [Torch][Timer] Adding debug info logging interface for expired timers (#123883) 2024-04-25 01:15:52 +00:00
train_script.rst