pytorch/torch/distributed/launcher
SandishKumarHN b498299953 154849 Add support to handle SIGUSR1 and SIGUSR2 in multiprocessing (#160690)
Fixes #154849

This change addresses the request to add support for the SIGUSR1 and SIGUSR2 signals in torchrun for SLURM environments. The signals are supported through the configurable `TORCHELASTIC_SIGNALS_TO_HANDLE` environment variable and the `signals_to_handle` parameter of the launcher API.
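
As a rough illustration of how a configurable signal list like this can work, the sketch below reads `TORCHELASTIC_SIGNALS_TO_HANDLE` and registers one handler per named signal. It is standalone and does not use the torch code from this PR. The comma-separated format, the default list, and the helper names `parse_signals` and `install_handlers` are assumptions made for the example.

```python
import os
import signal

# Assumed default for illustration; check the torch source for the real value.
DEFAULT_SIGNALS = "SIGTERM,SIGINT,SIGHUP,SIGQUIT"


def parse_signals(spec):
    """Convert a comma-separated string of signal names into signal.Signals.

    Names are stripped and upper-cased; empty or unknown names are skipped.
    (Hypothetical helper, not part of torch.)
    """
    result = []
    for name in spec.split(","):
        name = name.strip().upper()
        if not name:
            continue
        sig = getattr(signal, name, None)
        # Filter out non-signal attributes such as signal.SIG_DFL.
        if isinstance(sig, signal.Signals):
            result.append(sig)
    return result


def install_handlers(handler, env=os.environ):
    """Register `handler` for every signal named in the environment variable.

    Must be called from the main thread, as required by signal.signal().
    (Hypothetical helper, not part of torch.)
    """
    spec = env.get("TORCHELASTIC_SIGNALS_TO_HANDLE", DEFAULT_SIGNALS)
    installed = parse_signals(spec)
    for sig in installed:
        signal.signal(sig, handler)
    return installed
```

Under these assumptions, a SLURM job that wants a warning signal forwarded to the workers would export something like `TORCHELASTIC_SIGNALS_TO_HANDLE=SIGTERM,SIGUSR1` before invoking torchrun. SIGUSR1 and SIGUSR2 exist only on POSIX platforms.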

Tests:
For validation purposes:
test_signal_handling.py
simple_test_api_signal_handling.py

Unit Tests:
Launcher changes: launcher/test_api.py
API changes: multiprocessing/test_api.py
E2E: test_run.py

Pull Request resolved: https://github.com/pytorch/pytorch/pull/160690
Approved by: https://github.com/fduwjj
2025-09-09 22:23:06 +00:00
__init__.py [BE][Easy] enable UFMT for torch/distributed/ (#128870) 2024-06-22 18:53:28 +00:00
api.py 154849 Add support to handle SIGUSR1 and SIGUSR2 in multiprocessing (#160690) 2025-09-09 22:23:06 +00:00