pytorch/torch/csrc/distributed
Chirag Pandya cccf500193 [c10d] remove sleep from watchdogHandler (#135760)
Summary:
Remove sleep from the `watchdogHandler` function. This sleep unnecessarily slows things down during an NCCL timeout.
Flight recorder is configured to take a minute, at most, to dump out its buffer.
As a result, this sleep ends up waiting for `8` minutes before destroy is called.

Test Plan: Unit tests.

Differential Revision: D62529875

Pull Request resolved: https://github.com/pytorch/pytorch/pull/135760
Approved by: https://github.com/fduwjj, https://github.com/shuqiangzhang
2024-09-18 00:55:01 +00:00
..
autograd [pytorch] Name threads in thread pools for better debugging (#130270) 2024-07-09 08:03:47 +00:00
c10d [c10d] remove sleep from watchdogHandler (#135760) 2024-09-18 00:55:01 +00:00
rpc Refactoring byte_order (#135558) 2024-09-11 21:06:43 +00:00