pytorch/torch/testing/_internal/distributed
wz337 5b39734a0a [DTensor][Test] Fix gloo backend failure when eager_init is turned on (#139097)
We should only pass the `device_id` when the backend is `nccl`. Otherwise, we would run into the following error:
```
RuntimeError: No backend for the parent process group or its backend does not support splitting
```
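
The rule above can be sketched as a small helper that builds the keyword arguments for `torch.distributed.init_process_group`. This is a minimal illustration, not the code from the PR: the name `build_init_kwargs` and its parameters are hypothetical, and the device is kept as a plain string so the snippet runs without PyTorch installed.

```python
from typing import Any, Dict, Optional


def build_init_kwargs(
    backend: str,
    world_size: int,
    rank: int,
    eager_init: bool,
    local_device_index: Optional[int] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: assemble kwargs for init_process_group.

    `device_id` is included only for an NCCL backend with eager init
    requested; for gloo (or any other backend) it is left out, which
    avoids the "does not support splitting" error quoted above.
    """
    kwargs: Dict[str, Any] = {
        "backend": backend,
        "world_size": world_size,
        "rank": rank,
    }
    if eager_init and "nccl" in backend and local_device_index is not None:
        # Assumption: in real use this string would be wrapped as
        # torch.device(...) before being passed as device_id.
        kwargs["device_id"] = f"cuda:{local_device_index}"
    return kwargs


# gloo never receives device_id, even when eager_init is requested.
gloo_kwargs = build_init_kwargs("gloo", world_size=4, rank=0, eager_init=True, local_device_index=0)
# nccl receives device_id only when eager_init is on.
nccl_kwargs = build_init_kwargs("nccl", world_size=4, rank=1, eager_init=True, local_device_index=1)
```

In a test harness the resulting dictionary would be unpacked into the real call, for example `dist.init_process_group(**kwargs)`, after converting `device_id` to a `torch.device`.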

This also fixes test failures not being asserted when using `with_comms()` or `with_comms(eager_init=False)`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/139097
Approved by: https://github.com/XilunWu
2024-10-29 00:04:06 +00:00
_shard Remove mypy ignore from torch/testing/_internal/distributed/ (#131870) 2024-07-28 17:13:53 +00:00
_tensor [DTensor][Test] Fix gloo backend failure when eager_init is turned on (#139097) 2024-10-29 00:04:06 +00:00
nn Remove unused Python variables in torch/[b-z]* (#136963) 2024-10-19 16:45:22 +00:00
rpc Remove unused Python variables in torch/[b-z]* (#136963) 2024-10-19 16:45:22 +00:00
__init__.py
checkpoint_utils.py Remove mypy ignore from torch/testing/_internal/distributed/ (#131870) 2024-07-28 17:13:53 +00:00
common_state_dict.py [DCP] Fixes the stateless optimizer issue of distributed state_dict (#135535) 2024-09-10 03:10:00 +00:00
ddp_under_dist_autograd_test.py Fix ROCm skip decorator for test_ddp_tp and multiprocess UTs (#136161) 2024-09-18 11:01:23 +00:00
distributed_test.py Upgrade distributed test to g4dn instances (T4 GPUs) (#137161) 2024-10-20 23:48:54 +00:00
distributed_utils.py Remove mypy ignore from torch/testing/_internal/distributed/ (#131870) 2024-07-28 17:13:53 +00:00
fake_pg.py [BE][Easy] enable ruff rule PIE790: unnecessary pass statement (#133200) 2024-08-15 15:50:19 +00:00
multi_threaded_pg.py [c10d] Fix the device query story of ProcessGroup (#136790) 2024-10-03 01:36:22 +00:00
rpc_utils.py Remove mypy ignore from torch/testing/_internal/distributed/ (#131870) 2024-07-28 17:13:53 +00:00