pytorch/test/distributed
pritam a81be44410 Fix shard_module to appropriately deal with sub process groups.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79264

The `shard_module` API didn't work correctly with a sub process group (sub-pg), because `dist.scatter` takes the global rank, not the group-local rank, as input for `src`.

Fix this by passing the appropriate global rank to `dist.scatter`.
Differential Revision: [D37062766](https://our.internmc.facebook.com/intern/diff/D37062766/)

Approved by: https://github.com/fduwjj, https://github.com/wanchaol
2022-06-12 03:50:45 +00:00
_shard Fix shard_module to appropriately deal with sub process groups. 2022-06-12 03:50:45 +00:00
algorithms [BE] move init_multigpu_helper to common_distributed (#67050) 2021-10-22 17:16:11 -07:00
bin Add test owner to distributed files starting with test_ (#66797) 2021-10-19 10:55:20 -07:00
elastic [ci] remove IN_CI env var 2022-06-11 17:16:30 +00:00
fsdp Checkpoint util 2022-06-10 18:37:36 +00:00
launcher [torchelastic][1/n] Fix caffe2.test.distributed.launcher.api_test flaky tests (#68624) 2021-11-19 15:23:30 -08:00
nn/jit Have test classes extend from common_utils.TestCase, not unittest.TestCase (#66900) 2021-10-19 16:54:05 -07:00
optim Convert DDP parameters to ReplicatedTensor during forward pass. 2022-04-18 03:27:23 +00:00
pipeline/sync Add all bzl files per D36874458 2022-06-06 09:40:19 -07:00
rpc [ci] remove IN_CI env var 2022-06-11 17:16:30 +00:00
argparse_util_test.py [skip ci] set more tests with owners for distributed and elastic (#67583) 2021-11-01 12:26:03 -07:00
defs.bzl Add all bzl files per D36874458 2022-06-06 09:40:19 -07:00
test_c10d_common.py Fix SyncBatchNorm for empty inputs (#74944) 2022-04-01 23:48:30 +00:00
test_c10d_gloo.py ROCm: unskip c10 gloo tests 2022-04-25 14:28:56 +00:00
test_c10d_nccl.py [DDP] Fix broadcast for channels-last tensors (#79060) 2022-06-08 21:52:58 +00:00
test_c10d_spawn_gloo.py [PyTorch][Distributed] Enable Reduce Scatter and modify all_to_all for sharded linear with more test cases. (#68786) 2021-12-06 13:38:58 -08:00
test_c10d_spawn_nccl.py Use _all_gather_base and fuse matmul for sharded linear. 2022-06-01 17:17:34 +00:00
test_c10d_spawn.py [PyTorch][Distributed] Enable Reduce Scatter and modify all_to_all for sharded linear with more test cases. (#68786) 2021-12-06 13:38:58 -08:00
test_data_parallel.py no longer coalesce sparse COO tensors before comparison (#69751) 2022-02-17 02:33:08 +00:00
test_distributed_spawn.py Add test owner to distributed files starting with test_ (#66797) 2021-10-19 10:55:20 -07:00
test_launcher.py Add test owner to distributed files starting with test_ (#66797) 2021-10-19 10:55:20 -07:00
test_nccl.py [NCCL] Patch bfloat16 support (#67843) 2021-11-09 13:46:13 -08:00
test_pg_wrapper.py Add test owner to distributed files starting with test_ (#66797) 2021-10-19 10:55:20 -07:00
test_store.py [Bootcamp] Set default value of TCPStore world_size to None in pybind definition (#77277) 2022-05-12 18:48:48 +00:00