pytorch/torch/distributed
Mingzhe Li 66f9b1de1b [NCCL] enable p2p tests (#47797)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47797

NCCL p2p tests used to hang because of unexpected CUDA context switches. For example, process 1, which is supposed to use only GPU1, could end up touching GPU0 because the device was never set explicitly.
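
A minimal sketch of the pattern the fix enforces (not the actual test code from this PR): each rank calls torch.cuda.set_device for its own GPU before any NCCL work, so no stray context is created on GPU0. The helper name run_p2p, the tensor shape, and the env-var rendezvous are illustrative assumptions.

```python
import os
import torch
import torch.distributed as dist
import torch.multiprocessing as mp

def run_p2p(rank, world_size):
    # Pin this process to its own GPU *before* any NCCL call; without this,
    # NCCL may lazily create a context on cuda:0 from every rank, which is
    # the kind of unexpected context switch that caused the hangs.
    torch.cuda.set_device(rank)

    # Illustrative single-node rendezvous via environment variables.
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")
    dist.init_process_group("nccl", rank=rank, world_size=world_size)

    tensor = torch.ones(8, device=f"cuda:{rank}")
    if rank == 0:
        dist.send(tensor, dst=1)   # point-to-point send to rank 1
    elif rank == 1:
        dist.recv(tensor, src=0)   # point-to-point recv from rank 0
    dist.destroy_process_group()

if __name__ == "__main__":
    # Two processes, one GPU each.
    mp.spawn(run_p2p, args=(2,), nprocs=2)
```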
ghstack-source-id: 116461969

Test Plan: waitforsandcastle

Reviewed By: jiayisuse

Differential Revision: D24863808

fbshipit-source-id: 92bd3a4874be8334210c7c8ee6363648893c963e
2020-11-12 10:44:50 -08:00
_pipeline Pull in fairscale.nn.Pipe into PyTorch. (#44090) 2020-10-22 10:59:02 -07:00
algorithms [Gradient Compression] Add unit tests that test default Python comm hook implementations (#47158) 2020-11-06 00:28:09 -08:00
autograd Add Python declaration of torch._C and torch._C._autograd modules. (#46622) 2020-11-06 01:25:47 -08:00
benchmarks Benchmark combining Distributed Data Parallel and Distributed RPC (#46993) 2020-11-04 18:53:19 -08:00
nn [RPC Framework] Support remote device format "<workername>/<device>" (#46773) 2020-10-29 00:14:56 -07:00
optim [dist_optim] serialize compilation when creating dist_optim (#45871) 2020-10-07 15:10:41 -07:00
rpc Add type annotations to torch._C._distributed_rpc module. (#46624) 2020-11-06 01:28:51 -08:00
__init__.py Add type annotations for torch._C._distributed_c10d module. (#46623) 2020-11-06 01:28:48 -08:00
constants.py Add NCCL_ASYNC_ERROR_HANDLING to docs (#46856) 2020-10-26 14:41:32 -07:00
CONTRIBUTING.md Move python-independent c10d implementations to torch/lib (#47309) 2020-11-03 23:39:54 -08:00
distributed_c10d.py [NCCL] enable p2p tests (#47797) 2020-11-12 10:44:50 -08:00
launch.py Add option to log subprocess output to files in DDP launcher. (#33193) 2020-10-23 11:22:57 -07:00
rendezvous.py Add type annotations for torch._C._distributed_c10d module. (#46623) 2020-11-06 01:28:48 -08:00