pytorch/torch/distributed/algorithms
Yi Wang 979180cd01 [Model Averaging] Allow subgroup to be None in PostLocalSGDState (#63277)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63277

`PostLocalSGDState` currently requires a subgroup, and creating that subgroup requires an initialized global process group. As a result, the hook state can only be constructed after the distributed environment has been initialized. This is incompatible with the Lightning DDP plugin setup, where the hook state must be provided before the distributed environment is initialized. Allowing `subgroup` to be `None` removes this restriction.
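
The deferred-initialization idea described above can be illustrated with a minimal, library-free sketch. It is not the PyTorch implementation: `LazyHookState`, `resolve_subgroup`, and `fake_make_subgroup` are hypothetical names used only for illustration, and the string returned by the factory stands in for a real process group.

```python
from typing import Callable, Optional


class LazyHookState:
    """Hypothetical stand-in for a hook state whose subgroup may start as None."""

    def __init__(self, subgroup: Optional[object] = None, start_iter: int = 100):
        self.subgroup = subgroup
        self.start_iter = start_iter


def resolve_subgroup(state: LazyHookState, make_subgroup: Callable[[], object]) -> object:
    """Create the subgroup on first use, then reuse the cached one."""
    if state.subgroup is None:
        # Assumption: by the time the hook first runs, the distributed
        # environment is ready, so the factory can safely be called here.
        state.subgroup = make_subgroup()
    return state.subgroup


# The state can be constructed before any distributed setup has happened.
state = LazyHookState(subgroup=None)

calls = []


def fake_make_subgroup() -> str:
    """Hypothetical factory; records each call and returns a placeholder group."""
    calls.append(1)
    return "intra-node-group"


first = resolve_subgroup(state, fake_make_subgroup)
second = resolve_subgroup(state, fake_make_subgroup)
```

In this sketch the factory runs only once, on the first call; later calls reuse the cached subgroup, so the state object itself never needs the distributed environment at construction time.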

Proposal: https://github.com/pytorch/pytorch/issues/59699
ghstack-source-id: 135848575

Test Plan: buck test mode/dev-nosan caffe2/test/distributed:distributed_nccl_fork -- test_ddp_hook_parity_post_localSGD

Reviewed By: cbalioglu

Differential Revision: D30325041

fbshipit-source-id: 7b870166d096d306c3f2f7c69816a705cec0bebd
2021-08-16 10:07:41 -07:00
ddp_comm_hooks [Model Averaging] Allow subgroup to be None in PostLocalSGDState (#63277) 2021-08-16 10:07:41 -07:00
model_averaging [Model Averaging] Fix docstring of PeriodicModelAverager (#62392) 2021-07-29 17:26:27 -07:00
__init__.py Make _Join, _Joinable, _JoinHook public (#62605) 2021-08-03 12:20:11 -07:00
join.py Make _Join, _Joinable, _JoinHook public (#62605) 2021-08-03 12:20:11 -07:00
quantization.py Adding collective quantization API (#62142) 2021-08-09 08:11:22 -07:00