pytorch/torch/distributed/algorithms
Yi Wang 8b61fbdac9 Resubmit: [Gradient Compression] Implement the original layerwise PowerSGD (#49639)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49639

Resubmit #49417 with a fix for distributed_test.

The previous submission broke a multi-GPU test that runs on 4 GPUs. Since this test only runs on master, the failure could not be detected before submission.

The real diff is:
4ca1014bb5

This time I have verified that the previously failing test `pytorch_linux_xenial_cuda10_2_cudnn7_py3_multigpu_test` passes after creating a PR (#49651) from a separate branch:
https://app.circleci.com/pipelines/github/pytorch/pytorch/253644/workflows/c1c02b70-0877-40e6-8b4c-61f60f6b70ed/jobs/9768079

ghstack-source-id: 118969912

Test Plan: buck test mode/dev-nosan caffe2/test/distributed:distributed_nccl_fork -- test_DistributedDataParallel_powerSGD_ddp_comm_hook
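
For context, a minimal sketch of how the PowerSGD comm hook exercised by this test might be registered on a DDP model; the module path, PowerSGDState arguments, and the MyModel/rank placeholders are assumptions for illustration, not taken from this diff:

    # Hypothetical usage sketch; MyModel and rank are placeholders.
    import torch.distributed as dist
    from torch.nn.parallel import DistributedDataParallel as DDP
    from torch.distributed.algorithms.ddp_comm_hooks import powerSGD_hook as powerSGD

    model = DDP(MyModel().cuda(rank), device_ids=[rank])
    # Low-rank approximation state shared across buckets; rank 1 keeps compression cheap.
    state = powerSGD.PowerSGDState(process_group=None, matrix_approximation_rank=1)
    # Gradients are then compressed/decompressed by the hook during allreduce.
    model.register_comm_hook(state, powerSGD.powerSGD_hook)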

Reviewed By: mrshenli

Differential Revision: D25654961

fbshipit-source-id: 2a45c8ceb9bdb54ff7309a8b66ec87e913e0150e
2020-12-20 13:02:52 -08:00
ddp_comm_hooks Resubmit: [Gradient Compression] Implement the original layerwise PowerSGD (#49639) 2020-12-20 13:02:52 -08:00
__init__.py [Gradient Compression] Add unit tests that test default Python comm hook implementations (#47158) 2020-11-06 00:28:09 -08:00