pytorch/torch/distributed/algorithms
Yi Wang daff3a81a1 [Gradient Compression] PowerSGD comm hook (#48060)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48060

Implement a PowerSGD variant that applies to a batched flattened tensor with zero padding.

This version does not require handling 1D tensors and multi-dimensional tensors in the input separately, and hence it does not need to create two asynchronous future chains.
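
As a rough illustration of the idea (not the actual hook code under ddp_comm_hooks), the sketch below zero-pads a single flattened gradient into a square matrix, runs one step of rank-`rank` power iteration, and reconstructs the approximation; the function name and shapes are hypothetical, and in the real hook the low-rank factors P and Q would be allreduced instead of the full gradient:

```python
import math
import torch

def powersgd_compress_decompress(flat_grad: torch.Tensor, rank: int = 1) -> torch.Tensor:
    """Rank-`rank` PowerSGD-style approximation of a flattened gradient (illustrative sketch)."""
    numel = flat_grad.numel()
    side = math.ceil(math.sqrt(numel))
    # Zero-pad so the flat tensor can be viewed as a square matrix.
    padded = torch.zeros(side * side, dtype=flat_grad.dtype, device=flat_grad.device)
    padded[:numel] = flat_grad
    matrix = padded.view(side, side)

    # One step of subspace power iteration.
    q = torch.randn(side, rank, dtype=matrix.dtype, device=matrix.device)
    p = matrix @ q                 # (side, rank) -- would be allreduced in the hook
    p, _ = torch.linalg.qr(p)      # orthonormalize P
    q = matrix.t() @ p             # (side, rank) -- would be allreduced in the hook
    approx = p @ q.t()             # low-rank reconstruction

    return approx.view(-1)[:numel]  # drop the zero padding
```

Communicating only P and Q shrinks the per-bucket volume from roughly side*side elements to 2*side*rank, which is the point of the compression.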

Potential optimizations:
1) Consider FP16 compression throughout PowerSGD.
2) Warm start and save one matrix multiplication per iteration (see the sketch below).
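
A hedged sketch of how those two optimizations could be wired around the compression step above; `_cached_qs` and the per-bucket key are illustrative state, not the hook's actual implementation:

```python
import torch

# Illustrative per-bucket cache for warm start, keyed by bucket index.
_cached_qs: dict = {}

def compress_factors(matrix: torch.Tensor, bucket_index: int, rank: int = 1):
    """Compute P/Q factors with warm start and FP16 casting (sketch)."""
    side = matrix.shape[0]
    # Warm start: reuse the previous iteration's Q instead of a fresh random matrix.
    q = _cached_qs.get(bucket_index)
    if q is None:
        q = torch.randn(side, rank, dtype=matrix.dtype, device=matrix.device)
    p = matrix @ q
    p, _ = torch.linalg.qr(p)
    q = matrix.t() @ p
    _cached_qs[bucket_index] = q   # keep the full-precision Q for the next iteration

    # FP16 compression: halve the communication volume of the factors.
    return p.half(), q.half()
```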

Original PR issue: Investigate Applying PowerSGD to Communication Hook for Gradient Compression #47202
ghstack-source-id: 117105938

Test Plan: buck test mode/dev-nosan caffe2/test/distributed:c10d -- test_default_ddp_comm_hooks_nccl

Reviewed By: jiayisuse

Differential Revision: D24843692

fbshipit-source-id: f44200b1fd6e12e829fc543d21ab7ae086769561
2020-11-19 02:59:11 -08:00
ddp_comm_hooks [Gradient Compression] PowerSGD comm hook (#48060) 2020-11-19 02:59:11 -08:00
__init__.py [Gradient Compression] Add unit tests that test default Python comm hook implementations (#47158) 2020-11-06 00:28:09 -08:00