Yi Wang 2b398d0537 [Reland][Gradient Compression] Apply division first to avoid overflow (#59576)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59576

If the gradients are large before the allreduce, the sum computed by the allreduce may overflow, especially in FP16. Therefore, divide the gradients by the world size before the allreduce rather than after it.

This fix is applied to both C++ and Python comm hooks.
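For illustration, a minimal Python sketch of the divide-first pattern, modeled on the default FP16 compression comm hook; the hook name is hypothetical, not the exact code landed by this diff, and it assumes the `GradBucket.buffer()` accessor and `Work.get_future()` API available in recent PyTorch releases:

```python
import torch
import torch.distributed as dist

def fp16_divide_first_hook(process_group, bucket):
    # Hypothetical sketch of a divide-first FP16 compression hook;
    # not the exact implementation landed by this diff.
    group_to_use = process_group if process_group is not None else dist.group.WORLD
    world_size = group_to_use.size()

    # Divide BEFORE the allreduce: each rank contributes grad / world_size,
    # so the reduced result is the average gradient rather than the raw sum,
    # which keeps the FP16 value in range.
    compressed = bucket.buffer().to(torch.float16).div_(world_size)

    fut = dist.all_reduce(compressed, group=group_to_use, async_op=True).get_future()

    def decompress(fut):
        # Copy the reduced FP16 tensor back into the bucket's
        # full-precision buffer and hand it back to DDP.
        buf = bucket.buffer()
        buf.copy_(fut.value()[0])
        return buf

    return fut.then(decompress)
```

A hook like this would be installed with `ddp_model.register_comm_hook(state=None, hook=fp16_divide_first_hook)`.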
ghstack-source-id: 130754510

Test Plan:
buck test mode/dev-nosan caffe2/test/distributed:c10d -- test_ddp_comm_hook_allreduce_hook_nccl
buck test mode/dev-nosan caffe2/test/distributed:c10d -- test_default_ddp_comm_hooks_nccl
buck test mode/dev-nosan caffe2/test/distributed:c10d -- test_fp16_compress_wrapper_nccl
buck test mode/dev-nosan caffe2/test/distributed:c10d -- test_builtin_ddp_comm_hooks_nccl
buck test mode/dev-nosan caffe2/test/distributed:c10d -- test_powerSGD_ddp_comm_hook_nccl
buck test mode/dev-nosan caffe2/test/distributed:c10d -- test_ddp_comm_hook_allreduce_hook_nccl_grad_is_view
buck test mode/dev-nosan caffe2/test/distributed:c10d -- test_default_ddp_comm_hooks_nccl_is_view
buck test mode/dev-nosan caffe2/test/distributed:c10d -- test_fp16_compress_wrapper_is_view
buck test mode/dev-nosan caffe2/test/distributed:c10d -- test_builtin_ddp_comm_hooks_nccl_grad_is_view
buck test mode/dev-nosan caffe2/test/distributed:c10d -- test_powerSGD_ddp_comm_hook_nccl_grad_is_view

Reviewed By: rohan-varma

Differential Revision: D28941327

fbshipit-source-id: 932e8ddbdb2bfd609a78943f6dc390d3d6ca333f