Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59576

If the gradients are large before the allreduce, their sum after the allreduce may overflow, especially in FP16. Therefore, divide each rank's gradients by the world size before the allreduce instead of dividing the sum afterwards. This fix is applied to both the C++ and Python comm hooks.

ghstack-source-id: 130754510

Test Plan:
buck test mode/dev-nosan caffe2/test/distributed:c10d -- test_ddp_comm_hook_allreduce_hook_nccl
buck test mode/dev-nosan caffe2/test/distributed:c10d -- test_default_ddp_comm_hooks_nccl
buck test mode/dev-nosan caffe2/test/distributed:c10d -- test_fp16_compress_wrapper_nccl
buck test mode/dev-nosan caffe2/test/distributed:c10d -- test_builtin_ddp_comm_hooks_nccl
buck test mode/dev-nosan caffe2/test/distributed:c10d -- test_powerSGD_ddp_comm_hook_nccl
buck test mode/dev-nosan caffe2/test/distributed:c10d -- test_ddp_comm_hook_allreduce_hook_nccl_grad_is_view
buck test mode/dev-nosan caffe2/test/distributed:c10d -- test_default_ddp_comm_hooks_nccl_is_view
buck test mode/dev-nosan caffe2/test/distributed:c10d -- test_fp16_compress_wrapper_is_view
buck test mode/dev-nosan caffe2/test/distributed:c10d -- test_builtin_ddp_comm_hooks_nccl_grad_is_view
buck test mode/dev-nosan caffe2/test/distributed:c10d -- test_powerSGD_ddp_comm_hook_nccl_grad_is_view

Reviewed By: rohan-varma

Differential Revision: D28941327

fbshipit-source-id: 932e8ddbdb2bfd609a78943f6dc390d3d6ca333f
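For illustration only (not the patch itself), below is a minimal sketch of the pre-division pattern this fix describes, written as a DDP communication hook. The hook name `fp16_predivide_hook` is hypothetical; the sketch assumes the public `GradBucket.buffer()` accessor and `register_comm_hook` API available in recent PyTorch releases, and assumes a process group has already been initialized.

```python
import torch
import torch.distributed as dist

def fp16_predivide_hook(process_group, bucket):
    # Hypothetical hook illustrating the fix: divide by the world size
    # *before* the allreduce, so that summing already-scaled gradients
    # is far less likely to overflow the FP16 range.
    group = process_group if process_group is not None else dist.group.WORLD
    world_size = group.size()

    # Cast the flattened gradient bucket to FP16 and pre-divide in place.
    compressed = bucket.buffer().to(torch.float16).div_(world_size)

    # Launch the asynchronous allreduce on the pre-divided FP16 tensor.
    fut = dist.all_reduce(compressed, group=group, async_op=True).get_future()

    def decompress(fut):
        # Copy the already-averaged FP16 result back into the
        # full-precision gradient buffer (copy_ handles the dtype cast).
        buf = bucket.buffer()
        buf.copy_(fut.value()[0])
        return buf

    return fut.then(decompress)
```

A hook like this would be attached to a `DistributedDataParallel` model via `ddp_model.register_comm_hook(state=None, hook=fp16_predivide_hook)`. The key point of the commit is the placement of `div_(world_size)` before `all_reduce` rather than after it; mathematically the average is the same, but each addend stays small enough to avoid FP16 overflow during the reduction.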