pytorch/torch/distributed/algorithms
Weiyi Zheng c07babbcf1 [Gradient Compression] Divide by world size before all_reduce to avoid overflow (#57410)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57410

FP16 gradient compression may run into an 'inf' issue. Switching to division by world size before all_reduce (instead of after) avoids this problem, since the summed fp16 values stay within range.
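For context, a minimal sketch of the idea behind the change: the bucket's gradients are cast to fp16 and divided by world size before the all_reduce, so the sum does not overflow fp16. The bucket accessor name (`bucket.buffer()`) is an assumption here; it has varied across PyTorch versions (older releases exposed the flat tensor differently), so treat this as illustrative rather than the exact code in the PR.

```python
import torch
import torch.distributed as dist

def fp16_compress_hook(process_group, bucket):
    # Use the given process group, or fall back to the default (world) group.
    group_to_use = process_group if process_group is not None else dist.group.WORLD
    world_size = group_to_use.size()

    # Cast the bucket's flat gradient tensor to fp16 and divide by world size
    # *before* the all_reduce, so the summed fp16 values stay in range.
    # NOTE: bucket.buffer() is assumed; the accessor differs across versions.
    compressed = bucket.buffer().to(torch.float16).div_(world_size)

    fut = dist.all_reduce(compressed, group=group_to_use, async_op=True).get_future()

    def decompress(fut):
        # Copy the already-averaged fp16 result back into the bucket,
        # restoring the original dtype of the gradients.
        buf = bucket.buffer()
        buf.copy_(fut.value()[0])
        return buf

    return fut.then(decompress)
```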
ghstack-source-id: 127877083

Test Plan:
before change:

f268909897

after change:
f270950609

If you still see 'grad_norm = inf' after enabling the fp16 hook, you can resume the training with the hook turned off (see the sketch below).
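A hedged sketch of how the fp16 hook is typically enabled on a DistributedDataParallel model, and how "turning it off" on resume amounts to simply not registering it; the model, `local_rank`, and initialization details are placeholders, not part of this PR.

```python
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.distributed.algorithms.ddp_comm_hooks import default_hooks

# Assumes torch.distributed is already initialized (e.g. via init_process_group)
# and local_rank is set from the launcher environment.
local_rank = 0  # placeholder
model = DDP(nn.Linear(10, 10).cuda(local_rank), device_ids=[local_rank])

# Enable fp16 gradient compression for this run.
model.register_comm_hook(state=None, hook=default_hooks.fp16_compress_hook)

# To turn the hook off when resuming from a checkpoint, rebuild the DDP model
# without calling register_comm_hook; gradients then all_reduce in full precision.
```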

Reviewed By: SciPioneer

Differential Revision: D28128628

fbshipit-source-id: 0b6648637713e4f321e39c9ccb645a6b6f1750a0
2021-05-07 12:23:21 -07:00
..
ddp_comm_hooks [Gradient Compression] Divide by world size before all_reduce to avoid overflow (#57410) 2021-05-07 12:23:21 -07:00
__init__.py [Gradient Compression] Add unit tests that test default Python comm hook implementations (#47158) 2020-11-06 00:28:09 -08:00