pytorch/torch/distributed/algorithms
Weiyi Zheng c07babbcf1 [Gradient Compression] Divide by world size before all_reduce to avoid overflow (#57410)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57410

FP16 gradient compression may run into an 'inf' issue. Switching to division by world size before all_reduce (instead of after) avoids this problem, since the summed fp16 values stay within range.
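For context, a minimal sketch of the idea behind the change: the bucket's gradients are cast to fp16 and divided by world size before the all_reduce, so the sum does not overflow fp16. The bucket accessor name (`bucket.buffer()`) is an assumption here; it has varied across PyTorch versions (older releases exposed the flat tensor differently), so treat this as illustrative rather than the exact code in the PR.

```python
import torch
import torch.distributed as dist

def fp16_compress_hook(process_group, bucket):
    # Use the given process group, or fall back to the default (world) group.
    group_to_use = process_group if process_group is not None else dist.group.WORLD
    world_size = group_to_use.size()

    # Cast the bucket's flat gradient tensor to fp16 and divide by world size
    # *before* the all_reduce, so the summed fp16 values stay in range.
    # NOTE: bucket.buffer() is assumed; the accessor differs across versions.
    compressed = bucket.buffer().to(torch.float16).div_(world_size)

    fut = dist.all_reduce(compressed, group=group_to_use, async_op=True).get_future()

    def decompress(fut):
        # Copy the already-averaged fp16 result back into the bucket,
        # restoring the original dtype of the gradients.
        buf = bucket.buffer()
        buf.copy_(fut.value()[0])
        return buf

    return fut.then(decompress)
```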
ghstack-source-id: 127877083

Test Plan:
before change:

f268909897

after change:
f270950609

If you still see 'grad_norm = inf' after enabling the fp16 hook, you can resume the training with the hook turned off (see the sketch below).
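A hedged sketch of how the fp16 hook is typically enabled on a DistributedDataParallel model, and how "turning it off" on resume amounts to simply not registering it; the model, `local_rank`, and initialization details are placeholders, not part of this PR.

```python
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.distributed.algorithms.ddp_comm_hooks import default_hooks

# Assumes torch.distributed is already initialized (e.g. via init_process_group)
# and local_rank is set from the launcher environment.
local_rank = 0  # placeholder
model = DDP(nn.Linear(10, 10).cuda(local_rank), device_ids=[local_rank])

# Enable fp16 gradient compression for this run.
model.register_comm_hook(state=None, hook=default_hooks.fp16_compress_hook)

# To turn the hook off when resuming from a checkpoint, rebuild the DDP model
# without calling register_comm_hook; gradients then all_reduce in full precision.
```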

Reviewed By: SciPioneer

Differential Revision: D28128628

fbshipit-source-id: 0b6648637713e4f321e39c9ccb645a6b6f1750a0
2021-05-07 12:23:21 -07:00
..
ddp_comm_hooks [Gradient Compression] Divide by world size before all_reduce to avoid overflow (#57410) 2021-05-07 12:23:21 -07:00
__init__.py [Gradient Compression] Add unit tests that test default Python comm hook implementations (#47158) 2020-11-06 00:28:09 -08:00