pytorch/torch/distributed/algorithms
Yi Wang c08078031f [Gradient Compression] Allow BatchedPowerSGD to run vanilla allreduce for the first K iterations (#51270)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51270

Similar to #50973, allow the batched version to run vanilla allreduce for the first K iterations.

This may be useful for use cases where the accuracy requirement is not very strict, so the batched version can be applied there as well.
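
A minimal usage sketch (assuming the batched hook reads the same start_powerSGD_iter knob on PowerSGDState that #50973 added for the non-batched hook; the model/rank setup below is illustrative):

    import torch.distributed as dist
    from torch.distributed.algorithms.ddp_comm_hooks import powerSGD_hook as powerSGD
    from torch.nn.parallel import DistributedDataParallel as DDP

    # Wrap the model with DDP as usual (process group init and model creation omitted).
    ddp_model = DDP(model, device_ids=[rank])

    # Run vanilla allreduce for the first K iterations, then switch to
    # batched PowerSGD compression for the remaining iterations.
    state = powerSGD.PowerSGDState(
        process_group=None,           # use the default process group
        matrix_approximation_rank=1,
        start_powerSGD_iter=1000,     # K: vanilla allreduce before compression kicks in
    )
    ddp_model.register_comm_hook(state, powerSGD.batched_powerSGD_hook)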

Original PR issue: Investigate Applying PowerSGD to Communication Hook for Gradient Compression #47202
ghstack-source-id: 120725858

Test Plan:
buck test mode/dev-nosan caffe2/test/distributed:c10d -- test_powerSGD_ddp_comm_hook_nccl

baseline: f248001754
batched PowerSGD: f246960752

The training time was reduced from 54m48s to 30m33s, and the accuracy was approximately the same: 44.21 vs. 44.35.

Reviewed By: rohan-varma

Differential Revision: D26077709

fbshipit-source-id: 6afeefad7a3fbdd7da2cbffb56dfbad855a96cb5
2021-02-01 15:26:29 -08:00
ddp_comm_hooks [Gradient Compression] Allow BatchedPowerSGD to run vanilla allreduce for the first K iterations (#51270) 2021-02-01 15:26:29 -08:00
__init__.py [Gradient Compression] Add unit tests that test default Python comm hook implementations (#47158) 2020-11-06 00:28:09 -08:00