pytorch/torch/distributed/algorithms
Yi Wang 79e7544cb4 [Gradient Compression] Check start_PowerSGD_iter > 1 and add guidance on tuning PowerSGD configs. (#51427)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51427

A user reported that `start_PowerSGD_iter` fails when it is set to 1. This is because allocating memory for the error tensors can overlap with the bucket rebuilding process at iteration 1: DDP may rebuild its gradient buckets after the first iteration, so per-bucket buffers allocated against the initial buckets can become invalid.

Fix: require `start_PowerSGD_iter > 1` instead of `start_PowerSGD_iter >= 1`.
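
A minimal sketch of the stricter validation, assuming it lives in the `PowerSGDState` constructor in `ddp_comm_hooks/powerSGD_hook.py` (the constructor signature is trimmed for illustration, the parameter spelling follows the module's `start_powerSGD_iter`, and the error message is illustrative, not the exact one added by this PR):

```python
class PowerSGDState:
    def __init__(
        self,
        process_group,
        matrix_approximation_rank=1,
        start_powerSGD_iter=10,
    ):
        # Reject start_powerSGD_iter <= 1 (previously only values < 1 were
        # rejected): DDP may rebuild its gradient buckets at iteration 1, so
        # per-bucket state such as error tensors must not be allocated before
        # iteration 2.
        if start_powerSGD_iter <= 1:
            raise ValueError(
                "Expect `start_powerSGD_iter` > 1, because PowerSGD can only "
                "be applied after DDP finishes rebuilding its buckets."
            )
        self.process_group = process_group
        self.matrix_approximation_rank = matrix_approximation_rank
        self.start_powerSGD_iter = start_powerSGD_iter
```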

Also add a unit test, `test_invalid_powerSGD_state`, and some guidance on tuning PowerSGD configs.
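
For reference, a hypothetical test in the spirit of `test_invalid_powerSGD_state`, plus a valid hook registration that follows the tuning guidance; the real test runs inside the c10d distributed test suite rather than pytest, and the DDP model line is commented out because it assumes an initialized process group:

```python
import pytest
from torch.distributed.algorithms.ddp_comm_hooks import powerSGD_hook as powerSGD

def test_invalid_powerSGD_state():
    # start_powerSGD_iter == 1 must now be rejected, not just values < 1.
    with pytest.raises(ValueError):
        powerSGD.PowerSGDState(process_group=None, start_powerSGD_iter=1)

# A valid configuration: start compressing at iteration 2 or later, after
# DDP has finished rebuilding its gradient buckets.
state = powerSGD.PowerSGDState(
    process_group=None,          # None selects the default process group
    matrix_approximation_rank=1, # lower rank = more compression, more error
    start_powerSGD_iter=2,       # must be > 1 after this change
)
# ddp_model.register_comm_hook(state, powerSGD.powerSGD_hook)
```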

Original PR issue: Investigate Applying PowerSGD to Communication Hook for Gradient Compression #47202
ghstack-source-id: 120834126

Test Plan: buck test mode/dev-nosan caffe2/test/distributed:c10d -- test_invalid_powerSGD_state

Reviewed By: rohan-varma

Differential Revision: D26166897

fbshipit-source-id: 34d5b64bb3dd43acb61d792626c70e6c8bb44a5d
2021-02-02 04:30:24 -08:00
ddp_comm_hooks [Gradient Compression] Check start_PowerSGD_iter > 1 and add guidance on tuning PowerSGD configs. (#51427) 2021-02-02 04:30:24 -08:00
__init__.py [Gradient Compression] Add unit tests that test default Python comm hook implementations (#47158) 2020-11-06 00:28:09 -08:00