Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62611
Enables optimizer overlap with the backward pass in DDP for Adam. Additional optimizers, in particular Adagrad, will be done in follow-up diffs.
1. Implement a `step_param` method in `_FunctionalAdam`, based on `step` (perf permitting, we can later dedupe `step` to call `step_param`; see the sketch after this list).
2. Modify tests to cover all current functional optimizers.
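The rough shape of a per-parameter Adam step is sketched below. This is a minimal, hedged illustration of the idea, not the actual `_FunctionalAdam` implementation: the class name, state layout, and constructor defaults here are assumptions.

```python
# Hypothetical sketch of a per-parameter Adam step; `step_param`, the state
# layout, and hyperparameter defaults are assumptions for illustration only.
from typing import Dict

import torch


class _SketchFunctionalAdam:
    def __init__(self, lr=1e-3, betas=(0.9, 0.999), eps=1e-8, weight_decay=0.0):
        self.lr, self.betas, self.eps, self.weight_decay = lr, betas, eps, weight_decay
        # Per-parameter state: step count, first and second moment estimates.
        self.state: Dict[torch.Tensor, Dict] = {}

    @torch.no_grad()
    def step_param(self, param: torch.Tensor, grad: torch.Tensor) -> None:
        """Apply one Adam update to a single parameter, so DDP can invoke the
        optimizer as each gradient becomes ready during the backward pass."""
        beta1, beta2 = self.betas
        if self.weight_decay != 0:
            grad = grad.add(param, alpha=self.weight_decay)
        if param not in self.state:
            self.state[param] = {
                "step": 0,
                "exp_avg": torch.zeros_like(param),
                "exp_avg_sq": torch.zeros_like(param),
            }
        st = self.state[param]
        st["step"] += 1
        st["exp_avg"].mul_(beta1).add_(grad, alpha=1 - beta1)
        st["exp_avg_sq"].mul_(beta2).addcmul_(grad, grad, value=1 - beta2)
        bias_c1 = 1 - beta1 ** st["step"]
        bias_c2 = 1 - beta2 ** st["step"]
        denom = (st["exp_avg_sq"] / bias_c2).sqrt_().add_(self.eps)
        # param -= lr * m_hat / (sqrt(v_hat) + eps)
        param.addcdiv_(st["exp_avg"] / bias_c1, denom, value=-self.lr)
```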
ghstack-source-id: 135207143
Test Plan: CI
Reviewed By: SciPioneer
Differential Revision: D29891783
fbshipit-source-id: 321915982afd5cb0a9c2e43d27550f433bff00d1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62078
Ensure that keyword arguments such as momentum and weight decay maintain
parity between `optimizer.step` and `step_param`.
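A parity check along these lines is sketched below. It is a hedged example, not the test added in this diff: `functional_opt_cls` stands in for any functional optimizer exposing `step_param(param, grad)`, and its constructor signature is an assumption.

```python
# Hedged parity-test sketch: compare torch.optim.Adam's whole-step update
# against per-parameter `step_param` calls with the same kwargs.
# `functional_opt_cls` and its constructor are assumptions, not pytorch API.
import torch


def check_step_param_parity(functional_opt_cls, lr=0.01, weight_decay=0.1, **kwargs):
    torch.manual_seed(0)
    ref_params = [torch.randn(4, 4, requires_grad=True) for _ in range(3)]
    fn_params = [p.detach().clone().requires_grad_(True) for p in ref_params]

    ref_opt = torch.optim.Adam(ref_params, lr=lr, weight_decay=weight_decay, **kwargs)
    fn_opt = functional_opt_cls(lr=lr, weight_decay=weight_decay, **kwargs)

    for _ in range(5):
        grads = [torch.randn_like(p) for p in ref_params]
        # Reference path: populate .grad and take a whole-step update.
        for p, g in zip(ref_params, grads):
            p.grad = g.clone()
        ref_opt.step()
        # Functional path: feed each (param, grad) pair individually.
        for p, g in zip(fn_params, grads):
            fn_opt.step_param(p, g.clone())

    # Both paths should produce the same parameters for matching kwargs.
    for p_ref, p_fn in zip(ref_params, fn_params):
        assert torch.allclose(p_ref, p_fn)
```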
ghstack-source-id: 134330377
Test Plan: CI
Reviewed By: SciPioneer
Differential Revision: D29837942
fbshipit-source-id: 1ae39648fc26aebd8aaef1a7ac0e03b598a8ed60
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61756
DDP will support running the optimizer as a communication hook with
optimizers that implement a per-parameter/gradient step function, `step_param`.
Add parity tests as more optimizers gain `step_param` support, to ensure
parity with the regular optimizers.
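The sketch below shows the idea behind such a hook: allreduce a gradient bucket as usual, then run `step_param` on each parameter in that bucket so the optimizer overlaps with the rest of the backward pass. It is an assumption-laden illustration, not the hook added in this diff; in particular, the `bucket.parameters()`/`bucket.gradients()` accessors and the `functional_optim` object are assumed.

```python
# Hedged sketch of an allreduce-then-step_param DDP communication hook.
# `bucket.parameters()` / `bucket.gradients()` are assumed GradBucket
# accessors; `functional_optim` is any object with step_param(param, grad).
import torch
import torch.distributed as dist


def allreduce_then_step_param_hook(functional_optim, process_group=None):
    group = process_group if process_group is not None else dist.group.WORLD
    world_size = dist.get_world_size(group)

    def hook(state, bucket: dist.GradBucket) -> torch.futures.Future:
        # Standard averaging allreduce of the flattened bucket buffer.
        tensor = bucket.buffer().div_(world_size)
        fut = dist.all_reduce(tensor, group=group, async_op=True).get_future()

        def run_optimizer(fut):
            reduced = fut.value()[0]
            # Step each parameter in this bucket with its reduced gradient,
            # overlapping optimizer work with the remaining backward pass.
            for param, grad in zip(bucket.parameters(), bucket.gradients()):
                functional_optim.step_param(param, grad)
            return reduced

        return fut.then(run_optimizer)

    return hook


# Usage sketch:
#   ddp_model.register_comm_hook(state=None,
#                                hook=allreduce_then_step_param_hook(fn_opt))
```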
ghstack-source-id: 134330378
Test Plan: CI
Reviewed By: SciPioneer
Differential Revision: D29727549
fbshipit-source-id: 18977c896f12b8e478298488b298fd107affcf5f