Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71620
Remove `from_functional_optim` and make it the default constructor, since
that is now the only way `_OptimizerHookState` is built. The
`create_functional_optim` helper function also no longer needs to be exposed.
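For context, the resulting shape of the class is roughly the following (a minimal sketch with assumed field names, not the actual DDP comm-hook source):

```python
# Sketch of the refactor; `_OptimizerHookState` is real, but the body here is
# an assumption for illustration only.
class _OptimizerHookState:
    # Before: instances were built only through a from_functional_optim()
    # classmethod. Since that was the sole construction path, its logic is
    # now the default constructor.
    def __init__(self, functional_optim):
        self.functional_optimizer = functional_optim
```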
ghstack-source-id: 147577174
Test Plan: CI
Reviewed By: cbalioglu
Differential Revision: D33700593
fbshipit-source-id: ba089ce3bf66ccf8f71cffdd0f4d4bddc03e8b14
(cherry picked from commit a50b2caf0e)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63462
Now that `torch.distributed.optim` gates `DistributedOptimizer` on RPC availability, these tests can be run on Windows.
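Because of that gating, builds without RPC support (such as Windows at the time) simply do not export `DistributedOptimizer`; a consumer-side sketch of the effect (illustrative, not code from this PR):

```python
# On builds without RPC support, torch.distributed.optim no longer fails to
# import as a whole; only the RPC-backed DistributedOptimizer is absent.
try:
    from torch.distributed.optim import DistributedOptimizer
except ImportError:
    DistributedOptimizer = None  # RPC not available on this build
```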
ghstack-source-id: 136437635
Test Plan: CI
Reviewed By: SciPioneer
Differential Revision: D30358923
fbshipit-source-id: 36739bdfe7214789f17de652d30c62c2bc124c73
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63383
Per title
ghstack-source-id: 135966157
Test Plan: CI
Reviewed By: SciPioneer
Differential Revision: D30358921
fbshipit-source-id: 965e054e525194b1ee55980340df275bab355c9b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63382
Per title
ghstack-source-id: 135966156
Test Plan: CI
Reviewed By: SciPioneer
Differential Revision: D30255446
fbshipit-source-id: e6ffbf339db0bc5b4702d02b74a462309df07c75
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62611
Enables overlapping the optimizer with the backward pass in DDP for Adam. Additional optimizers, especially Adagrad, will be handled in follow-up diffs.
1. Implement a `step_param` method in `_FunctionalAdam`, based on its `step` (perf permitting, we can later dedupe `step` to call `step_param`); a sketch of the per-parameter idea follows this list.
2. Modify the tests to cover all current functional optimizers.
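A rough sketch of the per-parameter idea referenced in item 1 (a hypothetical standalone function, not the actual `_FunctionalAdam.step_param`; the real method reads its hyperparameters and state from the functional optimizer instance):

```python
import torch

def step_param(param, grad, state, lr=1e-3, betas=(0.9, 0.999), eps=1e-8):
    """Hypothetical per-parameter Adam update, applied as soon as the gradient
    for this one parameter is ready (e.g. from a DDP communication hook)."""
    beta1, beta2 = betas
    if param not in state:
        state[param] = {
            "step": 0,
            "exp_avg": torch.zeros_like(param),
            "exp_avg_sq": torch.zeros_like(param),
        }
    s = state[param]
    s["step"] += 1
    s["exp_avg"].mul_(beta1).add_(grad, alpha=1 - beta1)
    s["exp_avg_sq"].mul_(beta2).addcmul_(grad, grad, value=1 - beta2)
    bias_correction1 = 1 - beta1 ** s["step"]
    bias_correction2 = 1 - beta2 ** s["step"]
    denom = (s["exp_avg_sq"] / bias_correction2).sqrt_().add_(eps)
    param.data.addcdiv_(s["exp_avg"], denom, value=-lr / bias_correction1)

# Usage sketch: update a single parameter as its gradient arrives.
p, g, opt_state = torch.nn.Parameter(torch.randn(4)), torch.randn(4), {}
step_param(p, g, opt_state, lr=0.01)
```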
ghstack-source-id: 135207143
Test Plan: CI
Reviewed By: SciPioneer
Differential Revision: D29891783
fbshipit-source-id: 321915982afd5cb0a9c2e43d27550f433bff00d1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62078
Ensure that keyword arguments such as momentum and weight decay maintain
parity between `optimizer.step` and `step_param`.
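The kind of parity check being added looks roughly like this self-contained sketch, which compares `torch.optim.SGD` against a hand-written per-parameter step (the `sgd_step_param` helper is hypothetical; the real tests exercise the functional optimizers used by DDP):

```python
import torch

def sgd_step_param(param, grad, buf, lr, momentum, weight_decay):
    """Hypothetical per-parameter SGD step mirroring torch.optim.SGD
    semantics (dampening=0, nesterov=False)."""
    d_p = grad.add(param.detach(), alpha=weight_decay)
    buf = d_p.clone() if buf is None else buf.mul_(momentum).add_(d_p)
    param.data.add_(buf, alpha=-lr)
    return buf

torch.manual_seed(0)
p_ref = torch.nn.Parameter(torch.randn(8))
p_hook = torch.nn.Parameter(p_ref.detach().clone())
opt = torch.optim.SGD([p_ref], lr=0.1, momentum=0.9, weight_decay=1e-4)

buf = None
for _ in range(3):
    grad = torch.randn(8)
    p_ref.grad = grad.clone()
    opt.step()
    buf = sgd_step_param(p_hook, grad, buf, lr=0.1, momentum=0.9, weight_decay=1e-4)

# Both update paths must agree when momentum and weight decay are passed.
torch.testing.assert_close(p_ref, p_hook)
```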
ghstack-source-id: 134330377
Test Plan: CI
Reviewed By: SciPioneer
Differential Revision: D29837942
fbshipit-source-id: 1ae39648fc26aebd8aaef1a7ac0e03b598a8ed60
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61756
DDP will support running the optimizer as a communication hook with
optimizers that implement a per-parameter/per-gradient step function, `step_param`.
Add parity tests as we implement more optimizers supporting `step_param`, to
ensure parity with the regular optimizers.
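Conceptually, such a hook allreduces a gradient bucket and then immediately applies `step_param` to the parameters in that bucket, before the rest of the backward pass finishes. A rough sketch of that shape (illustrative only; the real hook state, `GradBucket` accessors, and `step_param` signature live in the DDP comm-hook and functional-optimizer code and may differ):

```python
import torch
import torch.distributed as dist

def optimizer_comm_hook(state, bucket):
    """Hypothetical DDP communication hook: allreduce the bucket's flat
    gradient buffer, then run a per-parameter optimizer step on it."""
    world_size = dist.get_world_size()
    fut = dist.all_reduce(bucket.buffer(), async_op=True).get_future()

    def allreduce_then_step(fut):
        buffer = fut.value()[0].div_(world_size)
        for param, grad in zip(bucket.parameters(), bucket.gradients()):
            # Hypothetical per-parameter step; mirrors the step_param contract.
            state.functional_optimizer.step_param(param, grad)
        return buffer

    return fut.then(allreduce_then_step)

# Registration sketch (state carries the functional optimizer):
# ddp_model.register_comm_hook(state=hook_state, hook=optimizer_comm_hook)
```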
ghstack-source-id: 134330378
Test Plan: CI
Reviewed By: SciPioneer
Differential Revision: D29727549
fbshipit-source-id: 18977c896f12b8e478298488b298fd107affcf5f