pydocstyle check
averagers.py
Pre
/workspaces/pytorch/torch/distributed/algorithms/model_averaging/averagers.py:1 at module level:
D100: Missing docstring in public module
/workspaces/pytorch/torch/distributed/algorithms/model_averaging/averagers.py:20 in public method `__init__`:
D107: Missing docstring in __init__
/workspaces/pytorch/torch/distributed/algorithms/model_averaging/averagers.py:27 in public method `average_parameters`:
D102: Missing docstring in public method
/workspaces/pytorch/torch/distributed/algorithms/model_averaging/averagers.py:84 in public method `__init__`:
D107: Missing docstring in __init__
/workspaces/pytorch/torch/distributed/algorithms/model_averaging/averagers.py:106 in public method `average_parameters`:
D205: 1 blank line required between summary line and description (found 0)
/workspaces/pytorch/torch/distributed/algorithms/model_averaging/averagers.py:106 in public method `average_parameters`:
D400: First line should end with a period (not '`')
6 errors
Post
/workspaces/pytorch/torch/distributed/algorithms/model_averaging/averagers.py:1 at module level:
D100: Missing docstring in public module
/workspaces/pytorch/torch/distributed/algorithms/model_averaging/averagers.py:20 in public method `__init__`:
D107: Missing docstring in __init__
/workspaces/pytorch/torch/distributed/algorithms/model_averaging/averagers.py:27 in public method `average_parameters`:
D102: Missing docstring in public method
/workspaces/pytorch/torch/distributed/algorithms/model_averaging/averagers.py:84 in public method `__init__`:
D107: Missing docstring in __init__
4 errors
utils.py
Pre
/workspaces/pytorch/torch/distributed/algorithms/model_averaging/utils.py:1 at module level:
D100: Missing docstring in public module
/workspaces/pytorch/torch/distributed/algorithms/model_averaging/utils.py:17 in public function `average_parameters`:
D205: 1 blank line required between summary line and description (found 0)
/workspaces/pytorch/torch/distributed/algorithms/model_averaging/utils.py:45 in public function `get_params_to_average`:
D205: 1 blank line required between summary line and description (found 0)
/workspaces/pytorch/torch/distributed/algorithms/model_averaging/utils.py:45 in public function `get_params_to_average`:
D401: First line should be in imperative mood (perhaps 'Return', not 'Returns')
/workspaces/pytorch/torch/distributed/algorithms/model_averaging/utils.py:68 in public function `average_parameters_or_parameter_groups`:
D200: One-line docstring should fit on one line with quotes (found 3)
5 errors
Post
/workspaces/pytorch/torch/distributed/algorithms/model_averaging/utils.py:1 at module level:
D100: Missing docstring in public module
1 error
hierarchical_model_averager.py
Pre
/workspaces/pytorch/torch/distributed/algorithms/model_averaging/hierarchical_model_averager.py:1 at module level:
D100: Missing docstring in public module
/workspaces/pytorch/torch/distributed/algorithms/model_averaging/hierarchical_model_averager.py:16 in public class `HierarchicalModelAverager`:
D205: 1 blank line required between summary line and description (found 0)
/workspaces/pytorch/torch/distributed/algorithms/model_averaging/hierarchical_model_averager.py:98 in public method `__init__`:
D107: Missing docstring in __init__
/workspaces/pytorch/torch/distributed/algorithms/model_averaging/hierarchical_model_averager.py:137 in private method `_find_process_group`:
D205: 1 blank line required between summary line and description (found 0)
/workspaces/pytorch/torch/distributed/algorithms/model_averaging/hierarchical_model_averager.py:137 in private method `_find_process_group`:
D400: First line should end with a period (not ',')
/workspaces/pytorch/torch/distributed/algorithms/model_averaging/hierarchical_model_averager.py:137 in private method `_find_process_group`:
D401: First line should be in imperative mood (perhaps 'Return', not 'Returns')
/workspaces/pytorch/torch/distributed/algorithms/model_averaging/hierarchical_model_averager.py:151 in public method `average_parameters`:
D205: 1 blank line required between summary line and description (found 0)
/workspaces/pytorch/torch/distributed/algorithms/model_averaging/hierarchical_model_averager.py:151 in public method `average_parameters`:
D400: First line should end with a period (not '`')
8 errors
Post
/workspaces/pytorch/torch/distributed/algorithms/model_averaging/hierarchical_model_averager.py:1 at module level:
D100: Missing docstring in public module
/workspaces/pytorch/torch/distributed/algorithms/model_averaging/hierarchical_model_averager.py:99 in public method `__init__`:
D107: Missing docstring in __init__
2 errors
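For reference, the D205/D400/D401 fixes above are just docstring reshaping. A minimal before/after sketch (the docstring text here is hypothetical, not the actual text in these files):

```python
class _Averager:
    # Before: triggers D205 (no blank line between summary and description),
    # D400 (first line does not end with a period), and D401 (summary not in
    # imperative mood).
    def average_parameters_old(self, params):
        """Averages parameters across all ranks
        using the process group configured at construction time."""

    # After: a one-line imperative summary ending with a period, then a
    # blank line before the description.
    def average_parameters(self, params):
        """Average parameters across all ranks.

        Uses the process group configured at construction time.
        """
```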
Pull Request resolved: https://github.com/pytorch/pytorch/pull/117038
Approved by: https://github.com/H-Huang
This is a new version of #15648 based on the latest master branch.
Unlike the previous PR where I fixed a lot of the doctests in addition to integrating xdoctest, I'm going to reduce the scope here. I'm simply going to integrate xdoctest, and then I'm going to mark all of the failing tests as "SKIP". This will let xdoctest run on the dashboards, provide some value, and still let the dashboards pass. I'll leave fixing the doctests themselves to another PR.
In my initial commit, I do the bare minimum to get something running, with failing dashboards. The few tests that I marked as skip are the ones causing segfaults. Running xdoctest results in 293 failed and 201 passed tests. The next commits will disable those failing tests. (Unfortunately, I don't have a tool that will insert the `#xdoctest: +SKIP` directive over every failing test, so I'm going to do this mostly manually.)
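For illustration, skipping a failing doctest looks roughly like this (hypothetical function; only the directive comment matters):

```python
def broadcast_example():
    """Broadcast a tensor from rank 0 to all other ranks.

    Example::
        >>> # xdoctest: +SKIP("requires an initialized process group")
        >>> import torch
        >>> import torch.distributed as dist
        >>> dist.broadcast(torch.ones(1), src=0)
    """
```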
Fixes https://github.com/pytorch/pytorch/issues/71105
@ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82797
Approved by: https://github.com/ezyang
Summary:
Previously, the highest-level process group in `period_process_group_dict` could be `None`, indicating the global group. Now `period_process_group_dict` cannot contain `None` as a process group, so `_find_process_group` can return a process group directly instead of a tuple: when no group is found, it simply returns `None`, which is unambiguous because a found process group can no longer be `None`.
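A sketch of the simplified lookup under this contract (hypothetical body; the real method lives on `HierarchicalModelAverager`):

```python
def _find_process_group(self):
    """Return the process group for the current step, or None if none applies."""
    # Check periods from largest to smallest, so the highest-level group
    # whose period divides the current step wins.
    for period in sorted(self.period_process_group_dict.keys(), reverse=True):
        if self.step % period == 0:
            return self.period_process_group_dict[period]
    # A found group can no longer be None, so a bare None unambiguously
    # means "no averaging at this step"; no (found, group) tuple is needed.
    return None
```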
Proposal: https://github.com/pytorch/pytorch/issues/71325
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75007
Reviewed By: awgu
Differential Revision: D35357816
Pulled By: rohan-varma
fbshipit-source-id: 4522dba49797df7140227bfd822d668b7e118a66
(cherry picked from commit 77ca01b555d52685283c969176b08de4ff46c32d)
Summary:
Add a reference.
Also fix the comment: unlike `averagers.py`, this is currently not a base class from which many subclasses can inherit.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73823
Reviewed By: ejguan
Differential Revision: D34684366
Pulled By: rohan-varma
fbshipit-source-id: e253ed39ba0783ad73bfd889e9a2e7d0c9214a3a
(cherry picked from commit a9fec3585078881ccd5886ebb27e52b15f7181b1)
Summary:
Implement hierarchical model averaging proposed in https://github.com/pytorch/pytorch/issues/71325.
Unit tests are added. Since I don't have access to 4-GPU machines in the open-source environment, I expect that a branch with the `ci-all` prefix can run the test that requires 4 GPUs.
In the future, the internals of `PeriodicModelAverager` can be simplified into an implementation of specialized hierarchical model averaging, where `period_group_size_dict` has only a single pair of period and world size.
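For example, a two-level configuration on 32 workers could look like this (a sketch; assumes `torch.distributed` is already initialized across 32 ranks):

```python
from collections import OrderedDict
import torch.distributed.algorithms.model_averaging.hierarchical_model_averager as hierarchicalSGD

# Every 4 steps, average within subgroups of 8 ranks; every 16 steps,
# average across all 32 ranks. No averaging during the first 100 steps.
averager = hierarchicalSGD.HierarchicalModelAverager(
    period_group_size_dict=OrderedDict([(4, 8), (16, 32)]),
    warmup_steps=100,
)
```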
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73285
Reviewed By: mrshenli
Differential Revision: D34457792
Pulled By: rohan-varma
fbshipit-source-id: 39a6c5bf8a2852b6394a56abbad17b8a909b9fba
(cherry picked from commit 5f543d46103edb515db199dbb80db43c85665f29)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71621
Moves this feature to beta as discussed, and cleans up some docs.
Synced offline with wayi1, who mentioned that the current names are preferred as he works to prototype hierarchical allreduce as discussed in this RFC: https://github.com/pytorch/pytorch/issues/71325.
ghstack-source-id: 147382940
Test Plan: CI
Reviewed By: zhaojuanmao
Differential Revision: D33700444
fbshipit-source-id: 8eb543f5b02a119d0790a5c0919e6def6383a067
(cherry picked from commit 656e9809b2)
Summary:
Here 20 iterations is a bad example, since the number of warmup steps is set to 100; 200 iterations makes much more sense.
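Concretely, the docstring's loop has to run past the warmup phase for the averager to do anything; a sketch of the corrected example, with the model, optimizer, loss function, and data as placeholders:

```python
import torch.distributed.algorithms.model_averaging.averagers as averagers

# With warmup_steps=100 and period=4, a 20-iteration loop would never
# leave the warmup phase, while 200 iterations exercise both warmup and
# many averaging steps.
averager = averagers.PeriodicModelAverager(period=4, warmup_steps=100)
for step in range(200):
    optimizer.zero_grad()
    loss = loss_fn(model(inputs), labels)
    loss.backward()
    optimizer.step()
    # Averages across the global process group every `period` steps
    # once warmup is over.
    averager.average_parameters(model.parameters())
```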
cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70974
Reviewed By: dagitses
Differential Revision: D33474576
Pulled By: rohan-varma
fbshipit-source-id: 4c7043108897848bde9503d77999971ad5567aa6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65197
1. The constructor accepts a local optimizer instance instead of the inputs of the local optimizer's constructor and its class type.
2. The parameters are read from the local optimizer's `param_groups` instead of a separate input.
Proposal: https://github.com/pytorch/pytorch/issues/59699
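Roughly, construction now looks like this (a sketch; `model` is a placeholder and the process group is assumed initialized):

```python
import torch
import torch.distributed.algorithms.model_averaging.averagers as averagers
from torch.distributed.optim import PostLocalSGDOptimizer

local_optim = torch.optim.SGD(model.parameters(), lr=0.01)
averager = averagers.PeriodicModelAverager(period=4, warmup_steps=100)
# The local optimizer instance is passed in directly, and its
# param_groups are reused; no separate parameter list is needed.
opt = PostLocalSGDOptimizer(optim=local_optim, averager=averager)
```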
ghstack-source-id: 138307226
Test Plan: buck test mode/dev-nosan //caffe2/test/distributed:distributed_nccl_spawn -- test_post_localSGD_optimizer_parity
Reviewed By: rohan-varma
Differential Revision: D31007439
fbshipit-source-id: bbb0526e6763ef76775b85088571506b3942c722
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63895
When updating the model parameters, writing to `parameter.data` is no longer recommended, because this `data` field will be deprecated in the future.
The replacement is `Tensor.set_`.
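A minimal sketch of the pattern change, with hypothetical tensors:

```python
import torch

param = torch.nn.Parameter(torch.zeros(4))
averaged = torch.ones(4)

# Old pattern (discouraged): assigning through the soon-to-be-deprecated
# .data field.
# param.data = averaged

# New pattern: replace the parameter's contents in place via Tensor.set_,
# under no_grad so autograd does not object to the in-place update.
with torch.no_grad():
    param.set_(averaged)
```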
ghstack-source-id: 136593433
Test Plan:
buck test mode/dev-nosan //caffe2/test/distributed:distributed_nccl_spawn -- test_periodic_model_averager
buck test mode/dev-nosan //caffe2/test/distributed:distributed_nccl_spawn -- test_post_localSGD_optimizer_parity
Reviewed By: SciPioneer
Differential Revision: D30526178
fbshipit-source-id: a1ac0ec3665d8623edd5bf94f01c1132daff5c00