177 Commits

Author SHA1 Message Date
ProGamerGov
71d50f4f89 Change docstring type callable to Callable for consistency (#82487)
### Description

Across PyTorch's docstrings, both `callable` and `Callable` are used for variable types. `Callable` should be capitalized, as we are referring to the `Callable` type and not the Python `callable()` function.
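To make the distinction concrete, here is a minimal sketch (the function `apply_twice` is hypothetical and not part of PyTorch): `typing.Callable` is the type used in annotations and docstrings, while the builtin `callable()` is a runtime check.

```python
from typing import Callable


def apply_twice(fn: Callable[[int], int], value: int) -> int:
    """Apply ``fn`` to ``value`` two times.

    Args:
        fn (Callable): function mapping an int to an int.
        value (int): starting value.
    """
    return fn(fn(value))


# The builtin ``callable()`` answers "can this object be called?" at runtime;
# it is not a type and does not belong in a docstring's type position.
is_function = callable(apply_twice)
result = apply_twice(lambda x: x + 1, 0)
```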

### Testing

There shouldn't be any testing required.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/82487
Approved by: https://github.com/albanD
2022-08-01 17:26:09 +00:00
Olga Andreeva
a60907ec11 Adding fsdp fp16 and bf16 hooks (#81711)
Recently, `register_comm_hook` was introduced to `FSDP`, which at the moment supports only `NO_SHARD` strategy and has a default `all_reduce` hook implemented. This PR adds two lower precision hooks to an existing default hook.

I've also made slight adjustments to existing implementation of an `all_reduce` hook including:

- `AllReduceState` -> `DefaultState`, motivation: `AllReduceState` is not specific to `all_reduce`. Gradients' pre- and post-division factors are also useful for other hooks that require pre- and post-division, e.g. `fp16_hook` and `bf16_hook`.
- I've put all 3 hooks into `default_hooks.py`

Additionally, `FSDP` supports `MixedPrecision` and, theoretically, it is possible to specify `MixedPrecision` for gradients and attach a lower precision hook to the model. To avoid double-casting, I've added a couple of checks to `fully_sharded_data_parallel`, i.e. casting to precision and back is performed by a lower precision hook only. I think, as a next step, it would be nice to ensure that the user can't have both a lower precision hook and `MixedPrecision(reduce_dtype=<precision>)` specified, but I am happy to discuss this and adjust the current implementation.

As a test, I create two models: one with a lower precision hook and one with a `MixedPrecision(reduce_dtype=<precision>)` specified, perform one forward/backward and optimizer step and compare gradients.
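The idea behind a lower precision hook can be sketched without any distributed setup. The code below is only an emulation under stated assumptions: gradients are plain Python floats, `struct`'s half-precision format stands in for a cast to fp16, a list of per-rank gradients stands in for `all_reduce`, and only post-division is shown. The names `to_fp16` and `simulated_fp16_hook` are hypothetical and are not the functions in `default_hooks.py`.

```python
import struct
from typing import List


def to_fp16(value: float) -> float:
    """Round a Python float to the nearest IEEE half-precision value."""
    return struct.unpack("e", struct.pack("e", value))[0]


def simulated_fp16_hook(per_rank_grads: List[List[float]]) -> List[float]:
    """Average gradients across ranks while communicating in half precision."""
    world_size = len(per_rank_grads)
    # Cast every rank's gradient down to fp16 before "communication".
    compressed = [[to_fp16(g) for g in grads] for grads in per_rank_grads]
    # Stand-in for all_reduce(SUM): element-wise sum over ranks, kept in fp16.
    summed = [to_fp16(sum(column)) for column in zip(*compressed)]
    # Post-division by the world size; the result is a regular float again.
    return [s / world_size for s in summed]


averaged = simulated_fp16_hook([[1.0, 0.5], [3.0, 1.5]])
```

Values that are not exactly representable in half precision lose accuracy in the cast (for example, `to_fp16(0.1)` differs from `0.1`), which is why casting twice, once by `MixedPrecision` and once by the hook, is worth avoiding.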

PS. first version of this PR was reverted, because added unittests didn't include NCCL version checks for `bf16_hook` (thus failed on trunk). In this version, I've added appropriate checks for tests.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/81711
Approved by: https://github.com/rohan-varma
2022-07-19 23:54:51 +00:00
PyTorch MergeBot
a8f4011e90 Revert "Adding fsdp fp16 and bf16 hooks (#80557)"
This reverts commit f7d6828467.

Reverted https://github.com/pytorch/pytorch/pull/80557 on behalf of https://github.com/aovladi due to broke distributed tests on trunk
2022-07-19 03:11:19 +00:00
Olga Andreeva
f7d6828467 Adding fsdp fp16 and bf16 hooks (#80557)
Recently, `register_comm_hook` was introduced to `FSDP`, which at the moment supports only `NO_SHARD` strategy and has a default `all_reduce` hook implemented. This PR adds two lower precision hooks to an existing default hook.

I've also made slight adjustments to existing implementation of an `all_reduce` hook including:

- `AllReduceState` -> `DefaultState`, motivation: `AllReduceState` is not specific to `all_reduce`. Gradients' pre- and post-division factors are also useful for other hooks that require pre- and post-division, e.g. `fp16_hook` and `bf16_hook`.
- I've put all 3 hooks into `default_hooks.py`

Additionally, `FSDP` supports `MixedPrecision` and, theoretically, it is possible to specify `MixedPrecision` for gradients and attach a lower precision hook to the model. To avoid double-casting, I've added a couple of checks to `fully_sharded_data_parallel`, i.e. casting to precision and back is performed by a lower precision hook only. I think, as a next step, it would be nice to ensure that user can't have both lower precision hook and `MixedPrecision(reduce_dtype=<precision>)` specified, but I am happy to discuss this and adjust current implementation.

As a test, I create two models: one with a lower precision hook and one with a `MixedPrecision(reduce_dtype=<precision>)` specified, perform one forward/backward and optimizer step and compare gradients.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/80557
Approved by: https://github.com/rohan-varma
2022-07-18 22:40:56 +00:00
Jerome
547e499731 Enable Zero1's ddp_with_overlap for hpu backend (#80438)
Enable zero with ddp overlap feature along with a simple interface to insert functional optimizer to the map

Signed-off-by: Jerome <janand@habana.ai>

Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/80438
Approved by: https://github.com/rohan-varma, https://github.com/awgu
2022-07-18 15:05:27 +00:00
Rohan Varma
0c5fdfd95f Revert "Revert "[FSDP Optim State] Remove checkpoint prefix (#80480)"" (#80936)
This reverts commit fe361dede4.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80936
Approved by: https://github.com/awgu
2022-07-06 22:21:07 +00:00
PyTorch MergeBot
fe361dede4 Revert "[FSDP Optim State] Remove checkpoint prefix (#80480)"
This reverts commit 04c50fec1c.

Reverted https://github.com/pytorch/pytorch/pull/80480 on behalf of https://github.com/suo due to Broke master 04c50fec1c, the test failures were not unrelated
2022-07-06 02:43:27 +00:00
Rohan Varma
04c50fec1c [FSDP Optim State] Remove checkpoint prefix (#80480)
Remove `_checkpoint_wrapped_module` prefixes when creating keys for optimizer state_dict.

Having these does not actually create an issue for optim_state_dict save / load, but we'd like to strip these keys out for downstream code that consumes these APIs typically expecting checkpointing prefixes to not exist (as checkpointing should be a transparent operation which should not change module / parameter names).
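As an illustration of the key cleanup described above, here is a minimal sketch on a plain dictionary. It assumes the prefix appears in keys as `_checkpoint_wrapped_module.` (attribute name plus a dot); the helper `strip_checkpoint_prefix` is hypothetical and not the function used in FSDP.

```python
from typing import Any, Dict

# Assumed form of the prefix inside a fully qualified parameter name.
CHECKPOINT_PREFIX = "_checkpoint_wrapped_module."


def strip_checkpoint_prefix(state: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of ``state`` whose keys no longer contain the prefix."""
    return {key.replace(CHECKPOINT_PREFIX, ""): value for key, value in state.items()}


raw = {"layer1._checkpoint_wrapped_module.weight": 1, "layer2.bias": 2}
cleaned = strip_checkpoint_prefix(raw)
```

After the cleanup, downstream code sees the same names whether or not a module was wrapped for activation checkpointing.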
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80480
Approved by: https://github.com/awgu, https://github.com/fegin
2022-07-06 01:17:58 +00:00
Chien-Chin Huang
e0eeb06ec6 Consolidate the naming of named_parameter and state_dict for CheckpointWrapper (#80089)
named_parameter() should return the same parameter names as state_dict() but the current CheckpointWrapper does not enforce this naming rule. This PR resolves this issue.

Differential Revision: [D37344200](https://our.internmc.facebook.com/intern/diff/D37344200/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80089
Approved by: https://github.com/rohan-varma
2022-07-05 22:11:59 +00:00
Charlie Yan
ffae7308c9 Enable test: distributed/algorithms/quantization/test_quantization (#80097)
fixes  https://github.com/pytorch/pytorch/issues/69017
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80097
Approved by: https://github.com/wanchaol
2022-07-01 01:32:33 +00:00
PyTorch MergeBot
f667aaed1d Revert "Added serialization to postlocal_SGD. (#80435)"
This reverts commit dfdf4e79df.

Reverted https://github.com/pytorch/pytorch/pull/80435 on behalf of https://github.com/suo due to broke distributed tests on trunk, see: dfdf4e79df
2022-06-30 01:34:10 +00:00
Olga Andreeva
dfdf4e79df Added serialization to postlocal_SGD. (#80435)
Fixes #75666

This PR adds serialization for the `PostLocalSGD` communication hook and tests that the hook can be properly saved and restored. It is similar to https://github.com/pytorch/pytorch/pull/79334, where serialization was added to `PowerSGD`.

``__getstate__``

```
Returns:
    ``Dict[str, Any]`` which will be pickled and saved.
    ``process_group`` and ``subgroup`` are not serializable and excluded from
    a returned state.
```

``__setstate__``

```
Takes provided ``state`` and retrieves ``PostLocalSGDState``.
``process_group`` and ``subgroup`` are set to default process_group and subgroup respectively.
Default subgroup is equivalent to the subgroup on each node.
```

Small adjustment to `PowerSGD`'s warning message.

Refactored unittest, i.e. separated parity and log checks.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80435
Approved by: https://github.com/awgu
2022-06-29 23:59:46 +00:00
Rohan Varma
5fc2d45a3a Remove unneeded TODO (#80453)
This TODO is no longer needed, as we use `_register_fused_optim` to register the overlapped optimizer in DDP.  Also, remove comment about API being experimental, as this API is no longer going to be used by end user.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80453
Approved by: https://github.com/awgu
2022-06-29 01:19:48 +00:00
Olga Andreeva
5fc209ed11 FSDP communication hook interface for NO_SHARD strategy (#79833)
Fixes #79114

An implementation of a FSDP communication hook interface for a NO_SHARD strategy:
- `FullyShardedDataParallel.register_comm_hook(self, state: object, hook: callable)` checks the current sharding strategy. If it is other than NO_SHARD, it raises a runtime error. Otherwise, it sets and shares the specified hook and its state with all submodules.
- When FSDP is ready to communicate a gradient, it checks if there is a registered hook and calls it instead of all_reduce. Additionally, gradient pre- and post-division are not performed if a hook is registered.

To test the interface, I've implemented a communication hook that calls `all_reduce`.

A unittest:
- checks that if a sharding strategy is anything but NO_SHARD, a runtime error is raised
- checks that for a NO_SHARD case, a model with a registered all_reduce hook and a model without a hook work the same.
- checks for 2 types of FSDP models: with the wrapped first layer and without. (to make sure submodules have a hook registered)
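The registration rules above can be sketched in plain Python. This is an assumption-laden toy, not FSDP code: `ShardedModule`, its string-valued strategy, and `reduce_gradient` are all hypothetical, and a list of per-rank scalars stands in for real gradient communication.

```python
from typing import Any, Callable, List, Optional


class ShardedModule:
    """Hypothetical stand-in for an FSDP-wrapped module."""

    def __init__(self, sharding_strategy: str,
                 children: Optional[List["ShardedModule"]] = None):
        self.sharding_strategy = sharding_strategy
        self.children = children or []
        self.comm_state: Any = None
        self.comm_hook: Optional[Callable] = None

    def register_comm_hook(self, state: Any, hook: Callable) -> None:
        # Only the NO_SHARD strategy accepts a hook.
        if self.sharding_strategy != "NO_SHARD":
            raise RuntimeError("communication hooks require NO_SHARD")
        self._share_hook(state, hook)

    def _share_hook(self, state: Any, hook: Callable) -> None:
        # The same hook and state objects are shared with all submodules.
        self.comm_state, self.comm_hook = state, hook
        for child in self.children:
            child._share_hook(state, hook)

    def reduce_gradient(self, grads_per_rank: List[float]) -> float:
        if self.comm_hook is not None:
            # The hook owns communication and any division.
            return self.comm_hook(self.comm_state, grads_per_rank)
        # Default path: stand-in for all_reduce plus division by world size.
        return sum(grads_per_rank) / len(grads_per_rank)


def allreduce_hook(state: dict, grads_per_rank: List[float]) -> float:
    return sum(grads_per_rank) / state["world_size"]


root = ShardedModule("NO_SHARD", [ShardedModule("NO_SHARD")])
root.register_comm_hook({"world_size": 2}, allreduce_hook)
with_hook = root.reduce_gradient([1.0, 3.0])
without_hook = ShardedModule("NO_SHARD").reduce_gradient([1.0, 3.0])
```

The two results agree, which mirrors the parity check in the unittest description.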
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79833
Approved by: https://github.com/rohan-varma, https://github.com/awgu
2022-06-28 08:03:11 +00:00
anjali411
3bcc19b29a Add __all__ to various submodules in torch.fx, distributions, distributed, package (#80367)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80367
Approved by: https://github.com/albanD
2022-06-27 21:27:30 +00:00
Rohan Varma
2ede28724d [CheckpointWrapper] Replace generic mod prefix (#79830)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79830
Approved by: https://github.com/awgu, https://github.com/zhaojuanmao
2022-06-21 16:01:59 +00:00
Olga Andreeva
8a6d83079c Functionality/pickling for commhooks (#79334)
This PR addresses issue #75666.
Stateful communication hooks can now be saved and reloaded to resume training.

This PR adds this functionality for the PowerSGD communication hook and tests that the communication hook can be properly saved and restored.

The PowerSGD implementation uses `__slots__`; as a result, the introduced `__getstate__` and `__setstate__` methods are implemented to work with `__slots__` and not `__dict__`.

`__getstate__`

    Returns:
        A dictionary that represents a ``PowerSGDState`` which will be pickled and saved.
        ``process_group`` is non-serializable and excluded from a returned state.

`__setstate__`

    Takes a provided ``state`` and retrieves ``PowerSGDState``.
    ``process_group`` is set to default with a proper warning issued to a user.
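A minimal sketch of the pattern described above, assuming a toy class: `HookState`, its slot names, and the `"default_group"` placeholder are hypothetical and do not reproduce `PowerSGDState`. A `threading.Lock` plays the role of the non-serializable process group.

```python
import logging
import pickle
import threading

logger = logging.getLogger("hook_state_sketch")  # hypothetical logger name


class HookState:
    """Hypothetical slot-based hook state with one non-picklable field."""

    __slots__ = ["process_group", "matrix_rank", "iteration"]

    def __init__(self, process_group, matrix_rank, iteration=0):
        self.process_group = process_group
        self.matrix_rank = matrix_rank
        self.iteration = iteration

    def __getstate__(self):
        # Build the state from __slots__ (there is no __dict__) and
        # leave out the field that cannot be pickled.
        return {
            slot: getattr(self, slot)
            for slot in self.__slots__
            if slot != "process_group"
        }

    def __setstate__(self, state):
        # Fall back to a default group and tell the user about it.
        self.process_group = "default_group"
        logger.warning("process_group was not restored; using the default")
        for slot, value in state.items():
            setattr(self, slot, value)


original = HookState(threading.Lock(), matrix_rank=4, iteration=10)
payload = pickle.dumps(original.__getstate__())
restored = HookState.__new__(HookState)
restored.__setstate__(pickle.loads(payload))
```

The round trip pickles the dictionary returned by `__getstate__` and feeds it to `__setstate__` by hand, which is the same sequence `pickle` performs when given the object itself.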

Unit test

A hook-independent `_test_hook_pickling` is added with this PR, as well as `test_ddp_hook_pickling_powerSGD`, which tests `powerSGD`’s ability to be saved and reloaded.

Currently, the test creates a ddp model with a provided hook, trains it for 10 epochs and saves model’s state and hook’s state.
During reloading, unit test makes sure that a warning was logged (only one warning and the proper one). It then proceeds to check that reloaded hook and original hook are the same. Finally, it checks that a hook’s state was properly initialized:
	- it compares slot values (all, but 2: `process_group` and `rng`) for original and reloaded state
	- it checks that process group was set to a default group
	- it checks that a random state was restored properly with np.testing.assert_array_equal, because `rng` is an instance of `np.random.RandomState`, represented by a tuple. One of entries is of `ndarray dtype[uint32]` type and `np.testing.assert_array_equal` is used for assertion.

Future To-Do:
	- Implement similar __getstate__ and __setstate__ for other stateful communication hooks
	- Add appropriate tests

Pull Request resolved: https://github.com/pytorch/pytorch/pull/79334
Approved by: https://github.com/rohan-varma, https://github.com/awgu
2022-06-16 23:15:34 +00:00
Rohan Varma
543919cfc8 Forward attributes to wrapped module
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78854

Approved by: https://github.com/albanD
2022-06-14 01:13:33 +00:00
Rohan Varma
44fe851feb [WIP] Fix non-reentrant hooks based checkpointing
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78752

Approved by: https://github.com/albanD
2022-06-14 01:13:33 +00:00
Rohan Varma
ec86070922 Checkpoint util
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78704

Approved by: https://github.com/zhaojuanmao
2022-06-10 18:37:36 +00:00
Rohan Varma
f9f8127414 CheckpointWrapper state_dict fix (#77224)
- Uses state dict / load state dict hooks to ensure that modules wrapped with `CheckpointWrapper` can be loaded into non-checkpointed wrapped module.

This is because a training run can use activation checkpointing, then we can recover `state_dict`, and a future run may not want to wrap modules with activation checkpointing or decide to change activation checkpoint wrapping structure. To support this, we add hooks to remove / add the relevant prefix as needed.

Tests are added to ensure we can load into CheckpointWrapper module as well as local module from CheckpointWrapper-wrapped module. state_dict with FSDP is also verified.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77224
Approved by: https://github.com/zhaojuanmao
2022-05-17 03:39:31 +00:00
wayi1
5ab8afe487 [Model Averaging] Support disabling post-local gradient sync (#76723)
I find that disabling intra-subgroup gradient allreduce can still give satisfactory accuracy in some cases, so it is better to make such gradient averaging configurable. This does not take into account the saving in the communication of allreducing gradients.
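The configurable behavior can be pictured with a small emulation. Everything here is an assumption made for illustration: the function `hook_output`, its parameter names, and the use of a list of per-rank scalars in place of real allreduce calls.

```python
from typing import List, Sequence


def hook_output(rank: int,
                grads: Sequence[float],
                subgroup: List[int],
                iteration: int,
                start_local_sgd_iter: int,
                post_local_gradient_allreduce: bool) -> float:
    """Gradient seen by ``rank`` after the (emulated) communication hook."""
    if iteration < start_local_sgd_iter:
        # Before local SGD starts: average over all ranks.
        return sum(grads) / len(grads)
    if post_local_gradient_allreduce:
        # Local SGD phase: average only inside this rank's subgroup.
        return sum(grads[r] for r in subgroup) / len(subgroup)
    # Gradient sync disabled: the local gradient is used as is.
    return grads[rank]


grads = [1.0, 3.0, 5.0, 7.0]  # one scalar gradient per rank
before = hook_output(0, grads, [0, 1], 5, 10, True)
synced = hook_output(0, grads, [0, 1], 20, 10, True)
unsynced = hook_output(0, grads, [0, 1], 20, 10, False)
```

In this picture, disabling the sync removes the subgroup communication step entirely once the local SGD phase has started.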

Pull Request resolved: https://github.com/pytorch/pytorch/pull/76723
Approved by: https://github.com/rohan-varma
2022-05-16 18:09:09 +00:00
Yi Wang
25fa6235f4 [Model Averaging] Make an error message more clear in hierarchical_model_averager.py
As title
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75832
Approved by: https://github.com/mrshenli
2022-04-26 15:20:51 +00:00
wayi1
e90580390d [Model Averaging] Make the error message more informative in hierarchical_model_averager.py
As title
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76277
Approved by: https://github.com/rohan-varma
2022-04-24 15:10:19 +00:00
magialiao
7c8c8cc248 Use batched operations for PowerSGD
This PR is a rebased version of #75157 which fixes CI issues
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76041
Approved by: https://github.com/albanD, https://github.com/rohan-varma
2022-04-21 03:25:09 +00:00
Alban Desmaison
da3c848dfa Make distributed raise ImportError when not available
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75975

Approved by: https://github.com/mrshenli
2022-04-20 13:05:18 +00:00
PyTorch MergeBot
c5d57e7be9 Revert "Use batched operations for PowerSGD"
This reverts commit 5654e63398.

Reverted https://github.com/pytorch/pytorch/pull/75157 on behalf of https://github.com/albanD
2022-04-18 13:10:29 +00:00
magialiao
5654e63398 Use batched operations for PowerSGD
This implements method proposed in #74907

Pull Request resolved: https://github.com/pytorch/pytorch/pull/75157
Approved by: https://github.com/wayi1, https://github.com/rohan-varma
2022-04-18 04:34:17 +00:00
Haijunlv
08f3b95857 fix PostLocalSGDOptimizer and ModelAverager average bug
Fixes #74157

Pull Request resolved: https://github.com/pytorch/pytorch/pull/74894
Approved by: https://github.com/rohan-varma, https://github.com/wayi1
2022-04-13 11:41:27 +00:00
wayi1
4fb7fa081e [Model Averaging] Code simplification for _find_process_group function (#75007)
Summary:
Previously the highest-level process group in `period_process_group_dict` could be `None`, indicating the global group. Now `period_process_group_dict` cannot contain `None` as a process group, so the function `_find_process_group` can just return a process group instead of a tuple -- when not found, just return `None`, because now the returned process group cannot be `None`.
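A minimal sketch of the simplified lookup, assuming the dictionary maps an averaging period to a process group and that the largest matching period wins. The function name and the strings used in place of process groups are hypothetical.

```python
from typing import Dict, Optional


def find_process_group(step: int,
                       period_process_group_dict: Dict[int, str]) -> Optional[str]:
    """Return the group for the largest period dividing ``step``, else None."""
    for period in sorted(period_process_group_dict, reverse=True):
        if step % period == 0:
            return period_process_group_dict[period]
    # None can now only mean "not found", since no stored group is None.
    return None


groups = {2: "intra_node_group", 8: "global_group"}  # strings stand in for groups
at_16 = find_process_group(16, groups)
at_6 = find_process_group(6, groups)
at_3 = find_process_group(3, groups)
```

Because a stored group can never be `None`, the caller no longer needs a separate "found" flag alongside the returned group.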

Proposal: https://github.com/pytorch/pytorch/issues/71325

Pull Request resolved: https://github.com/pytorch/pytorch/pull/75007

Reviewed By: awgu

Differential Revision: D35357816

Pulled By: rohan-varma

fbshipit-source-id: 4522dba49797df7140227bfd822d668b7e118a66
(cherry picked from commit 77ca01b555d52685283c969176b08de4ff46c32d)
2022-04-04 20:31:22 +00:00
Yi Wang
2aebece625 [Model Averaging] Remove unused variable world_size in post_localSGD_hook.py (#74803)
Summary:
As title

Pull Request resolved: https://github.com/pytorch/pytorch/pull/74803

Reviewed By: albanD

Differential Revision: D35175613

Pulled By: mrshenli

fbshipit-source-id: 881933656ed214554b8acb4c5756349cea0af51d
(cherry picked from commit 033efb2eea856d00d5e78c8a99d726c6cf69d714)
2022-03-28 17:41:26 +00:00
wayi1
5fbe8b1966 [Model Averaging] Make HierarchicalModelAverager a subclass of averagers.ModelAverager
Making `HierarchicalModelAverager` a subclass of `averagers.ModelAverager` is a preparation step for incorporating hierarchical SGD into `PostLocalSGDOptimizer`.

Proposal: https://github.com/pytorch/pytorch/issues/73382
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74564
Approved by: https://github.com/rohan-varma
2022-03-24 21:52:00 +00:00
wayi1
5993f48711 [Model Averaging] Add a reference to hierarchical SGD (#73823)
Summary:
Add a reference.

Also fix the comment: unlike `averagers.py`, currently this is not a base class that many subclasses can inherit from.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/73823

Reviewed By: ejguan

Differential Revision: D34684366

Pulled By: rohan-varma

fbshipit-source-id: e253ed39ba0783ad73bfd889e9a2e7d0c9214a3a
(cherry picked from commit a9fec3585078881ccd5886ebb27e52b15f7181b1)
2022-03-08 05:56:17 +00:00
wayi1
0bb3b0652c [Model Averaging] Support hierarchical model averaging (#73285)
Summary:
Implement hierarchical model averaging proposed in https://github.com/pytorch/pytorch/issues/71325.

Unit tests are added. Since I don't have access to 4-GPU machines in open-source environment, expect that the branch with the prefix of `ci-all` can run the test that requires 4 GPUs.

In the future, the internals of `PeriodicModelAveraging` can be simplified as an implementation of a specialized hierarchical model averaging, where `period_group_size_dict` only has a pair of period and world size.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/73285

Reviewed By: mrshenli

Differential Revision: D34457792

Pulled By: rohan-varma

fbshipit-source-id: 39a6c5bf8a2852b6394a56abbad17b8a909b9fba
(cherry picked from commit 5f543d46103edb515db199dbb80db43c85665f29)
2022-03-04 18:29:36 +00:00
Andrew Gu
59dd84cab6 [Join][BE] Fix typo; remove obsolete method (#72886)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72886

**Test Plan**
Searching for `_schedule_shadow_all_reduce_for_fwd_pass` shows that it is defined but never used.

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D34255651

Pulled By: awgu

fbshipit-source-id: 205a0325c2cdc05e127a183cb86fa2fc2e0db99d
(cherry picked from commit 4492f03a3f)
2022-02-16 15:03:09 +00:00
Rohan Varma
aeacf910b5 [Checkpoint] Rename file (#72748)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72748

Removes underscore from file/class as directory is already private
ghstack-source-id: 149109295

Test Plan: Ci

Reviewed By: samdow

Differential Revision: D34179308

fbshipit-source-id: 8e956f3c83f21159c5e0fcdce09624ecb8a73ac0
(cherry picked from commit adfd8bc357)
2022-02-16 00:08:23 +00:00
wayi1
8b08478115 Fix the doc of PostLocalSGDState (#72792)
Summary:
The first arg of the `PostLocalSGDState` ctor, `process_group`, cannot be empty. To simplify usage, the example does not even create a subgroup explicitly.

See the example in unit test: 4feef6c970/torch/testing/_internal/distributed/distributed_test.py (L4260)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/72792

Reviewed By: samdow

Differential Revision: D34213221

Pulled By: rohan-varma

fbshipit-source-id: 078343f3ee138e175bf835897f190032eb970662
(cherry picked from commit bf90af704f)
2022-02-15 23:47:12 +00:00
Yuxin Wu
1ed4653e89 Stop writing logs to root logger (#72649)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/72648
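For context, a short sketch of the general pattern the title refers to; the logger name `my_package.hooks` and the function `report_progress` are hypothetical. Module-level functions such as `logging.info()` write to the root logger and implicitly call `logging.basicConfig()` when the root logger has no handlers, which changes the logging setup of the importing application. A named logger avoids that.

```python
import logging

# A named logger; inside a library module this is usually
# logging.getLogger(__name__). The name below is only an example.
logger = logging.getLogger("my_package.hooks")


def report_progress(step: int) -> None:
    # Goes through the named logger, so applications can filter or
    # silence it without touching the root logger's configuration.
    logger.info("finished step %d", step)


report_progress(1)
uses_root = logger is logging.getLogger()
```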

Pull Request resolved: https://github.com/pytorch/pytorch/pull/72649

Reviewed By: soulitzer

Differential Revision: D34172113

Pulled By: mrshenli

fbshipit-source-id: 98cb4140b978a0d9fa53876e427ea3b8bbe884cf
(cherry picked from commit c14297cee6)
2022-02-11 21:30:53 +00:00
Brian Muse
8bf3179f6e #71946 Remove Python 3.6 references (#72211)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/71946

This commit removes some bits of code that were hard coded for Python 3.6 support from the `.circleci` and `torch` folders. It should only be merged if https://github.com/pytorch/pytorch/issues/66462 is complete.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/72211

Reviewed By: dagitses, seemethere

Differential Revision: D33982604

Pulled By: musebc

fbshipit-source-id: 8f453bf9909df615addd59538adb369c65484044
(cherry picked from commit 944a9970fe)
2022-02-08 03:46:20 +00:00
Omar
25f9fe22a9 [PowerSGD] Add orthogonalization with QR factorization (#72043)
Summary:
### 🚀 The feature, motivation and pitch
Following the discussion in https://github.com/pytorch/pytorch/issues/65813, I added the QR factorization to powerSGD_hook.py
Gram-Schmidt orthogonalization can't be fully replaced because _torch.linalg.qr_ doesn't work with half-precision. Moreover, in my tests, it works faster with a rank lower than 3.
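For readers unfamiliar with the procedure being compared against QR, here is a minimal pure-Python sketch of Gram-Schmidt orthogonalization (the modified variant) on the columns of a small matrix. It assumes linearly independent columns and is not the code in powerSGD_hook.py; `orthogonalize_columns` is a hypothetical name.

```python
import math
from typing import List

Matrix = List[List[float]]  # a matrix stored as a list of rows


def orthogonalize_columns(matrix: Matrix) -> Matrix:
    """Return a copy of ``matrix`` whose columns are orthonormal."""
    rows, cols = len(matrix), len(matrix[0])
    columns = [[matrix[r][c] for r in range(rows)] for c in range(cols)]
    for i in range(cols):
        # Normalize the current column.
        norm = math.sqrt(sum(x * x for x in columns[i]))
        columns[i] = [x / norm for x in columns[i]]
        # Remove its component from every later column.
        for j in range(i + 1, cols):
            dot = sum(a * b for a, b in zip(columns[i], columns[j]))
            columns[j] = [b - dot * a for a, b in zip(columns[i], columns[j])]
    # Convert back from columns to rows.
    return [[columns[c][r] for c in range(cols)] for r in range(rows)]


q = orthogonalize_columns([[3.0, 1.0], [4.0, 2.0]])
```

The work grows with the square of the number of columns, which is consistent with this approach staying competitive only at small ranks.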

This is one sample experiment timing powerSGD_hook on ResNext101 with the two different methods:
![Screenshot from 2022-01-31 18-14-00](https://user-images.githubusercontent.com/42100908/151840929-270c67dd-9fe7-4f11-8e70-8bf2d0ba678d.png)

### Alternatives
Use _torch.orgqr(*torch.geqrf(matrix))_. From my tests, its performance is similar to _torch.linalg.qr_.

### Additional context
_No response_

Pull Request resolved: https://github.com/pytorch/pytorch/pull/72043

Reviewed By: albanD

Differential Revision: D34042781

Pulled By: cbalioglu

fbshipit-source-id: e331179d3b7ac40d445b651fc473b16ae4ead462
(cherry picked from commit f64bf3839a)
2022-02-07 21:15:40 +00:00
Yanli Zhao
2336571cb7 make fsdp folder to be public (#72084)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72084

make fsdp folder to be public
ghstack-source-id: 148173447

Test Plan: unit tests

Reviewed By: mrshenli

Differential Revision: D33903417

fbshipit-source-id: 7852a2adc4af09af48a5ffa52ebf210489f834d5
(cherry picked from commit bd06513cfe)
2022-02-02 15:50:14 +00:00
Rohan Varma
8fa5cde3a9 Fix hooks (#71970)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71970

- Provide default arg for power SGD convenience wrapper that matches the main API default

Test Plan: CI

Reviewed By: H-Huang

Differential Revision: D33837457

fbshipit-source-id: 8f4efab4992b3fff09456a18db2c83e087c25bdf
(cherry picked from commit 83f52fb3c7)
2022-01-28 23:07:33 +00:00
Rohan Varma
bdcdf94bdd [Opt Overlap] Clean up code in _OptimizerHookState (#71620)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71620

Remove from_functional_optim and make it the default constructor since
that is the only way _OptimizerHookState is now being built. Also, no longer
need to expose create_functional_optim helper function
ghstack-source-id: 147577174

Test Plan: CI

Reviewed By: cbalioglu

Differential Revision: D33700593

fbshipit-source-id: ba089ce3bf66ccf8f71cffdd0f4d4bddc03e8b14
(cherry picked from commit a50b2caf0e)
2022-01-26 19:33:49 +00:00
Rohan Varma
1c8fcc44cb [Opt Overlap] Support optimizing partial set of parameters (#71608)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71608

Per title
ghstack-source-id: 147577178

Test Plan: CI

Reviewed By: cbalioglu

Differential Revision: D33696382

fbshipit-source-id: 5b638d3edf5f03ba476356d61e96ca604de18c8f
(cherry picked from commit 436b547fb0)
2022-01-26 19:33:49 +00:00
Rohan Varma
8273912a8c [Opt Overlap] Implement _OverlappedOptimizer (#71605)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71605

ghstack-source-id: 147577173

Test Plan: CI

Reviewed By: cbalioglu

Differential Revision: D33692686

fbshipit-source-id: b0fdb45245d923e1de8fef4431d3e235ac57dcbf
(cherry picked from commit 8b83dbf690)
2022-01-26 07:32:04 +00:00
Rohan Varma
f5a71ec2d6 [Opt Overlap] Implement as_functional_optim and create_functional_optim (#71604)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71604

Implement 2 helper functions:
- as_functional_optim which takes in a torch.optim class type and arguments and
  creates the corresponding functional optimizer.
- create_functional_optim which takes in the functional optimizer class type
  and constructs it. Note that as_functional_optim calls into
  create_functional_optim.

  The first will be used in future PRs as described in
  https://github.com/pytorch/pytorch/issues/67570 to create a functional
  optimizer from a traditional optimizer. The latter is used in
  _OptimizerHookState to create a functional optimizer.

  Both new helper functions are covered by unittests.
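  The relationship between the two helpers can be sketched as follows. This is a toy under stated assumptions: `SGD`, `FunctionalSGD`, and `FUNCTIONAL_OPTIM_MAP` are hypothetical stand-ins, and the real helpers may take different arguments.

```python
from typing import Any, Dict


class SGD:
    """Hypothetical stand-in for a traditional optimizer class."""


class FunctionalSGD:
    """Hypothetical stand-in for the matching functional optimizer."""

    def __init__(self, params: list, lr: float):
        self.params = params
        self.lr = lr


# Assumed lookup table from optimizer class to functional counterpart.
FUNCTIONAL_OPTIM_MAP: Dict[type, type] = {SGD: FunctionalSGD}


def create_functional_optim(functional_cls: type, *args: Any, **kwargs: Any):
    # Construct the functional optimizer with an empty parameter list;
    # parameters are supplied later, when gradients become available.
    return functional_cls([], *args, **kwargs)


def as_functional_optim(optim_cls: type, *args: Any, **kwargs: Any):
    if optim_cls not in FUNCTIONAL_OPTIM_MAP:
        raise ValueError(f"{optim_cls} has no functional counterpart")
    # Delegates construction to create_functional_optim.
    return create_functional_optim(FUNCTIONAL_OPTIM_MAP[optim_cls], *args, **kwargs)


optim = as_functional_optim(SGD, lr=0.1)
```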
ghstack-source-id: 147577170

Test Plan: CI

Reviewed By: cbalioglu

Differential Revision: D33688995

fbshipit-source-id: 8b2daafd1b914efa90877cc4313aa9a428546fc1
(cherry picked from commit 42fdae2991)
2022-01-25 18:32:13 +00:00
Rohan Varma
281663955f [Opt Overlap] Create Optimizer Hook State directly from functional optim (#71602)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71602

The design in https://github.com/pytorch/pytorch/issues/67570 requires
`_OptimizerHookState` to be created directly from a functional optimizer. Add
support and tests for this. Also refactor a few tests.
ghstack-source-id: 147577175

Test Plan: CI

Reviewed By: cbalioglu

Differential Revision: D33687477

fbshipit-source-id: f3c789aa77773f918e01a8d0cf08739b2edf07b3
(cherry picked from commit 4851e1c6d4)
2022-01-25 18:32:13 +00:00
Rohan Varma
9b3a56eecf [Optimizer Overlap] Move hooks to own file (#71601)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71601

Moves current prototype optimizer overlap to its own file for a better
namespace. No code changes besides a few comment fixes. Note that this code is
still prototype and not expected to be used by an end user.
ghstack-source-id: 147458826

Test Plan: CI

Reviewed By: cbalioglu

Differential Revision: D33662678

fbshipit-source-id: 3cc931323230a4b66c02b9e6f744aaf5c48d4d34
(cherry picked from commit 5070595c7f)
2022-01-23 00:04:32 +00:00
Rohan Varma
d8abe813bc [LocalSGD] Move feature to Beta, clean up some docs (#71621)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71621

Moves this feature to beta as discussed, and cleans up some docs.
Synced offline with wayi1 who mentioned that the current names are preferred
as he works to prototype hierarchical allreduce as discussed in this RFC: https://github.com/pytorch/pytorch/issues/71325.
ghstack-source-id: 147382940

Test Plan: CI

Reviewed By: zhaojuanmao

Differential Revision: D33700444

fbshipit-source-id: 8eb543f5b02a119d0790a5c0919e6def6383a067
(cherry picked from commit 656e9809b2)
2022-01-21 21:10:42 +00:00
Omar Younis
569aeec1bc fix typo in debugging_hooks.py (#70956)
Summary:
I just fixed a small typo in the debugging_hooks documentation

cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70956

Reviewed By: jbschlosser

Differential Revision: D33508898

Pulled By: dagitses

fbshipit-source-id: fc5935e5a2e2ddc45657a22d3b33a11aba378d9b
2022-01-10 12:59:42 -08:00