Commit Graph

275 Commits

Aaron Gokaslan
1e2d82b8e4 [BE] Merge isinstance calls together (#94419)
Simplifies and speeds up isinstance calls by checking for multiple types at the same time.
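
As an illustration (a hypothetical example, not code from the PR), merging the checks means passing a tuple of types to a single `isinstance` call instead of chaining separate calls with `or`:

```python
# Hypothetical example of the rewrite: one isinstance call with a tuple
# of types replaces a chain of `isinstance(...) or isinstance(...)`.
def describe(value):
    # Before: isinstance(value, int) or isinstance(value, float)
    if isinstance(value, (int, float)):
        return "number"
    # Before: isinstance(value, list) or isinstance(value, tuple)
    if isinstance(value, (list, tuple)):
        return "sequence"
    return "other"
```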

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94419
Approved by: https://github.com/ezyang
2023-02-09 00:47:26 +00:00
Aaron Gokaslan
748bac8757 [BE]: Apply pyupgrade yield from and unit test alias upgrades (#94309)
Applies some more harmless pyupgrade fixes. This one gets rid of deprecated aliases in unit tests and upgrades more `yield`-in-a-for-loop patterns into `yield from` generators, which are more performant and propagate more information and exceptions from the original generator. This is the modern recommended way of forwarding generators.
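
A minimal sketch (hypothetical function names) of the rewrite pyupgrade applies:

```python
# Hypothetical before/after of the pyupgrade rewrite. `yield from` is
# faster than a manual loop and also forwards send()/throw() and the
# return value of the inner generator, which the loop does not.
def inner():
    yield 1
    yield 2

def forward_old(gen):
    for item in gen:   # pre-upgrade pattern
        yield item

def forward_new(gen):
    yield from gen     # modern recommended form
```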
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94309
Approved by: https://github.com/albanD
2023-02-07 20:08:58 +00:00
Rohan Varma
264c89658b Move in backward opt setup to helper (#92059)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/92059
Approved by: https://github.com/awgu
2023-02-02 23:57:14 +00:00
Rohan Varma
975feb606e [DDP][Easy] Remove unused var (#93128)
Removes this unused var. The overall buffer comm hook feature is also not being used; we should deprecate/remove it while it is still a private API.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/93128
Approved by: https://github.com/awgu
2023-01-27 18:08:29 +00:00
Shen Li
0035340488 Allow DDP to handle custom dataclass forward outputs (#92334)
Differential Revision: [D42554973](https://our.internmc.facebook.com/intern/diff/D42554973)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/92334
Approved by: https://github.com/zhaojuanmao
2023-01-18 14:51:37 +00:00
kshitij12345
745fe35df5 [follow-up] Python Attr Serialization (#88913)
Ref: https://github.com/pytorch/pytorch/pull/81616#issuecomment-1307595402
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88913
Approved by: https://github.com/albanD
2023-01-13 17:38:51 +00:00
joncrall
ad782ff7df Enable xdoctest runner in CI for real this time (#83816)
Builds on #83317 and enables running the doctests. Just need to figure out what is causing the failures.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/83816
Approved by: https://github.com/ezyang, https://github.com/malfet
2022-12-29 05:32:42 +00:00
Rohan Varma
e8bf7c21e4 Integrate apply_optim_in_backward with DDP (#89194)
Allow _apply_optim_in_backward to work with DDP.

Example:

```
import torch
import torch.distributed as dist
from torch.distributed.optim import _apply_optimizer_in_backward
from torch.nn.parallel import DistributedDataParallel as DDP

# `rank` is this process's rank; `enc` is the user's model class.
dist.init_process_group("nccl", rank=rank, world_size=2)
torch.cuda.set_device(rank)
e = enc().cuda(rank)
_apply_optimizer_in_backward(
    optimizer_class=torch.optim.SGD,
    params=e.parameters(),
    optimizer_kwargs={"lr": 0.03},
)
e = DDP(e, device_ids=[rank])
inp = torch.randn(1, 10, device=rank)
e(inp).sum().backward()
```

Constraints:

1. Custom communication hooks are not yet supported.
2. `_apply_optim_in_backward` needs to be called _before_ wrapping the model in DDP.
3. DDP will remove the gradient hooks that `_apply_optim_in_backward` registers, so these gradient hooks will not be fired and cannot be used.
4. All DDP-managed parameters have their grads set to None by default once the optimizer is applied. There is no support for setting only some parameter grads to None; this must be done manually by the user (and `DDP_OVERLAPPED_OPTIM_SET_GRADS_TO_NONE=0` needs to be set).

Differential Revision: [D41329694](https://our.internmc.facebook.com/intern/diff/D41329694/)

**NOTE FOR REVIEWERS**: This PR has internal Meta-specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D41329694/)!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89194
Approved by: https://github.com/zhaojuanmao
2022-12-21 07:35:19 +00:00
Sergii Dymchenko
9ef1d55e6b Fix non-existing parameters in docstrings in torch/nn (#90596)
This is a continuation of https://github.com/pytorch/pytorch/pull/90505

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90596
Approved by: https://github.com/lezcano
2022-12-10 14:37:31 +00:00
Ram Rachum
77f9b2e8bf Fix exception causes in fx, nn and onnx packages (#90134)
This is a continuation of #90118

@kit1980
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90134
Approved by: https://github.com/kit1980
2022-12-06 04:34:58 +00:00
Andrew Gu
bfffc8d8ef [DDP][Docs] Add warning that no_sync() should include forward (#89244)
The issue where the user only includes `loss.backward()` inside `no_sync()` but not the forward pass has arisen several times now. I think adding an explicit warning in the docs is worthwhile.

Rendered doc:
<img width="769" alt="Screen Shot 2022-11-17 at 9 21 32 PM" src="https://user-images.githubusercontent.com/31054793/202602005-22c000b7-1093-4eaf-ba66-9c929a66906b.png">
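
A toy stand-in (an assumed simplification, not the real DDP internals) illustrating why the warning is needed: DDP latches the sync decision during the forward pass, so a `no_sync()` that wraps only `backward()` has no effect:

```python
from contextlib import contextmanager

class ToyDDP:
    """Toy model of DDP's no_sync semantics (assumption: simplified)."""
    def __init__(self):
        self.require_sync = True   # flag consulted at forward time
        self._latched = True

    @contextmanager
    def no_sync(self):
        old, self.require_sync = self.require_sync, False
        try:
            yield
        finally:
            self.require_sync = old

    def forward(self):
        # the decision to all-reduce is latched here, not in backward()
        self._latched = self.require_sync

    def backward(self):
        return "all_reduce" if self._latched else "skip"

m = ToyDDP()
m.forward()               # forward outside no_sync latches sync=True
with m.no_sync():
    wrong = m.backward()  # too late: gradients are still all-reduced

with m.no_sync():
    m.forward()           # forward inside no_sync latches sync=False
right = m.backward()
```

`wrong` ends up as `"all_reduce"` while `right` is `"skip"`, mirroring the documented requirement that the forward pass sit inside `no_sync()`.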

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89244
Approved by: https://github.com/zhaojuanmao
2022-11-18 22:06:24 +00:00
Colin Taylor
24b9890f03 [torchrec] [composable] update ShardedEmbeddingBagCollection to be use registered EBCs with shardedTensors as registered modules (#758) (#88026)
Summary:
X-link: https://github.com/pytorch/torchrec/pull/758

This PR fixes a bug in FSDP/DDP where ShardedTensors are not supported even if passed in as params to ignore.
This is important for composability because TorchRec `named_parameters()` will return FQNs of ShardedTensors (as defined in the goals).
It defines the device of a ShardedTensor to be None when `local_tensor()` does not exist on the rank.

update ShardedEmbeddingBagCollection to be composable according to https://docs.google.com/document/d/1TBJSd5zgEg6cRcXv3Okuj7bBkqQwGS2IPh4TLWNNzFI/edit

Differential Revision: D40458625

Pull Request resolved: https://github.com/pytorch/pytorch/pull/88026
Approved by: https://github.com/wanchaol, https://github.com/rohan-varma
2022-11-17 04:26:13 +00:00
Charlie Yan
8523c45717 Delete stub file to enable mypy check (#4649) (#88701)
Summary:
X-link: https://github.com/facebookresearch/detectron2/pull/4649

Context in https://fburl.com/4irjskbe

This change deletes distributed.pyi so that lintrunner will run mypy on distributed.py for type checking.

Test Plan: CI

Differential Revision: D41028360

Pull Request resolved: https://github.com/pytorch/pytorch/pull/88701
Approved by: https://github.com/zhaojuanmao
2022-11-09 20:29:34 +00:00
Will Constable
678d038001 Support DDP ignored parameters in DDPOptimizer (#88460)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88460
Approved by: https://github.com/aazzolini
2022-11-04 21:42:15 +00:00
Kazuaki Ishizaki
2ddefbdc3c Fix typos used in documents under torch directory (#88300)
This PR fixes typos in comments of Python files, found via the search box at https://pytorch.org/docs/master/search.html

Pull Request resolved: https://github.com/pytorch/pytorch/pull/88300
Approved by: https://github.com/lezcano
2022-11-02 09:38:13 +00:00
Horace He
12dd877395 Fix all references to torchdynamo from the merge (#87731)
cc @mlazos @soumith @voznesenskym @yanboliang @penguinwu @anijain2305 @EikanWang @jgong5 @Guobing-Chen @chunyuan-w @XiaobingSuper @zhuhaozhe @blzheng @Xia-Weiwen @wenzhe-nrv @jiayisunx @jansel
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87731
Approved by: https://github.com/yanboliang, https://github.com/ezyang, https://github.com/anijain2305, https://github.com/jansel
2022-10-31 06:51:07 +00:00
PyTorch MergeBot
641d8e0e69 Revert "Enable mypy check for distributed.py, and fix type errors (#87543)"
This reverts commit 2cc624cd43.

Reverted https://github.com/pytorch/pytorch/pull/87543 on behalf of https://github.com/weiwangmeta due to breaking internal builds
2022-10-28 02:20:25 +00:00
Charlie Yan
2cc624cd43 Enable mypy check for distributed.py, and fix type errors (#87543)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87543
Approved by: https://github.com/fduwjj
2022-10-27 00:22:54 +00:00
Charlie Yan
0294787bd6 Format distributed.py (#87667)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87667
Approved by: https://github.com/zhaojuanmao
2022-10-26 06:02:30 +00:00
Charlie Yan
bebd162249 Fix doc of DDP (#86244) (#86256)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/86256
Approved by: https://github.com/rohan-varma
2022-10-06 00:48:56 +00:00
Rohan Varma
be4e43c7d0 Remove DataParallel remnants from DDP doc (#86221)
As @aazzolini pointed out, the docstring is incorrect and probably a vestige of DP / single-process multi-device mode in DDP.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/86221
Approved by: https://github.com/aazzolini
2022-10-05 22:30:02 +00:00
Will Constable
32fc0b958e Expose get_active_ddp_module api for torchdynamo DDP (#83333)
Pairs up with torchdynamo PR https://github.com/pytorch/torchdynamo/pull/628

Exposes a new API that lets torchdynamo know when it is compiling the 'forward' of a module that is inside a DDP module.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83333
Approved by: https://github.com/mrshenli
2022-09-17 02:10:25 +00:00
joncrall
4618371da5 Integrate xdoctest - Rebased (#82797)
This is a new version of #15648 based on the latest master branch.

Unlike the previous PR where I fixed a lot of the doctests in addition to integrating xdoctest, I'm going to reduce the scope here. I'm simply going to integrate xdoctest, and then I'm going to mark all of the failing tests as "SKIP". This will let xdoctest run on the dashboards, provide some value, and still let the dashboards pass. I'll leave fixing the doctests themselves to another PR.

In my initial commit, I do the bare minimum to get something running with failing dashboards. The few tests that I marked as skip are causing segfaults. Running xdoctest results in 293 failed, 201 passed tests. The next commits will be to disable those tests. (unfortunately I don't have a tool that will insert the `#xdoctest: +SKIP` directive over every failing test, so I'm going to do this mostly manually.)
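
For illustration, the directive looks like this inside a docstring (hypothetical function, not taken from the PR):

```python
def add(a, b):
    """Add two numbers.

    Example:
        >>> # xdoctest: +SKIP
        >>> add(1, 2)
        3
    """
    return a + b
```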

Fixes https://github.com/pytorch/pytorch/issues/71105

@ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82797
Approved by: https://github.com/ezyang
2022-08-12 02:08:01 +00:00
Hubert Lu
cd18b78daa [ROCm] Enable bf16-related tests in test_c10d_nccl.py and test_grad_layout_1devicemodule_1replicaperprocess (#82020)
### Description
Enable bf16-related unit tests in test_c10d_nccl.py and test_grad_layout_1devicemodule_1replicaperprocess as follows:

- distributed/test_c10d_nccl test_bf16_compress_wrapper_is_view (main.DistributedDataParallelTest)
- distributed/test_c10d_nccl test_bf16_compress_wrapper_nccl (main.DistributedDataParallelTest)
- distributed/test_c10d_nccl test_grad_layout_1devicemodule_1replicaperprocess (main.DistributedDataParallelTest)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82020
Approved by: https://github.com/ezyang
2022-08-11 21:16:33 +00:00
Yi Wang
08d54b5cd5 Correct DDP example (#83034)
Removes the undefined `pg` from the DDP example code.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83034
Approved by: https://github.com/mrshenli
2022-08-09 18:58:33 +00:00
ProGamerGov
71d50f4f89 Change docstring type callable to Callable for consistency (#82487)
### Description

Across PyTorch's docstrings, both `callable` and `Callable` are used for variable types. `Callable` should be capitalized when we are referring to the `Callable` type, not the Python `callable()` function.

### Testing

There shouldn't be any testing required.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/82487
Approved by: https://github.com/albanD
2022-08-01 17:26:09 +00:00
anjali411
3bcc19b29a Add __all__ to various submodules in torch.fx, distributions, distributed, package (#80367)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80367
Approved by: https://github.com/albanD
2022-06-27 21:27:30 +00:00
Rohan Varma
e7cb44b6c4 Guard distributed imports (#77727)
Move the distributed import after the dist.is_avail check to fix builds with USE_DISTRIBUTED=0. Note, however, that this issue is not currently caught by any CI.

Closes https://github.com/pytorch/pytorch/issues/77704
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77727
Approved by: https://github.com/malfet
2022-05-18 11:27:52 +00:00
Rohan Varma
6f954d7bbb FSDP parameter sync
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77492

Approved by: https://github.com/zhaojuanmao
2022-05-17 19:58:49 +00:00
Rohan Varma
bbb1f106c7 Separate input moving to utils file
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77187

Test fix

Pull Request resolved: https://github.com/pytorch/pytorch/pull/77235

Lint fix

Approved by: https://github.com/awgu
2022-05-11 21:55:38 +00:00
Rohan Varma
ffb0946504 Generalize param verification and broadcast
New PR for https://github.com/pytorch/pytorch/pull/75970 to be compatible with GHF.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76374
Approved by: https://github.com/awgu
2022-04-26 22:25:53 +00:00
pritam
b26df43f15 Fix bug where __getstate__ of DDP looks for self._replicated_tensor_module
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76349

When we are not using ReplicatedTensor in DDP and try to save a DDP
module, it errors out because it tries to delete the `_replicated_tensor_module`
attribute.

Fix this by checking whether the mode is enabled before triggering the delete.

Differential Revision: [D35875167](https://our.internmc.facebook.com/intern/diff/D35875167/)

Approved by: https://github.com/mrshenli, https://github.com/zhaojuanmao
2022-04-26 02:49:49 +00:00
pritam
3a38f175dd Convert DDP parameters to ReplicatedTensor during forward pass.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75753

As per the design in https://github.com/pytorch/pytorch/issues/72138,
convert DDP parameters to ReplicatedTensor during its forward pass. Concretely,
this is done as follows:

1) Create a separate `_replicated_tensor_module` which is a copy of self.module
without creating copies of the Tensors themselves.
2) Use `_replicated_tensor_module` instead of `self.module` during the forward
pass.
3) Have a context manager `_ddp_replicated_tensor` to enable this, since
certain edge cases can fail where self.module is changed out of band resulting
in discrepancy between self.module and `_replicated_tensor_module`.

Differential Revision: [D35533736](https://our.internmc.facebook.com/intern/diff/D35533736/)

Approved by: https://github.com/wanchaol, https://github.com/rohan-varma
2022-04-18 03:27:23 +00:00
Junjie Wang (PyTorch)
0a6ac31797 [PT-D][DDP][BE] Add unit tests for Forward and Backward Hook (#74063)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74063

Address the issue https://github.com/pytorch/pytorch/issues/66229 as part of BE effort.

Basically:
1. We remove the stale comment which confuses users.
2. Add more unit tests to test the forward/backward hook working for DDP.
ghstack-source-id: 151463380

Test Plan: CI

Reviewed By: rohan-varma

Differential Revision: D34800830

fbshipit-source-id: 21133209323b2b5eda0dd6472f6309d4fb779b97
(cherry picked from commit b9b165c8305572128395daffafc13fcac8b85e29)
2022-03-16 23:18:28 +00:00
Shihao Xu
bcd0843bec [torch.distributed][DDP] Disable DDP bucketing for the first iteration (#72843)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72843

# [Debug Story] Training Hanging and DDP Bucketing

**What are the characteristics of the hanging training instance?**

The model uses TorchRec `PooledEmbeddingArch` and corresponding sharding solution.

The model config difference to trigger this hanging issue is turning on position weighted embedding tables.

A feature processor module, `GroupedPositionWeightedModule`, is constructed on all ranks, but `GroupedPositionWeightedModule.forward(...)` is only [called on a subset of ranks of the whole world](https://fburl.com/code/yqrmtvli).

**What was the initial manifested error?**

The training was stuck in the first iteration.

**What are useful debugging tools this time?**

After turning off [static_graph in DDP](https://fburl.com/code/4io81p5i), we saw sparse feature lengths becoming negative values after all-to-all collectives. The hang became a fatal failure.

After turning on [torch.distributed DETAIL debugging mode](https://fburl.com/code/cp8e28mm), we saw two trainers send out mismatched collectives: one doing all-to-all, the other doing all-reduce. So we knew the negative values came from an all-to-all being matched with an all-reduce; the error had happened earlier, in the wrong timing of either the all-reduce or the all-to-all.

With more logging added inside DDP, it turned out that DDP decided to do the all-reduce at different times on different ranks.

**What is DDP bucketing?**

Once a gradient is ready on a rank, DDP uses all-reduce to synchronize the average of this gradient across all ranks.

Say we have 4 tensor ops. A, B, C, D.

In the most naive version, we could do one synchronization when all gradients in the full backward graph are ready.

The time sequence would be,

* D.grad
* C.grad
* B.grad
* A.grad
* All reduce on [D.grad, C.grad, B.grad, A.grad].

But that would be a huge waste of communication channel bandwidth.

With DDP bucketing, we can start some gradient synchronization earlier, bucket by bucket. The above time sequence now becomes,

* D.grad
* C.grad
* All reduce on [D.grad, C.grad].
* B.grad
* A.grad
* All reduce on [B.grad, A.grad].

By overlapping gradient computation with communication, the bucketing technique delivers better DDP execution performance.
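
The grouping itself can be sketched as follows (an assumed simplification: real DDP buckets by byte size, not by element count):

```python
# Group gradients, in the order they become ready (reverse execution
# order), into fixed-size buckets; each bucket is all-reduced as soon
# as all of its gradients are ready.
def make_buckets(grads_in_ready_order, bucket_size=2):
    buckets, current = [], []
    for grad in grads_in_ready_order:
        current.append(grad)
        if len(current) == bucket_size:
            buckets.append(current)
            current = []
    if current:                 # flush any partially filled bucket
        buckets.append(current)
    return buckets

# D's gradient is ready first, A's last
buckets = make_buckets(["D.grad", "C.grad", "B.grad", "A.grad"])
```

Here `buckets` comes out as `[["D.grad", "C.grad"], ["B.grad", "A.grad"]]`, matching the time sequence above.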

**What exactly went wrong in this case?**

1. The bucketing doesn’t honor backward graph execution order.
2. There are other collectives comm ops in backward graph.
3. There are unused parameters (i.e., unused sub-modules) on a subset of ranks of the whole world.

Using the above example again, we have 4 tensor ops. A, B, C, D.

Say we have 2 trainers,

B is the feature processor module.

B only runs on trainer 0 (both forward and backward), but not on trainer1.

C is the All-to-all (Pooled embeddings distribution).

C sends out all-to-all collective in both its forward and backward pass.

Keep assuming all other ops run on both trainers.

trainer_0 op sequence is,

A, B (feature preproc), C (all-to-all), D | D.grad, C.grad (reverse all-to-all), B.grad (feature proc grads), A.grad

trainer_1 op sequence is,

A, C (all-to-all), D | D.grad, C.grad (reverse all-to-all), A.grad

Even though the correct bucketing should be (same bucketing for both ranks),

* bucket_0, [D.grad, C.grad]
* bucket_1, [B.grad, A.grad]

but because of 1), they end up like,

* bucket_0, [B.grad, D.grad]
* bucket_1, [C.grad, A.grad]

Combined with 2) and 3), the time sequence could look like,

(check mark represents the gradient is ready)

(bucket is ready to do synchronization if all its enclosing gradients are ready)

* trainer_0
   * t0,
      * D.grad
      * bucket_0, [B.grad, D.grad ✓]
   * t1,
      * **C.grad all-to-all**
      * C.grad ✓
      * bucket_1, [C.grad ✓, A.grad]
   * t2
      * B.grad
      * bucket_0, [B.grad ✓, D.grad ✓] ✓
   * t3
      * All-reduce for bucket_0
   * t4
      * A.grad
      * bucket_1, [C.grad ✓, A.grad ✓] ✓
* trainer_1
   * t0,
      * D.grad
      * bucket_0, [B.grad ✓, D.grad ✓] ✓. (Because B is not used on trainer_1, DDP marks its gradient as ready immediately.)
   * t1,
      * **All-reduce for bucket_0**
   * t2
      * C.grad all-to-all
      * bucket_1, [C.grad ✓, A.grad]
   * t3
      * A.grad
      * bucket_1, [C.grad ✓, A.grad ✓] ✓

This is why trainer_0 all-to-all is matched up with trainer_1 all-reduce.

**What is the solution for fixing DDP?**

Disable DDP bucketing for the first iteration. D34051938

This is because after the first iteration, buckets are rebuilt based on the real backward graph execution order.

So the slow gradient synchronization only affects the first iteration.

Test Plan:
buck build mode/dev-nosan caffe2/test/distributed:distributed_gloo_spawn
BACKEND=gloo WORLD_SIZE=3 buck-out/gen/caffe2/test/distributed/distributed_gloo_spawn\#binary.par -r test_ddp_logging_data_cpu

P484179296

buck build mode/dev-nosan caffe2/test/distributed:distributed_nccl_spawn
BACKEND=nccl WORLD_SIZE=2 buck-out/gen/caffe2/test/distributed/distributed_nccl_spawn\#binary.par -r test_ddp_logging_data_cpu -r test_ddp_get_bucket_sizes
P484177200

Reviewed By: zhaojuanmao

Differential Revision: D34051938

fbshipit-source-id: 0c7f35875687095c3199f19990e73a8349b6e5b9
(cherry picked from commit bb8f11306ea51c2bd3ffd3ab001d62ce369a08ee)
2022-03-04 18:29:36 +00:00
Can Balioglu
e1db2f13ce Refactor TORCH_DISTRIBUTED_DEBUG implementation (#73166)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73166

This PR refactors, cleans up, and optimizes the implementation of `TORCH_DISTRIBUTED_DEBUG`. It also introduces three new user APIs: `get_debug_level()`, `set_debug_level()`, and `set_debug_level_from_env()` to retrieve and modify the debug level after a process has started.
ghstack-source-id: 149778566

Test Plan: Run the existing unit tests.

Reviewed By: rohan-varma

Differential Revision: D34371226

fbshipit-source-id: e18443b411adcbaf39b2ec999178c198052fcd5b
(cherry picked from commit 26d6bb1584b83a0490d8b766482656a5887fa21d)
2022-02-24 02:33:05 +00:00
Andrew Gu
59dd84cab6 [Join][BE] Fix typo; remove obsolete method (#72886)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72886

**Test Plan**
Searching for `_schedule_shadow_all_reduce_for_fwd_pass` shows that it is defined but never used.

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D34255651

Pulled By: awgu

fbshipit-source-id: 205a0325c2cdc05e127a183cb86fa2fc2e0db99d
(cherry picked from commit 4492f03a3f)
2022-02-16 15:03:09 +00:00
Yuxin Wu
1ed4653e89 Stop writing logs to root logger (#72649)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/72648

Pull Request resolved: https://github.com/pytorch/pytorch/pull/72649

Reviewed By: soulitzer

Differential Revision: D34172113

Pulled By: mrshenli

fbshipit-source-id: 98cb4140b978a0d9fa53876e427ea3b8bbe884cf
(cherry picked from commit c14297cee6)
2022-02-11 21:30:53 +00:00
Rohan Varma
4feef6c970 Log static graph in constructor if it is set (#72456)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72456

It is easier to log whether static graph is set at construction time, now that it is natively supported in the DDP constructor, as opposed to waiting for the first iteration to finish. In some failure cases we're seeing, the first iteration does not finish, and thus we don't have this data, which is valuable for debugging.
ghstack-source-id: 148840679

Test Plan: CI

Reviewed By: zhaojuanmao

Differential Revision: D34045204

fbshipit-source-id: 72a187c1ce031db217de4b3ad20a64f2a74995bc
(cherry picked from commit 1d622c88f3)
2022-02-11 15:55:09 +00:00
Rohan Varma
37651894f9 [Easy] Small DDP fixes (#72455)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72455

- Improve helper function
- Improve/fix some logging
ghstack-source-id: 148840678

Test Plan: CI

Reviewed By: zhaojuanmao

Differential Revision: D34044865

fbshipit-source-id: d2ae820effaaaecdd7155ffa8d3a1d8ebbd9f39e
(cherry picked from commit 3efbea8f41)
2022-02-11 15:55:09 +00:00
Rohan Varma
1c8fcc44cb [Opt Overlap] Support optimizing partial set of parameters (#71608)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71608

Per title
ghstack-source-id: 147577178

Test Plan: CI

Reviewed By: cbalioglu

Differential Revision: D33696382

fbshipit-source-id: 5b638d3edf5f03ba476356d61e96ca604de18c8f
(cherry picked from commit 436b547fb0)
2022-01-26 19:33:49 +00:00
Rohan Varma
d3354602fc [Easy] DDP typo fix (#71607)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71607

Per title
ghstack-source-id: 147577177

Test Plan: N/a

Reviewed By: cbalioglu

Differential Revision: D33694038

fbshipit-source-id: 5a5a618f13bc8b91127169efcebb90b5a36474a1
(cherry picked from commit 62f17f116d)
2022-01-26 07:32:04 +00:00
Rohan Varma
10ca760c0a [Opt Overlap] Implement register_fused_optim in DDP (#71606)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71606

Per title
ghstack-source-id: 147577172

Test Plan: CI

Reviewed By: cbalioglu

Differential Revision: D33694037

fbshipit-source-id: a148d5ce6031f0cc20f33785cfe2c27d1fc2d682
(cherry picked from commit ace3261e0c)
2022-01-26 07:32:04 +00:00
Yanli Zhao
4b3cf1eaf7 [BE]Clarify how to check memory saving if using gradient_as_bucket_view (#71483)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71483

Clarify that peak memory savings should be checked after the first iteration when using gradient_as_bucket_view.
ghstack-source-id: 147271113

Test Plan: unit test

Reviewed By: rohan-varma

Differential Revision: D33662424

fbshipit-source-id: f760da38e166ae85234e526ddf1526269ea25d42
(cherry picked from commit a40dda20da)
2022-01-20 19:38:41 +00:00
Yanli Zhao
1c61d8c43f [PT1.11] make static graph to be stable (#71459)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71459

1. Add the static_graph feature to the DDP constructor;
2. Still keep the _set_static_graph() API so that existing use cases are not affected; it can also be called internally by the DDP constructor;
3. Four cases are covered:
    static_graph = False, _set_static_graph() is called;
    static_graph = False, _set_static_graph() is not called;
    static_graph = True, _set_static_graph() is not called;
    static_graph = True, _set_static_graph() is called;
ghstack-source-id: 147263797

Test Plan: unit tests

Reviewed By: rohan-varma

Differential Revision: D33646738

fbshipit-source-id: 8c1730591152aab91afce7133d2adf1efd723855
(cherry picked from commit dc246a1129)
2022-01-20 19:38:41 +00:00
Rohan Varma
fcd1375b2b [DDP][BE][Docs] Clarify checkpoint support (#68827)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68827

Add a note about current checkpoint support with DDP. Note that this
does not include the features enabled with _set_static_graph yet, as it is an
undocumented private API. Once we support static graph as beta feature in OSS
we can add to the note here.
ghstack-source-id: 144285041

Test Plan: CI

Reviewed By: pritamdamania87

Differential Revision: D32624957

fbshipit-source-id: e21d156a1c4744b6e2a807b5b5289ed26701886f
2021-11-30 12:37:37 -08:00
Santiago Castro
f776f30780 Keep the sequence or mapping type in default_collate (#68779)
Summary:
`default_collate`, `default_convert`, and `pin_memory` convert sequences into lists. I believe they should keep the original type when possible (e.g., I have a class that inherits from `list`, which comes from a 3rd party library that I can't change, and provides extra functionality).

Note it's easy to do when the type accepts an iterable in its constructor, but that's not always the case (e.g., `range`).

Even though this can be accomplished if using a custom `default_collate`/`default_convert`, 1) this is behavior they should support out-of-the-box IMHO, and 2) `pin_memory` still does it.
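
A sketch of the proposed behavior (a hypothetical helper, not the actual patch): rebuild the output with the input's own type when its constructor accepts an iterable, falling back to a plain list otherwise:

```python
def convert_keep_type(seq, fn):
    """Map fn over seq, preserving seq's container type when possible."""
    converted = [fn(item) for item in seq]
    try:
        return type(seq)(converted)  # works for list subclasses, tuple, ...
    except TypeError:
        return converted             # e.g. range() cannot be rebuilt

class TaggedList(list):
    """Stand-in for a 3rd-party list subclass with extra functionality."""
    pass

out = convert_keep_type(TaggedList([1, 2, 3]), lambda x: x * 2)
```

`out` keeps the `TaggedList` type instead of degrading to a plain `list`.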

cc VitalyFedyunin ejguan NivekT

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68779

Reviewed By: wenleix

Differential Revision: D32651129

Pulled By: ejguan

fbshipit-source-id: 17c390934bacc0e4ead060469cf15dde815550b4
2021-11-29 13:14:20 -08:00
Yifan Xiong
c7eaec86f0 [NCCL] Patch bfloat16 support (#67843)
Summary:
Patch bfloat16 support in NCCL. PR https://github.com/pytorch/pytorch/issues/63260 adds bfloat16 support but is
still not complete enough to enable bfloat16 for allreduce in end-to-end training.

This patch does the followings:
* bump the minimum NCCL version from 2.9.7 to 2.10; NCCL added bf16 support in
  v2.10.3-1 (commit 7e51592)
* update bfloat16 datatype flag in `csrc/cuda/nccl.cpp` so that NCCL
  operations like all reduce can use it
* enable unit tests for bfloat16 datatype if possible

cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67843

Reviewed By: H-Huang

Differential Revision: D32248132

Pulled By: mrshenli

fbshipit-source-id: 081e96e725af3b933dd65ec157c5ad11c6873525
2021-11-09 13:46:13 -08:00
James Reed
80178d6152 [DDP] Fix some issues with code example in DDP docstring (#67883)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67883

cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang

Test Plan: Imported from OSS

Reviewed By: zhaojuanmao

Differential Revision: D32190946

Pulled By: jamesr66a

fbshipit-source-id: a376324b95cbe833ffa606ecdfc6156432880f70
2021-11-05 17:32:45 -07:00
Rohan Varma
bff64e84cd [DDP] Track models with sync bn (#66680)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66680

Closes https://github.com/pytorch/pytorch/issues/66215. Tracks models
with sync BN so we can find workflows that use them and target for perf
optimization.
ghstack-source-id: 140875182

Test Plan: CI

Reviewed By: pritamdamania87

Differential Revision: D31679477

fbshipit-source-id: 0e68cd1a7aabbc5b26227895c53d33b8e98bfb8e
2021-10-18 22:31:52 -07:00