pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 00:21:07 +01:00

Author	SHA1	Message	Date
Andrew Gu	bfffc8d8ef	[DDP][Docs] Add warning that `no_sync()` should include forward (#89244 ) The issue where the user only includes `loss.backward()` inside `no_sync()` but not the forward pass has arisen several times now. I think adding an explicit warning in the docs is worthwhile. Pull Request resolved: https://github.com/pytorch/pytorch/pull/89244 Approved by: https://github.com/zhaojuanmao	2022-11-18 22:06:24 +00:00
Colin Taylor	24b9890f03	[torchrec] [composable] update ShardedEmbeddingBagCollection to use registered EBCs with ShardedTensors as registered modules (#758 ) (#88026 ) Summary: X-link: https://github.com/pytorch/torchrec/pull/758 This PR fixes a bug in FSDP/DDP, where ShardedTensors are not supported even if passed in as params to ignore. This is important for composability because TorchRec named_parameters() will return the FQNs of ShardedTensors (as defined in goals). It defines the device of a ShardedTensor to be None when local_tensor() does not exist on the rank. It also updates ShardedEmbeddingBagCollection to be composable according to https://docs.google.com/document/d/1TBJSd5zgEg6cRcXv3Okuj7bBkqQwGS2IPh4TLWNNzFI/edit Differential Revision: D40458625 Pull Request resolved: https://github.com/pytorch/pytorch/pull/88026 Approved by: https://github.com/wanchaol, https://github.com/rohan-varma	2022-11-17 04:26:13 +00:00
Charlie Yan	8523c45717	Delete stub file to enable mypy check (#4649 ) (#88701 ) Summary: X-link: https://github.com/facebookresearch/detectron2/pull/4649 Context in https://fburl.com/4irjskbe This change deletes distributed.pyi, so that lintrunner will run mypy on distributed.py for typing check. Test Plan: CI Differential Revision: D41028360 Pull Request resolved: https://github.com/pytorch/pytorch/pull/88701 Approved by: https://github.com/zhaojuanmao	2022-11-09 20:29:34 +00:00
Will Constable	678d038001	Support DDP ignored parameters in DDPOptimizer (#88460 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/88460 Approved by: https://github.com/aazzolini	2022-11-04 21:42:15 +00:00
Kazuaki Ishizaki	2ddefbdc3c	Fix typos used in documents under torch directory (#88300 ) This PR fixes typos, in comments of Python files, that are found from a search box at https://pytorch.org/docs/master/search.html Pull Request resolved: https://github.com/pytorch/pytorch/pull/88300 Approved by: https://github.com/lezcano	2022-11-02 09:38:13 +00:00
Horace He	12dd877395	Fix all references to torchdynamo from the merge (#87731 ) cc @mlazos @soumith @voznesenskym @yanboliang @penguinwu @anijain2305 @EikanWang @jgong5 @Guobing-Chen @chunyuan-w @XiaobingSuper @zhuhaozhe @blzheng @Xia-Weiwen @wenzhe-nrv @jiayisunx @jansel Pull Request resolved: https://github.com/pytorch/pytorch/pull/87731 Approved by: https://github.com/yanboliang, https://github.com/ezyang, https://github.com/anijain2305, https://github.com/jansel	2022-10-31 06:51:07 +00:00
PyTorch MergeBot	641d8e0e69	Revert "Enable mypy check for distributed.py, and fix type errors (#87543 )" This reverts commit `2cc624cd43`. Reverted https://github.com/pytorch/pytorch/pull/87543 on behalf of https://github.com/weiwangmeta due to breaking internal builds	2022-10-28 02:20:25 +00:00
Charlie Yan	2cc624cd43	Enable mypy check for distributed.py, and fix type errors (#87543 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/87543 Approved by: https://github.com/fduwjj	2022-10-27 00:22:54 +00:00
Charlie Yan	0294787bd6	Format distributed.py (#87667 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/87667 Approved by: https://github.com/zhaojuanmao	2022-10-26 06:02:30 +00:00
Charlie Yan	bebd162249	Fix doc of DDP (#86244 ) (#86256 ) [ghstack-poisoned] Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/86256 Approved by: https://github.com/rohan-varma	2022-10-06 00:48:56 +00:00
Rohan Varma	be4e43c7d0	Remove DataParallel remnants from DDP doc (#86221 ) As @aazzolini pointed out, the docstring is incorrect and probably a vestige of DP / single process multi device mode in DDP. Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/86221 Approved by: https://github.com/aazzolini	2022-10-05 22:30:02 +00:00
Will Constable	32fc0b958e	Expose get_active_ddp_module api for torchdynamo DDP (#83333 ) Pairs up with torchdynamo PR https://github.com/pytorch/torchdynamo/pull/628 Exposes a new API that lets torchdynamo know when it is compiling the 'forward' of a module that is inside a DDP module. Pull Request resolved: https://github.com/pytorch/pytorch/pull/83333 Approved by: https://github.com/mrshenli	2022-09-17 02:10:25 +00:00
joncrall	4618371da5	Integrate xdoctest - Rebased (#82797 ) This is a new version of #15648 based on the latest master branch. Unlike the previous PR where I fixed a lot of the doctests in addition to integrating xdoctest, I'm going to reduce the scope here. I'm simply going to integrate xdoctest, and then I'm going to mark all of the failing tests as "SKIP". This will let xdoctest run on the dashboards, provide some value, and still let the dashboards pass. I'll leave fixing the doctests themselves to another PR. In my initial commit, I do the bare minimum to get something running with failing dashboards. The few tests that I marked as skip are causing segfaults. Running xdoctest results in 293 failed, 201 passed tests. The next commits will be to disable those tests. (unfortunately I don't have a tool that will insert the `#xdoctest: +SKIP` directive over every failing test, so I'm going to do this mostly manually.) Fixes https://github.com/pytorch/pytorch/issues/71105 @ezyang Pull Request resolved: https://github.com/pytorch/pytorch/pull/82797 Approved by: https://github.com/ezyang	2022-08-12 02:08:01 +00:00
Hubert Lu	cd18b78daa	[ROCm] Enable bf16-related tests in test_c10d_nccl.py and test_grad_layout_1devicemodule_1replicaperprocess (#82020 ) ### Description Enable bf16-related unit tests in test_c10d_nccl.py and test_grad_layout_1devicemodule_1replicaperprocess as follows: - distributed/test_c10d_nccl test_bf16_compress_wrapper_is_view (main.DistributedDataParallelTest) - distributed/test_c10d_nccl test_bf16_compress_wrapper_nccl (main.DistributedDataParallelTest) - distributed/test_c10d_nccl test_grad_layout_1devicemodule_1replicaperprocess (main.DistributedDataParallelTest) Pull Request resolved: https://github.com/pytorch/pytorch/pull/82020 Approved by: https://github.com/ezyang	2022-08-11 21:16:33 +00:00
Yi Wang	08d54b5cd5	Correct DDP example (#83034 ) remove undefined `pg` from DDP example code Pull Request resolved: https://github.com/pytorch/pytorch/pull/83034 Approved by: https://github.com/mrshenli	2022-08-09 18:58:33 +00:00
ProGamerGov	71d50f4f89	Change docstring type callable to Callable for consistency (#82487 ) ### Description Across PyTorch's docstrings, both `callable` and `Callable` are used for variable types. `Callable` should be capitalized because we are referring to the `Callable` type, not the Python `callable()` function. ### Testing There shouldn't be any testing required. Pull Request resolved: https://github.com/pytorch/pytorch/pull/82487 Approved by: https://github.com/albanD	2022-08-01 17:26:09 +00:00
anjali411	3bcc19b29a	Add __all__ to various submodules in torch.fx, distributions, distributed, package (#80367 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/80367 Approved by: https://github.com/albanD	2022-06-27 21:27:30 +00:00
Rohan Varma	e7cb44b6c4	Guard distributed imports (#77727 ) Move distributed import after dist.is_avail check to fix builds with USE_DISTRIBUTED=0. Although, note that this issue is not caught by any CI at the moment. Closes https://github.com/pytorch/pytorch/issues/77704 Pull Request resolved: https://github.com/pytorch/pytorch/pull/77727 Approved by: https://github.com/malfet	2022-05-18 11:27:52 +00:00
Rohan Varma	6f954d7bbb	FSDP parameter sync Pull Request resolved: https://github.com/pytorch/pytorch/pull/77492 Approved by: https://github.com/zhaojuanmao	2022-05-17 19:58:49 +00:00
Rohan Varma	bbb1f106c7	Separate input moving to utils file Pull Request resolved: https://github.com/pytorch/pytorch/pull/77187 Test fix Pull Request resolved: https://github.com/pytorch/pytorch/pull/77235 Lint fix Approved by: https://github.com/awgu	2022-05-11 21:55:38 +00:00
Rohan Varma	ffb0946504	Generalize param verification and broadcast New PR for https://github.com/pytorch/pytorch/pull/75970 to be compatible with GHF. Pull Request resolved: https://github.com/pytorch/pytorch/pull/76374 Approved by: https://github.com/awgu	2022-04-26 22:25:53 +00:00
pritam	b26df43f15	Fix bug where __getstate__ of DDP looks for self._replicated_tensor_module Pull Request resolved: https://github.com/pytorch/pytorch/pull/76349 When we are not using ReplicatedTensor in DDP and try to save a DDP module it will error out since it tries to delete the _replicated_tensor_module attribute. Fixing this by checking if this mode is enabled before triggering the delete. Differential Revision: [D35875167](https://our.internmc.facebook.com/intern/diff/D35875167/) Approved by: https://github.com/mrshenli, https://github.com/zhaojuanmao	2022-04-26 02:49:49 +00:00
pritam	3a38f175dd	Convert DDP parameters to ReplicatedTensor during forward pass. Pull Request resolved: https://github.com/pytorch/pytorch/pull/75753 As per the design in https://github.com/pytorch/pytorch/issues/72138, convert DDP parameters to ReplicatedTensor during its forward pass. Concretely, this is done as follows: 1) Create a separate `_replicated_tensor_module` which is a copy of self.module without creating copies of the Tensors themselves. 2) Use `_replicated_tensor_module` instead of `self.module` during the forward pass. 3) Have a context manager `_ddp_replicated_tensor` to enable this, since certain edge cases can fail where self.module is changed out of band resulting in discrepancy between self.module and `_replicated_tensor_module`. Differential Revision: [D35533736](https://our.internmc.facebook.com/intern/diff/D35533736/) Approved by: https://github.com/wanchaol, https://github.com/rohan-varma	2022-04-18 03:27:23 +00:00
Junjie Wang (PyTorch)	0a6ac31797	[PT-D][DDP][BE] Add unit tests for Forward and Backward Hook (#74063 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/74063 Address the issue https://github.com/pytorch/pytorch/issues/66229 as part of BE effort. Basically: 1. We remove the stale comment which confuses users. 2. Add more unit tests to test the forward/backward hook working for DDP. ghstack-source-id: 151463380 Test Plan: CI Reviewed By: rohan-varma Differential Revision: D34800830 fbshipit-source-id: 21133209323b2b5eda0dd6472f6309d4fb779b97 (cherry picked from commit b9b165c8305572128395daffafc13fcac8b85e29)	2022-03-16 23:18:28 +00:00
Shihao Xu	bcd0843bec	[torch.distributed][DDP] Disable DDP bucketing for the first iteration (#72843 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/72843

# [Debug Story] Training Hanging and DDP Bucketing

What are the characteristics of the hanging training instance?

The model uses TorchRec `PooledEmbeddingArch` and the corresponding sharding solution. The model config difference that triggers the hang is turning on position-weighted embedding tables. A feature processor module, `GroupedPositionWeightedModule`, is constructed on all ranks, but `GroupedPositionWeightedModule.forward(...)` is only [called on a subset of the ranks in the whole world](https://fburl.com/code/yqrmtvli).

What was the initial manifested error?

The training was stuck in the first iteration.

What were the useful debugging tools this time?

After turning off [static_graph in DDP](https://fburl.com/code/4io81p5i), we saw sparse feature lengths becoming negative after all-to-all collectives. The hang became a fatal failure. After turning on [torch.distributed DETAIL debugging mode](https://fburl.com/code/cp8e28mm), we saw 2 trainers sending out mismatched collectives, one doing all-to-all, the other doing all-reduce. So we know the negative values come from all-to-all being matched with all-reduce. The actual error had happened earlier: either the all-reduce or the all-to-all was issued at the wrong time. With more logging added inside DDP, it turned out that DDP decided to do all-reduce at different times on different ranks.

What is DDP bucketing?

Once a gradient is ready on a rank, DDP uses all-reduce to synchronize the average of this gradient across all ranks. Say we have 4 tensor ops: A, B, C, D. In the most naive version, we could do one synchronization when all gradients in the full backward graph are ready. The time sequence would be:

* D.grad
* C.grad
* B.grad
* A.grad
* All reduce on [D.grad, C.grad, B.grad, A.grad].

But that would be a huge waste of communication channel bandwidth. With DDP bucketing, we can start some of the gradient synchronization earlier, batch by batch. The above time sequence now becomes:

* D.grad
* C.grad
* All reduce on [D.grad, C.grad].
* B.grad
* A.grad
* All reduce on [B.grad, A.grad].

Because gradient computation overlaps with communication, the bucketing technique brings better DDP execution performance.

What exactly went wrong in this case?

1. The bucketing doesn't honor backward graph execution order.
2. There are other collective comm ops in the backward graph.
3. There are unused parameters (i.e., an unused sub-module) on a subset of the ranks in the whole world.

Using the above example again, we have 4 tensor ops: A, B, C, D. Say we have 2 trainers, and B is the feature processor module. B only runs on trainer_0 (both forward and backward), but not on trainer_1. C is the all-to-all (pooled embeddings distribution). C sends out an all-to-all collective in both its forward and backward pass. Keep assuming all other ops run on both trainers.

trainer_0 op sequence is: A, B (feature preproc), C (all-to-all), D | D.grad, C.grad (reverse all-to-all), B.grad (feature proc grads), A.grad

trainer_1 op sequence is: A, C (all-to-all), D | D.grad, C.grad (reverse all-to-all), A.grad

The correct bucketing should be (same bucketing for both ranks):

* bucket_0, [D.grad, C.grad]
* bucket_1, [B.grad, A.grad]

but because of 1), the buckets end up like:

* bucket_0, [B.grad, D.grad]
* bucket_1, [C.grad, A.grad]

Plus 2) and 3), the time sequence could look like the following (a check mark means the gradient is ready; a bucket is ready to synchronize once all its enclosing gradients are ready):

* trainer_0
  * t0
    * D.grad
    * bucket_0, [B.grad, D.grad ✓]
  * t1
    * C.grad all-to-all
    * C.grad ✓
    * bucket_1, [C.grad ✓, A.grad]
  * t2
    * B.grad
    * bucket_0, [B.grad ✓, D.grad ✓] ✓
  * t3
    * All-reduce for bucket_0
  * t4
    * A.grad
    * bucket_1, [C.grad ✓, A.grad ✓] ✓
* trainer_1
  * t0
    * D.grad
    * bucket_0, [B.grad ✓, D.grad ✓] ✓ (Because B is not used on trainer_1, DDP marks its gradient as ready immediately.)
  * t1
    * All-reduce for bucket_0
  * t2
    * C.grad all-to-all
    * bucket_1, [C.grad ✓, A.grad]
  * t3
    * A.grad
    * bucket_1, [C.grad ✓, A.grad ✓] ✓

This is why the trainer_0 all-to-all is matched up with the trainer_1 all-reduce.

What is the solution for fixing DDP?

Disable DDP bucketing for the first iteration. D34051938

This works because after the first iteration, buckets are rebuilt based on the real backward graph execution order. So the slow gradient synchronization only affects the first iteration.

Test Plan:
buck build mode/dev-nosan caffe2/test/distributed:distributed_gloo_spawn
BACKEND=gloo WORLD_SIZE=3 buck-out/gen/caffe2/test/distributed/distributed_gloo_spawn\#binary.par -r test_ddp_logging_data_cpu
P484179296
buck build mode/dev-nosan caffe2/test/distributed:distributed_nccl_spawn
BACKEND=nccl WORLD_SIZE=2 buck-out/gen/caffe2/test/distributed/distributed_nccl_spawn\#binary.par -r test_ddp_logging_data_cpu -r test_ddp_get_bucket_sizes
P484177200

Reviewed By: zhaojuanmao Differential Revision: D34051938 fbshipit-source-id: 0c7f35875687095c3199f19990e73a8349b6e5b9 (cherry picked from commit bb8f11306ea51c2bd3ffd3ab001d62ce369a08ee)	2022-03-04 18:29:36 +00:00
Can Balioglu	e1db2f13ce	Refactor TORCH_DISTRIBUTED_DEBUG implementation (#73166 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/73166 This PR refactors, cleans up, and optimizes the implementation of `TORCH_DISTRIBUTED_DEBUG`. It also introduces three new user APIs: `get_debug_level()`, `set_debug_level()`, and `set_debug_level_from_env()` to retrieve and modify the debug level after a process has started. ghstack-source-id: 149778566 Test Plan: Run the existing unit tests. Reviewed By: rohan-varma Differential Revision: D34371226 fbshipit-source-id: e18443b411adcbaf39b2ec999178c198052fcd5b (cherry picked from commit 26d6bb1584b83a0490d8b766482656a5887fa21d)	2022-02-24 02:33:05 +00:00
Andrew Gu	59dd84cab6	[Join][BE] Fix typo; remove obsolete method (#72886 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/72886 Test Plan Searching for `_schedule_shadow_all_reduce_for_fwd_pass` shows that it is defined but never used. Test Plan: Imported from OSS Reviewed By: mrshenli Differential Revision: D34255651 Pulled By: awgu fbshipit-source-id: 205a0325c2cdc05e127a183cb86fa2fc2e0db99d (cherry picked from commit `4492f03a3f`)	2022-02-16 15:03:09 +00:00
Yuxin Wu	1ed4653e89	Stop writing logs to root logger (#72649 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/72648 Pull Request resolved: https://github.com/pytorch/pytorch/pull/72649 Reviewed By: soulitzer Differential Revision: D34172113 Pulled By: mrshenli fbshipit-source-id: 98cb4140b978a0d9fa53876e427ea3b8bbe884cf (cherry picked from commit `c14297cee6`)	2022-02-11 21:30:53 +00:00
Rohan Varma	4feef6c970	Log static graph in constructor if it is set (#72456 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/72456 It is easier to log if static graph is set at construction time now that it is natively supported in DDP constructor, as opposed to waiting for the first iteration to finish. In some failure cases we're seeing, the first iteration does not finish, so we don't have this data, which is valuable for debugging. ghstack-source-id: 148840679 Test Plan: CI Reviewed By: zhaojuanmao Differential Revision: D34045204 fbshipit-source-id: 72a187c1ce031db217de4b3ad20a64f2a74995bc (cherry picked from commit `1d622c88f3`)	2022-02-11 15:55:09 +00:00
Rohan Varma	37651894f9	[Easy] Small DDP fixes (#72455 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/72455 - Improve helper function - Improve/fix some logging ghstack-source-id: 148840678 Test Plan: CI Reviewed By: zhaojuanmao Differential Revision: D34044865 fbshipit-source-id: d2ae820effaaaecdd7155ffa8d3a1d8ebbd9f39e (cherry picked from commit `3efbea8f41`)	2022-02-11 15:55:09 +00:00
Rohan Varma	1c8fcc44cb	[Opt Overlap] Support optimizing partial set of parameters (#71608 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/71608 Per title ghstack-source-id: 147577178 Test Plan: CI Reviewed By: cbalioglu Differential Revision: D33696382 fbshipit-source-id: 5b638d3edf5f03ba476356d61e96ca604de18c8f (cherry picked from commit `436b547fb0`)	2022-01-26 19:33:49 +00:00
Rohan Varma	d3354602fc	[Easy] DDP typo fix (#71607 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/71607 Per title ghstack-source-id: 147577177 Test Plan: N/a Reviewed By: cbalioglu Differential Revision: D33694038 fbshipit-source-id: 5a5a618f13bc8b91127169efcebb90b5a36474a1 (cherry picked from commit `62f17f116d`)	2022-01-26 07:32:04 +00:00
Rohan Varma	10ca760c0a	[Opt Overlap] Implement register_fused_optim in DDP (#71606 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/71606 Per title ghstack-source-id: 147577172 Test Plan: CI Reviewed By: cbalioglu Differential Revision: D33694037 fbshipit-source-id: a148d5ce6031f0cc20f33785cfe2c27d1fc2d682 (cherry picked from commit `ace3261e0c`)	2022-01-26 07:32:04 +00:00
Yanli Zhao	4b3cf1eaf7	[BE]Clarify how to check memory saving if using gradient_as_bucket_view (#71483 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/71483 Clarify that peak memory saving should be checked after the first iteration when using gradient_as_bucket_view ghstack-source-id: 147271113 Test Plan: unit test Reviewed By: rohan-varma Differential Revision: D33662424 fbshipit-source-id: f760da38e166ae85234e526ddf1526269ea25d42 (cherry picked from commit `a40dda20da`)	2022-01-20 19:38:41 +00:00
Yanli Zhao	1c61d8c43f	[PT1.11] make static graph to be stable (#71459 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/71459 1. add static_graph feature to DDP constructor; 2. still keep _set_static_graph() API, so that existing use cases are not affected, also it can be called internally by DDP constructor 3. four cases are covered: static_graph = False, _set_static_graph() is called; static_graph = False, _set_static_graph() is not called; static_graph = True, _set_static_graph() is not called; static_graph = True, _set_static_graph() is called; ghstack-source-id: 147263797 Test Plan: unit tests Reviewed By: rohan-varma Differential Revision: D33646738 fbshipit-source-id: 8c1730591152aab91afce7133d2adf1efd723855 (cherry picked from commit `dc246a1129`)	2022-01-20 19:38:41 +00:00
Rohan Varma	fcd1375b2b	[DDP][BE][Docs] Clarify checkpoint support (#68827 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68827 Add a note about current checkpoint support with DDP. Note that this does not include the features enabled with _set_static_graph yet, as it is an undocumented private API. Once we support static graph as beta feature in OSS we can add to the note here. ghstack-source-id: 144285041 Test Plan: CI Reviewed By: pritamdamania87 Differential Revision: D32624957 fbshipit-source-id: e21d156a1c4744b6e2a807b5b5289ed26701886f	2021-11-30 12:37:37 -08:00
Santiago Castro	f776f30780	Keep the sequence or mapping type in `default_collate` (#68779 ) Summary: `default_collate`, `default_convert`, and `pin_memory` convert sequences into lists. I believe they should keep the original type when possible (e.g., I have a class that inherits from `list`, which comes from a 3rd party library that I can't change, and provides extra functionality). Note it's easy to do when the type supports an iterable in its creation but it's not always the case (e.g., `range`). Even though this can be accomplished if using a custom `default_collate`/`default_convert`, 1) this is behavior they should support out-of-the-box IMHO, and 2) `pin_memory` still does it. cc VitalyFedyunin ejguan NivekT Pull Request resolved: https://github.com/pytorch/pytorch/pull/68779 Reviewed By: wenleix Differential Revision: D32651129 Pulled By: ejguan fbshipit-source-id: 17c390934bacc0e4ead060469cf15dde815550b4	2021-11-29 13:14:20 -08:00
Yifan Xiong	c7eaec86f0	[NCCL] Patch bfloat16 support (#67843 ) Summary: Patch bfloat16 support in NCCL. PR https://github.com/pytorch/pytorch/issues/63260 adds bfloat16 support but is not yet sufficient to enable bfloat16 for allreduce in end-to-end training. This patch does the following: * fix minimum NCCL version from 2.9.7 to 2.10, since NCCL adds bf16 support in v2.10.3-1 (commit 7e51592) * update bfloat16 datatype flag in `csrc/cuda/nccl.cpp` so that NCCL operations like all reduce can use it * enable unit tests for bfloat16 datatype if possible cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang Pull Request resolved: https://github.com/pytorch/pytorch/pull/67843 Reviewed By: H-Huang Differential Revision: D32248132 Pulled By: mrshenli fbshipit-source-id: 081e96e725af3b933dd65ec157c5ad11c6873525	2021-11-09 13:46:13 -08:00
James Reed	80178d6152	[DDP] Fix some issues with code example in DDP docstring (#67883 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67883 cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang Test Plan: Imported from OSS Reviewed By: zhaojuanmao Differential Revision: D32190946 Pulled By: jamesr66a fbshipit-source-id: a376324b95cbe833ffa606ecdfc6156432880f70	2021-11-05 17:32:45 -07:00
Rohan Varma	bff64e84cd	[DDP] Track models with sync bn (#66680 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66680 Closes https://github.com/pytorch/pytorch/issues/66215. Tracks models with sync BN so we can find workflows that use them and target for perf optimization. ghstack-source-id: 140875182 Test Plan: CI Reviewed By: pritamdamania87 Differential Revision: D31679477 fbshipit-source-id: 0e68cd1a7aabbc5b26227895c53d33b8e98bfb8e	2021-10-18 22:31:52 -07:00
Rohan Varma	38f5144eae	Fix https://github.com/pytorch/pytorch/issues/61982 (#66015 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66015 Fixes https://github.com/pytorch/pytorch/issues/61982 by clone of tensors in DDPSink. Only applies once for static_graph and generally for unused params which already has overhead, so perf hit should not be an issue. Will verify with benchmark. Test Plan: CI Reviewed By: zhaojuanmao Differential Revision: D31346633 fbshipit-source-id: 5b9245ade628565cffe01731f6a0dcbb6126029b	2021-10-07 18:11:18 -07:00
Rohan Varma	71704349aa	[DDP] Allow await of custom buffer reduction in backward (#64515 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64515 For performance reasons, we would like to ensure that we can await user collectives as part of custom buffer reduction in parallel to other work. As a result, add support to return futures from custom buffer hooks and await those futures at end of backwards pass. Also added some docs to clarify how to use these APIs. ghstack-source-id: 138793803 Test Plan: I Reviewed By: zhaojuanmao Differential Revision: D30757761 fbshipit-source-id: e1a2ead9ca850cb345fbee079cf0614e91bece44	2021-09-23 13:02:53 -07:00
Wanchao Liang	2f67579864	[ddp] use named_params and named_buffers explicitly (#65181 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65181 This PR changes `state_dict()` during sync to `named_parameters` and `named_buffers` explicitly. The underlying motivation is that `state_dict()` does not necessarily equal "params + buffers" in all cases: state_dict is used mainly for checkpointing, while params/buffers are used for training, so params/buffers may take a different form from state_dict (e.g., we might want to save state_dict as small pieces of tensors, while in training we want to concat the tensors together for performance reasons). ghstack-source-id: 138701159 Test Plan: wait for ci Reviewed By: divchenko, rohan-varma Differential Revision: D31007085 fbshipit-source-id: 4e1c4fbc07110163fb9b09b043ef7b4b75150f18	2021-09-22 17:32:54 -07:00
Rohan Varma	5739f77775	[DDP] Refactor and remove sync_params (#64514 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64514 sync_params is a misnomer since we don't actually synchronize parameters. While removing this I realized `self._check_and_sync_module_buffers` does almost everything we need it to, so just refactored that and made DDP forward call into it. ghstack-source-id: 138684982 Test Plan: CI Reviewed By: zhaojuanmao Differential Revision: D30751231 fbshipit-source-id: add7c684f5c6c71dad9e9597c7759849fa74f47a	2021-09-22 14:12:51 -07:00
Rohan Varma	ce5981e431	[DDP] Custom buffer reduction (#64513 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64513 Proposal: https://github.com/pytorch/pytorch/issues/63041 Support custom buffer reduction in DDP via hook ghstack-source-id: 138655663 Test Plan: CI Reviewed By: SciPioneer Differential Revision: D30751152 fbshipit-source-id: 257a9d46bb178d8812d4ea5a4d9c6140b8a1791f	2021-09-22 14:11:35 -07:00
Jessica Choi	f24bd43375	Changing type and name of local_used_maps to reflect that it is only one map (#65380 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65380 Fixing bugs that arise when running setup.py develop cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang gcramer23 Test Plan: Imported from OSS Reviewed By: mrshenli Differential Revision: D31104844 Pulled By: jaceyca fbshipit-source-id: acfd4cf316c71177df758ca55b470f51a17f776b	2021-09-22 11:35:33 -07:00
Jessica Choi	158b8bdc8a	Cleaning up DDP SPMD in reducer.cpp (#64113 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64113 Since there is only one model replica per process, `replicas` can be simplified from `std::vector<std::vector<at::Tensor>>` to `std::vector<at::Tensor>` in the Reducer class. Test Plan: All tests are passing `pytest test/distributed/test_c10d_gloo.py -vs` Imported from OSS Reviewed By: mrshenli Differential Revision: D30615965 fbshipit-source-id: d2ec809d99b788c200b01411333e7dbad1269b51	2021-09-21 16:13:18 -07:00
Rohan Varma	45bd0f6181	Back out "Revert D30745960: [DDP] Remove SPMD from self.modules_buffers" (#64778 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64778 Original commit changeset: d3f3fb813c45 ghstack-source-id: 138326910 Test Plan: ci Reviewed By: H-Huang Differential Revision: D30849443 fbshipit-source-id: 15dab8a959a29d2e2fefac6ad52b8d8168eacc02	2021-09-17 12:28:36 -07:00
Rohan Varma	70f286c1e2	Back out "Revert D30745961: [DDP] Remove self.modules_params" (#64777 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64777 Original commit changeset: 59f7cc50d369 ghstack-source-id: 138326909 Test Plan: ci Reviewed By: H-Huang Differential Revision: D30849442 fbshipit-source-id: bb87ba83935374d8a3ebbc29365df1417dd4f26f	2021-09-17 12:28:34 -07:00
Rohan Varma	61dfcbf4bc	Back out "Revert D30745921: [DDP] Fix when buffers are reassigned in module" (#64776 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64776 Original commit changeset: 343ead86bf1e ghstack-source-id: 138326914 Test Plan: ci Reviewed By: H-Huang Differential Revision: D30849444 fbshipit-source-id: 9a72805416fe7d6c68e51bdcdb88f6e1fecb614d	2021-09-17 12:28:32 -07:00
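
The collective mismatch described in commit bcd0843bec (#72843) can be replayed with a small pure-Python simulation. This is only a sketch under stated assumptions, not DDP's actual reducer logic: the helper names `collective_sequence` and `first_mismatch` are hypothetical, gradients of parameters unused on a rank are assumed to be marked ready before backward starts, and a bucket's all-reduce is assumed to launch as soon as every gradient in the bucket is ready.

```python
def collective_sequence(backward_order, buckets, all_params, collective_ops):
    """Return the ordered collectives one rank issues during backward (toy model)."""
    used = set(backward_order)
    # Assumption: gradients of params unused on this rank are ready immediately.
    ready = {p for p in all_params if p not in used}
    launched = set()
    sequence = []
    for op in backward_order:
        if op in collective_ops:
            # The op itself issues a collective in its backward pass.
            sequence.append(f"all-to-all({op})")
        ready.add(op)
        for name, members in buckets.items():
            if name not in launched and ready.issuperset(members):
                launched.add(name)
                sequence.append(f"all-reduce({name})")
    return sequence


def first_mismatch(seq_a, seq_b):
    """Return (step, a, b) for the first differing collective, or None."""
    for step, (a, b) in enumerate(zip(seq_a, seq_b)):
        if a != b:
            return step, a, b
    return None


params = ["A", "B", "C", "D"]
collective_ops = {"C"}               # C runs an all-to-all in backward
order_trainer_0 = ["D", "C", "B", "A"]
order_trainer_1 = ["D", "C", "A"]    # B is unused on trainer_1

# Buckets that ignore backward execution order (the failing case).
bad_buckets = {"bucket_0": ["B", "D"], "bucket_1": ["C", "A"]}
# Buckets that follow backward execution order.
good_buckets = {"bucket_0": ["D", "C"], "bucket_1": ["B", "A"]}

trainer_0_bad = collective_sequence(order_trainer_0, bad_buckets, params, collective_ops)
trainer_1_bad = collective_sequence(order_trainer_1, bad_buckets, params, collective_ops)
trainer_0_good = collective_sequence(order_trainer_0, good_buckets, params, collective_ops)
trainer_1_good = collective_sequence(order_trainer_1, good_buckets, params, collective_ops)

print("bad buckets: ", first_mismatch(trainer_0_bad, trainer_1_bad))
print("good buckets:", first_mismatch(trainer_0_good, trainer_1_good))
```

In this toy model the order-ignoring buckets make the first collective differ between the two trainers, while the order-respecting buckets give both trainers the same sequence.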

265 Commits