Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74130
Enable these tests to run for all distributed backends, not just NCCL.
ghstack-source-id: 151429410
Test Plan: CI
Reviewed By: awgu
Differential Revision: D34281684
fbshipit-source-id: 956c1b0cafe0502b593dd42b157d518e89a47d8e
(cherry picked from commit 15d58b88362c49565123823f24ca122d5344acc9)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73166
This PR refactors, cleans up, and optimizes the implementation of `TORCH_DISTRIBUTED_DEBUG`. It also introduces three new user APIs: `get_debug_level()`, `set_debug_level()`, and `set_debug_level_from_env()` to retrieve and modify the debug level after a process has started.
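A minimal usage sketch (assuming the level enum is exposed as `torch.distributed.DebugLevel` alongside the three APIs above):

```python
import torch.distributed as dist

# Hedged sketch: inspect and adjust the distributed debug level at runtime.
print(dist.get_debug_level())                 # e.g. DebugLevel.OFF

dist.set_debug_level(dist.DebugLevel.DETAIL)  # raise verbosity for debugging

# Re-read TORCH_DISTRIBUTED_DEBUG (OFF / INFO / DETAIL) from the environment.
dist.set_debug_level_from_env()
```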
ghstack-source-id: 149778566
Test Plan: Run the existing unit tests.
Reviewed By: rohan-varma
Differential Revision: D34371226
fbshipit-source-id: e18443b411adcbaf39b2ec999178c198052fcd5b
(cherry picked from commit 26d6bb1584b83a0490d8b766482656a5887fa21d)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70029
This PR implements scatter for NCCL and adds it to ProcessGroupNCCL.
NCCL does not directly provide a scatter primitive, so it has to be implemented on top of NCCL's send/recv API.
1. In ProcessGroupNCCL.cpp, the inputTensors are first flattened; then outputTensors and inputFlattened are passed by the collective class to the scatter() function in nccl.cpp.
2. In nccl.cpp, scatter is implemented using ncclSend/ncclRecv: the root rank uses a for loop to send (distribute) the inputTensors to each rank, and every other rank receives its inputTensor from the root rank, as sketched below.
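A rough Python sketch of the same send/recv pattern (point-to-point ops used for illustration only; the function name is hypothetical and the actual implementation is the C++ code in nccl.cpp):

```python
import torch.distributed as dist

def scatter_via_send_recv(output, scatter_list, src):
    # Hypothetical helper mirroring the logic above: the root loops over ranks
    # and sends each one its slice; every other rank receives its slice from the root.
    rank = dist.get_rank()
    if rank == src:
        for dst, t in enumerate(scatter_list):
            if dst == src:
                output.copy_(t)      # the root keeps its own slice locally
            else:
                dist.send(t, dst)    # distribute slice `dst` to rank `dst`
    else:
        dist.recv(output, src)       # receive this rank's slice from the root
```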
ghstack-source-id: 147754837
Test Plan:
test_scatter_ops
test_scatter_stress
test_scatter_checks
Reviewed By: pritamdamania87
Differential Revision: D33154823
fbshipit-source-id: 4513e7eaf7d47a60eb67da99dc6c2e9a2882f3fd
(cherry picked from commit 93201f9d4a)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66745
This PR implements gather for NCCL and adds it to ProcessGroupNCCL using the NCCL send/recv API.
NCCL does not directly provide a gather primitive, so it has to be implemented on top of NCCL's send/recv API.
1. In ProcessGroupNCCL.cpp, the outputTensors are first flattened; then inputTensors and outputFlattened are passed by the collective class to the gather() function in nccl.cpp.
2. In nccl.cpp, gather is implemented using ncclSend/ncclRecv: all ranks send their inputTensor to the root rank, and the root rank uses a for loop to receive these inputTensors, as sketched below.
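A rough Python sketch of the same pattern (point-to-point ops for illustration only; the helper name is hypothetical and the real implementation is the C++ code described above):

```python
import torch.distributed as dist

def gather_via_send_recv(input, gather_list, dst):
    # Hypothetical helper mirroring the logic above: every rank sends its input
    # to the root, and the root loops over ranks to receive each contribution.
    rank = dist.get_rank()
    if rank == dst:
        for src, slot in enumerate(gather_list):
            if src == dst:
                slot.copy_(input)     # the root's own contribution
            else:
                dist.recv(slot, src)  # receive rank `src`'s tensor into its slot
    else:
        dist.send(input, dst)
```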
ghstack-source-id: 147754838
Test Plan:
test_gather_ops
test_gather_checks
test_gather_stress
Reviewed By: pritamdamania87
Differential Revision: D29616361
fbshipit-source-id: b500d9b8e67113194c5cc6575fb0e5d806dc7782
(cherry picked from commit d560ee732e)
Summary:
Implements allreduce_coalesced for ProcessGroupNCCL as an NCCL group of allreduces on separate tensors, as proposed in https://github.com/pytorch/pytorch/issues/38995#issuecomment-882804595. In recent versions of NCCL, the performance of grouped comms has improved significantly. A group can execute with just one kernel, so a grouped comm on a set of unflattened tensors can be more performant than flattening + a single flat NCCL call.
The same approach can easily extend to broadcast_coalesced and reduce_coalesced.
I'm still not sure how (hypothetical) all_gather_coalesced and reduce_scatter_coalesced ops should be exposed or implemented, because we need to consider "_base" variants where the output or input tensor is pre-flattened. For example, https://github.com/pytorch/pytorch/issues/61781 effectively wants "allgather_base_coalesced".
I'm also not sure how the _multigpu variants should enter the picture. With the approach I've written here, ProcessGroupNCCL::allreduce accepts a vector of tensors that are either all on the same device (in which case it'll do an allreduce_coalesced) or all on different devices (in which case it'll do an allreduce_multigpu). In other words it can do _coalesced or _multigpu but not both at once.
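For reference, a hedged usage sketch from the Python side (assuming the op is surfaced as `dist.all_reduce_coalesced`; with the NCCL backend this should map onto a single grouped NCCL call as described):

```python
import torch
import torch.distributed as dist

# Coalesced allreduce over several unflattened tensors on the same device.
tensors = [torch.ones(n, device="cuda") for n in (4, 8, 16)]
dist.all_reduce_coalesced(tensors, op=dist.ReduceOp.SUM)
# Each tensor now holds the element-wise sum across all ranks, without an
# explicit flatten + single flat allreduce on the caller's side.
```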
For some reason GitHub won't let me add agolynski to the reviewers.
cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62140
Reviewed By: fduwjj
Differential Revision: D33781010
Pulled By: cbalioglu
fbshipit-source-id: f0c233da9ebae57d7ccecf6d8dc432d936d4d3ce
(cherry picked from commit e43cb81d30)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71600
These tests in test_c10d_nccl cover a subset of functionality that is
already exercised by distributed_test.py, so there is no need for these additional tests.
ghstack-source-id: 147458823
Test Plan: CI
Reviewed By: cbalioglu
Differential Revision: D33662679
fbshipit-source-id: 2d1c1223fdd72a851c537b4793a71d65190d2553
(cherry picked from commit 14565ac5a6)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71459
1. Add the static_graph argument to the DDP constructor.
2. Keep the _set_static_graph() API so that existing use cases are not affected; it can also be called internally by the DDP constructor.
3. Four cases are covered (see the sketch after this list):
static_graph = False, _set_static_graph() is called;
static_graph = False, _set_static_graph() is not called;
static_graph = True, _set_static_graph() is not called;
static_graph = True, _set_static_graph() is called.
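A minimal sketch of the two equivalent spellings (assuming an already-initialized process group and one GPU per process):

```python
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

local_rank = dist.get_rank() % torch.cuda.device_count()

# New constructor argument:
model = DDP(nn.Linear(10, 10).cuda(local_rank),
            device_ids=[local_rank],
            static_graph=True)

# Legacy private API, kept for existing callers:
# model = DDP(nn.Linear(10, 10).cuda(local_rank), device_ids=[local_rank])
# model._set_static_graph()
```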
ghstack-source-id: 147263797
Test Plan: unit tests
Reviewed By: rohan-varma
Differential Revision: D33646738
fbshipit-source-id: 8c1730591152aab91afce7133d2adf1efd723855
(cherry picked from commit dc246a1129)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69060
Saved-variable-hooks checkpointing was added in https://github.com/pytorch/pytorch/pull/69508; this PR adds some tests for DDP.
Specifically, we can support almost all DDP use cases with this new API, such as a dynamic module with find_unused_parameters=True. One case remains to be supported: static_graph + non-reentrant-based checkpointing. The underlying reason this does not work is https://github.com/pytorch/pytorch/issues/58111.
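A hedged sketch of a supported combination (assuming `use_reentrant=False` selects the saved-variable-hooks implementation and a process group is already initialized):

```python
import torch
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.checkpoint import checkpoint

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.a = nn.Linear(16, 16)
        self.b = nn.Linear(16, 16)

    def forward(self, x):
        # Non-reentrant checkpointing of the second layer.
        return checkpoint(self.b, self.a(x), use_reentrant=False)

model = DDP(Net().cuda(), find_unused_parameters=True)
out = model(torch.randn(4, 16, device="cuda"))
out.sum().backward()
```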
ghstack-source-id: 147219887
Test Plan: CI
Reviewed By: zhaojuanmao
Differential Revision: D32712126
fbshipit-source-id: ba5ae9ca77fd8929ee020c7dc97838bae9a1931b
(cherry picked from commit 9c7f93e217)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68786
To enable autograd for the sharded linear, we found we needed to make some changes to the current nn function API (the c10d API with autograd enabled). We made the following changes:
1. Add a new `reduce_scatter` API, since we need it for row-wise sharding (a usage sketch follows this list).
2. Modify the `all_to_all` API to make it consistent with the one in distributed_c10d.py.
3. Fix the C++ signature of `reduce_scatter`, which was missing an input parameter, and add more unit tests to cover these cases.
4. Sync the NN tests from Gloo to NCCL.
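A hedged usage sketch of the new autograd-enabled collective (module path and signature assumed to mirror the non-autograd `torch.distributed.reduce_scatter`):

```python
import torch
import torch.distributed as dist
import torch.distributed.nn.functional as dist_nn

world_size = dist.get_world_size()
inputs = [torch.randn(8, device="cuda", requires_grad=True) for _ in range(world_size)]
output = torch.empty(8, device="cuda")

# Each rank receives the reduction of its own slot; gradients flow back
# through the collective to the corresponding entries of `inputs`.
out = dist_nn.reduce_scatter(output, inputs, op=dist.ReduceOp.SUM)
out.sum().backward()
```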
ghstack-source-id: 144860208
Test Plan: CI + Unit Test
Reviewed By: pritamdamania87
Differential Revision: D32569674
fbshipit-source-id: 9bd613f91bbf7a39eede0af32a5a5db0f2ade43b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69356
Per title
ghstack-source-id: 144807949
Test Plan: CI
Reviewed By: zhaojuanmao
Differential Revision: D32816150
fbshipit-source-id: 6b4eacc63edd267bc1eb8a1c1d6c753bc581d63a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67705
This PR rewrites ProcessGroupNCCLTest as a MultiProcessTestCase. It was originally written in a single-process, multi-GPU fashion; we change it to multi-process to align with the other c10d tests.
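Roughly, the test structure moves to something like the following (a sketch based on my understanding of the common_distributed harness, not the actual test code):

```python
import torch
import torch.distributed as dist
from torch.testing._internal.common_distributed import MultiProcessTestCase

class ProcessGroupNCCLTest(MultiProcessTestCase):
    def setUp(self):
        super().setUp()
        self._spawn_processes()  # one subprocess per rank, instead of one process driving all GPUs

    def test_allreduce(self):
        store = dist.FileStore(self.file_name, self.world_size)
        pg = dist.ProcessGroupNCCL(store, self.rank, self.world_size)
        t = torch.ones(4, device=f"cuda:{self.rank}")
        pg.allreduce([t]).wait()
        self.assertEqual(t, torch.full_like(t, self.world_size))
```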
ghstack-source-id: 144555092
Test Plan: wait for CI
Reviewed By: pritamdamania87, fduwjj
Differential Revision: D32113626
fbshipit-source-id: 613d36aeae36bf441de1c2c83aa4755f4d33df4d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68792
Refactor tests to make it clearer which features are supported and
unsupported under certain DDP configs.
ghstack-source-id: 144285040
Test Plan: Ci
Reviewed By: pbelevich
Differential Revision: D32609498
fbshipit-source-id: 5231242054d4ff6cd8e7acc4a50b096771ef23d1
Summary:
Patch bfloat16 support in NCCL. PR https://github.com/pytorch/pytorch/issues/63260 adds bfloat16 support but is
not yet complete enough to enable bfloat16 allreduce in end-to-end training.
This patch does the following:
* fix the minimum required NCCL version from 2.9.7 to 2.10, since NCCL added bf16 support in
v2.10.3-1 (commit 7e51592)
* update the bfloat16 datatype flag in `csrc/cuda/nccl.cpp` so that NCCL
operations such as allreduce can use it
* enable unit tests for the bfloat16 datatype where possible
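With those pieces in place, a bf16 allreduce should work end to end, e.g. (assuming NCCL >= 2.10 and an initialized NCCL process group):

```python
import torch
import torch.distributed as dist

t = torch.ones(1024, dtype=torch.bfloat16, device="cuda")
dist.all_reduce(t, op=dist.ReduceOp.SUM)  # each element becomes world_size
```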
cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67843
Reviewed By: H-Huang
Differential Revision: D32248132
Pulled By: mrshenli
fbshipit-source-id: 081e96e725af3b933dd65ec157c5ad11c6873525
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67639
Due to BC considerations, we cannot directly error out, as that
might break existing applications. Raise warnings first to improve
debuggability.
cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang
Test Plan: Imported from OSS
Reviewed By: rohan-varma
Differential Revision: D32075151
Pulled By: mrshenli
fbshipit-source-id: 5680d420f5f6cd3f74a36616c03350e8a976b363
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67668
This adds an env var to enable NCCL health check, which when left unspecified, results in the check not being run. Unit tests that need to test this functionality have the env variable set. Please see internal diff for more details.
Test Plan: CI
Reviewed By: yuguo68, mrshenli
Differential Revision: D32089763
fbshipit-source-id: dff5664a5e607f711515cd1042089ca769914fbb
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66393
Third try!
Fixes:
- test_nccl_timeout could be flaky because of its 1s timeout; bump up the timeout to resolve the flakiness. In general, though, we should not have been relying on time.sleep for this test; filed https://github.com/pytorch/pytorch/issues/66354 to track that.
- ciflow/all did not actually run the tests due to a bug that caused multigpu tests not to be run. This has since been fixed.
ghstack-source-id: 140560113
Test Plan: CI
Reviewed By: mrshenli
Differential Revision: D31534735
fbshipit-source-id: 8b7e0f4fed3972b7a77cbcda28876c9eefb0c7e2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66394
Skips this test as it currently does not seem to pass after several
internal local runs.
ghstack-source-id: 140210583
Test Plan: CI
Reviewed By: mrshenli
Differential Revision: D31534806
fbshipit-source-id: 799849a6a715506a85c9697b46f7098d9b71b32e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65173
Initializes dummy NCCL communicators in the constructor as a basic health
check that communicators can be initialized before the first collective
is launched.
After successful init, we immediately use `ncclCommAbort` to destroy these
communicators to ensure they don't interfere with regular communicator creation
during collectives.
Test Plan: CI
Reviewed By: pritamdamania87
Differential Revision: D31005792
fbshipit-source-id: c2c582dee25a098361ead6ef03f541e7833c606b
Summary:
Resubmit of https://github.com/pytorch/pytorch/pull/62303.
Reverts the revert, and restores some diffs that were mysteriously missing from the reverted revert. I think some of the diffs I pushed to the original PR raced with its import or landing, such that the original PR's merge didn't pick up all the diffs I wanted. I don't know enough about the landing process to do more than speculate wildly, but hopefully this resubmit sorts things out.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62835
Reviewed By: zhouzhuojie, seemethere, janeyx99, heitorschueroff
Differential Revision: D30999982
Pulled By: malfet
fbshipit-source-id: 1f70ab4055208f1c6a80c9fc9fbe292ce68ecaa9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64241
When things go wrong, PG NCCL aborts NCCL communicators via `ncclCommAbort`, but one issue is that the error is often set to `ncclSystemError` (see https://github.com/pytorch/pytorch/blob/master/torch/csrc/distributed/c10d/NCCLUtils.hpp#L176) when that might not be the true cause; the actual issue may be that some prior work timed out, the communicator was aborted on another rank, etc.
This results in a lot of confusion when debugging jobs with a large number of processes, as the current message for ncclSystemError is not very informative: https://github.com/pytorch/pytorch/blob/master/torch/csrc/distributed/c10d/NCCLUtils.hpp#L22
The fix here is to pass a string exception message from PG NCCL down to `NCCLUtils`, which will aim to raise that as the actual issue instead of the confusing `ncclSystemError` message.
Test Plan: CI
Reviewed By: pallab-zz, cbalioglu
Differential Revision: D30658855
fbshipit-source-id: 17661dbe0a1bb8cc5b87b637c47634b1f52f54e1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63462
Now that `torch.distributed.optim` gates DistributedOptimizer on RPC availability, these tests can be run on windows.
ghstack-source-id: 136437635
Test Plan: CI
Reviewed By: SciPioneer
Differential Revision: D30358923
fbshipit-source-id: 36739bdfe7214789f17de652d30c62c2bc124c73
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63260
Add BF16 all-reduce communication hook. Skip if CUDA version < 11 or NCCL version < 2.9.7.
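A hedged registration sketch (assuming the hook lands in `torch.distributed.algorithms.ddp_comm_hooks.default_hooks` next to the FP16 variant, and that a process group is already initialized):

```python
import torch
import torch.nn as nn
import torch.distributed as dist
from torch.distributed.algorithms.ddp_comm_hooks import default_hooks as default
from torch.nn.parallel import DistributedDataParallel as DDP

local_rank = dist.get_rank() % torch.cuda.device_count()
model = DDP(nn.Linear(32, 32).cuda(local_rank), device_ids=[local_rank])

# Compress gradients to bfloat16 for the allreduce, decompress afterwards.
model.register_comm_hook(state=None, hook=default.bf16_compress_hook)
```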
Reviewed By: SciPioneer
Differential Revision: D30238317
fbshipit-source-id: bad35bf7d43f10f1c40997a282b831b61ef592bb
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63383
Per title
ghstack-source-id: 135966157
Test Plan: CI
Reviewed By: SciPioneer
Differential Revision: D30358921
fbshipit-source-id: 965e054e525194b1ee55980340df275bab355c9b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63382
Per title
ghstack-source-id: 135966156
Test Plan: CI
Reviewed By: SciPioneer
Differential Revision: D30255446
fbshipit-source-id: e6ffbf339db0bc5b4702d02b74a462309df07c75
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62051
The goal here is to enable opt-asan for "spawn" based unit tests since
this works for "spawn" unlike "dev-asan". As a result, we can run ASAN for
"spawn" unit tests as well.
This means we can completely remove fork unit tests from the code base since
the only purpose for these tests was to run ASAN.
ghstack-source-id: 135523770
Test Plan: waitforbuildbot
Reviewed By: SciPioneer
Differential Revision: D29854514
fbshipit-source-id: 02a5bfcfae2afc21badecff77082c7a6ad83636b
Summary:
https://github.com/pytorch/pytorch/issues/62295
Previously, the packing and unpacking of the NCCL version "integer" was done to have parity with the upstream NCCL version encoding. However, there doesn't seem to be any place where this integer is directly compared with a version integer sourced from upstream NCCL, and syncing the encoding is error-prone (e.g., a recent change added a special case for minor versions >= 10: 7e51592129/src/nccl.h.in (L22)).
This patch changes the reporting to return a tuple of version numbers instead (to preserve ease-of-use for comparisons) and tweaks the passing between C/Python to avoid the digit overflow problem.
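With the tuple form, version checks become straightforward, e.g.:

```python
import torch

# Hedged sketch: torch.cuda.nccl.version() now returns a (major, minor, patch)
# tuple, so ordinary tuple comparison works and no digits can overflow.
if torch.cuda.nccl.version() >= (2, 10, 3):
    print("NCCL has bf16 support")
```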
CC ngimel mcarilli
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62916
Reviewed By: anjali411
Differential Revision: D30201069
Pulled By: mrshenli
fbshipit-source-id: 2e4e7c69f001c3f22bd04aa6df6a992e538bea45
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62611
Enables optimizer overlap with the backward pass in DDP for Adam. Additional optimizers, especially Adagrad, will be done in follow-up diffs.
1. Implement a `step_param` method based on `step` in _FunctionalAdam (perf permitting, we can later dedupe `step` to call `step_param`); a conceptual sketch follows this list.
2. Modify tests to cover all current functional optimizers.
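Conceptually, `step_param` applies the optimizer update to a single parameter as soon as its gradient is available, which is what lets DDP overlap the step with the remaining backward work. A hypothetical single-parameter Adam step for illustration (names and signature are not the actual `_FunctionalAdam` API):

```python
import torch

def step_param(param, grad, exp_avg, exp_avg_sq, step, lr=1e-3,
               betas=(0.9, 0.999), eps=1e-8):
    # Standard Adam update, but scoped to a single parameter tensor.
    exp_avg.mul_(betas[0]).add_(grad, alpha=1 - betas[0])
    exp_avg_sq.mul_(betas[1]).addcmul_(grad, grad, value=1 - betas[1])
    bias_c1 = 1 - betas[0] ** step
    bias_c2 = 1 - betas[1] ** step
    denom = (exp_avg_sq / bias_c2).sqrt_().add_(eps)
    param.addcdiv_(exp_avg / bias_c1, denom, value=-lr)
```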
ghstack-source-id: 135207143
Test Plan: CI
Reviewed By: SciPioneer
Differential Revision: D29891783
fbshipit-source-id: 321915982afd5cb0a9c2e43d27550f433bff00d1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62662
Replaced the methods set_tensor(.) and get_tensor() in the Python-exposed API from the C++ logic with buffer() and set_buffer(.), for a cleaner interface.
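A hedged sketch of a communication hook written against the renamed accessors (assuming they are exposed on `torch.distributed.GradBucket`):

```python
import torch
import torch.distributed as dist

def allreduce_hook(state, bucket):
    # Read the flat gradient with buffer(), write it back with set_buffer(.).
    bucket.set_buffer(bucket.buffer() / dist.get_world_size())
    fut = dist.all_reduce(bucket.buffer(), async_op=True).get_future()
    return fut.then(lambda f: f.value()[0])
```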
Reviewed By: SciPioneer
Differential Revision: D30012869
fbshipit-source-id: bd8efab583dd89c96f9aeb3dd48a12073f0b1482
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62532
This method is not stable at this time, so avoid releasing it when the DDP communication hook feature is released as a stable feature.
ghstack-source-id: 134787831
Test Plan:
buck test mode/dev-nosan caffe2/test/distributed:c10d -- test_ddp_hook_with_optimizer_parity
buck test mode/dev-nosan caffe2/test/distributed:distributed_nccl_fork -- test_hook_then_optimizer_nccl
Reviewed By: rohan-varma
Differential Revision: D30031222
fbshipit-source-id: e03a8e13fee5116a5ddd724eb76316ee98f2a676
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62457
Specify `Future[torch.Tensor]` as the DDP communication hook return type, which must now be explicitly a single tensor. The previous API took a list containing a single tensor.
Note that the typing info no longer accepts the internal type `torch._C.Future`, which does not support TorchScript and hence cannot express `Future[torch.Tensor]`.
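For example, a minimal hook under the new typing (a sketch; the hook's future must now resolve to the flat bucket tensor itself rather than a one-element list):

```python
import torch
import torch.distributed as dist

def noop_hook(state, bucket: dist.GradBucket) -> torch.futures.Future[torch.Tensor]:
    fut: torch.futures.Future[torch.Tensor] = torch.futures.Future()
    fut.set_result(bucket.buffer())   # previously: fut.set_result([bucket.buffer()])
    return fut
```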
ghstack-source-id: 134771419
Test Plan:
buck test mode/dev-nosan caffe2/test/distributed:c10d -- test_default_ddp_comm_hooks_nccl
buck test mode/dev-nosan caffe2/test/distributed:c10d -- test_ddp_invalid_comm_hook_return_type
Reviewed By: rohan-varma
Differential Revision: D30007390
fbshipit-source-id: 246667c9b575b4c6e617b0a5b373151f1bd81e7f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62389
Simplify the implementation of `parseHookResult` of `PythonCommHook`, by not directly accepting the output of allreduce, which is a tensor list.
Address the comment on https://github.com/pytorch/pytorch/pull/62074#discussion_r675303280
Additionally, the formatter is also applied to `OptimizerHookState` and `hook_then_optimizer`.
ghstack-source-id: 134626246
Test Plan:
buck test mode/dev-nosan caffe2/test/distributed:c10d
buck test mode/dev-nosan caffe2/test/distributed:distributed_nccl_fork
Reviewed By: rohan-varma
Differential Revision: D29982485
fbshipit-source-id: 5b27cc5ef09d2f87c1ade4c0feef7eacc1af3a9a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61887
1) Introduced a `sandcastle_skip_if` decorator that ensures these
tests are simply reported as passing on Sandcastle instead of skipped (see the sketch below).
2) Fixed all test files under `test/distributed` to not use `unittest.skip`.
The overall goal is to avoid skips, since Sandcastle flags these tests as
continuously skipping.
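A rough sketch of the idea behind the decorator (illustrative only, not the exact in-tree helper; the Sandcastle sentinel is an assumption):

```python
import functools
import os
import unittest

def sandcastle_skip_if(condition, reason):
    def decorator(func):
        if not condition:
            return func
        if os.getenv("SANDCASTLE") == "1":  # assumed sentinel for the internal CI
            @functools.wraps(func)
            def wrapper(*args, **kwargs):
                # Report the test as trivially passing instead of skipping it.
                print(f"Passing {func.__name__} on Sandcastle: {reason}")
            return wrapper
        return unittest.skip(reason)(func)  # normal skip elsewhere
    return decorator
```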
ghstack-source-id: 134382237
Test Plan: waitforbuildbot
Reviewed By: SciPioneer
Differential Revision: D29784152
fbshipit-source-id: 17b4df6c5a55ff1d1e8e1de128fa679c3dfbcb7d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62079
Adds support for passing kwargs to the functional optimizer running as a
hook.
ghstack-source-id: 134330379
Test Plan: CI
Reviewed By: SciPioneer
Differential Revision: D29838127
fbshipit-source-id: 2ab051ef5f0dff19c145ebe2260668b927ba47b2