pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-06 12:20:52 +01:00

Author	SHA1	Message	Date
Tom Ritchford	d25e6e623f	Fix unused Python variables in test/[a-d]* (#134665 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/134665 Approved by: https://github.com/albanD	2024-12-13 22:13:12 +00:00
zeshengzong	cb71bcc542	Replace clone.detach with detach.clone (#140264 ) Fixes #64532 As state in issue, replace `clone.detach` by `detach.clone` Pull Request resolved: https://github.com/pytorch/pytorch/pull/140264 Approved by: https://github.com/soulitzer	2024-11-13 07:01:02 +00:00
FFFrog	e14b58ffbd	Using device-agnostic autocast api (#136613 ) - using torch.autocast(device_str="cuda") instead of torch.cuda.amp.autocast() - using torch.autocast(device_str="cpu") instead of torch.cpu.amp.autocast() Pull Request resolved: https://github.com/pytorch/pytorch/pull/136613 Approved by: https://github.com/shink, https://github.com/cyyever, https://github.com/kwen2501	2024-09-27 07:16:24 +00:00
Oguz Ulgen	920f0426ae	Add None return type to init -- tests rest (#132376 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/132376 Approved by: https://github.com/jamesjwu ghstack dependencies: #132335, #132351, #132352	2024-08-01 15:44:51 +00:00
ekamiti	9e473fd868	Make adding Buffers more like adding Parameters (#125971 ) Add similar semantics for creating a buffer object similar to creating a parameter. This is done by introducing a new Buffer class that can be used for type disambiguation. The underlying functionality of registering a buffer remains the same as the register_buffer method has not been changed. The persistent parameter in the Buffer type is to indicate whether a buffer object should be persistent or not. Other non-test changes have to do with getting the new Buffer type recognized by inductor and dynamo. Remaining changes are test changes to make sure that the Buffer type can be used as a drop in replacement for register_buffer as it just leads to register_buffer being called. The addition of this new functionality still allows for normal tensors to be used as buffers so these changes are intended to be backwards compatible. Fixes #35735 Co-authored-by: Mikayla Gawarecki <mikaylagawarecki@gmail.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/125971 Approved by: https://github.com/albanD, https://github.com/anijain2305, https://github.com/mlazos	2024-07-31 10:32:40 +00:00
Xuehai Pan	db3290846e	[BE][Easy][10/19] enforce style for empty lines in import segments in `test/d*/` (#129761 ) See https://github.com/pytorch/pytorch/pull/129751#issue-2380881501. Most changes are auto-generated by linter. You can review these PRs via: ```bash git diff --ignore-all-space --ignore-blank-lines HEAD~1 ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/129761 Approved by: https://github.com/fegin	2024-07-17 16:57:39 +00:00
Yuanhao Ji	e3effa5855	Enable UFMT on all of `test/distributed` (#123539 ) Partially addresses #123062 Ran lintrunner on: - `test/distributed` Pull Request resolved: https://github.com/pytorch/pytorch/pull/123539 Approved by: https://github.com/ezyang	2024-04-17 06:46:02 +00:00
PyTorch MergeBot	52be63eb2c	Revert "Enable UFMT on all of `test/distributed` (#123539 )" This reverts commit `89ac37fe91`. Reverted https://github.com/pytorch/pytorch/pull/123539 on behalf of https://github.com/DanilBaibak due to Broken trunk ([comment](https://github.com/pytorch/pytorch/pull/123539#issuecomment-2058329471))	2024-04-16 06:33:21 +00:00
Yuanhao Ji	89ac37fe91	Enable UFMT on all of `test/distributed` (#123539 ) Partially addresses #123062 Ran lintrunner on: - `test/distributed` Pull Request resolved: https://github.com/pytorch/pytorch/pull/123539 Approved by: https://github.com/ezyang	2024-04-16 03:23:56 +00:00
soulitzer	73f4c1a406	[reland2] Update custom Function preserve torch function when inputs … (#110895 ) …returned as-is Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/110895 Approved by: https://github.com/albanD	2023-10-11 21:37:19 +00:00
PyTorch MergeBot	d1c157c598	Revert "[reland] Update custom Function preserve torch function when inputs r… (#110679 )" This reverts commit `563728f61c`. Reverted https://github.com/pytorch/pytorch/pull/110679 on behalf of https://github.com/kit1980 due to The diff has Meta-internal changes, please land from Phabricator ([comment](https://github.com/pytorch/pytorch/pull/110679#issuecomment-1753523182))	2023-10-09 19:09:01 +00:00
soulitzer	563728f61c	[reland] Update custom Function preserve torch function when inputs r… (#110679 ) …eturned as-is reland of https://github.com/pytorch/pytorch/pull/109825#issuecomment-1749803837 Opening this without ghstack to do codev. In our PR, we changed the signature of `_wrap_outputs`. There is some internal code that calls `_wrap_outputs` directly, so we also need to update that callsite. Pull Request resolved: https://github.com/pytorch/pytorch/pull/110679 Approved by: https://github.com/albanD	2023-10-07 00:27:45 +00:00
PyTorch MergeBot	236afe73a2	Revert "Update custom Function preserve torch function when inputs returned as-is (#109825 )" This reverts commit `4e73eee93f`. Reverted https://github.com/pytorch/pytorch/pull/109825 on behalf of https://github.com/PaliC due to causing a plethora of internal failures ([comment](https://github.com/pytorch/pytorch/pull/109825#issuecomment-1749802739))	2023-10-05 23:49:41 +00:00
soulitzer	4e73eee93f	Update custom Function preserve torch function when inputs returned as-is (#109825 ) Fixes https://github.com/pytorch/pytorch/issues/109805 Pull Request resolved: https://github.com/pytorch/pytorch/pull/109825 Approved by: https://github.com/albanD	2023-10-04 22:45:11 +00:00
Kurt Mohler	3f88e3105f	Reland: Remove remaining global `set_default_dtype` calls from tests (#108088 ) Fixes #68972 Relands #107246 To avoid causing Meta-internal CI failures, this PR avoids always asserting that the default dtype is float in the `TestCase.setUp/tearDown` methods. Instead, the assert is only done if `TestCase._default_dtype_check_enabled == True`. `_default_dtype_check_enabled` is set to True in the `if __name__ == "__main__":` blocks of all the relevant test files that have required changes for this issue Pull Request resolved: https://github.com/pytorch/pytorch/pull/108088 Approved by: https://github.com/ezyang	2023-09-07 03:04:34 +00:00
PyTorch MergeBot	161ea463e6	Revert "Remove remaining global `set_default_dtype` calls from tests (#107246 )" This reverts commit `aa8ea1d787`. Reverted https://github.com/pytorch/pytorch/pull/107246 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally ([comment](https://github.com/pytorch/pytorch/pull/107246#issuecomment-1693838522))	2023-08-25 19:34:55 +00:00
Kurt Mohler	aa8ea1d787	Remove remaining global `set_default_dtype` calls from tests (#107246 ) Fixes #68972 Pull Request resolved: https://github.com/pytorch/pytorch/pull/107246 Approved by: https://github.com/ezyang	2023-08-24 16:10:48 +00:00
Jason Lu	bc88028e8e	Back out "Reland "Make adding buffers more like adding parameters (#104069 )" (#106224 )" (#106743 ) Summary: Original commit changeset: 81319beb97f3 Original Phabricator Diff: D47961182 Test Plan: revert to maintain backward compat with legacy ads_dper3 production package. Read details in: S357822 Reviewed By: atuljangra Differential Revision: D48131623 @diff-train-skip-merge (D48131623 landed internally) Pull Request resolved: https://github.com/pytorch/pytorch/pull/106743 Approved by: https://github.com/malfet	2023-08-08 15:27:34 +00:00
Mikayla Gawarecki	d8e5f2aa6d	Reland "Make adding buffers more like adding parameters (#104069 )" (#106224 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/106224 Approved by: https://github.com/atalman, https://github.com/albanD	2023-07-31 17:18:56 +00:00
Andrey Talman	c6653b65d8	Back out "Make adding buffers more like adding parameters (#104069 )" (#105581 ) Summary: D47537831 is breaking pyper tests: https://fb.workplace.com/groups/802176577445480/posts/1018902842439518/ with `TypeError: register_buffer() takes 3 positional arguments but 4 were given` Original commit changeset: d4b4069fbd38 Original Phabricator Diff: D47537831 Test Plan: ``` buck2 run //caffe2/torch/fb/training_toolkit/integration_tests/training_lifecycle/cogwheel_tests/pyper_release_v2:cogwheel_smallworld_inline_cvr_infer_pyper_pyper__canary_offline_training-launcher -- --run-harness-in-tupperware --build-fbpkg ads_dper3 --build-fbpkg training_platform ``` Reviewed By: atalman Differential Revision: D47600140 Pull Request resolved: https://github.com/pytorch/pytorch/pull/105581 Approved by: https://github.com/mikaylagawarecki	2023-07-20 03:39:53 +00:00
Justin Chu	232b96b6e2	[BE] Enable ruff's UP rules and autoformat distributed/ (#105433 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/105433 Approved by: https://github.com/albanD	2023-07-19 14:27:11 +00:00
ekamiti	32d422f335	Make adding buffers more like adding parameters (#104069 ) Add similar semantics for creating a buffer object similar to creating a parameter. This is done by introducing a new `Buffer` class that can be used for type disambiguation. The underlying functionality of registering a buffer remains the same as the `register_buffer` method has not been changed. The `persistent` parameter in the `Buffer` type is to indicate whether a buffer object should be persistent or not. Other non-test changes have to do with getting the new `Buffer` type recognized by inductor and dynamo. Remaining changes are test changes to make sure that the `Buffer` type can be used as a drop in replacement for `register_buffer` as it just leads to `register_buffer` being called. The addition of this new functionality still allows for normal tensors to be used as buffers so these changes are intended to be backwards compatible. Fixes #35735 Pull Request resolved: https://github.com/pytorch/pytorch/pull/104069 Approved by: https://github.com/mikaylagawarecki	2023-07-17 17:59:05 +00:00
Sergii Dymchenko	35bf5bac26	Fix "sandcastle_skip_if decorator name is confusing" (#95649 ) Fixes https://github.com/pytorch/pytorch/issues/89473 See the issue https://github.com/pytorch/pytorch/issues/89473 Pull Request resolved: https://github.com/pytorch/pytorch/pull/95649 Approved by: https://github.com/atalman, https://github.com/malfet	2023-03-03 09:29:40 +00:00
Xuehai Pan	046e88a291	[BE] [3/3] Rewrite `super()` calls in test (#94592 ) Rewrite Python built-in class `super()` calls. Only non-semantic changes should be applied. - #94587 - #94588 - #94592 Also, methods with only a `super()` call are removed: ```diff class MyModule(nn.Module): - def __init__(self): - super().__init__() - def forward(self, ...): ... ``` Some cases that change the semantics should be kept unchanged. E.g.: `f152a79be9/caffe2/python/net_printer.py (L184-L190)` `f152a79be9/test/test_jit_fuser_te.py (L2628-L2635)` Pull Request resolved: https://github.com/pytorch/pytorch/pull/94592 Approved by: https://github.com/ezyang, https://github.com/seemethere	2023-02-12 22:20:53 +00:00
Philip Meier	b5f2574f36	no longer coalesce sparse COO tensors before comparison (#69751 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69751 cc nikitaved pearu cpuhrsch IvanYashchuk Test Plan: Imported from OSS Reviewed By: zou3519 Differential Revision: D34262453 Pulled By: ezyang fbshipit-source-id: e2e62d2aa03fc569d2951c880960b256f5dc4aaa (cherry picked from commit `cb6b0ef719`)	2022-02-17 02:33:08 +00:00
Alban Desmaison	7035738b50	Change ParameterList and ParameterDict to be able to contain any kind of objects (#70499 ) Summary: The only difference with plain list/dict now is that nn.Parameters are handled specially and registered as parameters properly. test_nn and parametrization works locally. Will see in CI if DP is fixed as well. Tentative fix for https://github.com/pytorch/pytorch/issues/36035 Pull Request resolved: https://github.com/pytorch/pytorch/pull/70499 Reviewed By: jbschlosser, alexeib Differential Revision: D34005332 Pulled By: albanD fbshipit-source-id: 7e76b0873d0fec345cb537e2a6ecba0258e662b9 (cherry picked from commit `dc1e6f8d86`)	2022-02-09 18:52:29 +00:00
Jane Xu	c4400fc431	Retire repeat_test_for_types (#71033 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/69865 cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang Pull Request resolved: https://github.com/pytorch/pytorch/pull/71033 Reviewed By: mruberry Differential Revision: D33486370 Pulled By: janeyx99 fbshipit-source-id: 71f9383dbc1e00b572f26eb4f04d0a94c6759e35	2022-01-10 09:13:54 -08:00
Jane Xu	34051d74da	Add test owner to distributed files starting with test_ (#66797 ) Summary: Action based on https://github.com/pytorch/pytorch/issues/66232 cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang Pull Request resolved: https://github.com/pytorch/pytorch/pull/66797 Reviewed By: gchanan Differential Revision: D31761389 Pulled By: janeyx99 fbshipit-source-id: c27c9ab4acec1eb71d5edd4538cd113b770dfc6c	2021-10-19 10:55:20 -07:00
Pritam Damania	82d81455ae	[2/N] Remove unittest.skip across all of torch.distributed. (#61887 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61887 1) Introduced a `sandcastle_skip_if` decorator that ensures these tests just get passed on sandcastle. 2) Fixed all test files under `test/distributed` to not use `unittest.skip` Overall goal is to avoid using skips since sandcastle tags these tests as continuously skipping. ghstack-source-id: 134382237 Test Plan: waitforbuildbot Reviewed By: SciPioneer Differential Revision: D29784152 fbshipit-source-id: 17b4df6c5a55ff1d1e8e1de128fa679c3dfbcb7d	2021-07-27 10:53:23 -07:00
Kyle Chen	06cf6d37b5	[ROCm] Enable test cases in test_data_parallel.py for ROCm (#52708 ) Summary: Enabling the test cases because they are passing for ROCm. Signed-off-by: Kyle Chen <kylechen@amd.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/52708 Reviewed By: albanD Differential Revision: D26994458 Pulled By: mrshenli fbshipit-source-id: f0b3797c7889287a0154b1d5397df715ffb1c605	2021-03-11 18:29:37 -08:00
Jeffrey Wan	c0966914bc	Internal gradcheck wrapper in testing._internal that sets certain flags to True (#51133 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/49409 There are many call sites where, gradcheck/gradgradcheck is now being implicitly invoked with `check_batched_grad` as True, but they were previously False. Cases fall into two basic categories: 1) the call site was previously using `torch.autograd.gradcheck` but is now changed to use the globally imported function instead 3) the call site was already using globally imported function, but does not explicitly pass `check_batched_grad` flag Only in the _assertGradAndGradgradChecks cases, which are infrequent, I assumed that the the author is aware that omitting the flag means not applying check_batched_grad=True. (but maybe that is not the case?) Overall this PR in its current state assumes that unless the author explicitly specified `check_batched_grad=False`, they were just probably not aware of this flag and did not mean to have this flag as False. So far exceptions to the above (as discovered by CI) include: - Mkldnn (opaque tensors do not have strides) https://app.circleci.com/pipelines/github/pytorch/pytorch/264416/workflows/e4d87886-6247-4305-8526-2696130aa9a4/jobs/10401882/tests - all cases in test_sparse (https://app.circleci.com/pipelines/github/pytorch/pytorch/264553/workflows/3c1cbe30-830d-4acd-b240-38d833dccd9b/jobs/10407103) - all cases in test_overrides (https://app.circleci.com/pipelines/github/pytorch/pytorch/264553/workflows/3c1cbe30-830d-4acd-b240-38d833dccd9b/jobs/10407236) - test_autograd (test_LSTM_grad_and_gradgrad) - (https://app.circleci.com/pipelines/github/pytorch/pytorch/264553/workflows/3c1cbe30-830d-4acd-b240-38d833dccd9b/jobs/10407235) - test_data_parallel (test_data_parallel_buffers_requiring_grad) - SIGSEGV (https://app.circleci.com/pipelines/github/pytorch/pytorch/264820/workflows/14d89503-040d-4e3d-9f7b-0bc04833589b/jobs/10422697) - test_nn (https://app.circleci.com/pipelines/github/pytorch/pytorch/264919/workflows/df79e3ed-8a31-4a8e-b584-858ee99686ff/jobs/10427315) Possible TODO is to prevent new tests from invoking external gradcheck. Pull Request resolved: https://github.com/pytorch/pytorch/pull/51133 Reviewed By: ezyang Differential Revision: D26147919 Pulled By: soulitzer fbshipit-source-id: dff883b50f337510a89f391ea2fd87de2d531432	2021-01-29 09:13:37 -08:00
Sidney Fletcher	71cfb73755	Add complex support to broadcast_coalesced (#48686 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/47330 Add support for DataParallel complex tensors by handling them as `torch.view_as_real` for `broadcast_coalesced`, `scatter` and `gather` Pull Request resolved: https://github.com/pytorch/pytorch/pull/48686 Reviewed By: osalpekar Differential Revision: D25261533 Pulled By: sidneyfletcher fbshipit-source-id: 3a25e05deee43e053f40d1068fc5c7867cfa9686	2020-12-09 05:11:40 -08:00
Richard Zou	22d21414d7	Revert D24574649: [pytorch][PR] Utility that loads a DP/DDP model state dict into a non-DDP model with the same architecture. Test Plan: revert-hammer Differential Revision: D24574649 (`b631c872c9`) Original commit changeset: 17d29ab16ae2 fbshipit-source-id: 6766c6b21b82c9463143da0370192d9c68dbce6c	2020-11-10 06:55:45 -08:00
Pradeep Ganesan	b631c872c9	Utility that loads a DP/DDP model state dict into a non-DDP model with the same architecture. (#45643 ) Summary: Added a convenience function that allows users to load models without DP/DDP from a DP/DDP state dict. Pull Request resolved: https://github.com/pytorch/pytorch/pull/45643 Reviewed By: rohan-varma Differential Revision: D24574649 fbshipit-source-id: 17d29ab16ae24a30890168fa84da6c63650e61e9	2020-11-09 20:49:29 -08:00
lixinyu	870a5a0d6d	Enable DataParallel to run zero input Module (#46565 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/46565 Test Plan: Imported from OSS Reviewed By: mrshenli Differential Revision: D24405275 Pulled By: glaringlee fbshipit-source-id: a8baaf4cf227f7f21fc3b080a446f92f0effe18e	2020-10-22 18:04:33 -07:00
Emilio Castillo	d38a71d579	`torch.nn.modules.LazyModuleMixin` and `torch.nn.LazyLinear` (Shape Inference II) (#44538 ) Summary: Retake on https://github.com/pytorch/pytorch/issues/40493 after all the feedback from albanD This PR implements the generic Lazy mechanism and a sample `LazyLinear` layer with the `UninitializedParameter`. The main differences with the previous PR are two; Now `torch.nn.Module` remains untouched. We don't require an explicit initialization or a dummy forward pass before starting the training or inference of the actual module. Making this much simpler to use from the user side. As we discussed offline, there was the suggestion of not using a mixin, but changing the `__class__` attribute of `LazyLinear` to become `Linear` once it's completely initialized. While this can be useful, by the time being we need `LazyLinear` to be a `torch.nn.Module` subclass since there are many checks that rely on the modules being instances of `torch.nn.Module`. This can cause problems when we create complex modules such as ``` class MyNetwork(torch.nn.Module): def __init__(self): super(MyNetwork, self).__init__() self.conv = torch.nn.Conv2d(20, 4, 2) self.linear = torch.nn.LazyLinear(10) def forward(self, x): y = self.conv(x).clamp(min=0) return self.linear(y) ``` Here, when the __setattr__ function is called at the time LazyLinear is registered, it won't be added to the child modules of `MyNetwork`, so we have to manually do it later, but currently there is no way to do such thing as we can't access the parent module from LazyLinear once it becomes the Linear module. (We can add a workaround to this if needed). TODO: Add convolutions once the design is OK Fix docstrings Pull Request resolved: https://github.com/pytorch/pytorch/pull/44538 Reviewed By: ngimel Differential Revision: D24162854 Pulled By: albanD fbshipit-source-id: 6d58dfe5d43bfb05b6ee506e266db3cf4b885f0c	2020-10-19 13:13:54 -07:00
albanD	e155fbe915	add warning when ParameterList/Dict is used with DataParallel (#44405 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44405 Test Plan: Imported from OSS Reviewed By: agolynski Differential Revision: D23783987 Pulled By: albanD fbshipit-source-id: 5018b0d381cb09301d2f88a98a910854f740ace1	2020-09-22 08:58:00 -07:00
Hongfei XU	f02753fabb	Support AMP in nn.parallel (#43102 ) Summary: Take care of the state of autocast in `parallel_apply`, so there is no need to decorate model implementations. Pull Request resolved: https://github.com/pytorch/pytorch/pull/43102 Reviewed By: ngimel Differential Revision: D23294610 Pulled By: mrshenli fbshipit-source-id: 0fbe0c79de976c88cadf2ceb3f2de99d9342d762	2020-08-25 08:38:49 -07:00
Eli Uriegas	ccc831ae35	test: Disable test_strided_grad_layout on ROCM (#42561 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/42561 Regression was introduced as part of `5939d8a3e0`, logs: https://app.circleci.com/pipelines/github/pytorch/pytorch/196558/workflows/9a2dd56e-86af-4d0f-9fb9-b205dcd12f93/jobs/6502042 Going to go ahead and disable the test to give rocm folks time to investigate what's going on Signed-off-by: Eli Uriegas <eliuriegas@fb.com> Test Plan: Imported from OSS Reviewed By: malfet Differential Revision: D22932615 Pulled By: seemethere fbshipit-source-id: 41150f3085f848cce75990716362261fea9391a0	2020-08-04 16:20:44 -07:00
Michael Carilli	8066fba226	[RELAND2] Change AccumulateGrad to yield `.grad`s that match weights' memory layout (#40358 ) Summary: https://github.com/pytorch/pytorch/pull/40129 fixed the error responsible for the first revert, but exposed another error in the same test. This PR is intended as the "master copy" for merge, and it runs on full CI. Two other PRs (restricted to run on a small subset of CI) supporting debugging DDP failures/hangs with multiple devices per process (`test_c10d.py:DistributedDataParallelTest.test_grad_layout_1devicemodule_2replicaperprocess`). - https://github.com/pytorch/pytorch/pull/40290 tries the test with purely rowmajor contiguous params on an untouched master. In other words https://github.com/pytorch/pytorch/pull/40290 contains none of this PR's diffs aside from the test itself. - https://github.com/pytorch/pytorch/pull/40178, for comparison, tries the test with this PR's diffs. Both fail the same way, indicating failure is unrelated to this PR's other diffs. Pull Request resolved: https://github.com/pytorch/pytorch/pull/40358 Differential Revision: D22165785 Pulled By: albanD fbshipit-source-id: ac7cdd79af5c080ab74341671392dca8e717554e	2020-06-22 17:13:21 -07:00
Alban Desmaison	08227fea4f	Revert D22079377: [pytorch][PR] [RELAND] Change AccumulateGrad to yield `.grad`s that match weights' memory layout Test Plan: revert-hammer Differential Revision: D22079377 Original commit changeset: 9bd2b7e0c34f fbshipit-source-id: c22cc349d790caa574eace0d63980854c33e5a59	2020-06-17 10:17:27 -07:00
Michael Carilli	1ec8ece2b9	[RELAND] Change AccumulateGrad to yield `.grad`s that match weights' memory layout (#40129 ) Summary: https://github.com/pytorch/pytorch/pull/34904 was reverted because it had a misconfigured 4 GPU test that for some reason wasn't caught by external CI ([example failure](https://app.circleci.com/pipelines/github/pytorch/pytorch/181719/workflows/cfb37cd9-9a0c-4738-898b-d683934cd308/jobs/5868948/steps)). This PR reverts the revert, and adds diffs that should repair the misconfigured test. Pull Request resolved: https://github.com/pytorch/pytorch/pull/40129 Differential Revision: D22079377 Pulled By: albanD fbshipit-source-id: 9bd2b7e0c34fdaf887497b52037cfe82cba709c1	2020-06-17 09:02:54 -07:00
Alban Desmaison	f1e575a0bf	Revert D20496044: [pytorch][PR] Change AccumulateGrad to yield `.grad`s that match weights' memory layout Test Plan: revert-hammer Differential Revision: D20496044 Original commit changeset: 248d680f4b1b fbshipit-source-id: 6462b25e3fb9c8596c1da443389089f09c32df4d	2020-06-16 10:38:40 -07:00
Michael Carilli	2beb9690c3	Change AccumulateGrad to yield `.grad`s that match weights' memory layout (#34904 ) Summary: Currently, whether `AccumulateGrad` [steals](`67cb018462/torch/csrc/autograd/functions/accumulate_grad.h (L42)`) or [clones](`67cb018462/torch/csrc/autograd/functions/accumulate_grad.h (L80)`) an incoming gradient, the gradient ends up rowmajor contiguous, regardless of its param's layout. If the param's layout is channels last, or otherwise not rowmajor contigous, later kernels that apply gradients to params are forced into an uncoalesced memory access pattern for either the param or the gradient. This may not sound like a big deal but for any binary op on large tensors it's a >3X increase in gmem traffic => 3X slowdown. The present PR changes `AccumulateGrad` to prefer, where possible, stashing gradients that match their params' layouts (["Gradient Layout Contract"](https://github.com/pytorch/pytorch/pull/34904/files#diff-ef1a56d24f66b280dcdb401502d6a796R29-R38)). Allowing `AccumulateGrad` to stash non-rowmajor-contiguous grads means DDP allreduces and DP reduces must allow non-rowmajor-contiguous grads. This PR extends DDP and DP to allow gradients with non-rowmajor-contiguous strides as long as their layout is nonoverlapping and dense. For good measure, I include changes that allow all five nccl primitives (allreduce, reduce, broadcast, allgather, reducescatter) to act on non-rowmajor-contiguous tensors (again as long as each input's layout is nonoverlapping and dense, and as long as all tensors participating in a given collective have the same layout). The primitive comm changes aren't necessary to enable the DDP changes, but I wasn't sure this would end up true until I had written both sets of changes. I think primitive comm enablement is reasonable to keep in the PR, especially since the code for it is simple. Channels last params will be a major beneficiary of this PR, but I don't see it as channels-last-specific fix. The spirit is layout matching in general: - Grads should be stashed with memory layouts matching their params. - Src and dst tensors on opposite ends of collectives should have matching dense layouts. This PR also updates autograd docs to describe potential BC-breaking changes below. ## BC notes ngimel albanD gchanan #### BC-breaking In the common case where the user lets AccumulateGrad decide grad layouts, strides for grads of dense but non-rowmajor-contiguous params will change. Any user code that was accustomed to `view(-1)`ing these grads will break. Also, the circumstances under which a grad can be stolen directly from the backward function that created it, as opposed to deep-copied by AccumulateGrad, have changed. In most cases we expect silent performance improvement, because we expect channels-last-aware backward kernels will create channels last gradients for channels last params. Now those can be stolen, whereas before this PR they were cloned and made rowmajor contiguous. IMO this is a mild BC breakage. Param backward hooks still see grads come in with whatever format the backward kernel gave them. The only BC breakage potential I see is if user code relies somehow on a grad in a hook having or not having the same deep memory as the eventual `param.grad`. Any such users hopefully know they're off the edge of the map and understand how to update their expectations. #### BC escape hatches At alband's recommendation, this PR's changes to AccumulateGrad do not alter the pre-PR code's decisions about whether grad is accumulated in or out of place. Accumulations of new grads onto an existing `.grad` attribute were (usually) in-place before this PR and remain in-place after this PR, keeping the existing `.grad`'s layout. After this PR, if the user wants to force accumulation into a grad with a particular layout, they can preset `param.grad` to a zeroed tensor with the desired strides or call `grad.contiguous(desired format)`. This likely won't be as performant as letting AccumulateGrad establish grad layouts by cloning or stealing grads with contract-compliant strides, but at least users have a control point. One limitation (present before this PR and unchanged by this PR): Presetting `param.grad` does not ensure in-place accumulation all the time. For example, if `create_graph=True`, or if incoming `new_grad` is dense and existing `variable_grad` is sparse, accumulation occurs out of place, and the out-of-place result may not match the existing grad's strides. ---------------------------- I also noticed some potential DDP improvements that I considered out of scope but want to mention for visibility: 1. make sure Reducer's ops sync with AccumulateGrad streams 2. ~to reduce CPU overhead and incur fewer kernel launches, lazily create flat `contents` tensors by a single `cat` kernel only when a bucket is full, instead of `copy_`ing grads into `contents` individually as soon as they are received.~ PR includes a [minor change](https://github.com/pytorch/pytorch/pull/34904/files#diff-c269190a925a4b0df49eda8a8f6c5bd3R312-R315) to divide grads while copying them into flat buffers, instead of copying them in, then dividing separately. Without cat+div fusion, div-while-copying is the best we can do. 3. https://github.com/pytorch/pytorch/issues/38942 Pull Request resolved: https://github.com/pytorch/pytorch/pull/34904 Differential Revision: D20496044 Pulled By: albanD fbshipit-source-id: 248d680f4b1bf77b0a986451844ec6e254469217	2020-06-16 08:43:31 -07:00
Mike Ruberry	13120bf677	Updates assertEqual to require atol and rtol, removes positional atol (#38872 ) Summary: This updates assertEqual and assertEqual-like functions to either require both or neither of atol and rtol be specified. This should improve clarity around handling precision in the test suite, and it allows us to remove the legacy positional atol argument from assertEqual. In addition, the "message" kwarg is replace with a kwarg-only "msg" argument whose name is consistent with unittest's assertEqual argument. In the future we could make "msg" an optional third positional argument to be more consistent with unittest's assertEqual, but requiring it be specified should be clear, and we can easily update the signature to make "msg" an optional positional argument in the future, too. Pull Request resolved: https://github.com/pytorch/pytorch/pull/38872 Differential Revision: D21740237 Pulled By: mruberry fbshipit-source-id: acbc027aa1d7877a49664d94db9a5fff91a07042	2020-05-27 06:31:07 -07:00
Rohan Varma	63e545e0fe	Revert D21717199: [pytorch][PR] Updates assertEqual to require atol and rtol, removes positional atol Test Plan: revert-hammer Differential Revision: D21717199 Original commit changeset: 9feb856f94ee fbshipit-source-id: bfde9c39a5ce99f0ca6183a7dde703c65b7c8259	2020-05-26 18:23:59 -07:00
Mike Ruberry	6ddca30b2d	Updates assertEqual to require atol and rtol, removes positional atol (#38872 ) Summary: This updates assertEqual and assertEqual-like functions to either require both or neither of atol and rtol be specified. This should improve clarity around handling precision in the test suite, and it allows us to remove the legacy positional atol argument from assertEqual. In addition, the "message" kwarg is replace with a kwarg-only "msg" argument whose name is consistent with unittest's assertEqual argument. In the future we could make "msg" an optional third positional argument to be more consistent with unittest's assertEqual, but requiring it be specified should be clear, and we can easily update the signature to make "msg" an optional positional argument in the future, too. Pull Request resolved: https://github.com/pytorch/pytorch/pull/38872 Differential Revision: D21717199 Pulled By: mruberry fbshipit-source-id: 9feb856f94eee911b44f6c7140a1d07c1b026d3a	2020-05-26 08:30:23 -07:00
Peter Bell	f8ec51bd86	Ensure DataParallel replicas can be saved (#37307 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/37182 The `zero_grad` wrapper from `_replicate_for_data_parallel` can't be pickled. So instead, I set an attribute `_is_replica = True` and check for this in `Module.zero_grad`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/37307 Differential Revision: D21246119 Pulled By: mrshenli fbshipit-source-id: 4755786d48a20bc247570ba672de9dd526914ce1	2020-04-25 20:57:24 -07:00
David Reiss	e75fb4356b	Remove (most) Python 2 support from Python code (#35615 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/35615 Python 2 has reached end-of-life and is no longer supported by PyTorch. Now we can clean up a lot of cruft that we put in place to support it. These changes were all done manually, and I skipped anything that seemed like it would take more than a few seconds, so I think it makes sense to review it manually as well (though using side-by-side view and ignoring whitespace change might be helpful). Test Plan: CI Differential Revision: D20842886 Pulled By: dreiss fbshipit-source-id: 8cad4e87c45895e7ce3938a88e61157a79504aed	2020-04-22 09:23:14 -07:00
Wanchao Liang	3526627f46	Use unittest assertWarns instead (#36411 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/36411 This PR remove pytorch specific defined assertwarns and use the unit test one, also format some tests Test Plan: Imported from OSS Differential Revision: D20998159 Pulled By: wanchaol fbshipit-source-id: 1280ecff2dd293b95a639d13cc7417fc819c2201	2020-04-13 15:56:42 -07:00

1 2

63 Commits