Commit Graph

178 Commits

Author SHA1 Message Date
imaginary-person
9e53c823b8 Add AVX512 support in ATen & remove AVX support (#61903)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61903

### Remaining Tasks

- [ ] Collate results of benchmarks on two Intel Xeon machines (with & without CUDA, to check if CPU throttling causes issues with GPUs) - make graphs, including Roofline model plots (Intel Advisor can't make them with libgomp, though, but with Intel OpenMP).

### Summary

1. This draft PR produces binaries with 3 types of ATen kernels - default, AVX2, AVX512. Using the environment variable `ATEN_AVX512_256=TRUE` also results in 3 types of kernels, but the compiler can use 32 ymm registers for AVX2, instead of the default 16 (see the sketch after this list). ATen kernels for `CPU_CAPABILITY_AVX` have been removed.

2. `nansum` is not using the AVX512 kernel right now, as it has poorer accuracy for Float16 than AVX2 or DEFAULT do, and their respective accuracies aren't very good either (#59415).
It was more convenient to disable AVX512 dispatch for all dtypes of `nansum` for now.

3. On Windows, ATen Quantized AVX512 kernels are not being used, as quantization tests are flaky. If `--continue-through-failure` is used, then `test_compare_model_outputs_functional_static` fails. But if this test is skipped, `test_compare_model_outputs_conv_static` fails. If both these tests are skipped, then a third one fails. These are hard to debug right now due to not having access to a Windows machine with AVX512 support, so it was more convenient to disable AVX512 dispatch of all ATen Quantized kernels on Windows for now.

4. One test is currently being skipped -
[`test_lstm` in `quantization.bc`](https://github.com/pytorch/pytorch/issues/59098) - it fails only on Cascade Lake machines, irrespective of the `ATEN_CPU_CAPABILITY` used, because FBGEMM uses `AVX512_VNNI` on machines that support it. On such machines, `reduce_range` should be set to `False`.
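
As a quick way to compare the kernel variants from point 1, ATen's runtime dispatch can be capped with the `ATEN_CPU_CAPABILITY` environment variable; a minimal sketch, assuming a build from this PR (the variable must be set before `torch` is imported):

```python
import os
# cap ATen's runtime dispatch level; with this PR the recognized values
# would be "default", "avx2" and "avx512"
os.environ["ATEN_CPU_CAPABILITY"] = "avx2"

import torch
x = torch.randn(1 << 20)
y = x + x  # elementwise kernels now dispatch to at most the AVX2 variant
```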

The list of the changes is at https://gist.github.com/imaginary-person/4b4fda660534f0493bf9573d511a878d.

Credits to ezyang for proposing `AVX512_256` - these use AVX2 intrinsics but benefit from 32 registers, instead of the 16 ymm registers that AVX2 uses.
Credits to limo1996 for the initial proposal, and for optimizing `hsub_pd` & `hadd_pd`, which didn't have direct AVX512 equivalents, and are being used in some kernels. He also refactored `vec/functional.h` to remove duplicated code.
Credits to quickwritereader for helping fix 4 failing complex multiplication & division tests.

### Testing
1. `vec_test_all_types` was modified to test basic AVX512 support, as tests already existed for AVX2.
Only one test had to be modified, as it was hardcoded for AVX2.
2. `pytorch_linux_bionic_py3_8_gcc9_coverage_test1` & `pytorch_linux_bionic_py3_8_gcc9_coverage_test2` now use `linux.2xlarge` instances, which support AVX512. These jobs were used for testing the AVX512 kernels, since AVX512 kernels are used by default in both of these CI checks. Windows CI checks had already been using machines with AVX512 support.

### Would the downclocking caused by AVX512 pose an issue?

I think it's important to note that AVX2 causes downclocking as well, and the additional downclocking caused by AVX512 may not hamper performance on some Skylake machines & beyond, because of the doubled vector size. I think that [this post with verifiable references is a must-read](https://community.intel.com/t5/Software-Tuning-Performance/Unexpected-power-vs-cores-profile-for-MKL-kernels-on-modern-Xeon/m-p/1133869/highlight/true#M6450). Also, AVX512 would _probably not_ hurt performance on a high-end machine, [but measurements are recommended](https://lemire.me/blog/2018/09/07/avx-512-when-and-how-to-use-these-new-instructions/). If it does, `ATEN_AVX512_256=TRUE` can be used for building PyTorch, so that the AVX2 kernels can use 32 ymm registers instead of the default 16. [FBGEMM uses `AVX512_256` only on Xeon D processors](https://github.com/pytorch/FBGEMM/pull/209), which are said to have poor AVX512 performance.

This [official data](https://www.intel.com/content/dam/www/public/us/en/documents/specification-updates/xeon-scalable-spec-update.pdf) is for the Intel Skylake family, and the first link helps understand its significance. Cascade Lake & Ice Lake SP Xeon processors are said to be even better when it comes to AVX512 performance.

Here is the corresponding data for [Cascade Lake](https://cdrdv2.intel.com/v1/dl/getContent/338848) -

![CASCADE LAKE AVX2](https://user-images.githubusercontent.com/76181208/120666172-ffec3f80-c451-11eb-8ea1-8933ccc12a1b.PNG)
![CASCADE LAKE AVX512](https://user-images.githubusercontent.com/76181208/120666190-04b0f380-c452-11eb-9faa-38d233c874c8.PNG)

The corresponding data isn't publicly available for Intel Xeon SP 3rd gen (Ice Lake SP), but [Intel mentioned that the 3rd gen has frequency improvements pertaining to AVX512](https://newsroom.intel.com/wp-content/uploads/sites/11/2021/04/3rd-Gen-Intel-Xeon-Scalable-Platform-Press-Presentation-281884.pdf). Ice Lake SP machines also have 48 KB L1D caches, so that's another reason for AVX512 performance to be better on them.

### Is PyTorch always faster with AVX512?

No, but then PyTorch is not always faster with AVX2 either. Please refer to #60202. The benefit from vectorization is apparent with small tensors that fit in caches or in kernels that are more compute-heavy. For instance, AVX512 or AVX2 would yield no benefit for adding two 64 MB tensors, but adding two 1 MB tensors would do well with AVX2, and even more so with AVX512.

It seems that memory-bound computations, such as adding two 64 MB tensors, can be slower with vectorization (depending upon the number of threads used), as the effects of downclocking can then be observed.
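
A small experiment in the spirit of the benchmark scripts elsewhere in this log: timing an elementwise add on a roughly cache-resident tensor versus a large, memory-bound one (the sizes are illustrative):

```python
import torch
from torch.utils.benchmark import Timer

# ~1 MB vs ~64 MB of float32: the small add is compute-bound and benefits
# from wider vectors; the large one is memory-bound and mostly does not
for n in (1 << 18, 64 << 18):
    x, y = torch.randn(n), torch.randn(n)
    t = Timer("x + y", globals={"x": x, "y": y}, num_threads=1)
    print(f"{4 * n / 2**20:.0f} MB:", t.blocked_autorange(min_run_time=2))
```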

Original pull request: https://github.com/pytorch/pytorch/pull/56992

Reviewed By: soulitzer

Differential Revision: D29266289

Pulled By: ezyang

fbshipit-source-id: 2d5e8d1c2307252f22423bbc14f136c67c3e6184
2021-07-22 08:51:49 -07:00
Rong Rong (AI Infra)
9ade039593 fix test file not found issue (#61610)
Summary:
it should not error out if the file is not found.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61610

Reviewed By: samestep

Differential Revision: D29687958

Pulled By: walterddr

fbshipit-source-id: 17cacba8daa131df9bfb37fd58d6e4870ff75198
2021-07-13 17:50:50 -07:00
Rong Rong (AI Infra)
a5a10fe353 Move all downloading logic out of common_utils.py (#61479)
Summary:
and into tools/ folder

Currently run_test.py invokes tools/test_selections.py to:
1. download and analyze which test files to run
2. download and parse S3 stats and pass the info to local files
3. common_utils.py then uses the downloaded S3 stats to determine which test cases to run

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61479

Reviewed By: janeyx99

Differential Revision: D29661986

Pulled By: walterddr

fbshipit-source-id: bebd8c474bcc2444e135bfd2fa4bdd1eefafe595
2021-07-12 11:23:22 -07:00
Jane Xu
2bbcc80de3 Enable disabling test cases on specific platforms (#61427)
Summary:
This adds functionality to our common_utils.py to allow disabling test cases for platforms Mac, Windows, and Linux.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61427

Test Plan:
CI should not change as no issues currently have the line "Platforms:..."

I tested locally by making sure `test_async_script` is skipped while running `python test/test_jit.py -k TestAsync.test_async_script` with a cached modified `.pytorch-disabled-tests.json`:
```
{
  "total_count": 32,
  "incomplete_results": false,
  "items": [
    {
      "url": "https://api.github.com/repos/pytorch/pytorch/issues/60652",
      "repository_url": "https://api.github.com/repos/pytorch/pytorch",
      "labels_url": "https://api.github.com/repos/pytorch/pytorch/issues/60652/labels{/name}",
      "comments_url": "https://api.github.com/repos/pytorch/pytorch/issues/60652/comments",
      "events_url": "https://api.github.com/repos/pytorch/pytorch/issues/60652/events",
      "html_url": "https://github.com/pytorch/pytorch/issues/60652",
      "id": 929288995,
      "node_id": "MDU6SXNzdWU5MjkyODg5OTU=",
      "number": 60652,
      "title": "DISABLED test_async_script (jit.test_async.TestAsync)",
      "user": {
        "login": "ezyang",
        "id": 13564,
        "node_id": "MDQ6VXNlcjEzNTY0",
        "avatar_url": "https://avatars.githubusercontent.com/u/13564?v=4",
        "gravatar_id": "",
        "url": "https://api.github.com/users/ezyang",
        "html_url": "https://github.com/ezyang",
        "followers_url": "https://api.github.com/users/ezyang/followers",
        "following_url": "https://api.github.com/users/ezyang/following{/other_user}",
        "gists_url": "https://api.github.com/users/ezyang/gists{/gist_id}",
        "starred_url": "https://api.github.com/users/ezyang/starred{/owner}{/repo}",
        "subscriptions_url": "https://api.github.com/users/ezyang/subscriptions",
        "organizations_url": "https://api.github.com/users/ezyang/orgs",
        "repos_url": "https://api.github.com/users/ezyang/repos",
        "events_url": "https://api.github.com/users/ezyang/events{/privacy}",
        "received_events_url": "https://api.github.com/users/ezyang/received_events",
        "type": "User",
        "site_admin": false
      },
      "labels": [
        {
          "id": 1301397902,
          "node_id": "MDU6TGFiZWwxMzAxMzk3OTAy",
          "url": "https://api.github.com/repos/pytorch/pytorch/labels/module:%20flaky-tests",
          "name": "module: flaky-tests",
          "color": "f7e101",
          "default": false,
          "description": "Problem is a flaky test in CI"
        },
        {
          "id": 679953883,
          "node_id": "MDU6TGFiZWw2Nzk5NTM4ODM=",
          "url": "https://api.github.com/repos/pytorch/pytorch/labels/oncall:%20distributed",
          "name": "oncall: distributed",
          "color": "f7e101",
          "default": false,
          "description": "Add this issue/PR to distributed oncall triage queue"
        }
      ],
      "state": "open",
      "locked": false,
      "assignee": {
        "login": "rohan-varma",
        "id": 8039770,
        "node_id": "MDQ6VXNlcjgwMzk3NzA=",
        "avatar_url": "https://avatars.githubusercontent.com/u/8039770?v=4",
        "gravatar_id": "",
        "url": "https://api.github.com/users/rohan-varma",
        "html_url": "https://github.com/rohan-varma",
        "followers_url": "https://api.github.com/users/rohan-varma/followers",
        "following_url": "https://api.github.com/users/rohan-varma/following{/other_user}",
        "gists_url": "https://api.github.com/users/rohan-varma/gists{/gist_id}",
        "starred_url": "https://api.github.com/users/rohan-varma/starred{/owner}{/repo}",
        "subscriptions_url": "https://api.github.com/users/rohan-varma/subscriptions",
        "organizations_url": "https://api.github.com/users/rohan-varma/orgs",
        "repos_url": "https://api.github.com/users/rohan-varma/repos",
        "events_url": "https://api.github.com/users/rohan-varma/events{/privacy}",
        "received_events_url": "https://api.github.com/users/rohan-varma/received_events",
        "type": "User",
        "site_admin": false
      },
      "assignees": [
        {
          "login": "rohan-varma",
          "id": 8039770,
          "node_id": "MDQ6VXNlcjgwMzk3NzA=",
          "avatar_url": "https://avatars.githubusercontent.com/u/8039770?v=4",
          "gravatar_id": "",
          "url": "https://api.github.com/users/rohan-varma",
          "html_url": "https://github.com/rohan-varma",
          "followers_url": "https://api.github.com/users/rohan-varma/followers",
          "following_url": "https://api.github.com/users/rohan-varma/following{/other_user}",
          "gists_url": "https://api.github.com/users/rohan-varma/gists{/gist_id}",
          "starred_url": "https://api.github.com/users/rohan-varma/starred{/owner}{/repo}",
          "subscriptions_url": "https://api.github.com/users/rohan-varma/subscriptions",
          "organizations_url": "https://api.github.com/users/rohan-varma/orgs",
          "repos_url": "https://api.github.com/users/rohan-varma/repos",
          "events_url": "https://api.github.com/users/rohan-varma/events{/privacy}",
          "received_events_url": "https://api.github.com/users/rohan-varma/received_events",
          "type": "User",
          "site_admin": false
        }
      ],
      "milestone": null,
      "comments": 0,
      "created_at": "2021-06-24T14:28:33Z",
      "updated_at": "2021-06-24T16:40:42Z",
      "closed_at": null,
      "author_association": "CONTRIBUTOR",
      "active_lock_reason": null,
      "body": "Platforms:Mac, windows, Linux\r\n```\r\nJun 24 00:59:14 ======================================================================\r\nJun 24 00:59:14 ERROR [0.477s]: test_async_script (__main__.ProcessGroupGlooWrapperTest)\r\nJun 24 00:59:14 ----------------------------------------------------------------------\r\nJun 24 00:59:14 Traceback (most recent call last):\r\nJun 24 00:59:14   File \"/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_distributed.py\", line 398, in wrapper\r\nJun 24 00:59:14     self._join_processes(fn)\r\nJun 24 00:59:14   File \"/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_distributed.py\", line 590, in _join_processes\r\nJun 24 00:59:14     self._check_return_codes(elapsed_time)\r\nJun 24 00:59:14   File \"/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_distributed.py\", line 633, in _check_return_codes\r\nJun 24 00:59:14     raise RuntimeError(error)\r\nJun 24 00:59:14 RuntimeError: Process 0 exited with error code 10 and exception:\r\nJun 24 00:59:14 RuntimeError: [/var/lib/jenkins/workspace/third_party/gloo/gloo/transport/tcp/pair.cc:598] Connection closed by peer [172.17.0.2]:21400\r\nJun 24 00:59:14 \r\nJun 24 00:59:14 During handling of the above exception, another exception occurred:\r\nJun 24 00:59:14 \r\nJun 24 00:59:14 Traceback (most recent call last):\r\nJun 24 00:59:14   File \"/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_distributed.py\", line 516, in run_test\r\nJun 24 00:59:14     getattr(self, test_name)()\r\nJun 24 00:59:14   File \"/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_distributed.py\", line 400, in wrapper\r\nJun 24 00:59:14     fn()\r\nJun 24 00:59:14   File \"distributed/test_pg_wrapper.py\", line 270, in test_collective_hang\r\nJun 24 00:59:14     self._test_collective_hang(pg)\r\nJun 24 00:59:14   File \"distributed/test_pg_wrapper.py\", line 52, in _test_collective_hang\r\nJun 24 00:59:14     wrapper_pg.allreduce([tensor])\r\nJun 24 00:59:14   File \"/opt/conda/lib/python3.6/unittest/case.py\", line 217, in __exit__\r\nJun 24 00:59:14     expected_regex.pattern, str(exc_value)))\r\nJun 24 00:59:14   File \"/opt/conda/lib/python3.6/unittest/case.py\", line 135, in _raiseFailure\r\nJun 24 00:59:14     raise self.test_case.failureException(msg)\r\nJun 24 00:59:14 AssertionError: \"Ranks 1 failed to pass monitoredBarrier\" does not match \"[/var/lib/jenkins/workspace/third_party/gloo/gloo/transport/tcp/pair.cc:598] Connection closed by peer [172.17.0.2]:21400\"\r\n```\r\n\r\nhttps://www.internalfb.com/intern/opensource/ci/job/log/225221175921058/\n\ncc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse agolynski SciPioneer H-Huang mrzzd cbalioglu gcramer23",
      "performed_via_github_app": null,
      "score": 0.0
    }
  ]
}
```
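
For illustration, a rough sketch of how a `Platforms:` line like the one in the issue body above could be matched against the current machine (the names here are illustrative, not the exact common_utils.py logic):

```python
import platform
import re

def disabled_on_this_platform(issue_body: str) -> bool:
    m = re.search(r"^Platforms:\s*(.+)$", issue_body, re.MULTILINE)
    if m is None:
        return True  # no platform qualifier: the test stays disabled everywhere
    listed = {p.strip().lower() for p in m.group(1).split(",")}
    current = {"darwin": "mac", "windows": "windows", "linux": "linux"}[platform.system().lower()]
    return current in listed

print(disabled_on_this_platform("Platforms:Mac, windows, Linux\r\n..."))  # True on all three
```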

Reviewed By: iramazanli

Differential Revision: D29627799

Pulled By: janeyx99

fbshipit-source-id: 5ef79127cbe0055c4f41766048e66f98cf80d2c4
2021-07-09 09:29:16 -07:00
Jane Xu
fb00194030 Fix typo in common_utils.py (#61365)
Summary:
Missed this in review of https://github.com/pytorch/pytorch/pull/57953. I don't think this has affected much, though.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61365

Reviewed By: walterddr

Differential Revision: D29593764

Pulled By: janeyx99

fbshipit-source-id: 2c6f6aa961eabca0d8b8a7607aaae979667cca3b
2021-07-07 16:28:20 -07:00
driazati
45cc207a88 Fix breakpad build + add test canary (#60990)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60990

This makes the breakpad build more explicit in its messaging and hints to cmake where to look for the library (it wasn't able to find it without `PATHS` on CI even though that works locally). This also adds a smoke test that will fail if breakpad isn't present on a CI job where it is expected (e.g. binary builds).

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D29514316

Pulled By: driazati

fbshipit-source-id: 79514363334788f311ba5d4f25deed3452f0c3eb
2021-07-06 14:15:07 -07:00
Pearu Peterson
374278f431 Improved sparse CSR tensor sampling method (#60283)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/59379

The improved sparse CSR tensor sampling method is described in https://pearu.github.io/csr_sampling.html and features:
- for specified `nnz`, one gets a CSR sample with the same `nnz`
- variability of the number of specified columns per row is maximized
- `crow_indices` content is randomized
- a given row specific `col_indices` content is sorted and filled with unique values (see also https://github.com/pytorch/pytorch/issues/60277)
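
A rough, self-contained sketch of sampling in this spirit (not the exact algorithm from the write-up; for instance, the clamp below may drop entries instead of redistributing them, so `nnz` is only approximately preserved):

```python
import torch

def sample_csr(n_rows, n_cols, nnz):
    # randomize how the nnz entries are spread over the rows
    counts = torch.bincount(torch.randint(n_rows, (nnz,)), minlength=n_rows)
    counts = counts.clamp(max=n_cols)  # a row holds at most n_cols entries
    crow_indices = torch.cat([torch.zeros(1, dtype=torch.long), counts.cumsum(0)])
    # per-row col_indices are sorted and unique
    col_indices = torch.cat(
        [torch.randperm(n_cols)[: int(c)].sort().values for c in counts]
    )
    values = torch.randn(col_indices.numel())
    return torch.sparse_csr_tensor(crow_indices, col_indices, values, (n_rows, n_cols))

s = sample_csr(4, 6, 10)
print(s.crow_indices(), s.col_indices())
```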

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60283

Reviewed By: bhosmer

Differential Revision: D29492605

Pulled By: cpuhrsch

fbshipit-source-id: 8d875b7c2b0573a9ab37047c6d8fe8b540295ce1
2021-07-01 13:26:19 -07:00
Sam Estep
d5a44f9f12 Use expecttest from PyPI (#60658)
Summary:
This PR removes `torch/testing/_internal/expecttest.py` in favor of https://github.com/ezyang/expecttest. See also https://github.com/ezyang/ghstack/pull/71.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60658

Test Plan: CI.

Reviewed By: ezyang

Differential Revision: D29430763

Pulled By: samestep

fbshipit-source-id: b7cdc7ba37330176149fd465312118e2254ae92e
2021-06-28 15:43:34 -07:00
Kushashwa Ravi Shrimali
08020220f3 [Testing] Adding reference tests to OpInfo class (#59369)
Summary:
This PR adds a `ref` argument to the `OpInfo` base class. The idea is to add reference checks for all _eligible_ ops. For more discussion, please check https://github.com/pytorch/pytorch/issues/58294

* [x] Migrate (but not removing yet) and modify helper functions from `UnaryUfuncOpInfo` class to `OpInfo` base class.
* [x] Test the reference checks for multiple ops. (also decide a list of different and eligible ops for this)
* [x] Handle possible edge cases (for example: `uint64` isn't implemented in PyTorch but is there in NumPy, and this needs to be handled -- more on this later) -- _Update_: We decided that these reference tests should only test for values and not types.
* [x] Create a sample PR for a single (of all different categories?) on adding reference functions to the eligible ops. -- _Update_: This is being done in this PR only.
* [x] ~Remove reference tests from `test_unary_ufuncs.py` and test to make sure that nothing breaks.~ (*Update*: We won't be touching Unary Ufunc reference tests in this PR)
* [x] Add comments, remove unnecessary prints/comments (added for debugging).

Note: To keep the PR description short, examples of edge cases encountered have been mentioned in the comments below.
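
For a sense of what a value-only reference check amounts to, here is a small standalone sketch (a hypothetical helper, not the OpInfo machinery itself):

```python
import numpy as np
import torch

def check_against_reference(torch_op, np_ref, sample):
    actual = torch_op(sample).cpu().numpy()
    expected = np_ref(sample.cpu().numpy())
    # compare values only; dtype promotion differences are deliberately ignored
    np.testing.assert_allclose(actual, expected.astype(actual.dtype), rtol=1e-6, atol=1e-6)

check_against_reference(torch.sinh, np.sinh, torch.randn(10, dtype=torch.float64))
```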

cc: mruberry pmeier kshitij12345

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59369

Reviewed By: ngimel

Differential Revision: D29347252

Pulled By: mruberry

fbshipit-source-id: 69719deddb1d23c53db45287a7e66c1bfe7e65bb
2021-06-23 19:26:08 -07:00
Philip Meier
0c916c8a4e up the priority of numpy array comparisons in self.assertEqual (#59067)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/58988.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59067

Reviewed By: jbschlosser

Differential Revision: D28986642

Pulled By: heitorschueroff

fbshipit-source-id: 3ef2d26b4010fc3519d0a1a020ea446ffeb46ba0
2021-06-22 13:07:07 -07:00
Weiqiang Wu
6a87e8d087 Implement erfcx() (#58194)
Summary:
Implement erfcx() https://github.com/pytorch/pytorch/issues/31945

Reference: https://github.com/pytorch/pytorch/issues/50345
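
For context, `erfcx` is the scaled complementary error function, `erfcx(x) = exp(x**2) * erfc(x)`. A dedicated kernel matters because the naive composition breaks down for large inputs:

```python
import torch

x = torch.tensor([1.0, 5.0, 30.0], dtype=torch.float64)
naive = torch.exp(x ** 2) * torch.erfc(x)  # exp(900) -> inf, erfc(30) -> 0, so inf * 0 = nan
print(naive)                   # tensor([0.4276, 0.1107,    nan], dtype=torch.float64)
print(torch.special.erfcx(x))  # tensor([0.4276, 0.1107, 0.0188], dtype=torch.float64)
```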

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58194

Reviewed By: ngimel

Differential Revision: D29285979

Pulled By: mruberry

fbshipit-source-id: 5bcfe77fddfabbeb8c8068658ba6d9fec6430399
2021-06-22 12:38:38 -07:00
Peter Bell
45ae2e7863 Set TORCH_WARN_ONCE to always warn inside of assertNotWarn (#60020)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60020

Test Plan: Imported from OSS

Reviewed By: anjali411

Differential Revision: D29249909

Pulled By: mruberry

fbshipit-source-id: 10a8d5c05bd8d4aec345f70b132efd3623601f6a
2021-06-21 21:35:54 -07:00
Rong Rong (AI Infra)
5921b5480a ensure xml report path are relative to */pytorch/test (#60380)
Summary:
Changes the approach.

The root cause: `inspect.getfile` returns an absolute path instead of a path relative to `os.getcwd()` in newer Python versions. We sanitize this by removing the CI prefix when it applies.

See:
https://app.circleci.com/pipelines/github/pytorch/pytorch/339568/workflows/43cac71c-759e-471f-83c2-d59c152dcd8a/jobs/14278585 vs. https://app.circleci.com/pipelines/github/pytorch/pytorch/339568/workflows/43cac71c-759e-471f-83c2-d59c152dcd8a/jobs/14278285
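
A minimal sketch of the sanitization idea (the prefix below is illustrative, not the actual CI prefix value):

```python
import os

CI_PREFIX = "/var/lib/jenkins/workspace"  # illustrative

def sanitize_test_path(path: str) -> str:
    # make absolute paths relative to the checkout, so XML report paths
    # stay relative to */pytorch/test regardless of the Python version
    return os.path.relpath(path, CI_PREFIX) if path.startswith(CI_PREFIX) else path

print(sanitize_test_path("/var/lib/jenkins/workspace/test/test_nn.py"))  # test/test_nn.py
```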

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60380

Test Plan:
CI

Plot twist:

windows tests are actually launched via
```
pushd test
python run_test.py
```
while linux/macos tests are
```
python test/run_test.py
```
This might cause a problem when using `os.getcwd()`; we will see from the PR CI results.

Reviewed By: malfet

Differential Revision: D29276969

Pulled By: walterddr

fbshipit-source-id: 336c2805d0c92733e0ff4c309ff2044dc2ed4e21
2021-06-21 20:47:23 -07:00
Rong Rong (AI Infra)
510334f34b [BE] clean up IS_PYTORCH_CI and IN_CI (#60279)
Summary:
`IS_PYTORCH_CI` and `IN_CI` are used inconsistently, and in some cases `IN_CI` is not set because it only exists in .circleci/scripts/setup_ci_environment.sh. This cleans up the 2 flags so that only `IN_CI` is used.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60279

Test Plan: CI

Reviewed By: seemethere

Differential Revision: D29239545

Pulled By: walterddr

fbshipit-source-id: a069424a2bb8790a3adfdaf0dc460301026bf8c7
2021-06-20 19:45:07 -07:00
Philip Meier
d5988c5eca remove unused type: ignore directives (#60006)
Summary:
During development it is common practice to put `type: ignore` comments on lines that are correct, but that `mypy` doesn't recognize as such. This often stems from the fact that the `mypy` version in use wasn't able to handle the pattern in question.

With every new release `mypy` gets better at handling complex code. In addition to fixing all the previously accepted but now failing patterns, we should also revisit all `type: ignore` comments to see if they are still needed. Fortunately, we don't need to do it manually: by adding `warn_unused_ignores = True` to the configuration, `mypy` will error out whenever it encounters a `type: ignore` that is no longer needed.
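
For example, with `warn_unused_ignores = True` in the configuration, a stale suppression turns into an error of its own:

```python
# modern mypy types int("5") correctly, so this suppression is stale;
# with warn_unused_ignores = True it is reported as:
#   error: Unused "type: ignore" comment
x: int = int("5")  # type: ignore
```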

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60006

Reviewed By: jbschlosser, malfet

Differential Revision: D29133237

Pulled By: albanD

fbshipit-source-id: 41e82edc5cd5affa7ccedad044b59b94dad4425a
2021-06-18 07:23:31 -07:00
Rong Rong (AI Infra)
b8ab98626b only runs mem leak check on master (#60023)
Summary:
Sets an environment variable to only run the CUDA mem leak check on master CI jobs.

See discussion in https://github.com/pytorch/pytorch/pull/59402#issuecomment-860773034

See stats before/after disabling mem leak check: https://github.com/pytorch/pytorch/pull/59942#issuecomment-860947095

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60023

Test Plan:
https://github.com/pytorch/pytorch/issues/60108
https://github.com/pytorch/pytorch/issues/60116

Reviewed By: janeyx99

Differential Revision: D29164182

Pulled By: walterddr

fbshipit-source-id: dfe88c2c1275b6eb35f18b58aacdc220f34ccb59
2021-06-17 07:56:26 -07:00
Eli Uriegas
a62f6b6d04 ci: Add skipIfOnGHA util (#59748)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59748
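
A plausible sketch of such a utility, assuming GitHub Actions' standard `GITHUB_ACTIONS=true` environment variable (not the actual implementation):

```python
import os
import unittest

def skipIfOnGHA(fn):
    return unittest.skipIf(
        os.getenv("GITHUB_ACTIONS") == "true", "test skipped on GitHub Actions"
    )(fn)
```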

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: janeyx99

Differential Revision: D29008217

Pulled By: seemethere

fbshipit-source-id: ffc2f7935df722f26c1252e3833085430ada7433
2021-06-09 21:19:26 -07:00
Jane Xu
97dfc7e300 [Reland] Adding run specified tests option to run_test.py (#59649)
Summary:
Reland of https://github.com/pytorch/pytorch/issues/59487

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59649

Reviewed By: samestep

Differential Revision: D28970751

Pulled By: janeyx99

fbshipit-source-id: 6e28d4dcfdab8a49da4b6a02c57516b08bacd7b5
2021-06-08 16:04:46 -07:00
Rong Rong (AI Infra)
0208e604e3 seems os.environ.get() not working well on windows (#59634)
Summary:
replace with os.getenv() instead

For some reason this was intermittently failing Azure pipelines. I can't log in to the pipeline itself for debugging, but here are 2 examples: [successful](https://app.circleci.com/pipelines/github/pytorch/pytorch/332405/workflows/944609ad-5dcf-49da-984f-26c381d1f16c/jobs/13969059) vs [failed](https://app.circleci.com/pipelines/github/pytorch/pytorch/332518/workflows/21f8a5a6-3b95-432e-be42-ac98008c671b/jobs/13975637)

However, given that the other constants common_utils.py exposes via `os.getenv()` were working, I am making them consistent.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59634

Test Plan: CI/master

Reviewed By: jbschlosser

Differential Revision: D28966412

Pulled By: walterddr

fbshipit-source-id: 7bcb9adf06df0acabd9574459eb6637c3e6a2947
2021-06-08 11:59:39 -07:00
Alban Desmaison
5d6a10a765 Revert D28913223: [pytorch][PR] Adding run-specified-test-cases option in run_test.py
Test Plan: revert-hammer

Differential Revision:
D28913223 (24432eaa29)

Original commit changeset: 0d1f99109734

fbshipit-source-id: 47c073720cff23a5d4cb64556381c46025e90937
2021-06-08 02:18:16 -07:00
Rong Rong (AI Infra)
57d8bccd00 only reorder tests based on git diff if IN_CI (#59565)
Summary:
Do not reorder tests unless running in CI; reordering makes local development test ordering nondeterministic, since most of us branch out from viable/strict rather than the head of master.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59565

Reviewed By: ejguan

Differential Revision: D28943906

Pulled By: walterddr

fbshipit-source-id: e742e7ce4b3fc017d7563b01e93c4cd774d0a537
2021-06-07 17:54:19 -07:00
Jane Xu
24432eaa29 Adding run-specified-test-cases option in run_test.py (#59487)
Summary:
The run-specified-test-cases option allows us to specify a list of test cases to run via a CSV with at minimum two columns: test_filename and test_case_name.

This PR also adds .json to some files we use for better clarity.

Usage:
`python test/run_test.py --run-specified-test-cases <csv_file>` where the csv file can look like:
```
test_filename,test_case_name,test_total_time,windows_only_failure_sha_count,total_sha_count,windows_failure_count,linux_failure_count,windows_total_count,linux_total_count
test_cuda,test_cudnn_multiple_threads_same_device,8068.8409659525,46,3768,53,0,2181,6750
test_utils,test_load_standalone,8308.8062920459,14,4630,65,0,2718,8729
test_ops,test_forward_mode_AD_acosh_cuda_complex128,91.652619369806,11,1971,26,1,1197,3825
test_ops,test_forward_mode_AD_acos_cuda_complex128,91.825633094915,11,1971,26,1,1197,3825
test_profiler,test_source,60.93786725749,9,4656,21,3,2742,8805
test_profiler,test_profiler_tracing,203.09352795241,9,4662,21,3,2737,8807
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59487

Test Plan:
Without specifying the option, everything should be as it was before.

Running `python test/run_test.py --run-specified-test-cases windows_smoke_tests.csv` resulted in this paste P420276949 (you can see internally). A snippet looks like:
```
(pytorch) janeyx@janeyx-mbp pytorch % python test/run_test.py --run-specified-test-cases windows_smoke_tests.csv
Loading specified test cases to run from windows_smoke_tests.csv.
Processed 28 test cases.
Running test_cpp_extensions_jit ... [2021-06-04 17:24:41.213644]
Executing ['/Users/janeyx/miniconda3/envs/pytorch/bin/python', 'test_cpp_extensions_jit.py', '-k', 'test_jit_cuda_archflags'] ... [2021-06-04 17:24:41.213781]
s
----------------------------------------------------------------------
Ran 1 test in 0.000s

OK (skipped=1)
...
```
With pytest, an example executable would be:
`Running test_dataloader ... [2021-06-04 17:37:57.643039]
Executing ['/Users/janeyx/miniconda3/envs/pytorch/bin/python', '-m', 'pytest', 'test_dataloader.py', '-v', '-k', 'test_segfault or test_timeout'] ... [2021-06-04 17:37:57.643327]`

Reviewed By: samestep

Differential Revision: D28913223

Pulled By: janeyx99

fbshipit-source-id: 0d1f9910973426b8756815c697b483160517b127
2021-06-07 16:27:43 -07:00
Natalia Gimelshein
344ecb2e71 flip via TI (#59509)
Summary:
Resubmit of https://github.com/pytorch/pytorch/issues/58747

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59509

Reviewed By: mruberry

Differential Revision: D28918665

Pulled By: ngimel

fbshipit-source-id: b045c7b35eaf22e53b1bc359ffbe5a4fda05dcda
2021-06-05 15:43:29 -07:00
Natalia Gimelshein
5117ac3bb4 Revert D28877076: [pytorch][PR] torch.flip via TI
Test Plan: revert-hammer

Differential Revision:
D28877076 (d82bc3feb8)

Original commit changeset: 4fa6eb519085

fbshipit-source-id: c81e7d3283ff6822db913bf9f49a1533268755d0
2021-06-04 23:03:53 -07:00
lezcano
d82bc3feb8 torch.flip via TI (#58747)
Summary:
Implements an idea by ngimel to improve the performance of `torch.flip` via a clever hack into TI to bypass the fact that TI is not designed to work with negative indices.

Something that might be added is vectorisation support on CPU, given how simple the implementation is now.

Some low-hanging fruits that I did not implement:
- Write it as a structured kernel
- Migrate the tests to opinfos
- Have a look at `cumsum_backward` and `cumprod_backward`,  as I think that they could be implemented faster with `flip`, now that `flip` is fast.

**Edit**
This operation already has OpInfos, and it cannot be migrated to a structured kernel because it supports quantised tensors.

Summary of the PR:
- x1.5-3 performance boost on CPU
- x1.5-2 performance boost on CUDA
- Comparable performance across dimensions, regardless of the strides (thanks TI)
- Simpler code

<details>
<summary>
Test Script
</summary>

```python
from itertools import product

import torch
from torch.utils.benchmark import Compare, Timer

def get_timer(size, dims, num_threads, device):
    x = torch.rand(*size, device=device)

    timer = Timer(
        "torch.flip(x, dims=dims)",
        globals={"x": x, "dims": dims},
        label=f"Flip {device}",
        description=f"dims: {dims}",
        sub_label=f"size: {size}",
        num_threads=num_threads,
    )

    return timer.blocked_autorange(min_run_time=5)

def get_params():
    sizes = ((1000,)*2, (1000,)*3, (10000,)*2)
    for size, device in product(sizes, ("cpu", "cuda")):
        threads = (1, 2, 4) if device == "cpu" else (1,)
        list_dims = [(0,), (1,), (0, 1)]
        if len(size) == 3:
            list_dims.append((0, 2))
        for num_threads, dims in product(threads, list_dims):
            yield size, dims, num_threads, device

def compare():
    compare = Compare([get_timer(*params) for params in get_params()])
    compare.trim_significant_figures()
    compare.colorize()
    compare.print()

compare()
```
</details>

<details>
<summary>
Benchmark PR
</summary>

![image](https://user-images.githubusercontent.com/3291265/119139954-81e46d80-ba3b-11eb-9aad-e825e515d41b.png)

</details>

<details>
<summary>
Benchmark master
</summary>

![image](https://user-images.githubusercontent.com/3291265/119139915-76914200-ba3b-11eb-9aa8-84b3ca220c93.png)

</details>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58747

Reviewed By: agolynski

Differential Revision: D28877076

Pulled By: ngimel

fbshipit-source-id: 4fa6eb519085950176cb3a9161eeb3b6289ec575
2021-06-04 20:13:38 -07:00
albanD
e9e5588588 Improve Tensor traverse to traverse its grad_fn when possible (#58271)
Summary:
There are two main changes here:
- THPVariable will actually visit its grad_fn if there are no other references to the C++ Tensor and no other references to the grad_fn. The critical observation compared to the existing comment (thanks Ed!) is that if we also check that the C++ Tensor object is not referenced somewhere else, we're sure that no one can change the grad_fn refcount between the traverse and the clear.
- THPVariable doesn't need a special clear for these new cases, as we're the only owner of the C++ Tensor, and so the cdata.reset() will necessarily free the Tensor and all its resources.

The two tests are to ensure:
- That the cycles are indeed collectible by the gc
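
As an illustration of the kind of cycle involved (a standard-API sketch, not one of the PR's tests): a hook whose closure captures its own tensor creates tensor -> grad_fn -> hook -> tensor, which the gc can now reclaim:

```python
import gc
import torch

x = torch.randn(2, requires_grad=True)
y = x * 2
y.register_hook(lambda grad: print(y.shape))  # closure keeps y alive via its grad_fn
del x, y
gc.collect()  # with grad_fn traversal, the collector can break this cycle
```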

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58271

Reviewed By: ngimel

Differential Revision: D28796461

Pulled By: albanD

fbshipit-source-id: 62c05930ddd0c48422c79b03118db41a73c1355d
2021-06-01 10:27:52 -07:00
kshitij12345
ea465f7378 OpInfo: true_divide and minor fix (#59154)
Summary:
Reference: https://github.com/pytorch/pytorch/issues/54261

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59154

Reviewed By: ngimel

Differential Revision: D28780115

Pulled By: mruberry

fbshipit-source-id: 91e254698597fa0c7d4df6053ec017a85e180304
2021-05-30 18:35:30 -07:00
Kushashwa Ravi Shrimali
0c1420aa3c OpInfo: fmod and remainder (#57941)
Summary:
See https://github.com/pytorch/pytorch/issues/54261

cc: mruberry Lezcano kshitij12345

Pull Request resolved: https://github.com/pytorch/pytorch/pull/57941

Reviewed By: mrshenli

Differential Revision: D28744464

Pulled By: mruberry

fbshipit-source-id: 19847277d4f8d3a39a706c2b3c9eddf0dedcb20c
2021-05-27 20:32:56 -07:00
Meghan Lele
b14c3205fd [JIT] Add torch._C.ScriptDict (#52659)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52659

**Summary**
This commit adds `torch._C.ScriptDict`, a dictionary type that has reference
semantics across the Python/TorchScript boundary. That is, modifications
made to instances of `torch._C.ScriptDict` in TorchScript are visible in
Python even when it is not returned from the function. Instances can be
constructed by passing an instance of a Python dictionary to
`torch.jit.script`. In the case of an empty dictionary, its type is
assumed to be `Dict[str, Tensor]` to be consistent with the handling of
empty dictionaries in TorchScript source code.

`torch._C.ScriptDict` is implemented using a modified version of pybind's `stl_bind.h`-style bindings attached to `ScriptDict`, `ScriptDictIterator` and `ScriptDictKeyIterator`, wrapper classes around `c10::impl::GenericDict` and `c10::impl::GenericDict::iterator`. These bindings allow instances of `torch._C.ScriptDict` to be used as if they were regular Python `dict`s. Reference semantics are achieved by simply retrieving the `IValue` contained in `ScriptDict` in `toIValue` (invoked when converting Python arguments to `IValue`s before calling TorchScript code).
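
A usage sketch based on the description above (following the construction path this commit describes):

```python
from typing import Dict

import torch

@torch.jit.script
def add_entry(d: Dict[str, int]) -> None:
    d["new"] = 1

d = torch.jit.script({"a": 0})  # constructs a torch._C.ScriptDict
add_entry(d)
print(d["new"])  # 1 -- the mutation made in TorchScript is visible in Python
```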

**Test Plan**
This commit adds `TestScriptDict` to `test_list_dict.py`, a set of tests
that check that all of the common dictionary operations are supported
and that instances have reference semantics across the
Python/TorchScript boundary.

Differential Revision: D27211605

Test Plan: Imported from OSS

Reviewed By: gmagogsfm

Pulled By: SplitInfinity

fbshipit-source-id: 446d4e5328375791aa73eb9e8b04dfe3465af960
2021-05-27 10:25:30 -07:00
Ivan Yashchuk
aaca12bcc2 Deprecate in docs torch.svd and change svd -> linalg_svd (#57981)
Summary:
This PR adds a note to the documentation that torch.svd is deprecated together with an upgrade guide on how to use `torch.linalg.svd` and `torch.linalg.svdvals` (Lezcano's instructions from https://github.com/pytorch/pytorch/issues/57549).
In addition, all usage of the old svd function is replaced with the new one from the torch.linalg module, except in the `at::linalg_pinv` function, which fails the XLA CI build (https://github.com/pytorch/xla/issues/2755; see the failure in draft PR https://github.com/pytorch/pytorch/pull/57772).
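
The gist of the upgrade guide, as a runnable comparison (note that the deprecated API returns `V` while the new one returns `Vh`):

```python
import torch

A = torch.randn(5, 3, dtype=torch.float64)

U, S, V = torch.svd(A)                                 # deprecated
U2, S2, Vh = torch.linalg.svd(A, full_matrices=False)  # replacement

print(torch.allclose(A, U @ torch.diag(S) @ V.T))    # True
print(torch.allclose(A, U2 @ torch.diag(S2) @ Vh))   # True
print(torch.linalg.svdvals(A))                       # singular values only
```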

Pull Request resolved: https://github.com/pytorch/pytorch/pull/57981

Reviewed By: ngimel

Differential Revision: D28345558

Pulled By: mruberry

fbshipit-source-id: 02dd9ae6efe975026e80ca128e9b91dfc65d7213
2021-05-11 18:04:10 -07:00
Rong Rong (AI Infra)
29753339b7 Do not download slow test when on sandcastle (#57953)
Summary:
Downloading the slow_test list on Sandcastle causes timeouts; this is an even bigger issue since `common_utils.py` is reused in many internal projects/modules.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/57953

Test Plan: CI

Reviewed By: janeyx99

Differential Revision: D28325527

fbshipit-source-id: ae47c9e43ad6f416008005bb26ceb2f3d6966f2e
2021-05-10 10:39:10 -07:00
Alexander
18c89a904b Modernize test-suite in sparse tensor CSR (#56392)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56392

Fixes for gh-56371 and gh-56369

Test Plan: Imported from OSS

Reviewed By: H-Huang

Differential Revision: D27913212

Pulled By: mruberry

fbshipit-source-id: 2c78fe9fa4b6c6b566d9eb01f71e6016d672a545
2021-04-27 15:22:17 -07:00
Jeffrey Wan
d01302431c Enable fast gradcheck for real inputs and outputs (#55237)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55237

In this PR, we reenable fast-gradcheck and resolve misc issues that arise:
Before landing this PR, land #55182 so that slow tests are still being run periodically.

Bolded indicates the issue is handled in this PR, otherwise it is handled in a previous PR.

**Non-determinism issues**:
- ops that do not have deterministic implementation (as documented https://pytorch.org/docs/stable/generated/torch.use_deterministic_algorithms.html#torch.use_deterministic_algorithms)
  - test_pad_cuda (replication_pad2d) (test_nn)
  - interpolate (test_nn)
  - cummin, cummax (scatter_add_cuda_kernel) (test_ops)
  - test_fn_gradgrad_prod_cpu_float64 (test_ops)

Randomness:
  - RRelu (new module tests) - we fix by using our own generator as to avoid messing with user RNG state (handled in #54480)

Numerical precision issues:
- jacobian mismatch: test_gelu (test_nn, float32, not able to replicate locally) - we fixed this by disabling for float32 (handled in previous  PR)
- cholesky_solve (test_linalg): #56235 handled in previous PR
- **cumprod** (test_ops) - #56275 disabled fast gradcheck

Not yet replicated:
 - test_relaxed_one_hot_categorical_2d (test_distributions)
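
For reference, fast mode is the `fast_mode` flag of `torch.autograd.gradcheck`; it checks the Jacobian along random directions instead of entry by entry, trading exhaustiveness for speed:

```python
import torch
from torch.autograd import gradcheck

x = torch.randn(4, dtype=torch.float64, requires_grad=True)
assert gradcheck(torch.sinh, (x,), fast_mode=True)
```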

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D27920906

fbshipit-source-id: 894dd7bf20b74f1a91a5bc24fe56794b4ee24656
2021-04-22 19:46:37 -07:00
Jeffrey Wan
5dcc7ac35c Add new scheduled job to circle-ci workflow (#55182)
Summary:
Under this setting the job should run 3 times a day.

When the environment variable `PYTORCH_TEST_WITH_SLOW_GRADCHECK` is set to `ON`, the default value for `fast_mode` in the gradcheck wrapper is set to False. This is overridden by whatever value the user explicitly passes in.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/55182

Reviewed By: albanD

Differential Revision: D27919236

Pulled By: soulitzer

fbshipit-source-id: 3a55ec6edcfc6e65fbc3a8a09c63aaea1bd1c5bf
2021-04-21 17:05:10 -07:00
Ailing Zhang
59b61f912a Switch assertWarnsOnceRegex logic to check any instead of all. (#56434)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56434

If we hit multiple TORCH_WARN calls from different sources when running the
statement, it makes more sense to check that the regex is
matched by any one of the warning messages instead of all of them.

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D27871946

Pulled By: ailzhang

fbshipit-source-id: 5940a8e43e4cc91aef213ef01e48d506fd9a1132
2021-04-20 10:37:36 -07:00
Sam Estep
e3900d2ba5 Add lint for unqualified noqa (#56272)
Summary:
As this diff shows, currently there are a couple hundred instances of raw `noqa` in the codebase, which just ignore all errors on a given line. That isn't great, so this PR changes all existing instances of that antipattern to qualify the `noqa` with respect to a specific error code, and adds a lint to prevent more of this from happening in the future.

Interestingly, some of the examples the `noqa` lint catches are genuine attempts to qualify the `noqa` with a specific error code, such as these two:
```
test/jit/test_misc.py:27:            print(f"{hello + ' ' + test}, I'm a {test}") # noqa E999
test/jit/test_misc.py:28:            print(f"format blank") # noqa F541
```
However, those are still wrong because they are [missing a colon](https://flake8.pycqa.org/en/3.9.1/user/violations.html#in-line-ignoring-errors), which actually causes the error code to be completely ignored:

- If you change them to anything else, the warnings will still be suppressed.
- If you add the necessary colons then it is revealed that `E261` was also being suppressed, unintentionally:
  ```
  test/jit/test_misc.py:27:57: E261 at least two spaces before inline comment
  test/jit/test_misc.py:28:35: E261 at least two spaces before inline comment
  ```

I did try using [flake8-noqa](https://pypi.org/project/flake8-noqa/) instead of a custom `git grep` lint, but it didn't seem to work. This PR is definitely missing some of the functionality that flake8-noqa is supposed to provide, though, so if someone can figure out how to use it, we should do that instead.
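
The distinction, in miniature:

```python
# unqualified: suppresses every error on the line (now rejected by the new lint)
import os  # noqa

# qualified (note the colon): suppresses only F401
import sys  # noqa: F401

# missing colon: flake8 treats this as a bare noqa, so the code is ignored
# and every error on the line is suppressed
import re  # noqa F401
```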

Pull Request resolved: https://github.com/pytorch/pytorch/pull/56272

Test Plan:
CI should pass on the tip of this PR, and we know that the lint works because the following CI run (before this PR was finished) failed:

- https://github.com/pytorch/pytorch/runs/2365189927

Reviewed By: janeyx99

Differential Revision: D27830127

Pulled By: samestep

fbshipit-source-id: d6dcf4f945ebd18cd76c46a07f3b408296864fcb
2021-04-19 13:16:18 -07:00
Can Balioglu
42f5d66080 [DDP] Fixes flaky tests caused by incorrect floating-point comparison (#56192)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/50699.

The root cause was that some floating-point assertions had a "greater than or **equal to**" condition. The "equal to" part was causing flakiness due to the strict equality check (`==`) in `TestCase.assertGreaterEqual()`. This PR introduces a new assertion method called `assertGreaterAlmostEqual()` in `common_utils.py` that mitigates the problem by behaving similarly to `TestCase.assertAlmostEqual()`.
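
A plausible sketch of the new helper, with assertAlmostEqual-style rounding (the real implementation is in common_utils.py and may differ in signature):

```python
def assertGreaterAlmostEqual(self, first, second, places=7):
    """Assert first >= second, tolerating an almost-equal difference."""
    if first >= second or round(second - first, places) == 0:
        return
    raise self.failureException(f"{first} not greater than or almost equal to {second}")
```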

Pull Request resolved: https://github.com/pytorch/pytorch/pull/56192

Reviewed By: zhaojuanmao

Differential Revision: D27804724

Pulled By: cbalioglu

fbshipit-source-id: bc44a41ca4ce45dfee62fb3769fb47bfd9028831
2021-04-15 17:15:42 -07:00
David Riazati
3c6b52ae62 Cache slow/disabled test files (#55682)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55682

Fixes #55648

For now it downloads and writes the relevant files to the system's temp dir and marks them as valid for 3 hours.
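
A minimal sketch of the caching idea, with illustrative names (not the actual helper):

```python
import os
import tempfile
import time
import urllib.request

def fetch_cached(url: str, name: str, max_age: float = 3 * 60 * 60) -> str:
    path = os.path.join(tempfile.gettempdir(), name)
    fresh = os.path.exists(path) and time.time() - os.path.getmtime(path) < max_age
    if not fresh:
        urllib.request.urlretrieve(url, path)  # re-download once the cache expires
    return path
```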

Test Plan: Imported from OSS

Reviewed By: malfet, nikithamalgifb

Differential Revision: D27685616

Pulled By: driazati

fbshipit-source-id: 27469b85fe4b6b4addde6b22bf795bca3d4990ef
2021-04-12 09:17:07 -07:00
Mike Ruberry
399b66c813 Ports logdet from method_tests() to op_db (#55743)
Summary:
Per title. Also updates some tensor construction helpers.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/55743

Reviewed By: ngimel

Differential Revision: D27702060

Pulled By: mruberry

fbshipit-source-id: f64b7bee855733ad1f4fd182819ceec5831d9878
2021-04-11 20:39:16 -07:00
Alexander
6ee333cdb5 modernize test_sparse (#54572)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54572

Adding device generic tests to `test_sparse`.
Follow-up PR: #54153

I think it is ready for review.
Looking forward to your comments. cc mruberry

Thanks

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D27562663

Pulled By: mruberry

fbshipit-source-id: c48973e707f779b529bc7f61b75103194b428987
2021-04-09 12:19:29 -07:00
Jane Xu
2a24a2418a common_utils.py use new file names for disabled/slow tests (#55620)
Summary:
Following these changes in renaming the files:
https://github.com/pytorch/pytorch/pull/55618
https://github.com/pytorch/test-infra/pull/3

We should update the use sites in common_utils.py

Pull Request resolved: https://github.com/pytorch/pytorch/pull/55620

Reviewed By: samestep

Differential Revision: D27651884

Pulled By: janeyx99

fbshipit-source-id: 298a981e55e0b7c95202294d9bc4b3fcce359590
2021-04-09 09:25:20 -07:00
Philip Meier
f4967d68f5 make torch.testing asserts importable (#54769)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54769

Follow-up to #53820. This

- makes the `asserts.py` module private as per suggestion from rgommers in https://github.com/pytorch/pytorch/pull/53820#issuecomment-802661387. With this the functions should only be accessible through `torch.testing`, giving us the option the change the underlying structure later.
- moves the code from `torch/testing/__init__.py` to `torch/testing/_core.py` (happy to accept other name suggestions). Otherwise we can't import the new `_asserts.py` in `torch/testing/__init__.py` due to circular imports.

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D27438451

Pulled By: mruberry

fbshipit-source-id: c7292b4d5709185b42b4aac8016648562688040e
2021-04-07 23:53:02 -07:00
Jane Xu
2e9eb5afa2 Use slow tests stats in common_utils (#55190)
Summary:
This is a step in adding automatic slowTest detection to our testing infrastructure. This uses stats (updated daily) in https://github.com/pytorch/test-infra/blob/master/stats/.pytorch-slow-tests to determine whether more tests need to be marked as slow as they are run.

More details in previous PR draft/proposal [here](https://github.com/pytorch/pytorch/pull/54456#issue-598388491), though I no longer think we need the third step as using a raw git file does not require much processing.

Upon looking at [logs](https://circleci.com/api/v1.1/project/github/pytorch/pytorch/12060292/output/107/0?file=true&allocation-id=606660dbd8e5857bcc2b2e0f-0-build%2F60DCA8CD) for the coverage tests as of the first commit [when I had not skipped the tests so we could see their actual times], here are some slow tests that weren't marked as slow before:
```
test_fn_gradgrad_unfold_cpu_complex128 (__main__.TestGradientsCPU) (172.554s)
test_matmul_4d_4d_complex_cpu (__main__.TestAutogradDeviceTypeCPU) (180.057s)
test_conv1d_basic (__main__.TestXNNPACKConv1dTransformPass) (94.737s)
```

And here is a test that wasn't actually slow but was still marked as slow based on stats:
```
test_trunc_normal (__main__.TestNNInit) ... ok (1.208s)
```

The new logs show the above tests as skipped (as they should be):
[Coverage Test 1](https://app.circleci.com/pipelines/github/pytorch/pytorch/296224/workflows/ba6c2917-51f8-4fb8-be57-90151c2e5c25/jobs/12126156) and [Coverage Test 2](https://app.circleci.com/pipelines/github/pytorch/pytorch/296224/workflows/ba6c2917-51f8-4fb8-be57-90151c2e5c25/jobs/12126155)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/55190

Reviewed By: samestep

Differential Revision: D27566663

Pulled By: janeyx99

fbshipit-source-id: c13f8c676bb8eb15d9d697d224dbaef7df98aef3
2021-04-07 08:04:39 -07:00
lezcano
fd02fc5d71 Port put_ and take from TH to ATen (#53356)
Summary:
The two ports were done together, as they can be implemented with the same kernel; in TH, they were already implemented with the same kernel.

Resolves https://github.com/pytorch/pytorch/issues/24751
Resolves https://github.com/pytorch/pytorch/issues/24614
Resolves https://github.com/pytorch/pytorch/issues/24640
Resolves https://github.com/pytorch/pytorch/issues/24772

This port makes sure that it interacts correctly with the "deterministic algorithms" flag, as done in https://github.com/pytorch/pytorch/pull/51388

This PR also makes these two functions correct in the following aspects (all of them added to the tests as well):
- Support for complex numbers
- Correct handling of scalar inputs and zero-dimensional inputs
- Implementation that does not do any copies nor sorting of any of the input tensors
- Faster and more correct implementation of the backwards (now it works as it should when `source.shape() != index.shape()`)
- Now `put_(..., accumulate=True)` is implemented correctly with atomic operations on GPU / CPU (when possible) and is deterministic (modulo the loss of precision that might happen due to the reordering of a sum of floats)
- Adds the `torch.put` function that was missing (`index_put` exists, for example)
- Corrected docs

It also adds a much more thorough testing to the operations and their gradients.
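
For orientation, both ops index the input as if it were flattened to 1-D (row-major order):

```python
import torch

src = torch.tensor([[1., 2.], [3., 4.]])
index = torch.tensor([0, 3])

print(torch.take(src, index))  # tensor([1., 4.])

src.put_(index, torch.tensor([10., 40.]), accumulate=True)
print(src)  # tensor([[11.,  2.], [ 3., 44.]])
```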

There is a BC-breaking change: now we check that the inputs do not overlap in the `put_` operation. This was handled in the TH implementation (for some of the cases; other cases were wrong) by making contiguous copies of the inputs. How should we handle this one?

**Edit.** Benchmarks:
<details>
<summary>Script</summary>

```python
from IPython import get_ipython
import torch
from itertools import product

torch.manual_seed(13)
torch.set_num_threads(1)

ipython = get_ipython()

cpu = torch.device('cpu')
cuda = torch.device('cuda')

def run_test(ndims, size, index_len, device, cmd):
    print(f"cmd: {cmd}, ndims: {ndims}, tensor_size: {size}, index_len: {index_len}, device: {device}")

    large_tensor = torch.rand(*([size] * ndims), device=device)
    small_tensor = torch.rand((index_len,), device=device)
    index = torch.randint(size * ndims, (index_len,), dtype=torch.long, device=device)
    if cmd == "put":
        command = "large_tensor.put_(index, small_tensor, accumulate=False)"
        if device == cuda:
            command += "; torch.cuda.synchronize()"
    elif cmd == "accumulate":
        command = "large_tensor.put_(index, small_tensor, accumulate=True)"
        if device == cuda:
            command += "; torch.cuda.synchronize()"
    elif cmd == "take":
        command = "torch.take(large_tensor, index)"
        if device == cuda:
            command += "; torch.cuda.synchronize()"
    ipython.magic(f"timeit {command}")
    print()

for method, device in product(["accumulate", "put", "take"], [cpu, cuda]):
    run_test(3, 1000, 10, device, method)
    run_test(3, 1000, 1000, device, method)
    run_test(3, 1000, 10000, device, method)
    run_test(2, 10000, 100000, device, method)
```
</details>

```python
put_(accumulate=False)
```

<details>
<summary>ATen CPU (1.5x - 2x speedup)</summary>

```python
cmd: put, ndims: 3, tensor_size: 1000, index_len: 10, device: cpu
1.05 µs ± 2.35 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

cmd: put, ndims: 3, tensor_size: 1000, index_len: 1000, device: cpu
3.15 µs ± 5.13 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

cmd: put, ndims: 3, tensor_size: 1000, index_len: 10000, device: cpu
21.6 µs ± 13.1 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

cmd: put, ndims: 2, tensor_size: 10000, index_len: 100000, device: cpu
238 µs ± 781 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
```
</details>

<details>
<summary>TH CPU</summary>

```python
cmd: put, ndims: 3, tensor_size: 1000, index_len: 10, device: cpu
722 ns ± 2.67 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

cmd: put, ndims: 3, tensor_size: 1000, index_len: 1000, device: cpu
4.89 µs ± 18.1 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

cmd: put, ndims: 3, tensor_size: 1000, index_len: 10000, device: cpu
42.5 µs ± 96.3 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

cmd: put, ndims: 2, tensor_size: 10000, index_len: 100000, device: cpu
428 µs ± 774 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
```
</details>
<details>
<summary>ATen GPU (same speed)</summary>

```python
cmd: put, ndims: 3, tensor_size: 1000, index_len: 10, device: cuda
8.99 µs ± 16 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

cmd: put, ndims: 3, tensor_size: 1000, index_len: 1000, device: cuda
10.4 µs ± 24.4 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

cmd: put, ndims: 3, tensor_size: 1000, index_len: 10000, device: cuda
10.4 µs ± 11.2 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

cmd: put, ndims: 2, tensor_size: 10000, index_len: 100000, device: cuda
15.6 µs ± 1.12 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
```
</details>

<details>
<summary>TH GPU</summary>

```python
cmd: put, ndims: 3, tensor_size: 1000, index_len: 10, device: cuda
8.44 µs ± 31.4 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

cmd: put, ndims: 3, tensor_size: 1000, index_len: 1000, device: cuda
9.09 µs ± 4.3 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

cmd: put, ndims: 3, tensor_size: 1000, index_len: 10000, device: cuda
9.77 µs ± 0.998 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

cmd: put, ndims: 2, tensor_size: 10000, index_len: 100000, device: cuda
15.8 µs ± 5.7 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
```
</details>

```python
put_(accumulate=True)
```

<details>
<summary>ATen CPU (x2 speedup)</summary>

```python
cmd: accumulate, ndims: 3, tensor_size: 1000, index_len: 10, device: cpu
1.12 µs ± 2.91 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

cmd: accumulate, ndims: 3, tensor_size: 1000, index_len: 1000, device: cpu
3.14 µs ± 2.05 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

cmd: accumulate, ndims: 3, tensor_size: 1000, index_len: 10000, device: cpu
20.8 µs ± 25.9 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

cmd: accumulate, ndims: 2, tensor_size: 10000, index_len: 100000, device: cpu
264 µs ± 263 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
```
</details>

<details>
<summary>TH CPU</summary>

```python
cmd: accumulate, ndims: 3, tensor_size: 1000, index_len: 10, device: cpu
814 ns ± 1.87 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

cmd: accumulate, ndims: 3, tensor_size: 1000, index_len: 1000, device: cpu
5.11 µs ± 6.02 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

cmd: accumulate, ndims: 3, tensor_size: 1000, index_len: 10000, device: cpu
43.9 µs ± 49.4 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

cmd: accumulate, ndims: 2, tensor_size: 10000, index_len: 100000, device: cpu
442 µs ± 1.07 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
```
</details>
<details>
<summary>ATen GPU (3x - 11x speedup)</summary>

```python
cmd: accumulate, ndims: 3, tensor_size: 1000, index_len: 10, device: cuda
9.01 µs ± 14.1 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

cmd: accumulate, ndims: 3, tensor_size: 1000, index_len: 1000, device: cuda
10.4 µs ± 15.6 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

cmd: accumulate, ndims: 3, tensor_size: 1000, index_len: 10000, device: cuda
10.3 µs ± 44.3 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

cmd: accumulate, ndims: 2, tensor_size: 10000, index_len: 100000, device: cuda
12.6 µs ± 19 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
```
</details>

<details>
<summary>TH GPU</summary>

```python
cmd: accumulate, ndims: 3, tensor_size: 1000, index_len: 10, device: cuda
34.7 µs ± 131 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

cmd: accumulate, ndims: 3, tensor_size: 1000, index_len: 1000, device: cuda
38.2 µs ± 116 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

cmd: accumulate, ndims: 3, tensor_size: 1000, index_len: 10000, device: cuda
61.2 µs ± 50.4 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

cmd: accumulate, ndims: 2, tensor_size: 10000, index_len: 100000, device: cuda
140 µs ± 24.2 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
```
</details>

```python
take()
```

<details>
<summary>ATen CPU (1.1x speedup)</summary>

```python
cmd: take, ndims: 3, tensor_size: 1000, index_len: 10, device: cpu
1.18 µs ± 2.34 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

cmd: take, ndims: 3, tensor_size: 1000, index_len: 1000, device: cpu
2.79 µs ± 2.96 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

cmd: take, ndims: 3, tensor_size: 1000, index_len: 10000, device: cpu
16.6 µs ± 10.4 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

cmd: take, ndims: 2, tensor_size: 10000, index_len: 100000, device: cpu
161 µs ± 984 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
```
</details>

<details>
<summary>TH CPU</summary>

```python
cmd: take, ndims: 3, tensor_size: 1000, index_len: 10, device: cpu
1.1 µs ± 3.14 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

cmd: take, ndims: 3, tensor_size: 1000, index_len: 1000, device: cpu
2.93 µs ± 7.31 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

cmd: take, ndims: 3, tensor_size: 1000, index_len: 10000, device: cpu
18.6 µs ± 14.5 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

cmd: take, ndims: 2, tensor_size: 10000, index_len: 100000, device: cpu
178 µs ± 139 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
```
</details>
<details>
<summary>ATen GPU (same speed)</summary>

```python
cmd: take, ndims: 3, tensor_size: 1000, index_len: 10, device: cuda
9.38 µs ± 23.1 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

cmd: take, ndims: 3, tensor_size: 1000, index_len: 1000, device: cuda
10.7 µs ± 9.77 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

cmd: take, ndims: 3, tensor_size: 1000, index_len: 10000, device: cuda
10.6 µs ± 107 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

cmd: take, ndims: 2, tensor_size: 10000, index_len: 100000, device: cuda
11.5 µs ± 21.1 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
```
</details>

<details>
<summary>TH GPU</summary>

```python
cmd: take, ndims: 3, tensor_size: 1000, index_len: 10, device: cuda
9.31 µs ± 7.57 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

cmd: take, ndims: 3, tensor_size: 1000, index_len: 1000, device: cuda
9.52 µs ± 5.78 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

cmd: take, ndims: 3, tensor_size: 1000, index_len: 10000, device: cuda
9.73 µs ± 17.6 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

cmd: take, ndims: 2, tensor_size: 10000, index_len: 100000, device: cuda
11.7 µs ± 5.7 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
```
</details>

cc mruberry

Pull Request resolved: https://github.com/pytorch/pytorch/pull/53356

Reviewed By: mruberry

Differential Revision: D27520243

Pulled By: ngimel

fbshipit-source-id: e3979349c2c62d2949e09fb05e5fd4883fbc9093
2021-04-05 18:05:38 -07:00
Heitor Schueroff
6d87b3667f Added support for TensorList inputs in OpInfo (#54922)
Summary:
Stack:
* https://github.com/pytorch/pytorch/issues/54954 Fixed OpInfo jit tests failing for TensorList inputs
* __#54922 Added support for TensorList inputs in OpInfo__

Updated OpInfo to accept either a `Tensor` or `TensorList` as `sample.input` and added workarounds to make this work with gradcheck.

Note: JIT testing support for TensorList inputs will be added in a follow up PR.

Fixes https://github.com/pytorch/pytorch/issues/51996

Pull Request resolved: https://github.com/pytorch/pytorch/pull/54922

Reviewed By: H-Huang

Differential Revision: D27448952

Pulled By: heitorschueroff

fbshipit-source-id: 3f24a56f6180eb2d044dcfc89ba59fce8acfe278
2021-03-31 04:42:10 -07:00
Edward Yang
b5ab348253 Fix missing format string qualifier (#54705)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54705

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: bdhirsh

Differential Revision: D27338808

Pulled By: ezyang

fbshipit-source-id: b21c931c2306e525bc444766bc203bb303868dbf
2021-03-27 11:55:36 -07:00
Nikita Vedeneev
61b074581c torch.prod backward for complex types. (#48125)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/53511
torch.det depends on torch.prod, which in turn depends on several other functions that themselves depend on torch.prod, so there is a circular relationship; hence this PR enables complex backward support for several functions at once.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48125

Reviewed By: pbelevich

Differential Revision: D27188589

Pulled By: anjali411

fbshipit-source-id: bbb80f8ecb83a0c3bea2b917627d3cd3b84eb09a
2021-03-19 09:44:08 -07:00
Edward Yang
a2a7179695 Fix bug in assertRaises NotImplemented handling when no exception is thrown (#54126)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54126

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: agolynski, mruberry

Differential Revision: D27109510

Pulled By: ezyang

fbshipit-source-id: ba5a4de85ca00f81724f3d4e645797e8f32aa3b1
2021-03-17 12:30:51 -07:00
Edward Yang
c2f41b6b84 Add meta device to generic device testing framework, skip NotImplementedError (#53682)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53682

With this, under the meta device, 101 tests passed and 16953 skipped.
It ain't much, but it's a start.

Some various bits and bobs:
- NotImplementedError suppression at test level is implemented
  in the same way as CUDA memory leak check, i.e., by wrapping
  test methods and monkeypatching them back in.
- I had to reimplement assertRaises/assertRaisesRegex from scratch to
  ignore NotImplementedError when _ignore_not_implemented_error is True.
  The implementation relies on a small amount of private API that hasn't
  changed since 2010
- expectedAlertNondeterministic doesn't really work so I skipped them
  all; there's probably a way to do it better

I tested this using `pytest --disable-warnings --tb=native -k meta --sw
test/*.py` and a pile of extra patches to make collection actually work
(lol).
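
For context, a tiny sketch of the meta device these tests run under: tensors carry only shape/dtype metadata, and an op either propagates shapes or raises NotImplementedError (which the suppression above turns into a skip):

```python
import torch

x = torch.empty(4, 4, device="meta")  # no storage, just metadata
y = x + x                             # shapes/dtypes propagate without real data
print(y.shape, y.device)              # torch.Size([4, 4]) meta
```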

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D26955539

Pulled By: ezyang

fbshipit-source-id: ac21c8734562497fdcca3b614a28010bc4c03d74
2021-03-14 20:41:19 -07:00
Edward Yang
d47d246206 Add 'noarch' tests which only run in one CI config (#53747)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53747

Fixes #53743

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D26971343

Pulled By: ezyang

fbshipit-source-id: cee7aa10063ae674f741406a3af830e4b4f128df
2021-03-14 20:39:07 -07:00