### Description
- This PR renames `_all_gather_base` to `all_gather_into_tensor` so that it is clearer in meaning.
- The `all_gather_into_tensor` API differs from the `all_gather` API in the output it accepts -- a single, large tensor instead of a list of tensors.
- This PR also adds a deprecation warning to `_all_gather_base`.
### Issue
`_all_gather_base` was implemented in https://github.com/pytorch/pytorch/pull/33924 to avoid unnecessary flattening. There was a previous effort (#82639) to merge `_all_gather_base` with the existing `all_gather` API by detecting the parameter type passed in for the output.
There are, however, two "blockers" that make the merge difficult:
(i) The merge would break backward compatibility: we would need to change the parameter name `tensor_list` in `all_gather` to a general name, such as `output`, that can cover both a tensor and a tensor list.
(ii) Recently, the `all_gather` API gained uneven tensor support, which relies on the tensor boundaries implied by the list. We are, however, unsure whether to add such support to `_all_gather_base`, because that would require users to pass in additional tensor boundary information.
In view of the above, we decided to productize `_all_gather_base` as a separate function, but with a clearer name.
### Testing
Added tests:
- `test_all_gather_into_cat_tensor_cuda` -- output form as with `torch.cat`. For example:
```
>>> tensor_in
tensor([1, 2], device='cuda:0') # Rank 0
tensor([3, 4], device='cuda:1') # Rank 1
>>> tensor_out
tensor([1, 2, 3, 4], device='cuda:0') # Rank 0
tensor([1, 2, 3, 4], device='cuda:1') # Rank 1
```
- `test_all_gather_into_stack_tensor_cuda` -- output form as with `torch.stack`. For example:
```
>>> tensor_out2
tensor([[1, 2],
[3, 4]], device='cuda:0') # Rank 0
tensor([[1, 2],
[3, 4]], device='cuda:1') # Rank 1
```
The output form is determined by the shape of the output tensor passed by the user; no flag is used.
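For reference, a minimal usage sketch of the renamed API (assuming a two-rank process group is already initialized; shapes follow the examples above):
```python
import torch
import torch.distributed as dist

# Assumes init_process_group has already been called with world_size == 2.
rank, world_size = dist.get_rank(), dist.get_world_size()
tensor_in = torch.tensor([1, 2], device=f"cuda:{rank}") + 2 * rank

# "cat" form: a flat output of shape [world_size * 2].
tensor_out = torch.empty(world_size * 2, dtype=tensor_in.dtype,
                         device=tensor_in.device)
dist.all_gather_into_tensor(tensor_out, tensor_in)

# "stack" form: an output of shape [world_size, 2]; the form is inferred
# from the output shape, with no extra flag.
tensor_out2 = torch.empty(world_size, 2, dtype=tensor_in.dtype,
                          device=tensor_in.device)
dist.all_gather_into_tensor(tensor_out2, tensor_in)
```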
Cc @rohan-varma @mrshenli @crcrpar @ptrblck @H-Huang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85686
Approved by: https://github.com/rohan-varma, https://github.com/crcrpar
Move a bunch of globals to instance methods and replace all uses of them.
We move all PG-related globals under `World` and use a singleton instance under `_world`.
This creates an undocumented extension point to inject full control of how c10d state behaves.
One simple hack is to change `_world` to an implementation that uses a threadlocal, enabling per-thread PGs.
This almost gets DDP working; the PG is only missing an implementation of `all_reduce`.
This enables notebook usage of PTD, which is a big deal for learning it:
https://gist.github.com/kumpera/32cb051fa26b8cad8bdf671f968dcd68
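A hedged sketch of that threadlocal hack (`_World` and the `_world` singleton come from this PR; the overridden property below is an assumption for illustration, not a supported API):
```python
import threading
import torch.distributed.distributed_c10d as c10d

class _ThreadLocalWorld(c10d._World):
    """Route c10d state through a thread-local so each thread can hold
    its own default process group (illustrative, undocumented)."""
    _tls = threading.local()

    @property
    def default_pg(self):
        return getattr(self._tls, "default_pg", None)

    @default_pg.setter
    def default_pg(self, value):
        self._tls.default_pg = value

# Swap in the replacement state object -- the extension point this PR creates.
c10d._world = _ThreadLocalWorld()
```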
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84153
Approved by: https://github.com/rohan-varma
0.123 isn't exactly representable as a floating-point value, so the threshold moves marginally depending on the data type in which the computation is performed. This leads to a rare flake in tests comparing against a reference implementation.
Instead, this chooses a threshold which is exactly representable as a bfloat16 value and thus has the same value for all data types.
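A small illustration of the drift and the fix (0.125 = 2**-3 here is just an example of a value exact in bfloat16, not necessarily the threshold the PR chose):
```python
import torch

# 0.123 rounds differently per dtype, so a comparison threshold drifts
# with the compute dtype:
print(torch.tensor(0.123, dtype=torch.bfloat16).item())  # 0.123046875
print(torch.tensor(0.123, dtype=torch.float32).item())   # 0.12300000339746475
# 0.125 == 2**-3 is exactly representable, so it is identical in every dtype:
print(torch.tensor(0.125, dtype=torch.bfloat16).item())  # 0.125
print(torch.tensor(0.125, dtype=torch.float64).item())   # 0.125
```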
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85676
Approved by: https://github.com/ngimel
Fixes https://github.com/pytorch/pytorch/issues/85535
Also fixes the backward and forward gradients of `nn.functional.threshold`. The issue was that in-place gradients weren't tested because the in-place variants were not properly registered to the OpInfo.
Perhaps an alternative to this would be to make auto_element_wise smart enough to actually handle the in-place cases (we have 4 cases total now where we manually call copy_ after auto_element_wise), but that requires a few more changes.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85634
Approved by: https://github.com/albanD
Based on @ezyang's suggestion, the mode stack now has "one true mode", which is the _only_ mode that can ever be active at the C++ level. That mode's torch dispatch simply takes the top mode in the stack, reenables itself (if we aren't at the end of the mode stack), and runs the top mode's torch_{dispatch|function}.
This maintains the invariant that, in the middle of a mode's torch dispatch, the mode itself is not active. It changes the function the user has to call to see the current mode (it no longer queries the C++ side; it's Python only) but also lets the user easily see the entire mode stack.
Removes `enable_torch_dispatch_mode` and `.restore()`, since neither makes sense in this new setup.
### Background
Why do we want this? Well, a pretty common pattern that was coming up was that users had to do something like
```python
## PRE-PR UX
def f(mode):
    with mode.restore():  # user needs to understand this restore thing?
        ...

with Mode() as m:
    pass
f(m)
```
Many users were getting errors from forgetting to call `.restore` or from forgetting to add the (tbh weird) "mode instantiation" step where they use the mode as a context manager with an empty body. Really, they wanted to treat modes like context managers and just write
```python
## FROM FEEDBACK, USER DESIRED CODE. POSSIBLE POST-PR
def f(mode):
    with mode:
        ...

f(Mode())
```
### Technical Details
With the old mode stack, we basically had a linked list, so a mode could only be used once and had a fixed parent. In this new design, the mode stack is just a Python list that we push to and pop from. Only one mode is ever active at the C++ level, and it runs the next mode in the Python list. The modes no longer carry state.
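A hedged sketch of the resulting push/pop discipline (names are illustrative, not the actual internals):
```python
_mode_stack = []  # plain Python list replacing the old linked structure

class Mode:
    def __enter__(self):
        _mode_stack.append(self)  # push: this mode is now innermost
        return self

    def __exit__(self, *exc_info):
        popped = _mode_stack.pop()  # pop: modes carry no stack state themselves
        assert popped is self
        return False
```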
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84774
Approved by: https://github.com/ezyang, https://github.com/zou3519
Fixes #85615
Currently, internal test discovery instantiates an `ArgumentParser` and adds numerous arguments to the internal parser:
f0570354dd/torch/testing/_internal/common_utils.py (L491-L500)
...
In this context, `argparse` will load [system args](b494f5935c/Lib/argparse.py (L1826-L1829)) from any external scripts invoking PyTorch testing (e.g. `vscode`).
The default behavior of `argparse` is to [allow abbreviations](b494f5935c/Lib/argparse.py (L2243-L2251)) of arguments, but when an `ArgumentParser` instance has many arguments and may be invoked in the context of potentially conflicting system args, the `ArgumentParser` should reduce the potential for conflicts by being instantiated with `allow_abbrev` set to `False`.
With the current default configuration, some abbreviations of the `ArgumentParser` long options conflict with system args used by `vscode` to invoke PyTorch test execution:
```bash
python ~/.vscode-server/extensions/ms-python.python-2022.14.0/pythonFiles/get_output_via_markers.py \
~/.vscode-server/extensions/ms-python.python-2022.14.0/pythonFiles/visualstudio_py_testlauncher.py \
--us=./test --up=test_cuda.py --uvInt=2 -ttest_cuda.TestCuda.test_memory_allocation \
--testFile=./test/test_cuda.py
>>>PYTHON-EXEC-OUTPUT
...
visualstudio_py_testlauncher.py: error: argument --use-pytest: ignored explicit argument './test'
```
The full relevant stack:
```
pytorch/test/jit/test_cuda.py, line 11, in <module>
    from torch.testing._internal.jit_utils import JitTestCase
pytorch/torch/testing/_internal/jit_utils.py, line 18, in <module>
    from torch.testing._internal.common_utils import IS_WINDOWS, \
pytorch/torch/testing/_internal/common_utils.py, line 518, in <module>
    args, remaining = parser.parse_known_args()
argparse.py, line 1853, in parse_known_args
    namespace, args = self._parse_known_args(args, namespace)
argparse.py, line 2062, in _parse_known_args
    start_index = consume_optional(start_index)
argparse.py, line 1983, in consume_optional
    msg = _('ignored explicit argument %r')
```
The `argparse` [condition](b494f5935c/Lib/argparse.py (L2250)) that generates the error in this case:
```python
>>> print(option_string)
--use-pytest
>>> print(option_prefix)
--us
>>> option_string.startswith(option_prefix)
True
```
It'd be nice if `vscode` didn't use two-letter options 🤦 but IMHO PyTorch testing shouldn't depend on such good behavior from invoking wrappers.
I haven't seen any current dependency on the abbreviated internal PyTorch `ArgumentParser` options, so this change should only extend the usability of the (always improving!) PyTorch testing modules.
This simple PR avoids these conflicting options by instantiating the `ArgumentParser` with `allow_abbrev=False`.
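A small self-contained demonstration of the change (the option name mirrors the conflicting `--use-pytest`/`--us` pair above):
```python
from argparse import ArgumentParser

# With abbreviations disabled, "--us=./test" no longer matches
# "--use-pytest"; it is left in `remaining` for the invoking tool.
parser = ArgumentParser(allow_abbrev=False)
parser.add_argument("--use-pytest", action="store_true")
args, remaining = parser.parse_known_args(["--us=./test"])
print(args.use_pytest)  # False
print(remaining)        # ['--us=./test']
```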
Thanks to everyone in the community for their continued contributions to this incredibly valuable framework.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85616
Approved by: https://github.com/clee2000
Fixes #85578
Currently, many test modules customize test loading and discovery via the [load_tests protocol](https://docs.python.org/3/library/unittest.html#load-tests-protocol). The salient custom behavior (introduced with https://github.com/pytorch/pytorch/pull/13250) is to verify that the script discovering or executing the tests is the same script in which they are defined.
I believe this unnecessarily precludes the use of external tools to discover and execute tests (e.g. the vscode testing extension is widely used and IMHO quite convenient).
This simple PR retains the current restriction by default while offering users the option to disable the aforementioned check by setting an environment variable (a sketch of the gate follows the example below).
For example:
1. Setup a test env:
```bash
./tools/nightly.py checkout -b some_test_branch
conda activate pytorch-deps
conda install -c pytorch-nightly numpy expecttest mypy pytest hypothesis astunparse ninja pyyaml cmake cffi typing_extensions future six requests dataclasses -y
```
2. The default test collection behavior discovers 5 matching tests (only tests within `test/jit/test_cuda.py`, because that module doesn't alter the default `load_tests` behavior):
```bash
python ~/.vscode-server/extensions/ms-python.python-2022.14.0/pythonFiles/get_output_via_markers.py \
~/.vscode-server/extensions/ms-python.python-2022.14.0/pythonFiles/testing_tools/unittest_discovery.py \
./test test_cuda.py | grep test_cuda | wc -l
5
```
3. Set the new env variable (in vscode, you would put it in the .env file)
```bash
export PYTORCH_DISABLE_RUNNING_SCRIPT_CHK=1
```
4. All of the desired tests are now discovered and can be executed successfully!
```bash
python ~/.vscode-server/extensions/ms-python.python-2022.14.0/pythonFiles/get_output_via_markers.py \
~/.vscode-server/extensions/ms-python.python-2022.14.0/pythonFiles/testing_tools/unittest_discovery.py \
./test test_cuda.py | grep test_cuda | wc -l
175
```
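For reference, a hedged sketch of what the gated check could look like inside the custom `load_tests` (the env var name comes from this PR; everything else here is illustrative, not the actual implementation):
```python
import os

def load_tests(loader, tests, pattern):
    # Opt-out: skip the "discovering script == defining script" check so
    # external tools (e.g. vscode) can discover and run tests.
    if os.environ.get("PYTORCH_DISABLE_RUNNING_SCRIPT_CHK", "0") != "1":
        pass  # verify the running script here, as before this PR
    return tests
```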

A potentially relevant note: the previous behavior of the custom `load_tests` flattened all the `TestSuite`s in each test module:
4c01c51266/torch/testing/_internal/common_utils.py (L3260-L3262)
I haven't been able to find any code that depends upon this behavior but I think retaining the `TestSuite` structure is preferable from a user perspective and likely safe (`TestSuite`s [can be executed](https://docs.python.org/3/library/unittest.html#load-tests-protocol:~:text=test%20runner%20to-,allow%20it%20to%20be%20run,-as%20any%20other) just like `TestCase`s and this is the structure [recommended](https://docs.python.org/3/library/unittest.html#load-tests-protocol:~:text=provides%20a%20mechanism%20for%20this%3A%20the%20test%20suite) by the standard python documentation).
If necessary, I can change this PR to continue flattening each test module's `TestSuite`s. However, since I expect external tools using the `unittest` `discover` API will usually assume discovered `TestSuite`s retain their structure (e.g. [vscode](192c3eabd8/pythonFiles/visualstudio_py_testlauncher.py (L336-L349))), retaining the `TestSuite` flattening behavior would likely require customizing those external tools for PyTorch.
Thanks to everyone in the community for the continued contributions to this incredibly valuable framework!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85584
Approved by: https://github.com/huydhn
This is based on @wconstab's tests from #84680.
Technically, slice is covered by the `__getitem__` OpInfo, but it is easier to debug/test a narrower internal function that exercises only this functionality and not other advanced indexing.
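For illustration, the relationship between the two (a plain sketch, not the OpInfo machinery):
```python
import torch

x = torch.arange(6).reshape(2, 3)
# __getitem__ covers slicing alongside many other indexing modes:
y = x[:, 1:3]
# The narrower internal op can be exercised directly:
z = torch.ops.aten.slice(x, 1, 1, 3)  # dim=1, start=1, end=3
torch.testing.assert_close(y, z)
```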
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85554
Approved by: https://github.com/mruberry, https://github.com/wconstab
# Summary
This exposes the `_scaled_dot_product_attention` function to Python in the `nn` namespace. It is still underscored because the API for its args and kwargs is still in flux for the next few weeks; it will eventually land as a prototype feature.
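For context, the computation the function performs, as a plain reference sketch (not the fused implementation, whose exact Python signature is still in flux):
```python
import math
import torch

def sdpa_reference(q, k, v):
    # softmax(q @ k^T / sqrt(head_dim)) @ v
    scale = 1.0 / math.sqrt(q.size(-1))
    attn = torch.softmax(q @ k.transpose(-2, -1) * scale, dim=-1)
    return attn @ v

q = k = v = torch.randn(2, 8, 16, 64)  # (batch, heads, seq, head_dim)
out = sdpa_reference(q, k, v)
```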
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85044
Approved by: https://github.com/cpuhrsch
Run tests in parallel at the test-file granularity.
Runs 3 test files in parallel using a multiprocessing pool; each file's output goes to a file, which is then printed when the test finishes. Some tests cannot be run in parallel (usually due to lacking memory), so we run those afterwards. Sharding is changed to attempt to mask large files with other large files by running them on the same shard; a sketch of the runner follows below.
test_ops* gets a custom handler because it is simply too big (2 hrs on Windows) and linalg_cholesky fails (I would really like a solution to this if possible, but until then we use the custom handler).
This reduces CUDA tests by a lot and cuts total Windows test time by ~1 hr.
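A hedged sketch of the approach (file names and worker logic are illustrative; this is not the actual CI runner):
```python
import subprocess
from multiprocessing.pool import ThreadPool

def run_test_file(path):
    # Capture output so parallel runs don't interleave; print on completion.
    proc = subprocess.run(["python", path], capture_output=True, text=True)
    return path, proc.returncode, proc.stdout + proc.stderr

parallel = ["test_a.py", "test_b.py", "test_c.py"]  # hypothetical shard
serial = ["test_big_memory.py"]                     # can't run in parallel

with ThreadPool(3) as pool:
    for path, code, out in pool.imap_unordered(run_test_file, parallel):
        print(f"===== {path} (exit {code}) =====\n{out}")

for path in serial:  # run memory-hungry files afterwards, one at a time
    print(f"===== {path} =====\n{run_test_file(path)[2]}")
```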
Ref. https://github.com/pytorch/pytorch/issues/82894
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84961
Approved by: https://github.com/huydhn
- This implements explicit forward prefetching that follows the static pre-forward order recorded in the first iteration when `forward_prefetch=True` is passed to the FSDP constructor.
- This has the same unit test coverage as the original `forward_prefetch`.
- I checked via print statements that the prefetches are happening, but since I cannot construct a good CPU-bound workload, it is hard to tell from traces that the prefetch is helping.
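Usage is the existing constructor flag; a minimal sketch (assuming a process group is already initialized):
```python
import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

# Assumes torch.distributed.init_process_group has already been called.
module = nn.Sequential(nn.Linear(8, 8), nn.ReLU(), nn.Linear(8, 8))
model = FSDP(module, forward_prefetch=True)  # enable explicit forward prefetch
```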
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85177
Approved by: https://github.com/zhaojuanmao