Commit Graph

3203 Commits

Author SHA1 Message Date
Xia, Weiwen
3a3e2002d8 [Quant] Add unified x86 quant backend (#84329)
## Description

Implement a unified quantization backend 'X86' for x86 platforms. It combines the advantages of FBGEMM and ONEDNN: it selects kernels during weight prepacking and hides the details from end users. It will become the default backend in place of FBGEMM.

For details, please refer to this RFC: [[RFC] Unified quantization backend for x86 CPU platforms](https://github.com/pytorch/pytorch/issues/83888)
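
Once this lands, selecting the backend is a one-liner; a minimal sketch (the exact engine list depends on the build):

```python
import torch

# Inspect the available quantized engines and select the unified backend.
print(torch.backends.quantized.supported_engines)  # e.g. ['none', 'onednn', 'x86', 'fbgemm']
torch.backends.quantized.engine = 'x86'
```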

## Validation
**Correctness**
Covered by unit tests.

**Accuracy**
By running torchvision models on imagenet, no accuracy difference is found between FBGEMM and the unified X86 backend:
[torchvision_accuracy_comparison_fbgemm_vs_x86.xlsx](https://github.com/pytorch/pytorch/files/9598114/torchvision_accuracy_comparison_fbgemm_vs_x86.xlsx)

**Performance**
Depends on https://github.com/pytorch/pytorch/pull/84470 which improves performance.
For early PoC results, please refer to https://github.com/pytorch/pytorch/files/9399202/unified_qengine_poc_performance_bechmark.xlsx

With the two PRs combined, we collected some data on Intel(R) Xeon(R) Platinum 8358 CPU @ 2.60GHz
Method: run multiple instances with 4 cores per instance on the whole socket, using JeMalloc and Intel OpenMP.
Model (throughput) | FBGEMM | X86 | Improvement
-- | -- | -- | --
wide_resnet101_2 | 173.5675 | 241.815 | 39.32%
resnext101_32x8d | 174.365 | 339.8175 | 94.89%
resnet50 | 573.155 | 1174.14 | 104.86%
vgg19_bn | 260.335 | 337.92 | 29.80%
vgg19 | 257.935 | 333.265 | 29.21%
inception_v3 | 601.1175 | 1309.33 | 117.82%
densenet161 | 296.645 | 435.5625 | 46.83%
mnasnet1_0 | 1216.7 | 4057.515 | 233.49%
squeezenet1_0 | 1220.085 | 5153.3875 | 322.38%
alexnet | 2294.91 | 2624.6375 | 14.37%
fbnetc_100 | 976.2825 | 3110.1825 | 218.57%
shufflenet_v2_x0_5 | 1555.76 | 3026.125 | 94.51%
spnasnet_100 | 1059.065 | 3502.0975 | 230.68%
pytorch-unet | 192.76 | 246.77 | 28.02%
acgan | 257.32 | 333.7325 | 29.70%
cgan | 7790.6925 | 7803.1025 | 0.16%
sgan | 257.565 | 338.8875 | 31.57%
se_resnet50 | 492.3725 | 916.5175 | 86.14%
vggm | 300.2875 | 316.2075 | 5.30%

Environment:
- PyTorch version: 1.13.0a0+gitcdd625b
- Is debug build: False
- CUDA used to build PyTorch: None
- ROCM used to build PyTorch: N/A
- OS: Ubuntu 20.04.3 LTS (x86_64)
- GCC version: (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
- Clang version: Could not collect
- CMake version: version 3.22.5
- Libc version: glibc-2.31
- Python version: 3.9.12 (main, Jun  1 2022, 11:38:51)  [GCC 7.5.0] (64-bit runtime)
- Python platform: Linux-5.11.0-27-generic-x86_64-with-glibc2.31
- Is CUDA available: False
- CUDA runtime version: No CUDA
- GPU models and configuration: No CUDA
- Nvidia driver version: No CUDA
- cuDNN version: No CUDA
- HIP runtime version: N/A
- MIOpen runtime version: N/A
- Is XNNPACK available: True

Versions of relevant libraries:
- [pip3] intel-extension-for-pytorch==1.13.0+cpu
- [pip3] numpy==1.23.3
- [pip3] pytorch-widedeep==0.3.7
- [pip3] torch==1.13.0a0+git48b423b
- [pip3] torchvision==0.14.0a0+ebb68f3
- [conda] blas                      1.0                         mkl
- [conda] intel-extension-for-pytorch 1.13.0+cpu               pypi_0    pypi
- [conda] mkl                       2021.4.0           h06a4308_640
- [conda] mkl-include               2022.1.0                 pypi_0    pypi
- [conda] mkl-service               2.4.0            py39h7f8727e_0
- [conda] mkl-static                2022.1.0                 pypi_0    pypi
- [conda] mkl_fft                   1.3.1            py39hd3c417c_0
- [conda] mkl_random                1.2.2            py39h51133e4_0
- [conda] numpy                     1.23.3                   pypi_0    pypi
- [conda] numpy-base                1.22.3           py39hf524024_0
- [conda] torch                     1.13.0a0+git48b423b          pypi_0    pypi
- [conda] torchvision               0.14.0a0+ebb68f3          pypi_0    pypi

Pull Request resolved: https://github.com/pytorch/pytorch/pull/84329
Approved by: https://github.com/jerryzh168
2022-09-29 00:44:40 +00:00
Nikita Karetnikov
8dd45424ea [primTorch] Add ref for huber_loss and error inputs (#85041)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85041
Approved by: https://github.com/lezcano, https://github.com/mruberry
2022-09-28 19:56:17 +00:00
PyTorch MergeBot
a0b1693996 Revert "Update amax/amin/norm/count_nonzero signatures with int[*]? dim (#83300)"
This reverts commit 1c0f0b33a0.

Reverted https://github.com/pytorch/pytorch/pull/83300 on behalf of https://github.com/jeffdaily due to The commit breaks nvfuser tests
2022-09-28 17:04:53 +00:00
Eddie Yan
b04b2fa9aa [CUBLAS][CUDA GRAPHS] (re-re-open of #83461) Explicitly set the workspace for cuBLAS handles (#85447)
Now includes @dagitses's optimizations and fixes for teardown.

CC @ngimel @ptrblck
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85447
Approved by: https://github.com/malfet
2022-09-28 16:04:58 +00:00
Andres Lugo-Reyes
5709c67f1f [ROCm] Retry loop implemented to avoid transient memory leak errors (#82607)
### Description
Added a retry loop to the memory leak checker to avoid a rare case in which ROCm reports a false-positive memory leak.

### Issue
Original issue observed as part of this ticket: https://github.com/pytorch/pytorch/issues/62533

### Testing
- Applied changes and built
- python test/test_cuda.py
- Ensure all tests pass

Pull Request resolved: https://github.com/pytorch/pytorch/pull/82607
Approved by: https://github.com/malfet
2022-09-28 15:48:24 +00:00
Jeff Daily
0b251d985d skip test TestCompositeComplianceCUDA::test_forward_ad_nn_functional_max_unpool2d_cuda_float32 (#85767)
This test was marked as an expected failure, but it is flaky on ROCm only because ROCm sometimes succeeds unexpectedly. The test was marked as an expected failure due to non-determinism that was already well known; see the nearby comments.

a4c94f0739/torch/testing/_internal/common_methods_invocations.py (L11410-L11421)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/85767
Approved by: https://github.com/clee2000
2022-09-28 14:05:02 +00:00
Thomas Viehmann
795028a3ce Make Python reference for permute accept varargs (#85460)
Fixes #85452
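
For illustration, both call forms below are what the Python ref now accepts, matching eager `Tensor.permute`:

```python
import torch

x = torch.empty(2, 3, 4)
x.permute(2, 0, 1)    # varargs form, previously rejected by the ref
x.permute((2, 0, 1))  # sequence form
```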

Pull Request resolved: https://github.com/pytorch/pytorch/pull/85460
Approved by: https://github.com/jjsjann123, https://github.com/mruberry, https://github.com/ngimel
2022-09-28 03:50:42 +00:00
Kurt Mohler
1c0f0b33a0 Update amax/amin/norm/count_nonzero signatures with int[*]? dim (#83300)
Changes the `dim` arg to use the `int[*]?` type for the following functions in `native_functions.yaml`:
* `amax`
* `amin`
* `norm`
* `frobenius_norm`
* `native_norm`
* `count_nonzero`

Part of #29137
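
A sketch of the intended Python-level effect (assuming an omitted `dim` reduces over all dimensions, which is what the optional type enables):

```python
import torch

x = torch.arange(6.).reshape(2, 3)
torch.amax(x)           # no dim -> reduce over all dims -> tensor(5.)
torch.count_nonzero(x)  # tensor(5) -- only the 0 entry is zero
```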

Pull Request resolved: https://github.com/pytorch/pytorch/pull/83300
Approved by: https://github.com/ngimel, https://github.com/albanD, https://github.com/kulinseth
2022-09-28 01:56:37 +00:00
PyTorch MergeBot
572dd862c4 Revert "Update amax/amin/norm/count_nonzero signatures with int[*]? dim (#83300)"
This reverts commit 8c7c7ed322.

Reverted https://github.com/pytorch/pytorch/pull/83300 on behalf of https://github.com/huydhn due to The commit pin breaks XLA test somehow
2022-09-28 01:36:43 +00:00
Kurt Mohler
8c7c7ed322 Update amax/amin/norm/count_nonzero signatures with int[*]? dim (#83300)
Changes the `dim` arg to use the `int[*]?` type for the following functions in `native_functions.yaml`:
* `amax`
* `amin`
* `norm`
* `frobenius_norm`
* `native_norm`
* `count_nonzero`

Part of #29137

Pull Request resolved: https://github.com/pytorch/pytorch/pull/83300
Approved by: https://github.com/ngimel, https://github.com/albanD, https://github.com/kulinseth
2022-09-27 23:50:04 +00:00
Ke Wen
775a22c7c6 Add all_gather_into_tensor in place of _all_gather_base (#85686)
### Description
- This PR renames `_all_gather_base` to `all_gather_into_tensor` so that its meaning is clearer.
- The `all_gather_into_tensor` API differs from the `all_gather` API in the output it accepts -- a single, large tensor instead of a list of tensors.
- This PR also adds a deprecation warning to `_all_gather_base`.

### Issue
`_all_gather_base` was implemented in https://github.com/pytorch/pytorch/pull/33924 to avoid unnecessary flattening. There was a previous effort (#82639) to merge `_all_gather_base` with the existing `all_gather` API by detecting the parameter type passed in for the output.

There are, however, two blockers that make the merge difficult:
(i) The merge breaks backward compatibility. We would need to change the parameter name `tensor_list` in `all_gather` to a more general name, `output`, that can cover both a tensor and a tensor list.
(ii) The `all_gather` API recently added uneven tensor support, utilizing the tensor boundaries implied by the list. We are, however, not sure about adding such support to `_all_gather_base`, because that would require users to pass in additional tensor boundary information.

In view of the above, we decided to productize `_all_gather_base` as a separate function, but with a clearer name.

### Testing
Added tests:
- `test_all_gather_into_cat_tensor_cuda` -- output form as with `torch.cat`. For example:
```
        >>> tensor_in
        tensor([1, 2], device='cuda:0') # Rank 0
        tensor([3, 4], device='cuda:1') # Rank 1
        >>> tensor_out
        tensor([1, 2, 3, 4], device='cuda:0') # Rank 0
        tensor([1, 2, 3, 4], device='cuda:1') # Rank 1
```
- `test_all_gather_into_stack_tensor_cuda` -- output form as with `torch.stack`. For example:
```
        >>> tensor_out2
        tensor([[1, 2],
                [3, 4]], device='cuda:0') # Rank 0
        tensor([[1, 2],
                [3, 4]], device='cuda:1') # Rank 1
```
The output form is determined by the shape of the output tensor passed by the user, no flag used.
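
A minimal usage sketch (assumes an initialized process group with CUDA devices; setup is elided):

```python
import torch
import torch.distributed as dist

world_size = dist.get_world_size()
rank = dist.get_rank()
tensor_in = torch.arange(2, device="cuda") + 1 + 2 * rank  # [1, 2] on rank 0, [3, 4] on rank 1

# cat form: flat output of shape [world_size * 2]
out_cat = torch.empty(world_size * 2, dtype=tensor_in.dtype, device="cuda")
dist.all_gather_into_tensor(out_cat, tensor_in)

# stack form: output of shape [world_size, 2]; the form is inferred from the shape
out_stack = torch.empty(world_size, 2, dtype=tensor_in.dtype, device="cuda")
dist.all_gather_into_tensor(out_stack, tensor_in)
```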

Cc @rohan-varma @mrshenli @crcrpar @ptrblck @H-Huang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85686
Approved by: https://github.com/rohan-varma, https://github.com/crcrpar
2022-09-27 22:50:22 +00:00
Rodrigo Kumpera
5cbffbbac9 C10D extension to enable per-thread PG (#84153)
Move a bunch of globals to instance methods and replace all uses of them.

We move all PG related globals under World and use a singleton instance under _world.

This creates an undocumented extension point to inject full control of how c10d state behaves.

One simple hack is to change _world to an implementation that uses a thread-local and enables per-thread PGs.

It almost gets DDP working; the per-thread PG is still missing an implementation of all_reduce.

This enables notebook usage of PTD, which is a big deal for learning it:
https://gist.github.com/kumpera/32cb051fa26b8cad8bdf671f968dcd68
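
A hypothetical sketch of the thread-local hack described above (names are illustrative, not the actual c10d internals):

```python
import threading

class ThreadLocalWorld:
    """Swap-in for the module-level _world: each thread gets its own PG state."""
    _state = threading.local()

    @property
    def default_pg(self):
        return getattr(self._state, "default_pg", None)

    @default_pg.setter
    def default_pg(self, pg):
        self._state.default_pg = pg
```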

Pull Request resolved: https://github.com/pytorch/pytorch/pull/84153
Approved by: https://github.com/rohan-varma
2022-09-27 21:42:31 +00:00
Peter Bell
b656ba0b11 Use hexfloat for threshold OpInfo tests (#85676)
0.123 isn't exactly representable as a floating point value, and so
the threshold will move marginally depending on the data type where
the computation is performed. This leads to a rare flake in tests
comparing against a reference implementation.

Instead, this chooses a threshold which is exactly representable as a
bfloat16 value and thus has the same value for all data types.
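
For example, a quick check of the representability issue, and why a dyadic value avoids it:

```python
import torch

# 0.123 rounds differently per dtype, so the effective threshold shifts:
print(torch.tensor(0.123, dtype=torch.float32).item())           # 0.12300000339...
print(torch.tensor(0.123, dtype=torch.bfloat16).float().item())  # 0.123046875
# 0.125 (hexfloat 0x1p-3) is exact in every floating-point dtype:
print(torch.tensor(0.125, dtype=torch.bfloat16).float().item())  # 0.125
```
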
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85676
Approved by: https://github.com/ngimel
2022-09-27 16:44:46 +00:00
Peter Bell
fdef507897 Simplify noncontiguous_like (#85518)
This removes the special casing for zero-dim tensors and also uses
indexing instead of manual stride manipulations.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85518
Approved by: https://github.com/albanD
2022-09-27 16:44:46 +00:00
soulitzer
15c52ffc4f Disallow auto_element_wise for in-place and fix some in-place gradients (#85634)
Fixes https://github.com/pytorch/pytorch/issues/85535

Also fixes the backward and forward gradients of `nn.functional.threshold`. The issue was that in-place gradients weren't tested because the in-place variants were not properly registered to the OpInfo.

Perhaps an alternative is to make auto_element_wise smart enough to actually handle the in-place cases (we have 4 cases total now where we manually call copy_ after auto_element_wise), but that requires a few more changes.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85634
Approved by: https://github.com/albanD
2022-09-27 15:35:24 +00:00
Rodrigo Kumpera
54e03cdda9 Don't use a fixed name to avoid race conditions. (#84952)
Fixes #84886
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84952
Approved by: https://github.com/rohan-varma
2022-09-27 14:45:58 +00:00
samdow
18d8c548f4 [Modes] remove enable and rewrite mode stack (squashed) (#84774)
Based on @ezyang's suggestion, the mode stack now has "one true mode", which is the _only_ mode that can ever be active at the C++ level. That mode's torch dispatch simply takes the top mode in the stack, reenables itself (if we aren't at the end of the mode stack), and runs the top mode's torch_{dispatch|function}.

This maintains the invariant that, in the middle of a mode's torch dispatch, the mode itself is not active. It changes the function the user has to call to see what the current mode is (it no longer queries the C++; it's Python only) but also lets the user easily see the entire mode stack.

Removes `enable_torch_dispatch_mode` and `.restore()` since neither makes sense in this new setup

### Background
Why do we want this? Well, a pretty common pattern that was coming up was that users had to do something like

```python
## PRE-PR UX
def f(mode):
  with mode.restore():  # user needs to understand this restore thing?
    ...

with Mode() as m:
  pass
f(m)
```

Many users were getting errors from forgetting to call `.restore` or from forgetting to add the (tbh weird) "mode instantiation" step where they use the mode as a context manager with an empty body. Really, they wanted to treat modes like context managers and just write
```python
## FROM FEEDBACK, USER DESIRED CODE. POSSIBLE POST-PR
def f(mode):
  with mode:
    ...
f(Mode())
```

### Technical Details
With the old mode stack, we basically had a linked list, so the mode itself could only be used once and had a fixed parent. In this new design, the mode stack is just a Python list that we're pushing to and popping from. There's only one mode that's ever active at the C++ level, and it runs the next mode in the Python list. The modes don't hold state anymore.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84774
Approved by: https://github.com/ezyang, https://github.com/zou3519
2022-09-27 01:04:35 +00:00
George Qi
686555b663 [maskedtensor] port torch/_masked into torch/masked (#85515)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85515
Approved by: https://github.com/cpuhrsch
2022-09-26 23:41:13 +00:00
Daniel Dale
15435325eb Configure PyTorch Testing ArgumentParser Instance To Avoid Unnecessary Conflicts with System Args (#85616)
Fixes #85615

Currently, internal test discovery instantiates an `ArgumentParser` and adds numerous arguments to the internal parser:
f0570354dd/torch/testing/_internal/common_utils.py (L491-L500)
...
In this context, `argparse` will load [system args](b494f5935c/Lib/argparse.py (L1826-L1829)) from any external scripts invoking PyTorch testing (e.g. `vscode`).

The default behavior of `argparse` is to [allow abbreviations](b494f5935c/Lib/argparse.py (L2243-L2251)) of arguments, but when an `ArgumentParser` instance has many arguments and may be invoked in the context of potentially conflicting system args, the `ArgumentParser` should reduce the potential for conflicts by being instantiated with `allow_abbrev` set to `False`.

With the current default configuration, some abbreviations of the `ArgumentParser` long options conflict with system args used by `vscode` to invoke PyTorch test execution:
```bash
python ~/.vscode-server/extensions/ms-python.python-2022.14.0/pythonFiles/get_output_via_markers.py \
~/.vscode-server/extensions/ms-python.python-2022.14.0/pythonFiles/visualstudio_py_testlauncher.py \
--us=./test --up=test_cuda.py --uvInt=2 -ttest_cuda.TestCuda.test_memory_allocation \
--testFile=./test/test_cuda.py
>>>PYTHON-EXEC-OUTPUT
...
visualstudio_py_testlauncher.py: error: argument --use-pytest: ignored explicit argument './test'
```
The full relevant stack:
```
pytorch/test/jit/test_cuda.py, line 11, in <module>\n    from torch.testing._internal.jit_utils import JitTestCase\n'\
pytorch/torch/testing/_internal/jit_utils.py, line 18, in <module>\n    from torch.testing._internal.common_utils import IS_WINDOWS, \\\n'
pytorch/torch/testing/_internal/common_utils.py, line 518, in <module>\n    args, remaining = parser.parse_known_args()\n'
argparse.py, line 1853, in parse_known_args\n    namespace, args = self._parse_known_args(args, namespace)\n'
argparse.py, line 2062, in _parse_known_args\n    start_index = consume_optional(start_index)\n'
argparse.py, line 1983, in consume_optional\n    msg = _(\'ignored explicit argument %r\')\n'
```
The `argparse` [condition](b494f5935c/Lib/argparse.py (L2250)) that generates the error in this case:

```python
print(option_string)
--use-pytest
print(option_prefix)
--us
option_string.startswith(option_prefix)
True
```
It'd be nice if `vscode` didn't use two-letter options 🤦 but PyTorch testing shouldn't depend on such good behavior from invoking wrappers, IMHO.

I haven't seen any current dependency on the abbreviated internal PyTorch `ArgumentParser` options so this change should only extend the usability of the (always improving!) PyTorch testing modules.

This simple PR avoids these conflicting options by instantiating the `ArgumentParser` with `allow_abbrev=False`
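
A minimal sketch of the effect (standalone `argparse`, not the actual PyTorch parser):

```python
import argparse

parser = argparse.ArgumentParser(allow_abbrev=False)
parser.add_argument("--use-pytest", action="store_true")

# With abbreviations disabled, --us no longer matches --use-pytest;
# the unknown option is passed through instead of raising an error.
args, remaining = parser.parse_known_args(["--us=./test"])
print(args.use_pytest)  # False
print(remaining)        # ['--us=./test']
```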

Thanks to everyone in the community for their continued contributions to this incredibly valuable framework.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/85616
Approved by: https://github.com/clee2000
2022-09-26 20:17:52 +00:00
Elias Ellison
bcc544e9d7 Add FakeCrossRef tests for backwards, Fix Layer Norm Backward Decomp (#85417)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85417
Approved by: https://github.com/ezyang
2022-09-26 17:08:14 +00:00
Daniel Dale
6e50f8e395 Allow External Scripts (e.g. vscode) To Discover and Execute unittest Tests (#85584)
Fixes #85578

Currently, many test modules customize test loading and discovery via the [load_tests protocol](https://docs.python.org/3/library/unittest.html#load-tests-protocol). The salient custom behavior included (introduced with https://github.com/pytorch/pytorch/pull/13250) is to verify that the script discovering or executing the test is the same script in which the test is defined.

I believe this unnecessarily precludes the use of external tools to discover and execute tests (e.g. the vscode testing extension is widely used and IMHO quite convenient).

This simple PR retains the current restriction by default while offering users the option to disable the aforementioned check if desired by setting an environmental variable.

For example:
1. Setup a test env:
```bash
./tools/nightly.py checkout -b some_test_branch
conda activate pytorch-deps
conda install -c pytorch-nightly numpy expecttest mypy pytest hypothesis astunparse ninja pyyaml cmake cffi typing_extensions future six requests dataclasses -y
```
2. The default test collection behavior discovers 5 matching tests (only the tests within `test/jit/test_cuda.py`, because that module doesn't alter the default `load_tests` behavior):
```bash
python ~/.vscode-server/extensions/ms-python.python-2022.14.0/pythonFiles/get_output_via_markers.py \
~/.vscode-server/extensions/ms-python.python-2022.14.0/pythonFiles/testing_tools/unittest_discovery.py \
./test test_cuda.py | grep test_cuda | wc -l
5
```
3. Set the new env variable (in vscode, you would put it in the .env file)
```bash
export PYTORCH_DISABLE_RUNNING_SCRIPT_CHK=1
```
4. All of the desired tests are now discovered and can be executed successfully!
```bash
python ~/.vscode-server/extensions/ms-python.python-2022.14.0/pythonFiles/get_output_via_markers.py \
~/.vscode-server/extensions/ms-python.python-2022.14.0/pythonFiles/testing_tools/unittest_discovery.py \
./test test_cuda.py | grep test_cuda | wc -l
175
```
![image](https://user-images.githubusercontent.com/7462936/192068508-a292caaf-a1d2-4115-a557-02ac5da80b60.png)

A potentially relevant note, the previous behavior of the custom `load_tests` flattened all the `TestSuite`s in each test module:
4c01c51266/torch/testing/_internal/common_utils.py (L3260-L3262)
I haven't been able to find any code that depends upon this behavior but I think retaining the `TestSuite` structure is preferable from a user perspective and likely safe (`TestSuite`s [can be executed](https://docs.python.org/3/library/unittest.html#load-tests-protocol:~:text=test%20runner%20to-,allow%20it%20to%20be%20run,-as%20any%20other) just like `TestCase`s and this is the structure [recommended](https://docs.python.org/3/library/unittest.html#load-tests-protocol:~:text=provides%20a%20mechanism%20for%20this%3A%20the%20test%20suite) by the standard python documentation).

If necessary, I can change this PR to continue flattening each test module's `TestSuite`s. Since I expect external tools using the `unittest` `discover` API will usually assume discovered `TestSuite`s retain their structure (e.g. [vscode](192c3eabd8/pythonFiles/visualstudio_py_testlauncher.py (L336-L349))), retaining the `TestSuite`-flattening behavior would likely require customizing those external tools for PyTorch.
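
A hypothetical sketch of the relaxed guard; the real customization lives in `torch.testing._internal.common_utils`, and the filename check below is a stand-in for its actual comparison:

```python
import os
import sys

def _discovered_from_defining_script() -> bool:
    # Stand-in for the real check, which compares the running script
    # against the file that defines the tests.
    return os.path.basename(sys.argv[0]).startswith("test")

def load_tests(loader, tests, pattern):
    if os.environ.get("PYTORCH_DISABLE_RUNNING_SCRIPT_CHK", "0") != "1":
        assert _discovered_from_defining_script(), "tests must run from their own script"
    return tests  # the discovered TestSuite structure is retained, not flattened
```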

Thanks to everyone in the community for the continued contributions to this incredibly valuable framework!

Pull Request resolved: https://github.com/pytorch/pytorch/pull/85584
Approved by: https://github.com/huydhn
2022-09-25 21:54:05 +00:00
Wang, Eikan
a531a604a0 Support BF16ImmPtr (#84041)
- Support BF16 immediate values by converting them to uint16; the behavior is the same as for BF16 tensors (see the sketch below).
- Enable BF16 test cases.
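
A sketch of the representation trick (truncation shown; the actual conversion may round):

```python
import struct

def bf16_immediate_bits(x: float) -> int:
    # bfloat16 is the top 16 bits of the float32 bit pattern, so a BF16
    # immediate can be carried around as a uint16.
    (f32_bits,) = struct.unpack("<I", struct.pack("<f", x))
    return f32_bits >> 16

print(hex(bf16_immediate_bits(1.5)))  # 0x3fc0
```
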
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84041
Approved by: https://github.com/ZolotukhinM
2022-09-24 11:58:43 +00:00
Huy Do
4d3acf1203 Enable pytest-shard for functorch (#85321)
This extends https://github.com/pytorch/pytorch/pull/84961 to support functorch tests with pytest-shard.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85321
Approved by: https://github.com/samdow, https://github.com/clee2000
2022-09-24 01:17:04 +00:00
dzdang
ea81138bd6 [quant][improvement][better-engineering] Refactored get_supported_device_types into common_quantization.py (#79607)
Summary:
Both test_quantized_tensor.py and test_quantize_fx.py had the same
get_supported_device_types function defined. This PR refactors it into
common_quantization.py for common usage.
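
A plausible sketch of the shared helper (assumption: it gates 'cuda' on availability, as is common in these tests; the actual logic lives in torch/testing/_internal/common_quantization.py):

```python
import torch

def get_supported_device_types():
    # Run quantization tests on CPU always, and on CUDA when available.
    return ["cpu", "cuda"] if torch.cuda.is_available() else ["cpu"]
```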

Test Plan:
```
python test/test_quantization.py
```

Differential Revision: [D37173692](https://our.internmc.facebook.com/intern/diff/D37173692)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79607
Approved by: https://github.com/jerryzh168
2022-09-23 23:32:18 +00:00
Edward Z. Yang
604487f239 OpInfo for Slice (#85554)
This is based on @wconstab's tests from #84680

Technically, slice is covered by the __getitem__ OpInfo, but it is
easier to debug/test a narrower internal function that only
uses this functionality and not other advanced indexing stuff.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85554
Approved by: https://github.com/mruberry, https://github.com/wconstab
2022-09-23 22:01:32 +00:00
Kshiteej K
bc6dc8d271 [fix] composite compliance: cumprod, _masked.cumprod, linalg.vander (#85330)
Ref: #69991
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85330
Approved by: https://github.com/zou3519
2022-09-23 21:40:07 +00:00
Catherine Lee
49e10c1598 [ci] test_ops in parallel, ci tests log to file (#85528)
Part one of splitting up https://github.com/pytorch/pytorch/pull/84961 into (probably 2) parts.

Contains:
* logging to file
* testing test_ops in parallel
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85528
Approved by: https://github.com/huydhn
2022-09-23 20:45:20 +00:00
PyTorch MergeBot
d10de31cc8 Revert "Add FakeCrossRef tests for backwards, Fix Layer Norm Backward Decomp (#85417)"
This reverts commit 78afa0cf0c.

Reverted https://github.com/pytorch/pytorch/pull/85417 on behalf of https://github.com/clee2000 due to broke tests on trunk 78afa0cf0c
2022-09-23 17:21:43 +00:00
Elias Ellison
78afa0cf0c Add FakeCrossRef tests for backwards, Fix Layer Norm Backward Decomp (#85417)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85417
Approved by: https://github.com/ezyang
2022-09-23 15:50:03 +00:00
Driss Guessous
253ffbf28b Exposing native _scaled_dot_product_attention to torch.nn (#85044)
# Summary
This exposes the _scaled_dot_product_attention function to Python in the nn namespace. It is still underscored because the API for args and kwargs is still in flux for the next few weeks; it will eventually land as a prototype feature.
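
The underlying math, as a reference formulation (not the still-in-flux API surface):

```python
import math
import torch

def sdpa_reference(q, k, v):
    # softmax(Q K^T / sqrt(d)) V
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))
    return torch.softmax(scores, dim=-1) @ v
```
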
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85044
Approved by: https://github.com/cpuhrsch
2022-09-22 16:30:16 +00:00
PyTorch MergeBot
5043457a8e Revert "Add FakeCrossRef tests for backwards, Fix Layer Norm Backward Decomp (#85417)"
This reverts commit 9c77083965.

Reverted https://github.com/pytorch/pytorch/pull/85417 on behalf of https://github.com/clee2000 due to broke tests on trunk (and pull somehow) 9c77083965
2022-09-22 15:44:38 +00:00
Elias Ellison
9c77083965 Add FakeCrossRef tests for backwards, Fix Layer Norm Backward Decomp (#85417)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85417
Approved by: https://github.com/ezyang
2022-09-22 13:03:57 +00:00
kshitij12345
56a41b5998 [composite compliance] ctc_loss (#84752)
Ref: #69991

I have mixed feelings about adding new (private) operators. Backend writers will have to override them as well.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84752
Approved by: https://github.com/zou3519
2022-09-22 00:21:11 +00:00
PyTorch MergeBot
3dce26635f Revert "test in parallel at file granularity (#84961)"
This reverts commit 8107666c6a.

Reverted https://github.com/pytorch/pytorch/pull/84961 on behalf of https://github.com/clee2000 due to makes test_forward_ad_nn_functional_max_unpool2d_cuda_float32 flakily unexpectedly pass
2022-09-21 20:21:25 +00:00
Thomas Viehmann
764cba6848 add Python ref for isreal (#85361)
Dipping my toes into prims waters
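
A minimal sketch of the semantics such a ref implements, mirroring `torch.isreal`:

```python
import torch

def isreal(a: torch.Tensor) -> torch.Tensor:
    # Non-complex inputs are trivially real; complex inputs are real
    # where the imaginary part is zero.
    if not a.dtype.is_complex:
        return torch.ones_like(a, dtype=torch.bool)
    return torch.imag(a) == 0
```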

Pull Request resolved: https://github.com/pytorch/pytorch/pull/85361
Approved by: https://github.com/IvanYashchuk, https://github.com/mruberry
2022-09-21 18:53:34 +00:00
Ivan Yashchuk
35943f30cb Reference implementation for torch.Tensor.sum_to_size (#85338)
New ref: `torch._refs.sum_to_size`.

View consistency validation is disabled because the ref returns a view instead of returning the input.
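
For reference, the eager semantics the ref mirrors:

```python
import torch

x = torch.ones(2, 3)
print(x.sum_to_size(1, 3))  # tensor([[2., 2., 2.]]) -- summed over dim 0
print(x.sum_to_size(2, 1))  # tensor([[3.], [3.]])   -- summed over dim 1
```
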
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85338
Approved by: https://github.com/mruberry
2022-09-21 18:12:52 +00:00
PyTorch MergeBot
0217a8d049 Revert "[fix] composite compliance: cumprod, _masked.cumprod, linalg.vander (#85330)"
This reverts commit d3dec8097b.

Reverted https://github.com/pytorch/pytorch/pull/85330 on behalf of https://github.com/dagitses due to a PR this is based on got reverted, rebase and reland
2022-09-21 18:02:50 +00:00
PyTorch MergeBot
0ac6311356 Revert "[CUBLAS][CUDA GRAPHS] (re-open of #83461) Explicitly set the workspace for cuBLAS handles (#85292)"
This reverts commit 4012e623e8.

Reverted https://github.com/pytorch/pytorch/pull/85292 on behalf of https://github.com/dagitses due to broke an internal test during shutdown. Re-submit with #85399 in stack
2022-09-21 17:57:49 +00:00
lezcano
2a88f1b2d8 Land "Make ceil,floor,round,trunc handle integers" (#85144)
PR to land https://github.com/pytorch/pytorch/pull/78480, as Rohit does
not work in the PyTorch project anymore
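
A sketch of the intended behavior on integer inputs (an identity op, shown against eager):

```python
import torch

x = torch.arange(5, dtype=torch.int64)
print(torch.ceil(x))   # tensor([0, 1, 2, 3, 4]) -- unchanged
print(torch.trunc(x))  # tensor([0, 1, 2, 3, 4]) -- unchanged
```
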
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85144
Approved by: https://github.com/ngimel, https://github.com/mruberry
2022-09-21 17:23:47 +00:00
Catherine Lee
8107666c6a test in parallel at file granularity (#84961)
Run tests in parallel at the test-file granularity.

Runs 3 files in parallel using a multiprocessing pool; output goes to a file, which is then printed when the test finishes. Some tests cannot be run in parallel (usually due to memory constraints), so we run those afterwards. Sharding is changed to attempt to mask large files with other large files and run them on the same shard.

test_ops* gets a custom handler because it is simply too big (2 hrs on Windows) and linalg_cholesky fails (I would really like a solution to this if possible, but until then we use the custom handler).

Reduces CUDA test time by a lot and total Windows test time by ~1 hr.

Ref. https://github.com/pytorch/pytorch/issues/82894
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84961
Approved by: https://github.com/huydhn
2022-09-21 16:58:11 +00:00
Andrew Gu
125e9256f4 [FSDP] Add back forward_prefetch (#85177)
- This implements explicit forward prefetching following the static 1st iteration's pre-forward order when `forward_prefetch=True` in the FSDP constructor (see the sketch below).
- This has the same unit test coverage as the original `forward_prefetch`.
- I checked via print statements that the prefetches are happening, but since I cannot get a good CPU-bound workload, it is hard to tell via traces that the prefetch is working.
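
A minimal sketch of enabling the flag (assumes a process group has already been initialized, e.g. via `dist.init_process_group`):

```python
import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

# forward_prefetch=True turns on explicit forward prefetching following
# the recorded 1st-iteration pre-forward order.
model = FSDP(nn.Linear(8, 8), forward_prefetch=True)
```
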
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85177
Approved by: https://github.com/zhaojuanmao
2022-09-21 14:40:37 +00:00
Khushi Agrawal
563b065f5a [fix] rrelu, rrelu_, & RReLU when lower bound > upper bound (#84996)
Fixes #83160

cc @kshitij12345 @albanD
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84996
Approved by: https://github.com/mruberry, https://github.com/albanD
2022-09-21 13:57:16 +00:00
Ivan Yashchuk
308b26fe4d Add nvFuser support for transpose (#84629)
`torch._refs.t`, `torch._refs.transpose`, and `torch._refs.permute` should all be working now with the nvFuser executor. They also work with graphs processed by AOT Autograd, as these functions are registered to the aten->ref mapping via the "register_decomposition" decorator:
07d398fb26/torch/_refs/__init__.py (L3125-L3126)
07d398fb26/torch/_refs/__init__.py (L3143-L3144)
07d398fb26/torch/_refs/__init__.py (L2548-L2549)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84629
Approved by: https://github.com/ngimel
2022-09-21 12:45:15 +00:00
Horace He
2f4a517d67 Ported matmul compositeimplicitautograd impl into core (#85239)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85239
Approved by: https://github.com/ezyang, https://github.com/lezcano
2022-09-21 09:25:24 +00:00
PyTorch MergeBot
a3dc338ee1 Revert "Exposing native _scaled_dot_product_attention to torch.nn (#85044)"
This reverts commit 9fdd8a8b7f.

Reverted https://github.com/pytorch/pytorch/pull/85044 on behalf of https://github.com/huydhn due to This breaks CUDA 10.2 in trunk. We are deprecating CUDA 10.2, but it is still here in the mean time
2022-09-21 08:34:51 +00:00
Driss Guessous
9fdd8a8b7f Exposing native _scaled_dot_product_attention to torch.nn (#85044)
# Summary
This exposes the _scaled_dot_product_attention function to Python in the nn namespace. It is still underscored because the API for args and kwargs is still in flux for the next few weeks; it will eventually land as a prototype feature.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85044
Approved by: https://github.com/cpuhrsch
2022-09-21 03:09:08 +00:00
jjsjann123
b9b27f7664 Added Tensor.to overloads to torch._refs.to (#84802)
Fixes #84264
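
The `Tensor.to` overloads the ref now mirrors, sketched against eager:

```python
import torch

x = torch.ones(2)
x.to(torch.float64)                       # dtype overload
x.to("cpu")                               # device overload
x.to(torch.empty((), dtype=torch.int32))  # other-tensor overload (takes dtype/device of other)
```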

Pull Request resolved: https://github.com/pytorch/pytorch/pull/84802
Approved by: https://github.com/IvanYashchuk, https://github.com/mruberry
2022-09-20 18:52:02 +00:00
kshitij12345
d3dec8097b [fix] composite compliance: cumprod, _masked.cumprod, linalg.vander (#85330)
Ref: #69991
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85330
Approved by: https://github.com/zou3519
2022-09-20 18:18:39 +00:00
lezcano
d17b144e65 Adding multigammaln ref and fix arange (#85153)
Partially based on https://github.com/pytorch/pytorch/pull/83662.

I'll help land this one, as Rob does not work in the PyTorch project
anymore

I removed the data-dependent check for the args, as data dependencies
are bad for many reasons (and it was failing when the input has NaNs).

It also registers arange as a decomposition, and fixes the naming of its
args.
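
For reference, the eager semantics the new ref mirrors (the ref simply drops the data-dependent domain check on the input values):

```python
import torch

# multivariate log-gamma; inputs must exceed (p - 1) / 2 in eager mode
print(torch.special.multigammaln(torch.tensor([2.0, 3.5]), p=2))
```
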
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85153
Approved by: https://github.com/mruberry, https://github.com/ngimel
2022-09-20 17:52:56 +00:00
Sherlock Huang
9c1a6a522d Make ones and zeros's ref accepts variadic size argument (#85117)
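For illustration, both size call forms, which the refs now accept like the eager factory functions:

```python
import torch

torch.ones(2, 3)     # variadic size
torch.ones((2, 3))   # sequence size
torch.zeros(2, 3)
```
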
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85117
Approved by: https://github.com/ngimel, https://github.com/lezcano
2022-09-20 16:41:30 +00:00