Commit Graph

64607 Commits

cyy
d0ad848aa5 Enable misc clang-tidy checks (#110283)
This PR enables the misc-XX checks in clang-tidy. Meanwhile, I excluded some of them that require a lot of code changes and have no immediate benefit. Some additional fixes and suppressions were also applied.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110283
Approved by: https://github.com/albanD
2023-09-30 10:39:52 +00:00
Adnan Akhundov
2ead6c2f6e Skip launching kernels with zero grid in AOT Inductor (#110312)
Summary: with the grid computed in terms of unbacked `SymInt`s, it can happen that the grid is zero-size. This causes a CUDA error on `cuLaunchKernel` in the AOT Inductor codegen.

In this PR, when the grid contains unbacked `SymInt`s, a check is added around the `launchKernel` call in AOT Inductor's C++ wrapper codegen to make sure that the grid is not zero-size.
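
The guard described above can be sketched in Python terms (hypothetical names; the actual codegen emits C++ around the `launchKernel` call):

```python
# Hypothetical sketch of the zero-size-grid guard described above.
# Grid dims may be unbacked SymInts resolved only at runtime, so the
# check has to happen right before the launch, not at compile time.

def launch_if_nonzero(grid, launch_kernel):
    """Skip the launch when any grid dimension is zero."""
    gx, gy, gz = grid
    if gx == 0 or gy == 0 or gz == 0:
        return None  # a zero-size grid launches no blocks, so do nothing
    return launch_kernel(gx, gy, gz)

launched = []
launch_if_nonzero((4, 1, 1), lambda x, y, z: launched.append((x, y, z)))
launch_if_nonzero((0, 1, 1), lambda x, y, z: launched.append((x, y, z)))
```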

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110312
Approved by: https://github.com/chenyang78
2023-09-30 09:12:56 +00:00
Huy Do
81a74457ca [BE] Clean up trymerge code handling flaky failures (#110133)
This is the 2nd part of https://github.com/pytorch/pytorch/pull/110054.  The flaky classification has been done on Dr.CI.  There is no need to download flaky rule files and do the check anymore.  Some tests are also updated with new examples because we mocked the list of flaky rules there.  Similar tests have been done on Dr.CI.

* [x] https://github.com/pytorch/pytorch/pull/110054
* [x] Clean up the flaky rules logic because it has already been implemented on Dr. CI
* [ ] Clean up the broken trunk logic for the same reason

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110133
Approved by: https://github.com/clee2000
2023-09-30 08:01:00 +00:00
Oguz Ulgen
f7ba3e85e2 [Dynamo] Add functional triton kernel wrapper (#110185)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110185
Approved by: https://github.com/jansel, https://github.com/zou3519, https://github.com/bdhirsh
ghstack dependencies: #109623
2023-09-30 04:20:20 +00:00
eqy
6b84658433 [CUDA][cudaMallocAsync] Improve PYTORCH_CUDA_ALLOC_CONF error message (#104891)
Tiny fix to improve user-facing errors for issues like #104801

CC @ptrblck

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104891
Approved by: https://github.com/kit1980
2023-09-30 02:59:02 +00:00
Nikita Shulga
ad8aef0f98 [BE] [3/N] Use nested namespaces (#110314)
Mostly in torch/csrc/jit/runtime and in `ATen/cuda/`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110314
Approved by: https://github.com/seemethere
2023-09-30 02:23:48 +00:00
drisspg
8745d2d4f2 Small optimization to how we call flash-attention (#110324)
# Summary
Logging Mode is great, and helped me identify that we are doing an unnecessary slice sometimes.

### Numbers
For small sizes: ie. (16, 16, 32, 32)
This brings the timing from:

`flash_time: 29.344002110883594 microseconds`

to

`flash_time: 26.971791498363018 microseconds`
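
Numbers like the above follow a standard microbenchmark pattern; here is a sketch with a stand-in workload (not the actual flash-attention call, whose shape was (16, 16, 32, 32)):

```python
import timeit

# Generic microbenchmark sketch: time many runs and report the
# per-call average in microseconds, as in the flash_time numbers above.

def workload():
    # stand-in for the measured flash-attention call
    return sum(i * i for i in range(100))

runs = 1000
total_s = timeit.timeit(workload, number=runs)
micros_per_call = total_s / runs * 1e6
```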

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110324
Approved by: https://github.com/cpuhrsch
2023-09-30 02:15:07 +00:00
leslie-fang-intel
7eeb392eb3 [Inductor] Enable the item() and nonzero() codegen test on CPU (#110262)
**Summary**
Follow-up to https://github.com/pytorch/pytorch/pull/109893, which had an issue with CPU support as reported in https://github.com/pytorch/pytorch/issues/109897. This fix mainly includes 2 changes:

-  The current implementation of `rename_indexing`
10c646295d/torch/_inductor/codegen/common.py (L1023) only adds symbol names starting with `s` or `ps` into `kernel.args.sizevars`. However, an unbacked `SymInt` starts with `i`, so we extend the implementation of `rename_indexing` to also support symbols starting with `i`.
- Currently, the internal loop index names also start with `i`. Since `i` is now used for unbacked `SymInt`s, change the loop index names to start with `x`, which aligns with Triton.
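
The prefix logic described above can be sketched as follows (illustrative names, not the actual `_inductor` code):

```python
# Toy sketch of the prefix check discussed above. Before the fix, only
# "s"/"ps"-prefixed symbols counted as size vars; unbacked symints named
# "i0", "i1", ... were missed, so "i" is added here, and loop indices
# are renamed to an "x" prefix to avoid the clash.

SIZEVAR_PREFIXES = ("s", "ps", "i")  # "i" added for unbacked symints

def is_sizevar(symbol_name: str) -> bool:
    return symbol_name.startswith(SIZEVAR_PREFIXES)
```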

**Test Plan**
```
python -u -m pytest -s -v test_torchinductor_dynamic_shapes.py -k test_bool_mask_nobreak
python -u -m pytest -s -v test_torchinductor_dynamic_shapes.py -k test_nonzero_size_factory_nobreak
python -u -m pytest -s -v test_torchinductor_dynamic_shapes.py -k test_item_zeros_nobreak
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110262
Approved by: https://github.com/ezyang, https://github.com/jgong5
2023-09-30 00:13:20 +00:00
ancestor-mithril
e0be9ebc18 Simplify the conditionals used for learning rate calculation for ConstantLR learning rate scheduler (#109785)
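
For context, ConstantLR's schedule reduces to a simple closed form (a behavioral sketch, not the refactored implementation):

```python
# Behavioral sketch of ConstantLR: the learning rate is scaled by
# `factor` for the first `total_iters` steps, then restored to base_lr.

def constant_lr(base_lr: float, step: int,
                factor: float = 1.0 / 3, total_iters: int = 5) -> float:
    return base_lr * factor if step < total_iters else base_lr
```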
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109785
Approved by: https://github.com/janeyx99, https://github.com/kit1980
2023-09-29 23:11:23 +00:00
Bin Bao
993eea0edd [aotinductor] Fix a missing schema issue for repeat_interleave (#110105)
Differential Revision: [D49686812](https://our.internmc.facebook.com/intern/diff/D49686812)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110105
Approved by: https://github.com/zou3519, https://github.com/jansel, https://github.com/aakhundov
2023-09-29 23:01:37 +00:00
davidgens-cerebras
ee0bff209c [LTC] correct AdaptiveAvgPool3d channel dim index for shape inference (#109822)
Fixes #109821

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109822
Approved by: https://github.com/mikaylagawarecki, https://github.com/alanwaketan
2023-09-29 22:54:12 +00:00
Nikita Shulga
5a87477e3f [BE] Use std::make_unique (#110298)
Since C++14 `std::unique_ptr<type_t[]> x(new type_t[NUM])` is identical to `auto x = std::make_unique<type_t[]>(NUM);`

Two `std::unique_ptr<float[]> arr(new float[NUM]());` statements are left as-is, since they not only allocate the array but also initialize it; see below:
d04b35e7e3/aten/src/ATen/native/cpu/SoftMaxKernel.cpp (L700-L701)

On the other hand, from https://github.com/pytorch/pytorch/pull/60371 it's not at all clear if it needs to be initialized to zero at that point...

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110298
Approved by: https://github.com/kit1980
2023-09-29 22:46:30 +00:00
PyTorch MergeBot
b083058e45 Revert "Make unbind() overrideable for NT subclass (#109122)"
This reverts commit f5a23ca78d.

Reverted https://github.com/pytorch/pytorch/pull/109122 on behalf of https://github.com/PaliC due to breaking slow tests ([comment](https://github.com/pytorch/pytorch/pull/109122#issuecomment-1741555305))
2023-09-29 22:41:56 +00:00
Evgeni Burovski
1e95a1ae8c MAINT: pytorchify torch._numpy tests: core/ and fft/ (#109815)
1. Inherit from TestCase
2. Use pytorch parametrization
3. Use unittest.expectedFailure to mark xfails, also unittest skips

All this to make pytest-less invocation work:

$ python test/torch_np/test_basic.py

cross-ref #109593, #109718, #109775
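
A minimal sketch of the conversion (an illustrative test, not one of the actual `torch._numpy` tests):

```python
import io
import unittest

# Sketch of the changes described above: inherit from TestCase and mark
# known failures with unittest.expectedFailure instead of pytest's
# xfail markers, so the file runs directly with `python test_basic.py`.

class TestBasic(unittest.TestCase):
    def test_passes(self):
        self.assertEqual(1 + 1, 2)

    @unittest.expectedFailure
    def test_known_bug(self):
        self.assertEqual(1 + 1, 3)  # xfail: tracked separately

suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestBasic)
result = unittest.TextTestRunner(stream=io.StringIO()).run(suite)
```

Because the expected failure does fail, the run as a whole is still considered successful.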

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109815
Approved by: https://github.com/lezcano
2023-09-29 22:36:13 +00:00
Octavian Guzu
9c7071b0e3 [fuzzing result][fuzz_torch_jit_lite_interpreter] read-heap-use-after-free (size 8) in std::_Function_base::_M_empty() (#110289)
Summary: This diff fixes a heap UAF found by fuzzing in torch/csrc/jit/mobile/interpreter.cpp

Test Plan:
CI and
```
arc lionhead crash reproduce 1009060456885023
```
doesn't crash anymore.

Reviewed By: malfet

Differential Revision: D49538326

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110289
Approved by: https://github.com/malfet
2023-09-29 22:32:38 +00:00
PyTorch MergeBot
f2d7faf4ba Revert "MAINT: pytorchify torch._numpy tests: core/ and fft/ (#109815)"
This reverts commit 132a138a01.

Reverted https://github.com/pytorch/pytorch/pull/109815 on behalf of https://github.com/PaliC due to causing various slow tests to fail ([comment](https://github.com/pytorch/pytorch/pull/109815#issuecomment-1741525574))
2023-09-29 21:53:36 +00:00
drisspg
28d69d5256 Adding Backward Support for NestedTensors and FlashAttention (#97485)
# Summary
<!--
copilot:summary
-->
### <samp>🤖 Generated by Copilot at 318764f</samp>

This pull request implements the CUDA backend of the SDPA kernel for nested tensors, which enables efficient transformer models with variable-length sequences. It adds a new dispatch key, a backward function, a unit test, and some helper functions for the kernel. It modifies `test/test_transformers.py`, `aten/src/ATen/native/native_functions.yaml`, `aten/src/ATen/native/nested/cuda/NestedTensorTransformerFunctionsBackward.cpp`, and `aten/src/ATen/native/nested/cuda/NestedTensorTransformerUtils.h`.

<!--
copilot:poem
-->
### <samp>🤖 Generated by Copilot at ed4a773</samp>

> _Fused kernels of doom, unleash the flash attention_
> _Nested tensors on fire, reshape and pad with caution_
> _Backward pass of power, dispatch the CUDA key_
> _Test the gradients of hell, warn the user if they disagree_

Pull Request resolved: https://github.com/pytorch/pytorch/pull/97485
Approved by: https://github.com/jbschlosser
2023-09-29 21:34:47 +00:00
Avik Chaudhuri
359c2a53f5 dynamic_shapes + retrace exported program (#110276)
An `ExportedProgram`'s `__call__` signature is different from the original module, so `dynamic_shapes` that follow the original signature would fail when applied to re-export an `ExportedProgram`.

This PR fixes this issue, in other words, the original `dynamic_shapes` should now work when re-exporting.

Differential Revision: D49764011

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110276
Approved by: https://github.com/tugsbayasgalan
2023-09-29 21:06:46 +00:00
PyTorch MergeBot
c2c7c4035f Revert "Simplify the conditionals used for learning rate calculation for ConstantLR learning rate scheduler (#109785)"
This reverts commit 83283b4f0d.

Reverted https://github.com/pytorch/pytorch/pull/109785 on behalf of https://github.com/PaliC due to causing macos errors as per 83283b4f0d ([comment](https://github.com/pytorch/pytorch/pull/109785#issuecomment-1741471142))
2023-09-29 20:49:28 +00:00
atalman
b253fc9c93 Revert "[1/N] Dynamo skipfiles refactor (#109567)" (#110296)
This reverts commit 84c5435b29.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110296
Approved by: https://github.com/yanboliang
2023-09-29 20:35:46 +00:00
Peter Bell
bc047ec906 [inductor] Make sure unfuse_addmm and addmm patterns don't overlap (#110235)
Inductor has two opposing patterns,
```
addmm -> add + mm
add + mm -> addmm
```

This uses the `extra_check` to disable the addmm fusion pattern when the
heuristic to unfuse add is met, for consistency.
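
A toy model of how `extra_check` keeps the two patterns from overlapping (the heuristic below is a hypothetical stand-in for Inductor's real one):

```python
# Toy model of the conflict described above, not the real pattern
# matcher: if the unfuse heuristic and the addmm fusion pattern could
# both fire on the same shapes, the rewriter would undo its own work.
# The fix gates the fusion pattern on the negation of the heuristic.

def should_unfuse_addmm(m: int, n: int) -> bool:
    # hypothetical stand-in for the unfuse heuristic
    return m <= 16 and n <= 16

def addmm_fusion_extra_check(m: int, n: int) -> bool:
    # fuse add + mm -> addmm only when we would NOT immediately unfuse it
    return not should_unfuse_addmm(m, n)
```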

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110235
Approved by: https://github.com/lezcano, https://github.com/eellison
ghstack dependencies: #110232
2023-09-29 19:35:29 +00:00
Peter Bell
d04b35e7e3 [inductor] Fix bug in input mutation (#107614)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107614
Approved by: https://github.com/jansel
2023-09-29 18:27:06 +00:00
Sherlock Huang
d7de26804e [AOTInductor] ProxyExecutor supports List[Tensor] return type (#110182)
Summary:
Support custom ops that return `List[Tensor]`, like `"fn_with_list_output(Tensor[] tensors, int i) -> Tensor[]"`

As an example
`out5, out6 = torch.ops.fb.fn_with_list_output([out3, out4], 1)`

got compiled into

```
    AtenTensorHandle buf8_handle;  // output buffer
    AOTI_TORCH_ERROR_CODE_CHECK(aoti_torch_new_uninitialized_tensor(&buf8_handle));
    RAIIAtenTensorHandle buf8(buf8_handle);
    AtenTensorHandle buf9_handle;  // output buffer
    AOTI_TORCH_ERROR_CODE_CHECK(aoti_torch_new_uninitialized_tensor(&buf9_handle));
    RAIIAtenTensorHandle buf9(buf9_handle);
    AtenTensorHandle tensor_args_var_5[] = {buf5.get(), buf6.get(), buf8.get(), buf9.get()};
    int64_t int_args_var_6[] = {1};
    aoti_torch_proxy_executor_call_function(proxy_executor, 2, 1, int_args_var_6, 4, tensor_args_var_5);
```

Test Plan: Test

Differential Revision: D49694691

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110182
Approved by: https://github.com/chenyang78
2023-09-29 18:21:48 +00:00
Mu-Chu Lee
d6d3f6cfe5 Add weight update for DSOModel. (#110273)
Summary: Add weight update for DSOModel and AOTInductorModel

Test Plan: buck2 test accelerators/workloads/models/slimdsnn:slimdsnn_dso_test - SlimDSNN.DSO_Update_Constants

Reviewed By: mikekgfb

Differential Revision: D49748685

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110273
Approved by: https://github.com/hl475
2023-09-29 18:14:01 +00:00
Jaromir Latal
6e2c14a0e8 [Codemod][[codemod] Replace third-party mock with unittest.mock] caffe2/caffe2 (#106541)
Reviewed By: thechrisu

Differential Revision: D47909974

Pull Request resolved: https://github.com/pytorch/pytorch/pull/106541
Approved by: https://github.com/thechrisu
2023-09-29 18:09:49 +00:00
Simon Fan
88ef126a93 rename nanogpt_generate to nanogpt to also support train (#109746)
Differential Revision: [D49522940](https://our.internmc.facebook.com/intern/diff/D49522940)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109746
Approved by: https://github.com/msaroufim, https://github.com/malfet, https://github.com/xuzhao9
2023-09-29 17:36:48 +00:00
Yang Chen
30759848fa [inductor] handle non-list/tuple outputs for FallbackKernel (#110145)
generate_output may return outputs that are not a list or tuple. Force
those to be a list, because kernel.outputs is enumerated
later in the codegen.
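
The normalization amounts to something like this sketch:

```python
# Sketch of the normalization described above: downstream codegen
# enumerates kernel.outputs, so a bare single output is wrapped in a
# list and tuples are converted for uniform handling.

def normalize_outputs(outputs):
    if not isinstance(outputs, (list, tuple)):
        return [outputs]
    return list(outputs)
```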

Also fixed a minor issue in an assertion message.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110145
Approved by: https://github.com/aakhundov
2023-09-29 17:13:26 +00:00
Catherine Lee
defb364adf Clean up test_external_module_register (#110254)
Caused by #109866.

The test registers a new device module; the above PR checks for xpu, sees that it got registered, and uses it, but it's a dummy module.

This causes every test after it to fail, so I "clean up" the registered module.

Another possible solution would be to run this test last lol
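
The cleanup can be sketched as a snapshot-and-restore around the registration (illustrative registry, not the real torch internals):

```python
# Hypothetical sketch of the cleanup: snapshot the registry before the
# test registers its dummy device module, and restore it afterwards so
# later tests never see the dummy.

class DeviceRegistry:
    def __init__(self):
        self.modules = {}

    def register(self, name, module):
        self.modules[name] = module

registry = DeviceRegistry()
snapshot = dict(registry.modules)
try:
    registry.register("xpu", object())  # the dummy module under test
    assert "xpu" in registry.modules    # visible during the test
finally:
    registry.modules = snapshot  # clean up: no dummy leaks to later tests
```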
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110254
Approved by: https://github.com/huydhn
2023-09-29 17:02:13 +00:00
Bin Bao
0ff1155d3a [aotinductor] Refactor test_aot_inductor to take different devices (#110216)
Summary: Replace the hardcoded device with self.device, to make it easier to test both CPU and CUDA

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110216
Approved by: https://github.com/chenyang78, https://github.com/bertmaher
ghstack dependencies: #110215
2023-09-29 16:30:19 +00:00
Bin Bao
ce6d09a775 [aotinductor] Refactor test_aot_inductor (#110215)
Summary: Remove the usage of output tensors in the test script, since AOTInductor now returns output tensors instead of taking in pre-allocated output tensors.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110215
Approved by: https://github.com/angelayi, https://github.com/chenyang78
2023-09-29 16:30:19 +00:00
Andrei Gheorghe
28f52f2f80 Fix aminmax on CUDA when input shape contains 0 (#107564)
The CUDA kernel asserts numel() > 0, while the CPU kernel doesn't and returns empty values (as expected)

Fixes #95349 and #85439

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107564
Approved by: https://github.com/lezcano
2023-09-29 16:18:08 +00:00
Oguz Ulgen
2d50a30d77 [Dynamo] Add native support for Triton Kernels to Dynamo (#109623)
This PR adds native support to Dynamo to detect Triton kernels and
create an FX graph node out of them. AOT eager and inductor modes will
be supported in follow-up PRs.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109623
Approved by: https://github.com/jansel
2023-09-29 15:49:18 +00:00
Joel Schlosser
3693777a86 Pickle support for NT (#110219)
Fixes #104198
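
Pickling a wrapper type like this generally routes through `__reduce__`; a minimal toy sketch (not the actual NT subclass from this PR):

```python
import pickle

# Toy sketch of pickle support for a nested-tensor-like wrapper:
# serialize the constituent buffers and rebuild the wrapper on load.

class ToyNested:
    def __init__(self, tensors):
        self.tensors = [list(t) for t in tensors]

    def __reduce__(self):
        # (callable, args) tuple: pickle calls ToyNested(self.tensors)
        # when reconstructing the object.
        return (ToyNested, (self.tensors,))

nt = ToyNested([[1, 2, 3], [4, 5]])
roundtripped = pickle.loads(pickle.dumps(nt))
```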
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110219
Approved by: https://github.com/cpuhrsch
2023-09-29 15:30:06 +00:00
Jane Xu
c9511e8ac9 [foreach][BE] cleaning up MultiTensorApply.cuh (#110228)
Followup edits to #109402 as suggested by @r-barnes

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110228
Approved by: https://github.com/drisspg
2023-09-29 14:44:48 +00:00
Bert Maher
92f4a7b663 [inductor] Add fbcode include path for cuda (#110240)
We missed the cuda include, leading to failures in cases where CUDA
was not installed locally but only provided via third-party/GVFS.

Differential Revision: [D49745585](https://our.internmc.facebook.com/intern/diff/D49745585/)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110240
Approved by: https://github.com/hl475
2023-09-29 13:39:40 +00:00
Peter Bell
758735b739 [dynamo] Convert dtype arguments as well as inputs in cast_to_fp64 (#110232)
Generating reference outputs sometimes fails because of type mismatches in the graph,
an issue which was noticed previously for `prims.convert_element_type` and fixed in #92036
but the same issue happens with other functions such as tensor constructors.

This expands the fix from #92036 to all dtype keyword arguments.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110232
Approved by: https://github.com/ezyang
2023-09-29 12:42:14 +00:00
Rohan Varma
24e5d61af8 Log usage of optimizer in backward (#110206)
This will allow us to inspect and aggregate jobs that use optimizer in
backward

Differential Revision: [D48674740](https://our.internmc.facebook.com/intern/diff/D48674740/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110206
Approved by: https://github.com/awgu
2023-09-29 11:00:07 +00:00
PyTorch UpdateBot
acac92f806 [vision hash update] update the pinned vision hash (#110258)
This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/main/.github/workflows/_update-commit-hash.yml).
Update the pinned vision hash.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110258
Approved by: https://github.com/pytorchbot
2023-09-29 04:17:27 +00:00
ancestor-mithril
d615f0078c Updating documentation for PolynomialLR (#110151)
The docstring says the power parameter is an `int`, when it should be a `float`.
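
A simplified closed form of the decay (a sketch; the real scheduler applies per-step multiplicative factors) shows why a float `power` matters:

```python
# Behavioral sketch of PolynomialLR's decay. With power=1.0 the decay
# is linear; fractional powers like 0.5 give a different curve, which
# is why the parameter must be documented as a float, not an int.

def polynomial_lr(base_lr: float, step: int, total_iters: int,
                  power: float = 1.0) -> float:
    if step >= total_iters:
        return 0.0
    return base_lr * (1.0 - step / total_iters) ** power
```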
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110151
Approved by: https://github.com/janeyx99
2023-09-29 03:50:11 +00:00
Zain Rizvi
07ec95b17c TD: Fix sorting bug for historical correlations heuristic (#110257)
Fix a bug where the historical correlations heuristic sorted tests in the opposite order, ranking the least relevant tests most highly
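
A toy illustration of the bug class (illustrative data, not the actual TD heuristic code):

```python
# Ranking by correlation ratings must sort descending; an ascending
# sort puts the least relevant test first, which is the bug fixed here.

ratings = {"test_a.py": 0.9, "test_b.py": 0.1, "test_c.py": 0.5}

# buggy: ascending sort ranks the least relevant test first
buggy_order = sorted(ratings, key=ratings.get)

# fixed: high-to-low
fixed_order = sorted(ratings, key=ratings.get, reverse=True)
```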

<!--
copilot:poem
-->
### <samp>🤖 Generated by Copilot at 70333d1</samp>

> _`test_files` sorted_
> _by ratings, high to low_
> _a faster spring test_
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110257
Approved by: https://github.com/clee2000
2023-09-29 03:29:08 +00:00
cyy
3dc479e70b [1/N] Apply clang-tidy to c10/test/*cpp (#109278)
This series of PRs enables clang-tidy checks in c10/test. We aim to eventually add the path to lintrunner.toml.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109278
Approved by: https://github.com/kit1980
2023-09-29 02:20:57 +00:00
jjsjann123
e6b5e0ecc6 removing the functionality of nvfuser python APIs (#110124)
Removing the functionality from the nvfuser Python APIs.

Since the use of nvfuser was deprecated before the last release cut, we are removing TorchScript support.

The next PR will actually remove the code base.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110124
Approved by: https://github.com/davidberard98
2023-09-29 01:45:00 +00:00
rzou
88de391692 [torch.library] Fix some docstrings (#110214)
Removed some erroneous colons

Test Plan:
- code reading
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110214
Approved by: https://github.com/ezyang
2023-09-29 01:44:49 +00:00
ancestor-mithril
83283b4f0d Simplify the conditionals used for learning rate calculation for ConstantLR learning rate scheduler (#109785)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109785
Approved by: https://github.com/janeyx99, https://github.com/kit1980
2023-09-29 01:19:05 +00:00
Jerry Zhang
c9b8e06060 [quant] Enable quantization for wav2letter (#109830)
Summary:
Also added annotation support for conv1d_relu and conv1d in XNNPACKQuantizer; the quantized results still
match the fx quant path (which didn't quantize conv1d), so tests are not disabled

Test Plan: with-proxy buck2 run executorch/examples/quantization:example -- -m=w2l --verify

Differential Revision: D49479546

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109830
Approved by: https://github.com/kimishpatel
2023-09-29 00:47:34 +00:00
Animesh Jain
ce8b4f56d8 [dynamo] Dont put nn module guards on torch inbuilt nn modules (#110230)
This is one way to fix https://github.com/pytorch/pytorch/issues/110048

Looking for feedback.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110230
Approved by: https://github.com/ezyang
2023-09-29 00:43:16 +00:00
chunyuan
20dabea35d Inductor cpp wrapper: support MkldnnRnnLayer (#107858)
1. Directly use the `codegen` function of the parent class, which already supports both the Python and C++ wrappers.
2. The output of the `at::mkldnn_rnn_layer` OP is actually a `std::tuple` (1491bae277/aten/src/ATen/native/mkldnn/RNN.cpp (L218)). Fix the type when calling `MultiOutput`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107858
Approved by: https://github.com/jgong5, https://github.com/jansel
2023-09-29 00:22:42 +00:00
Edward Z. Yang
d1a13129bb Add support for item() and nonzero() codegen in Inductor (#109893)
This is another version of
https://github.com/pytorch/pytorch/pull/109262 that I think is more
harmonious with inductor design.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109893
Approved by: https://github.com/jansel
2023-09-28 23:37:31 +00:00
Jerry Zhang
3de42995e4 [quant][pt2e] Add quant API re-entrant test (#110125)
Summary:
Add the test to make sure we can call the quantize API multiple times

Test Plan:
python test/test_quantization.py TestQuantizePT2E.test_reentrant

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110125
Approved by: https://github.com/kimishpatel
ghstack dependencies: #110097
2023-09-28 22:41:59 +00:00
skc7
bbb95878e9 [LLVM] Update apis incompatible with llvm versions in codegen (#110200)
Opaque-pointer support is disabled in LLVM 14 and enabled by default from LLVM 15 onward.
Usage of the setOpaquePointers API is deprecated as of LLVM 16, so this API was removed.

Also updated the CreateMalloc and CreateFree APIs for the latest LLVM release.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110200
Approved by: https://github.com/Skylion007
2023-09-28 21:49:30 +00:00