pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-06 12:20:52 +01:00

Author	SHA1	Message	Date
cyy	d0ad848aa5	Enable misc clang-tidy checks (#110283 ) This PR enables the misc-XX checks in clang-tidy. Meanwhile, I excluded some of them that require a lot of code changes and have no immediate benefits. Some additional fixes and suppression were also given. Pull Request resolved: https://github.com/pytorch/pytorch/pull/110283 Approved by: https://github.com/albanD	2023-09-30 10:39:52 +00:00
Adnan Akhundov	2ead6c2f6e	Skip launching kernels with zero grid in AOT Inductor (#110312 ) Summary: with the grid computed in terms of unbacked `SymInt`s, it can happen that the grid is zero size. This causes CUDA error on `cuLaunchKernel` in the AOT Inductor codegen. In this PR, when the grid contains unbacked `SymInt`s, a check is added around the `launchKernel` in the AOT Inductor's C++ wrapper codegen to make sure that the grid is not zero-size. Pull Request resolved: https://github.com/pytorch/pytorch/pull/110312 Approved by: https://github.com/chenyang78	2023-09-30 09:12:56 +00:00
Huy Do	81a74457ca	[BE] Clean up trymerge code handling flaky failures (#110133 ) This is the 2nd part of https://github.com/pytorch/pytorch/pull/110054. The flaky classification has been done on Dr.CI. There is no need to download flaky rule files and do the check anymore. Some tests are also updated with new examples because we mocked the list of flaky rules there. Similar tests have been done on Dr.CI. * [x] https://github.com/pytorch/pytorch/pull/110054 * [x] Clean up the flaky rules logic because it has already been implemented on Dr. CI * [ ] Clean up the broken trunk logic for the same reason Pull Request resolved: https://github.com/pytorch/pytorch/pull/110133 Approved by: https://github.com/clee2000	2023-09-30 08:01:00 +00:00
Oguz Ulgen	f7ba3e85e2	[Dynamo] Add functional triton kernel wrapper (#110185 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/110185 Approved by: https://github.com/jansel, https://github.com/zou3519, https://github.com/bdhirsh ghstack dependencies: #109623	2023-09-30 04:20:20 +00:00
eqy	6b84658433	[CUDA][cudaMallocAsync] Improve `PYTORCH_CUDA_ALLOC_CONF` error message (#104891 ) Tiny fix to improve user-facing errors for issues like #104801 CC @ptrblck Pull Request resolved: https://github.com/pytorch/pytorch/pull/104891 Approved by: https://github.com/kit1980	2023-09-30 02:59:02 +00:00
Nikita Shulga	ad8aef0f98	[BE] [3/N] Use nested namespaces (#110314 ) Mostly in torch/csrc/jit/runtime and in `ATen/cuda/` Pull Request resolved: https://github.com/pytorch/pytorch/pull/110314 Approved by: https://github.com/seemethere	2023-09-30 02:23:48 +00:00
drisspg	8745d2d4f2	Small optimization to how we call flash-attention (#110324 ) # Summary Logging Mode is great, and helped me identify that we are doing an unnecessary slice sometimes. ### Numbers For small sizes: ie. (16, 16, 32, 32) This brings the timing from: `flash_time: 29.344002110883594 micro seconds` to `flash_time: 26.971791498363018 micro seconds` Pull Request resolved: https://github.com/pytorch/pytorch/pull/110324 Approved by: https://github.com/cpuhrsch	2023-09-30 02:15:07 +00:00
leslie-fang-intel	7eeb392eb3	[Inductor] Enable the item() and nonzero() codegen test on CPU (#110262 ) Summary Follow-up to https://github.com/pytorch/pytorch/pull/109893, which has an issue with CPU support as reported in https://github.com/pytorch/pytorch/issues/109897. This fix mainly includes 2 changes: - The current implementation of `rename_indexing` `10c646295d/torch/_inductor/codegen/common.py (L1023)` only adds symbol names starting with `s` or `ps` to `kernel.args.sizevars`. However, an `Unbacked symint` name starts with `i`, so we extend the implementation of `rename_indexing` to support symbols starting with `i`. - Currently, internal loop index names also start with `i`. Since `i` is already used for `Unbacked symint`, change these names to start with `x`, which should align with Triton. Test Plan ``` python -u -m pytest -s -v test_torchinductor_dynamic_shapes.py -k test_bool_mask_nobreak python -u -m pytest -s -v test_torchinductor_dynamic_shapes.py -k test_nonzero_size_factory_nobreak python -u -m pytest -s -v test_torchinductor_dynamic_shapes.py -k test_item_zeros_nobreak ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/110262 Approved by: https://github.com/ezyang, https://github.com/jgong5	2023-09-30 00:13:20 +00:00
ancestor-mithril	e0be9ebc18	Simplify the conditionals used for learning rate calculation for `ConstantLR` learning rate scheduler (#109785 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/109785 Approved by: https://github.com/janeyx99, https://github.com/kit1980	2023-09-29 23:11:23 +00:00
Bin Bao	993eea0edd	[aotinductor] Fix a missing schema issue for repeat_interleave (#110105 ) Differential Revision: [D49686812](https://our.internmc.facebook.com/intern/diff/D49686812) Pull Request resolved: https://github.com/pytorch/pytorch/pull/110105 Approved by: https://github.com/zou3519, https://github.com/jansel, https://github.com/aakhundov	2023-09-29 23:01:37 +00:00
davidgens-cerebras	ee0bff209c	[LTC] correct AdaptiveAvgPool3d channel dim index for shape inference (#109822 ) Fixes #109821 Pull Request resolved: https://github.com/pytorch/pytorch/pull/109822 Approved by: https://github.com/mikaylagawarecki, https://github.com/alanwaketan	2023-09-29 22:54:12 +00:00
Nikita Shulga	5a87477e3f	[BE] Use `std::make_unique` (#110298 ) Since C++14 `std::unique_ptr<type_t[]> x(new type_t[NUM])` is identical to `auto x = std::make_unique<type_t[]>(NUM);` Leave two `std::unique_ptr<float[]> arr(new float[NUM]());` as is, since that statement not only allocates but also initializes the array, see below: `d04b35e7e3/aten/src/ATen/native/cpu/SoftMaxKernel.cpp (L700-L701)` On the other hand, from https://github.com/pytorch/pytorch/pull/60371 it's not at all clear if it needs to be initialized to zero at that point... Pull Request resolved: https://github.com/pytorch/pytorch/pull/110298 Approved by: https://github.com/kit1980	2023-09-29 22:46:30 +00:00
PyTorch MergeBot	b083058e45	Revert "Make unbind() overrideable for NT subclass (#109122 )" This reverts commit `f5a23ca78d`. Reverted https://github.com/pytorch/pytorch/pull/109122 on behalf of https://github.com/PaliC due to breaking slow tests ([comment](https://github.com/pytorch/pytorch/pull/109122#issuecomment-1741555305))	2023-09-29 22:41:56 +00:00
Evgeni Burovski	1e95a1ae8c	MAINT: pytorchify torch._numpy tests: core/ and fft/ (#109815 ) 1. Inherit from TestCase 2. Use pytorch parametrization 3. Use unittest.expectedFailure to mark xfails, also unittest skips All this to make pytest-less invocation work: $ python test/torch_np/test_basic.py cross-ref #109593, #109718, #109775 Pull Request resolved: https://github.com/pytorch/pytorch/pull/109815 Approved by: https://github.com/lezcano	2023-09-29 22:36:13 +00:00
Octavian Guzu	9c7071b0e3	[fuzzing result][fuzz_torch_jit_lite_interpreter] read-heap-use-after-free (size 8) in std::_Function_base::_M_empty() (#110289 ) Summary: This diff fixes a heap UAF found by fuzzing in torch/csrc/jit/mobile/interpreter.cpp Test Plan: CI and ``` arc lionhead crash reproduce 1009060456885023 ``` doesn't crash anymore. Reviewed By: malfet Differential Revision: D49538326 Pull Request resolved: https://github.com/pytorch/pytorch/pull/110289 Approved by: https://github.com/malfet	2023-09-29 22:32:38 +00:00
PyTorch MergeBot	f2d7faf4ba	Revert "MAINT: pytorchify torch._numpy tests: core/ and fft/ (#109815 )" This reverts commit `132a138a01`. Reverted https://github.com/pytorch/pytorch/pull/109815 on behalf of https://github.com/PaliC due to causing various slow tests to fail ([comment](https://github.com/pytorch/pytorch/pull/109815#issuecomment-1741525574))	2023-09-29 21:53:36 +00:00
drisspg	28d69d5256	Adding Backward Support for NestedTensors and FlashAttention (#97485 ) # Summary <!-- copilot:summary --> ### <samp>🤖 Generated by Copilot at 318764f</samp> This pull request implements the CUDA backend of the SDPA kernel for nested tensors, which enables efficient transformer models with variable-length sequences. It adds a new dispatch key, a backward function, a unit test, and some helper functions for the kernel. It modifies `test/test_transformers.py`, `aten/src/ATen/native/native_functions.yaml`, `aten/src/ATen/native/nested/cuda/NestedTensorTransformerFunctionsBackward.cpp`, and `aten/src/ATen/native/nested/cuda/NestedTensorTransformerUtils.h`. <!-- copilot:poem --> ### <samp>🤖 Generated by Copilot at ed4a773</samp> > _Fused kernels of doom, unleash the flash attention_ > _Nested tensors on fire, reshape and pad with caution_ > _Backward pass of power, dispatch the CUDA key_ > _Test the gradients of hell, warn the user if they disagree_ Pull Request resolved: https://github.com/pytorch/pytorch/pull/97485 Approved by: https://github.com/jbschlosser	2023-09-29 21:34:47 +00:00
Avik Chaudhuri	359c2a53f5	dynamic_shapes + retrace exported program (#110276 ) An `ExportedProgram`'s `__call__` signature is different from the original module, so `dynamic_shapes` that follow the original signature would fail when applied to re-export an `ExportedProgram`. This PR fixes this issue, in other words, the original `dynamic_shapes` should now work when re-exporting. Differential Revision: D49764011 Pull Request resolved: https://github.com/pytorch/pytorch/pull/110276 Approved by: https://github.com/tugsbayasgalan	2023-09-29 21:06:46 +00:00
PyTorch MergeBot	c2c7c4035f	Revert "Simplify the conditionals used for learning rate calculation for `ConstantLR` learning rate scheduler (#109785 )" This reverts commit `83283b4f0d`. Reverted https://github.com/pytorch/pytorch/pull/109785 on behalf of https://github.com/PaliC due to causing macos errors as per `83283b4f0d` ([comment](https://github.com/pytorch/pytorch/pull/109785#issuecomment-1741471142))	2023-09-29 20:49:28 +00:00
atalman	b253fc9c93	Revert "[1/N] Dynamo skipfiles refactor (#109567 )" (#110296 ) This reverts commit `84c5435b29`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/110296 Approved by: https://github.com/yanboliang	2023-09-29 20:35:46 +00:00
Peter Bell	bc047ec906	[inductor] Make sure unfuse_addmm and addmm patterns don't overlap (#110235 ) Inductor has two opposing patterns, ``` addmm -> add + mm add + mm -> addmm ``` This uses the `extra_check` to disable the addmm fusion pattern when the heuristic to unfuse add is met, for consistency. Pull Request resolved: https://github.com/pytorch/pytorch/pull/110235 Approved by: https://github.com/lezcano, https://github.com/eellison ghstack dependencies: #110232	2023-09-29 19:35:29 +00:00
Peter Bell	d04b35e7e3	[inductor] Fix bug in input mutation (#107614 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/107614 Approved by: https://github.com/jansel	2023-09-29 18:27:06 +00:00
Sherlock Huang	d7de26804e	[AOTInductor] ProxyExecutor supports List[Tensor] return type (#110182 ) Summary: Support custom ops returns List[Tensor] type, like `"fn_with_list_output(Tensor[] tensors, int i) -> Tensor[]"` As an example `out5, out6 = torch.ops.fb.fn_with_list_output([out3, out4], 1)` got compiled into ``` AtenTensorHandle buf8_handle; // output buffer AOTI_TORCH_ERROR_CODE_CHECK(aoti_torch_new_uninitialized_tensor(&buf8_handle)); RAIIAtenTensorHandle buf8(buf8_handle); AtenTensorHandle buf9_handle; // output buffer AOTI_TORCH_ERROR_CODE_CHECK(aoti_torch_new_uninitialized_tensor(&buf9_handle)); RAIIAtenTensorHandle buf9(buf9_handle); AtenTensorHandle tensor_args_var_5[] = {buf5.get(), buf6.get(), buf8.get(), buf9.get()}; int64_t int_args_var_6[] = {1}; aoti_torch_proxy_executor_call_function(proxy_executor, 2, 1, int_args_var_6, 4, tensor_args_var_5); ``` Test Plan: Test Differential Revision: D49694691 Pull Request resolved: https://github.com/pytorch/pytorch/pull/110182 Approved by: https://github.com/chenyang78	2023-09-29 18:21:48 +00:00
Mu-Chu Lee	d6d3f6cfe5	Add weight update for DSOModel. (#110273 ) Summary: Add weight update for DSOModel and AOTInductorModel Test Plan: buck2 test accelerators/workloads/models/slimdsnn:slimdsnn_dso_test - SlimDSNN.DSO_Update_Constants Reviewed By: mikekgfb Differential Revision: D49748685 Pull Request resolved: https://github.com/pytorch/pytorch/pull/110273 Approved by: https://github.com/hl475	2023-09-29 18:14:01 +00:00
Jaromir Latal	6e2c14a0e8	[Codemod][[codemod] Replace third-party mock with unittest.mock] caffe2/caffe2 (#106541 ) Reviewed By: thechrisu Differential Revision: D47909974 Pull Request resolved: https://github.com/pytorch/pytorch/pull/106541 Approved by: https://github.com/thechrisu	2023-09-29 18:09:49 +00:00
Simon Fan	88ef126a93	rename nanogpt_generate to nanogpt to also support train (#109746 ) Differential Revision: [D49522940](https://our.internmc.facebook.com/intern/diff/D49522940) Pull Request resolved: https://github.com/pytorch/pytorch/pull/109746 Approved by: https://github.com/msaroufim, https://github.com/malfet, https://github.com/xuzhao9	2023-09-29 17:36:48 +00:00
Yang Chen	30759848fa	[inductor] handle non-list/tuple outputs for FallbackKernel (#110145 ) generate_output may return non-list/tuple outputs. Let's force those to be list, because we will enumerate kernel.outputs later in the codegen. Also fixed a minor issue in an assertion message. Pull Request resolved: https://github.com/pytorch/pytorch/pull/110145 Approved by: https://github.com/aakhundov	2023-09-29 17:13:26 +00:00
Catherine Lee	defb364adf	Clean up test_external_module_register (#110254 ) caused by #109866 The test registers new device module, the above pr checks for xpu, sees that it got registered and uses it but its a dummy module. This causes any test after it to fail so I "clean up" the registered module Another possible solution would be to run this test last lol Pull Request resolved: https://github.com/pytorch/pytorch/pull/110254 Approved by: https://github.com/huydhn	2023-09-29 17:02:13 +00:00
Bin Bao	0ff1155d3a	[aotinductor] Refactor test_aot_inductor to take different devices (#110216 ) Summary: Replace hardcoded device to self.device, to make it easier to test both cpu and cuda Pull Request resolved: https://github.com/pytorch/pytorch/pull/110216 Approved by: https://github.com/chenyang78, https://github.com/bertmaher ghstack dependencies: #110215	2023-09-29 16:30:19 +00:00
Bin Bao	ce6d09a775	[aotinductor] Refactor test_aot_inductor (#110215 ) Summary: Remove the usage of output tensors in the test script, since AOTInductor now returns output tensors instead of taking in pre-allocated output tensors. Pull Request resolved: https://github.com/pytorch/pytorch/pull/110215 Approved by: https://github.com/angelayi, https://github.com/chenyang78	2023-09-29 16:30:19 +00:00
Andrei Gheorghe	28f52f2f80	Fix aminmax on CUDA when input shape contains 0 (#107564 ) The CUDA kernel asserts numel() > 0, the CPU kernel doesn't and returns empty values (as expected) Fixes #95349 and #85439 Pull Request resolved: https://github.com/pytorch/pytorch/pull/107564 Approved by: https://github.com/lezcano	2023-09-29 16:18:08 +00:00
Oguz Ulgen	2d50a30d77	[Dynamo] Add native support for Triton Kernels to Dynamo (#109623 ) This PR adds native support to Dynamo to detect Triton kernels and create an FX graph node out of them. AOT eager and inductor modes will be supported in follow-up PRs. Pull Request resolved: https://github.com/pytorch/pytorch/pull/109623 Approved by: https://github.com/jansel	2023-09-29 15:49:18 +00:00
Joel Schlosser	3693777a86	Pickle support for NT (#110219 ) Fixes #104198 Pull Request resolved: https://github.com/pytorch/pytorch/pull/110219 Approved by: https://github.com/cpuhrsch	2023-09-29 15:30:06 +00:00
Jane Xu	c9511e8ac9	[foreach][BE] cleaning up MultiTensorApply.cuh (#110228 ) Followup edits to #109402 as suggested by @r-barnes Pull Request resolved: https://github.com/pytorch/pytorch/pull/110228 Approved by: https://github.com/drisspg	2023-09-29 14:44:48 +00:00
Bert Maher	92f4a7b663	[inductor] Add fbcode include path for cuda (#110240 ) We missed the cuda include, leading to failures in cases where CUDA was not installed locally but only provided via third-party/GVFS. Differential Revision: [D49745585](https://our.internmc.facebook.com/intern/diff/D49745585/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/110240 Approved by: https://github.com/hl475	2023-09-29 13:39:40 +00:00
Peter Bell	758735b739	[dynamo] Convert dtype arguments as well as inputs in `cast_to_fp64` (#110232 ) Generating reference outputs sometimes fails because of type mismatches in the graph, an issue which was noticed previously for `prims.convert_element_type` and fixed in #92036, but the same issue happens with other functions such as tensor constructors. This expands the fix from #92036 to all dtype keyword arguments. Pull Request resolved: https://github.com/pytorch/pytorch/pull/110232 Approved by: https://github.com/ezyang	2023-09-29 12:42:14 +00:00
Rohan Varma	24e5d61af8	Log usage of optimizer in backward (#110206 ) This will allow us to inspect and aggregate jobs that use optimizer in backward Differential Revision: [D48674740](https://our.internmc.facebook.com/intern/diff/D48674740/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/110206 Approved by: https://github.com/awgu	2023-09-29 11:00:07 +00:00
PyTorch UpdateBot	acac92f806	[vision hash update] update the pinned vision hash (#110258 ) This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/main/.github/workflows/_update-commit-hash.yml). Update the pinned vision hash. Pull Request resolved: https://github.com/pytorch/pytorch/pull/110258 Approved by: https://github.com/pytorchbot	2023-09-29 04:17:27 +00:00
ancestor-mithril	d615f0078c	Updating documentation for `PolynomialLR` (#110151 ) Docstring mentions the power parameter is `int`, when it should have been `float`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/110151 Approved by: https://github.com/janeyx99	2023-09-29 03:50:11 +00:00
Zain Rizvi	07ec95b17c	TD: Fix sorting bug for historical correlations heuristic (#110257 ) Fix bug where the historical correlations heuristic currently sorts heuristics in the opposite order, ranking the least relevant tests most highly <!-- copilot:poem --> ### <samp>🤖 Generated by Copilot at 70333d1</samp> > _`test_files` sorted_ > _by ratings, high to low_ > _a faster spring test_ Pull Request resolved: https://github.com/pytorch/pytorch/pull/110257 Approved by: https://github.com/clee2000	2023-09-29 03:29:08 +00:00
cyy	3dc479e70b	[1/N] Apply clang-tidy to c10/test/*cpp (#109278 ) This series of PR enables clang-tidy checks in c10/test. We aim to finally add the path to lintrunner.toml Pull Request resolved: https://github.com/pytorch/pytorch/pull/109278 Approved by: https://github.com/kit1980	2023-09-29 02:20:57 +00:00
jjsjann123	e6b5e0ecc6	removing the functionality of nvfuser python APIs (#110124 ) Removing the functionalities from nvfuser python APIs. Since the use of nvfuser has been deprecated before the last release cut. We are removing torch script support. I'll have the next PR to actually remove the code base. Pull Request resolved: https://github.com/pytorch/pytorch/pull/110124 Approved by: https://github.com/davidberard98	2023-09-29 01:45:00 +00:00
rzou	88de391692	[torch.library] Fix some docstrings (#110214 ) Removed some erroneous colons Test Plan: - code reading Pull Request resolved: https://github.com/pytorch/pytorch/pull/110214 Approved by: https://github.com/ezyang	2023-09-29 01:44:49 +00:00
ancestor-mithril	83283b4f0d	Simplify the conditionals used for learning rate calculation for `ConstantLR` learning rate scheduler (#109785 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/109785 Approved by: https://github.com/janeyx99, https://github.com/kit1980	2023-09-29 01:19:05 +00:00
Jerry Zhang	c9b8e06060	[quant] Enable quantization for wav2letter (#109830 ) Summary: Also added annotation support for conv1d_relu and conv1d in XNNPACKQuantizer, the quantized results still matches fx quant path (didn't quantize conv1d) so tests are not disabled Test Plan: with-proxy buck2 run executorch/examples/quantization:example -- -m=w2l --verify Differential Revision: D49479546 Pull Request resolved: https://github.com/pytorch/pytorch/pull/109830 Approved by: https://github.com/kimishpatel	2023-09-29 00:47:34 +00:00
Animesh Jain	ce8b4f56d8	[dynamo] Dont put nn module guards on torch inbuilt nn modules (#110230 ) This is one way to fix https://github.com/pytorch/pytorch/issues/110048 Looking for feedback. Pull Request resolved: https://github.com/pytorch/pytorch/pull/110230 Approved by: https://github.com/ezyang	2023-09-29 00:43:16 +00:00
chunyuan	20dabea35d	Inductor cpp wrapper: support MkldnnRnnLayer (#107858 ) 1. Directly use the `codegen` function of the parent class which already supported both python and cpp wrapper. 2. The output of the `at::mkldnn_rnn_layer` OP is actually a `std::tuple` `1491bae277/aten/src/ATen/native/mkldnn/RNN.cpp (L218)` Fix the type when calling `MultiOutput`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/107858 Approved by: https://github.com/jgong5, https://github.com/jansel	2023-09-29 00:22:42 +00:00
Edward Z. Yang	d1a13129bb	Add support for item() and nonzero() codegen in Inductor (#109893 ) This is another version of https://github.com/pytorch/pytorch/pull/109262 that I think is more harmonious with inductor design. Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/109893 Approved by: https://github.com/jansel	2023-09-28 23:37:31 +00:00
Jerry Zhang	3de42995e4	[quant][pt2e] Add quant API re-entrant test (#110125 ) Summary: Add the test to make sure we can call the quantize API multiple times Test Plan: python test/test_quantization.py TestQuantizePT2E.test_reentrant Reviewers: Subscribers: Tasks: Tags: Pull Request resolved: https://github.com/pytorch/pytorch/pull/110125 Approved by: https://github.com/kimishpatel ghstack dependencies: #110097	2023-09-28 22:41:59 +00:00
skc7	bbb95878e9	[LLVM] Update apis incompatible with llvm versions in codegen (#110200 ) Opaque pointers support is disabled in llvm 14 and enabled by default from llvm 15 and above. setOpaquePointers api usage is deprecated from llvm 16. Removed this API. Update CreateMalloc and CreateFree apis for latest llvm release. Pull Request resolved: https://github.com/pytorch/pytorch/pull/110200 Approved by: https://github.com/Skylion007	2023-09-28 21:49:30 +00:00
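
The sorting fix described in 07ec95b17c (#110257) comes down to the direction of a ranking: tests should be ordered by rating from high to low. The sketch below is only an illustration of that idea. The function name `rank_tests` and the `ratings` mapping are hypothetical and are not taken from the PyTorch target-determination code.

```python
from typing import Dict, List


def rank_tests(ratings: Dict[str, float]) -> List[str]:
    """Return test file names ordered from highest to lowest rating.

    `ratings` is a hypothetical mapping of test file -> relevance score.
    """
    # reverse=True puts the highest-rated test first. Leaving it out sorts
    # ascending, which would rank the least relevant tests most highly.
    return sorted(ratings, key=lambda name: ratings[name], reverse=True)


# Illustrative scores only; these are not real measurements.
example_ratings = {"test_nn.py": 0.2, "test_ops.py": 0.9, "test_jit.py": 0.5}
ranked = rank_tests(example_ratings)
print(ranked)
```

Sorting on the rating with an explicit `reverse=True` keeps the intended order visible at the call site, which makes an accidental ascending sort easier to spot in review.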


64607 Commits