pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 00:21:07 +01:00

Author	SHA1	Message	Date
Jez Ng	178ce1433c	Hoist out auxiliary values in optional-typed arguments (#123613) This fixes #123176, and partially addresses #121814 too. #123176 uses an optional device arg while #121814 uses an optional list arg. For optional arguments that have auxiliary info -- specifically, tuples / lists with their length parameter, and device types with their device index -- we need to hoist out the extra argument. E.g. when passing a device with ID 1, we want to emit ``` auto var_0 = cached_torch_device_type_cpu; aoti_torch_foo(..., &var_0, 1); ``` instead of the (syntactically incorrect) ``` auto var_0 = cached_torch_device_type_cpu,1; aoti_torch_foo(..., &var_0); ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/123613 Approved by: https://github.com/desertfire	2024-04-09 20:17:35 +00:00
Jez Ng	1b9eebb6bb	[AOTI] Handle null outputs (#123460) Summary: I skipped over the codegen for output handle assignment if the outputs are null -- in addition to being redundant, it was causing compile errors. I also modified the runtime to do the necessary null checks. Fixes #123173. Pull Request resolved: https://github.com/pytorch/pytorch/pull/123460 Approved by: https://github.com/chenyang78, https://github.com/desertfire	2024-04-08 23:07:03 +00:00
Adnan Akhundov	63c221b7fa	Clone mutated inputs in first pass of CPP wrapper compilation (#123316) Summary: CPP wrapper compilation is currently done in two passes. In the first pass, a Python wrapper is generated and run to compile Triton kernels as a side effect; in the second pass, a C++ wrapper is generated and compiled. When model inputs are mutated, running the Python wrapper in the first pass mutates the example inputs. However, the first pass (including the Python wrapper run) is strictly a part of the compilation process, so it must not introduce any side effects on the example inputs. In this PR, we clone mutated inputs in the first pass to avoid input mutation. Fixes https://github.com/pytorch/pytorch/issues/117364. Test Plan: ``` $ TORCHINDUCTOR_CPP_WRAPPER=1 python test/inductor/test_torchinductor.py -k test_inductor_layout_optimization_input_mutations_cuda ... . ---------------------------------------------------------------------- Ran 1 test in 6.368s OK ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/123316 Approved by: https://github.com/jansel, https://github.com/chenyang78, https://github.com/desertfire	2024-04-05 21:47:19 +00:00
Bin Bao	aa063054ce	[AOTI] Fix the codegen for aten.randint.low_out (#123346) Summary: Fixing https://github.com/pytorch/pytorch/issues/123174. There are two problems here: (1) convert_arrayref_tensor_to_tensor is incorrectly called on int arguments. The fix removes the relevant code, since we don't use ArrayRef when there is a fallback op. (2) codegen_kwargs generates an argument for the out parameter of ExternKernelOut. The fix is to leave that logic to the corresponding wrapper codegen. Pull Request resolved: https://github.com/pytorch/pytorch/pull/123346 Approved by: https://github.com/chenyang78	2024-04-04 23:23:50 +00:00
Bin Bao	0c6e8af257	[AOTI][refactor] Update some test cases (#123093) Pull Request resolved: https://github.com/pytorch/pytorch/pull/123093 Approved by: https://github.com/Skylion007, https://github.com/chenyang78	2024-04-03 00:51:11 +00:00
chunyuan	8b7da5b791	Inductor cpp wrapper: fix dtype of ShapeAsConstantBuffer (#122297) For `at::scalar_tensor` the default dtype will be `float` (see scalar_tensor at `aten/src/ATen/native/TensorFactories.cpp`, L856, and the default dtype at `c10/core/TensorOptions.h`, L551, both at commit `0d8e960f74`) if we don't set the `dtype` value. However, the input scalar value is not necessarily a `float` value. With `torch::tensor(x)`, the dtype of the tensor will be decided according to the dtype of the scalar. Pull Request resolved: https://github.com/pytorch/pytorch/pull/122297 Approved by: https://github.com/jgong5, https://github.com/desertfire	2024-04-01 01:32:41 +00:00
Bin Bao	537cd66e73	[Inductor] Support custom op in JIT with cpp wrapper (#122554) Summary: Calling custom ops in an ABI-compatible way requires doing a boxed call with varargs across the C shim. In JIT mode, we can get around it by calling into Python. https://gist.github.com/desertfire/be2a65b0a9b47780bb716b53ac2cd2b3 is an example of generated code. Differential Revision: [D55326556](https://our.internmc.facebook.com/intern/diff/D55326556) Pull Request resolved: https://github.com/pytorch/pytorch/pull/122554 Approved by: https://github.com/jansel, https://github.com/chenyang78	2024-03-26 18:48:45 +00:00
Sam Larsen	535bc71d03	Enable FX graph caching in another batch of inductor tests (#121697) Pull Request resolved: https://github.com/pytorch/pytorch/pull/121697 Approved by: https://github.com/eellison	2024-03-15 19:38:51 +00:00
Bin Bao	818b14025a	[AOTI][refactor] Remove is_legacy_abi_kernel and abi_compatible_kernel (#121523) Summary: is_legacy_abi_kernel was used for _scaled_dot_product_flash_attention fallback. It is only needed for C shim kernel name matching now, and the name matching is done with a direct string comparison. Also consolidate the fallback cpp kernel naming logic in CppWrapperCpu. Differential Revision: [D54727789](https://our.internmc.facebook.com/intern/diff/D54727789) Pull Request resolved: https://github.com/pytorch/pytorch/pull/121523 Approved by: https://github.com/chenyang78	2024-03-14 22:05:38 +00:00
Bin Bao	0339f1ca82	[Inductor] Allocate another shard for testing cpp-wrapper JIT (#121310) Summary: The ABI-compatible mode for cpp wrapper has not been turned on by default, so test it separately. Expect more tests to be added to the shard. Differential Revision: [D54617287](https://our.internmc.facebook.com/intern/diff/D54617287) Pull Request resolved: https://github.com/pytorch/pytorch/pull/121310 Approved by: https://github.com/chenyang78 ghstack dependencies: #121309	2024-03-07 14:24:21 +00:00
Xia, Weiwen	83d848e1c7	[Quant][Inductor] Enable lowering of dynamic qlinear for X86Inductor (#120605) Description: Enable lowering of dynamic qlinear for X86Inductor. The pattern is `choose_qparams -> getitem -> q -> dq -> linear`. We only fuse `dq -> linear` and get `choose_qparams -> getitem -> q -> onednn.qlinear_pointwise`. So, we treat it as dynamic quantization of activation + static quantized linear. The previous implementation of `onednn.qlinear_pointwise` is for the case where `x_scale` and `x_zp` are scalars. Since `choose_qparams` returns tensors, we added a variation `onednn.qlinear_pointwise.tensor` to support the case. This feature targets the PyTorch 2.3 release. Test plan: ``` python inductor/test_mkldnn_pattern_matcher.py -k test_dynamic_qlinear_cpu python inductor/test_mkldnn_pattern_matcher.py -k test_dynamic_qlinear_qat_cpu python inductor/test_cpu_cpp_wrapper.py -k test_dynamic_qlinear ``` Performance before and after lowering `choose_qparams` to Inductor. Before: latency for shape (32, 32) = 0.151 ms; latency for shape (128, 128) = 0.153 ms; latency for shape (1024, 1024) = 0.247 ms. After: latency for shape (32, 32) = 0.049 ms; latency for shape (128, 128) = 0.052 ms; latency for shape (1024, 1024) = 0.133 ms. Test method: A module with a single Linear layer, dynamic-quantize, lower to X86Inductor. Test env & config: Intel(R) Xeon(R) Platinum 8358 CPU @ 2.60GHz, single instance, single core, using Intel OpenMP and Tcmalloc. Pull Request resolved: https://github.com/pytorch/pytorch/pull/120605 Approved by: https://github.com/leslie-fang-intel, https://github.com/jgong5, https://github.com/jerryzh168	2024-03-02 05:11:17 +00:00
Bin Bao	946ea47a4f	[inductor] Fix an internal test issue (#118903) Summary: test_add_complex4, which was introduced in https://github.com/pytorch/pytorch/pull/117929, fails internally because of a cpp compilation issue for cpu. Specify the right device in the test instead. Differential Revision: [D53333919](https://our.internmc.facebook.com/intern/diff/D53333919) Pull Request resolved: https://github.com/pytorch/pytorch/pull/118903 Approved by: https://github.com/clee2000	2024-02-02 03:18:12 +00:00
hodavand	8026534a2f	Add torch.complex128 and torch.complex32 to DTYPE_TO_ATEN dictionary. (#117929) Fixes #117370 Pull Request resolved: https://github.com/pytorch/pytorch/pull/117929 Approved by: https://github.com/Skylion007, https://github.com/desertfire	2024-01-31 19:34:58 +00:00
chunyuan	1ae39a372e	Inductor cpp wrapper: fix cumsum codegen (#116171) Fixes https://github.com/pytorch/pytorch/issues/115829 For `cumsum(Tensor self, int dim, *, ScalarType? dtype=None) -> Tensor`, `dim` is not a `kwarg_only` argument, but it could be provided as a kwarg when calling this op. Pull Request resolved: https://github.com/pytorch/pytorch/pull/116171 Approved by: https://github.com/jgong5, https://github.com/desertfire, https://github.com/jansel	2024-01-03 05:33:17 +00:00
Bin Bao	a81edf9f23	[inductor] Fix cpp_wrapper codegen for ir.ComplexView (#116481) Pull Request resolved: https://github.com/pytorch/pytorch/pull/116481 Approved by: https://github.com/htyu	2024-01-02 05:38:58 +00:00
Bin Bao	a597a00c87	[AOTI][refactor][3/n] Declare python_kernel_name and cpp_kernel_name in ExternKernel (#115972) Summary: Both ExternKernelAlloc and ExternKernelOut need the two fields, so declare them in the base class. Also add cpp codegen for IndexPutFallback and InplaceBernoulliFallback in this PR. This is a reland of https://github.com/pytorch/pytorch/pull/115831 Differential Revision: [D52290900](https://our.internmc.facebook.com/intern/diff/D52290900) Pull Request resolved: https://github.com/pytorch/pytorch/pull/115972 Approved by: https://github.com/chenyang78	2023-12-20 03:22:03 +00:00
Jiong Gong	715d663794	[inductor] split test_cpp_wrapper.py into cpu and cuda test files (#115479) Pull Request resolved: https://github.com/pytorch/pytorch/pull/115479 Approved by: https://github.com/atalman ghstack dependencies: #115167	2023-12-15 21:21:10 +00:00
PyTorch MergeBot	66994bca5f	Revert "[inductor] split test_cpp_wrapper.py into cpu and cuda test files (#115479)" This reverts commit `653acd8fe1`. Reverted https://github.com/pytorch/pytorch/pull/115479 on behalf of https://github.com/desertfire because it will cause a land race in fbcode, as https://github.com/pytorch/pytorch/pull/115831 has already landed internally ([comment](https://github.com/pytorch/pytorch/pull/115479#issuecomment-1857979948))	2023-12-15 14:35:40 +00:00
Jiong Gong	653acd8fe1	[inductor] split test_cpp_wrapper.py into cpu and cuda test files (#115479) Pull Request resolved: https://github.com/pytorch/pytorch/pull/115479 Approved by: https://github.com/atalman ghstack dependencies: #115167	2023-12-15 04:04:08 +00:00
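
The hoisting described in commit 178ce1433c (#123613) can be illustrated with a toy code generator. This is a minimal sketch, not Inductor's actual implementation: `emit_optional_device_arg` and `emit_call` are hypothetical helper names, and only the emitted C++ strings (`cached_torch_device_type_cpu`, `aoti_torch_foo`) are taken from the commit message.

```python
from typing import List, Tuple


def emit_optional_device_arg(
    var_name: str, device_type: str, device_index: int
) -> Tuple[str, List[str]]:
    """Hypothetical helper: split a device into a declaration and call args.

    Only the device type is stored in the temporary variable. The device
    index (the auxiliary value) is hoisted out and passed as its own
    argument next to the pointer to that variable.
    """
    decl = f"auto {var_name} = cached_torch_device_type_{device_type};"
    call_args = [f"&{var_name}", str(device_index)]
    return decl, call_args


def emit_call(kernel: str, args: List[str]) -> str:
    """Hypothetical helper: format a C++ call statement from argument strings."""
    return f"{kernel}({', '.join(args)});"


if __name__ == "__main__":
    decl, device_args = emit_optional_device_arg("var_0", "cpu", 1)
    print(decl)
    # "..." stands for the preceding arguments, as in the commit message.
    print(emit_call("aoti_torch_foo", ["..."] + device_args))
```

Keeping the index out of the declaration is what avoids the syntactically incorrect `auto var_0 = cached_torch_device_type_cpu,1;` form quoted in the commit message.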

19 Commits
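
The input-cloning fix from commit 63c221b7fa (#123316) follows a simple pattern, sketched here in plain Python with lists standing in for tensors. The names `python_wrapper`, `first_pass`, and `mutated_indices` are hypothetical and are not Inductor APIs; with real tensors, the copy would presumably be made with `Tensor.clone()` rather than `copy.deepcopy`.

```python
import copy
from typing import Callable, List, Set


def python_wrapper(inputs: List[List[float]]) -> List[float]:
    """Stand-in for a generated wrapper that mutates its first input in place."""
    inputs[0][0] += 1.0
    return [sum(inputs[0]) + sum(inputs[1])]


def first_pass(
    wrapper: Callable[[List[List[float]]], List[float]],
    example_inputs: List[List[float]],
    mutated_indices: Set[int],
) -> None:
    """Run the wrapper only for its compilation side effects.

    Inputs listed in `mutated_indices` are cloned first, so the caller's
    example inputs are left untouched by the run.
    """
    safe_inputs = [
        copy.deepcopy(x) if i in mutated_indices else x
        for i, x in enumerate(example_inputs)
    ]
    wrapper(safe_inputs)


if __name__ == "__main__":
    example = [[1.0, 2.0], [3.0]]
    first_pass(python_wrapper, example, mutated_indices={0})
    print(example)  # the example inputs are unchanged after the first pass
```

Only the inputs known to be mutated are copied, which keeps the extra memory cost of the first pass proportional to the mutated inputs rather than to all inputs.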