pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 12:21:27 +01:00

Author	SHA1	Message	Date
Shangdi Yu	3e05a48927	Fix clamp type promotion in inductor decomposition (#154471 ) Summary: as title, the clamp type promotion should take min/max arg into consideration as well. Test Plan: ``` buck run fbcode//caffe2/test/inductor:test_aot_inductor -- -r test_clamp_decomposition_cpu python test/inductor/test_torchinductor.py -k test_clamp -v ``` Differential Revision: D75490124 Pull Request resolved: https://github.com/pytorch/pytorch/pull/154471 Approved by: https://github.com/desertfire, https://github.com/chenyang78	2025-05-28 23:24:25 +00:00
PyTorch MergeBot	d81217be2e	Revert "Improve torch.ops typing (#153558 )" This reverts commit `c5cba39d46`. Reverted https://github.com/pytorch/pytorch/pull/153558 on behalf of https://github.com/yangw-dev due to Your diff will not be landed to fbcode since we suspect it caused the following breakage in an internal test:[D75007157](https://www.internalfb.com/diff/D75007157) for instance: tests_gpu/lookup_gpu_index_test.py:232:8 Undefined attribute [16]: torch._ops._OpNamespace has no attribute simple_index_mm_batch ([comment](https://github.com/pytorch/pytorch/pull/153558#issuecomment-2892506789))	2025-05-19 23:32:36 +00:00
Benjamin Glass	c5cba39d46	Improve torch.ops typing (#153558 ) Fixes longstanding issue where direct references to aten operations are seen as untyped by type checkers. This is accomplished by setting attributes on several classes more consistently, so that `__getattr__` can return a single type in all other cases. Decisions made along the way: 1. `torch.ops.higher_order` is now implemented by a single-purpose class. This was effectively true before, but the class implementing it attempted to be generalized unnecessarily. Fixing this simplified typing for the `_Ops` class. 2. `__getattr__` is only called when all other lookup methods have failed, so several constant special-cases in the function could be implemented as class variables. The remainder of this PR is fixing up all the bugs exposed by the updated typing, as well as all the nitpicky typing issues. Test plan: CI Pull Request resolved: https://github.com/pytorch/pytorch/pull/153558 Approved by: https://github.com/rec, https://github.com/Skylion007, https://github.com/cyyever	2025-05-19 14:52:32 +00:00
Zhang, Jianyi	1bc5762495	[Intel GPU][Inductor] Fallback embedding_dense_backward on XPU (#151637 ) Reopen #146888, now the modification only affects xpu device. We do not want to decompose embedding_dense_backward for torch.compile. Current XPU devices have hardware limitations on atomic ops. Fallback to eager and we can use sort to implement this op. hf_T5 amp bf16 training in torchbench can get 2x improvement on Max 1550. ~~I also align with cuda on gelu decomposition in _addmm_activation~~ Pull Request resolved: https://github.com/pytorch/pytorch/pull/151637 Approved by: https://github.com/guangyey, https://github.com/etaf, https://github.com/jansel, https://github.com/EikanWang	2025-05-19 02:19:37 +00:00
Pian Pawakapan	8ea95d2e73	[inductor] dtype promotion error in cat decomp (#152995 ) cloning single tensor wasn't following dtype promotion rules for SAM model: https://github.com/pytorch/pytorch/issues/152606 Pull Request resolved: https://github.com/pytorch/pytorch/pull/152995 Approved by: https://github.com/yushangdi, https://github.com/eellison	2025-05-09 16:58:58 +00:00
PaulZhang12	84aa0985fb	[Inductor] Add decomposeK as an autotuning choice for mm (#150654 ) As a result of adding subgraph as a choice to inductor https://github.com/pytorch/pytorch/pull/149761 and enabling FP32 output from PyTorch GEMMs from FP16/BF16 inputs: https://github.com/pytorch/pytorch/pull/150812, this PR enables decompose_k as an autotuning choice for Inductor in generating the fastest matmuls with Triton. DecomposeK is currently only enabled for `torch.compile`. Followups: * decompose_k does not currently support epilogue fusion, which will take some work to enable * Enable autotuning the bmm with Triton Templates as well without requiring tons of more compile time, async compilation. Anecdotal evidence shows that Triton BMM performs better usually than aten BMM * Add for addmm * Enable for Inference and AOTI Below are the results of running TritonBench for Split-K shapes, comparing the aten performance versus pt2_triton, which now autotunes on decompose_k, seeing >10% speedup compared to aten on average, and for some shapes over 3x the performance of the best Triton mm previously: <img width="929" alt="Screenshot 2025-04-28 at 9 15 39 PM" src="https://github.com/user-attachments/assets/27d85bbc-4f3a-43a6-a8fa-d4a5bbb8c999" /> TorchInductor Benchmark Dashboard: <img width="1727" alt="Screenshot 2025-04-30 at 2 02 53 PM" src="https://github.com/user-attachments/assets/4acd7ffc-407f-4cfd-98bb-2e3d8b1f00b3" /> We see speedups across all runs for training. Compile time increased as expected, with more `mm` options to tune over. Differential Revision: [D73820115](https://our.internmc.facebook.com/intern/diff/D73820115) Pull Request resolved: https://github.com/pytorch/pytorch/pull/150654 Approved by: https://github.com/eellison	2025-05-03 02:23:54 +00:00
Laith Sakka	376529c78b	consolidate guard_or_x and definitely_x (#152463 ) definitely_true is almost same as guard_or_false, the potential differences are not meaningful to a degree that justify the existence of both. same for definitely_false, it can be expressed with guard_or_true and guard_or_false. Pull Request resolved: https://github.com/pytorch/pytorch/pull/152463 Approved by: https://github.com/bobrenjc93	2025-05-02 18:08:11 +00:00
PyTorch MergeBot	7c3e679ddd	Revert "[Inductor] Add decomposeK as an autotuning choice for mm (#150654 )" This reverts commit `fdcfc6a61a`. Reverted https://github.com/pytorch/pytorch/pull/150654 on behalf of https://github.com/wdvr due to Failing ROCM tests: inductor/test_subgraph_choice.py::TestSubgraphChoice::test_subgraph_decompose_k [GH job link](https://github.com/pytorch/pytorch/actions/runs/14786111108/job/41515742446) [HUD commit link](`3c54e0c216`) ([comment](https://github.com/pytorch/pytorch/pull/150654#issuecomment-2846470409))	2025-05-02 06:31:38 +00:00
PaulZhang12	fdcfc6a61a	[Inductor] Add decomposeK as an autotuning choice for mm (#150654 ) As a result of adding subgraph as a choice to inductor https://github.com/pytorch/pytorch/pull/149761 and enabling FP32 output from PyTorch GEMMs from FP16/BF16 inputs: https://github.com/pytorch/pytorch/pull/150812, this PR enables decompose_k as an autotuning choice for Inductor in generating the fastest matmuls with Triton. DecomposeK is currently only enabled for `torch.compile`. Followups: * decompose_k does not currently support epilogue fusion, which will take some work to enable * Enable autotuning the bmm with Triton Templates as well without requiring tons of more compile time, async compilation. Anecdotal evidence shows that Triton BMM performs better usually than aten BMM * Add for addmm * Enable for Inference and AOTI Below are the results of running TritonBench for Split-K shapes, comparing the aten performance versus pt2_triton, which now autotunes on decompose_k, seeing >10% speedup compared to aten on average, and for some shapes over 3x the performance of the best Triton mm previously: <img width="929" alt="Screenshot 2025-04-28 at 9 15 39 PM" src="https://github.com/user-attachments/assets/27d85bbc-4f3a-43a6-a8fa-d4a5bbb8c999" /> TorchInductor Benchmark Dashboard: <img width="1727" alt="Screenshot 2025-04-30 at 2 02 53 PM" src="https://github.com/user-attachments/assets/4acd7ffc-407f-4cfd-98bb-2e3d8b1f00b3" /> We see speedups across all runs for training. Compile time increased as expected, with more `mm` options to tune over. Differential Revision: [D73820115](https://our.internmc.facebook.com/intern/diff/D73820115) Pull Request resolved: https://github.com/pytorch/pytorch/pull/150654 Approved by: https://github.com/eellison	2025-05-01 23:01:30 +00:00
Laith Sakka	cbf8e0fb1a	use statically known true instead of guard size oblivious in bmm and mm inductor decompositions . (#148893 ) this was discussed with @eellison and he recommended using statically_known_true here, the intuition is. We already have 0/1 specializations in place, if we reach those checks with dynamic shapes that are not already specialized then we do not want them to specialize them, "a recompilation here is not justified". Those are all non-semantic changing optimizations. Pull Request resolved: https://github.com/pytorch/pytorch/pull/148893 Approved by: https://github.com/eellison	2025-04-28 16:44:25 +00:00
Manuel Candales	f38566dfe4	[MPSInductor] Disable mm/bmm decompositions (#150541 ) Disables mm/bmm decompositions. torch.compile on MPS was speeding up stories15M (~4x) but it was making stories110M much slower. Self-contained reproducer to demonstrate the difference (before the change, after it should be identical) ```python import torch import timeit def bench_mm(f, x, y): from torch.utils.benchmark import Timer return Timer(stmt="f(x, y); torch.mps.synchronize()", globals={"x": x, "y": y, "f": f}, language="python", timer=timeit.default_timer).blocked_autorange() x = torch.rand(1024, 512, device='mps') y = torch.rand(512, 1, device='mps') mm_c = torch.compile(torch.mm, options={"coordinate_descent_tuning": False}) mm_c_cdt = torch.compile(torch.mm, options={"coordinate_descent_tuning": True}) print(f"Compiled torch.mm perf (with cdt disabled) for 1024x512 and 512x1 matrices are {bench_mm(mm_c, x, y).median}") print(f"Compiled torch.mm perf (with cdt enabled) for 1024x512 and 512x1 matrices are {bench_mm(mm_c_cdt, x, y).median}") ``` Disabling the inductor mm decomposition, speeds up stories15M further (~6x) and speeds up stories110M (~7x) The table below show average tokens/sec across 5 runs on M1 Pro for stories15M and stories110M: \| \| stories15M \| stories110M \| \|------------------------\|------------\|-------------\| \| without compile \| 99.40 \| 53.11 \| \| compile before change \| 367.68 \| 19.43 \| \| compile after change \| 582.96 \| 355.07 \| stories110M (without compile) ``` (gptfast) mcandales@mcandales-mbp gpt-fast % python generate.py --checkpoint_path checkpoints/stories110M/stories110M.pt --prompt "Once upon a time" --device mps [...] Average tokens/sec: 53.11 ``` stories110M (compile before change) ``` (gptfast) mcandales@mcandales-mbp gpt-fast % python generate.py --checkpoint_path checkpoints/stories110M/stories110M.pt --prompt "Once upon a time" --device mps --compile [...] Average tokens/sec: 19.43 ``` stories110M (compile after change) ``` (gptfast) mcandales@mcandales-mbp gpt-fast % python generate.py --checkpoint_path checkpoints/stories110M/stories110M.pt --prompt "Once upon a time" --device mps --compile [...] Average tokens/sec: 355.07 ``` stories15M (without compile) ``` (gptfast) mcandales@mcandales-mbp gpt-fast % python generate.py --checkpoint_path checkpoints/stories110M/stories110M.pt --prompt "Once upon a time" --device mps [...] Average tokens/sec: 99.40 ``` stories15M (compile before change) ``` (gptfast) mcandales@mcandales-mbp gpt-fast % python generate.py --checkpoint_path checkpoints/stories110M/stories110M.pt --prompt "Once upon a time" --device mps --compile [...] Average tokens/sec: 367.68 ``` stories15M (compile after change) ``` (gptfast) mcandales@mcandales-mbp gpt-fast % python generate.py --checkpoint_path checkpoints/stories110M/stories110M.pt --prompt "Once upon a time" --device mps --compile [...] Average tokens/sec: 582.96 ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/150541 Approved by: https://github.com/malfet	2025-04-02 16:07:18 +00:00
Isuru Fernando	82ceebce58	[inductor] Lowerings for max_pool3d (#148210 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/148210 Approved by: https://github.com/eellison	2025-04-02 14:13:01 +00:00
Scott Wolchok	dc39e673e2	Remove aten.elu core ATen decomp because it is now core ATen (#149780 ) Per @larryliu0820. Pull Request resolved: https://github.com/pytorch/pytorch/pull/149780 Approved by: https://github.com/larryliu0820	2025-03-25 01:59:57 +00:00
Isuru Fernando	66b0a0b61a	[inductor] support dilation in max_pool2d lowering (#148209 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/148209 Approved by: https://github.com/eellison	2025-03-24 13:00:12 +00:00
Xuehai Pan	1cb4e2df65	[BE][PYFMT] migrate PYFMT for `torch._inductor` to `ruff format` (#144550 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/144550 Approved by: https://github.com/jansel	2025-02-28 13:33:19 +00:00
Aaron Orenstein	893ca1dfe1	PEP585 update - torch/_inductor/[_-i]* (#145137 ) See #145101 for details. Pull Request resolved: https://github.com/pytorch/pytorch/pull/145137 Approved by: https://github.com/bobrenjc93	2025-01-19 01:22:47 +00:00
Tom Ritchford	46fbd63405	Fix unbind_copy and add its decomposition (#134319 ) * Fixes https://github.com/pytorch/pytorch/issues/130829 Pull Request resolved: https://github.com/pytorch/pytorch/pull/134319 Approved by: https://github.com/amjames, https://github.com/eellison	2025-01-17 18:21:22 +00:00
bobrenjc93	a3ab27b8e0	Migrate from Tuple -> tuple in torch/_inductor (#144264 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/144264 Approved by: https://github.com/eellison	2025-01-07 03:27:27 +00:00
Aaron Orenstein	45ef3309e3	[BE] typing for decorators (#144161 ) Summary: Untyped decorators strip annotations from the decorated items. - _compile - _inductor/fx_passes/post_grad - _inductor/lowering - _library/custom_ops - _meta_registrations - _ops - _refs/nn/functional - ao/quantization/quantizer/xnnpack_quantizer_utils - distributed/_composable/contract - fx/experimental/graph_gradual_typechecker - fx/experimental/migrate_gradual_types/constraint_generator - optim/optimizer - signal/windows/windows - testing/_internal/common_device_type - torch/_inductor/decomposition - utils/flop_counter Test Plan: unit tests Differential Revision: D62302684 Pull Request resolved: https://github.com/pytorch/pytorch/pull/144161 Approved by: https://github.com/Skylion007, https://github.com/albanD	2025-01-04 16:40:09 +00:00
Michael Lazos	8960cb5809	Add support for bfloat16 atomic adds in fbcode (#143629 ) Reland https://github.com/pytorch/pytorch/pull/141857 and fallback on A100 which doesn't have bfloat16 atomic add instrs. Pull Request resolved: https://github.com/pytorch/pytorch/pull/143629 Approved by: https://github.com/eellison	2024-12-20 23:05:13 +00:00
Michael Lazos	b4e0e3bfa3	Backout D66648013 (#143433 ) Summary: backing out https://www.internalfb.com/diff/D66648013 (see comments there for justification) I will reland and disallow the bfloat16 atomics behavior on A100 because it causes a pretty significant performance regression. Test Plan: This is a revert Differential Revision: D67357485 Pull Request resolved: https://github.com/pytorch/pytorch/pull/143433 Approved by: https://github.com/davidberard98	2024-12-19 00:53:49 +00:00
Michael Lazos	a3abe1a5ae	Add support for bfloat16 atomic adds in fbcode (#141857 ) This adds support for bfloat16 atomic add in fbcode (OSS will have to wait until those changes are upstreamed to triton) Originally I attempted to write inline asm, but the triton API was not flexible enough to support this use case. In the long run the right answer is to implement this properly in OSS triton. relevant issues: * https://github.com/pytorch/pytorch/issues/137425 in fbcode only * https://github.com/pytorch/pytorch/issues/97016 Pull Request resolved: https://github.com/pytorch/pytorch/pull/141857 Approved by: https://github.com/eellison	2024-12-10 11:40:15 +00:00
IvanKobzarev	f85e238186	[aotd] capture rrelu_with_noise noise mutation in compile (#141867 ) Rebase-copy of long standing already approved PR https://github.com/pytorch/pytorch/pull/138503 that was blocked on landing by xla build issues. Got a new PR with the same content (ghstack checkout was failing due to changed submodules) Corresponding xla PR: https://github.com/pytorch/xla/pull/8363 Pull Request resolved: https://github.com/pytorch/pytorch/pull/141867 Approved by: https://github.com/bdhirsh	2024-12-04 12:18:58 +00:00
Chien-Lin Chen	161425ff9f	Added aten.bernoulli.p and aten.bernoulli.default decompositions (#139141 ) Fixes #105519 Added aten.bernoulli.p decomposition and moved/rewrote aten.bernoulli.deafult to make them included in core aten decomposition. Tested the sample code in [105519](https://github.com/pytorch/pytorch/issues/105519), torch.bernoulli could be decomposed by the code snippet. Pull Request resolved: https://github.com/pytorch/pytorch/pull/139141 Approved by: https://github.com/eellison	2024-11-20 19:52:57 +00:00
eellison	34e420519d	[Reland] dont decompose baddbmm (#141045 ) Previously the decomposition would upcasts inputs to fp32. This led to a slowdown compared to eager which would run in fp16. We also tried keeping the bmm in fp16, and the upcasting for the epilogue but that led to worse numerics because the bmm in eager would do the epilogue all in fp32 without a downcast in the bmm accumulator. Fix for https://github.com/pytorch/pytorch/issues/137897 Reland of https://github.com/pytorch/pytorch/pull/137904 Pull Request resolved: https://github.com/pytorch/pytorch/pull/141045 Approved by: https://github.com/BoyuanFeng	2024-11-19 21:07:58 +00:00
Masaki Kozuki	6a368b3fc5	Add ScalarList overload to `_foreach_lerp` (#134482 ) Related: - https://github.com/pytorch/pytorch/issues/133367 Pull Request resolved: https://github.com/pytorch/pytorch/pull/134482 Approved by: https://github.com/janeyx99	2024-11-12 19:03:41 +00:00
leslie-fang-intel	d84a344410	[Inductor] Skip coordinate_descent_tuning for mm/bmm decomposition on CPU (#139537 ) Summary Fix issue: https://github.com/pytorch/pytorch/issues/138823, `coordinate_descent_tuning` doesn't benefit on CPU and prefer lowering `mm`/`bmm` into ATEN kernels or CPP GEMM Template. Test Plan ``` python -u -m pytest -s -v test/inductor/test_cpu_select_algorithm.py -k test_cpp_coordinate_descent_tuning ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/139537 Approved by: https://github.com/jansel	2024-11-03 10:10:29 +00:00
PyTorch MergeBot	38645e8a3e	Revert "Fix unbind_copy and add its decomposition (#134319 )" This reverts commit `8aedc649bd`. Reverted https://github.com/pytorch/pytorch/pull/134319 on behalf of https://github.com/huydhn due to Sorry for reverting your PR, but this is still failing the same test on ExecuTorch ([comment](https://github.com/pytorch/pytorch/pull/134319#issuecomment-2443209139))	2024-10-29 04:54:37 +00:00
PyTorch MergeBot	6aef58a249	Revert "Dont decompose aten.baddmm in inductor (#137904 )" This reverts commit `c066f4a055`. Reverted https://github.com/pytorch/pytorch/pull/137904 on behalf of https://github.com/huydhn due to Sorry for reverting your change but I think the test is failing in trunk, maybe a landrace? ([comment](https://github.com/pytorch/pytorch/pull/137904#issuecomment-2443158194))	2024-10-29 04:08:11 +00:00
eellison	c066f4a055	Dont decompose aten.baddmm in inductor (#137904 ) Previously the decomposition would upcasts inputs to fp32. This led to a slowdown compared to eager which would run in fp16. We also tried keeping the bmm in fp16, and the upcasting for the epilogue but that led to worse numerics because the bmm in eager would do the epilogue all in fp32 without a downcast in the bmm accumulator. Fix for https://github.com/pytorch/pytorch/issues/137897 Pull Request resolved: https://github.com/pytorch/pytorch/pull/137904 Approved by: https://github.com/ngimel	2024-10-29 00:54:29 +00:00
Tom Ritchford	8aedc649bd	Fix unbind_copy and add its decomposition (#134319 ) * Fixes https://github.com/pytorch/pytorch/issues/130829 Pull Request resolved: https://github.com/pytorch/pytorch/pull/134319 Approved by: https://github.com/amjames, https://github.com/eellison	2024-10-23 19:13:44 +00:00
Tom Ritchford	1bc73f3157	Add decomposition for permute_copy (#130944 ) * Extracted from #129476 Pull Request resolved: https://github.com/pytorch/pytorch/pull/130944 Approved by: https://github.com/amjames, https://github.com/eellison	2024-10-23 17:42:11 +00:00
PyTorch MergeBot	af306a392c	Revert "Dont decompose aten.baddmm in inductor (#137904 )" This reverts commit `7a117f3b3e`. Reverted https://github.com/pytorch/pytorch/pull/137904 on behalf of https://github.com/clee2000 due to unfortunately the failures on the previous import are still present on the current one D64568703 ([comment](https://github.com/pytorch/pytorch/pull/137904#issuecomment-2422789143))	2024-10-18 16:01:01 +00:00
eellison	7a117f3b3e	Dont decompose aten.baddmm in inductor (#137904 ) Previously the decomposition would upcasts inputs to fp32. This led to a slowdown compared to eager which would run in fp16. We also tried keeping the bmm in fp16, and the upcasting for the epilogue but that led to worse numerics because the bmm in eager would do the epilogue all in fp32 without a downcast in the bmm accumulator. Fix for https://github.com/pytorch/pytorch/issues/137897 Pull Request resolved: https://github.com/pytorch/pytorch/pull/137904 Approved by: https://github.com/ngimel	2024-10-17 19:24:54 +00:00
PyTorch MergeBot	5254a0d383	Revert "Dont decompose aten.baddmm in inductor (#137904 )" This reverts commit `cef6c3dcb0`. Reverted https://github.com/pytorch/pytorch/pull/137904 on behalf of https://github.com/clee2000 due to failing internal tests D64418200, some results not within tolerance? ([comment](https://github.com/pytorch/pytorch/pull/137904#issuecomment-2418122735))	2024-10-16 23:16:44 +00:00
eellison	cef6c3dcb0	Dont decompose aten.baddmm in inductor (#137904 ) Previously the decomposition would upcasts inputs to fp32. This led to a slowdown compared to eager which would run in fp16. We also tried keeping the bmm in fp16, and the upcasting for the epilogue but that led to worse numerics because the bmm in eager would do the epilogue all in fp32 without a downcast in the bmm accumulator. Fix for https://github.com/pytorch/pytorch/issues/137897 Pull Request resolved: https://github.com/pytorch/pytorch/pull/137904 Approved by: https://github.com/ngimel	2024-10-15 14:54:56 +00:00
Benjamin Glass	a968576777	Add lowering for aten.searchsorted (#135701 ) Adds lowering for `aten.searchsorted`. This entails: 1. Adding support for multi-dimensional bucket tensors to `ops.bucketize`. 2. Adding support for striding to `ops.bucketize`. 3. Adding support for sorting tensors to `ops.bucketize`. 4. Adding a lowering for `aten.searchsorted.Tensor`. 5. Adding a basic decomposition for `aten.searchsorted.Scalar` that calls into the lowering for tensors. 6. Updating the meta-function for `aten.searchsorted` to properly check some of the sizing conditions. Closes #135873 Differential Revision: [D63766514](https://our.internmc.facebook.com/intern/diff/D63766514) Pull Request resolved: https://github.com/pytorch/pytorch/pull/135701 Approved by: https://github.com/amjames, https://github.com/eellison, https://github.com/davidberard98	2024-10-04 19:26:05 +00:00
Isuru Fernando	ef6fd3d780	Fix adaptive_max_pool2d fallback (#136367 ) Fixes #136332 Pull Request resolved: https://github.com/pytorch/pytorch/pull/136367 Approved by: https://github.com/amjames, https://github.com/eellison	2024-10-01 16:20:34 +00:00
Huamin Li	fd494dd426	Change wrapped_linear_prepack and wrapped_quantized_linear_prepacked to private by adding _ as prefix (#135401 ) Summary: In https://github.com/pytorch/pytorch/pull/134232, we added two new ops wrapped_linear_prepack and wrapped_quantized_linear_prepacked. From the review comments and offline discussion, we are changing them to private by adding `_` as prefix Differential Revision: D62325142 Pull Request resolved: https://github.com/pytorch/pytorch/pull/135401 Approved by: https://github.com/houseroad	2024-09-08 04:16:24 +00:00
chilli	23a2161ad1	Changed addmv to be a decomposition and not a fallback (#134823 ) Overall seems to be faster ![image](https://github.com/user-attachments/assets/0cbea76e-fb78-4634-9265-047de0291549) Pull Request resolved: https://github.com/pytorch/pytorch/pull/134823 Approved by: https://github.com/jansel ghstack dependencies: #134813, #134818, #134819	2024-09-03 06:33:31 +00:00
Huamin Li	ccafc93be5	[AOTI][CPU] Make int8 qlinear work (#134368 ) Summary: This diff will decompose torch.ops._quantized.wrapped_quantized_linear into torch.ops._quantized.wrapped_linear_prepack and torch.ops._quantized.wrapped_quantized_linear_prepacked for AOTI, and added the corresponding impl into shim The way it works will be similar to what we did previously for fbgemm fp16 dynamic qlinear. We will do constant folding for packed weight during runtime (warm up) to achieve the speed up Reviewed By: desertfire Differential Revision: D61396144 Pull Request resolved: https://github.com/pytorch/pytorch/pull/134368 Approved by: https://github.com/houseroad	2024-08-24 08:25:25 +00:00
eellison	baa4c9ca46	Optimize aten.cat calls of a repeated element (#132081 ) This was a particular problem for a model I saw which would have a large number of repeats, making compilation slow. Pull Request resolved: https://github.com/pytorch/pytorch/pull/132081 Approved by: https://github.com/shunting314	2024-07-30 02:56:00 +00:00
Tom Ritchford	962f248437	Add decomposition for expand_copy (#130940 ) * Extracted from #129476 Pull Request resolved: https://github.com/pytorch/pytorch/pull/130940 Approved by: https://github.com/peterbell10	2024-07-29 16:23:56 +00:00
Adnan Akhundov	33069630ce	[inductor] Add type hints to functions in decompositions.py (#131780 ) Summary: ATT Test Plan: lintrunner Reviewers: Subscribers: Tasks: Tags: Pull Request resolved: https://github.com/pytorch/pytorch/pull/131780 Approved by: https://github.com/eellison	2024-07-26 04:50:23 +00:00
Aaron Orenstein	5a0068cc69	[BE] mypy: disallow untyped decorators (#131428 ) Untyped decorators strip the types from their decorated function so even if the underlying function is fully typed then callers to it don't get any benefit from type annotations. Step 1 - Enable the error and override in all the offending files. #131429 Pull Request resolved: https://github.com/pytorch/pytorch/pull/131428 Approved by: https://github.com/justinchuby, https://github.com/oulgen	2024-07-23 21:50:55 +00:00
Xuehai Pan	b6d477fd56	[BE][Easy][16/19] enforce style for empty lines in import segments in `torch/_i*/` (#129768 ) See https://github.com/pytorch/pytorch/pull/129751#issue-2380881501. Most changes are auto-generated by linter. You can review these PRs via: ```bash git diff --ignore-all-space --ignore-blank-lines HEAD~1 ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/129768 Approved by: https://github.com/jansel	2024-07-20 16:20:58 +00:00
eellison	67c6941b4e	Update torch.cat decomp for 0-dim (#130763 ) Fix for https://github.com/pytorch/pytorch/issues/130615 Pull Request resolved: https://github.com/pytorch/pytorch/pull/130763 Approved by: https://github.com/Skylion007, https://github.com/mlazos	2024-07-16 13:34:01 +00:00
PyTorch MergeBot	a2f630a9a4	Revert "Decompose expand_copy and permute_copy (#129476 )" This reverts commit `7d4cb21098`. Reverted https://github.com/pytorch/pytorch/pull/129476 on behalf of https://github.com/izaitsevfb due to depends on #128416 which needs to be reverted ([comment](https://github.com/pytorch/pytorch/pull/129476#issuecomment-2224019720))	2024-07-11 22:06:15 +00:00
Tom Ritchford	7d4cb21098	Decompose expand_copy and permute_copy (#129476 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/129476 Approved by: https://github.com/amjames, https://github.com/lezcano	2024-07-10 17:12:01 +00:00
Isuru Fernando	c12a4f2e65	Add decomposition for slice_scatter (#123744 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/123744 Approved by: https://github.com/peterbell10	2024-06-28 17:02:10 +00:00

1 2 3 4 5

247 Commits