pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 12:21:27 +01:00

Author	SHA1	Message	Date
Jason Ansel	8d08b49015	Reland: [inductor] Simplify grid handling (#148305 ) Summary: Relands D69965761 / https://github.com/pytorch/pytorch/pull/147583 Before this PR, calling a triton kernel would look like: ```py kernel.run(a, b, xnumel, grid=grid(xnumel), stream=stream0) ``` where the `grid=` was passed as a callable (function closure) arg. This PR removes the grid arg: ```py kernel.run(a, b, xnumel, stream=stream0) ``` instead now the grid computation is included in the kernel launcher, with something like: ```py def launcher(in_ptr0, out_ptr0, xnumel, stream): grid_0 = ((xnumel + 1023) >> 10) grid_1 = 1 grid_2 = 1 runner(grid_0, grid_1, grid_2, stream, function, metadata, None, launch_enter_hook, launch_exit_hook, in_ptr0, out_ptr0, xnumel) ``` This should be faster, since we remove multiple function/dict calls and are able to specialize the grid computation for each `triton.Config`. It also allows us to unify the handling of grids between the Python and C++ wrapper code. Before this, C++ wrapper code didn't actually support dynamic grid sizes and instead burned in a static grid. This unification allows this PR to be a net deletion of code. Differential Revision: D70471332 Pull Request resolved: https://github.com/pytorch/pytorch/pull/148305 Approved by: https://github.com/shunting314, https://github.com/eellison	2025-03-11 18:51:06 +00:00
PyTorch MergeBot	608377d341	Revert "[import][inductor] Simplify grid handling (#147583 )" This reverts commit `b59776d857`. Reverted https://github.com/pytorch/pytorch/pull/147583 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally ([comment](https://github.com/pytorch/pytorch/pull/147583#issuecomment-2693016036))	2025-03-03 00:49:32 +00:00
Jason Ansel	b59776d857	[import][inductor] Simplify grid handling (#147583 ) Before this PR, calling a triton kernel would look like: ```py kernel.run(a, b, xnumel, grid=grid(xnumel), stream=stream0) ``` where the `grid=` was passed as a callable (function closure) arg. This PR removes the grid arg: ```py kernel.run(a, b, xnumel, stream=stream0) ``` instead now the grid computation is included in the kernel launcher, with something like: ```py def launcher(in_ptr0, out_ptr0, xnumel, stream): grid_0 = ((xnumel + 1023) >> 10) grid_1 = 1 grid_2 = 1 runner(grid_0, grid_1, grid_2, stream, function, metadata, None, launch_enter_hook, launch_exit_hook, in_ptr0, out_ptr0, xnumel) ``` This should be faster, since we remove multiple function/dict calls and are able to specialize the grid computation for each `triton.Config`. It also allows us to unify the handling of grids between the Python and C++ wrapper code. Before this, C++ wrapper code didn't actually support dynamic grid sizes and instead burned in a static grid. This unification allows this PR to be a net deletion of code. Note the attached diff contains some minor fbcode-only changes. Pull Request resolved: https://github.com/pytorch/pytorch/pull/147583 Approved by: https://github.com/eellison, https://github.com/shunting314	2025-03-02 07:31:07 +00:00
Xuehai Pan	1cb4e2df65	[BE][PYFMT] migrate PYFMT for `torch._inductor` to `ruff format` (#144550 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/144550 Approved by: https://github.com/jansel	2025-02-28 13:33:19 +00:00
Tom Ritchford	44ee9ca593	[inductor] Add type annotations to _inductor/utils.py (#144108 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/144108 Approved by: https://github.com/eellison	2025-02-15 23:13:41 +00:00
Jason Ansel	06604c4ec1	[inductor] Refactor op handlers part 5 (#146257 ) This makes OpHandler just a normal class using inheritance, and removes typing workarounds needed because it wasn't Pull Request resolved: https://github.com/pytorch/pytorch/pull/146257 Approved by: https://github.com/shunting314 ghstack dependencies: #146252, #146254, #146255	2025-02-08 18:00:30 +00:00
PyTorch MergeBot	49effa0deb	Revert "[inductor] Refactor op handlers part 5 (#146257 )" This reverts commit `d3dd3eeb7f`. Reverted https://github.com/pytorch/pytorch/pull/146257 on behalf of https://github.com/atalman due to Sorry need to revert https://github.com/pytorch/pytorch/pull/146252 ([comment](https://github.com/pytorch/pytorch/pull/146257#issuecomment-2638251994))	2025-02-05 23:20:38 +00:00
Jason Ansel	d3dd3eeb7f	[inductor] Refactor op handlers part 5 (#146257 ) This makes OpHandler just a normal class using inheritance, and removes typing workarounds needed because it wasn't Pull Request resolved: https://github.com/pytorch/pytorch/pull/146257 Approved by: https://github.com/shunting314 ghstack dependencies: #146225, #146226, #146235, #146252, #146254, #146255	2025-02-04 23:36:25 +00:00
Jason Ansel	67be5953fe	[inductor] Refactor op handlers part 1 (#146235 ) This enforces the invariant that every backend implements the same set of ops and removes a layer of indirection for BasicMathOps. Interestingly this is a small compile time win: ``` ... WIN: benchmark ('add_loop_inductor', 'compile_time_instruction_count') failed, actual result 30151159301 is -6.13% lower than expected 32120000000 ±1.50% please update the expected results. please update all results that changed significantly, and not only the failed ones PASS: benchmark ('add_loop_inductor_dynamic_gpu', 'compile_time_instruction_count') pass, actual result 44447549162 -1.69% is within expected 45210000000 ±2.50% WIN: benchmark ('add_loop_inductor_gpu', 'compile_time_instruction_count') failed, actual result 26743557195 is -2.25% lower than expected 27360000000 ±1.50% please update the expected results. please update all results that changed significantly, and not only the failed ones PASS: benchmark ('basic_modules_ListOfLinears_eager', 'compile_time_instruction_count') pass, actual result 945129734 +0.93% is within expected 936400000 ±1.50% WIN: benchmark ('basic_modules_ListOfLinears_inductor', 'compile_time_instruction_count') failed, actual result 18984384503 is -3.19% lower than expected 19610000000 ±1.50% please update the expected results. please update all results that changed significantly, and not only the failed ones WIN: benchmark ('basic_modules_ListOfLinears_inductor_gpu_force_shape_pad', 'compile_time_instruction_count') failed, actual result 17258025389 is -1.94% lower than expected 17600000000 ±1.50% please update the expected results. ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/146235 Approved by: https://github.com/shunting314 ghstack dependencies: #146225, #146226	2025-02-04 23:35:53 +00:00
Jason Ansel	e9f6e273e7	[inductor] Add typing to common.CSE (#145993 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/145993 Approved by: https://github.com/yanboliang ghstack dependencies: #145916	2025-02-04 16:05:39 +00:00
PyTorch MergeBot	d3c7e4bb9c	Revert "[inductor] Add typing to common.CSE (#145993 )" This reverts commit `8c657ae4be`. Reverted https://github.com/pytorch/pytorch/pull/145993 on behalf of https://github.com/atalman due to Sorry need to revert https://github.com/pytorch/pytorch/pull/145916 ([comment](https://github.com/pytorch/pytorch/pull/145993#issuecomment-2632712384))	2025-02-04 03:04:01 +00:00
PyTorch MergeBot	2f40f789da	Revert "[inductor] Refactor op handlers part 1 (#146235 )" This reverts commit `204be4e0a2`. Reverted https://github.com/pytorch/pytorch/pull/146235 on behalf of https://github.com/atalman due to Breaks lint, sorry: Definition of polygamma in base class MetalOverrides is incompatible with definition in base class OpsHandler. Please rebase fix lint and reland ([comment](https://github.com/pytorch/pytorch/pull/146235#issuecomment-2632444514))	2025-02-04 00:00:08 +00:00
Jason Ansel	204be4e0a2	[inductor] Refactor op handlers part 1 (#146235 ) This enforces the invariant that every backend implements the same set of ops and removes a layer of indirection for BasicMathOps. Interestingly this is a small compile time win: ``` ... WIN: benchmark ('add_loop_inductor', 'compile_time_instruction_count') failed, actual result 30151159301 is -6.13% lower than expected 32120000000 ±1.50% please update the expected results. please update all results that changed significantly, and not only the failed ones PASS: benchmark ('add_loop_inductor_dynamic_gpu', 'compile_time_instruction_count') pass, actual result 44447549162 -1.69% is within expected 45210000000 ±2.50% WIN: benchmark ('add_loop_inductor_gpu', 'compile_time_instruction_count') failed, actual result 26743557195 is -2.25% lower than expected 27360000000 ±1.50% please update the expected results. please update all results that changed significantly, and not only the failed ones PASS: benchmark ('basic_modules_ListOfLinears_eager', 'compile_time_instruction_count') pass, actual result 945129734 +0.93% is within expected 936400000 ±1.50% WIN: benchmark ('basic_modules_ListOfLinears_inductor', 'compile_time_instruction_count') failed, actual result 18984384503 is -3.19% lower than expected 19610000000 ±1.50% please update the expected results. please update all results that changed significantly, and not only the failed ones WIN: benchmark ('basic_modules_ListOfLinears_inductor_gpu_force_shape_pad', 'compile_time_instruction_count') failed, actual result 17258025389 is -1.94% lower than expected 17600000000 ±1.50% please update the expected results. ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/146235 Approved by: https://github.com/shunting314 ghstack dependencies: #146225, #146226	2025-02-03 23:15:13 +00:00
Jason Ansel	8c657ae4be	[inductor] Add typing to common.CSE (#145993 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/145993 Approved by: https://github.com/yanboliang ghstack dependencies: #145913, #145914, #145915, #145916	2025-02-01 16:34:18 +00:00
Jason Ansel	2df2f9d895	[inductor] Change type of get_backend_features to OrderedSet (#145692 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/145692 Approved by: https://github.com/yanboliang	2025-01-28 01:44:32 +00:00
Aaron Orenstein	2bf772d1ba	PEP585 update - torch/_inductor/codegen (#145106 ) See #145101 for details. Pull Request resolved: https://github.com/pytorch/pytorch/pull/145106 Approved by: https://github.com/bobrenjc93	2025-01-18 06:56:03 +00:00
bobrenjc93	a3ab27b8e0	Migrate from Tuple -> tuple in torch/_inductor (#144264 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/144264 Approved by: https://github.com/eellison	2025-01-07 03:27:27 +00:00
Tom Ritchford	da67a6a7bb	[inductor] Replace set by OrderedSet (#138466 ) Uses the set_linter from https://github.com/pytorch/pytorch/pull/138454 and considerable manual editing Pull Request resolved: https://github.com/pytorch/pytorch/pull/138466 Approved by: https://github.com/eellison	2024-12-13 16:08:45 +00:00
Tom Ritchford	dc23f1944a	Remove unused Python variables in torch/[_-a]* (#133492 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/133492 Approved by: https://github.com/albanD	2024-12-12 17:39:14 +00:00
PyTorch MergeBot	5c97ac9721	Revert "Remove unused Python variables in torch/[_-a]* (#133492 )" This reverts commit `fda975a7b3`. Reverted https://github.com/pytorch/pytorch/pull/133492 on behalf of https://github.com/clee2000 due to Sorry, I need to revert this in order to revert something else. The only thing you need to do is rebase and remerge ([comment](https://github.com/pytorch/pytorch/pull/133492#issuecomment-2536635516))	2024-12-11 17:29:12 +00:00
Tom Ritchford	fda975a7b3	Remove unused Python variables in torch/[_-a]* (#133492 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/133492 Approved by: https://github.com/albanD	2024-12-10 21:48:44 +00:00
Blaine Burton Rister	f9af86de01	[Inductor] Represent tiling as a dict (#141751 ) # Summary Preparatory refactor for https://github.com/pytorch/pytorch/pull/137243. This makes it easier to generalize to multi-dimensional reductions. This diff refactors `self.numels` from a tuple like `(8,16)` to a dict like `{"x": 8, "r": 16}`. Note: this is based off of https://github.com/pytorch/pytorch/pull/141738, which enables `tree.is_reduction`. That PR should land first. # Test plan The existing CI provides good coverage. Pull Request resolved: https://github.com/pytorch/pytorch/pull/141751 Approved by: https://github.com/jansel	2024-12-05 02:28:16 +00:00
PyTorch MergeBot	38d10a1b17	Revert "[Inductor] Represent tiling as a dict (#141751 )" This reverts commit `5deca07c0d`. Reverted https://github.com/pytorch/pytorch/pull/141751 on behalf of https://github.com/atalman due to Failing internal builds ([comment](https://github.com/pytorch/pytorch/pull/141751#issuecomment-2517815899))	2024-12-04 15:43:16 +00:00
Blaine Burton Rister	5deca07c0d	[Inductor] Represent tiling as a dict (#141751 ) # Summary Preparatory refactor for https://github.com/pytorch/pytorch/pull/137243. This makes it easier to generalize to multi-dimensional reductions. This diff refactors `self.numels` from a tuple like `(8,16)` to a dict like `{"x": 8, "r": 16}`. Note: this is based off of https://github.com/pytorch/pytorch/pull/141738, which enables `tree.is_reduction`. That PR should land first. # Test plan The existing CI provides good coverage. Pull Request resolved: https://github.com/pytorch/pytorch/pull/141751 Approved by: https://github.com/jansel	2024-12-01 09:54:34 +00:00
Blaine Burton Rister	49fde426ba	[Inductor] Use a helper function to tell if a tree or prefix is a reduction (#141738 ) Preparatory refactor for https://github.com/pytorch/pytorch/pull/137243. Previously, we would typically check for reductions by `tree.prefix == "r"`. This PR moves the check into a helper function. This makes it easier to generalize the code to multi-dimensional reductions, which could have multiple prefixes like `("r0_", "r1_")`. Tested by the existing CI. Pull Request resolved: https://github.com/pytorch/pytorch/pull/141738 Approved by: https://github.com/jansel	2024-11-30 22:38:13 +00:00
Isuru Fernando	44186a0a4e	Move Sympy printers to torch/utils/_sympy/printers.py (#140597 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/140597 Approved by: https://github.com/ezyang, https://github.com/anijain2305	2024-11-26 18:11:00 +00:00
PyTorch MergeBot	f23621ec56	Revert "Move Sympy printers to torch/utils/_sympy/printers.py (#140597 )" This reverts commit `c25b201583`. Reverted https://github.com/pytorch/pytorch/pull/140597 on behalf of https://github.com/huydhn due to Trunk is sad again after this lands, this looks like a landrace this time, so please do a rebase ([comment](https://github.com/pytorch/pytorch/pull/140597#issuecomment-2494052978))	2024-11-22 15:43:39 +00:00
Isuru Fernando	c25b201583	Move Sympy printers to torch/utils/_sympy/printers.py (#140597 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/140597 Approved by: https://github.com/ezyang, https://github.com/anijain2305	2024-11-22 02:04:36 +00:00
PyTorch MergeBot	701e06b643	Revert "Move Sympy printers to torch/utils/_sympy/printers.py (#140597 )" This reverts commit `aefcdb3c9f`. Reverted https://github.com/pytorch/pytorch/pull/140597 on behalf of https://github.com/huydhn due to Sorry for reverting your change but I think it fails inductor/test_padding in trunk. This is a target determination miss and that failed test was not run in your PR ([comment](https://github.com/pytorch/pytorch/pull/140597#issuecomment-2489641453))	2024-11-20 22:13:57 +00:00
Isuru Fernando	aefcdb3c9f	Move Sympy printers to torch/utils/_sympy/printers.py (#140597 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/140597 Approved by: https://github.com/ezyang, https://github.com/anijain2305	2024-11-20 20:26:49 +00:00
Aaron Gokaslan	12e95aa4ee	[BE]: Apply PERF401 autofixes from ruff (#140980 ) * Automatically applies ruff rule 401. Turns loops into equivalent list comprehensions which are faster and do not leak the scope of the loop variables. * list comprehensions not only often have better typing, but are 50+% faster than for loops on overhead. They also preserve length information etc and are better for the interpreter to optimize. * Manually went back and made mypy happy after the change. * Also fixed style lints in files covered by flake8 but not by pyfmt Pull Request resolved: https://github.com/pytorch/pytorch/pull/140980 Approved by: https://github.com/justinchuby, https://github.com/malfet	2024-11-20 17:52:07 +00:00
Jason Ansel	ed30fa74ab	[inductor] sympy.Integer([01]) -> sympy.S.(Zero\|One) (#139523 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/139523 Approved by: https://github.com/ezyang ghstack dependencies: #139364, #139365, #139370, #139452	2024-11-04 04:28:40 +00:00
PyTorch MergeBot	98e11b0021	Revert "[inductor] sympy.Integer([01]) -> sympy.S.(Zero\|One) (#139523 )" This reverts commit `c53beab377`. Reverted https://github.com/pytorch/pytorch/pull/139523 on behalf of https://github.com/huydhn due to Sorry for reverting your change but it is failing lots of internal tests in D65345157 ([comment](https://github.com/pytorch/pytorch/pull/139364#issuecomment-2452897337))	2024-11-02 06:49:10 +00:00
Jason Ansel	c53beab377	[inductor] sympy.Integer([01]) -> sympy.S.(Zero\|One) (#139523 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/139523 Approved by: https://github.com/ezyang ghstack dependencies: #139364, #139365, #139370, #139452	2024-11-02 03:04:22 +00:00
Jason Ansel	f9ef880c0b	[inductor] Refactor kernel args into SIMDKernelFeatures (#139327 ) This is a refactor PR to move stuff around. I'm planning to use the SIMDKernelFeatures class (in a future PR) to host new heuristics for selecting kernel types and block sizes. Pull Request resolved: https://github.com/pytorch/pytorch/pull/139327 Approved by: https://github.com/eellison, https://github.com/shunting314	2024-11-01 00:30:14 +00:00
Jason Ansel	2b937e4e6d	[inductor] Cooperative reductions (#137756 ) Example generated code for `(x+y).sum()`: ```py @triton.jit def triton_unk_fused_add_sum_0(in_ptr0, in_ptr1, out_ptr0, ws_ptr, semaphores_ptr, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr, RSPLIT : tl.constexpr): xnumel = 1 rnumel = 1048576 rsplit_id = tl.program_id(0) num_rblocks = (rnumel + RBLOCK - 1) // RBLOCK rsplit_chunk = (num_rblocks + RSPLIT - 1) // RSPLIT * RBLOCK rsplit_start = rsplit_chunk * rsplit_id rsplit_end = rsplit_chunk * (rsplit_id + 1) xoffset = tl.program_id(1) * XBLOCK xindex = xoffset + tl.arange(0, XBLOCK)[:, None] xmask = tl.full([XBLOCK, RBLOCK], True, tl.int1) rbase = tl.arange(0, RBLOCK)[None, :] _tmp4 = tl.full([XBLOCK, RBLOCK], 0, tl.float32) for roffset in range(rsplit_start, rsplit_end, RBLOCK): rindex = roffset + rbase rmask = rindex < rnumel r0 = rindex tmp0 = tl.load(in_ptr0 + (r0), rmask, eviction_policy='evict_first', other=0.0) tmp1 = tl.load(in_ptr1 + (r0), rmask, eviction_policy='evict_first', other=0.0) tmp2 = tmp0 + tmp1 tmp3 = tl.broadcast_to(tmp2, [XBLOCK, RBLOCK]) tmp5 = _tmp4 + tmp3 _tmp4 = tl.where(rmask, tmp5, _tmp4) tmp4 = tl.sum(_tmp4, 1)[:, None] if RSPLIT > 1: tmp4_ws = (ws_ptr + 0).to(tl.pointer_type(tl.float32)) tl.store(tmp4_ws + (xindex * RSPLIT + rsplit_id), tmp4, None) if RSPLIT > 1: triton_helpers.gpu_barrier(semaphores_ptr + (2 * tl.program_id(1) + 0), RSPLIT, True) if RSPLIT > 1: tmp4_peers = tl.load(tmp4_ws + (xindex * RSPLIT + tl.arange(0, RSPLIT)[None,:]), None, eviction_policy='evict_first') tmp4 = tl.sum(tmp4_peers, 1)[:, None] if rsplit_id == (0 % RSPLIT): tl.store(out_ptr0 + (tl.full([XBLOCK, 1], 0, tl.int32)), tmp4, None) ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/137756 Approved by: https://github.com/eellison	2024-10-29 00:45:53 +00:00
PyTorch MergeBot	60d1c7138d	Revert "[inductor] Cooperative reductions (#137756 )" This reverts commit `fed37dbfbc`. Reverted https://github.com/pytorch/pytorch/pull/137756 on behalf of https://github.com/jeanschmidt due to ROCM tests are timing out :( ([comment](https://github.com/pytorch/pytorch/pull/137756#issuecomment-2441579322))	2024-10-28 13:24:33 +00:00
Jason Ansel	fed37dbfbc	[inductor] Cooperative reductions (#137756 ) Example generated code for `(x+y).sum()`: ```py @triton.jit def triton_unk_fused_add_sum_0(in_ptr0, in_ptr1, out_ptr0, ws_ptr, semaphores_ptr, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr, RSPLIT : tl.constexpr): xnumel = 1 rnumel = 1048576 rsplit_id = tl.program_id(0) num_rblocks = (rnumel + RBLOCK - 1) // RBLOCK rsplit_chunk = (num_rblocks + RSPLIT - 1) // RSPLIT * RBLOCK rsplit_start = rsplit_chunk * rsplit_id rsplit_end = rsplit_chunk * (rsplit_id + 1) xoffset = tl.program_id(1) * XBLOCK xindex = xoffset + tl.arange(0, XBLOCK)[:, None] xmask = tl.full([XBLOCK, RBLOCK], True, tl.int1) rbase = tl.arange(0, RBLOCK)[None, :] _tmp4 = tl.full([XBLOCK, RBLOCK], 0, tl.float32) for roffset in range(rsplit_start, rsplit_end, RBLOCK): rindex = roffset + rbase rmask = rindex < rnumel r0 = rindex tmp0 = tl.load(in_ptr0 + (r0), rmask, eviction_policy='evict_first', other=0.0) tmp1 = tl.load(in_ptr1 + (r0), rmask, eviction_policy='evict_first', other=0.0) tmp2 = tmp0 + tmp1 tmp3 = tl.broadcast_to(tmp2, [XBLOCK, RBLOCK]) tmp5 = _tmp4 + tmp3 _tmp4 = tl.where(rmask, tmp5, _tmp4) tmp4 = tl.sum(_tmp4, 1)[:, None] if RSPLIT > 1: tmp4_ws = (ws_ptr + 0).to(tl.pointer_type(tl.float32)) tl.store(tmp4_ws + (xindex * RSPLIT + rsplit_id), tmp4, None) if RSPLIT > 1: triton_helpers.gpu_barrier(semaphores_ptr + (2 * tl.program_id(1) + 0), RSPLIT, True) if RSPLIT > 1: tmp4_peers = tl.load(tmp4_ws + (xindex * RSPLIT + tl.arange(0, RSPLIT)[None,:]), None, eviction_policy='evict_first') tmp4 = tl.sum(tmp4_peers, 1)[:, None] if rsplit_id == (0 % RSPLIT): tl.store(out_ptr0 + (tl.full([XBLOCK, 1], 0, tl.int32)), tmp4, None) ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/137756 Approved by: https://github.com/eellison ghstack dependencies: #138970	2024-10-27 16:31:38 +00:00
Xinran / Allan Rui	ba6526814a	Add dtype attribute to CSEVariable (#136778 ) Summary: - This diff introduces `dtype` attribute to `TritonCSEVariable` and a dtype propagation helper function to infer dtype from input to output for each op. - There will be a follow-up diff that uses this `dtype` information in `TritonCSEVariable` to perform dtype-aware codegen. Test Plan: CI Differential Revision: D61815079 Pull Request resolved: https://github.com/pytorch/pytorch/pull/136778 Approved by: https://github.com/eellison, https://github.com/blaine-rister	2024-10-25 18:00:30 +00:00
Jason Ansel	4632594546	[inductor] Move V.graph.scheduler.current_device to V.graph.current_device (#138252 ) There are some places where it would be nice to use this, but the scheduler hasn't yet been created. Pull Request resolved: https://github.com/pytorch/pytorch/pull/138252 Approved by: https://github.com/eellison ghstack dependencies: #138170	2024-10-18 23:05:54 +00:00
Jez Ng	71aac59e93	Add Triton CPU as an Inductor backend (#133408 ) The goal is to use Inductor-generated kernels to stress test the new Triton CPU backend. Differential Revision: [D63298968](https://our.internmc.facebook.com/intern/diff/D63298968) Pull Request resolved: https://github.com/pytorch/pytorch/pull/133408 Approved by: https://github.com/jansel, https://github.com/blaine-rister, https://github.com/malfet	2024-09-30 20:24:52 +00:00
PyTorch MergeBot	36428f91e9	Revert "Add Triton CPU as an Inductor backend (#133408 )" This reverts commit `31c0467594`. Reverted https://github.com/pytorch/pytorch/pull/133408 on behalf of https://github.com/int3 due to internal tests failing ([comment](https://github.com/pytorch/pytorch/pull/133408#issuecomment-2379692517))	2024-09-27 16:54:27 +00:00
Jez Ng	31c0467594	Add Triton CPU as an Inductor backend (#133408 ) The goal is to use Inductor-generated kernels to stress test the new Triton CPU backend. Differential Revision: [D63298968](https://our.internmc.facebook.com/intern/diff/D63298968) Pull Request resolved: https://github.com/pytorch/pytorch/pull/133408 Approved by: https://github.com/jansel, https://github.com/blaine-rister, https://github.com/malfet	2024-09-26 15:35:26 +00:00
PyTorch MergeBot	d0cebedb31	Revert "Add Triton CPU as an Inductor backend (#133408 )" This reverts commit `e498b02b47`. Reverted https://github.com/pytorch/pytorch/pull/133408 on behalf of https://github.com/jeanschmidt due to Broke internal signals, see D62737208 for more details ([comment](https://github.com/pytorch/pytorch/pull/133408#issuecomment-2353623816))	2024-09-16 18:33:33 +00:00
Jez Ng	e498b02b47	Add Triton CPU as an Inductor backend (#133408 ) The goal is to use Inductor-generated kernels to stress test the new Triton CPU backend. Pull Request resolved: https://github.com/pytorch/pytorch/pull/133408 Approved by: https://github.com/jansel	2024-09-14 21:45:19 +00:00
xinan.lin	13ee85ca5e	[Inductor] Generalize cuda cpp wrapper as common triton based GPU cpp wrapper, will be reused by xpu in next PR. (#135312 ) [Inductor] Generalize cuda cpp wrapper as common triton based GPU cpp wrapper, will be reused by xpu in next PR. Pull Request resolved: https://github.com/pytorch/pytorch/pull/135312 Approved by: https://github.com/jansel, https://github.com/desertfire, https://github.com/eellison	2024-09-11 23:59:54 +00:00
Isuru Fernando	1b10a5c652	Allow SymInts and SymFloats as other in div_softmax_pattern (#133989 ) Fixes https://github.com/pytorch/pytorch/issues/133759 Pull Request resolved: https://github.com/pytorch/pytorch/pull/133989 Approved by: https://github.com/ezyang	2024-08-22 14:36:01 +00:00
Isuru Fernando	7470ae85e4	Fix triton codegen with math.trunc (#133354 ) Fixes https://github.com/pytorch/pytorch/issues/133172 Pull Request resolved: https://github.com/pytorch/pytorch/pull/133354 Approved by: https://github.com/ezyang, https://github.com/jansel	2024-08-15 16:38:26 +00:00
Oguz Ulgen	09f9c256ad	Add basic mypy annotations to inductor (#132416 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/132416 Approved by: https://github.com/XuehaiPan, https://github.com/jamesjwu ghstack dependencies: #132415	2024-08-04 18:43:37 +00:00
PyTorch MergeBot	f2ddd5e9e0	Revert "Add basic mypy annotations to inductor (#132416 )" This reverts commit `78927d37f6`. Reverted https://github.com/pytorch/pytorch/pull/132416 on behalf of https://github.com/ZainRizvi due to Sorry, this PR has entered a weird state in the diff train. Trying to revert it to skip it, and then we can try relanding it ([comment](https://github.com/pytorch/pytorch/pull/132415#issuecomment-2267631785))	2024-08-04 18:39:29 +00:00

1 2

76 Commits