pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 12:21:27 +01:00

Author	SHA1	Message	Date
Jason Ansel	d325aaef39	[halide-backend] Use get_reduction_combine_fn for reduction ops (#130212 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/130212 Approved by: https://github.com/eellison	2024-07-08 17:23:32 +00:00
Jason Ansel	acd03ca2d9	[halide-backend] Support scan kernels (#129035 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/129035 Approved by: https://github.com/shunting314, https://github.com/eellison ghstack dependencies: #130129	2024-07-06 03:49:50 +00:00
Jason Ansel	c5110f6388	[halide-backend] Use 0D scalar inputs/outputs (#130129 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/130129 Approved by: https://github.com/shunting314	2024-07-06 03:49:50 +00:00
Jason Ansel	4fc9157e90	[halide-backend] Disable split reductions for Halide (#129320 ) In theory Halide doesn't need the split reduction stuff we do for Triton since it can generate multiple kernels. Pull Request resolved: https://github.com/pytorch/pytorch/pull/129320 Approved by: https://github.com/shunting314, https://github.com/eellison ghstack dependencies: #129321	2024-07-03 05:56:40 +00:00
Jason Ansel	0abcca85b7	[halide-backend] Support manual schedules (#129321 ) Currently using this for some by-hand hacking, but might need to implement our own scheduler later. Pull Request resolved: https://github.com/pytorch/pytorch/pull/129321 Approved by: https://github.com/shunting314	2024-07-03 05:56:40 +00:00
PyTorch MergeBot	e385bf8ef8	Revert "[halide-backend] Disable split reductions for Halide (#129320 )" This reverts commit `a18eb651d3`. Reverted https://github.com/pytorch/pytorch/pull/129320 on behalf of https://github.com/jeanschmidt due to This PR is breaking internal builds, please check comments on it D59204360 ([comment](https://github.com/pytorch/pytorch/pull/129320#issuecomment-2200351678))	2024-07-01 14:44:35 +00:00
PyTorch MergeBot	a83eaf1c3a	Revert "[halide-backend] Support manual schedules (#129321 )" This reverts commit `9ae78a578c`. Reverted https://github.com/pytorch/pytorch/pull/129321 on behalf of https://github.com/jeanschmidt due to Reverting, as it is required to do so in order to revert #129320 ([comment](https://github.com/pytorch/pytorch/pull/129321#issuecomment-2200345664))	2024-07-01 14:42:33 +00:00
Jason Ansel	9ae78a578c	[halide-backend] Support manual schedules (#129321 ) Currently using this for some by-hand hacking, but might need to implement our own scheduler later. Pull Request resolved: https://github.com/pytorch/pytorch/pull/129321 Approved by: https://github.com/shunting314 ghstack dependencies: #126417, #129025, #129026, #127506, #129036, #129320	2024-06-29 14:06:28 +00:00
Jason Ansel	a18eb651d3	[halide-backend] Disable split reductions for Halide (#129320 ) In theory Halide doesn't need the split reduction stuff we do for Triton since it can generate multiple kernels. Pull Request resolved: https://github.com/pytorch/pytorch/pull/129320 Approved by: https://github.com/shunting314, https://github.com/eellison ghstack dependencies: #126417, #129025, #129026, #127506, #129036	2024-06-29 14:06:28 +00:00
Jason Ansel	4cb8cb04a7	[halide-backend] Enable bfloat16 support (#129036 ) Requires https://github.com/halide/Halide/pull/8255 Pull Request resolved: https://github.com/pytorch/pytorch/pull/129036 Approved by: https://github.com/shunting314, https://github.com/eellison ghstack dependencies: #126417, #129025, #129026, #127506	2024-06-29 14:06:25 +00:00
Jason Ansel	b93bf55b6a	[halide-backend] Add GPU support (#127506 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/127506 Approved by: https://github.com/shunting314, https://github.com/eellison ghstack dependencies: #126417, #129025, #129026	2024-06-29 14:06:21 +00:00
Jason Ansel	86cadc6385	[halide-backend] Dimension-based indexing (#129026 )

Prior to this the generated Halide code was a rather literal translation of the Triton code, with XBLOCK/YBLOCK/RBLOCK and 1D inputs. Halide prefers dimensions, and this 1D index triggers a lot of bugs and perf issues. This PR infers dimensions and changes the indexing in the generated code.

Before
```py
@hl.generator(name="kernel")
class Kernel:
    in_ptr0 = hl.InputBuffer(hl.Float(32), 1)
    out_ptr3 = hl.OutputBuffer(hl.Float(32), 2)

    def generate(g):
        in_ptr0 = g.in_ptr0
        out_ptr3 = g.out_ptr3
        xindex = hl.Var('xindex')
        rindex = hl.Var('rindex')
        r1 = rindex
        x0 = xindex
        idom = hl.RDom([hl.Range(0, 16), hl.Range(0, 32)])
        odom = hl.RDom([hl.Range(0, 16)])
        rdom = hl.RDom([hl.Range(0, 32)])
        xindex_idom = idom.x
        xindex_odom = odom.x
        rindex_idom = idom.y
        r1_idom = rindex_idom
        x0_idom = xindex_idom
        x0_odom = xindex_odom
        tmp0 = hl.Func('tmp0')
        tmp0[rindex, xindex] = in_ptr0[r1 + (32*x0)]
        tmp1 = hl.Func('tmp1')
        tmp1[xindex] = hl.maximum(rdom, tmp0[rdom, xindex])
        tmp2 = hl.Func('tmp2')
        tmp2[rindex, xindex] = tmp0[rindex, xindex] - tmp1[xindex]
        tmp3 = hl.Func('tmp3')
        tmp3[rindex, xindex] = hl.fast_exp(hl.cast(hl.Float(32), tmp2[rindex, xindex])) if tmp2.type().bits() <= 32 else hl.exp(tmp2[rindex, xindex])
        tmp4 = hl.Func('tmp4')
        tmp4[xindex] = hl.sum(rdom, tmp3[rdom, xindex])
        tmp5 = hl.Func('tmp5')
        tmp5[rindex, xindex] = tmp3[rindex, xindex] / tmp4[xindex]
        out_ptr3_i0 = hl.Var('out_ptr3_i0')
        out_ptr3_i1 = hl.Var('out_ptr3_i1')
        out_ptr3[out_ptr3_i0, out_ptr3_i1] = hl.cast(out_ptr3.type(), tmp5[out_ptr3_i0, out_ptr3_i1])

        assert g.using_autoscheduler()
        in_ptr0.set_estimates([hl.Range(0, 512)])
        out_ptr3.set_estimates([hl.Range(0, 32), hl.Range(0, 16)])
```

After
```py
@hl.generator(name="kernel")
class Kernel:
    in_ptr0 = hl.InputBuffer(hl.Float(32), 2)
    out_ptr3 = hl.OutputBuffer(hl.Float(32), 2)

    def generate(g):
        in_ptr0 = g.in_ptr0
        out_ptr3 = g.out_ptr3
        h0 = hl.Var('h0')
        h1 = hl.Var('h1')
        rdom = hl.RDom([hl.Range(0, 32)])
        hr1 = rdom[0]
        tmp0 = hl.Func('tmp0')
        tmp0[h0, h1] = in_ptr0[h0, h1,]
        tmp1 = hl.Func('tmp1')
        tmp1[h1] = hl.maximum(rdom, tmp0[hr1, h1])
        tmp2 = hl.Func('tmp2')
        tmp2[h0, h1] = tmp0[h0, h1] - tmp1[h1]
        tmp3 = hl.Func('tmp3')
        tmp3[h0, h1] = hl.fast_exp(hl.cast(hl.Float(32), tmp2[h0, h1])) if tmp2.type().bits() <= 32 else hl.exp(tmp2[h0, h1])
        tmp4 = hl.Func('tmp4')
        tmp4[h1] = hl.sum(rdom, tmp3[hr1, h1])
        tmp5 = hl.Func('tmp5')
        tmp5[h0, h1] = tmp3[h0, h1] / tmp4[h1]
        out_ptr3[h0, h1,] = hl.cast(hl.Float(32), tmp5[h0, h1])

        assert g.using_autoscheduler()
        in_ptr0.dim(0).set_min(0)
        in_ptr0.dim(0).set_stride(1)
        in_ptr0.dim(0).set_extent(32)
        in_ptr0.dim(1).set_min(0)
        in_ptr0.dim(1).set_stride(32)
        in_ptr0.dim(1).set_extent(16)
        in_ptr0.set_estimates([hl.Range(0, 32), hl.Range(0, 16)])
        out_ptr3.set_estimates([hl.Range(0, 32), hl.Range(0, 16)])
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/129026 Approved by: https://github.com/shunting314, https://github.com/eellison ghstack dependencies: #126417, #129025	2024-06-29 14:06:16 +00:00
Jason Ansel	da5f37515e	[halide-backend] Generate standalone runtime (#129025 ) This puts the halide runtime in a global shared object, rather than copying it to each kernel. Having many copies of the runtime causes many issues with cuda. Pull Request resolved: https://github.com/pytorch/pytorch/pull/129025 Approved by: https://github.com/shunting314, https://github.com/eellison ghstack dependencies: #126417	2024-06-29 14:06:12 +00:00
Jason Ansel	e34b7e6af3	[halide-backend] Initial implementation of HalideKernel and HalideScheduling (#126417 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/126417 Approved by: https://github.com/shunting314, https://github.com/eellison	2024-06-29 14:06:08 +00:00
PyTorch MergeBot	1a54bb0f96	Revert "[halide-backend] Initial implementation of HalideKernel and HalideScheduling (#126417 )" This reverts commit `4f9399bd0d`. Reverted https://github.com/pytorch/pytorch/pull/126417 on behalf of https://github.com/fbgheith due to breaking internal builds ([comment](https://github.com/pytorch/pytorch/pull/126417#issuecomment-2186999121))	2024-06-24 16:50:15 +00:00
PyTorch MergeBot	063facf352	Revert "[halide-backend] Generate standalone runtime (#129025 )" This reverts commit `10c64c3b49`. Reverted https://github.com/pytorch/pytorch/pull/129025 on behalf of https://github.com/fbgheith due to breaking internal builds ([comment](https://github.com/pytorch/pytorch/pull/129025#issuecomment-2186995467))	2024-06-24 16:47:25 +00:00
Jason Ansel	10c64c3b49	[halide-backend] Generate standalone runtime (#129025 ) This puts the halide runtime in a global shared object, rather than copying it to each kernel. Having many copies of the runtime causes many issues with cuda. Pull Request resolved: https://github.com/pytorch/pytorch/pull/129025 Approved by: https://github.com/shunting314, https://github.com/eellison ghstack dependencies: #126417	2024-06-22 17:39:52 +00:00
Jason Ansel	4f9399bd0d	[halide-backend] Initial implementation of HalideKernel and HalideScheduling (#126417 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/126417 Approved by: https://github.com/shunting314, https://github.com/eellison	2024-06-22 17:39:52 +00:00
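
Note on the dimension-based indexing commit (#129026 ): the sketch below is an illustration only, not code from the PR. It checks, in plain Python with no Halide dependency, that the old flat index `r1 + (32*x0)` addresses the same element as the new two-dimensional index once dimension 0 has stride 1 and extent 32 and dimension 1 has stride 32 and extent 16, which are the values set in the commit's "After" example. The helper names `flat_offset`, `strided_offset`, and `all_offsets_match` are hypothetical.

```python
# Illustrative sketch; helper names are hypothetical, not part of PyTorch or Halide.
# Extents and strides are taken from the commit's "After" example:
#   dim 0: stride 1,  extent 32
#   dim 1: stride 32, extent 16
EXTENTS = (32, 16)
STRIDES = (1, 32)


def flat_offset(r1, x0):
    """Old style: hand-written 1D index, as in in_ptr0[r1 + (32*x0)]."""
    return r1 + 32 * x0


def strided_offset(index, strides=STRIDES):
    """New style: offset implied by per-dimension strides, as in in_ptr0[h0, h1]."""
    return sum(i * s for i, s in zip(index, strides))


def all_offsets_match():
    """Compare both addressing schemes over the whole 32 x 16 domain."""
    return all(
        flat_offset(h0, h1) == strided_offset((h0, h1))
        for h1 in range(EXTENTS[1])
        for h0 in range(EXTENTS[0])
    )


if __name__ == "__main__":
    print(all_offsets_match())
```

Both schemes reach the same memory; the difference is that the dimension-based form hands the extents and strides to Halide explicitly instead of hiding them inside index arithmetic.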

18 Commits