Isuru Fernando
7470ae85e4
Fix triton codegen with math.trunc (#133354)
Fixes https://github.com/pytorch/pytorch/issues/133172
Pull Request resolved: https://github.com/pytorch/pytorch/pull/133354
Approved by: https://github.com/ezyang, https://github.com/jansel
2024-08-15 16:38:26 +00:00
Oguz Ulgen
09f9c256ad
Add basic mypy annotations to inductor (#132416)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/132416
Approved by: https://github.com/XuehaiPan, https://github.com/jamesjwu
ghstack dependencies: #132415
2024-08-04 18:43:37 +00:00
PyTorch MergeBot
f2ddd5e9e0
Revert "Add basic mypy annotations to inductor ( #132416 )"
...
This reverts commit 78927d37f6 .
Reverted https://github.com/pytorch/pytorch/pull/132416 on behalf of https://github.com/ZainRizvi due to Sorry, this PR has entered a weird state in the diff train. Trying to revert it to skip it, and then we can try relanding it ([comment](https://github.com/pytorch/pytorch/pull/132415#issuecomment-2267631785 ))
2024-08-04 18:39:29 +00:00
Oguz Ulgen
78927d37f6
Add basic mypy annotations to inductor (#132416)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/132416
Approved by: https://github.com/XuehaiPan, https://github.com/jamesjwu
ghstack dependencies: #132415
2024-08-01 20:14:25 +00:00
eellison
f32ab3b9e3
Migrate Inductor scheduler, dependencies, ir, and codegen/common to use OrderedSet (#130004)
Python's set is non-deterministic: its iteration order can vary from run to run, so code that iterates over sets is not reproducible. There is an internal failure we recently ran into that did not fail consistently.
See the repro here: P1453035092.
Now, with these changes, it fails consistently. In follow-ups we could also consider adding a lint rule for uses of either set() or set literals.
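A minimal sketch of the idea, assuming a dict-backed implementation (dicts preserve insertion order since Python 3.7); the class below is illustrative, not the actual OrderedSet introduced by this PR:
```py
# Hypothetical, simplified insertion-ordered set for illustration only.
class OrderedSet:
    def __init__(self, items=()):
        self._data = dict.fromkeys(items)  # dicts preserve insertion order

    def add(self, item):
        self._data[item] = None

    def __contains__(self, item):
        return item in self._data

    def __iter__(self):
        # Deterministic: always yields elements in insertion order, unlike
        # set(), whose iteration order for str keys varies across runs.
        return iter(self._data)

print(list(OrderedSet(["c", "a", "b"])))  # always ['c', 'a', 'b']
```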
Pull Request resolved: https://github.com/pytorch/pytorch/pull/130004
Approved by: https://github.com/oulgen
2024-08-01 04:37:15 +00:00
PyTorch MergeBot
784a6ec5a3
Revert "Migrate Inductor scheduler, dependencies, ir, and codegen/common to use OrderedSet ( #130004 )"
...
This reverts commit 13d744464f .
Reverted https://github.com/pytorch/pytorch/pull/130004 on behalf of https://github.com/clee2000 due to broke lint [GH job link](https://github.com/pytorch/pytorch/actions/runs/10183945999/job/28170099930 ) [HUD commit link](13d744464f ) probably a landrace, the base is 21 hours old ([comment](https://github.com/pytorch/pytorch/pull/130004#issuecomment-2260946562 ))
2024-07-31 16:49:21 +00:00
eellison
13d744464f
Migrate Inductor scheduler, dependencies, ir, and codegen/common to use OrderedSet (#130004)
Python's set is non-deterministic: its iteration order can vary from run to run, so code that iterates over sets is not reproducible. There is an internal failure we recently ran into that did not fail consistently.
See the repro here: P1453035092.
Now, with these changes, it fails consistently. In follow-ups we could also consider adding a lint rule for uses of either set() or set literals.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/130004
Approved by: https://github.com/oulgen
2024-07-31 16:22:11 +00:00
eellison
8b507a922a
Mode to emulate amp numerics (#131595)
```
# Mode to emulate pytorch eager numerics for lower precision (fp16, bf16)
# Pytorch eager computes bf16/fp16 by upcasting inputs to fp32 and downcasting after
# For multiple, fused pointwise nodes, inductor will elide the intermediary upcasts and downcasts
# Typically this should be closer to fp64 ref numerics. However, it can be useful for debugging
# to emulate the eager numerics.
```
We add extra upcasts and downcasts for pointwise nodes that correspond to casts that existed in the original user program (excluding pointwise nodes that are emitted during decomposition). Since this is mostly for debugging, I added this information in the `meta` so that this mode does not have unintended side effects like changing pattern matching.
In theory there could also be some other casts with fused reduction -> reduction, although I haven't seen this in practice as much; that could be done as a follow-up. Note: this only works with the CUDA backend right now.
This mode was sufficient to eliminate compile differences from https://fb.workplace.com/groups/385893200869952/posts/464263173032954/?comment_id=465199259606012&reply_comment_id=465676792891592 .
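As a rough illustration of the numerics involved (plain PyTorch standing in for the generated kernels; the function names are made up, and this is not the actual codegen), the eager convention round-trips through the low-precision dtype between every pointwise op, while a fused kernel keeps the intermediate in fp32:
```py
import torch

# Eager-style bf16: each op upcasts to fp32 and downcasts its result,
# so the intermediate is rounded back to bf16 between the two ops.
def eager_style(x):
    y = x.float().sin().bfloat16()
    return y.float().exp().bfloat16()

# Fused-style: the intermediate stays in fp32; only the final result is
# downcast, which can differ slightly from the eager numerics.
def fused_style(x):
    return x.float().sin().exp().bfloat16()

x = torch.randn(8, dtype=torch.bfloat16)
print((eager_style(x).float() - fused_style(x).float()).abs().max())
```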
Pull Request resolved: https://github.com/pytorch/pytorch/pull/131595
Approved by: https://github.com/shunting314, https://github.com/bdhirsh, https://github.com/jansel
2024-07-29 22:42:23 +00:00
eellison
5772c13f56
Don't wrap negative indexing in scatter reduce (#131503)
Fix for https://github.com/pytorch/pytorch/issues/131321
Pull Request resolved: https://github.com/pytorch/pytorch/pull/131503
Approved by: https://github.com/shunting314
2024-07-24 04:01:32 +00:00
Xuehai Pan
b6d477fd56
[BE][Easy][16/19] enforce style for empty lines in import segments in torch/_i*/ (#129768)
See https://github.com/pytorch/pytorch/pull/129751#issue-2380881501 . Most changes are auto-generated by the linter.
You can review these PRs via:
```bash
git diff --ignore-all-space --ignore-blank-lines HEAD~1
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/129768
Approved by: https://github.com/jansel
2024-07-20 16:20:58 +00:00
Li-Huai (Allan) Lin
dc7725cc16
[halide-backend] Random number generation (#130211)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/130211
Approved by: https://github.com/jansel
2024-07-15 05:03:24 +00:00
Jason Ansel
d325aaef39
[halide-backend] Use get_reduction_combine_fn for reduction ops (#130212)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/130212
Approved by: https://github.com/eellison
2024-07-08 17:23:32 +00:00
Jason Ansel
acd03ca2d9
[halide-backend] Support scan kernels (#129035)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/129035
Approved by: https://github.com/shunting314, https://github.com/eellison
ghstack dependencies: #130129
2024-07-06 03:49:50 +00:00
Jason Ansel
c5110f6388
[halide-backend] Use 0D scalar inputs/outputs (#130129)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/130129
Approved by: https://github.com/shunting314
2024-07-06 03:49:50 +00:00
Jason Ansel
4fc9157e90
[halide-backend] Disable split reductions for Halide (#129320)
In theory Halide doesn't need the split-reduction handling we do for Triton, since it can generate multiple kernels itself.
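For context, a two-kernel split reduction looks roughly like the sketch below (plain PyTorch standing in for the two generated kernels; illustrative only, not Inductor's actual codegen):
```py
import torch
import torch.nn.functional as F

# A hypothetical two-stage ("split") sum: one kernel produces per-block
# partial results, a second kernel combines the partials.
def split_sum(x: torch.Tensor, block: int = 1024) -> torch.Tensor:
    flat = x.reshape(-1)
    pad = (-flat.numel()) % block               # pad so length divides evenly
    flat = F.pad(flat, (0, pad))
    partials = flat.view(-1, block).sum(dim=1)  # "kernel 1": per-block partials
    return partials.sum()                       # "kernel 2": combine partials

x = torch.randn(10_000)
assert torch.allclose(split_sum(x), x.sum(), atol=1e-3)
```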
Pull Request resolved: https://github.com/pytorch/pytorch/pull/129320
Approved by: https://github.com/shunting314, https://github.com/eellison
ghstack dependencies: #129321
2024-07-03 05:56:40 +00:00
Jason Ansel
0abcca85b7
[halide-backend] Support manual schedules (#129321)
Currently using this for some by-hand hacking, but might need to implement our own scheduler later.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/129321
Approved by: https://github.com/shunting314
2024-07-03 05:56:40 +00:00
PyTorch MergeBot
e385bf8ef8
Revert "[halide-backend] Disable split reductions for Halide ( #129320 )"
...
This reverts commit a18eb651d3 .
Reverted https://github.com/pytorch/pytorch/pull/129320 on behalf of https://github.com/jeanschmidt due to This PR is breaking internal builds, please check comments on it D59204360 ([comment](https://github.com/pytorch/pytorch/pull/129320#issuecomment-2200351678 ))
2024-07-01 14:44:35 +00:00
PyTorch MergeBot
a83eaf1c3a
Revert "[halide-backend] Support manual schedules ( #129321 )"
...
This reverts commit 9ae78a578c .
Reverted https://github.com/pytorch/pytorch/pull/129321 on behalf of https://github.com/jeanschmidt due to Reverting, as it is required to do so in order to revert #129320 ([comment](https://github.com/pytorch/pytorch/pull/129321#issuecomment-2200345664 ))
2024-07-01 14:42:33 +00:00
Jason Ansel
9ae78a578c
[halide-backend] Support manual schedules (#129321)
Currently using this for some by-hand hacking, but might need to implement our own scheduler later.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/129321
Approved by: https://github.com/shunting314
ghstack dependencies: #126417, #129025, #129026, #127506, #129036, #129320
2024-06-29 14:06:28 +00:00
Jason Ansel
a18eb651d3
[halide-backend] Disable split reductions for Halide (#129320)
In theory Halide doesn't need the split-reduction handling we do for Triton, since it can generate multiple kernels itself.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/129320
Approved by: https://github.com/shunting314, https://github.com/eellison
ghstack dependencies: #126417, #129025, #129026, #127506, #129036
2024-06-29 14:06:28 +00:00
Jason Ansel
4cb8cb04a7
[halide-backend] Enable bfloat16 support (#129036)
Requires https://github.com/halide/Halide/pull/8255
Pull Request resolved: https://github.com/pytorch/pytorch/pull/129036
Approved by: https://github.com/shunting314, https://github.com/eellison
ghstack dependencies: #126417, #129025, #129026, #127506
2024-06-29 14:06:25 +00:00
Jason Ansel
b93bf55b6a
[halide-backend] Add GPU support (#127506)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/127506
Approved by: https://github.com/shunting314, https://github.com/eellison
ghstack dependencies: #126417, #129025, #129026
2024-06-29 14:06:21 +00:00
Jason Ansel
86cadc6385
[halide-backend] Dimension-based indexing (#129026)
Prior to this, the generated Halide code was a rather literal translation of the Triton code, with XBLOCK/YBLOCK/RBLOCK and 1D inputs. Halide prefers dimensions, and this 1D indexing triggers a lot of bugs and perf issues. This PR infers dimensions and changes the indexing in the generated code.
Before
Before
```py
@hl.generator(name="kernel")
class Kernel:
    in_ptr0 = hl.InputBuffer(hl.Float(32), 1)
    out_ptr3 = hl.OutputBuffer(hl.Float(32), 2)

    def generate(g):
        in_ptr0 = g.in_ptr0
        out_ptr3 = g.out_ptr3
        xindex = hl.Var('xindex')
        rindex = hl.Var('rindex')
        r1 = rindex
        x0 = xindex
        idom = hl.RDom([hl.Range(0, 16), hl.Range(0, 32)])
        odom = hl.RDom([hl.Range(0, 16)])
        rdom = hl.RDom([hl.Range(0, 32)])
        xindex_idom = idom.x
        xindex_odom = odom.x
        rindex_idom = idom.y
        r1_idom = rindex_idom
        x0_idom = xindex_idom
        x0_odom = xindex_odom
        tmp0 = hl.Func('tmp0')
        tmp0[rindex, xindex] = in_ptr0[r1 + (32*x0)]
        tmp1 = hl.Func('tmp1')
        tmp1[xindex] = hl.maximum(rdom, tmp0[rdom, xindex])
        tmp2 = hl.Func('tmp2')
        tmp2[rindex, xindex] = tmp0[rindex, xindex] - tmp1[xindex]
        tmp3 = hl.Func('tmp3')
        tmp3[rindex, xindex] = hl.fast_exp(hl.cast(hl.Float(32), tmp2[rindex, xindex])) if tmp2.type().bits() <= 32 else hl.exp(tmp2[rindex, xindex])
        tmp4 = hl.Func('tmp4')
        tmp4[xindex] = hl.sum(rdom, tmp3[rdom, xindex])
        tmp5 = hl.Func('tmp5')
        tmp5[rindex, xindex] = tmp3[rindex, xindex] / tmp4[xindex]
        out_ptr3_i0 = hl.Var('out_ptr3_i0')
        out_ptr3_i1 = hl.Var('out_ptr3_i1')
        out_ptr3[out_ptr3_i0, out_ptr3_i1] = hl.cast(out_ptr3.type(), tmp5[out_ptr3_i0, out_ptr3_i1])
        assert g.using_autoscheduler()
        in_ptr0.set_estimates([hl.Range(0, 512)])
        out_ptr3.set_estimates([hl.Range(0, 32), hl.Range(0, 16)])
```
After
```py
@hl.generator(name="kernel")
class Kernel:
    in_ptr0 = hl.InputBuffer(hl.Float(32), 2)
    out_ptr3 = hl.OutputBuffer(hl.Float(32), 2)

    def generate(g):
        in_ptr0 = g.in_ptr0
        out_ptr3 = g.out_ptr3
        h0 = hl.Var('h0')
        h1 = hl.Var('h1')
        rdom = hl.RDom([hl.Range(0, 32)])
        hr1 = rdom[0]
        tmp0 = hl.Func('tmp0')
        tmp0[h0, h1] = in_ptr0[h0, h1,]
        tmp1 = hl.Func('tmp1')
        tmp1[h1] = hl.maximum(rdom, tmp0[hr1, h1])
        tmp2 = hl.Func('tmp2')
        tmp2[h0, h1] = tmp0[h0, h1] - tmp1[h1]
        tmp3 = hl.Func('tmp3')
        tmp3[h0, h1] = hl.fast_exp(hl.cast(hl.Float(32), tmp2[h0, h1])) if tmp2.type().bits() <= 32 else hl.exp(tmp2[h0, h1])
        tmp4 = hl.Func('tmp4')
        tmp4[h1] = hl.sum(rdom, tmp3[hr1, h1])
        tmp5 = hl.Func('tmp5')
        tmp5[h0, h1] = tmp3[h0, h1] / tmp4[h1]
        out_ptr3[h0, h1,] = hl.cast(hl.Float(32), tmp5[h0, h1])
        assert g.using_autoscheduler()
        in_ptr0.dim(0).set_min(0)
        in_ptr0.dim(0).set_stride(1)
        in_ptr0.dim(0).set_extent(32)
        in_ptr0.dim(1).set_min(0)
        in_ptr0.dim(1).set_stride(32)
        in_ptr0.dim(1).set_extent(16)
        in_ptr0.set_estimates([hl.Range(0, 32), hl.Range(0, 16)])
        out_ptr3.set_estimates([hl.Range(0, 32), hl.Range(0, 16)])
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/129026
Approved by: https://github.com/shunting314, https://github.com/eellison
ghstack dependencies: #126417, #129025
2024-06-29 14:06:16 +00:00
Jason Ansel
da5f37515e
[halide-backend] Generate standalone runtime (#129025)
This puts the Halide runtime in a global shared object rather than copying it into each kernel. Having many copies of the runtime causes many issues with CUDA.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/129025
Approved by: https://github.com/shunting314, https://github.com/eellison
ghstack dependencies: #126417
2024-06-29 14:06:12 +00:00
Jason Ansel
e34b7e6af3
[halide-backend] Initial implementation of HalideKernel and HalideScheduling (#126417)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/126417
Approved by: https://github.com/shunting314, https://github.com/eellison
2024-06-29 14:06:08 +00:00
PyTorch MergeBot
1a54bb0f96
Revert "[halide-backend] Initial implementation of HalideKernel and HalideScheduling ( #126417 )"
...
This reverts commit 4f9399bd0d .
Reverted https://github.com/pytorch/pytorch/pull/126417 on behalf of https://github.com/fbgheith due to breaking internal builds ([comment](https://github.com/pytorch/pytorch/pull/126417#issuecomment-2186999121 ))
2024-06-24 16:50:15 +00:00
PyTorch MergeBot
063facf352
Revert "[halide-backend] Generate standalone runtime ( #129025 )"
...
This reverts commit 10c64c3b49 .
Reverted https://github.com/pytorch/pytorch/pull/129025 on behalf of https://github.com/fbgheith due to breaking internal builds ([comment](https://github.com/pytorch/pytorch/pull/129025#issuecomment-2186995467 ))
2024-06-24 16:47:25 +00:00
Jason Ansel
10c64c3b49
[halide-backend] Generate standalone runtime (#129025)
This puts the Halide runtime in a global shared object rather than copying it into each kernel. Having many copies of the runtime causes many issues with CUDA.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/129025
Approved by: https://github.com/shunting314, https://github.com/eellison
ghstack dependencies: #126417
2024-06-22 17:39:52 +00:00
Jason Ansel
4f9399bd0d
[halide-backend] Initial implementation of HalideKernel and HalideScheduling (#126417)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/126417
Approved by: https://github.com/shunting314, https://github.com/eellison
2024-06-22 17:39:52 +00:00