Commit Graph

28 Commits

Author SHA1 Message Date
Aaron Gokaslan
3d82d8d0ed [BE] Enable more flake8-comprehensions checks (#94601)
I applied some flake8 fixes and enabled checking for them in the linter. I also enabled some checks for my previous comprehensions PR.

This is a follow-up to #94323: it enables the flake8 checkers for the fixes I made there and fixes a few more of them.
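
For illustration, a hedged sketch of the kinds of rewrites flake8-comprehensions flags; these snippets are generic examples, not code from this PR:

```python
pairs = [("a", 1), ("b", 2)]

# C400/C401: unnecessary generator passed to list()/set()
squares = list(x * x for x in range(10))  # flagged
squares = [x * x for x in range(10)]      # fixed

# C402: unnecessary generator passed to dict()
mapping = dict((k, v) for k, v in pairs)  # flagged
mapping = {k: v for k, v in pairs}        # fixed

# C408: unnecessary dict() call with no arguments
empty = dict()  # flagged
empty = {}      # fixed
```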

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94601
Approved by: https://github.com/ezyang
2023-02-10 23:40:29 +00:00
Jason Ansel
24ae50bcc7 Add config option to reduce warnings in inductor (#94413)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94413
Approved by: https://github.com/ezyang
2023-02-10 15:44:15 +00:00
Edward Z. Yang
dc70b00d0b Track and record hint on SymNode and use when possible (#94201)
Historically, we worked out `size_hint` on the fly by substituting the `var_to_val` mapping into the sympy expression. With this change, we also maintain the hint directly on the SymNode (in `expr._hint`) and use it instead of sympy substitution whenever it is available (mostly for guards on SymInt and the like; in idiomatic Inductor code we typically manipulate sympy expressions directly and so have no convenient way to maintain hints).

While it's possible this will give us modest performance improvements, this is not the point of this PR; the goal is to make it easier to carefully handle unbacked SymInts, where hints are expected not to be available. You can now easily test if a SymInt is backed or not by checking `symint.node.hint is None`.
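
A minimal sketch of that check; the function name is an assumption, while the `symint.node.hint` attribute is the one named above:

```python
import torch

def is_backed(symint: torch.SymInt) -> bool:
    # Backed SymInts carry a concrete hint; unbacked ones
    # (e.g. from data-dependent ops) have hint = None.
    return symint.node.hint is not None
```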

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94201
Approved by: https://github.com/voznesenskym
2023-02-09 00:00:44 +00:00
Will Constable
f2156ef42b Make triton debug util reusable (#94225)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94225
Approved by: https://github.com/Chillee
2023-02-08 22:03:35 +00:00
chunyuan
cff4d3bb22 inductor: fix convert_shape_to_symint (#93349)
Fixes https://github.com/pytorch/pytorch/issues/93833.

When `lst` is a mix of static shapes and `sympy.Expr`, convert the static shapes to ints and the `sympy.Expr` elements to SymInts.
The old logic required every element of `lst` to be static before it could convert them to ints.
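
A hedged sketch of the per-element conversion described above; the function signature and the `to_symint` hook are assumptions, not the actual Inductor code:

```python
import sympy

def convert_shape_to_symint(lst, to_symint):
    # Convert each element independently: static shapes become ints,
    # sympy expressions go through `to_symint` (in Inductor this would
    # involve the graph's ShapeEnv). The old logic bailed out unless
    # every element was static.
    return [to_symint(e) if isinstance(e, sympy.Expr) else int(e) for e in lst]
```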

Pull Request resolved: https://github.com/pytorch/pytorch/pull/93349
Approved by: https://github.com/jgong5, https://github.com/jansel
2023-02-02 07:34:57 +00:00
Horace He
19c9b09449 Replace IndexingDiv with FloorDiv in Inductor (#92878)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/92878
Approved by: https://github.com/ezyang
2023-01-24 15:06:22 +00:00
Horace He
20bf77f9bd Fixed virtualized import and typing rule (#92774)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/92774
Approved by: https://github.com/Skylion007, https://github.com/ezyang
2023-01-22 22:19:40 +00:00
Horace He
5c4f0fd72c Change convolution to use symbolic shapes for propagation (#92397)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/92397
Approved by: https://github.com/ezyang
2023-01-21 21:54:24 +00:00
Horace He
4f4b62e4a2 some fixes to get symbolic shapes working through inductor (#92320)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/92320
Approved by: https://github.com/ezyang
2023-01-19 03:09:02 +00:00
Jason Ansel
9b173b87b2 Refactor away leftover import indirection (#92188)
These indirect imports are a leftover from when we wanted to support both `import torchdynamo` and `import torch._dynamo`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/92188
Approved by: https://github.com/desertfire
2023-01-18 04:53:05 +00:00
Jason Ansel
7c1c239db1 [inductor] Rewrite Triton templates + epilogue fusion (retry) (#91575)
This reverts commit 94262efc7d to reland #91105 / #90738.

Fixes https://github.com/pytorch/torchdynamo/issues/2015

Pull Request resolved: https://github.com/pytorch/pytorch/pull/91575
Approved by: https://github.com/ngimel
2023-01-11 00:08:03 +00:00
PyTorch MergeBot
94262efc7d Revert "[inductor] Rewrite Triton templates + epilogue fusion (retry) (#91105)"
This reverts commit d6dd2e97da.

Reverted https://github.com/pytorch/pytorch/pull/91105 on behalf of https://github.com/atalman due to Broke internal builds
2022-12-21 00:02:38 +00:00
Jason Ansel
d6dd2e97da [inductor] Rewrite Triton templates + epilogue fusion (retry) (#91105)
https://github.com/pytorch/pytorch/pull/90738 seems a bit borked: ghimport fails on it, and although I unlinked it from the Phabricator diff, it still won't land. This is an exact copy of that PR without using ghstack.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/91105
Approved by: https://github.com/ngimel
2022-12-20 02:38:23 +00:00
Peter Bell
81f351acd7 [inductor] Prevent blowup in inner_fn_str and extract_read_writes (#88933)
Currently the default `ops` handler expects strings as arguments and
just formats them into a function call template string. For complex
expressions, this can lead to exponential growth in terms. Say for
example you have:

```python
def fn(a):
    for _ in range(3):
        a = ops.mul(a, a)
    return a
```

You might expect `inner_fn_str` to contain 1 load and 3 multiplies,
but instead you find 8 loads and 7 multiplies:
```python
load(arg_0, i0) * load(arg_0, i0) * load(arg_0, i0) * load(arg_0, i0) * load(arg_0, i0) * load(arg_0, i0) * load(arg_0, i0) * load(arg_0, i0)
```

This type of blowup is present in the lowering for
`max_pool2d_with_indices_backward`, which pytorch/torchdynamo#1352
reported to hang the entire compilation.

This PR fixes the issue by formatting the string as a series of assignments to
variables, so for the example above, we now get:
```
tmp0 = load(arg_0, i0)
tmp1 = tmp0 * tmp0
tmp2 = tmp1 * tmp1
tmp3 = tmp2 * tmp2
return tmp3
```

This corresponds to the sequence of `ops` calls made.
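
A hedged, self-contained sketch of this formatting scheme; the class and method names are made up, not the actual Inductor handler:

```python
import itertools

class AssignmentFormatter:
    """Emit one named assignment per ops call, so repeated
    subexpressions are referenced by variable instead of inlined."""

    def __init__(self):
        self.lines = []
        self._counter = itertools.count()

    def _emit(self, expr):
        var = f"tmp{next(self._counter)}"
        self.lines.append(f"{var} = {expr}")
        return var

    def load(self, buf, index):
        return self._emit(f"load({buf}, {index})")

    def mul(self, a, b):
        return self._emit(f"{a} * {b}")

ops = AssignmentFormatter()
a = ops.load("arg_0", "i0")
for _ in range(3):
    a = ops.mul(a, a)
print("\n".join(ops.lines) + f"\nreturn {a}")
# tmp0 = load(arg_0, i0)
# tmp1 = tmp0 * tmp0
# tmp2 = tmp1 * tmp1
# tmp3 = tmp2 * tmp2
# return tmp3
```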

Pull Request resolved: https://github.com/pytorch/pytorch/pull/88933
Approved by: https://github.com/jansel
2022-12-15 15:36:52 +00:00
Andrew M. James
7a7f29704f Remove hard numpy dep introduced by _inductor/utils.py (#90716)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90716
Approved by: https://github.com/cpuhrsch
2022-12-13 04:58:26 +00:00
Natalia Gimelshein
a88400e0cc pad low precision matmuls when requested (#90235)
Matmul padding is beneficial not only for fp32; fp16/bf16 with amp can benefit as well.
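
A hedged sketch of the idea; the multiple-of-8 target and helper name are assumptions, and the actual Inductor heuristic also weighs the cost of the extra copies:

```python
import torch
import torch.nn.functional as F

def pad_mm(a, b, multiple=8):
    # Zero-pad the shared K dimension up to a multiple of 8: zeros leave
    # the product unchanged, and aligned shapes let fp16/bf16 matmuls
    # hit tensor-core-friendly kernels.
    pad = -a.shape[1] % multiple
    if pad:
        a = F.pad(a, (0, pad))        # (M, K) -> (M, K + pad)
        b = F.pad(b, (0, 0, 0, pad))  # (K, N) -> (K + pad, N)
    return a @ b

x, y = torch.randn(33, 37), torch.randn(37, 65)
assert torch.allclose(pad_mm(x, y), x @ y, atol=1e-5)
```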

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90235
Approved by: https://github.com/jiawenliu64
2022-12-06 04:13:24 +00:00
Animesh Jain
d09c52e4fd [inductor] Deterministic kernel names (#89713)
`node.origins` is a set and does not have a deterministic order. As a result, inductor runs with and without cudagraphs generate different kernel names, making debugging hard.
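
A minimal sketch of the nondeterminism and the kind of fix that makes names stable; the helper name and prefix are hypothetical:

```python
# Set iteration order for strings varies between processes (hash
# randomization), so naming a kernel by iterating node.origins
# directly is nondeterministic across runs.
origins = {"convolution", "relu", "add"}

def kernel_name(origins):
    # Hypothetical fix: sort before joining so the name is reproducible.
    return "triton_" + "_".join(sorted(origins))

print(kernel_name(origins))  # triton_add_convolution_relu
```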

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89713
Approved by: https://github.com/soumith, https://github.com/mlazos, https://github.com/ngimel
2022-12-02 02:37:36 +00:00
Natalia Gimelshein
a188f05e8c Reland #89031 Added conv constraint that infers layouts (#89530)
Relands #89031
Per title: we now set strides from the fx graph only for convolutions and mm. This is a hack, but bmm in some cases caused an extra copy, and there is no obvious way to fix that; we should rethink strides anyway.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89530
Approved by: https://github.com/Chillee
2022-11-23 20:18:54 +00:00
Horace He
419ef2cdcf Added utility to count memory reads/written in Inductor (#89203)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89203
Approved by: https://github.com/jansel, https://github.com/ngimel
2022-11-19 04:18:26 +00:00
Jiawen Liu
55b88cde0a [Inductor] Build Shape Padding in Inductor (#88709)
Summary: Build shape padding for matmul/bmm/addmm in Inductor

Differential Revision: D41071282

Pull Request resolved: https://github.com/pytorch/pytorch/pull/88709
Approved by: https://github.com/bertmaher, https://github.com/Chillee
2022-11-15 03:10:36 +00:00
Michael Lazos
c1553880de Have kernel names include fused ops (#88624)
- Propagates origin fx nodes through inlining during lowering
- Concatenates op names into the kernel name (see the sketch below)
- Adds a config to cap the number of ops in the kernel name so names don't get too long

Caveats:
- The ordering in the name may not match the order that the ops are executed in the kernel
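
A hedged sketch of the concatenation-plus-cap scheme referenced above; the cap value and prefix are assumptions:

```python
def fused_kernel_name(op_names, max_ops=4):
    # Join originating op names and cap how many appear, so fused-kernel
    # names stay readable even for large fusion groups.
    return "triton_" + "_".join(op_names[:max_ops])

print(fused_kernel_name(["convolution", "add", "relu", "sigmoid", "mul"]))
# triton_convolution_add_relu_sigmoid
```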

Pull Request resolved: https://github.com/pytorch/pytorch/pull/88624
Approved by: https://github.com/anijain2305, https://github.com/jansel
2022-11-10 21:38:06 +00:00
Elias Ellison
2381548071 add stride constraints to fallbacks (#88534)
Add stride/contiguity constraints to fallbacks so that inputs will be in the right stride permutation for the fallback kernel.

Improves perf of coat_lite_mini from 1.484x to 2.011x.
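
A hedged illustration of the kind of constraint being added; `torch.channels_last` here is a generic example, not the PR's actual mechanism:

```python
import torch

def constrain_input(x, memory_format=torch.channels_last):
    # Realize a copy in the stride permutation the fallback kernel expects,
    # rather than letting the kernel take a slow path on a mismatched layout.
    if x.is_contiguous(memory_format=memory_format):
        return x
    return x.contiguous(memory_format=memory_format)

x = torch.randn(8, 3, 32, 32)  # NCHW-contiguous
y = constrain_input(x)
print(y.stride())  # channels-last strides: (3072, 1, 96, 3)
```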

Pull Request resolved: https://github.com/pytorch/pytorch/pull/88534
Approved by: https://github.com/ngimel
2022-11-10 01:13:44 +00:00
Animesh Jain
d67b2edec3 [dynamo][dashboard] minor fixes for a clean Dashboard (#88056)
* Better check for cold-start latency
* Sort on the inductor column for better readability.

cc @mlazos @soumith @voznesenskym @yanboliang @penguinwu @EikanWang @jgong5 @Guobing-Chen @chunyuan-w @XiaobingSuper @zhuhaozhe @blzheng @Xia-Weiwen @wenzhe-nrv @jiayisunx
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88056
Approved by: https://github.com/ngimel
2022-10-31 02:30:29 +00:00
Animesh Jain
1b575782a0 [dynamo][benchmarks] use fresh inductor cache and raise batch size wherever possible (#88044)
cc @mlazos @soumith @voznesenskym @yanboliang @penguinwu @EikanWang @jgong5 @Guobing-Chen @chunyuan-w @XiaobingSuper @zhuhaozhe @blzheng @Xia-Weiwen @wenzhe-nrv @jiayisunx
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88044
Approved by: https://github.com/ngimel
2022-10-30 17:10:17 +00:00
Horace He
2418ddb1ec Unified symbolic shape variables between Inductor and AOTDispatcher (#87161)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87161
Approved by: https://github.com/jansel
2022-10-19 04:50:34 +00:00
Zachary DeVito
d36c284d14 [triton] allow cuda properties to be queried from workers (#87101)
Fixes https://github.com/pytorch/pytorch/pull/87048 by saving the needed properties before fork.

Actually getting CUDA to load in the workers is probably not desired: CUDA initialization takes O(seconds), and having multiple processes use the same device will slow things down.

This just moves the needed properties from the main trainer process to the workers.
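
A minimal sketch of the pattern, assuming fork-based workers and an available CUDA device; `torch.cuda.get_device_properties` is a real API, the rest is illustrative:

```python
import multiprocessing as mp
import torch

def worker(props):
    # The worker never initializes CUDA; it only reads plain data
    # captured in the parent before the fork.
    print(props["name"], props["total_memory"])

if __name__ == "__main__":
    p = torch.cuda.get_device_properties(0)  # CUDA init happens here, once
    props = {"name": p.name, "total_memory": p.total_memory}
    ctx = mp.get_context("fork")  # fork context, as on Linux trainer setups
    proc = ctx.Process(target=worker, args=(props,))
    proc.start()
    proc.join()
```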

Pull Request resolved: https://github.com/pytorch/pytorch/pull/87101
Approved by: https://github.com/soumith
2022-10-18 04:48:29 +00:00
Jason Ansel
054a2fd6c2 Sync changes from pytorch/torchdynamo (#87013)
This updates to:
6380959be2

Generated with:
https://github.com/pytorch/torchdynamo/blob/main/copy_to_core.sh
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87013
Approved by: https://github.com/voznesenskym
2022-10-15 21:00:57 +00:00
Jason Ansel
c7c09722ad Move TorchDynamo into PyTorch core (#86461)
Context:
https://github.com/pytorch/torchdynamo/issues/1588

This PR moves [TorchDynamo](https://github.com/pytorch/torchdynamo) and TorchInductor into PyTorch core.
- `torchdynamo` becomes `torch._dynamo`
- `torchinductor` becomes `torch._inductor`

This PR was generated by running `copy_to_core.sh` in https://github.com/pytorch/torchdynamo/pull/1538

Pull Request resolved: https://github.com/pytorch/pytorch/pull/86461
Approved by: https://github.com/voznesenskym
2022-10-13 23:18:06 +00:00