Commit Graph

17 Commits

Edward Z. Yang
cad79bd0bb Remove follow_imports = skip from sympy (#118469)
dmypy silently ignores follow_imports = skip, so to get parity between
dmypy and mypy we have to suck it up and type: ignore all of the sympy
typing problems.

The suppressions were added automatically with the following script generated by GPT-4:

```python
import re

# Read the error file
with open("error_file.txt", "r") as f:
    errors = f.readlines()

# Parse the lines with errors and error types
error_lines = {}
for error in errors:
    match = re.match(r"(.*):(\d+):\d+: error:.*\[(.*)\]", error)
    if match:
        file_path, line_number, error_type = match.groups()
        if file_path not in error_lines:
            error_lines[file_path] = {}
        error_lines[file_path][int(line_number)] = error_type

# Insert ignore comments in the source files
for file_path, lines in error_lines.items():
    with open(file_path, "r") as f:
        code = f.readlines()
    for line_number, error_type in sorted(lines.items(), key=lambda x: x[0], reverse=True):
        code[line_number - 1] = code[line_number - 1].rstrip() + f"  # type: ignore[{error_type}]\n"
    with open(file_path, "w") as f:
        f.writelines(code)
```
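
As a quick sanity check, the regex above matches a typical mypy error line; the path and error code below are illustrative, not taken from the actual run:

```python
import re

# Illustrative mypy error line (not from the real error_file.txt):
sample = "sympy/core/add.py:42:13: error: Incompatible return value type  [return-value]"
match = re.match(r"(.*):(\d+):\d+: error:.*\[(.*)\]", sample)
print(match.groups())  # ('sympy/core/add.py', '42', 'return-value')
```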

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/118469
Approved by: https://github.com/Skylion007
ghstack dependencies: #118414, #118418, #118432, #118467, #118468
2024-01-28 13:38:38 +00:00
vfdev-5
85aa372374 [inductor] Fixed conv issue with dynamic shapes (#114351)
EDIT: fixes https://github.com/pytorch/pytorch/issues/114354

Description:
The following code is failing:
```python
import torch

def func(x, w):
    return torch.nn.functional.conv2d(x, w, groups=int(w.shape[0]))

x = torch.rand(1, 3, 64, 64)
w = torch.rand(3, 1, 3, 3)
y1 = func(x, w)
cfunc = torch.compile(func, fullgraph=True, dynamic=True)
y2 = cfunc(x, w)

torch.testing.assert_close(y1, y2)
```
with the error:
```
  File "/pytorch/torch/_inductor/kernel/conv.py", line 315, in convolution
    assert isinstance(groups, int)
torch._dynamo.exc.BackendCompilerFailed: backend='inductor' raised:
LoweringException: AssertionError:
  target: aten.convolution.default
  args[0]: TensorBox(StorageBox(
    InputBuffer(name='arg3_1', layout=FixedLayout('cpu', torch.float32, size=[1, s0, s1, s1], stride=[s0*s1**2, s1**2, s1, 1]))
  ))
  args[1]: TensorBox(StorageBox(
    InputBuffer(name='arg1_1', layout=FixedLayout('cpu', torch.float32, size=[s0, 1, s0, s0], stride=[s0**2, s0**2, s0, 1]))
  ))
  args[2]: None
  args[3]: [1, 1]
  args[4]: [0, 0]
  args[5]: [1, 1]
  args[6]: False
  args[7]: [0, 0]
  args[8]: s0
```
where the `groups` argument is a symbol but is expected to be an `int`.

This PR specializes `groups` to its concrete int value, which fixes the problem.
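
A minimal sketch of that specialization, assuming the `V.graph.sizevars` evaluate helper used elsewhere in inductor (helper names assumed, not the exact diff):

```python
# Hedged sketch of the specialization in torch/_inductor/kernel/conv.py
# (assumed helper names, not the exact change).
from torch._inductor.virtualized import V

def _specialize_groups(groups):
    # Under dynamic shapes `groups` can arrive as a SymPy expression; force it
    # to a concrete Python int before the lowering asserts isinstance(groups, int).
    if not isinstance(groups, int):
        groups = V.graph.sizevars.evaluate_static_shape(groups)
    assert isinstance(groups, int)
    return groups
```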

Context: Failing tests in torchvision with gaussian blur and adjust_sharpness ops
- https://github.com/pytorch/vision/actions/runs/6955843968/job/18926393710?pr=8127

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114351
Approved by: https://github.com/ezyang
2023-11-23 13:13:06 +00:00
Jez Ng
c77dd684c9 Enable typechecking in _inductor/ir.py (#110112)
I used a number of `type: ignore` comments, mostly due to
https://github.com/pytorch/pytorch/issues/109963.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110112
Approved by: https://github.com/peterbell10
2023-10-07 04:19:38 +00:00
Ying Zhang
097fd43f8c [Inductor CUTLASS backend] Step 4: CUDA (template) kernels (#107931)
This is step 4 of adding CUTLASS as an alternative Inductor backend.
Full tests can be found in the last PR in the stack.

Feature request: https://github.com/pytorch/pytorch/issues/106991.
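
For context, a hedged usage sketch of opting into the CUTLASS backend once the full stack landed; the config knobs here are assumptions based on later Inductor versions, not part of this PR:

```python
import torch
import torch._inductor.config as inductor_config

# Assumed knob: include CUTLASS among the GEMM autotuning backends.
inductor_config.max_autotune_gemm_backends = "CUTLASS,Triton,ATen"

@torch.compile(mode="max-autotune")
def mm(a, b):
    return a @ b

a = torch.randn(1024, 1024, device="cuda", dtype=torch.float16)
b = torch.randn(1024, 1024, device="cuda", dtype=torch.float16)
out = mm(a, b)
```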

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107931
Approved by: https://github.com/aakhundov, https://github.com/jansel, https://github.com/kadeng
ghstack dependencies: #107802, #107847, #107901
2023-09-12 17:44:38 +00:00
Jack Taylor
a18ee0c6ec [ROCm] ROCm compatible configs for triton kernels (#107584)
This PR brings in a few inductor changes required for ROCm.

~~**1 - Introduction of a toggle for enforced channels-last convolution fallbacks**~~
This addition was split off into its own PR after some cleanup by @pragupta: https://github.com/pytorch/pytorch/pull/107812

**2 - Addition of ROCm specific block sizes**
We are now able to support MAX_AUTOTUNE mode on ROCm, so we are proposing conditions that let us fine-tune our own block sizes. Currently, Triton on ROCm does not benefit from pipelining, so we set all configs to `num_stages=1` and have removed some upstream tunings on ROCm to avoid running out of shared memory.

In the future we will provide more optimised tunings for ROCm, but for now this should mitigate these issues; a rough sketch of the gating is shown below.
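
The gating might look roughly like this (block values and structure are illustrative, not the exact tunings added in this PR):

```python
import torch
import triton

def conv_platform_configs():
    if torch.version.hip is not None:  # running on ROCm
        # Pipelining currently gives no benefit on ROCm, so keep num_stages=1
        # and use smaller blocks to avoid exhausting shared memory.
        return [
            triton.Config({"BLOCK_M": 64, "BLOCK_N": 64, "BLOCK_K": 32}, num_stages=1, num_warps=4),
            triton.Config({"BLOCK_M": 128, "BLOCK_N": 64, "BLOCK_K": 32}, num_stages=1, num_warps=8),
        ]
    return [
        triton.Config({"BLOCK_M": 128, "BLOCK_N": 128, "BLOCK_K": 32}, num_stages=2, num_warps=8),
    ]
```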

~~**3 - Addition of device_type to Triton's compile_meta**~~
~~Proposing this addition to `triton_heuristics.py`: Triton on ROCm requires `device_type` to be set to `hip` (https://github.com/ROCmSoftwarePlatform/triton/pull/284), so we suggest bringing the change in here to pass the correct device type down to Triton.~~
This change is split off and will arrive in the wheel update PR https://github.com/pytorch/pytorch/pull/107600, leaving this PR to focus on the ROCm-specific block sizes.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107584
Approved by: https://github.com/jithunnair-amd, https://github.com/jansel, https://github.com/eellison
2023-08-26 18:24:55 +00:00
Jez Ng
9c9982a0aa Turn on typechecking for _inductor/kernel/conv.py (#106258)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/106258
Approved by: https://github.com/Skylion007
ghstack dependencies: #106252
2023-08-18 08:49:18 +00:00
eellison
8298720299 Enable Lowering Channels last Conv1x1 when max autotune is set (#107004)
This can lead to a large speedup when max autotune is set, e.g. ResNet goes from 2.1x to 2.5x, particularly in combination with freezing.
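
A hedged usage sketch of the path this exercises; the exact flags (`freezing`, `mode="max-autotune"`) are assumptions about how one would enable it, not code from this PR:

```python
import torch
import torch._inductor.config as inductor_config

inductor_config.freezing = True  # assumed knob: freeze weights for inference

conv1x1 = torch.nn.Conv2d(64, 128, kernel_size=1).eval().cuda()
conv1x1 = conv1x1.to(memory_format=torch.channels_last)
x = torch.randn(8, 64, 56, 56, device="cuda").to(memory_format=torch.channels_last)

compiled = torch.compile(conv1x1, mode="max-autotune")
with torch.no_grad():
    out = compiled(x)
```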

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107004
Approved by: https://github.com/jansel, https://github.com/shunting314, https://github.com/int3
ghstack dependencies: #106911, #106912
2023-08-17 16:05:32 +00:00
Edward Z. Yang
a01a732954 Rename some sizevars methods for clarity (#105585)
The guard functions require you to ALREADY KNOW that a particular
condition holds.  If you don't know (you want to guard on an expression
being a particular value, and then get access to that value), use
the evaluate functions.

I renamed the functions that don't abide by this:

```
guard_min -> evaluate_min
guard_max (deleted, no uses)
guard_static_shape -> evaluate_static_shape
guard_static_shapes -> evaluate_static_shapes
```

Also added some comments.
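
A hedged sketch of the intended split, with helper names assumed from `torch/_inductor/sizevars.py`:

```python
# Illustrative only: how a lowering might use the two families of helpers.
from torch._inductor.virtualized import V

def example(size_expr, other_expr):
    sizevars = V.graph.sizevars
    # evaluate_*: you do NOT know the value yet; this records a guard and
    # returns the concrete value so it can be used as a plain Python int.
    static_size = sizevars.evaluate_static_shape(size_expr)
    # guard_*: you already KNOW the relationship holds and only want the
    # corresponding guard installed (assumed helper: guard_equals).
    sizevars.guard_equals(size_expr, other_expr)
    return static_size
```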

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/105585
Approved by: https://github.com/voznesenskym
2023-07-21 04:46:23 +00:00
chunyuan
d61cd03b97 Inductor cpp wrapper: support ConvTranspose and fix Convolution ir (#103308)
The changes in this PR include:
- Support ConvTranspose in cpp wrapper
- Fix cpp wrapper support for aten convolution when bias is not `None`: bias is passed in `args` instead of `kwargs` when it is not `None`. The change is covered by the ConvTranspose dynamic-shape UTs, since we fall back to aten convolution in dynamic-shape cases.
- Fix cpp wrapper support for `inf`. This covers a UT added in https://github.com/pytorch/pytorch/issues/101865; the cpp wrapper UT is `test_conv2d_unary` in `test_cpp_wrapper.py`. It's in the `slowTest` category and does not seem to have been run in the CI of that PR.

I will submit another PR to remove the hard-coded schema in these `ExternKernel`s.
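
A hedged repro-style sketch of the first item above, assuming the C++ wrapper is toggled via `torch._inductor.config.cpp_wrapper`:

```python
import torch
import torch._inductor.config as inductor_config

inductor_config.cpp_wrapper = True  # assumed toggle for the cpp wrapper codegen

m = torch.nn.ConvTranspose2d(3, 8, kernel_size=3, bias=True).eval()
x = torch.randn(1, 3, 16, 16)

compiled = torch.compile(m)
torch.testing.assert_close(compiled(x), m(x))
```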

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103308
Approved by: https://github.com/jgong5, https://github.com/desertfire
2023-06-10 03:53:05 +00:00
Shunting Zhang
86c7652503 [inductor] layout optimization for conv (#99773)
A convolution kernel with channels-last inputs runs much faster than one with contiguous inputs. This PR leverages that to optimize tensor layouts so that we provide channels-last inputs to convolution. Some care needs to be taken to not convert tensor layouts between contiguous and channels-last back and forth, since those extra copies hurt performance quite a bit.
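
For illustration, a hedged sketch (not code from the PR) of the layout the optimization targets:

```python
import torch

# Channels-last weights and inputs let the convolution run in its fast layout
# without inserting contiguous <-> channels-last copies around it.
conv = torch.nn.Conv2d(64, 64, kernel_size=3, padding=1).cuda()
conv = conv.to(memory_format=torch.channels_last)
x = torch.randn(32, 64, 56, 56, device="cuda").to(memory_format=torch.channels_last)

y = torch.compile(conv)(x)
print(y.is_contiguous(memory_format=torch.channels_last))  # expected: True
```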

Latest perf number [here](https://hud.pytorch.org/benchmark/compilers?startTime=Wed%2C%2024%20May%202023%2023%3A40%3A37%20GMT&stopTime=Wed%2C%2031%20May%202023%2023%3A40%3A37%20GMT&granularity=hour&suite=torchbench&mode=training&dtype=amp&lBranch=shunting-layout-opt-19&lCommit=baa797fc100688dfb044fbcbdebcfd2591710f78&rBranch=main&rCommit=999bae0f54108ffc5b7cf2524a02a83901554b16)
- TB: 1.64x -> 1.69x
- HF: 1.79x -> 1.78x (random noise)
- TIMM: 1.51x -> 1.65x

Right now we disable layout optimization for dynamic shapes since there is a perf loss in that combination. Here is a GH issue to follow up: https://github.com/pytorch/pytorch/issues/102670

Pull Request resolved: https://github.com/pytorch/pytorch/pull/99773
Approved by: https://github.com/jansel
2023-06-02 21:08:18 +00:00
Edward Z. Yang
b94f143ace SymIntify convNd and conv_transposeNd, fix inductor symint handling (#101488)
Fixes https://github.com/pytorch/pytorch/issues/101014
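
A hedged smoke-test sketch of the dynamic-shape path this touches (not a test from the PR):

```python
import torch

def f(x, w):
    # Transposed conv exercised under dynamic shapes so sizes flow through as SymInts.
    return torch.nn.functional.conv_transpose2d(x, w, stride=2, padding=1)

x = torch.randn(1, 4, 8, 8)
w = torch.randn(4, 8, 3, 3)  # (in_channels, out_channels, kH, kW) for conv_transpose2d
torch.testing.assert_close(torch.compile(f, dynamic=True)(x, w), f(x, w))
```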

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/101488
Approved by: https://github.com/ngimel
2023-05-16 17:46:52 +00:00
Michael Voznesensky
a0934f8bad Replace maybe_guard with statically_known (#99383)
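
For context, a hedged sketch of the naming shift, with helper names assumed from `torch/_inductor/sizevars.py`: the `statically_known_*` helpers answer a question without installing a new guard, returning False when the answer cannot be proven from what is already known.

```python
# Illustrative only; not code from this PR.
from torch._inductor.virtualized import V

def can_use_contiguous_fast_path(stride_expr):
    sizevars = V.graph.sizevars
    # True only if provable without adding a guard, unlike the old
    # maybe_guard_* helpers this replaces.
    return sizevars.statically_known_equals(stride_expr, 1)
```
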
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99383
Approved by: https://github.com/ngimel
2023-04-26 05:53:48 +00:00
Bin Bao
0c0e5c574e [inductor] Consolidate constant_args and cpp_constant_args (#98742)
Summary: Refactor code to simplify the logic. Support convolution as an
extern call in CudaWrapperCodeGen.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/98742
Approved by: https://github.com/jgong5, https://github.com/jansel
2023-04-12 11:59:08 +00:00
Peter Bell
b7ff717232 [inductor] Use 64-bit indexing for large tensors in triton code (#97447)
This changes `TritonKernel` to have an `index_dtype` property which is
used as the dtype in indexing calculations. By default it is
`tl.int32` but if any input or output buffer is larger than `INT_MAX`
then we use `tl.int64` instead.
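
A hedged sketch of the selection rule described above (the helper is illustrative, not the actual inductor code):

```python
INT_MAX = 2**31 - 1

def select_index_dtype(buffer_sizes_in_elements):
    # Fall back to 64-bit indices only when some buffer cannot be addressed
    # with a 32-bit signed offset.
    if any(size > INT_MAX for size in buffer_sizes_in_elements):
        return "tl.int64"
    return "tl.int32"

print(select_index_dtype([2**20, 2**33]))  # tl.int64
print(select_index_dtype([2**20]))         # tl.int32
```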

Should fix #96978 and #93606 (need to double-check).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/97447
Approved by: https://github.com/ngimel
2023-04-08 00:55:51 +00:00
Nicolas Macchioni
29608fd28d [pt2][inductor] hardcode autotuning names (#98351)
Summary: switch to hardcoded autotuning names; we want consistency in case the default choice changes.

Test Plan: CI

Differential Revision: D44643318

Pull Request resolved: https://github.com/pytorch/pytorch/pull/98351
Approved by: https://github.com/jansel
2023-04-07 03:40:33 +00:00
Yanbo Liang
ccc27bc361 [Inductor] Fix convolution lowering if stride or padding or dilation is 1 element list (#98448)
Fixes an error from the 14k GitHub models suite.
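
A hedged sketch of the kind of normalization the title implies (the helper name is made up for illustration):

```python
def _normalize_conv_param(param, spatial_dims=2):
    # Expand a 1-element stride/padding/dilation list to the spatial rank,
    # e.g. [2] -> [2, 2] for conv2d.
    param = list(param)
    return param * spatial_dims if len(param) == 1 else param

print(_normalize_conv_param([2]))     # [2, 2]
print(_normalize_conv_param([1, 2]))  # [1, 2]
```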

Pull Request resolved: https://github.com/pytorch/pytorch/pull/98448
Approved by: https://github.com/ngimel
2023-04-06 10:40:06 +00:00
Jason Ansel
9370f253e3 [inductor] Rewrite convolution triton templates (#95556)
Fixes #95775

Pull Request resolved: https://github.com/pytorch/pytorch/pull/95556
Approved by: https://github.com/Chillee, https://github.com/ngimel
2023-03-22 18:12:23 +00:00