Xuehai Pan
c73a92fbf5
[BE][CI] bump ruff to 0.9.2: multiline assert statements ( #144546 )
...
Reference: https://docs.astral.sh/ruff/formatter/black/#assert-statements
> Unlike Black, Ruff prefers breaking the message over breaking the assertion, similar to how both Ruff and Black prefer breaking the assignment value over breaking the assignment target:
>
> ```python
> # Input
> assert (
>     len(policy_types) >= priority + num_duplicates
> ), f"This tests needs at least {priority+num_duplicates} many types."
>
>
> # Black
> assert (
>     len(policy_types) >= priority + num_duplicates
> ), f"This tests needs at least {priority+num_duplicates} many types."
>
> # Ruff
> assert len(policy_types) >= priority + num_duplicates, (
>     f"This tests needs at least {priority + num_duplicates} many types."
> )
> ```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/144546
Approved by: https://github.com/malfet
2025-02-27 20:46:16 +00:00
Aaron Orenstein
07669ed960
PEP585 update - benchmarks tools torchgen ( #145101 )
...
This is one of a series of PRs to update us to PEP585 (changing Dict -> dict, List -> list, etc.). Most of the PRs were completely automated with RUFF as follows:
Since RUFF UP006 is considered an "unsafe" fix, we first need to enable unsafe fixes:
```
--- a/tools/linter/adapters/ruff_linter.py
+++ b/tools/linter/adapters/ruff_linter.py
@@ -313,6 +313,7 @@
"ruff",
"check",
"--fix-only",
+ "--unsafe-fixes",
"--exit-zero",
*([f"--config={config}"] if config else []),
"--stdin-filename",
```
Then we need to tell RUFF to allow UP006 (a final PR will make this permanent once all of these have landed):
```
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -40,7 +40,7 @@
[tool.ruff]
-target-version = "py38"
+target-version = "py39"
line-length = 88
src = ["caffe2", "torch", "torchgen", "functorch", "test"]
@@ -87,7 +87,6 @@
"SIM116", # Disable Use a dictionary instead of consecutive `if` statements
"SIM117",
"SIM118",
- "UP006", # keep-runtime-typing
"UP007", # keep-runtime-typing
]
select = [
```
Finally, running `lintrunner -a --take RUFF` will fix up the deprecated uses.
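For reference, the kind of rewrite UP006 performs looks like this (a minimal illustrative sketch, not taken from the actual diff):
```python
from typing import Dict, List  # Before: generics imported from typing

def histogram(words: List[str]) -> Dict[str, int]:  # pre-PEP585 spelling
    ...

# After the UP006 fix, the builtin types are used as generics directly
# (valid on Python 3.9+, hence the target-version bump above):
def histogram(words: list[str]) -> dict[str, int]:
    counts: dict[str, int] = {}
    for word in words:
        counts[word] = counts.get(word, 0) + 1
    return counts
```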
Pull Request resolved: https://github.com/pytorch/pytorch/pull/145101
Approved by: https://github.com/bobrenjc93
2025-01-18 05:05:07 +00:00
Xuehai Pan
dcc3cf7066
[BE] fix ruff rule E226: add missing whitespace around operator in f-strings ( #144415 )
...
The fixes are generated by:
```bash
ruff check --fix --preview --unsafe-fixes --select=E226 .
lintrunner -a --take "RUFF,PYFMT" --all-files
```
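For context, E226 flags missing whitespace around arithmetic operators; with `--preview`, ruff also checks expressions inside f-strings. A small illustrative example (not from the diff):
```python
priority, num_duplicates = 1, 2

# Flagged by E226: no whitespace around `+` inside the f-string expression
msg = f"needs at least {priority+num_duplicates} types"

# After the fix: whitespace added around the operator
msg = f"needs at least {priority + num_duplicates} types"
```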
Pull Request resolved: https://github.com/pytorch/pytorch/pull/144415
Approved by: https://github.com/huydhn , https://github.com/Skylion007
2025-01-08 21:55:00 +00:00
bobrenjc93
fcf9dc3b11
Migrate from Tuple -> tuple in benchmarks ( #144259 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/144259
Approved by: https://github.com/yanboliang
2025-01-07 04:09:52 +00:00
Oguz Ulgen
dc55704b48
Rename cache limit to recompile limit in configs ( #143709 )
...
This PR renames every `cache_limit` to `recompile_limit` via sed.
Old config options are maintained via `Config(alias='xyz')`.
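A generic sketch of how such an alias can keep the old name working (hypothetical code, not the actual torch config implementation):
```python
class Config:
    """Sketch of a config entry that remembers its old name."""

    def __init__(self, default, alias=None):
        self.default = default
        self.alias = alias  # e.g. "cache_limit" for "recompile_limit"


class ConfigModule:
    def __init__(self):
        self._options = {"recompile_limit": Config(8, alias="cache_limit")}
        # Map old names to new ones so existing call sites keep working.
        self._aliases = {
            opt.alias: name
            for name, opt in self._options.items()
            if opt.alias is not None
        }

    def __getattr__(self, name):
        name = self._aliases.get(name, name)  # resolve old name -> new name
        try:
            return self._options[name].default
        except KeyError:
            raise AttributeError(name) from None


config = ConfigModule()
assert config.cache_limit == config.recompile_limit == 8  # alias still works
```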
Pull Request resolved: https://github.com/pytorch/pytorch/pull/143709
Approved by: https://github.com/jansel
2024-12-22 10:03:57 +00:00
William Wen
c04f0bb7b9
[dynamo] add benchmark for guard eval ( #142430 )
...
Benchmarks:
- 713.2us (3.10)
- 598.8us (3.12)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/142430
Approved by: https://github.com/jansel
ghstack dependencies: #142117
2024-12-17 18:54:27 +00:00
Xuehai Pan
267f82b860
[BE] Format .ci/ / .github/ / benchmarks/ / functorch/ / tools/ / torchgen/ with ruff format ( #132577 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/132577
Approved by: https://github.com/malfet
2024-10-11 18:30:26 +00:00
Oguz Ulgen
034af88c2d
Add a microbenchmark for cache read path ( #137607 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/137607
Approved by: https://github.com/jamesjwu
2024-10-10 16:36:18 +00:00
Oguz Ulgen
ae03c0cff3
Add microbenchmark for FxGraphHashDetails.debug_lines ( #137506 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/137506
Approved by: https://github.com/jamesjwu
2024-10-09 16:15:05 +00:00
Jason Ansel
8da9c4178c
[inductor] Benchmark Halide in operatorbench.py ( #136809 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/136809
Approved by: https://github.com/eellison
ghstack dependencies: #136808
2024-09-28 19:26:04 +00:00
Jason Ansel
375921b755
[inductor] Improve operatorbench.py ( #136808 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/136808
Approved by: https://github.com/eellison
2024-09-28 06:22:02 +00:00
Yueming Hao
a71e5509bc
[inductor] Add profiler to operatorbench ( #135515 )
...
Add profiling to operatorbench. A new `--profile` argument is added, and the resulting profiling trace looks like the following figure.
(Profiling trace screenshot: https://github.com/user-attachments/assets/5b00d6e3-4905-4a77-a5e9-9f62620a5fd5)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/135515
Approved by: https://github.com/shunting314
2024-09-10 02:33:30 +00:00
Nicolas Macchioni
5cb05a82b4
[BC breaking] move benchmarking + prefer inductor path ( #132827 )
...
Move benchmarking out of `torch._inductor.runtime.runtime_utils` and into `torch._inductor.runtime.benchmarking`, and prefer this path over directly accessing Triton's benchmarking.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/132827
Approved by: https://github.com/eellison
2024-08-08 00:47:45 +00:00
Xuehai Pan
c0ed38e644
[BE][Easy][3/19] enforce style for empty lines in import segments in benchmarks/ ( #129754 )
...
See https://github.com/pytorch/pytorch/pull/129751#issue-2380881501 . Most changes are auto-generated by the linter.
You can review these PRs via:
```bash
git diff --ignore-all-space --ignore-blank-lines HEAD~1
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/129754
Approved by: https://github.com/ezyang
2024-07-17 14:34:42 +00:00
Peter Bell
e2e624a02f
[AOTAutograd] Micro-optimize runtime_wrapper ( #128188 )
...
This moves a bunch of runtime inspection of the `output_info` for alias handling into the construction of fixed output handlers that are created during compilation and captured by the runtime wrapper.
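A sketch of the general pattern (hypothetical names and metadata, not the actual AOTAutograd code): branch on the output metadata once at compile time to build fixed per-output handlers, so the runtime wrapper just calls them:
```python
def make_output_handler(info):
    # All inspection of the output metadata happens here, once, at compile time.
    if info["is_alias"]:
        base_idx = info["base_idx"]  # captured in the closure
        return lambda inputs, out: inputs[base_idx].view_as(out)
    return lambda inputs, out: out


def make_runtime_wrapper(output_infos, compiled_fn):
    handlers = [make_output_handler(info) for info in output_infos]

    def runtime_wrapper(inputs):
        outs = compiled_fn(inputs)
        # Hot path: no branching or dict lookups on output_infos,
        # just the fixed handler calls built during compilation.
        return [h(inputs, o) for h, o in zip(handlers, outs)]

    return runtime_wrapper
```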
Pull Request resolved: https://github.com/pytorch/pytorch/pull/128188
Approved by: https://github.com/bdhirsh
2024-07-04 03:53:06 +00:00
Xuehai Pan
26f4f10ac8
[5/N][Easy] fix typo for usort config in pyproject.toml (kown -> known): sort torch ( #127126 )
...
The `usort` config in `pyproject.toml` has no effect due to a typo. Fixing the typo make `usort` do more and generate the changes in the PR. Except `pyproject.toml`, all changes are generated by `lintrunner -a --take UFMT --all-files`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/127126
Approved by: https://github.com/kit1980
2024-05-27 14:49:57 +00:00
PyTorch MergeBot
55c0ab2887
Revert "[5/N][Easy] fix typo for usort config in pyproject.toml (kown -> known): sort torch ( #127126 )"
...
This reverts commit 7763c83af6 .
Reverted https://github.com/pytorch/pytorch/pull/127126 on behalf of https://github.com/XuehaiPan due to Broken CI ([comment](https://github.com/pytorch/pytorch/pull/127126#issuecomment-2133044286 ))
2024-05-27 09:22:08 +00:00
Xuehai Pan
7763c83af6
[5/N][Easy] fix typo for usort config in pyproject.toml (kown -> known): sort torch ( #127126 )
...
The `usort` config in `pyproject.toml` has no effect due to a typo. Fixing the typo make `usort` do more and generate the changes in the PR. Except `pyproject.toml`, all changes are generated by `lintrunner -a --take UFMT --all-files`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/127126
Approved by: https://github.com/kit1980
ghstack dependencies: #127122 , #127123 , #127124 , #127125
2024-05-27 04:22:18 +00:00
eellison
d777685ef9
Script for choosing template configurations ( #126560 )
...
This adds logging that marks any invocation of a matmul with particular input shapes and records every template config's performance on it. Then we can parse that log with a script which will minimize the total mm execution time given N allowed templates, and, in the future, run other experiments.
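A sketch of the underlying selection problem (hypothetical timings; the real script works on the recorded logs): given per-shape timings for each template config, choose the N configs that minimize total mm time when every shape uses its best allowed config:
```python
from itertools import combinations

# timings[shape][config] = measured time for that template config on that shape
timings = {
    "mm_64x64": {"cfg_a": 0.10, "cfg_b": 0.07, "cfg_c": 0.12},
    "mm_1kx1k": {"cfg_a": 1.90, "cfg_b": 2.40, "cfg_c": 1.50},
}

def best_subset(timings, n):
    configs = sorted({c for per_shape in timings.values() for c in per_shape})

    def total_time(subset):
        # Each shape runs with the fastest config available in the subset.
        return sum(min(ps[c] for c in subset) for ps in timings.values())

    return min(combinations(configs, n), key=total_time)

print(best_subset(timings, 2))  # -> ('cfg_b', 'cfg_c')
```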
Pull Request resolved: https://github.com/pytorch/pytorch/pull/126560
Approved by: https://github.com/nmacchioni , https://github.com/jansel
2024-05-21 02:28:39 +00:00
Jason Ansel
7fd8870e6b
[inductor] Refactor runtime files into torch._inductor.runtime (part 3) ( #124557 )
...
I am planning to make the compile_worker process not import torch so it can start up much faster. This stack is prep for that.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/124557
Approved by: https://github.com/yanboliang
ghstack dependencies: #124552 , #124553
2024-04-22 18:46:24 +00:00
PyTorch MergeBot
0b90af0bf5
Revert "[inductor] Refactor runtime files into torch._inductor.runtime (part 3) ( #124557 )"
...
This reverts commit fcf28b0ad5 .
Reverted https://github.com/pytorch/pytorch/pull/124557 on behalf of https://github.com/jeanschmidt due to There are internal breakages, already discussed with author and he'll FF ([comment](https://github.com/pytorch/pytorch/pull/124552#issuecomment-2070548223 ))
2024-04-22 18:28:05 +00:00
Jason Ansel
fcf28b0ad5
[inductor] Refactor runtime files into torch._inductor.runtime (part 3) ( #124557 )
...
I am planning to make the compile_worker process not import torch so it can start up much faster. This stack is prep for that.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/124557
Approved by: https://github.com/yanboliang
ghstack dependencies: #124552 , #124553
2024-04-22 04:51:15 +00:00
Xuehai Pan
93e249969b
[BE] enable ruff rule RSE and remove useless parentheses in raise statements ( #124261 )
...
Remove useless parentheses in `raise` statements if the exception type is raised with no argument.
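A minimal example of the fix (illustrative):
```python
# Before: the exception class is called with no arguments
def check_positive(x):
    if x < 0:
        raise ValueError()

# After the RSE fix: the redundant parentheses are dropped
def check_positive(x):
    if x < 0:
        raise ValueError
```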
Pull Request resolved: https://github.com/pytorch/pytorch/pull/124261
Approved by: https://github.com/albanD
2024-04-17 19:29:34 +00:00
Jason Ansel
5a10b56083
[dynamo] Small microbenchmark changes ( #122032 )
...
Used to generate numbers in #122029
Pull Request resolved: https://github.com/pytorch/pytorch/pull/122032
Approved by: https://github.com/yanboliang
2024-03-18 18:08:06 +00:00
Jason Ansel
1e9a7df8fe
[dynamo] Compile time optimizations in tx.step() ( #121790 )
...
`python benchmarks/dynamo/microbenchmarks/dynamo_microbenchmarks.py`
- Before: `symbolic_convert_overhead_stress_test: 10.7s`
- After: `symbolic_convert_overhead_stress_test: 8.6s`
`tx.step()` is a small part of that benchmark, so the speedup in that isolated function is likely larger than the top-line number.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/121790
Approved by: https://github.com/oulgen
2024-03-15 01:01:05 +00:00
Jason Ansel
9aa3fedb75
Slightly faster FX graph iterator ( #121611 )
...
Before:
```
iterating over 100000000 FX nodes took 5.9s (16830686 nodes/s)
```
After:
```
iterating over 100000000 FX nodes took 5.0s (19937698 nodes/s)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/121611
Approved by: https://github.com/oulgen
2024-03-11 20:00:19 +00:00
Jason Ansel
c5702a0891
[dynamo] Optimize BACKEND_MATCH guard ( #118065 )
...
As measured by `benchmarks/dynamo/microbenchmarks/overheads.py`:
- Before `22.5us`
- After `18.1us`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/118065
Approved by: https://github.com/ydwu4
2024-01-24 07:47:52 +00:00
Jason Ansel
a669319450
[inductor] Faster C++ kernel python bindings ( #117500 )
...
Calling C++ from Python via ctypes is notoriously slow. This switches to generating our own C++ bindings directly, which is a >5x speedup on this kernel-launch-bound microbenchmark:
```python
from ctypes import c_void_p
import torch
from torch import empty
from torch._inductor.codecache import AsyncCompile
from torch._dynamo.testing import rand_strided
from torch._inductor.utils import print_performance
from torch._inductor.wrapper_benchmark import compiled_module_main

async_compile = AsyncCompile()
src = '''
#include "/tmp/torchinductor_jansel/gb/cgbau5vlj6cetmcjbjbtw6x4rrivaln6f45s5d72gy2bfx5foz3k.h"
extern "C" void kernel(const float* in_ptr0,
                       float* out_ptr0)
{
    {
        auto tmp0 = in_ptr0[static_cast<long>(0L)];
        auto tmp1 = static_cast<float>(1.0);
        auto tmp2 = decltype(tmp0)(tmp0 + tmp1);
        out_ptr0[static_cast<long>(0L)] = tmp2;
    }
}
'''
cpp_fused_add_ctypes = async_compile.cpp(src)
cpp_fused_add_cpython = async_compile.cpp_pybinding(["const float*", "float*"], src)
async_compile.wait(globals())
del async_compile

def call(arg0_1):
    buf0 = empty((1,), device='cpu', dtype=torch.float32)
    if use_ctypes:
        for _ in range(100):
            cpp_fused_add_ctypes(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()))
    else:
        for _ in range(100):
            cpp_fused_add_cpython(arg0_1, buf0)
    del arg0_1
    return (buf0,)

def benchmark_compiled_module(times=1000, repeat=100):
    arg0_1 = rand_strided((1,), (1,), device='cpu', dtype=torch.float32)
    return print_performance(lambda: call(arg0_1), times=times, repeat=repeat)

print("old ctypes bindings: ", end='')
use_ctypes = True
compiled_module_main('None', benchmark_compiled_module)
print("new bindings: ", end='')
use_ctypes = False
compiled_module_main('None', benchmark_compiled_module)
```
Output:
```
old ctypes bindings: 0.000073
new bindings: 0.000013
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/117500
Approved by: https://github.com/desertfire
2024-01-18 16:20:12 +00:00
Nikita Shulga
a1afd1b195
Revert "[inductor] Faster C++ kernel python bindings ( #117500 )"
...
It should never have been landed, but was landed again thanks to
ghstack grafting/ungrafting; see the discussion on https://github.com/pytorch/pytorch/pull/116910
This reverts commit e457b6fb18 .
2024-01-17 17:06:32 -08:00
titaiwangms
e457b6fb18
[inductor] Faster C++ kernel python bindings ( #117500 )
...
(Same description and microbenchmark as in the entry above.)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/117500
Approved by: https://github.com/desertfire
ghstack dependencies: #117409 , #116667 , #117591
2024-01-17 23:03:15 +00:00
PyTorch MergeBot
da6abaeeac
Revert "[inductor] Faster C++ kernel python bindings ( #117500 )"
...
This reverts commit bb0fd1bd3c .
Reverted https://github.com/pytorch/pytorch/pull/117500 on behalf of https://github.com/PaliC due to breaking internal discussed with author offline ([comment](https://github.com/pytorch/pytorch/pull/117500#issuecomment-1896516512 ))
2024-01-17 19:34:26 +00:00
titaiwangms
bb0fd1bd3c
[inductor] Faster C++ kernel python bindings ( #117500 )
...
(Same description and microbenchmark as in the entry above.)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/117500
Approved by: https://github.com/desertfire
ghstack dependencies: #117409 , #116667 , #117591
2024-01-17 19:12:24 +00:00
PyTorch MergeBot
9da01affd3
Revert "[inductor] Faster C++ kernel python bindings ( #117500 )"
...
This reverts commit 3a52147cc5 .
Reverted https://github.com/pytorch/pytorch/pull/117500 on behalf of https://github.com/PaliC due to breaking internal discussed with author offline ([comment](https://github.com/pytorch/pytorch/pull/117500#issuecomment-1896426304 ))
2024-01-17 18:42:39 +00:00
Jason Ansel
3a52147cc5
[inductor] Faster C++ kernel python bindings ( #117500 )
...
(Same description and microbenchmark as in the entry above.)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/117500
Approved by: https://github.com/desertfire
2024-01-16 22:30:04 +00:00
Aaron Gokaslan
d9f2cf9974
[BE]: Enable ruff rule PIE800 - unnecessary nested dict expansion ( #113880 )
...
Adds an additional rule to the enabled list, which removes unnecessary nested dict literal unpacking, and applies the fixes.
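A minimal example of the pattern the rule removes (illustrative):
```python
defaults = {"lr": 0.1}

# Flagged by PIE800: the inner dict literal is spread for no reason
opts = {**defaults, **{"momentum": 0.9}}

# Fixed: write the entries directly into the enclosing dict
opts = {**defaults, "momentum": 0.9}
```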
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113880
Approved by: https://github.com/albanD
2023-11-16 22:34:38 +00:00
Peter Bell
bbd5b935e4
Use pytree.tree_leaves everywhere ( #112324 )
...
This changes all the instances I could find of `tree_flatten(...)[0]` or
`x, _ = tree_flatten` to use `tree_leaves`.
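For illustration, the shape of the change (minimal sketch):
```python
import torch.utils._pytree as pytree

nested = {"a": (1, 2), "b": [3]}

# Before: flatten, then discard the tree spec
leaves, _ = pytree.tree_flatten(nested)

# After: ask for the leaves directly
leaves = pytree.tree_leaves(nested)
assert leaves == [1, 2, 3]
```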
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112324
Approved by: https://github.com/lezcano
ghstack dependencies: #112327 , #112323
2023-10-30 03:39:04 +00:00
Justin Chu
5ef023b05a
[BE] Enable ruff's UP rules and autoformat benchmarks/ ( #105429 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105429
Approved by: https://github.com/malfet
2023-07-19 04:46:37 +00:00
Aaron Gokaslan
2f95a3d0fc
[BE]: Apply ruff PERF fixes to torch ( #104917 )
...
Applies automated ruff fixes for the PERF rules and enables all the automatic ones. I also updated ruff, which applied some additional fixes.
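One example of the kind of pattern the PERF rules rewrite (illustrative, not from the diff):
```python
items = range(10)

# Before: building a list through repeated .append calls in a loop
result = []
for x in items:
    result.append(x * 2)

# After the automated fix: a list comprehension, which is faster
result = [x * 2 for x in items]
```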
Pull Request resolved: https://github.com/pytorch/pytorch/pull/104917
Approved by: https://github.com/ezyang , https://github.com/albanD
2023-07-11 20:45:21 +00:00
Elias Ellison
2baadc2ade
Small operatorbench improvements ( #103110 )
...
- Don't copy inputs in cudagraphs wrapping, since the copies distort timing and Triton's `do_bench` clears the cache anyway
- Don't skip an op if there is a fallback, since we have both fallbacks and lowerings for some ops
- Add option for channels last
Pull Request resolved: https://github.com/pytorch/pytorch/pull/103110
Approved by: https://github.com/desertfire
2023-06-07 22:04:59 +00:00
xndcn
bebb8b7c1e
[inductor] use native fetch_add function for trivial types ( #101931 )
...
Floating-point types are supported by `std::atomic::fetch_add` since C++20.
However, this code path is not activated yet because `cpp_flags` in `codecache.py` is hard-coded to `-std=c++17`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/101931
Approved by: https://github.com/jgong5 , https://github.com/EikanWang , https://github.com/jansel
2023-06-01 03:47:56 +00:00
lezcano
8b4e28d65d
Fix microbenchmarks ( #101065 )
...
As per title
Pull Request resolved: https://github.com/pytorch/pytorch/pull/101065
Approved by: https://github.com/jansel
2023-05-11 09:14:22 +00:00
Shunting Zhang
68bc0fc012
[inductor] a script to benchmark the perf impact from tensor layout ( #99583 )
...
Follow-up on Jason's idea of tensor layout tuning. Add a script to show the perf impact of layout on convolution (more cases like batch/layer norm and reductions will be added to the script).
For convolution, a quick test shows that using the channels-last layout gives a 1.4x speedup:
```
baseline 4.509183883666992 test 3.178528070449829 speedup 1.419x
```
The speedup also depends on the input/weight shapes. E.g., changing the input channels in the test from 3 to 8 increases the speedup to 2.1x.
The trace shows cudnn calls different kernels when the input layout changes to channels last.
(Trace screenshot: https://user-images.githubusercontent.com/52589240/233228656-4bdcac0a-7633-416a-82e1-17d8dc8ea9a6.png)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99583
Approved by: https://github.com/jansel
2023-04-20 06:26:10 +00:00
PyTorch MergeBot
629377ea8b
Revert "Replace _dynamo.config with an object instead of module ( #96455 )"
...
This reverts commit 420104a886 .
Reverted https://github.com/pytorch/pytorch/pull/96455 on behalf of https://github.com/jansel due to BC breaking, was landed prematurely
2023-04-12 15:06:14 +00:00
Han Qi
420104a886
Replace _dynamo.config with an object instead of module ( #96455 )
...
Summary:
Replace _dynamo.config with an object instead of module
Current usage patterns of setting and reading fields on config will work
unchanged.
The only changes needed going forward:
1. `import torch._dynamo.config` will not work. However, just doing `import torch._dynamo` is sufficient to access the dynamo config as `torch._dynamo.config`.
2. Files inside the `_dynamo` folder need to access the config via `from torch._dynamo.config_util import config` instead of `from torch._dynamo import config`, because `_dynamo/__init__.py` imports some of those files and the import would otherwise be circular.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/96455
Approved by: https://github.com/williamwen42
2023-04-11 21:23:32 +00:00
Edward Z. Yang
9a8f71f23e
Convert logging f-strings to use % format ( #98697 )
...
Codemod done with
https://gist.github.com/ezyang/2e8b0463cdc6be278478495b23ff0530 with
assistance from ChatGPT.
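The shape of the codemod (minimal illustrative example):
```python
import logging

log = logging.getLogger(__name__)
n, path = 3, "data.json"

# Before: the f-string is formatted even when INFO logging is disabled
log.info(f"loaded {n} items from {path}")

# After: %-style args defer formatting until a handler emits the record
log.info("loaded %s items from %s", n, path)
```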
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98697
Approved by: https://github.com/voznesenskym
2023-04-10 12:19:31 +00:00
Jason Ansel
9370f253e3
[inductor] Rewrite convolution triton templates ( #95556 )
...
Fixes #95775
Pull Request resolved: https://github.com/pytorch/pytorch/pull/95556
Approved by: https://github.com/Chillee , https://github.com/ngimel
2023-03-22 18:12:23 +00:00
BowenBao
60a68477a6
Bump black version to 23.1.0 ( #96578 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/96578
Approved by: https://github.com/ezyang
2023-03-15 06:27:59 +00:00
Aaron Gokaslan
3d82d8d0ed
[BE] Enable more flake8-comprehensions checks ( #94601 )
...
I applied some flake8 fixes and enabled checking for them in the linter. I also enabled some checks for my previous comprehensions PR.
This is a follow-up to #94323: it enables the flake8 checkers for the fixes I made there and fixes a few more of them.
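A typical flake8-comprehensions fix looks like this (illustrative):
```python
# Before (C400-style): an unnecessary generator passed to list()
squares = list(x * x for x in range(10))

# After: a plain list comprehension
squares = [x * x for x in range(10)]
```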
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94601
Approved by: https://github.com/ezyang
2023-02-10 23:40:29 +00:00
Jason Ansel
5d709af59a
Rename aot_cudagraphs to cudagraphs ( #93821 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/93821
Approved by: https://github.com/ezyang
2023-02-03 21:01:27 +00:00
Liao, Xuan
764f79f680
[Microbenchmark] microbench fix for triton template ( #92282 )
...
Fixes a microbench bug caused by the triton template change in https://github.com/pytorch/pytorch/pull/91575
Pull Request resolved: https://github.com/pytorch/pytorch/pull/92282
Approved by: https://github.com/jgong5 , https://github.com/desertfire , https://github.com/jansel
2023-01-18 00:58:00 +00:00