pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 00:21:07 +01:00

Author	SHA1	Message	Date
Sean McGovern	297805fd8f	Typo fixes for "overridden" in comments and function names (#155944 ) This word appears often in class descriptions and is not consistently spelled. Update comments and some function names to use the correct spelling consistently. Facilitates searching the codebase. Pull Request resolved: https://github.com/pytorch/pytorch/pull/155944 Approved by: https://github.com/Skylion007	2025-06-14 03:37:38 +00:00
Xuehai Pan	c73a92fbf5	[BE][CI] bump `ruff` to 0.9.2: multiline `assert` statements (#144546 ) Reference: https://docs.astral.sh/ruff/formatter/black/#assert-statements > Unlike Black, Ruff prefers breaking the message over breaking the assertion, similar to how both Ruff and Black prefer breaking the assignment value over breaking the assignment target: > > ```python > # Input > assert ( > len(policy_types) >= priority + num_duplicates > ), f"This tests needs at least {priority+num_duplicates} many types." > > > # Black > assert ( > len(policy_types) >= priority + num_duplicates > ), f"This tests needs at least {priority+num_duplicates} many types." > > # Ruff > assert len(policy_types) >= priority + num_duplicates, ( > f"This tests needs at least {priority + num_duplicates} many types." > ) > ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/144546 Approved by: https://github.com/malfet	2025-02-27 20:46:16 +00:00
Aaron Gokaslan	e738f7ba23	[BE]: Enable ruff rule SIM113 (#147290 ) Lint rules that tells the user to avoid keeping track of their own counter and use the builtin enumerate when possible. Pull Request resolved: https://github.com/pytorch/pytorch/pull/147290 Approved by: https://github.com/jansel	2025-02-16 22:41:16 +00:00
Tom Ritchford	498a7808ff	Fix unused Python variables outside torch/ and test/ (#136359 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/136359 Approved by: https://github.com/albanD	2024-12-11 17:10:23 +00:00
cyy	aa95618268	[2/N] Apply py39 ruff fixes (#141938 ) Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/141938 Approved by: https://github.com/ezyang	2024-12-05 06:26:06 +00:00
Xuehai Pan	267f82b860	[BE] Format `.ci/` / `.github/` / `benchmarks/` / `functorch/` / `tools/` / `torchgen/` with `ruff format` (#132577 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/132577 Approved by: https://github.com/malfet	2024-10-11 18:30:26 +00:00
Alnis Murtovi	e51c8ad369	AutoHeuristic: Heuristic that ranks choices for mm (#131714 ) This PR adds a heuristic for tuned_mm that predicts the top 10 best choices. To be safe, aten.mm is always included. Perf run: https://hud.pytorch.org/benchmark/compilers?dashboard=torchinductor&startTime=Thu%2C%2008%20Aug%202024%2020%3A20%3A28%20GMT&stopTime=Thu%2C%2015%20Aug%202024%2020%3A20%3A28%20GMT&granularity=hour&suite=torchbench&mode=inference&dtype=bfloat16&deviceName=cuda%20(a100)&lBranch=gh/AlnisM/22/head&lCommit=905826f4ab5344efb0bcaa87e3b27a25299927ab&rBranch=main&rCommit=79ca596dc6ea16b6cdd0f2517451e19840717d37 Pull Request resolved: https://github.com/pytorch/pytorch/pull/131714 Approved by: https://github.com/eellison ghstack dependencies: #131705, #131710	2024-08-16 16:20:38 +00:00
Alnis Murtovi	add0f0085c	AutoHeuristic: Support ranking/pruning choices (#131705 ) This PR adds support in train_decision if one wants to learn a heuristic for ranking. The main idea is that the user has to provide a number of choices the heuristic should return. I added a way to prune the learned decision tree such that it always returns the number of choices provided by the user. Pull Request resolved: https://github.com/pytorch/pytorch/pull/131705 Approved by: https://github.com/eellison	2024-08-16 01:20:52 +00:00
Alnis Murtovi	f1c439cbed	AutoHeuristic: refactoring (#133170 ) This PR refactors train_decision.py and adds some basic logging, which I'll extend in another PR. Pull Request resolved: https://github.com/pytorch/pytorch/pull/133170 Approved by: https://github.com/Chillee	2024-08-13 01:46:53 +00:00
Alnis Murtovi	48929184e9	AutoHeuristic: mixed_mm heuristic for A100 (#131613 ) This PR introduces changes to AutoHeuristic that allow one to learn a heuristic as a decision tree. I used this to learn a heuristic for mixed_mm on A100 that consistenly performs better than the default choice (https://github.com/pytorch/pytorch/blob/main/torch/_inductor/kernel/mm.py#L402). This is how the results look like: Explanation of columns: wrong_max_spdup: In the worst case, how much better would the best choice have been wrong_gman_spdup: For inputs where the heuristic is wrong, how much better is the best choice on average (geomean) max_spdup_default: Highest speedup achieved by the learned heuristic over the default choice gman_spdup_default: Geomean speedup achived by the learned heuristic over the default choice max_slowdown_default: If the default choice is better than the choice predicted by the learned heuristic, how much is it better in the worst case non_default_preds: Number of times the learned heuristic predicted a choice that is not the default choice default_better: Number of times the default choice is better than the choice made by the heuristic ``` set crit max_depth min_samples_leaf correct wrong unsure total wrong_max_spdup wrong_gman_spdup max_spdup_default gman_spdup_default max_slowdown_default non_default_preds default_better train entropy 5 0.01 2376 740 323 3439 1.855386 1.063236 11.352318 3.438279 1.022164 3116 2 test entropy 5 0.01 563 183 71 817 1.622222 1.060897 10.084181 3.507741 1.017039 746 2 ``` While the number of wrong predictions is high, on average the best choice is only around 6% better. What is important is that the choice predicted by the learned heuristic performs better than the default choice. I evaluated my heuristic on gpt-fast `meta-llama/Llama-2-7b-chat-hf` with int8 weight quantization. To get the `tuned_mixed_mm` to trigger, I had to replace `F.linear()` in https://github.com/pytorch-labs/gpt-fast/blob/main/quantize.py#L355 with `torch.matmul(input, self.weight.t().to(dtype=input.dtype))` because the mixed_mm pattern does not match if there is a transpose between a cast and the matmul. \|batch size\|prompt length\| fallback \| heuristic \| speedup \| \|----------\|-------------\|------------:\|------------:\|--------:\| \| 1 \| 7 \| 75.31 tok/s \| 148.83 tok/s\| 1.97 \| \| 1 \| 11 \| 75.99 tok/s \| 148.15 tok/s\| 1.94 \| \| 4 \| 7 \| 103.48 tok/s \| 472.00 tok/s\| 4.56 \| \| 4 \| 11 \| 103.56 tok/s \| 371.36 tok/s\| 3.58 \| \| 8 \| 7 \| 201.92 tok/s \| 813.44 tok/s\| 4.02 \| \| 8 \| 11 \| 201.76 tok/s \| 699.36 tok/s\| 3.46 \| Currently, the heuristic only applies to the following inputs: - m <= 128, k >= 1024, n >= 1024 (For these sizes, one of the triton kernels wins in most cases, but the heuristic still has to be careful to not choose a config that performs worse than the fallback) - k % 256 == 0 (If k is not a multiple of the block size, some choices perform extremely bad. In one case one config, that usually performs very well, was 130x slower.) - mat1 not transposed - mat2 transposed (In some cases, it was hard for the learned heuristic to detect some cases where it Pull Request resolved: https://github.com/pytorch/pytorch/pull/131613 Approved by: https://github.com/eellison	2024-08-02 13:54:37 +00:00
PyTorch MergeBot	a28cda11ef	Revert "AutoHeuristic: mixed_mm heuristic for A100 (#131613 )" This reverts commit `344c15a0bb`. Reverted https://github.com/pytorch/pytorch/pull/131613 on behalf of https://github.com/AlnisM due to lintrunner issues ([comment](https://github.com/pytorch/pytorch/pull/131613#issuecomment-2261884149))	2024-08-01 03:22:11 +00:00
Alnis Murtovi	344c15a0bb	AutoHeuristic: mixed_mm heuristic for A100 (#131613 ) This PR introduces changes to AutoHeuristic that allow one to learn a heuristic as a decision tree. I used this to learn a heuristic for mixed_mm on A100 that consistenly performs better than the default choice (https://github.com/pytorch/pytorch/blob/main/torch/_inductor/kernel/mm.py#L402). This is how the results look like: Explanation of columns: wrong_max_spdup: In the worst case, how much better would the best choice have been wrong_gman_spdup: For inputs where the heuristic is wrong, how much better is the best choice on average (geomean) max_spdup_default: Highest speedup achieved by the learned heuristic over the default choice gman_spdup_default: Geomean speedup achived by the learned heuristic over the default choice max_slowdown_default: If the default choice is better than the choice predicted by the learned heuristic, how much is it better in the worst case non_default_preds: Number of times the learned heuristic predicted a choice that is not the default choice default_better: Number of times the default choice is better than the choice made by the heuristic ``` set crit max_depth min_samples_leaf correct wrong unsure total wrong_max_spdup wrong_gman_spdup max_spdup_default gman_spdup_default max_slowdown_default non_default_preds default_better train entropy 5 0.01 2376 740 323 3439 1.855386 1.063236 11.352318 3.438279 1.022164 3116 2 test entropy 5 0.01 563 183 71 817 1.622222 1.060897 10.084181 3.507741 1.017039 746 2 ``` While the number of wrong predictions is high, on average the best choice is only around 6% better. What is important is that the choice predicted by the learned heuristic performs better than the default choice. I evaluated my heuristic on gpt-fast `meta-llama/Llama-2-7b-chat-hf` with int8 weight quantization. To get the `tuned_mixed_mm` to trigger, I had to replace `F.linear()` in https://github.com/pytorch-labs/gpt-fast/blob/main/quantize.py#L355 with `torch.matmul(input, self.weight.t().to(dtype=input.dtype))` because the mixed_mm pattern does not match if there is a transpose between a cast and the matmul. \|batch size\|prompt length\| fallback \| heuristic \| speedup \| \|----------\|-------------\|------------:\|------------:\|--------:\| \| 1 \| 7 \| 75.31 tok/s \| 148.83 tok/s\| 1.97 \| \| 1 \| 11 \| 75.99 tok/s \| 148.15 tok/s\| 1.94 \| \| 4 \| 7 \| 103.48 tok/s \| 472.00 tok/s\| 4.56 \| \| 4 \| 11 \| 103.56 tok/s \| 371.36 tok/s\| 3.58 \| \| 8 \| 7 \| 201.92 tok/s \| 813.44 tok/s\| 4.02 \| \| 8 \| 11 \| 201.76 tok/s \| 699.36 tok/s\| 3.46 \| Currently, the heuristic only applies to the following inputs: - m <= 128, k >= 1024, n >= 1024 (For these sizes, one of the triton kernels wins in most cases, but the heuristic still has to be careful to not choose a config that performs worse than the fallback) - k % 256 == 0 (If k is not a multiple of the block size, some choices perform extremely bad. In one case one config, that usually performs very well, was 130x slower.) - mat1 not transposed - mat2 transposed (In some cases, it was hard for the learned heuristic to detect some cases where it Pull Request resolved: https://github.com/pytorch/pytorch/pull/131613 Approved by: https://github.com/eellison ghstack dependencies: #131610, #131611	2024-08-01 02:25:54 +00:00

12 Commits