pytorch/torchgen
Alnis Murtovi 383f2ac914 AutoHeuristic: mixed_mm H100 heuristic (#132685)
Adds an H100 heuristic for mixed_mm. Its performance looks similar to that of the A100 heuristic.
```
  set     crit  max_depth  min_samples_leaf  correct  wrong  unsure  total  wrong_max_spdup  wrong_gman_spdup  max_spdup_default  gman_spdup_default  max_slowdown_default  non_default_preds  default_better
train  entropy          5              0.01     1562    604     145   2311         1.522201          1.077722          10.399141            3.134170              1.034802               2061               2
 test  entropy          5              0.01      361    164      24    549         1.443590          1.079169           8.159173            3.105360              1.197973                500               2
```
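The summary rows above can be sanity-checked with a few lines of Python. This is a minimal sketch that assumes `correct`, `wrong`, and `unsure` partition `total`; the commit message does not define the columns. The dictionary layout and the helper name `summarize` are hypothetical.

```python
# Counts copied from the train/test rows of the table above.
ROWS = {
    "train": {"correct": 1562, "wrong": 604, "unsure": 145, "total": 2311},
    "test": {"correct": 361, "wrong": 164, "unsure": 24, "total": 549},
}


def summarize(row):
    """Return the fraction of samples in each outcome bucket (hypothetical helper)."""
    total = row["total"]
    # Assumption: the three buckets partition the data set.
    assert row["correct"] + row["wrong"] + row["unsure"] == total
    return {key: row[key] / total for key in ("correct", "wrong", "unsure")}


for name, row in ROWS.items():
    fractions = summarize(row)
    print(name, {key: round(value, 3) for key, value in fractions.items()})
```

Under this reading, the tree picks the best choice for roughly two thirds of the samples in both splits.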

gpt-fast speedups:

| batch size | prompt length | fallback | heuristic | speedup |
|------------|---------------|---------:|----------:|--------:|
|     1      |       7       |   109.95 |    220.63 |    2    |
|     1      |      11       |   109.65 |    210.92 |    1.92 |
|     4      |       7       |   149.04 |    625.80 |    4.19 |
|     4      |      11       |   149.56 |    494.64 |    3.30 |
|     8      |       7       |   293.68 |    956.72 |    3.25 |
|     8      |      11       |   294.48 |    925.60 |    3.14 |

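The speedup column can be reproduced from the other two columns. The sketch below assumes speedup is heuristic divided by fallback and that the reported values are truncated (not rounded) to two decimals, which is what matches the table. The commit message states neither, and it does not give the unit of the measurements. The helper name `speedup` is hypothetical.

```python
import math

# (batch size, prompt length, fallback, heuristic) copied from the table above.
MEASUREMENTS = [
    (1, 7, 109.95, 220.63),
    (1, 11, 109.65, 210.92),
    (4, 7, 149.04, 625.80),
    (4, 11, 149.56, 494.64),
    (8, 7, 293.68, 956.72),
    (8, 11, 294.48, 925.60),
]


def speedup(fallback, heuristic):
    """Ratio of heuristic to fallback, truncated to two decimals (assumed convention)."""
    return math.floor(heuristic / fallback * 100) / 100


for batch, prompt, fallback, heuristic in MEASUREMENTS:
    print(batch, prompt, speedup(fallback, heuristic))
```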
Pull Request resolved: https://github.com/pytorch/pytorch/pull/132685
Approved by: https://github.com/eellison
2024-08-07 23:48:01 +00:00
_autoheuristic AutoHeuristic: mixed_mm H100 heuristic (#132685) 2024-08-07 23:48:01 +00:00
aoti [cuDNN][SDPA] Remove TORCH_CUDNN_SDPA_ENABLED=1, enable cuDNN SDPA by default on H100 and 2nd on other archs >= sm80 (#125343) 2024-06-30 19:22:16 +00:00
api [BE][Easy][5/19] enforce style for empty lines in import segments in tools/ and torchgen/ (#129756) 2024-07-17 06:44:35 +00:00
decompositions [BE][Easy] eliminate relative import in torchgen (#128872) 2024-06-21 14:11:46 +00:00
dest [Intel GPU] xpu-ops codegen via backend whitelist (#130082) 2024-07-31 16:31:38 +00:00
executorch [12/N] Use std::optional (#132361) 2024-08-02 13:46:46 +00:00
fuse [BE] update type annotations for basic utilities in torch/__init__.py (#129001) 2024-06-24 18:04:38 +00:00
operator_versions [BE][Easy] enable postponed annotations in torchgen (#129376) 2024-06-29 09:23:39 +00:00
selective_build [BE][Easy] enable postponed annotations in torchgen (#129376) 2024-06-29 09:23:39 +00:00
shape_functions [BE][Easy] enable postponed annotations in torchgen (#129376) 2024-06-29 09:23:39 +00:00
static_runtime [BE][Easy] enable postponed annotations in torchgen (#129376) 2024-06-29 09:23:39 +00:00
__init__.py
BUCK.oss
BUILD.bazel
build.bzl update rules_python and let bazel install its own pip dependencies (#101405) 2023-05-23 06:20:33 +00:00
code_template.py [BE][Easy] enable postponed annotations in torchgen (#129376) 2024-06-29 09:23:39 +00:00
context.py [BE][Easy] enable postponed annotations in torchgen (#129376) 2024-06-29 09:23:39 +00:00
gen_aoti_c_shim.py [cuDNN][SDPA] Remove TORCH_CUDNN_SDPA_ENABLED=1, enable cuDNN SDPA by default on H100 and 2nd on other archs >= sm80 (#125343) 2024-06-30 19:22:16 +00:00
gen_backend_stubs.py [BE][Easy] replace import pathlib with from pathlib import Path (#129426) 2024-06-30 01:36:07 +00:00
gen_executorch.py [BE][Easy] enable postponed annotations in torchgen (#129376) 2024-06-29 09:23:39 +00:00
gen_functionalization_type.py propagate XLA's metadata after functional sync (#131076) 2024-07-31 18:20:00 +00:00
gen_lazy_tensor.py [3/N] Change #include <c10/util/Optional.h> to #include <optional> (#130300) 2024-07-09 13:32:57 +00:00
gen_vmap_plumbing.py [12/N] Use std::optional (#132361) 2024-08-02 13:46:46 +00:00
gen.py Include _native.h for structured_native_functions (#131208) 2024-07-24 02:55:36 +00:00
local.py [BE][Easy] enable postponed annotations in torchgen (#129376) 2024-06-29 09:23:39 +00:00
model.py [Intel GPU] xpu-ops codegen via backend whitelist (#130082) 2024-07-31 16:31:38 +00:00
native_function_generation.py [BE][Easy] enable postponed annotations in torchgen (#129376) 2024-06-29 09:23:39 +00:00
utils.py [torchgen] reference generated comment to actual location of the generator and template (#130020) 2024-07-05 21:47:14 +00:00
yaml_utils.py [Reland] Update mypy to 1.4.1 (#105227) 2023-07-15 20:30:20 +00:00