pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 00:21:07 +01:00

Author	SHA1	Message	Date
Max Podkorytov	ab0694f1c6	[ROCm][Inductor][CK backend] Install rocm-composable-kernel python package on ROCm Linux CI docker images (#162288 ) Reopened from #158747, which was reverted because the wheel cannot be built without setuptools-scm in the pytorch index URL. We reconsider the original PR idea of introducing CK as a pytorch dependency on ROCm Linux and install the CK python package in CI only, since (1) rocm-composable-kernel depends on setuptools-scm, which depends on tomli, and the existing index URLs would need to be modified to host the new packages, and (2) there is also a packaging [bug](https://github.com/pypa/setuptools/issues/3269#issuecomment-1254507377) in Ubuntu 22.04 which prevents correct dynamic version calculation with the default system pip. Extras: -> this PR reconsiders how the TORCHINDUCTOR_CK_DIR env variable is used; previously, this var pointed to the rocm-composable-kernel package installation path on the filesystem; now, the path is inferred by trying to import ck4inductor -> the tests are updated to reflect this change -> since in CI clang points to a bash script which invokes sccache, we cannot patch PATH to not contain sccache, so this logic is removed from the testing code -> the scaled_mm test crashes during benchmarking when the benchmarking happens in the main process, and times out when it happens in a subprocess, on gfx942, so it is disabled TBD: roll back rocm-mi300 workflow before merging Pull Request resolved: https://github.com/pytorch/pytorch/pull/162288 Approved by: https://github.com/jeffdaily	2025-09-10 19:33:40 +00:00
PyTorch MergeBot	d711f27845	Revert "[ROCm] [CK] Composable Kernel integration for inductor backend (#158747 )" This reverts commit `019fed39aa`. Reverted https://github.com/pytorch/pytorch/pull/158747 on behalf of https://github.com/jithunnair-amd due to Broke linux-binary-manywheel-rocm / manywheel-py3_9-rocm6_4-test: `019fed39aa/1` ... PR didn't have this job run successfully due to CI outage ([comment](https://github.com/pytorch/pytorch/pull/158747#issuecomment-3259212343))	2025-09-05 17:27:45 +00:00
iupaikov-amd	019fed39aa	[ROCm] [CK] Composable Kernel integration for inductor backend (#158747 ) This is part of our effort to integrate the Composable Kernel library into the Inductor backend. Currently we have a submodule, but would prefer to have commit pin control over the library, as with Triton. We intentionally avoid putting all installation logic in CI scripts to allow locally built versions to have this functionality. The idea is to have CK as a pytorch dependency in the pytorch 2.9 release to allow people to use it with inductor and AOT inductor, and then gradually step away from submodule usage. Right now CK usage in SDPA/Gemm is tied to submodule files. This PR is a remake of https://github.com/pytorch/pytorch/pull/156192 due to a branch error. Pull Request resolved: https://github.com/pytorch/pytorch/pull/158747 Approved by: https://github.com/jeffdaily Co-authored-by: Jithun Nair <37884920+jithunnair-amd@users.noreply.github.com> Co-authored-by: Jack Taylor <108682042+jataylo@users.noreply.github.com> Co-authored-by: Max Podkorytov <4273004+tenpercent@users.noreply.github.com> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>	2025-09-04 16:51:06 +00:00
gaoyvfeng	50f23ff6f8	rename-HAS_CUDA-to-HAS_CUDA_AND_TRITON (#159883 ) Fixes #159399 "Modified torch.testing._internal.inductor_utils and test/inductor" Pull Request resolved: https://github.com/pytorch/pytorch/pull/159883 Approved by: https://github.com/janeyx99	2025-08-08 15:44:52 +00:00
Max Podkorytov	ee2edf3d37	[ROCm][CK][Inductor] enable gfx950 for max autotune with CK (#159195 ) + update inductor config for new gfx arch + fixes in codegen for conv2d and ck-tile matmul + use appropriate fp8 dtypes + test cleanup Pull Request resolved: https://github.com/pytorch/pytorch/pull/159195 Approved by: https://github.com/chenyang78	2025-07-27 20:47:13 +00:00
Xuehai Pan	f5e6e52f25	[BE][PYFMT] migrate PYFMT for `test/inductor/` to `ruff format` (#148186 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/148186 Approved by: https://github.com/jansel	2025-06-24 11:12:11 +00:00
Max Podkorytov	1e6a653234	[ROCm][Inductor][CK] Split ck and ck-tile inductor backend(s) (#155294 ) ... and fix ck-tile instances not being generated due to incorrect caching ### Testing Added test cases for CKTILE instances ``` pytest test/inductor/test_ck_backend.py -k gemm_backends_CKTILE ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/155294 Approved by: https://github.com/coconutruben	2025-06-09 20:40:26 +00:00
Max Podkorytov	7ef2c62fd3	[ROCm][Inductor][CK] Add ck-tile based universal gemm kernels to torch.mm autotune choices (#152341 ) This PR adds code generation for CK-tile based universal gemm kernels to the CK backend for Inductor, and adds these kernels to autotune choices. Unlike legacy-CK based kernels (which are generated by parsing the CK instances from CK library), we generate the set of instances by manually specifying the tuning parameters. This PR introduces a new template for code generation, and compilation/autotuning is handled by the existing infrastructure. Points of discussion: * For simplicity and reduced coupling with CK, the instance filter checks only data type and layout, and doesn't check the alignment requirement - meaning that more instances will be compiled than necessary - while keeping the code generation independent from internal CK logic which checks the alignment validity at runtime * CK-tile instances are enabled whenever legacy-CK instances are enabled. A config knob could be introduced to differentiate between the instance types if that's needed * Whether gemm problem size K is ever dynamic, since whenever it's not a compile-time constant, we need to perform a runtime dispatch between several kernels Testing Use the existing tests in `test/inductor/test_ck_backend.py` Pull Request resolved: https://github.com/pytorch/pytorch/pull/152341 Approved by: https://github.com/chenyang78	2025-05-21 23:59:16 +00:00
Bin Bao	94d761fbf0	[AOTI][reland] Update test runner to use the new APIs (#149412 ) Summary: Reland https://github.com/pytorch/pytorch/pull/147105. Switch to the newer aoti_compile_and_package APIs. Some tests still kept using legacy APIs, and will follow up with internal test refactoring. Differential Revision: [D71470265](https://our.internmc.facebook.com/intern/diff/D71470265) Pull Request resolved: https://github.com/pytorch/pytorch/pull/149412 Approved by: https://github.com/yushangdi	2025-03-19 17:56:44 +00:00
PyTorch MergeBot	405025778d	Revert "[AOTI] Update test runner to use the new APIs (#147105 )" This reverts commit `9a78513c3c`. Reverted https://github.com/pytorch/pytorch/pull/147105 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally ([comment](https://github.com/pytorch/pytorch/pull/147105#issuecomment-2733656413))	2025-03-18 15:25:40 +00:00
Bin Bao	9a78513c3c	[AOTI] Update test runner to use the new APIs (#147105 ) Summary: Switch to the newer aoti_compile_and_package APIs. Some tests still kept using legacy APIs, and will follow up with internal test refactoring. Differential Revision: [D69609685](https://our.internmc.facebook.com/intern/diff/D69609685) Pull Request resolved: https://github.com/pytorch/pytorch/pull/147105 Approved by: https://github.com/jingsh	2025-03-18 00:27:09 +00:00
Jason Ansel	e343f46464	[inductor] Refactor is_big_gpu (#142220 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/142220 Approved by: https://github.com/yanboliang ghstack dependencies: #142219, #142033, #142222	2024-12-08 18:51:36 +00:00
Max Podkorytov	822e8a01c6	[ROCm][Inductor][CK] Add batched gemms into gemm max autotune with CK backend (#141520 ) ## Testing ``` TORCH_LOGS=+torch._inductor pytest --capture=no test/inductor/test_ck_backend.py -k bmm ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/141520 Approved by: https://github.com/chenyang78	2024-12-05 16:03:12 +00:00
Max Podkorytov	d64827dc35	[ROCm][Inductor][CK] Enable scaled mm with bias in gemm max autotune with CK backend (#140674 ) ## Testing ``` pytest test/inductor/test_ck_backend.py -k scaled_mm ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/140674 Approved by: https://github.com/chenyang78	2024-11-15 22:08:38 +00:00
Max Podkorytov	ee54dfb64d	[Inductor][ROCm][CK] Enable lowering conv2d instances in CK Inductor backend (#138643 ) Set PYTORCH_MIOPEN_SUGGEST_NHWC environment variable to force output layout to channels-last. This way, the channels-last CK instances will be added to benchmark choices in max autotune # Testing ``` pytest test/inductor/test_ck_backend.py -k conv2d ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/138643 Approved by: https://github.com/chenyang78	2024-11-07 18:37:39 +00:00
PyTorch MergeBot	4dd04db5d0	Revert "[Inductor][ROCm][CK] Enable lowering conv2d instances in CK Inductor backend (#138643 )" This reverts commit `4d92d6e604`. Reverted https://github.com/pytorch/pytorch/pull/138643 on behalf of https://github.com/wdvr due to reverting due to a large number of internal failures, see below ([comment](https://github.com/pytorch/pytorch/pull/138643#issuecomment-2442036958))	2024-10-28 16:18:38 +00:00
Max Podkorytov	4d92d6e604	[Inductor][ROCm][CK] Enable lowering conv2d instances in CK Inductor backend (#138643 ) Set PYTORCH_MIOPEN_SUGGEST_NHWC environment variable to force output layout to channels-last. This way, the channels-last CK instances will be added to benchmark choices in max autotune # Testing ``` pytest test/inductor/test_ck_backend.py -k conv2d ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/138643 Approved by: https://github.com/chenyang78	2024-10-25 22:11:44 +00:00
Max Podkorytov	52ba40c6f6	[ROCm][AOTI] add CK backend (#135641 ) Companion to #134379 Pull Request resolved: https://github.com/pytorch/pytorch/pull/135641 Approved by: https://github.com/ColinPeppler, https://github.com/chenyang78 Co-authored-by: Colin Peppler <colinpeppler@meta.com>	2024-10-07 23:53:58 +00:00
Colin Peppler	bc916a5537	[easy] for test_ck_backend enable RE & activate remaining tests for FBCode (#137305 ) Differential Revision: D63859208 Pull Request resolved: https://github.com/pytorch/pytorch/pull/137305 Approved by: https://github.com/muchulee8, https://github.com/chenyang78	2024-10-04 00:22:35 +00:00
Colin Peppler	d117ec1d6e	[3/3][Inductor] Make CK work in FBCode (#136234 ) Summary: # Context Goal: Enable CK for Inductor in FBCode We split this stack into three diffs to help with review & in case we need to revert anything. # This Diff * Gets us to have CK kernels as an option for GEMM autotuning in Inductor. Reviewed By: zjing14 Differential Revision: D62662705 Pull Request resolved: https://github.com/pytorch/pytorch/pull/136234 Approved by: https://github.com/tenpercent, https://github.com/chenyang78	2024-10-02 12:17:38 +00:00
Max Podkorytov	7283530db2	[ROCm][Inductor][CK] FP8 gemm (#136337 ) At the moment, this lowers torch._scaled_mm with tensorwise scaling and rowwise scaling for both A and B. We probably also want to support either combination of tensorwise and rowwise for A and B, as well as bias support. Pull Request resolved: https://github.com/pytorch/pytorch/pull/136337 Approved by: https://github.com/chenyang78	2024-09-24 05:19:45 +00:00
Aaron Orenstein	8c356ce3da	Fix lint errors in fbcode (#135614 ) Summary: Fixed a bunch of fbcode imports that happened to work but confused autodeps. After this autodeps still suggests "improvements" to TARGETS (which breaks our builds) but at least it can find all the imports. Test Plan: ``` fbpython fbcode/tools/build/buck/linters/lint_autoformat.py --linter=autodeps --default-exec-timeout=1800 -- fbcode/caffe2/TARGETS fbcode/caffe2/test/TARGETS ``` Before: ``` ERROR while processing caffe2/test/TARGETS: Cannot find an owner for "test_export" (from caffe2/test/export/testing.py:229) when processing rule "test_export". Please make sure it's listed in the srcs parameter of another rule. See https://fbur$ ERROR while processing caffe2/test/TARGETS: Cannot find an owner for "testing" (from caffe2/test/export/test_export.py:87) when processing rule "test_export". Please make sure it's listed in the srcs parameter of another rule. See https://fburl$ ERROR while processing caffe2/test/TARGETS: Cannot find an owner for "test_export" (from caffe2/test/export/test_serdes.py:9) when processing rule "test_export". Please make sure it's listed in the srcs parameter of another rule. See https://fb$ ERROR while processing caffe2/test/TARGETS: Cannot find an owner for "testing" (from caffe2/test/export/test_serdes.py:10) when processing rule "test_export". Please make sure it's listed in the srcs parameter of another rule. See https://fburl$ ERROR while processing caffe2/test/TARGETS: Cannot find an owner for "testing" (from caffe2/test/export/test_retraceability.py:7) when processing rule "test_export". Please make sure it's listed in the srcs parameter of another rule. See https:$ ERROR while processing caffe2/test/TARGETS: Cannot find an owner for "test_export" (from caffe2/test/export/test_retraceability.py:6) when processing rule "test_export". Please make sure it's listed in the srcs parameter of another rule. See ht$ ERROR while processing caffe2/test/TARGETS: Cannot find an owner for "testing" (from caffe2/test/export/test_export_nonstrict.py:7) when processing rule "test_export". Please make sure it's listed in the srcs parameter of another rule. See http$ ERROR while processing caffe2/test/TARGETS: Cannot find an owner for "test_export" (from caffe2/test/export/test_export_nonstrict.py:6) when processing rule "test_export". Please make sure it's listed in the srcs parameter of another rule. See $ ERROR while processing caffe2/test/TARGETS: Cannot find an owner for "test_export" (from caffe2/test/export/test_export_training_ir_to_run_decomp.py:8) when processing rule "test_export". Please make sure it's listed in the srcs parameter of an$ ERROR while processing caffe2/test/TARGETS: Cannot find an owner for "testing" (from caffe2/test/export/test_export_training_ir_to_run_decomp.py:10) when processing rule "test_export". Please make sure it's listed in the srcs parameter of anoth$ ERROR while processing caffe2/test/TARGETS: Found "//python/typeshed_internal:typeshed_internal_library" owner for "cv2" but it is protected by visibility rules: [] (from caffe2/test/test_bundled_images.py:7) when processing rule "test_bundled_$ ERROR while processing caffe2/test/TARGETS: Cannot find an owner for "caffe2.test.profiler_test_cpp_thread_lib" (from caffe2/test/profiler/test_cpp_thread.py:29) when processing rule "profiler_test_cpp_thread". Please make sure it's listed in t$ ERROR while processing caffe2/test/TARGETS: Cannot find an owner for "torch._utils_internal.get_file_path_2" (from caffe2/test/test_custom_ops.py:23) when processing rule "custom_ops". Please make sure it's listed in the srcs parameter of anoth$ ERROR while processing caffe2/test/TARGETS: Cannot find an owner for "torch._utils_internal.get_file_path_2" (from caffe2/test/test_public_bindings.py:13) when processing rule "public_bindings". Please make sure it's listed in the srcs paramete$ ERROR while processing caffe2/test/TARGETS: Cannot find an owner for "torch._C._profiler.symbolize_tracebacks" (from caffe2/test/test_cuda.py:3348) when processing rule "test_cuda". Please make sure it's listed in the srcs parameter of another $ ERROR while processing caffe2/test/TARGETS: Cannot find an owner for "torch._C._profiler.gather_traceback" (from caffe2/test/test_cuda.py:3348) when processing rule "test_cuda". Please make sure it's listed in the srcs parameter of another rule$ ERROR while processing caffe2/test/TARGETS: Cannot find an owner for include <torch/csrc/autograd/profiler_kineto.h> (from caffe2/test/profiler/test_cpp_thread.cpp:2) when processing profiler_test_cpp_thread_lib. Some things to try: ``` Differential Revision: D62049222 Pull Request resolved: https://github.com/pytorch/pytorch/pull/135614 Approved by: https://github.com/oulgen, https://github.com/laithsakka	2024-09-13 02:04:34 +00:00
Max Podkorytov	3d45717219	[ROCm][CK][Inductor] enable dynamic shapes for CK backend to gemm max autotune (#133285 ) This PR enables dynamic shapes for the CK backend for gemm max autotune (see #125453). This is achieved via unhardcoding the problem sizes from the template body and passing them as parameters instead. We handle passing the problem sizes for the kernel call as well as for the benchmark call. # Testing `pytest test/inductor/test_ck_backend.py [-k dynamic]` Pull Request resolved: https://github.com/pytorch/pytorch/pull/133285 Approved by: https://github.com/ColinPeppler	2024-08-16 06:05:23 +00:00
Max Podkorytov	1c7dc335f7	[ROCm][CK][Inductor] Enable addmm for CK backend to gemm max autotune (#130576 ) Add functional support for torch.addmm with CK backend. See also #125453 # Implementation details 1. It turns out we can use the same template between addmm and matmul; essentially, matmul is addmm with empty bias 2. The Python generator in CK was updated to generate the shared cpp template. The pip package can be installed from `pip install git+https://github.com/rocm/composable_kernel@add-addmm` and will be merged into `develop` branch after this PR lands to avoid breaking the current matmul # Testing `pytest test/inductor/test_ck_backend.py -k addmm` Pull Request resolved: https://github.com/pytorch/pytorch/pull/130576 Approved by: https://github.com/chenyang78	2024-08-05 17:49:09 +00:00
Xuehai Pan	134bc4fc34	[BE][Easy][12/19] enforce style for empty lines in import segments in `test/i*/` (#129763 ) See https://github.com/pytorch/pytorch/pull/129751#issue-2380881501. Most changes are auto-generated by linter. You can review these PRs via: ```bash git diff --ignore-all-space --ignore-blank-lines HEAD~1 ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/129763 Approved by: https://github.com/jansel	2024-07-18 07:49:19 +00:00
PyTorch MergeBot	b732b52f1e	Revert "[BE][Easy][12/19] enforce style for empty lines in import segments in `test/i*/` (#129763 )" This reverts commit `aecc746fcc`. Reverted https://github.com/pytorch/pytorch/pull/129763 on behalf of https://github.com/XuehaiPan due to need reland after rerunning lintrunner on main ([comment](https://github.com/pytorch/pytorch/pull/129763#issuecomment-2235736732))	2024-07-18 06:39:58 +00:00
Xuehai Pan	aecc746fcc	[BE][Easy][12/19] enforce style for empty lines in import segments in `test/i*/` (#129763 ) See https://github.com/pytorch/pytorch/pull/129751#issue-2380881501. Most changes are auto-generated by linter. You can review these PRs via: ```bash git diff --ignore-all-space --ignore-blank-lines HEAD~1 ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/129763 Approved by: https://github.com/jansel	2024-07-18 05:13:41 +00:00
Max Podkorytov	79959d707c	[Inductor][ROCm] Composable Kernel backend for Inductor (#125453 ) This PR adds an alternative backend for Inductor, adding Composable Kernel Universal GEMM instances to the autotune instance selection. The implementation is heavily influenced by the series of PRs which adds CUTLASS backend (https://github.com/pytorch/pytorch/issues/106991). The main differences are (1) customizing compiler for the ROCm platform (2) customizing template code generation for Composable Kernel Universal GEMM instances. We provide config tuning knobs for balancing between instance sources compilation time and finding the best instance. ### Testing Install the ck library ``` pip install git+https://github.com/rocm/composable_kernel@develop ``` Run the test ``` TORCH_LOGS=+torch._inductor \ pytest --capture=tee-sys test/inductor/test_ck_backend.py ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/125453 Approved by: https://github.com/eellison, https://github.com/jansel	2024-06-25 20:54:14 +00:00

28 Commits
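
Commit ab0694f1c6 notes that the CK package path is now inferred by trying to import ck4inductor instead of being read from TORCHINDUCTOR_CK_DIR. The general idea of locating an installed package's directory can be sketched roughly as below. This is an illustrative sketch only, not the code from that PR: the helper name `infer_package_dir` is hypothetical, and it uses the standard-library `importlib.util.find_spec` lookup (which resolves the package without executing it), whereas the actual change may perform a real import.

```python
import importlib.util
import os
from typing import Optional


def infer_package_dir(module_name: str) -> Optional[str]:
    """Return the install directory of `module_name`, or None if it is not importable.

    Hypothetical helper for illustration; not a PyTorch API.
    """
    try:
        spec = importlib.util.find_spec(module_name)
    except (ImportError, ValueError):
        # Raised for a missing parent package or a module with a broken __spec__.
        return None
    if spec is None:
        # The module cannot be found on the current import path.
        return None
    if spec.submodule_search_locations:
        # Regular and namespace packages expose their directories here.
        return os.path.abspath(list(spec.submodule_search_locations)[0])
    if spec.origin and os.path.exists(spec.origin):
        # Single-file modules: fall back to the directory holding the file.
        return os.path.dirname(os.path.abspath(spec.origin))
    # Built-in or frozen modules have no directory on disk.
    return None


if __name__ == "__main__":
    # "ck4inductor" is the package named in the commit message; on a machine
    # without it installed this simply prints None.
    print(infer_package_dir("ck4inductor"))
```

A caller could treat a `None` result as "CK backend unavailable" and skip the related tests, which avoids requiring users to export a filesystem path by hand.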