Commit Graph

162 Commits

Author SHA1 Message Date
Shangdi Yu
28c1d2f81b [aoti] AOTI mingw cross compilation (#163188)
To run this, you need to install `mingw64-gcc-c++` and download windows cuda library toolkit.

See design doc and demo instructions in https://docs.google.com/document/d/1iDaChqA5nNKkBFTzsdkmoomvQlXHbnlb1Z4yEp7xaJA/edit?tab=t.0

If cross_platform_target is windows, we do the following:

- do not link to `sleef`. This can be improved in the future if we need it. Currently I avoid it because that requires extra setup on the linux side
- Use `mingw64-gcc-c++` to compile
- Use `WINDOWS_CUDA_HOME` instead of `CUDA_HOME` when linking to cuda

```
 python test/inductor/test_aot_inductor_windows.py -k so
 ```

 Other changes:
 - de-couples compile_standalone config and dynamic link flag
 - create a new aot_inductor_mode config module, which is used to control configs in aot_inductor.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/163188
Approved by: https://github.com/desertfire
2025-10-01 02:22:06 +00:00
ghostspiders
26eefd5ae2 Fix windows path escape characters (#162761)
Fixes #135954
Torch Inductor Windows Path Escape Characters

Pull Request resolved: https://github.com/pytorch/pytorch/pull/162761
Approved by: https://github.com/jansel, https://github.com/mlazos
2025-09-17 23:39:39 +00:00
Mu-Chu Lee
2291199e9b [AOTInductor] Use CudaCachingAllocator for memory allocation (#162893)
Summary:
Use c10::CudaCachingAllocator for AOTInductor's initial constant buffer
allocation.

Test Plan:
Activate test under test/cpp/aoti_inference/test.cpp

Reviewers:

Subscribers:

Tasks:

Tags:

Pull Request resolved: https://github.com/pytorch/pytorch/pull/162893
Approved by: https://github.com/desertfire
2025-09-17 17:08:20 +00:00
xinan.lin
39450e7b00 [Fix XPU CI][Inductor UT] Fix test cases broken by community. (#162933)
Fixes #162937

Pull Request resolved: https://github.com/pytorch/pytorch/pull/162933
Approved by: https://github.com/EikanWang, https://github.com/jansel
2025-09-17 05:35:06 +00:00
Shangdi Yu
636a511084 [aoti] add config for libtorch free so (#162655)
Users can specify the following to get a libtorch_free `.so`.

"aot_inductor.use_libtorch": False,

The following config is only used for torchnative (see https://github.com/meta-pytorch/torchnative/pull/110). It's not intended to be used by executorch. The reason we need it for torchnative is because a lot of the symbol definitions in torchnative repo is only in header files.

"aot_inductor.libtorch_free_header": "/data/users/shangdiy/torchnative/standalone,/data/users/shangdiy/torchnative/" (or their custom headers)

The main motivating use case is for executorch to produce a libtorch free `.so`.

TODO for follow-up PR: this flag should be consolidated with the `compile_standalone` flag.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/162655
Approved by: https://github.com/angelayi
2025-09-12 07:31:04 +00:00
PyTorch MergeBot
ab7787fb82 Revert "[inductor] Windows inductor use intel-openmp. (#160258)"
This reverts commit 41673110cd.

Reverted https://github.com/pytorch/pytorch/pull/160258 on behalf of https://github.com/malfet due to Reverting to fix https://github.com/pytorch/pytorch/issues/160898 and https://github.com/pytorch/pytorch/issues/160962 ([comment](https://github.com/pytorch/pytorch/pull/160258#issuecomment-3220158145))
2025-08-25 12:57:47 +00:00
Xu Han
22df59efc0 [inductor] add MSVC language pack check. (#161298)
Check MSVC's language pack: https://github.com/pytorch/pytorch/issues/157673#issuecomment-3051682766

Pull Request resolved: https://github.com/pytorch/pytorch/pull/161298
Approved by: https://github.com/angelayi
2025-08-23 07:06:48 +00:00
Xu Han
17b0263e86 [inductor] fix march=native pass to Windows CC. (#161264)
fix march=native pass to Windows CC.

<img width="593" height="218" alt="image" src="https://github.com/user-attachments/assets/1caedffa-d9be-43d9-9ce2-590c055980cd" />

Pull Request resolved: https://github.com/pytorch/pytorch/pull/161264
Approved by: https://github.com/angelayi
2025-08-22 18:38:51 +00:00
Xu Han
c4670e40c9 [inductor] remove Windows unsupported build options. (#161197)
Changes:
1. Math related build option is not supported by msvc, skip them on Windows.
2. Move all math related build option to `_get_ffast_math_flags` function.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/161197
Approved by: https://github.com/jansel
2025-08-22 06:23:43 +00:00
Xu Han
9b3ebd25ac [inductor] Enable max compatible to msvc for oneAPI headers. (#161196)
Enable max compatible to msvc for oneAPI headers.

The key context is `The /permissive- option is compatible with almost all of the header files from the latest Windows Kits` from https://learn.microsoft.com/en-us/cpp/build/reference/permissive-standards-conformance?view=msvc-170

Pull Request resolved: https://github.com/pytorch/pytorch/pull/161196
Approved by: https://github.com/jansel
2025-08-22 06:23:26 +00:00
Xu Han
db38c44ad6 [inductor] add libraries_dirs for level_zero (#161146)
Changes:
1. change set `include_dirs` to append value.
2. add append `libraries_dirs` for level_zero.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/161146
Approved by: https://github.com/angelayi
2025-08-21 19:55:12 +00:00
Xu Han
1e3fe78a10 [inductor] disable min/max macro on Windows. (#161133)
Disable min/max macro on Windows.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/161133
Approved by: https://github.com/angelayi
2025-08-21 19:52:56 +00:00
Xu Han
be87f22dfb [inductor] Enable updated __cplusplus macro (#161064)
Intel oneAPI has some header depends on `__cplusplus` macro.
This PR is enable updated __cplusplus macro for msvc.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/161064
Approved by: https://github.com/angelayi
2025-08-21 00:17:08 +00:00
Xu Han
2a7a7ad711 [inductor] add level zero for xpu (#161061)
Add level zero for Inductor xpu on Windows.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/161061
Approved by: https://github.com/angelayi
2025-08-21 00:14:15 +00:00
Xu Han
41673110cd [inductor] Windows inductor use intel-openmp. (#160258)
After some debug work, I found PyTorch torch_cpu.dll is using intel-openmp, but not MSVC openmp.
So, switch Windows inductor to intel-openmp.

It fixed: c8205cb354/test/inductor/test_aot_inductor.py (L2405-L2408)
<img width="896" height="230" alt="image" src="https://github.com/user-attachments/assets/273b00f8-7dc1-43c9-9b7f-752e16355a80" />

Pull Request resolved: https://github.com/pytorch/pytorch/pull/160258
Approved by: https://github.com/ezyang
2025-08-13 02:36:19 +00:00
Ivan Zaitsev
f8f0414a59 fix cpp builder to avoid missing-source compile error (#160354)
Summary:
the condition
```
if config.is_fbcode() and (not self._aot_mode or self._use_relative_path):
    sources = [os.path.basename(i) for i in sources]
```
unintentionally (?) stripped paths even when use_relative_path was False (as long as aot_mode was False), breaking local tests that rely on absolute temp-file paths.

Fixes internal issue:
```

FAILED (errors=1)

CppCompileError: C++ compile error

Command:
/mnt/gvfs/third-party2/llvm-fb/0f1f083aa5508772f3db24bf4f697bc118ba0958/17/platform010/72a2ff8/bin/clang-17 czyi3nhzin5b3mc3376vmfnlbjobvjcghbvv4tatuazs3syqubay.cpp -shared -fPIC -O3 -DNDEBUG -fno-trapping-math -funsafe-math-optimizations -ffinite-math-only -fno-signed-zeros -fno-math-errno -fno-finite-math-only -fno-unsafe-math-optimizations -ffp-contract=off -Wall -std=c++17 -Wno-unused-variable -Wno-unknown-pragmas -Werror=ignored-optimization-argument -g -o /re_tmp/tmpsp58ya2h/zy/test_symbol.so

Output:
clang-17: error: no such file or directory: 'czyi3nhzin5b3mc3376vmfnlbjobvjcghbvv4tatuazs3syqubay.cpp'
clang-17: error: no input files
```

Reviewed By: clee2000

Differential Revision: D80025417

Pull Request resolved: https://github.com/pytorch/pytorch/pull/160354
Approved by: https://github.com/benjaminglass1, https://github.com/clee2000
2025-08-12 21:36:22 +00:00
Han, Xu
e1cf0d496e [inductor] unification for inductor debug. (#159998)
Unification inductor debug build, follow @desertfire 's suggestion: https://github.com/pytorch/pytorch/pull/159938#pullrequestreview-3093803196

Pull Request resolved: https://github.com/pytorch/pytorch/pull/159998
Approved by: https://github.com/angelayi
2025-08-07 16:38:00 +00:00
Xu Han
c71950907d [inductor] add _get_inductor_debug_symbol_cflags for debug symbol control. (#159938)
We need to add inductor debug symbol support for crash case debug. When we turn on generate debug symbol.
On Windows, it should create a [module_name].pdb file. It helps debug by WinDBG.
On Linux, it should create some debug sections in binary file.

I added UT for it also.

It works well on Windows inductor debug.
<img width="1648" height="833" alt="image" src="https://github.com/user-attachments/assets/5282a7de-cef3-4a38-9cd4-a0e63482c8b6" />

Pull Request resolved: https://github.com/pytorch/pytorch/pull/159938
Approved by: https://github.com/jansel, https://github.com/angelayi
2025-08-06 19:31:45 +00:00
Bin Bao
a4b07fe8f6 [AOTI] Add more default options to compile_standalone (#158560)
Summary: When compiling for standalone, make embed_kernel_binary and emit_multi_arch_kernel default to True, and add a default name for model_name_for_generated_files to make the generated cpp project easier to understand. Also improved the weights object file naming to be more readable.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/158560
Approved by: https://github.com/yushangdi
2025-08-06 15:59:27 +00:00
Xu Han
7e00f2ec9d [AOTI] add zero size consts asm handler (#159225)
Add `get_zero_consts_asm_code` to handle zero size consts to object.
This function is used to handle zero consts situation. Because cpp standard does not allow zero size array:
https://stackoverflow.com/questions/9722632/what-happens-if-i-define-a-0-size-array-in-c-c
1. On Windows, MSVC will report error C2466:
https://learn.microsoft.com/en-us/cpp/error-messages/compiler-errors-1/compiler-error-c2466?view=msvc-170
So, we can use assmbely compiler to handle this situation.
2. On Windows, why not use Win32 asm to handle all path? Because ml64 only supports up to align `16`, it is
not aligned to pytorch's `64`. Reference: https://learn.microsoft.com/en-us/cpp/assembler/masm/ml-and-ml64-command-line-reference?view=msvc-170
```
Packs structures on the specified byte boundary. The alignment can be 1, 2, 4, 8, or 16.
```
3. It function can handle zero size case on both Windows and Linux, as that:
    A. On Linux, we added `-pedantic` to disable zero size array on C++ compiler. 8e07c9870d/torch/_inductor/cpp_builder.py (L580)
    B. On Windows, msvc is not support zero size array by default.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/159225
Approved by: https://github.com/desertfire
2025-07-31 22:46:33 +00:00
PyTorch MergeBot
7d6f340238 Revert "[AOTI] Add more default options to compile_standalone (#158560)"
This reverts commit a991e285ae.

Reverted https://github.com/pytorch/pytorch/pull/158560 on behalf of https://github.com/jeffdaily due to broke rocm CI, no test signal was available from rocm ciflow/trunk, need to add ciflow/rocm to reland ([comment](https://github.com/pytorch/pytorch/pull/158560#issuecomment-3103633964))
2025-07-22 16:20:17 +00:00
Bin Bao
a991e285ae [AOTI] Add more default options to compile_standalone (#158560)
Summary: When compiling for standalone, make embed_kernel_binary and emit_multi_arch_kernel default to True, and add a default name for model_name_for_generated_files to make the generated cpp project easier to understand. Also improved the weights object file naming to be more readable.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/158560
Approved by: https://github.com/yushangdi
2025-07-21 21:16:48 +00:00
Huamin Li
bc7b1f5252 [AOTI] Use libstdc++ only for fbcode cpu case (#158659)
Differential Revision: D78567218

Pull Request resolved: https://github.com/pytorch/pytorch/pull/158659
Approved by: https://github.com/kflu, https://github.com/zoranzhao
2025-07-18 22:27:10 +00:00
yuchengliu1
b4358c5e87 [inductor] Explicitly link c10 in inductor. (#158622)
MSVC have error "unresolved external symbol" when compiling inductor. Explicitly link c10 in inductor.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/158622
Approved by: https://github.com/desertfire

Co-authored-by: Xu Han <xu.han@outlook.com>
2025-07-18 18:00:50 +00:00
Huamin Li
ddf502c988 [AOTI] add -lstdc++ into aoti link cmd for Meta internal (#158325)
Differential Revision: D78123716

Pull Request resolved: https://github.com/pytorch/pytorch/pull/158325
Approved by: https://github.com/desertfire
2025-07-16 07:55:08 +00:00
Shangdi Yu
4781d72faa [AOTI] codegen for static linkage (#157129)
Design doc: https://docs.google.com/document/d/1ncV7RpJ8xDwy8-_aCBfvZmpTTL824C-aoNPBLLVkOHM/edit?tab=t.0 (internal)

- Add codegen for static linkage
- refactor test code for test_compile_after_package tests

For now,  the following options must be used together with `"aot_inductor.compile_standalone": True`.
"aot_inductor.package_cpp_only": True,

Will change `"aot_inductor.package_cpp_only"` to be automatically set to True in followup PR.

```
python test/inductor/test_aot_inductor_package.py -k test_compile_after_package
python test/inductor/test_aot_inductor_package.py -k test_run_static_linkage_model
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/157129
Approved by: https://github.com/desertfire
2025-07-10 16:03:50 +00:00
Xiangyang (Mark) Guo
b354328ecd [AOTI] add flag AOT_INDUCTOR_ENABLE_LTO (#157773)
Add env var AOT_INDUCTOR_ENABLE_LTO to enable clang's ThinLTO by setting AOT_INDUCTOR_ENABLE_LTO=1. The LTO is disabled by default because it may increase the build time.

Rollback Plan:

Differential Revision: D77899195

Pull Request resolved: https://github.com/pytorch/pytorch/pull/157773
Approved by: https://github.com/desertfire
2025-07-09 16:54:19 +00:00
Xu Han
fcbf7c749a [Windows][Inductor] normalize_path_separator compiler path (#157835)
Fixes #157673

For the call trace:
```
......

  File "D:\Programs\Python\virtualenvs\torch_code-afvE469o\lib\site-packages\torch\_inductor\codegen\common.py", line 2569, in reduction
    return self.kernel.reduction(dtype, src_dtype, reduction_type, value)
  File "D:\Programs\Python\virtualenvs\torch_code-afvE469o\lib\site-packages\torch\_inductor\codegen\cpp.py", line 2155, in reduction
    self._gen_parallel_reduction_buffers(acc, acc_type, reduction_type, init_dtype)
  File "D:\Programs\Python\virtualenvs\torch_code-afvE469o\lib\site-packages\torch\_inductor\codegen\cpp.py", line 1942, in _gen_parallel_reduction_buffers
    reduction_prefix_array(
  File "D:\Programs\Python\virtualenvs\torch_code-afvE469o\lib\site-packages\torch\_inductor\codegen\cpp.py", line 335, in reduction_prefix_array
    if cpp_builder.is_msvc_cl()
  File "D:\Programs\Python\virtualenvs\torch_code-afvE469o\lib\site-packages\torch\_inductor\cpp_builder.py", line 317, in is_msvc_cl
    return _is_msvc_cl(get_cpp_compiler())
  File "D:\Programs\Python\virtualenvs\torch_code-afvE469o\lib\site-packages\torch\_inductor\cpp_builder.py", line 240, in _is_msvc_cl
    subprocess.check_output([cpp_compiler, "/help"], stderr=subprocess.STDOUT)
torch._inductor.exc.InductorError: UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd3 in position 0: invalid continuation byte
```
On non-English language pack msvc environment, compiler path has raised `utf-8` issue. I add the `normalize_path_separator` to normalize the compiler path and avoid the issue.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/157835
Approved by: https://github.com/jansel
2025-07-09 04:02:20 +00:00
Han, Xu
39b71d11fc [Inductor] add pedantic to limit inductor code follow standard. (#156914)
### Background:

During my development work, I found Windows msvc don't support to compile zero size array, please reference: https://github.com/pytorch/pytorch/issues/153180

As discussed with MSFT engineer, we found zero size array don't align to c++ standard, though gcc/clang can support it. When we add `-pedantic` option to gcc, it should check and raise c++ standard strictly. Reference: https://github.com/pytorch/pytorch/issues/153180#issuecomment-2986676878

So this PR add `-pedantic` to torch inductor build option list to constraint codegen generate c++ standard well code.
Additional, It also fixed a halide zero size array code.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/156914
Approved by: https://github.com/jansel
2025-06-30 16:29:08 +00:00
Nikita Shulga
039a1ce0eb [BE] Remove CXX11_ABI references from cpp_builder.py (#156896)
As all Linux builds are CXX11_ABI compatible at this point

Pull Request resolved: https://github.com/pytorch/pytorch/pull/156896
Approved by: https://github.com/desertfire, https://github.com/jansel
2025-06-26 17:28:01 +00:00
Xuehai Pan
6ff6630375 [BE][3/16] fix typos in torch/ (torch/_inductor/) (#156313)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/156313
Approved by: https://github.com/jingsh
2025-06-23 02:57:12 +00:00
PyTorch MergeBot
f1331f3f1b Revert "[BE][3/16] fix typos in torch/ (torch/_inductor/) (#156313)"
This reverts commit 3627270bdf.

Reverted https://github.com/pytorch/pytorch/pull/156313 on behalf of https://github.com/atalman due to export/test_torchbind.py::TestCompileTorchbind::test_compile_error_on_input_aliasing_contents_backend_aot_eager [GH job link](https://github.com/pytorch/pytorch/actions/runs/15804799771/job/44548489912) [HUD commit link](c95f7fa874) ([comment](https://github.com/pytorch/pytorch/pull/156313#issuecomment-2994171213))
2025-06-22 12:31:57 +00:00
Xuehai Pan
3627270bdf [BE][3/16] fix typos in torch/ (torch/_inductor/) (#156313)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/156313
Approved by: https://github.com/jingsh
2025-06-22 08:43:09 +00:00
cyy
c2beeadeb4 [Reland] Use 3.27 as the minimum CMake version (#154783)
Reland of #153153, which was incidentally closed.
Update the minimum CMake version to 3.27 because of it provides more CUDA targets such as CUDA::nvperf_host so that it is possible to remove some of our forked CUDA modules. See https://github.com/pytorch/pytorch/pull/153783.
It's also possible to facilitate future third-party updates such as FBGEMM (its current shipped version requires 3.21).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/154783
Approved by: https://github.com/ezyang
2025-06-14 16:37:51 +00:00
cyy
1393f71e07 Use CUDA language in generated CMakeLists.txt from cpp_builder.py (#155979)
The CMake CUDA module has been deprecated.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/155979
Approved by: https://github.com/ezyang
2025-06-14 06:52:51 +00:00
angelayi
a4ab392251 [aoti][mps] mps constants support (#154287)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/154287
Approved by: https://github.com/malfet
ghstack dependencies: #155752
2025-06-12 23:33:07 +00:00
Oguz Ulgen
d1947a8707 Migrate from lru_cache to cache (#155613)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/155613
Approved by: https://github.com/ezyang
ghstack dependencies: #155612
2025-06-11 19:44:18 +00:00
Bin Bao
44df7cf28d [AOTI] Fix embed_kernel_binary error when max_autotune is ON (#155569)
Summary: Stop removing cubin files so that it won't be missing when max_autotune is ON.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/155569
Approved by: https://github.com/angelayi, https://github.com/yushangdi
2025-06-11 12:27:36 +00:00
PyTorch MergeBot
bd10ea4e6c Revert "Use 3.27 as the minimum CMake version (#153153)"
This reverts commit ad26ec6abe.

Reverted https://github.com/pytorch/pytorch/pull/153153 on behalf of https://github.com/cyyever due to It still breaks windows debug builds ([comment](https://github.com/pytorch/pytorch/pull/153153#issuecomment-2923997777))
2025-05-31 02:14:24 +00:00
cyy
ad26ec6abe Use 3.27 as the minimum CMake version (#153153)
Update the minimum CMake version to 3.27 because of it provides more CUDA targets such as `CUDA::nvperf_host` so that it is possible to remove some of our forked CUDA modules. See https://github.com/pytorch/pytorch/pull/153783.
It's also possible to facilitate future third-party updates such as FBGEMM (its current shipped version requires 3.21).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/153153
Approved by: https://github.com/malfet
2025-05-31 01:54:35 +00:00
PyTorch MergeBot
108422ac26 Revert "Use 3.27 as the minimum CMake version (#153153)"
This reverts commit 78624679a8.

Reverted https://github.com/pytorch/pytorch/pull/153153 on behalf of https://github.com/cyyever due to It still breaks windows debug builds ([comment](https://github.com/pytorch/pytorch/pull/153153#issuecomment-2923785799))
2025-05-31 00:28:03 +00:00
cyy
78624679a8 Use 3.27 as the minimum CMake version (#153153)
Update the minimum CMake version to 3.27 because of it provides more CUDA targets such as `CUDA::nvperf_host` so that it is possible to remove some of our forked CUDA modules. See https://github.com/pytorch/pytorch/pull/153783.
It's also possible to facilitate future third-party updates such as FBGEMM (its current shipped version requires 3.21).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/153153
Approved by: https://github.com/malfet
2025-05-31 00:01:52 +00:00
PyTorch MergeBot
7e8532077f Revert "Use 3.27 as the minimum CMake version (#153153)"
This reverts commit 1ece53b157.

Reverted https://github.com/pytorch/pytorch/pull/153153 on behalf of https://github.com/cyyever due to It still breaks windows debug builds ([comment](https://github.com/pytorch/pytorch/pull/153153#issuecomment-2922369830))
2025-05-30 13:16:33 +00:00
cyy
1ece53b157 Use 3.27 as the minimum CMake version (#153153)
Update the minimum CMake version to 3.27 because of it provides more CUDA targets such as `CUDA::nvperf_host` so that it is possible to remove some of our forked CUDA modules. See https://github.com/pytorch/pytorch/pull/153783.
It's also possible to facilitate future third-party updates such as FBGEMM (its current shipped version requires 3.21).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/153153
Approved by: https://github.com/malfet
2025-05-30 11:25:30 +00:00
Bin Bao
5a21d6f982 [AOTI][reland] Support multi-arch when using package_cpp_only (#154608)
Summary: Reland https://github.com/pytorch/pytorch/pull/154414

Add support of multi_arch_kernel_binary in the package_cpp_only mode. More specifically, generate specific cmake targets to compile .ptx to .fatbin and embed them in the final shared library or binary.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/154608
Approved by: https://github.com/yushangdi
2025-05-29 19:32:33 +00:00
PyTorch MergeBot
fdc339003b Revert "[AOTI] Support multi-arch when using package_cpp_only (#154414)"
This reverts commit a84d8c4a1c.

Reverted https://github.com/pytorch/pytorch/pull/154414 on behalf of https://github.com/huydhn due to Sorry for reverting your change but it is failing ROCm trunk job ([comment](https://github.com/pytorch/pytorch/pull/154414#issuecomment-2915597821))
2025-05-28 09:23:31 +00:00
Bin Bao
a84d8c4a1c [AOTI] Support multi-arch when using package_cpp_only (#154414)
Summary: Add support of multi_arch_kernel_binary in the package_cpp_only mode. More specifically, generate specific cmake targets to compile .ptx to .fatbin and embed them in the final shared library or binary.

Differential Revision: [D75452096](https://our.internmc.facebook.com/intern/diff/D75452096)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/154414
Approved by: https://github.com/angelayi
ghstack dependencies: #154412, #154413
2025-05-28 01:20:38 +00:00
Bin Bao
cde82d25b7 [AOTI] Add a multi_arch_kernel_binary option (#154413)
Summary: CUDA can support multi-arch with the fatbin format. Add this multi_arch_kernel_binary option, so the compiled model binary can run across different GPU archs.

Differential Revision: [D75452094](https://our.internmc.facebook.com/intern/diff/D75452094)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/154413
Approved by: https://github.com/angelayi
ghstack dependencies: #154412
2025-05-28 01:20:38 +00:00
Bin Bao
72a3c8dfa8 [AOTI][reland] Add an option to specify custom op C shim (#153968)
Summary: Reland https://github.com/pytorch/pytorch/pull/153851 after fixing a fuzzer test issue.

Add an option to tell AOTInductor codegen to generate C shim functions for certain custom ops instead of relying on ProxyExecutor. The lib that defines custom ops need to implement corresponding C shim functions.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/153968
Approved by: https://github.com/hl475
2025-05-21 15:57:57 +00:00
Dan Zimmerman
e0f8174001 [triton][fb] Move build_paths into triton_utils (#153652)
Summary: TSA, this is just a small cleanup

Test Plan: CI

Differential Revision: D74835506

Pull Request resolved: https://github.com/pytorch/pytorch/pull/153652
Approved by: https://github.com/Skylion007
2025-05-20 18:59:50 +00:00