xinan.lin
39450e7b00
[Fix XPU CI][Inductor UT] Fix test cases broken by community. ( #162933 )
...
Fixes #162937
Pull Request resolved: https://github.com/pytorch/pytorch/pull/162933
Approved by: https://github.com/EikanWang , https://github.com/jansel
2025-09-17 05:35:06 +00:00
Shangdi Yu
636a511084
[aoti] add config for libtorch free so ( #162655 )
...
Users can specify the following to get a libtorch_free `.so`.
"aot_inductor.use_libtorch": False,
The following config is only used for torchnative (see https://github.com/meta-pytorch/torchnative/pull/110 ). It's not intended to be used by executorch. We need it for torchnative because many of the symbol definitions in the torchnative repo exist only in header files.
"aot_inductor.libtorch_free_header": "/data/users/shangdiy/torchnative/standalone,/data/users/shangdiy/torchnative/" (or their custom headers)
The main motivating use case is for executorch to produce a libtorch free `.so`.
TODO for follow-up PR: this flag should be consolidated with the `compile_standalone` flag.
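A minimal sketch of how the options above might be assembled. Only the two config keys come from this commit message; the helper name `libtorch_free_options` and the placeholder paths are hypothetical, and how the dict reaches the compiler is not shown here.

```python
from typing import Optional


def libtorch_free_options(header_dirs: Optional[list[str]] = None) -> dict[str, object]:
    """Build an Inductor options dict for a libtorch-free `.so`.

    Hypothetical helper: only the two config keys are taken from the
    commit message above.
    """
    options: dict[str, object] = {"aot_inductor.use_libtorch": False}
    if header_dirs:
        # The commit message shows the header directories as one
        # comma-separated string.
        options["aot_inductor.libtorch_free_header"] = ",".join(header_dirs)
    return options


# Placeholder directories, for illustration only.
print(libtorch_free_options(["/path/to/standalone", "/path/to/torchnative"]))
```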
Pull Request resolved: https://github.com/pytorch/pytorch/pull/162655
Approved by: https://github.com/angelayi
2025-09-12 07:31:04 +00:00
PyTorch MergeBot
ab7787fb82
Revert "[inductor] Windows inductor use intel-openmp. ( #160258 )"
...
This reverts commit 41673110cd .
Reverted https://github.com/pytorch/pytorch/pull/160258 on behalf of https://github.com/malfet due to Reverting to fix https://github.com/pytorch/pytorch/issues/160898 and https://github.com/pytorch/pytorch/issues/160962 ([comment](https://github.com/pytorch/pytorch/pull/160258#issuecomment-3220158145 ))
2025-08-25 12:57:47 +00:00
Xu Han
22df59efc0
[inductor] add MSVC language pack check. ( #161298 )
...
Check MSVC's language pack: https://github.com/pytorch/pytorch/issues/157673#issuecomment-3051682766
Pull Request resolved: https://github.com/pytorch/pytorch/pull/161298
Approved by: https://github.com/angelayi
2025-08-23 07:06:48 +00:00
Xu Han
17b0263e86
[inductor] fix march=native pass to Windows CC. ( #161264 )
...
Fix passing `march=native` to the Windows CC.
<img width="593" height="218" alt="image" src="https://github.com/user-attachments/assets/1caedffa-d9be-43d9-9ce2-590c055980cd " />
Pull Request resolved: https://github.com/pytorch/pytorch/pull/161264
Approved by: https://github.com/angelayi
2025-08-22 18:38:51 +00:00
Xu Han
c4670e40c9
[inductor] remove Windows unsupported build options. ( #161197 )
...
Changes:
1. Math-related build options are not supported by MSVC, so skip them on Windows.
2. Move all math-related build options into the `_get_ffast_math_flags` function.
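The two changes can be pictured with a rough sketch. The function below is not the real `_get_ffast_math_flags`; its name, signature, and flag list are assumptions, with the flags borrowed from a clang command line quoted in another entry of this log.

```python
import sys


def get_ffast_math_flags_sketch(is_windows: bool = sys.platform == "win32") -> list[str]:
    """Return GCC/Clang-style math flags, or nothing for MSVC on Windows.

    Illustrative only: this is not the implementation from the PR.
    """
    if is_windows:
        # MSVC does not accept these GCC/Clang options, so emit none.
        return []
    return [
        "-fno-trapping-math",
        "-funsafe-math-optimizations",
        "-ffinite-math-only",
        "-fno-signed-zeros",
        "-fno-math-errno",
    ]


print(get_ffast_math_flags_sketch(is_windows=True))
print(get_ffast_math_flags_sketch(is_windows=False))
```

Keeping every math-related option behind one function means the platform check lives in a single place.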
Pull Request resolved: https://github.com/pytorch/pytorch/pull/161197
Approved by: https://github.com/jansel
2025-08-22 06:23:43 +00:00
Xu Han
9b3ebd25ac
[inductor] Enable max compatible to msvc for oneAPI headers. ( #161196 )
...
Enable maximum compatibility with MSVC for oneAPI headers.
The key context is `The /permissive- option is compatible with almost all of the header files from the latest Windows Kits` from https://learn.microsoft.com/en-us/cpp/build/reference/permissive-standards-conformance?view=msvc-170
Pull Request resolved: https://github.com/pytorch/pytorch/pull/161196
Approved by: https://github.com/jansel
2025-08-22 06:23:26 +00:00
Xu Han
db38c44ad6
[inductor] add libraries_dirs for level_zero ( #161146 )
...
Changes:
1. Append to `include_dirs` instead of overwriting it.
2. Append `libraries_dirs` for level_zero.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/161146
Approved by: https://github.com/angelayi
2025-08-21 19:55:12 +00:00
Xu Han
1e3fe78a10
[inductor] disable min/max macro on Windows. ( #161133 )
...
Disable min/max macro on Windows.
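For context: `windows.h` defines `min` and `max` as macros unless `NOMINMAX` is defined first, and those macros can clash with `std::min`/`std::max`. The sketch below assumes the PR relies on `NOMINMAX`, which this commit message does not confirm; the helper names are hypothetical.

```python
def add_nominmax(definitions: list[str], is_windows: bool) -> list[str]:
    """Return a copy of `definitions` with NOMINMAX added on Windows."""
    out = list(definitions)
    if is_windows and "NOMINMAX" not in out:
        out.append("NOMINMAX")
    return out


def to_msvc_define_flags(definitions: list[str]) -> list[str]:
    """Render preprocessor definitions as MSVC `/D` options."""
    return [f"/D{name}" for name in definitions]


print(to_msvc_define_flags(add_nominmax(["NDEBUG"], is_windows=True)))
```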
Pull Request resolved: https://github.com/pytorch/pytorch/pull/161133
Approved by: https://github.com/angelayi
2025-08-21 19:52:56 +00:00
Xu Han
be87f22dfb
[inductor] Enable updated __cplusplus macro ( #161064 )
...
Some Intel oneAPI headers depend on the `__cplusplus` macro.
This PR enables the updated `__cplusplus` macro for MSVC.
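By default MSVC reports `__cplusplus` as `199711L` whatever `/std` is selected; the `/Zc:__cplusplus` option makes it report the value for the chosen standard. A minimal sketch of a flag list carrying that option follows; the function name and the default standard are assumptions, not taken from the PR.

```python
def msvc_std_flags_sketch(std: str = "c++20") -> list[str]:
    """Return MSVC options selecting a C++ standard with an accurate __cplusplus."""
    return [
        f"/std:{std}",
        # Without this option, headers testing `__cplusplus >= 201703L`
        # would see the legacy value 199711L.
        "/Zc:__cplusplus",
    ]


print(msvc_std_flags_sketch())
```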
Pull Request resolved: https://github.com/pytorch/pytorch/pull/161064
Approved by: https://github.com/angelayi
2025-08-21 00:17:08 +00:00
Xu Han
2a7a7ad711
[inductor] add level zero for xpu ( #161061 )
...
Add level zero for Inductor xpu on Windows.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/161061
Approved by: https://github.com/angelayi
2025-08-21 00:14:15 +00:00
Xu Han
41673110cd
[inductor] Windows inductor use intel-openmp. ( #160258 )
...
After some debugging, I found that PyTorch's torch_cpu.dll uses intel-openmp, not MSVC's OpenMP.
So, switch Windows inductor to intel-openmp.
It fixed: c8205cb354/test/inductor/test_aot_inductor.py (L2405-L2408)
<img width="896" height="230" alt="image" src="https://github.com/user-attachments/assets/273b00f8-7dc1-43c9-9b7f-752e16355a80 " />
Pull Request resolved: https://github.com/pytorch/pytorch/pull/160258
Approved by: https://github.com/ezyang
2025-08-13 02:36:19 +00:00
Ivan Zaitsev
f8f0414a59
fix cpp builder to avoid missing-source compile error ( #160354 )
...
Summary:
the condition
```
if config.is_fbcode() and (not self._aot_mode or self._use_relative_path):
    sources = [os.path.basename(i) for i in sources]
```
unintentionally (?) stripped paths even when use_relative_path was False (as long as aot_mode was False), breaking local tests that rely on absolute temp-file paths.
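To see which combinations trigger the stripping, the quoted condition can be restated with plain booleans and enumerated. This is only a sketch: the function name is hypothetical, and the flags stand in for `config.is_fbcode()`, `self._aot_mode`, and `self._use_relative_path`.

```python
from itertools import product


def strips_to_basename(is_fbcode: bool, aot_mode: bool, use_relative_path: bool) -> bool:
    """Reproduce the original condition quoted above with plain booleans."""
    return is_fbcode and (not aot_mode or use_relative_path)


# Enumerate the fbcode cases to see when source paths lose their directory.
for aot_mode, use_relative_path in product([False, True], repeat=2):
    stripped = strips_to_basename(True, aot_mode, use_relative_path)
    print(f"aot_mode={aot_mode!s:5} use_relative_path={use_relative_path!s:5} -> stripped={stripped}")
```

The row with `aot_mode=False` and `use_relative_path=False` is the case described above: paths are reduced to basenames even though relative paths were not requested.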
Fixes internal issue:
```
FAILED (errors=1)
CppCompileError: C++ compile error
Command:
/mnt/gvfs/third-party2/llvm-fb/0f1f083aa5508772f3db24bf4f697bc118ba0958/17/platform010/72a2ff8/bin/clang-17 czyi3nhzin5b3mc3376vmfnlbjobvjcghbvv4tatuazs3syqubay.cpp -shared -fPIC -O3 -DNDEBUG -fno-trapping-math -funsafe-math-optimizations -ffinite-math-only -fno-signed-zeros -fno-math-errno -fno-finite-math-only -fno-unsafe-math-optimizations -ffp-contract=off -Wall -std=c++17 -Wno-unused-variable -Wno-unknown-pragmas -Werror=ignored-optimization-argument -g -o /re_tmp/tmpsp58ya2h/zy/test_symbol.so
Output:
clang-17: error: no such file or directory: 'czyi3nhzin5b3mc3376vmfnlbjobvjcghbvv4tatuazs3syqubay.cpp'
clang-17: error: no input files
```
Reviewed By: clee2000
Differential Revision: D80025417
Pull Request resolved: https://github.com/pytorch/pytorch/pull/160354
Approved by: https://github.com/benjaminglass1 , https://github.com/clee2000
2025-08-12 21:36:22 +00:00
Han, Xu
e1cf0d496e
[inductor] unification for inductor debug. ( #159998 )
...
Unify the inductor debug build, following @desertfire 's suggestion: https://github.com/pytorch/pytorch/pull/159938#pullrequestreview-3093803196
Pull Request resolved: https://github.com/pytorch/pytorch/pull/159998
Approved by: https://github.com/angelayi
2025-08-07 16:38:00 +00:00
Xu Han
c71950907d
[inductor] add _get_inductor_debug_symbol_cflags for debug symbol control. ( #159938 )
...
We need inductor debug symbol support for debugging crashes. When debug symbol generation is turned on:
On Windows, it should create a [module_name].pdb file, which helps debugging with WinDbg.
On Linux, it should create debug sections in the binary file.
I also added a UT for it.
It works well for Windows inductor debugging.
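A rough sketch of the per-platform behaviour described above. It is not the actual `_get_inductor_debug_symbol_cflags`; the name, return shape, and exact flag choices are assumptions (`/Zi` and `/DEBUG` are standard MSVC options, `-g` the GCC/Clang one).

```python
def debug_symbol_flags_sketch(is_windows: bool) -> tuple[list[str], list[str]]:
    """Return (compile_flags, link_flags) that request debug symbols."""
    if is_windows:
        # MSVC: /Zi emits debug info for a PDB; /DEBUG tells the linker
        # to write the .pdb file alongside the built module.
        return ["/Zi"], ["/DEBUG"]
    # GCC/Clang: -g embeds debug sections in the object and binary.
    return ["-g"], []


print(debug_symbol_flags_sketch(is_windows=True))
print(debug_symbol_flags_sketch(is_windows=False))
```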
<img width="1648" height="833" alt="image" src="https://github.com/user-attachments/assets/5282a7de-cef3-4a38-9cd4-a0e63482c8b6 " />
Pull Request resolved: https://github.com/pytorch/pytorch/pull/159938
Approved by: https://github.com/jansel , https://github.com/angelayi
2025-08-06 19:31:45 +00:00
Bin Bao
a4b07fe8f6
[AOTI] Add more default options to compile_standalone ( #158560 )
...
Summary: When compiling for standalone, make embed_kernel_binary and emit_multi_arch_kernel default to True, and add a default name for model_name_for_generated_files to make the generated cpp project easier to understand. Also improved the weights object file naming to be more readable.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/158560
Approved by: https://github.com/yushangdi
2025-08-06 15:59:27 +00:00
Xu Han
7e00f2ec9d
[AOTI] add zero size consts asm handler ( #159225 )
...
Add `get_zero_consts_asm_code` to emit zero-size consts into an object file.
This function handles the zero-size consts case, because the C++ standard does not allow zero-size arrays:
https://stackoverflow.com/questions/9722632/what-happens-if-i-define-a-0-size-array-in-c-c
1. On Windows, MSVC reports error C2466:
https://learn.microsoft.com/en-us/cpp/error-messages/compiler-errors-1/compiler-error-c2466?view=msvc-170
So we can use the assembler to handle this situation.
2. On Windows, why not use Win32 asm for all paths? Because ml64 only supports alignment up to `16`, which
does not match PyTorch's `64`. Reference: https://learn.microsoft.com/en-us/cpp/assembler/masm/ml-and-ml64-command-line-reference?view=msvc-170
```
Packs structures on the specified byte boundary. The alignment can be 1, 2, 4, 8, or 16.
```
3. This function can handle the zero-size case on both Windows and Linux, because:
A. On Linux, we added `-pedantic` to disallow zero-size arrays in the C++ compiler. 8e07c9870d/torch/_inductor/cpp_builder.py (L580)
B. On Windows, MSVC does not support zero-size arrays by default.
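The idea of describing an empty constants blob in assembly rather than as a zero-length C++ array can be sketched as below. This is GNU-assembler syntax only and is not the body of `get_zero_consts_asm_code`; the symbol names, section name, and function name are illustrative assumptions, and MASM output for Windows would look different.

```python
ALIGN_BYTES = 64  # alignment mentioned in the commit message


def zero_size_consts_asm_sketch(symbol_prefix: str = "") -> str:
    """Emit GNU-assembler text that defines an empty constants blob.

    The start and end labels share one address, so the blob has size
    zero and no zero-length C++ array is needed. Symbol and section
    names are illustrative, not the ones PyTorch uses.
    """
    start = f"{symbol_prefix}_consts_start"
    end = f"{symbol_prefix}_consts_end"
    lines = [
        "\t.section\t.rodata",
        f"\t.balign {ALIGN_BYTES}",
        f"\t.globl\t{start}",
        f"{start}:",
        f"\t.globl\t{end}",
        f"{end}:",
    ]
    return "\n".join(lines) + "\n"


print(zero_size_consts_asm_sketch())
```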
Pull Request resolved: https://github.com/pytorch/pytorch/pull/159225
Approved by: https://github.com/desertfire
2025-07-31 22:46:33 +00:00
PyTorch MergeBot
7d6f340238
Revert "[AOTI] Add more default options to compile_standalone ( #158560 )"
...
This reverts commit a991e285ae .
Reverted https://github.com/pytorch/pytorch/pull/158560 on behalf of https://github.com/jeffdaily due to broke rocm CI, no test signal was available from rocm ciflow/trunk, need to add ciflow/rocm to reland ([comment](https://github.com/pytorch/pytorch/pull/158560#issuecomment-3103633964 ))
2025-07-22 16:20:17 +00:00
Bin Bao
a991e285ae
[AOTI] Add more default options to compile_standalone ( #158560 )
...
Summary: When compiling for standalone, make embed_kernel_binary and emit_multi_arch_kernel default to True, and add a default name for model_name_for_generated_files to make the generated cpp project easier to understand. Also improved the weights object file naming to be more readable.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/158560
Approved by: https://github.com/yushangdi
2025-07-21 21:16:48 +00:00
Huamin Li
bc7b1f5252
[AOTI] Use libstdc++ only for fbcode cpu case ( #158659 )
...
Differential Revision: D78567218
Pull Request resolved: https://github.com/pytorch/pytorch/pull/158659
Approved by: https://github.com/kflu , https://github.com/zoranzhao
2025-07-18 22:27:10 +00:00
yuchengliu1
b4358c5e87
[inductor] Explicitly link c10 in inductor. ( #158622 )
...
MSVC reports an "unresolved external symbol" error when compiling inductor. Explicitly link c10 in inductor.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/158622
Approved by: https://github.com/desertfire
Co-authored-by: Xu Han <xu.han@outlook.com>
2025-07-18 18:00:50 +00:00
Huamin Li
ddf502c988
[AOTI] add -lstdc++ into aoti link cmd for Meta internal ( #158325 )
...
Differential Revision: D78123716
Pull Request resolved: https://github.com/pytorch/pytorch/pull/158325
Approved by: https://github.com/desertfire
2025-07-16 07:55:08 +00:00
Shangdi Yu
4781d72faa
[AOTI] codegen for static linkage ( #157129 )
...
Design doc: https://docs.google.com/document/d/1ncV7RpJ8xDwy8-_aCBfvZmpTTL824C-aoNPBLLVkOHM/edit?tab=t.0 (internal)
- Add codegen for static linkage
- refactor test code for test_compile_after_package tests
For now, the following options must be used together with `"aot_inductor.compile_standalone": True`.
"aot_inductor.package_cpp_only": True,
A follow-up PR will set `"aot_inductor.package_cpp_only"` to True automatically.
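Until that happens, a caller could guard against the incomplete combination with a small check such as the one below. The two config keys come from this commit message; the function `check_standalone_options` is hypothetical and not part of PyTorch.

```python
def check_standalone_options(options: dict[str, object]) -> None:
    """Raise if compile_standalone is requested without package_cpp_only."""
    standalone = bool(options.get("aot_inductor.compile_standalone", False))
    cpp_only = bool(options.get("aot_inductor.package_cpp_only", False))
    if standalone and not cpp_only:
        raise ValueError(
            "aot_inductor.compile_standalone currently requires "
            "aot_inductor.package_cpp_only=True"
        )


options = {
    "aot_inductor.compile_standalone": True,
    "aot_inductor.package_cpp_only": True,
}
check_standalone_options(options)  # a valid combination passes silently
```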
```
python test/inductor/test_aot_inductor_package.py -k test_compile_after_package
python test/inductor/test_aot_inductor_package.py -k test_run_static_linkage_model
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/157129
Approved by: https://github.com/desertfire
2025-07-10 16:03:50 +00:00
Xiangyang (Mark) Guo
b354328ecd
[AOTI] add flag AOT_INDUCTOR_ENABLE_LTO ( #157773 )
...
Add the env var AOT_INDUCTOR_ENABLE_LTO; setting AOT_INDUCTOR_ENABLE_LTO=1 enables clang's ThinLTO. LTO is disabled by default because it may increase build time.
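A minimal sketch of an environment-variable gate like this one. The variable name comes from this commit message and `-flto=thin` is clang's ThinLTO option; the function name and the exact parsing of the value are assumptions.

```python
import os
from typing import Mapping, Optional


def lto_flags_sketch(env: Optional[Mapping[str, str]] = None) -> list[str]:
    """Return clang ThinLTO flags when AOT_INDUCTOR_ENABLE_LTO=1, else nothing."""
    env = os.environ if env is None else env
    if env.get("AOT_INDUCTOR_ENABLE_LTO", "0") == "1":
        return ["-flto=thin"]
    # Disabled by default, since LTO may lengthen the build.
    return []


print(lto_flags_sketch({"AOT_INDUCTOR_ENABLE_LTO": "1"}))
print(lto_flags_sketch({}))
```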
Rollback Plan:
Differential Revision: D77899195
Pull Request resolved: https://github.com/pytorch/pytorch/pull/157773
Approved by: https://github.com/desertfire
2025-07-09 16:54:19 +00:00
Xu Han
fcbf7c749a
[Windows][Inductor] normalize_path_separator compiler path ( #157835 )
...
Fixes #157673
For the call trace:
```
......
File "D:\Programs\Python\virtualenvs\torch_code-afvE469o\lib\site-packages\torch\_inductor\codegen\common.py", line 2569, in reduction
return self.kernel.reduction(dtype, src_dtype, reduction_type, value)
File "D:\Programs\Python\virtualenvs\torch_code-afvE469o\lib\site-packages\torch\_inductor\codegen\cpp.py", line 2155, in reduction
self._gen_parallel_reduction_buffers(acc, acc_type, reduction_type, init_dtype)
File "D:\Programs\Python\virtualenvs\torch_code-afvE469o\lib\site-packages\torch\_inductor\codegen\cpp.py", line 1942, in _gen_parallel_reduction_buffers
reduction_prefix_array(
File "D:\Programs\Python\virtualenvs\torch_code-afvE469o\lib\site-packages\torch\_inductor\codegen\cpp.py", line 335, in reduction_prefix_array
if cpp_builder.is_msvc_cl()
File "D:\Programs\Python\virtualenvs\torch_code-afvE469o\lib\site-packages\torch\_inductor\cpp_builder.py", line 317, in is_msvc_cl
return _is_msvc_cl(get_cpp_compiler())
File "D:\Programs\Python\virtualenvs\torch_code-afvE469o\lib\site-packages\torch\_inductor\cpp_builder.py", line 240, in _is_msvc_cl
subprocess.check_output([cpp_compiler, "/help"], stderr=subprocess.STDOUT)
torch._inductor.exc.InductorError: UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd3 in position 0: invalid continuation byte
```
In an MSVC environment with a non-English language pack, the compiler path raised a `utf-8` decoding issue. I added `normalize_path_separator` to normalize the compiler path and avoid the issue.
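Two small illustrations, both sketches under assumptions. The first function guesses at what a path-separator normalizer does (the real `normalize_path_separator` may differ); the second shows why bytes such as `0xd3` followed by a non-continuation byte, as localized tool output can contain, fail to decode as UTF-8. The helper names and the sample path are hypothetical.

```python
def normalize_path_separator_sketch(path: str, is_windows: bool) -> str:
    """Use forward slashes on Windows; leave other platforms untouched."""
    return path.replace("\\", "/") if is_windows else path


def is_valid_utf8(raw: bytes) -> bool:
    """Report whether `raw` decodes cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


print(normalize_path_separator_sketch("C:\\tools\\cl.exe", is_windows=True))
print(is_valid_utf8(b"\xd3\xc3"))  # 0xd3 opens a 2-byte sequence, 0xc3 cannot continue it
```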
Pull Request resolved: https://github.com/pytorch/pytorch/pull/157835
Approved by: https://github.com/jansel
2025-07-09 04:02:20 +00:00
Han, Xu
39b71d11fc
[Inductor] add pedantic to limit inductor code follow standard. ( #156914 )
...
### Background:
During my development work, I found that MSVC on Windows does not support compiling zero-size arrays; see: https://github.com/pytorch/pytorch/issues/153180
As discussed with an MSFT engineer, zero-size arrays do not conform to the C++ standard, though gcc/clang support them. When we add the `-pedantic` option to gcc, it checks the code strictly against the C++ standard and reports violations. Reference: https://github.com/pytorch/pytorch/issues/153180#issuecomment-2986676878
So this PR adds `-pedantic` to the torch inductor build option list to constrain codegen to generate standard-conforming C++ code.
Additionally, it fixes a zero-size array in the Halide code.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/156914
Approved by: https://github.com/jansel
2025-06-30 16:29:08 +00:00
Nikita Shulga
039a1ce0eb
[BE] Remove CXX11_ABI references from cpp_builder.py ( #156896 )
...
As all Linux builds are CXX11_ABI compatible at this point
Pull Request resolved: https://github.com/pytorch/pytorch/pull/156896
Approved by: https://github.com/desertfire , https://github.com/jansel
2025-06-26 17:28:01 +00:00
Xuehai Pan
6ff6630375
[BE][3/16] fix typos in torch/ (torch/_inductor/) ( #156313 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/156313
Approved by: https://github.com/jingsh
2025-06-23 02:57:12 +00:00
PyTorch MergeBot
f1331f3f1b
Revert "[BE][3/16] fix typos in torch/ (torch/_inductor/) ( #156313 )"
...
This reverts commit 3627270bdf .
Reverted https://github.com/pytorch/pytorch/pull/156313 on behalf of https://github.com/atalman due to export/test_torchbind.py::TestCompileTorchbind::test_compile_error_on_input_aliasing_contents_backend_aot_eager [GH job link](https://github.com/pytorch/pytorch/actions/runs/15804799771/job/44548489912 ) [HUD commit link](c95f7fa874 ) ([comment](https://github.com/pytorch/pytorch/pull/156313#issuecomment-2994171213 ))
2025-06-22 12:31:57 +00:00
Xuehai Pan
3627270bdf
[BE][3/16] fix typos in torch/ (torch/_inductor/) ( #156313 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/156313
Approved by: https://github.com/jingsh
2025-06-22 08:43:09 +00:00
cyy
c2beeadeb4
[Reland] Use 3.27 as the minimum CMake version ( #154783 )
...
Reland of #153153 , which was accidentally closed.
Update the minimum CMake version to 3.27 because it provides more CUDA targets such as CUDA::nvperf_host so that it is possible to remove some of our forked CUDA modules. See https://github.com/pytorch/pytorch/pull/153783 .
It's also possible to facilitate future third-party updates such as FBGEMM (its current shipped version requires 3.21).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/154783
Approved by: https://github.com/ezyang
2025-06-14 16:37:51 +00:00
cyy
1393f71e07
Use CUDA language in generated CMakeLists.txt from cpp_builder.py ( #155979 )
...
The CMake CUDA module has been deprecated.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/155979
Approved by: https://github.com/ezyang
2025-06-14 06:52:51 +00:00
angelayi
a4ab392251
[aoti][mps] mps constants support ( #154287 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/154287
Approved by: https://github.com/malfet
ghstack dependencies: #155752
2025-06-12 23:33:07 +00:00
Oguz Ulgen
d1947a8707
Migrate from lru_cache to cache ( #155613 )
...
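For reference, `functools.cache` (Python 3.9+) is the same as `functools.lru_cache(maxsize=None)`, so a migration like this one should not change behaviour. A minimal sketch with a hypothetical cached function:

```python
import functools


@functools.cache  # same behaviour as functools.lru_cache(maxsize=None)
def compiler_display_name(compiler: str) -> str:
    """Hypothetical cached lookup; the body runs once per distinct argument."""
    return compiler.strip().lower()


compiler_display_name("G++")
compiler_display_name("G++")  # served from the cache
print(compiler_display_name.cache_info())
```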
Pull Request resolved: https://github.com/pytorch/pytorch/pull/155613
Approved by: https://github.com/ezyang
ghstack dependencies: #155612
2025-06-11 19:44:18 +00:00
Bin Bao
44df7cf28d
[AOTI] Fix embed_kernel_binary error when max_autotune is ON ( #155569 )
...
Summary: Stop removing cubin files so that they won't be missing when max_autotune is ON.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/155569
Approved by: https://github.com/angelayi , https://github.com/yushangdi
2025-06-11 12:27:36 +00:00
PyTorch MergeBot
bd10ea4e6c
Revert "Use 3.27 as the minimum CMake version ( #153153 )"
...
This reverts commit ad26ec6abe .
Reverted https://github.com/pytorch/pytorch/pull/153153 on behalf of https://github.com/cyyever due to It still breaks windows debug builds ([comment](https://github.com/pytorch/pytorch/pull/153153#issuecomment-2923997777 ))
2025-05-31 02:14:24 +00:00
cyy
ad26ec6abe
Use 3.27 as the minimum CMake version ( #153153 )
...
Update the minimum CMake version to 3.27 because it provides more CUDA targets such as `CUDA::nvperf_host` so that it is possible to remove some of our forked CUDA modules. See https://github.com/pytorch/pytorch/pull/153783 .
It's also possible to facilitate future third-party updates such as FBGEMM (its current shipped version requires 3.21).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/153153
Approved by: https://github.com/malfet
2025-05-31 01:54:35 +00:00
PyTorch MergeBot
108422ac26
Revert "Use 3.27 as the minimum CMake version ( #153153 )"
...
This reverts commit 78624679a8 .
Reverted https://github.com/pytorch/pytorch/pull/153153 on behalf of https://github.com/cyyever due to It still breaks windows debug builds ([comment](https://github.com/pytorch/pytorch/pull/153153#issuecomment-2923785799 ))
2025-05-31 00:28:03 +00:00
cyy
78624679a8
Use 3.27 as the minimum CMake version ( #153153 )
...
Update the minimum CMake version to 3.27 because it provides more CUDA targets such as `CUDA::nvperf_host` so that it is possible to remove some of our forked CUDA modules. See https://github.com/pytorch/pytorch/pull/153783 .
It's also possible to facilitate future third-party updates such as FBGEMM (its current shipped version requires 3.21).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/153153
Approved by: https://github.com/malfet
2025-05-31 00:01:52 +00:00
PyTorch MergeBot
7e8532077f
Revert "Use 3.27 as the minimum CMake version ( #153153 )"
...
This reverts commit 1ece53b157 .
Reverted https://github.com/pytorch/pytorch/pull/153153 on behalf of https://github.com/cyyever due to It still breaks windows debug builds ([comment](https://github.com/pytorch/pytorch/pull/153153#issuecomment-2922369830 ))
2025-05-30 13:16:33 +00:00
cyy
1ece53b157
Use 3.27 as the minimum CMake version ( #153153 )
...
Update the minimum CMake version to 3.27 because it provides more CUDA targets such as `CUDA::nvperf_host` so that it is possible to remove some of our forked CUDA modules. See https://github.com/pytorch/pytorch/pull/153783 .
It's also possible to facilitate future third-party updates such as FBGEMM (its current shipped version requires 3.21).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/153153
Approved by: https://github.com/malfet
2025-05-30 11:25:30 +00:00
Bin Bao
5a21d6f982
[AOTI][reland] Support multi-arch when using package_cpp_only ( #154608 )
...
Summary: Reland https://github.com/pytorch/pytorch/pull/154414
Add support for multi_arch_kernel_binary in the package_cpp_only mode. More specifically, generate specific cmake targets to compile .ptx to .fatbin and embed them in the final shared library or binary.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/154608
Approved by: https://github.com/yushangdi
2025-05-29 19:32:33 +00:00
PyTorch MergeBot
fdc339003b
Revert "[AOTI] Support multi-arch when using package_cpp_only ( #154414 )"
...
This reverts commit a84d8c4a1c .
Reverted https://github.com/pytorch/pytorch/pull/154414 on behalf of https://github.com/huydhn due to Sorry for reverting your change but it is failing ROCm trunk job ([comment](https://github.com/pytorch/pytorch/pull/154414#issuecomment-2915597821 ))
2025-05-28 09:23:31 +00:00
Bin Bao
a84d8c4a1c
[AOTI] Support multi-arch when using package_cpp_only ( #154414 )
...
Summary: Add support for multi_arch_kernel_binary in the package_cpp_only mode. More specifically, generate specific cmake targets to compile .ptx to .fatbin and embed them in the final shared library or binary.
Differential Revision: [D75452096](https://our.internmc.facebook.com/intern/diff/D75452096 )
Pull Request resolved: https://github.com/pytorch/pytorch/pull/154414
Approved by: https://github.com/angelayi
ghstack dependencies: #154412 , #154413
2025-05-28 01:20:38 +00:00
Bin Bao
cde82d25b7
[AOTI] Add a multi_arch_kernel_binary option ( #154413 )
...
Summary: CUDA can support multi-arch with the fatbin format. Add this multi_arch_kernel_binary option, so the compiled model binary can run across different GPU archs.
Differential Revision: [D75452094](https://our.internmc.facebook.com/intern/diff/D75452094 )
Pull Request resolved: https://github.com/pytorch/pytorch/pull/154413
Approved by: https://github.com/angelayi
ghstack dependencies: #154412
2025-05-28 01:20:38 +00:00
Bin Bao
72a3c8dfa8
[AOTI][reland] Add an option to specify custom op C shim ( #153968 )
...
Summary: Reland https://github.com/pytorch/pytorch/pull/153851 after fixing a fuzzer test issue.
Add an option to tell AOTInductor codegen to generate C shim functions for certain custom ops instead of relying on ProxyExecutor. The lib that defines custom ops needs to implement the corresponding C shim functions.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/153968
Approved by: https://github.com/hl475
2025-05-21 15:57:57 +00:00
Dan Zimmerman
e0f8174001
[triton][fb] Move build_paths into triton_utils ( #153652 )
...
Summary: TSA, this is just a small cleanup
Test Plan: CI
Differential Revision: D74835506
Pull Request resolved: https://github.com/pytorch/pytorch/pull/153652
Approved by: https://github.com/Skylion007
2025-05-20 18:59:50 +00:00
PyTorch MergeBot
3102ae6798
Revert "[AOTI] Add an option to specify custom op C shim ( #153851 )"
...
This reverts commit 365ac49840 .
Reverted https://github.com/pytorch/pytorch/pull/153851 on behalf of https://github.com/malfet due to Looks like it broke fuzzer test, but I could be wrong, see c4d1ff02f8/1 ([comment](https://github.com/pytorch/pytorch/pull/153851#issuecomment-2894619773 ))
2025-05-20 14:23:50 +00:00
Bin Bao
365ac49840
[AOTI] Add an option to specify custom op C shim ( #153851 )
...
Summary: Add an option to tell AOTInductor codegen to generate C shim functions for certain custom ops instead of relying on ProxyExecutor. The lib that defines custom ops needs to implement the corresponding C shim functions.
Differential Revision: [D75014177](https://our.internmc.facebook.com/intern/diff/D75014177 )
Pull Request resolved: https://github.com/pytorch/pytorch/pull/153851
Approved by: https://github.com/hl475
2025-05-20 05:12:09 +00:00
Bin Bao
a2d0ef242d
[AOTI] Embed cubin files into .so ( #150739 )
...
Summary: Embed cubin files so AOTI is one step closer to generating a single binary. Controlled by a flag and off by default.
Differential Revision: [D72535357](https://our.internmc.facebook.com/intern/diff/D72535357 )
Pull Request resolved: https://github.com/pytorch/pytorch/pull/150739
Approved by: https://github.com/angelayi
2025-05-19 01:11:46 +00:00