Users can specify the following to get a libtorch-free `.so`:
"aot_inductor.use_libtorch": False,
The following config is only used for torchnative (see https://github.com/meta-pytorch/torchnative/pull/110); it's not intended to be used by executorch. We need it for torchnative because many of the symbol definitions in the torchnative repo live only in header files.
"aot_inductor.libtorch_free_header": "/data/users/shangdiy/torchnative/standalone,/data/users/shangdiy/torchnative/" (or their custom headers)
The main motivating use case is for executorch to produce a libtorch-free `.so`.
TODO for a follow-up PR: consolidate this flag with the `compile_standalone` flag.
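As a usage sketch (assuming the standard `torch._inductor.aoti_compile_and_package` entry point; the toy model and the commented-out header path are placeholders, not part of this PR):
```python
import torch

# Hypothetical toy model; any exportable module works.
class M(torch.nn.Module):
    def forward(self, x):
        return x + 1

ep = torch.export.export(M(), (torch.randn(4),))

# Sketch: request a libtorch-free .so via the config key from this PR.
torch._inductor.aoti_compile_and_package(
    ep,
    inductor_configs={
        "aot_inductor.use_libtorch": False,
        # Only needed for the torchnative use case described above
        # (placeholder path):
        # "aot_inductor.libtorch_free_header": "/path/to/custom/headers",
    },
)
```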
Pull Request resolved: https://github.com/pytorch/pytorch/pull/162655
Approved by: https://github.com/angelayi
Summary: the condition
```
if config.is_fbcode() and (not self._aot_mode or self._use_relative_path):
sources = [os.path.basename(i) for i in sources]
```
unintentionally (?) stripped paths to basenames whenever `aot_mode` was False, even if `use_relative_path` was also False, breaking local tests that rely on absolute temp-file paths.
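A plausible shape of the fix (an assumption; the committed change may differ): strip to basenames only when relative paths were actually requested:
```python
# Sketch: only strip paths when relative paths are requested.
if config.is_fbcode() and self._use_relative_path:
    sources = [os.path.basename(i) for i in sources]
```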
Fixes internal issue:
```
FAILED (errors=1)
CppCompileError: C++ compile error
Command:
/mnt/gvfs/third-party2/llvm-fb/0f1f083aa5508772f3db24bf4f697bc118ba0958/17/platform010/72a2ff8/bin/clang-17 czyi3nhzin5b3mc3376vmfnlbjobvjcghbvv4tatuazs3syqubay.cpp -shared -fPIC -O3 -DNDEBUG -fno-trapping-math -funsafe-math-optimizations -ffinite-math-only -fno-signed-zeros -fno-math-errno -fno-finite-math-only -fno-unsafe-math-optimizations -ffp-contract=off -Wall -std=c++17 -Wno-unused-variable -Wno-unknown-pragmas -Werror=ignored-optimization-argument -g -o /re_tmp/tmpsp58ya2h/zy/test_symbol.so
Output:
clang-17: error: no such file or directory: 'czyi3nhzin5b3mc3376vmfnlbjobvjcghbvv4tatuazs3syqubay.cpp'
clang-17: error: no input files
```
Reviewed By: clee2000
Differential Revision: D80025417
Pull Request resolved: https://github.com/pytorch/pytorch/pull/160354
Approved by: https://github.com/benjaminglass1, https://github.com/clee2000
Summary: When compiling for standalone, make `embed_kernel_binary` and `emit_multi_arch_kernel` default to True, and add a default name for `model_name_for_generated_files` to make the generated cpp project easier to understand. Also improve the weights object file naming to be more readable.
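A hedged sketch of what this means for users (config keys from the summary above; the model-name value is an illustrative placeholder, not the committed default):
```python
inductor_configs = {
    "aot_inductor.compile_standalone": True,
    # With this change, these now default to True in standalone mode,
    # so they no longer need to be set explicitly:
    # "aot_inductor.embed_kernel_binary": True,
    # "aot_inductor.emit_multi_arch_kernel": True,
    # Optional override of the generated-file model name
    # (placeholder value):
    # "aot_inductor.model_name_for_generated_files": "my_model",
}
```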
Pull Request resolved: https://github.com/pytorch/pytorch/pull/158560
Approved by: https://github.com/yushangdi
Design doc: https://docs.google.com/document/d/1ncV7RpJ8xDwy8-_aCBfvZmpTTL824C-aoNPBLLVkOHM/edit?tab=t.0 (internal)
- Add codegen for static linkage
- Refactor test code for the test_compile_after_package tests
For now, the following option must be used together with `"aot_inductor.compile_standalone": True`:
"aot_inductor.package_cpp_only": True,
A follow-up PR will change `"aot_inductor.package_cpp_only"` to be set to True automatically.
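As a combined sketch (keys from this PR; passing them via the usual `inductor_configs` dict is an assumption about usage):
```python
inductor_configs = {
    "aot_inductor.compile_standalone": True,
    # Must currently accompany compile_standalone; a follow-up PR
    # will set it automatically:
    "aot_inductor.package_cpp_only": True,
}
```
Relevant tests: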
```
python test/inductor/test_aot_inductor_package.py -k test_compile_after_package
python test/inductor/test_aot_inductor_package.py -k test_run_static_linkage_model
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/157129
Approved by: https://github.com/desertfire
Add the env var `AOT_INDUCTOR_ENABLE_LTO`, which enables clang's ThinLTO when set to `AOT_INDUCTOR_ENABLE_LTO=1`. LTO is disabled by default because it may increase build time.
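For example, a minimal sketch of opting in from Python (the variable must be set before any compilation is triggered):
```python
import os

# Opt in to clang ThinLTO for AOTInductor builds; off by default
# because it can increase build time. Set this before invoking the
# AOTI compile entry point.
os.environ["AOT_INDUCTOR_ENABLE_LTO"] = "1"
```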
Differential Revision: D77899195
Pull Request resolved: https://github.com/pytorch/pytorch/pull/157773
Approved by: https://github.com/desertfire
Fixes #157673
The failing call trace:
```
......
File "D:\Programs\Python\virtualenvs\torch_code-afvE469o\lib\site-packages\torch\_inductor\codegen\common.py", line 2569, in reduction
return self.kernel.reduction(dtype, src_dtype, reduction_type, value)
File "D:\Programs\Python\virtualenvs\torch_code-afvE469o\lib\site-packages\torch\_inductor\codegen\cpp.py", line 2155, in reduction
self._gen_parallel_reduction_buffers(acc, acc_type, reduction_type, init_dtype)
File "D:\Programs\Python\virtualenvs\torch_code-afvE469o\lib\site-packages\torch\_inductor\codegen\cpp.py", line 1942, in _gen_parallel_reduction_buffers
reduction_prefix_array(
File "D:\Programs\Python\virtualenvs\torch_code-afvE469o\lib\site-packages\torch\_inductor\codegen\cpp.py", line 335, in reduction_prefix_array
if cpp_builder.is_msvc_cl()
File "D:\Programs\Python\virtualenvs\torch_code-afvE469o\lib\site-packages\torch\_inductor\cpp_builder.py", line 317, in is_msvc_cl
return _is_msvc_cl(get_cpp_compiler())
File "D:\Programs\Python\virtualenvs\torch_code-afvE469o\lib\site-packages\torch\_inductor\cpp_builder.py", line 240, in _is_msvc_cl
subprocess.check_output([cpp_compiler, "/help"], stderr=subprocess.STDOUT)
torch._inductor.exc.InductorError: UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd3 in position 0: invalid continuation byte
```
On an MSVC environment with a non-English language pack, the compiler path raised a `utf-8` decode error. Add `normalize_path_separator` to normalize the compiler path and avoid the issue.
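A sketch of the resulting guard in `cpp_builder.py` (function names taken from the traceback above; the exact wiring is an assumption):
```python
def is_msvc_cl() -> bool:
    # Normalize the compiler path up front so locale-dependent bytes in
    # the MSVC install path cannot reach the utf-8 decode of the
    # subprocess output (sketch; the committed change may differ).
    return _is_msvc_cl(normalize_path_separator(get_cpp_compiler()))
```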
Pull Request resolved: https://github.com/pytorch/pytorch/pull/157835
Approved by: https://github.com/jansel
Reland of #153153, which was incidentally closed.
Update the minimum CMake version to 3.27 because it provides more CUDA targets, such as `CUDA::nvperf_host`, making it possible to remove some of our forked CUDA modules. See https://github.com/pytorch/pytorch/pull/153783.
It also facilitates future third-party updates such as FBGEMM (whose currently shipped version requires CMake 3.21).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/154783
Approved by: https://github.com/ezyang
Prior to this PR, `_inductor/codegen/cpp_prefix.h` was copied into a new temporary directory on every Inductor run using the CPP backend (i.e. CPU-only), then included in the output source code. This PR instead places it in an appropriate location in the torch includes and includes it from there, which allows us to precompile it in cpp_wrapper and AOT Inductor modes, saving significant compilation time.
Due to difficulties getting this to work in FBCode, the precompilation itself is only enabled in OSS PyTorch.
Differential Revision: [D69420620](https://our.internmc.facebook.com/intern/diff/D69420620)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/144293
Approved by: https://github.com/desertfire