pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 00:21:07 +01:00

Author	SHA1	Message	Date
Shangdi Yu	28c1d2f81b	[aoti] AOTI mingw cross compilation (#163188 ) To run this, you need to install `mingw64-gcc-c++` and download windows cuda library toolkit. See design doc and demo instructions in https://docs.google.com/document/d/1iDaChqA5nNKkBFTzsdkmoomvQlXHbnlb1Z4yEp7xaJA/edit?tab=t.0 If cross_platform_target is windows, we do the following: - do not link to `sleef`. This can be improved in the future if we need it. Currently I avoid it because that requires extra setup on the linux side - Use `mingw64-gcc-c++` to compile - Use `WINDOWS_CUDA_HOME` instead of `CUDA_HOME` when linking to cuda ``` python test/inductor/test_aot_inductor_windows.py -k so ``` Other changes: - de-couples compile_standalone config and dynamic link flag - create a new aot_inductor_mode config module, which is used to control configs in aot_inductor. Pull Request resolved: https://github.com/pytorch/pytorch/pull/163188 Approved by: https://github.com/desertfire	2025-10-01 02:22:06 +00:00
ghostspiders	26eefd5ae2	Fix windows path escape characters (#162761 ) Fixes #135954 Torch Inductor Windows Path Escape Characters Pull Request resolved: https://github.com/pytorch/pytorch/pull/162761 Approved by: https://github.com/jansel, https://github.com/mlazos	2025-09-17 23:39:39 +00:00
Mu-Chu Lee	2291199e9b	[AOTInductor] Use CudaCachingAllocator for memory allocation (#162893 ) Summary: Use c10::CudaCachingAllocator for AOTInductor's initial constant buffer allocation. Test Plan: Activate test under test/cpp/aoti_inference/test.cpp Reviewers: Subscribers: Tasks: Tags: Pull Request resolved: https://github.com/pytorch/pytorch/pull/162893 Approved by: https://github.com/desertfire	2025-09-17 17:08:20 +00:00
xinan.lin	39450e7b00	[Fix XPU CI][Inductor UT] Fix test cases broken by community. (#162933 ) Fixes #162937 Pull Request resolved: https://github.com/pytorch/pytorch/pull/162933 Approved by: https://github.com/EikanWang, https://github.com/jansel	2025-09-17 05:35:06 +00:00
Shangdi Yu	636a511084	[aoti] add config for libtorch free so (#162655 ) Users can specify the following to get a libtorch_free `.so`. "aot_inductor.use_libtorch": False, The following config is only used for torchnative (see https://github.com/meta-pytorch/torchnative/pull/110). It's not intended to be used by executorch. The reason we need it for torchnative is because a lot of the symbol definitions in torchnative repo is only in header files. "aot_inductor.libtorch_free_header": "/data/users/shangdiy/torchnative/standalone,/data/users/shangdiy/torchnative/" (or their custom headers) The main motivating use case is for executorch to produce a libtorch free `.so`. TODO for follow-up PR: this flag should be consolidated with the `compile_standalone` flag. Pull Request resolved: https://github.com/pytorch/pytorch/pull/162655 Approved by: https://github.com/angelayi	2025-09-12 07:31:04 +00:00
PyTorch MergeBot	ab7787fb82	Revert "[inductor] Windows inductor use intel-openmp. (#160258 )" This reverts commit `41673110cd`. Reverted https://github.com/pytorch/pytorch/pull/160258 on behalf of https://github.com/malfet due to Reverting to fix https://github.com/pytorch/pytorch/issues/160898 and https://github.com/pytorch/pytorch/issues/160962 ([comment](https://github.com/pytorch/pytorch/pull/160258#issuecomment-3220158145))	2025-08-25 12:57:47 +00:00
Xu Han	22df59efc0	[inductor] add MSVC language pack check. (#161298 ) Check MSVC's language pack: https://github.com/pytorch/pytorch/issues/157673#issuecomment-3051682766 Pull Request resolved: https://github.com/pytorch/pytorch/pull/161298 Approved by: https://github.com/angelayi	2025-08-23 07:06:48 +00:00
Xu Han	17b0263e86	[inductor] fix march=native pass to Windows CC. (#161264 ) fix march=native pass to Windows CC. Pull Request resolved: https://github.com/pytorch/pytorch/pull/161264 Approved by: https://github.com/angelayi	2025-08-22 18:38:51 +00:00
Xu Han	c4670e40c9	[inductor] remove Windows unsupported build options. (#161197 ) Changes: 1. Math related build option is not supported by msvc, skip them on Windows. 2. Move all math related build option to `_get_ffast_math_flags` function. Pull Request resolved: https://github.com/pytorch/pytorch/pull/161197 Approved by: https://github.com/jansel	2025-08-22 06:23:43 +00:00
Xu Han	9b3ebd25ac	[inductor] Enable max compatible to msvc for oneAPI headers. (#161196 ) Enable max compatible to msvc for oneAPI headers. The key context is `The /permissive- option is compatible with almost all of the header files from the latest Windows Kits` from https://learn.microsoft.com/en-us/cpp/build/reference/permissive-standards-conformance?view=msvc-170 Pull Request resolved: https://github.com/pytorch/pytorch/pull/161196 Approved by: https://github.com/jansel	2025-08-22 06:23:26 +00:00
Xu Han	db38c44ad6	[inductor] add libraries_dirs for level_zero (#161146 ) Changes: 1. change set `include_dirs` to append value. 2. add append `libraries_dirs` for level_zero. Pull Request resolved: https://github.com/pytorch/pytorch/pull/161146 Approved by: https://github.com/angelayi	2025-08-21 19:55:12 +00:00
Xu Han	1e3fe78a10	[inductor] disable min/max macro on Windows. (#161133 ) Disable min/max macro on Windows. Pull Request resolved: https://github.com/pytorch/pytorch/pull/161133 Approved by: https://github.com/angelayi	2025-08-21 19:52:56 +00:00
Xu Han	be87f22dfb	[inductor] Enable updated __cplusplus macro (#161064 ) Intel oneAPI has some header depends on `__cplusplus` macro. This PR is enable updated __cplusplus macro for msvc. Pull Request resolved: https://github.com/pytorch/pytorch/pull/161064 Approved by: https://github.com/angelayi	2025-08-21 00:17:08 +00:00
Xu Han	2a7a7ad711	[inductor] add level zero for xpu (#161061 ) Add level zero for Inductor xpu on Windows. Pull Request resolved: https://github.com/pytorch/pytorch/pull/161061 Approved by: https://github.com/angelayi	2025-08-21 00:14:15 +00:00
Xu Han	41673110cd	[inductor] Windows inductor use intel-openmp. (#160258 ) After some debugging, I found that PyTorch's torch_cpu.dll uses intel-openmp rather than MSVC OpenMP, so this switches Windows inductor to intel-openmp. It fixed: `c8205cb354/test/inductor/test_aot_inductor.py (L2405-L2408)` Pull Request resolved: https://github.com/pytorch/pytorch/pull/160258 Approved by: https://github.com/ezyang	2025-08-13 02:36:19 +00:00
Ivan Zaitsev	f8f0414a59	fix cpp builder to avoid missing-source compile error (#160354 ) Summary: the condition ``` if config.is_fbcode() and (not self._aot_mode or self._use_relative_path): sources = [os.path.basename(i) for i in sources] ``` unintentionally (?) stripped paths even when use_relative_path was False (as long as aot_mode was False), breaking local tests that rely on absolute temp-file paths. Fixes internal issue: ``` FAILED (errors=1) CppCompileError: C++ compile error Command: /mnt/gvfs/third-party2/llvm-fb/0f1f083aa5508772f3db24bf4f697bc118ba0958/17/platform010/72a2ff8/bin/clang-17 czyi3nhzin5b3mc3376vmfnlbjobvjcghbvv4tatuazs3syqubay.cpp -shared -fPIC -O3 -DNDEBUG -fno-trapping-math -funsafe-math-optimizations -ffinite-math-only -fno-signed-zeros -fno-math-errno -fno-finite-math-only -fno-unsafe-math-optimizations -ffp-contract=off -Wall -std=c++17 -Wno-unused-variable -Wno-unknown-pragmas -Werror=ignored-optimization-argument -g -o /re_tmp/tmpsp58ya2h/zy/test_symbol.so Output: clang-17: error: no such file or directory: 'czyi3nhzin5b3mc3376vmfnlbjobvjcghbvv4tatuazs3syqubay.cpp' clang-17: error: no input files ``` Reviewed By: clee2000 Differential Revision: D80025417 Pull Request resolved: https://github.com/pytorch/pytorch/pull/160354 Approved by: https://github.com/benjaminglass1, https://github.com/clee2000	2025-08-12 21:36:22 +00:00
Han, Xu	e1cf0d496e	[inductor] unification for inductor debug. (#159998 ) Unification inductor debug build, follow @desertfire 's suggestion: https://github.com/pytorch/pytorch/pull/159938#pullrequestreview-3093803196 Pull Request resolved: https://github.com/pytorch/pytorch/pull/159998 Approved by: https://github.com/angelayi	2025-08-07 16:38:00 +00:00
Xu Han	c71950907d	[inductor] add _get_inductor_debug_symbol_cflags for debug symbol control. (#159938 ) We need inductor debug symbol support to debug crashes. When debug symbol generation is turned on: on Windows, it should create a [module_name].pdb file, which helps debugging with WinDbg; on Linux, it should create debug sections in the binary file. I also added a UT for it. It works well for Windows inductor debugging. Pull Request resolved: https://github.com/pytorch/pytorch/pull/159938 Approved by: https://github.com/jansel, https://github.com/angelayi	2025-08-06 19:31:45 +00:00
Bin Bao	a4b07fe8f6	[AOTI] Add more default options to compile_standalone (#158560 ) Summary: When compiling for standalone, make embed_kernel_binary and emit_multi_arch_kernel default to True, and add a default name for model_name_for_generated_files to make the generated cpp project easier to understand. Also improved the weights object file naming to be more readable. Pull Request resolved: https://github.com/pytorch/pytorch/pull/158560 Approved by: https://github.com/yushangdi	2025-08-06 15:59:27 +00:00
Xu Han	7e00f2ec9d	[AOTI] add zero size consts asm handler (#159225 ) Add `get_zero_consts_asm_code` to handle compiling zero-size consts to an object file. This is needed because the C++ standard does not allow zero-size arrays: https://stackoverflow.com/questions/9722632/what-happens-if-i-define-a-0-size-array-in-c-c 1. On Windows, MSVC reports error C2466: https://learn.microsoft.com/en-us/cpp/error-messages/compiler-errors-1/compiler-error-c2466?view=msvc-170 So we can use the assembly compiler to handle this situation. 2. On Windows, why not use Win32 asm to handle all paths? Because ml64 only supports alignment up to `16`, which does not match PyTorch's `64`. Reference: https://learn.microsoft.com/en-us/cpp/assembler/masm/ml-and-ml64-command-line-reference?view=msvc-170 ``` Packs structures on the specified byte boundary. The alignment can be 1, 2, 4, 8, or 16. ``` 3. This function can handle the zero-size case on both Windows and Linux: A. On Linux, we added `-pedantic` to disallow zero-size arrays in the C++ compiler. `8e07c9870d/torch/_inductor/cpp_builder.py (L580)` B. On Windows, MSVC does not support zero-size arrays by default. Pull Request resolved: https://github.com/pytorch/pytorch/pull/159225 Approved by: https://github.com/desertfire	2025-07-31 22:46:33 +00:00
PyTorch MergeBot	7d6f340238	Revert "[AOTI] Add more default options to compile_standalone (#158560 )" This reverts commit `a991e285ae`. Reverted https://github.com/pytorch/pytorch/pull/158560 on behalf of https://github.com/jeffdaily due to broke rocm CI, no test signal was available from rocm ciflow/trunk, need to add ciflow/rocm to reland ([comment](https://github.com/pytorch/pytorch/pull/158560#issuecomment-3103633964))	2025-07-22 16:20:17 +00:00
Bin Bao	a991e285ae	[AOTI] Add more default options to compile_standalone (#158560 ) Summary: When compiling for standalone, make embed_kernel_binary and emit_multi_arch_kernel default to True, and add a default name for model_name_for_generated_files to make the generated cpp project easier to understand. Also improved the weights object file naming to be more readable. Pull Request resolved: https://github.com/pytorch/pytorch/pull/158560 Approved by: https://github.com/yushangdi	2025-07-21 21:16:48 +00:00
Huamin Li	bc7b1f5252	[AOTI] Use libstdc++ only for fbcode cpu case (#158659 ) Differential Revision: D78567218 Pull Request resolved: https://github.com/pytorch/pytorch/pull/158659 Approved by: https://github.com/kflu, https://github.com/zoranzhao	2025-07-18 22:27:10 +00:00
yuchengliu1	b4358c5e87	[inductor] Explicitly link c10 in inductor. (#158622 ) MSVC have error "unresolved external symbol" when compiling inductor. Explicitly link c10 in inductor. Pull Request resolved: https://github.com/pytorch/pytorch/pull/158622 Approved by: https://github.com/desertfire Co-authored-by: Xu Han <xu.han@outlook.com>	2025-07-18 18:00:50 +00:00
Huamin Li	ddf502c988	[AOTI] add -lstdc++ into aoti link cmd for Meta internal (#158325 ) Differential Revision: D78123716 Pull Request resolved: https://github.com/pytorch/pytorch/pull/158325 Approved by: https://github.com/desertfire	2025-07-16 07:55:08 +00:00
Shangdi Yu	4781d72faa	[AOTI] codegen for static linkage (#157129 ) Design doc: https://docs.google.com/document/d/1ncV7RpJ8xDwy8-_aCBfvZmpTTL824C-aoNPBLLVkOHM/edit?tab=t.0 (internal) - Add codegen for static linkage - refactor test code for test_compile_after_package tests For now, the following options must be used together with `"aot_inductor.compile_standalone": True`. "aot_inductor.package_cpp_only": True, Will change `"aot_inductor.package_cpp_only"` to be automatically set to True in followup PR. ``` python test/inductor/test_aot_inductor_package.py -k test_compile_after_package python test/inductor/test_aot_inductor_package.py -k test_run_static_linkage_model ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/157129 Approved by: https://github.com/desertfire	2025-07-10 16:03:50 +00:00
Xiangyang (Mark) Guo	b354328ecd	[AOTI] add flag AOT_INDUCTOR_ENABLE_LTO (#157773 ) Add env var AOT_INDUCTOR_ENABLE_LTO to enable clang's ThinLTO by setting AOT_INDUCTOR_ENABLE_LTO=1. The LTO is disabled by default because it may increase the build time. Rollback Plan: Differential Revision: D77899195 Pull Request resolved: https://github.com/pytorch/pytorch/pull/157773 Approved by: https://github.com/desertfire	2025-07-09 16:54:19 +00:00
Xu Han	fcbf7c749a	[Windows][Inductor] normalize_path_separator compiler path (#157835 ) Fixes #157673 For the call trace: ``` ...... File "D:\Programs\Python\virtualenvs\torch_code-afvE469o\lib\site-packages\torch\_inductor\codegen\common.py", line 2569, in reduction return self.kernel.reduction(dtype, src_dtype, reduction_type, value) File "D:\Programs\Python\virtualenvs\torch_code-afvE469o\lib\site-packages\torch\_inductor\codegen\cpp.py", line 2155, in reduction self._gen_parallel_reduction_buffers(acc, acc_type, reduction_type, init_dtype) File "D:\Programs\Python\virtualenvs\torch_code-afvE469o\lib\site-packages\torch\_inductor\codegen\cpp.py", line 1942, in _gen_parallel_reduction_buffers reduction_prefix_array( File "D:\Programs\Python\virtualenvs\torch_code-afvE469o\lib\site-packages\torch\_inductor\codegen\cpp.py", line 335, in reduction_prefix_array if cpp_builder.is_msvc_cl() File "D:\Programs\Python\virtualenvs\torch_code-afvE469o\lib\site-packages\torch\_inductor\cpp_builder.py", line 317, in is_msvc_cl return _is_msvc_cl(get_cpp_compiler()) File "D:\Programs\Python\virtualenvs\torch_code-afvE469o\lib\site-packages\torch\_inductor\cpp_builder.py", line 240, in _is_msvc_cl subprocess.check_output([cpp_compiler, "/help"], stderr=subprocess.STDOUT) torch._inductor.exc.InductorError: UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd3 in position 0: invalid continuation byte ``` On non-English language pack msvc environment, compiler path has raised `utf-8` issue. I add the `normalize_path_separator` to normalize the compiler path and avoid the issue. Pull Request resolved: https://github.com/pytorch/pytorch/pull/157835 Approved by: https://github.com/jansel	2025-07-09 04:02:20 +00:00
Han, Xu	39b71d11fc	[Inductor] add pedantic to limit inductor code follow standard. (#156914 ) ### Background: During my development work, I found that Windows MSVC does not support compiling zero-size arrays; see: https://github.com/pytorch/pytorch/issues/153180 As discussed with an MSFT engineer, zero-size arrays do not conform to the C++ standard, though gcc/clang accept them. With the `-pedantic` option, gcc checks strict C++ standard conformance and reports violations. Reference: https://github.com/pytorch/pytorch/issues/153180#issuecomment-2986676878 So this PR adds `-pedantic` to the torch inductor build option list to constrain codegen to generate standard-conforming C++ code. Additionally, it fixes a zero-size array in the Halide code. Pull Request resolved: https://github.com/pytorch/pytorch/pull/156914 Approved by: https://github.com/jansel	2025-06-30 16:29:08 +00:00
Nikita Shulga	039a1ce0eb	[BE] Remove CXX11_ABI references from cpp_builder.py (#156896 ) As all Linux builds are CXX11_ABI compatible at this point Pull Request resolved: https://github.com/pytorch/pytorch/pull/156896 Approved by: https://github.com/desertfire, https://github.com/jansel	2025-06-26 17:28:01 +00:00
Xuehai Pan	6ff6630375	[BE][3/16] fix typos in torch/ (torch/_inductor/) (#156313 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/156313 Approved by: https://github.com/jingsh	2025-06-23 02:57:12 +00:00
PyTorch MergeBot	f1331f3f1b	Revert "[BE][3/16] fix typos in torch/ (torch/_inductor/) (#156313 )" This reverts commit `3627270bdf`. Reverted https://github.com/pytorch/pytorch/pull/156313 on behalf of https://github.com/atalman due to export/test_torchbind.py::TestCompileTorchbind::test_compile_error_on_input_aliasing_contents_backend_aot_eager [GH job link](https://github.com/pytorch/pytorch/actions/runs/15804799771/job/44548489912) [HUD commit link](`c95f7fa874`) ([comment](https://github.com/pytorch/pytorch/pull/156313#issuecomment-2994171213))	2025-06-22 12:31:57 +00:00
Xuehai Pan	3627270bdf	[BE][3/16] fix typos in torch/ (torch/_inductor/) (#156313 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/156313 Approved by: https://github.com/jingsh	2025-06-22 08:43:09 +00:00
cyy	c2beeadeb4	[Reland] Use 3.27 as the minimum CMake version (#154783 ) Reland of #153153, which was incidentally closed. Update the minimum CMake version to 3.27 because of it provides more CUDA targets such as CUDA::nvperf_host so that it is possible to remove some of our forked CUDA modules. See https://github.com/pytorch/pytorch/pull/153783. It's also possible to facilitate future third-party updates such as FBGEMM (its current shipped version requires 3.21). Pull Request resolved: https://github.com/pytorch/pytorch/pull/154783 Approved by: https://github.com/ezyang	2025-06-14 16:37:51 +00:00
cyy	1393f71e07	Use CUDA language in generated CMakeLists.txt from cpp_builder.py (#155979 ) The CMake CUDA module has been deprecated. Pull Request resolved: https://github.com/pytorch/pytorch/pull/155979 Approved by: https://github.com/ezyang	2025-06-14 06:52:51 +00:00
angelayi	a4ab392251	[aoti][mps] mps constants support (#154287 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/154287 Approved by: https://github.com/malfet ghstack dependencies: #155752	2025-06-12 23:33:07 +00:00
Oguz Ulgen	d1947a8707	Migrate from lru_cache to cache (#155613 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/155613 Approved by: https://github.com/ezyang ghstack dependencies: #155612	2025-06-11 19:44:18 +00:00
Bin Bao	44df7cf28d	[AOTI] Fix embed_kernel_binary error when max_autotune is ON (#155569 ) Summary: Stop removing cubin files so that it won't be missing when max_autotune is ON. Pull Request resolved: https://github.com/pytorch/pytorch/pull/155569 Approved by: https://github.com/angelayi, https://github.com/yushangdi	2025-06-11 12:27:36 +00:00
PyTorch MergeBot	bd10ea4e6c	Revert "Use 3.27 as the minimum CMake version (#153153 )" This reverts commit `ad26ec6abe`. Reverted https://github.com/pytorch/pytorch/pull/153153 on behalf of https://github.com/cyyever due to It still breaks windows debug builds ([comment](https://github.com/pytorch/pytorch/pull/153153#issuecomment-2923997777))	2025-05-31 02:14:24 +00:00
cyy	ad26ec6abe	Use 3.27 as the minimum CMake version (#153153 ) Update the minimum CMake version to 3.27 because of it provides more CUDA targets such as `CUDA::nvperf_host` so that it is possible to remove some of our forked CUDA modules. See https://github.com/pytorch/pytorch/pull/153783. It's also possible to facilitate future third-party updates such as FBGEMM (its current shipped version requires 3.21). Pull Request resolved: https://github.com/pytorch/pytorch/pull/153153 Approved by: https://github.com/malfet	2025-05-31 01:54:35 +00:00
PyTorch MergeBot	108422ac26	Revert "Use 3.27 as the minimum CMake version (#153153 )" This reverts commit `78624679a8`. Reverted https://github.com/pytorch/pytorch/pull/153153 on behalf of https://github.com/cyyever due to It still breaks windows debug builds ([comment](https://github.com/pytorch/pytorch/pull/153153#issuecomment-2923785799))	2025-05-31 00:28:03 +00:00
cyy	78624679a8	Use 3.27 as the minimum CMake version (#153153 ) Update the minimum CMake version to 3.27 because of it provides more CUDA targets such as `CUDA::nvperf_host` so that it is possible to remove some of our forked CUDA modules. See https://github.com/pytorch/pytorch/pull/153783. It's also possible to facilitate future third-party updates such as FBGEMM (its current shipped version requires 3.21). Pull Request resolved: https://github.com/pytorch/pytorch/pull/153153 Approved by: https://github.com/malfet	2025-05-31 00:01:52 +00:00
PyTorch MergeBot	7e8532077f	Revert "Use 3.27 as the minimum CMake version (#153153 )" This reverts commit `1ece53b157`. Reverted https://github.com/pytorch/pytorch/pull/153153 on behalf of https://github.com/cyyever due to It still breaks windows debug builds ([comment](https://github.com/pytorch/pytorch/pull/153153#issuecomment-2922369830))	2025-05-30 13:16:33 +00:00
cyy	1ece53b157	Use 3.27 as the minimum CMake version (#153153 ) Update the minimum CMake version to 3.27 because of it provides more CUDA targets such as `CUDA::nvperf_host` so that it is possible to remove some of our forked CUDA modules. See https://github.com/pytorch/pytorch/pull/153783. It's also possible to facilitate future third-party updates such as FBGEMM (its current shipped version requires 3.21). Pull Request resolved: https://github.com/pytorch/pytorch/pull/153153 Approved by: https://github.com/malfet	2025-05-30 11:25:30 +00:00
Bin Bao	5a21d6f982	[AOTI][reland] Support multi-arch when using package_cpp_only (#154608 ) Summary: Reland https://github.com/pytorch/pytorch/pull/154414 Add support of multi_arch_kernel_binary in the package_cpp_only mode. More specifically, generate specific cmake targets to compile .ptx to .fatbin and embed them in the final shared library or binary. Pull Request resolved: https://github.com/pytorch/pytorch/pull/154608 Approved by: https://github.com/yushangdi	2025-05-29 19:32:33 +00:00
PyTorch MergeBot	fdc339003b	Revert "[AOTI] Support multi-arch when using package_cpp_only (#154414 )" This reverts commit `a84d8c4a1c`. Reverted https://github.com/pytorch/pytorch/pull/154414 on behalf of https://github.com/huydhn due to Sorry for reverting your change but it is failing ROCm trunk job ([comment](https://github.com/pytorch/pytorch/pull/154414#issuecomment-2915597821))	2025-05-28 09:23:31 +00:00
Bin Bao	a84d8c4a1c	[AOTI] Support multi-arch when using package_cpp_only (#154414 ) Summary: Add support of multi_arch_kernel_binary in the package_cpp_only mode. More specifically, generate specific cmake targets to compile .ptx to .fatbin and embed them in the final shared library or binary. Differential Revision: [D75452096](https://our.internmc.facebook.com/intern/diff/D75452096) Pull Request resolved: https://github.com/pytorch/pytorch/pull/154414 Approved by: https://github.com/angelayi ghstack dependencies: #154412, #154413	2025-05-28 01:20:38 +00:00
Bin Bao	cde82d25b7	[AOTI] Add a multi_arch_kernel_binary option (#154413 ) Summary: CUDA can support multi-arch with the fatbin format. Add this multi_arch_kernel_binary option, so the compiled model binary can run across different GPU archs. Differential Revision: [D75452094](https://our.internmc.facebook.com/intern/diff/D75452094) Pull Request resolved: https://github.com/pytorch/pytorch/pull/154413 Approved by: https://github.com/angelayi ghstack dependencies: #154412	2025-05-28 01:20:38 +00:00
Bin Bao	72a3c8dfa8	[AOTI][reland] Add an option to specify custom op C shim (#153968 ) Summary: Reland https://github.com/pytorch/pytorch/pull/153851 after fixing a fuzzer test issue. Add an option to tell AOTInductor codegen to generate C shim functions for certain custom ops instead of relying on ProxyExecutor. The lib that defines custom ops need to implement corresponding C shim functions. Pull Request resolved: https://github.com/pytorch/pytorch/pull/153968 Approved by: https://github.com/hl475	2025-05-21 15:57:57 +00:00
Dan Zimmerman	e0f8174001	[triton][fb] Move build_paths into triton_utils (#153652 ) Summary: TSA, this is just a small cleanup Test Plan: CI Differential Revision: D74835506 Pull Request resolved: https://github.com/pytorch/pytorch/pull/153652 Approved by: https://github.com/Skylion007	2025-05-20 18:59:50 +00:00


162 Commits