Xu Han
7e00f2ec9d
[AOTI] add zero size consts asm handler ( #159225 )
...
Add `get_zero_consts_asm_code` to handle converting zero-size constants to an object file.
It handles the zero-size constants case, which needs special treatment because the C++ standard does not allow zero-size arrays:
https://stackoverflow.com/questions/9722632/what-happens-if-i-define-a-0-size-array-in-c-c
1. On Windows, MSVC will report error C2466:
https://learn.microsoft.com/en-us/cpp/error-messages/compiler-errors-1/compiler-error-c2466?view=msvc-170
So we use the assembler to handle this situation instead.
2. Why not use Win32 assembly for all paths on Windows? Because ml64 only supports alignment up to `16`, which does not meet PyTorch's `64`-byte alignment. Reference: https://learn.microsoft.com/en-us/cpp/assembler/masm/ml-and-ml64-command-line-reference?view=msvc-170
```
Packs structures on the specified byte boundary. The alignment can be 1, 2, 4, 8, or 16.
```
3. This function handles the zero-size case on both Windows and Linux:
A. On Linux, we added `-pedantic` to disallow zero-size arrays in the C++ compiler. 8e07c9870d/torch/_inductor/cpp_builder.py (L580)
B. On Windows, MSVC does not support zero-size arrays by default.
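To make the idea above concrete, here is a minimal sketch of emitting an assembly file that defines an exported, empty data label, which sidesteps the zero-size C++ array entirely. The helper name `zero_consts_asm_sketch` and the symbol `aoti_consts_blob` are hypothetical; this is not PyTorch's `get_zero_consts_asm_code`, only an illustration of the technique under the assumption that ml64 (Windows) and the GNU assembler (Linux/ELF) are the targets.

```python
from typing import Tuple


def zero_consts_asm_sketch(symbol: str, is_windows: bool) -> Tuple[str, str]:
    """Return (assembly source, file suffix) defining `symbol` as an empty label.

    Hypothetical helper illustrating the idea; it is not PyTorch's
    get_zero_consts_asm_code, and the symbol name is caller-chosen.
    """
    if is_windows:
        # MASM (ml64) syntax. Per the commit message, ml64 cannot express
        # 64-byte alignment; 16 is presumably acceptable here because no
        # bytes follow the label.
        lines = [
            f"PUBLIC {symbol}",
            ".data",
            "ALIGN 16",
            f"{symbol} LABEL BYTE",
            "END",
        ]
        return "\n".join(lines) + "\n", ".asm"
    # GNU assembler syntax (ELF): a 64-byte-aligned label with no data after it.
    lines = [
        "\t.section .rodata",
        f"\t.globl {symbol}",
        "\t.balign 64",
        f"{symbol}:",
    ]
    return "\n".join(lines) + "\n", ".s"
```

The assembled object exports the symbol, so C++ code can still refer to the constants blob by name even though it holds no bytes.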
Pull Request resolved: https://github.com/pytorch/pytorch/pull/159225
Approved by: https://github.com/desertfire
2025-07-31 22:46:33 +00:00
PyTorch MergeBot
7d6f340238
Revert "[AOTI] Add more default options to compile_standalone ( #158560 )"
...
This reverts commit a991e285ae .
Reverted https://github.com/pytorch/pytorch/pull/158560 on behalf of https://github.com/jeffdaily due to broke rocm CI, no test signal was available from rocm ciflow/trunk, need to add ciflow/rocm to reland ([comment](https://github.com/pytorch/pytorch/pull/158560#issuecomment-3103633964 ))
2025-07-22 16:20:17 +00:00
Bin Bao
a991e285ae
[AOTI] Add more default options to compile_standalone ( #158560 )
...
Summary: When compiling for standalone, make embed_kernel_binary and emit_multi_arch_kernel default to True, and add a default name for model_name_for_generated_files to make the generated cpp project easier to understand. Also make the weights object file names more readable.
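The defaulting behavior described above (this commit was later reverted, see the MergeBot entry) can be sketched as filling in options only when the user has not set them. The `aot_inductor.` key prefix for these options and the default model name `"aoti_model"` are assumptions for illustration; `apply_standalone_defaults` is a hypothetical helper, not PyTorch code.

```python
from typing import Any, Dict


def apply_standalone_defaults(options: Dict[str, Any]) -> Dict[str, Any]:
    """Fill standalone-compile defaults without overriding explicit user choices.

    Hypothetical helper: key names mirror the commit message, but the
    "aot_inductor." prefix and the default model name are assumptions.
    """
    merged = dict(options)  # copy so the caller's dict is not mutated
    if merged.get("aot_inductor.compile_standalone"):
        # setdefault only writes a key that is absent, so user values win.
        merged.setdefault("aot_inductor.embed_kernel_binary", True)
        merged.setdefault("aot_inductor.emit_multi_arch_kernel", True)
        merged.setdefault("aot_inductor.model_name_for_generated_files", "aoti_model")
    return merged
```

Using `setdefault` keeps an explicit `False` from the user intact, which is the usual expectation for "default" options.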
Pull Request resolved: https://github.com/pytorch/pytorch/pull/158560
Approved by: https://github.com/yushangdi
2025-07-21 21:16:48 +00:00
Huamin Li
bc7b1f5252
[AOTI] Use libstdc++ only for fbcode cpu case ( #158659 )
...
Differential Revision: D78567218
Pull Request resolved: https://github.com/pytorch/pytorch/pull/158659
Approved by: https://github.com/kflu , https://github.com/zoranzhao
2025-07-18 22:27:10 +00:00
yuchengliu1
b4358c5e87
[inductor] Explicitly link c10 in inductor. ( #158622 )
...
MSVC reports "unresolved external symbol" errors when compiling Inductor code, so explicitly link c10 in Inductor.
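As a rough sketch of what "explicitly link c10" means on each toolchain: MSVC's linker takes library files such as `torch.lib` (as in the `cl` command lines later in this log), while gcc/clang take `-l<name>`. The helper `link_args` is hypothetical and only illustrates the naming convention.

```python
from typing import List


def link_args(libs: List[str], is_msvc: bool) -> List[str]:
    """Turn bare library names into linker arguments (hypothetical helper)."""
    if is_msvc:
        # MSVC's link step takes the import library file name, e.g. c10.lib.
        return [f"{lib}.lib" for lib in libs]
    # gcc/clang resolve -lc10 to libc10.so / libc10.a on the library path.
    return [f"-l{lib}" for lib in libs]


# Listing c10 explicitly next to the torch libraries lets its symbols resolve.
TORCH_LIBS = ["torch", "torch_cpu", "c10"]
```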
Pull Request resolved: https://github.com/pytorch/pytorch/pull/158622
Approved by: https://github.com/desertfire
Co-authored-by: Xu Han <xu.han@outlook.com>
2025-07-18 18:00:50 +00:00
Huamin Li
ddf502c988
[AOTI] add -lstdc++ into aoti link cmd for Meta internal ( #158325 )
...
Differential Revision: D78123716
Pull Request resolved: https://github.com/pytorch/pytorch/pull/158325
Approved by: https://github.com/desertfire
2025-07-16 07:55:08 +00:00
Shangdi Yu
4781d72faa
[AOTI] codegen for static linkage ( #157129 )
...
Design doc: https://docs.google.com/document/d/1ncV7RpJ8xDwy8-_aCBfvZmpTTL824C-aoNPBLLVkOHM/edit?tab=t.0 (internal)
- Add codegen for static linkage
- refactor test code for test_compile_after_package tests
For now, `"aot_inductor.package_cpp_only": True` must be used together with `"aot_inductor.compile_standalone": True`.
A follow-up PR will set `"aot_inductor.package_cpp_only"` to True automatically.
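The constraint above can be expressed as a small validation step. `check_standalone_options` is a hypothetical helper written for illustration, not PyTorch's own check; it only uses the two option names from the commit message.

```python
from typing import Any, Dict


def check_standalone_options(cfg: Dict[str, Any]) -> None:
    """Reject configs that enable compile_standalone without package_cpp_only.

    Hypothetical validation helper; PyTorch's actual handling may differ.
    """
    if cfg.get("aot_inductor.compile_standalone") and not cfg.get(
        "aot_inductor.package_cpp_only"
    ):
        raise ValueError(
            '"aot_inductor.compile_standalone": True currently requires '
            '"aot_inductor.package_cpp_only": True'
        )
```

Once the follow-up PR sets `package_cpp_only` automatically, a check like this would become unnecessary.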
```
python test/inductor/test_aot_inductor_package.py -k test_compile_after_package
python test/inductor/test_aot_inductor_package.py -k test_run_static_linkage_model
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/157129
Approved by: https://github.com/desertfire
2025-07-10 16:03:50 +00:00
Xiangyang (Mark) Guo
b354328ecd
[AOTI] add flag AOT_INDUCTOR_ENABLE_LTO ( #157773 )
...
Add the env var AOT_INDUCTOR_ENABLE_LTO; setting AOT_INDUCTOR_ENABLE_LTO=1 enables clang's ThinLTO. LTO is disabled by default because it may increase build time.
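A minimal sketch of the opt-in check, assuming the flag is clang's `-flto=thin`. The helper `lto_flags` is hypothetical; it shows only how the environment variable gates the flag, not where `cpp_builder.py` actually places it.

```python
import os
from typing import List, Mapping, Optional


def lto_flags(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return extra clang flags for ThinLTO when AOT_INDUCTOR_ENABLE_LTO=1.

    Hypothetical helper illustrating the opt-in environment variable.
    """
    if env is None:
        env = os.environ
    if env.get("AOT_INDUCTOR_ENABLE_LTO") == "1":
        # clang's ThinLTO flag; it must appear on both compile and link lines.
        return ["-flto=thin"]
    # Default: LTO off, to keep build time down.
    return []
```

Passing `env` explicitly makes the function easy to test without touching the process environment.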
Rollback Plan:
Differential Revision: D77899195
Pull Request resolved: https://github.com/pytorch/pytorch/pull/157773
Approved by: https://github.com/desertfire
2025-07-09 16:54:19 +00:00
Xu Han
fcbf7c749a
[Windows][Inductor] normalize_path_separator compiler path ( #157835 )
...
Fixes #157673
For the call trace:
```
......
File "D:\Programs\Python\virtualenvs\torch_code-afvE469o\lib\site-packages\torch\_inductor\codegen\common.py", line 2569, in reduction
return self.kernel.reduction(dtype, src_dtype, reduction_type, value)
File "D:\Programs\Python\virtualenvs\torch_code-afvE469o\lib\site-packages\torch\_inductor\codegen\cpp.py", line 2155, in reduction
self._gen_parallel_reduction_buffers(acc, acc_type, reduction_type, init_dtype)
File "D:\Programs\Python\virtualenvs\torch_code-afvE469o\lib\site-packages\torch\_inductor\codegen\cpp.py", line 1942, in _gen_parallel_reduction_buffers
reduction_prefix_array(
File "D:\Programs\Python\virtualenvs\torch_code-afvE469o\lib\site-packages\torch\_inductor\codegen\cpp.py", line 335, in reduction_prefix_array
if cpp_builder.is_msvc_cl()
File "D:\Programs\Python\virtualenvs\torch_code-afvE469o\lib\site-packages\torch\_inductor\cpp_builder.py", line 317, in is_msvc_cl
return _is_msvc_cl(get_cpp_compiler())
File "D:\Programs\Python\virtualenvs\torch_code-afvE469o\lib\site-packages\torch\_inductor\cpp_builder.py", line 240, in _is_msvc_cl
subprocess.check_output([cpp_compiler, "/help"], stderr=subprocess.STDOUT)
torch._inductor.exc.InductorError: UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd3 in position 0: invalid continuation byte
```
On MSVC environments with a non-English language pack, the compiler path raised a `utf-8` decoding error. This PR applies `normalize_path_separator` to the compiler path to avoid the issue.
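For illustration, here is an approximation of what normalizing path separators means on Windows: backslashes are rewritten as forward slashes, which `cl` also accepts (the `cl` command lines later in this log use forward slashes throughout). `normalize_path_separator_sketch` is a hypothetical stand-in, not the exact PyTorch implementation.

```python
def normalize_path_separator_sketch(path: str, is_windows: bool) -> str:
    """Rewrite Windows backslashes as forward slashes (hypothetical stand-in).

    Non-Windows paths are returned unchanged.
    """
    if is_windows:
        return path.replace("\\", "/")
    return path
```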
Pull Request resolved: https://github.com/pytorch/pytorch/pull/157835
Approved by: https://github.com/jansel
2025-07-09 04:02:20 +00:00
Han, Xu
39b71d11fc
[Inductor] add pedantic to limit inductor code follow standard. ( #156914 )
...
### Background:
During development, I found that MSVC on Windows cannot compile zero-size arrays; see https://github.com/pytorch/pytorch/issues/153180
After discussing with an MSFT engineer, we confirmed that zero-size arrays are not standard C++, although gcc/clang accept them. Adding the `-pedantic` option makes gcc check strict C++ standard conformance and report violations. Reference: https://github.com/pytorch/pytorch/issues/153180#issuecomment-2986676878
So this PR adds `-pedantic` to the torch inductor build options to ensure that codegen generates standard-conforming C++ code.
Additionally, it fixes Halide code that generated a zero-size array.
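A sketch of the two sides of this change, with hypothetical helper names: gcc/clang get `-pedantic` so that extensions such as `float buf[0];` are diagnosed (gcc reports "ISO C++ forbids zero-size array"), while codegen avoids emitting zero-size arrays in the first place. Raising on a zero size is one possible policy chosen here for illustration, not necessarily how Inductor or Halide codegen handles it.

```python
from typing import List


def standard_conformance_flags(is_msvc: bool) -> List[str]:
    """Hypothetical helper: add -pedantic for gcc/clang.

    MSVC already rejects zero-size arrays (error C2466), so it gets no flag.
    """
    return [] if is_msvc else ["-pedantic"]


def array_decl(ctype: str, name: str, size: int) -> str:
    """Emit a C++ array declaration, refusing sizes that are not standard C++."""
    if size <= 0:
        raise ValueError(f"zero-size array {name!r} is not standard C++")
    return f"{ctype} {name}[{size}];"
```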
Pull Request resolved: https://github.com/pytorch/pytorch/pull/156914
Approved by: https://github.com/jansel
2025-06-30 16:29:08 +00:00
Nikita Shulga
039a1ce0eb
[BE] Remove CXX11_ABI references from cpp_builder.py ( #156896 )
...
As all Linux builds are CXX11_ABI compatible at this point
Pull Request resolved: https://github.com/pytorch/pytorch/pull/156896
Approved by: https://github.com/desertfire , https://github.com/jansel
2025-06-26 17:28:01 +00:00
Xuehai Pan
6ff6630375
[BE][3/16] fix typos in torch/ (torch/_inductor/) ( #156313 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/156313
Approved by: https://github.com/jingsh
2025-06-23 02:57:12 +00:00
PyTorch MergeBot
f1331f3f1b
Revert "[BE][3/16] fix typos in torch/ (torch/_inductor/) ( #156313 )"
...
This reverts commit 3627270bdf .
Reverted https://github.com/pytorch/pytorch/pull/156313 on behalf of https://github.com/atalman due to export/test_torchbind.py::TestCompileTorchbind::test_compile_error_on_input_aliasing_contents_backend_aot_eager [GH job link](https://github.com/pytorch/pytorch/actions/runs/15804799771/job/44548489912 ) [HUD commit link](c95f7fa874 ) ([comment](https://github.com/pytorch/pytorch/pull/156313#issuecomment-2994171213 ))
2025-06-22 12:31:57 +00:00
Xuehai Pan
3627270bdf
[BE][3/16] fix typos in torch/ (torch/_inductor/) ( #156313 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/156313
Approved by: https://github.com/jingsh
2025-06-22 08:43:09 +00:00
cyy
c2beeadeb4
[Reland] Use 3.27 as the minimum CMake version ( #154783 )
...
Reland of #153153 , which was accidentally closed.
Update the minimum CMake version to 3.27 because it provides more CUDA targets such as CUDA::nvperf_host, which makes it possible to remove some of our forked CUDA modules. See https://github.com/pytorch/pytorch/pull/153783.
It also eases future third-party updates such as FBGEMM (its currently shipped version requires 3.21).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/154783
Approved by: https://github.com/ezyang
2025-06-14 16:37:51 +00:00
cyy
1393f71e07
Use CUDA language in generated CMakeLists.txt from cpp_builder.py ( #155979 )
...
The CMake `FindCUDA` module has been deprecated, so the generated CMakeLists.txt enables CUDA as a language instead.
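Here is a hedged sketch of a generator that declares CUDA as a first-class CMake language instead of using the deprecated `FindCUDA` module. `cmake_prelude` and the project name are hypothetical. The `3.27` minimum reuses the version from the CMake commit above as an assumption, and `find_package(CUDAToolkit)` is the CMake module that provides `CUDA::` imported targets such as `CUDA::cudart`.

```python
def cmake_prelude(project: str, use_cuda: bool) -> str:
    """Emit the top of a generated CMakeLists.txt (hypothetical generator)."""
    languages = "CXX CUDA" if use_cuda else "CXX"
    lines = [
        "cmake_minimum_required(VERSION 3.27)",
        # Declaring CUDA here replaces the old find_package(CUDA) approach.
        f"project({project} LANGUAGES {languages})",
    ]
    if use_cuda:
        # FindCUDAToolkit provides imported targets such as CUDA::cudart.
        lines.append("find_package(CUDAToolkit REQUIRED)")
    return "\n".join(lines) + "\n"
```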
Pull Request resolved: https://github.com/pytorch/pytorch/pull/155979
Approved by: https://github.com/ezyang
2025-06-14 06:52:51 +00:00
angelayi
a4ab392251
[aoti][mps] mps constants support ( #154287 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/154287
Approved by: https://github.com/malfet
ghstack dependencies: #155752
2025-06-12 23:33:07 +00:00
Oguz Ulgen
d1947a8707
Migrate from lru_cache to cache ( #155613 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/155613
Approved by: https://github.com/ezyang
ghstack dependencies: #155612
2025-06-11 19:44:18 +00:00
Bin Bao
44df7cf28d
[AOTI] Fix embed_kernel_binary error when max_autotune is ON ( #155569 )
...
Summary: Stop removing cubin files so that they are not missing when max_autotune is ON.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/155569
Approved by: https://github.com/angelayi , https://github.com/yushangdi
2025-06-11 12:27:36 +00:00
PyTorch MergeBot
bd10ea4e6c
Revert "Use 3.27 as the minimum CMake version ( #153153 )"
...
This reverts commit ad26ec6abe .
Reverted https://github.com/pytorch/pytorch/pull/153153 on behalf of https://github.com/cyyever due to It still breaks windows debug builds ([comment](https://github.com/pytorch/pytorch/pull/153153#issuecomment-2923997777 ))
2025-05-31 02:14:24 +00:00
cyy
ad26ec6abe
Use 3.27 as the minimum CMake version ( #153153 )
...
Update the minimum CMake version to 3.27 because it provides more CUDA targets such as `CUDA::nvperf_host`, which makes it possible to remove some of our forked CUDA modules. See https://github.com/pytorch/pytorch/pull/153783.
It also eases future third-party updates such as FBGEMM (its currently shipped version requires 3.21).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/153153
Approved by: https://github.com/malfet
2025-05-31 01:54:35 +00:00
PyTorch MergeBot
108422ac26
Revert "Use 3.27 as the minimum CMake version ( #153153 )"
...
This reverts commit 78624679a8 .
Reverted https://github.com/pytorch/pytorch/pull/153153 on behalf of https://github.com/cyyever due to It still breaks windows debug builds ([comment](https://github.com/pytorch/pytorch/pull/153153#issuecomment-2923785799 ))
2025-05-31 00:28:03 +00:00
cyy
78624679a8
Use 3.27 as the minimum CMake version ( #153153 )
...
Update the minimum CMake version to 3.27 because it provides more CUDA targets such as `CUDA::nvperf_host`, which makes it possible to remove some of our forked CUDA modules. See https://github.com/pytorch/pytorch/pull/153783.
It also eases future third-party updates such as FBGEMM (its currently shipped version requires 3.21).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/153153
Approved by: https://github.com/malfet
2025-05-31 00:01:52 +00:00
PyTorch MergeBot
7e8532077f
Revert "Use 3.27 as the minimum CMake version ( #153153 )"
...
This reverts commit 1ece53b157 .
Reverted https://github.com/pytorch/pytorch/pull/153153 on behalf of https://github.com/cyyever due to It still breaks windows debug builds ([comment](https://github.com/pytorch/pytorch/pull/153153#issuecomment-2922369830 ))
2025-05-30 13:16:33 +00:00
cyy
1ece53b157
Use 3.27 as the minimum CMake version ( #153153 )
...
Update the minimum CMake version to 3.27 because it provides more CUDA targets such as `CUDA::nvperf_host`, which makes it possible to remove some of our forked CUDA modules. See https://github.com/pytorch/pytorch/pull/153783.
It also eases future third-party updates such as FBGEMM (its currently shipped version requires 3.21).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/153153
Approved by: https://github.com/malfet
2025-05-30 11:25:30 +00:00
Bin Bao
5a21d6f982
[AOTI][reland] Support multi-arch when using package_cpp_only ( #154608 )
...
Summary: Reland https://github.com/pytorch/pytorch/pull/154414
Add support for multi_arch_kernel_binary in the package_cpp_only mode. More specifically, generate dedicated CMake targets that compile .ptx to .fatbin and embed the results in the final shared library or binary.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/154608
Approved by: https://github.com/yushangdi
2025-05-29 19:32:33 +00:00
PyTorch MergeBot
fdc339003b
Revert "[AOTI] Support multi-arch when using package_cpp_only ( #154414 )"
...
This reverts commit a84d8c4a1c .
Reverted https://github.com/pytorch/pytorch/pull/154414 on behalf of https://github.com/huydhn due to Sorry for reverting your change but it is failing ROCm trunk job ([comment](https://github.com/pytorch/pytorch/pull/154414#issuecomment-2915597821 ))
2025-05-28 09:23:31 +00:00
Bin Bao
a84d8c4a1c
[AOTI] Support multi-arch when using package_cpp_only ( #154414 )
...
Summary: Add support for multi_arch_kernel_binary in the package_cpp_only mode. More specifically, generate dedicated CMake targets that compile .ptx to .fatbin and embed the results in the final shared library or binary.
Differential Revision: [D75452096](https://our.internmc.facebook.com/intern/diff/D75452096 )
Pull Request resolved: https://github.com/pytorch/pytorch/pull/154414
Approved by: https://github.com/angelayi
ghstack dependencies: #154412 , #154413
2025-05-28 01:20:38 +00:00
Bin Bao
cde82d25b7
[AOTI] Add a multi_arch_kernel_binary option ( #154413 )
...
Summary: CUDA can support multi-arch with the fatbin format. Add this multi_arch_kernel_binary option, so the compiled model binary can run across different GPU archs.
Differential Revision: [D75452094](https://our.internmc.facebook.com/intern/diff/D75452094 )
Pull Request resolved: https://github.com/pytorch/pytorch/pull/154413
Approved by: https://github.com/angelayi
ghstack dependencies: #154412
2025-05-28 01:20:38 +00:00
Bin Bao
72a3c8dfa8
[AOTI][reland] Add an option to specify custom op C shim ( #153968 )
...
Summary: Reland https://github.com/pytorch/pytorch/pull/153851 after fixing a fuzzer test issue.
Add an option to tell AOTInductor codegen to generate C shim functions for certain custom ops instead of relying on ProxyExecutor. The library that defines the custom ops needs to implement the corresponding C shim functions.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/153968
Approved by: https://github.com/hl475
2025-05-21 15:57:57 +00:00
Dan Zimmerman
e0f8174001
[triton][fb] Move build_paths into triton_utils ( #153652 )
...
Summary: TSA, this is just a small cleanup
Test Plan: CI
Differential Revision: D74835506
Pull Request resolved: https://github.com/pytorch/pytorch/pull/153652
Approved by: https://github.com/Skylion007
2025-05-20 18:59:50 +00:00
PyTorch MergeBot
3102ae6798
Revert "[AOTI] Add an option to specify custom op C shim ( #153851 )"
...
This reverts commit 365ac49840 .
Reverted https://github.com/pytorch/pytorch/pull/153851 on behalf of https://github.com/malfet due to Looks like it broke fuzzer test, but I could be wrong, see c4d1ff02f8/1 ([comment](https://github.com/pytorch/pytorch/pull/153851#issuecomment-2894619773 ))
2025-05-20 14:23:50 +00:00
Bin Bao
365ac49840
[AOTI] Add an option to specify custom op C shim ( #153851 )
...
Summary: Add an option to tell AOTInductor codegen to generate C shim functions for certain custom ops instead of relying on ProxyExecutor. The library that defines the custom ops needs to implement the corresponding C shim functions.
Differential Revision: [D75014177](https://our.internmc.facebook.com/intern/diff/D75014177 )
Pull Request resolved: https://github.com/pytorch/pytorch/pull/153851
Approved by: https://github.com/hl475
2025-05-20 05:12:09 +00:00
Bin Bao
a2d0ef242d
[AOTI] Embed cubin files into .so ( #150739 )
...
Summary: Embed cubin files so that AOTI is one step closer to generating a single binary. This is controlled by a flag and is off by default.
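One common way to embed a binary blob such as a cubin into a shared library is to render it as a C++ byte array and compile it in with the rest of the code. This is assumed here purely for illustration; the commit does not say AOTI uses this exact technique, and `embed_binary_as_cpp` is hypothetical.

```python
def embed_binary_as_cpp(symbol: str, data: bytes) -> str:
    """Render a binary blob as C++ source defining `symbol` and `symbol_size`.

    Hypothetical helper; one common embedding technique, not necessarily AOTI's.
    """
    if not data:
        # A zero-length array would not be standard C++ (see the -pedantic
        # commit in this log), so refuse empty input.
        raise ValueError("cannot embed an empty blob as a C++ array")
    body = ", ".join(f"0x{b:02x}" for b in data)
    # `extern` gives the const arrays external linkage so other TUs can see them.
    return (
        f"extern const unsigned char {symbol}[] = {{{body}}};\n"
        f"extern const unsigned long long {symbol}_size = {len(data)};\n"
    )
```

For large cubins, this text-based approach inflates compile time; assembler `.incbin` or linker-based embedding are alternatives.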
Differential Revision: [D72535357](https://our.internmc.facebook.com/intern/diff/D72535357 )
Pull Request resolved: https://github.com/pytorch/pytorch/pull/150739
Approved by: https://github.com/angelayi
2025-05-19 01:11:46 +00:00
Benjamin Glass
cda572b053
codecache: Remove cpp_prefix.h duplication per build, then precompile it ( #144293 )
...
Prior to this PR, `_inductor/codegen/cpp_prefix.h` was copied into a new temporary directory on every inductor run utilizing the CPP backend (i.e. CPU-only), then included in the output source code. Instead, this PR puts it in an appropriate place in the torch includes, and includes it from there. This allows us to precompile it in cpp_wrapper and AOT inductor mode, saving significant compilation time.
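With gcc, a header is typically precompiled by compiling it as `c++-header` into a `.gch` file next to it; later compilations that include the header with compatible flags pick up the `.gch` automatically. The helper below only builds such a command line. It is a hypothetical sketch for gcc, not `cpp_builder.py`'s actual precompilation logic.

```python
from typing import List


def pch_command(compiler: str, header: str, flags: List[str]) -> List[str]:
    """Build a gcc command that precompiles `header` into `header.gch`.

    Hypothetical helper. The flags should match those used for the real
    compilations, or gcc will ignore the precompiled header.
    """
    return [compiler, *flags, "-x", "c++-header", header, "-o", header + ".gch"]
```

The resulting list can be passed to `subprocess.run` once per build environment, which is where the compile-time savings come from.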
Due to difficulties getting this to work in FBCode, the precompilation itself is only enabled in OSS PyTorch.
Differential Revision: [D69420620](https://our.internmc.facebook.com/intern/diff/D69420620 )
Pull Request resolved: https://github.com/pytorch/pytorch/pull/144293
Approved by: https://github.com/desertfire
2025-05-16 17:41:36 +00:00
Aaron Gokaslan
1c659b5bc0
[BE]: Use more portable shutil.which call for cpp_builder ( #153325 )
...
We should be using shutil.which instead of calling some binary subprocess here for portability and security.
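As an illustration of the `shutil.which` approach, the sketch below looks up a compiler on `PATH` without spawning a subprocess. The candidate list and the helper name `find_cpp_compiler` are hypothetical; `shutil.which` itself is the standard-library function.

```python
import shutil
from typing import Iterable, Optional


def find_cpp_compiler(candidates: Iterable[str] = ("cl", "clang++", "g++")) -> Optional[str]:
    """Return the full path of the first candidate found on PATH, else None."""
    for name in candidates:
        # shutil.which searches PATH (and honors PATHEXT on Windows) without
        # running anything, unlike probing with a `which`/`where` subprocess.
        path = shutil.which(name)
        if path is not None:
            return path
    return None
```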
Pull Request resolved: https://github.com/pytorch/pytorch/pull/153325
Approved by: https://github.com/xuhancn , https://github.com/cyyever , https://github.com/albanD
2025-05-12 15:15:21 +00:00
Benjamin Glass
b80bb87689
cpp_wrapper: Miscellaneous fixups ( #150143 )
...
1. Revisit preprocessing code in cpp_builder.py, removing a hack that channels it through stdout.
2. Fix ops that return None.
Differential Revision: [D72053414](https://our.internmc.facebook.com/intern/diff/D72053414 )
Pull Request resolved: https://github.com/pytorch/pytorch/pull/150143
Approved by: https://github.com/desertfire
2025-04-10 03:31:12 +00:00
Jason Ansel
37ebb0b56a
[inductor] Fix inductor windows linker error ( #150256 )
...
Fixes #149889
Pull Request resolved: https://github.com/pytorch/pytorch/pull/150256
Approved by: https://github.com/anijain2305 , https://github.com/eellison
2025-04-01 18:30:55 +00:00
Vlad K
f1b74037b1
Fix bug when Inductor include path contains spaces ( #148271 )
...
This PR fixes a bug with how include directories with spaces are handled on Windows. I ran into an edge case with torch.compile() - it will error out with an exception on Windows. In particular, it will try to execute the following: `cl /I C:/Program Files/Python311/Include ...`, where `C:/Program` will be treated as separate from `Files/Python311/Include`.
I looked into using something like `shlex.quote` or `pathlib.Path`, but I didn't find those options to be suitable (shlex is POSIX shell only, pathlib.Path does not escape spaces).
There is another place in the function that also deals with escaping spaces. My fix follows the same style. 0ff2e6a85a/torch/_inductor/cpp_builder.py (L1464)
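A minimal sketch of the quoting fix described above, with hypothetical helper names: wrap a path in double quotes when it contains a space, so that `cl` sees `C:/Program Files/Python311/Include` as one argument. The earlier commit below quotes every path unconditionally, which is also valid; quoting only when needed just keeps command lines shorter.

```python
from typing import List


def quote_if_needed(path: str) -> str:
    """Wrap `path` in double quotes if it contains a space and is not quoted yet."""
    if " " in path and not (path.startswith('"') and path.endswith('"')):
        return f'"{path}"'
    return path


def include_args(dirs: List[str]) -> str:
    """Build the /I portion of a cl command line (hypothetical helper)."""
    return " ".join(f"/I {quote_if_needed(d)}" for d in dirs)
```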
Pull Request resolved: https://github.com/pytorch/pytorch/pull/148271
Approved by: https://github.com/Skylion007
Co-authored-by: Aaron Gokaslan <aaronGokaslan@gmail.com>
2025-03-31 06:46:05 +00:00
Xu Han
bc1b8730a4
[Windows][inductor] fix blank space break windows file path ( #149388 )
...
Fixes #149310
From the original error message:
```cmd
Command:
cl /I C:/Program Files/Python310/Include /I c:/code/.env/lib/site-packages/torch/include /I c:/code/.env/lib/site-packages/torch/include/torch/csrc/api/include /I c:/code/.env/lib/site-packages/torch/include/TH /I c:/code/.env/lib/site-packages/torch/include/THC /D TORCH_INDUCTOR_CPP_WRAPPER /D STANDALONE_TORCH_HEADER /D C10_USING_CUSTOM_GENERATED_MACROS /DLL /MD /O2 /std:c++20 /wd4819 /wd4251 /wd4244 /wd4267 /wd4275 /wd4018 /wd4190 /wd4624 /wd4067 /wd4068 /EHsc /openmp /openmp:experimental C:/Users/user/AppData/Local/Temp/torchinductor_user/ou/coubnfnqsm2gbdzdytufv46jotd6sxsnnhgldiw45pl5yjq5nbvz.cpp /LD /FeC:/Users/user/AppData/Local/Temp/torchinductor_user/ou/coubnfnqsm2gbdzdytufv46jotd6sxsnnhgldiw45pl5yjq5nbvz.pyd /link /LIBPATH:c:/code/.env/Scripts/libs /LIBPATH:c:/code/.env/lib/site-packages/torch/lib torch.lib torch_cpu.lib torch_python.lib sleef.lib
Output:
Microsoft (R) C/C++ Optimizing Compiler Version 19.43.34809 for x86
Copyright (C) Microsoft Corporation. All rights reserved.
cl : Command line warning D9025 : overriding '/openmp' with '/openmp:experimental'
cl : Command line warning D9024 : unrecognized source file type 'Files/Python310/Include', object file assumed
coubnfnqsm2gbdzdytufv46jotd6sxsnnhgldiw45pl5yjq5nbvz.cpp
C:/Users/user/AppData/Local/Temp/torchinductor_user/ou/coubnfnqsm2gbdzdytufv46jotd6sxsnnhgldiw45pl5yjq5nbvz.cpp(21): fatal error C1083: Cannot open include file: 'Python.h': No such file or directory
```
Python is installed under `C:/Program Files/Python310`, and the space breaks the file path.
Solution:
Quote Windows file paths. After the fix:
```cmd
cl /I "C:/Users/Xuhan/.conda/envs/new_build/Include" /I "C:/Users/Xuhan/.conda/envs/new_build/lib/site-packages/torch/include" /I "C:/Users/Xuhan/.conda/envs/new_build/lib/site-packages/torch/include/torch/csrc/api/include" /D TORCH_INDUCTOR_CPP_WRAPPER /D STANDALONE_TORCH_HEADER /D C10_USING_CUSTOM_GENERATED_MACROS /D CPU_CAPABILITY_AVX512 /DLL /MD /O2 /std:c++20 /wd4819 /wd4251 /wd4244 /wd4267 /wd4275 /wd4018 /wd4190 /wd4624 /wd4067 /wd4068 /EHsc /openmp /openmp:experimental C:/Users/Xuhan/AppData/Local/Temp/tmp1wsj0m8r/za/czarp3ly5c22ge3hydvnzvad4cjimyr3hkwvofodxqffgil7frfd.cpp /arch:AVX512 /FeC:/Users/Xuhan/AppData/Local/Temp/tmp1wsj0m8r/za/czarp3ly5c22ge3hydvnzvad4cjimyr3hkwvofodxqffgil7frfd.pyd /LD /link /LIBPATH:"C:/Users/Xuhan/.conda/envs/new_build/libs" /LIBPATH:"C:/Users/Xuhan/.conda/envs/new_build/lib/site-packages/torch/lib" "torch.lib" "torch_cpu.lib" "torch_python.lib" "sleef.lib"
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/149388
Approved by: https://github.com/jansel
2025-03-20 03:10:30 +00:00
Benjamin Glass
e8dd58b8cf
cpp_wrapper: Precompile device-specific header files ( #146928 )
...
This saves us about a second per compilation, which is _massive_ for the OpInfo tests. Total OpInfo test runtime is down about 2x from this change alone.
Relands #144002 , with changes needed by fbcode internals.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/146928
Approved by: https://github.com/desertfire
2025-03-17 20:40:15 +00:00
Shangdi Yu
df60500ab8
Fix too big to optimize in test, actually use O0 when aot_inductor.compile_wrapper_with_O0 is set ( #148714 )
...
Summary:
1. Check against the "0" char instead
2. We got the following error when compiling with anything other than the O0 flag: `error: Function ZN5torch12aot_inductorL22__check_inputs_outputsEPP16AtenTensorOpaqueS3 is too big to optimize [-Werror,-Wignored-optimization-argument]`. So we compile the wrapper code with the O0 flag when `aot_inductor.compile_wrapper_opt_level` is set to `O0`.
Test Plan:
```
buck run 'fbcode//mode/opt' fbcode//deeplearning/aot_inductor/cpu/test:ads_second_stage_dsnn_models_aoti_lowering_test -- -r AdsSecondStageDSNNModelsAOTILoweringTest
```
Differential Revision: D70670957
Pull Request resolved: https://github.com/pytorch/pytorch/pull/148714
Approved by: https://github.com/desertfire
2025-03-13 10:22:06 +00:00
Zhuoran Zhao
3745da18f4
[AOTI] Switch to local cpp compile for fbcode ( #148592 )
...
Summary: As the title says; otherwise we cannot find lamdhip64.
Test Plan: https://www.internalfb.com/phabricator/paste/view/P1747104431
Differential Revision: D70637798
Pull Request resolved: https://github.com/pytorch/pytorch/pull/148592
Approved by: https://github.com/hl475
2025-03-08 08:38:26 +00:00
Benjamin Glass
d6d670ab4d
[AOTI] build CPU CPP kernels at O3, and all other code at O1 ( #148587 )
...
In the future, we may also want to add LTO linking to further optimize the results (while still hopefully netting compile time benefits).
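The split described in the title can be sketched as a per-file optimization plan. This works because gcc/clang generally apply one `-O` level per translation unit, so kernel code and wrapper code must live in separate files. `opt_level_plan` is a hypothetical helper, not the PR's implementation.

```python
from typing import Dict


def opt_level_plan(sources: Dict[str, bool]) -> Dict[str, str]:
    """Map each translation unit to a gcc/clang optimization flag.

    Hypothetical helper: True marks CPU C++ kernel code, built at -O3 where
    runtime speed matters most; everything else (e.g. wrapper code) is built
    at -O1 to save compile time.
    """
    return {src: ("-O3" if is_kernel else "-O1") for src, is_kernel in sources.items()}
```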
Differential Revision: [D70641543](https://our.internmc.facebook.com/intern/diff/D70641543 )
Pull Request resolved: https://github.com/pytorch/pytorch/pull/148587
Approved by: https://github.com/desertfire
2025-03-05 22:47:46 +00:00
Bin Bao
df7e43e5d4
[AOTI] Fix aot_inductor_package test errors ( #148279 )
...
Summary: Fix fbcode test failures introduced by https://github.com/pytorch/pytorch/pull/147975 . Make sure script.ld is copied to the build-time directory.
Differential Revision: D70454149
Pull Request resolved: https://github.com/pytorch/pytorch/pull/148279
Approved by: https://github.com/zoranzhao
2025-03-05 05:22:48 +00:00
Xuehai Pan
1cb4e2df65
[BE][PYFMT] migrate PYFMT for torch._inductor to ruff format ( #144550 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/144550
Approved by: https://github.com/jansel
2025-02-28 13:33:19 +00:00
Bin Bao
f104ef1248
[AOTI][refactor] Consolidate CppBuilder.build and CppBuilder.build_fbcode ( #147975 )
...
Summary: Let CppBuilder handle all the cpp build logic
Differential Revision: D70141808
Pull Request resolved: https://github.com/pytorch/pytorch/pull/147975
Approved by: https://github.com/angelayi , https://github.com/yushangdi
2025-02-27 00:35:12 +00:00
PyTorch MergeBot
acca9b9cb0
Revert "[AOTI][refactor] Consolidate CppBuilder.build and CppBuilder.build_fbcode_cpu_re ( #147803 )"
...
This reverts commit 0b9da1ae0a .
Reverted https://github.com/pytorch/pytorch/pull/147803 on behalf of https://github.com/wdvr due to breaking internal tests, discussed with author ([comment](https://github.com/pytorch/pytorch/pull/147803#issuecomment-2683938121 ))
2025-02-26 05:32:17 +00:00
Bin Bao
0b9da1ae0a
[AOTI][refactor] Consolidate CppBuilder.build and CppBuilder.build_fbcode_cpu_re ( #147803 )
...
Summary: Let CppBuilder handle all the cpp build logic
Differential Revision: [D70146185](https://our.internmc.facebook.com/intern/diff/D70146185 )
Pull Request resolved: https://github.com/pytorch/pytorch/pull/147803
Approved by: https://github.com/malfet
ghstack dependencies: #147805 , #147806 , #147807
2025-02-25 13:33:12 +00:00
Bin Bao
cc1c9826d4
[AOTI][refactor] Fix a typo ( #147807 )
...
Summary: defination -> definition
Differential Revision: [D70146182](https://our.internmc.facebook.com/intern/diff/D70146182 )
Pull Request resolved: https://github.com/pytorch/pytorch/pull/147807
Approved by: https://github.com/malfet
ghstack dependencies: #147805 , #147806
2025-02-25 13:33:12 +00:00