Commit Graph

111 Commits

Author SHA1 Message Date
Bin Bao
365ac49840 [AOTI] Add an option to specify custom op C shim (#153851)
Summary: Add an option to tell AOTInductor codegen to generate C shim functions for certain custom ops instead of relying on ProxyExecutor. The lib that defines custom ops need to implement corresponding C shim functions.

Differential Revision: [D75014177](https://our.internmc.facebook.com/intern/diff/D75014177)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/153851
Approved by: https://github.com/hl475
2025-05-20 05:12:09 +00:00
Bin Bao
a2d0ef242d [AOTI] Embed cubin files into .so (#150739)
Summary: Embed cubin files so AOTI is one step closer to generate a single binary. Controlled by a flag and off as default.

Differential Revision: [D72535357](https://our.internmc.facebook.com/intern/diff/D72535357)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/150739
Approved by: https://github.com/angelayi
2025-05-19 01:11:46 +00:00
Benjamin Glass
cda572b053 codecache: Remove cpp_prefix.h duplication per build, then precompile it (#144293)
Prior to this PR, `_inductor/codegen/cpp_prefix.h` was copied into a new temporary directory on every inductor run utilizing the CPP backend (i.e. CPU-only), then included in the output source code. Instead, this PR puts it in an appropriate place in the torch includes, and includes it from there. This allows us to precompile it in cpp_wrapper and AOT inductor mode, saving significant compilation time.

Due to difficulties getting this to work in FBCode, the precompilation itself is only enabled in OSS PyTorch.

Differential Revision: [D69420620](https://our.internmc.facebook.com/intern/diff/D69420620)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/144293
Approved by: https://github.com/desertfire
2025-05-16 17:41:36 +00:00
Aaron Gokaslan
1c659b5bc0 [BE]: Use more portable shutil.which call for cpp_builder (#153325)
We should be using shutil.which instead of calling some binary subprocess here for portability and security.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/153325
Approved by: https://github.com/xuhancn, https://github.com/cyyever, https://github.com/albanD
2025-05-12 15:15:21 +00:00
Benjamin Glass
b80bb87689 cpp_wrapper: Miscellaneous fixups (#150143)
1. Revisit preprocessing code in cpp_bulider.py, removing a hack that channels it through stdout.
2. Fix ops that return None.

Differential Revision: [D72053414](https://our.internmc.facebook.com/intern/diff/D72053414)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/150143
Approved by: https://github.com/desertfire
2025-04-10 03:31:12 +00:00
Jason Ansel
37ebb0b56a [inductor] Fix inductor windows linker error (#150256)
Fixes #149889

Pull Request resolved: https://github.com/pytorch/pytorch/pull/150256
Approved by: https://github.com/anijain2305, https://github.com/eellison
2025-04-01 18:30:55 +00:00
Vlad K
f1b74037b1 Fix bug when Inductor include path contains spaces (#148271)
This PR fixes a bug with how include directories with spaces are handled on Windows. I ran into an edge case with torch.compile() - it will error out with an exception on Windows. In particular, it will try to execute the following: `cl /I C:/Program Files/Python311/Include ...`, where `C:/Program` will be treated as separate from `Files/Python311/Include`.

I looked into using something like `shlex.quote` or `pathlib.Path`, but I didn't find those options to be suitable (shlex is POSIX shell only, pathlib.Path does not escape spaces).

There is another place in the function that also deals with escaping spaces. My fix follows the same style. 0ff2e6a85a/torch/_inductor/cpp_builder.py (L1464)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/148271
Approved by: https://github.com/Skylion007

Co-authored-by: Aaron Gokaslan <aaronGokaslan@gmail.com>
2025-03-31 06:46:05 +00:00
Xu Han
bc1b8730a4 [Windows][inductor] fix blank space break windows file path (#149388)
Fixes #149310

From origin error message:
```cmd
Command:
cl /I C:/Program Files/Python310/Include /I c:/code/.env/lib/site-packages/torch/include /I c:/code/.env/lib/site-packages/torch/include/torch/csrc/api/include /I c:/code/.env/lib/site-packages/torch/include/TH /I c:/code/.env/lib/site-packages/torch/include/THC /D TORCH_INDUCTOR_CPP_WRAPPER /D STANDALONE_TORCH_HEADER /D C10_USING_CUSTOM_GENERATED_MACROS /DLL /MD /O2 /std:c++20 /wd4819 /wd4251 /wd4244 /wd4267 /wd4275 /wd4018 /wd4190 /wd4624 /wd4067 /wd4068 /EHsc /openmp /openmp:experimental C:/Users/user/AppData/Local/Temp/torchinductor_user/ou/coubnfnqsm2gbdzdytufv46jotd6sxsnnhgldiw45pl5yjq5nbvz.cpp /LD /FeC:/Users/user/AppData/Local/Temp/torchinductor_user/ou/coubnfnqsm2gbdzdytufv46jotd6sxsnnhgldiw45pl5yjq5nbvz.pyd /link /LIBPATH:c:/code/.env/Scripts/libs /LIBPATH:c:/code/.env/lib/site-packages/torch/lib torch.lib torch_cpu.lib torch_python.lib sleef.lib

Output:
Microsoft (R) C/C++ Optimizing Compiler Version 19.43.34809 for x86
Copyright (C) Microsoft Corporation.  All rights reserved.

cl : Command line warning D9025 : overriding '/openmp' with '/openmp:experimental'
cl : Command line warning D9024 : unrecognized source file type 'Files/Python310/Include', object file assumed
coubnfnqsm2gbdzdytufv46jotd6sxsnnhgldiw45pl5yjq5nbvz.cpp
C:/Users/user/AppData/Local/Temp/torchinductor_user/ou/coubnfnqsm2gbdzdytufv46jotd6sxsnnhgldiw45pl5yjq5nbvz.cpp(21): fatal error C1083: Cannot open include file: 'Python.h': No such file or directory
```
Python installed in `C:/Program Files/Python310` path, and the blank space break the file path.

Solution:
Add quotes to declare Windows file paths, after that:
```cmd
cl /I "C:/Users/Xuhan/.conda/envs/new_build/Include" /I "C:/Users/Xuhan/.conda/envs/new_build/lib/site-packages/torch/include" /I "C:/Users/Xuhan/.conda/envs/new_build/lib/site-packages/torch/include/torch/csrc/api/include"  /D TORCH_INDUCTOR_CPP_WRAPPER /D STANDALONE_TORCH_HEADER /D  C10_USING_CUSTOM_GENERATED_MACROS /D CPU_CAPABILITY_AVX512  /DLL /MD /O2 /std:c++20 /wd4819 /wd4251 /wd4244 /wd4267 /wd4275 /wd4018 /wd4190 /wd4624 /wd4067 /wd4068 /EHsc /openmp /openmp:experimental  C:/Users/Xuhan/AppData/Local/Temp/tmp1wsj0m8r/za/czarp3ly5c22ge3hydvnzvad4cjimyr3hkwvofodxqffgil7frfd.cpp  /arch:AVX512  /FeC:/Users/Xuhan/AppData/Local/Temp/tmp1wsj0m8r/za/czarp3ly5c22ge3hydvnzvad4cjimyr3hkwvofodxqffgil7frfd.pyd /LD /link /LIBPATH:"C:/Users/Xuhan/.conda/envs/new_build/libs" /LIBPATH:"C:/Users/Xuhan/.conda/envs/new_build/lib/site-packages/torch/lib"  "torch.lib" "torch_cpu.lib" "torch_python.lib" "sleef.lib"
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/149388
Approved by: https://github.com/jansel
2025-03-20 03:10:30 +00:00
Benjamin Glass
e8dd58b8cf cpp_wrapper: Precompile device-specific header files (#146928)
This saves us about a second per compilation, which is _massive_ for the OpInfo tests.  Total OpInfo test runtime is down about 2x from this change alone.

Relands #144002, with changes needed by fbcode internals.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/146928
Approved by: https://github.com/desertfire
2025-03-17 20:40:15 +00:00
Shangdi Yu
df60500ab8 Fix too big to optimize in test, actually use O0 when aot_inductor.compile_wrapper_with_O0 is set (#148714)
Summary:
1. Check against the "0" char instead

2. We got the following error when using anything other than O0 flag: `error: Function ZN5torch12aot_inductorL22__check_inputs_outputsEPP16AtenTensorOpaqueS3 is too big to optimize [-Werror,-Wignored-optimization-argument]` So we use O0 flag in wrapper code when `aot_inductor.compile_wrapper_opt_level` is set to `O0`.

Test Plan:
```
 buck run  'fbcode//mode/opt' fbcode//deeplearning/aot_inductor/cpu/test:ads_second_stage_dsnn_models_aoti_lowering_test -- -r AdsSecondStageDSNNModelsAOTILoweringTest
```

Differential Revision: D70670957

Pull Request resolved: https://github.com/pytorch/pytorch/pull/148714
Approved by: https://github.com/desertfire
2025-03-13 10:22:06 +00:00
Zhuoran Zhao
3745da18f4 [AOTI] Swith to local cpp compile for fbcode (#148592)
Summary: as title, otherwise we can not find lamdhip64

Test Plan: https://www.internalfb.com/phabricator/paste/view/P1747104431

Differential Revision: D70637798

Pull Request resolved: https://github.com/pytorch/pytorch/pull/148592
Approved by: https://github.com/hl475
2025-03-08 08:38:26 +00:00
Benjamin Glass
d6d670ab4d [AOTI] build CPU CPP kernels at O3, and all other code at O1 (#148587)
In the future, we may also want to add LTO linking to further optimize the results (while still hopefully netting compile time benefits).

Differential Revision: [D70641543](https://our.internmc.facebook.com/intern/diff/D70641543)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/148587
Approved by: https://github.com/desertfire
2025-03-05 22:47:46 +00:00
Bin Bao
df7e43e5d4 [AOTI] Fix aot_inductor_package test errors (#148279)
Summary: Fix fbcode test failures introduced by https://github.com/pytorch/pytorch/pull/147975. Make sure script.ld is copied to the build-time directory.

Differential Revision: D70454149

Pull Request resolved: https://github.com/pytorch/pytorch/pull/148279
Approved by: https://github.com/zoranzhao
2025-03-05 05:22:48 +00:00
Xuehai Pan
1cb4e2df65 [BE][PYFMT] migrate PYFMT for torch._inductor to ruff format (#144550)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/144550
Approved by: https://github.com/jansel
2025-02-28 13:33:19 +00:00
Bin Bao
f104ef1248 [AOTI][refactor] Consolidate CppBuilder.build and CppBuilder.build_fbcode (#147975)
Summary: Let CppBuilder handle all the cpp build logic

Differential Revision: D70141808

Pull Request resolved: https://github.com/pytorch/pytorch/pull/147975
Approved by: https://github.com/angelayi, https://github.com/yushangdi
2025-02-27 00:35:12 +00:00
PyTorch MergeBot
acca9b9cb0 Revert "[AOTI][refactor] Consolidate CppBuilder.build and CppBuilder.build_fbcode_cpu_re (#147803)"
This reverts commit 0b9da1ae0a.

Reverted https://github.com/pytorch/pytorch/pull/147803 on behalf of https://github.com/wdvr due to breaking internal tests, discussed with author ([comment](https://github.com/pytorch/pytorch/pull/147803#issuecomment-2683938121))
2025-02-26 05:32:17 +00:00
Bin Bao
0b9da1ae0a [AOTI][refactor] Consolidate CppBuilder.build and CppBuilder.build_fbcode_cpu_re (#147803)
Summary: Let CppBuilder handle all the cpp build logic

Differential Revision: [D70146185](https://our.internmc.facebook.com/intern/diff/D70146185)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/147803
Approved by: https://github.com/malfet
ghstack dependencies: #147805, #147806, #147807
2025-02-25 13:33:12 +00:00
Bin Bao
cc1c9826d4 [AOTI][refactor] Fix a typo (#147807)
Summary: defination -> definition

Differential Revision: [D70146182](https://our.internmc.facebook.com/intern/diff/D70146182)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/147807
Approved by: https://github.com/malfet
ghstack dependencies: #147805, #147806
2025-02-25 13:33:12 +00:00
Bin Bao
2680e835c8 [AOTI][refactor] Rename use_absolute_path to use_relative_path (#147805)
Summary: The option really means to compile a cpp file using its basename instead of the its full path. Reland https://github.com/pytorch/pytorch/pull/147679.

Differential Revision: [D70146184](https://our.internmc.facebook.com/intern/diff/D70146184)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/147805
Approved by: https://github.com/malfet
2025-02-25 13:32:54 +00:00
PyTorch MergeBot
890213f65f Revert "[AOTI][refactor] Rename use_absolute_path to use_relative_path (#147679)"
This reverts commit 0b52d801d2.

Reverted https://github.com/pytorch/pytorch/pull/147679 on behalf of https://github.com/desertfire due to Diff reverted internally ([comment](https://github.com/pytorch/pytorch/pull/147679#issuecomment-2680389225))
2025-02-25 04:11:13 +00:00
Benjamin Glass
33ff96b3f9 cpp_builder: unbreak clang++ detection (#147775)
Fixes an issue where `_is_gcc` would match on `clang++` due to the string ending with `g++`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/147775
Approved by: https://github.com/desertfire
2025-02-25 02:33:01 +00:00
Bin Bao
0b52d801d2 [AOTI][refactor] Rename use_absolute_path to use_relative_path (#147679)
The option really means to compile a cpp file using its basename instead of the its full path.

Differential Revision: [D69722709](https://our.internmc.facebook.com/intern/diff/D69722709/)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/147679
Approved by: https://github.com/angelayi
2025-02-24 21:44:33 +00:00
xinan.lin
8d618f3da7 [AOTI][XPU] Suppress multi-line comment warning for XPU. (#147710)
This PR aim to suppress multi-line comment waring in sycl header when building Inductor cpp_wrapper .
```
/intel/oneapi/compiler/2025.0/include/sycl/detail/builtins/builtins.hpp:235:1: warning: multi-line comment [-Wcomment]
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/147710
Approved by: https://github.com/EikanWang, https://github.com/jansel
2025-02-24 07:28:59 +00:00
Bin Bao
d38db94689 [inductor][refactor] Move _compile_file to cpp_builder (#147202)
Summary: To further conslidate cpp build logic into cpp_builder

Test Plan: CI

Differential Revision: D69595327

Pull Request resolved: https://github.com/pytorch/pytorch/pull/147202
Approved by: https://github.com/yushangdi
2025-02-14 21:02:30 +00:00
PyTorch MergeBot
2fafcd37c3 Revert "cpp_wrapper: Precompile device-specific header files (#144002)"
This reverts commit de6efa1feb.

Reverted https://github.com/pytorch/pytorch/pull/144002 on behalf of https://github.com/huydhn due to Sorry for reverting your change but this breaks some inductor tests running internally ([comment](https://github.com/pytorch/pytorch/pull/144002#issuecomment-2649569562))
2025-02-11 00:42:22 +00:00
Benjamin Glass
de6efa1feb cpp_wrapper: Precompile device-specific header files (#144002)
This saves us about a second per compilation, which is _massive_ for the OpInfo tests. Total OpInfo test runtime is down about 2x from this change alone.

Differential Revision: [D69185685](https://our.internmc.facebook.com/intern/diff/D69185685)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/144002
Approved by: https://github.com/desertfire
2025-02-10 17:13:09 +00:00
Henry Tsang
9d5bf38dec [cpp_builder] refactor to reduce libcudart_static logs (#146394)
Want to reduce logs from `log_msg = f'"libcudart_static.a" not found under {path}'`, which was added in https://github.com/pytorch/pytorch/pull/142175

Differential Revision: D69096354

Pull Request resolved: https://github.com/pytorch/pytorch/pull/146394
Approved by: https://github.com/benjaminglass1, https://github.com/chenyang78
2025-02-05 00:41:30 +00:00
Bin Bao
16420a78eb [AOTI] Remove AOTI_USE_CREATE_TENSOR_FROM_BLOB_V1 (#146039)
Summary: The AOTI_USE_CREATE_TENSOR_FROM_BLOB_V1 macro was used to solve a FC issue and it can be removed now.

Test Plan: CI

Differential Revision: D68871245

Pull Request resolved: https://github.com/pytorch/pytorch/pull/146039
Approved by: https://github.com/yushangdi, https://github.com/hl475
2025-01-30 19:01:19 +00:00
PyTorch MergeBot
cfbb27462e Revert "[inductor][BE] Enable test_cpu_cpp_wrapper in fbcode (#145373)"
This reverts commit b8087747f5.

Reverted https://github.com/pytorch/pytorch/pull/145373 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally ([comment](https://github.com/pytorch/pytorch/pull/145373#issuecomment-2619674197))
2025-01-28 17:46:11 +00:00
H. Vetinari
e6c1e6e20e simplify torch.utils.cpp_extension.include_paths; use it in cpp_builder (#145480)
While working on conda-forge integration, I needed to look at the way the include paths are calculated, and noticed an avoidable duplication between `torch/utils/cpp_extension.py` and `torch/_inductor/cpp_builder.py`. The latter already imports the former anyway, so simply reuse the same function.

Furthermore, remove long-obsolete include-paths. AFAICT, the `/TH` headers have not existed since pytorch 1.11.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/145480
Approved by: https://github.com/ezyang
2025-01-27 07:19:42 +00:00
Bin Bao
b8087747f5 [inductor][BE] Enable test_cpu_cpp_wrapper in fbcode (#145373)
Differential Revision: D68278174

Pull Request resolved: https://github.com/pytorch/pytorch/pull/145373
Approved by: https://github.com/Skylion007
2025-01-24 17:59:13 +00:00
Irem Yuksel
66bf7da446 Enable sleef for Win Arm64 (#144876)
Sleef module was disabled for Windows Arm64 on b021486405
This PR enables it again since the issue is no longer valid.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/144876
Approved by: https://github.com/albanD, https://github.com/malfet

Co-authored-by: Ozan Aydin <148207261+ozanMSFT@users.noreply.github.com>
2025-01-23 19:22:58 +00:00
Aaron Orenstein
893ca1dfe1 PEP585 update - torch/_inductor/[_-i]* (#145137)
See #145101 for details.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/145137
Approved by: https://github.com/bobrenjc93
2025-01-19 01:22:47 +00:00
bobrenjc93
a3ab27b8e0 Migrate from Tuple -> tuple in torch/_inductor (#144264)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/144264
Approved by: https://github.com/eellison
2025-01-07 03:27:27 +00:00
Bin Bao
fecf03fa3f [AOTI][reland] Emit a CMakeLists.txt when package_cpp_only (#143680)
Summary: Emit a CMakeLists.txt with compile and link options when package_cpp_only is specified. After unzipping AOTI generated .pt2 package file, user can manually build the generated model code in their local environment.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/143680
Approved by: https://github.com/huydhn
2024-12-21 03:48:40 +00:00
xinan.lin
b5e159270a [AOTI XPU] Replace intel compiler with g++ to build inductor CPP wrapper in runtime. (#142322)
This PR aims to removes the de pendency on Intel Compiler at Inductor runtime. Now we only need a SYCL_HOME in runtime to find the sycl headers and libs.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/142322
Approved by: https://github.com/EikanWang, https://github.com/desertfire, https://github.com/albanD
ghstack dependencies: #143491
2024-12-21 02:27:04 +00:00
Tom Ritchford
b5475d334e [inductor] Fix an unused variable in cpu_vec_isa.py (#138473)
----

* Extracted from https://github.com/pytorch/pytorch/pull/133492

Pull Request resolved: https://github.com/pytorch/pytorch/pull/138473
Approved by: https://github.com/EikanWang, https://github.com/albanD, https://github.com/xuhancn
2024-12-20 18:50:19 +00:00
Huamin Li
f5af87c23c Make Inductor cpp backend enable_floating_point_contract_flag to take string (#143450)
Differential Revision: D66269001

Pull Request resolved: https://github.com/pytorch/pytorch/pull/143450
Approved by: https://github.com/desertfire
2024-12-20 16:28:54 +00:00
PyTorch MergeBot
71479a9b9c Revert "[AOTI] Emit a CMakeLists.txt when package_cpp_only (#143352)"
This reverts commit 429f4cd140.

Reverted https://github.com/pytorch/pytorch/pull/143352 on behalf of https://github.com/huydhn due to Sorry for reverting your change but the new test is failing on ROCm ([comment](https://github.com/pytorch/pytorch/pull/143352#issuecomment-2556365140))
2024-12-20 06:21:31 +00:00
Bin Bao
429f4cd140 [AOTI] Emit a CMakeLists.txt when package_cpp_only (#143352)
Summary: Emit a CMakeLists.txt with compile and link options when package_cpp_only is specified. After unzipping AOTI generated .pt2 package file, user can manually build the generated model code in their local environment.

Differential Revision: [D67458526](https://our.internmc.facebook.com/intern/diff/D67458526)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/143352
Approved by: https://github.com/malfet
2024-12-19 22:01:05 +00:00
Bin Bao
0e8013fc1c [AOTI] Fix a typo in cpp_builder.py (#143351)
Summary: passthough -> passthrough

Pull Request resolved: https://github.com/pytorch/pytorch/pull/143351
Approved by: https://github.com/yushangdi, https://github.com/chenyang78
ghstack dependencies: #143350
2024-12-18 16:28:37 +00:00
Benjamin Glass
bb06fc79fb cpp_builder: handle CUDA lib paths involving "stubs" in more circumstances (#142175)
conda packages for `cuda-driver-dev=12.4.127` use a "stubs" subdirectory to contain `libcuda.so`.  This was previously only handled by cpp_builder in some cases, but now needs to be potentially handled more generally.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/142175
Approved by: https://github.com/desertfire
2024-12-17 17:21:27 +00:00
Tom Ritchford
dc23f1944a Remove unused Python variables in torch/[_-a]* (#133492)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/133492
Approved by: https://github.com/albanD
2024-12-12 17:39:14 +00:00
Colin L. Rice
d68403df3b filelock: Make waitcounter variant to use (#139816)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/139816
Approved by: https://github.com/ezyang
2024-12-12 01:18:34 +00:00
PyTorch MergeBot
5c97ac9721 Revert "Remove unused Python variables in torch/[_-a]* (#133492)"
This reverts commit fda975a7b3.

Reverted https://github.com/pytorch/pytorch/pull/133492 on behalf of https://github.com/clee2000 due to Sorry, I need to revert this in order to revert something else.  The only thing you need to do is rebase and remerge ([comment](https://github.com/pytorch/pytorch/pull/133492#issuecomment-2536635516))
2024-12-11 17:29:12 +00:00
PyTorch MergeBot
2374d460d0 Revert "filelock: Make waitcounter variant to use (#139816)"
This reverts commit 237c4b559c.

Reverted https://github.com/pytorch/pytorch/pull/139816 on behalf of https://github.com/clee2000 due to Sorry, I need to revert this in order to revert something else.  The only thing you need to do is rebase and remerge ([comment](https://github.com/pytorch/pytorch/pull/139816#issuecomment-2536616808))
2024-12-11 17:26:46 +00:00
Colin L. Rice
237c4b559c filelock: Make waitcounter variant to use (#139816)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/139816
Approved by: https://github.com/ezyang
2024-12-10 23:02:59 +00:00
Tom Ritchford
fda975a7b3 Remove unused Python variables in torch/[_-a]* (#133492)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/133492
Approved by: https://github.com/albanD
2024-12-10 21:48:44 +00:00
Colin Peppler
0602676c8d [CUTLASS][AOTI] Fixes undefined symbol: cudaLaunchKernelExC (#142094)
Summary:
### Context
* When compiling the object file for a CUTLASS kernel, CUDA RT symbols are left undefined.
* When compiling the final shared object file, we statically link with `libcudart_static.a`.
* One important thing is that ordering matters when specifying the lib search paths (-L).

Test Plan:
```
// before diff
RuntimeError: Failure loading .so: /tmp/tmpqhz_dnza/model.so: undefined symbol: cudaLaunchKernelExC
```

Differential Revision: D66793974

Pull Request resolved: https://github.com/pytorch/pytorch/pull/142094
Approved by: https://github.com/chenyang78, https://github.com/hl475
2024-12-06 02:18:54 +00:00
xinan.lin
4742080ed9 [AOTI XPU] Enable Cpp wraper for Intel GPU. (#135318)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/135318
Approved by: https://github.com/jgong5, https://github.com/EikanWang, https://github.com/guangyey, https://github.com/desertfire
2024-11-26 11:51:32 +00:00