Commit Graph

101 Commits

Author SHA1 Message Date
Zhuoran Zhao
3745da18f4 [AOTI] Switch to local cpp compile for fbcode (#148592)
Summary: As titled; otherwise we cannot find lamdhip64.

Test Plan: https://www.internalfb.com/phabricator/paste/view/P1747104431

Differential Revision: D70637798

Pull Request resolved: https://github.com/pytorch/pytorch/pull/148592
Approved by: https://github.com/hl475
2025-03-08 08:38:26 +00:00
Benjamin Glass
d6d670ab4d [AOTI] build CPU CPP kernels at O3, and all other code at O1 (#148587)
In the future, we may also want to add LTO linking to further optimize the results (while still hopefully netting compile time benefits).
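To illustrate the split named in the title, here is a minimal Python sketch that picks an optimization level per translation unit. It is not cpp_builder.py's code: the function names, the `kernel_sources` set, and identifying kernel files by basename are all assumptions made for illustration.

```python
from pathlib import Path

# Hypothetical constants: the -O levels named in the commit title.
KERNEL_OPT = "-O3"   # generated CPU kernels: favor runtime speed
DEFAULT_OPT = "-O1"  # wrapper and other code: favor compile time


def opt_flag_for(source: str, kernel_sources: set) -> str:
    """Return the optimization flag for one translation unit (hypothetical helper)."""
    return KERNEL_OPT if Path(source).name in kernel_sources else DEFAULT_OPT


def compile_commands(sources: list, kernel_sources: set, cxx: str = "g++") -> list:
    """Build one `cxx <opt> -c src -o obj` command per source file."""
    cmds = []
    for src in sources:
        obj = str(Path(src).with_suffix(".o"))
        cmds.append([cxx, opt_flag_for(src, kernel_sources), "-c", src, "-o", obj])
    return cmds
```

For example, `compile_commands(["wrapper.cpp", "kernel.cpp"], {"kernel.cpp"})` yields an `-O1` command for the wrapper and an `-O3` command for the kernel.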

Differential Revision: [D70641543](https://our.internmc.facebook.com/intern/diff/D70641543)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/148587
Approved by: https://github.com/desertfire
2025-03-05 22:47:46 +00:00
Bin Bao
df7e43e5d4 [AOTI] Fix aot_inductor_package test errors (#148279)
Summary: Fix fbcode test failures introduced by https://github.com/pytorch/pytorch/pull/147975. Make sure script.ld is copied to the build-time directory.

Differential Revision: D70454149

Pull Request resolved: https://github.com/pytorch/pytorch/pull/148279
Approved by: https://github.com/zoranzhao
2025-03-05 05:22:48 +00:00
Xuehai Pan
1cb4e2df65 [BE][PYFMT] migrate PYFMT for torch._inductor to ruff format (#144550)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/144550
Approved by: https://github.com/jansel
2025-02-28 13:33:19 +00:00
Bin Bao
f104ef1248 [AOTI][refactor] Consolidate CppBuilder.build and CppBuilder.build_fbcode (#147975)
Summary: Let CppBuilder handle all the cpp build logic

Differential Revision: D70141808

Pull Request resolved: https://github.com/pytorch/pytorch/pull/147975
Approved by: https://github.com/angelayi, https://github.com/yushangdi
2025-02-27 00:35:12 +00:00
PyTorch MergeBot
acca9b9cb0 Revert "[AOTI][refactor] Consolidate CppBuilder.build and CppBuilder.build_fbcode_cpu_re (#147803)"
This reverts commit 0b9da1ae0a.

Reverted https://github.com/pytorch/pytorch/pull/147803 on behalf of https://github.com/wdvr due to breaking internal tests, discussed with author ([comment](https://github.com/pytorch/pytorch/pull/147803#issuecomment-2683938121))
2025-02-26 05:32:17 +00:00
Bin Bao
0b9da1ae0a [AOTI][refactor] Consolidate CppBuilder.build and CppBuilder.build_fbcode_cpu_re (#147803)
Summary: Let CppBuilder handle all the cpp build logic

Differential Revision: [D70146185](https://our.internmc.facebook.com/intern/diff/D70146185)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/147803
Approved by: https://github.com/malfet
ghstack dependencies: #147805, #147806, #147807
2025-02-25 13:33:12 +00:00
Bin Bao
cc1c9826d4 [AOTI][refactor] Fix a typo (#147807)
Summary: defination -> definition

Differential Revision: [D70146182](https://our.internmc.facebook.com/intern/diff/D70146182)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/147807
Approved by: https://github.com/malfet
ghstack dependencies: #147805, #147806
2025-02-25 13:33:12 +00:00
Bin Bao
2680e835c8 [AOTI][refactor] Rename use_absolute_path to use_relative_path (#147805)
Summary: The option really means to compile a cpp file using its basename instead of its full path. Reland https://github.com/pytorch/pytorch/pull/147679.
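A minimal sketch of what compiling "by basename" can look like, assuming the compiler is launched with its working directory set to the source file's directory. The helper `build_compile_cmd` and its return shape are hypothetical, not the CppBuilder API.

```python
import os


def build_compile_cmd(cpp_path: str, use_relative_path: bool, cxx: str = "g++"):
    """Return (cwd, argv) for compiling a single .cpp file (hypothetical helper).

    use_relative_path=True: run the compiler from the file's directory and
    pass only the basename. Otherwise pass the full path and run from the
    current directory (cwd=None).
    """
    stem = os.path.splitext(os.path.basename(cpp_path))[0]
    if use_relative_path:
        cwd = os.path.dirname(cpp_path) or "."
        return cwd, [cxx, "-c", os.path.basename(cpp_path), "-o", stem + ".o"]
    obj = os.path.join(os.path.dirname(cpp_path), stem + ".o")
    return None, [cxx, "-c", cpp_path, "-o", obj]
```

The returned `cwd` would be passed to something like `subprocess.run(argv, cwd=cwd)`, where `None` means the current directory.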

Differential Revision: [D70146184](https://our.internmc.facebook.com/intern/diff/D70146184)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/147805
Approved by: https://github.com/malfet
2025-02-25 13:32:54 +00:00
PyTorch MergeBot
890213f65f Revert "[AOTI][refactor] Rename use_absolute_path to use_relative_path (#147679)"
This reverts commit 0b52d801d2.

Reverted https://github.com/pytorch/pytorch/pull/147679 on behalf of https://github.com/desertfire due to Diff reverted internally ([comment](https://github.com/pytorch/pytorch/pull/147679#issuecomment-2680389225))
2025-02-25 04:11:13 +00:00
Benjamin Glass
33ff96b3f9 cpp_builder: unbreak clang++ detection (#147775)
Fixes an issue where `_is_gcc` would match on `clang++` due to the string ending with `g++`.
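The failure mode and one possible fix, as a name-based Python sketch. `is_gcc_naive` and `is_gcc` are hypothetical helpers, not the actual `_is_gcc` implementation; a more robust approach could inspect the compiler's `--version` output instead of its name.

```python
import os
import re


def is_gcc_naive(cpp_compiler: str) -> bool:
    # Buggy: "clang++" also ends with "g++", so it is misclassified as GCC.
    return cpp_compiler.endswith("g++")


def is_gcc(cpp_compiler: str) -> bool:
    # Look only at the executable's basename and require it to be exactly
    # g++ or gcc, optionally with a target prefix ("x86_64-linux-gnu-g++")
    # and/or a version suffix ("g++-12").
    name = os.path.basename(cpp_compiler)
    return re.fullmatch(r"(.*-)?(g\+\+|gcc)(-\d+(\.\d+)*)?", name) is not None
```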

Pull Request resolved: https://github.com/pytorch/pytorch/pull/147775
Approved by: https://github.com/desertfire
2025-02-25 02:33:01 +00:00
Bin Bao
0b52d801d2 [AOTI][refactor] Rename use_absolute_path to use_relative_path (#147679)
The option really means to compile a cpp file using its basename instead of its full path.

Differential Revision: [D69722709](https://our.internmc.facebook.com/intern/diff/D69722709/)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/147679
Approved by: https://github.com/angelayi
2025-02-24 21:44:33 +00:00
xinan.lin
8d618f3da7 [AOTI][XPU] Suppress multi-line comment warning for XPU. (#147710)
This PR aims to suppress the multi-line comment warning in the SYCL headers when building the Inductor cpp_wrapper.
```
/intel/oneapi/compiler/2025.0/include/sycl/detail/builtins/builtins.hpp:235:1: warning: multi-line comment [-Wcomment]
```
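GCC's `-Wcomment` diagnostics include this "multi-line comment" warning, which fires when a `//` comment ends in a backslash and so continues onto the next line. A minimal sketch of suppressing it by appending `-Wno-comment` to the XPU compile flags; the helper name is hypothetical and the PR may use a different mechanism.

```python
def with_no_comment_warning(flags):
    """Return a copy of `flags` with -Wno-comment appended (hypothetical helper).

    -Wno-comment disables GCC's -Wcomment diagnostics, including the
    "multi-line comment" warning triggered inside the SYCL headers.
    """
    out = list(flags)
    if "-Wno-comment" not in out:
        out.append("-Wno-comment")
    return out
```

Passing the SYCL include directory with `-isystem` instead of `-I` is another option, since GCC suppresses most warnings that originate in system headers.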

Pull Request resolved: https://github.com/pytorch/pytorch/pull/147710
Approved by: https://github.com/EikanWang, https://github.com/jansel
2025-02-24 07:28:59 +00:00
Bin Bao
d38db94689 [inductor][refactor] Move _compile_file to cpp_builder (#147202)
Summary: To further consolidate cpp build logic into cpp_builder

Test Plan: CI

Differential Revision: D69595327

Pull Request resolved: https://github.com/pytorch/pytorch/pull/147202
Approved by: https://github.com/yushangdi
2025-02-14 21:02:30 +00:00
PyTorch MergeBot
2fafcd37c3 Revert "cpp_wrapper: Precompile device-specific header files (#144002)"
This reverts commit de6efa1feb.

Reverted https://github.com/pytorch/pytorch/pull/144002 on behalf of https://github.com/huydhn due to Sorry for reverting your change but this breaks some inductor tests running internally ([comment](https://github.com/pytorch/pytorch/pull/144002#issuecomment-2649569562))
2025-02-11 00:42:22 +00:00
Benjamin Glass
de6efa1feb cpp_wrapper: Precompile device-specific header files (#144002)
This saves us about a second per compilation, which is _massive_ for the OpInfo tests. Total OpInfo test runtime is down about 2x from this change alone.
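For readers unfamiliar with the technique, a sketch of the GCC precompiled-header workflow expressed as command lines built in Python. It assumes GCC's `.gch` naming convention; the helper names are hypothetical, and this is not the code from #144002.

```python
import os


def pch_command(header: str, flags: list, cxx: str = "g++") -> list:
    """Precompile `header` into `<header>.gch`, GCC's precompiled-header name."""
    # -x c++-header tells GCC to treat the input as a C++ header to precompile.
    return [cxx, "-x", "c++-header", *flags, header, "-o", header + ".gch"]


def compile_with_pch(source: str, header: str, flags: list, cxx: str = "g++") -> list:
    """Compile `source`, pulling in `header` first via -include.

    If `<header>.gch` sits next to `header` and was built with compatible
    flags, GCC loads it instead of re-parsing the header text.
    """
    obj = os.path.splitext(source)[0] + ".o"
    return [cxx, *flags, "-include", header, "-c", source, "-o", obj]
```

GCC only uses the `.gch` if it was built with compatible options (for example, the same `-m` flags and macro definitions), which is one reason a precompiled header may need to be built per device and per flag set.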

Differential Revision: [D69185685](https://our.internmc.facebook.com/intern/diff/D69185685)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/144002
Approved by: https://github.com/desertfire
2025-02-10 17:13:09 +00:00
Henry Tsang
9d5bf38dec [cpp_builder] refactor to reduce libcudart_static logs (#146394)
Want to reduce logs from `log_msg = f'"libcudart_static.a" not found under {path}'`, which was added in https://github.com/pytorch/pytorch/pull/142175

Differential Revision: D69096354

Pull Request resolved: https://github.com/pytorch/pytorch/pull/146394
Approved by: https://github.com/benjaminglass1, https://github.com/chenyang78
2025-02-05 00:41:30 +00:00
Bin Bao
16420a78eb [AOTI] Remove AOTI_USE_CREATE_TENSOR_FROM_BLOB_V1 (#146039)
Summary: The AOTI_USE_CREATE_TENSOR_FROM_BLOB_V1 macro was used to work around a forward-compatibility (FC) issue and can now be removed.

Test Plan: CI

Differential Revision: D68871245

Pull Request resolved: https://github.com/pytorch/pytorch/pull/146039
Approved by: https://github.com/yushangdi, https://github.com/hl475
2025-01-30 19:01:19 +00:00
PyTorch MergeBot
cfbb27462e Revert "[inductor][BE] Enable test_cpu_cpp_wrapper in fbcode (#145373)"
This reverts commit b8087747f5.

Reverted https://github.com/pytorch/pytorch/pull/145373 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally ([comment](https://github.com/pytorch/pytorch/pull/145373#issuecomment-2619674197))
2025-01-28 17:46:11 +00:00
H. Vetinari
e6c1e6e20e simplify torch.utils.cpp_extension.include_paths; use it in cpp_builder (#145480)
While working on conda-forge integration, I needed to look at the way the include paths are calculated, and noticed an avoidable duplication between `torch/utils/cpp_extension.py` and `torch/_inductor/cpp_builder.py`. The latter already imports the former anyway, so simply reuse the same function.

Furthermore, remove long-obsolete include-paths. AFAICT, the `/TH` headers have not existed since PyTorch 1.11.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/145480
Approved by: https://github.com/ezyang
2025-01-27 07:19:42 +00:00
Bin Bao
b8087747f5 [inductor][BE] Enable test_cpu_cpp_wrapper in fbcode (#145373)
Differential Revision: D68278174

Pull Request resolved: https://github.com/pytorch/pytorch/pull/145373
Approved by: https://github.com/Skylion007
2025-01-24 17:59:13 +00:00
Irem Yuksel
66bf7da446 Enable sleef for Win Arm64 (#144876)
The Sleef module was disabled for Windows Arm64 in b021486405.
This PR enables it again since the issue is no longer valid.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/144876
Approved by: https://github.com/albanD, https://github.com/malfet

Co-authored-by: Ozan Aydin <148207261+ozanMSFT@users.noreply.github.com>
2025-01-23 19:22:58 +00:00
Aaron Orenstein
893ca1dfe1 PEP585 update - torch/_inductor/[_-i]* (#145137)
See #145101 for details.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/145137
Approved by: https://github.com/bobrenjc93
2025-01-19 01:22:47 +00:00
bobrenjc93
a3ab27b8e0 Migrate from Tuple -> tuple in torch/_inductor (#144264)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/144264
Approved by: https://github.com/eellison
2025-01-07 03:27:27 +00:00
Bin Bao
fecf03fa3f [AOTI][reland] Emit a CMakeLists.txt when package_cpp_only (#143680)
Summary: Emit a CMakeLists.txt with compile and link options when package_cpp_only is specified. After unzipping the AOTI-generated .pt2 package file, users can manually build the generated model code in their local environment.
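A minimal sketch of rendering such a CMakeLists.txt from compile and link options. The function, the target layout (a single shared library), the CMake minimum version, and the C++ standard default are assumptions, not the file AOTI actually emits.

```python
def emit_cmakelists(target, sources, include_dirs, compile_options, link_libs, cxx_standard=17):
    """Render a minimal CMakeLists.txt building `sources` into a shared library."""
    lines = [
        "cmake_minimum_required(VERSION 3.18)",
        f"project({target} LANGUAGES CXX)",
        f"set(CMAKE_CXX_STANDARD {cxx_standard})",
        f"add_library({target} SHARED {' '.join(sources)})",
    ]
    # Only emit the optional commands when there is something to put in them.
    if include_dirs:
        lines.append(f"target_include_directories({target} PRIVATE {' '.join(include_dirs)})")
    if compile_options:
        lines.append(f"target_compile_options({target} PRIVATE {' '.join(compile_options)})")
    if link_libs:
        lines.append(f"target_link_libraries({target} PRIVATE {' '.join(link_libs)})")
    return "\n".join(lines) + "\n"
```

After writing the text to `CMakeLists.txt` next to the sources, a standard `cmake -S . -B build && cmake --build build` would build it.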

Pull Request resolved: https://github.com/pytorch/pytorch/pull/143680
Approved by: https://github.com/huydhn
2024-12-21 03:48:40 +00:00
xinan.lin
b5e159270a [AOTI XPU] Replace intel compiler with g++ to build inductor CPP wrapper in runtime. (#142322)
This PR removes the dependency on the Intel compiler at Inductor runtime. Only SYCL_HOME is now needed at runtime to locate the SYCL headers and libraries.
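A sketch of deriving the SYCL include and library directories from `SYCL_HOME`, assuming the conventional `include/` and `lib/` layout underneath it. The helper `sycl_paths` is hypothetical.

```python
import os


def sycl_paths(env=None):
    """Return (include_dir, lib_dir) derived from SYCL_HOME (hypothetical helper).

    Raises RuntimeError if SYCL_HOME is unset, since the SYCL headers and
    libraries cannot be located without it.
    """
    env = os.environ if env is None else env
    sycl_home = env.get("SYCL_HOME")
    if not sycl_home:
        raise RuntimeError("SYCL_HOME is not set; cannot locate SYCL headers/libs")
    return os.path.join(sycl_home, "include"), os.path.join(sycl_home, "lib")
```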

Pull Request resolved: https://github.com/pytorch/pytorch/pull/142322
Approved by: https://github.com/EikanWang, https://github.com/desertfire, https://github.com/albanD
ghstack dependencies: #143491
2024-12-21 02:27:04 +00:00
Tom Ritchford
b5475d334e [inductor] Fix an unused variable in cpu_vec_isa.py (#138473)
----

* Extracted from https://github.com/pytorch/pytorch/pull/133492

Pull Request resolved: https://github.com/pytorch/pytorch/pull/138473
Approved by: https://github.com/EikanWang, https://github.com/albanD, https://github.com/xuhancn
2024-12-20 18:50:19 +00:00
Huamin Li
f5af87c23c Make Inductor cpp backend enable_floating_point_contract_flag to take string (#143450)
Differential Revision: D66269001

Pull Request resolved: https://github.com/pytorch/pytorch/pull/143450
Approved by: https://github.com/desertfire
2024-12-20 16:28:54 +00:00
PyTorch MergeBot
71479a9b9c Revert "[AOTI] Emit a CMakeLists.txt when package_cpp_only (#143352)"
This reverts commit 429f4cd140.

Reverted https://github.com/pytorch/pytorch/pull/143352 on behalf of https://github.com/huydhn due to Sorry for reverting your change but the new test is failing on ROCm ([comment](https://github.com/pytorch/pytorch/pull/143352#issuecomment-2556365140))
2024-12-20 06:21:31 +00:00
Bin Bao
429f4cd140 [AOTI] Emit a CMakeLists.txt when package_cpp_only (#143352)
Summary: Emit a CMakeLists.txt with compile and link options when package_cpp_only is specified. After unzipping the AOTI-generated .pt2 package file, users can manually build the generated model code in their local environment.

Differential Revision: [D67458526](https://our.internmc.facebook.com/intern/diff/D67458526)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/143352
Approved by: https://github.com/malfet
2024-12-19 22:01:05 +00:00
Bin Bao
0e8013fc1c [AOTI] Fix a typo in cpp_builder.py (#143351)
Summary: passthough -> passthrough

Pull Request resolved: https://github.com/pytorch/pytorch/pull/143351
Approved by: https://github.com/yushangdi, https://github.com/chenyang78
ghstack dependencies: #143350
2024-12-18 16:28:37 +00:00
Benjamin Glass
bb06fc79fb cpp_builder: handle CUDA lib paths involving "stubs" in more circumstances (#142175)
Conda packages for `cuda-driver-dev=12.4.127` place `libcuda.so` in a "stubs" subdirectory. Previously cpp_builder handled this only in some cases, but it now needs to be handled more generally.
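The lookup can be sketched as checking each candidate directory and then its `stubs` subdirectory. `find_lib_dir` and its injectable `exists` predicate are hypothetical; the predicate only makes the sketch testable without touching the filesystem.

```python
import os


def find_lib_dir(lib_name, candidate_dirs, exists=os.path.isfile):
    """Return the first directory that contains `lib_name`, or None.

    Each candidate is checked directly and then via its "stubs"
    subdirectory, where conda's cuda-driver-dev places libcuda.so.
    """
    for d in candidate_dirs:
        for sub in (d, os.path.join(d, "stubs")):
            if exists(os.path.join(sub, lib_name)):
                return sub
    return None
```

Note that a stub `libcuda.so` only satisfies the linker; at run time the driver's real `libcuda.so.1` is loaded.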

Pull Request resolved: https://github.com/pytorch/pytorch/pull/142175
Approved by: https://github.com/desertfire
2024-12-17 17:21:27 +00:00
Tom Ritchford
dc23f1944a Remove unused Python variables in torch/[_-a]* (#133492)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/133492
Approved by: https://github.com/albanD
2024-12-12 17:39:14 +00:00
Colin L. Rice
d68403df3b filelock: Make waitcounter variant to use (#139816)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/139816
Approved by: https://github.com/ezyang
2024-12-12 01:18:34 +00:00
PyTorch MergeBot
5c97ac9721 Revert "Remove unused Python variables in torch/[_-a]* (#133492)"
This reverts commit fda975a7b3.

Reverted https://github.com/pytorch/pytorch/pull/133492 on behalf of https://github.com/clee2000 due to Sorry, I need to revert this in order to revert something else.  The only thing you need to do is rebase and remerge ([comment](https://github.com/pytorch/pytorch/pull/133492#issuecomment-2536635516))
2024-12-11 17:29:12 +00:00
PyTorch MergeBot
2374d460d0 Revert "filelock: Make waitcounter variant to use (#139816)"
This reverts commit 237c4b559c.

Reverted https://github.com/pytorch/pytorch/pull/139816 on behalf of https://github.com/clee2000 due to Sorry, I need to revert this in order to revert something else.  The only thing you need to do is rebase and remerge ([comment](https://github.com/pytorch/pytorch/pull/139816#issuecomment-2536616808))
2024-12-11 17:26:46 +00:00
Colin L. Rice
237c4b559c filelock: Make waitcounter variant to use (#139816)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/139816
Approved by: https://github.com/ezyang
2024-12-10 23:02:59 +00:00
Tom Ritchford
fda975a7b3 Remove unused Python variables in torch/[_-a]* (#133492)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/133492
Approved by: https://github.com/albanD
2024-12-10 21:48:44 +00:00
Colin Peppler
0602676c8d [CUTLASS][AOTI] Fixes undefined symbol: cudaLaunchKernelExC (#142094)
Summary:
### Context
* When compiling the object file for a CUTLASS kernel, CUDA RT symbols are left undefined.
* When compiling the final shared object file, we statically link with `libcudart_static.a`.
* One important thing is that ordering matters when specifying the lib search paths (-L).
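A sketch of a link line that respects both ordering rules relevant here. The linker searches `-L` directories in order and uses the first match for each `-l`, so the directory holding the intended `libcudart_static.a` goes first; and object files come before the `-l` flags, because a static archive only contributes members that resolve symbols already undefined at that point. `link_command` is hypothetical, not the code from #142094.

```python
def link_command(objs, lib_dirs, libs, preferred_dir, out="model.so", cxx="g++"):
    """Build a shared-library link line (hypothetical helper).

    - preferred_dir is searched first, because the linker takes the first
      matching library it finds while walking the -L directories in order.
    - Objects precede -l flags, because a static archive only contributes
      members that resolve symbols already undefined at that point.
    """
    ordered = [preferred_dir] + [d for d in lib_dirs if d != preferred_dir]
    return (
        [cxx, "-shared", *objs]
        + [f"-L{d}" for d in ordered]
        + [f"-l{lib}" for lib in libs]
        + ["-o", out]
    )
```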

Test Plan:
```
// before diff
RuntimeError: Failure loading .so: /tmp/tmpqhz_dnza/model.so: undefined symbol: cudaLaunchKernelExC
```

Differential Revision: D66793974

Pull Request resolved: https://github.com/pytorch/pytorch/pull/142094
Approved by: https://github.com/chenyang78, https://github.com/hl475
2024-12-06 02:18:54 +00:00
xinan.lin
4742080ed9 [AOTI XPU] Enable Cpp wrapper for Intel GPU. (#135318)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/135318
Approved by: https://github.com/jgong5, https://github.com/EikanWang, https://github.com/guangyey, https://github.com/desertfire
2024-11-26 11:51:32 +00:00
Joseph Kleinhenz
7b2138b864 [inductor] fix uncaught exception when checking for openmp on macos (#141208)
Based on #133776

Pull Request resolved: https://github.com/pytorch/pytorch/pull/141208
Approved by: https://github.com/Skylion007
2024-11-21 22:17:52 +00:00
Aaron Gokaslan
12e95aa4ee [BE]: Apply PERF401 autofixes from ruff (#140980)
* Automatically applies ruff rule PERF401. Turns loops into equivalent list comprehensions, which are faster and do not leak loop variables into the enclosing scope.
* List comprehensions often have better typing, and they are 50+% faster than for loops in loop overhead. They also preserve length information etc. and are easier for the interpreter to optimize.
* Manually went back and made mypy happy after the change.
* Also fixed style lints in files covered by flake8 but not by pyfmt
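A small example of the rewrite PERF401 suggests, using made-up functions:

```python
def squares_loop(xs):
    # Pattern flagged by PERF401: building a list with append() in a loop.
    result = []
    for x in xs:
        if x % 2 == 0:
            result.append(x * x)
    return result


def squares_comprehension(xs):
    # Equivalent list comprehension; `x` is scoped to the comprehension.
    return [x * x for x in xs if x % 2 == 0]
```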

Pull Request resolved: https://github.com/pytorch/pytorch/pull/140980
Approved by: https://github.com/justinchuby, https://github.com/malfet
2024-11-20 17:52:07 +00:00
Valentine233
263a5bf95e [cpu] Modify inductor opt flag --- ftree-loop-vectorize (#136827)
Reopen https://github.com/pytorch/pytorch/pull/121782, as more optimizations have landed.

Fixes https://github.com/pytorch/pytorch/issues/115261, https://github.com/pytorch/pytorch/issues/113017.
For the CPU Inductor path, remove -ftree-loop-vectorize from the optimization flags to fix functional issues.

### Validation on 3 benchmark suites

#### FP32
![image](https://github.com/user-attachments/assets/ec920928-fa36-467f-ba07-d2c05c51b92e)

Outlier models (speedup<0.8, single socket): None.

#### BF16
![image](https://github.com/user-attachments/assets/4a301e5e-147d-4b74-beb1-40290969ed80)

Outlier models (speedup<0.8, single socket multi threads):

- functorch_dp_cifar10 0.58
- opacus_cifar10 0.57

Pull Request resolved: https://github.com/pytorch/pytorch/pull/136827
Approved by: https://github.com/jansel, https://github.com/jgong5
2024-11-12 01:26:18 +00:00
PyTorch MergeBot
347f96061f Revert "[cpu] Modify inductor opt flag --- ftree-loop-vectorize (#136827)"
This reverts commit cf0bb6c435.

Reverted https://github.com/pytorch/pytorch/pull/136827 on behalf of https://github.com/ZainRizvi due to Sorry but this breaks internally. See D65605094 for more details ([comment](https://github.com/pytorch/pytorch/pull/136827#issuecomment-2465805271))
2024-11-08 21:52:33 +00:00
Valentine233
cf0bb6c435 [cpu] Modify inductor opt flag --- ftree-loop-vectorize (#136827)
Reopen https://github.com/pytorch/pytorch/pull/121782, as more optimizations have landed.

Fixes https://github.com/pytorch/pytorch/issues/115261, https://github.com/pytorch/pytorch/issues/113017.
For the CPU Inductor path, remove -ftree-loop-vectorize from the optimization flags to fix functional issues.

### Validation on 3 benchmark suites

#### FP32
![image](https://github.com/user-attachments/assets/ec920928-fa36-467f-ba07-d2c05c51b92e)

Outlier models (speedup<0.8, single socket): None.

#### BF16
![image](https://github.com/user-attachments/assets/4a301e5e-147d-4b74-beb1-40290969ed80)

Outlier models (speedup<0.8, single socket multi threads):

- functorch_dp_cifar10 0.58
- opacus_cifar10 0.57

Pull Request resolved: https://github.com/pytorch/pytorch/pull/136827
Approved by: https://github.com/jansel, https://github.com/jgong5
2024-11-07 02:49:52 +00:00
Irem Yuksel
b021486405 Enable Windows Arm64 (#133088)
This PR enables PyTorch for Windows on Arm64 (CPU only).
Currently, there aren't any checks in place to build and test for Windows on Arm64, but we're working to implement those as soon as possible.
We recommend using [Arm Performance Libraries (APL)](https://developer.arm.com/Tools%20and%20Software/Arm%20Performance%20Libraries) as a BLAS option, which is introduced in this PR.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/133088
Approved by: https://github.com/malfet

Co-authored-by: cristian panaite <panaite.cristian2000@gmail.com>
Co-authored-by: Stefan-Alin Pahontu <56953855+alinpahontu2912@users.noreply.github.com>
Co-authored-by: Ozan Aydin <148207261+ozanMSFT@users.noreply.github.com>
2024-10-24 16:10:44 +00:00
Zhuoran Zhao
2414c3f534 AOTI fixes for MI300 lowering (#137939)
Summary:
1) Add sleef back to enable SIMD on AMD
2) Add kpack to the Triton compute_meta for AMD Triton, since user-defined Triton kernels will use it for k-dim packing

Test Plan:
```
HIP_VISIBLE_DEVICES=0 TORCHINDUCTOR_UNIQUE_KERNEL_NAMES=1 TORCH_LOGS="output_code,graph_code" buck run mode/{opt,amd-gpu} -c fbcode.triton_backend=amd -c fbcode.enable_gpu_sections=true //hpc/new/models/feed/benchmark:feed_lower_benchmark --  --skip-flop-estimation --skip-trt --skip-ait --enable-aot-inductor --sync-mode=0 --gpu-trace --sample-input-tile-factor=1  --load="manifold://ads_storage_fblearner/tree/user/facebook/fblearner/predictor/925729118/0/gpu_lowering/input.merge" --lowering-input-str='{"serialized_inference_model_input_path":"ads_storage_fblearner/tree/user/facebook/fblearner/predictor/925729118/0/gpu_lowering/input.merge","serialized_inference_model_output_path":"ads_storage_fblearner/tree/user/facebook/fblearner/predictor/925729118/0/gpu_lowering/mi300_output.merge","submodule_names_to_lower":["merge"],"inductor_lowering_context":{"aot_inductor_lowering_settings":{"use_scripting":true,"preset_lowerer":"ifu_cint;disable_new_lowering_weights;disable_dper_passes:passes=fuse_parallel_linear_no_weight_change","precision":3,"output_precision":3, "remove_unexpected_type_cast":false, "sample_input_tile_factor":32}},"model_entity_id":925729118,"model_snapshot_id":0,"add_sample_inputs":false,"hardware_type":0,"platform_arch":1,"dense_in_place_format":2}' --precision=bf16 2>&1 | tee local_benchmark_log.txt

```

Differential Revision: D64262924

Pull Request resolved: https://github.com/pytorch/pytorch/pull/137939
Approved by: https://github.com/frank-wei
2024-10-17 16:09:04 +00:00
Bin Bao
fe43f72be7 [AOTI] Remove the non-ABI-compatible mode (part 2) (#138047)
Summary: Continue to clean up non-ABI-compatible mode related code.

Differential Revision: [D64444327](https://our.internmc.facebook.com/intern/diff/D64444327)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/138047
Approved by: https://github.com/chenyang78
ghstack dependencies: #137982, #138016, #138009
2024-10-17 02:54:24 +00:00
Henry Tsang
a0a978ce23 [aoti config] add raise_error_on_ignored_optimization (#138035)
Summary: Unfortunately this means adding another config.

Test Plan: ci

Differential Revision: D64437699

Pull Request resolved: https://github.com/pytorch/pytorch/pull/138035
Approved by: https://github.com/chenyang78, https://github.com/desertfire
2024-10-16 18:38:47 +00:00
Bin Bao
c04b35a5ae [AOTI] Add standalone version of TORCH_CHECK (#136873)
Summary: In the standalone mode, TORCH_CHECK throws std::runtime_error, instead of c10::Error. The goal is to cut dependency on libtorch. Specifically, AOTI generates CPU code which may call ATen vectorization ops and we need to make sure those ops are self-contained.

Differential Revision: [D63911928](https://our.internmc.facebook.com/intern/diff/D63911928)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/136873
Approved by: https://github.com/albanD, https://github.com/chenyang78
2024-10-08 15:30:01 +00:00