Commit Graph

1875 Commits

Author SHA1 Message Date
Ivan Zaitsev
3d4ca228be Remove METADATA.bzl files (#166574)
(meta-internal, should not be synced)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/166574
Approved by: https://github.com/bigfootjon
2025-10-29 23:17:41 +00:00
PyTorch MergeBot
1fdef664a5 Revert "[Pytorch] Update Kineto Submodule (#166317)"
This reverts commit be28329710.

Reverted https://github.com/pytorch/pytorch/pull/166317 on behalf of https://github.com/jeffdaily due to ROCm CI was clean, but post-merge ROCm failures showed up ([comment](https://github.com/pytorch/pytorch/pull/166317#issuecomment-3458665809))
2025-10-28 21:55:38 +00:00
Shivam Raikundalia
be28329710 [Pytorch] Update Kineto Submodule (#166317)
Summary: Update Submodule

Test Plan: CI

Differential Revision: D85579130

Pull Request resolved: https://github.com/pytorch/pytorch/pull/166317
Approved by: https://github.com/Skylion007
2025-10-28 10:41:17 +00:00
Cui, Yifeng
90b30ebf7e Update torch-xpu-ops commit pin (#166129)
Update the torch-xpu-ops commit to [intel/torch-xpu-ops@8d373b](8d373ba272), includes:

- Add CONFIGURE_DEPENDS in install_xpu_headers macro to track these headers
- Add check to ensure P2P Tensors are dense
- Switch philox_engine_inputs usage to philox_xpu_state per XPU graph request
- Add vectorization path for maxpool backward channel last
- Fix SYCL_PRINT macro usable on Windows
- Eliminate unnecessary warning if no AOT enabled

Pull Request resolved: https://github.com/pytorch/pytorch/pull/166129
Approved by: https://github.com/EikanWang
2025-10-27 08:17:03 +00:00
Pawel Swider
d97f6550a2 [Intel GPU] Xpu matmul implementation for complex dtype (#160867)
Enabling complex datatype support for 4 ops: `mm`, `bmm`, `addmm`, `baddbmm` for XPU. From now implementation will call functions created in: https://github.com/intel/torch-xpu-ops/pull/1992.

Additionally added complex datatype tests for matmul operators. More detailed tests are going to be enabled in: https://github.com/intel/torch-xpu-ops/pull/1993

CI runs have found that `test_comprehensive_linalg_eig_xpu` tests were calling internally matmul with complex datatype. With this PR test starts to pass so linalg.eig was removed from `inductor_expected_failures_single_sample["xpu"]` as otherwise it was failing with: `Unexpected success` message.

Part of: https://github.com/intel/torch-xpu-ops/issues/1853

Pull Request resolved: https://github.com/pytorch/pytorch/pull/160867
Approved by: https://github.com/guangyey, https://github.com/ZhiweiYan-96, https://github.com/gujinghui, https://github.com/EikanWang, https://github.com/Silv3S, https://github.com/CuiYifeng, https://github.com/jansel
2025-10-25 17:13:13 +00:00
Darshan Sanghani
263901cec4 [pytorch/kineto] Update Kineto Submodule (#166150)
Summary: Update to include some race condition fixes.

Test Plan: n/a

Differential Revision: D85390799

Pull Request resolved: https://github.com/pytorch/pytorch/pull/166150
Approved by: https://github.com/sraikund16, https://github.com/cyyever
2025-10-24 04:03:13 +00:00
Simon Layton
0b58d87aec [Submodule] Bump FBGEMM to latest (#165544)
Summary:

* FBGEMM submodule updated to main
* CMake updated to reflect necessary changes
* Notably pulls in NVFP4 grouped gemm kernels

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:
Signed-off-by: Simon Layton <simonlayton@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/165544
Approved by: https://github.com/cyyever, https://github.com/jeffdaily
2025-10-22 20:57:15 +00:00
PyTorch MergeBot
0da1f911dc Revert "[Submodule] Bump FBGEMM to latest (#165544)"
This reverts commit 23417ae50f.

Reverted https://github.com/pytorch/pytorch/pull/165544 on behalf of https://github.com/clee2000 due to failing in internal D84996252, probably needs some sort of update to fbgemm internally? ([comment](https://github.com/pytorch/pytorch/pull/165544#issuecomment-3422993703))
2025-10-20 17:06:07 +00:00
Aaron Gokaslan
ceb11a584d [BE]: Update kleidai submodule to v1.15.0 (#165842)
This mostly just adds a few new kernels and fixes some IMA and performance improvement of prev kernels. Also improves compiler support.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/165842
Approved by: https://github.com/albanD
2025-10-19 08:25:03 +00:00
Aaron Gokaslan
33adb276fe [BE][Ez]: Update Eigen to 5.0.0. C++14 support and more! (#165840)
Update Eigen pin to 5.0.0 . Tons of new features and perf improvements. Most importantly updates minimum from C++03 to C++14 giving a ton of performance optimizations like properly implemented move operators, simplified code, etc. Also improved vectorization particularily on ARM. We really only use this library as a fallback for sparse operators, but still useful to update it.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/165840
Approved by: https://github.com/albanD
2025-10-19 08:00:06 +00:00
Simon Layton
23417ae50f [Submodule] Bump FBGEMM to latest (#165544)
Summary:

* FBGEMM submodule updated to main
* CMake updated to reflect necessary changes
* Notably pulls in NVFP4 grouped gemm kernels

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:
Signed-off-by: Simon Layton <simonlayton@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/165544
Approved by: https://github.com/cyyever, https://github.com/jeffdaily
2025-10-18 03:58:08 +00:00
Aaron Gokaslan
de09bab4b6 [BE]: Update cudnn frontend submodule to 1.15.0 (#165776)
Update cudnn frontend submodule to 1.15.0
Pull Request resolved: https://github.com/pytorch/pytorch/pull/165776
Approved by: https://github.com/eqy
2025-10-18 02:23:27 +00:00
Cui, Yifeng
9e89b1c4c7 Update torch-xpu-ops commit pin (#165321)
Update the torch-xpu-ops commit to [intel/torch-xpu-ops@ce9db1](ce9db15136), includes:

- Fix test_barrier hang by using static global rank in ProcessGroupXCCL
- Update install_xpu_headers only when content should change to speedup recompilation
- Add global rank information to communication logging
- Remove duplicate normalization from FFT methods
Pull Request resolved: https://github.com/pytorch/pytorch/pull/165321
Approved by: https://github.com/EikanWang
2025-10-14 09:07:24 +00:00
Cui, Yifeng
53f5af8c92 Update torch-xpu-ops commit pin (#164237)
Update the torch-xpu-ops commit to [intel/torch-xpu-ops@f30173](f301733b03), includes:

- Install xpu internal headers to PyTorch
- Fix error handling for BatchLinearAlgebra Ops
- Fix unnecessary double data type conversion
- Fix overflow when calculating workgroups count
- Fix segmentation fault and calculation error in AveragePool2dKernel
Pull Request resolved: https://github.com/pytorch/pytorch/pull/164237
Approved by: https://github.com/EikanWang
2025-10-09 10:38:59 +00:00
Henry Tsang
7d7ae4d7b2 [submodule] upgrade cutlass version to 4.2.1 and completely resolved python/cutlass name collision (#164156)
Differential Revision: D83489362

Pull Request resolved: https://github.com/pytorch/pytorch/pull/164156
Approved by: https://github.com/Skylion007, https://github.com/mlazos
2025-09-30 17:04:57 +00:00
Cui, Yifeng
4783e3ff49 Update torch-xpu-ops commit pin (#163758)
Update the torch-xpu-ops commit to [intel/torch-xpu-ops@229e8b](229e8ba104), includes:

- Revert tracking of Work status for FlightRecorder in ProcessGroupXCCL to fix memory leak
- Enable SYCL warnings on Linux
- Fix accuracy issues with CTC loss
- Enable aten::nonzero_static on XPU backend
- Stop recursive calculations in polynomial kernels if tensor has NaNs
Pull Request resolved: https://github.com/pytorch/pytorch/pull/163758
Approved by: https://github.com/EikanWang
2025-09-26 09:05:08 +00:00
Shivam Raikundalia
45d9dcccc5 Update Kineto Submodule (#162222)
Summary: Update

Test Plan:
CI

Rollback Plan:

Differential Revision: D81727392

Pull Request resolved: https://github.com/pytorch/pytorch/pull/162222
Approved by: https://github.com/sanrise
2025-09-23 06:08:55 +00:00
Chris Thi
e310cc5e06 Update fbgemm submodule (#163411)
Test Plan:

As titled, includes some new changes fbgemm to see if CUDA13 breakage is fixed.

Reviewers:

Subscribers:

Tasks:

Tags:

Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/163411
Approved by: https://github.com/Skylion007
2025-09-22 15:46:11 +00:00
Yuanyuan Chen
8a281d7214 [submodule] Bump libfmt to 12.0.0 (#163441)
libfmt 12.0 brings new optimisations and fixes some compilation issues for clang 21 (https://github.com/fmtlib/fmt/pull/4477).
For a detailed release log, see https://github.com/fmtlib/fmt/releases/tag/12.0.0
Pull Request resolved: https://github.com/pytorch/pytorch/pull/163441
Approved by: https://github.com/Skylion007
2025-09-21 22:37:25 +00:00
PyTorch MergeBot
607469bdad Revert "[ROCm] Bump FBGEMM commit to avoid CK errors (#162590)"
This reverts commit c9b80c4d4b.

Reverted https://github.com/pytorch/pytorch/pull/162590 on behalf of https://github.com/malfet due to This breaks CUDA 13 builds ([comment](https://github.com/pytorch/pytorch/pull/162590#issuecomment-3313263772))
2025-09-19 18:13:00 +00:00
Han Chao
e134bb340a Update torch-xpu-ops commit pin (#163244)
Update the torch-xpu-ops commit to 24fab67b6e, includes:

- Clean up getDeviceIndexOfCurrentQueue
- Fix hardswish gradients corner case
- Fix xccl contiguous check
- Move checks from nonzero kernel to operator
- support high priority stream for xccl

Pull Request resolved: https://github.com/pytorch/pytorch/pull/163244
Approved by: https://github.com/EikanWang
2025-09-19 02:04:40 +00:00
Prachi Gupta
c9b80c4d4b [ROCm] Bump FBGEMM commit to avoid CK errors (#162590)
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/162590
Approved by: https://github.com/jeffdaily
2025-09-19 01:14:20 +00:00
henrylhtsang
a81a2e54ed [submodule] CUTLASS upgrade to 4.2.0 and change cutlass to cutlass_cppgen (#163092)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/163092
Approved by: https://github.com/drisspg, https://github.com/Skylion007
2025-09-18 18:03:51 +00:00
Eddie Yan
9b7a8c4d05 [cuDNN][SDPA][submodule] Roll-back cuDNN frontend upgrade, update Meta registration (#163104)
For https://github.com/pytorch/torchtitan/issues/1713

Also note that we will need to rollback the cuDNN frontend upgrade in 2.9 as it currently introduces a segmentation fault by assuming tensors have their strides and sizes populated at graph creation time 1a7b4b78db/include/cudnn_frontend/node/sdpa_support_surface.h (L447%C2%A0)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/163104
Approved by: https://github.com/drisspg
2025-09-17 15:48:54 +00:00
Nikita Shulga
65845d7291 Update Gloo submodule (#163112)
Which makes PyTorch buildable with gcc-15, tested by running the build inside `fedora:44` docker
```
docker run --rm -it fedora:44 bash -c "yum install -y g++ python3-devel git; git clone https://github.com/pytorch/pytorch; cd pytorch; git checkout 8f710acce8332979c9a7bf97e72666dfd35c43e6; python3 -mpip install -r requirements.txt; python3 setup.py bdist_wheel"
```

Fixes https://github.com/pytorch/pytorch/issues/156595
Pull Request resolved: https://github.com/pytorch/pytorch/pull/163112
Approved by: https://github.com/huydhn
2025-09-17 03:04:09 +00:00
Cui, Yifeng
9786243b64 Update torch-xpu-ops commit pin (#162804)
Update the torch-xpu-ops commit to [intel/torch-xpu-ops@d8c3ee](d8c3eefc29), includes:

- Optimize adaptive average pool for channel-last memory format
- Add unregister wait_tensor
- Replace deprecated `[[intel::reqd_sub_group_size(SgSize)]]` with `[[sycl::reqd_sub_group_size(SIMD)]]` and remove unnecessary attributes
- Revert "Roll back to original usage of sycl::get_kernel_bundle"
Pull Request resolved: https://github.com/pytorch/pytorch/pull/162804
Approved by: https://github.com/EikanWang
2025-09-16 06:30:48 +00:00
Xu Han
52d4660ae9 [AOTI] Fix Windows fail to zip opened file. (#162617)
Original issue:
<img width="1767" height="544" alt="Image" src="https://github.com/user-attachments/assets/9de90d50-217f-4049-8f19-77ff1660c8b0" />

reproducer:
```cmd
pytest test\inductor\test_aot_inductor.py -v -k test_weight_on_disk_legacy_cpu
```

Fixed list:
1. `WritableTempFile`'s `__exit__` function auto unlink opened file, when the file was opened, it should raise error. Ignore it on Windows.
2. When open zip file, if the file is opened, it would be failed. Switch to `_wfsopen` with shared access flag, which can open file with shared access.

Local test passed:
<img width="1101" height="233" alt="image" src="https://github.com/user-attachments/assets/935cbf2e-52db-41f1-80fa-617569b92a96" />

Pull Request resolved: https://github.com/pytorch/pytorch/pull/162617
Approved by: https://github.com/jansel
2025-09-11 06:22:21 +00:00
PyTorch MergeBot
96ef26f71a Revert "[ROCm] Integrate AITER Fav3 fwd kernels (#160105)"
This reverts commit d2393c2d7d.

Reverted https://github.com/pytorch/pytorch/pull/160105 on behalf of https://github.com/huydhn due to Sorry for reverting your change but it is failing internal ROCm build ([comment](https://github.com/pytorch/pytorch/pull/160105#issuecomment-3273297183))
2025-09-10 04:42:28 +00:00
PyTorch MergeBot
2281d009e5 Revert "[ROCm] Add specific compile options for CK SDPA (#161759)"
This reverts commit d22d916719.

Reverted https://github.com/pytorch/pytorch/pull/161759 on behalf of https://github.com/huydhn due to Sorry for reverting your change but this seems to break internal ROCm jobs ([comment](https://github.com/pytorch/pytorch/pull/161759#issuecomment-3272807726))
2025-09-10 00:44:30 +00:00
Andy Lugo
d2393c2d7d [ROCm] Integrate AITER Fav3 fwd kernels (#160105)
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/160105
Approved by: https://github.com/jeffdaily
2025-09-09 22:30:12 +00:00
Andy Lugo
d22d916719 [ROCm] Add specific compile options for CK SDPA (#161759)
Updates CK version and adds CK specific compilation options

Pull Request resolved: https://github.com/pytorch/pytorch/pull/161759
Approved by: https://github.com/jeffdaily
2025-09-09 20:04:19 +00:00
Aaron Gokaslan
ec2c1371af [BE]: Update cudnn frontend submodule to 1.14.1 (#162347)
Fixes a few bugs introduced to CUDNN 1.11 which affects all our CUDA13 builds. Also adds support for new CUDNN features whenever we choose to update. @eqy pretty sure this addresses the concern you had over the previous upgrade since that bugfix is now merged. This is a simple header only update.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/162347
Approved by: https://github.com/eqy, https://github.com/atalman
2025-09-08 20:03:23 +00:00
Daniel Vega-Myhre
b6d0a9ea90 MXFP8 grouped GEMM support for torch._scaled_grouped_mm + submodule bump (#162209)
## Summary
- We just landed 2d-2d support for mxfp8 grouped gemm in FBGEMM: https://github.com/pytorch/FBGEMM/pull/4816
- This is needed for backward pass of mxfp8 MoE training with grouped gemms
- Changes:
    - Add dispatching + input validation for mxfp8 grouped gemm in `torch._scaled_grouped_mm`
    - Add meta registration input validation for mxfp8 grouped gemm, for composability with compile
    - Add unit tests exercising torch._scaled_grouped_mm with mxfp8 inputs
    - Bump FBGEMM third party submodule to include:
          - https://github.com/pytorch/FBGEMM/pull/4816
          - https://github.com/pytorch/FBGEMM/pull/4820
          - https://github.com/pytorch/FBGEMM/pull/4821
          - https://github.com/pytorch/FBGEMM/pull/4823

#### How fbgemm dependency was bumped
Documenting this since I haven't found it documented elsewhere:
- `cd ~/pytorch/third_party/fbgemm`
- `git fetch`
- `git checkout <hash>`
- `cd ~/pytorch`
- `git add third_party/fbgemm`

## Test plan

#### Test build
```
USE_FBGEMM_GENAI=1 python -m pip install --no-build-isolation -v -e .
...
Successfully installed torch-2.9.0a0+gitf5070f3
```
[full build log](https://www.internalfb.com/phabricator/paste/view/P1933787581)

#### Unit tests
```
pytest test/test_matmul_cuda.py -k test_mxfp8_scaled_grouped_mm_
...

test/test_matmul_cuda.py .........                                                                                                                        [100%]

============================================================== 9 passed, 1668 deselected in 5.34s ===============================================================
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/162209
Approved by: https://github.com/ngimel
2025-09-06 15:25:30 +00:00
Aaron Gokaslan
1f51056bd6 [BE]: Update cpp-httplib submodule to 0.26.0 (#162181)
Update cpp-httplib with better error handling, bugfixes, and performance. Header only library update.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/162181
Approved by: https://github.com/jansel
2025-09-04 18:59:32 +00:00
Cui, Yifeng
ba7f546ccc Update torch-xpu-ops commit pin (#162062)
Update the torch-xpu-ops commit to [intel/torch-xpu-ops@83c5a5](83c5a5a551), includes:

- Revert "Disable xccl timer avoid drlm hang" because XPU time event issue has been fixed
- Fallback lu_factor kernel to CPU for single batch
- Enable aten::linalg_inv and aten::linalg_inv_ex on XPU
Pull Request resolved: https://github.com/pytorch/pytorch/pull/162062
Approved by: https://github.com/EikanWang
2025-09-04 17:05:33 +00:00
PyTorch MergeBot
4cdaf8265d Revert "Update Kineto submodule (#161572)"
This reverts commit d33840c542.

Reverted https://github.com/pytorch/pytorch/pull/161572 on behalf of https://github.com/seemethere due to This appears as though its causing downstream build failures in inductor workflows and for developers working locally. Going to revert out of an abundance of caution. ([comment](https://github.com/pytorch/pytorch/pull/161572#issuecomment-3247121981))
2025-09-02 23:28:19 +00:00
Yu, Guangye
a99d8d39bc Update torch-xpu-ops commit pin (#161919)
# Motivation
1. Fallback some linalg functionality such as `linalg_eig`, `linalg_householder_product`, `linalg_solve_triangular` to CPU;
2. Fix codegen dependency bug.

# Additional Context
This PR aims to fix https://github.com/pytorch/pytorch/issues/161498

Pull Request resolved: https://github.com/pytorch/pytorch/pull/161919
Approved by: https://github.com/EikanWang
2025-09-02 17:09:07 +00:00
Shivam Raikundalia
d33840c542 Update Kineto submodule (#161572)
Differential Revision: D81087601

Pull Request resolved: https://github.com/pytorch/pytorch/pull/161572
Approved by: https://github.com/cyyever, https://github.com/aaronenyeshi
2025-09-02 16:31:55 +00:00
yucai-intel
f44ad54bc6 Update torch-xpu-ops commit pin (#161152)
Update the torch-xpu-ops commit to [8b58040ee32689487f660462f655085f31506dab](8b58040ee3), includes:

- Add vectorization path on maxpool forward channel last
- Add FlightRecorder support for ProcessGroupXCCL
- Fix random build failure on codegen
- Suppress dllexport warning on Windows
- Make torch-xpu-ops build depend on ATen XPU
Pull Request resolved: https://github.com/pytorch/pytorch/pull/161152
Approved by: https://github.com/EikanWang

Co-authored-by: Yu, Guangye <106960996+guangyey@users.noreply.github.com>
2025-08-30 07:19:24 +00:00
Benjamin Glass
cbc53b7696 Update pybind11 submodule to 3.0.1 (#160754)
Upgrade to PyBind11 v3. This allows us to strip out our own (possibly broken?) handling of the C++ ABI when building extensions, in favor of the more-complete PyBind11 internal handling.

Fixes a few test failures due to https://github.com/pybind/pybind11/issues/5774, which effectively makes the `__qualname__` attribute of functions platform-dependent.

Test plan: CI

Pull Request resolved: https://github.com/pytorch/pytorch/pull/160754
Approved by: https://github.com/Skylion007
2025-08-27 21:15:01 +00:00
PyTorch MergeBot
1b34e04485 Revert "Update pybind11 submodule to 3.0.1 (#160754)"
This reverts commit 660b0b8128.

Reverted https://github.com/pytorch/pytorch/pull/160754 on behalf of https://github.com/atalman due to please see https://github.com/pytorch/pytorch/pull/160754#issuecomment-3226051449 ([comment](https://github.com/pytorch/pytorch/pull/160754#issuecomment-3226078102))
2025-08-26 23:35:22 +00:00
Benjamin Glass
660b0b8128 Update pybind11 submodule to 3.0.1 (#160754)
Upgrade to PyBind11 v3. This allows us to strip out our own (possibly broken?) handling of the C++ ABI when building extensions, in favor of the more-complete PyBind11 internal handling.

Fixes a few test failures due to https://github.com/pybind/pybind11/issues/5774, which effectively makes the `__qualname__` attribute of functions platform-dependent.

Test plan: CI

Pull Request resolved: https://github.com/pytorch/pytorch/pull/160754
Approved by: https://github.com/Skylion007
2025-08-26 01:21:18 +00:00
frost-intel
9b4adc4db7 [fr] [xpu] Add FlightRecorder support for ProcessGroupXCCL (#158568)
Adds support for FlightRecorder in ProcessGroupXCCL.

See https://github.com/intel/torch-xpu-ops/pull/1867 for XCCL implementation and more details.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/158568
Approved by: https://github.com/guangyey, https://github.com/fduwjj
2025-08-22 09:03:35 +00:00
Johnny
691d17a5c6 Update TensorPipe submodule (#160808)
To a commit containing  https://github.com/pytorch/tensorpipe/pull/464 that fixes compilation with CUDA-13

Fixes https://github.com/pytorch/pytorch/issues/160104
Pull Request resolved: https://github.com/pytorch/pytorch/pull/160808
Approved by: https://github.com/nWEIdia, https://github.com/Skylion007, https://github.com/malfet
2025-08-17 14:11:41 +00:00
chunhuanMeng
663da17b62 Update torch-xpu-ops commit pin (#160062)
Update the torch-xpu-ops commit to [77cc792cd265179745d335579d233e6d4f9a2667](77cc792cd2), includes:

- Ensures that the XPU cache is cleared before creating tensors during the test
- Add unused variable warning
- Fix test_linalg and test_torch issue with bf32_on_and_off updates
- Fix deterministic indexing with broadcast
- Fix dist.gather with noncontiguous tensor
- Improve accuracy of index put deterministic kernel
- Add generate file rely avoid build before generate
- optimize embedding bag

Fixes #160661

Pull Request resolved: https://github.com/pytorch/pytorch/pull/160062
Approved by: https://github.com/EikanWang
2025-08-15 15:27:24 +00:00
cyy
c184cb3852 [submodule] Bump fbgemm to latest (#158210)
Merge the recent commits of FBGEMM and remove unnecessary CMake code.
Specifically, we
1. enable `fbgemm_autovec` since the target is now correctly handled.
2. remove option `USE_FAKELOWP` which is not used.
3. remove `CAFFE2_COMPILER_SUPPORTS_AVX512_EXTENSIONS` check.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/158210
Approved by: https://github.com/q10
2025-08-11 13:48:02 +00:00
Frank Seide
b8ef60b6bc Enable XNNPACK aarch64 builds (#159762)
Summary:
This fixes the build of TorchScript's XNNPACK dependency for our aarch64 device.

Thanks to andrewjcg for proposing this fix.

Rollback Plan:

Reviewed By: andrewjcg

Differential Revision: D79497613

Pull Request resolved: https://github.com/pytorch/pytorch/pull/159762
Approved by: https://github.com/frankseide, https://github.com/malfet

Co-authored-by: Frank Seide <seide@meta.com>
2025-08-06 20:20:32 +00:00
Nikita Shulga
9b953bb3fb [BE] Update TensorPipe pin (#159834)
No functional changes, just:
- Update C++ standard to C++17
- Update `cmake` min version to 3.18
- Update `libuv` dependency to 1.51 (to move its cmake min version to 3.10)
- Replace boost optional implementation with `std::optional` wrapper
- Make it compilable with gcc-14.x plus by including `cstddef` in few headers
-  Avoid using deprecated enums for MacOS builds

Pull Request resolved: https://github.com/pytorch/pytorch/pull/159834
Approved by: https://github.com/Skylion007
2025-08-05 20:45:09 +00:00
Cui, Yifeng
57ab39f7e4 Update torch-xpu-ops commit pin (#159621)
Update the torch-xpu-ops commit to [intel/torch-xpu-ops@1f7a57](1f7a57f507) includes:

- Add Template Parameter to the function `gpu_kernel` for Controlling Broadcasting Vectorization
- Add optional NaN checks to XCCL
- Fix NllLossForwardReduce2DKernelFunctor accuracy
- Extend the existing communication logging to include the reduction operation for collective calls
- [Reland] Install xpu codegen header to torch/include
Pull Request resolved: https://github.com/pytorch/pytorch/pull/159621
Approved by: https://github.com/EikanWang
2025-08-05 01:46:15 +00:00
Andy Lugo
06d28de17a Update CK Kernel generation and update ck submodule (#157964)
changes required to reduce the number of ck kernels generated. This change depends on https://github.com/ROCm/composable_kernel/pull/2480 to be merged first.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/157964
Approved by: https://github.com/842974287
2025-08-01 22:24:27 +00:00