Commit Graph

1454 Commits

Author SHA1 Message Date
Ben Olson
1bebe0424e Fix platform detection in MKLDNN CMake file (#142067)
When building PyTorch with `USE_XPU=True` and Clang,
the user sees misleading errors related to incorrect platform
detection that assumes that all users that are not using the GNU
compilers are on Windows. We can fix this by simply using CMake's
builtin platform detection variables.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/142067
Approved by: https://github.com/EikanWang, https://github.com/min-jean-cho, https://github.com/guangyey
2025-05-26 06:09:37 +00:00
cyy
8fe7ec6721 Add /Zc:preprocessor for torch libraries in MSVC builds (#147825)
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/147825
Approved by: https://github.com/janeyx99
2025-05-24 06:57:46 +00:00
Xu Han
7421c21b5e remove unused code. (#153979)
Remove the unused cmake code.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/153979
Approved by: https://github.com/albanD
2025-05-22 17:50:11 +00:00
Scott Todd
0e5f2339d0 [ROCm][Windows] Run hipcc with compatibility flags. (#153986)
See also https://github.com/ROCm/TheRock/issues/590. Including the `-Wno-ignored-attributes` flag here avoids 700MB of log warning spam while compiling and the `-fms-extensions` seems beneficial to include: https://clang.llvm.org/docs/MSVCCompatibility.html.

Co-authored-by: Aaryaman Vasishta <jem456.vasishta@gmail.com>
Co-authored-by: Scott Todd <scott.todd0@gmail.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/153986
Approved by: https://github.com/Skylion007, https://github.com/jeffdaily

Co-authored-by: Aaryaman Vasishta <jem456.vasishta@gmail.com>
2025-05-21 20:26:52 +00:00
Yu, Guangye
daa68e7a93 Update USE_XCCL option if USE_XPU is OFF (#153936)
# Motivation
Disable `USE_XCCL` when `USE_XPU` is turned `OFF` to ensure configuration consistency. This is required because XCCL depends on XPU functionality.
Especially, ensure that `USE_XCCL` is correctly set to `OFF` when [caffe2_update_option(USE_XPU OFF)](1075bb37d3/cmake/Dependencies.cmake (L97)) is invoked.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/153936
Approved by: https://github.com/Skylion007
2025-05-21 01:32:41 +00:00
Xu Han
179e7d8624 Fix vs2022 caused AVX512 illegal instruction issue. (#153480)
Fixes #145702

Add `/d2implyavx512upperregs-` to disable compiler over-aggressive optimization, which caused involeved AVX512 register on AVX2 machine.

Reference to: https://github.com/pytorch/pytorch/issues/145702#issuecomment-2874029459

Local test passed:
<img width="1208" alt="image" src="https://github.com/user-attachments/assets/26f4cb91-6bb5-416f-aa35-c899eb1489b2" />

Pull Request resolved: https://github.com/pytorch/pytorch/pull/153480
Approved by: https://github.com/Blackhex, https://github.com/cyyever, https://github.com/atalman
2025-05-20 20:37:00 +00:00
Gantaphon Chalumporn
05bc78e64f [submodule] Update fbgemm pinned version (#153950)
Summary:
Update fbgemm pinned version in PyTroch.
Related update in fbgemm: D74434751

Included changes:
Update fbgemm external dependencies directory in setup.py
Add DISABLE_FBGEMM_AUTOVEC flag to disable fbgemm's autovec

Test Plan: PyTorch OSS CI

Differential Revision: D75073516

Pull Request resolved: https://github.com/pytorch/pytorch/pull/153950
Approved by: https://github.com/Skylion007, https://github.com/ngimel
2025-05-20 20:24:27 +00:00
Xu Han
2ade886412 [XPU] [Windows] Auto turn on kineto XPU build when compiler version support. (#153681)
Since SYCL compiler 20250101, it will remove dependency of level zero header. We can turn on kineto XPU by default.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/153681
Approved by: https://github.com/chuanqi129, https://github.com/cyyever, https://github.com/EikanWang
2025-05-19 03:07:14 +00:00
cyy
7ae7324ac4 [submodule] Update google benchmark to v1.9.3 (#153676)
And remove `include_directories`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/153676
Approved by: https://github.com/Skylion007
2025-05-16 23:31:53 +00:00
PyTorch MergeBot
084c4aa614 Revert "Reapply "Delete TorchScript based Android demo app and point to ExecuTorch (#153633)" (#153656)"
This reverts commit 7ed377f577.

Reverted https://github.com/pytorch/pytorch/pull/153656 on behalf of https://github.com/larryliu0820 due to Still being used internally so can't remove ([comment](https://github.com/pytorch/pytorch/pull/153656#issuecomment-2887665403))
2025-05-16 21:00:11 +00:00
Mengwei Liu
7ed377f577 Reapply "Delete TorchScript based Android demo app and point to ExecuTorch (#153633)" (#153656)
This reverts commit ae0e8f0c73.

Keep android/libs/fbjni because it's being used by other components of
PyTorch.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/153656
Approved by: https://github.com/malfet
2025-05-16 04:35:42 +00:00
cyy
9d3b6ee4c1 [submodule] Update gtest to v1.17.0 (#153618)
And remove some outdated CMake code.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/153618
Approved by: https://github.com/malfet
2025-05-16 01:24:19 +00:00
Tristan Rice
d1dd2c1fc8 gloo: cuda (#153406)
This enables Gloo CUDA when used with a backend that supports GPUDirect which currently is only the IBVERBS backend.

This requires some changes to Gloo which are in https://github.com/pytorch/gloo/pull/441

Since we're now depending on gloo_cuda we need to split ProcessGroupGloo into two pieces, one with the CPU bits (libtorch_cpu) and one with CUDA kernels in libtorch_cuda. This unfortunately requires some major refactoring as some CPU code is shared across both.

The gloo submodule is updated to depend on the new Gloo changes

Test plan:

```py
import os
import time

transport = "TCP"
#transport = "IBVERBS"

os.environ["GLOO_DEVICE_TRANSPORT"] = transport
rank = int(os.environ["RANK"])
os.environ["CUDA_VISIBLE_DEVICES"] = str(rank)

ibv = "mlx5_0:1,mlx5_3:1,mlx5_4:1,mlx5_5:1,mlx5_6:1,mlx5_9:1,mlx5_10:1,mlx5_11:1".split(",")[rank]
ibv_name, ibv_port = ibv.split(":")
os.environ["TORCH_GLOO_IBV_NAME"] = ibv_name
os.environ["TORCH_GLOO_IBV_PORT"] = ibv_port
os.environ["TORCH_GLOO_IBV_INDEX"] = "3"

import torch
import torch.distributed as dist

dist.init_process_group("gloo")

rank = dist.get_rank()

# initial sanity check
#device = "cpu"
#t = torch.zeros(10, device=device)
#dist.all_reduce(t)
#print("sanity complete")

device = "cpu"

iters = 10
warmup_iters = 2

for nelem in [10, 100, 1000, 10000, 100000, 1000000, 10000000, 100000000]:
    t = torch.zeros(nelem, device=device)

    torch.cuda.current_stream().synchronize()
    for i in range(warmup_iters):
        dist.all_reduce(t)

    torch.cuda.current_stream().synchronize()

    start = time.perf_counter()

    for i in range(iters):
        dist.all_reduce(t)

    torch.cuda.current_stream().synchronize()

    dur = (time.perf_counter() - start)
    qps = iters/dur

    bandwidth_gb = t.nbytes * iters / dur / 1e9

    gb = t.nbytes / 1e9

    if rank == 0:
        print(f"{transport=} {device=} {iters=} {nelem=} {qps=} {gb=} {bandwidth_gb=}\n", end="")
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/153406
Approved by: https://github.com/fduwjj
2025-05-16 01:13:13 +00:00
cyy
e5e06d9cab [submodule] Update kleidiai to v1.8.0 (#153592)
And cleans up some CMake instructions.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/153592
Approved by: https://github.com/malfet
2025-05-15 10:14:05 +00:00
hanchao
0ca91af6b8 Define USE_C10D_XCCL and USE_XCCL in pytorch (#147593)
### Motivation:

Add `USE_XCCL` and `USE_C10D_XCCL` to enable support of XCCL backend building in stock PyTorch, similar to `USE_NCCL` and `USE_C10D_NCCL`.
 By default, `USE_XCCL` is OFF and allowed set to ON explicitly.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/147593
Approved by: https://github.com/guangyey, https://github.com/malfet, https://github.com/albanD, https://github.com/cyyever
2025-05-15 05:39:00 +00:00
Anthony Shoumikhin
7d39e73c57 Fix more URLs (#153277)
Or ignore them.
Found by running the lint_urls.sh script locally with https://github.com/pytorch/pytorch/pull/153246

Pull Request resolved: https://github.com/pytorch/pytorch/pull/153277
Approved by: https://github.com/malfet
2025-05-14 16:23:50 +00:00
Tristan Rice
9c3cef437c gloo: support ibverbs in cmake (#153425)
This updates the gloo submodule in PyTorch to a version that supports the new ibverbs backend that can be used with PyTorch.

Test plan:

```
sudo dnf install rdma-core-devel
USE_GLOO_IBVERBS=ON python setup.py develop
torchrun --nproc_per_node 2 ~/scripts/gloo_ibverbs_test.py
```

```py
"""
run with:

torchrun --nproc_per_node 2 ~/scripts/gloo_ibverbs_test.py
"""

import os

os.environ["GLOO_DEVICE_TRANSPORT"] = "IBVERBS"

import torch
import torch.distributed as dist

dist.init_process_group("gloo")

rank = dist.get_rank()

if rank == 0:
    device = "cpu"
else:
    device = "cuda"

print(device)

t = torch.full((10, 100), fill_value=(rank+1), device=device)
target = torch.full((10, 100), fill_value=3, device=device)

dist.all_reduce(t)

torch.testing.assert_close(t, target)

t = torch.full((10, 100), fill_value=(rank+1), device=device)

if rank == 0:
    dist.send(t, dst=1)
else:
    dist.recv(t, src=0)
    torch.testing.assert_close(t, torch.full_like(t, 1))
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/153425
Approved by: https://github.com/fduwjj
2025-05-13 17:09:00 +00:00
cyy
15e08f9571 [submodule] Update ONNX to 1.18 (#152200)
Update ONNX to 1.18.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/152200
Approved by: https://github.com/justinchuby, https://github.com/malfet
2025-05-13 04:18:45 +00:00
Jithun Nair
f11d7a5978 [ROCm] Update spack includes (#152569)
* Cleans up code in `caffe2/CMakeLists.txt` to remove individual ROCm library include paths and use `ROCM_INCLUDE_DIRS` CMake var instead
* `ROCM_INCLUDE_DIRS` CMake var is set in `cmake/public/LoadHIP.cmake` by adding all the ROCm packages that PyTorch depends on
* `rocm_version.h` is provided by the `rocm-core` package, so use the include directory for that component to be compliant with Spack
* Move `find_package_and_print_version(hip REQUIRED CONFIG)` earlier so that `hip_version.h` can be located in the hip package include dir for Spack
* `list(REMOVE_DUPLICATES ROCM_INCLUDE_DIRS)` to remove duplicate `/opt/rocm/include` entries in the non-Spack case
* Remove user-provided env var `ROCM_INCLUDE_DIRS` since `ROCM_PATH` already exists as a user-provided env var, which should be sufficient to locate the include directories for ROCm.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/152569
Approved by: https://github.com/renjithravindrankannath, https://github.com/jeffdaily

Co-authored-by: Renjith Ravindran <Renjith.RavindranKannath@amd.com>
2025-05-09 21:36:38 +00:00
Milos Puzovic
642e9305eb Fixes detection of ArmPL on Linux platform (#150031)
On Linux it failed to detect that there is bin directory as it wasn't looking for armpl-info which is the only file that is in that directory on Linux and also adding link to math library as it is required to link against when checking for LAPACK functions.

Fixes #149610

Pull Request resolved: https://github.com/pytorch/pytorch/pull/150031
Approved by: https://github.com/fadara01, https://github.com/malfet
2025-05-07 19:47:21 +00:00
cyy
2f09e79142 Fix Codegen.cmake warning (#153023)
Fix
```
CMake Warning (dev) in cmake/Codegen.cmake:
  A logical block opening on the line

    /var/lib/jenkins/workspace/cmake/Codegen.cmake:393 (if)

  closes on the line

    /var/lib/jenkins/workspace/cmake/Codegen.cmake:401 (endif)

  with mis-matching arguments.
```
by removing the condition in `endif`.

We could instead fix it, however, that is not best practice.  For example, cmake_lint warns that, and CMake says
```
The optional <condition> argument is supported for backward compatibility only.
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/153023
Approved by: https://github.com/aditew01, https://github.com/Skylion007
2025-05-07 12:45:20 +00:00
cyy
ac792a0dca [submodule] Bump ITTAPI to 3.25.5 (#150263)
It hasn't been updated for 3 years. And also to remove CMake 4 workaround.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/150263
Approved by: https://github.com/sraikund16
2025-05-06 01:02:18 +00:00
Jane Xu
2107d87dc9 [BE] remove outdated warning about TORCH_CUDA_ARCH_LIST (#152715)
I saw this warning when compiling a 3rd party lib and did not agree with it. I'm not sure the original reason why we would want to force people to pass in TORCH_CUDA_ARCH_LIST to cmake vs set it as an env var. As a developer, it's much easier to set it as an env var or have it be autodetected. I also realized this warning was from before 2018!!! 7 years ago! And there are no plans to actually enforce this (nor should there be), so let's remove this misleading warning.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/152715
Approved by: https://github.com/malfet, https://github.com/zou3519
2025-05-02 23:00:51 +00:00
Nikita Shulga
07290bdcdc Skip search for MKL on ARM cpus (#145850)
It will not find it anyway and makes a bit easier parsing thru CMake log on non-x86 systems
Pull Request resolved: https://github.com/pytorch/pytorch/pull/145850
Approved by: https://github.com/atalman
2025-05-02 18:39:49 +00:00
cyy
e9e1aacef8 Enable -Wunused on torch targets (#150077)
For GCC, ``-Wunused`` contains:
```
-Wunused-function
Warn whenever a static function is declared but not defined or a non\-inline static function is unused.

-Wunused-label
Warn whenever a label is declared but not used.
To suppress this warning use the unused attribute.

-Wunused-parameter
Warn whenever a function parameter is unused aside from its declaration.
To suppress this warning use the unused attribute.

-Wunused-variable
Warn whenever a local variable or non-constant static variable is unused aside from its declaration
To suppress this warning use the unused attribute.
```
For Clang, some of the diagnostics controlled by ``-Wunused`` are enabled by default:
```
Controls [-Wunused-argument](https://clang.llvm.org/docs/DiagnosticsReference.html#wunused-argument),
[-Wunused-but-set-variable](https://clang.llvm.org/docs/DiagnosticsReference.html#wunused-but-set-variable),
[-Wunused-function](https://clang.llvm.org/docs/DiagnosticsReference.html#wunused-function),
[-Wunused-label](https://clang.llvm.org/docs/DiagnosticsReference.html#wunused-label), [-Wunused-lambda-capture](https://clang.llvm.org/docs/DiagnosticsReference.html#wunused-lambda-capture),
[-Wunused-local-typedef](https://clang.llvm.org/docs/DiagnosticsReference.html#wunused-local-typedef),
[-Wunused-private-field](https://clang.llvm.org/docs/DiagnosticsReference.html#wunused-private-field),
[-Wunused-property-ivar](https://clang.llvm.org/docs/DiagnosticsReference.html#wunused-property-ivar),
[-Wunused-value](https://clang.llvm.org/docs/DiagnosticsReference.html#wunused-value), [-Wunused-variable](https://clang.llvm.org/docs/DiagnosticsReference.html#wunused-variable).
```
These checks are all usefull. This PR aims to enable ``-Wunused`` without breaking code.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/150077
Approved by: https://github.com/zou3519, https://github.com/wdvr
2025-05-02 07:14:19 +00:00
PyTorch MergeBot
6dadfc4457 Revert "Enable -Wunused on torch targets (#150077)"
This reverts commit 688adc9941.

Reverted https://github.com/pytorch/pytorch/pull/150077 on behalf of https://github.com/wdvr due to failing internally with use of undeclared identifier ([comment](https://github.com/pytorch/pytorch/pull/150077#issuecomment-2846499828))
2025-05-02 06:53:20 +00:00
cyy
688adc9941 Enable -Wunused on torch targets (#150077)
For GCC, ``-Wunused`` contains:
```
-Wunused-function
Warn whenever a static function is declared but not defined or a non\-inline static function is unused.

-Wunused-label
Warn whenever a label is declared but not used.
To suppress this warning use the unused attribute.

-Wunused-parameter
Warn whenever a function parameter is unused aside from its declaration.
To suppress this warning use the unused attribute.

-Wunused-variable
Warn whenever a local variable or non-constant static variable is unused aside from its declaration
To suppress this warning use the unused attribute.
```
For Clang, some of the diagnostics controlled by ``-Wunused`` are enabled by default:
```
Controls [-Wunused-argument](https://clang.llvm.org/docs/DiagnosticsReference.html#wunused-argument),
[-Wunused-but-set-variable](https://clang.llvm.org/docs/DiagnosticsReference.html#wunused-but-set-variable),
[-Wunused-function](https://clang.llvm.org/docs/DiagnosticsReference.html#wunused-function),
[-Wunused-label](https://clang.llvm.org/docs/DiagnosticsReference.html#wunused-label), [-Wunused-lambda-capture](https://clang.llvm.org/docs/DiagnosticsReference.html#wunused-lambda-capture),
[-Wunused-local-typedef](https://clang.llvm.org/docs/DiagnosticsReference.html#wunused-local-typedef),
[-Wunused-private-field](https://clang.llvm.org/docs/DiagnosticsReference.html#wunused-private-field),
[-Wunused-property-ivar](https://clang.llvm.org/docs/DiagnosticsReference.html#wunused-property-ivar),
[-Wunused-value](https://clang.llvm.org/docs/DiagnosticsReference.html#wunused-value), [-Wunused-variable](https://clang.llvm.org/docs/DiagnosticsReference.html#wunused-variable).
```
These checks are all usefull. This PR aims to enable ``-Wunused`` without breaking code.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/150077
Approved by: https://github.com/zou3519
2025-05-01 04:09:06 +00:00
Vinitha Vijayan
e872bf8f88 Avoid linking multiple OMP runtimes in libtorch_cpu.so if BLAS used is OpenBLAS. (#147725)
When PyTorch is built with OpenBLAS support and libopenblas is ldrectly linked with libgomp.so the libtorch_cpu.so ends up getting multiple omp runtimes linked against it. This may result in unexpected runtime behaviour /regression. This patch fixes this by avoiding linking against libomp.so if OpenBLAS is linked against libgomp.so

Fixes #146603

Pull Request resolved: https://github.com/pytorch/pytorch/pull/147725
Approved by: https://github.com/albanD
2025-04-29 23:39:48 +00:00
Tristan Rice
c039cb1a06 submodules: point gloo to new home in pytorch/ (#152438)
Gloo moved to the PyTorch GitHub org. This updates PyTorch to point to the new location.

https://github.com/pytorch/gloo

Test plan:

CI
Pull Request resolved: https://github.com/pytorch/pytorch/pull/152438
Approved by: https://github.com/fduwjj
2025-04-29 20:42:24 +00:00
Ryo Suzuki
fcbbb03d48 Extend vec backend with BF16 SVE intrinsics (#143666)
- Following the work in https://github.com/pytorch/pytorch/pull/119571, BF16 SVE intrinsics are added to the Vectorized class, providing ~1.7x speedup on `silu` and `softmax`.
- Added bf16 detection in CMake
- Added a guard for native NEON code to prevent compilation errors

@aditew01 @maajidkhann please have a look

Pull Request resolved: https://github.com/pytorch/pytorch/pull/143666
Approved by: https://github.com/malfet, https://github.com/aditew01, https://github.com/nikhil-arm

Co-authored-by: Aditya Tewari <aditya.tewari@arm.com>
2025-04-28 18:25:44 +00:00
Anthony Shoumikhin
7cae7902a2 Add scripts to check xrefs and urls (#151844)
Traverses the docs and code to find any broken links
Pull Request resolved: https://github.com/pytorch/pytorch/pull/151844
Approved by: https://github.com/huydhn
2025-04-28 09:30:07 +00:00
Anthony Shoumikhin
e2f9759bd0 Fix broken URLs (#152237)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/152237
Approved by: https://github.com/huydhn, https://github.com/malfet
2025-04-27 09:56:42 +00:00
cyy
65b845f82b Remove useless options for third-party ONNX build (#147616)
Treat ONNX CMake targets properly and remove unneeded options.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/147616
Approved by: https://github.com/malfet
2025-04-26 02:34:08 +00:00
Prachi Gupta
b8f4dc5a9f [ROCm] opportunistic fastatomics for ReduceAdd operations for MI300 GPUs (#146264)
In this approach, we are catching any lane within a wave that is doing fastatomics to the same destination address and computing the sum on the CU. This is leading to 3x improvement in scatter_add performance and 2x improvement in index_select.

scatter_add performance on MI300x:
dtype|Baseline (before optimizations)|opportunistic fastatomics
-------|----------------------------------|----------------------------------
f32|1.389425039|0.430447996
fp16|2.195472956|0.779729486
bf16|2.194051027|0.784599513

Using the following reproducer
```
import torch
import triton

def main():
    dtype = torch.float32
    dim = 1305301
    a = torch.rand(100, device="cuda", dtype=dtype)
    index = torch.randint(0, 100, (dim,), device="cuda")
    src = torch.rand(dim, device="cuda", dtype=dtype)

    print("=" * 20)
    print(
        triton.testing.do_bench(
            lambda: a.scatter_add(0, index, src),
            return_mode="median",
        )
    )
    print("=" * 20)

if __name__ == "__main__":
    main()
```

co-authored by: @amd-hhashemi

Pull Request resolved: https://github.com/pytorch/pytorch/pull/146264
Approved by: https://github.com/jeffdaily, https://github.com/mxz297

Co-authored-by: Hashem Hashemi <hashem.hashemi@amd.com>
2025-04-22 21:55:40 +00:00
Wei Wang
b74be52454 [CUDA][NVTX] Move nvtx3 code from cmake/public/cuda.cmake to cmake/Dependencies.cmake (#151583)
Fixes [#147220]

Context: In the CUDA NVTX world, there are NVTX v2 and NVTX v3. As announced in CUDA release notes, e.g. [CUDA 12.8 Update 1]( https://docs.nvidia.com/cuda/cuda-toolkit-release-notes/index.html#deprecated-or-dropped-operating-systems) "`NVTX v2 is deprecated. To migrate to NVTX v3. Change your code from: #include <nvtoolsext.h> to #include "nvtx3/nvtoolsext.h`". This header is included in the toolkit."
On the PyTorch side, TORCH_CUDA_USE_NVTX3 compile time macro is used and it is set to true when (most of the time) nvtx3 is found. nvtx3 is found in two cases: 1) USE_SYSTEM_NVTX=0 (default), torch build process would automatically look for the nvtx3 in pytorch/third_party/nvtx. This is the most common and default case. 2) when USE_SYSTEM_NVTX=1 is used, nvtx3 is found from the installed CUDA toolkit (e.g. CUDA 12.8 and even some earlier cuda versions).
As described in #147220, the reason it can find pytorch/third_party/nvtx is because it used
6f035d8462/cmake/public/cuda.cmake (L176)
note the "PROJECT_SOURCE_DIR" usage in [pytorch/cmake/public/cuda.cmake](6f035d8462/cmake/public/cuda.cmake (L176))

Before this PR:
PyTorch build would succeed in finding nvtx3 due to the above described process, everything is good. But downstream projects like torchvision *can* fail, and would by default fail because the following are happening:
1) USE_SYSTEM_NVTX=0 is used (and most likely it is this case because it is the default)
2) NVTX v2 can no longer be found (e.g. future CUDA versions because deprecation would eventually become removal)
3) TorchVision cannot find NVTX3 either because torchvision was invoking [pytorch/cmake/public/cuda.cmake] but the PROJECT_SOURCE_DIR is no longer the pytorch source but torchvision source!
4) One workaround is to "USE_SYSTEM_NVTX=1" but users have to explicitly set this and do the plumbing work

After this PR:
PyTorch can still find nvtx3 because the part of the code that finds nvtx3 is just moved to a new place. The CI logs are showing it being able to find nvtx3. e.g. [this job](https://productionresultssa14.blob.core.windows.net/actions-results/47f8efaa-0afe-4e1f-bc94-0a82629941cb/workflow-job-run-dc8201b1-845b-5da1-a6ea-d3360ce1b508/logs/job/job-logs.txt?rsct=text%2Fplain&se=2025-04-18T20%3A38%3A05Z&sig=yMd6egC%2Banl3lR%2BudXFX18bfUH189z0DTGLtscHQJwY%3D&ske=2025-04-19T06%3A21%3A45Z&skoid=ca7593d4-ee42-46cd-af88-8b886a2f84eb&sks=b&skt=2025-04-18T18%3A21%3A45Z&sktid=398a6654-997b-47e9-b12b-9515b896b4de&skv=2025-01-05&sp=r&spr=https&sr=b&st=2025-04-18T20%3A28%3A00Z&sv=2025-01-05), which reads "`Found nvtx3: C:/actions-runner/_work/pytorch/pytorch/pytorch/third_party/NVTX/c/include`"
For torchvision, it still invoke  [pytorch/cmake/public/cuda.cmake] but it no longer tries to find nvtx3 as torchvision is not using nvtx3 (if in future it uses, it can set USE_SYSTEM_NVTX=1 by default). So it would avoid the error reported in [#147220]

Pull Request resolved: https://github.com/pytorch/pytorch/pull/151583
Approved by: https://github.com/eqy, https://github.com/atalman, https://github.com/malfet
2025-04-18 21:18:09 +00:00
Nikita Shulga
331423e5c2 Fix tensorpipe compilation with clang-17 (#151344)
By suppressing `missing-template-arg-list-after-template-kw` warning, which seems to be required to compile Google's libnop, which is in a semi-abandoned state now
```
In file included from /Users/malfet/git/pytorch/pytorch/third_party/tensorpipe/third_party/libnop/include/nop/base/variant.h:21:
/Users/malfet/git/pytorch/pytorch/third_party/tensorpipe/third_party/libnop/include/nop/types/variant.h:241:30: error: a template argument list is expected after a name prefixed by the template keyword [-Wmissing-template-arg-list-after-template-kw]
  241 |     index_ = value_.template Construct(std::forward<Args>(args)...);
      |                              ^
/Users/malfet/git/pytorch/pytorch/third_party/tensorpipe/third_party/libnop/include/nop/types/variant.h:258:26: error: a template argument list is expected after a name prefixed by the template keyword [-Wmissing-template-arg-list-after-template-kw]
  258 |     if (!value_.template Assign(TypeTag<T>{}, index_, std::forward<U>(value))) {
      |                          ^
/Users/malfet/git/pytorch/pytorch/third_party/tensorpipe/third_party/libnop/include/nop/types/variant.h:265:26: error: a template argument list is expected after a name prefixed by the template keyword [-Wmissing-template-arg-list-after-template-kw]
  265 |     if (!value_.template Assign(index_, std::forward<T>(value))) {
      |                          ^
3 errors generated.
```

Fixes https://github.com/pytorch/pytorch/issues/151316

Pull Request resolved: https://github.com/pytorch/pytorch/pull/151344
Approved by: https://github.com/ZainRizvi, https://github.com/seemethere
2025-04-15 22:18:06 +00:00
Shivam Raikundalia
ad5e9065ac [Profiler/Easy] Remove temp flag for on-demand Memory Snapshot (#151068)
Summary: Now that we have profiler impl in we don't need the temporary flag. submodule update too.

Test Plan: CI

Reviewed By: sanrise

Differential Revision: D72672186

Pull Request resolved: https://github.com/pytorch/pytorch/pull/151068
Approved by: https://github.com/davidberard98
2025-04-11 18:50:25 +00:00
Yuki Kobayashi
78b3d71ece Docs: Add missing whitespace in the cmake warning message (#150929)
A trailing whitespace is needed to be concatenated to the following string correctly.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/150929
Approved by: https://github.com/Skylion007
2025-04-10 02:50:56 +00:00
Nikita Shulga
ec5f2e3028 [Build] Fix fbgemm build with gcc-12+ (#150847)
By suppressing more warnings

TODO: fbgemm pin really needs to get updated

Pull Request resolved: https://github.com/pytorch/pytorch/pull/150847
Approved by: https://github.com/atalman, https://github.com/Skylion007
2025-04-08 16:03:40 +00:00
Shivam Raikundalia
99c9a31386 [submodule] [Snapshot/Profiler] Memory Snapshot On Demand (#150559)
Summary:
Profiler side of memory snapshot.

1. Add API to actually do snapshot when client interface is called
2. Add ifdefs to builds so that kineto hooks snapshot correctly.

Design Philosophy: There is one interesting part of this implementation and it is during export. For export we are callign the python impl of the export rather than CPP even though we are already in CPP. This is because it is better to simply have one path of export rather than 2. Personally, I want there to be parity between auto-trace and on-demand so it if we can limit the side paths then we will have an easier time maintaining this relationship

Test Plan: {F1976563426}

Reviewed By: sanrise

Differential Revision: D70733247

Pull Request resolved: https://github.com/pytorch/pytorch/pull/150559
Approved by: https://github.com/sanrise
2025-04-07 13:04:38 +00:00
Natalia Gimelshein
55e62ff74a bf16 grouped gemm (#150374)
Enabled bf16 grouped gemm with an API similar to _scaled_group_gemm, except without scale and fast accum arguments. All transpose variants are enabled, unlike scaled gemm. Ideally we'd factor out a lot more code from scaled gemm, currently there's a lot of repetition between scaled and non-scaled versions. I factored out only a helper kernel that prepares arguments.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/150374
Approved by: https://github.com/drisspg
2025-04-06 04:53:24 +00:00
Nikita Shulga
91666eef60 Update gloo submodule (#150320)
That updates its CMake minimum version(via https://github.com/facebookincubator/gloo/pull/424 ) and removes cmake-4.0.0 workarounds for gloo

Pull Request resolved: https://github.com/pytorch/pytorch/pull/150320
Approved by: https://github.com/atalman
2025-03-31 22:40:27 +00:00
Faa Diallo
423e4a4568 [ROCm] cmake 4 workaround for hiprtc (#150324)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/150324
Approved by: https://github.com/jeffdaily, https://github.com/atalman, https://github.com/malfet
2025-03-31 21:55:53 +00:00
Aleksei Nikiforov
ab342d3793 Make PyTorch buildable by CMake-4.x on s390x (#150294)
This is a continuation of
https://github.com/pytorch/pytorch/pull/150203
that fixes nightly build on s390x.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/150294
Approved by: https://github.com/malfet
2025-03-31 18:10:02 +00:00
Nikita Shulga
493c7fa66f [Cmake] Make PyTorch buildable by CMake-4.x (#150203)
By turning on compatibility mode for protobuf, nnpack, PSimd and FP16, ittapi, TensorPipe and Gloo
Update CMake requirements

 Revert 0ece461cca and b0901d62ae to test that it actually works

TODO:
  - Update/get rid of those libraries

Fixes https://github.com/pytorch/pytorch/issues/150149

Pull Request resolved: https://github.com/pytorch/pytorch/pull/150203
Approved by: https://github.com/clee2000
2025-03-29 01:39:13 +00:00
PyTorch MergeBot
a9d08ed0ce Revert "Parallelize sort (#149505)"
This reverts commit 842d51500b.

Reverted https://github.com/pytorch/pytorch/pull/149505 on behalf of https://github.com/ZainRizvi due to Reverting since this is breaking inductor builds on trunk. More details [GH job link](https://github.com/pytorch/pytorch/actions/runs/14000726218/job/39207447863) [HUD commit link](842d51500b) ([comment](https://github.com/pytorch/pytorch/pull/149505#issuecomment-2759082390))
2025-03-27 18:43:11 +00:00
Aleksandar Samardžić
43cc954f88 Refactor row-wise scaled MM (#149978)
1. Add config selection for SM89.
2. Only build kernels if compiling for given arch.
3. Factor out CMake code to enforce compiling for needed archs for individual files into a function.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/149978
Approved by: https://github.com/drisspg
2025-03-26 23:49:41 +00:00
Nichols A. Romero
c0af782f30 [ROCm] Change LoadHIP to use find_file for rocm_version.h (#149983)
Fixes #149805

Pull Request resolved: https://github.com/pytorch/pytorch/pull/149983
Approved by: https://github.com/jeffdaily
2025-03-26 21:26:41 +00:00
cyy
79e8a69257 Enable move warnings for torch targets (#149923)
This PR enables more move warnings for torch targets and fixes some code.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/149923
Approved by: https://github.com/malfet
2025-03-26 08:38:13 +00:00
Nikita Shulga
5a7588f183 [Build] Remove pre-CXX11 ABI logic from build script (#149888)
Only keep one in check_binary_symbols to make sure there are no pre-CXX11 ABI symbols in the library
Pull Request resolved: https://github.com/pytorch/pytorch/pull/149888
Approved by: https://github.com/atalman, https://github.com/seemethere
ghstack dependencies: #149887
2025-03-25 03:17:16 +00:00