Dmitry Rogozhkin
9852c6d236
xpu: fix 3rd party builds on systems with cmake<3.25 ( #135767 )
...
The CMake LINUX variable is only available starting from CMake 3.25. It is better to use CMAKE_SYSTEM_NAME instead to relax the CMake version requirement.
See: https://cmake.org/cmake/help/v3.25/variable/LINUX.html
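For illustration, a minimal sketch of the portable check, assuming a Linux-only guard around the XPU third-party build steps:
```cmake
# Hedged sketch: works on CMake < 3.25, unlike if(LINUX)
if(CMAKE_SYSTEM_NAME STREQUAL "Linux")
  # Linux-only third-party build steps go here
endif()
```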
Fixes: #135766
Pull Request resolved: https://github.com/pytorch/pytorch/pull/135767
Approved by: https://github.com/malfet , https://github.com/guangyey
2024-09-12 05:31:01 +00:00
CaoE
f7c0c06692
Add oneDNN BRGEMM support on CPU ( #131878 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/131878
Approved by: https://github.com/jgong5 , https://github.com/peterbell10
2024-09-07 13:22:30 +00:00
min-jean-cho
ecbd715363
[Intel GPU][Windows] Fix overriding default CMAKE_CXX_FLAGS ( #135093 )
...
The root cause is that `/EHsc` is part of the default `CMAKE_CXX_FLAGS` in CMake.
The fix is to not override the default `CMAKE_CXX_FLAGS`.
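For illustration, a hedged sketch of the difference (the extra flag shown is a placeholder, not the actual option added by the PR):
```cmake
# Hedged sketch: append to the defaults so MSVC flags such as /EHsc survive.
string(APPEND CMAKE_CXX_FLAGS " /SomeExtraFlag")
# set(CMAKE_CXX_FLAGS "/SomeExtraFlag")  # would override and drop the default /EHsc
```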
Pull Request resolved: https://github.com/pytorch/pytorch/pull/135093
Approved by: https://github.com/EikanWang , https://github.com/atalman
2024-09-05 12:52:43 +00:00
Edward Z. Yang
a258844a32
Properly handle empty CPUINFO variable ( #134916 )
...
Fixes https://github.com/pytorch/pytorch/issues/134915
But I did not root cause why CPUINFO is totally empty to begin with...
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/134916
Approved by: https://github.com/Skylion007
2024-09-03 15:59:59 +00:00
Yu, Guangye
3402a5d865
fix windows xpu build issue ( #133845 )
...
# Motivation
If you build XPU with oneAPI 2024.2, the build fails because `sycl-preview.lib` exists on Windows, and linking this unexpected library results in `error LNK2019: unresolved external symbol`.
# Solution
Explicitly use `sycl-preview` in the Linux build only.
# Additional Context
For `find_library`, please note that the variable will not be updated if it has been stored.
```
If the library is found the result is stored in the variable and the search will not be repeated unless the variable is cleared.
```
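For illustration, a hedged sketch of the OS-conditional lookup (the variable name and hint path are assumptions, not the actual PyTorch code):
```cmake
# Hedged sketch: only prefer sycl-preview on Linux; on Windows look for sycl so the
# cached result never points at the unexpected sycl-preview.lib.
if(CMAKE_SYSTEM_NAME STREQUAL "Linux")
  find_library(SYCL_LIBRARY NAMES sycl-preview sycl HINTS ${SYCL_LIBRARY_DIR})
else()
  find_library(SYCL_LIBRARY NAMES sycl HINTS ${SYCL_LIBRARY_DIR})
endif()
```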
Pull Request resolved: https://github.com/pytorch/pytorch/pull/133845
Approved by: https://github.com/min-jean-cho , https://github.com/EikanWang , https://github.com/atalman , https://github.com/malfet
2024-08-29 23:53:32 +00:00
min-jean-cho
416a7894fe
[Windows][XPU] Disable Kineto PTI on Windows only ( #134620 )
...
Disable Kineto + XPU PTI on Windows only.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/134620
Approved by: https://github.com/guangyey , https://github.com/malfet
Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com>
2024-08-29 20:58:55 +00:00
Gregory Comer
3b40b07efb
Update PyTorch for XNNPACK 87ee0b4 ( #134518 )
...
Summary: Update XNNPACK library version.
Test Plan: Combined diff CI is clean: D61586079 (all changes, has to be split out for export).
Differential Revision: D61822610
Pull Request resolved: https://github.com/pytorch/pytorch/pull/134518
Approved by: https://github.com/mcr229
2024-08-28 19:24:04 +00:00
Xinya Zhang
5fd670e0ef
[ROCM] Properly disable Flash Attention/Efficient Attention with environment variables ( #133866 )
...
Now `USE_FLASH_ATTENTION=0 USE_MEM_EFF_ATTENTION=0 python setup.py` compiles correctly.
Fixes #125230
Pull Request resolved: https://github.com/pytorch/pytorch/pull/133866
Approved by: https://github.com/jithunnair-amd , https://github.com/jeffdaily , https://github.com/malfet
2024-08-27 18:24:29 +00:00
cyyever
c638a40a93
[Caffe2] Remove unused AVX512 code ( #133160 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/133160
Approved by: https://github.com/albanD
2024-08-23 23:16:16 +00:00
Zitong Zhan
90c821814e
SparseCsrCUDA: cuDSS backend for linalg.solve ( #129856 )
...
This PR switches to the cuDSS library and has the same purpose as #127692 , which is to add Sparse CSR tensor support to linalg.solve.
Fixes #69538
Minimum example of usage:
```
import torch

if __name__ == '__main__':
    spd = torch.rand(4, 3)
    A = spd.T @ spd  # dense symmetric positive-definite 3x3 matrix
    b = torch.rand(3).to(torch.float64).cuda()
    A = A.to_sparse_csr().to(torch.float64).cuda()  # sparse CSR system matrix on CUDA
    x = torch.linalg.solve(A, b)
    print((A @ x - b).norm())  # residual should be close to zero
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/129856
Approved by: https://github.com/amjames , https://github.com/lezcano , https://github.com/huydhn
Co-authored-by: Zihang Fang <zhfang1108@gmail.com>
Co-authored-by: Huy Do <huydhn@gmail.com>
2024-08-22 07:57:30 +00:00
cyy
c3d02fa390
[Reland2] Update NVTX to NVTX3 ( #109843 )
...
Another attempt to update NVTX to NVTX3. We now avoid changing the NVTX header inclusion of existing code. The advantage of NVTX3 over NVTX is that it is a header-only library, so linking with NVTX3 greatly simplifies our CMake and other build scripts for finding libraries in user environments. In addition, NVTX is indeed still present in the latest CUDA versions, but it is no longer a compiled library: it is now header-only, which is why there isn't a .lib file anymore.
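For illustration, a hedged sketch of what consuming a header-only NVTX3 can look like in CMake (the `CUDA::nvtx3` imported target is provided by FindCUDAToolkit in sufficiently new CMake versions; the `torch_cuda` target name is used here only as an example):
```cmake
# Hedged sketch: header-only NVTX3 means there is no .lib/.so to link, only headers.
find_package(CUDAToolkit REQUIRED)
target_link_libraries(torch_cuda PRIVATE CUDA::nvtx3)  # include-path-only interface target
```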
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109843
Approved by: https://github.com/peterbell10 , https://github.com/eqy
Co-authored-by: Ivan Zaitsev <108101595+izaitsevfb@users.noreply.github.com>
2024-08-20 16:33:26 +00:00
Mikayla Gawarecki
018e48c337
[Reland] Add wrappers for synchronous GPUDirect Storage APIs ( #133489 )
...
Reland of #130633.
`USE_CUFILE` is turned off by default in this version.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/133489
Approved by: https://github.com/albanD
2024-08-15 17:11:52 +00:00
PyTorch MergeBot
fa1d7b0262
Revert "Remove unused Caffe2 macros ( #132979 )"
...
This reverts commit da65cfbdea .
Reverted https://github.com/pytorch/pytorch/pull/132979 on behalf of https://github.com/ezyang due to these are apparently load bearing internally ([comment](https://github.com/pytorch/pytorch/pull/132979#issuecomment-2284666332 ))
2024-08-12 18:34:56 +00:00
cyy
da65cfbdea
Remove unused Caffe2 macros ( #132979 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/132979
Approved by: https://github.com/ezyang
2024-08-09 04:48:20 +00:00
cyy
05e8e87a69
[Submodule] Remove foxi ( #132976 )
...
It is no longer used after the removal of the Caffe2 code.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/132976
Approved by: https://github.com/ezyang
2024-08-09 03:46:52 +00:00
Chen, Zejun
26b0011fb8
[XPU][Kineto Submodule] Introduce kineto-based XPU profiler ( #130811 )
...
As XPU has become a PyTorch built-in device, profiler support is an indispensable part of functionality completeness. This PR is associated with the PR that introduces the XPU profiler plugin into Kineto. When USE_XPU is enabled, the LIBKINETO_NOXPUPTI option is suppressed accordingly, which allows Kineto to build with the XPU profiler plugin.
Associated PR to introduce kineto-based XPU profiler into kineto:
https://github.com/pytorch/kineto/pull/961
Also updates the Kineto Submodule to include XPU changes.
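For illustration, a hedged sketch of how the option suppression could look (the exact wiring in PyTorch's build files may differ; only the USE_XPU and LIBKINETO_NOXPUPTI names come from the description above):
```cmake
# Hedged sketch: let Kineto build its XPU PTI plugin only when XPU is enabled.
if(USE_XPU)
  set(LIBKINETO_NOXPUPTI OFF CACHE BOOL "" FORCE)
else()
  set(LIBKINETO_NOXPUPTI ON CACHE BOOL "" FORCE)
endif()
```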
Co-authored-by: Aaron Enye Shi <enye.shi@gmail.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/130811
Approved by: https://github.com/aaronenyeshi
2024-08-07 18:41:37 +00:00
Yu, Guangye
92bebb46fa
Support XPU ABI=0 build ( #130110 )
...
# Motivation
This PR intends to support ABI=0 build for XPU backend.
# Additional Context
The major change is adding the compilation option `-D__INTEL_PREVIEW_BREAKING_CHANGES` for the host compiler (gcc) and `-fpreview-breaking-changes` for the XPU device kernel compiler (icpx). Why? Because we use:
- gcc to compile host code and link the SYCL runtime, so we need to pass `-D__INTEL_PREVIEW_BREAKING_CHANGES` to tell the host compiler to invoke the ABI-neutral API included in SYCL; and
- icpx to compile device kernel code and link the SYCL runtime, so we need to pass `-fpreview-breaking-changes` to tell the device kernel compiler to build ABI-neutral code.
- Besides, `libsycl-preview.so` is an ABI-neutral library but `libsycl.so` is not. (See the sketch below.)
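For illustration only, a minimal CMake sketch of how the two switches could be wired up under an ABI=0 build (the `BUILD_XPU_ABI_NEUTRAL` option and `SYCL_FLAGS` variable are assumptions, not the actual PyTorch names):
```cmake
# Hedged sketch: pass the ABI-neutral switches to the host (gcc) and device (icpx) compilers.
if(USE_XPU AND BUILD_XPU_ABI_NEUTRAL)
  string(APPEND CMAKE_CXX_FLAGS " -D__INTEL_PREVIEW_BREAKING_CHANGES")  # host compiler
  string(APPEND SYCL_FLAGS " -fpreview-breaking-changes")               # device kernel compiler
endif()
```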
This PR depends on https://github.com/pytorch/pytorch/pull/131643 .
Pull Request resolved: https://github.com/pytorch/pytorch/pull/130110
Approved by: https://github.com/EikanWang , https://github.com/gujinghui , https://github.com/albanD
2024-08-01 21:42:14 +00:00
PyTorch MergeBot
e191b83462
Revert "Add wrappers for synchronous GPUDirect Storage APIs ( #130633 )"
...
This reverts commit 709ddf7a9d .
Reverted https://github.com/pytorch/pytorch/pull/130633 on behalf of https://github.com/clee2000 due to still failing internally D60265673 ([comment](https://github.com/pytorch/pytorch/pull/130633#issuecomment-2253239607 ))
2024-07-26 18:08:20 +00:00
Mikayla Gawarecki
709ddf7a9d
Add wrappers for synchronous GPUDirect Storage APIs ( #130633 )
...
Based in part on https://github.com/NVIDIA/apex/pull/1774
Differential Revision: [D60155434](https://our.internmc.facebook.com/intern/diff/D60155434 )
Pull Request resolved: https://github.com/pytorch/pytorch/pull/130633
Approved by: https://github.com/albanD
2024-07-25 22:23:38 +00:00
cyy
803c5b8640
[CMake] Fix private compile options for CUDA code ( #130546 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/130546
Approved by: https://github.com/ezyang
2024-07-25 00:22:18 +00:00
PyTorch MergeBot
e4b5645f83
Revert "Add wrappers for synchronous GPUDirect Storage APIs ( #130633 )"
...
This reverts commit 5b5e0698a5 .
Reverted https://github.com/pytorch/pytorch/pull/130633 on behalf of https://github.com/clee2000 due to breaking a lot of jobs and build rules internally D60085885, possibly needs to update some bazel build? ([comment](https://github.com/pytorch/pytorch/pull/130633#issuecomment-2245806738 ))
2024-07-23 17:19:34 +00:00
Mikayla Gawarecki
5b5e0698a5
Add wrappers for synchronous GPUDirect Storage APIs ( #130633 )
...
Based in part on https://github.com/NVIDIA/apex/pull/1774
Pull Request resolved: https://github.com/pytorch/pytorch/pull/130633
Approved by: https://github.com/albanD
2024-07-22 14:51:24 +00:00
Xu Han
f1456c74a0
Fix mkl-static issue for Windows. ( #130697 )
...
Background:
We found a PyTorch Windows release/2.4 performance regression: https://github.com/pytorch/pytorch/issues/130619
After some debugging, I found that the PyTorch Windows static mkl build options are wrong (screenshot: https://github.com/user-attachments/assets/38692142-bfca-4c98-8092-6e105c82bb13):
1. The thread lib is wrong.
2. The `openmp` lib and config are missing.
> Debug history: https://github.com/pytorch/pytorch/issues/130619#issuecomment-2226782504 and https://github.com/pytorch/pytorch/issues/130619#issuecomment-2226418611
This PR fixes the `mkl-static` build options issue (screenshot: https://github.com/user-attachments/assets/834f6cee-7e6d-4d74-b2bc-8a270f05e429).
Reference (screenshot: https://github.com/user-attachments/assets/8184dadb-f230-4062-a49f-51df1d7285f5):
https://www.intel.com/content/www/us/en/developer/tools/oneapi/onemkl-link-line-advisor.html#gs.c6izlg
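For reference, a hedged sketch of what the corrected static link line roughly corresponds to, per the Link Line Advisor (library names depend on the oneMKL version, and the `torch_cpu` target name is used only as an example):
```cmake
# Hedged sketch: static MKL with the Intel threading layer plus the OpenMP runtime.
set(MKL_STATIC_LIBS mkl_intel_lp64.lib mkl_intel_thread.lib mkl_core.lib)
set(MKL_OPENMP_LIB libiomp5md.lib)  # the OpenMP runtime that was previously missing
target_link_libraries(torch_cpu PRIVATE ${MKL_STATIC_LIBS} ${MKL_OPENMP_LIB})
```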
Pull Request resolved: https://github.com/pytorch/pytorch/pull/130697
Approved by: https://github.com/jgong5 , https://github.com/atalman
2024-07-15 19:28:11 +00:00
Nikita Shulga
c547b2e871
Fix python detection in cuda.cmake ( #130651 )
...
If the Python package has not been detected previously, detect it here.
This fixes a regression introduced by https://github.com/pytorch/pytorch/pull/128801 that results in the annoying but harmless warning reported in https://github.com/pytorch/pytorch/issues/129777
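For illustration, a hedged sketch of the guard (the `Python_Interpreter_FOUND` variable comes from CMake's FindPython module; the exact components requested in cuda.cmake may differ):
```cmake
# Hedged sketch: only run Python detection if it was not already found earlier.
if(NOT Python_Interpreter_FOUND)
  find_package(Python COMPONENTS Interpreter REQUIRED)
endif()
```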
Pull Request resolved: https://github.com/pytorch/pytorch/pull/130651
Approved by: https://github.com/Skylion007
2024-07-15 03:45:31 +00:00
cyy
c5b66c3fe1
Enable -Werror=pedantic on torch targets ( #130319 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/130319
Approved by: https://github.com/ezyang
2024-07-11 12:27:32 +00:00
cyy
85b8503621
[Caffe2] Remove Caffe2 documentation ( #130089 )
...
Due to the removal of Caffe2 code.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/130089
Approved by: https://github.com/r-barnes , https://github.com/albanD
2024-07-10 00:52:16 +00:00
cyy
a6345d3477
[CMake] [3/N] Remove unused code ( #130322 )
...
Some functions used by Caffe2 were removed along with some outdated checks. Follows #130006 .
Pull Request resolved: https://github.com/pytorch/pytorch/pull/130322
Approved by: https://github.com/r-barnes
2024-07-09 19:33:33 +00:00
Yichen Yan
953c6476bd
[CMAKE] Look for Development.Module instead of Development ( #129669 )
...
Based on the [cmake issue](https://gitlab.kitware.com/cmake/cmake/-/issues/23716 ) and [manylinux issue](https://github.com/pypa/manylinux/issues/1347 ): when building a Python module, CMake should look for the `Development.Module` component, not `Development`, which includes both `Development.Module` and `Development.Embed` and therefore expects the shared Python library. After this PR, and before #124613 , PyTorch could be built with a static libpython (e.g. in manylinux).
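For illustration, a hedged sketch of the resulting find_package call (component names are standard CMake FindPython components):
```cmake
# Hedged sketch: Development.Module is enough for building an extension module and
# does not require the shared libpython that Development.Embed would demand.
find_package(Python COMPONENTS Interpreter Development.Module REQUIRED)
```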
Pull Request resolved: https://github.com/pytorch/pytorch/pull/129669
Approved by: https://github.com/malfet
2024-07-09 09:16:43 +00:00
cyy
2f219f7d79
Enforce unused-{variable/function} checks to all torch targets ( #130189 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/130189
Approved by: https://github.com/ezyang
2024-07-06 16:03:01 +00:00
cyy
e5841bb8d5
[3/N] Enforce unused-function and unused-variable checks ( #130084 )
...
Follows #129878 .
Pull Request resolved: https://github.com/pytorch/pytorch/pull/130084
Approved by: https://github.com/ezyang
2024-07-05 23:56:00 +00:00
cyy
99ec7bbee7
Force inconsistent-missing-override for torch targets ( #130010 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/130010
Approved by: https://github.com/ezyang
2024-07-04 02:37:57 +00:00
Shivam Raikundalia
a21d4363d2
[Profiler] Remove all instances of TMP_USE_TSC_AS_TIMESTAMP ( #129973 )
...
Summary: Now that D56584521 is in, we can remove all instances of TMP_USE_TSC_AS_TIMESTAMP.
Test Plan:
Ran resnet. Trace looks good
https://www.internalfb.com/intern/perfdoctor/trace_view?filepath=tree/traces/dynocli/devvm2185.cco0.facebook.com/rank-0.Jun_27_14_46_01.1967733.pt.trace.json.gz&bucket=gpu_traces
Reviewed By: aaronenyeshi, swolchok
Differential Revision: D59132793
Pull Request resolved: https://github.com/pytorch/pytorch/pull/129973
Approved by: https://github.com/aaronenyeshi
2024-07-03 19:28:52 +00:00
cyy
46366888d7
Remove outdated CMake code ( #129851 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/129851
Approved by: https://github.com/ezyang
2024-07-02 00:40:37 +00:00
Jithun Nair
87693b534c
[ROCm] Use AOTriton as a dynamic library ( #129094 )
...
This PR enables using AOTriton as a shared library dependency instead of a static one.
This resolves the linker errors hit when building PyTorch for many (>7 or so) gfx archs, which were caused by the huge size of the AOTriton static library.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/129094
Approved by: https://github.com/malfet
2024-07-01 21:39:27 +00:00
cyy
ca5d13c672
[1/N] Enable unused variable warnings on torch_cpu and fix some violations ( #128670 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/128670
Approved by: https://github.com/ezyang
2024-07-01 14:56:46 +00:00
Nikita Shulga
fe4032fe20
[BE][CMake] Do not use EXEC_PROGRAM ( #129714 )
...
It has been deprecated since CMake 3.0 in favor of `execute_process`; see https://cmake.org/cmake/help/v3.18/command/exec_program.html
This makes the following warning disappear:
```
CMake Warning (dev) at cmake/Modules/FindARM.cmake:5 (EXEC_PROGRAM):
Policy CMP0153 is not set: The exec_program command should not be called.
Run "cmake --help-policy CMP0153" for policy details. Use the cmake_policy
command to set the policy and suppress this warning.
Use execute_process() instead.
```
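For illustration, a hedged sketch of the replacement pattern (the command shown is illustrative, not the exact FindARM.cmake invocation):
```cmake
# Hedged sketch: execute_process() instead of the deprecated EXEC_PROGRAM().
execute_process(
  COMMAND cat /proc/cpuinfo
  OUTPUT_VARIABLE CPUINFO
  OUTPUT_STRIP_TRAILING_WHITESPACE)
```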
Pull Request resolved: https://github.com/pytorch/pytorch/pull/129714
Approved by: https://github.com/kit1980
2024-06-28 13:29:52 +00:00
Nikita Shulga
4b598d87d3
Fix FindBLAS.cmake ( #129713 )
...
Fixes a regression introduced by https://github.com/pytorch/pytorch/pull/125227 by adding `INCLUDE(CheckFunctionExists)`, which resolves
```
CMake Error at cmake/Modules/FindBLAS.cmake:413 (check_function_exists):
Unknown CMake command "check_function_exists".
```
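For illustration, a hedged sketch of why the include is needed (the checked symbol is an example, not the exact FindBLAS.cmake code):
```cmake
# Hedged sketch: the module must be included before check_function_exists() is usable.
include(CheckFunctionExists)
check_function_exists(sgemm_ BLAS_SGEMM_FOUND)
```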
Fixes https://github.com/pytorch/pytorch/issues/129693
Pull Request resolved: https://github.com/pytorch/pytorch/pull/129713
Approved by: https://github.com/kit1980
2024-06-28 02:15:16 +00:00
Shivam Raikundalia
1d0efedc85
[Profiler] Add TSC Clock Callback to CUPTI ( #125036 )
...
Summary:
Right now we use the default clock for CUPTI, which is neither monotonic nor particularly fast. We have already added the Kineto side of the implementation here: https://www.internalfb.com/diff/D56525885
This diff only adds the compile flags so that the TSC format is used, and sets the converter using a libkineto call in the profiler.
Test Plan:
Obtained following trace using resnet test:
https://www.internalfb.com/intern/perfdoctor/trace_view?filepath=tree/traces/dynocli/devvm2185.cco0.facebook.com/rank-0.Apr_25_11_03_18.3862943.pt.trace.json.gz&bucket=gpu_traces
TBD: Add benchmarks
Differential Revision: D56584521
Pull Request resolved: https://github.com/pytorch/pytorch/pull/125036
Approved by: https://github.com/aaronenyeshi
2024-06-27 21:07:43 +00:00
Chirag Pandya
64f1111d38
Expose nlohmann json to torch ( #129570 )
...
Summary:
Expose the nlohmann json library so that it can be used from inside PyTorch. The library already exists in the `third_party` directory. This PR makes the `nlohmann/json.hpp` header available to be used from `torch.distributed`.
The next PR makes actual use of this header.
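For illustration, a hedged sketch of exposing the vendored header (the include path and target name are assumptions, not the PR's actual changes):
```cmake
# Hedged sketch: make <nlohmann/json.hpp> visible to code under torch.distributed.
target_include_directories(torch_cpu PRIVATE
  ${PROJECT_SOURCE_DIR}/third_party/nlohmann/include)
```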
imported-using-ghimport
Test Plan: Imported from OSS
Reviewed By: malfet
Differential Revision: D59035246
Pulled By: c-p-i-o
Pull Request resolved: https://github.com/pytorch/pytorch/pull/129570
Approved by: https://github.com/d4l3k , https://github.com/malfet
2024-06-26 21:59:26 +00:00
vinithakv
f8db12a538
Fix logic to find sbgemm in BLAS library ( #125227 )
...
The current logic to set the HAS_SBGEMM flag is skipped when the BLAS libraries have already been found, i.e., when set from the environment variable BLAS=OpenBLAS. If BLAS_LIBRARIES is already set, the code that checks whether the BLAS library provides sbgemm is never executed. This commit moves that logic outside the conditional so it always runs.
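For illustration, a hedged sketch of the unconditional probe (the symbol and variable names follow the description above and may not match the actual FindBLAS.cmake code):
```cmake
# Hedged sketch: probe for sbgemm even when BLAS_LIBRARIES was preset via BLAS=OpenBLAS.
include(CheckFunctionExists)
set(CMAKE_REQUIRED_LIBRARIES ${BLAS_LIBRARIES})
check_function_exists(sbgemm_ BLAS_HAS_SBGEMM)
unset(CMAKE_REQUIRED_LIBRARIES)
```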
Pull Request resolved: https://github.com/pytorch/pytorch/pull/125227
Approved by: https://github.com/malfet
2024-06-25 16:34:38 +00:00
drisspg
cb1c56caba
Set target dependencies to always build for sm90a on rowwise scaling ( #129402 )
...
# Summary
Instead of landing global builder changes: https://github.com/pytorch/builder/pull/1878
This PR targets only the Rowwise file and adds the sm90a features.
Verified locally by setting:
```
TORCH_CUDA_ARCH_LIST=9.0
```
We can see in the build.ninja file that the proper flags are set:
```
build caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/RowwiseScaledMM.cu.o: CUDA_COMPILER__torch_cuda_unscanned_Release /home/drisspg/meta/pytorch/aten/src/ATen/native/cuda/RowwiseScaledMM.cu || cmake_object_order_depends_target_torch_cuda
DEFINES = -DAT_PER_OPERATOR_HEADERS -DFLASHATTENTION_DISABLE_ALIBI -DHAVE_MALLOC_USABLE_SIZE=1 -DHAVE_MMAP=1 -DHAVE_SHM_OPEN=1 -DHAVE_SHM_UNLINK=1 -DMINIZ_DISABLE_ZIP_READER_CRC32_CHECKS -DONNXIFI_ENABLE_EXT=1 -DONNX_ML=1 -DONNX_NAMESPACE=onnx_torch -DTORCH_CUDA_BUILD_MAIN_LIB -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_CUDA -DUSE_DISTRIBUTED -DUSE_EXTERNAL_MZCRC -DUSE_FLASH_ATTENTION -DUSE_MEM_EFF_ATTENTION -DUSE_NCCL -DUSE_RPC -DUSE_TENSORPIPE -D_FILE_OFFSET_BITS=64 -Dtorch_cuda_EXPORTS
DEP_FILE = caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/RowwiseScaledMM.cu.o.d
FLAGS = -DLIBCUDACXX_ENABLE_SIMPLIFIED_COMPLEX_OPERATIONS -D_GLIBCXX_USE_CXX11_ABI=1 -Xfatbin -compress-all -DONNX_NAMESPACE=onnx_torch -gencode arch=compute_90,code=sm_90 -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -Wno-deprecated-gpu-targets --expt-extended-lambda -DCUB_WRAPPED_NAMESPACE=at_cuda_detail -DCUDA_HAS_FP16=1 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -O3 -DNDEBUG -std=c++17 -Xcompiler=-fPIC -DTORCH_USE_LIBUV -DCAFFE2_USE_GLOO -Xcompiler=-Wall,-Wextra,-Wdeprecated,-Wno-unused-parameter,-Wno-missing-field-initializers,-Wno-unknown-pragmas,-Wno-type-limits,-Wno-array-bounds,-Wno-unknown-pragmas,-Wno-strict-overflow,-Wno-strict-aliasing,-Wno-unused-function,-Wno-maybe-uninitialized -Wno-deprecated-copy -gencode arch=compute_90a,code=sm_90a
INCLUDES = -I/home/drisspg/meta/pytorch/build/aten/src -I/home/drisspg/meta/pytorch/aten/src -I/home/drisspg/meta/pytorch/build -I/home/drisspg/meta/pytorch -I/home/drisspg/meta/pytorch/third_party/onnx -I/home/drisspg/meta/pytorch/build/third_party/onnx -I/home/drisspg/meta/pytorch/third_party/foxi -I/home/drisspg/meta/pytorch/build/third_party/foxi -I/home/drisspg/meta/pytorch/aten/src/THC -I/home/drisspg/meta/pytorch/aten/src/ATen/cuda -I/home/drisspg/meta/pytorch/aten/src/ATen/../../../third_party/cutlass/include -I/home/drisspg/meta/pytorch/aten/src/ATen/../../../third_party/cutlass/tools/util/include -I/home/drisspg/meta/pytorch/build/caffe2/aten/src -I/home/drisspg/meta/pytorch/aten/src/ATen/.. -I/home/drisspg/meta/pytorch/build/nccl/include -I/home/drisspg/meta/pytorch/c10/cuda/../.. -I/home/drisspg/meta/pytorch/c10/.. -I/home/drisspg/meta/pytorch/third_party/tensorpipe -I/home/drisspg/meta/pytorch/build/third_party/tensorpipe -I/home/drisspg/meta/pytorch/third_party/tensorpipe/third_party/libnop/include -I/home/drisspg/meta/pytorch/torch/csrc/api -I/home/drisspg/meta/pytorch/torch/csrc/api/include -isystem /home/drisspg/meta/pytorch/build/third_party/gloo -isystem /home/drisspg/meta/pytorch/cmake/../third_party/gloo -isystem /home/drisspg/meta/pytorch/cmake/../third_party/tensorpipe/third_party/libuv/include -isystem /home/drisspg/meta/pytorch/third_party/protobuf/src -isystem /home/drisspg/meta/pytorch/third_party/ittapi/include -isystem /home/drisspg/meta/pytorch/cmake/../third_party/eigen -isystem /usr/local/cuda-12.3/include -isystem /home/drisspg/meta/pytorch/third_party/ideep/mkl-dnn/include/oneapi/dnnl -isystem /home/drisspg/meta/pytorch/third_party/ideep/include -isystem /home/drisspg/meta/pytorch/cmake/../third_party/cudnn_frontend/include
OBJECT_DIR = caffe2/CMakeFiles/torch_cuda.dir
OBJECT_FILE_DIR = caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda
```
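For illustration, a hedged sketch of scoping the extra gencode to the single source file (the property-based approach is an assumption about how such a per-file flag can be attached; the file path is taken from the ninja output above):
```cmake
# Hedged sketch: add sm_90a only for the rowwise-scaling kernel, not globally.
set_source_files_properties(
  aten/src/ATen/native/cuda/RowwiseScaledMM.cu
  PROPERTIES COMPILE_FLAGS "-gencode arch=compute_90a,code=sm_90a")
```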
Pull Request resolved: https://github.com/pytorch/pytorch/pull/129402
Approved by: https://github.com/malfet
2024-06-25 13:54:51 +00:00
cyy
479ce5e2f4
Remove outdated CUDA code from CMake ( #128801 )
...
It's possible to simplify some CUDA handling logic in CMake.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/128801
Approved by: https://github.com/r-barnes , https://github.com/malfet
2024-06-21 15:00:00 +00:00
cyy
9ebec1f345
Enable Wunused-function in torch_cpu ( #128576 )
...
Follows #128499
Pull Request resolved: https://github.com/pytorch/pytorch/pull/128576
Approved by: https://github.com/ezyang , https://github.com/r-barnes
2024-06-14 00:12:58 +00:00
PyTorch MergeBot
75b0720a97
Revert "Use hidden visibility in OBJECTCXX files ( #127265 )"
...
This reverts commit 669560d51a .
Reverted https://github.com/pytorch/pytorch/pull/127265 on behalf of https://github.com/huydhn due to Sorry for reverting your change, but I suspect that it causes this failure https://github.com/pytorch/vision/issues/8478 on vision where its C++ extension could not be loaded on macOS ([comment](https://github.com/pytorch/pytorch/pull/127265#issuecomment-2156401838 ))
2024-06-09 09:05:17 +00:00
Xinya Zhang
d34075e0bd
Add Efficient Attention support on ROCM ( #124885 )
...
This patch implements `with sdpa_kernel(SDPBackend.EFFICIENT_ATTENTION):` by reusing AOTriton's accelerated SDPA implementation
Known limitations:
- Only supports MI200/MI300X GPUs
- Does not support varlen
- Does not support `CausalVariant`
- Optional arguments `causal_diagonal` and `seqlen_k` in `_efficient_attention_forward/backward` must be null
- Does not work well with inductor's SDPA rewriter. The rewriter has been updated to only use math and flash attention on ROCM.
This PR also uses a different approach of installing AOTriton binary instead of building it from source in the base docker image. More details on motivation: https://github.com/pytorch/pytorch/pull/124885#issuecomment-2153229129
`PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TESTING_DEVICE_ONLY_FOR="cuda" python test/test_transformers.py` yields "55028 passed, 20784 skipped" results with this change. [Previous result](https://hud.pytorch.org/pr/127528 ) of `test_transformers.py` was 0 error, 0 failure, 55229 skipped out of 75517 tests in total (the XML report does not contain total number of passed tests).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/124885
Approved by: https://github.com/malfet
2024-06-08 22:41:05 +00:00
sdp
b4a0161449
Build SYCL kernels for ATen XPU ops on Native Windows (take 2) ( #127390 )
...
Original PR https://github.com/pytorch/pytorch/pull/126725 was closed due to a bad rebase.
-------
As proposed in https://github.com/pytorch/pytorch/issues/126719 , we are enabling PyTorch XPU on Native Windows on Intel GPU.
This PR enables XPU build on Windows as the first step of #126719 :
- Enable `USE_XPU` build on Windows using MSVC as host compiler. The use of MSVC as host compiler seamlessly aligns with the existing PyTorch build on Windows.
- Build oneDNN GPU library on Windows.
Co-authored-by: Yu, Guangye <guangye.yu@intel.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/127390
Approved by: https://github.com/guangyey , https://github.com/EikanWang , https://github.com/gujinghui , https://github.com/ezyang
2024-06-06 01:41:06 +00:00
cyy
3d617333e7
Simplify CMake code ( #127683 )
...
Due to the recent adoption of find(python), it is possible to further simplify some CMake code.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/127683
Approved by: https://github.com/ezyang
2024-06-05 15:17:31 +00:00
cyy
df75a9dc80
Remove Caffe2/onnx ( #127991 )
...
Remove Caffe2/onnx since it is not used. Other tiny fixes are also applied.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/127991
Approved by: https://github.com/ezyang
2024-06-05 15:10:12 +00:00
Ting Lu
1b704a160f
Add linker script optimization flag to CMAKE rule for CUDA ARM wheel ( #127514 )
...
Original PR - https://github.com/pytorch/pytorch/pull/127220
Pull Request resolved: https://github.com/pytorch/pytorch/pull/127514
Approved by: https://github.com/Aidyn-A , https://github.com/atalman
2024-06-04 20:51:44 +00:00
Tristan Rice
597922ba21
Reapply "distributed debug handlers ( #126601 )" ( #127805 )
...
This reverts commit 7646825c3e .
Pull Request resolved: https://github.com/pytorch/pytorch/pull/127805
Approved by: https://github.com/PaliC
2024-06-04 19:44:30 +00:00