Commit Graph

1146 Commits

Jeff Daily
28c0b07d19 [ROCm] remove HCC references (#111975)
- rename `__HIP_PLATFORM_HCC__` to `__HIP_PLATFORM_AMD__`
- rename `HIP_HCC_FLAGS` to `HIP_CLANG_FLAGS`
- rename `PYTORCH_HIP_HCC_LIBRARIES` to `PYTORCH_HIP_LIBRARIES`
- workaround in tools/amd_build/build_amd.py until submodules are updated

These symbols have had a long deprecation cycle and will finally be removed in ROCm 6.0.
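The renames above amount to a textual substitution pass over build files. As a minimal illustrative sketch (not the actual `tools/amd_build/build_amd.py` code; the helper name is made up):

```python
# Hypothetical sketch of a rename pass over build/source files.
# The mapping itself comes from this commit; the function is illustrative.
RENAMES = {
    "__HIP_PLATFORM_HCC__": "__HIP_PLATFORM_AMD__",
    "HIP_HCC_FLAGS": "HIP_CLANG_FLAGS",
    "PYTORCH_HIP_HCC_LIBRARIES": "PYTORCH_HIP_LIBRARIES",
}

def apply_renames(text: str) -> str:
    # Replace longer names first so a shorter key never clobbers a longer match.
    for old in sorted(RENAMES, key=len, reverse=True):
        text = text.replace(old, RENAMES[old])
    return text
```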

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111975
Approved by: https://github.com/ezyang, https://github.com/hongxiayang
2023-10-26 02:39:10 +00:00
Nikita Shulga
6dc54fe8d6 [BE] Compile FBGEMM with ASAN (#111266)
If `USE_ASAN` is set, compile FBGEMM with ASAN as well, by setting `USE_SANITIZER` to `address,undefined`

This fixes a regression in sanitizer coverage introduced by https://github.com/pytorch/pytorch/pull/93147, which changed the effect of the sanitizer from the entire project to just the torch libraries, and finally allows one to reliably catch the regression reported in https://github.com/pytorch/pytorch/issues/111189

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111266
Approved by: https://github.com/huydhn
2023-10-14 20:35:04 +00:00
PyTorch MergeBot
f68d6e8108 Revert "Move at::{Refcounted,}MapAllocator to c10 (#109881)"
This reverts commit 68a1219f74.

Reverted https://github.com/pytorch/pytorch/pull/109881 on behalf of https://github.com/kit1980 due to breaking internal builds, undefined symbol: _ZN3c1022RefcountedMapAllocator6decrefEv ([comment](https://github.com/pytorch/pytorch/pull/109881#issuecomment-1761950014))
2023-10-13 17:57:53 +00:00
Peter Bell
68a1219f74 Move at::{Refcounted,}MapAllocator to c10 (#109881)
`libshm.so` depends on the torch library exclusively for `at::RefcountedMapAllocator`,
 so it makes sense to move it to c10 along with the other memory allocators.

This means `libshm.so` only depends on `c10` and we don't need to relink
`libshm.so` for every ATen change.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109881
Approved by: https://github.com/albanD
2023-10-12 10:51:13 +00:00
cyy
a6b452dfdc [2/N] Enable Wunused-result, Wunused-variable and Wmissing-braces in torch targets (#110836)
This PR enables Wunused-result, Wunused-variable and Wmissing-braces, now that our code base is clean of such warnings.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110836
Approved by: https://github.com/Skylion007
2023-10-11 23:49:15 +00:00
PyTorch MergeBot
02a02a23ee Revert "Move at::{Refcounted,}MapAllocator to c10 (#109881)"
This reverts commit 0341deb1c7.

Reverted https://github.com/pytorch/pytorch/pull/109881 on behalf of https://github.com/albanD due to It does break buck build ([comment](https://github.com/pytorch/pytorch/pull/109881#issuecomment-1756195823))
2023-10-10 20:39:12 +00:00
Anthony Alayo
31611b40b9 cmake: allow to build pytorch as a CMake subproject (#110373)
This is a re-attempt of fixing https://github.com/pytorch/pytorch/issues/53980, first submitted in https://github.com/pytorch/pytorch/pull/54978.

Quoting @SpaceIm
```
Fixes https://github.com/pytorch/pytorch/issues/53980

Maybe it would be nice to find why some files are generated in CMAKE_BINARY_DIR instead of CMAKE_CURRENT_BINARY_DIR or Torch_BINARY_DIR or PROJECT_BINARY_DIR, but there is a lot of indirection in the logic of pytorch build files, so I was not able to find where it comes from.
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110373
Approved by: https://github.com/malfet
2023-10-10 17:47:35 +00:00
Peter Bell
0341deb1c7 Move at::{Refcounted,}MapAllocator to c10 (#109881)
`libshm.so` depends on the torch library exclusively for `at::RefcountedMapAllocator`,
 so it makes sense to move it to c10 along with the other memory allocators.

This means `libshm.so` only depends on `c10` and we don't need to relink
`libshm.so` for every ATen change.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109881
Approved by: https://github.com/albanD
2023-10-09 23:53:47 +00:00
cyy
3a70a02a81 Enable Wrange-loop-analysis (#110837)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110837
Approved by: https://github.com/Skylion007
2023-10-09 11:19:03 +00:00
cyy
c3e4e4f6d2 [4/N] Add -Wdeprecated and related fixes (#110204)
This PR enables Wdeprecated on torch_cpu

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110204
Approved by: https://github.com/ezyang
2023-10-07 19:46:08 +00:00
Dmytro Dzhulgakov
a0cea517e7 Add 9.0a to cpp_extension supported compute archs (#110587)
There's an extended compute capability 9.0a for Hopper that was introduced in CUDA 12.0: https://docs.nvidia.com/cuda/archive/12.0.0/cuda-compiler-driver-nvcc/index.html#gpu-feature-list

E.g. Cutlass leverages it: 5f13dcad78/python/cutlass/emit/pytorch.py (L684)

This adds it to the list of permitted architectures to use in `cpp_extension` directly.
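As an illustration of what accepting `9.0a` entails, here is a hypothetical sketch of arch-list validation and gencode mapping (names and the arch list are made up for illustration; this is not the actual `torch.utils.cpp_extension` code):

```python
# Hypothetical: a permitted-arch list that now includes the extended "9.0a".
SUPPORTED_ARCHES = ["7.0", "7.5", "8.0", "8.6", "9.0", "9.0a"]

def arch_to_gencode(arch: str) -> str:
    # "9.0a" -> "-gencode=arch=compute_90a,code=sm_90a"
    num = arch.replace(".", "")
    return f"-gencode=arch=compute_{num},code=sm_{num}"

def validate(arch_list):
    # Reject anything outside the permitted list, then emit nvcc gencode flags.
    for arch in arch_list:
        if arch not in SUPPORTED_ARCHES:
            raise ValueError(f"Unknown CUDA arch: {arch}")
    return [arch_to_gencode(a) for a in arch_list]
```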
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110587
Approved by: https://github.com/ezyang
2023-10-05 17:41:06 +00:00
Alin Pahontu
21d77bcf80 added path to correct directory containing headers (#110063)
After `make install`, the headers are placed in the `include/openblas/` folder instead of the `include/` folder. Updated `FindOpenBLAS.cmake` to reflect that change.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110063
Approved by: https://github.com/Blackhex, https://github.com/kit1980
2023-10-04 21:56:36 +00:00
Huy Do
f7909cb947 Build and test iOS on GitHub M1 runners (#110406)
They are here https://github.blog/2023-10-02-introducing-the-new-apple-silicon-powered-m1-macos-larger-runner-for-github-actions

I have been able to run iOS simulator tests on my M1 laptop without issues.  Some numbers:

* iOS build takes ~1h with x86 runners
* The new M1 runners take ~20m https://github.com/pytorch/pytorch/actions/runs/6386171957

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110406
Approved by: https://github.com/malfet, https://github.com/seemethere
2023-10-03 03:17:10 +00:00
cyy
ef5ff79019 [2/N] Clean up CMake target linking (#109986)
This PR cleans up more CMake target linking.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109986
Approved by: https://github.com/malfet
2023-10-01 05:36:08 +00:00
Aleksei Nikiforov
e05eb69c93 Don't link to libcpuinfo on s390x (#109875)
Don't even build it.
It does not support s390x.

This is a follow up for https://github.com/pytorch/pytorch/pull/109496

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109875
Approved by: https://github.com/kit1980
2023-09-26 12:43:35 +00:00
cyy
265acd4bea Clean up CMake target linking (#109959)
This PR cleans up more CMake target linking.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109959
Approved by: https://github.com/ezyang
2023-09-25 01:37:14 +00:00
cyy
ba0362a09e Remove unused build system checks and definitions (#109711)
Remove some outdated checks.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109711
Approved by: https://github.com/ezyang
2023-09-21 16:52:16 +00:00
cyy
92b0db2967 Don't find MKL if it isn't used (#109426)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109426
Approved by: https://github.com/Skylion007
2023-09-17 03:39:39 +00:00
cyy
9a492fc27f Fix unknown c++ flag detection in CMake (#109000)
Unknown `-Wno-XXX` flags are still appended for GCC by `append_cxx_flag_if_supported` because of the behavior described in the GCC documentation:
```
When an unrecognized warning option is requested (e.g., -Wunknown-warning),
GCC emits a diagnostic stating that the option is not recognized.
However, if the -Wno- form is used, the behavior is slightly different:
no diagnostic is produced for -Wno-unknown-warning unless other diagnostics are being produced.
This allows the use of new -Wno- options with old compilers,
but if something goes wrong, the compiler warns that an unrecognized option is present.
```
This PR fixes the issue by detecting the flag in its positive `-WXXX` form instead. Unfortunately, third_party/fbgemm/CMakeLists.txt redefines `append_cxx_flag_if_supported` and our version is overwritten. As a result, we have to re-include utils.cmake to overwrite it again.
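The detection idea can be sketched as follows (illustrative only, not the actual CMake logic): before probing compiler support, map a `-Wno-XXX` flag to its positive `-WXXX` form, since GCC silently accepts unknown `-Wno-*` flags but diagnoses unknown positive forms.

```python
def flag_to_probe(flag: str) -> str:
    """Map a -Wno-XXX flag to its positive -WXXX form for support detection.

    GCC silently accepts unknown -Wno-* flags, so probing the -Wno- form
    always appears to succeed; probing the positive form surfaces the
    "unrecognized warning option" diagnostic instead.
    """
    prefix = "-Wno-"
    if flag.startswith(prefix) and not flag.startswith("-Wno-error="):
        return "-W" + flag[len(prefix):]
    return flag
```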

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109000
Approved by: https://github.com/malfet
2023-09-11 08:32:07 +00:00
Huy Do
4a98c898e2 Refactor ios-build-test workflow to support binary release (#108322)
This refactors the logic from CircleCI iOS [build](https://github.com/pytorch/pytorch/blob/main/.circleci/config.yml#L1323-L1344) and [upload](https://github.com/pytorch/pytorch/blob/main/.circleci/config.yml#L1369-L1377) jobs to GHA.

* Nightly artifacts will be available again on `ossci-ios-build` S3 bucket, for example `libtorch_lite_ios_nightly_2.1.0.20230517.zip`.  The last one there was s3://ossci-ios-build/libtorch_lite_ios_nightly_2.1.0.20230517.zip from May 17th
  * [LibTorch-Lite-Nightly](https://github.com/CocoaPods/Specs/blob/master/Specs/c/3/1/LibTorch-Lite-Nightly/1.14.0.20221109/LibTorch-Lite-Nightly.podspec.json) on cocoapods
* Release artifacts will be on `ossci-ios` S3 bucket, for example `s3://ossci-ios/libtorch_lite_ios_1.13.0.zip` from Nov 3rd 2022
  * [LibTorch-Lite](https://github.com/CocoaPods/Specs/blob/master/Specs/c/c/3/LibTorch-Lite/1.13.0.1/LibTorch-Lite.podspec.json) on cocoapods
  * [LibTorch](https://github.com/CocoaPods/Specs/blob/master/Specs/1/3/c/LibTorch/1.13.0.1/LibTorch.podspec.json) on cocoapods

I will clean up Circle CI code in another PR.

### Testing

Generated new release artifacts for testing from the main branch. Simulator tests have all passed.

* With lite interpreter https://github.com/pytorch/pytorch/actions/runs/6093860118
  * https://ossci-ios.s3.amazonaws.com/libtorch_lite_ios_2.1.0.zip
  * https://ossci-ios.s3.amazonaws.com/LibTorch-Lite-2.1.0.podspec

* The LibTorch binary can be built without the lite interpreter https://github.com/pytorch/pytorch/actions/runs/6103616035 and uses TorchScript, but that path has long been dead from my understanding. The binary can still be built and tested, though.
  * https://ossci-ios.s3.amazonaws.com/libtorch_ios_2.1.0.zip
  * https://ossci-ios.s3.amazonaws.com/LibTorch-2.1.0.podspec

### Next step for release

* Once the PR is committed, I plan to use the workflow dispatch to build the binaries manually on the `release/2.1` branch. Once they look good, we can publish them on CocoaPods.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108322
Approved by: https://github.com/atalman
2023-09-10 19:08:15 +00:00
Andrei Gheorghe
2028987bf7 Fix finding Intel MKL on Windows, as well as LAPACK, cuDNN and cuSPARSELt (#108040)
Fixes #108039

Intel MKL is now found correctly:

-- MKL libraries: C:/Program Files (x86)/Intel/oneAPI/mkl/latest/lib/intel64/mkl_intel_lp64.lib;C:/Program Files (x86)/Intel/oneAPI/mkl/latest/lib/intel64/mkl_sequential.lib;C:/Program Files (x86)/Intel/oneAPI/mkl/latest/lib/intel64/mkl_core.lib
-- MKL include directory: C:/Program Files (x86)/Intel/oneAPI/mkl/latest/include

and LAPACK too (excerpt from build.ninja):

LINK_LIBRARIES = lib\c10.lib  lib\pthreadpool.lib  lib\cpuinfo.lib  lib\XNNPACK.lib  lib\fbgemm.lib  lib\libittnotify.lib  lib\gloo.lib  lib\foxi_loader.lib  lib\kineto.lib  "C:\Program Files (x86)\Intel\oneAPI\mkl\latest\lib\intel64\mkl_intel_lp64.lib"  "C:\Program Files (x86)\Intel\oneAPI\mkl\latest\lib\intel64\mkl_sequential.lib"  "C:\Program Files (x86)\Intel\oneAPI\mkl\latest\lib\intel64\mkl_core.lib"  "C:\Program Files (x86)\Intel\oneAPI\mkl\latest\lib\intel64\**mkl_lapack95_lp64.lib**"  "C:\Program Files (x86)\Intel\oneAPI\mkl\latest\lib\intel64\mkl_intel_lp64.lib"  "C:\Program Files (x86)\Intel\oneAPI\mkl\latest\lib\intel64\mkl_sequential.lib"  "C:\Program Files (x86)\Intel\oneAPI\mkl\latest\lib\intel64\mkl_core.lib"  "C:\Program Files (x86)\Intel\oneAPI\mkl\latest\lib\intel64\mkl_intel_lp64.lib"  "C:\Program Files (x86)\Intel\oneAPI\mkl\latest\lib\intel64\mkl_sequential.lib"  "C:\Program Files (x86)\Intel\oneAPI\mkl\latest\lib\intel64\mkl_core.lib"

cuSPARSELt is also found correctly:

-- Found CUSPARSELT: C:/Program Files/NVIDIA cuSPARSELt/v0.4/lib/cusparseLt.lib

Also cuDNN include directory is properly added for the test target cuda_cudnn_test:

build caffe2\CMakeFiles\cuda_cudnn_test.dir\__\aten\src\ATen\test\cuda_cudnn_test.cpp.obj: CXX_COMPILER__cuda_cudnn_test_RelWithDebInfo C$:\work\Repos\pytorch\aten\src\ATen\test\cuda_cudnn_test.cpp || cmake_object_order_depends_target_cuda_cudnn_test
  DEFINES = ....
  FLAGS = ....
  INCLUDES = -IC:\work\Repos\pytorch\build\aten\src -IC:\work\Repos\pytorch\aten\src ........... -external:IC:\work\Repos\pytorch\third_party\ittapi\include -external:IC:\work\Repos\pytorch\cmake\..\third_party\eigen -external:I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.2\include" -external:IC:\work\Repos\pytorch\torch\include -external:IC:\work\Repos\pytorch\third_party\ideep\include -external:IC:\work\Repos\pytorch\third_party\googletest\googletest\include -external:IC:\work\Repos\pytorch\third_party\googletest\googletest **-external:I"C:\Program Files\NVIDIA cuDNN\include"** -external:IC:\work\Repos\pytorch\cmake\..\third_party\cudnn_frontend\include -external:W0

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108040
Approved by: https://github.com/ezyang
2023-09-08 14:41:00 +00:00
cyy
0cc2f06aec [Reland] Improve MKL related logic in FindOpenMP.cmake (#104224)
Reland of PR #94924. The purpose of this PR is to deal with the complicated interactions between MKL and OpenMP.
There are two improvements:
1. It uses a flag to avoid infinite mutual recursion in calling find_package(MKL) and find_package(OpenMP) in some cases.
2. The logic of finding iomp5 is improved and now we can test  MKLDNN under ASAN.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104224
Approved by: https://github.com/malfet
2023-09-02 07:55:11 +00:00
cyy
d6a9c2b4b5 [BC BREAKING] Remove outdated python submodules (#108236)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108236
Approved by: https://github.com/malfet
2023-09-02 06:24:20 +00:00
drisspg
182a9cf366 Add Independent Memory Efficient and Flash Attention Build Flags (#107985)
# Summary
In an effort to simplify https://github.com/pytorch/pytorch/pull/105602, this PR pulls out independent chunks of code that can be landed prior to FlashV2 landing.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107985
Approved by: https://github.com/cpuhrsch
2023-08-28 18:39:18 +00:00
Xia, Weiwen
97a291f6bd [ONEDNN][BC-breaking] update onednn from v2.7.3 to v3.1.1 (#97957)
**Summary**
Update onednn from v2.7.3 to v3.1.1.
It is bc-breaking as some APIs are changed on oneDNN side. Changes include:
- PyTorch code where oneDNN is directly called
- Submodule `third_party/ideep` to adapt to oneDNN's new API.
- CMAKE files to fix build issues.

**Test plan**
Building issues and correctness are covered by CI checks.
For performance, we have run TorchBench models to ensure there is no regression. Below is the comparison before and after oneDNN update.
![image](https://github.com/pytorch/pytorch/assets/12522207/415a4ff0-7566-40c6-aed0-24997a475b0e)

Note:
- Base commit of PyTorch: da322ea
- CPU: Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz (Ice Lake)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/97957
Approved by: https://github.com/jgong5, https://github.com/jerryzh168
2023-08-25 12:13:18 +00:00
Aaron Gokaslan
93f2a64d4d Update submodule NCCL to v2.18.3 (#104993)
Update NCCL submodule to v2.18.3 which fixes numerous bugs and performance issues, particularly on newer GPUs: https://docs.nvidia.com/deeplearning/nccl/release-notes/rel_2-18-3.html#rel_2-18-3

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104993
Approved by: https://github.com/malfet
2023-08-18 23:43:01 +00:00
mingfeima
e10791c0bd enable mkl_gemm_bf16bf16f32 in cpublas::gemm (#107196)
This adds a wrapper around `mkl_gemm_bf16bf16f32`, which is used in the flash attention kernel on Intel 4th gen Xeon.
A fallback path has also been implemented in cpublas::gemm in case `mkl_gemm_bf16bf16f32` is not available.

The primary target of this change is to help build kernels in `scaled_dot_product_attention`, e.g. flash attention and efficient attention. In the attention kernel, `q @ k.T = attn`: q and k are given as bfloat16 while attn is float32. This is beneficial for both performance and accuracy, since attn is used to compute the lazy softmax, which has to be done in float32.

This patch also adds a routine from OpenBLAS, `sbgemm_`, which likewise has a signature of bf16 * bf16 -> fp32; but since the OpenBLAS routine has a different name from MKL's, we cannot use `sbgemm_` with MKL.

In the fallback path, the computation takes two steps: first do the gemm with beta = 0, then add beta * C in full precision. The idea, from @peterbell10, is not to truncate C to bfloat16, so as to avoid unnecessary accuracy loss.
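The two-step fallback can be modeled in plain Python (an illustrative sketch of the idea, not the actual C++ cpublas code): inputs are rounded to bfloat16 precision, accumulation happens in float32, and C is added in full precision only at the end.

```python
import struct

def to_bf16(x: float) -> float:
    """Round a float32 value to bfloat16 precision (by truncation, for simplicity)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFF0000))[0]

def gemm_bf16bf16f32_fallback(alpha, a, b, beta, c):
    """Sketch of the fallback: bf16 inputs, fp32 accumulation and output.

    Step 1: gemm with beta = 0 (A @ B accumulated in float32).
    Step 2: add beta * C in full precision, so C is never truncated to bf16.
    """
    m, k, n = len(a), len(b), len(b[0])
    out = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            acc = 0.0
            for p in range(k):
                acc += to_bf16(a[i][p]) * to_bf16(b[p][j])  # bf16 * bf16 -> fp32
            out[i][j] = alpha * acc + beta * c[i][j]        # C stays in fp32
    return out
```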

ref: https://www.intel.com/content/www/us/en/docs/onemkl/developer-reference-c/2023-0/cblas-gemm-bf16bf16f32.html

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107196
Approved by: https://github.com/jgong5, https://github.com/peterbell10
2023-08-18 12:48:10 +00:00
PyTorch MergeBot
22cade56ba Revert "[Reland] Upgrade NVTX to NVTX3 (#97582)"
This reverts commit 5bbfb96203.

Reverted https://github.com/pytorch/pytorch/pull/97582 on behalf of https://github.com/izaitsevfb due to Breaks meta RL builds ([comment](https://github.com/pytorch/pytorch/pull/97582#issuecomment-1679568525))
2023-08-15 20:55:12 +00:00
cyy
5bbfb96203 [Reland] Upgrade NVTX to NVTX3 (#97582)
PR #90689 replaces NVTX with NVTX3. However, the torch::nvtoolsext target is created only when the third-party NVTX is used.
 This is clearly a logical error. We now move the creation code out of the branch to cover all cases. This should fix the issues reported in the comments of #90689.

It would be better to move configurations of the failed FRL jobs to CI tests so that we can find such issues early before merging.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/97582
Approved by: https://github.com/peterbell10
2023-08-14 16:55:25 +00:00
Nikita Shulga
44448754c1 [CI] Fix sccaching of nvcc builds (#106811)
In cmake-3.26 or newer, `--options-file` is used, which renders nvcc outputs uncacheable by `sccache`; this was enabled by default for CUDA-11 or newer builds by 6377a43814

Fix it by disabling RESPONSE_FILE use for CUDA compilation.
Test Plan: Check that `multiple input files` stats in `PyTorch Build Statistics` is down to 13 files again, see https://github.com/pytorch/pytorch/actions/runs/5801865789/job/15727069855?pr=106811#step:10:42423

Fixes https://github.com/pytorch/pytorch/issues/105004

Pull Request resolved: https://github.com/pytorch/pytorch/pull/106811
Approved by: https://github.com/seemethere
2023-08-09 00:25:11 +00:00
Jesse Cai
f81f9093ec [core][pruning][feature] cuSPARSELt build integration (#103700)
Summary:

This stack of PR's integrates cuSPARSELt into PyTorch.

This PR adds support for cuSPARSELt into the build process.
It adds a new flag, USE_CUSPARSELT, that defaults to false.

When USE_CUSPARSELT=1 is specified, the user can also specify
CUSPARSELT_ROOT, which defines the path to the library.

Compiling PyTorch with cuSPARSELt support can be done as follows:

```
USE_CUSPARSELT=1 \
CUSPARSELT_ROOT=/path/to/cusparselt \
python setup.py develop
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103700
Approved by: https://github.com/albanD
2023-08-02 12:48:39 +00:00
Jeff Daily
5379b5f927 [ROCm] use hipblas instead of rocblas (#105881)
- BatchLinearAlgebraLib.cpp is now split into one additional file
  - BatchLinearAlgebraLib.cpp uses only cusolver APIs
  - BatchLinearAlgebraLibBlas.cpp uses only cublas APIs
  - hipify operates at the file level and cannot mix cusolver and cublas APIs within the same file
- cmake changes to link against hipblas instead of rocblas
- hipify mappings changes to map cublas -> hipblas instead of rocblas

Pull Request resolved: https://github.com/pytorch/pytorch/pull/105881
Approved by: https://github.com/albanD
2023-07-31 20:42:55 +00:00
Rodrigo Kumpera
2636751fb9 [C10d] Add skeleton of LibUV backend. (#105672)
This commit hooks up tcpstore creation and build flags.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105672
Approved by: https://github.com/fduwjj
2023-07-28 13:19:06 +00:00
CURTLab
b5d3d58497 Fixed cmake mkl lib path in caffee2 public (#105525)
This small change fixes a linking error (Intel MKL) for distributed version of libtorch c++ using cmake.

Fixes #105215.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/105525
Approved by: https://github.com/albanD
2023-07-20 17:15:09 +00:00
Peter Bell
c500f1d13b [CMake] Fix TORCH_CUDA_ARCH_LIST warning (#104680)
The warning complains that `TORCH_CUDA_ARCH_LIST` is set on the environment
instead of being defined as a build variable, which is fixed by the change to
`tools/setup_helpers/cmake.py`.

However, I still see the warning even with this fix because
```cmake
if((NOT EXISTS ${TORCH_CUDA_ARCH_LIST}) ...
```
is actually checking whether a file called "7.5" (or whatever arch is being
requested) exists. Instead, we want to check whether the variable is defined.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/104680
Approved by: https://github.com/albanD
2023-07-07 15:12:54 +00:00
Connor Baker
0c8323e4a4 cmake: allow USE_SYSTEM_ZSTD (#104611)
Fixes #44255.

This is part of larger work I'm doing to allow for more `USE_SYSTEM_*` options to allow Nix to have faster re-builds of PyTorch: https://github.com/NixOS/nixpkgs/pull/239291.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104611
Approved by: https://github.com/ezyang, https://github.com/malfet
2023-07-05 04:47:35 +00:00
Te
a73ad82c8f conditional CMAKE_CUDA_STANDARD (#104240)
Fixes #104237

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104240
Approved by: https://github.com/malfet
2023-06-27 18:41:25 +00:00
Xu Han
6c1ccccf21 Enable mimalloc on pytorch Windows (#102595)
This PR is the implementation of [#102534](https://github.com/pytorch/pytorch/issues/102534), option 2.
Major changes:
1. Add mimalloc as a submodule.
2. Add build option "USE_MIMALLOC".
3. It is only enabled for the Windows build, and it improves PyTorch memory allocation performance.

Additional Test:
<img width="953" alt="image" src="https://github.com/pytorch/pytorch/assets/8433590/4b2ec2dc-16f1-4ad9-b457-cfeb37e489d3">
This PR also builds and statically links mimalloc on Linux correctly.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/102595
Approved by: https://github.com/jgong5, https://github.com/malfet
2023-06-27 08:53:26 +00:00
Nikita Shulga
3a823e4617 [BE][CMake] Do not pass -mfpu=neon on Apple (#104078)
Follow-up after https://github.com/pytorch/pytorch/pull/103929 that gets rid of an annoying warning, which will become an error in newer Xcode.

### <samp>🤖 Generated by Copilot at 748d60d</samp>

> _`NEON_FOUND` is true_
> _But iOS may not like `-mfpu=neon`_
> _Check platform, then branch_

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104078
Approved by: https://github.com/huydhn, https://github.com/kit1980
2023-06-23 17:09:30 +00:00
cyy
1e108d9c21 enable more ASAN tests (#101483)
Recently we have been seeing bugs found by ASAN, such as #101400; enabling ASAN for more tests is necessary to catch more hidden bugs.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/101483
Approved by: https://github.com/huydhn
2023-06-15 05:21:15 +00:00
Nikita Shulga
00e16179f0 [LibTorch] Fix append_whole_archive macro (#103348)
`-force_load` is not a compiler option but a linker one, and as such it should depend on the platform (i.e. macOS/iOS) rather than on the compiler (i.e. clang vs gcc).

Otherwise, an attempt to link libtorch statically with clang results in a cryptic `/usr/bin/ld: -f may not be used without -shared` error on Linux.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/103348
Approved by: https://github.com/seemethere
2023-06-10 02:53:37 +00:00
Jack Taylor
87c976b69d Remove deprecated HIP flags (#102271)
Removes the outdated HIP flags appended to HIP_CXX_FLAGS.

This will help remove the following warnings in the PyTorch build log:

```
[6238/6889] Building CXX object caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/cudnn/hip/Conv_v8.cpp.o
cc1plus: warning: command line option ‘-Wno-duplicate-decl-specifier’ is valid for C/ObjC but not for C++
cc1plus: warning: unrecognized command line option ‘-Wno-unused-command-line-argument’
cc1plus: warning: unrecognized command line option ‘-Wno-exceptions’
cc1plus: warning: unrecognized command line option ‘-Wno-inconsistent-missing-override’
cc1plus: warning: unrecognized command line option ‘-Wno-macro-redefined’
```

This also updates the gloo submodule commit to include the similar change made to gloo.
597accfd79

Pull Request resolved: https://github.com/pytorch/pytorch/pull/102271
Approved by: https://github.com/malfet
2023-06-01 18:58:48 +00:00
Andres Lugo-Reyes
eaffd98880 Enable hipSOLVER in ROCm builds (#97370)
Enables the hipSolver backend for ROCm builds
--------------------------------------------------------------------------

- Minimum ROCm version requirement - 5.3
- Introduces a new macro USE_LINALG_SOLVER that controls enablement of both cuSOLVER and hipSOLVER
- Adds the hipSOLVER API to the hipification process
- Combines hipSOLVER and hipSPARSE mappings into a single SPECIAL map that takes priority over normal mappings
- Torch APIs to be moved to the hipSOLVER backend (as opposed to magma) include: torch.svd(), torch.geqrf(), torch.orgqr(), torch.ormqr()
- Will enable 100+ linalg unit tests for ROCm

Pull Request resolved: https://github.com/pytorch/pytorch/pull/97370
Approved by: https://github.com/malfet
2023-05-31 16:53:23 +00:00
Nikita Shulga
30cecc0e11 [MPS] Fix build regressions introduced by #92868 (#101036)
https://github.com/pytorch/pytorch/pull/92868 introduced  `OBJC` and `OBJCXX` language dialects, but fails to propagate some important flags, like OpenMP include path(if found),  `-fno-objc-arc` and `-Wno-unguarded-availability-new` suppression.

This PR remedies that and fixes https://github.com/pytorch/pytorch/issues/100925

### <samp>🤖 Generated by Copilot at 62677d4</samp>

This pull request improves the support for MPSGraph on Apple platforms by fixing some CMake flags for parallelism and memory management. It modifies `cmake/Dependencies.cmake` and `CMakeLists.txt` accordingly.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/101036
Approved by: https://github.com/atalman, https://github.com/huydhn
2023-05-10 04:15:41 +00:00
cyy
c8877e6080 enable some cuda warnings (#95568)
Currently some CUDA warnings are disabled due to old code-quality issues that have since been fixed, so it is time to remove the suppressions.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/95568
Approved by: https://github.com/albanD
2023-04-28 02:39:17 +00:00
cyy
2b7161e2bf lower cmake version requirement in FindSanitizer.cmake (#97073)
As indicated by the last comment on PR #93147, we should replace CheckSourceRuns in **cmake/Modules/FindSanitizer.cmake** with its older equivalents to avoid a dependency on CMake 3.19+.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/97073
Approved by: https://github.com/vfdev-5, https://github.com/Skylion007
2023-04-22 02:02:14 +00:00
Aleksei Nikiforov
c130b8a716 Reintroduce s390x SIMD support (#99057)
Reintroduce s390x SIMD support

Use vectorized FMA to fix test precision failures

Pull Request resolved: https://github.com/pytorch/pytorch/pull/99057
Approved by: https://github.com/malfet
2023-04-15 00:24:44 +00:00
mingfeima
ced5c89b6f add explicit vectorization for Half dtype on CPU (#96076)
This patch is part of half float performance optimization on CPU:
* add specification for dtype `Half` in `Vectorized<>` under both avx256 and avx512.
* add specification for dtype `Half` in functional utils, e.g. `vec::map_reduce<>()`, which uses float32 as accumulate type.

Also add a helper struct `vec_hold_type<scalar_t>`, since Vectorized<Half>::value_type is pointing to its underlying storage type which is `uint16_t`, leading to error if the kernel uses `Vec::value_type`.

Half uses the same logic as BFloat16 in the Vectorized<>, each half vector is mapped to 2x float vectors for computation.

Notice that this patch modifies the cmake files by adding **-mf16c** to the AVX2 build; from https://gcc.gnu.org/onlinedocs/gcc/x86-Options.html we can see that all hardware platforms that support **avx2** already have **f16c**.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/96076
Approved by: https://github.com/malfet
2023-04-03 10:58:37 +00:00
Milos Puzovic
2630144786 Call to mkldnn_matmul from aten::addmm on AArch64 (#91763)
We have noticed that on BERT_pytorch in torchbenchmark the majority of time is spent running GEMM in aten::addmm. At the moment this calls into a BLAS routine, but on AArch64 it is faster if it calls into mkldnn_matmul. Performance-wise, compared to a build with OpenBLAS, it runs 1.2x faster on 16 cores with a batch size of 8 on Graviton3, and if fast math mode (mkldnn_matmul exposes, through oneDNN and Arm Compute Library, an option to run GEMM with FP32 inputs using BF16 operations) is enabled, it is 2.3x faster.


Pull Request resolved: https://github.com/pytorch/pytorch/pull/91763
Approved by: https://github.com/jgong5, https://github.com/ngimel, https://github.com/malfet
2023-04-01 04:25:57 +00:00
PyTorch MergeBot
3226ad21cf Revert "[Reland] fix some MKL detection issues of CMake (#94924)"
This reverts commit dc2b7aa955.

Reverted https://github.com/pytorch/pytorch/pull/94924 on behalf of https://github.com/atalman due to conda nightly build failures
2023-03-31 18:41:11 +00:00