pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 12:21:27 +01:00

Author	SHA1	Message	Date
Jeff Daily	28c0b07d19	[ROCm] remove HCC references (#111975 ) - rename `__HIP_PLATFORM_HCC__` to `__HIP_PLATFORM_AMD__` - rename `HIP_HCC_FLAGS` to `HIP_CLANG_FLAGS` - rename `PYTORCH_HIP_HCC_LIBRARIES` to `PYTORCH_HIP_LIBRARIES` - workaround in tools/amd_build/build_amd.py until submodules are updated These symbols have had a long deprecation cycle and will finally be removed in ROCm 6.0. Pull Request resolved: https://github.com/pytorch/pytorch/pull/111975 Approved by: https://github.com/ezyang, https://github.com/hongxiayang	2023-10-26 02:39:10 +00:00
Nikita Shulga	6dc54fe8d6	[BE] Compile FBGEMM with ASAN (#111266 ) If `USE_ASAN` is set, compile FBGEMM with ASAN as well, by setting `USE_SANITIZER` to `address,undefined`. This fixes a regression in sanitizer coverage introduced by https://github.com/pytorch/pytorch/pull/93147, which narrowed the effect of the sanitizer from the entire project to just the torch libraries, and finally allows one to reliably catch the regression reported in https://github.com/pytorch/pytorch/issues/111189 Pull Request resolved: https://github.com/pytorch/pytorch/pull/111266 Approved by: https://github.com/huydhn	2023-10-14 20:35:04 +00:00
PyTorch MergeBot	f68d6e8108	Revert "Move at::{Refcounted,}MapAllocator to c10 (#109881 )" This reverts commit `68a1219f74`. Reverted https://github.com/pytorch/pytorch/pull/109881 on behalf of https://github.com/kit1980 due to breaking internal builds, undefined symbol: _ZN3c1022RefcountedMapAllocator6decrefEv ([comment](https://github.com/pytorch/pytorch/pull/109881#issuecomment-1761950014))	2023-10-13 17:57:53 +00:00
Peter Bell	68a1219f74	Move at::{Refcounted,}MapAllocator to c10 (#109881 ) `libshm.so` depends on the torch library exclusively for `at::RefcountedMapAllocator`, so it makes sense to move it to c10 along with the other memory allocators. This means `libshm.so` only depends on `c10` and we don't need to relink `libshm.so` for every ATen change. Pull Request resolved: https://github.com/pytorch/pytorch/pull/109881 Approved by: https://github.com/albanD	2023-10-12 10:51:13 +00:00
cyy	a6b452dfdc	[2/N] Enable Wunused-result, Wunused-variable and Wmissing-braces in torch targets (#110836 ) This PR enables Wunused-result, Wunused-variable and Wmissing-braces, since the code base is now clean of these warnings. Pull Request resolved: https://github.com/pytorch/pytorch/pull/110836 Approved by: https://github.com/Skylion007	2023-10-11 23:49:15 +00:00
PyTorch MergeBot	02a02a23ee	Revert "Move at::{Refcounted,}MapAllocator to c10 (#109881 )" This reverts commit `0341deb1c7`. Reverted https://github.com/pytorch/pytorch/pull/109881 on behalf of https://github.com/albanD due to It does break buck build ([comment](https://github.com/pytorch/pytorch/pull/109881#issuecomment-1756195823))	2023-10-10 20:39:12 +00:00
Anthony Alayo	31611b40b9	cmake: allow to build pytorch as a CMake subproject (#110373 ) This is a second attempt at fixing https://github.com/pytorch/pytorch/issues/53980, first submitted in https://github.com/pytorch/pytorch/pull/54978. Quoting @SpaceIm: ``` Fixes https://github.com/pytorch/pytorch/issues/53980 Maybe it would be nice to find why some files are generated in CMAKE_BINARY_DIR instead of CMAKE_CURRENT_BINARY_DIR or Torch_BINARY_DIR or PROJECT_BINARY_DIR, but there is a lot of indirection in the logic of pytorch build files, so I was not able to find where it comes from. ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/110373 Approved by: https://github.com/malfet	2023-10-10 17:47:35 +00:00
Peter Bell	0341deb1c7	Move at::{Refcounted,}MapAllocator to c10 (#109881 ) `libshm.so` depends on the torch library exclusively for `at::RefcountedMapAllocator`, so it makes sense to move it to c10 along with the other memory allocators. This means `libshm.so` only depends on `c10` and we don't need to relink `libshm.so` for every ATen change. Pull Request resolved: https://github.com/pytorch/pytorch/pull/109881 Approved by: https://github.com/albanD	2023-10-09 23:53:47 +00:00
cyy	3a70a02a81	Enable Wrange-loop-analysis (#110837 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/110837 Approved by: https://github.com/Skylion007	2023-10-09 11:19:03 +00:00
cyy	c3e4e4f6d2	[4/N] Add -Wdeprecated and related fixes (#110204 ) This PR enables Wdeprecated on torch_cpu Pull Request resolved: https://github.com/pytorch/pytorch/pull/110204 Approved by: https://github.com/ezyang	2023-10-07 19:46:08 +00:00
Dmytro Dzhulgakov	a0cea517e7	Add 9.0a to cpp_extension supported compute archs (#110587 ) There's an extended compute capability, 9.0a, for Hopper that was introduced in CUDA 12.0: https://docs.nvidia.com/cuda/archive/12.0.0/cuda-compiler-driver-nvcc/index.html#gpu-feature-list For example, CUTLASS leverages it: `5f13dcad78/python/cutlass/emit/pytorch.py (L684)` This adds it to the list of architectures that can be used directly in `cpp_extension`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/110587 Approved by: https://github.com/ezyang	2023-10-05 17:41:06 +00:00
Alin Pahontu	21d77bcf80	added path to correct directory containing headers (#110063 ) After `make install`, the headers are placed in the `include/openblas/` folder instead of the `include/` folder. Updated FindOpenBLAS.cmake to reflect that change. Pull Request resolved: https://github.com/pytorch/pytorch/pull/110063 Approved by: https://github.com/Blackhex, https://github.com/kit1980	2023-10-04 21:56:36 +00:00
Huy Do	f7909cb947	Build and test iOS on GitHub M1 runners (#110406 ) The new runners are announced at https://github.blog/2023-10-02-introducing-the-new-apple-silicon-powered-m1-macos-larger-runner-for-github-actions I have been able to run iOS simulator tests on my M1 laptop without issues. Some numbers: * The iOS build takes ~1h on x86 runners * The new M1 runners take ~20m https://github.com/pytorch/pytorch/actions/runs/6386171957 Pull Request resolved: https://github.com/pytorch/pytorch/pull/110406 Approved by: https://github.com/malfet, https://github.com/seemethere	2023-10-03 03:17:10 +00:00
cyy	ef5ff79019	[2/N] Clean up CMake target linking (#109986 ) This PR cleans up more CMake target linking. Pull Request resolved: https://github.com/pytorch/pytorch/pull/109986 Approved by: https://github.com/malfet	2023-10-01 05:36:08 +00:00
Aleksei Nikiforov	e05eb69c93	Don't link to libcpuinfo on s390x (#109875 ) Don't build it at all either, since it does not support s390x. This is a follow-up to https://github.com/pytorch/pytorch/pull/109496 Pull Request resolved: https://github.com/pytorch/pytorch/pull/109875 Approved by: https://github.com/kit1980	2023-09-26 12:43:35 +00:00
cyy	265acd4bea	Clean up CMake target linking (#109959 ) This PR cleans up more CMake target linking. Pull Request resolved: https://github.com/pytorch/pytorch/pull/109959 Approved by: https://github.com/ezyang	2023-09-25 01:37:14 +00:00
cyy	ba0362a09e	Remove unused build system checks and definitions (#109711 ) Remove some outdated checks. Pull Request resolved: https://github.com/pytorch/pytorch/pull/109711 Approved by: https://github.com/ezyang	2023-09-21 16:52:16 +00:00
cyy	92b0db2967	Don't find MKL if it isn't used (#109426 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/109426 Approved by: https://github.com/Skylion007	2023-09-17 03:39:39 +00:00
cyy	9a492fc27f	Fix unknown c++ flag detection in CMake (#109000 ) Unknown -Wno-XXX flags are still passed to GCC via append_cxx_flag_if_supported because of the behavior described in the GCC documentation: ``` When an unrecognized warning option is requested (e.g., -Wunknown-warning), GCC emits a diagnostic stating that the option is not recognized. However, if the -Wno- form is used, the behavior is slightly different: no diagnostic is produced for -Wno-unknown-warning unless other diagnostics are being produced. This allows the use of new -Wno- options with old compilers, but if something goes wrong, the compiler warns that an unrecognized option is present. ``` This PR fixes the detection by testing the corresponding -WXXX form of the flag instead. Unfortunately, third_party/fbgemm/CMakeLists.txt redefines append_cxx_flag_if_supported, overwriting our version. As a result, we have to re-include utils.cmake to overwrite it again. Pull Request resolved: https://github.com/pytorch/pytorch/pull/109000 Approved by: https://github.com/malfet	2023-09-11 08:32:07 +00:00
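The probing rule from 9a492fc27f can be illustrated with a short Python sketch. This is not PyTorch's CMake code. `probe_form`, `compiler_accepts` and this Python `append_cxx_flag_if_supported` are hypothetical stand-ins for the `utils.cmake` macro. The default probe assumes a GCC-compatible `g++` on `PATH` that rejects unknown `-W` options.

```python
import shutil
import subprocess
from typing import Callable, List


def probe_form(flag: str) -> str:
    """Map -Wno-X to -WX: GCC silently accepts unknown -Wno- options,
    so only the positive form reveals whether the warning exists."""
    prefix = "-Wno-"
    if flag.startswith(prefix):
        return "-W" + flag[len(prefix):]
    return flag


def compiler_accepts(flag: str, cxx: str = "g++") -> bool:
    """Hypothetical probe: compile an empty C++ translation unit with `flag`."""
    if shutil.which(cxx) is None:
        return False
    result = subprocess.run(
        [cxx, flag, "-Werror", "-fsyntax-only", "-x", "c++", "-"],
        input="", capture_output=True, text=True,
    )
    return result.returncode == 0


def append_cxx_flag_if_supported(
    flags: List[str],
    flag: str,
    accepts: Callable[[str], bool] = compiler_accepts,
) -> None:
    """Append the original flag only if its probe form is accepted."""
    if accepts(probe_form(flag)):
        flags.append(flag)
```

Injecting the `accepts` predicate keeps the logic testable without a compiler. The key design point is that the flag actually appended is still the original `-Wno-` form. Only the probe uses the positive `-W` form.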
Huy Do	4a98c898e2	Refactor ios-build-test workflow to support binary release (#108322 ) This refactors the logic from CircleCI iOS [build](https://github.com/pytorch/pytorch/blob/main/.circleci/config.yml#L1323-L1344) and [upload](https://github.com/pytorch/pytorch/blob/main/.circleci/config.yml#L1369-L1377) jobs to GHA. * Nightly artifacts will be available again on the `ossci-ios-build` S3 bucket, for example `libtorch_lite_ios_nightly_2.1.0.20230517.zip`. The last one there was s3://ossci-ios-build/libtorch_lite_ios_nightly_2.1.0.20230517.zip from May 17th * [LibTorch-Lite-Nightly](https://github.com/CocoaPods/Specs/blob/master/Specs/c/3/1/LibTorch-Lite-Nightly/1.14.0.20221109/LibTorch-Lite-Nightly.podspec.json) on cocoapods * Release artifacts will be on the `ossci-ios` S3 bucket, for example `s3://ossci-ios/libtorch_lite_ios_1.13.0.zip` from Nov 3rd 2022 * [LibTorch-Lite](https://github.com/CocoaPods/Specs/blob/master/Specs/c/c/3/LibTorch-Lite/1.13.0.1/LibTorch-Lite.podspec.json) on cocoapods * [LibTorch](https://github.com/CocoaPods/Specs/blob/master/Specs/1/3/c/LibTorch/1.13.0.1/LibTorch.podspec.json) on cocoapods I will clean up the CircleCI code in another PR. ### Testing Generated new release artifacts for testing from the main branch. Simulator tests have all passed. * With lite interpreter https://github.com/pytorch/pytorch/actions/runs/6093860118 * https://ossci-ios.s3.amazonaws.com/libtorch_lite_ios_2.1.0.zip * https://ossci-ios.s3.amazonaws.com/LibTorch-Lite-2.1.0.podspec * The LibTorch binary can be built without the lite interpreter https://github.com/pytorch/pytorch/actions/runs/6103616035 and uses TorchScript, but, to my understanding, it has long been dead. The binary can still be built and tested, though. * https://ossci-ios.s3.amazonaws.com/libtorch_ios_2.1.0.zip * https://ossci-ios.s3.amazonaws.com/LibTorch-2.1.0.podspec ### Next step for release * Once the PR is committed, I plan to use workflow dispatch to build the binaries manually on the `release/2.1` branch. Once they look good, we can publish them on cocoapods. Pull Request resolved: https://github.com/pytorch/pytorch/pull/108322 Approved by: https://github.com/atalman	2023-09-10 19:08:15 +00:00
Andrei Gheorghe	2028987bf7	Fix finding Intel MKL on Windows, as well as LAPACK, cuDNN and cuSPARSELt (#108040 ) Fixes #108039 Intel MKL is now found correctly: -- MKL libraries: C:/Program Files (x86)/Intel/oneAPI/mkl/latest/lib/intel64/mkl_intel_lp64.lib;C:/Program Files (x86)/Intel/oneAPI/mkl/latest/lib/intel64/mkl_sequential.lib;C:/Program Files (x86)/Intel/oneAPI/mkl/latest/lib/intel64/mkl_core.lib -- MKL include directory: C:/Program Files (x86)/Intel/oneAPI/mkl/latest/include and LAPACK too (excerpt from build.ninja): LINK_LIBRARIES = lib\c10.lib lib\pthreadpool.lib lib\cpuinfo.lib lib\XNNPACK.lib lib\fbgemm.lib lib\libittnotify.lib lib\gloo.lib lib\foxi_loader.lib lib\kineto.lib "C:\Program Files (x86)\Intel\oneAPI\mkl\latest\lib\intel64\mkl_intel_lp64.lib" "C:\Program Files (x86)\Intel\oneAPI\mkl\latest\lib\intel64\mkl_sequential.lib" "C:\Program Files (x86)\Intel\oneAPI\mkl\latest\lib\intel64\mkl_core.lib" "C:\Program Files (x86)\Intel\oneAPI\mkl\latest\lib\intel64\mkl_lapack95_lp64.lib" "C:\Program Files (x86)\Intel\oneAPI\mkl\latest\lib\intel64\mkl_intel_lp64.lib" "C:\Program Files (x86)\Intel\oneAPI\mkl\latest\lib\intel64\mkl_sequential.lib" "C:\Program Files (x86)\Intel\oneAPI\mkl\latest\lib\intel64\mkl_core.lib" "C:\Program Files (x86)\Intel\oneAPI\mkl\latest\lib\intel64\mkl_intel_lp64.lib" "C:\Program Files (x86)\Intel\oneAPI\mkl\latest\lib\intel64\mkl_sequential.lib" "C:\Program Files (x86)\Intel\oneAPI\mkl\latest\lib\intel64\mkl_core.lib" cuSPARSELt is also found correctly: -- Found CUSPARSELT: C:/Program Files/NVIDIA cuSPARSELt/v0.4/lib/cusparseLt.lib Also cuDNN include directory is properly added for the test target cuda_cudnn_test: build caffe2\CMakeFiles\cuda_cudnn_test.dir\__\aten\src\ATen\test\cuda_cudnn_test.cpp.obj: CXX_COMPILER__cuda_cudnn_test_RelWithDebInfo C$:\work\Repos\pytorch\aten\src\ATen\test\cuda_cudnn_test.cpp || cmake_object_order_depends_target_cuda_cudnn_test DEFINES = .... FLAGS = .... INCLUDES = -IC:\work\Repos\pytorch\build\aten\src -IC:\work\Repos\pytorch\aten\src ........... -external:IC:\work\Repos\pytorch\third_party\ittapi\include -external:IC:\work\Repos\pytorch\cmake\..\third_party\eigen -external:I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.2\include" -external:IC:\work\Repos\pytorch\torch\include -external:IC:\work\Repos\pytorch\third_party\ideep\include -external:IC:\work\Repos\pytorch\third_party\googletest\googletest\include -external:IC:\work\Repos\pytorch\third_party\googletest\googletest -external:I"C:\Program Files\NVIDIA cuDNN\include" -external:IC:\work\Repos\pytorch\cmake\..\third_party\cudnn_frontend\include -external:W0 Pull Request resolved: https://github.com/pytorch/pytorch/pull/108040 Approved by: https://github.com/ezyang	2023-09-08 14:41:00 +00:00
cyy	0cc2f06aec	[Reland] Improve MKL related logic in FindOpenMP.cmake (#104224 ) Reland of PR #94924. The purpose of this PR is to deal with the complicated interactions between MKL and OpenMP. There are two improvements: 1. It uses a flag to avoid infinite mutual recursion when calling find_package(MKL) and find_package(OpenMP) in some cases. 2. The logic for finding iomp5 is improved, and we can now test MKLDNN under ASAN. Pull Request resolved: https://github.com/pytorch/pytorch/pull/104224 Approved by: https://github.com/malfet	2023-09-02 07:55:11 +00:00
cyy	d6a9c2b4b5	[BC BREAKING] Remove outdated python submodules (#108236 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/108236 Approved by: https://github.com/malfet	2023-09-02 06:24:20 +00:00
drisspg	182a9cf366	Add Independent Memory Efficient and Flash Attention Build Flags (#107985 ) Summary: In an effort to simplify https://github.com/pytorch/pytorch/pull/105602, this PR pulls out independent chunks of code that can be landed before FlashV2 lands. Pull Request resolved: https://github.com/pytorch/pytorch/pull/107985 Approved by: https://github.com/cpuhrsch	2023-08-28 18:39:18 +00:00
Xia, Weiwen	97a291f6bd	[ONEDNN][BC-breaking] update onednn from v2.7.3 to v3.1.1 (#97957 ) Summary: Update oneDNN from v2.7.3 to v3.1.1. It is BC-breaking because some APIs changed on the oneDNN side. Changes include: - PyTorch code where oneDNN is called directly - Submodule `third_party/ideep`, to adapt to oneDNN's new API - CMake files, to fix build issues Test plan: Build issues and correctness are covered by CI checks. For performance, we ran TorchBench models to ensure there is no regression. The comparison before and after the oneDNN update is attached as an image: https://github.com/pytorch/pytorch/assets/12522207/415a4ff0-7566-40c6-aed0-24997a475b0e Note: - Base commit of PyTorch: `da322ea` - CPU: Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz (Ice Lake) Pull Request resolved: https://github.com/pytorch/pytorch/pull/97957 Approved by: https://github.com/jgong5, https://github.com/jerryzh168	2023-08-25 12:13:18 +00:00
Aaron Gokaslan	93f2a64d4d	Update submodule NCCL to v2.18.3 (#104993 ) Update NCCL submodule to v2.18.3 which fixes numerous bugs and performance issues, particularly on newer GPUs: https://docs.nvidia.com/deeplearning/nccl/release-notes/rel_2-18-3.html#rel_2-18-3 Pull Request resolved: https://github.com/pytorch/pytorch/pull/104993 Approved by: https://github.com/malfet	2023-08-18 23:43:01 +00:00
mingfeima	e10791c0bd	enable mkl_gemm_bf16bf16f32 in cpublas::gemm (#107196 ) This adds a wrapper around `mkl_gemm_bf16bf16f32`, which is used in the flash attention kernel on 4th gen Intel Xeon. A fallback path is also implemented in cpublas::gemm for when `mkl_gemm_bf16bf16f32` is not available. The primary goal of this change is to help build kernels in `scaled_dot_product_attention`, e.g. flash attention and efficient attention. In the attention kernel, `q @ k.T = attn`, q and k are given as bfloat16 and attn is float32. This benefits both performance and accuracy, since attn is used to compute a lazy softmax, which has to be done in float32. This patch also adds the OpenBLAS routine `sbgemm_`, which has the same bf16 * bf16 -> fp32 signature; however, since the OpenBLAS routine has a different name from MKL's, `sbgemm_` cannot be used with MKL. The fallback path does the computation in two steps: first a gemm with beta = 0, then adding beta * C in full precision. The idea of not truncating C to bfloat16, which avoids unnecessary accuracy loss, came from @peterbell10. ref: https://www.intel.com/content/www/us/en/docs/onemkl/developer-reference-c/2023-0/cblas-gemm-bf16bf16f32.html Pull Request resolved: https://github.com/pytorch/pytorch/pull/107196 Approved by: https://github.com/jgong5, https://github.com/peterbell10	2023-08-18 12:48:10 +00:00
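The two-step fallback described in e10791c0bd can be sketched in pure Python. The sketch emulates bf16 values by truncating float32 bit patterns; real conversions usually round to nearest even. `gemm_bf16bf16f32` is a hypothetical name and is not the MKL `cblas_gemm_bf16bf16f32` API or the `cpublas::gemm` signature. The only point it illustrates is that C enters in float32 and is never cast to bfloat16.

```python
import struct
from typing import List

Matrix = List[List[float]]


def to_bf16(x: float) -> float:
    """Emulate bfloat16 by keeping the top 16 bits of the float32 pattern
    (truncation keeps the sketch short; real casts usually round to nearest even)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFF0000))[0]


def to_fp32(x: float) -> float:
    """Round a Python float (a double) to the nearest float32."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def gemm_bf16bf16f32(a: Matrix, b: Matrix, c: Matrix,
                     alpha: float = 1.0, beta: float = 0.0) -> Matrix:
    """Return alpha * (A @ B) + beta * C, with A, B holding bf16 values and C float32."""
    m, k, n = len(a), len(b), len(b[0])
    # Step 1: gemm with beta = 0; the running sum is kept in float32.
    out = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            acc = 0.0
            for p in range(k):
                acc = to_fp32(acc + a[i][p] * b[p][j])
            out[i][j] = to_fp32(alpha * acc)
    # Step 2: add beta * C in full float32 precision; C is never cast to bf16.
    if beta != 0.0:
        for i in range(m):
            for j in range(n):
                out[i][j] = to_fp32(out[i][j] + beta * c[i][j])
    return out
```

Truncating C to bf16 first would replace a value such as 0.1 with a coarser bf16 value before it is added. Keeping step 2 in float32 is what avoids that accuracy loss.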
PyTorch MergeBot	22cade56ba	Revert "[Reland] Upgrade NVTX to NVTX3 (#97582 )" This reverts commit `5bbfb96203`. Reverted https://github.com/pytorch/pytorch/pull/97582 on behalf of https://github.com/izaitsevfb due to Breaks meta RL builds ([comment](https://github.com/pytorch/pytorch/pull/97582#issuecomment-1679568525))	2023-08-15 20:55:12 +00:00
cyy	5bbfb96203	[Reland] Upgrade NVTX to NVTX3 (#97582 ) PR #90689 replaces NVTX with NVTX3. However, torch::nvtoolsext is created only when the third-party NVTX is used. This is clearly a logic error. We now move the creation code out of the branch to cover all cases. This should fix the issues reported in the comments of #90689. It would be better to move the configurations of the failed FRL jobs to CI tests so that we can catch such issues early, before merging. Pull Request resolved: https://github.com/pytorch/pytorch/pull/97582 Approved by: https://github.com/peterbell10	2023-08-14 16:55:25 +00:00
Nikita Shulga	44448754c1	[CI] Fix sccaching of nvcc builds (#106811 ) In CMake 3.26 or newer, `--options-file` is used, which makes nvcc outputs uncacheable by `sccache` (caching of nvcc outputs was enabled by default for CUDA 11 or newer builds by `6377a43814`). Fix it by disabling RESPONSE_FILE use for CUDA compilation. Test Plan: Check that the `multiple input files` stat in `PyTorch Build Statistics` is down to 13 files again, see https://github.com/pytorch/pytorch/actions/runs/5801865789/job/15727069855?pr=106811#step:10:42423 Fixes https://github.com/pytorch/pytorch/issues/105004 Pull Request resolved: https://github.com/pytorch/pytorch/pull/106811 Approved by: https://github.com/seemethere	2023-08-09 00:25:11 +00:00
Jesse Cai	f81f9093ec	[core][pruning][feature] cuSPARSELt build integration (#103700 ) Summary: This stack of PRs integrates cuSPARSELt into PyTorch. This PR adds cuSPARSELt support to the build process. It adds a new flag, USE_CUSPARSELT, that defaults to false. When USE_CUSPARSELT=1 is specified, the user can also specify CUSPARSELT_ROOT, which defines the path to the library. Compiling pytorch with cuSPARSELt support can be done as follows: `USE_CUSPARSELT=1 CUSPARSELT_ROOT=/path/to/cusparselt python setup.py develop` Pull Request resolved: https://github.com/pytorch/pytorch/pull/103700 Approved by: https://github.com/albanD	2023-08-02 12:48:39 +00:00
Jeff Daily	5379b5f927	[ROCm] use hipblas instead of rocblas (#105881 ) - BatchLinearAlgebraLib.cpp is now split into one additional file - BatchLinearAlgebraLib.cpp uses only cusolver APIs - BatchLinearAlgebraLibBlas.cpp uses only cublas APIs - hipify operates at the file level and cannot mix cusolver and cublas APIs within the same file - cmake changes to link against hipblas instead of rocblas - hipify mappings changes to map cublas -> hipblas instead of rocblas Pull Request resolved: https://github.com/pytorch/pytorch/pull/105881 Approved by: https://github.com/albanD	2023-07-31 20:42:55 +00:00
Rodrigo Kumpera	2636751fb9	[C10d] Add skeleton of LibUV backend. (#105672 ) This commit hooks up tcpstore creation and build flags. Pull Request resolved: https://github.com/pytorch/pytorch/pull/105672 Approved by: https://github.com/fduwjj	2023-07-28 13:19:06 +00:00
CURTLab	b5d3d58497	Fixed cmake mkl lib path in caffee2 public (#105525 ) This small change fixes an Intel MKL linking error when using the distributed version of the LibTorch C++ library with CMake. Fixes #105215. Pull Request resolved: https://github.com/pytorch/pytorch/pull/105525 Approved by: https://github.com/albanD	2023-07-20 17:15:09 +00:00
Peter Bell	c500f1d13b	[CMake] Fix TORCH_CUDA_ARCH_LIST warning (#104680 ) The warning complains that `TORCH_CUDA_ARCH_LIST` is set in the environment instead of being defined as a build variable, which is fixed by the change to `tools/setup_helpers/cmake.py`. However, I still see the warning even with this fix because `if((NOT EXISTS ${TORCH_CUDA_ARCH_LIST}) ...` actually checks whether a file called "7.5" (or whatever arch is being requested) exists. Instead, we want to check whether the variable is defined. Pull Request resolved: https://github.com/pytorch/pytorch/pull/104680 Approved by: https://github.com/albanD	2023-07-07 15:12:54 +00:00
Connor Baker	0c8323e4a4	cmake: allow USE_SYSTEM_ZSTD (#104611 ) Fixes #44255. This is part of larger work I'm doing to allow for more `USE_SYSTEM_*` options to allow Nix to have faster re-builds of PyTorch: https://github.com/NixOS/nixpkgs/pull/239291. Pull Request resolved: https://github.com/pytorch/pytorch/pull/104611 Approved by: https://github.com/ezyang, https://github.com/malfet	2023-07-05 04:47:35 +00:00
Te	a73ad82c8f	conditional CMAKE_CUDA_STANDARD (#104240 ) Fixes #104237 Pull Request resolved: https://github.com/pytorch/pytorch/pull/104240 Approved by: https://github.com/malfet	2023-06-27 18:41:25 +00:00
Xu Han	6c1ccccf21	Enable mimalloc on pytorch Windows (#102595 ) This PR implements option 2 of [#102534](https://github.com/pytorch/pytorch/issues/102534). Major changes: 1. Add mimalloc as a submodule. 2. Add the build option "USE_MIMALLOC". 3. It is enabled only on Windows builds, where it should improve PyTorch memory allocation performance. Additional test: (image: https://github.com/pytorch/pytorch/assets/8433590/4b2ec2dc-16f1-4ad9-b457-cfeb37e489d3) This PR also builds and statically links mimalloc on Linux. Pull Request resolved: https://github.com/pytorch/pytorch/pull/102595 Approved by: https://github.com/jgong5, https://github.com/malfet	2023-06-27 08:53:26 +00:00
Nikita Shulga	3a823e4617	[BE][CMake] Do not pass `-mfpu=neon` on Apple (#104078 ) Follow-up to https://github.com/pytorch/pytorch/pull/103929 that gets rid of an annoying warning, which will become an error in newer Xcode versions. When `NEON_FOUND` is true, the platform is now checked before `-mfpu=neon` is passed, since iOS may not accept it. Pull Request resolved: https://github.com/pytorch/pytorch/pull/104078 Approved by: https://github.com/huydhn, https://github.com/kit1980	2023-06-23 17:09:30 +00:00
cyy	1e108d9c21	enable more ASAN tests (#101483 ) ASAN has recently found bugs such as #101400, so enabling ASAN for more tests is necessary to catch more hidden bugs. Pull Request resolved: https://github.com/pytorch/pytorch/pull/101483 Approved by: https://github.com/huydhn	2023-06-15 05:21:15 +00:00
Nikita Shulga	00e16179f0	[LibTorch] Fix `append_whole_archive` macro (#103348 ) `-force_load` is a linker option, not a compiler option, and as such should depend on the platform (i.e. macOS/iOS) rather than on the compiler (i.e. clang vs gcc). Otherwise, an attempt to link libtorch statically with clang results in a cryptic `/usr/bin/ld: -f may not be used without -shared` error on Linux. Pull Request resolved: https://github.com/pytorch/pytorch/pull/103348 Approved by: https://github.com/seemethere	2023-06-10 02:53:37 +00:00
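The rule from 00e16179f0 is to choose the whole-archive linker flag by target platform, not by compiler. It can be sketched as a small Python helper. `whole_archive_flags` is a hypothetical name, not the actual `append_whole_archive` CMake macro. It relies only on the standard ld64 `-force_load` option and the GNU ld `--whole-archive` / `--no-whole-archive` pair.

```python
import platform
from typing import List, Optional


def whole_archive_flags(archive: str, system: Optional[str] = None) -> List[str]:
    """Return linker flags that force every object in `archive` into the link.

    The choice depends on the target platform's linker, not on the compiler:
    Apple's ld64 (macOS/iOS) understands -force_load, while GNU ld and LLD
    on Linux use the --whole-archive / --no-whole-archive pair.
    """
    system = system or platform.system()
    if system in ("Darwin", "iOS"):
        # ld64: -force_load takes the archive path as its argument.
        return [f"-Wl,-force_load,{archive}"]
    # GNU ld / LLD (ELF): bracket the archive so only it is fully loaded.
    return ["-Wl,--whole-archive", archive, "-Wl,--no-whole-archive"]
```

Because the decision keys on the platform, clang on Linux gets the GNU-style pair. This is the case that previously failed with the `-f may not be used without -shared` error.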
Jack Taylor	87c976b69d	Remove deprecated HIP flags (#102271 ) Removes the outdated HIP flags appended to HIP_CXX_FLAGS. This will help remove the following warnings from the pytorch build log: ``` [6238/6889] Building CXX object caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/cudnn/hip/Conv_v8.cpp.o cc1plus: warning: command line option ‘-Wno-duplicate-decl-specifier’ is valid for C/ObjC but not for C++ cc1plus: warning: unrecognized command line option ‘-Wno-unused-command-line-argument’ cc1plus: warning: unrecognized command line option ‘-Wno-exceptions’ cc1plus: warning: unrecognized command line option ‘-Wno-inconsistent-missing-override’ cc1plus: warning: unrecognized command line option ‘-Wno-macro-redefined’ ``` This also updates the gloo submodule commit to include the similar change made to gloo: `597accfd79` Pull Request resolved: https://github.com/pytorch/pytorch/pull/102271 Approved by: https://github.com/malfet	2023-06-01 18:58:48 +00:00
Andres Lugo-Reyes	eaffd98880	Enable hipSOLVER in ROCm builds (#97370 ) Enables the hipSOLVER backend for ROCm builds - Minimum ROCm version requirement: 5.3 - Introduces a new macro, USE_LINALG_SOLVER, that controls enablement of both cuSOLVER and hipSOLVER - Adds the hipSOLVER API to the hipification process - Combines the hipSOLVER and hipSPARSE mappings into a single SPECIAL map that takes priority over the normal mappings - Torch APIs moved to the hipSOLVER backend (instead of MAGMA) include: torch.svd(), torch.geqrf(), torch.orgqr(), torch.ormqr() - Will enable 100+ linalg unit tests for ROCm Pull Request resolved: https://github.com/pytorch/pytorch/pull/97370 Approved by: https://github.com/malfet	2023-05-31 16:53:23 +00:00
Nikita Shulga	30cecc0e11	[MPS] Fix build regressions introduced by #92868 (#101036 ) https://github.com/pytorch/pytorch/pull/92868 introduced `OBJC` and `OBJCXX` language dialects, but failed to propagate some important flags, like the OpenMP include path (if found), `-fno-objc-arc` and the `-Wno-unguarded-availability-new` suppression. This PR remedies that and fixes https://github.com/pytorch/pytorch/issues/100925 Copilot-generated summary (at 62677d4): This pull request improves the support for MPSGraph on Apple platforms by fixing some CMake flags for parallelism and memory management. It modifies `cmake/Dependencies.cmake` and `CMakeLists.txt` accordingly. Pull Request resolved: https://github.com/pytorch/pytorch/pull/101036 Approved by: https://github.com/atalman, https://github.com/huydhn	2023-05-10 04:15:41 +00:00
cyy	c8877e6080	enable some cuda warnings (#95568 ) Some CUDA warnings were disabled because of old code-quality issues that have since been fixed, so it is time to remove the suppression. Pull Request resolved: https://github.com/pytorch/pytorch/pull/95568 Approved by: https://github.com/albanD	2023-04-28 02:39:17 +00:00
cyy	2b7161e2bf	lower cmake version requirement in FindSanitizer.cmake (#97073 ) As indicated by the last comment on PR #93147, we should replace CheckSourceRuns in cmake/Modules/FindSanitizer.cmake with checks available in older CMake versions to avoid depending on CMake 3.19+ Pull Request resolved: https://github.com/pytorch/pytorch/pull/97073 Approved by: https://github.com/vfdev-5, https://github.com/Skylion007	2023-04-22 02:02:14 +00:00
Aleksei Nikiforov	c130b8a716	Reintroduce s390x SIMD support (#99057 ) Reintroduce s390x SIMD support. Use vectorized FMA to fix test precision failures. Pull Request resolved: https://github.com/pytorch/pytorch/pull/99057 Approved by: https://github.com/malfet	2023-04-15 00:24:44 +00:00
mingfeima	ced5c89b6f	add explicit vectorization for Half dtype on CPU (#96076 ) This patch is part of the half-float performance optimization on CPU: * add specializations for dtype `Half` in `Vectorized<>` under both avx256 and avx512. * add specializations for dtype `Half` in functional utils, e.g. `vec::map_reduce<>()`, which use float32 as the accumulate type. Also add a helper struct `vec_hold_type<scalar_t>`, since Vectorized<Half>::value_type refers to its underlying storage type, `uint16_t`, which leads to errors if a kernel uses `Vec::value_type`. Half uses the same logic as BFloat16 in Vectorized<>: each half vector is mapped to 2x float vectors for computation. Note that this patch modifies the CMake files to add -mf16c to AVX2 builds; as https://gcc.gnu.org/onlinedocs/gcc/x86-Options.html shows, all hardware platforms that support AVX2 already have F16C. Pull Request resolved: https://github.com/pytorch/pytorch/pull/96076 Approved by: https://github.com/malfet	2023-04-03 10:58:37 +00:00
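Commit ced5c89b6f has `vec::map_reduce<>()` accumulate `Half` inputs in float32. A small Python experiment with the standard `struct` half-precision format `'e'` shows why. This is not the `Vectorized<Half>` code. The helpers below are hypothetical and only emulate the rounding each accumulator type applies after every addition.

```python
import struct
from typing import Iterable


def to_half(x: float) -> float:
    """Round a Python float to IEEE fp16 (struct format 'e') and back."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_fp32(x: float) -> float:
    """Round a Python float to IEEE float32 and back."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def sum_half_acc(values: Iterable[float]) -> float:
    acc = 0.0
    for v in values:
        acc = to_half(acc + v)  # accumulator rounded to Half after every add
    return acc


def sum_fp32_acc(values: Iterable[float]) -> float:
    acc = 0.0
    for v in values:
        acc = to_fp32(acc + v)  # Half inputs, float32 accumulator
    return acc
```

Consider summing 4096 Half ones. Above 2048, adjacent fp16 values are 2 apart, so with a Half accumulator each further `+ 1` rounds back to 2048. The float32 accumulator reaches the exact total.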
Milos Puzovic	2630144786	Call to mkldnn_matmul from aten::addmm on AArch64 (#91763 ) We have noticed that, for BERT_pytorch in torchbenchmark, the majority of time is spent running GEMM in aten::addmm. At the moment this calls into a BLAS routine, but on AArch64 it is faster to call into mkldnn_matmul. Compared to a build with OpenBLAS, it runs 1.2x faster on 16 cores with a batch size of 8 on Graviton3, and 2.3x faster if fast math mode is enabled (through oneDNN and Arm Compute Library, mkldnn_matmul exposes an option to run GEMM with FP32 inputs using BF16 operations). Pull Request resolved: https://github.com/pytorch/pytorch/pull/91763 Approved by: https://github.com/jgong5, https://github.com/ngimel, https://github.com/malfet	2023-04-01 04:25:57 +00:00
PyTorch MergeBot	3226ad21cf	Revert "[Reland] fix some MKL detection issues of CMake (#94924 )" This reverts commit `dc2b7aa955`. Reverted https://github.com/pytorch/pytorch/pull/94924 on behalf of https://github.com/atalman due to conda nightly build failures	2023-03-31 18:41:11 +00:00

1146 Commits