pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-06 00:20:18 +01:00

Author	SHA1	Message	Date
cyy	ef5ff79019	[2/N] Clean up CMake target linking (#109986 ) This PR cleans up more CMake target linking. Pull Request resolved: https://github.com/pytorch/pytorch/pull/109986 Approved by: https://github.com/malfet	2023-10-01 05:36:08 +00:00
Aleksei Nikiforov	e05eb69c93	Don't link to libcpuinfo on s390x (#109875 ) Don't even build it. It does not support s390x. This is a follow up for https://github.com/pytorch/pytorch/pull/109496 Pull Request resolved: https://github.com/pytorch/pytorch/pull/109875 Approved by: https://github.com/kit1980	2023-09-26 12:43:35 +00:00
cyy	265acd4bea	Clean up CMake target linking (#109959 ) This PR cleans up more CMake target linking. Pull Request resolved: https://github.com/pytorch/pytorch/pull/109959 Approved by: https://github.com/ezyang	2023-09-25 01:37:14 +00:00
cyy	ba0362a09e	Remove unused build system checks and definitions (#109711 ) Remove some outdated checks. Pull Request resolved: https://github.com/pytorch/pytorch/pull/109711 Approved by: https://github.com/ezyang	2023-09-21 16:52:16 +00:00
Nikita Shulga	44448754c1	[CI] Fix sccaching of nvcc builds (#106811 ) In cmake-3.26 or newer, `--options-file` is used, which renders nvcc outputs uncacheable by `sccache`, which was enabled for CUDA-11 or newer builds by default by `6377a43814`. Fix it by disabling RESPONSE_FILE use for CUDA compilation. Test Plan: Check that `multiple input files` stats in `PyTorch Build Statistics` is down to 13 files again, see https://github.com/pytorch/pytorch/actions/runs/5801865789/job/15727069855?pr=106811#step:10:42423 Fixes https://github.com/pytorch/pytorch/issues/105004 Pull Request resolved: https://github.com/pytorch/pytorch/pull/106811 Approved by: https://github.com/seemethere	2023-08-09 00:25:11 +00:00
Jesse Cai	f81f9093ec	[core][pruning][feature] cuSPARSELt build integration (#103700 ) Summary: This stack of PRs integrates cuSPARSELt into PyTorch. This PR adds support for cuSPARSELt to the build process. It adds a new flag, USE_CUSPARSELT, that defaults to false. When USE_CUSPARSELT=1 is specified, the user can also specify CUSPARSELT_ROOT, which defines the path to the library. Compiling PyTorch with cuSPARSELt support can be done as follows: `USE_CUSPARSELT=1 CUSPARSELT_ROOT=/path/to/cusparselt python setup.py develop` Pull Request resolved: https://github.com/pytorch/pytorch/pull/103700 Approved by: https://github.com/albanD	2023-08-02 12:48:39 +00:00
Jeff Daily	5379b5f927	[ROCm] use hipblas instead of rocblas (#105881 ) - BatchLinearAlgebraLib.cpp is now split into one additional file - BatchLinearAlgebraLib.cpp uses only cusolver APIs - BatchLinearAlgebraLibBlas.cpp uses only cublas APIs - hipify operates at the file level and cannot mix cusolver and cublas APIs within the same file - cmake changes to link against hipblas instead of rocblas - hipify mappings changes to map cublas -> hipblas instead of rocblas Pull Request resolved: https://github.com/pytorch/pytorch/pull/105881 Approved by: https://github.com/albanD	2023-07-31 20:42:55 +00:00
Rodrigo Kumpera	2636751fb9	[C10d] Add skeleton of LibUV backend. (#105672 ) This commit hooks up tcpstore creation and build flags. Pull Request resolved: https://github.com/pytorch/pytorch/pull/105672 Approved by: https://github.com/fduwjj	2023-07-28 13:19:06 +00:00
Connor Baker	0c8323e4a4	cmake: allow USE_SYSTEM_ZSTD (#104611 ) Fixes #44255. This is part of larger work I'm doing to allow for more `USE_SYSTEM_*` options to allow Nix to have faster re-builds of PyTorch: https://github.com/NixOS/nixpkgs/pull/239291. Pull Request resolved: https://github.com/pytorch/pytorch/pull/104611 Approved by: https://github.com/ezyang, https://github.com/malfet	2023-07-05 04:47:35 +00:00
Nikita Shulga	3a823e4617	[BE][CMake] Do not pass `-mfpu=neon` on Apple (#104078 ) Followup after https://github.com/pytorch/pytorch/pull/103929 that gets rid of an annoying warning, which will become an error in newer Xcode. 🤖 Generated by Copilot at 748d60d: _`NEON_FOUND` is true_ / _But iOS may not like `-mfpu=neon`_ / _Check platform, then branch_ Pull Request resolved: https://github.com/pytorch/pytorch/pull/104078 Approved by: https://github.com/huydhn, https://github.com/kit1980	2023-06-23 17:09:30 +00:00
cyy	1e108d9c21	enable more ASAN tests (#101483 ) Recently we have seen some bugs found by ASAN, such as #101400. I think enabling ASAN for more tests is necessary to catch more hidden bugs. Pull Request resolved: https://github.com/pytorch/pytorch/pull/101483 Approved by: https://github.com/huydhn	2023-06-15 05:21:15 +00:00
Jack Taylor	87c976b69d	Remove deprecated HIP flags (#102271 ) Removes the outdated HIP flags appended to HIP_CXX_FLAGS. This will help remove the following warnings in the pytorch build log ``` [6238/6889] Building CXX object caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/cudnn/hip/Conv_v8.cpp.o cc1plus: warning: command line option ‘-Wno-duplicate-decl-specifier’ is valid for C/ObjC but not for C++ cc1plus: warning: unrecognized command line option ‘-Wno-unused-command-line-argument’ cc1plus: warning: unrecognized command line option ‘-Wno-exceptions’ cc1plus: warning: unrecognized command line option ‘-Wno-inconsistent-missing-override’ cc1plus: warning: unrecognized command line option ‘-Wno-macro-redefined’ ``` This also updates the gloo submodule commit to include the similar change made to gloo. `597accfd79` Pull Request resolved: https://github.com/pytorch/pytorch/pull/102271 Approved by: https://github.com/malfet	2023-06-01 18:58:48 +00:00
Andres Lugo-Reyes	eaffd98880	Enable hipSOLVER in ROCm builds (#97370 ) Enables the hipSolver backend for ROCm builds -------------------------------------------------------------------------- - Minimum ROCm version requirement - 5.3 - Introduces new macro USE_LINALG_SOLVER that controls enablement of both cuSOLVER and hipSOLVER - Adds hipSOLVER API to hipification process - combines hipSOLVER and hipSPARSE mappings into single SPECIAL map that takes priority among normal mappings - Torch api to be moved to hipsolver backend (as opposed to magma) include: torch.svd(), torch.geqrf(), torch.orgqr(), torch.ormqr() - Will enable 100+ linalg unit tests for ROCm Pull Request resolved: https://github.com/pytorch/pytorch/pull/97370 Approved by: https://github.com/malfet	2023-05-31 16:53:23 +00:00
Nikita Shulga	30cecc0e11	[MPS] Fix build regressions introduced by #92868 (#101036 ) https://github.com/pytorch/pytorch/pull/92868 introduced `OBJC` and `OBJCXX` language dialects, but fails to propagate some important flags, like OpenMP include path (if found), `-fno-objc-arc` and `-Wno-unguarded-availability-new` suppression. This PR remedies that and fixes https://github.com/pytorch/pytorch/issues/100925 🤖 Generated by Copilot at 62677d4: This pull request improves the support for MPSGraph on Apple platforms by fixing some CMake flags for parallelism and memory management. It modifies `cmake/Dependencies.cmake` and `CMakeLists.txt` accordingly. Pull Request resolved: https://github.com/pytorch/pytorch/pull/101036 Approved by: https://github.com/atalman, https://github.com/huydhn	2023-05-10 04:15:41 +00:00
Aleksei Nikiforov	c130b8a716	Reintroduce s390x SIMD support (#99057 ) Reintroduce s390x SIMD support Use vectorized FMA to fix test precision failures Pull Request resolved: https://github.com/pytorch/pytorch/pull/99057 Approved by: https://github.com/malfet	2023-04-15 00:24:44 +00:00
Milos Puzovic	2630144786	Call to mkldnn_matmul from aten::addmm on AArch64 (#91763 ) We have noticed that on BERT_pytorch in torchbenchmark the majority of time is spent running GEMM in aten::addmm. At the moment this calls into a BLAS routine, but on AArch64 it is faster to call into mkldnn_matmul. Compared to a build with OpenBLAS, it runs 1.2x faster on 16 cores with a batch size of 8 on Graviton3, and 2.3x faster if fast math mode is enabled (mkldnn_matmul exposes, through oneDNN and Arm Compute Library, an option to run GEMM with FP32 inputs using BF16 operations). Pull Request resolved: https://github.com/pytorch/pytorch/pull/91763 Approved by: https://github.com/jgong5, https://github.com/ngimel, https://github.com/malfet	2023-04-01 04:25:57 +00:00
PyTorch MergeBot	3226ad21cf	Revert "[Reland] fix some MKL detection issues of CMake (#94924 )" This reverts commit `dc2b7aa955`. Reverted https://github.com/pytorch/pytorch/pull/94924 on behalf of https://github.com/atalman due to conda nightly build failures	2023-03-31 18:41:11 +00:00
cyy	dc2b7aa955	[Reland] fix some MKL detection issues of CMake (#94924 ) This is reland of PR #94402 that tries to solve the additional link issues. The PR #94402 failed because caffe2::mkl had been converted to private dependency while libtorch_cuda_linalg hadn't linked to it explicitly. This is fixed in commit 4373bf0ae3dee32afc178f9d51a4154d6c5904c6 We also replace more references of MKL_LIBRARIES by caffe2::mkl in this PR. Pull Request resolved: https://github.com/pytorch/pytorch/pull/94924 Approved by: https://github.com/malfet	2023-03-31 02:01:52 +00:00
Pruthvi Madugundu	08f125bcac	[ROCm] Remove usage of deprecated ROCm component header includes (#97620 ) - clang parameter 'amdgpu-target' changed to 'offload-arch' - HIP and MIOpen includes path updated for extensions Pull Request resolved: https://github.com/pytorch/pytorch/pull/97620 Approved by: https://github.com/ezyang, https://github.com/jithunnair-amd	2023-03-28 19:28:38 +00:00
wangxiyuan	4ab1588d99	Enhance error message for dependency check (#96642 ) If the Python development library is missing when building PyTorch from source, CMake raises an error like: ``` CMake Error at cmake/Dependencies.cmake:1079 (if): if given arguments: "VERSION_LESS" "3" Unknown arguments specified ``` This message is misleading: users may take it for a syntax error or a CMake version problem. This PR adds a check to ensure `PYTHONLIBS_VERSION_STRING` exists before it is used. Related #87993 Pull Request resolved: https://github.com/pytorch/pytorch/pull/96642 Approved by: https://github.com/kit1980	2023-03-22 08:42:48 +00:00
cyy	666efd8d5d	Improve ASAN and TSAN handling in cmake (#93147 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/93147 Approved by: https://github.com/malfet	2023-03-07 14:10:13 +00:00
Peter Bell	c5f6092591	Use FindCUDAToolkit to find cuda dependencies (#82695 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/82695 Approved by: https://github.com/malfet	2023-03-01 17:26:36 +00:00
PyTorch MergeBot	801b3f8fc7	Revert "Use FindCUDAToolkit to find cuda dependencies (#82695 )" This reverts commit `7289d22d67`. Reverted https://github.com/pytorch/pytorch/pull/82695 on behalf of https://github.com/peterbell10 due to Breaks torchaudio build	2023-02-28 02:29:09 +00:00
cyy	f27e09de04	Cleanup Windows warning suppression in CMake and fix some warnings in the source code (#94927 ) This PR does two things: 1. It moves some Windows warning suppression from various CMake files into the main CMakeLists.txt, following the conventions of gcc and clang. 2. It fixes some Windows warnings in the source code. Most importantly, it fixes lots of dll warnings by adjusting C10_API to TORCH_API or TORCH_PYTHON_API. There are still some dll warnings because some TORCH_API functions are actually built as part of libtorch_python Pull Request resolved: https://github.com/pytorch/pytorch/pull/94927 Approved by: https://github.com/malfet	2023-02-27 19:22:20 +00:00
Peter Bell	7289d22d67	Use FindCUDAToolkit to find cuda dependencies (#82695 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/82695 Approved by: https://github.com/malfet	2023-02-21 22:35:17 +00:00
dllehr-amd	98012e4a59	[ROCm] hipGraph support for pytorch mainline (#88202 ) With the release of ROCm 5.3, hip now supports a hipGraph implementation. All necessary backend work and hipification is done to support the same functionality as cudaGraph. Unit tests are modified to support a new TEST_GRAPH feature, which allows us to create a single check for graph support instead of attempting to gather the CUDA level in annotations for every graph test Pull Request resolved: https://github.com/pytorch/pytorch/pull/88202 Approved by: https://github.com/jithunnair-amd, https://github.com/pruthvistony, https://github.com/malfet	2023-02-14 22:18:56 +00:00
PyTorch MergeBot	e743d316e2	Revert "fix some MKL detection issues of CMake (#94402 )" This reverts commit `7ef46d40a1`. Reverted https://github.com/pytorch/pytorch/pull/94402 on behalf of https://github.com/malfet due to Broke binary builds, see https://github.com/pytorch/pytorch/issues/94751#issuecomment-1428562517	2023-02-13 22:09:40 +00:00
PyTorch MergeBot	36dfbb08f3	Revert "Update Cutlass to v2.11 (#94188 )" This reverts commit `a0f9abdcb6`. Reverted https://github.com/pytorch/pytorch/pull/94188 on behalf of https://github.com/ezyang due to bouncing this to derisk branch cut	2023-02-13 19:03:36 +00:00
Aaron Gokaslan	a0f9abdcb6	Update Cutlass to v2.11 (#94188 ) Now that we are on CUDA 11+ exclusively, we can update Nvidia's Cutlass to the next version. We also had to remove the cuda build flag : "-D__CUDA_NO_HALF_CONVERSIONS__" since Cutlass no longer builds without it. Pull Request resolved: https://github.com/pytorch/pytorch/pull/94188 Approved by: https://github.com/ezyang, https://github.com/jansel	2023-02-12 20:45:03 +00:00
cyy	7ef46d40a1	fix some MKL detection issues of CMake (#94402 ) This PR rewrites some logic of FindMKL.cmake and FindOpenMP.cmake to better detect the corresponding libraries and fix the infinite recursion between them. It also contains some other fixes without changing the CMake interface. Pull Request resolved: https://github.com/pytorch/pytorch/pull/94402 Approved by: https://github.com/malfet, https://github.com/Skylion007	2023-02-12 19:19:10 +00:00
cyy	5fa7120722	Simplify CMake CUDNN code (#91676 ) 1. Move CUDNN code to separate module. 2. Merge CUDNN public and private targets into a single private target. There is no need to expose CUDNN dependency. Pull Request resolved: https://github.com/pytorch/pytorch/pull/91676 Approved by: https://github.com/malfet	2023-02-08 01:06:10 +00:00
cyy	9291f9b9e2	Simplify cmake code (#91546 ) We use various newer CMake features to simplify build system: 1. Caffe2::threads is replaced by threads::threads. 2. Some unused MSVC flags are removed. Pull Request resolved: https://github.com/pytorch/pytorch/pull/91546 Approved by: https://github.com/malfet, https://github.com/Skylion007	2023-02-08 01:05:19 +00:00
cyy	afd7b581aa	Simplify OpenMP detection in CMake (#91576 ) We greatly simplify the handling of OpenMP in CMake by using caffe2::openmp target thoroughly. We follow the old behavior by defaulting to MKL OMP library and detecting OMP flags otherwise. Pull Request resolved: https://github.com/pytorch/pytorch/pull/91576 Approved by: https://github.com/malfet	2023-02-04 11:50:06 +00:00
Hansong Zhang	d996acfbc2	[XNNPACK] disable ARM_BF16 and ARM_FP16_VECTOR (#94020 ) Summary: This is not used and will cause build failure Test Plan: CI Differential Revision: D42982023 Pull Request resolved: https://github.com/pytorch/pytorch/pull/94020 Approved by: https://github.com/Skylion007, https://github.com/tiandiao123, https://github.com/digantdesai	2023-02-03 05:01:00 +00:00
Digant Desai	989722cd19	Use global PIC flag for XNNPACK (#93896 ) Summary: - XNNPACK object libraries need an explicit PIC flag when building a static, PIC libXNNPACK.a - Without this, the link process runs into relocation errors - Using this global switch to avoid updating XNNPACK CMake Test Plan: CI Differential Revision: D42944764 Pull Request resolved: https://github.com/pytorch/pytorch/pull/93896 Approved by: https://github.com/Skylion007, https://github.com/Neilblaze, https://github.com/salilsdesai	2023-02-02 23:38:21 +00:00
cyy	9710ac6531	Some CMake and CUDA cleanup given recent update to C++17 (#90599 ) The main changes are: 1. Remove outdated checks for old compiler versions because they can't support C++17. 2. Remove outdated CMake checks because it now requires 3.18. 3. Remove outdated CUDA checks because we are moving to CUDA 11. Almost all changes are in CMake files for easy auditing. Pull Request resolved: https://github.com/pytorch/pytorch/pull/90599 Approved by: https://github.com/soumith	2022-12-30 11:19:26 +00:00
Nikita Shulga	36ac095ff8	Migrate PyTorch to C++17 (#85969 ) With CUDA-10.2 gone we can finally do it! This PR mostly contains build system related changes, invasive functional ones are to be followed. Among many expected tweaks to the build system, here are a few unexpected ones: - Force onnx_proto project to be updated to C++17 to avoid `duplicate symbols` error when compiled by gcc-7.5.0, as storage rule for `constexpr` changed in C++17, but gcc does not seem to follow it - Do not use `std::apply` on CUDA but rely on the built-in variant, as it results in test failures when CUDA runtime picks host rather than device function when `std::apply` is invoked from CUDA code. - `std::decay_t` -> `::std::decay_t` and `std::move`->`::std::move` as VC++ for some reason claims that `std` symbol is ambiguous - Disable use of `std::aligned_alloc` on Android, as its `libc++` does not implement it. Some prerequisites: - https://github.com/pytorch/pytorch/pull/89297 - https://github.com/pytorch/pytorch/pull/89605 - https://github.com/pytorch/pytorch/pull/90228 - https://github.com/pytorch/pytorch/pull/90389 - https://github.com/pytorch/pytorch/pull/90379 - https://github.com/pytorch/pytorch/pull/89570 - https://github.com/facebookincubator/gloo/pull/336 - https://github.com/facebookincubator/gloo/pull/343 - `919676fb32` Fixes https://github.com/pytorch/pytorch/issues/56055 Pull Request resolved: https://github.com/pytorch/pytorch/pull/85969 Approved by: https://github.com/ezyang, https://github.com/kulinseth	2022-12-08 02:27:48 +00:00
Michael Wootton	5351176caa	Kineto activity fix (#89785 ) Continuation of https://github.com/pytorch/pytorch/pull/88207 A compile time guard was preventing ActivityType::CUDA from being available on rocm. This caused both the GPU_FALLBACK and CUDA modes to be active at the same time. So operators were being charged gpu time for the hipEventRecord ranges and the actual kernel execution times. This caused incorrect (and often negative) cuda times, in e.g. table(). Previously a cmake variable was not being propagated to a '-D', causing an issue on Windows, which uses cuda but not cupti. Pull Request resolved: https://github.com/pytorch/pytorch/pull/89785 Approved by: https://github.com/jeffdaily, https://github.com/malfet	2022-12-08 00:24:55 +00:00
Dmytro Dzhulgakov	ae01615d75	Fix cupti search path in CMake (#88657 ) Minor fix for when cuda is installed via conda. In this case the libraries are in `lib` and not `lib64`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/88657 Approved by: https://github.com/kit1980, https://github.com/malfet	2022-11-10 23:44:52 +00:00
Pruthvi Madugundu	fbd08fb358	Introduce TORCH_DISABLE_GPU_ASSERTS (#84190 ) - Asserts for CUDA are enabled by default - Disabled for ROCm by default by setting `TORCH_DISABLE_GPU_ASSERTS` to `ON` - Can be enabled for ROCm by setting above variable to `OFF` during build or can be forcefully enabled by setting `ROCM_FORCE_ENABLE_GPU_ASSERTS:BOOL=ON` These are follow-up changes as per the comment in PR #81790, comment [link](https://github.com/pytorch/pytorch/pull/81790#issuecomment-1215929021) Pull Request resolved: https://github.com/pytorch/pytorch/pull/84190 Approved by: https://github.com/jeffdaily, https://github.com/malfet	2022-11-04 04:43:05 +00:00
PyTorch MergeBot	0fa23663cc	Revert "Introduce TORCH_DISABLE_GPU_ASSERTS (#84190 )" This reverts commit `1e2c4a6e0e`. Reverted https://github.com/pytorch/pytorch/pull/84190 on behalf of https://github.com/malfet due to Needs internal changes, has to be landed via co-dev	2022-11-02 18:13:37 +00:00
Pruthvi Madugundu	1e2c4a6e0e	Introduce TORCH_DISABLE_GPU_ASSERTS (#84190 ) - Asserts for CUDA are enabled by default - Disabled for ROCm by default by setting `TORCH_DISABLE_GPU_ASSERTS` to `ON` - Can be enabled for ROCm by setting above variable to `OFF` during build or can be forcefully enabled by setting `ROCM_FORCE_ENABLE_GPU_ASSERTS:BOOL=ON` These are follow-up changes as per the comment in PR #81790, comment [link](https://github.com/pytorch/pytorch/pull/81790#issuecomment-1215929021) Pull Request resolved: https://github.com/pytorch/pytorch/pull/84190 Approved by: https://github.com/jeffdaily, https://github.com/malfet	2022-11-02 17:41:57 +00:00
Jithun Nair	2e48b478e0	[ROCm] Use -rpath-link to fix libtinfo conflict (#83552 ) Fixes issue building PyTorch for ROCm5.3 and above on Ubuntu20.04 because libtinfo6 from conda conflicts with the one from the distro causing symbol not found errors. cc @jeffdaily @sunway513 @ROCmSupport Pull Request resolved: https://github.com/pytorch/pytorch/pull/83552 Approved by: https://github.com/malfet, https://github.com/pruthvistony	2022-10-28 03:50:43 +00:00
PyTorch MergeBot	ac0c13f665	Revert "[ROCm] Use -rpath-link to fix libtinfo conflict (#83552 )" This reverts commit `a10446c4d8`. Reverted https://github.com/pytorch/pytorch/pull/83552 on behalf of https://github.com/kit1980 due to Broke ios/macos builds https://github.com/pytorch/pytorch/actions/runs/3329991911/jobs/5507911292	2022-10-26 16:43:13 +00:00
Jithun Nair	a10446c4d8	[ROCm] Use -rpath-link to fix libtinfo conflict (#83552 ) Fixes issue building PyTorch for ROCm5.3 and above on Ubuntu20.04 because libtinfo6 from conda conflicts with the one from the distro causing symbol not found errors. Pull Request resolved: https://github.com/pytorch/pytorch/pull/83552 Approved by: https://github.com/malfet	2022-10-26 14:40:29 +00:00
Vladimír Aubrecht	409efebab8	Added define to fix issue with compatibility with latest Windows SDK (#85408 ) Fixes #83820. Pull Request resolved: https://github.com/pytorch/pytorch/pull/85408 Approved by: https://github.com/ezyang	2022-10-12 15:44:28 +00:00
Jithun Nair	90b64e231e	Update hipification logic for all ROCm headers (#85320 ) ...to remove deprecation warnings. Remove component-specific include dirs from include path Pull Request resolved: https://github.com/pytorch/pytorch/pull/85320 Approved by: https://github.com/kit1980	2022-09-21 16:22:12 +00:00
John Detloff	e0229d6517	Remove caffe2 mobile (#84338 ) We're no longer building Caffe2 mobile as part of our CI, and it adds a lot of clutter to our make files. Any lingering internal dependencies will use the buck build and so won't be affected. Pull Request resolved: https://github.com/pytorch/pytorch/pull/84338 Approved by: https://github.com/dreiss	2022-09-08 01:49:55 +00:00
Nikita Shulga	62c8d30f9f	[BE] Add `append_cxx_flag_if_supported` macro (#82883 ) And use it throughout the CMakeLists and rectify `IF(APPLE)`/`IF(GNU_CXX_VERSION VERSION_GREATER A.B)` and so on Also, add `target_compile_options_if_supported` and use it in `Dependencies.cmake` as well as in test's `CMakeLists.txt` Delete `-Wno-unknown-warning-option` to test that conditions are indeed working as expected Pull Request resolved: https://github.com/pytorch/pytorch/pull/82883 Approved by: https://github.com/seemethere	2022-08-10 14:32:26 +00:00
PyTorch MergeBot	d3a1f17fc7	Revert "[BE] Add `append_cxx_flag_if_supported` macro (#82883 )" This reverts commit `d7e6aaa59b`. Reverted https://github.com/pytorch/pytorch/pull/82883 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally	2022-08-10 10:27:59 +00:00
Nikita Shulga	d7e6aaa59b	[BE] Add `append_cxx_flag_if_supported` macro (#82883 ) And use it throughout the CMakeLists and rectify `IF(APPLE)`/`IF(GNU_CXX_VERSION VERSION_GREATER A.B)` and so on Also, add `target_compile_options_if_supported` and use it in `Dependencies.cmake` as well as in test's `CMakeLists.txt` Delete `-Wno-unknown-warning-option` to test that conditions are indeed working as expected Pull Request resolved: https://github.com/pytorch/pytorch/pull/82883 Approved by: https://github.com/seemethere	2022-08-08 21:04:09 +00:00
Nikita Shulga	83086b7f45	Fix NCCL detection by Gloo (#82773 ) Instruct Gloo to always use bundled version of the library by passing `NCCL_EXTERNAL` Otherwise, it would link with shared library if one could be found in the system Pull Request resolved: https://github.com/pytorch/pytorch/pull/82773 Approved by: https://github.com/ngimel	2022-08-04 16:26:30 +00:00
zhang, xiaobing	86b86202b5	fix torch.config can't respect USE_MKLDNN flag issue (#75001 ) Fixes https://github.com/pytorch/pytorch/issues/74949, which reports that torch.config can't respect USE_MKLDNN flag. Pull Request resolved: https://github.com/pytorch/pytorch/pull/75001 Approved by: https://github.com/malfet	2022-07-17 15:00:48 +00:00
Nikita Shulga	17fe7ce0e4	[BE] Delete Win specific case for CMake older than 3.1 (#81411 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/81411 Approved by: https://github.com/janeyx99	2022-07-14 00:31:31 +00:00
Tongliang Liao	dff70a5e1a	Make language std configurable. (#75519 ) RocksDB 7 starts to use C++17 in header. We should make this configurable, in case user needs higher std version. List of files to change is found by `git grep 'CMAKE_[^_]*_STANDARD'`. Doc string is from CMake code. Pull Request resolved: https://github.com/pytorch/pytorch/pull/75519 Approved by: https://github.com/malfet	2022-07-13 14:21:27 +00:00
Jing Xu	3c7044728b	Enable Intel® VTune™ Profiler's Instrumentation and Tracing Technology APIs (ITT) to PyTorch (#63289 ) More detailed description of benefits can be found at #41001. This is Intel's counterpart of NVidia’s NVTX (https://pytorch.org/docs/stable/autograd.html#torch.autograd.profiler.emit_nvtx). ITT is a functionality for labeling trace data during application execution across different Intel tools. For integrating Intel(R) VTune Profiler into Kineto, ITT needs to be integrated into PyTorch first. It works with both standalone VTune Profiler (https://www.intel.com/content/www/us/en/developer/tools/oneapi/vtune-profiler.html) and Kineto-integrated VTune functionality in the future. It works for both Intel CPU and Intel XPU devices. Pitch Add VTune Profiler's ITT API function calls to annotate PyTorch ops, as well as developer customized code scopes on CPU, like NVTX for NVidia GPU. This PR rebases the code changes at https://github.com/pytorch/pytorch/pull/61335 to the latest master branch. Usage example: ``` with torch.autograd.profiler.emit_itt(): for i in range(10): torch.itt.range_push('step_{}'.format(i)) model(input) torch.itt.range_pop() ``` cc @ilia-cher @robieta @chaekit @gdankel @bitfort @ngimel @orionr @nbcsm @guotuofeng @guyang3532 @gaoteng-git Pull Request resolved: https://github.com/pytorch/pytorch/pull/63289 Approved by: https://github.com/malfet	2022-07-13 13:50:15 +00:00
Terry Lam	54bdaf76d6	[PFC] Native UCC process group for Pytorch (#79918 ) Summary: This diff integrates UCC process group as a native component of Pytorch Distributed core. It is based on the existing torch-ucc (https://github.com/facebookresearch/torch_ucc) as the wrapper for UCC collective communication library. The environment and cmake variables are named in mirroring to the existing process groups such as NCCL and Gloo. Specifically, - USE_UCC: enables UCC PG. This defaults to OFF, so there is no breakage of existing builds that do not have UCX/UCC external libraries. - USE_SYSTEM_UCC: uses external UCX and UCC shared libraries that are set accordingly with UCX_HOME and UCC_HOME. Currently, this diff only supports USE_SYSTEM_UCC=ON, i.e., requiring users to specify external libraries for UCX and UCC. In subsequent diffs, we will add UCX and UCC repos as third-party dependencies in pytorch/third-party. Test Plan: Passed Torch-UCC tests that invoke UCC process group. For example: $ sh test/start_test.sh test/torch_allreduce_test.py --backend gloo --use-cuda ... Test allreduce: succeeded Differential Revision: D36973688 Pull Request resolved: https://github.com/pytorch/pytorch/pull/79918 Approved by: https://github.com/kwen2501, https://github.com/kingchc	2022-07-12 14:45:44 +00:00
Michael Suo	b349d15907	[build] fix compiling with clang13 (#80916 ) This check is incorrect; clang 13.1.0 doesn't exist. Pull Request resolved: https://github.com/pytorch/pytorch/pull/80916 Approved by: https://github.com/malfet	2022-07-06 02:35:46 +00:00
PyTorch MergeBot	1454515253	Revert "Enable Intel® VTune™ Profiler's Instrumentation and Tracing Technology APIs (ITT) to PyTorch (#63289 )" This reverts commit `f988aa2b3f`. Reverted https://github.com/pytorch/pytorch/pull/63289 on behalf of https://github.com/malfet due to broke trunk, see `f988aa2b3f`	2022-06-30 12:49:41 +00:00
Jing Xu	f988aa2b3f	Enable Intel® VTune™ Profiler's Instrumentation and Tracing Technology APIs (ITT) to PyTorch (#63289 ) More detailed description of benefits can be found at #41001. This is Intel's counterpart of NVidia’s NVTX (https://pytorch.org/docs/stable/autograd.html#torch.autograd.profiler.emit_nvtx). ITT is a functionality for labeling trace data during application execution across different Intel tools. For integrating Intel(R) VTune Profiler into Kineto, ITT needs to be integrated into PyTorch first. It works with both standalone VTune Profiler (https://www.intel.com/content/www/us/en/developer/tools/oneapi/vtune-profiler.html) and Kineto-integrated VTune functionality in the future. It works for both Intel CPU and Intel XPU devices. Pitch Add VTune Profiler's ITT API function calls to annotate PyTorch ops, as well as developer customized code scopes on CPU, like NVTX for NVidia GPU. This PR rebases the code changes at https://github.com/pytorch/pytorch/pull/61335 to the latest master branch. Usage example: ``` with torch.autograd.profiler.emit_itt(): for i in range(10): torch.itt.range_push('step_{}'.format(i)) model(input) torch.itt.range_pop() ``` cc @ilia-cher @robieta @chaekit @gdankel @bitfort @ngimel @orionr @nbcsm @guotuofeng @guyang3532 @gaoteng-git Pull Request resolved: https://github.com/pytorch/pytorch/pull/63289 Approved by: https://github.com/malfet	2022-06-30 05:14:03 +00:00
Mo Zhou	799d71378c	cmake: Fix variable typo for USE_SYSTEM_PYBIND11. (#80272 ) The correct variable name should be USE_SYSTEM_PYBIND11, as defined in the root CMakeLists.txt. In cmake/Dependencies.cmake, it is incorrectly written as USE_SYSTEM_BIND11, but cmake will not complain about this. Pull Request resolved: https://github.com/pytorch/pytorch/pull/80272 Approved by: https://github.com/suo	2022-06-27 02:08:07 +00:00
Toyohisa Kameyama	8adec19230	Specify "Generic" BLAS library name. (#74269 ) When PyTorch is used with an unregistered BLAS, Spack sets BLAS=Generic, and PyTorch searches only for libblas. If the BLAS package's library is not named libblas, `spack install py-torch` fails. This PR takes the BLAS library names from the GENERIC_BLAS_LIBRARIES environment variable, so that py-torch can find the BLAS library. Pull Request resolved: https://github.com/pytorch/pytorch/pull/74269 Approved by: https://github.com/kit1980	2022-06-20 18:44:54 +00:00
Sergii Dymchenko	f1fb575b9e	Remove -Wno-unused-but-set-variable for clang 13.0.0 (#79666 ) Fixes #74805 Pull Request resolved: https://github.com/pytorch/pytorch/pull/79666 Approved by: https://github.com/malfet	2022-06-16 04:26:39 +00:00
Mark Harfouche	221755cc71	Link BLAS privately (#78883 ) We've had some users report that they are getting symbol collisions when linking to blas. I don't see a need to re-export the blas library symbols. I figured I would share here for other packagers to be able to benefit too. xref: https://github.com/conda-forge/pytorch-cpu-feedstock/pull/116 xref: https://github.com/conda-forge/openblas-feedstock/issues/134 Pull Request resolved: https://github.com/pytorch/pytorch/pull/78883 Approved by: https://github.com/ezyang	2022-06-09 17:02:06 +00:00
Peter Bell	5cdf79fddc	Bump minimum CMake version to 3.13 Pull Request resolved: https://github.com/pytorch/pytorch/pull/76312 Approved by: https://github.com/malfet	2022-05-19 15:38:55 +00:00
Nikita Shulga	4b4a6a0b19	Use TensorPipe libuv in Gloo (#77312 ) Otherwise, it's possible to build TensorPipe with one version of libuv and gloo with another. Also, delete the strange `GLOO_INSTALL` logic, as none of the install artifacts are really packaged as part of PyTorch (it was probably used by Caffe2 builds). This helps solve a problem when compiling PyTorch for M1, where `libuv` is not available in conda. Pull Request resolved: https://github.com/pytorch/pytorch/pull/77312 Approved by: https://github.com/seemethere	2022-05-17 03:31:48 +00:00
Nikita Shulga	8473173c36	Remove breakpad dependency This functionality does not seem to be used and there are some requests to update dependency. Add `third_party` to torch_cpu include directories if compiling with Caffe2 support, as `caffe2/quantization/server/conv_dnnlowp_op.cc` depends on `third_party/fbgemm/src/RefImplementations.h` Pull Request resolved: https://github.com/pytorch/pytorch/pull/75394 Approved by: https://github.com/janeyx99, https://github.com/seemethere	2022-05-03 20:21:55 +00:00
Peter Bell	653892e288	Kineto: Don't search for CUPTI in default paths Should fix #75369 Searching the default system paths may point to different cuda toolkit versions, so we should restrict the search to only the paths passed explicitly. Pull Request resolved: https://github.com/pytorch/pytorch/pull/76188 Approved by: https://github.com/ezyang	2022-04-22 01:08:55 +00:00
PyTorch MergeBot	d79d9fa283	Revert "Remove breakpad dependency" This reverts commit `9aa3c7fd83`. Reverted https://github.com/pytorch/pytorch/pull/75394 on behalf of https://github.com/malfet	2022-04-17 17:58:51 +00:00
Nikita Shulga	9aa3c7fd83	Remove breakpad dependency This functionality does not seem to be used and there are some requests to update dependency Pull Request resolved: https://github.com/pytorch/pytorch/pull/75394 Approved by: https://github.com/janeyx99, https://github.com/seemethere	2022-04-17 17:43:45 +00:00
Min Si	42b4d0e934	[caffe2] remove unecessary RCCL dependency Summary: RCCL is required by two components in hipified Pytorch: (1) gloo and (2) hipified ProcessGroupNCCL. - For (1) the RCCL dependency is managed in `./third_party/gloo/cmake/Dependencies.cmake` and can be enabled/disabled via `USE_RCCL`. - For (2) the RCCL dependency is managed via `./cmake/Dependencies.cmake` and can be on/off via `USE_NCCL`. The additional dependency removed in this commit forced hipified Pytorch to load librccl.so even when USE_RCCL=OFF USE_NCCL=OFF is set, i.e., when using torch_ucc/ucc for AMD GPU mem type. This caused conflicts when we use a non-system default librccl.so (i.e., not in ROCM_PATH) for torch_ucc/ucc. This commit removes the unnecessary RCCL dependency. This will ensure a cleaner way to use torch_ucc with a user-specified RCCL library. Test Plan: ## Verify OSS pytorch on an AMD GPU machine (MI100) ``` ROCM_PATH=/opt/rocm-4.5.2 git clone https://github.com/pytorch/pytorch.git cd pytorch python3 tools/amd_build/build_amd.py USE_NCCL=0 USE_RCCL=0 USE_KINETO=0 with-proxy python3 setup.py develop USE_NCCL=0 USE_RCCL=0 USE_KINETO=0 with-proxy python3 setup.py install ``` log for develop: P492778257 log for install: P492778277 ## Verify OSS pytorch + TorchUCC on an AMD GPU machine (MI100) ``` export RCCL_INSTALL_DIR=/opt/rccl-rocm-rel-4.4 git clone https://github.com/facebookresearch/torch_ucc.git cd torch_ucc UCX_HOME=$RCCL_INSTALL_DIR UCC_HOME=$RCCL_INSTALL_DIR WITH_CUDA=$ROCM_PATH python setup.py # run param comm export HSA_ENABLE_SDMA=0 export LD_LIBRARY_PATH=$RCCL_INSTALL_DIR cd test git clone https://github.com/facebookresearch/param cd .. /bin/bash ./test/start_test.sh ./test/param/train/comms/pt/comms.py --backend ucc --device cuda --b 4 --e 4M --c 1 --collective all_reduce ``` - log for param comm: P493033836 - Verified librccl.so in `/opt/rccl-rocm-rel-4.4` is used via checking version string in log. "[localbuild]" is added in RCCL source. 
``` RCCL version 2.9.9+hip4.4 [localbuild] ``` Differential Revision: D35476911 Pull Request resolved: https://github.com/pytorch/pytorch/pull/75547 Approved by: https://github.com/malfet, https://github.com/jeffdaily	2022-04-12 17:45:08 +00:00
Michael Suo	e5bf87963d	Revert D34584878: [pytorch][PR] Add JIT graph fuser for oneDNN Graph API (Preview4) Test Plan: revert-hammer Differential Revision: D34584878 (`7dd0823011`) Original commit changeset: ce817aa8cc90 Original Phabricator Diff: D34584878 (`7dd0823011`) fbshipit-source-id: a941aaad34f8fe5f0c51f719f9f5c29b811c4d5b (cherry picked from commit a43262ec7521b1665b02a64d3f279e72ee2344b9)	2022-03-21 23:07:14 +00:00
chunyuan	7dd0823011	Add JIT graph fuser for oneDNN Graph API (Preview4) (#68111 ) Summary: ## Description Preview4 PR of this [RFC](https://github.com/pytorch/pytorch/issues/49444). On the basis of https://github.com/pytorch/pytorch/pull/50256, the below improvements are included: - The [preview4 release branch](https://github.com/oneapi-src/oneDNN/releases/tag/graph-v0.4.1) of the oneDNN Graph API is used - The fuser now works with the profiling graph executor. We have inserted type check nodes to guard the profiled tensor properties. ### User API: The optimization pass is disabled by default. Users could enable it by: ``` torch.jit.enable_onednn_fusion(True) ``` ### Performance: [pytorch/benchmark](https://github.com/pytorch/benchmark) tool is used to compare the performance: - SkyLake 8180 (1 socket of 28 cores): ![image](https://user-images.githubusercontent.com/65992142/151162305-05e44425-a24e-4d5e-94e1-743b40b87a8c.png) - SkyLake 8180 (single thread): ![image](https://user-images.githubusercontent.com/65992142/151162528-69f90b79-d08d-46b8-8775-d80a6ccbce8a.png) \* By mapping hardswish to oneDNN Graph, it’s 8% faster than PyTorch JIT (NNC + OFI) \** We expect performance gain after mapping transpose, contiguous & view to oneDNN graph ops ### Directory structure of the integration code Fuser-related code are placed under: ``` torch/csrc/jit/codegen/onednn/ ``` Optimization pass registration is done in: ``` torch/csrc/jit/passes/onednn_graph_fuser.h ``` CMake for the integration code is: ``` caffe2/CMakeLists.txt ``` ## Limitations - In this PR, we have only supported the optimization on Linux platform. The support on Windows and MacOS will be enabled as the next step. - We have only optimized the inference use case. 
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68111 Reviewed By: eellison Differential Revision: D34584878 Pulled By: malfet fbshipit-source-id: ce817aa8cc9052ee9ed930c9cf66be83449e61a4 (cherry picked from commit cd17683aa7d9c0947df45a1ab53627feff795587)	2022-03-21 22:12:19 +00:00
Nikita Shulga	14dcb5a1a0	Fix asmjit compilation with clang-13 By suppressing `deprecated-copy` and `unused-but-set-variable` warnings; otherwise compilation fails with implicit default copy constructor: ``` /Users/malfet/git/pytorch/pytorch/third_party/fbgemm/third_party/asmjit/src/asmjit/core/../core/radefs_p.h:174:22: error: definition of implicit copy constructor for 'RARegCount' is deprecated because it has a user-declared copy assignment operator [-Werror,-Wdeprecated-copy] inline RARegCount& operator=(const RARegCount& other) noexcept = default; ``` Fixes https://github.com/pytorch/pytorch/issues/74352 Pull Request resolved: https://github.com/pytorch/pytorch/pull/74379 Approved by: https://github.com/seemethere, https://github.com/atalman	2022-03-17 17:09:07 +00:00
Edward Z. Yang	493bbdc4fe	Use shared CUPTI by default Per https://github.com/pytorch/pytorch/issues/57744 statically linked CUPTI causes exception handling to break on certain compiler configurations, likely because CUPTI comes with incompatible libstdc++ symbols. Rather than pray that something reasonable happens, use the safer configuration (dynamic linking) by default and give a warning if the user inverts the setting. Signed-off-by: Edward Z. Yang <ezyang@fb.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/74009 Approved by: https://github.com/malfet	2022-03-16 21:04:12 +00:00
Andrey Talman	17b3ba148d	Set `BLAS_LIBRARIES` to `${MKL_LIBRARIES}` for MKL case (#72806 ) This reverts [suggestion](https://github.com/pytorch/pytorch/pull/49647#discussion_r677737470) proposed to https://github.com/pytorch/pytorch/pull/49647 Which is somehow sufficient to workaround symptoms of https://github.com/pytorch/pytorch/issue/72653 I.e. before this change, `BLAS_LIBRARIES` were set to `caffe2::mkl` which is an interface library with link property set as follows: `59dd84cab6/cmake/public/mkl.cmake (L10-L12)`	2022-02-16 07:14:27 -08:00
Aaron Enye Shi	8a43aa9538	[Kineto][Bug Fix] Avoid picking up old CUPTI headers (#72761 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/72761 By default, the CUPTI_INCLUDE_DIR will pick up cupti.h from /usr/include which is old (from 2017 on AWS), and missing many cupti headers. Use NO_DEFAULT_PATH to avoid that, instead search from the list of locations provided. Test Plan: Fixes missing headers error when building on AWS. (Avoids old cupti.h from /usr/include). Instead uses cupti.h from cuda/extras/CUPTI/include. ``` In file included from /scratch/aaronshi/pytorch/third_party/kineto/libkineto/src/CuptiRangeProfilerApi.cpp:13:0: /scratch/aaronshi/pytorch/third_party/kineto/libkineto/src/CuptiRangeProfilerApi.h:12:10: fatal error: cupti_profiler_target.h: No such file or directory #include <cupti_profiler_target.h> ^~~~~~~~~~~~~~~~~~~~~~~~~ compilation terminated. ``` and ``` /scratch/aaronshi/pytorch/third_party/kineto/libkineto/src/CuptiRangeProfilerApi.cpp:7:10: fatal error: nvperf_host.h: No such file or directory #include <nvperf_host.h> ^~~~~~~~~~~~~~~ compilation terminated. ``` Reviewed By: briancoutinho Differential Revision: D34191123 Pulled By: aaronenyeshi fbshipit-source-id: d84f80308c9939ba8ed504e667847d136a261453 (cherry picked from commit `33368bd93b`)	2022-02-15 22:43:03 +00:00
Peter Bell	bc1fb7a618	CMake: Limit python include directories to only python libraries (#69085 ) Summary: `include_directories` is old-style CMake which adds the include path to every file being compiled. This instead makes `python`, `numpy` and `pybind11` into targets that only `torch_python` and `caffe2_pybind_state` are linked to. So, python libraries can't be accidentally included elsewhere. Resubmit of https://github.com/pytorch/pytorch/issues/65654, Closes https://github.com/pytorch/pytorch/issues/65828 Pull Request resolved: https://github.com/pytorch/pytorch/pull/69085 Reviewed By: anjali411 Differential Revision: D33776456 Pulled By: malfet fbshipit-source-id: 018b0f6cd5a4f8c9e36df961deff832bc4afd479 (cherry picked from commit `57063107d6`)	2022-02-07 21:18:32 +00:00
Peter Bell	847dbb8684	CMake: Clean up unused definitions (#69216 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69216 This cleans up 4 pre-processor defines not used by any code: - HAVE_GCC_GET_CPUID - USE_GCC_GET_CPUID - USE_AVX - USE_AVX2 `cpuid` isn't used in PyTorch any more, we only use `cpuinfo`. `USE_AVX` is also not used, instead `HAVE__CPU_DEFINITIONS` tells you which `CPU_CAPABILITY` flags are being compiled. There is also `fbgemm`'s code path adding `third_party` as an include path, despite `fbgemm` having a dedicated include directory and a CMake setup that properly includes it. Test Plan: Imported from OSS Reviewed By: albanD Differential Revision: D33794424 Pulled By: malfet fbshipit-source-id: 99d504af088818d4a26c2f6ce67ec0d59a5eb703 (cherry picked from commit `2e099d41f0`)	2022-01-31 22:49:11 +00:00
Peter Bell	d693739248	CMake: Clean up unused definitions (#69216 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69216 Currently `torch_cpu` has command line arguments relating to cuda libraries e.g. `-DMAGMA_V2`. This happens because `include_directories` and `add_definitions` indiscriminately change the compile commands of all targets. Instead, creating a proper magma target allows limiting the flags to just `torch_cuda`. Test Plan: Imported from OSS Reviewed By: dagitses Differential Revision: D33794174 Pulled By: malfet fbshipit-source-id: 762eabf3b9576bef94e8caa3ed4764c0e2c72b08 (cherry picked from commit `f7d127b654`)	2022-01-31 22:49:11 +00:00
Peter Bell	5045c18bd1	Error if pocketfft is not found (#67909 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/67842 cc mruberry peterbell10 Pull Request resolved: https://github.com/pytorch/pytorch/pull/67909 Reviewed By: albanD Differential Revision: D33759534 Pulled By: malfet fbshipit-source-id: 03548c95fe233b812b303ce9603c20ff9f626c39 (cherry picked from commit `214624e254`)	2022-01-31 17:29:48 +00:00
Han Qi	1bc3571078	[pytorch][PR] Add ability for a mobile::Module to save as flatbuffer (#70201 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/70201 Included functions: save_mobile_module -> saves a mobile::Module to flatbuffer load_mobile_module_from_file -> loads a flatbuffer into mobile::Module parse_mobile_module -> parses from bytes or deserialized flatbuffer module object Compared to previous attempts, this diff only adds flatbuffer to cmake target and leaves fbcode/xplat ones unchanged. Test Plan: unittest Reviewed By: malfet, gmagogsfm Differential Revision: D33239362 fbshipit-source-id: b9ca36b83d6af2d78cc50b9eb9e2a6fa7fce0763	2022-01-12 16:30:39 -08:00
Andrey Talman	6c4437118b	Deprecating Python 3.6 (#70493 ) Summary: Deprecating python 3.6 from documentation and from cmake Pull Request resolved: https://github.com/pytorch/pytorch/pull/70493 Reviewed By: suo Differential Revision: D33433118 Pulled By: atalman fbshipit-source-id: c3adc7b75714efdb5b6acda5d4cddc068fb4a145	2022-01-05 11:46:32 -08:00
Michael Suo	1adb70c6f0	Revert D33409880: [pytorch][PR] Deprecating Python 3.6 Test Plan: revert-hammer Differential Revision: D33409880 (`d95be99561`) Original commit changeset: 4f9123398960 Original Phabricator Diff: D33409880 (`d95be99561`) fbshipit-source-id: 32dc1c3c07ef99a04fab7d0fb742cf4e6c4b718a	2022-01-04 16:37:09 -08:00
Andrey Talman	d95be99561	Deprecating Python 3.6 (#70493 ) Summary: Deprecating python 3.6 from documentation and from cmake Pull Request resolved: https://github.com/pytorch/pytorch/pull/70493 Reviewed By: malfet Differential Revision: D33409880 Pulled By: atalman fbshipit-source-id: 4f912339896096be95b344724a4d9ae88cdf1a8f	2022-01-04 14:41:27 -08:00
linuxone	f64906f470	ibm z14/15 SIMD support (#66407 ) Summary: https://github.com/pytorch/pytorch/issues/66406 Implemented z arch 14/15 vector SIMD additions. So far, all types besides bfloat have their SIMD implementation. It has 99% coverage and currently passes the local test. It is concise, and the main SIMD file is only one header file. It mostly uses template metaprogramming, but a few macros are left, with the intention of not modifying PyTorch much. Sleef supports z15. Pull Request resolved: https://github.com/pytorch/pytorch/pull/66407 Reviewed By: mrshenli Differential Revision: D33370163 Pulled By: malfet fbshipit-source-id: 0e5a57f31b22a718cd2a9ac59753fb468cdda140	2022-01-04 09:40:18 -08:00
Peter Bell	c34aa715fa	AT_MKL_SEQUENTIAL and build changes (#70259 ) Summary: Re-land of https://github.com/pytorch/pytorch/pull/69419 Pull Request resolved: https://github.com/pytorch/pytorch/pull/70259 Test Plan: Imported from OSS Reviewed By: malfet Differential Revision: D33246757 Pulled By: ngimel fbshipit-source-id: 738f8558d4cad6752be14108f9931ec3514f6682	2021-12-22 13:52:23 -08:00
Yanan Cao	17f3179d60	Back out "[pytorch][PR] Add ability for a mobile::Module to save as flatbuffer" (#69796 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69796 (Note: this ignores all push blocking failures!) Test Plan: External CI + Sandcastle Reviewed By: zhxchen17 Differential Revision: D33032671 fbshipit-source-id: dbf6690e960e25d6a5f19043cbe792add2acd7ef	2021-12-10 21:29:53 -08:00
Han Qi	d3649309e6	[pytorch][PR] Add ability for a mobile::Module to save as flatbuffer (#69306 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69306 Included functions: save_mobile_module -> saves a mobile::Module to flatbuffer load_mobile_module_from_file -> loads a flatbuffer into mobile::Module parse_mobile_module -> parses from bytes or deserialized flatbuffer Module object Test Plan: unittests Reviewed By: gmagogsfm Differential Revision: D32806835 fbshipit-source-id: 71913c6650e225634f878946bd16960d377a7f57	2021-12-09 14:53:31 -08:00
Alban Desmaison	00ebbd5ef6	Revert D32010095: [pytorch][PR] Add ability for a mobile::Module to save as flatbuffer Test Plan: revert-hammer Differential Revision: D32010095 (`41d35dc201`) Original commit changeset: d763b0557780 fbshipit-source-id: bf746a0389135c9f5f67f00f449435ce08fb5f6d	2021-12-02 06:41:40 -08:00
Han Qi	41d35dc201	Add ability for a mobile::Module to save as flatbuffer (#67351 ) Summary: Included functions: * save_mobile_module -> saves a mobile::Module to flatbuffer * load_mobile_module_from_file -> loads a flatbuffer into mobile::Module * parse_mobile_module -> parses from bytes or deserialized flatbuffer Module object Fixes #{issue number} Pull Request resolved: https://github.com/pytorch/pytorch/pull/67351 Reviewed By: iseeyuan Differential Revision: D32010095 Pulled By: qihqi fbshipit-source-id: d763b0557780f7c2661b6485105b045e41a5e8f1	2021-12-01 23:58:15 -08:00
Robert Blackwell	cee4e8f35d	Add FlexiBLAS build support per #64752 (#64815 ) Summary: To enable building torch+dependencies, set WITH_BLAS=flexi BLAS=FlexiBLAS Fixes https://github.com/pytorch/pytorch/issues/64752 Pull Request resolved: https://github.com/pytorch/pytorch/pull/64815 Reviewed By: jbschlosser Differential Revision: D31997745 Pulled By: albanD fbshipit-source-id: db208d59002f5896608a03132616400f09d972aa	2021-10-28 11:28:00 -07:00
Xiang Gao	b8dfb45ac2	Refactor cub namespace handling (#66219 ) Summary: This PR is to update PyTorch with the following cub changes: - Starting with cub 1.13.1, cub requires users to define `CUB_NS_QUALIFIER` if `CUB_NS_PREFIX` is also defined. Besides that, a new mechanism `CUB_WRAPPED_NAMESPACE` is added. This PR makes the following changes to PyTorch: - Starting with CUDA 11.5, define `CUB_WRAPPED_NAMESPACE` globally as an nvcc flag. - Fix caffe2 failures caused by the above change. - Add an `aten/src/ATen/cuda/cub_definitions.cuh` that defines helper macros about feature availability. Pull Request resolved: https://github.com/pytorch/pytorch/pull/66219 Reviewed By: bdhirsh Differential Revision: D31626931 Pulled By: ngimel fbshipit-source-id: 97ebf5ef671ade8bf46d0860edc317f22660f26d	2021-10-25 14:37:09 -07:00
Michael Suo	3ac2c74896	Revert D31082208: Use shared CUPTI by default Test Plan: revert-hammer Differential Revision: D31082208 (`8b0eae5aa8`) Original commit changeset: 14f66af92084 fbshipit-source-id: 0faff00832b7f79d476fd1f9f505142a548a76db	2021-10-12 14:37:54 -07:00
Edward Yang	8b0eae5aa8	Use shared CUPTI by default (#65401 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65401 Per https://github.com/pytorch/pytorch/issues/57744 statically linked CUPTI causes exception handling to break on certain compiler configurations, likely because CUPTI comes with incompatible libstdc++ symbols. Rather than pray that something reasonable happens, use the safer configuration (dynamic linking) by default and give a warning if the user inverts the setting. Signed-off-by: Edward Z. Yang <ezyang@fb.com> Test Plan: Imported from OSS Reviewed By: gdankel Differential Revision: D31082208 Pulled By: ezyang fbshipit-source-id: 14f66af920847e158436b5801c43f3124b109b34	2021-10-12 11:01:40 -07:00
Nikita Shulga	c373387709	Update CMake and use native CUDA language support (#62445 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62445 PyTorch currently uses the old style of compiling CUDA in CMake which is just a bunch of scripts in `FindCUDA.cmake`. Newer versions support CUDA natively as a language just like C++ or C. Test Plan: Imported from OSS Reviewed By: ejguan Differential Revision: D31503350 fbshipit-source-id: 2ee817edc9698531ae1b87eda3ad271ee459fd55	2021-10-11 09:05:48 -07:00
Michael Suo	9b40eaaaab	Revert D31193205: [pytorch][PR] CMake: Limit python include directories to only python libraries Test Plan: revert-hammer Differential Revision: D31193205 (`971c57f1d0`) Original commit changeset: 5c1b554a59d0 fbshipit-source-id: 5719b7df987ded6e7e212749a438db947656df87	2021-09-29 09:49:33 -07:00
Peter Bell	971c57f1d0	CMake: Limit python include directories to only python libraries (#65654 ) Summary: `include_directories` is old-style CMake which adds the include path to every file being compiled. This instead makes python, numpy and pybind11 into targets that only torch_python and caffe2_pybind_state are linked to. So, python libraries can't be accidentally included elsewhere. Pull Request resolved: https://github.com/pytorch/pytorch/pull/65654 Reviewed By: gchanan Differential Revision: D31193205 Pulled By: malfet fbshipit-source-id: 5c1b554a59d0e441a701a04ebb62f0032d38b208	2021-09-29 08:09:08 -07:00
Nikita Shulga	399214efd6	Revert D31172530: [pytorch][PR] Enable CUPTI for kineto by default on windows Test Plan: revert-hammer Differential Revision: D31172530 (`6b60884f12`) Original commit changeset: 2c69ed0282c5 fbshipit-source-id: 649e040a8c44b0f536a8db397b4325309a285934	2021-09-24 19:18:15 -07:00
Guangyun Han	6b60884f12	Enable CUPTI for kineto by default on windows (#65608 ) Summary: Retry of https://github.com/pytorch/pytorch/pull/62175 See https://github.com/pytorch/pytorch/pull/62175#issuecomment-926411151 for more information. malfet gdankel Pull Request resolved: https://github.com/pytorch/pytorch/pull/65608 Reviewed By: zou3519 Differential Revision: D31172530 Pulled By: gdankel fbshipit-source-id: 2c69ed0282c54fa6cdb6e604096d0370e230fd66	2021-09-24 13:00:49 -07:00
Nikita Shulga	bc02255d5e	Revert D30721329: [pytorch][PR] Enable CUPTI for kineto by default on windows. Test Plan: revert-hammer Differential Revision: D30721329 (`7dbc21bc2b`) Original commit changeset: aa1af47df8cc fbshipit-source-id: 565d50841e19a45f8798a490aa3aa6b9f69ca404	2021-09-23 22:14:32 -07:00
Guangyun Han	7dbc21bc2b	Enable CUPTI for kineto by default on windows. (#62175 ) Summary: It fixes nothing. To track this PR, please refer to https://github.com/pytorch/kineto/issues/356 Pull Request resolved: https://github.com/pytorch/pytorch/pull/62175 Reviewed By: ezyang Differential Revision: D30721329 Pulled By: gdankel fbshipit-source-id: aa1af47df8cc1b6f5ba2194447f62b902a6a9c84	2021-09-23 15:13:47 -07:00
Nick Kreeger	882b67dff4	Drop incremental linking on Windows with REL_WITH_DEB_INFO=1. (#64892 ) Summary: The library will no longer link properly on VS 2019 (14.29.30133). To ensure that engineers building on Windows can use and debug with this build type, incremental linking needs to be turned off for this build flag. Verified that this build type successfully builds, links, and provides debuggable Python modules on Windows. Pull Request resolved: https://github.com/pytorch/pytorch/pull/64892 Reviewed By: jbschlosser Differential Revision: D30902565 Pulled By: malfet fbshipit-source-id: e5286a4c6f45c7cbe4cdc1b98560129bd386970b	2021-09-14 09:44:18 -07:00
Peter Bell	e4f44bec27	Fix pocketfft include path in mobile build (#63714 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63714 PocketFFT was disabled for CMake < 3.9 but CMake 3.11 is the first version to support `INCLUDE_DIRECTORIES` as a target property. So updating to CMake 3.10 causes the mobile builds to fail. Instead of limiting the CMake support, this just adds the include directory to the entire target. Test Plan: Imported from OSS Reviewed By: bdhirsh Differential Revision: D30498369 Pulled By: malfet fbshipit-source-id: 83372e29c477c97e7015763b7c29d6d7e456bcef	2021-08-23 17:48:57 -07:00
driazati	bd8608cd5c	Use CMake for breakpad (#63186 ) Summary: We currently build breakpad from [this fork](https://github.com/driazati/breakpad) to include extra logic to restore signal handlers that were previously present. With some [new additions](https://github.com/google/breakpad/compare/main...driazati:main) this fork now includes a CMake based build, so we can add breakpad as a proper dependency rather than rely on including it in Docker images as a system library which is error prone (we have a bunch of images) and hard to extend to MacOS / Windows. This also includes some changes to the crash handling code to support MacOS / Windows in a similar way to Linux. ```python import torch # On Windows this writes crashes to C:\Users\<user>\AppData\pytorch_crashes # On MacOS/Linux this writes crashes to /tmp/pytorch_crashes torch.utils._crash_handler.enable_minidumps() # Easy way to cause a segfault and trigger the handler torch.bincount(input=torch.tensor([9223372036854775807])) ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/63186 Reviewed By: malfet, seemethere Differential Revision: D30318404 Pulled By: driazati fbshipit-source-id: 0d7daf3701cfaba5451cc529a0730272ab1eb1dc	2021-08-19 10:42:01 -07:00
Nikita Shulga	6e5d065b2b	Add pocketfft as submodule (#62841 ) Summary: Using https://github.com/mreineck/pocketfft Also delete explicit installation of pocketfft during the build as it will be available via submodule Limit PocketFFT support to cmake-3.10 or newer, as `set_source_files_properties` does not seem to work as expected with cmake-3.5 Partially addresses https://github.com/pytorch/pytorch/issues/62821 Pull Request resolved: https://github.com/pytorch/pytorch/pull/62841 Reviewed By: seemethere Differential Revision: D30140441 Pulled By: malfet fbshipit-source-id: d1a1cf1b43375321f5ec5b3d0b538f58082f7825	2021-08-17 15:29:56 -07:00
Kimish Patel	38c185189c	[Pytorch Edge] Enable kineto profiler on mobile via EdgeKinetoProfiler (#62419 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62419 This diff adds support for a CPU-only kineto profiler on mobile, thus enabling chrome trace generation on mobile. This brings the cpp API for mobile profiling on par with Torchscript. This is done via: 1. Utilizing debug handle annotations in KinetoEvent. 2. Adding post-processing capability, via callbacks, to KinetoThreadLocalState. 3. Creating a new RAII-style profiler, KinetoEdgeCPUProfiler, which can be used in the surrounding scope of model execution. This will write a chrome trace to the location specified in the profiler constructor. Test Plan: MobileProfiler.ModuleHierarchy Imported from OSS Reviewed By: raziel Differential Revision: D29993660 fbshipit-source-id: 0b44f52f9e9c5f5aff81ebbd9273c254c3c03299	2021-08-13 21:40:19 -07:00
Pruthvi Madugundu	ab7a472980	[ROCm] Update HIP_VERSION to TORCH_HIP_VERSION (#62786 ) Summary: - HIP_VERSION semantic versioning will change in ROCm4.3. The changes essentially remove the dependency on HIP_VERSION provided in the hip header to keep code compatible with older and newer versions of ROCm. - TORCH_HIP_VERSION is derived from HIP_VERSION_MAJOR and HIP_VERSION_MINOR Pull Request resolved: https://github.com/pytorch/pytorch/pull/62786 Reviewed By: bdhirsh Differential Revision: D30281682 Pulled By: seemethere fbshipit-source-id: e41e69fb9e13de5ddd1af99ba5bbdcbb7b64b673	2021-08-13 15:00:43 -07:00
Isuru Fernando	b58e04f156	Make sure FindLAPACK finds the same BLAS library (#49647 ) Summary: BLAS library is found by cmake/Dependencies.cmake and then LAPACK library is found by FindLAPACK.cmake which in turn calls FindBLAS.cmake. This means that we are searching for BLAS twice and they might be different things. By setting a few variables, this can be avoided. cc seemethere Pull Request resolved: https://github.com/pytorch/pytorch/pull/49647 Reviewed By: seemethere, ejguan Differential Revision: D29943680 Pulled By: malfet fbshipit-source-id: 3cbc350ea645a1a28dd92c19e5ee7f9eecdeff59	2021-08-02 20:41:00 -07:00
Can Balioglu	7565039ee9	Support system-provided Intel TBB (#61934 ) Summary: This PR: (1) enables the use of a system-provided Intel TBB for building PyTorch, (2) removes `tbb::task_scheduler_init` references since it was removed from TBB a while ago, and (3) marks the implementation of `_internal_set_num_threads` with a TODO as it requires a revision that fixes its thread allocation logic. Tested with `test/run_test`; no new tests are introduced since there are no behavioral changes (removal of `tbb::task_scheduler_init` has no impact on the runtime behavior). Pull Request resolved: https://github.com/pytorch/pytorch/pull/61934 Reviewed By: malfet Differential Revision: D29805416 Pulled By: cbalioglu fbshipit-source-id: 22042b428b57b8fede9dfcc83878d679a19561dd	2021-08-02 07:39:00 -07:00
Hong Xu	7acb8b71e1	Remove AVX detection code that duplicates FindAVX.cmake (#61748 ) Summary: This PR deletes some code in `MiscCheck.cmake` that performs exactly the same function as `FindAVX.cmake`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/61748 Reviewed By: ejguan Differential Revision: D29791282 Pulled By: malfet fbshipit-source-id: 6595fd1b61c8ae12b821fad8c9a34892dd52d213	2021-07-20 14:34:36 -07:00
Tongliang Liao	0afbb9e81e	`PYTHON_LIBRARY` may be set to empty or NOTFOUND. (#61230 ) Summary: Not sure why (maybe from dependencies?) but it can certainly break package lookup upon re-entry of cmake. So instead of checking whether they are defined, we should check whether there is any meaningful value inside. Fixes https://github.com/pytorch/pytorch/issues/59887 Pull Request resolved: https://github.com/pytorch/pytorch/pull/61230 Reviewed By: H-Huang Differential Revision: D29668766 Pulled By: malfet fbshipit-source-id: 79a59578740c4434327aff4f9a22eba9c4bf48d1	2021-07-13 07:09:31 -07:00
Nikita Shulga	4036820506	Add PocketFFT support (#60976 ) Summary: Needed on platforms that do not have MKL, such as aarch64 and M1. - Add `AT_POCKETFFT_ENABLED()` to Config.h.in - Introduce torch._C.has_spectral that is true if PyTorch was compiled with either MKL or PocketFFT - Modify spectral test to use skipCPUIfNoFFT instead of skipCPUIfNoMKL. Share implementation of `_out` functions as well as fft_fill_with_conjugate_symmetry_stub between MKL and PocketFFT implementations. Fixes https://github.com/pytorch/pytorch/issues/41592 Pull Request resolved: https://github.com/pytorch/pytorch/pull/60976 Reviewed By: walterddr, driazati, janeyx99, samestep Differential Revision: D29466530 Pulled By: malfet fbshipit-source-id: ac5edb3d40e7c413267825f92a5e8bc4bb249caf	2021-06-30 16:28:20 -07:00
Peter Bell	31a884987d	Remove some TH includes from ATen (#60323 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60323 Test Plan: Imported from OSS Reviewed By: malfet, anjali411 Differential Revision: D29252862 Pulled By: ngimel fbshipit-source-id: 9ea13495d382c04dfd52b8dd63314f53b7e83936	2021-06-22 10:55:17 -07:00
Luca Wehrstedt	08ce5eedf5	[reland] Move RPC agents to libtorch (#60170 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60170 Reland of #59939. Test Plan: CI Reviewed By: mrshenli Differential Revision: D29193234 fbshipit-source-id: ee2a90d6be961c10f91361512bdd4cadca43dd60	2021-06-18 05:15:09 -07:00
Mike Ruberry	f233274f30	Revert D28875276: Move RPC agents to libtorch Test Plan: revert-hammer Differential Revision: D28875276 (`fc50f91929`) Original commit changeset: f2f6970fd74d fbshipit-source-id: 3c52af652579733ebea8ddfb06576a0ce262bf78	2021-06-17 00:48:58 -07:00
Luca Wehrstedt	fc50f91929	Move RPC agents to libtorch (#59939 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59939 Test Plan: CI Reviewed By: mrshenli Differential Revision: D28875276 fbshipit-source-id: f2f6970fd74de5f112636e78edaa4410c61d8c45	2021-06-15 16:20:53 -07:00
Nikita Shulga	8845cbabf0	[CMake] Split caffe2::cudnn into public and private (#59721 ) Summary: This is only important for builds where cuDNN is linked statically into libtorch_cpu. Before this PR PyTorch wheels often accidentally contained several partial copies of cudnn_static library. Splitting the interface into header only (cudnn-public) and library+headers(cudnn-private) prevents those from happening. Preliminary step towards enabling optional linking whole cudnn_library to workaround issue reported in https://github.com/pytorch/pytorch/issues/50153 Pull Request resolved: https://github.com/pytorch/pytorch/pull/59721 Reviewed By: ngimel Differential Revision: D29000967 Pulled By: malfet fbshipit-source-id: f054df92b265e9494076ab16c247427b39da9336	2021-06-09 13:18:48 -07:00
Michael Wootton	e66015dadf	Add build support for kineto + rocm (#58401 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/58399 CMake changes to allow kineto to build with rocm support. Pull Request resolved: https://github.com/pytorch/pytorch/pull/58401 Reviewed By: mruberry Differential Revision: D28479807 Pulled By: walterddr fbshipit-source-id: fc01f05b2a5592ee1d1dbd71d2d4f7aec1bd74f7	2021-06-03 12:15:20 -07:00
neginraoof	599f5058cf	[ONNX] Update ONNX to rel-1.9 (#55889 ) (#57080 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/57080 ONNX optimizer is removed in ONNX 1.9 This PR removes ONNX optimizer from a C++ code path and uses `try-except` block in Python to keep it compatible with both ONNX-1.8 and 1.9. Test Plan: Imported from OSS Reviewed By: heitorschueroff Differential Revision: D28467330 Pulled By: malfet fbshipit-source-id: 5e4669dd0537648898e593f9e253da18d6dc7568 Co-authored-by: neginraoof <neginmr@utexas.edu> Co-authored-by: Nikita Shulga <nshulga@fb.com>	2021-06-02 08:27:17 -07:00
Jeff Daily	ba694520e5	[ROCm] fix JIT codegen (#57400 ) Summary: Fixes upcoming changes that are part of ROCm 4.2 and affect PyTorch JIT. - ROCM_VERSION macro must be available to both device and host compilation passes. - Unifies some of CUDA and HIP differences in the code generated. - NAN / POS_INFINITY / NEG_INFINITY - Do not hipify `extern __shared__` -> `HIP_DYNAMIC_SHARED()` macro [deprecated] - Differentiates bf16 codegen for HIP. - Optionally provides missing macros when using hiprtc precompiled header feature. Pull Request resolved: https://github.com/pytorch/pytorch/pull/57400 Reviewed By: ejguan Differential Revision: D28421065 Pulled By: malfet fbshipit-source-id: 215f476773c61d8b0d9d148a4e5f5d016f863074	2021-05-27 11:45:07 -07:00
Nikita Shulga	7179e7ea7b	[CMake] Prefer third_party/pybind11 by default (#58951 ) Summary: To align build behaviour with other third_party/ libraries, introduce the `USE_SYSTEM_PYBIND11 (`d55b25a633`)` build option, which is set to OFF by default, meaning PyTorch will be built with the bundled pybind11 even if another version is already installed locally. Fixes https://github.com/pytorch/pytorch/issues/58750 Pull Request resolved: https://github.com/pytorch/pytorch/pull/58951 Reviewed By: driazati Differential Revision: D28690411 Pulled By: malfet fbshipit-source-id: e56b5a8f2a23ee1834b2a6d3807f287149decf8c	2021-05-25 15:10:17 -07:00
Xiang Gao	6c70cbedb6	step 0 of cuDNN v8 convolution API integration (#51390 ) Summary: This PR is step 0 of adding PyTorch convolution bindings using the cuDNN frontend. The cuDNN frontend is the recommended way of using cuDNN v8 API. It is supposed to have faster release cycles, so that, for example, if people find a specific kernel has a bug, they can report it, and that kernel will be blocked in the cuDNN frontend and frameworks could just update that submodule without the need for waiting for a whole cuDNN release. The work is not complete, and this PR is only step 0. What this PR does: - Add cudnn-frontend as a submodule. - Modify cmake to build that submodule. - Add bindings for convolution forward in `Conv_v8.cpp`, which is disabled by a macro by default. - Tested manually by enabling the macro and run `test_nn.py`. All tests pass except those mentioned below. What this PR doesn't: - Only convolution forward, no backward. The backward will use v7 API. - No 64bit-indexing support for some configuration. This is a known issue of cuDNN, and will be fixed in a later cuDNN version. PyTorch will not implement any workaround for issue, but instead, v8 API should be disabled on problematic cuDNN versions. - No test beyond PyTorch's unit tests. - Not tested for correctness on real models. - Not benchmarked for performance. - Benchmark cache is not thread-safe. (This is marked as `FIXME` in the code, and will be fixed in a follow-up PR) - cuDNN benchmark is not supported. - There are failing tests, which will be resolved later: ``` FAILED test/test_nn.py::TestNNDeviceTypeCUDA::test_conv_cudnn_nhwc_cuda_float16 - AssertionError: False is not true : Tensors failed to compare as equal!With rtol=0.001 and atol=1e-05, found 32 element(s) (out of 32) whose difference(s) exceeded the margin of error (in... 
FAILED test/test_nn.py::TestNNDeviceTypeCUDA::test_conv_cudnn_nhwc_cuda_float32 - AssertionError: False is not true : Tensors failed to compare as equal!With rtol=1.3e-06 and atol=1e-05, found 32 element(s) (out of 32) whose difference(s) exceeded the margin of error (... FAILED test/test_nn.py::TestNNDeviceTypeCUDA::test_conv_large_cuda - RuntimeError: CUDNN_BACKEND_OPERATION: cudnnFinalize Failed cudnn_status: 9 FAILED test/test_nn.py::TestNN::test_Conv2d_depthwise_naive_groups_cuda - AssertionError: False is not true : Tensors failed to compare as equal!With rtol=0 and atol=1e-05, found 64 element(s) (out of 64) whose difference(s) exceeded the margin of error (including 0 an... FAILED test/test_nn.py::TestNN::test_Conv2d_deterministic_cudnn - RuntimeError: not supported yet FAILED test/test_nn.py::TestNN::test_ConvTranspose2d_groups_cuda_fp32 - RuntimeError: cuDNN error: CUDNN_STATUS_BAD_PARAM FAILED test/test_nn.py::TestNN::test_ConvTranspose2d_groups_cuda_tf32 - RuntimeError: cuDNN error: CUDNN_STATUS_BAD_PARAM ``` Although this is not a complete implementation of cuDNN v8 API binding, I still want to merge this first. This would allow me to do small and incremental work, for the ease of development and review. Pull Request resolved: https://github.com/pytorch/pytorch/pull/51390 Reviewed By: malfet Differential Revision: D28513167 Pulled By: ngimel fbshipit-source-id: 9cc20c9dec5bbbcb1f94ac9e0f59b10c34f62740	2021-05-19 12:54:09 -07:00
peter	432676599c	Stop installing libuv on Windows (#51936 ) Summary: CC gunandrose4u Pull Request resolved: https://github.com/pytorch/pytorch/pull/51936 Reviewed By: malfet Differential Revision: D28467662 Pulled By: seemethere fbshipit-source-id: 28d203ee3af13d6a3158f188c2e889e310ee6010	2021-05-17 08:52:29 -07:00
Ilia Cherniavskii	6997e7bd39	Update Kineto submodule (#58179 ) Summary: Update Kineto submodule, minor api changes Pull Request resolved: https://github.com/pytorch/pytorch/pull/58179 Test Plan: CI Reviewed By: gdankel Differential Revision: D28391369 Pulled By: ilia-cher fbshipit-source-id: 61fbf63d9ec2db66fac203944679e4b99cb0d568	2021-05-13 04:03:04 -07:00
Ilia Cherniavskii	c714596027	[kineto] Update Kineto submodule, cupti library paths (#57789 ) Summary: Update kineto submodule, improve cupti detection Pull Request resolved: https://github.com/pytorch/pytorch/pull/57789 Test Plan: CI Reviewed By: ngimel Differential Revision: D28297175 Pulled By: ilia-cher fbshipit-source-id: 5895270fae160097ae8872a592984d0e4a1b187b	2021-05-10 19:15:59 -07:00
Ilia Cherniavskii	65fad0ebd2	Expand Kineto platform support (ci-all) (#56323 ) Summary: Expanding support to all builds Pull Request resolved: https://github.com/pytorch/pytorch/pull/56323 Test Plan: CI Reviewed By: malfet Differential Revision: D28171478 Pulled By: ilia-cher fbshipit-source-id: 16bc752d1be3cbaeda5316f5d8a687ae05a83d22	2021-05-05 15:00:01 -07:00
davidriazati@fb.com	c44cbc63cc	Ignore more compiler warnings, unify WERROR options (#56630 ) Summary: This adds more compiler-warning ignores for everything that happens on a standard CPU build (CUDA builds still have a bunch of warnings, so we can't turn on `-Werror` everywhere yet). Pull Request resolved: https://github.com/pytorch/pytorch/pull/56630 Pulled By: driazati Reviewed By: malfet Differential Revision: D28005063 fbshipit-source-id: 541ed415eb0470ddf7e08c22c5eb6da9db26e9a0	2021-04-29 21:20:29 -07:00
davidriazati@fb.com	4b96fc060b	Remove distutils (#57040 ) Summary: [distutils](https://docs.python.org/3/library/distutils.html) is on its way out and will be deprecated-on-import for Python 3.10+ and removed in Python 3.12 (see [PEP 632](https://www.python.org/dev/peps/pep-0632/)). There's no reason for us to keep it around since all the functionality we want from it can be found in `setuptools` / `sysconfig`. `setuptools` includes a copy of most of `distutils` (which is fine to use according to the PEP), that it uses under the hood, so this PR also uses that in some places. Fixes #56527 Pull Request resolved: https://github.com/pytorch/pytorch/pull/57040 Pulled By: driazati Reviewed By: nikithamalgifb Differential Revision: D28051356 fbshipit-source-id: 1ca312219032540e755593e50da0c9e23c62d720	2021-04-29 12:10:11 -07:00
davidriazati@fb.com	d1b6383d65	Hide warnings for deprecated quantization APIs (#56291 ) Summary: These have a tracking task to actually fix them, but in the meantime they should not be clogging up everyone's build output (see #55952). Pull Request resolved: https://github.com/pytorch/pytorch/pull/56291 Pulled By: driazati Reviewed By: bertmaher Differential Revision: D27830229 fbshipit-source-id: f1e5d6e9b2c63d4a4ad99a1744a520f8c681c22b	2021-04-19 11:11:33 -07:00
Jeff Daily	e1752ffa04	[reland][ROCm] use hiprtc precompiled header (#55965 ) Summary: Revert "Revert D27449031 (`2a7df657fe`): [pytorch][PR] [ROCm] use hiprtc precompiled header". Reland PR https://github.com/pytorch/pytorch/issues/54350. This reverts commit `204ac21bf1`. The original PR was reverted under suspicion that it was causing CI instability, but it was instead due to a hardware failure. Pull Request resolved: https://github.com/pytorch/pytorch/pull/55965 Reviewed By: jbschlosser Differential Revision: D27755907 Pulled By: malfet fbshipit-source-id: 75bf0b9d888df3dee62f00a366b1123757e0474e	2021-04-15 15:47:56 -07:00
Eddie Yan	81f181567a	Add `USE_MAGMA` build flag (#55994 ) Summary: Many model pipelines/workflows don't use MAGMA even though it is included in the build by default. Leaving MAGMA kernels out of the build can save 60+MB of GPU memory when loading `libtorch_cuda.so` (tested on V100, current upstream master). A current sharp corner of this flag is that toggling it when rebuilding requires `torch/include/THC/THCGeneral.h` to be manually deleted by the user, as even running `make clean` or `setup.py` with `--cmake` does not properly regenerate it with the appropriate substitution for `#cmakedefine USE_MAGMA`. Is there a way to force the regeneration of the header during a rebuild? CC malfet ptrblck Pull Request resolved: https://github.com/pytorch/pytorch/pull/55994 Reviewed By: mruberry Differential Revision: D27766287 Pulled By: malfet fbshipit-source-id: 93deca57befa0febb9c5b7875ecf0015c547d421	2021-04-15 00:43:12 -07:00
Alexander Golynski	204ac21bf1	Revert D27449031: [pytorch][PR] [ROCm] use hiprtc precompiled header Test Plan: revert-hammer Differential Revision: D27449031 (`2a7df657fe`) Original commit changeset: 81a8d7847a47 fbshipit-source-id: b7b970c8ea4110357fba3ad4d52a86fa5641d90c	2021-04-01 06:42:04 -07:00
Jeff Daily	2a7df657fe	[ROCm] use hiprtc precompiled header (#54350 ) Summary: HIP's runtime compiler (hiprtc) is adding support for precompiled HIP headers in the ROCm 4.2 release. Conditionally add support for this feature. Using this feature will improve the ROCm torch wheel user experience; users will no longer need to install HIP headers separately to use torch JIT features. The use of this feature is conditionalized on a new ROCM_VERSION macro. Pull Request resolved: https://github.com/pytorch/pytorch/pull/54350 Reviewed By: H-Huang Differential Revision: D27449031 Pulled By: malfet fbshipit-source-id: 81a8d7847a47ce2bb253d1ea58740ef66ed154a3	2021-03-31 13:36:50 -07:00
Shruti Ramesh	f1f3c8b0fa	Adding PyTorch + DNNL + AMD BLIS path (#54953 ) Summary: These changes provide the user with an additional option to choose the DNNL+BLIS path for PyTorch. This assumes BLIS is already downloaded or built from source and the necessary library file is available at the location: $BLIS_HOME/lib/libblis.so and include files are available at: $BLIS_HOME/include/blis/blis.h and $BLIS_HOME/include/blis/cblas.h Export the below variables to build PyTorch with MKLDNN+BLIS and proceed with the regular installation procedure as below: $export BLIS_HOME=path-to-BLIS $export PATH=$BLIS_HOME/include/blis:$PATH LD_LIBRARY_PATH=$BLIS_HOME/lib:$LD_LIBRARY_PATH $export BLAS=BLIS USE_MKLDNN_CBLAS=ON WITH_BLAS=blis $python setup.py install CPU only Dockerfile to build PyTorch with AMD BLIS is available at : docker/cpu-blis/Dockerfile Example command line to build using the Dockerfile: sudo DOCKER_BUILDKIT=1 docker build . -t docker-image-repo-name Example command line to run the built docker container: sudo docker run --name container-name -it docker-image-repo-name Fixes #{issue number} Pull Request resolved: https://github.com/pytorch/pytorch/pull/54953 Reviewed By: glaringlee Differential Revision: D27466799 Pulled By: malfet fbshipit-source-id: e03bae9561be3a67429df3b1be95a79005c63050	2021-03-31 10:40:25 -07:00
Jeff Daily	1dffbe759b	[ROCm] utilize PUBLIC vs PRIVATE linking to avoid incorrect dependencies (#54727 ) Summary: Fixes the build of projects that depend on torch, such as torchaudio. Otherwise torchaudio will complain that gloo_hip is missing. Pull Request resolved: https://github.com/pytorch/pytorch/pull/54727 Reviewed By: H-Huang Differential Revision: D27361513 Pulled By: ezyang fbshipit-source-id: 714cc2db23e7adf3e89303e941b78c27625b9460	2021-03-30 19:22:56 -07:00
Michael Melesse	2620bce42a	[ROCM] load only hipfft separately past rocm4.1 (#54349 ) Summary: This PR is a follow up to https://github.com/pytorch/pytorch/pull/53408. It only loads hipfft if the version is rocm 4.1 or after and stops loading rocfft. This was done to resolve some issues observed in our internal ci due to conflicts. Pull Request resolved: https://github.com/pytorch/pytorch/pull/54349 Reviewed By: ezyang Differential Revision: D27374252 Pulled By: ngimel fbshipit-source-id: 724e80df5011ea8fabd81739e18ae8a13d3a7ea0	2021-03-26 19:54:25 -07:00
Michael Melesse	4c1af249fb	[ROCM] load hipfft separately from rocfft (#53408 ) Summary: This PR makes changes to how hipfft is loaded in pytorch. hipfft is packaged in a separate library to rocfft following rocm 4.1. We check the rocm version and if it is past rocm 4.1 we load hipfft in addition to rocfft. Pull Request resolved: https://github.com/pytorch/pytorch/pull/53408 Reviewed By: albanD Differential Revision: D26952702 Pulled By: malfet fbshipit-source-id: f42be304b587c060816e39d36f5c1a2cdc37bfab	2021-03-11 09:18:33 -08:00
Ilia Cherniavskii	795ed5ca3f	Enable Kineto in CPU builds (#53174 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/53174 Enable Kineto also in the CPU builds (non-mobile, non-Windows(atm)) Test Plan: CI Reviewed By: gdankel Differential Revision: D26776112 Pulled By: ilia-cher fbshipit-source-id: 8733f65c2993105136c853f2a7b6e497d0fa53bf	2021-03-04 19:15:52 -08:00
Ashkan Aliabadi	e5ecd1ddf8	[Vulkan]Fix build warnings-treated-as-error on Linux. (#52781 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/52781 Test Plan: Imported from OSS Reviewed By: SS-JIA Differential Revision: D26669311 Pulled By: AshkanAliabadi fbshipit-source-id: 78b08d0b264d4d5cf8af964c589b9b7d0ddc7311	2021-03-03 13:48:43 -08:00
Jeff Daily	d02ea9a141	[ROCm] add hipMAGMA support (#51238 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/48831. - CI image is updated to build hipMAGMA from source and set env MAGMA_HOME. - CMake is updated to separate different requirements for CUDA versus ROCm MAGMA. - Some unit tests that become enabled with MAGMA are currently skipped for ROCm due to failures. Fixing these failures will be follow-on work. Pull Request resolved: https://github.com/pytorch/pytorch/pull/51238 Reviewed By: ngimel Differential Revision: D26184918 Pulled By: malfet fbshipit-source-id: ada632f1ae7b413e8cae6543fe931dcd46985821	2021-02-01 22:09:33 -08:00
Luca Wehrstedt	b77f72b5a0	Enable TensorPipe's SHM transport (#50760 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/50760 The SHM transport uses shared-memory-backed ringbuffers to transfer small payloads between processes on the same machine. It was disabled in v1.6 due to a CMake mishap but we've since realized that it also doesn't work that well in docker and other setups. Enabling it here to see whether CircleCI fails. ghstack-source-id: 120470890 Test Plan: Exported three times to CircleCI with tests consistently passing Reviewed By: mrshenli Differential Revision: D23814828 fbshipit-source-id: f355cb6515776debad536924de4f4d3fbb05a874	2021-01-27 11:45:09 -08:00
Jeff Daily	b2e5617553	[ROCm] rename HIP_HCC_FLAGS to HIP_CLANG_FLAGS (#50917 ) Summary: ROCm 3.5 replaced hcc with hip-clang and deprecated HIP_HCC_FLAGS. HIP_CLANG_FLAGS should be used moving forward. HIP_HCC_FLAGS will be removed soon. Pull Request resolved: https://github.com/pytorch/pytorch/pull/50917 Reviewed By: ejguan Differential Revision: D26008094 Pulled By: walterddr fbshipit-source-id: cfec4f96fbd9bd338834a841c37267f6a4703cab	2021-01-22 07:24:05 -08:00
Ilia Cherniavskii	e34992ebee	Set USE_KINETO=1 (#49897 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/49897 Resend of https://github.com/pytorch/pytorch/pull/49201 Test Plan: see 49201 Reviewed By: malfet Differential Revision: D25717102 Pulled By: ilia-cher fbshipit-source-id: 5e794a7f5fe160ca64ac9d190c4fd3e8f1e443e6	2021-01-22 00:09:21 -08:00
Luca Wehrstedt	112a583467	Enable TensorPipe's CMA channel (#50759 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/50759 ghstack-source-id: 120032288 Test Plan: Exported to CircleCI and tested Reviewed By: mrshenli Differential Revision: D25959326 fbshipit-source-id: be6df209ff3a79a8961acbda64ee7805a5c434a9	2021-01-20 10:53:47 -08:00
Ilia Cherniavskii	72b00a8a52	Revert D25480770: Set USE_KINETO=1 Test Plan: revert-hammer Differential Revision: D25480770 (`1a92802bde`) Original commit changeset: 037cd774f554 fbshipit-source-id: 6a6062195033ca91fcc0cfa1e890e47efc774ac1	2020-12-18 07:06:28 -08:00
Ilia Cherniavskii	1a92802bde	Set USE_KINETO=1 (#49201 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/49201 This unblocks kineto profiler for 1.8 release. This PR supersedes https://github.com/pytorch/pytorch/pull/48391 Note: this will somewhat increase the size of linux server binaries, because we add libkineto.a and libcupti_static.a: -rw-r--r-- 1 jenkins jenkins 1107502 Dec 10 21:16 build/lib/libkineto.a -rw-r--r-- 1 root root 13699658 Nov 13 2019 /usr/local/cuda/lib64/libcupti_static.a Test Plan: CI https://github.com/pytorch/pytorch/pull/48391 Imported from OSS Reviewed By: ngimel Differential Revision: D25480770 fbshipit-source-id: 037cd774f5547d9918d6055ef5cc952a54e48e4c	2020-12-18 01:48:10 -08:00
Abdelrauf	95a1725a4a	Vsx initial support issue27678 (#41541 ) Summary: ### Pytorch Vec256 ppc64le support implemented types: - double - float - int16 - int32 - int64 - qint32 - qint8 - quint8 - complex_float - complex_double Notes: All basic vector operations are implemented: There are a few problems: - minimum maximum nan propagation for ppc64le is missing and was not checked - complex multiplication, division, sqrt, abs are implemented as PyTorch x86. they can overflow and have precision problems than std ones. That's why they were either excluded or tested in smaller domain range - precisions of the implemented float math functions ~~Besides, I added CPU_CAPABILITY for power. but as because of quantization errors for DEFAULT I had to undef and use vsx for DEFAULT too~~ #### Details ##### Supported math functions + plus sign means vectorized, - minus sign means missing, (implementation notes are added inside braces) (notes). Example: -(both ) means it was also missing on x86 side g( func_name) means vectorization is using func_name sleef - redirected to the Sleef unsupported function_name \| float \| double \| complex float \| complex double \|-- \| -- \| -- \| -- \| --\| acos \| sleef \| sleef \| f(asin) \| f(asin) asin \| sleef \| sleef \| +(pytorch impl) \| +(pytorch impl) atan \| sleef \| sleef \| f(log) \| f(log) atan2 \| sleef \| sleef \| unsupported \| unsupported cos \| +((ppc64le:avx_mathfun) ) \| sleef \| -(both) \| -(both) cosh \| f(exp) \| -(both) \| -(both) \| erf \| sleef \| sleef \| unsupported \| unsupported erfc \| sleef \| sleef \| unsupported \| unsupported erfinv \| - (both) \| - (both) \| unsupported \| unsupported exp \| + \| sleef \| - (x86:f()) \| - (x86:f()) expm1 \| f(exp) \| sleef \| unsupported \| unsupported lgamma \| sleef \| sleef \| \| log \| + \| sleef \| -(both) \| -(both) log10 \| f(log) \| sleef \| f(log) \| f(log) log1p \| f(log) \| sleef \| unsupported \| unsupported log2 \| f(log) \| sleef \| f(log) \| f(log) pow \| + f(exp) 
\| sleef \| -(both) \| -(both) sin \| +((ppc64le:avx_mathfun) ) \| sleef \| -(both) \| -(both) sinh \| f(exp) \| sleef \| -(both) \| -(both) tan \| sleef \| sleef \| -(both) \| -(both) tanh \| f(exp) \| sleef \| -(both) \| -(both) hypot \| sleef \| sleef \| -(both) \| -(both) nextafter \| sleef \| sleef \| -(both) \| -(both) fmod \| sleef \| sleef \| -(both) \| -(both) [Vec256 Test cases Pr https://github.com/pytorch/pytorch/issues/42685](https://github.com/pytorch/pytorch/pull/42685) Current list: - [x] Blends - [x] Memory: UnAlignedLoadStore - [x] Arithmetics: Plus, Minus, Multiplication, Division - [x] Bitwise: BitAnd, BitOr, BitXor - [x] Comparison: Equal, NotEqual, Greater, Less, GreaterEqual, LessEqual - [x] MinMax: Minimum, Maximum, ClampMin, ClampMax, Clamp - [x] SignManipulation: Absolute, Negate - [x] Interleave: Interleave, DeInterleave - [x] Rounding: Round, Ceil, Floor, Trunc - [x] Mask: ZeroMask - [x] SqrtAndReciprocal: Sqrt, RSqrt, Reciprocal - [x] Trigonometric: Sin, Cos, Tan - [x] Hyperbolic: Tanh, Sinh, Cosh - [x] InverseTrigonometric: Asin, ACos, ATan, ATan2 - [x] Logarithm: Log, Log2, Log10, Log1p - [x] Exponents: Exp, Expm1 - [x] ErrorFunctions: Erf, Erfc, Erfinv - [x] Pow: Pow - [x] LGamma: LGamma - [x] Quantization: quantize, dequantize, requantize_from_int - [x] Quantization: widening_subtract, relu, relu6 Missing: - [ ] Constructors, initializations - [ ] Conversion, Cast - [ ] Additional: imag, conj, angle (note: imag and conj only checked for float complex) #### Notes on tests and testing framework - some math functions are tested within domain range - mostly testing framework randomly tests against std implementation within the domain or within the implementation domain for some math functions. - some functions are tested against the local version. ~~For example, std::round and vector version of round differs. so it was tested against the local version~~ - round was tested against pytorch at::native::round_impl. 
~~for double type on Vsx vec_round failed for (even)+0 .5 values~~ . it was solved by using vec_rint - ~~complex types are not tested~~ After enabling complex testing due to precision and domain some of the complex functions failed for vsx and x86 avx as well. I will either test it against local implementation or check within the accepted domain - ~~quantizations are not tested~~ Added tests for quantizing, dequantize, requantize_from_int, relu, relu6, widening_subtract functions - the testing framework should be improved further - ~~For now `-DBUILD_MOBILE_TEST=ON `will be used for Vec256Test too~~ Vec256 Test cases will be built for each CPU_CAPABILITY Pull Request resolved: https://github.com/pytorch/pytorch/pull/41541 Reviewed By: zhangguanheng66 Differential Revision: D23922049 Pulled By: VitalyFedyunin fbshipit-source-id: bca25110afccecbb362cea57c705f3ce02f26098	2020-12-10 13:42:39 -08:00
peterjc123	5450614cf6	Correctly apply WIN32_LEAN_AND_MEAN to the whole repo (#49025 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/48895 Pull Request resolved: https://github.com/pytorch/pytorch/pull/49025 Reviewed By: zhangguanheng66 Differential Revision: D25399912 Pulled By: ezyang fbshipit-source-id: 9b7225b0e43511e0b8981c39035d814a4406c523	2020-12-08 19:38:23 -08:00
Rong Rong	b89c328493	Add fftw3 cmake as alternative for FFT/DFT (#48808 ) Summary: added cmake discovery in Dependencies.cmake for fftw3. Pull Request resolved: https://github.com/pytorch/pytorch/pull/48808 Reviewed By: janeyx99 Differential Revision: D25375320 Pulled By: walterddr fbshipit-source-id: cde3afc51eef9c621c7d19be7ad7573fc8b838c2	2020-12-08 10:35:33 -08:00


625 Commits