Commit Graph

625 Commits

Author SHA1 Message Date
cyy
ef5ff79019 [2/N] Clean up CMake target linking (#109986)
This PR cleans up more CMake target linking.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109986
Approved by: https://github.com/malfet
2023-10-01 05:36:08 +00:00
Aleksei Nikiforov
e05eb69c93 Don't link to libcpuinfo on s390x (#109875)
Don't even build it.
It does not support s390x.

This is a follow up for https://github.com/pytorch/pytorch/pull/109496

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109875
Approved by: https://github.com/kit1980
2023-09-26 12:43:35 +00:00
cyy
265acd4bea Clean up CMake target linking (#109959)
This PR cleans up more CMake target linking.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109959
Approved by: https://github.com/ezyang
2023-09-25 01:37:14 +00:00
cyy
ba0362a09e Remove unused build system checks and definitions (#109711)
Remove some outdated checks.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109711
Approved by: https://github.com/ezyang
2023-09-21 16:52:16 +00:00
Nikita Shulga
44448754c1 [CI] Fix sccaching of nvcc builds (#106811)
In cmake-3.26 or newer, `--options-file` is used, which renders nvcc outputs uncacheable by `sccache`, which were enable for CUDA-11 or newer builds by default by 6377a43814

Fix it by disabling RESPONSE_FILE use for CUDA compilation.
Test Plan: Check that `multiple input files` stats in `PyTorch Build Statistics` is down to 13 files again, see https://github.com/pytorch/pytorch/actions/runs/5801865789/job/15727069855?pr=106811#step:10:42423

Fixes https://github.com/pytorch/pytorch/issues/105004

Pull Request resolved: https://github.com/pytorch/pytorch/pull/106811
Approved by: https://github.com/seemethere
2023-08-09 00:25:11 +00:00
Jesse Cai
f81f9093ec [core][pruning][feature] cuSPARSELt build integration (#103700)
Summary:

This stack of PR's integrates cuSPARSELt into PyTorch.

This PR adds support for cuSPARSELt into the build process.
It adds in a new flag, USE_CUSPARSELT that defaults to false.

When USE_CUSPASRELT=1 is specified, the user can also specify
CUSPASRELT_ROOT, which defines the path to the library.

Compiling pytorch with cusparselt support can be done as follows:

``
USE_CUSPARSELT=1
CUSPARSELT_ROOT=/path/to/cusparselt

python setup.py develop
```

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/103700
Approved by: https://github.com/albanD
2023-08-02 12:48:39 +00:00
Jeff Daily
5379b5f927 [ROCm] use hipblas instead of rocblas (#105881)
- BatchLinearAlgebraLib.cpp is now split into one additional file
  - BatchLinearAlgebraLib.cpp uses only cusolver APIs
  - BatchLinearAlgebraLibBlas.cpp uses only cublas APIs
  - hipify operates at the file level and cannot mix cusolver and cublas APIs within the same file
- cmake changes to link against hipblas instead of rocblas
- hipify mappings changes to map cublas -> hipblas instead of rocblas

Pull Request resolved: https://github.com/pytorch/pytorch/pull/105881
Approved by: https://github.com/albanD
2023-07-31 20:42:55 +00:00
Rodrigo Kumpera
2636751fb9 [C10d] Add skeleton of LibUV backend. (#105672)
This commit hooks up tcpstore creation and build flags.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105672
Approved by: https://github.com/fduwjj
2023-07-28 13:19:06 +00:00
Connor Baker
0c8323e4a4 cmake: allow USE_SYSTEM_ZSTD (#104611)
Fixes #44255.

This is part of larger work I'm doing to allow for more `USE_SYSTEM_*` options to allow Nix to have faster re-builds of PyTorch: https://github.com/NixOS/nixpkgs/pull/239291.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104611
Approved by: https://github.com/ezyang, https://github.com/malfet
2023-07-05 04:47:35 +00:00
Nikita Shulga
3a823e4617 [BE][CMake] Do not pass -mfpu=neon on Apple (#104078)
Followup after https://github.com/pytorch/pytorch/pull/103929 that
get rid of an annoying warning, which will become an error in newer Xcode

<!--
copilot:poem
-->
### <samp>🤖 Generated by Copilot at 748d60d</samp>

> _`NEON_FOUND` is true_
> _But iOS may not like `-mfpu=neon`_
> _Check platform, then branch_

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104078
Approved by: https://github.com/huydhn, https://github.com/kit1980
2023-06-23 17:09:30 +00:00
cyy
1e108d9c21 enable more ASAN tests (#101483)
Recently, we are seeing some bugs found by ASAN such as #101400, I think enabling ASAN for more tests is necessary to catch more hidden bugs.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/101483
Approved by: https://github.com/huydhn
2023-06-15 05:21:15 +00:00
Jack Taylor
87c976b69d Remove deprecated HIP flags (#102271)
Removes the outdated HIP flags appended to HIP_CXX_FLAGS

The will help remove the following warnings in the pytorch build log

```
[6238/6889] Building CXX object caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/cudnn/hip/Conv_v8.cpp.o
cc1plus: warning: command line option ‘-Wno-duplicate-decl-specifier’ is valid for C/ObjC but not for C++
cc1plus: warning: unrecognized command line option ‘-Wno-unused-command-line-argument’
cc1plus: warning: unrecognized command line option ‘-Wno-exceptions’
cc1plus: warning: unrecognized command line option ‘-Wno-inconsistent-missing-override’
cc1plus: warning: unrecognized command line option ‘-Wno-macro-redefined’
```

This also updates the gloo submodule commit to include the similar change made to gloo.
597accfd79

Pull Request resolved: https://github.com/pytorch/pytorch/pull/102271
Approved by: https://github.com/malfet
2023-06-01 18:58:48 +00:00
Andres Lugo-Reyes
eaffd98880 Enable hipSOLVER in ROCm builds (#97370)
Enables the hipSolver backend for ROCm builds
--------------------------------------------------------------------------

- Minimum ROCm version requirement - 5.3
- Introduces new macro USE_LINALG_SOLVER the controls enablement of both cuSOLVER and hipSOLVER
- Adds hipSOLVER API to hipification process
- combines hipSOLVER and hipSPARSE mappings into single SPECIAL map that takes priority among normal mappings
- Torch api to be moved to hipsolver backend (as opposed to magma) include: torch.svd(), torch.geqrf(), torch.orgqr(), torch.ormqr()
- Will enable 100+ linalg unit tests for ROCm

Pull Request resolved: https://github.com/pytorch/pytorch/pull/97370
Approved by: https://github.com/malfet
2023-05-31 16:53:23 +00:00
Nikita Shulga
30cecc0e11 [MPS] Fix build regressions introduced by #92868 (#101036)
https://github.com/pytorch/pytorch/pull/92868 introduced  `OBJC` and `OBJCXX` language dialects, but fails to propagate some important flags, like OpenMP include path(if found),  `-fno-objc-arc` and `-Wno-unguarded-availability-new` suppression.

This PR remedies that and fixes https://github.com/pytorch/pytorch/issues/100925

<!--
copilot:summary
-->
### <samp>🤖 Generated by Copilot at 62677d4</samp>

This pull request improves the support for MPSGraph on Apple platforms by fixing some CMake flags for parallelism and memory management. It modifies `cmake/Dependencies.cmake` and `CMakeLists.txt` accordingly.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/101036
Approved by: https://github.com/atalman, https://github.com/huydhn
2023-05-10 04:15:41 +00:00
Aleksei Nikiforov
c130b8a716 Reintroduce s390x SIMD support (#99057)
Reintroduce s390x SIMD support

Use vectorized FMA to fix test precision failures

Pull Request resolved: https://github.com/pytorch/pytorch/pull/99057
Approved by: https://github.com/malfet
2023-04-15 00:24:44 +00:00
Milos Puzovic
2630144786 Call to mkldnn_matmul from aten::addmm on AArch64 (#91763)
We have noticed that on BERT_pytorch in torchbenchmark majority of time is spent in running GEMM in aten:addmm. At the moment this calls into BLAS routine, but on AArch64 it will be faster if it calls into mkldnn_matmul. Performance wise compared to build with OpenBLAS it runs faster 1.2x faster on 16 cores with batch size of 8 on Graviton3, while if fast math mode (mkldnn_matmul exposes through oneDNN and Arm Compute Library option to run GEMM with FP32 inputs using BBF16 operations) is enabled then it is 2.3x

Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/91763
Approved by: https://github.com/jgong5, https://github.com/ngimel, https://github.com/malfet
2023-04-01 04:25:57 +00:00
PyTorch MergeBot
3226ad21cf Revert "[Reland] fix some MKL detection issues of CMake (#94924)"
This reverts commit dc2b7aa955.

Reverted https://github.com/pytorch/pytorch/pull/94924 on behalf of https://github.com/atalman due to conda nightly build failures
2023-03-31 18:41:11 +00:00
cyy
dc2b7aa955 [Reland] fix some MKL detection issues of CMake (#94924)
This is reland of PR #94402 that tries to solve the additional link issues.
The  PR #94402 failed because caffe2::mkl had been converted to private dependency while libtorch_cuda_linalg hadn't linked to it explicitly. This is fixed in commit 4373bf0ae3dee32afc178f9d51a4154d6c5904c6
We also replace more references of MKL_LIBRARIES by caffe2::mkl in this PR.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94924
Approved by: https://github.com/malfet
2023-03-31 02:01:52 +00:00
Pruthvi Madugundu
08f125bcac [ROCm] Remove usage of deprecated ROCm component header includes (#97620)
- clang parameter 'amdgpu-target' changed to 'offload-arch'
- HIP and MIOpen includes path updated for extensions

Pull Request resolved: https://github.com/pytorch/pytorch/pull/97620
Approved by: https://github.com/ezyang, https://github.com/jithunnair-amd
2023-03-28 19:28:38 +00:00
wangxiyuan
4ab1588d99 Enhance error message for dependency check (#96642)
If python development library is missing when building pytorch from source, cmake will raise the error like:
```
CMake Error at cmake/Dependencies.cmake:1079 (if):
  if given arguments:

    "VERSION_LESS" "3"

  Unknown arguments specified
```

it's quite a misleading information that user would consider it's a syntax error or cmake version problem.

This PR add a check to ensure `PYTHONLIBS_VERSION_STRING` exist before using.

Related  #87993

Pull Request resolved: https://github.com/pytorch/pytorch/pull/96642
Approved by: https://github.com/kit1980
2023-03-22 08:42:48 +00:00
cyy
666efd8d5d Improve ASAN and TSAN handling in cmake (#93147)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/93147
Approved by: https://github.com/malfet
2023-03-07 14:10:13 +00:00
Peter Bell
c5f6092591 Use FindCUDAToolkit to find cuda dependencies (#82695)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82695
Approved by: https://github.com/malfet
2023-03-01 17:26:36 +00:00
PyTorch MergeBot
801b3f8fc7 Revert "Use FindCUDAToolkit to find cuda dependencies (#82695)"
This reverts commit 7289d22d67.

Reverted https://github.com/pytorch/pytorch/pull/82695 on behalf of https://github.com/peterbell10 due to Breaks torchaudio build
2023-02-28 02:29:09 +00:00
cyy
f27e09de04 Cleanup Windows warning suppression in CMake and fix some warnings in the source code (#94927)
This PR do two things:
1. It moves some Windows warning suppression from various CMake files into the main CMakeList.txt, following the conventions of gcc and clang.
2. It fixes some Windows warnings in the source code. Most importantly, it fixes lots of dll warnings by adjusting C10_API to TORCH_API or TORCH_PYTHON_API. There are still some dll warnings because some TORCH_API functions are actually built as part of libtorch_python

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94927
Approved by: https://github.com/malfet
2023-02-27 19:22:20 +00:00
Peter Bell
7289d22d67 Use FindCUDAToolkit to find cuda dependencies (#82695)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82695
Approved by: https://github.com/malfet
2023-02-21 22:35:17 +00:00
dllehr-amd
98012e4a59 [ROCm] hipGraph support for pytorch mainline (#88202)
With the release of ROCm 5.3 hip now supports a hipGraph implementation.

All necessary backend work and hipification is done to support the same functionality as cudaGraph.

Unit tests are modified to support a new TEST_GRAPH feature which allows us to create a single check for graph support instead of attempted to gather the CUDA level in annotations for every graph test

Pull Request resolved: https://github.com/pytorch/pytorch/pull/88202
Approved by: https://github.com/jithunnair-amd, https://github.com/pruthvistony, https://github.com/malfet
2023-02-14 22:18:56 +00:00
PyTorch MergeBot
e743d316e2 Revert "fix some MKL detection issues of CMake (#94402)"
This reverts commit 7ef46d40a1.

Reverted https://github.com/pytorch/pytorch/pull/94402 on behalf of https://github.com/malfet due to Broke binary builds, see https://github.com/pytorch/pytorch/issues/94751#issuecomment-1428562517
2023-02-13 22:09:40 +00:00
PyTorch MergeBot
36dfbb08f3 Revert "Update Cutlass to v2.11 (#94188)"
This reverts commit a0f9abdcb6.

Reverted https://github.com/pytorch/pytorch/pull/94188 on behalf of https://github.com/ezyang due to bouncing this to derisk branch cut
2023-02-13 19:03:36 +00:00
Aaron Gokaslan
a0f9abdcb6 Update Cutlass to v2.11 (#94188)
Now that we are on CUDA 11+ exclusively, we can update Nvidia's Cutlass to the next version. We also had to remove the cuda build flag : "-D__CUDA_NO_HALF_CONVERSIONS__" since Cutlass no longer builds without it.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94188
Approved by: https://github.com/ezyang, https://github.com/jansel
2023-02-12 20:45:03 +00:00
cyy
7ef46d40a1 fix some MKL detection issues of CMake (#94402)
This PR rewrites some logic of FindMKL.cmake and FindOpenMP.cmake to better detect the corresponding libraries and fix the infinitely recursion between them. It also contains some other fixes without changing the CMake interface.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94402
Approved by: https://github.com/malfet, https://github.com/Skylion007
2023-02-12 19:19:10 +00:00
cyy
5fa7120722 Simplify CMake CUDNN code (#91676)
1. Move CUDNN code to seperate module.
2. Merge CUDNN public and private targets into a single private target. There is no need to expose CUDNN dependency.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/91676
Approved by: https://github.com/malfet
2023-02-08 01:06:10 +00:00
cyy
9291f9b9e2 Simplify cmake code (#91546)
We use various newer CMake features to simplify build system:
1.Caffe2::threads is replaced by threads::threads.
2.Some unused MSVC flags are removed.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/91546
Approved by: https://github.com/malfet, https://github.com/Skylion007
2023-02-08 01:05:19 +00:00
cyy
afd7b581aa Simplify OpenMP detection in CMake (#91576)
We greatly simplify the handing of OpenMP in CMake by using caffe2::openmp target thoroughly. We follow the old behavior by defaulting to MKL OMP library and detecting OMP flags otherwise.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/91576
Approved by: https://github.com/malfet
2023-02-04 11:50:06 +00:00
Hansong Zhang
d996acfbc2 [XNNPACK] disable ARM_BF16 and ARM_FP16_VECTOR (#94020)
Summary: This is not used and will cause build failure

Test Plan: CI

Differential Revision: D42982023

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94020
Approved by: https://github.com/Skylion007, https://github.com/tiandiao123, https://github.com/digantdesai
2023-02-03 05:01:00 +00:00
Digant Desai
989722cd19 Use global PIC flag for XNNPACK (#93896)
Summary:
- XNNPACK Object libraries needs an explicit PIC flag when building static, PIC libXNPACK.a
- Without this link process runs into relocation errors
- Using this global switch to avoid updating XNNPACK CMake

Test Plan: CI

Differential Revision: D42944764

Pull Request resolved: https://github.com/pytorch/pytorch/pull/93896
Approved by: https://github.com/Skylion007, https://github.com/Neilblaze, https://github.com/salilsdesai
2023-02-02 23:38:21 +00:00
cyy
9710ac6531 Some CMake and CUDA cleanup given recent update to C++17 (#90599)
The main changes are:
1. Remove outdated checks for old compiler versions because they can't support C++17.
2. Remove outdated CMake checks because it now requires 3.18.
3. Remove outdated CUDA checks because we are moving to CUDA 11.

Almost all changes are in CMake files for easy audition.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90599
Approved by: https://github.com/soumith
2022-12-30 11:19:26 +00:00
Nikita Shulga
36ac095ff8 Migrate PyTorch to C++17 (#85969)
With CUDA-10.2 gone we can finally do it!

This PR mostly contains build system related changes, invasive functional ones are to be followed.
Among many expected tweaks to the build system, here are few unexpected ones:
 - Force onnx_proto project to be updated to C++17 to avoid `duplicate symbols` error when compiled by gcc-7.5.0, as storage rule for `constexpr` changed in C++17, but gcc does not seem to follow it
 - Do not use `std::apply` on CUDA but rely on the built-in variant, as it results in test failures when CUDA runtime picks host rather than device function when `std::apply` is invoked from CUDA code.
 - `std::decay_t` -> `::std::decay_t` and `std::move`->`::std::move` as VC++ for some reason claims that `std` symbol is ambigious
 - Disable use of `std::aligned_alloc` on Android, as its `libc++` does not implement it.

Some prerequisites:
 - https://github.com/pytorch/pytorch/pull/89297
 - https://github.com/pytorch/pytorch/pull/89605
 - https://github.com/pytorch/pytorch/pull/90228
 - https://github.com/pytorch/pytorch/pull/90389
 - https://github.com/pytorch/pytorch/pull/90379
 - https://github.com/pytorch/pytorch/pull/89570
 - https://github.com/facebookincubator/gloo/pull/336
 - https://github.com/facebookincubator/gloo/pull/343
 - 919676fb32

Fixes https://github.com/pytorch/pytorch/issues/56055

Pull Request resolved: https://github.com/pytorch/pytorch/pull/85969
Approved by: https://github.com/ezyang, https://github.com/kulinseth
2022-12-08 02:27:48 +00:00
Michael Wootton
5351176caa Kineto activity fix (#89785)
Continuation of https://github.com/pytorch/pytorch/pull/88207

A compile time guard was preventing ActivityType::CUDA from being available on rocm. This caused both the GPU_FALLBACK and CUDA modes to be active at the same time. So operators were being charged gpu time for the hipEventRecord ranges and the actual kernel execution times. This caused incorrect (and often negative) cuda times, in e.g. table().

Previously a cmake variable was not being propagated to a '-D', causing an issue on Windows, which uses cuda but not cupti.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89785
Approved by: https://github.com/jeffdaily, https://github.com/malfet
2022-12-08 00:24:55 +00:00
Dmytro Dzhulgakov
ae01615d75 Fix cupti search path in CMake (#88657)
Minor fix for when cuda is installed via conda. In this case the libraries are in `lib` and not `lib64`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88657
Approved by: https://github.com/kit1980, https://github.com/malfet
2022-11-10 23:44:52 +00:00
Pruthvi Madugundu
fbd08fb358 Introduce TORCH_DISABLE_GPU_ASSERTS (#84190)
- Asserts for CUDA are enabled by default
- Disabled for ROCm by default by setting `TORCH_DISABLE_GPU_ASSERTS` to `ON`
- Can be enabled for ROCm by setting above variable to`OFF` during build or can be forcefully enabled by setting `ROCM_FORCE_ENABLE_GPU_ASSERTS:BOOL=ON`

This is follow up changes as per comment in PR #81790, comment [link](https://github.com/pytorch/pytorch/pull/81790#issuecomment-1215929021)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/84190
Approved by: https://github.com/jeffdaily, https://github.com/malfet
2022-11-04 04:43:05 +00:00
PyTorch MergeBot
0fa23663cc Revert "Introduce TORCH_DISABLE_GPU_ASSERTS (#84190)"
This reverts commit 1e2c4a6e0e.

Reverted https://github.com/pytorch/pytorch/pull/84190 on behalf of https://github.com/malfet due to Needs internal changes, has to be landed via co-dev
2022-11-02 18:13:37 +00:00
Pruthvi Madugundu
1e2c4a6e0e Introduce TORCH_DISABLE_GPU_ASSERTS (#84190)
- Asserts for CUDA are enabled by default
- Disabled for ROCm by default by setting `TORCH_DISABLE_GPU_ASSERTS` to `ON`
- Can be enabled for ROCm by setting above variable to`OFF` during build or can be forcefully enabled by setting `ROCM_FORCE_ENABLE_GPU_ASSERTS:BOOL=ON`

This is follow up changes as per comment in PR #81790, comment [link](https://github.com/pytorch/pytorch/pull/81790#issuecomment-1215929021)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/84190
Approved by: https://github.com/jeffdaily, https://github.com/malfet
2022-11-02 17:41:57 +00:00
Jithun Nair
2e48b478e0 [ROCm] Use -rpath-link to fix libtinfo conflict (#83552)
Fixes issue building PyTorch for ROCm5.3 and above on Ubuntu20.04 because libtinfo6 from conda conflicts with the one from the distro causing symbol not found errors.

cc @jeffdaily @sunway513 @ROCmSupport
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83552
Approved by: https://github.com/malfet, https://github.com/pruthvistony
2022-10-28 03:50:43 +00:00
PyTorch MergeBot
ac0c13f665 Revert "[ROCm] Use -rpath-link to fix libtinfo conflict (#83552)"
This reverts commit a10446c4d8.

Reverted https://github.com/pytorch/pytorch/pull/83552 on behalf of https://github.com/kit1980 due to Broke ios/macos builds https://github.com/pytorch/pytorch/actions/runs/3329991911/jobs/5507911292
2022-10-26 16:43:13 +00:00
Jithun Nair
a10446c4d8 [ROCm] Use -rpath-link to fix libtinfo conflict (#83552)
Fixes issue building PyTorch for ROCm5.3 and above on Ubuntu20.04 because libtinfo6 from conda conflicts with the one from the distro causing symbol not found errors.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/83552
Approved by: https://github.com/malfet
2022-10-26 14:40:29 +00:00
Vladimír Aubrecht
409efebab8 Added define to fix issue with compatibility with latest Windows SDK (#85408)
Fixes #83820.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/85408
Approved by: https://github.com/ezyang
2022-10-12 15:44:28 +00:00
Jithun Nair
90b64e231e Update hipification logic for all ROCm headers (#85320)
...to remove deprecation warnings. Remove component-specific include dirs from include path

Pull Request resolved: https://github.com/pytorch/pytorch/pull/85320
Approved by: https://github.com/kit1980
2022-09-21 16:22:12 +00:00
John Detloff
e0229d6517 Remove caffe2 mobile (#84338)
We're no longer building Caffe2 mobile as part of our CI, and it adds a lot of clutter to our make files. Any lingering internal dependencies will use the buck build and so wont be effected.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84338
Approved by: https://github.com/dreiss
2022-09-08 01:49:55 +00:00
Nikita Shulga
62c8d30f9f [BE] Add append_cxx_flag_if_supported macro (#82883)
And use it throughout the CMakeLists and rectify `IF(APPLE)`/`IF(GNU_CXX_VERSION VERSION_GREATER A.B)` and so on

Also, add `target_compile_options_if_supported` and use it in `Dependencies.cmake` as well as in test's `CMakeListst.txt`

Delete `-Wno-unknown-warning-option` to test that conditions indeed working as expected
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82883
Approved by: https://github.com/seemethere
2022-08-10 14:32:26 +00:00
PyTorch MergeBot
d3a1f17fc7 Revert "[BE] Add append_cxx_flag_if_supported macro (#82883)"
This reverts commit d7e6aaa59b.

Reverted https://github.com/pytorch/pytorch/pull/82883 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally
2022-08-10 10:27:59 +00:00
Nikita Shulga
d7e6aaa59b [BE] Add append_cxx_flag_if_supported macro (#82883)
And use it throughout the CMakeLists and rectify `IF(APPLE)`/`IF(GNU_CXX_VERSION VERSION_GREATER A.B)` and so on

Also, add `target_compile_options_if_supported` and use it in `Dependencies.cmake` as well as in test's `CMakeListst.txt`

Delete `-Wno-unknown-warning-option` to test that conditions indeed working as expected
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82883
Approved by: https://github.com/seemethere
2022-08-08 21:04:09 +00:00
Nikita Shulga
83086b7f45 Fix NCCL detection by Gloo (#82773)
Instruct Gloo to always use bundled version of the library by passing `NCCL_EXTERNAL`
Otherwise, it would link with shared library if one could be found in the system
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82773
Approved by: https://github.com/ngimel
2022-08-04 16:26:30 +00:00
zhang, xiaobing
86b86202b5 fix torch.config can't respect USE_MKLDNN flag issue (#75001)
Fixes https://github.com/pytorch/pytorch/issues/74949, which reports that torch.config can't respect USE_MKLDNN flag.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/75001
Approved by: https://github.com/malfet
2022-07-17 15:00:48 +00:00
Nikita Shulga
17fe7ce0e4 [BE] Delete Win specific case for CMake older than 3.1 (#81411)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81411
Approved by: https://github.com/janeyx99
2022-07-14 00:31:31 +00:00
Tongliang Liao
dff70a5e1a Make language std configurable. (#75519)
RocksDB 7 starts to use C++17 in header.
We should make this configurable, in case user needs higher std version.

List of files to changed is found by `git grep 'CMAKE_[^_]*_STANDARD'`.
Doc string is from CMake code.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75519
Approved by: https://github.com/malfet
2022-07-13 14:21:27 +00:00
Jing Xu
3c7044728b Enable Intel® VTune™ Profiler's Instrumentation and Tracing Technology APIs (ITT) to PyTorch (#63289)
More detailed description of benefits can be found at #41001. This is Intel's counterpart of NVidia’s NVTX (https://pytorch.org/docs/stable/autograd.html#torch.autograd.profiler.emit_nvtx).

ITT is a functionality for labeling trace data during application execution across different Intel tools.
For integrating Intel(R) VTune Profiler into Kineto, ITT needs to be integrated into PyTorch first. It works with both standalone VTune Profiler [(https://www.intel.com/content/www/us/en/developer/tools/oneapi/vtune-profiler.html](https://www.intel.com/content/www/us/en/developer/tools/oneapi/vtune-profiler.html)) and Kineto-integrated VTune functionality in the future.
It works for both Intel CPU and Intel XPU devices.

Pitch
Add VTune Profiler's ITT API function calls to annotate PyTorch ops, as well as developer customized code scopes on CPU, like NVTX for NVidia GPU.

This PR rebases the code changes at https://github.com/pytorch/pytorch/pull/61335 to the latest master branch.

Usage example:
```
with torch.autograd.profiler.emit_itt():
    for i in range(10):
        torch.itt.range_push('step_{}'.format(i))
        model(input)
        torch.itt.range_pop()
```

cc @ilia-cher @robieta @chaekit @gdankel @bitfort @ngimel @orionr @nbcsm @guotuofeng @guyang3532 @gaoteng-git
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63289
Approved by: https://github.com/malfet
2022-07-13 13:50:15 +00:00
Terry Lam
54bdaf76d6 [PFC] Native UCC process group for Pytorch (#79918)
Summary:
This diff integrates UCC process group as a native component of Pytorch Distributed core. It is based on the existing torch-ucc (https://github.com/facebookresearch/torch_ucc) as the wrapper for UCC collective communication library.
The environment and cmake variables are named in mirroring to the existing process groups such as NCCL and Gloo. Specifically,
- USE_UCC: enables UCC PG. This defaults to OFF, so there is no breakage of existing builds that do not have UCX/UCC external libraries.
- USE_SYSTEM_UCC: uses external UCX and UCC shared libraries that are set accordingly with UCX_HOME and UCC_HOME.

Currently, this diff only supports USE_SYSTEM_UCC=ON, i.e., requiring users to specify external libraries for UCX and UCC. In subsequent diffs, we will add UCX and UCC repos as third-party dependencies in pytorch/third-party.

Test Plan:
Passed Torch-UCC tests that invoke UCC process group. For example:

$ sh test/start_test.sh test/torch_allreduce_test.py --backend gloo --use-cuda
...
Test allreduce: succeeded

Differential Revision: D36973688

Pull Request resolved: https://github.com/pytorch/pytorch/pull/79918
Approved by: https://github.com/kwen2501, https://github.com/kingchc
2022-07-12 14:45:44 +00:00
Michael Suo
b349d15907 [build] fix compiling with clang13 (#80916)
This check is incorrect; clang 13.1.0 doesn't exist.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80916
Approved by: https://github.com/malfet
2022-07-06 02:35:46 +00:00
PyTorch MergeBot
1454515253 Revert "Enable Intel® VTune™ Profiler's Instrumentation and Tracing Technology APIs (ITT) to PyTorch (#63289)"
This reverts commit f988aa2b3f.

Reverted https://github.com/pytorch/pytorch/pull/63289 on behalf of https://github.com/malfet due to broke trunk, see f988aa2b3f
2022-06-30 12:49:41 +00:00
Jing Xu
f988aa2b3f Enable Intel® VTune™ Profiler's Instrumentation and Tracing Technology APIs (ITT) to PyTorch (#63289)
More detailed description of benefits can be found at #41001. This is Intel's counterpart of NVidia’s NVTX (https://pytorch.org/docs/stable/autograd.html#torch.autograd.profiler.emit_nvtx).

ITT is a functionality for labeling trace data during application execution across different Intel tools.
For integrating Intel(R) VTune Profiler into Kineto, ITT needs to be integrated into PyTorch first. It works with both standalone VTune Profiler [(https://www.intel.com/content/www/us/en/developer/tools/oneapi/vtune-profiler.html](https://www.intel.com/content/www/us/en/developer/tools/oneapi/vtune-profiler.html)) and Kineto-integrated VTune functionality in the future.
It works for both Intel CPU and Intel XPU devices.

Pitch
Add VTune Profiler's ITT API function calls to annotate PyTorch ops, as well as developer customized code scopes on CPU, like NVTX for NVidia GPU.

This PR rebases the code changes at https://github.com/pytorch/pytorch/pull/61335 to the latest master branch.

Usage example:
```
with torch.autograd.profiler.emit_itt():
    for i in range(10):
        torch.itt.range_push('step_{}'.format(i))
        model(input)
        torch.itt.range_pop()
```

cc @ilia-cher @robieta @chaekit @gdankel @bitfort @ngimel @orionr @nbcsm @guotuofeng @guyang3532 @gaoteng-git
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63289
Approved by: https://github.com/malfet
2022-06-30 05:14:03 +00:00
Mo Zhou
799d71378c cmake: Fix variable typo for USE_SYSTEM_PYBIND11. (#80272)
The correct variable name should be USE_SYSTEM_PYBIND11, as defined in
the root CMakeLists.txt. In cmake/Dependencies.cmake, it is incorrectly
written as USE_SYSTEM_BIND11, but cmake will not complain about this.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/80272
Approved by: https://github.com/suo
2022-06-27 02:08:07 +00:00
Toyohisa Kameyama
8adec19230 Specify "Generic" BLAS library name. (#74269)
When we use pytorch with unregistered blas, spack set BLAS=Generic.
pytorch is searched only libblas.
If the blas package's blas library name is not libblas, spack install py-torch is failed.

This PR set blas lirary names to GENERIC_BLAS_LIBRARIES environment variable, and py-torch is found blas library.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/74269
Approved by: https://github.com/kit1980
2022-06-20 18:44:54 +00:00
Sergii Dymchenko
f1fb575b9e Remove -Wno-unused-but-set-variable for clang 13.0.0 (#79666)
Fixes #74805

Pull Request resolved: https://github.com/pytorch/pytorch/pull/79666
Approved by: https://github.com/malfet
2022-06-16 04:26:39 +00:00
Mark Harfouche
221755cc71 Link BLAS privately (#78883)
We've some users report that they are getting symbol collisions when linking to blas.

I don't see a need to re-export the blas library symbols.

I figured I would share here for other packagers to be able to benefit too.

xref: https://github.com/conda-forge/pytorch-cpu-feedstock/pull/116
xref: https://github.com/conda-forge/openblas-feedstock/issues/134
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78883
Approved by: https://github.com/ezyang
2022-06-09 17:02:06 +00:00
Peter Bell
5cdf79fddc Bump minimum CMake version to 3.13
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76312

Approved by: https://github.com/malfet
2022-05-19 15:38:55 +00:00
Nikita Shulga
4b4a6a0b19 Use TensorPipe libuv in Gloo (#77312)
Otherwise, its possible to build TensorPipe with one version of libuv
and gloo with another.

Also, delete strange `GLOO_INSTALL` logic, as none of the install artifacts are really packaged as part of PyTorch (and it were probably used by Caffe2 builds)

This helps solve problem for compiling PyTorch for M1, where `libuv` is not available in conda

Pull Request resolved: https://github.com/pytorch/pytorch/pull/77312
Approved by: https://github.com/seemethere
2022-05-17 03:31:48 +00:00
Nikita Shulga
8473173c36 Remove breakpad dependency
This functionality does not seem to be used
and there are some requests to update dependency.

Add `third_party` to torch_cpu include directories if compiling with
Caffe2 support, as `caffe2/quantization/server/conv_dnnlowp_op.cc` depends on `third_party/fbgemm/src/RefImplementations.h`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/75394
Approved by: https://github.com/janeyx99, https://github.com/seemethere
2022-05-03 20:21:55 +00:00
Peter Bell
653892e288 Kineto: Don't search for CUPTI in default paths
Should fix #75369

Searching the default system paths may point to different cuda toolkit
versions, so we should restrict the search to only the paths passed explicitly.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/76188

Approved by: https://github.com/ezyang
2022-04-22 01:08:55 +00:00
PyTorch MergeBot
d79d9fa283 Revert "Remove breakpad dependency"
This reverts commit 9aa3c7fd83.

Reverted https://github.com/pytorch/pytorch/pull/75394 on behalf of https://github.com/malfet
2022-04-17 17:58:51 +00:00
Nikita Shulga
9aa3c7fd83 Remove breakpad dependency
This functionality does not seem to be used
and there are some requests to update dependency

Pull Request resolved: https://github.com/pytorch/pytorch/pull/75394
Approved by: https://github.com/janeyx99, https://github.com/seemethere
2022-04-17 17:43:45 +00:00
Min Si
42b4d0e934 [caffe2] remove unecessary RCCL dependency
Summary:
RCCL is required by two components in hipified Pytorch: (1) gloo and (2) hipified ProcessGroupNCCL.

- For (1) the RCCL dependency is managed in `./third_party/gloo/cmake/Dependencies.cmake` and can be enabled/disabled via `USE_RCCL`.
- For (2) the RCCL dependency is managed via `./cmake/Dependencies.cmake` and can be on/off via `USE_NCCL`.

The additional dependency removed in this commit forced hipified Pytorch to load librccl.so even when USE_RCCL=OFF USE_NCCL=OFF is set, i.e., when using torch_ucc/ucc for AMD GPU mem type. This caused conflicts when we use a non-system default librccl.so (i.e., not in ROCM_PATH) for torch_ucc/ucc.

This commit removes the unnecessary RCCL dependency. This will ensure a cleaner way to use torch_ucc with a user-specified RCCL library.

Test Plan:
## Verify OSS pytorch on an AMD GPU machine (MI100)
```
ROCM_PATH=/opt/rocm-4.5.2
git clone https://github.com/pytorch/pytorch.git
cd pytorch
python3 tools/amd_build/build_amd.py
USE_NCCL=0 USE_RCCL=0 USE_KINETO=0 with-proxy python3 setup.py develop
USE_NCCL=0 USE_RCCL=0 USE_KINETO=0 with-proxy python3 setup.py install
```
log for develop: P492778257
log for install: P492778277

## Verify OSS pytorch + TorchUCC on an AMD GPU machine (MI100)
```
export RCCL_INSTALL_DIR=/opt/rccl-rocm-rel-4.4
git clone https://github.com/facebookresearch/torch_ucc.git
cd torch_ucc
UCX_HOME=$RCCL_INSTALL_DIR UCC_HOME=$RCCL_INSTALL_DIR WITH_CUDA=$ROCM_PATH python setup.py

# run param comm
export HSA_ENABLE_SDMA=0
export LD_LIBRARY_PATH=$RCCL_INSTALL_DIR

cd test
git clone https://github.com/facebookresearch/param
cd ..
/bin/bash ./test/start_test.sh ./test/param/train/comms/pt/comms.py --backend ucc --device cuda --b 4 --e 4M --c 1 --collective all_reduce
```
- log for param comm: P493033836
- Verified librccl.so in `/opt/rccl-rocm-rel-4.4` is used via checking version string in log. "[localbuild]" is added in RCCL source.
```
RCCL version 2.9.9+hip4.4 [localbuild]
```

Differential Revision: D35476911

Pull Request resolved: https://github.com/pytorch/pytorch/pull/75547
Approved by: https://github.com/malfet, https://github.com/jeffdaily
2022-04-12 17:45:08 +00:00
Michael Suo
e5bf87963d Revert D34584878: [pytorch][PR] Add JIT graph fuser for oneDNN Graph API (Preview4)
Test Plan: revert-hammer

Differential Revision:
D34584878 (7dd0823011)

Original commit changeset: ce817aa8cc90

Original Phabricator Diff: D34584878 (7dd0823011)

fbshipit-source-id: a941aaad34f8fe5f0c51f719f9f5c29b811c4d5b
(cherry picked from commit a43262ec7521b1665b02a64d3f279e72ee2344b9)
2022-03-21 23:07:14 +00:00
chunyuan
7dd0823011 Add JIT graph fuser for oneDNN Graph API (Preview4) (#68111)
Summary:
## Description
Preview4 PR of this [RFC](https://github.com/pytorch/pytorch/issues/49444).

On the basis of https://github.com/pytorch/pytorch/pull/50256, the below improvements are included:

- The [preview4 release branch](https://github.com/oneapi-src/oneDNN/releases/tag/graph-v0.4.1) of the oneDNN Graph API is used
- The fuser now works with the profiling graph executor. We have inserted type check nodes to guard the profiled tensor properties.

### User API:
The optimization pass is disabled by default. Users could enable it by:
```
torch.jit.enable_onednn_fusion(True)
```

### Performance:
[pytorch/benchmark](https://github.com/pytorch/benchmark) tool is used to compare the performance:
- SkyLake 8180 (1 socket of 28 cores):

  ![image](https://user-images.githubusercontent.com/65992142/151162305-05e44425-a24e-4d5e-94e1-743b40b87a8c.png)

- SkyLake 8180 (single thread):

  ![image](https://user-images.githubusercontent.com/65992142/151162528-69f90b79-d08d-46b8-8775-d80a6ccbce8a.png)
 \* By mapping hardswish to oneDNN Graph, it’s 8% faster than PyTorch JIT (NNC + OFI)
  \** We expect performance gain after mapping transpose, contiguous & view to oneDNN graph ops

### Directory structure of the integration code
Fuser-related code are placed under:
```
torch/csrc/jit/codegen/onednn/
```

Optimization pass registration is done in:
```
torch/csrc/jit/passes/onednn_graph_fuser.h
```

CMake for the integration code is:
```
caffe2/CMakeLists.txt
```

## Limitations

- In this PR, we have only supported the optimization on Linux platform. The support on Windows and MacOS will be enabled as the next step.
- We have only optimized the inference use case.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68111

Reviewed By: eellison

Differential Revision: D34584878

Pulled By: malfet

fbshipit-source-id: ce817aa8cc9052ee9ed930c9cf66be83449e61a4
(cherry picked from commit cd17683aa7d9c0947df45a1ab53627feff795587)
2022-03-21 22:12:19 +00:00
Nikita Shulga
14dcb5a1a0 Fix asmjit compilation with clang-13
By suppressed `deprecated-copy` and `unused-but-set-variable` warnings, otherwise compilation fails with implicit default copy constructor:
```
/Users/malfet/git/pytorch/pytorch/third_party/fbgemm/third_party/asmjit/src/asmjit/core/../core/radefs_p.h:174:22: error: definition of implicit copy constructor for 'RARegCount' is deprecated because it has a user-declared copy assignment operator [-Werror,-Wdeprecated-copy]
  inline RARegCount& operator=(const RARegCount& other) noexcept = default;
```

Fixes https://github.com/pytorch/pytorch/issues/74352

Pull Request resolved: https://github.com/pytorch/pytorch/pull/74379
Approved by: https://github.com/seemethere, https://github.com/atalman
2022-03-17 17:09:07 +00:00
Edward Z. Yang
493bbdc4fe Use shared CUPTI by default
Per https://github.com/pytorch/pytorch/issues/57744 statically linked CUPTI
causes exception handling to break on certain compiler configurations, likely
because CUPTI comes with incompatible libstdc++ symbols.  Rather than pray that
something reasonable happens, use the safer configuration (dynamic linking) by
default and give a warning if the user inverts the setting.

Signed-off-by: Edward Z. Yang <ezyangfb.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/74009

Approved by: https://github.com/malfet
2022-03-16 21:04:12 +00:00
Andrey Talman
17b3ba148d
Set BLAS_LIBRARIES to ${MKL_LIBRARIES} for MKL case (#72806)
This reverts [suggestion](https://github.com/pytorch/pytorch/pull/49647#discussion_r677737470) proposed to https://github.com/pytorch/pytorch/pull/49647

Which is somehow sufficient to workaround symptoms of https://github.com/pytorch/pytorch/issue/72653 

I.e. before this change, `BLAS_LIBRARIES` were set to `caffe2::mkl`
which is an interface library with link property set as follows:
59dd84cab6/cmake/public/mkl.cmake (L10-L12)
2022-02-16 07:14:27 -08:00
Aaron Enye Shi
8a43aa9538 [Kineto][Bug Fix] Avoid picking up old CUPTI headers (#72761)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72761

By default, the CUPTI_INCLUDE_DIR will pick up cupti.h from /usr/include which is old (from 2017 on AWS), and missing many cupti headers. Use NO_DEFAULT_PATH to avoid that, instead search from the list of locations provided.

Test Plan:
Fixes missing headers error when building on AWS. (Avoids old cupti.h from /usr/include). Instead uses cupti.h from cuda/extras/CUPTI/include.
```
In file included from /scratch/aaronshi/pytorch/third_party/kineto/libkineto/src/CuptiRangeProfilerApi.cpp:13:0:
/scratch/aaronshi/pytorch/third_party/kineto/libkineto/src/CuptiRangeProfilerApi.h:12:10: fatal error: cupti_profiler_target.h: No such file or directory
 #include <cupti_profiler_target.h>
          ^~~~~~~~~~~~~~~~~~~~~~~~~
compilation terminated.
```
and
```
/scratch/aaronshi/pytorch/third_party/kineto/libkineto/src/CuptiRangeProfilerApi.cpp:7:10: fatal error: nvperf_host.h: No such file or directory
 #include <nvperf_host.h>
          ^~~~~~~~~~~~~~~
compilation terminated.
```

Reviewed By: briancoutinho

Differential Revision: D34191123

Pulled By: aaronenyeshi

fbshipit-source-id: d84f80308c9939ba8ed504e667847d136a261453
(cherry picked from commit 33368bd93b)
2022-02-15 22:43:03 +00:00
Peter Bell
bc1fb7a618 CMake: Limit python include directories to only python libraries (#69085)
Summary:
`include_directories` is old-style CMake which adds the include path to every file being compiled. This instead makes `python`, `numpy` and `pybind11` into targets that only `torch_python` and `caffe2_pybind_state` are linked to. So, python libraries can't be accidentally included elsewhere.

Resubmit of https://github.com/pytorch/pytorch/issues/65654, Closes https://github.com/pytorch/pytorch/issues/65828

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69085

Reviewed By: anjali411

Differential Revision: D33776456

Pulled By: malfet

fbshipit-source-id: 018b0f6cd5a4f8c9e36df961deff832bc4afd479
(cherry picked from commit 57063107d6)
2022-02-07 21:18:32 +00:00
Peter Bell
847dbb8684 CMake: Clean up unused definitions (#69216)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69216

This cleans up 4 pre-processor defines not used by any code:
- HAVE_GCC_GET_CPUID
- USE_GCC_GET_CPUID
- USE_AVX
- USE_AVX2

`cpuid` isn't used in PyTorch any more, we only use `cpuinfo`.
`USE_AVX*` is also not used, instead `HAVE_*_CPU_DEFINITIONS` tells
you which `CPU_CAPABILITY` flags are being compiled.

There is also `fbgemm`'s code path adding `third_party` as an include
path, despite `fbgemm` having a dedicated include directory and a
CMake setup that properly includes it.

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D33794424

Pulled By: malfet

fbshipit-source-id: 99d504af088818d4a26c2f6ce67ec0d59a5eb703
(cherry picked from commit 2e099d41f0)
2022-01-31 22:49:11 +00:00
Peter Bell
d693739248 CMake: Clean up unused definitions (#69216)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69216

Currently `torch_cpu` has command line arguments relating to cuda
libraries e.g. `-DMAGMA_V2`. This happens because
`include_directories` and `add_definitions` indescriminately change
the compile commands of all targets.

Instead creating a proper magma target allows limiting the flags to
just `torch_cuda`.

Test Plan: Imported from OSS

Reviewed By: dagitses

Differential Revision: D33794174

Pulled By: malfet

fbshipit-source-id: 762eabf3b9576bef94e8caa3ed4764c0e2c72b08
(cherry picked from commit f7d127b654)
2022-01-31 22:49:11 +00:00
Peter Bell
5045c18bd1 Error if pocketfft is not found (#67909)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/67842

cc mruberry peterbell10

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67909

Reviewed By: albanD

Differential Revision: D33759534

Pulled By: malfet

fbshipit-source-id: 03548c95fe233b812b303ce9603c20ff9f626c39
(cherry picked from commit 214624e254)
2022-01-31 17:29:48 +00:00
Han Qi
1bc3571078 [pytorch][PR] Add ability for a mobile::Module to save as flatbuffer (#70201)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70201

Included functions:
save_mobile_module -> saves a mobile::Module to flatbuffer
load_mobile_module_from_file -> loads a flatbuffer into mobile::Module
parse_mobile_module -> parses from bytes or deserialized flatbuffer module object

Compared to previous attempts, this diff only adds flatbuffer to cmake target and leaves fbcode/xplat ones unchanged.

Test Plan: unittest

Reviewed By: malfet, gmagogsfm

Differential Revision: D33239362

fbshipit-source-id: b9ca36b83d6af2d78cc50b9eb9e2a6fa7fce0763
2022-01-12 16:30:39 -08:00
Andrey Talman
6c4437118b Deprecating Python 3.6 (#70493)
Summary:
Deprecating python 3.6 from documentation and from cmake

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70493

Reviewed By: suo

Differential Revision: D33433118

Pulled By: atalman

fbshipit-source-id: c3adc7b75714efdb5b6acda5d4cddc068fb4a145
2022-01-05 11:46:32 -08:00
Michael Suo
1adb70c6f0 Revert D33409880: [pytorch][PR] Deprecating Python 3.6
Test Plan: revert-hammer

Differential Revision:
D33409880 (d95be99561)

Original commit changeset: 4f9123398960

Original Phabricator Diff: D33409880 (d95be99561)

fbshipit-source-id: 32dc1c3c07ef99a04fab7d0fb742cf4e6c4b718a
2022-01-04 16:37:09 -08:00
Andrey Talman
d95be99561 Deprecating Python 3.6 (#70493)
Summary:
Deprecating python 3.6 from documentation and from cmake

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70493

Reviewed By: malfet

Differential Revision: D33409880

Pulled By: atalman

fbshipit-source-id: 4f912339896096be95b344724a4d9ae88cdf1a8f
2022-01-04 14:41:27 -08:00
linuxone
f64906f470 ibm z14/15 SIMD support (#66407)
Summary:
https://github.com/pytorch/pytorch/issues/66406
implemented z arch 14/15 vector SIMD additions.
so far besides bfloat all other types have their SIMD implementation.

it has 99% coverage and currently passing the local test.
it is concise and the main SIMD file is only one header file
it's using template metaprogramming, mostly. but still, there are a few macrosses left with the intention not to modify PyTorch much
Sleef supports z15

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66407

Reviewed By: mrshenli

Differential Revision: D33370163

Pulled By: malfet

fbshipit-source-id: 0e5a57f31b22a718cd2a9ac59753fb468cdda140
2022-01-04 09:40:18 -08:00
Peter Bell
c34aa715fa AT_MKL_SEQUENTIAL and build changes (#70259)
Summary:
Re-land of  https://github.com/pytorch/pytorch/pull/69419

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70259

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D33246757

Pulled By: ngimel

fbshipit-source-id: 738f8558d4cad6752be14108f9931ec3514f6682
2021-12-22 13:52:23 -08:00
Yanan Cao
17f3179d60 Back out "[pytorch][PR] Add ability for a mobile::Module to save as flatbuffer" (#69796)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69796

(Note: this ignores all push blocking failures!)

Test Plan: External CI + Sandcastle

Reviewed By: zhxchen17

Differential Revision: D33032671

fbshipit-source-id: dbf6690e960e25d6a5f19043cbe792add2acd7ef
2021-12-10 21:29:53 -08:00
Han Qi
d3649309e6 [pytorch][PR] Add ability for a mobile::Module to save as flatbuffer (#69306)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69306

Included functions:

save_mobile_module -> saves a mobile::Module to flatbuffer
load_mobile_module_from_file -> loads a flatbuffer into mobile::Module
parse_mobile_module -> parses from bytes or deserialized flatbuffer
Module object

Test Plan: unittests

Reviewed By: gmagogsfm

Differential Revision: D32806835

fbshipit-source-id: 71913c6650e225634f878946bd16960d377a7f57
2021-12-09 14:53:31 -08:00
Alban Desmaison
00ebbd5ef6 Revert D32010095: [pytorch][PR] Add ability for a mobile::Module to save as flatbuffer
Test Plan: revert-hammer

Differential Revision:
D32010095 (41d35dc201)

Original commit changeset: d763b0557780

fbshipit-source-id: bf746a0389135c9f5f67f00f449435ce08fb5f6d
2021-12-02 06:41:40 -08:00
Han Qi
41d35dc201 Add ability for a mobile::Module to save as flatbuffer (#67351)
Summary:
Included functions:

* save_mobile_module -> saves a mobile::Module to flatbuffer
* load_mobile_module_from_file -> loads a flatbuffer into mobile::Module
* parse_mobile_module -> parses from bytes or deserialized flatbuffer
      Module object

Fixes #{issue number}

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67351

Reviewed By: iseeyuan

Differential Revision: D32010095

Pulled By: qihqi

fbshipit-source-id: d763b0557780f7c2661b6485105b045e41a5e8f1
2021-12-01 23:58:15 -08:00
Robert Blackwell
cee4e8f35d Add FlexiBLAS build support per #64752 (#64815)
Summary:
To enable building torch+dependencies, set WITH_BLAS=flexi BLAS=FlexiBLAS

Fixes https://github.com/pytorch/pytorch/issues/64752

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64815

Reviewed By: jbschlosser

Differential Revision: D31997745

Pulled By: albanD

fbshipit-source-id: db208d59002f5896608a03132616400f09d972aa
2021-10-28 11:28:00 -07:00
Xiang Gao
b8dfb45ac2 Refactor cub namespace handling (#66219)
Summary:
This PR is to update PyTorch with the following cub changes:
- Starting cub 1.13.1, cub requires users to define `CUB_NS_QUALIFIER` if `CUB_NS_PREFIX` is also defined. Besides that, a new mechanism `CUB_WRAPPED_NAMESPACE` is added.

And I do the following change to PyTorch:
- Starting CUDA 11.5, define `CUB_WRAPPED_NAMESPACE` globally as an nvcc flag.
- Fix caffe2 failures caused by the above change.
- Add a `aten/src/ATen/cuda/cub_definitions.cuh` that defines helper macros about feature availability.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66219

Reviewed By: bdhirsh

Differential Revision: D31626931

Pulled By: ngimel

fbshipit-source-id: 97ebf5ef671ade8bf46d0860edc317f22660f26d
2021-10-25 14:37:09 -07:00
Michael Suo
3ac2c74896 Revert D31082208: Use shared CUPTI by default
Test Plan: revert-hammer

Differential Revision:
D31082208 (8b0eae5aa8)

Original commit changeset: 14f66af92084

fbshipit-source-id: 0faff00832b7f79d476fd1f9f505142a548a76db
2021-10-12 14:37:54 -07:00
Edward Yang
8b0eae5aa8 Use shared CUPTI by default (#65401)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65401

Per https://github.com/pytorch/pytorch/issues/57744 statically linked CUPTI
causes exception handling to break on certain compiler configurations, likely
because CUPTI comes with incompatible libstdc++ symbols.  Rather than pray that
something reasonable happens, use the safer configuration (dynamic linking) by
default and give a warning if the user inverts the setting.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: gdankel

Differential Revision: D31082208

Pulled By: ezyang

fbshipit-source-id: 14f66af920847e158436b5801c43f3124b109b34
2021-10-12 11:01:40 -07:00
Nikita Shulga
c373387709 Update CMake and use native CUDA language support (#62445)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62445

PyTorch currently uses the old style of compiling CUDA in CMake which is just a
bunch of scripts in `FindCUDA.cmake`. Newer versions support CUDA natively as
a language just like C++ or C.

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D31503350

fbshipit-source-id: 2ee817edc9698531ae1b87eda3ad271ee459fd55
2021-10-11 09:05:48 -07:00
Michael Suo
9b40eaaaab Revert D31193205: [pytorch][PR] CMake: Limit python include directories to only python libraries
Test Plan: revert-hammer

Differential Revision:
D31193205 (971c57f1d0)

Original commit changeset: 5c1b554a59d0

fbshipit-source-id: 5719b7df987ded6e7e212749a438db947656df87
2021-09-29 09:49:33 -07:00
Peter Bell
971c57f1d0 CMake: Limit python include directories to only python libraries (#65654)
Summary:
`include_directories` is old-style CMake which adds the include path to every file being compiled. This instead makes python, numpy and pybind11 into targets that only torch_python and caffe2_pybind_state are linked to. So, python libraries can't be accidentally included elsewhere.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65654

Reviewed By: gchanan

Differential Revision: D31193205

Pulled By: malfet

fbshipit-source-id: 5c1b554a59d0e441a701a04ebb62f0032d38b208
2021-09-29 08:09:08 -07:00
Nikita Shulga
399214efd6 Revert D31172530: [pytorch][PR] Enable CUPTI for kineto by default on windows
Test Plan: revert-hammer

Differential Revision:
D31172530 (6b60884f12)

Original commit changeset: 2c69ed0282c5

fbshipit-source-id: 649e040a8c44b0f536a8db397b4325309a285934
2021-09-24 19:18:15 -07:00
Guangyun Han
6b60884f12 Enable CUPTI for kineto by default on windows (#65608)
Summary:
Retry of https://github.com/pytorch/pytorch/pull/62175

See https://github.com/pytorch/pytorch/pull/62175#issuecomment-926411151 for more information.

malfet gdankel

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65608

Reviewed By: zou3519

Differential Revision: D31172530

Pulled By: gdankel

fbshipit-source-id: 2c69ed0282c54fa6cdb6e604096d0370e230fd66
2021-09-24 13:00:49 -07:00
Nikita Shulga
bc02255d5e Revert D30721329: [pytorch][PR] Enable CUPTI for kineto by default on windows.
Test Plan: revert-hammer

Differential Revision:
D30721329 (7dbc21bc2b)

Original commit changeset: aa1af47df8cc

fbshipit-source-id: 565d50841e19a45f8798a490aa3aa6b9f69ca404
2021-09-23 22:14:32 -07:00
Guangyun Han
7dbc21bc2b Enable CUPTI for kineto by default on windows. (#62175)
Summary:
It fix nothing.

For tracking this PR, please refers to https://github.com/pytorch/kineto/issues/356

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62175

Reviewed By: ezyang

Differential Revision: D30721329

Pulled By: gdankel

fbshipit-source-id: aa1af47df8cc1b6f5ba2194447f62b902a6a9c84
2021-09-23 15:13:47 -07:00
Nick Kreeger
882b67dff4 Drop incremental linking on Windows with REL_WITH_DEB_INFO=1. (#64892)
Summary:
The library will no longer link properly on VS 2019 (14.29.30133). To
ensure that engineers building on Windows can use and debug with this
build type, incremental linking needs to be turned off for this build
flag.

Verified that this build type successfully builds, links, and provides
debuggable Python modules on Windows.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64892

Reviewed By: jbschlosser

Differential Revision: D30902565

Pulled By: malfet

fbshipit-source-id: e5286a4c6f45c7cbe4cdc1b98560129bd386970b
2021-09-14 09:44:18 -07:00
Peter Bell
e4f44bec27 Fix pocketfft include path in mobile build (#63714)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63714

PocketFFT was disabled for CMake < 3.9 but CMake 3.11 is the first version to support `INCLUDE_DIRECTORIES` as a target property. So updating to CMake 3.10 causes the mobile builds to fail. Instead of limiting the CMake support, this just adds the include directory to the entire target,

Test Plan: Imported from OSS

Reviewed By: bdhirsh

Differential Revision: D30498369

Pulled By: malfet

fbshipit-source-id: 83372e29c477c97e7015763b7c29d6d7e456bcef
2021-08-23 17:48:57 -07:00
driazati
bd8608cd5c Use CMake for breakpad (#63186)
Summary:
We currently build breakpad from [this fork](https://github.com/driazati/breakpad) to include extra logic to restore signal handlers that were previously present. With some [new additions](https://github.com/google/breakpad/compare/main...driazati:main) this fork now includes a CMake based build, so we can add breakpad as a proper dependency rather than rely on including it in Docker images as a system library which is error prone (we have a bunch of images) and hard to extend to MacOS / Windows. This also includes some changes to the crash handling code to support MacOS / Windows in a similar way to Linux.

```python
import torch

# On Windows this writes crashes to C:\Users\<user>\AppData\pytorch_crashes
# On MacOS/Linux this writes crashes to /tmp/pytorch_crashes
torch.utils._crash_handler.enable_minidumps()

# Easy way to cause a segfault and trigger the handler
torch.bincount(input=torch.tensor([9223372036854775807]))
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63186

Reviewed By: malfet, seemethere

Differential Revision: D30318404

Pulled By: driazati

fbshipit-source-id: 0d7daf3701cfaba5451cc529a0730272ab1eb1dc
2021-08-19 10:42:01 -07:00
Nikita Shulga
6e5d065b2b Add pocketfft as submodule (#62841)
Summary:
Using https://github.com/mreineck/pocketfft

Also delete explicit installation of pocketfft during the build as it will be available via submodule

Limit PocketFFT support to cmake-3.10 or newer, as `set_source_files_properties` does not seem to work as expected with cmake-3.5

Partially addresses https://github.com/pytorch/pytorch/issues/62821

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62841

Reviewed By: seemethere

Differential Revision: D30140441

Pulled By: malfet

fbshipit-source-id: d1a1cf1b43375321f5ec5b3d0b538f58082f7825
2021-08-17 15:29:56 -07:00
Kimish Patel
38c185189c [Pytorch Edge] Enable kineto profiler on mobile via EdgeKinetoProfiler (#62419)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62419

This diff adds support for cpu only kineto profiler on mobile. Thus
enabling chrome trace generation on mobile. This bring cpp API for
mobile profiling on part with Torchscript.
This is done via:
1. Utilizating debug handle annotations in KinetoEvent.
2. Adding post processing capability, via callbacks, to
KinetoThreadLocalState
3. Creating new RAII stype profiler, KinetoEdgeCPUProfiler, which can be
used in surrounding scope of model execution. This will write chrome
trace to the location specified in profiler constructor.

Test Plan:
MobileProfiler.ModuleHierarchy

Imported from OSS

Reviewed By: raziel

Differential Revision: D29993660

fbshipit-source-id: 0b44f52f9e9c5f5aff81ebbd9273c254c3c03299
2021-08-13 21:40:19 -07:00
Pruthvi Madugundu
ab7a472980 [ROCm] Update HIP_VERSION to TORCH_HIP_VERSION (#62786)
Summary:
- HIP_VERSION semantic versioning will change in ROCm4.3. The changes essentially remove the dependency on HIP_VERSION provided in the hip header to keep code compatible with older and newer versions of ROCm.
- TORCH_HIP_VERSION is derived from HIP_VERSION_MAJOR and HIP_VERSION_MINOR

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62786

Reviewed By: bdhirsh

Differential Revision: D30281682

Pulled By: seemethere

fbshipit-source-id: e41e69fb9e13de5ddd1af99ba5bbdcbb7b64b673
2021-08-13 15:00:43 -07:00
Isuru Fernando
b58e04f156 Make sure FindLAPACK finds the same BLAS library (#49647)
Summary:
BLAS library is found by cmake/Dependencies.cmake and then
LAPACK library is found by FindLAPACK.cmake which in turn calls
FindBLAS.cmake. This means that we are searching for BLAS twice
and they might be different things. By setting a few variables,
this can be avoided.

cc seemethere

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49647

Reviewed By: seemethere, ejguan

Differential Revision: D29943680

Pulled By: malfet

fbshipit-source-id: 3cbc350ea645a1a28dd92c19e5ee7f9eecdeff59
2021-08-02 20:41:00 -07:00
Can Balioglu
7565039ee9 Support system-provided Intel TBB (#61934)
Summary:
This PR: (1) enables the use of a system-provided Intel TBB for building PyTorch, (2) removes `tbb:task_scheduler_init` references since it has been removed from TBB a while ago (3) marks the implementation of `_internal_set_num_threads` with a TODO as it requires a revision that fixes its thread allocation logic.

Tested with `test/run_test`; no new tests are introduced since there are no behavioral changes (removal of `tbb::task_scheduler_init` has no impact on the runtime behavior).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61934

Reviewed By: malfet

Differential Revision: D29805416

Pulled By: cbalioglu

fbshipit-source-id: 22042b428b57b8fede9dfcc83878d679a19561dd
2021-08-02 07:39:00 -07:00
Hong Xu
7acb8b71e1 Remove AVX detection code that duplicates FindAVX.cmake (#61748)
Summary:
This PR deletes some code in `MiscCheck.cmake` that perform the exact
same functionality as `FindAVX.cmake`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61748

Reviewed By: ejguan

Differential Revision: D29791282

Pulled By: malfet

fbshipit-source-id: 6595fd1b61c8ae12b821fad8c9a34892dd52d213
2021-07-20 14:34:36 -07:00
Tongliang Liao
0afbb9e81e PYTHON_LIBRARY may be set to empty or NOTFOUND. (#61230)
Summary:
Not sure why (maybe from dependencies?) but it can certainly break package lookup upon re-entry of cmake.
So instead of checking whether they are defined, we should check whether there is any meaningful value inside.

Fixes https://github.com/pytorch/pytorch/issues/59887

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61230

Reviewed By: H-Huang

Differential Revision: D29668766

Pulled By: malfet

fbshipit-source-id: 79a59578740c4434327aff4f9a22eba9c4bf48d1
2021-07-13 07:09:31 -07:00
Nikita Shulga
4036820506 Add PocketFFT support (#60976)
Summary:
Needed on platforms, that do not have MKL, such as aarch64 and M1
- Add `AT_POCKETFFT_ENABLED()` to Config.h.in
- Introduce torch._C.has_spectral that is true if PyTorch was compiled with either MKL or PocketFFT
- Modify spectral test to use skipCPUIfNoFFT instead of skipCPUIfNoMKL

Share implementation of `_out` functions as well as fft_fill_with_conjugate_symmetry_stub between MKL and PocketFFT implementations

Fixes https://github.com/pytorch/pytorch/issues/41592

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60976

Reviewed By: walterddr, driazati, janeyx99, samestep

Differential Revision: D29466530

Pulled By: malfet

fbshipit-source-id: ac5edb3d40e7c413267825f92a5e8bc4bb249caf
2021-06-30 16:28:20 -07:00
Peter Bell
31a884987d Remove some TH includes from ATen (#60323)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60323

Test Plan: Imported from OSS

Reviewed By: malfet, anjali411

Differential Revision: D29252862

Pulled By: ngimel

fbshipit-source-id: 9ea13495d382c04dfd52b8dd63314f53b7e83936
2021-06-22 10:55:17 -07:00
Luca Wehrstedt
08ce5eedf5 [reland] Move RPC agents to libtorch (#60170)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60170

Reland of #59939.

Test Plan: CI

Reviewed By: mrshenli

Differential Revision: D29193234

fbshipit-source-id: ee2a90d6be961c10f91361512bdd4cadca43dd60
2021-06-18 05:15:09 -07:00
Mike Ruberry
f233274f30 Revert D28875276: Move RPC agents to libtorch
Test Plan: revert-hammer

Differential Revision:
D28875276 (fc50f91929)

Original commit changeset: f2f6970fd74d

fbshipit-source-id: 3c52af652579733ebea8ddfb06576a0ce262bf78
2021-06-17 00:48:58 -07:00
Luca Wehrstedt
fc50f91929 Move RPC agents to libtorch (#59939)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59939

Test Plan: CI

Reviewed By: mrshenli

Differential Revision: D28875276

fbshipit-source-id: f2f6970fd74de5f112636e78edaa4410c61d8c45
2021-06-15 16:20:53 -07:00
Nikita Shulga
8845cbabf0 [CMake] Split caffe2::cudnn into public and private (#59721)
Summary:
This is only important for builds where cuDNN is linked statically into libtorch_cpu.
Before this PR PyTorch wheels often accidentally contained several partial copies of cudnn_static library.
Splitting the interface into header only (cudnn-public) and library+headers(cudnn-private) prevents those from happening.
Preliminary step towards enabling optional linking whole cudnn_library to workaround issue reported in https://github.com/pytorch/pytorch/issues/50153

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59721

Reviewed By: ngimel

Differential Revision: D29000967

Pulled By: malfet

fbshipit-source-id: f054df92b265e9494076ab16c247427b39da9336
2021-06-09 13:18:48 -07:00
Michael Wootton
e66015dadf Add build support for kineto + rocm (#58401)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/58399

CMake changes to allow kineto to build with rocm support.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58401

Reviewed By: mruberry

Differential Revision: D28479807

Pulled By: walterddr

fbshipit-source-id: fc01f05b2a5592ee1d1dbd71d2d4f7aec1bd74f7
2021-06-03 12:15:20 -07:00
neginraoof
599f5058cf [ONNX] Update ONNX to rel-1.9 (#55889) (#57080)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57080

ONNX optimizer is removed in ONNX 1.9
This PR removes ONNX optimizer from a C++ code path and uses `try-except` block in Python to keep it compatible with both ONNX-1.8 and 1.9.

Test Plan: Imported from OSS

Reviewed By: heitorschueroff

Differential Revision: D28467330

Pulled By: malfet

fbshipit-source-id: 5e4669dd0537648898e593f9e253da18d6dc7568

Co-authored-by: neginraoof <neginmr@utexas.edu>
Co-authored-by: Nikita Shulga <nshulga@fb.com>
2021-06-02 08:27:17 -07:00
Jeff Daily
ba694520e5 [ROCm] fix JIT codegen (#57400)
Summary:
Fixes upcoming changes that are part of ROCm 4.2 and affect PyTorch JIT.

- ROCM_VERSION macro must be available to both device and host compilation passes.
- Unifies some of CUDA and HIP differences in the code generated.
  - NAN / POS_INFINITY / NEG_INFINITY
  - Do not hipify `extern __shared__` -> `HIP_DYNAMIC_SHARED()` macro [deprecated]
- Differentiates bf16 codegen for HIP.
- Optionally provides missing macros when using hiprtc precompiled header feature.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/57400

Reviewed By: ejguan

Differential Revision: D28421065

Pulled By: malfet

fbshipit-source-id: 215f476773c61d8b0d9d148a4e5f5d016f863074
2021-05-27 11:45:07 -07:00
Nikita Shulga
7179e7ea7b [CMake] Prefer third_party/pybind11 by default (#58951)
Summary:
To make build behaviour aligned with other third_party/ libraries,
introduce `USE_SYSTEM_PYBIND11 (d55b25a633)` build option, which set to OFF by
default, which means PyTorch will be build with bundled pybind11 even if
other version is already installed locally.

Fixes https://github.com/pytorch/pytorch/issues/58750

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58951

Reviewed By: driazati

Differential Revision: D28690411

Pulled By: malfet

fbshipit-source-id: e56b5a8f2a23ee1834b2a6d3807f287149decf8c
2021-05-25 15:10:17 -07:00
Xiang Gao
6c70cbedb6 step 0 of cuDNN v8 convolution API integration (#51390)
Summary:
This PR is step 0 of adding PyTorch convolution bindings using the cuDNN frontend. The cuDNN frontend is the recommended way of using cuDNN v8 API. It is supposed to have faster release cycles, so that, for example, if people find a specific kernel has a bug, they can report it, and that kernel will be blocked in the cuDNN frontend and frameworks could just update that submodule without the need for waiting for a whole cuDNN release.

The work is not complete, and this PR is only step 0.

**What this PR does:**
- Add cudnn-frontend as a submodule.
- Modify cmake to build that submodule.
- Add bindings for convolution forward in `Conv_v8.cpp`, which is disabled by a macro by default.
- Tested manually by enabling the macro and run `test_nn.py`. All tests pass except those mentioned below.

**What this PR doesn't:**
- Only convolution forward, no backward. The backward will use v7 API.
- No 64bit-indexing support for some configuration. This is a known issue of cuDNN, and will be fixed in a later cuDNN version. PyTorch will not implement any workaround for issue, but instead, v8 API should be disabled on problematic cuDNN versions.
- No test beyond PyTorch's unit tests.
  - Not tested for correctness on real models.
  - Not benchmarked for performance.
- Benchmark cache is not thread-safe. (This is marked as `FIXME` in the code, and will be fixed in a follow-up PR)
- cuDNN benchmark is not supported.
- There are failing tests, which will be resolved later:
  ```
  FAILED test/test_nn.py::TestNNDeviceTypeCUDA::test_conv_cudnn_nhwc_cuda_float16 - AssertionError: False is not true : Tensors failed to compare as equal!With rtol=0.001 and atol=1e-05, found 32 element(s) (out of 32) whose difference(s) exceeded the margin of error (in...
  FAILED test/test_nn.py::TestNNDeviceTypeCUDA::test_conv_cudnn_nhwc_cuda_float32 - AssertionError: False is not true : Tensors failed to compare as equal!With rtol=1.3e-06 and atol=1e-05, found 32 element(s) (out of 32) whose difference(s) exceeded the margin of error (...
  FAILED test/test_nn.py::TestNNDeviceTypeCUDA::test_conv_large_cuda - RuntimeError: CUDNN_BACKEND_OPERATION: cudnnFinalize Failed cudnn_status: 9
  FAILED test/test_nn.py::TestNN::test_Conv2d_depthwise_naive_groups_cuda - AssertionError: False is not true : Tensors failed to compare as equal!With rtol=0 and atol=1e-05, found 64 element(s) (out of 64) whose difference(s) exceeded the margin of error (including 0 an...
  FAILED test/test_nn.py::TestNN::test_Conv2d_deterministic_cudnn - RuntimeError: not supported yet
  FAILED test/test_nn.py::TestNN::test_ConvTranspose2d_groups_cuda_fp32 - RuntimeError: cuDNN error: CUDNN_STATUS_BAD_PARAM
  FAILED test/test_nn.py::TestNN::test_ConvTranspose2d_groups_cuda_tf32 - RuntimeError: cuDNN error: CUDNN_STATUS_BAD_PARAM
  ```

Although this is not a complete implementation of cuDNN v8 API binding, I still want to merge this first. This would allow me to do small and incremental work, for the ease of development and review.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51390

Reviewed By: malfet

Differential Revision: D28513167

Pulled By: ngimel

fbshipit-source-id: 9cc20c9dec5bbbcb1f94ac9e0f59b10c34f62740
2021-05-19 12:54:09 -07:00
peter
432676599c Stop installing libuv on Windows (#51936)
Summary:
Fixes #{issue number}
gunandrose4u

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51936

Reviewed By: malfet

Differential Revision: D28467662

Pulled By: seemethere

fbshipit-source-id: 28d203ee3af13d6a3158f188c2e889e310ee6010
2021-05-17 08:52:29 -07:00
Ilia Cherniavskii
6997e7bd39 Update Kineto submodule (#58179)
Summary:
Update Kineto submodule, minor api changes

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58179

Test Plan: CI

Reviewed By: gdankel

Differential Revision: D28391369

Pulled By: ilia-cher

fbshipit-source-id: 61fbf63d9ec2db66fac203944679e4b99cb0d568
2021-05-13 04:03:04 -07:00
Ilia Cherniavskii
c714596027 [kineto] Update Kineto submodule, cupti library paths (#57789)
Summary:
Update kineto submodule, improve cupti detection

Pull Request resolved: https://github.com/pytorch/pytorch/pull/57789

Test Plan: CI

Reviewed By: ngimel

Differential Revision: D28297175

Pulled By: ilia-cher

fbshipit-source-id: 5895270fae160097ae8872a592984d0e4a1b187b
2021-05-10 19:15:59 -07:00
Ilia Cherniavskii
65fad0ebd2 Expand Kineto platform support (ci-all) (#56323)
Summary:
Expanding support to all builds

Pull Request resolved: https://github.com/pytorch/pytorch/pull/56323

Test Plan: CI

Reviewed By: malfet

Differential Revision: D28171478

Pulled By: ilia-cher

fbshipit-source-id: 16bc752d1be3cbaeda5316f5d8a687ae05a83d22
2021-05-05 15:00:01 -07:00
davidriazati@fb.com
c44cbc63cc Ignore more compiler warnings, unify WERROR options (#56630)
Summary:
This adds some more compiler warnings ignores for everything that happens on a standard CPU build (CUDA builds still have a bunch of warnings so we can't turn on `-Werror` everywhere yet).
](https://our.intern.facebook.com/intern/diff/28005063/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56630

Pulled By: driazati

Reviewed By: malfet

Differential Revision: D28005063

fbshipit-source-id: 541ed415eb0470ddf7e08c22c5eb6da9db26e9a0
2021-04-29 21:20:29 -07:00
davidriazati@fb.com
4b96fc060b Remove distutils (#57040)
Summary:
[distutils](https://docs.python.org/3/library/distutils.html) is on its way out and will be deprecated-on-import for Python 3.10+ and removed in Python 3.12 (see [PEP 632](https://www.python.org/dev/peps/pep-0632/)). There's no reason for us to keep it around since all the functionality we want from it can be found in `setuptools` / `sysconfig`. `setuptools` includes a copy of most of `distutils` (which is fine to use according to the PEP), that it uses under the hood, so this PR also uses that in some places.

Fixes #56527
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57040

Pulled By: driazati

Reviewed By: nikithamalgifb

Differential Revision: D28051356

fbshipit-source-id: 1ca312219032540e755593e50da0c9e23c62d720
2021-04-29 12:10:11 -07:00
davidriazati@fb.com
d1b6383d65 Hide warnings for deprecated quantization APIs (#56291)
Summary:
These have a tracking task to actually fix them but in the meantime they
should not be clogging up everyone's build output (see #55952).
](https://our.intern.facebook.com/intern/diff/27830229/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56291

Pulled By: driazati

Reviewed By: bertmaher

Differential Revision: D27830229

fbshipit-source-id: f1e5d6e9b2c63d4a4ad99a1744a520f8c681c22b
2021-04-19 11:11:33 -07:00
Jeff Daily
e1752ffa04 [reland][ROCm] use hiprtc precompiled header (#55965)
Summary:
Revert "Revert D27449031 (2a7df657fe): [pytorch][PR] [ROCm] use hiprtc precompiled header".  Reland PR https://github.com/pytorch/pytorch/issues/54350.

This reverts commit 204ac21bf1.

The original PR was reverted under suspicion that it was causing CI instability, but it was instead due to a hardware failure.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/55965

Reviewed By: jbschlosser

Differential Revision: D27755907

Pulled By: malfet

fbshipit-source-id: 75bf0b9d888df3dee62f00a366b1123757e0474e
2021-04-15 15:47:56 -07:00
Eddie Yan
81f181567a Add USE_MAGMA build flag (#55994)
Summary:
Many model pipelines/workflows don't use MAGMA even though it is included in the build by default. Leaving MAGMA kernels out of the build can save 60+MB of GPU memory when loading `libtorch_cuda.so` (tested on V100, current upstream master).

A current sharp corner of this flag is that toggling it when rebuilding requires `torch/include/THC/THCGeneral.h` to be *manually* deleted by the user, as even running `make clean` or `setup.py` with `--cmake` does not properly regenerate it with the appropriate substitution for `#cmakedefine USE_MAGMA`. Is there a way to force the regeneration of the header during a rebuild?

CC malfet ptrblck

Pull Request resolved: https://github.com/pytorch/pytorch/pull/55994

Reviewed By: mruberry

Differential Revision: D27766287

Pulled By: malfet

fbshipit-source-id: 93deca57befa0febb9c5b7875ecf0015c547d421
2021-04-15 00:43:12 -07:00
Alexander Golynski
204ac21bf1 Revert D27449031: [pytorch][PR] [ROCm] use hiprtc precompiled header
Test Plan: revert-hammer

Differential Revision:
D27449031 (2a7df657fe)

Original commit changeset: 81a8d7847a47

fbshipit-source-id: b7b970c8ea4110357fba3ad4d52a86fa5641d90c
2021-04-01 06:42:04 -07:00
Jeff Daily
2a7df657fe [ROCm] use hiprtc precompiled header (#54350)
Summary:
HIP's runtime compiler (hiprtc) is adding support for precompiled HIP headers in the ROCm 4.2 release.  Conditionally add support for this feature.  Using this feature will improve the ROCm torch wheel user experience; users will no longer need to install HIP headers separately to use torch JIT features.

The use of this feature is conditionalized on a new ROCM_VERSION macro.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/54350

Reviewed By: H-Huang

Differential Revision: D27449031

Pulled By: malfet

fbshipit-source-id: 81a8d7847a47ce2bb253d1ea58740ef66ed154a3
2021-03-31 13:36:50 -07:00
Shruti Ramesh
f1f3c8b0fa Adding PyTorch + DNNL + AMD BLIS path (#54953)
Summary:
These changes provide the user with an additional option to choose the DNNL+BLIS path for PyTorch.

This assumes BLIS is already downloaded or built from source and the necessary library file is available at the location: $BLIS_HOME/lib/libblis.so and include files are available at: $BLIS_HOME/include/blis/blis.h and $BLIS_HOME/include/blis/cblas.h

Export the below variables to build PyTorch with MKLDNN+BLIS and proceed with the regular installation procedure as below:
$export BLIS_HOME=path-to-BLIS
$export PATH=$BLIS_HOME/include/blis:$PATH LD_LIBRARY_PATH=$BLIS_HOME/lib:$LD_LIBRARY_PATH
$export BLAS=BLIS USE_MKLDNN_CBLAS=ON WITH_BLAS=blis
$python setup.py install

CPU only Dockerfile to build PyTorch with AMD BLIS is available at : docker/cpu-blis/Dockerfile
Example command line to build using the Dockerfile:
sudo DOCKER_BUILDKIT=1 docker build . -t docker-image-repo-name
Example command line to run the built docker container:
sudo docker run --name container-name -it docker-image-repo-name

Fixes #{issue number}

Pull Request resolved: https://github.com/pytorch/pytorch/pull/54953

Reviewed By: glaringlee

Differential Revision: D27466799

Pulled By: malfet

fbshipit-source-id: e03bae9561be3a67429df3b1be95a79005c63050
2021-03-31 10:40:25 -07:00
Jeff Daily
1dffbe759b [ROCm] utilize PUBLIC vs PRIVATE linking to avoid incorrect dependencies (#54727)
Summary:
Fixes the build of projects that depend on torch, such as torchaudio.  Otherwise torchaudio will complain that gloo_hip is missing.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/54727

Reviewed By: H-Huang

Differential Revision: D27361513

Pulled By: ezyang

fbshipit-source-id: 714cc2db23e7adf3e89303e941b78c27625b9460
2021-03-30 19:22:56 -07:00
Michael Melesse
2620bce42a [ROCM] load only hipfft separately past rocm4.1 (#54349)
Summary:
This PR is a follow up to https://github.com/pytorch/pytorch/pull/53408.

It only loads hipfft if the version is rocm 4.1 or after and stops loading rocfft. This was done to resolve some issues observed in our internal ci due to conflicts.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/54349

Reviewed By: ezyang

Differential Revision: D27374252

Pulled By: ngimel

fbshipit-source-id: 724e80df5011ea8fabd81739e18ae8a13d3a7ea0
2021-03-26 19:54:25 -07:00
Michael Melesse
4c1af249fb [ROCM] load hipfft separately from rocfft (#53408)
Summary:
This PR makes changes to how hipfft is loaded in pytorch. hipfft is packaged in a separate library to rocfft following rocm 4.1.

We check the rocm version and if it is past rocm 4.1 we load hipfft in addition to rocfft.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/53408

Reviewed By: albanD

Differential Revision: D26952702

Pulled By: malfet

fbshipit-source-id: f42be304b587c060816e39d36f5c1a2cdc37bfab
2021-03-11 09:18:33 -08:00
Ilia Cherniavskii
795ed5ca3f Enable Kineto in CPU builds (#53174)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53174

Enable Kineto also in the CPU builds (non-mobile, non-Windows(atm))

Test Plan: CI

Reviewed By: gdankel

Differential Revision: D26776112

Pulled By: ilia-cher

fbshipit-source-id: 8733f65c2993105136c853f2a7b6e497d0fa53bf
2021-03-04 19:15:52 -08:00
Ashkan Aliabadi
e5ecd1ddf8 [Vulkan]Fix build warnings-treated-as-error on Linux. (#52781)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/52781

Test Plan: Imported from OSS

Reviewed By: SS-JIA

Differential Revision: D26669311

Pulled By: AshkanAliabadi

fbshipit-source-id: 78b08d0b264d4d5cf8af964c589b9b7d0ddc7311
2021-03-03 13:48:43 -08:00
Jeff Daily
d02ea9a141 [ROCm] add hipMAGMA support (#51238)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/48831.

- CI image is updated to build hipMAGMA from source and set env MAGMA_HOME.
- CMake is updated to separate different requirements for CUDA versus ROCm MAGMA.
- Some unit tests that become enabled with MAGMA are currently skipped for ROCm due to failures.  Fixing these failures will be follow-on work.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51238

Reviewed By: ngimel

Differential Revision: D26184918

Pulled By: malfet

fbshipit-source-id: ada632f1ae7b413e8cae6543fe931dcd46985821
2021-02-01 22:09:33 -08:00
Luca Wehrstedt
b77f72b5a0 Enable TensorPipe's SHM transport (#50760)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50760

The SHM transport uses shared-memory-backed ringbuffers to transfer small payloads between processes on the same machine.

It was disabled in v1.6 due to a CMake mishap but we've since realized that it also doesn't work that well in docker and other setups. Enabling it here to see whether CircleCI fails.
ghstack-source-id: 120470890

Test Plan: Exported three times to CircleCI with tests consistently passing

Reviewed By: mrshenli

Differential Revision: D23814828

fbshipit-source-id: f355cb6515776debad536924de4f4d3fbb05a874
2021-01-27 11:45:09 -08:00
Jeff Daily
b2e5617553 [ROCm] rename HIP_HCC_FLAGS to HIP_CLANG_FLAGS (#50917)
Summary:
ROCm 3.5 replaced hcc with hip-clang and deprecated HIP_HCC_FLAGS.
HIP_CLANG_FLAGS should be used moving forward. HIP_HCC_FLAGS will
be removed soon.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50917

Reviewed By: ejguan

Differential Revision: D26008094

Pulled By: walterddr

fbshipit-source-id: cfec4f96fbd9bd338834a841c37267f6a4703cab
2021-01-22 07:24:05 -08:00
Ilia Cherniavskii
e34992ebee Set USE_KINETO=1 (#49897)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49897

Resend of https://github.com/pytorch/pytorch/pull/49201

Test Plan: see 49201

Reviewed By: malfet

Differential Revision: D25717102

Pulled By: ilia-cher

fbshipit-source-id: 5e794a7f5fe160ca64ac9d190c4fd3e8f1e443e6
2021-01-22 00:09:21 -08:00
Luca Wehrstedt
112a583467 Enable TensorPipe's CMA channel (#50759)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50759

ghstack-source-id: 120032288

Test Plan: Exported to CircleCI and tested

Reviewed By: mrshenli

Differential Revision: D25959326

fbshipit-source-id: be6df209ff3a79a8961acbda64ee7805a5c434a9
2021-01-20 10:53:47 -08:00
Ilia Cherniavskii
72b00a8a52 Revert D25480770: Set USE_KINETO=1
Test Plan: revert-hammer

Differential Revision:
D25480770 (1a92802bde)

Original commit changeset: 037cd774f554

fbshipit-source-id: 6a6062195033ca91fcc0cfa1e890e47efc774ac1
2020-12-18 07:06:28 -08:00
Ilia Cherniavskii
1a92802bde Set USE_KINETO=1 (#49201)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49201

This unblocks kineto profiler for 1.8 release.
This PR supercedes https://github.com/pytorch/pytorch/pull/48391
Note: this will somewhat increase the size of linux server binaries, bc
we add libkineto.a and libcupti_static.a:
-rw-r--r-- 1 jenkins jenkins 1107502 Dec 10 21:16 build/lib/libkineto.a
-rw-r--r-- 1 root root 13699658 Nov 13  2019 /usr/local/cuda/lib64/libcupti_static.a

Test Plan:
CI
https://github.com/pytorch/pytorch/pull/48391

Imported from OSS

Reviewed By: ngimel

Differential Revision: D25480770

fbshipit-source-id: 037cd774f5547d9918d6055ef5cc952a54e48e4c
2020-12-18 01:48:10 -08:00
Abdelrauf
95a1725a4a Vsx initial support issue27678 (#41541)
Summary:
### Pytorch Vec256 ppc64le support
implemented types:

- double
- float
- int16
- int32
- int64
- qint32
- qint8
- quint8
- complex_float
- complex_double

Notes:
All basic vector operations are implemented:
There are a few problems:
- minimum maximum nan propagation for ppc64le is missing and was not checked
- complex multiplication, division, sqrt, abs are implemented as PyTorch x86. they can overflow and have precision problems than std ones.  That's why they were either excluded or tested in smaller domain range
- precisions of the implemented float math functions

~~Besides, I added CPU_CAPABILITY for power. but as because of  quantization errors for DEFAULT I had to undef and  use vsx for DEFAULT too~~

#### Details
##### Supported math functions

+ plus sign means vectorized, -  minus sign means missing,   (implementation notes are added inside braces)
(notes). Example: -(both ) means it was also missing on x86 side
g( func_name)  means vectorization is using func_name
sleef - redirected to the Sleef
unsupported

function_name | float | double | complex float | complex double
|-- | -- | -- | -- | --|
acos | sleef | sleef | f(asin) | f(asin)
asin | sleef | sleef | +(pytorch impl) | +(pytorch impl)
atan | sleef | sleef | f(log) | f(log)
atan2 | sleef | sleef | unsupported | unsupported
cos | +((ppc64le:avx_mathfun) ) | sleef | -(both) | -(both)
cosh | f(exp)   | -(both) | -(both) |
erf | sleef | sleef | unsupported | unsupported
erfc | sleef | sleef | unsupported | unsupported
erfinv | - (both) | - (both) | unsupported | unsupported
exp | + | sleef | - (x86:f()) | - (x86:f())
expm1 | f(exp)  | sleef | unsupported | unsupported
lgamma | sleef | sleef |   |
log | +  | sleef | -(both) | -(both)
log10 | f(log)  | sleef | f(log) | f(log)
log1p | f(log)  | sleef | unsupported | unsupported
log2 | f(log)  | sleef | f(log) | f(log)
pow | + f(exp)  | sleef | -(both) | -(both)
sin | +((ppc64le:avx_mathfun) ) | sleef | -(both) | -(both)
sinh | f(exp)  | sleef | -(both) | -(both)
tan | sleef | sleef | -(both) | -(both)
tanh | f(exp)  | sleef | -(both) | -(both)
hypot | sleef | sleef | -(both) | -(both)
nextafter | sleef  | sleef | -(both) | -(both)
fmod | sleef | sleef | -(both) | -(both)

[Vec256 Test cases Pr https://github.com/pytorch/pytorch/issues/42685](https://github.com/pytorch/pytorch/pull/42685)
Current list:

- [x] Blends
- [x] Memory: UnAlignedLoadStore
- [x] Arithmetics: Plus,Minu,Multiplication,Division
- [x] Bitwise: BitAnd, BitOr, BitXor
- [x] Comparison: Equal, NotEqual, Greater, Less, GreaterEqual, LessEqual
- [x] MinMax: Minimum, Maximum, ClampMin, ClampMax, Clamp
- [x] SignManipulation: Absolute, Negate
- [x] Interleave: Interleave, DeInterleave
- [x] Rounding: Round, Ceil, Floor, Trunc
- [x] Mask: ZeroMask
- [x] SqrtAndReciprocal: Sqrt, RSqrt, Reciprocal
- [x] Trigonometric: Sin, Cos, Tan
- [x] Hyperbolic: Tanh, Sinh, Cosh
- [x] InverseTrigonometric: Asin, ACos, ATan, ATan2
- [x] Logarithm: Log, Log2, Log10, Log1p
- [x] Exponents: Exp, Expm1
- [x] ErrorFunctions: Erf, Erfc, Erfinv
- [x] Pow: Pow
- [x] LGamma: LGamma
- [x] Quantization: quantize, dequantize, requantize_from_int
- [x] Quantization: widening_subtract, relu, relu6
Missing:
- [ ] Constructors, initializations
- [ ] Conversion , Cast
- [ ] Additional: imag, conj, angle (note: imag and conj only checked for float complex)

#### Notes on tests and testing framework
- some math functions are tested within domain range
- mostly testing framework randomly tests against std implementation within the domain or within the implementation domain for some math functions.
- some functions are tested against the local version. ~~For example, std::round and vector version of round differs. so it was tested against the local version~~
- round was tested against pytorch at::native::round_impl. ~~for double type on **Vsx  vec_round failed  for  (even)+0 .5 values**~~ . it was solved by using vec_rint
- ~~**complex types are not tested**~~  **After enabling complex testing due to precision and domain some of the complex functions failed for vsx and x86 avx as well. I will either test it against local implementation or check within the accepted domain**
- ~~quantizations are not tested~~  Added tests for quantizing, dequantize, requantize_from_int, relu, relu6, widening_subtract functions
- the testing framework should be improved further
- ~~For now `-DBUILD_MOBILE_TEST=ON `will be used for Vec256Test too~~
Vec256 Test cases will be built for each CPU_CAPABILITY

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41541

Reviewed By: zhangguanheng66

Differential Revision: D23922049

Pulled By: VitalyFedyunin

fbshipit-source-id: bca25110afccecbb362cea57c705f3ce02f26098
2020-12-10 13:42:39 -08:00
peterjc123
5450614cf6 Correctly apply WIN32_LEAN_AND_MEAN to the whole repo (#49025)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/48895

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49025

Reviewed By: zhangguanheng66

Differential Revision: D25399912

Pulled By: ezyang

fbshipit-source-id: 9b7225b0e43511e0b8981c39035d814a4406c523
2020-12-08 19:38:23 -08:00
Rong Rong
b89c328493 Add fftw3 cmake as alternative for FFT/DFT (#48808)
Summary:
added cmake discovery in Dependencies.cmake for fftw3.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48808

Reviewed By: janeyx99

Differential Revision: D25375320

Pulled By: walterddr

fbshipit-source-id: cde3afc51eef9c621c7d19be7ad7573fc8b838c2
2020-12-08 10:35:33 -08:00