786 Commits

Author SHA1 Message Date
Huy Do
24e9bbe22a Revert "Flash Attention v2 (#105602)" (#108827)
This reverts commit add45aea1c.

There are conflicts in the benchmark CSV files (https://github.com/pytorch/pytorch/pull/105602#issuecomment-1710988951), so I need to revert this manually.

The diff has been reverted internally.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108827
Approved by: https://github.com/kit1980
2023-09-08 02:54:20 +00:00
cyy
621463a3e6 Update libfmt submodule to 10.1.1 (#108431)
This PR updates libfmt to version 10.1.1. We also set the UTF-8 source encoding earlier, before including third-party libraries on Windows.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108431
Approved by: https://github.com/Skylion007
2023-09-03 23:44:39 +00:00
drisspg
add45aea1c Flash Attention v2 (#105602)
# Summary
## PR Dependencies
I don't use ghstack :( this is a PR where it would have been helpful. That being said, I am going to peel off some PRs to make reviewing this easier:
- [x] Separate build flags for Flash and MemEff: #107985

### Description
This pull request updates the version of _scaled_dot_product_flash_attention from version 1 to version 2. The changes are based on the flash attention code originally authored by @tridao

### Changes Made
The majority of the changes in this pull request involve:

- Copying over the flash_attention sources.
- Updating header files.
- Removing padding and slicing code from within the flash_attention kernel and relocating it to the composite implicit region of the SDPA. This was needed to make the kernel functional and appease autograd.
- Introducing a simple kernel generator to generate different instantiations of the forward and backward flash templates.
- Adding conditional compilation (ifdef) to prevent building when nvcc is invoked with gencode < sm80.
- Introducing a separate dependent option for mem_eff_attention, as flash_attention v2 lacks support for Windows and cannot be built for sm50 generation codes.
- Modifying build.sh to reduce parallelization on sm86 runners and to lower the maximum parallelization on the manywheel builds. This adjustment was made to address out-of-memory issues during the compilation of FlashAttentionV2 sources.
- Adding/Updating tests.
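
The kernel generator item in the list above can be pictured with a minimal Python sketch. This is not the actual `generate_kernels.py`: the template text, the header name, the `run_flash_*` function name, the dtype names, and the head dimension list are all hypothetical, chosen only to show how one template might be instantiated once per (direction, dtype, head_dim) combination so that each instantiation lands in its own source file.

```python
# Hypothetical sketch of a kernel-instantiation generator.
# Names and values below are illustrative assumptions, not PyTorch's real ones.
from dataclasses import dataclass
from itertools import product

DTYPES = {"fp16": "half_t", "bf16": "bfloat16_t"}  # assumed dtype -> C++ type names
HEAD_DIMS = [32, 64, 128]                          # assumed head dimensions
DIRECTIONS = ["fwd", "bwd"]

# Placeholder template: one explicit template instantiation per generated file.
TEMPLATE = """// Auto-generated. Do not edit.
#include "launch_template.h"

template void run_flash_{direction}<{cpp_type}, {head_dim}>(Params &params);
"""


@dataclass(frozen=True)
class Kernel:
    direction: str
    dtype: str
    head_dim: int

    @property
    def filename(self) -> str:
        return f"flash_{self.direction}_hdim{self.head_dim}_{self.dtype}.cu"

    def render(self) -> str:
        return TEMPLATE.format(
            direction=self.direction,
            cpp_type=DTYPES[self.dtype],
            head_dim=self.head_dim,
        )


def all_kernels():
    """Yield one Kernel per (direction, dtype, head_dim) combination."""
    for direction, dtype, head_dim in product(DIRECTIONS, DTYPES, HEAD_DIMS):
        yield Kernel(direction, dtype, head_dim)


def generate():
    """Return a mapping of file name -> source text instead of writing to disk."""
    return {kernel.filename: kernel.render() for kernel in all_kernels()}
```

Splitting instantiations across many small files is one way to let the build system compile them in parallel and cap the memory used per compiler process, which is relevant to the out-of-memory issues mentioned in the list.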

### Notes for Reviewers
This is not a fun review, and I apologize in advance.
Most of the files-changed are in the flash_attn/ folder. The only files of interest here IMO:
- aten/src/ATen/native/transformers/cuda/flash_attn/flash_api.cpp
- aten/src/ATen/native/transformers/cuda/flash_attn/kernels/generate_kernels.py ( this has been incorporated upstream to flash-attention github)

There are a number of files all related to avoiding OOMs in CI/CD. These are typically shell scripts.

### Follow up items
- Include the updates from e07aa036db and 9e5e8bc91e | https://github.com/pytorch/pytorch/issues/108108

### Work Items
- [x] I don't think Windows will be supported for 3.1.0 - Need to update cmake
- [x] Let multi_query/attention pass through and test | UPDATE: I have the fast path implemented here: https://github.com/pytorch/pytorch/pull/106730 but since this will require changes to semantics of math to call repeat_interleave, I think this should be done as a followup.
- [x] Had to drop cutlass back to 3.0.0 to get it to compile. Need to figure out how to upgrade to 3.1.0 and later. Spoke with Tri and he is going to be taking a look. Note: compiling with clang currently errors for the cute headers.
- [x] Update tests to exercise the above codepath
- [x] Still need to disable on seq_len % 128 != 0 for backward (Tri beat me to it: a4f148b6ab)
- [x] Add determinism warning to BWD, Tri got to this one as well: 1c41d2b
- [x] Update dispatcher to universally prefer FlashV2
- [x] Update tests to exercise new head_dims
- [x] Move the head_dim padding from kernel to top level composite implicit function in order to make it purely functional
- [x] Create template generator script
- [x] Initial cmake support for building kernels/ folder
- [x] Replay CudaGraph changes

### Results
#### Forward only
The TFLOPs reported here are from an A100 that is underclocked.
![flashv2_tflops_vs_seq_len](https://github.com/pytorch/pytorch/assets/32754868/152de46d-8fa6-42f0-9a9c-ef1eb7ae29e7)

#### Forward+Backward
Ran a sweep, and for large compute-bound sizes we do see a ~2x performance increase for forward+backward.
<img width="1684" alt="Screenshot 2023-07-20 at 3 47 47 PM" src="https://github.com/pytorch/pytorch/assets/32754868/fdd26e07-0077-4878-a417-f3a418b6fb3b">

Pull Request resolved: https://github.com/pytorch/pytorch/pull/105602
Approved by: https://github.com/huydhn, https://github.com/cpuhrsch
2023-09-01 22:14:44 +00:00
PyTorch MergeBot
d569e506ab Revert "Flash Attention v2 (#105602)"
This reverts commit 9df3d882c8.

Reverted https://github.com/pytorch/pytorch/pull/105602 on behalf of https://github.com/huydhn due to I think we miss a case here for sm80 build on inductor workflow as it is now OOM on trunk https://github.com/pytorch/pytorch/actions/runs/6042843139 ([comment](https://github.com/pytorch/pytorch/pull/105602#issuecomment-1701974862))
2023-09-01 01:15:01 +00:00
drisspg
9df3d882c8 Flash Attention v2 (#105602)
# Summary
## PR Dependencies
I don't use ghstack :( this is a PR where it would have been helpful. That being said, I am going to peel off some PRs to make reviewing this easier:
- [x] Separate build flags for Flash and MemEff: #107985

### Description
This pull request updates the version of _scaled_dot_product_flash_attention from version 1 to version 2. The changes are based on the flash attention code originally authored by @tridao

### Changes Made
The majority of the changes in this pull request involve:

- Copying over the flash_attention sources.
- Updating header files.
- Removing padding and slicing code from within the flash_attention kernel and relocating it to the composite implicit region of the SDPA. This was needed to make the kernel functional and appease autograd.
- Introducing a simple kernel generator to generate different instantiations of the forward and backward flash templates.
- Adding conditional compilation (ifdef) to prevent building when nvcc is invoked with gencode < sm80.
- Introducing a separate dependent option for mem_eff_attention, as flash_attention v2 lacks support for Windows and cannot be built for sm50 generation codes.
- Modifying build.sh to reduce parallelization on sm86 runners and to lower the maximum parallelization on the manywheel builds. This adjustment was made to address out-of-memory issues during the compilation of FlashAttentionV2 sources.
- Adding/Updating tests.

### Notes for Reviewers
This is not a fun review, and I apologize in advance.
Most of the files-changed are in the flash_attn/ folder. The only files of interest here IMO:
- aten/src/ATen/native/transformers/cuda/flash_attn/flash_api.cpp
- aten/src/ATen/native/transformers/cuda/flash_attn/kernels/generate_kernels.py ( this has been incorporated upstream to flash-attention github)

There are a number of files all related to avoiding OOMs in CI/CD. These are typically shell scripts.

### Follow up items
- Include the updates from e07aa036db and 9e5e8bc91e | https://github.com/pytorch/pytorch/issues/108108

### Work Items
- [x] I don't think Windows will be supported for 3.1.0 - Need to update cmake
- [x] Let multi_query/attention pass through and test | UPDATE: I have the fast path implemented here: https://github.com/pytorch/pytorch/pull/106730 but since this will require changes to semantics of math to call repeat_interleave, I think this should be done as a followup.
- [x] Had to drop cutlass back to 3.0.0 to get it to compile. Need to figure out how to upgrade to 3.1.0 and later. Spoke with Tri and he is going to be taking a look. Note: compiling with clang currently errors for the cute headers.
- [x] Update tests to exercise the above codepath
- [x] Still need to disable on seq_len % 128 != 0 for backward (Tri beat me to it: a4f148b6ab)
- [x] Add determinism warning to BWD, Tri got to this one as well: 1c41d2b
- [x] Update dispatcher to universally prefer FlashV2
- [x] Update tests to exercise new head_dims
- [x] Move the head_dim padding from kernel to top level composite implicit function in order to make it purely functional
- [x] Create template generator script
- [x] Initial cmake support for building kernels/ folder
- [x] Replay CudaGraph changes

### Results
#### Forward only
The TFLOPs reported here are from an A100 that is underclocked.
![flashv2_tflops_vs_seq_len](https://github.com/pytorch/pytorch/assets/32754868/152de46d-8fa6-42f0-9a9c-ef1eb7ae29e7)

#### Forward+Backward
Ran a sweep, and for large compute-bound sizes we do see a ~2x performance increase for forward+backward.
<img width="1684" alt="Screenshot 2023-07-20 at 3 47 47 PM" src="https://github.com/pytorch/pytorch/assets/32754868/fdd26e07-0077-4878-a417-f3a418b6fb3b">

Pull Request resolved: https://github.com/pytorch/pytorch/pull/105602
Approved by: https://github.com/huydhn, https://github.com/cpuhrsch
2023-08-31 16:02:20 +00:00
drisspg
182a9cf366 Add Independent Memory Efficient and Flash Attention Build Flags (#107985)
# Summary
In an effort to simplify https://github.com/pytorch/pytorch/pull/105602, this PR pulls out independent chunks of code that can be landed prior to FlashV2 landing.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107985
Approved by: https://github.com/cpuhrsch
2023-08-28 18:39:18 +00:00
peterjc123
8507b22fea propagate _GLIBCXX_USE_CXX11_ABI to NVCC (#107209)
Fixes #107161

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107209
Approved by: https://github.com/malfet
2023-08-16 22:41:52 +00:00
Jesse Cai
f81f9093ec [core][pruning][feature] cuSPARSELt build integration (#103700)
Summary:

This stack of PR's integrates cuSPARSELt into PyTorch.

This PR adds support for cuSPARSELt into the build process.
It adds in a new flag, USE_CUSPARSELT that defaults to false.

When USE_CUSPARSELT=1 is specified, the user can also specify
CUSPARSELT_ROOT, which defines the path to the library.

Compiling pytorch with cusparselt support can be done as follows:

```
# Export the variables so the build started by setup.py can see them.
export USE_CUSPARSELT=1
export CUSPARSELT_ROOT=/path/to/cusparselt

python setup.py develop
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103700
Approved by: https://github.com/albanD
2023-08-02 12:48:39 +00:00
Driss Guessous
d184c81166 Add -fstandalone-debug debug flag (#104475)
# Summary

While debugging something in lldb, I found that the formatter I wrote for c10::intarrayref was not working correctly, producing:
`(std::string) $6 = error: summary string parsing error`

Based off of this thread: https://github.com/vadimcn/codelldb/issues/415

I added the standalone-debug information and fixed the std::string formatting issue.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/104475
Approved by: https://github.com/ezyang, https://github.com/malfet
2023-07-11 01:29:20 +00:00
Nikita Shulga
456ecefd52 [BE] Fix warning in top-level CMakeLists.txt (#104726)
Fixes warning introduced by https://github.com/pytorch/pytorch/issues/102594:
```
CMake Warning (dev) in CMakeLists.txt:
  A logical block opening on the line
    /pytorch/CMakeLists.txt:726 (if)
  closes on the line
    /pytorch/CMakeLists.txt:735 (endif)
  with mis-matching arguments.
```

<!--
copilot:poem
-->
### <samp>🤖 Generated by Copilot at b7555d5</samp>

> _`DEBUG_CUDA` on_
> _No more CUDA in exe_
> _Winter bug is fixed_

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104726
Approved by: https://github.com/huydhn, https://github.com/atalman
2023-07-06 22:13:29 +00:00
Xu Han
a956b1c849 optimize mimalloc build options. (#104497)
1. PyTorch only needs the static lib, so the other libs are disabled.
2. Disable override; PyTorch only accesses mimalloc via cpu_alloc/cpu_free.

Reference: https://github.com/microsoft/mimalloc/blob/master/CMakeLists.txt#L10-L25

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104497
Approved by: https://github.com/jgong5, https://github.com/albanD
2023-07-06 04:44:21 +00:00
Edward Z. Yang
3dc4adc7a6 Don't build CUDA with debug info by default. (#102617)
Fixes https://github.com/pytorch/pytorch/issues/102594

Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/102617
Approved by: https://github.com/malfet
2023-07-05 20:16:19 +00:00
Connor Baker
0c8323e4a4 cmake: allow USE_SYSTEM_ZSTD (#104611)
Fixes #44255.

This is part of larger work I'm doing to allow for more `USE_SYSTEM_*` options to allow Nix to have faster re-builds of PyTorch: https://github.com/NixOS/nixpkgs/pull/239291.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104611
Approved by: https://github.com/ezyang, https://github.com/malfet
2023-07-05 04:47:35 +00:00
Connor Baker
e8174faa02 cmake: respect USE_SYSTEM_LIBS when USE_NCCL is set (#104511)
Even though `USE_SYSTEM_LIBS` is set to true, we still need to set `USE_SYSTEM_NCCL` for the system NCCL to be used.

This fixes that by adding a conditional `set` similar to what is done for `USE_TBB`: e9ebda29d8/CMakeLists.txt (L426-L428)
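
The intended option relationship can be modeled with a short Python sketch. This is only an illustration of the rule described above, not the CMake code itself; the function name `resolve_system_options` and the dictionary representation of the options are assumptions made for the example.

```python
def resolve_system_options(options):
    """Return a copy of `options` where USE_SYSTEM_LIBS implies USE_SYSTEM_NCCL.

    Illustrative model only; the real logic is a conditional `set` in CMakeLists.txt.
    """
    resolved = dict(options)
    # Only force the system NCCL when NCCL is requested at all.
    if resolved.get("USE_SYSTEM_LIBS") and resolved.get("USE_NCCL"):
        resolved["USE_SYSTEM_NCCL"] = True
    return resolved
```

With this rule, a caller who sets only `USE_SYSTEM_LIBS` and `USE_NCCL` no longer has to remember to set `USE_SYSTEM_NCCL` separately.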

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104511
Approved by: https://github.com/ezyang
2023-07-04 19:08:50 +00:00
Xu Han
6c1ccccf21 Enable mimalloc on pytorch Windows (#102595)
This PR is an implementation of [#102534](https://github.com/pytorch/pytorch/issues/102534), option 2.
Major changes:
1. Add mimalloc to the submodule.
2. Add build option "USE_MIMALLOC".
3. It is only enabled on Windows builds, where it should improve PyTorch memory allocation performance.

Additional Test:
<img width="953" alt="image" src="https://github.com/pytorch/pytorch/assets/8433590/4b2ec2dc-16f1-4ad9-b457-cfeb37e489d3">
This PR also builds and statically links mimalloc on Linux without problems.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/102595
Approved by: https://github.com/jgong5, https://github.com/malfet
2023-06-27 08:53:26 +00:00
cyy
483f748dd5 [BE] Enforce missing override keyword (#104032)
This PR enables `-Winconsistent-missing-destructor-override` and `-Winconsistent-missing-override`
and fixes violations.

<!--
copilot:summary
-->
### <samp>🤖 Generated by Copilot at 47e904e</samp>

This pull request updates the code of various classes and operators in the `caffe2` and `aten` subdirectories to use the `override` specifier instead of the `virtual` keyword for destructors and other virtual functions that override a base class function. This improves the code readability, quality, and consistency with C++ best practices. It also modifies the `./CMakeLists.txt` file to enable warnings for these specifiers, but disable errors.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104032
Approved by: https://github.com/malfet
2023-06-24 02:34:24 +00:00
Nikita Shulga
0b7320315a [CI] Move libtorch-debug CUDA build to CUDA-12.1 (#102756)
To avoid nvcc segfaults, compile without `--source-in-ptx` option on CUDA-12.1+

<!--
copilot:poem
-->
### <samp>🤖 Generated by Copilot at 984e4b2</samp>

> _Sing, O Muse, of the daring deeds of PyTorch, the swift and fiery_
> _framework that harnesses the power of CUDA, the blazing tool of Nvidia._
> _How they faced a mighty challenge when CUDA, the ever-shifting,_
> _released a new version, twelve point one, that broke their code and caused them grief._

Fixes https://github.com/pytorch/pytorch/issues/102372

Pull Request resolved: https://github.com/pytorch/pytorch/pull/102756
Approved by: https://github.com/atalman
2023-06-01 23:11:07 +00:00
Nikita Shulga
30cecc0e11 [MPS] Fix build regressions introduced by #92868 (#101036)
https://github.com/pytorch/pytorch/pull/92868 introduced the `OBJC` and `OBJCXX` language dialects, but fails to propagate some important flags, such as the OpenMP include path (if found), `-fno-objc-arc`, and the `-Wno-unguarded-availability-new` suppression.

This PR remedies that and fixes https://github.com/pytorch/pytorch/issues/100925

<!--
copilot:summary
-->
### <samp>🤖 Generated by Copilot at 62677d4</samp>

This pull request improves the support for MPSGraph on Apple platforms by fixing some CMake flags for parallelism and memory management. It modifies `cmake/Dependencies.cmake` and `CMakeLists.txt` accordingly.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/101036
Approved by: https://github.com/atalman, https://github.com/huydhn
2023-05-10 04:15:41 +00:00
TachikakaMin
bb28f3f519 USE_PRECOMPILED_HEADERS is not supported on Apple M1 (#92868)
Fixes #80018

```bash
MACOSX_DEPLOYMENT_TARGET=12.6 CC=gcc CXX=g++ DEBUG=1 USE_DISTRIBUTED=0 USE_MKLDNN=0 USE_CUDA=0 BUILD_TEST=0 USE_FBGEMM=0 USE_NNPACK=0 USE_QNNPACK=0 USE_XNNPACK=0 USE_PRECOMPILED_HEADERS=1 USE_MPS=1 python setup.py develop
```

`error: Objective-C was disabled in PCH file but is currently enabled`

This PR(https://github.com/pytorch/pytorch/pull/80432) has been reverted.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/92868
Approved by: https://github.com/kulinseth, https://github.com/malfet
2023-05-08 16:03:34 +00:00
Bin Bao
e43918b93a [inductor] Fix AOTInductor (#99203)
Summary: Fix the broken AOTInductor flow and add a smoketest on CI.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/99203
Approved by: https://github.com/jansel
2023-04-25 14:42:12 +00:00
Nikita Shulga
6b8ef8ea4c [BE] Build PyTorch with -Wnewline-eof (#99687)
This would avoid further regressions like the ones reported in https://github.com/pytorch/pytorch/pull/96668#issuecomment-1468029259

Surround some ONNX/flatbuffer includes with `C10_DIAGNOSTIC_PUSH_AND_IGNORED_IF_DEFINED("-Wnewline-eof")` cone of shame

Fixes https://github.com/pytorch/pytorch/issues/96747

Pull Request resolved: https://github.com/pytorch/pytorch/pull/99687
Approved by: https://github.com/kit1980
2023-04-21 14:46:47 +00:00
Nikita Shulga
a8f5d72edf Guard color diagnostics opts by compiler type (#98952)
On Linux systems where `/usr/bin/c++` is not a symlink to either `g++` or `clang++`, `try_compile` can still incorrectly identify `gcc` as supporting the `-fcolor-diagnostics` flag.

Rather than introducing a super complex condition (i.e. `USE_CCACHE` and `LINUX` ...) just guard the checks specific to compiler identifier.

See https://github.com/ccache/ccache/issues/1275

Fixes https://github.com/pytorch/pytorch/issues/83500

Pull Request resolved: https://github.com/pytorch/pytorch/pull/98952
Approved by: https://github.com/albanD
2023-04-12 23:39:37 +00:00
Nikita Shulga
af0264ae08 [BE] Pass -faligned-new if supported by compiler (#97887)
<!--
copilot:poem
-->
### <samp>🤖 Generated by Copilot at 507f7a2</samp>

> _`-faligned-new` flag_
> _always on for C++17_
> _simpler winter code_
Pull Request resolved: https://github.com/pytorch/pytorch/pull/97887
Approved by: https://github.com/atalman, https://github.com/Skylion007
2023-03-30 03:16:19 +00:00
QiangZiBro
a95815c6b7 fix compiler version detection on MacOS (#97883)
<!--
copilot:summary
-->
### <samp>🤖 Generated by Copilot at 43c1df6</samp>

Fix build error on macOS with Xcode 12 or newer by updating clang version detection in `CMakeLists.txt`.

Fixes https://github.com/pytorch/pytorch/issues/97882

Pull Request resolved: https://github.com/pytorch/pytorch/pull/97883
Approved by: https://github.com/malfet
2023-03-30 02:56:22 +00:00
Nikita Shulga
96e3b3ac72 [BE] Cleanup CMake flag suppressions (#97584)
Use `append_cxx_flag_if_supported` to determine whether or not `-Werror` is supported
Do not suppress deprecation warnings if glog is not used/installed; the way the check is written right now, it suppresses deprecations even if `glog` is not installed.
Similarly, do not suppress deprecations on MacOS simply because we are compiling with protobuf.
Fix deprecation warnings in:
 - MPS by replacing `MTLResourceOptionCPUCacheModeDefault`->`MTLResourceCPUCacheModeDefaultCache`
 - In GTests by replacing `TYPED_TEST_CASE`->`TYPED_TEST_SUITE`
 - In `codegen/onednn/interface.cpp`, by passing `Stack` by reference rather than by pointer.

Do not guard calls to `append_cxx_flag_if_supported` with `if(CLANG)` or `if(GCC)`.
Fix some deprecated calls in `Metal` and hide the more complex exceptions under `C10_CLANG_DIAGNOSTIC_IGNORE`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/97584
Approved by: https://github.com/kit1980
2023-03-27 18:46:09 +00:00
Nikita Shulga
14177f0d3d [BE] Make USE_FLASH_ATTENTION private (#97579)
<!--
copilot:summary
-->
### <samp>🤖 Generated by Copilot at b07152e</samp>

This pull request refactors the CMake configuration to enable the `USE_FLASH_ATTENTION` feature for the `torch_cuda` target only, using a target-specific macro. This avoids conflicts with other libraries that also use this feature, such as fairseq.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/97579
Approved by: https://github.com/kit1980
2023-03-25 05:41:07 +00:00
mikey dagitses
5f5d675587 remove unused CAFFE2_VERSION macros (#97337)
remove unused CAFFE2_VERSION macros

Summary:
Nothing reads these and they are completely subsumed by TORCH_VERSION.

Getting rid of these will be helpful for build unification, since they
are also not used internally.

Test Plan: Rely on CI.

Reviewers: sahanp

Subscribers:

Tasks:

Tags:

Pull Request resolved: https://github.com/pytorch/pytorch/pull/97337
Approved by: https://github.com/malfet
2023-03-24 16:02:35 +00:00
Nikita Shulga
62c1e33fc9 [BE] Remove fast_nvcc tool (#96665)
As of CUDA-11.4+ this functionality can be mimicked by passing
[`--threads`](https://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/#threads-number-t) option to CUDA compiler

Pull Request resolved: https://github.com/pytorch/pytorch/pull/96665
Approved by: https://github.com/atalman, https://github.com/PaliC
2023-03-14 03:17:31 +00:00
cyy
666efd8d5d Improve ASAN and TSAN handling in cmake (#93147)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/93147
Approved by: https://github.com/malfet
2023-03-07 14:10:13 +00:00
Peter Bell
c5f6092591 Use FindCUDAToolkit to find cuda dependencies (#82695)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82695
Approved by: https://github.com/malfet
2023-03-01 17:26:36 +00:00
PyTorch MergeBot
801b3f8fc7 Revert "Use FindCUDAToolkit to find cuda dependencies (#82695)"
This reverts commit 7289d22d67.

Reverted https://github.com/pytorch/pytorch/pull/82695 on behalf of https://github.com/peterbell10 due to Breaks torchaudio build
2023-02-28 02:29:09 +00:00
cyy
f27e09de04 Cleanup Windows warning suppression in CMake and fix some warnings in the source code (#94927)
This PR does two things:
1. It moves some Windows warning suppression from various CMake files into the main CMakeLists.txt, following the conventions of gcc and clang.
2. It fixes some Windows warnings in the source code. Most importantly, it fixes lots of dll warnings by adjusting C10_API to TORCH_API or TORCH_PYTHON_API. There are still some dll warnings because some TORCH_API functions are actually built as part of libtorch_python

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94927
Approved by: https://github.com/malfet
2023-02-27 19:22:20 +00:00
cyy
c1fa403e57 suppress nvfuser loading warning when we disable nvfuser (#95603)
To avoid annoying warnings such as "[W interface.cpp:47] Warning: Loading nvfuser library failed"

Pull Request resolved: https://github.com/pytorch/pytorch/pull/95603
Approved by: https://github.com/ezyang
2023-02-27 18:56:46 +00:00
Peter Bell
7289d22d67 Use FindCUDAToolkit to find cuda dependencies (#82695)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82695
Approved by: https://github.com/malfet
2023-02-21 22:35:17 +00:00
cyy
1ab112cfab code is clean enough that some warnings can be enabled (#95139)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/95139
Approved by: https://github.com/Skylion007
2023-02-21 07:24:20 +00:00
jjsjann123
21eb7f70f1 Nvfuser python API import fix (#94036)
1. Having nvfuser python API import working with both devel and upstream;
2. Add environment variable to allow custom nvfuser code base to be built with upstream pytorch core.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94036
Approved by: https://github.com/malfet, https://github.com/davidberard98
2023-02-16 20:10:40 +00:00
Jing Xu
8b37eff69f remove abi uncertainty and potential abi conflict (#94306)
Currently there is a potential conflict for `GLIBCXX_USE_CXX11_ABI` configuration if users don't explicitly set this variable.
In `caffe2/CMakeLists.txt`, if the variable is not set, an `abi checker` will be used to retrieve the ABI configuration from compiler.
https://github.com/pytorch/pytorch/blob/master/caffe2/CMakeLists.txt#L1165-L1183
However, in `torch/csrc/Module.cpp`, if the variable is not set, it will be set to `0`. The conflict happens when the default ABI of the compiler is `1`.
https://github.com/pytorch/pytorch/blob/master/torch/csrc/Module.cpp#L1612

This PR eliminates this uncertainty and potential conflict.
The ABI is checked and set in `CMakeLists.txt`, and the value is passed to `caffe2/CMakeLists.txt`. Meanwhile, in case `caffe2/CMakeLists.txt` is invoked directly from a `cmake` command, the original GLIBC check logic is kept in that file.
If users don't explicitly assign a value to `GLIBCXX_USE_CXX11_ABI`, the `abi checker` is executed and the value is set accordingly. If the `abi checker` fails to compile or execute, the value is set to `0`. If users explicitly assign a value, the provided value is used.
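
The resolution order described above can be summarized in a small Python sketch. It only models the decision logic; the real implementation is CMake code, and both the function name `resolve_cxx11_abi` and the `run_abi_checker` callable are hypothetical stand-ins introduced for illustration.

```python
from typing import Callable, Optional


def resolve_cxx11_abi(user_value: Optional[int],
                      run_abi_checker: Callable[[], int]) -> int:
    """Pick the GLIBCXX_USE_CXX11_ABI value following the order in the text."""
    if user_value is not None:
        # An explicitly assigned value always wins.
        return user_value
    try:
        # Otherwise ask the (hypothetical) checker for the compiler's default.
        return run_abi_checker()
    except Exception:
        # The checker failed to compile or execute: fall back to 0.
        return 0
```

Modeling the checker as a callable that may raise keeps the three cases (explicit value, detected value, fallback) visible in one place.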

Moreover, if `GLIBCXX_USE_CXX11_ABI` is set to `0`, the '-DGLIBCXX_USE_CXX11_ABI=0' flag won't be appended to `CMAKE_CXX_FLAGS`. Thus, whether ABI=0 or ABI=1 is used depends entirely on the compiler's default configuration. This can cause a problem: even if users explicitly set `GLIBCXX_USE_CXX11_ABI` to `0`, the compiler still builds the binaries with ABI=1.
https://github.com/pytorch/pytorch/blob/master/CMakeLists.txt#L44-L51
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94306
Approved by: https://github.com/malfet
2023-02-09 09:54:04 +00:00
cyy
9291f9b9e2 Simplify cmake code (#91546)
We use various newer CMake features to simplify build system:
1. Caffe2::threads is replaced by threads::threads.
2. Some unused MSVC flags are removed.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/91546
Approved by: https://github.com/malfet, https://github.com/Skylion007
2023-02-08 01:05:19 +00:00
PyTorch MergeBot
1063394898 Revert "Add fabi-version=11 to ensure compatibility between gcc7 and gcc9 binaries for _GLIBCXX_USE_CXX11_ABI=1 (#93835)"
This reverts commit b562be793a.

Reverted https://github.com/pytorch/pytorch/pull/93835 on behalf of https://github.com/huydhn due to This breaks XLA build b562be793a
2023-02-07 04:49:06 +00:00
zhuhong61
b562be793a Add fabi-version=11 to ensure compatibility between gcc7 and gcc9 binaries for _GLIBCXX_USE_CXX11_ABI=1 (#93835)
Fixes https://github.com/pytorch/pytorch/pull/92550

Pull Request resolved: https://github.com/pytorch/pytorch/pull/93835
Approved by: https://github.com/malfet
2023-02-07 03:05:39 +00:00
Aaron Gokaslan
2fc2ca7652 [BE]: Fix CMake LTO policy on pytorch (#93388)
Note this is a non-functional change since none of our CIs actually build with LTO.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/93388
Approved by: https://github.com/albanD
2023-02-01 17:06:53 +00:00
Dmytro Dzhulgakov
5105a8d3fc Enable Kineto in OSS builds by fixing build condition (resubmit) (#93033)
Resubmit of https://github.com/pytorch/pytorch/pull/89174 . I think I fixed underlying issues back then, but only CI would tell.

Context: This PR enables Kineto on OSS builds because of how the flags were misconfigured before. I think generally having global observer in OSS is nice. There's some work to release on demand profiling with dynolog, and right now its build instructions start with "go change pytorch's CMake": https://github.com/facebookincubator/dynolog/blob/main/docs/pytorch_profiler.md#pytorch-setup

The previous PR was reverted because of the bug in Kineto that got fixed in https://github.com/pytorch/kineto/pull/696 (and the submodule was updated since)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/93033
Approved by: https://github.com/kimishpatel
2023-01-27 08:58:03 +00:00
jjsjann123
c11b301bcd [NVFUSER] refactor nvfuser build (#89621)
This PR is the first step towards refactoring the nvfuser build so that the codegen becomes a standalone library.

Contents inside this PR:
1. nvfuser code base has been moved to `./nvfuser`, from `./torch/csrc/jit/codegen/cuda/`, except for registration code for integration (interface.h/interface.cpp)
2. splits the build system so nvfuser is generating its own `.so` files. Currently there are:
    - `libnvfuser_codegen.so`, which contains the integration, codegen and runtime system of nvfuser
    - `nvfuser.so`, which is nvfuser's python API via pybind. Python frontend is now exposed via `nvfuser._C.XXX` instead of `torch._C._nvfuser`
3. nvfuser cpp tests are currently compiled into `nvfuser_tests`
4. cmake is refactored so that:
    - nvfuser now has its own `CMakeLists.txt`, which is under `torch/csrc/jit/codegen/cuda/`.
    - nvfuser backend code is not compiled inside `libtorch_cuda_xxx` any more
    - nvfuser is added as a subdirectory under `./CMakeLists.txt` at the very end after torch is built.
    - since nvfuser has dependency on torch, the registration of nvfuser at runtime is done via dlopen (`at::DynamicLibrary`). This avoids circular dependency in cmake, which will be a nightmare to handle. For details, look at `torch/csrc/jit/codegen/cuda/interface.cpp::LoadingNvfuserLibrary`
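
As a rough analogy for the runtime registration described in the last item, the following Python sketch loads a shared library on demand and degrades gracefully when it is missing. It is not how PyTorch does it (that path goes through the C++ `at::DynamicLibrary` wrapper); the helper name `try_load_library` is hypothetical, and `ctypes.CDLL` is used here only because it performs the same kind of runtime load from Python.

```python
import ctypes
from typing import Optional


def try_load_library(path: str) -> Optional[ctypes.CDLL]:
    """Load a shared library at runtime; return None if it cannot be loaded."""
    try:
        return ctypes.CDLL(path)
    except OSError:
        # Missing or unloadable library: the caller decides how to proceed.
        return None


# The core library has no link-time dependency on the plugin; it only
# attempts to load it when needed.
handle = try_load_library("libnvfuser_codegen.so")
if handle is None:
    print("nvfuser codegen library not found; continuing without it")
```

Because the dependency is resolved only at load time, the build graph stays acyclic: the plugin links against the core library, and the core library never links against the plugin.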

Future work that's scoped in following PR:
- Currently since nvfuser codegen has dependency on torch, we need to refactor that out so we can move nvfuser into a submodule and not rely on dlopen to load the library. @malfet
- Since we moved nvfuser into a cmake build, we effectively disabled bazel build for nvfuser. This could impact internal workload at Meta, so we need to put support back. cc'ing @vors

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89621
Approved by: https://github.com/davidberard98
2023-01-26 02:50:44 +00:00
Driss Guessous
a3715efd8b Remove windows check for cmake to build Fused kernels (#91909)
# Summary
Add support for fused attention kernels (FlashAttention and memory-efficient attention) on Windows. Previously we could not do this because the fixes required C++17, but we have since updated the PyTorch standard.

This PR:
- Changes invocations of unsigned long to the fixed width integer type
- Adds in the #define FP16_SWITCH(COND, ...) which has been added to the flash_attention main branch
- Changes some macros used within the mem-efficient attention code to work around the VA_ARG discrepancy between clang/gcc and msvc. An alternative would be setting the global flag /Zc:preprocessor
- Selectively applies /Zc:lambda to only the mem-efficient sources since applying this globally caused quantization files to not compile

Pull Request resolved: https://github.com/pytorch/pytorch/pull/91909
Approved by: https://github.com/cpuhrsch
2023-01-25 01:21:12 +00:00
PyTorch MergeBot
523d4f2562 Revert "[cuDNN][cuDNN V8 API] Always build assuming cuDNN >= 8.0 (#91527)"
This reverts commit 4d07ad74f1.

Reverted https://github.com/pytorch/pytorch/pull/91527 on behalf of https://github.com/DanilBaibak due to Break internal build
2023-01-16 13:28:09 +00:00
Edward Z. Yang
1da0ac2c93 Enable -Werror=bool-operation (#92221)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/92221
Approved by: https://github.com/Skylion007
2023-01-15 20:49:53 +00:00
Eddie Yan
4d07ad74f1 [cuDNN][cuDNN V8 API] Always build assuming cuDNN >= 8.0 (#91527)
We've been building with V8 (incl. V8 API) by default for a while now; this PR cleans up some guards for cuDNN < 8.0.

CC @ptrblck @ngimel
Pull Request resolved: https://github.com/pytorch/pytorch/pull/91527
Approved by: https://github.com/ngimel
2023-01-13 18:55:37 +00:00
Huy Do
33e3c9ac67 Not explicitly set the manifest filename in Windows (#91988)
I'm at a loss to explain why this happens, but not setting the manifest file explicitly in the linker fixes it.

### Testing locally

* With `/MANIFESTFILE:bin\torch_python.dll.manifest`
```
C:\PROGRA~2\MICROS~2\2019\BUILDT~1\VC\Tools\MSVC\1428~1.293\bin\Hostx64\x64\link.exe /nologo @CMakeFiles\torch_python.rsp /out:bin\torch_python.dll /implib:lib\torch_python.lib /pdb:bin\torch_python.pdb /dll /version:0.0 /machine:x64 /ignore:4049 /ignore:4217 /ignore:4099 /INCREMENTAL:NO /NODEFAULTLIB:LIBCMT.LIB -WHOLEARCHIVE:C:/actions-runner/_work/pytorch/pytorch/build/lib/onnx.lib /MANIFEST /MANIFESTFILE:bin\torch_python.dll.manifest

LINK : fatal error LNK1000: Internal error during CImplib::EmitImportThunk
```

* Works fine without the flag
```
C:\PROGRA~2\MICROS~2\2019\BUILDT~1\VC\Tools\MSVC\1428~1.293\bin\Hostx64\x64\link.exe /nologo @CMakeFiles\torch_python.rsp /out:bin\torch_python.dll /implib:lib\torch_python.lib /pdb:bin\torch_python.pdb /dll /version:0.0 /machine:x64 /ignore:4049 /ignore:4217 /ignore:4099 /INCREMENTAL:NO /NODEFAULTLIB:LIBCMT.LIB -WHOLEARCHIVE:C:/actions-runner/_work/pytorch/pytorch/build/lib/onnx.lib /MANIFEST
```

In both cases, the `/MANIFEST` flag is set, so the manifest file is there.  In the latter case, the filename comes from appending the `.manifest` suffix to `bin\torch_python.dll`.  Thus, it is still correctly `bin\torch_python.dll.manifest`.  Weird.

```
C:\actions-runner\_work\pytorch\pytorch>ls -la build/bin/torch_*
-rwxr-xr-x 1 runneruser 197121 246796288 Jan 11 04:30 build/bin/torch_cpu.dll
-rw-r--r-- 1 runneruser 197121       381 Jan 11 04:26 build/bin/torch_cpu.dll.manifest
-rwxr-xr-x 1 runneruser 197121      9728 Jan 11 03:55 build/bin/torch_global_deps.dll
-rw-r--r-- 1 runneruser 197121       381 Jan 11 03:55 build/bin/torch_global_deps.dll.manifest
-rwxr-xr-x 1 runneruser 197121  11746816 Jan 11 04:31 build/bin/torch_python.dll
-rw-r--r-- 1 runneruser 197121       381 Jan 11 04:30 build/bin/torch_python.dll.manifest
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/91988
Approved by: https://github.com/malfet, https://github.com/Blackhex, https://github.com/ZainRizvi
2023-01-11 22:28:08 +00:00
salilsdesai
ec94cbc66a [Vulkan] Remove GLSL Code Gen (#91912)
@bypass-github-export-checks

GLSL Code Gen is not used, so this diff removes
- GLSL parts of ShaderSource
- Anything enclosed by USE_VULKAN_SHADERC_RUNTIME, as well as the flag itself
- gen_vulkan_glsl script

Plus some additional refactoring

Differential Revision: [D41358861](https://our.internmc.facebook.com/intern/diff/D41358861/)

**NOTE FOR REVIEWERS**: This PR has internal Meta-specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D41358861/)!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/91912
Approved by: https://github.com/mcr229
2023-01-10 20:29:47 +00:00
cyy
9710ac6531 Some CMake and CUDA cleanup given recent update to C++17 (#90599)
The main changes are:
1. Remove outdated checks for old compiler versions because they can't support C++17.
2. Remove outdated CMake checks because it now requires 3.18.
3. Remove outdated CUDA checks because we are moving to CUDA 11.

Almost all changes are in CMake files for easy auditing.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90599
Approved by: https://github.com/soumith
2022-12-30 11:19:26 +00:00
Mengwei Liu
2f154f68ea [torchgen] Add CI job to make sure torchgen works for Executorch op registration (#89596)
## Job

Test running on most CI jobs.

## Test binary

* `test_main.cpp`: entry for gtest
* `test_operator_registration.cpp`: test cases for gtest

## Helper sources

* `operator_registry.h/cpp`: simple operator registry for testing purpose.
* `Evalue.h`: a boxed data type that wraps ATen types, for testing purpose.
* `selected_operators.yaml`: operators Executorch cares about so far; we should cover all of them.

## Templates

* `NativeFunctions.h`: for generating headers for native functions. (not compiled in the test, since we will be using `libtorch`)
* `RegisterCodegenUnboxedKernels.cpp`: for registering boxed operators.
* `Functions.h`: for declaring operator C++ APIs. Generated `Functions.h` merely wraps `ATen/Functions.h`.

## Build files

* `CMakeLists.txt`: generate code to register ops.
* `build.sh`: driver file, to be called by CI job.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89596
Approved by: https://github.com/ezyang
2022-12-21 03:07:32 +00:00
mikey dagitses
322e4b4c8a set -Wsuggest-override for builds (#89852)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/pytorch/pytorch/pull/89852).
* __->__ #89852
* #89851

set -Wsuggest-override for builds

Summary: This was flagged by a Meta internal build.

Test Plan: Rely on CI.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89852
Approved by: https://github.com/malfet
2022-12-19 22:08:47 +00:00
mikey dagitses
8bd959e462 set -Winconsistent-missing-override for builds (#89851)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/pytorch/pytorch/pull/89851).
* #89852
* __->__ #89851

set -Winconsistent-missing-override for builds

Summary: This has triggered internally on some PyTorch code.

Test Plan: Rely on CI.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89851
Approved by: https://github.com/malfet
2022-12-17 00:30:06 +00:00
Nikita Shulga
36ac095ff8 Migrate PyTorch to C++17 (#85969)
With CUDA-10.2 gone we can finally do it!

This PR mostly contains build system related changes, invasive functional ones are to be followed.
Among many expected tweaks to the build system, here are few unexpected ones:
 - Force onnx_proto project to be updated to C++17 to avoid `duplicate symbols` error when compiled by gcc-7.5.0, as storage rule for `constexpr` changed in C++17, but gcc does not seem to follow it
 - Do not use `std::apply` on CUDA but rely on the built-in variant, as it results in test failures when CUDA runtime picks host rather than device function when `std::apply` is invoked from CUDA code.
 - `std::decay_t` -> `::std::decay_t` and `std::move`->`::std::move` as VC++ for some reason claims that `std` symbol is ambiguous
 - Disable use of `std::aligned_alloc` on Android, as its `libc++` does not implement it.

Some prerequisites:
 - https://github.com/pytorch/pytorch/pull/89297
 - https://github.com/pytorch/pytorch/pull/89605
 - https://github.com/pytorch/pytorch/pull/90228
 - https://github.com/pytorch/pytorch/pull/90389
 - https://github.com/pytorch/pytorch/pull/90379
 - https://github.com/pytorch/pytorch/pull/89570
 - https://github.com/facebookincubator/gloo/pull/336
 - https://github.com/facebookincubator/gloo/pull/343
 - 919676fb32

Fixes https://github.com/pytorch/pytorch/issues/56055

Pull Request resolved: https://github.com/pytorch/pytorch/pull/85969
Approved by: https://github.com/ezyang, https://github.com/kulinseth
2022-12-08 02:27:48 +00:00
Nikita Shulga
3ad2a032f4 Update default cmake to 3.18 (#89570)
Set `cmake.dir` to `/usr/local` in `.circleci/scripts/build_android_gradle.sh `
Prep change for raising the compiler standard to C++17: cmake-3.18 is the first version to support C++17 as the CUDA language standard

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89570
Approved by: https://github.com/atalman
2022-11-23 23:23:26 +00:00
PyTorch MergeBot
902e4e3926 Revert "Fix the kineto daemon build condition (#89174)"
This reverts commit 9fd00f194a.

Reverted https://github.com/pytorch/pytorch/pull/89174 on behalf of https://github.com/robieta due to For some reason this is interacting badly with NVFuser. I think it is instability in kineto, but until we figure out what's going on reverting is a necessary evil.
2022-11-23 19:05:14 +00:00
mikey dagitses
92f9214a31 add -Wnarrowing as error to cmake builds (#89207)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89207
Approved by: https://github.com/wconstab, https://github.com/malfet
2022-11-18 03:16:18 +00:00
Dmytro Dzhulgakov
9fd00f194a Fix the kineto daemon build condition (#89174)
If we're not building the lite interpreter we shouldn't be disabling Kineto. This eliminates a step from https://github.com/facebookincubator/dynolog/blob/main/docs/pytorch_profiler.md
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89174
Approved by: https://github.com/kimishpatel, https://github.com/malfet
2022-11-18 02:42:45 +00:00
Pruthvi Madugundu
fbd08fb358 Introduce TORCH_DISABLE_GPU_ASSERTS (#84190)
- Asserts for CUDA are enabled by default
- Disabled for ROCm by default by setting `TORCH_DISABLE_GPU_ASSERTS` to `ON`
- Can be enabled for ROCm by setting the above variable to `OFF` during the build, or can be forcefully enabled by setting `ROCM_FORCE_ENABLE_GPU_ASSERTS:BOOL=ON`
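
As a rough sketch of how a build option like this usually reaches the code, the build system passes it as a compile definition and an assert macro expands to nothing when it is set. `EXAMPLE_KERNEL_ASSERT` and `checked_div` are hypothetical names for illustration; only the `TORCH_DISABLE_GPU_ASSERTS` name comes from this commit, and the real macro in PyTorch may be structured differently:

```cpp
#include <cassert>

// Hypothetical macro: compiled out when the build defines
// TORCH_DISABLE_GPU_ASSERTS, a normal assert otherwise.
#ifdef TORCH_DISABLE_GPU_ASSERTS
#define EXAMPLE_KERNEL_ASSERT(cond) ((void)0)
#else
#define EXAMPLE_KERNEL_ASSERT(cond) assert(cond)
#endif

// Hypothetical function standing in for device code that checks its inputs.
int checked_div(int a, int b) {
  EXAMPLE_KERNEL_ASSERT(b != 0);
  return a / b;
}
```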

These are follow-up changes as per the comment in PR #81790, comment [link](https://github.com/pytorch/pytorch/pull/81790#issuecomment-1215929021)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/84190
Approved by: https://github.com/jeffdaily, https://github.com/malfet
2022-11-04 04:43:05 +00:00
PyTorch MergeBot
0fa23663cc Revert "Introduce TORCH_DISABLE_GPU_ASSERTS (#84190)"
This reverts commit 1e2c4a6e0e.

Reverted https://github.com/pytorch/pytorch/pull/84190 on behalf of https://github.com/malfet due to Needs internal changes, has to be landed via co-dev
2022-11-02 18:13:37 +00:00
Pruthvi Madugundu
1e2c4a6e0e Introduce TORCH_DISABLE_GPU_ASSERTS (#84190)
- Asserts for CUDA are enabled by default
- Disabled for ROCm by default by setting `TORCH_DISABLE_GPU_ASSERTS` to `ON`
- Can be enabled for ROCm by setting the above variable to `OFF` during the build, or can be forcefully enabled by setting `ROCM_FORCE_ENABLE_GPU_ASSERTS:BOOL=ON`

These are follow-up changes as per the comment in PR #81790, comment [link](https://github.com/pytorch/pytorch/pull/81790#issuecomment-1215929021)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/84190
Approved by: https://github.com/jeffdaily, https://github.com/malfet
2022-11-02 17:41:57 +00:00
Nikita Shulga
e1c123d29a Add UBSAN to ASAN (#88055)
Add undefined behavior sanitizer to `USE_ASAN` option.
Added `torch._C._crash_if_vptr_ubsan()` that only fails if vptr belongs to a wrong class after typecast
Deleted all ubsan suppressions, but disabled `ProtoTest::Basic` as it fails the above-mentioned vptr check.

Fixes https://github.com/pytorch/pytorch/issues/88042
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88055
Approved by: https://github.com/ezyang
2022-11-01 17:59:35 +00:00
Radek Bartoň
ba26bc0fc2 Fix random "C1041: cannot open program database" errors when compiling on Windows (#88084)
Adds `/FS` option to `CMAKE_CXX_FLAGS` and `CMAKE_CUDA_FLAGS`.

So far I've encountered this kind of error:

```
C:\Users\MyUser\AppData\Local\Temp\tmpxft_00004728_00000000-7_cuda.cudafe1.cpp: fatal error C1041: cannot open program database 'C:\Projects\pytorch\build\third_party\gloo\gloo\CMakeFiles\gloo_cuda.dir\vc140.pdb'; if multiple CL.EXE write to the same .PDB file, please use /FS
```
when building with VS 2022.

cc @peterjc123 @mszhanyi @skyline75489 @nbcsm

Related issues:
- https://github.com/pytorch/pytorch/issues/87691
- https://github.com/pytorch/pytorch/issues/39989
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88084
Approved by: https://github.com/ezyang
2022-10-31 21:11:16 +00:00
Nikita Shulga
30ea8f5c20 Limit ROCM option to Linux only (#87833)
As it's available on neither Windows nor macOS

cc @jeffdaily @sunway513 @jithunnair-amd @ROCmSupport
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87833
Approved by: https://github.com/kit1980
2022-10-27 01:24:03 +00:00
Nikita Shulga
c28cdb53ea [BE] Delete BUILD_SPLIT_CUDA option (#87502)
We link with cuDNN and cuBLAS dynamically for all configs anyway: the statically linked cuDNN is a different library than the dynamically linked one, it increases the default memory footprint, and so on. In addition, libtorch_cuda, even when compiled for all GPU architectures, no longer approaches the 2 GB binary size limit, so BUILD_SPLIT_CUDA can go away.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/87502
Approved by: https://github.com/atalman
2022-10-22 06:00:59 +00:00
albanD
c141f28b64 Fix compilation warning and spurious print (#87297)
Fixes a compilation warning, makes this warning an error, and removes a stray print.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87297
Approved by: https://github.com/malfet
2022-10-19 20:56:37 +00:00
Will Constable
78ef40973c Set -Werror=braced-scalar-init (#86911)
- `vector<T>({0})` would give you the vector(size, ...) ctor and produce an empty vector of T, along with the scalar-init warning
- `vector<T>({T(0)})` would give you the vector of a single T(0) as you might have intended, and bypasses the warning/error
- the warning can easily be missed but can have serious consequences, so make it an error

Pull Request resolved: https://github.com/pytorch/pytorch/pull/86911
Approved by: https://github.com/albanD
2022-10-14 22:34:36 +00:00
Nikita Shulga
09364f4298 Compile C10 with Wshadow (#86666)
This should prevent further regressions like https://github.com/pytorch/pytorch/pull/86646
Update `fmt` to `7.1.0` to fix variable shadowing in that library

Pull Request resolved: https://github.com/pytorch/pytorch/pull/86666
Approved by: https://github.com/seemethere
2022-10-11 22:39:58 +00:00
Huy Do
7f02f2ac0c [Experimentation] Add TSAN build and test (#85313)
Some parts of the PR are adapted from the previously abandoned https://github.com/pytorch/pytorch/pull/36694.  This PR is the first part of setting up TSAN jobs in the CI.  The data race warnings from TSAN will need to be reviewed later in a separate PR.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85313
Approved by: https://github.com/osalpekar
2022-10-11 19:34:44 +00:00
PyTorch MergeBot
deb414a43f Revert "Use FindCUDAToolkit to find cuda dependencies (#82695)"
This reverts commit fb9b96593c.

Reverted https://github.com/pytorch/pytorch/pull/82695 on behalf of https://github.com/malfet due to Break cublas packaging into wheel
2022-10-11 02:50:47 +00:00
Peter Bell
fb9b96593c Use FindCUDAToolkit to find cuda dependencies (#82695)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82695
Approved by: https://github.com/malfet
2022-10-06 15:43:39 +00:00
Sahan Paliskara
936e93058b Delete torch::deploy from pytorch core (#85953)
As we have migrated torch::deploy over to https://github.com/pytorch/multipy, we can now delete it from pytorch core as ongoing development will happen there.

This PR was created due to syncing issues with https://github.com/pytorch/pytorch/pull/85443 which is where the review history can be found.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85953
Approved by: https://github.com/seemethere, https://github.com/malfet
2022-10-06 07:20:16 +00:00
Nirav Mehta
d724a91935 Adding Wunused-local-typedef build flag (#86154)
# Summary

In the past, we have seen PRs cause internal breakages due to the `-Wunused-local-typedef` flag, which then had to be fixed. For example: [#79978](https://github.com/pytorch/pytorch/pull/79978)

As part of this change, we want to catch this error in the PR Checks itself.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/86154
Approved by: https://github.com/huydhn, https://github.com/seemethere, https://github.com/osalpekar
2022-10-04 19:43:57 +00:00
Nikita Shulga
4c6dc6a1a4 [BE] Do not use VLA (#85800)
[Variable Length Array](https://en.wikipedia.org/wiki/Variable-length_array) is part of the C99 standard, but has never been adopted into C++

Also, warn if they are used (surprisingly, those warnings cannot be turned into errors).
Remove code duplication in `OperationUtils.mm`
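
A minimal sketch of the usual replacement for a VLA, assuming a heap-backed `std::vector` is acceptable at the call site. `sum_of_squares` is a hypothetical function, not code touched by this PR:

```cpp
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper: sums i*i for i in [0, n) through a scratch buffer.
long sum_of_squares(std::size_t n) {
  // A VLA would be spelled `long buf[n];`, which C++ compilers accept only
  // as an extension. std::vector gives a runtime-sized buffer portably.
  std::vector<long> buf(n);
  for (std::size_t i = 0; i < n; ++i) {
    buf[i] = static_cast<long>(i * i);
  }
  return std::accumulate(buf.begin(), buf.end(), 0L);
}
```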

Pull Request resolved: https://github.com/pytorch/pytorch/pull/85800
Approved by: https://github.com/kulinseth, https://github.com/jeanschmidt
2022-09-28 17:12:25 +00:00
Omkar Salpekar
f4251525de Adding Wunused-lambda-capture to Clang build flags (#85655)
Add `-Wunused-lambda-capture` to clang build flags to better align internal and OSS build systems. This flag is not supported in gcc so only adding for clang builds.
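
For illustration, a small sketch of the pattern this warning targets. `count_above` is a hypothetical helper; the lambda below captures only what it uses, and the comment shows the variant clang would report:

```cpp
#include <algorithm>
#include <vector>

// Hypothetical helper: counts the values strictly above `threshold`.
int count_above(const std::vector<int>& values, int threshold) {
  // Writing the capture list as [threshold, &values] would capture `values`
  // without using it in the lambda body, which is what
  // -Wunused-lambda-capture reports.
  auto above = [threshold](int v) { return v > threshold; };
  return static_cast<int>(
      std::count_if(values.begin(), values.end(), above));
}
```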
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85655
Approved by: https://github.com/huydhn
2022-09-27 18:11:18 +00:00
Huy Do
e4471032da Enforce non-virtual-dtor everywhere (#85586)
This can finally be removed because NVIDIA has merged my PR on https://github.com/NVIDIA/cudnn-frontend/pull/33
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85586
Approved by: https://github.com/seemethere, https://github.com/ZainRizvi
2022-09-26 21:35:00 +00:00
cpuhrsch
6a04df3ac8 Get flash_attn to compile for CUDA 11.6 linux nightly build (#84941)
This PR only attempts to get this code to compile for all archs so that we can dispatch to it in https://github.com/pytorch/pytorch/pull/84653
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84941
Approved by: https://github.com/drisspg, https://github.com/malfet
2022-09-26 20:49:19 +00:00
Nikita Shulga
d05a11337c [CMake] Add functorch target (#83464)
Move functorch/functorch into `functorch` folder
- Add functorch/CMakeLists.txt that adds `functorch` native python extension
- Modify `setup.py` to package pytorch and functorch together into a single wheel
- Modify `functorch.__version__` is not equal to that of `torch.__version__`
- Add dummy `functorch/setup.py` file for the projects that still want to build it

Differential Revision: [D39058811](https://our.internmc.facebook.com/intern/diff/D39058811)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83464
Approved by: https://github.com/zou3519
2022-09-14 00:05:33 +00:00
Driss Guessous
0fc02dbba4 flash_attention integration (#81434)
# Summary:
- I added a new submodule Cutlass pointing to 2.10 release. The inclusion of flash_attention code should be gated by the flag: USE_FLASH_ATTENTION. This defaults to off, so flash is not built anywhere. This is done on purpose since we don't have A100 machines to compile and test on.

- Only looked at CMake; did not attempt bazel or buck yet.

-  I included the mha_fwd from flash_attention that has been refactored to use cutlass 2.10. There is currently no backwards kernel on this branch. That would be a good follow-up.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/81434
Approved by: https://github.com/cpuhrsch
2022-09-09 20:11:26 +00:00
Dhruv Matani
0d46bfac5b [Mobile] Use -ffunction-sections and -fdata-sections for Mobile builds (#84704)
Summary: Set `-ffunction-sections` and `-fdata-sections` so that each method has its own text section. This allows the linker to remove unused sections when the flag `-Wl,-gc-sections` is provided at link time.

Test Plan: CI and local build using `build_mobile.sh`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84704
Approved by: https://github.com/JacobSzwejbka, https://github.com/cccclai
2022-09-09 15:07:35 +00:00
Dhruv Matani
747f27a9ad [Mobile] Update build_mobile.sh to allow lite interpreter and tracing based builds (#84647)
Summary: Currently, build_mobile.sh doesn't allow lite interpreter builds or tracing based selective builds. build_mobile.sh is used for host builds of PyTorch for Mobile deployment.

Additionally, certain flags such as `USE_BLAS` were not being respected as they should be. This change addresses that as well.

Test Plan: Build using:

```
cat /tmp/selected_ops.yaml
- aten::add
- aten::sub
```

```
BUILD_PYTORCH_MOBILE_WITH_HOST_TOOLCHAIN=1 USE_LIGHTWEIGHT_DISPATCH=0 BUILD_LITE_INTERPRETER=1 SELECTED_OP_LIST=/tmp/selected_ops.yaml ./scripts/build_mobile.sh
```

```
cat /tmp/main.cpp

#include <torch/csrc/jit/mobile/import.h>

int main() {
  auto m = torch::jit::_load_for_mobile("/tmp/path_to_model.ptl");
  auto res = m.forward({});
  return 0;
}
```

Test using:

```
g++ /tmp/main.cpp -L build_mobile/lib/ -I build_mobile/install/include/ -lpthread -lc10 -ltorch_cpu -ltorch -lXNNPACK -lpytorch_qnnpack -lcpuinfo -lclog -lpthreadpool -lgloo -lkineto -lfmt -ldl -lc10
```

Reviewers:

Subscribers:

Tasks:

Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84647
Approved by: https://github.com/JacobSzwejbka, https://github.com/cccclai
2022-09-09 15:02:29 +00:00
John Detloff
e0229d6517 Remove caffe2 mobile (#84338)
We're no longer building Caffe2 mobile as part of our CI, and it adds a lot of clutter to our make files. Any lingering internal dependencies will use the buck build and so won't be affected.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84338
Approved by: https://github.com/dreiss
2022-09-08 01:49:55 +00:00
Xiao Wang
b18f984307 [cmake] Change COLORIZE_OUTPUT option to USE_COLORIZE_OUTPUT (#83716)
Close https://github.com/pytorch/pytorch/issues/83500

Change COLORIZE_OUTPUT option to USE_COLORIZE_OUTPUT so that it can be passed and disabled through environment variable.

Not sure why COLORIZE_OUTPUT=0 didn't work before but USE_COLORIZE_OUTPUT=0 works after.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/83716
Approved by: https://github.com/malfet
2022-08-23 01:09:29 +00:00
Peter Bell
d07a9ba11b Don't build nvfuser benchmarks by default (#67857)
None of the other benchmarks are built by default, so this seems unnecessary.

cc @jjsjann123
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67857
Approved by: https://github.com/jjsjann123, https://github.com/janeyx99
2022-08-12 16:44:41 +00:00
Nikita Shulga
5e477714fa [BE] Fix MPS build warnings (#83048)
Mostly get rid of unused variables, but also:
- Hide `-[MPSGraphTensorData printNDArray]` behind undefined method
  access
- Rename several methods in ScatterGather/TriangularOps:
   - `scatterAlongAxisWithDataTensor:`->`scatterAlongAxis:withDataTensor:`
   - `gatherAlongAxisWithUpdatesTensor:`->`gatherAlongAxis:withUpdatesTensor:`
   - `getCoordinateValueWithShapeTensor:`->`coordinateAlongAxisTensor:withShapeTensor:`
- Add `-Wno-unguarded-availability-new` to suppress 12.3+ availability warnings
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83048
Approved by: https://github.com/albanD
2022-08-10 17:29:44 +00:00
Nikita Shulga
62c8d30f9f [BE] Add append_cxx_flag_if_supported macro (#82883)
And use it throughout the CMakeLists and rectify `IF(APPLE)`/`IF(GNU_CXX_VERSION VERSION_GREATER A.B)` and so on

Also, add `target_compile_options_if_supported` and use it in `Dependencies.cmake` as well as in test's `CMakeLists.txt`

Delete `-Wno-unknown-warning-option` to test that the conditions are indeed working as expected
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82883
Approved by: https://github.com/seemethere
2022-08-10 14:32:26 +00:00
PyTorch MergeBot
d3a1f17fc7 Revert "[BE] Add append_cxx_flag_if_supported macro (#82883)"
This reverts commit d7e6aaa59b.

Reverted https://github.com/pytorch/pytorch/pull/82883 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally
2022-08-10 10:27:59 +00:00
Nikita Shulga
d7e6aaa59b [BE] Add append_cxx_flag_if_supported macro (#82883)
And use it throughout the CMakeLists and rectify `IF(APPLE)`/`IF(GNU_CXX_VERSION VERSION_GREATER A.B)` and so on

Also, add `target_compile_options_if_supported` and use it in `Dependencies.cmake` as well as in test's `CMakeLists.txt`

Delete `-Wno-unknown-warning-option` to test that the conditions are indeed working as expected
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82883
Approved by: https://github.com/seemethere
2022-08-08 21:04:09 +00:00
Tongliang Liao
dff70a5e1a Make language std configurable. (#75519)
RocksDB 7 starts to use C++17 in header.
We should make this configurable, in case user needs higher std version.

The list of files to change was found by `git grep 'CMAKE_[^_]*_STANDARD'`.
Doc string is from CMake code.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75519
Approved by: https://github.com/malfet
2022-07-13 14:21:27 +00:00
Jing Xu
3c7044728b Enable Intel® VTune™ Profiler's Instrumentation and Tracing Technology APIs (ITT) to PyTorch (#63289)
More detailed description of benefits can be found at #41001. This is Intel's counterpart of NVidia’s NVTX (https://pytorch.org/docs/stable/autograd.html#torch.autograd.profiler.emit_nvtx).

ITT is a functionality for labeling trace data during application execution across different Intel tools.
For integrating Intel(R) VTune Profiler into Kineto, ITT needs to be integrated into PyTorch first. It works with both standalone VTune Profiler ([https://www.intel.com/content/www/us/en/developer/tools/oneapi/vtune-profiler.html](https://www.intel.com/content/www/us/en/developer/tools/oneapi/vtune-profiler.html)) and Kineto-integrated VTune functionality in the future.
It works for both Intel CPU and Intel XPU devices.

Pitch
Add VTune Profiler's ITT API function calls to annotate PyTorch ops, as well as developer customized code scopes on CPU, like NVTX for NVidia GPU.

This PR rebases the code changes at https://github.com/pytorch/pytorch/pull/61335 to the latest master branch.

Usage example:
```
with torch.autograd.profiler.emit_itt():
    for i in range(10):
        torch.itt.range_push('step_{}'.format(i))
        model(input)
        torch.itt.range_pop()
```

cc @ilia-cher @robieta @chaekit @gdankel @bitfort @ngimel @orionr @nbcsm @guotuofeng @guyang3532 @gaoteng-git
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63289
Approved by: https://github.com/malfet
2022-07-13 13:50:15 +00:00
atalman
d552ba3b4f Use fabi-version=11 to ensure compatibility between gcc7 and gcc9 binaries (#81058)
Fixes: #80489

Test using cuda 11.3 manywheel binary:
```
import torch
print(torch.__version__)
print(torch._C._PYBIND11_BUILD_ABI)
```

Output
```
1.13.0.dev20220707+cu113
_cxxabi1011
```

Functorch test torch : 1.13.0.dev20220707+cu113, functorch with cu102
```
import torch
print(torch.__version__)
print(torch._C._PYBIND11_BUILD_ABI)
from functorch import vmap
x = torch.randn(2, 3, 5)
vmap(lambda x: x, out_dims=3)(x)
```

Output
```
1.13.0.dev20220707+cu113
_cxxabi1011
/home/atalman/temp/testc1.py:5: UserWarning: Failed to initialize NumPy: No module named 'numpy' (Triggered internally at ../torch/csrc/utils/tensor_numpy.cpp:73.)
  x = torch.randn(2, 3, 5)
Traceback (most recent call last):
  File "/home/atalman/temp/testc1.py", line 6, in <module>
    vmap(lambda x: x, out_dims=3)(x)
  File "/home/atalman/conda/lib/python3.9/site-packages/functorch/_src/vmap.py", line 361, in wrapped
    return _flat_vmap(
  File "/home/atalman/conda/lib/python3.9/site-packages/functorch/_src/vmap.py", line 488, in _flat_vmap
    return _unwrap_batched(batched_outputs, out_dims, vmap_level, batch_size, func)
  File "/home/atalman/conda/lib/python3.9/site-packages/functorch/_src/vmap.py", line 165, in _unwrap_batched
    flat_outputs = [
  File "/home/atalman/conda/lib/python3.9/site-packages/functorch/_src/vmap.py", line 166, in <listcomp>
    _remove_batch_dim(batched_output, vmap_level, batch_size, out_dim)
IndexError: Dimension out of range (expected to be in range of [-3, 2], but got 3)
```

Related Builder  PR: https://github.com/pytorch/builder/pull/1083

Test PR: https://github.com/pytorch/pytorch/pull/81232
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81058
Approved by: https://github.com/zou3519, https://github.com/malfet
2022-07-12 17:56:33 +00:00
Terry Lam
54bdaf76d6 [PFC] Native UCC process group for Pytorch (#79918)
Summary:
This diff integrates UCC process group as a native component of Pytorch Distributed core. It is based on the existing torch-ucc (https://github.com/facebookresearch/torch_ucc) as the wrapper for UCC collective communication library.
The environment and cmake variables are named to mirror those of the existing process groups such as NCCL and Gloo. Specifically,
- USE_UCC: enables UCC PG. This defaults to OFF, so there is no breakage of existing builds that do not have UCX/UCC external libraries.
- USE_SYSTEM_UCC: uses external UCX and UCC shared libraries that are set accordingly with UCX_HOME and UCC_HOME.

Currently, this diff only supports USE_SYSTEM_UCC=ON, i.e., requiring users to specify external libraries for UCX and UCC. In subsequent diffs, we will add UCX and UCC repos as third-party dependencies in pytorch/third-party.

Test Plan:
Passed Torch-UCC tests that invoke UCC process group. For example:

$ sh test/start_test.sh test/torch_allreduce_test.py --backend gloo --use-cuda
...
Test allreduce: succeeded

Differential Revision: D36973688

Pull Request resolved: https://github.com/pytorch/pytorch/pull/79918
Approved by: https://github.com/kwen2501, https://github.com/kingchc
2022-07-12 14:45:44 +00:00
Nikita Shulga
2beb57a823 Add -Werror=non-virtual-dtor (reland) (#81012)
This PR relands #80584, but instead of adding suppression in CMakeLists.txt suppresses it directly in `llvm_codegen.cpp` and just for a single header.

In general, it's better to avoid `set_target_properties` pattern for suppressing warnings, as it makes build brittle and hard to debug/understand
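
A sketch of what an in-source, single-header suppression can look like, assuming GCC/Clang-style diagnostic pragmas. The exact form used in `llvm_codegen.cpp` is not shown in this commit message, and `LegacyVisitor` is a hypothetical stand-in for the contents of the third-party header:

```cpp
// Assumption: GCC/Clang diagnostic pragmas; the PR may use a different form.
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wnon-virtual-dtor"
// Stand-in for an `#include` of a third-party header that cannot be edited:
// a polymorphic class without a virtual destructor.
struct LegacyVisitor {
  virtual int visit() { return 1; }
};
#pragma GCC diagnostic pop

// Code after the pop is checked with the warning active again,
// so first-party classes still need a virtual destructor.
struct StrictVisitor {
  virtual ~StrictVisitor() = default;
  virtual int visit() { return 2; }
};

int visit_both() {
  LegacyVisitor legacy;
  StrictVisitor strict;
  return legacy.visit() + strict.visit();
}
```

Scoping the suppression with push/pop keeps it next to the offending include, so it is visible to anyone reading the source file rather than hidden in target properties.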

Test plan: wait for `ciflow/binaries_wheel` to finish
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81012
Approved by: https://github.com/huydhn, https://github.com/kit1980
2022-07-07 05:33:55 +00:00
PyTorch MergeBot
0491c10a63 Revert "Add -Werror=non-virtual-dtor (#80584)"
This reverts commit 7670035862.

Reverted https://github.com/pytorch/pytorch/pull/80584 on behalf of https://github.com/malfet due to Broke nightly builds, see https://github.com/pytorch/pytorch/runs/7209779559?check_suite_focus=true
2022-07-06 22:26:59 +00:00
Ronak Malik
d03f989d53 [ROCm] Load ROCm if Torch is used as a dependency (#80469)
Includes LoadHIP.cmake if pytorch is used as a dependency for another project and ROCm is enabled. This removes the need to explicitly link against ROCm libraries in extension projects.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80469
Approved by: https://github.com/pruthvistony, https://github.com/malfet
2022-07-05 21:04:07 +00:00
Huy Do
7670035862 Add -Werror=non-virtual-dtor (#80584)
This also resolves https://github.com/pytorch/pytorch/pull/77323

Pull Request resolved: https://github.com/pytorch/pytorch/pull/80584
Approved by: https://github.com/seemethere
2022-07-04 16:54:47 +00:00
PyTorch MergeBot
1454515253 Revert "Enable Intel® VTune™ Profiler's Instrumentation and Tracing Technology APIs (ITT) to PyTorch (#63289)"
This reverts commit f988aa2b3f.

Reverted https://github.com/pytorch/pytorch/pull/63289 on behalf of https://github.com/malfet due to broke trunk, see f988aa2b3f
2022-06-30 12:49:41 +00:00
Jing Xu
f988aa2b3f Enable Intel® VTune™ Profiler's Instrumentation and Tracing Technology APIs (ITT) to PyTorch (#63289)
More detailed description of benefits can be found at #41001. This is Intel's counterpart of NVidia’s NVTX (https://pytorch.org/docs/stable/autograd.html#torch.autograd.profiler.emit_nvtx).

ITT is a functionality for labeling trace data during application execution across different Intel tools.
For integrating Intel(R) VTune Profiler into Kineto, ITT needs to be integrated into PyTorch first. It works with both standalone VTune Profiler ([https://www.intel.com/content/www/us/en/developer/tools/oneapi/vtune-profiler.html](https://www.intel.com/content/www/us/en/developer/tools/oneapi/vtune-profiler.html)) and Kineto-integrated VTune functionality in the future.
It works for both Intel CPU and Intel XPU devices.

Pitch
Add VTune Profiler's ITT API function calls to annotate PyTorch ops, as well as developer customized code scopes on CPU, like NVTX for NVidia GPU.

This PR rebases the code changes at https://github.com/pytorch/pytorch/pull/61335 to the latest master branch.

Usage example:
```
with torch.autograd.profiler.emit_itt():
    for i in range(10):
        torch.itt.range_push('step_{}'.format(i))
        model(input)
        torch.itt.range_pop()
```
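
The push/pop pairing in the usage example can be wrapped so that a range is always closed, even if the model call raises. This is only a minimal sketch: it assumes nothing beyond the `range_push`/`range_pop` calls shown above, and the helper name `itt_range` and its `itt` parameter (which would be `torch.itt` in practice) are hypothetical and not part of this PR.

```python
from contextlib import contextmanager


@contextmanager
def itt_range(itt, name):
    """Open an ITT range on entry and always close it on exit.

    `itt` is any object exposing range_push(name) and range_pop(),
    for example torch.itt from the usage example (assumption).
    """
    itt.range_push(name)
    try:
        yield
    finally:
        # Runs on normal exit and on exceptions, keeping ranges balanced.
        itt.range_pop()
```

With such a helper the loop body would read `with itt_range(torch.itt, 'step_{}'.format(i)): model(input)`.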

cc @ilia-cher @robieta @chaekit @gdankel @bitfort @ngimel @orionr @nbcsm @guotuofeng @guyang3532 @gaoteng-git
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63289
Approved by: https://github.com/malfet
2022-06-30 05:14:03 +00:00
Huy Do
39bd81a11f Add clang -Wconstant-conversion (#80461)
This catches the compilation error detected in https://github.com/pytorch/pytorch/pull/75400

Pull Request resolved: https://github.com/pytorch/pytorch/pull/80461
Approved by: https://github.com/osalpekar
2022-06-29 23:42:20 +00:00
Nikita Shulga
b370959da1 [MPS] Make it compilable with either xCode or CLI (#79430)
`xcrun --sdk macosx --show-sdk-version` works with either CommandLineTools or Xcode, but `xcodebuild -sdk macosx -version SDKVersion` works only if full Xcode is installed, which is not necessary to build PyTorch

The above commands yield the same output when Xcode is installed:
```
% xcodebuild -sdk macosx -version SDKVersion
12.3
 %  xcrun --sdk macosx --show-sdk-version
12.3
```

But the first one fails if Xcode is missing:
```
% xcodebuild -sdk macosx -version SDKVersion
xcode-select: error: tool 'xcodebuild' requires Xcode, but active developer directory '/Library/Developer/CommandLineTools' is a command line tools instance
% xcrun --sdk macosx --show-sdk-version
12.3

```
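
For scripts that need the SDK version outside of CMake, the same `xcrun` invocation can be called directly. The following is a rough sketch rather than code from this PR: the function name `macos_sdk_version` and the injectable `run` parameter are hypothetical, and only the command line itself is taken from the output above.

```python
import subprocess


def macos_sdk_version(run=subprocess.run):
    """Return the macOS SDK version string reported by xcrun.

    `run` defaults to subprocess.run and can be replaced in tests.
    xcrun works with either CommandLineTools or a full Xcode install.
    """
    result = run(
        ["xcrun", "--sdk", "macosx", "--show-sdk-version"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()
```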

Pull Request resolved: https://github.com/pytorch/pytorch/pull/79430
Approved by: https://github.com/albanD
2022-06-13 21:03:48 +00:00
Nikita Shulga
3255ddeec9 Make Wunused-local-typedef a hard error (#77918)
Only allow it for `libtorch_python` and tests
Helps prevent regression like https://github.com/pytorch/pytorch/pull/76547#issuecomment-1132208232

Pull Request resolved: https://github.com/pytorch/pytorch/pull/77918
Approved by: https://github.com/osalpekar, https://github.com/seemethere
2022-06-09 18:14:01 +00:00
Nikita Shulga
634954c55c [MPS] Do not pass linker command to a compiler (#78630)
`-weak_framework` is a linker rather than a compiler option and as such
it should not be passed as CXX flag
Also, use `string(APPEND` rather than `set(FOO "${FOO} ...")`

Likely fixes our ability to use `sccache` for MacOS CI builds, see https://github.com/pytorch/pytorch/issues/78375#issuecomment-1143697183
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78630
Approved by: https://github.com/albanD
2022-06-01 22:08:54 +00:00
Alban Desmaison
fd121dfeec Move x86 binaries builder to macos-12 to enable MPS build
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77662

Approved by: https://github.com/seemethere
2022-05-19 21:59:08 +00:00
Peter Bell
5cdf79fddc Bump minimum CMake version to 3.13
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76312

Approved by: https://github.com/malfet
2022-05-19 15:38:55 +00:00
Eddie Yan
14ab3ff484 [cuDNN V8 API] Enable cuDNN v8 API by default (#75466)
Testing via CI
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75466
Approved by: https://github.com/ngimel
2022-05-17 21:54:17 +00:00
Alban Desmaison
cf975dde0d Make sure that we can build without xcode on mac (#77450)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77450
Approved by: https://github.com/drisspg, https://github.com/kulinseth
2022-05-13 21:18:55 +00:00
Kulin Seth
e011a8e18b Enable PyTorch operations on MPS Backend. (#77343)
Add PyTorch operations to MPS backend.

- https://github.com/pytorch/pytorch/issues/77394
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77343
Approved by: https://github.com/albanD
2022-05-13 18:28:53 +00:00
sanchitintel
4ee29d6033 [Reland take-2] Add JIT graph fuser for oneDNN Graph API (v0.5)
Re-landing #68111/#74596

## Description
v0.5 PR of this [RFC](https://github.com/pytorch/pytorch/issues/49444).

On the basis of #50256, the below improvements are included:

 * The [v0.5 release branch](https://github.com/oneapi-src/oneDNN/releases/tag/graph-v0.5) of the oneDNN Graph API is used
 * The fuser now works with the profiling graph executor. We have inserted type check nodes to guard the profiled tensor properties.

 ### User API:
The optimization pass is disabled by default. Users could enable it by:

```
 torch.jit.enable_onednn_fusion(True)
```
`torch.jit.freeze` should be used after tracing (recommended) or scripting a model.

 ### Performance:
 [pytorch/benchmark](https://github.com/pytorch/benchmark) tool is used to compare the performance:

 * SkyLake 8180 (1 socket of 28 cores):
   ![image](https://user-images.githubusercontent.com/65992142/151162305-05e44425-a24e-4d5e-94e1-743b40b87a8c.png)
* SkyLake 8180 (single thread):
   ![image](https://user-images.githubusercontent.com/65992142/151162528-69f90b79-d08d-46b8-8775-d80a6ccbce8a.png)
   * By mapping hardswish to oneDNN Graph, it’s 8% faster than PyTorch JIT (NNC + OFI)
   ** We expect performance gain after mapping transpose, contiguous & view to oneDNN graph ops

 ### Directory structure of the integration code
 Fuser-related code is placed under:

 ```
 torch/csrc/jit/codegen/onednn/
 ```

 Optimization pass registration is done in:

 ```
 torch/csrc/jit/passes/onednn_graph_fuser.h
 ```

 CMake for the integration code is in:

 ```
 caffe2/CMakeLists.txt
 cmake/public/mkldnn.cmake
 cmake/Modules/FindMKLDNN.cmake
 ```

 ## Limitations
 * In this PR, we only support Pytorch-oneDNN-Graph integration on Linux platform. Support on Windows and MacOS will be enabled as a next step.
 * We have only optimized the inference use-case.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/76622
Approved by: https://github.com/eellison
2022-05-05 16:57:03 +00:00
Eddie Yan
e838137b3e Add high level control of fp32 matmul precision; disable TF32 for matmuls by default
#76440
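
For context on what the precision setting trades away: TF32 keeps float32's 8 exponent bits but only 10 mantissa bits instead of 23. The sketch below emulates that loss in pure Python. It assumes plain truncation of the low 13 mantissa bits, which may differ from how real hardware rounds, and the function name `truncate_to_tf32` is hypothetical.

```python
import struct


def truncate_to_tf32(value):
    """Drop the 13 low mantissa bits of a float32, leaving TF32's 10 bits.

    Assumption: plain truncation; real hardware may round differently.
    """
    (bits,) = struct.unpack("<I", struct.pack("<f", value))
    bits &= 0xFFFFE000  # keep sign, 8 exponent bits, top 10 mantissa bits
    (result,) = struct.unpack("<f", struct.pack("<I", bits))
    return result
```

Under this model `1.0 + 2**-10` survives unchanged, while `1.0 + 2**-11` collapses to `1.0`.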

CC @mruberry @ptrblck

Pull Request resolved: https://github.com/pytorch/pytorch/pull/76509
Approved by: https://github.com/ngimel
2022-05-04 20:40:13 +00:00
Nikita Shulga
8473173c36 Remove breakpad dependency
This functionality does not seem to be used
and there are some requests to update dependency.

Add `third_party` to torch_cpu include directories if compiling with
Caffe2 support, as `caffe2/quantization/server/conv_dnnlowp_op.cc` depends on `third_party/fbgemm/src/RefImplementations.h`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/75394
Approved by: https://github.com/janeyx99, https://github.com/seemethere
2022-05-03 20:21:55 +00:00
PyTorch MergeBot
3dcd67a1b3 Revert "[Re-landing 68111] Add JIT graph fuser for oneDNN Graph API (Preview4.1)"
This reverts commit 8b11d81058.

Reverted https://github.com/pytorch/pytorch/pull/74596 on behalf of https://github.com/janeyx99
2022-04-29 15:40:17 +00:00
chunyuan
8b11d81058 [Re-landing 68111] Add JIT graph fuser for oneDNN Graph API (Preview4.1)
Re-landing https://github.com/pytorch/pytorch/pull/68111

## Description
Preview4 PR of this [RFC](https://github.com/pytorch/pytorch/issues/49444).

On the basis of https://github.com/pytorch/pytorch/pull/50256, the below improvements are included:

- The [preview4 release branch](https://github.com/oneapi-src/oneDNN/releases/tag/graph-v0.4.1) of the oneDNN Graph API is used
- The fuser now works with the profiling graph executor. We have inserted type check nodes to guard the profiled tensor properties.

### User API:
The optimization pass is disabled by default. Users could enable it by:
```
torch.jit.enable_onednn_fusion(True)
```

### Performance:
[pytorch/benchmark](https://github.com/pytorch/benchmark) tool is used to compare the performance:
- SkyLake 8180 (1 socket of 28 cores):

  ![image](https://user-images.githubusercontent.com/65992142/151162305-05e44425-a24e-4d5e-94e1-743b40b87a8c.png)

- SkyLake 8180 (single thread):

  ![image](https://user-images.githubusercontent.com/65992142/151162528-69f90b79-d08d-46b8-8775-d80a6ccbce8a.png)
 \* By mapping hardswish to oneDNN Graph, it’s 8% faster than PyTorch JIT (NNC + OFI)
  \** We expect performance gain after mapping transpose, contiguous & view to oneDNN graph ops

### Directory structure of the integration code
Fuser-related code is placed under:
```
torch/csrc/jit/codegen/onednn/
```

Optimization pass registration is done in:
```
torch/csrc/jit/passes/onednn_graph_fuser.h
```

CMake for the integration code is:
```
caffe2/CMakeLists.txt
```

## Limitations

- In this PR, we have only supported the optimization on Linux platform. The support on Windows and MacOS will be enabled as the next step.
- We have only optimized the inference use case.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74596
Approved by: https://github.com/malfet
2022-04-29 01:01:33 +00:00
Kulin Seth
54c75e1e8f Add "mps" device to PyTorch framework.
Remove the "mlc" device for Mac platforms.

This commit will be followed up with:

* adding MPS runtime components
* PyTorch ops for MPS device

Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/76291
Approved by: https://github.com/albanD
2022-04-27 19:21:57 +00:00
PyTorch MergeBot
d79d9fa283 Revert "Remove breakpad dependency"
This reverts commit 9aa3c7fd83.

Reverted https://github.com/pytorch/pytorch/pull/75394 on behalf of https://github.com/malfet
2022-04-17 17:58:51 +00:00
Nikita Shulga
9aa3c7fd83 Remove breakpad dependency
This functionality does not seem to be used
and there are some requests to update dependency

Pull Request resolved: https://github.com/pytorch/pytorch/pull/75394
Approved by: https://github.com/janeyx99, https://github.com/seemethere
2022-04-17 17:43:45 +00:00
Nikita Shulga
bdf5a87714 Extend sign-compare warnings to gcc (take 2)
Remove `-Wno-sign-compare` option for GCC
Suppress erroneous sign-compare warning in `c10::greater_than_max`(see  https://godbolt.org/z/Tr3Msnz99)
Fix sign-compare in torch/deploy,  `caffe2::QTensor::dim32()` and `generate_proposals_op_test.cc`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/75544
Approved by: https://github.com/osalpekar
2022-04-13 00:06:52 +00:00
Edward Z. Yang
c2124f5c66 Turn on -Wsign-compare
This is enabled on some of our internal builds, is a common source
of fbcode only errors and apparently we are relatively clean on it.
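
As a reminder of the kind of bug this warning flags: when C or C++ compares an `int` with an `unsigned int`, the signed operand is converted to unsigned first. The sketch below mimics that conversion for 32-bit operands with `ctypes`; it is an illustration only, and the function name is hypothetical.

```python
import ctypes


def c_signed_less_than_unsigned(signed_value, unsigned_value):
    """Mimic `int < unsigned int` in C for 32-bit operands.

    C converts the signed operand to unsigned before comparing, which is
    the implicit conversion that -Wsign-compare warns about.
    """
    converted = ctypes.c_uint32(signed_value).value
    return converted < ctypes.c_uint32(unsigned_value).value
```

Here `c_signed_less_than_unsigned(-1, 1)` is false, because `-1` wraps to the largest 32-bit unsigned value before the comparison.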

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/74996

Approved by: https://github.com/malfet
2022-04-12 18:58:14 +00:00
PyTorch MergeBot
80e05b7df4 Revert "Extend sign-compare warnings to gcc"
This reverts commit 34446653c7.

Reverted https://github.com/pytorch/pytorch/pull/75544 on behalf of https://github.com/janeyx99
2022-04-12 18:22:53 +00:00
Nikita Shulga
34446653c7 Extend sign-compare warnings to gcc
Remove `-Wno-sign-compare` option for GCC
Suppress erroneous sign-compare warning in `c10::greater_than_max`(see  https://godbolt.org/z/Tr3Msnz99)
Fix sign-compare in torch/deploy

Pull Request resolved: https://github.com/pytorch/pytorch/pull/75544
Approved by: https://github.com/osalpekar
2022-04-12 17:36:48 +00:00
Nikita Shulga
90a56fc515 Add -Wsign-compare to list of clang flags
It caused a number of internal only compilation failures, for example
see:
https://github.com/pytorch/pytorch/pull/74425#issuecomment-1075476438
and https://github.com/pytorch/pytorch/pull/74542#issuecomment-1083518880

Pull Request resolved: https://github.com/pytorch/pytorch/pull/75085

Approved by: https://github.com/ngimel, https://github.com/albanD
2022-04-05 14:16:47 +00:00
Xiang Gao
3b29bd00eb Make ProcessGroupNCCL load torch_ucc.so when TORCH_UCC_LIBRARY_PATH is set (#69552)
Summary:
This is the very first step for the UCC-NCCL integration. This PR lets `ProcessGroupNCCL` load the `torch_ucc.so` if the user specifies an environmental variable `TORCH_UCC_LIBRARY_PATH`. If this environment variable is not specified by the user, then there will be no visible change.

In the future, we may want to make PyTorch smart enough to automatically detect the `torch_ucc.so` in the user's system, but before doing that, I believe we should first make sure that `ProcessGroupUCC` is very well tested.

Note that in this PR, `ProcessGroupNCCL` just loads the library but will not use it. I am trying to make PRs small, so the usage of `torch_ucc.so` will be submitted in later PRs.
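
The behaviour described above (load the library only when the variable is set, and otherwise change nothing) can be sketched as follows. This is an illustration in Python, not the C++ code in `ProcessGroupNCCL`; the function name `maybe_load_ucc`, the injectable `env` and `loader` parameters, and treating an empty value as unset are all assumptions.

```python
import ctypes
import os


def maybe_load_ucc(env=os.environ, loader=ctypes.CDLL):
    """Load the library named by TORCH_UCC_LIBRARY_PATH, if it is set.

    Returns the loaded handle, or None when the variable is unset or
    empty, in which case nothing changes for the caller.
    """
    path = env.get("TORCH_UCC_LIBRARY_PATH")
    if not path:
        return None
    # The handle is only returned; nothing calls into the library yet.
    return loader(path)
```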

This PR requires the change in https://github.com/facebookresearch/torch_ucc/pull/56, otherwise `torch_ucc.so` cannot be loaded successfully. But this PR can be landed separately without waiting for https://github.com/facebookresearch/torch_ucc/pull/56 because, in PyTorch's unit tests, UCC is never used or tested.

cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69552

Reviewed By: mruberry

Differential Revision: D34675212

Pulled By: jiayisuse

fbshipit-source-id: a3d1fb98340dbe3a931af555423863efd381f1ae
(cherry picked from commit 3778b6fabe70c26b5a65e6ddec641d2ef9113cd1)
2022-03-25 18:19:39 +00:00
Will Constable
3547f20872 Land remaining parts of Torchscript Lazy Tensor backend (#74111)
Summary:
Also enables bazel build to run lazy codegen.  Bazel (oss) build feeds off the same filelists as cmake/buck (build_variables.bzl), so enabling it is easier than keeping it disabled.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/74111

Test Plan: Run CI and verify test_lazy_ops is running via OSS cmake builds

Reviewed By: bdhirsh

Differential Revision: D34772403

fbshipit-source-id: 8a63f58b9536e6ac1be530667932176ef2549496
(cherry picked from commit e807ffb1918853d10b924fdc24f85ee5b1a39021)
2022-03-22 23:14:03 +00:00
Edward Z. Yang
493bbdc4fe Use shared CUPTI by default
Per https://github.com/pytorch/pytorch/issues/57744 statically linked CUPTI
causes exception handling to break on certain compiler configurations, likely
because CUPTI comes with incompatible libstdc++ symbols.  Rather than pray that
something reasonable happens, use the safer configuration (dynamic linking) by
default and give a warning if the user inverts the setting.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/74009

Approved by: https://github.com/malfet
2022-03-16 21:04:12 +00:00
Ashwin Hari
7ed73b2803 CMake option for using static MKL libraries
Fixes #70587

Pull Request resolved: https://github.com/pytorch/pytorch/pull/73069
Approved by: https://github.com/malfet
2022-03-07 19:32:33 +00:00
Mengwei Liu
9ce9803abe [PyTorch] Add codegen unboxing ability (#69881)
Summary:
RFC: https://github.com/pytorch/rfcs/pull/40

This PR (re)introduces python codegen for unboxing wrappers. Given an entry of `native_functions.yaml` the codegen should be able to generate the corresponding C++ code to convert ivalues from the stack to their proper types. To trigger the codegen, run
```
tools/jit/gen_unboxing.py -d cg/torch/share/ATen
```
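
Conceptually, an unboxing wrapper pops the operator's arguments off the interpreter stack, converts each to its declared type, calls the typed kernel, and pushes the result back. The sketch below shows that shape in Python under stated assumptions: a plain list stands in for the IValue stack, Python callables stand in for the C++ conversions, and `make_unboxing_wrapper` is a hypothetical name, not something `gen_unboxing.py` emits.

```python
def make_unboxing_wrapper(kernel, arg_types):
    """Build a wrapper that calls `kernel` with arguments taken from a stack.

    `arg_types` lists one converter per argument, standing in for the
    IValue-to-C++-type conversions the generated code performs.
    """
    def wrapper(stack):
        start = len(stack) - len(arg_types)
        if start < 0:
            raise ValueError("stack has too few arguments")
        # Arguments sit on top of the stack in declaration order.
        raw_args = stack[start:]
        del stack[start:]
        args = [convert(value) for convert, value in zip(arg_types, raw_args)]
        # The result is pushed back so the interpreter can continue.
        stack.append(kernel(*args))
    return wrapper
```

Entries below the operator's own arguments are left untouched, which is what lets consecutive operators share a single stack.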

Merged changes on CI test. In https://github.com/pytorch/pytorch/issues/71782 I added an e2e test for static dispatch + codegen unboxing. The test exports a mobile model of mobilenetv2, then loads and runs it on a new binary for the lite interpreter: `test/mobile/custom_build/lite_predictor.cpp`.

## Lite predictor build specifics

1. Codegen: `gen.py` generates `RegisterCPU.cpp` and `RegisterSchema.cpp`. Now with this PR, once `static_dispatch` mode is enabled, `gen.py` will not generate `TORCH_LIBRARY` API calls in those cpp files, hence avoids interaction with the dispatcher. Once `USE_LIGHTWEIGHT_DISPATCH` is turned on, `cmake/Codegen.cmake` calls `gen_unboxing.py` which generates `UnboxingFunctions.h`, `UnboxingFunctions_[0-4].cpp` and `RegisterCodegenUnboxedKernels_[0-4].cpp`.
2. Build: `USE_LIGHTWEIGHT_DISPATCH` adds generated sources into `all_cpu_cpp` in `aten/src/ATen/CMakeLists.txt`. All other files remain unchanged. In reality all the `Operators_[0-4].cpp` are not necessary but we can rely on linker to strip them off.

## Current CI job test coverage update

Created a new CI job `linux-xenial-py3-clang5-mobile-lightweight-dispatch-build` that enables the following build options:
* `USE_LIGHTWEIGHT_DISPATCH=1`
* `BUILD_LITE_INTERPRETER=1`
* `STATIC_DISPATCH_BACKEND=CPU`

This job triggers `test/mobile/lightweight_dispatch/build.sh` and builds `libtorch`. Then the script runs C++ tests written in `test_lightweight_dispatch.cpp` and `test_codegen_unboxing.cpp`. Recent commits added tests to cover as many C++ argument types as possible: in `build.sh` we install the PyTorch Python API so that we can export test models in `tests_setup.py`. Then we run the C++ test binary to run these models on the lightweight-dispatch-enabled runtime.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69881

Reviewed By: iseeyuan

Differential Revision: D33692299

Pulled By: larryliu0820

fbshipit-source-id: 211e59f2364100703359b4a3d2ab48ca5155a023
(cherry picked from commit 58e1c9a25e3d1b5b656282cf3ac2f548d98d530b)
2022-03-01 23:28:13 +00:00
Nikita Shulga
6302cdb9bc [Reland] Add BUILD_LAZY_CUDA_LINALG option (#73447)
Summary:
When enabled, it will generate the `torch_cuda_linalg` library, which depends on cusolver and magma and registers dynamic bindings to it from LinearAlgebraStubs

Avoid symbol clashes that can result in infinite recursion by moving all symbols in the library to its own namespace.

Add checks that should prevent calling self in recursion to `LinearAlgebraStubs.cpp`
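
One way to picture the lazy binding and the recursion check is sketched below. It is a guess at the general pattern, not the contents of `LinearAlgebraStubs.cpp`: the class name `LazyLinalgStub`, the `load_library` callback, and the error message are all hypothetical.

```python
class LazyLinalgStub:
    """Dispatch slot that loads its implementation on first use."""

    def __init__(self, load_library):
        # `load_library` stands in for loading torch_cuda_linalg; it is
        # expected to call register() with the real implementation.
        self._load_library = load_library
        self._impl = None

    def register(self, impl):
        self._impl = impl

    def __call__(self, *args):
        if self._impl is None:
            self._load_library(self)
        # Guard: if loading registered nothing, or registered the stub
        # itself, fail loudly instead of recursing forever.
        if self._impl is None or self._impl is self:
            raise RuntimeError("linalg library did not register an implementation")
        return self._impl(*args)
```

The library is loaded at most once, and a bad registration surfaces as an error on the first call rather than as unbounded recursion.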

Pull Request resolved: https://github.com/pytorch/pytorch/pull/73447

Reviewed By: albanD

Differential Revision: D34538827

Pulled By: malfet

fbshipit-source-id: f2535b471d3524768a84b2e169b6aa24c26c03bf
(cherry picked from commit 4ec24b079c861c1122f0fa86e280b977c3c2f7ac)
2022-03-01 21:33:07 +00:00
Andrey Talman
197764b35d Remove cuda 11.1 references (#73514)
Summary:
Fixes : https://github.com/pytorch/pytorch/issues/73377

We've migrated to CUDA-11.3 as default toolkit in 1.9, it's time to stop builds (especially considering forward-compatibility guarantee across CUDA-11.x drivers)

Hence we are removing CUDA 11.1 support. We should also clean up old CUDA-related code from our builder and pytorch repos to make the scripts a little cleaner.

We have code that references CUDA 9.2, 10.1, 11.0, 11.1, and 11.2, and none of these are currently in use.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/73514

Reviewed By: janeyx99

Differential Revision: D34551989

Pulled By: atalman

fbshipit-source-id: 9ceaaa9b25ad49689986f4b29a26d20370d9d011
(cherry picked from commit fe109c62daf429e9053c03f6e374568ba23cd041)
2022-03-01 16:37:37 +00:00
Jane Xu
31271284bc Revert D33992795: Add BUILD_LAZY_CUDA_LINALG option
Test Plan: revert-hammer

Differential Revision:
D33992795 (82130758f0)

Original commit changeset: d1fa351a3206

Original Phabricator Diff: D33992795 (82130758f0)

fbshipit-source-id: f0a66d7431aea2c358718eef16fab05712cd6cae
(cherry picked from commit df4900115f712e477ed5cc97510e6515a1ca17a9)
2022-02-25 18:37:31 +00:00
Digant Desai
b2054d3025 Prepare for an update to the XNNPACK submodule (#72642)
Summary:
- Target Sha1: ae108ef49aa5623b896fc93d4298c49d1750d9ba
- Make USE_XNNPACK a dependent option on cmake minimum version 3.12
- Print USE_XNNPACK under cmake options summary, and print the
  availability from collect_env.py
- Skip XNNPACK based tests when XNNPACK is not available
    - Add SkipIfNoXNNPACK wrapper to skip tests
- Update cmake version for xenial-py3.7-gcc5.4 image to 3.12.4
    - This is required for the backwards compatibility test.
      The PyTorch op schema is XNNPACK dependent. See,
      aten/src/ATen/native/xnnpack/RegisterOpContextClass.cpp for
      example. The nightly version is assumed to have USE_XNNPACK=ON,
      so with this change we ensure that the test build can also
      have XNNPACK.
- HACK: skipping test_xnnpack_integration tests on ROCM

Pull Request resolved: https://github.com/pytorch/pytorch/pull/72642

Reviewed By: kimishpatel

Differential Revision: D34456794

Pulled By: digantdesai

fbshipit-source-id: 85dbfe0211de7846d8a84321b14fdb061cd6c037
(cherry picked from commit 6cf48e7b64d6979962d701b5d493998262cc8bfa)
2022-02-25 00:39:15 +00:00
Nikita Shulga
82130758f0 Add BUILD_LAZY_CUDA_LINALG option (#72306)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72306

When enabled, it will generate the `torch_cuda_linalg` library, which depends on cusolver and magma and registers dynamic bindings to it from LinearAlgebraStubs

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D33992795

Pulled By: malfet

fbshipit-source-id: d1fa351a320659b29754997c20d754e69bfe36c0
(cherry picked from commit d5d6c69a988b9454538ecd28674206da2541de17)
2022-02-24 03:30:04 +00:00
Daniël de Kok
d50211860a Use SLEEF functions for NEON vectors on macOS ARM64 (#70354)
Summary:
We noticed that on M1 Macs Transformer network profiles are dominated by scalar `exp` and `erff` functions (for softmax and GELU).

The NEON `Vectorized<float>` implementation does not use SLEEF functions in order to compile on mobile platforms. However, SLEEF is already compiled on macOS ARM64 and is safe to use there. This change adds another implementation of `Vectorized<float>` that uses SLEEF functions. This implementation is only used on macOS ARM64.

This change speeds up e.g. prediction of spaCy transformer models by 20% on M1 Macs.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70354

Reviewed By: albanD

Differential Revision: D33659540

Pulled By: kimishpatel

fbshipit-source-id: b8f02a61321873fc60778190a005c466c7d0cc0c
(cherry picked from commit 71286a207c)
2022-02-07 21:55:28 +00:00
Peter Bell
4829dcea09 Codegen: Generate seperate headers per operator (#68247)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68247

This splits `Functions.h`, `Operators.h`, `NativeFunctions.h` and
`NativeMetaFunctions.h` into separate headers per operator base name.
With `at::sum` as an example, we can include:
```cpp
#include <ATen/ops/sum.h>         // Like Functions.h
#include <ATen/ops/sum_ops.h>     // Like Operators.h
#include <ATen/ops/sum_native.h>  // Like NativeFunctions.h
#include <ATen/ops/sum_meta.h>    // Like NativeMetaFunctions.h
```

The umbrella headers are still being generated, but all they do is
include from the `ATen/ops` folder.

Further, `TensorBody.h` now only includes the operators that have
method variants. Which means files that only include `Tensor.h` don't
need to be rebuilt when you modify function-only operators. Currently
there are about 680 operators that don't have method variants, so this
is potentially a significant win for incremental builds.

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D32596272

Pulled By: albanD

fbshipit-source-id: 447671b2b6adc1364f66ed9717c896dae25fa272
2021-12-14 06:40:08 -08:00
Yanan Cao
17f3179d60 Back out "[pytorch][PR] Add ability for a mobile::Module to save as flatbuffer" (#69796)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69796

(Note: this ignores all push blocking failures!)

Test Plan: External CI + Sandcastle

Reviewed By: zhxchen17

Differential Revision: D33032671

fbshipit-source-id: dbf6690e960e25d6a5f19043cbe792add2acd7ef
2021-12-10 21:29:53 -08:00
Nikita Shulga
e305e4d4d8 Suppress common warnings when building by clang (#69710)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69710

Namely, no range-loop-analysis (the warning that detects when a loop variable cannot be a const reference).

Test Plan: Imported from OSS

Reviewed By: r-barnes

Differential Revision: D32997003

Pulled By: malfet

fbshipit-source-id: dba0e7875e5b667e2cc394c70dd75e2403265918
2021-12-10 16:45:38 -08:00
Han Qi
d3649309e6 [pytorch][PR] Add ability for a mobile::Module to save as flatbuffer (#69306)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69306

Included functions:

save_mobile_module -> saves a mobile::Module to flatbuffer
load_mobile_module_from_file -> loads a flatbuffer into mobile::Module
parse_mobile_module -> parses from bytes or deserialized flatbuffer
Module object

Test Plan: unittests

Reviewed By: gmagogsfm

Differential Revision: D32806835

fbshipit-source-id: 71913c6650e225634f878946bd16960d377a7f57
2021-12-09 14:53:31 -08:00
Peter Bell
21919be96b CMake: Update precompiled header and fix support (#67851)
Summary:
This fixes the `USE_PRECOMPILED_HEADERS` cmake version check which was accidentally inverted, so it was always disabled.

I've also made the precompiled header so it only includes headers used in 95% or more of code, weighted by compile time. This limits it to the standard library, `c10` and a limited subset of `ATen/core`. Crucially, the new pch doesn't depend on `native_functions.yaml` so won't cause as much unnecessary rebuilding.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67851

Reviewed By: zou3519

Differential Revision: D32290902

Pulled By: dagitses

fbshipit-source-id: dfc33330028c99b02ff40963926c1f1260d00d00
2021-12-03 06:51:56 -08:00
Alban Desmaison
00ebbd5ef6 Revert D32010095: [pytorch][PR] Add ability for a mobile::Module to save as flatbuffer
Test Plan: revert-hammer

Differential Revision:
D32010095 (41d35dc201)

Original commit changeset: d763b0557780

fbshipit-source-id: bf746a0389135c9f5f67f00f449435ce08fb5f6d
2021-12-02 06:41:40 -08:00
Han Qi
41d35dc201 Add ability for a mobile::Module to save as flatbuffer (#67351)
Summary:
Included functions:

* save_mobile_module -> saves a mobile::Module to flatbuffer
* load_mobile_module_from_file -> loads a flatbuffer into mobile::Module
* parse_mobile_module -> parses from bytes or deserialized flatbuffer
      Module object

Fixes #{issue number}

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67351

Reviewed By: iseeyuan

Differential Revision: D32010095

Pulled By: qihqi

fbshipit-source-id: d763b0557780f7c2661b6485105b045e41a5e8f1
2021-12-01 23:58:15 -08:00
Yi Zhang
31d36fd35d fix sccache issue on Windows CPU (#68870)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/68796

```
2021-11-24T10:12:40.7634007Z Compile requests                   4312
2021-11-24T10:12:40.7634484Z Compile requests executed          4300
2021-11-24T10:12:40.7634823Z Cache hits                         4227
2021-11-24T10:12:40.7635122Z Cache hits (C/C++)                 4227
2021-11-24T10:12:40.7636139Z Cache misses                         62
2021-11-24T10:12:40.7636930Z Cache misses (C/C++)                 62
2021-11-24T10:12:40.7637333Z Cache timeouts                        0
2021-11-24T10:12:40.7637839Z Cache read errors                     0
2021-11-24T10:12:40.7638161Z Forced recaches                       0
2021-11-24T10:12:40.7638489Z Cache write errors                    0
2021-11-24T10:12:40.7638828Z Compilation failures                  1
2021-11-24T10:12:40.7639180Z Cache errors                         10
2021-11-24T10:12:40.7639490Z Cache errors (C/C++)                 10
2021-11-24T10:12:40.7639856Z Non-cacheable compilations            0
2021-11-24T10:12:40.7640244Z Non-cacheable calls                   0
2021-11-24T10:12:40.7640601Z Non-compilation calls                12
2021-11-24T10:12:40.7640987Z Unsupported compiler calls            0
2021-11-24T10:12:40.7641426Z Average cache write               0.104 s
2021-11-24T10:12:40.7641763Z Average cache read miss           6.000 s
2021-11-24T10:12:40.7642110Z Average cache read hit            0.046 s
2021-11-24T10:12:40.7642485Z Failed distributed compilations       0
```
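
To put the counters above in perspective, the cache hit rate can be computed from the hits and misses. The snippet below is a small sketch that parses a few lines copied from this log; the function names `parse_stats` and `hit_rate` are hypothetical, and defining the rate as hits over hits plus misses is an assumption of this example rather than a figure reported by sccache.

```python
import re

# A few lines copied from the log above.
SCCACHE_STATS = """\
2021-11-24T10:12:40.7634007Z Compile requests                   4312
2021-11-24T10:12:40.7634823Z Cache hits                         4227
2021-11-24T10:12:40.7636139Z Cache misses                         62
2021-11-24T10:12:40.7639180Z Cache errors                         10
"""


def parse_stats(text):
    """Map each counter name to its integer value, skipping other lines."""
    stats = {}
    for line in text.splitlines():
        match = re.match(r"^\S+\s+(.+?)\s{2,}(\d+)$", line)
        if match:
            stats[match.group(1)] = int(match.group(2))
    return stats


def hit_rate(stats):
    """Fraction of cache lookups that were served from the cache."""
    hits = stats["Cache hits"]
    misses = stats["Cache misses"]
    return hits / (hits + misses)
```

For the numbers in this log, 4227 hits and 62 misses give a hit rate of roughly 98.6%.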
https://github.com/pytorch/pytorch/runs/4310176911?check_suite_focus=true

cc seemethere malfet pytorch/pytorch-dev-infra

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68870

Reviewed By: ejguan

Differential Revision: D32646289

Pulled By: janeyx99

fbshipit-source-id: bf04446439e55a4ccaf9ce7c77812752ca717a7c
2021-11-24 08:04:59 -08:00
Peter Bell
e7e1b76106 Require CMake 3.13 when building with Ninja (#68731)
Summary:
There is a bug in CMake's Ninja generator where files considered inputs to the cmake command couldn't be generated by another build step. The fix was included in CMake 3.13, but 3.10.3 is still sufficient for other cmake generators e.g. makefiles.
For reference, the bug is here https://gitlab.kitware.com/cmake/cmake/-/issues/18584
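
The resulting rule is a generator-dependent minimum version. A sketch of that rule in Python, for illustration only: the real check lives in CMake, the function names are hypothetical, and the version numbers are the ones quoted in this summary.

```python
def minimum_cmake_version(generator):
    """Return the minimum CMake version as a tuple for the given generator."""
    if generator == "Ninja":
        return (3, 13)
    return (3, 10, 3)


def parse_version(text):
    """Turn a dotted version string such as '3.12.4' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def is_supported(cmake_version, generator):
    """Check an installed CMake version against the generator's minimum."""
    return parse_version(cmake_version) >= minimum_cmake_version(generator)
```

For example, CMake 3.12.4 passes for makefile generators but is rejected for Ninja.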

This is necessary for https://github.com/pytorch/pytorch/issues/68246 but I'm isolating the change here to make testing easier.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68731

Reviewed By: jbschlosser

Differential Revision: D32604545

Pulled By: malfet

fbshipit-source-id: 9bc0bd8641ba415dd63ce21a05c177e2f1dd9866
2021-11-23 09:34:20 -08:00
Jiakai Liu
3dc0754c53 [pytorch][mobile] deprecate the LLVM-based static analyzer (#68180)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68180

Since we've open sourced the tracing-based selective build, we can deprecate the
op-dependency-graph-based selective build and the static analyzer tool that
produces the dependency graph.
ghstack-source-id: 143108377

Test Plan: CIs

Reviewed By: seemethere

Differential Revision: D32358467

fbshipit-source-id: c61523706b85a49361416da2230ec1b035b8b99c
2021-11-11 16:37:08 -08:00
Nikita Shulga
77beccaedb Do not build PyTorch with caffe2 by default (#66658)
Summary:
CAFFE2 has been deprecated for a while, but still included in every PyTorch build.
We should stop building it by default, although CI should still validate that caffe2 code is buildable.

Build even fewer dependencies when compiling mobile builds without Caffe2
Introduce `TEST_CAFFE2` in torch.common.utils
Skip `TestQuantizedEmbeddingOps` and `TestJit.test_old_models_bc` if code is compiled without Caffe2
Should be landed after https://github.com/pytorch/builder/pull/864

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66658

Reviewed By: driazati, seemethere, janeyx99

Differential Revision: D31669156

Pulled By: malfet

fbshipit-source-id: 1cc45e2d402daf913a4685eb9f841cc3863e458d
2021-10-21 20:32:47 -07:00
Chen Lai
76efbccc3b [PyTorch Edge][tracing-based] Unify tracer between internal and external (#64152)
Summary:
As the title says, this introduces the file `TracerRunner`, shared by the internal and external tracers. Its main function is
```
TracerResult trace_run(const std::string& input_module_path);
```
which takes the path to the model file and generates the trace result. The main differences between the external tracer and the internal tracer are:
1. the dependency on `<yaml-cpp/yaml.h>`.
2. the output yaml file from internal tracer includes `model_version` and `model_asset`. These are only needed for internal.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64152

ghstack-source-id: 140692467

Test Plan:
```
./build/bin/model_tracer --model_input_path "/Users/chenlai/Documents/pytorch/tracing/deeplabv3_scripted_with_bundled_input.ptl" --build_yaml_path  "/Users/chenlai/Documents/pytorch/tracing/tmp.yaml"
```
```
./fbcode/caffe2/fb/model_tracer/run_model_with_bundled_inputs.sh ~/local/notebooks/prod_models/deeplabv3_scripted_with_bundled_input.ptl
```
have the same operator output

selected_operators.yaml (P460296279)
selected_mobile_ops.h (P460296258)

Reviewed By: dhruvbird

Differential Revision: D30632224

fbshipit-source-id: eb0321dbc0f1fcf6d2e05384695eebb59ac04f8c
2021-10-15 02:19:45 -07:00
Michael Suo
3ac2c74896 Revert D31082208: Use shared CUPTI by default
Test Plan: revert-hammer

Differential Revision:
D31082208 (8b0eae5aa8)

Original commit changeset: 14f66af92084

fbshipit-source-id: 0faff00832b7f79d476fd1f9f505142a548a76db
2021-10-12 14:37:54 -07:00
Edward Yang
8b0eae5aa8 Use shared CUPTI by default (#65401)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65401

Per https://github.com/pytorch/pytorch/issues/57744 statically linked CUPTI
causes exception handling to break on certain compiler configurations, likely
because CUPTI comes with incompatible libstdc++ symbols.  Rather than pray that
something reasonable happens, use the safer configuration (dynamic linking) by
default and give a warning if the user inverts the setting.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: gdankel

Differential Revision: D31082208

Pulled By: ezyang

fbshipit-source-id: 14f66af920847e158436b5801c43f3124b109b34
2021-10-12 11:01:40 -07:00
Nikita Shulga
c373387709 Update CMake and use native CUDA language support (#62445)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62445

PyTorch currently uses the old style of compiling CUDA in CMake which is just a
bunch of scripts in `FindCUDA.cmake`. Newer versions support CUDA natively as
a language just like C++ or C.

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D31503350

fbshipit-source-id: 2ee817edc9698531ae1b87eda3ad271ee459fd55
2021-10-11 09:05:48 -07:00
Chen Lai
3fe5895a00 Back out "Revert D30599136: [Pytorch Edge][tracing-based] build tracer in OSS" (#66267)
Summary:
Previously, https://github.com/pytorch/pytorch/pull/64087 broke the test `binary_macos_wheel_3_7_cpu_build` because the wheel build does not work with `model_tracer`. Since the tracer is a prototype and there is no need to ship `model_tracer` via the wheel at the moment, build it only when the `TRACING_BASED` option is set. Once the tracing-based flow is mature enough, the tracer binary can be shipped via the wheel.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66267

Original commit changeset: 8ac3d75a52d0
ghstack-source-id: 140122106

Test Plan:
binary_macos_wheel_3_7_cpu_build passes

{F668643831}

Reviewed By: dhruvbird

Differential Revision: D31478593

fbshipit-source-id: 726cab1b31c4596f6268b7824eecb20e2e59d161
2021-10-08 20:12:12 -07:00
Nikita Shulga
4c4525fa5c Compile without -Wno-unused-variable (take 2) (#66041)
Summary:
Delete `-Wno-unused-variable` from top level `CMakeLists.txt`
Still suppress those warnings for tests and `torch_python`

Delete a number of unused variables from caffe2 code
Use `(void)var;` to suppress unused variable in range loops
Use `C10_UNUSED` for global constructors and use `constexpr` instead of `static` for global constants

Do not delete `caffe2::OperatorBase::Output` calls as they have side effects

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66041

Reviewed By: ngimel

Differential Revision: D31360142

Pulled By: malfet

fbshipit-source-id: 6fdfb9f91efdc49ca984a2f2a17ee377d28210c8
2021-10-04 20:39:39 -07:00
Nikita Shulga
e4ee5ca698 Revert D31326599: [pytorch][PR] Compile without -Wno-unused-variable
Test Plan: revert-hammer

Differential Revision:
D31326599 (a6280ab653)

Original commit changeset: 924155f1257a

fbshipit-source-id: b8ee5bc0298637443232f5ee9ec79e51ed256faf
2021-10-01 20:40:47 -07:00
Nikita Shulga
5ef350d7cc Revert D31359010: [pytorch][PR] Fix clang-tidy regressions caused by #65954
Test Plan: revert-hammer

Differential Revision:
D31359010 (c269f471f4)

Original commit changeset: dce4b91a9891

fbshipit-source-id: 085417432b6748d3672b9b7141460f47d1c17a7f
2021-10-01 20:35:35 -07:00
Nikita Shulga
c269f471f4 Fix clang-tidy regressions caused by #65954 (#66040)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66040

Reviewed By: ZolotukhinM

Differential Revision: D31359010

Pulled By: malfet

fbshipit-source-id: dce4b91a98913c8d8c2d8f9ebc49654265239158
2021-10-01 19:50:53 -07:00
Nikita Shulga
a6280ab653 Compile without -Wno-unused-variable (#65954)
Summary:
Delete `-Wno-unused-variable` from top level `CMakeLists.txt`
Still suppress those warnings for tests and `torch_python`

Delete a number of unused variables from caffe2 code
Use `(void)var;` to suppress unused variable in range loops
Use `C10_UNUSED` for global constructors and use `constexpr` instead of `static` for global constants

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65954

Reviewed By: ngimel

Differential Revision: D31326599

Pulled By: malfet

fbshipit-source-id: 924155f1257a2ba1896c50512f615e45ca1f61f3
2021-10-01 17:40:47 -07:00
Dhruv Matani
a84feeeade [PyTorch Edge] Conditionally trim dispatch key set to save heap memory at runtime (#65732)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65732

For certain on-device uses, runtime memory comes at a premium. On-device deployments won't use all the available dispatch keys, so it makes sense to keep only the on-device specific ones around for such uses to reduce runtime heap memory allocated.

This change keeps just 10 dispatch keys (the ones that are used on-device), guarded under the `C10_MOBILE_TRIM_DISPATCH_KEYS` macro. It tries to keep the other code paths unaffected and uses `constexpr` for use in the `array` declaration, and simple inline functions to ensure that the compiler is able to optimize these for server builds.
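
The trimming idea can be sketched in Python, purely as an illustration and not as the actual C++ change: instead of one table slot per dispatch key, keep slots only for the retained keys and map every other key to "no slot". The key names and the retained subset below are hypothetical; the real set of 10 keys is defined in the C++ code behind `C10_MOBILE_TRIM_DISPATCH_KEYS`.

```python
# Hypothetical key names; the real DispatchKey enum is much larger.
ALL_KEYS = ["Undefined", "CPU", "CUDA", "XLA", "QuantizedCPU", "Autograd", "Tracer", "Vulkan"]
# Hypothetical subset that an on-device build would keep.
MOBILE_KEYS = ["Undefined", "CPU", "QuantizedCPU", "Autograd"]


def build_index(all_keys, kept_keys, trim):
    """Map each key to its slot in the per-operator table, or -1 if trimmed."""
    if not trim:
        # Server build: one slot per key, indexed by enum position.
        return {key: i for i, key in enumerate(all_keys)}
    # Trimmed build: only kept keys get a (compact) slot.
    slots = {key: i for i, key in enumerate(kept_keys)}
    return {key: slots.get(key, -1) for key in all_keys}


full = build_index(ALL_KEYS, MOBILE_KEYS, trim=False)
trimmed = build_index(ALL_KEYS, MOBILE_KEYS, trim=True)
table_size_full = len(ALL_KEYS)
table_size_trimmed = len(MOBILE_KEYS)
```

With trimming enabled, each per-operator table shrinks from `table_size_full` to `table_size_trimmed` entries, which is where the heap savings would come from in this simplified model.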

Test Plan:
Build and check mobile models end to end.

```
buck build -c "pt.enable_milan_dispatch_keys_trimming"=1 //xplat/caffe2/fb/lite_predictor:lite_predictor
```

Reviewed By: ezyang

Differential Revision: D31185407

fbshipit-source-id: e954765606373dea6ee9466a851dca7684167b0b
2021-09-29 12:20:33 -07:00
jiej
127c9402d0 Revert "Revert D30752939: [pytorch][PR] nvfuser update" (#65137)
Summary:
This reverts commit 03389dc851.

Attempt again for PR: https://github.com/pytorch/pytorch/issues/63745
Fixes the windows build failure.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65137

Reviewed By: seemethere, dzhulgakov, heitorschueroff

Differential Revision: D30994556

Pulled By: malfet

fbshipit-source-id: f1925b6c5cc1a1a441a96499667c91e8dfc1b53d
2021-09-22 04:54:51 -07:00
Tao Xu
18fa58c4e9 [CoreML][OSS] Integrate with CMake (#64523)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64523

- Build Pytorch with CoreML delegate - ` USE_PYTORCH_METAL=ON python setup.py install --cmake`
- Build iOS static libs - `IOS_PLATFORM=SIMULATOR USE_COREML_DELEGATE=1  ./scripts/build_ios.sh`
ghstack-source-id: 138324216

Test Plan:
- Test the HelloWorld example

{F657778559}

Reviewed By: iseeyuan

Differential Revision: D30594041

fbshipit-source-id: 8cece0b2d4b3ef82d3ef4da8c1054919148beb16
2021-09-17 10:32:00 -07:00
Nikita Shulga
67570a60ba Disable ParallelTBB (#65092)
Summary:
ParallelTBB's `at::get_thread_num` is not compatible with the general model used by OpenMP and ParallelNative (where it is a contiguous thread index within a parallel loop); see https://github.com/pytorch/pytorch/issues/64571#issuecomment-914691883

More examples of similar regressions: https://github.com/pytorch/pytorch/runs/3612142217

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65092

Reviewed By: zhouzhuojie

Differential Revision: D30995936

Pulled By: malfet

fbshipit-source-id: db145b6a850d794f2c954f59f30249b291473e36
2021-09-16 12:38:45 -07:00
Eli Uriegas
03389dc851 Revert D30752939: [pytorch][PR] nvfuser update
Test Plan: revert-hammer

Differential Revision:
D30752939 (cfaecaf40b)

Original commit changeset: ce122e80f01b

fbshipit-source-id: 57685df8f9946032a06eff1de8a3d1498500d2d2
2021-09-15 17:38:47 -07:00
jiej
cfaecaf40b nvfuser update (#63745)
Summary:
Syncing the nvfuser code base from the devel branch. A few of our developments since the last sync:

- Extends support to normalization and reduction kernels.
- Multiple kernel launch for single `CudaFusionGroup`. Hierarchical caching system has been updated to cache graph segmentation.
- profile_ivalue is enabled to convert dynamic scalar into compile time constants, which are required by the codegen. (e.g. reduction axes).

To keep this PR simple and relatively light on review, we stripped most external changes and submitted them as separate PRs, so this gigantic PR is easier to handle.

internal updates are files located in:
1. updates in nvfuser codegen `torch/csrc/jit/codegen/cuda`
2. added nvfuser specific benchmarks `benchmarks/cpp/nvfuser`
3. nvfuser jit cpp tests `test/cpp/jit/test_gpu.cpp` `test/cpp/jit/test_gpu_shift.cpp` `test/cpp/jit/test_gpu_validator.h`

updates affecting integration:

1. profile_ivalue enabled for nvfuser. related changes are in `torch/csrc/jit/runtime/*`,
2. exposed a few more symbols `aten/src/ATen/core/*` used by codegen

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63745

Reviewed By: saketh-are

Differential Revision: D30752939

Pulled By: malfet

fbshipit-source-id: ce122e80f01bcd3865f5bd3c4dfde660665fd84c
2021-09-15 14:42:55 -07:00
Nick Kreeger
882b67dff4 Drop incremental linking on Windows with REL_WITH_DEB_INFO=1. (#64892)
Summary:
The library will no longer link properly on VS 2019 (14.29.30133). To
ensure that engineers building on Windows can use and debug with this
build type, incremental linking needs to be turned off for this build
flag.

Verified that this build type successfully builds, links, and provides
debuggable Python modules on Windows.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64892

Reviewed By: jbschlosser

Differential Revision: D30902565

Pulled By: malfet

fbshipit-source-id: e5286a4c6f45c7cbe4cdc1b98560129bd386970b
2021-09-14 09:44:18 -07:00
Hanton Yang
22d38bd10d [OSS] Enable Metal in PyTorch MacOS nightly builds (#63718)
Summary:
Build on https://github.com/pytorch/pytorch/pull/63825

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63718

Test Plan:
1.Add `ci/binaries` label to PR, so the CI will build those nightly builds

2.Make sure the following CI jobs build with the `USE_PYTORCH_METAL_EXPORT` option set to `ON`:
```
ci/circleci: binary_macos_arm64_conda_3_8_cpu_nightly_build
ci/circleci: binary_macos_arm64_conda_3_9_cpu_nightly_build
ci/circleci: binary_macos_arm64_wheel_3_8_cpu_nightly_build
ci/circleci: binary_macos_arm64_wheel_3_9_cpu_nightly_build
ci/circleci: binary_macos_conda_3_6_cpu_nightly_build
ci/circleci: binary_macos_conda_3_7_cpu_nightly_build
ci/circleci: binary_macos_conda_3_8_cpu_nightly_build
ci/circleci: binary_macos_conda_3_9_cpu_nightly_build
ci/circleci: binary_macos_libtorch_3_7_cpu_nightly_build
ci/circleci: binary_macos_wheel_3_6_cpu_nightly_build
ci/circleci: binary_macos_wheel_3_7_cpu_nightly_build
ci/circleci: binary_macos_wheel_3_8_cpu_nightly_build
ci/circleci: binary_macos_wheel_3_9_cpu_nightly_build
```

3.Test `conda` and `wheel` builds locally on [HelloWorld-Metal](https://github.com/pytorch/ios-demo-app/tree/master/HelloWorld-Metal) demo with [(Prototype) Use iOS GPU in PyTorch](https://pytorch.org/tutorials/prototype/ios_gpu_workflow.html)

(1) conda
```
conda install https://15667941-65600975-gh.circle-artifacts.com/0/Users/distiller/project/final_pkgs/pytorch-1.10.0.dev20210826-py3.8_0.tar.bz2
```
(2) wheel
```
pip3 install https://15598647-65600975-gh.circle-artifacts.com/0/Users/distiller/project/final_pkgs/torch-1.10.0.dev20210824-cp38-none-macosx_10_9_x86_64.whl
```

Reviewed By: xta0

Differential Revision: D30593167

Pulled By: hanton

fbshipit-source-id: 471da204e94b29c11301c857c50501307a5f0785
2021-08-27 09:25:05 -07:00
Nikita Shulga
5ab356ffe6 Update CMake minimum version to 3.10 (#63660)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63660

Test Plan: Imported from OSS

Reviewed By: janeyx99, mruberry

Differential Revision: D30543878

fbshipit-source-id: a7d938807653f39727f2cc7d7ca167200567b6a0
2021-08-25 09:25:43 -07:00
driazati
bd8608cd5c Use CMake for breakpad (#63186)
Summary:
We currently build breakpad from [this fork](https://github.com/driazati/breakpad) to include extra logic to restore signal handlers that were previously present. With some [new additions](https://github.com/google/breakpad/compare/main...driazati:main) this fork now includes a CMake based build, so we can add breakpad as a proper dependency rather than rely on including it in Docker images as a system library which is error prone (we have a bunch of images) and hard to extend to MacOS / Windows. This also includes some changes to the crash handling code to support MacOS / Windows in a similar way to Linux.

```python
import torch

# On Windows this writes crashes to C:\Users\<user>\AppData\pytorch_crashes
# On MacOS/Linux this writes crashes to /tmp/pytorch_crashes
torch.utils._crash_handler.enable_minidumps()

# Easy way to cause a segfault and trigger the handler
torch.bincount(input=torch.tensor([9223372036854775807]))
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63186

Reviewed By: malfet, seemethere

Differential Revision: D30318404

Pulled By: driazati

fbshipit-source-id: 0d7daf3701cfaba5451cc529a0730272ab1eb1dc
2021-08-19 10:42:01 -07:00
Peter Bell
f70b9ee5de Advertise USE_PRECOMPILED_HEADERS in CONTRIBUTING.md (#62827)
Summary:
This option was added in https://github.com/pytorch/pytorch/issues/61940 and fits with this section's theme of improving build times.

I've also changed it to a `cmake_dependent_option` instead of `FATAL_ERROR`ing for older CMake versions.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62827

Reviewed By: astaff

Differential Revision: D30342102

Pulled By: malfet

fbshipit-source-id: 3095b44b7085aee8a884ec95cba9f8998d4442e7
2021-08-17 10:14:40 -07:00
Kimish Patel
38c185189c [Pytorch Edge] Enable kineto profiler on mobile via EdgeKinetoProfiler (#62419)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62419

This diff adds support for a CPU-only kineto profiler on mobile, thus
enabling Chrome trace generation on mobile. This brings the C++ API for
mobile profiling on par with TorchScript.
This is done via:
1. Utilizing debug handle annotations in KinetoEvent.
2. Adding post-processing capability, via callbacks, to
KinetoThreadLocalState.
3. Creating a new RAII-style profiler, KinetoEdgeCPUProfiler, which can be
used in the surrounding scope of model execution. This will write the Chrome
trace to the location specified in the profiler constructor.
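
As a loose analogy only (the profiler in this diff is a C++ class, and nothing below is its API), the scoped pattern can be sketched with a Python context manager: the trace is written when the surrounding scope exits. The helper name `scoped_trace` and the single "complete" event are assumptions for illustration; the JSON layout follows the Chrome trace event format.

```python
import json
import os
import tempfile
import time
from contextlib import contextmanager


@contextmanager
def scoped_trace(path, name):
    """Record one 'complete' event and write a Chrome trace file on scope exit."""
    start = time.perf_counter()
    try:
        yield
    finally:
        end = time.perf_counter()
        event = {
            "name": name,
            "ph": "X",  # complete event: a start timestamp plus a duration
            "ts": start * 1e6,  # microseconds
            "dur": (end - start) * 1e6,
            "pid": os.getpid(),
            "tid": 0,
        }
        # Written even if the body raised, mirroring destructor-time output.
        with open(path, "w") as f:
            json.dump({"traceEvents": [event]}, f)


trace_path = os.path.join(tempfile.mkdtemp(), "trace.json")
with scoped_trace(trace_path, "forward"):
    sum(range(1000))  # stand-in for model execution
```

The resulting file can be opened in a Chrome-trace-compatible viewer; a real profiler would of course emit one event per operator rather than one per scope.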

Test Plan:
MobileProfiler.ModuleHierarchy

Imported from OSS

Reviewed By: raziel

Differential Revision: D29993660

fbshipit-source-id: 0b44f52f9e9c5f5aff81ebbd9273c254c3c03299
2021-08-13 21:40:19 -07:00
peterjc123
d16587f84d Enable rebuilds for Ninja on Windows (#62948)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/59859.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62948

Reviewed By: seemethere, tktrungna

Differential Revision: D30192246

Pulled By: janeyx99

fbshipit-source-id: af25cc4bf0db67a1304d9971cfa0ff6831bb3b48
2021-08-09 16:15:45 -07:00
Peter Bell
b7ac286d0e CMake: Add optional precompiled header support (#61940)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61940

This adds a `USE_PRECOMPILED_HEADERS` option to the CMake build which
precompiles `ATen.h` and also `CUDAContext.h` for the cuda library.
After making a change in `native_functions.yaml`, this speeds up compilation
time by around 15% on my machine.

Test Plan: Imported from OSS

Reviewed By: heitorschueroff

Differential Revision: D29988775

Pulled By: malfet

fbshipit-source-id: a23c468c958a8b74ebaef052a5b2e5fa3836c64b
2021-08-03 09:13:47 -07:00
Can Balioglu
7565039ee9 Support system-provided Intel TBB (#61934)
Summary:
This PR: (1) enables the use of a system-provided Intel TBB for building PyTorch, (2) removes `tbb::task_scheduler_init` references, since it was removed from TBB a while ago, and (3) marks the implementation of `_internal_set_num_threads` with a TODO, as it requires a revision that fixes its thread allocation logic.

Tested with `test/run_test`; no new tests are introduced since there are no behavioral changes (removal of `tbb::task_scheduler_init` has no impact on the runtime behavior).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61934

Reviewed By: malfet

Differential Revision: D29805416

Pulled By: cbalioglu

fbshipit-source-id: 22042b428b57b8fede9dfcc83878d679a19561dd
2021-08-02 07:39:00 -07:00
Jane Xu
e318058ffe Ignore LNK4099 for debug binary libtorch builds (#62060)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/61979

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62060

Test Plan:
This CI shouldn't break
and https://github.com/pytorch/pytorch/pull/62061

Reviewed By: driazati

Differential Revision: D29877487

Pulled By: janeyx99

fbshipit-source-id: 497f84caab3f9ae609644fd397ad87a6dc8a2a77
2021-07-23 09:31:41 -07:00
Luca Wehrstedt
a1780432fa Move c10d to libtorch(_cuda) (#59563)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59563

ghstack-source-id: 131331264

Test Plan: CI

Reviewed By: malfet

Differential Revision: D28932239

fbshipit-source-id: 5df6cdfa5253b15cbbc97039fe672d6d97321e34
2021-06-15 02:01:31 -07:00
Nikita Shulga
1ea5c19c19 Add USE_WHOLE_CUDNN option (#59744)
Summary:
It is only enabled if USE_STATIC_CUDNN is enabled

Next step after https://github.com/pytorch/pytorch/pull/59721 towards resolving fast kernels stripping reported in https://github.com/pytorch/pytorch/issues/50153

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59744

Reviewed By: seemethere, ngimel

Differential Revision: D29007314

Pulled By: malfet

fbshipit-source-id: 7091e299c0c6cc2a8aa82fbf49312cecf3bb861a
2021-06-09 21:12:42 -07:00
Nikita Shulga
7179e7ea7b [CMake] Prefer third_party/pybind11 by default (#58951)
Summary:
To make build behaviour aligned with other third_party/ libraries,
introduce the `USE_SYSTEM_PYBIND11` build option, which is set to OFF by
default. This means PyTorch will be built with the bundled pybind11 even if
another version is already installed locally.

Fixes https://github.com/pytorch/pytorch/issues/58750

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58951

Reviewed By: driazati

Differential Revision: D28690411

Pulled By: malfet

fbshipit-source-id: e56b5a8f2a23ee1834b2a6d3807f287149decf8c
2021-05-25 15:10:17 -07:00
Nathan John Sircombe
bf00d26deb Enables builds with Compute Library backend for oneDNN (#55913)
Summary:
Since v1.7, oneDNN (MKL-DNN) has supported the use of Compute Library
for the Arm architecture to provide optimised convolution primitives
on AArch64.

This change enables the use of Compute Library in the PyTorch build.
Following the approach used to enable the use of CBLAS in MKLDNN,
it is enabled by setting the env vars USE_MKLDNN and USE_MKLDNN_ACL.
The location of the Compute Library build must be set using `ACL_ROOT_DIR`.
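
A minimal sketch of preparing such an environment from Python, assuming the three variable names above and assuming `ON` is an accepted value (the helper name and the example path are hypothetical):

```python
import os


def acl_build_env(acl_root_dir, base_env=None):
    """Return a copy of the environment with the Compute Library build settings."""
    env = dict(os.environ if base_env is None else base_env)
    env["USE_MKLDNN"] = "ON"      # assumed value; enables the oneDNN backend
    env["USE_MKLDNN_ACL"] = "ON"  # assumed value; enables Compute Library in oneDNN
    env["ACL_ROOT_DIR"] = acl_root_dir  # where Compute Library was built/installed
    return env


# Hypothetical install location, used only for illustration.
example_env = acl_build_env("/opt/ComputeLibrary", base_env={"PATH": "/usr/bin"})
```

The returned mapping could then be passed as `env=` to `subprocess.run` when invoking the build.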

This is an extension of the work in https://github.com/pytorch/pytorch/pull/50400
which added support for the oneDNN/MKL-DNN backend on AArch64.

_Note: this assumes that Compute Library has been built and installed at
ACL_ROOT_DIR. Compute library can be downloaded here:
`https://github.com/ARM-software/ComputeLibrary`_

Pull Request resolved: https://github.com/pytorch/pytorch/pull/55913

Reviewed By: ailzhang

Differential Revision: D28559516

Pulled By: malfet

fbshipit-source-id: 29d24996097d0a54efc9ab754fb3f0bded290005
2021-05-20 07:43:56 -07:00
Xiang Gao
6c70cbedb6 step 0 of cuDNN v8 convolution API integration (#51390)
Summary:
This PR is step 0 of adding PyTorch convolution bindings using the cuDNN frontend. The cuDNN frontend is the recommended way of using cuDNN v8 API. It is supposed to have faster release cycles, so that, for example, if people find a specific kernel has a bug, they can report it, and that kernel will be blocked in the cuDNN frontend and frameworks could just update that submodule without the need for waiting for a whole cuDNN release.

The work is not complete, and this PR is only step 0.

**What this PR does:**
- Add cudnn-frontend as a submodule.
- Modify cmake to build that submodule.
- Add bindings for convolution forward in `Conv_v8.cpp`, which is disabled by a macro by default.
- Tested manually by enabling the macro and running `test_nn.py`. All tests pass except those mentioned below.

**What this PR doesn't:**
- Only convolution forward, no backward. The backward will use v7 API.
- No 64bit-indexing support for some configurations. This is a known issue of cuDNN, and will be fixed in a later cuDNN version. PyTorch will not implement any workaround for this issue; instead, the v8 API should be disabled on problematic cuDNN versions.
- No test beyond PyTorch's unit tests.
  - Not tested for correctness on real models.
  - Not benchmarked for performance.
- Benchmark cache is not thread-safe. (This is marked as `FIXME` in the code, and will be fixed in a follow-up PR)
- cuDNN benchmark is not supported.
- There are failing tests, which will be resolved later:
  ```
  FAILED test/test_nn.py::TestNNDeviceTypeCUDA::test_conv_cudnn_nhwc_cuda_float16 - AssertionError: False is not true : Tensors failed to compare as equal!With rtol=0.001 and atol=1e-05, found 32 element(s) (out of 32) whose difference(s) exceeded the margin of error (in...
  FAILED test/test_nn.py::TestNNDeviceTypeCUDA::test_conv_cudnn_nhwc_cuda_float32 - AssertionError: False is not true : Tensors failed to compare as equal!With rtol=1.3e-06 and atol=1e-05, found 32 element(s) (out of 32) whose difference(s) exceeded the margin of error (...
  FAILED test/test_nn.py::TestNNDeviceTypeCUDA::test_conv_large_cuda - RuntimeError: CUDNN_BACKEND_OPERATION: cudnnFinalize Failed cudnn_status: 9
  FAILED test/test_nn.py::TestNN::test_Conv2d_depthwise_naive_groups_cuda - AssertionError: False is not true : Tensors failed to compare as equal!With rtol=0 and atol=1e-05, found 64 element(s) (out of 64) whose difference(s) exceeded the margin of error (including 0 an...
  FAILED test/test_nn.py::TestNN::test_Conv2d_deterministic_cudnn - RuntimeError: not supported yet
  FAILED test/test_nn.py::TestNN::test_ConvTranspose2d_groups_cuda_fp32 - RuntimeError: cuDNN error: CUDNN_STATUS_BAD_PARAM
  FAILED test/test_nn.py::TestNN::test_ConvTranspose2d_groups_cuda_tf32 - RuntimeError: cuDNN error: CUDNN_STATUS_BAD_PARAM
  ```

Although this is not a complete implementation of cuDNN v8 API binding, I still want to merge this first. This would allow me to do small and incremental work, for the ease of development and review.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51390

Reviewed By: malfet

Differential Revision: D28513167

Pulled By: ngimel

fbshipit-source-id: 9cc20c9dec5bbbcb1f94ac9e0f59b10c34f62740
2021-05-19 12:54:09 -07:00
Pavel Belevich
96e1a83fb2 Add Gloo TCP_TLS transport (#56442)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/56442

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D27896285

Pulled By: pbelevich

fbshipit-source-id: 589af59ca4c7c9bab2329f079382c09b71cfcf9e
2021-05-07 13:36:11 -07:00
Kimish Patel
f4a921600a [PyTorch, Mobile] Serialization format change for source range (#54284)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54284

In order to bring mobile deployment, via lite interpreter, to feature
parity with JIT with respect to model-level debug information, we must make
model-level debug information available to the mobile runtime.
At the moment, model-level debug information is stored in SourceRange,
which associates nodes of the graph with where they come from in the original
Python source code.
This information is serialized as part of debug_pkl and deserialized
when JIT loads the model and reads the model code.
On lite interpreter, we do not have access to all the functionality of
JIT and hence we cannot load the model in the same way as JIT, by reading
code, constructing the module hierarchy and the graphs corresponding to module
methods, etc. Instead, in lite interpreter, only the bytecode corresponding to
the compiled graph, Code, is saved.
Thus in order to annotate OPs in the bytecode with equivalent
SourceRange information we do the following:
1. During model serialization, we create a unique tag for each source
range of the model.
2. Create a map of <SourceRange, tag>
3. During debug_pkl serialization we save tag along with SourceRange, on
top of byte offset.
4. During bytecode generation, the methods of the top module are
lowered. During this process methods are inlined. In the inlined graph,
when the node of a graph is lowered to bytecode, we query node's source
range and look it up against the map.
5. Resulting source range tag is serialized in module_debug_info.
6. During model deserialization, we read all the debug_pkl records in
the archive and create a map of <tag, SourceRange>
7. This map can be used to find source code information.
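
The tagging scheme in the steps above can be sketched in Python. This is an illustrative model only, not the serializer's code: `SourceRange` here is a hypothetical namedtuple stand-in, and the helper names are invented for the example.

```python
from collections import namedtuple

# Hypothetical stand-in for the real SourceRange type.
SourceRange = namedtuple("SourceRange", ["filename", "start", "end"])


def assign_tags(source_ranges):
    """Steps 1-2: give every distinct source range a unique integer tag."""
    tags = {}
    for sr in source_ranges:
        if sr not in tags:
            tags[sr] = len(tags)
    return tags


def serialize_debug_records(tags):
    """Step 3: store the tag next to each source range."""
    return [(tag, tuple(sr)) for sr, tag in tags.items()]


def load_debug_records(records):
    """Step 6: rebuild the <tag, SourceRange> map from the saved records."""
    return {tag: SourceRange(*fields) for tag, fields in records}


ranges = [
    SourceRange("model.py", 10, 25),
    SourceRange("model.py", 40, 58),
    SourceRange("model.py", 10, 25),  # same range reached again after inlining
]
tags = assign_tags(ranges)
# Steps 4-5: each lowered op records only the tag of its node's source range.
op_tags = [tags[sr] for sr in ranges]
# Steps 6-7: the loader can map any tag back to source code information.
tag_to_range = load_debug_records(serialize_debug_records(tags))
```

The bytecode only needs to carry small integer tags; the heavier source range data stays in the debug records and is consulted on demand.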

During mobile runtime:
1. We read all the debug_pkl records and create <tag=debug_handle,
SourceRange> map.
   1.1 This map, MobileDebugInfo, is a member of mobile Module.
2. Interpreter catches appropriate exceptions and sets the thread local
debug handle and rethrows the exception.
3. In Function's run method we catch exception and query current debug
handle where the exception happened.
4. Query MobileDebugInfo with debug handle to retrieve source range and
augment error with source range info.

This information is still incomplete as it does not contain entire
callstack.

In the following diffs we will serialize InlinedCallStack directly.

Note that compilation is gated by the SYMBOLICATE_MOBILE_DEBUG_HANDLE macro,
so that mobile builds can avoid building MobileDebugInfo, source range
and source range pickler/unpickler. Later we will add a path where, if
building without debug support, the stack trace will contain only debug
handles. They can be symbolicated later.

Test Plan:
Ported a bunch of source range tests from test_jit.py. Added one more test
in test_lite_interpreter.py

Imported from OSS

Reviewed By: raziel

Differential Revision: D27174722

fbshipit-source-id: a7b7c6088ce16dec37e823c7fefa4f0b61047e12
2021-05-04 09:19:27 -07:00
davidriazati@fb.com
264d87985a Use ld.gold by default to link in CI (#57061)
Summary:
This adds an option to CMake to use `ld.gold` to link rather than `ld` (which symlinks to `ld.bfd` on Ubuntu by default). This shouldn't change any functionality, only a mild improvement on link times during builds (shaves off 1 minute) on CI.

Verify by searching for `ld.gold is available` in [the logs](https://circleci.com/api/v1.1/project/github/pytorch/pytorch/13046834/output/105/0?file=true&allocation-id=608c434338107e5b6cf938a1-0-build%2F7BDA2FF1)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57061

Pulled By: driazati

Reviewed By: janeyx99

Differential Revision: D28123522

fbshipit-source-id: 5a60798ca4785427fd92bbf3b3aa5f63730e9b20
2021-05-03 10:05:36 -07:00
davidriazati@fb.com
c44cbc63cc Ignore more compiler warnings, unify WERROR options (#56630)
Summary:
This adds some more compiler warnings ignores for everything that happens on a standard CPU build (CUDA builds still have a bunch of warnings so we can't turn on `-Werror` everywhere yet).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56630

Pulled By: driazati

Reviewed By: malfet

Differential Revision: D28005063

fbshipit-source-id: 541ed415eb0470ddf7e08c22c5eb6da9db26e9a0
2021-04-29 21:20:29 -07:00
davidriazati@fb.com
21be40b390 Add torch_cpu specific flag for debug info (#57190)
Summary:
Right now we are using `REL_WITH_DEB_INFO=1` on Linux CI binary builds. This is causing intermittent failures on CUDA builds since the debug information increases the load on the linker. This adds a workaround: a flag to enable debug info only for the target we actually want it for (`libtorch_cpu.so`; all the other binaries are stripped of their debug info after building).

Example failures (from [the hud](https://ezyang.github.io/pytorch-ci-hud/build2/pytorch-nightly?mode=nightly)):
* https://app.circleci.com/pipelines/github/pytorch/pytorch/311785/workflows/df640957-54b0-4592-aeef-6d5baee503ae/jobs/12932229
* https://app.circleci.com/pipelines/github/pytorch/pytorch/311784/workflows/e3b487d6-fb46-4a5d-a2d5-22eec328b678/jobs/12932228
* https://app.circleci.com/pipelines/github/pytorch/pytorch/311784/workflows/e3b487d6-fb46-4a5d-a2d5-22eec328b678/jobs/12932227

Pull Request resolved: https://github.com/pytorch/pytorch/pull/57190

Pulled By: driazati

Reviewed By: janeyx99

Differential Revision: D28085550

fbshipit-source-id: 0fc5b3e769b10c0dd3811717f968d0c933667361
2021-04-29 12:06:15 -07:00
Will Constable
21fd5f4b79 Document current deploy cpython build #56490 (#56600)
Summary:
Call out the issues with cpython deps and suggest a workaround.

Fixes https://github.com/pytorch/pytorch/issues/56490

Pull Request resolved: https://github.com/pytorch/pytorch/pull/56600

Reviewed By: albanD

Differential Revision: D27920647

Pulled By: wconstab

fbshipit-source-id: 61a53a176eaf42a6166d649d3cb0fdfa2489e9d2
2021-04-22 09:02:29 -07:00
Eddie Yan
81f181567a Add USE_MAGMA build flag (#55994)
Summary:
Many model pipelines/workflows don't use MAGMA even though it is included in the build by default. Leaving MAGMA kernels out of the build can save 60+MB of GPU memory when loading `libtorch_cuda.so` (tested on V100, current upstream master).

A current sharp corner of this flag is that toggling it when rebuilding requires `torch/include/THC/THCGeneral.h` to be *manually* deleted by the user, as even running `make clean` or `setup.py` with `--cmake` does not properly regenerate it with the appropriate substitution for `#cmakedefine USE_MAGMA`. Is there a way to force the regeneration of the header during a rebuild?
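
Until the header is regenerated automatically, the manual step described above could be scripted. The sketch below only deletes the file named in this message, relative to a checkout root; the helper name and the temporary demo directory are assumptions for illustration.

```python
import tempfile
from pathlib import Path


def remove_stale_header(repo_root):
    """Delete the generated THCGeneral.h so the next build regenerates it.

    Returns True if a file was removed, False if there was nothing to remove.
    """
    header = Path(repo_root) / "torch" / "include" / "THC" / "THCGeneral.h"
    if header.exists():
        header.unlink()
        return True
    return False


# Demo on a throwaway directory that mimics the relevant part of a checkout.
demo_root = Path(tempfile.mkdtemp())
(demo_root / "torch" / "include" / "THC").mkdir(parents=True)
(demo_root / "torch" / "include" / "THC" / "THCGeneral.h").write_text("#define USE_MAGMA\n")
first = remove_stale_header(demo_root)
second = remove_stale_header(demo_root)
```

Running this before rebuilding with a different `USE_MAGMA` value avoids picking up the header generated for the previous setting.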

CC malfet ptrblck

Pull Request resolved: https://github.com/pytorch/pytorch/pull/55994

Reviewed By: mruberry

Differential Revision: D27766287

Pulled By: malfet

fbshipit-source-id: 93deca57befa0febb9c5b7875ecf0015c547d421
2021-04-15 00:43:12 -07:00
Ailing Zhang
1688a5d31a Cleanup since FEATURE_TORCH_MOBILE is always true. (#55835)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55835

Now that https://github.com/pytorch/pytorch/pull/55238 has been landed for a
week with no complaints, it seems safe to say FEATURE_TORCH_MOBILE is
always true and we can do some cleanup.

Test Plan: Imported from OSS

Reviewed By: ezyang, walterddr

Differential Revision: D27721284

Pulled By: ailzhang

fbshipit-source-id: 4896bc5f736373d0922cfbe8eed0d16df62f0fa1
2021-04-14 09:08:18 -07:00
Ivan Kobzarev
85fcadc059 [lite-interpreter] speed_benchmark_torch support BUILD_LITE_INTERPRETER (#55402)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/55402

Test Plan: Imported from OSS

Reviewed By: cccclai

Differential Revision: D27599824

Pulled By: IvanKobzarev

fbshipit-source-id: 3adbb8a16a785d3610404d71ef2d895904b1a8ef
2021-04-07 11:39:32 -07:00
SpaceIm
aeedd5c7df cmake: fix ONNX_NAMESPACE if USE_SYSTEM_ONNX (#54973)
Summary:
`ONNX_NAMESPACE` is empty by default if `USE_SYSTEM_ONNX ON`, while it should be equal to `onnx`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/54973

Reviewed By: glaringlee

Differential Revision: D27466020

Pulled By: walterddr

fbshipit-source-id: 47cde3604acbda3f45bec5893036b39fd1eb58c9
2021-03-31 08:29:00 -07:00
Nikita Shulga
68bdeef2ce [CMake] Simplify CPU architecture detection logic (#54637)
Summary:
`CMAKE_SYSTEM_PROCESSOR` set to x86_64 (on Linux) or AMD64 (on Windows) indicates the build is running on the x86_64 architecture, while `CMAKE_SYSTEM_PROCESSOR` set to aarch64 or arm64 means we are running on an ARMv8+ architecture.
Delete the `i[3-6]86` pattern, as 32-bit builds are no longer supported.
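
The classification described above can be mirrored in Python for illustration. The real logic lives in CMake; the function name, the return labels, and the use of a full-string match are assumptions of this sketch.

```python
import re


def classify_processor(cmake_system_processor):
    """Return 'x86_64', 'aarch64', or 'other' for a CMAKE_SYSTEM_PROCESSOR value."""
    if re.fullmatch(r"x86_64|AMD64", cmake_system_processor):
        return "x86_64"
    if re.fullmatch(r"aarch64|arm64", cmake_system_processor):
        return "aarch64"
    # 32-bit values such as i386..i686 are deliberately not recognized.
    return "other"
```

For example, `classify_processor("AMD64")` falls in the x86_64 branch, while a 32-bit value such as `"i686"` is no longer recognized.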

Pull Request resolved: https://github.com/pytorch/pytorch/pull/54637

Reviewed By: ezyang

Differential Revision: D27311897

Pulled By: malfet

fbshipit-source-id: 26989fc9b54a96d70c768ab03ca4528506ee7808
2021-03-25 12:32:18 -07:00
Leonard Lausen
90bbe0b38b cmake: auto-detect ccache to speed up developer builds (#49389)
Summary:
https://ccache.dev/ is a compiler cache that speeds up subsequent builds. Auto-detecting ccache ensures that it is used on systems where it is available, greatly improving build times for developers. There is no risk in enabling ccache in practice. Please refer to https://ccache.dev/ for a short summary / motivation
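
The idea of auto-detection can be sketched in Python. This is not the CMake change itself: it assumes ccache is used as a launcher-style prefix in front of the compiler command, and the helper names `find_ccache` and `compiler_command` are hypothetical.

```python
import shutil
from typing import Callable, List, Optional


def find_ccache(which: Callable[[str], Optional[str]] = shutil.which) -> Optional[str]:
    """Return the path to a ccache executable on PATH, or None if absent."""
    return which("ccache")


def compiler_command(compiler: str, args: List[str], ccache_path: Optional[str]) -> List[str]:
    """Prefix the compiler invocation with ccache only when it was found."""
    command = [compiler, *args]
    return [ccache_path, *command] if ccache_path else command


# Falls back to the plain compiler command on systems without ccache.
print(compiler_command("c++", ["-c", "example.cpp"], find_ccache()))
```

Because the prefix is added only when the executable is found, systems without ccache keep their existing compiler command unchanged.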

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49389

Reviewed By: ejguan

Differential Revision: D27169957

Pulled By: malfet

fbshipit-source-id: 673b60bbceb0d323901c8a992a75792c6da9b805
2021-03-18 14:20:53 -07:00
Ashkan Aliabadi
e5ecd1ddf8 [Vulkan]Fix build warnings-treated-as-error on Linux. (#52781)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/52781

Test Plan: Imported from OSS

Reviewed By: SS-JIA

Differential Revision: D26669311

Pulled By: AshkanAliabadi

fbshipit-source-id: 78b08d0b264d4d5cf8af964c589b9b7d0ddc7311
2021-03-03 13:48:43 -08:00
Chen Lai
14f7bf0629 [PyTorch] update CMake to build libtorch lite (#51419)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51419

## Summary

1. Add an option `BUILD_LITE_INTERPRETER` in `caffe2/CMakeLists.txt` and set `OFF` as default.
2. Update 'build_android.sh' with an argument to switch `BUILD_LITE_INTERPRETER`, 'OFF' as default.
3. Add a mini demo app `lite_interpreter_demo` linked with `libtorch` library, which can be used for quick test.

## Test Plan
Built lite interpreter version of libtorch and test with Image Segmentation demo app ([android version](https://github.com/pytorch/android-demo-app/tree/master/ImageSegmentation)/[ios version](https://github.com/pytorch/ios-demo-app/tree/master/ImageSegmentation))

### Android
1. **Prepare model**: Prepare the lite interpreter version of the model by running the script below, which generates the scripted models `deeplabv3_scripted.pt` and `deeplabv3_scripted.ptl`
```
import torch

model = torch.hub.load('pytorch/vision:v0.7.0', 'deeplabv3_resnet50', pretrained=True)
model.eval()

scripted_module = torch.jit.script(model)
# Export full jit version model (not compatible with lite interpreter), leave it here for comparison
scripted_module.save("deeplabv3_scripted.pt")
# Export lite interpreter version model (compatible with lite interpreter)
scripted_module._save_for_lite_interpreter("deeplabv3_scripted.ptl")

```
2. **Build libtorch lite for android**: Build libtorch for android for all 4 android abis (armeabi-v7a, arm64-v8a, x86, x86_64) with `BUILD_LITE_INTERPRETER=1 ./scripts/build_pytorch_android.sh`. This PR was tested on a Pixel 4 emulator with x86, so use the command `BUILD_LITE_INTERPRETER=1 ./scripts/build_pytorch_android.sh x86` to specify the abi and save build time. After the build finishes, it will show the library path:
```
...
BUILD SUCCESSFUL in 55s
134 actionable tasks: 22 executed, 112 up-to-date
+ find /Users/chenlai/pytorch/android -type f -name '*aar'
+ xargs ls -lah
-rw-r--r--  1 chenlai  staff    13M Feb 11 11:48 /Users/chenlai/pytorch/android/pytorch_android/build/outputs/aar/pytorch_android-release.aar
-rw-r--r--  1 chenlai  staff    36K Feb  9 16:45 /Users/chenlai/pytorch/android/pytorch_android_torchvision/build/outputs/aar/pytorch_android_torchvision-release.aar
```
3. **Use the PyTorch Android libraries built from source in the ImageSegmentation app**: Create a folder `libs` so that its path from the repository root is `ImageSegmentation/app/libs`. Copy `pytorch_android-release` to the path `ImageSegmentation/app/libs/pytorch_android-release.aar`. Copy `pytorch_android_torchvision` (downloaded from [here](https://oss.sonatype.org/#nexus-search;quick~torchvision_android)) to the path `ImageSegmentation/app/libs/pytorch_android_torchvision.aar`. Update the `dependencies` part of `ImageSegmentation/app/build.gradle` to
```
dependencies {
    implementation 'androidx.appcompat:appcompat:1.2.0'
    implementation 'androidx.constraintlayout:constraintlayout:2.0.2'
    testImplementation 'junit:junit:4.12'
    androidTestImplementation 'androidx.test.ext:junit:1.1.2'
    androidTestImplementation 'androidx.test.espresso:espresso-core:3.3.0'

    implementation(name:'pytorch_android-release', ext:'aar')
    implementation(name:'pytorch_android_torchvision', ext:'aar')

    implementation 'com.android.support:appcompat-v7:28.0.0'
    implementation 'com.facebook.fbjni:fbjni-java-only:0.0.3'
}
```
Update `allprojects` part in `ImageSegmentation/build.gradle` to
```

allprojects {
    repositories {
        google()
        jcenter()
        flatDir {
            dirs 'libs'
        }
    }
}
```
4. **Update model loader api**: Update `ImageSegmentation/app/src/main/java/org/pytorch/imagesegmentation/MainActivity.java` by
4.1 Add new import: `import org.pytorch.LiteModuleLoader;`
4.2 Replace the way to load pytorch lite model
```
//            mModule = Module.load(MainActivity.assetFilePath(getApplicationContext(), "deeplabv3_scripted.pt"));
            mModule = LiteModuleLoader.load(MainActivity.assetFilePath(getApplicationContext(), "deeplabv3_scripted.ptl"));
```
5. **Test app**: Build and run the ImageSegmentation app in Android Studio,
![image](https://user-images.githubusercontent.com/16430979/107696279-9cea5900-6c66-11eb-8286-4d1d68abff61.png)

### iOS
1. **Prepare model**: Same as Android.
2. **Build libtorch lite for ios** `BUILD_PYTORCH_MOBILE=1 IOS_PLATFORM=SIMULATOR BUILD_LITE_INTERPRETER=1   ./scripts/build_ios.sh`
3. **Remove Cocoapods from the project**: run `pod deintegrate`
4. **Link ImageSegmentation demo app with the custom built library**:
Open your project in XCode, go to your project Target’s **Build Phases - Link Binaries With Libraries**, click the **+** sign and add all the library files located in `build_ios/install/lib`. Navigate to the project **Build Settings**, set the value **Header Search Paths** to `build_ios/install/include` and **Library Search Paths** to `build_ios/install/lib`.
In the build settings, search for **other linker flags**. Add a custom linker flag below
```
-all_load
```
Finally, disable bitcode for your target by selecting the Build Settings, searching for Enable Bitcode, and setting the value to No.

5. **Update library and api**
5.1 Update `TorchModule.mm`
To use the custom built libraries in the project, replace `#import <LibTorch/LibTorch.h>` (in `TorchModule.mm`), which is needed when using LibTorch via Cocoapods, with the code below:

```
//#import <LibTorch/LibTorch.h>
#include "ATen/ATen.h"
#include "caffe2/core/timer.h"
#include "caffe2/utils/string_utils.h"
#include "torch/csrc/autograd/grad_mode.h"
#include "torch/script.h"
#include <torch/csrc/jit/mobile/function.h>
#include <torch/csrc/jit/mobile/import.h>
#include <torch/csrc/jit/mobile/interpreter.h>
#include <torch/csrc/jit/mobile/module.h>
#include <torch/csrc/jit/mobile/observer.h>
```
5.2 Update `ViewController.swift`
```
//        if let filePath = Bundle.main.path(forResource:
//            "deeplabv3_scripted", ofType: "pt"),
//            let module = TorchModule(fileAtPath: filePath) {
//            return module
//        } else {
//            fatalError("Can't find the model file!")
//        }
        if let filePath = Bundle.main.path(forResource:
            "deeplabv3_scripted", ofType: "ptl"),
            let module = TorchModule(fileAtPath: filePath) {
            return module
        } else {
            fatalError("Can't find the model file!")
        }
```

### Unit test
Add `test/cpp/lite_interpreter`, with one unit test `test_cores.cpp` and a light model `sequence.ptl` to test `_load_for_mobile()`, `bc.find_method()` and `bc.forward()` functions.

### Size:
**With the change:**
Android:
x86: `pytorch_android-release.aar` (**13.8 MB**)

IOS:
`pytorch/build_ios/install/lib` (lib: **66 MB**):
```
(base) chenlai@chenlai-mp lib % ls -lh
total 135016
-rw-r--r--  1 chenlai  staff   3.3M Feb 15 20:45 libXNNPACK.a
-rw-r--r--  1 chenlai  staff   965K Feb 15 20:45 libc10.a
-rw-r--r--  1 chenlai  staff   4.6K Feb 15 20:45 libclog.a
-rw-r--r--  1 chenlai  staff    42K Feb 15 20:45 libcpuinfo.a
-rw-r--r--  1 chenlai  staff    39K Feb 15 20:45 libcpuinfo_internals.a
-rw-r--r--  1 chenlai  staff   1.5M Feb 15 20:45 libeigen_blas.a
-rw-r--r--  1 chenlai  staff   148K Feb 15 20:45 libfmt.a
-rw-r--r--  1 chenlai  staff    44K Feb 15 20:45 libpthreadpool.a
-rw-r--r--  1 chenlai  staff   166K Feb 15 20:45 libpytorch_qnnpack.a
-rw-r--r--  1 chenlai  staff   384B Feb 15 21:19 libtorch.a
-rw-r--r--  1 chenlai  staff    **60M** Feb 15 20:47 libtorch_cpu.a
```
`pytorch/build_ios/install`:
```
(base) chenlai@chenlai-mp install % du -sh *
 14M	include
 66M	lib
2.8M	share
```

**Master (baseline):**
Android:
x86: `pytorch_android-release.aar` (**16.2 MB**)

IOS:
`pytorch/build_ios/install/lib` (lib: **84 MB**):
```
(base) chenlai@chenlai-mp lib % ls -lh
total 172032
-rw-r--r--  1 chenlai  staff   3.3M Feb 17 22:18 libXNNPACK.a
-rw-r--r--  1 chenlai  staff   969K Feb 17 22:18 libc10.a
-rw-r--r--  1 chenlai  staff   4.6K Feb 17 22:18 libclog.a
-rw-r--r--  1 chenlai  staff    42K Feb 17 22:18 libcpuinfo.a
-rw-r--r--  1 chenlai  staff   1.5M Feb 17 22:18 libeigen_blas.a
-rw-r--r--  1 chenlai  staff    44K Feb 17 22:18 libpthreadpool.a
-rw-r--r--  1 chenlai  staff   166K Feb 17 22:18 libpytorch_qnnpack.a
-rw-r--r--  1 chenlai  staff   384B Feb 17 22:19 libtorch.a
-rw-r--r--  1 chenlai  staff    78M Feb 17 22:19 libtorch_cpu.a
```
`pytorch/build_ios/install`:
```
(base) chenlai@chenlai-mp install % du -sh *
 14M	include
 84M	lib
2.8M	share
```
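
The size figures above can be compared with a short calculation. The numbers are copied from the listings in this summary; `reduction_percent` is a hypothetical helper used only for this illustration.

```python
def reduction_percent(baseline_mb: float, new_mb: float) -> float:
    """Relative size reduction of new_mb versus baseline_mb, in percent."""
    return (baseline_mb - new_mb) / baseline_mb * 100.0


# (baseline on master, size with this change), in MB, as reported above.
sizes = {
    "Android x86 pytorch_android-release.aar": (16.2, 13.8),
    "iOS build_ios/install/lib": (84.0, 66.0),
}

for name, (baseline, new) in sizes.items():
    print(f"{name}: {baseline} MB -> {new} MB "
          f"({reduction_percent(baseline, new):.1f}% smaller)")
```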

Test Plan: Imported from OSS

Reviewed By: iseeyuan

Differential Revision: D26518778

Pulled By: cccclai

fbshipit-source-id: 4503ffa1f150ecc309ed39fb0549e8bd046a3f9c
2021-02-21 01:43:54 -08:00
Bel H
db33afbf9f Change cmake to allow building with MLC kick-off build (#51326)
Summary:
- Allows the build process to build with MLC enabled if the subrepo folder mlc is in the path and we can link against ML Compute on macOS Big Sur
- To build with MLC enabled you will need to clone the mlc repo inside the pytorch repository.
- We need both this change and https://github.com/pytorch/pytorch/pull/50634 on pytorch/pytorch to enable the `mlc` device.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51326

Reviewed By: glaringlee

Differential Revision: D26533138

Pulled By: malfet

fbshipit-source-id: 0baa06b4eb2d62dbfc0f6fc922096cb0db1cc7d1
2021-02-19 13:04:25 -08:00
Jiakai Liu
c9c4b871a5 [pytorch] reintroduce static dispatch (#51957)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51957

This is a simplified version of #51554.

Compared to #51554, this version only supports statically dispatching to
a specific backend. The benefit is that it skips the dispatch key
computation logic and thus has less framework overhead. The downside is that
if input tensors do not match the specified backend, it will throw an error
instead of falling back to regular dispatch.

Sample code:
```
Tensor empty(IntArrayRef size, TensorOptions options, c10::optional<MemoryFormat> memory_format) {
    return at::cpu::empty(size, options, memory_format);
}

// aten::conj(Tensor(a) self) -> Tensor(a)
Tensor conj(const Tensor & self) {
    return at::math::conj(self);
}

// aten::conj.out(Tensor self, *, Tensor(a!) out) -> Tensor(a!)
Tensor & conj_out(Tensor & out, const Tensor & self) {
    return at::cpu::conj_out(out, self);
}

// aten::conj.out(Tensor self, *, Tensor(a!) out) -> Tensor(a!)
Tensor & conj_outf(const Tensor & self, Tensor & out) {
    return at::cpu::conj_out(out, self);
}

// aten::_conj(Tensor self) -> Tensor
Tensor _conj(const Tensor & self) {
    return at::defaultbackend::_conj(self);
}
```

For ops without the specific backend dispatch, it will throw an error:
```
// aten::_use_cudnn_ctc_loss(Tensor log_probs, Tensor targets, int[] input_lengths, int[] target_lengths, int blank) -> bool
bool _use_cudnn_ctc_loss(const Tensor & log_probs, const Tensor & targets, IntArrayRef input_lengths, IntArrayRef target_lengths, int64_t blank) {
    TORCH_CHECK(false, "Static dispatch does not support _use_cudnn_ctc_loss for CPU.");
}
```
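
The trade-off described above can be modelled with a toy Python sketch. It is not the generated C++ code: the `Tensor` class, the kernel functions, and the returned strings are all hypothetical and only mimic the behavior (no per-call backend lookup, and an error on a backend mismatch).

```python
class Tensor:
    """Toy tensor that only records which backend owns it."""

    def __init__(self, backend: str):
        self.backend = backend


def cpu_conj(t: Tensor) -> str:
    return "cpu::conj"


def cuda_conj(t: Tensor) -> str:
    return "cuda::conj"


# Hypothetical kernel table used by the regular (dynamic) dispatch path.
CONJ_KERNELS = {"cpu": cpu_conj, "cuda": cuda_conj}


def conj_dynamic(t: Tensor) -> str:
    """Regular dispatch: pick the kernel from the input's backend at call time."""
    return CONJ_KERNELS[t.backend](t)


def conj_static_cpu(t: Tensor) -> str:
    """Static dispatch: the backend is fixed ahead of time, with no lookup."""
    if t.backend != "cpu":
        # No fallback to regular dispatch: a mismatch is an error.
        raise RuntimeError(f"static CPU dispatch got a {t.backend} tensor")
    return cpu_conj(t)


print(conj_dynamic(Tensor("cuda")))
print(conj_static_cpu(Tensor("cpu")))
```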

Differential Revision: D26337857

Test Plan: Imported from OSS

Reviewed By: bhosmer

Pulled By: ljk53

fbshipit-source-id: a8e95799115c349de3c09f04a26b01d21a679364
2021-02-19 11:41:39 -08:00
Jane Xu
ac2bdf553e update build_host_protoc command for macos cross compilation (#50922)
Summary:
Currently, adding a cross compile build is failing on CI due to a cmake builtin compiler check that does not pass due to cross compiling the host protoc library.

Setting the CMAKE_TRY_COMPILE_TARGET_TYPE flag should fix it. (Based on this [SOF answer](https://stackoverflow.com/questions/53633705/cmake-the-c-compiler-is-not-able-to-compile-a-simple-test-program).)

To test that this works, please run: `CMAKE_OSX_ARCHITECTURES=arm64 USE_MKLDNN=OFF USE_NNPACK=OFF USE_QNNPACK=OFF USE_PYTORCH_QNNPACK=OFF BUILD_TEST=OFF python setup.py install` from a Mac x86_64 machine with Xcode12.3 (anything with MacOS 11 SDK).

Then, you can check that things were compiled for arm by running `lipo -info <file>` for any file in the `build/lib` directory.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50922

Reviewed By: malfet

Differential Revision: D26355054

Pulled By: janeyx99

fbshipit-source-id: 919f3f9bd95d7c7bba6ab3a95428d3ca309f8ead
2021-02-11 14:36:51 -08:00
cyy
1aaddd83a5 don't set the same C++ and C standards twice (#51832)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/51832

Reviewed By: izdeby

Differential Revision: D26312660

Pulled By: ezyang

fbshipit-source-id: 7d646cd106397e70bca0050d0aa30eb62b085cee
2021-02-08 08:53:26 -08:00
Jane Xu
88af2149e1 Add build option to split torch_cuda library into torch_cuda_cu and torch_cuda_cpp (#49050)
Summary:
Because of the size of our `libtorch_cuda.so`, linking with other hefty binaries presents a problem where 32bit relocation markers are too small and end up overflowing. This PR attempts to break up `torch_cuda` into `torch_cuda_cu` and `torch_cuda_cpp`.

`torch_cuda_cu`: all the files previously in `Caffe2_GPU_SRCS` that are
* pure `.cu` files in `aten`match
* all the BLAS files
* all the THC files, except for THCAllocator.cpp, THCCachingHostAllocator.cpp and THCGeneral.cpp
* all files in `detail`
* LegacyDefinitions.cpp and LegacyTHFunctionsCUDA.cpp
* Register*CUDA.cpp
* CUDAHooks.cpp
* CUDASolver.cpp
* TensorShapeCUDA.cpp

`torch_cuda_cpp`: all other files in `Caffe2_GPU_SRCS`

Accordingly, TORCH_CUDA_API and TORCH_CUDA_BUILD_MAIN_LIB usages are getting split as well to TORCH_CUDA_CU_API and TORCH_CUDA_CPP_API.

To test this locally, you can run `export BUILD_SPLIT_CUDA=ON && python setup.py develop`. In your `build/lib` folder, you should find binaries for both `torch_cuda_cpp` and `torch_cuda_cu`. To see that the SPLIT_CUDA option was toggled, you can grep the Summary of running cmake and make sure `Split CUDA` is ON.

This build option is tested on CI for CUDA 11.1 builds (linux for now, but windows soon).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49050

Reviewed By: walterddr

Differential Revision: D26114310

Pulled By: janeyx99

fbshipit-source-id: 0180f2519abb5a9cdde16a6fb7dd3171cff687a6
2021-02-01 18:42:35 -08:00
Ivan Kobzarev
dbfaf966b0 [android] turn on USE_VULKAN for android builds by default (#51291)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51291

Turning on USE_VULKAN for android builds
Remove standalone android vulkan build

Testing all ci jobs (for master): https://github.com/pytorch/pytorch/pull/51292

Test Plan: Imported from OSS

Reviewed By: AshkanAliabadi

Differential Revision: D26141891

Pulled By: IvanKobzarev

fbshipit-source-id: e8e1a4ab612c0786ce09217ab9370fd75a71eb00
2021-01-29 11:58:21 -08:00
Will Constable
f2e41257e4 Back out "Revert D26077905: Back out "Revert D25850783: Add torch::deploy, an embedded torch-python interpreter"" (#51267)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51267

Original commit changeset: b70185916502

Test Plan: test locally, oss ci-all, fbcode incl deferred

Reviewed By: suo

Differential Revision: D26121251

fbshipit-source-id: 4315b7fd5476914c8e5d6f547e1cfbcf0c227781
2021-01-28 19:30:45 -08:00
Mike Ruberry
12a434abbc Revert D26077905: Back out "Revert D25850783: Add torch::deploy, an embedded torch-python interpreter"
Test Plan: revert-hammer

Differential Revision:
D26077905 (dc2a44c4fc)

Original commit changeset: fae83bf9822d

fbshipit-source-id: b70185916502ba9ebe16d781cf0659b9f7865c9a
2021-01-27 19:53:29 -08:00
Will Constable
dc2a44c4fc Back out "Revert D25850783: Add torch::deploy, an embedded torch-python interpreter" (#51124)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51124

Original commit changeset: 1c7133627da2

Test Plan: Test locally with interpreter_test and on CI

Reviewed By: suo

Differential Revision: D26077905

fbshipit-source-id: fae83bf9822d79e9a9b5641bc5191a7f3fdea78d
2021-01-27 16:49:42 -08:00
Mike Ruberry
e843974a6e Revert D25850783: Add torch::deploy, an embedded torch-python interpreter
Test Plan: revert-hammer

Differential Revision:
D25850783 (3192f9e4fe)

Original commit changeset: a4656377caff

fbshipit-source-id: 1c7133627da28fb12848da7a9a46de6d3b2b67c6
2021-01-26 02:07:44 -08:00
Will Constable
3192f9e4fe Add torch::deploy, an embedded torch-python interpreter (#50458)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50458

libinterpreter.so contains a frozen python distribution including
torch-python bindings.

Freezing refers to serializing bytecode of python standard library modules as
well as the torch python library and embedding them in the library code.  This
library can then be dlopened multiple times in one process context, each
interpreter having its own python state and GIL.  In addition, each python
environment is sealed off from the filesystem and can only import the frozen
modules included in the distribution.
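
The notion of freezing can be illustrated with the standard library alone. This is a minimal sketch of serializing bytecode and loading it back without touching the filesystem, not how frozenpython or torch::deploy are implemented; `freeze_source` and `load_frozen` are hypothetical names.

```python
import marshal
import types


def freeze_source(source: str, module_name: str) -> bytes:
    """Compile source to a code object and serialize its bytecode."""
    code = compile(source, f"<frozen {module_name}>", "exec")
    return marshal.dumps(code)


def load_frozen(blob: bytes, module_name: str) -> types.ModuleType:
    """Rebuild a module from serialized bytecode, with no file on disk."""
    module = types.ModuleType(module_name)
    exec(marshal.loads(blob), module.__dict__)
    return module


blob = freeze_source("def greet(name):\n    return 'hello ' + name\n", "demo")
demo = load_frozen(blob, "demo")
print(demo.greet("deploy"))
print(hasattr(demo, "__file__"))  # the module was never backed by a file
```

In this sketch the byte string `blob` could be embedded in a binary as data; the module rebuilt from it has no `__file__` attribute.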

This change relies on newly added frozenpython, a cpython 3.8.6 fork built for this purpose.  Frozenpython provides libpython3.8-frozen.a which
contains frozen bytecode and object code for the python standard library.

Building on top of frozen python, the frozen torch-python bindings are added in
this diff, providing each embedded interpreter with a copy of the torch
bindings.  Each interpreter is intended to share one instance of libtorch and
the underlying tensor libraries.

Known issues

- Autograd is not expected to work with the embedded interpreter currently, as it manages
its own python interactions and needs to coordinate with the duplicated python
states in each of the interpreters.
- Distributed and cuda stuff is disabled in libinterpreter.so build, needs to be revisited
- __file__ is not supported in the context of embedded python since there are no
files for the underlying library modules.
using __file__
- __version__ is not properly supported in the embedded torch-python, just a
workaround for now

Test Plan: tested locally and on CI with cmake and buck builds running torch::deploy interpreter_test

Reviewed By: ailzhang

Differential Revision: D25850783

fbshipit-source-id: a4656377caff25b73913daae7ae2f88bcab8fd88
2021-01-25 15:14:28 -08:00
Will Constable
4bbff92014 Refactor build targets for torch::deploy (#50288)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50288

torch::deploy will bundle the objects contained in libtorch-python together with frozenpython into a shared library.  Therefore, the libtorch-python objs can't bring with them a dependency on system python.

Buck TARGETS are added throughout the caffe2 tree to make available objects or headers that will be needed by torch::deploy but would have brought unsuitable dependencies if accessed using existing targets.

CMakeLists are modified to separate a torch-python-objs object library which lets torch::deploy compile these objs with the same compile flags as libtorch_python used, but without some of the link-time dependencies such as python.

CudaIPCTypes is moved from libtorch_python to libtorch_cuda because it is really not a python binding, and it statically registers a cuda_ipc_callback which would be duplicated if included in each copy of torch::deploy.

Test Plan: no new functionality, just ensure existing tests continue to pass

Reviewed By: malfet

Differential Revision: D25850785

fbshipit-source-id: b0b81c050cbee04e9de96888f8a09d29238a9db8
2021-01-22 09:16:32 -08:00
Ilia Cherniavskii
e34992ebee Set USE_KINETO=1 (#49897)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49897

Resend of https://github.com/pytorch/pytorch/pull/49201

Test Plan: see 49201

Reviewed By: malfet

Differential Revision: D25717102

Pulled By: ilia-cher

fbshipit-source-id: 5e794a7f5fe160ca64ac9d190c4fd3e8f1e443e6
2021-01-22 00:09:21 -08:00
Nikita Shulga
cebab83d3f Fix USE_MKLDN defaults (#50782)
Summary:
Fixes regression introduced by https://github.com/pytorch/pytorch/pull/50400
`cmake_dependent_option` semantics are as follows (see https://cmake.org/cmake/help/v3.19/module/CMakeDependentOption.html):
`cmake_dependent_option(<option> "<help_text>" <value> <depends> <force>)`
I.e. `<depends>` should be true for CPU_INTEL or CPU_AARCH64, but the default value should be ON only if CPU_INTEL is true.
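
The semantics above can be modelled in a few lines of Python. This is a sketch of the documented behavior, not the CMake change itself: `dependent_option` and `use_mkldnn` are hypothetical names, and the forced value of OFF (modelled as `False`) is an assumption for the example.

```python
from typing import Optional


def dependent_option(default: bool, depends: bool, force: bool,
                     user_value: Optional[bool] = None) -> bool:
    """Model of cmake_dependent_option(<option> ... <value> <depends> <force>)."""
    if not depends:
        # The option is hidden from the user and pinned to <force>.
        return force
    # The option is available: the user's value wins, else the default <value>.
    return default if user_value is None else user_value


def use_mkldnn(cpu_intel: bool, cpu_aarch64: bool,
               user_value: Optional[bool] = None) -> bool:
    """Default ON only for Intel CPUs, but selectable on AArch64 as well."""
    return dependent_option(
        default=cpu_intel,
        depends=cpu_intel or cpu_aarch64,
        force=False,  # assumption: forced OFF when the dependency is not met
        user_value=user_value,
    )


print(use_mkldnn(cpu_intel=True, cpu_aarch64=False))
print(use_mkldnn(cpu_intel=False, cpu_aarch64=True))
print(use_mkldnn(cpu_intel=False, cpu_aarch64=True, user_value=True))
```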

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50782

Reviewed By: xuzhao9

Differential Revision: D25966509

Pulled By: malfet

fbshipit-source-id: c891cd9234311875762403f7125d0c3803bb0e65
2021-01-19 21:41:53 -08:00
Rong Rong (AI Infra)
ebd142e94b initial commit to enable fast_nvcc (#49773)
Summary:
draft enable fast_nvcc.
* cleaned up some non-standard usages
* added fall-back to wrap_nvcc

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49773

Test Plan:
Configuration to enable fast nvcc:
  - install and enable `ccache` but delete `.ccache/` folder before each build.
  - `TORCH_CUDA_ARCH_LIST=6.0;6.1;6.2;7.0;7.5`
  - Toggling `USE_FAST_NVCC=ON/OFF` cmake config and run `cmake --build` to verify the build time.

Initial statistic for a full compilation:
* `cmake --build . -- -j $(nproc)`:
  - fast NVCC
```
        real    48m55.706s
        user    1559m14.218s
        sys     318m41.138s
```
  - normal NVCC:
```
        real    43m38.723s
        user    1470m28.131s
        sys     90m46.879s
```
* `cmake --build . -- -j $(( $(nproc) / 4 ))`:
  - fast NVCC:
```
        real    53m44.173s
        user    1130m18.323s
        sys     71m32.385s
```
  - normal  NVCC:
```
        real    81m53.768s
        user    858m45.402s
        sys     61m15.539s
```
* Conclusion: fast NVCC doesn't provide much gain when the build is already set to use all CPU cores; in fact it is **even worse** because of the thread switching.

initial statistic for partial recompile (edit .cu files)

* `cmake --build . -- -j $(nproc)`
  - fast NVCC:
```
[2021-01-13 18:10:24] [ 86%] Building NVCC (Device) object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/torch_cuda_generated_BinaryMiscOpsKernels.cu.o
[2021-01-13 18:11:08] [ 86%] Linking CXX shared library ../lib/libtorch_cuda.so
```
  - normal NVCC:
```
[2021-01-13 17:35:40] [ 86%] Building NVCC (Device) object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/torch_cuda_generated_BinaryMiscOpsKernels.cu.o
[2021-01-13 17:38:08] [ 86%] Linking CXX shared library ../lib/libtorch_cuda.so
```
* Conclusion: Effective compilation time for a single CU file modification is reduced from 2min30sec to only 40sec when compiling for multiple architectures. This shows a **4X** speedup using fast NVCC, approaching the theoretical limit of 5X when compiling 5 gencode architectures at the same time.
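
As a rough cross-check, the timestamps in the two log excerpts above can be converted to elapsed times. The sketch assumes that the gap between the "Building NVCC" line and the "Linking" line approximates the compile time of the single `.cu` file; `elapsed_seconds` is a hypothetical helper.

```python
from datetime import datetime

TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"


def elapsed_seconds(start: str, end: str) -> float:
    """Seconds between two log timestamps in the format used above."""
    delta = datetime.strptime(end, TIMESTAMP_FORMAT) - datetime.strptime(start, TIMESTAMP_FORMAT)
    return delta.total_seconds()


# Timestamps copied from the partial-recompile logs above.
fast = elapsed_seconds("2021-01-13 18:10:24", "2021-01-13 18:11:08")
normal = elapsed_seconds("2021-01-13 17:35:40", "2021-01-13 17:38:08")

print(f"fast NVCC: {fast:.0f}s, normal NVCC: {normal:.0f}s, "
      f"ratio: {normal / fast:.1f}x")
```

With these timestamps the gaps are 44 s and 148 s, a ratio of roughly 3.4x, in line with the rounded figures quoted in the conclusion.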

Follow up PRs:
- should have a better fallback mechanism to detect whether a build is supported by fast_nvcc, instead of dry-running and then falling back on failure.
- performance measurement instrumentation to measure what's the total compile time vs the parallel tasks critical path time.
- figure out why `-j $(nproc)` gives significant sys overhead (`sys 318m41.138s` vs `sys 90m46.879s`) over normal nvcc, guess this is context switching, but not exactly sure

Reviewed By: malfet

Differential Revision: D25692758

Pulled By: walterddr

fbshipit-source-id: c244d07b9b71f146e972b6b3682ca792b38c4457
2021-01-19 14:50:54 -08:00
Rong Rong
070a30b265 [BE] add warning message to cmake against env var "-std=c++xx" (#50491)
Summary:
this was discovered when working on https://github.com/pytorch/pytorch/issues/50230.

Environment variables such as CXXFLAGS="-std=c++17" will not work because we use CMAKE_CXX_STANDARD 14.
This adds a warning to alert users when such an environment variable is set.
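
The check can be sketched in Python. This is not the CMake code added by the PR: the regular expression, the helper names `find_std_flag` and `std_flag_warning`, and the warning wording are assumptions for illustration.

```python
import re
from typing import Mapping, Optional

# Matches a "-std=<value>" token anywhere in a flags string.
STD_FLAG = re.compile(r"(?:^|\s)-std=(\S+)")


def find_std_flag(cxxflags: Optional[str]) -> Optional[str]:
    """Return the value of the first -std= flag, or None if there is none."""
    match = STD_FLAG.search(cxxflags or "")
    return match.group(1) if match else None


def std_flag_warning(env: Mapping[str, str]) -> Optional[str]:
    """Build a warning message when CXXFLAGS carries a -std= flag."""
    std = find_std_flag(env.get("CXXFLAGS"))
    if std is None:
        return None
    return (f"CXXFLAGS contains -std={std}; "
            "it will not take effect because the build sets the C++ standard itself")


print(std_flag_warning({"CXXFLAGS": "-O2 -std=c++17"}))
print(std_flag_warning({"CXXFLAGS": "-O2"}))
```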

See: [CMake env var usage](https://cmake.org/cmake/help/latest/manual/cmake-env-variables.7.html#id4) and [CXXFLAGS usage](https://cmake.org/cmake/help/latest/envvar/CXXFLAGS.html) for more details.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50491

Reviewed By: mrshenli

Differential Revision: D25907851

Pulled By: walterddr

fbshipit-source-id: 5af5eec76f79f9d35456af1f2663cafbc54e7dc8
2021-01-15 07:12:56 -08:00
Nathan John Sircombe
664126bab5 Enables build with oneDNN (MKL-DNN) on AArch64 (#50400)
Summary:
Since version 1.6, oneDNN has provided limited support for AArch64 builds.

This minor change is to detect an AArch64 CPU and permit the use of
`USE_MKLDNN` in that case.

Build flags for oneDNN are also modified accordingly.

Note: oneDNN on AArch64, by default, will use oneDNN's reference C++ kernels.
These are not optimised for AArch64, but oneDNN v1.7 onwards provides support
for a limited set of primitives based on the Arm Compute Library.
See: https://github.com/oneapi-src/oneDNN/pull/795
and: https://github.com/oneapi-src/oneDNN/pull/820
for more details. Support for ACL-based oneDNN primitives in PyTorch
will require some further modification.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50400

Reviewed By: izdeby

Differential Revision: D25886589

Pulled By: malfet

fbshipit-source-id: 2c81277a28ad4528c2d2211381e7c6692d952bc1
2021-01-13 08:41:44 -08:00
Jane Xu
c2d37cd990 Change CMake config to enable universal binary for Mac (#50243)
Summary:
This PR is a step towards enabling cross compilation from x86_64 to arm64.

The following has been added:
1. When cross compilation is detected, compile a local universal fatfile to use as protoc.
2. For the simple compile check in MiscCheck.cmake, make sure to compile the small snippet as a universal binary in order to run the check.

**Test plan:**

Kick off a minimal build on an Intel Mac with the macOS 11 SDK using this command:
```
CMAKE_OSX_ARCHITECTURES=arm64 USE_MKLDNN=OFF USE_QNNPACK=OFF USE_PYTORCH_QNNPACK=OFF BUILD_TEST=OFF USE_NNPACK=OFF python setup.py install
```
(If you run the above command before this change, or without macOS 11 SDK set up, it will fail.)

Then check the platform of the built binaries using this command:
```
lipo -info build/lib/libfmt.a
```
Output:
- Before this PR, running a regular build via `python setup.py install` (instead of using the flags listed above):
  ```
  Non-fat file: build/lib/libfmt.a is architecture: x86_64
  ```
- Using this PR:
  ```
  Non-fat file: build/lib/libfmt.a is architecture: arm64
  ```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50243

Reviewed By: malfet

Differential Revision: D25849955

Pulled By: janeyx99

fbshipit-source-id: e9853709a7279916f66aa4c4e054dfecced3adb1
2021-01-08 17:26:08 -08:00
Ashkan Aliabadi
1c12cbea90 Optimize Vulkan command buffer submission rate. (#49112)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/49112

Differential Revision: D25729889

Test Plan: Imported from OSS

Reviewed By: SS-JIA

Pulled By: AshkanAliabadi

fbshipit-source-id: c4ab470fdcf3f83745971986f3a44a3dff69287f
2021-01-08 16:39:22 -08:00
Antonio Cuni
8f31621f78 Fix MKL builds on Ubuntu (#50212)
Summary:
This fixes https://github.com/pytorch/pytorch/issues/50211

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50212

Reviewed By: janeyx99

Differential Revision: D25850876

Pulled By: walterddr

fbshipit-source-id: be138db3ae370c45f5fbf3af486cf8b32518df87
2021-01-08 13:16:30 -08:00
Jithun Nair
45ec35827e Set USE_RCCL cmake option (dependent on USE_NCCL) [REDUX] (#34683)
Summary:
Refiled duplicate of https://github.com/pytorch/pytorch/issues/31341 which was reverted in commit 63964175b5.

This PR enables RCCL support when building Gloo as part of PyTorch for ROCm.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/34683

Reviewed By: glaringlee

Differential Revision: D25540578

Pulled By: ezyang

fbshipit-source-id: fcb02e5745d62e1b7d2e02048160e9e7a4b4df2d
2021-01-06 07:03:02 -08:00
Rong Rong (AI Infra)
12ee7b61e7 support building with conda installed libraries (#50080)
Summary:
This should fix a number of shared library compilation errors when libraries are installed in conda's lib and lib64 folders.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50080

Reviewed By: seemethere

Differential Revision: D25781923

Pulled By: walterddr

fbshipit-source-id: 78a74925981d65243b98bb99a65f1f2766e87a2f
2021-01-05 12:32:51 -08:00
Ilia Cherniavskii
72b00a8a52 Revert D25480770: Set USE_KINETO=1
Test Plan: revert-hammer

Differential Revision:
D25480770 (1a92802bde)

Original commit changeset: 037cd774f554

fbshipit-source-id: 6a6062195033ca91fcc0cfa1e890e47efc774ac1
2020-12-18 07:06:28 -08:00
Ilia Cherniavskii
1a92802bde Set USE_KINETO=1 (#49201)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49201

This unblocks kineto profiler for 1.8 release.
This PR supersedes https://github.com/pytorch/pytorch/pull/48391
Note: this will somewhat increase the size of linux server binaries, because
we add libkineto.a and libcupti_static.a:
-rw-r--r-- 1 jenkins jenkins 1107502 Dec 10 21:16 build/lib/libkineto.a
-rw-r--r-- 1 root root 13699658 Nov 13  2019 /usr/local/cuda/lib64/libcupti_static.a

Test Plan:
CI
https://github.com/pytorch/pytorch/pull/48391

Imported from OSS

Reviewed By: ngimel

Differential Revision: D25480770

fbshipit-source-id: 037cd774f5547d9918d6055ef5cc952a54e48e4c
2020-12-18 01:48:10 -08:00
Nikita Shulga
84fce6d29a [AARCH64] Fix HAS_VST1 check if compiled by clang (#49182)
Summary:
Use the `UL` suffix, supported by all C99-compatible compilers, instead of `__AARCH64_UINT64_C`, which is a gcc-specific extension.

Before the change, this check would have failed with a bug-free clang compiler with the following errors:
```
$ clang has_vst1.c
has_vst1.c:5:41: warning: implicit declaration of function '__AARCH64_UINT64_C' is invalid in C99 [-Wimplicit-function-declaration]
  v.val[0] = vcombine_f32 (vcreate_f32 (__AARCH64_UINT64_C (0)), vcreate_f32 (__AARCH64_UINT64_C (0)));
                                        ^
has_vst1.c:5:79: warning: implicit declaration of function '__AARCH64_UINT64_C' is invalid in C99 [-Wimplicit-function-declaration]
  v.val[0] = vcombine_f32 (vcreate_f32 (__AARCH64_UINT64_C (0)), vcreate_f32 (__AARCH64_UINT64_C (0)));
                                                                              ^
has_vst1.c:6:41: warning: implicit declaration of function '__AARCH64_UINT64_C' is invalid in C99 [-Wimplicit-function-declaration]
  v.val[1] = vcombine_f32 (vcreate_f32 (__AARCH64_UINT64_C (0)), vcreate_f32 (__AARCH64_UINT64_C (0)));
                                        ^
has_vst1.c:6:79: warning: implicit declaration of function '__AARCH64_UINT64_C' is invalid in C99 [-Wimplicit-function-declaration]
  v.val[1] = vcombine_f32 (vcreate_f32 (__AARCH64_UINT64_C (0)), vcreate_f32 (__AARCH64_UINT64_C (0)));
                                                                              ^
4 warnings generated.
/tmp/has_vst1-b1e162.o: In function `main':
has_vst1.c:(.text+0x30): undefined reference to `__AARCH64_UINT64_C'
```

Fixes #{issue number}

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49182

Reviewed By: walterddr

Differential Revision: D25471994

Pulled By: malfet

fbshipit-source-id: 0129a6f7aabc46aa117ef719d3a211449cb410f1
2020-12-10 15:19:12 -08:00
Ashkan Aliabadi
66440d1b29 Tweak Vulkan memory use. (#47728)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/47728

Test Plan: Imported from OSS

Reviewed By: SS-JIA

Differential Revision: D25032740

Pulled By: AshkanAliabadi

fbshipit-source-id: 7eb72538dc1aa3feb4e2f8c4ff9c675eb8e97057
2020-11-30 14:28:09 -08:00
Joe Zhu
42e7cdc50a Improve libuv detection on Windows (#48571)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/48304

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48571

Reviewed By: ejguan

Differential Revision: D25220903

Pulled By: mrshenli

fbshipit-source-id: a485568621c4e289c5439474c2651186bc63c2f0
2020-11-30 11:16:13 -08:00
Gemfield
3c9e71c9ad fix BUILD_MOBILE_BENCHMARK typo (#48515)
Summary:
BUILD_MOBILE_BENCHMARKS in CMakeLists.txt should be BUILD_MOBILE_BENCHMARK.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48515

Reviewed By: albanD

Differential Revision: D25198724

Pulled By: mrshenli

fbshipit-source-id: 12765d10c272da04cb104202fcbabc6a0b007c5e
2020-11-30 08:38:43 -08:00
Ilia Cherniavskii
f2da18af14 Add USE_KINETO build option (#45888)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45888

Adding USE_KINETO build option

Test Plan:
USE_KINETO=1 USE_CUDA=1 USE_MKLDNN=1 BLAS=MKL BUILD_BINARY=1 python setup.py develop install --cmake

Reviewed By: Chillee

Differential Revision: D25142221

Pulled By: ilia-cher

fbshipit-source-id: d1634a8f9599604ff511fac59b9072854289510c
2020-11-21 20:20:32 -08:00
Bert Maher
8a996dd139 [te] Make BUILD_TENSOREXPR_BENCHMARK a real CMake option (#48158)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/48158

Test Plan: Imported from OSS

Reviewed By: Chillee

Differential Revision: D25059877

Pulled By: bertmaher

fbshipit-source-id: a98b6c18a91b4fe89d12bf5f7ead604e3cc0c8b0
2020-11-18 12:19:14 -08:00
Rong Rong
147a48fb27 [cmake] clean up cmake/Utils.cmake (#47923)
Summary:
Consolidate into cmake/public/utils.cmake

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47923

Reviewed By: samestep

Differential Revision: D24955961

Pulled By: walterddr

fbshipit-source-id: 9d5f6af2b353a8c6f6d521c841fd0989393755cd
2020-11-16 08:12:32 -08:00
Jiakai Liu
8e3af9faa8 [pytorch] fix debug symbol flag for android clang (#46331)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46331

Fix the android build size issue #46246.

Test Plan: Imported from OSS

Reviewed By: dhruvbird

Differential Revision: D24390061

Pulled By: ljk53

fbshipit-source-id: b4a6f297e89b9c08dff4297c6a41aabd41d9fff5
2020-11-10 14:55:43 -08:00
Ashkan Aliabadi
6cd8b5e9a7 Provide CMake option to enable Vulkan API. (#46503)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/46503

Test Plan: Imported from OSS

Reviewed By: IvanKobzarev

Differential Revision: D24379144

Pulled By: AshkanAliabadi

fbshipit-source-id: 8d8c57f96bbac2a44615828a3474c912704f3a85
2020-10-20 18:45:52 -07:00
Pritam Damania
cb3c1d17e4 Promote -Wcast-function-type to an error in builds. (#46356)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46356

Adding the flag `-Werror=cast-function-type` to ensure we don't allow
any invalid casts (ex: PyCFunction casts).

For more details see: https://github.com/pytorch/pytorch/issues/45419
ghstack-source-id: 114632980

Test Plan: waitforbuildbot

Reviewed By: albanD

Differential Revision: D24319759

fbshipit-source-id: 26ce4650c220e8e9dd3550245f214c7e6c21a5dc
2020-10-20 18:09:06 -07:00
Tao Xu
495070b388 [Metal] Add the Python binding for optimize_for_mobile (#46456)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46456

Add the python binding in CMake. The general workflow is

- Build pytorch -  `USE_PYTORCH_METAL=ON python setup.py install --cmake`
- Run optimize_for_mobile

```
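import torch
from torch.utils.mobile_optimizer import optimize_for_mobile

# Load the TorchScript model and rewrite it for the Metal backend.
scripted_model = torch.jit.load('./mobilenetv2.pt')
optimized_model = optimize_for_mobile(scripted_model, backend='metal')
# List the operators the optimized model needs.
print(torch.jit.export_opnames(optimized_model))
# Save the optimized model for deployment.
torch.jit.save(optimized_model, './mobilenetv2_metal.bc')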
```
The exported ops are

```
['aten::adaptive_avg_pool2d', 'aten::add.Tensor', 'aten::addmm', 'aten::reshape', 'aten::size.int', 'metal::copy_to_host', 'metal_prepack::conv2d_run']
```
ghstack-source-id: 114559878

Test Plan:
- Sandcastle CI
- Circle CI

Reviewed By: kimishpatel

Differential Revision: D24356768

fbshipit-source-id: fb5c4c4b6316347b67edb4132da044a81470ddfd
2020-10-17 10:26:25 -07:00
Tao Xu
04e5fcc0ed [GPU] Introduce USE_PYTORCH_METAL (#46383)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46383

The old `USE_METAL` is actually being used by Caffe2. Here we introduce a new macro to enable metal in pytorch.
ghstack-source-id: 114499392

Test Plan:
- Circle CI
- The Person Segmentation model works

Reviewed By: linbinyu

Differential Revision: D24322018

fbshipit-source-id: 4e5548afba426b49f314366d89b18ba0c7e745ca
2020-10-16 18:19:32 -07:00
Michael Ranieri
b1d24dded1 make a way to disable callgrind (#46116)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46116

Ideally I would just use one of the existing preprocessor flags such as `FBCODE_CAFFE2`, but this implies a whole bunch of other things elsewhere, so it is not really a solution for ovrsource.

Test Plan: CI green, we are able to disable it internally with `-DNVALGRIND`

Reviewed By: malfet

Differential Revision: D24227360

fbshipit-source-id: 24a3b393cf46d6a16acca0a9ec52610d4bb8704f
2020-10-13 16:18:04 -07:00
Tao Xu
a277c097ac [iOS][GPU] Add Metal/MPSCNN support on iOS (#46112)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46112

### Summary

This PR adds support for running torchscript models on the iOS GPU via Metal (inference only). The feature is currently in a prototype state, so API changes are expected. The tutorial and the documents will be added once it goes to beta.

allow-large-files

- Users API

```
  auto module = torch::jit::load(model);
  module.eval();
  at::Tensor input = at::ones({1,3,224,224}, at::ScalarType::Float).metal();
  auto output = module.forward({input}).toTensor().cpu();
```
- Supported Models
    - Person Segmentation v106 (FB Internal)
    - Mobilenetv2

- Supported Operators
    - aten::conv2d
    - aten::addmm
    - aten::add.Tensor
    - aten::sub.Tensor
    - aten::mul.Tensor
    - aten::relu
    - aten::hardtanh
    - aten::hardtanh_
    - aten::sigmoid
    - aten::max_pool2d
    - aten::adaptive_avg_pool2d
    - aten::reshape
    - aten::t
    - aten::view
    - aten::log_softmax.int
    - aten::upsample_nearest2d.vec

- Supported Devices
    - Apple A9 and above
    - iOS 10.2 and above

- CMake scripts
    - `IOS_ARCH=arm64 ./scripts/build_ios.sh -DUSE_METAL=ON`

### Test Plan

- Circle CI

ghstack-source-id: 114155638

Test Plan:
1. Sandcastle CI
2. Circle CI

Reviewed By: dreiss

Differential Revision: D23236555

fbshipit-source-id: 98ffc48b837e308bc678c37a9a5fd8ae72d11625
2020-10-13 01:46:56 -07:00
gunandrose4u
ffd50b8220 SET USE_DISTRIBUTED OFF when libuv is not installed (#45554)
Summary:
Fixes #{issue number}

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45554

Reviewed By: izdeby

Differential Revision: D24016825

Pulled By: mrshenli

fbshipit-source-id: 332d860429626a915c06f98cad31e6db1cbc4eb1
2020-09-30 12:46:36 -07:00
gunandrose4u
0a38aed025 Auto set libuv_ROOT env var for Gloo submodule on Windows platform (#45484)
Summary:
Fixes #{issue number}

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45484

Reviewed By: lw

Differential Revision: D23990724

Pulled By: mrshenli

fbshipit-source-id: 1987ce7eb7d3f9d3120c07e954cd6581cd3caf59
2020-09-29 08:58:56 -07:00
gunandrose4u
f07ac6a004 Fix Windows build failure after DDP PR merged (#45335)
Summary:
Fixes #{issue number}
This is a resubmit of PR https://github.com/pytorch/pytorch/issues/42897 , together with a fix for the Windows build issue introduced by PR https://github.com/pytorch/pytorch/issues/44344 .

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45335

Reviewed By: zou3519

Differential Revision: D23931471

Pulled By: mrshenli

fbshipit-source-id: f49b5a114944c1450b32934b3292170be064f494
2020-09-25 12:37:50 -07:00
Mike Ruberry
103fa3894a Revert D23841786: [pytorch][PR] Enable distributed package on windows, Gloo backend supported only
Test Plan: revert-hammer

Differential Revision:
D23841786 (0122299f9b)

Original commit changeset: 334ba1ed73ef

fbshipit-source-id: ec95432f9957df56a5a04e52661f5db920b7f57f
2020-09-24 22:44:33 -07:00
gunandrose4u
0122299f9b Enable distributed package on windows, Gloo backend supported only (#42897)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/42095

The test cases will be committed to this PR later

mrshenli, please help to review

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42897

Reviewed By: osalpekar

Differential Revision: D23841786

Pulled By: mrshenli

fbshipit-source-id: 334ba1ed73eff2f668857390fc32d1bc7f08e5f3
2020-09-24 21:13:55 -07:00
Ivan Kobzarev
6debe825be [vulkan] glsl shaders relaxed precision mode to cmake option (#43076)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/43076

Test Plan: Imported from OSS

Reviewed By: AshkanAliabadi

Differential Revision: D23143354

Pulled By: IvanKobzarev

fbshipit-source-id: 7b3ead1e63cf8acf6e8e547080a8ead7a2db994b
2020-09-16 12:51:34 -07:00
peter
ed862d3682 Split CUDA_NVCC_FLAGS by space (#44603)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/44599

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44603

Reviewed By: albanD

Differential Revision: D23692320

Pulled By: ezyang

fbshipit-source-id: 6a63d94ab8b88e7a82f9d65f03523d6ef639c754
2020-09-14 20:25:37 -07:00
Marcin Juszkiewicz
e261e0953e Fix centos8 gcc (#44644)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/44198 properly this time

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44644

Reviewed By: albanD

Differential Revision: D23684909

Pulled By: malfet

fbshipit-source-id: cea6f6e2ae28138f6b93a6513d1abd36d14ae573
2020-09-14 12:28:09 -07:00
Marcin Juszkiewicz
566b8d0650 handle missing NEON vst1_*_x2 intrinsics (#44198) (#44199)
Summary:
CentOS 8 on AArch64 has the vld1_* intrinsics but lacks the vst1q_f32_x2 one.

This patch checks for it and handles it separately from the vld1_* ones.

Fixes https://github.com/pytorch/pytorch/issues/44198

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44199

Reviewed By: seemethere

Differential Revision: D23641273

Pulled By: malfet

fbshipit-source-id: c2053c8e0427705eaeeeb82ec030925bff22623a
2020-09-11 16:02:44 -07:00
Yujun
db24c5c582 Change code coverage option name (#43999)
Summary:
According to [documentation](https://github.com/pytorch/pytorch/blob/master/tools/setup_helpers/cmake.py#L265), only options in `CMakeLists.txt` that start with `BUILD_` / `USE_` / `CMAKE_` can be imported from environment variables.
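
The prefix rule stated above can be illustrated with a small filter. This is only a sketch of the rule, not the code in `tools/setup_helpers/cmake.py`; the function name `forwardable_options` and the sample environment (including the option name `USE_CPP_CODE_COVERAGE`) are hypothetical and chosen for illustration.

```python
# Prefixes named in the commit message above.
FORWARDED_PREFIXES = ("BUILD_", "USE_", "CMAKE_")


def forwardable_options(env):
    """Return the entries of env whose names start with a forwarded prefix."""
    return {
        name: value
        for name, value in env.items()
        if name.startswith(FORWARDED_PREFIXES)
    }


# Hypothetical environment, used only to exercise the filter.
sample_env = {
    "USE_CPP_CODE_COVERAGE": "ON",  # assumed name that follows the rule
    "CODE_COVERAGE": "ON",          # no recognised prefix, so it is dropped
    "BUILD_TEST": "1",
    "PATH": "/usr/bin",
}

print(forwardable_options(sample_env))
```

Under this model an option named `CODE_COVERAGE` would never reach CMake from the environment, which is the motivation for renaming it.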

 ---
This diff was originally intended to enable `c++` source coverage with `CircleCI` and `codecov.io`, but we will finish that in the future. You can find the related information in the diff history. The original procedure was as follows:

Based on [this pull request](1bda5e480c), life becomes much easier this time.
1. in `build.sh`
- Enable coverage build option for c++
- `apt-get install lcov`

2. in `test.sh`
- run `lcov`

3. in `pytorch-job-specs.yml`
- copy coverage.info to `test/` folder and upload it to codecov.io

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43999

Test Plan: Test on github

Reviewed By: malfet

Differential Revision: D23464656

Pulled By: scintiller

fbshipit-source-id: b2365691f04681d25ba5c00293fbcafe8e8e0745
2020-09-11 15:55:05 -07:00
Bram Wasti
6512032699 [Static Runtime] Add OSS build for static runtime benchmarks (#43881)
Summary:
Adds CMake option.  Build with:

```
BUILD_STATIC_RUNTIME_BENCHMARK=ON python setup.py install
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43881

Reviewed By: hlu1

Differential Revision: D23430708

Pulled By: bwasti

fbshipit-source-id: a39bf54e8d4d044a4a3e4273a5b9a887daa033ec
2020-09-02 08:00:18 -07:00
Sebastian Pop
c259146477 add missing NEON {vld1,vst1}_*_x2 intrinsics (#43683)
Summary:
Workaround for issue https://github.com/pytorch/pytorch/issues/43265.
Add the missing intrinsics until gcc-7 gets the missing patches backported.

Fixes https://github.com/pytorch/pytorch/issues/43265.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43683

Reviewed By: albanD

Differential Revision: D23467867

Pulled By: malfet

fbshipit-source-id: 7c138dd3de3c45852a60f2cfe8b4d7f7cf76bc7e
2020-09-01 21:19:39 -07:00
Rong Rong
8ca3913f47 Introduce BUILD_CAFFE2 flag (#43673)
Summary:
introduce BUILD_CAFFE2 flag. default to `ON`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43673

Reviewed By: malfet

Differential Revision: D23381035

Pulled By: walterddr

fbshipit-source-id: 1f4582987fa0c4a911f0b18d311c04fdbf8dd8f0
2020-09-01 10:18:23 -07:00
Jiakai Liu
3a0e35c9f2 [pytorch] deprecate static dispatch (#43564)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43564

Static dispatch was originally introduced for mobile selective build.

Since we have added selective build support for dynamic dispatch and
tested it in FB production for months, we can deprecate static dispatch
to reduce the complexity of the codebase.

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D23324452

Pulled By: ljk53

fbshipit-source-id: d2970257616a8c6337f90249076fca1ae93090c7
2020-08-27 14:52:48 -07:00
Ann Shan
0dc41ff465 [pytorch] add flag for autograd ops to mobile builds (#43154)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43154

Adds the build flag `BUILD_MOBILE_AUTOGRAD` which toggles whether autograd files should be included for a PyTorch mobile build (default off).
ghstack-source-id: 110369406

Test Plan: CI

Reviewed By: ljk53

Differential Revision: D23061913

fbshipit-source-id: bc3d6683ab17f158990d83e4fae0a011d5adeca1
2020-08-20 12:39:55 -07:00
Xiang Gao
ee74c2e5be Compress fatbin to fit into 32bit indexing (#43074)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/39968

Tested with `TORCH_CUDA_ARCH_LIST='3.5 5.2 6.0 6.1 7.0 7.5 8.0+PTX'`: before this PR the build failed, and with this PR it succeeds.

With `TORCH_CUDA_ARCH_LIST='7.0 7.5 8.0+PTX'`, `libtorch_cuda.so` with symbols changes from 2.9GB -> 2.2GB

cc: ptrblck mcarilli jjsjann123

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43074

Reviewed By: mrshenli

Differential Revision: D23176095

Pulled By: malfet

fbshipit-source-id: 7b3e6d049fc080e519f21e80df05ef68e7bea57e
2020-08-18 09:48:54 -07:00
Nikita Shulga
0cf4a5bccb Add GCC codecoverage flags (#43066)
Summary:
Rename `CLANG_CODE_COVERAGE` option to `CODE_COVERAGE` and add compiler specific flags for GCC and Clang

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43066

Reviewed By: scintiller

Differential Revision: D23137488

Pulled By: malfet

fbshipit-source-id: a89570469692f878d84f7da6f9d5dc01df423e80
2020-08-14 17:16:18 -07:00
Nikita Shulga
ea65a56854 Use `string(APPEND FOO " bar")` instead of `set(FOO "${FOO} bar")` (#42844)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/42844

Reviewed By: scintiller

Differential Revision: D23067577

Pulled By: malfet

fbshipit-source-id: e4380ce02fd6aca37c955a7bc24435222c5d8b19
2020-08-12 10:33:11 -07:00
Yujun Zhao
7524699d58 Modify clang code coverage to CMakeList.txt (for MacOS) (#42837)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42837

Originally we used
```
list(APPEND CMAKE_C_FLAGS  -fprofile-instr-generate -fcoverage-mapping)
list(APPEND CMAKE_CXX_FLAGS  -fprofile-instr-generate -fcoverage-mapping)
```
But compiling the project on a Mac with coverage turned on fails with the error:
`clang: error: no input files
/bin/sh: -fprofile-instr-generate: command not found
/bin/sh: -fcoverage-mapping: command not found`

The reason is that `list(APPEND CMAKE_CXX_FLAGS ...)` separates the appended items with `;`. For example, `list(APPEND foo a)` followed by `list(APPEND foo b)` leaves `foo` as `a;b`, with the additional `;`. Since `CMAKE_CXX_FLAGS` is already defined earlier in `CMakeLists.txt`, we can only use `set(...)` here.
After changing it to
```
set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -fprofile-instr-generate -fcoverage-mapping")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -fprofile-instr-generate -fcoverage-mapping")
```
Tested successfully on a local Mac machine.
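
The semicolon problem described above can be mimicked outside CMake. The following Python sketch is only a model, not CMake itself: it assumes that a CMake list is a string whose elements are joined by `;`, and that the flags end up unquoted on a `/bin/sh` command line, where `;` ends a command. The helper names (`list_append`, `string_append`, `shell_commands`) and the pre-existing `-O2` flag are hypothetical.

```python
def list_append(value, *items):
    """Model of CMake list(APPEND): elements are joined with ';'."""
    parts = [value] if value else []
    return ";".join(parts + list(items))


def string_append(value, suffix):
    """Model of set(VAR "${VAR} ...") or string(APPEND): plain concatenation."""
    return value + suffix


def shell_commands(compiler, flags):
    """Split 'compiler flags' into commands the way a shell does at each ';'."""
    line = f"{compiler} {flags}"
    return [cmd.strip() for cmd in line.split(";") if cmd.strip()]


existing = "-O2"  # stand-in for flags already set earlier in the build files
coverage = ("-fprofile-instr-generate", "-fcoverage-mapping")

broken = list_append(existing, *coverage)
fixed = string_append(existing, " " + " ".join(coverage))

# With the list form, each coverage flag becomes its own shell command.
print(shell_commands("clang", broken))
# With the string form, everything stays on one compiler invocation.
print(shell_commands("clang", fixed))
```

In the list form the compiler is invoked without the coverage flags and each flag is then run as a command of its own, which is consistent with the `command not found` messages quoted above.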

Test Plan: Test locally on mac machine

Reviewed By: malfet

Differential Revision: D23043057

fbshipit-source-id: ff6f4891b35b7f005861ee2f8e4c550c997fe961
2020-08-11 09:57:55 -07:00
Khalid Almufti
b282297559 Replace whitelist with allowlist (#42067)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/41757

I've replaced all the whitelist with allowlist for this issue.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42067

Reviewed By: pbelevich

Differential Revision: D22791690

Pulled By: malfet

fbshipit-source-id: 638c13cf49915f5c83bd79c7f4a39b8390cc15b4
2020-07-28 08:01:16 -07:00
Edward Yang
befb22790f Fix a number of deprecation warnings (#40179)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40179

- Pass no-psabi to shut up GCC about "The ABI for passing
  parameters with 64-byte alignment has changed in GCC 4.6"
- Fix use of deprecated data() accessor (and minor optimization: hoist
  accessor out of loop)
- Undeprecate NetDef.num_workers, no one is serious about fixing these
- Suppress warnings about deprecated pthreadpool types

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D22234138

Pulled By: ezyang

fbshipit-source-id: 6a1601b6d7551a7e6487a44ae65b19acdcb7b849
2020-07-14 09:11:34 -07:00
Kimish Patel
d6feb6141f [Vec256][neon] Add neon backend for vec256 (#39341)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39341

This PR introduces a neon backend for the vec256 class for the float datatype.
For now only aarch64 is enabled, due to a few issues with enabling it on
32-bit ARM (aarch32).

Test Plan:
vec256_test

Imported from OSS

Differential Revision: D21822399

fbshipit-source-id: 3851c4336d93d1c359c85b38cf19904f82bc7b8d
2020-07-09 16:25:09 -07:00
Kimish Patel
bddba1e336 Add benchmark for add op. (#40059)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40059

This benchmark is added specifically for mobile, to see whether the compiler
is already autovectorizing the add op, in which case the neon backend for
vec256 gives us no advantage for it.

Test Plan:
CI

Imported from OSS

Differential Revision: D22055146

fbshipit-source-id: 43ba6c4ae57c6f05d84887c2750ce21ae1b0f0b5
2020-07-09 16:22:55 -07:00
Yujun Zhao
22f940b7bd add clang code coverage compile flags (#41103)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41103

Add a CLANG_CODE_COVERAGE option to CMakeLists.txt. If the option is ON, add the compile flags needed for code coverage.

Test Plan:
Cloned the pytorch source code locally, applied these changes and built it with `CLANG_CODE_COVERAGE ON` and `BUILD_TESTS ON`. Ran a manual test and attached the code coverage report.

{F243609020}

Reviewed By: malfet

Differential Revision: D22422513

fbshipit-source-id: 27a31395c31b5b5f4b72523954722771d8f61080
2020-07-09 14:14:18 -07:00
David Reiss
b7e044f0e5 Re-apply PyTorch pthreadpool changes
Summary:
This re-applies D21232894 (b9d3869df3) and D22162524, plus updates jni_deps in a few places
to avoid breaking host JNI tests.

Test Plan: `buck test @//fbandroid/mode/server //fbandroid/instrumentation_tests/com/facebook/caffe2:host-test`

Reviewed By: xcheng16

Differential Revision: D22199952

fbshipit-source-id: df13eef39c01738637ae8cf7f581d6ccc88d37d5
2020-06-23 19:26:21 -07:00