Commit Graph

369 Commits

Author SHA1 Message Date
Rong Rong
147a48fb27 [cmake] clean up cmake/Utils.cmake (#47923)
Summary:
Consolidate into cmake/public/utils.cmake

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47923

Reviewed By: samestep

Differential Revision: D24955961

Pulled By: walterddr

fbshipit-source-id: 9d5f6af2b353a8c6f6d521c841fd0989393755cd
2020-11-16 08:12:32 -08:00
Jiakai Liu
8e3af9faa8 [pytorch] fix debug symbol flag for android clang (#46331)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46331

Fix the Android build size issue #46246.

Test Plan: Imported from OSS

Reviewed By: dhruvbird

Differential Revision: D24390061

Pulled By: ljk53

fbshipit-source-id: b4a6f297e89b9c08dff4297c6a41aabd41d9fff5
2020-11-10 14:55:43 -08:00
Ashkan Aliabadi
6cd8b5e9a7 Provide CMake option to enable Vulkan API. (#46503)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/46503

Test Plan: Imported from OSS

Reviewed By: IvanKobzarev

Differential Revision: D24379144

Pulled By: AshkanAliabadi

fbshipit-source-id: 8d8c57f96bbac2a44615828a3474c912704f3a85
2020-10-20 18:45:52 -07:00
Pritam Damania
cb3c1d17e4 Promote -Wcast-function-type to an error in builds. (#46356)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46356

Adding the flag `-Werror=cast-function-type` to ensure we don't allow
any invalid casts (ex: PyCFunction casts).
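
A minimal sketch of how such a warning can be promoted to an error in CMake; the flag name comes from the commit, while the compiler guard and placement here are illustrative assumptions, not the PR's exact code:

```cmake
# Illustrative: promote cast-function-type to a hard build error
# for GCC/Clang-style compilers (guard is an assumption).
if(CMAKE_CXX_COMPILER_ID MATCHES "GNU|Clang")
  string(APPEND CMAKE_CXX_FLAGS " -Werror=cast-function-type")
endif()
```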

For more details see: https://github.com/pytorch/pytorch/issues/45419
ghstack-source-id: 114632980

Test Plan: waitforbuildbot

Reviewed By: albanD

Differential Revision: D24319759

fbshipit-source-id: 26ce4650c220e8e9dd3550245f214c7e6c21a5dc
2020-10-20 18:09:06 -07:00
Tao Xu
495070b388 [Metal] Add the Python binding for optimize_for_mobile (#46456)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46456

Add the python binding in CMake. The general workflow is

- Build pytorch -  `USE_PYTORCH_METAL=ON python setup.py install --cmake`
- Run optimize_for_mobile

```
import torch
from torch.utils.mobile_optimizer import optimize_for_mobile

scripted_model = torch.jit.load('./mobilenetv2.pt')
optimized_model = optimize_for_mobile(scripted_model, backend='metal')
torch.jit.export_opnames(optimized_model)
torch.jit.save(optimized_model, './mobilenetv2_metal.bc')
```
The exported ops are

```
['aten::adaptive_avg_pool2d', 'aten::add.Tensor', 'aten::addmm', 'aten::reshape', 'aten::size.int', 'metal::copy_to_host', 'metal_prepack::conv2d_run']
```
ghstack-source-id: 114559878

Test Plan:
- Sandcastle CI
- Circle CI

Reviewed By: kimishpatel

Differential Revision: D24356768

fbshipit-source-id: fb5c4c4b6316347b67edb4132da044a81470ddfd
2020-10-17 10:26:25 -07:00
Tao Xu
04e5fcc0ed [GPU] Introduce USE_PYTORCH_METAL (#46383)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46383

The old `USE_METAL` is actually being used by Caffe2. Here we introduce a new macro to enable Metal in PyTorch.
ghstack-source-id: 114499392

Test Plan:
- Circle CI
- The Person Segmentation model works

Reviewed By: linbinyu

Differential Revision: D24322018

fbshipit-source-id: 4e5548afba426b49f314366d89b18ba0c7e745ca
2020-10-16 18:19:32 -07:00
Michael Ranieri
b1d24dded1 make a way to disable callgrind (#46116)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46116

Ideally I would just use one of the existing preprocessor flags such as `FBCODE_CAFFE2`, but this implies a whole bunch of other things elsewhere, so it is not really a solution for ovrsource.

Test Plan: CI green, we are able to disable it internally with `-DNVALGRIND`
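
A sketch of how such an opt-out could be wired up in CMake; the `NVALGRIND` define comes from the test plan above, while the option name and wiring are illustrative assumptions:

```cmake
# Illustrative: compile out valgrind/callgrind hooks when requested.
option(USE_VALGRIND "Enable valgrind/callgrind instrumentation" ON)
if(NOT USE_VALGRIND)
  # Source guards its hooks with checks on NVALGRIND.
  add_compile_definitions(NVALGRIND)
endif()
```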

Reviewed By: malfet

Differential Revision: D24227360

fbshipit-source-id: 24a3b393cf46d6a16acca0a9ec52610d4bb8704f
2020-10-13 16:18:04 -07:00
Tao Xu
a277c097ac [iOS][GPU] Add Metal/MPSCNN support on iOS (#46112)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46112

### Summary

This PR adds support for running TorchScript models on the iOS GPU via Metal (inference only). The feature is currently in a prototype state; API changes are expected. The tutorial and documents will be added once it goes to beta.

allow-large-files

- Users API

```
  auto module = torch::jit::load(model);
  module.eval();
  at::Tensor input = at::ones({1,3,224,224}, at::ScalarType::Float).metal();
  auto output = module.forward({input}).toTensor().cpu();
```
- Supported Models
    - Person Segmentation v106 (FB Internal)
    - Mobilenetv2

- Supported Operators
    - aten::conv2d
    - aten::addmm
    - aten::add.Tensor
    - aten::sub.Tensor
    - aten::mul.Tensor
    - aten::relu
    - aten::hardtanh
    - aten::hardtanh_
    - aten::sigmoid
    - aten::max_pool2d
    - aten::adaptive_avg_pool2d
    - aten::reshape
    - aten::t
    - aten::view
    - aten::log_softmax.int
    - aten::upsample_nearest2d.vec

- Supported Devices
    - Apple A9 and above
    - iOS 10.2 and above

- CMake scripts
    - `IOS_ARCH=arm64 ./scripts/build_ios.sh -DUSE_METAL=ON`

### Test Plan

- Circle CI

ghstack-source-id: 114155638

Test Plan:
1. Sandcastle CI
2. Circle CI

Reviewed By: dreiss

Differential Revision: D23236555

fbshipit-source-id: 98ffc48b837e308bc678c37a9a5fd8ae72d11625
2020-10-13 01:46:56 -07:00
gunandrose4u
ffd50b8220 SET USE_DISTRIBUTED OFF when libuv is not installed (#45554)
Summary:
Fixes #{issue number}

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45554

Reviewed By: izdeby

Differential Revision: D24016825

Pulled By: mrshenli

fbshipit-source-id: 332d860429626a915c06f98cad31e6db1cbc4eb1
2020-09-30 12:46:36 -07:00
gunandrose4u
0a38aed025 Auto set libuv_ROOT env var for Gloo submodule on Windows platform (#45484)
Summary:
Fixes #{issue number}

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45484

Reviewed By: lw

Differential Revision: D23990724

Pulled By: mrshenli

fbshipit-source-id: 1987ce7eb7d3f9d3120c07e954cd6581cd3caf59
2020-09-29 08:58:56 -07:00
gunandrose4u
f07ac6a004 Fix Windows build failure after DDP PR merged (#45335)
Summary:
Fixes #{issue number}
Fixes #{issue number}
This is a resubmit of PR https://github.com/pytorch/pytorch/issues/42897 , together with a fix for the Windows build issue introduced by PR https://github.com/pytorch/pytorch/issues/44344 .

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45335

Reviewed By: zou3519

Differential Revision: D23931471

Pulled By: mrshenli

fbshipit-source-id: f49b5a114944c1450b32934b3292170be064f494
2020-09-25 12:37:50 -07:00
Mike Ruberry
103fa3894a Revert D23841786: [pytorch][PR] Enable distributed package on windows, Gloo backend supported only
Test Plan: revert-hammer

Differential Revision:
D23841786 (0122299f9b)

Original commit changeset: 334ba1ed73ef

fbshipit-source-id: ec95432f9957df56a5a04e52661f5db920b7f57f
2020-09-24 22:44:33 -07:00
gunandrose4u
0122299f9b Enable distributed package on windows, Gloo backend supported only (#42897)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/42095

Test cases will be committed to this PR later

mrshenli, please help to review

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42897

Reviewed By: osalpekar

Differential Revision: D23841786

Pulled By: mrshenli

fbshipit-source-id: 334ba1ed73eff2f668857390fc32d1bc7f08e5f3
2020-09-24 21:13:55 -07:00
Ivan Kobzarev
6debe825be [vulkan] glsl shaders relaxed precision mode to cmake option (#43076)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/43076

Test Plan: Imported from OSS

Reviewed By: AshkanAliabadi

Differential Revision: D23143354

Pulled By: IvanKobzarev

fbshipit-source-id: 7b3ead1e63cf8acf6e8e547080a8ead7a2db994b
2020-09-16 12:51:34 -07:00
peter
ed862d3682 Split CUDA_NVCC_FLAGS by space (#44603)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/44599

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44603

Reviewed By: albanD

Differential Revision: D23692320

Pulled By: ezyang

fbshipit-source-id: 6a63d94ab8b88e7a82f9d65f03523d6ef639c754
2020-09-14 20:25:37 -07:00
Marcin Juszkiewicz
e261e0953e Fix centos8 gcc (#44644)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/44198 properly this time

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44644

Reviewed By: albanD

Differential Revision: D23684909

Pulled By: malfet

fbshipit-source-id: cea6f6e2ae28138f6b93a6513d1abd36d14ae573
2020-09-14 12:28:09 -07:00
Marcin Juszkiewicz
566b8d0650 handle missing NEON vst1_*_x2 intrinsics (#44198) (#44199)
Summary:
CentOS 8 on AArch64 has the vld1_* intrinsics but lacks the vst1q_f32_x2 one.

This patch checks for it and handles it separately from the vld1_* ones.

Fixes https://github.com/pytorch/pytorch/issues/44198

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44199

Reviewed By: seemethere

Differential Revision: D23641273

Pulled By: malfet

fbshipit-source-id: c2053c8e0427705eaeeeb82ec030925bff22623a
2020-09-11 16:02:44 -07:00
Yujun
db24c5c582 Change code coverage option name (#43999)
Summary:
According to the [documentation](https://github.com/pytorch/pytorch/blob/master/tools/setup_helpers/cmake.py#L265), only options starting with `BUILD_` / `USE_` / `CMAKE_` in `CMakeLists.txt` can be imported from environment variables.
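
The prefix rule the documentation describes can be sketched as a small filter; this is an illustrative approximation, not the actual code in `tools/setup_helpers/cmake.py`:

```python
# Only env vars with these prefixes are forwarded to CMake as options
# (per the documentation referenced above; helper name is illustrative).
ALLOWED_PREFIXES = ("BUILD_", "USE_", "CMAKE_")

def cmake_options_from_env(env):
    """Return the subset of env vars that CMake option import honors."""
    return {k: v for k, v in env.items() if k.startswith(ALLOWED_PREFIXES)}

opts = cmake_options_from_env(
    {"USE_VULKAN": "ON", "BUILD_TEST": "OFF", "PATH": "/usr/bin", "CC": "gcc"}
)
print(sorted(opts))  # ['BUILD_TEST', 'USE_VULKAN']
```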

 ---
This diff was originally intended to enable C++ source coverage with CircleCI and codecov.io, but we will finish that in the future. You can find the related information in the diff history. Following is the original procedure:

Based on [this pull request](1bda5e480c), life becomes much easier this time.
1. In `build.sh`
- Enable the coverage build option for C++
- `apt-get install lcov`

2. In `test.sh`
- Run `lcov`

3. In `pytorch-job-specs.yml`
- Copy coverage.info to the `test/` folder and upload it to codecov.io

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43999

Test Plan: Test on github

Reviewed By: malfet

Differential Revision: D23464656

Pulled By: scintiller

fbshipit-source-id: b2365691f04681d25ba5c00293fbcafe8e8e0745
2020-09-11 15:55:05 -07:00
Bram Wasti
6512032699 [Static Runtime] Add OSS build for static runtime benchmarks (#43881)
Summary:
Adds CMake option.  Build with:

```
BUILD_STATIC_RUNTIME_BENCHMARK=ON python setup.py install
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43881

Reviewed By: hlu1

Differential Revision: D23430708

Pulled By: bwasti

fbshipit-source-id: a39bf54e8d4d044a4a3e4273a5b9a887daa033ec
2020-09-02 08:00:18 -07:00
Sebastian Pop
c259146477 add missing NEON {vld1,vst1}_*_x2 intrinsics (#43683)
Summary:
Workaround for issue https://github.com/pytorch/pytorch/issues/43265.
Add the missing intrinsics until gcc-7 gets the missing patches backported.

Fixes https://github.com/pytorch/pytorch/issues/43265.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43683

Reviewed By: albanD

Differential Revision: D23467867

Pulled By: malfet

fbshipit-source-id: 7c138dd3de3c45852a60f2cfe8b4d7f7cf76bc7e
2020-09-01 21:19:39 -07:00
Rong Rong
8ca3913f47 Introduce BUILD_CAFFE2 flag (#43673)
Summary:
Introduce the BUILD_CAFFE2 flag, defaulting to `ON`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43673

Reviewed By: malfet

Differential Revision: D23381035

Pulled By: walterddr

fbshipit-source-id: 1f4582987fa0c4a911f0b18d311c04fdbf8dd8f0
2020-09-01 10:18:23 -07:00
Jiakai Liu
3a0e35c9f2 [pytorch] deprecate static dispatch (#43564)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43564

Static dispatch was originally introduced for mobile selective build.

Since we have added selective build support for dynamic dispatch and
tested it in FB production for months, we can deprecate static dispatch
to reduce the complexity of the codebase.

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D23324452

Pulled By: ljk53

fbshipit-source-id: d2970257616a8c6337f90249076fca1ae93090c7
2020-08-27 14:52:48 -07:00
Ann Shan
0dc41ff465 [pytorch] add flag for autograd ops to mobile builds (#43154)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43154

Adds the build flag `BUILD_MOBILE_AUTOGRAD` which toggles whether autograd files should be included for a PyTorch mobile build (default off).
ghstack-source-id: 110369406

Test Plan: CI

Reviewed By: ljk53

Differential Revision: D23061913

fbshipit-source-id: bc3d6683ab17f158990d83e4fae0a011d5adeca1
2020-08-20 12:39:55 -07:00
Xiang Gao
ee74c2e5be Compress fatbin to fit into 32bit indexing (#43074)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/39968

Tested with `TORCH_CUDA_ARCH_LIST='3.5 5.2 6.0 6.1 7.0 7.5 8.0+PTX'`: before this PR the build was failing, and with this PR it succeeds.

With `TORCH_CUDA_ARCH_LIST='7.0 7.5 8.0+PTX'`, `libtorch_cuda.so` with symbols changes from 2.9GB -> 2.2GB

cc: ptrblck mcarilli jjsjann123

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43074

Reviewed By: mrshenli

Differential Revision: D23176095

Pulled By: malfet

fbshipit-source-id: 7b3e6d049fc080e519f21e80df05ef68e7bea57e
2020-08-18 09:48:54 -07:00
Nikita Shulga
0cf4a5bccb Add GCC codecoverage flags (#43066)
Summary:
Rename `CLANG_CODE_COVERAGE` option to `CODE_COVERAGE` and add compiler specific flags for GCC and Clang

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43066

Reviewed By: scintiller

Differential Revision: D23137488

Pulled By: malfet

fbshipit-source-id: a89570469692f878d84f7da6f9d5dc01df423e80
2020-08-14 17:16:18 -07:00
Nikita Shulga
ea65a56854 Use string(APPEND FOO " bar") instead of `set(FOO "${FOO} bar") (#42844)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/42844

Reviewed By: scintiller

Differential Revision: D23067577

Pulled By: malfet

fbshipit-source-id: e4380ce02fd6aca37c955a7bc24435222c5d8b19
2020-08-12 10:33:11 -07:00
Yujun Zhao
7524699d58 Modify clang code coverage to CMakeList.txt (for MacOS) (#42837)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42837

Originally we use
```
list(APPEND CMAKE_C_FLAGS  -fprofile-instr-generate -fcoverage-mapping)
list(APPEND CMAKE_CXX_FLAGS  -fprofile-instr-generate -fcoverage-mapping)
```
But when compiling the project on Mac with coverage on, it fails with the error:
`clang: error: no input files
/bin/sh: -fprofile-instr-generate: command not found
/bin/sh: -fcoverage-mapping: command not found`

The reason behind it is that `list(APPEND CMAKE_CXX_FLAGS ...)` adds a `;` separator to the variable. This means that if we do `list(APPEND foo a)` and then `list(APPEND foo b)`, `foo` becomes `a;b` -- with the extra `;`. Since `CMAKE_CXX_FLAGS` is already defined earlier in the `CMakeLists.txt`, we can only use `set(...)` here.
After changing it to
```
set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -fprofile-instr-generate -fcoverage-mapping")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -fprofile-instr-generate -fcoverage-mapping")
```
Tested successfully on a local Mac machine.
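
The list-vs-string behavior described above can be reproduced in a standalone CMake script (illustrative; flags chosen for the example, runnable with `cmake -P`):

```cmake
# list(APPEND) joins elements with ";", which is wrong for a flags string:
set(FLAGS "-O2")
list(APPEND FLAGS "-fcoverage-mapping")
message(STATUS "${FLAGS}")   # -O2;-fcoverage-mapping  <- broken compile line

# set() with string interpolation keeps a single space-separated string:
set(FLAGS2 "-O2")
set(FLAGS2 "${FLAGS2} -fcoverage-mapping")
message(STATUS "${FLAGS2}")  # -O2 -fcoverage-mapping
```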

Test Plan: Test locally on mac machine

Reviewed By: malfet

Differential Revision: D23043057

fbshipit-source-id: ff6f4891b35b7f005861ee2f8e4c550c997fe961
2020-08-11 09:57:55 -07:00
Khalid Almufti
b282297559 Replace whitelist with allowlist (#42067)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/41757

I've replaced all the whitelist with allowlist for this issue.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42067

Reviewed By: pbelevich

Differential Revision: D22791690

Pulled By: malfet

fbshipit-source-id: 638c13cf49915f5c83bd79c7f4a39b8390cc15b4
2020-07-28 08:01:16 -07:00
Edward Yang
befb22790f Fix a number of deprecation warnings (#40179)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40179

- Pass -Wno-psabi to suppress GCC's "The ABI for passing
  parameters with 64-byte alignment has changed in GCC 4.6" warning
- Fix use of deprecated data() accessor (and minor optimization: hoist
  accessor out of loop)
- Undeprecate NetDef.num_workers, no one is serious about fixing these
- Suppress warnings about deprecated pthreadpool types

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D22234138

Pulled By: ezyang

fbshipit-source-id: 6a1601b6d7551a7e6487a44ae65b19acdcb7b849
2020-07-14 09:11:34 -07:00
Kimish Patel
d6feb6141f [Vec256][neon] Add neon backend for vec256 (#39341)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39341

This PR introduces a NEON backend for the Vec256 class for the float datatype.
For now only AArch64 is enabled, due to a few issues with enabling it on
32-bit AArch32.

Test Plan:
vec256_test

Imported from OSS

Differential Revision: D21822399

fbshipit-source-id: 3851c4336d93d1c359c85b38cf19904f82bc7b8d
2020-07-09 16:25:09 -07:00
Kimish Patel
bddba1e336 Add benchmark for add op. (#40059)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40059

This benchmark is added specifically for mobile to see whether the compiler is
auto-vectorizing, in which case the NEON backend for Vec256 gives us no
advantage for the add op.

Test Plan:
CI

Imported from OSS

Differential Revision: D22055146

fbshipit-source-id: 43ba6c4ae57c6f05d84887c2750ce21ae1b0f0b5
2020-07-09 16:22:55 -07:00
Yujun Zhao
22f940b7bd add clang code coverage compile flags (#41103)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41103

Add a CLANG_CODE_COVERAGE option to the CMakeLists. If the option is ON, add the compile flags needed for code coverage.

Test Plan:
Cloned the PyTorch source locally, applied these changes, and built with `CLANG_CODE_COVERAGE ON` and `BUILD_TESTS ON`. Ran a manual test and attached the code coverage report.


Reviewed By: malfet

Differential Revision: D22422513

fbshipit-source-id: 27a31395c31b5b5f4b72523954722771d8f61080
2020-07-09 14:14:18 -07:00
David Reiss
b7e044f0e5 Re-apply PyTorch pthreadpool changes
Summary:
This re-applies D21232894 (b9d3869df3) and D22162524, plus updates jni_deps in a few places
to avoid breaking host JNI tests.

Test Plan: `buck test @//fbandroid/mode/server //fbandroid/instrumentation_tests/com/facebook/caffe2:host-test`

Reviewed By: xcheng16

Differential Revision: D22199952

fbshipit-source-id: df13eef39c01738637ae8cf7f581d6ccc88d37d5
2020-06-23 19:26:21 -07:00
Kate Mormysh
92d3182c11 Revert D21232894: Unify PyTorch mobile's threadpool usage.
Test Plan: revert-hammer

Differential Revision:
D21232894 (b9d3869df3)

Original commit changeset: 8b3de86247fb

fbshipit-source-id: e6517cfec08f7dd0f4f8877dab62acf1d65afacd
2020-06-23 17:09:14 -07:00
Ashkan Aliabadi
b9d3869df3 Unify PyTorch mobile's threadpool usage. (#37243)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37243

*** Why ***

As it stands, we have two thread pool solutions concurrently in use in PyTorch mobile: (1) the open source pthreadpool library under third_party, and (2) Caffe2's implementation of pthreadpool under caffe2/utils/threadpool.  Since the primary use-case of the latter has been to act as a drop-in replacement for the third party version so as to enable integration and usage from within NNPACK and QNNPACK, Caffe2's implementation is intentionally written to the exact same interface as the third party version.

The original argument in favor of C2's implementation has been improved performance as a result of using spin locks, as opposed to relinquishing the thread's time slot and putting it to sleep - a less expensive operation up to a point.  That seems to have given C2's implementation the upper hand in performance, hence justifying the added maintenance complexity, until the third party version improved in parallel surpassing the efficiency of C2's implementation as I have verified in benchmarks.  With that advantage gone, there is no reason to continue using C2's implementation in PyTorch mobile either from the perspective of performance or code hygiene.  As a matter of fact, there is considerable performance benefit to be had as a result of using the third party version as it currently stands.

This is a tricky change though, mainly because in order to avoid potential performance regressions, of which I have witnessed none but just in abundance of caution, we have decided to continue using the internal C2's implementation whenever building for Caffe2.  Again, this is mainly to avoid potential performance regressions in production C2 use cases even if doing so results in reduced performance as far as I can tell.

So to summarize, today, and as it currently stands, we are using C2's implementation for (1) NNPACK, (2) PyTorch QNNPACK, and (3) ATen parallel_for on mobile builds, while using the third party version of pthreadpool for XNNPACK as XNNPACK does not provide any build options to link against an external implementation unlike NNPACK and QNNPACK do.

The goal of this PR then, is to unify all usage on mobile to the third party implementation both for improved performance and better code hygiene.  This applies to PyTorch's use of NNPACK, QNNPACK, XNNPACK, and mobile's implementation of ATen parallel_for, all getting routed to the
exact same third party implementation in this PR.

Considering that NNPACK, QNNPACK, and XNNPACK are not mobile specific, these benefits carry over to non-mobile builds of PyTorch (but not Caffe2) as well.  The implementation of ATen parallel_for on non-mobile builds remains unchanged.

*** How ***

This is where things get tricky.

A good deal of the build system complexity in this PR arises from our desire to maintain C2's implementation intact for C2's use.

pthreadpool is a C library with no concept of namespaces, which means two copies of the library cannot exist in the same binary or symbol collision will occur violating ODR.  This means that somehow, and based on some condition, we must decide on the choice of a pthreadpool implementation.  In practice, this has become more complicated as a result of all the possible combinations that USE_NNPACK, USE_QNNPACK, USE_PYTORCH_QNNPACK, USE_XNNPACK, USE_SYSTEM_XNNPACK, USE_SYSTEM_PTHREADPOOL and other variables can result in.  Having said that, I have done my best in this PR to surgically cut through this complexity in a way that minimizes the side effects, considering the significance of the performance we are leaving on the table, yet, as a result of this combinatorial explosion explained above I cannot guarantee that every single combination will work as expected on the first try.  I am heavily relying on CI to find any issues as local testing can only go that far.

Having said that, this PR provides a simple non mobile-specific C++ thread pool implementation on top of pthreadpool, namely caffe2::PThreadPool that automatically routes to C2's implementation or the third party version depending on the build configuration.  This simplifies the logic at the cost of pushing the complexity to the build scripts.  From there on, this thread pool is used in aten parallel_for, and NNPACK and family, again, routing all usage of threading to C2 or third party pthreadpool depending on the build configuration.

When it is all said or done, the layering will look like this:

a) aten::parallel_for, uses
b) caffe2::PThreadPool, which uses
c) pthreadpool C API, which delegates to
    c-1) third_party implementation of pthreadpool if that's what the build has requested, and the rabbit hole ends here.
    c-2) C2's implementation of pthreadpool if that's what the build has requested, which itself delegates to
    c-2-1) caffe2::ThreadPool, and the rabbit hole ends here.

NNPACK, and (PyTorch) QNNPACK directly hook into (c). They never go through (b).

Differential Revision: D21232894

Test Plan: Imported from OSS

Reviewed By: dreiss

Pulled By: AshkanAliabadi

fbshipit-source-id: 8b3de86247fbc3a327e811983e082f9d40081354
2020-06-23 16:34:51 -07:00
Ivan Kobzarev
74a2cb87e3 [android][cmake] Remove NO_EXPORT for libtorch mobile build (#39584)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39584

Removing `-DNO_EXPORT` for the non-custom build to be able to link against the C10/ATen API.
The custom build stays the same, as its main goal is to have minimum binary size, while exported API functions would increase it.

Additional changes:

1. aten/src/ATen/DynamicLibrary.cpp uses libdl; if we need this functionality we will need to link the result against libdl, but for now it is disabled for mobile.

Test Plan: Imported from OSS

Differential Revision: D22111600

Pulled By: IvanKobzarev

fbshipit-source-id: d730201c55f543c959a596b34be532aecee6b9ab
2020-06-18 11:48:53 -07:00
peter
0f39ed86a7 Cleanup debug info switches with MSVC (#39703)
Summary:
Switch off `/Z7` so that we don't generate debug info in Release and MinSizeRel builds, which should give us smaller static libraries and object files and faster build times
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39703

Differential Revision: D21960684

Pulled By: ezyang

fbshipit-source-id: 909a237a138183591d667885b13fc311470eed65
2020-06-09 14:11:40 -07:00
Hong Xu
89c0efb30b Also set CMAKE_C_STANDARD for MSVC (#39304)
Summary:
According to
<https://gitlab.kitware.com/cmake/cmake/-/blob/master/Modules/Compiler/MSVC-C.cmake>,
the option simply has no effect for MSVC as of today. It is better not to impose
such an if condition, as it is a bit misleading (the current code makes it look like we have compatibility issues with MSVC's C11 support), and it's better to
leave the judgment of MSVC C support to the CMake devs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39304

Differential Revision: D21846032

Pulled By: malfet

fbshipit-source-id: 962e5721da3d7b9be4117b42bdc35df426b7da7b
2020-06-02 13:59:07 -07:00
Ivan Kobzarev
b460465a18 [Mobile GPU][Integration] Vulkan backend integration (#36491)
Summary:
This PR contains the initial version of Vulkan (GPU) Backend integration.
The primary target environment is Android, but the desktop build is also supported.

## CMake
Introducing three CMake options:
USE_VULKAN:
The main switch; if it is off, the other options have no effect.
USE_VULKAN_WRAPPER:
ON - Vulkan is loaded at runtime as "libvulkan.so" using libdl; every function call is wrapped in vulkan_wrapper.h.
OFF - linking with libvulkan.so directly.
USE_VULKAN_SHADERC_RUNTIME:
ON - the shader compilation library is linked, and shaders are compiled at runtime.
OFF - shaders are precompiled and the shader compilation library is not included.

## Codegen
if `USE_VULKAN_SHADERC_RUNTIME` is OFF:
Shader precompilation starts in cmake/VulkanCodegen.cmake, which calls `aten/src/ATen/native/vulkan/gen_spv.py` to include the SPIR-V bytecode inside the binary as a uint32_t array in spv.h,spv.cpp.
if `USE_VULKAN_SHADERC_RUNTIME` is ON:
The source of the shaders is included as `glsl.h`,`glsl.cpp` (via `aten/src/ATen/native/vulkan/gen_glsl.py`) and compiled at runtime.

All codegen output lands in the build directory.

## Build dependencies
cmake/Dependencies.cmake
If the target platform is Android, the Vulkan library, headers, and Vulkan wrapper are used from the ANDROID_NDK.
The desktop build requires the VULKAN_SDK environment variable, and all Vulkan dependencies are used from it.
(The desktop build was tested only on Linux.)

## Pytorch integration:
Adding 'Vulkan' as a new Backend, DispatchKey, and DeviceType.
We are using Strided layout without supporting strides at the moment, but we plan to support them in the future.
Using OpaqueTensorImpl where OpaqueHandle is copyable VulkanTensor,
more details in comments in `aten/src/ATen/native/vulkan/Vulkan.h`

Main code location: `aten/src/ATen/native/vulkan`
`aten/src/ATen/native/vulkan/VulkanAten.cpp` - the connection between ATen and the Vulkan API (Vulkan.h) that converts at::Tensor to VulkanTensor.

`aten/src/ATen/native/vulkan/Vulkan.h` - the Vulkan API that contains the VulkanTensor representation and functions to work with it. We plan to expose it so that clients can write their own Vulkan ops.

`aten/src/ATen/native/vulkan/VulkanOps.cpp` - Vulkan operation implementations that use the Vulkan.h API.

## GLSL shaders
Located in `aten/src/ATen/native/vulkan/glsl` as *.glsl files.
All shaders use Vulkan specialized constants for workgroup sizes with ids 1, 2, 3

## Supported operations
Code point:
conv2d no-groups
conv2d depthwise
addmm
upsample nearest 2d
clamp
hardtanh

## Testing
`aten/src/ATen/test/vulkan_test.cpp` - contains tests for
copying from CPU to Vulkan and back
all supported operations
Desktop builds are supported, and testing can be done on a desktop that has a Vulkan-capable GPU or with an installed software implementation of Vulkan, like https://github.com/google/swiftshader

## Vulkan execution
The initial implementation is trivial and waits for every operator's execution to finish.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36491

Differential Revision: D21696709

Pulled By: IvanKobzarev

fbshipit-source-id: da3e5a770b1a1995e9465d7e81963e7de56217fa
2020-05-26 08:30:13 -07:00
Gregory Chanan
b27be3e0c5 Avoid double dispatch in logical_not for compilation speed reasons. (#38565)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38565

Also note this turns on "-Wno-unused-local-typedefs" because we are using dispatch macros for error checking.

Test Plan: Imported from OSS

Differential Revision: D21598478

Pulled By: gchanan

fbshipit-source-id: 28f9ad01bd678df0601a10d0daf3ed31c47c4ab2
2020-05-18 09:25:54 -07:00
Nikita Shulga
dc918162b7 Remove Caffe2_MAIN_LIBS (#38408)
Summary:
Right now it is an unused alias for the `torch_library` interface library
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38408

Differential Revision: D21598250

Pulled By: malfet

fbshipit-source-id: ec9a2446b94e7ea68298831212005c2c80bbc95c
2020-05-15 12:27:15 -07:00
Wojciech Baranowski
945672bf3e cmake: improve dependencies in incremental builds (#37661)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/26304

Test procedure:
With ninja:
[x] Build a clean checkout
[x] Build again. Result: Only 10 libraries are (needlessly) linked again, the extra delay on a 24-core machine is <10s.
[x] Build for the third time. Result: Virtually instantaneous, with no extra rebuilding.
[x] Modify DispatchTable.h. Build again. Result: `.cu` files are rebuilt, as well as many `.cpp` files
[x] Build for the fifth time. Result: Virtually instantaneous, with no extra rebuilding.
[x] Touch one of the `.depend` files. Build again. Result: Only 10 libraries are (needlessly) linked again, the extra delay on a 24-core machine is <10s.

Without ninja:
[x] Build a clean checkout
[x] Build again. Result: There is some unnecessary rebuilding. But it was also happening before this change.
[x] Build for the third time. Result: Virtually instantaneous, with no extra rebuilding.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37661

Differential Revision: D21434624

Pulled By: ezyang

fbshipit-source-id: 379d2315486b8bb5972c184f9b8da8e00d38c338
2020-05-06 14:25:18 -07:00
Brian Vaughan
d4edbbd396 Revert D21369541: Make a separate cmake option for caffe2 tests
Test Plan: revert-hammer

Differential Revision:
D21369541

Original commit changeset: 669cff70c5b5

fbshipit-source-id: 500d261eaf3f02bcd698d343480b9e951e2844b9
2020-05-05 06:30:52 -07:00
Michael Suo
aff92ef3d6 Make a separate cmake option for caffe2 tests (#37721)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37721

Even though we disabled caffe2 test configs in Python, the BUILD_TEST
option was still building caffe2 test cpp binaries and various CI
configurations were running them (since they just run every binary in
`torch/test`).

This PR adds a caffe2-specific BUILD_TEST option (BUILD_CAFFE2_TEST),
which defaults to OFF, and gates the compilation of caffe2 test cpp
binaries under it.

Test Plan: Imported from OSS

Differential Revision: D21369541

Pulled By: suo

fbshipit-source-id: 669cff70c5b53f016e8e016bcb3a99bf3617e1f9
2020-05-04 23:26:27 -07:00
Lucas Hosseini
8a30553738 [TensorPipe/RPC] Add TensorPipe dependency (#36695)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/36695

Reviewed By: lw

Differential Revision: D21312297

Pulled By: beauby

fbshipit-source-id: 39fdc3de91efa4ac97dd169f09fb304b273b0050
2020-04-30 11:05:15 -07:00
Mo Zhou
69e2f1aaff [cmake] add HAVE_SOVERSION option (default=OFF). (#37502)
Summary:
This is useful for Linux distributions when the ABI/API of libtorch has
changed. The default SOVERSION is set to
"${TORCH_VERSION_MAJOR}.${TORCH_VERSION_MINOR}".

ezyang

But if the release strategy of pytorch/caffe2 involves avoiding breaking API/ABI changes to libtorch for minor/patch releases, then we can set `TORCH_SOVERSION` to simply `TORCH_VERSION_MAJOR`. Please confirm that.
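
A hedged sketch of how such an option is typically wired up; the option name and default come from the commit, while the target/property wiring here is an illustrative assumption:

```cmake
option(HAVE_SOVERSION "Whether to set the SOVERSION on libtorch" OFF)
if(HAVE_SOVERSION)
  # Default per the commit: MAJOR.MINOR of the torch version.
  set(TORCH_SOVERSION "${TORCH_VERSION_MAJOR}.${TORCH_VERSION_MINOR}"
      CACHE STRING "SOVERSION of the libtorch shared library")
  set_target_properties(torch PROPERTIES SOVERSION "${TORCH_SOVERSION}")
endif()
```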
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37502

Differential Revision: D21303565

Pulled By: ezyang

fbshipit-source-id: 798f5ec7fc5f0431ff1a7f9e8e5d3a0d3b25bb22
2020-04-30 06:52:33 -07:00
Mo Zhou
58a46a174e [cmake] add USE_SYSTEM_{XNNPACK,ONNX} options. (#37501)
Summary:
ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37501

Differential Revision: D21303527

Pulled By: ezyang

fbshipit-source-id: 58353d78c66e5bcc9198ce8cde36ac7232bb4b2f
2020-04-29 09:26:16 -07:00
peter
c5d6f59ab1 Replacing EHa with EHsc (#37235)
Summary:
We should not rely on async exceptions. Catching only C++ exceptions is more sensible and may give a boost in both space (1163 MB -> 1073 MB, 0.92x) and performance (51m -> 49m, 0.96x).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37235

Differential Revision: D21256918

Pulled By: ezyang

fbshipit-source-id: 572ee96f2e4c48ad13f83409e4e113483b3a457a
2020-04-28 08:20:37 -07:00
Mo Zhou
5b9f7f7b0e [cmake] Add USE_SYSTEM_{GLOO,FP16,PTHREADPOOL,PSIMD,FXDIV,BENCHMARK} options (#14699) (#37277)
Summary:
These options are disabled by default, and are supposed to be used by
linux distro developers. With the existing shortcut option
USE_SYSTEM_LIBS toggled, these new options will be enabled as well.

Additionally, when USE_SYSTEM_LIBS is toggled, setup.py should
no longer check the existence of git submodules.
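
The shortcut behavior described above could look roughly like this in CMake; the individual option names come from these commits, but the exact wiring is an illustrative assumption:

```cmake
option(USE_SYSTEM_LIBS "Use system versions of all bundled third-party libs" OFF)
if(USE_SYSTEM_LIBS)
  # Toggling the shortcut enables every per-library USE_SYSTEM_* option.
  foreach(dep GLOO FP16 PTHREADPOOL PSIMD FXDIV BENCHMARK CPUINFO SLEEF)
    set(USE_SYSTEM_${dep} ON)
  endforeach()
endif()
```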

ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37277

Differential Revision: D21256999

Pulled By: ezyang

fbshipit-source-id: 84f97d008db5a5e41a289cb7bce94906de3c52cf
2020-04-27 09:37:27 -07:00
Mo Zhou
ff21b15624 cmake: add USE_SYSTEM_{LIBS,CPUINFO,SLEEF} options (#14699) (#37137)
Summary:
ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37137

Differential Revision: D21222632

Pulled By: ezyang

fbshipit-source-id: 47624b30f8d07b31a40a26edf665bbec39e45202
2020-04-23 20:43:36 -07:00