Commit Graph

639 Commits

Xiang Gao
c66ca74f03 Add device debug info to CUDA build (#31929)
Summary:
Also print NVCC flags in the summary
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31929

Differential Revision: D19312079

Pulled By: ezyang

fbshipit-source-id: cd20d5a385f61174c1907a9ad883c04de66ef037
2020-01-08 09:56:20 -08:00
Edward Yang
a9dae70bae Remove LibIRC logic from cmake. (#31152)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31152

Per apaszke: I can't find any reasonable references to libIRC online, so
I decided to remove this.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D19262582

Pulled By: ezyang

fbshipit-source-id: a1d47462427a3e0ca469062321d608e0badf8548
2020-01-06 14:39:43 -08:00
Hong Xu
daf00beaba Remove duplicated Numa detection code. (#30628)
Summary:
cmake/Dependencies.cmake (1111a6b810/cmake/Dependencies.cmake (L595-L609)) has already detected Numa. Duplicated detection and variables may lead to
incorrect results.

Close https://github.com/pytorch/pytorch/issues/29968
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30628

Differential Revision: D18782479

Pulled By: ezyang

fbshipit-source-id: f74441f03367f11af8fa59b92d656c6fa070fbd0
2020-01-03 08:48:46 -08:00
Sebastian Messmer
5554e5b793 Docs: c++11 -> c++14 (#30530)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30530

Switch some mentions of "C++11" in the docs to "C++14"
ghstack-source-id: 95812049

Test Plan: testinprod

Differential Revision: D18733733

fbshipit-source-id: b9d0490eb3f72bad974d134bbe9eb563f6bc8775
2019-12-17 14:09:02 -08:00
Peter Bell
7cb83bea3b Fix static cuda builds on older cmake versions (#30935)
Summary:
Fixes https://github.com/pytorch/pytorch/pull/28378#issuecomment-562597033

To reproduce the failure I had to downgrade to `cmake 3.9` (Ubuntu 18 uses 3.10 apparently). These older `cmake` versions unfortunately don't seem to allow `target_link_libraries(INTERFACE)` to be used with imported libraries. Switching back to `set_property(TARGET)` fixes the issue.
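
As a rough illustration of the difference (library names and paths below are placeholders, not the actual PyTorch targets):
```
# Minimal sketch; library names and paths are placeholders.
add_library(my_imported_cudnn STATIC IMPORTED)
set_target_properties(my_imported_cudnn PROPERTIES
  IMPORTED_LOCATION "/usr/local/cuda/lib64/libcudnn_static.a")
set(my_extra_cuda_libs "/usr/local/cuda/lib64/libculibos.a")

# Before CMake 3.11, target_link_libraries() cannot be used on an imported
# target, so this form fails on cmake 3.9/3.10:
#   target_link_libraries(my_imported_cudnn INTERFACE ${my_extra_cuda_libs})
# Setting the property directly works on the older versions as well:
set_property(TARGET my_imported_cudnn APPEND PROPERTY
  INTERFACE_LINK_LIBRARIES ${my_extra_cuda_libs})
```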
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30935

Differential Revision: D18956912

Pulled By: albanD

fbshipit-source-id: a2b728ee3268599a428b7878c988e1edef5d9dda
2019-12-14 20:29:27 -08:00
Richard Zou
9047d4df45 Remove all remaining usages of BUILD_NAMEDTENSOR (#31116)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31116

Changelist:
- remove BUILD_NAMEDTENSOR macro
- remove torch._C._BUILD_NAMEDTENSOR
- remove all python behavior that relies on torch._C._BUILD_NAMEDTENSOR

Future:
- In the next diff, I will remove all usages of
ATen/core/EnableNamedTensor.h since that header doesn't do anything
anymore
- After that, we'll be done with the BUILD_NAMEDTENSOR removal.

Test Plan: - run CI

Differential Revision: D18934951

Pulled By: zou3519

fbshipit-source-id: 0a0df0f1f0470d0a01c495579333a2835aac9f5d
2019-12-12 09:53:03 -08:00
Jiakai Liu
bf1b4b6fef add torch_cpu to the static library list in TorchConfig.cmake.in (#30769)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30769

TorchConfig.cmake is the public cmake file we produce in the install folder so that
3rd-party client code can easily pull in all libtorch dependencies.

Apparently this build flow is not well covered by our CI (which is focused
on the 1st-party build / shared libraries?), as the little dummy project used for
code-analysis testing was broken by #30315 without failing any CI.

Fixed the problem for the mobile build and added the dummy project build to mobile
CI as well.
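
For context, the kind of minimal client project TorchConfig.cmake is meant to serve looks roughly like this (project layout is illustrative):
```
# Hypothetical third-party CMakeLists.txt consuming an installed libtorch.
cmake_minimum_required(VERSION 3.5)
project(client_app)

# Point CMAKE_PREFIX_PATH (or Torch_DIR) at the libtorch install folder so
# find_package() picks up the shipped TorchConfig.cmake.
find_package(Torch REQUIRED)

add_executable(client_app main.cpp)
# TORCH_LIBRARIES has to expand to everything needed; for static builds that
# means the full list of static archives (what this commit fixes).
target_link_libraries(client_app ${TORCH_LIBRARIES})
```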

Test Plan: - make sure the new CI passes;

Differential Revision: D18825054

Pulled By: ljk53

fbshipit-source-id: 80506f3875ffbc1a191154bb9e3621c621e08b12
2019-12-05 11:13:32 -08:00
Nathan Goldbaum
1f1ce53e8e Don't install pybind11 header directory for system pybind11 installs (#30758)
Summary:
For system pybind11 installs, this is a system header location that should not be installed, since it might include other, unrelated headers. Because the headers are already present in a system install, there is no need to install them; only do the install when we use the bundled pybind11 version.
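
A minimal sketch of the kind of guard this describes (the switch name and paths are illustrative, not the actual ones used):
```
# Only install pybind11 headers when the bundled copy in third_party/ is used;
# a system pybind11 already ships its headers, and its include dir may contain
# unrelated headers that must not be re-installed.
if(MY_USE_SYSTEM_PYBIND11)   # hypothetical switch name
  message(STATUS "System pybind11 in use; skipping header install")
else()
  install(DIRECTORY "${PROJECT_SOURCE_DIR}/third_party/pybind11/include/pybind11"
          DESTINATION include)
endif()
```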

Closes https://github.com/pytorch/pytorch/issues/29823. Closes https://github.com/pytorch/pytorch/issues/30627.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30758

Differential Revision: D18820189

Pulled By: bddppq

fbshipit-source-id: fcc9fa657897e18c07da090752c912e3be513b17
2019-12-04 16:43:21 -08:00
Edward Yang
38986e1dea Split libtorch.so back into libtorch_{cpu,cuda,hip} (#30315)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30315

The new structure is that libtorch_cpu contains the bulk of our
code, and libtorch depends on libtorch_cpu and libtorch_cuda.
This is a reland of https://github.com/pytorch/pytorch/pull/29731 but
I've extracted all of the prep work into separate PRs which can be
landed before this one.

Some things of note:

* torch/csrc/cuda/nccl.cpp was added to the wrong list of SRCS, now fixed (this didn't matter before because previously they were all in the same library)
* The dummy file for libtorch was brought back from the dead; it was previously deleted in #20774
* In an initial version of the patch, I forgot to make torch_cuda explicitly depend on torch_cpu. This led to some very odd errors, most notably "bin/blob_test: hidden symbol `_ZNK6google8protobuf5Arena17OnArenaAllocationEPKSt9type_infom' in lib/libprotobuf.a(arena.cc.o) is referenced by DSO"
* A number of places in Android/iOS builds have to add torch_cuda explicitly as a library, as they do not have transitive dependency calculation working correctly
* I had to wrap torch_cpu/torch_cuda with caffe2_interface_library so that they get whole-archive linked into torch when you statically link (see the sketch after this list). And I had to do this in an *exported* fashion because torch needs to depend on torch_cpu_library. In the end I exported everything and removed the redefinition in the Caffe2Config.cmake. I am not too sure why the old code did it the way it did in the first place; in any case, it doesn't seem to have broken anything to switch it this way.
* There are still some uses of `__HIP_PLATFORM_HCC__` in `torch_cpu` code, so I had to apply it to that library too (UGH). This manifests as a failure when trying to run the CUDA fuser. It doesn't really matter substantively right now because we still HIPify in place, but it would be good to fix eventually. This was a bit difficult to debug because of an unrelated HIP bug, see https://github.com/ROCm-Developer-Tools/HIP/issues/1706
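
A rough sketch of the whole-archive wrapping idea behind caffe2_interface_library (generic illustration only, with a hypothetical function name; this is not the actual macro):
```
# Wrap a static library in an INTERFACE target that forces whole-archive
# linking, so every object file (including ones only reachable through static
# initializer registrations) survives the final link.
function(my_whole_archive_wrap src dst)
  add_library(${dst} INTERFACE)
  if(APPLE)
    target_link_libraries(${dst} INTERFACE "-Wl,-force_load,$<TARGET_FILE:${src}>")
  else()
    target_link_libraries(${dst} INTERFACE
      "-Wl,--whole-archive" "$<TARGET_FILE:${src}>" "-Wl,--no-whole-archive")
  endif()
  # The real helper also has to forward usage requirements and build-order
  # dependencies from ${src}; that bookkeeping is omitted here.
endfunction()
```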

Fixes #27215 (as our libraries are smaller), and executes on
part of the plan in #29235.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D18790941

Pulled By: ezyang

fbshipit-source-id: 01296f6089d3de5e8365251b490c51e694f2d6c7
2019-12-04 08:04:57 -08:00
Sebastian Messmer
bc2e6d10fa Back out "Revert D17908478: Switch PyTorch/Caffe2 to C++14"
Summary: Original commit changeset: 775d2e29be0b

Test Plan: CI

Reviewed By: mruberry

Differential Revision: D18775520

fbshipit-source-id: a350b3f86b66d97241f208786ee67e9a51172eac
2019-12-03 14:33:43 -08:00
Sebastian Messmer
a2ed50c920 Revert D17908478: Switch PyTorch/Caffe2 to C++14
Test Plan: revert-hammer

Differential Revision:
D17908478

Original commit changeset: 6e340024591e

fbshipit-source-id: 775d2e29be0bc3a0db64f164c8960c44d4877d5d
2019-11-27 14:57:05 -08:00
Sebastian Messmer
d0acc9c085 Switch PyTorch/Caffe2 to C++14 (#30406)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30406

ghstack-source-id: 94642238

Test Plan: waitforsandcastle

Differential Revision: D17908478

fbshipit-source-id: 6e340024591ec2c69521668022999df4a33b4ddb
2019-11-27 10:47:31 -08:00
Jiakai Liu
43fb0015db custom build script (#30144)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30144

Create a script to produce a libtorch that contains only the ops needed by specific
models. Developers can use this workflow to further optimize mobile build size.

We need to keep a dummy stub for unused (stripped) ops because some JIT-side
logic requires certain function schemas to exist in the JIT op
registry.

Test Steps:
1. Build "dump_operator_names" binary and use it to dump root ops needed
by a specific model:
```
build/bin/dump_operator_names --model=mobilenetv2.pk --output=mobilenetv2.yaml
```

2. The MobileNetV2 model should use the following ops:
```
- aten::t
- aten::dropout
- aten::mean.dim
- aten::add.Tensor
- prim::ListConstruct
- aten::addmm
- aten::_convolution
- aten::batch_norm
- aten::hardtanh_
- aten::mm
```
NOTE that for some reason it outputs "aten::addmm" but actually uses "aten::mm".
You need to fix it manually for now.

3. Run custom build script locally (use Android as an example):
```
SELECTED_OP_LIST=mobilenetv2.yaml scripts/build_pytorch_android.sh armeabi-v7a
```

4. Check out the demo app that uses the locally built library instead of
downloading it from the jcenter repo:
```
git clone --single-branch --branch custom_build git@github.com:ljk53/android-demo-app.git
```

5. Copy locally built libraries to demo app folder:
```
find ${HOME}/src/pytorch/android -name '*.aar' -exec cp {} ${HOME}/src/android-demo-app/HelloWorldApp/app/libs/ \;
```

6. Build demo app with locally built libtorch:
```
cd ${HOME}/src/android-demo-app/HelloWorldApp
./gradlew clean && ./gradlew assembleDebug
```

7. Install and run the demo app.

In-APK arm-v7 libpytorch_jni.so build size reduced from 5.5M to 2.9M.

Test Plan: Imported from OSS

Differential Revision: D18612127

Pulled By: ljk53

fbshipit-source-id: fa8d5e1d3259143c7346abd1c862773be8c7e29a
2019-11-20 13:16:02 -08:00
Edward Yang
b0309d1b5b More documentation on caffe2_interface_library (#29903)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29903

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D18616888

Pulled By: ezyang

fbshipit-source-id: 360760a688dcc8ba117cd79d89db2afb2c35ab27
2019-11-20 08:58:01 -08:00
Daya Khudia
79b797ccac Build time warning on windows for fbgemm (#29062)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29062

Build time warning
ghstack-source-id: 94202405

Test Plan: None

Reviewed By: jianyuh

Differential Revision: D18279505

fbshipit-source-id: 873cdeb848d34849d6babc435b1a42171f0609a3
2019-11-19 14:30:20 -08:00
Edward Yang
0e5200adfe Refactor target_compile_options into torch_compile_options (#29730)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29730

Back in the day, Caffe2 had a good idea: instead of spattering
target_compile_options all over the codebase, define a helper
function which sets all the options for a target.  This is especially
helpful if I want to split libtorch.so into libtorch_cpu.so
and libtorch_cuda.so; I need a way to easily apply options
to multiple targets.  A shared helper function is just the ticket.

I moved every target_compile_options call in caffe2/CMakeLists.txt
that didn't seem target dependent (exclusions included OpenMP flags,
API-related macros, ONNX related macros and HIP flags) into
torch_compile_options.  I slavishly preserved the structure:
there's a nearly redundant WERROR if() in the output but I preserved
it.

There is one thing I don't like about this, which is that now
the compile options are off in a random directory that no one would
expect.  But c'est la vie...
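
The general shape of such a helper is roughly the following; the function name and flag list are illustrative, not the actual contents of torch_compile_options:
```
# A shared helper so every target (and later every split library) gets
# identical compile options; the flag list here is illustrative.
function(my_torch_compile_options libname)
  target_compile_options(${libname} PRIVATE -Wall -Wextra -Wno-unused-parameter)
  if(WERROR)   # mirrors the nearly redundant WERROR if() mentioned above
    target_compile_options(${libname} PRIVATE -Werror)
  endif()
endfunction()

# Applied uniformly once libtorch is split:
#   my_torch_compile_options(torch_cpu)
#   my_torch_compile_options(torch_cuda)
```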

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D18571166

Pulled By: ezyang

fbshipit-source-id: 21cd5f7663485077600782078fbb1787fab09035
2019-11-18 07:05:48 -08:00
David Reiss
d22f61432d Update fbjni and enable PyTorch JNI build
Summary:
- Add a "BUILD_JNI" option that enables building PyTorch JNI bindings and
  fbjni.  This is off by default because it adds a dependency on jni.h.
- Update to the latest fbjni so we can inhibit building its tests,
  because they depend on gtest.
- Set JAVA_HOME and BUILD_JNI in Linux binary build configurations if we
  can find jni.h in Docker.

Test Plan:
- Built on dev server.
- Verified that libpytorch_jni links after libtorch when both are built
  in a parallel build.

Differential Revision: D18536828

fbshipit-source-id: 19cb3be8298d3619352d02bb9446ab802c27ec66
2019-11-15 13:59:44 -08:00
Jiakai Liu
b508de6412 add static libraries to TorchConfig.cmake.in (#29837)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29837

The current TorchConfig seems to handle only shared libraries. When
building static libraries it doesn't provide the list of all the needed
static libraries. This is especially a problem for the mobile build, as we
build static libraries first and then link them into a shared library / binary to
do "gc-sections". Today we have to manually import these dependent
libraries at each call site.
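
To illustrate the mobile flow described above (target names and flags are hypothetical, and this assumes TorchConfig.cmake exposes the full static list):
```
# Downstream mobile consumer: link the static libtorch archives into a single
# shared library and let the linker drop unreferenced sections (assumes the
# archives were compiled with -ffunction-sections/-fdata-sections).
find_package(Torch REQUIRED)
add_library(my_pytorch_jni SHARED jni_bindings.cpp)
# Without the full static-library list in TORCH_LIBRARIES, every call site has
# to enumerate the archives by hand -- the problem this change addresses.
target_link_libraries(my_pytorch_jni ${TORCH_LIBRARIES})
set_target_properties(my_pytorch_jni PROPERTIES LINK_FLAGS "-Wl,--gc-sections")
```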

Test Plan:
- build_mobile.sh builds and runs;
- The baby test project in #29716 builds and runs;
- Will check CI for other platforms;

Differential Revision: D18513404

Pulled By: ljk53

fbshipit-source-id: c3dc2c01004c4c9c4574c71fd9a4253c9e19e1e9
2019-11-14 20:41:33 -08:00
Junjie Bai
b0c245d52d Consolidate the places that find pybind11 include dirs (#29659)
Summary:
Also move the logic that installs the pybind11 headers from setup.py to cmake (to align with other headers).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29659

Differential Revision: D18458208

Pulled By: bddppq

fbshipit-source-id: cfd1e74b892d4a65591626ab321780c8c87b810d
2019-11-12 14:51:56 -08:00
Junjie Bai
f111f1b1a7 Suppress implicit int-float conversion warning in ROCm build (#29604)
Summary:
```
c10/util/Half.h:467:37: warning: implicit conversion from 'long' to 'double' changes value from 9223372036854775807 to 9223372036854775808 [-Wimplicit-int-float-conversion]
  return f < limit::lowest() || f > limit::max();
                                  ~ ^~~~~~~~~~~~
c10/util/Half.h:497:41: note: in instantiation of function template specialization 'c10::overflows<long, double>' requested here
  if (!std::is_same<To, bool>::value && overflows<To, From>(f)) {
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29604

Differential Revision: D18440713

Pulled By: bddppq

fbshipit-source-id: f059b4e37e90fa84308be52ff5e1070ffd04031e
2019-11-12 10:44:28 -08:00
Hong Xu
21d11e0b64 FindCUDA: Use find_program instead of find_path to find nvcc (#29160)
Summary:
Otherwise nvcc is not found if it is on the env PATH but in a non-standard
location.

Imported from my patch for CMake:
https://gitlab.kitware.com/cmake/cmake/merge_requests/3990

Although we currently do the nvcc search in a Python script, that script will be removed soon in https://github.com/pytorch/pytorch/issues/28617.
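
A minimal sketch of the idea, modeled on the stock FindCUDA two-step search (treat the exact variables as illustrative):
```
# Look for the nvcc executable itself rather than a directory containing it,
# so an nvcc that is only reachable through PATH is still found.
find_program(CUDA_NVCC_EXECUTABLE nvcc
  PATHS "${CUDA_TOOLKIT_ROOT_DIR}/bin"
  NO_DEFAULT_PATH)
find_program(CUDA_NVCC_EXECUTABLE nvcc)   # falls back to searching PATH
```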
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29160

Differential Revision: D18326693

Pulled By: ezyang

fbshipit-source-id: dc7ff3f6026f0655386ff685bce7372e2b061a4b
2019-11-05 08:51:35 -08:00
Sergei Nikolaev
1e2049c566 #26426 fixed (#28715)
Summary:
This is the fix for reverted https://github.com/pytorch/pytorch/issues/26426
houseroad bddppq soumith
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28715

Reviewed By: hl475

Differential Revision: D18146731

Pulled By: houseroad

fbshipit-source-id: 247366451a6334e84df82d00339521f797b33130
2019-11-01 12:53:01 -07:00
Adam J. Stewart
4cf7277d62 Explain how to specify library location for MKL (#28779)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/24334.

I'm still kind of confused why `FindMKL.cmake` was unable to locate my MKL libraries. They are in the standard `/opt/intel/mkl` installation prefix on macOS. But at least with this more detailed error message, it will be easier for people to figure out how to fix the problem.
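
For reference, pointing the build at a non-default MKL prefix can look roughly like this; whether these particular hint variables are honoured depends on FindMKL.cmake, so treat the snippet as an assumption:
```
# Give CMake's header/library search a hint before FindMKL.cmake runs
# (macOS /opt/intel layout from above; exact hint handling depends on FindMKL).
list(APPEND CMAKE_INCLUDE_PATH "/opt/intel/mkl/include")
list(APPEND CMAKE_LIBRARY_PATH "/opt/intel/mkl/lib" "/opt/intel/lib")
find_package(MKL)
if(NOT MKL_FOUND)
  message(WARNING "MKL not found; see the detailed error message for hints")
endif()
```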

zhangguanheng66 xkszltl soumith
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28779

Differential Revision: D18170998

Pulled By: soumith

fbshipit-source-id: 47e61baadd84c758267dca566eb1fb8a081de92f
2019-10-28 08:00:54 -07:00
Junjie Bai
d37c2d7c8d Revert D17495965: TensorRT 6.0 support and PyTorch->ONNX->TRT6 unit test
Test Plan: revert-hammer

Differential Revision:
D17495965

Original commit changeset: 3e8dbe8943f5

fbshipit-source-id: d47fcbec22b0d61df41d7dbf15cfdde196ac818f
2019-10-25 13:58:16 -07:00
Sergei Nikolaev
4996e3aca2 TensorRT 6.0 support and PyTorch->ONNX->TRT6 unit test (#26426)
Summary:
This PR makes Caffe2 compatible with TensorRT 6. To make sure it works well, a new unit test is added. This test checks the PyTorch->ONNX->TRT6 inference flow for all classification models from the TorchVision Zoo.
Note on CMake changes: they have to be done in order to import the onnx-tensorrt project. See https://github.com/pytorch/pytorch/issues/18524 for details.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26426

Reviewed By: hl475

Differential Revision: D17495965

Pulled By: houseroad

fbshipit-source-id: 3e8dbe8943f5a28a51368fd5686c8d6e86e7f693
2019-10-25 13:01:57 -07:00
Peter Bell
03d24dba6c Fix static linking cuDNN without static CUDA (#28378)
Summary:
Fixes https://github.com/pytorch/pytorch/pull/27887#issuecomment-544649765

The logs show that `USE_STATIC_CUDNN` is used but not `CAFFE2_STATIC_LINK_CUDA`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28378

Differential Revision: D18061841

Pulled By: ezyang

fbshipit-source-id: 3b9b49953094e02f808ff12107ba4226688d9986
2019-10-22 10:08:09 -07:00
Edward Yang
a3902c901a Revert "Fix early expansion of CUDA_TOOLKIT_ROOT_DIR in libtorch builds (#27887)" (#28310)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28310

This reverts commit 3d3bff5ff1.

Test Plan: Imported from OSS

Differential Revision: D18042859

Pulled By: ezyang

fbshipit-source-id: cded781dda6fcc04199af6abd07ac09fdc0405de
2019-10-21 14:45:17 -07:00
Peter Bell
3d3bff5ff1 Fix early expansion of CUDA_TOOLKIT_ROOT_DIR in libtorch builds (#27887)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/15476, supersedes https://github.com/pytorch/pytorch/issues/23496, supersedes and closes https://github.com/pytorch/pytorch/issues/27607

As explained by rgommers in https://github.com/pytorch/pytorch/issues/23496, linking against the expanded library path for `libculibos` in `cmake/Dependencies.cmake` hard-codes the path into the distributed cmake files.

Instead, I only link against the targets (e.g. `caffe2::cudnn`) and move the dependency on `libculibos` into the CUDA import targets declared in `cmake/public/cuda.cmake`. That file is distributed with the other cmake files, so the variable is expanded on the user's machine. I am now also using `CMAKE_STATIC_LIBRARY_SUFFIX` instead of `.a` to fix the Windows issue from https://github.com/pytorch/pytorch/issues/15828. I don't have a Windows setup to confirm, though.

Finally, to get pytorch to compile with the extra libraries enabled, I also had to link `__caffe2_nccl` to `torch_python`; otherwise I was getting include errors, as the hard-coded include directory was wrong. `nccl` is built into `build`, not `third_party/build`.
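
The pattern being described, in outline (target and variable names are simplified placeholders, not the actual ones):
```
# In a cmake file that ships with the install (the public cuda config), attach
# culibos to the imported cuDNN target instead of hard-coding an expanded path
# in Dependencies.cmake; the variables expand on the user's machine.
add_library(my_caffe2_cudnn INTERFACE IMPORTED)
set_property(TARGET my_caffe2_cudnn PROPERTY INTERFACE_LINK_LIBRARIES
  "${CUDNN_LIBRARY}"
  "${CUDA_TOOLKIT_ROOT_DIR}/lib64/libculibos${CMAKE_STATIC_LIBRARY_SUFFIX}")
# Consumers then link the target, never an absolute path:
#   target_link_libraries(torch_cuda PRIVATE my_caffe2_cudnn)
```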
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27887

Differential Revision: D17929440

Pulled By: ezyang

fbshipit-source-id: 3db6bd94d758fca2e1d6a64f4f5eea03cc07cf64
2019-10-16 09:21:47 -07:00
Edward Yang
0b6186d778 Remove Tensor.h, TensorMethods.h from src/core. (#27086)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27086

This is a major source of merge conflicts, and AFAICT isn't necessary anymore (it may have been necessary for some mobile build stuff in the past).

This is a commandeer of #25031

Test Plan: Imported from OSS

Reviewed By: ljk53

Differential Revision: D17687345

Pulled By: ezyang

fbshipit-source-id: bf6131af835ed1f9e3c10699c81d4454a240445f
2019-10-06 09:37:50 -07:00
Johannes M Dieterich
17c672e704 enable rocTX API (#27416)
Summary:
ROCm 2.9 brings support for the rocTX API through rocTracer.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27416

Differential Revision: D17777480

Pulled By: bddppq

fbshipit-source-id: 6bce9b54c94e5b4c5787570d2b85736882bd23a7
2019-10-05 01:55:00 -07:00
Junjie Bai
f4d0d0a811 Enable RCCL in ROCm build (#27383)
Summary:
continues https://github.com/pytorch/pytorch/pull/23884
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27383

Differential Revision: D17767248

Pulled By: bddppq

fbshipit-source-id: 3a506844ca6f01d7bbe8be5bde0976999e3a2b90
2019-10-04 17:41:41 -07:00
J M Dieterich
32d009a37f Add gfx908 to the list of per-default compiled architectures. (#27388)
Summary:
ROCm 2.8 added preliminary support for gfx908.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27388

Differential Revision: D17767772

Pulled By: bddppq

fbshipit-source-id: 172daf5bb66d3db86a13e287059af4b9b90a7f57
2019-10-04 14:49:33 -07:00
Mingbo Wan
5379e87a32 Cuda101 upgrade (#26823)
Summary:
test run: https://github.com/pytorch/pytorch/issues/26732
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26823

Reviewed By: soumith

Differential Revision: D17576095

Pulled By: mingbowan

fbshipit-source-id: 269cf443aea18b47bbee63996d035bc5bcd2726b
2019-09-25 14:44:12 -07:00
Hong Xu
5e5cbceeba remove tools/setup_helpers/cudnn.py (#25876)
Summary:
FindCUDNN.cmake and cuda.cmake have done the detection. This commit deletes `tools/setup_helpers/cudnn.py` as it is no longer needed.

Previously, in https://github.com/pytorch/pytorch/issues/25482, one test failed because TensorRT detects cuDNN differently, and there may be situations where we can find cuDNN but TensorRT cannot. This is fixed by passing our detection result down to TensorRT.
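
A rough sketch of what "passing our detection result down" can look like (variable and flag names are illustrative, not the actual wiring):
```
# Reuse the cuDNN already found by FindCUDNN.cmake / cuda.cmake when adding
# the TensorRT subproject, so the two detections cannot disagree.
if(MY_USE_CUDNN)   # hypothetical guard
  set(CUDNN_INCLUDE_DIR "${CUDNN_INCLUDE_PATH}" CACHE PATH "" FORCE)
  set(CUDNN_LIBRARY "${CUDNN_LIBRARY_PATH}" CACHE FILEPATH "" FORCE)
endif()
add_subdirectory(third_party/onnx-tensorrt)
```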
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25876

Differential Revision: D17346270

Pulled By: ezyang

fbshipit-source-id: c1e7ad4a1cb20f964fe07a72906f2f002425d894
2019-09-24 07:44:33 -07:00
jasjuang
e4821012ad prevent generating caffe2::mkldnn for multiple times (#25257)
Summary:
This is a similar problem to https://github.com/pytorch/pytorch/issues/25004. After the merge of https://github.com/pytorch/pytorch/issues/25167, I recompiled torch and discovered another similar bug.

ezyang please take a look
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25257

Differential Revision: D17528116

Pulled By: ezyang

fbshipit-source-id: 1657d9ee6dced3548f246010b05e2b3c25c37dee
2019-09-23 08:53:02 -07:00
Jiakai Liu
d6e3aed032 add eigen blas for mobile build (#26508)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26508

Enable BLAS for the pytorch mobile build using Eigen BLAS.
It's not the juiciest optimization for typical mobile CV models, as we are already
using NNPACK/QNNPACK for most ops there. But it's nice to have a good fallback
implementation for other ops.

Test Plan:
- Create a simple matrix multiplication script model:
```
import torch

class Net(torch.nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.weights = torch.ones(1000, 1000)

    def forward(self, x):
        return torch.mm(x, self.weights)

n = Net()
module = torch.jit.trace_module(n, {'forward': torch.ones(1000, 1000)})
module.save('mm.pk')
```

- Before integrating with Eigen BLAS:
```
adb shell 'cd /data/local/tmp; \
./speed_benchmark_torch \
--model=mm.pk \
--input_dims="1000,1000" \
--input_type=float \
--warmup=5 \
--iter=5'

Milliseconds per iter: 2218.52.
```

- After integrating with Eigen BLAS:
```
adb shell 'cd /data/local/tmp; \
./speed_benchmark_torch_eigen \
--model=mm.pk \
--input_dims="1000,1000" \
--input_type=float \
--warmup=5 \
--iter=5'

Milliseconds per iter: 314.535.
```

- Improve MobileNetV2 single thread perf by ~5%:
```
adb shell 'cd /data/local/tmp; \
./speed_benchmark_torch \
--model=mobilenetv2.pk \
--input_dims="1,3,224,224" \
--input_type=float \
--warmup=5 \
--iter=20 \
--print_output=false \
--caffe2_threadpool_force_inline=true'

Milliseconds per iter: 367.055.

adb shell 'cd /data/local/tmp; \
./speed_benchmark_torch_eigen \
--model=mobilenetv2.pk \
--input_dims="1,3,224,224" \
--input_type=float \
--warmup=5 \
--iter=20 \
--print_output=false \
--caffe2_threadpool_force_inline=true'

Milliseconds per iter: 348.77.
```

Differential Revision: D17489587

fbshipit-source-id: efe542db810a900f680da7ec7e60f215f58db66e
2019-09-20 15:45:11 -07:00
Ashkan Aliabadi
dc851ab5d4 Integrate forked QNNPACK into mobile PyTorch builds. (#25844)
Summary:
Enable forked QNNPACK builds in PyTorch mobile.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25844

Differential Revision: D17336458

Pulled By: AshkanAliabadi

fbshipit-source-id: 6ea09dd6c114b64313e9159bf7f17253bc87bfdb
2019-09-16 20:50:43 -07:00
Tao Xu
3051e36e05 Remove armv7s build from iOS (#26222)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26222

### Summary

The last generation of armv7s devices is the iPhone 5C. As discussed with David offline, we decided not to support iOS armv7s devices.

### Test plan

- CI finishes successfully
- Builds can be run only on X86_64 and arm64 devices

Test Plan: Imported from OSS

Differential Revision: D17385308

Pulled By: xta0

fbshipit-source-id: f883999aed18224ea3386b1f016964a33270fa34
2019-09-14 11:07:37 -07:00
Sebastian Messmer
8321f2592e Register ATen ops with c10 (#26131)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26131

Changes in this PR:
- For each operator with use_c10_dispatcher: True, additionally generate a c10 registration line in TypeDefault.cpp, CPUType.cpp, and other backend files.
- This doesn't change globalATenDispatch yet; the c10 registration is purely additional and the operator calling path doesn't change. A diff further up the stack will change these things.
- Enable the use_c10_dispatcher: True flag for about ~70% of operators
- This also changes the c10->jit operator export, because ATen ops are already exported to JIT directly and we don't want to also export the registered c10 ops, which would clash
- For this, we need a way to recognize whether a certain operator has already been moved from ATen to c10; this is done by generating an OpsAlreadyMovedToC10.cpp file with the list. A diff further up in the stack will also need this file to make sure we don't break the backend extension API for these ops.

Reasons for some ops to be excluded (i.e. not have the `use_c10_dispatcher` flag set to true):
- `Tensor?(a!)` (i.e. optional tensor with annotations) not supported in c++ function schema parser yet
- `-> void` in native_functions.yaml vs `-> ()` expected by function schema parser
- out functions have a different argument order in C++ than in the jit schema
- `Tensor?` (i.e. optional tensor) doesn't work nicely with undefined tensor sometimes being undefined tensor and sometimes being None.
- fixed-size arrays like `int[3]` not supported in c10 yet

These will be fixed in separate diffs and then the exclusion tag will be removed.
ghstack-source-id: 90060748

Test Plan: a diff stacked on top uses these registrations to call these ops from ATen

Differential Revision: D16603131

fbshipit-source-id: 315eb83d0b567eb0cd49973060b44ee1d6d64bfb
2019-09-13 13:52:40 -07:00
Jiakai Liu
075adb4d2d remove pthreadpool.a from install directory (#25977)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25977

Call add_subdirectory() explicitly, before NNPACK/QNNPACK, with the
EXCLUDE_FROM_ALL property, so that the pthreadpool target won't be installed
by default for the libtorch mobile build.
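
In outline (paths are illustrative):
```
# Pull in pthreadpool explicitly, before NNPACK/QNNPACK would do so, and mark
# the subdirectory EXCLUDE_FROM_ALL so its targets are built only on demand
# and stay out of the default install.
add_subdirectory("${PROJECT_SOURCE_DIR}/third_party/pthreadpool"
                 "${CMAKE_BINARY_DIR}/pthreadpool" EXCLUDE_FROM_ALL)
# NNPACK/QNNPACK then link the already-defined pthreadpool target as usual:
#   target_link_libraries(nnpack PRIVATE pthreadpool)
```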

Test Plan: Imported from OSS

Differential Revision: D17312083

Pulled By: ljk53

fbshipit-source-id: 79851d0aa9402c5b9287ef4bbd8d7fd3a341497d
2019-09-11 12:27:56 -07:00
albanD
63df9ffd0b Fix typo in OpenBLAS cmake detection
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/25966

Differential Revision: D17315925

Pulled By: albanD

fbshipit-source-id: 55c6b4a1ddeaf95714034ec66a4d59b0f00ba634
2019-09-11 09:10:42 -07:00
Jiakai Liu
74b48f21c1 remove protobuf from Dependencies.cmake for libtorch mobile build (#25958)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25958

Should have cleaned up the remaining protobuf dependencies before landing PR #25896.

Test Plan: - CI build;

Reviewed By: dreiss

Differential Revision: D17296949

Pulled By: ljk53

fbshipit-source-id: 20c444e63900c7fa054db3cc757d3f18614af630
2019-09-10 18:23:20 -07:00
Johannes M Dieterich
26675b507f Enable libflame as a LAPACK choice (#25795)
Summary:
libflame is BLIS's companion LAPACK from the FLAME project

Mimics my ancient
f5bc78263e
in cmake upstream

libflame WWW: https://github.com/flame/libflame
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25795

Differential Revision: D17286461

Pulled By: bddppq

fbshipit-source-id: 7cd0d27127c78563574791415e4a34f045df30df
2019-09-10 10:34:55 -07:00
Soumith Chintala
73855ecd43 fix cudnn static linkage (#25848)
Summary:
Fix regression caused by https://github.com/pytorch/pytorch/pull/24938

This fixes CUDA nightly breakages
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25848

Differential Revision: D17256348

Pulled By: soumith

fbshipit-source-id: dded577717947d0f092e9d76b423b2bc7c56070a
2019-09-08 21:41:57 -07:00
J M Dieterich
748436a514 Enable BLIS from the FLAME project as a BLAS choice. (#23819)
Summary:
BLIS is AMD's official recommendation for BLAS.

Mimics my ancient
f5bc78263e
in cmake upstream

BLIS WWW: https://github.com/flame/blis
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23819

Differential Revision: D17231360

Pulled By: bddppq

fbshipit-source-id: 68db70d63e410438f99b2bf57986b81ff6b6c5b3
2019-09-06 12:00:25 -07:00
Jiakai Liu
67c530851c get rid of protobuf dependencies (#25650)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25650

This PR removes protobuf dependencies from mobile build altogether:
- caffe2/proto: protobuf files, including caffe2.proto and torch.proto;
- caffe2 components that depend on caffe2.proto, including most of
caffe2/core and caffe2/utils;
- libprotobuf / libprotobuf-lite dependencies;
- protobuf compiler;
- some utils classes, e.g. netdef_converter.cpp;
- introduce a macro to disable third_party/onnx which depends on protobuf;

Test Plan:
- builds;
- link with demo app to make sure it can load and run a model in pickle format;

Differential Revision: D17183548

Pulled By: ljk53

fbshipit-source-id: fe60b48674f29c4a9b58fd1cf8ece44191491531
2019-09-06 08:48:20 -07:00
Hong Xu
cc4211069e Do not pass down USE_GLOO_IBVERBS to CMake (#25720)
Summary:
It doesn't seem to be used anywhere once passed down to CMake, either in this repo or in any submodule.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25720

Differential Revision: D17225088

Pulled By: pietern

fbshipit-source-id: a24b080e6346a203b345e2b834fe095e3b9aece0
2019-09-06 02:40:42 -07:00
Johannes M Dieterich
9c5a899773 Enable jit fusion on ROCm (#22872)
Summary:
As of ROCm 2.6, we support hiprtc - the HIP runtime compilation API. Enable the jit fusion feature depending on the existence of such an API. This entails
* new hipification rules for API_RTC
* add hiprtc APIs to the shim loader
* update cmake infrastructure to find the hiprtc library (it is part of the HIP package)
* enabling of unit tests in the jit_fuser test set
* special casing in resource strings for HIP - the typedefs CUDA requires would be redundant
* for now disable the occupancy calculation we do not support yet and hard-code

Thanks to t-vi for working with me on getting this integration done!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22872

Differential Revision: D17207425

Pulled By: bddppq

fbshipit-source-id: 93409f3051ad0ea06afacc2239fd6c402152debe
2019-09-05 18:22:08 -07:00
Pieter Noordhuis
3556bea5aa Build torch.distributed with Gloo backend on macOS (#25260)
Summary:
In facebookincubator/gloo#212, a libuv based Gloo transport was introduced,
which allows us to use Gloo on macOS (and later perhaps also Windows). This
commit updates CMake code to enable building with USE_DISTRIBUTED=1 on macOS.

A few notes:
* The Caffe2 ops are not compiled, for they depend on `gloo::transport::tcp`.
* The process group implementation uses `gloo::transport::tcp` on Linux (because of `epoll(2)`) and `gloo::transport::uv` on macOS.
* The TCP store works but sometimes crashes on process termination.
* The distributed tests are not yet run.
* The nightly builds don't use `USE_DISTRIBUTED=1`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/25260

Reviewed By: mrshenli

Differential Revision: D17202381

Pulled By: pietern

fbshipit-source-id: ca80a82e78a05b4154271d2fb0ed31c8d9f26a7c
2019-09-05 07:09:50 -07:00
James Reed
817f4502fb Dynamic dispatch for optimized quantized op kernels (#25545)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25545

This re-uses the infrastructure from ATen/native/cpu, which compiles kernels multiple times for different instruction sets and dispatches dynamically based on the CPU's capability flags at runtime. This ensures we use the optimal quantized kernel for the given machine.

Test Plan: Imported from OSS

Differential Revision: D17166369

Pulled By: jamesr66a

fbshipit-source-id: 8c8393f99365e1408819bbaf254c1b5734a34b70
2019-09-04 13:26:40 -07:00