Commit Graph

625 Commits

Author SHA1 Message Date
Nikita Shulga
bc02255d5e Revert D30721329: [pytorch][PR] Enable CUPTI for kineto by default on windows.
Test Plan: revert-hammer

Differential Revision:
D30721329 (7dbc21bc2b)

Original commit changeset: aa1af47df8cc

fbshipit-source-id: 565d50841e19a45f8798a490aa3aa6b9f69ca404
2021-09-23 22:14:32 -07:00
Guangyun Han
7dbc21bc2b Enable CUPTI for kineto by default on windows. (#62175)
Summary:
It fixes nothing.

For tracking this PR, please refer to https://github.com/pytorch/kineto/issues/356

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62175

Reviewed By: ezyang

Differential Revision: D30721329

Pulled By: gdankel

fbshipit-source-id: aa1af47df8cc1b6f5ba2194447f62b902a6a9c84
2021-09-23 15:13:47 -07:00
Nick Kreeger
882b67dff4 Drop incremental linking on Windows with REL_WITH_DEB_INFO=1. (#64892)
Summary:
The library will no longer link properly on VS 2019 (14.29.30133). To
ensure that engineers building on Windows can use and debug with this
build type, incremental linking needs to be turned off for this build
flag.

Verified that this build type successfully builds, links, and provides
debuggable Python modules on Windows.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64892

Reviewed By: jbschlosser

Differential Revision: D30902565

Pulled By: malfet

fbshipit-source-id: e5286a4c6f45c7cbe4cdc1b98560129bd386970b
2021-09-14 09:44:18 -07:00
Peter Bell
e4f44bec27 Fix pocketfft include path in mobile build (#63714)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63714

PocketFFT was disabled for CMake < 3.10, but CMake 3.11 is the first version to support `INCLUDE_DIRECTORIES` as a target property, so updating to CMake 3.10 causes the mobile builds to fail. Instead of limiting the CMake support further, this just adds the include directory to the entire target.

Test Plan: Imported from OSS

Reviewed By: bdhirsh

Differential Revision: D30498369

Pulled By: malfet

fbshipit-source-id: 83372e29c477c97e7015763b7c29d6d7e456bcef
2021-08-23 17:48:57 -07:00
driazati
bd8608cd5c Use CMake for breakpad (#63186)
Summary:
We currently build breakpad from [this fork](https://github.com/driazati/breakpad) to include extra logic to restore signal handlers that were previously present. With some [new additions](https://github.com/google/breakpad/compare/main...driazati:main), this fork now includes a CMake-based build, so we can add breakpad as a proper dependency rather than rely on including it in Docker images as a system library, which is error prone (we have a bunch of images) and hard to extend to macOS / Windows. This also includes some changes to the crash handling code to support macOS / Windows in a similar way to Linux.

```python
import torch

# On Windows this writes crashes to C:\Users\<user>\AppData\pytorch_crashes
# On MacOS/Linux this writes crashes to /tmp/pytorch_crashes
torch.utils._crash_handler.enable_minidumps()

# Easy way to cause a segfault and trigger the handler
torch.bincount(input=torch.tensor([9223372036854775807]))
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63186

Reviewed By: malfet, seemethere

Differential Revision: D30318404

Pulled By: driazati

fbshipit-source-id: 0d7daf3701cfaba5451cc529a0730272ab1eb1dc
2021-08-19 10:42:01 -07:00
Nikita Shulga
6e5d065b2b Add pocketfft as submodule (#62841)
Summary:
Using https://github.com/mreineck/pocketfft

Also delete the explicit installation of pocketfft during the build, as it will be available via the submodule.

Limit PocketFFT support to cmake-3.10 or newer, as `set_source_files_properties` does not seem to work as expected with cmake-3.5.

Partially addresses https://github.com/pytorch/pytorch/issues/62821

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62841

Reviewed By: seemethere

Differential Revision: D30140441

Pulled By: malfet

fbshipit-source-id: d1a1cf1b43375321f5ec5b3d0b538f58082f7825
2021-08-17 15:29:56 -07:00
Kimish Patel
38c185189c [Pytorch Edge] Enable kineto profiler on mobile via EdgeKinetoProfiler (#62419)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62419

This diff adds support for a CPU-only Kineto profiler on mobile, thus
enabling chrome trace generation on mobile. This brings the C++ API for
mobile profiling on par with TorchScript.
This is done via:
1. Utilizing debug handle annotations in KinetoEvent.
2. Adding post-processing capability, via callbacks, to
KinetoThreadLocalState.
3. Creating a new RAII-style profiler, KinetoEdgeCPUProfiler, which can be
used in the surrounding scope of model execution. This will write a chrome
trace to the location specified in the profiler constructor.

Test Plan:
MobileProfiler.ModuleHierarchy

Imported from OSS

Reviewed By: raziel

Differential Revision: D29993660

fbshipit-source-id: 0b44f52f9e9c5f5aff81ebbd9273c254c3c03299
2021-08-13 21:40:19 -07:00
Pruthvi Madugundu
ab7a472980 [ROCm] Update HIP_VERSION to TORCH_HIP_VERSION (#62786)
Summary:
- HIP_VERSION semantic versioning will change in ROCm 4.3. The changes essentially remove the dependency on the HIP_VERSION provided in the hip header, to keep code compatible with older and newer versions of ROCm.
- TORCH_HIP_VERSION is derived from HIP_VERSION_MAJOR and HIP_VERSION_MINOR
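As a hedged sketch (the packing formula below is an assumption for illustration, not taken from this commit), deriving one comparable number from the major/minor components looks like:

```python
def torch_hip_version(major: int, minor: int) -> int:
    """Pack major/minor into a single integer so version gates
    become one comparison (e.g. >= 403 for ROCm 4.3)."""
    return major * 100 + minor

# ROCm 4.3 sorts after ROCm 4.2 under this encoding.
assert torch_hip_version(4, 3) > torch_hip_version(4, 2)
```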

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62786

Reviewed By: bdhirsh

Differential Revision: D30281682

Pulled By: seemethere

fbshipit-source-id: e41e69fb9e13de5ddd1af99ba5bbdcbb7b64b673
2021-08-13 15:00:43 -07:00
Isuru Fernando
b58e04f156 Make sure FindLAPACK finds the same BLAS library (#49647)
Summary:
The BLAS library is found by cmake/Dependencies.cmake, and then the
LAPACK library is found by FindLAPACK.cmake, which in turn calls
FindBLAS.cmake. This means we search for BLAS twice and might find
two different libraries. Setting a few variables avoids this.

cc seemethere

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49647

Reviewed By: seemethere, ejguan

Differential Revision: D29943680

Pulled By: malfet

fbshipit-source-id: 3cbc350ea645a1a28dd92c19e5ee7f9eecdeff59
2021-08-02 20:41:00 -07:00
Can Balioglu
7565039ee9 Support system-provided Intel TBB (#61934)
Summary:
This PR: (1) enables the use of a system-provided Intel TBB for building PyTorch, (2) removes `tbb::task_scheduler_init` references since it was removed from TBB a while ago, and (3) marks the implementation of `_internal_set_num_threads` with a TODO as it requires a revision that fixes its thread allocation logic.

Tested with `test/run_test`; no new tests are introduced since there are no behavioral changes (removal of `tbb::task_scheduler_init` has no impact on the runtime behavior).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61934

Reviewed By: malfet

Differential Revision: D29805416

Pulled By: cbalioglu

fbshipit-source-id: 22042b428b57b8fede9dfcc83878d679a19561dd
2021-08-02 07:39:00 -07:00
Hong Xu
7acb8b71e1 Remove AVX detection code that duplicates FindAVX.cmake (#61748)
Summary:
This PR deletes some code in `MiscCheck.cmake` that performs the exact
same functionality as `FindAVX.cmake`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61748

Reviewed By: ejguan

Differential Revision: D29791282

Pulled By: malfet

fbshipit-source-id: 6595fd1b61c8ae12b821fad8c9a34892dd52d213
2021-07-20 14:34:36 -07:00
Tongliang Liao
0afbb9e81e PYTHON_LIBRARY may be set to empty or NOTFOUND. (#61230)
Summary:
Not sure why (maybe from dependencies?), but an empty or NOTFOUND `PYTHON_LIBRARY` can certainly break package lookup upon re-entry of CMake.
So instead of checking whether these variables are defined, we should check whether they hold any meaningful value.
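A hedged Python sketch of the check described above (the helper name is illustrative; CMake itself treats empty strings and values ending in `NOTFOUND` as false):

```python
def has_meaningful_value(value) -> bool:
    # Unset/empty values and CMake's "NOTFOUND"/"<var>-NOTFOUND"
    # sentinels carry no usable library path.
    if not value:
        return False
    return not str(value).endswith("NOTFOUND")

assert not has_meaningful_value("")                         # set but empty
assert not has_meaningful_value("PYTHON_LIBRARY-NOTFOUND")  # failed lookup
assert has_meaningful_value("/usr/lib/libpython3.9.so")
```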

Fixes https://github.com/pytorch/pytorch/issues/59887

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61230

Reviewed By: H-Huang

Differential Revision: D29668766

Pulled By: malfet

fbshipit-source-id: 79a59578740c4434327aff4f9a22eba9c4bf48d1
2021-07-13 07:09:31 -07:00
Nikita Shulga
4036820506 Add PocketFFT support (#60976)
Summary:
Needed on platforms that do not have MKL, such as aarch64 and M1
- Add `AT_POCKETFFT_ENABLED()` to Config.h.in
- Introduce torch._C.has_spectral that is true if PyTorch was compiled with either MKL or PocketFFT
- Modify spectral test to use skipCPUIfNoFFT instead of skipCPUIfNoMKL

Share implementation of `_out` functions as well as fft_fill_with_conjugate_symmetry_stub between MKL and PocketFFT implementations

Fixes https://github.com/pytorch/pytorch/issues/41592

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60976

Reviewed By: walterddr, driazati, janeyx99, samestep

Differential Revision: D29466530

Pulled By: malfet

fbshipit-source-id: ac5edb3d40e7c413267825f92a5e8bc4bb249caf
2021-06-30 16:28:20 -07:00
Peter Bell
31a884987d Remove some TH includes from ATen (#60323)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60323

Test Plan: Imported from OSS

Reviewed By: malfet, anjali411

Differential Revision: D29252862

Pulled By: ngimel

fbshipit-source-id: 9ea13495d382c04dfd52b8dd63314f53b7e83936
2021-06-22 10:55:17 -07:00
Luca Wehrstedt
08ce5eedf5 [reland] Move RPC agents to libtorch (#60170)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60170

Reland of #59939.

Test Plan: CI

Reviewed By: mrshenli

Differential Revision: D29193234

fbshipit-source-id: ee2a90d6be961c10f91361512bdd4cadca43dd60
2021-06-18 05:15:09 -07:00
Mike Ruberry
f233274f30 Revert D28875276: Move RPC agents to libtorch
Test Plan: revert-hammer

Differential Revision:
D28875276 (fc50f91929)

Original commit changeset: f2f6970fd74d

fbshipit-source-id: 3c52af652579733ebea8ddfb06576a0ce262bf78
2021-06-17 00:48:58 -07:00
Luca Wehrstedt
fc50f91929 Move RPC agents to libtorch (#59939)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59939

Test Plan: CI

Reviewed By: mrshenli

Differential Revision: D28875276

fbshipit-source-id: f2f6970fd74de5f112636e78edaa4410c61d8c45
2021-06-15 16:20:53 -07:00
Nikita Shulga
8845cbabf0 [CMake] Split caffe2::cudnn into public and private (#59721)
Summary:
This is only important for builds where cuDNN is linked statically into libtorch_cpu.
Before this PR PyTorch wheels often accidentally contained several partial copies of cudnn_static library.
Splitting the interface into header-only (cudnn-public) and library+headers (cudnn-private) prevents those from happening.
Preliminary step towards enabling optionally linking the whole cudnn library to work around the issue reported in https://github.com/pytorch/pytorch/issues/50153

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59721

Reviewed By: ngimel

Differential Revision: D29000967

Pulled By: malfet

fbshipit-source-id: f054df92b265e9494076ab16c247427b39da9336
2021-06-09 13:18:48 -07:00
Michael Wootton
e66015dadf Add build support for kineto + rocm (#58401)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/58399

CMake changes to allow kineto to build with rocm support.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58401

Reviewed By: mruberry

Differential Revision: D28479807

Pulled By: walterddr

fbshipit-source-id: fc01f05b2a5592ee1d1dbd71d2d4f7aec1bd74f7
2021-06-03 12:15:20 -07:00
neginraoof
599f5058cf [ONNX] Update ONNX to rel-1.9 (#55889) (#57080)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57080

The ONNX optimizer was removed in ONNX 1.9.
This PR removes the ONNX optimizer from the C++ code path and uses a `try-except` block in Python to keep it compatible with both ONNX 1.8 and 1.9.
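A hedged sketch of the `try-except` pattern described above (the wrapper function is illustrative, not the actual PyTorch code):

```python
try:
    from onnx import optimizer as onnx_optimizer  # present up to ONNX 1.8
except ImportError:
    onnx_optimizer = None  # ONNX 1.9 removed the optimizer module

def optimize_if_available(model, passes):
    # Fall back to the unoptimized model when the optimizer is gone.
    if onnx_optimizer is None:
        return model
    return onnx_optimizer.optimize(model, passes)
```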

Test Plan: Imported from OSS

Reviewed By: heitorschueroff

Differential Revision: D28467330

Pulled By: malfet

fbshipit-source-id: 5e4669dd0537648898e593f9e253da18d6dc7568

Co-authored-by: neginraoof <neginmr@utexas.edu>
Co-authored-by: Nikita Shulga <nshulga@fb.com>
2021-06-02 08:27:17 -07:00
Jeff Daily
ba694520e5 [ROCm] fix JIT codegen (#57400)
Summary:
Fixes upcoming changes that are part of ROCm 4.2 and affect PyTorch JIT.

- ROCM_VERSION macro must be available to both device and host compilation passes.
- Unifies some of CUDA and HIP differences in the code generated.
  - NAN / POS_INFINITY / NEG_INFINITY
  - Do not hipify `extern __shared__` -> `HIP_DYNAMIC_SHARED()` macro [deprecated]
- Differentiates bf16 codegen for HIP.
- Optionally provides missing macros when using hiprtc precompiled header feature.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/57400

Reviewed By: ejguan

Differential Revision: D28421065

Pulled By: malfet

fbshipit-source-id: 215f476773c61d8b0d9d148a4e5f5d016f863074
2021-05-27 11:45:07 -07:00
Nikita Shulga
7179e7ea7b [CMake] Prefer third_party/pybind11 by default (#58951)
Summary:
To make build behaviour aligned with other third_party/ libraries,
introduce the `USE_SYSTEM_PYBIND11` build option, which is set to OFF by
default, meaning PyTorch will be built with the bundled pybind11 even if
another version is already installed locally.

Fixes https://github.com/pytorch/pytorch/issues/58750

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58951

Reviewed By: driazati

Differential Revision: D28690411

Pulled By: malfet

fbshipit-source-id: e56b5a8f2a23ee1834b2a6d3807f287149decf8c
2021-05-25 15:10:17 -07:00
Xiang Gao
6c70cbedb6 step 0 of cuDNN v8 convolution API integration (#51390)
Summary:
This PR is step 0 of adding PyTorch convolution bindings using the cuDNN frontend. The cuDNN frontend is the recommended way of using the cuDNN v8 API. It is supposed to have faster release cycles so that, for example, if people find that a specific kernel has a bug, they can report it, that kernel can be blocked in the cuDNN frontend, and frameworks can simply update the submodule without waiting for a whole cuDNN release.

The work is not complete, and this PR is only step 0.

**What this PR does:**
- Add cudnn-frontend as a submodule.
- Modify cmake to build that submodule.
- Add bindings for convolution forward in `Conv_v8.cpp`, which is disabled by a macro by default.
- Tested manually by enabling the macro and run `test_nn.py`. All tests pass except those mentioned below.

**What this PR doesn't:**
- Only convolution forward, no backward. The backward will use v7 API.
- No 64-bit-indexing support for some configurations. This is a known issue of cuDNN and will be fixed in a later cuDNN version. PyTorch will not implement any workaround for this issue; instead, the v8 API should be disabled on problematic cuDNN versions.
- No test beyond PyTorch's unit tests.
  - Not tested for correctness on real models.
  - Not benchmarked for performance.
- Benchmark cache is not thread-safe. (This is marked as `FIXME` in the code, and will be fixed in a follow-up PR)
- cuDNN benchmark is not supported.
- There are failing tests, which will be resolved later:
  ```
  FAILED test/test_nn.py::TestNNDeviceTypeCUDA::test_conv_cudnn_nhwc_cuda_float16 - AssertionError: False is not true : Tensors failed to compare as equal!With rtol=0.001 and atol=1e-05, found 32 element(s) (out of 32) whose difference(s) exceeded the margin of error (in...
  FAILED test/test_nn.py::TestNNDeviceTypeCUDA::test_conv_cudnn_nhwc_cuda_float32 - AssertionError: False is not true : Tensors failed to compare as equal!With rtol=1.3e-06 and atol=1e-05, found 32 element(s) (out of 32) whose difference(s) exceeded the margin of error (...
  FAILED test/test_nn.py::TestNNDeviceTypeCUDA::test_conv_large_cuda - RuntimeError: CUDNN_BACKEND_OPERATION: cudnnFinalize Failed cudnn_status: 9
  FAILED test/test_nn.py::TestNN::test_Conv2d_depthwise_naive_groups_cuda - AssertionError: False is not true : Tensors failed to compare as equal!With rtol=0 and atol=1e-05, found 64 element(s) (out of 64) whose difference(s) exceeded the margin of error (including 0 an...
  FAILED test/test_nn.py::TestNN::test_Conv2d_deterministic_cudnn - RuntimeError: not supported yet
  FAILED test/test_nn.py::TestNN::test_ConvTranspose2d_groups_cuda_fp32 - RuntimeError: cuDNN error: CUDNN_STATUS_BAD_PARAM
  FAILED test/test_nn.py::TestNN::test_ConvTranspose2d_groups_cuda_tf32 - RuntimeError: cuDNN error: CUDNN_STATUS_BAD_PARAM
  ```

Although this is not a complete implementation of cuDNN v8 API binding, I still want to merge this first. This would allow me to do small and incremental work, for the ease of development and review.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51390

Reviewed By: malfet

Differential Revision: D28513167

Pulled By: ngimel

fbshipit-source-id: 9cc20c9dec5bbbcb1f94ac9e0f59b10c34f62740
2021-05-19 12:54:09 -07:00
peter
432676599c Stop installing libuv on Windows (#51936)
Summary:
Fixes #{issue number}
gunandrose4u

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51936

Reviewed By: malfet

Differential Revision: D28467662

Pulled By: seemethere

fbshipit-source-id: 28d203ee3af13d6a3158f188c2e889e310ee6010
2021-05-17 08:52:29 -07:00
Ilia Cherniavskii
6997e7bd39 Update Kineto submodule (#58179)
Summary:
Update Kineto submodule, minor API changes

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58179

Test Plan: CI

Reviewed By: gdankel

Differential Revision: D28391369

Pulled By: ilia-cher

fbshipit-source-id: 61fbf63d9ec2db66fac203944679e4b99cb0d568
2021-05-13 04:03:04 -07:00
Ilia Cherniavskii
c714596027 [kineto] Update Kineto submodule, cupti library paths (#57789)
Summary:
Update kineto submodule, improve cupti detection

Pull Request resolved: https://github.com/pytorch/pytorch/pull/57789

Test Plan: CI

Reviewed By: ngimel

Differential Revision: D28297175

Pulled By: ilia-cher

fbshipit-source-id: 5895270fae160097ae8872a592984d0e4a1b187b
2021-05-10 19:15:59 -07:00
Ilia Cherniavskii
65fad0ebd2 Expand Kineto platform support (ci-all) (#56323)
Summary:
Expanding support to all builds

Pull Request resolved: https://github.com/pytorch/pytorch/pull/56323

Test Plan: CI

Reviewed By: malfet

Differential Revision: D28171478

Pulled By: ilia-cher

fbshipit-source-id: 16bc752d1be3cbaeda5316f5d8a687ae05a83d22
2021-05-05 15:00:01 -07:00
davidriazati@fb.com
c44cbc63cc Ignore more compiler warnings, unify WERROR options (#56630)
Summary:
This adds some more compiler warnings ignores for everything that happens on a standard CPU build (CUDA builds still have a bunch of warnings so we can't turn on `-Werror` everywhere yet).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56630

Pulled By: driazati

Reviewed By: malfet

Differential Revision: D28005063

fbshipit-source-id: 541ed415eb0470ddf7e08c22c5eb6da9db26e9a0
2021-04-29 21:20:29 -07:00
davidriazati@fb.com
4b96fc060b Remove distutils (#57040)
Summary:
[distutils](https://docs.python.org/3/library/distutils.html) is on its way out: it will be deprecated-on-import for Python 3.10+ and removed in Python 3.12 (see [PEP 632](https://www.python.org/dev/peps/pep-0632/)). There's no reason for us to keep it around since all the functionality we want from it can be found in `setuptools` / `sysconfig`. `setuptools` includes a copy of most of `distutils` (which is fine to use according to the PEP) and uses it under the hood, so this PR also uses that in some places.
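For illustration, a hedged sketch of the kind of lookups that move from `distutils` to the stdlib `sysconfig` module (both calls are standard-library APIs):

```python
import sysconfig

# Location of Python.h -- formerly distutils.sysconfig.get_python_inc().
include_dir = sysconfig.get_paths()["include"]

# Platform-specific extension-module suffix -- formerly fetched via
# distutils.sysconfig.get_config_var("EXT_SUFFIX").
ext_suffix = sysconfig.get_config_var("EXT_SUFFIX")

print(include_dir, ext_suffix)
```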

Fixes #56527
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57040

Pulled By: driazati

Reviewed By: nikithamalgifb

Differential Revision: D28051356

fbshipit-source-id: 1ca312219032540e755593e50da0c9e23c62d720
2021-04-29 12:10:11 -07:00
davidriazati@fb.com
d1b6383d65 Hide warnings for deprecated quantization APIs (#56291)
Summary:
These have a tracking task to actually fix them but in the meantime they
should not be clogging up everyone's build output (see #55952).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56291

Pulled By: driazati

Reviewed By: bertmaher

Differential Revision: D27830229

fbshipit-source-id: f1e5d6e9b2c63d4a4ad99a1744a520f8c681c22b
2021-04-19 11:11:33 -07:00
Jeff Daily
e1752ffa04 [reland][ROCm] use hiprtc precompiled header (#55965)
Summary:
Revert "Revert D27449031 (2a7df657fe): [pytorch][PR] [ROCm] use hiprtc precompiled header".  Reland PR https://github.com/pytorch/pytorch/issues/54350.

This reverts commit 204ac21bf1.

The original PR was reverted under suspicion that it was causing CI instability, but it was instead due to a hardware failure.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/55965

Reviewed By: jbschlosser

Differential Revision: D27755907

Pulled By: malfet

fbshipit-source-id: 75bf0b9d888df3dee62f00a366b1123757e0474e
2021-04-15 15:47:56 -07:00
Eddie Yan
81f181567a Add USE_MAGMA build flag (#55994)
Summary:
Many model pipelines/workflows don't use MAGMA even though it is included in the build by default. Leaving MAGMA kernels out of the build can save 60+MB of GPU memory when loading `libtorch_cuda.so` (tested on V100, current upstream master).

A current sharp corner of this flag is that toggling it when rebuilding requires `torch/include/THC/THCGeneral.h` to be *manually* deleted by the user, as even running `make clean` or `setup.py` with `--cmake` does not properly regenerate it with the appropriate substitution for `#cmakedefine USE_MAGMA`. Is there a way to force the regeneration of the header during a rebuild?

CC malfet ptrblck

Pull Request resolved: https://github.com/pytorch/pytorch/pull/55994

Reviewed By: mruberry

Differential Revision: D27766287

Pulled By: malfet

fbshipit-source-id: 93deca57befa0febb9c5b7875ecf0015c547d421
2021-04-15 00:43:12 -07:00
Alexander Golynski
204ac21bf1 Revert D27449031: [pytorch][PR] [ROCm] use hiprtc precompiled header
Test Plan: revert-hammer

Differential Revision:
D27449031 (2a7df657fe)

Original commit changeset: 81a8d7847a47

fbshipit-source-id: b7b970c8ea4110357fba3ad4d52a86fa5641d90c
2021-04-01 06:42:04 -07:00
Jeff Daily
2a7df657fe [ROCm] use hiprtc precompiled header (#54350)
Summary:
HIP's runtime compiler (hiprtc) is adding support for precompiled HIP headers in the ROCm 4.2 release.  Conditionally add support for this feature.  Using this feature will improve the ROCm torch wheel user experience; users will no longer need to install HIP headers separately to use torch JIT features.

The use of this feature is conditionalized on a new ROCM_VERSION macro.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/54350

Reviewed By: H-Huang

Differential Revision: D27449031

Pulled By: malfet

fbshipit-source-id: 81a8d7847a47ce2bb253d1ea58740ef66ed154a3
2021-03-31 13:36:50 -07:00
Shruti Ramesh
f1f3c8b0fa Adding PyTorch + DNNL + AMD BLIS path (#54953)
Summary:
These changes provide the user with an additional option to choose the DNNL+BLIS path for PyTorch.

This assumes BLIS is already downloaded or built from source, with the library file available at $BLIS_HOME/lib/libblis.so and the include files available at $BLIS_HOME/include/blis/blis.h and $BLIS_HOME/include/blis/cblas.h.

Export the variables below to build PyTorch with MKLDNN+BLIS, then proceed with the regular installation procedure:
$ export BLIS_HOME=path-to-BLIS
$ export PATH=$BLIS_HOME/include/blis:$PATH LD_LIBRARY_PATH=$BLIS_HOME/lib:$LD_LIBRARY_PATH
$ export BLAS=BLIS USE_MKLDNN_CBLAS=ON WITH_BLAS=blis
$ python setup.py install

A CPU-only Dockerfile to build PyTorch with AMD BLIS is available at docker/cpu-blis/Dockerfile.
Example command line to build using the Dockerfile:
$ sudo DOCKER_BUILDKIT=1 docker build . -t docker-image-repo-name
Example command line to run the built docker container:
$ sudo docker run --name container-name -it docker-image-repo-name

Fixes #{issue number}

Pull Request resolved: https://github.com/pytorch/pytorch/pull/54953

Reviewed By: glaringlee

Differential Revision: D27466799

Pulled By: malfet

fbshipit-source-id: e03bae9561be3a67429df3b1be95a79005c63050
2021-03-31 10:40:25 -07:00
Jeff Daily
1dffbe759b [ROCm] utilize PUBLIC vs PRIVATE linking to avoid incorrect dependencies (#54727)
Summary:
Fixes the build of projects that depend on torch, such as torchaudio.  Otherwise torchaudio will complain that gloo_hip is missing.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/54727

Reviewed By: H-Huang

Differential Revision: D27361513

Pulled By: ezyang

fbshipit-source-id: 714cc2db23e7adf3e89303e941b78c27625b9460
2021-03-30 19:22:56 -07:00
Michael Melesse
2620bce42a [ROCM] load only hipfft separately past rocm4.1 (#54349)
Summary:
This PR is a follow up to https://github.com/pytorch/pytorch/pull/53408.

It only loads hipfft if the version is ROCm 4.1 or later, and stops loading rocfft. This was done to resolve some issues observed in our internal CI due to conflicts.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/54349

Reviewed By: ezyang

Differential Revision: D27374252

Pulled By: ngimel

fbshipit-source-id: 724e80df5011ea8fabd81739e18ae8a13d3a7ea0
2021-03-26 19:54:25 -07:00
Michael Melesse
4c1af249fb [ROCM] load hipfft separately from rocfft (#53408)
Summary:
This PR makes changes to how hipfft is loaded in PyTorch. As of ROCm 4.1, hipfft is packaged in a library separate from rocfft.

We check the ROCm version, and if it is ROCm 4.1 or later we load hipfft in addition to rocfft.
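A hedged sketch of the version gate this commit describes (the helper is illustrative; the real logic lives in PyTorch's library-loading code):

```python
def fft_libraries(rocm_version):
    # Before ROCm 4.1 the FFT entry points all live in rocfft; from
    # 4.1 onward hipfft ships as its own library and is loaded too.
    if rocm_version >= (4, 1):
        return ["rocfft", "hipfft"]
    return ["rocfft"]

assert fft_libraries((4, 0)) == ["rocfft"]
assert fft_libraries((4, 1)) == ["rocfft", "hipfft"]
```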

Pull Request resolved: https://github.com/pytorch/pytorch/pull/53408

Reviewed By: albanD

Differential Revision: D26952702

Pulled By: malfet

fbshipit-source-id: f42be304b587c060816e39d36f5c1a2cdc37bfab
2021-03-11 09:18:33 -08:00
Ilia Cherniavskii
795ed5ca3f Enable Kineto in CPU builds (#53174)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53174

Enable Kineto also in the CPU builds (non-mobile, non-Windows(atm))

Test Plan: CI

Reviewed By: gdankel

Differential Revision: D26776112

Pulled By: ilia-cher

fbshipit-source-id: 8733f65c2993105136c853f2a7b6e497d0fa53bf
2021-03-04 19:15:52 -08:00
Ashkan Aliabadi
e5ecd1ddf8 [Vulkan]Fix build warnings-treated-as-error on Linux. (#52781)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/52781

Test Plan: Imported from OSS

Reviewed By: SS-JIA

Differential Revision: D26669311

Pulled By: AshkanAliabadi

fbshipit-source-id: 78b08d0b264d4d5cf8af964c589b9b7d0ddc7311
2021-03-03 13:48:43 -08:00
Jeff Daily
d02ea9a141 [ROCm] add hipMAGMA support (#51238)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/48831.

- CI image is updated to build hipMAGMA from source and set env MAGMA_HOME.
- CMake is updated to separate different requirements for CUDA versus ROCm MAGMA.
- Some unit tests that become enabled with MAGMA are currently skipped for ROCm due to failures.  Fixing these failures will be follow-on work.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51238

Reviewed By: ngimel

Differential Revision: D26184918

Pulled By: malfet

fbshipit-source-id: ada632f1ae7b413e8cae6543fe931dcd46985821
2021-02-01 22:09:33 -08:00
Luca Wehrstedt
b77f72b5a0 Enable TensorPipe's SHM transport (#50760)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50760

The SHM transport uses shared-memory-backed ringbuffers to transfer small payloads between processes on the same machine.

It was disabled in v1.6 due to a CMake mishap but we've since realized that it also doesn't work that well in docker and other setups. Enabling it here to see whether CircleCI fails.
ghstack-source-id: 120470890

Test Plan: Exported three times to CircleCI with tests consistently passing

Reviewed By: mrshenli

Differential Revision: D23814828

fbshipit-source-id: f355cb6515776debad536924de4f4d3fbb05a874
2021-01-27 11:45:09 -08:00
Jeff Daily
b2e5617553 [ROCm] rename HIP_HCC_FLAGS to HIP_CLANG_FLAGS (#50917)
Summary:
ROCm 3.5 replaced hcc with hip-clang and deprecated HIP_HCC_FLAGS.
HIP_CLANG_FLAGS should be used moving forward. HIP_HCC_FLAGS will
be removed soon.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50917

Reviewed By: ejguan

Differential Revision: D26008094

Pulled By: walterddr

fbshipit-source-id: cfec4f96fbd9bd338834a841c37267f6a4703cab
2021-01-22 07:24:05 -08:00
Ilia Cherniavskii
e34992ebee Set USE_KINETO=1 (#49897)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49897

Resend of https://github.com/pytorch/pytorch/pull/49201

Test Plan: see 49201

Reviewed By: malfet

Differential Revision: D25717102

Pulled By: ilia-cher

fbshipit-source-id: 5e794a7f5fe160ca64ac9d190c4fd3e8f1e443e6
2021-01-22 00:09:21 -08:00
Luca Wehrstedt
112a583467 Enable TensorPipe's CMA channel (#50759)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50759

ghstack-source-id: 120032288

Test Plan: Exported to CircleCI and tested

Reviewed By: mrshenli

Differential Revision: D25959326

fbshipit-source-id: be6df209ff3a79a8961acbda64ee7805a5c434a9
2021-01-20 10:53:47 -08:00
Ilia Cherniavskii
72b00a8a52 Revert D25480770: Set USE_KINETO=1
Test Plan: revert-hammer

Differential Revision:
D25480770 (1a92802bde)

Original commit changeset: 037cd774f554

fbshipit-source-id: 6a6062195033ca91fcc0cfa1e890e47efc774ac1
2020-12-18 07:06:28 -08:00
Ilia Cherniavskii
1a92802bde Set USE_KINETO=1 (#49201)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49201

This unblocks the kineto profiler for the 1.8 release.
This PR supersedes https://github.com/pytorch/pytorch/pull/48391
Note: this will somewhat increase the size of Linux server binaries, because
we add libkineto.a and libcupti_static.a:
-rw-r--r-- 1 jenkins jenkins 1107502 Dec 10 21:16 build/lib/libkineto.a
-rw-r--r-- 1 root root 13699658 Nov 13  2019 /usr/local/cuda/lib64/libcupti_static.a

Test Plan:
CI
https://github.com/pytorch/pytorch/pull/48391

Imported from OSS

Reviewed By: ngimel

Differential Revision: D25480770

fbshipit-source-id: 037cd774f5547d9918d6055ef5cc952a54e48e4c
2020-12-18 01:48:10 -08:00
Abdelrauf
95a1725a4a Vsx initial support issue27678 (#41541)
Summary:
### Pytorch Vec256 ppc64le support
implemented types:

- double
- float
- int16
- int32
- int64
- qint32
- qint8
- quint8
- complex_float
- complex_double

Notes:
All basic vector operations are implemented. There are a few known problems:
- minimum/maximum NaN propagation for ppc64le is missing and was not checked
- complex multiplication, division, sqrt, and abs are implemented as in PyTorch's x86 code; they can overflow and have more precision problems than the std ones. That's why they were either excluded or tested over a smaller domain range
- precision of the implemented float math functions
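As a concrete illustration of the overflow problem noted for `abs` (plain Python floats here, not the Vec256 code): the textbook |a+bi| = sqrt(a*a + b*b) overflows in its intermediates even when the result is representable.

```python
import math

a, b = 1e200, 1e200  # components of a complex number a + b*i

# The intermediate a*a + b*b overflows to infinity...
assert math.isinf(a * a + b * b)

# ...while a rescaling implementation such as math.hypot stays finite,
# which is the kind of care the fast vectorized path trades away.
assert math.isfinite(math.hypot(a, b))
```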

~~Besides, I added CPU_CAPABILITY for power, but because of quantization errors for DEFAULT I had to undef it and use vsx for DEFAULT too~~

#### Details
##### Supported math functions

A `+` sign means vectorized; a `-` sign means missing (implementation notes are added inside parentheses). Example: `-(both)` means it is also missing on the x86 side.
`f(func_name)` means vectorization is implemented using `func_name`.
`sleef` means the call is redirected to the Sleef library.
`unsupported` means the function is not supported for that type.
function_name | float | double | complex float | complex double
|-- | -- | -- | -- | --|
acos | sleef | sleef | f(asin) | f(asin)
asin | sleef | sleef | +(pytorch impl) | +(pytorch impl)
atan | sleef | sleef | f(log) | f(log)
atan2 | sleef | sleef | unsupported | unsupported
cos | +(ppc64le: avx_mathfun) | sleef | -(both) | -(both)
cosh | f(exp)   | -(both) | -(both) |
erf | sleef | sleef | unsupported | unsupported
erfc | sleef | sleef | unsupported | unsupported
erfinv | - (both) | - (both) | unsupported | unsupported
exp | + | sleef | - (x86:f()) | - (x86:f())
expm1 | f(exp)  | sleef | unsupported | unsupported
lgamma | sleef | sleef |   |
log | +  | sleef | -(both) | -(both)
log10 | f(log)  | sleef | f(log) | f(log)
log1p | f(log)  | sleef | unsupported | unsupported
log2 | f(log)  | sleef | f(log) | f(log)
pow | + f(exp)  | sleef | -(both) | -(both)
sin | +(ppc64le: avx_mathfun) | sleef | -(both) | -(both)
sinh | f(exp)  | sleef | -(both) | -(both)
tan | sleef | sleef | -(both) | -(both)
tanh | f(exp)  | sleef | -(both) | -(both)
hypot | sleef | sleef | -(both) | -(both)
nextafter | sleef  | sleef | -(both) | -(both)
fmod | sleef | sleef | -(both) | -(both)

[Vec256 test cases PR #42685](https://github.com/pytorch/pytorch/pull/42685)
Current list:

- [x] Blends
- [x] Memory: UnAlignedLoadStore
- [x] Arithmetics: Plus, Minus, Multiplication, Division
- [x] Bitwise: BitAnd, BitOr, BitXor
- [x] Comparison: Equal, NotEqual, Greater, Less, GreaterEqual, LessEqual
- [x] MinMax: Minimum, Maximum, ClampMin, ClampMax, Clamp
- [x] SignManipulation: Absolute, Negate
- [x] Interleave: Interleave, DeInterleave
- [x] Rounding: Round, Ceil, Floor, Trunc
- [x] Mask: ZeroMask
- [x] SqrtAndReciprocal: Sqrt, RSqrt, Reciprocal
- [x] Trigonometric: Sin, Cos, Tan
- [x] Hyperbolic: Tanh, Sinh, Cosh
- [x] InverseTrigonometric: Asin, ACos, ATan, ATan2
- [x] Logarithm: Log, Log2, Log10, Log1p
- [x] Exponents: Exp, Expm1
- [x] ErrorFunctions: Erf, Erfc, Erfinv
- [x] Pow: Pow
- [x] LGamma: LGamma
- [x] Quantization: quantize, dequantize, requantize_from_int
- [x] Quantization: widening_subtract, relu, relu6
Missing:
- [ ] Constructors, initializations
- [ ] Conversion, Cast
- [ ] Additional: imag, conj, angle (note: imag and conj only checked for float complex)

#### Notes on tests and testing framework
- some math functions are tested within domain range
- the testing framework mostly tests randomly generated values against the std implementation, within either the general domain or, for some math functions, the implementation's domain
- some functions are tested against a local reference version. ~~For example, std::round and the vector version of round differ, so round was tested against the local version~~
- round was tested against PyTorch's at::native::round_impl. ~~For the double type on **VSX, vec_round failed for (even)+0.5 values**~~. This was solved by using vec_rint
- ~~**complex types are not tested**~~  **After enabling complex testing, some of the complex functions failed for VSX, and for x86 AVX as well, due to precision and domain issues. I will either test them against a local implementation or check within the accepted domain**
- ~~quantization is not tested~~  Added tests for the quantize, dequantize, requantize_from_int, relu, relu6, and widening_subtract functions
- the testing framework should be improved further
- ~~For now `-DBUILD_MOBILE_TEST=ON` will be used for Vec256Test too~~
Vec256 Test cases will be built for each CPU_CAPABILITY
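The random, domain-restricted testing strategy described in the notes above can be sketched in Python (a hypothetical harness for illustration only, not the actual C++ test code; `asin_via_atan2` is an invented stand-in for a vectorized implementation):

```python
import math
import random

def check_against_std(candidate_fn, ref_fn, domain, n=1000, tol=1e-6):
    """Randomly sample the given domain and compare a candidate
    implementation against the reference, mirroring how the Vec256
    tests restrict some math functions to their valid domain."""
    lo, hi = domain
    for _ in range(n):
        x = random.uniform(lo, hi)
        got, want = candidate_fn(x), ref_fn(x)
        assert math.isclose(got, want, rel_tol=tol, abs_tol=tol), (x, got, want)

# A stand-in "vectorized" asin built from atan2, checked only on [-1, 1]
# (the function's domain), analogous to the domain-restricted tests above.
def asin_via_atan2(x):
    return math.atan2(x, math.sqrt(max(0.0, 1.0 - x * x)))

check_against_std(asin_via_atan2, math.asin, domain=(-1.0, 1.0))
```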

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41541

Reviewed By: zhangguanheng66

Differential Revision: D23922049

Pulled By: VitalyFedyunin

fbshipit-source-id: bca25110afccecbb362cea57c705f3ce02f26098
2020-12-10 13:42:39 -08:00
peterjc123
5450614cf6 Correctly apply WIN32_LEAN_AND_MEAN to the whole repo (#49025)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/48895

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49025

Reviewed By: zhangguanheng66

Differential Revision: D25399912

Pulled By: ezyang

fbshipit-source-id: 9b7225b0e43511e0b8981c39035d814a4406c523
2020-12-08 19:38:23 -08:00
Rong Rong
b89c328493 Add fftw3 cmake as alternative for FFT/DFT (#48808)
Summary:
Added CMake discovery for fftw3 in `Dependencies.cmake`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48808

Reviewed By: janeyx99

Differential Revision: D25375320

Pulled By: walterddr

fbshipit-source-id: cde3afc51eef9c621c7d19be7ad7573fc8b838c2
2020-12-08 10:35:33 -08:00
Jithun Nair
5f62308739 Hipify revamp [REDUX] (#48715)
Summary:
[Refiled version of earlier PR https://github.com/pytorch/pytorch/issues/45451]

This PR revamps the hipify module in PyTorch to overcome a long list of shortcomings in the original implementation. However, these improvements are applied only when using hipify to build PyTorch extensions, not for PyTorch or Caffe2 itself.

Correspondingly, changes are made to cpp_extension.py to match these improvements.

The list of improvements to hipify is as follows:

1. Hipify files in the same directory as the original file, unless there's a "cuda" subdirectory in the original file path, in which case the hipified file will be in the corresponding file path with "hip" subdirectory instead of "cuda".
2. Never hipify the file in-place if changes are introduced due to hipification i.e. always ensure the hipified file either resides in a different folder or has a different filename compared to the original file.
3. Prevent re-hipification of already hipified files. This avoids creation of unnecessary "hip/hip" etc. subdirectories and additional files which have no actual use.
4. Do not write out hipified versions of files if they are identical to the original file. This results in a cleaner output directory, with minimal number of hipified files created.
5. Update header rewrite logic so that it accounts for the previous improvement.
6. Update header rewrite logic so it respects the rules for finding header files depending on whether "" or <> is used.
7. Return a dictionary of mappings of original file paths to hipified file paths from hipify function.
8. Introduce a version for hipify module to allow extensions to contain back-compatible code that targets a specific point in PyTorch where the hipify functionality changed.
9. Update cuda_to_hip_mappings.py to account for the ROCm component subdirectories inside /opt/rocm/include. This also results in cleanup of the Caffe2_HIP_INCLUDE path to remove unnecessary additions to the include path.
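Improvement 1's path-mapping rule can be sketched as follows (a hypothetical helper for illustration only, not the actual hipify code; the `_hip` filename suffix is an invented example of improvement 2's rename-to-avoid-overwrite rule):

```python
import posixpath

def hipified_path(src):
    """Place the hipified file next to the original, unless the path
    contains a "cuda" directory, which is replaced by "hip"."""
    parts = src.split("/")
    if "cuda" in parts[:-1]:
        parts[parts.index("cuda")] = "hip"
        return "/".join(parts)
    # Same directory: pick a distinct name so the original is never
    # overwritten in place (the suffix choice here is illustrative only).
    root, ext = posixpath.splitext(src)
    return root + "_hip" + ext

print(hipified_path("ext/cuda/kernels.cu"))  # ext/hip/kernels.cu
print(hipified_path("ext/kernels.cu"))       # ext/kernels_hip.cu
```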

The list of changes to cpp_extension.py is as follows:

1. Call hipify when building a CUDAExtension for ROCm.
2. Prune the list of source files to CUDAExtension to include only the hipified versions of any source files in the list (if both original and hipified versions of the source file are in the list)
3. Add subdirectories of /opt/rocm/include to the include path for extensions, so that ROCm headers for subcomponent libraries are found automatically

cc jeffdaily sunway513 ezyang

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48715

Reviewed By: bdhirsh

Differential Revision: D25272824

Pulled By: ezyang

fbshipit-source-id: 8bba68b27e41ca742781e1c4d7b07c6f985f040e
2020-12-02 18:03:23 -08:00
Daily, Jeff
7f869dca70 [ROCm] update debug flags (#46717)
Summary:
Improves support for rocgdb when setting DEBUG=1 and building for ROCm.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46717

Reviewed By: mrshenli

Differential Revision: D25171544

Pulled By: malfet

fbshipit-source-id: b4699ba2277dcb89f07efb86f7153fae82a80dc3
2020-11-30 15:27:30 -08:00
Rong Rong
af520d9d04 [cmake] clean up blas discovery (#47940)
Summary:
remove useless variable changes in blas discovery

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47940

Reviewed By: malfet

Differential Revision: D25122228

Pulled By: walterddr

fbshipit-source-id: 12bc3ce9e4f89a72b6a92c10d14024e5941f4b96
2020-11-30 10:29:50 -08:00
Nikita Shulga
e7ca62be08 Fix PyTorch compilation on Apple M1 (#48275)
Summary:
Update cpuinfo and sleef to contain build fixes for M1

Fixes https://github.com/pytorch/pytorch/issues/48145

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48275

Reviewed By: walterddr

Differential Revision: D25135153

Pulled By: malfet

fbshipit-source-id: 2a82e14407d6f40c7dacd11109a8499d808c8ec1
2020-11-26 07:08:33 -08:00
Ilia Cherniavskii
f7a8bf2855 Use libkineto in profiler (#46470)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46470

Adding ability to use Kineto (CUPTI) to profile CUDA kernels

Test Plan:
USE_KINETO=1 USE_CUDA=1 USE_MKLDNN=1 BLAS=MKL BUILD_BINARY=1 python setup.py develop install
python test/test_profiler.py

python test/test_autograd.py -k test_profile
python test/test_autograd.py -k test_record

```
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                                   Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg     Self CUDA   Self CUDA %    CUDA total  CUDA time avg    # of Calls
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                       Memcpy HtoD (Pageable -> Device)         0.00%       0.000us         0.00%       0.000us       0.000us       2.000us        33.33%       2.000us       1.000us             2
                                      sgemm_32x32x32_NN         0.00%       0.000us         0.00%       0.000us       0.000us       2.000us        33.33%       2.000us       2.000us             1
void at::native::vectorized_elementwise_kernel<4, at...         0.00%       0.000us         0.00%       0.000us       0.000us       1.000us        16.67%       1.000us       1.000us             1
                       Memcpy DtoH (Device -> Pageable)         0.00%       0.000us         0.00%       0.000us       0.000us       1.000us        16.67%       1.000us       1.000us             1
                                            aten::randn         5.17%      74.000us         6.71%      96.000us      48.000us       0.000us         0.00%       0.000us       0.000us             2
                                            aten::empty         1.33%      19.000us         1.33%      19.000us       4.750us       0.000us         0.00%       0.000us       0.000us             4
                                          aten::normal_         1.05%      15.000us         1.05%      15.000us       7.500us       0.000us         0.00%       0.000us       0.000us             2
                                               aten::to        77.90%       1.114ms        91.61%       1.310ms     436.667us       0.000us         0.00%       3.000us       1.000us             3
                                    aten::empty_strided         2.52%      36.000us         2.52%      36.000us      12.000us       0.000us         0.00%       0.000us       0.000us             3
                                            aten::copy_         2.73%      39.000us        11.19%     160.000us      53.333us       0.000us         0.00%       3.000us       1.000us             3
                                        cudaMemcpyAsync         4.34%      62.000us         4.34%      62.000us      20.667us       0.000us         0.00%       0.000us       0.000us             3
                                  cudaStreamSynchronize         1.61%      23.000us         1.61%      23.000us       7.667us       0.000us         0.00%       0.000us       0.000us             3
                                               aten::mm         0.21%       3.000us         7.20%     103.000us     103.000us       0.000us         0.00%       2.000us       2.000us             1
                                           aten::stride         0.21%       3.000us         0.21%       3.000us       1.000us       0.000us         0.00%       0.000us       0.000us             3
                                       cudaLaunchKernel         2.45%      35.000us         2.45%      35.000us      17.500us       0.000us         0.00%       0.000us       0.000us             2
                                              aten::add         0.49%       7.000us         4.27%      61.000us      61.000us       0.000us         0.00%       1.000us       1.000us             1
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
```

benchmark: https://gist.github.com/ilia-cher/a5a9eb6b68504542a3cad5150fc39b1a

Reviewed By: Chillee

Differential Revision: D25142223

Pulled By: ilia-cher

fbshipit-source-id: b0dff46c28da5fb0a8e01cf548aa4f2b723fde80
2020-11-25 04:32:16 -08:00
Ilia Cherniavskii
f2da18af14 Add USE_KINETO build option (#45888)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45888

Adding USE_LIBKINETO build option

Test Plan:
USE_KINETO=1 USE_CUDA=1 USE_MKLDNN=1 BLAS=MKL BUILD_BINARY=1 python
setup.py develop install --cmake

Reviewed By: Chillee

Differential Revision: D25142221

Pulled By: ilia-cher

fbshipit-source-id: d1634a8f9599604ff511fac59b9072854289510c
2020-11-21 20:20:32 -08:00
Nikita Shulga
8af9f2cc23 Revert D24924736: [pytorch][PR] Hipify revamp
Test Plan: revert-hammer

Differential Revision:
D24924736 (10b490a3e0)

Original commit changeset: 4af42b8ff4f2

fbshipit-source-id: 7f8f90d55d8a69a2890ec73622fcea559189e381
2020-11-18 11:48:30 -08:00
Jithun Nair
10b490a3e0 Hipify revamp (#45451)
Summary:
This PR revamps the hipify module in PyTorch to overcome a long list of shortcomings in the original implementation. However, these improvements are applied only when using hipify to build PyTorch extensions, **not for PyTorch or Caffe2 itself**.

Correspondingly, changes are made to `cpp_extension.py` to match these improvements.

The list of improvements to hipify is as follows:

1. Hipify files in the same directory as the original file, unless there's a "cuda" subdirectory in the original file path, in which case the hipified file will be in the corresponding file path with "hip" subdirectory instead of "cuda".
2. Never hipify the file in-place if changes are introduced due to hipification i.e. always ensure the hipified file either resides in a different folder or has a different filename compared to the original file.
3. Prevent re-hipification of already hipified files. This avoids creation of unnecessary "hip/hip" etc. subdirectories and additional files which have no actual use.
4. Do not write out hipified versions of files if they are identical to the original file. This results in a cleaner output directory, with minimal number of hipified files created.
5. Update header rewrite logic so that it accounts for the previous improvement.
6. Update header rewrite logic so it respects the rules for finding header files depending on whether `""` or `<>` is used.
7. Return a dictionary of mappings of original file paths to hipified file paths from `hipify` function.
8. Introduce a version for hipify module to allow extensions to contain back-compatible code that targets a specific point in PyTorch where the hipify functionality changed.
9. Update `cuda_to_hip_mappings.py` to account for the ROCm component subdirectories inside `/opt/rocm/include`. This also results in cleanup of the `Caffe2_HIP_INCLUDE` path to remove unnecessary additions to the include path.

The list of changes to `cpp_extension.py` is as follows:
1. Call `hipify` when building a CUDAExtension for ROCm.
2. Prune the list of source files to CUDAExtension to include only the hipified versions of any source files in the list (if both original and hipified versions of the source file are in the list)
3. Add subdirectories of /opt/rocm/include to the include path for extensions, so that ROCm headers for subcomponent libraries are found automatically

cc jeffdaily sunway513 hgaspar lcskrishna ashishfarmer

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45451

Reviewed By: ezyang

Differential Revision: D24924736

Pulled By: malfet

fbshipit-source-id: 4af42b8ff4f21c3782dedb8719b8f9f86b34bd2d
2020-11-18 08:37:49 -08:00
Rong Rong
7391edb591 [hotfix] fix misleadingly summary BLAS=MKL when there's no BLAS install (#47803)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/47803

Reviewed By: samestep

Differential Revision: D24907453

Pulled By: walterddr

fbshipit-source-id: a3e41041f6aa506b054eb0ffc61f8525ba02cbf1
2020-11-12 16:05:14 -08:00
Nikita Shulga
e8a73fbf34 Workaround PyTorch debug build crash using old GCC (#47805)
Summary:
gcc-7.4.x or older fails to compile XNNPACK in debug mode with an internal compiler error.
Work around this in the build script by passing the -O1 optimization flag to XNNPACK when compiling with older compilers.

Fixes https://github.com/pytorch/pytorch/issues/47292

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47805

Reviewed By: seemethere

Differential Revision: D24905758

Pulled By: malfet

fbshipit-source-id: 93f4e3b3b5c10b69734627c50e36b2eb544699c8
2020-11-11 16:33:47 -08:00
Nikita Shulga
83d358da7c Fix LAPACK functionality detection from static OpenBLAS (#46710)
Summary:
BLAS `sgemm_` only depends on pthreads, but LAPACK `cheev_` also depends on libm

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46710

Reviewed By: walterddr

Differential Revision: D24476082

Pulled By: malfet

fbshipit-source-id: e0b91116f18bbcdabb1f99c2ec9d98283df4393f
2020-10-26 08:34:28 -07:00
peter
89f368bef8 Enable XNNPACK on Windows & Update XNNPACK (#45830)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/44283.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45830

Reviewed By: zhangguanheng66

Differential Revision: D24504302

Pulled By: ezyang

fbshipit-source-id: ab28088a4fbb553a27ed7c8da87ec7b40c73c2f1
2020-10-23 14:17:45 -07:00
Shen Li
eadc59df55 Enable TP_USE_CUDA and TP_ENABLE_CUDA_IPC (#46523)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/46523

Test Plan: Imported from OSS

Reviewed By: beauby

Differential Revision: D24385830

Pulled By: mrshenli

fbshipit-source-id: 59a40843b4dc1585e176062476da9ab74c84179b
2020-10-19 09:05:00 -07:00
peterjc123
bb99bea774 Compress NVCC flags for Windows (#45842)
Summary:
Fixes #{issue number}
This makes the command line shorter.
Also updates `randomtemp` in which the previous version has a limitation that the length of the argument cannot exceed 260.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45842

Reviewed By: albanD

Differential Revision: D24137088

Pulled By: ezyang

fbshipit-source-id: f0b4240735306e302eb3887f54a2b7af83c9f5dc
2020-10-07 08:39:15 -07:00
Xiang Gao
2fa062002e CUDA BFloat16 infrastructure (#44925)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44925

Reviewed By: agolynski

Differential Revision: D23783910

Pulled By: ngimel

fbshipit-source-id: dacac2ad87d58056bdc68bfe0b7ab1de5c2af0d8
2020-10-02 16:21:30 -07:00
gunandrose4u
f07ac6a004 Fix Windows build failure after DDP PR merged (#45335)
Summary:
Fixes #{issue number}
This is a resubmit of PR https://github.com/pytorch/pytorch/issues/42897, together with a fix for the Windows build issue introduced by PR https://github.com/pytorch/pytorch/issues/44344.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45335

Reviewed By: zou3519

Differential Revision: D23931471

Pulled By: mrshenli

fbshipit-source-id: f49b5a114944c1450b32934b3292170be064f494
2020-09-25 12:37:50 -07:00
Mike Ruberry
103fa3894a Revert D23841786: [pytorch][PR] Enable distributed package on windows, Gloo backend supported only
Test Plan: revert-hammer

Differential Revision:
D23841786 (0122299f9b)

Original commit changeset: 334ba1ed73ef

fbshipit-source-id: ec95432f9957df56a5a04e52661f5db920b7f57f
2020-09-24 22:44:33 -07:00
gunandrose4u
0122299f9b Enable distributed package on windows, Gloo backend supported only (#42897)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/42095

The test case part will be committed to this PR later

mrshenli, please help to review

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42897

Reviewed By: osalpekar

Differential Revision: D23841786

Pulled By: mrshenli

fbshipit-source-id: 334ba1ed73eff2f668857390fc32d1bc7f08e5f3
2020-09-24 21:13:55 -07:00
peter
ed862d3682 Split CUDA_NVCC_FLAGS by space (#44603)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/44599

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44603

Reviewed By: albanD

Differential Revision: D23692320

Pulled By: ezyang

fbshipit-source-id: 6a63d94ab8b88e7a82f9d65f03523d6ef639c754
2020-09-14 20:25:37 -07:00
Nikita Shulga
fc51047af5 Small fixes in Dependency.cmake and run_test.py (#44414)
Summary:
Do not add gencode flags to NVCC_FLAGS twice: they are first added in `cmake/public/cuda.cmake`, so there is no need to do it again in `cmake/Dependencies.cmake`.
Copy `additional_unittest_args` before appending local options to it in the `run_test()` method.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44414

Reviewed By: seemethere

Differential Revision: D23605733

Pulled By: malfet

fbshipit-source-id: 782a0da61650356a978a892fb03c66cb1a1ea26b
2020-09-09 15:09:33 -07:00
Parichay Kapoor
8ecfa9d9a2 [cmake] End support for python3.5 for pytorch (#43105)
Summary:
PyTorch uses f-strings in its Python code.
Python support for f-strings started with version 3.6, so using Python 3.5 or older fails the build with the latest release/master.
This patch checks the version of the Python used for the build and mandates that it be 3.6 or higher.
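The intent of the gate can be sketched in Python (an illustrative snippet only; the actual check is implemented in the CMake build scripts):

```python
import sys

MIN_PYTHON = (3, 6)  # first version with f-string support

if sys.version_info < MIN_PYTHON:
    raise RuntimeError(
        "Building PyTorch requires Python %d.%d or newer, "
        "found %d.%d" % (MIN_PYTHON + tuple(sys.version_info[:2]))
    )
```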

Signed-off-by: Parichay Kapoor <kparichay@gmail.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43105

Reviewed By: glaringlee

Differential Revision: D23301481

Pulled By: malfet

fbshipit-source-id: e9b4f7bffce7384c8ade3b7d131b10cf58f5e8a0
2020-08-25 09:42:42 -07:00
Luca Wehrstedt
c30bc6d4d7 Update TensorPipe submodule (#42522)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42522

Main changes:
- Consolidated CMake files to have a single entry point, rather than having a specialized one for PyTorch.
- Changed the way the preprocessor flags are provided, and changed their name.

There were a few instances in PyTorch's CMake files where we were directly adding TensorPipe's source directory as an include path, which however doesn't contain the auto-generated header we now added. We fix that by linking those targets against the `tensorpipe` CMake target instead, so that the include paths defined by TensorPipe, which contain that auto-generated header, are picked up.

I'm turning off SHM and CMA for now because they have never been covered by the CI. I'll enable them in a separate PR so that if they turn out to be flaky we can revert that change without reverting this one.

Test Plan: CI

Reviewed By: malfet

Differential Revision: D22959472

fbshipit-source-id: 1959a41c4a66ef78bf0f3bd5e3964969a2a1bf67
2020-08-06 02:14:58 -07:00
Edward Yang
352e15f1a2 Revert D22812445: Update TensorPipe submodule
Test Plan: revert-hammer

Differential Revision:
D22812445 (2335430086)

Original commit changeset: e6d824bb28f5

fbshipit-source-id: 606632a9aaf2513b5ac949e4d6687aa7563eae5d
2020-07-31 10:16:48 -07:00
Luca Wehrstedt
2335430086 Update TensorPipe submodule (#42225)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42225

Main changes:
- Consolidated CMake files to have a single entry point, rather than having a specialized one for PyTorch.
- Changed the way the preprocessor flags are provided, and changed their name.

There were a few instances in PyTorch's CMake files where we were directly adding TensorPipe's source directory as an include path, which however doesn't contain the auto-generated header we now added. We fix that by adding the `tensorpipe` CMake target as a dependency, so that the include paths defined by TensorPipe are used, which contain that auto-generated header.

I'm turning off SHM and CMA for now because they have never been covered by the CI. I'll enable them in a separate PR so that if they turn out to be flaky we can revert that change without reverting this one.

Test Plan: CircleCI is all green.

Reviewed By: beauby

Differential Revision: D22812445

fbshipit-source-id: e6d824bb28f5afe75fd765de0430968174f3531f
2020-07-30 02:32:52 -07:00
Alexander Grund
a4b831a86a Replace if(NOT ${var}) by if(NOT var) (#41924)
Summary:
As explained in https://github.com/pytorch/pytorch/issues/41922, using `if(NOT ${var})` is usually wrong and can lead to issues like the one reported there, where the condition is wrongly evaluated to FALSE instead of TRUE. Instead, the unevaluated variable name should be used in all cases; see the CMake documentation for details.

This fixes the `NOT ${var}` cases by using a simple regexp replacement. It seems `pybind11_PREFER_third_party` is the only variable really prone to causing an issue, as all the others are set. However, because CMake evaluates unquoted strings in `if` conditions as variable names, I recommend never using an unquoted `${var}` in an `if` condition. A similar regexp-based replacement could be done on the whole codebase, but as that produces a lot of changes I didn't include it now. Also, `if(${var})` will likely lead to a parser error if `var` is unset, instead of a wrong result.
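A minimal illustration of the pitfall, using a hypothetical variable name:

```cmake
# Problematic: ${MY_OPTION} is expanded *before* if() parses its arguments.
# If MY_OPTION is unset, this collapses to if(NOT ), which does not evaluate
# to TRUE as intended.
if(NOT ${MY_OPTION})
  message(STATUS "feature disabled?")
endif()

# Correct: pass the unevaluated variable name and let if() look it up.
# This is TRUE when MY_OPTION is unset, empty, or a false constant.
if(NOT MY_OPTION)
  message(STATUS "feature disabled")
endif()
```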

Fixes https://github.com/pytorch/pytorch/issues/41922

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41924

Reviewed By: seemethere

Differential Revision: D22700229

Pulled By: mrshenli

fbshipit-source-id: e2b3466039e4312887543c2e988270547a91c439
2020-07-23 15:49:20 -07:00
Anush Elangovan
c86699d425 [cmake] Use PROJECT_SOURCE_DIR instead of CMAKE_* (#41387)
Summary:
Add support for including PyTorch via an add_subdirectory().
This requires using PROJECT_* instead of CMAKE_* variables, since the latter refer to the top-most project including PyTorch.

TEST=add_subdirectory() a PyTorch checkout into a project and build.
There are still some hardcoded references to TORCH_SRC_DIR; I will fix them in a follow-on commit. For now you can create a symlink to <pytorch>/torch/ in your project.

Change-Id: Ic2a8aec3b08f64e2c23d9e79db83f14a0a896abc

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41387

Reviewed By: zhangguanheng66

Differential Revision: D22539944

Pulled By: ezyang

fbshipit-source-id: b7e9631021938255f0a6ea897a7abb061759093d
2020-07-15 11:09:05 -07:00
Alexander Grund
ac3542fa59 Define PSIMD_SOURCE_DIR when including FP16 (#41233)
Summary:
Avoids a superfluous redownload when *NNPACK is not used (e.g. on Power)

Example: https://powerci.osuosl.org/job/pytorch-master-nightly-py3-linux-ppc64le/1128/consoleFull
Search for "Downloading PSimd"

See also https://github.com/pytorch/pytorch/issues/41178

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41233

Differential Revision: D22488833

Pulled By: malfet

fbshipit-source-id: 637291419ddd3b2a8dc25e211a4ebbba955e5855
2020-07-10 16:55:10 -07:00
Kimish Patel
d6feb6141f [Vec256][neon] Add neon backend for vec256 (#39341)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39341

This PR introduces a NEON backend for the vec256 class for the float datatype.
For now only aarch64 is enabled, due to a few issues with enabling it on
32-bit aarch32.

Test Plan:
vec256_test

Imported from OSS

Differential Revision: D21822399

fbshipit-source-id: 3851c4336d93d1c359c85b38cf19904f82bc7b8d
2020-07-09 16:25:09 -07:00
Kimish Patel
bddba1e336 Add benchmark for add op. (#40059)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40059

This benchmark is added specifically for mobile, to see whether the compiler is
auto-vectorizing, in which case the NEON backend for vec256 provides no
advantage for the add op.

Test Plan:
CI

Imported from OSS

Differential Revision: D22055146

fbshipit-source-id: 43ba6c4ae57c6f05d84887c2750ce21ae1b0f0b5
2020-07-09 16:22:55 -07:00
Alexander Grund
7c29a4e66f Don't add NCCL dependency to gloo if system NCCL is used (#41180)
Summary:
This avoids what is currently only a cmake warning:
```
The dependency target "nccl_external" of target "gloo_cuda" does not exist.
Call Stack (most recent call first):
  CMakeLists.txt:411 (include)
```

This will become a real problem once policy CMP0046 is set, which will turn this warning into an error

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41180

Differential Revision: D22460623

Pulled By: malfet

fbshipit-source-id: 0222b12b435e5e2fdf2bc85752f95abba1e3d4d5
2020-07-09 12:10:29 -07:00
Ashkan Aliabadi
c8deca8ea8 Update pthreadpool to pthreadpool:029c88620802e1361ccf41d1970bd5b07fd6b7bb. (#40524)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/40524

Reviewed By: ezyang

Differential Revision: D22215742

Pulled By: AshkanAliabadi

fbshipit-source-id: ef594e0901337a92b21ddd44e554da66c723eb7c
2020-07-09 10:00:36 -07:00
Thomas Viehmann
a8bc7545d5 use PYTORCH_ROCM_ARCH to set GLOO_ROCM_ARCH (#40170)
Summary:
Previously it used the default arch set, which may or may not coincide with the user's.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/40170

Differential Revision: D22400866

Pulled By: xw285cornell

fbshipit-source-id: 222ba684782024fa68f37bf7d4fdab9a2389bdea
2020-07-07 19:41:02 -07:00
David Reiss
b7e044f0e5 Re-apply PyTorch pthreadpool changes
Summary:
This re-applies D21232894 (b9d3869df3) and D22162524, plus updates jni_deps in a few places
to avoid breaking host JNI tests.

Test Plan: `buck test @//fbandroid/mode/server //fbandroid/instrumentation_tests/com/facebook/caffe2:host-test`

Reviewed By: xcheng16

Differential Revision: D22199952

fbshipit-source-id: df13eef39c01738637ae8cf7f581d6ccc88d37d5
2020-06-23 19:26:21 -07:00
Kate Mormysh
92d3182c11 Revert D21232894: Unify PyTorch mobile's threadpool usage.
Test Plan: revert-hammer

Differential Revision:
D21232894 (b9d3869df3)

Original commit changeset: 8b3de86247fb

fbshipit-source-id: e6517cfec08f7dd0f4f8877dab62acf1d65afacd
2020-06-23 17:09:14 -07:00
Ashkan Aliabadi
b9d3869df3 Unify PyTorch mobile's threadpool usage. (#37243)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37243

*** Why ***

As it stands, we have two thread pool solutions concurrently in use in PyTorch mobile: (1) the open source pthreadpool library under third_party, and (2) Caffe2's implementation of pthreadpool under caffe2/utils/threadpool.  Since the primary use-case of the latter has been to act as a drop-in replacement for the third party version so as to enable integration and usage from within NNPACK and QNNPACK, Caffe2's implementation is intentionally written to the exact same interface as the third party version.

The original argument in favor of C2's implementation has been improved performance as a result of using spin locks, as opposed to relinquishing the thread's time slot and putting it to sleep - a less expensive operation up to a point.  That seems to have given C2's implementation the upper hand in performance, hence justifying the added maintenance complexity, until the third party version improved in parallel surpassing the efficiency of C2's implementation as I have verified in benchmarks.  With that advantage gone, there is no reason to continue using C2's implementation in PyTorch mobile either from the perspective of performance or code hygiene.  As a matter of fact, there is considerable performance benefit to be had as a result of using the third party version as it currently stands.

This is a tricky change though, mainly because in order to avoid potential performance regressions, of which I have witnessed none but just in abundance of caution, we have decided to continue using the internal C2's implementation whenever building for Caffe2.  Again, this is mainly to avoid potential performance regressions in production C2 use cases even if doing so results in reduced performance as far as I can tell.

So to summarize, today, and as it currently stands, we are using C2's implementation for (1) NNPACK, (2) PyTorch QNNPACK, and (3) ATen parallel_for on mobile builds, while using the third party version of pthreadpool for XNNPACK as XNNPACK does not provide any build options to link against an external implementation unlike NNPACK and QNNPACK do.

The goal of this PR then, is to unify all usage on mobile to the third party implementation both for improved performance and better code hygiene.  This applies to PyTorch's use of NNPACK, QNNPACK, XNNPACK, and mobile's implementation of ATen parallel_for, all getting routed to the
exact same third party implementation in this PR.

Considering that NNPACK, QNNPACK, and XNNPACK are not mobile specific, these benefits carry over to non-mobile builds of PyTorch (but not Caffe2) as well.  The implementation of ATen parallel_for on non-mobile builds remains unchanged.

*** How ***

This is where things get tricky.

A good deal of the build system complexity in this PR arises from our desire to maintain C2's implementation intact for C2's use.

pthreadpool is a C library with no concept of namespaces, which means two copies of the library cannot exist in the same binary, or symbol collisions will occur, violating the ODR.  This means that somehow, and based on some condition, we must decide on the choice of a pthreadpool implementation.  In practice, this has become more complicated as a result of all the possible combinations that USE_NNPACK, USE_QNNPACK, USE_PYTORCH_QNNPACK, USE_XNNPACK, USE_SYSTEM_XNNPACK, USE_SYSTEM_PTHREADPOOL, and other variables can result in.  Having said that, I have done my best in this PR to surgically cut through this complexity in a way that minimizes side effects, considering the significance of the performance we are leaving on the table; yet, as a result of the combinatorial explosion explained above, I cannot guarantee that every single combination will work as expected on the first try.  I am heavily relying on CI to find any issues, as local testing can only go so far.

Having said that, this PR provides a simple, non-mobile-specific C++ thread pool implementation on top of pthreadpool, namely caffe2::PThreadPool, that automatically routes to C2's implementation or the third party version depending on the build configuration.  This simplifies the logic at the cost of pushing the complexity to the build scripts.  From there on, this thread pool is used in ATen parallel_for, and in NNPACK and family, again routing all usage of threading to the C2 or third party pthreadpool depending on the build configuration.

When all is said and done, the layering will look like this:

a) aten::parallel_for, uses
b) caffe2::PThreadPool, which uses
c) pthreadpool C API, which delegates to
    c-1) third_party implementation of pthreadpool if that's what the build has requested, and the rabbit hole ends here.
    c-2) C2's implementation of pthreadpool if that's what the build has requested, which itself delegates to
    c-2-1) caffe2::ThreadPool, and the rabbit hole ends here.

NNPACK, and (PyTorch) QNNPACK directly hook into (c). They never go through (b).

Differential Revision: D21232894

Test Plan: Imported from OSS

Reviewed By: dreiss

Pulled By: AshkanAliabadi

fbshipit-source-id: 8b3de86247fbc3a327e811983e082f9d40081354
2020-06-23 16:34:51 -07:00
Richard Zou
2ba5f98dd1 Revert D22068657: [pytorch][PR] Remove global CMAKE_INSTALL_RPATH_USE_LINK_PATH directive
Test Plan: revert-hammer

Differential Revision:
D22068657

Original commit changeset: b04c529572a9

fbshipit-source-id: d8227dfc12d9b6382f7bf2905686b6025034561c
2020-06-17 13:05:01 -07:00
mattip
49732f0450 Remove global CMAKE_INSTALL_RPATH_USE_LINK_PATH directive (#37737)
Summary:
Closes gh-35418,

PR gh-16414 added [the `CMAKE_INSTALL_RPATH_USE_LINK_PATH` directive](https://github.com/pytorch/pytorch/pull/16414/files#diff-dcf5891602b4162c36c2125c806639c5R16), which is non-standard and will cause CMake to write an `RPATH` entry for libraries outside the current build. Removing it leaves an RPATH entry for `$ORIGIN` but removes the entries for things like `/usr/local/cuda-10.2/lib64/stubs:/usr/local/cuda-10.2/lib64` for `libcaffe2_nvrtc.so` on linux.

The added test fails before this PR, passes after. It is equivalent to checking `objdump -p torch/lib/libcaffe2_nvrtc.so | grep RPATH` for an external path to the directory where cuda "lives"

I am not sure if it solves the `rpath/libc++.1.dylib` problem for `_C.cpython-37m-darwin.so` on macOS in issue gh-36941
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37737

Differential Revision: D22068657

Pulled By: ezyang

fbshipit-source-id: b04c529572a94363855f1e4dd3e93c9db3c85657
2020-06-16 11:18:39 -07:00
peter
0f39ed86a7 Cleanup debug info switches with MSVC (#39703)
Summary:
Switch off `/Z7` so that we don't generate debug info in Release and MinSizeRel builds, which should give us smaller static libraries and object files, and faster build times
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39703

Differential Revision: D21960684

Pulled By: ezyang

fbshipit-source-id: 909a237a138183591d667885b13fc311470eed65
2020-06-09 14:11:40 -07:00
Michael Voznesensky
fce01a9bab [JIT] Make new zip serialization for torch save/load significantly (~70%) faster (#38379)
Summary:
Before:
```
2020-05-11 18:31:41 INFO     Benchmarking 'basic', best of 10 runs (with 1 warmup runs)
{
  "Big Tensors Save": {
    "mean": 17.8048762,
    "median": 17.458917
  },
  "Big Tensors Load": {
    "mean": 3.2556887,
    "median": 2.9668495000000004
  },
  "Small Tensors Save": {
    "mean": 4.0381357,
    "median": 3.9440125
  },
  "Small Tensors Load": {
    "mean": 5.8792499,
    "median": 5.603067
  },
  "benchmark_run_at": "2020-05-12T01:31:41"
}
```
After
```
Use zipfile serialization: True
2020-05-12 20:15:32 INFO     Benchmarking 'basic', best of 10 runs (with 1 warmup runs)
{
  "Big Tensors Save": {
    "mean": 4.7534657,
    "median": 4.646732
  },
  "Big Tensors Load": {
    "mean": 3.6001919,
    "median": 3.493285
  },
  "Small Tensors Save": {
    "mean": 4.1066924,
    "median": 4.1219255
  },
  "Small Tensors Load": {
    "mean": 6.3902358,
    "median": 6.36977
  },
  "benchmark_run_at": "2020-05-13T03:15:32"
}
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38379

Differential Revision: D21779494

Pulled By: voznesenskym

fbshipit-source-id: 694d65029a5b817424d454bd331e285df828c67a
2020-05-29 01:56:18 -07:00
Ivan Kobzarev
928e99b9bb [vulkan] jni build support USE_VULKAN (#39188)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39188

Extracting Vulkan_LIBS and Vulkan_INCLUDES setup from `cmake/Dependencies.cmake` to `cmake/VulkanDependencies.cmake` and reuse it in android/pytorch_android/CMakeLists.txt

Adding control to build with Vulkan setting env variable `USE_VULKAN` for `scripts/build_android.sh` `scripts/build_pytorch_android.sh`

We do not use the Vulkan backend in pytorch_android, but with this build option we can track the android aar size change when `USE_VULKAN` is added.

Currently it is 88Kb.

Test Plan: Imported from OSS

Differential Revision: D21770892

Pulled By: IvanKobzarev

fbshipit-source-id: a39433505fdcf43d3b524e0fe08062d5ebe0d872
2020-05-28 15:39:02 -07:00
peter
1fef2075a5 Disable some unsupported module for 32-bit build (#38950)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/38322#issuecomment-632976523 and https://github.com/pytorch/pytorch/issues/38322#issuecomment-628698852.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38950

Differential Revision: D21721918

Pulled By: ezyang

fbshipit-source-id: 999788bb88d3e3c2c06f8dec4f0d6b3389095936
2020-05-26 08:30:35 -07:00
Ivan Kobzarev
b460465a18 [Mobile GPU][Integration] Vulkan backend integration (#36491)
Summary:
This PR contains the initial version of Vulkan (GPU) Backend integration.
The primary target environment is Android, but the desktop build is also supported.

## CMake
Introducing three cmake options:
USE_VULKAN:
The main switch; if it is off, the other options have no effect.
USE_VULKAN_WRAPPER:
ON - Vulkan will be used by loading it at runtime as "libvulkan.so" using libdl; every function call is wrapped in vulkan_wrapper.h.
OFF - linking with libvulkan.so directly
USE_VULKAN_SHADERC_RUNTIME:
ON - The shader compilation library will be linked, and shaders will be compiled at runtime.
OFF - Shaders will be precompiled and the shader compilation library is not included.
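A rough sketch of how such dependent options can be declared in CMake (illustrative only; the help strings and defaults here are made up, not the ones in this PR):

```cmake
include(CMakeDependentOption)

option(USE_VULKAN "Build with the Vulkan GPU backend" OFF)

# The sub-options only take effect when USE_VULKAN is ON; otherwise they are
# forced to OFF and hidden.
cmake_dependent_option(USE_VULKAN_WRAPPER
    "Load libvulkan.so at runtime via libdl instead of linking it" ON
    "USE_VULKAN" OFF)
cmake_dependent_option(USE_VULKAN_SHADERC_RUNTIME
    "Link the shader compiler and compile GLSL shaders at runtime" OFF
    "USE_VULKAN" OFF)
```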

## Codegen
if `USE_VULKAN_SHADERC_RUNTIME` is ON:
Shader codegen starts in cmake/VulkanCodegen.cmake, which calls `aten/src/ATen/native/vulkan/gen_glsl.py` to include the shader source inside the binary as `glsl.h`,`glsl.cpp`, to be compiled at runtime.
if `USE_VULKAN_SHADERC_RUNTIME` is OFF:
Shader precompilation starts in cmake/VulkanCodegen.cmake, which calls `aten/src/ATen/native/vulkan/gen_spv.py` to include the SPIR-V bytecode inside the binary as uint32_t arrays in spv.h,spv.cpp.

All codegen results happen in the build directory.

## Build dependencies
cmake/Dependencies.cmake
If the target platform is Android - vulkan library, headers, Vulkan wrapper will be used from ANDROID_NDK.
Desktop build requires the VULKAN_SDK environment variable, and all vulkan dependencies will be used from it.
(Desktop build was tested only on Linux).

## Pytorch integration:
Adding "Vulkan" as a new Backend, DispatchKey, DeviceType.
We are using Strided layout without supporting strides at the moment, but we plan to support them in the future.
Using OpaqueTensorImpl where the OpaqueHandle is a copyable VulkanTensor,
more details in comments in `aten/src/ATen/native/vulkan/Vulkan.h`

Main code location: `aten/src/ATen/native/vulkan`
`aten/src/ATen/native/vulkan/VulkanAten.cpp` - the connecting link between ATen and the Vulkan API (Vulkan.h); converts at::Tensor to VulkanTensor.

`aten/src/ATen/native/vulkan/Vulkan.h` - Vulkan API that contains the VulkanTensor representation and functions to work with it. We plan to expose it so that clients can write their own Vulkan ops.

`aten/src/ATen/native/vulkan/VulkanOps.cpp` - Vulkan operation implementations that use the Vulkan.h API

## GLSL shaders
Located in `aten/src/ATen/native/vulkan/glsl` as *.glsl files.
All shaders use Vulkan specialization constants for workgroup sizes, with ids 1, 2, 3

## Supported operations
Code point:
conv2d no-groups
conv2d depthwise
addmm
upsample nearest 2d
clamp
hardtanh

## Testing
`aten/src/ATen/test/vulkan_test.cpp` - contains tests for
copy from CPU to Vulkan and back
all supported operations
Desktop builds are supported, and testing can be done on a desktop that has a Vulkan-capable GPU, or with an installed software implementation of Vulkan, like https://github.com/google/swiftshader

## Vulkan execution
The initial implementation is trivial and waits on every operator's execution.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36491

Differential Revision: D21696709

Pulled By: IvanKobzarev

fbshipit-source-id: da3e5a770b1a1995e9465d7e81963e7de56217fa
2020-05-26 08:30:13 -07:00
Wojciech Baranowski
945672bf3e cmake: improve dependencies in incremental builds (#37661)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/26304

Test procedure:
With ninja:
[x] Build a clean checkout
[x] Build again. Result: Only 10 libraries are (needlessly) linked again, the extra delay on a 24-core machine is <10s.
[x] Build for the third time. Result: Virtually instantaneous, with no extra rebuilding.
[x] Modify DispatchTable.h. Build again. Result: `.cu` files are rebuilt, as well as many `.cpp` files
[x] Build for the fifth time. Result: Virtually instantaneous, with no extra rebuilding.
[x] Touch one of the `.depend` files. Build again. Result: Only 10 libraries are (needlessly) linked again, the extra delay on a 24-core machine is <10s.

Without ninja:
[x] Build a clean checkout
[x] Build again. Result: There is some unnecessary rebuilding. But it was also happening before this change.
[x] Build for the third time. Result: Virtually instantaneous, with no extra rebuilding.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37661

Differential Revision: D21434624

Pulled By: ezyang

fbshipit-source-id: 379d2315486b8bb5972c184f9b8da8e00d38c338
2020-05-06 14:25:18 -07:00
Lucas Hosseini
8a30553738 [TensorPipe/RPC] Add TensorPipe dependency (#36695)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/36695

Reviewed By: lw

Differential Revision: D21312297

Pulled By: beauby

fbshipit-source-id: 39fdc3de91efa4ac97dd169f09fb304b273b0050
2020-04-30 11:05:15 -07:00
Mo Zhou
58a46a174e [cmake] add USE_SYSTEM_{XNNPACK,ONNX} options. (#37501)
Summary:
ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37501

Differential Revision: D21303527

Pulled By: ezyang

fbshipit-source-id: 58353d78c66e5bcc9198ce8cde36ac7232bb4b2f
2020-04-29 09:26:16 -07:00
Michael Suo
68895eda9d add fmt, take 7 (#37356)
Summary:
fmt is a formatting library for C++. It has several properties that make it nice
for inclusion in PyTorch:
- Widely used
- Basically copies how Python does it
- Support for all the compilers and platforms we care about
- Standards track (C++20)
- Small code size
- Header only

This PR includes it as a submodule and sets up the build.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37356

Differential Revision: D21262619

Pulled By: suo

fbshipit-source-id: 1d9a1a5ed08a634213748e7b02fc718ef8dac4c9
2020-04-29 09:08:24 -07:00
cyy
9259a283b7 use detected python version to find pylibs (#34041)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/34041

Differential Revision: D21302552

Pulled By: ezyang

fbshipit-source-id: 140c3d2146bad8feb425cf3670cffdbabc5101b1
2020-04-29 08:17:15 -07:00
Mo Zhou
5b9f7f7b0e [cmake] Add USE_SYSTEM_{GLOO,FP16,PTHREADPOOL,PSIMD,FXDIV,BENCHMARK} options (#14699) (#37277)
Summary:
These options are disabled by default, and are supposed to be used by
linux distro developers. With the existing shortcut option
USE_SYSTEM_LIBS toggled, these new options will be enabled as well.

Additionally, when USE_SYSTEM_LIBS is toggled, setup.py should
no longer check the existence of git submodules.

ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37277

Differential Revision: D21256999

Pulled By: ezyang

fbshipit-source-id: 84f97d008db5a5e41a289cb7bce94906de3c52cf
2020-04-27 09:37:27 -07:00
Mo Zhou
007163407c [cmake] Support "Generic" BLAS (#14699) (#37276)
Summary:
The "Generic" BLAS refers to the Netlib BLAS. This option is meaningful
to the Debian family due to the "update-alternatives" mechanism, which
enables the user to switch the libblas.so providers between different
implementations at runtime, such as ATLAS, OpenBLAS, and Intel MKL.
As such, building against the generic BLAS provides much flexibility.

This new option is not documented in setup.py because it's only supposed
to be used by linux distro (especially Debian family) developers.

ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37276

Differential Revision: D21256877

Pulled By: ezyang

fbshipit-source-id: 55a5356653a1cfc763a5699b04afe5938f2007ec
2020-04-27 08:17:43 -07:00
Mo Zhou
ff21b15624 cmake: add USE_SYSTEM_{LIBS,CPUINFO,SLEEF} options (#14699) (#37137)
Summary:
ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37137

Differential Revision: D21222632

Pulled By: ezyang

fbshipit-source-id: 47624b30f8d07b31a40a26edf665bbec39e45202
2020-04-23 20:43:36 -07:00
David Reiss
83de675ebf Fail CMake setup if trying to build with Python 2 (#35612)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35612

Python 2 has reached end-of-life and is no longer supported by PyTorch.
To spare users from a long, doomed build when trying to use PyTorch with
Python 2, detect this case early and fail with a clear message.  This
commit covers CMake setup.
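A minimal sketch of what such an early version check can look like in CMake (illustrative; the actual variable name and message in the commit may differ):

```cmake
# After find_package(PythonInterp), PYTHON_VERSION_STRING holds the detected
# interpreter version; bail out immediately if it is Python 2.
if(DEFINED PYTHON_VERSION_STRING AND PYTHON_VERSION_STRING VERSION_LESS "3.0")
  message(FATAL_ERROR
      "Python 2 has reached end-of-life and is no longer supported by "
      "PyTorch. Please build with Python 3 instead.")
endif()
```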

Test Plan: Attempted to build PyTorch with Python 2 and saw a clear error *quickly*.

Differential Revision: D20842873

Pulled By: dreiss

fbshipit-source-id: b35e38c12f9381ff4ca10cf801b7a03da87b1d19
2020-04-16 10:22:36 -07:00
Yinghai Lu
eb00bac2b5 Make FakeLowP tests work (#36525)
Summary:
Make the e2e FakeLowP python tests work with Glow lowering in an OSS environment. Added a README.md as a guideline.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36525

Reviewed By: hyuen

Differential Revision: D21004706

Pulled By: yinghai

fbshipit-source-id: d182152e4a1a3368640bd7872cb9ea4d4bff4b02
2020-04-13 20:16:33 -07:00
Yinghai Lu
c1efe1ddb5 Enable building of FakeLowP ops (#36170)
Summary:
We open sourced the FakeLowP ops as a reference implementation of fp16 ops. This PR makes them buildable.

```
USE_CUDA=0 USE_ROCM=0 USE_FAKELOWP=ON python setup.py install
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36170

Test Plan:
Build Onnxifi library in Glow.
```
cp ${GLOW}/build/lib/Onnxifi/libonnxifi-glow.so ${MY_PATH}/libonnxifi.so
LD_LIBRARY_PATH=${MY_PATH} python pytorch/caffe2/python/fakelowp/test_sls_nnpi_fp16.py
```

It doesn't run successfully right now because we need to open source the glow gflags and some other ops like `FbgemmPack`.

Reviewed By: houseroad

Differential Revision: D20980681

Pulled By: yinghai

fbshipit-source-id: 6dd31883a985850a77261bcc527029479bbc303f
2020-04-11 13:17:59 -07:00
Owen Anderson
b8383b3d4c [WIP] Enable NNC's LLVM dependency in CI (#35564)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/35564

Differential Revision: D20848144

Pulled By: resistor

fbshipit-source-id: 992589447162766fbe8df0c696563511a2bb8e52
2020-04-06 15:54:35 -07:00
Nikita Shulga
e2adcc1c53 Report CUDA separate compilation flag (#35726)
Summary:
In the build configuration summary, specify whether CUDA code is compiled with separate compilation enabled.

Also, correctly handle space-separated TORCH_NVCC_FLAGS when adding them to NVCC_CUDA_FLAGS
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35726
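One way to handle a space-separated flags string in CMake is to split it into a list before appending, e.g. (a sketch, not the exact code from this PR; the list variable name is made up):

```cmake
# TORCH_NVCC_FLAGS arrives as a single string such as "-Xfatbin -compress-all";
# appending it verbatim would add it to the flags list as one malformed item.
if(DEFINED ENV{TORCH_NVCC_FLAGS})
  separate_arguments(torch_nvcc_flags_list
      UNIX_COMMAND "$ENV{TORCH_NVCC_FLAGS}")
  list(APPEND CUDA_NVCC_FLAGS ${torch_nvcc_flags_list})
endif()
```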

Test Plan: CI + local build with TORCH_NVCC_FLAGS set to "-Xfatbin -compress-all"

Differential Revision: D20830885

Pulled By: malfet

fbshipit-source-id: 0e0ecab4a97b6c8662a2c4bfc817857da9f32201
2020-04-02 19:35:02 -07:00
peter
3bdc4a37ed CMake script cleanup - mixed case for function names (#35589)
Summary:
Running the following code.
```bash
cmake --help-command-list |
grep -v "cmake version" |
while read c; do
    echo 's/\b'"$(echo $c | tr '[:lower:]' '[:upper:]')"'\(\s*\)(/'"$c"'\1(/g'
done >convert.sed &&
git ls-files -z -- bootstrap '*.cmake' '*.cmake.in' '*CMakeLists.txt' |
egrep -z -v '^(cmake/Modules/|cmake/Modules_CUDA_fix/)' |
xargs -0 sed -i -f convert.sed &&
rm convert.sed
```
cmake-lint is too sensitive about mixed case, so I didn't turn the check on.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35589

Differential Revision: D20735648

Pulled By: ezyang

fbshipit-source-id: a09a60a7ce921bb198575a35335faa299bd10b66
2020-03-30 11:37:02 -07:00
Nikita Shulga
b9adbb5002 Fix/relax CMake linter rules (#35574)
Summary:
Ignore mixed upper-case/lower-case style for now
Fix space-between-function-and-arguments violations
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35574

Test Plan: CI

Differential Revision: D20712969

Pulled By: malfet

fbshipit-source-id: 0012d430aed916b4518599a0b535e82d15721f78
2020-03-27 16:52:33 -07:00
peter
45c9ed825a Formatting cmake (to lowercase without space for if/elseif/else/endif) (#35521)
Summary:
Running commands:
```bash
shopt -s globstar

sed -e 's/IF (/if(/g' -e 's/IF(/if(/g' -e 's/if (/if(/g' -e 's/ELSE (/else(/g' -e 's/ELSE(/else(/g' -e 's/else (/else(/g' -e 's/ENDif(/endif(/g' -e 's/ELSEif(/elseif(/g' -i CMakeLists.txt
sed -e 's/IF (/if(/g' -e 's/IF(/if(/g' -e 's/if (/if(/g' -e 's/ELSE (/else(/g' -e 's/ELSE(/else(/g' -e 's/else (/else(/g' -e 's/ENDif(/endif(/g' -e 's/ELSEif(/elseif(/g' -i caffe2/**/CMakeLists.txt
sed -e 's/IF (/if(/g' -e 's/IF(/if(/g' -e 's/if (/if(/g' -e 's/ELSE (/else(/g' -e 's/ELSE(/else(/g' -e 's/else (/else(/g' -e 's/ENDif(/endif(/g' -e 's/ELSEif(/elseif(/g' -i torch/**/CMakeLists.txt
sed -e 's/IF (/if(/g' -e 's/IF(/if(/g' -e 's/if (/if(/g' -e 's/ELSE (/else(/g' -e 's/ELSE(/else(/g' -e 's/else (/else(/g' -e 's/ENDif(/endif(/g' -e 's/ELSEif(/elseif(/g' -i c10/**/CMakeLists.txt
sed -e 's/IF (/if(/g' -e 's/IF(/if(/g' -e 's/if (/if(/g' -e 's/ELSE (/else(/g' -e 's/ELSE(/else(/g' -e 's/else (/else(/g' -e 's/ENDif(/endif(/g' -e 's/ELSEif(/elseif(/g' -i cmake/**/*.cmake
sed -e 's/IF (/if(/g' -e 's/IF(/if(/g' -e 's/if (/if(/g' -e 's/ELSE (/else(/g' -e 's/ELSE(/else(/g' -e 's/else (/else(/g' -e 's/ENDif(/endif(/g' -e 's/ELSEif(/elseif(/g' -i cmake/**/*.cmake.in
```
We may further convert all the commands into lowercase, as discussed in the following commit: 77543bde41.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35521

Differential Revision: D20704382

Pulled By: malfet

fbshipit-source-id: 42186b9b1660c34428ab7ceb8d3f7a0ced5d2e80
2020-03-27 14:25:17 -07:00
Johannes M Dieterich
835ee34e38 [ROCm] Update to ROCm 3.1.1 (#35552)
Summary:
Redux.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35552

Differential Revision: D20701593

Pulled By: ezyang

fbshipit-source-id: 1946d1e8fb47d597da903bae5d355bf52a5f017f
2020-03-27 12:21:12 -07:00
peter
f5383a213f Fix openmp detection with clang-cl (#35365)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/35365

Differential Revision: D20653049

Pulled By: ezyang

fbshipit-source-id: 193c0d956b1aea72b3daa104ef49c4bf167a165a
2020-03-26 19:59:53 -07:00
Edward Yang
3622e1c90f Revert D20589048: [pytorch][PR] [ROCm] Update CI dockers to ROCm release 3.1.1
Test Plan: revert-hammer

Differential Revision:
D20589048

Original commit changeset: 568f40c1b90f

fbshipit-source-id: 724c4fe99e8806f00d2f7dceb71d15a02358f663
2020-03-26 09:31:59 -07:00
Johannes M Dieterich
f7f7c4edd9 [ROCm] Update CI dockers to ROCm release 3.1.1 (#33930)
Summary:
Request to update ROCm CI dockers to release 3.1

Changes required to the PyTorch source base attached:
* switch to the fast path for the Caffe2 ReLU operator
* switch to the new hipMemcpyWithStream(stream) API to replace hipMemcpyAsync(stream) && hipStreamSynchronize(stream) paradigm in an optimized fashion
* disable two regressed unit tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33930

Differential Revision: D20589048

Pulled By: ezyang

fbshipit-source-id: 568f40c1b90f311eb2ba57f02a9901114d8364af
2020-03-26 07:55:44 -07:00
Nikita Shulga
f87cd83d11 Append multiple arguments to list of flags as multiple items (#34899)
Summary:
This makes PyTorch compilable (but not linkable) with the `CUDA_SEPARABLE_COMPILATION` option enabled.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34899

Test Plan: CI

Differential Revision: D20501050

Pulled By: malfet

fbshipit-source-id: 02903890a827fcc430a26f397d4d05999cf3a441
2020-03-17 16:48:32 -07:00
Mikhail Zolotukhin
ea5c86c276 [TensorExpr] Add LLVM codegen. (#34228)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34228

This PR adds LLVM codegen to tensor expressions. LLVM is added as an
optional build dependency specified with `USE_LLVM=<path_to_llvm>`
variable. If this variable is not set or LLVM is not found in the
specified path, the LLVM codegen is completely disabled.
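A sketch of how an optional LLVM dependency like this is typically wired up in CMake (illustrative only; the preprocessor guard name is an assumption, not necessarily the one used in this PR):

```cmake
if(USE_LLVM)
  # Look only in the user-supplied prefix; LLVM ships a CMake config package.
  find_package(LLVM CONFIG PATHS ${USE_LLVM} NO_DEFAULT_PATH)
endif()

if(LLVM_FOUND)
  include_directories(${LLVM_INCLUDE_DIRS})
  add_definitions(-DTORCH_ENABLE_LLVM ${LLVM_DEFINITIONS})
else()
  message(STATUS "LLVM not found; the LLVM codegen is disabled")
endif()
```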

Differential Revision: D20251832

Test Plan: Imported from OSS

Pulled By: ZolotukhinM

fbshipit-source-id: 77e203ab4421eb03afc64f8da17e0daab277ecc2
2020-03-16 11:49:34 -07:00
Kimish Patel
84bd71dbd4 Enable threading for XNNPACK ops. (#34547)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34547

This enables threading by passing a threadpool to xnnpack ops.

Test Plan:
python test/test_xnnpack_integration.py

Imported from OSS

Differential Revision: D20370553

fbshipit-source-id: 4db08e73f8c69b9e722b0e11a00621c4e229a31a
2020-03-14 12:53:36 -07:00
Hong Xu
e73d4286b0 Fix conflict between XNNPACK's clog dependency and our cpuinfo dependency (#33922)
Summary:
Currently if we run

```bash
DEBUG=1 ONNX_ML=0 MAX_JOBS=8 CMAKE_CXX_COMPILER_LAUNCHER=ccache CMAKE_C_COMPILER_LAUNCHER=ccache CMAKE_CUDA_COMPILER_LAUNCHER=ccache USE_OPENMP=0 USE_DISTRIBUTED=0 USE_MKLDNN=0 USE_NCCL=0 USE_CUDA=1 USE_CUDNN=0 USE_STATIC_CUDNN=0 USE_NNPACK=0 USE_QNNPACK=0 USE_FBGEMM=0 BUILD_TEST=0 TORCH_CUDA_ARCH_LIST="6.1" python setup.py develop --cmake-only
```

then `touch build/CMakeCache.txt` (which adjusting build options will
do), then `python setup.py develop`, the following error message will
show up:

```
CMake Error at build/clog-source/CMakeLists.txt:249 (ADD_SUBDIRECTORY):
ADD_SUBDIRECTORY not given a binary directory but the given source
directory "/home/hong/wsrc/pytorch/build/clog-source" is not a subdirectory
of "/home/hong/wsrc/pytorch/build/clog-source".  When specifying an
out-of-tree source a binary directory must be explicitly specified.
```

This is due to a conflict between our cpuinfo submodule and XNNPACK's
external clog dependency. Moving our cpuinfo upward and setting
CLOG_SOURCE_DIR resolves the issue.

 ---

Also reverted https://github.com/pytorch/pytorch/issues/33947, where `CLOG_SOURCE_DIR` as an option is not quite appropriate (given that cpuinfo uses its included clog subdir), and the setting of this variable should happen a bit later, once the dir of cpuinfo is known.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33922

Differential Revision: D20193572

Pulled By: ezyang

fbshipit-source-id: 7cdbdc947a6c7e0ef10df33feccb5b20e1b3ba43
2020-03-02 10:40:12 -08:00
Kimish Patel
0e52627358 Fixing pthreadpool symbol conflict issue. (#33869)
Summary:
Mainly renames C2's pthread_create - the only conflicting symbol referenced internally by NNPACK - to pthread_create_c2.
Removed 2 other conflicting symbols that are not used internally at all.
Pointing XNNPACK to the original repo instead of the fork.

Copy-pasted the new interface and implementation to
caffe2/utils/threadpool, so that internal builds compile against
this.

When threadpool is unified this will be removed.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33869

Differential Revision: D20140580

Pulled By: kimishpatel

fbshipit-source-id: de70df0af9c7d6bc065e85ede0e1c4dd6a9e6be3
2020-02-28 21:23:18 -08:00
David Reiss
991f7a20f2 Use clog from cpuinfo/deps instead of downloading (#33947)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33947

XNNPACK was downloading clog because we weren't setting CLOG_SOURCE_DIR.
Actually, it was downloading cpuinfo and pointing to the copy of clog
within that.  So let's just point to the copy of clog within the cpuinfo
submodule we already have.

(Note: this ignores all push blocking failures!)

Test Plan:
Ran cmake and didn't see any downloading.
Verified that our clog is the same as the one that was being downloaded
with `diff -Naur`.

Differential Revision: D20169656

Pulled By: suo

fbshipit-source-id: ba0f7d1535f702e504fbc4f0102e567f860db94b
2020-02-28 15:19:03 -08:00
Wojciech Baranowski
8aa09de19e build: set -DNDEBUG in Release (#32719)
Summary:
Not defining NDEBUG might lead to silent undefined behaviour (e.g. with out-of-bound indices). This affects `test_multinomial_invalid_probs_cuda`, which is now removed.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32719

Test Plan:
* Build with VERBOSE=1 and manually inspect `less ndebug.build.log | grep 'c++' | grep -v -- -DNDEBUG` (only with nina on Linux)
* CI

Fixes https://github.com/pytorch/pytorch/issues/22745

Differential Revision: D20104340

Pulled By: yf225

fbshipit-source-id: 2ebfd7ddae632258a36316999eeb5c968fb7642c
2020-02-26 12:53:31 -08:00
Ashkan Aliabadi
6aecfd1e80 Mobile Backend: NHWC memory layout + XNNPACK integration. (#33722)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33722

In order to improve CPU performance on floating-point models on mobile, this PR introduces a new CPU backend for mobile that implements the most common mobile operators with NHWC memory layout support through integration with XNNPACK.

XNNPACK itself, and this codepath, are currently only included in the build, but the actual integration is gated with USE_XNNPACK preprocessor guards.  This preprocessor symbol is intentionally not passed on to the compiler, so as to enable this rollout in multiple stages in follow up PRs.  This changeset will build XNNPACK as part of the build if the identically named USE_XNNPACK CMAKE variable, defaulted to ON, is enabled, but will not actually expose or enable this code path in any other way.

Furthermore, it is worth pointing out that in order to efficiently map models to these operators, some front-end method of exposing this backend to the user is needed.  The less efficient implementation would be to hook these operators into their corresponding native implementations, granted that a series of XNNPACK-specific conditions are met, much like how NNPACK is integrated with PyTorch today for instance.

Having said that, while the above implementation is still expected to outperform NNPACK based on the benchmarks I ran, the above integration would leave a considerable gap between the performance achieved and the maximum performance potential XNNPACK enables, as it does not provide a way to compute and factor one-time operations out of the innermost forward() loop.

The more optimal solution, and one we will  decide on soon, would involve either providing a JIT pass that maps nn operators onto these newly introduced operators, while allowing one-time calculations to be factored out, much like quantized mobile models.  Alternatively, new eager-mode modules can also be introduced that would directly call into these implementations either through c10 or some other mechanism, also allowing for decoupling of op creation from op execution.

This PR does not include any of the front end changes  mentioned above.  Neither does it include the mobile threadpool unification present in the original https://github.com/pytorch/pytorch/issues/30644.  Furthermore, this codepath seems to be faster than NNPACK in a good number of use cases, which can potentially allow us to remove NNPACK from aten to make the codebase a little simpler, granted that there is widespread support for such a move.

Regardless, these changes will be introduced gradually and in a more controlled way in subsequent PRs.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/32509

Test Plan:
Build: CI
Functionality: Not exposed

Reviewed By: dreiss

Differential Revision: D20069796

Pulled By: AshkanAliabadi

fbshipit-source-id: d46c1c91d4bea91979ea5bd46971ced5417d309c
2020-02-24 21:58:56 -08:00
Ashkan Aliabadi
039dc90854 Revert D19521853: [pytorch][PR] Mobile Backend: NHWC memory layout + XNNPACK integration.
Test Plan: revert-hammer

Differential Revision:
D19521853

Original commit changeset: 99a1fab31d0e

fbshipit-source-id: 76dfc1f481797ba2386997533cf19957637687d6
2020-02-23 22:07:19 -08:00
Ashkan Aliabadi
941b42428a Mobile Backend: NHWC memory layout + XNNPACK integration. (#32509)
Summary:
In order to improve CPU performance on floating-point models on mobile, this PR introduces a new CPU backend for mobile that implements the most common mobile operators with NHWC memory layout support through integration with XNNPACK.

XNNPACK itself, and this codepath, are currently only included in the build, but the actual integration is gated with USE_XNNPACK preprocessor guards.  This preprocessor symbol is intentionally not passed on to the compiler, so as to enable this rollout in multiple stages in follow up PRs.  This changeset will build XNNPACK as part of the build if the identically named USE_XNNPACK CMAKE variable, defaulted to ON, is enabled, but will not actually expose or enable this code path in any other way.

Furthermore, it is worth pointing out that in order to efficiently map models to these operators, some front-end method of exposing this backend to the user is needed.  The less efficient implementation would be to hook these operators into their corresponding **native** implementations, granted that a series of XNNPACK-specific conditions are met, much like how NNPACK is integrated with PyTorch today for instance.

Having said that, while the above implementation is still expected to outperform NNPACK based on the benchmarks I ran, the above integration would leave a considerable gap between the performance achieved and the maximum performance potential XNNPACK enables, as it does not provide a way to compute and factor one-time operations out of the innermost forward() loop.

The more optimal solution, and one we will decide on soon, would involve either providing a JIT pass that maps nn operators onto these newly introduced operators while allowing one-time calculations to be factored out, much like quantized mobile models, or introducing new eager-mode modules that directly call into these implementations, either through c10 or some other mechanism, also allowing op creation to be decoupled from op execution.

This PR does not include any of the front end changes  mentioned above.  Neither does it include the mobile threadpool unification present in the original https://github.com/pytorch/pytorch/issues/30644.  Furthermore, this codepath seems to be faster than NNPACK in a good number of use cases, which can potentially allow us to remove NNPACK from aten to make the codebase a little simpler, granted that there is widespread support for such a move.

Regardless, these changes will be introduced gradually and in a more controlled way in subsequent PRs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32509

Reviewed By: dreiss

Differential Revision: D19521853

Pulled By: AshkanAliabadi

fbshipit-source-id: 99a1fab31d0ece64961df074003bb852c36acaaa
2020-02-23 19:08:42 -08:00
Hong Xu
15ba902c08 Turn ONNX_ML into a proper build option. (#33424)
Summary:
The detection of the env variable ONNX_ML has been properly handled in tools/setup_helpers/cmake.py,
line 242.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33424

Differential Revision: D20043991

Pulled By: ezyang

fbshipit-source-id: 91d1d49a5a12f719e67d9507cc203c8a40992f03
2020-02-21 15:42:33 -08:00
Hongzhang Shan
5e80ca12bb [pt][fbgemm] Turn on USE_FBGEMM on Windows env (#297)
Summary:
Pull Request resolved: https://github.com/pytorch/FBGEMM/pull/297

Pull Request resolved: https://github.com/pytorch/pytorch/pull/33250

As Title says. FBGEMM has recently added the support for Windows.

ghstack-source-id: 97932881

Test Plan: CI

Reviewed By: jspark1105

Differential Revision: D19738268

fbshipit-source-id: e7f3c91f033018f6355edeaf6003bd2803119df4
2020-02-19 15:09:21 -08:00
davidriazati
74ce3a032c Fix some bugs with zipfile serialization (#32244)
Summary:
Stacked PRs
 * #32958 - Make zip serialization the default
 * **#32244 - Fix some bugs with zipfile serialization**

It includes the following changes:
* Split up tests so that we can test both serialization methods
    * Loading something within a buffer doesn't work anymore, so those tests are only on the old serialization method (it's possible but introduces a big slowdown since it requires a linear scan of the entire zipfile to find the magic number at the end)
* Call `readinto` on a buffer if possible instead of `read` + a copy
* Disable CRC-32 checks on read (there was some issue where miniz said the CRC was wrong but `zipinfo` and `unzip` said the zip file was fine)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32244

Pulled By: driazati

Reviewed By: eellison

Differential Revision: D19418935

fbshipit-source-id: df140854f52ecd04236225417d625374fd99f573
2020-02-05 15:32:14 -08:00
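The `readinto` change mentioned above is the standard CPython idiom for avoiding an extra copy when filling a preallocated buffer. A minimal sketch of the idea (the function name and fallback logic here are illustrative, not PyTorch's actual implementation):

```python
import io

def read_record(f, size):
    """Read `size` bytes from file-like `f`, avoiding an extra copy when possible."""
    buf = bytearray(size)
    if hasattr(f, "readinto"):
        # Fills the preallocated buffer in place; no temporary bytes object.
        n = f.readinto(buf)
        return bytes(buf[:n])
    # Fallback: read() allocates its own bytes object, which is then copied.
    return f.read(size)

src = io.BytesIO(b"hello world")
assert read_record(src, 5) == b"hello"
```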
Nathan Goldbaum
1f1ce53e8e Don't install pybind11 header directory for system pybind11 installs (#30758)
Summary:
For system pybind11 installs this is a system header location that should not get installed since it might include other unrelated headers. The headers are already present for a system install, so there's no need to install them again; only do the install when we use the bundled pybind11 version.

Closes https://github.com/pytorch/pytorch/issues/29823. Closes https://github.com/pytorch/pytorch/issues/30627.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30758

Differential Revision: D18820189

Pulled By: bddppq

fbshipit-source-id: fcc9fa657897e18c07da090752c912e3be513b17
2019-12-04 16:43:21 -08:00
Sebastian Messmer
bc2e6d10fa Back out "Revert D17908478: Switch PyTorch/Caffe2 to C++14"
Summary: Original commit changeset: 775d2e29be0b

Test Plan: CI

Reviewed By: mruberry

Differential Revision: D18775520

fbshipit-source-id: a350b3f86b66d97241f208786ee67e9a51172eac
2019-12-03 14:33:43 -08:00
Sebastian Messmer
a2ed50c920 Revert D17908478: Switch PyTorch/Caffe2 to C++14
Test Plan: revert-hammer

Differential Revision:
D17908478

Original commit changeset: 6e340024591e

fbshipit-source-id: 775d2e29be0bc3a0db64f164c8960c44d4877d5d
2019-11-27 14:57:05 -08:00
Sebastian Messmer
d0acc9c085 Switch PyTorch/Caffe2 to C++14 (#30406)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30406

ghstack-source-id: 94642238

Test Plan: waitforsandcastle

Differential Revision: D17908478

fbshipit-source-id: 6e340024591ec2c69521668022999df4a33b4ddb
2019-11-27 10:47:31 -08:00
Daya Khudia
79b797ccac Build time warning on windows for fbgemm (#29062)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29062

Build time warning
ghstack-source-id: 94202405

Test Plan: None

Reviewed By: jianyuh

Differential Revision: D18279505

fbshipit-source-id: 873cdeb848d34849d6babc435b1a42171f0609a3
2019-11-19 14:30:20 -08:00
Junjie Bai
b0c245d52d Consolidate the places that find pybind11 include dirs (#29659)
Summary:
Also move the logic that installs the pybind11 headers from setup.py to cmake (to align with other headers).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29659

Differential Revision: D18458208

Pulled By: bddppq

fbshipit-source-id: cfd1e74b892d4a65591626ab321780c8c87b810d
2019-11-12 14:51:56 -08:00
Junjie Bai
f111f1b1a7 Suppress implicit int-float conversion warning in ROCm build (#29604)
Summary:
```
c10/util/Half.h:467:37: warning: implicit conversion from 'long' to 'double' changes value from 9223372036854775807 to 9223372036854775808 [-Wimplicit-int-float-conversion]
  return f < limit::lowest() || f > limit::max();
                                  ~ ^~~~~~~~~~~~
c10/util/Half.h:497:41: note: in instantiation of function template specialization 'c10::overflows<long, double>' requested here
  if (!std::is_same<To, bool>::value && overflows<To, From>(f)) {
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29604

Differential Revision: D18440713

Pulled By: bddppq

fbshipit-source-id: f059b4e37e90fa84308be52ff5e1070ffd04031e
2019-11-12 10:44:28 -08:00
Sergei Nikolaev
1e2049c566 #26426 fixed (#28715)
Summary:
This is the fix for reverted https://github.com/pytorch/pytorch/issues/26426
houseroad bddppq soumith
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28715

Reviewed By: hl475

Differential Revision: D18146731

Pulled By: houseroad

fbshipit-source-id: 247366451a6334e84df82d00339521f797b33130
2019-11-01 12:53:01 -07:00
Junjie Bai
d37c2d7c8d Revert D17495965: TensorRT 6.0 support and PyTorch->ONNX->TRT6 unit test
Test Plan: revert-hammer

Differential Revision:
D17495965

Original commit changeset: 3e8dbe8943f5

fbshipit-source-id: d47fcbec22b0d61df41d7dbf15cfdde196ac818f
2019-10-25 13:58:16 -07:00
Sergei Nikolaev
4996e3aca2 TensorRT 6.0 support and PyTorch->ONNX->TRT6 unit test (#26426)
Summary:
This PR makes Caffe2 compatible with TensorRT 6. To make sure it works well, a new unit test is added. This test checks the PyTorch->ONNX->TRT6 inference flow for all classification models from the TorchVision Zoo.
Note on CMake changes: it has to be done in order to import onnx-tensorrt project. See https://github.com/pytorch/pytorch/issues/18524 for details.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26426

Reviewed By: hl475

Differential Revision: D17495965

Pulled By: houseroad

fbshipit-source-id: 3e8dbe8943f5a28a51368fd5686c8d6e86e7f693
2019-10-25 13:01:57 -07:00
Peter Bell
03d24dba6c Fix static linking cuDNN without static CUDA (#28378)
Summary:
Fixes https://github.com/pytorch/pytorch/pull/27887#issuecomment-544649765

The logs show that `USE_STATIC_CUDNN` is used but not `CAFFE2_STATIC_LINK_CUDA`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28378

Differential Revision: D18061841

Pulled By: ezyang

fbshipit-source-id: 3b9b49953094e02f808ff12107ba4226688d9986
2019-10-22 10:08:09 -07:00
Edward Yang
a3902c901a Revert "Fix early expansion of CUDA_TOOLKIT_ROOT_DIR in libtorch builds (#27887)" (#28310)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28310

This reverts commit 3d3bff5ff1.

Test Plan: Imported from OSS

Differential Revision: D18042859

Pulled By: ezyang

fbshipit-source-id: cded781dda6fcc04199af6abd07ac09fdc0405de
2019-10-21 14:45:17 -07:00
Peter Bell
3d3bff5ff1 Fix early expansion of CUDA_TOOLKIT_ROOT_DIR in libtorch builds (#27887)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/15476, supersedes https://github.com/pytorch/pytorch/issues/23496, supersedes and closes https://github.com/pytorch/pytorch/issues/27607

As explained by rgommers in https://github.com/pytorch/pytorch/issues/23496, linking against the expanded library path for `libculibos` in `cmake/Dependencies.cmake` hard codes the path into the distributed cmake files.

Instead, I only link against the targets (e.g. `caffe2::cudnn`) and move the  dependency on `libculibos` into the cuda import targets declared in `cmake/public/cuda.cmake`. That file is distributed with the other cmake files and so the variable is expanded on the user's machine. I am now also using `CMAKE_STATIC_LIBRARY_SUFFIX` instead of `.a` to fix the windows issue from https://github.com/pytorch/pytorch/issues/15828.  I don't have a windows setup to confirm though.

Finally, to get pytorch to compile with the extra libraries enabled, I also had to link `__caffe2_nccl` to `torch_python`; otherwise I was getting include errors as the hard coded include directory was wrong. `nccl` is built into `build` not `third_party/build`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27887

Differential Revision: D17929440

Pulled By: ezyang

fbshipit-source-id: 3db6bd94d758fca2e1d6a64f4f5eea03cc07cf64
2019-10-16 09:21:47 -07:00
Johannes M Dieterich
17c672e704 enable rocTX API (#27416)
Summary:
ROCm 2.9 brings support for the rocTX API through rocTracer.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27416

Differential Revision: D17777480

Pulled By: bddppq

fbshipit-source-id: 6bce9b54c94e5b4c5787570d2b85736882bd23a7
2019-10-05 01:55:00 -07:00
Junjie Bai
f4d0d0a811 Enable RCCL in ROCm build (#27383)
Summary:
continues https://github.com/pytorch/pytorch/pull/23884
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27383

Differential Revision: D17767248

Pulled By: bddppq

fbshipit-source-id: 3a506844ca6f01d7bbe8be5bde0976999e3a2b90
2019-10-04 17:41:41 -07:00
Hong Xu
5e5cbceeba remove tools/setup_helpers/cudnn.py (#25876)
Summary:
FindCUDNN.cmake and cuda.cmake have done the detection. This commit deletes `tools/setup_helpers/cudnn.py` as it is no longer needed.

Previously in https://github.com/pytorch/pytorch/issues/25482, one test failed because TensorRT detects cuDNN differently, and there may be situations we can find cuDNN but TensorRT cannot. This is fixed by passing our detection result down to TensorRT.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25876

Differential Revision: D17346270

Pulled By: ezyang

fbshipit-source-id: c1e7ad4a1cb20f964fe07a72906f2f002425d894
2019-09-24 07:44:33 -07:00
Jiakai Liu
d6e3aed032 add eigen blas for mobile build (#26508)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26508

Enable BLAS for pytorch mobile build using Eigen BLAS.
It's not the most impactful optimization for typical mobile CV models, as we are already
using NNPACK/QNNPACK for most ops there, but it's nice to have a good fallback
implementation for other ops.

Test Plan:
- Create a simple matrix multiplication script model:
```
import torch

class Net(torch.nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.weights = torch.ones(1000, 1000)

    def forward(self, x):
        return torch.mm(x, self.weights)

n = Net()
module = torch.jit.trace_module(n, {'forward': torch.ones(1000, 1000)})
module.save('mm.pk')
```

- Before integrate with eigen blas:
```
adb shell 'cd /data/local/tmp; \
./speed_benchmark_torch \
--model=mm.pk \
--input_dims="1000,1000" \
--input_type=float \
--warmup=5 \
--iter=5'

Milliseconds per iter: 2218.52.
```

- After integrate with eigen blas:
```
adb shell 'cd /data/local/tmp; \
./speed_benchmark_torch_eigen \
--model=mm.pk \
--input_dims="1000,1000" \
--input_type=float \
--warmup=5 \
--iter=5'

Milliseconds per iter: 314.535.
```

- Improve MobileNetV2 single thread perf by ~5%:
```
adb shell 'cd /data/local/tmp; \
./speed_benchmark_torch \
--model=mobilenetv2.pk \
--input_dims="1,3,224,224" \
--input_type=float \
--warmup=5 \
--iter=20 \
--print_output=false \
--caffe2_threadpool_force_inline=true'

Milliseconds per iter: 367.055.

adb shell 'cd /data/local/tmp; \
./speed_benchmark_torch_eigen \
--model=mobilenetv2.pk \
--input_dims="1,3,224,224" \
--input_type=float \
--warmup=5 \
--iter=20 \
--print_output=false \
--caffe2_threadpool_force_inline=true'

Milliseconds per iter: 348.77.
```

Differential Revision: D17489587

fbshipit-source-id: efe542db810a900f680da7ec7e60f215f58db66e
2019-09-20 15:45:11 -07:00
Ashkan Aliabadi
dc851ab5d4 Integrate forked QNNPACK into mobile PyTorch builds. (#25844)
Summary:
Enable forked QNNPACK builds in PyTorch mobile.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25844

Differential Revision: D17336458

Pulled By: AshkanAliabadi

fbshipit-source-id: 6ea09dd6c114b64313e9159bf7f17253bc87bfdb
2019-09-16 20:50:43 -07:00
Jiakai Liu
075adb4d2d remove pthreadpool.a from install directory (#25977)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25977

Call add_subdirectory() explicitly before NNPACK/QNNPACK with
EXCLUDE_FROM_ALL property so that pthreadpool target won't be installed
by default for libtorch mobile build.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/25977

Test Plan: Imported from OSS

Differential Revision: D17312083

Pulled By: ljk53

fbshipit-source-id: 79851d0aa9402c5b9287ef4bbd8d7fd3a341497d
2019-09-11 12:27:56 -07:00
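The `EXCLUDE_FROM_ALL` approach described above can be sketched roughly as follows (paths are illustrative, not the actual build layout):

```cmake
# Adding the subdirectory with EXCLUDE_FROM_ALL keeps its targets out of
# the default "all" target and its install rules out of the default
# install, so pthreadpool is built only as a dependency of NNPACK/QNNPACK
# and is not installed by default.
add_subdirectory(
  ${CMAKE_CURRENT_SOURCE_DIR}/third_party/pthreadpool
  ${CMAKE_BINARY_DIR}/pthreadpool
  EXCLUDE_FROM_ALL)
```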
Jiakai Liu
74b48f21c1 remove protobuf from Dependencies.cmake for libtorch mobile build (#25958)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25958

Should have cleaned up the remaining protobuf dependencies before landing PR #25896.

Test Plan: - CI build;

Reviewed By: dreiss

Differential Revision: D17296949

Pulled By: ljk53

fbshipit-source-id: 20c444e63900c7fa054db3cc757d3f18614af630
2019-09-10 18:23:20 -07:00
Soumith Chintala
73855ecd43 fix cudnn static linkage (#25848)
Summary:
Fix regression caused by https://github.com/pytorch/pytorch/pull/24938

This fixes CUDA nightly breakages
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25848

Differential Revision: D17256348

Pulled By: soumith

fbshipit-source-id: dded577717947d0f092e9d76b423b2bc7c56070a
2019-09-08 21:41:57 -07:00
J M Dieterich
748436a514 Enable BLIS from the FLAME project as a BLAS choice. (#23819)
Summary:
BLIS is AMD's official recommendation for BLAS.

Mimics my ancient
f5bc78263e
in cmake upstream

BLIS WWW: https://github.com/flame/blis
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23819

Differential Revision: D17231360

Pulled By: bddppq

fbshipit-source-id: 68db70d63e410438f99b2bf57986b81ff6b6c5b3
2019-09-06 12:00:25 -07:00
Jiakai Liu
67c530851c get rid of protobuf dependencies (#25650)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25650

This PR removes protobuf dependencies from mobile build altogether:
- caffe2/proto: protobuf files, including caffe2.proto and torch.proto;
- caffe2 components that depend on caffe2.proto, including most part of
caffe2/core, caffe2/utils;
- libprotobuf / libprotobuf-lite dependencies;
- protobuf compiler;
- some utils class, e.g.: netdef_converter.cpp;
- introduce a macro to disable third_party/onnx which depends on protobuf;

Test Plan:
- builds;
- link with demo app to make sure it can load and run a model in pickle format;

Differential Revision: D17183548

Pulled By: ljk53

fbshipit-source-id: fe60b48674f29c4a9b58fd1cf8ece44191491531
2019-09-06 08:48:20 -07:00
Johannes M Dieterich
9c5a899773 Enable jit fusion on ROCm (#22872)
Summary:
As of ROCm 2.6, we support hiprtc - the HIP runtime compilation API. Enable the jit fusion feature depending on the existence of such an API. This entails
* new hipification rules for API_RTC
* add hiprtc APIs to the shim loader
* update cmake infrastructure to find the hiprtc library (it is part of the HIP package)
* enabling of unit tests in the jit_fuser test set
* special casing in resource strings for HIP - the typedefs CUDA requires would be redundant
* for now, disable the occupancy calculation we do not yet support and use a hard-coded value instead

Thanks to t-vi for working with me on getting this integration done!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22872

Differential Revision: D17207425

Pulled By: bddppq

fbshipit-source-id: 93409f3051ad0ea06afacc2239fd6c402152debe
2019-09-05 18:22:08 -07:00
Pieter Noordhuis
3556bea5aa Build torch.distributed with Gloo backend on macOS (#25260)
Summary:
In facebookincubator/gloo#212, a libuv based Gloo transport was introduced,
which allows us to use Gloo on macOS (and later perhaps also Windows). This
commit updates CMake code to enable building with USE_DISTRIBUTED=1 on macOS.

A few notes:
* The Caffe2 ops are not compiled, for they depend on `gloo::transport::tcp`.
* The process group implementation uses `gloo::transport::tcp` on Linux (because of `epoll(2)`) and `gloo::transport::uv` on macOS.
* The TCP store works but sometimes crashes on process termination.
* The distributed tests are not yet run.
* The nightly builds don't use `USE_DISTRIBUTED=1`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/25260

Reviewed By: mrshenli

Differential Revision: D17202381

Pulled By: pietern

fbshipit-source-id: ca80a82e78a05b4154271d2fb0ed31c8d9f26a7c
2019-09-05 07:09:50 -07:00
iotamudelta
4fe857187c switch to rocThrust for thrust/cub APIs (#25620)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25620

Pull Request resolved: https://github.com/pytorch/pytorch/pull/25602

Enable rocThrust with hipCUB and rocPRIM for ROCm. They are the ROCm implementations of the thrust and cub APIs and replace the older hip-thrust and cub-hip packages going forward. ROCm 2.5 is the first release to contain the new packages as an option, as of 2.6 they will be the only available option.

Add hipification rules to correctly hipify thrust::cuda to thrust::hip and cub:: to hipcub:: going forward. Add hipification rules to hipify specific cub headers to the general hipcub header.

Infrastructure work to correctly find, include and link against the new packages. Add the macro definition to choose the HIP backend to Thrust.

Since include chains are now a little different from CUDA's Thrust, add includes for functionality used where applicable.

Skip four tests that fail with the new rocThrust for now.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21864

Reviewed By: xw285cornell

Differential Revision: D16940768

Pulled By: bddppq

fbshipit-source-id: 3dba8a8f1763dd23d89eb0dd26d1db109973dbe5
2019-09-03 22:16:30 -07:00
Hong Xu
03f67e4b16 Remove BUILD_ATEN_ONLY build option (#24441)
Summary:
This build option no longer works.

Close https://github.com/pytorch/pytorch/issues/21703
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24441

Differential Revision: D17138131

Pulled By: ezyang

fbshipit-source-id: 67adac990645a5df1f7c2e2dbef3689b2c30fcf8
2019-08-30 13:44:38 -07:00
peter
061f2d1683 Skip useless macros from Windows.h (#25444)
Summary:
Applying https://github.com/pytorch/pytorch/issues/25398 to the whole project.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25444

Differential Revision: D17131251

Pulled By: ezyang

fbshipit-source-id: 7a8817f3444aebd6028bf1056514355e2c4cc748
2019-08-30 06:42:44 -07:00
Pavel Belevich
2e224d62b6 Add USE_CUDNN check to AT_CUDNN_ENABLED definition (#25037)
Summary:
We have the environment variable USE_CUDNN with a self-explanatory name. However, the C++ code is compiled based on the C++ macro definition AT_CUDNN_ENABLED, which is defined as:

```
  IF (NOT AT_CUDA_ENABLED OR NOT CUDNN_FOUND)
    MESSAGE(STATUS "CuDNN not found. Compiling without CuDNN support")
    set(AT_CUDNN_ENABLED 0)
  ELSE()
    include_directories(SYSTEM ${CUDNN_INCLUDE_DIRS})
    set(AT_CUDNN_ENABLED 1)
  ENDIF()
```

So, even if USE_CUDNN is set to 0, cpp is compiled with cuDNN if cmake finds cuDNN in the system. I actually tested it and was very surprised when I was debugging cuDNN code which I built with USE_CUDNN=0. I believe that cmake code above should look like this:

`IF (NOT AT_CUDA_ENABLED OR NOT CUDNN_FOUND OR NOT USE_CUDNN) ...`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25037

Differential Revision: D17048683

Pulled By: pbelevich

fbshipit-source-id: 48afa19eaae0bba2ffd49c1f68db0b4efd5cf85e
2019-08-27 18:43:11 -07:00
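The fix the commit proposes amounts to adding the user-facing option to the guard. A sketch of the corrected conditional, derived directly from the snippet quoted above:

```cmake
# Honor the user's USE_CUDNN choice in addition to the detection results,
# so USE_CUDNN=0 actually disables cuDNN even when cmake finds it.
IF (NOT AT_CUDA_ENABLED OR NOT CUDNN_FOUND OR NOT USE_CUDNN)
  MESSAGE(STATUS "CuDNN not found or disabled. Compiling without CuDNN support")
  set(AT_CUDNN_ENABLED 0)
ELSE()
  include_directories(SYSTEM ${CUDNN_INCLUDE_DIRS})
  set(AT_CUDNN_ENABLED 1)
ENDIF()
```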
Hong Xu
92750acb88 Move the detection of cuDNN to FindCUDNN.cmake (#24938)
Summary:
Currently they sit together with other code in cuda.cmake. This commit is the first step toward cleaning up cuDNN detection in our build system.

Another attempt to https://github.com/pytorch/pytorch/issues/24293,  which breaks manywheels build because it does not handle `USE_STATIC_CUDNN` properly.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24938

Differential Revision: D17070920

Pulled By: ezyang

fbshipit-source-id: a4d017a3505c102d9c435a73ae62332e4336c52e
2019-08-27 06:51:52 -07:00
Edward Yang
907f5020c3 Revert D16914345: [pytorch][PR] Move the detection of cuDNN to FindCUDNN.cmake
Differential Revision:
D16914345

Original commit changeset: fd261478c01d

fbshipit-source-id: b933ad7ed49028ab9ac6976c3ae768132dc9bacb
2019-08-20 14:23:12 -07:00
Hong Xu
6ce6939be9 Move the detection of cuDNN to FindCUDNN.cmake (#24784)
Summary:
Currently they sit together with other code in cuda.cmake. This commit
is the first step toward cleaning up cuDNN detection in our build system.

Another attempt to https://github.com/pytorch/pytorch/issues/24293,  which breaks manywheels build because it does not handle `USE_STATIC_CUDNN`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24784

Differential Revision: D16914345

Pulled By: ezyang

fbshipit-source-id: fd261478c01d879dc770c1f1a56b17cc1a587be2
2019-08-20 01:55:46 -07:00
Edward Yang
c676db230d Revert D16834297: Move the search of cuDNN files to FindCUDNN.cmake.
Differential Revision:
D16834297

Original commit changeset: ec2c0ba0c659

fbshipit-source-id: 028a727f4baaaf4439c7ca17c999bba7ea6d419f
2019-08-16 08:30:21 -07:00
Hong Xu
482607c16c Move the search of cuDNN files to FindCUDNN.cmake.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/24293

Test Plan: Imported from OSS

Differential Revision: D16834297

Pulled By: ezyang

fbshipit-source-id: ec2c0ba0c659d82fffd40d52ae723934377aa49c
2019-08-16 06:07:25 -07:00
peter
10d2ada17d Fix Z7_MSVC_OVERRIDE for C source files (#24389)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/24145#issuecomment-521507234
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24389

Differential Revision: D16828222

Pulled By: ezyang

fbshipit-source-id: dcf652fbd8b8945c71993e9b99394e18ac542e6b
2019-08-15 06:52:42 -07:00
Hong Xu
61db8b64ec Build option USE_NUMA should only show up on Linux. (#23673)
Summary:
(intentionally left blank)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23673

Differential Revision: D16627453

Pulled By: vincentqb

fbshipit-source-id: df62f1b26901bec6369b5589b98124165f40e6f1
2019-08-09 08:17:52 -07:00
Peter Yeh
8df83ce559 Bump Gloo (#23400)
Summary:
Feature includes

- Log message if bind(2) fail
- Make collective work with single process context
- Use hipStreamCreateWithFlags instead of hipStreamCreateWithPriority
- Add RCCL support
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23400

Differential Revision: D16623110

Pulled By: bddppq

fbshipit-source-id: e75cd8d2e2cad551ad0b0a08667320d7036b78bd
2019-08-02 11:26:28 -07:00
Johannes M Dieterich
4b78ce1ba4 Clean cmake infrastructure up (#23527)
Summary:
Only check for cmake dependencies we directly depend on (e.g., hipsparse but not rocsparse)

Use cmake targets for ROCm where possible.

While there, update the docker CI build infrastructure to only pull in packages by name we directly depend on (anticipating the demise of, e.g., miopengemm). I do not anticipate a docker rebuild to be necessary at this stage as the changes are somewhat cosmetic.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23527

Differential Revision: D16561010

Pulled By: ezyang

fbshipit-source-id: 87cd9d8a15a74caf9baca85a3e840e9d19ad5d9f
2019-07-30 07:26:48 -07:00
Junjie Bai
67aede98c3 Exclude unused onnx targets (#23195)
Summary:
e.g. onnxifi_dummy
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23195

Differential Revision: D16441493

Pulled By: bddppq

fbshipit-source-id: 76816e7a7c73f60f3c7abea10fbdbf086cea0476
2019-07-23 10:22:57 -07:00
Hong Xu
a62c687445 Remove unused atomics detection code. (#23089)
Summary:
USE_{C11,MSC,GCC}_ATOMICS are not used in PyTorch or submodules. Now we remove their underlying detection code.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23089

Differential Revision: D16402750

Pulled By: ezyang

fbshipit-source-id: fde84b958eb0b5b4d3f0406acefa92ab30ea43be
2019-07-20 10:52:53 -07:00
Ilia Cherniavskii
23badc60f3 Fix TBB build for older versions of cmake
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23038

Test Plan:
with-proxy pip install --upgrade cmake==3.11.0
python setup.py clean
USE_CUDA=0 PARALLEL_BACKEND=NATIVE USE_OPENMP=0 USE_TBB=1 MKL_THREADING=TBB BLAS=MKL USE_MKLDNN=1 MKLDNN_THREADING=TBB BUILD_BINARY=1 python setup.py develop --cmake

with-proxy pip install --upgrade cmake==3.13.3
python setup.py clean
USE_CUDA=0 PARALLEL_BACKEND=NATIVE USE_OPENMP=0 USE_TBB=1 MKL_THREADING=TBB BLAS=MKL USE_MKLDNN=1 MKLDNN_THREADING=TBB BUILD_BINARY=1 python setup.py develop --cmake

with-proxy pip install --upgrade cmake==3.6.3
python setup.py clean
USE_CUDA=0 PARALLEL_BACKEND=NATIVE USE_OPENMP=0 USE_TBB=1 MKL_THREADING=TBB BLAS=MKL USE_MKLDNN=1 MKLDNN_THREADING=TBB BUILD_BINARY=1 python setup.py develop --cmake

Imported from OSS

Differential Revision: D16365699

Pulled By: ilia-cher

fbshipit-source-id: cbf779dff63e4e186d9b4c2fc21539a24ce0d5a2
2019-07-18 20:12:26 -07:00
Hong Xu
a6441c00d6 Remove build variable NCCL_EXTERNAL (#22467)
Summary:
It's always set to equal USE_NCCL since we made Gloo depend on the Caffe2 NCCL
build. See 30da84fbe1
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22467

Differential Revision: D16098581

Pulled By: ezyang

fbshipit-source-id: f706ec7cebc2e6315bafca013b669f5a72e04815
2019-07-02 15:36:44 -07:00
Ilia Cherniavskii
6350dbddd1 Fix sequential MKL case (#22062)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22062
ghimport-source-id: a30255d7453c4ffecf40215a785c1e06b7296368

Test Plan:
USE_CUDA=0 PARALLEL_BACKEND=OPENMP BLAS=MKL USE_MKLDNN=1 MKL_SEQ=1
MKLDNN_THREADING=SEQ BUILD_BINARY=1 python setup.py develop --cmake

./build/bin/parallel_info

Imported from OSS

Differential Revision: D15938079

Pulled By: ilia-cher

fbshipit-source-id: e7ef0c5bc75ebb845ebe66bf76a4070d45305b35
2019-06-24 12:56:43 -07:00
Hong Xu
cd0d8480d3 Remove many build options redundantly specified in Python build scripts. (#21877)
Summary:
Currently many build options are explicitly passed from Python build scripts to CMake. But this is unnecessary, at least for many of them. This commit removes the build options that have the same name in CMakeLists.txt and environment variables (e.g., `USE_REDIS`). Additionally, many build options that are not explicitly passed to CMake are lost.

For `ONNX_ML`, `ONNX_NAMESPACE`, and `BUILDING_WITH_TORCH_LIBS`, I changed their default values in CMake scripts (as consistent with what the `CMake.defines` call meant), to avoid their default values being redundantly set in the Python build scripts.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21877

Differential Revision: D15964996

Pulled By: ezyang

fbshipit-source-id: 127a46af7e2964885ffddce24e1a62995e0c5007
2019-06-24 07:17:54 -07:00
Karl Ostmo
49481d576d Torch rename (#20774)
Summary:
This renames the CMake `caffe2` target to `torch`, as well as renaming `caffe2_gpu` to `torch_gpu` (and likewise for other gpu target variants).  Many intermediate variables that don't manifest as artifacts of the build remain for now with the "caffe2" name; a complete purge of `caffe2` from CMake variable names is beyond the scope of this PR.

The shell `libtorch` library that had been introduced as a stopgap in https://github.com/pytorch/pytorch/issues/17783 is again flattened in this PR.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20774

Differential Revision: D15769965

Pulled By: kostmo

fbshipit-source-id: b86e8c410099f90be0468e30176207d3ad40c821
2019-06-12 20:12:34 -07:00
Hong Xu
240d62fbaa Move redundant code that checks NumPy during build to a helper module and add an option to disable building with NumPy
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21417

Reviewed By: ezyang

Differential Revision: D15694357

Pulled By: fmassa

fbshipit-source-id: bc1bda23349ba4531f19619fa4adecb846225c20
2019-06-06 08:15:19 -07:00
Ilia Cherniavskii
580eab6562 Restore TBB module (#20454)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20454
ghimport-source-id: 14aca1dedbe647d41e55e7538a6b7eeab0fc4384

Differential Revision: D15326062

Pulled By: ilia-cher

fbshipit-source-id: 02b005a679b10dc7a264978e87a8d2bb98ab972f
2019-05-28 02:49:36 -07:00
Syed Tousif Ahmed
5268b7dfaf Remove support for CUDA 8 (#20298)
Summary:
1.1.0 stopped support for CUDA 8
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20298

Differential Revision: D15294639

Pulled By: ezyang

fbshipit-source-id: b9411bfe456f93f1529b745dc83b7d6310df684d
2019-05-13 11:24:22 -07:00
Junjie Bai
bc5398451e Enable ROCm multi-gpu with Gloo
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18640

Differential Revision: D15185822

Pulled By: bddppq

fbshipit-source-id: 1b49ab3fb0f251cfc7ef3ddd62033ae0065a4ec3
2019-05-07 09:55:47 -07:00
Jiakai Liu
c7c02724cd CMakeLists changes to enable libtorch for Android (#19762)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19762
ghimport-source-id: 287aa7fea4efd38994e14d794123eb2046b91fc0

Differential Revision: D15087653

Pulled By: ljk53

fbshipit-source-id: 4498ff9f7f7903c3e25541184302b811267958e9
2019-05-03 09:28:53 -07:00
Jiakai Liu
8cd6d2f101 rename BUILD_ATEN_MOBILE to INTERN_BUILD_MOBILE and make it private (#19942)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19942
ghimport-source-id: 6bacc8f5ad7911af8cf5fde9fcb604ade666b862

Reviewed By: dzhulgakov

Differential Revision: D15144325

Pulled By: ljk53

fbshipit-source-id: d63a70f007110d5d1055d6bec1ed09a1a6aafdae
2019-05-01 00:20:24 -07:00
peter
3803d1c901 Fix conda build for Windows (#19824)
Summary:
Let's test it before merging.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19824

Differential Revision: D15116111

Pulled By: soumith

fbshipit-source-id: 0a73de3f045ee1349061674f5f8e2aaba382493c
2019-04-27 23:10:46 -07:00
Sebastian Messmer
a456e1e196 Add either type (#19285)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19285

The either type is a tagged union with two members.
This is going to be used in a diff stacked on top to allow a function to return one of two types.

Also, generally, either<Error, Result> is a great pattern for returning value_or_error from a function without using exceptions and we could use this class for that later.
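The value-or-error pattern described above can be modeled in a few lines. This is an illustrative Python sketch only; the actual implementation is a C++ tagged union, and the class and method names here are hypothetical:

```python
class Either:
    """Tagged union holding exactly one of two members (left or right).

    Illustrative sketch of the either<Left, Right> idea; by convention
    the left member carries the error and the right member the result.
    """

    def __init__(self, value, is_left):
        self._value, self._is_left = value, is_left

    @classmethod
    def make_left(cls, value):
        return cls(value, True)

    @classmethod
    def make_right(cls, value):
        return cls(value, False)

    def is_left(self):
        return self._is_left

    def left(self):
        assert self._is_left, "not a left value"
        return self._value

    def right(self):
        assert not self._is_left, "not a right value"
        return self._value


def parse_int(s):
    """Return value-or-error as Either[str, int] without raising."""
    try:
        return Either.make_right(int(s))
    except ValueError:
        return Either.make_left("cannot parse " + repr(s))
```

A caller can then branch on `is_left()` instead of wrapping the call in a try/except, which is the exception-free style the commit message refers to.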

Reviewed By: dzhulgakov

Differential Revision: D14931923

fbshipit-source-id: 7d1dd77b3e5b655f331444394dcdeab24772ab3a
2019-04-18 02:04:43 -07:00
Edward Yang
48a35135fb Convert all tabs to spaces, add CI. (#18959)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18959
ghimport-source-id: a934163fa34cb2019732d5f49dc7290c376bf156

Differential Revision: D14831246

Pulled By: ezyang

fbshipit-source-id: beb92dc4ee8c82f4c8259c081dd72e477fe7a9d0
2019-04-09 08:12:26 -07:00
peter
5e33085f27 Make it possible for users for select /Zi or /ZI over /Z7 when using MSVC (#18790)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/18701.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18790

Differential Revision: D14748195

Pulled By: ezyang

fbshipit-source-id: e50df1b5ca199a88d7b5ea3ea45d25d23cd31a27
2019-04-03 08:24:52 -07:00
Sebastian Messmer
bb8a0d717c Enable gmock and fix system gtest issue (#18706)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18706

- Enable gmock
- Fix issue where the gtest source files in third_party would include system gtest headers

Reviewed By: ezyang

Differential Revision: D14715302

fbshipit-source-id: 5335390913e651bda85c69d7ea9b5c1bce58f172
2019-04-02 12:33:22 -07:00
Junjie Bai
0fe6e8c870 Remove ComputeLibrary submodule
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18052

Reviewed By: ezyang

Differential Revision: D14477355

fbshipit-source-id: c56b802f6d69701596c327cf9af6782f30e335fa
2019-03-16 09:06:42 -07:00
HE, Tao
98c54e9fa6 When openblas exists, "OpenBLAS_FOUND" is defined, rather than "OPENBLAS_FOUND". (#17841)
Summary:
See https://github.com/pytorch/pytorch/blob/master/cmake/Modules/FindOpenBLAS.cmake#L36

This typo led to CMake failing to detect OpenBLAS on Ubuntu.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17841

Differential Revision: D14400261

Pulled By: soumith

fbshipit-source-id: 287e019e122230cf6b70ab1ea94e5c514f429c88
2019-03-10 09:34:50 -07:00
Johannes M Dieterich
1607bb322d Support all ROCm supported uarchs simultaneously: gfx803, gfx900, gfx906 (#17367)
Summary:
Correct misspelled flag.

Remove dependency on debug flag (HCC_AMDGPU_TARGET)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17367

Differential Revision: D14227334

Pulled By: bddppq

fbshipit-source-id: d838f219a9a1854330b0bc851c40dfbba77a32ef
2019-02-26 11:54:07 -08:00
Lu Fang
3d68a2d6de Add foxi submodule (ONNXIFI facebook extension)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17178

Reviewed By: yinghai

Differential Revision: D14197987

Pulled By: houseroad

fbshipit-source-id: c21d7235e40c2ca4925a10c467c2b4da2f1024ad
2019-02-25 08:00:03 -08:00
Soumith Chintala
3a47d56946 Fix static linkage cases and NO_DISTRIBUTED=1 + CUDA (#16705) (#17337)
Summary:
Attempt #2 (attempt 1 is https://github.com/pytorch/pytorch/pull/16705 and got reverted because of CI failures)

Fixes https://github.com/pytorch/pytorch/issues/14805
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17337

Differential Revision: D14175626

Pulled By: soumith

fbshipit-source-id: 66f2e10e219a1bf88ed342ec5c89da6f2994d8eb
2019-02-21 16:12:02 -08:00
bddppq
c063a33ef3 Add support to build for multiple amd gpu targets (#17329)
Summary:
iotamudelta petrex
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17329

Differential Revision: D14161277

Pulled By: bddppq

fbshipit-source-id: f3eb9f52e96a8fcd779c57df0f8c9a2c54754e35
2019-02-20 18:45:24 -08:00
Lu Fang
d73e6cb59d Automatic update of fbcode/onnx to 4c091e048ca42682d63ccd3c1811560bc12b732d (#17264)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17264

Previous import was 822d8df0a2a32233c6022f50a158817a0f19bdc7

Included changes:
- **[4c091e0](https://github.com/onnx/onnx/commit/4c091e0)**: Support defined ONNX_ML in parent cmake files (#1821) <Lu Fang>
- **[57372f3](https://github.com/onnx/onnx/commit/57372f3)**: Delete OpsetVersionConverter.md which is a duplicate of VersionConverter.md (#1818) <Prasanth Pulavarthi>
- **[ab1c57e](https://github.com/onnx/onnx/commit/ab1c57e)**: [ONNXIFI]Add extension to be implementable (#1796) <Rui Zhu>
- **[b92eee8](https://github.com/onnx/onnx/commit/b92eee8)**: Revert "Implement Op Annotation's for ONNX (#1648)" (#1812) <Ke Zhang>
- **[61f1e9e](https://github.com/onnx/onnx/commit/61f1e9e)**: Enable ONNX_ML by default (#1810) <Shinichiro Hamaji>
- **[4f064a1](https://github.com/onnx/onnx/commit/4f064a1)**: fix Greater and Less doc (#1811) <Guoliang Hua>
- **[0628582](https://github.com/onnx/onnx/commit/0628582)**: Implement Op Annotation's for ONNX (#1648) <Armen>
- **[ad9d2f7](https://github.com/onnx/onnx/commit/ad9d2f7)**: Versioning doc update for Opset 9 (#1805) <Vinitra Swamy>
- **[e71e3be](https://github.com/onnx/onnx/commit/e71e3be)**: add dilation case for ConvTranspose op (#1797) <Randy>

Reviewed By: yinghai

Differential Revision: D14135024

fbshipit-source-id: 1e4f9dda89abf48994798d080dd5d58207a6e4b6
2019-02-19 14:54:34 -08:00
Edward Yang
4047c97266 Revert D13952085: [pytorch][PR] Fix static linkage cases and NO_DISTRIBUTED=1 + CUDA
Differential Revision:
D13952085

Original commit changeset: 410c4e117a44

fbshipit-source-id: fca59c37e71f8e61ae52867d5401b28fbacefe5a
2019-02-05 07:42:59 -08:00
Soumith Chintala
3f570b5eea Fix static linkage cases and NO_DISTRIBUTED=1 + CUDA (#16705)
Differential Revision: D13952085

Pulled By: soumith

fbshipit-source-id: 410c4e117a44c08eadc6f3ded91fafc320a7c696
2019-02-04 16:51:12 -08:00
JerryShih
73db487a8e Update the cmake build configuration for AppleClang compiler (#15820)
Summary:
This PR tries to merge https://github.com/pytorch/pytorch/pull/11563 again and fixes the linking error in https://github.com/pytorch/pytorch/pull/14837.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15820

Differential Revision: D13942024

Pulled By: ezyang

fbshipit-source-id: dc6d1e9c4b0f177914f3745665244272a03ce33c
2019-02-04 08:53:47 -08:00
SsnL
13422fca32 Add torch.backends.openmp.is_available(); fix some cmake messages (#16425)
Summary:
1. add `torch.backends.openmp.is_available()`
2. Improve various `cmake` outputs
3. Fix LDFLAGS not respected by `caffe2_pybind11_state_*` targets
4. Fix `MKL` warning message, and QUIET flag.
5. Fix various typos
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16425

Differential Revision: D13903395

Pulled By: soumith

fbshipit-source-id: d15c5d46f53e1ff1c27fca2887b9d23d0bd85b4d
2019-01-31 16:15:46 -08:00
Zachary DeVito
21193bf123 try to get rid of tmp_install (#16414)
Summary:
Rehash of previous attempts. This tries a different approach where we accept the install as specified in cmake (leaving bin/, include/, and lib/ alone), and then try to adjust the rest of the files to this more standard layout.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16414

Differential Revision: D13863635

Pulled By: zdevito

fbshipit-source-id: 23725f5c64d7509bf3ca8f472dcdcad074de9828
2019-01-29 17:29:40 -08:00
Yaxun (Sam) Liu
9521a15c88 hip-clang enablement (#16085)
Summary:
Initial enabling of the upcoming hip-clang compiler for the PyTorch source base.

Changes:
* update the Eigen submodule to a version including our upstreamed hip-clang enabling there
* modify a few ifdef guards with the `__HIP__` macro used by hip-clang
* use `__lane_id` instead of `hc::__lane_id`
* add Debug flags for ROCm to the cmake infrastructure
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16085

Differential Revision: D13709459

Pulled By: ezyang

fbshipit-source-id: 1b7b33fe810a0434766180580d4443ea177eb7c7
2019-01-22 09:09:48 -08:00
peter
f7733526aa Generate PDB files for better debugging on Windows (#16008)
Summary:
1. Unify `build_pytorch_libs.bat`, `setup.py` and `torch/CMakeLists.txt` on the debugging flags with the `CMAKE_BUILD_TYPE` being `Debug`, `Release` and `RelWithDebInfo`.
2. Install PDBs through CMake if they are generated.

Reference:
1. CMake PDB install: https://gitlab.kitware.com/cmake/cmake/issues/18393#note_459199
2. About debugging flags https://stackoverflow.com/a/4662345
3. MSDN page about /DEBUG flag: https://docs.microsoft.com/en-us/cpp/build/reference/debug-generate-debug-info?view=vs-2017
4. MSDN page about /Z{i/I/7}: https://docs.microsoft.com/en-us/cpp/build/reference/z7-zi-zi-debug-information-format?view=vs-2017

Work to do:
- [x] Test the changes work in Release config through this PR
- [ ] <del> Test debug build through https://github.com/pytorch/pytorch/pull/16009 </del>
- [x] Test release build with debugging symbols through #16013

Difficulties:
- [x] Replace /Zi flags with /Z7 (which will be added if DEBUG or RelWithDebInfo is used), as it is not supported by sccache
- [x] Resolve `LINK : fatal error LNK1210: exceeded internal ILK size limit; link with /INCREMENTAL:NO` in the debug build
- [ ] DEBUG build blocked by a MSVC bug. In order to resolve it, we'll need to update the MSVC in CI: https://developercommunity.visualstudio.com/content/problem/225957/fatal-error-lnk1318-unexpected-pdb-error-ok-0.html
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16008

Differential Revision: D13709527

Pulled By: ezyang

fbshipit-source-id: e8365bc75d9ec64099093f7001f83d99a06b196b
2019-01-16 23:34:32 -08:00
Tongliang Liao
55511004d1 Resolve errors in perfkernel for Windows (#16031)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16031

1. MSVC only has _mm_prefetch(const char*, int). Fixed in both python codegen and C++ files.
2. uint32_t in "cvtsh_ss_bugfix.h" requires "#include <cstdint>".
3. Some files use gflags headers. Add dependency via c10.
4. Isolate arch flags with interface library and private compile options.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15753

Reviewed By: dskhudia

Differential Revision: D13636233

Pulled By: jspark1105

fbshipit-source-id: cdcbd4240e07b749554a2a5676c11af88f23c31d
2019-01-16 21:51:00 -08:00
Shane Li
620ff25bdb Enhance cpu support on gloo based multi-nodes mode. (#11330)
Summary:
1. Add some Gloo communication operators to the related fallback lists;
2. Work around compile errors when using a fallback operator whose CPU operator inherits directly from 'OperatorBase', like PrefetchOperator;
3. Add new CPU context support for some Python module files and the resnet50 training example file.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11330

Reviewed By: yinghai

Differential Revision: D13624519

Pulled By: wesolwsk

fbshipit-source-id: ce39d57ddb8cd7786db2e873bfe954069d972f4f
2019-01-15 11:47:10 -08:00
Jesse Hellemn
8964a2e6e6 Split Caffe2 CI into cmake-only and python builds (#15917)
Summary:
bypass-lint

- Change all Caffe2 builds to use setup.py instead of cmake
- Add a -cmake- Caffe2 build configuration that uses cmake and only builds cpp
- Move skipIfCI logic from onnx test scripts to the rest of CI logic
- Removal of old PYTHONPATH/LD_LIBRARY_PATH/etc. env management
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15917

Reviewed By: orionr

Differential Revision: D13637583

Pulled By: pjh5

fbshipit-source-id: c5c5639db0251ba12b6e4b51b2ac3b26a8953153
2019-01-14 15:20:44 -08:00
Gu, Jinghui
12e0ed55b4 Upgrade MKL-DNN to version 0.17 and static build MKL-DNN (#15504)
Summary:
Upgrade MKL-DNN to 0.17 and statically build MKL-DNN to fix potential build errors caused by an old mkldnn version on the host system.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15504

Differential Revision: D13547885

Pulled By: soumith

fbshipit-source-id: 46f790a3d9289c1e153e51c62be17c5206ea8f9a
2018-12-25 22:56:51 -08:00
James Reed
acbd9c49b0 Direct FBGEMM integration into ATen (#13777)
Summary:
This PR implements infrastructure for post-processing a model to apply int8 quantization to its `nn.Linear` modules. Highlights of the implementation:

1) Inputs and outputs are `float` (quantized and packed internally), but the weight is quantized and packed ahead of time for efficiency. This implementation performs well in small-batch size GEMM calls. It should not be considered a general-purpose quantized GEMM kernel.
2) Weight packing is dependent on machine architecture (e.g. vector register width), so it is done just-in-time. Concretely, it is done on model load for the weights and it is done during operator execution for the input value.
3) Biases are unquantized
4) We fail loudly if we are attempting to run this on a machine that does not support FBGEMM. This is because we do not want a model's numerics to differ based on which machine it is run on. A model containing these FBGEMM ops *must* be run with FBGEMM

The API can be seen in the added test case. Highlights are:
1) `torch.jit.quantized.quantize_linear_modules` walks the module hierarchy of the passed-in Module and replaces all `nn.Linear` modules with a new `QuantizedLinear` module, which encapsulates the behavior described above.
2) `_pack()` and `_unpack()` script methods are present on `QuantizedLinear` modules. These methods should be called before serialization and after deserialization, respectively. This ensures that the weight matrix is properly packed for the running machine's architecture. Note that in the long term, we would like to move toward a more Pickle-style serialization technique, rather than having these explicit methods that mutate member values. This is blocked on being able to assign attributes in a ScriptMethod, among other things.
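As a rough illustration of the numerics described above (float inputs and outputs, weight quantized ahead of time, bias unquantized), here is a hedged pure-Python sketch; the helper names are hypothetical, and the real QuantizedLinear dispatches to architecture-specific FBGEMM kernels rather than doing this naive arithmetic:

```python
def quantize_weight(weight):
    """Symmetric per-tensor int8 quantization: (int8 values, fp32 scale).

    Illustrative only; the actual FBGEMM path also packs the quantized
    weight into a machine-dependent layout ahead of time.
    """
    max_abs = max(abs(v) for row in weight for v in row) or 1.0
    scale = max_abs / 127.0
    return [[int(round(v / scale)) for v in row] for row in weight], scale


def quantized_linear(x, q_weight, scale, bias):
    """y[j] = sum_k x[k] * (q_weight[j][k] * scale) + bias[j]; float in/out."""
    return [
        sum(xk * wk for xk, wk in zip(x, row)) * scale + b
        for row, b in zip(q_weight, bias)
    ]
```

For example, a weight row [1.0, -2.0] quantizes with scale 2/127, and the dequantized product stays within quantization error of the float result.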
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13777

Differential Revision: D13383276

Pulled By: jamesr66a

fbshipit-source-id: 00f29c9f34544add2b90107e3cf55a287802c344
2018-12-21 10:35:51 -08:00
Marat Dukhan
9abd755a76 Make cpuinfo logging less verbose (#15405)
Summary:
Log only errors in cpuinfo.

Fix to #15401 and #15398
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15405

Differential Revision: D13526251

Pulled By: Maratyszcza

fbshipit-source-id: 4d9eba0912f7b45093bed2e343cd77a151ffa8c4
2018-12-19 20:23:36 -08:00
Jerry Zhang
12cf5178aa caffe2 mobile opengl (#15322)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15322

caffe2 mobile opengl code is not used, deleting it to reduce complications when we perform other changes

Reviewed By: Maratyszcza

Differential Revision: D13499943

fbshipit-source-id: 6479f6b9f50f08b5ae28f8f0bc4a1c4fc3f3c3c2
2018-12-18 08:20:52 -08:00
Edward Yang
71ee882157 Reenable OpenMP by reverting the following two commits. (#15315)
Summary:
Revert "Put back linker flag for OpenMP to prevent build break on ppc64le (#14569)"

This reverts commit a84e873bb1.

Revert "Update OpenMP cmake setting for xcode 9 compiler(AppleClang 9.0) (#14473)"

This reverts commit 8901935ad4.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15315

Differential Revision: D13495852

Pulled By: ezyang

fbshipit-source-id: bcd3f60088b14831c53d3c171f10cd1ab6b35dee
2018-12-17 19:54:41 -08:00
bddppq
479481b6cb Remove linker and dlopen flags that allowed undefined symbols in rocm build (#15091)
Summary:
Previously the undefined symbols were caused by disabled_modules in tools/amd_build/disabled_features.json (now it's cleared).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15091

Differential Revision: D13429595

Pulled By: bddppq

fbshipit-source-id: b341e83f9e5a8d16440a364e837b045a8a4fd6e1
2018-12-11 23:23:47 -08:00
Edward Yang
b710642969 Make ATen HIPify out-of-place, but still reuse CUDA names. (#14866)
Summary:
```
    This diff changes the HIPification of ATen to be out-of-place.
    We now have the following mappings:

    - ATen/cuda => ATen/hip
    - ATen/native/cuda => ATen/native/hip
    - ATen/native/sparse/cuda => ATen/native/sparse/hip
    - THC => THH
    - THCUNN => THHUNN

    The build system is adjusted to know about these new build paths,
    and HIPify is taught how to adjust include paths and
    THC_GENERIC_FILE appropriately.  ATen_hip is now built as
    the ATen_hip library, rather than reusing ATen_cuda.

    However, despite these new filepaths, none of the identifiers in ATen
    have actually changed.  So, e.g., THHGeneral.h still defines functions
    named THC_blahblah, and HIP still shows up as CUDA in PyTorch itself.
    We'll tackle this in a subsequent PR; this diff is just to get the files
    out-of-place.

    Minor extra improvements:

    - Don't edit tmp_install when hipifying
    - HIP no longer builds native_cudnn_cpp; it was unnecessary
    - Caffe2_HIP_INCLUDES is now Caffe2_HIP_INCLUDE, for consistency
      with all the other variables.
    - HIP build now properly respects ATEN_CUDA_FILES_GEN_LIB (it
      did not previously.)
    - You can now override file extension matching in pyHIPIFY
      by explicitly specifying its full name in the matching list.
      This is used so we can HIPify CMakeLists.txt in some situations.

    A little bit of string and ceiling wax:

    - gen.py grows a --rocm flag so that it knows to generate CUDA
      files which actually refer to the HIP headers (e.g., THH.h)
      We'll get rid of this eventually and generate real HIP files,
      but not for this PR.
    - Management of HIP dependencies is now completely deleted
      from the ATen CMakeLists.txt.  The old code was dead (because
      it was shoveled in ATen_CUDA_DEPENDENCY_LIBS and promptly
      ignored by the Caffe2 build system) and didn't actually work.
```

Stacked on https://github.com/pytorch/pytorch/pull/14849 review last commit only
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14866

Differential Revision: D13419475

Pulled By: ezyang

fbshipit-source-id: cb4c843df69a1d8369314c9fab1b7719520fa3db
2018-12-11 19:15:27 -08:00
rohithkrn
7e2b074219 Integrate rocBLAS fp16 api into Caffe2 (#14882)
Summary:
This PR integrates rocBLAS half and mixed precision APIs in to Caffe2.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14882

Differential Revision: D13407840

Pulled By: bddppq

fbshipit-source-id: 75cb0d74da066776fa66575f1d255e879d36121e
2018-12-10 17:54:06 -08:00
Daniel Bermond
478eb70c07 Fix build with OpenCV 4.0 (#14356)
Summary:
Fixes #14355
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14356

Differential Revision: D13356237

Pulled By: bddppq

fbshipit-source-id: 2bf6ee21995c2c7b617c4e78ea7341f975f1b937
2018-12-07 16:40:31 -08:00
Sergei Nikolaev
a0ee3a279c USE_TENSORRT support and TensorRT 5 compatibility
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13945

Differential Revision: D13317525

Pulled By: yinghai

fbshipit-source-id: 8630dfec1bbc5aac19539e344e7c38a7fd8b051d
2018-12-07 14:01:11 -08:00
Daya S Khudia
50936cb06e Move avx2 specific code in different source files (#28)
Summary:
Pull Request resolved: https://github.com/pytorch/FBGEMM/pull/28

Pull Request resolved: https://github.com/pytorch/pytorch/pull/14516

This is the first diff in a series of diffs that will separate out avx2 specific code in separate files. The goal is to compile as little as possible code with avx2 and avx512 compiler flags.

Reviewed By: jianyuh

Differential Revision: D13248376

fbshipit-source-id: 401c2e9d3cd96c420fd08c3efa011febce96ffbb
2018-12-05 12:19:35 -08:00
Your Name
cf059028f0 Do not load ROCm cmake files if USE_ROCM is off (#14261)
Summary:
Previously the build unconditionally tried to load the ROCm cmake files, so there was no way to disable the ROCm build. After this change, USE_ROCM=0 will disable the ROCm build.
Should fix #14025

soumith
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14261

Differential Revision: D13242090

Pulled By: bddppq

fbshipit-source-id: 652ec7d49dce9b357778bfa53a8e04b7079787ab
2018-11-29 11:17:19 -08:00
JerryShih
8901935ad4 Update OpenMP cmake setting for xcode 9 compiler(AppleClang 9.0) (#14473)
Summary:
Original PR: https://github.com/pytorch/pytorch/pull/11563
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14473

Differential Revision: D13234208

Pulled By: ezyang

fbshipit-source-id: 7d874c63659e93728af239ecdfb85547613e52ad
2018-11-28 09:28:26 -08:00
Jesse Hellemn
afb2c0ce86 changing some rpath stuff (#14304)
Summary:
See if anything breaks
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14304

Differential Revision: D13201418

Pulled By: pjh5

fbshipit-source-id: ac2101b61a23bda37329d4d923c3d9d120e718bf
2018-11-26 15:57:47 -08:00
Marat Dukhan
351478439f Disable QNNPACK for multi-architecture iOS builds (#14125)
Summary:
QNNPACK contains assembly files, and CMake tries to build them for the wrong architectures in multi-arch builds. This patch has two effects:
- Disables QNNPACK in multi-arch iOS builds
- Specifies a single `IOS_ARCH=arm64` by default (covers most iPhones/iPads on the market)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14125

Differential Revision: D13112366

Pulled By: Maratyszcza

fbshipit-source-id: b369083045b440e41d506667a92e41139c11a971
2018-11-16 21:18:01 -08:00
ArutyunovG
8e91da4cb3 Windows shared build (#13550)
Summary:
Hi guys,

I'd like to build Caffe2 with more supported options on Windows with Microsoft Visual Studio.
This is the first pull request.
Running scripts/build_windows_shared.bat builds Caffe2 with both CMAKE_BUILD_TYPE=Debug and CMAKE_BUILD_TYPE=Release with Visual Studio 14 2015.
CUDA is 9.0, cudnn is 7.0.5, glog, gflags and lmdb are supported on my system.
Python is 3.5, Detectron works from python interface as well.
It was even possible to debug detectron code and step into caffe2_gpu.dll with pdbs built.

What is disappointing is that the c10/experimental ops don't build with this Visual Studio generator, so I added a special option INCLUDE_EXPERIMENTAL_C10_OPS (default ON) to deal with it in build_windows_shared.bat.

After this pull request the next step is to add Visual Studio 2017 support in the script.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13550

Reviewed By: ezyang

Differential Revision: D13042597

Pulled By: orionr

fbshipit-source-id: f313f909f599cd582a1d000eff766eef3a9fc4fc
2018-11-16 12:16:28 -08:00
Anders Papitto
2983998bb3 add torch-python target (#12742)
Summary:
This is the next minimal step towards moving _C into cmake. For now,
leave _C in setup.py, but reduce it to an empty stub file. All of its
sources are now part of the new torch-python cmake target.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12742

Reviewed By: soumith

Differential Revision: D13089691

Pulled By: anderspapitto

fbshipit-source-id: 1c746fda33cfebb26e02a7f0781fefa8b0d86385
2018-11-16 11:43:48 -08:00
Johannes M Dieterich
53a3c46950 Switch to packaged Thrust on Ubuntu, enable CentOS 7.5 as a CI target (#12899)
Summary:
1) Use the hip-thrust version of Thrust as opposed to the GH master. (ROCm 267)

2) CentOS 7.5 docker (ROCm 279)

* Always install the libraries at docker creation for ubuntu.
* Add Dockerfile for CentOS ROCm
* Enable the centos build
* Source devtoolset in bashrc
* Set locales correctly depending on whether we are on Ubuntu or CentOS
* Install a newer cmake for CentOS
* Checkout thrust as there is no package for CentOS yet.

PyTorch/Caffe2 on ROCm passed tests: https://github.com/ROCmSoftwarePlatform/pytorch/pull/280

For attention: bddppq ezyang

A Docker rebuild for Ubuntu is not urgent (getting rid of the Thrust checkout and package install is mainly cosmetic). If a docker image for CentOS 7.5 is wanted, a rebuild is necessary. I tested the build of PyTorch in the CentOS docker. PyTorch unit tests mostly work; however, a test in test_jit causes a Python recursion error that seems to be due to the Python 2 on CentOS, as we have never seen this on Ubuntu; hence please do not enable unit tests.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12899

Differential Revision: D13029424

Pulled By: bddppq

fbshipit-source-id: 1ca8f4337ec6a603f2742fc81046d5b8f8717c76
2018-11-12 14:39:54 -08:00
Gu, Jinghui
d01cb70497 build with mkl-dnn by default (#13303)
Summary:
build with mkl-dnn by default
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13303

Reviewed By: yinghai

Differential Revision: D12979633

Pulled By: orionr

fbshipit-source-id: 00d23fa27c0d13e82f7e5acb3ebd00ed7ba1d5dc
2018-11-08 11:18:27 -08:00
peter
fd9aaa6b79 Fix linking errors on Windows (#13100)
Summary:
1. Removes the flag "/FORCE:UNRESOLVED" that shouldn't be used.
2. Fix the code logic for ONNX_BUILD_MAIN_LIBS on Windows
3. Add a patch for protobuf using CMake
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13100

Differential Revision: D12978950

Pulled By: orionr

fbshipit-source-id: db9eb8136acf5712cfb5a24ed228b7934d873331
2018-11-08 09:54:09 -08:00
rohithkrn
afc7dbd586 Hipify caffe2/utils/math_gpu.cu (#13521)
Summary:
This PR adds caffe2/utils/math_gpu.cu to pyHipify

bddppq petrex
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13521

Differential Revision: D12954843

Pulled By: bddppq

fbshipit-source-id: a2bf367da07e49cb7807ba6876b42d0733fc8205
2018-11-07 11:34:15 -08:00
Daya S Khudia
18de330e86 CMake integration for int8 server operators
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13558

Reviewed By: Maratyszcza

Differential Revision: D12945460

Pulled By: dskhudia

fbshipit-source-id: 1a91027b305fd6af77eebd9a4fad092a12f54712
2018-11-06 15:45:15 -08:00
Mingzhe Li
4bca51e3e7 unify BLAS check between Caffe2 and ATen (#13514)
Summary:
This PR unifies the BLAS check between Caffe2 and ATen. It skips the redundant BLAS check for ATen under certain conditions.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13514

Reviewed By: orionr

Differential Revision: D12905272

Pulled By: mingzhe09088

fbshipit-source-id: 05163704f363c97a762ff034f88a67bd32ac01d0
2018-11-02 18:40:10 -07:00
Hong Xu
a43c6385f1 When looking for pybind11, do not attempt to get properties from pybind11:pybind11. (#12188)
Summary:
There is no property named "INTERFACE_INCLUDE_DIRECTORIES" on pybind11::pybind11. This causes a CMake error if a system installation of pybind11 exists. In addition, pybind11_INCLUDE_DIRS is already set once "find_package(pybind11 CONFIG)" finds pybind11.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12188

Differential Revision: D10362655

Pulled By: soumith

fbshipit-source-id: 9c5d13295c4a2cf9aacd03e195994287d06ed15c
2018-10-31 11:23:01 -07:00
Gu, Jinghui
dbab9b73b6 separate mkl, mklml, and mkldnn (#12170)
Summary:
1. Remove avx2 support in mkldnn
2. Separate mkl, mklml, and mkldnn
3. Fix convfusion test case
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12170

Reviewed By: yinghai

Differential Revision: D10207126

Pulled By: orionr

fbshipit-source-id: 1e62eb47943f426a89d57e2d2606439f2b04fd51
2018-10-29 10:52:55 -07:00
Junjie Bai
883da952be Hipify caffe2/core (#13148)
Summary:
petrex ashishfarmer iotamudelta
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13148

Reviewed By: xw285cornell

Differential Revision: D10862276

Pulled By: bddppq

fbshipit-source-id: 1754834ec50f7dd2f752780e20b2a9cf19d03fc4
2018-10-26 15:27:32 -07:00
Marat Dukhan
5e73b828bd CMake integration for Int8 ops
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13145

Differential Revision: D10860849

Pulled By: Maratyszcza

fbshipit-source-id: fdbcc23ff9beaeaedfd561176df6cfe87685c1f5
2018-10-25 22:25:10 -07:00
Orion Reblitz-Richardson
99d24aefc3 Move a number of ATen checks out of Dependencies.cmake (#12990)
Summary:
cc Yangqing mingzhe09088 anderspapitto mingzhe09088
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12990

Differential Revision: D10862301

Pulled By: orionr

fbshipit-source-id: 62ba09cf0725f29692fac71bc30173469283390b
2018-10-25 17:26:25 -07:00
Anders Papitto
1b530fdae0 remove the find-package codepath for gloo in caffe2
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12893

Differential Revision: D10493310

Pulled By: anderspapitto

fbshipit-source-id: ba5bd375c118b0f0ab7fb7b9fda010fe17a6ac8d
2018-10-22 11:54:53 -07:00
Anders Papitto
8f51c513a6 gloo: build once, share between pytorch/caffe2
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12885

Differential Revision: D10492244

Pulled By: anderspapitto

fbshipit-source-id: 79af1ceb9bb0dab4585a728e64554ff4f38d6c32
2018-10-22 11:06:14 -07:00
Anders Papitto
79709f02e9 fix overwriting of CMAKE_EXE_LINKER_FLAGS (#12834)
Summary:
bug lurking since 2016
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12834

Reviewed By: bddppq

Differential Revision: D10452484

Pulled By: anderspapitto

fbshipit-source-id: 352584af06e2fb35338fb66b3d8eb1050b716349
2018-10-18 15:34:28 -07:00
Hector Yuen
17ab3bd502 implement rowwise quantization for fp16 (#12382)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12382

implement fp16-> (uint8 + scale and bias in fp32)

this is similar to fp32 rowwise quantization

We could have stored the scale and bias in fp16, but there is not much motivation to: the savings are small, and those values have to be converted to fp32 for processing anyway, since x86 doesn't support half-float operations.
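The rowwise scheme described above (uint8 values plus per-row scale and bias in fp32) can be sketched in plain Python. This is an illustration under the stated assumptions, not the Caffe2 operator implementation, and the function names are hypothetical:

```python
def rowwise_quantize(matrix):
    """Quantize each row to uint8, keeping a per-row scale and bias in fp32.

    Mirrors the existing fp32 rowwise scheme the commit refers to; the fp16
    input path would simply convert half floats to fp32 first.
    """
    out = []
    for row in matrix:
        lo, hi = min(row), max(row)
        scale = (hi - lo) / 255.0 or 1.0  # guard against constant rows
        bias = lo
        out.append(([int(round((v - bias) / scale)) for v in row], scale, bias))
    return out


def rowwise_dequantize(quantized):
    """Recover approximate fp32 values from (uint8 row, scale, bias) tuples."""
    return [[qv * scale + bias for qv in q] for q, scale, bias in quantized]
```

Round-tripping a row through quantize/dequantize reproduces each value to within one quantization step (scale/2).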

Reviewed By: csummersea

Differential Revision: D10220463

fbshipit-source-id: 6c382026de881f03798c2e5fc43abfc80f84ea1f
2018-10-12 13:57:55 -07:00
Johannes M Dieterich
957142a4fe switch ROCm CI targets to white rabbit release (#12577)
Summary:
* switches docker files over to white rabbit release - removed custom package installs
* skips five tests that regressed in that release
* fixes some case-sensitivity issues in ROCm supplied cmake files by sed'ing them in the docker
* includes first changes to the infrastructure to support upcoming hip-clang compiler
* prints ROCm library versions as part of the build (as discussed w/ ezyang )
* explicitly searches for miopengemm
* installs the new hip-thrust package to be able to remove the explicit Thrust checkout in a future revision
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12577

Differential Revision: D10350165

Pulled By: bddppq

fbshipit-source-id: 60f9c9caf04a48cfa90f4c37e242d944a175ab31
2018-10-11 18:03:11 -07:00
Giovanni
0d50c117db Introduce BUILD_ATEN_ONLY cmake option (#12443)
Summary:
Following up #11488 conversation with orionr
And our brief conversation at PTDC about ATen with soumith and apaszke

This PR enables a very slim build focused on ATen, in particular without caffe2 and protobuf among other dependencies.
With this PR, NimTorch tests pass fully, including AD, convolutions, wasm, etc.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12443

Reviewed By: mingzhe09088

Differential Revision: D10249313

Pulled By: orionr

fbshipit-source-id: 4f50503f08b79f59e7717fca2b4a1f420d908707
2018-10-10 12:54:19 -07:00
Christian Puhrsch
f564163951 Remove SSE-only code and convolve5x5 (#12109)
Summary:
Performance-oriented code will use AVX/AVX2, so we don't need SSE-specific code anymore. This will also reduce the probability of running into an error on legacy CPUs.

On top of this, convolve is covered by modern libraries such as MKLDNN, which are much more performant and which we now build against by default (even for builds from source).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12109

Differential Revision: D10055134

Pulled By: colesbury

fbshipit-source-id: 789b8a34d5936d9c144bcde410c30f7eb1c776fa
2018-10-09 10:53:50 -07:00
vishwakftw
39bd73ae51 Guard NumPy usage using USE_NUMPY (#11798)
Summary:
All usages of the `ndarray` construct have now been guarded with `USE_NUMPY`. This eliminates the requirement of NumPy while building PyTorch from source.

Fixes #11757

Reviewed By: Yangqing

Differential Revision: D10031862

Pulled By: SsnL

fbshipit-source-id: 32d84fd770a7714d544e2ca1895a3d7c75b3d712
2018-10-04 12:11:02 -07:00
Yangqing Jia
38f3d1fc40 move flags to c10 (#12144)
Summary:
Still in flux.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12144

Reviewed By: smessmer

Differential Revision: D10140176

Pulled By: Yangqing

fbshipit-source-id: 1a313abed022039333e3925d19f8b3ef2d95306c
2018-10-04 02:09:56 -07:00
Junjie Bai
65bf181ddf Add "ai.onnx.pytorch" onnx domain (#12157)
Summary:
zrphercule
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12157

Differential Revision: D10100799

Pulled By: bddppq

fbshipit-source-id: 76fdd126e0b52c54276752b3b0174735355a7d2f
2018-09-28 09:57:06 -07:00
Orion Reblitz-Richardson
94c513cc7f Improve pybind11 message (#11640)
Summary:
Improving the message based on https://github.com/pytorch/pytorch/issues/11570
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11640

Differential Revision: D10033383

Pulled By: orionr

fbshipit-source-id: 0cdcdbe0582d896283a12970aebe771efa390dd2
2018-09-25 11:26:05 -07:00
Edward Yang
fcb3ccf23f Don't record Git version automatically via cmake (#12046)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12046

This /sounds/ like a good idea in theory, but a feature
like this must be implemented very carefully, because if
you just plop the Git version in a header (that is included
by every file in your project, as macros.h is), then every
time you do a 'git pull', you will do a FULL rebuild, because
macros.h is going to regenerate to a new version and of course
you have to rebuild a source file if a header file changes.

I don't have time to implement it correctly, so I'm axing
the feature instead. If you want git versions in, e.g.,
nightly builds, please explicitly specify that when you feed
in the version.
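The careful implementation the message alludes to would confine the git hash to a single generated source file, so that a `git pull` only dirties one translation unit instead of forcing a full rebuild. A rough illustrative sketch of that idea, assuming a hypothetical `version.cpp.in` template and `torch_git_version()` accessor (neither is part of this commit):

```cmake
# Hypothetical sketch: capture the git hash at configure time and write it
# into one dedicated .cpp, keeping it out of any widely-included header.
execute_process(
  COMMAND git rev-parse HEAD
  WORKING_DIRECTORY ${CMAKE_SOURCE_DIR}
  OUTPUT_VARIABLE GIT_SHA
  OUTPUT_STRIP_TRAILING_WHITESPACE)
configure_file(version.cpp.in ${CMAKE_BINARY_DIR}/version.cpp @ONLY)
# version.cpp.in would contain only:
#   const char* torch_git_version() { return "@GIT_SHA@"; }
# The header declaring torch_git_version() never changes, so a new
# revision rebuilds just this one file rather than the whole tree.
```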

Reviewed By: pjh5

Differential Revision: D10030556

fbshipit-source-id: 499d001c7b8ccd4ef15ce10dd6591c300c7df27d
2018-09-25 09:40:19 -07:00
Mingzhe Li
a7cbcb1bb9 Enable build_python on windows (#11385)
Summary:
The PR aims to resolve issues related to BUILD_PYTHON and BUILD_TEST after FULL_CAFFE2 is removed on Windows.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11385

Reviewed By: orionr

Differential Revision: D9884906

Pulled By: mingzhe09088

fbshipit-source-id: fc114c0cbff6223f1ec261161e4caecc1fef5dd6
2018-09-17 21:40:03 -07:00
Edward Yang
07fd4450ab Revert D9831398: [pytorch][PR] Update OpenMP cmake setting for xcode 9 compiler(AppleClang 9.0)
Differential Revision:
D9831398

Original commit changeset: db119d3f9c26

fbshipit-source-id: 4f183c9c178c159473bdaaa6299d4d5eb8afe549
2018-09-17 09:39:23 -07:00
JerryShih
f5bc2aef07 Update OpenMP cmake setting for xcode 9 compiler(AppleClang 9.0) (#11563)
Summary:
Fix the OpenMP link error for the AppleClang 9.0 compiler.

Built with the following command:
python setup.py build develop

The error message:

```
Undefined symbols for architecture x86_64:
  "___kmpc_critical", referenced from:
      _THFloatTensor_addmm in THTensorMath.cpp.o
      _THDoubleTensor_addmm in THTensorMath.cpp.o
      _THByteTensor_addmm in THTensorMath.cpp.o
      _THCharTensor_addmm in THTensorMath.cpp.o
      _THShortTensor_addmm in THTensorMath.cpp.o
      _THIntTensor_addmm in THTensorMath.cpp.o
      _THLongTensor_addmm in THTensorMath.cpp.o
      ...
  "___kmpc_end_critical", referenced from:
      _THFloatTensor_addmm in THTensorMath.cpp.o
      _THDoubleTensor_addmm in THTensorMath.cpp.o
      _THByteTensor_addmm in THTensorMath.cpp.o
      _THCharTensor_addmm in THTensorMath.cpp.o
      _THShortTensor_addmm in THTensorMath.cpp.o
      _THIntTensor_addmm in THTensorMath.cpp.o
      _THLongTensor_addmm in THTensorMath.cpp.o
      ...
  "___kmpc_end_reduce_nowait", referenced from:
      _.omp_outlined..270 in THTensorMoreMath.cpp.o
      _.omp_outlined..271 in THTensorMoreMath.cpp.o
      _.omp_outlined..273 in THTensorMoreMath.cpp.o
      _.omp_outlined..275 in THTensorMoreMath.cpp.o
      _.omp_outlined..43 in THTensorEvenMoreMath.cpp.o
      _.omp_outlined..44 in THTensorEvenMoreMath.cpp.o
      _.omp_outlined..46 in THTensorEvenMoreMath.cpp.o
      ...
  "___kmpc_end_serialized_parallel", referenced from:
      at::native::embedding_renorm_cpu_(at::Tensor&, at::Tensor const&, double, double) in Embedding.cpp.o
      at::native::_embedding_bag_dense_backward_cpu(at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, long long, bool, long long) in EmbeddingBag.cpp.o
      at::native::softmax_cpu(at::Tensor const&, long long) in SoftMax.cpp.o
      at::native::log_softmax_cpu(at::Tensor const&, long long) in SoftMax.cpp.o
      at::native::softmax_backward_cpu(at::Tensor const&, at::Tensor const&, long long, at::Tensor const&) in SoftMax.cpp.o
      at::native::log_softmax_backward_cpu(at::Tensor const&, at::Tensor const&, long long, at::Tensor const&) in SoftMax.cpp.o
      at::TensorIterator::for_each(std::__1::function<void (int, char**, long long const*, long long)> const&) in TensorIterator.cpp.o
      ...
  "___kmpc_for_static_fini", referenced from:
      _.omp_outlined..9 in Embedding.cpp.o
      _.omp_outlined. in EmbeddingBag.cpp.o
      _.omp_outlined. in GridSampler.cpp.o
      _.omp_outlined..42 in GridSampler.cpp.o
      _.omp_outlined..44 in GridSampler.cpp.o
      _.omp_outlined..45 in GridSampler.cpp.o
      _.omp_outlined..47 in GridSampler.cpp.o
      ...
  "___kmpc_for_static_init_4", referenced from:
      _.omp_outlined. in init.cpp.o
      _.omp_outlined..35 in init.cpp.o
      _.omp_outlined..36 in init.cpp.o
      _.omp_outlined..37 in init.cpp.o
      _.omp_outlined..49 in init.cpp.o
      _.omp_outlined..52 in init.cpp.o
      _.omp_outlined..220 in init.cpp.o
      ...
  "___kmpc_for_static_init_8", referenced from:
      _.omp_outlined..9 in Embedding.cpp.o
      _.omp_outlined. in EmbeddingBag.cpp.o
      _.omp_outlined. in GridSampler.cpp.o
      _.omp_outlined..42 in GridSampler.cpp.o
      _.omp_outlined..44 in GridSampler.cpp.o
      _.omp_outlined..45 in GridSampler.cpp.o
      _.omp_outlined..47 in GridSampler.cpp.o
      ...
  "___kmpc_for_static_init_8u", referenced from:
      _.omp_outlined..203 in init.cpp.o
      _.omp_outlined..207 in init.cpp.o
      _.omp_outlined..209 in init.cpp.o
      _.omp_outlined..210 in init.cpp.o
  "___kmpc_fork_call", referenced from:
      at::native::embedding_dense_backward_cpu(at::Tensor const&, at::Tensor const&, long long, long long, bool) in Embedding.cpp.o
      at::native::embedding_renorm_cpu_(at::Tensor&, at::Tensor const&, double, double) in Embedding.cpp.o
      at::native::_embedding_bag_dense_backward_cpu(at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, long long, bool, long long) in EmbeddingBag.cpp.o
      at::native::grid_sampler_2d_cpu(at::Tensor const&, at::Tensor const&, long long, long long) in GridSampler.cpp.o
      at::native::grid_sampler_3d_cpu(at::Tensor const&, at::Tensor const&, long long, long long) in GridSampler.cpp.o
      at::native::grid_sampler_2d_backward_cpu(at::Tensor const&, at::Tensor const&, at::Tensor const&, long long, long long) in GridSampler.cpp.o
      at::native::grid_sampler_3d_backward_cpu(at::Tensor const&, at::Tensor const&, at::Tensor const&, long long, long long) in GridSampler.cpp.o
      ...
  "___kmpc_global_thread_num", referenced from:
      at::native::embedding_renorm_cpu_(at::Tensor&, at::Tensor const&, double, double) in Embedding.cpp.o
      at::native::_embedding_bag_dense_backward_cpu(at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, long long, bool, long long) in EmbeddingBag.cpp.o
      at::native::softmax_cpu(at::Tensor const&, long long) in SoftMax.cpp.o
      at::native::log_softmax_cpu(at::Tensor const&, long long) in SoftMax.cpp.o
      at::native::softmax_backward_cpu(at::Tensor const&, at::Tensor const&, long long, at::Tensor const&) in SoftMax.cpp.o
      at::native::log_softmax_backward_cpu(at::Tensor const&, at::Tensor const&, long long, at::Tensor const&) in SoftMax.cpp.o
      at::TensorIterator::for_each(std::__1::function<void (int, char**, long long const*, long long)> const&) in TensorIterator.cpp.o
      ...
  "___kmpc_push_num_threads", referenced from:
      void Eigen::internal::parallelize_gemm<true, Eigen::internal::gemm_functor<float, long, Eigen::internal::general_matrix_matrix_product<long, float, 0, false, float, 0, false, 0>, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::Stride<0, 0> >, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::Stride<0, 0> >, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1>, 0, Eigen::Stride<0, 0> >, Eigen::internal::gemm_blocking_space<0, float, float, -1, -1, -1, 1, false> >, long>(Eigen::internal::gemm_functor<float, long, Eigen::internal::general_matrix_matrix_product<long, float, 0, false, float, 0, false, 0>, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::Stride<0, 0> >, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::Stride<0, 0> >, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1>, 0, Eigen::Stride<0, 0> >, Eigen::internal::gemm_blocking_space<0, float, float, -1, -1, -1, 1, false> > const&, long, long, long, bool) in math_cpu.cc.o
      void Eigen::internal::parallelize_gemm<true, Eigen::internal::gemm_functor<float, long, Eigen::internal::general_matrix_matrix_product<long, float, 1, false, float, 0, false, 0>, Eigen::Transpose<Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::Stride<0, 0> > const>, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::Stride<0, 0> >, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1>, 0, Eigen::Stride<0, 0> >, Eigen::internal::gemm_blocking_space<0, float, float, -1, -1, -1, 1, false> >, long>(Eigen::internal::gemm_functor<float, long, Eigen::internal::general_matrix_matrix_product<long, float, 1, false, float, 0, false, 0>, Eigen::Transpose<Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::Stride<0, 0> > const>, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::Stride<0, 0> >, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1>, 0, Eigen::Stride<0, 0> >, Eigen::internal::gemm_blocking_space<0, float, float, -1, -1, -1, 1, false> > const&, long, long, long, bool) in math_cpu.cc.o
      void Eigen::internal::parallelize_gemm<true, Eigen::internal::gemm_functor<float, long, Eigen::internal::general_matrix_matrix_product<long, float, 0, false, float, 1, false, 0>, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::Stride<0, 0> >, Eigen::Transpose<Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::Stride<0, 0> > const>, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1>, 0, Eigen::Stride<0, 0> >, Eigen::internal::gemm_blocking_space<0, float, float, -1, -1, -1, 1, false> >, long>(Eigen::internal::gemm_functor<float, long, Eigen::internal::general_matrix_matrix_product<long, float, 0, false, float, 1, false, 0>, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::Stride<0, 0> >, Eigen::Transpose<Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::Stride<0, 0> > const>, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1>, 0, Eigen::Stride<0, 0> >, Eigen::internal::gemm_blocking_space<0, float, float, -1, -1, -1, 1, false> > const&, long, long, long, bool) in math_cpu.cc.o
      void Eigen::internal::parallelize_gemm<true, Eigen::internal::gemm_functor<float, long, Eigen::internal::general_matrix_matrix_product<long, float, 1, false, float, 1, false, 0>, Eigen::Transpose<Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::Stride<0, 0> > const>, Eigen::Transpose<Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::Stride<0, 0> > const>, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1>, 0, Eigen::Stride<0, 0> >, Eigen::internal::gemm_blocking_space<0, float, float, -1, -1, -1, 1, false> >, long>(Eigen::internal::gemm_functor<float, long, Eigen::internal::general_matrix_matrix_product<long, float, 1, false, float, 1, false, 0>, Eigen::Transpose<Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::Stride<0, 0> > const>, Eigen::Transpose<Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::Stride<0, 0> > const>, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1>, 0, Eigen::Stride<0, 0> >, Eigen::internal::gemm_blocking_space<0, float, float, -1, -1, -1, 1, false> > const&, long, long, long, bool) in math_cpu.cc.o
      void Eigen::internal::parallelize_gemm<true, Eigen::internal::gemm_functor<float, long, Eigen::internal::general_matrix_matrix_product<long, float, 0, false, float, 0, false, 0>, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::OuterStride<-1> >, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::OuterStride<-1> >, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1>, 0, Eigen::OuterStride<-1> >, Eigen::internal::gemm_blocking_space<0, float, float, -1, -1, -1, 1, false> >, long>(Eigen::internal::gemm_functor<float, long, Eigen::internal::general_matrix_matrix_product<long, float, 0, false, float, 0, false, 0>, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::OuterStride<-1> >, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::OuterStride<-1> >, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1>, 0, Eigen::OuterStride<-1> >, Eigen::internal::gemm_blocking_space<0, float, float, -1, -1, -1, 1, false> > const&, long, long, long, bool) in math_cpu.cc.o
      void Eigen::internal::parallelize_gemm<true, Eigen::internal::gemm_functor<float, long, Eigen::internal::general_matrix_matrix_product<long, float, 1, false, float, 0, false, 0>, Eigen::Transpose<Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::OuterStride<-1> > const>, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::OuterStride<-1> >, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1>, 0, Eigen::OuterStride<-1> >, Eigen::internal::gemm_blocking_space<0, float, float, -1, -1, -1, 1, false> >, long>(Eigen::internal::gemm_functor<float, long, Eigen::internal::general_matrix_matrix_product<long, float, 1, false, float, 0, false, 0>, Eigen::Transpose<Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::OuterStride<-1> > const>, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::OuterStride<-1> >, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1>, 0, Eigen::OuterStride<-1> >, Eigen::internal::gemm_blocking_space<0, float, float, -1, -1, -1, 1, false> > const&, long, long, long, bool) in math_cpu.cc.o
      void Eigen::internal::parallelize_gemm<true, Eigen::internal::gemm_functor<float, long, Eigen::internal::general_matrix_matrix_product<long, float, 0, false, float, 1, false, 0>, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::OuterStride<-1> >, Eigen::Transpose<Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::OuterStride<-1> > const>, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1>, 0, Eigen::OuterStride<-1> >, Eigen::internal::gemm_blocking_space<0, float, float, -1, -1, -1, 1, false> >, long>(Eigen::internal::gemm_functor<float, long, Eigen::internal::general_matrix_matrix_product<long, float, 0, false, float, 1, false, 0>, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::OuterStride<-1> >, Eigen::Transpose<Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::OuterStride<-1> > const>, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1>, 0, Eigen::OuterStride<-1> >, Eigen::internal::gemm_blocking_space<0, float, float, -1, -1, -1, 1, false> > const&, long, long, long, bool) in math_cpu.cc.o
      ...
  "___kmpc_reduce_nowait", referenced from:
      _.omp_outlined..270 in THTensorMoreMath.cpp.o
      _.omp_outlined..271 in THTensorMoreMath.cpp.o
      _.omp_outlined..273 in THTensorMoreMath.cpp.o
      _.omp_outlined..275 in THTensorMoreMath.cpp.o
      _.omp_outlined..43 in THTensorEvenMoreMath.cpp.o
      _.omp_outlined..44 in THTensorEvenMoreMath.cpp.o
      _.omp_outlined..46 in THTensorEvenMoreMath.cpp.o
      ...
  "___kmpc_serialized_parallel", referenced from:
      at::native::embedding_renorm_cpu_(at::Tensor&, at::Tensor const&, double, double) in Embedding.cpp.o
      at::native::_embedding_bag_dense_backward_cpu(at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, long long, bool, long long) in EmbeddingBag.cpp.o
      at::native::softmax_cpu(at::Tensor const&, long long) in SoftMax.cpp.o
      at::native::log_softmax_cpu(at::Tensor const&, long long) in SoftMax.cpp.o
      at::native::softmax_backward_cpu(at::Tensor const&, at::Tensor const&, long long, at::Tensor const&) in SoftMax.cpp.o
      at::native::log_softmax_backward_cpu(at::Tensor const&, at::Tensor const&, long long, at::Tensor const&) in SoftMax.cpp.o
      at::TensorIterator::for_each(std::__1::function<void (int, char**, long long const*, long long)> const&) in TensorIterator.cpp.o
      ...
  "_omp_get_max_threads", referenced from:
      _THGetNumThreads in THGeneral.cpp.o
      caffe2::Caffe2SetOpenMPThreads(int*, char***) in init_omp.cc.o
      void Eigen::internal::parallelize_gemm<true, Eigen::internal::gemm_functor<float, long, Eigen::internal::general_matrix_matrix_product<long, float, 0, false, float, 0, false, 0>, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::Stride<0, 0> >, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::Stride<0, 0> >, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1>, 0, Eigen::Stride<0, 0> >, Eigen::internal::gemm_blocking_space<0, float, float, -1, -1, -1, 1, false> >, long>(Eigen::internal::gemm_functor<float, long, Eigen::internal::general_matrix_matrix_product<long, float, 0, false, float, 0, false, 0>, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::Stride<0, 0> >, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::Stride<0, 0> >, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1>, 0, Eigen::Stride<0, 0> >, Eigen::internal::gemm_blocking_space<0, float, float, -1, -1, -1, 1, false> > const&, long, long, long, bool) in math_cpu.cc.o
      void Eigen::internal::parallelize_gemm<true, Eigen::internal::gemm_functor<float, long, Eigen::internal::general_matrix_matrix_product<long, float, 1, false, float, 0, false, 0>, Eigen::Transpose<Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::Stride<0, 0> > const>, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::Stride<0, 0> >, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1>, 0, Eigen::Stride<0, 0> >, Eigen::internal::gemm_blocking_space<0, float, float, -1, -1, -1, 1, false> >, long>(Eigen::internal::gemm_functor<float, long, Eigen::internal::general_matrix_matrix_product<long, float, 1, false, float, 0, false, 0>, Eigen::Transpose<Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::Stride<0, 0> > const>, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::Stride<0, 0> >, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1>, 0, Eigen::Stride<0, 0> >, Eigen::internal::gemm_blocking_space<0, float, float, -1, -1, -1, 1, false> > const&, long, long, long, bool) in math_cpu.cc.o
      void Eigen::internal::parallelize_gemm<true, Eigen::internal::gemm_functor<float, long, Eigen::internal::general_matrix_matrix_product<long, float, 0, false, float, 1, false, 0>, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::Stride<0, 0> >, Eigen::Transpose<Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::Stride<0, 0> > const>, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1>, 0, Eigen::Stride<0, 0> >, Eigen::internal::gemm_blocking_space<0, float, float, -1, -1, -1, 1, false> >, long>(Eigen::internal::gemm_functor<float, long, Eigen::internal::general_matrix_matrix_product<long, float, 0, false, float, 1, false, 0>, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::Stride<0, 0> >, Eigen::Transpose<Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::Stride<0, 0> > const>, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1>, 0, Eigen::Stride<0, 0> >, Eigen::internal::gemm_blocking_space<0, float, float, -1, -1, -1, 1, false> > const&, long, long, long, bool) in math_cpu.cc.o
      void Eigen::internal::parallelize_gemm<true, Eigen::internal::gemm_functor<float, long, Eigen::internal::general_matrix_matrix_product<long, float, 1, false, float, 1, false, 0>, Eigen::Transpose<Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::Stride<0, 0> > const>, Eigen::Transpose<Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::Stride<0, 0> > const>, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1>, 0, Eigen::Stride<0, 0> >, Eigen::internal::gemm_blocking_space<0, float, float, -1, -1, -1, 1, false> >, long>(Eigen::internal::gemm_functor<float, long, Eigen::internal::general_matrix_matrix_product<long, float, 1, false, float, 1, false, 0>, Eigen::Transpose<Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::Stride<0, 0> > const>, Eigen::Transpose<Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::Stride<0, 0> > const>, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1>, 0, Eigen::Stride<0, 0> >, Eigen::internal::gemm_blocking_space<0, float, float, -1, -1, -1, 1, false> > const&, long, long, long, bool) in math_cpu.cc.o
      void Eigen::internal::parallelize_gemm<true, Eigen::internal::gemm_functor<float, long, Eigen::internal::general_matrix_matrix_product<long, float, 0, false, float, 0, false, 0>, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::OuterStride<-1> >, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::OuterStride<-1> >, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1>, 0, Eigen::OuterStride<-1> >, Eigen::internal::gemm_blocking_space<0, float, float, -1, -1, -1, 1, false> >, long>(Eigen::internal::gemm_functor<float, long, Eigen::internal::general_matrix_matrix_product<long, float, 0, false, float, 0, false, 0>, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::OuterStride<-1> >, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::OuterStride<-1> >, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1>, 0, Eigen::OuterStride<-1> >, Eigen::internal::gemm_blocking_space<0, float, float, -1, -1, -1, 1, false> > const&, long, long, long, bool) in math_cpu.cc.o
      ...
  "_omp_get_num_procs", referenced from:
      _THGetNumCores in THGeneral.cpp.o
  "_omp_get_num_threads", referenced from:
      _.omp_outlined. in Embedding.cpp.o
      _.omp_outlined. in SoftMax.cpp.o
      _.omp_outlined..35 in SoftMax.cpp.o
      _.omp_outlined..37 in SoftMax.cpp.o
      _.omp_outlined..38 in SoftMax.cpp.o
      _.omp_outlined..46 in SoftMax.cpp.o
      _.omp_outlined..47 in SoftMax.cpp.o
      ...
  "_omp_get_thread_num", referenced from:
      _.omp_outlined. in Embedding.cpp.o
      _.omp_outlined. in SoftMax.cpp.o
      _.omp_outlined..35 in SoftMax.cpp.o
      _.omp_outlined..37 in SoftMax.cpp.o
      _.omp_outlined..38 in SoftMax.cpp.o
      _.omp_outlined..46 in SoftMax.cpp.o
      _.omp_outlined..47 in SoftMax.cpp.o
      ...
  "_omp_in_parallel", referenced from:
      _THFloatTensor_copy in THTensorCopy.cpp.o
      _THDoubleTensor_copy in THTensorCopy.cpp.o
      _THByteTensor_copy in THTensorCopy.cpp.o
      _THCharTensor_copy in THTensorCopy.cpp.o
      _THShortTensor_copy in THTensorCopy.cpp.o
      _THIntTensor_copy in THTensorCopy.cpp.o
      _THLongTensor_copy in THTensorCopy.cpp.o
      ...
  "_omp_set_num_threads", referenced from:
      _THSetNumThreads in THGeneral.cpp.o
      caffe2::Caffe2SetOpenMPThreads(int*, char***) in init_omp.cc.o
ld: symbol(s) not found for architecture x86_64
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11563

Differential Revision: D9831398

Pulled By: ezyang

fbshipit-source-id: db119d3f9c26a71180335ad955f2f62c5369f9ed
2018-09-17 08:24:20 -07:00
Junjie Bai
d24bcfd930 Suppress hiprand "duplicate-decl-specifier" warning (#11698)
Summary:
Otherwise each build produces 65 MB of warning logs, which makes the CI hard to debug.

iotamudelta Jorghi12
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11698

Differential Revision: D9840356

Pulled By: bddppq

fbshipit-source-id: b69bf6a5c38a97b188221f9c084c608ffc9b37c8
2018-09-14 15:51:43 -07:00
Yangqing Jia
3121c8f526 Update gtest and remove the macro guide on gtest from #11321 (#11417)
Summary:
Last PR seems to have test failures, re-issuing.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11417

Reviewed By: orionr

Differential Revision: D9784706

Pulled By: Yangqing

fbshipit-source-id: 9e5f347e19fa2700ff69d2cd69ea7a9e01a91609
2018-09-11 20:16:08 -07:00
Orion Reblitz-Richardson
a175282776 Flags for LMDB, LevelDB, and Caffe2 ops (#11462)
Summary:
Add flags for LMDB and LevelDB, default `OFF`. These can be enabled with

```
USE_LMDB=1 USE_LEVELDB=1 python setup.py build_deps
```

Also add a flag to build Caffe2 ops, which is default `ON`. Disable with

```
NO_CAFFE2_OPS=1 python setup.py build_deps
```

cc Yangqing soumith pjh5 mingzhe09088
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11462

Reviewed By: soumith

Differential Revision: D9758156

Pulled By: orionr

fbshipit-source-id: 95fd206d72fdf44df54fc5d0aeab598bff900c63
2018-09-10 17:27:50 -07:00
Tongliang Liao
538ea67437 Search for CMake config files for pybind11. (#11423)
Summary:
If pybind11 is built with CMake and installed, we should use its config file instead of the Findpybind11 module shipped with caffe2.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11423

Differential Revision: D9735557

Pulled By: ezyang

fbshipit-source-id: 28a39e579fa045060aa1a716e5fd7dbcf7b89569
2018-09-08 22:44:03 -07:00
Orion Reblitz-Richardson
802d21c8f4 Remove FULL_CAFFE2 flag (#11321)
Summary:
Continuing pjh5's work to remove FULL_CAFFE2 flag completely.

With these changes you'll be able to also do something like

```
NO_TEST=1 python setup.py build_deps
```
and this will skip building tests in caffe2, aten, and c10d. By default the tests are built.

cc mingzhe09088 Yangqing
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11321

Reviewed By: mingzhe09088

Differential Revision: D9694950

Pulled By: orionr

fbshipit-source-id: ff5c4937a23d1a263378a196a5eda0cba98af0a8
2018-09-07 15:09:44 -07:00
Yangqing Jia
68613cf5a2 Windows DLL build with Caffe2 code (#11266)
Summary:
This is an experimental build on top of what orionr and mingzhe09088 built.

Essentially, the idea is that we will need separate *_API versions for different shared libraries. If this theory is right, I'll try to clean up the design a bit and document it properly.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11266

Reviewed By: orionr

Differential Revision: D9682942

Pulled By: Yangqing

fbshipit-source-id: c79653199e67a1500c9174f39f8b0357324763f3
2018-09-06 15:12:20 -07:00
Anders Papitto
a853a74217 defer resolution of mkl to a cmake wrapper library (#11298)
Summary:
This is a fix that's needed for building extensions with a
pre-packaged PyTorch. Consider the scenario where

(1) pytorch is compiled and packaged on machine A
(2) the package is downloaded and installed on machine B
(3) an extension is compiled on machine B, using the downloaded package

Before this patch, stage (1) would embed absolute paths to the system
installation of mkl into the generated Caffe2Config.cmake, leading to
failures in stage (3) if mkl was not at the same location on B as on
A. After this patch, only a reference to the wrapper library is
embedded, which is re-resolved on machine B.

We are already using a similar approach for cuda.

Testing: built a package on jenkins, downloaded locally and compiled an extension. Works with this patch, fails without.
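
The wrapper-library approach described above can be sketched as follows. This is an illustrative CMake fragment, not the exact code this commit adds; the `caffe2::mkl` target name and the `find_package(MKL)` module are assumptions for the sketch:

```cmake
# Instead of baking machine A's absolute MKL paths into the exported
# Caffe2Config.cmake, export an INTERFACE wrapper target and re-resolve
# MKL when the config file is loaded on the consuming machine (B).
add_library(caffe2::mkl INTERFACE IMPORTED)
find_package(MKL QUIET)  # runs again at consumer configure time
if(MKL_FOUND)
  set_property(TARGET caffe2::mkl PROPERTY
    INTERFACE_INCLUDE_DIRECTORIES ${MKL_INCLUDE_DIR})
  set_property(TARGET caffe2::mkl PROPERTY
    INTERFACE_LINK_LIBRARIES ${MKL_LIBRARIES})
endif()
```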
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11298

Differential Revision: D9683150

Pulled By: anderspapitto

fbshipit-source-id: 06a80c3cd2966860ce04f76143b358de15f94aa4
2018-09-06 09:10:39 -07:00
Yangqing Jia
684b55d762 Use third-party Eigen by default; added new flag USE_SYSTEM_EIGEN_INSTALL to control. (#11020)
Summary:
TSIA. apaszke pointed out that it might be better to use the third-party folder by default, since system Eigen may often be out of date and may not have the version we need to compile successfully.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11020

Differential Revision: D9562548

Pulled By: Yangqing

fbshipit-source-id: d8ab8a6ebe1f3d9eec638ef726cf5dc4dcf777b5
2018-09-04 10:56:22 -07:00
iotamudelta
33c7cc13ca improve docker packages, fix bugs, enable tests, enable FFT (#10893)
Summary:
* improve docker packages (install OpenBLAS for compile-time LAPACK functionality with optimizations for both Intel and AMD CPUs)
* integrate rocFFT (i.e., enable Fourier functionality)
* fix bugs in ROCm caused by wrong warp size
* enable more test sets, skip the tests that don't work on ROCm yet
* don't disable asserts any longer in hipification
* small improvements
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10893

Differential Revision: D9615053

Pulled By: ezyang

fbshipit-source-id: 864b4d27bf089421f7dfd8065e5017f9ea2f7b3b
2018-09-02 08:54:42 -07:00