Commit Graph

229 Commits

Author SHA1 Message Date
Hong Xu
a8edc2b5d2 Add sanity checks for NCCL detection.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22926

Differential Revision: D16546369

Pulled By: colesbury

fbshipit-source-id: 56f7ef4476e586dee19366fdb720085d1c2f2027
2019-07-29 13:47:05 -07:00
Hong Xu
09ba4df031 Whether MKLDNN should be built under native arch should respect USE_NATIVE_ARCH (#23445)
Summary:
Currently there is no way to build MKLDNN with optimizations beyond SSE4. This commit lets the MKLDNN build respect USE_NATIVE_ARCH.
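For illustration, a minimal sketch of the idea, assuming MKL-DNN's build reads an `ARCH_OPT_FLAGS`-style cache variable (the exact variable the build forwards may differ):

```cmake
# Illustrative only: forward the top-level USE_NATIVE_ARCH choice to MKL-DNN.
if(USE_NATIVE_ARCH)
  set(ARCH_OPT_FLAGS "-march=native" CACHE STRING "MKL-DNN arch flags" FORCE)  # tune for the build host
else()
  set(ARCH_OPT_FLAGS "" CACHE STRING "MKL-DNN arch flags" FORCE)               # keep a generic, portable build
endif()
```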
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23445

Differential Revision: D16542275

Pulled By: ezyang

fbshipit-source-id: 550976531d6a52db9128c0e3d4589a33715feee2
2019-07-29 08:13:56 -07:00
Gu, Jinghui
1dd4d55565 Improve FindMKLDNN.cmake to avoid binary compatibility issue in MKL-DNN (#23292)
Summary:
An "illegal instruction" error is encountered with the pre-built MKL-DNN package: https://github.com/pytorch/pytorch/issues/23231
To avoid this binary-compatibility issue, the HostOpts option in MKL-DNN is disabled so that MKL-DNN is built for a generic architecture.
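A rough sketch of the idea, assuming MKL-DNN's `ARCH_OPT_FLAGS` cache variable (whose default, `HostOpts`, tunes for the build machine) is the knob being overridden; the exact variable may differ:

```cmake
# Illustrative: build MKL-DNN for a generic architecture instead of letting it
# auto-tune ("HostOpts") for the build machine, so pre-built packages run anywhere.
set(ARCH_OPT_FLAGS "" CACHE STRING "MKL-DNN architecture optimization flags" FORCE)
```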
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23292

Differential Revision: D16488773

Pulled By: soumith

fbshipit-source-id: 9e13c76fb9cb9338103cb767d7463c10891d294a
2019-07-25 04:42:26 -07:00
Hong Xu
60c46dd4df Let CMake handle NCCL detection instead of our handcrafted Python script. (#22930)
Summary:
 ---

How does the current code subsume all detections in the deleted `nccl.py`?

- The dependency of `USE_NCCL` on the OS and `USE_CUDA` is handled as dependent options in `CMakeLists.txt`.

- The main NCCL detection happens in [FindNCCL.cmake](8377d4b32c/cmake/Modules/FindNCCL.cmake), which is called by [nccl.cmake](8377d4b32c/cmake/External/nccl.cmake). When `USE_SYSTEM_NCCL` is false, the previous Python code deferred the detection to `find_package(NCCL)`. The change in `nccl.cmake` retains this.

- `USE_STATIC_NCCL` in the previous Python code simply changes the name of the detected library. This is done in `IF (USE_STATIC_NCCL)`.

- Now we only need to look at how the lines below line 20 in `nccl.cmake` are subsumed. Those lines list candidate directories in which NCCL headers and libraries may reside and search them in turn for the key header and library files. In `FindNCCL.cmake` this is done by `find_path` for the headers and `find_library` for the library files.
  * The call of [find_path](https://cmake.org/cmake/help/v3.8/command/find_path.html) (search for `NO_DEFAULT_PATH` in the link) by default searches for headers in `<prefix>/include` for each `<prefix>` in `CMAKE_PREFIX_PATH` and `CMAKE_SYSTEM_PREFIX_PATH`. Like the Python code, this commit sets `CMAKE_PREFIX_PATH` so that `<prefix>` covers `NCCL_ROOT_DIR` and the home of CUDA. `CMAKE_SYSTEM_PREFIX_PATH` includes the standard directories such as `/usr/local` and `/usr`. `NCCL_INCLUDE_DIR` is also specifically handled.

  * Similarly, the call of [find_library](https://cmake.org/cmake/help/v3.8/command/find_library.html) (search for `NO_DEFAULT_PATH` in the link) by default searches for libraries in directories including `<prefix>/lib` for each `<prefix>` in `CMAKE_PREFIX_PATH` and `CMAKE_SYSTEM_PREFIX_PATH`. It also handles, more properly, the edge cases the Python code tried to solve:
     - It only searches `<prefix>/lib64` (and `<prefix>/lib32`) when that is appropriate for the system.
     - It only searches `<prefix>/lib/<arch>` for the right `<arch>`, unlike the Python code, which searches `lib/<arch>` generically (e.g., the Python code searches for `/usr/lib/x86_64-linux-gnu`, but in reality some systems have `/usr/lib/x86_64-some-customized-name-linux-gnu`; see https://unix.stackexchange.com/a/226180/38242). A minimal sketch of this search logic follows this list.
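The following is a minimal, illustrative sketch of such a module; it only mirrors the description above (variable names like `NCCL_ROOT_DIR`, `NCCL_INCLUDE_DIR`, and `NCCL_LIB_DIR` are taken from this commit message, and the real `FindNCCL.cmake` is more thorough):

```cmake
# Illustrative sketch of a FindNCCL-style module; not the exact file in the tree.
set(NCCL_ROOT_DIR "" CACHE PATH "Root directory of the NCCL installation")

# find_path/find_library search <prefix>/include and <prefix>/lib* for every
# <prefix> in CMAKE_PREFIX_PATH and CMAKE_SYSTEM_PREFIX_PATH (/usr, /usr/local, ...),
# so adding NCCL_ROOT_DIR and the CUDA home here replaces the hand-written path lists.
list(APPEND CMAKE_PREFIX_PATH ${NCCL_ROOT_DIR} ${CUDA_TOOLKIT_ROOT_DIR})

find_path(NCCL_INCLUDE_DIR nccl.h
  HINTS ${NCCL_INCLUDE_DIR})            # NCCL_INCLUDE_DIR is also handled explicitly

if(USE_STATIC_NCCL)
  set(NCCL_LIBNAME "nccl_static")       # static detection only changes the library name
else()
  set(NCCL_LIBNAME "nccl")              # unversioned name first; a versioned fallback may follow
endif()

find_library(NCCL_LIBRARIES NAMES ${NCCL_LIBNAME}
  HINTS ${NCCL_LIB_DIR})

include(FindPackageHandleStandardArgs)
find_package_handle_standard_args(NCCL DEFAULT_MSG NCCL_INCLUDE_DIR NCCL_LIBRARIES)
```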

 ---

Regarding relevant issues:

- https://github.com/pytorch/pytorch/issues/12063 and https://github.com/pytorch/pytorch/issues/2877: These are properly handled, as explained in the updated comment.
- https://github.com/pytorch/pytorch/issues/2941 did not change NCCL detection specifically for Windows (it changed CUDA detection).
- b7e258f81e added versioned library detection, but here the order is reversed: the unversioned library is preferred. This is because unversioned libraries are normally symlinks to the versioned ones and are what users link against, and local installations by users are often unversioned. As the documentation of [find_library](https://cmake.org/cmake/help/v3.8/command/find_library.html) suggests:

> When using this to specify names with and without a version suffix, we recommend specifying the unversioned name first so that locally-built packages can be found before those provided by distributions.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22930

Differential Revision: D16440275

Pulled By: ezyang

fbshipit-source-id: 11fe80743d4fe89b1ed6f96d5d996496e8ec01aa
2019-07-23 08:45:51 -07:00
Edward Yang
798d5d9771 Revert D16281714: Add sanity checks for NCCL detection.
Differential Revision:
D16281714

Original commit changeset: 396bcbf099bd

fbshipit-source-id: a22cc112d1b6a62d689f9d8a7f93e8be3abe2a44
2019-07-16 13:58:27 -07:00
Will Feng
01f03d56ee Revert D16283037: Add sanity checks for NCCL detection.
Differential Revision:
D16283037

Original commit changeset: fc09c9443a56

fbshipit-source-id: 30cdf7b1ad91498ee615d018de5571ba36f4383e
2019-07-16 13:20:43 -07:00
Hong Xu
31497799b9 Add sanity checks for NCCL detection.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22819

Test Plan: Imported from OSS

Differential Revision: D16283037

Pulled By: ezyang

fbshipit-source-id: fc09c9443a568d9af1c78a847282a7d707c49dd6
2019-07-16 11:32:36 -07:00
Hong Xu
e2046f8c1d Add sanity checks for NCCL detection.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22819

Test Plan: Imported from OSS

Differential Revision: D16281714

Pulled By: ezyang

fbshipit-source-id: 396bcbf099bd07b996cf779c6b43092096b52d90
2019-07-16 11:32:32 -07:00
Hui Wu
07ef85e326 Add USE_MKLDNN_CBLAS build option. (#19014)
Summary:
MKL-DNN is the main library for computation when we use the ideep device. It can use kernels implemented with different approaches (including JIT, CBLAS, etc.). We add the "USE_MKLDNN_CBLAS" (default OFF) build option so that users can decide whether to use the CBLAS computation methods.
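A rough sketch of how such an option can surface in the build (the option name follows the commit message; the compile definition and the forwarding into MKL-DNN's own build are assumptions for illustration):

```cmake
# Illustrative: expose the switch and pass it on as a compile definition.
option(USE_MKLDNN_CBLAS "Use CBLAS kernels in MKL-DNN" OFF)
if(USE_MKLDNN_CBLAS)
  message(STATUS "MKL-DNN: using CBLAS-based kernels")
  add_definitions(-DUSE_MKLDNN_CBLAS)   # hypothetical define consumed by the MKL-DNN build
else()
  message(STATUS "MKL-DNN: using its default (e.g. JIT) kernels")
endif()
```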
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19014

Differential Revision: D16094090

Pulled By: ezyang

fbshipit-source-id: 3f0b1d1a59a327ea0d1456e2752f2edd78d96ccc
2019-07-02 12:29:54 -07:00
Hong Xu
e6d4a2d289 Remove unused file cmake/Modules/FindMIOpen.cmake (#22244)
Summary:
`cmake/public/LoadHIP.cmake` calls `find_package(miopen)`, which uses the CMake module from the MIOpen installation (it includes the line `set(miopen_DIR ${MIOPEN_PATH}/lib/cmake/miopen)`). `cmake/Modules/FindMIOpen.cmake` is not used.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22244

Differential Revision: D16000771

Pulled By: bddppq

fbshipit-source-id: 07bb40fdf033521e8427fc351715d47e6e30ed34
2019-06-26 21:21:46 -07:00
Ilia Cherniavskii
6350dbddd1 Fix sequential MKL case (#22062)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22062
ghimport-source-id: a30255d7453c4ffecf40215a785c1e06b7296368

Test Plan:
USE_CUDA=0 PARALLEL_BACKEND=OPENMP BLAS=MKL USE_MKLDNN=1 MKL_SEQ=1
MKLDNN_THREADING=SEQ BUILD_BINARY=1 python setup.py develop --cmake

./build/bin/parallel_info

Imported from OSS

Differential Revision: D15938079

Pulled By: ilia-cher

fbshipit-source-id: e7ef0c5bc75ebb845ebe66bf76a4070d45305b35
2019-06-24 12:56:43 -07:00
bddppq
4940e41d16 Fix mkl-dnn tautological compare error (#21371)
Summary:
```
../third_party/ideep/mkl-dnn/src/cpu/jit_avx512_common_convolution.hpp:144:821: error: self-comparison always evaluates to true [-Werror,-Wtautological-compare]
        virtual pd_t *clone() const override { return new pd_t(*this); } virtual status_t create_primitive(primitive_t **primitive, const primitive_at_t *inputs, const primitive_t **outputs) const override { double ms = get_msec(); primitive_t::input_vector ins(inputs, inputs + this->n_inputs()); primitive_t::outpu
t_vector outs(outputs, outputs + this->n_outputs()); auto ret = safe_ptr_assign<primitive_t>(*primitive, new (jit_avx512_common_convolution_bwd_data_t)(this, ins, outs)); ms = get_msec() - ms; if (mkldnn_verbose()->level >= 2) { printf("mkldnn_verbose,create,%s,%g\n", this->info(), ms); fflush(0); } return ret; } v
irtual const char *name() const override { return (avx512_common == sse42 ? "jit:" "sse42" : (avx512_common == avx ? "jit:" "avx" : (avx512_common == avx2 ? "jit:" "avx2" : (avx512_common == avx512_common ? "jit:" "avx512_common" : (avx512_common == avx512_core ? "jit:" "avx512_core" : (avx512_common == avx512_mic
? "jit:" "avx512_mic" : (avx512_common == avx512_mic_4ops ? "jit:" "avx512_mic_4ops" : "jit:" ""))))))); };
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21371

Differential Revision: D15631392

Pulled By: bddppq

fbshipit-source-id: 3b0008acab8ae53ce61327686bd8367e7fb5d298
2019-06-04 15:27:07 -07:00
Ilia Cherniavskii
580eab6562 Restore TBB module (#20454)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20454
ghimport-source-id: 14aca1dedbe647d41e55e7538a6b7eeab0fc4384

Differential Revision: D15326062

Pulled By: ilia-cher

fbshipit-source-id: 02b005a679b10dc7a264978e87a8d2bb98ab972f
2019-05-28 02:49:36 -07:00
peter
872bab22c6 Some essential changes needed before updating the Windows AMI (#20353)
Summary:
1. Add cuda 10.1 build
2. Turn on openmp loop support for VS 2019
3. Remove legacy code about selective builds

Tested through CI.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20353

Differential Revision: D15294806

Pulled By: ezyang

fbshipit-source-id: 0acf5c3fbbc398fd9ebdf9f97653499d39638432
2019-05-10 09:08:51 -07:00
Ilia Cherniavskii
481b6d0268 Allow a non-OpenMP based build (#19749)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19749
ghimport-source-id: a6636c0acddbdc5fd5b0dcb20b9f80cbdb9159b9

Differential Revision: D15141993

Pulled By: ilia-cher

fbshipit-source-id: 96085608398b2a4c97c68b2948f5184d07f9ad3d
2019-05-06 19:34:48 -07:00
Edward Yang
48a35135fb Convert all tabs to spaces, add CI. (#18959)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18959
ghimport-source-id: a934163fa34cb2019732d5f49dc7290c376bf156

Differential Revision: D14831246

Pulled By: ezyang

fbshipit-source-id: beb92dc4ee8c82f4c8259c081dd72e477fe7a9d0
2019-04-09 08:12:26 -07:00
Balint Cristian
67fdb4abf7 AVX2 with GCC9 fix. (#18991)
Summary:
Dear All,

The proposed patch fixes the test code snippets used in the cmake infrastructure, which silently fail and therefore never set the ```CAFFE2_COMPILER_SUPPORTS_AVX2_EXTENSIONS``` flag properly. Without the fix, libcaffe2.so ends up with ```UND``` AVX2-related references, rendering it unusable.

* With GCC 9, the test code from the cmake build infra always fails:
```
$ gcc  -O2 -g -pipe -Wall -m64 -mtune=generic -fopenmp -DCXX_HAS_AVX_1 -fPIE -o test.o -c test.c -mavx2
test.c: In function ‘main’:
test.c:11:26: error: incompatible type for argument 1 of ‘_mm256_extract_epi64’
   11 |     _mm256_extract_epi64(x, 0); // we rely on this in our AVX2 code
      |                          ^
      |                          |
      |                          __m256 {aka __vector(8) float}
In file included from /usr/lib/gcc/x86_64-redhat-linux/9/include/immintrin.h:51,
                 from test.c:4:
/usr/lib/gcc/x86_64-redhat-linux/9/include/avxintrin.h:550:31: note: expected ‘__m256i’ {aka ‘__vector(4) long long int’} but argument is of type ‘__m256’ {aka ‘__vector(8) float’}
  550 | _mm256_extract_epi64 (__m256i __X, const int __N)
      |

$ gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/libexec/gcc/x86_64-redhat-linux/9/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none
OFFLOAD_TARGET_DEFAULT=1
Target: x86_64-redhat-linux
Configured with: ../configure --enable-bootstrap --enable-languages=c,c++,fortran,objc,obj-c++,ada,go,d,lto --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=http://bugzilla.redhat.com/bugzilla --enable-shared --enable-threads=posix --enable-checking=release --enable-multilib --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-gnu-unique-object --enable-linker-build-id --with-gcc-major-version-only --with-linker-hash-style=gnu --enable-plugin --enable-initfini-array --with-isl --enable-offload-targets=nvptx-none --without-cuda-driver --enable-gnu-indirect-function --enable-cet --with-tune=generic --with-arch_32=i686 --build=x86_64-redhat-linux
Thread model: posix
gcc version 9.0.1 20190328 (Red Hat 9.0.1-0.12) (GCC)
```
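A sketch of the kind of cmake-level check described here, with the argument declared as an integer vector (`__m256i`) so `_mm256_extract_epi64` type-checks; this is illustrative and assumes a GCC/Clang-style `-mavx2` flag, not the exact snippet in the repository:

```cmake
# Illustrative AVX2 compile test; the real test source in cmake/ may differ.
include(CheckCSourceCompiles)
set(CMAKE_REQUIRED_FLAGS "-mavx2")
check_c_source_compiles("
  #include <immintrin.h>
  int main() {
    __m256i x = _mm256_set1_epi64x(1);   /* integer vector, not __m256 */
    _mm256_extract_epi64(x, 0);          /* we rely on this in our AVX2 code */
    return 0;
  }"
  CAFFE2_COMPILER_SUPPORTS_AVX2_EXTENSIONS)
unset(CMAKE_REQUIRED_FLAGS)
```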
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18991

Differential Revision: D14821838

Pulled By: ezyang

fbshipit-source-id: 7eb3a854a1a831f6fda8ed7ad089746230b529d7
2019-04-07 08:27:00 -07:00
Thomas Viehmann
13bc002422 fixes for AVX detection (#17915)
Summary:
Our AVX2 routines use functions such as _mm256_extract_epi64
that do not exist on 32 bit systems even when they have AVX2.
This disables AVX2 when _mm256_extract_epi64 does not exist.

This fixes the "local" part of #17901 (except disabling FBGEMM),
but there also is sleef to be updated and NNPACK to be fixed,
see the bug report for further discussion.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17915

Differential Revision: D14437338

Pulled By: soumith

fbshipit-source-id: d4ef7e0801b5d1222a855a38ec207dd88b4680da
2019-03-13 03:55:06 -07:00
JerryShih
73db487a8e Update the cmake build configuration for AppleClang compiler (#15820)
Summary:
This PR tries to merge https://github.com/pytorch/pytorch/pull/11563 again and fixes the linking error in https://github.com/pytorch/pytorch/pull/14837.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15820

Differential Revision: D13942024

Pulled By: ezyang

fbshipit-source-id: dc6d1e9c4b0f177914f3745665244272a03ce33c
2019-02-04 08:53:47 -08:00
SsnL
13422fca32 Add torch.backends.openmp.is_available(); fix some cmake messages (#16425)
Summary:
1. add `torch.backends.openmp.is_available()`
2. Improve various `cmake` outputs
3. Fix LDFLAGS not respected by `caffe2_pybind11_state_*` targets
4. Fix `MKL` warning message, and QUIET flag.
5. Fix various typos
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16425

Differential Revision: D13903395

Pulled By: soumith

fbshipit-source-id: d15c5d46f53e1ff1c27fca2887b9d23d0bd85b4d
2019-01-31 16:15:46 -08:00
rtarquini
879bccb1af Support for Jetson Xavier (#15660)
Summary:
The requested changes support building PyTorch 1.0 on the Jetson Xavier with OpenBLAS. The Jetson Xavier with JetPack 3.3 has a generic LAPACK installed. To pick up CUDA-accelerated BLAS/LAPACK, I had to build OpenBLAS and build/link PyTorch from source; otherwise, I got a runtime error indicating the LAPACK routines were not CUDA-enabled.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15660

Differential Revision: D13571324

Pulled By: soumith

fbshipit-source-id: 9b148d081d6e7fa7e1824dfdd93283c67f69e683
2019-01-02 18:51:42 -08:00
Gu, Jinghui
12e0ed55b4 Upgrade MKL-DNN to version 0.17 and static build MKL-DNN (#15504)
Summary:
Upgrade MKL-DNN to 0.17 and build MKL-DNN statically to fix a potential build error due to an old mkldnn version on the host system.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15504

Differential Revision: D13547885

Pulled By: soumith

fbshipit-source-id: 46f790a3d9289c1e153e51c62be17c5206ea8f9a
2018-12-25 22:56:51 -08:00
Edward Yang
54d8ce94ee Revert D13383102: [pytorch][PR] Upgrade MKL-DNN to version 0.17
Differential Revision:
D13383102

Original commit changeset: c434f0e0ddff

fbshipit-source-id: 690f46ca0710954fa591a5ea77535e9759db4de5
2018-12-18 07:39:20 -08:00
Gu, Jinghui
4b97a46421 Disable strict-overflow flag to avoid compilation error (#14977)
Summary:
Disable strict-overflow flag to avoid compilation error
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14977

Differential Revision: D13447577

Pulled By: soumith

fbshipit-source-id: 1957bd5aa3c7b79219da3dd53560464977c89526
2018-12-12 22:41:33 -08:00
Gu, Jinghui
70598740ec Upgrade MKL-DNN to version 0.17 (#14308)
Summary:
Upgrade MKL-DNN to version 0.17.
Update the mkldnn bridge to the latest version.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14308

Differential Revision: D13383102

Pulled By: yinghai

fbshipit-source-id: c434f0e0ddff2ee2c86db2d6c44a37298fd005a3
2018-12-07 16:44:50 -08:00
Gu, Jinghui
6aee5488b5 correct omp dependency for mkl-dnn (#13449)
Summary:
The motivation of this PR is to force mkldnn to use the same OpenMP runtime as the caffe2 framework, while not changing other assumptions within mkldnn.

Previously, MKL_cmake_included was set in caffe2 in order to disable OpenMP detection in mkldnn. But with that change, mkldnn had no chance to adapt to the MKL found by caffe2, so some MKL-related build flags (for example, USE_MKL and USE_CBLAS) were not set in mkldnn.

In this PR, we explicitly set MKLIOMP5LIB for mkldnn according to caffe2 and pass the MKL root path to mkldnn via MKLROOT. With this, mkldnn is built as expected; a sketch of the hand-off follows.
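A sketch of this hand-off (the left-hand names are from the commit message; the right-hand variables stand for whatever Caffe2's own MKL detection produced and are hypothetical):

```cmake
# Illustrative: tell MKL-DNN which MKL and which OpenMP runtime Caffe2 already found,
# instead of letting MKL-DNN search for them independently.
if(MKL_FOUND)
  set(MKLROOT     ${MKL_ROOT_DIR})        # MKL root discovered by Caffe2's FindMKL
  set(MKLIOMP5LIB ${MKL_OPENMP_LIBRARY})  # the iomp5 library Caffe2 is already linking
endif()
```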
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13449

Differential Revision: D12899504

Pulled By: yinghai

fbshipit-source-id: 22a196bd00b4ef0a11d350a32c049304613edf52
2018-11-06 10:48:09 -08:00
Gu, Jinghui
dbab9b73b6 seperate mkl, mklml, and mkldnn (#12170)
Summary:
1. Remove avx2 support in mkldnn
2. Separate mkl, mklml, and mkldnn
3. Fix convfusion test case
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12170

Reviewed By: yinghai

Differential Revision: D10207126

Pulled By: orionr

fbshipit-source-id: 1e62eb47943f426a89d57e2d2606439f2b04fd51
2018-10-29 10:52:55 -07:00
Christian Puhrsch
f564163951 Remove SSE-only code and convolve5x5 (#12109)
Summary:
Performance-oriented code will use AVX/AVX2, so we don't need SSE-specific code anymore. This will also reduce the probability of running into an error on legacy CPUs.

On top of this, convolve is covered by modern libraries such as MKLDNN, which are much more performant and which we now build against by default (even for builds from source).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12109

Differential Revision: D10055134

Pulled By: colesbury

fbshipit-source-id: 789b8a34d5936d9c144bcde410c30f7eb1c776fa
2018-10-09 10:53:50 -07:00
Gu, Jinghui
c064f8a89d Fix mkldnn build error due to corrupted CMAKE_REQUIRED_LIBRARIES (#12195)
Summary:
This fixes a cmake-time compilation error.

When we changed the script to build Caffe2 with mkldnn, some cmake-time compilation support checks (like the ones in libsleef) failed due to an incorrect setting of CMAKE_REQUIRED_LIBRARIES. It is a global setting that can interfere with cmake compilation checks if it is not cleaned up properly. FindBLAS.cmake and FindLAPACK.cmake didn't clean up this flag, which caused libsleef.so to be built incorrectly.
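A sketch of the cleanup pattern (the checked symbol is hypothetical; the point is that the global variable is restored after the check):

```cmake
# Illustrative pattern for keeping CMAKE_REQUIRED_LIBRARIES clean across checks.
include(CheckFunctionExists)

set(_saved_required_libraries ${CMAKE_REQUIRED_LIBRARIES})
set(CMAKE_REQUIRED_LIBRARIES ${BLAS_LIBRARIES})
check_function_exists(sgemm_ BLAS_HAS_SGEMM)      # hypothetical BLAS symbol check
# Restore the global variable so later try_compile-based checks (e.g. libsleef's)
# are not accidentally linked against BLAS.
set(CMAKE_REQUIRED_LIBRARIES ${_saved_required_libraries})
```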

yinghai gujinghui
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12195

Differential Revision: D10159314

Pulled By: yinghai

fbshipit-source-id: 04908738f7d005579605b9c2a58d54f035d3baf4
2018-10-04 11:56:06 -07:00
Yinghai Lu
658386a63f Make USE_IDEEP work again (#12026)
Summary:
This PR establishes a baseline so that we can build IDEEP ops in the new workflow. From this baseline, we need to:
- Merge the CMakefile of MKLDNN from caffe2 and Pytorch
- Get rid of `USE_MKL=ON`.

Build command from now on:
```
EXTRA_CAFFE2_CMAKE_FLAGS="-DUSE_MKL=ON -DINTEL_COMPILER_DIR=/opt/IntelComposerXE/2017.0.098"  python setup.py build_deps
```

gujinghui
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12026

Differential Revision: D10041199

Pulled By: yinghai

fbshipit-source-id: b7310bd84a494ac899d8e25da368b63feed4eeaf
2018-09-25 16:56:29 -07:00
Soumith Chintala
77af40c025 prioritize Accelerate over OpenBLAS (#11812)
Summary:
might fix some binary build issues
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11812

Reviewed By: ezyang

Differential Revision: D9927309

Pulled By: soumith

fbshipit-source-id: 9ed6c2c6fedc2a1cffbf52bc0a795135d4239800
2018-09-18 21:56:57 -07:00
Anders Papitto
a853a74217 defer resolution of mkl to a cmake wrapper library (#11298)
Summary:
this is a fix that's needed for building extensions with a
pre-packaged pytorch. Consider the scenario where

(1) pytorch is compiled and packaged on machine A
(2) the package is downloaded and installed on machine B
(3) an extension is compiled on machine B, using the downloaded package

Before this patch, stage (1) would embed absolute paths to the system
installation of mkl into the generated Caffe2Config.cmake, leading to
failures in stage (3) if mkl was not at the same location on B as on
A. After this patch, only a reference to the wrapper library is
embedded, which is re-resolved on machine B.

We are already using a similar approach for cuda.

Testing: built a package on jenkins, downloaded locally and compiled an extension. Works with this patch, fails without.
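A hedged sketch of the wrapper-library idea (target and variable names are hypothetical, not necessarily what the tree uses): the detection runs from the installed config file on the consumer's machine and populates an interface target, so no machine-A paths are baked into the package.

```cmake
# Illustrative snippet for the installed Caffe2Config.cmake (names are hypothetical).
# MKL is re-detected here, on the machine that consumes the package.
add_library(caffe2::mkl INTERFACE IMPORTED)
find_package(MKL QUIET)
if(MKL_FOUND)
  set_target_properties(caffe2::mkl PROPERTIES
    INTERFACE_INCLUDE_DIRECTORIES "${MKL_INCLUDE_DIR}"
    INTERFACE_LINK_LIBRARIES      "${MKL_LIBRARIES}")
endif()
# Caffe2's own targets then depend on caffe2::mkl rather than on absolute MKL paths.
```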
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11298

Differential Revision: D9683150

Pulled By: anderspapitto

fbshipit-source-id: 06a80c3cd2966860ce04f76143b358de15f94aa4
2018-09-06 09:10:39 -07:00
Johannes M Dieterich
a4c59a9dab MIOpen integration, more tests enabled, bug fixes (#10612)
Summary:
* first integration of MIOpen for batch norm and conv on ROCm
* work around a ROCm compiler bug exposed by elementwise_kernel through explicit capture of variables in the densest packing
* work around a ROCm compiler bug exposed by having `extern "C" __host__` as a definition and just `__host__` in the implementation, handled through the hipify script
* use fabs() in accordance with C++11 for double absolute, not ::abs() which is integer-only on ROCm
* enable test_sparse set on CI, skip tests that don't work currently on ROCm
* enable more tests in test_optim after the elementwise_bug got fixed
* enable more tests in test_dataloader
* improvements to hipification and ROCm build

With this, resnet18 on CIFAR data trains without hang or crash in our tests.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10612

Reviewed By: bddppq

Differential Revision: D9423872

Pulled By: ezyang

fbshipit-source-id: 22c0c985217d65c593f35762b3eb16969ad96bdd
2018-08-23 15:24:47 -07:00
peter
facb293aad Fix FindMKL.cmake for Windows (#10453)
Summary:
Targets the issue discussed at https://github.com/pytorch/pytorch/pull/7399#issuecomment-400788971.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10453

Differential Revision: D9311591

Pulled By: soumith

fbshipit-source-id: ac0712e10bdac4ea3f76d6fbad2178ec958b3a31
2018-08-13 21:09:27 -07:00
Jesse Hellemn
def3715e82 Minor changes for nicer pip packages (#9544)
Summary:
I am using this to test a CI job to upload pip packages, and so am using the Caffe2 namespace to avoid affecting the existing pytorch packages.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9544

Reviewed By: orionr

Differential Revision: D9267111

Pulled By: pjh5

fbshipit-source-id: a68162ed29d2eb9ce353d8435ccb5f16c3b0b894
2018-08-10 12:09:46 -07:00
Yinghai Lu
766fa1fc96 Fix IDEEP CMakefile (#9217)
Summary:
The reason is that we are referencing `__ideep_looked_for` here: 77484d91db/cmake/Modules/FindMKL.cmake (L350)

This was first flushed out in https://github.com/pytorch/pytorch/pull/8105 and probably can help with https://github.com/pytorch/pytorch/issues/9024
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9217

Reviewed By: houseroad

Differential Revision: D8754491

Pulled By: yinghai

fbshipit-source-id: 70aecc2d60684b9ea522403dc98a0a1a2c3db7e6
2018-07-06 15:28:07 -07:00
Yinghai Lu
c3b499227d Avoid iomp/gomp clash when building IDEEP ops (#8955)
Summary:
This PR does 3 things
- Reorder the search order of `intel_lp64` and `gf_lp64` as the first one is more essential and should have high priority.
- Avoid repetitive searching of MKL libraries in `ideep` and `mkldnn` submodule if we already found those in `FindMKL`
- Avoid adding more MKL dependencies to IDEEP if MKL is also found.

TODO: provide an option for the user to choose iomp or gomp.
Closes https://github.com/pytorch/pytorch/pull/8955

Reviewed By: bddppq

Differential Revision: D8666960

Pulled By: yinghai

fbshipit-source-id: 669d3142204a8b47c19a900444246fc44a139012
2018-06-27 21:24:36 -07:00
Pieter Noordhuis
8e019826c9 Fix cmake cudnn autodetection (#8891)
If CUDNN_INCLUDE_DIR, CUDNN_LIB_DIR, and/or CUDNN_ROOT_DIR were set,
but USE_CUDNN was not explicitly set, the code in
cmake/Dependencies.cmake would set USE_CUDNN=OFF even though it could
be found. This caused an issue in ATen, where it includes its CuDNN
bindings if the variable CUDNN_FOUND is set. This was the case,
because the find_package call in cmake/public/cuda.cmake searches for
CuDNN and ends up finding it. The net result is that ATen tried to
compile CuDNN bits, but the caffe2::cudnn target is never defined let
alone added as dependency, and the build fails on not being able to
find the header cudnn.h.

This change does two things:

1) Restore CuDNN autodetection by setting USE_CUDNN=ON if it is found.
2) Remove obsolete FindCuDNN.cmake module. This functionality now
lives in cmake/public/cuda.cmake.
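A sketch of the autodetection logic described in point 1 (simplified; the real handling in cmake/Dependencies.cmake is more involved):

```cmake
# Illustrative: respect an explicit -DUSE_CUDNN=..., otherwise follow detection.
if(NOT DEFINED USE_CUDNN)
  if(CUDNN_FOUND)
    set(USE_CUDNN ON)   # cuDNN was found (e.g. via CUDNN_ROOT_DIR), so enable it
  else()
    set(USE_CUDNN OFF)
  endif()
endif()
```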
2018-06-26 06:54:27 -07:00
Teng Li
a994b432ee [c10d] NCCL Process Group implementation (#8182)
* [c10d] Process Group NCCL implementation

* Addressed comments

* Added one missing return and clang format again

* Use cmake/Modules for everything and fix gloo build

* Fixed compiler warnings

* Deleted duplicated FindNCCL
2018-06-08 10:33:27 -07:00
Yinghai Lu
fb5cc630f6 Fix me (#7837)
* Mini fix

* No USE_MKL

* Add CAFFE2_USE_EIGEN_FOR_BLAS
2018-05-25 07:38:50 -07:00
Yinghai Lu
144c5d1ff3
Overwrite INTEL_MKL_DIR correctly (#7824) 2018-05-24 15:04:25 -07:00
Yinghai Lu
71bad33cc4
Match parenthesis (#7797) 2018-05-24 13:45:23 -07:00
Orion Reblitz-Richardson
4bf0202cac
[build] Have PyTorch depend on minimal libcaffe2.so instead of libATen.so (#7399)
* Have PyTorch depend on minimal libcaffe2.so instead of libATen.so

* Build ATen tests as a part of Caffe2 build

* Hopefully cufft and nvcc fPIC fixes

* Make ATen install components optional

* Add tests back for ATen and fix TH build

* Fixes for test_install.sh script

* Fixes for cpp_build/build_all.sh

* Fixes for aten/tools/run_tests.sh

* Switch ATen cmake calls to USE_CUDA instead of NO_CUDA

* Attempt at fix for aten/tools/run_tests.sh

* Fix typo in last commit

* Fix valgrind call after pushd

* Be forgiving about USE_CUDA disable like PyTorch

* More fixes on the install side

* Link all libcaffe2 during test run

* Make cuDNN optional for ATen right now

* Potential fix for non-CUDA builds

* Use NCCL_ROOT_DIR environment variable

* Pass -fPIC through nvcc to base compiler/linker

* Remove THCUNN.h requirement for libtorch gen

* Add Mac test for -Wmaybe-uninitialized

* Potential Windows and Mac fixes

* Move MSVC target props to shared function

* Disable cpp_build/libtorch tests on Mac

* Disable sleef for Windows builds

* Move protos under BUILD_CAFFE2

* Remove space from linker flags passed with -Wl

* Remove ATen from Caffe2 dep libs since directly included

* Potential Windows fixes

* Preserve options while sleef builds

* Force BUILD_SHARED_LIBS flag for Caffe2 builds

* Set DYLD_LIBRARY_PATH and LD_LIBRARY_PATH for Mac testing

* Pass TORCH_CUDA_ARCH_LIST directly in cuda.cmake

* Fixes for the last two changes

* Potential fix for Mac build failure

* Switch Caffe2 to build_caffe2 dir to not conflict

* Cleanup FindMKL.cmake

* Another attempt at Mac cpp_build fix

* Clear cpp-build directory for Mac builds

* Disable test in Mac build/test to match cmake
2018-05-24 07:47:27 -07:00
Paul Jesse Hellemn
b875fb281c
Update from facebook (#7451)
* [bootcamp] Improve "Shape" operator to support axes specification

Improve the .shape operator of Caffe2 to support x.shape(tensor, axes), which takes an optional int array "axes" as input. For example, x.shape(tensor, [1, 0]) will return the dimensions for axes 1 and 0 in the specified order. In the current version, the "axes" input allows duplicates and can have arbitrary length.

* Back out "Add barrier net that runs before training nets"

Original commit changeset: b373fdc9c30f. Need additional changes to some callers to support barrier failures.

* Change warning to verbose log to reduce log spam

The `LOG(WARNING)` was a bit spammy for regular use so lets just make it a `VLOG`.

* Extract the shared code from different caffe2_benchmark binaries

The OSS benchmark and Internal benchmark will share most functions in the benchmark.

* Support MFR in sequence training

As titled.

* Make knowledge distillation work when using the logged prediction feature as the teacher label.

1) Add loading raw dense feature as teacher label.
2) Optional calibration function for teacher label
3) Add teacher label into generic unit test
4) Deprecated TTSN workflow version using feature_options to config teacher label

* [C2/CUDA]: unjoined cross entropy sigmoid

as desc

* Add async_scheduling executor into deferrable_net_exec_test

Add async_scheduling into tests and fix some exception cases

* Fix Event disabled error

When disabling event in RNN ops make sure we don't call Finish on disabled
event from op's RunAsync

* cuda ensure cpu output op can handle both TensorCPU and TensorCUDA

as desc.

* [C2 Core] Infer input device option in C2 hypothesis_test checkers

Improve how we default input blob device options.
Previously it defaulted to where the op lives, but that is not necessarily the case.

For example:
CopyCPUToGPU

* [C2 Op]SplitByLengthsOp CPU/GPU implementation

[C2 Op]SplitByLengthsOp CPU/GPU implementation

* fix undefined symbol error

not sure why we're getting undefined symbol even with link_whole = True
Need to figure out why but need this workaround for now

* Add tools in the DAIPlayground platform to help debug models

Add additional tools to allow Playground to override individual methods defined in AnyExp. This allows users to create modules that specifically change certain default method behaviors. An example included in this diff is deactivating the test model and checkpointing. When debugging model problems, switching off components helps me quickly narrow down the location of the bug. The technique is extensively used in task T27038712 (Steady memory increase in EDPM, eventually resulting in gloo/cuda.cu:34: out of memory)

* add shape and type inference for int8 conversion operator

* Fix flaky test for group_norm

Fix flaky test for group_norm

* Fix group_norm_op_test flaky

Fix group_norm_op_test flaky

* Implementation of composite learning rate policy

In many state-of-the-art deep learning works, people use a simple trick to
schedule the learning rate: use a fixed learning rate until the error plateaus
and then switch to a different fixed learning rate, and so on. In this diff,
we implemented a simple version of the composite learning rate. The user gives
a set of learning rate policies and corresponding iteration numbers, and the
optimizer will change the learning rate policy based on the number of iterations so far.

For example, the user gives two learning rate policies, FixedLearningRate
and PolyLearningRate, with an iteration number of 1k. Then for the first 1k iterations
we use FixedLearningRate, and for the following iterations we use PolyLearningRate.

* Split two use cases of CachedReader into two classes, DBFileReader and CachedReader

# Use Cases:

1). input: DB file -> output: DatasetReader.

Use DBFileReader.

2). input: Reader -> build cache DB file -> output: DatasetReader.

Use CachedReader.

# Changes to CachedReader:

1). Move db_path to the constructor,
because in the mock reader the cache will always be built ahead of time.

# Changes to tests:

1). Make a separate TestCase class for CachedReader and DBFileReader.

2). Make it possible to add more test functions by adding setUp, tearDown and _make_temp_path.

3). Make delete db_path more general. `db_path` could be a file for `log_file_db`, but could also be a directory for `leveldb`.

* Back out "On Mobile phones, call GlobalInit with no arguments in predictor in case we need to perform initialization"

Original commit changeset: 4489c6133f11

* Fix LARS bug

Fixed a bug in the LARS implementation which caused all subsequent blobs not using LARS to have the LARS learning rate multiplier applied to them.

* [tum] support sparse init & add uniformFill option

as title

* Propagate exception for async nets

Capture the exception when an exception is thrown in async nets and re-throw it after wait().  This allows exceptions to be propagated up to the caller.

This diff was a part of D7752068.  We split the diff so that C2 core files changes are in a separate diff.

* Automatic update of fbcode/onnx to 69894f207dfcd72d1e70497d387201cec327efbc

Previous import was 403ccfbd0161c38f0834413d790bad0874afbf9a

Included changes:
- **[69894f2](https://github.com/onnx/onnx/commit/69894f2)**: Use op schema.all tensor types in random like definitions (#865) <Scott McKay>
- **[b9d6b90](https://github.com/onnx/onnx/commit/b9d6b90)**: Clarify random like operators (#846) <Scott McKay>
- **[fc6b5fb](https://github.com/onnx/onnx/commit/fc6b5fb)**: Refactor shape inference implementation (#855) <anderspapitto>
- **[b7d8dc8](https://github.com/onnx/onnx/commit/b7d8dc8)**: fix cmake warning message (#863) <Eric S. Yu>
- **[f585c5d](https://github.com/onnx/onnx/commit/f585c5d)**: add pytorch-operator test for tile (#831) <Wenhao Hu>
- **[993fe70](https://github.com/onnx/onnx/commit/993fe70)**: add install step (#832) <Eric S. Yu>
- **[68bc26c](https://github.com/onnx/onnx/commit/68bc26c)**: add type inference for traditional ml ops except classifier ops. (#857) <Ke Zhang>
- **[9cc0cda](https://github.com/onnx/onnx/commit/9cc0cda)**: fix string representation of scalar types (#858) <G. Ramalingam>
- **[1078925](https://github.com/onnx/onnx/commit/1078925)**: fix y in pow test case to scalar (#852) <Wenhao Hu>
- **[c66fb6f](https://github.com/onnx/onnx/commit/c66fb6f)**: Add some math function shape inference (#845) <anderspapitto>
- **[ff667d1](https://github.com/onnx/onnx/commit/ff667d1)**: Refactor return type and docs for ONNXIFI_BACKEND_DIRECTX_ID (#853) <Marat Dukhan>
- **[11c6876](https://github.com/onnx/onnx/commit/11c6876)**: clear initializer names when clear initializer (#849) <Wenhao Hu>
- **[73c34ae](https://github.com/onnx/onnx/commit/73c34ae)**: Clarify FeatureVectorizer description. (#843) <Scott McKay>
- **[1befb9b](https://github.com/onnx/onnx/commit/1befb9b)**: Remove useless text in docs (#850) <Lu Fang>
- **[e84788f](https://github.com/onnx/onnx/commit/e84788f)**: Fix SELU attributes' default values (#839) <Lu Fang>
- **[ebac046](https://github.com/onnx/onnx/commit/ebac046)**: Add tile test case (#823) <Wenhao Hu>
- **[8b7a925](https://github.com/onnx/onnx/commit/8b7a925)**: a few more shape inference functions (#772) <anderspapitto>
- **[9718f42](https://github.com/onnx/onnx/commit/9718f42)**: Make the coefficient non optional for LinearClassifier (#836) <Jaliya Ekanayake>
- **[ef083d0](https://github.com/onnx/onnx/commit/ef083d0)**: Add save_tensor and load_tensor functions for Protos (#770) <Lu Fang>
- **[45ceb55](https://github.com/onnx/onnx/commit/45ceb55)**: Check if CMAKE_BUILD_TYPE set before project(). (#812) <Sergii Dymchenko>
- **[4b3d2b0](https://github.com/onnx/onnx/commit/4b3d2b0)**: [WIP] reenable shape inference tests (#834) <anderspapitto>
- **[22d17ee](https://github.com/onnx/onnx/commit/22d17ee)**: RNN tests: LSTM, GRU, SimpleRNN (#739) <Peyman Manikashani>
- **[de65b95](https://github.com/onnx/onnx/commit/de65b95)**: dimension denotation (#443) <Tian Jin>
- **[eccc76e](https://github.com/onnx/onnx/commit/eccc76e)**: fix field number issue in onnx operator proto and enable its build (#829) <Ke Zhang>
- **[d582beb](https://github.com/onnx/onnx/commit/d582beb)**: disable shape inference test to unbreak ci (#830) <Lu Fang>
- **[485b787](https://github.com/onnx/onnx/commit/485b787)**: function proto for composite op. (#802) <Ke Zhang>
- **[cd58928](https://github.com/onnx/onnx/commit/cd58928)**: specify defaults for attributes of Affine op (#820) <G. Ramalingam>
- **[7ee2cf9](https://github.com/onnx/onnx/commit/7ee2cf9)**: merge the dummy backend back into the main one (#743) <anderspapitto>
- **[1c03a5a](https://github.com/onnx/onnx/commit/1c03a5a)**: [Proposal] ONNX Interface for Framework Integration (previously ONNX Backend API) header and docs (#551) <Marat Dukhan>
- **[3769a98](https://github.com/onnx/onnx/commit/3769a98)**: Rename real model test case from VGG-16 to ZFNet (#821) <Lu Fang>

* [C2]ReluN Op

relu n op.

tf reference: https://www.tensorflow.org/api_docs/python/tf/nn/relu6

* Call destructor when assigning a blob value

* Add executor overrides

Add executor overrides flag to enable migration to async_scheduling executor

* Add barrier net that runs before training nets - attempt #2

Add a synchronize barrier net that is run before training nets. With this net, shards that are faster will wait for other shards before starting training. This reduces the chances of the faster shards timing out during GLOO AllReduce.
Removed the explicit data_parallel_model.py.synchronize call in the holmes workflow.

This change was landed previously but caused errors for some EDPM workflows - see https://fb.facebook.com/groups/1426530000692545/permalink/1906766366002237/ - because EDPM assumes any call to CreateOrCloneCommonWorld and Gloo ops is wrapped in exception handlers, but in this case the exception thrown in the barrier init net is not handled.

To address this issue, we add _CreateOrCloneCommonWorld to the param_init_net instead of a new barrier init net. Since errors in the param_init_net run are handled gracefully with re-rendezvous, this should fix the problem.

* Handle empty nets in async_scheduling

Make sure we don't get stuck on empty nets

* use CUDA_ARCH for conditional compile

* [C2 fix] infer function for ensure_cpu_output_op

* Update group_norm test to reduce flaky test

* Fix lr_multiplier for GPU
2018-05-10 23:14:27 -07:00
Jinghui
769397eb77 [Caffe2] [feature request] Add gradient operators for IDEEP (#7234)
* Add gradient operators for IDEEP

Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com>

* Add gradient test cases for IDEEP

Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com>

* Upgrade third_party/ideep

Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com>

* Refine SumOp for IDEEP

Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com>

* Share input buffer in fallback op if possible

Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com>

* Fallback ConvTranspose op for IDEEP

Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com>

* Fix bug introduced by the patch of sharing input buffer

Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com>

* Share output buffer in fallback operators

Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com>

* Remove IDEEP to resolve repo issue

Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com>

* Reflash IDEEP repo

Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com>

* Remove redundant lines in IDEEP

Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com>

* Fallback operators for IDEEP
(Flatten, ResizeLike, Transpose, and Reshape)

Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com>
2018-05-09 08:52:24 -07:00
Yinghai Lu
ea24c7ff1b
Remove cdft library requirement from MKL (#7246) 2018-05-07 15:31:30 -07:00
Orion Reblitz-Richardson
aa38ae303d
[build] Setup to build ATen from root CMake file (#7163)
* Setup to build ATen from root CMake file

* Move aten/src/TH/cmake into cmake/Modules

* Add special code path for FindMKL for merge
2018-05-02 19:33:31 -07:00
Yinghai Lu
8b70f7d248
[Caffe2] Clean up ideep integration (#6881)
* Clean up ideep integration

* .

* Remove redundant code in convnet benchmark

* MKL ON

* Do not add -mavx2 everywhere

* .

* Comments

* rename

* .
2018-04-24 18:32:35 -07:00
Jinghui
26ddefbda1 [feature request] [Caffe2] Enable MKLDNN support for inference (#6699)
* Add operators based-on IDEEP interfaces

Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com>

* Enable IDEEP as a caffe2 device

Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com>

* Add test cases for IDEEP ops

Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com>

* Add IDEEP as a caffe2 submodule

Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com>

* Skip test cases if no IDEEP support

Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com>

* Correct cmake options for IDEEP

Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com>

* Add dependences on ideep libraries

Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com>

* Fix issues in IDEEP conv ops and etc.

Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com>

* Move ideep from caffe2/ideep to caffe2/contrib/ideep

Signed-off-by: Gu Jinghui <jinghui.gu@intel.com>

* Update IDEEP to fix cmake issue

Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com>

* Fix cmake issue caused by USE_MKL option

Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com>

* Correct comments in MKL cmake file

Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com>
2018-04-22 21:58:14 -07:00
Marat Dukhan
63b5cc47eb
[caffe2] Minor changes in NNPACK CMake scripts (#6532)
- Tell NNPACK to not link pthreadpool, but only its headers
- Remove FindNNPACK.cmake as it is no longer used
2018-04-11 20:56:38 -04:00
Soumith Chintala
108f5c197f
[pytorch] add static linkage support for CuDNN and NCCL (#6410)
* when linking static CUDA libs, additional dep on culibos.a

* add USE_STATIC_NCCL option

* add USE_STATIC_CUDNN option

* remove libATen soversion

* add caffe, caffe2 folders to setup.py exclude list
2018-04-08 22:54:18 -04:00
Yangqing Jia
4aded2f7c1 Add Numa support (#2152) 2018-03-05 23:30:20 -08:00
Yangqing Jia
1f9df59de9 Move caffe_option to proper cmake_dependent_option (#2049) 2018-02-24 23:31:36 -08:00
Yangqing Jia
fe5fe7bad2 CMake cuda targets (#1993)
* wip: cuda targets

* Remove FindCuDNN.cmake as it is no longer needed
2018-02-22 15:54:34 -05:00
sf-wind
5439ab3cdc Remove gf library in MKL (#1976)
* Remove OpenGL code from benchmark

* Make it possible to print plot in the ipython notbook

* Create the blob if the blob is not specified in the init net

* Do not use the gf library for MKL. Even after I install the entire MKL library, it is still not found. After removing it, the MKL code can still run.
2018-02-20 15:17:34 -08:00
Yangqing Jia
d481afb125 Modernizing glog. Same as gflags.
Summary:
Same as PR #1819.
Closes https://github.com/caffe2/caffe2/pull/1830

Differential Revision: D6832171

Pulled By: Yangqing

fbshipit-source-id: 462a9b807e78d60748160a0cfd24932c9003fcc3
2018-01-28 18:21:22 -08:00
Yangqing Jia
73ed0d5ced Modernizing the gflags dependency in cmake.
Summary:
Historically, for interface dependent libraries (glog, gflags and protobuf), exposing them in Caffe2Config.cmake is usually difficult.

New versions of glog and gflags ship with new-style cmake targets, so one does not need to use variables. New-style targets also make it easier for people to depend on them in installed config files.

This diff modernizes the gflags library, and still provides a fallback path if the installed gflags does not have cmake config files coming with it.

It does change one behavior of the build process though - when one specifies -DUSE_GFLAGS=ON but gflags cannot be found, the old script automatically turns it off but the new script crashes, forcing the user to specify USE_GFLAGS=OFF.
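The difference between the variable-based and target-based wiring can be sketched roughly as follows (the dependency-list variable is taken from Caffe2's cmake conventions but should be treated as an assumption; the fallback path for old gflags installs is omitted):

```cmake
# Illustrative sketch only.
find_package(gflags CONFIG QUIET)       # modern gflags ships its own cmake config
if(TARGET gflags)
  # New style: depend on the imported target; no include/library variables needed.
  list(APPEND Caffe2_DEPENDENCY_LIBS gflags)
elseif(USE_GFLAGS)
  # Instead of silently turning USE_GFLAGS off, fail loudly and let the user decide.
  message(FATAL_ERROR "USE_GFLAGS is ON but gflags was not found; configure with USE_GFLAGS=OFF.")
endif()
```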
Closes https://github.com/caffe2/caffe2/pull/1819

Differential Revision: D6826604

Pulled By: Yangqing

fbshipit-source-id: 210f3926f291c8bfeb24eb9671e5adfcbf8cf7fe
2018-01-27 19:31:14 -08:00
Marat Dukhan
cd9d0f4561 Link cpuinfo when using external NNPACK
Summary:
Close #1685
Closes https://github.com/caffe2/caffe2/pull/1722

Differential Revision: D6686071

Pulled By: Maratyszcza

fbshipit-source-id: bbe86bfd479376bc7cdfdd0bad3896f1c2356216
2018-01-09 12:50:52 -08:00
Pieter Noordhuis
54342287fe Look for NCCL in CUDA_TOOLKIT_ROOT_DIR
Summary: Closes https://github.com/caffe2/caffe2/pull/1611

Reviewed By: dzhulgakov

Differential Revision: D6550168

Pulled By: pietern

fbshipit-source-id: e034ce4057d37bfc8b53949c56cbcb701ea5d958
2017-12-12 21:50:49 -08:00
Pieter Noordhuis
db06e91097 Bump gloo
Summary:
Latest version of Gloo takes care of MPI_Init/MPI_Finalize for us, so
this commit removes handling that from caffe2/contrib/gloo. It also
imports CMake NCCL module changes from Gloo to stay consistent and
allow setting NCCL_INCLUDE_DIR and NCCL_LIB_DIR separately.
Closes https://github.com/caffe2/caffe2/pull/1295

Reviewed By: dzhulgakov

Differential Revision: D5979364

Pulled By: pietern

fbshipit-source-id: 794b00b0a445317c30a13cc8f0f4dc38e590cc77
2017-10-05 16:57:59 -07:00
Luke Yeager
c858c68537 cmake: stop including files from the install directory
Summary:
Here is the buggy behavior which this change fixes:

* On the first configure with CMake, a system-wide benchmark installation is not found, so we use the version in `third_party/` ([see here](https://github.com/caffe2/caffe2/blob/v0.8.1/cmake/Dependencies.cmake#L98-L100))
* On installation, the benchmark sub-project installs its headers to `CMAKE_INSTALL_PREFIX` ([see here](https://github.com/google/benchmark/blob/4bf28e611b/src/CMakeLists.txt#L41-L44))
* On a rebuild, CMake searches the system again for a benchmark installation (see https://github.com/caffe2/caffe2/issues/916 for details on why the first search is not cached)
* CMake includes `CMAKE_INSTALL_PREFIX` when searching the system ([docs](https://cmake.org/cmake/help/v3.0/variable/CMAKE_SYSTEM_PREFIX_PATH.html))
* Voila, a "system" installation of benchmark is found at `CMAKE_INSTALL_PREFIX`
* On a rebuild, `-isystem $CMAKE_INSTALL_PREFIX/include` is added to every build target ([see here](https://github.com/caffe2/caffe2/blob/v0.8.1/cmake/Dependencies.cmake#L97)). e.g:

      cd /caffe2/build/caffe2/binaries && ccache /usr/bin/c++    -I/caffe2/build -isystem /caffe2/third_party/googletest/googletest/include -isystem /caffe2/install/include -isystem /usr/include/opencv -isystem /caffe2/third_party/eigen -isystem /usr/include/python2.7 -isystem /usr/lib/python2.7/dist-packages/numpy/core/include -isystem /caffe2/third_party/pybind11/include -isystem /usr/local/cuda/include -isystem /caffe2/third_party/cub -I/caffe2 -I/caffe2/build_host_protoc/include  -fopenmp -std=c++11 -O2 -fPIC -Wno-narrowing -O3 -DNDEBUG   -o CMakeFiles/split_db.dir/split_db.cc.o -c /caffe2/caffe2/binaries/split_db.cc

This causes two issues:
1. Since the headers and libraries at `CMAKE_INSTALL_PREFIX` have a later timestamp than the built files, an unnecessary rebuild is triggered
2. Out-dated headers from the install directory are used during compilation, which can lead to strange build errors (which can usually be fixed by `rm -rf`'ing the install directory)

Possible solutions:
* Stop searching the system for an install of benchmark, and always use the version in `third_party/`
* Cache the initial result of the system-wide search for benchmark, so we don't accidentally pick up the installed version later
* Hack CMake to stop looking for headers and libraries in the installation directory

This PR is an implementation of the first solution. Feel free to close this and fix the issue in another way if you like.
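A sketch of that first solution (simplified; `BENCHMARK_ENABLE_TESTING` is a stock google/benchmark option, the rest is illustrative):

```cmake
# Illustrative: never call find_package(benchmark), so CMAKE_INSTALL_PREFIX can
# never be picked up as a "system" benchmark on a rebuild; always use third_party/.
set(BENCHMARK_ENABLE_TESTING OFF CACHE BOOL "" FORCE)
add_subdirectory(${PROJECT_SOURCE_DIR}/third_party/benchmark)
include_directories(SYSTEM ${PROJECT_SOURCE_DIR}/third_party/benchmark/include)
```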
Closes https://github.com/caffe2/caffe2/pull/1112

Differential Revision: D5761750

Pulled By: Yangqing

fbshipit-source-id: 2240088994ffafdb6eedb3626d898b505a4ba564
2017-09-01 23:33:14 -07:00
Pieter Noordhuis
45e6e71198 Tidy up CMake for NCCL
Summary:
Use HINTS instead of PATHS for find_library so that you can specify
-DNCCL_ROOT_DIR and it will use this NCCL installation regardless of
what else is installed on your system. Also add a path hint to include
the default base path for NCCL 2 libraries.
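The behavioural difference can be sketched as follows: HINTS are searched before the standard system locations, while PATHS are only tried after them, so an explicit -DNCCL_ROOT_DIR reliably wins (illustrative snippet, not the exact module):

```cmake
# Before (PATHS): NCCL_ROOT_DIR is consulted only after the default system paths.
# find_library(NCCL_LIBRARY nccl PATHS ${NCCL_ROOT_DIR}/lib)

# After (HINTS): the user-specified root takes precedence over other installations.
find_library(NCCL_LIBRARY nccl
  HINTS ${NCCL_ROOT_DIR}
  PATH_SUFFIXES lib lib64)
```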
Closes https://github.com/caffe2/caffe2/pull/1152

Reviewed By: Yangqing

Differential Revision: D5740053

Pulled By: pietern

fbshipit-source-id: 43f0908a63e8a9b90320dece0bbb558827433b48
2017-08-30 15:39:56 -07:00
Pieter Noordhuis
813cca85d1 Use CMake HINTS to find CuDNN
Summary:
The PATHS suggestion to find_library is searched after everything
else. By using HINTS, it searches CUDNN_ROOT_DIR much earlier, avoiding
potential conflicts with other paths that have the CuDNN header.
Closes https://github.com/caffe2/caffe2/pull/1122

Reviewed By: Yangqing

Differential Revision: D5701822

Pulled By: pietern

fbshipit-source-id: 3f15757701aff167e7ae2a3e8a4ccf5d96763a0c
2017-08-24 15:35:24 -07:00
Guillaume Dumont
8cc9dbf357 Added Ninja generator support on Windows
Summary:
I successfully built caffe2 using MSVC 2015 and the Ninja generator. I use vcpkg to build gflags, glog, lmdb and protobuf. Here is my build procedure:

1. Install vcpkg and set it up according to vcpkg docs
2. Install dependencies
```
$> vcpkg install gflags glog lmdb protobuf eigen3 --triplet x64-windows-static
```
3. Run CMake with this batch file
```Batch
setlocal
if NOT DEFINED VCPKG_DIR ( echo "Please define VCPKG_DIR" && exit /b 1 )
if NOT DEFINED CMAKE_BUILD_TYPE set CMAKE_BUILD_TYPE=Release
if NOT DEFINED BUILD_DIR set BUILD_DIR=build_%CMAKE_BUILD_TYPE%
if NOT DEFINED USE_CUDA set USE_CUDA=OFF

call "%VS140COMNTOOLS%\..\..\VC\vcvarsall.bat" amd64

if NOT EXIST %BUILD_DIR% (mkdir %BUILD_DIR%)
pushd %BUILD_DIR%

set CMAKE_GENERATOR=Ninja
set ZLIB_LIBRARY=%VCPKG_DIR%\installed\x64-windows-static\lib\zlib.lib

cmake -G"%CMAKE_GENERATOR%" ^
      -DBUILD_SHARED_LIBS=OFF ^
      -DCMAKE_VERBOSE_MAKEFILE=1 ^
      -DBUILD_TEST=OFF ^
      -DBUILD_SHARED_LIBS=OFF ^
      -DCMAKE_BUILD_TYPE=%CMAKE_BUILD_TYPE% ^
      -DUSE_CUDA=%USE_CUDA% ^
      -DZLIB_LIBRARY:FILEPATH="%ZLIB_LIBRARY%" ^
      -DVCPKG_TARGET_TRIPLET=x64-windows-static ^
      -DVCPKG_APPLOCAL_DEPS:BOOL=OFF ^
      -DCMAKE_TOOLCHAIN_FILE:FILEPATH=%VCPKG_DIR%\scripts\buildsystems\vcpkg.cmake ^
      -DPROTOBUF_PROTOC_EXECUTABLE:FILEPATH=%VCPKG_DIR%\installed\x64-windows-static\tools\protoc.exe ^
      ..\

ninja
popd

endlocal
```
Closes https://github.com/caffe2/caffe2/pull/880

Differential Revision: D5497384

Pulled By: Yangqing

fbshipit-source-id: e0d81d3dbd3286ab925eddef0e6fbf99eb6375a5
2017-07-26 00:32:20 -07:00
Daniel Bermond
0458985c1b Fix build with external nnpack installation
Summary:
libpthreadpool is needed during the linking stage and is missing when the user chooses to use an external nnpack installation (from system libraries).

Fixes GitHub issue #459.

Detailed discussion on [this comment](https://github.com/caffe2/caffe2/issues/459#issuecomment-308831547).
Closes https://github.com/caffe2/caffe2/pull/808

Differential Revision: D5430318

Pulled By: Yangqing

fbshipit-source-id: 5e10332fb01e54d8360bb929c1a82b0eef580bbb
2017-07-25 23:03:39 -07:00
Guillaume Dumont
feecb09517 Added sensible default root location for MKL on Windows
Summary:
MKL on windows works with this change. Tested with MKL 2017 Update 3 (https://software.intel.com/en-us/articles/intel-math-kernel-library-intel-mkl-2017-release-notes).

Should fix #544

With MKL 2017 Update 3 #514 should not happen too.

Note: I used Anaconda which ships with its own MKL, so I had to make sure that the MKL 2017 Update 3 version was loaded by replacing the .dll in the `%AnacondaPrefix%\Library\bin` folder. Otherwise, numpy would load its own version and I would have all sorts of missing-procedure errors. Now that the same version is available through `conda` this is easily fixed with `conda install mkl==2017.0.3`
Closes https://github.com/caffe2/caffe2/pull/929

Differential Revision: D5429664

Pulled By: Yangqing

fbshipit-source-id: eaa150bab563ee4ce8348faee1624ac4af477513
2017-07-14 17:20:36 -07:00
haracejacob
2ec294a8bb Fix a few typos and grammars in comment
Summary:
Fix a few typos and grammar issues in comments

by using language-check, a Python library.
The spell_checker source code is here: https://github.com/17-1-SKKU-OSS/011A/blob/master/spell_checker/spell_checker.py
Here is the text file that indicates what should be fixed: https://github.com/17-1-SKKU-OSS/011A/tree/master/spell_checker/fix/caffe2
Closes https://github.com/caffe2/caffe2/pull/719

Differential Revision: D5165118

Pulled By: aaronmarkham

fbshipit-source-id: 7fb8ef7a99d03cd5fd2f9ebdb01b9865e90fc37b
2017-06-14 18:22:39 -07:00
Hans Gaiser
567842e68d Check system dependencies first
Summary:
This PR changes the cmake of Caffe2 to look for system dependencies before resorting to the submodules in `third-party`. Only googletest should logically be in third-party, the other libraries should ideally be installed as system dependencies by the user. This PR adds system dependency checks for Gloo, CUB, pybind11, Eigen and benchmark, as these were missing from the cmake files.

In addition, it removes the execution of `git submodule update --init` in cmake. This seems like bad behavior to me; it should be up to the user to download submodules and manage the git repository.
Closes https://github.com/caffe2/caffe2/pull/382

Differential Revision: D5124123

Pulled By: Yangqing

fbshipit-source-id: cc34dda58ffec447874a89d01058721c02a52476
2017-05-24 14:31:51 -07:00
Du Tran
033ab9da1b Adding video data layer for caffe2
Summary: Adding a simple video data layer that can read video data from frames or videos and output a 5D tensor. It also allows multiple labels. The current implementation is based on ffmpeg.

Differential Revision: D4801798

fbshipit-source-id: 46448e9c65fb055c2d71855447383a33ade0e444
2017-05-05 14:16:38 -07:00
Yangqing Jia
1aa5231fb3 make nnpack build on mac/linux, and also contbuild support
Summary:
* add custom ninja install

* minimal build for nnpack

* force -fPIC for nnpack
Closes https://github.com/caffe2/caffe2/pull/207

Differential Revision: D4729265

Pulled By: Yangqing

fbshipit-source-id: 2ed345a4fda6b4811af03cd1898e2402dda58701
2017-03-17 15:19:07 -07:00
Yangqing Jia
1741fd839f Re-apply windows diff D4657831
Summary:
(Note: previous revert was due to a race condition between D4657831 and
D4659953 that I failed to catch.)

After this, we should have contbuild guarding the Windows build both with
and without CUDA.

This includes a series of changes that are needed to make Windows build,
specifically:

(1) Various flags that are needed in the cmake system, specially dealing
with /MD, /MT, cuda, cudnn, whole static linking, etc.
(2) Contbuild scripts based on appveyo.
(3) For Windows build, note that one will need to use "cmake --build" to
build stuff so that the build type is consistent between configuration and
actual build. see scripts\build_windows.bat for details.
(4) In logging.h, ERROR is already defined by Windows. I don't have a good
solution now, and as a result, LOG(ERROR) on windows is going to be
LOG(INFO).
(5) variable length array is not supported by MSVC (and it is not part of
C++ standard). As a result I replaced them with vectors.
(6) sched.h is not available on Windows, so akyrola 's awesome simple
async net might encounter some slowdown due to no affinity setting on
Windows.
(7) MSVC has a bug that does not work very well with template calls inside
a templated function call, which is a known issue that should be fixed in
MSVC 2017. However, for now this means changes to conv_op_impl.h and
recurrent_net_op.h. No actual functionalities are changed.
(8) std host function calls are not supported in CUDA8+MSVC, so I changed
lp_pool (and maybe a few others) to use cuda device functions.
(9) The current Scale and Axpy have heavy templating that does not work
well with MSVC. As a result I reverted azzolini's changes to the Scale
and Axpy interface and moved the fixed-length versions to ScaleFixedSize and
AxpyFixedSize.
(10) CUDA + MSVC does not deal with Eigen well, so I guarded all Eigen
parts to only the non-CUDA part.
(11) In conclusion, it is fun but painful to deal with visual c++.

Differential Revision: D4666745

fbshipit-source-id: 3c9035083067bdb19a16d9c345c1ce66b6a86600
2017-03-07 11:02:12 -08:00
Avani Nandini
039c3cf0ba Revert D4657831: [caffe2][PR] Changes for Windows build to pass.
Summary: This reverts commit 070ded372ed78a7e3e3919fdffa1d337640f146e

Differential Revision: D4657831

fbshipit-source-id: 3a0fb403936a9257776d637ce3ba5dbd81e1119f
2017-03-06 21:02:36 -08:00
Yangqing Jia
7b8c7b11d2 Changes for Windows build to pass.
Summary:
After this, we should have contbuild guarding the Windows build both with
and without CUDA.

This includes a series of changes that are needed to make Windows build,
specifically:

(1) Various flags that are needed in the cmake system, specially dealing
with /MD, /MT, cuda, cudnn, whole static linking, etc.
(2) Contbuild scripts based on appveyo.
(3) For Windows build, note that one will need to use "cmake --build" to
build stuff so that the build type is consistent between configuration and
actual build. see scripts\build_windows.bat for details.
(4) In logging.h, ERROR is already defined by Windows. I don't have a good
solution now, and as a result, LOG(ERROR) on windows is going to be
LOG(INFO).
(5) variable length array is not supported by MSVC (and it is not part of
C++ standard). As a result I replaced them with vectors.
(6) sched.h is not available on Windows, so akyrola 's awesome simple
async net might encounter some slowdown due to no affinity setting on
Windows.
(7) MSVC has a
Closes https://github.com/caffe2/caffe2/pull/183

Reviewed By: ajtulloch

Differential Revision: D4657831

Pulled By: Yangqing

fbshipit-source-id: 070ded372ed78a7e3e3919fdffa1d337640f146e
2017-03-06 20:03:37 -08:00
Bram Wasti
0d5f3654b2 Adding back untracked files from manual github pull
Summary: Github import didn't work and the manual import lost some files.

Reviewed By: Yangqing

Differential Revision: D4408509

fbshipit-source-id: ec8edb8c02876410f0ef212bde6847a7ba327fe4
2017-01-12 08:59:19 -08:00
Yangqing Jia
1cd166d330 CMake completions work
Summary: Closes https://github.com/caffe2/caffe2/pull/88

Differential Revision: D4404292

Pulled By: bwasti

fbshipit-source-id: 8a4351c2dee5136aaa12b90f1a61fd7afee51994
2017-01-11 16:59:22 -08:00
Bram Wasti
1aa473638d Added a search path to find OpenBLAS for convenience (homebrew install) 2016-12-29 16:15:25 -05:00
Simon Layton
99e97a4b7a Correction to paths to find cuDNN 2016-12-16 16:03:23 -05:00
Simon Layton
fbbb87cd46 Enhancements
Add BLAS chooser
Move cuDNN detection from Cuda -> FindCuDNN
Refactor main C2 libs, should enable no-GPU build (untested)
2016-12-13 09:29:01 -05:00
Simon Layton
52f09fe2c9 Initial building with deps 2016-12-13 09:29:01 -05:00