Commit Graph

216 Commits

Author SHA1 Message Date
imaginary-person
9e53c823b8 Add AVX512 support in ATen & remove AVX support (#61903)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61903

### Remaining Tasks

- [ ] Collate results of benchmarks on two Intel Xeon machines (with & without CUDA, to check if CPU throttling causes issues with GPUs) - make graphs, including Roofline model plots (Intel Advisor can't make them with libgomp, though, but with Intel OpenMP).

### Summary

1. This draft PR produces binaries with with 3 types of ATen kernels - default, AVX2, AVX512 . Using the environment variable `ATEN_AVX512_256=TRUE`  also results in 3 types of kernels, but the compiler can use 32 ymm registers for AVX2, instead of the default 16. ATen kernels for `CPU_CAPABILITY_AVX` have been removed.

2. `nansum` is not using AVX512 kernel right now, as it has poorer accuracy for Float16, than does AVX2 or DEFAULT, whose respective accuracies aren't very good either (#59415).
It was more convenient to disable AVX512 dispatch for all dtypes of `nansum` for now.

3. On Windows , ATen Quantized AVX512 kernels are not being used, as quantization tests are flaky. If `--continue-through-failure` is used, then `test_compare_model_outputs_functional_static` fails. But if this test is skipped, `test_compare_model_outputs_conv_static` fails. If both these tests are skipped, then a third one fails. These are hard to debug right now due to not having access to a Windows machine with AVX512 support, so it was more convenient to disable AVX512 dispatch of all ATen Quantized kernels on Windows for now.

4. One test is currently being skipped -
[test_lstm` in `quantization.bc](https://github.com/pytorch/pytorch/issues/59098) - It fails only on Cascade Lake machines, irrespective of the `ATEN_CPU_CAPABILITY` used, because FBGEMM uses `AVX512_VNNI` on machines that support it. The value of `reduce_range` should be used as `False` on such machines.

The list of the changes is at https://gist.github.com/imaginary-person/4b4fda660534f0493bf9573d511a878d.

Credits to ezyang for proposing `AVX512_256` - these use AVX2 intrinsics but benefit from 32 registers, instead of the 16 ymm registers that AVX2 uses.
Credits to limo1996 for the initial proposal, and for optimizing `hsub_pd` & `hadd_pd`, which didn't have direct AVX512 equivalents, and are being used in some kernels. He also refactored `vec/functional.h` to remove duplicated code.
Credits to quickwritereader for helping fix 4 failing complex multiplication & division tests.

### Testing
1. `vec_test_all_types` was modified to test basic AVX512 support, as tests already existed for AVX2.
Only one test had to be modified, as it was hardcoded for AVX2.
2.  `pytorch_linux_bionic_py3_8_gcc9_coverage_test1` & `pytorch_linux_bionic_py3_8_gcc9_coverage_test2` are now using `linux.2xlarge` instances, as they support AVX512. They were used for testing AVX512 kernels, as AVX512 kernels are being used by default in both of the CI checks. Windows CI checks had already been using machines with AVX512 support.

### Would the downclocking caused by AVX512 pose an issue?

I think it's important to note that AVX2 causes downclocking as well, and the additional downclocking caused by AVX512 may not hamper performance on some Skylake machines & beyond, because of the double vector-size. I think that [this post with verifiable references is a must-read](https://community.intel.com/t5/Software-Tuning-Performance/Unexpected-power-vs-cores-profile-for-MKL-kernels-on-modern-Xeon/m-p/1133869/highlight/true#M6450). Also, AVX512 would _probably not_ hurt performance on a high-end machine, [but measurements are recommended](https://lemire.me/blog/2018/09/07/avx-512-when-and-how-to-use-these-new-instructions/). In case it does, `ATEN_AVX512_256=TRUE` can be used for building PyTorch, as AVX2 can then use 32 ymm registers instead of the default 16. [FBGEMM uses `AVX512_256` only on Xeon D processors](https://github.com/pytorch/FBGEMM/pull/209), which are said to have poor AVX512 performance.

This [official data](https://www.intel.com/content/dam/www/public/us/en/documents/specification-updates/xeon-scalable-spec-update.pdf) is for the Intel Skylake family, and the first link helps understand its significance. Cascade Lake & Ice Lake SP Xeon processors are said to be even better when it comes to AVX512 performance.

Here is the corresponding data for [Cascade Lake](https://cdrdv2.intel.com/v1/dl/getContent/338848) -

![CASCADE LAKE AVX2](https://user-images.githubusercontent.com/76181208/120666172-ffec3f80-c451-11eb-8ea1-8933ccc12a1b.PNG)
![CASCADE LAKE AVX512](https://user-images.githubusercontent.com/76181208/120666190-04b0f380-c452-11eb-9faa-38d233c874c8.PNG)

The corresponding data isn't publicly available for Intel Xeon SP 3rd gen (Ice Lake SP), but [Intel mentioned that the 3rd gen has frequency improvements pertaining to AVX512](https://newsroom.intel.com/wp-content/uploads/sites/11/2021/04/3rd-Gen-Intel-Xeon-Scalable-Platform-Press-Presentation-281884.pdf). Ice Lake SP machines also have 48 KB L1D caches, so that's another reason for AVX512 performance to be better on them.

### Is PyTorch always faster with AVX512?

No, but then PyTorch is not always faster with AVX2 either. Please refer to #60202. The benefit from vectorization is apparent with with small tensors that fit in caches or in kernels that are more compute heavy. For instance, AVX512 or AVX2 would yield no benefit for adding two 64 MB tensors, but adding two 1 MB tensors would do well with AVX2, and even more so with AVX512.

It seems that memory-bound computations, such as adding two 64 MB tensors can be slow with vectorization (depending upon the number of threads used), as the effects of downclocking can then be observed.

Original pull request: https://github.com/pytorch/pytorch/pull/56992

Reviewed By: soulitzer

Differential Revision: D29266289

Pulled By: ezyang

fbshipit-source-id: 2d5e8d1c2307252f22423bbc14f136c67c3e6184
2021-07-22 08:51:49 -07:00
shmsong
ee2dd35ef4 Resolving native dependency and try_run for cross compile (#59764)
Summary:
This is a PR on build system that provides support for cross compiling on Jetson platforms.

The major change is:

1. Disable try runs for cross compiling in `COMPILER_WORKS`, `BLAS`, and `CUDA`. They will not be able to perform try run on a cross compile setup

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59764

Reviewed By: soulitzer

Differential Revision: D29524363

Pulled By: malfet

fbshipit-source-id: f06d1ad30b704c9a17d77db686c65c0754db07b8
2021-07-09 09:29:21 -07:00
zhouzhuojie
6107cf3750 Add --jobs 0 for git submodule update (#61311)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61311

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61152

Some related docs about `submodule.fetchJobs`
https://git-scm.com/docs/git-config#Documentation/git-config.txt-submodulefetchJobs

```
time git submodule update --init --recursive
________________________________________________________
Executed in  243.20 secs    fish           external
   usr time   49.64 secs  213.00 micros   49.64 secs
   sys time   29.27 secs  795.00 micros   29.27 secs
```

```
time git submodule update --init --recursive --jobs 4
________________________________________________________
Executed in  143.04 secs    fish           external
   usr time   51.06 secs  246.00 micros   51.06 secs
   sys time   30.96 secs  742.00 micros   30.96 secs
```

```
time git submodule update --init --recursive --jobs 8
________________________________________________________
Executed in  124.64 secs    fish           external
   usr time   51.76 secs  264.00 micros   51.76 secs
   sys time   30.49 secs  739.00 micros   30.49 secs

```

```
time git submodule update --init --recursive --jobs 0 # use all online cpus
 ________________________________________________________
Executed in  129.75 secs    fish           external
   usr time   51.64 secs  181.00 micros   51.64 secs
   sys time   31.49 secs  781.00 micros   31.49 secs

```

Test Plan: Imported from OSS

Reviewed By: 1ntEgr8

Differential Revision: D29560875

Pulled By: zhouzhuojie

fbshipit-source-id: 556027dffe744c66428075a8a1bf64683930aaaf
2021-07-07 16:28:18 -07:00
Nikita Shulga
40a7c317bc Run BLAS F2C checks on host architecture (#60703)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/60351

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60703

Reviewed By: driazati

Differential Revision: D29379727

Pulled By: malfet

fbshipit-source-id: dadbb1d39373887f07d59d0a05e093a5d070b016
2021-06-24 18:44:41 -07:00
Nikita Shulga
63956610a7 Search for static OpenBLAS compiled with OpenMP (#59428)
Summary:
Before that, only dynamically linked OpenBLAS compield with OpenMP could
be found.

Also get rid of hardcoded codepath for libgfortran.a in FindLAPACK.cmake

Only affects aarch64 linux builds

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59428

Reviewed By: agolynski

Differential Revision: D28891314

Pulled By: malfet

fbshipit-source-id: 5af55a14c85ac66551ad2805c5716bbefe8d55b2
2021-06-04 08:09:21 -07:00
Gordon Fossum
007fe949aa Adding a new include directory in BLIS search path (#58166)
Summary:
While trying to build PyTorch with BLIS as the backend library,
we found a build issue due to some missing include files.
This was caused by a missing directory in the search path.
This patch adds that path in FindBLIS.cmake.

Fixes #{issue number}

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58166

Reviewed By: zou3519

Differential Revision: D28640460

Pulled By: malfet

fbshipit-source-id: d0cd3a680718a0a45788c46a502871b88fbadd52
2021-05-24 08:57:02 -07:00
Shruti Ramesh
f1f3c8b0fa Adding PyTorch + DNNL + AMD BLIS path (#54953)
Summary:
These changes provide the user with an additional option to choose the DNNL+BLIS path for PyTorch.

This assumes BLIS is already downloaded or built from source and the necessary library file is available at the location: $BLIS_HOME/lib/libblis.so and include files are available at: $BLIS_HOME/include/blis/blis.h and $BLIS_HOME/include/blis/cblas.h

Export the below variables to build PyTorch with MKLDNN+BLIS and proceed with the regular installation procedure as below:
$export BLIS_HOME=path-to-BLIS
$export PATH=$BLIS_HOME/include/blis:$PATH LD_LIBRARY_PATH=$BLIS_HOME/lib:$LD_LIBRARY_PATH
$export BLAS=BLIS USE_MKLDNN_CBLAS=ON WITH_BLAS=blis
$python setup.py install

CPU only Dockerfile to build PyTorch with AMD BLIS is available at : docker/cpu-blis/Dockerfile
Example command line to build using the Dockerfile:
sudo DOCKER_BUILDKIT=1 docker build . -t docker-image-repo-name
Example command line to run the built docker container:
sudo docker run --name container-name -it docker-image-repo-name

Fixes #{issue number}

Pull Request resolved: https://github.com/pytorch/pytorch/pull/54953

Reviewed By: glaringlee

Differential Revision: D27466799

Pulled By: malfet

fbshipit-source-id: e03bae9561be3a67429df3b1be95a79005c63050
2021-03-31 10:40:25 -07:00
Sam Estep
5bcbbf5373 Lint trailing newlines (#54737)
Summary:
*Context:* https://github.com/pytorch/pytorch/issues/53406 added a lint for trailing whitespace at the ends of lines. However, in order to pass FB-internal lints, that PR also had to normalize the trailing newlines in four of the files it touched. This PR adds an OSS lint to normalize trailing newlines.

The changes to the following files (made in 54847d0adb9be71be4979cead3d9d4c02160e4cd) are the only manually-written parts of this PR:

- `.github/workflows/lint.yml`
- `mypy-strict.ini`
- `tools/README.md`
- `tools/test/test_trailing_newlines.py`
- `tools/trailing_newlines.py`

I would have liked to make this just a shell one-liner like the other three similar lints, but nothing I could find quite fit the bill. Specifically, all the answers I tried from the following Stack Overflow questions were far too slow (at least a minute and a half to run on this entire repository):

- [How to detect file ends in newline?](https://stackoverflow.com/q/38746)
- [How do I find files that do not end with a newline/linefeed?](https://stackoverflow.com/q/4631068)
- [How to list all files in the Git index without newline at end of file](https://stackoverflow.com/q/27624800)
- [Linux - check if there is an empty line at the end of a file [duplicate]](https://stackoverflow.com/q/34943632)
- [git ensure newline at end of each file](https://stackoverflow.com/q/57770972)

To avoid giving false positives during the few days after this PR is merged, we should probably only merge it after https://github.com/pytorch/pytorch/issues/54967.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/54737

Test Plan:
Running the shell script from the "Ensure correct trailing newlines" step in the `quick-checks` job of `.github/workflows/lint.yml` should print no output and exit in a fraction of a second with a status of 0. That was not the case prior to this PR, as shown by this failing GHA workflow run on an earlier draft of this PR:

- https://github.com/pytorch/pytorch/runs/2197446987?check_suite_focus=true

In contrast, this run (after correcting the trailing newlines in this PR) succeeded:

- https://github.com/pytorch/pytorch/pull/54737/checks?check_run_id=2197553241

To unit-test `tools/trailing_newlines.py` itself (this is run as part of our "Test tools" GitHub Actions workflow):
```
python tools/test/test_trailing_newlines.py
```

Reviewed By: malfet

Differential Revision: D27409736

Pulled By: samestep

fbshipit-source-id: 46f565227046b39f68349bbd5633105b2d2e9b19
2021-03-30 13:09:52 -07:00
Chester Liu
6a4d2c61d5 Allow linking against vcomp on Windows (#54132)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/54054

Pull Request resolved: https://github.com/pytorch/pytorch/pull/54132

Reviewed By: zou3519

Differential Revision: D27181524

Pulled By: malfet

fbshipit-source-id: b79b34afb7edcc594d9b5907c5a7505b9cc5683b
2021-03-19 14:36:07 -07:00
Sam Estep
8c798e0622 Forbid trailing whitespace (#53406)
Summary:
Context: https://github.com/pytorch/pytorch/pull/53299#discussion_r587882857

These are the only hand-written parts of this diff:
- the addition to `.github/workflows/lint.yml`
- the file endings changed in these four files (to appease FB-internal land-blocking lints):
  - `GLOSSARY.md`
  - `aten/src/ATen/core/op_registration/README.md`
  - `scripts/README.md`
  - `torch/csrc/jit/codegen/fuser/README.md`

The rest was generated by running this command (on macOS):
```
git grep -I -l ' $' -- . ':(exclude)**/contrib/**' ':(exclude)third_party' | xargs gsed -i 's/ *$//'
```

I looked over the auto-generated changes and didn't see anything that looked problematic.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/53406

Test Plan:
This run (after adding the lint but before removing existing trailing spaces) failed:
- https://github.com/pytorch/pytorch/runs/2043032377

This run (on the tip of this PR) succeeded:
- https://github.com/pytorch/pytorch/runs/2043296348

Reviewed By: walterddr, seemethere

Differential Revision: D26856620

Pulled By: samestep

fbshipit-source-id: 3f0de7f7c2e4b0f1c089eac9b5085a58dd7e0d97
2021-03-05 17:22:55 -08:00
Nikita Shulga
b3c4ac6319 Fix OpenBLAS discovery (#53168)
Summary:
Fix accidental regression introduced by https://github.com/pytorch/pytorch/issues/47940

`FIND_PACKAGE(OpenBLAS)` does not validate that discovered library can actually be used, while `check_fortran_libraries` does that

Pull Request resolved: https://github.com/pytorch/pytorch/pull/53168

Test Plan: Build PyTorch with static OpenBLAS and check that `torch.svd(torch.ones(3, 3)).S` do not raise an exception

Reviewed By: walterddr

Differential Revision: D26772345

Pulled By: malfet

fbshipit-source-id: 3e4675c176b30dfe4f0490d7d3dfe4f9a4037134
2021-03-03 08:23:02 -08:00
peter
8870c391e9 Update mkl to 2020.2.254 (#52964)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/52907

Pull Request resolved: https://github.com/pytorch/pytorch/pull/52964

Reviewed By: H-Huang

Differential Revision: D26726464

Pulled By: seemethere

fbshipit-source-id: 8f3067292e6416e299b4b040c8fb73510134f02e
2021-03-01 11:13:57 -08:00
David Kyle
dbeda994db Update FindvecLib.cmake for macOS 10.14, 10.15 and Big Sur (#51288)
Summary:
When compiling libtorch on macOS there is the option to use the `vecLib` BLAS library from Apple's (Accelerate)[https://developer.apple.com/documentation/accelerate] framework. Recent versions of macOS have changed the location of veclib.h, this change adds the new locations to `FindvecLib.cmake`

To test run the following command:
```
BLAS=vecLib python setup.py install --cmake --cmake-only
```

The choice of BLAS library is confirmed in the output:
```
-- Trying to find preferred BLAS backend of choice: vecLib
-- Found vecLib: /Library/Developer/CommandLineTools/SDKs/MacOSX10.15.sdk/System/Library/Frameworks/Accelerate.framework/Versions/Current/Frameworks/vecLib.framework/Versions/Current/Headers
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51288

Reviewed By: jbschlosser

Differential Revision: D26531136

Pulled By: malfet

fbshipit-source-id: ce86807ccbf66973f33b3acb99b7f40cfd182b9b
2021-02-19 08:04:10 -08:00
Nathan John Sircombe
664126bab5 Enables build with oneDNN (MKL-DNN) on AArch64 (#50400)
Summary:
Since version 1.6, oneDNN has provided limited support for AArch64 builds.

This minor change is to detect an AArch64 CPU and permit the use of
`USE_MKLDNN` in that case.

Build flags for oneDNN are also modified accordingly.

Note: oneDNN on AArch64, by default, will use oneDNN's reference C++ kernels.
These are not optimised for AArch64, but oneDNN v1.7 onwards provides support
for a limited set of primitives based Arm Compute Library.
See: https://github.com/oneapi-src/oneDNN/pull/795
and: https://github.com/oneapi-src/oneDNN/pull/820
for more details. Support for ACL-based oneDNN primitives in PyTorch
will require some further modification,

Fixes #{issue number}

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50400

Reviewed By: izdeby

Differential Revision: D25886589

Pulled By: malfet

fbshipit-source-id: 2c81277a28ad4528c2d2211381e7c6692d952bc1
2021-01-13 08:41:44 -08:00
Nikita Shulga
7b4a7661d6 Make PyTorch partially cross-compilable for Apple M1 (#49701)
Summary:
Update CPUINFO to include https://github.com/pytorch/cpuinfo/pull/51
Update sleef to include https://github.com/shibatch/sleef/pull/376
Modify aten/src/ATen/native/quantized/cpu/qnnpack/CMakeLists.txt to recognize CMAKE_OSX_ARCHITECTURES

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49701

Test Plan: `cmake -DCMAKE_OSX_ARCHITECTURES=x86_64 -DPYTHON_EXECUTABLE=/usr/bin/python3  -DUSE_XNNPACK=NO -DBUILD_TEST=YES .. -G Ninja; ninja basic` finishes successfully on Apple M1

Reviewed By: janeyx99

Differential Revision: D25669219

Pulled By: malfet

fbshipit-source-id: 5ee36b64e3a7ac76448f2a300ac4993375a26de5
2020-12-22 09:33:12 -08:00
Abdelrauf
95a1725a4a Vsx initial support issue27678 (#41541)
Summary:
### Pytorch Vec256 ppc64le support
implemented types:

- double
- float
- int16
- int32
- int64
- qint32
- qint8
- quint8
- complex_float
- complex_double

Notes:
All basic vector operations are implemented:
There are a few problems:
- minimum maximum nan propagation for ppc64le is missing and was not checked
- complex multiplication, division, sqrt, abs are implemented as PyTorch x86. they can overflow and have precision problems than std ones.  That's why they were either excluded or tested in smaller domain range
- precisions of the implemented float math functions

~~Besides, I added CPU_CAPABILITY for power. but as because of  quantization errors for DEFAULT I had to undef and  use vsx for DEFAULT too~~

#### Details
##### Supported math functions

+ plus sign means vectorized, -  minus sign means missing,   (implementation notes are added inside braces)
(notes). Example: -(both ) means it was also missing on x86 side
g( func_name)  means vectorization is using func_name
sleef - redirected to the Sleef
unsupported

function_name | float | double | complex float | complex double
|-- | -- | -- | -- | --|
acos | sleef | sleef | f(asin) | f(asin)
asin | sleef | sleef | +(pytorch impl) | +(pytorch impl)
atan | sleef | sleef | f(log) | f(log)
atan2 | sleef | sleef | unsupported | unsupported
cos | +((ppc64le:avx_mathfun) ) | sleef | -(both) | -(both)
cosh | f(exp)   | -(both) | -(both) |
erf | sleef | sleef | unsupported | unsupported
erfc | sleef | sleef | unsupported | unsupported
erfinv | - (both) | - (both) | unsupported | unsupported
exp | + | sleef | - (x86:f()) | - (x86:f())
expm1 | f(exp)  | sleef | unsupported | unsupported
lgamma | sleef | sleef |   |
log | +  | sleef | -(both) | -(both)
log10 | f(log)  | sleef | f(log) | f(log)
log1p | f(log)  | sleef | unsupported | unsupported
log2 | f(log)  | sleef | f(log) | f(log)
pow | + f(exp)  | sleef | -(both) | -(both)
sin | +((ppc64le:avx_mathfun) ) | sleef | -(both) | -(both)
sinh | f(exp)  | sleef | -(both) | -(both)
tan | sleef | sleef | -(both) | -(both)
tanh | f(exp)  | sleef | -(both) | -(both)
hypot | sleef | sleef | -(both) | -(both)
nextafter | sleef  | sleef | -(both) | -(both)
fmod | sleef | sleef | -(both) | -(both)

[Vec256 Test cases Pr https://github.com/pytorch/pytorch/issues/42685](https://github.com/pytorch/pytorch/pull/42685)
Current list:

- [x] Blends
- [x] Memory: UnAlignedLoadStore
- [x] Arithmetics: Plus,Minu,Multiplication,Division
- [x] Bitwise: BitAnd, BitOr, BitXor
- [x] Comparison: Equal, NotEqual, Greater, Less, GreaterEqual, LessEqual
- [x] MinMax: Minimum, Maximum, ClampMin, ClampMax, Clamp
- [x] SignManipulation: Absolute, Negate
- [x] Interleave: Interleave, DeInterleave
- [x] Rounding: Round, Ceil, Floor, Trunc
- [x] Mask: ZeroMask
- [x] SqrtAndReciprocal: Sqrt, RSqrt, Reciprocal
- [x] Trigonometric: Sin, Cos, Tan
- [x] Hyperbolic: Tanh, Sinh, Cosh
- [x] InverseTrigonometric: Asin, ACos, ATan, ATan2
- [x] Logarithm: Log, Log2, Log10, Log1p
- [x] Exponents: Exp, Expm1
- [x] ErrorFunctions: Erf, Erfc, Erfinv
- [x] Pow: Pow
- [x] LGamma: LGamma
- [x] Quantization: quantize, dequantize, requantize_from_int
- [x] Quantization: widening_subtract, relu, relu6
Missing:
- [ ] Constructors, initializations
- [ ] Conversion , Cast
- [ ] Additional: imag, conj, angle (note: imag and conj only checked for float complex)

#### Notes on tests and testing framework
- some math functions are tested within domain range
- mostly testing framework randomly tests against std implementation within the domain or within the implementation domain for some math functions.
- some functions are tested against the local version. ~~For example, std::round and vector version of round differs. so it was tested against the local version~~
- round was tested against pytorch at::native::round_impl. ~~for double type on **Vsx  vec_round failed  for  (even)+0 .5 values**~~ . it was solved by using vec_rint
- ~~**complex types are not tested**~~  **After enabling complex testing due to precision and domain some of the complex functions failed for vsx and x86 avx as well. I will either test it against local implementation or check within the accepted domain**
- ~~quantizations are not tested~~  Added tests for quantizing, dequantize, requantize_from_int, relu, relu6, widening_subtract functions
- the testing framework should be improved further
- ~~For now `-DBUILD_MOBILE_TEST=ON `will be used for Vec256Test too~~
Vec256 Test cases will be built for each CPU_CAPABILITY

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41541

Reviewed By: zhangguanheng66

Differential Revision: D23922049

Pulled By: VitalyFedyunin

fbshipit-source-id: bca25110afccecbb362cea57c705f3ce02f26098
2020-12-10 13:42:39 -08:00
Nikita Shulga
c29f51642e Modify NEON check for ARM64 on OS X (#48982)
Summary:
Use CMAKE_SYSTEM_PROCESSOR rather than run sysctl

Fixes https://github.com/pytorch/pytorch/issues/48874

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48982

Reviewed By: walterddr

Differential Revision: D25385883

Pulled By: malfet

fbshipit-source-id: 47b6dc5be8d75f6d4a66a11c564abdfe31ac90b4
2020-12-08 07:58:22 -08:00
Rong Rong
af520d9d04 [cmake] clean up blas discovery (#47940)
Summary:
remove useless variable changes in blas discovery

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47940

Reviewed By: malfet

Differential Revision: D25122228

Pulled By: walterddr

fbshipit-source-id: 12bc3ce9e4f89a72b6a92c10d14024e5941f4b96
2020-11-30 10:29:50 -08:00
Nikita Shulga
e7ca62be08 Fix PyTorch compilation on Apple M1 (#48275)
Summary:
Update cpuinfo and sleef to contain build fixes for M1

Fixes https://github.com/pytorch/pytorch/issues/48145

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48275

Reviewed By: walterddr

Differential Revision: D25135153

Pulled By: malfet

fbshipit-source-id: 2a82e14407d6f40c7dacd11109a8499d808c8ec1
2020-11-26 07:08:33 -08:00
Nikita Shulga
83d358da7c Fix LAPACK functionality detection from static OpenBLAS (#46710)
Summary:
BLAS `sgemm_` only depends on pthreads, but LAPACK `cheev_` also depends on libm

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46710

Reviewed By: walterddr

Differential Revision: D24476082

Pulled By: malfet

fbshipit-source-id: e0b91116f18bbcdabb1f99c2ec9d98283df4393f
2020-10-26 08:34:28 -07:00
pinzhenx
e1f74b1813 Fix mkldnn build on legacy x64 arch (#46082)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/45838

`ARCH_OPT_FLAGS` was the old name of `MKLDNN_ARCH_OPT_FLAGS`, which has been renamed in [this commit](2a011ff02e (diff-a0abcbf647ed740b80615fb5b1614a44L97)), but not updated in pytorch.

As its default value will be set to sse4.1, some kernels are going to fail on the legacy arch that does not support SSE4.1. This patch was to make this flag effective.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46082

Reviewed By: glaringlee

Differential Revision: D24252149

Pulled By: agolynski

fbshipit-source-id: 7079deed373d664763c5888feb28795e5235caa8
2020-10-12 08:45:06 -07:00
suffian khan
92b95e5243 Fix NCCL version check when nccl.h in non-standard location. (#40982)
Summary:
The NCCL discovery process fails to compile detect_nccl_version.cc when nccl.h resides in a non-standard location.
Pass __NCCL_INCLUDE_DIRS__ to _try_run(... detect_nccl_version.cc)_ to fix this.

Can reproduce with Dockerfile ..
```Dockerfile
FROM nvidia/cuda:10.2-cudnn7-devel-ubuntu18.04 as build
WORKDIR /stage

# install conda
ARG CONDA_VERSION=4.7.10
ARG CONDA_URL=https://repo.anaconda.com/miniconda/Miniconda3-${CONDA_VERSION}-Linux-x86_64.sh
RUN cd /stage && curl -fSsL --insecure ${CONDA_URL} -o install-conda.sh &&\
    /bin/bash ./install-conda.sh -b -p /opt/conda &&\
    /opt/conda/bin/conda clean -ya
ENV PATH=/opt/conda/bin:${PATH}

# install prerequisites
RUN conda install numpy ninja pyyaml mkl mkl-include setuptools cmake cffi

# attempt compile
ENV CUDA_HOME="/usr/local/cuda" \
    CUDNN_LIBRARY="/usr/lib/x86_64-linux-gnu" \
    NCCL_INCLUDE_DIR="/usr/local/cuda/include" \
    NCCL_LIB_DIR="/usr/local/cuda/lib64" \
    USE_SYSTEM_NCCL=1
RUN apt-get -y update &&\
    apt-get -y install git &&\
    cd /stage && git clone https://github.com/pytorch/pytorch.git &&\
    cd pytorch &&\
    git submodule update --init --recursive &&\
    python setup.py bdist_wheel
```

This generates the following error ..
```
-- Found NCCL: /usr/local/cuda/include
-- Determining NCCL version from /usr/local/cuda/include/nccl.h...
-- Looking for NCCL_VERSION_CODE
-- Looking for NCCL_VERSION_CODE - found
CMake Error at cmake/Modules/FindNCCL.cmake:78 (message):
  Found NCCL header version and library version do not match! (include:
  /usr/local/cuda/include, library: /usr/local/cuda/lib64/libnccl.so) Please
  set NCCL_INCLUDE_DIR and NCCL_LIB_DIR manually.
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/40982

Reviewed By: zou3519

Differential Revision: D22603911

Pulled By: malfet

fbshipit-source-id: 084870375a270fb9c7daf3c2e731992a03614ad6
2020-07-17 13:54:17 -07:00
Zhang, Xiaobing
63e5a53b8c DNNL: fix build error when DNNL using TBB threading pool (#40699)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/40699

Differential Revision: D22286334

Pulled By: albanD

fbshipit-source-id: 0635a0a5e4bf80d44d90c86945d92e98e26ef480
2020-06-29 13:53:18 -07:00
pinzhenx
bd604cb5b7 Upgrade MKL-DNN to DNNL v1.2 (#32422)
Summary:
## Motivation

This PR upgrades MKL-DNN from v0.20 to DNNL v1.2 and resolves https://github.com/pytorch/pytorch/issues/30300.

DNNL (Deep Neural Network Library) is the new brand of MKL-DNN, which improves performance, quality, and usability over the old version.

This PR focuses on the migration of all existing functionalities, including minor fixes, performance improvement and code clean up. It serves as the cornerstone of our future efforts to accommodate new features like OpenCL support, BF16 training, INT8 inference, etc. and to let the Pytorch community derive more benefits from the Intel Architecture.

<br>

## What's included?

Even DNNL has many breaking changes to the API, we managed to absorb most of them in ideep. This PR contains minimalist changes to the integration code in pytorch. Below is a summary of the changes:

<br>

**General:**

1. Replace op-level allocator with global-registered allocator

```
// before
ideep::sum::compute<AllocForMKLDNN>(scales, {x, y}, z);

// after
ideep::sum::compute(scales, {x, y}, z);
```

The allocator is now being registeted at `aten/src/ATen/native/mkldnn/IDeepRegistration.cpp`. Thereafter all tensors derived from the `cpu_engine` (by default) will use the c10 allocator.

```
RegisterEngineAllocator cpu_alloc(
  ideep::engine::cpu_engine(),
  [](size_t size) {
    return c10::GetAllocator(c10::DeviceType::CPU)->raw_allocate(size);
  },
  [](void* p) {
    c10::GetAllocator(c10::DeviceType::CPU)->raw_deallocate(p);
  }
);
```
------

2. Simplify group convolution

We had such a scenario in convolution where ideep tensor shape mismatched aten tensor: when `groups > 1`, DNNL expects weights tensors to be 5-d with an extra group dimension, e.g. `goihw` instead of `oihw` in 2d conv case.

As shown below, a lot of extra checks came with this difference in shape before. Now we've completely hidden this difference in ideep and all tensors are going to align with pytorch's definition. So we could safely remove these checks from both aten and c2 integration code.

```
// aten/src/ATen/native/mkldnn/Conv.cpp

if (w.ndims() == x.ndims() + 1) {
  AT_ASSERTM(
      groups > 1,
      "Only group _mkldnn_conv2d weights could have been reordered to 5d");
  kernel_size[0] = w.get_dim(0) * w.get_dim(1);
  std::copy_n(
      w.get_dims().cbegin() + 2, x.ndims() - 1, kernel_size.begin() + 1);
} else {
  std::copy_n(w.get_dims().cbegin(), x.ndims(), kernel_size.begin());
}
```

------

3. Enable DNNL built-in cache

Previously, we stored DNNL jitted kernels along with intermediate buffers inside ideep using an LRU cache. Now we are switching to the newly added DNNL built-in cache, and **no longer** caching buffers in order to reduce memory footprint.

This change will be mainly reflected in lower memory usage from memory profiling results. On the code side, we removed couple of lines of `op_key_` that depended on the ideep cache before.

------

4. Use 64-bit integer to denote dimensions

We changed the type of `ideep::dims` from `vector<int32_t>` to `vector<int64_t>`. This renders ideep dims no longer compatible with 32-bit dims used by caffe2. So we use something like `{stride_.begin(), stride_.end()}` to cast parameter `stride_` into a int64 vector.

<br>

**Misc changes in each commit:**

**Commit:** change build options

Some build options were slightly changed, mainly to avoid name collisions with other projects that include DNNL as a subproject. In addition, DNNL built-in cache is enabled by option `DNNL_ENABLE_PRIMITIVE_CACHE`.

Old | New
-- | --
WITH_EXAMPLE | MKLDNN_BUILD_EXAMPLES
WITH_TEST | MKLDNN_BUILD_TESTS
MKLDNN_THREADING | MKLDNN_CPU_RUNTIME
MKLDNN_USE_MKL | N/A (not use MKL anymore)

------

**Commit:** aten reintegration

- aten/src/ATen/native/mkldnn/BinaryOps.cpp

    Implement binary ops using new operation `binary` provided by DNNL

- aten/src/ATen/native/mkldnn/Conv.cpp

    Clean up group convolution checks
    Simplify conv backward integration

- aten/src/ATen/native/mkldnn/MKLDNNConversions.cpp

    Simplify prepacking convolution weights

- test/test_mkldnn.py

    Fixed an issue in conv2d unit test: it didn't check conv results between mkldnn and aten implementation before. Instead, it compared the mkldnn with mkldnn as the default cpu path will also go into mkldnn. Now we use `torch.backends.mkldnn.flags` to fix this issue

- torch/utils/mkldnn.py

    Prepack weight tensor on module `__init__` to achieve better performance significantly

------

**Commit:** caffe2 reintegration

- caffe2/ideep/ideep_utils.h

    Clean up unused type definitions

- caffe2/ideep/operators/adam_op.cc & caffe2/ideep/operators/momentum_sgd_op.cc

   Unify tensor initialization with `ideep::tensor::init`. Obsolete `ideep::tensor::reinit`

- caffe2/ideep/operators/conv_op.cc & caffe2/ideep/operators/quantization/int8_conv_op.cc

    Clean up group convolution checks
    Revamp convolution API

- caffe2/ideep/operators/conv_transpose_op.cc

    Clean up group convolution checks
    Clean up deconv workaround code

------

**Commit:** custom allocator

- Register c10 allocator as mentioned above

<br><br>

## Performance

We tested inference on some common models based on user scenarios, and most performance numbers are either better than or on par with DNNL 0.20.

ratio: new / old | Latency (batch=1 4T) | Throughput (batch=64 56T)
-- | -- | --
pytorch resnet18 | 121.4% | 99.7%
pytorch resnet50 | 123.1% | 106.9%
pytorch resnext101_32x8d | 116.3% | 100.1%
pytorch resnext50_32x4d | 141.9% | 104.4%
pytorch mobilenet_v2 | 163.0% | 105.8%
caffe2 alexnet | 303.0% | 99.2%
caffe2 googlenet-v3 | 101.1% | 99.2%
caffe2 inception-v1 | 102.2% | 101.7%
caffe2 mobilenet-v1 | 356.1% | 253.7%
caffe2 resnet101 | 100.4% | 99.8%
caffe2 resnet152 | 99.8% | 99.8%
caffe2 shufflenet | 141.1% | 69.0% †
caffe2 squeezenet | 98.5% | 99.2%
caffe2 vgg16 | 136.8% | 100.6%
caffe2 googlenet-v3 int8 | 100.0% | 100.7%
caffe2 mobilenet-v1 int8 | 779.2% | 943.0%
caffe2 resnet50 int8 | 99.5% | 95.5%

_Configuration:
Platform: Skylake 8180
Latency Test: 4 threads, warmup 30, iteration 500, batch size 1
Throughput Test: 56 threads, warmup 30, iteration 200, batch size 64_

† Shufflenet is one of the few models that require temp buffers during inference. The performance degradation is an expected issue since we no longer cache any buffer in the ideep. As for the solution, we suggest users opt for caching allocator like **jemalloc** as a drop-in replacement for system allocator in such heavy workloads.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32422

Test Plan:
Perf results: https://our.intern.facebook.com/intern/fblearner/details/177790608?tab=Experiment%20Results

10% improvement for ResNext with avx512, neutral on avx2

More results: https://fb.quip.com/ob10AL0bCDXW#NNNACAUoHJP

Reviewed By: yinghai

Differential Revision: D20381325

Pulled By: dzhulgakov

fbshipit-source-id: 803b906fd89ed8b723c5fcab55039efe3e4bcb77
2020-03-26 22:07:59 -07:00
cyy
5be8a4e027 find mkl installed by nuget (#34031)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/34031

Differential Revision: D20221807

Pulled By: ezyang

fbshipit-source-id: 827e2775956f408febb287676bbf9a96a70fe2d4
2020-03-03 07:44:20 -08:00
t-kuha
acea368095 Fix compilation error when buildng with FFMPEG (#27589)
Summary:
When building with FFMPEG, I encountered compilation error due to missing include/library.
I also find the change in video_input_op.h will improve build on Windows.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27589

Differential Revision: D19700351

Pulled By: ezyang

fbshipit-source-id: feff25daa43bd2234d5e75c66b9865b672a8fb51
2020-02-13 11:23:48 -08:00
peter
d3fa68eeec Fix for MKL detection script on Windows (#32970)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/32914.
1. Use `DEFINED ENV{MKLProductDir}` instead of `$ENV{MKLProductDir}`
2. Cache `INTEL_COMPILER_DIR` and `INTEL_MKL_DIR`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32970

Differential Revision: D19727677

Pulled By: soumith

fbshipit-source-id: 065c6bee35a2295f1c478df1460cad7668b25af5
2020-02-04 12:41:39 -08:00
caozhong
29e6f13cd1 Enable MKL on MacOS if installed (#32905)
Summary:
Fix cmake script that missed MKL directories

Signed-off-by: caozhong <zhong.z.cao@intel.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32905

Differential Revision: D19688496

Pulled By: ezyang

fbshipit-source-id: d04a608eea5f983e153a48b0b1eb0390aebbe6c0
2020-02-02 14:57:43 -08:00
xiaobing.zhang
19bb496a0d Enable mkldnn on windows (#31355)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/15982.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31355

Differential Revision: D19428979

Pulled By: ezyang

fbshipit-source-id: bee304c5913e70e8dead3098e9796051861cd666
2020-01-27 09:00:02 -08:00
Brian Wignall
f326045b37 Fix typos, via a Levenshtein-type corrector (#31523)
Summary:
Should be non-semantic.

Uses https://en.wikipedia.org/wiki/Wikipedia:Lists_of_common_misspellings/For_machines to find likely typos, with https://github.com/bwignall/typochecker to help automate the checking.

Uses an updated version of the tool used in https://github.com/pytorch/pytorch/pull/30606 .
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31523

Differential Revision: D19216749

Pulled By: mrshenli

fbshipit-source-id: 7fd489cb9a77cd7e4950c1046f925d57524960ea
2020-01-17 16:03:19 -08:00
Edward Yang
a9dae70bae Remove LibIRC logic from cmake. (#31152)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31152

Per apaszke: I can't find any reasonable references to libIRC online, so
I decided to remove this.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D19262582

Pulled By: ezyang

fbshipit-source-id: a1d47462427a3e0ca469062321d608e0badf8548
2020-01-06 14:39:43 -08:00
Adam J. Stewart
4cf7277d62 Explain how to specify library location for MKL (#28779)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/24334.

I'm still kind of confused why `FindMKL.cmake` was unable to locate my MKL libraries. They are in the standard `/opt/intel/mkl` installation prefix on macOS. But at least with this more detailed error message, it will be easier for people to figure out how to fix the problem.

zhangguanheng66 xkszltl soumith
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28779

Differential Revision: D18170998

Pulled By: soumith

fbshipit-source-id: 47e61baadd84c758267dca566eb1fb8a081de92f
2019-10-28 08:00:54 -07:00
albanD
63df9ffd0b Fix typo in OpenBLAS cmake detection
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/25966

Differential Revision: D17315925

Pulled By: albanD

fbshipit-source-id: 55c6b4a1ddeaf95714034ec66a4d59b0f00ba634
2019-09-11 09:10:42 -07:00
Johannes M Dieterich
26675b507f Enable libflame as a LAPACK choice (#25795)
Summary:
libflame is BLIS's companion LAPACK from the FLAME project

Mimicks my ancient
f5bc78263e
in cmake upstream

BLIS WWW: https://github.com/flame/libflame
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25795

Differential Revision: D17286461

Pulled By: bddppq

fbshipit-source-id: 7cd0d27127c78563574791415e4a34f045df30df
2019-09-10 10:34:55 -07:00
J M Dieterich
748436a514 Enable BLIS from the FLAME project as a BLAS choice. (#23819)
Summary:
BLIS is AMD's official recommendation for BLAS.

Mimicks my ancient
f5bc78263e
in cmake upstream

BLIS WWW: https://github.com/flame/blis
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23819

Differential Revision: D17231360

Pulled By: bddppq

fbshipit-source-id: 68db70d63e410438f99b2bf57986b81ff6b6c5b3
2019-09-06 12:00:25 -07:00
Gu, Jinghui
718feb6d76 upgrade MKL-DNN to v0.20.3 (#22910)
Summary:
1. upgrade MKL-DNN to v0.20.3
2. allow user to change the capability of primitive cache in mkldnn-bridge by environment value LRU_CACHE_CAPACITY
3. support to fill all tensor elements by one scalar
4. fix the link issue if build with private MKLML other than pre-installed MKL
5. add rnn support in mkldnn-bridge
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22910

Differential Revision: D16365998

Pulled By: VitalyFedyunin

fbshipit-source-id: b8d2bb454cbfbcd4b8983b1a8fa3b83e55ad01c3
2019-08-28 07:30:14 -07:00
Hong Xu
388dc4f2a6 Let user be able to change MKLDNN "-m" flags back and forth in subsequent builds (#23608)
Summary:
Currently once user has set `USE_NATIVE_ARCH` to OFF, they will never be able to turn it on for MKLDNN again by simply changing `USE_NATIVE_ARCH`. This commit fixes this issue.

Following up 09ba4df031
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23608

Differential Revision: D16599600

Pulled By: ezyang

fbshipit-source-id: 88bbec1b1504b5deba63e56f78632937d003a1f6
2019-08-01 06:05:36 -07:00
Hong Xu
a8edc2b5d2 Add sanity checks for NCCL detection.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22926

Differential Revision: D16546369

Pulled By: colesbury

fbshipit-source-id: 56f7ef4476e586dee19366fdb720085d1c2f2027
2019-07-29 13:47:05 -07:00
Hong Xu
09ba4df031 Whether MKLDNN should be built under native arch should respect USE_NATIVE_ARCH (#23445)
Summary:
Currently there is no way to build MKLDNN more optimized than sse4. This commit let MKLDNN build respect USE_NATIVE_ARCH.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23445

Differential Revision: D16542275

Pulled By: ezyang

fbshipit-source-id: 550976531d6a52db9128c0e3d4589a33715feee2
2019-07-29 08:13:56 -07:00
Gu, Jinghui
1dd4d55565 Improve FindMKLDNN.cmake to avoid binary compatibility issue in MKL-DNN (#23292)
Summary:
Illegal instruction is encountered in pre-built package in MKL-DNN. https://github.com/pytorch/pytorch/issues/23231
To avoid such binary compatibility issue, the HostOpts option in MKL-DNN is disabled in order to build MKL-DNN for generic arch.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23292

Differential Revision: D16488773

Pulled By: soumith

fbshipit-source-id: 9e13c76fb9cb9338103cb767d7463c10891d294a
2019-07-25 04:42:26 -07:00
Hong Xu
60c46dd4df Let CMake handle NCCL detection instead of our handcrafted Python script. (#22930)
Summary:
 ---

How does the current code subsume all detections in the deleted `nccl.py`?

- The dependency of `USE_NCCL` on the OS and `USE_CUDA` is handled as dependency options in `CMakeLists.txt`.

- The main NCCL detection happens in [FindNCCL.cmake](8377d4b32c/cmake/Modules/FindNCCL.cmake), which is called by [nccl.cmake](8377d4b32c/cmake/External/nccl.cmake). When `USE_SYSTEM_NCCL` is false, the previous Python code defer the detection to `find_package(NCCL)`. The change in `nccl.cmake` retains this.

- `USE_STATIC_NCCL` in the previous Python code simply changes the name of the detected library. This is done in `IF (USE_STATIC_NCCL)`.

- Now we only need to look at how the lines below line 20 in `nccl.cmake` are subsumed. These lines list paths to header and library directories that NCCL headers and libraries may reside in and try to search these directories for the key header and library files in turn. These are done by `find_path` for headers and `find_library` for the library files in `FindNCCL.cmake`.
  * The call of [find_path](https://cmake.org/cmake/help/v3.8/command/find_path.html) (Search for `NO_DEFAULT_PATH` in the link) by default searches for headers in `<prefix>/include` for each `<prefix>` in `CMAKE_PREFIX_PATH` and `CMAKE_SYSTEM_PREFIX_PATH`. Like the Python code, this commit sets `CMAKE_PREFIX_PATH` to search for `<prefix>` in `NCCL_ROOT_DIR` and home to CUDA.  `CMAKE_SYSTEM_PREFIX_PATH` includes the standard directories such as `/usr/local` and `/usr`. `NCCL_INCLUDE_DIR` is also specifically handled.

  * Similarly, the call of [find_library](https://cmake.org/cmake/help/v3.8/command/find_library.html) (Search for `NO_DEFAULT_PATH` in the link) by default searches for libraries in directories including `<prefix>/lib` for each `<prefix>` in `CMAKE_PREFIX_PATH` and `CMAKE_SYSTEM_PREFIX_PATH`. But it also handles the edge cases intended to be solved in the Python code more properly:
     - It only searches for `<prefix>/lib64` (and `<prefix>/lib32`) if it is appropriate on the system.
     - It only searches for `<prefix>/lib/<arch>` for the right `<arch>`, unlike the Python code searches for `lib/<arch>` in a generic way (e.g., the Python code searches for `/usr/lib/x86_64-linux-gnu` but in reality systems have `/usr/lib/x86_64-some-customized-name-linux-gnu`, see https://unix.stackexchange.com/a/226180/38242 ).

 ---

Regarding for relevant issues:

- https://github.com/pytorch/pytorch/issues/12063 and https://github.com/pytorch/pytorch/issues/2877: These are properly handled, as explained in the updated comment.
- https://github.com/pytorch/pytorch/issues/2941 does not changes NCCL detection specifically for Windows (it changed CUDA detection).
- b7e258f81e A versioned library detection is added, but the order is reversed: The unversioned library becomes preferred. This is because normally unversioned libraries are linked to versioned libraries and preferred by users, and local installation by users are often unversioned. Like the document of [find_library](https://cmake.org/cmake/help/v3.8/command/find_library.html) suggests:

> When using this to specify names with and without a version suffix, we recommend specifying the unversioned name first so that locally-built packages can be found before those provided by distributions.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22930

Differential Revision: D16440275

Pulled By: ezyang

fbshipit-source-id: 11fe80743d4fe89b1ed6f96d5d996496e8ec01aa
2019-07-23 08:45:51 -07:00
Edward Yang
798d5d9771 Revert D16281714: Add sanity checks for NCCL detection.
Differential Revision:
D16281714

Original commit changeset: 396bcbf099bd

fbshipit-source-id: a22cc112d1b6a62d689f9d8a7f93e8be3abe2a44
2019-07-16 13:58:27 -07:00
Will Feng
01f03d56ee Revert D16283037: Add sanity checks for NCCL detection.
Differential Revision:
D16283037

Original commit changeset: fc09c9443a56

fbshipit-source-id: 30cdf7b1ad91498ee615d018de5571ba36f4383e
2019-07-16 13:20:43 -07:00
Hong Xu
31497799b9 Add sanity checks for NCCL detection.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22819

Test Plan: Imported from OSS

Differential Revision: D16283037

Pulled By: ezyang

fbshipit-source-id: fc09c9443a568d9af1c78a847282a7d707c49dd6
2019-07-16 11:32:36 -07:00
Hong Xu
e2046f8c1d Add sanity checks for NCCL detection.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22819

Test Plan: Imported from OSS

Differential Revision: D16281714

Pulled By: ezyang

fbshipit-source-id: 396bcbf099bd07b996cf779c6b43092096b52d90
2019-07-16 11:32:32 -07:00
Hui Wu
07ef85e326 Add USE_MKLDNN_CBLAS build option. (#19014)
Summary:
MKL-DNN is the main library for computation when we use ideep device. It can use kernels implemented by different algorithms (including JIT, CBLAS, etc.) for computation. We add the "USE_MKLDNN_CBLAS" (default OFF) build option so that users can decide whether to use CBLAS computation methods or not.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19014

Differential Revision: D16094090

Pulled By: ezyang

fbshipit-source-id: 3f0b1d1a59a327ea0d1456e2752f2edd78d96ccc
2019-07-02 12:29:54 -07:00
Hong Xu
e6d4a2d289 Remove unused file cmake/Modules/FindMIOpen.cmake (#22244)
Summary:
`cmake/public/LoadHIP.cmake` calls `find_package(miopen)`, which uses the CMake module in MIOpen installation (It includes the line `set(miopen_DIR ${MIOPEN_PATH}/lib/cmake/miopen)`). `cmake/Modules/FindMIOpen.cmake` is not used.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22244

Differential Revision: D16000771

Pulled By: bddppq

fbshipit-source-id: 07bb40fdf033521e8427fc351715d47e6e30ed34
2019-06-26 21:21:46 -07:00
Ilia Cherniavskii
6350dbddd1 Fix sequential MKL case (#22062)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22062
ghimport-source-id: a30255d7453c4ffecf40215a785c1e06b7296368

Test Plan:
USE_CUDA=0 PARALLEL_BACKEND=OPENMP BLAS=MKL USE_MKLDNN=1 MKL_SEQ=1
MKLDNN_THREADING=SEQ BUILD_BINARY=1 python setup.py develop --cmake

./build/bin/parallel_info

Imported from OSS

Differential Revision: D15938079

Pulled By: ilia-cher

fbshipit-source-id: e7ef0c5bc75ebb845ebe66bf76a4070d45305b35
2019-06-24 12:56:43 -07:00
bddppq
4940e41d16 Fix mkl-dnn tautological compare error (#21371)
Summary:
```
../third_party/ideep/mkl-dnn/src/cpu/jit_avx512_common_convolution.hpp:144:821: error: self-comparison always evaluates to true [-Werror,-Wtautological-compare]
        virtual pd_t *clone() const override { return new pd_t(*this); } virtual status_t create_primitive(primitive_t **primitive, const primitive_at_t *inputs, const primitive_t **outputs) const override { double ms = get_msec(); primitive_t::input_vector ins(inputs, inputs + this->n_inputs()); primitive_t::outpu
t_vector outs(outputs, outputs + this->n_outputs()); auto ret = safe_ptr_assign<primitive_t>(*primitive, new (jit_avx512_common_convolution_bwd_data_t)(this, ins, outs)); ms = get_msec() - ms; if (mkldnn_verbose()->level >= 2) { printf("mkldnn_verbose,create,%s,%g\n", this->info(), ms); fflush(0); } return ret; } v
irtual const char *name() const override { return (avx512_common == sse42 ? "jit:" "sse42" : (avx512_common == avx ? "jit:" "avx" : (avx512_common == avx2 ? "jit:" "avx2" : (avx512_common == avx512_common ? "jit:" "avx512_common" : (avx512_common == avx512_core ? "jit:" "avx512_core" : (avx512_common == avx512_mic
? "jit:" "avx512_mic" : (avx512_common == avx512_mic_4ops ? "jit:" "avx512_mic_4ops" : "jit:" ""))))))); };
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21371

Differential Revision: D15631392

Pulled By: bddppq

fbshipit-source-id: 3b0008acab8ae53ce61327686bd8367e7fb5d298
2019-06-04 15:27:07 -07:00
Ilia Cherniavskii
580eab6562 Restore TBB module (#20454)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20454
ghimport-source-id: 14aca1dedbe647d41e55e7538a6b7eeab0fc4384

Differential Revision: D15326062

Pulled By: ilia-cher

fbshipit-source-id: 02b005a679b10dc7a264978e87a8d2bb98ab972f
2019-05-28 02:49:36 -07:00
peter
872bab22c6 Some essential changes needed before updating the Windows AMI (#20353)
Summary:
1. Add cuda 10.1 build
2. Turn on openmp loop support for VS 2019
3. Remove legacy code about selective builds

Tested through CI.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20353

Differential Revision: D15294806

Pulled By: ezyang

fbshipit-source-id: 0acf5c3fbbc398fd9ebdf9f97653499d39638432
2019-05-10 09:08:51 -07:00
Ilia Cherniavskii
481b6d0268 Allow a non-OpenMP based build (#19749)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19749
ghimport-source-id: a6636c0acddbdc5fd5b0dcb20b9f80cbdb9159b9

Differential Revision: D15141993

Pulled By: ilia-cher

fbshipit-source-id: 96085608398b2a4c97c68b2948f5184d07f9ad3d
2019-05-06 19:34:48 -07:00
Edward Yang
48a35135fb Convert all tabs to spaces, add CI. (#18959)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18959
ghimport-source-id: a934163fa34cb2019732d5f49dc7290c376bf156

Differential Revision: D14831246

Pulled By: ezyang

fbshipit-source-id: beb92dc4ee8c82f4c8259c081dd72e477fe7a9d0
2019-04-09 08:12:26 -07:00
Balint Cristian
67fdb4abf7 AVX2 with GCC9 fix. (#18991)
Summary:
Dear All,

The proposed patch fixes the test code snippets used in cmake infrastructure, and implicit failure to set properly the ```CAFFE2_COMPILER_SUPPORTS_AVX2_EXTENSIONS``` flag. The libcaffe2.so will have some ```UND``` avx2 related references, rendering it unusable.

* Using GCC 9 test code from cmake build infra always fails:
```
$ gcc  -O2 -g -pipe -Wall -m64 -mtune=generic -fopenmp -DCXX_HAS_AVX_1 -fPIE -o test.o -c test.c -mavx2
test.c: In function ‘main’:
test.c:11:26: error: incompatible type for argument 1 of ‘_mm256_extract_epi64’
   11 |     _mm256_extract_epi64(x, 0); // we rely on this in our AVX2 code
      |                          ^
      |                          |
      |                          __m256 {aka __vector(8) float}
In file included from /usr/lib/gcc/x86_64-redhat-linux/9/include/immintrin.h:51,
                 from test.c:4:
/usr/lib/gcc/x86_64-redhat-linux/9/include/avxintrin.h:550:31: note: expected ‘__m256i’ {aka ‘__vector(4) long long int’} but argument is of type ‘__m256’ {aka ‘__vector(8) float’}
  550 | _mm256_extract_epi64 (__m256i __X, const int __N)
      |

$ gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/libexec/gcc/x86_64-redhat-linux/9/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none
OFFLOAD_TARGET_DEFAULT=1
Target: x86_64-redhat-linux
Configured with: ../configure --enable-bootstrap --enable-languages=c,c++,fortran,objc,obj-c++,ada,go,d,lto --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=http://bugzilla.redhat.com/bugzilla --enable-shared --enable-threads=posix --enable-checking=release --enable-multilib --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-gnu-unique-object --enable-linker-build-id --with-gcc-major-version-only --with-linker-hash-style=gnu --enable-plugin --enable-initfini-array --with-isl --enable-offload-targets=nvptx-none --without-cuda-driver --enable-gnu-indirect-function --enable-cet --with-tune=generic --with-arch_32=i686 --build=x86_64-redhat-linux
Thread model: posix
gcc version 9.0.1 20190328 (Red Hat 9.0.1-0.12) (GCC)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18991

Differential Revision: D14821838

Pulled By: ezyang

fbshipit-source-id: 7eb3a854a1a831f6fda8ed7ad089746230b529d7
2019-04-07 08:27:00 -07:00
Thomas Viehmann
13bc002422 fixes for AVX detection (#17915)
Summary:
Our AVX2 routines use functions such as _mm256_extract_epi64
that do not exist on 32 bit systems even when they have AVX2.
This disables AVX2 when _mm256_extract_epi64 does not exist.

This fixes the "local" part of #17901 (except disabling FBGEMM),
but there also is sleef to be updated and NNPACK to be fixed,
see the bug report for further discussion.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17915

Differential Revision: D14437338

Pulled By: soumith

fbshipit-source-id: d4ef7e0801b5d1222a855a38ec207dd88b4680da
2019-03-13 03:55:06 -07:00
JerryShih
73db487a8e Update the cmake build configuration for AppleClang compiler (#15820)
Summary:
This pr try to merge the https://github.com/pytorch/pytorch/pull/11563 again and fix the linking error in https://github.com/pytorch/pytorch/pull/14837.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15820

Differential Revision: D13942024

Pulled By: ezyang

fbshipit-source-id: dc6d1e9c4b0f177914f3745665244272a03ce33c
2019-02-04 08:53:47 -08:00
SsnL
13422fca32 Add torch.backends.openmp.is_available(); fix some cmake messages (#16425)
Summary:
1. add `torch.backends.openmp.is_available()`
2. Improve various `cmake` outputs
3. Fix LDFLAGS not respected by `caffe2_pybind11_state_*` targets
4. Fix `MKL` warning message, and QUIET flag.
5. Fix various typos
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16425

Differential Revision: D13903395

Pulled By: soumith

fbshipit-source-id: d15c5d46f53e1ff1c27fca2887b9d23d0bd85b4d
2019-01-31 16:15:46 -08:00
rtarquini
879bccb1af Support for Jetson Xavier (#15660)
Summary:
The request changes are to support building Pytorch 1.0 on the Jetson Xavier with Openblas.  Jetson Xavier with Jetpack 3.3 has generic lapack installed. To pick up the CUDA accelerated BLAS/Lapack, I had to build Openblas and build/link pytorch from source. Otherwise, I got a runtime error indicating lapack routines were not cuda enabled.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15660

Differential Revision: D13571324

Pulled By: soumith

fbshipit-source-id: 9b148d081d6e7fa7e1824dfdd93283c67f69e683
2019-01-02 18:51:42 -08:00
Gu, Jinghui
12e0ed55b4 Upgrade MKL-DNN to version 0.17 and static build MKL-DNN (#15504)
Summary:
Upgrade MKl-DNN to 0.17 and static build MKL-DNN to fix the potentail build error due to old mkldnn version in host system.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15504

Differential Revision: D13547885

Pulled By: soumith

fbshipit-source-id: 46f790a3d9289c1e153e51c62be17c5206ea8f9a
2018-12-25 22:56:51 -08:00
Edward Yang
54d8ce94ee Revert D13383102: [pytorch][PR] Upgrade MKL-DNN to version 0.17
Differential Revision:
D13383102

Original commit changeset: c434f0e0ddff

fbshipit-source-id: 690f46ca0710954fa591a5ea77535e9759db4de5
2018-12-18 07:39:20 -08:00
Gu, Jinghui
4b97a46421 Disable strict-overflow flag to avoid compilation error (#14977)
Summary:
Disable strict-overflow flag to avoid compilation error
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14977

Differential Revision: D13447577

Pulled By: soumith

fbshipit-source-id: 1957bd5aa3c7b79219da3dd53560464977c89526
2018-12-12 22:41:33 -08:00
Gu, Jinghui
70598740ec Upgrade MKL-DNN to version 0.17 (#14308)
Summary:
upgrade MKL-DNN to version 0.17
update mkldnn bridge to latest.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14308

Differential Revision: D13383102

Pulled By: yinghai

fbshipit-source-id: c434f0e0ddff2ee2c86db2d6c44a37298fd005a3
2018-12-07 16:44:50 -08:00
Gu, Jinghui
6aee5488b5 correct omp dependency for mkl-dnn (#13449)
Summary:
The motivational of this PR is to enforce mkldnn to use the same omp version of caffe2 framework.
Meanwhile, do not change other assumptions within mkldnn.

Previously, the MKL_cmake_included is set in caffe2 in order to disable omp seeking in mkldnn.
But, with such change, mkldnn has no chance to adapt for mkl found by caffe2.
Then, some building flags of mkl will be not set in mkldnn.
For example, USE_MKL, USE_CBLAS, etc.

In this PR, we enforce set the MKLIOMP5LIB for mkldnn according to caffe2, and tell the mkl root path in MKLROOT for mkldnn. Then, mkldnn is built as expected.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13449

Differential Revision: D12899504

Pulled By: yinghai

fbshipit-source-id: 22a196bd00b4ef0a11d350a32c049304613edf52
2018-11-06 10:48:09 -08:00
Gu, Jinghui
dbab9b73b6 seperate mkl, mklml, and mkldnn (#12170)
Summary:
1. Remove avx2 support in mkldnn
2. Seperate mkl, mklml, and mkldnn
3. Fix convfusion test case
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12170

Reviewed By: yinghai

Differential Revision: D10207126

Pulled By: orionr

fbshipit-source-id: 1e62eb47943f426a89d57e2d2606439f2b04fd51
2018-10-29 10:52:55 -07:00
Christian Puhrsch
f564163951 Remove SSE-only code and convolve5x5 (#12109)
Summary:
Performance oriented code will use AVX/AVX2, so we don't need SSE specific code anymore. This will also reduce the probability of running into an error on legacy CPUs.

On top of this convolve is covered by modern libraries such as MKLDNN, which are much more performant and which we now build against by default (even for builds from source).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12109

Differential Revision: D10055134

Pulled By: colesbury

fbshipit-source-id: 789b8a34d5936d9c144bcde410c30f7eb1c776fa
2018-10-09 10:53:50 -07:00
Gu, Jinghui
c064f8a89d Fix build error mkldnn due to corruptted CMAKE_REQUIRED_LIBRARIES (#12195)
Summary:
This is to fix cmake-time compilation error.

 When we change script to build Caffe2 with mkldnn, we run into some cmake-time compilation support check (like in libsleef) failed due to incorrect setting of CMAKE_REQUIRED_LIBRARIES.  It is a global setting which can interfere camke compilation if it is not clean up properly.  FindBLAS.cmake and FindLAPACK.cmake didn't clean this flag, and causes incorrect building of libsleef.so.

yinghai gujinghui
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12195

Differential Revision: D10159314

Pulled By: yinghai

fbshipit-source-id: 04908738f7d005579605b9c2a58d54f035d3baf4
2018-10-04 11:56:06 -07:00
Yinghai Lu
658386a63f Make USE_IDEEP work again (#12026)
Summary:
This PR establish a baseline so that we can build IDEEP ops in the new work flow. From this baseline, we need to
- Merge the CMakefile of MKLDNN from caffe2 and Pytorch
- Get rid of `USE_MKL=ON`.

Build command from now on:
```
EXTRA_CAFFE2_CMAKE_FLAGS="-DUSE_MKL=ON -DINTEL_COMPILER_DIR=/opt/IntelComposerXE/2017.0.098"  python setup.py build_deps
```

gujinghui
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12026

Differential Revision: D10041199

Pulled By: yinghai

fbshipit-source-id: b7310bd84a494ac899d8e25da368b63feed4eeaf
2018-09-25 16:56:29 -07:00
Soumith Chintala
77af40c025 prioritize Accelerate over OpenBLAS (#11812)
Summary:
might fix some binary build issues
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11812

Reviewed By: ezyang

Differential Revision: D9927309

Pulled By: soumith

fbshipit-source-id: 9ed6c2c6fedc2a1cffbf52bc0a795135d4239800
2018-09-18 21:56:57 -07:00
Anders Papitto
a853a74217 defer resolution of mkl to a cmake wrapper library (#11298)
Summary:
this is a fix that's needed for building extensions with a
pre-packaged pytorch. Consider the scenario where

(1) pytorch is compiled and packaged on machine A
(2) the package is downloaded and installed on machine B
(3) an extension is compiled on machine B, using the downloaded package

Before this patch, stage (1) would embed absolute paths to the system
installation of mkl into the generated Caffe2Config.cmake, leading to
failures in stage (3) if mkl was not at the same location on B as on
A. After this patch, only a reference to the wrapper library is
embedded, which is re-resolved on machine B.

We are already using a similar approach for cuda.

Testing: built a package on jenkins, downloaded locally and compiled an extension. Works with this patch, fails without.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11298

Differential Revision: D9683150

Pulled By: anderspapitto

fbshipit-source-id: 06a80c3cd2966860ce04f76143b358de15f94aa4
2018-09-06 09:10:39 -07:00
Johannes M Dieterich
a4c59a9dab MIOpen integration, more tests enabled, bug fixes (#10612)
Summary:
* first integration of MIOpen for batch norm and conv on ROCm
* workaround a ROCm compiler bug exposed by elementwise_kernel through explicit capture of variables in the densest packing
* workaround a ROCm compiler bug exposed by having `extern "C" __host__` as a definition and just `__host__` in the implementation through the hipify script
* use fabs() in accordance with C++11 for double absolute, not ::abs() which is integer-only on ROCm
* enable test_sparse set on CI, skip tests that don't work currently on ROCm
* enable more tests in test_optim after the elementwise_bug got fixed
* enable more tests in test_dataloader
* improvements to hipification and ROCm build

With this, resnet18 on CIFAR data trains without hang or crash in our tests.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10612

Reviewed By: bddppq

Differential Revision: D9423872

Pulled By: ezyang

fbshipit-source-id: 22c0c985217d65c593f35762b3eb16969ad96bdd
2018-08-23 15:24:47 -07:00
peter
facb293aad Fix FindMKL.cmake for Windows (#10453)
Summary:
Targets the issue discussed at https://github.com/pytorch/pytorch/pull/7399#issuecomment-400788971.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10453

Differential Revision: D9311591

Pulled By: soumith

fbshipit-source-id: ac0712e10bdac4ea3f76d6fbad2178ec958b3a31
2018-08-13 21:09:27 -07:00
Jesse Hellemn
def3715e82 Minor changes for nicer pip packages (#9544)
Summary:
I am using this to test a CI job to upload pip packages, and so am using the Caffe2 namespace to avoid affecting the existing pytorch packages.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9544

Reviewed By: orionr

Differential Revision: D9267111

Pulled By: pjh5

fbshipit-source-id: a68162ed29d2eb9ce353d8435ccb5f16c3b0b894
2018-08-10 12:09:46 -07:00
Yinghai Lu
766fa1fc96 Fix IDEEP CMakefile (#9217)
Summary:
The reason is that we are referencing `__ideep_looked_for` here: 77484d91db/cmake/Modules/FindMKL.cmake (L350)

This was first flushed out in https://github.com/pytorch/pytorch/pull/8105 and probably can help with https://github.com/pytorch/pytorch/issues/9024
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9217

Reviewed By: houseroad

Differential Revision: D8754491

Pulled By: yinghai

fbshipit-source-id: 70aecc2d60684b9ea522403dc98a0a1a2c3db7e6
2018-07-06 15:28:07 -07:00
Yinghai Lu
c3b499227d Avoid iomp/gomp clash when building IDEEP ops (#8955)
Summary:
This PR does 3 things
- Reorder the search order of `intel_lp64` and `gf_lp64` as the first one is more essential and should have high priority.
- Avoid repetitive searching of MKL libraries in `ideep` and `mkldnn` submodule if we already found those in `FindMKL`
- Avoid adding more MKL dependencies to IDEEP if MKL is also found.

TODO: provide an option for user to chose iomp or gomp.
Closes https://github.com/pytorch/pytorch/pull/8955

Reviewed By: bddppq

Differential Revision: D8666960

Pulled By: yinghai

fbshipit-source-id: 669d3142204a8b47c19a900444246fc44a139012
2018-06-27 21:24:36 -07:00
Pieter Noordhuis
8e019826c9 Fix cmake cudnn autodetection (#8891)
If CUDNN_INCLUDE_DIR, CUDNN_LIB_DIR, and/or CUDNN_ROOT_DIR were set,
but USE_CUDNN was not explicitly set, the code in
cmake/Dependencies.cmake would set USE_CUDNN=OFF even though it could
be found. This caused an issue in ATen, where it includes its CuDNN
bindings if the variable CUDNN_FOUND is set. This was the case,
because the find_package call in cmake/public/cuda.cmake searches for
CuDNN and ends up finding it. The net result is that ATen tried to
compile CuDNN bits, but the caffe2::cudnn target is never defined let
alone added as dependency, and the build fails on not being able to
find the header cudnn.h.

This change does two things:

1) Restore CuDNN autodetection by setting USE_CUDNN=ON if it is found.
2) Remove obsolete FindCuDNN.cmake module. This functionality now
lives in cmake/public/cuda.cmake.
2018-06-26 06:54:27 -07:00
Teng Li
a994b432ee [c10d] NCCL Process Group implementation (#8182)
* [c10d] Process Group NCCL implementation

* Addressed comments

* Added one missing return and clang format again

* Use cmake/Modules for everything and fix gloo build

* Fixed compiler warnings

* Deleted duplicated FindNCCL
2018-06-08 10:33:27 -07:00
Yinghai Lu
fb5cc630f6 Fix me (#7837)
* Mini fix

* No USE_MKL

* Add CAFFE2_USE_EIGEN_FOR_BLAS
2018-05-25 07:38:50 -07:00
Yinghai Lu
144c5d1ff3
Overwrite INTEL_MKL_DIR correctly (#7824) 2018-05-24 15:04:25 -07:00
Yinghai Lu
71bad33cc4
Match parenthesis (#7797) 2018-05-24 13:45:23 -07:00
Orion Reblitz-Richardson
4bf0202cac
[build] Have PyTorch depend on minimal libcaffe2.so instead of libATen.so (#7399)
* Have PyTorch depend on minimal libcaffe2.so instead of libATen.so

* Build ATen tests as a part of Caffe2 build

* Hopefully cufft and nvcc fPIC fixes

* Make ATen install components optional

* Add tests back for ATen and fix TH build

* Fixes for test_install.sh script

* Fixes for cpp_build/build_all.sh

* Fixes for aten/tools/run_tests.sh

* Switch ATen cmake calls to USE_CUDA instead of NO_CUDA

* Attempt at fix for aten/tools/run_tests.sh

* Fix typo in last commit

* Fix valgrind call after pushd

* Be forgiving about USE_CUDA disable like PyTorch

* More fixes on the install side

* Link all libcaffe2 during test run

* Make cuDNN optional for ATen right now

* Potential fix for non-CUDA builds

* Use NCCL_ROOT_DIR environment variable

* Pass -fPIC through nvcc to base compiler/linker

* Remove THCUNN.h requirement for libtorch gen

* Add Mac test for -Wmaybe-uninitialized

* Potential Windows and Mac fixes

* Move MSVC target props to shared function

* Disable cpp_build/libtorch tests on Mac

* Disable sleef for Windows builds

* Move protos under BUILD_CAFFE2

* Remove space from linker flags passed with -Wl

* Remove ATen from Caffe2 dep libs since directly included

* Potential Windows fixes

* Preserve options while sleef builds

* Force BUILD_SHARED_LIBS flag for Caffe2 builds

* Set DYLD_LIBRARY_PATH and LD_LIBRARY_PATH for Mac testing

* Pass TORCH_CUDA_ARCH_LIST directly in cuda.cmake

* Fixes for the last two changes

* Potential fix for Mac build failure

* Switch Caffe2 to build_caffe2 dir to not conflict

* Cleanup FindMKL.cmake

* Another attempt at Mac cpp_build fix

* Clear cpp-build directory for Mac builds

* Disable test in Mac build/test to match cmake
2018-05-24 07:47:27 -07:00
Paul Jesse Hellemn
b875fb281c
Update from facebook (#7451)
* [bootcamp] Improve "Shape" operator to support axes specification

To improve .shape operator of Caffe2 to support x.shape(tensor, axes), which takes an optional int array "axes" as input. For example, x.shape(tensor, [1, 0]) will return the dimension for axis 1 and 0 following the specified order. For current version, "axes" input allows duplications and can have arbitrary length.

* Back out "Add barrier net that runs before training nets"

Original commit changeset: b373fdc9c30f. Need additional changes to some callers to support barrier failures.

* Change warning to verbose log to reduce log spam

The `LOG(WARNING)` was a bit spammy for regular use so lets just make it a `VLOG`.

* Extract the shared code from different caffe2_benchmark binaries

The OSS benchmark and Internal benchmark will share most functions in the benchmark.

* Support MFR in sequence training

As titled.

* Make knowledge distillation work with using logged prediction feature as teacher label.

1) Add loading raw dense feature as teacher label.
2) Optional calibration function for teacher label
3) Add teacher label into generic unit test
4) Deprecated TTSN workflow version using feature_options to config teacher label

* [C2/CUDA]: unjoined cross entropy sigmoid

as desc

* Add async_scheduling executor into deferrable_net_exec_test

Add async_scheduling into tests and fix some exception cases

* Fix Event disabled error

When disabling event in RNN ops make sure we don't call Finish on disabled
event from op's RunAsync

* cuda ensure cpu output op can handle both TensorCPU and TensorCUDA

as desc.

* [C2 Core] Infer input device option in C2 hypothesis_test checkers

Improve how we default input blob device options.
Previously it defaults as where op lives but it is not necessarily the case.

For example:
CopyCPUToGPU

* [C2 Op]SplitByLengthsOp CPU/GPU implementation

[C2 Op]SplitByLengthsOp CPU/GPU implementation

* fix undefined symbol error

not sure why we're getting undefined symbol even with link_whole = True
Need to figure out why but need this workaround for now

* Add tools in DAIPlayground platform to help debugging models

Add additional tools to allow Plauground override individual method defined in AnyExp.  This will allow user to create module that specificly change certain default method behavior.  An example included in this diff is deactivating test model and checkpointing.  When debugging any model problems, switching off components helps me quickly narrow down the location of the bug.  The technique is extensively used in task T27038712 (Steady memory increase in EDPM, eventually resulting in gloo/cuda.cu:34: out of memory)

* add shape and type inference for int8 conversion operator

* Fix flaky test for group_norm

Fix flaky test for group_norm

* Fix group_norm_op_test flaky

Fix group_norm_op_test flaky

* Implementation of composite learning rate policy

In many state-of-the-arts deep learning works, people use a simple trick to
schedule the learning rate: use a fixed learning rate until error plateaus
and then switch to a different fixed learning rate, and so on. In this diff,
we implemented a simple version of the composite learning rate. The user gives
a set of learning rates policies and corresponding iteration nums, and the
optimizer will change the learning rate policy based on the number of iterations so far.

For example, the user give two learning rate policies, one is FixedLearningRate
and PolyLearningRate, with an iteration number of 1k. Then the first 1k iteration,
we use FixedLearningRate. For the following iterations, we use PolyLearningRate.

* Split two use cases of CachedReader into two classes, DBFileReader and CachedReader

# Use Cases:

1). input: DB file -> output: DatasetReader.

Use DBFileReader.

2). input: Reader -> build cache DB file -> output: DatasetReader.

Use CachedReader.

# Changes to CachedReader:

1). Move db_path to the constructor.
Because in mock reader. cache will always be built ahead.

# Changes to tests:

1). Make a separate TestCase class for CachedReader and DBFileReader.

2). Make it possible to add more test functions by adding setUp, tearDown and _make_temp_path.

3). Make delete db_path more general. `db_path` could be a file for `log_file_db`, but could also be a directory for `leveldb`.

* Back out "On Mobile phones, call GlobalInit with no arguments in predictor in case we need to perform initialization"

Original commit changeset: 4489c6133f11

* Fix LARS bug

Fixed a bug in the LARS implementation which caused all subsequent blobs not using LARS to have the LARS learning rate multiplier applied to them.

* [tum] support sparse init & add uniformFill option

as title

* Propagate exception for async nets

Capture the exception when an exception is thrown in async nets and re-throw it after wait().  This allows exceptions to be propagated up to the caller.

This diff was a part of D7752068.  We split the diff so that C2 core files changes are in a separate diff.

* Automatic update of fbcode/onnx to 69894f207dfcd72d1e70497d387201cec327efbc

Previous import was 403ccfbd0161c38f0834413d790bad0874afbf9a

Included changes:
- **[69894f2](https://github.com/onnx/onnx/commit/69894f2)**: Use op schema.all tensor types in random like definitions (#865) <Scott McKay>
- **[b9d6b90](https://github.com/onnx/onnx/commit/b9d6b90)**: Clarify random like operators (#846) <Scott McKay>
- **[fc6b5fb](https://github.com/onnx/onnx/commit/fc6b5fb)**: Refactor shape inference implementation (#855) <anderspapitto>
- **[b7d8dc8](https://github.com/onnx/onnx/commit/b7d8dc8)**: fix cmake warning message (#863) <Eric S. Yu>
- **[f585c5d](https://github.com/onnx/onnx/commit/f585c5d)**: add pytorch-operator test for tile (#831) <Wenhao Hu>
- **[993fe70](https://github.com/onnx/onnx/commit/993fe70)**: add install step (#832) <Eric S. Yu>
- **[68bc26c](https://github.com/onnx/onnx/commit/68bc26c)**: add type inference for traditional ml ops except classifier ops. (#857) <Ke Zhang>
- **[9cc0cda](https://github.com/onnx/onnx/commit/9cc0cda)**: fix string representation of scalar types (#858) <G. Ramalingam>
- **[1078925](https://github.com/onnx/onnx/commit/1078925)**: fix y in pow test case to scalar (#852) <Wenhao Hu>
- **[c66fb6f](https://github.com/onnx/onnx/commit/c66fb6f)**: Add some math function shape inference (#845) <anderspapitto>
- **[ff667d1](https://github.com/onnx/onnx/commit/ff667d1)**: Refactor return type and docs for ONNXIFI_BACKEND_DIRECTX_ID (#853) <Marat Dukhan>
- **[11c6876](https://github.com/onnx/onnx/commit/11c6876)**: clear initializer names when clear initializer (#849) <Wenhao Hu>
- **[73c34ae](https://github.com/onnx/onnx/commit/73c34ae)**: Clarify FeatureVectorizer description. (#843) <Scott McKay>
- **[1befb9b](https://github.com/onnx/onnx/commit/1befb9b)**: Remove useless text in docs (#850) <Lu Fang>
- **[e84788f](https://github.com/onnx/onnx/commit/e84788f)**: Fix SELU attributes' default values (#839) <Lu Fang>
- **[ebac046](https://github.com/onnx/onnx/commit/ebac046)**: Add tile test case (#823) <Wenhao Hu>
- **[8b7a925](https://github.com/onnx/onnx/commit/8b7a925)**: a few more shape inference functions (#772) <anderspapitto>
- **[9718f42](https://github.com/onnx/onnx/commit/9718f42)**: Make the coefficient non optional for LinearClassifier (#836) <Jaliya Ekanayake>
- **[ef083d0](https://github.com/onnx/onnx/commit/ef083d0)**: Add save_tensor and load_tensor functions for Protos (#770) <Lu Fang>
- **[45ceb55](https://github.com/onnx/onnx/commit/45ceb55)**: Check if CMAKE_BUILD_TYPE set before project(). (#812) <Sergii Dymchenko>
- **[4b3d2b0](https://github.com/onnx/onnx/commit/4b3d2b0)**: [WIP] reenable shape inference tests (#834) <anderspapitto>
- **[22d17ee](https://github.com/onnx/onnx/commit/22d17ee)**: RNN tests: LSTM, GRU, SimpleRNN (#739) <Peyman Manikashani>
- **[de65b95](https://github.com/onnx/onnx/commit/de65b95)**: dimension denotation (#443) <Tian Jin>
- **[eccc76e](https://github.com/onnx/onnx/commit/eccc76e)**: fix field number issue in onnx operator proto and enable its build (#829) <Ke Zhang>
- **[d582beb](https://github.com/onnx/onnx/commit/d582beb)**: disable shape inference test to unbreak ci (#830) <Lu Fang>
- **[485b787](https://github.com/onnx/onnx/commit/485b787)**: function proto for composite op. (#802) <Ke Zhang>
- **[cd58928](https://github.com/onnx/onnx/commit/cd58928)**: specify defaults for attributes of Affine op (#820) <G. Ramalingam>
- **[7ee2cf9](https://github.com/onnx/onnx/commit/7ee2cf9)**: merge the dummy backend back into the main one (#743) <anderspapitto>
- **[1c03a5a](https://github.com/onnx/onnx/commit/1c03a5a)**: [Proposal] ONNX Interface for Framework Integration (previously ONNX Backend API) header and docs (#551) <Marat Dukhan>
- **[3769a98](https://github.com/onnx/onnx/commit/3769a98)**: Rename real model test case from VGG-16 to ZFNet (#821) <Lu Fang>

* [C2]ReluN Op

relu n op.

tf reference: https://www.tensorflow.org/api_docs/python/tf/nn/relu6

* Call destructor when assigning a blob value

* Add executor overrides

Add executor overrides flag to enable migration to async_scheduling executor

* Add barrier net that runs before training nets - attempt #2

Add a synchonize barrier net that is run before training nets.  With this net, shards that are faster will wait for other shards before start training.  This reduce chances of the faster shards timing out during GLOO AllReduce.
Removed explicit data_parallel_model.py.synchronize call in holmes workflow.

This change was landed previously but caused errors for some EDPM workflows - See https://fb.facebook.com/groups/1426530000692545/permalink/1906766366002237/ - because EDPM assumes any call to CreateOrCloneCommonWorld and Gloo ops are wrapped in exception handlers but in this case exception thrown in the barrier init net is not handled.

To address this issue, we add _CreateOrCloneCommonWorld to the param_init_net instead of a new barrier init net.  Since errors for param_init_net run is handled gracefully and re-rendezvous, it should fixes the problem.

* Handle empty nets in async_scheduling

Make sure we don't get stuck on empty nets

* use CUDA_ARCH for conditional compile

* [C2 fix] infer function for ensure_cpu_output_op

* Update group_norm test to reduce flaky test

* Fix lr_multiplier for GPU
2018-05-10 23:14:27 -07:00
Jinghui
769397eb77 [Caffe2] [feature request] Add gradient operators for IDEEP (#7234)
* Add gradient operators for IDEEP

Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com>

* Add gradient test cases for IDEEP

Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com>

* Upgrade third_party/ideep

Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com>

* Refine SumOp for IDEEP

Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com>

* Share input buffer in fallback op if possible

Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com>

* Fallback ConvTranspose op for IDEEP

Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com>

* Fix bug introduced by the patch of sharing input buffer

Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com>

* Share output buffer in fallback operators

Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com>

* Remove IDEEP to resolve repo issue

Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com>

* Reflash IDEEP repo

Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com>

* Remove redundant lines in IDEEP

Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com>

* Fallback operators for IDEEP
(Flatten, ResizeLike, Transpose, and Reshape)

Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com>
2018-05-09 08:52:24 -07:00
Yinghai Lu
ea24c7ff1b
Remove cdft library requirement from MKL (#7246) 2018-05-07 15:31:30 -07:00
Orion Reblitz-Richardson
aa38ae303d
[build] Setup to build ATen from root CMake file (#7163)
* Setup to build ATen from root CMake file

* Move aten/src/TH/cmake into cmake/Modules

* Add special code path for FindMKL for merge
2018-05-02 19:33:31 -07:00
Yinghai Lu
8b70f7d248
[Caffe2] Clean up ideep integration (#6881)
* Clean up ideep integrtation

* .

* Remove redundant code in convnet benchmark

* MKL ON

* Do not add -mavx2 everywhere

* .

* Comments

* rename

* .
2018-04-24 18:32:35 -07:00
Jinghui
26ddefbda1 [feature request] [Caffe2] Enable MKLDNN support for inference (#6699)
* Add operators based-on IDEEP interfaces

Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com>

* Enable IDEEP as a caffe2 device

Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com>

* Add test cases for IDEEP ops

Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com>

* Add IDEEP as a caffe2 submodule

Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com>

* Skip test cases if no IDEEP support

Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com>

* Correct cmake options for IDEEP

Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com>

* Add dependences on ideep libraries

Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com>

* Fix issues in IDEEP conv ops and etc.

Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com>

* Move ideep from caffe2/ideep to caffe2/contrib/ideep

Signed-off-by: Gu Jinghui <jinghui.gu@intel.com>

* Update IDEEP to fix cmake issue

Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com>

* Fix cmake issue caused by USE_MKL option

Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com>

* Correct comments in MKL cmake file

Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com>
2018-04-22 21:58:14 -07:00
Marat Dukhan
63b5cc47eb
[caffe2] Minor changes in NNPACK CMake scripts (#6532)
- Tell NNPACK to not link pthreadpool, but only its headers
- Remove FindNNPACK.cmake as it is no longer used
2018-04-11 20:56:38 -04:00
Soumith Chintala
108f5c197f
[pytorch] add static linkage support for CuDNN and NCCL (#6410)
* when linking static CUDA libs, additional dep on culibos.a

* add USE_STATIC_NCCL option

* add USE_STATIC_CUDNN option

* remove libATen soversion

* add caffe, caffe2 folders to setup.py exclude list
2018-04-08 22:54:18 -04:00
Yangqing Jia
4aded2f7c1 Add Numa support (#2152) 2018-03-05 23:30:20 -08:00
Yangqing Jia
1f9df59de9 Move caffe_option to proper cmake_dependent_option (#2049) 2018-02-24 23:31:36 -08:00
Yangqing Jia
fe5fe7bad2 CMake cuda targets (#1993)
* wip: cuda targets

* Remove FindCuDNN.cmake as it is no longer needed
2018-02-22 15:54:34 -05:00
sf-wind
5439ab3cdc Remove gf library in MKL (#1976)
* Remove OpenGL code from benchmark

* Make it possible to print plot in the ipython notbook

* Create the blob if the blob is not specified in the init net

* Do not use gf library for MKL. Even after I install the entire MKL library it is still not found. After removing it, the MKL code can still run
2018-02-20 15:17:34 -08:00
Yangqing Jia
d481afb125 Modernizing glog. Same as gflags.
Summary:
Same as PR #1819.
Closes https://github.com/caffe2/caffe2/pull/1830

Differential Revision: D6832171

Pulled By: Yangqing

fbshipit-source-id: 462a9b807e78d60748160a0cfd24932c9003fcc3
2018-01-28 18:21:22 -08:00
Yangqing Jia
73ed0d5ced Modernizing the gflags dependency in cmake.
Summary:
Historically, for interface dependent libraries (glog, gflags and protobuf), exposing them in Caffe2Config.cmake is usually difficult.

New versions of glog and gflags ship with new-style cmake targets, so one does not need to use variables. New-style targets also make it easier for people to depend on them in installed config files.

This diff modernizes the gflags library, and still provides a fallback path if the installed gflags does not have cmake config files coming with it.

It does change one behavior of the build process though - when one specifies -DUSE_GFLAGS=ON but gflags cannot be found, the old script automatically turns it off but the new script crashes, forcing the user to specify USE_GFLAGS=OFF.
Closes https://github.com/caffe2/caffe2/pull/1819

Differential Revision: D6826604

Pulled By: Yangqing

fbshipit-source-id: 210f3926f291c8bfeb24eb9671e5adfcbf8cf7fe
2018-01-27 19:31:14 -08:00
Marat Dukhan
cd9d0f4561 Link cpuinfo when using external NNPACK
Summary:
Close #1685
Closes https://github.com/caffe2/caffe2/pull/1722

Differential Revision: D6686071

Pulled By: Maratyszcza

fbshipit-source-id: bbe86bfd479376bc7cdfdd0bad3896f1c2356216
2018-01-09 12:50:52 -08:00
Pieter Noordhuis
54342287fe Look for NCCL in CUDA_TOOLKIT_ROOT_DIR
Summary: Closes https://github.com/caffe2/caffe2/pull/1611

Reviewed By: dzhulgakov

Differential Revision: D6550168

Pulled By: pietern

fbshipit-source-id: e034ce4057d37bfc8b53949c56cbcb701ea5d958
2017-12-12 21:50:49 -08:00
Pieter Noordhuis
db06e91097 Bump gloo
Summary:
Latest version of Gloo takes care of MPI_Init/MPI_Finalize for us, so
this commit removes handling that from caffe2/contrib/gloo. It also
imports CMake NCCL module changes from Gloo to stay consistent and
allow setting NCCL_INCLUDE_DIR and NCCL_LIB_DIR separately.
Closes https://github.com/caffe2/caffe2/pull/1295

Reviewed By: dzhulgakov

Differential Revision: D5979364

Pulled By: pietern

fbshipit-source-id: 794b00b0a445317c30a13cc8f0f4dc38e590cc77
2017-10-05 16:57:59 -07:00
Luke Yeager
c858c68537 cmake: stop including files from the install directory
Summary:
Here is the buggy behavior which this change fixes:

* On the first configure with CMake, a system-wide benchmark installation is not found, so we use the version in `third_party/` ([see here](https://github.com/caffe2/caffe2/blob/v0.8.1/cmake/Dependencies.cmake#L98-L100))
* On installation, the benchmark sub-project installs its headers to `CMAKE_INSTALL_PREFIX` ([see here](https://github.com/google/benchmark/blob/4bf28e611b/src/CMakeLists.txt#L41-L44))
* On a rebuild, CMake searches the system again for a benchmark installation (see https://github.com/caffe2/caffe2/issues/916 for details on why the first search is not cached)
* CMake includes `CMAKE_INSTALL_PREFIX` when searching the system ([docs](https://cmake.org/cmake/help/v3.0/variable/CMAKE_SYSTEM_PREFIX_PATH.html))
* Voila, a "system" installation of benchmark is found at `CMAKE_INSTALL_PREFIX`
* On a rebuild, `-isystem $CMAKE_INSTALL_PREFIX/include` is added to every build target ([see here](https://github.com/caffe2/caffe2/blob/v0.8.1/cmake/Dependencies.cmake#L97)). e.g:

      cd /caffe2/build/caffe2/binaries && ccache /usr/bin/c++    -I/caffe2/build -isystem /caffe2/third_party/googletest/googletest/include -isystem /caffe2/install/include -isystem /usr/include/opencv -isystem /caffe2/third_party/eigen -isystem /usr/include/python2.7 -isystem /usr/lib/python2.7/dist-packages/numpy/core/include -isystem /caffe2/third_party/pybind11/include -isystem /usr/local/cuda/include -isystem /caffe2/third_party/cub -I/caffe2 -I/caffe2/build_host_protoc/include  -fopenmp -std=c++11 -O2 -fPIC -Wno-narrowing -O3 -DNDEBUG   -o CMakeFiles/split_db.dir/split_db.cc.o -c /caffe2/caffe2/binaries/split_db.cc

This causes two issues:
1. Since the headers and libraries at `CMAKE_INSTALL_PREFIX` have a later timestamp than the built files, an unnecessary rebuild is triggered
2. Out-dated headers from the install directory are used during compilation, which can lead to strange build errors (which can usually be fixed by `rm -rf`'ing the install directory)

Possible solutions:
* Stop searching the system for an install of benchmark, and always use the version in `third_party/`
* Cache the initial result of the system-wide search for benchmark, so we don't accidentally pick up the installed version later
* Hack CMake to stop looking for headers and libraries in the installation directory

This PR is an implementation of the first solution. Feel free to close this and fix the issue in another way if you like.
Closes https://github.com/caffe2/caffe2/pull/1112

Differential Revision: D5761750

Pulled By: Yangqing

fbshipit-source-id: 2240088994ffafdb6eedb3626d898b505a4ba564
2017-09-01 23:33:14 -07:00
Pieter Noordhuis
45e6e71198 Tidy up CMake for NCCL
Summary:
Use HINTS instead of PATHS for find_library so that you can specify
-DNCCL_ROOT_DIR and it will use this NCCL installation regardless of
what else is installed on your system. Also add a path hint to include
the default base path for NCCL 2 libraries.
Closes https://github.com/caffe2/caffe2/pull/1152

Reviewed By: Yangqing

Differential Revision: D5740053

Pulled By: pietern

fbshipit-source-id: 43f0908a63e8a9b90320dece0bbb558827433b48
2017-08-30 15:39:56 -07:00
Pieter Noordhuis
813cca85d1 Use CMake HINTS to find CuDNN
Summary:
The PATHS suggestion to find_library is searched after everything
else. By using HINTS, it searches CUDNN_ROOT_DIR much earlier, avoiding
potential conflicts with other paths that have the CuDNN header.
Closes https://github.com/caffe2/caffe2/pull/1122

Reviewed By: Yangqing

Differential Revision: D5701822

Pulled By: pietern

fbshipit-source-id: 3f15757701aff167e7ae2a3e8a4ccf5d96763a0c
2017-08-24 15:35:24 -07:00
Guillaume Dumont
8cc9dbf357 Added Ninja generator support on Windows
Summary:
I successfully built caffe2 using MSVC 2015 and the Ninja Generator. I use vcpkg to build glfags, glog, lmdb and protobuf. Here is my build procedure:

1. Install vcpkg and set it up according to vcpkg docs
2. Install dependencies
```
$> vcpkg install gflags glog lmdb protobuf eigen3 --triplet x64-windows-static
```
3. Run CMake with this batch file
```Batch
setlocal
if NOT DEFINED VCPKG_DIR ( echo "Please defined VCPKG_DIR" && exit /b 1 )
if NOT DEFINED CMAKE_BUILD_TYPE set CMAKE_BUILD_TYPE=Release
if NOT DEFINED BUILD_DIR set BUILD_DIR=build_%CMAKE_BUILD_TYPE%
if NOT DEFINED USE_CUDA set USE_CUDA=OFF

call "%VS140COMNTOOLS%\..\..\VC\vcvarsall.bat" amd64

if NOT EXIST %BUILD_DIR% (mkdir %BUILD_DIR%)
pushd %BUILD_DIR%

set CMAKE_GENERATOR=Ninja
set ZLIB_LIBRARY=%VCPKG_DIR%\installed\x64-windows-static\lib\zlib.lib

cmake -G"%CMAKE_GENERATOR%" ^
      -DBUILD_SHARED_LIBS=OFF ^
      -DCMAKE_VERBOSE_MAKEFILE=1 ^
      -DBUILD_TEST=OFF ^
      -DBUILD_SHARED_LIBS=OFF ^
      -DCMAKE_BUILD_TYPE=%CMAKE_BUILD_TYPE% ^
      -DUSE_CUDA=%USE_CUDA% ^
      -DZLIB_LIBRARY:FILEPATH="%ZLIB_LIBRARY%" ^
      -DVCPKG_TARGET_TRIPLET=x64-windows-static ^
      -DVCPKG_APPLOCAL_DEPS:BOOL=OFF ^
      -DCMAKE_TOOLCHAIN_FILE:FILEPATH=%VCPKG_DIR%\scripts\buildsystems\vcpkg.cmake ^
      -DPROTOBUF_PROTOC_EXECUTABLE:FILEPATH=%VCPKG_DIR%\installed\x64-windows-static\tools\protoc.exe ^
      ..\

ninja
popd

endlocal
```
Closes https://github.com/caffe2/caffe2/pull/880

Differential Revision: D5497384

Pulled By: Yangqing

fbshipit-source-id: e0d81d3dbd3286ab925eddef0e6fbf99eb6375a5
2017-07-26 00:32:20 -07:00
Daniel Bermond
0458985c1b Fix build with external nnpack installation
Summary:
libpthreadpool is needed during the linking stage and is missing when user chooses to use an external nnpack installation (from system libraries).

Fixes GitHub issue #459.

Detailed discussion on [this comment](https://github.com/caffe2/caffe2/issues/459#issuecomment-308831547).
Closes https://github.com/caffe2/caffe2/pull/808

Differential Revision: D5430318

Pulled By: Yangqing

fbshipit-source-id: 5e10332fb01e54d8360bb929c1a82b0eef580bbb
2017-07-25 23:03:39 -07:00
Guillaume Dumont
feecb09517 Added sensible default root location for MKL on Windows
Summary:
MKL on windows works with this change. Tested with MKL 2017 Update 3 (https://software.intel.com/en-us/articles/intel-math-kernel-library-intel-mkl-2017-release-notes).

Should fix #544

With MKL 2017 Update 3 #514 should not happen too.

Note: I used Anaconda which ships with its own MKL, so I had to make sure that the MKL 2017 Update 3 version was loaded by replacing the .dll in the `%AnacondaPrefix%\Library\bin` folder. Otherwise, numpy would load it's own version and I would have all sorts of missing procedures errors. Now that the same version is available through `conda` this is easily fixed with `conda install mkl==2017.0.3`
Closes https://github.com/caffe2/caffe2/pull/929

Differential Revision: D5429664

Pulled By: Yangqing

fbshipit-source-id: eaa150bab563ee4ce8348faee1624ac4af477513
2017-07-14 17:20:36 -07:00
haracejacob
2ec294a8bb Fix a few typos and grammars in comment
Summary:
Fix a few typos and grammars in comment

by using language-check, python library
spell_checker source code is here : https://github.com/17-1-SKKU-OSS/011A/blob/master/spell_checker/spell_checker.py
here is the text file which indicates what things should be fixed :  https://github.com/17-1-SKKU-OSS/011A/tree/master/spell_checker/fix/caffe2
Closes https://github.com/caffe2/caffe2/pull/719

Differential Revision: D5165118

Pulled By: aaronmarkham

fbshipit-source-id: 7fb8ef7a99d03cd5fd2f9ebdb01b9865e90fc37b
2017-06-14 18:22:39 -07:00
Hans Gaiser
567842e68d Check system dependencies first
Summary:
This PR changes the cmake of Caffe2 to look for system dependencies before resorting to the submodules in `third-party`. Only googletest should logically be in third-party, the other libraries should ideally be installed as system dependencies by the user. This PR adds system dependency checks for Gloo, CUB, pybind11, Eigen and benchmark, as these were missing from the cmake files.

In addition it removes the execution of `git submodule update --init` in cmake. This seems like bad behavior to me, it should be up to the user to download submodules and manage the git repository.
Closes https://github.com/caffe2/caffe2/pull/382

Differential Revision: D5124123

Pulled By: Yangqing

fbshipit-source-id: cc34dda58ffec447874a89d01058721c02a52476
2017-05-24 14:31:51 -07:00
Du Tran
033ab9da1b Adding video data layer for caffe2
Summary: Adding a simple video data layer which allows to read video data from frames, videos and output 5D tensor. It also allows multiple labels. The current implementation is based on ffmpeg

Differential Revision: D4801798

fbshipit-source-id: 46448e9c65fb055c2d71855447383a33ade0e444
2017-05-05 14:16:38 -07:00
Yangqing Jia
1aa5231fb3 make nnpack build on mac/linux, and also contbuild support
Summary:
* add custom ninja install

* minimal build for nnpack

* force -fPIC for nnpack
Closes https://github.com/caffe2/caffe2/pull/207

Differential Revision: D4729265

Pulled By: Yangqing

fbshipit-source-id: 2ed345a4fda6b4811af03cd1898e2402dda58701
2017-03-17 15:19:07 -07:00
Yangqing Jia
1741fd839f Re-apply windows diff D4657831
Summary:
(Note: previous revert was due to a race condition between D4657831 and
D4659953 that I failed to catch.)

After this, we should have contbuild guarding the Windows build both with
and without CUDA.

This includes a series of changes that are needed to make Windows build,
specifically:

(1) Various flags that are needed in the cmake system, specially dealing
with /MD, /MT, cuda, cudnn, whole static linking, etc.
(2) Contbuild scripts based on appveyo.
(3) For Windows build, note that one will need to use "cmake --build" to
build stuff so that the build type is consistent between configuration and
actual build. see scripts\build_windows.bat for details.
(4) In logging.h, ERROR is already defined by Windows. I don't have a good
solution now, and as a result, LOG(ERROR) on windows is going to be
LOG(INFO).
(5) variable length array is not supported by MSVC (and it is not part of
C++ standard). As a result I replaced them with vectors.
(6) sched.h is not available on Windows, so akyrola 's awesome simple
async net might encounter some slowdown due to no affinity setting on
Windows.
(7) MSVC has a bug that does not work very well with template calls inide
a templated function call, which is a known issue that should be fixed in
MSVC 2017. However for now this means changes to conv_op_impl.h and
recurrent_net_op.h. No actual functionalities are changed.
(8) std host function calls are not supported in CUDA8+MSVC, so I changed
lp_pool (and maybe a few others) to use cuda device functions.
(9) The current Scale and Axpy has heavy templating that does not work
well with MSVC. As a result I reverted azzolini 's changes to the Scale
and Axpy interface, moved the fixed-length version to ScaleFixedSize and
AxpyFixedSize.
(10) CUDA + MSVC does not deal with Eigen well, so I guarded all Eigen
parts to only the non-CUDA part.
(11) In conclusion, it is fun but painful to deal with visual c++.

Differential Revision: D4666745

fbshipit-source-id: 3c9035083067bdb19a16d9c345c1ce66b6a86600
2017-03-07 11:02:12 -08:00
Avani Nandini
039c3cf0ba Revert D4657831: [caffe2][PR] Changes for Windows build to pass.
Summary: This reverts commit 070ded372ed78a7e3e3919fdffa1d337640f146e

Differential Revision: D4657831

fbshipit-source-id: 3a0fb403936a9257776d637ce3ba5dbd81e1119f
2017-03-06 21:02:36 -08:00
Yangqing Jia
7b8c7b11d2 Changes for Windows build to pass.
Summary:
After this, we should have contbuild guarding the Windows build both with
and without CUDA.

This includes a series of changes that are needed to make Windows build,
specifically:

(1) Various flags that are needed in the cmake system, specially dealing
with /MD, /MT, cuda, cudnn, whole static linking, etc.
(2) Contbuild scripts based on appveyo.
(3) For Windows build, note that one will need to use "cmake --build" to
build stuff so that the build type is consistent between configuration and
actual build. see scripts\build_windows.bat for details.
(4) In logging.h, ERROR is already defined by Windows. I don't have a good
solution now, and as a result, LOG(ERROR) on windows is going to be
LOG(INFO).
(5) variable length array is not supported by MSVC (and it is not part of
C++ standard). As a result I replaced them with vectors.
(6) sched.h is not available on Windows, so akyrola 's awesome simple
async net might encounter some slowdown due to no affinity setting on
Windows.
(7) MSVC has a
Closes https://github.com/caffe2/caffe2/pull/183

Reviewed By: ajtulloch

Differential Revision: D4657831

Pulled By: Yangqing

fbshipit-source-id: 070ded372ed78a7e3e3919fdffa1d337640f146e
2017-03-06 20:03:37 -08:00
Bram Wasti
0d5f3654b2 Adding back untracked files from manual github pull
Summary: Github import didn't work and the manual import lost some files.

Reviewed By: Yangqing

Differential Revision: D4408509

fbshipit-source-id: ec8edb8c02876410f0ef212bde6847a7ba327fe4
2017-01-12 08:59:19 -08:00
Yangqing Jia
1cd166d330 CMake completions work
Summary: Closes https://github.com/caffe2/caffe2/pull/88

Differential Revision: D4404292

Pulled By: bwasti

fbshipit-source-id: 8a4351c2dee5136aaa12b90f1a61fd7afee51994
2017-01-11 16:59:22 -08:00
Bram Wasti
1aa473638d Added a search path to find OpenBLAS for convenience (homebrew install) 2016-12-29 16:15:25 -05:00
Simon Layton
99e97a4b7a Correction to paths to find cuDNN 2016-12-16 16:03:23 -05:00
Simon Layton
fbbb87cd46 Enhancements
Add BLAS chooser
Move cuDNN detection from Cuda -> FindCuDNN
Refactor main C2 libs, should enable no-GPU build (untested)
2016-12-13 09:29:01 -05:00
Simon Layton
52f09fe2c9 Initial building with deps 2016-12-13 09:29:01 -05:00