Commit Graph

47 Commits

Author SHA1 Message Date
vishalrao487
d2623da52c replaced whitelist with allowlist (#45260)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/41754

**(1)**
Initially the file was named **gen_op_registration_whitelist.py**; I renamed it to **gen_op_registration_allowlist.py**

**(2)**
There were some occurrences of **whitelist** in comments inside the file; I changed them to **allowlist**
![update1](https://user-images.githubusercontent.com/62737243/94106752-b296e780-fe59-11ea-8541-632a1dbf90d6.png)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45260

Reviewed By: dhruvbird

Differential Revision: D23947182

Pulled By: ljk53

fbshipit-source-id: 31b486592451dbb0605d7950e07747cbb72ab80f
2020-09-29 00:27:46 -07:00
Peter Bell
b70fac75ac CMake: Fix python dependencies in codegen (#45275)
Summary:
I noticed while working on https://github.com/pytorch/pytorch/issues/45163 that edits to python files in the `tools/codegen/api/` directory wouldn't trigger rebuilds. This tells CMake about all of the dependencies, so rebuilds are triggered automatically.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45275

Reviewed By: zou3519

Differential Revision: D23922805

Pulled By: ezyang

fbshipit-source-id: 0fbf2b6a9b2346c31b9b0384e5ad5e0eb0f70e9b
2020-09-25 09:16:38 -07:00
Abdelrauf
6954ae1278 Vec256 Test cases (#42685)
Summary:
[Tests for Vec256 classes](https://github.com/pytorch/pytorch/issues/15676)

Testing
Current list:

- [x] Blends
- [x] Memory: UnAlignedLoadStore
- [x] Arithmetics: Plus, Minus, Multiplication, Division
- [x] Bitwise: BitAnd, BitOr, BitXor
- [x] Comparison: Equal, NotEqual, Greater, Less, GreaterEqual, LessEqual
- [x] MinMax: Minimum, Maximum, ClampMin, ClampMax, Clamp
- [x] SignManipulation: Absolute, Negate
- [x] Interleave: Interleave, DeInterleave
- [x] Rounding: Round, Ceil, Floor, Trunc
- [x] Mask: ZeroMask
- [x] SqrtAndReciprocal: Sqrt, RSqrt, Reciprocal
- [x] Trigonometric: Sin, Cos, Tan
- [x] Hyperbolic: Tanh, Sinh, Cosh
- [x] InverseTrigonometric: Asin, ACos, ATan, ATan2
- [x] Logarithm: Log, Log2, Log10, Log1p
- [x] Exponents: Exp, Expm1
- [x] ErrorFunctions: Erf, Erfc, Erfinv
- [x] Pow: Pow
- [x] LGamma: LGamma
- [x] Quantization: quantize, dequantize, requantize_from_int
- [x] Quantization: widening_subtract, relu, relu6
Missing:
- [ ] Constructors, initializations
- [ ] Conversion, Cast
- [ ] Additional: imag, conj, angle (note: imag and conj only checked for float complex)

#### Notes on tests and testing framework
- some math functions are tested within a domain range
- the testing framework mostly tests randomly against the std implementation, within the general domain or within the implementation domain for some math functions (a minimal sketch of this idea follows these notes)
- some functions are tested against a local version. ~~For example, std::round and the vector version of round differ, so it was tested against the local version~~
- round was tested against PyTorch's at::native::round_impl. ~~For the double type on **VSX, vec_round failed for (even)+0.5 values**~~. This was solved by using vec_rint
- ~~**complex types are not tested**~~ **After enabling complex testing, some of the complex functions failed for VSX and x86 AVX as well due to precision and domain issues. I will either test them against a local implementation or check within the accepted domain**
- ~~quantizations are not tested~~ Added tests for the quantize, dequantize, requantize_from_int, relu, relu6, and widening_subtract functions
- the testing framework should be improved further
- ~~For now `-DBUILD_MOBILE_TEST=ON` will be used for Vec256Test too~~
Vec256 test cases will be built for each CPU_CAPABILITY
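
For readers who want the flavor of the "random testing against a reference" approach described in the notes above, here is a minimal Python sketch of the idea. The real tests are C++ and exercise the Vec256 API; the function names, lane count, and tolerance below are illustrative assumptions, not the actual test code.

```
import math
import random

def reference_sin(xs):
    # Scalar reference implementation (stands in for std::sin).
    return [math.sin(x) for x in xs]

def vectorized_sin(xs):
    # Stands in for the vectorized implementation under test.
    return [math.sin(x) for x in xs]

def check_against_reference(vec_fn, ref_fn, domain=(-100.0, 100.0),
                            trials=1000, lanes=8, tol=1e-6):
    """Randomly sample inputs inside the function's domain and compare
    the vectorized result against the scalar reference, lane by lane."""
    lo, hi = domain
    for _ in range(trials):
        xs = [random.uniform(lo, hi) for _ in range(lanes)]
        for got, expected in zip(vec_fn(xs), ref_fn(xs)):
            assert abs(got - expected) <= tol, (got, expected)

check_against_reference(vectorized_sin, reference_sin)
```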

Fixes: https://github.com/pytorch/pytorch/issues/15676

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42685

Reviewed By: malfet

Differential Revision: D23034406

Pulled By: glaringlee

fbshipit-source-id: d1bf03acdfa271c88744c5d0235eeb8b77288ef8
2020-09-16 11:48:02 -07:00
Edward Yang
6ea89166bd Rewrite of ATen code generator (#42629)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42629

How to approach reviewing this diff:

- The new codegen itself lives in `tools/codegen`. Start with `gen.py`, then read `model.py` and then the `api/` folder. The comments at the top of the files describe what is going on. The CLI interface of the new codegen is similar to the old one, but (1) it is no longer necessary to explicitly specify cwrap inputs (and now we will error if you do so) and (2) the default settings for the source and install dirs are much better; to the extent that if you run the codegen from the root source directory as just `python -m tools.codegen.gen`, something reasonable will happen. (A rough sketch of the overall flow follows this list.)
- The old codegen is (nearly) entirely deleted; every Python file in `aten/src/ATen` was deleted except for `common_with_cwrap.py`, which now permanently finds its home in `tools/shared/cwrap_common.py` (previously cmake copied the file there), and `code_template.py`, which now lives in `tools/codegen/code_template.py`. We remove the copying logic for `common_with_cwrap.py`.
- All of the inputs to the old codegen are deleted.
- Build rules now have to be adjusted to not refer to files that no longer exist, and to abide by the (slightly modified) CLI.
- LegacyTHFunctions files have been generated and checked in. We expect these to be deleted as these final functions get ported to ATen. The deletion process is straightforward; just delete the functions of the ones you are porting. There are 39 more functions left to port.
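
As a rough mental model of the codegen flow - read operator declarations, look up kernels per backend, emit registration code - here is a toy Python sketch. It is not the actual `tools/codegen` implementation; the YAML fields and the emitted registration format are simplified assumptions.

```
import yaml  # requires PyYAML

# A tiny stand-in for a couple of native_functions.yaml entries.
NATIVE_FUNCTIONS = """
- func: add(Tensor self, Tensor other) -> Tensor
  dispatch:
    CPU: add_cpu
    CUDA: add_cuda
- func: relu(Tensor self) -> Tensor
  dispatch:
    CPU: relu_cpu
"""

def generate_registrations(yaml_text, backend):
    """Parse the (simplified) declarations and emit one registration
    line per operator that has a kernel for the requested backend."""
    lines = []
    for decl in yaml.safe_load(yaml_text):
        name = decl["func"].split("(")[0]
        kernel = decl.get("dispatch", {}).get(backend)
        if kernel is not None:
            lines.append('m.impl("%s", %s);' % (name, kernel))
    return "\n".join(lines)

print(generate_registrations(NATIVE_FUNCTIONS, "CPU"))
```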

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: bhosmer

Differential Revision: D23183978

Pulled By: ezyang

fbshipit-source-id: 6073ba432ad182c7284a97147b05f0574a02f763
2020-08-31 09:00:22 -07:00
Jiakai Liu
3a0e35c9f2 [pytorch] deprecate static dispatch (#43564)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43564

Static dispatch was originally introduced for mobile selective build.

Since we have added selective build support for dynamic dispatch and
tested it in FB production for months, we can deprecate static dispatch
to reduce the complexity of the codebase.
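
For readers unfamiliar with the distinction, here is a toy Python illustration of static vs. dynamic dispatch (a conceptual model only, not how ATen implements either mode):

```
# Static dispatch: the call target is fixed at build time; the compiler
# sees a direct call, and the linker can strip kernels that are never called.
def add_cpu(a, b):
    return a + b

def add_static(a, b):
    return add_cpu(a, b)  # direct, hard-coded call

# Dynamic dispatch: kernels register themselves into a table, and the call
# site looks up the target at runtime by an (op name, backend) key.
KERNELS = {}

def register(op, backend, fn):
    KERNELS[(op, backend)] = fn

register("add", "CPU", add_cpu)

def add_dynamic(a, b, backend="CPU"):
    return KERNELS[("add", backend)](a, b)

assert add_static(1, 2) == add_dynamic(1, 2) == 3
```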

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D23324452

Pulled By: ljk53

fbshipit-source-id: d2970257616a8c6337f90249076fca1ae93090c7
2020-08-27 14:52:48 -07:00
Jiakai Liu
3afd24d62c [pytorch] check in default generated op dependency graph (#43570)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43570

Add the default op dependency graph to the source tree - use it if the user runs a
custom build in dynamic dispatch mode without providing the graph.

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D23326988

Pulled By: ljk53

fbshipit-source-id: 5fefe90ca08bb0ca20284e87b70fe1dba8c66084
2020-08-27 14:51:44 -07:00
Anush Elangovan
c86699d425 [cmake] Use PROJECT_SOURCE_DIR instead of CMAKE_* (#41387)
Summary:
Add support for including pytorch via add_subdirectory().
This requires using PROJECT_* instead of CMAKE_*, since CMAKE_* refers to
the top-most project including pytorch.

TEST=add_subdirectory() a pytorch checkout into another project and build.
There are still some hardcoded references to TORCH_SRC_DIR, which I will
fix in a follow-up commit. For now you can create a symlink to
<pytorch>/torch/ in your project.

Change-Id: Ic2a8aec3b08f64e2c23d9e79db83f14a0a896abc

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41387

Reviewed By: zhangguanheng66

Differential Revision: D22539944

Pulled By: ezyang

fbshipit-source-id: b7e9631021938255f0a6ea897a7abb061759093d
2020-07-15 11:09:05 -07:00
Wojciech Baranowski
0b9717b86a When linking libtorch_cpu.so, put AVX sources last in the input list (#40449)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/39600
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40449

Reviewed By: VitalyFedyunin

Differential Revision: D22312501

Pulled By: colesbury

fbshipit-source-id: 4c09adb0173749046f20b84241d6c940b339ad77
2020-07-06 07:56:12 -07:00
Ivan Kobzarev
b460465a18 [Mobile GPU][Integration] Vulkan backend integration (#36491)
Summary:
This PR contains the initial version of Vulkan (GPU) Backend integration.
The primary target environment is Android, but the desktop build is also supported.

## CMake
Introducing three cmake options:
USE_VULKAN:
The main switch; if it is off, the other options have no effect.
USE_VULKAN_WRAPPER:
ON - Vulkan is loaded at runtime as "libvulkan.so" using libdl; every function call is wrapped in vulkan_wrapper.h.
OFF - link with libvulkan.so directly.
USE_VULKAN_SHADERC_RUNTIME:
ON - the shader compilation library is linked and shaders are compiled at runtime.
OFF - shaders are precompiled and the shader compilation library is not included.

## Codegen
Shader handling is driven by cmake/VulkanCodegen.cmake, which calls `aten/src/ATen/native/vulkan/gen_glsl.py` or `aten/src/ATen/native/vulkan/gen_spv.py` to embed either the shader source or SPIR-V bytecode inside the binary as uint32_t arrays.
if `USE_VULKAN_SHADERC_RUNTIME` is ON:
The shader source is included as `glsl.h`/`glsl.cpp` and compiled at runtime.
if `USE_VULKAN_SHADERC_RUNTIME` is OFF:
Shaders are precompiled and the SPIR-V bytecode is included as `spv.h`/`spv.cpp`.

All codegen output goes to the build directory.

## Build dependencies
cmake/Dependencies.cmake
If the target platform is Android - vulkan library, headers, Vulkan wrapper will be used from ANDROID_NDK.
Desktop build requires the VULKAN_SDK environment variable, and all vulkan dependencies will be used from it.
(Desktop build was tested only on Linux).

## Pytorch integration:
Adding 'Vulkan' as a new Backend, DispatchKey, and DeviceType.
We are using Strided layout without supporting strides at the moment, but we plan to support them in the future.
Using OpaqueTensorImpl where OpaqueHandle is a copyable VulkanTensor;
more details are in the comments in `aten/src/ATen/native/vulkan/Vulkan.h`

Main code location: `aten/src/ATen/native/vulkan`
`aten/src/ATen/native/vulkan/VulkanAten.cpp` - connection link between ATen and Vulkan api (Vulkan.h) that converts at::Tensor to VulkanTensor.

`aten/src/ATen/native/vulkan/Vulkan.h` - Vulkan API that contains the VulkanTensor representation and functions to work with it. We plan to expose it so clients can write their own Vulkan ops.

`aten/src/ATen/native/vulkan/VulkanOps.cpp` - Vulkan operation implementations that use the Vulkan.h API

## GLSL shaders
Located in `aten/src/ATen/native/vulkan/glsl` as `*.glsl` files.
All shaders use Vulkan specialization constants for workgroup sizes, with ids 1, 2, 3

## Supported operations
Supported at this point:
conv2d no-groups
conv2d depthwise
addmm
upsample nearest 2d
clamp
hardtanh

## Testing
`aten/src/ATen/test/vulkan_test.cpp` - contains tests for:
- copy from CPU to Vulkan and back
- all supported operations

Desktop builds are supported, and testing can be done on a desktop that has a Vulkan-capable GPU or an installed software implementation of Vulkan, such as https://github.com/google/swiftshader

## Vulkan execution
The initial implementation is trivial and waits for every operator's execution.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36491

Differential Revision: D21696709

Pulled By: IvanKobzarev

fbshipit-source-id: da3e5a770b1a1995e9465d7e81963e7de56217fa
2020-05-26 08:30:13 -07:00
Nikita Shulga
664a3ab5c7 Enable py38 gcc9 build config (#38805)
Summary:
Add `py38-gcc9` build-only config
Add appropriate `-Wno-xyz` flags to ATen kernels as well as `tensorexpr/llvm_jit.cpp` and `tensorexpr/llvm_codegen.cpp`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38805

Differential Revision: D21682953

Pulled By: malfet

fbshipit-source-id: 5b61d0dfe8bdec8fb13e2ae5857dc5e7c6e58e42
2020-05-21 01:38:04 -07:00
Nikita Shulga
4b52e52577 Use jit_core_sources from build_variables.bzl (#38526)
Summary:
Replace the hardcoded filelist in aten/src/ATen/CMakeLists.txt with the one from `jit_source_sources`.
Fix `append_filelist` to work independently of the location from which it is invoked.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38526

Differential Revision: D21594582

Pulled By: malfet

fbshipit-source-id: c7f216a460edd474a6258ba5ddafd4c4f59b02be
2020-05-15 08:21:37 -07:00
Jiakai Liu
6792bafa72 [pytorch] aten codegen to filter backends for default mobile build
Summary:
This is a simple change to mitigate the OSS mobile default build size regression caused by #34275 and #34622.

Mobile-supported backends are already kinda hard-coded in function_wrapper.py as `static_dispatch_backends`:
https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/function_wrapper.py#L243

This is simply to align dynamic registration with static dispatch for the mobile build.
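
Conceptually, the codegen change amounts to intersecting the set of generated backends with a mobile allowlist. A hedged Python sketch of that idea (the backend names and the function below are illustrative, not the actual function_wrapper.py code):

```
# Backends that the default mobile build can dispatch to (illustrative).
MOBILE_BACKENDS = {"CPU", "QuantizedCPU"}

def filter_backends_for_mobile(all_backends, mobile=True):
    """Drop registrations for backends the default mobile build can never
    dispatch to (e.g. CUDA), aligning dynamic registration with the
    static-dispatch backend set."""
    if not mobile:
        return list(all_backends)
    return [b for b in all_backends if b in MOBILE_BACKENDS]

print(filter_backends_for_mobile(["CPU", "CUDA", "QuantizedCPU", "MkldnnCPU"]))
# -> ['CPU', 'QuantizedCPU']
```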

To measure mobile build size:
```
// Default mobile build:
scripts/build_pytorch_android.sh armeabi-v7a

// MobileNetV2 custom build:
SELECTED_OP_LIST=MobileNetV2.yaml scripts/build_pytorch_android.sh armeabi-v7a
```

- arm-v7 Android AAR (compressed) size:
```
+----------+-------------------+---------------+
|          | MobileNetV2 Build | Default Build |
+----------+-------------------+---------------+
| Original |         3,354,589 |     5,731,992 |
| #34275   |         3,404,978 |     6,640,526 |
| #34622   |         3,432,569 |     6,640,526 |
| This PR  |         3,431,660 |     6,534,135 |
+----------+-------------------+---------------+
```

Differential Revision: D20415107

Test Plan: Imported from OSS

Pulled By: ljk53

fbshipit-source-id: 75acf4dc5dfe9242c01b2db0b84bd6b4a1d0cd8d
2020-04-30 01:35:38 -07:00
Bram Wasti
4234d62489 [hotfix] Workaround for older versions of ninja (#37417)
Summary:
Older versions of ninja don't like relative paths in configure_file when it is called twice.

https://gitlab.kitware.com/cmake/cmake/issues/17601

Fix suggested in comments https://gitlab.kitware.com/cmake/cmake/-/issues/18584
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37417

Reviewed By: malfet

Differential Revision: D21280141

Pulled By: bwasti

fbshipit-source-id: 4cb94996a9e8ae8c01602ea1da6f4ce9d61fa700
2020-04-28 09:03:51 -07:00
Nikita Shulga
76cb7f2043 Use filelist from build_variables.bzl to fetch distributed file list (#37090)
Summary:
Rename `get_filelist` to `append_filelist`.
Replace the hardcoded filelist under `USE_DISTRIBUTED` with an `append_filelist("libtorch_distributed_sources" TORCH_SRCS)` call.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37090

Test Plan: CI

Differential Revision: D21184002

Pulled By: malfet

fbshipit-source-id: 25bb7f97fcb2bf5bec8bdb3aa059ae13e7610007
2020-04-22 13:13:25 -07:00
Nikita Shulga
4668d47d1f Add build_variable.bzl to CMAKE_RERUN target (#36809)
Summary:
The `configure_file` command adds its input as a top-level dependency, triggering makefile regeneration if the file timestamp has changed.
Also abort CMake if the `exec` of build_variables.bzl fails for some reason.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36809

Test Plan: Add invalid statement to build_variables.bzl and check that build process fails

Differential Revision: D21100721

Pulled By: malfet

fbshipit-source-id: 79a54aa367fb8dedb269c78b9538b4da203d856b
2020-04-17 17:28:07 -07:00
Nikita Shulga
d7fc05b0bf Fetch TORCH_SRCS from build_variables.bzl (#36737)
Summary:
Mimic `.bzl` parsing logic from https://github.com/pytorch/FBGEMM/pull/344
Generate `libtorch_cmake_sources` by running the following script:
```
def read_file(path):
    with open(path) as f:
        return f.read()

def get_cmake_torch_srcs():
    caffe2_cmake = read_file("caffe2/CMakeLists.txt")
    start = caffe2_cmake.find("set(TORCH_SRCS")
    end = caffe2_cmake.find(")", start)
    return caffe2_cmake[start:end+1]

def get_cmake_torch_srcs_list():
    caffe2_torch_srcs = get_cmake_torch_srcs()
    unfiltered_list = [x.strip() for x in get_cmake_torch_srcs().split("\n") if len(x.strip())>0]
    return [x.replace("${TORCH_SRC_DIR}/","torch/") for x in unfiltered_list if 'TORCH_SRC_DIR' in x]

import imp
build_variables = imp.load_source('build_variables', 'tools/build_variables.bzl')
libtorch_core_sources = set(build_variables.libtorch_core_sources)
caffe2_torch_srcs = set(get_cmake_torch_srcs_list())
if not libtorch_core_sources.issubset(caffe2_torch_srcs):
    print("libtorch_core_sources must be a subset of caffe2_torch_srcs")
print(sorted(caffe2_torch_srcs.difference(libtorch_core_sources)))
```

Move common files between `libtorch_cmake_sources` and `libtorch_extra_sources` to `libtorch_jit_core_sources`
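
As an aside, the `imp` module used in the script above is deprecated in Python 3. Assuming `build_variables.bzl` contains only plain Python-style assignments (no Starlark-only constructs), an equivalent load could look like this sketch:

```
def load_bzl_filelists(path="tools/build_variables.bzl"):
    """Execute the .bzl file as Python and return its top-level variables."""
    scope = {}
    with open(path) as f:
        exec(compile(f.read(), path, "exec"), scope)
    return scope

filelists = load_bzl_filelists()
print(len(filelists.get("libtorch_core_sources", [])))
```
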
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36737

Test Plan: CI

Differential Revision: D21078753

Pulled By: malfet

fbshipit-source-id: f46ca48d48aa122188f028136c14687ff52629ed
2020-04-16 19:12:52 -07:00
peter
3bdc4a37ed CMake script cleanup - mixed case for function names (#35589)
Summary:
Running the following code.
```bash
cmake --help-command-list |
grep -v "cmake version" |
while read c; do
    echo 's/\b'"$(echo $c | tr '[:lower:]' '[:upper:]')"'\(\s*\)(/'"$c"'\1(/g'
done >convert.sed &&
git ls-files -z -- bootstrap '*.cmake' '*.cmake.in' '*CMakeLists.txt' |
egrep -z -v '^(cmake/Modules/|cmake/Modules_CUDA_fix/)' |
xargs -0 sed -i -f convert.sed &&
rm convert.sed
```
cmake-lint is too sensitive about mixed case, so I didn't turn the check on.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35589

Differential Revision: D20735648

Pulled By: ezyang

fbshipit-source-id: a09a60a7ce921bb198575a35335faa299bd10b66
2020-03-30 11:37:02 -07:00
peter
45c9ed825a Formatting cmake (to lowercase without space for if/elseif/else/endif) (#35521)
Summary:
Running the following commands:
```bash
shopt -s globstar

sed -e 's/IF (/if(/g' -e 's/IF(/if(/g' -e 's/if (/if(/g' -e 's/ELSE (/else(/g' -e 's/ELSE(/else(/g' -e 's/else (/else(/g' -e 's/ENDif(/endif(/g' -e 's/ELSEif(/elseif(/g' -i CMakeLists.txt
sed -e 's/IF (/if(/g' -e 's/IF(/if(/g' -e 's/if (/if(/g' -e 's/ELSE (/else(/g' -e 's/ELSE(/else(/g' -e 's/else (/else(/g' -e 's/ENDif(/endif(/g' -e 's/ELSEif(/elseif(/g' -i caffe2/**/CMakeLists.txt
sed -e 's/IF (/if(/g' -e 's/IF(/if(/g' -e 's/if (/if(/g' -e 's/ELSE (/else(/g' -e 's/ELSE(/else(/g' -e 's/else (/else(/g' -e 's/ENDif(/endif(/g' -e 's/ELSEif(/elseif(/g' -i torch/**/CMakeLists.txt
sed -e 's/IF (/if(/g' -e 's/IF(/if(/g' -e 's/if (/if(/g' -e 's/ELSE (/else(/g' -e 's/ELSE(/else(/g' -e 's/else (/else(/g' -e 's/ENDif(/endif(/g' -e 's/ELSEif(/elseif(/g' -i c10/**/CMakeLists.txt
sed -e 's/IF (/if(/g' -e 's/IF(/if(/g' -e 's/if (/if(/g' -e 's/ELSE (/else(/g' -e 's/ELSE(/else(/g' -e 's/else (/else(/g' -e 's/ENDif(/endif(/g' -e 's/ELSEif(/elseif(/g' -i cmake/**/*.cmake
sed -e 's/IF (/if(/g' -e 's/IF(/if(/g' -e 's/if (/if(/g' -e 's/ELSE (/else(/g' -e 's/ELSE(/else(/g' -e 's/else (/else(/g' -e 's/ENDif(/endif(/g' -e 's/ELSEif(/elseif(/g' -i cmake/**/*.cmake.in
```
We may further convert all the commands into lowercase according to the following issue: 77543bde41.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35521

Differential Revision: D20704382

Pulled By: malfet

fbshipit-source-id: 42186b9b1660c34428ab7ceb8d3f7a0ced5d2e80
2020-03-27 14:25:17 -07:00
xiaobingsuper
fb70893e78 remove cadd_avx2 dead code (#34883)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/34883

Test Plan: Imported from OSS

Differential Revision: D20611526

Pulled By: ngimel

fbshipit-source-id: 78c80b7361119fc8d2b9f6b4f0c86b61723fe05d
2020-03-24 12:00:56 -07:00
Jiakai Liu
61b680c012 [pytorch] force c10 schema registration for custom build
Summary:
PR #32521 has several issues with mobile builds:
1. It didn't work with static dispatch (which the OSS mobile build currently uses);
2. PR #34275 fixed 1) but it doesn't fix custom build for #32521;
3. manuallyBoxedKernel has a bug with ops which only have catchAllKernel: 2d7ede5f71

Both 1) and 2) have a similar root cause - some JIT-side code expects certain schemas to be registered in the JIT registry.
For example, consider this code snippet: https://github.com/pytorch/pytorch/blob/master/torch/csrc/jit/frontend/builtin_functions.cpp#L10
```
auto scalar_operators_source = CodeTemplate(
    R"SCRIPT(
def mul(a : ${Scalar}, b : Tensor) -> Tensor:
  return b * a
...
```

It expects "aten::mul.Scalar(Tensor self, Scalar other) -> Tensor" to be registered in JIT - it doesn't necessarily need to call the implementation, though; otherwise it will fail some type check: https://github.com/pytorch/pytorch/pull/34013#issuecomment-592982889

Before #32521, all JIT registrations happen in register_aten_ops_*.cpp generated by gen_jit_dispatch.py.
After #32521, for ops with full c10 templated boxing/unboxing support, JIT registrations happen in TypeDefault.cpp/CPUType.cpp/... generated by aten/gen.py, with c10 register API via RegistrationListener in register_c10_ops.cpp. However, c10 registration in TypeDefault.cpp/CPUType.cpp/... are gated by `#ifndef USE_STATIC_DISPATCH`, thus these schemas won't be registered in JIT registry when USE_STATIC_DISPATCH is enabled.

PR #34275 fixes the problem by moving c10 registration out of `#ifndef USE_STATIC_DISPATCH` in TypeDefault.cpp/CPUType.cpp/..., so that all schemas can still be registered in JIT. But it doesn't fix custom build, where we only keep c10 registrations for ops used by the specific model directly (for static dispatch custom build) and indirectly (for dynamic dispatch custom build). Currently there is no way for the custom build script to know that things like "aten::mul.Scalar(Tensor self, Scalar other) -> Tensor" need to be kept, and in fact the implementation is not needed - only the schema needs to be registered in JIT.

Before #32521, the problem was solved by keeping a DUMMY placeholder for unused ops in register_aten_ops_*.cpp: https://github.com/pytorch/pytorch/blob/master/tools/jit/gen_jit_dispatch.py#L326
After #32521, we can do a similar thing by forcing aten/gen.py to register ALL schema strings for selective build - which is what this PR is doing.
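
A toy Python model of that idea - register every schema string so JIT type checking can succeed, but keep kernel implementations only for the ops in the selective-build list. The names and data structures are illustrative, not the actual generated registration code:

```
ALL_SCHEMAS = [
    "aten::mul.Scalar(Tensor self, Scalar other) -> Tensor",
    "aten::conv2d(Tensor input, Tensor weight) -> Tensor",
]
SELECTED_OPS = {"aten::conv2d"}  # ops actually used by the model being built

schema_registry = {}  # what JIT needs for schema/type checks
kernel_registry = {}  # what actually ships in the binary

def register_all(schemas, selected, kernels):
    for schema in schemas:
        name = schema.split("(")[0]
        schema_registry[name] = schema             # always register the schema
        if name in selected and name in kernels:
            kernel_registry[name] = kernels[name]  # keep the impl only if used

register_all(ALL_SCHEMAS, SELECTED_OPS,
             {"aten::conv2d": lambda *args: "conv2d result"})
assert "aten::mul.Scalar" in schema_registry
assert "aten::mul.Scalar" not in kernel_registry
```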

Measured impact on custom build size (for MobileNetV2):
```
SELECTED_OP_LIST=MobileNetV2.yaml scripts/build_pytorch_android.sh armeabi-v7a
```
Before: 3,404,978
After:  3,432,569

~28K compressed size increase due to including more schema strings.

The table below summarizes the relationship between codegen flags and 5 build configurations that are related to mobile:
```
+--------------------------------------+-----------------------------------------------------------------------------+--------------------------------------------+
|                                      |                              Open Source                                    |                  FB BUCK                   |
+--------------------------------------+---------------------+---------------------------+---------------------------+---------------+----------------------------+
|                                      |    Default Build    | Custom Build w/ Stat-Disp | Custom Build w/ Dyna-Disp |   Full-JIT    |         Lite-JIT           |
+--------------------------------------+---------------------+---------------------------+---------------------------+---------------+----------------------------+
| Dispatch Type                        | Static              | Static                    | Dynamic                   | Dynamic (WIP) | Dynamic (WIP)              |
+--------------------------------------+---------------------+---------------------------+---------------------------+---------------+----------------------------+
| ATen/gen.py                          |                     |                           |                           |               |                            |
+--------------------------------------+---------------------+---------------------------+---------------------------+---------------+----------------------------+
| --op_registration_whitelist          | unset               | used root ops             | closure(used root ops)    | unset         | closure(possibly used ops) |
| --backend_whitelist                  | CPU Q-CPU           | CPU Q-CPU                 | CPU Q-CPU                 | CPU Q-CPU     | CPU Q-CPU                  |
| --per_op_registration                | false               | false                     | false                     | false         | true                       |
| --force_schema_registration          | false               | true                      | true                      | false         | false                      |
+--------------------------------------+---------------------+---------------------------+---------------------------+---------------+----------------------------+
| tools/setup_helpers/generate_code.py |                     |                           |                           |               |                            |
+--------------------------------------+---------------------+---------------------------+---------------------------+---------------+----------------------------+
| --disable-autograd                   | true                | true                      | true                      | false         | WIP                        |
| --selected-op-list-path              | file(used root ops) | file(used root ops)       | file(used root ops)       | unset         | WIP                        |
| --disable_gen_tracing                | false               | false                     | false                     | false         | WIP                        |
+--------------------------------------+---------------------+---------------------------+---------------------------+---------------+----------------------------+
```

Differential Revision: D20397421

Test Plan: Imported from OSS

Pulled By: ljk53

fbshipit-source-id: 906750949ecacf68ac1e810fd22ee99f2e968d0b
2020-03-20 20:07:34 -07:00
Jiakai Liu
064c478453 [pytorch] register c10 ops for static dispatch to unblock c10 boxing
Summary:
PR #32521 broke static dispatch because some ops are no longer
registered in register_aten_ops_*.cpp - it expects the c10 registers in
TypeDefault.cpp / CPUType.cpp / etc to register these ops. However, all
c10 registers are inside `#ifndef USE_STATIC_DISPATCH` section.

To measure the OSS mobile build size impact of this PR:
```
 # default build: scripts/build_pytorch_android.sh armeabi-v7a
 # mobilenetv2 custom build: SELECTED_OP_LIST=MobileNetV2.yaml scripts/build_pytorch_android.sh armeabi-v7a
```

- Before this PR, Android AAR size for arm-v7:
* default build: 5.5M;
* mobilenetv2 custom build: 3.2M;

- After this PR:
* default build: 6.4M;
* mobilenetv2 custom build: 3.3M;

It regressed default build size by ~1M because more root ops are
registered by c10 registers, e.g. backward ops which are filtered out by
gen_jit_dispatch.py for inference-only mobile build.

mobilenetv2 custom build size regressed by ~100k presumably because
the op whitelist is not yet applied to things like BackendSelectRegister.

Differential Revision: D20266240

Test Plan: Imported from OSS

Pulled By: ljk53

fbshipit-source-id: 97a9a06779f8c62fe3ff5cce089aa7fa9dee3c4a
2020-03-20 20:07:15 -07:00
Jiakai Liu
3c042a6ab9 [pytorch][mobile] support for custom mobile build with dynamic dispatch (#34055)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34055

Enable custom mobile build with dynamic dispatch for OSS build.

It calls a python util script to calculate transitive dependencies from
the op dependency graph and the list of used root ops, then passes the
result as the op registration whitelist to aten codegen, so that only
these used ops are registered and kept at link time.

For custom build with dynamic dispatch to work correctly, it's critical
to have an accurate list of used ops. The current assumption is that only
those ops referenced by the TorchScript model are used. This works well if
client code doesn't call the libtorch API (e.g. tensor methods) directly;
otherwise the extra used ops need to be added to the whitelist manually,
as shown by the HACK in prepare_model.py.

Also, if JIT starts calling extra ops independent of the specific model,
then those extra ops need to be added to the whitelist as well.
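
The transitive-dependency calculation mentioned above is an ordinary graph closure; a minimal Python sketch (the graph contents below are made up for illustration):

```
def closure(root_ops, dep_graph):
    """Return the root ops plus everything they transitively depend on."""
    keep, stack = set(), list(root_ops)
    while stack:
        op = stack.pop()
        if op in keep:
            continue
        keep.add(op)
        stack.extend(dep_graph.get(op, ()))
    return keep

# Illustrative graph: op -> ops it may call internally.
dep_graph = {
    "aten::conv2d": ["aten::_convolution"],
    "aten::_convolution": ["aten::contiguous"],
}
print(sorted(closure(["aten::conv2d", "aten::relu"], dep_graph)))
# -> ['aten::_convolution', 'aten::contiguous', 'aten::conv2d', 'aten::relu']
```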

Verified the correctness of the whole process with MobileNetV2:
```
TEST_CUSTOM_BUILD_DYNAMIC=1 test/mobile/custom_build/build.sh
```

Test Plan: Imported from OSS

Reviewed By: bhosmer

Differential Revision: D20193327

Pulled By: ljk53

fbshipit-source-id: 9d369b8864856b098342aea79e0ac8eec04149aa
2020-03-03 19:25:16 -08:00
xiaobing.zhang
b678256bfb Move glu to Aten(CPU) (#33179)
Summary:
This PR moves glu to ATen (CPU).
Test script:
```
import torch
import torch.nn.functional as F
import time

torch.manual_seed(0)

def _time():
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    return time.time()

device = "cpu"

#warm up
for n in [10, 100, 1000, 10000]:
    input = torch.randn(128, n, requires_grad=True, device=device)
    grad_output = torch.ones(128, n // 2, device=device)
    for i in range(1000):
        output = F.glu(input)
        output.backward(grad_output)

for n in [10, 100, 1000, 10000]:
    fwd_t = 0
    bwd_t = 0
    input = torch.randn(128, n, requires_grad=True, device=device)
    grad_output = torch.ones(128, n // 2, device=device)
    for i in range(10000):
        t1 = _time()
        output = F.glu(input)
        t2 = _time()
        output.backward(grad_output)
        t3 = _time()
        fwd_t = fwd_t + (t2 -t1)
        bwd_t = bwd_t + (t3 - t2)
    fwd_avg = fwd_t / 10000 * 1000
    bwd_avg = bwd_t / 10000 * 1000
    print("input size(128, %d) forward time is %.2f (ms); backwad avg time is %.2f (ms)."
          % (n, fwd_avg, bwd_avg))
```
Test device: **skx-8180.**
Before:
```
input size(128, 10) forward time is 0.04 (ms); backward avg time is 0.08 (ms).
input size(128, 100) forward time is 0.06 (ms); backward avg time is 0.14 (ms).
input size(128, 1000) forward time is 0.11 (ms); backward avg time is 0.31 (ms).
input size(128, 10000) forward time is 1.52 (ms); backward avg time is 2.04 (ms).
```
After:
```
input size(128, 10) forward time is 0.02 (ms); backward avg time is 0.05 (ms).
input size(128, 100) forward time is 0.04 (ms); backward avg time is 0.09 (ms).
input size(128, 1000) forward time is 0.07 (ms); backward avg time is 0.17 (ms).
input size(128, 10000) forward time is 0.13 (ms); backward avg time is 1.03 (ms).
```
Fix https://github.com/pytorch/pytorch/issues/24707, https://github.com/pytorch/pytorch/issues/24708.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33179

Differential Revision: D19839835

Pulled By: VitalyFedyunin

fbshipit-source-id: e4d3438556a1068da2c4a7e573d6bbf8d2a6e2b9
2020-02-28 14:54:38 -08:00
Francis Charette Migneault
0150f40dde dont force msvc /Ox flag which can conflict with /RTC1 in debug config (#33164)
Summary:
Relates to https://github.com/pytorch/pytorch/issues/33132

This fix doesn't add the full multi-configuration support described in https://github.com/pytorch/pytorch/issues/33132, but it at least avoids the error presented in the issue when `CMAKE_BUILD_TYPE=Debug` is used with MSVC.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33164

Differential Revision: D19899727

Pulled By: ezyang

fbshipit-source-id: 28a364d920c4a3fb577c6b484ccd69a133fbcf5d
2020-02-13 22:15:20 -08:00
Edward Yang
0b6186d778 Remove Tensor.h, TensorMethods.h from src/core. (#27086)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27086

This is a major source of merge conflicts, and AFAICT isn't necessary anymore (it may have been necessary for some mobile build stuff in the past).

This is a commandeer of #25031

Test Plan: Imported from OSS

Reviewed By: ljk53

Differential Revision: D17687345

Pulled By: ezyang

fbshipit-source-id: bf6131af835ed1f9e3c10699c81d4454a240445f
2019-10-06 09:37:50 -07:00
Sebastian Messmer
8321f2592e Register ATen ops with c10 (#26131)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26131

Changes in this PR:
- For each operator with use_c10_dispatcher: True, additionally generate a c10 registration line in TypeDefault.cpp, CPUType.cpp, and other backend files.
- This doesn't change globalATenDispatch yet; the c10 registration is purely additional and the operator calling path doesn't change. A diff further up the stack will change these things.
- Enable the use_c10_dispatcher: True flag for ~70% of operators
- This also changes the c10->jit operator export because ATen ops are already exported to JIT directly and we don't want to export the registered c10 ops because they would clash
- For this, we need a way to recognize if a certain operator is already moved from ATen to c10, this is done by generating a OpsAlreadyMovedToC10.cpp file with the list. A diff further up in the stack will also need this file to make sure we don't break the backend extension API for these ops.

Reasons for some ops to be excluded (i.e. not have the `use_c10_dispatcher` flag set to true):
- `Tensor?(a!)` (i.e. optional tensor with annotations) not supported in c++ function schema parser yet
- `-> void` in native_functions.yaml vs `-> ()` expected by function schema parser
- out functions have different argument order in C++ as in the jit schema
- `Tensor?` (i.e. optional tensor) doesn't work nicely yet because an absent tensor is sometimes represented as an undefined tensor and sometimes as None.
- fixed-size arrays like `int[3]` not supported in c10 yet

These will be fixed in separate diffs and then the exclusion tag will be removed.
ghstack-source-id: 90060748

Test Plan: a diff stacked on top uses these registrations to call these ops from ATen

Differential Revision: D16603131

fbshipit-source-id: 315eb83d0b567eb0cd49973060b44ee1d6d64bfb
2019-09-13 13:52:40 -07:00
James Reed
817f4502fb Dynamic dispatch for optimized quantized op kernels (#25545)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25545

This re-uses the infrastructure from ATen/native/cpu, which compiles kernels multiple times for different instruction sets and dispatches dynamically based on the CPU's capability flags at runtime. This ensures we use the optimal quantized kernel for the given machine.
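
Conceptually: compile each kernel several times for different instruction sets, then pick the best variant the running CPU supports. A Python sketch of the selection step only (the capability names and kernel variants are illustrative assumptions, not the actual ATen implementation):

```
# Kernel variants ordered from most to least preferred (illustrative names).
KERNEL_VARIANTS = ["qadd_avx2", "qadd_avx", "qadd_default"]

def variant_is_supported(variant, cpu_flags):
    """Check whether the CPU advertises the instruction set a variant needs."""
    if variant.endswith("avx2"):
        return "avx2" in cpu_flags
    if variant.endswith("avx"):
        return "avx" in cpu_flags
    return True  # the scalar/default build always works

def pick_kernel(cpu_flags):
    for variant in KERNEL_VARIANTS:
        if variant_is_supported(variant, cpu_flags):
            return variant
    raise RuntimeError("no usable kernel variant")

print(pick_kernel({"sse4_2", "avx"}))  # -> 'qadd_avx'
print(pick_kernel({"avx", "avx2"}))    # -> 'qadd_avx2'
```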

Test Plan: Imported from OSS

Differential Revision: D17166369

Pulled By: jamesr66a

fbshipit-source-id: 8c8393f99365e1408819bbaf254c1b5734a34b70
2019-09-04 13:26:40 -07:00
Sebastian Messmer
791347642b Allow TensorMethods.h to include Dispatcher.h (alternative) (#23888)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23888

This is an alternative to https://github.com/pytorch/pytorch/pull/23684.

Instead of splitting a bunch of headers into declaration and definition, we change tensor includes to only include the tensor declaration when the tensor definition isn't needed.
ghstack-source-id: 89357687

Test Plan: waitforsandcastle

Differential Revision: D16673569

fbshipit-source-id: fa1d92809b05de7910a8c2dc2f55abe071ca63bf
2019-09-04 01:35:19 -07:00
Roy Li
2a698682e4 Remove Type dispatch (#21964)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21964
ghimport-source-id: fdfb555ac4efbf31ae7d2c700a5aa44ad0cc4d7f

Test Plan: Imported from OSS

Differential Revision: D15897424

Pulled By: li-roy

fbshipit-source-id: 3cd6744254e34d70e6875ffde749b5cf959b663c
2019-06-30 04:11:35 -07:00
Sam Gross
b90790ab1b Don't split 256-bit AVX2 load/store intrinsics (#20609)
Summary:
Recent versions of GCC split unaligned load and store intrinsics into
two 128-bit instructions. On old processors (Sandy Bridge) this was a
bit faster for unaligned data, but a bit slower for aligned data. On new
processors (Intel Haswell+, recent AMD) splitting loads is slower on
both aligned and unaligned data.

Clang, MSVC, and ICC do not split unaligned load and store intrinsics.

There's a good explanation here:
https://stackoverflow.com/questions/52626726/why-doesnt-gcc-resolve-mm256-loadu-pd-as-single-vmovupd#tab-top

Splitting load and store intrinsics makes no sense in our AVX2
configuration because the CPUs that support AVX2 instructions are the
same CPUs where splitting is disadvantageous for all data alignments.

Note that this doesn't change the AVX configuration (used by CPUs that
support AVX but not AVX2). It's possible this would be beneficial for
that configuration too (our data is usually 32-byte aligned), but I'd
prefer the conservative change for now.

torch.add generated assembly (hot loop) (GCC 7.3.0)
before:
https://gist.github.com/colesbury/066376537bccd514daf8fe4ab54d8295

after:
https://gist.github.com/colesbury/8b4b948145001d44b225c51d2428bb91

Timing of `torch.add(x, y, out=z)` for size 10240 (1 thread, Broadwell,
no turbo):
before: 7.35 us after: 6.39 us

(Take the torch.add timings with a grain of salt. The difference in timings
is much larger than I would expect.)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20609

Differential Revision: D15385800

Pulled By: colesbury

fbshipit-source-id: 66415b148a3b19360b9de9881af594ab46547b6f
2019-05-17 09:16:17 -07:00
Jiakai Liu
c7c02724cd CMakeLists changes to enable libtorch for Android (#19762)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19762
ghimport-source-id: 287aa7fea4efd38994e14d794123eb2046b91fc0

Differential Revision: D15087653

Pulled By: ljk53

fbshipit-source-id: 4498ff9f7f7903c3e25541184302b811267958e9
2019-05-03 09:28:53 -07:00
Jiakai Liu
8cd6d2f101 rename BUILD_ATEN_MOBILE to INTERN_BUILD_MOBILE and make it private (#19942)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19942
ghimport-source-id: 6bacc8f5ad7911af8cf5fde9fcb604ade666b862

Reviewed By: dzhulgakov

Differential Revision: D15144325

Pulled By: ljk53

fbshipit-source-id: d63a70f007110d5d1055d6bec1ed09a1a6aafdae
2019-05-01 00:20:24 -07:00
Gemfield
20159c3ffe remove redundant --install_dir parameter in GEN_COMMAND (#18473)
Summary:
Remove the redundant --install_dir parameter in GEN_COMMAND, since the "--install_dir" parameter is already contained in ${GEN_COMMAND}.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18473

Differential Revision: D14620193

Pulled By: ezyang

fbshipit-source-id: ee9953b5d055f4b8beb3557f95f6539051b0028a
2019-03-26 10:22:00 -07:00
Owen Anderson
fc2d8c6889 Eliminate PYCMD in favor of PYTHON_EXECUTABLE in CMake.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16522

Differential Revision: D13867376

Pulled By: resistor

fbshipit-source-id: 6bce68facea83c5161a31fcdfafe08827999eb2b
2019-01-30 17:13:43 -08:00
andersj
8a5ba577c1 Revert "remove use of tmp_install" (#15847)
Summary:
This reverts commit 04bf528589.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15847

Differential Revision: D13603174

Pulled By: anderspapitto

fbshipit-source-id: ae321434d3345ad94fad67bf71fd027cddeb4588
2019-01-08 16:30:19 -08:00
andersj
04bf528589 remove use of tmp_install
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14553

Differential Revision: D13583335

Pulled By: anderspapitto

fbshipit-source-id: 8711fead9eda877c1037a0bc59f91a3d2e01f3e0
2019-01-04 13:48:12 -08:00
Zachary DeVito
60f02b87be fix an issue where two rules build the same .py files
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15230

Differential Revision: D13471625

Pulled By: zdevito

fbshipit-source-id: a982413a308c7a9bb5b6a82fe96fd3de44f555aa
2018-12-14 14:52:52 -08:00
Edward Yang
b710642969 Make ATen HIPify out-of-place, but still reuse CUDA names. (#14866)
Summary:
```
    This diff changes the HIPification of ATen to be out-of-place.
    We now have the following mappings:

    - ATen/cuda => ATen/hip
    - ATen/native/cuda => ATen/native/hip
    - ATen/native/sparse/cuda => ATen/native/sparse/hip
    - THC => THH
    - THCUNN => THHUNN

    The build system is adjusted to know about these new build paths,
    and HIPify is taught how to adjust include paths and
    THC_GENERIC_FILE appropriately.  ATen_hip is now built as
    the ATen_hip library, rather than reusing ATen_cuda.

    However, despite these new filepaths, none of the identifiers in ATen
    have actually changed.  So, e.g., THHGeneral.h still defines functions
    named THC_blahblah, and HIP still shows up as CUDA in PyTorch itself.
    We'll tackle this in a subsequent PR; this diff is just to get the files
    out-of-place.

    Minor extra improvements:

    - Don't edit tmp_install when hipifying
    - HIP no longer builds native_cudnn_cpp; it was unnecessary
    - Caffe2_HIP_INCLUDES is now Caffe2_HIP_INCLUDE, for consistency
      with all the other variables.
    - HIP build now properly respects ATEN_CUDA_FILES_GEN_LIB (it
      did not previously.)
    - You can now override file extension matching in pyHIPIFY
      by explicitly specifying its full name in the matching list.
      This is used so we can HIPify CMakeLists.txt in some situations.

    A little bit of string and ceiling wax:

    - gen.py grows a --rocm flag so that it knows to generate CUDA
      files which actually refer to the HIP headers (e.g., THH.h)
      We'll get rid of this eventually and generate real HIP files,
      but not for this PR.
    - Management of HIP dependencies is now completely deleted
      from the ATen CMakeLists.txt.  The old code was dead (because
      it was shoveled in ATen_CUDA_DEPENDENCY_LIBS and promptly
      ignored by the Caffe2 build system) and didn't actually work.
```

Stacked on https://github.com/pytorch/pytorch/pull/14849 review last commit only
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14866

Differential Revision: D13419475

Pulled By: ezyang

fbshipit-source-id: cb4c843df69a1d8369314c9fab1b7719520fa3db
2018-12-11 19:15:27 -08:00
Christian Puhrsch
f564163951 Remove SSE-only code and convolve5x5 (#12109)
Summary:
Performance-oriented code will use AVX/AVX2, so we don't need SSE-specific code anymore. This will also reduce the probability of running into an error on legacy CPUs.

On top of this, convolve is covered by modern libraries such as MKLDNN, which are much more performant and which we now build against by default (even for builds from source).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12109

Differential Revision: D10055134

Pulled By: colesbury

fbshipit-source-id: 789b8a34d5936d9c144bcde410c30f7eb1c776fa
2018-10-09 10:53:50 -07:00
Gregory Chanan
9a7c196040 Move Type, Tensor, TensorMethods to core.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11519

Reviewed By: yf225

Differential Revision: D9771684

Pulled By: gchanan

fbshipit-source-id: a57ee2072af99ce856f895c688b09d750a8606e0
2018-09-12 13:10:54 -07:00
Christian Puhrsch
aeb6094538 Unify opt flag for cmake codegen (#11227)
Summary:
Also enables debug for non-MSVC for kernel codegen
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11227

Differential Revision: D9656506

Pulled By: cpuhrsch

fbshipit-source-id: 667195cb55de1a1a9042b6b1c4436e9c6c743333
2018-09-05 08:55:49 -07:00
Mingzhe Li
f0d8a36e70 Completely remove build_aten and use_aten (#10469)
Summary:
Breaking out of #8338 to completely remove build_aten and use_aten.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10469

Reviewed By: orionr

Differential Revision: D9413639

Pulled By: mingzhe09088

fbshipit-source-id: b7203aa4f5f2bb95c504c8dc187a3167f2570183
2018-08-20 20:26:42 -07:00
Edward Yang
64a6f17177 Fix ATen/core header installation. (#10463)
Summary:
Fixes #10353 and fixes #10397.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10463

Differential Revision: D9296491

Pulled By: ezyang

fbshipit-source-id: f825c2a21a113e44a6f5c1c5ec17814d9deac366
2018-08-13 09:25:49 -07:00
Edward Yang
37a226de63 When BUILD_ATEN=OFF, use ATen/core directly (#10019)
Summary:
ATenCore.h is a dummy header to just test that this is working at all.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10019

Reviewed By: smessmer

Differential Revision: D9067262

Pulled By: ezyang

fbshipit-source-id: 58bab9c0aa83b56335e36b719b9b6505400d8dee
2018-07-30 21:09:55 -07:00
Christian Puhrsch
e9e47ce8f1 Vectorize sigmoid (#8612)
Summary:
This PR ports the vectorization of sigmoid to also enable better performance for non-contiguous arrays. Detailed timings will follow shortly.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/8612

Reviewed By: ezyang

Differential Revision: D8712298

Pulled By: cpuhrsch

fbshipit-source-id: 01a3d06af8d04513edd024ab1d01a6b753fc6f6a
2018-07-10 12:40:39 -07:00
anderspapitto
41ef5c2d4b Support for generating ATen during the fbcode build, rather than committing the generated files (#8002)
Paint the internal bikeshed a slightly different color to appease Buck tooling.
2018-06-01 16:04:02 -04:00
Orion Reblitz-Richardson
4bf0202cac [build] Have PyTorch depend on minimal libcaffe2.so instead of libATen.so (#7399)
* Have PyTorch depend on minimal libcaffe2.so instead of libATen.so

* Build ATen tests as a part of Caffe2 build

* Hopefully cufft and nvcc fPIC fixes

* Make ATen install components optional

* Add tests back for ATen and fix TH build

* Fixes for test_install.sh script

* Fixes for cpp_build/build_all.sh

* Fixes for aten/tools/run_tests.sh

* Switch ATen cmake calls to USE_CUDA instead of NO_CUDA

* Attempt at fix for aten/tools/run_tests.sh

* Fix typo in last commit

* Fix valgrind call after pushd

* Be forgiving about USE_CUDA disable like PyTorch

* More fixes on the install side

* Link all libcaffe2 during test run

* Make cuDNN optional for ATen right now

* Potential fix for non-CUDA builds

* Use NCCL_ROOT_DIR environment variable

* Pass -fPIC through nvcc to base compiler/linker

* Remove THCUNN.h requirement for libtorch gen

* Add Mac test for -Wmaybe-uninitialized

* Potential Windows and Mac fixes

* Move MSVC target props to shared function

* Disable cpp_build/libtorch tests on Mac

* Disable sleef for Windows builds

* Move protos under BUILD_CAFFE2

* Remove space from linker flags passed with -Wl

* Remove ATen from Caffe2 dep libs since directly included

* Potential Windows fixes

* Preserve options while sleef builds

* Force BUILD_SHARED_LIBS flag for Caffe2 builds

* Set DYLD_LIBRARY_PATH and LD_LIBRARY_PATH for Mac testing

* Pass TORCH_CUDA_ARCH_LIST directly in cuda.cmake

* Fixes for the last two changes

* Potential fix for Mac build failure

* Switch Caffe2 to build_caffe2 dir to not conflict

* Cleanup FindMKL.cmake

* Another attempt at Mac cpp_build fix

* Clear cpp-build directory for Mac builds

* Disable test in Mac build/test to match cmake
2018-05-24 07:47:27 -07:00