Commit Graph

37 Commits

Author SHA1 Message Date
Gu, Jinghui
575aebc182 implement operators for DNNLOWP (#18656)
Summary:
Implement operators for DNNLOWP, including int8_conv, int8_FC, int8_pooling, int8_relu, int8_sum, quantize/dequantize, and order_swtich operators.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18656

Differential Revision: D14767092

Pulled By: yinghai

fbshipit-source-id: 1f3e24929a358a42214da333bd304c593ea4468f
2019-04-10 12:04:39 -07:00
Orion Reblitz-Richardson
febc7ff99f Add __init__.py so files get picked up on install (#14898)
Summary:
This will let us install tests and other Caffe2 python code as a part of running Caffe2 tests in PyTorch.

Broken out of https://github.com/pytorch/pytorch/pull/13733/

cc pjh5 yf225
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14898

Reviewed By: pjh5

Differential Revision: D13381123

Pulled By: orionr

fbshipit-source-id: 0ec96629b0570f6cc2abb1d1d6fce084e7464dbe
2018-12-07 13:40:23 -08:00
Gu, Jinghui
dbab9b73b6 seperate mkl, mklml, and mkldnn (#12170)
Summary:
1. Remove avx2 support in mkldnn
2. Seperate mkl, mklml, and mkldnn
3. Fix convfusion test case
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12170

Reviewed By: yinghai

Differential Revision: D10207126

Pulled By: orionr

fbshipit-source-id: 1e62eb47943f426a89d57e2d2606439f2b04fd51
2018-10-29 10:52:55 -07:00
Viswanath Sivakumar
ade97afc74 Re-enable IDEEP graph rewrite test (#12661)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12661

Was disabled since workspace.has_mkldnn is now set to false

Reviewed By: yinghai

Differential Revision: D10383913

fbshipit-source-id: ad6dc705f0606b3711e8b450dc384ad3ebb87686
2018-10-15 15:50:28 -07:00
Guan Pang
17637f2b03 enable_mkl support for resnet18+lstm model
Summary:
* Many op in lstm part of the model don't have implementation in ideep/mkl, and it doesn't make sense to copy back and forth for the few available ops because majority of RNN will be on CPU
* Thus the strategy is to enable mkl only for the resnet18 part of the model, then switch to default cpu engine for the lstm part

* The net may contain some external_inputs falsely added during ONNX->Caffe2. Canary in service shows their existence could leads to service crash (presumably due to these blob somehow get shared between threads). They're now manually removed which seem to be enough to avoid the crash.

Reviewed By: viswanathgs

Differential Revision: D8888763

fbshipit-source-id: da7761bcb7d876ff7bbb6640ae4b24712c0b1de6
2018-09-12 18:56:46 -07:00
Yinghai Lu
346de2535d Workaround lack of 0-dim support in ideep (#8959)
Summary:
Closes https://github.com/pytorch/pytorch/pull/8959

MKL-DNN doesn't have support to  0-dim tensor. As a workaround, we produce CPUTensor instead of Ideep tensor in the fallback ops. And for those tensors, we don't need Ideep copy op anymore.

Reviewed By: viswanathgs

Differential Revision: D8665168

fbshipit-source-id: 59678de2c5aed8c691ab5caaadede6d6c000dd7b
2018-06-27 20:24:28 -07:00
sf-wind
5b86c3af4a
Update from facebook (#8384)
* [fix] fixup the bias multiplier data access issue

Hotfix for failues in conv_transpose

* [D2][Easy]: lint regularizer

lint with black

* [GanH]: Split mu in adaptive weight for diagnose

* [Dper] Add the ability to split FC weights into multiple smaller ones

* fix SumReduceLikeOp for empty blob

as desc.

* add ctc_greedy_decoder for caffe2

ctc_greedy_decoder same as tf's

* Update event callback handling

Allow multiple callbacks per event

* Add WeightedSum layer

The motivation is to do weighted sum in HoNet/crossnet, in the next diff, I'll replace model.Add with model.WeightedSum in
honet: https://fburl.com/f4rmolg2
crossnet: https://fburl.com/v7awn8se, https://fburl.com/63filbnm

* Replicate DAG's behavior

Some callers expect RunAsync to block, replicate that behavior in case of
explicit 'dag' net type

* [dper] layernorm layer

as title

* Override dag, async_dag, async_polling

Overriding dag, async_dag and async_polling with async_scheduling

* Name the thread pools

Caffe thread pools currently inherit the thread names from the thread that starts them, which can be misleading. Give them an explicit name instead.

* [Caffe2] FilleOp should support int64_t dimensions

Change argument type to int64_t for shape argument of FillerOp (used in ConstantFill, XavierFill, etc)

* Remove caffe2/caffe2/contrib/torch/

It's not used anywhere and depends on old lua torch that conflicts with Aten. Given PT1 it's not relevant any more (though it was nice and clever code!)

#accept2ship

* Fix linearWarmup multiplier check

The multiplier needs to be non-negative, not strictly positive.

* Revert D3314316

This is after 2 years and we do not seem to have a use case for this one, so
for the sake of clean API design we should potentially remove this. This would
allow us to potentially pass in arguments to optionally construct an object,
although it is indeed a little bit unclear how we can reuse existing objects if
constructor arguments are passed in. In any case, we may want to remove this
dangling feature.

* Speedup generate proposals by partial_sort.

Speedup generate proposals by partial_sort.

FACEBOOK:
- Saw speed improvement for training with this op.
- Yanghan benchmarked the op on a small dataset and see consistent 100% improvement on speed (6ms -> 3ms) on 420 input resolution. See next diff for details.

* More parallel processing friendly for CPP version of GenerateProposals.

More parallel processing friendly for CPP version of GenerateProposals.

* [DT] [43/n] Lift stop conditions inside reader code back to flow control

1. Split multi_reader function into local_reader and remote_reader
2. Lifted stop conditions inside Limiter back to flow control
3. Split epoch flow building logic into 3 cases:
  - single machine (1 reader, 1 trainer on trainer0 node, no PS)
  - (1 reader + 1 trainer) on trainer0 node, has PS
  - multiple readers, readers do not share nodes with trainers, might have PS or not

* Resolve conflicts for torch/_thnn/utils.py

* [Caffe2] Handle image decoding errors

Image decoding errors can make the whole training fail. This diff is to handle them
1.Catch imdecode exceptions and check if decoded image has zero columns or rows. This is counted as decoding errors.
2.Replace the image with empty in case of error
3.Count the number of errors and throw runtime exception if the rate reaches given number

The empty image data is kept. It might introduce noise in the training data.

* Update MKL exporter to IDEEP ops

TSIA

* [Caffe2] GlobalInit is thread safe, fixing the comment

With the mutex and lock, GlobalInit is thread safe.
Update the comments.

* Back out "Add support for generating ATen files during fbcode build"

Original commit changeset: 28970ddba353

@override-unit-failures
(Note: this ignores all push blocking failures!)

* [DT]: fix predictor save

similar to D6610058, here we add the fix for distributed online training

* Remove net_singlethread_async_gpu.cc

Closes https://github.com/caffe2/caffe2/pull/2528

This removes net_singlethread_async_gpu.cc as part of our effort to clean
CUDAContext and the net executors.

* Inline DFS task execution

Add a DFS inline task execution mode in executor

* Add c10 folder to fbcode

This adds the c10 folder and its test cases to fbcode. Build flags are mostly taken from aten.

* add dependencies for online trainer

Add some dependencies so that the online model can use DataPipeline and PredictionTransform operators

Relevent post: https://fb.intern.facebook.com/groups/1324375037655677/permalink/1740993462660497/

* Resolve conflicts for tools/jit/gen_jit_dispatch.py

* [Fix] sparse regularization in distributed training

* Support advanced pooling options in sum processor

* support advanced pooling options in sum processor
* remove redundant code
* support attention in sum processor

* Improve shard logging in net tracing code

Make it handle arbitrary shard ids instead of just one digit ids.

* [Caffe2] Call GlobalInit in predictor only in mobile

FACEBOOK:
Calling GlobalInit long after the program starts may not be safe. There are issues if the following happens:

User does not call GlobalInit and initFacebook after program starts
User sets a flag manually: https://fburl.com/mcsumw7d
User calls OSS predictor.
OSS predictor calls GlobalInit
GlobalInit calls initFacebook
initFacebook resets all flags: https://fburl.com/tolszha1
Thus, the user manually set flags are overwritten

This would happen anytime GlobalInit is called long after the program starts.
I suppose the intention of the user in this case is not to call GlobalInit throughout the program,
but use Caffe2 regardless (is that desired?)
But adding GlobalInit in the OSS predictor would automatically call GlobalInit when using Caffe2.

This issue doesn't exist in mobile, since initFacebook is not called on mobile.

For now, guard the GlobalInit in predictor for mobile only.
May want to ensure the GlobalInit is always called at the start of the program. @[3501714:kutta] has seen weird issues when not calling GlobalInit at the start of the program on server side. He has made some progress on this.

* resolve conflicts for caffe2/core/logging_is_google_glog.h and test/test_torch.py

* Add empty fix for SumLikeReduceOp

Add empty fix for SumLikeReduceOp

* Revert D7962948: [caffe2][nomnigraph] Concat elim for sparseNN

This reverts commit f7f434dc5c34ca6058b9765d2ef615453d2276a9

@bypass-lint

An infra SEV is better than not reverting this diff.
If you copy this password, see you in SEV Review!
@cause_a_sev_many_files

* Remove Declarations.yaml

* Include common.h

* Change std::stoi to caffe2::stoi

* Add thread_name.cc to the CMake file

* No need to subtract 1. Fix test segfaults

* Fix NetTest, ObserverTest

Fix tests

(cherry picked from commit 3767e66c3f365596cba3d46d3e7322c933a0ab41)

* CTCGreedyDecoderOp only has CPU implementation, test should only run on CPU

* Add a variable to avoid conversion resizing issue

* [fix] fixup the bias multiplier data access issue

Hotfix for failues in conv_transpose

* [D2][Easy]: lint regularizer

lint with black

* [GanH]: Split mu in adaptive weight for diagnose

* [Dper] Add the ability to split FC weights into multiple smaller ones

* fix SumReduceLikeOp for empty blob

as desc.

* add ctc_greedy_decoder for caffe2

ctc_greedy_decoder same as tf's

* Update event callback handling

Allow multiple callbacks per event

* Add WeightedSum layer

The motivation is to do weighted sum in HoNet/crossnet, in the next diff, I'll replace model.Add with model.WeightedSum in
honet: https://fburl.com/f4rmolg2
crossnet: https://fburl.com/v7awn8se, https://fburl.com/63filbnm

* Replicate DAG's behavior

Some callers expect RunAsync to block, replicate that behavior in case of
explicit 'dag' net type

* [dper] layernorm layer

as title

* Override dag, async_dag, async_polling

Overriding dag, async_dag and async_polling with async_scheduling

* Name the thread pools

Caffe thread pools currently inherit the thread names from the thread that starts them, which can be misleading. Give them an explicit name instead.

* [Caffe2] FilleOp should support int64_t dimensions

Change argument type to int64_t for shape argument of FillerOp (used in ConstantFill, XavierFill, etc)

* Remove caffe2/caffe2/contrib/torch/

It's not used anywhere and depends on old lua torch that conflicts with Aten. Given PT1 it's not relevant any more (though it was nice and clever code!)

#accept2ship

* Fix linearWarmup multiplier check

The multiplier needs to be non-negative, not strictly positive.

* Revert D3314316

This is after 2 years and we do not seem to have a use case for this one, so
for the sake of clean API design we should potentially remove this. This would
allow us to potentially pass in arguments to optionally construct an object,
although it is indeed a little bit unclear how we can reuse existing objects if
constructor arguments are passed in. In any case, we may want to remove this
dangling feature.

* Speedup generate proposals by partial_sort.

Speedup generate proposals by partial_sort.

FACEBOOK:
- Saw speed improvement for training with this op.
- Yanghan benchmarked the op on a small dataset and see consistent 100% improvement on speed (6ms -> 3ms) on 420 input resolution. See next diff for details.

* More parallel processing friendly for CPP version of GenerateProposals.

More parallel processing friendly for CPP version of GenerateProposals.

* [DT] [43/n] Lift stop conditions inside reader code back to flow control

1. Split multi_reader function into local_reader and remote_reader
2. Lifted stop conditions inside Limiter back to flow control
3. Split epoch flow building logic into 3 cases:
  - single machine (1 reader, 1 trainer on trainer0 node, no PS)
  - (1 reader + 1 trainer) on trainer0 node, has PS
  - multiple readers, readers do not share nodes with trainers, might have PS or not

* Resolve conflicts for torch/_thnn/utils.py

* [Caffe2] Handle image decoding errors

Image decoding errors can make the whole training fail. This diff is to handle them
1.Catch imdecode exceptions and check if decoded image has zero columns or rows. This is counted as decoding errors.
2.Replace the image with empty in case of error
3.Count the number of errors and throw runtime exception if the rate reaches given number

The empty image data is kept. It might introduce noise in the training data.

* Update MKL exporter to IDEEP ops

TSIA

* [Caffe2] GlobalInit is thread safe, fixing the comment

With the mutex and lock, GlobalInit is thread safe.
Update the comments.

* Back out "Add support for generating ATen files during fbcode build"

Original commit changeset: 28970ddba353

@override-unit-failures
(Note: this ignores all push blocking failures!)

* [DT]: fix predictor save

similar to D6610058, here we add the fix for distributed online training

* Remove net_singlethread_async_gpu.cc

Closes https://github.com/caffe2/caffe2/pull/2528

This removes net_singlethread_async_gpu.cc as part of our effort to clean
CUDAContext and the net executors.

* Inline DFS task execution

Add a DFS inline task execution mode in executor

* Add c10 folder to fbcode

This adds the c10 folder and its test cases to fbcode. Build flags are mostly taken from aten.

* add dependencies for online trainer

Add some dependencies so that the online model can use DataPipeline and PredictionTransform operators

Relevent post: https://fb.intern.facebook.com/groups/1324375037655677/permalink/1740993462660497/

* Resolve conflicts for tools/jit/gen_jit_dispatch.py

* [Fix] sparse regularization in distributed training

* Support advanced pooling options in sum processor

* support advanced pooling options in sum processor
* remove redundant code
* support attention in sum processor

* Improve shard logging in net tracing code

Make it handle arbitrary shard ids instead of just one digit ids.

* [Caffe2] Call GlobalInit in predictor only in mobile

FACEBOOK:
Calling GlobalInit long after the program starts may not be safe. There are issues if the following happens:

User does not call GlobalInit and initFacebook after program starts
User sets a flag manually: https://fburl.com/mcsumw7d
User calls OSS predictor.
OSS predictor calls GlobalInit
GlobalInit calls initFacebook
initFacebook resets all flags: https://fburl.com/tolszha1
Thus, the user manually set flags are overwritten

This would happen anytime GlobalInit is called long after the program starts.
I suppose the intention of the user in this case is not to call GlobalInit throughout the program,
but use Caffe2 regardless (is that desired?)
But adding GlobalInit in the OSS predictor would automatically call GlobalInit when using Caffe2.

This issue doesn't exist in mobile, since initFacebook is not called on mobile.

For now, guard the GlobalInit in predictor for mobile only.
May want to ensure the GlobalInit is always called at the start of the program. @[3501714:kutta] has seen weird issues when not calling GlobalInit at the start of the program on server side. He has made some progress on this.

* resolve conflicts for caffe2/core/logging_is_google_glog.h and test/test_torch.py

* Add empty fix for SumLikeReduceOp

Add empty fix for SumLikeReduceOp

* Revert D7962948: [caffe2][nomnigraph] Concat elim for sparseNN

This reverts commit f7f434dc5c34ca6058b9765d2ef615453d2276a9

@bypass-lint

An infra SEV is better than not reverting this diff.
If you copy this password, see you in SEV Review!
@cause_a_sev_many_files

* Remove Declarations.yaml

* Include common.h

* Change std::stoi to caffe2::stoi

* Add thread_name.cc to the CMake file

* No need to subtract 1. Fix test segfaults

* Fix NetTest, ObserverTest

Fix tests

(cherry picked from commit 3767e66c3f365596cba3d46d3e7322c933a0ab41)

* CTCGreedyDecoderOp only has CPU implementation, test should only run on CPU

* Add a variable to avoid conversion resizing issue

* Remove the code per soumith's comments

* Remove the code per soumith's comments

* Remove blank lines in the end of file

* Resolve conflicts for torch/_thnn/utils.py

* Update MKL exporter to IDEEP ops

TSIA

* Back out "Add support for generating ATen files during fbcode build"

Original commit changeset: 28970ddba353

@override-unit-failures
(Note: this ignores all push blocking failures!)

* add dependencies for online trainer

Add some dependencies so that the online model can use DataPipeline and PredictionTransform operators

Relevent post: https://fb.intern.facebook.com/groups/1324375037655677/permalink/1740993462660497/

* Resolve conflicts for tools/jit/gen_jit_dispatch.py

* Support advanced pooling options in sum processor

* support advanced pooling options in sum processor
* remove redundant code
* support attention in sum processor

* resolve conflicts for caffe2/core/logging_is_google_glog.h and test/test_torch.py

* Revert D7962948: [caffe2][nomnigraph] Concat elim for sparseNN

This reverts commit f7f434dc5c34ca6058b9765d2ef615453d2276a9

@bypass-lint

An infra SEV is better than not reverting this diff.
If you copy this password, see you in SEV Review!
@cause_a_sev_many_files

* Remove Declarations.yaml

* Include common.h

* Change std::stoi to caffe2::stoi

* [caffe2] uprade IDEEP and hotfix for conv op accuracy issue (#8364)

* [IDEEP] Upgrade IDEEP version

Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com>

* [IDEEP] Fix accuracy issue in conv op

Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com>

* Fix build error due to lack of src in CMakeLists

Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com>

* Remove the code per soumith's comments

* [ONNX] Add an ATen fallback pathway for ONNX export (#8273)

* ATen fallback for ONNX export

* Move to enum

* Fix model test

* Add comment

* Address comments

BC interface

* Remove imaginary file (#8415)

* [Caffe2] Enable AMD/MIOPEN ops for Caffe2  (#8306)

* Add hip support for caffe2 core

* Add MIOPEN header/wrapper to caffe2 core

* Add HIP device into caffe2 PB

* top level makefile change for rocm/hip

* makefile scaffolding for AMD/RocM/HIP

* Makefile scafodding for AMD/RocM/HIP; add makefile/utility for HIP files

* caffe2 PB update for AMD/ROCM HIP device

* Add AMD/RocM/Thrust dependency

* HIP threadpool update

* Fix makefile macro

* makefile fix: duplicate test/binary name

* makefile clean-up

* makefile clean-up

* add HIP operator registry

* add utilities for hip device

* Add USE_HIP to config summary

* makefile fix for BUILD_TEST

* merge latest

* Fix indentation

* code clean-up

* Guard builds without HIP and use the same cmake script as PyTorch to find HIP

* Setup rocm environment variables in build.sh (ideally should be done in the docker images)

* setup locale

* set HIP_PLATFORM

* Revert "set HIP_PLATFORM"

This reverts commit 8ec58db2b390c9259220c49fa34cd403568300ad.

* continue the build script environment variables mess

* HCC_AMDGPU_TARGET

* Cleanup the mess, has been fixed in the lastest docker images

* Assign protobuf field hip_gpu_id a new field number for backward compatibility

* change name to avoid conflict

* Fix duplicated thread pool flag

* Refactor cmake files to not add hip includes and libs globally

* Fix the wrong usage of environment variables detection in cmake

* Add MIOPEN CNN operators

* Revert "Add MIOPEN CNN operators"

This reverts commit 6e89ad4385b5b8967a7854c4adda52c012cee42a.

* Add MIOPEN pooling operator

* Add MIOPEN activation operator

* Add MIOPEN softmax operator

* Add MIOPEN spatial batch norm operator

* Add MIOPEN loacl response normalization operator

* Add MIOPEN conv operator

* Clean-up LRN ops

* enable fp16 in MIOPEN pool ops

* Enable fp16 for MIOPEN relu op

* Enable fp16 for MIOPEN spatial batch norm op

* code clean-up

* revert float16 support

* Create Caffe2 python binding for AMD/ROCM/HIP

* Add op fallback for HIP operator

* add hip src/test files in cmake

* exclude hip src/test files

* fix python binding for hip backend

* fix MIOPEN pooling op workspace

* hack to compile miopen operators

* fix include path for MIOPEN ops

* Fix include path

* Add HIP math utilities

* Fix path for HIP math utils

* cmake fix

* Cmake fix / hipcc for hip files

* suppress hipcc warning

* cmake fix /replcae USE_HIP with USE_ROCM

* revert LoadHIP.cmake change

* fix include for thrust/cub-hip

* include path fix for conversion.h

* Updated with latest upstream changes

* clang format fixes

* Context_hip updates

* Fixed typo in rocblas handle get function

* Updated hipified math utils

* Updated math hip test util

* Updated context hip test

* Updated common_hip

* Updated net async dag for HIP

* Added MIOPEN in operator hip test

* fix

* C2 dependencies clean-up

* fix include path for building custom protobuf

* Decouple miopen pool op and conv_pool_op base

* cmake refactor

* fix operator_hip_test

* move all hip/miopen ops files into caffe2/operators/hip

* sanitize cmake

* permission issue

* remove extra parenthesis

* remove artifact from resolving merge conflict

* cont. sanitize cmake files

* fix syntax error

* sanitize conversion.h

* .

* Revert "."

This reverts commit 56020cb0e996a31ae27bf1f8f491955ed0b121b9.

* clang-format

* Enable some reduce operators' ONNX backend tests (#8418)

* fix old comment to point to the right file (#8416)

* Stop pinning nccl version. (#8421)

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Expose logsumexp docs and mark log_sum_exp in distributions for internal use (#8428)

* Enable some of the ONNX backend test on broadcasting (#8423)

* Enable some of the ONNX backend test on broadcasting

* enable gemm broadcast

* Expose proto utils and ONNX (#8073)

* Expose proto utils and ONNX from PyTorch libcaffe2.so

* Try to use protobuf from _C.so

* Fix ONNX proto header include

* Adjust order of imports for ONNX until nanopb goes away

* Set and use ONNX_NAMESPACE for PyTorch builds

* Show protobuf summary for all builds

* Add ONNX_NAMESPACE for cpp_build

* Statically link libprotobuf.a into libtorch.so

* Set ONNX_NAMESPACE on Windows build

* Move core/dispatch up as well

* Add /MD flag for Windows build of _C

* Potential Windows fix for ONNX and protobuf

* Add direct linkage from _C to ONNX on Windows

* Only include protobuf wrapper for PyTorch

* Pass extra_compile_args to _nvrtc ext build

* Remove installation of .a files

* Rebase creates some weird situations, revert them manually

* Remove more weird changes due to rebase

* Need to add thread_name.cc after merge
2018-06-13 13:10:45 -07:00
Viswanath Sivakumar
d301d9df7a [ideep] Fuse Conv-Relu after IDEEP graph rewrite, skip group conv (#8233)
IDEEP supports fusion for non-group conv
2018-06-08 10:29:15 -07:00
Viswanath Sivakumar
4df86b6547 Update MKL exporter to IDEEP ops (#8228)
IDEEP exporter support
2018-06-06 21:43:43 -07:00
Yinghai Lu
8b70f7d248
[Caffe2] Clean up ideep integration (#6881)
* Clean up ideep integrtation

* .

* Remove redundant code in convnet benchmark

* MKL ON

* Do not add -mavx2 everywhere

* .

* Comments

* rename

* .
2018-04-24 18:32:35 -07:00
Orion Reblitz-Richardson
1d5780d42c Remove Apache headers from source.
* LICENSE file contains details, so removing from individual source files.
2018-03-27 13:10:18 -07:00
Viswanath Sivakumar
231d6f7b09 Add SqueezeOp in MKLDNN
Summary:
SqueezeOp support to drop drop dims of size 1. MKLMemory now supports Reshape()
if the buffer is in plain layout, in which case just the dims and layouts are
modified similar to caffe2::Tensor. SqueezeOp takes care of converting the
input to plain layout if needed via an intermediate buffer before calling
Reshape().

Differential Revision: D6735656

fbshipit-source-id: 953309498370e1b8986e8c593bc6963f38036255
2018-01-22 18:39:42 -08:00
Viswanath Sivakumar
b5d513b1f9 Add op in MKLDNN
Summary:
Just redirects to MKLSumOp. Doesn't support broadcast though since dnnSumCreate
expects identical dims.

Differential Revision: D6729788

fbshipit-source-id: 3e189465ad9d026bec4954648562ffe4e67fc393
2018-01-21 08:21:43 -08:00
Viswanath Sivakumar
b2964a92d9 Add MKLConcatOp
Summary:
MKLConcatOp along the channel dim of NCHW tensors. Spec:
https://software.intel.com/en-us/mkl-developer-reference-c-dnnconcatcreate

Reviewed By: ajtulloch

Differential Revision: D6689716

fbshipit-source-id: 492bc440474f8ce37caa85509789496659b03e79
2018-01-11 14:19:22 -08:00
Viswanath Sivakumar
073312eade Updates to MKL conversion script
Summary: Handling some special cases.

Reviewed By: ajtulloch

Differential Revision: D6647011

fbshipit-source-id: 6a434442da5e0a63d355242cb8df9418885c6fb4
2018-01-08 12:25:23 -08:00
Viswanath Sivakumar
4cf13cf417 Fix crash due to copying empty tensors into MKLMemory
Summary:
Ran into a scenario where if the CPU op in MKLFallbackOp outputs an empty
tensor, attempting to copy the output to MKLMemory (https://fburl.com/www2mtt4)
crashes. Modify MKLMemory to gracefully handle this. This is done at the
MKLMemory level because we want to make sure that its members such as dims and
layout are Reset() correctly.

Interestingly, MKL calls fail at different points for dims {0} and dims {0,N} despite
the buffer size being empty for both - former in dnnAllocateBuffer and
the latter in dnnConversionExecute (likely due to some difference in
layout?).

Also fixed CopyTo in addition to CopyFrom and tested all scenarios.

Reviewed By: ajtulloch

Differential Revision: D6646320

fbshipit-source-id: 61df585f610a949f312f05308baf310241dc9cb2
2017-12-30 15:36:48 -08:00
Andrew Tulloch
77b78935f2 More extensions
Reviewed By: kevinbchen

Differential Revision: D6300944

fbshipit-source-id: e915c3f3d6b475752d8b7df82ec467d86f88a7c7
2017-11-20 17:18:51 -08:00
Yangqing Jia
4d152ab931 disable sbn running mean and var comp
Summary:
cc pietern
Closes https://github.com/caffe2/caffe2/pull/1454

Differential Revision: D6310656

Pulled By: Yangqing

fbshipit-source-id: fa9a1e44b6289eb59e0388325c39b11d3b3e3ad4
2017-11-13 12:09:08 -08:00
Andrew Tulloch
ebae2f6c71 MKL Sigmoid op wrapper
Reviewed By: Yangqing

Differential Revision: D6222910

fbshipit-source-id: 92d0825a6a35a4bf6a12636e3d5dd8affcffeef3
2017-11-02 17:30:29 -07:00
Andrew Tulloch
a7644e4f4b Extend rewrite functionality to handle multiple outputs.
Summary: Still assumes a complete subgraph, but slightly more generic.

Reviewed By: Yangqing

Differential Revision: D6103228

fbshipit-source-id: bfa0d46067e05baa0478a4c37a67ccf8f81f34ec
2017-11-02 17:30:27 -07:00
Yangqing Jia
8286ce1e3a Re-license to Apache
Summary: Closes https://github.com/caffe2/caffe2/pull/1260

Differential Revision: D5906739

Pulled By: Yangqing

fbshipit-source-id: e482ba9ba60b5337d9165f28f7ec68d4518a0902
2017-09-28 16:22:00 -07:00
Andrew Tulloch
133dc2603e Support grouped convolutions in MKL
Reviewed By: Yangqing

Differential Revision: D5487692

fbshipit-source-id: 94fb66b3b104cf16dcad07743def4ea940515689
2017-07-25 14:19:02 -07:00
Andrew Tulloch
d86f32ae2e Implement simple graph rewrite functionality.
Reviewed By: Yangqing

Differential Revision: D5487075

fbshipit-source-id: f7c7867c5cbae39cf197cf5e7ed8a64149f33208
2017-07-25 14:19:01 -07:00
Andrew Tulloch
9e6ea2987f MKLReluOp supports in-place X/Y
Reviewed By: Yangqing

Differential Revision: D5487060

fbshipit-source-id: 35d2d450f46aefc3c9395be45af99e13d1c168ec
2017-07-25 14:19:00 -07:00
Andrew Tulloch
af43e2b251 MKLConvOp handles the no bias case
Reviewed By: Yangqing

Differential Revision: D5487050

fbshipit-source-id: 4791943d331a2d7283f0f9b939f3f03e32dbdbed
2017-07-25 14:18:58 -07:00
Andrew Tulloch
f028e74fb7 Implement a filler op test
Reviewed By: Yangqing

Differential Revision: D5487042

fbshipit-source-id: 0b03683fd3822769381c14790c0c2e46162d1aaf
2017-07-25 14:18:57 -07:00
Andrew Tulloch
bbf2b578dc Implement MKL CopyTo/CopyFrom ops
Reviewed By: Yangqing

Differential Revision: D5482636

fbshipit-source-id: d044c495837aef985210f0b63d61f88f9acc3db7
2017-07-25 14:18:55 -07:00
Andrew Tulloch
71d04fd5cc Implement SumOp for MKL
Reviewed By: Yangqing

Differential Revision: D5482622

fbshipit-source-id: e1e8f8aebce874efc31fab2c870cd274ca0d037c
2017-07-25 14:18:54 -07:00
Andrew Tulloch
ad7d7657a4 Add tests to targets
Reviewed By: Yangqing

Differential Revision: D5482614

fbshipit-source-id: 04727e19b7b83b6d0d41ad3227866957480bc1ee
2017-07-25 14:18:54 -07:00
Andrew Tulloch
007492e730 Fix MKL spatial pooling test
Summary: tsia

Reviewed By: Yangqing

Differential Revision: D5482603

fbshipit-source-id: e95a8829c71125623066cfee3b76e774c7f3a46b
2017-07-25 14:18:53 -07:00
Andrew Tulloch
a7d8f489d9 Improe MKL SpatialBN test
Summary: tsia

Reviewed By: Yangqing

Differential Revision: D5482596

fbshipit-source-id: 2817ceb57154dcefffec3251efc397cba8163097
2017-07-25 14:18:52 -07:00
Hao Shi
c4c3797b0d Deprecate CNNModelHelper - Inception()
Summary:
This diff deprecates `CNNModelHelper` in `Inception()` only.

Depends on D5248848

Reviewed By: harouwu

Differential Revision: D5249312

fbshipit-source-id: 2818fb54bbae203956ed5cd5fb547508923c52a6
2017-06-15 14:03:27 -07:00
Hao Shi
b0625ff566 Deprecate CNNModelHelper - VGGA()
Summary:
This diff deprecates `CNNModelHelper` in `VGGA()` function.

Depends on D5247946

Reviewed By: harouwu

Differential Revision: D5248848

fbshipit-source-id: ede9113edb2024e4db8f0048f812050233e3fb40
2017-06-15 14:03:26 -07:00
Hao Shi
4aff677d3d Deprecate CNNModelHelper - OverFeat()
Summary:
This diff deprecates `CNNModelHelper` in `OverFeat()` function.

Depends on D5247004

Reviewed By: harouwu

Differential Revision: D5247946

fbshipit-source-id: 6a5299ec71f78e0b81a43212a028651522ab8f4b
2017-06-15 14:03:25 -07:00
Hao Shi
078589d7c6 Deprecate CNNModelHelper - AlexNet()
Summary:
This diff deprecates `CNNModelHelper` in the `AlexNet()` function. More diffs will be coming to deprecate the helper in other functions.

Depends on D5241738

Reviewed By: harouwu

Differential Revision: D5247004

fbshipit-source-id: eec5c5ef916a85de8289cb92d2174a6a4b8075bf
2017-06-15 14:03:24 -07:00
Hao Shi
c095b3f67f Deprecate CNNModelHelper - MLP()
Summary: This diff deprecates `CNNModelHelper` in `MLP()` function.

Reviewed By: harouwu

Differential Revision: D5241738

fbshipit-source-id: 03669a4166a02257aa5779860d06b40d7496104d
2017-06-15 14:03:23 -07:00
intel
b3b66e3d00 MKL related files with review comments incorporated
Summary:
This PR is based on commit "977c6b3" as this version allows MKL to use all the cores available.
All MKL related files are added here after incorporating review comments, major changes include

1. usage of Clang-format(Linter) with --style = Google
2. usage of macros for checking input and filter dimension in the mkl operators
3. merged Max and Average pooling functions
4. created a new folder for mkl related python scripts in Python folder and moved them there
5. there is no mkl_alexnet_test.py as that was redundant while convnet_benchmark.py does the same thing
Closes https://github.com/caffe2/caffe2/pull/270

Differential Revision: D4905219

Pulled By: Yangqing

fbshipit-source-id: e5f5b189714a835b93b9ebda24c52e09572dfca7
2017-04-25 00:31:29 -07:00