Summary:
When building iOS apps with a caffe2 dependency, we were seeing the error `caffe2/caffe2/mobile/contrib/ios/mpscnn/mpscnn.mm:33:17: error: method 'copyWithZone:' in protocol 'NSCopying' not implemented [-Werror,-Wprotocol]`. This fixes it by implementing that method as a shallow copy.
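A minimal sketch of the shape of the fix, assuming a wrapper class that conforms to NSCopying (the class and property names here are hypothetical; the real change is in mpscnn.mm):
```
// Sketch only: a shallow copyWithZone: satisfying NSCopying.
#import <Foundation/Foundation.h>

@interface MPSCNNWrapper : NSObject <NSCopying>
@property (nonatomic, strong) id payload; // shared, not deep-copied
@end

@implementation MPSCNNWrapper
- (id)copyWithZone:(NSZone*)zone {
  MPSCNNWrapper* copy = [[[self class] allocWithZone:zone] init];
  copy.payload = self.payload; // shallow copy: share the underlying object
  return copy;
}
@end
```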
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9748
Reviewed By: jerryzh168
Differential Revision: D8954332
Pulled By: williamtwilson
fbshipit-source-id: 0cd44408257c0bd3f4ffb80312ea9d13d13e5ff3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9350
Re-apply #9270
Breaking this out of #8338
This takes care of the Eigen failure we saw on Mac CUDA builds when BUILD_CAFFE2 and BUILD_ATEN were removed. The fix is to isolate Eigen from headers that are included by .cu files and processed by nvcc. This was worked on with smessmer.
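The general pattern, as a hedged sketch (file and function names are hypothetical, not the literal PR contents): keep Eigen types out of any header a .cu file can reach, and confine the Eigen includes to a host-compiled .cc file.
```
// math_impl.h -- hypothetical header, safe to include from .cu files:
// no Eigen includes, only plain declarations.
#pragma once
namespace caffe2 {
void ScaleCPU(int n, float alpha, const float* x, float* y);
}

// math_impl.cc -- compiled by the host compiler only, so Eigen's heavily
// templated headers never pass through nvcc.
#include "math_impl.h"
#include <Eigen/Core>
namespace caffe2 {
void ScaleCPU(int n, float alpha, const float* x, float* y) {
  Eigen::Map<Eigen::VectorXf>(y, n) =
      alpha * Eigen::Map<const Eigen::VectorXf>(x, n);
}
}
```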
Reviewed By: mingzhe09088
Differential Revision: D8794431
fbshipit-source-id: de656334af46c697802073f8e8d9a6aeb9ca65a7
Summary:
Breaking this out of #8338
This takes care of the Eigen failure we saw on Mac CUDA builds when BUILD_CAFFE2 and BUILD_ATEN were removed. The fix is to isolate Eigen from headers that are included by .cu files and processed by nvcc. This was worked on with smessmer.
cc mingzhe09088 smessmer BIT-silence Yangqing
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9270
Reviewed By: mingzhe09088
Differential Revision: D8768025
Pulled By: orionr
fbshipit-source-id: 5b34017aeb67e35a1b5938d962181ccd4cd37591
* [mpscnn] MPSCNNChannelShuffle
As titled.
* [Easy] Adding tags as an argument to the functional layer
Without it, "tags" would be added as an argument to the operator.
The change here is based on the assumption that no operator takes "tags" as an argument.
* Fix locally_connected_op schema check.
* [C2] Add TypeAndShape inference for few more operators
As described in the title.
* [c2] Shape inference should support 0 as dimension
Tensors can have 0 as one of their dimensions.
* Make MockHiveReader loop over its input and support max_examples
Replace DatasetReader with RandomDatasetReader, so that MockHiveReader can simulate a large data input using a small sample file as the source.
* Utility function to wipe cache between benchmark runs
The Caffe2 benchmark does not wipe the cache between runs, which can paint an unrealistically optimistic picture of performance. This diff adds a utility function to wipe the cache.
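A minimal sketch of such a utility, assuming the usual approach of streaming through a buffer larger than the last-level cache (the name and size below are illustrative, not the actual Caffe2 helper):
```
#include <cstddef>
#include <vector>

// Hypothetical helper: touch a buffer larger than the LLC so tensor data
// cached by the previous run is evicted before the next timed iteration.
void WipeCache() {
  constexpr std::size_t kWipeBytes = 64 * 1024 * 1024;  // > typical LLC size
  static std::vector<char> scratch(kWipeBytes);
  volatile char sink = 0;
  for (std::size_t i = 0; i < scratch.size(); i += 64) {  // one touch per line
    scratch[i]++;
    sink = sink + scratch[i];  // volatile sink keeps the loop from being elided
  }
  (void)sink;
}
```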
* Allow caffe2 GlobalInit to be invoked multiple times
Allow caffe2 GlobalInit to be invoked multiple times. Successive invocations will re-parse gflags and update logging levels, but will not re-run init functions or perform other one-time initialization.
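A simplified sketch of the re-entry logic (the helper declarations are assumptions for this sketch, not the actual code paths):
```
#include <atomic>

namespace caffe2 {
// Assumed helpers for this sketch; the real implementation differs.
void ParseCaffeCommandLineFlags(int* argc, char*** argv);
void UpdateLoggingLevelsFromFlags();
void RunRegisteredInitFunctions();

// Simplified sketch of a re-entrant GlobalInit.
bool GlobalInit(int* argc, char*** argv) {
  static std::atomic<bool> initialized{false};
  // Safe to repeat on every call: re-parse gflags, refresh logging levels.
  ParseCaffeCommandLineFlags(argc, argv);
  UpdateLoggingLevelsFromFlags();
  if (initialized.exchange(true)) {
    return true;  // skip one-time work on repeat invocations
  }
  RunRegisteredInitFunctions();  // init functions run exactly once
  return true;
}
} // namespace caffe2
```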
* Add Caffe2 GlobalInitIsCalledGuard to base net and operator classes
Warn if caffe2's GlobalInit function has not been invoked before creating an operator or net object. This is based on discussion here: https://fb.quip.com/kqGIAbmK7vNG
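Roughly, the guard can look like this (a simplified sketch; the query helper is an assumption here):
```
#include <iostream>

bool GlobalInitAlreadyRun();  // assumed query into caffe2's init state

// Sketch: constructed from the Operator and Net base-class constructors,
// it warns if GlobalInit has not been called yet.
struct GlobalInitIsCalledGuard {
  GlobalInitIsCalledGuard() {
    if (!GlobalInitAlreadyRun()) {
      std::cerr << "Caffe2 GlobalInit should be run before any other API calls."
                << std::endl;
    }
  }
};
```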
* Rethrow current exception on failure
Rethrow the current exception instead of copy-constructing a new one on op failure.
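In C++ terms, the difference is roughly the following (a generic illustration, not the literal Caffe2 code):
```
#include <exception>
#include <stdexcept>

void RunOp() { throw std::runtime_error("op failed"); }

void RunGuarded() {
  try {
    RunOp();
  } catch (const std::exception& e) {
    // Bad: `throw e;` copy-constructs a std::exception, slicing the
    // runtime_error and losing the derived type.
    // Good: rethrow the in-flight exception object unchanged.
    std::rethrow_exception(std::current_exception());
    // (inside a catch block, a bare `throw;` does the same)
  }
}
```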
* Make `clone()` return subclass of List/Struct
`clone()` does not work correctly when we subclass those classes.
* Wipe the cache before the net run
The utility function is copied from D7409424; will rebase once D7409424 lands.
* [Caffe2] [Mobile] Support utils/cast.h::GetCastDataType with LITE_PROTO builds
* Correct includes
async_polling include -> async_base include
* Prepare execution flags for executor migration
Make async_scheduling aware of the underlying net type to prepare for executor migration.
* Add operator level observers into async executor
Add operator-level observers to operators' RunAsync calls.
* Cleanup TEST_Benchmark
Remove duplicate code and provide a default implementation in NetBase.
* [C2] Fix type and shape inference for binary comparison ops
As described in the title.
* Add GlobalInit to predictor to ensure initialization is always done before prediction
Redo D7651453 the correct way: now use a static variable for the arguments passed to GLog.
* Remove spammy log message
This method is currently used in various places inside Caffe2 itself.
* Disable events for operators inside a chain
We don't need to use events for operators within a chain because the chain is
always scheduled on a single stream; we keep only the first and last events for
scheduling purposes.
* Ensure correct finish run order
In rare cases we might call finishRun and trigger the net's destruction while
another worker still holds a shared_ptr to the thread pool; that can cause the
pool to be destructed from within one of its own worker threads if no other
nets are using it. This diff fixes the order of calls in finishRun and also
changes pool() to return a raw pointer, keeping ownership of the pool within the net.
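A simplified sketch of the ownership change (class and member names are simplified):
```
#include <memory>

class TaskThreadPool;  // defined elsewhere in Caffe2

class AsyncNetBase {
 public:
  // Before: returned the shared_ptr, so a worker could become the last
  // owner and destroy the pool from one of the pool's own threads.
  // After: hand out a raw pointer; the net keeps sole ownership.
  TaskThreadPool* pool() { return pool_.get(); }

 private:
  std::shared_ptr<TaskThreadPool> pool_;
};
```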
* Reduce unnecessary polling
Make sure we don't waste CPU by polling operators that we can set efficient
callbacks on.
* Squash commit of syncing 9506eeb from GitHub to fbcode
Patch xplat buck fix
add virtual destructor to OptimizationPass
build fixes for sync
* Fix net tracing
Fix net tracing in async_scheduling.
* Fix logging
ARM64 clang from the Android NDK doesn't define __ARM_NEON__, which results in a perf regression on some models. I figured out that some compilers define __ARM_NEON__ while others define __ARM_NEON. This patch changes all NEON-specific parts of Caffe2 to check both macros.
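The check that covers both spellings (the `CAFFE2_NEON_ENABLED` flag below is a hypothetical name for this sketch):
```
// Check both spellings: older toolchains define __ARM_NEON__, while
// ARM64 clang from the Android NDK defines only __ARM_NEON.
#if defined(__ARM_NEON__) || defined(__ARM_NEON)
#include <arm_neon.h>
#define CAFFE2_NEON_ENABLED 1  // hypothetical flag for this sketch
#endif
```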
LOG(INFO) can be stripped out at compile time or disabled at run time,
but there are hardly any use cases where we want to call TEST_Benchmark
and not see the result. Additionally, on Android, LOG(INFO) writes to
logcat, which is fine for errors/warnings but inconvenient for benchmark
results, since on new phones logcat is flooded with logs.
- Remove USE_ARM64 option because it doesn't do what is expected
- Disable ARM ComputeLibrary for non-ARM/ARM64 builds
- Remove analysis of CMake options from scripts/build_android.sh
- Add user-specified CMake options at the end of command line to allow overriding defaults
- Update README for the ARM ComputeLibrary integration; no longer require disabling NNPACK for ARM64 builds with ARM ComputeLibrary
Summary:
To build with tests and benchmarks:
`./scripts/build_android.sh -G Ninja -DBUILD_TEST=ON -DUSE_NNAPI=ON`
To run the unit tests:
`adb push build_android/bin/nnapi_test /data/local/tmp`
`adb shell "cd /data/local/tmp && ./nnapi_test"`
To run the benchmark:
`adb push build_android/bin/nnapi_benchmark /data/local/tmp`
`adb shell "cd /data/local/tmp && ./nnapi_benchmark"`
Tested on a Google Pixel 2 XL with Android 8.1
Closes https://github.com/caffe2/caffe2/pull/1918
Reviewed By: Maratyszcza
Differential Revision: D6944604
Pulled By: hlu1
fbshipit-source-id: 462f010117ae4628b23bef506c41397de3817ad4
Summary: Integrate Android NNAPI into Caffe2. Supported ops include average pool, max pool, conv, relu, and softmax.
Reviewed By: Maratyszcza
Differential Revision: D6560366
fbshipit-source-id: 2879a99c01acb050e711d9d7d5bde022ef95888d
Summary:
We are going to deprecate the NNPACK bindings in caffe2/contrib/nnpack.
The first step, implemented in this diff, is to move the modern NNPACK bindings
from caffe2/mobile/contrib/ios/ to caffe2/share/contrib/nnpack/.
Reviewed By: sf-wind
Differential Revision: D6687454
fbshipit-source-id: 458614bade92ab5ba5d2ab7f0691071043198b57
Summary:
Imported and modified from https://github.com/ARM-software/vulkan-sdk
I changed libvulkan-stub.cpp to libvulkan-stub.c
Reviewed By: Maratyszcza
Differential Revision: D6641092
fbshipit-source-id: 1a7fbf745d58b6111a06a983910c583912365357
Summary: Use MPSCNNDepthwiseConv when groups == input_channels
Reviewed By: ajtulloch
Differential Revision: D6541561
fbshipit-source-id: 7164f26b8f3a101c0ab5c3e6c02ed855397d2750
Summary: Ran into some issues where these values seemed to be initialized to 0 and caused some trouble. Initializing to 1 is safe and well defined.
Reviewed By: hlu1
Differential Revision: D6582774
fbshipit-source-id: 088ec4e782d9680a1d9b4d2d42523d06cbc7dd72
Summary: Turns out that, similar to RoIWarp, col2im in the custom ConvTranspose implementation is also missing a bounds check on the image.
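Schematically, the accumulation loop needs a guard before writing into the image. A generic single-channel col2im sketch, not the literal Caffe2 kernel (all names are illustrative):
```
// The fix is the bounds check before writing into `img`: positions that
// fall into the padding region lie outside the image.
void Col2ImAcc(const float* col, int height, int width,
               int kernel_h, int kernel_w, int pad_t, int pad_l,
               int stride_h, int stride_w, int out_h, int out_w,
               float* img) {
  int col_idx = 0;
  for (int kh = 0; kh < kernel_h; ++kh) {
    for (int kw = 0; kw < kernel_w; ++kw) {
      for (int oh = 0; oh < out_h; ++oh) {
        for (int ow = 0; ow < out_w; ++ow, ++col_idx) {
          const int h = oh * stride_h - pad_t + kh;
          const int w = ow * stride_w - pad_l + kw;
          if (h >= 0 && h < height && w >= 0 && w < width) {
            img[h * width + w] += col[col_idx];
          }
        }
      }
    }
  }
}
```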
Reviewed By: ajtulloch
Differential Revision: D6494061
fbshipit-source-id: 1fadbdd05f360b20343df49b70d2be65eab128ac
Summary: Fix MPSCNNRoIWarp and make it more general across channels.
Reviewed By: ajtulloch
Differential Revision: D6493869
fbshipit-source-id: 77cfa2e2f3bd80efc6e69a0774793e0162d9942a
Summary: The case when sampling_ratio = 0 was skipped before; this diff enables that setting.
Reviewed By: ajtulloch
Differential Revision: D6366669
fbshipit-source-id: 4f3b9eaf47eb9dc20823935428d3d886ea32a5fc
Summary: This is a reapplication of the earlier PR, needed due to the xplat move. The original author is Christoph Conrads <christoph.conrads@fluent.ai> (christoph-conrads).
Reviewed By: houseroad
Differential Revision: D6379736
fbshipit-source-id: b7482ecf3b9487a528c15e92976e915791210002
Summary:
The source files are not exposed to the parent directory in mobile. Expose them now so that the files are built in OSS.
Closes https://github.com/caffe2/caffe2/pull/1435
Reviewed By: akyrola
Differential Revision: D6274056
Pulled By: sf-wind
fbshipit-source-id: 6b54645bc9a42b4329d8aa20051abeb5fc6b1c37
Summary:
Replaces the FB-internal NNPACK fork with the open-source version.
The important FB features have already been upstreamed to the GitHub repo.
Reviewed By: ajtulloch
Differential Revision: D6224054
fbshipit-source-id: 4dbe02b4da97648a663586414550c2d4e23c7221
Summary:
Makes the necessary changes to support the Caffe2 OpenGL ES backend on NVIDIA Tegra devices:
- Remove the no_bounds global because the Tegra GLES driver doesn't recognize it as a constant; define a BOUNDS_CHECK_MODE macro instead
- Recognize "NVIDIA Tegra" as a supported GL_RENDERER
Reviewed By: hlu1
Differential Revision: D6030760
fbshipit-source-id: e3655467612469d69c70b3fee35edb2d6774a793
Summary:
- Separate class definition into header file
- Remove uniform buffer initialization in the constructor because it's not necessary
- Separate tiling and batching code
Reviewed By: jerryzh168
Differential Revision: D5960502
fbshipit-source-id: 5e3bce5192ce6dc69868be1722f490f690d87076
Summary: Clean up the Metal remnants in BUCK now that the Metal code has been removed.
Reviewed By: bwasti
Differential Revision: D5966095
fbshipit-source-id: 6b022624fe91a6728549d93d2954328c6b4e059e
Summary:
D5772847 is breaking real-time style transfer on Android and the conv unit tests on iPhone 7 upgraded to iOS 11.
The temporary fix in D5908415 only fixes Android; iPhone 7 is still crashing.
I think these two diffs should be backed out until D5772847 is fully debugged.
Reviewed By: fricc33
Differential Revision: D5913834
fbshipit-source-id: b8072c59c83adfed8a0b0ab0f42c39bc4398c7a0
Summary:
Implementation of MPSCNNMul that, for now, only supports multiplying a tensor by a scalar value.
Benchmark runtimes for CPU, OpenGL, and MPSCNN:
```
I0919 21:15:17.942468 3068398464 net_simple.cc:103] Main run finished. Milliseconds per iter: 527.795. Iters per second: 1.89467
I0919 21:15:21.043023 3068398464 opengl_test.cc:2293] Main run finished. Milliseconds per iter: 249.766. Iters per second: 4.00374
I0919 21:15:23.182369 3068398464 net_simple.cc:103] Main run finished. Milliseconds per iter: 175.548. Iters per second: 5.69644
```
Reviewed By: hlu1
Differential Revision: D5870100
fbshipit-source-id: 2aadd5d134f3b8b40a41f638040cbef35a0086df
Summary: Remove the redundant `namespace caffe2 {}` block, because all the code inside opengl_test.cc is already wrapped in the caffe2 namespace.
Reviewed By: Maratyszcza
Differential Revision: D5829458
fbshipit-source-id: e68dde08a1c3dc4c41260f5f028ca7efe8d34fbd
Summary: Kernel data and other shader parameters are now cached directly into uniform buffer blocks, and the blocks are dynamically attached at run time.
Reviewed By: hlu1
Differential Revision: D5772847
fbshipit-source-id: 746448c2d5db12e38fb883874ede3acfccb9f6ef
Summary: The Android segmentation net was failing with MPSCNN because some fused MPSCNNConvRelu ops become in-place after fusion.
Reviewed By: fricc33
Differential Revision: D5803245
fbshipit-source-id: 6808e9c3504389c113c7a16504d6554e83bdcc3e
Summary:
If the Gloo InfiniBand transport is used, the Gloo algorithms can use
GPUDirect to DMA directly from/to GPU memory. This is done through the
CudaDeviceWorkspace. This change adds a "gpu_direct" option to the
Allreduce operator that makes it use GPUDirect if the transport
supports it.
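For illustration, an Allreduce OperatorDef opting in might be built like this (a hedged sketch using the generated Caffe2 protobuf API; the blob names and the `GLOO` engine tag are assumptions):
```
#include "caffe2/proto/caffe2.pb.h"

// Construct an Allreduce op that opts into GPUDirect when the Gloo
// transport supports it. Blob names and engine tag are illustrative.
caffe2::OperatorDef MakeGpuDirectAllreduce() {
  caffe2::OperatorDef def;
  def.set_type("Allreduce");
  def.set_engine("GLOO");       // assumed engine selection
  def.add_input("comm_world");  // assumed common-world blob name
  def.add_input("grad");
  def.add_output("grad");
  auto* arg = def.add_arg();
  arg->set_name("gpu_direct");
  arg->set_i(1);
  return def;
}
```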
Closes https://github.com/caffe2/caffe2/pull/1203
Reviewed By: wesolwsk
Differential Revision: D5806366
Pulled By: pietern
fbshipit-source-id: 9e9a78f059f2b5c6e4fbf6574b7db4776a94696c
Summary: The convolution should not run with more than one input texture slice when tiling is enabled.
Differential Revision: D5774187
fbshipit-source-id: 5e94f82cd65e0d4425a7a0090a61a33bef2a14fc