Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71508
"==" is the more universal way to test type equalities, and also ::get() doesn't incur any refcount overhead now, so we can swtich to == instead of relying on type kinds.
ghstack-source-id: 147353057
Test Plan:
CI
buck test xplat/caffe2/android:pytorch_jni_common_test
Differential Revision: D33672433
fbshipit-source-id: 5973fd40de48b8017b5c3ebaa55bcf5b4b391aa3
(cherry picked from commit db84a4b566)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69421
I've hit a lot of build issues in D32671972, and I've come to realize that a lot of it boils down to header hygiene. `function.h` includes `profiler.h` *solely* to transitively include `record_function.h`, which winds up leaking the profiler symbols. Moreover, several files rely on transitive includes to get access to `getTime`. As long as I have to touch all the places that use `getTime`, I may as well also move them to the new namespace.
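As an illustration of the include hygiene this is after (the file and function names here are generic, not the actual call sites in the diff):
```
// Include record_function.h directly in code that only needs RECORD_FUNCTION,
// instead of relying on profiler.h to leak it transitively.
#include <ATen/record_function.h>
#include <vector>

void run_step() {
  RECORD_FUNCTION("run_step", std::vector<c10::IValue>());
  // ... work being profiled ...
}
```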
Test Plan: Unit tests and CI.
Reviewed By: aaronenyeshi, albanD
Differential Revision: D32865907
fbshipit-source-id: f87d6fd5afb784dca2146436e72c69e34623020e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66741
Modified loops in files under fbsource/fbcode/caffe2/ from the format
`for (TYPE var = x0; var < x_max; var++)`
to the format
`for (const auto var : irange(x_max))`
This was achieved by running r-barnes's loop upgrader script (D28874212), with some modifications to exclude all files under /torch/jit and a number of reversions and unused-variable warning suppressions added by hand.
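For illustration, the rewrite looks like this (using `c10::irange`):
```
#include <c10/util/irange.h>
#include <cstdint>

void scale(float* data, int64_t n, float factor) {
  // Before:
  //   for (int64_t i = 0; i < n; i++) { data[i] *= factor; }
  // After:
  for (const auto i : c10::irange(n)) {
    data[i] *= factor;
  }
}
```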
Test Plan: Sandcastle
Reviewed By: ngimel
Differential Revision: D31705360
fbshipit-source-id: 7115f76e381ad2d98584eb534961c3cbb957ebaa
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66234
Modified loops in files under fbsource/fbcode/caffe2/ from the format
`for (TYPE var = x0; var < x_max; var++)`
to the format
`for (const auto var : irange(x_max))`
This was achieved by running r-barnes's loop upgrader script (D28874212), with some modifications to exclude all files under /torch/jit and a number of reversions and unused-variable warning suppressions added by hand.
bypass_size_limit
allow-large-files
Test Plan: Sandcastle
Reviewed By: ngimel
Differential Revision: D30652629
fbshipit-source-id: 0ae6c4bbbb554bad42e372792a6430e1acf15e3e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64602
Print the type name in the error message for easier debugging.
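A hypothetical sketch of the idea (the real check lives in the JNI binding layer; this helper and the use of `tagKind()` are illustrative, not the actual diff):
```
#include <ATen/core/ivalue.h>
#include <c10/util/Exception.h>

void expect_tensor(const c10::IValue& v) {
  // Including the actual type name turns a generic failure into something
  // like "Expected IValue type Tensor, actual type TensorList".
  TORCH_CHECK(
      v.isTensor(),
      "Expected IValue type Tensor, actual type ",
      v.tagKind());
}
```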
Test Plan:
Example:
java.lang.IllegalStateException: Expected IValue type Tensor, actual type TensorList
Reviewed By: beback4u
Differential Revision: D30782318
fbshipit-source-id: 60d88a659e7b4bb2b574b12c7652a28f0d5ad0d2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62419
This diff adds support for a CPU-only Kineto profiler on mobile, thus
enabling chrome trace generation on mobile. This brings the C++ API for
mobile profiling on par with TorchScript.
This is done via:
1. Utilizing debug handle annotations in KinetoEvent.
2. Adding post processing capability, via callbacks, to
KinetoThreadLocalState.
3. Creating a new RAII-style profiler, KinetoEdgeCPUProfiler, which can be
used in the surrounding scope of model execution. This will write the chrome
trace to the location specified in the profiler constructor.
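A usage sketch of item 3 (the header path and the two-argument constructor are assumptions based on the description above; further optional arguments may exist):
```
#include <ATen/ATen.h>
#include <torch/csrc/jit/mobile/import.h>
#include <torch/csrc/jit/mobile/profiler_edge.h>  // assumed location of KinetoEdgeCPUProfiler

void profile_one_run(torch::jit::mobile::Module& module) {
  {
    // RAII-style profiler wrapping model execution; the chrome trace is
    // written to the given path when the profiler goes out of scope.
    torch::jit::mobile::KinetoEdgeCPUProfiler profiler(
        module, "/data/local/tmp/trace.json");
    module.forward({at::ones({1, 3, 224, 224})});
  }  // trace flushed here
}
```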
Test Plan:
MobileProfiler.ModuleHierarchy
Imported from OSS
Reviewed By: raziel
Differential Revision: D29993660
fbshipit-source-id: 0b44f52f9e9c5f5aff81ebbd9273c254c3c03299
Summary:
I build using [Bazel](https://bazel.build/).
When I use `pytorch_android` in latest Android app, I get the following error due to dependencies:
```
$ bazel build //app/src/main:app
WARNING: API level 30 specified by android_ndk_repository 'androidndk' is not available. Using latest known API level 29
INFO: Analyzed target //app/src/main:app (0 packages loaded, 0 targets configured).
INFO: Found 1 target...
ERROR: /home/H1Gdev/android-bazel-app/app/src/main/BUILD.bazel:3:15: Merging manifest for //app/src/main:app failed: (Exit 1): ResourceProcessorBusyBox failed: error executing command bazel-out/k8-opt-exec-2B5CBBC6/bin/external/bazel_tools/src/tools/android/java/com/google/devtools/build/android/ResourceProcessorBusyBox --tool MERGE_MANIFEST -- --manifest ... (remaining 11 argument(s) skipped)
Use --sandbox_debug to see verbose messages from the sandbox ResourceProcessorBusyBox failed: error executing command bazel-out/k8-opt-exec-2B5CBBC6/bin/external/bazel_tools/src/tools/android/java/com/google/devtools/build/android/ResourceProcessorBusyBox --tool MERGE_MANIFEST -- --manifest ... (remaining 11 argument(s) skipped)
Use --sandbox_debug to see verbose messages from the sandbox
Error: /home/H1Gdev/.cache/bazel/_bazel_H1Gdev/29e18157a4334967491de4cc9a879dc0/sandbox/linux-sandbox/914/execroot/__main__/app/src/main/AndroidManifest.xml:19:18-86 Error:
Attribute application@appComponentFactory value=(androidx.core.app.CoreComponentFactory) from [maven//:androidx_core_core] AndroidManifest.xml:19:18-86
is also present at [maven//:com_android_support_support_compat] AndroidManifest.xml:19:18-91 value=(android.support.v4.app.CoreComponentFactory).
Suggestion: add 'tools:replace="android:appComponentFactory"' to <application> element at AndroidManifest.xml:5:5-19:19 to override.
May 19, 2021 10:45:03 AM com.google.devtools.build.android.ManifestMergerAction main
SEVERE: Error during merging manifests
com.google.devtools.build.android.AndroidManifestProcessor$ManifestProcessingException: Manifest merger failed : Attribute application@appComponentFactory value=(androidx.core.app.CoreComponentFactory) from [maven//:androidx_core_core] AndroidManifest.xml:19:18-86
is also present at [maven//:com_android_support_support_compat] AndroidManifest.xml:19:18-91 value=(android.support.v4.app.CoreComponentFactory).
Suggestion: add 'tools:replace="android:appComponentFactory"' to <application> element at AndroidManifest.xml:5:5-19:19 to override.
at com.google.devtools.build.android.AndroidManifestProcessor.mergeManifest(AndroidManifestProcessor.java:186)
at com.google.devtools.build.android.ManifestMergerAction.main(ManifestMergerAction.java:217)
at com.google.devtools.build.android.ResourceProcessorBusyBox$Tool$5.call(ResourceProcessorBusyBox.java:93)
at com.google.devtools.build.android.ResourceProcessorBusyBox.processRequest(ResourceProcessorBusyBox.java:233)
at com.google.devtools.build.android.ResourceProcessorBusyBox.main(ResourceProcessorBusyBox.java:177)
Warning:
See http://g.co/androidstudio/manifest-merger for more information about the manifest merger.
Target //app/src/main:app failed to build
Use --verbose_failures to see the command lines of failed build steps.
INFO: Elapsed time: 2.221s, Critical Path: 1.79s
INFO: 2 processes: 2 internal.
FAILED: Build did NOT complete successfully
```
This is due to a conflict between `AndroidX` and the `Support Library`, on which `pytorch_android_torch` depends.
(In the case of `Gradle`, it is avoided by `android.useAndroidX`.)
I created [Android application](https://github.com/H1Gdev/android-bazel-app) for comparison.
At first, I updated `AppCompat` from `Support Library` to `AndroidX`, but `pytorch_android` and `pytorch_android_torchvision` didn't seem to need those dependencies, so I removed them.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58527
Reviewed By: xta0
Differential Revision: D28585234
Pulled By: IvanKobzarev
fbshipit-source-id: 78aa6b1525543594ae951a6234dd88a3fdbfc062
Summary:
Build the lite interpreter as the default for Android; this should wait until https://github.com/pytorch/pytorch/pull/56002 lands.
Mainly two changes:
1. Use lite interpreter as default for Android
2. Switch the lite interpreter build test to full jit build test
Test Plan: Imported from OSS
Differential Revision: D27695530
Reviewed By: IvanKobzarev
Pulled By: cccclai
fbshipit-source-id: e1b2c70fee6590accc22c7404b9dd52c7d7c36e2
Summary:
Switching PyTorch Android to use fbjni from prefab dependencies.
Bumping the fbjni version to 0.2.2 and the soloader version to 0.10.1.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55066
Reviewed By: dreiss
Differential Revision: D27469727
Pulled By: IvanKobzarev
fbshipit-source-id: 2ab22879e81c9f2acf56807c6a133b0ca20bb40a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53567
Updating Gradle to version 6.8.3.
The proper zip was uploaded to AWS.
Successful CI check: https://github.com/pytorch/pytorch/pull/53619
Test Plan: Imported from OSS
Reviewed By: dreiss
Differential Revision: D26928885
Pulled By: IvanKobzarev
fbshipit-source-id: b1081052967d9080cd6934fd48c4dbe933630e49
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51419
## Summary
1. Add an option `BUILD_LITE_INTERPRETER` in `caffe2/CMakeLists.txt` and set `OFF` as default.
2. Update `build_android.sh` with an argument to switch `BUILD_LITE_INTERPRETER`, `OFF` as default.
3. Add a mini demo app `lite_interpreter_demo` linked with `libtorch` library, which can be used for quick test.
## Test Plan
Built lite interpreter version of libtorch and test with Image Segmentation demo app ([android version](https://github.com/pytorch/android-demo-app/tree/master/ImageSegmentation)/[ios version](https://github.com/pytorch/ios-demo-app/tree/master/ImageSegmentation))
### Android
1. **Prepare model**: Prepare the lite interpreter version of the model by running the script below to generate the scripted models `deeplabv3_scripted.pt` and `deeplabv3_scripted.ptl`:
```
import torch
model = torch.hub.load('pytorch/vision:v0.7.0', 'deeplabv3_resnet50', pretrained=True)
model.eval()
scripted_module = torch.jit.script(model)
# Export the full JIT version of the model (not compatible with the lite interpreter); leave it here for comparison
scripted_module.save("deeplabv3_scripted.pt")
# Export lite interpreter version model (compatible with lite interpreter)
scripted_module._save_for_lite_interpreter("deeplabv3_scripted.ptl")
```
2. **Build libtorch lite for android**: Build libtorch for android for all 4 android ABIs (armeabi-v7a, arm64-v8a, x86, x86_64) with `BUILD_LITE_INTERPRETER=1 ./scripts/build_pytorch_android.sh`. This PR is tested on a Pixel 4 emulator with x86, so use the command `BUILD_LITE_INTERPRETER=1 ./scripts/build_pytorch_android.sh x86` to specify the ABI and save build time. After the build finishes, it will show the library path:
```
...
BUILD SUCCESSFUL in 55s
134 actionable tasks: 22 executed, 112 up-to-date
+ find /Users/chenlai/pytorch/android -type f -name '*aar'
+ xargs ls -lah
-rw-r--r-- 1 chenlai staff 13M Feb 11 11:48 /Users/chenlai/pytorch/android/pytorch_android/build/outputs/aar/pytorch_android-release.aar
-rw-r--r-- 1 chenlai staff 36K Feb 9 16:45 /Users/chenlai/pytorch/android/pytorch_android_torchvision/build/outputs/aar/pytorch_android_torchvision-release.aar
```
3. **Use the PyTorch Android libraries built from source in the ImageSegmentation app**: Create a folder `libs` in the app directory; the path from the repository root will be `ImageSegmentation/app/libs`. Copy `pytorch_android-release.aar` to the path `ImageSegmentation/app/libs/pytorch_android-release.aar`. Copy `pytorch_android_torchvision.aar` (downloaded from [here](https://oss.sonatype.org/#nexus-search;quick~torchvision_android)) to the path `ImageSegmentation/app/libs/pytorch_android_torchvision.aar`. Update the `dependencies` part of `ImageSegmentation/app/build.gradle` to
```
dependencies {
implementation 'androidx.appcompat:appcompat:1.2.0'
implementation 'androidx.constraintlayout:constraintlayout:2.0.2'
testImplementation 'junit:junit:4.12'
androidTestImplementation 'androidx.test.ext:junit:1.1.2'
androidTestImplementation 'androidx.test.espresso:espresso-core:3.3.0'
implementation(name:'pytorch_android-release', ext:'aar')
implementation(name:'pytorch_android_torchvision', ext:'aar')
implementation 'com.android.support:appcompat-v7:28.0.0'
implementation 'com.facebook.fbjni:fbjni-java-only:0.0.3'
}
```
Update `allprojects` part in `ImageSegmentation/build.gradle` to
```
allprojects {
repositories {
google()
jcenter()
flatDir {
dirs 'libs'
}
}
}
```
4. **Update model loader api**: Update `ImageSegmentation/app/src/main/java/org/pytorch/imagesegmentation/MainActivity.java` by
4.1 Add new import: `import org.pytorch.LiteModuleLoader;`
4.2 Replace the way the PyTorch lite model is loaded:
```
// mModule = Module.load(MainActivity.assetFilePath(getApplicationContext(), "deeplabv3_scripted.pt"));
mModule = LiteModuleLoader.load(MainActivity.assetFilePath(getApplicationContext(), "deeplabv3_scripted.ptl"));
```
5. **Test app**: Build and run the ImageSegmentation app in Android Studio.

### iOS
1. **Prepare model**: Same as Android.
2. **Build libtorch lite for ios** `BUILD_PYTORCH_MOBILE=1 IOS_PLATFORM=SIMULATOR BUILD_LITE_INTERPRETER=1 ./scripts/build_ios.sh`
3. **Remove Cocoapods from the project**: run `pod deintegrate`
4. **Link ImageSegmentation demo app with the custom built library**:
Open your project in XCode, go to your project Target’s **Build Phases - Link Binaries With Libraries**, click the **+** sign and add all the library files located in `build_ios/install/lib`. Navigate to the project **Build Settings**, set the value **Header Search Paths** to `build_ios/install/include` and **Library Search Paths** to `build_ios/install/lib`.
In the build settings, search for **other linker flags**. Add a custom linker flag below
```
-all_load
```
Finally, disable bitcode for your target by selecting the Build Settings, searching for Enable Bitcode, and setting the value to No.
5. **Update library and API**
5.1 Update `TorchModule.mm`
To use the custom built libraries in the project, replace `#import <LibTorch/LibTorch.h>` (in `TorchModule.mm`), which is needed when using LibTorch via Cocoapods, with the code below:
```
//#import <LibTorch/LibTorch.h>
#include "ATen/ATen.h"
#include "caffe2/core/timer.h"
#include "caffe2/utils/string_utils.h"
#include "torch/csrc/autograd/grad_mode.h"
#include "torch/script.h"
#include <torch/csrc/jit/mobile/function.h>
#include <torch/csrc/jit/mobile/import.h>
#include <torch/csrc/jit/mobile/interpreter.h>
#include <torch/csrc/jit/mobile/module.h>
#include <torch/csrc/jit/mobile/observer.h>
```
5.2 Update `ViewController.swift`
```
// if let filePath = Bundle.main.path(forResource:
// "deeplabv3_scripted", ofType: "pt"),
// let module = TorchModule(fileAtPath: filePath) {
// return module
// } else {
// fatalError("Can't find the model file!")
// }
if let filePath = Bundle.main.path(forResource:
"deeplabv3_scripted", ofType: "ptl"),
let module = TorchModule(fileAtPath: filePath) {
return module
} else {
fatalError("Can't find the model file!")
}
```
### Unit test
Add `test/cpp/lite_interpreter`, with one unit test `test_cores.cpp` and a light model `sequence.ptl` to test `_load_for_mobile()`, `bc.find_method()` and `bc.forward()` functions.
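A minimal sketch of what such a test exercises (the paths and the optional-style return of `find_method()` are assumptions):
```
#include <ATen/ATen.h>
#include <c10/util/Exception.h>
#include <torch/csrc/jit/mobile/import.h>
#include <torch/csrc/jit/mobile/module.h>

void lite_interpreter_smoke_test() {
  // _load_for_mobile() deserializes the .ptl written by _save_for_lite_interpreter().
  auto bc = torch::jit::_load_for_mobile("sequence.ptl");
  // find_method() is expected to locate the exported method.
  auto method = bc.find_method("forward");
  TORCH_CHECK(method.has_value(), "forward not found");
  // forward() runs the model through the lite interpreter.
  auto out = bc.forward({at::ones({1})});
}
```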
### Size:
**With the change:**
Android:
x86: `pytorch_android-release.aar` (**13.8 MB**)
IOS:
`pytorch/build_ios/install/lib` (lib: **66 MB**):
```
(base) chenlai@chenlai-mp lib % ls -lh
total 135016
-rw-r--r-- 1 chenlai staff 3.3M Feb 15 20:45 libXNNPACK.a
-rw-r--r-- 1 chenlai staff 965K Feb 15 20:45 libc10.a
-rw-r--r-- 1 chenlai staff 4.6K Feb 15 20:45 libclog.a
-rw-r--r-- 1 chenlai staff 42K Feb 15 20:45 libcpuinfo.a
-rw-r--r-- 1 chenlai staff 39K Feb 15 20:45 libcpuinfo_internals.a
-rw-r--r-- 1 chenlai staff 1.5M Feb 15 20:45 libeigen_blas.a
-rw-r--r-- 1 chenlai staff 148K Feb 15 20:45 libfmt.a
-rw-r--r-- 1 chenlai staff 44K Feb 15 20:45 libpthreadpool.a
-rw-r--r-- 1 chenlai staff 166K Feb 15 20:45 libpytorch_qnnpack.a
-rw-r--r-- 1 chenlai staff 384B Feb 15 21:19 libtorch.a
-rw-r--r-- 1 chenlai staff 60M Feb 15 20:47 libtorch_cpu.a
```
`pytorch/build_ios/install`:
```
(base) chenlai@chenlai-mp install % du -sh *
14M include
66M lib
2.8M share
```
**Master (baseline):**
Android:
x86: `pytorch_android-release.aar` (**16.2 MB**)
IOS:
`pytorch/build_ios/install/lib` (lib: **84 MB**):
```
(base) chenlai@chenlai-mp lib % ls -lh
total 172032
-rw-r--r-- 1 chenlai staff 3.3M Feb 17 22:18 libXNNPACK.a
-rw-r--r-- 1 chenlai staff 969K Feb 17 22:18 libc10.a
-rw-r--r-- 1 chenlai staff 4.6K Feb 17 22:18 libclog.a
-rw-r--r-- 1 chenlai staff 42K Feb 17 22:18 libcpuinfo.a
-rw-r--r-- 1 chenlai staff 1.5M Feb 17 22:18 libeigen_blas.a
-rw-r--r-- 1 chenlai staff 44K Feb 17 22:18 libpthreadpool.a
-rw-r--r-- 1 chenlai staff 166K Feb 17 22:18 libpytorch_qnnpack.a
-rw-r--r-- 1 chenlai staff 384B Feb 17 22:19 libtorch.a
-rw-r--r-- 1 chenlai staff 78M Feb 17 22:19 libtorch_cpu.a
```
`pytorch/build_ios/install`:
```
(base) chenlai@chenlai-mp install % du -sh *
14M include
84M lib
2.8M share
```
Test Plan: Imported from OSS
Reviewed By: iseeyuan
Differential Revision: D26518778
Pulled By: cccclai
fbshipit-source-id: 4503ffa1f150ecc309ed39fb0549e8bd046a3f9c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48620
In preparation for storing bare function pointer (8 bytes)
instead of std::function (32 bytes).
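For reference, a small standalone illustration of the size difference (the 32-byte figure is typical of libstdc++/libc++ on 64-bit platforms, not guaranteed by the standard):
```
#include <cstdio>
#include <functional>

using BareFn = void (*)();

int main() {
  // A bare function pointer is one word (8 bytes on 64-bit), while
  // std::function also carries dispatch machinery plus inline storage
  // for small callables (commonly 32 bytes).
  std::printf("bare function pointer: %zu bytes\n", sizeof(BareFn));
  std::printf("std::function<void()>: %zu bytes\n", sizeof(std::function<void()>));
  return 0;
}
```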
ghstack-source-id: 118568242
Test Plan: CI
Reviewed By: ezyang
Differential Revision: D25132183
fbshipit-source-id: 3790cfb5d98479a46cf665b14eb0041a872c13da
Summary:
### Java, CPP
Introducing an additional `device` parameter to LiteModuleLoader to specify the device on which `forward` will run.
On the Java side this is an enum containing CPU and VULKAN, passed as a jint to the JNI side and stored as a member field at the same level as the module.
In pytorch_jni_lite.cpp: all input tensors are converted to Vulkan.
In pytorch_jni_common.cpp (which also goes to OSS): if the result Tensor is not on CPU, call `.cpu()` on it (the only non-CPU backend at the moment is Vulkan).
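A minimal sketch of that output-side conversion (illustrative only; the actual change lives in pytorch_jni_common.cpp):
```
#include <ATen/ATen.h>

at::Tensor to_cpu_if_needed(const at::Tensor& result) {
  // If the result of forward() is not a CPU tensor (today the only
  // non-CPU backend here is Vulkan), copy it back to CPU before
  // handing it to the Java side.
  if (!result.device().is_cpu()) {
    return result.cpu();
  }
  return result;
}
```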
### BUCK
Introducing a `pytorch_jni_lite_with_vulkan` target that depends on `pytorch_jni_lite` and adds `aten_vulkan`.
In that case `pytorch_jni_lite_with_vulkan` can be used in place of `pytorch_jni_lite` when Vulkan support is needed.
Test Plan:
After the following diff with aidemo segmentation:
```
buck install -r aidemos-android
```
{F296224521}
Reviewed By: dreiss
Differential Revision: D23198335
fbshipit-source-id: 95328924e398901d76718c4d828f96e112dfa1b0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44202
In preparation for changing mobile run_method() to be variadic, this diff:
* Implements get_method() for mobile Module, which is similar to find_method but expects the method to exist.
* Replaces calls to the current non-variadic implementation of run_method() by calling get_method() and then invoking the operator() overload on Method objects (see the sketch below).
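A before/after sketch of that replacement (API shapes assumed from the description above):
```
#include <ATen/ATen.h>
#include <torch/csrc/jit/mobile/import.h>
#include <torch/csrc/jit/mobile/module.h>
#include <vector>

void run_forward(torch::jit::mobile::Module& module) {
  std::vector<c10::IValue> inputs{at::ones({1})};
  // Before: the non-variadic run_method()
  //   auto out = module.run_method("forward", inputs);
  // After: get_method() throws if "forward" does not exist, then the
  // Method's operator() runs it.
  auto method = module.get_method("forward");
  auto out = method(inputs);
}
```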
ghstack-source-id: 111848222
Test Plan: CI, and all the unit tests which currently contain run_method that are being changed.
Reviewed By: iseeyuan
Differential Revision: D23436351
fbshipit-source-id: 4655ed7182d8b6f111645d69798465879b67a577
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40785
The main goal of this change is to support creating Tensors specifying blob in NHWC (ChannelsLast) format.
ChannelsLast is supported only for 4-dim tensors; this is enforced on the LibTorch side. I have not added asserts on the Java side, in case this limitation is changed in the future, and to avoid double asserts.
Additional changes in `aten/src/ATen/templates/Functions.h`:
`from_blob` creates an `at::empty({0}, options)` tensor first and sets its Storage with sizes and strides afterwards.
But as ChannelsLast is only for 4-dim tensors, that creation fails, since dim == 1.
I've added a `zero_sizes()` function that returns `{0, 0, 0, 0}` for ChannelsLast and ChannelsLast3d.
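A sketch of what this enables from the C++ side (passing the memory format through `TensorOptions` as shown is an assumption; it matches the `at::empty({0}, options)` path described above):
```
#include <ATen/ATen.h>
#include <iostream>

int main() {
  // A blob laid out as NHWC for a logical 1x3x2x2 (NCHW) tensor.
  float data[1 * 3 * 2 * 2] = {0};
  at::Tensor t = at::from_blob(
      data,
      /*sizes=*/{1, 3, 2, 2},  // sizes stay in NCHW order
      at::TensorOptions()
          .dtype(at::kFloat)
          .memory_format(at::MemoryFormat::ChannelsLast));
  std::cout << t.is_contiguous(at::MemoryFormat::ChannelsLast) << std::endl;
  return 0;
}
```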
Test Plan: Imported from OSS
Reviewed By: dreiss
Differential Revision: D22396244
Pulled By: IvanKobzarev
fbshipit-source-id: 02582d748a554e0f859aefe71cd2c1e321fb8979
Summary:
These were added accidentally (probably by an IDE) during a refactor.
These files have always been Open Source.
Test Plan: CI
Reviewed By: xcheng16
Differential Revision: D23250761
fbshipit-source-id: 4974430c0e28dd3269424d38edb36f4f71508157
Summary:
1. Modularize some bzl files to break circular buck load
2. Use query-based on instrumentation_tests
(Note: this ignores all push blocking failures!)
Test Plan: CI
Reviewed By: kwanmacher
Differential Revision: D22188728
fbshipit-source-id: affbabd333c51c8b1549af6602c6bb79fabb7236
Summary:
This re-applies D21232894 (b9d3869df3) and D22162524, plus updates jni_deps in a few places
to avoid breaking host JNI tests.
Test Plan: `buck test @//fbandroid/mode/server //fbandroid/instrumentation_tests/com/facebook/caffe2:host-test`
Reviewed By: xcheng16
Differential Revision: D22199952
fbshipit-source-id: df13eef39c01738637ae8cf7f581d6ccc88d37d5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40442
Problem:
Nightly builds do not include libtorch headers, unlike local builds.
The reason is that on the Docker images the path is different from the local path used when building with `scripts/build_pytorch_android.sh`.
Solution:
Introducing a Gradle property so the path can be specified, and adding it to the Gradle build job and the snapshot publishing job, which run on the same Docker image.
Test:
ci-all jobs check https://github.com/pytorch/pytorch/pull/40443
checking that the Gradle build results in headers inside the aar
Test Plan: Imported from OSS
Differential Revision: D22190955
Pulled By: IvanKobzarev
fbshipit-source-id: 9379458d8ab024ee991ca205a573c21d649e5f8a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37243
*** Why ***
As it stands, we have two thread pool solutions concurrently in use in PyTorch mobile: (1) the open source pthreadpool library under third_party, and (2) Caffe2's implementation of pthreadpool under caffe2/utils/threadpool. Since the primary use-case of the latter has been to act as a drop-in replacement for the third party version so as to enable integration and usage from within NNPACK and QNNPACK, Caffe2's implementation is intentionally written to the exact same interface as the third party version.
The original argument in favor of C2's implementation has been improved performance as a result of using spin locks, as opposed to relinquishing the thread's time slot and putting it to sleep - a less expensive operation up to a point. That seems to have given C2's implementation the upper hand in performance, hence justifying the added maintenance complexity, until the third party version improved in parallel surpassing the efficiency of C2's implementation as I have verified in benchmarks. With that advantage gone, there is no reason to continue using C2's implementation in PyTorch mobile either from the perspective of performance or code hygiene. As a matter of fact, there is considerable performance benefit to be had as a result of using the third party version as it currently stands.
This is a tricky change though, mainly because in order to avoid potential performance regressions, of which I have witnessed none but just in abundance of caution, we have decided to continue using the internal C2's implementation whenever building for Caffe2. Again, this is mainly to avoid potential performance regressions in production C2 use cases even if doing so results in reduced performance as far as I can tell.
So to summarize, today, and as it currently stands, we are using C2's implementation for (1) NNPACK, (2) PyTorch QNNPACK, and (3) ATen parallel_for on mobile builds, while using the third party version of pthreadpool for XNNPACK, as XNNPACK does not provide any build options to link against an external implementation, unlike NNPACK and QNNPACK.
The goal of this PR then, is to unify all usage on mobile to the third party implementation both for improved performance and better code hygiene. This applies to PyTorch's use of NNPACK, QNNPACK, XNNPACK, and mobile's implementation of ATen parallel_for, all getting routed to the
exact same third party implementation in this PR.
Considering that NNPACK, QNNPACK, and XNNPACK are not mobile specific, these benefits carry over to non-mobile builds of PyTorch (but not Caffe2) as well. The implementation of ATen parallel_for on non-mobile builds remains unchanged.
*** How ***
This is where things get tricky.
A good deal of the build system complexity in this PR arises from our desire to maintain C2's implementation intact for C2's use.
pthreadpool is a C library with no concept of namespaces, which means two copies of the library cannot exist in the same binary or a symbol collision will occur, violating the ODR. This means that somehow, and based on some condition, we must decide on the choice of a pthreadpool implementation. In practice, this has become more complicated as a result of all the possible combinations that USE_NNPACK, USE_QNNPACK, USE_PYTORCH_QNNPACK, USE_XNNPACK, USE_SYSTEM_XNNPACK, USE_SYSTEM_PTHREADPOOL, and other variables can result in. Having said that, I have done my best in this PR to surgically cut through this complexity in a way that minimizes the side effects, considering the significance of the performance we are leaving on the table; yet, as a result of the combinatorial explosion explained above, I cannot guarantee that every single combination will work as expected on the first try. I am heavily relying on CI to find any issues, as local testing can only go so far.
Having said that, this PR provides a simple non mobile-specific C++ thread pool implementation on top of pthreadpool, namely caffe2::PThreadPool that automatically routes to C2's implementation or the third party version depending on the build configuration. This simplifies the logic at the cost of pushing the complexity to the build scripts. From there on, this thread pool is used in aten parallel_for, and NNPACK and family, again, routing all usage of threading to C2 or third party pthreadpool depending on the build configuration.
When it is all said and done, the layering will look like this:
a) aten::parallel_for, uses
b) caffe2::PThreadPool, which uses
c) pthreadpool C API, which delegates to
c-1) third_party implementation of pthreadpool if that's what the build has requested, and the rabbit hole ends here.
c-2) C2's implementation of pthreadpool if that's what the build has requested, which itself delegates to
c-2-1) caffe2::ThreadPool, and the rabbit hole ends here.
NNPACK and (PyTorch) QNNPACK directly hook into (c); they never go through (b).
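From the caller's point of view at layer (a), none of this plumbing is visible; a call like the following behaves the same regardless of which pthreadpool implementation the build selected:
```
#include <ATen/Parallel.h>
#include <vector>

int main() {
  std::vector<float> v(1024, 1.0f);
  // at::parallel_for splits [0, v.size()) into chunks of at least grain_size
  // elements and dispatches them to the underlying thread pool.
  at::parallel_for(0, static_cast<int64_t>(v.size()), /*grain_size=*/256,
      [&](int64_t begin, int64_t end) {
        for (int64_t i = begin; i < end; ++i) {
          v[i] *= 2.0f;
        }
      });
  return 0;
}
```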
Differential Revision: D21232894
Test Plan: Imported from OSS
Reviewed By: dreiss
Pulled By: AshkanAliabadi
fbshipit-source-id: 8b3de86247fbc3a327e811983e082f9d40081354