Summary:
Context: https://github.com/pytorch/pytorch/pull/53299#discussion_r587882857
These are the only hand-written parts of this diff:
- the addition to `.github/workflows/lint.yml`
- the file endings changed in these four files (to appease FB-internal land-blocking lints):
- `GLOSSARY.md`
- `aten/src/ATen/core/op_registration/README.md`
- `scripts/README.md`
- `torch/csrc/jit/codegen/fuser/README.md`
The rest was generated by running this command (on macOS):
```
git grep -I -l ' $' -- . ':(exclude)**/contrib/**' ':(exclude)third_party' | xargs gsed -i 's/ *$//'
```
I looked over the auto-generated changes and didn't see anything that looked problematic.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53406
Test Plan:
This run (after adding the lint but before removing existing trailing spaces) failed:
- https://github.com/pytorch/pytorch/runs/2043032377
This run (on the tip of this PR) succeeded:
- https://github.com/pytorch/pytorch/runs/2043296348
Reviewed By: walterddr, seemethere
Differential Revision: D26856620
Pulled By: samestep
fbshipit-source-id: 3f0de7f7c2e4b0f1c089eac9b5085a58dd7e0d97
Summary:
This re-applies D21232894 (b9d3869df3) and D22162524, plus updates jni_deps in a few places
to avoid breaking host JNI tests.
Test Plan: `buck test @//fbandroid/mode/server //fbandroid/instrumentation_tests/com/facebook/caffe2:host-test`
Reviewed By: xcheng16
Differential Revision: D22199952
fbshipit-source-id: df13eef39c01738637ae8cf7f581d6ccc88d37d5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37243
*** Why ***
As it stands, we have two thread pool solutions concurrently in use in PyTorch mobile: (1) the open source pthreadpool library under third_party, and (2) Caffe2's implementation of pthreadpool under caffe2/utils/threadpool. Since the primary use-case of the latter has been to act as a drop-in replacement for the third party version so as to enable integration and usage from within NNPACK and QNNPACK, Caffe2's implementation is intentionally written to the exact same interface as the third party version.
The original argument in favor of C2's implementation has been improved performance as a result of using spin locks, as opposed to relinquishing the thread's time slot and putting it to sleep - a less expensive operation up to a point. That seems to have given C2's implementation the upper hand in performance, hence justifying the added maintenance complexity, until the third party version improved in parallel surpassing the efficiency of C2's implementation as I have verified in benchmarks. With that advantage gone, there is no reason to continue using C2's implementation in PyTorch mobile either from the perspective of performance or code hygiene. As a matter of fact, there is considerable performance benefit to be had as a result of using the third party version as it currently stands.
This is a tricky change though, mainly because in order to avoid potential performance regressions, of which I have witnessed none but just in abundance of caution, we have decided to continue using the internal C2's implementation whenever building for Caffe2. Again, this is mainly to avoid potential performance regressions in production C2 use cases even if doing so results in reduced performance as far as I can tell.
So to summarize, today, and as it currently stands, we are using C2's implementation for (1) NNPACK, (2) PyTorch QNNPACK, and (3) ATen parallel_for on mobile builds, while using the third party version of pthreadpool for XNNPACK as XNNPACK does not provide any build options to link against an external implementation unlike NNPACK and QNNPACK do.
The goal of this PR then, is to unify all usage on mobile to the third party implementation both for improved performance and better code hygiene. This applies to PyTorch's use of NNPACK, QNNPACK, XNNPACK, and mobile's implementation of ATen parallel_for, all getting routed to the
exact same third party implementation in this PR.
Considering that NNPACK, QNNPACK, and XNNPACK are not mobile specific, these benefits carry over to non-mobile builds of PyTorch (but not Caffe2) as well. The implementation of ATen parallel_for on non-mobile builds remains unchanged.
*** How ***
This is where things get tricky.
A good deal of the build system complexity in this PR arises from our desire to maintain C2's implementation intact for C2's use.
pthreadpool is a C library with no concept of namespaces, which means two copies of the library cannot exist in the same binary or symbol collision will occur violating ODR. This means that somehow, and based on some condition, we must decide on the choice of a pthreadpool implementation. In practice, this has become more complicated as a result of all the possible combinations that USE_NNPACK, USE_QNNPACK, USE_PYTORCH_QNNPACK, USE_XNNPACK, USE_SYSTEM_XNNPACK, USE_SYSTEM_PTHREADPOOL and other variables can result in. Having said that, I have done my best in this PR to surgically cut through this complexity in a way that minimizes the side effects, considering the significance of the performance we are leaving on the table, yet, as a result of this combinatorial explosion explained above I cannot guarantee that every single combination will work as expected on the first try. I am heavily relying on CI to find any issues as local testing can only go that far.
Having said that, this PR provides a simple non mobile-specific C++ thread pool implementation on top of pthreadpool, namely caffe2::PThreadPool that automatically routes to C2's implementation or the third party version depending on the build configuration. This simplifies the logic at the cost of pushing the complexity to the build scripts. From there on, this thread pool is used in aten parallel_for, and NNPACK and family, again, routing all usage of threading to C2 or third party pthreadpool depending on the build configuration.
When it is all said or done, the layering will look like this:
a) aten::parallel_for, uses
b) caffe2::PThreadPool, which uses
c) pthreadpool C API, which delegates to
c-1) third_party implementation of pthreadpool if that's what the build has requested, and the rabbit hole ends here.
c-2) C2's implementation of pthreadpool if that's what the build has requested, which itself delegates to
c-2-1) caffe2::ThreadPool, and the rabbit hole ends here.
NNPACK, and (PyTorch) QNNPACK directly hook into (c). They never go through (b).
Differential Revision: D21232894
Test Plan: Imported from OSS
Reviewed By: dreiss
Pulled By: AshkanAliabadi
fbshipit-source-id: 8b3de86247fbc3a327e811983e082f9d40081354
Summary:
Ignore mixed upper-case/lower-case style for now
Fix space between function and its arguments violation
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35574
Test Plan: CI
Differential Revision: D20712969
Pulled By: malfet
fbshipit-source-id: 0012d430aed916b4518599a0b535e82d15721f78
Summary:
This tests the water for adding back NNPACK in PyTorch, it's a lot better than the fallback THNN versions.
In #6151, we (ezyang and soumith) removed NNPACK support from PyTorch. Of course Maratyszcza might have advice, too. (Or an opinion on the CMake changes.)
The only functional changes are to use NNPack more aggressively on mobile and a .contiguous() to match NNPack's assumption (I stumbled over that while using NNPack for style transfer.)
The CMake changes try to use the NNPack we already have in git.
In terms of lines of code this is a large part of the diff of https://lernapparat.de/pytorch-jit-android/ . As far as I can tell, we don't have MKLDNN on mobile and the native THNN implementation are prohibitively expensive in terms of both CPU and memory.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15924
Differential Revision: D13709576
Pulled By: ezyang
fbshipit-source-id: f2e287739909451c173abf046588209a7450ca2c
Caffe2 started with an option to use NNPACK pre-installed in the system.
Now this option is mostly legacy, as Caffe2 can include NNPACK in its own build on all platforms.
Due to problems when pre-installed NNPACK is built with different dependencies or compiler options, we decided to remove this option and alwyas build NNPACK with Caffe2.
This change makes Caffe2 always build NNPACK as part of its own build, and updates NNPACK and cpuinfo submodules.
Summary:
Include six, enum34, and PeachPy as Caffe2 submodules, and use the versions from submodules instead of downloading them during configuration time
Closes https://github.com/caffe2/caffe2/pull/1917
Reviewed By: orionr
Differential Revision: D6938735
Pulled By: Maratyszcza
fbshipit-source-id: 841a6c47a1cd003a19f48f6c256aa4d9eb2cc6e4
Summary:
Original commit changeset: d0c1c7681605
Reverting due to broken OSS build due to this commit
Reviewed By: bddppq
Differential Revision: D6935666
fbshipit-source-id: 955cfeb6d5a4ed265b2e099094cfb5bfe960ff95
Summary:
Include six, enum34, and PeachPy as Caffe2 submodules, and use the versions from submodules instead of downloading them during configuration time
Closes https://github.com/caffe2/caffe2/pull/1901
Differential Revision: D6930731
Pulled By: Maratyszcza
fbshipit-source-id: d0c1c7681605d957de6f51bd24fbb25afc0f282f
Summary:
- Fix path to FXdiv and FP16 dependencies
- Link cpuinfo library
- Pull NNPACK fix for PYTHONPATH handling when launching PeachPy
- Pull cpuinfo fix for cross-compiling on Linux for Android
- Pull cpuinfo fix for CPUINFO_LIBRARY_TYPE support
- Pull cpuinfo fix for iOS builds
Closes https://github.com/caffe2/caffe2/pull/1869
Differential Revision: D6881428
Pulled By: Maratyszcza
fbshipit-source-id: 7b4115daa090096dbd97303503792e7b144fbb43
Summary:
Few people complained in NNPACK repo about broken build on PPC64, as it specifically whitelists supported architecture in its CMakeLists.txt, and refuses to build on unsupported platforms. This commit explicitly disables NNPACK build (as part of Caffe2 build) on unsupported architectures.
Closes https://github.com/caffe2/caffe2/pull/1439
Differential Revision: D6288999
Pulled By: Maratyszcza
fbshipit-source-id: 76c40e9ce882356944b63968df8fd853f21ecd35
Summary:
NNPACK now supports building with CMake, and its build scripts have advantages over the ones in Caffe2:
- They automatically download all dependencies, no need to keep them in submodules anymore
- They automatically download and setup PeachPy for x86-64 build
- The same scripts are used for server/desktop (Linux, macOS) and mobile (Android/iOS)
- They unblock Caffe2 build with Ninja
Closes https://github.com/caffe2/caffe2/pull/1382
Reviewed By: Yangqing
Differential Revision: D6150723
Pulled By: Maratyszcza
fbshipit-source-id: 7c3e4e3406f60d4cc059e1c8112cb10aa3d75ece
Summary:
I closed https://github.com/caffe2/caffe2/pull/736 because one of these variables should be used after all.
Here's how C1 uses this variable: https://github.com/BVLC/caffe/blob/rc5/cmake/Targets.cmake#L116
Without this fix, there is a race condition in the parallel build leading to this error:
```
make[2]: *** No rule to make target `../third_party/NNPACK/lib/libnnpack.a', needed by `caffe2/libCaffe2_CPU.so'.
```
Closes https://github.com/caffe2/caffe2/pull/737
Differential Revision: D5211794
Pulled By: Yangqing
fbshipit-source-id: 9e368f09b01edaf86252727adc6f6cc40d244e29