pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 12:21:27 +01:00

Author	SHA1	Message	Date
Elijah Rippeth	b5479737d7	Add windows JNI support (#44257 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/32516 Pull Request resolved: https://github.com/pytorch/pytorch/pull/44257 Reviewed By: malfet Differential Revision: D24332820 Pulled By: ezyang fbshipit-source-id: 1dd97e9c8140129a02a9078623b190b33f30d5b0	2020-10-15 10:48:45 -07:00
generatedunixname89002005325674	592b398e82	[AutoAccept][Codemod][FBSourceGoogleJavaFormatLinter] Daily `arc lint --take GOOGLEJAVAFORMAT` Reviewed By: zertosh Differential Revision: D24044052 fbshipit-source-id: 50ac5b7480ed65af94617bf8b014252ea7b27c4f	2020-10-01 05:19:37 -07:00
Ivan Kobzarev	2c300fd74c	[android][vulkan] Module load argument to specify device cpu/vulkan (#44896 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44896 Test Plan: Imported from OSS Reviewed By: dreiss Differential Revision: D23763771 Pulled By: IvanKobzarev fbshipit-source-id: 990a386ad13c704f03345dbe09e180281af913c9	2020-09-29 09:58:22 -07:00
Ivan Kobzarev	8ec6bc7292	[pytorch][vulkan][jni] LiteModuleLoader load argument to use vulkan device Summary: ### Java, CPP Introducing additional parameter `device` to LiteModuleLoader to specify device on which the `forward` will work. On the java side this is enum that contains CPU and VULKAN, passing as jint to jni side and storing it as a member field on the same level as module. On pytorch_jni_lite.cpp - for all input tensors converting them to vulkan. On pytorch_jni_common.cpp (also goes to OSS) - if result Tensor is not cpu - call cpu. (Not Cpu at the moment is only Vulkan). ### BUCK Introducing `pytorch_jni_lite_with_vulkan` target, that depends on `pytorch_jni_lite_with_vulkan` and adds `aten_vulkan` In that case `pytorch_jni_lite_with_vulkan` can be used along with `pytorch_jni_lite_with_vulkan`. Test Plan: After the following diff with aidemo segmentation: ``` buck install -r aidemos-android ``` {F296224521} Reviewed By: dreiss Differential Revision: D23198335 fbshipit-source-id: 95328924e398901d76718c4d828f96e112dfa1b0	2020-09-16 18:35:22 -07:00
Ann Shan	a61318a535	[pytorch] Replace mobile run_method with get_method and operator() (#44202 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44202 In preparation for changing mobile run_method() to be variadic, this diff: * Implements get_method() for mobile Module, which is similar to find_method but expects the method to exist. * Replaces calls to the current nonvariadic implementation of run_method() by calling get_method() and then invoking the operator() overload on Method objects. ghstack-source-id: 111848222 Test Plan: CI, and all the unit tests which currently contain run_method that are being changed. Reviewed By: iseeyuan Differential Revision: D23436351 fbshipit-source-id: 4655ed7182d8b6f111645d69798465879b67a577	2020-09-11 10:23:06 -07:00
generatedunixname89002005287564	28b1360d24	[Codemod][FBSourceGoogleJavaFormatLinter] Daily `arc lint --take GOOGLEJAVAFORMAT` Reviewed By: zertosh Differential Revision: D23536088 fbshipit-source-id: d4c6c26ed5bad4e8c1b80ac1c05bd86b36cb6aaa	2020-09-04 07:30:50 -07:00
Ivan Kobzarev	c40e3f9f98	[android][jni] Support Tensor MemoryFormat in java wrappers (#40785 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/40785 The main goal of this change is to support creating Tensors from a blob in NHWC (ChannelsLast) format. ChannelsLast is supported only for 4-dim tensors; this is enforced on the LibTorch side. I have not added asserts on the java side, in case this limitation changes in the future and to avoid duplicate asserts. Additional changes in `aten/src/ATen/templates/Functions.h`: `from_blob` creates an `at::empty({0}, options)` tensor first and sets its Storage with sizes and strides afterwards. But as ChannelsLast is only for 4-dim tensors, that creation fails, as dim==1. I've added a `zero_sizes()` function that returns `{0, 0, 0, 0}` for ChannelsLast and ChannelsLast3d. Test Plan: Imported from OSS Reviewed By: dreiss Differential Revision: D22396244 Pulled By: IvanKobzarev fbshipit-source-id: 02582d748a554e0f859aefe71cd2c1e321fb8979	2020-09-03 17:01:35 -07:00
David Reiss	844d469ae7	Remove proprietary notices Summary: These were added accidentally (probably by an IDE) during a refactor. These files have always been Open Source. Test Plan: CI Reviewed By: xcheng16 Differential Revision: D23250761 fbshipit-source-id: 4974430c0e28dd3269424d38edb36f4f71508157	2020-08-20 20:14:59 -07:00
Martin Yuan	dfc7e71d13	[Selective Build] Apply query-based on instrumentation_tests Summary: 1. Modularize some bzl files to break circular buck load 2. Use query-based on instrumentation_tests (Note: this ignores all push blocking failures!) Test Plan: CI Reviewed By: kwanmacher Differential Revision: D22188728 fbshipit-source-id: affbabd333c51c8b1549af6602c6bb79fabb7236	2020-06-26 08:05:53 -07:00
David Reiss	b7e044f0e5	Re-apply PyTorch pthreadpool changes Summary: This re-applies D21232894 (`b9d3869df3`) and D22162524, plus updates jni_deps in a few places to avoid breaking host JNI tests. Test Plan: `buck test @//fbandroid/mode/server //fbandroid/instrumentation_tests/com/facebook/caffe2:host-test` Reviewed By: xcheng16 Differential Revision: D22199952 fbshipit-source-id: df13eef39c01738637ae8cf7f581d6ccc88d37d5	2020-06-23 19:26:21 -07:00
Kate Mormysh	92d3182c11	Revert D21232894: Unify PyTorch mobile's threadpool usage. Test Plan: revert-hammer Differential Revision: D21232894 (`b9d3869df3`) Original commit changeset: 8b3de86247fb fbshipit-source-id: e6517cfec08f7dd0f4f8877dab62acf1d65afacd	2020-06-23 17:09:14 -07:00
Ivan Kobzarev	2e6da36298	[android][ci] Fix CI packaging headers to aar (#40442 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/40442 Problem: Nightly builds do not include libtorch headers the way local builds do. The reason is that on the docker images the path differs from the local path used when building with `scripts/build_pytorch_android.sh`. Solution: Introduce a gradle property to specify the path, and set it in the gradle build job and the snapshots publishing job, which run on the same docker image. Test: ci-all jobs check https://github.com/pytorch/pytorch/pull/40443 checking that the gradle build results in headers inside the aar. Test Plan: Imported from OSS Differential Revision: D22190955 Pulled By: IvanKobzarev fbshipit-source-id: 9379458d8ab024ee991ca205a573c21d649e5f8a	2020-06-23 16:41:12 -07:00
Ashkan Aliabadi	b9d3869df3	Unify PyTorch mobile's threadpool usage. (#37243 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/37243 * Why * As it stands, we have two thread pool solutions concurrently in use in PyTorch mobile: (1) the open source pthreadpool library under third_party, and (2) Caffe2's implementation of pthreadpool under caffe2/utils/threadpool. Since the primary use-case of the latter has been to act as a drop-in replacement for the third party version so as to enable integration and usage from within NNPACK and QNNPACK, Caffe2's implementation is intentionally written to the exact same interface as the third party version. The original argument in favor of C2's implementation has been improved performance as a result of using spin locks, as opposed to relinquishing the thread's time slot and putting it to sleep - a less expensive operation up to a point. That seems to have given C2's implementation the upper hand in performance, hence justifying the added maintenance complexity, until the third party version improved in parallel surpassing the efficiency of C2's implementation as I have verified in benchmarks. With that advantage gone, there is no reason to continue using C2's implementation in PyTorch mobile either from the perspective of performance or code hygiene. As a matter of fact, there is considerable performance benefit to be had as a result of using the third party version as it currently stands. This is a tricky change though, mainly because in order to avoid potential performance regressions, of which I have witnessed none but just in abundance of caution, we have decided to continue using the internal C2's implementation whenever building for Caffe2. Again, this is mainly to avoid potential performance regressions in production C2 use cases even if doing so results in reduced performance as far as I can tell. 
So to summarize, today, and as it currently stands, we are using C2's implementation for (1) NNPACK, (2) PyTorch QNNPACK, and (3) ATen parallel_for on mobile builds, while using the third party version of pthreadpool for XNNPACK as XNNPACK does not provide any build options to link against an external implementation unlike NNPACK and QNNPACK do. The goal of this PR then, is to unify all usage on mobile to the third party implementation both for improved performance and better code hygiene. This applies to PyTorch's use of NNPACK, QNNPACK, XNNPACK, and mobile's implementation of ATen parallel_for, all getting routed to the exact same third party implementation in this PR. Considering that NNPACK, QNNPACK, and XNNPACK are not mobile specific, these benefits carry over to non-mobile builds of PyTorch (but not Caffe2) as well. The implementation of ATen parallel_for on non-mobile builds remains unchanged. * How * This is where things get tricky. A good deal of the build system complexity in this PR arises from our desire to maintain C2's implementation intact for C2's use. pthreadpool is a C library with no concept of namespaces, which means two copies of the library cannot exist in the same binary or symbol collision will occur violating ODR. This means that somehow, and based on some condition, we must decide on the choice of a pthreadpool implementation. In practice, this has become more complicated as a result of all the possible combinations that USE_NNPACK, USE_QNNPACK, USE_PYTORCH_QNNPACK, USE_XNNPACK, USE_SYSTEM_XNNPACK, USE_SYSTEM_PTHREADPOOL and other variables can result in. Having said that, I have done my best in this PR to surgically cut through this complexity in a way that minimizes the side effects, considering the significance of the performance we are leaving on the table, yet, as a result of this combinatorial explosion explained above I cannot guarantee that every single combination will work as expected on the first try. 
I am heavily relying on CI to find any issues as local testing can only go so far. Having said that, this PR provides a simple non mobile-specific C++ thread pool implementation on top of pthreadpool, namely caffe2::PThreadPool that automatically routes to C2's implementation or the third party version depending on the build configuration. This simplifies the logic at the cost of pushing the complexity to the build scripts. From there on, this thread pool is used in aten parallel_for, and NNPACK and family, again, routing all usage of threading to C2 or third party pthreadpool depending on the build configuration. When all is said and done, the layering will look like this: a) aten::parallel_for, uses b) caffe2::PThreadPool, which uses c) pthreadpool C API, which delegates to c-1) third_party implementation of pthreadpool if that's what the build has requested, and the rabbit hole ends here. c-2) C2's implementation of pthreadpool if that's what the build has requested, which itself delegates to c-2-1) caffe2::ThreadPool, and the rabbit hole ends here. NNPACK, and (PyTorch) QNNPACK directly hook into (c). They never go through (b). Differential Revision: D21232894 Test Plan: Imported from OSS Reviewed By: dreiss Pulled By: AshkanAliabadi fbshipit-source-id: 8b3de86247fbc3a327e811983e082f9d40081354	2020-06-23 16:34:51 -07:00
Ivan Kobzarev	9e5d62582c	[android][gradle] packaging headers in aars for publishing (#40392 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/40392 Test Plan: Imported from OSS Differential Revision: D22167757 Pulled By: IvanKobzarev fbshipit-source-id: 363319c64933382c0b0ddce65624fe5a4602da26	2020-06-22 16:56:39 -07:00
Ivan Kobzarev	0891764e80	[android] ANDROID_STL=c++_shared (#39588 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/39588 Before this diff we used c++_static linking. Users will dynamically link to libpytorch_jni.so and have at least one more shared library of their own that probably uses the STL. We must have no more than one STL per app. ( https://developer.android.com/ndk/guides/cpp-support#one_stl_per_app ) To have only one STL per app, this changes ANDROID_STL to c++_shared, which adds libc++_shared.so to the packaging. Test Plan: Imported from OSS Differential Revision: D22118031 Pulled By: IvanKobzarev fbshipit-source-id: ea1e5085ae207a2f42d1fa9f6ab8ed0a21768e96	2020-06-18 13:50:05 -07:00
Ivan Kobzarev	d3b786afdb	[android] Add libtorch headers to pytorch_android aar (#39507 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/39507 Adding a gradle task that runs after `assemble` to add a `headers` folder to the aar. Headers are chosen for the first specified abi; they should be the same for all abis. Adding headers works by temporarily unpacking the aar into gradle `$buildDir`, copying the headers into it, and zipping the aar with the headers. Test Plan: Imported from OSS Differential Revision: D22118009 Pulled By: IvanKobzarev fbshipit-source-id: 52e5b1e779eb42d977c67dba79e278f1922b8483	2020-06-18 13:47:18 -07:00
Ivan Kobzarev	928e99b9bb	[vulkan] jni build support USE_VULKAN (#39188 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/39188 Extracting Vulkan_LIBS and Vulkan_INCLUDES setup from `cmake/Dependencies.cmake` to `cmake/VulkanDependencies.cmake` and reuse it in android/pytorch_android/CMakeLists.txt Adding control to build with Vulkan setting env variable `USE_VULKAN` for `scripts/build_android.sh` `scripts/build_pytorch_android.sh` We do not use Vulkan backend in pytorch_android, but with this build option we can track android aar change with `USE_VULKAN` added. Currently it is 88Kb. Test Plan: Imported from OSS Differential Revision: D21770892 Pulled By: IvanKobzarev fbshipit-source-id: a39433505fdcf43d3b524e0fe08062d5ebe0d872	2020-05-28 15:39:02 -07:00
Ilia Cherniavskii	2d708cefcc	Move RecordFunction into ATen (#37548 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/37548 Moving RecordFunction from torch::autograd::profiler into at namespace Test Plan: CI Imported from OSS Differential Revision: D21315852 fbshipit-source-id: 4a4dbabf116c162f9aef0da8606590ec3f3847aa	2020-05-07 14:52:39 -07:00
Ilia Cherniavskii	800d5617c0	Recording of TorchScript functions (#34710 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/34710 Extending RecordFunction API to support new recording scopes (such as TorchScript functions), as well as giving more flexibility to set sampling rate. Test Plan: unit test (test_misc.cpp/testRecordFunction) Reviewed By: gdankel, dzhulgakov Differential Revision: D20158523 fbshipit-source-id: a9e0819d21cc06f4952d92d43246587c36137582	2020-03-31 00:33:23 -07:00
Nikita Shulga	b9adbb5002	Fix/relax CMake linter rules (#35574 ) Summary: Ignore mixed upper-case/lower-case style for now. Fix space between function and its arguments violation. Pull Request resolved: https://github.com/pytorch/pytorch/pull/35574 Test Plan: CI Differential Revision: D20712969 Pulled By: malfet fbshipit-source-id: 0012d430aed916b4518599a0b535e82d15721f78	2020-03-27 16:52:33 -07:00
Ivan Kobzarev	f9cddff25a	[android] Preload module actions do only once (#32313 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/32313 `torch::autograd::profiler::pushCallback()` and `torch::jit::setPrintHandler` should be called only once, not before every load. `JITCallGuard guard;` is not needed before loading a module and has no effect. Test Plan: Imported from OSS Differential Revision: D20559676 Pulled By: IvanKobzarev fbshipit-source-id: 70cce5d2dda20a00b378639725294cb3c440bad2	2020-03-20 20:06:25 -07:00
Jiakai Liu	6e47e7bf52	[pytorch][mobile] fixed AutoGradMode/AutoNonVariableTypeMode uses for mobile callsites Summary: There are three guards related to mobile build: * AutoGradMode * AutoNonVariableTypeMode * GraphOptimizerEnabledGuard Today we need to set some of these guards before calling libtorch APIs because we customized mobile build to only support inference (for both OSS and most FB use cases) to optimize binary size. Several changes were made since 1.3 release so there are already inconsistent uses of these guards in the codebase. I did a sweep of all mobile related model loading & forward() call sites, trying to unify the use of these guards: Full JIT: still set all three guards. More specifically: * OSS: Fixed a bug of not setting the guard at model load time correctly in Android JNI. * FB: Not covered by this diff (as we are using mobile interpreter for most internal builds). Lite JIT (mobile interpreter): only needs AutoNonVariableTypeMode guard. AutoGradMode doesn't seem to be relevant (so removed from a few places) and GraphOptimizerEnabledGuard is definitely not relevant (only full JIT has graph optimizer). More specifically: * OSS: At this point we are not committed to support Lite-JIT. For Android it shares the same code with FB JNI callsites. * FB: JNI callsites: Use the unified LiteJITCallGuard. For iOS/C++: manually set AutoNonVariableTypeMode for _load_for_mobile() & forward() callsites. Ideally we should avoid having to set AutoNonVariableTypeMode for mobile interpreter. It's currently needed for dynamic dispatch + inference-only mobile build (where variable kernels are not registered) - without the guard it will try to run `variable_fallback_kernel` and crash (PR #34038). The proper fix will take some time so using this workaround to unblock selective BUCK build which depends on dynamic dispatch. PS. 
The current status (of having to set AutoNonVariableTypeMode) should not block running FL model + mobile interpreter - if all necessary variable kernels are registered then it can call _load_for_mobile()/forward() against the FL model without setting the AutoNonVariableTypeMode guard. It's still inconvenient for JAVA callsites as it's set unconditionally inside JNI methods. Test Plan: - CI Reviewed By: xta0 Differential Revision: D20498017 fbshipit-source-id: ba6740f66839a61790873df46e8e66e4e141c728	2020-03-18 17:19:35 -07:00
generatedunixname89002005287564	14c1ab049d	[Codemod][FBSourceGoogleJavaFormatLinter] Daily `arc lint --take GOOGLEJAVAFORMAT` Reviewed By: zertosh Differential Revision: D20415422 fbshipit-source-id: 860f8dd9dce0a2420792bafb7d3e58bd883ab7e4	2020-03-13 06:27:03 -07:00
Michael Suo	c235be42dd	[jit] kill script namespace (#34515 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/34515 Once upon a time we thought this was necessary. In reality it is not, so removing it. For backcompat, our public interface (defined in `api/`) still has typedefs to the old `script::` names. There was only one collision: `Pass` as a `Stmt` and `Pass` as a graph transform. I renamed one of them. Test Plan: Imported from OSS Differential Revision: D20353503 Pulled By: suo fbshipit-source-id: 48bb911ce75120a8c9e0c6fb65262ef775dfba93	2020-03-11 23:32:48 -07:00
Jiakai Liu	7aca9afdfb	[pytorch] remove boilerplate setQEngine() from PyTorch mobile predictors (#34556 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/34556 According to https://github.com/pytorch/pytorch/pull/34012#discussion_r388581548, this `at::globalContext().setQEngine(at::QEngine::QNNPACK);` call isn't really necessary for mobile. In Context.cpp it selects the last available QEngine if the engine isn't set explicitly. For OSS mobile prebuild it should only include QNNPACK engine so the default behavior should already be desired behavior. It makes difference only when USE_FBGEMM is set - but it should be off for both OSS mobile build and internal mobile build. Test Plan: Imported from OSS Differential Revision: D20374522 Pulled By: ljk53 fbshipit-source-id: d4e437a03c6d4f939edccb5c84f02609633a0698	2020-03-11 00:55:14 -07:00
Michael Suo	dbe850af5b	[jit] do the code reorg (#33851 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/33851 Rationale and context described in #33828. Script to reproduce the move: https://gist.github.com/suo/16cbefaaeb67ca5a7c6caffd49b7f6e9 ghstack-source-id: 99079645 Test Plan: Make sure CI passes Reviewed By: jamesr66a Differential Revision: D20133869 fbshipit-source-id: 390e9241a9c85366d9005c492ac31f10aa96488e	2020-02-27 13:02:51 -08:00
Ashkan Aliabadi	6aecfd1e80	Mobile Backend: NHWC memory layout + XNNPACK integration. (#33722 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/33722 In order to improve CPU performance on floating-point models on mobile, this PR introduces a new CPU backend for mobile that implements the most common mobile operators with NHWC memory layout support through integration with XNNPACK. XNNPACK itself, and this codepath, are currently only included in the build, but the actual integration is gated with USE_XNNPACK preprocessor guards. This preprocessor symbol is intentionally not passed on to the compiler, so as to enable this rollout in multiple stages in follow up PRs. This changeset will build XNNPACK as part of the build if the identically named USE_XNNPACK CMAKE variable, defaulted to ON, is enabled, but will not actually expose or enable this code path in any other way. Furthermore, it is worth pointing out that in order to efficiently map models to these operators, some front-end method of exposing this backend to the user is needed. The less efficient implementation would be to hook these operators into their corresponding native implementations, granted that a series of XNNPACK-specific conditions are met, much like how NNPACK is integrated with PyTorch today for instance. Having said that, while the above implementation is still expected to outperform NNPACK based on the benchmarks I ran, the above integration would leave a considerable gap between the performance achieved and the maximum performance potential XNNPACK enables, as it does not provide a way to compute and factor out one-time operations out of the innermost forward() loop. The more optimal solution, and one we will decide on soon, would involve either providing a JIT pass that maps nn operators onto these newly introduced operators, while allowing one-time calculations to be factored out, much like quantized mobile models. 
Alternatively, new eager-mode modules can also be introduced that would directly call into these implementations either through c10 or some other mechanism, also allowing for decoupling of op creation from op execution. This PR does not include any of the front end changes mentioned above. Neither does it include the mobile threadpool unification present in the original https://github.com/pytorch/pytorch/issues/30644. Furthermore, this codepath seems to be faster than NNPACK in a good number of use cases, which can potentially allow us to remove NNPACK from aten to make the codebase a little simpler, granted that there is widespread support for such a move. Regardless, these changes will be introduced gradually and in a more controlled way in subsequent PRs. Pull Request resolved: https://github.com/pytorch/pytorch/pull/32509 Test Plan: Build: CI Functionality: Not exposed Reviewed By: dreiss Differential Revision: D20069796 Pulled By: AshkanAliabadi fbshipit-source-id: d46c1c91d4bea91979ea5bd46971ced5417d309c	2020-02-24 21:58:56 -08:00
Ashkan Aliabadi	039dc90854	Revert D19521853: [pytorch][PR] Mobile Backend: NHWC memory layout + XNNPACK integration. Test Plan: revert-hammer Differential Revision: D19521853 Original commit changeset: 99a1fab31d0e fbshipit-source-id: 76dfc1f481797ba2386997533cf19957637687d6	2020-02-23 22:07:19 -08:00
Ashkan Aliabadi	941b42428a	Mobile Backend: NHWC memory layout + XNNPACK integration. (#32509 ) Summary: In order to improve CPU performance on floating-point models on mobile, this PR introduces a new CPU backend for mobile that implements the most common mobile operators with NHWC memory layout support through integration with XNNPACK. XNNPACK itself, and this codepath, are currently only included in the build, but the actual integration is gated with USE_XNNPACK preprocessor guards. This preprocessor symbol is intentionally not passed on to the compiler, so as to enable this rollout in multiple stages in follow up PRs. This changeset will build XNNPACK as part of the build if the identically named USE_XNNPACK CMAKE variable, defaulted to ON, is enabled, but will not actually expose or enable this code path in any other way. Furthermore, it is worth pointing out that in order to efficiently map models to these operators, some front-end method of exposing this backend to the user is needed. The less efficient implementation would be to hook these operators into their corresponding native implementations, granted that a series of XNNPACK-specific conditions are met, much like how NNPACK is integrated with PyTorch today for instance. Having said that, while the above implementation is still expected to outperform NNPACK based on the benchmarks I ran, the above integration would leave a considerable gap between the performance achieved and the maximum performance potential XNNPACK enables, as it does not provide a way to compute and factor out one-time operations out of the innermost forward() loop. The more optimal solution, and one we will decide on soon, would involve either providing a JIT pass that maps nn operators onto these newly introduced operators, while allowing one-time calculations to be factored out, much like quantized mobile models. 
Alternatively, new eager-mode modules can also be introduced that would directly call into these implementations either through c10 or some other mechanism, also allowing for decoupling of op creation from op execution. This PR does not include any of the front end changes mentioned above. Neither does it include the mobile threadpool unification present in the original https://github.com/pytorch/pytorch/issues/30644. Furthermore, this codepath seems to be faster than NNPACK in a good number of use cases, which can potentially allow us to remove NNPACK from aten to make the codebase a little simpler, granted that there is widespread support for such a move. Regardless, these changes will be introduced gradually and in a more controlled way in subsequent PRs. Pull Request resolved: https://github.com/pytorch/pytorch/pull/32509 Reviewed By: dreiss Differential Revision: D19521853 Pulled By: AshkanAliabadi fbshipit-source-id: 99a1fab31d0ece64961df074003bb852c36acaaa	2020-02-23 19:08:42 -08:00
Andres Suarez	b28a834813	[codemod][lint][fbcode] Apply google-java-format Test Plan: Sandcastle. Visual inspection. Reviewed By: scottrice Differential Revision: D19878711 fbshipit-source-id: be56f70b35825140676be511903e5274d1808f25	2020-02-13 12:14:14 -08:00
Ivan Kobzarev	eab99ab08e	[android] fbjni DoNotStrip annotation for oss native methods (#32567 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/32567 This is a first change to support proguard. Even if these methods are never called from java, we register them at the jni level, and this registration will fail if the methods are stripped. Adding DoNotStrip to all native methods that are registered in OSS. Once consumerProguardFiles is integrated in fbjni to prevent proguard from stripping DoNotStrip members, this will fix errors with proguard on. Test Plan: Imported from OSS Differential Revision: D19624684 Pulled By: IvanKobzarev fbshipit-source-id: cd7d9153e9f8faf31c99583cede4adbf06bab507	2020-01-29 11:52:53 -08:00
David Reiss	e4f43bf7a5	Set rpath for JNI library on Mac (#32247 ) Summary: Without this, dlopen won't look in the proper directory for dependencies (like libtorch and fbjni). Pull Request resolved: https://github.com/pytorch/pytorch/pull/32247 Test Plan: Build libpytorch_jni.dylib on Mac, replaced the one from the libtorch nightly, and was able to run the Java demo. Differential Revision: D19501498 Pulled By: dreiss fbshipit-source-id: 13ffdff9622aa610f905d039f951ee9a3fdc6b23	2020-01-21 11:30:39 -08:00
Zachary DeVito	7e3c438913	Renaming IValue List functions (#32093 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/32093 toGenericListRef -> toListRef, isGenericList -> isList, toGenericList -> toList, toXListRef -> toXVector Test Plan: Imported from OSS Reviewed By: suo Differential Revision: D19369767 Pulled By: zdevito fbshipit-source-id: 4f0078f95b83e6586524c03f7bcf206722fdd9ae	2020-01-17 15:17:45 -08:00
Ivan Kobzarev	104b2c610b	Tensor prep from image in native (#31426 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/31426 Tensor conversion from a YUV image is moved to native code, with optimizations: no branching inside the loop, no variable declarations inside the loop, fewer ops. Perf stats from local devices, measuring conversion of a 320x240 image from the camera to a 1,3,224,224 tensor. Legend: Java - current java impl; JavaOpt - current java impl + the same optimizations (no if/else in for, variables declared outside of for, inlining, etc.); C - C impl
```
Nexus 5
JavaOpt N:25 avg:119.24 min: 87 max:177 p10:102 p25:105 p50:115 p75:127 p90:150
      C N:25 avg: 17.24 min: 14 max: 39 p10: 14 p25: 15 p50: 15 p75: 16 p90: 23
   Java N:25 avg:139.96 min: 70 max:214 p10: 89 p25:110 p50:139 p75:173 p90:181
avg C vs JavaOpt 6.91x

Pixel 3 XL
JavaOpt N:19 avg: 16.11 min: 12 max: 19 p10: 14 p25: 15 p50: 16 p75: 18 p90: 19
      C N:19 avg:  5.79 min:  3 max: 10 p10:  4 p25:  5 p50:  6 p75:  6 p90:  9
   Java N:19 avg: 16.21 min: 12 max: 20 p10: 14 p25: 15 p50: 16 p75: 18 p90: 20
avg C vs JavaOpt 2.78x

Full build with 4 abis inside:
Pixel 3 XL
JavaOpt N:25 avg: 18.84 min: 16 max: 24 p10: 16 p25: 17 p50: 18 p75: 20 p90: 22
      C N:25 avg:  7.96 min:  5 max: 10 p10:  7 p25:  7 p50:  8 p75:  9 p90:  9
avg C vs JavaOpt 2.36x
```
Test Plan: Imported from OSS Differential Revision: D19165429 Pulled By: IvanKobzarev fbshipit-source-id: 3b54e545f6fbecbc5bb43216aca81061e70bd369	2020-01-15 17:10:00 -08:00
Ivan Kobzarev	de5821d291	Torchscript print to logcat (#31456 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/31456 External request https://discuss.pytorch.org/t/jit-android-debugging-the-model/63950 By default torchscript print function goes to stdout. For android it is not seen in logcat by default. This change propagates it to logcat. Test Plan: Imported from OSS Differential Revision: D19171405 Pulled By: IvanKobzarev fbshipit-source-id: f9c88fa11d90bb386df9ed722ec9345fc6b25a34	2020-01-15 16:44:56 -08:00
David Reiss	4daa3dedbe	Fix IValue.isList Summary: I think this was wrong before? Test Plan: Not sure. Reviewed By: IvanKobzarev Differential Revision: D19221358 fbshipit-source-id: 27e675cac15dde29e026305f4b4e6cc774e15767	2020-01-07 16:33:36 -08:00
David Reiss	1b4d3d5748	Properly return data from non-contiguous tensors in Java Summary: These were returning incorrect data before. Now we make a contiguous copy before converting to Java. Exposing raw data to the user might be faster in some cases, but it's not clear that it's worth the complexity and code size. Test Plan: New unit test. Reviewed By: IvanKobzarev Differential Revision: D19221361 fbshipit-source-id: 22ecdad252c8fd968f833a2be5897c5ae483700c	2020-01-07 16:33:31 -08:00
David Reiss	2d6a2c898c	Support tensors with a storage offset in Java (#31584 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/31584 These were returning incorrect data before. Test Plan: New unit test. Reviewed By: IvanKobzarev Differential Revision: D19221360 fbshipit-source-id: b3f01de086857027f8e952a1c739f60814a57acd	2020-01-07 16:33:26 -08:00
David Reiss	6d1fa8296b	Support tensors with empty shape in Java Summary: These are valid tensors. Test Plan: New unit test. Reviewed By: IvanKobzarev Differential Revision: D19221362 fbshipit-source-id: fa9af2fc539eb7381627b3d473241a89859ef2ba	2020-01-07 16:33:21 -08:00
Ivan Kobzarev	492ca46e71	Fix androidTest - exclude host tests from it Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/31522 Test Plan: Imported from OSS Reviewed By: dreiss Differential Revision: D19200861 Pulled By: IvanKobzarev fbshipit-source-id: a6024f3013398f9e0d237e06c984a20493d42f11	2020-01-06 11:29:46 -08:00
Ivan Kobzarev	3a19980b78	Tensor class created from java does not call native methods Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/31520 Test Plan: Imported from OSS Reviewed By: iseeyuan Differential Revision: D19199477 Pulled By: IvanKobzarev fbshipit-source-id: ba51454586a9385dba4ab73936f907346e0105d1	2019-12-20 14:40:54 -08:00
David Reiss	35b249769d	Exclude lite interpreter Java files from OSS host build Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/31204 Test Plan: Imported from OSS Differential Revision: D19200610 Pulled By: dreiss fbshipit-source-id: 0cf41c99b4c2604afc2dccfebbea213c0e1f9638	2019-12-20 13:32:27 -08:00
Ivan Kobzarev	930d0751e6	Java Tensor hybrid, owns at::Tensor, no memcopy for java outputs. (#30501) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30501 Motivation: In current state output of libtorch Module forward,runMethod is mem copied to java ByteBuffer, which is allocated, at least in some versions of android, on java heap. That could lead to intensive garbage collection. Change: Output java tensor becomes owner of output at::Tensor and holds it (as `pytorch_jni::TensorHybrid::tensor_` field) alive until the java part is destroyed by GC. For that org.pytorch.Tensor becomes 'Hybrid' class in fbjni naming and starts holding member field `HybridData mHybridData;` If construction of it starts from java side - java constructors of subclasses (we need all the fields initialized, due to this `mHybridData` is not declared final, but works as final) call `this.mHybridData = super.initHybrid();` to initialize cpp part (`at::Tensor tensor_`). If construction starts from cpp side - cpp side is initialized using provided at::Tensor with `makeCxxInstance(std::move(tensor))` and is passed to java method `org.pytorch.Tensor#nativeNewTensor` as parameter `HybridData hybridData`, which holds native pointer to cpp side. In that case `initHybrid()` method is not called, but a parallel set of ctors of subclasses is used, which stores `hybridData` in `mHybridData`. Renaming: `JTensor` -> `TensorHybrid` Removed method: `JTensor::newAtTensorFromJTensor(JTensor)` becomes trivial `TensorHybrid->cthis()->tensor()` Test Plan: Imported from OSS Differential Revision: D18893320 Pulled By: IvanKobzarev fbshipit-source-id: df94775d2a010a1ad945b339101c89e2b79e0f83	2019-12-15 21:36:20 -08:00
Ivan Kobzarev	701e05dcbb	Buck test targets robolectric,instrumentattion Summary: Buck targets for robolectric and instrumentation tests for pytorch android: ``` buck test fbsource//fbandroid/mode/server //xplat/caffe2/android:test_host ``` ``` buck test //xplat/caffe2/android:test_instrumentation ``` For both: ``` buck test fbsource//fbandroid/mode/server //xplat/caffe2/android:pytorch ``` Models in assets: `pt_android_test_asset` - creates buck target that can be included in both robolectric and instrumentation tests that contains asset created from provided torchscript sources as separate file, using the latest binaries of libtorch. `pt_gen_test_asset_bin` does that tacing, usage format ``` generate_test_asset input_file.jit output_file.py ``` Example of test-host setup for users of pytorch android: robolectric tests: ``` load("fbsource//xplat/caffe2:pt_defs.bzl", "pt_android_test_asset", "pt_predictor_binary", "PT_ANDRIOID_TEST_HOST_JNI_DEPS") pt_android_test_asset( name = "test_asset", src = "test_asset.jit", asset_name = "test_asset.pt", ) robolectric3_test( name = "example_test_host", srcs = [...], jni_deps = PT_ANDRIOID_TEST_HOST_JNI_DEPS, deps = [ ":pytorch_common", ":test_asset", "//fbandroid/java/com/facebook/soloader/annotation:annotation", "//fbandroid/java/com/facebook/testing/robolectric/v3:v3", "//fbandroid/libraries/soloader/java/com/facebook/soloader:soloader", "//fbandroid/third-party/java/robolectric3/robolectric:robolectric", ], ) ``` COMMON_LINKER_FLAGS = ["-Wl,--no-as-needed"] can not be applied on MacOs Test Plan: ``` [twsvcscm@od0187.atn1 /data/sandcastle/boxes/fbsource (b416b20a)]$ buck test fbsource//fbandroid/mode/server //xplat/caffe2/android:pytorch Parsing buck files: finished in 7.2 sec Creating action graph: finished in 0.7 sec Building: finished in 11.9 sec (100%) 791/791 jobs, 0 updated Total time: 19.9 sec Testing: finished in 11.0 sec (30 PASS/0 FAIL) RESULTS FOR //xplat/caffe2/android:test_host 
//xplat/caffe2/android:test_instrumentation PASS 159ms 15 Passed 0 Skipped 0 Failed org.pytorch.PytorchHostTests PASS 152ms 15 Passed 0 Skipped 0 Failed org.pytorch.PytorchInstrumentedTests (localhost:31930) TESTS PASSED ``` OSS changes test: ``` gradle -p android pytorch_android:cAT passes ``` Reviewed By: dreiss Differential Revision: D18799005 fbshipit-source-id: 881609826a837efebc8526aee40355c5a62947d0	2019-12-14 20:29:52 -08:00
Ivan Kobzarev	065685180d	Loading module from android asset (#30378 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30378 Loading module directly from android assets. Iteration on https://github.com/pytorch/pytorch/pull/30109 Loading Module: ``` mModule = AndroidUtils.loadModuleFromAsset(assetName, getAssets()); ``` `org.pytorch.AndroidUtils` is excluded from pytorch_jni host build Testing: test_app module load switched to this approach and works fine ``` gradle test_app:installMobNet2QuantDebug -PABI_FILTERS=x86 && adb shell am start -n org.pytorch.testapp.mobNet2Quant/org.pytorch.testapp.MainActivity ``` Test Plan: Imported from OSS Differential Revision: D18893269 Pulled By: IvanKobzarev fbshipit-source-id: a7c73776f40e9c67bef233da05db56cc6efbe76a	2019-12-14 20:29:37 -08:00
Ivan Kobzarev	f7c92f60ba	Typo in filename align with classname Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/31235 Test Plan: Imported from OSS Differential Revision: D19001793 Pulled By: IvanKobzarev fbshipit-source-id: ae7f410be6b3c291f1feb3027b5b4a6b7ce15ab3	2019-12-12 23:16:29 -08:00
Ivan Kobzarev	db90a5b992	Switch to open sourced fbjni (#30175 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30175 fbjni was opensourced and java part is published as 'com.facebook.fbjni:fbjni-java-only:0.0.3' switching to it. We still need submodule fbjni inside the repo (which is already pointing to https://github.com/facebookincubator/fbjni) for so linking. Packaging changes: before that `libfbjni.so` came from pytorch_android_fbjni dependency, as we also linked fbjni in `pytorch_android/CMakeLists.txt` - it was built in pytorch_android, but excluded for publishing. As we had 2 libfbjni.so there was a hack to exclude it for publishing and resolve duplication locally. ``` if (rootProject.isPublishing()) { exclude '/libfbjni.so' } else { pickFirst '/libfbjni.so' } ``` After this change fbjni.so will be packaged inside pytorch_android.aar artefact and we do not need this gradle logic. I will update README in separate PR after landing previous PR to readme(https://github.com/pytorch/pytorch/pull/30128) to avoid conflicts Test Plan: Imported from OSS Differential Revision: D18982235 Pulled By: IvanKobzarev fbshipit-source-id: 5097df2557858e623fa480625819a24a7e8ad840	2019-12-12 20:05:22 -08:00
Ivan Kobzarev	ca8cb3241a	Expose setNumThreads to android api (#31205 ) Summary: PR https://github.com/pytorch/pytorch/pull/31033 was unlanded due to macos build failure: https://app.circleci.com/jobs/github/pytorch/pytorch/3916388 This PR has changes that `setNumThreads` is only for android and moved to separate class `org.pytorch.PytorchAndroid` as a static function which is better as it has global effect Pull Request resolved: https://github.com/pytorch/pytorch/pull/31205 Reviewed By: dreiss Differential Revision: D18977250 Pulled By: IvanKobzarev fbshipit-source-id: 4995859808af498c82933c4db52bd7c7dfae90e5	2019-12-12 18:57:27 -08:00
Michael Suo	c0bcfd0445	Revert D18923167: Expose setNumThreads to android api Test Plan: revert-hammer Differential Revision: D18923167 Original commit changeset: 8d98c2edbff4 fbshipit-source-id: 7db37cff298c511d0dd9eb373811c769e4a73be9	2019-12-12 09:23:58 -08:00
Ivan Kobzarev	6225443009	Expose setNumThreads to android api (#31033 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/31033 Intention: There are requests from users to control number of threads from android side: https://discuss.pytorch.org/t/android-pytorch-forward-method-running-in-a-separate-thread-slow-down-ui-thread/63516/2 https://discuss.pytorch.org/t/threading-of-model-pytorch-android/62490/2 At the moment `setNumThreads` is placed in `org.pytorch.Module`, but this method changes global threadPool size, in future we will move it to some separate class to repeat python binding structure, which has torch.set_num_threads() Test Plan: Imported from OSS Differential Revision: D18923167 Pulled By: IvanKobzarev fbshipit-source-id: 8d98c2edbff42e9b673509672dce3f2dd03a923e	2019-12-11 14:20:14 -08:00

109 Commits