Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46356
Adds the flag `-Werror=cast-function-type` to ensure we don't allow
any invalid function-pointer casts (e.g., PyCFunction casts).
For more details see: https://github.com/pytorch/pytorch/issues/45419
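As a hedged illustration (not code from this PR), this is the kind of function-pointer cast the flag now rejects:
```
#include <Python.h>

// A three-argument METH_KEYWORDS handler.
static PyObject* fn(PyObject* self, PyObject* args, PyObject* kwargs) {
  Py_RETURN_NONE;
}

static PyMethodDef methods[] = {
    // error: cast between incompatible function types [-Werror=cast-function-type]
    {"fn", (PyCFunction)fn, METH_VARARGS | METH_KEYWORDS, nullptr},
    {nullptr, nullptr, 0, nullptr},
};
```
The sanctioned workaround is to cast through `void (*)(void)` first, which GCC treats as an explicit opt-out of this warning.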
ghstack-source-id: 114632980
Test Plan: waitforbuildbot
Reviewed By: albanD
Differential Revision: D24319759
fbshipit-source-id: 26ce4650c220e8e9dd3550245f214c7e6c21a5dc
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46383
The old `USE_METAL` macro is actually used by Caffe2, so here we introduce a new macro to enable Metal in PyTorch.
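A hedged sketch of the resulting split; the commit does not spell out the new macro's name here, so `USE_PYTORCH_METAL` is an assumption:
```
#ifdef USE_METAL
// Caffe2's Metal path, unchanged by this diff.
#endif

#ifdef USE_PYTORCH_METAL
// New PyTorch Metal path, gated independently of Caffe2.
#endif
```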
ghstack-source-id: 114499392
Test Plan:
- Circle CI
- The Person Segmentation model works
Reviewed By: linbinyu
Differential Revision: D24322018
fbshipit-source-id: 4e5548afba426b49f314366d89b18ba0c7e745ca
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46116
Ideally I would just use one of the existing preprocessor flags such as `FBCODE_CAFFE2`, but this implies a whole bunch of other things elsewhere, so it is not really a solution for ovrsource.
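The internal kill switch is the `NVALGRIND` define (see the Test Plan below); a hedged sketch of the guard pattern it switches off, with an illustrative call site that is not copied from this diff:
```
#ifndef NVALGRIND
#include <valgrind/valgrind.h>
#endif

bool running_under_valgrind() {
#ifndef NVALGRIND
  return RUNNING_ON_VALGRIND;  // Valgrind client request, compiled in by default
#else
  return false;                // hooks compiled out with -DNVALGRIND
#endif
}
```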
Test Plan: CI green, we are able to disable it internally with `-DNVALGRIND`
Reviewed By: malfet
Differential Revision: D24227360
fbshipit-source-id: 24a3b393cf46d6a16acca0a9ec52610d4bb8704f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46112
### Summary
This PR adds support for running TorchScript models on iOS GPUs via Metal (inference only). The feature is currently in a prototype state; API changes are expected. The tutorial and documentation will be added once it goes to beta.
allow-large-files
- User API
```
auto module = torch::jit::load(model);
module.eval();
at::Tensor input = at::ones({1,3,224,224}, at::ScalarType::Float).metal();
auto output = module.forward({input}).toTensor().cpu();
```
- Supported Models
- Person Segmentation v106 (FB Internal)
- Mobilenetv2
- Supported Operators
- aten::conv2d
- aten::addmm
- aten::add.Tensor
- aten::sub.Tensor
- aten::mul.Tensor
- aten::relu
- aten::hardtanh
- aten::hardtanh_
- aten::sigmoid
- aten::max_pool2d
- aten::adaptive_avg_pool2d
- aten::reshape
- aten::t
- aten::view
- aten::log_softmax.int
- aten::upsample_nearest2d.vec
- Supported Devices
- Apple A9 and above
- iOS 10.2 and above
- CMake scripts
- `IOS_ARCH=arm64 ./scripts/build_ios.sh -DUSE_METAL=ON`
### Test Plan
- Circle CI
ghstack-source-id: 114155638
Test Plan:
1. Sandcastle CI
2. Circle CI
Reviewed By: dreiss
Differential Revision: D23236555
fbshipit-source-id: 98ffc48b837e308bc678c37a9a5fd8ae72d11625
Summary:
CentOS 8 on AArch64 has the vld1_* intrinsics but lacks the vst1q_f32_x2 one.
This patch checks for it and handles it separately from the vld1_* ones.
Fixes https://github.com/pytorch/pytorch/issues/44198
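A hedged sketch of the separate handling; the feature-detection macro name is an assumption:
```
#include <arm_neon.h>

// Where the toolchain provides vld1_*/vld1q_* but not vst1q_f32_x2 (as on
// CentOS 8 AArch64), emulate the missing intrinsic with two plain stores.
#ifdef MISSING_VST1Q_F32_X2
inline void vst1q_f32_x2(float* ptr, float32x4x2_t v) {
  vst1q_f32(ptr, v.val[0]);
  vst1q_f32(ptr + 4, v.val[1]);
}
#endif
```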
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44199
Reviewed By: seemethere
Differential Revision: D23641273
Pulled By: malfet
fbshipit-source-id: c2053c8e0427705eaeeeb82ec030925bff22623a
Summary:
According to the [documentation](https://github.com/pytorch/pytorch/blob/master/tools/setup_helpers/cmake.py#L265), only options whose names start with `BUILD_` / `USE_` / `CMAKE_` in `CMakeLists.txt` can be imported from environment variables.
---
This diff was originally intended to enable C++ source coverage with CircleCI and codecov.io, but we will finish that in the future; the related information is in the diff history. The original procedure was:
Based on [this pull request](1bda5e480c), life becomes much easier this time.
1. In `build.sh`:
- Enable the coverage build option for C++
- `apt-get install lcov`
2. In `test.sh`:
- Run `lcov`
3. In `pytorch-job-specs.yml`:
- Copy coverage.info to the `test/` folder and upload it to codecov.io
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43999
Test Plan: Test on GitHub
Reviewed By: malfet
Differential Revision: D23464656
Pulled By: scintiller
fbshipit-source-id: b2365691f04681d25ba5c00293fbcafe8e8e0745
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43564
Static dispatch was originally introduced for mobile selective build.
Since we have added selective build support for dynamic dispatch and
tested it in FB production for months, we can deprecate static dispatch
to reduce the complexity of the codebase.
Test Plan: Imported from OSS
Reviewed By: ezyang
Differential Revision: D23324452
Pulled By: ljk53
fbshipit-source-id: d2970257616a8c6337f90249076fca1ae93090c7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43154
Adds the build flag `BUILD_MOBILE_AUTOGRAD` which toggles whether autograd files should be included for a PyTorch mobile build (default off).
ghstack-source-id: 110369406
Test Plan: CI
Reviewed By: ljk53
Differential Revision: D23061913
fbshipit-source-id: bc3d6683ab17f158990d83e4fae0a011d5adeca1
Summary:
Fixes https://github.com/pytorch/pytorch/issues/39968
tested with `TORCH_CUDA_ARCH_LIST='3.5 5.2 6.0 6.1 7.0 7.5 8.0+PTX'`; before this PR the build was failing, and with this PR it succeeds.
With `TORCH_CUDA_ARCH_LIST='7.0 7.5 8.0+PTX'`, `libtorch_cuda.so` with symbols changes from 2.9GB -> 2.2GB
cc: ptrblck mcarilli jjsjann123
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43074
Reviewed By: mrshenli
Differential Revision: D23176095
Pulled By: malfet
fbshipit-source-id: 7b3e6d049fc080e519f21e80df05ef68e7bea57e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42837
Originally we use
```
list(APPEND CMAKE_C_FLAGS -fprofile-instr-generate -fcoverage-mapping)
list(APPEND CMAKE_CXX_FLAGS -fprofile-instr-generate -fcoverage-mapping)
```
But when compiling the project on Mac with coverage on, it fails with the error:
`clang: error: no input files`
`/bin/sh: -fprofile-instr-generate: command not found`
`/bin/sh: -fcoverage-mapping: command not found`
The reason is that `list(APPEND ...)` joins elements with `;`: if we do `list(APPEND foo a)` and then `list(APPEND foo b)`, `foo` becomes `a;b` -- with the extra `;`. Since `CMAKE_CXX_FLAGS` is already defined earlier in the `CMakeLists.txt`, we can only use `set(...)` here.
After changing it to
```
set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -fprofile-instr-generate -fcoverage-mapping")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -fprofile-instr-generate -fcoverage-mapping")
```
Tested successfully on a local Mac machine.
Test Plan: Test locally on mac machine
Reviewed By: malfet
Differential Revision: D23043057
fbshipit-source-id: ff6f4891b35b7f005861ee2f8e4c550c997fe961
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40179
- Pass -Wno-psabi to suppress the GCC warning "The ABI for passing
parameters with 64-byte alignment has changed in GCC 4.6"
- Fix use of deprecated data() accessor (and minor optimization: hoist
accessor out of loop)
- Undeprecate NetDef.num_workers; no one is serious about fixing these
- Suppress warnings about deprecated pthreadpool types
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D22234138
Pulled By: ezyang
fbshipit-source-id: 6a1601b6d7551a7e6487a44ae65b19acdcb7b849
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39341
This PR introduces a NEON backend for the Vec256 class for the float datatype.
For now only AArch64 is enabled, due to a few issues with enabling it on
32-bit ARM (aarch32).
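For context, a hedged sketch of the kind of generic Vec256 code that now lowers to NEON on AArch64; the function itself is illustrative, not from this PR:
```
#include <ATen/cpu/vec256/vec256.h>

using Vec = at::vec256::Vec256<float>;

// The same portable loop maps to NEON loads/adds/stores on aarch64.
void add_arrays(const float* a, const float* b, float* out, int64_t n) {
  int64_t i = 0;
  for (; i + Vec::size() <= n; i += Vec::size()) {
    auto va = Vec::loadu(a + i);
    auto vb = Vec::loadu(b + i);
    (va + vb).store(out + i);
  }
  for (; i < n; ++i) out[i] = a[i] + b[i];  // scalar tail
}
```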
Test Plan:
vec256_test
Imported from OSS
Differential Revision: D21822399
fbshipit-source-id: 3851c4336d93d1c359c85b38cf19904f82bc7b8d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40059
This benchmark is added specifically for mobile, to check whether the compiler
is auto-vectorizing, and thus whether the NEON backend for Vec256 offers no
advantage for the add op.
Test Plan:
CI
Imported from OSS
Differential Revision: D22055146
fbshipit-source-id: 43ba6c4ae57c6f05d84887c2750ce21ae1b0f0b5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41103
Adds a CLANG_CODE_COVERAGE option to the CMakeLists. If the option is ON, the compile flags needed for code coverage are added.
Test Plan:
Cloned the PyTorch source locally, applied these changes, and built with `CLANG_CODE_COVERAGE ON` and `BUILD_TESTS ON`. Ran a manual test; code coverage report attached.
{F243609020}
Reviewed By: malfet
Differential Revision: D22422513
fbshipit-source-id: 27a31395c31b5b5f4b72523954722771d8f61080
Summary:
This re-applies D21232894 (b9d3869df3) and D22162524, plus updates jni_deps in a few places
to avoid breaking host JNI tests.
Test Plan: `buck test @//fbandroid/mode/server //fbandroid/instrumentation_tests/com/facebook/caffe2:host-test`
Reviewed By: xcheng16
Differential Revision: D22199952
fbshipit-source-id: df13eef39c01738637ae8cf7f581d6ccc88d37d5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37243
*** Why ***
As it stands, we have two thread pool solutions concurrently in use in PyTorch mobile: (1) the open source pthreadpool library under third_party, and (2) Caffe2's implementation of pthreadpool under caffe2/utils/threadpool. Since the primary use-case of the latter has been to act as a drop-in replacement for the third party version so as to enable integration and usage from within NNPACK and QNNPACK, Caffe2's implementation is intentionally written to the exact same interface as the third party version.
The original argument in favor of C2's implementation has been improved performance as a result of using spin locks, as opposed to relinquishing the thread's time slot and putting it to sleep - a less expensive operation up to a point. That seems to have given C2's implementation the upper hand in performance, hence justifying the added maintenance complexity, until the third party version improved in parallel surpassing the efficiency of C2's implementation as I have verified in benchmarks. With that advantage gone, there is no reason to continue using C2's implementation in PyTorch mobile either from the perspective of performance or code hygiene. As a matter of fact, there is considerable performance benefit to be had as a result of using the third party version as it currently stands.
This is a tricky change though, mainly because, in order to avoid potential performance regressions (of which I have witnessed none, but out of an abundance of caution), we have decided to continue using C2's internal implementation whenever building for Caffe2. Again, this is mainly to avoid potential performance regressions in production C2 use cases, even if doing so results in reduced performance as far as I can tell.
So to summarize, today, and as it currently stands, we are using C2's implementation for (1) NNPACK, (2) PyTorch QNNPACK, and (3) ATen parallel_for on mobile builds, while using the third party version of pthreadpool for XNNPACK as XNNPACK does not provide any build options to link against an external implementation unlike NNPACK and QNNPACK do.
The goal of this PR then, is to unify all usage on mobile to the third party implementation both for improved performance and better code hygiene. This applies to PyTorch's use of NNPACK, QNNPACK, XNNPACK, and mobile's implementation of ATen parallel_for, all getting routed to the
exact same third party implementation in this PR.
Considering that NNPACK, QNNPACK, and XNNPACK are not mobile specific, these benefits carry over to non-mobile builds of PyTorch (but not Caffe2) as well. The implementation of ATen parallel_for on non-mobile builds remains unchanged.
*** How ***
This is where things get tricky.
A good deal of the build system complexity in this PR arises from our desire to maintain C2's implementation intact for C2's use.
pthreadpool is a C library with no concept of namespaces, which means two copies of the library cannot exist in the same binary, or symbol collisions will occur, violating the ODR. This means that somehow, and based on some condition, we must decide on the choice of a pthreadpool implementation. In practice, this has become more complicated as a result of all the possible combinations that USE_NNPACK, USE_QNNPACK, USE_PYTORCH_QNNPACK, USE_XNNPACK, USE_SYSTEM_XNNPACK, USE_SYSTEM_PTHREADPOOL and other variables can result in. Having said that, I have done my best in this PR to surgically cut through this complexity in a way that minimizes the side effects, considering the significance of the performance we are leaving on the table; yet, as a result of the combinatorial explosion explained above, I cannot guarantee that every single combination will work as expected on the first try. I am heavily relying on CI to find any issues, as local testing can only go so far.
Having said that, this PR provides a simple non mobile-specific C++ thread pool implementation on top of pthreadpool, namely caffe2::PThreadPool that automatically routes to C2's implementation or the third party version depending on the build configuration. This simplifies the logic at the cost of pushing the complexity to the build scripts. From there on, this thread pool is used in aten parallel_for, and NNPACK and family, again, routing all usage of threading to C2 or third party pthreadpool depending on the build configuration.
When it is all said and done, the layering will look like this:
a) aten::parallel_for, uses
b) caffe2::PThreadPool, which uses
c) pthreadpool C API, which delegates to
c-1) third_party implementation of pthreadpool if that's what the build has requested, and the rabbit hole ends here.
c-2) C2's implementation of pthreadpool if that's what the build has requested, which itself delegates to
c-2-1) caffe2::ThreadPool, and the rabbit hole ends here.
NNPACK, and (PyTorch) QNNPACK directly hook into (c). They never go through (b).
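To make the layering concrete, a hedged usage sketch (illustrative code, not from this PR): a call like the one below enters at (a) and, on mobile, now bottoms out in the third party pthreadpool:
```
#include <ATen/Parallel.h>

void scale(float* data, int64_t n, float alpha) {
  // (a) aten::parallel_for splits [0, n) into chunks and dispatches them to
  // (b) caffe2::PThreadPool, which drives (c) the pthreadpool C API.
  at::parallel_for(0, n, /*grain_size=*/2048, [&](int64_t begin, int64_t end) {
    for (int64_t i = begin; i < end; ++i) {
      data[i] *= alpha;
    }
  });
}
```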
Differential Revision: D21232894
Test Plan: Imported from OSS
Reviewed By: dreiss
Pulled By: AshkanAliabadi
fbshipit-source-id: 8b3de86247fbc3a327e811983e082f9d40081354
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39584
Removing `-DNO_EXPORT` for the non-custom build, to be able to link against the C10/ATen API.
The custom build stays the same, as its main goal is minimum binary size, and exported API functions would increase it.
Additional changes:
1. aten/src/ATen/DynamicLibrary.cpp uses libdl; if we need this functionality we will need to link the result with libdl, but for now this functionality is disabled for mobile.
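A hedged sketch of why the define matters for symbol visibility; the macro wiring below is illustrative (`MY_EXPORT` is hypothetical), not copied from c10's Export.h:
```
// With something like -DNO_EXPORT in effect, an export macro collapses to
// nothing, so in a -fvisibility=hidden build the symbol stays hidden from
// apps linking against the library.
#if defined(NO_EXPORT)
#define MY_EXPORT
#else
#define MY_EXPORT __attribute__((__visibility__("default")))
#endif

MY_EXPORT int answer() {  // visible to linking apps only when exported
  return 42;
}
```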
Test Plan: Imported from OSS
Differential Revision: D22111600
Pulled By: IvanKobzarev
fbshipit-source-id: d730201c55f543c959a596b34be532aecee6b9ab
Summary:
Switch off `/Z7` so that we don't generate debug info in Release and MinSizeRel builds; this should give smaller static libraries and object files and faster build times.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39703
Differential Revision: D21960684
Pulled By: ezyang
fbshipit-source-id: 909a237a138183591d667885b13fc311470eed65
Summary:
According to
<https://gitlab.kitware.com/cmake/cmake/-/blob/master/Modules/Compiler/MSVC-C.cmake>,
the option simply has no effect for MSVC as of today. It is better not to impose
such an if condition, as it is a bit misleading (the current code makes it look like we have compatibility issues with MSVC's C11 support), and it is better to
leave the judgment of MSVC C support to the CMake devs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39304
Differential Revision: D21846032
Pulled By: malfet
fbshipit-source-id: 962e5721da3d7b9be4117b42bdc35df426b7da7b
Summary:
This PR contains the initial version of Vulkan (GPU) Backend integration.
The primary target environment is Android, but the desktop build is also supported.
## CMake
Introducing three cmake options:
USE_VULKAN:
The main switch; if it is OFF, the other options have no effect.
USE_VULKAN_WRAPPER:
ON - Vulkan is loaded at runtime as "libvulkan.so" using libdl, with every function call wrapped in vulkan_wrapper.h.
OFF - links against libvulkan.so directly.
USE_VULKAN_SHADERC_RUNTIME:
ON - the shader compilation library is linked, and shaders are compiled at runtime.
OFF - shaders are precompiled, and the shader compilation library is not included.
## Codegen
Shader codegen starts in cmake/VulkanCodegen.cmake, which calls `aten/src/ATen/native/vulkan/gen_glsl.py` or `aten/src/ATen/native/vulkan/gen_spv.py` to embed the shader source or SPIR-V bytecode in the binary:
If `USE_VULKAN_SHADERC_RUNTIME` is ON:
the shader source is included as `glsl.h`/`glsl.cpp`, to be compiled at runtime.
If `USE_VULKAN_SHADERC_RUNTIME` is OFF:
the precompiled SPIR-V bytecode is included as a uint32_t array in spv.h/spv.cpp.
All codegen results are placed in the build directory.
## Build dependencies
cmake/Dependencies.cmake
If the target platform is Android, the Vulkan library, headers, and Vulkan wrapper are taken from the ANDROID_NDK.
The desktop build requires the VULKAN_SDK environment variable, and all Vulkan dependencies are taken from it
(the desktop build has been tested only on Linux).
## Pytorch integration:
Adding 'Vulkan' as a new Backend, DispatchKey, and DeviceType.
We are using Strided layout without supporting strides at the moment, but we plan to support them in the future.
Using OpaqueTensorImpl, where the OpaqueHandle is a copyable VulkanTensor;
more details are in the comments in `aten/src/ATen/native/vulkan/Vulkan.h`.
Main code location: `aten/src/ATen/native/vulkan`
`aten/src/ATen/native/vulkan/VulkanAten.cpp` - the connection between ATen and the Vulkan API (Vulkan.h); converts at::Tensor to VulkanTensor.
`aten/src/ATen/native/vulkan/Vulkan.h` - the Vulkan API that contains the VulkanTensor representation and functions to work with it. The plan is to expose it so clients can write their own Vulkan ops.
`aten/src/ATen/native/vulkan/VulkanOps.cpp` - implementations of the Vulkan operations, using the Vulkan.h API.
## GLSL shaders
Located in `aten/src/ATen/native/vulkan/glsl` as *.glsl files.
All shaders use Vulkan specialized constants for workgroup sizes with ids 1, 2, 3
## Supported operations
Supported at this point:
conv2d no-groups
conv2d depthwise
addmm
upsample nearest 2d
clamp
hardtanh
## Testing
`aten/src/ATen/test/vulkan_test.cpp` - contains tests for:
- copy from CPU to Vulkan and back
- all supported operations
Desktop builds are supported, and testing can be done on a desktop with a Vulkan-capable GPU, or with an installed software implementation of Vulkan such as https://github.com/google/swiftshader
## Vulkan execution
The initial implementation is trivial and waits on every operator's execution.
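For symmetry with the Metal snippet earlier in this log, a hedged usage sketch; the `.vulkan()` device-transfer method is an assumption about the user-facing API:
```
auto module = torch::jit::load(model_path);
module.eval();
at::Tensor input = at::ones({1, 3, 224, 224}, at::kFloat).vulkan();
auto output = module.forward({input}).toTensor().cpu();
```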
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36491
Differential Revision: D21696709
Pulled By: IvanKobzarev
fbshipit-source-id: da3e5a770b1a1995e9465d7e81963e7de56217fa
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38565
Also note this turns on "-Wno-unused-local-typedefs" because we are using dispatch macros for error checking.
Test Plan: Imported from OSS
Differential Revision: D21598478
Pulled By: gchanan
fbshipit-source-id: 28f9ad01bd678df0601a10d0daf3ed31c47c4ab2
Summary:
Right now it is an unused alias of the `torch_library` interface library
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38408
Differential Revision: D21598250
Pulled By: malfet
fbshipit-source-id: ec9a2446b94e7ea68298831212005c2c80bbc95c
Summary:
Fixes https://github.com/pytorch/pytorch/issues/26304
Test procedure:
With ninja:
[x] Build a clean checkout
[x] Build again. Result: Only 10 libraries are (needlessly) linked again, the extra delay on a 24-core machine is <10s.
[x] Build for the third time. Result: Virtually instantaneous, with no extra rebuilding.
[x] Modify DispatchTable.h. Build again. Result: `.cu` files are rebuilt, as well as many `.cpp` files
[x] Build for the fifth time. Result: Virtually instantaneous, with no extra rebuilding.
[x] Touch one of the `.depend` files. Build again. Result: Only 10 libraries are (needlessly) linked again, the extra delay on a 24-core machine is <10s.
Without ninja:
[x] Build a clean checkout
[x] Build again. Result: There is some unnecessary rebuilding. But it was also happening before this change.
[x] Build for the third time. Result: Virtually instantaneous, with no extra rebuilding.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37661
Differential Revision: D21434624
Pulled By: ezyang
fbshipit-source-id: 379d2315486b8bb5972c184f9b8da8e00d38c338
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37721
Even though we disabled caffe2 test configs in Python, the BUILD_TEST
option was still building caffe2 test cpp binaries and various CI
configurations were running them (since they just run every binary in
`torch/test`).
This PR adds a caffe2-specific BUILD_TEST option (BUILD_CAFFE2_TEST),
which defaults to OFF, and gates the compilation of caffe2 test cpp
binaries under it.
Test Plan: Imported from OSS
Differential Revision: D21369541
Pulled By: suo
fbshipit-source-id: 669cff70c5b53f016e8e016bcb3a99bf3617e1f9
Summary:
This is useful for Linux distributions when the ABI/API of libtorch changes.
The default SOVERSION is set to
"${TORCH_VERSION_MAJOR}.${TORCH_VERSION_MINOR}".
ezyang
But if the release strategy of pytorch/caffe2 involves avoiding breaking API/ABI changes to libtorch for minor/patch releases, then we can set `TORCH_SOVERSION` to simply `TORCH_VERSION_MAJOR`. Please confirm that.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37502
Differential Revision: D21303565
Pulled By: ezyang
fbshipit-source-id: 798f5ec7fc5f0431ff1a7f9e8e5d3a0d3b25bb22
Summary:
We should not rely on asynchronous exceptions. Catching only C++ exceptions is more sensible, and may give a boost in both space (1163 MB -> 1073 MB, 0.92x) and performance (51m -> 49m, 0.96x).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37235
Differential Revision: D21256918
Pulled By: ezyang
fbshipit-source-id: 572ee96f2e4c48ad13f83409e4e113483b3a457a
Summary:
These options are disabled by default and are intended for Linux distro
developers. When the existing shortcut option USE_SYSTEM_LIBS is toggled,
these new options are enabled as well.
Additionally, when USE_SYSTEM_LIBS is toggled, setup.py should
no longer check the existence of git submodules.
ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37277
Differential Revision: D21256999
Pulled By: ezyang
fbshipit-source-id: 84f97d008db5a5e41a289cb7bce94906de3c52cf