pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-06 12:20:52 +01:00

Author	SHA1	Message	Date
Nikita Shulga	3255ddeec9	Make `Wunused-local-typedef` a hard error (#77918 ) Only allow it for `libtorch_python` and tests Helps prevent regression like https://github.com/pytorch/pytorch/pull/76547#issuecomment-1132208232 Pull Request resolved: https://github.com/pytorch/pytorch/pull/77918 Approved by: https://github.com/osalpekar, https://github.com/seemethere	2022-06-09 18:14:01 +00:00
Nikita Shulga	634954c55c	[MPS] Do not pass linker command to a compiler (#78630 ) `-weak_framework` is a linker rather than a compiler option and as such it should not be passed as CXX flag Also, use `string(APPEND` rather than `set(FOO "$(FOO) ...)` Likely fixes our ability to use `sccache` for MacOS CI builds, see https://github.com/pytorch/pytorch/issues/78375#issuecomment-1143697183 Pull Request resolved: https://github.com/pytorch/pytorch/pull/78630 Approved by: https://github.com/albanD	2022-06-01 22:08:54 +00:00
Alban Desmaison	fd121dfeec	Move x86 binaries builder to macos-12 to enable MPS build Pull Request resolved: https://github.com/pytorch/pytorch/pull/77662 Approved by: https://github.com/seemethere	2022-05-19 21:59:08 +00:00
Peter Bell	5cdf79fddc	Bump minimum CMake version to 3.13 Pull Request resolved: https://github.com/pytorch/pytorch/pull/76312 Approved by: https://github.com/malfet	2022-05-19 15:38:55 +00:00
Eddie Yan	14ab3ff484	[cuDNN V8 API] Enable cuDNN v8 API by default (#75466 ) Testing via CI Pull Request resolved: https://github.com/pytorch/pytorch/pull/75466 Approved by: https://github.com/ngimel	2022-05-17 21:54:17 +00:00
Alban Desmaison	cf975dde0d	Make sure that we can build without xcode on mac (#77450 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/77450 Approved by: https://github.com/drisspg, https://github.com/kulinseth	2022-05-13 21:18:55 +00:00
Kulin Seth	e011a8e18b	Enable PyTorch operations on MPS Backend. (#77343 ) Add PyTorch operations to MPS backend. - https://github.com/pytorch/pytorch/issues/77394 Pull Request resolved: https://github.com/pytorch/pytorch/pull/77343 Approved by: https://github.com/albanD	2022-05-13 18:28:53 +00:00
sanchitintel	4ee29d6033	[Reland take-2] Add JIT graph fuser for oneDNN Graph API (v0.5) Re-landing #68111/#74596 ## Description v0.5 PR of this [RFC](https://github.com/pytorch/pytorch/issues/49444). On the basis of #50256, the below improvements are included: * The [v0.5 release branch](https://github.com/oneapi-src/oneDNN/releases/tag/graph-v0.5) of the oneDNN Graph API is used * The fuser now works with the profiling graph executor. We have inserted type check nodes to guard the profiled tensor properties. ### User API: The optimization pass is disabled by default. Users could enable it by: ``` torch.jit.enable_onednn_fusion(True) ``` `torch.jit.freeze` should be used after tracing (recommended) or scripting a model. ### Performance: [pytorch/benchmark](https://github.com/pytorch/benchmark) tool is used to compare the performance: * SkyLake 8180 (1 socket of 28 cores): ![image](https://user-images.githubusercontent.com/65992142/151162305-05e44425-a24e-4d5e-94e1-743b40b87a8c.png) * SkyLake 8180 (single thread): ![image](https://user-images.githubusercontent.com/65992142/151162528-69f90b79-d08d-46b8-8775-d80a6ccbce8a.png) * By mapping hardswish to oneDNN Graph, it’s 8% faster than PyTorch JIT (NNC + OFI) ** We expect performance gain after mapping transpose, contiguous & view to oneDNN graph ops ### Directory structure of the integration code Fuser-related code is placed under: ``` torch/csrc/jit/codegen/onednn/ ``` Optimization pass registration is done in: ``` torch/csrc/jit/passes/onednn_graph_fuser.h ``` CMake for the integration code is in: ``` caffe2/CMakeLists.txt cmake/public/mkldnn.cmake cmake/Modules/FindMKLDNN.cmake ``` ## Limitations * In this PR, we only support Pytorch-oneDNN-Graph integration on Linux platform. Support on Windows and MacOS will be enabled as a next step. * We have only optimized the inference use-case. Pull Request resolved: https://github.com/pytorch/pytorch/pull/76622 Approved by: https://github.com/eellison	2022-05-05 16:57:03 +00:00
Eddie Yan	e838137b3e	Add high level control of fp32 matmul precision; disable TF32 for matmuls by default #76440 CC @mruberry @ptrblck Pull Request resolved: https://github.com/pytorch/pytorch/pull/76509 Approved by: https://github.com/ngimel	2022-05-04 20:40:13 +00:00
Nikita Shulga	8473173c36	Remove breakpad dependency This functionality does not seem to be used and there are some requests to update dependency. Add `third_party` to torch_cpu include directories if compiling with Caffe2 support, as `caffe2/quantization/server/conv_dnnlowp_op.cc` depends on `third_party/fbgemm/src/RefImplementations.h` Pull Request resolved: https://github.com/pytorch/pytorch/pull/75394 Approved by: https://github.com/janeyx99, https://github.com/seemethere	2022-05-03 20:21:55 +00:00
PyTorch MergeBot	3dcd67a1b3	Revert "[Re-landing 68111] Add JIT graph fuser for oneDNN Graph API (Preview4.1)" This reverts commit `8b11d81058`. Reverted https://github.com/pytorch/pytorch/pull/74596 on behalf of https://github.com/janeyx99	2022-04-29 15:40:17 +00:00
chunyuan	8b11d81058	[Re-landing 68111] Add JIT graph fuser for oneDNN Graph API (Preview4.1) Re-landing https://github.com/pytorch/pytorch/pull/68111 ## Description Preview4 PR of this [RFC](https://github.com/pytorch/pytorch/issues/49444). On the basis of https://github.com/pytorch/pytorch/pull/50256, the below improvements are included: - The [preview4 release branch](https://github.com/oneapi-src/oneDNN/releases/tag/graph-v0.4.1) of the oneDNN Graph API is used - The fuser now works with the profiling graph executor. We have inserted type check nodes to guard the profiled tensor properties. ### User API: The optimization pass is disabled by default. Users could enable it by: ``` torch.jit.enable_onednn_fusion(True) ``` ### Performance: [pytorch/benchmark](https://github.com/pytorch/benchmark) tool is used to compare the performance: - SkyLake 8180 (1 socket of 28 cores): ![image](https://user-images.githubusercontent.com/65992142/151162305-05e44425-a24e-4d5e-94e1-743b40b87a8c.png) - SkyLake 8180 (single thread): ![image](https://user-images.githubusercontent.com/65992142/151162528-69f90b79-d08d-46b8-8775-d80a6ccbce8a.png) \* By mapping hardswish to oneDNN Graph, it’s 8% faster than PyTorch JIT (NNC + OFI) \** We expect performance gain after mapping transpose, contiguous & view to oneDNN graph ops ### Directory structure of the integration code Fuser-related code are placed under: ``` torch/csrc/jit/codegen/onednn/ ``` Optimization pass registration is done in: ``` torch/csrc/jit/passes/onednn_graph_fuser.h ``` CMake for the integration code is: ``` caffe2/CMakeLists.txt ``` ## Limitations - In this PR, we have only supported the optimization on Linux platform. The support on Windows and MacOS will be enabled as the next step. - We have only optimized the inference use case. Pull Request resolved: https://github.com/pytorch/pytorch/pull/74596 Approved by: https://github.com/malfet	2022-04-29 01:01:33 +00:00
Kulin Seth	54c75e1e8f	Add "mps" device to PyTorch framework. Remove the "mlc" device for Mac platforms. This commit will be followed up with: * adding MPS runtime components * PyTorch ops for MPS device Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/76291 Approved by: https://github.com/albanD	2022-04-27 19:21:57 +00:00
PyTorch MergeBot	d79d9fa283	Revert "Remove breakpad dependency" This reverts commit `9aa3c7fd83`. Reverted https://github.com/pytorch/pytorch/pull/75394 on behalf of https://github.com/malfet	2022-04-17 17:58:51 +00:00
Nikita Shulga	9aa3c7fd83	Remove breakpad dependency This functionality does not seem to be used and there are some requests to update dependency Pull Request resolved: https://github.com/pytorch/pytorch/pull/75394 Approved by: https://github.com/janeyx99, https://github.com/seemethere	2022-04-17 17:43:45 +00:00
Nikita Shulga	bdf5a87714	Extend sign-compare warnings to gcc (take 2) Remove `-Wno-sign-compare` option for GCC Suppress erroneous sign-compare warning in `c10::greater_than_max`(see https://godbolt.org/z/Tr3Msnz99) Fix sign-compare in torch/deploy, `caffe2::QTensor::dim32()` and `generate_proposals_op_test.cc` Pull Request resolved: https://github.com/pytorch/pytorch/pull/75544 Approved by: https://github.com/osalpekar	2022-04-13 00:06:52 +00:00
Edward Z. Yang	c2124f5c66	Turn on -Wsign-compare This is enabled on some of our internal builds, is a common source of fbcode only errors and apparently we are relatively clean on it. Signed-off-by: Edward Z. Yang <ezyang@fb.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/74996 Approved by: https://github.com/malfet	2022-04-12 18:58:14 +00:00
PyTorch MergeBot	80e05b7df4	Revert "Extend sign-compare warnings to gcc" This reverts commit `34446653c7`. Reverted https://github.com/pytorch/pytorch/pull/75544 on behalf of https://github.com/janeyx99	2022-04-12 18:22:53 +00:00
Nikita Shulga	34446653c7	Extend sign-compare warnings to gcc Remove `-Wno-sign-compare` option for GCC Suppress erroneous sign-compare warning in `c10::greater_than_max`(see https://godbolt.org/z/Tr3Msnz99) Fix sign-compare in torch/deploy Pull Request resolved: https://github.com/pytorch/pytorch/pull/75544 Approved by: https://github.com/osalpekar	2022-04-12 17:36:48 +00:00
Nikita Shulga	90a56fc515	Add `-Wsign-compare` to list of clang flags It caused a number of internal only compilation failures, for example see: https://github.com/pytorch/pytorch/pull/74425#issuecomment-1075476438 and https://github.com/pytorch/pytorch/pull/74542#issuecomment-1083518880 Pull Request resolved: https://github.com/pytorch/pytorch/pull/75085 Approved by: https://github.com/ngimel, https://github.com/albanD	2022-04-05 14:16:47 +00:00
Xiang Gao	3b29bd00eb	Make ProcessGroupNCCL load torch_ucc.so when TORCH_UCC_LIBRARY_PATH is set (#69552 ) Summary: This is the very first step for the UCC-NCCL integration. This PR lets `ProcessGroupNCCL` load the `torch_ucc.so` if the user specifies an environmental variable `TORCH_UCC_LIBRARY_PATH`. If this environment variable is not specified by the user, then there will be no visible change. In the future, we may want to make PyTorch smart enough to automatically detect the `torch_ucc.so` in the user's system, but before doing that, I believe we should first make sure that `ProcessGroupUCC` is very well tested. Note that in this PR, `ProcessGroupNCCL` just loads the library but will not use it. I am trying to make PRs small, so the usage of `torch_ucc.so` will be submitted in later PRs. This PR requires the change in https://github.com/facebookresearch/torch_ucc/pull/56, otherwise `torch_ucc.so` can not be successfully loaded. But this PR can be landed separately without waiting for https://github.com/facebookresearch/torch_ucc/pull/56 because, in PyTorch's unit tests, UCC is never used or tested. cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang Pull Request resolved: https://github.com/pytorch/pytorch/pull/69552 Reviewed By: mruberry Differential Revision: D34675212 Pulled By: jiayisuse fbshipit-source-id: a3d1fb98340dbe3a931af555423863efd381f1ae (cherry picked from commit 3778b6fabe70c26b5a65e6ddec641d2ef9113cd1)	2022-03-25 18:19:39 +00:00
Will Constable	3547f20872	Land remaining parts of Torchscript Lazy Tensor backend (#74111 ) Summary: Also enables bazel build to run lazy codegen. Bazel (oss) build feeds off the same filelists as cmake/buck (build_variables.bzl), so enabling it is easier than keeping it disabled. Pull Request resolved: https://github.com/pytorch/pytorch/pull/74111 Test Plan: Run CI and verify test_lazy_ops is running via OSS cmake builds Reviewed By: bdhirsh Differential Revision: D34772403 fbshipit-source-id: 8a63f58b9536e6ac1be530667932176ef2549496 (cherry picked from commit e807ffb1918853d10b924fdc24f85ee5b1a39021)	2022-03-22 23:14:03 +00:00
Edward Z. Yang	493bbdc4fe	Use shared CUPTI by default Per https://github.com/pytorch/pytorch/issues/57744 statically linked CUPTI causes exception handling to break on certain compiler configurations, likely because CUPTI comes with incompatible libstdc++ symbols. Rather than pray that something reasonable happens, use the safer configuration (dynamic linking) by default and give a warning if the user inverts the setting. Signed-off-by: Edward Z. Yang <ezyang@fb.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/74009 Approved by: https://github.com/malfet	2022-03-16 21:04:12 +00:00
Ashwin Hari	7ed73b2803	CMake option for using static MKL libraries Fixes #70587 Pull Request resolved: https://github.com/pytorch/pytorch/pull/73069 Approved by: https://github.com/malfet	2022-03-07 19:32:33 +00:00
Mengwei Liu	9ce9803abe	[PyTorch] Add codegen unboxing ability (#69881 ) Summary: RFC: https://github.com/pytorch/rfcs/pull/40 This PR (re)introduces python codegen for unboxing wrappers. Given an entry of `native_functions.yaml` the codegen should be able to generate the corresponding C++ code to convert ivalues from the stack to their proper types. To trigger the codegen, run ``` tools/jit/gen_unboxing.py -d cg/torch/share/ATen ``` Merged changes on CI test. In https://github.com/pytorch/pytorch/issues/71782 I added an e2e test for static dispatch + codegen unboxing. The test exports a mobile model of mobilenetv2, load and run it on a new binary for lite interpreter: `test/mobile/custom_build/lite_predictor.cpp`. ## Lite predictor build specifics 1. Codegen: `gen.py` generates `RegisterCPU.cpp` and `RegisterSchema.cpp`. Now with this PR, once `static_dispatch` mode is enabled, `gen.py` will not generate `TORCH_LIBRARY` API calls in those cpp files, hence avoids interaction with the dispatcher. Once `USE_LIGHTWEIGHT_DISPATCH` is turned on, `cmake/Codegen.cmake` calls `gen_unboxing.py` which generates `UnboxingFunctions.h`, `UnboxingFunctions_[0-4].cpp` and `RegisterCodegenUnboxedKernels_[0-4].cpp`. 2. Build: `USE_LIGHTWEIGHT_DISPATCH` adds generated sources into `all_cpu_cpp` in `aten/src/ATen/CMakeLists.txt`. All other files remain unchanged. In reality all the `Operators_[0-4].cpp` are not necessary but we can rely on linker to strip them off. ## Current CI job test coverage update Created a new CI job `linux-xenial-py3-clang5-mobile-lightweight-dispatch-build` that enables the following build options: * `USE_LIGHTWEIGHT_DISPATCH=1` * `BUILD_LITE_INTERPRETER=1` * `STATIC_DISPATCH_BACKEND=CPU` This job triggers `test/mobile/lightweight_dispatch/build.sh` and builds `libtorch`. Then the script runs C++ tests written in `test_lightweight_dispatch.cpp` and `test_codegen_unboxing.cpp`. Recent commits added tests to cover as many C++ argument type as possible: in `build.sh` we installed PyTorch Python API so that we can export test models in `tests_setup.py`. Then we run C++ test binary to run these models on lightweight dispatch enabled runtime. Pull Request resolved: https://github.com/pytorch/pytorch/pull/69881 Reviewed By: iseeyuan Differential Revision: D33692299 Pulled By: larryliu0820 fbshipit-source-id: 211e59f2364100703359b4a3d2ab48ca5155a023 (cherry picked from commit 58e1c9a25e3d1b5b656282cf3ac2f548d98d530b)	2022-03-01 23:28:13 +00:00
Nikita Shulga	6302cdb9bc	[Reland] Add BUILD_LAZY_CUDA_LINALG option (#73447 ) Summary: When enabled, it will generate `torch_cuda_linalg` library, which would depend on cusolve and magma and registers dynamic bindings to it from LinearAlgebraStubs Avoid symbol clashes that can result in infinite recursion by moving all symbols in the library to its own namespace. Add checks that should prevent calling self in recursion to `LinearAlgebraStubs.cpp` Pull Request resolved: https://github.com/pytorch/pytorch/pull/73447 Reviewed By: albanD Differential Revision: D34538827 Pulled By: malfet fbshipit-source-id: f2535b471d3524768a84b2e169b6aa24c26c03bf (cherry picked from commit 4ec24b079c861c1122f0fa86e280b977c3c2f7ac)	2022-03-01 21:33:07 +00:00
Andrey Talman	197764b35d	Remove cuda 11.1 references (#73514 ) Summary: Fixes : https://github.com/pytorch/pytorch/issues/73377 We've migrated to CUDA-11.3 as default toolkit in 1.9, it's time to stop builds (especially considering forward-compatibility guarantee across CUDA-11.x drivers) Hence we are removing CUDA 11.1 support. We should also cleanup old cuda related code from our builder and pytorch repo making scripts a little more clean. We have code that references cuda 9.2, 10.1, 11.0, 11.1, 11.2 and none of these are currently in use Pull Request resolved: https://github.com/pytorch/pytorch/pull/73514 Reviewed By: janeyx99 Differential Revision: D34551989 Pulled By: atalman fbshipit-source-id: 9ceaaa9b25ad49689986f4b29a26d20370d9d011 (cherry picked from commit fe109c62daf429e9053c03f6e374568ba23cd041)	2022-03-01 16:37:37 +00:00
Jane Xu	31271284bc	Revert D33992795: Add BUILD_LAZY_CUDA_LINALG option Test Plan: revert-hammer Differential Revision: D33992795 (`82130758f0`) Original commit changeset: d1fa351a3206 Original Phabricator Diff: D33992795 (`82130758f0`) fbshipit-source-id: f0a66d7431aea2c358718eef16fab05712cd6cae (cherry picked from commit df4900115f712e477ed5cc97510e6515a1ca17a9)	2022-02-25 18:37:31 +00:00
Digant Desai	b2054d3025	Prepare for an update to the XNNPACK submodule (#72642 ) Summary: - Target Sha1: ae108ef49aa5623b896fc93d4298c49d1750d9ba - Make USE_XNNPACK a dependent option on cmake minimum version 3.12 - Print USE_XNNPACK under cmake options summary, and print the availability from collect_env.py - Skip XNNPACK based tests when XNNPACK is not available - Add SkipIfNoXNNPACK wrapper to skip tests - Update cmake version for xenial-py3.7-gcc5.4 image to 3.12.4 - This is required for the backwards compatibility test. The PyTorch op schema is XNNPACK dependent. See, aten/src/ATen/native/xnnpack/RegisterOpContextClass.cpp for example. The nightly version is assumed to have USE_XNNPACK=ON, so with this change we ensure that the test build can also have XNNPACK. - HACK: skipping test_xnnpack_integration tests on ROCM Pull Request resolved: https://github.com/pytorch/pytorch/pull/72642 Reviewed By: kimishpatel Differential Revision: D34456794 Pulled By: digantdesai fbshipit-source-id: 85dbfe0211de7846d8a84321b14fdb061cd6c037 (cherry picked from commit 6cf48e7b64d6979962d701b5d493998262cc8bfa)	2022-02-25 00:39:15 +00:00
Nikita Shulga	82130758f0	Add BUILD_LAZY_CUDA_LINALG option (#72306 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/72306 When enabled, it will generate `torch_cuda_linalg` library, which would depend on cusolve and magma and registers dynamic bindings to it from LinearAlgebraStubs Test Plan: Imported from OSS Reviewed By: ngimel Differential Revision: D33992795 Pulled By: malfet fbshipit-source-id: d1fa351a320659b29754997c20d754e69bfe36c0 (cherry picked from commit d5d6c69a988b9454538ecd28674206da2541de17)	2022-02-24 03:30:04 +00:00
Daniël de Kok	d50211860a	Use SLEEF functions for NEON vectors on macOS ARM64 (#70354 ) Summary: We noticed that on M1 Macs Transformer network profiles are dominated by scalar `exp` and `erff` functions (for softmax and GELU). The NEON `Vectorized<float>` implementation does not use SLEEF functions in order to compile on mobile platforms. However, SLEEF is already compiled on macOS ARM64 and is safe to use there. This change adds another implementation of `Vectorized<float>` that uses SLEEF functions. This implementation is only used on macOS ARM64. This change speeds up e.g. prediction of spaCy transformer models by 20% on M1 Macs. Pull Request resolved: https://github.com/pytorch/pytorch/pull/70354 Reviewed By: albanD Differential Revision: D33659540 Pulled By: kimishpatel fbshipit-source-id: b8f02a61321873fc60778190a005c466c7d0cc0c (cherry picked from commit `71286a207c`)	2022-02-07 21:55:28 +00:00
Peter Bell	4829dcea09	Codegen: Generate separate headers per operator (#68247 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68247 This splits `Functions.h`, `Operators.h`, `NativeFunctions.h` and `NativeMetaFunctions.h` into separate headers per operator base name. With `at::sum` as an example, we can include: ```cpp <ATen/core/sum.h> // Like Functions.h <ATen/core/sum_ops.h> // Like Operators.h <ATen/core/sum_native.h> // Like NativeFunctions.h <ATen/core/sum_meta.h> // Like NativeMetaFunctions.h ``` The umbrella headers are still being generated, but all they do is include from the `ATen/ops` folder. Further, `TensorBody.h` now only includes the operators that have method variants. Which means files that only include `Tensor.h` don't need to be rebuilt when you modify function-only operators. Currently there are about 680 operators that don't have method variants, so this is potentially a significant win for incremental builds. Test Plan: Imported from OSS Reviewed By: mrshenli Differential Revision: D32596272 Pulled By: albanD fbshipit-source-id: 447671b2b6adc1364f66ed9717c896dae25fa272	2021-12-14 06:40:08 -08:00
Yanan Cao	17f3179d60	Back out "[pytorch][PR] Add ability for a mobile::Module to save as flatbuffer" (#69796 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69796 (Note: this ignores all push blocking failures!) Test Plan: External CI + Sandcastle Reviewed By: zhxchen17 Differential Revision: D33032671 fbshipit-source-id: dbf6690e960e25d6a5f19043cbe792add2acd7ef	2021-12-10 21:29:53 -08:00
Nikita Shulga	e305e4d4d8	Suppress common warnings when building by clang (#69710 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69710 Namely, no range-loop-analysis (which detects when a loop variable cannot be a const reference) Test Plan: Imported from OSS Reviewed By: r-barnes Differential Revision: D32997003 Pulled By: malfet fbshipit-source-id: dba0e7875e5b667e2cc394c70dd75e2403265918	2021-12-10 16:45:38 -08:00
Han Qi	d3649309e6	[pytorch][PR] Add ability for a mobile::Module to save as flatbuffer (#69306 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69306 Included functions: save_mobile_module -> saves a mobile::Module to flatbuffer load_mobile_module_from_file -> loads a flatbuffer into mobile::Module parse_mobile_module -> parses from bytes or deserialized flatbuffer Module object Test Plan: unittests Reviewed By: gmagogsfm Differential Revision: D32806835 fbshipit-source-id: 71913c6650e225634f878946bd16960d377a7f57	2021-12-09 14:53:31 -08:00
Peter Bell	21919be96b	CMake: Update precompiled header and fix support (#67851 ) Summary: This fixes the `USE_PRECOMPILED_HEADERS` cmake version check which was accidentally inverted, so it was always disabled. I've also made the precompiled header so it only includes headers used in 95% or more of code, weighted by compile time. This limits it to the standard library, `c10` and a limited subset of `ATen/core`. Crucially, the new pch doesn't depend on `native_functions.yaml` so won't cause as much unnecessary rebuilding. Pull Request resolved: https://github.com/pytorch/pytorch/pull/67851 Reviewed By: zou3519 Differential Revision: D32290902 Pulled By: dagitses fbshipit-source-id: dfc33330028c99b02ff40963926c1f1260d00d00	2021-12-03 06:51:56 -08:00
Alban Desmaison	00ebbd5ef6	Revert D32010095: [pytorch][PR] Add ability for a mobile::Module to save as flatbuffer Test Plan: revert-hammer Differential Revision: D32010095 (`41d35dc201`) Original commit changeset: d763b0557780 fbshipit-source-id: bf746a0389135c9f5f67f00f449435ce08fb5f6d	2021-12-02 06:41:40 -08:00
Han Qi	41d35dc201	Add ability for a mobile::Module to save as flatbuffer (#67351 ) Summary: Included functions: * save_mobile_module -> saves a mobile::Module to flatbuffer * load_mobile_module_from_file -> loads a flatbuffer into mobile::Module * parse_mobile_module -> parses from bytes or deserialized flatbuffer Module object Fixes #{issue number} Pull Request resolved: https://github.com/pytorch/pytorch/pull/67351 Reviewed By: iseeyuan Differential Revision: D32010095 Pulled By: qihqi fbshipit-source-id: d763b0557780f7c2661b6485105b045e41a5e8f1	2021-12-01 23:58:15 -08:00
Yi Zhang	31d36fd35d	fix sccache issue on Windows CPU (#68870 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/68796 ``` 2021-11-24T10:12:40.7634007Z Compile requests 4312 2021-11-24T10:12:40.7634484Z Compile requests executed 4300 2021-11-24T10:12:40.7634823Z Cache hits 4227 2021-11-24T10:12:40.7635122Z Cache hits (C/C++) 4227 2021-11-24T10:12:40.7636139Z Cache misses 62 2021-11-24T10:12:40.7636930Z Cache misses (C/C++) 62 2021-11-24T10:12:40.7637333Z Cache timeouts 0 2021-11-24T10:12:40.7637839Z Cache read errors 0 2021-11-24T10:12:40.7638161Z Forced recaches 0 2021-11-24T10:12:40.7638489Z Cache write errors 0 2021-11-24T10:12:40.7638828Z Compilation failures 1 2021-11-24T10:12:40.7639180Z Cache errors 10 2021-11-24T10:12:40.7639490Z Cache errors (C/C++) 10 2021-11-24T10:12:40.7639856Z Non-cacheable compilations 0 2021-11-24T10:12:40.7640244Z Non-cacheable calls 0 2021-11-24T10:12:40.7640601Z Non-compilation calls 12 2021-11-24T10:12:40.7640987Z Unsupported compiler calls 0 2021-11-24T10:12:40.7641426Z Average cache write 0.104 s 2021-11-24T10:12:40.7641763Z Average cache read miss 6.000 s 2021-11-24T10:12:40.7642110Z Average cache read hit 0.046 s 2021-11-24T10:12:40.7642485Z Failed distributed compilations 0 ``` https://github.com/pytorch/pytorch/runs/4310176911?check_suite_focus=true cc seemethere malfet pytorch/pytorch-dev-infra Pull Request resolved: https://github.com/pytorch/pytorch/pull/68870 Reviewed By: ejguan Differential Revision: D32646289 Pulled By: janeyx99 fbshipit-source-id: bf04446439e55a4ccaf9ce7c77812752ca717a7c	2021-11-24 08:04:59 -08:00
Peter Bell	e7e1b76106	Require CMake 3.13 when building with Ninja (#68731 ) Summary: There is a bug in CMake's Ninja generator where files considered inputs to the cmake command couldn't be generated by another build step. The fix was included in CMake 3.13, but 3.10.3 is still sufficient for other cmake generators e.g. makefiles. For reference, the bug is here https://gitlab.kitware.com/cmake/cmake/-/issues/18584 This is necessary for https://github.com/pytorch/pytorch/issues/68246 but I'm isolating the change here to make testing easier. Pull Request resolved: https://github.com/pytorch/pytorch/pull/68731 Reviewed By: jbschlosser Differential Revision: D32604545 Pulled By: malfet fbshipit-source-id: 9bc0bd8641ba415dd63ce21a05c177e2f1dd9866	2021-11-23 09:34:20 -08:00
Jiakai Liu	3dc0754c53	[pytorch][mobile] deprecate the LLVM-based static analyzer (#68180 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68180 Since we've open sourced the tracing-based selective build, we can deprecate the op-dependency-graph-based selective build and the static analyzer tool that produces the dependency graph. ghstack-source-id: 143108377 Test Plan: CIs Reviewed By: seemethere Differential Revision: D32358467 fbshipit-source-id: c61523706b85a49361416da2230ec1b035b8b99c	2021-11-11 16:37:08 -08:00
Nikita Shulga	77beccaedb	Do not build PyTorch with caffe2 by default (#66658 ) Summary: CAFFE2 has been deprecated for a while, but still included in every PyTorch build. We should stop building it by default, although CI should still validate that caffe2 code is buildable. Build even fewer dependencies when compiling mobile builds without Caffe2 Introduce `TEST_CAFFE2` in torch.common.utils Skip `TestQuantizedEmbeddingOps` and `TestJit.test_old_models_bc` if code is compiled without Caffe2 Should be landed after https://github.com/pytorch/builder/pull/864 Pull Request resolved: https://github.com/pytorch/pytorch/pull/66658 Reviewed By: driazati, seemethere, janeyx99 Differential Revision: D31669156 Pulled By: malfet fbshipit-source-id: 1cc45e2d402daf913a4685eb9f841cc3863e458d	2021-10-21 20:32:47 -07:00
Chen Lai	76efbccc3b	[PyTorch Edge][tracing-based] Unify tracer between internal and external (#64152 ) Summary: As title, introduce the file `TracerRunner` shared by internal/external tracer and the main function is ``` TracerResult trace_run(const std::string& input_module_path); ``` which basically takes the path to model file and generate the trace result. The main difference between external tracer and internal tracer is 1. the dependency on `<yaml-cpp/yaml.h>`. 2. the output yaml file from internal tracer includes `model_version` and `model_asset`. These are only needed for internal. Pull Request resolved: https://github.com/pytorch/pytorch/pull/64152 ghstack-source-id: 140692467 Test Plan: ``` ./build/bin/model_tracer --model_input_path "/Users/chenlai/Documents/pytorch/tracing/deeplabv3_scripted_with_bundled_input.ptl" --build_yaml_path "/Users/chenlai/Documents/pytorch/tracing/tmp.yaml" ``` ``` ./fbcode/caffe2/fb/model_tracer/run_model_with_bundled_inputs.sh ~/local/notebooks/prod_models/deeplabv3_scripted_with_bundled_input.ptl ``` have the same operator output selected_operators.yaml (P460296279) selected_mobile_ops.h (P460296258) Reviewed By: dhruvbird Differential Revision: D30632224 fbshipit-source-id: eb0321dbc0f1fcf6d2e05384695eebb59ac04f8c	2021-10-15 02:19:45 -07:00
Michael Suo	3ac2c74896	Revert D31082208: Use shared CUPTI by default Test Plan: revert-hammer Differential Revision: D31082208 (`8b0eae5aa8`) Original commit changeset: 14f66af92084 fbshipit-source-id: 0faff00832b7f79d476fd1f9f505142a548a76db	2021-10-12 14:37:54 -07:00
Edward Yang	8b0eae5aa8	Use shared CUPTI by default (#65401 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65401 Per https://github.com/pytorch/pytorch/issues/57744 statically linked CUPTI causes exception handling to break on certain compiler configurations, likely because CUPTI comes with incompatible libstdc++ symbols. Rather than pray that something reasonable happens, use the safer configuration (dynamic linking) by default and give a warning if the user inverts the setting. Signed-off-by: Edward Z. Yang <ezyang@fb.com> Test Plan: Imported from OSS Reviewed By: gdankel Differential Revision: D31082208 Pulled By: ezyang fbshipit-source-id: 14f66af920847e158436b5801c43f3124b109b34	2021-10-12 11:01:40 -07:00
Nikita Shulga	c373387709	Update CMake and use native CUDA language support (#62445 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62445 PyTorch currently uses the old style of compiling CUDA in CMake which is just a bunch of scripts in `FindCUDA.cmake`. Newer versions support CUDA natively as a language just like C++ or C. Test Plan: Imported from OSS Reviewed By: ejguan Differential Revision: D31503350 fbshipit-source-id: 2ee817edc9698531ae1b87eda3ad271ee459fd55	2021-10-11 09:05:48 -07:00
Chen Lai	3fe5895a00	Back out "Revert D30599136: [Pytorch Edge][tracing-based] build tracer in OSS" (#66267 ) Summary: Previously https://github.com/pytorch/pytorch/pull/64087 broke the test `binary_macos_wheel_3_7_cpu_build`, because wheel build is not happy with `model_tracer`. Considering it's a prototype and there is no need to ship model_tracer via wheel at the moment, use the option `TRACING_BASED` for building the tracer. When tracing-based is mature enough, we can ship the tracer binary via wheel eventually. Pull Request resolved: https://github.com/pytorch/pytorch/pull/66267 Original commit changeset: 8ac3d75a52d0 ghstack-source-id: 140122106 Test Plan: binary_macos_wheel_3_7_cpu_build passes {F668643831} Reviewed By: dhruvbird Differential Revision: D31478593 fbshipit-source-id: 726cab1b31c4596f6268b7824eecb20e2e59d161	2021-10-08 20:12:12 -07:00
Nikita Shulga	4c4525fa5c	Compile without -Wno-unused-variable (take 2) (#66041 ) Summary: Delete `-Wno-unused-variable` from top level `CMakeLists.txt` Still suppress those warnings for tests and `torch_python` Delete number of unused variables from caffe2 code Use `(void)var;` to suppress unused variable in range loops Use `C10_UNUSED` for global constructors and use `constexpr` instead of `static` for global constants Do not delete `caffe2::OperatorBase::Output` calls as they have side effects Pull Request resolved: https://github.com/pytorch/pytorch/pull/66041 Reviewed By: ngimel Differential Revision: D31360142 Pulled By: malfet fbshipit-source-id: 6fdfb9f91efdc49ca984a2f2a17ee377d28210c8	2021-10-04 20:39:39 -07:00
Nikita Shulga	e4ee5ca698	Revert D31326599: [pytorch][PR] Compile without -Wno-unused-variable Test Plan: revert-hammer Differential Revision: D31326599 (`a6280ab653`) Original commit changeset: 924155f1257a fbshipit-source-id: b8ee5bc0298637443232f5ee9ec79e51ed256faf	2021-10-01 20:40:47 -07:00
Nikita Shulga	5ef350d7cc	Revert D31359010: [pytorch][PR] Fix clang-tidy regressions caused by #65954 Test Plan: revert-hammer Differential Revision: D31359010 (`c269f471f4`) Original commit changeset: dce4b91a9891 fbshipit-source-id: 085417432b6748d3672b9b7141460f47d1c17a7f	2021-10-01 20:35:35 -07:00
Nikita Shulga	c269f471f4	Fix clang-tidy regressions caused by #65954 (#66040 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66040 Reviewed By: ZolotukhinM Differential Revision: D31359010 Pulled By: malfet fbshipit-source-id: dce4b91a98913c8d8c2d8f9ebc49654265239158	2021-10-01 19:50:53 -07:00
Nikita Shulga	a6280ab653	Compile without -Wno-unused-variable (#65954 ) Summary: Delete `-Wno-unused-variable` from top level `CMakeLists.txt` Still suppress those warnings for tests and `torch_python` Delete number of unused variables from caffe2 code Use `(void)var;` to suppress unused variable in range loops Use `C10_UNUSED` for global constructors and use `constexpr` instead of `static` for global constants Pull Request resolved: https://github.com/pytorch/pytorch/pull/65954 Reviewed By: ngimel Differential Revision: D31326599 Pulled By: malfet fbshipit-source-id: 924155f1257a2ba1896c50512f615e45ca1f61f3	2021-10-01 17:40:47 -07:00
Dhruv Matani	a84feeeade	[PyTorch Edge] Conditionally trim dispatch key set to save heap memory at runtime (#65732 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65732 For certain on-device uses, runtime memory comes at a premium. On-device deployments won't use all the available dispatch keys, so it makes sense to keep only the on-device specific ones around for such uses to reduce runtime heap memory allocated. This change keeps just 10 dispatch keys (the ones that are used on-device), guarded under the `C10_MOBILE_TRIM_DISPATCH_KEYS` macro. It tries to keep the other code-paths unaffected and uses `constexpr` for use in the `array` declaration, and simple inline functions to ensure that the compiler is able to optimize these for server builds. Test Plan: Build and check mobile models end to end. ``` buck build -c "pt.enable_milan_dispatch_keys_trimming"=1 //xplat/caffe2/fb/lite_predictor:lite_predictor ``` Reviewed By: ezyang Differential Revision: D31185407 fbshipit-source-id: e954765606373dea6ee9466a851dca7684167b0b	2021-09-29 12:20:33 -07:00
jiej	127c9402d0	Revert "Revert D30752939: [pytorch][PR] nvfuser update" (#65137 ) Summary: This reverts commit `03389dc851`. Attempt again for PR: https://github.com/pytorch/pytorch/issues/63745 Fixes the windows build failure. Pull Request resolved: https://github.com/pytorch/pytorch/pull/65137 Reviewed By: seemethere, dzhulgakov, heitorschueroff Differential Revision: D30994556 Pulled By: malfet fbshipit-source-id: f1925b6c5cc1a1a441a96499667c91e8dfc1b53d	2021-09-22 04:54:51 -07:00
Tao Xu	18fa58c4e9	[CoreML][OSS] Integrate with CMake (#64523 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64523 - Build Pytorch with CoreML delegate - ` USE_PYTORCH_METAL=ON python setup.py install --cmake` - Build iOS static libs - `IOS_PLATFORM=SIMULATOR USE_COREML_DELEGATE=1 ./scripts/build_ios.sh` ghstack-source-id: 138324216 Test Plan: - Test the Helloword example {F657778559} Reviewed By: iseeyuan Differential Revision: D30594041 fbshipit-source-id: 8cece0b2d4b3ef82d3ef4da8c1054919148beb16	2021-09-17 10:32:00 -07:00
Nikita Shulga	67570a60ba	Disable ParallelTBB (#65092 ) Summary: ParallelTBB's `at::get_thread_num` is not compatible with the general model used by OpenMP and ParallelNative (where it is a contiguous thread index within a parallel loop), see https://github.com/pytorch/pytorch/issues/64571#issuecomment-914691883 More examples of similar regressions: https://github.com/pytorch/pytorch/runs/3612142217 Pull Request resolved: https://github.com/pytorch/pytorch/pull/65092 Reviewed By: zhouzhuojie Differential Revision: D30995936 Pulled By: malfet fbshipit-source-id: db145b6a850d794f2c954f59f30249b291473e36	2021-09-16 12:38:45 -07:00
Eli Uriegas	03389dc851	Revert D30752939: [pytorch][PR] nvfuser update Test Plan: revert-hammer Differential Revision: D30752939 (`cfaecaf40b`) Original commit changeset: ce122e80f01b fbshipit-source-id: 57685df8f9946032a06eff1de8a3d1498500d2d2	2021-09-15 17:38:47 -07:00
jiej	cfaecaf40b	nvfuser update (#63745 ) Summary: Syncing nvfuser code base from devel branch. A few of our developments since the last sync: - Extends support to normalization and reduction kernels. - Multiple kernel launch for single `CudaFusionGroup`. Hierarchical caching system has been updated to cache graph segmentation. - profile_ivalue is enabled to convert dynamic scalar into compile time constants, which are required by the codegen. (e.g. reduction axes). To keep this PR simple and relatively review-free, we stripped most external changes and submitted them as separate PRs, so this gigantic PR is easier to handle. Internal updates are files located in: 1. updates in nvfuser codegen `torch/csrc/jit/codegen/cuda` 2. added nvfuser specific benchmarks `benchmarks/cpp/nvfuser` 3. nvfuser jit cpp tests `test/cpp/jit/test_gpu.cpp` `test/cpp/jit/test_gpu_shift.cpp` `test/cpp/jit/test_gpu_validator.h` Updates affecting integration: 1. profile_ivalue enabled for nvfuser. Related changes are in `torch/csrc/jit/runtime/`, 2. exposed a few more symbols `aten/src/ATen/core/` used by codegen Pull Request resolved: https://github.com/pytorch/pytorch/pull/63745 Reviewed By: saketh-are Differential Revision: D30752939 Pulled By: malfet fbshipit-source-id: ce122e80f01bcd3865f5bd3c4dfde660665fd84c	2021-09-15 14:42:55 -07:00
Nick Kreeger	882b67dff4	Drop incremental linking on Windows with REL_WITH_DEB_INFO=1. (#64892 ) Summary: The library will no longer link properly on VS 2019 (14.29.30133). To ensure that engineers building on Windows can use and debug with this build type, incremental linking needs to be turned off for this build flag. Verified that this build type successfully builds, links, and provides debuggable Python modules on Windows. Pull Request resolved: https://github.com/pytorch/pytorch/pull/64892 Reviewed By: jbschlosser Differential Revision: D30902565 Pulled By: malfet fbshipit-source-id: e5286a4c6f45c7cbe4cdc1b98560129bd386970b	2021-09-14 09:44:18 -07:00
Hanton Yang	22d38bd10d	[OSS] Enable Metal in PyTorch MacOS nightly builds (#63718 ) Summary: Build on https://github.com/pytorch/pytorch/pull/63825 Pull Request resolved: https://github.com/pytorch/pytorch/pull/63718 Test Plan: 1.Add `ci/binaries` label to PR, so the CI will build those nightly builds 2.Make sure the following CI jobs build with `USE_PYTORCH_METAL_EXPORT` option is `ON`: ``` ci/circleci: binary_macos_arm64_conda_3_8_cpu_nightly_build ci/circleci: binary_macos_arm64_conda_3_9_cpu_nightly_build ci/circleci: binary_macos_arm64_wheel_3_8_cpu_nightly_build ci/circleci: binary_macos_arm64_wheel_3_9_cpu_nightly_build ci/circleci: binary_macos_conda_3_6_cpu_nightly_build ci/circleci: binary_macos_conda_3_7_cpu_nightly_build ci/circleci: binary_macos_conda_3_8_cpu_nightly_build ci/circleci: binary_macos_conda_3_9_cpu_nightly_build ci/circleci: binary_macos_libtorch_3_7_cpu_nightly_build ci/circleci: binary_macos_wheel_3_6_cpu_nightly_build ci/circleci: binary_macos_wheel_3_7_cpu_nightly_build ci/circleci: binary_macos_wheel_3_8_cpu_nightly_build ci/circleci: binary_macos_wheel_3_9_cpu_nightly_build ``` 3.Test `conda` and `wheel` builds locally on [HelloWorld-Metal](https://github.com/pytorch/ios-demo-app/tree/master/HelloWorld-Metal) demo with [(Prototype) Use iOS GPU in PyTorch](https://pytorch.org/tutorials/prototype/ios_gpu_workflow.html) (1) conda ``` conda install https://15667941-65600975-gh.circle-artifacts.com/0/Users/distiller/project/final_pkgs/pytorch-1.10.0.dev20210826-py3.8_0.tar.bz2 ``` (2) wheel ``` pip3 install https://15598647-65600975-gh.circle-artifacts.com/0/Users/distiller/project/final_pkgs/torch-1.10.0.dev20210824-cp38-none-macosx_10_9_x86_64.whl ``` Reviewed By: xta0 Differential Revision: D30593167 Pulled By: hanton fbshipit-source-id: 471da204e94b29c11301c857c50501307a5f0785	2021-08-27 09:25:05 -07:00
Nikita Shulga	5ab356ffe6	Update CMake minimum version to 3.10 (#63660 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63660 Test Plan: Imported from OSS Reviewed By: janeyx99, mruberry Differential Revision: D30543878 fbshipit-source-id: a7d938807653f39727f2cc7d7ca167200567b6a0	2021-08-25 09:25:43 -07:00
driazati	bd8608cd5c	Use CMake for breakpad (#63186 ) Summary: We currently build breakpad from [this fork](https://github.com/driazati/breakpad) to include extra logic to restore signal handlers that were previously present. With some [new additions](https://github.com/google/breakpad/compare/main...driazati:main) this fork now includes a CMake based build, so we can add breakpad as a proper dependency rather than rely on including it in Docker images as a system library which is error prone (we have a bunch of images) and hard to extend to MacOS / Windows. This also includes some changes to the crash handling code to support MacOS / Windows in a similar way to Linux. ```python import torch # On Windows this writes crashes to C:\Users\<user>\AppData\pytorch_crashes # On MacOS/Linux this writes crashes to /tmp/pytorch_crashes torch.utils._crash_handler.enable_minidumps() # Easy way to cause a segfault and trigger the handler torch.bincount(input=torch.tensor([9223372036854775807])) ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/63186 Reviewed By: malfet, seemethere Differential Revision: D30318404 Pulled By: driazati fbshipit-source-id: 0d7daf3701cfaba5451cc529a0730272ab1eb1dc	2021-08-19 10:42:01 -07:00
Peter Bell	f70b9ee5de	Advertise USE_PRECOMPILED_HEADERS in CONTRIBUTING.md (#62827 ) Summary: This option was added in https://github.com/pytorch/pytorch/issues/61940 and fits with this section's theme of improving build times. I've also changed it to a `cmake_dependent_option` instead of `FATAL_ERROR`ing for older CMake versions. Pull Request resolved: https://github.com/pytorch/pytorch/pull/62827 Reviewed By: astaff Differential Revision: D30342102 Pulled By: malfet fbshipit-source-id: 3095b44b7085aee8a884ec95cba9f8998d4442e7	2021-08-17 10:14:40 -07:00
Kimish Patel	38c185189c	[Pytorch Edge] Enable kineto profiler on mobile via EdgeKinetoProfiler (#62419 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62419 This diff adds support for cpu only kineto profiler on mobile, thus enabling chrome trace generation on mobile. This brings the cpp API for mobile profiling on par with TorchScript. This is done via: 1. Utilizing debug handle annotations in KinetoEvent. 2. Adding post processing capability, via callbacks, to KinetoThreadLocalState 3. Creating a new RAII style profiler, KinetoEdgeCPUProfiler, which can be used in the surrounding scope of model execution. This will write the chrome trace to the location specified in the profiler constructor. Test Plan: MobileProfiler.ModuleHierarchy Imported from OSS Reviewed By: raziel Differential Revision: D29993660 fbshipit-source-id: 0b44f52f9e9c5f5aff81ebbd9273c254c3c03299	2021-08-13 21:40:19 -07:00
peterjc123	d16587f84d	Enable rebuilds for Ninja on Windows (#62948 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/59859. Pull Request resolved: https://github.com/pytorch/pytorch/pull/62948 Reviewed By: seemethere, tktrungna Differential Revision: D30192246 Pulled By: janeyx99 fbshipit-source-id: af25cc4bf0db67a1304d9971cfa0ff6831bb3b48	2021-08-09 16:15:45 -07:00
Peter Bell	b7ac286d0e	CMake: Add optional precompiled header support (#61940 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61940 This adds a `USE_PRECOMPILED_HEADERS` option to the CMake build which precompiles `ATen.h` and also `CUDAContext.h` for the cuda library. After making a change in `native_functions.yaml`, this speeds up compilation time by around 15% on my machine. Test Plan: Imported from OSS Reviewed By: heitorschueroff Differential Revision: D29988775 Pulled By: malfet fbshipit-source-id: a23c468c958a8b74ebaef052a5b2e5fa3836c64b	2021-08-03 09:13:47 -07:00
Can Balioglu	7565039ee9	Support system-provided Intel TBB (#61934 ) Summary: This PR: (1) enables the use of a system-provided Intel TBB for building PyTorch, (2) removes `tbb::task_scheduler_init` references since it was removed from TBB a while ago, (3) marks the implementation of `_internal_set_num_threads` with a TODO as it requires a revision that fixes its thread allocation logic. Tested with `test/run_test`; no new tests are introduced since there are no behavioral changes (removal of `tbb::task_scheduler_init` has no impact on the runtime behavior). Pull Request resolved: https://github.com/pytorch/pytorch/pull/61934 Reviewed By: malfet Differential Revision: D29805416 Pulled By: cbalioglu fbshipit-source-id: 22042b428b57b8fede9dfcc83878d679a19561dd	2021-08-02 07:39:00 -07:00
Jane Xu	e318058ffe	Ignore LNK4099 for debug binary libtorch builds (#62060 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/61979 Pull Request resolved: https://github.com/pytorch/pytorch/pull/62060 Test Plan: This CI shouldn't break and https://github.com/pytorch/pytorch/pull/62061 Reviewed By: driazati Differential Revision: D29877487 Pulled By: janeyx99 fbshipit-source-id: 497f84caab3f9ae609644fd397ad87a6dc8a2a77	2021-07-23 09:31:41 -07:00
Luca Wehrstedt	a1780432fa	Move c10d to libtorch(_cuda) (#59563 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59563 ghstack-source-id: 131331264 Test Plan: CI Reviewed By: malfet Differential Revision: D28932239 fbshipit-source-id: 5df6cdfa5253b15cbbc97039fe672d6d97321e34	2021-06-15 02:01:31 -07:00
Nikita Shulga	1ea5c19c19	Add USE_WHOLE_CUDNN option (#59744 ) Summary: It is only enabled if USE_STATIC_CUDNN is enabled Next step after https://github.com/pytorch/pytorch/pull/59721 towards resolving fast kernels stripping reported in https://github.com/pytorch/pytorch/issues/50153 Pull Request resolved: https://github.com/pytorch/pytorch/pull/59744 Reviewed By: seemethere, ngimel Differential Revision: D29007314 Pulled By: malfet fbshipit-source-id: 7091e299c0c6cc2a8aa82fbf49312cecf3bb861a	2021-06-09 21:12:42 -07:00
Nikita Shulga	7179e7ea7b	[CMake] Prefer third_party/pybind11 by default (#58951 ) Summary: To make build behaviour aligned with other third_party/ libraries, introduce the `USE_SYSTEM_PYBIND11` build option, which is set to OFF by default, which means PyTorch will be built with bundled pybind11 even if another version is already installed locally. Fixes https://github.com/pytorch/pytorch/issues/58750 Pull Request resolved: https://github.com/pytorch/pytorch/pull/58951 Reviewed By: driazati Differential Revision: D28690411 Pulled By: malfet fbshipit-source-id: e56b5a8f2a23ee1834b2a6d3807f287149decf8c	2021-05-25 15:10:17 -07:00
Nathan John Sircombe	bf00d26deb	Enables builds with Compute Library backend for oneDNN (#55913 ) Summary: Since v1.7, oneDNN (MKL-DNN) has supported the use of Compute Library for the Arm architecture to provide optimised convolution primitives on AArch64. This change enables the use of Compute Library in the PyTorch build. Following the approach used to enable the use of CBLAS in MKLDNN, it is enabled by setting the env vars USE_MKLDNN and USE_MKLDNN_ACL. The location of the Compute Library build must be set using `ACL_ROOT_DIR`. This is an extension of the work in https://github.com/pytorch/pytorch/pull/50400 which added support for the oneDNN/MKL-DNN backend on AArch64. _Note: this assumes that Compute Library has been built and installed at ACL_ROOT_DIR. Compute library can be downloaded here: `https://github.com/ARM-software/ComputeLibrary`_ Fixes #{issue number} Pull Request resolved: https://github.com/pytorch/pytorch/pull/55913 Reviewed By: ailzhang Differential Revision: D28559516 Pulled By: malfet fbshipit-source-id: 29d24996097d0a54efc9ab754fb3f0bded290005	2021-05-20 07:43:56 -07:00
Xiang Gao	6c70cbedb6	step 0 of cuDNN v8 convolution API integration (#51390 ) Summary: This PR is step 0 of adding PyTorch convolution bindings using the cuDNN frontend. The cuDNN frontend is the recommended way of using cuDNN v8 API. It is supposed to have faster release cycles, so that, for example, if people find a specific kernel has a bug, they can report it, and that kernel will be blocked in the cuDNN frontend and frameworks could just update that submodule without the need for waiting for a whole cuDNN release. The work is not complete, and this PR is only step 0. What this PR does: - Add cudnn-frontend as a submodule. - Modify cmake to build that submodule. - Add bindings for convolution forward in `Conv_v8.cpp`, which is disabled by a macro by default. - Tested manually by enabling the macro and run `test_nn.py`. All tests pass except those mentioned below. What this PR doesn't: - Only convolution forward, no backward. The backward will use v7 API. - No 64bit-indexing support for some configuration. This is a known issue of cuDNN, and will be fixed in a later cuDNN version. PyTorch will not implement any workaround for issue, but instead, v8 API should be disabled on problematic cuDNN versions. - No test beyond PyTorch's unit tests. - Not tested for correctness on real models. - Not benchmarked for performance. - Benchmark cache is not thread-safe. (This is marked as `FIXME` in the code, and will be fixed in a follow-up PR) - cuDNN benchmark is not supported. - There are failing tests, which will be resolved later: ``` FAILED test/test_nn.py::TestNNDeviceTypeCUDA::test_conv_cudnn_nhwc_cuda_float16 - AssertionError: False is not true : Tensors failed to compare as equal!With rtol=0.001 and atol=1e-05, found 32 element(s) (out of 32) whose difference(s) exceeded the margin of error (in... 
FAILED test/test_nn.py::TestNNDeviceTypeCUDA::test_conv_cudnn_nhwc_cuda_float32 - AssertionError: False is not true : Tensors failed to compare as equal!With rtol=1.3e-06 and atol=1e-05, found 32 element(s) (out of 32) whose difference(s) exceeded the margin of error (... FAILED test/test_nn.py::TestNNDeviceTypeCUDA::test_conv_large_cuda - RuntimeError: CUDNN_BACKEND_OPERATION: cudnnFinalize Failed cudnn_status: 9 FAILED test/test_nn.py::TestNN::test_Conv2d_depthwise_naive_groups_cuda - AssertionError: False is not true : Tensors failed to compare as equal!With rtol=0 and atol=1e-05, found 64 element(s) (out of 64) whose difference(s) exceeded the margin of error (including 0 an... FAILED test/test_nn.py::TestNN::test_Conv2d_deterministic_cudnn - RuntimeError: not supported yet FAILED test/test_nn.py::TestNN::test_ConvTranspose2d_groups_cuda_fp32 - RuntimeError: cuDNN error: CUDNN_STATUS_BAD_PARAM FAILED test/test_nn.py::TestNN::test_ConvTranspose2d_groups_cuda_tf32 - RuntimeError: cuDNN error: CUDNN_STATUS_BAD_PARAM ``` Although this is not a complete implementation of cuDNN v8 API binding, I still want to merge this first. This would allow me to do small and incremental work, for the ease of development and review. Pull Request resolved: https://github.com/pytorch/pytorch/pull/51390 Reviewed By: malfet Differential Revision: D28513167 Pulled By: ngimel fbshipit-source-id: 9cc20c9dec5bbbcb1f94ac9e0f59b10c34f62740	2021-05-19 12:54:09 -07:00
Pavel Belevich	96e1a83fb2	Add Gloo TCP_TLS transport (#56442 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/56442 Test Plan: Imported from OSS Reviewed By: malfet Differential Revision: D27896285 Pulled By: pbelevich fbshipit-source-id: 589af59ca4c7c9bab2329f079382c09b71cfcf9e	2021-05-07 13:36:11 -07:00
Kimish Patel	f4a921600a	[PyTorch, Mobile] Serialization format change for source range (#54284 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/54284 In order to bring mobile deployment, via lite interpreter, to feature parity with JIT with respect to model level debug information, we must make model level debug information available to the mobile runtime. At the moment, model level debug information is stored in SourceRange, which associates nodes of the graph with where they come from in the original python source code. This information is serialized as part of debug_pkl and deserialized when JIT loads the model and reads the model code. On lite interpreter, we do not have access to all the functionality of JIT and hence we cannot load the model in the same way as JIT, by reading code, constructing module hierarchy and graph corresponding module methods etc. Instead, in lite interpreter, only bytecode corresponding to the compiled graph, Code, is saved. Thus in order to annotate OPs in the bytecode with equivalent SourceRange information we do the following: 1. During model serialization, we create a unique tag for each source range of the model. 2. Create a map of <SourceRange, tag> 3. During debug_pkl serialization we save tag along with SourceRange, on top of byte offset. 4. During bytecode generation, the methods of the top module are lowered. During this process methods are inlined. In the inlined graph, when the node of a graph is lowered to bytecode, we query node's source range and look it up against the map. 5. Resulting source range tag is serialized in module_debug_info. 6. During model deserialization, we read all the debug_pkl records in the archive and create a map of <tag, SourceRange> 7. This map can be used to find source code information. During mobile runtime: 1. We read all the debug_pkl records and create <tag=debug_handle, SourceRange> map. 1.1 This map, MobileDebugInfo, is a member of mobile Module. 2. 
Interpreter catches appropriate exceptions and sets the thread local debug handle and rethrows the exception. 3. In Function's run method we catch exception and query current debug handle where the exception happened. 4. Query MobileDebugInfo with debug handle to retrieve source range and augment error with source range info. This information is still incomplete as it does not contain entire callstack. In the following diffs we will serialize InlinedCallStack directly. Note that compilation is gated by SYMBOLICATE_MOBILE_DEBUG_HANDLE macro, so that mobile builds can avoid building MobileDebugInfo, source range and source range pickler/unpickler. Later we will add a path where, if building without debug support, the stack trace will contain only debug handles. They can be symbolicated later. Test Plan: Ported bunch of source range tests from test_jit.py. Added one more test in test_lite_interpreter.py Imported from OSS Reviewed By: raziel Differential Revision: D27174722 fbshipit-source-id: a7b7c6088ce16dec37e823c7fefa4f0b61047e12	2021-05-04 09:19:27 -07:00
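The tagging scheme described in the commit above can be sketched in a few lines of Python. This is only an illustration of the idea, not the PyTorch implementation: `SourceRange` here is a hypothetical stand-in dataclass, and `assign_tags`, `invert`, and `describe_error` are made-up helper names. It shows the serialization-side `<SourceRange, tag>` map, the deserialization-side `<tag, SourceRange>` map, and a lookup by debug handle when an error is reported.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class SourceRange:
    """Hypothetical stand-in for a source range: a file plus a character span."""
    filename: str
    start: int
    end: int


def assign_tags(ranges: Iterable[SourceRange]) -> Dict[SourceRange, int]:
    """Serialization side: give each distinct source range a unique integer tag."""
    tags: Dict[SourceRange, int] = {}
    for r in ranges:
        if r not in tags:
            tags[r] = len(tags)
    return tags


def invert(tags: Dict[SourceRange, int]) -> Dict[int, SourceRange]:
    """Deserialization side: rebuild the <tag, SourceRange> map."""
    return {tag: r for r, tag in tags.items()}


def describe_error(message: str, debug_handle: int,
                   debug_info: Dict[int, SourceRange]) -> str:
    """Augment an error message with the source range for a debug handle, if known."""
    r = debug_info.get(debug_handle)
    if r is None:
        # Without debug info only the raw handle is available; it can be
        # symbolicated later.
        return f"{message} (debug handle {debug_handle})"
    return f"{message} (at {r.filename}:{r.start}-{r.end})"


ranges = [
    SourceRange("model.py", 10, 42),
    SourceRange("model.py", 50, 77),
    SourceRange("model.py", 10, 42),  # duplicate range reuses its tag
]
tags = assign_tags(ranges)
debug_info = invert(tags)
print(describe_error("op failed", tags[ranges[1]], debug_info))
```

The fallback branch in `describe_error` mirrors the case mentioned in the commit where a build without debug support reports only debug handles.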
davidriazati@fb.com	264d87985a	Use ld.gold by default to link in CI (#57061 ) Summary: This adds an option to CMake to use `ld.gold` to link rather than `ld` (which symlinks to `ld.bfd` on Ubuntu by default). This shouldn't change any functionality, only a mild improvement on link times during builds (shaves off 1 minute) on CI. Verify by searching for `ld.gold is available` in [the logs](https://circleci.com/api/v1.1/project/github/pytorch/pytorch/13046834/output/105/0?file=true&allocation-id=608c434338107e5b6cf938a1-0-build%2F7BDA2FF1) Pull Request resolved: https://github.com/pytorch/pytorch/pull/57061 Pulled By: driazati Reviewed By: janeyx99 Differential Revision: D28123522 fbshipit-source-id: 5a60798ca4785427fd92bbf3b3aa5f63730e9b20	2021-05-03 10:05:36 -07:00
davidriazati@fb.com	c44cbc63cc	Ignore more compiler warnings, unify WERROR options (#56630 ) Summary: This adds some more compiler warning ignores for everything that happens on a standard CPU build (CUDA builds still have a bunch of warnings so we can't turn on `-Werror` everywhere yet). Pull Request resolved: https://github.com/pytorch/pytorch/pull/56630 Pulled By: driazati Reviewed By: malfet Differential Revision: D28005063 fbshipit-source-id: 541ed415eb0470ddf7e08c22c5eb6da9db26e9a0	2021-04-29 21:20:29 -07:00
davidriazati@fb.com	21be40b390	Add torch_cpu specific flag for debug info (#57190 ) Summary: Right now we are using `REL_WITH_DEB_INFO=1` on Linux CI binary builds. This is causing intermittent failures on CUDA builds since the debug information increases the load on the linker. This adds a workaround: a flag to enable debug info only for the target we actually want it for (`libtorch_cpu.so`, all the other binaries are stripped of their debug info after building). Example failures (from [the hud](https://ezyang.github.io/pytorch-ci-hud/build2/pytorch-nightly?mode=nightly)): * https://app.circleci.com/pipelines/github/pytorch/pytorch/311785/workflows/df640957-54b0-4592-aeef-6d5baee503ae/jobs/12932229 * https://app.circleci.com/pipelines/github/pytorch/pytorch/311784/workflows/e3b487d6-fb46-4a5d-a2d5-22eec328b678/jobs/12932228 * https://app.circleci.com/pipelines/github/pytorch/pytorch/311784/workflows/e3b487d6-fb46-4a5d-a2d5-22eec328b678/jobs/12932227 Pull Request resolved: https://github.com/pytorch/pytorch/pull/57190 Pulled By: driazati Reviewed By: janeyx99 Differential Revision: D28085550 fbshipit-source-id: 0fc5b3e769b10c0dd3811717f968d0c933667361	2021-04-29 12:06:15 -07:00
Will Constable	21fd5f4b79	Document current deploy cpython build #56490 (#56600 ) Summary: Call out the issues with cpython deps and suggest a workaround. Fixes https://github.com/pytorch/pytorch/issues/56490 Pull Request resolved: https://github.com/pytorch/pytorch/pull/56600 Reviewed By: albanD Differential Revision: D27920647 Pulled By: wconstab fbshipit-source-id: 61a53a176eaf42a6166d649d3cb0fdfa2489e9d2	2021-04-22 09:02:29 -07:00
Eddie Yan	81f181567a	Add `USE_MAGMA` build flag (#55994 ) Summary: Many model pipelines/workflows don't use MAGMA even though it is included in the build by default. Leaving MAGMA kernels out of the build can save 60+MB of GPU memory when loading `libtorch_cuda.so` (tested on V100, current upstream master). A current sharp corner of this flag is that toggling it when rebuilding requires `torch/include/THC/THCGeneral.h` to be manually deleted by the user, as even running `make clean` or `setup.py` with `--cmake` does not properly regenerate it with the appropriate substitution for `#cmakedefine USE_MAGMA`. Is there a way to force the regeneration of the header during a rebuild? CC malfet ptrblck Pull Request resolved: https://github.com/pytorch/pytorch/pull/55994 Reviewed By: mruberry Differential Revision: D27766287 Pulled By: malfet fbshipit-source-id: 93deca57befa0febb9c5b7875ecf0015c547d421	2021-04-15 00:43:12 -07:00
Ailing Zhang	1688a5d31a	Cleanup since FEATURE_TORCH_MOBILE is always true. (#55835 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/55835 Now that https://github.com/pytorch/pytorch/pull/55238 has been landed for a week with no complaints, it seems safe to say FEATURE_TORCH_MOBILE is always true and we can do some cleanup. Test Plan: Imported from OSS Reviewed By: ezyang, walterddr Differential Revision: D27721284 Pulled By: ailzhang fbshipit-source-id: 4896bc5f736373d0922cfbe8eed0d16df62f0fa1	2021-04-14 09:08:18 -07:00
Ivan Kobzarev	85fcadc059	[lite-interpreter] speed_benchmark_torch support BUILD_LITE_INTERPRETER (#55402 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/55402 Test Plan: Imported from OSS Reviewed By: cccclai Differential Revision: D27599824 Pulled By: IvanKobzarev fbshipit-source-id: 3adbb8a16a785d3610404d71ef2d895904b1a8ef	2021-04-07 11:39:32 -07:00
SpaceIm	aeedd5c7df	cmake: fix ONNX_NAMESPACE if USE_SYSTEM_ONNX (#54973 ) Summary: `ONNX_NAMESPACE` is empty by default if `USE_SYSTEM_ONNX ON`, while it should be equal to `onnx`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/54973 Reviewed By: glaringlee Differential Revision: D27466020 Pulled By: walterddr fbshipit-source-id: 47cde3604acbda3f45bec5893036b39fd1eb58c9	2021-03-31 08:29:00 -07:00
Nikita Shulga	68bdeef2ce	[CMake] Simplify CPU architecture detection logic (#54637 ) Summary: `CMAKE_SYSTEM_PROCESSOR` set to x86_64 (on Linux) or AMD64 (on Windows) indicates the build is running on x86_64 architecture, while `CMAKE_SYSTEM_PROCESSOR` set to aarch64 or arm64 means we are running on ARMv8+ architecture. Delete `i[3-6]86` pattern as 32-bit builds are no longer supported Pull Request resolved: https://github.com/pytorch/pytorch/pull/54637 Reviewed By: ezyang Differential Revision: D27311897 Pulled By: malfet fbshipit-source-id: 26989fc9b54a96d70c768ab03ca4528506ee7808	2021-03-25 12:32:18 -07:00
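As a rough illustration of the mapping described in the commit above, the same decision can be written as a small Python function. This is a sketch and not the CMake code from the PR: `classify_processor` and the returned labels are hypothetical names, and exact, case-sensitive matching is an assumption.

```python
def classify_processor(system_processor: str) -> str:
    """Map a CMAKE_SYSTEM_PROCESSOR-style string to an architecture family.

    Hypothetical helper: the labels "x86_64", "armv8" and "unknown" are
    chosen for this sketch only.
    """
    if system_processor in ("x86_64", "AMD64"):
        # x86_64 is what Linux reports, AMD64 is what Windows reports.
        return "x86_64"
    if system_processor in ("aarch64", "arm64"):
        return "armv8"
    # 32-bit names such as i686 are no longer matched.
    return "unknown"


for name in ("x86_64", "AMD64", "aarch64", "arm64", "i686"):
    print(name, "->", classify_processor(name))
```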
Leonard Lausen	90bbe0b38b	cmake: auto-detect ccache to speed up developer builds (#49389 ) Summary: https://ccache.dev/ is a compiler cache that speeds up subsequent builds. Auto-detecting ccache ensures that it is used on systems where it is available, greatly improving build times for developers. There is no risk in enabling ccache in practice. Please refer to https://ccache.dev/ for a short summary / motivation Pull Request resolved: https://github.com/pytorch/pytorch/pull/49389 Reviewed By: ejguan Differential Revision: D27169957 Pulled By: malfet fbshipit-source-id: 673b60bbceb0d323901c8a992a75792c6da9b805	2021-03-18 14:20:53 -07:00
Ashkan Aliabadi	e5ecd1ddf8	[Vulkan]Fix build warnings-treated-as-error on Linux. (#52781 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/52781 Test Plan: Imported from OSS Reviewed By: SS-JIA Differential Revision: D26669311 Pulled By: AshkanAliabadi fbshipit-source-id: 78b08d0b264d4d5cf8af964c589b9b7d0ddc7311	2021-03-03 13:48:43 -08:00
Chen Lai	14f7bf0629	[PyTorch] update CMake to build libtorch lite (#51419 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/51419 ## Summary 1. Add an option `BUILD_LITE_INTERPRETER` in `caffe2/CMakeLists.txt` and set `OFF` as default. 2. Update 'build_android.sh' with an argument to switch `BUILD_LITE_INTERPRETER`, 'OFF' as default. 3. Add a mini demo app `lite_interpreter_demo` linked with `libtorch` library, which can be used for quick test. ## Test Plan Built lite interpreter version of libtorch and test with Image Segmentation demo app ([android version](https://github.com/pytorch/android-demo-app/tree/master/ImageSegmentation)/[ios version](https://github.com/pytorch/ios-demo-app/tree/master/ImageSegmentation)) ### Android 1. Prepare model: Prepare the lite interpreter version of model by running the script below to generate the scripted model `deeplabv3_scripted.pt` and `deeplabv3_scripted.ptl` ``` import torch model = torch.hub.load('pytorch/vision:v0.7.0', 'deeplabv3_resnet50', pretrained=True) model.eval() scripted_module = torch.jit.script(model) # Export full jit version model (not compatible with lite interpreter), leave it here for comparison scripted_module.save("deeplabv3_scripted.pt") # Export lite interpreter version model (compatible with lite interpreter) scripted_module._save_for_lite_interpreter("deeplabv3_scripted.ptl") ``` 2. Build libtorch lite for android: Build libtorch for android for all 4 android abis (armeabi-v7a, arm64-v8a, x86, x86_64) `BUILD_LITE_INTERPRETER=1 ./scripts/build_pytorch_android.sh`. This pr is tested on Pixel 4 emulator with x86, so use cmd `BUILD_LITE_INTERPRETER=1 ./scripts/build_pytorch_android.sh x86` to specify abi to save build time. After the build finishes, it will show the library path: ``` ... 
BUILD SUCCESSFUL in 55s 134 actionable tasks: 22 executed, 112 up-to-date + find /Users/chenlai/pytorch/android -type f -name 'aar' + xargs ls -lah -rw-r--r-- 1 chenlai staff 13M Feb 11 11:48 /Users/chenlai/pytorch/android/pytorch_android/build/outputs/aar/pytorch_android-release.aar -rw-r--r-- 1 chenlai staff 36K Feb 9 16:45 /Users/chenlai/pytorch/android/pytorch_android_torchvision/build/outputs/aar/pytorch_android_torchvision-release.aar ``` 3. Use the PyTorch Android libraries built from source in the ImageSegmentation app: Create a folder 'libs' in the path, the path from repository root will be `ImageSegmentation/app/libs`. Copy `pytorch_android-release` to the path `ImageSegmentation/app/libs/pytorch_android-release.aar`. Copy `pytorch_android_torchvision` (downloaded from [here](https://oss.sonatype.org/#nexus-search;quick~torchvision_android)) to the path `ImageSegmentation/app/libs/pytorch_android_torchvision.aar` Update the `dependencies` part of `ImageSegmentation/app/build.gradle` to ``` dependencies { implementation 'androidx.appcompat:appcompat:1.2.0' implementation 'androidx.constraintlayout:constraintlayout:2.0.2' testImplementation 'junit:junit:4.12' androidTestImplementation 'androidx.test.ext:junit:1.1.2' androidTestImplementation 'androidx.test.espresso:espresso-core:3.3.0' implementation(name:'pytorch_android-release', ext:'aar') implementation(name:'pytorch_android_torchvision', ext:'aar') implementation 'com.android.support:appcompat-v7:28.0.0' implementation 'com.facebook.fbjni:fbjni-java-only:0.0.3' } ``` Update `allprojects` part in `ImageSegmentation/build.gradle` to ``` allprojects { repositories { google() jcenter() flatDir { dirs 'libs' } } } ``` 4. 
Update model loader api: Update `ImageSegmentation/app/src/main/java/org/pytorch/imagesegmentation/MainActivity.java` by 4.1 Add new import: `import org.pytorch.LiteModuleLoader;` 4.2 Replace the way to load pytorch lite model ``` // mModule = Module.load(MainActivity.assetFilePath(getApplicationContext(), "deeplabv3_scripted.pt")); mModule = LiteModuleLoader.load(MainActivity.assetFilePath(getApplicationContext(), "deeplabv3_scripted.ptl")); ``` 5. Test app: Build and run the ImageSegmentation app in Android Studio, ![image](https://user-images.githubusercontent.com/16430979/107696279-9cea5900-6c66-11eb-8286-4d1d68abff61.png) ### iOS 1. Prepare model: Same as Android. 2. Build libtorch lite for iOS: `BUILD_PYTORCH_MOBILE=1 IOS_PLATFORM=SIMULATOR BUILD_LITE_INTERPRETER=1 ./scripts/build_ios.sh` 3. Remove Cocoapods from the project: run `pod deintegrate` 4. Link ImageSegmentation demo app with the custom built library: Open your project in Xcode, go to your project Target’s Build Phases - Link Binaries With Libraries, click the + sign and add all the library files located in `build_ios/install/lib`. Navigate to the project Build Settings, set the value Header Search Paths to `build_ios/install/include` and Library Search Paths to `build_ios/install/lib`. In the build settings, search for other linker flags. Add a custom linker flag below ``` -all_load ``` Finally, disable bitcode for your target by selecting the Build Settings, searching for Enable Bitcode, and setting the value to No. 5. 
Update library and api 5.1 Update `TorchModule.mm` To use the custom built libraries in the project, replace `#import <LibTorch/LibTorch.h>` (in `TorchModule.mm`) which is needed when using LibTorch via Cocoapods with the code below: ``` //#import <LibTorch/LibTorch.h> #include "ATen/ATen.h" #include "caffe2/core/timer.h" #include "caffe2/utils/string_utils.h" #include "torch/csrc/autograd/grad_mode.h" #include "torch/script.h" #include <torch/csrc/jit/mobile/function.h> #include <torch/csrc/jit/mobile/import.h> #include <torch/csrc/jit/mobile/interpreter.h> #include <torch/csrc/jit/mobile/module.h> #include <torch/csrc/jit/mobile/observer.h> ``` 5.2 Update `ViewController.swift` ``` // if let filePath = Bundle.main.path(forResource: // "deeplabv3_scripted", ofType: "pt"), // let module = TorchModule(fileAtPath: filePath) { // return module // } else { // fatalError("Can't find the model file!") // } if let filePath = Bundle.main.path(forResource: "deeplabv3_scripted", ofType: "ptl"), let module = TorchModule(fileAtPath: filePath) { return module } else { fatalError("Can't find the model file!") } ``` ### Unit test Add `test/cpp/lite_interpreter`, with one unit test `test_cores.cpp` and a light model `sequence.ptl` to test `_load_for_mobile()`, `bc.find_method()` and `bc.forward()` functions. 
### Size: With the change: Android: x86: `pytorch_android-release.aar` (13.8 MB) IOS: `pytorch/build_ios/install/lib` (lib: 66 MB): ``` (base) chenlai@chenlai-mp lib % ls -lh total 135016 -rw-r--r-- 1 chenlai staff 3.3M Feb 15 20:45 libXNNPACK.a -rw-r--r-- 1 chenlai staff 965K Feb 15 20:45 libc10.a -rw-r--r-- 1 chenlai staff 4.6K Feb 15 20:45 libclog.a -rw-r--r-- 1 chenlai staff 42K Feb 15 20:45 libcpuinfo.a -rw-r--r-- 1 chenlai staff 39K Feb 15 20:45 libcpuinfo_internals.a -rw-r--r-- 1 chenlai staff 1.5M Feb 15 20:45 libeigen_blas.a -rw-r--r-- 1 chenlai staff 148K Feb 15 20:45 libfmt.a -rw-r--r-- 1 chenlai staff 44K Feb 15 20:45 libpthreadpool.a -rw-r--r-- 1 chenlai staff 166K Feb 15 20:45 libpytorch_qnnpack.a -rw-r--r-- 1 chenlai staff 384B Feb 15 21:19 libtorch.a -rw-r--r-- 1 chenlai staff 60M Feb 15 20:47 libtorch_cpu.a ``` `pytorch/build_ios/install`: ``` (base) chenlai@chenlai-mp install % du -sh * 14M include 66M lib 2.8M share ``` Master (baseline): Android: x86: `pytorch_android-release.aar` (16.2 MB) IOS: `pytorch/build_ios/install/lib` (lib: 84 MB): ``` (base) chenlai@chenlai-mp lib % ls -lh total 172032 -rw-r--r-- 1 chenlai staff 3.3M Feb 17 22:18 libXNNPACK.a -rw-r--r-- 1 chenlai staff 969K Feb 17 22:18 libc10.a -rw-r--r-- 1 chenlai staff 4.6K Feb 17 22:18 libclog.a -rw-r--r-- 1 chenlai staff 42K Feb 17 22:18 libcpuinfo.a -rw-r--r-- 1 chenlai staff 1.5M Feb 17 22:18 libeigen_blas.a -rw-r--r-- 1 chenlai staff 44K Feb 17 22:18 libpthreadpool.a -rw-r--r-- 1 chenlai staff 166K Feb 17 22:18 libpytorch_qnnpack.a -rw-r--r-- 1 chenlai staff 384B Feb 17 22:19 libtorch.a -rw-r--r-- 1 chenlai staff 78M Feb 17 22:19 libtorch_cpu.a ``` `pytorch/build_ios/install`: ``` (base) chenlai@chenlai-mp install % du -sh * 14M include 84M lib 2.8M share ``` Test Plan: Imported from OSS Reviewed By: iseeyuan Differential Revision: D26518778 Pulled By: cccclai fbshipit-source-id: 4503ffa1f150ecc309ed39fb0549e8bd046a3f9c	2021-02-21 01:43:54 -08:00
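The size figures reported in commit `14f7bf0629` above can be turned into relative reductions with a few lines of arithmetic. This is only an illustrative sketch: the helper name `percent_reduction` is made up here and is not part of PyTorch, and the inputs are the rounded MB values quoted in the commit message.

```python
def percent_reduction(baseline_mb: float, new_mb: float) -> float:
    """Return the size reduction relative to the baseline, in percent."""
    return (baseline_mb - new_mb) / baseline_mb * 100.0


# Sizes quoted in the commit message, in MB (master baseline vs. lite interpreter build).
android_aar = percent_reduction(16.2, 13.8)   # x86 pytorch_android-release.aar
ios_lib = percent_reduction(84.0, 66.0)       # build_ios/install/lib

print(f"Android x86 aar: {android_aar:.1f}% smaller")
print(f"iOS install/lib: {ios_lib:.1f}% smaller")
```

With the quoted numbers this works out to roughly 15% for the Android archive and roughly 21% for the iOS library directory.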
Bel H	db33afbf9f	Change cmake to allow building with MLC kick-off build (#51326 ) Summary: - Allows build process to build with MLC enabled if subrepo folder mlc is in path and we can link against ML Compute on macOS BigSur - To build with MLC enabled you will need to clone the mlc repo inside the pytorch repository. - We need both this change and https://github.com/pytorch/pytorch/pull/50634 on pytorch/pytorch to enable the `mlc` device. Pull Request resolved: https://github.com/pytorch/pytorch/pull/51326 Reviewed By: glaringlee Differential Revision: D26533138 Pulled By: malfet fbshipit-source-id: 0baa06b4eb2d62dbfc0f6fc922096cb0db1cc7d1	2021-02-19 13:04:25 -08:00
Jiakai Liu	c9c4b871a5	[pytorch] reintroduce static dispatch (#51957 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/51957 This is a simplified version of #51554. Compared to #51554, this version only supports statically dispatching to a specific backend. The benefit is that it skipped the dispatch key computation logic thus has less framework overhead. The downside is that if input tensors do not match the specified backend it will throw error instead of falling back to regular dispatch. Sample code: ``` Tensor empty(IntArrayRef size, TensorOptions options, c10::optional<MemoryFormat> memory_format) { return at::cpu::empty(size, options, memory_format); } // aten::conj(Tensor(a) self) -> Tensor(a) Tensor conj(const Tensor & self) { return at::math::conj(self); } // aten::conj.out(Tensor self, , Tensor(a!) out) -> Tensor(a!) Tensor & conj_out(Tensor & out, const Tensor & self) { return at::cpu::conj_out(out, self); } // aten::conj.out(Tensor self, , Tensor(a!) out) -> Tensor(a!) Tensor & conj_outf(const Tensor & self, Tensor & out) { return at::cpu::conj_out(out, self); } // aten::_conj(Tensor self) -> Tensor Tensor _conj(const Tensor & self) { return at::defaultbackend::_conj(self); } ``` For ops without the specific backend dispatch, it will throw error: ``` // aten::_use_cudnn_ctc_loss(Tensor log_probs, Tensor targets, int[] input_lengths, int[] target_lengths, int blank) -> bool bool _use_cudnn_ctc_loss(const Tensor & log_probs, const Tensor & targets, IntArrayRef input_lengths, IntArrayRef target_lengths, int64_t blank) { TORCH_CHECK(false, "Static dispatch does not support _use_cudnn_ctc_loss for CPU."); } ``` Differential Revision: D26337857 Test Plan: Imported from OSS Reviewed By: bhosmer Pulled By: ljk53 fbshipit-source-id: a8e95799115c349de3c09f04a26b01d21a679364	2021-02-19 11:41:39 -08:00
Jane Xu	ac2bdf553e	update build_host_protoc command for macos cross compilation (#50922 ) Summary: Currently, adding a cross compile build is failing on CI due to a cmake builtin compiler check that does not pass due to cross compiling the host protoc library. Setting the CMAKE_TRY_COMPILE_TARGET_TYPE flag should fix it. (Based on this [SOF answer](https://stackoverflow.com/questions/53633705/cmake-the-c-compiler-is-not-able-to-compile-a-simple-test-program).) To test that this works, please run: `CMAKE_OSX_ARCHITECTURES=arm64 USE_MKLDNN=OFF USE_NNPACK=OFF USE_QNNPACK=OFF USE_PYTORCH_QNNPACK=OFF BUILD_TEST=OFF python setup.py install` from a Mac x86_64 machine with Xcode12.3 (anything with MacOS 11 SDK). Then, you can check that things were compiled for arm by running `lipo -info <file>` for any file in the `build/lib` directory. Pull Request resolved: https://github.com/pytorch/pytorch/pull/50922 Reviewed By: malfet Differential Revision: D26355054 Pulled By: janeyx99 fbshipit-source-id: 919f3f9bd95d7c7bba6ab3a95428d3ca309f8ead	2021-02-11 14:36:51 -08:00
cyy	1aaddd83a5	don't set the same C++ and C standards twice (#51832 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/51832 Reviewed By: izdeby Differential Revision: D26312660 Pulled By: ezyang fbshipit-source-id: 7d646cd106397e70bca0050d0aa30eb62b085cee	2021-02-08 08:53:26 -08:00
Jane Xu	88af2149e1	Add build option to split torch_cuda library into torch_cuda_cu and torch_cuda_cpp (#49050 ) Summary: Because of the size of our `libtorch_cuda.so`, linking with other hefty binaries presents a problem where 32bit relocation markers are too small and end up overflowing. This PR attempts to break up `torch_cuda` into `torch_cuda_cu` and `torch_cuda_cpp`. `torch_cuda_cu`: all the files previously in `Caffe2_GPU_SRCS` that are * pure `.cu` files in `aten`match * all the BLAS files * all the THC files, except for THCAllocator.cpp, THCCachingHostAllocator.cpp and THCGeneral.cpp * all files in `detail` * LegacyDefinitions.cpp and LegacyTHFunctionsCUDA.cpp * RegisterCUDA.cpp CUDAHooks.cpp * CUDASolver.cpp * TensorShapeCUDA.cpp `torch_cuda_cpp`: all other files in `Caffe2_GPU_SRCS` Accordingly, TORCH_CUDA_API and TORCH_CUDA_BUILD_MAIN_LIB usages are getting split as well to TORCH_CUDA_CU_API and TORCH_CUDA_CPP_API. To test this locally, you can run `export BUILD_SPLIT_CUDA=ON && python setup.py develop`. In your `build/lib` folder, you should find binaries for both `torch_cuda_cpp` and `torch_cuda_cu`. To see that the SPLIT_CUDA option was toggled, you can grep the Summary of running cmake and make sure `Split CUDA` is ON. This build option is tested on CI for CUDA 11.1 builds (linux for now, but windows soon). Pull Request resolved: https://github.com/pytorch/pytorch/pull/49050 Reviewed By: walterddr Differential Revision: D26114310 Pulled By: janeyx99 fbshipit-source-id: 0180f2519abb5a9cdde16a6fb7dd3171cff687a6	2021-02-01 18:42:35 -08:00
Ivan Kobzarev	dbfaf966b0	[android] turn on USE_VULKAN for android builds by default (#51291 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/51291 Turning on USE_VULKAN for android builds Remove standalone android vulkan build Testing all ci jobs (for master): https://github.com/pytorch/pytorch/pull/51292 Test Plan: Imported from OSS Reviewed By: AshkanAliabadi Differential Revision: D26141891 Pulled By: IvanKobzarev fbshipit-source-id: e8e1a4ab612c0786ce09217ab9370fd75a71eb00	2021-01-29 11:58:21 -08:00
Will Constable	f2e41257e4	Back out "Revert D26077905: Back out "Revert D25850783: Add torch::deploy, an embedded torch-python interpreter"" (#51267 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/51267 Original commit changeset: b70185916502 Test Plan: test locally, oss ci-all, fbcode incl deferred Reviewed By: suo Differential Revision: D26121251 fbshipit-source-id: 4315b7fd5476914c8e5d6f547e1cfbcf0c227781	2021-01-28 19:30:45 -08:00
Mike Ruberry	12a434abbc	Revert D26077905: Back out "Revert D25850783: Add torch::deploy, an embedded torch-python interpreter" Test Plan: revert-hammer Differential Revision: D26077905 (`dc2a44c4fc`) Original commit changeset: fae83bf9822d fbshipit-source-id: b70185916502ba9ebe16d781cf0659b9f7865c9a	2021-01-27 19:53:29 -08:00
Will Constable	dc2a44c4fc	Back out "Revert D25850783: Add torch::deploy, an embedded torch-python interpreter" (#51124 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/51124 Original commit changeset: 1c7133627da2 Test Plan: Test locally with interpreter_test and on CI Reviewed By: suo Differential Revision: D26077905 fbshipit-source-id: fae83bf9822d79e9a9b5641bc5191a7f3fdea78d	2021-01-27 16:49:42 -08:00
Mike Ruberry	e843974a6e	Revert D25850783: Add torch::deploy, an embedded torch-python interpreter Test Plan: revert-hammer Differential Revision: D25850783 (`3192f9e4fe`) Original commit changeset: a4656377caff fbshipit-source-id: 1c7133627da28fb12848da7a9a46de6d3b2b67c6	2021-01-26 02:07:44 -08:00
Will Constable	3192f9e4fe	Add torch::deploy, an embedded torch-python interpreter (#50458 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/50458 libinterpreter.so contains a frozen python distribution including torch-python bindings. Freezing refers to serializing bytecode of python standard library modules as well as the torch python library and embedding them in the library code. This library can then be dlopened multiple times in one process context, each interpreter having its own python state and GIL. In addition, each python environment is sealed off from the filesystem and can only import the frozen modules included in the distribution. This change relies on newly added frozenpython, a cpython 3.8.6 fork built for this purpose. Frozenpython provides libpython3.8-frozen.a which contains frozen bytecode and object code for the python standard library. Building on top of frozen python, the frozen torch-python bindings are added in this diff, providing each embedded interpreter with a copy of the torch bindings. Each interpreter is intended to share one instance of libtorch and the underlying tensor libraries. Known issues - Autograd is not expected to work with the embedded interpreter currently, as it manages its own python interactions and needs to coordinate with the duplicated python states in each of the interpreters. - Distributed and cuda stuff is disabled in libinterpreter.so build, needs to be revisited - __file__ is not supported in the context of embedded python since there are no files for the underlying library modules. using __file__ - __version__ is not properly supported in the embedded torch-python, just a workaround for now Test Plan: tested locally and on CI with cmake and buck builds running torch::deploy interpreter_test Reviewed By: ailzhang Differential Revision: D25850783 fbshipit-source-id: a4656377caff25b73913daae7ae2f88bcab8fd88	2021-01-25 15:14:28 -08:00
Will Constable	4bbff92014	Refactor build targets for torch::deploy (#50288 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/50288 torch::deploy will bundle the objects contained in libtorch-python together with frozenpython into a shared library. Therefore, the libtorch-python objs can't bring with them a dependency on system python. Buck TARGETS are added throughout the caffe2 tree to make available objects or headers that will be needed by torch::deploy but would have brought unsuitable dependencies if accessed using existing targets. CMakeLists are modified to separate a torch-python-objs object library which lets torch::deploy compile these objs with the same compile flags as libtorch_python used, but without some of the link-time dependencies such as python. CudaIPCTypes is moved from libtorch_python to libtorch_cuda because it is really not a python binding, and it statically registers a cuda_ipc_callback which would be duplicated if included in each copy of torch::deploy. Test Plan: no new functionality, just ensure existing tests continue to pass Reviewed By: malfet Differential Revision: D25850785 fbshipit-source-id: b0b81c050cbee04e9de96888f8a09d29238a9db8	2021-01-22 09:16:32 -08:00
Ilia Cherniavskii	e34992ebee	Set USE_KINETO=1 (#49897 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/49897 Resend of https://github.com/pytorch/pytorch/pull/49201 Test Plan: see 49201 Reviewed By: malfet Differential Revision: D25717102 Pulled By: ilia-cher fbshipit-source-id: 5e794a7f5fe160ca64ac9d190c4fd3e8f1e443e6	2021-01-22 00:09:21 -08:00
Nikita Shulga	cebab83d3f	Fix USE_MKLDN defaults (#50782 ) Summary: Fixes regression introduced by https://github.com/pytorch/pytorch/pull/50400 The `cmake_dependent_option` semantics are as follows (see https://cmake.org/cmake/help/v3.19/module/CMakeDependentOption.html): `cmake_dependent_option(<option> "<help_text>" <value> <depends> <force>)` That is, `<depends>` should be true for CPU_INTEL or CPU_AARCH64, but the default value should be ON only if CPU_INTEL is true. Pull Request resolved: https://github.com/pytorch/pytorch/pull/50782 Reviewed By: xuzhao9 Differential Revision: D25966509 Pulled By: malfet fbshipit-source-id: c891cd9234311875762403f7125d0c3803bb0e65	2021-01-19 21:41:53 -08:00
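The `cmake_dependent_option` behaviour described in commit `cebab83d3f` can be modelled in a few lines. The sketch below is a plain Python model of the documented semantics, not the CMake code from the PR; the function names are illustrative, and the forced value of OFF when neither CPU is detected is an assumption, since the commit message does not state it.

```python
def cmake_dependent_option(user_value, default, depends, force):
    """Model of cmake_dependent_option(<option> "<help_text>" <value> <depends> <force>).

    user_value is what the user passed via -D<option>=..., or None if unset.
    """
    if not depends:
        # Conditions not met: the option is hidden and pinned to <force>.
        return force
    # Conditions met: the user's choice wins, otherwise <value> is the default.
    return default if user_value is None else user_value


def use_mkldnn(cpu_intel, cpu_aarch64, user_value=None):
    # Per the commit message: selectable on Intel or AArch64, default ON only on Intel.
    # force=False (OFF) is an assumption made for this illustration.
    return cmake_dependent_option(
        user_value,
        default=cpu_intel,
        depends=cpu_intel or cpu_aarch64,
        force=False,
    )


print(use_mkldnn(cpu_intel=True, cpu_aarch64=False))                   # Intel, default
print(use_mkldnn(cpu_intel=False, cpu_aarch64=True))                   # AArch64, default
print(use_mkldnn(cpu_intel=False, cpu_aarch64=True, user_value=True))  # AArch64, opted in
```

Under this model an AArch64 build leaves the option off unless the user asks for it, which matches the intent stated in the message.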
Rong Rong (AI Infra)	ebd142e94b	initial commit to enable fast_nvcc (#49773 ) Summary: draft enable fast_nvcc. * cleaned up some non-standard usages * added fall-back to wrap_nvcc Pull Request resolved: https://github.com/pytorch/pytorch/pull/49773 Test Plan: Configuration to enable fast nvcc: - install and enable `ccache` but delete `.ccache/` folder before each build. - `TORCH_CUDA_ARCH_LIST=6.0;6.1;6.2;7.0;7.5` - Toggling `USE_FAST_NVCC=ON/OFF` cmake config and run `cmake --build` to verify the build time. Initial statistics for a full compilation: * `cmake --build . -- -j $(nproc)`: - fast NVCC ``` real 48m55.706s user 1559m14.218s sys 318m41.138s ``` - normal NVCC: ``` real 43m38.723s user 1470m28.131s sys 90m46.879s ``` * `cmake --build . -- -j $(nproc/4)`: - fast NVCC: ``` real 53m44.173s user 1130m18.323s sys 71m32.385s ``` - normal NVCC: ``` real 81m53.768s user 858m45.402s sys 61m15.539s ``` * Conclusion: fast NVCC doesn't provide too much gain when compiler is set to use full CPU utilization, in fact it is even worse because of the thread switching. Initial statistics for partial recompile (edit .cu files) * `cmake --build . -- -j $(nproc)` - fast NVCC: ``` [2021-01-13 18:10:24] [ 86%] Building NVCC (Device) object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/torch_cuda_generated_BinaryMiscOpsKernels.cu.o [2021-01-13 18:11:08] [ 86%] Linking CXX shared library ../lib/libtorch_cuda.so ``` - normal NVCC: ``` [2021-01-13 17:35:40] [ 86%] Building NVCC (Device) object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/torch_cuda_generated_BinaryMiscOpsKernels.cu.o [2021-01-13 17:38:08] [ 86%] Linking CXX shared library ../lib/libtorch_cuda.so ``` * Conclusion: Effective compilation time for single CU file modification reduced from 2min30sec to only 40sec when compiling multiple architectures. 
This shows a roughly 4X speedup using fast NVCC, approaching the theoretical limit of 5X when compiling 5 gencode architectures at the same time. Follow up PRs: - should have better fallback mechanism to detect whether a build is supported by fast_nvcc or not instead of dry-running and then failing over to the fallback. - performance measurement instrumentation to measure what's the total compile time vs the parallel tasks critical path time. - figure out why `-j $(nproc)` gives significant sys overhead (`sys 318m41.138s` vs `sys 90m46.879s`) over normal nvcc, guess this is context switching, but not exactly sure Reviewed By: malfet Differential Revision: D25692758 Pulled By: walterddr fbshipit-source-id: c244d07b9b71f146e972b6b3682ca792b38c4457	2021-01-19 14:50:54 -08:00
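The partial-recompile speedup quoted in commit `ebd142e94b` can be rechecked directly from the log timestamps in its test plan. The following is a small sketch of that arithmetic; the helper `elapsed_seconds` is hypothetical, and it assumes the interval between the "Building NVCC" line and the "Linking" line is a fair proxy for the compile time of the edited `.cu` file.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"


def elapsed_seconds(start: str, end: str) -> float:
    """Seconds between two log timestamps in the format used by the build log."""
    return (datetime.strptime(end, FMT) - datetime.strptime(start, FMT)).total_seconds()


# Timestamps copied from the commit message (build start -> link start).
fast = elapsed_seconds("2021-01-13 18:10:24", "2021-01-13 18:11:08")
normal = elapsed_seconds("2021-01-13 17:35:40", "2021-01-13 17:38:08")
speedup = normal / fast

print(f"fast NVCC: {fast:.0f}s, normal NVCC: {normal:.0f}s, speedup: {speedup:.1f}x")
```

Taken literally, the timestamps give 148 s versus 44 s, a ratio of about 3.4x, which is consistent with the rounded "2min30sec to 40sec" and "4X" figures in the message and still below the 5X ceiling for five architectures.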
Rong Rong	070a30b265	[BE] add warning message to cmake against env var "-std=c++xx" (#50491 ) Summary: This was discovered while working on https://github.com/pytorch/pytorch/issues/50230. Environment variables such as CXXFLAGS="-std=c++17" will not work because we use CMAKE_CXX_STANDARD 14. This adds a warning to alert users when such an environment variable is set. See: [CMake env var usage](https://cmake.org/cmake/help/latest/manual/cmake-env-variables.7.html#id4) and [CXXFLAGS usage](https://cmake.org/cmake/help/latest/envvar/CXXFLAGS.html) for more details. Pull Request resolved: https://github.com/pytorch/pytorch/pull/50491 Reviewed By: mrshenli Differential Revision: D25907851 Pulled By: walterddr fbshipit-source-id: 5af5eec76f79f9d35456af1f2663cafbc54e7dc8	2021-01-15 07:12:56 -08:00
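The kind of check described in commit `070a30b265` can be reproduced outside CMake before starting a build. The snippet below is a hedged Python sketch, not the CMake logic from the PR: the regular expression, the helper name `has_std_flag`, and the warning text are all assumptions made for illustration.

```python
import os
import re

# Matches a standalone -std=c++NN or -std=gnu++NN token (assumed pattern, not from the PR).
STD_FLAG = re.compile(r"(?:^|\s)-std=(?:c|gnu)\+\+\w+")


def has_std_flag(cxxflags: str) -> bool:
    """Return True if the flag string pins a C++ standard via -std=..."""
    return STD_FLAG.search(cxxflags) is not None


if has_std_flag(os.environ.get("CXXFLAGS", "")):
    print("warning: CXXFLAGS sets -std=..., which conflicts with CMAKE_CXX_STANDARD")
```

Running this in the shell that will invoke `python setup.py` gives an early hint that the `-std` flag in the environment is likely to be ignored or to clash with the project's own setting.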
Nathan John Sircombe	664126bab5	Enables build with oneDNN (MKL-DNN) on AArch64 (#50400 ) Summary: Since version 1.6, oneDNN has provided limited support for AArch64 builds. This minor change is to detect an AArch64 CPU and permit the use of `USE_MKLDNN` in that case. Build flags for oneDNN are also modified accordingly. Note: oneDNN on AArch64, by default, will use oneDNN's reference C++ kernels. These are not optimised for AArch64, but oneDNN v1.7 onwards provides support for a limited set of primitives based on Arm Compute Library. See: https://github.com/oneapi-src/oneDNN/pull/795 and: https://github.com/oneapi-src/oneDNN/pull/820 for more details. Support for ACL-based oneDNN primitives in PyTorch will require some further modification. Fixes #{issue number} Pull Request resolved: https://github.com/pytorch/pytorch/pull/50400 Reviewed By: izdeby Differential Revision: D25886589 Pulled By: malfet fbshipit-source-id: 2c81277a28ad4528c2d2211381e7c6692d952bc1	2021-01-13 08:41:44 -08:00
Jane Xu	c2d37cd990	Change CMake config to enable universal binary for Mac (#50243 ) Summary: This PR is a step towards enabling cross compilation from x86_64 to arm64. The following has been added: 1. When cross compilation is detected, compile a local universal fatfile to use as protoc. 2. For the simple compile check in MiscCheck.cmake, make sure to compile the small snippet as a universal binary in order to run the check. Test plan: Kick off a minimal build on a mac intel machine with the macOS 11 SDK with this command: ``` CMAKE_OSX_ARCHITECTURES=arm64 USE_MKLDNN=OFF USE_QNNPACK=OFF USE_PYTORCH_QNNPACK=OFF BUILD_TEST=OFF USE_NNPACK=OFF python setup.py install ``` (If you run the above command before this change, or without macOS 11 SDK set up, it will fail.) Then check the platform of the built binaries using this command: ``` lipo -info build/lib/libfmt.a ``` Output: - Before this PR, running a regular build via `python setup.py install` (instead of using the flags listed above): ``` Non-fat file: build/lib/libfmt.a is architecture: x86_64 ``` - Using this PR: ``` Non-fat file: build/lib/libfmt.a is architecture: arm64 ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/50243 Reviewed By: malfet Differential Revision: D25849955 Pulled By: janeyx99 fbshipit-source-id: e9853709a7279916f66aa4c4e054dfecced3adb1	2021-01-08 17:26:08 -08:00
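The `lipo -info` check from commit `c2d37cd990` is easy to script when many libraries need verifying. The sketch below only parses output lines of the shape quoted in the commit message; the helper name `parse_lipo_info` is hypothetical, and treating everything after the last `": "` as a space-separated architecture list is an assumption that may not cover every `lipo` output format.

```python
from typing import List


def parse_lipo_info(line: str) -> List[str]:
    """Extract architecture names from one line of `lipo -info` output."""
    # Everything after the final ": " is taken to be the architecture list.
    _, _, archs = line.rpartition(": ")
    return archs.split()


# Output lines quoted in the commit message.
before = "Non-fat file: build/lib/libfmt.a is architecture: x86_64"
after = "Non-fat file: build/lib/libfmt.a is architecture: arm64"

print(parse_lipo_info(before))
print(parse_lipo_info(after))
```

On a macOS machine the input lines could come from `subprocess.run(["lipo", "-info", path], capture_output=True, text=True).stdout`, so a cross-compiled `build/lib` can be checked file by file for `arm64`.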
Ashkan Aliabadi	1c12cbea90	Optimize Vulkan command buffer submission rate. (#49112 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/49112 Differential Revision: D25729889 Test Plan: Imported from OSS Reviewed By: SS-JIA Pulled By: AshkanAliabadi fbshipit-source-id: c4ab470fdcf3f83745971986f3a44a3dff69287f	2021-01-08 16:39:22 -08:00
Antonio Cuni	8f31621f78	Fix MKL builds on Ubuntu (#50212 ) Summary: This fixes https://github.com/pytorch/pytorch/issues/50211 Pull Request resolved: https://github.com/pytorch/pytorch/pull/50212 Reviewed By: janeyx99 Differential Revision: D25850876 Pulled By: walterddr fbshipit-source-id: be138db3ae370c45f5fbf3af486cf8b32518df87	2021-01-08 13:16:30 -08:00
Jithun Nair	45ec35827e	Set USE_RCCL cmake option (dependent on USE_NCCL) [REDUX] (#34683 ) Summary: Refiled duplicate of https://github.com/pytorch/pytorch/issues/31341 which was reverted in commit `63964175b5`. This PR enables RCCL support when building Gloo as part of PyTorch for ROCm. Pull Request resolved: https://github.com/pytorch/pytorch/pull/34683 Reviewed By: glaringlee Differential Revision: D25540578 Pulled By: ezyang fbshipit-source-id: fcb02e5745d62e1b7d2e02048160e9e7a4b4df2d	2021-01-06 07:03:02 -08:00
Rong Rong (AI Infra)	12ee7b61e7	support building with conda installed libraries (#50080 ) Summary: This should fix a number of shared library compilation errors when libraries are installed in the conda lib or lib64 folders. Pull Request resolved: https://github.com/pytorch/pytorch/pull/50080 Reviewed By: seemethere Differential Revision: D25781923 Pulled By: walterddr fbshipit-source-id: 78a74925981d65243b98bb99a65f1f2766e87a2f	2021-01-05 12:32:51 -08:00
Ilia Cherniavskii	72b00a8a52	Revert D25480770: Set USE_KINETO=1 Test Plan: revert-hammer Differential Revision: D25480770 (`1a92802bde`) Original commit changeset: 037cd774f554 fbshipit-source-id: 6a6062195033ca91fcc0cfa1e890e47efc774ac1	2020-12-18 07:06:28 -08:00
Ilia Cherniavskii	1a92802bde	Set USE_KINETO=1 (#49201 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/49201 This unblocks kineto profiler for 1.8 release. This PR supersedes https://github.com/pytorch/pytorch/pull/48391 Note: this will somewhat increase the size of linux server binaries, because we add libkineto.a and libcupti_static.a: -rw-r--r-- 1 jenkins jenkins 1107502 Dec 10 21:16 build/lib/libkineto.a -rw-r--r-- 1 root root 13699658 Nov 13 2019 /usr/local/cuda/lib64/libcupti_static.a Test Plan: CI https://github.com/pytorch/pytorch/pull/48391 Imported from OSS Reviewed By: ngimel Differential Revision: D25480770 fbshipit-source-id: 037cd774f5547d9918d6055ef5cc952a54e48e4c	2020-12-18 01:48:10 -08:00
Nikita Shulga	84fce6d29a	[AARCH64] Fix HAS_VST1 check if compiled by clang (#49182 ) Summary: Use the `UL` suffix supported by all C99 compatible compilers instead of `__AARCH64_UINT64_C`, which is a gcc-specific extension. Before the change, this check would have failed with a bug-free clang compiler with the following errors: ``` $ clang has_vst1.c has_vst1.c:5:41: warning: implicit declaration of function '__AARCH64_UINT64_C' is invalid in C99 [-Wimplicit-function-declaration] v.val[0] = vcombine_f32 (vcreate_f32 (__AARCH64_UINT64_C (0)), vcreate_f32 (__AARCH64_UINT64_C (0))); ^ has_vst1.c:5:79: warning: implicit declaration of function '__AARCH64_UINT64_C' is invalid in C99 [-Wimplicit-function-declaration] v.val[0] = vcombine_f32 (vcreate_f32 (__AARCH64_UINT64_C (0)), vcreate_f32 (__AARCH64_UINT64_C (0))); ^ has_vst1.c:6:41: warning: implicit declaration of function '__AARCH64_UINT64_C' is invalid in C99 [-Wimplicit-function-declaration] v.val[1] = vcombine_f32 (vcreate_f32 (__AARCH64_UINT64_C (0)), vcreate_f32 (__AARCH64_UINT64_C (0))); ^ has_vst1.c:6:79: warning: implicit declaration of function '__AARCH64_UINT64_C' is invalid in C99 [-Wimplicit-function-declaration] v.val[1] = vcombine_f32 (vcreate_f32 (__AARCH64_UINT64_C (0)), vcreate_f32 (__AARCH64_UINT64_C (0))); ^ 4 warnings generated. /tmp/has_vst1-b1e162.o: In function `main': has_vst1.c:(.text+0x30): undefined reference to `__AARCH64_UINT64_C' ``` Fixes #{issue number} Pull Request resolved: https://github.com/pytorch/pytorch/pull/49182 Reviewed By: walterddr Differential Revision: D25471994 Pulled By: malfet fbshipit-source-id: 0129a6f7aabc46aa117ef719d3a211449cb410f1	2020-12-10 15:19:12 -08:00
Ashkan Aliabadi	66440d1b29	Tweak Vulkan memory use. (#47728 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/47728 Test Plan: Imported from OSS Reviewed By: SS-JIA Differential Revision: D25032740 Pulled By: AshkanAliabadi fbshipit-source-id: 7eb72538dc1aa3feb4e2f8c4ff9c675eb8e97057	2020-11-30 14:28:09 -08:00
Joe Zhu	42e7cdc50a	Improve libuv detection on Windows (#48571 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/48304 Pull Request resolved: https://github.com/pytorch/pytorch/pull/48571 Reviewed By: ejguan Differential Revision: D25220903 Pulled By: mrshenli fbshipit-source-id: a485568621c4e289c5439474c2651186bc63c2f0	2020-11-30 11:16:13 -08:00
Gemfield	3c9e71c9ad	fix BUILD_MOBILE_BENCHMARK typo (#48515 ) Summary: BUILD_MOBILE_BENCHMARKS in CMakeLists.txt should be BUILD_MOBILE_BENCHMARK. Pull Request resolved: https://github.com/pytorch/pytorch/pull/48515 Reviewed By: albanD Differential Revision: D25198724 Pulled By: mrshenli fbshipit-source-id: 12765d10c272da04cb104202fcbabc6a0b007c5e	2020-11-30 08:38:43 -08:00
Ilia Cherniavskii	f2da18af14	Add USE_KINETO build option (#45888 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/45888 Adding USE_LIBKINETO build option Test Plan: USE_KINETO=1 USE_CUDA=1 USE_MKLDNN=1 BLAS=MKL BUILD_BINARY=1 python setup.py develop install --cmake Reviewed By: Chillee Differential Revision: D25142221 Pulled By: ilia-cher fbshipit-source-id: d1634a8f9599604ff511fac59b9072854289510c	2020-11-21 20:20:32 -08:00
Bert Maher	8a996dd139	[te] Make BUILD_TENSOREXPR_BENCHMARK a real CMake option (#48158 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/48158 Test Plan: Imported from OSS Reviewed By: Chillee Differential Revision: D25059877 Pulled By: bertmaher fbshipit-source-id: a98b6c18a91b4fe89d12bf5f7ead604e3cc0c8b0	2020-11-18 12:19:14 -08:00
Rong Rong	147a48fb27	[cmake] clean up cmake/Utils.cmake (#47923 ) Summary: Consolidate into cmake/public/utils.cmake Pull Request resolved: https://github.com/pytorch/pytorch/pull/47923 Reviewed By: samestep Differential Revision: D24955961 Pulled By: walterddr fbshipit-source-id: 9d5f6af2b353a8c6f6d521c841fd0989393755cd	2020-11-16 08:12:32 -08:00
Jiakai Liu	8e3af9faa8	[pytorch] fix debug symbol flag for android clang (#46331 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/46331 Fix the android build size issue #46246. Test Plan: Imported from OSS Reviewed By: dhruvbird Differential Revision: D24390061 Pulled By: ljk53 fbshipit-source-id: b4a6f297e89b9c08dff4297c6a41aabd41d9fff5	2020-11-10 14:55:43 -08:00
Ashkan Aliabadi	6cd8b5e9a7	Provide CMake option to enable Vulkan API. (#46503 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/46503 Test Plan: Imported from OSS Reviewed By: IvanKobzarev Differential Revision: D24379144 Pulled By: AshkanAliabadi fbshipit-source-id: 8d8c57f96bbac2a44615828a3474c912704f3a85	2020-10-20 18:45:52 -07:00
Pritam Damania	cb3c1d17e4	Promote -Wcast-function-type to an error in builds. (#46356 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/46356 Adding the flag `-Werror=cast-function-type` to ensure we don't allow any invalid casts (ex: PyCFunction casts). For more details see: https://github.com/pytorch/pytorch/issues/45419 ghstack-source-id: 114632980 Test Plan: waitforbuildbot Reviewed By: albanD Differential Revision: D24319759 fbshipit-source-id: 26ce4650c220e8e9dd3550245f214c7e6c21a5dc	2020-10-20 18:09:06 -07:00
Tao Xu	495070b388	[Metal] Add the Python binding for optimize_for_mobile (#46456 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/46456 Add the python binding in CMake. The general workflow is - Build pytorch - `USE_PYTORCH_METAL=ON python setup.py install --cmake` - Run optimize_for_mobile ``` import torch from torch.utils.mobile_optimizer import optimize_for_mobile scripted_model = torch.jit.load('./mobilenetv2.pt') optimized_model = optimize_for_mobile(scripted_model, backend='metal') torch.jit.export_opnames(optimized_model) torch.jit.save(optimized_model, './mobilenetv2_metal.bc') ``` The exported ops are ``` ['aten::adaptive_avg_pool2d', 'aten::add.Tensor', 'aten::addmm', 'aten::reshape', 'aten::size.int', 'metal::copy_to_host', 'metal_prepack::conv2d_run'] ``` ghstack-source-id: 114559878 Test Plan: - Sandcastle CI - Circle CI Reviewed By: kimishpatel Differential Revision: D24356768 fbshipit-source-id: fb5c4c4b6316347b67edb4132da044a81470ddfd	2020-10-17 10:26:25 -07:00
Tao Xu	04e5fcc0ed	[GPU] Introduce USE_PYTORCH_METAL (#46383 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/46383 The old `USE_METAL` is actually being used by Caffe2. Here we introduce a new macro to enable metal in pytorch. ghstack-source-id: 114499392 Test Plan: - Circle CI - The Person Segmentation model works Reviewed By: linbinyu Differential Revision: D24322018 fbshipit-source-id: 4e5548afba426b49f314366d89b18ba0c7e745ca	2020-10-16 18:19:32 -07:00
Michael Ranieri	b1d24dded1	make a way to disable callgrind (#46116 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/46116 Ideally I would just use one of the existing preprocessor flags such as `FBCODE_CAFFE2`, but this implies a whole bunch of other things elsewhere, so it is not really a solution for ovrsource. Test Plan: CI green, we are able to disable it internally with `-DNVALGRIND` Reviewed By: malfet Differential Revision: D24227360 fbshipit-source-id: 24a3b393cf46d6a16acca0a9ec52610d4bb8704f	2020-10-13 16:18:04 -07:00
Tao Xu	a277c097ac	[iOS][GPU] Add Metal/MPSCNN support on iOS (#46112 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/46112 ### Summary This PR adds the support of running torchscript models on iOS GPU via Metal (Inference only). The feature is currently in prototype state, API changes are expected. The tutorial and the documents will be added once it goes to beta. allow-large-files - Users API ``` auto module = torch::jit::load(model); module.eval(); at::Tensor input = at::ones({1,3,224,224}, at::ScalarType::Float).metal(); auto output = module.forward({input}).toTensor().cpu(); ``` - Supported Models - Person Segmentation v106 (FB Internal) - Mobilenetv2 - Supported Operators - aten::conv2d - aten::addmm - aten::add.Tensor - aten::sub.Tensor - aten::mul.Tensor - aten::relu - aten::hardtanh - aten::hardtanh_ - aten::sigmoid - aten::max_pool2d - aten::adaptive_avg_pool2d - aten::reshape - aten::t - aten::view - aten::log_softmax.int - aten::upsample_nearest2d.vec - Supported Devices - Apple A9 and above - iOS 10.2 and above - CMake scripts - `IOS_ARCH=arm64 ./scripts/build_ios.sh -DUSE_METAL=ON` ### Test Plan - Circle CI ghstack-source-id: 114155638 Test Plan: 1. Sandcastle CI 2. Circle CI Reviewed By: dreiss Differential Revision: D23236555 fbshipit-source-id: 98ffc48b837e308bc678c37a9a5fd8ae72d11625	2020-10-13 01:46:56 -07:00
gunandrose4u	ffd50b8220	SET USE_DISTRIBUTED OFF when libuv is not installed (#45554 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/45554 Reviewed By: izdeby Differential Revision: D24016825 Pulled By: mrshenli fbshipit-source-id: 332d860429626a915c06f98cad31e6db1cbc4eb1	2020-09-30 12:46:36 -07:00
gunandrose4u	0a38aed025	Auto set libuv_ROOT env var for Gloo submodule on Windows platform (#45484 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/45484 Reviewed By: lw Differential Revision: D23990724 Pulled By: mrshenli fbshipit-source-id: 1987ce7eb7d3f9d3120c07e954cd6581cd3caf59	2020-09-29 08:58:56 -07:00
gunandrose4u	f07ac6a004	Fix Windows build failure after DDP PR merged (#45335 ) Summary: This is a resubmit of PR https://github.com/pytorch/pytorch/issues/42897 , together with a fix for the Windows build issue introduced by PR https://github.com/pytorch/pytorch/issues/44344 . Pull Request resolved: https://github.com/pytorch/pytorch/pull/45335 Reviewed By: zou3519 Differential Revision: D23931471 Pulled By: mrshenli fbshipit-source-id: f49b5a114944c1450b32934b3292170be064f494	2020-09-25 12:37:50 -07:00
Mike Ruberry	103fa3894a	Revert D23841786: [pytorch][PR] Enable distributed package on windows, Gloo backend supported only Test Plan: revert-hammer Differential Revision: D23841786 (`0122299f9b`) Original commit changeset: 334ba1ed73ef fbshipit-source-id: ec95432f9957df56a5a04e52661f5db920b7f57f	2020-09-24 22:44:33 -07:00
gunandrose4u	0122299f9b	Enable distributed package on windows, Gloo backend supported only (#42897 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/42095 For test case part will be committed to this PR later mrshenli, please help to review Pull Request resolved: https://github.com/pytorch/pytorch/pull/42897 Reviewed By: osalpekar Differential Revision: D23841786 Pulled By: mrshenli fbshipit-source-id: 334ba1ed73eff2f668857390fc32d1bc7f08e5f3	2020-09-24 21:13:55 -07:00
Ivan Kobzarev	6debe825be	[vulkan] glsl shaders relaxed precision mode to cmake option (#43076 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/43076 Test Plan: Imported from OSS Reviewed By: AshkanAliabadi Differential Revision: D23143354 Pulled By: IvanKobzarev fbshipit-source-id: 7b3ead1e63cf8acf6e8e547080a8ead7a2db994b	2020-09-16 12:51:34 -07:00
peter	ed862d3682	Split CUDA_NVCC_FLAGS by space (#44603 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/44599 Pull Request resolved: https://github.com/pytorch/pytorch/pull/44603 Reviewed By: albanD Differential Revision: D23692320 Pulled By: ezyang fbshipit-source-id: 6a63d94ab8b88e7a82f9d65f03523d6ef639c754	2020-09-14 20:25:37 -07:00
Marcin Juszkiewicz	e261e0953e	Fix centos8 gcc (#44644 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/44198 properly this time Pull Request resolved: https://github.com/pytorch/pytorch/pull/44644 Reviewed By: albanD Differential Revision: D23684909 Pulled By: malfet fbshipit-source-id: cea6f6e2ae28138f6b93a6513d1abd36d14ae573	2020-09-14 12:28:09 -07:00
Marcin Juszkiewicz	566b8d0650	handle missing NEON vst1_*_x2 intrinsics (#44198 ) (#44199 ) Summary: CentOS 8 on AArch64 has the vld1_* intrinsics but lacks the vst1q_f32_x2 one. This patch checks for it and handles it separately from the vld1_* ones. Fixes https://github.com/pytorch/pytorch/issues/44198 Pull Request resolved: https://github.com/pytorch/pytorch/pull/44199 Reviewed By: seemethere Differential Revision: D23641273 Pulled By: malfet fbshipit-source-id: c2053c8e0427705eaeeeb82ec030925bff22623a	2020-09-11 16:02:44 -07:00
Yujun	db24c5c582	Change code coverage option name (#43999 ) Summary: According to [documentation](https://github.com/pytorch/pytorch/blob/master/tools/setup_helpers/cmake.py#L265), only options starting with `BUILD_` / `USE_` / `CMAKE_` in `CMakeLists.txt` can be imported from environment variables. --- This diff was originally intended to enable `c++` source coverage with `CircleCI` and `codecov.io`, but we will finish that in the future. You can find the related information in the diff history. The original procedure follows: Based on [this pull request](`1bda5e480c`), life becomes much easier this time. 1. in `build.sh` - Enable the coverage build option for c++ - `apt-get install lcov` 2. in `test.sh` - run `lcov` 3. in `pytorch-job-specs.yml` - copy coverage.info to `test/` folder and upload it to codecov.io Pull Request resolved: https://github.com/pytorch/pytorch/pull/43999 Test Plan: Test on github Reviewed By: malfet Differential Revision: D23464656 Pulled By: scintiller fbshipit-source-id: b2365691f04681d25ba5c00293fbcafe8e8e0745	2020-09-11 15:55:05 -07:00
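The prefix rule described in commit db24c5c582 above can be illustrated with a small sketch. This is not the code in `tools/setup_helpers/cmake.py`; the helper name `cmake_defines_from_env` and the option `USE_SOME_OPTION` are hypothetical, and only the three prefixes come from the commit message.

```python
# Prefixes named in the commit message; everything else here is illustrative.
FORWARDED_PREFIXES = ("BUILD_", "USE_", "CMAKE_")


def cmake_defines_from_env(env):
    """Return -D flags for the environment variables whose names carry a forwarded prefix."""
    return [
        f"-D{name}={value}"
        for name, value in sorted(env.items())
        if name.startswith(FORWARDED_PREFIXES)
    ]


# Hypothetical environment: CODE_COVERAGE lacks a forwarded prefix and is dropped.
sample_env = {"CODE_COVERAGE": "ON", "USE_SOME_OPTION": "ON", "BUILD_TEST": "1"}
print(cmake_defines_from_env(sample_env))
```

Under this model an option named without one of the prefixes never reaches CMake from the environment, which is the motivation the commit gives for renaming the option.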
Bram Wasti	6512032699	[Static Runtime] Add OSS build for static runtime benchmarks (#43881 ) Summary: Adds CMake option. Build with: ``` BUILD_STATIC_RUNTIME_BENCHMARK=ON python setup.py install ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/43881 Reviewed By: hlu1 Differential Revision: D23430708 Pulled By: bwasti fbshipit-source-id: a39bf54e8d4d044a4a3e4273a5b9a887daa033ec	2020-09-02 08:00:18 -07:00
Sebastian Pop	c259146477	add missing NEON {vld1,vst1}_*_x2 intrinsics (#43683 ) Summary: Workaround for issue https://github.com/pytorch/pytorch/issues/43265. Add the missing intrinsics until gcc-7 gets the missing patches backported. Fixes https://github.com/pytorch/pytorch/issues/43265. Pull Request resolved: https://github.com/pytorch/pytorch/pull/43683 Reviewed By: albanD Differential Revision: D23467867 Pulled By: malfet fbshipit-source-id: 7c138dd3de3c45852a60f2cfe8b4d7f7cf76bc7e	2020-09-01 21:19:39 -07:00
Rong Rong	8ca3913f47	Introduce BUILD_CAFFE2 flag (#43673 ) Summary: introduce BUILD_CAFFE2 flag. default to `ON`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/43673 Reviewed By: malfet Differential Revision: D23381035 Pulled By: walterddr fbshipit-source-id: 1f4582987fa0c4a911f0b18d311c04fdbf8dd8f0	2020-09-01 10:18:23 -07:00
Jiakai Liu	3a0e35c9f2	[pytorch] deprecate static dispatch (#43564 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/43564 Static dispatch was originally introduced for mobile selective build. Since we have added selective build support for dynamic dispatch and tested it in FB production for months, we can deprecate static dispatch to reduce the complexity of the codebase. Test Plan: Imported from OSS Reviewed By: ezyang Differential Revision: D23324452 Pulled By: ljk53 fbshipit-source-id: d2970257616a8c6337f90249076fca1ae93090c7	2020-08-27 14:52:48 -07:00
Ann Shan	0dc41ff465	[pytorch] add flag for autograd ops to mobile builds (#43154 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/43154 Adds the build flag `BUILD_MOBILE_AUTOGRAD` which toggles whether autograd files should be included for a PyTorch mobile build (default off). ghstack-source-id: 110369406 Test Plan: CI Reviewed By: ljk53 Differential Revision: D23061913 fbshipit-source-id: bc3d6683ab17f158990d83e4fae0a011d5adeca1	2020-08-20 12:39:55 -07:00
Xiang Gao	ee74c2e5be	Compress fatbin to fit into 32bit indexing (#43074 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/39968 Tested with `TORCH_CUDA_ARCH_LIST='3.5 5.2 6.0 6.1 7.0 7.5 8.0+PTX'`: before this PR the build was failing, and with this PR it succeeds. With `TORCH_CUDA_ARCH_LIST='7.0 7.5 8.0+PTX'`, `libtorch_cuda.so` with symbols changes from 2.9GB -> 2.2GB cc: ptrblck mcarilli jjsjann123 Pull Request resolved: https://github.com/pytorch/pytorch/pull/43074 Reviewed By: mrshenli Differential Revision: D23176095 Pulled By: malfet fbshipit-source-id: 7b3e6d049fc080e519f21e80df05ef68e7bea57e	2020-08-18 09:48:54 -07:00
Nikita Shulga	0cf4a5bccb	Add GCC codecoverage flags (#43066 ) Summary: Rename `CLANG_CODE_COVERAGE` option to `CODE_COVERAGE` and add compiler specific flags for GCC and Clang Pull Request resolved: https://github.com/pytorch/pytorch/pull/43066 Reviewed By: scintiller Differential Revision: D23137488 Pulled By: malfet fbshipit-source-id: a89570469692f878d84f7da6f9d5dc01df423e80	2020-08-14 17:16:18 -07:00
Nikita Shulga	ea65a56854	Use `string(APPEND FOO " bar")` instead of `set(FOO "${FOO} bar") (#42844 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/42844 Reviewed By: scintiller Differential Revision: D23067577 Pulled By: malfet fbshipit-source-id: e4380ce02fd6aca37c955a7bc24435222c5d8b19	2020-08-12 10:33:11 -07:00
Yujun Zhao	7524699d58	Modify clang code coverage to CMakeList.txt (for MacOS) (#42837 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/42837 Originally we used ``` list(APPEND CMAKE_C_FLAGS -fprofile-instr-generate -fcoverage-mapping) list(APPEND CMAKE_CXX_FLAGS -fprofile-instr-generate -fcoverage-mapping) ``` But when compiling the project on Mac with coverage on, it fails with the error: `clang: error: no input files /bin/sh: -fprofile-instr-generate: command not found /bin/sh: -fcoverage-mapping: command not found` The reason is that `list(APPEND CMAKE_CXX_FLAGS ...)` adds an additional `;` to the variable. This means that if we do `list(APPEND foo a)` and then `list(APPEND foo b)`, `foo` will be `a;b`, with the additional `;`. Since `CMAKE_CXX_FLAGS` is already defined earlier in `CMakeLists.txt`, we can only use `set(...)` here. After changing it to ``` set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -fprofile-instr-generate -fcoverage-mapping") set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -fprofile-instr-generate -fcoverage-mapping") ``` the build was tested successfully on a local Mac machine. Test Plan: Test locally on mac machine Reviewed By: malfet Differential Revision: D23043057 fbshipit-source-id: ff6f4891b35b7f005861ee2f8e4c550c997fe961	2020-08-11 09:57:55 -07:00
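The `;` problem described in commit 7524699d58 above can be modelled in a few lines. This is a toy model of CMake's semantics, not CMake itself: the two helper functions are hypothetical stand-ins for `list(APPEND ...)` and `string(APPEND ...)`, and the starting flags `-O2 -fPIC` are an assumed example value.

```python
def list_append(value, *items):
    """Model list(APPEND): a CMake list is a string whose elements are joined by ';'."""
    parts = [value] if value else []
    return ";".join(parts + list(items))


def string_append(value, *pieces):
    """Model string(APPEND): the pieces are concatenated onto the value verbatim."""
    return value + "".join(pieces)


flags = "-O2 -fPIC"  # assumed pre-existing contents of CMAKE_CXX_FLAGS
as_list = list_append(flags, "-fprofile-instr-generate", "-fcoverage-mapping")
as_string = string_append(flags, " -fprofile-instr-generate -fcoverage-mapping")

print(as_list)    # elements separated by ';'
print(as_string)  # one space-separated flag string
```

When the `;`-separated value is pasted into a shell command line, the shell treats each `;` as a command separator, which matches the `command not found` errors quoted in the commit message. The space-separated form is what a flags variable is expected to hold.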
Khalid Almufti	b282297559	Replace whitelist with allowlist (#42067 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/41757 I've replaced all the whitelist with allowlist for this issue. Pull Request resolved: https://github.com/pytorch/pytorch/pull/42067 Reviewed By: pbelevich Differential Revision: D22791690 Pulled By: malfet fbshipit-source-id: 638c13cf49915f5c83bd79c7f4a39b8390cc15b4	2020-07-28 08:01:16 -07:00
Edward Yang	befb22790f	Fix a number of deprecation warnings (#40179 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/40179 - Pass no-psabi to shut up GCC about # Suppress "The ABI for passing parameters with 64-byte alignment has changed in GCC 4.6" - Fix use of deprecated data() accessor (and minor optimization: hoist accessor out of loop) - Undeprecate NetDef.num_workers, no one is serious about fixing these - Suppress warnings about deprecated pthreadpool types Signed-off-by: Edward Z. Yang <ezyang@fb.com> Test Plan: Imported from OSS Differential Revision: D22234138 Pulled By: ezyang fbshipit-source-id: 6a1601b6d7551a7e6487a44ae65b19acdcb7b849	2020-07-14 09:11:34 -07:00
Kimish Patel	d6feb6141f	[Vec256][neon] Add neon backend for vec256 (#39341 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/39341 This PR introduces a neon backend for the vec256 class for the float datatype. For now only aarch64 is enabled, due to a few issues with enabling it on aarch32. Test Plan: vec256_test Imported from OSS Differential Revision: D21822399 fbshipit-source-id: 3851c4336d93d1c359c85b38cf19904f82bc7b8d	2020-07-09 16:25:09 -07:00
Kimish Patel	bddba1e336	Add benchmark for add op. (#40059 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/40059 This benchmark is added specifically for mobile, to see whether the compiler is autovectorizing, in which case the neon backend for vec256 gives no advantage for the add op. Test Plan: CI Imported from OSS Differential Revision: D22055146 fbshipit-source-id: 43ba6c4ae57c6f05d84887c2750ce21ae1b0f0b5	2020-07-09 16:22:55 -07:00
Yujun Zhao	22f940b7bd	add clang code coverage compile flags (#41103 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/41103 Add a CLANG_CODE_COVERAGE option to CMakeLists.txt. If the option is ON, add the compile flags needed for code coverage. Test Plan: Cloned the pytorch source code locally, applied these changes and built it with `CLANG_CODE_COVERAGE ON` and `BUILD_TESTS ON`. Ran a manual test and attached the code coverage report. {F243609020} Reviewed By: malfet Differential Revision: D22422513 fbshipit-source-id: 27a31395c31b5b5f4b72523954722771d8f61080	2020-07-09 14:14:18 -07:00
David Reiss	b7e044f0e5	Re-apply PyTorch pthreadpool changes Summary: This re-applies D21232894 (`b9d3869df3`) and D22162524, plus updates jni_deps in a few places to avoid breaking host JNI tests. Test Plan: `buck test @//fbandroid/mode/server //fbandroid/instrumentation_tests/com/facebook/caffe2:host-test` Reviewed By: xcheng16 Differential Revision: D22199952 fbshipit-source-id: df13eef39c01738637ae8cf7f581d6ccc88d37d5	2020-06-23 19:26:21 -07:00
Kate Mormysh	92d3182c11	Revert D21232894: Unify PyTorch mobile's threadpool usage. Test Plan: revert-hammer Differential Revision: D21232894 (`b9d3869df3`) Original commit changeset: 8b3de86247fb fbshipit-source-id: e6517cfec08f7dd0f4f8877dab62acf1d65afacd	2020-06-23 17:09:14 -07:00
Ashkan Aliabadi	b9d3869df3	Unify PyTorch mobile's threadpool usage. (#37243 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/37243 * Why * As it stands, we have two thread pool solutions concurrently in use in PyTorch mobile: (1) the open source pthreadpool library under third_party, and (2) Caffe2's implementation of pthreadpool under caffe2/utils/threadpool. Since the primary use-case of the latter has been to act as a drop-in replacement for the third party version so as to enable integration and usage from within NNPACK and QNNPACK, Caffe2's implementation is intentionally written to the exact same interface as the third party version. The original argument in favor of C2's implementation has been improved performance as a result of using spin locks, as opposed to relinquishing the thread's time slot and putting it to sleep - a less expensive operation up to a point. That seems to have given C2's implementation the upper hand in performance, hence justifying the added maintenance complexity, until the third party version improved in parallel surpassing the efficiency of C2's implementation as I have verified in benchmarks. With that advantage gone, there is no reason to continue using C2's implementation in PyTorch mobile either from the perspective of performance or code hygiene. As a matter of fact, there is considerable performance benefit to be had as a result of using the third party version as it currently stands. This is a tricky change though, mainly because in order to avoid potential performance regressions, of which I have witnessed none but just in abundance of caution, we have decided to continue using the internal C2's implementation whenever building for Caffe2. Again, this is mainly to avoid potential performance regressions in production C2 use cases even if doing so results in reduced performance as far as I can tell. 
So to summarize, today, and as it currently stands, we are using C2's implementation for (1) NNPACK, (2) PyTorch QNNPACK, and (3) ATen parallel_for on mobile builds, while using the third party version of pthreadpool for XNNPACK as XNNPACK does not provide any build options to link against an external implementation unlike NNPACK and QNNPACK do. The goal of this PR then, is to unify all usage on mobile to the third party implementation both for improved performance and better code hygiene. This applies to PyTorch's use of NNPACK, QNNPACK, XNNPACK, and mobile's implementation of ATen parallel_for, all getting routed to the exact same third party implementation in this PR. Considering that NNPACK, QNNPACK, and XNNPACK are not mobile specific, these benefits carry over to non-mobile builds of PyTorch (but not Caffe2) as well. The implementation of ATen parallel_for on non-mobile builds remains unchanged. * How * This is where things get tricky. A good deal of the build system complexity in this PR arises from our desire to maintain C2's implementation intact for C2's use. pthreadpool is a C library with no concept of namespaces, which means two copies of the library cannot exist in the same binary or symbol collision will occur violating ODR. This means that somehow, and based on some condition, we must decide on the choice of a pthreadpool implementation. In practice, this has become more complicated as a result of all the possible combinations that USE_NNPACK, USE_QNNPACK, USE_PYTORCH_QNNPACK, USE_XNNPACK, USE_SYSTEM_XNNPACK, USE_SYSTEM_PTHREADPOOL and other variables can result in. Having said that, I have done my best in this PR to surgically cut through this complexity in a way that minimizes the side effects, considering the significance of the performance we are leaving on the table, yet, as a result of this combinatorial explosion explained above I cannot guarantee that every single combination will work as expected on the first try. 
I am heavily relying on CI to find any issues as local testing can only go so far. Having said that, this PR provides a simple non mobile-specific C++ thread pool implementation on top of pthreadpool, namely caffe2::PThreadPool that automatically routes to C2's implementation or the third party version depending on the build configuration. This simplifies the logic at the cost of pushing the complexity to the build scripts. From there on, this thread pool is used in aten parallel_for, and NNPACK and family, again, routing all usage of threading to C2 or third party pthreadpool depending on the build configuration. When all is said and done, the layering will look like this: a) aten::parallel_for, uses b) caffe2::PThreadPool, which uses c) pthreadpool C API, which delegates to c-1) third_party implementation of pthreadpool if that's what the build has requested, and the rabbit hole ends here. c-2) C2's implementation of pthreadpool if that's what the build has requested, which itself delegates to c-2-1) caffe2::ThreadPool, and the rabbit hole ends here. NNPACK, and (PyTorch) QNNPACK directly hook into (c). They never go through (b). Differential Revision: D21232894 Test Plan: Imported from OSS Reviewed By: dreiss Pulled By: AshkanAliabadi fbshipit-source-id: 8b3de86247fbc3a327e811983e082f9d40081354	2020-06-23 16:34:51 -07:00
Ivan Kobzarev	74a2cb87e3	[android][cmake] Remove NO_EXPORT for libtorch mobile build (#39584 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/39584 Removing `-DNO_EXPORT` for not-custom-build to be able to link to C10/A10 api. Custom build stays the same as its main goal is to have minimum binary size, while export api functions will increase it. Additional changes: 1. aten/src/ATen/DynamicLibrary.cpp uses libdl, if we need this functionality we will need to link result with libdl, but currently disabling this functionality for mobile. Test Plan: Imported from OSS Differential Revision: D22111600 Pulled By: IvanKobzarev fbshipit-source-id: d730201c55f543c959a596b34be532aecee6b9ab	2020-06-18 11:48:53 -07:00
peter	0f39ed86a7	Cleanup debug info switches with MSVC (#39703 ) Summary: Switch off `/Z7` so that we don't generate debug info in Release and MinSizeRel builds, so that we will probably get smaller static libraries and object files and faster build time Pull Request resolved: https://github.com/pytorch/pytorch/pull/39703 Differential Revision: D21960684 Pulled By: ezyang fbshipit-source-id: 909a237a138183591d667885b13fc311470eed65	2020-06-09 14:11:40 -07:00
Hong Xu	89c0efb30b	Also set CMAKE_C_STANDARD for MSVC (#39304 ) Summary: According to <https://gitlab.kitware.com/cmake/cmake/-/blob/master/Modules/Compiler/MSVC-C.cmake>, the option simply has no effect for MSVC as of today. It is better to not impose such an if condition as it is a bit misleading (the current code makes it look like we have compatibility issues with MSVC C11 support), and also it's better to leave the judgment of MSVC C support to CMake devs. Pull Request resolved: https://github.com/pytorch/pytorch/pull/39304 Differential Revision: D21846032 Pulled By: malfet fbshipit-source-id: 962e5721da3d7b9be4117b42bdc35df426b7da7b	2020-06-02 13:59:07 -07:00
Ivan Kobzarev	b460465a18	[Mobile GPU][Integration] Vulkan backend integration (#36491 ) Summary: This PR contains the initial version of Vulkan (GPU) Backend integration. The primary target environment is Android, but the desktop build is also supported. ## CMake Introducing three cmake options: USE_VULKAN: The main switch; if it is off, the other options have no effect. USE_VULKAN_WRAPPER: ON - Vulkan will be loaded at runtime as "libvulkan.so" using libdl, and every function call is wrapped in vulkan_wrapper.h. OFF - linking with libvulkan.so directly USE_VULKAN_SHADERC_RUNTIME: ON - Shader compilation library will be linked, and shaders will be compiled at runtime. OFF - Shaders will be precompiled and shader compilation library is not included. ## Codegen if `USE_VULKAN_SHADERC_RUNTIME` is ON: Shaders precompilation starts in cmake/VulkanCodegen.cmake, which calls `aten/src/ATen/native/vulkan/gen_glsl.py` or `aten/src/ATen/native/vulkan/gen_spv.py` to include shaders source or SPIR-V bytecode inside binary as uint32_t array in spv.h,spv.cpp. if `USE_VULKAN_SHADERC_RUNTIME` is OFF: The source of shaders is included as `glsl.h`,`glsl.cpp`. All codegen results happen in the build directory. ## Build dependencies cmake/Dependencies.cmake If the target platform is Android - vulkan library, headers, Vulkan wrapper will be used from ANDROID_NDK. Desktop build requires the VULKAN_SDK environment variable, and all vulkan dependencies will be used from it. (Desktop build was tested only on Linux). ## Pytorch integration: Adding "Vulkan" as new Backend, DispatchKey, DeviceType. We are using Strided layout without supporting strides at the moment, but we plan to support them in the future.
Using OpaqueTensorImpl where OpaqueHandle is copyable VulkanTensor, more details in comments in `aten/src/ATen/native/vulkan/Vulkan.h` Main code location: `aten/src/ATen/native/vulkan` `aten/src/ATen/native/vulkan/VulkanAten.cpp` - connection link between ATen and Vulkan api (Vulkan.h) that converts at::Tensor to VulkanTensor. `aten/src/ATen/native/Vulkan/Vulkan.h` - Vulkan API that contains VulkanTensor representation and functions to work with it. Plan to expose it for clients to be able to write their own Vulkan Ops. `aten/src/ATen/native/vulkan/VulkanOps.cpp` - Vulkan Operations Implementations that uses Vulkan.h API ## GLSL shaders Located in `aten/src/ATen/native/vulkan/glsl` as *.glsl files. All shaders use Vulkan specialized constants for workgroup sizes with ids 1, 2, 3 ## Supported operations Code point: conv2d no-groups conv2d depthwise addmm upsample nearest 2d clamp hardtanh ## Testing `aten/src/ATen/test/vulkan_test.cpp` - contains tests for copy from CPU to Vulkan and back, and for all supported operations. Desktop builds are supported, and testing can be done on a desktop that has a Vulkan-capable GPU or an installed software implementation of Vulkan, like https://github.com/google/swiftshader ## Vulkan execution The initial implementation is trivial and waits for every operator's execution. Pull Request resolved: https://github.com/pytorch/pytorch/pull/36491 Differential Revision: D21696709 Pulled By: IvanKobzarev fbshipit-source-id: da3e5a770b1a1995e9465d7e81963e7de56217fa	2020-05-26 08:30:13 -07:00
Gregory Chanan	b27be3e0c5	Avoid double dispatch in logical_not for compilation speed reasons. (#38565 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/38565 Also note this turns on "-Wno-unused-local-typedefs" because we are using dispatch macros for error checking. Test Plan: Imported from OSS Differential Revision: D21598478 Pulled By: gchanan fbshipit-source-id: 28f9ad01bd678df0601a10d0daf3ed31c47c4ab2	2020-05-18 09:25:54 -07:00
Nikita Shulga	dc918162b7	Remove `Caffe2_MAIN_LIBS` (#38408 ) Summary: Right now it is an unused alias to `torch_library` interface library Pull Request resolved: https://github.com/pytorch/pytorch/pull/38408 Differential Revision: D21598250 Pulled By: malfet fbshipit-source-id: ec9a2446b94e7ea68298831212005c2c80bbc95c	2020-05-15 12:27:15 -07:00
Wojciech Baranowski	945672bf3e	cmake: improve dependencies in incremental builds (#37661 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/26304 Test procedure: With ninja: [x] Build a clean checkout [x] Build again. Result: Only 10 libraries are (needlessly) linked again, the extra delay on a 24-core machine is <10s. [x] Build for the third time. Result: Virtually instantaneous, with no extra rebuilding. [x] Modify DispatchTable.h. Build again. Result: `.cu` files are rebuilt, as well as many `.cpp` files [x] Build for the fifth time. Result: Virtually instantaneous, with no extra rebuilding. [x] Touch one of the `.depend` files. Build again. Result: Only 10 libraries are (needlessly) linked again, the extra delay on a 24-core machine is <10s. Without ninja: [x] Build a clean checkout [x] Build again. Result: There is some unnecessary rebuilding. But it was also happening before this change. [x] Build for the third time. Result: Virtually instantaneous, with no extra rebuilding. Pull Request resolved: https://github.com/pytorch/pytorch/pull/37661 Differential Revision: D21434624 Pulled By: ezyang fbshipit-source-id: 379d2315486b8bb5972c184f9b8da8e00d38c338	2020-05-06 14:25:18 -07:00
Brian Vaughan	d4edbbd396	Revert D21369541: Make a separate cmake option for caffe2 tests Test Plan: revert-hammer Differential Revision: D21369541 Original commit changeset: 669cff70c5b5 fbshipit-source-id: 500d261eaf3f02bcd698d343480b9e951e2844b9	2020-05-05 06:30:52 -07:00
Michael Suo	aff92ef3d6	Make a separate cmake option for caffe2 tests (#37721 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/37721 Even though we disabled caffe2 test configs in Python, the BUILD_TEST option was still building caffe2 test cpp binaries and various CI configurations were running them (since they just run every binary in `torch/test`). This PR adds a caffe2-specific BUILD_TEST option (BUILD_CAFFE2_TEST), which defaults to OFF, and gates the compilation of caffe2 test cpp binaries under it. Test Plan: Imported from OSS Differential Revision: D21369541 Pulled By: suo fbshipit-source-id: 669cff70c5b53f016e8e016bcb3a99bf3617e1f9	2020-05-04 23:26:27 -07:00
Lucas Hosseini	8a30553738	[TensorPipe/RPC] Add TensorPipe dependency (#36695 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/36695 Reviewed By: lw Differential Revision: D21312297 Pulled By: beauby fbshipit-source-id: 39fdc3de91efa4ac97dd169f09fb304b273b0050	2020-04-30 11:05:15 -07:00
Mo Zhou	69e2f1aaff	[cmake] add HAVE_SOVERSION option (default=OFF). (#37502 ) Summary: This is useful for linux distributions when the ABI/API of libtorch has been changed. The default SOVERSION is set to "${TORCH_VERSION_MAJOR}.${TORCH_VERSION_MINOR}". ezyang But if the release strategy of pytorch/caffe2 involves avoiding breaking API/ABI changes to libtorch for minor/patch releases, then we can set `TORCH_SOVERSION` to simply `TORCH_VERSION_MAJOR`. Please confirm that. Pull Request resolved: https://github.com/pytorch/pytorch/pull/37502 Differential Revision: D21303565 Pulled By: ezyang fbshipit-source-id: 798f5ec7fc5f0431ff1a7f9e8e5d3a0d3b25bb22	2020-04-30 06:52:33 -07:00
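The default SOVERSION rule stated in commit 69e2f1aaff above, "${TORCH_VERSION_MAJOR}.${TORCH_VERSION_MINOR}", amounts to keeping the first two components of the version string. A minimal sketch, assuming a dotted version string such as `1.6.0a0`; the helper name `default_soversion` is hypothetical and is not part of the build scripts.

```python
def default_soversion(torch_version):
    """Return "<major>.<minor>" from a dotted version string (hypothetical helper)."""
    # strip() tolerates a trailing newline, as when the string is read from a file.
    major, minor = torch_version.strip().split(".")[:2]
    return f"{major}.{minor}"


print(default_soversion("1.6.0a0"))
```

If the release policy ruled out ABI breaks within a major version, the same commit message suggests the value could be reduced to the major component alone.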
Mo Zhou	58a46a174e	[cmake] add USE_SYSTEM_{XNNPACK,ONNX} options. (#37501 ) Summary: ezyang Pull Request resolved: https://github.com/pytorch/pytorch/pull/37501 Differential Revision: D21303527 Pulled By: ezyang fbshipit-source-id: 58353d78c66e5bcc9198ce8cde36ac7232bb4b2f	2020-04-29 09:26:16 -07:00
peter	c5d6f59ab1	Replacing EHa with EHsc (#37235 ) Summary: We should not rely on async exceptions. Catching only C++ exceptions is more sensible and may give a boost in both space (1163 MB -> 1073 MB, 0.92x) and performance (51m -> 49m, 0.96x). Pull Request resolved: https://github.com/pytorch/pytorch/pull/37235 Differential Revision: D21256918 Pulled By: ezyang fbshipit-source-id: 572ee96f2e4c48ad13f83409e4e113483b3a457a	2020-04-28 08:20:37 -07:00
Mo Zhou	5b9f7f7b0e	[cmake] Add USE_SYSTEM_{GLOO,FP16,PTHREADPOOL,PSIMD,FXDIV,BENCHMARK} options (#14699 ) (#37277 ) Summary: These options are disabled by default, and are supposed to be used by linux distro developers. With the existing shortcut option USE_SYSTEM_LIBS toggled, these new options will be enabled as well. Additionally, when USE_SYSTEM_LIBS is toggled, setup.py should no longer check the existence of git submodules. ezyang Pull Request resolved: https://github.com/pytorch/pytorch/pull/37277 Differential Revision: D21256999 Pulled By: ezyang fbshipit-source-id: 84f97d008db5a5e41a289cb7bce94906de3c52cf	2020-04-27 09:37:27 -07:00
Mo Zhou	ff21b15624	cmake: add USE_SYSTEM_{LIBS,CPUINFO,SLEEF} options (#14699 ) (#37137 ) Summary: ezyang Pull Request resolved: https://github.com/pytorch/pytorch/pull/37137 Differential Revision: D21222632 Pulled By: ezyang fbshipit-source-id: 47624b30f8d07b31a40a26edf665bbec39e45202	2020-04-23 20:43:36 -07:00
Nikita Shulga	4593d87b84	Do not link torch_python with nccl (#37040 ) Summary: If NCCL is used, just allow `torch_python` to access enums defined in header file Pull Request resolved: https://github.com/pytorch/pytorch/pull/37040 Test Plan: `cmake ../pytorch -DPYTHON_EXECUTABLE=/usr/bin/python3.7 -DCMAKE_BUILD_TYPE=RELWITHDEBINFO -DUSE_CUDA=YES -DBUILD_TEST=YES -DUSE_NCCL=YES -DUSE_DISTRIBUTED=NO -DCMAKE_CXX_COMPILER=/usr/bin/cuda-g++ -DCMAKE_C_COMPILER=/usr/bin/cuda-gcc -DUSE_MKLDNN=ON -G Ninja` + `ninja torch_python` Differential Revision: D21171573 Pulled By: malfet fbshipit-source-id: e5eba0f610da3b0fcd17342ad46458dc7b0d251b	2020-04-21 21:00:49 -07:00
Shen Li	a14a8376aa	Link NCCL lib to TORCH_PYTHON_LINK_LIBRARIES when USE_NCCL=1 (#36948 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/36948 Compiling with USE_DISTRIBUTED=0 fails as it would still try to compile python_nccl.cpp which requires NCCL but the NCCL lib is not linked. Test Plan: Imported from OSS Differential Revision: D21142012 Pulled By: mrshenli fbshipit-source-id: 6ca94056ca859da7f833a31edcb4c5260d8625e4	2020-04-20 19:15:02 -07:00
Yinghai Lu	c1efe1ddb5	Enable building of FakeLowP ops (#36170 ) Summary: We open sourced the FakeLowp ops as a reference implementation of fp16 ops. This PR makes it buildable. ``` USE_CUDA=0 USE_ROCM=0 USE_FAKELOWP=ON python setup.py install ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/36170 Test Plan: Build Onnxifi library in Glow. ``` cp ${GLOW}/build/lib/Onnxifi/libonnxifi-glow.so ${MY_PATH}/ibonnxifi.so LD_LIBRARY_PATH=${MY_PATH}/ibonnxifi.so python pytorch/caffe2/python/fakelowp/test_sls_nnpi_fp16.py ``` It doesn't run successfully right now because we need to open source the glow gflags and some other ops like `FbgemmPack`. Reviewed By: houseroad Differential Revision: D20980681 Pulled By: yinghai fbshipit-source-id: 6dd31883a985850a77261bcc527029479bbc303f	2020-04-11 13:17:59 -07:00
Nikita Shulga	59ed0c5fd7	Strip newline when ingesting `version.txt` (#36002 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/36002 Test Plan: Run cmake and observe that there are no warnings in stdout or in `CMakeCache.txt` Differential Revision: D20872854 Pulled By: malfet fbshipit-source-id: 8a61b63b3d564e597e7a62dd913c97bc64b183b9	2020-04-06 13:21:10 -07:00
Eli Uriegas	a53328e89c	cmake: Grab TORCH_DEFAULT_VERSION from version.txt (#35260 ) Summary: This variable hasn't been updated in a long time since it usually just gets overwritten by whatever is in the setup.py but let's set the default to something a bit more in-line with what we're actually building. Closes https://github.com/pytorch/pytorch/issues/35210 cc ksasso1028 Signed-off-by: Eli Uriegas <eliuriegas@fb.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/35260 Differential Revision: D20818302 Pulled By: seemethere fbshipit-source-id: 530fe137e45be1d0ac0233525c80f7099c17b05a	2020-04-02 11:57:47 -07:00
peter	3bdc4a37ed	CMake script cleanup - mixed case for function names (#35589 ) Summary: Running the following code. ```bash cmake --help-command-list \| grep -v "cmake version" \| while read c; do echo 's/\b'"$(echo $c \| tr '[:lower:]' '[:upper:]')"'$\s$(/'"$c"'\1(/g' done >convert.sed && git ls-files -z -- bootstrap '.cmake' '.cmake.in' 'CMakeLists.txt' \| egrep -z -v '^(cmake/Modules/\|cmake/Modules_CUDA_fix/)' \| xargs -0 sed -i -f convert.sed && rm convert.sed ``` cmake-lint is too sensitive about mixed case so I didn't switch the check on. Pull Request resolved: https://github.com/pytorch/pytorch/pull/35589 Differential Revision: D20735648 Pulled By: ezyang fbshipit-source-id: a09a60a7ce921bb198575a35335faa299bd10b66	2020-03-30 11:37:02 -07:00
Nikita Shulga	b9adbb5002	Fix/relax CMake linter rules (#35574 ) Summary: Ignore mixed upper-case/lower-case style for now Fix space between function and its arguments violation Pull Request resolved: https://github.com/pytorch/pytorch/pull/35574 Test Plan: CI Differential Revision: D20712969 Pulled By: malfet fbshipit-source-id: 0012d430aed916b4518599a0b535e82d15721f78	2020-03-27 16:52:33 -07:00
Nikita Shulga	512bcf68be	[Formatting] `if (` -> `if(` in CMakeLists.txt (#35343 ) Summary: Same to `else`, `endif` and `elseif`. Also prefer lowercase over uppercase ones Pull Request resolved: https://github.com/pytorch/pytorch/pull/35343 Test Plan: None at all Differential Revision: D20638789 Pulled By: malfet fbshipit-source-id: 8058075693185e66f5dda7b825b725e139d0d000	2020-03-25 13:48:42 -07:00
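The formatting rule in the entry above (`if (` -> `if(`, lowercase preferred) can be illustrated with a minimal Python sketch. This is not the tooling used in the PR; `normalize_commands` and the sample input are hypothetical names chosen only for illustration.

```python
import re

# Control-flow commands named in the commit message above.
COMMANDS = ["if", "else", "elseif", "endif"]


def normalize_commands(text, commands=COMMANDS):
    """Lowercase the given CMake command names and drop whitespace before '('."""
    alternatives = "|".join(re.escape(c) for c in commands)
    pattern = re.compile(r"\b(" + alternatives + r")\s*\(", re.IGNORECASE)
    return pattern.sub(lambda m: m.group(1).lower() + "(", text)


# Hypothetical CMake fragment, used only as sample input.
sample = "IF (USE_CUDA)\n  set(FOO 1)\nELSE ()\n  set(FOO 0)\nENDIF ()"
normalized = normalize_commands(sample)
print(normalized)
```

The `\b` anchor keeps the `if` alternative from matching inside `ENDIF`, so each command is rewritten as a whole word.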
Nikita Shulga	93983c7d00	Add `USE_TSAN` option (#35197 ) Summary: Sometimes it is important to run code with thread sanitizer. Pull Request resolved: https://github.com/pytorch/pytorch/pull/35197 Test Plan: CI Differential Revision: D20605005 Pulled By: malfet fbshipit-source-id: bcd1a5191b5f859e12b6df6737c980099b1edc36	2020-03-23 14:56:42 -07:00
Nikita Shulga	f87cd83d11	Append multiple arguments to list of flags as multiple items (#34899 ) Summary: This makes PyTorch compileable(but not linkable) with `CUDA_SEPARABLE_COMPILATION` option enabled. Pull Request resolved: https://github.com/pytorch/pytorch/pull/34899 Test Plan: CI Differential Revision: D20501050 Pulled By: malfet fbshipit-source-id: 02903890a827fcc430a26f397d4d05999cf3a441	2020-03-17 16:48:32 -07:00
Edward Yang	76d9e76b4a	Default to erroring when failing to return from non-void function. (#34663 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/34663 Been bitten by this so many times. Never more. Signed-off-by: Edward Z. Yang <ezyang@fb.com> Test Plan: Imported from OSS Differential Revision: D20425480 Pulled By: ezyang fbshipit-source-id: c4489efacc4149c9b57d1b8207cc872970c2501f	2020-03-17 07:31:56 -07:00
Kimish Patel	84bd71dbd4	Enable threading for XNNPACK ops. (#34547 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/34547 This enables threading by passing a threadpool to xnnpack ops. Test Plan: python test/test_xnnpack_integration.py Imported from OSS Differential Revision: D20370553 fbshipit-source-id: 4db08e73f8c69b9e722b0e11a00621c4e229a31a	2020-03-14 12:53:36 -07:00
Edward Yang	63964175b5	Revert D20379910: [pytorch][PR] Set USE_RCCL cmake option (dependent on USE_NCCL) Test Plan: revert-hammer Differential Revision: D20379910 Original commit changeset: 981f924be93d fbshipit-source-id: 2cfc2eebe6ebabf801f0ea6a183aad2342ada79f	2020-03-11 07:41:13 -07:00
Jithun Nair	ce77d4a316	Set USE_RCCL cmake option (dependent on USE_NCCL) (#31341 ) Summary: so that Gloo build has RCCL path enabled for ROCm Pull Request resolved: https://github.com/pytorch/pytorch/pull/31341 Differential Revision: D20379910 Pulled By: ezyang fbshipit-source-id: 981f924be93ddcc0705c1934f92d938c29aaf312	2020-03-10 20:26:09 -07:00
Jiakai Liu	9a5e9d8cec	[pytorch][mobile] change mobile build scripts to build PyTorch by default (#34203 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/34203 Currently cmake and mobile build scripts still build libcaffe2 by default. To build pytorch mobile users have to set environment variable BUILD_PYTORCH_MOBILE=1 or set cmake option BUILD_CAFFE2_MOBILE=OFF. PyTorch mobile has been released for a while. It's about time to change CMake and build scripts to build libtorch by default. Changed caffe2 CI job to build libcaffe2 by setting BUILD_CAFFE2_MOBILE=1 environment variable. Only found android CI for libcaffe2 - do we ever have iOS CI for libcaffe2? Test Plan: Imported from OSS Differential Revision: D20267274 Pulled By: ljk53 fbshipit-source-id: 9d997032a599c874d62fbcfc4f5d4fbf8323a12e	2020-03-05 23:40:47 -08:00
Jiakai Liu	385067ed4f	[pytorch][cmake] improve build mobile with host toolchain (#34187 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/34187 Noticed that a recent PR broke Android/iOS CI but didn't break mobile build with host toolchain. Turns out one mobile related flag was not set on PYTORCH_BUILD_MOBILE code path: ``` "set(INTERN_DISABLE_MOBILE_INTERP ON)" ``` First, move the INTERN_DISABLE_MOBILE_INTERP macro below, to stay with other "mobile + pytorch" options - it's not relevant to "mobile + caffe2" so doesn't need to be set as common "mobile" option; Second, rename PYTORCH_BUILD_MOBILE env-variable to BUILD_PYTORCH_MOBILE_WITH_HOST_TOOLCHAIN - it's a bit verbose but becomes more clear what it does - there is another env-variable "BUILD_PYTORCH_MOBILE" used in scripts/build_android.sh, build_ios.sh, which toggles between "mobile + pytorch" v.s. "mobile + caffe2"; Third, combine BUILD_PYTORCH_MOBILE_WITH_HOST_TOOLCHAIN with ANDROID/IOS to avoid missing common mobile options again in future. Test Plan: Imported from OSS Differential Revision: D20251864 Pulled By: ljk53 fbshipit-source-id: dc90cc87ffd4d0bf8a78ae960c4ce33a8bb9e912	2020-03-04 11:43:16 -08:00
Jiakai Liu	3c042a6ab9	[pytorch][mobile] support for custom mobile build with dynamic dispatch (#34055 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/34055 Enable custom mobile build with dynamic dispatch for OSS build. It calls a python util script to calculate transitive dependencies from the op dependency graph and the list of used root ops, then pass the result as the op registration whitelist to aten codegen, so that only these used ops are registered and kept at link time. For custom build with dynamic dispatch to work correctly, it's critical to have the accurate list of used ops. Current assumption is that only those ops referenced by TorchScript model are used. It works well if client code doesn't call libtorch API (e.g. tensor methods) directly; otherwise the extra used ops need to be added to the whitelist manually, as shown by the HACK in prepare_model.py. Also, if JIT starts calling extra ops independent of specific model, then the extra ops need to be added to the whitelist as well. Verified the correctness of the whole process with MobileNetV2: ``` TEST_CUSTOM_BUILD_DYNAMIC=1 test/mobile/custom_build/build.sh ``` Test Plan: Imported from OSS Reviewed By: bhosmer Differential Revision: D20193327 Pulled By: ljk53 fbshipit-source-id: 9d369b8864856b098342aea79e0ac8eec04149aa	2020-03-03 19:25:16 -08:00
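The whitelist computation described in the entry above (transitive dependencies of the used root ops) amounts to a graph reachability search. The sketch below assumes the op dependency graph is a plain dict mapping an op to its direct dependencies; `transitive_closure` and the edges in `dep_graph` are hypothetical and are not taken from the actual util script.

```python
from collections import deque


def transitive_closure(dep_graph, root_ops):
    """Return every op reachable from root_ops (roots included), sorted by name."""
    seen = set(root_ops)
    queue = deque(root_ops)
    while queue:
        op = queue.popleft()
        for dep in dep_graph.get(op, ()):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return sorted(seen)


# Hypothetical dependency edges, for illustration only.
dep_graph = {
    "aten::addmm": ["aten::mm", "aten::add.Tensor"],
    "aten::mm": [],
    "aten::batch_norm": ["aten::native_batch_norm"],
    "aten::dropout": [],
}

whitelist = transitive_closure(dep_graph, ["aten::addmm", "aten::dropout"])
print(whitelist)
```

Ops that are not reachable from the roots (here `aten::batch_norm`) stay out of the whitelist, which is what lets them be dropped at link time.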
Xiang Gao	1beb309e03	Make DEBUG == REL_WITH_DEB_INFO on CUDA build (#34153 ) Summary: Related issue: https://github.com/pytorch/pytorch/issues/34079 I don't know how much we care about the difference between `-G` and `-lineinfo` in `DEBUG` vs `REL_WITH_DEB_INFO`, but since `-G` never worked, let's just use `-lineinfo` on both `DEBUG` and `REL_WITH_DEB_INFO`. This would resolve the failure in `DEBUG=1` build. Locally tested to work. Pull Request resolved: https://github.com/pytorch/pytorch/pull/34153 Reviewed By: ljk53 Differential Revision: D20232049 Pulled By: ngimel fbshipit-source-id: 4e48ff818850ba911298b0cc159522f33a305aaa	2020-03-03 15:07:42 -08:00
Peter Bell	4b3ae7e0af	Enable -Werror=format compile errors on torch exception types (#34019 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/33899 In the issue, we have ``` TypeError("expected %s (got %s)", dispatch_key, toString(other.key_set()).c_str()); ``` which results in `dispatch_key` being interpreted as a c-string by `sprintf`. Adding `__attribute__((format))` to the `TypeError` constructor allows gcc or clang to detect this at compile time. Then `-Werror=format` makes it a hard error at compile time. Pull Request resolved: https://github.com/pytorch/pytorch/pull/34019 Differential Revision: D20194842 Pulled By: ezyang fbshipit-source-id: fa4448916c309d91e3d949fa65bb3aa7cca5c6a8	2020-03-02 13:25:39 -08:00
Ashkan Aliabadi	6aecfd1e80	Mobile Backend: NHWC memory layout + XNNPACK integration. (#33722 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/33722 In order to improve CPU performance on floating-point models on mobile, this PR introduces a new CPU backend for mobile that implements the most common mobile operators with NHWC memory layout support through integration with XNNPACK. XNNPACK itself, and this codepath, are currently only included in the build, but the actual integration is gated with USE_XNNPACK preprocessor guards. This preprocessor symbol is intentionally not passed on to the compiler, so as to enable this rollout in multiple stages in follow up PRs. This changeset will build XNNPACK as part of the build if the identically named USE_XNNPACK CMAKE variable, defaulted to ON, is enabled, but will not actually expose or enable this code path in any other way. Furthermore, it is worth pointing out that in order to efficiently map models to these operators, some front-end method of exposing this backend to the user is needed. The less efficient implementation would be to hook these operators into their corresponding native implementations, granted that a series of XNNPACK-specific conditions are met, much like how NNPACK is integrated with PyTorch today for instance. Having said that, while the above implementation is still expected to outperform NNPACK based on the benchmarks I ran, the above integration would leave a considerable gap between the performance achieved and the maximum performance potential XNNPACK enables, as it does not provide a way to compute and factor out one-time operations out of the innermost forward() loop. The more optimal solution, and one we will decide on soon, would involve either providing a JIT pass that maps nn operators onto these newly introduced operators, while allowing one-time calculations to be factored out, much like quantized mobile models. 
Alternatively, new eager-mode modules can also be introduced that would directly call into these implementations either through c10 or some other mechanism, also allowing for decoupling of op creation from op execution. This PR does not include any of the front end changes mentioned above. Neither does it include the mobile threadpool unification present in the original https://github.com/pytorch/pytorch/issues/30644. Furthermore, this codepath seems to be faster than NNPACK in a good number of use cases, which can potentially allow us to remove NNPACK from aten to make the codebase a little simpler, granted that there is widespread support for such a move. Regardless, these changes will be introduced gradually and in a more controlled way in subsequent PRs. Pull Request resolved: https://github.com/pytorch/pytorch/pull/32509 Test Plan: Build: CI Functionality: Not exposed Reviewed By: dreiss Differential Revision: D20069796 Pulled By: AshkanAliabadi fbshipit-source-id: d46c1c91d4bea91979ea5bd46971ced5417d309c	2020-02-24 21:58:56 -08:00
Ashkan Aliabadi	039dc90854	Revert D19521853: [pytorch][PR] Mobile Backend: NHWC memory layout + XNNPACK integration. Test Plan: revert-hammer Differential Revision: D19521853 Original commit changeset: 99a1fab31d0e fbshipit-source-id: 76dfc1f481797ba2386997533cf19957637687d6	2020-02-23 22:07:19 -08:00
Ashkan Aliabadi	941b42428a	Mobile Backend: NHWC memory layout + XNNPACK integration. (#32509 ) Summary: In order to improve CPU performance on floating-point models on mobile, this PR introduces a new CPU backend for mobile that implements the most common mobile operators with NHWC memory layout support through integration with XNNPACK. XNNPACK itself, and this codepath, are currently only included in the build, but the actual integration is gated with USE_XNNPACK preprocessor guards. This preprocessor symbol is intentionally not passed on to the compiler, so as to enable this rollout in multiple stages in follow up PRs. This changeset will build XNNPACK as part of the build if the identically named USE_XNNPACK CMAKE variable, defaulted to ON, is enabled, but will not actually expose or enable this code path in any other way. Furthermore, it is worth pointing out that in order to efficiently map models to these operators, some front-end method of exposing this backend to the user is needed. The less efficient implementation would be to hook these operators into their corresponding native implementations, granted that a series of XNNPACK-specific conditions are met, much like how NNPACK is integrated with PyTorch today for instance. Having said that, while the above implementation is still expected to outperform NNPACK based on the benchmarks I ran, the above integration would leave a considerable gap between the performance achieved and the maximum performance potential XNNPACK enables, as it does not provide a way to compute and factor out one-time operations out of the innermost forward() loop. The more optimal solution, and one we will decide on soon, would involve either providing a JIT pass that maps nn operators onto these newly introduced operators, while allowing one-time calculations to be factored out, much like quantized mobile models. 
Alternatively, new eager-mode modules can also be introduced that would directly call into these implementations either through c10 or some other mechanism, also allowing for decoupling of op creation from op execution. This PR does not include any of the front end changes mentioned above. Neither does it include the mobile threadpool unification present in the original https://github.com/pytorch/pytorch/issues/30644. Furthermore, this codepath seems to be faster than NNPACK in a good number of use cases, which can potentially allow us to remove NNPACK from aten to make the codebase a little simpler, granted that there is widespread support for such a move. Regardless, these changes will be introduced gradually and in a more controlled way in subsequent PRs. Pull Request resolved: https://github.com/pytorch/pytorch/pull/32509 Reviewed By: dreiss Differential Revision: D19521853 Pulled By: AshkanAliabadi fbshipit-source-id: 99a1fab31d0ece64961df074003bb852c36acaaa	2020-02-23 19:08:42 -08:00
Hong Xu	15ba902c08	Turn ONNX_ML into a proper build option. (#33424 ) Summary: The detection of the env variable ONNX_ML has been properly handled in tools/setup_helpers/cmake.py, line 242. Pull Request resolved: https://github.com/pytorch/pytorch/pull/33424 Differential Revision: D20043991 Pulled By: ezyang fbshipit-source-id: 91d1d49a5a12f719e67d9507cc203c8a40992f03	2020-02-21 15:42:33 -08:00
Tao Xu	9c0625b004	[iOS] Add watchOS support (#33318 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/33318 ### Summary Recently, we have a [discussion](https://discuss.pytorch.org/t/libtorch-on-watchos/69073/14) in the forum about watchOS. This PR adds the support for building watchOS libraries. ### Test Plan - `BUILD_PYTORCH_MOBILE=1 IOS_PLATFORM=WATCHOS ./scripts/build_ios.sh` Test Plan: Imported from OSS Differential Revision: D19896534 Pulled By: xta0 fbshipit-source-id: 7b9286475e895d9fefd998246e7090ac92c4c9b6	2020-02-14 14:02:22 -08:00
peterjc123	ebed008dd4	Correct /MP usage in MSVC (#33120 ) Summary: ## Several flags `/MP[M]`: It is a flag for the compiler `cl`. It leads to object-level multiprocessing. By default, it spawns M processes where M is the number of cores on the PC. `/maxcpucount:[M]`: It is a flag for the generator `msbuild`. It leads to project-level multiprocessing. By default, it spawns M processes where M is the number of cores on the PC. `/p:CL_MPCount=[M]`: It is a flag for the generator `msbuild`. It leads the generator to pass `/MP[M]` to the compiler. `/j[M]`: It is a flag for the generator `ninja`. It leads to object-level multiprocessing. By default, it spawns M processes where M is the number of cores on the PC. ## Reason for the change 1. Object-level multiprocessing is preferred over project-level multiprocessing. 2. ~For ninja, we don't need to set `/MP` otherwise M * M processes will be spawned.~ Actually, this is not correct, because in ninja configs there is only one source file per command. Therefore, the `/MP` switch should be useless. 3. For msbuild, if it is called through Python configuration scripts, then `/p:CL_MPCount=[M]` will be added, otherwise, we add `/MP` to `CMAKE_CXX_FLAGS`. 4. ~It may be a possible fix for https://github.com/pytorch/pytorch/issues/28271, https://github.com/pytorch/pytorch/issues/27463 and https://github.com/pytorch/pytorch/issues/25393. Because `/MP` is also passed to `nvcc`.~ This is probably not true, because `/MP` should have no effect given there is only one source file per command. ## Reference 1. https://docs.microsoft.com/en-us/cpp/build/reference/mp-build-with-multiple-processes?view=vs-2019 2. https://github.com/Microsoft/checkedc-clang/wiki/Parallel-builds-of-clang-on-Windows 3. 
https://blog.kitware.com/cmake-building-with-all-your-cores/ Pull Request resolved: https://github.com/pytorch/pytorch/pull/33120 Differential Revision: D19817227 Pulled By: ezyang fbshipit-source-id: f8d01f835016971729c7a8d8a0d1cb8a8c2c6a5f	2020-02-10 11:29:25 -08:00
Jiakai Liu	0ac31a99be	run code analysis against mobile interpreter (#32276 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/32276 Include mobile interpreter in mobile code analysis pass, which has some manually registered ops in temporary namespaces. The mobile interpreter is still under development and these ops will be removed in the future. This is a temporary step for internal build experiment. Test Plan: Imported from OSS Differential Revision: D19426818 Pulled By: ljk53 fbshipit-source-id: 507453dc801e5f93208f1baea12400beccda9ca5	2020-01-17 17:21:28 -08:00
Brian Wignall	f326045b37	Fix typos, via a Levenshtein-type corrector (#31523 ) Summary: Should be non-semantic. Uses https://en.wikipedia.org/wiki/Wikipedia:Lists_of_common_misspellings/For_machines to find likely typos, with https://github.com/bwignall/typochecker to help automate the checking. Uses an updated version of the tool used in https://github.com/pytorch/pytorch/pull/30606 . Pull Request resolved: https://github.com/pytorch/pytorch/pull/31523 Differential Revision: D19216749 Pulled By: mrshenli fbshipit-source-id: 7fd489cb9a77cd7e4950c1046f925d57524960ea	2020-01-17 16:03:19 -08:00
Xiang Gao	c66ca74f03	Add device debug info to CUDA build (#31929 ) Summary: Also print NVCC flags in the summary Pull Request resolved: https://github.com/pytorch/pytorch/pull/31929 Differential Revision: D19312079 Pulled By: ezyang fbshipit-source-id: cd20d5a385f61174c1907a9ad883c04de66ef037	2020-01-08 09:56:20 -08:00
Richard Zou	9047d4df45	Remove all remaining usages of BUILD_NAMEDTENSOR (#31116 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/31116 Changelist: - remove BUILD_NAMEDTENSOR macro - remove torch._C._BUILD_NAMEDTENSOR - remove all python behavior that relies on torch._C._BUILD_NAMEDTENSOR Future: - In the next diff, I will remove all usages of ATen/core/EnableNamedTensor.h since that header doesn't do anything anymore - After that, we'll be done with the BUILD_NAMEDTENSOR removal. Test Plan: - run CI Differential Revision: D18934951 Pulled By: zou3519 fbshipit-source-id: 0a0df0f1f0470d0a01c495579333a2835aac9f5d	2019-12-12 09:53:03 -08:00
Edward Yang	2ced81f289	Revert "Default to not build Caffe2 operators on Windows. (#29061 )" (#30740 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30740 This reverts commit `7102aceaf8`. Test Plan: Imported from OSS Differential Revision: D18834315 Pulled By: ezyang fbshipit-source-id: 2dbd1cf686864b9840365083182cd6188a285399	2019-12-05 14:01:59 -08:00
Sebastian Messmer	bc2e6d10fa	Back out "Revert D17908478: Switch PyTorch/Caffe2 to C++14" Summary: Original commit changeset: 775d2e29be0b Test Plan: CI Reviewed By: mruberry Differential Revision: D18775520 fbshipit-source-id: a350b3f86b66d97241f208786ee67e9a51172eac	2019-12-03 14:33:43 -08:00
Liu Xiteng	8ee61e0be4	Fix CPU_INTEL flag error on windows (#30564 ) Summary: ${CMAKE_HOST_SYSTEM_PROCESSOR} gets the processor name from `uname -p` on Linux and from `%PROCESSOR_ARCHITECTURE%` on Windows. 1. %PROCESSOR_ARCHITECTURE% has a value in (AMD64\|IA64\|ARM64) for a 64-bit processor, and (x86) for a 32-bit processor 2. `uname -p` has a value like "(x86_64\|i[3-6]+86)" We cannot tell an Intel CPU from other CPUs by ${CMAKE_HOST_SYSTEM_PROCESSOR}. It is the architecture, not the vendor. E.g. an Intel i7-9700K CPU on Windows reports "AMD64" reference: [MSDN](https://docs.microsoft.com/zh-cn/windows/win32/winprog64/wow64-implementation-details?redirectedfrom=MSDN) Pull Request resolved: https://github.com/pytorch/pytorch/pull/30564 Differential Revision: D18763031 Pulled By: ezyang fbshipit-source-id: 11ae20e66b4b89bde1dcf4df6177606a3374c671	2019-12-02 08:43:01 -08:00
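The point of the entry above, that the reported processor string names an architecture and not a CPU maker, can be seen from Python as well. In this minimal sketch `is_x86_64` is a hypothetical helper, and the set of accepted strings is an assumption based on the values quoted in the commit message.

```python
import platform

# Architecture strings quoted in the commit message for 64-bit x86 hosts.
X86_64_NAMES = {"amd64", "x86_64"}


def is_x86_64(machine):
    """True for 64-bit x86 architecture strings, regardless of CPU maker."""
    return machine.lower() in X86_64_NAMES


# platform.machine() reports the architecture of the current host,
# e.g. "AMD64" on 64-bit Windows, whether the CPU is from Intel or AMD.
print(platform.machine(), is_x86_64(platform.machine()))
```

A check like this can select an instruction-set family, but it cannot be used to decide whether the CPU is an Intel part.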
Sebastian Messmer	a2ed50c920	Revert D17908478: Switch PyTorch/Caffe2 to C++14 Test Plan: revert-hammer Differential Revision: D17908478 Original commit changeset: 6e340024591e fbshipit-source-id: 775d2e29be0bc3a0db64f164c8960c44d4877d5d	2019-11-27 14:57:05 -08:00
Sebastian Messmer	d0acc9c085	Switch PyTorch/Caffe2 to C++14 (#30406 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30406 ghstack-source-id: 94642238 Test Plan: waitforsandcastle Differential Revision: D17908478 fbshipit-source-id: 6e340024591ec2c69521668022999df4a33b4ddb	2019-11-27 10:47:31 -08:00
Jiakai Liu	43fb0015db	custom build script (#30144 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30144 Create script to produce libtorch that only contains ops needed by specific models. Developers can use this workflow to further optimize mobile build size. We need to keep a dummy stub for unused (stripped) ops because some JIT side logic requires certain function schemas to exist in the JIT op registry. Test Steps: 1. Build "dump_operator_names" binary and use it to dump root ops needed by a specific model: ``` build/bin/dump_operator_names --model=mobilenetv2.pk --output=mobilenetv2.yaml ``` 2. The MobileNetV2 model should use the following ops: ``` - aten::t - aten::dropout - aten::mean.dim - aten::add.Tensor - prim::ListConstruct - aten::addmm - aten::_convolution - aten::batch_norm - aten::hardtanh_ - aten::mm ``` NOTE that for some reason it outputs "aten::addmm" but actually uses "aten::mm". You need to fix it manually for now. 3. Run custom build script locally (use Android as an example): ``` SELECTED_OP_LIST=mobilenetv2.yaml scripts/build_pytorch_android.sh armeabi-v7a ``` 4. Check out the demo app that uses the locally built library instead of downloading from the jcenter repo: ``` git clone --single-branch --branch custom_build git@github.com:ljk53/android-demo-app.git ``` 5. Copy locally built libraries to demo app folder: ``` find ${HOME}/src/pytorch/android -name '*.aar' -exec cp {} ${HOME}/src/android-demo-app/HelloWorldApp/app/libs/ \; ``` 6. Build demo app with locally built libtorch: ``` cd ${HOME}/src/android-demo-app/HelloWorldApp ./gradlew clean && ./gradlew assembleDebug ``` 7. Install and run the demo app. In-APK arm-v7 libpytorch_jni.so build size reduced from 5.5M to 2.9M. Test Plan: Imported from OSS Differential Revision: D18612127 Pulled By: ljk53 fbshipit-source-id: fa8d5e1d3259143c7346abd1c862773be8c7e29a	2019-11-20 13:16:02 -08:00
David Reiss	d22f61432d	Update fbjni and enable PyTorch JNI build Summary: - Add a "BUILD_JNI" option that enables building PyTorch JNI bindings and fbjni. This is off by default because it adds a dependency on jni.h. - Update to the latest fbjni so we can inhibit building its tests, because they depend on gtest. - Set JAVA_HOME and BUILD_JNI in Linux binary build configurations if we can find jni.h in Docker. Test Plan: - Built on dev server. - Verified that libpytorch_jni links after libtorch when both are built in a parallel build. Differential Revision: D18536828 fbshipit-source-id: 19cb3be8298d3619352d02bb9446ab802c27ec66	2019-11-15 13:59:44 -08:00
Jiakai Liu	9371b31818	set USE_STATIC_DISPATCH outside cmake (#29715 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29715 Previously we hard-coded it to enable static dispatch when building the mobile library. Since we are exploring approaches to deprecate static dispatch we should make it optional. This PR moved the setting from cmake to bash build scripts which can be overridden. Test Plan: - verified it's still using static dispatch when building with these scripts. Differential Revision: D18474640 Pulled By: ljk53 fbshipit-source-id: 7591acc22009bfba36302e3b2a330b1428d8e3f1	2019-11-14 20:41:29 -08:00
Ivan Kobzarev	92b9de1428	Test application for profiling, CMake params for debug symbols (#28406 ) Summary: Reason: To have one-step build for test android application based on the current code state that is ready for profiling with simpleperf, systrace etc. to profile performance inside the application. ## Parameters to control debug symbols stripping Introducing /CMakeLists parameter `ANDROID_DEBUG_SYMBOLS` to be able not to strip symbols for pytorch (not add linker flag `-s`) which is checked in `scripts/build_android.sh` On gradle side stripping happens by default, and to prevent it we have to specify ``` android { packagingOptions { doNotStrip "*/.so" } } ``` which is now controlled by new gradle property `nativeLibsDoNotStrip ` ## Test_App `android/test_app` - android app with one MainActivity that does inference in cycle `android/build_test_app.sh` - script to build libtorch with debug symbols for specified android abis and adds `NDK_DEBUG=1` and `-PnativeLibsDoNotStrip=true` to keep all debug symbols for profiling. Script assembles all debug flavors: ``` └─ $ find . 
-type f -name *apk ./test_app/app/build/outputs/apk/mobilenetQuant/debug/test_app-mobilenetQuant-debug.apk ./test_app/app/build/outputs/apk/resnet/debug/test_app-resnet-debug.apk ``` ## Different build configurations Module for inference can be set in `android/test_app/app/build.gradle` as a BuildConfig parameters: ``` productFlavors { mobilenetQuant { dimension "model" applicationIdSuffix ".mobilenetQuant" buildConfigField ("String", "MODULE_ASSET_NAME", buildConfigProps('MODULE_ASSET_NAME_MOBILENET_QUANT')) addManifestPlaceholders([APP_NAME: "PyMobileNetQuant"]) buildConfigField ("String", "LOGCAT_TAG", "\"pytorch-mobilenet\"") } resnet { dimension "model" applicationIdSuffix ".resnet" buildConfigField ("String", "MODULE_ASSET_NAME", buildConfigProps('MODULE_ASSET_NAME_RESNET18')) addManifestPlaceholders([APP_NAME: "PyResnet"]) buildConfigField ("String", "LOGCAT_TAG", "\"pytorch-resnet\"") } ``` In that case we can setup several apps on the same device for comparison, to separate packages `applicationIdSuffix`: 'org.pytorch.testapp.mobilenetQuant' and different application names and logcat tags as `manifestPlaceholder` and another BuildConfig parameter: ``` ─ $ adb shell pm list packages \| grep pytorch package:org.pytorch.testapp.mobilenetQuant package:org.pytorch.testapp.resnet ``` In future we can add another BuildConfig params e.g. single/multi threads and other configuration for profiling. 
At the moment 2 flavors - for resnet18 and for mobilenetQuantized which can be installed on connected device: ``` cd android ``` ``` gradle test_app:installMobilenetQuantDebug ``` ``` gradle test_app:installResnetDebug ``` ## Testing: ``` cd android sh build_test_app.sh adb install -r test_app/app/build/outputs/apk/mobilenetQuant/debug/test_app-mobilenetQuant-debug.apk ``` ``` cd $ANDROID_NDK python simpleperf/run_simpleperf_on_device.py record --app org.pytorch.testapp.mobilenetQuant -g --duration 10 -o /data/local/tmp/perf.data adb pull /data/local/tmp/perf.data python simpleperf/report_html.py ``` Simpleperf report has all symbols: ![Screenshot 2019-10-22 11 06 21](https://user-images.githubusercontent.com/6638825/67315740-0bc50100-f4bc-11e9-8f9e-2499be13d63e.png) Pull Request resolved: https://github.com/pytorch/pytorch/pull/28406 Differential Revision: D18386622 Pulled By: IvanKobzarev fbshipit-source-id: 3a751192bbc4bc3c6d7f126b0b55086b4d586e7a	2019-11-08 14:19:04 -08:00
Edward Yang	7102aceaf8	Default to not build Caffe2 operators on Windows. (#29061 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29061 It looks like we are too close to the maximum library size on Windows. Kill Caffe2 operators to get us lower again. Signed-off-by: Edward Z. Yang <ezyang@fb.com> Test Plan: Imported from OSS Reviewed By: smessmer Differential Revision: D18281083 Pulled By: ezyang fbshipit-source-id: 8a11f9059dbf330f659bd96cc0cc2abc947723a8	2019-11-04 14:32:47 -08:00
Sergei Nikolaev	1e2049c566	#26426 fixed (#28715 ) Summary: This is the fix for reverted https://github.com/pytorch/pytorch/issues/26426 houseroad bddppq soumith Pull Request resolved: https://github.com/pytorch/pytorch/pull/28715 Reviewed By: hl475 Differential Revision: D18146731 Pulled By: houseroad fbshipit-source-id: 247366451a6334e84df82d00339521f797b33130	2019-11-01 12:53:01 -07:00
Junjie Bai	d37c2d7c8d	Revert D17495965: TensorRT 6.0 support and PyTorch->ONNX->TRT6 unit test Test Plan: revert-hammer Differential Revision: D17495965 Original commit changeset: 3e8dbe8943f5 fbshipit-source-id: d47fcbec22b0d61df41d7dbf15cfdde196ac818f	2019-10-25 13:58:16 -07:00
Sergei Nikolaev	4996e3aca2	TensorRT 6.0 support and PyTorch->ONNX->TRT6 unit test (#26426 ) Summary: This PR makes Caffe2 compatible with TensorRT 6. To make sure it works well, a new unit test is added. This test checks the PyTorch->ONNX->TRT6 inference flow for all classification models from the TorchVision Zoo. Note on CMake changes: it has to be done in order to import onnx-tensorrt project. See https://github.com/pytorch/pytorch/issues/18524 for details. Pull Request resolved: https://github.com/pytorch/pytorch/pull/26426 Reviewed By: hl475 Differential Revision: D17495965 Pulled By: houseroad fbshipit-source-id: 3e8dbe8943f5a28a51368fd5686c8d6e86e7f693	2019-10-25 13:01:57 -07:00
Junjie Bai	f4d0d0a811	Enable RCCL in ROCm build (#27383 ) Summary: continues https://github.com/pytorch/pytorch/pull/23884 Pull Request resolved: https://github.com/pytorch/pytorch/pull/27383 Differential Revision: D17767248 Pulled By: bddppq fbshipit-source-id: 3a506844ca6f01d7bbe8be5bde0976999e3a2b90	2019-10-04 17:41:41 -07:00
Jiakai Liu	8f54d0d6b6	update android/iOS build library packing (#26565 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26565 For OSS mobile build we should keep QNNPACK off and PYTORCH_QNNPACK on as we don't include caffe2 ops that use third_party/QNNPACK. Update android/iOS build script to include new libraries accordingly. Test Plan: - CI build Differential Revision: D17508918 Pulled By: ljk53 fbshipit-source-id: 0483d45646d4d503b4e5c1d483e4df72cffc6c68	2019-09-20 17:48:15 -07:00
Jiakai Liu	d6e3aed032	add eigen blas for mobile build (#26508 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26508 Enable BLAS for pytorch mobile build using Eigen BLAS. It's not the juiciest optimization for typical mobile CV models, as we are already using NNPACK/QNNPACK for most ops there, but it's nice to have a good fallback implementation for other ops. Test Plan: - Create a simple matrix multiplication script model: ``` import torch class Net(torch.nn.Module): def __init__(self): super(Net, self).__init__() self.weights = torch.ones(1000, 1000) def forward(self, x): return torch.mm(x, self.weights) n = Net() module = torch.jit.trace_module(n, {'forward': torch.ones(1000, 1000)}) module.save('mm.pk') ``` - Before integrating with Eigen BLAS: ``` adb shell 'cd /data/local/tmp; \ ./speed_benchmark_torch \ --model=mm.pk \ --input_dims="1000,1000" \ --input_type=float \ --warmup=5 \ --iter=5' Milliseconds per iter: 2218.52. ``` - After integrating with Eigen BLAS: ``` adb shell 'cd /data/local/tmp; \ ./speed_benchmark_torch_eigen \ --model=mm.pk \ --input_dims="1000,1000" \ --input_type=float \ --warmup=5 \ --iter=5' Milliseconds per iter: 314.535. ``` - Improve MobileNetV2 single thread perf by ~5%: ``` adb shell 'cd /data/local/tmp; \ ./speed_benchmark_torch \ --model=mobilenetv2.pk \ --input_dims="1,3,224,224" \ --input_type=float \ --warmup=5 \ --iter=20 \ --print_output=false \ --caffe2_threadpool_force_inline=true' Milliseconds per iter: 367.055. adb shell 'cd /data/local/tmp; \ ./speed_benchmark_torch_eigen \ --model=mobilenetv2.pk \ --input_dims="1,3,224,224" \ --input_type=float \ --warmup=5 \ --iter=20 \ --print_output=false \ --caffe2_threadpool_force_inline=true' Milliseconds per iter: 348.77. ``` Differential Revision: D17489587 fbshipit-source-id: efe542db810a900f680da7ec7e60f215f58db66e	2019-09-20 15:45:11 -07:00
Jiakai Liu	9f4174c496	expose USE_STATIC_DISPATCH macro to public headers Summary: USE_STATIC_DISPATCH needs to be exposed as we don't hide header files containing it for iOS (yet). Otherwise it's error-prone to request all external projects to set the macro correctly on their own. Also remove redundant USE_STATIC_DISPATCH definition from other places. Test Plan: - build android gradle to confirm linker can still strip out dead code; - integrate with demo app to confirm inference can run without problem; Differential Revision: D17484260 Pulled By: ljk53 fbshipit-source-id: 653f597acb2583761b723eff8026d77518007533	2019-09-20 14:01:49 -07:00
Jiakai Liu	c3f881cdbc	add script to build mobile library with host toolchain (#26440 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26440 As we are optimizing build size for Android/iOS, it starts diverging from default build on several build options, e.g.: - USE_STATIC_DISPATCH=ON; - disable autograd; - disable protobuf; - no caffe2 ops; - no torch/csrc/api; ... Create this build_mobile.sh script to 'simulate' mobile build mode with host toolchain so that people who don't work on mobile regularly can debug Android/iOS CI error more easily. It might also be used to build libtorch on devices like raspberry pi natively. Test Plan: - run scripts/build_mobile.sh -DBUILD_BINARY=ON - run build_mobile/bin/speed_benchmark_torch on host machine Differential Revision: D17466580 Pulled By: ljk53 fbshipit-source-id: 7abb6b50335af5b71e58fb6d6f9c38eb74bd5781	2019-09-18 19:34:09 -07:00
Ashkan Aliabadi	dc851ab5d4	Integrate forked QNNPACK into mobile PyTorch builds. (#25844 ) Summary: Enable forked QNNPACK builds in PyTorch mobile. Pull Request resolved: https://github.com/pytorch/pytorch/pull/25844 Differential Revision: D17336458 Pulled By: AshkanAliabadi fbshipit-source-id: 6ea09dd6c114b64313e9159bf7f17253bc87bfdb	2019-09-16 20:50:43 -07:00
Jiakai Liu	6630c3f379	add NO_EXPORT macro to unset __visibility__ attribute (#25816 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/25816 On Android we will release a small set of native APIs designed for mobile use cases. All of the needed libtorch C++ APIs are called from inside this JNI bridge: android/pytorch_android/src/main/cpp/pytorch_jni.cpp With NO_EXPORT set for the Android static library build, it will hide all original TORCH, CAFFE2, TH/ATen APIs, which will allow the linker to strip out unused ones from the mobile library when producing a DSO. If people choose to directly build the libtorch DSO then it will still keep all C++ APIs, as the mobile API layer is not part of the libtorch build (yet). Test Plan: - build libtorch statically and link into demo app; - confirm that linker can strip out unused APIs; Differential Revision: D17247237 Pulled By: ljk53 fbshipit-source-id: de668216b5f2130da0d6988937f98770de571c7a	2019-09-10 10:20:21 -07:00
Jiakai Liu	8485710143	introduce INTERN_DISABLE_AUTOGRAD flag to create inference only library for mobile Summary: This is the first of a series of changes to reduce build size by cutting autograd functions from mobile build. When INTERN_DISABLE_AUTOGRAD is set: * On CMake side we exclude Functions.h/cpp, VariableType.h/cpp, VariableTypeManual.cpp from the build process. Still keep variable_factories.h as we rely on it to create variables instead of tensors. In source code we gate a couple of autograd references (in autograd/variable.cpp) with C10_MOBILE (technically we should use a dedicated C macro but its maintenance cost is higher than a cmake macro as we have several build systems to change). * Pass --disable-autograd flag to codegen script, which will stop generating Functions/VariableType code. And for variable_factories.h it will stop generating tracing code. Edit: in this diff we will keep Functions.h/cpp to avoid changing source code. Why do we need this change if it's already not calling VariableType and autograd stuff with USE_STATIC_DISPATCH=ON for mobile? It's trying to reduce static library size for iOS build, for which it's relatively harder to strip size with the linker approach. Why do we need to make an involved change to the codegen script? There isn't a global config system in codegen - autograd/env.py provides similar functionality but it says not to add anything there. Test Plan: - will check CI; - test mobile build in sample app; Differential Revision: D17202733 Pulled By: ljk53 fbshipit-source-id: 5701c6639b39ce58aba9bf5489a08d30d1dcd299	2019-09-10 10:20:17 -07:00
Jiakai Liu	67c530851c	get rid of protobuf dependencies (#25650 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/25650 This PR removes protobuf dependencies from mobile build altogether: - caffe2/proto: protobuf files, including caffe2.proto and torch.proto; - caffe2 components that depend on caffe2.proto, including most part of caffe2/core, caffe2/utils; - libprotobuf / libprotobuf-lite dependencies; - protobuf compiler; - some utils class, e.g.: netdef_converter.cpp; - introduce a macro to disable third_party/onnx which depends on protobuf; Test Plan: - builds; - link with demo app to make sure it can load and run a model in pickle format; Differential Revision: D17183548 Pulled By: ljk53 fbshipit-source-id: fe60b48674f29c4a9b58fd1cf8ece44191491531	2019-09-06 08:48:20 -07:00
Pieter Noordhuis	3e843115c0	Use whitelist instead of blacklist for USE_DISTRIBUTED (#25759 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/25759 In #25260, USE_DISTRIBUTED was defaulted to OFF for Windows and macOS only. The Android builds didn't run for the PR and started to fail when it was merged to master. It turns out the mobile builds explicitly disable USE_DISTRIBUTED but only after the USE_DISTRIBUTED option, and derivative dependent options were defined. The result being that USE_GLOO was enabled while USE_DISTRIBUTED was disabled. This commit ensures that USE_DISTRIBUTED defaults to OFF unless the build is for a supported platform. ghstack-source-id: 89613698 Test Plan: N/A Differential Revision: D17224842 fbshipit-source-id: 459039b79ad5240e81dfa3caf486858d6e77ba4b	2019-09-06 07:53:44 -07:00
Hong Xu	cc4211069e	Do not pass down USE_GLOO_IBVERBS to CMake (#25720 ) Summary: It doesn't seem to be used anywhere once down to CMake in this repo or any submodules Pull Request resolved: https://github.com/pytorch/pytorch/pull/25720 Differential Revision: D17225088 Pulled By: pietern fbshipit-source-id: a24b080e6346a203b345e2b834fe095e3b9aece0	2019-09-06 02:40:42 -07:00
Jiakai Liu	99b6472d6b	move USE_STATIC_DISPATCH from CI script to master cmake (#25696 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/25696 Move the flag from CI to CMake so it's less magic and can be reused by iOS build as well. Test Plan: - will check CI Differential Revision: D17202734 Pulled By: ljk53 fbshipit-source-id: da4f150cbcf2bb5624def386ce3699eff2a7446f	2019-09-05 15:14:13 -07:00
Pieter Noordhuis	3556bea5aa	Build torch.distributed with Gloo backend on macOS (#25260 ) Summary: In facebookincubator/gloo#212, a libuv based Gloo transport was introduced, which allows us to use Gloo on macOS (and later perhaps also Windows). This commit updates CMake code to enable building with USE_DISTRIBUTED=1 on macOS. A few notes: * The Caffe2 ops are not compiled, for they depend on `gloo::transport::tcp`. * The process group implementation uses `gloo::transport::tcp` on Linux (because of `epoll(2)` on Linux) and `gloo::transport::uv` on macOS. * The TCP store works but sometimes crashes on process termination. * The distributed tests are not yet run. * The nightly builds don't use `USE_DISTRIBUTED=1`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/25260 Reviewed By: mrshenli Differential Revision: D17202381 Pulled By: pietern fbshipit-source-id: ca80a82e78a05b4154271d2fb0ed31c8d9f26a7c	2019-09-05 07:09:50 -07:00
Pavel Belevich	738303ba43	Add set(CMAKE_SHARED_LINKER_FLAGS_RELEASE "-Wl,--no-as-needed") to CMakeLists.txt (#25445 ) Summary: This is a fix for a rare build issue on Ubuntu: `symbol lookup error: miniconda3/envs/pytorch-py3.7/lib/libmkl_intel_lp64.so: undefined symbol: mkl_blas_dsyrk` https://software.intel.com/en-us/articles/symbol-lookup-error-when-linking-intel-mkl-with-gcc-on-ubuntu Pull Request resolved: https://github.com/pytorch/pytorch/pull/25445 Differential Revision: D17151458 Pulled By: pbelevich fbshipit-source-id: a0f3e86a05ac408b95446560f42fc16fbff2d7af	2019-09-04 13:40:10 -07:00
Hong Xu	03f67e4b16	Remove BUILD_ATEN_ONLY build option (#24441 ) Summary: This build option no longer works. Close https://github.com/pytorch/pytorch/issues/21703 Pull Request resolved: https://github.com/pytorch/pytorch/pull/24441 Differential Revision: D17138131 Pulled By: ezyang fbshipit-source-id: 67adac990645a5df1f7c2e2dbef3689b2c30fcf8	2019-08-30 13:44:38 -07:00
Edward Yang	c56464d13e	Turn off warnings on Windows CI. (#24331 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/24331 Currently our logs are something like 40M a pop. Turning off warnings and turning on verbose makefiles (to see the compile commands) reduces this to more like 8M. We could probably reduce log size more but verbose makefile is really useful and we'll keep it turned on for Windows. Some findings: 1. Setting `CMAKE_VERBOSE_MAKEFILE` inside CMakeLists.txt itself as suggested in https://github.com/ninja-build/ninja/issues/900#issuecomment-417917630 does not work on Windows. Setting `-DCMAKE_VERBOSE_MAKEFILE=1` does work (and we respect this environment variable.) 2. The high (`/W3`) warning level that is on by default on MSVC is due to cmake inserting this in the default flags. On recent versions of cmake, CMP0092 can be used to disable this flag in the default set. The string replace trick sort of works, but the standard snippet you'll find on the internet won't disable the flag from nvcc. I inspected the CUDA cmake code and verified it does respect CMP0092. 3. `EHsc` is also in the default flags; this one cannot be suppressed via a policy. The string replace trick seems to work... 4. ... however, it seems nvcc implicitly inserts an `/EHs` after `-Xcompiler` specified flags, which means that if we add `/EHa` to our set of flags, you'll get a warning from nvcc. So we probably have to figure out how to exclude EHa from the nvcc flags set (EHs does seem to work fine.) 5. To suppress warnings in nvcc, you must BOTH pass `-w` and `-Xcompiler /w`. Individually these are not enough. The patch applies these things; it also fixes a bug where nvcc verbose command printing doesn't work with `-GNinja`. Signed-off-by: Edward Z. Yang <ezyang@fb.com> Test Plan: Imported from OSS Differential Revision: D17131746 Pulled By: ezyang fbshipit-source-id: fb142f8677072a5430664b28155373088f074c4b	2019-08-30 07:11:07 -07:00
Hong Xu	92750acb88	Move the detection of cuDNN to FindCUDNN.cmake (#24938 ) Summary: Currently they sit together with other code in cuda.cmake. This commit is the first step toward cleaning up cuDNN detection in our build system. Another attempt to https://github.com/pytorch/pytorch/issues/24293, which breaks manywheels build because it does not handle `USE_STATIC_CUDNN` properly. Pull Request resolved: https://github.com/pytorch/pytorch/pull/24938 Differential Revision: D17070920 Pulled By: ezyang fbshipit-source-id: a4d017a3505c102d9c435a73ae62332e4336c52e	2019-08-27 06:51:52 -07:00
Ivan Kobzarev	2cccad2c56	Turn off fbgemm for libtorch android build (#25113 ) Summary: https://github.com/pytorch/FBGEMM (USE_FBGEMM is ON by default for x86, x86_64) Build libtorch for android_abi x86_64 fails due to this. Turning it off for android builds Pull Request resolved: https://github.com/pytorch/pytorch/pull/25113 Reviewed By: dreiss Differential Revision: D16992459 Pulled By: IvanKobzarev fbshipit-source-id: 3cf35a67043288cb591cc3b23c261258c28cf304	2019-08-23 12:47:53 -07:00
Lucian Grijincu	9c9f14029d	Revert D16929363: Revert D16048264: Add static dispatch mode to reduce mobile code size Differential Revision: D16929363 Original commit changeset: 69d302929e18 fbshipit-source-id: add36a6047e4574788eb127c40f6166edeca705f	2019-08-20 17:08:31 -07:00
Lucian Grijincu	bd6cf5099b	Revert D16048264: Add static dispatch mode to reduce mobile code size Differential Revision: D16048264 Original commit changeset: ad1e50951273 fbshipit-source-id: 69d302929e183e2da26b64dcc24c69c3b7de186b	2019-08-20 16:26:18 -07:00
Edward Yang	907f5020c3	Revert D16914345: [pytorch][PR] Move the detection of cuDNN to FindCUDNN.cmake Differential Revision: D16914345 Original commit changeset: fd261478c01d fbshipit-source-id: b933ad7ed49028ab9ac6976c3ae768132dc9bacb	2019-08-20 14:23:12 -07:00
Roy Li	6824c9018d	Add static dispatch mode to reduce mobile code size Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22335 Test Plan: Imported from OSS Differential Revision: D16048264 Pulled By: li-roy fbshipit-source-id: ad1e50951273962a51bac7c25c3d2e5a588a730e	2019-08-20 12:21:32 -07:00
Hong Xu	6ce6939be9	Move the detection of cuDNN to FindCUDNN.cmake (#24784 ) Summary: Currently they sit together with other code in cuda.cmake. This commit is the first step toward cleaning up cuDNN detection in our build system. Another attempt to https://github.com/pytorch/pytorch/issues/24293, which breaks manywheels build because it does not handle `USE_STATIC_CUDNN`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/24784 Differential Revision: D16914345 Pulled By: ezyang fbshipit-source-id: fd261478c01d879dc770c1f1a56b17cc1a587be2	2019-08-20 01:55:46 -07:00
peterjc123	d9b4149e99	Fix cmake backslash syntax error on Windows. (#24420 ) Summary: ``` [1/1424] Building NVCC (Device) object caffe2/CMakeFiles/torch.dir/operators/torch_generated_weighted_sample_op.cu.obj CMake Warning (dev) at torch_generated_weighted_sample_op.cu.obj.Release.cmake:82 (set): Syntax error in cmake code at C:/Users/Ganzorig/pytorch/build/caffe2/CMakeFiles/torch.dir/operators/torch_generated_weighted_sample_op.cu.obj.Release.cmake:82 when parsing string C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v10.1/include;C:/Users/Ganzorig/pytorch/aten/src;C:/Users/Ganzorig/pytorch/build;C:/Users/Ganzorig/pytorch;C:/Users/Ganzorig/pytorch/cmake/../third_party/googletest/googlemock/include;C:/Users/Ganzorig/pytorch/cmake/../third_party/googletest/googletest/include;;C:/Users/Ganzorig/pytorch/third_party/protobuf/src;C:/Users/Ganzorig/pytorch/cmake/../third_party/benchmark/include;C:/Users/Ganzorig/pytorch/cmake/../third_party/eigen;C:/Users/Ganzorig/Anaconda3/envs/code/include;C:/Users/Ganzorig/Anaconda3/envs/code/lib/site-packages/numpy/core/include;C:/Users/Ganzorig/pytorch/cmake/../third_party/pybind11/include;C:/Users/Ganzorig/pytorch/cmake/../third_party/cub;C:/Users/Ganzorig/pytorch/build/caffe2/contrib/aten;C:/Users/Ganzorig/pytorch/third_party/onnx;C:/Users/Ganzorig/pytorch/build/third_party/onnx;C:/Users/Ganzorig/pytorch/third_party/foxi;C:/Users/Ganzorig/pytorch/build/third_party/foxi;C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v10.1/include;C:/Users/Ganzorig/pytorch/caffe2/../torch/csrc/api;C:/Users/Ganzorig/pytorch/caffe2/../torch/csrc/api/include;C:/Program Files/NVIDIA 
Corporation/NvToolsExt/include;C:/Users/Ganzorig/pytorch/caffe2/aten/src/TH;C:/Users/Ganzorig/pytorch/build/caffe2/aten/src/TH;C:/Users/Ganzorig/pytorch/caffe2/../torch/../aten/src;C:/Users/Ganzorig/pytorch/build/caffe2/aten/src;C:/Users/Ganzorig/pytorch/build/aten/src;C:/Users/Ganzorig/pytorch/caffe2/../torch/../aten/src;C:/Users/Ganzorig/pytorch/build/caffe2/../aten/src;C:/Users/Ganzorig/pytorch/build/caffe2/../aten/src/ATen;C:/Users/Ganzorig/pytorch/build/aten/src;C:/Users/Ganzorig/pytorch/caffe2/../torch/csrc;C:/Users/Ganzorig/pytorch/caffe2/../torch/../third_party/miniz-2.0.8;C:/Users/Ganzorig/pytorch/caffe2/../torch/csrc/api;C:/Users/Ganzorig/pytorch/caffe2/../torch/csrc/api/include;C:/Users/Ganzorig/pytorch/build/caffe2/aten/src/TH;C:/Users/Ganzorig/pytorch/aten/src/TH;C:/Users/Ganzorig/pytorch/aten/src;C:/Users/Ganzorig/pytorch/build/caffe2/aten/src;C:/Users/Ganzorig/pytorch/build/aten/src;C:/Users/Ganzorig/pytorch/aten/src;C:/Users/Ganzorig/pytorch/aten/../third_party/catch/single_include;C:/Users/Ganzorig/pytorch/aten/src/ATen/..;C:/Users/Ganzorig/pytorch/build/caffe2/aten/src/ATen;C:/Users/Ganzorig/pytorch/third_party/miniz-2.0.8;C:/Users/Ganzorig/pytorch/caffe2/core/nomnigraph/include;C:/Users/Ganzorig/pytorch/caffe2/;C:/Program Files/NVIDIA GPU Computing 
Toolkit/CUDA/v10.1/include;C:/Users/Ganzorig/pytorch/build/caffe2/aten/src/TH;C:/Users/Ganzorig/pytorch/aten/src/TH;C:/Users/Ganzorig/pytorch/build/caffe2/aten/src/THC;C:/Users/Ganzorig/pytorch/aten/src/THC;C:/Users/Ganzorig/pytorch/aten/src/THCUNN;C:/Users/Ganzorig/pytorch/aten/src/ATen/cuda;C:/Users/Ganzorig/pytorch/build/caffe2/aten/src/TH;C:/Users/Ganzorig/pytorch/aten/src/TH;C:/Users/Ganzorig/pytorch/aten/src;C:/Users/Ganzorig/pytorch/build/caffe2/aten/src;C:/Users/Ganzorig/pytorch/build/aten/src;C:/Users/Ganzorig/pytorch/aten/src;C:/Users/Ganzorig/pytorch/aten/../third_party/catch/single_include;C:/Users/Ganzorig/pytorch/aten/src/ATen/..;C:/Users/Ganzorig/pytorch/build/caffe2/aten/src/ATen;C:/Users/Ganzorig/pytorch/third_party/protobuf/src;C:/Users/Ganzorig/pytorch/c10/../;C:/Users/Ganzorig/pytorch/build;C:/Users/Ganzorig/pytorch/third_party/cpuinfo/include;C:/Users/Ganzorig/pytorch/third_party/FP16/include;C:/Users/Ganzorig/pytorch/third_party/foxi;C:/Users/Ganzorig/pytorch/third_party/foxi;C:/Users/Ganzorig/pytorch/third_party/onnx;C:/Users/Ganzorig/pytorch/build/third_party/onnx;C:/Users/Ganzorig/pytorch/build/third_party/onnx;C:/Users/Ganzorig/pytorch/c10/cuda/../..;C:/Users/Ganzorig/pytorch/build;C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v10.1/include;C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v10.1/include;C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v10.1/include;C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v10.1\include;C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v10.1/include Invalid escape sequence \i Policy CMP0010 is not set: Bad variable reference syntax is an error. Run "cmake --help-policy CMP0010" for policy details. Use the cmake_policy command to set the policy and suppress this warning. This warning is for project developers. Use -Wno-dev to suppress it. 
``` Compared to https://github.com/pytorch/pytorch/issues/24044 , this commit moves the fix up, and uses [bracket arguments](https://cmake.org/cmake/help/v3.12/manual/cmake-language.7.html#bracket-argument). PR also sent to upstream: https://gitlab.kitware.com/cmake/cmake/merge_requests/3679 Pull Request resolved: https://github.com/pytorch/pytorch/pull/24420 Differential Revision: D16914193 Pulled By: ezyang fbshipit-source-id: 9f897cf4f607502a16dbd1045f2aedcb49c38da7	2019-08-20 01:25:20 -07:00
peter	10d2ada17d	Fix Z7_MSVC_OVERRIDE for C source files (#24389 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/24145#issuecomment-521507234 Pull Request resolved: https://github.com/pytorch/pytorch/pull/24389 Differential Revision: D16828222 Pulled By: ezyang fbshipit-source-id: dcf652fbd8b8945c71993e9b99394e18ac542e6b	2019-08-15 06:52:42 -07:00
Hong Xu	61db8b64ec	Build option USE_NUMA should only show up on Linux. (#23673 ) Summary: (intentionally left blank) Pull Request resolved: https://github.com/pytorch/pytorch/pull/23673 Differential Revision: D16627453 Pulled By: vincentqb fbshipit-source-id: df62f1b26901bec6369b5589b98124165f40e6f1	2019-08-09 08:17:52 -07:00
Tao Xu	87a75bd605	remove ONNX & Turn on `NO_API` for mobile build (#23546 ) Summary: ### Summary The iOS build was broken after this PR 👉 [23195](https://github.com/pytorch/pytorch/pull/23195/files) was merged, as there are two files that still depend on ONNX. - `test.cpp` in `test/cpp/jit` - `export.cpp` in `torch/csrc/jit` This PR removes ONNX completely from the mobile build. Pull Request resolved: https://github.com/pytorch/pytorch/pull/23546 Test Plan: - The `build_ios.sh` finished successfully. - The `libtorch.a` can be compiled and run on iOS devices Differential Revision: D16558236 Pulled By: xta0 fbshipit-source-id: b7ff1db750698cfd5a72d5cb0b9f2f378e315077	2019-07-31 10:42:56 -07:00
Hong Xu	b335f3910f	Remove redundant MSVC_Z7_OVERRIDE processing and combine "/EHa" flag setup (#23455 ) Summary: - MSVC_Z7_OVERRIDE has already been handled in CMakeLists.txt. No need to process it once more in the Python scripts. - Option MSVC_Z7_OVERRIDE should be visible to the user only if MSVC is used. - Move the setting of "/EHa" flag to CMakeLists.txt, where other MSVC-specific flags are processed. This also further prepares the removal of redundant cflags setup in Python build scripts. Pull Request resolved: https://github.com/pytorch/pytorch/pull/23455 Differential Revision: D16542274 Pulled By: ezyang fbshipit-source-id: 4d3b8b07161478bbba8a21feb6ea24c9024e21ac	2019-07-29 08:08:47 -07:00
Hong Xu	f91b19c2aa	Do not explicitly set USE_FBGEMM in tools/setup_helpers/cmake.py (#23314 ) Summary: Instead, defer its default value to CMakeLists.txt NO_FBGEMM has already been handled in tools/setup_helpers/env.py (although deprecated) Pull Request resolved: https://github.com/pytorch/pytorch/pull/23314 Differential Revision: D16493580 Pulled By: ezyang fbshipit-source-id: 7255eb1df5e8a6dd0362507d68da0986a9ed46e2	2019-07-25 07:11:52 -07:00
Hong Xu	60c46dd4df	Let CMake handle NCCL detection instead of our handcrafted Python script. (#22930 ) Summary: --- How does the current code subsume all detections in the deleted `nccl.py`? - The dependency of `USE_NCCL` on the OS and `USE_CUDA` is handled as dependency options in `CMakeLists.txt`. - The main NCCL detection happens in [FindNCCL.cmake](`8377d4b32c/cmake/Modules/FindNCCL.cmake`), which is called by [nccl.cmake](`8377d4b32c/cmake/External/nccl.cmake`). When `USE_SYSTEM_NCCL` is false, the previous Python code deferred the detection to `find_package(NCCL)`. The change in `nccl.cmake` retains this. - `USE_STATIC_NCCL` in the previous Python code simply changes the name of the detected library. This is done in `IF (USE_STATIC_NCCL)`. - Now we only need to look at how the lines below line 20 in `nccl.cmake` are subsumed. These lines list paths to header and library directories that NCCL headers and libraries may reside in and try to search these directories for the key header and library files in turn. These are done by `find_path` for headers and `find_library` for the library files in `FindNCCL.cmake`. * The call of [find_path](https://cmake.org/cmake/help/v3.8/command/find_path.html) (Search for `NO_DEFAULT_PATH` in the link) by default searches for headers in `<prefix>/include` for each `<prefix>` in `CMAKE_PREFIX_PATH` and `CMAKE_SYSTEM_PREFIX_PATH`. Like the Python code, this commit sets `CMAKE_PREFIX_PATH` to search for `<prefix>` in `NCCL_ROOT_DIR` and home to CUDA. `CMAKE_SYSTEM_PREFIX_PATH` includes the standard directories such as `/usr/local` and `/usr`. `NCCL_INCLUDE_DIR` is also specifically handled. * Similarly, the call of [find_library](https://cmake.org/cmake/help/v3.8/command/find_library.html) (Search for `NO_DEFAULT_PATH` in the link) by default searches for libraries in directories including `<prefix>/lib` for each `<prefix>` in `CMAKE_PREFIX_PATH` and `CMAKE_SYSTEM_PREFIX_PATH`.
But it also handles the edge cases intended to be solved in the Python code more properly: - It only searches for `<prefix>/lib64` (and `<prefix>/lib32`) if it is appropriate on the system. - It only searches for `<prefix>/lib/<arch>` for the right `<arch>`, unlike the Python code, which searches for `lib/<arch>` in a generic way (e.g., the Python code searches for `/usr/lib/x86_64-linux-gnu` but in reality systems have `/usr/lib/x86_64-some-customized-name-linux-gnu`, see https://unix.stackexchange.com/a/226180/38242 ). --- Regarding relevant issues: - https://github.com/pytorch/pytorch/issues/12063 and https://github.com/pytorch/pytorch/issues/2877: These are properly handled, as explained in the updated comment. - https://github.com/pytorch/pytorch/issues/2941 did not change NCCL detection specifically for Windows (it changed CUDA detection). - `b7e258f81e` A versioned library detection is added, but the order is reversed: The unversioned library becomes preferred. This is because normally unversioned libraries are linked to versioned libraries and preferred by users, and local installations by users are often unversioned. As the documentation of [find_library](https://cmake.org/cmake/help/v3.8/command/find_library.html) suggests: > When using this to specify names with and without a version suffix, we recommend specifying the unversioned name first so that locally-built packages can be found before those provided by distributions. Pull Request resolved: https://github.com/pytorch/pytorch/pull/22930 Differential Revision: D16440275 Pulled By: ezyang fbshipit-source-id: 11fe80743d4fe89b1ed6f96d5d996496e8ec01aa	2019-07-23 08:45:51 -07:00
Jesse Hellemn	e08f8f45ff	Turning on fbgemm for nightlies (#22784 ) Summary: fbgemm requires AVX512, which requires a more recent compiler, so this also switches all the nightlies from devtoolset3 to devtoolset7. Since CUDA 9.0 doesn't support devtoolset7, we also switch from CUDA 9.0 to CUDA 9.2 Pull Request resolved: https://github.com/pytorch/pytorch/pull/22784 Differential Revision: D16428165 Pulled By: pjh5 fbshipit-source-id: c1af3729d8edce88a96fa9069d4c5a1808c25f99	2019-07-22 15:09:11 -07:00
Edward Yang	798d5d9771	Revert D16281714: Add sanity checks for NCCL detection. Differential Revision: D16281714 Original commit changeset: 396bcbf099bd fbshipit-source-id: a22cc112d1b6a62d689f9d8a7f93e8be3abe2a44	2019-07-16 13:58:27 -07:00
Hong Xu	e2046f8c1d	Add sanity checks for NCCL detection. Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22819 Test Plan: Imported from OSS Differential Revision: D16281714 Pulled By: ezyang fbshipit-source-id: 396bcbf099bd07b996cf779c6b43092096b52d90	2019-07-16 11:32:32 -07:00
Supriya Rao	c97829d701	Adding FC and Relu QNNPACK ops to C10 registry (#22174 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22174 This is a preliminary change outlining the approach we plan to follow to integrate QNNPACK operators into the pytorch backend. The operators will not be made visible to the user in the python world, so ultimately we will have a function that calls qnnpack backend based on the environment being run on. The goal of the project is to integrate QNNPACK library with PyTorch to achieve good performance for quantized mobile models. Reviewed By: ljk53 Differential Revision: D15806325 fbshipit-source-id: c14e1d864ac94570333a7b14031ea231d095c2ae	2019-07-08 14:21:42 -07:00
Hui Wu	07ef85e326	Add USE_MKLDNN_CBLAS build option. (#19014 ) Summary: MKL-DNN is the main library for computation when we use ideep device. It can use kernels implemented by different algorithms (including JIT, CBLAS, etc.) for computation. We add the "USE_MKLDNN_CBLAS" (default OFF) build option so that users can decide whether to use CBLAS computation methods or not. Pull Request resolved: https://github.com/pytorch/pytorch/pull/19014 Differential Revision: D16094090 Pulled By: ezyang fbshipit-source-id: 3f0b1d1a59a327ea0d1456e2752f2edd78d96ccc	2019-07-02 12:29:54 -07:00
Hong Xu	693871ded3	Rename macros and build options NAMEDTENSOR_ENABLED to BUILD_NAMEDTENSOR (#22360 ) Summary: Currently the build system accepts USE_NAMEDTENSOR from the environment variable and turns it into NAMEDTENSOR_ENABLED when passing to CMake. This discrepancy does not seem necessary and complicates the build system. The naming of this build option is also semantically incorrect ("BUILD_" vis-a-vis "USE_"). This commit eradicates this issue before it is made into a stable release. The support of NO_NAMEDTENSOR is also removed, since PyTorch has been quite inconsistent about "NO_*" build options. --- Note: All environment variables with their names starting with `BUILD_` are currently automatically passed to CMake with no need of an additional wrapper. Pull Request resolved: https://github.com/pytorch/pytorch/pull/22360 Differential Revision: D16074509 Pulled By: zou3519 fbshipit-source-id: dc316287e26192118f3c99b945454bc50535b2ae	2019-07-02 11:46:13 -07:00
Hong Xu	bf677b8849	Set MKLDNN (default) build variables in CMakeLists.txt, not in Python build scripts (#22215 ) Summary: This is yet another step to disentangle Python build scripts and CMake and improve their integration (Let CMake handle more build environment detections, and less by our handcrafted Python scripts). The processor detection logic also changed a bit: Instead of detecting whether the system processor is PPC or ARM, this PR changes to detect Intel CPUs, because this is more precise as MKL only supports Intel CPUs. The build option `USE_MKLDNN` will also not be presented to users on non-Intel processors. Pull Request resolved: https://github.com/pytorch/pytorch/pull/22215 Differential Revision: D16005953 Pulled By: ezyang fbshipit-source-id: bf3f74d53609b3f835e280f63a872ff3c9352763	2019-06-27 10:21:55 -07:00
Hong Xu	cd0d8480d3	Remove many build options redundantly specified in Python build scripts. (#21877 ) Summary: Currently many build options are explicitly passed from Python build scripts to CMake. But this is unnecessary, at least for many of them. This commit removes the build options that have the same name in CMakeLists.txt and environment variables (e.g., `USE_REDIS`). Additionally, many build options that are not explicitly passed to CMake are lost. For `ONNX_ML`, `ONNX_NAMESPACE`, and `BUILDING_WITH_TORCH_LIBS`, I changed their default values in CMake scripts (as consistent with what the `CMake.defines` call meant), to avoid their default values being redundantly set in the Python build scripts. Pull Request resolved: https://github.com/pytorch/pytorch/pull/21877 Differential Revision: D15964996 Pulled By: ezyang fbshipit-source-id: 127a46af7e2964885ffddce24e1a62995e0c5007	2019-06-24 07:17:54 -07:00
Edward Yang	c36dc35853	Revert D15576968: Turn on Werror for deprecated-declarations. Differential Revision: D15576968 Original commit changeset: fb73a8986a5b fbshipit-source-id: 1ae19afc6816f764b895a47162728433a319ac0b	2019-06-07 19:15:56 -07:00
Edward Yang	66d596645a	Turn on Werror for deprecated-declarations. (#21195 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21195 The motivation is that, while we shouldn't break USER code for using deprecated declarations, we should keep our internal code base deprecation clean. Differential Revision: D15576968 fbshipit-source-id: fb73a8986a5b60bf49ee18260653100319bb1030	2019-06-07 17:24:17 -07:00
Jiakai Liu	702ba3d2fb	build torch for libtorch mobile build (#21234 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21234 ghimport-source-id: 8d401691a811991c79acf5e09e60389910910365 Differential Revision: D15616540 Pulled By: ljk53 fbshipit-source-id: 150e706630911bf14c55f47f4058eaada1edf1cc	2019-06-03 19:51:05 -07:00
Ilia Cherniavskii	580eab6562	Restore TBB module (#20454 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20454 ghimport-source-id: 14aca1dedbe647d41e55e7538a6b7eeab0fc4384 Differential Revision: D15326062 Pulled By: ilia-cher fbshipit-source-id: 02b005a679b10dc7a264978e87a8d2bb98ab972f	2019-05-28 02:49:36 -07:00
peter	f4b434a6a5	Fix incorrect torch version in CMake (#21007 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/20525 Pull Request resolved: https://github.com/pytorch/pytorch/pull/21007 Differential Revision: D15515260 Pulled By: soumith fbshipit-source-id: 149084cce276c5e76ca0c5c0872c5e990c47bdfd	2019-05-27 23:46:49 -07:00
Hong Xu	6af2482612	Leave it as an option for whether to colorize output during build (#20771 ) Summary: Currently PyTorch forces color output due to #20662. But users should be left an option to turn it off because redirection of the output to a file would be messed up if color output is forced. Pull Request resolved: https://github.com/pytorch/pytorch/pull/20771 Differential Revision: D15495677 Pulled By: ezyang fbshipit-source-id: 9d89bbed40d0b67368554305394763a54c5ff6f5	2019-05-24 09:22:52 -07:00
Michael Suo	d7cd2d7a8c	compile with -fcolor-diagnostics (#20662 ) Summary: Let there be color! Pull Request resolved: https://github.com/pytorch/pytorch/pull/20662 Differential Revision: D15434110 Pulled By: suo fbshipit-source-id: a317ae72ad72e0b8249f55c9c8d31f420c78c040	2019-05-21 10:32:55 -07:00
peter	3bc0bd9534	Fix caffe2 build failure on Windows (#20574 ) Summary: Fixes #20568. Looks like CMake is passing `/MD` when we call `add_library`. We need to fix these with C source files too. Pull Request resolved: https://github.com/pytorch/pytorch/pull/20574 Differential Revision: D15392682 Pulled By: ezyang fbshipit-source-id: c92034d8725fcec48fd7db6cf5322868e956dc6b	2019-05-17 07:21:42 -07:00
Jesse Hellemn	5821a76b8e	Forcing gcc ABI and safer bash scripts, v2 (#20540 ) Summary: First time this was merged it broke master and was reverted. This time I do not add ```set -u``` to the .circleci/scripts/setup* scripts. There's still a chance that ```set -u``` breaks the binary builds on master, but at least those can be fixed in parallel and don't completely eliminate signal from all merges. Pull Request resolved: https://github.com/pytorch/pytorch/pull/20540 Differential Revision: D15373444 Pulled By: pjh5 fbshipit-source-id: 0203c20865827366ecd8fa07b2db74d255549ed1	2019-05-16 09:40:01 -07:00
Bram Wasti	8e26759f14	Back out "[pytorch][PR] Manually set _GLIBCXX_USE_CXX11_ABI in devtoolset7 binary builds" Summary: Original commit changeset: 571bba8a93ea Reviewed By: pjh5 Differential Revision: D15349783 fbshipit-source-id: 75c3e2b9b97e0ac0e8bcdef93e53b0d475c6fa38	2019-05-15 00:02:55 -07:00
Jesse Hellemn	ea38fbfc5c	Manually set _GLIBCXX_USE_CXX11_ABI in devtoolset7 binary builds (#20243 ) Summary: Fix for https://github.com/pytorch/pytorch/issues/17492 Pull Request resolved: https://github.com/pytorch/pytorch/pull/20243 Differential Revision: D15348101 Pulled By: pjh5 fbshipit-source-id: 571bba8a93eaa9806db3f3d38697c26b5285da7a	2019-05-14 18:02:42 -07:00
Karl Ostmo	4ba28deb6e	Unify libtorch and libcaffe2 (#17783 ) Summary: This PR is an intermediate step toward the ultimate goal of eliminating "caffe2" in favor of "torch". This PR moves all of the files that had constituted "libtorch.so" into the "libcaffe2.so" library, and wraps "libcaffe2.so" with a shell library named "libtorch.so". This means that, for now, `caffe2/CMakeLists.txt` becomes a lot bigger, and `torch/CMakeLists.txt` becomes smaller. The torch Python bindings (`torch_python.so`) still remain in `torch/CMakeLists.txt`. The follow-up to this PR will rename references to `caffe2` to `torch`, and flatten the shell into one library. Pull Request resolved: https://github.com/pytorch/pytorch/pull/17783 Differential Revision: D15284178 Pulled By: kostmo fbshipit-source-id: a08387d735ae20652527ced4e69fd75b8ff88b05	2019-05-10 09:50:53 -07:00
Richard Zou	e01a5bf28b	Add USE_NAMEDTENSOR compilation flag. (#20162 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20162 ghimport-source-id: 0efcd67f04aa087e1dd5faeee550daa2f13ef1a5 Reviewed By: gchanan Differential Revision: D15278211 Pulled By: zou3519 fbshipit-source-id: 6fee981915d83e820fe8b50a8f59da22a428a9bf	2019-05-09 09:09:16 -07:00
peter	1e35ef07e9	Switch off USE_DISTRIBUTED on default for MSVC (#20302 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/20250 Pull Request resolved: https://github.com/pytorch/pytorch/pull/20302 Differential Revision: D15277733 Pulled By: ezyang fbshipit-source-id: a8915939d033a04f962908d19bbad840b6234e27	2019-05-09 06:29:31 -07:00
Jiakai Liu	c7c02724cd	CMakeLists changes to enable libtorch for Android (#19762 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/19762 ghimport-source-id: 287aa7fea4efd38994e14d794123eb2046b91fc0 Differential Revision: D15087653 Pulled By: ljk53 fbshipit-source-id: 4498ff9f7f7903c3e25541184302b811267958e9	2019-05-03 09:28:53 -07:00
Jiakai Liu	8cd6d2f101	rename BUILD_ATEN_MOBILE to INTERN_BUILD_MOBILE and make it private (#19942 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/19942 ghimport-source-id: 6bacc8f5ad7911af8cf5fde9fcb604ade666b862 Reviewed By: dzhulgakov Differential Revision: D15144325 Pulled By: ljk53 fbshipit-source-id: d63a70f007110d5d1055d6bec1ed09a1a6aafdae	2019-05-01 00:20:24 -07:00
James Reed	1b3967b491	-fno-math-errno -fno-trapping-math (#19552 ) Summary: As suggested in https://github.com/pytorch/pytorch/pull/19152#discussion_r275925767, this may give the compiler more opportunities for auto-vectorization Pull Request resolved: https://github.com/pytorch/pytorch/pull/19552 Differential Revision: D15048358 Pulled By: jamesr66a fbshipit-source-id: db2c2c515c3e9f7d22305c039ab0c8a867fc43a2	2019-04-23 11:06:49 -07:00
peter	5e33085f27	Make it possible for users to select /Zi or /ZI over /Z7 when using MSVC (#18790 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/18701. Pull Request resolved: https://github.com/pytorch/pytorch/pull/18790 Differential Revision: D14748195 Pulled By: ezyang fbshipit-source-id: e50df1b5ca199a88d7b5ea3ea45d25d23cd31a27	2019-04-03 08:24:52 -07:00
Sacha	8276d82f78	Move flags that do not work on MSVC (#18686 ) Summary: MSVC errors on these flags as they are not supported Pull Request resolved: https://github.com/pytorch/pytorch/pull/18686 Differential Revision: D14704254 Pulled By: ezyang fbshipit-source-id: 936d33ed6b7474d7774a49505cdac50dbe8dd99a	2019-04-01 07:28:05 -07:00
Junjie Bai	0fe6e8c870	Remove ComputeLibrary submodule Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18052 Reviewed By: ezyang Differential Revision: D14477355 fbshipit-source-id: c56b802f6d69701596c327cf9af6782f30e335fa	2019-03-16 09:06:42 -07:00
Thomas Viehmann	7e34bd230b	Disable FBGEMM when building under x86 32bit (#17922 ) Summary: FBGEMM doesn't work on x86 32bit and prior to this patch, it will generate x86_64 objects in a build that is supposed to be x86 32bit. FBGEMM actually relies on registers not available on x86_32, so we disable it. This takes care of one element of #17901. There are more dependencies and a separate PR (#17915) regarding AVX detection for the code in the main repository. Pull Request resolved: https://github.com/pytorch/pytorch/pull/17922 Differential Revision: D14437340 Pulled By: soumith fbshipit-source-id: bd9fc98cf607d9b0bc28127fbbc8b04fa10eecbe	2019-03-13 03:46:50 -07:00
Soumith Chintala	d1b2ab83fc	disable default system-wide detection of gflags, glog, opencv, lmdb, leveldb (#16789 ) Summary: They can instead be enabled by env flags USE_* (as always). Pull Request resolved: https://github.com/pytorch/pytorch/pull/16789 Differential Revision: D13971789 Pulled By: soumith fbshipit-source-id: d5eac9be677114be3fb15b43080faa0efdfff8ee	2019-02-06 05:13:47 -08:00
JerryShih	73db487a8e	Update the cmake build configuration for AppleClang compiler (#15820 ) Summary: This PR tries to merge https://github.com/pytorch/pytorch/pull/11563 again and fixes the linking error in https://github.com/pytorch/pytorch/pull/14837. Pull Request resolved: https://github.com/pytorch/pytorch/pull/15820 Differential Revision: D13942024 Pulled By: ezyang fbshipit-source-id: dc6d1e9c4b0f177914f3745665244272a03ce33c	2019-02-04 08:53:47 -08:00
Owen Anderson	f204e3e624	Pass WERROR to CMake as an explicit parameter rather than an env var. Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16465 Differential Revision: D13853949 Pulled By: resistor fbshipit-source-id: 71ccf90a2824ad21c9f26dd753b186f30435d82a	2019-01-28 20:57:18 -08:00
peter	f7733526aa	Generate PDB files for better debugging on Windows (#16008 ) Summary: 1. Unify `build_pytorch_libs.bat`, `setup.py` and `torch/CMakeLists.txt` on the debugging flags with the `CMAKE_BUILD_TYPE` being `Debug`, `Release` and `RelWithDebInfo`. 2. Install PDBs through CMake if they are generated. Reference: 1. CMake PDB install: https://gitlab.kitware.com/cmake/cmake/issues/18393#note_459199 2. About debugging flags https://stackoverflow.com/a/4662345 3. MSDN page about /DEBUG flag: https://docs.microsoft.com/en-us/cpp/build/reference/debug-generate-debug-info?view=vs-2017 4. MSDN page about /Z{i/I/7}: https://docs.microsoft.com/en-us/cpp/build/reference/z7-zi-zi-debug-information-format?view=vs-2017 Work to do: - [x] Test the changes work in Release config through this PR - [ ] <del> Test debug build through https://github.com/pytorch/pytorch/pull/16009 </del> - [x] Test release build with debugging symbols through #16013 Difficulties: - [x] Replace /Zi flags with /Z7 (which will be added if DEBUG or RelWithDebInfo is used), as it is not supported by sccache - [x] Resolve `LINK : fatal error LNK1210: exceeded internal ILK size limit; link with /INCREMENTAL:NO` in the debug build - [ ] DEBUG build blocked by a MSVC bug. In order to resolve it, we'll need to update the MSVC in CI: https://developercommunity.visualstudio.com/content/problem/225957/fatal-error-lnk1318-unexpected-pdb-error-ok-0.html Pull Request resolved: https://github.com/pytorch/pytorch/pull/16008 Differential Revision: D13709527 Pulled By: ezyang fbshipit-source-id: e8365bc75d9ec64099093f7001f83d99a06b196b	2019-01-16 23:34:32 -08:00
James Reed	acbd9c49b0	Direct FBGEMM integration into ATen (#13777 ) Summary: This PR implements infrastructure for post-processing a model to apply int8 quantization to its `nn.Linear` modules. Highlights of the implementation: 1) Inputs and outputs are `float` (quantized and packed internally), but the weight is quantized and packed ahead of time for efficiency. This implementation performs well in small-batch size GEMM calls. It should not be considered a general-purpose quantized GEMM kernel. 2) Weight packing is dependent on machine architecture (e.g. vector register width), so it is done just-in-time. Concretely, it is done on model load for the weights and it is done during operator execution for the input value. 3) Biases are unquantized. 4) We fail loudly if we are attempting to run this on a machine that does not support FBGEMM. This is because we do not want a model's numerics to differ based on which machine it is run on. A model containing these FBGEMM ops must be run with FBGEMM. The API can be seen in the added test case. Highlights are: 1) `torch.jit.quantized.quantize_linear_modules` walks the module hierarchy of the passed-in Module and replaces all `nn.Linear` modules with a new `QuantizedLinear` module, which encapsulates the behavior described above. 2) `_pack()` and `_unpack()` script methods are present on `QuantizedLinear` modules. These methods should be called before serialization and after deserialization, respectively. This ensures that the weight matrix is properly packed for the running machine's architecture. Note that in the long term, we would like to move toward a more Pickle-style serialization technique, rather than having these explicit methods that mutate member values. This is blocked on being able to assign attributes in a ScriptMethod, among other things. Pull Request resolved: https://github.com/pytorch/pytorch/pull/13777 Differential Revision: D13383276 Pulled By: jamesr66a fbshipit-source-id: 00f29c9f34544add2b90107e3cf55a287802c344	2018-12-21 10:35:51 -08:00
Jerry Zhang	12cf5178aa	caffe2 mobile opengl (#15322 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15322 caffe2 mobile opengl code is not used, deleting it to reduce complications when we perform other changes Reviewed By: Maratyszcza Differential Revision: D13499943 fbshipit-source-id: 6479f6b9f50f08b5ae28f8f0bc4a1c4fc3f3c3c2	2018-12-18 08:20:52 -08:00
Jesse Hellemn	efb37e86eb	Removing BUILD_C10_EXPERIMENTAL_OPS option and unglobbing experimental/c10d ops Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15064 Reviewed By: orionr Differential Revision: D13474801 Pulled By: pjh5 fbshipit-source-id: 9d3664c3a3a1b6c2d9f083f8476fe3b037296b98	2018-12-17 15:35:41 -08:00
Orion Reblitz-Richardson	687834dcb4	Install cpp tests when built (#15000 ) Summary: This is broken out of https://github.com/pytorch/pytorch/pull/13733/ We want to install cpp tests so they can ultimately be runnable from that location for Caffe2 tests run from PyTorch builds. cc pjh5 yf225 anderspapitto Pull Request resolved: https://github.com/pytorch/pytorch/pull/15000 Reviewed By: pjh5 Differential Revision: D13416253 Pulled By: orionr fbshipit-source-id: 51280be0a22557a742f90c9f303c58c35cbd4a38	2018-12-11 10:07:48 -08:00
Your Name	cf059028f0	Do not load ROCm cmake files if USE_ROCM is off (#14261 ) Summary: Previously it unconditionally tried to load rocm cmake files, so there was no way to disable rocm build. After this change, USE_ROCM=0 will disable rocm build. Should fix #14025 soumith Pull Request resolved: https://github.com/pytorch/pytorch/pull/14261 Differential Revision: D13242090 Pulled By: bddppq fbshipit-source-id: 652ec7d49dce9b357778bfa53a8e04b7079787ab	2018-11-29 11:17:19 -08:00
ArutyunovG	8e91da4cb3	Windows shared build (#13550 ) Summary: Hi guys, I'd like to build Caffe2 with more supported options in Windows with Microsoft Visual Studio. This is the first pull request. Running scripts/build_windows_shared.bat is able to build Caffe2 with both CMAKE_BUILD_TYPE=Debug and CMAKE_BUILD_TYPE=Release with Visual Studio 14 2015. CUDA is 9.0, cudnn is 7.0.5, glog, gflags and lmdb are supported on my system. Python is 3.5, Detectron works from python interface as well. It was even possible to debug detectron code and step into caffe2_gpu.dll with pdbs built. What is disappointing is that c10/experimental ops don't build with this Visual Studio generator; I added a special option INCLUDE_EXPERIMENTAL_C10_OPS (default ON) to deal with it in build_windows_shared.bat. After this pull request the next step is to add Visual Studio 2017 support in the script. Pull Request resolved: https://github.com/pytorch/pytorch/pull/13550 Reviewed By: ezyang Differential Revision: D13042597 Pulled By: orionr fbshipit-source-id: f313f909f599cd582a1d000eff766eef3a9fc4fc	2018-11-16 12:16:28 -08:00
Daya S Khudia	f66cb02016	Turn fbgemm off by default for pytorch (#14048 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14048 Setting USE_FBGEMM to OFF by default until we figure out properly separating avx2 code. See [this issue](https://github.com/pytorch/pytorch/issues/13993). Pytorch can still be compiled with fbgemm by using USE_FBGEMM=ON. Reviewed By: jspark1105 Differential Revision: D13090454 fbshipit-source-id: 6e0e92612e4362a306e376df3dc33e8edeb066e9	2018-11-15 18:42:16 -08:00
Gu, Jinghui	d01cb70497	build with mkl-dnn by default (#13303 ) Summary: build with mkl-dnn by default Pull Request resolved: https://github.com/pytorch/pytorch/pull/13303 Reviewed By: yinghai Differential Revision: D12979633 Pulled By: orionr fbshipit-source-id: 00d23fa27c0d13e82f7e5acb3ebd00ed7ba1d5dc	2018-11-08 11:18:27 -08:00
Daya S Khudia	18de330e86	CMake integration for int8 server operators Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13558 Reviewed By: Maratyszcza Differential Revision: D12945460 Pulled By: dskhudia fbshipit-source-id: 1a91027b305fd6af77eebd9a4fad092a12f54712	2018-11-06 15:45:15 -08:00
Gu, Jinghui	dbab9b73b6	separate mkl, mklml, and mkldnn (#12170 ) Summary: 1. Remove avx2 support in mkldnn 2. Separate mkl, mklml, and mkldnn 3. Fix convfusion test case Pull Request resolved: https://github.com/pytorch/pytorch/pull/12170 Reviewed By: yinghai Differential Revision: D10207126 Pulled By: orionr fbshipit-source-id: 1e62eb47943f426a89d57e2d2606439f2b04fd51	2018-10-29 10:52:55 -07:00
Marat Dukhan	5e73b828bd	CMake integration for Int8 ops Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13145 Differential Revision: D10860849 Pulled By: Maratyszcza fbshipit-source-id: fdbcc23ff9beaeaedfd561176df6cfe87685c1f5	2018-10-25 22:25:10 -07:00
mratsim	a1bbe80e21	Remove NervanaGPU operators from Caffe2 (#12564 ) Summary: Fix #12540 Pull Request resolved: https://github.com/pytorch/pytorch/pull/12564 Reviewed By: orionr Differential Revision: D10379775 Pulled By: soumith fbshipit-source-id: a925b116f2687e56bf54465fc02ca2eb1e7c8eb0	2018-10-15 11:04:46 -07:00
Giovanni	0d50c117db	Introduce BUILD_ATEN_ONLY cmake option (#12443 ) Summary: Following up #11488 conversation with orionr And our brief conversation at PTDC about ATen with soumith and apaszke This PR enables a very slim build focused on ATen particularly without caffe2 and protobuf among other dependencies. With this PR NimTorch tests pass fully, including AD, convolutions, wasm, etc. Pull Request resolved: https://github.com/pytorch/pytorch/pull/12443 Reviewed By: mingzhe09088 Differential Revision: D10249313 Pulled By: orionr fbshipit-source-id: 4f50503f08b79f59e7717fca2b4a1f420d908707	2018-10-10 12:54:19 -07:00
Anders Papitto	57fcc57f31	set CMAKE_INSTALL_MESSAGE to NEVER (#12392 ) Summary: this removes a bunch of spam output from the build. This is (1) cleaner (2) a couple seconds faster in some cases, e.g. my slow-rendering emacs-based shell Pull Request resolved: https://github.com/pytorch/pytorch/pull/12392 Differential Revision: D10225340 Pulled By: anderspapitto fbshipit-source-id: 477ee76d24f8db50084b1e261db8c22733de923b	2018-10-05 15:57:44 -07:00
vishwakftw	39bd73ae51	Guard NumPy usage using USE_NUMPY (#11798 ) Summary: All usages of the `ndarray` construct have now been guarded with `USE_NUMPY`. This eliminates the requirement of NumPy while building PyTorch from source. Fixes #11757 Reviewed By: Yangqing Differential Revision: D10031862 Pulled By: SsnL fbshipit-source-id: 32d84fd770a7714d544e2ca1895a3d7c75b3d712	2018-10-04 12:11:02 -07:00
Peter Goldsborough	bcc2a0599b	Enable clang-tidy in CI (#12213 ) Summary: At long last, we will have clang-tidy enabled in CI. For a while I thought I could clean up the project enough to enable clang-tidy with all checks enabled, but I figure it's smarter to set up the minimal checks and at least have those in CI. We can fix more going forward. ezyang apaszke Pull Request resolved: https://github.com/pytorch/pytorch/pull/12213 Differential Revision: D10183069 Pulled By: goldsborough fbshipit-source-id: 7ecd2d368258f46efe23a2449c0a206d10f3a769	2018-10-03 17:25:06 -07:00
Orion Reblitz-Richardson	02d7c88fa4	Unify versions across setup.py, libtorch, and libcaffe2 (#12053 ) Summary: This unifies our versions across setup.py, libtorch, and libcaffe2. CMake has a default version (bumped to 1.0.0) that can be overridden by setup.py. The versions are also printed as a part of cmake/Summary.cmake to make sure they are correct. cc Yangqing ezyang soumith goldsborough pjh5 Pull Request resolved: https://github.com/pytorch/pytorch/pull/12053 Differential Revision: D10041878 Pulled By: orionr fbshipit-source-id: a98a01771f6c008d1016ab63ab785c3a88c3ddb0	2018-09-26 08:55:06 -07:00
Yangqing Jia	a6f1ae7f20	set up c10 scaffolding. Move macros proper first. Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11939 Reviewed By: orionr, dzhulgakov Differential Revision: D10004629 Pulled By: Yangqing fbshipit-source-id: ba50a96820d35c7922d81c78c4cbe849c85c251c	2018-09-24 11:09:59 -07:00
Peter Goldsborough	d712a71741	Protobuf serialization (#11619 ) Summary: This PR serves two purposes: 1. Design an abstraction over a serialization scheme for C++ modules, optimizers and tensors in general, 2. Add serialization to the ONNX/PyTorch proto format. This is currently a rough prototype I coded up today, to get quick feedback. For this I propose the following serialization interface within the C++ API: ```cpp namespace torch { namespace serialize { class Reader { public: virtual ~Reader() = default; virtual void read(const std::string& key, Tensor& tensor, bool is_buffer = false) = 0; virtual void finish() { } }; class Writer { public: virtual ~Writer() = default; virtual void write(const std::string& key, const Tensor& tensor, bool is_buffer = false) = 0; virtual void finish() { } }; }} // namespace torch::serialize ``` There are then subclasses of these two for (1) Cereal and (2) Protobuf (called the "DefaultWriter" and "DefaultReader" to hide the implementation details). See `torch/serialize/cereal.h` and `torch/serialize/default.h`. This abstraction and subclassing for these two allows us to: 1. Provide a cereal-less serialization forward that we can ship and iterate on going forward, 2. Provide no-friction backwards compatibility with existing C++ API uses, mainly StarCraft. The user-facing API is (conceptually): ```cpp void torch::save(const Module& module, Writer& writer); void torch::save(const Optimizer& optimizer, Writer& writer); void torch::read(Module& module, Reader& reader); void torch::read(Optimizer& optimizer, Reader& reader); ``` with implementations for both optimizers and modules that write into the `Writer` and read from the `Reader` ebetica ezyang zdevito dzhulgakov Pull Request resolved: https://github.com/pytorch/pytorch/pull/11619 Differential Revision: D9984664 Pulled By: goldsborough fbshipit-source-id: e03afaa646221546e7f93bb8dfe3558e384a5847	2018-09-20 20:39:34 -07:00
Peter Goldsborough	130d55a5f4	Allow building the C++ API without cereal (#11498 ) Summary: I am working on unifying the C++ extensions and C++ API, and one constraint for this is that we will want to be able to build the C++ API without cereal, since we won't want to ship it with the Python `torch` package. For this I introduce a `TORCH_WITH_CEREAL` option to CMake. If on, the C++ API will be built with cereal and thus serialization support. If off, serialization functions will throw exceptions, but the library will otherwise still compile the same. __This option is on by default, so for regular C++ API users nothing will change__. However, from C++ extensions, we'll be able to turn it off. This effectively means we won't be searching for any cereal headers from C++ API headers, which wouldn't be installed in the Python package. ebetica ezyang soumith Pull Request resolved: https://github.com/pytorch/pytorch/pull/11498 Differential Revision: D9784803 Pulled By: goldsborough fbshipit-source-id: 5d0a1f2501993012d28cf3d730f45932b483abc4	2018-09-12 16:56:07 -07:00
Yangqing Jia	35d52dbb0e	re-enable USE_MPI (#11416 ) Summary: The previous error was caused by mpi_test not depending on MPI_CXX_LIBRARIES. This might solve the problem. Not tested locally - waiting for CI test. Pull Request resolved: https://github.com/pytorch/pytorch/pull/11416 Reviewed By: mingzhe09088 Differential Revision: D9771694 Pulled By: Yangqing fbshipit-source-id: 53e7b4f64eadc88313bc4dd9b8e3f7931cda6e91	2018-09-11 18:26:12 -07:00
Orion Reblitz-Richardson	a175282776	Flags for LMDB, LevelDB, and Caffe2 ops (#11462 ) Summary: Add flags for LMDB and LevelDB, default `OFF`. These can be enabled with ``` USE_LMDB=1 USE_LEVELDB=1 python setup.py build_deps ``` Also add a flag to build Caffe2 ops, which is default `ON`. Disable with ``` NO_CAFFE2_OPS=1 python setup.py build_deps ``` cc Yangqing soumith pjh5 mingzhe09088 Pull Request resolved: https://github.com/pytorch/pytorch/pull/11462 Reviewed By: soumith Differential Revision: D9758156 Pulled By: orionr fbshipit-source-id: 95fd206d72fdf44df54fc5d0aeab598bff900c63	2018-09-10 17:27:50 -07:00
Orion Reblitz-Richardson	802d21c8f4	Remove FULL_CAFFE2 flag (#11321 ) Summary: Continuing pjh5's work to remove FULL_CAFFE2 flag completely. With these changes you'll be able to also do something like ``` NO_TEST=1 python setup.py build_deps ``` and this will skip building tests in caffe2, aten, and c10d. By default the tests are built. cc mingzhe09088 Yangqing Pull Request resolved: https://github.com/pytorch/pytorch/pull/11321 Reviewed By: mingzhe09088 Differential Revision: D9694950 Pulled By: orionr fbshipit-source-id: ff5c4937a23d1a263378a196a5eda0cba98af0a8	2018-09-07 15:09:44 -07:00
Yangqing Jia	68613cf5a2	Windows DLL build with Caffe2 code (#11266 ) Summary: This is an experimental build on top of what orionr and mingzhe09088 built. Essentially, the idea is that we will need separate *_API versions for different shared libraries. If this theory is right, I'll try to clean up the design a bit and document it properly. Pull Request resolved: https://github.com/pytorch/pytorch/pull/11266 Reviewed By: orionr Differential Revision: D9682942 Pulled By: Yangqing fbshipit-source-id: c79653199e67a1500c9174f39f8b0357324763f3	2018-09-06 15:12:20 -07:00
Anders Papitto	a853a74217	defer resolution of mkl to a cmake wrapper library (#11298 ) Summary: this is a fix that's needed for building extensions with a pre-packaged pytorch. Consider the scenario where (1) pytorch is compiled and packaged on machine A (2) the package is downloaded and installed on machine B (3) an extension is compiled on machine B, using the downloaded package Before this patch, stage (1) would embed absolute paths to the system installation of mkl into the generated Caffe2Config.cmake, leading to failures in stage (3) if mkl was not at the same location on B as on A. After this patch, only a reference to the wrapper library is embedded, which is re-resolved on machine B. We are already using a similar approach for cuda. Testing: built a package on jenkins, downloaded locally and compiled an extension. Works with this patch, fails without. Pull Request resolved: https://github.com/pytorch/pytorch/pull/11298 Differential Revision: D9683150 Pulled By: anderspapitto fbshipit-source-id: 06a80c3cd2966860ce04f76143b358de15f94aa4	2018-09-06 09:10:39 -07:00
Orion Reblitz-Richardson	dda8402447	Cleanup dependency of distributed flags (#11221 ) Summary: Now that we're building everything together, making all distributed flags conditional on USE_DISTRIBUTED being set. cc pietern cpuhrsch Pull Request resolved: https://github.com/pytorch/pytorch/pull/11221 Reviewed By: Yangqing Differential Revision: D9664267 Pulled By: orionr fbshipit-source-id: a296cda5746ad150028c97160f8beacba955ff73	2018-09-06 08:56:00 -07:00
Jesse Hellemn	c0efe6f027	Forward declarations of needed curand functions (#10911 ) Summary: Needed for FULL_CAFFE2=1 with statically linked CUDA libraries. Waiting on advice from Nvidia Pull Request resolved: https://github.com/pytorch/pytorch/pull/10911 Reviewed By: pjh5 Differential Revision: D9636256 Pulled By: orionr fbshipit-source-id: fcad7945910b6c8fb5f52e81cc87dad5fcfb3c65	2018-09-05 16:56:26 -07:00
Yangqing Jia	684b55d762	In default, use third party eigen. Added new flag USE_SYSTEM_EIGEN_INSTALL to control. (#11020 ) Summary: TSIA. apaszke pointed out that it might be better to use third party folder in default, since system Eigen may often be out of date and does not have the version we need to compile successfully. Pull Request resolved: https://github.com/pytorch/pytorch/pull/11020 Differential Revision: D9562548 Pulled By: Yangqing fbshipit-source-id: d8ab8a6ebe1f3d9eec638ef726cf5dc4dcf777b5	2018-09-04 10:56:22 -07:00
Orion Reblitz-Richardson	6508db7421	Remove BUILD_CAFFE2 and build everything (#8338 ) Summary: This completely removes BUILD_CAFFE2 from CMake. There is still a little bit of "full build" stuff in setup.py that enables USE_CUDNN and BUILD_PYTHON, but otherwise everything should be enabled for PyTorch as well as Caffe2. This gets us a lot closer to full unification. cc mingzhe09088, pjh5, ezyang, smessmer, Yangqing Pull Request resolved: https://github.com/pytorch/pytorch/pull/8338 Reviewed By: mingzhe09088 Differential Revision: D9600513 Pulled By: orionr fbshipit-source-id: 9f6ca49df35b920d3439dcec56e7b26ad4768b7d	2018-08-31 13:10:24 -07:00
Junjie Bai	302e9cb815	Update onnx submodule to onnx/onnx@bae6333 (#10961 ) Summary: ONNX v1.3.0 release Pull Request resolved: https://github.com/pytorch/pytorch/pull/10961 Reviewed By: houseroad Differential Revision: D9543998 Pulled By: bddppq fbshipit-source-id: b7f0a0553d832d609d3b7613a608f7bf4a2582ef	2018-08-30 15:25:57 -07:00
Orion Reblitz-Richardson	b41988c71e	Cleanup BUILD_DOCS cmake section (#11000 ) Summary: Breaking out of https://github.com/pytorch/pytorch/pull/8338 cc mingzhe09088 Yangqing Pull Request resolved: https://github.com/pytorch/pytorch/pull/11000 Differential Revision: D9557474 Pulled By: orionr fbshipit-source-id: 7d84914b67ff37bdb7738f9b7846dfeb5b975c00	2018-08-29 10:09:52 -07:00
Tongzhou Wang	efd2aeac9e	Set -Wno-stringop-overflow only with GCC >=7 (#10954 ) Summary: `stringop-overflow` is added in GCC 7. Pull Request resolved: https://github.com/pytorch/pytorch/pull/10954 Differential Revision: D9546084 Pulled By: SsnL fbshipit-source-id: e6e68f993f1dbaa879ca66dc43bbcff9c49890ff	2018-08-28 14:25:29 -07:00
Edward Yang	227635142f	Delete THD master_worker (#10731 ) Summary: Signed-off-by: Edward Z. Yang <ezyang@fb.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/10731 Differential Revision: D9423675 Pulled By: ezyang fbshipit-source-id: 37221e11d84cc3672b944af598ea229a1d4c38cc	2018-08-22 08:54:36 -07:00
Mingzhe Li	f0d8a36e70	Completely remove build_aten and use_aten (#10469 ) Summary: Breaking out of #8338 to completely remove build_aten and use_aten. Pull Request resolved: https://github.com/pytorch/pytorch/pull/10469 Reviewed By: orionr Differential Revision: D9413639 Pulled By: mingzhe09088 fbshipit-source-id: b7203aa4f5f2bb95c504c8dc187a3167f2570183	2018-08-20 20:26:42 -07:00
Jesse Hellemn	15d7f49205	Adding ATEN_NO_TEST option to root level cmake for propagation to aten Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10708 Reviewed By: ml7 Differential Revision: D9410916 Pulled By: pjh5 fbshipit-source-id: b216a9ff7be23ff8754f2fe0b8197b5d006aa08d	2018-08-20 15:40:27 -07:00
Peter Goldsborough	c101a57a74	Build mechanism for custom operators (#10226 ) Summary: This is the last step in the custom operator implementation: providing a way to build from C++ and Python. For this I: 1. Created a `FindTorch.cmake` taken largely from ebetica with a CMake function to easily create simple custom op libraries 2. Created a ` torch/op.h` header for easy inclusion of necessary headers, 3. Created a test directory `pytorch/test/custom_operator` which includes the basic setup for a custom op. 1. It defines an op in `op.{h,cpp}` 2. Registers it with the JIT using `RegisterOperators` 3. Builds it into a shared library via a `CMakeLists.txt` 4. Binds it into Python using a `setup.py`. This step makes use of our C++ extension setup that we already have. No work, yey! The pure C++ and the Python builds are separate and not coupled in any way. zdevito soumith dzhulgakov Pull Request resolved: https://github.com/pytorch/pytorch/pull/10226 Differential Revision: D9296839 Pulled By: goldsborough fbshipit-source-id: 32f74cafb6e3d86cada8dfca8136d0dfb1f197a0	2018-08-16 18:56:17 -07:00
Yangqing Jia	40109b16d0	Remove caffe1 specific proto (#10380 ) Summary: This was used as a convenient way for us to convert c1 models. Now that conversion is more or less done, we should probably require any users who need to convert c1 models to explicitly install c1. This PR removes the explicit c1 proto (which was copied from c1) in favor of explicit installation. Note that caffe_translator would still work properly, only difference is that now users need to install c1 separately. Pull Request resolved: https://github.com/pytorch/pytorch/pull/10380 Differential Revision: D9267981 Pulled By: Yangqing fbshipit-source-id: a6ce5d9463e6567976da83f2d08b2c3d94d14390	2018-08-10 11:10:26 -07:00
Gregory Chanan	2d56b5cf8b	Prepare THC for first class scalars (0-dimensional tensors). Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10072 Differential Revision: D9082421 Pulled By: gchanan fbshipit-source-id: d4327b07aaef85cc2521393008154ebceae8cbfd	2018-08-01 14:28:51 -07:00
Junjie Bai	ba5d33bede	Re-Enable ATen in C2 in integration builds to test ONNX ATen conversions Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10060 Differential Revision: D9081387 Pulled By: bddppq fbshipit-source-id: 13cbff63df5241e013d4ebacfcd6da082e7196f6	2018-07-31 15:27:05 -07:00
Edward Yang	37a226de63	When BUILD_ATEN=OFF, use ATen/core directly (#10019 ) Summary: ATenCore.h is a dummy header to just test that this is working at all. Pull Request resolved: https://github.com/pytorch/pytorch/pull/10019 Reviewed By: smessmer Differential Revision: D9067262 Pulled By: ezyang fbshipit-source-id: 58bab9c0aa83b56335e36b719b9b6505400d8dee	2018-07-30 21:09:55 -07:00
peter	53083b8353	Remove CMAKE_WINDOWS_EXPORT_ALL_SYMBOLS and fix CUDA 8 build on Windows (#9491 ) (#9491 ) Summary: Fixes #9092. Pull Request resolved: https://github.com/pytorch/pytorch/pull/9491 Pull Request resolved: https://github.com/pytorch/pytorch/pull/9693 Differential Revision: D8946850 Pulled By: ezyang fbshipit-source-id: bd816f459ab70f6b4a0983305a1ce341bb633707	2018-07-23 06:40:39 -07:00
Edward Yang	a08119afc2	Eliminate direct access to size/strides of THTensor; replace them with std::vector (#9561 ) Summary: * THTensor now stores `sizes_` and `strides_` which is a `std::vector<int64_t>` * Anywhere a "public" API function made use of a int64_t* of sizes, I opted to just finagle it out of the tensor using THTensor_getSizePtr rather than try to rewrite all of these sites to use ArrayRef. They should use ArrayRef eventually, but not yet. * There are new utility functions for resizing sizes/strides in one go (THTensor_resizeDim), or replacing sizes and strides with completely new values (THTensor_setSizesAndStrides) * Anywhere you said `t->size[n] = 0`, we now say `THTensor_setSizeAt(t, n, 0)`, ditto for strides * Anywhere you said `t->size[n]`, we now say `t->size(n)` (coming soon: ditto for strides) Previous review of just the `std::vector` change in #9518, but I'm planning to merge this all in one go. Note for gchanan: review from commit "ci" and after Pull Request resolved: https://github.com/pytorch/pytorch/pull/9561 Reviewed By: cpuhrsch Differential Revision: D8901926 Pulled By: ezyang fbshipit-source-id: 483cf275060ab0a13845cba1ece39dd127142510	2018-07-19 14:10:06 -07:00
Anders Papitto	4c615b1796	Introduce libtorch to setup.py build (#8792 ) Summary: Prior to this diff, there have been two ways of compiling the bulk of the torch codebase. There was no interaction between them - you had to pick one or the other. 1) with setup.py. This method - used the setuptools C extension functionality - worked on all platforms - did not build test_jit/test_api binaries - did not include the C++ api - always included python functionality - produced _C.so 2) with cpp_build. This method - used CMake - did not support Windows or ROCM - was capable of building the test binaries - included the C++ api - did not build the python functionality - produced libtorch.so This diff combines the two. 1) cpp_build/CMakeLists.txt has become torch/CMakeLists.txt. This build - is CMake-based - works on all platforms - builds the test binaries - includes the C++ api - does not include the python functionality - produces libtorch.so 2) the setup.py build - compiles the python functionality - calls into the CMake build to build libtorch.so - produces _C.so, which has a dependency on libtorch.so In terms of code changes, this mostly means extending the cmake build to support the full variety of environments and platforms. There are also a small number of changes related to the fact that there are now two shared objects - in particular, windows requires annotating some symbols with dllimport/dllexport, and doesn't allow exposing thread_local globals directly. Pull Request resolved: https://github.com/pytorch/pytorch/pull/8792 Reviewed By: ezyang Differential Revision: D8764181 Pulled By: anderspapitto fbshipit-source-id: abec43834f739049da25f4583a0794b38eb0a94f	2018-07-18 14:59:33 -07:00
Sebastian Messmer	04fce5eca6	Remove dummy c10 folder (#9367 ) Summary: This was previously meant to be used for c10 code but that plan since changed Pull Request resolved: https://github.com/pytorch/pytorch/pull/9367 Reviewed By: orionr Differential Revision: D8814361 Pulled By: smessmer fbshipit-source-id: 8e35fa74e160343a2bb8432013847677aa73695a	2018-07-12 19:14:55 -07:00
Jesse Hellemn	99ab082366	Making setup.py install work for Caffe2 (#8509 ) Summary: Tested on my mac on a pretty clean anaconda3 Pull Request resolved: https://github.com/pytorch/pytorch/pull/8509 Reviewed By: orionr Differential Revision: D8702257 Pulled By: pjh5 fbshipit-source-id: eda03ef9732da9fc56b31d909af5c0e39520d689	2018-07-09 18:10:58 -07:00
Mingzhe Li	01a7ca3d64	Fix Pytorch Mac build issues (#9283 ) Summary: Breaking this out of #8338 This fixed Mac build issues after BUILD_CAFFE2 and BUILD_ATEN are removed. cc orionr Pull Request resolved: https://github.com/pytorch/pytorch/pull/9283 Reviewed By: orionr Differential Revision: D8773459 Pulled By: mingzhe09088 fbshipit-source-id: 71942e8e6891a625e6b1a7dc0160e87444c64209	2018-07-09 15:40:46 -07:00
Mingzhe Li	a70a90b28f	Fix pytorch linux build issues (#9273 ) Summary: Breaking out of #8338 This fixes the build issues with pytorch on linux machines after BUILD_CAFFE2 and BUILD_ATEN are removed. cc orionr Pull Request resolved: https://github.com/pytorch/pytorch/pull/9273 Reviewed By: orionr Differential Revision: D8768869 Pulled By: mingzhe09088 fbshipit-source-id: 2730426ed1bed398eb5dc804c7348aeeb27c93d3	2018-07-09 14:41:36 -07:00
Tongzhou Wang	f935ba1b05	[build] Enable clang-specific warnings only when using clang (#8869 ) * Wraps clang only warnings in an if * add back -Wno-missing-field-initializers	2018-06-26 11:09:25 -04:00
Tongzhou Wang	2b926aafb0	[build] disable test_expect for pinning cmake to 3.5* in dockerfiles repo (#8850 ) * pin pytorch-linux-xenial* to use cmake 3.5* * disable test_expect	2018-06-25 14:21:42 -04:00
Soumith Chintala	dc186cc9fe	Remove NO_* and WITH_* across codebase, except in setup.py (#8555 ) * remove legacy options from CMakeLists * codemod WITH_ to USE_ for WITH_CUDA, WITH_CUDNN, WITH_DISTRIBUTED, WITH_DISTRIBUTED_MW, WITH_GLOO_IBVERBS, WITH_NCCL, WITH_ROCM, WITH_NUMPY * cover SYSTEM_NCCL, MKLDNN, NNPACK, C10D, NINJA * removed NO_* variables and hotpatch them only in setup.py * fix lint	2018-06-15 12:29:48 -04:00
sf-wind	5b86c3af4a	Update from facebook (#8384 ) * [fix] fixup the bias multiplier data access issue Hotfix for failues in conv_transpose * [D2][Easy]: lint regularizer lint with black * [GanH]: Split mu in adaptive weight for diagnose * [Dper] Add the ability to split FC weights into multiple smaller ones * fix SumReduceLikeOp for empty blob as desc. * add ctc_greedy_decoder for caffe2 ctc_greedy_decoder same as tf's * Update event callback handling Allow multiple callbacks per event * Add WeightedSum layer The motivation is to do weighted sum in HoNet/crossnet, in the next diff, I'll replace model.Add with model.WeightedSum in honet: https://fburl.com/f4rmolg2 crossnet: https://fburl.com/v7awn8se, https://fburl.com/63filbnm * Replicate DAG's behavior Some callers expect RunAsync to block, replicate that behavior in case of explicit 'dag' net type * [dper] layernorm layer as title * Override dag, async_dag, async_polling Overriding dag, async_dag and async_polling with async_scheduling * Name the thread pools Caffe thread pools currently inherit the thread names from the thread that starts them, which can be misleading. Give them an explicit name instead. * [Caffe2] FilleOp should support int64_t dimensions Change argument type to int64_t for shape argument of FillerOp (used in ConstantFill, XavierFill, etc) * Remove caffe2/caffe2/contrib/torch/ It's not used anywhere and depends on old lua torch that conflicts with Aten. Given PT1 it's not relevant any more (though it was nice and clever code!) #accept2ship * Fix linearWarmup multiplier check The multiplier needs to be non-negative, not strictly positive. * Revert D3314316 This is after 2 years and we do not seem to have a use case for this one, so for the sake of clean API design we should potentially remove this. 
This would allow us to potentially pass in arguments to optionally construct an object, although it is indeed a little bit unclear how we can reuse existing objects if constructor arguments are passed in. In any case, we may want to remove this dangling feature. * Speedup generate proposals by partial_sort. Speedup generate proposals by partial_sort. FACEBOOK: - Saw speed improvement for training with this op. - Yanghan benchmarked the op on a small dataset and see consistent 100% improvement on speed (6ms -> 3ms) on 420 input resolution. See next diff for details. * More parallel processing friendly for CPP version of GenerateProposals. More parallel processing friendly for CPP version of GenerateProposals. * [DT] [43/n] Lift stop conditions inside reader code back to flow control 1. Split multi_reader function into local_reader and remote_reader 2. Lifted stop conditions inside Limiter back to flow control 3. Split epoch flow building logic into 3 cases: - single machine (1 reader, 1 trainer on trainer0 node, no PS) - (1 reader + 1 trainer) on trainer0 node, has PS - multiple readers, readers do not share nodes with trainers, might have PS or not * Resolve conflicts for torch/_thnn/utils.py * [Caffe2] Handle image decoding errors Image decoding errors can make the whole training fail. This diff is to handle them 1.Catch imdecode exceptions and check if decoded image has zero columns or rows. This is counted as decoding errors. 2.Replace the image with empty in case of error 3.Count the number of errors and throw runtime exception if the rate reaches given number The empty image data is kept. It might introduce noise in the training data. * Update MKL exporter to IDEEP ops TSIA * [Caffe2] GlobalInit is thread safe, fixing the comment With the mutex and lock, GlobalInit is thread safe. Update the comments. 
* Back out "Add support for generating ATen files during fbcode build" Original commit changeset: 28970ddba353 @override-unit-failures (Note: this ignores all push blocking failures!) * [DT]: fix predictor save similar to D6610058, here we add the fix for distributed online training * Remove net_singlethread_async_gpu.cc Closes https://github.com/caffe2/caffe2/pull/2528 This removes net_singlethread_async_gpu.cc as part of our effort to clean CUDAContext and the net executors. * Inline DFS task execution Add a DFS inline task execution mode in executor * Add c10 folder to fbcode This adds the c10 folder and its test cases to fbcode. Build flags are mostly taken from aten. * add dependencies for online trainer Add some dependencies so that the online model can use DataPipeline and PredictionTransform operators Relevent post: https://fb.intern.facebook.com/groups/1324375037655677/permalink/1740993462660497/ * Resolve conflicts for tools/jit/gen_jit_dispatch.py * [Fix] sparse regularization in distributed training * Support advanced pooling options in sum processor * support advanced pooling options in sum processor * remove redundant code * support attention in sum processor * Improve shard logging in net tracing code Make it handle arbitrary shard ids instead of just one digit ids. * [Caffe2] Call GlobalInit in predictor only in mobile FACEBOOK: Calling GlobalInit long after the program starts may not be safe. There are issues if the following happens: User does not call GlobalInit and initFacebook after program starts User sets a flag manually: https://fburl.com/mcsumw7d User calls OSS predictor. OSS predictor calls GlobalInit GlobalInit calls initFacebook initFacebook resets all flags: https://fburl.com/tolszha1 Thus, the user manually set flags are overwritten This would happen anytime GlobalInit is called long after the program starts. 
I suppose the intention of the user in this case is not to call GlobalInit throughout the program, but use Caffe2 regardless (is that desired?) But adding GlobalInit in the OSS predictor would automatically call GlobalInit when using Caffe2. This issue doesn't exist in mobile, since initFacebook is not called on mobile. For now, guard the GlobalInit in predictor for mobile only. May want to ensure the GlobalInit is always called at the start of the program. @[3501714:kutta] has seen weird issues when not calling GlobalInit at the start of the program on server side. He has made some progress on this. * resolve conflicts for caffe2/core/logging_is_google_glog.h and test/test_torch.py * Add empty fix for SumLikeReduceOp Add empty fix for SumLikeReduceOp * Revert D7962948: [caffe2][nomnigraph] Concat elim for sparseNN This reverts commit f7f434dc5c34ca6058b9765d2ef615453d2276a9 @bypass-lint An infra SEV is better than not reverting this diff. If you copy this password, see you in SEV Review! @cause_a_sev_many_files * Remove Declarations.yaml * Include common.h * Change std::stoi to caffe2::stoi * Add thread_name.cc to the CMake file * No need to subtract 1. Fix test segfaults * Fix NetTest, ObserverTest Fix tests (cherry picked from commit 3767e66c3f365596cba3d46d3e7322c933a0ab41) * CTCGreedyDecoderOp only has CPU implementation, test should only run on CPU * Add a variable to avoid conversion resizing issue * [fix] fixup the bias multiplier data access issue Hotfix for failues in conv_transpose * [D2][Easy]: lint regularizer lint with black * [GanH]: Split mu in adaptive weight for diagnose * [Dper] Add the ability to split FC weights into multiple smaller ones * fix SumReduceLikeOp for empty blob as desc. 
* add ctc_greedy_decoder for caffe2 ctc_greedy_decoder same as tf's * Update event callback handling Allow multiple callbacks per event * Add WeightedSum layer The motivation is to do weighted sum in HoNet/crossnet, in the next diff, I'll replace model.Add with model.WeightedSum in honet: https://fburl.com/f4rmolg2 crossnet: https://fburl.com/v7awn8se, https://fburl.com/63filbnm * Replicate DAG's behavior Some callers expect RunAsync to block, replicate that behavior in case of explicit 'dag' net type * [dper] layernorm layer as title * Override dag, async_dag, async_polling Overriding dag, async_dag and async_polling with async_scheduling * Name the thread pools Caffe thread pools currently inherit the thread names from the thread that starts them, which can be misleading. Give them an explicit name instead. * [Caffe2] FilleOp should support int64_t dimensions Change argument type to int64_t for shape argument of FillerOp (used in ConstantFill, XavierFill, etc) * Remove caffe2/caffe2/contrib/torch/ It's not used anywhere and depends on old lua torch that conflicts with Aten. Given PT1 it's not relevant any more (though it was nice and clever code!) #accept2ship * Fix linearWarmup multiplier check The multiplier needs to be non-negative, not strictly positive. * Revert D3314316 This is after 2 years and we do not seem to have a use case for this one, so for the sake of clean API design we should potentially remove this. This would allow us to potentially pass in arguments to optionally construct an object, although it is indeed a little bit unclear how we can reuse existing objects if constructor arguments are passed in. In any case, we may want to remove this dangling feature. * Speedup generate proposals by partial_sort. Speedup generate proposals by partial_sort. FACEBOOK: - Saw speed improvement for training with this op. - Yanghan benchmarked the op on a small dataset and see consistent 100% improvement on speed (6ms -> 3ms) on 420 input resolution. 
See next diff for details. * More parallel processing friendly for CPP version of GenerateProposals. More parallel processing friendly for CPP version of GenerateProposals. * [DT] [43/n] Lift stop conditions inside reader code back to flow control 1. Split multi_reader function into local_reader and remote_reader 2. Lifted stop conditions inside Limiter back to flow control 3. Split epoch flow building logic into 3 cases: - single machine (1 reader, 1 trainer on trainer0 node, no PS) - (1 reader + 1 trainer) on trainer0 node, has PS - multiple readers, readers do not share nodes with trainers, might have PS or not * Resolve conflicts for torch/_thnn/utils.py * [Caffe2] Handle image decoding errors Image decoding errors can make the whole training fail. This diff is to handle them 1.Catch imdecode exceptions and check if decoded image has zero columns or rows. This is counted as decoding errors. 2.Replace the image with empty in case of error 3.Count the number of errors and throw runtime exception if the rate reaches given number The empty image data is kept. It might introduce noise in the training data. * Update MKL exporter to IDEEP ops TSIA * [Caffe2] GlobalInit is thread safe, fixing the comment With the mutex and lock, GlobalInit is thread safe. Update the comments. * Back out "Add support for generating ATen files during fbcode build" Original commit changeset: 28970ddba353 @override-unit-failures (Note: this ignores all push blocking failures!) * [DT]: fix predictor save similar to D6610058, here we add the fix for distributed online training * Remove net_singlethread_async_gpu.cc Closes https://github.com/caffe2/caffe2/pull/2528 This removes net_singlethread_async_gpu.cc as part of our effort to clean CUDAContext and the net executors. * Inline DFS task execution Add a DFS inline task execution mode in executor * Add c10 folder to fbcode This adds the c10 folder and its test cases to fbcode. Build flags are mostly taken from aten. 
* add dependencies for online trainer Add some dependencies so that the online model can use DataPipeline and PredictionTransform operators Relevent post: https://fb.intern.facebook.com/groups/1324375037655677/permalink/1740993462660497/ * Resolve conflicts for tools/jit/gen_jit_dispatch.py * [Fix] sparse regularization in distributed training * Support advanced pooling options in sum processor * support advanced pooling options in sum processor * remove redundant code * support attention in sum processor * Improve shard logging in net tracing code Make it handle arbitrary shard ids instead of just one digit ids. * [Caffe2] Call GlobalInit in predictor only in mobile FACEBOOK: Calling GlobalInit long after the program starts may not be safe. There are issues if the following happens: User does not call GlobalInit and initFacebook after program starts User sets a flag manually: https://fburl.com/mcsumw7d User calls OSS predictor. OSS predictor calls GlobalInit GlobalInit calls initFacebook initFacebook resets all flags: https://fburl.com/tolszha1 Thus, the user manually set flags are overwritten This would happen anytime GlobalInit is called long after the program starts. I suppose the intention of the user in this case is not to call GlobalInit throughout the program, but use Caffe2 regardless (is that desired?) But adding GlobalInit in the OSS predictor would automatically call GlobalInit when using Caffe2. This issue doesn't exist in mobile, since initFacebook is not called on mobile. For now, guard the GlobalInit in predictor for mobile only. May want to ensure the GlobalInit is always called at the start of the program. @[3501714:kutta] has seen weird issues when not calling GlobalInit at the start of the program on server side. He has made some progress on this. 
* resolve conflicts for caffe2/core/logging_is_google_glog.h and test/test_torch.py * Add empty fix for SumLikeReduceOp Add empty fix for SumLikeReduceOp * Revert D7962948: [caffe2][nomnigraph] Concat elim for sparseNN This reverts commit f7f434dc5c34ca6058b9765d2ef615453d2276a9 @bypass-lint An infra SEV is better than not reverting this diff. If you copy this password, see you in SEV Review! @cause_a_sev_many_files * Remove Declarations.yaml * Include common.h * Change std::stoi to caffe2::stoi * Add thread_name.cc to the CMake file * No need to subtract 1. Fix test segfaults * Fix NetTest, ObserverTest Fix tests (cherry picked from commit 3767e66c3f365596cba3d46d3e7322c933a0ab41) * CTCGreedyDecoderOp only has CPU implementation, test should only run on CPU * Add a variable to avoid conversion resizing issue * Remove the code per soumith's comments * Remove the code per soumith's comments * Remove blank lines in the end of file * Resolve conflicts for torch/_thnn/utils.py * Update MKL exporter to IDEEP ops TSIA * Back out "Add support for generating ATen files during fbcode build" Original commit changeset: 28970ddba353 @override-unit-failures (Note: this ignores all push blocking failures!) * add dependencies for online trainer Add some dependencies so that the online model can use DataPipeline and PredictionTransform operators Relevent post: https://fb.intern.facebook.com/groups/1324375037655677/permalink/1740993462660497/ * Resolve conflicts for tools/jit/gen_jit_dispatch.py * Support advanced pooling options in sum processor * support advanced pooling options in sum processor * remove redundant code * support attention in sum processor * resolve conflicts for caffe2/core/logging_is_google_glog.h and test/test_torch.py * Revert D7962948: [caffe2][nomnigraph] Concat elim for sparseNN This reverts commit f7f434dc5c34ca6058b9765d2ef615453d2276a9 @bypass-lint An infra SEV is better than not reverting this diff. If you copy this password, see you in SEV Review! 
@cause_a_sev_many_files * Remove Declarations.yaml * Include common.h * Change std::stoi to caffe2::stoi * [caffe2] uprade IDEEP and hotfix for conv op accuracy issue (#8364) * [IDEEP] Upgrade IDEEP version Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com> * [IDEEP] Fix accuracy issue in conv op Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com> * Fix build error due to lack of src in CMakeLists Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com> * Remove the code per soumith's comments * [ONNX] Add an ATen fallback pathway for ONNX export (#8273) * ATen fallback for ONNX export * Move to enum * Fix model test * Add comment * Address comments BC interface * Remove imaginary file (#8415) * [Caffe2] Enable AMD/MIOPEN ops for Caffe2 (#8306) * Add hip support for caffe2 core * Add MIOPEN header/wrapper to caffe2 core * Add HIP device into caffe2 PB * top level makefile change for rocm/hip * makefile scaffolding for AMD/RocM/HIP * Makefile scafodding for AMD/RocM/HIP; add makefile/utility for HIP files * caffe2 PB update for AMD/ROCM HIP device * Add AMD/RocM/Thrust dependency * HIP threadpool update * Fix makefile macro * makefile fix: duplicate test/binary name * makefile clean-up * makefile clean-up * add HIP operator registry * add utilities for hip device * Add USE_HIP to config summary * makefile fix for BUILD_TEST * merge latest * Fix indentation * code clean-up * Guard builds without HIP and use the same cmake script as PyTorch to find HIP * Setup rocm environment variables in build.sh (ideally should be done in the docker images) * setup locale * set HIP_PLATFORM * Revert "set HIP_PLATFORM" This reverts commit 8ec58db2b390c9259220c49fa34cd403568300ad. 
* continue the build script environment variables mess * HCC_AMDGPU_TARGET * Cleanup the mess, has been fixed in the lastest docker images * Assign protobuf field hip_gpu_id a new field number for backward compatibility * change name to avoid conflict * Fix duplicated thread pool flag * Refactor cmake files to not add hip includes and libs globally * Fix the wrong usage of environment variables detection in cmake * Add MIOPEN CNN operators * Revert "Add MIOPEN CNN operators" This reverts commit 6e89ad4385b5b8967a7854c4adda52c012cee42a. * Add MIOPEN pooling operator * Add MIOPEN activation operator * Add MIOPEN softmax operator * Add MIOPEN spatial batch norm operator * Add MIOPEN loacl response normalization operator * Add MIOPEN conv operator * Clean-up LRN ops * enable fp16 in MIOPEN pool ops * Enable fp16 for MIOPEN relu op * Enable fp16 for MIOPEN spatial batch norm op * code clean-up * revert float16 support * Create Caffe2 python binding for AMD/ROCM/HIP * Add op fallback for HIP operator * add hip src/test files in cmake * exclude hip src/test files * fix python binding for hip backend * fix MIOPEN pooling op workspace * hack to compile miopen operators * fix include path for MIOPEN ops * Fix include path * Add HIP math utilities * Fix path for HIP math utils * cmake fix * Cmake fix / hipcc for hip files * suppress hipcc warning * cmake fix /replcae USE_HIP with USE_ROCM * revert LoadHIP.cmake change * fix include for thrust/cub-hip * include path fix for conversion.h * Updated with latest upstream changes * clang format fixes * Context_hip updates * Fixed typo in rocblas handle get function * Updated hipified math utils * Updated math hip test util * Updated context hip test * Updated common_hip * Updated net async dag for HIP * Added MIOPEN in operator hip test * fix * C2 dependencies clean-up * fix include path for building custom protobuf * Decouple miopen pool op and conv_pool_op base * cmake refactor * fix operator_hip_test * move all hip/miopen ops 
files into caffe2/operators/hip * sanitize cmake * permission issue * remove extra parenthesis * remove artifact from resolving merge conflict * cont. sanitize cmake files * fix syntax error * sanitize conversion.h * . * Revert "." This reverts commit 56020cb0e996a31ae27bf1f8f491955ed0b121b9. * clang-format * Enable some reduce operators' ONNX backend tests (#8418) * fix old comment to point to the right file (#8416) * Stop pinning nccl version. (#8421) Signed-off-by: Edward Z. Yang <ezyang@fb.com> * Expose logsumexp docs and mark log_sum_exp in distributions for internal use (#8428) * Enable some of the ONNX backend test on broadcasting (#8423) * Enable some of the ONNX backend test on broadcasting * enable gemm broadcast * Expose proto utils and ONNX (#8073) * Expose proto utils and ONNX from PyTorch libcaffe2.so * Try to use protobuf from _C.so * Fix ONNX proto header include * Adjust order of imports for ONNX until nanopb goes away * Set and use ONNX_NAMESPACE for PyTorch builds * Show protobuf summary for all builds * Add ONNX_NAMESPACE for cpp_build * Statically link libprotobuf.a into libtorch.so * Set ONNX_NAMESPACE on Windows build * Move core/dispatch up as well * Add /MD flag for Windows build of _C * Potential Windows fix for ONNX and protobuf * Add direct linkage from _C to ONNX on Windows * Only include protobuf wrapper for PyTorch * Pass extra_compile_args to _nvrtc ext build * Remove installation of .a files * Rebase creates some weird situations, revert them manually * Remove more weird changes due to rebase * Need to add thread_name.cc after merge	2018-06-13 13:10:45 -07:00
Orion Reblitz-Richardson	edd4e2c5d1	Expose proto utils and ONNX (#8073 ) * Expose proto utils and ONNX from PyTorch libcaffe2.so * Try to use protobuf from _C.so * Fix ONNX proto header include * Adjust order of imports for ONNX until nanopb goes away * Set and use ONNX_NAMESPACE for PyTorch builds * Show protobuf summary for all builds * Add ONNX_NAMESPACE for cpp_build * Statically link libprotobuf.a into libtorch.so * Set ONNX_NAMESPACE on Windows build * Move core/dispatch up as well * Add /MD flag for Windows build of _C * Potential Windows fix for ONNX and protobuf * Add direct linkage from _C to ONNX on Windows * Only include protobuf wrapper for PyTorch * Pass extra_compile_args to _nvrtc ext build * Remove installation of .a files	2018-06-13 10:25:32 -07:00
Peter Goldsborough	7c9e936986	Add way of deprecating ATen functions (#8404 )	2018-06-12 19:26:43 -07:00
Orion Reblitz-Richardson	49eec35e5b	More warning skips (#8382 ) * Remove check for unused private fields * Suppress inconsistent-missing-override * Hopefully last warning skip for Mac * Add one more warning ignore	2018-06-12 14:44:36 -04:00
Yangqing Jia	991bdd7f13	[build] remove the use of NO_CUDA (#8300 ) * Only remove NO_CUDA from CMakeLists.txt * @ezyang's catch	2018-06-12 12:14:36 -04:00
Will Feng	1f02ebd323	Use clang 8 to build CUDA in macOS CI (#8355 ) * Don't use -faligned-new flag for clang < 9.0 * Select Xcode 8.2 toolchain when building CUDA * Better comment	2018-06-11 22:45:40 -04:00
Edward Z. Yang	0cced57cb8	Build DEBUG mode with -O0, fixes #8335 . (#8336 ) Signed-off-by: Edward Z. Yang <ezyang@fb.com>	2018-06-11 21:05:12 -04:00
Yangqing Jia	c486b8749d	Add option USE_NVRTC which defaults to off (#8289 )	2018-06-08 14:27:23 -07:00
Sebastian Meßmer	efba555a38	c10 build setup (#8264 ) * Move c10/ to caffe2/dispatch/ * Set up caffe2/utils directory	2018-06-08 12:11:17 -07:00
Yangqing Jia	1a03ba51dc	[cmake] Add and export Modules_CUDA_fix (#8271 ) * Add and export Modules_CUDA_fix * actually, need to include before finding cuda	2018-06-07 21:50:30 -07:00
Orion Reblitz-Richardson	d1bdb3b10a	Remove core and util warnings (#8239 ) * Fix some signed/unsigned mismatches * Skip unused result warning * Explicit fallthrough for murmur hash * Enable aligned new support to eliminate warning * Switch to int instead of unsigned in some cases	2018-06-07 09:10:33 -07:00
Yangqing Jia	b401e6b03a	Allow optional build and installation of native test binaries (#8225 ) * test finetuning * install off by default * Turn BUILD_TEST=ON for jenkins. * Turn on install_test in jenkins as well	2018-06-06 20:56:31 -07:00
Sebastian Meßmer	b03ba9023e	Set up a c10 source folder (#7822 ) * Set up a c10 source folder	2018-06-06 16:56:17 -07:00
Edward Z. Yang	bf58bb5e59	Fix cuda.framework error on OSX. (#8136 ) When compiling OSX with CUDA, Caffe2's build system uses find_package(cuda) to get its grubby hands on the CUDA driver library (for some strange reason, FindCUDA doesn't save this information as a variable). Unfortunately, on OSX, sometimes this picks up the cuda.framework folder, and then our build system chokes to death because it doesn't try to link against this as a framework. (Is the folder even a framework? I have no idea). This commit attempts to fix this in a two-pronged fashion: 1. For some users, reducing the precedence of frameworks using CMAKE_FIND_FRAMEWORK seems to help. So we set these variables. However, this fix is not perfect; on my laptop it doesn't actually solve the problem. 2. PyTorch doesn't actually need the CUDA driver API. So we only add the dep when building Caffe2. Fixes #8022 Signed-off-by: Edward Z. Yang <ezyang@fb.com>	2018-06-05 13:37:05 -04:00
bddppq	db5bc71562	Fix and ignore some warnings (#8081 )	2018-06-04 01:01:59 -07:00
bddppq	580d212267	Remove WITH_ROCM cmake flag/variable (use USE_ROCM solely) (#8013 )	2018-05-31 20:50:59 -04:00
Peter Goldsborough	d476d0b4ab	[Hotfix] Bring back warnings and -Werror to ATen (#7866 ) * Bring back warnings and -Werror to ATen * Unbreak... * Fix tbb errors	2018-05-30 21:59:04 -07:00
Soumith Chintala	f4256c9605	cache and use BLAS_SET_BY_USER so that it doesn't set itself to TRUE when run second time (#7942 )	2018-05-30 11:44:23 -07:00
Yinghai Lu	fb5cc630f6	Fix me (#7837 ) * Mini fix * No USE_MKL * Add CAFFE2_USE_EIGEN_FOR_BLAS	2018-05-25 07:38:50 -07:00
Orion Reblitz-Richardson	4bf0202cac	[build] Have PyTorch depend on minimal libcaffe2.so instead of libATen.so (#7399 ) * Have PyTorch depend on minimal libcaffe2.so instead of libATen.so * Build ATen tests as a part of Caffe2 build * Hopefully cufft and nvcc fPIC fixes * Make ATen install components optional * Add tests back for ATen and fix TH build * Fixes for test_install.sh script * Fixes for cpp_build/build_all.sh * Fixes for aten/tools/run_tests.sh * Switch ATen cmake calls to USE_CUDA instead of NO_CUDA * Attempt at fix for aten/tools/run_tests.sh * Fix typo in last commit * Fix valgrind call after pushd * Be forgiving about USE_CUDA disable like PyTorch * More fixes on the install side * Link all libcaffe2 during test run * Make cuDNN optional for ATen right now * Potential fix for non-CUDA builds * Use NCCL_ROOT_DIR environment variable * Pass -fPIC through nvcc to base compiler/linker * Remove THCUNN.h requirement for libtorch gen * Add Mac test for -Wmaybe-uninitialized * Potential Windows and Mac fixes * Move MSVC target props to shared function * Disable cpp_build/libtorch tests on Mac * Disable sleef for Windows builds * Move protos under BUILD_CAFFE2 * Remove space from linker flags passed with -Wl * Remove ATen from Caffe2 dep libs since directly included * Potential Windows fixes * Preserve options while sleef builds * Force BUILD_SHARED_LIBS flag for Caffe2 builds * Set DYLD_LIBRARY_PATH and LD_LIBRARY_PATH for Mac testing * Pass TORCH_CUDA_ARCH_LIST directly in cuda.cmake * Fixes for the last two changes * Potential fix for Mac build failure * Switch Caffe2 to build_caffe2 dir to not conflict * Cleanup FindMKL.cmake * Another attempt at Mac cpp_build fix * Clear cpp-build directory for Mac builds * Disable test in Mac build/test to match cmake	2018-05-24 07:47:27 -07:00
bddppq	966c65859d	Revert "[Caffe2] Enabling AMD GPU Backend for Caffe2" (#7802 ) * Revert "[auto] Update onnx to 4898c9e - Added TensorDenotation and metadata_props for images (onnx/onnx#879) `4898c9e925`" This reverts commit `9c679dab5f`. * Revert "Add BiasCHW fallback for GPU (#7738)" This reverts commit `14ad2e74f1`. * Revert "[Caffe2] Enabling AMD GPU Backend for Caffe2 (#7566)" This reverts commit `2ebcf4bb37`.	2018-05-23 17:58:47 -07:00
Peter Yeh	2ebcf4bb37	[Caffe2] Enabling AMD GPU Backend for Caffe2 (#7566 ) * Add hip support for caffe2 core * Add MIOPEN header/wrapper to caffe2 core * Add HIP device into caffe2 PB * top level makefile change for rocm/hip * makefile scaffolding for AMD/RocM/HIP * Makefile scaffolding for AMD/RocM/HIP; add makefile/utility for HIP files * caffe2 PB update for AMD/ROCM HIP device * Add AMD/RocM/Thrust dependency * HIP threadpool update * Fix makefile macro * makefile fix: duplicate test/binary name * makefile clean-up * makefile clean-up * add HIP operator registry * add utilities for hip device * Add USE_HIP to config summary * makefile fix for BUILD_TEST * merge latest * Fix indentation * code clean-up * Guard builds without HIP and use the same cmake script as PyTorch to find HIP * Setup rocm environment variables in build.sh (ideally should be done in the docker images) * setup locale * set HIP_PLATFORM * Revert "set HIP_PLATFORM" This reverts commit 8ec58db2b390c9259220c49fa34cd403568300ad. * continue the build script environment variables mess * HCC_AMDGPU_TARGET * Cleanup the mess, has been fixed in the latest docker images * Assign protobuf field hip_gpu_id a new field number for backward compatibility * change name to avoid conflict * Fix duplicated thread pool flag * Refactor cmake files to not add hip includes and libs globally * Fix the wrong usage of environment variables detection in cmake * Add MIOPEN CNN operators * Revert "Add MIOPEN CNN operators" This reverts commit 6e89ad4385b5b8967a7854c4adda52c012cee42a.	2018-05-23 15:13:09 -07:00
Eric S. Yu	9213336c73	fix cmake USE_ASAN (#7608 )	2018-05-16 11:10:13 -04:00
Orion Reblitz-Richardson	24b41da795	[build] Make ATen buildable without all Caffe2 by root cmake (#7295 ) * Make ATen buildable without all Caffe2 by root cmake * Fix typo in aten cmake * Set BUILD_ATEN from USE_ATEN as compat * Only set BUILD_ATEN from USE_ATEN when on * Have USE_GLOO only set when BUILD_CAFFE2	2018-05-08 10:24:04 -07:00
Bram Wasti	ed6f79ccd2	[caffe2][build] Add ASAN to the debug release of caffe2 (#7107 )	2018-05-07 15:26:51 -07:00
Orion Reblitz-Richardson	053b68c4da	Fix USE_ATEN flag in caffe2 (#7252 )	2018-05-04 08:30:08 -07:00
Orion Reblitz-Richardson	aa38ae303d	[build] Setup to build ATen from root CMake file (#7163 ) * Setup to build ATen from root CMake file * Move aten/src/TH/cmake into cmake/Modules * Add special code path for FindMKL for merge	2018-05-02 19:33:31 -07:00
Paul Jesse Hellemn	2e32e8df75	Statically linking CUDA for Anaconda builds (#6680 ) * Statically linking CUDA for Anaconda builds * typo * Adding a summary line * Comments * Typo fix * Fix faulty parameter passing * Removing problem CUDA modules for now * Fixing unused debugging function * Turning off static cuda linking until script changes are in * Disabling mkl	2018-04-25 18:22:54 -05:00

786 Commits