pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 12:21:27 +01:00

Author	SHA1	Message	Date
linuxone	f64906f470	ibm z14/15 SIMD support (#66407 ) Summary: https://github.com/pytorch/pytorch/issues/66406 implemented z arch 14/15 vector SIMD additions. so far besides bfloat all other types have their SIMD implementation. it has 99% coverage and currently passing the local test. it is concise and the main SIMD file is only one header file it's using template metaprogramming, mostly. but still, there are a few macrosses left with the intention not to modify PyTorch much Sleef supports z15 Pull Request resolved: https://github.com/pytorch/pytorch/pull/66407 Reviewed By: mrshenli Differential Revision: D33370163 Pulled By: malfet fbshipit-source-id: 0e5a57f31b22a718cd2a9ac59753fb468cdda140	2022-01-04 09:40:18 -08:00
Peter Bell	c34aa715fa	AT_MKL_SEQUENTIAL and build changes (#70259 ) Summary: Re-land of https://github.com/pytorch/pytorch/pull/69419 Pull Request resolved: https://github.com/pytorch/pytorch/pull/70259 Test Plan: Imported from OSS Reviewed By: malfet Differential Revision: D33246757 Pulled By: ngimel fbshipit-source-id: 738f8558d4cad6752be14108f9931ec3514f6682	2021-12-22 13:52:23 -08:00
Peter Bell	4829dcea09	Codegen: Generate seperate headers per operator (#68247 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68247 This splits `Functions.h`, `Operators.h`, `NativeFunctions.h` and `NativeMetaFunctions.h` into seperate headers per operator base name. With `at::sum` as an example, we can include: ```cpp <ATen/core/sum.h> // Like Functions.h <ATen/core/sum_ops.h> // Like Operators.h <ATen/core/sum_native.h> // Like NativeFunctions.h <ATen/core/sum_meta.h> // Like NativeMetaFunctions.h ``` The umbrella headers are still being generated, but all they do is include from the `ATen/ops' folder. Further, `TensorBody.h` now only includes the operators that have method variants. Which means files that only include `Tensor.h` don't need to be rebuilt when you modify function-only operators. Currently there are about 680 operators that don't have method variants, so this is potentially a significant win for incremental builds. Test Plan: Imported from OSS Reviewed By: mrshenli Differential Revision: D32596272 Pulled By: albanD fbshipit-source-id: 447671b2b6adc1364f66ed9717c896dae25fa272	2021-12-14 06:40:08 -08:00
Jithun Nair	8dfdc3df82	[ROCm] Refactor how to specify AMD gpu targets using PYTORCH_ROCM_ARCH (#61706 ) Summary: Remove all hardcoded AMD gfx targets PyTorch build and Magma build will use rocm_agent_enumerator as backup if PYTORCH_ROCM_ARCH env var is not defined PyTorch extensions will use same gfx targets as the PyTorch build, unless PYTORCH_ROCM_ARCH env var is defined torch.cuda.get_arch_list() now works for ROCm builds PyTorch CI dockers will continue to be built for gfx900 and gfx906 for now. PYTORCH_ROCM_ARCH env var can be a space or semicolon separated list of gfx archs eg. "gfx900 gfx906" or "gfx900;gfx906" cc jeffdaily sunway513 jithunnair-amd ROCmSupport KyleCZH Pull Request resolved: https://github.com/pytorch/pytorch/pull/61706 Reviewed By: seemethere Differential Revision: D32735862 Pulled By: malfet fbshipit-source-id: 3170e445e738e3ce373203e1e4ae99c84e645d7d	2021-12-13 15:41:40 -08:00
Yanan Cao	17f3179d60	Back out "[pytorch][PR] Add ability for a mobile::Module to save as flatbuffer" (#69796 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69796 (Note: this ignores all push blocking failures!) Test Plan: External CI + Sandcastle Reviewed By: zhxchen17 Differential Revision: D33032671 fbshipit-source-id: dbf6690e960e25d6a5f19043cbe792add2acd7ef	2021-12-10 21:29:53 -08:00
Nikita Shulga	e305e4d4d8	Suppress common warnings when building by clang (#69710 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69710 Namely no range-loop-analysis (that detect when loop variable can not be const reference Test Plan: Imported from OSS Reviewed By: r-barnes Differential Revision: D32997003 Pulled By: malfet fbshipit-source-id: dba0e7875e5b667e2cc394c70dd75e2403265918	2021-12-10 16:45:38 -08:00
Han Qi	d3649309e6	[pytorch][PR] Add ability for a mobile::Module to save as flatbuffer (#69306 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69306 Included functions: save_mobile_module -> saves a mobile::Module to flatbuffer load_mobile_module_from_file -> loads a flatbuffer into mobile::Module parse_mobile_module -> parses from bytes or deserialized flatbuffer Module object Test Plan: unittests Reviewed By: gmagogsfm Differential Revision: D32806835 fbshipit-source-id: 71913c6650e225634f878946bd16960d377a7f57	2021-12-09 14:53:31 -08:00
chunyuan	9ad05f2c3a	Upgrade oneDNN to v2.3.3 and package oneDNN Graph API together (#63748 ) Summary: This PR upgrades oneDNN to [v2.3.3](https://github.com/oneapi-src/oneDNN/releases/tag/v2.3.3) and includes [Graph API preview release](https://github.com/oneapi-src/oneDNN/releases/tag/graph-v0.2) in one package. - oneDNN will be located at `pytorch/third_party/ideep/mkl-dnn/third_party/oneDNN` - The version of oneDNN will be [v2.3.3](https://github.com/oneapi-src/oneDNN/releases/tag/v2.3.3) The main changes on CPU: - v2.3 - Extended primitive cache to improve primitive descriptor creation performance. - Improved primitive cache performance in multithreaded configurations. - Introduced initial optimizations for bfloat16 compute functionality for future Intel Xeon Scalable processor (code name Sapphire Rapids). - Improved performance of binary primitive and binary post-op for cases with broadcast and mixed source and destination formats. - Improved performance of reduction primitive - Improved performance of depthwise convolution primitive with NHWC activations for training cases - v2.3.1 - Improved int8 GEMM performance for processors with Intel AVX2 and Intel DL Boost support - Fixed integer overflow for inner product implementation on CPUs - Fixed out of bounds access in GEMM implementation for Intel SSE 4.1 - v2.3.2 - Fixed performance regression in fp32 inner product primitive for processors with Intel AVX512 support - v2.3.3 - Reverted check for memory descriptor stride validity for unit dimensions - Fixed memory leak in CPU GEMM implementation More changes can be found in https://github.com/oneapi-src/oneDNN/releases. - The Graph API provides flexible API for aggressive fusion, and the preview2 supports fusion for FP32 inference. See the [Graph API release branch](https://github.com/oneapi-src/oneDNN/tree/dev-graph-preview2) and [spec](https://spec.oneapi.io/onednn-graph/latest/introduction.html) for more details. A separate PR will be submitted to integrate the oneDNN Graph API to Torchscript graph. Pull Request resolved: https://github.com/pytorch/pytorch/pull/63748 Reviewed By: albanD Differential Revision: D32153889 Pulled By: malfet fbshipit-source-id: 536071168ffe312d452f75d54f34c336ca3778c1	2021-12-09 13:42:40 -08:00
Sicheng Stephen Jia	bede33e3f5	[vulkan] Add image format qualifier to glsl files (#69330 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69330 --- ## Context Previously, our shader files did not declare any [image format qualifiers](https://www.khronos.org/opengl/wiki/Layout_Qualifier_(GLSL)#Image_formats) for image layouts. This causes the SPIR-V modules produced to declare the [StorageImageWriteWithoutFormat](https://www.khronos.org/registry/SPIR-V/specs/unified1/SPIRV.html#_a_id_capability_a_capability) capability, which requires `shaderStorageImageWriteWithoutFormat` to be enabled in [VkPhysicalDeviceFeatures](https://www.khronos.org/registry/vulkan/specs/1.2-extensions/man/html/VkPhysicalDeviceFeatures.html). `shaderStorageImageWriteWithoutFormat` is not available on some devices, causing errors to be reported by the Vulkan validation layer. ## Changes Vulkan shaders now declare the image format explicitly so that the SPIR-V modules produced are compatible with devices that do not have `shaderStorageImageWriteWithoutFormat` enabled. Test Plan: Imported from OSS Reviewed By: beback4u Differential Revision: D32840909 Pulled By: SS-JIA fbshipit-source-id: 76e0a0da68b423ebc74ae7e839b9cfaf57d2cd39	2021-12-07 16:23:09 -08:00
Peter Bell	9a7732e852	CMake: Support dynamic codegen outputs (#68246 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68246 Currently the codegen produces a list of output files at CMake configuration time and the build system has no way of knowing if the outputs change. So if that happens, you basically need to delete the build folder and re-run from scratch. Instead, this generates the output list every time the code generation is run and changes the output to be a `.cmake` file that gets included in the main cmake configuration step. That means the build system knows to re-run cmake automatically if a new output is added. So, for example you could change the number of shards that `Operators.cpp` is split into and it all just works transparently to the user. Test Plan: Imported from OSS Reviewed By: zou3519 Differential Revision: D32596268 Pulled By: albanD fbshipit-source-id: 15e0896aeaead90aed64b9c8fda70cf28fef13a2	2021-12-07 15:58:06 -08:00
Alban Desmaison	00ebbd5ef6	Revert D32010095: [pytorch][PR] Add ability for a mobile::Module to save as flatbuffer Test Plan: revert-hammer Differential Revision: D32010095 (`41d35dc201`) Original commit changeset: d763b0557780 fbshipit-source-id: bf746a0389135c9f5f67f00f449435ce08fb5f6d	2021-12-02 06:41:40 -08:00
Han Qi	41d35dc201	Add ability for a mobile::Module to save as flatbuffer (#67351 ) Summary: Included functions: * save_mobile_module -> saves a mobile::Module to flatbuffer * load_mobile_module_from_file -> loads a flatbuffer into mobile::Module * parse_mobile_module -> parses from bytes or deserialized flatbuffer Module object Fixes #{issue number} Pull Request resolved: https://github.com/pytorch/pytorch/pull/67351 Reviewed By: iseeyuan Differential Revision: D32010095 Pulled By: qihqi fbshipit-source-id: d763b0557780f7c2661b6485105b045e41a5e8f1	2021-12-01 23:58:15 -08:00
Michael Suo	ad182479b0	[deploy] docs (#69251 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69251 This adds some actual documentation for deploy, which is probably useful since we told everyone it was experimentally available so they will probably be looking at what the heck it is. It also wires up various compoenents of the OSS build to actually work when used from an external project. Differential Revision: D32783312 D32783312 Test Plan: Imported from OSS Reviewed By: wconstab Pulled By: suo fbshipit-source-id: c5c0a1e3f80fa273b5a70c13ba81733cb8d2c8f8	2021-12-01 21:55:18 -08:00
Peter Bell	e534c5efd7	CMake: Include instead of copying cpu kernel files (#67656 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67656 Currently, each cpu kernel file is copied into the build folder 3 times to give them different compilation flags. This changes it to instead generate 3 files that `#include` the original file. The biggest difference is that updating a copied file requires `cmake` to re-run, whereas include dependencies are natively handled by `ninja`. A side benefit is that included files show up directly in the build dependency graph, whereas `cmake` file copies don't. Test Plan: Imported from OSS Reviewed By: dagitses Differential Revision: D32566108 Pulled By: malfet fbshipit-source-id: ae75368fede37e7ca03be6ade3d4e4a63479440d	2021-11-30 19:13:53 -08:00
Jiakai Liu	3dc0754c53	[pytorch][mobile] deprecate the LLVM-based static analyzer (#68180 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68180 Since we've open sourced the tracing-based selective build, we can deprecate the op-dependency-graph-based selective build and the static analyzer tool that produces the dependency graph. ghstack-source-id: 143108377 Test Plan: CIs Reviewed By: seemethere Differential Revision: D32358467 fbshipit-source-id: c61523706b85a49361416da2230ec1b035b8b99c	2021-11-11 16:37:08 -08:00
Peter Bell	4d601a1c36	codegen: Split up source, header and Declarations.yaml generation (#67497 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67497 This allows more of the code-generation to happen in parallel, whereas previously all codegen was serialized. Test Plan: Imported from OSS Reviewed By: dagitses, mruberry Differential Revision: D32027250 Pulled By: albanD fbshipit-source-id: 6407c4c3e25ad15d542aa73da6ded6a309c8eb6a	2021-11-03 13:20:54 -07:00
Gordon Fossum	ea4d983885	Modify "gemm" code to enable access to "sbgemm_" routine in OpenBLAS (#58831 ) Summary: OpenBLAS recently added support for bfloat16 GEMM, so this change has PyTorch call out to OpenBLAS for that, like it does for single and double precision Our goal is to try to enable PyTorch to make calls to "sbgemm" in OpenBLAS. We are prepared (if it is your preference) to add fences to the code to limit this change to the Power architecture, but our first instinct is that anyone on any architecture that enables access to sbgemm in their OpenBLAS library should be able to use this code. (but again, we respect that as we are just starting to modify PyTorch, we respect your guidance!) (there is no issue number related to this) Pull Request resolved: https://github.com/pytorch/pytorch/pull/58831 Reviewed By: albanD Differential Revision: D29951900 Pulled By: malfet fbshipit-source-id: 3d0a4a638ac95b2ff2e9f6d08827772e28d397c3	2021-11-03 08:53:27 -07:00
Robert Blackwell	cee4e8f35d	Add FlexiBLAS build support per #64752 (#64815 ) Summary: To enable building torch+dependencies, set WITH_BLAS=flexi BLAS=FlexiBLAS Fixes https://github.com/pytorch/pytorch/issues/64752 Pull Request resolved: https://github.com/pytorch/pytorch/pull/64815 Reviewed By: jbschlosser Differential Revision: D31997745 Pulled By: albanD fbshipit-source-id: db208d59002f5896608a03132616400f09d972aa	2021-10-28 11:28:00 -07:00
Xiang Gao	b8dfb45ac2	Refactor cub namespace handling (#66219 ) Summary: This PR is to update PyTorch with the following cub changes: - Starting cub 1.13.1, cub requires users to define `CUB_NS_QUALIFIER` if `CUB_NS_PREFIX` is also defined. Besides that, a new mechanism `CUB_WRAPPED_NAMESPACE` is added. And I do the following change to PyTorch: - Starting CUDA 11.5, define `CUB_WRAPPED_NAMESPACE` globally as an nvcc flag. - Fix caffe2 failures caused by the above change. - Add a `aten/src/ATen/cuda/cub_definitions.cuh` that defines helper macros about feature availability. Pull Request resolved: https://github.com/pytorch/pytorch/pull/66219 Reviewed By: bdhirsh Differential Revision: D31626931 Pulled By: ngimel fbshipit-source-id: 97ebf5ef671ade8bf46d0860edc317f22660f26d	2021-10-25 14:37:09 -07:00
Michael Suo	3ac2c74896	Revert D31082208: Use shared CUPTI by default Test Plan: revert-hammer Differential Revision: D31082208 (`8b0eae5aa8`) Original commit changeset: 14f66af92084 fbshipit-source-id: 0faff00832b7f79d476fd1f9f505142a548a76db	2021-10-12 14:37:54 -07:00
Edward Yang	8b0eae5aa8	Use shared CUPTI by default (#65401 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65401 Per https://github.com/pytorch/pytorch/issues/57744 statically linked CUPTI causes exception handling to break on certain compiler configurations, likely because CUPTI comes with incompatible libstdc++ symbols. Rather than pray that something reasonable happens, use the safer configuration (dynamic linking) by default and give a warning if the user inverts the setting. Signed-off-by: Edward Z. Yang <ezyang@fb.com> Test Plan: Imported from OSS Reviewed By: gdankel Differential Revision: D31082208 Pulled By: ezyang fbshipit-source-id: 14f66af920847e158436b5801c43f3124b109b34	2021-10-12 11:01:40 -07:00
Nikita Shulga	c373387709	Update CMake and use native CUDA language support (#62445 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62445 PyTorch currently uses the old style of compiling CUDA in CMake which is just a bunch of scripts in `FindCUDA.cmake`. Newer versions support CUDA natively as a language just like C++ or C. Test Plan: Imported from OSS Reviewed By: ejguan Differential Revision: D31503350 fbshipit-source-id: 2ee817edc9698531ae1b87eda3ad271ee459fd55	2021-10-11 09:05:48 -07:00
Chen Lai	355acfdebc	[PyTorch Edge][tracing-based] use operator.yaml to build libtorch library (#66237 ) Summary: https://pxl.cl/1QK3N Enable using the yaml file from tracer to build libtorch library for ios and android. 1. Android: ``` SELECTED_OP_LIST=/Users/chenlai/Documents/pytorch/tracing/deeplabv3_scripted_tracing_update.yaml TRACING_BASED=1 ./scripts/build_pytorch_android.sh x86 ``` libtorch_lite.so x86: 3 MB (larger than H1, static is ~3.2 MB) 2. iOS ``` SELECTED_OP_LIST=/Users/chenlai/Documents/pytorch/tracing/deeplabv3_scripted_tracing_update.yaml TRACING_BASED=1 BUILD_PYTORCH_MOBILE=1 IOS_PLATFORM=SIMULATOR ./scripts/build_ios.sh ``` Binary size: 7.6 MB Size: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66237 ghstack-source-id: 140197164 Reviewed By: dhruvbird Differential Revision: D31463119 fbshipit-source-id: c3f4eb71bdef1969eab6cb60999fec8547641cbd	2021-10-10 14:07:01 -07:00
Pruthvi Madugundu	085e2f7bdd	[ROCm] Changes not to rely on CUDA_VERSION or HIP_VERSION (#65610 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65610 - Replace HIP_PLATFORM_HCC with USE_ROCM - Dont rely on CUDA_VERSION or HIP_VERSION and use USE_ROCM and ROCM_VERSION. - In the next PR - Will be removing the mapping from CUDA_VERSION to HIP_VERSION and CUDA to HIP in hipify. - HIP_PLATFORM_HCC is deprecated, so will add HIP_PLATFORM_AMD to support HIP host code compilation on gcc. cc jeffdaily sunway513 jithunnair-amd ROCmSupport amathews-amd Reviewed By: jbschlosser Differential Revision: D30909053 Pulled By: ezyang fbshipit-source-id: 224a966ebf1aaec79beccbbd686fdf3d49267e06	2021-09-29 09:55:43 -07:00
Michael Suo	9b40eaaaab	Revert D31193205: [pytorch][PR] CMake: Limit python include directories to only python libraries Test Plan: revert-hammer Differential Revision: D31193205 (`971c57f1d0`) Original commit changeset: 5c1b554a59d0 fbshipit-source-id: 5719b7df987ded6e7e212749a438db947656df87	2021-09-29 09:49:33 -07:00
Peter Bell	971c57f1d0	CMake: Limit python include directories to only python libraries (#65654 ) Summary: `include_directories` is old-style CMake which adds the include path to every file being compiled. This instead makes python, numpy and pybind11 into targets that only torch_python and caffe2_pybind_state are linked to. So, python libraries can't be accidentally included elsewhere. Pull Request resolved: https://github.com/pytorch/pytorch/pull/65654 Reviewed By: gchanan Differential Revision: D31193205 Pulled By: malfet fbshipit-source-id: 5c1b554a59d0e441a701a04ebb62f0032d38b208	2021-09-29 08:09:08 -07:00
Nikita Shulga	399214efd6	Revert D31172530: [pytorch][PR] Enable CUPTI for kineto by default on windows Test Plan: revert-hammer Differential Revision: D31172530 (`6b60884f12`) Original commit changeset: 2c69ed0282c5 fbshipit-source-id: 649e040a8c44b0f536a8db397b4325309a285934	2021-09-24 19:18:15 -07:00
Guangyun Han	6b60884f12	Enable CUPTI for kineto by default on windows (#65608 ) Summary: Retry of https://github.com/pytorch/pytorch/pull/62175 See https://github.com/pytorch/pytorch/pull/62175#issuecomment-926411151 for more information. malfet gdankel Pull Request resolved: https://github.com/pytorch/pytorch/pull/65608 Reviewed By: zou3519 Differential Revision: D31172530 Pulled By: gdankel fbshipit-source-id: 2c69ed0282c54fa6cdb6e604096d0370e230fd66	2021-09-24 13:00:49 -07:00
Nikita Shulga	bc02255d5e	Revert D30721329: [pytorch][PR] Enable CUPTI for kineto by default on windows. Test Plan: revert-hammer Differential Revision: D30721329 (`7dbc21bc2b`) Original commit changeset: aa1af47df8cc fbshipit-source-id: 565d50841e19a45f8798a490aa3aa6b9f69ca404	2021-09-23 22:14:32 -07:00
Guangyun Han	7dbc21bc2b	Enable CUPTI for kineto by default on windows. (#62175 ) Summary: It fix nothing. For tracking this PR, please refers to https://github.com/pytorch/kineto/issues/356 Pull Request resolved: https://github.com/pytorch/pytorch/pull/62175 Reviewed By: ezyang Differential Revision: D30721329 Pulled By: gdankel fbshipit-source-id: aa1af47df8cc1b6f5ba2194447f62b902a6a9c84	2021-09-23 15:13:47 -07:00
jiej	127c9402d0	Revert "Revert D30752939: [pytorch][PR] nvfuser update" (#65137 ) Summary: This reverts commit `03389dc851`. Attempt again for PR: https://github.com/pytorch/pytorch/issues/63745 Fixes the windows build failure. Pull Request resolved: https://github.com/pytorch/pytorch/pull/65137 Reviewed By: seemethere, dzhulgakov, heitorschueroff Differential Revision: D30994556 Pulled By: malfet fbshipit-source-id: f1925b6c5cc1a1a441a96499667c91e8dfc1b53d	2021-09-22 04:54:51 -07:00
Tao Xu	18fa58c4e9	[CoreML][OSS] Integrate with CMake (#64523 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64523 - Build Pytorch with CoreML delegate - ` USE_PYTORCH_METAL=ON python setup.py install --cmake` - Build iOS static libs - `IOS_PLATFORM=SIMULATOR USE_COREML_DELEGATE=1 ./scripts/build_ios.sh` ghstack-source-id: 138324216 Test Plan: - Test the Helloword example {F657778559} Reviewed By: iseeyuan Differential Revision: D30594041 fbshipit-source-id: 8cece0b2d4b3ef82d3ef4da8c1054919148beb16	2021-09-17 10:32:00 -07:00
Jane Xu	9af6fe991c	Remove CUDA 9.2 and older references from our cmake (#65065 ) Summary: Removes old CUDA references in our cuda.cmake Pull Request resolved: https://github.com/pytorch/pytorch/pull/65065 Reviewed By: malfet Differential Revision: D30992673 Pulled By: janeyx99 fbshipit-source-id: 85b524089ed57e5acbc71720267cf05e24a8c20a	2021-09-16 12:54:49 -07:00
Eli Uriegas	03389dc851	Revert D30752939: [pytorch][PR] nvfuser update Test Plan: revert-hammer Differential Revision: D30752939 (`cfaecaf40b`) Original commit changeset: ce122e80f01b fbshipit-source-id: 57685df8f9946032a06eff1de8a3d1498500d2d2	2021-09-15 17:38:47 -07:00
jiej	cfaecaf40b	nvfuser update (#63745 ) Summary: Syncing nvfuser code base from devel branch, Listing a few of our development since last sync: - Extends support to normalization and reduction kernels. - Multiple kernel launch for single `CudaFusionGroup`. Hierarchical caching system has been updated to cache graph segmentation. - profile_ivalue is enabled to convert dynamic scalar into compile time constants, which are required by the codegen. (e.g. reduction axes). To keep this PR simple and relatively review-free. We stripped most external changes and submitted them as separate PRs, so this gigantic PR is easier to handle. internal updates are files located in: 1. updates in nvfuser codegen `torch/csrc/jit/coddgen/cuda` 2. added nvfuser specific benchmarks `benchmarks/cpp/nvfuser` 3. nvfuser jit cpp tests `test/cpp/jit/test_gpu.cpp` `test/cpp/jit/test_gpu_shift.cpp` `test/cpp/jit/test_gpu_validator.h` updates affecting integration: 1. profile_ivalue enabled for nvfuser. related changes are in `torch/csrc/jit/runtime/`, 2. exposed a few more symbols `aten/src/ATen/core/` used by codegen Pull Request resolved: https://github.com/pytorch/pytorch/pull/63745 Reviewed By: saketh-are Differential Revision: D30752939 Pulled By: malfet fbshipit-source-id: ce122e80f01bcd3865f5bd3c4dfde660665fd84c	2021-09-15 14:42:55 -07:00
Nick Kreeger	882b67dff4	Drop incremental linking on Windows with REL_WITH_DEB_INFO=1. (#64892 ) Summary: The library will no longer link properly on VS 2019 (14.29.30133). To ensure that engineers building on Windows can use and debug with this build type, incremental linking needs to be turned off for this build flag. Verified that this build type successfully builds, links, and provides debuggable Python modules on Windows. Pull Request resolved: https://github.com/pytorch/pytorch/pull/64892 Reviewed By: jbschlosser Differential Revision: D30902565 Pulled By: malfet fbshipit-source-id: e5286a4c6f45c7cbe4cdc1b98560129bd386970b	2021-09-14 09:44:18 -07:00
Hanton Yang	22d38bd10d	[OSS] Enable Metal in PyTorch MacOS nightly builds (#63718 ) Summary: Build on https://github.com/pytorch/pytorch/pull/63825 Pull Request resolved: https://github.com/pytorch/pytorch/pull/63718 Test Plan: 1.Add `ci/binaries` label to PR, so the CI will build those nightly builds 2.Make sure the following CI jobs build with `USE_PYTORCH_METAL_EXPORT` option is `ON`: ``` ci/circleci: binary_macos_arm64_conda_3_8_cpu_nightly_build ci/circleci: binary_macos_arm64_conda_3_9_cpu_nightly_build ci/circleci: binary_macos_arm64_wheel_3_8_cpu_nightly_build ci/circleci: binary_macos_arm64_wheel_3_9_cpu_nightly_build ci/circleci: binary_macos_conda_3_6_cpu_nightly_build ci/circleci: binary_macos_conda_3_7_cpu_nightly_build ci/circleci: binary_macos_conda_3_8_cpu_nightly_build ci/circleci: binary_macos_conda_3_9_cpu_nightly_build ci/circleci: binary_macos_libtorch_3_7_cpu_nightly_build ci/circleci: binary_macos_wheel_3_6_cpu_nightly_build ci/circleci: binary_macos_wheel_3_7_cpu_nightly_build ci/circleci: binary_macos_wheel_3_8_cpu_nightly_build ci/circleci: binary_macos_wheel_3_9_cpu_nightly_build ``` 3.Test `conda` and `wheel` builds locally on [HelloWorld-Metal](https://github.com/pytorch/ios-demo-app/tree/master/HelloWorld-Metal) demo with [(Prototype) Use iOS GPU in PyTorch](https://pytorch.org/tutorials/prototype/ios_gpu_workflow.html) (1) conda ``` conda install https://15667941-65600975-gh.circle-artifacts.com/0/Users/distiller/project/final_pkgs/pytorch-1.10.0.dev20210826-py3.8_0.tar.bz2 ``` (2) wheel ``` pip3 install https://15598647-65600975-gh.circle-artifacts.com/0/Users/distiller/project/final_pkgs/torch-1.10.0.dev20210824-cp38-none-macosx_10_9_x86_64.whl ``` Reviewed By: xta0 Differential Revision: D30593167 Pulled By: hanton fbshipit-source-id: 471da204e94b29c11301c857c50501307a5f0785	2021-08-27 09:25:05 -07:00
Peter Bell	e4f44bec27	Fix pocketfft include path in mobile build (#63714 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63714 PocketFFT was disabled for CMake < 3.9 but CMake 3.11 is the first version to support `INCLUDE_DIRECTORIES` as a target property. So updating to CMake 3.10 causes the mobile builds to fail. Instead of limiting the CMake support, this just adds the include directory to the entire target, Test Plan: Imported from OSS Reviewed By: bdhirsh Differential Revision: D30498369 Pulled By: malfet fbshipit-source-id: 83372e29c477c97e7015763b7c29d6d7e456bcef	2021-08-23 17:48:57 -07:00
Nikita Shulga	bec75daa77	Update protobuf to 3.13.1 (#62571 ) Summary: Update bazel to 4.10.0 Update ASAN_SYMBOLIZER_PATH to llvm-7 Suppress `vptr` ubsan violations in `test_jit` Fix ProtoBuf patching for ONNX which caused Windows builds to crash while attempting to free `std::string` allocated on stack Fixes https://github.com/pytorch/pytorch/issues/62569 Pull Request resolved: https://github.com/pytorch/pytorch/pull/62571 Reviewed By: walterddr Differential Revision: D30048685 Pulled By: malfet fbshipit-source-id: 6462c1bef9c42318551d2cf906bbab41e1d4e1cd	2021-08-19 23:43:55 -07:00
driazati	bd8608cd5c	Use CMake for breakpad (#63186 ) Summary: We currently build breakpad from [this fork](https://github.com/driazati/breakpad) to include extra logic to restore signal handlers that were previously present. With some [new additions](https://github.com/google/breakpad/compare/main...driazati:main) this fork now includes a CMake based build, so we can add breakpad as a proper dependency rather than rely on including it in Docker images as a system library which is error prone (we have a bunch of images) and hard to extend to MacOS / Windows. This also includes some changes to the crash handling code to support MacOS / Windows in a similar way to Linux. ```python import torch # On Windows this writes crashes to C:\Users\<user>\AppData\pytorch_crashes # On MacOS/Linux this writes crashes to /tmp/pytorch_crashes torch.utils._crash_handler.enable_minidumps() # Easy way to cause a segfault and trigger the handler torch.bincount(input=torch.tensor([9223372036854775807])) ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/63186 Reviewed By: malfet, seemethere Differential Revision: D30318404 Pulled By: driazati fbshipit-source-id: 0d7daf3701cfaba5451cc529a0730272ab1eb1dc	2021-08-19 10:42:01 -07:00
peterjc123	9bb1371cc2	Disable RDYNAMIC check with MSVC (#62949 ) Summary: When testing with clang-cl, the flag is added though it is unsupported and that generates a few warnings. Tried a few alternatives like https://cmake.org/cmake/help/latest/module/CheckLinkerFlag.html, but they just don't work. Pull Request resolved: https://github.com/pytorch/pytorch/pull/62949 Reviewed By: zhouzhuojie, driazati Differential Revision: D30359206 Pulled By: malfet fbshipit-source-id: 1bd27ad5772fe6757fa8c3a4bddf904f88d70b7b	2021-08-18 11:51:23 -07:00
Nikita Shulga	6e5d065b2b	Add pocketfft as submodule (#62841 ) Summary: Using https://github.com/mreineck/pocketfft Also delete explicit installation of pocketfft during the build as it will be available via submodule Limit PocketFFT support to cmake-3.10 or newer, as `set_source_files_properties` does not seem to work as expected with cmake-3.5 Partially addresses https://github.com/pytorch/pytorch/issues/62821 Pull Request resolved: https://github.com/pytorch/pytorch/pull/62841 Reviewed By: seemethere Differential Revision: D30140441 Pulled By: malfet fbshipit-source-id: d1a1cf1b43375321f5ec5b3d0b538f58082f7825	2021-08-17 15:29:56 -07:00
Kimish Patel	38c185189c	[Pytorch Edge] Enable kineto profiler on mobile via EdgeKinetoProfiler (#62419 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62419 This diff adds support for cpu only kineto profiler on mobile. Thus enabling chrome trace generation on mobile. This bring cpp API for mobile profiling on part with Torchscript. This is done via: 1. Utilizating debug handle annotations in KinetoEvent. 2. Adding post processing capability, via callbacks, to KinetoThreadLocalState 3. Creating new RAII stype profiler, KinetoEdgeCPUProfiler, which can be used in surrounding scope of model execution. This will write chrome trace to the location specified in profiler constructor. Test Plan: MobileProfiler.ModuleHierarchy Imported from OSS Reviewed By: raziel Differential Revision: D29993660 fbshipit-source-id: 0b44f52f9e9c5f5aff81ebbd9273c254c3c03299	2021-08-13 21:40:19 -07:00
Pruthvi Madugundu	ab7a472980	[ROCm] Update HIP_VERSION to TORCH_HIP_VERSION (#62786 ) Summary: - HIP_VERSION semantic versioning will change in ROCm4.3. The changes essentially remove the dependency on HIP_VERSION provided in the hip header to keep code compatible with older and newer versions of ROCm. - TORCH_HIP_VERSION is derived from HIP_VERSION_MAJOR and HIP_VERSION_MINOR Pull Request resolved: https://github.com/pytorch/pytorch/pull/62786 Reviewed By: bdhirsh Differential Revision: D30281682 Pulled By: seemethere fbshipit-source-id: e41e69fb9e13de5ddd1af99ba5bbdcbb7b64b673	2021-08-13 15:00:43 -07:00
Isuru Fernando	b58e04f156	Make sure FindLAPACK finds the same BLAS library (#49647 ) Summary: BLAS library is found by cmake/Dependencies.cmake and then LAPACK library is found by FindLAPACK.cmake which in turn calls FindBLAS.cmake. This means that we are searching for BLAS twice and they might be different things. By setting a few variables, this can be avoided. cc seemethere Pull Request resolved: https://github.com/pytorch/pytorch/pull/49647 Reviewed By: seemethere, ejguan Differential Revision: D29943680 Pulled By: malfet fbshipit-source-id: 3cbc350ea645a1a28dd92c19e5ee7f9eecdeff59	2021-08-02 20:41:00 -07:00
Can Balioglu	7565039ee9	Support system-provided Intel TBB (#61934 ) Summary: This PR: (1) enables the use of a system-provided Intel TBB for building PyTorch, (2) removes `tbb:task_scheduler_init` references since it has been removed from TBB a while ago (3) marks the implementation of `_internal_set_num_threads` with a TODO as it requires a revision that fixes its thread allocation logic. Tested with `test/run_test`; no new tests are introduced since there are no behavioral changes (removal of `tbb::task_scheduler_init` has no impact on the runtime behavior). Pull Request resolved: https://github.com/pytorch/pytorch/pull/61934 Reviewed By: malfet Differential Revision: D29805416 Pulled By: cbalioglu fbshipit-source-id: 22042b428b57b8fede9dfcc83878d679a19561dd	2021-08-02 07:39:00 -07:00
Brian Vaughan	2eef1f27f8	Disable ccache for nccl builds (#62208 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62208 reverts https://github.com/pytorch/pytorch/pull/55814 which removed a workaround for: https://github.com/pytorch/pytorch/issues/13362 Test Plan: Imported from OSS Reviewed By: ejguan Differential Revision: D29935472 Pulled By: nairbv fbshipit-source-id: 7ce9cde1408f17153632036fd128814032739746	2021-07-27 08:07:26 -07:00
Jane Xu	e318058ffe	Ignore LNK4099 for debug binary libtorch builds (#62060 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/61979 Pull Request resolved: https://github.com/pytorch/pytorch/pull/62060 Test Plan: This CI shouldn't break and https://github.com/pytorch/pytorch/pull/62061 Reviewed By: driazati Differential Revision: D29877487 Pulled By: janeyx99 fbshipit-source-id: 497f84caab3f9ae609644fd397ad87a6dc8a2a77	2021-07-23 09:31:41 -07:00
imaginary-person	9e53c823b8	Add AVX512 support in ATen & remove AVX support (#61903 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61903 ### Remaining Tasks - [ ] Collate results of benchmarks on two Intel Xeon machines (with & without CUDA, to check if CPU throttling causes issues with GPUs) - make graphs, including Roofline model plots (Intel Advisor can't make them with libgomp, though, but with Intel OpenMP). ### Summary 1. This draft PR produces binaries with with 3 types of ATen kernels - default, AVX2, AVX512 . Using the environment variable `ATEN_AVX512_256=TRUE` also results in 3 types of kernels, but the compiler can use 32 ymm registers for AVX2, instead of the default 16. ATen kernels for `CPU_CAPABILITY_AVX` have been removed. 2. `nansum` is not using AVX512 kernel right now, as it has poorer accuracy for Float16, than does AVX2 or DEFAULT, whose respective accuracies aren't very good either (#59415). It was more convenient to disable AVX512 dispatch for all dtypes of `nansum` for now. 3. On Windows , ATen Quantized AVX512 kernels are not being used, as quantization tests are flaky. If `--continue-through-failure` is used, then `test_compare_model_outputs_functional_static` fails. But if this test is skipped, `test_compare_model_outputs_conv_static` fails. If both these tests are skipped, then a third one fails. These are hard to debug right now due to not having access to a Windows machine with AVX512 support, so it was more convenient to disable AVX512 dispatch of all ATen Quantized kernels on Windows for now. 4. One test is currently being skipped - [test_lstm` in `quantization.bc](https://github.com/pytorch/pytorch/issues/59098) - It fails only on Cascade Lake machines, irrespective of the `ATEN_CPU_CAPABILITY` used, because FBGEMM uses `AVX512_VNNI` on machines that support it. The value of `reduce_range` should be used as `False` on such machines. The list of the changes is at https://gist.github.com/imaginary-person/4b4fda660534f0493bf9573d511a878d. Credits to ezyang for proposing `AVX512_256` - these use AVX2 intrinsics but benefit from 32 registers, instead of the 16 ymm registers that AVX2 uses. Credits to limo1996 for the initial proposal, and for optimizing `hsub_pd` & `hadd_pd`, which didn't have direct AVX512 equivalents, and are being used in some kernels. He also refactored `vec/functional.h` to remove duplicated code. Credits to quickwritereader for helping fix 4 failing complex multiplication & division tests. ### Testing 1. `vec_test_all_types` was modified to test basic AVX512 support, as tests already existed for AVX2. Only one test had to be modified, as it was hardcoded for AVX2. 2. `pytorch_linux_bionic_py3_8_gcc9_coverage_test1` & `pytorch_linux_bionic_py3_8_gcc9_coverage_test2` are now using `linux.2xlarge` instances, as they support AVX512. They were used for testing AVX512 kernels, as AVX512 kernels are being used by default in both of the CI checks. Windows CI checks had already been using machines with AVX512 support. ### Would the downclocking caused by AVX512 pose an issue? I think it's important to note that AVX2 causes downclocking as well, and the additional downclocking caused by AVX512 may not hamper performance on some Skylake machines & beyond, because of the double vector-size. I think that [this post with verifiable references is a must-read](https://community.intel.com/t5/Software-Tuning-Performance/Unexpected-power-vs-cores-profile-for-MKL-kernels-on-modern-Xeon/m-p/1133869/highlight/true#M6450). Also, AVX512 would _probably not_ hurt performance on a high-end machine, [but measurements are recommended](https://lemire.me/blog/2018/09/07/avx-512-when-and-how-to-use-these-new-instructions/). In case it does, `ATEN_AVX512_256=TRUE` can be used for building PyTorch, as AVX2 can then use 32 ymm registers instead of the default 16. [FBGEMM uses `AVX512_256` only on Xeon D processors](https://github.com/pytorch/FBGEMM/pull/209), which are said to have poor AVX512 performance. This [official data](https://www.intel.com/content/dam/www/public/us/en/documents/specification-updates/xeon-scalable-spec-update.pdf) is for the Intel Skylake family, and the first link helps understand its significance. Cascade Lake & Ice Lake SP Xeon processors are said to be even better when it comes to AVX512 performance. Here is the corresponding data for [Cascade Lake](https://cdrdv2.intel.com/v1/dl/getContent/338848) - ![CASCADE LAKE AVX2](https://user-images.githubusercontent.com/76181208/120666172-ffec3f80-c451-11eb-8ea1-8933ccc12a1b.PNG) ![CASCADE LAKE AVX512](https://user-images.githubusercontent.com/76181208/120666190-04b0f380-c452-11eb-9faa-38d233c874c8.PNG) The corresponding data isn't publicly available for Intel Xeon SP 3rd gen (Ice Lake SP), but [Intel mentioned that the 3rd gen has frequency improvements pertaining to AVX512](https://newsroom.intel.com/wp-content/uploads/sites/11/2021/04/3rd-Gen-Intel-Xeon-Scalable-Platform-Press-Presentation-281884.pdf). Ice Lake SP machines also have 48 KB L1D caches, so that's another reason for AVX512 performance to be better on them. ### Is PyTorch always faster with AVX512? No, but then PyTorch is not always faster with AVX2 either. Please refer to #60202. The benefit from vectorization is apparent with with small tensors that fit in caches or in kernels that are more compute heavy. For instance, AVX512 or AVX2 would yield no benefit for adding two 64 MB tensors, but adding two 1 MB tensors would do well with AVX2, and even more so with AVX512. It seems that memory-bound computations, such as adding two 64 MB tensors can be slow with vectorization (depending upon the number of threads used), as the effects of downclocking can then be observed. Original pull request: https://github.com/pytorch/pytorch/pull/56992 Reviewed By: soulitzer Differential Revision: D29266289 Pulled By: ezyang fbshipit-source-id: 2d5e8d1c2307252f22423bbc14f136c67c3e6184	2021-07-22 08:51:49 -07:00
Hong Xu	7acb8b71e1	Remove AVX detection code that duplicates FindAVX.cmake (#61748 ) Summary: This PR deletes some code in `MiscCheck.cmake` that perform the exact same functionality as `FindAVX.cmake`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/61748 Reviewed By: ejguan Differential Revision: D29791282 Pulled By: malfet fbshipit-source-id: 6595fd1b61c8ae12b821fad8c9a34892dd52d213	2021-07-20 14:34:36 -07:00
Hong Xu	f912889726	Remove unnecessary Ubuntu version checks (#61738 ) Summary: PR https://github.com/pytorch/pytorch/issues/5401 missed another Ubuntu version check in `cmake/MiscCheck.cmake`. The check for available functions added by https://github.com/pytorch/pytorch/issues/5401 are already present below the code snippet that this PR deletes. Pull Request resolved: https://github.com/pytorch/pytorch/pull/61738 Reviewed By: mrshenli Differential Revision: D29757525 Pulled By: ezyang fbshipit-source-id: 7f5f9312284973481a8b8a2b9c51cc09774722e9	2021-07-19 13:04:24 -07:00
Tongliang Liao	0afbb9e81e	`PYTHON_LIBRARY` may be set to empty or NOTFOUND. (#61230 ) Summary: Not sure why (maybe from dependencies?) but it can certainly break package lookup upon re-entry of cmake. So instead of checking whether they are defined, we should check whether there is any meaningful value inside. Fixes https://github.com/pytorch/pytorch/issues/59887 Pull Request resolved: https://github.com/pytorch/pytorch/pull/61230 Reviewed By: H-Huang Differential Revision: D29668766 Pulled By: malfet fbshipit-source-id: 79a59578740c4434327aff4f9a22eba9c4bf48d1	2021-07-13 07:09:31 -07:00
Luca Wehrstedt	c830db0265	Raise error in CMake for CUDA <9.2 (#61462 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61462 Anything before CUDA 9.2 is not supported (see https://github.com/pytorch/pytorch/pull/36848), and perhaps not even that. ghstack-source-id: 133312018 Test Plan: CI Reviewed By: samestep Differential Revision: D29637251 fbshipit-source-id: 4300169b7298274b2074649342902a34bd2220b5	2021-07-09 11:28:38 -07:00
shmsong	ee2dd35ef4	Resolving native dependency and try_run for cross compile (#59764 ) Summary: This is a PR on build system that provides support for cross compiling on Jetson platforms. The major change is: 1. Disable try runs for cross compiling in `COMPILER_WORKS`, `BLAS`, and `CUDA`. They will not be able to perform try run on a cross compile setup Pull Request resolved: https://github.com/pytorch/pytorch/pull/59764 Reviewed By: soulitzer Differential Revision: D29524363 Pulled By: malfet fbshipit-source-id: f06d1ad30b704c9a17d77db686c65c0754db07b8	2021-07-09 09:29:21 -07:00
Elton Leander Pinto	7481c6fc02	Bump googletest version to v1.11.0 (#61395 ) Summary: This PR bumps the `googletest` version to v1.11.0. To facilitate this change, `CAFFE2_ASAN_FLAG` and `CAFFE2_TSAN_FLAG` are divided into corresponding compiler and linker variants. This is required because `googletest v1.11.0` sets the `-Werror` flag. The `-pie` flag is a linker flag, and passing it to a compiler invocation results in a `-Wunused-command-line-argument` warning, which in turn will cause `googletest` to fail to build with ASAN. Fixes https://github.com/pytorch/pytorch/issues/60865 Pull Request resolved: https://github.com/pytorch/pytorch/pull/61395 Reviewed By: iramazanli Differential Revision: D29620970 Pulled By: 1ntEgr8 fbshipit-source-id: cdb1d3d12e0fff834c2e62971e42c03f8c3fbf1b	2021-07-08 16:29:17 -07:00
zhouzhuojie	6107cf3750	Add --jobs 0 for git submodule update (#61311 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61311 Pull Request resolved: https://github.com/pytorch/pytorch/pull/61152 Some related docs about `submodule.fetchJobs` https://git-scm.com/docs/git-config#Documentation/git-config.txt-submodulefetchJobs ``` time git submodule update --init --recursive ________________________________________________________ Executed in 243.20 secs fish external usr time 49.64 secs 213.00 micros 49.64 secs sys time 29.27 secs 795.00 micros 29.27 secs ``` ``` time git submodule update --init --recursive --jobs 4 ________________________________________________________ Executed in 143.04 secs fish external usr time 51.06 secs 246.00 micros 51.06 secs sys time 30.96 secs 742.00 micros 30.96 secs ``` ``` time git submodule update --init --recursive --jobs 8 ________________________________________________________ Executed in 124.64 secs fish external usr time 51.76 secs 264.00 micros 51.76 secs sys time 30.49 secs 739.00 micros 30.49 secs ``` ``` time git submodule update --init --recursive --jobs 0 # use all online cpus ________________________________________________________ Executed in 129.75 secs fish external usr time 51.64 secs 181.00 micros 51.64 secs sys time 31.49 secs 781.00 micros 31.49 secs ``` Test Plan: Imported from OSS Reviewed By: 1ntEgr8 Differential Revision: D29560875 Pulled By: zhouzhuojie fbshipit-source-id: 556027dffe744c66428075a8a1bf64683930aaaf	2021-07-07 16:28:18 -07:00
Nikita Shulga	4036820506	Add PocketFFT support (#60976 ) Summary: Needed on platforms, that do not have MKL, such as aarch64 and M1 - Add `AT_POCKETFFT_ENABLED()` to Config.h.in - Introduce torch._C.has_spectral that is true if PyTorch was compiled with either MKL or PocketFFT - Modify spectral test to use skipCPUIfNoFFT instead of skipCPUIfNoMKL Share implementation of `_out` functions as well as fft_fill_with_conjugate_symmetry_stub between MKL and PocketFFT implementations Fixes https://github.com/pytorch/pytorch/issues/41592 Pull Request resolved: https://github.com/pytorch/pytorch/pull/60976 Reviewed By: walterddr, driazati, janeyx99, samestep Differential Revision: D29466530 Pulled By: malfet fbshipit-source-id: ac5edb3d40e7c413267825f92a5e8bc4bb249caf	2021-06-30 16:28:20 -07:00
Peter Bell	4a7d281119	Migrate THAllocator to ATen (#60325 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60325 Test Plan: Imported from OSS Reviewed By: albanD Differential Revision: D29371715 Pulled By: ngimel fbshipit-source-id: 78ec8368a48e1a4690d0664a0b02d2a235af98ff	2021-06-24 19:42:14 -07:00
Nikita Shulga	40a7c317bc	Run BLAS F2C checks on host architecture (#60703 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/60351 Pull Request resolved: https://github.com/pytorch/pytorch/pull/60703 Reviewed By: driazati Differential Revision: D29379727 Pulled By: malfet fbshipit-source-id: dadbb1d39373887f07d59d0a05e093a5d070b016	2021-06-24 18:44:41 -07:00
Peter Bell	31a884987d	Remove some TH includes from ATen (#60323 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60323 Test Plan: Imported from OSS Reviewed By: malfet, anjali411 Differential Revision: D29252862 Pulled By: ngimel fbshipit-source-id: 9ea13495d382c04dfd52b8dd63314f53b7e83936	2021-06-22 10:55:17 -07:00
Luca Wehrstedt	08ce5eedf5	[reland] Move RPC agents to libtorch (#60170 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60170 Reland of #59939. Test Plan: CI Reviewed By: mrshenli Differential Revision: D29193234 fbshipit-source-id: ee2a90d6be961c10f91361512bdd4cadca43dd60	2021-06-18 05:15:09 -07:00
Mike Ruberry	f233274f30	Revert D28875276: Move RPC agents to libtorch Test Plan: revert-hammer Differential Revision: D28875276 (`fc50f91929`) Original commit changeset: f2f6970fd74d fbshipit-source-id: 3c52af652579733ebea8ddfb06576a0ce262bf78	2021-06-17 00:48:58 -07:00
Luca Wehrstedt	fc50f91929	Move RPC agents to libtorch (#59939 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59939 Test Plan: CI Reviewed By: mrshenli Differential Revision: D28875276 fbshipit-source-id: f2f6970fd74de5f112636e78edaa4410c61d8c45	2021-06-15 16:20:53 -07:00
Nikita Shulga	1ea5c19c19	Add USE_WHOLE_CUDNN option (#59744 ) Summary: It is only enabled if USE_STATIC_CUDNN is enabled Next step after https://github.com/pytorch/pytorch/pull/59721 towards resolving fast kernels stripping reported in https://github.com/pytorch/pytorch/issues/50153 Pull Request resolved: https://github.com/pytorch/pytorch/pull/59744 Reviewed By: seemethere, ngimel Differential Revision: D29007314 Pulled By: malfet fbshipit-source-id: 7091e299c0c6cc2a8aa82fbf49312cecf3bb861a	2021-06-09 21:12:42 -07:00
Nikita Shulga	8845cbabf0	[CMake] Split caffe2::cudnn into public and private (#59721 ) Summary: This is only important for builds where cuDNN is linked statically into libtorch_cpu. Before this PR PyTorch wheels often accidentally contained several partial copies of cudnn_static library. Splitting the interface into header only (cudnn-public) and library+headers(cudnn-private) prevents those from happening. Preliminary step towards enabling optional linking whole cudnn_library to workaround issue reported in https://github.com/pytorch/pytorch/issues/50153 Pull Request resolved: https://github.com/pytorch/pytorch/pull/59721 Reviewed By: ngimel Differential Revision: D29000967 Pulled By: malfet fbshipit-source-id: f054df92b265e9494076ab16c247427b39da9336	2021-06-09 13:18:48 -07:00
Jiakai Liu	501320ed81	[pytorch] deprecate default_op_deps.yaml (#59573 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59573 To do mobile selective build, we have several options: 1. static dispatch; 2. dynamic dispatch + static analysis (to create the dependency graph); 3. dynamic dispatch + tracing; We are developing 3. For open source, we used to only support 1, and currently we support both 1 and 2. This file is only used for 2. It was introduced when we deprecated the static dispatch (1). The motivation was to make sure we have a low-friction selective build workflow for dynamic dispatch (2). As the name indicates, it is the default dependency graph that users can try if they don't bother to run the static analyzer themselves. We have a CI to run the full workflow of 2 on every PR, which creates the dependency graph on-the-fly instead of using the committed file. Since the workflow to automatically update the file has been broken for a while, it started to confuse other pytorch developers as people are already manually editing it, and it might be broken for some models already. We reintroduced the static dispatch recently, so we decide to deprecate this file now and automatically turn on static dispatch if users run selective build without providing the static analysis graph. The tracing-based selective build will be the ultimate solution we'd like to provide for OSS, but it will take some more effort to polish and release. Differential Revision: D28941020 D28941020 Test Plan: Imported from OSS Reviewed By: dhruvbird Pulled By: ljk53 fbshipit-source-id: 9977ab8568e2cc1bdcdecd3d22e29547ef63889e	2021-06-07 19:37:37 -07:00
Jeff Daily	24e27af683	[ROCm] enable kernel asserts (#49624 ) Summary: Addresses missing ROCm feature indicated in https://github.com/pytorch/pytorch/issues/38943. Pull Request resolved: https://github.com/pytorch/pytorch/pull/49624 Reviewed By: agolynski Differential Revision: D28902459 Pulled By: malfet fbshipit-source-id: 29c9b552770241a0ec52cd057ea45efc4389d838	2021-06-07 13:43:07 -07:00
Nikita Shulga	63956610a7	Search for static OpenBLAS compiled with OpenMP (#59428 ) Summary: Before that, only dynamically linked OpenBLAS compield with OpenMP could be found. Also get rid of hardcoded codepath for libgfortran.a in FindLAPACK.cmake Only affects aarch64 linux builds Pull Request resolved: https://github.com/pytorch/pytorch/pull/59428 Reviewed By: agolynski Differential Revision: D28891314 Pulled By: malfet fbshipit-source-id: 5af55a14c85ac66551ad2805c5716bbefe8d55b2	2021-06-04 08:09:21 -07:00
Michael Wootton	e66015dadf	Add build support for kineto + rocm (#58401 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/58399 CMake changes to allow kineto to build with rocm support. Pull Request resolved: https://github.com/pytorch/pytorch/pull/58401 Reviewed By: mruberry Differential Revision: D28479807 Pulled By: walterddr fbshipit-source-id: fc01f05b2a5592ee1d1dbd71d2d4f7aec1bd74f7	2021-06-03 12:15:20 -07:00
Eli Uriegas	eb1adc4c5e	cmake: Add USE_GLOO_WITH_OPENSSL to Summary.cmake (#59321 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59321 Signed-off-by: Eli Uriegas <eliuriegas@fb.com> Test Plan: Imported from OSS Reviewed By: malfet Differential Revision: D28839370 Pulled By: seemethere fbshipit-source-id: 0d4b35c05c2b1a78b752088cd16cd6263958e7f6	2021-06-02 11:10:55 -07:00
neginraoof	599f5058cf	[ONNX] Update ONNX to rel-1.9 (#55889 ) (#57080 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/57080 ONNX optimizer is removed in ONNX 1.9 This PR removes ONNX optimizer from a C++ code path and uses `try-except` block in Python to keep it compatible with both ONNX-1.8 and 1.9. Test Plan: Imported from OSS Reviewed By: heitorschueroff Differential Revision: D28467330 Pulled By: malfet fbshipit-source-id: 5e4669dd0537648898e593f9e253da18d6dc7568 Co-authored-by: neginraoof <neginmr@utexas.edu> Co-authored-by: Nikita Shulga <nshulga@fb.com>	2021-06-02 08:27:17 -07:00
Jeff Daily	ba694520e5	[ROCm] fix JIT codegen (#57400 ) Summary: Fixes upcoming changes that are part of ROCm 4.2 and affect PyTorch JIT. - ROCM_VERSION macro must be available to both device and host compilation passes. - Unifies some of CUDA and HIP differences in the code generated. - NAN / POS_INFINITY / NEG_INFINITY - Do not hipify `extern __shared__` -> `HIP_DYNAMIC_SHARED()` macro [deprecated] - Differentiates bf16 codegen for HIP. - Optionally provides missing macros when using hiprtc precompiled header feature. Pull Request resolved: https://github.com/pytorch/pytorch/pull/57400 Reviewed By: ejguan Differential Revision: D28421065 Pulled By: malfet fbshipit-source-id: 215f476773c61d8b0d9d148a4e5f5d016f863074	2021-05-27 11:45:07 -07:00
Nikita Shulga	7179e7ea7b	[CMake] Prefer third_party/pybind11 by default (#58951 ) Summary: To make build behaviour aligned with other third_party/ libraries, introduce `USE_SYSTEM_PYBIND11 (`d55b25a633`)` build option, which set to OFF by default, which means PyTorch will be build with bundled pybind11 even if other version is already installed locally. Fixes https://github.com/pytorch/pytorch/issues/58750 Pull Request resolved: https://github.com/pytorch/pytorch/pull/58951 Reviewed By: driazati Differential Revision: D28690411 Pulled By: malfet fbshipit-source-id: e56b5a8f2a23ee1834b2a6d3807f287149decf8c	2021-05-25 15:10:17 -07:00
Nikita Shulga	2dda8d7571	Move cublas dependency after CuDNN (#58287 ) Summary: Library linking order matters during static linking Not sure whether its a bug or a feature, but if cublas is reference before CuDNN, it will be partially statically linked into the library, even if it is not used Pull Request resolved: https://github.com/pytorch/pytorch/pull/58287 Reviewed By: janeyx99 Differential Revision: D28433165 Pulled By: malfet fbshipit-source-id: 8dffa0533075126dc383428f838f7d048074205c	2021-05-24 09:39:09 -07:00
Gordon Fossum	007fe949aa	Adding a new include directory in BLIS search path (#58166 ) Summary: While trying to build PyTorch with BLIS as the backend library, we found a build issue due to some missing include files. This was caused by a missing directory in the search path. This patch adds that path in FindBLIS.cmake. Fixes #{issue number} Pull Request resolved: https://github.com/pytorch/pytorch/pull/58166 Reviewed By: zou3519 Differential Revision: D28640460 Pulled By: malfet fbshipit-source-id: d0cd3a680718a0a45788c46a502871b88fbadd52	2021-05-24 08:57:02 -07:00
Nathan John Sircombe	bf00d26deb	Enables builds with Compute Library backend for oneDNN (#55913 ) Summary: Since v1.7, oneDNN (MKL-DNN) has supported the use of Compute Library for the Arm architeture to provide optimised convolution primitives on AArch64. This change enables the use of Compute Library in the PyTorch build. Following the approach used to enable the use of CBLAS in MKLDNN, It is enabled by setting the env vars USE_MKLDNN and USE_MKLDNN_ACL. The location of the Compute Library build must be set useing `ACL_ROOT_DIR`. This is an extension of the work in https://github.com/pytorch/pytorch/pull/50400 which added support for the oneDNN/MKL-DNN backend on AArch64. _Note: this assumes that Compute Library has been built and installed at ACL_ROOT_DIR. Compute library can be downloaded here: `https://github.com/ARM-software/ComputeLibrary`_ Fixes #{issue number} Pull Request resolved: https://github.com/pytorch/pytorch/pull/55913 Reviewed By: ailzhang Differential Revision: D28559516 Pulled By: malfet fbshipit-source-id: 29d24996097d0a54efc9ab754fb3f0bded290005	2021-05-20 07:43:56 -07:00
Xiang Gao	6c70cbedb6	step 0 of cuDNN v8 convolution API integration (#51390 ) Summary: This PR is step 0 of adding PyTorch convolution bindings using the cuDNN frontend. The cuDNN frontend is the recommended way of using cuDNN v8 API. It is supposed to have faster release cycles, so that, for example, if people find a specific kernel has a bug, they can report it, and that kernel will be blocked in the cuDNN frontend and frameworks could just update that submodule without the need for waiting for a whole cuDNN release. The work is not complete, and this PR is only step 0. What this PR does: - Add cudnn-frontend as a submodule. - Modify cmake to build that submodule. - Add bindings for convolution forward in `Conv_v8.cpp`, which is disabled by a macro by default. - Tested manually by enabling the macro and run `test_nn.py`. All tests pass except those mentioned below. What this PR doesn't: - Only convolution forward, no backward. The backward will use v7 API. - No 64bit-indexing support for some configuration. This is a known issue of cuDNN, and will be fixed in a later cuDNN version. PyTorch will not implement any workaround for issue, but instead, v8 API should be disabled on problematic cuDNN versions. - No test beyond PyTorch's unit tests. - Not tested for correctness on real models. - Not benchmarked for performance. - Benchmark cache is not thread-safe. (This is marked as `FIXME` in the code, and will be fixed in a follow-up PR) - cuDNN benchmark is not supported. - There are failing tests, which will be resolved later: ``` FAILED test/test_nn.py::TestNNDeviceTypeCUDA::test_conv_cudnn_nhwc_cuda_float16 - AssertionError: False is not true : Tensors failed to compare as equal!With rtol=0.001 and atol=1e-05, found 32 element(s) (out of 32) whose difference(s) exceeded the margin of error (in... FAILED test/test_nn.py::TestNNDeviceTypeCUDA::test_conv_cudnn_nhwc_cuda_float32 - AssertionError: False is not true : Tensors failed to compare as equal!With rtol=1.3e-06 and atol=1e-05, found 32 element(s) (out of 32) whose difference(s) exceeded the margin of error (... FAILED test/test_nn.py::TestNNDeviceTypeCUDA::test_conv_large_cuda - RuntimeError: CUDNN_BACKEND_OPERATION: cudnnFinalize Failed cudnn_status: 9 FAILED test/test_nn.py::TestNN::test_Conv2d_depthwise_naive_groups_cuda - AssertionError: False is not true : Tensors failed to compare as equal!With rtol=0 and atol=1e-05, found 64 element(s) (out of 64) whose difference(s) exceeded the margin of error (including 0 an... FAILED test/test_nn.py::TestNN::test_Conv2d_deterministic_cudnn - RuntimeError: not supported yet FAILED test/test_nn.py::TestNN::test_ConvTranspose2d_groups_cuda_fp32 - RuntimeError: cuDNN error: CUDNN_STATUS_BAD_PARAM FAILED test/test_nn.py::TestNN::test_ConvTranspose2d_groups_cuda_tf32 - RuntimeError: cuDNN error: CUDNN_STATUS_BAD_PARAM ``` Although this is not a complete implementation of cuDNN v8 API binding, I still want to merge this first. This would allow me to do small and incremental work, for the ease of development and review. Pull Request resolved: https://github.com/pytorch/pytorch/pull/51390 Reviewed By: malfet Differential Revision: D28513167 Pulled By: ngimel fbshipit-source-id: 9cc20c9dec5bbbcb1f94ac9e0f59b10c34f62740	2021-05-19 12:54:09 -07:00
peter	432676599c	Stop installing libuv on Windows (#51936 ) Summary: Fixes #{issue number} gunandrose4u Pull Request resolved: https://github.com/pytorch/pytorch/pull/51936 Reviewed By: malfet Differential Revision: D28467662 Pulled By: seemethere fbshipit-source-id: 28d203ee3af13d6a3158f188c2e889e310ee6010	2021-05-17 08:52:29 -07:00
Ilia Cherniavskii	6997e7bd39	Update Kineto submodule (#58179 ) Summary: Update Kineto submodule, minor api changes Pull Request resolved: https://github.com/pytorch/pytorch/pull/58179 Test Plan: CI Reviewed By: gdankel Differential Revision: D28391369 Pulled By: ilia-cher fbshipit-source-id: 61fbf63d9ec2db66fac203944679e4b99cb0d568	2021-05-13 04:03:04 -07:00
Ivan Kobzarev	111c99cdfd	[vulkan] Fix glslc path for desktop build (#56507 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/56507 Test Plan: Imported from OSS Reviewed By: SS-JIA Differential Revision: D27951058 Pulled By: IvanKobzarev fbshipit-source-id: 29443b61264bb28ae4982ed9f4c21f1c45f6b519	2021-05-11 14:18:39 -07:00
Ilia Cherniavskii	c714596027	[kineto] Update Kineto submodule, cupti library paths (#57789 ) Summary: Update kineto submodule, improve cupti detection Pull Request resolved: https://github.com/pytorch/pytorch/pull/57789 Test Plan: CI Reviewed By: ngimel Differential Revision: D28297175 Pulled By: ilia-cher fbshipit-source-id: 5895270fae160097ae8872a592984d0e4a1b187b	2021-05-10 19:15:59 -07:00
Ilia Cherniavskii	65fad0ebd2	Expand Kineto platform support (ci-all) (#56323 ) Summary: Expanding support to all builds Pull Request resolved: https://github.com/pytorch/pytorch/pull/56323 Test Plan: CI Reviewed By: malfet Differential Revision: D28171478 Pulled By: ilia-cher fbshipit-source-id: 16bc752d1be3cbaeda5316f5d8a687ae05a83d22	2021-05-05 15:00:01 -07:00
Nikita Shulga	133d8abbfc	Compute nvrtc during libtorch build (#57579 ) Summary: The warning is completely harmless, but it still its nice not to emit it when it could be computed. Fixes https://github.com/pytorch/pytorch/issues/53350 Pull Request resolved: https://github.com/pytorch/pytorch/pull/57579 Reviewed By: walterddr Differential Revision: D28208938 Pulled By: malfet fbshipit-source-id: 8dcc3f1bff7c5ed2c0157268c3063228d3c445b6	2021-05-04 22:51:24 -07:00
davidriazati@fb.com	264d87985a	Use ld.gold by default to link in CI (#57061 ) Summary: This adds an option to CMake to use `ld.gold` to link rather than `ld` (which symlinks to `ld.bfd` on Ubuntu by default). This shouldn't change any functionality, only a mild improvement on link times during builds (shaves off 1 minute) on CI. Verify by searching for `ld.gold is available` in [the logs](https://circleci.com/api/v1.1/project/github/pytorch/pytorch/13046834/output/105/0?file=true&allocation-id=608c434338107e5b6cf938a1-0-build%2F7BDA2FF1) ](https://our.intern.facebook.com/intern/diff/28123522/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/57061 Pulled By: driazati Reviewed By: janeyx99 Differential Revision: D28123522 fbshipit-source-id: 5a60798ca4785427fd92bbf3b3aa5f63730e9b20	2021-05-03 10:05:36 -07:00
davidriazati@fb.com	c44cbc63cc	Ignore more compiler warnings, unify WERROR options (#56630 ) Summary: This adds some more compiler warnings ignores for everything that happens on a standard CPU build (CUDA builds still have a bunch of warnings so we can't turn on `-Werror` everywhere yet). ](https://our.intern.facebook.com/intern/diff/28005063/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/56630 Pulled By: driazati Reviewed By: malfet Differential Revision: D28005063 fbshipit-source-id: 541ed415eb0470ddf7e08c22c5eb6da9db26e9a0	2021-04-29 21:20:29 -07:00
davidriazati@fb.com	4b96fc060b	Remove distutils (#57040 ) Summary: [distutils](https://docs.python.org/3/library/distutils.html) is on its way out and will be deprecated-on-import for Python 3.10+ and removed in Python 3.12 (see [PEP 632](https://www.python.org/dev/peps/pep-0632/)). There's no reason for us to keep it around since all the functionality we want from it can be found in `setuptools` / `sysconfig`. `setuptools` includes a copy of most of `distutils` (which is fine to use according to the PEP), that it uses under the hood, so this PR also uses that in some places. Fixes #56527 Pull Request resolved: https://github.com/pytorch/pytorch/pull/57040 Pulled By: driazati Reviewed By: nikithamalgifb Differential Revision: D28051356 fbshipit-source-id: 1ca312219032540e755593e50da0c9e23c62d720	2021-04-29 12:10:11 -07:00
davidriazati@fb.com	d1b6383d65	Hide warnings for deprecated quantization APIs (#56291 ) Summary: These have a tracking task to actually fix them but in the meantime they should not be clogging up everyone's build output (see #55952). ](https://our.intern.facebook.com/intern/diff/27830229/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/56291 Pulled By: driazati Reviewed By: bertmaher Differential Revision: D27830229 fbshipit-source-id: f1e5d6e9b2c63d4a4ad99a1744a520f8c681c22b	2021-04-19 11:11:33 -07:00
Jeff Daily	e1752ffa04	[reland][ROCm] use hiprtc precompiled header (#55965 ) Summary: Revert "Revert D27449031 (`2a7df657fe`): [pytorch][PR] [ROCm] use hiprtc precompiled header". Reland PR https://github.com/pytorch/pytorch/issues/54350. This reverts commit `204ac21bf1`. The original PR was reverted under suspicion that it was causing CI instability, but it was instead due to a hardware failure. Pull Request resolved: https://github.com/pytorch/pytorch/pull/55965 Reviewed By: jbschlosser Differential Revision: D27755907 Pulled By: malfet fbshipit-source-id: 75bf0b9d888df3dee62f00a366b1123757e0474e	2021-04-15 15:47:56 -07:00
Jeff Daily	a128938a75	[ROCm] add MAGMA_HOME env var hint to cmake, centos-rocm Dockerfile (#54511 ) Summary: MAGMA_HOME was previously set for the ubuntu-rocm/Dockerfile. However, this missed centos builds as well as any builds that do not use the CI image environments. Pull Request resolved: https://github.com/pytorch/pytorch/pull/54511 Reviewed By: jbschlosser Differential Revision: D27755983 Pulled By: malfet fbshipit-source-id: 1ffd2cd100f4221c2bb64e6915fa3372ee1f6247	2021-04-15 01:06:44 -07:00
Eddie Yan	81f181567a	Add `USE_MAGMA` build flag (#55994 ) Summary: Many model pipelines/workflows don't use MAGMA even though it is included in the build by default. Leaving MAGMA kernels out of the build can save 60+MB of GPU memory when loading `libtorch_cuda.so` (tested on V100, current upstream master). A current sharp corner of this flag is that toggling it when rebuilding requires `torch/include/THC/THCGeneral.h` to be manually deleted by the user, as even running `make clean` or `setup.py` with `--cmake` does not properly regenerate it with the appropriate substitution for `#cmakedefine USE_MAGMA`. Is there a way to force the regeneration of the header during a rebuild? CC malfet ptrblck Pull Request resolved: https://github.com/pytorch/pytorch/pull/55994 Reviewed By: mruberry Differential Revision: D27766287 Pulled By: malfet fbshipit-source-id: 93deca57befa0febb9c5b7875ecf0015c547d421	2021-04-15 00:43:12 -07:00
Eli Uriegas	b98f011cd4	cmake: Enable (s)ccache for nccl builds (#55814 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/55814 I don't really know if the original issue is resolved but let's just check and see if this passes CI so that we can potentially get some speed up on our builds Signed-off-by: Eli Uriegas <eliuriegas@fb.com> Test Plan: Imported from OSS Reviewed By: walterddr Differential Revision: D27715734 Pulled By: seemethere fbshipit-source-id: a8f90774dfd25b0abf8e57283fe3591a8d8f3c4b	2021-04-13 14:49:25 -07:00
Alexander Golynski	204ac21bf1	Revert D27449031: [pytorch][PR] [ROCm] use hiprtc precompiled header Test Plan: revert-hammer Differential Revision: D27449031 (`2a7df657fe`) Original commit changeset: 81a8d7847a47 fbshipit-source-id: b7b970c8ea4110357fba3ad4d52a86fa5641d90c	2021-04-01 06:42:04 -07:00
Jeff Daily	2a7df657fe	[ROCm] use hiprtc precompiled header (#54350 ) Summary: HIP's runtime compiler (hiprtc) is adding support for precompiled HIP headers in the ROCm 4.2 release. Conditionally add support for this feature. Using this feature will improve the ROCm torch wheel user experience; users will no longer need to install HIP headers separately to use torch JIT features. The use of this feature is conditionalized on a new ROCM_VERSION macro. Pull Request resolved: https://github.com/pytorch/pytorch/pull/54350 Reviewed By: H-Huang Differential Revision: D27449031 Pulled By: malfet fbshipit-source-id: 81a8d7847a47ce2bb253d1ea58740ef66ed154a3	2021-03-31 13:36:50 -07:00
Shruti Ramesh	f1f3c8b0fa	Adding PyTorch + DNNL + AMD BLIS path (#54953 ) Summary: These changes provide the user with an additional option to choose the DNNL+BLIS path for PyTorch. This assumes BLIS is already downloaded or built from source and the necessary library file is available at the location: $BLIS_HOME/lib/libblis.so and include files are available at: $BLIS_HOME/include/blis/blis.h and $BLIS_HOME/include/blis/cblas.h Export the below variables to build PyTorch with MKLDNN+BLIS and proceed with the regular installation procedure as below: $export BLIS_HOME=path-to-BLIS $export PATH=$BLIS_HOME/include/blis:$PATH LD_LIBRARY_PATH=$BLIS_HOME/lib:$LD_LIBRARY_PATH $export BLAS=BLIS USE_MKLDNN_CBLAS=ON WITH_BLAS=blis $python setup.py install CPU only Dockerfile to build PyTorch with AMD BLIS is available at : docker/cpu-blis/Dockerfile Example command line to build using the Dockerfile: sudo DOCKER_BUILDKIT=1 docker build . -t docker-image-repo-name Example command line to run the built docker container: sudo docker run --name container-name -it docker-image-repo-name Fixes #{issue number} Pull Request resolved: https://github.com/pytorch/pytorch/pull/54953 Reviewed By: glaringlee Differential Revision: D27466799 Pulled By: malfet fbshipit-source-id: e03bae9561be3a67429df3b1be95a79005c63050	2021-03-31 10:40:25 -07:00
Jeff Daily	1dffbe759b	[ROCm] utilize PUBLIC vs PRIVATE linking to avoid incorrect dependencies (#54727 ) Summary: Fixes the build of projects that depend on torch, such as torchaudio. Otherwise torchaudio will complain that gloo_hip is missing. Pull Request resolved: https://github.com/pytorch/pytorch/pull/54727 Reviewed By: H-Huang Differential Revision: D27361513 Pulled By: ezyang fbshipit-source-id: 714cc2db23e7adf3e89303e941b78c27625b9460	2021-03-30 19:22:56 -07:00
Sam Estep	5bcbbf5373	Lint trailing newlines (#54737 ) Summary: Context: https://github.com/pytorch/pytorch/issues/53406 added a lint for trailing whitespace at the ends of lines. However, in order to pass FB-internal lints, that PR also had to normalize the trailing newlines in four of the files it touched. This PR adds an OSS lint to normalize trailing newlines. The changes to the following files (made in 54847d0adb9be71be4979cead3d9d4c02160e4cd) are the only manually-written parts of this PR: - `.github/workflows/lint.yml` - `mypy-strict.ini` - `tools/README.md` - `tools/test/test_trailing_newlines.py` - `tools/trailing_newlines.py` I would have liked to make this just a shell one-liner like the other three similar lints, but nothing I could find quite fit the bill. Specifically, all the answers I tried from the following Stack Overflow questions were far too slow (at least a minute and a half to run on this entire repository): - [How to detect file ends in newline?](https://stackoverflow.com/q/38746) - [How do I find files that do not end with a newline/linefeed?](https://stackoverflow.com/q/4631068) - [How to list all files in the Git index without newline at end of file](https://stackoverflow.com/q/27624800) - [Linux - check if there is an empty line at the end of a file [duplicate]](https://stackoverflow.com/q/34943632) - [git ensure newline at end of each file](https://stackoverflow.com/q/57770972) To avoid giving false positives during the few days after this PR is merged, we should probably only merge it after https://github.com/pytorch/pytorch/issues/54967. Pull Request resolved: https://github.com/pytorch/pytorch/pull/54737 Test Plan: Running the shell script from the "Ensure correct trailing newlines" step in the `quick-checks` job of `.github/workflows/lint.yml` should print no output and exit in a fraction of a second with a status of 0. That was not the case prior to this PR, as shown by this failing GHA workflow run on an earlier draft of this PR: - https://github.com/pytorch/pytorch/runs/2197446987?check_suite_focus=true In contrast, this run (after correcting the trailing newlines in this PR) succeeded: - https://github.com/pytorch/pytorch/pull/54737/checks?check_run_id=2197553241 To unit-test `tools/trailing_newlines.py` itself (this is run as part of our "Test tools" GitHub Actions workflow): ``` python tools/test/test_trailing_newlines.py ``` Reviewed By: malfet Differential Revision: D27409736 Pulled By: samestep fbshipit-source-id: 46f565227046b39f68349bbd5633105b2d2e9b19	2021-03-30 13:09:52 -07:00
Michael Melesse	2620bce42a	[ROCM] load only hipfft separately past rocm4.1 (#54349 ) Summary: This PR is a follow up to https://github.com/pytorch/pytorch/pull/53408. It only loads hipfft if the version is rocm 4.1 or after and stops loading rocfft. This was done to resolve some issues observed in our internal ci due to conflicts. Pull Request resolved: https://github.com/pytorch/pytorch/pull/54349 Reviewed By: ezyang Differential Revision: D27374252 Pulled By: ngimel fbshipit-source-id: 724e80df5011ea8fabd81739e18ae8a13d3a7ea0	2021-03-26 19:54:25 -07:00
Chester Liu	6a4d2c61d5	Allow linking against vcomp on Windows (#54132 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/54054 Pull Request resolved: https://github.com/pytorch/pytorch/pull/54132 Reviewed By: zou3519 Differential Revision: D27181524 Pulled By: malfet fbshipit-source-id: b79b34afb7edcc594d9b5907c5a7505b9cc5683b	2021-03-19 14:36:07 -07:00
Leonard Lausen	90bbe0b38b	cmake: auto-detect ccache to speed up developer builds (#49389 ) Summary: https://ccache.dev/ is a compiler cache that speeds up subsequent builds. Auto-detecting ccache ensures that it is used on systems where it is available, greatly improving build times for developers. There is no risk in enabling ccache in practice. Please refer to https://ccache.dev/ for a short summary / motivation Pull Request resolved: https://github.com/pytorch/pytorch/pull/49389 Reviewed By: ejguan Differential Revision: D27169957 Pulled By: malfet fbshipit-source-id: 673b60bbceb0d323901c8a992a75792c6da9b805	2021-03-18 14:20:53 -07:00
Michael Melesse	4c1af249fb	[ROCM] load hipfft separately from rocfft (#53408 ) Summary: This PR makes changes to how hipfft is loaded in pytorch. hipfft is packaged in a separate library to rocfft following rocm 4.1. We check the rocm version and if it is past rocm 4.1 we load hipfft in addition to rocfft. Pull Request resolved: https://github.com/pytorch/pytorch/pull/53408 Reviewed By: albanD Differential Revision: D26952702 Pulled By: malfet fbshipit-source-id: f42be304b587c060816e39d36f5c1a2cdc37bfab	2021-03-11 09:18:33 -08:00
Sam Estep	8c798e0622	Forbid trailing whitespace (#53406 ) Summary: Context: https://github.com/pytorch/pytorch/pull/53299#discussion_r587882857 These are the only hand-written parts of this diff: - the addition to `.github/workflows/lint.yml` - the file endings changed in these four files (to appease FB-internal land-blocking lints): - `GLOSSARY.md` - `aten/src/ATen/core/op_registration/README.md` - `scripts/README.md` - `torch/csrc/jit/codegen/fuser/README.md` The rest was generated by running this command (on macOS): ``` git grep -I -l ' $' -- . ':(exclude)/contrib/' ':(exclude)third_party' \| xargs gsed -i 's/ *$//' ``` I looked over the auto-generated changes and didn't see anything that looked problematic. Pull Request resolved: https://github.com/pytorch/pytorch/pull/53406 Test Plan: This run (after adding the lint but before removing existing trailing spaces) failed: - https://github.com/pytorch/pytorch/runs/2043032377 This run (on the tip of this PR) succeeded: - https://github.com/pytorch/pytorch/runs/2043296348 Reviewed By: walterddr, seemethere Differential Revision: D26856620 Pulled By: samestep fbshipit-source-id: 3f0de7f7c2e4b0f1c089eac9b5085a58dd7e0d97	2021-03-05 17:22:55 -08:00
Ilia Cherniavskii	795ed5ca3f	Enable Kineto in CPU builds (#53174 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/53174 Enable Kineto also in the CPU builds (non-mobile, non-Windows(atm)) Test Plan: CI Reviewed By: gdankel Differential Revision: D26776112 Pulled By: ilia-cher fbshipit-source-id: 8733f65c2993105136c853f2a7b6e497d0fa53bf	2021-03-04 19:15:52 -08:00
Ashkan Aliabadi	e5ecd1ddf8	[Vulkan]Fix build warnings-treated-as-error on Linux. (#52781 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/52781 Test Plan: Imported from OSS Reviewed By: SS-JIA Differential Revision: D26669311 Pulled By: AshkanAliabadi fbshipit-source-id: 78b08d0b264d4d5cf8af964c589b9b7d0ddc7311	2021-03-03 13:48:43 -08:00
Nikita Shulga	b3c4ac6319	Fix OpenBLAS discovery (#53168 ) Summary: Fix accidental regression introduced by https://github.com/pytorch/pytorch/issues/47940 `FIND_PACKAGE(OpenBLAS)` does not validate that discovered library can actually be used, while `check_fortran_libraries` does that Pull Request resolved: https://github.com/pytorch/pytorch/pull/53168 Test Plan: Build PyTorch with static OpenBLAS and check that `torch.svd(torch.ones(3, 3)).S` do not raise an exception Reviewed By: walterddr Differential Revision: D26772345 Pulled By: malfet fbshipit-source-id: 3e4675c176b30dfe4f0490d7d3dfe4f9a4037134	2021-03-03 08:23:02 -08:00
peter	8870c391e9	Update mkl to 2020.2.254 (#52964 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/52907 Pull Request resolved: https://github.com/pytorch/pytorch/pull/52964 Reviewed By: H-Huang Differential Revision: D26726464 Pulled By: seemethere fbshipit-source-id: 8f3067292e6416e299b4b040c8fb73510134f02e	2021-03-01 11:13:57 -08:00
Chen Lai	14f7bf0629	[PyTorch] update CMake to build libtorch lite (#51419 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/51419 ## Summary 1. Add an option `BUILD_LITE_INTERPRETER` in `caffe2/CMakeLists.txt` and set `OFF` as default. 2. Update 'build_android.sh' with an argument to swtich `BUILD_LITE_INTERPRETER`, 'OFF' as default. 3. Add a mini demo app `lite_interpreter_demo` linked with `libtorch` library, which can be used for quick test. ## Test Plan Built lite interpreter version of libtorch and test with Image Segmentation demo app ([android version](https://github.com/pytorch/android-demo-app/tree/master/ImageSegmentation)/[ios version](https://github.com/pytorch/ios-demo-app/tree/master/ImageSegmentation)) ### Android 1. Prepare model: Prepare the lite interpreter version of model by run the script below to generate the scripted model `deeplabv3_scripted.pt` and `deeplabv3_scripted.ptl` ``` import torch model = torch.hub.load('pytorch/vision:v0.7.0', 'deeplabv3_resnet50', pretrained=True) model.eval() scripted_module = torch.jit.script(model) # Export full jit version model (not compatible lite interpreter), leave it here for comparison scripted_module.save("deeplabv3_scripted.pt") # Export lite interpreter version model (compatible with lite interpreter) scripted_module._save_for_lite_interpreter("deeplabv3_scripted.ptl") ``` 2. Build libtorch lite for android: Build libtorch for android for all 4 android abis (armeabi-v7a, arm64-v8a, x86, x86_64) `BUILD_LITE_INTERPRETER=1 ./scripts/build_pytorch_android.sh`. This pr is tested on Pixel 4 emulator with x86, so use cmd `BUILD_LITE_INTERPRETER=1 ./scripts/build_pytorch_android.sh x86` to specify abi to save built time. After the build finish, it will show the library path: ``` ... BUILD SUCCESSFUL in 55s 134 actionable tasks: 22 executed, 112 up-to-date + find /Users/chenlai/pytorch/android -type f -name 'aar' + xargs ls -lah -rw-r--r-- 1 chenlai staff 13M Feb 11 11:48 /Users/chenlai/pytorch/android/pytorch_android/build/outputs/aar/pytorch_android-release.aar -rw-r--r-- 1 chenlai staff 36K Feb 9 16:45 /Users/chenlai/pytorch/android/pytorch_android_torchvision/build/outputs/aar/pytorch_android_torchvision-release.aar ``` 3. Use the PyTorch Android libraries built from source in the ImageSegmentation app: Create a folder 'libs' in the path, the path from repository root will be `ImageSegmentation/app/libs`. Copy `pytorch_android-release` to the path `ImageSegmentation/app/libs/pytorch_android-release.aar`. Copy 'pytorch_android_torchvision` (downloaded from [here](https://oss.sonatype.org/#nexus-search;quick~torchvision_android)) to the path `ImageSegmentation/app/libs/pytorch_android_torchvision.aar` Update the `dependencies` part of `ImageSegmentation/app/build.gradle` to ``` dependencies { implementation 'androidx.appcompat:appcompat:1.2.0' implementation 'androidx.constraintlayout:constraintlayout:2.0.2' testImplementation 'junit:junit:4.12' androidTestImplementation 'androidx.test.ext:junit:1.1.2' androidTestImplementation 'androidx.test.espresso:espresso-core:3.3.0' implementation(name:'pytorch_android-release', ext:'aar') implementation(name:'pytorch_android_torchvision', ext:'aar') implementation 'com.android.support:appcompat-v7:28.0.0' implementation 'com.facebook.fbjni:fbjni-java-only:0.0.3' } ``` Update `allprojects` part in `ImageSegmentation/build.gradle` to ``` allprojects { repositories { google() jcenter() flatDir { dirs 'libs' } } } ``` 4. Update model loader api: Update `ImageSegmentation/app/src/main/java/org/pytorch/imagesegmentation/MainActivity.java` by 4.1 Add new import: `import org.pytorch.LiteModuleLoader;` 4.2 Replace the way to load pytorch lite model ``` // mModule = Module.load(MainActivity.assetFilePath(getApplicationContext(), "deeplabv3_scripted.pt")); mModule = LiteModuleLoader.load(MainActivity.assetFilePath(getApplicationContext(), "deeplabv3_scripted.ptl")); ``` 5. Test app: Build and run the ImageSegmentation app in Android Studio, ![image](https://user-images.githubusercontent.com/16430979/107696279-9cea5900-6c66-11eb-8286-4d1d68abff61.png) ### iOS 1. Prepare model: Same as Android. 2. Build libtorch lite for ios* `BUILD_PYTORCH_MOBILE=1 IOS_PLATFORM=SIMULATOR BUILD_LITE_INTERPRETER=1 ./scripts/build_ios.sh` 3. Remove Cocoapods from the project: run `pod deintegrate` 4. Link ImageSegmentation demo app with the custom built library: Open your project in XCode, go to your project Target’s Build Phases - Link Binaries With Libraries, click the + sign and add all the library files located in `build_ios/install/lib`. Navigate to the project Build Settings, set the value Header Search Paths to `build_ios/install/include` and Library Search Paths to `build_ios/install/lib`. In the build settings, search for other linker flags. Add a custom linker flag below ``` -all_load ``` Finally, disable bitcode for your target by selecting the Build Settings, searching for Enable Bitcode, and set the value to No. 5. Update library and api 5.1 Update `TorchModule.mm`` To use the custom built libraries the project, replace `#import <LibTorch/LibTorch.h>` (in `TorchModule.mm`) which is needed when using LibTorch via Cocoapods with the code below: ``` //#import <LibTorch/LibTorch.h> #include "ATen/ATen.h" #include "caffe2/core/timer.h" #include "caffe2/utils/string_utils.h" #include "torch/csrc/autograd/grad_mode.h" #include "torch/script.h" #include <torch/csrc/jit/mobile/function.h> #include <torch/csrc/jit/mobile/import.h> #include <torch/csrc/jit/mobile/interpreter.h> #include <torch/csrc/jit/mobile/module.h> #include <torch/csrc/jit/mobile/observer.h> ``` 5.2 Update `ViewController.swift` ``` // if let filePath = Bundle.main.path(forResource: // "deeplabv3_scripted", ofType: "pt"), // let module = TorchModule(fileAtPath: filePath) { // return module // } else { // fatalError("Can't find the model file!") // } if let filePath = Bundle.main.path(forResource: "deeplabv3_scripted", ofType: "ptl"), let module = TorchModule(fileAtPath: filePath) { return module } else { fatalError("Can't find the model file!") } ``` ### Unit test Add `test/cpp/lite_interpreter`, with one unit test `test_cores.cpp` and a light model `sequence.ptl` to test `_load_for_mobile()`, `bc.find_method()` and `bc.forward()` functions. ### Size: With the change: Android: x86: `pytorch_android-release.aar` (13.8 MB) IOS: `pytorch/build_ios/install/lib` (lib: 66 MB): ``` (base) chenlai@chenlai-mp lib % ls -lh total 135016 -rw-r--r-- 1 chenlai staff 3.3M Feb 15 20:45 libXNNPACK.a -rw-r--r-- 1 chenlai staff 965K Feb 15 20:45 libc10.a -rw-r--r-- 1 chenlai staff 4.6K Feb 15 20:45 libclog.a -rw-r--r-- 1 chenlai staff 42K Feb 15 20:45 libcpuinfo.a -rw-r--r-- 1 chenlai staff 39K Feb 15 20:45 libcpuinfo_internals.a -rw-r--r-- 1 chenlai staff 1.5M Feb 15 20:45 libeigen_blas.a -rw-r--r-- 1 chenlai staff 148K Feb 15 20:45 libfmt.a -rw-r--r-- 1 chenlai staff 44K Feb 15 20:45 libpthreadpool.a -rw-r--r-- 1 chenlai staff 166K Feb 15 20:45 libpytorch_qnnpack.a -rw-r--r-- 1 chenlai staff 384B Feb 15 21:19 libtorch.a -rw-r--r-- 1 chenlai staff 60M Feb 15 20:47 libtorch_cpu.a ``` `pytorch/build_ios/install`: ``` (base) chenlai@chenlai-mp install % du -sh * 14M include 66M lib 2.8M share ``` Master (baseline): Android: x86: `pytorch_android-release.aar` (16.2 MB) IOS: `pytorch/build_ios/install/lib` (lib: 84 MB): ``` (base) chenlai@chenlai-mp lib % ls -lh total 172032 -rw-r--r-- 1 chenlai staff 3.3M Feb 17 22:18 libXNNPACK.a -rw-r--r-- 1 chenlai staff 969K Feb 17 22:18 libc10.a -rw-r--r-- 1 chenlai staff 4.6K Feb 17 22:18 libclog.a -rw-r--r-- 1 chenlai staff 42K Feb 17 22:18 libcpuinfo.a -rw-r--r-- 1 chenlai staff 1.5M Feb 17 22:18 libeigen_blas.a -rw-r--r-- 1 chenlai staff 44K Feb 17 22:18 libpthreadpool.a -rw-r--r-- 1 chenlai staff 166K Feb 17 22:18 libpytorch_qnnpack.a -rw-r--r-- 1 chenlai staff 384B Feb 17 22:19 libtorch.a -rw-r--r-- 1 chenlai staff 78M Feb 17 22:19 libtorch_cpu.a ``` `pytorch/build_ios/install`: ``` (base) chenlai@chenlai-mp install % du -sh * 14M include 84M lib 2.8M share ``` Test Plan: Imported from OSS Reviewed By: iseeyuan Differential Revision: D26518778 Pulled By: cccclai fbshipit-source-id: 4503ffa1f150ecc309ed39fb0549e8bd046a3f9c	2021-02-21 01:43:54 -08:00
Jiakai Liu	c9c4b871a5	[pytorch] reintroduce static dispatch (#51957 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/51957 This is a simplified version of #51554. Compared to #51554, this version only supports statically dispatching to a specific backend. The benefit is that it skipped the dispatch key computation logic thus has less framework overhead. The downside is that if input tensors do not match the specified backend it will throw error instead of falling back to regular dispatch. Sample code: ``` Tensor empty(IntArrayRef size, TensorOptions options, c10::optional<MemoryFormat> memory_format) { return at::cpu::empty(size, options, memory_format); } // aten::conj(Tensor(a) self) -> Tensor(a) Tensor conj(const Tensor & self) { return at::math::conj(self); } // aten::conj.out(Tensor self, , Tensor(a!) out) -> Tensor(a!) Tensor & conj_out(Tensor & out, const Tensor & self) { return at::cpu::conj_out(out, self); } // aten::conj.out(Tensor self, , Tensor(a!) out) -> Tensor(a!) Tensor & conj_outf(const Tensor & self, Tensor & out) { return at::cpu::conj_out(out, self); } // aten::_conj(Tensor self) -> Tensor Tensor _conj(const Tensor & self) { return at::defaultbackend::_conj(self); } ``` For ops without the specific backend dispatch, it will throw error: ``` // aten::_use_cudnn_ctc_loss(Tensor log_probs, Tensor targets, int[] input_lengths, int[] target_lengths, int blank) -> bool bool _use_cudnn_ctc_loss(const Tensor & log_probs, const Tensor & targets, IntArrayRef input_lengths, IntArrayRef target_lengths, int64_t blank) { TORCH_CHECK(false, "Static dispatch does not support _use_cudnn_ctc_loss for CPU."); } ``` Differential Revision: D26337857 Test Plan: Imported from OSS Reviewed By: bhosmer Pulled By: ljk53 fbshipit-source-id: a8e95799115c349de3c09f04a26b01d21a679364	2021-02-19 11:41:39 -08:00
Nikita Shulga	08017f4598	Add explicit cudart_static dependency for cublas_static (#52509 ) Summary: Fixes following error during static linking, by enforcing that cudart dependency is put after cublasLt ``` /usr/bin/ld: /usr/local/cuda/lib64/libcublasLt_static.a(libcublasLt_static.a.o): undefined reference to symbol 'cudaStreamWaitEvent@libcudart.so.11.0' /usr/local/cuda/lib64/libcudart.so: error adding symbols: DSO missing from command line ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/52509 Reviewed By: janeyx99 Differential Revision: D26547622 Pulled By: malfet fbshipit-source-id: 4e17f18cf0ab5479a549299faf2583a79fbda4b9	2021-02-19 10:45:49 -08:00
David Kyle	dbeda994db	Update FindvecLib.cmake for macOS 10.14, 10.15 and Big Sur (#51288 ) Summary: When compiling libtorch on macOS there is the option to use the `vecLib` BLAS library from Apple's (Accelerate)[https://developer.apple.com/documentation/accelerate] framework. Recent versions of macOS have changed the location of veclib.h, this change adds the new locations to `FindvecLib.cmake` To test run the following command: ``` BLAS=vecLib python setup.py install --cmake --cmake-only ``` The choice of BLAS library is confirmed in the output: ``` -- Trying to find preferred BLAS backend of choice: vecLib -- Found vecLib: /Library/Developer/CommandLineTools/SDKs/MacOSX10.15.sdk/System/Library/Frameworks/Accelerate.framework/Versions/Current/Frameworks/vecLib.framework/Versions/Current/Headers ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/51288 Reviewed By: jbschlosser Differential Revision: D26531136 Pulled By: malfet fbshipit-source-id: ce86807ccbf66973f33b3acb99b7f40cfd182b9b	2021-02-19 08:04:10 -08:00
Nikita Shulga	de4c9ecc35	Fix libnvrtc discoverability in package patched by `auditwheel` (#52184 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/52184 `auditwheel` inserts first 8 symbols of sha256 checksum of the library before relocating into the wheel package. This change adds logic for computing the same short sha sum and embedding it into LazyNVRTC as alternative name for libnvrt.so Fixes https://github.com/pytorch/pytorch/issues/52075 Test Plan: Imported from OSS Reviewed By: seemethere Differential Revision: D26417403 Pulled By: malfet fbshipit-source-id: e366dd22e95e219979f6c2fa39acb11585b34c72	2021-02-13 19:38:27 -08:00
Nikita Shulga	bf841b25e4	[cmake] Add explicit cublas->cudart dependency (#52243 ) Summary: Necessary to ensure correct link order, especially if libraries are linked statically. Otherwise, one might run into: ``` /usr/bin/ld: /usr/local/cuda/lib64/libcublasLt_static.a(libcublasLt_static.a.o): undefined reference to symbol 'cudaStreamWaitEvent@libcudart.so.11.0' /usr/local/cuda/lib64/libcudart.so: error adding symbols: DSO missing from command line ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/52243 Reviewed By: seemethere, ngimel Differential Revision: D26437159 Pulled By: malfet fbshipit-source-id: 33b8bb5040bda10537833f3ad737f535488452ea	2021-02-13 18:21:33 -08:00
Chen Lai	ba7a2f6513	Add debug helper function to check target property (#52093 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/52093 # Summary The previous debug if statement only prints the file list, but it's not clear whether the target includes the file list correctly. This function can examine the target so it's more accurate. This pr includes changes: 1. Add a file `DebugHelper.cmake` with `print_target_properties` function. 2. Replace the debug if statement `if(FALSE)` by adding magical variable `PRINT_CMAKE_DEBUG_INFO` and applying the variable accordingly. Note: previous debug if statement output example: ``` -- CPU sources: -- /Users/chenlai/pytorch/aten/src/ATen/BatchedFallback.cpp -- /Users/chenlai/pytorch/aten/src/ATen/BatchedTensorImpl.cpp -- /Users/chenlai/pytorch/aten/src/ATen/BatchingRegistrations.cpp -- /Users/chenlai/pytorch/aten/src/ATen/CPUGeneratorImpl.cpp -- /Users/chenlai/pytorch/aten/src/ATen/Context.cpp -- /Users/chenlai/pytorch/aten/src/ATen/DLConvertor.cpp -- /Users/chenlai/pytorch/aten/src/ATen/DynamicLibrary.cpp -- /Users/chenlai/pytorch/aten/src/ATen/ExpandUtils.cpp -- /Users/chenlai/pytorch/aten/src/ATen/LegacyTHFunctionsCPU.cpp -- /Users/chenlai/pytorch/aten/src/ATen/MemoryOverlap.cpp ... -- GPU sources: -- CPU include: -- /Users/chenlai/pytorch/build_android/caffe2/aten/src/TH -- /Users/chenlai/pytorch/aten/src/TH -- /Users/chenlai/pytorch/aten/src -- /Users/chenlai/pytorch/build_android/caffe2/aten/src ... -- GPU include: -- /Users/chenlai/pytorch/build_android/caffe2/aten/src/TH -- /Users/chenlai/pytorch/aten/src/TH -- /Users/chenlai/pytorch/build_android/caffe2/aten/src/TH -- /Users/chenlai/pytorch/aten/src/TH ``` # Test plan Set `PRINT_CMAKE_DEBUG_INFO` to true by adding `DPRINT_CMAKE_DEBUG_INFO` in `./scripts/build_pytorch_android.sh`, run `./scripts/build_pytorch_android.sh x86` `print_target_properties(torch)` shows ``` torch ANDROID_ARCH = x86 torch ANDROID_STL_TYPE = c++_static torch ARCHIVE_OUTPUT_DIRECTORY = /Users/chenlai/pytorch/build_android_x86/lib torch AUTOGEN_ORIGIN_DEPENDS = ON torch AUTOMOC_COMPILER_PREDEFINES = ON torch AUTOMOC_MACRO_NAMES = Q_OBJECT;Q_GADGET;Q_NAMESPACE;Q_NAMESPACE_EXPORT torch AUTOMOC_PATH_PREFIX = OFF torch BINARY_DIR = /Users/chenlai/pytorch/build_android_x86/caffe2 torch BINARY_DIR = /Users/chenlai/pytorch/build_android_x86/caffe2 torch BUILD_WITH_INSTALL_RPATH = FALSE torch CXX_STANDARD = 14 torch C_STANDARD = 11 torch IMPORTED = FALSE torch IMPORTED_GLOBAL = FALSE torch INCLUDE_DIRECTORIES = /Users/chenlai/pytorch/build_android_x86/aten/src;/Users/chenlai/pytorch/aten/src;/Users/chenlai/pytorch/build_android_x86;/Users/chenlai/pytorch;/Users/chenlai/pytorch/third_party/XNNPACK/include;/Users/chenlai/Library/Android/sdk/ndk/21.3.6528147/sources/third_party/vulkan/src/common;/Users/chenlai/pytorch/cmake/../third_party/eigen;/Users/chenlai/pytorch/cmake/../third_party/pybind11/include torch INCLUDE_DIRECTORIES = /Users/chenlai/pytorch/build_android_x86/aten/src;/Users/chenlai/pytorch/aten/src;/Users/chenlai/pytorch/build_android_x86;/Users/chenlai/pytorch;/Users/chenlai/pytorch/third_party/XNNPACK/include;/Users/chenlai/Library/Android/sdk/ndk/21.3.6528147/sources/third_party/vulkan/src/common;/Users/chenlai/pytorch/cmake/../third_party/eigen;/Users/chenlai/pytorch/cmake/../third_party/pybind11/include torch INCLUDE_DIRECTORIES = /Users/chenlai/pytorch/build_android_x86/aten/src;/Users/chenlai/pytorch/aten/src;/Users/chenlai/pytorch/build_android_x86;/Users/chenlai/pytorch;/Users/chenlai/pytorch/third_party/XNNPACK/include;/Users/chenlai/Library/Android/sdk/ndk/21.3.6528147/sources/third_party/vulkan/src/common;/Users/chenlai/pytorch/cmake/../third_party/eigen;/Users/chenlai/pytorch/cmake/../third_party/pybind11/include torch INSTALL_RPATH = $ORIGIN torch INSTALL_RPATH_USE_LINK_PATH = TRUE torch INTERFACE_LINK_LIBRARIES = torch_cpu_library torch ISPC_HEADER_SUFFIX = _ispc.h torch LIBRARY_OUTPUT_DIRECTORY = /Users/chenlai/pytorch/build_android_x86/lib torch LINK_LIBRARIES = torch_cpu_library torch NAME = torch torch PCH_INSTANTIATE_TEMPLATES = ON torch PCH_WARN_INVALID = ON torch POSITION_INDEPENDENT_CODE = TRUE torch RUNTIME_OUTPUT_DIRECTORY = /Users/chenlai/pytorch/build_android_x86/bin torch SKIP_BUILD_RPATH = FALSE torch SOURCES = /Users/chenlai/pytorch/build_android_x86/empty.cpp torch SOURCE_DIR = /Users/chenlai/pytorch/caffe2 torch SOURCE_DIR = /Users/chenlai/pytorch/caffe2 torch TYPE = STATIC_LIBRARY torch TYPE = STATIC_LIBRARY torch UNITY_BUILD_BATCH_SIZE = 8 torch UNITY_BUILD_MODE = BATCH ``` Test Plan: Imported from OSS Reviewed By: walterddr Differential Revision: D26377725 Pulled By: cccclai fbshipit-source-id: dbe21ad533759f33711a0ce5328205bbcd5cf0f3	2021-02-11 15:37:14 -08:00
Jeff Daily	d02ea9a141	[ROCm] add hipMAGMA support (#51238 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/48831. - CI image is updated to build hipMAGMA from source and set env MAGMA_HOME. - CMake is updated to separate different requirements for CUDA versus ROCm MAGMA. - Some unit tests that become enabled with MAGMA are currently skipped for ROCm due to failures. Fixing these failures will be follow-on work. Pull Request resolved: https://github.com/pytorch/pytorch/pull/51238 Reviewed By: ngimel Differential Revision: D26184918 Pulled By: malfet fbshipit-source-id: ada632f1ae7b413e8cae6543fe931dcd46985821	2021-02-01 22:09:33 -08:00
Jane Xu	88af2149e1	Add build option to split torch_cuda library into torch_cuda_cu and torch_cuda_cpp (#49050 ) Summary: Because of the size of our `libtorch_cuda.so`, linking with other hefty binaries presents a problem where 32bit relocation markers are too small and end up overflowing. This PR attempts to break up `torch_cuda` into `torch_cuda_cu` and `torch_cuda_cpp`. `torch_cuda_cu`: all the files previously in `Caffe2_GPU_SRCS` that are * pure `.cu` files in `aten`match * all the BLAS files * all the THC files, except for THCAllocator.cpp, THCCachingHostAllocator.cpp and THCGeneral.cpp * all files in`detail` * LegacyDefinitions.cpp and LegacyTHFunctionsCUDA.cpp * RegisterCUDA.cpp CUDAHooks.cpp * CUDASolver.cpp * TensorShapeCUDA.cpp `torch_cuda_cpp`: all other files in `Caffe2_GPU_SRCS` Accordingly, TORCH_CUDA_API and TORCH_CUDA_BUILD_MAIN_LIB usages are getting split as well to TORCH_CUDA_CU_API and TORCH_CUDA_CPP_API. To test this locally, you can run `export BUILD_SPLIT_CUDA=ON && python setup.py develop`. In your `build/lib` folder, you should find binaries for both `torch_cuda_cpp` and `torch_cuda_cu`. To see that the SPLIT_CUDA option was toggled, you can grep the Summary of running cmake and make sure `Split CUDA` is ON. This build option is tested on CI for CUDA 11.1 builds (linux for now, but windows soon). Pull Request resolved: https://github.com/pytorch/pytorch/pull/49050 Reviewed By: walterddr Differential Revision: D26114310 Pulled By: janeyx99 fbshipit-source-id: 0180f2519abb5a9cdde16a6fb7dd3171cff687a6	2021-02-01 18:42:35 -08:00
Luca Wehrstedt	b77f72b5a0	Enable TensorPipe's SHM transport (#50760 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/50760 The SHM transport uses shared-memory-backed ringbuffers to transfer small payloads between processes on the same machine. It was disabled in v1.6 due to a CMake mishap but we've since realized that it also doesn't work that well in docker and other setups. Enabling it here to see whether CircleCI fails. ghstack-source-id: 120470890 Test Plan: Exported three times to CircleCI with tests consistently passing Reviewed By: mrshenli Differential Revision: D23814828 fbshipit-source-id: f355cb6515776debad536924de4f4d3fbb05a874	2021-01-27 11:45:09 -08:00
Will Constable	4bbff92014	Refactor build targets for torch::deploy (#50288 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/50288 torch::deploy will bundle the objects contained in libtorch-python together with frozenpython into a shared library. Therefore, the libtorch-python objs can't bring with them a dependency on system python. Buck TARGETS are added throughout the caffe2 tree to make available objects or headers that will be needed by torch::deploy but would have brought unsuitable dependencies if accessed using existing targets. CMakeLists are modified to separate a torch-python-objs object library which lets torch::deploy compile these objs with the same compile flags as libttorch_python used, but without some of the link-time dependencies such as python. CudaIPCTypes is moved from libtorch_python to libtorch_cuda because it is really not a python binding, and it statically registers a cuda_ipc_callback which would be duplicated if included in each copy of torch::deploy. Test Plan: no new functionality, just ensure existing tests continue to pass Reviewed By: malfet Differential Revision: D25850785 fbshipit-source-id: b0b81c050cbee04e9de96888f8a09d29238a9db8	2021-01-22 09:16:32 -08:00
Jeff Daily	b2e5617553	[ROCm] rename HIP_HCC_FLAGS to HIP_CLANG_FLAGS (#50917 ) Summary: ROCm 3.5 replaced hcc with hip-clang and deprecated HIP_HCC_FLAGS. HIP_CLANG_FLAGS should be used moving forward. HIP_HCC_FLAGS will be removed soon. Pull Request resolved: https://github.com/pytorch/pytorch/pull/50917 Reviewed By: ejguan Differential Revision: D26008094 Pulled By: walterddr fbshipit-source-id: cfec4f96fbd9bd338834a841c37267f6a4703cab	2021-01-22 07:24:05 -08:00
Ilia Cherniavskii	e34992ebee	Set USE_KINETO=1 (#49897 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/49897 Resend of https://github.com/pytorch/pytorch/pull/49201 Test Plan: see 49201 Reviewed By: malfet Differential Revision: D25717102 Pulled By: ilia-cher fbshipit-source-id: 5e794a7f5fe160ca64ac9d190c4fd3e8f1e443e6	2021-01-22 00:09:21 -08:00
Luca Wehrstedt	112a583467	Enable TensorPipe's CMA channel (#50759 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/50759 ghstack-source-id: 120032288 Test Plan: Exported to CircleCI and tested Reviewed By: mrshenli Differential Revision: D25959326 fbshipit-source-id: be6df209ff3a79a8961acbda64ee7805a5c434a9	2021-01-20 10:53:47 -08:00
Mark Final	e1bb476980	Issue #48724 . Only set the CMake IMPORTED_LOCATION property in static… (#49173 ) Summary: … library builds, as it is already set in shared library builds from the target that was imported from Caffe2. This was identified on Windows builds when PyTorch was built in shared Release mode, and a testapp was built with RelWithDebInfo in CMake. The problem appeared to be that because IMPORTED_LOCATION (in TorchConfig.cmake) and IMPORTED_LOCATION_RELEASE were both set (in Caffe2Targets.cmake), there occurred some confusion in the build as to what was correct. The symptoms are the error: ninja: error: 'torch-NOTFOUND', needed by 'test_pytorch.exe', missing and no known rule to make it in a noddy consuming test application. Fixes https://github.com/pytorch/pytorch/issues/48724 Pull Request resolved: https://github.com/pytorch/pytorch/pull/49173 Reviewed By: malfet Differential Revision: D25974151 Pulled By: ezyang fbshipit-source-id: 3454c0d29cbbe7a37608beedaae3efbb624b0479	2021-01-20 09:23:27 -08:00
Rong Rong (AI Infra)	ebd142e94b	initial commit to enable fast_nvcc (#49773 ) Summary: draft enable fast_nvcc. * cleaned up some non-standard usages * added fall-back to wrap_nvcc Pull Request resolved: https://github.com/pytorch/pytorch/pull/49773 Test Plan: Configuration to enable fast nvcc: - install and enable `ccache` but delete `.ccache/` folder before each build. - `TORCH_CUDA_ARCH_LIST=6.0;6.1;6.2;7.0;7.5` - Toggling `USE_FAST_NVCC=ON/OFF` cmake config and run `cmake --build` to verify the build time. Initial statistic for a full compilation: * `cmake --build . -- -j $(nproc)`: - fast NVCC ``` real 48m55.706s user 1559m14.218s sys 318m41.138s ``` - normal NVCC: ``` real 43m38.723s user 1470m28.131s sys 90m46.879s ``` * `cmake --build . -- -j $(nproc/4)`: - fast NVCC: ``` real 53m44.173s user 1130m18.323s sys 71m32.385s ``` - normal NVCC: ``` real 81m53.768s user 858m45.402s sys 61m15.539s ``` * Conclusion: fast NVCC doesn't provide too much gain when compiler is set to use full CPU utilization, in fact it is even worse because of the thread switcing. initial statistic for partial recompile (edit .cu files) * `cmake --build . -- -j $(nproc)` - fast NVCC: ``` [2021-01-13 18:10:24] [ 86%] Building NVCC (Device) object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/torch_cuda_generated_BinaryMiscOpsKernels.cu.o [2021-01-13 18:11:08] [ 86%] Linking CXX shared library ../lib/libtorch_cuda.so ``` - normal NVCC: ``` [2021-01-13 17:35:40] [ 86%] Building NVCC (Device) object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/torch_cuda_generated_BinaryMiscOpsKernels.cu.o [2021-01-13 17:38:08] [ 86%] Linking CXX shared library ../lib/libtorch_cuda.so ``` * Conclusion: Effective compilation time for single CU file modification reduced from from 2min30sec to only 40sec when compiling multiple architecture. This shows 4X gain in speed up using fast NVCC -- reaching the theoretical limit of 5X when compiling 5 gencode architecture at the same time. Follow up PRs: - should have better fallback mechanism to detect whether a build is supported by fast_nvcc or not instead of dryruning then fail with fallback. - performance measurement instrumentation to measure what's the total compile time vs the parallel tasks critical path time. - figure out why `-j $(nproc)` gives significant sys overhead (`sys 318m41.138s` vs `sys 90m46.879s`) over normal nvcc, guess this is context switching, but not exactly sure Reviewed By: malfet Differential Revision: D25692758 Pulled By: walterddr fbshipit-source-id: c244d07b9b71f146e972b6b3682ca792b38c4457	2021-01-19 14:50:54 -08:00
Richard Barnes	30a8ba93b1	Remove a blacklist reference (#50477 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/50477 See task for context Test Plan: Sandcastle+OSS tests Reviewed By: xush6528 Differential Revision: D25893906 fbshipit-source-id: c9b86d0292aa751597d75e8d1b53f99b99c924b9	2021-01-13 13:39:06 -08:00
Nathan John Sircombe	664126bab5	Enables build with oneDNN (MKL-DNN) on AArch64 (#50400 ) Summary: Since version 1.6, oneDNN has provided limited support for AArch64 builds. This minor change is to detect an AArch64 CPU and permit the use of `USE_MKLDNN` in that case. Build flags for oneDNN are also modified accordingly. Note: oneDNN on AArch64, by default, will use oneDNN's reference C++ kernels. These are not optimised for AArch64, but oneDNN v1.7 onwards provides support for a limited set of primitives based Arm Compute Library. See: https://github.com/oneapi-src/oneDNN/pull/795 and: https://github.com/oneapi-src/oneDNN/pull/820 for more details. Support for ACL-based oneDNN primitives in PyTorch will require some further modification, Fixes #{issue number} Pull Request resolved: https://github.com/pytorch/pytorch/pull/50400 Reviewed By: izdeby Differential Revision: D25886589 Pulled By: malfet fbshipit-source-id: 2c81277a28ad4528c2d2211381e7c6692d952bc1	2021-01-13 08:41:44 -08:00
Gemfield	deba3bd1d0	Fix TORCH_LIBRARIES variables when do static build (#49458 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/21737 With this fix, TORCH_LIBRARIES variable can provide all nessesary static libraries build from pytorch repo. User program (if do static build) now can just link with ${TORCH_LIBRARIES} + MKL + cuda runtime. Pull Request resolved: https://github.com/pytorch/pytorch/pull/49458 Reviewed By: mrshenli Differential Revision: D25895354 Pulled By: malfet fbshipit-source-id: 8ff47d14ae1f90036522654d4354256ed5151e5c	2021-01-13 07:56:27 -08:00
Chester Liu	bee6b0be58	Fix warning when running scripts/build_ios.sh (#49457 ) Summary: * Fixes `cmake implicitly converting 'string' to 'STRING' type` * Fixes `clang: warning: argument unused during compilation: '-mfpu=neon-fp16' [-Wunused-command-line-argument]` Pull Request resolved: https://github.com/pytorch/pytorch/pull/49457 Reviewed By: zhangguanheng66 Differential Revision: D25871014 Pulled By: malfet fbshipit-source-id: fa0c181ae7a1b8668e47f5ac6abd27a1c735ffce	2021-01-11 19:31:32 -08:00
Jane Xu	c2d37cd990	Change CMake config to enable universal binary for Mac (#50243 ) Summary: This PR is a step towards enabling cross compilation from x86_64 to arm64. The following has been added: 1. When cross compilation is detected, compile a local universal fatfile to use as protoc. 2. For the simple compile check in MiscCheck.cmake, make sure to compile the small snippet as a universal binary in order to run the check. Test plan: Kick off a minimal build on a mac intel machine with the macOS 11 SDK with this command: ``` CMAKE_OSX_ARCHITECTURES=arm64 USE_MKLDNN=OFF USE_QNNPACK=OFF USE_PYTORCH_QNNPACK=OFF BUILD_TEST=OFF USE_NNPACK=OFF python setup.py install ``` (If you run the above command before this change, or without macOS 11 SDK set up, it will fail.) Then check the platform of the built binaries using this command: ``` lipo -info build/lib/libfmt.a ``` Output: - Before this PR, running a regular build via `python setup.py install` (instead of using the flags listed above): ``` Non-fat file: build/lib/libfmt.a is architecture: x86_64 ``` - Using this PR: ``` Non-fat file: build/lib/libfmt.a is architecture: arm64 ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/50243 Reviewed By: malfet Differential Revision: D25849955 Pulled By: janeyx99 fbshipit-source-id: e9853709a7279916f66aa4c4e054dfecced3adb1	2021-01-08 17:26:08 -08:00
Nikita Shulga	7b4a7661d6	Make PyTorch partially cross-compilable for Apple M1 (#49701 ) Summary: Update CPUINFO to include https://github.com/pytorch/cpuinfo/pull/51 Update sleef to include https://github.com/shibatch/sleef/pull/376 Modify aten/src/ATen/native/quantized/cpu/qnnpack/CMakeLists.txt to recognize CMAKE_OSX_ARCHITECTURES Pull Request resolved: https://github.com/pytorch/pytorch/pull/49701 Test Plan: `cmake -DCMAKE_OSX_ARCHITECTURES=x86_64 -DPYTHON_EXECUTABLE=/usr/bin/python3 -DUSE_XNNPACK=NO -DBUILD_TEST=YES .. -G Ninja; ninja basic` finishes successfully on Apple M1 Reviewed By: janeyx99 Differential Revision: D25669219 Pulled By: malfet fbshipit-source-id: 5ee36b64e3a7ac76448f2a300ac4993375a26de5	2020-12-22 09:33:12 -08:00
Jane Xu	71ca600af9	Renaming CAFFE2_API to TORCH_API (#49496 ) Summary: Since caffe2 and torch have been consolidated, CAFFE2_API should be merged with TORCH_API. Addresses a TODO. Manually edited some references of the removed `CAFFE2_API`: * `CONTRIBUTING.md` * `caffe2/proto/CMakeLists.txt` * `cmake/ProtoBuf.cmake` * `c10/macros/Export.h` * `torch/csrc/WindowsTorchApiMacro.h` Pull Request resolved: https://github.com/pytorch/pytorch/pull/49496 Reviewed By: malfet, samestep Differential Revision: D25600726 Pulled By: janeyx99 fbshipit-source-id: 7e068d959e397ac183c097d7e9a9afeca5ddd782	2020-12-18 10:54:50 -08:00
Ilia Cherniavskii	72b00a8a52	Revert D25480770: Set USE_KINETO=1 Test Plan: revert-hammer Differential Revision: D25480770 (`1a92802bde`) Original commit changeset: 037cd774f554 fbshipit-source-id: 6a6062195033ca91fcc0cfa1e890e47efc774ac1	2020-12-18 07:06:28 -08:00
Ilia Cherniavskii	1a92802bde	Set USE_KINETO=1 (#49201 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/49201 This unblocks kineto profiler for 1.8 release. This PR supercedes https://github.com/pytorch/pytorch/pull/48391 Note: this will somewhat increase the size of linux server binaries, bc we add libkineto.a and libcupti_static.a: -rw-r--r-- 1 jenkins jenkins 1107502 Dec 10 21:16 build/lib/libkineto.a -rw-r--r-- 1 root root 13699658 Nov 13 2019 /usr/local/cuda/lib64/libcupti_static.a Test Plan: CI https://github.com/pytorch/pytorch/pull/48391 Imported from OSS Reviewed By: ngimel Differential Revision: D25480770 fbshipit-source-id: 037cd774f5547d9918d6055ef5cc952a54e48e4c	2020-12-18 01:48:10 -08:00
Rong Rong	88b3d3371b	add additional arm64 checker in cmake files (#48952 ) Summary: tentatively fixes https://github.com/pytorch/pytorch/issues/48873 Pull Request resolved: https://github.com/pytorch/pytorch/pull/48952 Reviewed By: H-Huang Differential Revision: D25463266 Pulled By: walterddr fbshipit-source-id: 40afefffe8ab98ae7261c770316cb9c25225285f	2020-12-11 08:10:09 -08:00
Abdelrauf	95a1725a4a	Vsx initial support issue27678 (#41541 ) Summary: ### Pytorch Vec256 ppc64le support implemented types: - double - float - int16 - int32 - int64 - qint32 - qint8 - quint8 - complex_float - complex_double Notes: All basic vector operations are implemented: There are a few problems: - minimum maximum nan propagation for ppc64le is missing and was not checked - complex multiplication, division, sqrt, abs are implemented as PyTorch x86. they can overflow and have precision problems than std ones. That's why they were either excluded or tested in smaller domain range - precisions of the implemented float math functions ~~Besides, I added CPU_CAPABILITY for power. but as because of quantization errors for DEFAULT I had to undef and use vsx for DEFAULT too~~ #### Details ##### Supported math functions + plus sign means vectorized, - minus sign means missing, (implementation notes are added inside braces) (notes). Example: -(both ) means it was also missing on x86 side g( func_name) means vectorization is using func_name sleef - redirected to the Sleef unsupported function_name \| float \| double \| complex float \| complex double \|-- \| -- \| -- \| -- \| --\| acos \| sleef \| sleef \| f(asin) \| f(asin) asin \| sleef \| sleef \| +(pytorch impl) \| +(pytorch impl) atan \| sleef \| sleef \| f(log) \| f(log) atan2 \| sleef \| sleef \| unsupported \| unsupported cos \| +((ppc64le:avx_mathfun) ) \| sleef \| -(both) \| -(both) cosh \| f(exp) \| -(both) \| -(both) \| erf \| sleef \| sleef \| unsupported \| unsupported erfc \| sleef \| sleef \| unsupported \| unsupported erfinv \| - (both) \| - (both) \| unsupported \| unsupported exp \| + \| sleef \| - (x86:f()) \| - (x86:f()) expm1 \| f(exp) \| sleef \| unsupported \| unsupported lgamma \| sleef \| sleef \| \| log \| + \| sleef \| -(both) \| -(both) log10 \| f(log) \| sleef \| f(log) \| f(log) log1p \| f(log) \| sleef \| unsupported \| unsupported log2 \| f(log) \| sleef \| f(log) \| f(log) pow \| + f(exp) \| sleef \| -(both) \| -(both) sin \| +((ppc64le:avx_mathfun) ) \| sleef \| -(both) \| -(both) sinh \| f(exp) \| sleef \| -(both) \| -(both) tan \| sleef \| sleef \| -(both) \| -(both) tanh \| f(exp) \| sleef \| -(both) \| -(both) hypot \| sleef \| sleef \| -(both) \| -(both) nextafter \| sleef \| sleef \| -(both) \| -(both) fmod \| sleef \| sleef \| -(both) \| -(both) [Vec256 Test cases Pr https://github.com/pytorch/pytorch/issues/42685](https://github.com/pytorch/pytorch/pull/42685) Current list: - [x] Blends - [x] Memory: UnAlignedLoadStore - [x] Arithmetics: Plus,Minu,Multiplication,Division - [x] Bitwise: BitAnd, BitOr, BitXor - [x] Comparison: Equal, NotEqual, Greater, Less, GreaterEqual, LessEqual - [x] MinMax: Minimum, Maximum, ClampMin, ClampMax, Clamp - [x] SignManipulation: Absolute, Negate - [x] Interleave: Interleave, DeInterleave - [x] Rounding: Round, Ceil, Floor, Trunc - [x] Mask: ZeroMask - [x] SqrtAndReciprocal: Sqrt, RSqrt, Reciprocal - [x] Trigonometric: Sin, Cos, Tan - [x] Hyperbolic: Tanh, Sinh, Cosh - [x] InverseTrigonometric: Asin, ACos, ATan, ATan2 - [x] Logarithm: Log, Log2, Log10, Log1p - [x] Exponents: Exp, Expm1 - [x] ErrorFunctions: Erf, Erfc, Erfinv - [x] Pow: Pow - [x] LGamma: LGamma - [x] Quantization: quantize, dequantize, requantize_from_int - [x] Quantization: widening_subtract, relu, relu6 Missing: - [ ] Constructors, initializations - [ ] Conversion , Cast - [ ] Additional: imag, conj, angle (note: imag and conj only checked for float complex) #### Notes on tests and testing framework - some math functions are tested within domain range - mostly testing framework randomly tests against std implementation within the domain or within the implementation domain for some math functions. - some functions are tested against the local version. ~~For example, std::round and vector version of round differs. so it was tested against the local version~~ - round was tested against pytorch at::native::round_impl. ~~for double type on Vsx vec_round failed for (even)+0 .5 values~~ . it was solved by using vec_rint - ~~complex types are not tested~~ After enabling complex testing due to precision and domain some of the complex functions failed for vsx and x86 avx as well. I will either test it against local implementation or check within the accepted domain - ~~quantizations are not tested~~ Added tests for quantizing, dequantize, requantize_from_int, relu, relu6, widening_subtract functions - the testing framework should be improved further - ~~For now `-DBUILD_MOBILE_TEST=ON `will be used for Vec256Test too~~ Vec256 Test cases will be built for each CPU_CAPABILITY Pull Request resolved: https://github.com/pytorch/pytorch/pull/41541 Reviewed By: zhangguanheng66 Differential Revision: D23922049 Pulled By: VitalyFedyunin fbshipit-source-id: bca25110afccecbb362cea57c705f3ce02f26098	2020-12-10 13:42:39 -08:00
peterjc123	5450614cf6	Correctly apply WIN32_LEAN_AND_MEAN to the whole repo (#49025 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/48895 Pull Request resolved: https://github.com/pytorch/pytorch/pull/49025 Reviewed By: zhangguanheng66 Differential Revision: D25399912 Pulled By: ezyang fbshipit-source-id: 9b7225b0e43511e0b8981c39035d814a4406c523	2020-12-08 19:38:23 -08:00
Rong Rong	b89c328493	Add fftw3 cmake as alternative for FFT/DFT (#48808 ) Summary: added cmake discovery in Dependencies.cmake for fftw3. Pull Request resolved: https://github.com/pytorch/pytorch/pull/48808 Reviewed By: janeyx99 Differential Revision: D25375320 Pulled By: walterddr fbshipit-source-id: cde3afc51eef9c621c7d19be7ad7573fc8b838c2	2020-12-08 10:35:33 -08:00
Nikita Shulga	c29f51642e	Modify NEON check for ARM64 on OS X (#48982 ) Summary: Use CMAKE_SYSTEM_PROCESSOR rather than run sysctl Fixes https://github.com/pytorch/pytorch/issues/48874 Pull Request resolved: https://github.com/pytorch/pytorch/pull/48982 Reviewed By: walterddr Differential Revision: D25385883 Pulled By: malfet fbshipit-source-id: 47b6dc5be8d75f6d4a66a11c564abdfe31ac90b4	2020-12-08 07:58:22 -08:00
Jithun Nair	5f62308739	Hipify revamp [REDUX] (#48715 ) Summary: [Refiled version of earlier PR https://github.com/pytorch/pytorch/issues/45451] This PR revamps the hipify module in PyTorch to overcome a long list of shortcomings in the original implementation. However, these improvements are applied only when using hipify to build PyTorch extensions, not for PyTorch or Caffe2 itself. Correspondingly, changes are made to cpp_extension.py to match these improvements. The list of improvements to hipify is as follows: 1. Hipify files in the same directory as the original file, unless there's a "cuda" subdirectory in the original file path, in which case the hipified file will be in the corresponding file path with "hip" subdirectory instead of "cuda". 2. Never hipify the file in-place if changes are introduced due to hipification i.e. always ensure the hipified file either resides in a different folder or has a different filename compared to the original file. 3. Prevent re-hipification of already hipified files. This avoids creation of unnecessary "hip/hip" etc. subdirectories and additional files which have no actual use. 4. Do not write out hipified versions of files if they are identical to the original file. This results in a cleaner output directory, with minimal number of hipified files created. 5. Update header rewrite logic so that it accounts for the previous improvement. 6. Update header rewrite logic so it respects the rules for finding header files depending on whether "" or <> is used. 7. Return a dictionary of mappings of original file paths to hipified file paths from hipify function. 8. Introduce a version for hipify module to allow extensions to contain back-compatible code that targets a specific point in PyTorch where the hipify functionality changed. 9. Update cuda_to_hip_mappings.py to account for the ROCm component subdirectories inside /opt/rocm/include. This also results in cleanup of the Caffe2_HIP_INCLUDE path to remove unnecessary additions to the include path. The list of changes to cpp_extension.py is as follows: 1. Call hipify when building a CUDAExtension for ROCm. 2. Prune the list of source files to CUDAExtension to include only the hipified versions of any source files in the list (if both original and hipified versions of the source file are in the list) 3. Add subdirectories of /opt/rocm/include to the include path for extensions, so that ROCm headers for subcomponent libraries are found automatically cc jeffdaily sunway513 ezyang Pull Request resolved: https://github.com/pytorch/pytorch/pull/48715 Reviewed By: bdhirsh Differential Revision: D25272824 Pulled By: ezyang fbshipit-source-id: 8bba68b27e41ca742781e1c4d7b07c6f985f040e	2020-12-02 18:03:23 -08:00
Daily, Jeff	7f869dca70	[ROCm] update debug flags (#46717 ) Summary: Improves support for rocgdb when setting DEBUG=1 and building for ROCm. Pull Request resolved: https://github.com/pytorch/pytorch/pull/46717 Reviewed By: mrshenli Differential Revision: D25171544 Pulled By: malfet fbshipit-source-id: b4699ba2277dcb89f07efb86f7153fae82a80dc3	2020-11-30 15:27:30 -08:00
Ashkan Aliabadi	66440d1b29	Tweak Vulkan memory use. (#47728 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/47728 Test Plan: Imported from OSS Reviewed By: SS-JIA Differential Revision: D25032740 Pulled By: AshkanAliabadi fbshipit-source-id: 7eb72538dc1aa3feb4e2f8c4ff9c675eb8e97057	2020-11-30 14:28:09 -08:00
Rong Rong	af520d9d04	[cmake] clean up blas discovery (#47940 ) Summary: remove useless variable changes in blas discovery Pull Request resolved: https://github.com/pytorch/pytorch/pull/47940 Reviewed By: malfet Differential Revision: D25122228 Pulled By: walterddr fbshipit-source-id: 12bc3ce9e4f89a72b6a92c10d14024e5941f4b96	2020-11-30 10:29:50 -08:00
Nikita Shulga	e7ca62be08	Fix PyTorch compilation on Apple M1 (#48275 ) Summary: Update cpuinfo and sleef to contain build fixes for M1 Fixes https://github.com/pytorch/pytorch/issues/48145 Pull Request resolved: https://github.com/pytorch/pytorch/pull/48275 Reviewed By: walterddr Differential Revision: D25135153 Pulled By: malfet fbshipit-source-id: 2a82e14407d6f40c7dacd11109a8499d808c8ec1	2020-11-26 07:08:33 -08:00
Ilia Cherniavskii	f7a8bf2855	Use libkineto in profiler (#46470 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/46470 Adding ability to use Kineto (CUPTI) to profile CUDA kernels Test Plan: USE_KINETO=1 USE_CUDA=1 USE_MKLDNN=1 BLAS=MKL BUILD_BINARY=1 python setup.py develop install python test/test_profiler.py python test/test_autograd.py -k test_profile python test/test_autograd.py -k test_record ``` ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ Name Self CPU % Self CPU CPU total % CPU total CPU time avg Self CUDA Self CUDA % CUDA total CUDA time avg # of Calls ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ Memcpy HtoD (Pageable -> Device) 0.00% 0.000us 0.00% 0.000us 0.000us 2.000us 33.33% 2.000us 1.000us 2 sgemm_32x32x32_NN 0.00% 0.000us 0.00% 0.000us 0.000us 2.000us 33.33% 2.000us 2.000us 1 void at::native::vectorized_elementwise_kernel<4, at... 0.00% 0.000us 0.00% 0.000us 0.000us 1.000us 16.67% 1.000us 1.000us 1 Memcpy DtoH (Device -> Pageable) 0.00% 0.000us 0.00% 0.000us 0.000us 1.000us 16.67% 1.000us 1.000us 1 aten::randn 5.17% 74.000us 6.71% 96.000us 48.000us 0.000us 0.00% 0.000us 0.000us 2 aten::empty 1.33% 19.000us 1.33% 19.000us 4.750us 0.000us 0.00% 0.000us 0.000us 4 aten::normal_ 1.05% 15.000us 1.05% 15.000us 7.500us 0.000us 0.00% 0.000us 0.000us 2 aten::to 77.90% 1.114ms 91.61% 1.310ms 436.667us 0.000us 0.00% 3.000us 1.000us 3 aten::empty_strided 2.52% 36.000us 2.52% 36.000us 12.000us 0.000us 0.00% 0.000us 0.000us 3 aten::copy_ 2.73% 39.000us 11.19% 160.000us 53.333us 0.000us 0.00% 3.000us 1.000us 3 cudaMemcpyAsync 4.34% 62.000us 4.34% 62.000us 20.667us 0.000us 0.00% 0.000us 0.000us 3 cudaStreamSynchronize 1.61% 23.000us 1.61% 23.000us 7.667us 0.000us 0.00% 0.000us 0.000us 3 aten::mm 0.21% 3.000us 7.20% 103.000us 103.000us 0.000us 0.00% 2.000us 2.000us 1 aten::stride 0.21% 3.000us 0.21% 3.000us 1.000us 0.000us 0.00% 0.000us 0.000us 3 cudaLaunchKernel 2.45% 35.000us 2.45% 35.000us 17.500us 0.000us 0.00% 0.000us 0.000us 2 aten::add 0.49% 7.000us 4.27% 61.000us 61.000us 0.000us 0.00% 1.000us 1.000us 1 ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ``` benchmark: https://gist.github.com/ilia-cher/a5a9eb6b68504542a3cad5150fc39b1a Reviewed By: Chillee Differential Revision: D25142223 Pulled By: ilia-cher fbshipit-source-id: b0dff46c28da5fb0a8e01cf548aa4f2b723fde80	2020-11-25 04:32:16 -08:00
Ilia Cherniavskii	f2da18af14	Add USE_KINETO build option (#45888 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/45888 Adding USE_LIBKINETO build option Test Plan: USE_KINETO=1 USE_CUDA=1 USE_MKLDNN=1 BLAS=MKL BUILD_BINARY=1 python setup.py develop install --cmake Reviewed By: Chillee Differential Revision: D25142221 Pulled By: ilia-cher fbshipit-source-id: d1634a8f9599604ff511fac59b9072854289510c	2020-11-21 20:20:32 -08:00
Bert Maher	8a996dd139	[te] Make BUILD_TENSOREXPR_BENCHMARK a real CMake option (#48158 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/48158 Test Plan: Imported from OSS Reviewed By: Chillee Differential Revision: D25059877 Pulled By: bertmaher fbshipit-source-id: a98b6c18a91b4fe89d12bf5f7ead604e3cc0c8b0	2020-11-18 12:19:14 -08:00
Nikita Shulga	8af9f2cc23	Revert D24924736: [pytorch][PR] Hipify revamp Test Plan: revert-hammer Differential Revision: D24924736 (`10b490a3e0`) Original commit changeset: 4af42b8ff4f2 fbshipit-source-id: 7f8f90d55d8a69a2890ec73622fcea559189e381	2020-11-18 11:48:30 -08:00
Jithun Nair	10b490a3e0	Hipify revamp (#45451 ) Summary: This PR revamps the hipify module in PyTorch to overcome a long list of shortcomings in the original implementation. However, these improvements are applied only when using hipify to build PyTorch extensions, not for PyTorch or Caffe2 itself. Correspondingly, changes are made to `cpp_extension.py` to match these improvements. The list of improvements to hipify is as follows: 1. Hipify files in the same directory as the original file, unless there's a "cuda" subdirectory in the original file path, in which case the hipified file will be in the corresponding file path with "hip" subdirectory instead of "cuda". 2. Never hipify the file in-place if changes are introduced due to hipification i.e. always ensure the hipified file either resides in a different folder or has a different filename compared to the original file. 3. Prevent re-hipification of already hipified files. This avoids creation of unnecessary "hip/hip" etc. subdirectories and additional files which have no actual use. 4. Do not write out hipified versions of files if they are identical to the original file. This results in a cleaner output directory, with minimal number of hipified files created. 5. Update header rewrite logic so that it accounts for the previous improvement. 6. Update header rewrite logic so it respects the rules for finding header files depending on whether `""` or `<>` is used. 7. Return a dictionary of mappings of original file paths to hipified file paths from `hipify` function. 8. Introduce a version for hipify module to allow extensions to contain back-compatible code that targets a specific point in PyTorch where the hipify functionality changed. 9. Update `cuda_to_hip_mappings.py` to account for the ROCm component subdirectories inside `/opt/rocm/include`. This also results in cleanup of the `Caffe2_HIP_INCLUDE` path to remove unnecessary additions to the include path. The list of changes to `cpp_extension.py` is as follows: 1. Call `hipify` when building a CUDAExtension for ROCm. 2. Prune the list of source files to CUDAExtension to include only the hipified versions of any source files in the list (if both original and hipified versions of the source file are in the list) 3. Add subdirectories of /opt/rocm/include to the include path for extensions, so that ROCm headers for subcomponent libraries are found automatically cc jeffdaily sunway513 hgaspar lcskrishna ashishfarmer Pull Request resolved: https://github.com/pytorch/pytorch/pull/45451 Reviewed By: ezyang Differential Revision: D24924736 Pulled By: malfet fbshipit-source-id: 4af42b8ff4f21c3782dedb8719b8f9f86b34bd2d	2020-11-18 08:37:49 -08:00
Samuel Aldana	8c00221fe2	Fix inconsistent environment variable naming for setting NVTOOLEXT_HOME in TorchConfig.cmake (#48012 ) Summary: When building libtorch with CUDA installed in some unconventional location, CMake files rely on some environment variables to set cmake variable, in particular NVTOOLSEXT_PATH environment variable is used to set NVTOOLEXT_HOME in cmake/public/cuda.cmake. Later when consuming such build using the generated cmake finder TorchConfig.cmake, another convention is used which feels rather inconsistent, relying on a completly new environment variable NVTOOLEXT_HOME, although the former way is still in place, cmake/public/cuda.cmake being transitively called via Caffe2Config.cmake, which is called by TorchConfig.cmake Fixes https://github.com/pytorch/pytorch/issues/48032 Pull Request resolved: https://github.com/pytorch/pytorch/pull/48012 Reviewed By: gchanan Differential Revision: D25031260 Pulled By: ezyang fbshipit-source-id: 0d6ab8ba9f52dd10be418b1a92b0f53c889f3f2d	2020-11-18 07:37:53 -08:00
Tao Xu	3846e35a55	[GPU] Enable Metal on macosx (#47635 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/47635 Add macosx support for metal. The supported os version is 10.13 and above. ghstack-source-id: 116845318 Test Plan: 1. Sandcastle Tests 2. CircleCI Jobs 3. In the next diff, we'll run the person segmentation model inside a macos app Reviewed By: dreiss Differential Revision: D24825088 fbshipit-source-id: 10d7976c953e765599002dc42d7f8d248d7c9846	2020-11-17 14:44:34 -08:00
Rong Rong	147a48fb27	[cmake] clean up cmake/Utils.cmake (#47923 ) Summary: Consolidate into cmake/public/utils.cmake Pull Request resolved: https://github.com/pytorch/pytorch/pull/47923 Reviewed By: samestep Differential Revision: D24955961 Pulled By: walterddr fbshipit-source-id: 9d5f6af2b353a8c6f6d521c841fd0989393755cd	2020-11-16 08:12:32 -08:00
Rong Rong	7391edb591	[hotfix] fix misleadingly summary BLAS=MKL when there's no BLAS install (#47803 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/47803 Reviewed By: samestep Differential Revision: D24907453 Pulled By: walterddr fbshipit-source-id: a3e41041f6aa506b054eb0ffc61f8525ba02cbf1	2020-11-12 16:05:14 -08:00
Nikita Shulga	e8a73fbf34	Workaround PyTorch debug build crash using old GCC (#47805 ) Summary: gcc-7.4.x or older fails to compile XNNPACK in debug mode with internal compiler error Workaround this in a build script by pasing -O1 optimisation flag to XNNPACK if compiled on older compilers Fixes https://github.com/pytorch/pytorch/issues/47292 Pull Request resolved: https://github.com/pytorch/pytorch/pull/47805 Reviewed By: seemethere Differential Revision: D24905758 Pulled By: malfet fbshipit-source-id: 93f4e3b3b5c10b69734627c50e36b2eb544699c8	2020-11-11 16:33:47 -08:00
Rong Rong	611080a118	[hot fix] cuda 11.0.x doesn't support sm86. (#47408 ) Summary: Bump condition check from >11.0 to >11.0.3 CMAKE 3.5 doesn't support VERSION_GREATER_EQUAL see [here](https://github.com/Dav1dde/glad/issues/134), so we might need to bump this again iv 11.0.4+ releases. should fix https://github.com/pytorch/pytorch/issues/47352 Pull Request resolved: https://github.com/pytorch/pytorch/pull/47408 Reviewed By: glaringlee Differential Revision: D24759949 Pulled By: walterddr fbshipit-source-id: de384c7b150babaf799cce53ed198e5e931899da	2020-11-06 10:34:25 -08:00
Edward Yang	dc6f723cb4	Delete Vulkan from code generator. (#46938 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/46938 It turns out that after https://github.com/pytorch/pytorch/pull/42194 landed we no longer actually generate any registrations into this file. That means it's completely unnecessary. Signed-off-by: Edward Z. Yang <ezyang@fb.com> Test Plan: Imported from OSS Reviewed By: IvanKobzarev Differential Revision: D24573518 Pulled By: ezyang fbshipit-source-id: b41ada9e394b780f037f5977596a36b896b5648c	2020-10-29 14:40:54 -07:00
Nikita Shulga	83d358da7c	Fix LAPACK functionality detection from static OpenBLAS (#46710 ) Summary: BLAS `sgemm_` only depends on pthreads, but LAPACK `cheev_` also depends on libm Pull Request resolved: https://github.com/pytorch/pytorch/pull/46710 Reviewed By: walterddr Differential Revision: D24476082 Pulled By: malfet fbshipit-source-id: e0b91116f18bbcdabb1f99c2ec9d98283df4393f	2020-10-26 08:34:28 -07:00
peter	89f368bef8	Enable XNNPACK on Windows & Update XNNPACK (#45830 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/44283. Pull Request resolved: https://github.com/pytorch/pytorch/pull/45830 Reviewed By: zhangguanheng66 Differential Revision: D24504302 Pulled By: ezyang fbshipit-source-id: ab28088a4fbb553a27ed7c8da87ec7b40c73c2f1	2020-10-23 14:17:45 -07:00
acxz	5e0bfd7455	[Build] [CMake] [ROCm] find hsa-runtime64 properly (#45550 ) Summary: Properly Fixes https://github.com/pytorch/pytorch/issues/44384 similar in vein to https://github.com/pytorch/pytorch/issues/42064 Pull Request resolved: https://github.com/pytorch/pytorch/pull/45550 Reviewed By: ezyang Differential Revision: D24412674 Pulled By: malfet fbshipit-source-id: f3d056c7069cb9d8a7d4174b604b9e3fbb14180b	2020-10-20 08:38:32 -07:00
Shen Li	eadc59df55	Enable TP_USE_CUDA and TP_ENABLE_CUDA_IPC (#46523 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/46523 Test Plan: Imported from OSS Reviewed By: beauby Differential Revision: D24385830 Pulled By: mrshenli fbshipit-source-id: 59a40843b4dc1585e176062476da9ab74c84179b	2020-10-19 09:05:00 -07:00
Tao Xu	04e5fcc0ed	[GPU] Introduce USE_PYTORCH_METAL (#46383 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/46383 The old `USE_METAL` is actually being used by Caffe2. Here we introduce a new macro to enable metal in pytorch. ghstack-source-id: 114499392 Test Plan: - Circle CI - The Person Segmentation model works Reviewed By: linbinyu Differential Revision: D24322018 fbshipit-source-id: 4e5548afba426b49f314366d89b18ba0c7e745ca	2020-10-16 18:19:32 -07:00
pinzhenx	e1f74b1813	Fix mkldnn build on legacy x64 arch (#46082 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/45838 `ARCH_OPT_FLAGS` was the old name of `MKLDNN_ARCH_OPT_FLAGS`, which has been renamed in [this commit](`2a011ff02e (diff-a0abcbf647ed740b80615fb5b1614a44L97)`), but not updated in pytorch. As its default value will be set to sse4.1, some kernels are going to fail on the legacy arch that does not support SSE4.1. This patch was to make this flag effective. Pull Request resolved: https://github.com/pytorch/pytorch/pull/46082 Reviewed By: glaringlee Differential Revision: D24252149 Pulled By: agolynski fbshipit-source-id: 7079deed373d664763c5888feb28795e5235caa8	2020-10-12 08:45:06 -07:00
peterjc123	bb99bea774	Compress NVCC flags for Windows (#45842 ) Summary: Fixes #{issue number} This makes the command line shorter. Also updates `randomtemp` in which the previous version has a limitation that the length of the argument cannot exceed 260. Pull Request resolved: https://github.com/pytorch/pytorch/pull/45842 Reviewed By: albanD Differential Revision: D24137088 Pulled By: ezyang fbshipit-source-id: f0b4240735306e302eb3887f54a2b7af83c9f5dc	2020-10-07 08:39:15 -07:00
REX51	67889db8aa	Replaced BLACKLIST with BLOCKLIST (#45781 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/41714 Pull Request resolved: https://github.com/pytorch/pytorch/pull/45781 Reviewed By: nairbv Differential Revision: D24136821 Pulled By: albanD fbshipit-source-id: 0c0223bda0c5b4da75167a27d7859562db396304	2020-10-06 07:49:00 -07:00
Xiang Gao	2fa062002e	CUDA BFloat16 infrastructure (#44925 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44925 Reviewed By: agolynski Differential Revision: D23783910 Pulled By: ngimel fbshipit-source-id: dacac2ad87d58056bdc68bfe0b7ab1de5c2af0d8	2020-10-02 16:21:30 -07:00
Abaho Katabarwa	de3a48013a	Use CAFFE2_USE_MSVC_STATIC_RUNTIME to determine when to avoid waiting for global destructors on Windows (#43532 ) Summary: We are trying to build libtorch statically (BUILD_SHARED_LIBS=OFF) then link it into a DLL. Our setup hits the infinite loop mentioned [here](`54c05fa34e/torch/csrc/autograd/engine.cpp (L228)`) because we build with `BUILD_SHARED_LIBS=OFF` but still link it all into a DLL at the end of the day. This PR fixes the issue by changing the condition to guard on which windows runtime the build links against using the `CAFFE2_USE_MSVC_STATIC_RUNTIME` flag. `CAFFE2_USE_MSVC_STATIC_RUNTIME` defaults to ON when `BUILD_SHARED_LIBS=OFF`, so backwards compatibility is maintained. I'm not entirely confident I understand the subtleties of the windows runtime versus linking setup, but this setup works for us and should not affect the existing builds. Fixes https://github.com/pytorch/pytorch/issues/44470 Pull Request resolved: https://github.com/pytorch/pytorch/pull/43532 Reviewed By: mrshenli Differential Revision: D24053767 Pulled By: albanD fbshipit-source-id: 1127fefe5104d302a4fc083106d4e9f48e50add8	2020-10-01 16:41:14 -07:00
Xiang Gao	0a15646e15	CUDA RTX30 series support (#45489 ) Summary: I also opened a PR on cmake upstream: https://gitlab.kitware.com/cmake/cmake/-/merge_requests/5292 Pull Request resolved: https://github.com/pytorch/pytorch/pull/45489 Reviewed By: zhangguanheng66 Differential Revision: D23997844 Pulled By: ezyang fbshipit-source-id: 4e7443dde9e70632ee429184f0d51cb9aa5a98b5	2020-09-29 18:19:23 -07:00
peter	33aba57e4c	Patch generate files for system protobuf (#44583 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/42939 Pull Request resolved: https://github.com/pytorch/pytorch/pull/44583 Reviewed By: albanD Differential Revision: D23692639 Pulled By: ezyang fbshipit-source-id: 49781f704dd6ceab7717b63225d0b4076ce33daa	2020-09-29 18:06:33 -07:00
vishalrao487	d2623da52c	replaced whitelist with allowlist (#45260 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/41754 (1) Intially file was named gen_op_registration_whitelist.py I changed it to gen_op_registration_allowlist.py (2) There were some whitelist in comment inside the file, I changed it to allowlist ![update1](https://user-images.githubusercontent.com/62737243/94106752-b296e780-fe59-11ea-8541-632a1dbf90d6.png) Pull Request resolved: https://github.com/pytorch/pytorch/pull/45260 Reviewed By: dhruvbird Differential Revision: D23947182 Pulled By: ljk53 fbshipit-source-id: 31b486592451dbb0605d7950e07747cbb72ab80f	2020-09-29 00:27:46 -07:00
gunandrose4u	f07ac6a004	Fix Windows build failure after DDP PR merged (#45335 ) Summary: Fixes #{issue number} This is resubmit for PR https://github.com/pytorch/pytorch/issues/42897 . Together with fix for Windows build issue introduced by PR https://github.com/pytorch/pytorch/issues/44344 . Pull Request resolved: https://github.com/pytorch/pytorch/pull/45335 Reviewed By: zou3519 Differential Revision: D23931471 Pulled By: mrshenli fbshipit-source-id: f49b5a114944c1450b32934b3292170be064f494	2020-09-25 12:37:50 -07:00
Peter Bell	b70fac75ac	CMake: Fix python dependencies in codegen (#45275 ) Summary: I noticed while working on https://github.com/pytorch/pytorch/issues/45163 that edits to python files in the `tools/codegen/api/` directory wouldn't trigger rebuilds. This tells CMake about all of the dependencies, so rebuilds are triggered automatically. Pull Request resolved: https://github.com/pytorch/pytorch/pull/45275 Reviewed By: zou3519 Differential Revision: D23922805 Pulled By: ezyang fbshipit-source-id: 0fbf2b6a9b2346c31b9b0384e5ad5e0eb0f70e9b	2020-09-25 09:16:38 -07:00
Mike Ruberry	103fa3894a	Revert D23841786: [pytorch][PR] Enable distributed package on windows, Gloo backend supported only Test Plan: revert-hammer Differential Revision: D23841786 (`0122299f9b`) Original commit changeset: 334ba1ed73ef fbshipit-source-id: ec95432f9957df56a5a04e52661f5db920b7f57f	2020-09-24 22:44:33 -07:00
gunandrose4u	0122299f9b	Enable distributed package on windows, Gloo backend supported only (#42897 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/42095 For test case part will be committed to this PR later mrshenli, please help to review Pull Request resolved: https://github.com/pytorch/pytorch/pull/42897 Reviewed By: osalpekar Differential Revision: D23841786 Pulled By: mrshenli fbshipit-source-id: 334ba1ed73eff2f668857390fc32d1bc7f08e5f3	2020-09-24 21:13:55 -07:00
Ivan Kobzarev	6debe825be	[vulkan] glsl shaders relaxed precision mode to cmake option (#43076 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/43076 Test Plan: Imported from OSS Reviewed By: AshkanAliabadi Differential Revision: D23143354 Pulled By: IvanKobzarev fbshipit-source-id: 7b3ead1e63cf8acf6e8e547080a8ead7a2db994b	2020-09-16 12:51:34 -07:00
Abdelrauf	6954ae1278	Vec256 Test cases (#42685 ) Summary: [Tests for Vec256 classes https://github.com/pytorch/pytorch/issues/15676](https://github.com/pytorch/pytorch/issues/15676) Testing Current list: - [x] Blends - [x] Memory: UnAlignedLoadStore - [x] Arithmetics: Plus,Minu,Multiplication,Division - [x] Bitwise: BitAnd, BitOr, BitXor - [x] Comparison: Equal, NotEqual, Greater, Less, GreaterEqual, LessEqual - [x] MinMax: Minimum, Maximum, ClampMin, ClampMax, Clamp - [x] SignManipulation: Absolute, Negate - [x] Interleave: Interleave, DeInterleave - [x] Rounding: Round, Ceil, Floor, Trunc - [x] Mask: ZeroMask - [x] SqrtAndReciprocal: Sqrt, RSqrt, Reciprocal - [x] Trigonometric: Sin, Cos, Tan - [x] Hyperbolic: Tanh, Sinh, Cosh - [x] InverseTrigonometric: Asin, ACos, ATan, ATan2 - [x] Logarithm: Log, Log2, Log10, Log1p - [x] Exponents: Exp, Expm1 - [x] ErrorFunctions: Erf, Erfc, Erfinv - [x] Pow: Pow - [x] LGamma: LGamma - [x] Quantization: quantize, dequantize, requantize_from_int - [x] Quantization: widening_subtract, relu, relu6 Missing: - [ ] Constructors, initializations - [ ] Conversion , Cast - [ ] Additional: imag, conj, angle (note: imag and conj only checked for float complex) #### Notes on tests and testing framework - some math functions are tested within domain range - mostly testing framework randomly tests against std implementation within the domain or within the implementation domain for some math functions. - some functions are tested against the local version. ~~For example, std::round and vector version of round differs. so it was tested against the local version~~ - round was tested against pytorch at::native::round_impl. ~~for double type on Vsx vec_round failed for (even)+0 .5 values~~ . it was solved by using vec_rint - ~~complex types are not tested~~ After enabling complex testing due to precision and domain some of the complex functions failed for vsx and x86 avx as well. I will either test it against local implementation or check within the accepted domain - ~~quantizations are not tested~~ Added tests for quantizing, dequantize, requantize_from_int, relu, relu6, widening_subtract functions - the testing framework should be improved further - ~~For now `-DBUILD_MOBILE_TEST=ON `will be used for Vec256Test too~~ Vec256 Test cases will be built for each CPU_CAPABILITY Fixes: https://github.com/pytorch/pytorch/issues/15676 Pull Request resolved: https://github.com/pytorch/pytorch/pull/42685 Reviewed By: malfet Differential Revision: D23034406 Pulled By: glaringlee fbshipit-source-id: d1bf03acdfa271c88744c5d0235eeb8b77288ef8	2020-09-16 11:48:02 -07:00
Nikita Shulga	a5cc151b8c	Build EigenBlas as static library (#44747 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/43709 Pull Request resolved: https://github.com/pytorch/pytorch/pull/44747 Reviewed By: ezyang Differential Revision: D23717927 Pulled By: malfet fbshipit-source-id: c46fbcf5a55895cb984dd4c5301fbcb784fc17d5	2020-09-16 10:25:26 -07:00
peter	ed862d3682	Split CUDA_NVCC_FLAGS by space (#44603 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/44599 Pull Request resolved: https://github.com/pytorch/pytorch/pull/44603 Reviewed By: albanD Differential Revision: D23692320 Pulled By: ezyang fbshipit-source-id: 6a63d94ab8b88e7a82f9d65f03523d6ef639c754	2020-09-14 20:25:37 -07:00
Yujun	db24c5c582	Change code coverage option name (#43999 ) Summary: According to [documentation](https://github.com/pytorch/pytorch/blob/master/tools/setup_helpers/cmake.py#L265), only options starts with `BUILD_` / `USE_` / `CMAKE_` in `CMakeLists.txt` can be imported by environment variables. --- This diff is originally intended to enable `c++` source coverage with `CircleCI` and `codecov.io`, but we will finish it in the future. You can find the related information in the diff history. Following is the originally procedur: Based on [this pull request](`1bda5e480c`), life becomes much easier for this time. 1.in `build.sh` - Enable coverage builld option for c++ - `apt-get install lcov` 2.in `test.sh` - run `lcov` 3.in `pytorch-job-specs.yml` - copy coverage.info to `test/` folder and upload it to codecov.io Pull Request resolved: https://github.com/pytorch/pytorch/pull/43999 Test Plan: Test on github Reviewed By: malfet Differential Revision: D23464656 Pulled By: scintiller fbshipit-source-id: b2365691f04681d25ba5c00293fbcafe8e8e0745	2020-09-11 15:55:05 -07:00
Nikita Shulga	8a574c7104	[Cmake] Drop quotation marks around `$ENV{MAX_JOBS}` (#44557 ) Summary: Solves `the '-j' option requires a positive integer argument` error on some systems when MAX_JOBS is not defined Pull Request resolved: https://github.com/pytorch/pytorch/pull/44557 Reviewed By: vkuzo Differential Revision: D23653511 Pulled By: malfet fbshipit-source-id: 7d86fb7fb6c946c34afdc81bf2c3168a74d00a1f	2020-09-11 12:57:11 -07:00
Nikita Shulga	fc51047af5	Small fixes in Dependency.cmake and run_test.py (#44414 ) Summary: Do not add gencode flags to NVCC_FLAGS twice: First time they are added in `cmake/public/cuda.cmake` no need to do it again in `cmake/Dependencies.cmake` Copy `additional_unittest_args` before appending local options to it in `run_test()` method Pull Request resolved: https://github.com/pytorch/pytorch/pull/44414 Reviewed By: seemethere Differential Revision: D23605733 Pulled By: malfet fbshipit-source-id: 782a0da61650356a978a892fb03c66cb1a1ea26b	2020-09-09 15:09:33 -07:00
Nikita Shulga	0e64b02912	FindCUDA error handling (#44236 ) Summary: Check return code of `nvcc --version` and if it's not zero, print warning and mark CUDA as not found. Pull Request resolved: https://github.com/pytorch/pytorch/pull/44236 Test Plan: Run `CUDA_NVCC_EXECUTABLE=/foo/bar cmake ../` Reviewed By: ezyang Differential Revision: D23552336 Pulled By: malfet fbshipit-source-id: cf9387140a8cdbc8dab12fcc4bfaf55ae8e6a502	2020-09-07 18:17:55 -07:00
Nikita Shulga	4d431881d1	Control NCCL build parallelism via MAX_JOBS environment var (#44167 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44167 Reviewed By: walterddr, ngimel Differential Revision: D23522419 Pulled By: malfet fbshipit-source-id: 31b25a71fef3e470bdf382eb3698e267326fa354	2020-09-04 10:02:53 -07:00
Bram Wasti	6512032699	[Static Runtime] Add OSS build for static runtime benchmarks (#43881 ) Summary: Adds CMake option. Build with: ``` BUILD_STATIC_RUNTIME_BENCHMARK=ON python setup.py install ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/43881 Reviewed By: hlu1 Differential Revision: D23430708 Pulled By: bwasti fbshipit-source-id: a39bf54e8d4d044a4a3e4273a5b9a887daa033ec	2020-09-02 08:00:18 -07:00
Rong Rong	8ca3913f47	Introduce BUILD_CAFFE2 flag (#43673 ) Summary: introduce BUILD_CAFFE2 flag. default to `ON`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/43673 Reviewed By: malfet Differential Revision: D23381035 Pulled By: walterddr fbshipit-source-id: 1f4582987fa0c4a911f0b18d311c04fdbf8dd8f0	2020-09-01 10:18:23 -07:00
Edward Yang	6ea89166bd	Rewrite of ATen code generator (#42629 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/42629 How to approach reviewing this diff: - The new codegen itself lives in `tools/codegen`. Start with `gen.py`, then read `model.py` and them the `api/` folder. The comments at the top of the files describe what is going on. The CLI interface of the new codegen is similar to the old one, but (1) it is no longer necessary to explicitly specify cwrap inputs (and now we will error if you do so) and (2) the default settings for source and install dir are much better; to the extent that if you run the codegen from the root source directory as just `python -m tools.codegen.gen`, something reasonable will happen. - The old codegen is (nearly) entirely deleted; every Python file in `aten/src/ATen` was deleted except for `common_with_cwrap.py`, which now permanently finds its home in `tools/shared/cwrap_common.py` (previously cmake copied the file there), and `code_template.py`, which now lives in `tools/codegen/code_template.py`. We remove the copying logic for `common_with_cwrap.py`. - All of the inputs to the old codegen are deleted. - Build rules now have to be adjusted to not refer to files that no longer exist, and to abide by the (slightly modified) CLI. - LegacyTHFunctions files have been generated and checked in. We expect these to be deleted as these final functions get ported to ATen. The deletion process is straightforward; just delete the functions of the ones you are porting. There are 39 more functions left to port. Signed-off-by: Edward Z. Yang <ezyang@fb.com> Test Plan: Imported from OSS Reviewed By: bhosmer Differential Revision: D23183978 Pulled By: ezyang fbshipit-source-id: 6073ba432ad182c7284a97147b05f0574a02f763	2020-08-31 09:00:22 -07:00
Jiakai Liu	3a0e35c9f2	[pytorch] deprecate static dispatch (#43564 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/43564 Static dispatch was originally introduced for mobile selective build. Since we have added selective build support for dynamic dispatch and tested it in FB production for months, we can deprecate static dispatch to reduce the complexity of the codebase. Test Plan: Imported from OSS Reviewed By: ezyang Differential Revision: D23324452 Pulled By: ljk53 fbshipit-source-id: d2970257616a8c6337f90249076fca1ae93090c7	2020-08-27 14:52:48 -07:00
Jiakai Liu	3afd24d62c	[pytorch] check in default generated op dependency graph (#43570 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/43570 Add the default op dependency graph to the source tree - use it if user runs custom build in dynamic dispatch mode without providing the graph. Test Plan: Imported from OSS Reviewed By: ezyang Differential Revision: D23326988 Pulled By: ljk53 fbshipit-source-id: 5fefe90ca08bb0ca20284e87b70fe1dba8c66084	2020-08-27 14:51:44 -07:00
Yi Zhang	76894062dc	move wholearchive to link option (#43485 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/43216 Pull Request resolved: https://github.com/pytorch/pytorch/pull/43485 Reviewed By: glaringlee Differential Revision: D23318735 Pulled By: malfet fbshipit-source-id: 90c316d3d5ed51afcff356e6d9219950f119a902	2020-08-25 10:36:10 -07:00
Parichay Kapoor	8ecfa9d9a2	[cmake] End support for python3.5 for pytorch (#43105 ) Summary: PyTorch uses f-string in its python codes. Python support for f-string started with version 3.6 Using python version 3.5 or older fails the build with latest release/master. This patch checks the version of the python used for build and mandates it to be 3.6 or higher. Signed-off-by: Parichay Kapoor <kparichay@gmail.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/43105 Reviewed By: glaringlee Differential Revision: D23301481 Pulled By: malfet fbshipit-source-id: e9b4f7bffce7384c8ade3b7d131b10cf58f5e8a0	2020-08-25 09:42:42 -07:00
Ashkan Aliabadi	e8139624f2	Search on system path for Vulkan headers and libraries as a last resort. (#43301 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/43301 Test Plan: Imported from OSS Reviewed By: IvanKobzarev Differential Revision: D23252338 Pulled By: AshkanAliabadi fbshipit-source-id: 8eefe98eedf9dbeb570565bfb13ab61b1d6bca0e	2020-08-20 21:14:09 -07:00
Ann Shan	0dc41ff465	[pytorch] add flag for autograd ops to mobile builds (#43154 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/43154 Adds the build flag `BUILD_MOBILE_AUTOGRAD` which toggles whether autograd files should be included for a PyTorch mobile build (default off). ghstack-source-id: 110369406 Test Plan: CI Reviewed By: ljk53 Differential Revision: D23061913 fbshipit-source-id: bc3d6683ab17f158990d83e4fae0a011d5adeca1	2020-08-20 12:39:55 -07:00
Nikita Shulga	0cf4a5bccb	Add GCC codecoverage flags (#43066 ) Summary: Rename `CLANG_CODE_COVERAGE` option to `CODE_COVERAGE` and add compiler specific flags for GCC and Clang Pull Request resolved: https://github.com/pytorch/pytorch/pull/43066 Reviewed By: scintiller Differential Revision: D23137488 Pulled By: malfet fbshipit-source-id: a89570469692f878d84f7da6f9d5dc01df423e80	2020-08-14 17:16:18 -07:00
Nikita Shulga	dcee8933fb	Fix some linking rules to allow path with whitespaces (#42718 ) Summary: Essentially, replace `-Wl,--whole-archive,$<TARGET_FILE:FOO>` with `-Wl,--whole-archive,\"$<TARGET_FILE:FOO>\"` as TARGET_FILE might return path containing whitespaces Fixes https://github.com/pytorch/pytorch/issues/42657 Pull Request resolved: https://github.com/pytorch/pytorch/pull/42718 Reviewed By: ezyang Differential Revision: D22993568 Pulled By: malfet fbshipit-source-id: de878b17d20e35b51dd350f20d079c8b879f70b5	2020-08-07 10:23:23 -07:00
Nikita Shulga	31ed468905	Fix cmake warning (#42707 ) Summary: If argumenets in set_target_properties are not separated by whitespace, cmake raises a warning: ``` CMake Warning (dev) at cmake/public/cuda.cmake:269: Syntax Warning in cmake code at column 54 Argument not separated from preceding token by whitespace. ``` Fixes #{issue number} Pull Request resolved: https://github.com/pytorch/pytorch/pull/42707 Reviewed By: ailzhang Differential Revision: D22988055 Pulled By: malfet fbshipit-source-id: c3744f23b383d603788cd36f89a8286a46b6c00f	2020-08-07 09:57:21 -07:00
Luca Wehrstedt	c30bc6d4d7	Update TensorPipe submodule (#42522 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/42522 Main changes: - Consolidated CMake files to have a single entry point, rather than having a specialized one for PyTorch. - Changed the way the preprocessor flags are provided, and changed their name. There were a few instances in PyTorch's CMake files where we were directly adding TensorPipe's source directory as an include path, which however doesn't contain the auto-generated header we now added. We fix that by adding the `tensorpipe` CMake target as a dependency, so that the include paths defined by TensorPipe are used, which contain that auto-generated header. So instead we link those targets to the tensorpipe target in order for them to pick up the correct include directories. I'm turning off SHM and CMA for now because they have never been covered by the CI. I'll enable them in a separate PR so that if they turn out to be flaky we can revert that change without reverting this one. Test Plan: CI Reviewed By: malfet Differential Revision: D22959472 fbshipit-source-id: 1959a41c4a66ef78bf0f3bd5e3964969a2a1bf67	2020-08-06 02:14:58 -07:00
Akash Patel	644d787cd8	find rccl properly (#42072 ) Summary: Fixes #{issue number} Pull Request resolved: https://github.com/pytorch/pytorch/pull/42072 Reviewed By: malfet Differential Revision: D22969778 Pulled By: ezyang fbshipit-source-id: 509178775d4d99460bcb147bcfced29f04cabdc4	2020-08-05 21:46:38 -07:00
peter	b08347fd7b	Add CUDA 11 builds for Windows CI (#42420 ) Summary: Stacked on https://github.com/pytorch/pytorch/pull/42410. Pull Request resolved: https://github.com/pytorch/pytorch/pull/42420 Reviewed By: seemethere Differential Revision: D22917230 Pulled By: malfet fbshipit-source-id: 6ad394f7f8c430c587e0b0d9c5a5e7b7bcd85bfe	2020-08-05 09:40:33 -07:00
Michael Carilli	0f358fab6b	Hide cudnn symbols in libtorch_cuda.so when statically linking cudnn (#41986 ) Summary: This PR intends to fix https://github.com/pytorch/pytorch/issues/32983. The initial (one-line) diff causes statically linked cudnn symbols in `libtorch_cuda.so` to have local linkage (such that they shouldn't be visible to external libraries during dynamic linking at load time), at least in my source build on Ubuntu 20.04. Procedure I used to verify: ``` export USE_STATIC_CUDNN=ON python3 setup.py install ... ``` then ``` mcarilli@mcarilli-desktop:~/Desktop/mcarilli_github/pytorch/torch/lib$ nm libtorch_cuda.so \| grep cudnnCreate 00000000031ff540 t cudnnCreate 00000000031fbe70 t cudnnCreateActivationDescriptor ``` Before the diff they were marked with capital `T`s indicating external linkage. Caveats: - The fix is gcc-specific afaik. I have no idea how to enable it for Windows or other compilers. - Hiding the cudnn symbols will break external C++ applications that rely on linking `libtorch.so` to supply cudnn symbol definitions. IMO this is "off menu" usage so I don't think it's a major concern. Hiding the symbols _won't_ break applications that call cudnn indirectly through torch functions, which IMO is the "on menu" way. - I know _very little_ about the build system. The diff's intent is to add a link option that applies to any Pytorch `.so`s that statically link cudnn, and does so on Linux only. I'm blindly following soumith 's recommendation https://github.com/pytorch/pytorch/issues/32983#issuecomment-662056151, and post-checking the built libs (I also added `set(CMAKE_VERBOSE_MAKEFILE ON)` to the top-level CMakeLists.txt at one point to confirm `-Wl,--exclude-libs,libcudnn_static.a` was picked up by the command that linked `libtorch_cuda.so`). - https://github.com/pytorch/pytorch/issues/32983 (which used a Pytorch 1.4 binary build) complained about `libtorch.so`, not `libtorch_cuda.so`: ``` nvpohanh@ubuntu:~$ nm /usr/local/lib/python3.5/dist-packages/torch/lib/libtorch.so \| grep ' cudnnCreate' 000000000f479c30 T cudnnCreate 000000000f475ff0 T cudnnCreateActivationDescriptor ``` In my source build, `libtorch.so` ends up small, containing no cudnn symbols (this is true with or without the PR's diff), which contradicts https://github.com/pytorch/pytorch/issues/32983. Maybe the symbol organization (what goes in `libtorch.so` vs `libtorch_cuda/cpu/whatever.so`) changed since 1.4. Or maybe the symbol organization is different for source vs binary builds, in which case I have no idea if this PR's diff has the same effect for a binary build. Pull Request resolved: https://github.com/pytorch/pytorch/pull/41986 Reviewed By: glaringlee Differential Revision: D22934926 Pulled By: malfet fbshipit-source-id: 711475834e0f8148f0e5f2fe28fca5f138ef494b	2020-08-04 22:59:40 -07:00
Edward Yang	352e15f1a2	Revert D22812445: Update TensorPipe submodule Test Plan: revert-hammer Differential Revision: D22812445 (`2335430086`) Original commit changeset: e6d824bb28f5 fbshipit-source-id: 606632a9aaf2513b5ac949e4d6687aa7563eae5d	2020-07-31 10:16:48 -07:00
Luca Wehrstedt	2335430086	Update TensorPipe submodule (#42225 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/42225 Main changes: - Consolidated CMake files to have a single entry point, rather than having a specialized one for PyTorch. - Changed the way the preprocessor flags are provided, and changed their name. There were a few instances in PyTorch's CMake files where we were directly adding TensorPipe's source directory as an include path, which however doesn't contain the auto-generated header we now added. We fix that by adding the `tensorpipe` CMake target as a dependency, so that the include paths defined by TensorPipe are used, which contain that auto-generated header. I'm turning off SHM and CMA for now because they have never been covered by the CI. I'll enable them in a separate PR so that if they turn out to be flaky we can revert that change without reverting this one. Test Plan: CircleCI is all green. Reviewed By: beauby Differential Revision: D22812445 fbshipit-source-id: e6d824bb28f5afe75fd765de0430968174f3531f	2020-07-30 02:32:52 -07:00
Khalid Almufti	b282297559	Replace whitelist with allowlist (#42067 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/41757 I've replaced all the whitelist with allowlist for this issue. Pull Request resolved: https://github.com/pytorch/pytorch/pull/42067 Reviewed By: pbelevich Differential Revision: D22791690 Pulled By: malfet fbshipit-source-id: 638c13cf49915f5c83bd79c7f4a39b8390cc15b4	2020-07-28 08:01:16 -07:00
acxz	e59db43313	Find hip properly (#42064 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/41886 Pull Request resolved: https://github.com/pytorch/pytorch/pull/42064 Reviewed By: seemethere Differential Revision: D22757115 Pulled By: malfet fbshipit-source-id: 9c8805e6eb0b7d7defe0ecb08c1e45dcc775a237	2020-07-27 13:47:01 -07:00
Nikita Shulga	cf7e7909d5	NCCL must depend on librt (#41978 ) Summary: Since NCCL makes calls to shm_open/shm_close it must depend on librt on Linux This should fix `DSO missing from command line` error on some platforms Pull Request resolved: https://github.com/pytorch/pytorch/pull/41978 Reviewed By: colesbury Differential Revision: D22721430 Pulled By: malfet fbshipit-source-id: d2ae08ce9da3979daaae599e677d5e4519b080f0	2020-07-24 16:47:19 -07:00
Alexander Grund	a4b831a86a	Replace if(NOT ${var}) by if(NOT var) (#41924 ) Summary: As explained in https://github.com/pytorch/pytorch/issues/41922 using `if(NOT ${var})" is usually wrong and can lead to issues like https://github.com/pytorch/pytorch/issues/41922 where the condition is wrongly evaluated to FALSE instead of TRUE. Instead the unevaluated variable name should be used in all cases, see the CMake docu for details. This fixes the `NOT ${var}` cases by using a simple regexp replacement. It seems `pybind11_PREFER_third_party` is the only variable really prone to causing an issue as all others are set. However due to CMake evaluating unquoted strings in `if` conditions as variable names I recommend to never use unquoted `${var}` in an if condition. A similar regexp based replacement could be done on the whole codebase but as that does a lot of changes I didn't include this now. Also `if(${var})` will likely lead to a parser error if `var` is unset instead of a wrong result Fixes https://github.com/pytorch/pytorch/issues/41922 Pull Request resolved: https://github.com/pytorch/pytorch/pull/41924 Reviewed By: seemethere Differential Revision: D22700229 Pulled By: mrshenli fbshipit-source-id: e2b3466039e4312887543c2e988270547a91c439	2020-07-23 15:49:20 -07:00
Jeff Daily	5152633258	[ROCm] update hip library name (#41813 ) Summary: With transition to hipclang, the HIP runtime library name was changed. A symlink was added to ease the transition, but is going to be removed. Conditionally set library name based on HIP compiler used. Patch gloo submodule as part of build_amd.py script until its associated fix is available. CC ezyang xw285cornell sunway513 Pull Request resolved: https://github.com/pytorch/pytorch/pull/41813 Reviewed By: zhangguanheng66 Differential Revision: D22660077 Pulled By: xw285cornell fbshipit-source-id: c538129268d9947535b34523201f655b13c9e0a3	2020-07-22 09:42:45 -07:00
suffian khan	92b95e5243	Fix NCCL version check when nccl.h in non-standard location. (#40982 ) Summary: The NCCL discovery process fails to compile detect_nccl_version.cc when nccl.h resides in a non-standard location. Pass __NCCL_INCLUDE_DIRS__ to _try_run(... detect_nccl_version.cc)_ to fix this. Can reproduce with Dockerfile .. ```Dockerfile FROM nvidia/cuda:10.2-cudnn7-devel-ubuntu18.04 as build WORKDIR /stage # install conda ARG CONDA_VERSION=4.7.10 ARG CONDA_URL=https://repo.anaconda.com/miniconda/Miniconda3-${CONDA_VERSION}-Linux-x86_64.sh RUN cd /stage && curl -fSsL --insecure ${CONDA_URL} -o install-conda.sh &&\ /bin/bash ./install-conda.sh -b -p /opt/conda &&\ /opt/conda/bin/conda clean -ya ENV PATH=/opt/conda/bin:${PATH} # install prerequisites RUN conda install numpy ninja pyyaml mkl mkl-include setuptools cmake cffi # attempt compile ENV CUDA_HOME="/usr/local/cuda" \ CUDNN_LIBRARY="/usr/lib/x86_64-linux-gnu" \ NCCL_INCLUDE_DIR="/usr/local/cuda/include" \ NCCL_LIB_DIR="/usr/local/cuda/lib64" \ USE_SYSTEM_NCCL=1 RUN apt-get -y update &&\ apt-get -y install git &&\ cd /stage && git clone https://github.com/pytorch/pytorch.git &&\ cd pytorch &&\ git submodule update --init --recursive &&\ python setup.py bdist_wheel ``` This generates the following error .. ``` -- Found NCCL: /usr/local/cuda/include -- Determining NCCL version from /usr/local/cuda/include/nccl.h... -- Looking for NCCL_VERSION_CODE -- Looking for NCCL_VERSION_CODE - found CMake Error at cmake/Modules/FindNCCL.cmake:78 (message): Found NCCL header version and library version do not match! (include: /usr/local/cuda/include, library: /usr/local/cuda/lib64/libnccl.so) Please set NCCL_INCLUDE_DIR and NCCL_LIB_DIR manually. ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/40982 Reviewed By: zou3519 Differential Revision: D22603911 Pulled By: malfet fbshipit-source-id: 084870375a270fb9c7daf3c2e731992a03614ad6	2020-07-17 13:54:17 -07:00
Xiang Gao	8940a4e684	Pull upstream select_compute_arch from cmake for Ampere (#41133 ) Summary: This pulls the following merge requests from CMake upstream: - https://gitlab.kitware.com/cmake/cmake/-/merge_requests/4979 - https://gitlab.kitware.com/cmake/cmake/-/merge_requests/4991 The above two merge requests improve the Ampere build: - If `TORCH_CUDA_ARCH_LIST` is not set, it can now automatically pickup 8.0 as its part of its default value - If `TORCH_CUDA_ARCH_LIST=Ampere`, it no longer fails with `Unknown CUDA Architecture Name Ampere in CUDA_SELECT_NVCC_ARCH_FLAGS` Codes related to architecture < 3.5 are manually removed because PyTorch no longer supports it. cc: ngimel ptrblck Pull Request resolved: https://github.com/pytorch/pytorch/pull/41133 Reviewed By: malfet Differential Revision: D22540547 Pulled By: ezyang fbshipit-source-id: 6e040f4054ef04f18ebb7513497905886a375632	2020-07-15 12:53:32 -07:00
Anush Elangovan	c86699d425	[cmake] Use PROJECT_SOURCE_DIR instead of CMAKE_* (#41387 ) Summary: Add support for including pytorch via an add_subdirectory() This requires using PROJECT_* instead of CMAKE_* which refer to the top-most project including pytorch. TEST=add_subdirectory() into a pytorch checkout and build. There are still some hardcoded references to TORCH_SRC_DIR, I will fix in a follow on commit. For now you can create a symlink to <pytorch>/torch/ in your project. Change-Id: Ic2a8aec3b08f64e2c23d9e79db83f14a0a896abc Pull Request resolved: https://github.com/pytorch/pytorch/pull/41387 Reviewed By: zhangguanheng66 Differential Revision: D22539944 Pulled By: ezyang fbshipit-source-id: b7e9631021938255f0a6ea897a7abb061759093d	2020-07-15 11:09:05 -07:00
Alexander Grund	ac3542fa59	Define PSIMD_SOURCE_DIR when including FP16 (#41233 ) Summary: Avoids a superflous redownload when *NNPACK is not used (e.g. on Power) Example: https://powerci.osuosl.org/job/pytorch-master-nightly-py3-linux-ppc64le/1128/consoleFull Search for "Downloading PSimd" See also https://github.com/pytorch/pytorch/issues/41178 Pull Request resolved: https://github.com/pytorch/pytorch/pull/41233 Differential Revision: D22488833 Pulled By: malfet fbshipit-source-id: 637291419ddd3b2a8dc25e211a4ebbba955e5855	2020-07-10 16:55:10 -07:00
Kimish Patel	d6feb6141f	[Vec256][neon] Add neon backend for vec256 (#39341 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/39341 This PR introduces neon backend for vec256 class for float datatype. For now only aarch64 is enabled due to few issues with enabling in aarch32 bit. Test Plan: vec256_test Imported from OSS Differential Revision: D21822399 fbshipit-source-id: 3851c4336d93d1c359c85b38cf19904f82bc7b8d	2020-07-09 16:25:09 -07:00
Kimish Patel	bddba1e336	Add benchmark for add op. (#40059 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/40059 This benchmark is added specifically for mobile to see if compiler is autovectorizing and thus we have no advantage of neon backend for vec256 for add op. Test Plan: CI Imported from OSS Differential Revision: D22055146 fbshipit-source-id: 43ba6c4ae57c6f05d84887c2750ce21ae1b0f0b5	2020-07-09 16:22:55 -07:00
Yujun Zhao	22f940b7bd	add clang code coverage compile flags (#41103 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/41103 add a CLANG_CODE_COVERAGE option to CMakeList. If the option is ON, add code coverage needed compile flags. Test Plan: Clone pytorch source code to local, modified these changes and builded it with `CLANG_CODE_COVERAGE ON` and `BUILD_TESTS ON`. Run a manual test and attach code coverage report. {F243609020} Reviewed By: malfet Differential Revision: D22422513 fbshipit-source-id: 27a31395c31b5b5f4b72523954722771d8f61080	2020-07-09 14:14:18 -07:00
Alexander Grund	7c29a4e66f	Don't add NCCL dependency to gloo if system NCCL is used (#41180 ) Summary: This avoids a (currently only) warning of cmake: ``` The dependency target "nccl_external" of target "gloo_cuda" does not exist. Call Stack (most recent call first): CMakeLists.txt:411 (include) ``` This will be a real problem once Policy CMP0046 is set which will make this warning be an error Pull Request resolved: https://github.com/pytorch/pytorch/pull/41180 Differential Revision: D22460623 Pulled By: malfet fbshipit-source-id: 0222b12b435e5e2fdf2bc85752f95abba1e3d4d5	2020-07-09 12:10:29 -07:00
Ashkan Aliabadi	c8deca8ea8	Update pthreadpool to pthreadpool:029c88620802e1361ccf41d1970bd5b07fd6b7bb. (#40524 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/40524 Reviewed By: ezyang Differential Revision: D22215742 Pulled By: AshkanAliabadi fbshipit-source-id: ef594e0901337a92b21ddd44e554da66c723eb7c	2020-07-09 10:00:36 -07:00
Thomas Viehmann	a8bc7545d5	use PYTORCH_ROCM_ARCH to set GLOO_ROCM_ARCH (#40170 ) Summary: Previously it used the default arch set which may or may not coincide with the user's. Pull Request resolved: https://github.com/pytorch/pytorch/pull/40170 Differential Revision: D22400866 Pulled By: xw285cornell fbshipit-source-id: 222ba684782024fa68f37bf7d4fdab9a2389bdea	2020-07-07 19:41:02 -07:00
Wojciech Baranowski	0b9717b86a	When linking libtorch_cpu.so, put AVX sources last in the input list (#40449 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/39600 Pull Request resolved: https://github.com/pytorch/pytorch/pull/40449 Reviewed By: VitalyFedyunin Differential Revision: D22312501 Pulled By: colesbury fbshipit-source-id: 4c09adb0173749046f20b84241d6c940b339ad77	2020-07-06 07:56:12 -07:00
peter	6e4f99b063	Fix wrong MSVC version constraint for CUDA 9.2 (#40794 ) Summary: Tested with https://github.com/pytorch/pytorch/pull/40782. Pull Request resolved: https://github.com/pytorch/pytorch/pull/40794 Differential Revision: D22318045 Pulled By: malfet fbshipit-source-id: a737ffd7cb8a6a9efb62b84378318f4c3800ad8f	2020-06-30 13:02:45 -07:00
Zhang, Xiaobing	63e5a53b8c	DNNL: fix build error when DNNL using TBB threading pool (#40699 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/40699 Differential Revision: D22286334 Pulled By: albanD fbshipit-source-id: 0635a0a5e4bf80d44d90c86945d92e98e26ef480	2020-06-29 13:53:18 -07:00
Nikita Shulga	44bf822084	Add C++ standard version check to top level headers (#40510 ) Summary: Remove `-std=c++14` flag from `utils.cmake`, since PyTorch C++ API can be invoked by any compiler compliant with C++14 standard or later Pull Request resolved: https://github.com/pytorch/pytorch/pull/40510 Differential Revision: D22253313 Pulled By: malfet fbshipit-source-id: ff731525868b251c27928fc98b0724080ead9be2	2020-06-26 08:44:04 -07:00
David Reiss	b7e044f0e5	Re-apply PyTorch pthreadpool changes Summary: This re-applies D21232894 (`b9d3869df3`) and D22162524, plus updates jni_deps in a few places to avoid breaking host JNI tests. Test Plan: `buck test @//fbandroid/mode/server //fbandroid/instrumentation_tests/com/facebook/caffe2:host-test` Reviewed By: xcheng16 Differential Revision: D22199952 fbshipit-source-id: df13eef39c01738637ae8cf7f581d6ccc88d37d5	2020-06-23 19:26:21 -07:00
Kate Mormysh	92d3182c11	Revert D21232894: Unify PyTorch mobile's threadpool usage. Test Plan: revert-hammer Differential Revision: D21232894 (`b9d3869df3`) Original commit changeset: 8b3de86247fb fbshipit-source-id: e6517cfec08f7dd0f4f8877dab62acf1d65afacd	2020-06-23 17:09:14 -07:00
Ashkan Aliabadi	b9d3869df3	Unify PyTorch mobile's threadpool usage. (#37243 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/37243 * Why * As it stands, we have two thread pool solutions concurrently in use in PyTorch mobile: (1) the open source pthreadpool library under third_party, and (2) Caffe2's implementation of pthreadpool under caffe2/utils/threadpool. Since the primary use-case of the latter has been to act as a drop-in replacement for the third party version so as to enable integration and usage from within NNPACK and QNNPACK, Caffe2's implementation is intentionally written to the exact same interface as the third party version. The original argument in favor of C2's implementation has been improved performance as a result of using spin locks, as opposed to relinquishing the thread's time slot and putting it to sleep - a less expensive operation up to a point. That seems to have given C2's implementation the upper hand in performance, hence justifying the added maintenance complexity, until the third party version improved in parallel surpassing the efficiency of C2's implementation as I have verified in benchmarks. With that advantage gone, there is no reason to continue using C2's implementation in PyTorch mobile either from the perspective of performance or code hygiene. As a matter of fact, there is considerable performance benefit to be had as a result of using the third party version as it currently stands. This is a tricky change though, mainly because in order to avoid potential performance regressions, of which I have witnessed none but just in abundance of caution, we have decided to continue using the internal C2's implementation whenever building for Caffe2. Again, this is mainly to avoid potential performance regressions in production C2 use cases even if doing so results in reduced performance as far as I can tell. So to summarize, today, and as it currently stands, we are using C2's implementation for (1) NNPACK, (2) PyTorch QNNPACK, and (3) ATen parallel_for on mobile builds, while using the third party version of pthreadpool for XNNPACK as XNNPACK does not provide any build options to link against an external implementation unlike NNPACK and QNNPACK do. The goal of this PR then, is to unify all usage on mobile to the third party implementation both for improved performance and better code hygiene. This applies to PyTorch's use of NNPACK, QNNPACK, XNNPACK, and mobile's implementation of ATen parallel_for, all getting routed to the exact same third party implementation in this PR. Considering that NNPACK, QNNPACK, and XNNPACK are not mobile specific, these benefits carry over to non-mobile builds of PyTorch (but not Caffe2) as well. The implementation of ATen parallel_for on non-mobile builds remains unchanged. * How * This is where things get tricky. A good deal of the build system complexity in this PR arises from our desire to maintain C2's implementation intact for C2's use. pthreadpool is a C library with no concept of namespaces, which means two copies of the library cannot exist in the same binary or symbol collision will occur violating ODR. This means that somehow, and based on some condition, we must decide on the choice of a pthreadpool implementation. In practice, this has become more complicated as a result of all the possible combinations that USE_NNPACK, USE_QNNPACK, USE_PYTORCH_QNNPACK, USE_XNNPACK, USE_SYSTEM_XNNPACK, USE_SYSTEM_PTHREADPOOL and other variables can result in. Having said that, I have done my best in this PR to surgically cut through this complexity in a way that minimizes the side effects, considering the significance of the performance we are leaving on the table, yet, as a result of this combinatorial explosion explained above I cannot guarantee that every single combination will work as expected on the first try. I am heavily relying on CI to find any issues as local testing can only go that far. Having said that, this PR provides a simple non mobile-specific C++ thread pool implementation on top of pthreadpool, namely caffe2::PThreadPool that automatically routes to C2's implementation or the third party version depending on the build configuration. This simplifies the logic at the cost of pushing the complexity to the build scripts. From there on, this thread pool is used in aten parallel_for, and NNPACK and family, again, routing all usage of threading to C2 or third party pthreadpool depending on the build configuration. When it is all said or done, the layering will look like this: a) aten::parallel_for, uses b) caffe2::PThreadPool, which uses c) pthreadpool C API, which delegates to c-1) third_party implementation of pthreadpool if that's what the build has requested, and the rabbit hole ends here. c-2) C2's implementation of pthreadpool if that's what the build has requested, which itself delegates to c-2-1) caffe2::ThreadPool, and the rabbit hole ends here. NNPACK, and (PyTorch) QNNPACK directly hook into (c). They never go through (b). Differential Revision: D21232894 Test Plan: Imported from OSS Reviewed By: dreiss Pulled By: AshkanAliabadi fbshipit-source-id: 8b3de86247fbc3a327e811983e082f9d40081354	2020-06-23 16:34:51 -07:00
Nikita Shulga	e509c58a1c	Set C++14 compatibility flag in torch_compile_options (#40399 ) Summary: Also mark warning modifiers as private options (i.e. libraries depending on `torch_cpu` do not have to be compiled with `-Wall`) Closes https://github.com/pytorch/pytorch/issues/31283 Pull Request resolved: https://github.com/pytorch/pytorch/pull/40399 Differential Revision: D22186206 Pulled By: malfet fbshipit-source-id: 1ad4277b5acc5c39849a3e4efe4b93a189d26e59	2020-06-23 07:10:22 -07:00
Richard Zou	2ba5f98dd1	Revert D22068657: [pytorch][PR] Remove global CMAKE_INSTALL_RPATH_USE_LINK_PATH directive Test Plan: revert-hammer Differential Revision: D22068657 Original commit changeset: b04c529572a9 fbshipit-source-id: d8227dfc12d9b6382f7bf2905686b6025034561c	2020-06-17 13:05:01 -07:00
mattip	49732f0450	Remove global CMAKE_INSTALL_RPATH_USE_LINK_PATH directive (#37737 ) Summary: Closes gh-35418, PR gh-16414 added [the `CMAKE_INSTALL_RPATH_USE_LINK_PATH`directive](https://github.com/pytorch/pytorch/pull/16414/files#diff-dcf5891602b4162c36c2125c806639c5R16) which is non-standard and will cause CMake to write an `RPATH` entry for libraries outside the current build. Removing it leaves an RPATH entry for `$ORIGIN` but removes the entries for things like `/usr/local/cuda-10.2/lib64/stubs:/usr/local/cuda-10.2/lib64` for `libcaffe2_nvrtc.so` on linux. The added test fails before this PR, passes after. It is equivalent to checking `objdump -p torch/lib/libcaffe2_nvrtc.so \| grep RPATH` for an external path to the directory where cuda "lives" I am not sure if it solve the `rpath/libc++.1.dylib` problem for `_C.cpython-37m-darwin.so` on macOS in issue gh-36941 Pull Request resolved: https://github.com/pytorch/pytorch/pull/37737 Differential Revision: D22068657 Pulled By: ezyang fbshipit-source-id: b04c529572a94363855f1e4dd3e93c9db3c85657	2020-06-16 11:18:39 -07:00
peter	905c6730b7	Adding /FS for NVCC if /Zi is used (#39994 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/39989. Pull Request resolved: https://github.com/pytorch/pytorch/pull/39994 Differential Revision: D22034956 Pulled By: malfet fbshipit-source-id: b26cf188eba8b796ee6e39e6adbc3e2fbb07a53a	2020-06-13 12:16:12 -07:00
peter	0f39ed86a7	Cleanup debug info switches with MSVC (#39703 ) Summary: Switch off `/Z7` so that we don't generate debug info in Release and MinSizeRel builds, so that we will probably get smaller static libraries and object files and faster build time Pull Request resolved: https://github.com/pytorch/pytorch/pull/39703 Differential Revision: D21960684 Pulled By: ezyang fbshipit-source-id: 909a237a138183591d667885b13fc311470eed65	2020-06-09 14:11:40 -07:00
Xiang Gao	b3fac8af6b	Initial support for building on Ampere GPU, CUDA 11, cuDNN 8 (#39277 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/39277 This PR contains initial changes that makes PyTorch build with Ampere GPU, CUDA 11, and cuDNN 8. TF32 related features will not be included in this PR. Test Plan: Imported from OSS Differential Revision: D21832814 Pulled By: malfet fbshipit-source-id: 37f9c6827e0c26ae3e303580f666584230832d06	2020-06-02 10:03:42 -07:00
Michael Voznesensky	fce01a9bab	[JIT] Make new zip serialization for torch save/load significantly (~70%) faster (#38379 ) Summary: Before: ``` 2020-05-11 18:31:41 INFO Benchmarking 'basic', best of 10 runs (with 1 warmup runs) { "Big Tensors Save": { "mean": 17.8048762, "median": 17.458917 }, "Big Tensors Load": { "mean": 3.2556887, "median": 2.9668495000000004 }, "Small Tensors Save": { "mean": 4.0381357, "median": 3.9440125 }, "Small Tensors Load": { "mean": 5.8792499, "median": 5.603067 }, "benchmark_run_at": "2020-05-12T01:31:41" } ``` After ``` Use zipfile serialization: True 2020-05-12 20:15:32 INFO Benchmarking 'basic', best of 10 runs (with 1 warmup runs) { "Big Tensors Save": { "mean": 4.7534657, "median": 4.646732 }, "Big Tensors Load": { "mean": 3.6001919, "median": 3.493285 }, "Small Tensors Save": { "mean": 4.1066924, "median": 4.1219255 }, "Small Tensors Load": { "mean": 6.3902358, "median": 6.36977 }, "benchmark_run_at": "2020-05-13T03:15:32" } ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/38379 Differential Revision: D21779494 Pulled By: voznesenskym fbshipit-source-id: 694d65029a5b817424d454bd331e285df828c67a	2020-05-29 01:56:18 -07:00
Ivan Kobzarev	928e99b9bb	[vulkan] jni build support USE_VULKAN (#39188 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/39188 Extracting Vulkan_LIBS and Vulkan_INCLUDES setup from `cmake/Dependencies.cmake` to `cmake/VulkanDependencies.cmake` and reuse it in android/pytorch_android/CMakeLists.txt Adding control to build with Vulkan setting env variable `USE_VULKAN` for `scripts/build_android.sh` `scripts/build_pytorch_android.sh` We do not use Vulkan backend in pytorch_android, but with this build option we can track android aar change with `USE_VULKAN` added. Currently it is 88Kb. Test Plan: Imported from OSS Differential Revision: D21770892 Pulled By: IvanKobzarev fbshipit-source-id: a39433505fdcf43d3b524e0fe08062d5ebe0d872	2020-05-28 15:39:02 -07:00
peter	1fef2075a5	Disable some unsupported module for 32-bit build (#38950 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/38322#issuecomment-632976523 and https://github.com/pytorch/pytorch/issues/38322#issuecomment-628698852. Pull Request resolved: https://github.com/pytorch/pytorch/pull/38950 Differential Revision: D21721918 Pulled By: ezyang fbshipit-source-id: 999788bb88d3e3c2c06f8dec4f0d6b3389095936	2020-05-26 08:30:35 -07:00
Ivan Kobzarev	b460465a18	[Mobile GPU][Integration] Vulkan backend integration (#36491 ) Summary: This PR contains the initial version of Vulkan (GPU) Backend integration. The primary target environment is Android, but the desktop build is also supported. ## CMake Introducing three cmake options: USE_VULKAN: The main switch, if it is off, all other options do not affect. USE_VULKAN_WRAPPER: ON - Vulkan will be used loading it at runtime as "libvulkan.so" using libdl, every function call is wrapped in vulkan_wrapper.h. OFF - linking with libvulkan.so directly USE_VULKAN_SHADERC_RUNTIME: ON - Shader compilation library will be linked, and shaders will be compiled runtime. OFF - Shaders will be precompiled and shader compilation library is not included. ## Codegen if `USE_VULKAN_SHADERC_RUNTIME` is ON: Shaders precompilation () starts in cmake/VulkanCodegen.cmake, which calls `aten/src/ATen/native/vulkan/gen_glsl.py` or `aten/src/ATen/native/vulkan/gen_spv.py` to include shaders source or SPIR-V bytecode inside binary as uint32_t array in spv.h,spv.cpp. if `USE_VULKAN_SHADERC_RUNTIME` is OFF: The source of shaders is included as `glsl.h`,`glsl.cpp`. All codegen results happen in the build directory. ## Build dependencies cmake/Dependencies.cmake If the target platform is Android - vulkan library, headers, Vulkan wrapper will be used from ANDROID_NDK. Desktop build requires the VULKAN_SDK environment variable, and all vulkan dependencies will be used from it. (Desktop build was tested only on Linux). ## Pytorch integration: Adding 'Vulkan" as new Backend, DispatchKey, DeviceType. We are using Strided layout without supporting strides at the moment, but we plan to support them in the future. Using OpaqueTensorImpl where OpaqueHandle is copyable VulkanTensor, more details in comments in `aten/src/ATen/native/vulkan/Vulkan.h` Main code location: `aten/src/ATen/native/vulkan` `aten/src/ATen/native/vulkan/VulkanAten.cpp` - connection link between ATen and Vulkan api (Vulkan.h) that converts at::Tensor to VulkanTensor. `aten/src/ATen/native/Vulkan/Vulkan.h` - Vulkan API that contains VulkanTensor representation and functions to work with it. Plan to expose it for clients to be able to write their own Vulkan Ops. `aten/src/ATen/native/vulkan/VulkanOps.cpp` - Vulkan Operations Implementations that uses Vulkan.h API ## GLSL shaders Located in `aten/src/ATen/native/vulkan/glsl` as *.glsl files. All shaders use Vulkan specialized constants for workgroup sizes with ids 1, 2, 3 ## Supported operations Code point: conv2d no-groups conv2d depthwise addmm upsample nearest 2d clamp hardtanh ## Testing `aten/src/ATen/test/vulkan_test.cpp` - contains tests for copy from CPU to Vulkan and back all supported operations Desktop builds supported, and testing can be done on a desktop that has Vulkan supported GPU or with installed software implementation of Vulkan, like https://github.com/google/swiftshader ## Vulkan execution The initial implementation is trivial and waits every operator's execution. Pull Request resolved: https://github.com/pytorch/pytorch/pull/36491 Differential Revision: D21696709 Pulled By: IvanKobzarev fbshipit-source-id: da3e5a770b1a1995e9465d7e81963e7de56217fa	2020-05-26 08:30:13 -07:00
Nikita Shulga	664a3ab5c7	Enable py38 gcc9 build config (#38805 ) Summary: Add `py38-gcc9` build-only config Add appropriate `-Wno-xyz` flags to ATEN kernels as well as `tensorexp/llvm_jit.cpp` and `tensorexp/llvm_codegen.cpp` Pull Request resolved: https://github.com/pytorch/pytorch/pull/38805 Differential Revision: D21682953 Pulled By: malfet fbshipit-source-id: 5b61d0dfe8bdec8fb13e2ae5857dc5e7c6e58e42	2020-05-21 01:38:04 -07:00
Nikita Shulga	dc918162b7	Remove `Caffe2_MAIN_LIBS` (#38408 ) Summary: Right now it is an unused alias to `torch_library` interface library Pull Request resolved: https://github.com/pytorch/pytorch/pull/38408 Differential Revision: D21598250 Pulled By: malfet fbshipit-source-id: ec9a2446b94e7ea68298831212005c2c80bbc95c	2020-05-15 12:27:15 -07:00
Nikita Shulga	4b52e52577	Use `jit_core_sources` from build_varliables.bzl (#38526 ) Summary: Replace hardcoded filelist in aten/src/ATen/CMakeLists.txt with one from `jit_source_sources` Fix `append_filelist` to work independently from the location it was invoked Pull Request resolved: https://github.com/pytorch/pytorch/pull/38526 Differential Revision: D21594582 Pulled By: malfet fbshipit-source-id: c7f216a460edd474a6258ba5ddafd4c4f59b02be	2020-05-15 08:21:37 -07:00
Wojciech Baranowski	945672bf3e	cmake: improve dependencies in incremental builds (#37661 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/26304 Test procedure: With ninja: [x] Build a clean checkout [x] Build again. Result: Only 10 libraries are (needlessly) linked again, the extra delay on a 24-core machine is <10s. [x] Build for the third time. Result: Virtually instantaneous, with no extra rebuilding. [x] Modify DispatchTable.h. Build again. Result: `.cu` files are rebuilt, as well as many `.cpp` files [x] Build for the fifth time. Result: Virtually instantaneous, with no extra rebuilding. [x] Touch one of the `.depend` files. Build again. Result: Only 10 libraries are (needlessly) linked again, the extra delay on a 24-core machine is <10s. Without ninja: [x] Build a clean checkout [x] Build again. Result: There is some unnecessary rebuilding. But it was also happening before this change. [x] Build for the third time. Result: Virtually instantaneous, with no extra rebuilding. Pull Request resolved: https://github.com/pytorch/pytorch/pull/37661 Differential Revision: D21434624 Pulled By: ezyang fbshipit-source-id: 379d2315486b8bb5972c184f9b8da8e00d38c338	2020-05-06 14:25:18 -07:00
Lucas Hosseini	8a30553738	[TensorPipe/RPC] Add TensorPipe dependency (#36695 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/36695 Reviewed By: lw Differential Revision: D21312297 Pulled By: beauby fbshipit-source-id: 39fdc3de91efa4ac97dd169f09fb304b273b0050	2020-04-30 11:05:15 -07:00
Jiakai Liu	6792bafa72	[pytorch] aten codegen to filter backends for default mobile build Summary: This is a simple change to mitigate the OSS mobile default build size regression caused by #34275 and #34622. Mobile supported backends are already kinda hard-coded in function_wrapper.py as `static_dispatch_backends`: https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/function_wrapper.py#L243 This is simply to align dynamic registration with static dispatch for mobile build. To measure mobile build size: ``` // Default mobile build: scripts/build_pytorch_android.sh armeabi-v7a // MobileNetV2 custom build: SELECTED_OP_LIST=MobileNetV2.yaml scripts/build_pytorch_android.sh armeabi-v7a ``` - arm-v7 Android AAR (compressed) size: ``` +----------+-------------------+---------------+ \| \| MobileNetV2 Build \| Default Build \| +----------+-------------------+---------------+ \| Original \| 3,354,589 \| 5,731,992 \| \| #34275 \| 3,404,978 \| 6,640,526 \| \| #34622 \| 3,432,569 \| 6,640,526 \| \| This PR \| 3,431,660 \| 6,534,135 \| +----------+-------------------+---------------+ ``` Differential Revision: D20415107 Test Plan: Imported from OSS Pulled By: ljk53 fbshipit-source-id: 75acf4dc5dfe9242c01b2db0b84bd6b4a1d0cd8d	2020-04-30 01:35:38 -07:00
Mo Zhou	58a46a174e	[cmake] add USE_SYSTEM_{XNNPACK,ONNX} options. (#37501 ) Summary: ezyang Pull Request resolved: https://github.com/pytorch/pytorch/pull/37501 Differential Revision: D21303527 Pulled By: ezyang fbshipit-source-id: 58353d78c66e5bcc9198ce8cde36ac7232bb4b2f	2020-04-29 09:26:16 -07:00
Michael Suo	68895eda9d	add fmt, take 7 (#37356 ) Summary: fmt is a formatting library for C++. It has several properties that make it nice for inclusion in PyTorch: - Widely used - Basically copies how Python does it - Support for all the compilers and platforms we care about - Standards track (C++20) - Small code size - Header only This PR includes it as a submodule and sets up the build. Pull Request resolved: https://github.com/pytorch/pytorch/pull/37356 Differential Revision: D21262619 Pulled By: suo fbshipit-source-id: 1d9a1a5ed08a634213748e7b02fc718ef8dac4c9	2020-04-29 09:08:24 -07:00
cyy	9259a283b7	use detected python version to find pylibs (#34041 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/34041 Differential Revision: D21302552 Pulled By: ezyang fbshipit-source-id: 140c3d2146bad8feb425cf3670cffdbabc5101b1	2020-04-29 08:17:15 -07:00
peter	bf53784e3c	Treat cross-execution-space-call as errors for NVCC on Windows (#37302 ) Summary: On Windows, when you call those unsupported functions like `std::pow`, `std::isnan` or `std::isinf` in the device function and compile, a warning is thrown: ``` kernel.cu kernel.cu(39): warning: calling a __host__ function from a __host__ __device__ function is not allowed kernel.cu(42): warning: calling a __host__ function from a __host__ __device__ function is not allowed kernel.cu(39): warning: calling a __host__ function("isnan<double> ") from a __host__ __device__ function("test_") is not allowed kernel.cu(42): warning: calling a __host__ function("isinf<double> ") from a __host__ __device__ function("test_") is not allowed ``` However, those calls will lead to runtime errors, see https://github.com/pytorch/pytorch/pull/36749#issuecomment-619239788 and https://github.com/pytorch/pytorch/issues/31108. So we should treat them as errors. Previously, the situation is worse because the warnings are turned off by passing in `-w`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/37302 Differential Revision: D21297207 Pulled By: ngimel fbshipit-source-id: 822b8a98c10e54c38319674763b6681db21c1021	2020-04-29 01:52:52 -07:00
Bram Wasti	4234d62489	[hotfix] Workaround for older versions of ninja (#37417 ) Summary: Older versions of ninja don't like relative paths in configure_file when it is called twice. https://gitlab.kitware.com/cmake/cmake/issues/17601 Fix suggested in comments https://gitlab.kitware.com/cmake/cmake/-/issues/18584 Pull Request resolved: https://github.com/pytorch/pytorch/pull/37417 Reviewed By: malfet Differential Revision: D21280141 Pulled By: bwasti fbshipit-source-id: 4cb94996a9e8ae8c01602ea1da6f4ce9d61fa700	2020-04-28 09:03:51 -07:00
peter	c5d6f59ab1	Replacing EHa with EHsc (#37235 ) Summary: We should not rely on the async exceptions. Catching C++ only exception is more sensible and may get a boost in both space (1163 MB -> 1073 MB, 0.92x) and performance(51m -> 49m, 0.96x). Pull Request resolved: https://github.com/pytorch/pytorch/pull/37235 Differential Revision: D21256918 Pulled By: ezyang fbshipit-source-id: 572ee96f2e4c48ad13f83409e4e113483b3a457a	2020-04-28 08:20:37 -07:00
Jacob Zhong	e33c3e49d5	Fix hard-code cmake target (#37310 ) Summary: Fix https://github.com/pytorch/pytorch/issues/33928. Basically just move the dependency into a new imported target. I'm not sure whether this modification will affect other parts, please test it throughly. Pull Request resolved: https://github.com/pytorch/pytorch/pull/37310 Differential Revision: D21263066 Pulled By: ezyang fbshipit-source-id: 7dc38f578d7e9bcb491ef5e122106fb66a33156f	2020-04-27 14:20:30 -07:00
Mo Zhou	5b9f7f7b0e	[cmake] Add USE_SYSTEM_{GLOO,FP16,PTHREADPOOL,PSIMD,FXDIV,BENCHMARK} options (#14699 ) (#37277 ) Summary: These options are disabled by default, and are supposed to be used by linux distro developers. With the existing shortcut option USE_SYSTEM_LIBS toggled, these new options will be enabled as well. Additionally, when USE_SYSTEM_LIBS is toggled, setup.py should no longer check the existence of git submodules. ezyang Pull Request resolved: https://github.com/pytorch/pytorch/pull/37277 Differential Revision: D21256999 Pulled By: ezyang fbshipit-source-id: 84f97d008db5a5e41a289cb7bce94906de3c52cf	2020-04-27 09:37:27 -07:00
Mo Zhou	007163407c	[cmake] Support "Generic" BLAS (#14699 ) (#37276 ) Summary: The "Generic" BLAS refers to the Netlib BLAS. This option is meaningful to the Debian family due to the "update-alternatives" mechanism, which enables the user to switch the libblas.so providers between different implementations at runtime, such as ATLAS, OpenBLAS, and Intel MKL. Such, building against generic BLAS provides much flexibility. This new option is not documented in setup.py because it's only supposed to be used by linux distro (especially Debian family) developersonly. ezyang Pull Request resolved: https://github.com/pytorch/pytorch/pull/37276 Differential Revision: D21256877 Pulled By: ezyang fbshipit-source-id: 55a5356653a1cfc763a5699b04afe5938f2007ec	2020-04-27 08:17:43 -07:00
Mo Zhou	ff21b15624	cmake: add USE_SYSTEM_{LIBS,CPUINFO,SLEEF} options (#14699 ) (#37137 ) Summary: ezyang Pull Request resolved: https://github.com/pytorch/pytorch/pull/37137 Differential Revision: D21222632 Pulled By: ezyang fbshipit-source-id: 47624b30f8d07b31a40a26edf665bbec39e45202	2020-04-23 20:43:36 -07:00
Nikita Shulga	76cb7f2043	Use filelist from build_variables.bzl to fetch distributed file list (#37090 ) Summary: Rename `get_filelist` to `append_filelist` Repalce hadcoded filelist under `USE_DISTRIBUTED` with `append_filelist("libtorch_distributed_sources" TORCH_SRCS)` call Pull Request resolved: https://github.com/pytorch/pytorch/pull/37090 Test Plan: CI Differential Revision: D21184002 Pulled By: malfet fbshipit-source-id: 25bb7f97fcb2bf5bec8bdb3aa059ae13e7610007	2020-04-22 13:13:25 -07:00
Edward Yang	1f82679311	Revert D21156042: [pytorch][PR] CMake/Ninja: fix dependencies for .cu files Test Plan: revert-hammer Differential Revision: D21156042 Original commit changeset: fda3aaa57207 fbshipit-source-id: 59b208d4dc7ab743876af3ed382477770526aa1a	2020-04-21 14:24:27 -07:00
Wojciech Baranowski	db84689c09	CMake/Ninja: fix dependencies for .cu files (#36938 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/26304 After this patch `build.ninja` entries for `.cu` files will contain a `depfile` variable pointing to a `.NVCC-depend` file containing dependencies (i.e., header files included directly or indirectly) of the `.cu` source file. Until now, those `.NVCC-depend` files were being transposed into `.cu.o.depend` files in CMake format. That did not work as intended because the `.cu.o` target file was declared to be dependent on the `.cu.o.depend` file itself, rather than its contents. In fact, Ninja lacks the functionality to process dependencies in the CMake format of those `.cu.o.depend` files. This was tested on Linux as described in https://github.com/pytorch/pytorch/issues/26304#issuecomment-614667170 I have also verified that the original problem does not reproduce with Makefiles (i.e., when `ninja` is not present in the system) and that PyTorch still build successfully with Makefiles after this patch. Pull Request resolved: https://github.com/pytorch/pytorch/pull/36938 Differential Revision: D21156042 Pulled By: ezyang fbshipit-source-id: fda3aaa57207f4d6bf74d2f254fe45fb7fd90eec	2020-04-21 09:43:48 -07:00
Nikita Shulga	4668d47d1f	Add build_variable.bzl to CMAKE_RERUN target (#36809 ) Summary: `configure_file` command adds its input as a top-level dependency triggering make file regeneration if file timestamp have changed Also abort CMAKE if `exec` of build_variables.bzl failed for some reason Pull Request resolved: https://github.com/pytorch/pytorch/pull/36809 Test Plan: Add invalid statement to build_variables.bzl and check that build process fails Differential Revision: D21100721 Pulled By: malfet fbshipit-source-id: 79a54aa367fb8dedb269c78b9538b4da203d856b	2020-04-17 17:28:07 -07:00
Nikita Shulga	d7fc05b0bf	Fetch TORCH_SRCS from `build_variables.bzl` (#36737 ) Summary: Mimic `.bzl` parsing logic from https://github.com/pytorch/FBGEMM/pull/344 Generate `libtorch_cmake_sources` by running following script: ``` def read_file(path): with open(path) as f: return f.read() def get_cmake_torch_srcs(): caffe2_cmake = read_file("caffe2/CMakeLists.txt") start = caffe2_cmake.find("set(TORCH_SRCS") end = caffe2_cmake.find(")", start) return caffe2_cmake[start:end+1] def get_cmake_torch_srcs_list(): caffe2_torch_srcs = get_cmake_torch_srcs() unfiltered_list = [x.strip() for x in get_cmake_torch_srcs().split("\n") if len(x.strip())>0] return [x.replace("${TORCH_SRC_DIR}/","torch/") for x in unfiltered_list if 'TORCH_SRC_DIR' in x] import imp build_variables = imp.load_source('build_variables', 'tools/build_variables.bzl') libtorch_core_sources = set(build_variables.libtorch_core_sources) caffe2_torch_srcs = set(get_cmake_torch_srcs_list()) if not libtorch_core_sources.issubset(caffe2_torch_srcs): print("libtorch_core_sources must be a subset of caffe2_torch_srcs") print(sorted(caffe2_torch_srcs.difference(libtorch_core_sources))) ``` Move common files between `libtorch_cmake_sources` and `libtorch_extra_sources` to `libtorch_jit_core_sources` Pull Request resolved: https://github.com/pytorch/pytorch/pull/36737 Test Plan: CI Differential Revision: D21078753 Pulled By: malfet fbshipit-source-id: f46ca48d48aa122188f028136c14687ff52629ed	2020-04-16 19:12:52 -07:00
David Reiss	83de675ebf	Fail CMake setup if trying to build with Python 2 (#35612 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/35612 Python 2 has reached end-of-life and is no longer supported by PyTorch. To spare users from a long, doomed build when trying to use PyTorch with Python 2, detect this case early and fail with a clear message. This commit covers CMake setup. Test Plan: Attempted to build PyTorch with Python 2 and saw a clear error quickly. Differential Revision: D20842873 Pulled By: dreiss fbshipit-source-id: b35e38c12f9381ff4ca10cf801b7a03da87b1d19	2020-04-16 10:22:36 -07:00

... 3 4 5 6 7 ...

1141 Commits