pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-06 00:20:18 +01:00

Author	SHA1	Message	Date
Huy Do	24e9bbe22a	Revert "Flash Attention v2 (#105602 )" (#108827 ) This reverts commit `add45aea1c`. There are some conflicts on some benchmark csv file https://github.com/pytorch/pytorch/pull/105602#issuecomment-1710988951 so I need to revert this manually. The diff has been reverted internally. Pull Request resolved: https://github.com/pytorch/pytorch/pull/108827 Approved by: https://github.com/kit1980	2023-09-08 02:54:20 +00:00
cyy	621463a3e6	Update libfmt submodule to 10.1.1 (#108431 ) This PR updates libfmt to version 10.1.1. We also set utf-8 source encoding earlier before include third party libraries on Windows. Pull Request resolved: https://github.com/pytorch/pytorch/pull/108431 Approved by: https://github.com/Skylion007	2023-09-03 23:44:39 +00:00
drisspg	add45aea1c	Flash Attention v2 (#105602 ) # Summary ## PR Dependencies I don't use ghstack :( this is a PR where it would have been helpful. That being said I am going to peel off some PRs to make reviewing this easier: - [x] Separate build flags for Flash and MemEff: #107985 ### Description This pull request updates the version of _scaled_dot_product_flash_attention from version 1 to version 2. The changes are based on the flash attention code originally authored by @tridao ### Changes Made The majority of the changes in this pull request involve: - Copying over the flash_attention sources. - Updating header files. - Removing padding and slicing code from within the flash_attention kernel and relocating it to the composite implicit region of the SDPA. This was needed to make the kernel functional and appease autograd. - Introducing a simple kernel generator to generate different instantiations of the forward and backward flash templates. - Adding conditional compilation (ifdef) to prevent building when nvcc is invoked with gencode < sm80. - Introducing a separate dependent option for mem_eff_attention, as flash_attention v2 lacks support for Windows and cannot be built for sm50 generation codes. - Modifying build.sh to reduce parallelization on sm86 runners and to lower the maximum parallelization on the manywheel builds. This adjustment was made to address out-of-memory issues during the compilation of FlashAttentionV2 sources. - Adding/Updating tests. ### Notes for Reviewers This is not a fun review, and I apologize in advance. Most of the files-changed are in the flash_attn/ folder. The only files of interest here IMO: - aten/src/ATen/native/transformers/cuda/flash_attn/flash_api.cpp - aten/src/ATen/native/transformers/cuda/flash_attn/kernels/generate_kernels.py ( this has been incorporated upstream to flash-attention github) There are a number of files all related to avoiding OOMs in CI/CD. These are typically shell scripts. 
### Follow up items - Include the updates from `e07aa036db` and `9e5e8bc91e` \| https://github.com/pytorch/pytorch/issues/108108 ### Work Items - [x] I don't think Windows will be supported for 3.1.0 - Need to update cmake - [x] Let multi_query/attention pass through and test \| UPDATE: I have the fast path implemented here: https://github.com/pytorch/pytorch/pull/106730 but since this will require changes to semantics of math to call repeat_interleave, I think this should be done as a followup. - [x] Had to drop cutlass back to 3.0.0 to get it to compile. Need to figure out how to upgrade to 3.1.0 and later. Spoke with Tri and he is going to be taking a look. Note: compiling with clang currently errors for the cute headers. - [x] Update tests to exercise above codepath - [x] Still need to disable on seq_len % 128 != 0 for backward (Tri beat me to it `a4f148b6ab`) - [x] Add determinism warning to BWD, Tri got to this one as well: 1c41d2b - [x] Update dispatcher to universally prefer FlashV2 - [x] Update tests to exercise new head_dims - [x] Move the head_dim padding from kernel to top level composite implicit function in order to make it purely functional - [x] Create template generator script - [x] Initial cmake support for building kernels/ folder - [x] Replay CudaGraph changes ### Results #### Forward only The TFlops reported here are on an A100 that is underclocked. ![flashv2_tflops_vs_seq_len](https://github.com/pytorch/pytorch/assets/32754868/152de46d-8fa6-42f0-9a9c-ef1eb7ae29e7) #### Forward+Backward Ran a sweep and for large compute bound sizes we do see a ~2x performance increase for forw+back. <img width="1684" alt="Screenshot 2023-07-20 at 3 47 47 PM" src="https://github.com/pytorch/pytorch/assets/32754868/fdd26e07-0077-4878-a417-f3a418b6fb3b"> Pull Request resolved: https://github.com/pytorch/pytorch/pull/105602 Approved by: https://github.com/huydhn, https://github.com/cpuhrsch	2023-09-01 22:14:44 +00:00
PyTorch MergeBot	d569e506ab	Revert "Flash Attention v2 (#105602 )" This reverts commit `9df3d882c8`. Reverted https://github.com/pytorch/pytorch/pull/105602 on behalf of https://github.com/huydhn due to I think we miss a case here for sm80 build on inductor workflow as it is now OOM on trunk https://github.com/pytorch/pytorch/actions/runs/6042843139 ([comment](https://github.com/pytorch/pytorch/pull/105602#issuecomment-1701974862))	2023-09-01 01:15:01 +00:00
drisspg	9df3d882c8	Flash Attention v2 (#105602 ) # Summary ## PR Dependencies I don't use ghstack :( this is a PR where it would have been helpful. That being said I am going to peel off some PRs to make reviewing this easier: - [x] Separate build flags for Flash and MemEff: #107985 ### Description This pull request updates the version of _scaled_dot_product_flash_attention from version 1 to version 2. The changes are based on the flash attention code originally authored by @tridao ### Changes Made The majority of the changes in this pull request involve: - Copying over the flash_attention sources. - Updating header files. - Removing padding and slicing code from within the flash_attention kernel and relocating it to the composite implicit region of the SDPA. This was needed to make the kernel functional and appease autograd. - Introducing a simple kernel generator to generate different instantiations of the forward and backward flash templates. - Adding conditional compilation (ifdef) to prevent building when nvcc is invoked with gencode < sm80. - Introducing a separate dependent option for mem_eff_attention, as flash_attention v2 lacks support for Windows and cannot be built for sm50 generation codes. - Modifying build.sh to reduce parallelization on sm86 runners and to lower the maximum parallelization on the manywheel builds. This adjustment was made to address out-of-memory issues during the compilation of FlashAttentionV2 sources. - Adding/Updating tests. ### Notes for Reviewers This is not a fun review, and I apologize in advance. Most of the files-changed are in the flash_attn/ folder. The only files of interest here IMO: - aten/src/ATen/native/transformers/cuda/flash_attn/flash_api.cpp - aten/src/ATen/native/transformers/cuda/flash_attn/kernels/generate_kernels.py ( this has been incorporated upstream to flash-attention github) There are a number of files all related to avoiding OOMs in CI/CD. These are typically shell scripts. 
### Follow up items - Include the updates from `e07aa036db` and `9e5e8bc91e` \| https://github.com/pytorch/pytorch/issues/108108 ### Work Items - [x] I don't think Windows will be supported for 3.1.0 - Need to update cmake - [x] Let multi_query/attention pass through and test \| UPDATE: I have the fast path implemented here: https://github.com/pytorch/pytorch/pull/106730 but since this will require changes to semantics of math to call repeat_interleave, I think this should be done as a followup. - [x] Had to drop cutlass back to 3.0.0 to get it to compile. Need to figure out how to upgrade to 3.1.0 and later. Spoke with Tri and he is going to be taking a look. Note: compiling with clang currently errors for the cute headers. - [x] Update tests to exercise above codepath - [x] Still need to disable on seq_len % 128 != 0 for backward (Tri beat me to it `a4f148b6ab`) - [x] Add determinism warning to BWD, Tri got to this one as well: 1c41d2b - [x] Update dispatcher to universally prefer FlashV2 - [x] Update tests to exercise new head_dims - [x] Move the head_dim padding from kernel to top level composite implicit function in order to make it purely functional - [x] Create template generator script - [x] Initial cmake support for building kernels/ folder - [x] Replay CudaGraph changes ### Results #### Forward only The TFlops reported here are on an A100 that is underclocked. ![flashv2_tflops_vs_seq_len](https://github.com/pytorch/pytorch/assets/32754868/152de46d-8fa6-42f0-9a9c-ef1eb7ae29e7) #### Forward+Backward Ran a sweep and for large compute bound sizes we do see a ~2x performance increase for forw+back. <img width="1684" alt="Screenshot 2023-07-20 at 3 47 47 PM" src="https://github.com/pytorch/pytorch/assets/32754868/fdd26e07-0077-4878-a417-f3a418b6fb3b"> Pull Request resolved: https://github.com/pytorch/pytorch/pull/105602 Approved by: https://github.com/huydhn, https://github.com/cpuhrsch	2023-08-31 16:02:20 +00:00
drisspg	182a9cf366	Add Independent Memory Efficient and Flash Attention Build Flags (#107985 ) # Summary In an effort to simplify https://github.com/pytorch/pytorch/pull/105602, this PR pulls out independent chunks of code that can be landed prior to FlashV2 landing. Pull Request resolved: https://github.com/pytorch/pytorch/pull/107985 Approved by: https://github.com/cpuhrsch	2023-08-28 18:39:18 +00:00
peterjc123	8507b22fea	propagate _GLIBCXX_USE_CXX11_ABI to NVCC (#107209 ) Fixes #107161 Pull Request resolved: https://github.com/pytorch/pytorch/pull/107209 Approved by: https://github.com/malfet	2023-08-16 22:41:52 +00:00
Jesse Cai	f81f9093ec	[core][pruning][feature] cuSPARSELt build integration (#103700 ) Summary: This stack of PRs integrates cuSPARSELt into PyTorch. This PR adds support for cuSPARSELt into the build process. It adds a new flag, USE_CUSPARSELT, that defaults to false. When USE_CUSPARSELT=1 is specified, the user can also specify CUSPARSELT_ROOT, which defines the path to the library. Compiling pytorch with cusparselt support can be done as follows: ``` USE_CUSPARSELT=1 CUSPARSELT_ROOT=/path/to/cusparselt python setup.py develop ``` Test Plan: Reviewers: Subscribers: Tasks: Tags: Pull Request resolved: https://github.com/pytorch/pytorch/pull/103700 Approved by: https://github.com/albanD	2023-08-02 12:48:39 +00:00
Driss Guessous	d184c81166	Add -fstandalone-debug debug flag (#104475 ) # Summary While debugging something in lldb, I found that the formatter I wrote for c10::intarrayref was not working correctly, producing: `(std::string) $6 = error: summary string parsing error` Based off of this thread: https://github.com/vadimcn/codelldb/issues/415 I added the standalone-debug information and fixed the std::string formatting issue. Pull Request resolved: https://github.com/pytorch/pytorch/pull/104475 Approved by: https://github.com/ezyang, https://github.com/malfet	2023-07-11 01:29:20 +00:00
Nikita Shulga	456ecefd52	[BE] Fix warning in top-level CMakeLists.txt (#104726 ) Fixes warning introduced by https://github.com/pytorch/pytorch/issues/102594: ``` CMake Warning (dev) in CMakeLists.txt: A logical block opening on the line /pytorch/CMakeLists.txt:726 (if) closes on the line /pytorch/CMakeLists.txt:735 (endif) with mis-matching arguments. ``` <!-- copilot:poem --> ### <samp>🤖 Generated by Copilot at b7555d5</samp> > _`DEBUG_CUDA` on_ > _No more CUDA in exe_ > _Winter bug is fixed_ Pull Request resolved: https://github.com/pytorch/pytorch/pull/104726 Approved by: https://github.com/huydhn, https://github.com/atalman	2023-07-06 22:13:29 +00:00
Xu Han	a956b1c849	Optimize mimalloc build options. (#104497 ) 1. PyTorch only needs the static lib, so disable the other libs. 2. Disable override; PyTorch only accesses mimalloc via cpu_alloc/cpu_free. Reference: https://github.com/microsoft/mimalloc/blob/master/CMakeLists.txt#L10-L25 Pull Request resolved: https://github.com/pytorch/pytorch/pull/104497 Approved by: https://github.com/jgong5, https://github.com/albanD	2023-07-06 04:44:21 +00:00
Edward Z. Yang	3dc4adc7a6	Don't build CUDA with debug info by default. (#102617 ) Fixes https://github.com/pytorch/pytorch/issues/102594 Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/102617 Approved by: https://github.com/malfet	2023-07-05 20:16:19 +00:00
Connor Baker	0c8323e4a4	cmake: allow USE_SYSTEM_ZSTD (#104611 ) Fixes #44255. This is part of larger work I'm doing to allow for more `USE_SYSTEM_*` options to allow Nix to have faster re-builds of PyTorch: https://github.com/NixOS/nixpkgs/pull/239291. Pull Request resolved: https://github.com/pytorch/pytorch/pull/104611 Approved by: https://github.com/ezyang, https://github.com/malfet	2023-07-05 04:47:35 +00:00
Connor Baker	e8174faa02	cmake: respect USE_SYSTEM_LIBS when USE_NCCL is set (#104511 ) Even though `USE_SYSTEM_LIBS` is set to true, we still need to set `USE_SYSTEM_NCCL` for the system NCCL to be used. This fixes that by adding a conditional `set` similar to what is done for `USE_TBB`: `e9ebda29d8/CMakeLists.txt (L426-L428)` Pull Request resolved: https://github.com/pytorch/pytorch/pull/104511 Approved by: https://github.com/ezyang	2023-07-04 19:08:50 +00:00
Xu Han	6c1ccccf21	Enable mimalloc on pytorch Windows (#102595 ) This PR is an implementation of [#102534](https://github.com/pytorch/pytorch/issues/102534), option 2. Major changes: 1. Add mimalloc to the submodule. 2. Add build option "USE_MIMALLOC". 3. It is only enabled on Windows builds, and it would improve pytorch memory allocation performance. Additional Test: <img width="953" alt="image" src="https://github.com/pytorch/pytorch/assets/8433590/4b2ec2dc-16f1-4ad9-b457-cfeb37e489d3"> This PR also builds and statically links mimalloc on Linux. Pull Request resolved: https://github.com/pytorch/pytorch/pull/102595 Approved by: https://github.com/jgong5, https://github.com/malfet	2023-06-27 08:53:26 +00:00
cyy	483f748dd5	[BE] Enforce missing `override` keyword (#104032 ) This PR enables `-Winconsistent-missing-destructor-override` and `-Winconsistent-missing-override` and fixes violations. <!-- copilot:summary --> ### <samp>🤖 Generated by Copilot at 47e904e</samp> This pull request updates the code of various classes and operators in the `caffe2` and `aten` subdirectories to use the `override` specifier instead of the `virtual` keyword for destructors and other virtual functions that override a base class function. This improves the code readability, quality, and consistency with C++ best practices. It also modifies the `./CMakeLists.txt` file to enable warnings for these specifiers, but disable errors. Pull Request resolved: https://github.com/pytorch/pytorch/pull/104032 Approved by: https://github.com/malfet	2023-06-24 02:34:24 +00:00
Nikita Shulga	0b7320315a	[CI] Move libtorch-debug CUDA build to CUDA-12.1 (#102756 ) To avoid nvcc segfaults, compile without `--source-in-ptx` option on CUDA-12.1+ <!-- copilot:poem --> ### <samp>🤖 Generated by Copilot at 984e4b2</samp> > _Sing, O Muse, of the daring deeds of PyTorch, the swift and fiery_ > _framework that harnesses the power of CUDA, the blazing tool of Nvidia._ > _How they faced a mighty challenge when CUDA, the ever-shifting,_ > _released a new version, twelve point one, that broke their code and caused them grief._ Fixes https://github.com/pytorch/pytorch/issues/102372 Pull Request resolved: https://github.com/pytorch/pytorch/pull/102756 Approved by: https://github.com/atalman	2023-06-01 23:11:07 +00:00
Nikita Shulga	30cecc0e11	[MPS] Fix build regressions introduced by #92868 (#101036 ) https://github.com/pytorch/pytorch/pull/92868 introduced `OBJC` and `OBJCXX` language dialects, but fails to propagate some important flags, like OpenMP include path(if found), `-fno-objc-arc` and `-Wno-unguarded-availability-new` suppression. This PR remedies that and fixes https://github.com/pytorch/pytorch/issues/100925 <!-- copilot:summary --> ### <samp>🤖 Generated by Copilot at 62677d4</samp> This pull request improves the support for MPSGraph on Apple platforms by fixing some CMake flags for parallelism and memory management. It modifies `cmake/Dependencies.cmake` and `CMakeLists.txt` accordingly. Pull Request resolved: https://github.com/pytorch/pytorch/pull/101036 Approved by: https://github.com/atalman, https://github.com/huydhn	2023-05-10 04:15:41 +00:00
TachikakaMin	bb28f3f519	`USE_PRECOMPILED_HEADERS` is not supported on Apple M1 (#92868 ) Fixes #80018 ```bash MACOSX_DEPLOYMENT_TARGET=12.6 CC=gcc CXX=g++ DEBUG=1 USE_DISTRIBUTED=0 USE_MKLDNN=0 USE_CUDA=0 BUILD_TEST=0 USE_FBGEMM=0 USE_NNPACK=0 USE_QNNPACK=0 USE_XNNPACK=0 USE_PRECOMPILED_HEADERS=1 USE_MPS=1 python setup.py develop ``` `error: Objective-C was disabled in PCH file but is currently enabled` This PR(https://github.com/pytorch/pytorch/pull/80432) has been reverted. Pull Request resolved: https://github.com/pytorch/pytorch/pull/92868 Approved by: https://github.com/kulinseth, https://github.com/malfet	2023-05-08 16:03:34 +00:00
Bin Bao	e43918b93a	[inductor] Fix AOTInductor (#99203 ) Summary: Fix the broken AOTInductor flow and add a smoketest on CI. Pull Request resolved: https://github.com/pytorch/pytorch/pull/99203 Approved by: https://github.com/jansel	2023-04-25 14:42:12 +00:00
Nikita Shulga	6b8ef8ea4c	[BE] Build PyTorch with `-Wnewline-eof` (#99687 ) This would avoid further regressions like the ones reported in https://github.com/pytorch/pytorch/pull/96668#issuecomment-1468029259 Surround some ONNX/flatbuffer includes with `C10_DIAGNOSTIC_PUSH_AND_IGNORED_IF_DEFINED("-Wnewline-eof")` cone of shame Fixes https://github.com/pytorch/pytorch/issues/96747 Pull Request resolved: https://github.com/pytorch/pytorch/pull/99687 Approved by: https://github.com/kit1980	2023-04-21 14:46:47 +00:00
Nikita Shulga	a8f5d72edf	Guard color diagnostics opts by compiler type (#98952 ) On Linux system where `/usr/bin/c++` is not a symlink to either `g++` or `clang++`, `try_compile` can still incorrectly identify `gcc` as supporting `-fcolor-diagnostics` flag. Rather than introducing a super complex condition (i.e. `USE_CCACHE` and `LINUX` ...) just guard the checks specific to compiler identifier. See https://github.com/ccache/ccache/issues/1275 Fixes https://github.com/pytorch/pytorch/issues/83500 Pull Request resolved: https://github.com/pytorch/pytorch/pull/98952 Approved by: https://github.com/albanD	2023-04-12 23:39:37 +00:00
Nikita Shulga	af0264ae08	[BE] Pass `-faligned-new` if supported by compiler (#97887 ) <!-- copilot:poem --> ### <samp>🤖 Generated by Copilot at 507f7a2</samp> > _`-faligned-new` flag_ > _always on for C++17_ > _simpler winter code_ Pull Request resolved: https://github.com/pytorch/pytorch/pull/97887 Approved by: https://github.com/atalman, https://github.com/Skylion007	2023-03-30 03:16:19 +00:00
QiangZiBro	a95815c6b7	fix compiler version detection on MacOS (#97883 ) <!-- copilot:summary --> ### <samp>🤖 Generated by Copilot at 43c1df6</samp> Fix build error on macOS with Xcode 12 or newer by updating clang version detection in `CMakeLists.txt`. Fixes https://github.com/pytorch/pytorch/issues/97882 Pull Request resolved: https://github.com/pytorch/pytorch/pull/97883 Approved by: https://github.com/malfet	2023-03-30 02:56:22 +00:00
Nikita Shulga	96e3b3ac72	[BE] Cleanup CMake flag suppressions (#97584 ) Use `append_cxx_flag_if_supported` to determine whether or not `-Werror` is supported. Do not suppress deprecation warnings if glog is not used/installed, as the way the check is written right now, it will suppress deprecations even if `glog` is not installed. Similarly, do not suppress deprecations on MacOS simply because we are compiling with protobuf. Fix deprecation warnings in: - MPS by replacing `MTLResourceOptionCPUCacheModeDefault`->`MTLResourceCPUCacheModeDefaultCache` - In GTests by replacing `TYPED_TEST_CASE`->`TYPED_TEST_SUITE` - In `codegen/onednn/interface.cpp`, by passing `Stack` by reference rather than pointer. Do not guard calls to `append_cxx_flag_if_supported` with `if(CLANG)` or `if(GCC)`. Fix some deprecated calls in `Metal`; hide more complex exceptions under `C10_CLANG_DIAGNOSTIC_IGNORE` Pull Request resolved: https://github.com/pytorch/pytorch/pull/97584 Approved by: https://github.com/kit1980	2023-03-27 18:46:09 +00:00
Nikita Shulga	14177f0d3d	[BE] Make `USE_FLASH_ATTENTION` private (#97579 ) <!-- copilot:summary --> ### <samp>🤖 Generated by Copilot at b07152e</samp> This pull request refactors the CMake configuration to enable the `USE_FLASH_ATTENTION` feature for the `torch_cuda` target only, using a target-specific macro. This avoids conflicts with other libraries that also use this feature, such as fairseq. Pull Request resolved: https://github.com/pytorch/pytorch/pull/97579 Approved by: https://github.com/kit1980	2023-03-25 05:41:07 +00:00
mikey dagitses	5f5d675587	remove unused CAFFE2_VERSION macros (#97337 ) remove unused CAFFE2_VERSION macros Summary: Nothing reads these and they are completely subsumed by TORCH_VERSION. Getting rid of these will be helpful for build unification, since they are also not used internally. Test Plan: Rely on CI. Reviewers: sahanp Subscribers: Tasks: Tags: Pull Request resolved: https://github.com/pytorch/pytorch/pull/97337 Approved by: https://github.com/malfet	2023-03-24 16:02:35 +00:00
Nikita Shulga	62c1e33fc9	[BE] Remove fast_nvcc tool (#96665 ) As of CUDA-11.4+ this functionality can be mimicked by passing [`--threads`](https://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/#threads-number-t) option to CUDA compiler Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/96665 Approved by: https://github.com/atalman, https://github.com/PaliC	2023-03-14 03:17:31 +00:00
cyy	666efd8d5d	Improve ASAN and TSAN handling in cmake (#93147 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/93147 Approved by: https://github.com/malfet	2023-03-07 14:10:13 +00:00
Peter Bell	c5f6092591	Use FindCUDAToolkit to find cuda dependencies (#82695 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/82695 Approved by: https://github.com/malfet	2023-03-01 17:26:36 +00:00
PyTorch MergeBot	801b3f8fc7	Revert "Use FindCUDAToolkit to find cuda dependencies (#82695 )" This reverts commit `7289d22d67`. Reverted https://github.com/pytorch/pytorch/pull/82695 on behalf of https://github.com/peterbell10 due to Breaks torchaudio build	2023-02-28 02:29:09 +00:00
cyy	f27e09de04	Cleanup Windows warning suppression in CMake and fix some warnings in the source code (#94927 ) This PR does two things: 1. It moves some Windows warning suppression from various CMake files into the main CMakeLists.txt, following the conventions of gcc and clang. 2. It fixes some Windows warnings in the source code. Most importantly, it fixes lots of dll warnings by adjusting C10_API to TORCH_API or TORCH_PYTHON_API. There are still some dll warnings because some TORCH_API functions are actually built as part of libtorch_python. Pull Request resolved: https://github.com/pytorch/pytorch/pull/94927 Approved by: https://github.com/malfet	2023-02-27 19:22:20 +00:00
cyy	c1fa403e57	suppress nvfuser loading warning when we disable nvfuser (#95603 ) To avoid annoying warnings such as "[W interface.cpp:47] Warning: Loading nvfuser library failed" Pull Request resolved: https://github.com/pytorch/pytorch/pull/95603 Approved by: https://github.com/ezyang	2023-02-27 18:56:46 +00:00
Peter Bell	7289d22d67	Use FindCUDAToolkit to find cuda dependencies (#82695 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/82695 Approved by: https://github.com/malfet	2023-02-21 22:35:17 +00:00
cyy	1ab112cfab	code is clean enough that some warnings can be enabled (#95139 ) Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/95139 Approved by: https://github.com/Skylion007	2023-02-21 07:24:20 +00:00
jjsjann123	21eb7f70f1	Nvfuser python API import fix (#94036 ) 1. Having nvfuser python API import working with both devel and upstream; 2. Add environment variable to allow custom nvfuser code base to be built with upstream pytorch core. Pull Request resolved: https://github.com/pytorch/pytorch/pull/94036 Approved by: https://github.com/malfet, https://github.com/davidberard98	2023-02-16 20:10:40 +00:00
Jing Xu	8b37eff69f	remove abi uncertainty and potential abi conflict (#94306 ) Currently there is a potential conflict for the `GLIBCXX_USE_CXX11_ABI` configuration if users don't explicitly set this variable. In `caffe2/CMakeLists.txt`, if the variable is not set, an `abi checker` will be used to retrieve the ABI configuration from the compiler. https://github.com/pytorch/pytorch/blob/master/caffe2/CMakeLists.txt#L1165-L1183 However, in `torch/csrc/Module.cpp`, if the variable is not set, it will be set to `0`. The conflict happens when the default ABI of the compiler is `1`. https://github.com/pytorch/pytorch/blob/master/torch/csrc/Module.cpp#L1612 This PR eliminates this uncertainty and potential conflict. The ABI will be checked and set in `CMakeLists.txt`, and the value is passed to `caffe2/CMakeLists.txt`. Meanwhile, in case `caffe2/CMakeLists.txt` is directly invoked from a `cmake` command, the original GLIBC check logic is kept in this file. If users don't explicitly assign a value to `GLIBCXX_USE_CXX11_ABI`, the `abi checker` will be executed and set the value accordingly. If the `abi checker` fails to compile or execute, the value will be set to `0`. If users explicitly assign a value, then the provided value will be used. Moreover, if `GLIBCXX_USE_CXX11_ABI` is set to `0`, the `-DGLIBCXX_USE_CXX11_ABI=0` flag won't be appended to `CMAKE_CXX_FLAGS`. Thus, whether to use ABI=0 or ABI=1 fully depends on the compiler's default configuration. It could cause an issue where, even if users explicitly set `GLIBCXX_USE_CXX11_ABI` to `0`, the compiler still builds the binaries with ABI=1. https://github.com/pytorch/pytorch/blob/master/CMakeLists.txt#L44-L51 Pull Request resolved: https://github.com/pytorch/pytorch/pull/94306 Approved by: https://github.com/malfet	2023-02-09 09:54:04 +00:00
cyy	9291f9b9e2	Simplify cmake code (#91546 ) We use various newer CMake features to simplify build system: 1. Caffe2::threads is replaced by threads::threads. 2. Some unused MSVC flags are removed. Pull Request resolved: https://github.com/pytorch/pytorch/pull/91546 Approved by: https://github.com/malfet, https://github.com/Skylion007	2023-02-08 01:05:19 +00:00
PyTorch MergeBot	1063394898	Revert "Add fabi-version=11 to ensure compatibility between gcc7 and gcc9 binaries for _GLIBCXX_USE_CXX11_ABI=1 (#93835 )" This reverts commit `b562be793a`. Reverted https://github.com/pytorch/pytorch/pull/93835 on behalf of https://github.com/huydhn due to This breaks XLA build `b562be793a`	2023-02-07 04:49:06 +00:00
zhuhong61	b562be793a	Add fabi-version=11 to ensure compatibility between gcc7 and gcc9 binaries for _GLIBCXX_USE_CXX11_ABI=1 (#93835 ) Fixes #https://github.com/pytorch/pytorch/pull/92550 Pull Request resolved: https://github.com/pytorch/pytorch/pull/93835 Approved by: https://github.com/malfet	2023-02-07 03:05:39 +00:00
Aaron Gokaslan	2fc2ca7652	[BE]: Fix CMake LTO policy on pytorch (#93388 ) Note this is a non-functional change since none of our CIs actually build with LTO. Pull Request resolved: https://github.com/pytorch/pytorch/pull/93388 Approved by: https://github.com/albanD	2023-02-01 17:06:53 +00:00
Dmytro Dzhulgakov	5105a8d3fc	Enable Kineto in OSS builds by fixing build condition (resubmit) (#93033 ) Resubmit of https://github.com/pytorch/pytorch/pull/89174 . I think I fixed underlying issues back then, but only CI would tell. Context: This PR enables Kineto on OSS builds because of how the flags were misconfigured before. I think generally having global observer in OSS is nice. There's some work to release on demand profiling with dynolog, and right now its build instructions start with "go change pytorch's CMake": https://github.com/facebookincubator/dynolog/blob/main/docs/pytorch_profiler.md#pytorch-setup The previous PR was reverted because of the bug in Kineto that got fixed in https://github.com/pytorch/kineto/pull/696 (and the submodule was updated since) Pull Request resolved: https://github.com/pytorch/pytorch/pull/93033 Approved by: https://github.com/kimishpatel	2023-01-27 08:58:03 +00:00
jjsjann123	c11b301bcd	[NVFUSER] refactor nvfuser build (#89621 ) This PR is the first step towards refactoring the build for nvfuser in order to have the codegen be a standalone library. Contents inside this PR: 1. nvfuser code base has been moved to `./nvfuser`, from `./torch/csrc/jit/codegen/cuda/`, except for registration code for integration (interface.h/interface.cpp) 2. splits the build system so nvfuser is generating its own `.so` files. Currently there are: - `libnvfuser_codegen.so`, which contains the integration, codegen and runtime system of nvfuser - `nvfuser.so`, which is nvfuser's python API via pybind. Python frontend is now exposed via `nvfuser._C.XXX` instead of `torch._C._nvfuser` 3. nvfuser cpp tests are currently being compiled into `nvfuser_tests` 4. cmake is refactored so that: - nvfuser now has its own `CMakeLists.txt`, which is under `torch/csrc/jit/codegen/cuda/`. - nvfuser backend code is not compiled inside `libtorch_cuda_xxx` any more - nvfuser is added as a subdirectory under `./CMakeLists.txt` at the very end after torch is built. - since nvfuser has dependency on torch, the registration of nvfuser at runtime is done via dlopen (`at::DynamicLibrary`). This avoids circular dependency in cmake, which will be a nightmare to handle. For details, look at `torch/csrc/jit/codegen/cuda/interface.cpp::LoadingNvfuserLibrary` Future work that's scoped in following PR: - Currently since nvfuser codegen has dependency on torch, we need to refactor that out so we can move nvfuser into a submodule and not rely on dlopen to load the library. @malfet - Since we moved nvfuser into a cmake build, we effectively disabled bazel build for nvfuser. This could impact internal workload at Meta, so we need to put support back. cc'ing @vors Pull Request resolved: https://github.com/pytorch/pytorch/pull/89621 Approved by: https://github.com/davidberard98	2023-01-26 02:50:44 +00:00
Driss Guessous	a3715efd8b	Remove windows check for cmake to build Fused kernels (#91909 ) # Summary Add support for fused attention kernels (FlashAttention and memory-efficient attention) on Windows. Previously we could not do this because the fixes required c++17, but we have since updated the PyTorch standard. This PR: - Changes invocations of unsigned long to the fixed width integer type - Adds in the #define FP16_SWITCH(COND, ...) which has been added to the flash_attention main branch - Changes some macros used within mem-efficient attention code in order to work around the VA_ARG discrepancy between clang/gcc and msvc. An alternative would be setting the global flag Zc:preprocessor - Selectively applies /Zc:lambda to only the mem-efficient sources since applying this globally caused quantization files to not compile Pull Request resolved: https://github.com/pytorch/pytorch/pull/91909 Approved by: https://github.com/cpuhrsch	2023-01-25 01:21:12 +00:00
PyTorch MergeBot	523d4f2562	Revert "[cuDNN][cuDNN V8 API] Always build assuming cuDNN >= 8.0 (#91527 )" This reverts commit `4d07ad74f1`. Reverted https://github.com/pytorch/pytorch/pull/91527 on behalf of https://github.com/DanilBaibak due to Break internal build	2023-01-16 13:28:09 +00:00
Edward Z. Yang	1da0ac2c93	Enable -Werror=bool-operation (#92221 ) Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/92221 Approved by: https://github.com/Skylion007	2023-01-15 20:49:53 +00:00
Eddie Yan	4d07ad74f1	[cuDNN][cuDNN V8 API] Always build assuming cuDNN >= 8.0 (#91527 ) We've been building with V8 (incl. V8 API) by default for a while now; this PR cleans up some guards for cuDNN < 8.0. CC @ptrblck @ngimel Pull Request resolved: https://github.com/pytorch/pytorch/pull/91527 Approved by: https://github.com/ngimel	2023-01-13 18:55:37 +00:00
Huy Do	33e3c9ac67	Not explicitly set the manifest filename in Windows (#91988 ) I'm at a loss to explain why this happens, but not setting the manifest file explicitly in the linker fixes it. ### Testing locally * With `/MANIFESTFILE:bin\torch_python.dll.manifest` ``` C:\PROGRA~2\MICROS~2\2019\BUILDT~1\VC\Tools\MSVC\1428~1.293\bin\Hostx64\x64\link.exe /nologo @CMakeFiles\torch_python.rsp /out:bin\torch_python.dll /implib:lib\torch_python.lib /pdb:bin\torch_python.pdb /dll /version:0.0 /machine:x64 /ignore:4049 /ignore:4217 /ignore:4099 /INCREMENTAL:NO /NODEFAULTLIB:LIBCMT.LIB -WHOLEARCHIVE:C:/actions-runner/_work/pytorch/pytorch/build/lib/onnx.lib /MANIFEST /MANIFESTFILE:bin\torch_python.dll.manifest LINK : fatal error LNK1000: Internal error during CImplib::EmitImportThunk ``` * Works fine without the flag ``` C:\PROGRA~2\MICROS~2\2019\BUILDT~1\VC\Tools\MSVC\1428~1.293\bin\Hostx64\x64\link.exe /nologo @CMakeFiles\torch_python.rsp /out:bin\torch_python.dll /implib:lib\torch_python.lib /pdb:bin\torch_python.pdb /dll /version:0.0 /machine:x64 /ignore:4049 /ignore:4217 /ignore:4099 /INCREMENTAL:NO /NODEFAULTLIB:LIBCMT.LIB -WHOLEARCHIVE:C:/actions-runner/_work/pytorch/pytorch/build/lib/onnx.lib /MANIFEST ``` In both cases, the `/MANIFEST` flag is set, so the manifest file is there. In the latter case, the filename is formed by appending the `.manifest` suffix to `bin\torch_python.dll`. Thus, it is still correctly `bin\torch_python.dll.manifest`. Weird. 
``` C:\actions-runner\_work\pytorch\pytorch>ls -la build/bin/torch_* -rwxr-xr-x 1 runneruser 197121 246796288 Jan 11 04:30 build/bin/torch_cpu.dll -rw-r--r-- 1 runneruser 197121 381 Jan 11 04:26 build/bin/torch_cpu.dll.manifest -rwxr-xr-x 1 runneruser 197121 9728 Jan 11 03:55 build/bin/torch_global_deps.dll -rw-r--r-- 1 runneruser 197121 381 Jan 11 03:55 build/bin/torch_global_deps.dll.manifest -rwxr-xr-x 1 runneruser 197121 11746816 Jan 11 04:31 build/bin/torch_python.dll -rw-r--r-- 1 runneruser 197121 381 Jan 11 04:30 build/bin/torch_python.dll.manifest ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/91988 Approved by: https://github.com/malfet, https://github.com/Blackhex, https://github.com/ZainRizvi	2023-01-11 22:28:08 +00:00
salilsdesai	ec94cbc66a	[Vulkan] Remove GLSL Code Gen (#91912 ) @bypass-github-export-checks GLSL Code Gen is not used, so this diff removes - GLSL parts of ShaderSource - Anything enclosed by USE_VULKAN_SHADERC_RUNTIME, as well as the flag itself - gen_vulkan_glsl script Plus some additional refactoring Differential Revision: [D41358861](https://our.internmc.facebook.com/intern/diff/D41358861/) NOTE FOR REVIEWERS: This PR has internal Meta-specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D41358861/)! Pull Request resolved: https://github.com/pytorch/pytorch/pull/91912 Approved by: https://github.com/mcr229	2023-01-10 20:29:47 +00:00
cyy	9710ac6531	Some CMake and CUDA cleanup given recent update to C++17 (#90599 ) The main changes are: 1. Remove outdated checks for old compiler versions because they can't support C++17. 2. Remove outdated CMake checks because it now requires 3.18. 3. Remove outdated CUDA checks because we are moving to CUDA 11. Almost all changes are in CMake files for easy auditing. Pull Request resolved: https://github.com/pytorch/pytorch/pull/90599 Approved by: https://github.com/soumith	2022-12-30 11:19:26 +00:00
Mengwei Liu	2f154f68ea	[torchgen] Add CI job to make sure torchgen works for Executorch op registration (#89596 ) ## Job Test running on most CI jobs. ## Test binary * `test_main.cpp`: entry for gtest * `test_operator_registration.cpp`: test cases for gtest ## Helper sources * `operator_registry.h/cpp`: simple operator registry for testing purposes. * `Evalue.h`: a boxed data type that wraps ATen types, for testing purposes. * `selected_operators.yaml`: operators Executorch cares about so far, we should cover all of them. ## Templates * `NativeFunctions.h`: for generating headers for native functions. (not compiled in the test, since we will be using `libtorch`) * `RegisterCodegenUnboxedKernels.cpp`: for registering boxed operators. * `Functions.h`: for declaring operator C++ APIs. Generated `Functions.h` merely wraps `ATen/Functions.h`. ## Build files * `CMakeLists.txt`: generate code to register ops. * `build.sh`: driver file, to be called by CI job. Pull Request resolved: https://github.com/pytorch/pytorch/pull/89596 Approved by: https://github.com/ezyang	2022-12-21 03:07:32 +00:00
mikey dagitses	322e4b4c8a	set -Wsuggest-override for builds (#89852 ) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/pytorch/pytorch/pull/89852). * __->__ #89852 * #89851 set -Wsuggest-override for builds Summary: This was flagged by a Meta internal build. Test Plan: Rely on CI. Pull Request resolved: https://github.com/pytorch/pytorch/pull/89852 Approved by: https://github.com/malfet	2022-12-19 22:08:47 +00:00
mikey dagitses	8bd959e462	set -Winconsistent-missing-override for builds (#89851 ) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/pytorch/pytorch/pull/89851). * #89852 * __->__ #89851 set -Winconsistent-missing-override for builds Summary: This has triggered internally on some PyTorch code. Test Plan: Rely on CI. Pull Request resolved: https://github.com/pytorch/pytorch/pull/89851 Approved by: https://github.com/malfet	2022-12-17 00:30:06 +00:00
Nikita Shulga	36ac095ff8	Migrate PyTorch to C++17 (#85969 ) With CUDA-10.2 gone we can finally do it! This PR mostly contains build system related changes, invasive functional ones are to follow. Among many expected tweaks to the build system, here are a few unexpected ones: - Force onnx_proto project to be updated to C++17 to avoid `duplicate symbols` error when compiled by gcc-7.5.0, as storage rule for `constexpr` changed in C++17, but gcc does not seem to follow it - Do not use `std::apply` on CUDA but rely on the built-in variant, as it results in test failures when CUDA runtime picks host rather than device function when `std::apply` is invoked from CUDA code. - `std::decay_t` -> `::std::decay_t` and `std::move`->`::std::move` as VC++ for some reason claims that `std` symbol is ambiguous - Disable use of `std::aligned_alloc` on Android, as its `libc++` does not implement it. Some prerequisites: - https://github.com/pytorch/pytorch/pull/89297 - https://github.com/pytorch/pytorch/pull/89605 - https://github.com/pytorch/pytorch/pull/90228 - https://github.com/pytorch/pytorch/pull/90389 - https://github.com/pytorch/pytorch/pull/90379 - https://github.com/pytorch/pytorch/pull/89570 - https://github.com/facebookincubator/gloo/pull/336 - https://github.com/facebookincubator/gloo/pull/343 - `919676fb32` Fixes https://github.com/pytorch/pytorch/issues/56055 Pull Request resolved: https://github.com/pytorch/pytorch/pull/85969 Approved by: https://github.com/ezyang, https://github.com/kulinseth	2022-12-08 02:27:48 +00:00
Nikita Shulga	3ad2a032f4	Update default cmake to 3.18 (#89570 ) Set `cmake.dir` to `/usr/local` in `.circleci/scripts/build_android_gradle.sh`. Prep change for raising compiler standard to C++17: cmake-3.18 is the first version to support the CUDA17 language standard. Pull Request resolved: https://github.com/pytorch/pytorch/pull/89570 Approved by: https://github.com/atalman	2022-11-23 23:23:26 +00:00
PyTorch MergeBot	902e4e3926	Revert "Fix the kineto daemon build condition (#89174 )" This reverts commit `9fd00f194a`. Reverted https://github.com/pytorch/pytorch/pull/89174 on behalf of https://github.com/robieta due to For some reason this is interacting badly with NVFuser. I think it is instability in kineto, but until we figure out what's going on reverting is a necessary evil.	2022-11-23 19:05:14 +00:00
mikey dagitses	92f9214a31	add -Wnarrowing as error to cmake builds (#89207 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/89207 Approved by: https://github.com/wconstab, https://github.com/malfet	2022-11-18 03:16:18 +00:00
Dmytro Dzhulgakov	9fd00f194a	Fix the kineto daemon build condition (#89174 ) If we're not building the lite interpreter we shouldn't be disabling Kineto. This eliminates a step from https://github.com/facebookincubator/dynolog/blob/main/docs/pytorch_profiler.md Pull Request resolved: https://github.com/pytorch/pytorch/pull/89174 Approved by: https://github.com/kimishpatel, https://github.com/malfet	2022-11-18 02:42:45 +00:00
Pruthvi Madugundu	fbd08fb358	Introduce TORCH_DISABLE_GPU_ASSERTS (#84190 ) - Asserts for CUDA are enabled by default - Disabled for ROCm by default by setting `TORCH_DISABLE_GPU_ASSERTS` to `ON` - Can be enabled for ROCm by setting the above variable to `OFF` during build or can be forcefully enabled by setting `ROCM_FORCE_ENABLE_GPU_ASSERTS:BOOL=ON` This is a follow-up change as per the comment in PR #81790, comment [link](https://github.com/pytorch/pytorch/pull/81790#issuecomment-1215929021) Pull Request resolved: https://github.com/pytorch/pytorch/pull/84190 Approved by: https://github.com/jeffdaily, https://github.com/malfet	2022-11-04 04:43:05 +00:00
PyTorch MergeBot	0fa23663cc	Revert "Introduce TORCH_DISABLE_GPU_ASSERTS (#84190 )" This reverts commit `1e2c4a6e0e`. Reverted https://github.com/pytorch/pytorch/pull/84190 on behalf of https://github.com/malfet due to Needs internal changes, has to be landed via co-dev	2022-11-02 18:13:37 +00:00
Pruthvi Madugundu	1e2c4a6e0e	Introduce TORCH_DISABLE_GPU_ASSERTS (#84190 ) - Asserts for CUDA are enabled by default - Disabled for ROCm by default by setting `TORCH_DISABLE_GPU_ASSERTS` to `ON` - Can be enabled for ROCm by setting the above variable to `OFF` during build or can be forcefully enabled by setting `ROCM_FORCE_ENABLE_GPU_ASSERTS:BOOL=ON` This is a follow-up change as per the comment in PR #81790, comment [link](https://github.com/pytorch/pytorch/pull/81790#issuecomment-1215929021) Pull Request resolved: https://github.com/pytorch/pytorch/pull/84190 Approved by: https://github.com/jeffdaily, https://github.com/malfet	2022-11-02 17:41:57 +00:00
Nikita Shulga	e1c123d29a	Add UBSAN to ASAN (#88055 ) Add undefined behavior sanitizer to `USE_ASAN` option. Added `torch._C._crash_if_vptr_ubsan()` that only fails if vptr belongs to a wrong class after typecast. Deleted all ubsan suppressions, but disabled `ProtoTest::Basic` as it fails above-mentioned vptr check. Fixes https://github.com/pytorch/pytorch/issues/88042 Pull Request resolved: https://github.com/pytorch/pytorch/pull/88055 Approved by: https://github.com/ezyang	2022-11-01 17:59:35 +00:00
Radek Bartoň	ba26bc0fc2	Fix random "C1041: cannot open program database" errors when compiling on Windows (#88084 ) Adds `/FS` option to `CMAKE_CXX_FLAGS` and `CMAKE_CUDA_FLAGS`. So far I've encountered this kind of error: ``` C:\Users\MyUser\AppData\Local\Temp\tmpxft_00004728_00000000-7_cuda.cudafe1.cpp: fatal error C1041: cannot open program database 'C:\Projects\pytorch\build\third_party\gloo\gloo\CMakeFiles\gloo_cuda.dir\vc140.pdb'; if multiple CL.EXE write to the same .PDB file, please use /FS ``` when building with VS 2022. cc @peterjc123 @mszhanyi @skyline75489 @nbcsm Related issues: - https://github.com/pytorch/pytorch/issues/87691 - https://github.com/pytorch/pytorch/issues/39989 Pull Request resolved: https://github.com/pytorch/pytorch/pull/88084 Approved by: https://github.com/ezyang	2022-10-31 21:11:16 +00:00
Nikita Shulga	30ea8f5c20	Limit ROCM option to Linux only (#87833 ) As it's available on neither Windows nor MacOS cc @jeffdaily @sunway513 @jithunnair-amd @ROCmSupport Pull Request resolved: https://github.com/pytorch/pytorch/pull/87833 Approved by: https://github.com/kit1980	2022-10-27 01:24:03 +00:00
Nikita Shulga	c28cdb53ea	[BE] Delete BUILD_SPLIT_CUDA option (#87502 ) We link with cuDNN and cuBLAS dynamically for all configs anyway (statically linked cuDNN is a different library than the dynamically linked one, increases default memory footprint, etc.), and libtorch_cuda, even if compiled for all GPU architectures, is no longer approaching the 2Gb binary size limit, so BUILD_SPLIT_CUDA can go away. Pull Request resolved: https://github.com/pytorch/pytorch/pull/87502 Approved by: https://github.com/atalman	2022-10-22 06:00:59 +00:00
albanD	c141f28b64	Fix compilation warning and spurious print (#87297 ) Fixes a compilation warning, makes this warning an error, and removes a random print. Pull Request resolved: https://github.com/pytorch/pytorch/pull/87297 Approved by: https://github.com/malfet	2022-10-19 20:56:37 +00:00
Will Constable	78ef40973c	Set -Werror=braced-scalar-init (#86911 ) - `vector<T>({0})` would give you the vector(size, ...) ctor and produce an empty vector of T, along with the scalar-init warning - `vector<T>({T(0)})` would give you the vector of a single T(0) as you might have intended, and bypasses the warning/error - the warning can easily be missed but can have serious consequences, so make it an error Pull Request resolved: https://github.com/pytorch/pytorch/pull/86911 Approved by: https://github.com/albanD	2022-10-14 22:34:36 +00:00
Nikita Shulga	09364f4298	Compile C10 with `Wshadow` (#86666 ) This should prevent further regressions like https://github.com/pytorch/pytorch/pull/86646 Update `fmt` to `7.1.0` to fix variable shadowing in that library Pull Request resolved: https://github.com/pytorch/pytorch/pull/86666 Approved by: https://github.com/seemethere	2022-10-11 22:39:58 +00:00
Huy Do	7f02f2ac0c	[Experimentation] Add TSAN build and test (#85313 ) Some parts of the PR are adopted from the previously abandoned https://github.com/pytorch/pytorch/pull/36694. This PR is the first part to setup TSAN jobs in the CI. The data race warnings from TSAN will need to be reviewed later in a separate PR. Pull Request resolved: https://github.com/pytorch/pytorch/pull/85313 Approved by: https://github.com/osalpekar	2022-10-11 19:34:44 +00:00
PyTorch MergeBot	deb414a43f	Revert "Use FindCUDAToolkit to find cuda dependencies (#82695 )" This reverts commit `fb9b96593c`. Reverted https://github.com/pytorch/pytorch/pull/82695 on behalf of https://github.com/malfet due to Break cublas packaging into wheel	2022-10-11 02:50:47 +00:00
Peter Bell	fb9b96593c	Use FindCUDAToolkit to find cuda dependencies (#82695 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/82695 Approved by: https://github.com/malfet	2022-10-06 15:43:39 +00:00
Sahan Paliskara	936e93058b	Delete torch::deploy from pytorch core (#85953 ) As we have migrated torch::deploy over to https://github.com/pytorch/multipy, we can now delete it from pytorch core as ongoing development will happen there. This PR was created due to syncing issues with https://github.com/pytorch/pytorch/pull/85443 which is where the review history can be found. Pull Request resolved: https://github.com/pytorch/pytorch/pull/85953 Approved by: https://github.com/seemethere, https://github.com/malfet	2022-10-06 07:20:16 +00:00
Nirav Mehta	d724a91935	Adding Wunused-local-typedef build flag (#86154 ) # Summary In the past, we have seen PRs causing internal breakages caused by the `-Wunused-local-typedef` flag, which then had to be fixed. For example: [#79978](https://github.com/pytorch/pytorch/pull/79978) As part of this change, we want to catch this error in the PR Checks itself. Pull Request resolved: https://github.com/pytorch/pytorch/pull/86154 Approved by: https://github.com/huydhn, https://github.com/seemethere, https://github.com/osalpekar	2022-10-04 19:43:57 +00:00
Nikita Shulga	4c6dc6a1a4	[BE] Do not use VLA (#85800 ) [Variable Length Array](https://en.wikipedia.org/wiki/Variable-length_array) is part of the C99 standard, but has never been adopted into C++. Also, warn if they are used (surprisingly, those warnings cannot be turned into errors). Remove code duplication in `OperationUtils.mm` Pull Request resolved: https://github.com/pytorch/pytorch/pull/85800 Approved by: https://github.com/kulinseth, https://github.com/jeanschmidt	2022-09-28 17:12:25 +00:00
Omkar Salpekar	f4251525de	Adding Wunused-lambda-capture to Clang build flags (#85655 ) Add `-Wunused-lambda-capture` to clang build flags to better align internal and OSS build systems. This flag is not supported in gcc so only adding for clang builds. Pull Request resolved: https://github.com/pytorch/pytorch/pull/85655 Approved by: https://github.com/huydhn	2022-09-27 18:11:18 +00:00
Huy Do	e4471032da	Enforce non-virtual-dtor everywhere (#85586 ) This can finally be removed because NVIDIA has merged my PR on https://github.com/NVIDIA/cudnn-frontend/pull/33 Pull Request resolved: https://github.com/pytorch/pytorch/pull/85586 Approved by: https://github.com/seemethere, https://github.com/ZainRizvi	2022-09-26 21:35:00 +00:00
cpuhrsch	6a04df3ac8	Get flash_attn to compile for CUDA 11.6 linux nightly build (#84941 ) This PR only attempts to get this code to compile for all archs so that we can dispatch to it in https://github.com/pytorch/pytorch/pull/84653 Pull Request resolved: https://github.com/pytorch/pytorch/pull/84941 Approved by: https://github.com/drisspg, https://github.com/malfet	2022-09-26 20:49:19 +00:00
Nikita Shulga	d05a11337c	[CMake] Add functorch target (#83464 ) Move functorch/functorch into `functorch` folder - Add functorch/CMakeLists.txt that adds `functorch` native python extension - Modify `setup.py` to package pytorch and functorch together into a single wheel - Modify `functorch.__version__` is not equal to that of `torch.__version__` - Add dummy `functorch/setup.py` file for the projects that still want to build it Differential Revision: [D39058811](https://our.internmc.facebook.com/intern/diff/D39058811) Pull Request resolved: https://github.com/pytorch/pytorch/pull/83464 Approved by: https://github.com/zou3519	2022-09-14 00:05:33 +00:00
Driss Guessous	0fc02dbba4	flash_attention integration (#81434 ) # Summary: - I added a new submodule Cutlass pointing to 2.10 release. The inclusion of flash_attention code should be gated by the flag: USE_FLASH_ATTENTION. This is defaulted to off, so flash is not built anywhere. This is done on purpose since we don't have A100 machines to compile and test on. - Only looked at CMake, did not attempt bazel or buck yet. - I included the mha_fwd from flash_attention that has been refactored to use cutlass 2.10. There is currently no backwards kernel on this branch. That would be a good follow up. Pull Request resolved: https://github.com/pytorch/pytorch/pull/81434 Approved by: https://github.com/cpuhrsch	2022-09-09 20:11:26 +00:00
Dhruv Matani	0d46bfac5b	[Mobile] Use -ffunction-sections and -fdata-sections for Mobile builds (#84704 ) Summary: Set `-ffunction-sections` and `-fdata-sections` so that each method has its own text section. This allows the linker to remove unused sections when the flag `-Wl,-gc-sections` is provided at link time. Test Plan: CI and local build using `build_mobile.sh` Pull Request resolved: https://github.com/pytorch/pytorch/pull/84704 Approved by: https://github.com/JacobSzwejbka, https://github.com/cccclai	2022-09-09 15:07:35 +00:00
Dhruv Matani	747f27a9ad	[Mobile] Update build_mobile.sh to allow lite interpreter and tracing based builds (#84647 ) Summary: Currently, build_mobile.sh doesn't allow lite interpreter builds or tracing based selective builds. build_mobile.sh is used for host builds of PyTorch for Mobile deployment. Additionally, certain flags such as `USE_BLAS` were not being respected as they should be. This change addresses that as well. Test Plan: Build using: ``` cat /tmp/selected_ops.yaml - aten::add - aten::sub ``` ``` BUILD_PYTORCH_MOBILE_WITH_HOST_TOOLCHAIN=1 USE_LIGHTWEIGHT_DISPATCH=0 BUILD_LITE_INTERPRETER=1 SELECTED_OP_LIST=/tmp/selected_ops.yaml ./scripts/build_mobile.sh ``` ``` cat /tmp/main.cpp int main() { auto m = torch::jit::_load_for_mobile("/tmp/path_to_model.ptl"); auto res = m.forward({}); return 0; } ``` Test using: ``` g++ /tmp/main.cpp -L build_mobile/lib/ -I build_mobile/install/include/ -lpthread -lc10 -ltorch_cpu -ltorch -lXNNPACK -lpytorch_qnnpack -lcpuinfo -lclog -lpthreadpool -lgloo -lkineto -lfmt -ldl -lc10 ``` Reviewers: Subscribers: Tasks: Tags: Pull Request resolved: https://github.com/pytorch/pytorch/pull/84647 Approved by: https://github.com/JacobSzwejbka, https://github.com/cccclai	2022-09-09 15:02:29 +00:00
John Detloff	e0229d6517	Remove caffe2 mobile (#84338 ) We're no longer building Caffe2 mobile as part of our CI, and it adds a lot of clutter to our make files. Any lingering internal dependencies will use the buck build and so won't be affected. Pull Request resolved: https://github.com/pytorch/pytorch/pull/84338 Approved by: https://github.com/dreiss	2022-09-08 01:49:55 +00:00
Xiao Wang	b18f984307	[cmake] Change COLORIZE_OUTPUT option to USE_COLORIZE_OUTPUT (#83716 ) Close https://github.com/pytorch/pytorch/issues/83500 Change COLORIZE_OUTPUT option to USE_COLORIZE_OUTPUT so that it can be passed and disabled through environment variable. Not sure why COLORIZE_OUTPUT=0 didn't work before but USE_COLORIZE_OUTPUT=0 works after. Pull Request resolved: https://github.com/pytorch/pytorch/pull/83716 Approved by: https://github.com/malfet	2022-08-23 01:09:29 +00:00
Peter Bell	d07a9ba11b	Don't build nvfuser benchmarks by default (#67857 ) None of the other benchmarks are built by default, so this seems unnecessary. cc @jjsjann123 Pull Request resolved: https://github.com/pytorch/pytorch/pull/67857 Approved by: https://github.com/jjsjann123, https://github.com/janeyx99	2022-08-12 16:44:41 +00:00
Nikita Shulga	5e477714fa	[BE] Fix MPS build warnings (#83048 ) Mostly get rid of unused variables, but also: - Hide `-[MPSGraphTensorData printNDArray]` behind undefined method access - Rename several methods in ScatterGather/TriangularOps: - `scatterAlongAxisWithDataTensor:`->`scatterAlongAxis:withDataTensor:` - `gatherAlongAxisWithUpdatesTensor:`->`gatherAlongAxis:withUpdatesTensor:` - `getCoordinateValueWithShapeTensor:`->`coordinateAlongAxisTensor:withShapeTensor:` - Add `-Wno-unguarded-availability-new` to suppress 12.3+ availability warnings Pull Request resolved: https://github.com/pytorch/pytorch/pull/83048 Approved by: https://github.com/albanD	2022-08-10 17:29:44 +00:00
Nikita Shulga	62c8d30f9f	[BE] Add `append_cxx_flag_if_supported` macro (#82883 ) And use it throughout the CMakeLists and rectify `IF(APPLE)`/`IF(GNU_CXX_VERSION VERSION_GREATER A.B)` and so on. Also, add `target_compile_options_if_supported` and use it in `Dependencies.cmake` as well as in test's `CMakeLists.txt`. Delete `-Wno-unknown-warning-option` to test that the conditions are indeed working as expected. Pull Request resolved: https://github.com/pytorch/pytorch/pull/82883 Approved by: https://github.com/seemethere	2022-08-10 14:32:26 +00:00
PyTorch MergeBot	d3a1f17fc7	Revert "[BE] Add `append_cxx_flag_if_supported` macro (#82883 )" This reverts commit `d7e6aaa59b`. Reverted https://github.com/pytorch/pytorch/pull/82883 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally	2022-08-10 10:27:59 +00:00
Nikita Shulga	d7e6aaa59b	[BE] Add `append_cxx_flag_if_supported` macro (#82883 ) And use it throughout the CMakeLists and rectify `IF(APPLE)`/`IF(GNU_CXX_VERSION VERSION_GREATER A.B)` and so on. Also, add `target_compile_options_if_supported` and use it in `Dependencies.cmake` as well as in test's `CMakeLists.txt`. Delete `-Wno-unknown-warning-option` to test that the conditions are indeed working as expected. Pull Request resolved: https://github.com/pytorch/pytorch/pull/82883 Approved by: https://github.com/seemethere	2022-08-08 21:04:09 +00:00
Tongliang Liao	dff70a5e1a	Make language std configurable. (#75519 ) RocksDB 7 starts to use C++17 in header. We should make this configurable, in case user needs higher std version. List of files to changed is found by `git grep 'CMAKE_[^_]*_STANDARD'`. Doc string is from CMake code. Pull Request resolved: https://github.com/pytorch/pytorch/pull/75519 Approved by: https://github.com/malfet	2022-07-13 14:21:27 +00:00
Jing Xu	3c7044728b	Enable Intel® VTune™ Profiler's Instrumentation and Tracing Technology APIs (ITT) to PyTorch (#63289 ) More detailed description of benefits can be found at #41001. This is Intel's counterpart of NVidia’s NVTX (https://pytorch.org/docs/stable/autograd.html#torch.autograd.profiler.emit_nvtx). ITT is a functionality for labeling trace data during application execution across different Intel tools. For integrating Intel(R) VTune Profiler into Kineto, ITT needs to be integrated into PyTorch first. It works with both standalone VTune Profiler [(https://www.intel.com/content/www/us/en/developer/tools/oneapi/vtune-profiler.html](https://www.intel.com/content/www/us/en/developer/tools/oneapi/vtune-profiler.html)) and Kineto-integrated VTune functionality in the future. It works for both Intel CPU and Intel XPU devices. Pitch Add VTune Profiler's ITT API function calls to annotate PyTorch ops, as well as developer customized code scopes on CPU, like NVTX for NVidia GPU. This PR rebases the code changes at https://github.com/pytorch/pytorch/pull/61335 to the latest master branch. Usage example: ``` with torch.autograd.profiler.emit_itt(): for i in range(10): torch.itt.range_push('step_{}'.format(i)) model(input) torch.itt.range_pop() ``` cc @ilia-cher @robieta @chaekit @gdankel @bitfort @ngimel @orionr @nbcsm @guotuofeng @guyang3532 @gaoteng-git Pull Request resolved: https://github.com/pytorch/pytorch/pull/63289 Approved by: https://github.com/malfet	2022-07-13 13:50:15 +00:00
atalman	d552ba3b4f	Use fabi-version=11 to ensure compatibility between gcc7 and gcc9 binaries (#81058 ) Fixes: #80489 Test using cuda 11.3 manywheel binary: ``` import torch print(torch.__version__) print(torch._C._PYBIND11_BUILD_ABI) ``` Output ``` 1.13.0.dev20220707+cu113 _cxxabi1011 ``` Functorch test torch : 1.13.0.dev20220707+cu113, functorch with cu102 ``` import torch print(torch.__version__) print(torch._C._PYBIND11_BUILD_ABI) from functorch import vmap x = torch.randn(2, 3, 5) vmap(lambda x: x, out_dims=3)(x) ``` Output ``` 1.13.0.dev20220707+cu113 _cxxabi1011 /home/atalman/temp/testc1.py:5: UserWarning: Failed to initialize NumPy: No module named 'numpy' (Triggered internally at ../torch/csrc/utils/tensor_numpy.cpp:73.) x = torch.randn(2, 3, 5) Traceback (most recent call last): File "/home/atalman/temp/testc1.py", line 6, in <module> vmap(lambda x: x, out_dims=3)(x) File "/home/atalman/conda/lib/python3.9/site-packages/functorch/_src/vmap.py", line 361, in wrapped return _flat_vmap( File "/home/atalman/conda/lib/python3.9/site-packages/functorch/_src/vmap.py", line 488, in _flat_vmap return _unwrap_batched(batched_outputs, out_dims, vmap_level, batch_size, func) File "/home/atalman/conda/lib/python3.9/site-packages/functorch/_src/vmap.py", line 165, in _unwrap_batched flat_outputs = [ File "/home/atalman/conda/lib/python3.9/site-packages/functorch/_src/vmap.py", line 166, in <listcomp> _remove_batch_dim(batched_output, vmap_level, batch_size, out_dim) IndexError: Dimension out of range (expected to be in range of [-3, 2], but got 3) ``` Related Builder PR: https://github.com/pytorch/builder/pull/1083 Test PR: https://github.com/pytorch/pytorch/pull/81232 Pull Request resolved: https://github.com/pytorch/pytorch/pull/81058 Approved by: https://github.com/zou3519, https://github.com/malfet	2022-07-12 17:56:33 +00:00
Terry Lam	54bdaf76d6	[PFC] Native UCC process group for Pytorch (#79918 ) Summary: This diff integrates UCC process group as a native component of Pytorch Distributed core. It is based on the existing torch-ucc (https://github.com/facebookresearch/torch_ucc) as the wrapper for UCC collective communication library. The environment and cmake variables are named in mirroring to the existing process groups such as NCCL and Gloo. Specifically, - USE_UCC: enables UCC PG. This defaults to OFF, so there is no breakage of existing builds that do not have UCX/UCC external libraries. - USE_SYSTEM_UCC: uses external UCX and UCC shared libraries that are set accordingly with UCX_HOME and UCC_HOME. Currently, this diff only supports USE_SYSTEM_UCC=ON, i.e., requiring users to specify external libraries for UCX and UCC. In subsequent diffs, we will add UCX and UCC repos as third-party dependencies in pytorch/third-party. Test Plan: Passed Torch-UCC tests that invoke UCC process group. For example: $ sh test/start_test.sh test/torch_allreduce_test.py --backend gloo --use-cuda ... Test allreduce: succeeded Differential Revision: D36973688 Pull Request resolved: https://github.com/pytorch/pytorch/pull/79918 Approved by: https://github.com/kwen2501, https://github.com/kingchc	2022-07-12 14:45:44 +00:00
Nikita Shulga	2beb57a823	Add `-Werror=non-virtual-dtor` (reland) (#81012 ) This PR relands #80584, but instead of adding suppression in CMakeLists.txt suppresses it directly in `llvm_codegen.cpp` and just for a single header. In general, it's better to avoid `set_target_properties` pattern for suppressing warnings, as it makes build brittle and hard to debug/understand Test plan: wait for `ciflow/binaries_wheel` to finish Pull Request resolved: https://github.com/pytorch/pytorch/pull/81012 Approved by: https://github.com/huydhn, https://github.com/kit1980	2022-07-07 05:33:55 +00:00
PyTorch MergeBot	0491c10a63	Revert "Add -Werror=non-virtual-dtor (#80584 )" This reverts commit `7670035862`. Reverted https://github.com/pytorch/pytorch/pull/80584 on behalf of https://github.com/malfet due to Broke nightly builds, see https://github.com/pytorch/pytorch/runs/7209779559?check_suite_focus=true	2022-07-06 22:26:59 +00:00
Ronak Malik	d03f989d53	[ROCm] Load ROCm if Torch is used as a dependency (#80469 ) Includes LoadHIP.cmake if pytorch is used as a dependency for another project and ROCm is enabled. This removes the need to explicitly link against ROCm libraries in extension projects. Pull Request resolved: https://github.com/pytorch/pytorch/pull/80469 Approved by: https://github.com/pruthvistony, https://github.com/malfet	2022-07-05 21:04:07 +00:00
Huy Do	7670035862	Add -Werror=non-virtual-dtor (#80584 ) This also resolves https://github.com/pytorch/pytorch/pull/77323 Pull Request resolved: https://github.com/pytorch/pytorch/pull/80584 Approved by: https://github.com/seemethere	2022-07-04 16:54:47 +00:00
PyTorch MergeBot	1454515253	Revert "Enable Intel® VTune™ Profiler's Instrumentation and Tracing Technology APIs (ITT) to PyTorch (#63289 )" This reverts commit `f988aa2b3f`. Reverted https://github.com/pytorch/pytorch/pull/63289 on behalf of https://github.com/malfet due to broke trunk, see `f988aa2b3f`	2022-06-30 12:49:41 +00:00
Jing Xu	f988aa2b3f	Enable Intel® VTune™ Profiler's Instrumentation and Tracing Technology APIs (ITT) to PyTorch (#63289 ) More detailed description of benefits can be found at #41001. This is Intel's counterpart of NVidia’s NVTX (https://pytorch.org/docs/stable/autograd.html#torch.autograd.profiler.emit_nvtx). ITT is a functionality for labeling trace data during application execution across different Intel tools. For integrating Intel(R) VTune Profiler into Kineto, ITT needs to be integrated into PyTorch first. It works with both standalone VTune Profiler [(https://www.intel.com/content/www/us/en/developer/tools/oneapi/vtune-profiler.html](https://www.intel.com/content/www/us/en/developer/tools/oneapi/vtune-profiler.html)) and Kineto-integrated VTune functionality in the future. It works for both Intel CPU and Intel XPU devices. Pitch Add VTune Profiler's ITT API function calls to annotate PyTorch ops, as well as developer customized code scopes on CPU, like NVTX for NVidia GPU. This PR rebases the code changes at https://github.com/pytorch/pytorch/pull/61335 to the latest master branch. Usage example: ``` with torch.autograd.profiler.emit_itt(): for i in range(10): torch.itt.range_push('step_{}'.format(i)) model(input) torch.itt.range_pop() ``` cc @ilia-cher @robieta @chaekit @gdankel @bitfort @ngimel @orionr @nbcsm @guotuofeng @guyang3532 @gaoteng-git Pull Request resolved: https://github.com/pytorch/pytorch/pull/63289 Approved by: https://github.com/malfet	2022-06-30 05:14:03 +00:00
Huy Do	39bd81a11f	Add clang -Wconstant-conversion (#80461 ) This catches the compilation error detected in https://github.com/pytorch/pytorch/pull/75400 Pull Request resolved: https://github.com/pytorch/pytorch/pull/80461 Approved by: https://github.com/osalpekar	2022-06-29 23:42:20 +00:00
Nikita Shulga	b370959da1	[MPS] Make it compilable with either xCode or CLI (#79430 ) `xcrun --sdk macosx --show-sdk-version` works with either CommandLineTools or Xcode, but `xcodebuild -sdk macosx -version SDKVersion` works only if full Xcode is installed, which is not necessary to build PyTorch. Both commands yield the same output when Xcode is installed: ``` % xcodebuild -sdk macosx -version SDKVersion 12.3 % xcrun --sdk macosx --show-sdk-version 12.3 ``` But the first one fails if Xcode is missing: ``` % xcodebuild -sdk macosx -version SDKVersion xcode-select: error: tool 'xcodebuild' requires Xcode, but active developer directory '/Library/Developer/CommandLineTools' is a command line tools instance % xcrun --sdk macosx --show-sdk-version 12.3 ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/79430 Approved by: https://github.com/albanD	2022-06-13 21:03:48 +00:00
Nikita Shulga	3255ddeec9	Make `Wunused-local-typedef` a hard error (#77918 ) Only allow it for `libtorch_python` and tests Helps prevent regression like https://github.com/pytorch/pytorch/pull/76547#issuecomment-1132208232 Pull Request resolved: https://github.com/pytorch/pytorch/pull/77918 Approved by: https://github.com/osalpekar, https://github.com/seemethere	2022-06-09 18:14:01 +00:00
Nikita Shulga	634954c55c	[MPS] Do not pass linker command to a compiler (#78630 ) `-weak_framework` is a linker rather than a compiler option and as such it should not be passed as CXX flag Also, use `string(APPEND` rather than `set(FOO "$(FOO) ...)` Likely fixes our ability to use `sccache` for MacOS CI builds, see https://github.com/pytorch/pytorch/issues/78375#issuecomment-1143697183 Pull Request resolved: https://github.com/pytorch/pytorch/pull/78630 Approved by: https://github.com/albanD	2022-06-01 22:08:54 +00:00
Alban Desmaison	fd121dfeec	Move x86 binaries builder to macos-12 to enable MPS build Pull Request resolved: https://github.com/pytorch/pytorch/pull/77662 Approved by: https://github.com/seemethere	2022-05-19 21:59:08 +00:00
Peter Bell	5cdf79fddc	Bump minimum CMake version to 3.13 Pull Request resolved: https://github.com/pytorch/pytorch/pull/76312 Approved by: https://github.com/malfet	2022-05-19 15:38:55 +00:00
Eddie Yan	14ab3ff484	[cuDNN V8 API] Enable cuDNN v8 API by default (#75466 ) Testing via CI Pull Request resolved: https://github.com/pytorch/pytorch/pull/75466 Approved by: https://github.com/ngimel	2022-05-17 21:54:17 +00:00
Alban Desmaison	cf975dde0d	Make sure that we can build without xcode on mac (#77450 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/77450 Approved by: https://github.com/drisspg, https://github.com/kulinseth	2022-05-13 21:18:55 +00:00
Kulin Seth	e011a8e18b	Enable PyTorch operations on MPS Backend. (#77343 ) Add PyTorch operations to MPS backend. - https://github.com/pytorch/pytorch/issues/77394 Pull Request resolved: https://github.com/pytorch/pytorch/pull/77343 Approved by: https://github.com/albanD	2022-05-13 18:28:53 +00:00
sanchitintel	4ee29d6033	[Reland take-2] Add JIT graph fuser for oneDNN Graph API (v0.5) Re-landing #68111/#74596 ## Description v0.5 PR of this [RFC](https://github.com/pytorch/pytorch/issues/49444). On the basis of #50256, the below improvements are included: * The [v0.5 release branch](https://github.com/oneapi-src/oneDNN/releases/tag/graph-v0.5) of the oneDNN Graph API is used * The fuser now works with the profiling graph executor. We have inserted type check nodes to guard the profiled tensor properties. ### User API: The optimization pass is disabled by default. Users could enable it by: ``` torch.jit.enable_onednn_fusion(True) ``` `torch.jit.freeze` should be used after tracing (recommended) or scripting a model. ### Performance: [pytorch/benchmark](https://github.com/pytorch/benchmark) tool is used to compare the performance: * SkyLake 8180 (1 socket of 28 cores): ![image](https://user-images.githubusercontent.com/65992142/151162305-05e44425-a24e-4d5e-94e1-743b40b87a8c.png) * SkyLake 8180 (single thread): ![image](https://user-images.githubusercontent.com/65992142/151162528-69f90b79-d08d-46b8-8775-d80a6ccbce8a.png) * By mapping hardswish to oneDNN Graph, it’s 8% faster than PyTorch JIT (NNC + OFI) ** We expect performance gain after mapping transpose, contiguous & view to oneDNN graph ops ### Directory structure of the integration code Fuser-related code is placed under: ``` torch/csrc/jit/codegen/onednn/ ``` Optimization pass registration is done in: ``` torch/csrc/jit/passes/onednn_graph_fuser.h ``` CMake for the integration code is in: ``` caffe2/CMakeLists.txt cmake/public/mkldnn.cmake cmake/Modules/FindMKLDNN.cmake ``` ## Limitations * In this PR, we only support Pytorch-oneDNN-Graph integration on Linux platform. Support on Windows and MacOS will be enabled as a next step. * We have only optimized the inference use-case. 
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76622 Approved by: https://github.com/eellison	2022-05-05 16:57:03 +00:00
Eddie Yan	e838137b3e	Add high level control of fp32 matmul precision; disable TF32 for matmuls by default #76440 CC @mruberry @ptrblck Pull Request resolved: https://github.com/pytorch/pytorch/pull/76509 Approved by: https://github.com/ngimel	2022-05-04 20:40:13 +00:00
Nikita Shulga	8473173c36	Remove breakpad dependency This functionality does not seem to be used and there are some requests to update dependency. Add `third_party` to torch_cpu include directories if compiling with Caffe2 support, as `caffe2/quantization/server/conv_dnnlowp_op.cc` depends on `third_party/fbgemm/src/RefImplementations.h` Pull Request resolved: https://github.com/pytorch/pytorch/pull/75394 Approved by: https://github.com/janeyx99, https://github.com/seemethere	2022-05-03 20:21:55 +00:00
PyTorch MergeBot	3dcd67a1b3	Revert "[Re-landing 68111] Add JIT graph fuser for oneDNN Graph API (Preview4.1)" This reverts commit `8b11d81058`. Reverted https://github.com/pytorch/pytorch/pull/74596 on behalf of https://github.com/janeyx99	2022-04-29 15:40:17 +00:00
chunyuan	8b11d81058	[Re-landing 68111] Add JIT graph fuser for oneDNN Graph API (Preview4.1) Re-landing https://github.com/pytorch/pytorch/pull/68111 ## Description Preview4 PR of this [RFC](https://github.com/pytorch/pytorch/issues/49444). On the basis of https://github.com/pytorch/pytorch/pull/50256, the below improvements are included: - The [preview4 release branch](https://github.com/oneapi-src/oneDNN/releases/tag/graph-v0.4.1) of the oneDNN Graph API is used - The fuser now works with the profiling graph executor. We have inserted type check nodes to guard the profiled tensor properties. ### User API: The optimization pass is disabled by default. Users could enable it by: ``` torch.jit.enable_onednn_fusion(True) ``` ### Performance: [pytorch/benchmark](https://github.com/pytorch/benchmark) tool is used to compare the performance: - SkyLake 8180 (1 socket of 28 cores): ![image](https://user-images.githubusercontent.com/65992142/151162305-05e44425-a24e-4d5e-94e1-743b40b87a8c.png) - SkyLake 8180 (single thread): ![image](https://user-images.githubusercontent.com/65992142/151162528-69f90b79-d08d-46b8-8775-d80a6ccbce8a.png) \* By mapping hardswish to oneDNN Graph, it’s 8% faster than PyTorch JIT (NNC + OFI) \** We expect performance gain after mapping transpose, contiguous & view to oneDNN graph ops ### Directory structure of the integration code Fuser-related code is placed under: ``` torch/csrc/jit/codegen/onednn/ ``` Optimization pass registration is done in: ``` torch/csrc/jit/passes/onednn_graph_fuser.h ``` CMake for the integration code is in: ``` caffe2/CMakeLists.txt ``` ## Limitations - In this PR, we have only supported the optimization on Linux platform. The support on Windows and MacOS will be enabled as the next step. - We have only optimized the inference use case. Pull Request resolved: https://github.com/pytorch/pytorch/pull/74596 Approved by: https://github.com/malfet	2022-04-29 01:01:33 +00:00
Kulin Seth	54c75e1e8f	Add "mps" device to PyTorch framework. Remove the "mlc" device for Mac platforms. This commit will be followed up with: * adding MPS runtime components * PyTorch ops for MPS device Pull Request resolved: https://github.com/pytorch/pytorch/pull/76291 Approved by: https://github.com/albanD	2022-04-27 19:21:57 +00:00
PyTorch MergeBot	d79d9fa283	Revert "Remove breakpad dependency" This reverts commit `9aa3c7fd83`. Reverted https://github.com/pytorch/pytorch/pull/75394 on behalf of https://github.com/malfet	2022-04-17 17:58:51 +00:00
Nikita Shulga	9aa3c7fd83	Remove breakpad dependency This functionality does not seem to be used and there are some requests to update dependency Pull Request resolved: https://github.com/pytorch/pytorch/pull/75394 Approved by: https://github.com/janeyx99, https://github.com/seemethere	2022-04-17 17:43:45 +00:00
Nikita Shulga	bdf5a87714	Extend sign-compare warnings to gcc (take 2) Remove `-Wno-sign-compare` option for GCC Suppress erroneous sign-compare warning in `c10::greater_than_max`(see https://godbolt.org/z/Tr3Msnz99) Fix sign-compare in torch/deploy, `caffe2::QTensor::dim32()` and `generate_proposals_op_test.cc` Pull Request resolved: https://github.com/pytorch/pytorch/pull/75544 Approved by: https://github.com/osalpekar	2022-04-13 00:06:52 +00:00
Edward Z. Yang	c2124f5c66	Turn on -Wsign-compare This is enabled on some of our internal builds, is a common source of fbcode only errors and apparently we are relatively clean on it. Signed-off-by: Edward Z. Yang <ezyang@fb.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/74996 Approved by: https://github.com/malfet	2022-04-12 18:58:14 +00:00
PyTorch MergeBot	80e05b7df4	Revert "Extend sign-compare warnings to gcc" This reverts commit `34446653c7`. Reverted https://github.com/pytorch/pytorch/pull/75544 on behalf of https://github.com/janeyx99	2022-04-12 18:22:53 +00:00
Nikita Shulga	34446653c7	Extend sign-compare warnings to gcc Remove `-Wno-sign-compare` option for GCC Suppress erroneous sign-compare warning in `c10::greater_than_max`(see https://godbolt.org/z/Tr3Msnz99) Fix sign-compare in torch/deploy Pull Request resolved: https://github.com/pytorch/pytorch/pull/75544 Approved by: https://github.com/osalpekar	2022-04-12 17:36:48 +00:00
Nikita Shulga	90a56fc515	Add `-Wsign-compare` to list of clang flags It caused a number of internal only compilation failures, for example see: https://github.com/pytorch/pytorch/pull/74425#issuecomment-1075476438 and https://github.com/pytorch/pytorch/pull/74542#issuecomment-1083518880 Pull Request resolved: https://github.com/pytorch/pytorch/pull/75085 Approved by: https://github.com/ngimel, https://github.com/albanD	2022-04-05 14:16:47 +00:00
Xiang Gao	3b29bd00eb	Make ProcessGroupNCCL load torch_ucc.so when TORCH_UCC_LIBRARY_PATH is set (#69552 ) Summary: This is the very first step for the UCC-NCCL integration. This PR lets `ProcessGroupNCCL` load the `torch_ucc.so` if the user specifies an environment variable `TORCH_UCC_LIBRARY_PATH`. If this environment variable is not specified by the user, then there will be no visible change. In the future, we may want to make PyTorch smart enough to automatically detect the `torch_ucc.so` in the user's system, but before doing that, I believe we should first make sure that `ProcessGroupUCC` is very well tested. Note that in this PR, `ProcessGroupNCCL` just loads the library but will not use it. I am trying to make PRs small, so the usage of `torch_ucc.so` will be submitted in later PRs. This PR requires the change in https://github.com/facebookresearch/torch_ucc/pull/56, otherwise `torch_ucc.so` can not be successfully loaded. But this PR can be landed separately without waiting for https://github.com/facebookresearch/torch_ucc/pull/56 because, in PyTorch's unit tests, UCC is never used or tested. cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang Pull Request resolved: https://github.com/pytorch/pytorch/pull/69552 Reviewed By: mruberry Differential Revision: D34675212 Pulled By: jiayisuse fbshipit-source-id: a3d1fb98340dbe3a931af555423863efd381f1ae (cherry picked from commit 3778b6fabe70c26b5a65e6ddec641d2ef9113cd1)	2022-03-25 18:19:39 +00:00
Will Constable	3547f20872	Land remaining parts of Torchscript Lazy Tensor backend (#74111 ) Summary: Also enables bazel build to run lazy codegen. Bazel (oss) build feeds off the same filelists as cmake/buck (build_variables.bzl), so enabling it is easier than keeping it disabled. Pull Request resolved: https://github.com/pytorch/pytorch/pull/74111 Test Plan: Run CI and verify test_lazy_ops is running via OSS cmake builds Reviewed By: bdhirsh Differential Revision: D34772403 fbshipit-source-id: 8a63f58b9536e6ac1be530667932176ef2549496 (cherry picked from commit e807ffb1918853d10b924fdc24f85ee5b1a39021)	2022-03-22 23:14:03 +00:00
Edward Z. Yang	493bbdc4fe	Use shared CUPTI by default Per https://github.com/pytorch/pytorch/issues/57744 statically linked CUPTI causes exception handling to break on certain compiler configurations, likely because CUPTI comes with incompatible libstdc++ symbols. Rather than pray that something reasonable happens, use the safer configuration (dynamic linking) by default and give a warning if the user inverts the setting. Signed-off-by: Edward Z. Yang <ezyang@fb.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/74009 Approved by: https://github.com/malfet	2022-03-16 21:04:12 +00:00
Ashwin Hari	7ed73b2803	CMake option for using static MKL libraries Fixes #70587 Pull Request resolved: https://github.com/pytorch/pytorch/pull/73069 Approved by: https://github.com/malfet	2022-03-07 19:32:33 +00:00
Mengwei Liu	9ce9803abe	[PyTorch] Add codegen unboxing ability (#69881 ) Summary: RFC: https://github.com/pytorch/rfcs/pull/40 This PR (re)introduces python codegen for unboxing wrappers. Given an entry of `native_functions.yaml` the codegen should be able to generate the corresponding C++ code to convert ivalues from the stack to their proper types. To trigger the codegen, run ``` tools/jit/gen_unboxing.py -d cg/torch/share/ATen ``` Merged changes on CI test. In https://github.com/pytorch/pytorch/issues/71782 I added an e2e test for static dispatch + codegen unboxing. The test exports a mobile model of mobilenetv2, load and run it on a new binary for lite interpreter: `test/mobile/custom_build/lite_predictor.cpp`. ## Lite predictor build specifics 1. Codegen: `gen.py` generates `RegisterCPU.cpp` and `RegisterSchema.cpp`. Now with this PR, once `static_dispatch` mode is enabled, `gen.py` will not generate `TORCH_LIBRARY` API calls in those cpp files, hence avoids interaction with the dispatcher. Once `USE_LIGHTWEIGHT_DISPATCH` is turned on, `cmake/Codegen.cmake` calls `gen_unboxing.py` which generates `UnboxingFunctions.h`, `UnboxingFunctions_[0-4].cpp` and `RegisterCodegenUnboxedKernels_[0-4].cpp`. 2. Build: `USE_LIGHTWEIGHT_DISPATCH` adds generated sources into `all_cpu_cpp` in `aten/src/ATen/CMakeLists.txt`. All other files remain unchanged. In reality all the `Operators_[0-4].cpp` are not necessary but we can rely on linker to strip them off. ## Current CI job test coverage update Created a new CI job `linux-xenial-py3-clang5-mobile-lightweight-dispatch-build` that enables the following build options: * `USE_LIGHTWEIGHT_DISPATCH=1` * `BUILD_LITE_INTERPRETER=1` * `STATIC_DISPATCH_BACKEND=CPU` This job triggers `test/mobile/lightweight_dispatch/build.sh` and builds `libtorch`. Then the script runs C++ tests written in `test_lightweight_dispatch.cpp` and `test_codegen_unboxing.cpp`. 
Recent commits added tests to cover as many C++ argument types as possible: in `build.sh` we installed PyTorch Python API so that we can export test models in `tests_setup.py`. Then we run C++ test binary to run these models on lightweight dispatch enabled runtime. Pull Request resolved: https://github.com/pytorch/pytorch/pull/69881 Reviewed By: iseeyuan Differential Revision: D33692299 Pulled By: larryliu0820 fbshipit-source-id: 211e59f2364100703359b4a3d2ab48ca5155a023 (cherry picked from commit 58e1c9a25e3d1b5b656282cf3ac2f548d98d530b)	2022-03-01 23:28:13 +00:00
Nikita Shulga	6302cdb9bc	[Reland] Add BUILD_LAZY_CUDA_LINALG option (#73447 ) Summary: When enabled, it will generate `torch_cuda_linalg` library, which would depend on cusolve and magma and registers dynamic bindings to it from LinearAlgebraStubs Avoid symbol clashes that can result in infinite recursion by moving all symbols in the library to its own namespace. Add checks that should prevent calling self in recursion to `LinearAlgebraStubs.cpp` Pull Request resolved: https://github.com/pytorch/pytorch/pull/73447 Reviewed By: albanD Differential Revision: D34538827 Pulled By: malfet fbshipit-source-id: f2535b471d3524768a84b2e169b6aa24c26c03bf (cherry picked from commit 4ec24b079c861c1122f0fa86e280b977c3c2f7ac)	2022-03-01 21:33:07 +00:00
Andrey Talman	197764b35d	Remove cuda 11.1 references (#73514 ) Summary: Fixes: https://github.com/pytorch/pytorch/issues/73377 We've migrated to CUDA-11.3 as the default toolkit in 1.9, so it's time to stop builds (especially considering the forward-compatibility guarantee across CUDA-11.x drivers). Hence we are removing CUDA 11.1 support. We should also clean up old CUDA-related code from our builder and pytorch repos, making scripts a little cleaner. We have code that references CUDA 9.2, 10.1, 11.0, 11.1, and 11.2, and none of these are currently in use. Pull Request resolved: https://github.com/pytorch/pytorch/pull/73514 Reviewed By: janeyx99 Differential Revision: D34551989 Pulled By: atalman fbshipit-source-id: 9ceaaa9b25ad49689986f4b29a26d20370d9d011 (cherry picked from commit fe109c62daf429e9053c03f6e374568ba23cd041)	2022-03-01 16:37:37 +00:00
Jane Xu	31271284bc	Revert D33992795: Add BUILD_LAZY_CUDA_LINALG option Test Plan: revert-hammer Differential Revision: D33992795 (`82130758f0`) Original commit changeset: d1fa351a3206 Original Phabricator Diff: D33992795 (`82130758f0`) fbshipit-source-id: f0a66d7431aea2c358718eef16fab05712cd6cae (cherry picked from commit df4900115f712e477ed5cc97510e6515a1ca17a9)	2022-02-25 18:37:31 +00:00
Digant Desai	b2054d3025	Prepare for an update to the XNNPACK submodule (#72642 ) Summary: - Target Sha1: ae108ef49aa5623b896fc93d4298c49d1750d9ba - Make USE_XNNPACK a dependent option on cmake minimum version 3.12 - Print USE_XNNPACK under cmake options summary, and print the availability from collect_env.py - Skip XNNPACK based tests when XNNPACK is not available - Add SkipIfNoXNNPACK wrapper to skip tests - Update cmake version for xenial-py3.7-gcc5.4 image to 3.12.4 - This is required for the backwards compatibility test. The PyTorch op schema is XNNPACK dependent. See, aten/src/ATen/native/xnnpack/RegisterOpContextClass.cpp for example. The nightly version is assumed to have USE_XNNPACK=ON, so with this change we ensure that the test build can also have XNNPACK. - HACK: skipping test_xnnpack_integration tests on ROCM Pull Request resolved: https://github.com/pytorch/pytorch/pull/72642 Reviewed By: kimishpatel Differential Revision: D34456794 Pulled By: digantdesai fbshipit-source-id: 85dbfe0211de7846d8a84321b14fdb061cd6c037 (cherry picked from commit 6cf48e7b64d6979962d701b5d493998262cc8bfa)	2022-02-25 00:39:15 +00:00
Nikita Shulga	82130758f0	Add BUILD_LAZY_CUDA_LINALG option (#72306 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/72306 When enabled, it will generate `torch_cuda_linalg` library, which would depend on cusolve and magma and registers dynamic bindings to it from LinearAlgebraStubs Test Plan: Imported from OSS Reviewed By: ngimel Differential Revision: D33992795 Pulled By: malfet fbshipit-source-id: d1fa351a320659b29754997c20d754e69bfe36c0 (cherry picked from commit d5d6c69a988b9454538ecd28674206da2541de17)	2022-02-24 03:30:04 +00:00
Daniël de Kok	d50211860a	Use SLEEF functions for NEON vectors on macOS ARM64 (#70354 ) Summary: We noticed that on M1 Macs Transformer network profiles are dominated by scalar `exp` and `erff` functions (for softmax and GELU). The NEON `Vectorized<float>` implementation does not use SLEEF functions in order to compile on mobile platforms. However, SLEEF is already compiled on macOS ARM64 and is safe to use there. This change adds another implementation of `Vectorized<float>` that uses SLEEF functions. This implementation is only used on macOS ARM64. This change speeds up e.g. prediction of spaCy transformer models by 20% on M1 Macs. Pull Request resolved: https://github.com/pytorch/pytorch/pull/70354 Reviewed By: albanD Differential Revision: D33659540 Pulled By: kimishpatel fbshipit-source-id: b8f02a61321873fc60778190a005c466c7d0cc0c (cherry picked from commit `71286a207c`)	2022-02-07 21:55:28 +00:00
Peter Bell	4829dcea09	Codegen: Generate separate headers per operator (#68247 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68247 This splits `Functions.h`, `Operators.h`, `NativeFunctions.h` and `NativeMetaFunctions.h` into separate headers per operator base name. With `at::sum` as an example, we can include: ```cpp <ATen/core/sum.h> // Like Functions.h <ATen/core/sum_ops.h> // Like Operators.h <ATen/core/sum_native.h> // Like NativeFunctions.h <ATen/core/sum_meta.h> // Like NativeMetaFunctions.h ``` The umbrella headers are still being generated, but all they do is include from the `ATen/ops` folder. Further, `TensorBody.h` now only includes the operators that have method variants. Which means files that only include `Tensor.h` don't need to be rebuilt when you modify function-only operators. Currently there are about 680 operators that don't have method variants, so this is potentially a significant win for incremental builds. Test Plan: Imported from OSS Reviewed By: mrshenli Differential Revision: D32596272 Pulled By: albanD fbshipit-source-id: 447671b2b6adc1364f66ed9717c896dae25fa272	2021-12-14 06:40:08 -08:00
Yanan Cao	17f3179d60	Back out "[pytorch][PR] Add ability for a mobile::Module to save as flatbuffer" (#69796 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69796 (Note: this ignores all push blocking failures!) Test Plan: External CI + Sandcastle Reviewed By: zhxchen17 Differential Revision: D33032671 fbshipit-source-id: dbf6690e960e25d6a5f19043cbe792add2acd7ef	2021-12-10 21:29:53 -08:00
Nikita Shulga	e305e4d4d8	Suppress common warnings when building by clang (#69710 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69710 Namely, no range-loop-analysis (which detects when a loop variable cannot be a const reference). Test Plan: Imported from OSS Reviewed By: r-barnes Differential Revision: D32997003 Pulled By: malfet fbshipit-source-id: dba0e7875e5b667e2cc394c70dd75e2403265918	2021-12-10 16:45:38 -08:00
Han Qi	d3649309e6	[pytorch][PR] Add ability for a mobile::Module to save as flatbuffer (#69306 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69306 Included functions: save_mobile_module -> saves a mobile::Module to flatbuffer load_mobile_module_from_file -> loads a flatbuffer into mobile::Module parse_mobile_module -> parses from bytes or deserialized flatbuffer Module object Test Plan: unittests Reviewed By: gmagogsfm Differential Revision: D32806835 fbshipit-source-id: 71913c6650e225634f878946bd16960d377a7f57	2021-12-09 14:53:31 -08:00
Peter Bell	21919be96b	CMake: Update precompiled header and fix support (#67851 ) Summary: This fixes the `USE_PRECOMPILED_HEADERS` cmake version check which was accidentally inverted, so it was always disabled. I've also made the precompiled header so it only includes headers used in 95% or more of code, weighted by compile time. This limits it to the standard library, `c10` and a limited subset of `ATen/core`. Crucially, the new pch doesn't depend on `native_functions.yaml` so won't cause as much unnecessary rebuilding. Pull Request resolved: https://github.com/pytorch/pytorch/pull/67851 Reviewed By: zou3519 Differential Revision: D32290902 Pulled By: dagitses fbshipit-source-id: dfc33330028c99b02ff40963926c1f1260d00d00	2021-12-03 06:51:56 -08:00
Alban Desmaison	00ebbd5ef6	Revert D32010095: [pytorch][PR] Add ability for a mobile::Module to save as flatbuffer Test Plan: revert-hammer Differential Revision: D32010095 (`41d35dc201`) Original commit changeset: d763b0557780 fbshipit-source-id: bf746a0389135c9f5f67f00f449435ce08fb5f6d	2021-12-02 06:41:40 -08:00
Han Qi	41d35dc201	Add ability for a mobile::Module to save as flatbuffer (#67351 ) Summary: Included functions: * save_mobile_module -> saves a mobile::Module to flatbuffer * load_mobile_module_from_file -> loads a flatbuffer into mobile::Module * parse_mobile_module -> parses from bytes or deserialized flatbuffer Module object Pull Request resolved: https://github.com/pytorch/pytorch/pull/67351 Reviewed By: iseeyuan Differential Revision: D32010095 Pulled By: qihqi fbshipit-source-id: d763b0557780f7c2661b6485105b045e41a5e8f1	2021-12-01 23:58:15 -08:00
Yi Zhang	31d36fd35d	fix sccache issue on Windows CPU (#68870 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/68796 ``` 2021-11-24T10:12:40.7634007Z Compile requests 4312 2021-11-24T10:12:40.7634484Z Compile requests executed 4300 2021-11-24T10:12:40.7634823Z Cache hits 4227 2021-11-24T10:12:40.7635122Z Cache hits (C/C++) 4227 2021-11-24T10:12:40.7636139Z Cache misses 62 2021-11-24T10:12:40.7636930Z Cache misses (C/C++) 62 2021-11-24T10:12:40.7637333Z Cache timeouts 0 2021-11-24T10:12:40.7637839Z Cache read errors 0 2021-11-24T10:12:40.7638161Z Forced recaches 0 2021-11-24T10:12:40.7638489Z Cache write errors 0 2021-11-24T10:12:40.7638828Z Compilation failures 1 2021-11-24T10:12:40.7639180Z Cache errors 10 2021-11-24T10:12:40.7639490Z Cache errors (C/C++) 10 2021-11-24T10:12:40.7639856Z Non-cacheable compilations 0 2021-11-24T10:12:40.7640244Z Non-cacheable calls 0 2021-11-24T10:12:40.7640601Z Non-compilation calls 12 2021-11-24T10:12:40.7640987Z Unsupported compiler calls 0 2021-11-24T10:12:40.7641426Z Average cache write 0.104 s 2021-11-24T10:12:40.7641763Z Average cache read miss 6.000 s 2021-11-24T10:12:40.7642110Z Average cache read hit 0.046 s 2021-11-24T10:12:40.7642485Z Failed distributed compilations 0 ``` https://github.com/pytorch/pytorch/runs/4310176911?check_suite_focus=true cc seemethere malfet pytorch/pytorch-dev-infra Pull Request resolved: https://github.com/pytorch/pytorch/pull/68870 Reviewed By: ejguan Differential Revision: D32646289 Pulled By: janeyx99 fbshipit-source-id: bf04446439e55a4ccaf9ce7c77812752ca717a7c	2021-11-24 08:04:59 -08:00
Peter Bell	e7e1b76106	Require CMake 3.13 when building with Ninja (#68731 ) Summary: There is a bug in CMake's Ninja generator where files considered inputs to the cmake command couldn't be generated by another build step. The fix was included in CMake 3.13, but 3.10.3 is still sufficient for other cmake generators e.g. makefiles. For reference, the bug is here https://gitlab.kitware.com/cmake/cmake/-/issues/18584 This is necessary for https://github.com/pytorch/pytorch/issues/68246 but I'm isolating the change here to make testing easier. Pull Request resolved: https://github.com/pytorch/pytorch/pull/68731 Reviewed By: jbschlosser Differential Revision: D32604545 Pulled By: malfet fbshipit-source-id: 9bc0bd8641ba415dd63ce21a05c177e2f1dd9866	2021-11-23 09:34:20 -08:00
Jiakai Liu	3dc0754c53	[pytorch][mobile] deprecate the LLVM-based static analyzer (#68180 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68180 Since we've open sourced the tracing-based selective build, we can deprecate the op-dependency-graph-based selective build and the static analyzer tool that produces the dependency graph. ghstack-source-id: 143108377 Test Plan: CIs Reviewed By: seemethere Differential Revision: D32358467 fbshipit-source-id: c61523706b85a49361416da2230ec1b035b8b99c	2021-11-11 16:37:08 -08:00
Nikita Shulga	77beccaedb	Do not build PyTorch with caffe2 by default (#66658 ) Summary: CAFFE2 has been deprecated for a while, but still included in every PyTorch build. We should stop building it by default, although CI should still validate that caffe2 code is buildable. Build even fewer dependencies when compiling mobile builds without Caffe2. Introduce `TEST_CAFFE2` in torch.common.utils. Skip `TestQuantizedEmbeddingOps` and `TestJit.test_old_models_bc` if code is compiled without Caffe2. Should be landed after https://github.com/pytorch/builder/pull/864 Pull Request resolved: https://github.com/pytorch/pytorch/pull/66658 Reviewed By: driazati, seemethere, janeyx99 Differential Revision: D31669156 Pulled By: malfet fbshipit-source-id: 1cc45e2d402daf913a4685eb9f841cc3863e458d	2021-10-21 20:32:47 -07:00
Chen Lai	76efbccc3b	[PyTorch Edge][tracing-based] Unify tracer between internal and external (#64152 ) Summary: As title, introduce the file `TracerRunner` shared by internal/external tracer and the main function is ``` TracerResult trace_run(const std::string& input_module_path); ``` which basically takes the path to the model file and generates the trace result. The main differences between the external tracer and the internal tracer are: 1. the dependency on `<yaml-cpp/yaml.h>`. 2. the output yaml file from internal tracer includes `model_version` and `model_asset`. These are only needed for internal. Pull Request resolved: https://github.com/pytorch/pytorch/pull/64152 ghstack-source-id: 140692467 Test Plan: ``` ./build/bin/model_tracer --model_input_path "/Users/chenlai/Documents/pytorch/tracing/deeplabv3_scripted_with_bundled_input.ptl" --build_yaml_path "/Users/chenlai/Documents/pytorch/tracing/tmp.yaml" ``` ``` ./fbcode/caffe2/fb/model_tracer/run_model_with_bundled_inputs.sh ~/local/notebooks/prod_models/deeplabv3_scripted_with_bundled_input.ptl ``` have the same operator output selected_operators.yaml (P460296279) selected_mobile_ops.h (P460296258) Reviewed By: dhruvbird Differential Revision: D30632224 fbshipit-source-id: eb0321dbc0f1fcf6d2e05384695eebb59ac04f8c	2021-10-15 02:19:45 -07:00
Michael Suo	3ac2c74896	Revert D31082208: Use shared CUPTI by default Test Plan: revert-hammer Differential Revision: D31082208 (`8b0eae5aa8`) Original commit changeset: 14f66af92084 fbshipit-source-id: 0faff00832b7f79d476fd1f9f505142a548a76db	2021-10-12 14:37:54 -07:00
Edward Yang	8b0eae5aa8	Use shared CUPTI by default (#65401 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65401 Per https://github.com/pytorch/pytorch/issues/57744 statically linked CUPTI causes exception handling to break on certain compiler configurations, likely because CUPTI comes with incompatible libstdc++ symbols. Rather than pray that something reasonable happens, use the safer configuration (dynamic linking) by default and give a warning if the user inverts the setting. Signed-off-by: Edward Z. Yang <ezyang@fb.com> Test Plan: Imported from OSS Reviewed By: gdankel Differential Revision: D31082208 Pulled By: ezyang fbshipit-source-id: 14f66af920847e158436b5801c43f3124b109b34	2021-10-12 11:01:40 -07:00
Nikita Shulga	c373387709	Update CMake and use native CUDA language support (#62445 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62445 PyTorch currently uses the old style of compiling CUDA in CMake which is just a bunch of scripts in `FindCUDA.cmake`. Newer versions support CUDA natively as a language just like C++ or C. Test Plan: Imported from OSS Reviewed By: ejguan Differential Revision: D31503350 fbshipit-source-id: 2ee817edc9698531ae1b87eda3ad271ee459fd55	2021-10-11 09:05:48 -07:00
Chen Lai	3fe5895a00	Back out "Revert D30599136: [Pytorch Edge][tracing-based] build tracer in OSS" (#66267 ) Summary: Previously https://github.com/pytorch/pytorch/pull/64087 broke the test `binary_macos_wheel_3_7_cpu_build`, because wheel build is not happy with `model_tracer`. Considering it's prototype and there is no need to ship model_tracer via wheel at the moment, using the option `TRACING_BASED` for building tracer. When tracing-based is mature enough, we can ship the tracer binary via wheel eventually. Pull Request resolved: https://github.com/pytorch/pytorch/pull/66267 Original commit changeset: 8ac3d75a52d0 ghstack-source-id: 140122106 Test Plan: binary_macos_wheel_3_7_cpu_build passes {F668643831} Reviewed By: dhruvbird Differential Revision: D31478593 fbshipit-source-id: 726cab1b31c4596f6268b7824eecb20e2e59d161	2021-10-08 20:12:12 -07:00
Nikita Shulga	4c4525fa5c	Compile without -Wno-unused-variable (take 2) (#66041 ) Summary: Delete `-Wno-unused-variable` from top level `CMakeLists.txt` Still suppress those warnings for tests and `torch_python` Delete number of unused variables from caffe2 code Use `(void)var;` to suppress unused variable in range loops Use `C10_UNUSED` for global constructors and use `constexpr` instead of `static` for global constants Do not delete `caffe2::OperatorBase::Output` calls as they have side effects Pull Request resolved: https://github.com/pytorch/pytorch/pull/66041 Reviewed By: ngimel Differential Revision: D31360142 Pulled By: malfet fbshipit-source-id: 6fdfb9f91efdc49ca984a2f2a17ee377d28210c8	2021-10-04 20:39:39 -07:00
Nikita Shulga	e4ee5ca698	Revert D31326599: [pytorch][PR] Compile without -Wno-unused-variable Test Plan: revert-hammer Differential Revision: D31326599 (`a6280ab653`) Original commit changeset: 924155f1257a fbshipit-source-id: b8ee5bc0298637443232f5ee9ec79e51ed256faf	2021-10-01 20:40:47 -07:00
Nikita Shulga	5ef350d7cc	Revert D31359010: [pytorch][PR] Fix cang-tidy regressions caused by #65954 Test Plan: revert-hammer Differential Revision: D31359010 (`c269f471f4`) Original commit changeset: dce4b91a9891 fbshipit-source-id: 085417432b6748d3672b9b7141460f47d1c17a7f	2021-10-01 20:35:35 -07:00
Nikita Shulga	c269f471f4	Fix cang-tidy regressions caused by #65954 (#66040 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66040 Reviewed By: ZolotukhinM Differential Revision: D31359010 Pulled By: malfet fbshipit-source-id: dce4b91a98913c8d8c2d8f9ebc49654265239158	2021-10-01 19:50:53 -07:00
Nikita Shulga	a6280ab653	Compile without -Wno-unused-variable (#65954 ) Summary: Delete `-Wno-unused-variable` from top level `CMakeLists.txt` Still suppress those warnings for tests and `torch_python` Delete number of unused variables from caffe2 code Use `(void)var;` to suppress unused variable in range loops Use `C10_UNUSED` for global constructors and use `constexpr` instead of `static` for global constants Pull Request resolved: https://github.com/pytorch/pytorch/pull/65954 Reviewed By: ngimel Differential Revision: D31326599 Pulled By: malfet fbshipit-source-id: 924155f1257a2ba1896c50512f615e45ca1f61f3	2021-10-01 17:40:47 -07:00
Dhruv Matani	a84feeeade	[PyTorch Edge] Conditionally trim dispatch key set to save heap memory at runtime (#65732 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65732 For certain on-device uses, runtime memory comes at a premium. On-device deployments won't use all the available dispatch keys, so it makes sense to keep only the on-device specific ones around for such uses to reduce the runtime heap memory allocated. This change keeps just 10 dispatch keys (the ones that are used on-device), guarded under the `C10_MOBILE_TRIM_DISPATCH_KEYS` macro. It tries to keep the other code-paths unaffected and uses `constexpr` for use in the `array` declaration, and simple inline functions to ensure that the compiler is able to optimize these for server builds. Test Plan: Build and check mobile models end to end. ``` buck build -c "pt.enable_milan_dispatch_keys_trimming"=1 //xplat/caffe2/fb/lite_predictor:lite_predictor ``` Reviewed By: ezyang Differential Revision: D31185407 fbshipit-source-id: e954765606373dea6ee9466a851dca7684167b0b	2021-09-29 12:20:33 -07:00
jiej	127c9402d0	Revert "Revert D30752939: [pytorch][PR] nvfuser update" (#65137 ) Summary: This reverts commit `03389dc851`. Attempt again for PR: https://github.com/pytorch/pytorch/issues/63745 Fixes the windows build failure. Pull Request resolved: https://github.com/pytorch/pytorch/pull/65137 Reviewed By: seemethere, dzhulgakov, heitorschueroff Differential Revision: D30994556 Pulled By: malfet fbshipit-source-id: f1925b6c5cc1a1a441a96499667c91e8dfc1b53d	2021-09-22 04:54:51 -07:00
Tao Xu	18fa58c4e9	[CoreML][OSS] Integrate with CMake (#64523 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64523 - Build Pytorch with CoreML delegate - ` USE_PYTORCH_METAL=ON python setup.py install --cmake` - Build iOS static libs - `IOS_PLATFORM=SIMULATOR USE_COREML_DELEGATE=1 ./scripts/build_ios.sh` ghstack-source-id: 138324216 Test Plan: - Test the Helloword example {F657778559} Reviewed By: iseeyuan Differential Revision: D30594041 fbshipit-source-id: 8cece0b2d4b3ef82d3ef4da8c1054919148beb16	2021-09-17 10:32:00 -07:00
Nikita Shulga	67570a60ba	Disable ParallelTBB (#65092 ) Summary: ParallelTBB's `at::get_thread_num` is not compatible with the general model used by OpenMP and ParallelNative (where it is a contiguous thread index within a parallel loop), see https://github.com/pytorch/pytorch/issues/64571#issuecomment-914691883 More examples of similar regressions: https://github.com/pytorch/pytorch/runs/3612142217 Pull Request resolved: https://github.com/pytorch/pytorch/pull/65092 Reviewed By: zhouzhuojie Differential Revision: D30995936 Pulled By: malfet fbshipit-source-id: db145b6a850d794f2c954f59f30249b291473e36	2021-09-16 12:38:45 -07:00
Eli Uriegas	03389dc851	Revert D30752939: [pytorch][PR] nvfuser update Test Plan: revert-hammer Differential Revision: D30752939 (`cfaecaf40b`) Original commit changeset: ce122e80f01b fbshipit-source-id: 57685df8f9946032a06eff1de8a3d1498500d2d2	2021-09-15 17:38:47 -07:00
jiej	cfaecaf40b	nvfuser update (#63745 ) Summary: Syncing nvfuser code base from devel branch. A few of our developments since the last sync: - Extends support to normalization and reduction kernels. - Multiple kernel launch for single `CudaFusionGroup`. Hierarchical caching system has been updated to cache graph segmentation. - profile_ivalue is enabled to convert dynamic scalars into compile-time constants, which are required by the codegen (e.g. reduction axes). To keep this PR simple and relatively review-free, we stripped most external changes and submitted them as separate PRs, so this gigantic PR is easier to handle. Internal updates are files located in: 1. updates in nvfuser codegen `torch/csrc/jit/codegen/cuda` 2. added nvfuser specific benchmarks `benchmarks/cpp/nvfuser` 3. nvfuser jit cpp tests `test/cpp/jit/test_gpu.cpp` `test/cpp/jit/test_gpu_shift.cpp` `test/cpp/jit/test_gpu_validator.h` Updates affecting integration: 1. profile_ivalue enabled for nvfuser. Related changes are in `torch/csrc/jit/runtime/`, 2. exposed a few more symbols in `aten/src/ATen/core/` used by codegen Pull Request resolved: https://github.com/pytorch/pytorch/pull/63745 Reviewed By: saketh-are Differential Revision: D30752939 Pulled By: malfet fbshipit-source-id: ce122e80f01bcd3865f5bd3c4dfde660665fd84c	2021-09-15 14:42:55 -07:00
Nick Kreeger	882b67dff4	Drop incremental linking on Windows with REL_WITH_DEB_INFO=1. (#64892 ) Summary: The library will no longer link properly on VS 2019 (14.29.30133). To ensure that engineers building on Windows can use and debug with this build type, incremental linking needs to be turned off for this build flag. Verified that this build type successfully builds, links, and provides debuggable Python modules on Windows. Pull Request resolved: https://github.com/pytorch/pytorch/pull/64892 Reviewed By: jbschlosser Differential Revision: D30902565 Pulled By: malfet fbshipit-source-id: e5286a4c6f45c7cbe4cdc1b98560129bd386970b	2021-09-14 09:44:18 -07:00
Hanton Yang	22d38bd10d	[OSS] Enable Metal in PyTorch MacOS nightly builds (#63718 ) Summary: Build on https://github.com/pytorch/pytorch/pull/63825 Pull Request resolved: https://github.com/pytorch/pytorch/pull/63718 Test Plan: 1.Add `ci/binaries` label to PR, so the CI will build those nightly builds 2.Make sure the following CI jobs build with `USE_PYTORCH_METAL_EXPORT` option is `ON`: ``` ci/circleci: binary_macos_arm64_conda_3_8_cpu_nightly_build ci/circleci: binary_macos_arm64_conda_3_9_cpu_nightly_build ci/circleci: binary_macos_arm64_wheel_3_8_cpu_nightly_build ci/circleci: binary_macos_arm64_wheel_3_9_cpu_nightly_build ci/circleci: binary_macos_conda_3_6_cpu_nightly_build ci/circleci: binary_macos_conda_3_7_cpu_nightly_build ci/circleci: binary_macos_conda_3_8_cpu_nightly_build ci/circleci: binary_macos_conda_3_9_cpu_nightly_build ci/circleci: binary_macos_libtorch_3_7_cpu_nightly_build ci/circleci: binary_macos_wheel_3_6_cpu_nightly_build ci/circleci: binary_macos_wheel_3_7_cpu_nightly_build ci/circleci: binary_macos_wheel_3_8_cpu_nightly_build ci/circleci: binary_macos_wheel_3_9_cpu_nightly_build ``` 3.Test `conda` and `wheel` builds locally on [HelloWorld-Metal](https://github.com/pytorch/ios-demo-app/tree/master/HelloWorld-Metal) demo with [(Prototype) Use iOS GPU in PyTorch](https://pytorch.org/tutorials/prototype/ios_gpu_workflow.html) (1) conda ``` conda install https://15667941-65600975-gh.circle-artifacts.com/0/Users/distiller/project/final_pkgs/pytorch-1.10.0.dev20210826-py3.8_0.tar.bz2 ``` (2) wheel ``` pip3 install https://15598647-65600975-gh.circle-artifacts.com/0/Users/distiller/project/final_pkgs/torch-1.10.0.dev20210824-cp38-none-macosx_10_9_x86_64.whl ``` Reviewed By: xta0 Differential Revision: D30593167 Pulled By: hanton fbshipit-source-id: 471da204e94b29c11301c857c50501307a5f0785	2021-08-27 09:25:05 -07:00
Nikita Shulga	5ab356ffe6	Update CMake minimum version to 3.10 (#63660 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63660 Test Plan: Imported from OSS Reviewed By: janeyx99, mruberry Differential Revision: D30543878 fbshipit-source-id: a7d938807653f39727f2cc7d7ca167200567b6a0	2021-08-25 09:25:43 -07:00
driazati	bd8608cd5c	Use CMake for breakpad (#63186 ) Summary: We currently build breakpad from [this fork](https://github.com/driazati/breakpad) to include extra logic to restore signal handlers that were previously present. With some [new additions](https://github.com/google/breakpad/compare/main...driazati:main) this fork now includes a CMake based build, so we can add breakpad as a proper dependency rather than rely on including it in Docker images as a system library which is error prone (we have a bunch of images) and hard to extend to MacOS / Windows. This also includes some changes to the crash handling code to support MacOS / Windows in a similar way to Linux. ```python import torch # On Windows this writes crashes to C:\Users\<user>\AppData\pytorch_crashes # On MacOS/Linux this writes crashes to /tmp/pytorch_crashes torch.utils._crash_handler.enable_minidumps() # Easy way to cause a segfault and trigger the handler torch.bincount(input=torch.tensor([9223372036854775807])) ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/63186 Reviewed By: malfet, seemethere Differential Revision: D30318404 Pulled By: driazati fbshipit-source-id: 0d7daf3701cfaba5451cc529a0730272ab1eb1dc	2021-08-19 10:42:01 -07:00
Peter Bell	f70b9ee5de	Advertise USE_PRECOMPILED_HEADERS in CONTRIBUTING.md (#62827 ) Summary: This option was added in https://github.com/pytorch/pytorch/issues/61940 and fits with this section's theme of improving build times. I've also changed it to a `cmake_dependent_option` instead of `FATAL_ERROR`ing for older CMake versions. Pull Request resolved: https://github.com/pytorch/pytorch/pull/62827 Reviewed By: astaff Differential Revision: D30342102 Pulled By: malfet fbshipit-source-id: 3095b44b7085aee8a884ec95cba9f8998d4442e7	2021-08-17 10:14:40 -07:00
Kimish Patel	38c185189c	[Pytorch Edge] Enable kineto profiler on mobile via EdgeKinetoProfiler (#62419 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62419 This diff adds support for the CPU-only kineto profiler on mobile, thus enabling chrome trace generation on mobile. This brings the cpp API for mobile profiling on par with TorchScript. This is done via: 1. Utilizing debug handle annotations in KinetoEvent. 2. Adding post processing capability, via callbacks, to KinetoThreadLocalState 3. Creating a new RAII style profiler, KinetoEdgeCPUProfiler, which can be used in the surrounding scope of model execution. This will write the chrome trace to the location specified in the profiler constructor. Test Plan: MobileProfiler.ModuleHierarchy Imported from OSS Reviewed By: raziel Differential Revision: D29993660 fbshipit-source-id: 0b44f52f9e9c5f5aff81ebbd9273c254c3c03299	2021-08-13 21:40:19 -07:00
peterjc123	d16587f84d	Enable rebuilds for Ninja on Windows (#62948 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/59859. Pull Request resolved: https://github.com/pytorch/pytorch/pull/62948 Reviewed By: seemethere, tktrungna Differential Revision: D30192246 Pulled By: janeyx99 fbshipit-source-id: af25cc4bf0db67a1304d9971cfa0ff6831bb3b48	2021-08-09 16:15:45 -07:00
Peter Bell	b7ac286d0e	CMake: Add optional precompiled header support (#61940 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61940 This adds a `USE_PRECOMPILED_HEADERS` option to the CMake build which precompiles `ATen.h` and also `CUDAContext.h` for the cuda library. After making a change in `native_functions.yaml`, this speeds up compilation time by around 15% on my machine. Test Plan: Imported from OSS Reviewed By: heitorschueroff Differential Revision: D29988775 Pulled By: malfet fbshipit-source-id: a23c468c958a8b74ebaef052a5b2e5fa3836c64b	2021-08-03 09:13:47 -07:00
Can Balioglu	7565039ee9	Support system-provided Intel TBB (#61934 ) Summary: This PR: (1) enables the use of a system-provided Intel TBB for building PyTorch, (2) removes `tbb::task_scheduler_init` references since it was removed from TBB a while ago, (3) marks the implementation of `_internal_set_num_threads` with a TODO as it requires a revision that fixes its thread allocation logic. Tested with `test/run_test`; no new tests are introduced since there are no behavioral changes (removal of `tbb::task_scheduler_init` has no impact on the runtime behavior). Pull Request resolved: https://github.com/pytorch/pytorch/pull/61934 Reviewed By: malfet Differential Revision: D29805416 Pulled By: cbalioglu fbshipit-source-id: 22042b428b57b8fede9dfcc83878d679a19561dd	2021-08-02 07:39:00 -07:00
Jane Xu	e318058ffe	Ignore LNK4099 for debug binary libtorch builds (#62060 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/61979 Pull Request resolved: https://github.com/pytorch/pytorch/pull/62060 Test Plan: This CI shouldn't break and https://github.com/pytorch/pytorch/pull/62061 Reviewed By: driazati Differential Revision: D29877487 Pulled By: janeyx99 fbshipit-source-id: 497f84caab3f9ae609644fd397ad87a6dc8a2a77	2021-07-23 09:31:41 -07:00
Luca Wehrstedt	a1780432fa	Move c10d to libtorch(_cuda) (#59563 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59563 ghstack-source-id: 131331264 Test Plan: CI Reviewed By: malfet Differential Revision: D28932239 fbshipit-source-id: 5df6cdfa5253b15cbbc97039fe672d6d97321e34	2021-06-15 02:01:31 -07:00
Nikita Shulga	1ea5c19c19	Add USE_WHOLE_CUDNN option (#59744 ) Summary: It is only enabled if USE_STATIC_CUDNN is enabled Next step after https://github.com/pytorch/pytorch/pull/59721 towards resolving fast kernels stripping reported in https://github.com/pytorch/pytorch/issues/50153 Pull Request resolved: https://github.com/pytorch/pytorch/pull/59744 Reviewed By: seemethere, ngimel Differential Revision: D29007314 Pulled By: malfet fbshipit-source-id: 7091e299c0c6cc2a8aa82fbf49312cecf3bb861a	2021-06-09 21:12:42 -07:00
Nikita Shulga	7179e7ea7b	[CMake] Prefer third_party/pybind11 by default (#58951 ) Summary: To align build behaviour with other third_party/ libraries, introduce the `USE_SYSTEM_PYBIND11` build option, which is set to OFF by default, meaning PyTorch will be built with the bundled pybind11 even if another version is already installed locally. Fixes https://github.com/pytorch/pytorch/issues/58750 Pull Request resolved: https://github.com/pytorch/pytorch/pull/58951 Reviewed By: driazati Differential Revision: D28690411 Pulled By: malfet fbshipit-source-id: e56b5a8f2a23ee1834b2a6d3807f287149decf8c	2021-05-25 15:10:17 -07:00
Nathan John Sircombe	bf00d26deb	Enables builds with Compute Library backend for oneDNN (#55913 ) Summary: Since v1.7, oneDNN (MKL-DNN) has supported the use of Compute Library for the Arm architecture to provide optimised convolution primitives on AArch64. This change enables the use of Compute Library in the PyTorch build. Following the approach used to enable the use of CBLAS in MKLDNN, it is enabled by setting the env vars USE_MKLDNN and USE_MKLDNN_ACL. The location of the Compute Library build must be set using `ACL_ROOT_DIR`. This is an extension of the work in https://github.com/pytorch/pytorch/pull/50400 which added support for the oneDNN/MKL-DNN backend on AArch64. _Note: this assumes that Compute Library has been built and installed at ACL_ROOT_DIR. Compute Library can be downloaded here: `https://github.com/ARM-software/ComputeLibrary`_ Fixes #{issue number} Pull Request resolved: https://github.com/pytorch/pytorch/pull/55913 Reviewed By: ailzhang Differential Revision: D28559516 Pulled By: malfet fbshipit-source-id: 29d24996097d0a54efc9ab754fb3f0bded290005	2021-05-20 07:43:56 -07:00
Xiang Gao	6c70cbedb6	step 0 of cuDNN v8 convolution API integration (#51390 ) Summary: This PR is step 0 of adding PyTorch convolution bindings using the cuDNN frontend. The cuDNN frontend is the recommended way of using cuDNN v8 API. It is supposed to have faster release cycles, so that, for example, if people find a specific kernel has a bug, they can report it, and that kernel will be blocked in the cuDNN frontend and frameworks could just update that submodule without the need for waiting for a whole cuDNN release. The work is not complete, and this PR is only step 0. What this PR does: - Add cudnn-frontend as a submodule. - Modify cmake to build that submodule. - Add bindings for convolution forward in `Conv_v8.cpp`, which is disabled by a macro by default. - Tested manually by enabling the macro and run `test_nn.py`. All tests pass except those mentioned below. What this PR doesn't: - Only convolution forward, no backward. The backward will use v7 API. - No 64bit-indexing support for some configuration. This is a known issue of cuDNN, and will be fixed in a later cuDNN version. PyTorch will not implement any workaround for issue, but instead, v8 API should be disabled on problematic cuDNN versions. - No test beyond PyTorch's unit tests. - Not tested for correctness on real models. - Not benchmarked for performance. - Benchmark cache is not thread-safe. (This is marked as `FIXME` in the code, and will be fixed in a follow-up PR) - cuDNN benchmark is not supported. - There are failing tests, which will be resolved later: ``` FAILED test/test_nn.py::TestNNDeviceTypeCUDA::test_conv_cudnn_nhwc_cuda_float16 - AssertionError: False is not true : Tensors failed to compare as equal!With rtol=0.001 and atol=1e-05, found 32 element(s) (out of 32) whose difference(s) exceeded the margin of error (in... 
FAILED test/test_nn.py::TestNNDeviceTypeCUDA::test_conv_cudnn_nhwc_cuda_float32 - AssertionError: False is not true : Tensors failed to compare as equal!With rtol=1.3e-06 and atol=1e-05, found 32 element(s) (out of 32) whose difference(s) exceeded the margin of error (... FAILED test/test_nn.py::TestNNDeviceTypeCUDA::test_conv_large_cuda - RuntimeError: CUDNN_BACKEND_OPERATION: cudnnFinalize Failed cudnn_status: 9 FAILED test/test_nn.py::TestNN::test_Conv2d_depthwise_naive_groups_cuda - AssertionError: False is not true : Tensors failed to compare as equal!With rtol=0 and atol=1e-05, found 64 element(s) (out of 64) whose difference(s) exceeded the margin of error (including 0 an... FAILED test/test_nn.py::TestNN::test_Conv2d_deterministic_cudnn - RuntimeError: not supported yet FAILED test/test_nn.py::TestNN::test_ConvTranspose2d_groups_cuda_fp32 - RuntimeError: cuDNN error: CUDNN_STATUS_BAD_PARAM FAILED test/test_nn.py::TestNN::test_ConvTranspose2d_groups_cuda_tf32 - RuntimeError: cuDNN error: CUDNN_STATUS_BAD_PARAM ``` Although this is not a complete implementation of cuDNN v8 API binding, I still want to merge this first. This would allow me to do small and incremental work, for the ease of development and review. Pull Request resolved: https://github.com/pytorch/pytorch/pull/51390 Reviewed By: malfet Differential Revision: D28513167 Pulled By: ngimel fbshipit-source-id: 9cc20c9dec5bbbcb1f94ac9e0f59b10c34f62740	2021-05-19 12:54:09 -07:00
Pavel Belevich	96e1a83fb2	Add Gloo TCP_TLS transport (#56442 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/56442 Test Plan: Imported from OSS Reviewed By: malfet Differential Revision: D27896285 Pulled By: pbelevich fbshipit-source-id: 589af59ca4c7c9bab2329f079382c09b71cfcf9e	2021-05-07 13:36:11 -07:00
Kimish Patel	f4a921600a	[PyTorch, Mobile] Serialization format change for source range (#54284 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/54284 In order to bring mobile deployment, via lite interpreter, to feature parity with JIT with respect to model level debug information, we must make model level debug information available to the mobile runtime. At the moment, model level debug information is stored in SourceRange, which associates nodes of the graph with where they come from in the original python source code. This information is serialized as part of debug_pkl and deserialized when JIT loads the model and reads the model code. On lite interpreter, we do not have access to all the functionality of JIT and hence we cannot load the model in the same way as JIT, by reading code, constructing the module hierarchy and the graph corresponding to module methods etc. Instead, in lite interpreter, only bytecode corresponding to the compiled graph, Code, is saved. Thus in order to annotate OPs in the bytecode with equivalent SourceRange information we do the following: 1. During model serialization, we create a unique tag for each source range of the model. 2. Create a map of <SourceRange, tag> 3. During debug_pkl serialization we save the tag along with SourceRange, on top of byte offset. 4. During bytecode generation, the methods of the top module are lowered. During this process methods are inlined. In the inlined graph, when the node of a graph is lowered to bytecode, we query the node's source range and look it up against the map. 5. The resulting source range tag is serialized in module_debug_info. 6. During model deserialization, we read all the debug_pkl records in the archive and create a map of <tag, SourceRange> 7. This map can be used to find source code information. During mobile runtime: 1. We read all the debug_pkl records and create <tag=debug_handle, SourceRange> map. 1.1 This map, MobileDebugInfo, is a member of mobile Module. 2. 
Interpreter catches appropriate exceptions and sets the thread local debug handle and rethrows the exception. 3. In Function's run method we catch the exception and query the current debug handle where the exception happened. 4. Query MobileDebugInfo with the debug handle to retrieve the source range and augment the error with source range info. This information is still incomplete as it does not contain the entire callstack. In the following diffs we will serialize InlinedCallStack directly. Note that compilation is gated by the SYMBOLICATE_MOBILE_DEBUG_HANDLE macro, so that mobile builds can avoid building MobileDebugInfo, source range and source range pickler/unpickler. Later we will add a path where, if building without debug support, the stack trace will contain only debug handles. They can be symbolicated later. Test Plan: Ported a bunch of source range tests from test_jit.py. Added one more test in test_lite_interpreter.py Imported from OSS Reviewed By: raziel Differential Revision: D27174722 fbshipit-source-id: a7b7c6088ce16dec37e823c7fefa4f0b61047e12	2021-05-04 09:19:27 -07:00
davidriazati@fb.com	264d87985a	Use ld.gold by default to link in CI (#57061 ) Summary: This adds an option to CMake to use `ld.gold` to link rather than `ld` (which symlinks to `ld.bfd` on Ubuntu by default). This shouldn't change any functionality, only a mild improvement on link times during builds (shaves off 1 minute) on CI. Verify by searching for `ld.gold is available` in [the logs](https://circleci.com/api/v1.1/project/github/pytorch/pytorch/13046834/output/105/0?file=true&allocation-id=608c434338107e5b6cf938a1-0-build%2F7BDA2FF1) Pull Request resolved: https://github.com/pytorch/pytorch/pull/57061 Pulled By: driazati Reviewed By: janeyx99 Differential Revision: D28123522 fbshipit-source-id: 5a60798ca4785427fd92bbf3b3aa5f63730e9b20	2021-05-03 10:05:36 -07:00
davidriazati@fb.com	c44cbc63cc	Ignore more compiler warnings, unify WERROR options (#56630 ) Summary: This adds some more compiler warnings ignores for everything that happens on a standard CPU build (CUDA builds still have a bunch of warnings so we can't turn on `-Werror` everywhere yet). Pull Request resolved: https://github.com/pytorch/pytorch/pull/56630 Pulled By: driazati Reviewed By: malfet Differential Revision: D28005063 fbshipit-source-id: 541ed415eb0470ddf7e08c22c5eb6da9db26e9a0	2021-04-29 21:20:29 -07:00
davidriazati@fb.com	21be40b390	Add torch_cpu specific flag for debug info (#57190 ) Summary: Right now we are using `REL_WITH_DEB_INFO=1` on Linux CI binary builds. This is causing intermittent failures on CUDA builds since the debug information increases the load on the linker. This adds a workaround: a flag to enable debug info only for the target we actually want it for (`libtorch_cpu.so`; all the other binaries are stripped of their debug info after building). Example failures (from [the hud](https://ezyang.github.io/pytorch-ci-hud/build2/pytorch-nightly?mode=nightly)): * https://app.circleci.com/pipelines/github/pytorch/pytorch/311785/workflows/df640957-54b0-4592-aeef-6d5baee503ae/jobs/12932229 * https://app.circleci.com/pipelines/github/pytorch/pytorch/311784/workflows/e3b487d6-fb46-4a5d-a2d5-22eec328b678/jobs/12932228 * https://app.circleci.com/pipelines/github/pytorch/pytorch/311784/workflows/e3b487d6-fb46-4a5d-a2d5-22eec328b678/jobs/12932227 Pull Request resolved: https://github.com/pytorch/pytorch/pull/57190 Pulled By: driazati Reviewed By: janeyx99 Differential Revision: D28085550 fbshipit-source-id: 0fc5b3e769b10c0dd3811717f968d0c933667361	2021-04-29 12:06:15 -07:00
Will Constable	21fd5f4b79	Document current deploy cpython build #56490 (#56600 ) Summary: Call out the issues with cpython deps and suggest a workaround. Fixes https://github.com/pytorch/pytorch/issues/56490 Pull Request resolved: https://github.com/pytorch/pytorch/pull/56600 Reviewed By: albanD Differential Revision: D27920647 Pulled By: wconstab fbshipit-source-id: 61a53a176eaf42a6166d649d3cb0fdfa2489e9d2	2021-04-22 09:02:29 -07:00
Eddie Yan	81f181567a	Add `USE_MAGMA` build flag (#55994 ) Summary: Many model pipelines/workflows don't use MAGMA even though it is included in the build by default. Leaving MAGMA kernels out of the build can save 60+MB of GPU memory when loading `libtorch_cuda.so` (tested on V100, current upstream master). A current sharp corner of this flag is that toggling it when rebuilding requires `torch/include/THC/THCGeneral.h` to be manually deleted by the user, as even running `make clean` or `setup.py` with `--cmake` does not properly regenerate it with the appropriate substitution for `#cmakedefine USE_MAGMA`. Is there a way to force the regeneration of the header during a rebuild? CC malfet ptrblck Pull Request resolved: https://github.com/pytorch/pytorch/pull/55994 Reviewed By: mruberry Differential Revision: D27766287 Pulled By: malfet fbshipit-source-id: 93deca57befa0febb9c5b7875ecf0015c547d421	2021-04-15 00:43:12 -07:00
Ailing Zhang	1688a5d31a	Cleanup since FEATURE_TORCH_MOBILE is always true. (#55835 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/55835 Now that https://github.com/pytorch/pytorch/pull/55238 has been landed for a week with no complaints, it seems safe to say FEATURE_TORCH_MOBILE is always true and we can do some cleanup. Test Plan: Imported from OSS Reviewed By: ezyang, walterddr Differential Revision: D27721284 Pulled By: ailzhang fbshipit-source-id: 4896bc5f736373d0922cfbe8eed0d16df62f0fa1	2021-04-14 09:08:18 -07:00
Ivan Kobzarev	85fcadc059	[lite-interpreter] speed_benchmark_torch support BUILD_LITE_INTERPRETER (#55402 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/55402 Test Plan: Imported from OSS Reviewed By: cccclai Differential Revision: D27599824 Pulled By: IvanKobzarev fbshipit-source-id: 3adbb8a16a785d3610404d71ef2d895904b1a8ef	2021-04-07 11:39:32 -07:00
SpaceIm	aeedd5c7df	cmake: fix ONNX_NAMESPACE if USE_SYSTEM_ONNX (#54973 ) Summary: `ONNX_NAMESPACE` is empty by default if `USE_SYSTEM_ONNX ON`, while it should be equal to `onnx`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/54973 Reviewed By: glaringlee Differential Revision: D27466020 Pulled By: walterddr fbshipit-source-id: 47cde3604acbda3f45bec5893036b39fd1eb58c9	2021-03-31 08:29:00 -07:00
Nikita Shulga	68bdeef2ce	[CMake] Simplify CPU architecture detection logic (#54637 ) Summary: `CMAKE_SYSTEM_PROCESSOR` set to x86_64 (on Linux) or AMD64 (on Windows) indicates the build is running on x86_64 architecture, while `CMAKE_SYSTEM_PROCESSOR` set to aarch64 or arm64 means we are running on ARMv8+ architecture. Delete the `i[3-6]86` pattern as 32-bit builds are no longer supported Pull Request resolved: https://github.com/pytorch/pytorch/pull/54637 Reviewed By: ezyang Differential Revision: D27311897 Pulled By: malfet fbshipit-source-id: 26989fc9b54a96d70c768ab03ca4528506ee7808	2021-03-25 12:32:18 -07:00
Leonard Lausen	90bbe0b38b	cmake: auto-detect ccache to speed up developer builds (#49389 ) Summary: https://ccache.dev/ is a compiler cache that speeds up subsequent builds. Auto-detecting ccache ensures that it is used on systems where it is available, greatly improving build times for developers. There is no risk in enabling ccache in practice. Please refer to https://ccache.dev/ for a short summary / motivation Pull Request resolved: https://github.com/pytorch/pytorch/pull/49389 Reviewed By: ejguan Differential Revision: D27169957 Pulled By: malfet fbshipit-source-id: 673b60bbceb0d323901c8a992a75792c6da9b805	2021-03-18 14:20:53 -07:00
Ashkan Aliabadi	e5ecd1ddf8	[Vulkan]Fix build warnings-treated-as-error on Linux. (#52781 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/52781 Test Plan: Imported from OSS Reviewed By: SS-JIA Differential Revision: D26669311 Pulled By: AshkanAliabadi fbshipit-source-id: 78b08d0b264d4d5cf8af964c589b9b7d0ddc7311	2021-03-03 13:48:43 -08:00
Chen Lai	14f7bf0629	[PyTorch] update CMake to build libtorch lite (#51419 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/51419 ## Summary 1. Add an option `BUILD_LITE_INTERPRETER` in `caffe2/CMakeLists.txt` and set `OFF` as default. 2. Update 'build_android.sh' with an argument to switch `BUILD_LITE_INTERPRETER`, 'OFF' as default. 3. Add a mini demo app `lite_interpreter_demo` linked with `libtorch` library, which can be used for quick test. ## Test Plan Built lite interpreter version of libtorch and tested with Image Segmentation demo app ([android version](https://github.com/pytorch/android-demo-app/tree/master/ImageSegmentation)/[ios version](https://github.com/pytorch/ios-demo-app/tree/master/ImageSegmentation)) ### Android 1. Prepare model: Prepare the lite interpreter version of the model by running the script below to generate the scripted models `deeplabv3_scripted.pt` and `deeplabv3_scripted.ptl` ``` import torch model = torch.hub.load('pytorch/vision:v0.7.0', 'deeplabv3_resnet50', pretrained=True) model.eval() scripted_module = torch.jit.script(model) # Export full jit version model (not compatible with lite interpreter), leave it here for comparison scripted_module.save("deeplabv3_scripted.pt") # Export lite interpreter version model (compatible with lite interpreter) scripted_module._save_for_lite_interpreter("deeplabv3_scripted.ptl") ``` 2. Build libtorch lite for android: Build libtorch for android for all 4 android abis (armeabi-v7a, arm64-v8a, x86, x86_64) `BUILD_LITE_INTERPRETER=1 ./scripts/build_pytorch_android.sh`. This PR is tested on a Pixel 4 emulator with x86, so use cmd `BUILD_LITE_INTERPRETER=1 ./scripts/build_pytorch_android.sh x86` to specify the abi and save build time. After the build finishes, it will show the library path: ``` ... 
BUILD SUCCESSFUL in 55s 134 actionable tasks: 22 executed, 112 up-to-date + find /Users/chenlai/pytorch/android -type f -name 'aar' + xargs ls -lah -rw-r--r-- 1 chenlai staff 13M Feb 11 11:48 /Users/chenlai/pytorch/android/pytorch_android/build/outputs/aar/pytorch_android-release.aar -rw-r--r-- 1 chenlai staff 36K Feb 9 16:45 /Users/chenlai/pytorch/android/pytorch_android_torchvision/build/outputs/aar/pytorch_android_torchvision-release.aar ``` 3. Use the PyTorch Android libraries built from source in the ImageSegmentation app: Create a folder 'libs' in the path, the path from repository root will be `ImageSegmentation/app/libs`. Copy `pytorch_android-release` to the path `ImageSegmentation/app/libs/pytorch_android-release.aar`. Copy 'pytorch_android_torchvision` (downloaded from [here](https://oss.sonatype.org/#nexus-search;quick~torchvision_android)) to the path `ImageSegmentation/app/libs/pytorch_android_torchvision.aar` Update the `dependencies` part of `ImageSegmentation/app/build.gradle` to ``` dependencies { implementation 'androidx.appcompat:appcompat:1.2.0' implementation 'androidx.constraintlayout:constraintlayout:2.0.2' testImplementation 'junit:junit:4.12' androidTestImplementation 'androidx.test.ext:junit:1.1.2' androidTestImplementation 'androidx.test.espresso:espresso-core:3.3.0' implementation(name:'pytorch_android-release', ext:'aar') implementation(name:'pytorch_android_torchvision', ext:'aar') implementation 'com.android.support:appcompat-v7:28.0.0' implementation 'com.facebook.fbjni:fbjni-java-only:0.0.3' } ``` Update `allprojects` part in `ImageSegmentation/build.gradle` to ``` allprojects { repositories { google() jcenter() flatDir { dirs 'libs' } } } ``` 4. 
Update model loader api: Update `ImageSegmentation/app/src/main/java/org/pytorch/imagesegmentation/MainActivity.java` by 4.1 Add new import: `import org.pytorch.LiteModuleLoader;` 4.2 Replace the way to load pytorch lite model ``` // mModule = Module.load(MainActivity.assetFilePath(getApplicationContext(), "deeplabv3_scripted.pt")); mModule = LiteModuleLoader.load(MainActivity.assetFilePath(getApplicationContext(), "deeplabv3_scripted.ptl")); ``` 5. Test app: Build and run the ImageSegmentation app in Android Studio, ![image](https://user-images.githubusercontent.com/16430979/107696279-9cea5900-6c66-11eb-8286-4d1d68abff61.png) ### iOS 1. Prepare model: Same as Android. 2. Build libtorch lite for ios* `BUILD_PYTORCH_MOBILE=1 IOS_PLATFORM=SIMULATOR BUILD_LITE_INTERPRETER=1 ./scripts/build_ios.sh` 3. Remove Cocoapods from the project: run `pod deintegrate` 4. Link ImageSegmentation demo app with the custom built library: Open your project in XCode, go to your project Target’s Build Phases - Link Binaries With Libraries, click the + sign and add all the library files located in `build_ios/install/lib`. Navigate to the project Build Settings, set the value Header Search Paths to `build_ios/install/include` and Library Search Paths to `build_ios/install/lib`. In the build settings, search for other linker flags. Add a custom linker flag below ``` -all_load ``` Finally, disable bitcode for your target by selecting the Build Settings, searching for Enable Bitcode, and set the value to No. 5. 
Update library and api 5.1 Update `TorchModule.mm` To use the custom-built libraries in the project, replace `#import <LibTorch/LibTorch.h>` (in `TorchModule.mm`) which is needed when using LibTorch via Cocoapods with the code below: ``` //#import <LibTorch/LibTorch.h> #include "ATen/ATen.h" #include "caffe2/core/timer.h" #include "caffe2/utils/string_utils.h" #include "torch/csrc/autograd/grad_mode.h" #include "torch/script.h" #include <torch/csrc/jit/mobile/function.h> #include <torch/csrc/jit/mobile/import.h> #include <torch/csrc/jit/mobile/interpreter.h> #include <torch/csrc/jit/mobile/module.h> #include <torch/csrc/jit/mobile/observer.h> ``` 5.2 Update `ViewController.swift` ``` // if let filePath = Bundle.main.path(forResource: // "deeplabv3_scripted", ofType: "pt"), // let module = TorchModule(fileAtPath: filePath) { // return module // } else { // fatalError("Can't find the model file!") // } if let filePath = Bundle.main.path(forResource: "deeplabv3_scripted", ofType: "ptl"), let module = TorchModule(fileAtPath: filePath) { return module } else { fatalError("Can't find the model file!") } ``` ### Unit test Add `test/cpp/lite_interpreter`, with one unit test `test_cores.cpp` and a light model `sequence.ptl` to test `_load_for_mobile()`, `bc.find_method()` and `bc.forward()` functions. 
### Size: With the change: Android: x86: `pytorch_android-release.aar` (13.8 MB) IOS: `pytorch/build_ios/install/lib` (lib: 66 MB): ``` (base) chenlai@chenlai-mp lib % ls -lh total 135016 -rw-r--r-- 1 chenlai staff 3.3M Feb 15 20:45 libXNNPACK.a -rw-r--r-- 1 chenlai staff 965K Feb 15 20:45 libc10.a -rw-r--r-- 1 chenlai staff 4.6K Feb 15 20:45 libclog.a -rw-r--r-- 1 chenlai staff 42K Feb 15 20:45 libcpuinfo.a -rw-r--r-- 1 chenlai staff 39K Feb 15 20:45 libcpuinfo_internals.a -rw-r--r-- 1 chenlai staff 1.5M Feb 15 20:45 libeigen_blas.a -rw-r--r-- 1 chenlai staff 148K Feb 15 20:45 libfmt.a -rw-r--r-- 1 chenlai staff 44K Feb 15 20:45 libpthreadpool.a -rw-r--r-- 1 chenlai staff 166K Feb 15 20:45 libpytorch_qnnpack.a -rw-r--r-- 1 chenlai staff 384B Feb 15 21:19 libtorch.a -rw-r--r-- 1 chenlai staff 60M Feb 15 20:47 libtorch_cpu.a ``` `pytorch/build_ios/install`: ``` (base) chenlai@chenlai-mp install % du -sh * 14M include 66M lib 2.8M share ``` Master (baseline): Android: x86: `pytorch_android-release.aar` (16.2 MB) IOS: `pytorch/build_ios/install/lib` (lib: 84 MB): ``` (base) chenlai@chenlai-mp lib % ls -lh total 172032 -rw-r--r-- 1 chenlai staff 3.3M Feb 17 22:18 libXNNPACK.a -rw-r--r-- 1 chenlai staff 969K Feb 17 22:18 libc10.a -rw-r--r-- 1 chenlai staff 4.6K Feb 17 22:18 libclog.a -rw-r--r-- 1 chenlai staff 42K Feb 17 22:18 libcpuinfo.a -rw-r--r-- 1 chenlai staff 1.5M Feb 17 22:18 libeigen_blas.a -rw-r--r-- 1 chenlai staff 44K Feb 17 22:18 libpthreadpool.a -rw-r--r-- 1 chenlai staff 166K Feb 17 22:18 libpytorch_qnnpack.a -rw-r--r-- 1 chenlai staff 384B Feb 17 22:19 libtorch.a -rw-r--r-- 1 chenlai staff 78M Feb 17 22:19 libtorch_cpu.a ``` `pytorch/build_ios/install`: ``` (base) chenlai@chenlai-mp install % du -sh * 14M include 84M lib 2.8M share ``` Test Plan: Imported from OSS Reviewed By: iseeyuan Differential Revision: D26518778 Pulled By: cccclai fbshipit-source-id: 4503ffa1f150ecc309ed39fb0549e8bd046a3f9c	2021-02-21 01:43:54 -08:00
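The size figures quoted in the commit above can be turned into relative savings with a few lines of arithmetic. This is only a sketch: the four numbers are copied from the message (Android x86 `.aar` and iOS `install/lib`), while the dictionary keys and the helper name `reduction` are made up for illustration.

```python
# Sizes in MB as (master baseline, with the change), copied from the commit message.
SIZES_MB = {
    "android x86 pytorch_android-release.aar": (16.2, 13.8),
    "ios build_ios/install/lib": (84.0, 66.0),
}


def reduction(baseline, with_change):
    """Return (MB saved, percent saved relative to the baseline)."""
    saved = baseline - with_change
    return saved, 100.0 * saved / baseline


for name, (baseline, with_change) in SIZES_MB.items():
    saved, pct = reduction(baseline, with_change)
    print(f"{name}: {baseline} MB -> {with_change} MB "
          f"(saves {saved:.1f} MB, {pct:.1f}%)")
```

With these inputs the lite interpreter build comes out roughly 15% smaller on Android x86 and roughly 21% smaller for the iOS static libraries.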
Bel H	db33afbf9f	Change cmake to allow building with MLC kick-off build (#51326 ) Summary: - Allows build process to build with MLC enabled if subrepo folder mlc is in path and we can link against ML Compute on macOS BigSur - To build with MLC enabled you will need to clone the mlc repo inside the pytorch repository. - We need both this change and https://github.com/pytorch/pytorch/pull/50634 on pytorch/pytorch to enable the `mlc` device. Pull Request resolved: https://github.com/pytorch/pytorch/pull/51326 Reviewed By: glaringlee Differential Revision: D26533138 Pulled By: malfet fbshipit-source-id: 0baa06b4eb2d62dbfc0f6fc922096cb0db1cc7d1	2021-02-19 13:04:25 -08:00
Jiakai Liu	c9c4b871a5	[pytorch] reintroduce static dispatch (#51957 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/51957 This is a simplified version of #51554. Compared to #51554, this version only supports statically dispatching to a specific backend. The benefit is that it skipped the dispatch key computation logic thus has less framework overhead. The downside is that if input tensors do not match the specified backend it will throw error instead of falling back to regular dispatch. Sample code: ``` Tensor empty(IntArrayRef size, TensorOptions options, c10::optional<MemoryFormat> memory_format) { return at::cpu::empty(size, options, memory_format); } // aten::conj(Tensor(a) self) -> Tensor(a) Tensor conj(const Tensor & self) { return at::math::conj(self); } // aten::conj.out(Tensor self, , Tensor(a!) out) -> Tensor(a!) Tensor & conj_out(Tensor & out, const Tensor & self) { return at::cpu::conj_out(out, self); } // aten::conj.out(Tensor self, , Tensor(a!) out) -> Tensor(a!) Tensor & conj_outf(const Tensor & self, Tensor & out) { return at::cpu::conj_out(out, self); } // aten::_conj(Tensor self) -> Tensor Tensor _conj(const Tensor & self) { return at::defaultbackend::_conj(self); } ``` For ops without the specific backend dispatch, it will throw error: ``` // aten::_use_cudnn_ctc_loss(Tensor log_probs, Tensor targets, int[] input_lengths, int[] target_lengths, int blank) -> bool bool _use_cudnn_ctc_loss(const Tensor & log_probs, const Tensor & targets, IntArrayRef input_lengths, IntArrayRef target_lengths, int64_t blank) { TORCH_CHECK(false, "Static dispatch does not support _use_cudnn_ctc_loss for CPU."); } ``` Differential Revision: D26337857 Test Plan: Imported from OSS Reviewed By: bhosmer Pulled By: ljk53 fbshipit-source-id: a8e95799115c349de3c09f04a26b01d21a679364	2021-02-19 11:41:39 -08:00
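The trade-off described in the static dispatch commit above (no dispatch-key computation, but an error instead of a fallback when the input backend does not match) can be sketched with a toy model. Everything here is hypothetical: `Tensor`, `cpu_neg`, `cuda_neg`, `neg_dynamic` and `neg_static` are illustrative names, not PyTorch APIs, and the real generated code is the C++ shown in the commit message.

```python
class Tensor:
    """Toy tensor that only records which backend owns it (hypothetical)."""

    def __init__(self, backend, values):
        self.backend = backend
        self.values = list(values)


def cpu_neg(t):
    # A backend-specific kernel assumes its input already lives on that backend.
    if t.backend != "CPU":
        raise RuntimeError(f"CPU kernel received a {t.backend} tensor")
    return Tensor("CPU", [-v for v in t.values])


def cuda_neg(t):
    # Stand-in for a second backend so the dynamic path has something to pick.
    return Tensor("CUDA", [-v for v in t.values])


KERNELS = {"CPU": cpu_neg, "CUDA": cuda_neg}


def neg_dynamic(t):
    # Regular dispatch: the kernel is looked up from the input at call time.
    return KERNELS[t.backend](t)


def neg_static(t):
    # Static dispatch: the backend is fixed ahead of time, so there is no
    # lookup and no fallback; a mismatched input surfaces as an error.
    return cpu_neg(t)
```

Calling `neg_static(Tensor("CUDA", [1.0]))` raises, whereas `neg_dynamic` routes the same input to the CUDA stand-in, which mirrors the downside the commit message points out.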
Jane Xu	ac2bdf553e	update build_host_protoc command for macos cross compilation (#50922 ) Summary: Currently, adding a cross compile build is failing on CI due to a cmake builtin compiler check that does not pass due to cross compiling the host protoc library. Setting the CMAKE_TRY_COMPILE_TARGET_TYPE flag should fix it. (Based on this [SOF answer](https://stackoverflow.com/questions/53633705/cmake-the-c-compiler-is-not-able-to-compile-a-simple-test-program).) To test that this works, please run: `CMAKE_OSX_ARCHITECTURES=arm64 USE_MKLDNN=OFF USE_NNPACK=OFF USE_QNNPACK=OFF USE_PYTORCH_QNNPACK=OFF BUILD_TEST=OFF python setup.py install` from a Mac x86_64 machine with Xcode12.3 (anything with MacOS 11 SDK). Then, you can check that things were compiled for arm by running `lipo -info <file>` for any file in the `build/lib` directory. Pull Request resolved: https://github.com/pytorch/pytorch/pull/50922 Reviewed By: malfet Differential Revision: D26355054 Pulled By: janeyx99 fbshipit-source-id: 919f3f9bd95d7c7bba6ab3a95428d3ca309f8ead	2021-02-11 14:36:51 -08:00
cyy	1aaddd83a5	don't set the same C++ and C standards twice (#51832 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/51832 Reviewed By: izdeby Differential Revision: D26312660 Pulled By: ezyang fbshipit-source-id: 7d646cd106397e70bca0050d0aa30eb62b085cee	2021-02-08 08:53:26 -08:00
Jane Xu	88af2149e1	Add build option to split torch_cuda library into torch_cuda_cu and torch_cuda_cpp (#49050 ) Summary: Because of the size of our `libtorch_cuda.so`, linking with other hefty binaries presents a problem where 32bit relocation markers are too small and end up overflowing. This PR attempts to break up `torch_cuda` into `torch_cuda_cu` and `torch_cuda_cpp`. `torch_cuda_cu`: all the files previously in `Caffe2_GPU_SRCS` that are * pure `.cu` files in `aten`match * all the BLAS files * all the THC files, except for THCAllocator.cpp, THCCachingHostAllocator.cpp and THCGeneral.cpp * all files in`detail` * LegacyDefinitions.cpp and LegacyTHFunctionsCUDA.cpp * RegisterCUDA.cpp CUDAHooks.cpp * CUDASolver.cpp * TensorShapeCUDA.cpp `torch_cuda_cpp`: all other files in `Caffe2_GPU_SRCS` Accordingly, TORCH_CUDA_API and TORCH_CUDA_BUILD_MAIN_LIB usages are getting split as well to TORCH_CUDA_CU_API and TORCH_CUDA_CPP_API. To test this locally, you can run `export BUILD_SPLIT_CUDA=ON && python setup.py develop`. In your `build/lib` folder, you should find binaries for both `torch_cuda_cpp` and `torch_cuda_cu`. To see that the SPLIT_CUDA option was toggled, you can grep the Summary of running cmake and make sure `Split CUDA` is ON. This build option is tested on CI for CUDA 11.1 builds (linux for now, but windows soon). Pull Request resolved: https://github.com/pytorch/pytorch/pull/49050 Reviewed By: walterddr Differential Revision: D26114310 Pulled By: janeyx99 fbshipit-source-id: 0180f2519abb5a9cdde16a6fb7dd3171cff687a6	2021-02-01 18:42:35 -08:00
Ivan Kobzarev	dbfaf966b0	[android] turn on USE_VULKAN for android builds by default (#51291 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/51291 Turning on USE_VULKAN for android builds Remove standalone android vulkan build Testing all ci jobs (for master): https://github.com/pytorch/pytorch/pull/51292 Test Plan: Imported from OSS Reviewed By: AshkanAliabadi Differential Revision: D26141891 Pulled By: IvanKobzarev fbshipit-source-id: e8e1a4ab612c0786ce09217ab9370fd75a71eb00	2021-01-29 11:58:21 -08:00
Will Constable	f2e41257e4	Back out "Revert D26077905: Back out "Revert D25850783: Add torch::deploy, an embedded torch-python interpreter"" (#51267 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/51267 Original commit changeset: b70185916502 Test Plan: test locally, oss ci-all, fbcode incl deferred Reviewed By: suo Differential Revision: D26121251 fbshipit-source-id: 4315b7fd5476914c8e5d6f547e1cfbcf0c227781	2021-01-28 19:30:45 -08:00
Mike Ruberry	12a434abbc	Revert D26077905: Back out "Revert D25850783: Add torch::deploy, an embedded torch-python interpreter" Test Plan: revert-hammer Differential Revision: D26077905 (`dc2a44c4fc`) Original commit changeset: fae83bf9822d fbshipit-source-id: b70185916502ba9ebe16d781cf0659b9f7865c9a	2021-01-27 19:53:29 -08:00
Will Constable	dc2a44c4fc	Back out "Revert D25850783: Add torch::deploy, an embedded torch-python interpreter" (#51124 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/51124 Original commit changeset: 1c7133627da2 Test Plan: Test locally with interpreter_test and on CI Reviewed By: suo Differential Revision: D26077905 fbshipit-source-id: fae83bf9822d79e9a9b5641bc5191a7f3fdea78d	2021-01-27 16:49:42 -08:00
Mike Ruberry	e843974a6e	Revert D25850783: Add torch::deploy, an embedded torch-python interpreter Test Plan: revert-hammer Differential Revision: D25850783 (`3192f9e4fe`) Original commit changeset: a4656377caff fbshipit-source-id: 1c7133627da28fb12848da7a9a46de6d3b2b67c6	2021-01-26 02:07:44 -08:00
Will Constable	3192f9e4fe	Add torch::deploy, an embedded torch-python interpreter (#50458 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/50458 libinterpreter.so contains a frozen python distribution including torch-python bindings. Freezing refers to serializing bytecode of python standard library modules as well as the torch python library and embedding them in the library code. This library can then be dlopened multiple times in one process context, each interpreter having its own python state and GIL. In addition, each python environment is sealed off from the filesystem and can only import the frozen modules included in the distribution. This change relies on newly added frozenpython, a cpython 3.8.6 fork built for this purpose. Frozenpython provides libpython3.8-frozen.a which contains frozen bytecode and object code for the python standard library. Building on top of frozen python, the frozen torch-python bindings are added in this diff, providing each embedded interpreter with a copy of the torch bindings. Each interpreter is intended to share one instance of libtorch and the underlying tensor libraries. Known issues - Autograd is not expected to work with the embedded interpreter currently, as it manages its own python interactions and needs to coordinate with the duplicated python states in each of the interpreters. - Distributed and cuda stuff is disabled in libinterpreter.so build, needs to be revisited - __file__ is not supported in the context of embedded python since there are no files for the underlying library modules. using __file__ - __version__ is not properly supported in the embedded torch-python, just a workaround for now Test Plan: tested locally and on CI with cmake and buck builds running torch::deploy interpreter_test Reviewed By: ailzhang Differential Revision: D25850783 fbshipit-source-id: a4656377caff25b73913daae7ae2f88bcab8fd88	2021-01-25 15:14:28 -08:00
Will Constable	4bbff92014	Refactor build targets for torch::deploy (#50288 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/50288 torch::deploy will bundle the objects contained in libtorch-python together with frozenpython into a shared library. Therefore, the libtorch-python objs can't bring with them a dependency on system python. Buck TARGETS are added throughout the caffe2 tree to make available objects or headers that will be needed by torch::deploy but would have brought unsuitable dependencies if accessed using existing targets. CMakeLists are modified to separate a torch-python-objs object library which lets torch::deploy compile these objs with the same compile flags as libtorch_python used, but without some of the link-time dependencies such as python. CudaIPCTypes is moved from libtorch_python to libtorch_cuda because it is really not a python binding, and it statically registers a cuda_ipc_callback which would be duplicated if included in each copy of torch::deploy. Test Plan: no new functionality, just ensure existing tests continue to pass Reviewed By: malfet Differential Revision: D25850785 fbshipit-source-id: b0b81c050cbee04e9de96888f8a09d29238a9db8	2021-01-22 09:16:32 -08:00
Ilia Cherniavskii	e34992ebee	Set USE_KINETO=1 (#49897 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/49897 Resend of https://github.com/pytorch/pytorch/pull/49201 Test Plan: see 49201 Reviewed By: malfet Differential Revision: D25717102 Pulled By: ilia-cher fbshipit-source-id: 5e794a7f5fe160ca64ac9d190c4fd3e8f1e443e6	2021-01-22 00:09:21 -08:00
Nikita Shulga	cebab83d3f	Fix USE_MKLDN defaults (#50782 ) Summary: Fixes regression introduced by https://github.com/pytorch/pytorch/pull/50400 `cmake_dependent_option` semantic is following (see https://cmake.org/cmake/help/v3.19/module/CMakeDependentOption.html); `cmake_dependent_option(<option> "<help_text>" <value> <depends> <force>)` I.e. depends should be true for CPU_INTEL or CPU_AARCH64 but default value should be ON only if CPU_INTEL is true Pull Request resolved: https://github.com/pytorch/pytorch/pull/50782 Reviewed By: xuzhao9 Differential Revision: D25966509 Pulled By: malfet fbshipit-source-id: c891cd9234311875762403f7125d0c3803bb0e65	2021-01-19 21:41:53 -08:00
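The `cmake_dependent_option` semantics quoted in the commit above can be modelled in a few lines. This is a Python sketch of the documented behaviour, not the CMake code from the PR: the function names, the `user_value` parameter, and the choice of `OFF` (False) as the forced value are assumptions made for illustration.

```python
def cmake_dependent_option(value, depends, force, user_value=None):
    """Model of cmake_dependent_option(<option> "<help>" <value> <depends> <force>).

    If <depends> holds, the option is user-settable and defaults to <value>.
    Otherwise the option is pinned to <force>, whatever the user asked for.
    """
    if depends:
        return value if user_value is None else user_value
    return force


def use_mkldnn(cpu_intel, cpu_aarch64, user_value=None):
    # Per the commit message: the dependency is CPU_INTEL or CPU_AARCH64,
    # but the default is ON only when CPU_INTEL is true.
    return cmake_dependent_option(
        value=cpu_intel,
        depends=cpu_intel or cpu_aarch64,
        force=False,  # assumed forced value when the dependency fails
        user_value=user_value,
    )
```

Under this model an AArch64 host gets `USE_MKLDNN` off by default but may still switch it on, while a host that is neither Intel nor AArch64 has it forced off.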
Rong Rong (AI Infra)	ebd142e94b	initial commit to enable fast_nvcc (#49773 ) Summary: draft enable fast_nvcc. * cleaned up some non-standard usages * added fall-back to wrap_nvcc Pull Request resolved: https://github.com/pytorch/pytorch/pull/49773 Test Plan: Configuration to enable fast nvcc: - install and enable `ccache` but delete `.ccache/` folder before each build. - `TORCH_CUDA_ARCH_LIST=6.0;6.1;6.2;7.0;7.5` - Toggling `USE_FAST_NVCC=ON/OFF` cmake config and run `cmake --build` to verify the build time. Initial statistic for a full compilation: * `cmake --build . -- -j $(nproc)`: - fast NVCC ``` real 48m55.706s user 1559m14.218s sys 318m41.138s ``` - normal NVCC: ``` real 43m38.723s user 1470m28.131s sys 90m46.879s ``` * `cmake --build . -- -j $(nproc/4)`: - fast NVCC: ``` real 53m44.173s user 1130m18.323s sys 71m32.385s ``` - normal NVCC: ``` real 81m53.768s user 858m45.402s sys 61m15.539s ``` * Conclusion: fast NVCC doesn't provide too much gain when compiler is set to use full CPU utilization, in fact it is even worse because of the thread switching. initial statistic for partial recompile (edit .cu files) * `cmake --build . -- -j $(nproc)` - fast NVCC: ``` [2021-01-13 18:10:24] [ 86%] Building NVCC (Device) object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/torch_cuda_generated_BinaryMiscOpsKernels.cu.o [2021-01-13 18:11:08] [ 86%] Linking CXX shared library ../lib/libtorch_cuda.so ``` - normal NVCC: ``` [2021-01-13 17:35:40] [ 86%] Building NVCC (Device) object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/torch_cuda_generated_BinaryMiscOpsKernels.cu.o [2021-01-13 17:38:08] [ 86%] Linking CXX shared library ../lib/libtorch_cuda.so ``` * Conclusion: Effective compilation time for single CU file modification reduced from 2min30sec to only 40sec when compiling multiple architectures. This shows 4X gain in speed up using fast NVCC -- reaching the theoretical limit of 5X when compiling 5 gencode architectures at the same time. Follow up PRs: - should have better fallback mechanism to detect whether a build is supported by fast_nvcc or not instead of dry-running and then failing over to the fallback. - performance measurement instrumentation to measure what's the total compile time vs the parallel tasks critical path time. - figure out why `-j $(nproc)` gives significant sys overhead (`sys 318m41.138s` vs `sys 90m46.879s`) over normal nvcc, guess this is context switching, but not exactly sure Reviewed By: malfet Differential Revision: D25692758 Pulled By: walterddr fbshipit-source-id: c244d07b9b71f146e972b6b3682ca792b38c4457	2021-01-19 14:50:54 -08:00
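The partial-recompile figures in the fast_nvcc commit above come from two pairs of log timestamps. As a rough check, the sketch below recomputes the elapsed time between the "Building NVCC" line and the "Linking" line for each run; treating that gap as the compile time is an assumption, and `elapsed_seconds` is a helper name invented for this example.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"


def elapsed_seconds(start, end):
    """Seconds between two log timestamps in the format used in the commit."""
    return (datetime.strptime(end, FMT) - datetime.strptime(start, FMT)).total_seconds()


# Timestamps copied from the commit message (single .cu file recompile).
fast = elapsed_seconds("2021-01-13 18:10:24", "2021-01-13 18:11:08")
normal = elapsed_seconds("2021-01-13 17:35:40", "2021-01-13 17:38:08")

print(f"fast NVCC:   {fast:.0f}s")
print(f"normal NVCC: {normal:.0f}s")
print(f"speed-up:    {normal / fast:.2f}x")
```

The logged gaps work out to 44 s and 148 s, which the message rounds to about 40 sec and 2min30sec; the raw ratio is a little under the quoted 4X.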
Rong Rong	070a30b265	[BE] add warning message to cmake against env var "-std=c++xx" (#50491 ) Summary: this was discovered when working on https://github.com/pytorch/pytorch/issues/50230. environment variables such as CXXFLAGS="-std=c++17" will not work because we use CMAKE_CXX_STANDARD 14. Adding this warning to alert users when environment variable was set. See: [CMake env var usage](https://cmake.org/cmake/help/latest/manual/cmake-env-variables.7.html#id4) and [CXXFLAGS usage](https://cmake.org/cmake/help/latest/envvar/CXXFLAGS.html) for more details. Pull Request resolved: https://github.com/pytorch/pytorch/pull/50491 Reviewed By: mrshenli Differential Revision: D25907851 Pulled By: walterddr fbshipit-source-id: 5af5eec76f79f9d35456af1f2663cafbc54e7dc8	2021-01-15 07:12:56 -08:00
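The check described in the commit above amounts to spotting a `-std=c++xx` flag inside `CXXFLAGS`. The PR implements the warning in CMake; the snippet below is only a Python illustration of the same idea, with the regular expression, the function name `find_std_flag`, and the warning text all chosen for this example.

```python
import os
import re

# Matches -std=c++17, -std=gnu++14, -std=c++2a, ... as a standalone flag.
_STD_FLAG = re.compile(r"(?:^|\s)(-std=(?:c|gnu)\+\+\w+)")


def find_std_flag(cxxflags):
    """Return the first -std=c++xx style flag in a CXXFLAGS string, or None."""
    match = _STD_FLAG.search(cxxflags)
    return match.group(1) if match else None


flag = find_std_flag(os.environ.get("CXXFLAGS", ""))
if flag is not None:
    # Illustrative wording; the actual CMake message may differ.
    print(f"warning: {flag} in CXXFLAGS will not take effect, "
          "the build sets CMAKE_CXX_STANDARD itself")
```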
Nathan John Sircombe	664126bab5	Enables build with oneDNN (MKL-DNN) on AArch64 (#50400 ) Summary: Since version 1.6, oneDNN has provided limited support for AArch64 builds. This minor change is to detect an AArch64 CPU and permit the use of `USE_MKLDNN` in that case. Build flags for oneDNN are also modified accordingly. Note: oneDNN on AArch64, by default, will use oneDNN's reference C++ kernels. These are not optimised for AArch64, but oneDNN v1.7 onwards provides support for a limited set of primitives based on Arm Compute Library. See: https://github.com/oneapi-src/oneDNN/pull/795 and: https://github.com/oneapi-src/oneDNN/pull/820 for more details. Support for ACL-based oneDNN primitives in PyTorch will require some further modification. Fixes #{issue number} Pull Request resolved: https://github.com/pytorch/pytorch/pull/50400 Reviewed By: izdeby Differential Revision: D25886589 Pulled By: malfet fbshipit-source-id: 2c81277a28ad4528c2d2211381e7c6692d952bc1	2021-01-13 08:41:44 -08:00
Jane Xu	c2d37cd990	Change CMake config to enable universal binary for Mac (#50243 ) Summary: This PR is a step towards enabling cross compilation from x86_64 to arm64. The following has been added: 1. When cross compilation is detected, compile a local universal fatfile to use as protoc. 2. For the simple compile check in MiscCheck.cmake, make sure to compile the small snippet as a universal binary in order to run the check. Test plan: Kick off a minimal build on a mac intel machine with the macOS 11 SDK with this command: ``` CMAKE_OSX_ARCHITECTURES=arm64 USE_MKLDNN=OFF USE_QNNPACK=OFF USE_PYTORCH_QNNPACK=OFF BUILD_TEST=OFF USE_NNPACK=OFF python setup.py install ``` (If you run the above command before this change, or without macOS 11 SDK set up, it will fail.) Then check the platform of the built binaries using this command: ``` lipo -info build/lib/libfmt.a ``` Output: - Before this PR, running a regular build via `python setup.py install` (instead of using the flags listed above): ``` Non-fat file: build/lib/libfmt.a is architecture: x86_64 ``` - Using this PR: ``` Non-fat file: build/lib/libfmt.a is architecture: arm64 ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/50243 Reviewed By: malfet Differential Revision: D25849955 Pulled By: janeyx99 fbshipit-source-id: e9853709a7279916f66aa4c4e054dfecced3adb1	2021-01-08 17:26:08 -08:00
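The verification step in the commit above reads the architecture off a `lipo -info` line. A small parser for the non-fat output format quoted in the message might look like the sketch below; it does not run `lipo` itself, does not handle fat (multi-architecture) files, and `parse_non_fat` is a name made up for this example.

```python
def parse_non_fat(line):
    """Split 'Non-fat file: <path> is architecture: <arch>' into (path, arch)."""
    prefix = "Non-fat file: "
    marker = " is architecture: "
    if not line.startswith(prefix) or marker not in line:
        raise ValueError(f"unexpected lipo output: {line!r}")
    path, arch = line[len(prefix):].rsplit(marker, 1)
    return path, arch.strip()


# The two outputs quoted in the commit message.
before = parse_non_fat("Non-fat file: build/lib/libfmt.a is architecture: x86_64")
after = parse_non_fat("Non-fat file: build/lib/libfmt.a is architecture: arm64")
print(before, after)
```

Comparing the parsed architecture against the value passed in `CMAKE_OSX_ARCHITECTURES` is one way to script the manual check from the test plan.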
Ashkan Aliabadi	1c12cbea90	Optimize Vulkan command buffer submission rate. (#49112 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/49112 Differential Revision: D25729889 Test Plan: Imported from OSS Reviewed By: SS-JIA Pulled By: AshkanAliabadi fbshipit-source-id: c4ab470fdcf3f83745971986f3a44a3dff69287f	2021-01-08 16:39:22 -08:00
Antonio Cuni	8f31621f78	Fix MKL builds on Ubuntu (#50212 ) Summary: This fixes https://github.com/pytorch/pytorch/issues/50211 Pull Request resolved: https://github.com/pytorch/pytorch/pull/50212 Reviewed By: janeyx99 Differential Revision: D25850876 Pulled By: walterddr fbshipit-source-id: be138db3ae370c45f5fbf3af486cf8b32518df87	2021-01-08 13:16:30 -08:00
Jithun Nair	45ec35827e	Set USE_RCCL cmake option (dependent on USE_NCCL) [REDUX] (#34683 ) Summary: Refiled duplicate of https://github.com/pytorch/pytorch/issues/31341 which was reverted in commit `63964175b5`. This PR enables RCCL support when building Gloo as part of PyTorch for ROCm. Pull Request resolved: https://github.com/pytorch/pytorch/pull/34683 Reviewed By: glaringlee Differential Revision: D25540578 Pulled By: ezyang fbshipit-source-id: fcb02e5745d62e1b7d2e02048160e9e7a4b4df2d	2021-01-06 07:03:02 -08:00
Rong Rong (AI Infra)	12ee7b61e7	support building with conda installed libraries (#50080 ) Summary: This should fix a bunch of share library compilation error when installed in conda lib, lib64 folder. Pull Request resolved: https://github.com/pytorch/pytorch/pull/50080 Reviewed By: seemethere Differential Revision: D25781923 Pulled By: walterddr fbshipit-source-id: 78a74925981d65243b98bb99a65f1f2766e87a2f	2021-01-05 12:32:51 -08:00
Ilia Cherniavskii	72b00a8a52	Revert D25480770: Set USE_KINETO=1 Test Plan: revert-hammer Differential Revision: D25480770 (`1a92802bde`) Original commit changeset: 037cd774f554 fbshipit-source-id: 6a6062195033ca91fcc0cfa1e890e47efc774ac1	2020-12-18 07:06:28 -08:00
Ilia Cherniavskii	1a92802bde	Set USE_KINETO=1 (#49201 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/49201 This unblocks kineto profiler for 1.8 release. This PR supersedes https://github.com/pytorch/pytorch/pull/48391 Note: this will somewhat increase the size of linux server binaries, because we add libkineto.a and libcupti_static.a: -rw-r--r-- 1 jenkins jenkins 1107502 Dec 10 21:16 build/lib/libkineto.a -rw-r--r-- 1 root root 13699658 Nov 13 2019 /usr/local/cuda/lib64/libcupti_static.a Test Plan: CI https://github.com/pytorch/pytorch/pull/48391 Imported from OSS Reviewed By: ngimel Differential Revision: D25480770 fbshipit-source-id: 037cd774f5547d9918d6055ef5cc952a54e48e4c	2020-12-18 01:48:10 -08:00
Nikita Shulga	84fce6d29a	[AARCH64] Fix HAS_VST1 check if compiled by clang (#49182 ) Summary: Use `UL` suffix supported by all C99 compatible compilers instead of `__AARCH64_UINT64_C`, which is a gcc specific extension Before the change this check would have failed as follows with a bug-free clang compiler with the following errors: ``` $ clang has_vst1.c has_vst1.c:5:41: warning: implicit declaration of function '__AARCH64_UINT64_C' is invalid in C99 [-Wimplicit-function-declaration] v.val[0] = vcombine_f32 (vcreate_f32 (__AARCH64_UINT64_C (0)), vcreate_f32 (__AARCH64_UINT64_C (0))); ^ has_vst1.c:5:79: warning: implicit declaration of function '__AARCH64_UINT64_C' is invalid in C99 [-Wimplicit-function-declaration] v.val[0] = vcombine_f32 (vcreate_f32 (__AARCH64_UINT64_C (0)), vcreate_f32 (__AARCH64_UINT64_C (0))); ^ has_vst1.c:6:41: warning: implicit declaration of function '__AARCH64_UINT64_C' is invalid in C99 [-Wimplicit-function-declaration] v.val[1] = vcombine_f32 (vcreate_f32 (__AARCH64_UINT64_C (0)), vcreate_f32 (__AARCH64_UINT64_C (0))); ^ has_vst1.c:6:79: warning: implicit declaration of function '__AARCH64_UINT64_C' is invalid in C99 [-Wimplicit-function-declaration] v.val[1] = vcombine_f32 (vcreate_f32 (__AARCH64_UINT64_C (0)), vcreate_f32 (__AARCH64_UINT64_C (0))); ^ 4 warnings generated. /tmp/has_vst1-b1e162.o: In function `main': has_vst1.c:(.text+0x30): undefined reference to `__AARCH64_UINT64_C' ``` Fixes #{issue number} Pull Request resolved: https://github.com/pytorch/pytorch/pull/49182 Reviewed By: walterddr Differential Revision: D25471994 Pulled By: malfet fbshipit-source-id: 0129a6f7aabc46aa117ef719d3a211449cb410f1	2020-12-10 15:19:12 -08:00
Ashkan Aliabadi	66440d1b29	Tweak Vulkan memory use. (#47728 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/47728 Test Plan: Imported from OSS Reviewed By: SS-JIA Differential Revision: D25032740 Pulled By: AshkanAliabadi fbshipit-source-id: 7eb72538dc1aa3feb4e2f8c4ff9c675eb8e97057	2020-11-30 14:28:09 -08:00
Joe Zhu	42e7cdc50a	Improve libuv detection on Windows (#48571 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/48304 Pull Request resolved: https://github.com/pytorch/pytorch/pull/48571 Reviewed By: ejguan Differential Revision: D25220903 Pulled By: mrshenli fbshipit-source-id: a485568621c4e289c5439474c2651186bc63c2f0	2020-11-30 11:16:13 -08:00
Gemfield	3c9e71c9ad	fix BUILD_MOBILE_BENCHMARK typo (#48515 ) Summary: BUILD_MOBILE_BENCHMARKS in CMakeLists.txt should be BUILD_MOBILE_BENCHMARK. Pull Request resolved: https://github.com/pytorch/pytorch/pull/48515 Reviewed By: albanD Differential Revision: D25198724 Pulled By: mrshenli fbshipit-source-id: 12765d10c272da04cb104202fcbabc6a0b007c5e	2020-11-30 08:38:43 -08:00
Ilia Cherniavskii	f2da18af14	Add USE_KINETO build option (#45888 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/45888 Adding USE_LIBKINETO build option Test Plan: USE_KINETO=1 USE_CUDA=1 USE_MKLDNN=1 BLAS=MKL BUILD_BINARY=1 python setup.py develop install --cmake Reviewed By: Chillee Differential Revision: D25142221 Pulled By: ilia-cher fbshipit-source-id: d1634a8f9599604ff511fac59b9072854289510c	2020-11-21 20:20:32 -08:00
Bert Maher	8a996dd139	[te] Make BUILD_TENSOREXPR_BENCHMARK a real CMake option (#48158 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/48158 Test Plan: Imported from OSS Reviewed By: Chillee Differential Revision: D25059877 Pulled By: bertmaher fbshipit-source-id: a98b6c18a91b4fe89d12bf5f7ead604e3cc0c8b0	2020-11-18 12:19:14 -08:00
Rong Rong	147a48fb27	[cmake] clean up cmake/Utils.cmake (#47923 ) Summary: Consolidate into cmake/public/utils.cmake Pull Request resolved: https://github.com/pytorch/pytorch/pull/47923 Reviewed By: samestep Differential Revision: D24955961 Pulled By: walterddr fbshipit-source-id: 9d5f6af2b353a8c6f6d521c841fd0989393755cd	2020-11-16 08:12:32 -08:00
Jiakai Liu	8e3af9faa8	[pytorch] fix debug symbol flag for android clang (#46331 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/46331 Fix the android build size issue #46246. Test Plan: Imported from OSS Reviewed By: dhruvbird Differential Revision: D24390061 Pulled By: ljk53 fbshipit-source-id: b4a6f297e89b9c08dff4297c6a41aabd41d9fff5	2020-11-10 14:55:43 -08:00
Ashkan Aliabadi	6cd8b5e9a7	Provide CMake option to enable Vulkan API. (#46503 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/46503 Test Plan: Imported from OSS Reviewed By: IvanKobzarev Differential Revision: D24379144 Pulled By: AshkanAliabadi fbshipit-source-id: 8d8c57f96bbac2a44615828a3474c912704f3a85	2020-10-20 18:45:52 -07:00
Pritam Damania	cb3c1d17e4	Promote -Wcast-function-type to an error in builds. (#46356 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/46356 Adding the flag `-Werror=cast-function-type` to ensure we don't allow any invalid casts (ex: PyCFunction casts). For more details see: https://github.com/pytorch/pytorch/issues/45419 ghstack-source-id: 114632980 Test Plan: waitforbuildbot Reviewed By: albanD Differential Revision: D24319759 fbshipit-source-id: 26ce4650c220e8e9dd3550245f214c7e6c21a5dc	2020-10-20 18:09:06 -07:00
Tao Xu	495070b388	[Metal] Add the Python binding for optimize_for_mobile (#46456 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/46456 Add the python binding in CMake. The general workflow is - Build pytorch - `USE_PYTORCH_METAL=ON python setup.py install --cmake` - Run optimize_for_mobile ``` import torch from torch.utils.mobile_optimizer import optimize_for_mobile scripted_model = torch.jit.load('./mobilenetv2.pt') optimized_model = optimize_for_mobile(scripted_model, backend='metal') torch.jit.export_opnames(optimized_model) torch.jit.save(optimized_model, './mobilenetv2_metal.bc') ``` The exported ops are ``` ['aten::adaptive_avg_pool2d', 'aten::add.Tensor', 'aten::addmm', 'aten::reshape', 'aten::size.int', 'metal::copy_to_host', 'metal_prepack::conv2d_run'] ``` ghstack-source-id: 114559878 Test Plan: - Sandcastle CI - Circle CI Reviewed By: kimishpatel Differential Revision: D24356768 fbshipit-source-id: fb5c4c4b6316347b67edb4132da044a81470ddfd	2020-10-17 10:26:25 -07:00
Tao Xu	04e5fcc0ed	[GPU] Introduce USE_PYTORCH_METAL (#46383 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/46383 The old `USE_METAL` is actually being used by Caffe2. Here we introduce a new macro to enable metal in pytorch. ghstack-source-id: 114499392 Test Plan: - Circle CI - The Person Segmentation model works Reviewed By: linbinyu Differential Revision: D24322018 fbshipit-source-id: 4e5548afba426b49f314366d89b18ba0c7e745ca	2020-10-16 18:19:32 -07:00
Michael Ranieri	b1d24dded1	make a way to disable callgrind (#46116 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/46116 Ideally I would just use one of the existing preprocessor flags such as `FBCODE_CAFFE2`, but this implies a whole bunch of other things elsewhere, so it is not really a solution for ovrsource. Test Plan: CI green, we are able to disable it internally with `-DNVALGRIND` Reviewed By: malfet Differential Revision: D24227360 fbshipit-source-id: 24a3b393cf46d6a16acca0a9ec52610d4bb8704f	2020-10-13 16:18:04 -07:00
Tao Xu	a277c097ac	[iOS][GPU] Add Metal/MPSCNN support on iOS (#46112 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/46112 ### Summary This PR adds the support of running torchscript models on iOS GPU via Metal (Inference only). The feature is currently in prototype state, API changes are expected. The tutorial and the documents will be added once it goes to beta. allow-large-files - Users API ``` auto module = torch::jit::load(model); module.eval(); at::Tensor input = at::ones({1,3,224,224}, at::ScalarType::Float).metal(); auto output = module.forward({input}).toTensor().cpu(); ``` - Supported Models - Person Segmentation v106 (FB Internal) - Mobilenetv2 - Supported Operators - aten::conv2d - aten::addmm - aten::add.Tensor - aten::sub.Tensor - aten::mul.Tensor - aten::relu - aten::hardtanh - aten::hardtanh_ - aten::sigmoid - aten::max_pool2d - aten::adaptive_avg_pool2d - aten::reshape - aten::t - aten::view - aten::log_softmax.int - aten::upsample_nearest2d.vec - Supported Devices - Apple A9 and above - iOS 10.2 and above - CMake scripts - `IOS_ARCH=arm64 ./scripts/build_ios.sh -DUSE_METAL=ON` ### Test Plan - Circle CI ghstack-source-id: 114155638 Test Plan: 1. Sandcastle CI 2. Circle CI Reviewed By: dreiss Differential Revision: D23236555 fbshipit-source-id: 98ffc48b837e308bc678c37a9a5fd8ae72d11625	2020-10-13 01:46:56 -07:00
gunandrose4u	ffd50b8220	SET USE_DISTRIBUTED OFF when libuv is not installed (#45554 ) Summary: Fixes #{issue number} Pull Request resolved: https://github.com/pytorch/pytorch/pull/45554 Reviewed By: izdeby Differential Revision: D24016825 Pulled By: mrshenli fbshipit-source-id: 332d860429626a915c06f98cad31e6db1cbc4eb1	2020-09-30 12:46:36 -07:00
gunandrose4u	0a38aed025	Auto set libuv_ROOT env var for Gloo submodule on Windows platform (#45484 ) Summary: Fixes #{issue number} Pull Request resolved: https://github.com/pytorch/pytorch/pull/45484 Reviewed By: lw Differential Revision: D23990724 Pulled By: mrshenli fbshipit-source-id: 1987ce7eb7d3f9d3120c07e954cd6581cd3caf59	2020-09-29 08:58:56 -07:00
gunandrose4u	f07ac6a004	Fix Windows build failure after DDP PR merged (#45335 ) Summary: Fixes #{issue number} This is resubmit for PR https://github.com/pytorch/pytorch/issues/42897 . Together with fix for Windows build issue introduced by PR https://github.com/pytorch/pytorch/issues/44344 . Pull Request resolved: https://github.com/pytorch/pytorch/pull/45335 Reviewed By: zou3519 Differential Revision: D23931471 Pulled By: mrshenli fbshipit-source-id: f49b5a114944c1450b32934b3292170be064f494	2020-09-25 12:37:50 -07:00
Mike Ruberry	103fa3894a	Revert D23841786: [pytorch][PR] Enable distributed package on windows, Gloo backend supported only Test Plan: revert-hammer Differential Revision: D23841786 (`0122299f9b`) Original commit changeset: 334ba1ed73ef fbshipit-source-id: ec95432f9957df56a5a04e52661f5db920b7f57f	2020-09-24 22:44:33 -07:00
gunandrose4u	0122299f9b	Enable distributed package on windows, Gloo backend supported only (#42897 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/42095 For test case part will be committed to this PR later mrshenli, please help to review Pull Request resolved: https://github.com/pytorch/pytorch/pull/42897 Reviewed By: osalpekar Differential Revision: D23841786 Pulled By: mrshenli fbshipit-source-id: 334ba1ed73eff2f668857390fc32d1bc7f08e5f3	2020-09-24 21:13:55 -07:00
Ivan Kobzarev	6debe825be	[vulkan] glsl shaders relaxed precision mode to cmake option (#43076 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/43076 Test Plan: Imported from OSS Reviewed By: AshkanAliabadi Differential Revision: D23143354 Pulled By: IvanKobzarev fbshipit-source-id: 7b3ead1e63cf8acf6e8e547080a8ead7a2db994b	2020-09-16 12:51:34 -07:00
peter	ed862d3682	Split CUDA_NVCC_FLAGS by space (#44603 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/44599 Pull Request resolved: https://github.com/pytorch/pytorch/pull/44603 Reviewed By: albanD Differential Revision: D23692320 Pulled By: ezyang fbshipit-source-id: 6a63d94ab8b88e7a82f9d65f03523d6ef639c754	2020-09-14 20:25:37 -07:00
Marcin Juszkiewicz	e261e0953e	Fix centos8 gcc (#44644 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/44198 properly this time Pull Request resolved: https://github.com/pytorch/pytorch/pull/44644 Reviewed By: albanD Differential Revision: D23684909 Pulled By: malfet fbshipit-source-id: cea6f6e2ae28138f6b93a6513d1abd36d14ae573	2020-09-14 12:28:09 -07:00
Marcin Juszkiewicz	566b8d0650	handle missing NEON vst1__x2 intrinsics (#44198 ) (#44199 ) Summary: CentOS 8 on AArch64 has vld1_ intrinsics but lacks vst1q_f32_x2 one. This patch checks for it and handle it separately to vld1_* ones. Fixes https://github.com/pytorch/pytorch/issues/44198 Pull Request resolved: https://github.com/pytorch/pytorch/pull/44199 Reviewed By: seemethere Differential Revision: D23641273 Pulled By: malfet fbshipit-source-id: c2053c8e0427705eaeeeb82ec030925bff22623a	2020-09-11 16:02:44 -07:00
Yujun	db24c5c582	Change code coverage option name (#43999 ) Summary: According to [documentation](https://github.com/pytorch/pytorch/blob/master/tools/setup_helpers/cmake.py#L265), only options starting with `BUILD_` / `USE_` / `CMAKE_` in `CMakeLists.txt` can be imported by environment variables. --- This diff was originally intended to enable `c++` source coverage with `CircleCI` and `codecov.io`, but we will finish it in the future. You can find the related information in the diff history. The original procedure follows: Based on [this pull request](`1bda5e480c`), life becomes much easier this time. 1. in `build.sh` - Enable coverage build option for c++ - `apt-get install lcov` 2. in `test.sh` - run `lcov` 3. in `pytorch-job-specs.yml` - copy coverage.info to `test/` folder and upload it to codecov.io Pull Request resolved: https://github.com/pytorch/pytorch/pull/43999 Test Plan: Test on github Reviewed By: malfet Differential Revision: D23464656 Pulled By: scintiller fbshipit-source-id: b2365691f04681d25ba5c00293fbcafe8e8e0745	2020-09-11 15:55:05 -07:00
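The prefix rule described in the commit above can be illustrated with a small sketch. This is not the code in `tools/setup_helpers/cmake.py`; the helper name `cmake_options_from_env` and the sample environment are hypothetical, and only the three prefixes come from the commit message.

```python
# Hypothetical sketch: forward only environment variables whose names start
# with BUILD_, USE_ or CMAKE_ as -D options, as the commit message describes.
PREFIXES = ("BUILD_", "USE_", "CMAKE_")


def cmake_options_from_env(env):
    """Return the subset of env whose keys carry one of the allowed prefixes."""
    return {key: value for key, value in sorted(env.items())
            if key.startswith(PREFIXES)}


# Illustrative environment; the values are made up.
sample_env = {
    "USE_CUDA": "0",
    "CODE_COVERAGE": "ON",   # no allowed prefix, so it is not forwarded
    "BUILD_TEST": "1",
    "PATH": "/usr/bin",
}

cmake_args = [f"-D{key}={value}"
              for key, value in cmake_options_from_env(sample_env).items()]
print(cmake_args)
```

Under this rule an option named `CODE_COVERAGE` would be silently skipped when set through the environment, which is presumably why the option name had to change.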
Bram Wasti	6512032699	[Static Runtime] Add OSS build for static runtime benchmarks (#43881 ) Summary: Adds CMake option. Build with: ``` BUILD_STATIC_RUNTIME_BENCHMARK=ON python setup.py install ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/43881 Reviewed By: hlu1 Differential Revision: D23430708 Pulled By: bwasti fbshipit-source-id: a39bf54e8d4d044a4a3e4273a5b9a887daa033ec	2020-09-02 08:00:18 -07:00
Sebastian Pop	c259146477	add missing NEON {vld1,vst1}_*_x2 intrinsics (#43683 ) Summary: Workaround for issue https://github.com/pytorch/pytorch/issues/43265. Add the missing intrinsics until gcc-7 gets the missing patches backported. Fixes https://github.com/pytorch/pytorch/issues/43265. Pull Request resolved: https://github.com/pytorch/pytorch/pull/43683 Reviewed By: albanD Differential Revision: D23467867 Pulled By: malfet fbshipit-source-id: 7c138dd3de3c45852a60f2cfe8b4d7f7cf76bc7e	2020-09-01 21:19:39 -07:00
Rong Rong	8ca3913f47	Introduce BUILD_CAFFE2 flag (#43673 ) Summary: introduce BUILD_CAFFE2 flag. default to `ON`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/43673 Reviewed By: malfet Differential Revision: D23381035 Pulled By: walterddr fbshipit-source-id: 1f4582987fa0c4a911f0b18d311c04fdbf8dd8f0	2020-09-01 10:18:23 -07:00
Jiakai Liu	3a0e35c9f2	[pytorch] deprecate static dispatch (#43564 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/43564 Static dispatch was originally introduced for mobile selective build. Since we have added selective build support for dynamic dispatch and tested it in FB production for months, we can deprecate static dispatch to reduce the complexity of the codebase. Test Plan: Imported from OSS Reviewed By: ezyang Differential Revision: D23324452 Pulled By: ljk53 fbshipit-source-id: d2970257616a8c6337f90249076fca1ae93090c7	2020-08-27 14:52:48 -07:00
Ann Shan	0dc41ff465	[pytorch] add flag for autograd ops to mobile builds (#43154 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/43154 Adds the build flag `BUILD_MOBILE_AUTOGRAD` which toggles whether autograd files should be included for a PyTorch mobile build (default off). ghstack-source-id: 110369406 Test Plan: CI Reviewed By: ljk53 Differential Revision: D23061913 fbshipit-source-id: bc3d6683ab17f158990d83e4fae0a011d5adeca1	2020-08-20 12:39:55 -07:00
Xiang Gao	ee74c2e5be	Compress fatbin to fit into 32bit indexing (#43074 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/39968 tested with `TORCH_CUDA_ARCH_LIST='3.5 5.2 6.0 6.1 7.0 7.5 8.0+PTX'`, before this PR, it was failing, and with this PR, the build succeed. With `TORCH_CUDA_ARCH_LIST='7.0 7.5 8.0+PTX'`, `libtorch_cuda.so` with symbols changes from 2.9GB -> 2.2GB cc: ptrblck mcarilli jjsjann123 Pull Request resolved: https://github.com/pytorch/pytorch/pull/43074 Reviewed By: mrshenli Differential Revision: D23176095 Pulled By: malfet fbshipit-source-id: 7b3e6d049fc080e519f21e80df05ef68e7bea57e	2020-08-18 09:48:54 -07:00
Nikita Shulga	0cf4a5bccb	Add GCC codecoverage flags (#43066 ) Summary: Rename `CLANG_CODE_COVERAGE` option to `CODE_COVERAGE` and add compiler specific flags for GCC and Clang Pull Request resolved: https://github.com/pytorch/pytorch/pull/43066 Reviewed By: scintiller Differential Revision: D23137488 Pulled By: malfet fbshipit-source-id: a89570469692f878d84f7da6f9d5dc01df423e80	2020-08-14 17:16:18 -07:00
Nikita Shulga	ea65a56854	Use `string(APPEND FOO " bar")` instead of `set(FOO "${FOO} bar") (#42844 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/42844 Reviewed By: scintiller Differential Revision: D23067577 Pulled By: malfet fbshipit-source-id: e4380ce02fd6aca37c955a7bc24435222c5d8b19	2020-08-12 10:33:11 -07:00
Yujun Zhao	7524699d58	Modify clang code coverage to CMakeList.txt (for MacOS) (#42837 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/42837 Originally we use ``` list(APPEND CMAKE_C_FLAGS -fprofile-instr-generate -fcoverage-mapping) list(APPEND CMAKE_CXX_FLAGS -fprofile-instr-generate -fcoverage-mapping) ``` But when compiling the project on Mac with coverage on, it fails with the error: `clang: error: no input files /bin/sh: -fprofile-instr-generate: command not found /bin/sh: -fcoverage-mapping: command not found` The reason is that `list(APPEND CMAKE_CXX_FLAGS` will add an additional `;` to the variable. This means, if we do `list(APPEND foo a)` and then `list(APPEND foo b)`, then `foo` will be `a;b` -- with the additional `;`. Since we have `CMAKE_CXX_FLAGS` defined before in the `CMakeList.txt`, we can only use `set(...)` here. After changing it to ``` set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -fprofile-instr-generate -fcoverage-mapping") set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -fprofile-instr-generate -fcoverage-mapping") ``` Tested successfully on a local Mac machine. Test Plan: Test locally on mac machine Reviewed By: malfet Differential Revision: D23043057 fbshipit-source-id: ff6f4891b35b7f005861ee2f8e4c550c997fe961	2020-08-11 09:57:55 -07:00
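The `;` behaviour behind this fix can be modelled in a few lines of Python. This is a toy model rather than CMake itself: it assumes only that a CMake list is a single string with elements separated by `;`. The helper names and the starting flags `-O2 -Wall` are hypothetical.

```python
# Toy model of CMake list semantics: a list is one string whose elements
# are separated by ';'. Helper names are illustrative, not CMake APIs.

def cmake_list_append(value, *items):
    """Mimic list(APPEND var items...): elements are joined with ';'."""
    parts = [value] if value else []
    return ";".join(parts + list(items))


def cmake_string_append(value, text):
    """Mimic set(var "${var}text") or string(APPEND var text)."""
    return value + text


flags = "-O2 -Wall"  # made-up pre-existing value of CMAKE_CXX_FLAGS

broken = cmake_list_append(
    flags, "-fprofile-instr-generate", "-fcoverage-mapping")
fixed = cmake_string_append(
    flags, " -fprofile-instr-generate -fcoverage-mapping")

print(broken)  # the ';' separators end up in the flags string
print(fixed)   # space-separated, safe to paste onto a compiler command line
```

When a string like `broken` is pasted into a shell command, each `;` ends the command, which matches the `/bin/sh: -fprofile-instr-generate: command not found` error quoted in the commit message.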
Khalid Almufti	b282297559	Replace whitelist with allowlist (#42067 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/41757 I've replaced all the whitelist with allowlist for this issue. Pull Request resolved: https://github.com/pytorch/pytorch/pull/42067 Reviewed By: pbelevich Differential Revision: D22791690 Pulled By: malfet fbshipit-source-id: 638c13cf49915f5c83bd79c7f4a39b8390cc15b4	2020-07-28 08:01:16 -07:00
Edward Yang	befb22790f	Fix a number of deprecation warnings (#40179 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/40179 - Pass no-psabi to shut up GCC about # Suppress "The ABI for passing parameters with 64-byte alignment has changed in GCC 4.6" - Fix use of deprecated data() accessor (and minor optimization: hoist accessor out of loop) - Undeprecate NetDef.num_workers, no one is serious about fixing these - Suppress warnings about deprecated pthreadpool types Signed-off-by: Edward Z. Yang <ezyang@fb.com> Test Plan: Imported from OSS Differential Revision: D22234138 Pulled By: ezyang fbshipit-source-id: 6a1601b6d7551a7e6487a44ae65b19acdcb7b849	2020-07-14 09:11:34 -07:00
Kimish Patel	d6feb6141f	[Vec256][neon] Add neon backend for vec256 (#39341 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/39341 This PR introduces neon backend for vec256 class for float datatype. For now only aarch64 is enabled due to few issues with enabling in aarch32 bit. Test Plan: vec256_test Imported from OSS Differential Revision: D21822399 fbshipit-source-id: 3851c4336d93d1c359c85b38cf19904f82bc7b8d	2020-07-09 16:25:09 -07:00
Kimish Patel	bddba1e336	Add benchmark for add op. (#40059 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/40059 This benchmark is added specifically for mobile to see if compiler is autovectorizing and thus we have no advantage of neon backend for vec256 for add op. Test Plan: CI Imported from OSS Differential Revision: D22055146 fbshipit-source-id: 43ba6c4ae57c6f05d84887c2750ce21ae1b0f0b5	2020-07-09 16:22:55 -07:00
Yujun Zhao	22f940b7bd	add clang code coverage compile flags (#41103 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/41103 add a CLANG_CODE_COVERAGE option to CMakeList. If the option is ON, add the compile flags needed for code coverage. Test Plan: Cloned the pytorch source code locally, applied these changes, and built it with `CLANG_CODE_COVERAGE ON` and `BUILD_TESTS ON`. Run a manual test and attach code coverage report. {F243609020} Reviewed By: malfet Differential Revision: D22422513 fbshipit-source-id: 27a31395c31b5b5f4b72523954722771d8f61080	2020-07-09 14:14:18 -07:00
David Reiss	b7e044f0e5	Re-apply PyTorch pthreadpool changes Summary: This re-applies D21232894 (`b9d3869df3`) and D22162524, plus updates jni_deps in a few places to avoid breaking host JNI tests. Test Plan: `buck test @//fbandroid/mode/server //fbandroid/instrumentation_tests/com/facebook/caffe2:host-test` Reviewed By: xcheng16 Differential Revision: D22199952 fbshipit-source-id: df13eef39c01738637ae8cf7f581d6ccc88d37d5	2020-06-23 19:26:21 -07:00

786 Commits