Use the new helper throughout the CMakeLists files and rectify `IF(APPLE)`/`IF(GNU_CXX_VERSION VERSION_GREATER A.B)` checks and so on
Also, add `target_compile_options_if_supported` and use it in `Dependencies.cmake` as well as in the tests' `CMakeLists.txt`
Delete `-Wno-unknown-warning-option` to test that the conditions indeed work as expected
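For reference, a minimal sketch of what such flag-checking helpers could look like, assuming they are built on CMake's `check_cxx_compiler_flag` (the actual implementation in PyTorch's cmake utilities may differ):
```cmake
include(CheckCXXCompilerFlag)

# Append a flag to a flags variable (e.g. CMAKE_CXX_FLAGS) only if the compiler accepts it.
function(append_cxx_flag_if_supported flag dest)
  string(MAKE_C_IDENTIFIER "HAS${flag}" check_var)   # e.g. -Wall -> HAS_Wall
  check_cxx_compiler_flag("${flag}" ${check_var})
  if(${check_var})
    string(APPEND ${dest} " ${flag}")
    set(${dest} "${${dest}}" PARENT_SCOPE)
  endif()
endfunction()

# Same idea, but scoped to a single target via target_compile_options.
function(target_compile_options_if_supported target flag)
  string(MAKE_C_IDENTIFIER "HAS${flag}" check_var)
  check_cxx_compiler_flag("${flag}" ${check_var})
  if(${check_var})
    target_compile_options(${target} PRIVATE ${flag})
  endif()
endfunction()
```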
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82883
Approved by: https://github.com/seemethere
RocksDB 7 starts to use C++17 in its headers.
We should make this configurable, in case the user needs a higher standard version.
The list of files to change was found with `git grep 'CMAKE_[^_]*_STANDARD'`.
The doc string is taken from the CMake code.
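A minimal sketch of what making the standard configurable could look like (the default value shown is an assumption; the doc string is the one CMake itself uses for `CMAKE_CXX_STANDARD`):
```cmake
# Only set a default when the user has not already chosen a standard.
if(NOT DEFINED CMAKE_CXX_STANDARD)
  set(CMAKE_CXX_STANDARD 14 CACHE STRING
      "The C++ standard whose features are requested to build this target.")
endif()
set(CMAKE_CXX_STANDARD_REQUIRED ON)
```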
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75519
Approved by: https://github.com/malfet
Summary:
This diff integrates the UCC process group as a native component of PyTorch Distributed core. It is based on the existing torch-ucc (https://github.com/facebookresearch/torch_ucc), which wraps the UCC collective communication library.
The environment and cmake variables are named to mirror the existing process groups such as NCCL and Gloo. Specifically,
- USE_UCC: enables the UCC PG. This defaults to OFF, so there is no breakage of existing builds that do not have the UCX/UCC external libraries.
- USE_SYSTEM_UCC: uses external UCX and UCC shared libraries whose locations are set via UCX_HOME and UCC_HOME.
Currently, this diff only supports USE_SYSTEM_UCC=ON, i.e., requiring users to specify external libraries for UCX and UCC. In subsequent diffs, we will add UCX and UCC repos as third-party dependencies in pytorch/third-party.
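As a rough illustration (option names per the summary; defaults and the actual wiring in the build files may differ), the new switches could look like:
```cmake
# USE_UCC defaults to OFF, so existing builds are unaffected.
option(USE_UCC "Build the UCC process group" OFF)
# Only external UCX/UCC libraries are supported for now.
option(USE_SYSTEM_UCC "Use system-wide UCX/UCC located via UCX_HOME/UCC_HOME" OFF)

if(USE_UCC AND USE_SYSTEM_UCC)
  # Locate the externally installed shared libraries.
  find_library(UCX_LIBRARIES ucp HINTS "$ENV{UCX_HOME}/lib")
  find_library(UCC_LIBRARIES ucc HINTS "$ENV{UCC_HOME}/lib")
endif()
```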
Test Plan:
Passed Torch-UCC tests that invoke UCC process group. For example:
$ sh test/start_test.sh test/torch_allreduce_test.py --backend gloo --use-cuda
...
Test allreduce: succeeded
Differential Revision: D36973688
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79918
Approved by: https://github.com/kwen2501, https://github.com/kingchc
The correct variable name should be USE_SYSTEM_PYBIND11, as defined in
the root CMakeLists.txt. In cmake/Dependencies.cmake, it is incorrectly
written as USE_SYSTEM_BIND11, but cmake will not complain about this.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80272
Approved by: https://github.com/suo
When PyTorch is used with an unregistered BLAS, Spack sets BLAS=Generic.
PyTorch then searches only for libblas.
If the BLAS package's library is not named libblas, `spack install py-torch` fails.
This PR passes the BLAS library names via the GENERIC_BLAS_LIBRARIES environment variable so that py-torch can find the BLAS library.
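A sketch of the intended behaviour (the variable handling shown here is an assumption, not the exact FindBLAS change):
```cmake
# For BLAS=Generic, let the environment override the library name(s) to search for.
if(DEFINED ENV{GENERIC_BLAS_LIBRARIES})
  set(GENERIC_BLAS $ENV{GENERIC_BLAS_LIBRARIES})
else()
  set(GENERIC_BLAS blas)
endif()
find_library(BLAS_LIBRARIES NAMES ${GENERIC_BLAS})
```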
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74269
Approved by: https://github.com/kit1980
Otherwise, it's possible to build TensorPipe with one version of libuv
and Gloo with another.
Also, delete the strange `GLOO_INSTALL` logic, as none of the install artifacts are really packaged as part of PyTorch (it was probably used by Caffe2 builds)
This helps solve a problem when compiling PyTorch for M1, where `libuv` is not available in conda
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77312
Approved by: https://github.com/seemethere
This functionality does not seem to be used,
and there are some requests to update the dependency.
Add `third_party` to torch_cpu include directories if compiling with
Caffe2 support, as `caffe2/quantization/server/conv_dnnlowp_op.cc` depends on `third_party/fbgemm/src/RefImplementations.h`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75394
Approved by: https://github.com/janeyx99, https://github.com/seemethere
Summary:
RCCL is required by two components in hipified PyTorch: (1) gloo and (2) hipified ProcessGroupNCCL.
- For (1) the RCCL dependency is managed in `./third_party/gloo/cmake/Dependencies.cmake` and can be enabled/disabled via `USE_RCCL`.
- For (2) the RCCL dependency is managed via `./cmake/Dependencies.cmake` and can be on/off via `USE_NCCL`.
The additional dependency removed in this commit forced hipified PyTorch to load librccl.so even when USE_RCCL=OFF and USE_NCCL=OFF are set, i.e., when using torch_ucc/ucc for the AMD GPU memory type. This caused conflicts when we use a non-system-default librccl.so (i.e., not in ROCM_PATH) for torch_ucc/ucc.
This commit removes the unnecessary RCCL dependency. This will ensure a cleaner way to use torch_ucc with a user-specified RCCL library.
Test Plan:
## Verify OSS pytorch on an AMD GPU machine (MI100)
```
ROCM_PATH=/opt/rocm-4.5.2
git clone https://github.com/pytorch/pytorch.git
cd pytorch
python3 tools/amd_build/build_amd.py
USE_NCCL=0 USE_RCCL=0 USE_KINETO=0 with-proxy python3 setup.py develop
USE_NCCL=0 USE_RCCL=0 USE_KINETO=0 with-proxy python3 setup.py install
```
log for develop: P492778257
log for install: P492778277
## Verify OSS pytorch + TorchUCC on an AMD GPU machine (MI100)
```
export RCCL_INSTALL_DIR=/opt/rccl-rocm-rel-4.4
git clone https://github.com/facebookresearch/torch_ucc.git
cd torch_ucc
UCX_HOME=$RCCL_INSTALL_DIR UCC_HOME=$RCCL_INSTALL_DIR WITH_CUDA=$ROCM_PATH python setup.py
# run param comm
export HSA_ENABLE_SDMA=0
export LD_LIBRARY_PATH=$RCCL_INSTALL_DIR
cd test
git clone https://github.com/facebookresearch/param
cd ..
/bin/bash ./test/start_test.sh ./test/param/train/comms/pt/comms.py --backend ucc --device cuda --b 4 --e 4M --c 1 --collective all_reduce
```
- log for param comm: P493033836
- Verified librccl.so in `/opt/rccl-rocm-rel-4.4` is used via checking version string in log. "[localbuild]" is added in RCCL source.
```
RCCL version 2.9.9+hip4.4 [localbuild]
```
Differential Revision: D35476911
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75547
Approved by: https://github.com/malfet, https://github.com/jeffdaily
Summary:
## Description
Preview4 PR of this [RFC](https://github.com/pytorch/pytorch/issues/49444).
On the basis of https://github.com/pytorch/pytorch/pull/50256, the below improvements are included:
- The [preview4 release branch](https://github.com/oneapi-src/oneDNN/releases/tag/graph-v0.4.1) of the oneDNN Graph API is used
- The fuser now works with the profiling graph executor. We have inserted type check nodes to guard the profiled tensor properties.
### User API:
The optimization pass is disabled by default. Users could enable it by:
```
torch.jit.enable_onednn_fusion(True)
```
### Performance:
[pytorch/benchmark](https://github.com/pytorch/benchmark) tool is used to compare the performance:
- SkyLake 8180 (1 socket of 28 cores):

- SkyLake 8180 (single thread):

\* By mapping hardswish to oneDNN Graph, it’s 8% faster than PyTorch JIT (NNC + OFI)
\** We expect performance gain after mapping transpose, contiguous & view to oneDNN graph ops
### Directory structure of the integration code
Fuser-related code is placed under:
```
torch/csrc/jit/codegen/onednn/
```
Optimization pass registration is done in:
```
torch/csrc/jit/passes/onednn_graph_fuser.h
```
CMake for the integration code is:
```
caffe2/CMakeLists.txt
```
## Limitations
- In this PR, we have only supported the optimization on the Linux platform. Support for Windows and macOS will be enabled as the next step.
- We have only optimized the inference use case.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68111
Reviewed By: eellison
Differential Revision: D34584878
Pulled By: malfet
fbshipit-source-id: ce817aa8cc9052ee9ed930c9cf66be83449e61a4
(cherry picked from commit cd17683aa7d9c0947df45a1ab53627feff795587)
Per https://github.com/pytorch/pytorch/issues/57744 statically linked CUPTI
causes exception handling to break on certain compiler configurations, likely
because CUPTI comes with incompatible libstdc++ symbols. Rather than pray that
something reasonable happens, use the safer configuration (dynamic linking) by
default and give a warning if the user inverts the setting.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74009
Approved by: https://github.com/malfet
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72761
By default, CUPTI_INCLUDE_DIR will pick up cupti.h from /usr/include, which is old (from 2017 on AWS) and is missing many CUPTI headers. Use NO_DEFAULT_PATH to avoid that and instead search only the list of locations provided.
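A sketch of the described search (the path list is illustrative; the real one lives in the CUPTI detection logic):
```cmake
# Search only the explicitly listed CUDA locations; NO_DEFAULT_PATH keeps
# CMake from falling back to the stale /usr/include/cupti.h.
find_path(CUPTI_INCLUDE_DIR cupti.h
  PATHS
    ${CUDA_TOOLKIT_ROOT_DIR}/extras/CUPTI/include
    ${CUDA_TOOLKIT_ROOT_DIR}/include
  NO_DEFAULT_PATH)
```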
Test Plan:
Fixes missing headers error when building on AWS. (Avoids old cupti.h from /usr/include). Instead uses cupti.h from cuda/extras/CUPTI/include.
```
In file included from /scratch/aaronshi/pytorch/third_party/kineto/libkineto/src/CuptiRangeProfilerApi.cpp:13:0:
/scratch/aaronshi/pytorch/third_party/kineto/libkineto/src/CuptiRangeProfilerApi.h:12:10: fatal error: cupti_profiler_target.h: No such file or directory
#include <cupti_profiler_target.h>
^~~~~~~~~~~~~~~~~~~~~~~~~
compilation terminated.
```
and
```
/scratch/aaronshi/pytorch/third_party/kineto/libkineto/src/CuptiRangeProfilerApi.cpp:7:10: fatal error: nvperf_host.h: No such file or directory
#include <nvperf_host.h>
^~~~~~~~~~~~~~~
compilation terminated.
```
Reviewed By: briancoutinho
Differential Revision: D34191123
Pulled By: aaronenyeshi
fbshipit-source-id: d84f80308c9939ba8ed504e667847d136a261453
(cherry picked from commit 33368bd93b)
Summary:
`include_directories` is old-style CMake which adds the include path to every file being compiled. This instead makes `python`, `numpy` and `pybind11` into targets that only `torch_python` and `caffe2_pybind_state` are linked to. So, python libraries can't be accidentally included elsewhere.
Resubmit of https://github.com/pytorch/pytorch/issues/65654, Closes https://github.com/pytorch/pytorch/issues/65828
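Schematically, the target-based pattern looks like the following (the interface target name and include variables are illustrative, not the exact ones used in the PR):
```cmake
# Assumes find_package(Python)/pybind11 discovery ran earlier.
# An interface target carries the Python/NumPy/pybind11 include paths...
add_library(python_headers INTERFACE)
target_include_directories(python_headers INTERFACE
  ${Python_INCLUDE_DIRS}
  ${NUMPY_INCLUDE_DIR}
  ${pybind11_INCLUDE_DIRS})

# ...and only the binding targets consume it, so other targets never see
# those paths on their compile lines.
target_link_libraries(torch_python PRIVATE python_headers)
target_link_libraries(caffe2_pybind_state PRIVATE python_headers)
```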
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69085
Reviewed By: anjali411
Differential Revision: D33776456
Pulled By: malfet
fbshipit-source-id: 018b0f6cd5a4f8c9e36df961deff832bc4afd479
(cherry picked from commit 57063107d6)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69216
This cleans up 4 pre-processor defines not used by any code:
- HAVE_GCC_GET_CPUID
- USE_GCC_GET_CPUID
- USE_AVX
- USE_AVX2
`cpuid` isn't used in PyTorch any more, we only use `cpuinfo`.
`USE_AVX*` is also not used, instead `HAVE_*_CPU_DEFINITIONS` tells
you which `CPU_CAPABILITY` flags are being compiled.
There is also `fbgemm`'s code path adding `third_party` as an include
path, despite `fbgemm` having a dedicated include directory and a
CMake setup that properly includes it.
Test Plan: Imported from OSS
Reviewed By: albanD
Differential Revision: D33794424
Pulled By: malfet
fbshipit-source-id: 99d504af088818d4a26c2f6ce67ec0d59a5eb703
(cherry picked from commit 2e099d41f0)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69216
Currently `torch_cpu` has command line arguments relating to cuda
libraries e.g. `-DMAGMA_V2`. This happens because
`include_directories` and `add_definitions` indiscriminately change
the compile commands of all targets.
Instead, creating a proper magma target allows limiting the flags to
just `torch_cuda`.
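An illustrative sketch of that approach (the target name and properties are assumptions):
```cmake
# Wrap MAGMA in an imported interface target instead of mutating global state.
add_library(torch::magma INTERFACE IMPORTED)
set_target_properties(torch::magma PROPERTIES
  INTERFACE_INCLUDE_DIRECTORIES "${MAGMA_INCLUDE_DIR}"
  INTERFACE_LINK_LIBRARIES "${MAGMA_LIBRARIES}"
  INTERFACE_COMPILE_DEFINITIONS "MAGMA_V2")

# Only torch_cuda sees -DMAGMA_V2 and the MAGMA include path.
target_link_libraries(torch_cuda PRIVATE torch::magma)
```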
Test Plan: Imported from OSS
Reviewed By: dagitses
Differential Revision: D33794174
Pulled By: malfet
fbshipit-source-id: 762eabf3b9576bef94e8caa3ed4764c0e2c72b08
(cherry picked from commit f7d127b654)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70201
Included functions:
- save_mobile_module: saves a mobile::Module to flatbuffer
- load_mobile_module_from_file: loads a flatbuffer into mobile::Module
- parse_mobile_module: parses from bytes or deserialized flatbuffer module object
Compared to previous attempts, this diff only adds flatbuffer to cmake target and leaves fbcode/xplat ones unchanged.
Test Plan: unittest
Reviewed By: malfet, gmagogsfm
Differential Revision: D33239362
fbshipit-source-id: b9ca36b83d6af2d78cc50b9eb9e2a6fa7fce0763
Summary:
https://github.com/pytorch/pytorch/issues/66406
Implemented z arch 14/15 vector SIMD additions.
So far, all types besides bfloat have their SIMD implementation.
It has 99% coverage and is currently passing the local tests.
The code is concise and the main SIMD file is only one header file.
It mostly uses template metaprogramming, but a few macros are still left with the intention of not modifying PyTorch much.
Sleef supports z15
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66407
Reviewed By: mrshenli
Differential Revision: D33370163
Pulled By: malfet
fbshipit-source-id: 0e5a57f31b22a718cd2a9ac59753fb468cdda140
Summary:
This PR is to update PyTorch with the following cub changes:
- Starting cub 1.13.1, cub requires users to define `CUB_NS_QUALIFIER` if `CUB_NS_PREFIX` is also defined. Besides that, a new mechanism `CUB_WRAPPED_NAMESPACE` is added.
And I make the following changes to PyTorch:
- Starting CUDA 11.5, define `CUB_WRAPPED_NAMESPACE` globally as an nvcc flag.
- Fix caffe2 failures caused by the above change.
- Add an `aten/src/ATen/cuda/cub_definitions.cuh` that defines helper macros about feature availability.
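The global flag could be wired roughly like this (the version guard and the namespace value shown are assumptions):
```cmake
# Wrap all cub symbols in a PyTorch-specific namespace on CUDA >= 11.5.
if(CUDA_VERSION VERSION_GREATER_EQUAL 11.5)
  list(APPEND CUDA_NVCC_FLAGS "-DCUB_WRAPPED_NAMESPACE=at_cuda_detail")
endif()
```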
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66219
Reviewed By: bdhirsh
Differential Revision: D31626931
Pulled By: ngimel
fbshipit-source-id: 97ebf5ef671ade8bf46d0860edc317f22660f26d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65401
Per https://github.com/pytorch/pytorch/issues/57744 statically linked CUPTI
causes exception handling to break on certain compiler configurations, likely
because CUPTI comes with incompatible libstdc++ symbols. Rather than pray that
something reasonable happens, use the safer configuration (dynamic linking) by
default and give a warning if the user inverts the setting.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: gdankel
Differential Revision: D31082208
Pulled By: ezyang
fbshipit-source-id: 14f66af920847e158436b5801c43f3124b109b34
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62445
PyTorch currently uses the old style of compiling CUDA in CMake which is just a
bunch of scripts in `FindCUDA.cmake`. Newer versions support CUDA natively as
a language just like C++ or C.
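For contrast, a minimal example of the new-style native CUDA support (a generic sketch, not PyTorch's actual build files):
```cmake
cmake_minimum_required(VERSION 3.18)
project(example LANGUAGES CXX CUDA)

# CUDA sources are compiled by CMake's first-class CUDA support; no FindCUDA scripts.
add_library(my_kernels STATIC kernels.cu)
set_target_properties(my_kernels PROPERTIES CUDA_SEPARABLE_COMPILATION ON)
target_compile_features(my_kernels PUBLIC cuda_std_14)
```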
Test Plan: Imported from OSS
Reviewed By: ejguan
Differential Revision: D31503350
fbshipit-source-id: 2ee817edc9698531ae1b87eda3ad271ee459fd55
Summary:
`include_directories` is old-style CMake which adds the include path to every file being compiled. This instead makes python, numpy and pybind11 into targets that only torch_python and caffe2_pybind_state are linked to. So, python libraries can't be accidentally included elsewhere.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65654
Reviewed By: gchanan
Differential Revision: D31193205
Pulled By: malfet
fbshipit-source-id: 5c1b554a59d0e441a701a04ebb62f0032d38b208
Summary:
The library will no longer link properly on VS 2019 (14.29.30133). To
ensure that engineers building on Windows can use and debug with this
build type, incremental linking needs to be turned off for it.
Verified that this build type successfully builds, links, and provides
debuggable Python modules on Windows.
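A sketch of turning incremental linking off for one configuration on MSVC (the configuration named below is an illustrative assumption):
```cmake
if(MSVC)
  # Assume the affected build type is RelWithDebInfo; append /INCREMENTAL:NO to its link flags.
  string(APPEND CMAKE_EXE_LINKER_FLAGS_RELWITHDEBINFO " /INCREMENTAL:NO")
  string(APPEND CMAKE_SHARED_LINKER_FLAGS_RELWITHDEBINFO " /INCREMENTAL:NO")
endif()
```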
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64892
Reviewed By: jbschlosser
Differential Revision: D30902565
Pulled By: malfet
fbshipit-source-id: e5286a4c6f45c7cbe4cdc1b98560129bd386970b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63714
PocketFFT was disabled for CMake < 3.9, but CMake 3.11 is the first version to support `INCLUDE_DIRECTORIES` as a target property. So updating to CMake 3.10 causes the mobile builds to fail. Instead of limiting the CMake support, this just adds the include directory to the entire target.
Test Plan: Imported from OSS
Reviewed By: bdhirsh
Differential Revision: D30498369
Pulled By: malfet
fbshipit-source-id: 83372e29c477c97e7015763b7c29d6d7e456bcef
Summary:
We currently build breakpad from [this fork](https://github.com/driazati/breakpad) to include extra logic to restore signal handlers that were previously present. With some [new additions](https://github.com/google/breakpad/compare/main...driazati:main) this fork now includes a CMake based build, so we can add breakpad as a proper dependency rather than rely on including it in Docker images as a system library which is error prone (we have a bunch of images) and hard to extend to MacOS / Windows. This also includes some changes to the crash handling code to support MacOS / Windows in a similar way to Linux.
```python
import torch
# On Windows this writes crashes to C:\Users\<user>\AppData\pytorch_crashes
# On MacOS/Linux this writes crashes to /tmp/pytorch_crashes
torch.utils._crash_handler.enable_minidumps()
# Easy way to cause a segfault and trigger the handler
torch.bincount(input=torch.tensor([9223372036854775807]))
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63186
Reviewed By: malfet, seemethere
Differential Revision: D30318404
Pulled By: driazati
fbshipit-source-id: 0d7daf3701cfaba5451cc529a0730272ab1eb1dc
Summary:
Using https://github.com/mreineck/pocketfft
Also delete explicit installation of pocketfft during the build as it will be available via submodule
Limit PocketFFT support to cmake-3.10 or newer, as `set_source_files_properties` does not seem to work as expected with cmake-3.5
Partially addresses https://github.com/pytorch/pytorch/issues/62821
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62841
Reviewed By: seemethere
Differential Revision: D30140441
Pulled By: malfet
fbshipit-source-id: d1a1cf1b43375321f5ec5b3d0b538f58082f7825
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62419
This diff adds support for the CPU-only Kineto profiler on mobile, thus
enabling Chrome trace generation on mobile. This brings the C++ API for
mobile profiling on par with TorchScript.
This is done via:
1. Utilizing debug handle annotations in KinetoEvent.
2. Adding post-processing capability, via callbacks, to
KinetoThreadLocalState.
3. Creating a new RAII-style profiler, KinetoEdgeCPUProfiler, which can be
used in the surrounding scope of model execution. This will write the Chrome
trace to the location specified in the profiler constructor.
Test Plan:
MobileProfiler.ModuleHierarchy
Imported from OSS
Reviewed By: raziel
Differential Revision: D29993660
fbshipit-source-id: 0b44f52f9e9c5f5aff81ebbd9273c254c3c03299
Summary:
- HIP_VERSION semantic versioning will change in ROCm 4.3. The changes essentially remove the dependency on HIP_VERSION provided in the hip header to keep code compatible with older and newer versions of ROCm.
- TORCH_HIP_VERSION is derived from HIP_VERSION_MAJOR and HIP_VERSION_MINOR
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62786
Reviewed By: bdhirsh
Differential Revision: D30281682
Pulled By: seemethere
fbshipit-source-id: e41e69fb9e13de5ddd1af99ba5bbdcbb7b64b673
Summary:
BLAS library is found by cmake/Dependencies.cmake and then
LAPACK library is found by FindLAPACK.cmake which in turn calls
FindBLAS.cmake. This means that we are searching for BLAS twice
and they might be different things. By setting a few variables,
this can be avoided.
cc seemethere
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49647
Reviewed By: seemethere, ejguan
Differential Revision: D29943680
Pulled By: malfet
fbshipit-source-id: 3cbc350ea645a1a28dd92c19e5ee7f9eecdeff59
Summary:
This PR: (1) enables the use of a system-provided Intel TBB for building PyTorch, (2) removes `tbb:task_scheduler_init` references since it has been removed from TBB a while ago (3) marks the implementation of `_internal_set_num_threads` with a TODO as it requires a revision that fixes its thread allocation logic.
Tested with `test/run_test`; no new tests are introduced since there are no behavioral changes (removal of `tbb::task_scheduler_init` has no impact on the runtime behavior).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61934
Reviewed By: malfet
Differential Revision: D29805416
Pulled By: cbalioglu
fbshipit-source-id: 22042b428b57b8fede9dfcc83878d679a19561dd
Summary:
This PR deletes some code in `MiscCheck.cmake` that performs the exact
same functionality as `FindAVX.cmake`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61748
Reviewed By: ejguan
Differential Revision: D29791282
Pulled By: malfet
fbshipit-source-id: 6595fd1b61c8ae12b821fad8c9a34892dd52d213
Summary:
Not sure why (maybe from dependencies?), but it can certainly break package lookup upon re-entry into cmake.
So instead of checking whether the variables are defined, we should check whether they hold any meaningful value.
Fixes https://github.com/pytorch/pytorch/issues/59887
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61230
Reviewed By: H-Huang
Differential Revision: D29668766
Pulled By: malfet
fbshipit-source-id: 79a59578740c4434327aff4f9a22eba9c4bf48d1
Summary:
Needed on platforms that do not have MKL, such as aarch64 and M1
- Add `AT_POCKETFFT_ENABLED()` to Config.h.in
- Introduce torch._C.has_spectral that is true if PyTorch was compiled with either MKL or PocketFFT
- Modify spectral test to use skipCPUIfNoFFT instead of skipCPUIfNoMKL
Share implementation of `_out` functions as well as fft_fill_with_conjugate_symmetry_stub between MKL and PocketFFT implementations
Fixes https://github.com/pytorch/pytorch/issues/41592
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60976
Reviewed By: walterddr, driazati, janeyx99, samestep
Differential Revision: D29466530
Pulled By: malfet
fbshipit-source-id: ac5edb3d40e7c413267825f92a5e8bc4bb249caf
Summary:
This is only important for builds where cuDNN is linked statically into libtorch_cpu.
Before this PR, PyTorch wheels often accidentally contained several partial copies of the cudnn_static library.
Splitting the interface into a header-only part (cudnn-public) and a library+headers part (cudnn-private) prevents that from happening.
This is a preliminary step towards enabling optional whole-archive linking of the cudnn library to work around the issue reported in https://github.com/pytorch/pytorch/issues/50153
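Schematically, the split could look like this (plain target names taken from the summary; properties shown are assumptions):
```cmake
# Header-only interface: anything that merely includes cudnn.h depends on this.
add_library(cudnn-public INTERFACE IMPORTED)
set_target_properties(cudnn-public PROPERTIES
  INTERFACE_INCLUDE_DIRECTORIES "${CUDNN_INCLUDE_DIR}")

# Headers plus the static library: linked only where cuDNN symbols are needed,
# so the static archive is pulled in exactly once.
add_library(cudnn-private INTERFACE IMPORTED)
set_target_properties(cudnn-private PROPERTIES
  INTERFACE_INCLUDE_DIRECTORIES "${CUDNN_INCLUDE_DIR}"
  INTERFACE_LINK_LIBRARIES "${CUDNN_STATIC_LIBRARY}")
```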
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59721
Reviewed By: ngimel
Differential Revision: D29000967
Pulled By: malfet
fbshipit-source-id: f054df92b265e9494076ab16c247427b39da9336
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57080
The ONNX optimizer is removed in ONNX 1.9.
This PR removes the ONNX optimizer from the C++ code path and uses a `try-except` block in Python to keep it compatible with both ONNX 1.8 and 1.9.
Test Plan: Imported from OSS
Reviewed By: heitorschueroff
Differential Revision: D28467330
Pulled By: malfet
fbshipit-source-id: 5e4669dd0537648898e593f9e253da18d6dc7568
Co-authored-by: neginraoof <neginmr@utexas.edu>
Co-authored-by: Nikita Shulga <nshulga@fb.com>
Summary:
Fixes upcoming changes that are part of ROCm 4.2 and affect PyTorch JIT.
- ROCM_VERSION macro must be available to both device and host compilation passes.
- Unifies some of the differences between CUDA and HIP in the generated code.
- NAN / POS_INFINITY / NEG_INFINITY
- Do not hipify `extern __shared__` -> `HIP_DYNAMIC_SHARED()` macro [deprecated]
- Differentiates bf16 codegen for HIP.
- Optionally provides missing macros when using hiprtc precompiled header feature.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57400
Reviewed By: ejguan
Differential Revision: D28421065
Pulled By: malfet
fbshipit-source-id: 215f476773c61d8b0d9d148a4e5f5d016f863074
Summary:
To make the build behaviour aligned with other third_party/ libraries,
introduce the `USE_SYSTEM_PYBIND11` build option, which is set to OFF by
default; this means PyTorch will be built with the bundled pybind11 even if
another version is already installed locally.
Fixes https://github.com/pytorch/pytorch/issues/58750
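A minimal sketch of the option and selection logic (the exact wiring in Dependencies.cmake may differ):
```cmake
option(USE_SYSTEM_PYBIND11 "Use a system-installed pybind11 instead of the bundled submodule" OFF)

if(USE_SYSTEM_PYBIND11)
  find_package(pybind11 CONFIG REQUIRED)   # locally installed copy
else()
  add_subdirectory(third_party/pybind11)   # bundled submodule (default)
endif()
```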
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58951
Reviewed By: driazati
Differential Revision: D28690411
Pulled By: malfet
fbshipit-source-id: e56b5a8f2a23ee1834b2a6d3807f287149decf8c
Summary:
This PR is step 0 of adding PyTorch convolution bindings using the cuDNN frontend. The cuDNN frontend is the recommended way of using cuDNN v8 API. It is supposed to have faster release cycles, so that, for example, if people find a specific kernel has a bug, they can report it, and that kernel will be blocked in the cuDNN frontend and frameworks could just update that submodule without the need for waiting for a whole cuDNN release.
The work is not complete, and this PR is only step 0.
**What this PR does:**
- Add cudnn-frontend as a submodule.
- Modify cmake to build that submodule.
- Add bindings for convolution forward in `Conv_v8.cpp`, which is disabled by a macro by default.
- Tested manually by enabling the macro and run `test_nn.py`. All tests pass except those mentioned below.
**What this PR doesn't:**
- Only convolution forward, no backward. The backward will use v7 API.
- No 64bit-indexing support for some configurations. This is a known issue of cuDNN and will be fixed in a later cuDNN version. PyTorch will not implement any workaround for this issue; instead, the v8 API should be disabled on problematic cuDNN versions.
- No test beyond PyTorch's unit tests.
- Not tested for correctness on real models.
- Not benchmarked for performance.
- Benchmark cache is not thread-safe. (This is marked as `FIXME` in the code, and will be fixed in a follow-up PR)
- cuDNN benchmark is not supported.
- There are failing tests, which will be resolved later:
```
FAILED test/test_nn.py::TestNNDeviceTypeCUDA::test_conv_cudnn_nhwc_cuda_float16 - AssertionError: False is not true : Tensors failed to compare as equal!With rtol=0.001 and atol=1e-05, found 32 element(s) (out of 32) whose difference(s) exceeded the margin of error (in...
FAILED test/test_nn.py::TestNNDeviceTypeCUDA::test_conv_cudnn_nhwc_cuda_float32 - AssertionError: False is not true : Tensors failed to compare as equal!With rtol=1.3e-06 and atol=1e-05, found 32 element(s) (out of 32) whose difference(s) exceeded the margin of error (...
FAILED test/test_nn.py::TestNNDeviceTypeCUDA::test_conv_large_cuda - RuntimeError: CUDNN_BACKEND_OPERATION: cudnnFinalize Failed cudnn_status: 9
FAILED test/test_nn.py::TestNN::test_Conv2d_depthwise_naive_groups_cuda - AssertionError: False is not true : Tensors failed to compare as equal!With rtol=0 and atol=1e-05, found 64 element(s) (out of 64) whose difference(s) exceeded the margin of error (including 0 an...
FAILED test/test_nn.py::TestNN::test_Conv2d_deterministic_cudnn - RuntimeError: not supported yet
FAILED test/test_nn.py::TestNN::test_ConvTranspose2d_groups_cuda_fp32 - RuntimeError: cuDNN error: CUDNN_STATUS_BAD_PARAM
FAILED test/test_nn.py::TestNN::test_ConvTranspose2d_groups_cuda_tf32 - RuntimeError: cuDNN error: CUDNN_STATUS_BAD_PARAM
```
Although this is not a complete implementation of cuDNN v8 API binding, I still want to merge this first. This would allow me to do small and incremental work, for the ease of development and review.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51390
Reviewed By: malfet
Differential Revision: D28513167
Pulled By: ngimel
fbshipit-source-id: 9cc20c9dec5bbbcb1f94ac9e0f59b10c34f62740
Summary:
Expanding support to all builds
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56323
Test Plan: CI
Reviewed By: malfet
Differential Revision: D28171478
Pulled By: ilia-cher
fbshipit-source-id: 16bc752d1be3cbaeda5316f5d8a687ae05a83d22
Summary:
This adds some more compiler warning ignores for everything that happens in a standard CPU build (CUDA builds still have a bunch of warnings, so we can't turn on `-Werror` everywhere yet).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56630
Pulled By: driazati
Reviewed By: malfet
Differential Revision: D28005063
fbshipit-source-id: 541ed415eb0470ddf7e08c22c5eb6da9db26e9a0
Summary:
[distutils](https://docs.python.org/3/library/distutils.html) is on its way out and will be deprecated-on-import for Python 3.10+ and removed in Python 3.12 (see [PEP 632](https://www.python.org/dev/peps/pep-0632/)). There's no reason for us to keep it around since all the functionality we want from it can be found in `setuptools` / `sysconfig`. `setuptools` includes a copy of most of `distutils` (which is fine to use according to the PEP), that it uses under the hood, so this PR also uses that in some places.
Fixes #56527
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57040
Pulled By: driazati
Reviewed By: nikithamalgifb
Differential Revision: D28051356
fbshipit-source-id: 1ca312219032540e755593e50da0c9e23c62d720
Summary:
Revert "Revert D27449031 (2a7df657fe): [pytorch][PR] [ROCm] use hiprtc precompiled header". Reland PR https://github.com/pytorch/pytorch/issues/54350.
This reverts commit 204ac21bf1.
The original PR was reverted under suspicion that it was causing CI instability, but it was instead due to a hardware failure.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55965
Reviewed By: jbschlosser
Differential Revision: D27755907
Pulled By: malfet
fbshipit-source-id: 75bf0b9d888df3dee62f00a366b1123757e0474e
Summary:
Many model pipelines/workflows don't use MAGMA even though it is included in the build by default. Leaving MAGMA kernels out of the build can save 60+MB of GPU memory when loading `libtorch_cuda.so` (tested on V100, current upstream master).
A current sharp corner of this flag is that toggling it when rebuilding requires `torch/include/THC/THCGeneral.h` to be *manually* deleted by the user, as even running `make clean` or `setup.py` with `--cmake` does not properly regenerate it with the appropriate substitution for `#cmakedefine USE_MAGMA`. Is there a way to force the regeneration of the header during a rebuild?
CC malfet ptrblck
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55994
Reviewed By: mruberry
Differential Revision: D27766287
Pulled By: malfet
fbshipit-source-id: 93deca57befa0febb9c5b7875ecf0015c547d421
Summary:
HIP's runtime compiler (hiprtc) is adding support for precompiled HIP headers in the ROCm 4.2 release. Conditionally add support for this feature. Using this feature will improve the ROCm torch wheel user experience; users will no longer need to install HIP headers separately to use torch JIT features.
The use of this feature is conditionalized on a new ROCM_VERSION macro.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54350
Reviewed By: H-Huang
Differential Revision: D27449031
Pulled By: malfet
fbshipit-source-id: 81a8d7847a47ce2bb253d1ea58740ef66ed154a3
Summary:
These changes provide the user with an additional option to choose the DNNL+BLIS path for PyTorch.
This assumes BLIS is already downloaded or built from source and the necessary library file is available at the location: $BLIS_HOME/lib/libblis.so and include files are available at: $BLIS_HOME/include/blis/blis.h and $BLIS_HOME/include/blis/cblas.h
Export the below variables to build PyTorch with MKLDNN+BLIS and proceed with the regular installation procedure as below:
```
$ export BLIS_HOME=path-to-BLIS
$ export PATH=$BLIS_HOME/include/blis:$PATH LD_LIBRARY_PATH=$BLIS_HOME/lib:$LD_LIBRARY_PATH
$ export BLAS=BLIS USE_MKLDNN_CBLAS=ON WITH_BLAS=blis
$ python setup.py install
```
CPU only Dockerfile to build PyTorch with AMD BLIS is available at : docker/cpu-blis/Dockerfile
Example command line to build using the Dockerfile:
```
$ sudo DOCKER_BUILDKIT=1 docker build . -t docker-image-repo-name
```
Example command line to run the built docker container:
```
$ sudo docker run --name container-name -it docker-image-repo-name
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54953
Reviewed By: glaringlee
Differential Revision: D27466799
Pulled By: malfet
fbshipit-source-id: e03bae9561be3a67429df3b1be95a79005c63050
Summary:
Fixes the build of projects that depend on torch, such as torchaudio. Otherwise torchaudio will complain that gloo_hip is missing.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54727
Reviewed By: H-Huang
Differential Revision: D27361513
Pulled By: ezyang
fbshipit-source-id: 714cc2db23e7adf3e89303e941b78c27625b9460
Summary:
This PR is a follow up to https://github.com/pytorch/pytorch/pull/53408.
It only loads hipfft if the version is ROCm 4.1 or later and stops loading rocfft. This was done to resolve some issues observed in our internal CI due to conflicts.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54349
Reviewed By: ezyang
Differential Revision: D27374252
Pulled By: ngimel
fbshipit-source-id: 724e80df5011ea8fabd81739e18ae8a13d3a7ea0
Summary:
This PR makes changes to how hipfft is loaded in PyTorch. hipfft is packaged in a library separate from rocfft starting with ROCm 4.1.
We check the ROCm version, and if it is past ROCm 4.1, we load hipfft in addition to rocfft.
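Roughly, the version gate could look like the following (the version variable and package lookups are assumptions):
```cmake
find_package(rocfft REQUIRED)
# hipfft ships as its own library starting with ROCm 4.1.
if(ROCM_VERSION_DEV VERSION_GREATER_EQUAL "4.1.0")
  find_package(hipfft REQUIRED)
endif()
```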
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53408
Reviewed By: albanD
Differential Revision: D26952702
Pulled By: malfet
fbshipit-source-id: f42be304b587c060816e39d36f5c1a2cdc37bfab
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53174
Enable Kineto also in the CPU builds (non-mobile, non-Windows at the moment)
Test Plan: CI
Reviewed By: gdankel
Differential Revision: D26776112
Pulled By: ilia-cher
fbshipit-source-id: 8733f65c2993105136c853f2a7b6e497d0fa53bf
Summary:
Fixes https://github.com/pytorch/pytorch/issues/48831.
- CI image is updated to build hipMAGMA from source and set env MAGMA_HOME.
- CMake is updated to separate different requirements for CUDA versus ROCm MAGMA.
- Some unit tests that become enabled with MAGMA are currently skipped for ROCm due to failures. Fixing these failures will be follow-on work.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51238
Reviewed By: ngimel
Differential Revision: D26184918
Pulled By: malfet
fbshipit-source-id: ada632f1ae7b413e8cae6543fe931dcd46985821
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50760
The SHM transport uses shared-memory-backed ringbuffers to transfer small payloads between processes on the same machine.
It was disabled in v1.6 due to a CMake mishap but we've since realized that it also doesn't work that well in docker and other setups. Enabling it here to see whether CircleCI fails.
ghstack-source-id: 120470890
Test Plan: Exported three times to CircleCI with tests consistently passing
Reviewed By: mrshenli
Differential Revision: D23814828
fbshipit-source-id: f355cb6515776debad536924de4f4d3fbb05a874
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49201
This unblocks kineto profiler for 1.8 release.
This PR supercedes https://github.com/pytorch/pytorch/pull/48391
Note: this will somewhat increase the size of linux server binaries, because
we add libkineto.a and libcupti_static.a:
-rw-r--r-- 1 jenkins jenkins 1107502 Dec 10 21:16 build/lib/libkineto.a
-rw-r--r-- 1 root root 13699658 Nov 13 2019 /usr/local/cuda/lib64/libcupti_static.a
Test Plan:
CI
https://github.com/pytorch/pytorch/pull/48391
Imported from OSS
Reviewed By: ngimel
Differential Revision: D25480770
fbshipit-source-id: 037cd774f5547d9918d6055ef5cc952a54e48e4c
Summary:
### Pytorch Vec256 ppc64le support
implemented types:
- double
- float
- int16
- int32
- int64
- qint32
- qint8
- quint8
- complex_float
- complex_double
Notes:
All basic vector operations are implemented:
There are a few problems:
- minimum/maximum NaN propagation for ppc64le is missing and was not checked
- complex multiplication, division, sqrt, abs are implemented like the PyTorch x86 versions. They can overflow and have more precision problems than the std ones. That's why they were either excluded or tested in a smaller domain range
- precision of the implemented float math functions
~~Besides, I added CPU_CAPABILITY for power, but because of quantization errors for DEFAULT I had to undef it and use vsx for DEFAULT too~~
#### Details
##### Supported math functions
+ plus sign means vectorized, - minus sign means missing, (implementation notes are added inside braces)
(notes). Example: -(both ) means it was also missing on x86 side
g( func_name) means vectorization is using func_name
sleef - redirected to the Sleef
unsupported
function_name | float | double | complex float | complex double
|-- | -- | -- | -- | --|
acos | sleef | sleef | f(asin) | f(asin)
asin | sleef | sleef | +(pytorch impl) | +(pytorch impl)
atan | sleef | sleef | f(log) | f(log)
atan2 | sleef | sleef | unsupported | unsupported
cos | +((ppc64le:avx_mathfun) ) | sleef | -(both) | -(both)
cosh | f(exp) | -(both) | -(both) |
erf | sleef | sleef | unsupported | unsupported
erfc | sleef | sleef | unsupported | unsupported
erfinv | - (both) | - (both) | unsupported | unsupported
exp | + | sleef | - (x86:f()) | - (x86:f())
expm1 | f(exp) | sleef | unsupported | unsupported
lgamma | sleef | sleef | |
log | + | sleef | -(both) | -(both)
log10 | f(log) | sleef | f(log) | f(log)
log1p | f(log) | sleef | unsupported | unsupported
log2 | f(log) | sleef | f(log) | f(log)
pow | + f(exp) | sleef | -(both) | -(both)
sin | +((ppc64le:avx_mathfun) ) | sleef | -(both) | -(both)
sinh | f(exp) | sleef | -(both) | -(both)
tan | sleef | sleef | -(both) | -(both)
tanh | f(exp) | sleef | -(both) | -(both)
hypot | sleef | sleef | -(both) | -(both)
nextafter | sleef | sleef | -(both) | -(both)
fmod | sleef | sleef | -(both) | -(both)
[Vec256 Test cases Pr https://github.com/pytorch/pytorch/issues/42685](https://github.com/pytorch/pytorch/pull/42685)
Current list:
- [x] Blends
- [x] Memory: UnAlignedLoadStore
- [x] Arithmetics: Plus,Minu,Multiplication,Division
- [x] Bitwise: BitAnd, BitOr, BitXor
- [x] Comparison: Equal, NotEqual, Greater, Less, GreaterEqual, LessEqual
- [x] MinMax: Minimum, Maximum, ClampMin, ClampMax, Clamp
- [x] SignManipulation: Absolute, Negate
- [x] Interleave: Interleave, DeInterleave
- [x] Rounding: Round, Ceil, Floor, Trunc
- [x] Mask: ZeroMask
- [x] SqrtAndReciprocal: Sqrt, RSqrt, Reciprocal
- [x] Trigonometric: Sin, Cos, Tan
- [x] Hyperbolic: Tanh, Sinh, Cosh
- [x] InverseTrigonometric: Asin, ACos, ATan, ATan2
- [x] Logarithm: Log, Log2, Log10, Log1p
- [x] Exponents: Exp, Expm1
- [x] ErrorFunctions: Erf, Erfc, Erfinv
- [x] Pow: Pow
- [x] LGamma: LGamma
- [x] Quantization: quantize, dequantize, requantize_from_int
- [x] Quantization: widening_subtract, relu, relu6
Missing:
- [ ] Constructors, initializations
- [ ] Conversion , Cast
- [ ] Additional: imag, conj, angle (note: imag and conj only checked for float complex)
#### Notes on tests and testing framework
- some math functions are tested within domain range
- mostly testing framework randomly tests against std implementation within the domain or within the implementation domain for some math functions.
- some functions are tested against the local version. ~~For example, std::round and vector version of round differs. so it was tested against the local version~~
- round was tested against pytorch at::native::round_impl. ~~for double type on **Vsx vec_round failed for (even)+0 .5 values**~~ . it was solved by using vec_rint
- ~~**complex types are not tested**~~ **After enabling complex testing due to precision and domain some of the complex functions failed for vsx and x86 avx as well. I will either test it against local implementation or check within the accepted domain**
- ~~quantizations are not tested~~ Added tests for quantizing, dequantize, requantize_from_int, relu, relu6, widening_subtract functions
- the testing framework should be improved further
- ~~For now `-DBUILD_MOBILE_TEST=ON `will be used for Vec256Test too~~
Vec256 Test cases will be built for each CPU_CAPABILITY
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41541
Reviewed By: zhangguanheng66
Differential Revision: D23922049
Pulled By: VitalyFedyunin
fbshipit-source-id: bca25110afccecbb362cea57c705f3ce02f26098
Summary:
[Refiled version of earlier PR https://github.com/pytorch/pytorch/issues/45451]
This PR revamps the hipify module in PyTorch to overcome a long list of shortcomings in the original implementation. However, these improvements are applied only when using hipify to build PyTorch extensions, not for PyTorch or Caffe2 itself.
Correspondingly, changes are made to cpp_extension.py to match these improvements.
The list of improvements to hipify is as follows:
1. Hipify files in the same directory as the original file, unless there's a "cuda" subdirectory in the original file path, in which case the hipified file will be in the corresponding file path with "hip" subdirectory instead of "cuda".
2. Never hipify the file in-place if changes are introduced due to hipification i.e. always ensure the hipified file either resides in a different folder or has a different filename compared to the original file.
3. Prevent re-hipification of already hipified files. This avoids creation of unnecessary "hip/hip" etc. subdirectories and additional files which have no actual use.
4. Do not write out hipified versions of files if they are identical to the original file. This results in a cleaner output directory, with minimal number of hipified files created.
5. Update header rewrite logic so that it accounts for the previous improvement.
6. Update header rewrite logic so it respects the rules for finding header files depending on whether "" or <> is used.
7. Return a dictionary of mappings of original file paths to hipified file paths from hipify function.
8. Introduce a version for hipify module to allow extensions to contain back-compatible code that targets a specific point in PyTorch where the hipify functionality changed.
9. Update cuda_to_hip_mappings.py to account for the ROCm component subdirectories inside /opt/rocm/include. This also results in cleanup of the Caffe2_HIP_INCLUDE path to remove unnecessary additions to the include path.
The list of changes to cpp_extension.py is as follows:
1. Call hipify when building a CUDAExtension for ROCm.
2. Prune the list of source files to CUDAExtension to include only the hipified versions of any source files in the list (if both original and hipified versions of the source file are in the list)
3. Add subdirectories of /opt/rocm/include to the include path for extensions, so that ROCm headers for subcomponent libraries are found automatically
cc jeffdaily sunway513 ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48715
Reviewed By: bdhirsh
Differential Revision: D25272824
Pulled By: ezyang
fbshipit-source-id: 8bba68b27e41ca742781e1c4d7b07c6f985f040e
Summary:
Improves support for rocgdb when setting DEBUG=1 and building for ROCm.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46717
Reviewed By: mrshenli
Differential Revision: D25171544
Pulled By: malfet
fbshipit-source-id: b4699ba2277dcb89f07efb86f7153fae82a80dc3
Summary:
This PR revamps the hipify module in PyTorch to overcome a long list of shortcomings in the original implementation. However, these improvements are applied only when using hipify to build PyTorch extensions, **not for PyTorch or Caffe2 itself**.
Correspondingly, changes are made to `cpp_extension.py` to match these improvements.
The list of improvements to hipify is as follows:
1. Hipify files in the same directory as the original file, unless there's a "cuda" subdirectory in the original file path, in which case the hipified file will be in the corresponding file path with "hip" subdirectory instead of "cuda".
2. Never hipify the file in-place if changes are introduced due to hipification i.e. always ensure the hipified file either resides in a different folder or has a different filename compared to the original file.
3. Prevent re-hipification of already hipified files. This avoids creation of unnecessary "hip/hip" etc. subdirectories and additional files which have no actual use.
4. Do not write out hipified versions of files if they are identical to the original file. This results in a cleaner output directory, with minimal number of hipified files created.
5. Update header rewrite logic so that it accounts for the previous improvement.
6. Update header rewrite logic so it respects the rules for finding header files depending on whether `""` or `<>` is used.
7. Return a dictionary of mappings of original file paths to hipified file paths from `hipify` function.
8. Introduce a version for hipify module to allow extensions to contain back-compatible code that targets a specific point in PyTorch where the hipify functionality changed.
9. Update `cuda_to_hip_mappings.py` to account for the ROCm component subdirectories inside `/opt/rocm/include`. This also results in cleanup of the `Caffe2_HIP_INCLUDE` path to remove unnecessary additions to the include path.
The list of changes to `cpp_extension.py` is as follows:
1. Call `hipify` when building a CUDAExtension for ROCm.
2. Prune the list of source files to CUDAExtension to include only the hipified versions of any source files in the list (if both original and hipified versions of the source file are in the list)
3. Add subdirectories of /opt/rocm/include to the include path for extensions, so that ROCm headers for subcomponent libraries are found automatically
cc jeffdaily sunway513 hgaspar lcskrishna ashishfarmer
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45451
Reviewed By: ezyang
Differential Revision: D24924736
Pulled By: malfet
fbshipit-source-id: 4af42b8ff4f21c3782dedb8719b8f9f86b34bd2d
Summary:
gcc-7.4.x or older fails to compile XNNPACK in debug mode with an internal compiler error.
Work around this in the build script by passing the -O1 optimisation flag to XNNPACK if compiled with older compilers.
Fixes https://github.com/pytorch/pytorch/issues/47292
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47805
Reviewed By: seemethere
Differential Revision: D24905758
Pulled By: malfet
fbshipit-source-id: 93f4e3b3b5c10b69734627c50e36b2eb544699c8
Summary:
This makes the command line shorter.
Also updates `randomtemp`, whose previous version had a limitation that the length of the argument could not exceed 260.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45842
Reviewed By: albanD
Differential Revision: D24137088
Pulled By: ezyang
fbshipit-source-id: f0b4240735306e302eb3887f54a2b7af83c9f5dc
Summary:
Do not add gencode flags to NVCC_FLAGS twice: they are first added in `cmake/public/cuda.cmake`, so there is no need to do it again in `cmake/Dependencies.cmake`.
Copy `additional_unittest_args` before appending local options to it in the `run_test()` method.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44414
Reviewed By: seemethere
Differential Revision: D23605733
Pulled By: malfet
fbshipit-source-id: 782a0da61650356a978a892fb03c66cb1a1ea26b
Summary:
PyTorch uses f-strings in its Python code.
Python support for f-strings started with version 3.6.
Using Python version 3.5 or older fails the build with the latest release/master.
This patch checks the version of the Python used for the build and mandates it to be 3.6 or higher.
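A sketch of such a guard (the actual check may live in setup.py or CMake and be written differently):
```cmake
find_package(Python3 COMPONENTS Interpreter REQUIRED)
if(Python3_VERSION VERSION_LESS "3.6")
  message(FATAL_ERROR
    "PyTorch requires Python 3.6 or newer (found ${Python3_VERSION}); "
    "f-strings in the code base are not supported by older interpreters.")
endif()
```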
Signed-off-by: Parichay Kapoor <kparichay@gmail.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43105
Reviewed By: glaringlee
Differential Revision: D23301481
Pulled By: malfet
fbshipit-source-id: e9b4f7bffce7384c8ade3b7d131b10cf58f5e8a0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42522
Main changes:
- Consolidated CMake files to have a single entry point, rather than having a specialized one for PyTorch.
- Changed the way the preprocessor flags are provided, and changed their name.
There were a few instances in PyTorch's CMake files where we were directly adding TensorPipe's source directory as an include path, which however doesn't contain the auto-generated header we now added. We fix that by adding the `tensorpipe` CMake target as a dependency, so that the include paths defined by TensorPipe are used, which contain that auto-generated header. So instead we link those targets to the tensorpipe target in order for them to pick up the correct include directories.
I'm turning off SHM and CMA for now because they have never been covered by the CI. I'll enable them in a separate PR so that if they turn out to be flaky we can revert that change without reverting this one.
Test Plan: CI
Reviewed By: malfet
Differential Revision: D22959472
fbshipit-source-id: 1959a41c4a66ef78bf0f3bd5e3964969a2a1bf67
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42225
Main changes:
- Consolidated CMake files to have a single entry point, rather than having a specialized one for PyTorch.
- Changed the way the preprocessor flags are provided, and changed their name.
There were a few instances in PyTorch's CMake files where we were directly adding TensorPipe's source directory as an include path, which however doesn't contain the auto-generated header we now added. We fix that by adding the `tensorpipe` CMake target as a dependency, so that the include paths defined by TensorPipe are used, which contain that auto-generated header.
I'm turning off SHM and CMA for now because they have never been covered by the CI. I'll enable them in a separate PR so that if they turn out to be flaky we can revert that change without reverting this one.
Test Plan: CircleCI is all green.
Reviewed By: beauby
Differential Revision: D22812445
fbshipit-source-id: e6d824bb28f5afe75fd765de0430968174f3531f
Summary:
As explained in https://github.com/pytorch/pytorch/issues/41922, using `if(NOT ${var})` is usually wrong and can lead to issues like https://github.com/pytorch/pytorch/issues/41922 where the condition is wrongly evaluated to FALSE instead of TRUE. Instead, the unevaluated variable name should be used in all cases; see the CMake docs for details.
This fixes the `NOT ${var}` cases by using a simple regexp replacement. It seems `pybind11_PREFER_third_party` is the only variable really prone to causing an issue, as all others are set. However, due to CMake evaluating unquoted strings in `if` conditions as variable names, I recommend never using unquoted `${var}` in an if condition. A similar regexp-based replacement could be done on the whole codebase, but as that makes a lot of changes I didn't include it now. Also, `if(${var})` will likely lead to a parser error if `var` is unset, instead of a wrong result.
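To illustrate the pitfall (variable name taken from the summary; values are illustrative):
```cmake
set(pybind11_PREFER_third_party OFF)

# Risky: the variable's VALUE is pasted into the condition before evaluation.
# If that value happens to be another variable's name it gets dereferenced
# again, and if the variable is unset the condition collapses to `if(NOT )`.
if(NOT ${pybind11_PREFER_third_party})
  message(STATUS "using system pybind11")
endif()

# Preferred: pass the unevaluated variable name and let if() look it up.
if(NOT pybind11_PREFER_third_party)
  message(STATUS "using system pybind11")
endif()
```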
Fixes https://github.com/pytorch/pytorch/issues/41922
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41924
Reviewed By: seemethere
Differential Revision: D22700229
Pulled By: mrshenli
fbshipit-source-id: e2b3466039e4312887543c2e988270547a91c439
Summary:
Add support for including pytorch via an add_subdirectory().
This requires using PROJECT_* instead of CMAKE_*, since the latter refer to
the top-most project including pytorch.
TEST=add_subdirectory() into a pytorch checkout and build.
There are still some hardcoded references to TORCH_SRC_DIR; I will
fix them in a follow-on commit. For now you can create a symlink to
<pytorch>/torch/ in your project.
Change-Id: Ic2a8aec3b08f64e2c23d9e79db83f14a0a896abc
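For illustration, a parent project could consume PyTorch like this (paths and target names are assumptions), which is exactly why `CMAKE_SOURCE_DIR` would resolve to the parent project rather than the pytorch checkout:
```cmake
cmake_minimum_required(VERSION 3.10)
project(superbuild LANGUAGES CXX)

# pytorch checked out (or symlinked) under third_party/pytorch
add_subdirectory(third_party/pytorch)

add_executable(my_app main.cpp)
target_link_libraries(my_app PRIVATE torch)
```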
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41387
Reviewed By: zhangguanheng66
Differential Revision: D22539944
Pulled By: ezyang
fbshipit-source-id: b7e9631021938255f0a6ea897a7abb061759093d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39341
This PR introduces neon backend for vec256 class for float datatype.
For now only aarch64 is enabled due to a few issues with enabling it on
32-bit ARM (aarch32).
Test Plan:
vec256_test
Imported from OSS
Differential Revision: D21822399
fbshipit-source-id: 3851c4336d93d1c359c85b38cf19904f82bc7b8d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40059
This benchmark is added specifically for mobile to see if the compiler is
auto-vectorizing, and thus whether the NEON backend for Vec256 gives no
advantage for the add op.
Test Plan:
CI
Imported from OSS
Differential Revision: D22055146
fbshipit-source-id: 43ba6c4ae57c6f05d84887c2750ce21ae1b0f0b5
Summary:
This avoids the following cmake warning (currently only a warning):
```
The dependency target "nccl_external" of target "gloo_cuda" does not exist.
Call Stack (most recent call first):
CMakeLists.txt:411 (include)
```
This will become a real problem once policy CMP0046 is set, which will turn this warning into an error.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41180
Differential Revision: D22460623
Pulled By: malfet
fbshipit-source-id: 0222b12b435e5e2fdf2bc85752f95abba1e3d4d5
Summary:
Previously it used the default arch set which may or may not coincide with the user's.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40170
Differential Revision: D22400866
Pulled By: xw285cornell
fbshipit-source-id: 222ba684782024fa68f37bf7d4fdab9a2389bdea
Summary:
This re-applies D21232894 (b9d3869df3) and D22162524, plus updates jni_deps in a few places
to avoid breaking host JNI tests.
Test Plan: `buck test @//fbandroid/mode/server //fbandroid/instrumentation_tests/com/facebook/caffe2:host-test`
Reviewed By: xcheng16
Differential Revision: D22199952
fbshipit-source-id: df13eef39c01738637ae8cf7f581d6ccc88d37d5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37243
*** Why ***
As it stands, we have two thread pool solutions concurrently in use in PyTorch mobile: (1) the open source pthreadpool library under third_party, and (2) Caffe2's implementation of pthreadpool under caffe2/utils/threadpool. Since the primary use-case of the latter has been to act as a drop-in replacement for the third party version so as to enable integration and usage from within NNPACK and QNNPACK, Caffe2's implementation is intentionally written to the exact same interface as the third party version.
The original argument in favor of C2's implementation has been improved performance as a result of using spin locks, as opposed to relinquishing the thread's time slot and putting it to sleep - a less expensive operation up to a point. That seems to have given C2's implementation the upper hand in performance, hence justifying the added maintenance complexity, until the third party version improved in parallel surpassing the efficiency of C2's implementation as I have verified in benchmarks. With that advantage gone, there is no reason to continue using C2's implementation in PyTorch mobile either from the perspective of performance or code hygiene. As a matter of fact, there is considerable performance benefit to be had as a result of using the third party version as it currently stands.
This is a tricky change though, mainly because in order to avoid potential performance regressions, of which I have witnessed none but just in abundance of caution, we have decided to continue using the internal C2's implementation whenever building for Caffe2. Again, this is mainly to avoid potential performance regressions in production C2 use cases even if doing so results in reduced performance as far as I can tell.
So to summarize, today, and as it currently stands, we are using C2's implementation for (1) NNPACK, (2) PyTorch QNNPACK, and (3) ATen parallel_for on mobile builds, while using the third party version of pthreadpool for XNNPACK as XNNPACK does not provide any build options to link against an external implementation unlike NNPACK and QNNPACK do.
The goal of this PR then, is to unify all usage on mobile to the third party implementation both for improved performance and better code hygiene. This applies to PyTorch's use of NNPACK, QNNPACK, XNNPACK, and mobile's implementation of ATen parallel_for, all getting routed to the
exact same third party implementation in this PR.
Considering that NNPACK, QNNPACK, and XNNPACK are not mobile specific, these benefits carry over to non-mobile builds of PyTorch (but not Caffe2) as well. The implementation of ATen parallel_for on non-mobile builds remains unchanged.
*** How ***
This is where things get tricky.
A good deal of the build system complexity in this PR arises from our desire to maintain C2's implementation intact for C2's use.
pthreadpool is a C library with no concept of namespaces, which means two copies of the library cannot exist in the same binary or symbol collision will occur violating ODR. This means that somehow, and based on some condition, we must decide on the choice of a pthreadpool implementation. In practice, this has become more complicated as a result of all the possible combinations that USE_NNPACK, USE_QNNPACK, USE_PYTORCH_QNNPACK, USE_XNNPACK, USE_SYSTEM_XNNPACK, USE_SYSTEM_PTHREADPOOL and other variables can result in. Having said that, I have done my best in this PR to surgically cut through this complexity in a way that minimizes the side effects, considering the significance of the performance we are leaving on the table, yet, as a result of this combinatorial explosion explained above I cannot guarantee that every single combination will work as expected on the first try. I am heavily relying on CI to find any issues as local testing can only go that far.
Having said that, this PR provides a simple, non mobile-specific C++ thread pool implementation on top of pthreadpool, namely caffe2::PThreadPool, which automatically routes to C2's implementation or the third party version depending on the build configuration. This simplifies the logic at the cost of pushing the complexity into the build scripts. From there on, this thread pool is used in ATen parallel_for and in NNPACK and family, again routing all threading to the C2 or the third party pthreadpool depending on the build configuration.
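To make the selection concrete, here is a hypothetical sketch of the kind of branch the build scripts end up encoding; the `USE_INTERNAL_PTHREADPOOL` switch below is an illustrative stand-in, not an actual flag added by this PR:
```cmake
# Hypothetical sketch: one switch standing in for the real combination of
# USE_NNPACK / USE_QNNPACK / USE_XNNPACK / USE_SYSTEM_PTHREADPOOL conditions
# that picks which pthreadpool implementation ends up in the binary.
if(USE_INTERNAL_PTHREADPOOL)
  # Caffe2's spin-lock based drop-in under caffe2/utils/threadpool.
  list(APPEND Caffe2_CPU_SRCS
    "${PROJECT_SOURCE_DIR}/caffe2/utils/threadpool/pthreadpool.cc"
    "${PROJECT_SOURCE_DIR}/caffe2/utils/threadpool/pthreadpool_impl.cc")
else()
  # The open source library under third_party/pthreadpool; only one of the
  # two may be linked, since both define the same C symbols.
  list(APPEND Caffe2_DEPENDENCY_LIBS pthreadpool)
endif()
```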
When it is all said and done, the layering will look like this:
a) aten::parallel_for, uses
b) caffe2::PThreadPool, which uses
c) pthreadpool C API, which delegates to
c-1) third_party implementation of pthreadpool if that's what the build has requested, and the rabbit hole ends here.
c-2) C2's implementation of pthreadpool if that's what the build has requested, which itself delegates to
c-2-1) caffe2::ThreadPool, and the rabbit hole ends here.
NNPACK and (PyTorch) QNNPACK hook directly into (c); they never go through (b).
Differential Revision: D21232894
Test Plan: Imported from OSS
Reviewed By: dreiss
Pulled By: AshkanAliabadi
fbshipit-source-id: 8b3de86247fbc3a327e811983e082f9d40081354
Summary:
Closes gh-35418,
PR gh-16414 added [the `CMAKE_INSTALL_RPATH_USE_LINK_PATH` directive](https://github.com/pytorch/pytorch/pull/16414/files#diff-dcf5891602b4162c36c2125c806639c5R16), which is non-standard and causes CMake to write an `RPATH` entry for libraries outside the current build. Removing it leaves an RPATH entry for `$ORIGIN` but removes entries for things like `/usr/local/cuda-10.2/lib64/stubs:/usr/local/cuda-10.2/lib64` in `libcaffe2_nvrtc.so` on Linux.
The added test fails before this PR and passes after it. It is equivalent to checking `objdump -p torch/lib/libcaffe2_nvrtc.so | grep RPATH` for an external path to the directory where CUDA "lives".
I am not sure if it solves the `rpath/libc++.1.dylib` problem for `_C.cpython-37m-darwin.so` on macOS in issue gh-36941
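For context, the intended end state can be pictured with a minimal sketch that keeps only the relocatable entry; this is illustrative rather than the exact diff:
```cmake
# Keep only the relocatable $ORIGIN entry in installed libraries and do not
# copy link-time directories (e.g. the CUDA stubs path) into the RPATH.
set(CMAKE_INSTALL_RPATH "$ORIGIN")
set(CMAKE_INSTALL_RPATH_USE_LINK_PATH FALSE)  # FALSE is also the CMake default
```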
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37737
Differential Revision: D22068657
Pulled By: ezyang
fbshipit-source-id: b04c529572a94363855f1e4dd3e93c9db3c85657
Summary:
Switch off `/Z7` so that we don't generate debug info in Release and MinSizeRel builds; this should give us smaller static libraries and object files, and faster build times.
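One way this can be expressed, assuming the flag lives in the per-configuration flag variables (a sketch, not necessarily the exact change):
```cmake
if(MSVC)
  # Keep /Z7 (embedded debug info) for Debug builds only; strip it from the
  # Release and MinSizeRel flag sets so object files and static libs shrink.
  foreach(flag_var
      CMAKE_C_FLAGS_RELEASE   CMAKE_C_FLAGS_MINSIZEREL
      CMAKE_CXX_FLAGS_RELEASE CMAKE_CXX_FLAGS_MINSIZEREL)
    string(REPLACE "/Z7" "" ${flag_var} "${${flag_var}}")
  endforeach()
endif()
```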
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39703
Differential Revision: D21960684
Pulled By: ezyang
fbshipit-source-id: 909a237a138183591d667885b13fc311470eed65
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39188
Extract the Vulkan_LIBS and Vulkan_INCLUDES setup from `cmake/Dependencies.cmake` into `cmake/VulkanDependencies.cmake` and reuse it in android/pytorch_android/CMakeLists.txt.
Add control to build with Vulkan by setting the env variable `USE_VULKAN` for `scripts/build_android.sh` and `scripts/build_pytorch_android.sh`.
We do not use the Vulkan backend in pytorch_android, but with this build option we can track the android aar size change when `USE_VULKAN` is added.
Currently that change is 88 KB.
Test Plan: Imported from OSS
Differential Revision: D21770892
Pulled By: IvanKobzarev
fbshipit-source-id: a39433505fdcf43d3b524e0fe08062d5ebe0d872
Summary:
This PR contains the initial version of Vulkan (GPU) Backend integration.
The primary target environment is Android, but the desktop build is also supported.
## CMake
Introducing three cmake options (sketched below):
USE_VULKAN:
The main switch; if it is off, the other options have no effect.
USE_VULKAN_WRAPPER:
ON - Vulkan is loaded at runtime as "libvulkan.so" using libdl; every function call is wrapped in vulkan_wrapper.h.
OFF - link against libvulkan.so directly.
USE_VULKAN_SHADERC_RUNTIME:
ON - the shader compilation library is linked, and shaders are compiled at runtime.
OFF - shaders are precompiled and the shader compilation library is not included.
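The option names below are the ones introduced by this PR; the declaration style and defaults are an illustrative sketch:
```cmake
option(USE_VULKAN "Build the experimental Vulkan (GPU) backend" OFF)

include(CMakeDependentOption)
# The two sub-options are only meaningful when USE_VULKAN is ON.
cmake_dependent_option(USE_VULKAN_WRAPPER
  "dlopen libvulkan.so at runtime via vulkan_wrapper.h instead of linking it"
  ON "USE_VULKAN" OFF)
cmake_dependent_option(USE_VULKAN_SHADERC_RUNTIME
  "Link the shader compiler and compile GLSL shaders at runtime"
  OFF "USE_VULKAN" OFF)
```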
## Codegen
if `USE_VULKAN_SHADERC_RUNTIME` is ON:
Shader precompilation starts in cmake/VulkanCodegen.cmake, which calls `aten/src/ATen/native/vulkan/gen_glsl.py` or `aten/src/ATen/native/vulkan/gen_spv.py` to include the shader source or SPIR-V bytecode inside the binary as a uint32_t array in spv.h,spv.cpp.
if `USE_VULKAN_SHADERC_RUNTIME` is OFF:
The source of shaders is included as `glsl.h`,`glsl.cpp`.
All codegen output lands in the build directory.
## Build dependencies
cmake/Dependencies.cmake
If the target platform is Android, the vulkan library, headers, and Vulkan wrapper are used from the ANDROID_NDK.
The desktop build requires the VULKAN_SDK environment variable, and all vulkan dependencies are taken from it.
(Desktop build was tested only on Linux).
## Pytorch integration:
Adding 'Vulkan' as a new Backend, DispatchKey, and DeviceType.
We are using Strided layout without supporting strides at the moment, but we plan to support them in the future.
Using OpaqueTensorImpl where the OpaqueHandle is a copyable VulkanTensor,
more details in comments in `aten/src/ATen/native/vulkan/Vulkan.h`
Main code location: `aten/src/ATen/native/vulkan`
`aten/src/ATen/native/vulkan/VulkanAten.cpp` - connection link between ATen and Vulkan api (Vulkan.h) that converts at::Tensor to VulkanTensor.
`aten/src/ATen/native/vulkan/Vulkan.h` - Vulkan API that contains the VulkanTensor representation and functions to work with it. We plan to expose it so that clients can write their own Vulkan Ops.
`aten/src/ATen/native/vulkan/VulkanOps.cpp` - Vulkan Operations Implementations that uses Vulkan.h API
## GLSL shaders
Located in `aten/src/ATen/native/vulkan/glsl` as *.glsl files.
All shaders use Vulkan specialization constants for workgroup sizes, with ids 1, 2, 3
## Supported operations
Code point:
conv2d no-groups
conv2d depthwise
addmm
upsample nearest 2d
clamp
hardtanh
## Testing
`aten/src/ATen/test/vulkan_test.cpp` - contains tests for
copy from CPU to Vulkan and back
all supported operations
Desktop builds are supported, and testing can be done on a desktop that has a Vulkan-capable GPU or with an installed software implementation of Vulkan, such as https://github.com/google/swiftshader
## Vulkan execution
The initial implementation is trivial and waits for every operator's execution.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36491
Differential Revision: D21696709
Pulled By: IvanKobzarev
fbshipit-source-id: da3e5a770b1a1995e9465d7e81963e7de56217fa
Summary:
Fixes https://github.com/pytorch/pytorch/issues/26304
Test procedure:
With ninja:
[x] Build a clean checkout
[x] Build again. Result: Only 10 libraries are (needlessly) linked again, the extra delay on a 24-core machine is <10s.
[x] Build for the third time. Result: Virtually instantaneous, with no extra rebuilding.
[x] Modify DispatchTable.h. Build again. Result: `.cu` files are rebuilt, as well as many `.cpp` files
[x] Build for the fifth time. Result: Virtually instantaneous, with no extra rebuilding.
[x] Touch one of the `.depend` files. Build again. Result: Only 10 libraries are (needlessly) linked again, the extra delay on a 24-core machine is <10s.
Without ninja:
[x] Build a clean checkout
[x] Build again. Result: There is some unnecessary rebuilding. But it was also happening before this change.
[x] Build for the third time. Result: Virtually instantaneous, with no extra rebuilding.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37661
Differential Revision: D21434624
Pulled By: ezyang
fbshipit-source-id: 379d2315486b8bb5972c184f9b8da8e00d38c338
Summary:
fmt is a formatting library for C++. It has several properties that make it nice
for inclusion in PyTorch:
- Widely used
- Basically copies how Python does it
- Support for all the compilers and platforms we care about
- Standards track (C++20)
- Small code size
- Header only
This PR includes it as a submodule and sets up the build.
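The build wiring amounts to something like the sketch below; the consuming target name is an assumption here, not necessarily the one used in the PR:
```cmake
# Use the submodule's own CMake build and consume the header-only target so
# no additional library has to be shipped with PyTorch.
add_subdirectory("${PROJECT_SOURCE_DIR}/third_party/fmt")
target_link_libraries(torch_cpu PRIVATE fmt::fmt-header-only)
```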
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37356
Differential Revision: D21262619
Pulled By: suo
fbshipit-source-id: 1d9a1a5ed08a634213748e7b02fc718ef8dac4c9
Summary:
These options are disabled by default, and are supposed to be used by
linux distro developers. With the existing shortcut option
USE_SYSTEM_LIBS toggled, these new options will be enabled as well.
Additionally, when USE_SYSTEM_LIBS is toggled, setup.py should
no longer check the existence of git submodules.
ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37277
Differential Revision: D21256999
Pulled By: ezyang
fbshipit-source-id: 84f97d008db5a5e41a289cb7bce94906de3c52cf
Summary:
The "Generic" BLAS refers to the Netlib BLAS. This option is meaningful
to the Debian family due to the "update-alternatives" mechanism, which
enables the user to switch the libblas.so providers between different
implementations at runtime, such as ATLAS, OpenBLAS, and Intel MKL.
As such, building against the generic BLAS provides much flexibility.
This new option is not documented in setup.py because it's only supposed
to be used by linux distro (especially Debian family) developers.
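A sketch of the branch this option adds, under the assumption that it mirrors the shape of the other BLAS cases in cmake/Dependencies.cmake:
```cmake
if(BLAS STREQUAL "Generic")
  # Netlib-style reference BLAS: look only for libblas and let the distro's
  # update-alternatives mechanism decide which implementation provides it.
  find_library(BLAS_LIBRARIES NAMES blas)
  if(NOT BLAS_LIBRARIES)
    message(FATAL_ERROR "BLAS=Generic was requested but libblas was not found")
  endif()
  set(BLAS_INFO "generic")  # variable name illustrative
endif()
```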
ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37276
Differential Revision: D21256877
Pulled By: ezyang
fbshipit-source-id: 55a5356653a1cfc763a5699b04afe5938f2007ec
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35612
Python 2 has reached end-of-life and is no longer supported by PyTorch.
To spare users from a long, doomed build when trying to use PyTorch with
Python 2, detect this case early and fail with a clear message. This
commit covers CMake setup.
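A sketch of the kind of check this adds, assuming the interpreter version has already been detected into a variable such as `PYTHONLIBS_VERSION_STRING`:
```cmake
# Fail fast rather than letting a Python 2 build run for a long time before
# breaking somewhere deep in the build.
if(PYTHONLIBS_VERSION_STRING VERSION_LESS "3.0")
  message(FATAL_ERROR
    "Python 2 has reached end-of-life and is no longer supported by PyTorch. "
    "Please build with Python 3 (found ${PYTHONLIBS_VERSION_STRING}).")
endif()
```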
Test Plan: Attempted to build PyTorch with Python 2 and saw a clear error *quickly*.
Differential Revision: D20842873
Pulled By: dreiss
fbshipit-source-id: b35e38c12f9381ff4ca10cf801b7a03da87b1d19
Summary:
Make the e2e FakeLowP python tests work with Glow lowering in OSS environment. Added a README.md as a guideline.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36525
Reviewed By: hyuen
Differential Revision: D21004706
Pulled By: yinghai
fbshipit-source-id: d182152e4a1a3368640bd7872cb9ea4d4bff4b02
Summary:
We open sourced the FakeLowp ops as a reference implementation of fp16 ops. This PR makes it buildable.
```
USE_CUDA=0 USE_ROCM=0 USE_FAKELOWP=ON python setup.py install
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36170
Test Plan:
Build Onnxifi library in Glow.
```
cp ${GLOW}/build/lib/Onnxifi/libonnxifi-glow.so ${MY_PATH}/ibonnxifi.so
LD_LIBRARY_PATH=${MY_PATH}/ibonnxifi.so python pytorch/caffe2/python/fakelowp/test_sls_nnpi_fp16.py
```
It doesn't run successfully right now because we need to open source the glow gflags and some other ops like `FbgemmPack`.
Reviewed By: houseroad
Differential Revision: D20980681
Pulled By: yinghai
fbshipit-source-id: 6dd31883a985850a77261bcc527029479bbc303f
Summary:
In the CMake configuration summary, report whether CUDA code is compiled with separate compilation enabled.
Also, correctly handle space-separated TORCH_NVCC_FLAGS when adding them to NVCC_CUDA_FLAGS.
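The flag splitting is roughly of this shape; the destination list name below is an assumption and only illustrates the idea:
```cmake
if(DEFINED ENV{TORCH_NVCC_FLAGS})
  # "-Xfatbin -compress-all" must become two list elements before being
  # forwarded to nvcc, otherwise it is passed as one quoted argument.
  string(REPLACE " " ";" _torch_nvcc_flags "$ENV{TORCH_NVCC_FLAGS}")
  list(APPEND CUDA_NVCC_FLAGS ${_torch_nvcc_flags})
endif()
```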
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35726
Test Plan: CI + local build with TORCH_NVCC_FLAGS set to "-Xfatbin -compress-all"
Differential Revision: D20830885
Pulled By: malfet
fbshipit-source-id: 0e0ecab4a97b6c8662a2c4bfc817857da9f32201
Summary:
Ignore mixed upper-case/lower-case style for now
Fix violations of the rule about spaces between a function name and its arguments
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35574
Test Plan: CI
Differential Revision: D20712969
Pulled By: malfet
fbshipit-source-id: 0012d430aed916b4518599a0b535e82d15721f78
Summary:
Request to update ROCm CI dockers to release 3.1
Changes required to the PyTorch source base attached:
* switch to the fast path for the Caffe2 ReLU operator
* switch to the new hipMemcpyWithStream(stream) API to replace hipMemcpyAsync(stream) && hipStreamSynchronize(stream) paradigm in an optimized fashion
* disable two regressed unit tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33930
Differential Revision: D20589048
Pulled By: ezyang
fbshipit-source-id: 568f40c1b90f311eb2ba57f02a9901114d8364af
Summary:
This makes PyTorch compilable (but not linkable) with the `CUDA_SEPARABLE_COMPILATION` option enabled.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34899
Test Plan: CI
Differential Revision: D20501050
Pulled By: malfet
fbshipit-source-id: 02903890a827fcc430a26f397d4d05999cf3a441
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34228
This PR adds LLVM codegen to tensor expressions. LLVM is added as an
optional build dependency specified with `USE_LLVM=<path_to_llvm>`
variable. If this variable is not set or LLVM is not found in the
specified path, the LLVM codegen is completely disabled.
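A sketch of the optional detection, assuming LLVM's own CMake package config (`LLVMConfig.cmake`) is used; the preprocessor definition name is illustrative:
```cmake
if(USE_LLVM)
  # USE_LLVM carries the install prefix of an LLVM build; if no package
  # config is found there, the codegen simply stays disabled.
  find_package(LLVM PATHS "${USE_LLVM}" NO_DEFAULT_PATH)
  if(LLVM_FOUND)
    message(STATUS "Found LLVM ${LLVM_PACKAGE_VERSION} at ${LLVM_DIR}")
    include_directories(${LLVM_INCLUDE_DIRS})
    add_definitions(-DTORCH_ENABLE_LLVM)  # definition name illustrative
  endif()
endif()
```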
Differential Revision: D20251832
Test Plan: Imported from OSS
Pulled By: ZolotukhinM
fbshipit-source-id: 77e203ab4421eb03afc64f8da17e0daab277ecc2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34547
This enables threading by passing a threadpool to xnnpack ops.
Test Plan:
python test/test_xnnpack_integration.py
Imported from OSS
Differential Revision: D20370553
fbshipit-source-id: 4db08e73f8c69b9e722b0e11a00621c4e229a31a
Summary:
Currently if we run
```bash
DEBUG=1 ONNX_ML=0 MAX_JOBS=8 CMAKE_CXX_COMPILER_LAUNCHER=ccache CMAKE_C_COMPILER_LAUNCHER=ccache CMAKE_CUDA_COMPILER_LAUNCHER=ccache USE_OPENMP=0 USE_DISTRIBUTED=0 USE_MKLDNN=0 USE_NCCL=0 USE_CUDA=1 USE_CUDNN=0 USE_STATIC_CUDNN=0 USE_NNPACK=0 USE_QNNPACK=0 USE_FBGEMM=0 BUILD_TEST=0 TORCH_CUDA_ARCH_LIST="6.1" python setup.py develop --cmake-only
```
then run `touch build/CMakeCache.txt` (which happens when adjusting build
options), and then `python setup.py develop`, the following error message
will show up:
```
CMake Error at build/clog-source/CMakeLists.txt:249 (ADD_SUBDIRECTORY):
ADD_SUBDIRECTORY not given a binary directory but the given source
directory "/home/hong/wsrc/pytorch/build/clog-source" is not a subdirectory
of "/home/hong/wsrc/pytorch/build/clog-source". When specifying an
out-of-tree source a binary directory must be explicitly specified.
```
This is due to a conflict between our cpuinfo submodule and XNNPACK's
external clog dependency. Moving our cpuinfo upward and setting
CLOG_SOURCE_DIR resolves the issue.
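A sketch of the resulting ordering, assuming cpuinfo carries its bundled clog under `deps/clog` as the submodule does:
```cmake
# Point XNNPACK's build at the clog copy that ships inside our cpuinfo
# submodule, so nothing has to be downloaded at configure time.
set(CPUINFO_SOURCE_DIR "${PROJECT_SOURCE_DIR}/third_party/cpuinfo" CACHE STRING "")
set(CLOG_SOURCE_DIR "${CPUINFO_SOURCE_DIR}/deps/clog" CACHE STRING "")
add_subdirectory("${PROJECT_SOURCE_DIR}/third_party/XNNPACK"
                 "${PROJECT_BINARY_DIR}/XNNPACK" EXCLUDE_FROM_ALL)
```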
---
Also reverted https://github.com/pytorch/pytorch/issues/33947, where exposing `CLOG_SOURCE_DIR` as an option is not quite appropriate (given that cpuinfo uses its included clog subdir), and the setting of this variable should happen a bit later, once the directory of cpuinfo is known.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33922
Differential Revision: D20193572
Pulled By: ezyang
fbshipit-source-id: 7cdbdc947a6c7e0ef10df33feccb5b20e1b3ba43
Summary:
Mainly renaming C2's pthread_create, the only conflicting symbol referenced internally by NNPACK, to pthread_create_c2.
Removed 2 other conflicting symbols that are not used internally at all.
Pointing XNNPACK to the original repo instead of the fork.
Copy-pasted the new interface and implementation to caffe2/utils/threadpool, so that for internal builds we compile against this.
When the threadpool is unified this will be removed.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33869
Differential Revision: D20140580
Pulled By: kimishpatel
fbshipit-source-id: de70df0af9c7d6bc065e85ede0e1c4dd6a9e6be3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33947
XNNPACK was downloading clog because we weren't setting CLOG_SOURCE_DIR.
Actually, it was downloading cpuinfo and pointing to the copy of clog
within that. So let's just point to the copy of clog within the cpuinfo
submodule we already have.
(Note: this ignores all push blocking failures!)
Test Plan:
Ran cmake and didn't see any downloading.
Verified that our clog is the same as the one that was being downloaded
with `diff -Naur`.
Differential Revision: D20169656
Pulled By: suo
fbshipit-source-id: ba0f7d1535f702e504fbc4f0102e567f860db94b
Summary:
This might lead to silent undefined behaviour (e.g. with out-of-bound indices). This affects `test_multinomial_invalid_probs_cuda` which is now removed.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32719
Test Plan:
* Build with VERBOSE=1 and manually inspect `less ndebug.build.log | grep 'c++' | grep -v -- -DNDEBUG` (only with ninja on Linux)
* CI
Fixes https://github.com/pytorch/pytorch/issues/22745
Differential Revision: D20104340
Pulled By: yf225
fbshipit-source-id: 2ebfd7ddae632258a36316999eeb5c968fb7642c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33722
In order to improve CPU performance on floating-point models on mobile, this PR introduces a new CPU backend for mobile that implements the most common mobile operators with NHWC memory layout support through integration with XNNPACK.
XNNPACK itself, and this codepath, are currently only included in the build, but the actual integration is gated with USE_XNNPACK preprocessor guards. This preprocessor symbol is intentionally not passed on to the compiler, so as to enable this rollout in multiple stages in follow up PRs. This changeset will build XNNPACK as part of the build if the identically named USE_XNNPACK CMAKE variable, defaulted to ON, is enabled, but will not actually expose or enable this code path in any other way.
Furthermore, it is worth pointing out that in order to efficiently map models to these operators, some front-end method of exposing this backend to the user is needed. The less efficient implementation would be to hook these operators into their corresponding native implementations, granted that a series of XNNPACK-specific conditions are met, much like how NNPACK is integrated with PyTorch today for instance.
Having said that, while the above implementation is still expected to outperform NNPACK based on the benchmarks I ran, the above integration would leave a considerable gap between the performance achieved and the maximum performance potential XNNPACK enables, as it does not provide a way to compute and factor out one-time operations out of the innermost forward() loop.
The more optimal solution, and one we will decide on soon, would involve either providing a JIT pass that maps nn operators onto these newly introduced operators, while allowing one-time calculations to be factored out, much like quantized mobile models. Alternatively, new eager-mode modules can also be introduced that would directly call into these implementations either through c10 or some other mechanism, also allowing for decoupling of op creation from op execution.
This PR does not include any of the front end changes mentioned above. Neither does it include the mobile threadpool unification present in the original https://github.com/pytorch/pytorch/issues/30644. Furthermore, this codepath seems to be faster than NNPACK in a good number of use cases, which can potentially allow us to remove NNPACK from aten to make the codebase a little simpler, granted that there is widespread support for such a move.
Regardless, these changes will be introduced gradually and in a more controlled way in subsequent PRs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32509
Test Plan:
Build: CI
Functionality: Not exposed
Reviewed By: dreiss
Differential Revision: D20069796
Pulled By: AshkanAliabadi
fbshipit-source-id: d46c1c91d4bea91979ea5bd46971ced5417d309c
Summary:
In order to improve CPU performance on floating-point models on mobile, this PR introduces a new CPU backend for mobile that implements the most common mobile operators with NHWC memory layout support through integration with XNNPACK.
XNNPACK itself, and this codepath, are currently only included in the build, but the actual integration is gated with USE_XNNPACK preprocessor guards. This preprocessor symbol is intentionally not passed on to the compiler, so as to enable this rollout in multiple stages in follow up PRs. This changeset will build XNNPACK as part of the build if the identically named USE_XNNPACK CMAKE variable, defaulted to ON, is enabled, but will not actually expose or enable this code path in any other way.
Furthermore, it is worth pointing out that in order to efficiently map models to these operators, some front-end method of exposing this backend to the user is needed. The less efficient implementation would be to hook these operators into their corresponding **native** implementations, granted that a series of XNNPACK-specific conditions are met, much like how NNPACK is integrated with PyTorch today for instance.
Having said that, while the above implementation is still expected to outperform NNPACK based on the benchmarks I ran, the above integration would leave a considerable gap between the performance achieved and the maximum performance potential XNNPACK enables, as it does not provide a way to compute and factor out one-time operations out of the innermost forward() loop.
The more optimal solution, and one we will decide on soon, would involve either providing a JIT pass that maps nn operators onto these newly introduced operators, while allowing one-time calculations to be factored out, much like quantized mobile models. Alternatively, new eager-mode modules can also be introduced that would directly call into these implementations either through c10 or some other mechanism, also allowing for decoupling of op creation from op execution.
This PR does not include any of the front end changes mentioned above. Neither does it include the mobile threadpool unification present in the original https://github.com/pytorch/pytorch/issues/30644. Furthermore, this codepath seems to be faster than NNPACK in a good number of use cases, which can potentially allow us to remove NNPACK from aten to make the codebase a little simpler, granted that there is widespread support for such a move.
Regardless, these changes will be introduced gradually and in a more controlled way in subsequent PRs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32509
Reviewed By: dreiss
Differential Revision: D19521853
Pulled By: AshkanAliabadi
fbshipit-source-id: 99a1fab31d0ece64961df074003bb852c36acaaa
Summary:
The detection of the env variable ONNX_ML has been properly handled in tools/setup_helpers/cmake.py,
line 242.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33424
Differential Revision: D20043991
Pulled By: ezyang
fbshipit-source-id: 91d1d49a5a12f719e67d9507cc203c8a40992f03
Summary:
Pull Request resolved: https://github.com/pytorch/FBGEMM/pull/297
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33250
As Title says. FBGEMM has recently added the support for Windows.
ghstack-source-id: 97932881
Test Plan: CI
Reviewed By: jspark1105
Differential Revision: D19738268
fbshipit-source-id: e7f3c91f033018f6355edeaf6003bd2803119df4
Summary:
Stacked PRs
* #32958 - Make zip serialization the default
* **#32244 - Fix some bugs with zipfile serialization**
It includes the following changes:
* Split up tests so that we can test both serialization methods
* Loading something within a buffer doesn't work anymore, so those tests are only on the old serialization method (it's possible but introduces a big slowdown since it requires a linear scan of the entire zipfile to find the magic number at the end)
* Call `readinto` on a buffer if possible instead of `read` + a copy
* Disable CRC-32 checks on read (there was some issue where miniz said the CRC was wrong but `zipinfo` and `unzip` said the zip file was fine)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32244
Pulled By: driazati
Reviewed By: eellison
Differential Revision: D19418935
fbshipit-source-id: df140854f52ecd04236225417d625374fd99f573
Summary:
For system pybind11 installs this is a system header location that should not get installed, since it might include other unrelated headers. The headers are already present for a system install, so there's no need to install them; only do the install when we use the bundled pybind11 version.
Closes https://github.com/pytorch/pytorch/issues/29823. Closes https://github.com/pytorch/pytorch/issues/30627.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30758
Differential Revision: D18820189
Pulled By: bddppq
fbshipit-source-id: fcc9fa657897e18c07da090752c912e3be513b17
Summary:
Also move the logic that installs the pybind11 headers from setup.py to cmake (to align with other headers).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29659
Differential Revision: D18458208
Pulled By: bddppq
fbshipit-source-id: cfd1e74b892d4a65591626ab321780c8c87b810d
Summary:
```
c10/util/Half.h:467:37: warning: implicit conversion from 'long' to 'double' changes value from 9223372036854775807 to 9223372036854775808 [-Wimplicit-int-float-conversion]
return f < limit::lowest() || f > limit::max();
~ ^~~~~~~~~~~~
c10/util/Half.h:497:41: note: in instantiation of function template specialization 'c10::overflows<long, double>' requested here
if (!std::is_same<To, bool>::value && overflows<To, From>(f)) {
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29604
Differential Revision: D18440713
Pulled By: bddppq
fbshipit-source-id: f059b4e37e90fa84308be52ff5e1070ffd04031e
Summary:
This PR makes Caffe2 compatible with TensorRT 6. To make sure it works well, a new unit test is added. This test checks the PyTorch->ONNX->TRT6 inference flow for all classification models from the TorchVision Zoo.
Note on CMake changes: it has to be done in order to import onnx-tensorrt project. See https://github.com/pytorch/pytorch/issues/18524 for details.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26426
Reviewed By: hl475
Differential Revision: D17495965
Pulled By: houseroad
fbshipit-source-id: 3e8dbe8943f5a28a51368fd5686c8d6e86e7f693
Summary:
Fixes https://github.com/pytorch/pytorch/issues/15476, supersedes https://github.com/pytorch/pytorch/issues/23496, supersedes and closes https://github.com/pytorch/pytorch/issues/27607
As explained by rgommers in https://github.com/pytorch/pytorch/issues/23496, linking against the expanded library path for `libculibos` in `cmake/Dependencies.cmake` hard codes the path into the distributed cmake files.
Instead, I only link against the targets (e.g. `caffe2::cudnn`) and move the dependency on `libculibos` into the cuda import targets declared in `cmake/public/cuda.cmake`. That file is distributed with the other cmake files and so the variable is expanded on the user's machine. I am now also using `CMAKE_STATIC_LIBRARY_SUFFIX` instead of `.a` to fix the windows issue from https://github.com/pytorch/pytorch/issues/15828. I don't have a windows setup to confirm though.
Finally, to get pytorch to compile with the extra libraries enabled, I also had to link `__caffe2_nccl` to `torch_python`; otherwise I was getting include errors as the hard coded include directory was wrong. `nccl` is built into `build` not `third_party/build`.
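The pattern looks roughly like the sketch below, with the cuDNN location variable assumed; the point is that the suffix and toolkit root get expanded on the consumer's machine rather than baked into the exported cmake files:
```cmake
add_library(caffe2::cudnn UNKNOWN IMPORTED)
set_property(TARGET caffe2::cudnn PROPERTY
    IMPORTED_LOCATION "${CUDNN_LIBRARY_PATH}")
# Attach the static culibos dependency to the imported target instead of
# hard-coding its absolute path in cmake/Dependencies.cmake.
set_property(TARGET caffe2::cudnn APPEND PROPERTY
    INTERFACE_LINK_LIBRARIES
    "${CUDA_TOOLKIT_ROOT_DIR}/lib64/libculibos${CMAKE_STATIC_LIBRARY_SUFFIX}")
```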
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27887
Differential Revision: D17929440
Pulled By: ezyang
fbshipit-source-id: 3db6bd94d758fca2e1d6a64f4f5eea03cc07cf64
Summary:
ROCm 2.9 brings support for the rocTX API through rocTracer.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27416
Differential Revision: D17777480
Pulled By: bddppq
fbshipit-source-id: 6bce9b54c94e5b4c5787570d2b85736882bd23a7
Summary:
FindCUDNN.cmake and cuda.cmake have done the detection. This commit deletes `tools/setup_helpers/cudnn.py` as it is no longer needed.
Previously in https://github.com/pytorch/pytorch/issues/25482, one test failed because TensorRT detects cuDNN differently, and there may be situations where we can find cuDNN but TensorRT cannot. This is fixed by passing our detection result down to TensorRT.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25876
Differential Revision: D17346270
Pulled By: ezyang
fbshipit-source-id: c1e7ad4a1cb20f964fe07a72906f2f002425d894
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25977
Call add_subdirectory() for pthreadpool explicitly, before NNPACK/QNNPACK, with the
EXCLUDE_FROM_ALL property so that the pthreadpool target won't be installed
by default for the libtorch mobile build.
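A sketch of the pattern (paths and guard are illustrative):
```cmake
# Add pthreadpool ourselves, before NNPACK/QNNPACK get the chance to, and use
# EXCLUDE_FROM_ALL so its targets stay out of the default build/install set
# for the libtorch mobile build.
if(NOT TARGET pthreadpool)
  add_subdirectory(
    "${PROJECT_SOURCE_DIR}/third_party/pthreadpool"
    "${PROJECT_BINARY_DIR}/pthreadpool"
    EXCLUDE_FROM_ALL)
endif()
```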
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25977
Test Plan: Imported from OSS
Differential Revision: D17312083
Pulled By: ljk53
fbshipit-source-id: 79851d0aa9402c5b9287ef4bbd8d7fd3a341497d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25958
Should have cleaned up the remaining protobuf dependencies before landing PR #25896.
Test Plan: - CI build;
Reviewed By: dreiss
Differential Revision: D17296949
Pulled By: ljk53
fbshipit-source-id: 20c444e63900c7fa054db3cc757d3f18614af630
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25650
This PR removes protobuf dependencies from mobile build altogether:
- caffe2/proto: protobuf files, including caffe2.proto and torch.proto;
- caffe2 components that depend on caffe2.proto, including most part of
caffe2/core, caffe2/utils;
- libprotobuf / libprotobuf-lite dependencies;
- protobuf compiler;
- some utils class, e.g.: netdef_converter.cpp;
- introduce a macro to disable third_party/onnx which depends on protobuf;
Test Plan:
- builds;
- link with demo app to make sure it can load and run a model in pickle format;
Differential Revision: D17183548
Pulled By: ljk53
fbshipit-source-id: fe60b48674f29c4a9b58fd1cf8ece44191491531
Summary:
As of ROCm 2.6, we support hiprtc - the HIP runtime compilation API. Enable the jit fusion feature depending on the existence of such an API. This entails
* new hipification rules for API_RTC
* add hiprtc APIs to the shim loader
* update cmake infrastructure to find the hiprtc library (it is part of the HIP package)
* enabling of unit tests in the jit_fuser test set
* special casing in resource strings for HIP - the typedefs CUDA requires would be redundant
* for now, disable the occupancy calculation we do not support yet and use hard-coded values instead
Thanks to t-vi for working with me on getting this integration done!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22872
Differential Revision: D17207425
Pulled By: bddppq
fbshipit-source-id: 93409f3051ad0ea06afacc2239fd6c402152debe
Summary:
In facebookincubator/gloo#212, a libuv based Gloo transport was introduced,
which allows us to use Gloo on macOS (and later perhaps also Windows). This
commit updates CMake code to enable building with USE_DISTRIBUTED=1 on macOS.
A few notes:
* The Caffe2 ops are not compiled, for they depend on `gloo::transport::tcp`.
* The process group implementation uses `gloo::transport::tcp` on Linux (because of `epoll(2)` support) and `gloo::transport::uv` on macOS.
* The TCP store works but sometimes crashes on process termination.
* The distributed tests are not yet run.
* The nightly builds don't use `USE_DISTRIBUTED=1`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25260
Reviewed By: mrshenli
Differential Revision: D17202381
Pulled By: pietern
fbshipit-source-id: ca80a82e78a05b4154271d2fb0ed31c8d9f26a7c