Commit Graph

639 Commits

Xiang Gao
c66ca74f03 Add device debug info to CUDA build (#31929)
Summary:
Also print NVCC flags in the summary
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31929

Differential Revision: D19312079

Pulled By: ezyang

fbshipit-source-id: cd20d5a385f61174c1907a9ad883c04de66ef037
2020-01-08 09:56:20 -08:00
Edward Yang
a9dae70bae Remove LibIRC logic from cmake. (#31152)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31152

Per apaszke: I can't find any reasonable references to libIRC online, so
I decided to remove this.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D19262582

Pulled By: ezyang

fbshipit-source-id: a1d47462427a3e0ca469062321d608e0badf8548
2020-01-06 14:39:43 -08:00
Hong Xu
daf00beaba Remove duplicated Numa detection code. (#30628)
Summary:
cmake/Dependencies.cmake (1111a6b810/cmake/Dependencies.cmake (L595-L609)) has already detected Numa. Duplicated detection and variables may lead to
incorrect results.

Close https://github.com/pytorch/pytorch/issues/29968
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30628

Differential Revision: D18782479

Pulled By: ezyang

fbshipit-source-id: f74441f03367f11af8fa59b92d656c6fa070fbd0
2020-01-03 08:48:46 -08:00
Sebastian Messmer
5554e5b793 Docs: c++11 -> c++14 (#30530)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30530

Switch some mentions of "C++11" in the docs to "C++14"
ghstack-source-id: 95812049

Test Plan: testinprod

Differential Revision: D18733733

fbshipit-source-id: b9d0490eb3f72bad974d134bbe9eb563f6bc8775
2019-12-17 14:09:02 -08:00
Peter Bell
7cb83bea3b Fix static cuda builds on older cmake versions (#30935)
Summary:
Fixes https://github.com/pytorch/pytorch/pull/28378#issuecomment-562597033

To reproduce the failure I had to downgrade to `cmake 3.9` (Ubuntu 18 uses 3.10 apparently). These older `cmake` versions unfortunately don't seem to allow `target_link_libraries(INTERFACE)` to be used with imported libraries. Switching back to `set_property(TARGET)` fixes the issue.
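
As a rough illustration of the difference (library names and paths below are placeholders, not the actual PyTorch targets):
```
# Minimal sketch; library names and paths are placeholders.
add_library(my_imported_cudnn STATIC IMPORTED)
set_target_properties(my_imported_cudnn PROPERTIES
  IMPORTED_LOCATION "/usr/local/cuda/lib64/libcudnn_static.a")
set(my_extra_cuda_libs "/usr/local/cuda/lib64/libculibos.a")

# Before CMake 3.11, target_link_libraries() cannot be used on an imported
# target, so this form fails on cmake 3.9/3.10:
#   target_link_libraries(my_imported_cudnn INTERFACE ${my_extra_cuda_libs})
# Setting the property directly works on the older versions as well:
set_property(TARGET my_imported_cudnn APPEND PROPERTY
  INTERFACE_LINK_LIBRARIES ${my_extra_cuda_libs})
```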
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30935

Differential Revision: D18956912

Pulled By: albanD

fbshipit-source-id: a2b728ee3268599a428b7878c988e1edef5d9dda
2019-12-14 20:29:27 -08:00
Richard Zou
9047d4df45 Remove all remaining usages of BUILD_NAMEDTENSOR (#31116)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31116

Changelist:
- remove BUILD_NAMEDTENSOR macro
- remove torch._C._BUILD_NAMEDTENSOR
- remove all python behavior that relies on torch._C._BUILD_NAMEDTENSOR

Future:
- In the next diff, I will remove all usages of
ATen/core/EnableNamedTensor.h since that header doesn't do anything
anymore
- After that, we'll be done with the BUILD_NAMEDTENSOR removal.

Test Plan: - run CI

Differential Revision: D18934951

Pulled By: zou3519

fbshipit-source-id: 0a0df0f1f0470d0a01c495579333a2835aac9f5d
2019-12-12 09:53:03 -08:00
Jiakai Liu
bf1b4b6fef add torch_cpu to the static library list in TorchConfig.cmake.in (#30769)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30769

TorchConfig.cmake is the public cmake file we produce in the install folder so that
3rd-party client code can easily pull in all libtorch dependencies.

Apparently this build flow is not well covered by our CI (which is focused
on the 1st-party build / shared libraries?), as the little dummy project used for
code-analysis testing was broken by #30315 without failing any CI.

Fixed the problem for the mobile build and added the dummy project build to mobile
CI as well.
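
For context, the kind of minimal client project TorchConfig.cmake is meant to serve looks roughly like this (project layout is illustrative):
```
# Hypothetical third-party CMakeLists.txt consuming an installed libtorch.
cmake_minimum_required(VERSION 3.5)
project(client_app)

# Point CMAKE_PREFIX_PATH (or Torch_DIR) at the libtorch install folder so
# find_package() picks up the shipped TorchConfig.cmake.
find_package(Torch REQUIRED)

add_executable(client_app main.cpp)
# TORCH_LIBRARIES has to expand to everything needed; for static builds that
# means the full list of static archives (what this commit fixes).
target_link_libraries(client_app ${TORCH_LIBRARIES})
```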

Test Plan: - make sure the new CI passes;

Differential Revision: D18825054

Pulled By: ljk53

fbshipit-source-id: 80506f3875ffbc1a191154bb9e3621c621e08b12
2019-12-05 11:13:32 -08:00
Nathan Goldbaum
1f1ce53e8e Don't install pybind11 header directory for system pybind11 installs (#30758)
Summary:
For system pybind11 installs, this is a system header location that should not be installed, since it might include other, unrelated headers. Because the headers are already present in a system install, there is no need to install them; only do the install when we use the bundled pybind11 version.
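
A minimal sketch of the kind of guard this describes (the switch name and paths are illustrative, not the actual ones used):
```
# Only install pybind11 headers when the bundled copy in third_party/ is used;
# a system pybind11 already ships its headers, and its include dir may contain
# unrelated headers that must not be re-installed.
if(MY_USE_SYSTEM_PYBIND11)   # hypothetical switch name
  message(STATUS "System pybind11 in use; skipping header install")
else()
  install(DIRECTORY "${PROJECT_SOURCE_DIR}/third_party/pybind11/include/pybind11"
          DESTINATION include)
endif()
```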

Closes https://github.com/pytorch/pytorch/issues/29823. Closes https://github.com/pytorch/pytorch/issues/30627.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30758

Differential Revision: D18820189

Pulled By: bddppq

fbshipit-source-id: fcc9fa657897e18c07da090752c912e3be513b17
2019-12-04 16:43:21 -08:00
Edward Yang
38986e1dea Split libtorch.so back into libtorch_{cpu,cuda,hip} (#30315)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30315

The new structure is that libtorch_cpu contains the bulk of our
code, and libtorch depends on libtorch_cpu and libtorch_cuda.
This is a reland of https://github.com/pytorch/pytorch/pull/29731 but
I've extracted all of the prep work into separate PRs which can be
landed before this one.

Some things of note:

* torch/csrc/cuda/nccl.cpp was added to the wrong list of SRCS, now fixed (this didn't matter before because previously they were all in the same library)
* The dummy file for libtorch was brought back from the dead; it was previously deleted in #20774
* In an initial version of the patch, I forgot to make torch_cuda explicitly depend on torch_cpu. This led to some very odd errors, most notably "bin/blob_test: hidden symbol `_ZNK6google8protobuf5Arena17OnArenaAllocationEPKSt9type_infom' in lib/libprotobuf.a(arena.cc.o) is referenced by DSO"
* A number of places in Android/iOS builds have to add torch_cuda explicitly as a library, as they do not have transitive dependency calculation working correctly
* I had to wrap torch_cpu/torch_cuda with caffe2_interface_library so that they get whole-archive linked into torch when you statically link (see the sketch after this list). And I had to do this in an *exported* fashion because torch needs to depend on torch_cpu_library. In the end I exported everything and removed the redefinition in the Caffe2Config.cmake. I am not too sure why the old code did it the way it did in the first place; in any case, it doesn't seem to have broken anything to switch it this way.
* There are still some uses of `__HIP_PLATFORM_HCC__` in `torch_cpu` code, so I had to apply it to that library too (UGH). This manifests as a failure when trying to run the CUDA fuser. It doesn't really matter substantively right now because we still HIPify in place, but it would be good to fix eventually. This was a bit difficult to debug because of an unrelated HIP bug, see https://github.com/ROCm-Developer-Tools/HIP/issues/1706
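
A rough sketch of the whole-archive wrapping idea behind caffe2_interface_library (generic illustration only, with a hypothetical function name; this is not the actual macro):
```
# Wrap a static library in an INTERFACE target that forces whole-archive
# linking, so every object file (including ones only reachable through static
# initializer registrations) survives the final link.
function(my_whole_archive_wrap src dst)
  add_library(${dst} INTERFACE)
  if(APPLE)
    target_link_libraries(${dst} INTERFACE "-Wl,-force_load,$<TARGET_FILE:${src}>")
  else()
    target_link_libraries(${dst} INTERFACE
      "-Wl,--whole-archive" "$<TARGET_FILE:${src}>" "-Wl,--no-whole-archive")
  endif()
  # The real helper also has to forward usage requirements and build-order
  # dependencies from ${src}; that bookkeeping is omitted here.
endfunction()
```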

Fixes #27215 (as our libraries are smaller), and executes on
part of the plan in #29235.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D18790941

Pulled By: ezyang

fbshipit-source-id: 01296f6089d3de5e8365251b490c51e694f2d6c7
2019-12-04 08:04:57 -08:00
Sebastian Messmer
bc2e6d10fa Back out "Revert D17908478: Switch PyTorch/Caffe2 to C++14"
Summary: Original commit changeset: 775d2e29be0b

Test Plan: CI

Reviewed By: mruberry

Differential Revision: D18775520

fbshipit-source-id: a350b3f86b66d97241f208786ee67e9a51172eac
2019-12-03 14:33:43 -08:00
Sebastian Messmer
a2ed50c920 Revert D17908478: Switch PyTorch/Caffe2 to C++14
Test Plan: revert-hammer

Differential Revision:
D17908478

Original commit changeset: 6e340024591e

fbshipit-source-id: 775d2e29be0bc3a0db64f164c8960c44d4877d5d
2019-11-27 14:57:05 -08:00
Sebastian Messmer
d0acc9c085 Switch PyTorch/Caffe2 to C++14 (#30406)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30406

ghstack-source-id: 94642238

Test Plan: waitforsandcastle

Differential Revision: D17908478

fbshipit-source-id: 6e340024591ec2c69521668022999df4a33b4ddb
2019-11-27 10:47:31 -08:00
Jiakai Liu
43fb0015db custom build script (#30144)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30144

Create a script to produce a libtorch that contains only the ops needed by specific
models. Developers can use this workflow to further optimize mobile build size.

We need to keep a dummy stub for unused (stripped) ops because some JIT-side
logic requires certain function schemas to exist in the JIT op
registry.

Test Steps:
1. Build "dump_operator_names" binary and use it to dump root ops needed
by a specific model:
```
build/bin/dump_operator_names --model=mobilenetv2.pk --output=mobilenetv2.yaml
```

2. The MobileNetV2 model should use the following ops:
```
- aten::t
- aten::dropout
- aten::mean.dim
- aten::add.Tensor
- prim::ListConstruct
- aten::addmm
- aten::_convolution
- aten::batch_norm
- aten::hardtanh_
- aten::mm
```
NOTE that for some reason it outputs "aten::addmm" but actually uses "aten::mm".
You need to fix it manually for now.

3. Run custom build script locally (use Android as an example):
```
SELECTED_OP_LIST=mobilenetv2.yaml scripts/build_pytorch_android.sh armeabi-v7a
```

4. Check out the demo app that uses the locally built library instead of
downloading it from the jcenter repo:
```
git clone --single-branch --branch custom_build git@github.com:ljk53/android-demo-app.git
```

5. Copy locally built libraries to demo app folder:
```
find ${HOME}/src/pytorch/android -name '*.aar' -exec cp {} ${HOME}/src/android-demo-app/HelloWorldApp/app/libs/ \;
```

6. Build demo app with locally built libtorch:
```
cd ${HOME}/src/android-demo-app/HelloWorldApp
./gradlew clean && ./gradlew assembleDebug
```

7. Install and run the demo app.

In-APK arm-v7 libpytorch_jni.so build size reduced from 5.5M to 2.9M.

Test Plan: Imported from OSS

Differential Revision: D18612127

Pulled By: ljk53

fbshipit-source-id: fa8d5e1d3259143c7346abd1c862773be8c7e29a
2019-11-20 13:16:02 -08:00
Edward Yang
b0309d1b5b More documentation on caffe2_interface_library (#29903)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29903

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D18616888

Pulled By: ezyang

fbshipit-source-id: 360760a688dcc8ba117cd79d89db2afb2c35ab27
2019-11-20 08:58:01 -08:00
Daya Khudia
79b797ccac Build time warning on windows for fbgemm (#29062)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29062

Build time warning
ghstack-source-id: 94202405

Test Plan: None

Reviewed By: jianyuh

Differential Revision: D18279505

fbshipit-source-id: 873cdeb848d34849d6babc435b1a42171f0609a3
2019-11-19 14:30:20 -08:00
Edward Yang
0e5200adfe Refactor target_compile_options into torch_compile_options (#29730)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29730

Back in the day, Caffe2 had a good idea: instead of spattering
target_compile_options all over the codebase, define a helper
function which sets all the options for a target.  This is especially
helpful if I want to split libtorch.so into libtorch_cpu.so
and libtorch_cuda.so; I need a way to easily apply options
to multiple targets.  A shared helper function is just the ticket.

I moved every target_compile_options call in caffe2/CMakeLists.txt
that didn't seem target dependent (exclusions included OpenMP flags,
API-related macros, ONNX related macros and HIP flags) into
torch_compile_options.  I slavishly preserved the structure:
there's a nearly redundant WERROR if() in the output but I preserved
it.

There is one thing I don't like about this, which is that now
the compile options are off in a random directory that no one would
expect.  But c'est la vie...
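
The general shape of such a helper is roughly the following; the function name and flag list are illustrative, not the actual contents of torch_compile_options:
```
# A shared helper so every target (and later every split library) gets
# identical compile options; the flag list here is illustrative.
function(my_torch_compile_options libname)
  target_compile_options(${libname} PRIVATE -Wall -Wextra -Wno-unused-parameter)
  if(WERROR)   # mirrors the nearly redundant WERROR if() mentioned above
    target_compile_options(${libname} PRIVATE -Werror)
  endif()
endfunction()

# Applied uniformly once libtorch is split:
#   my_torch_compile_options(torch_cpu)
#   my_torch_compile_options(torch_cuda)
```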

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D18571166

Pulled By: ezyang

fbshipit-source-id: 21cd5f7663485077600782078fbb1787fab09035
2019-11-18 07:05:48 -08:00
David Reiss
d22f61432d Update fbjni and enable PyTorch JNI build
Summary:
- Add a "BUILD_JNI" option that enables building PyTorch JNI bindings and
  fbjni.  This is off by default because it adds a dependency on jni.h.
- Update to the latest fbjni so we can inhibit building its tests,
  because they depend on gtest.
- Set JAVA_HOME and BUILD_JNI in Linux binary build configurations if we
  can find jni.h in Docker.

Test Plan:
- Built on dev server.
- Verified that libpytorch_jni links after libtorch when both are built
  in a parallel build.

Differential Revision: D18536828

fbshipit-source-id: 19cb3be8298d3619352d02bb9446ab802c27ec66
2019-11-15 13:59:44 -08:00
Jiakai Liu
b508de6412 add static libraries to TorchConfig.cmake.in (#29837)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29837

The current TorchConfig seems to handle only shared libraries. When
building static libraries it doesn't provide the list of all the needed
static libraries. This is especially a problem for the mobile build, as we
build static libraries first and then link them into a shared library / binary to
do "gc-sections". Today we have to manually import these dependent
libraries at each call site.
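
To illustrate the mobile flow described above (target names and flags are hypothetical, and this assumes TorchConfig.cmake exposes the full static list):
```
# Downstream mobile consumer: link the static libtorch archives into a single
# shared library and let the linker drop unreferenced sections (assumes the
# archives were compiled with -ffunction-sections/-fdata-sections).
find_package(Torch REQUIRED)
add_library(my_pytorch_jni SHARED jni_bindings.cpp)
# Without the full static-library list in TORCH_LIBRARIES, every call site has
# to enumerate the archives by hand -- the problem this change addresses.
target_link_libraries(my_pytorch_jni ${TORCH_LIBRARIES})
set_target_properties(my_pytorch_jni PROPERTIES LINK_FLAGS "-Wl,--gc-sections")
```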

Test Plan:
- build_mobile.sh builds and runs;
- The baby test project in #29716 builds and runs;
- Will check CI for other platforms;

Differential Revision: D18513404

Pulled By: ljk53

fbshipit-source-id: c3dc2c01004c4c9c4574c71fd9a4253c9e19e1e9
2019-11-14 20:41:33 -08:00
Junjie Bai
b0c245d52d Consolidate the places that find pybind11 include dirs (#29659)
Summary:
Also move the logic that installs the pybind11 headers from setup.py to cmake (to align with other headers).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29659

Differential Revision: D18458208

Pulled By: bddppq

fbshipit-source-id: cfd1e74b892d4a65591626ab321780c8c87b810d
2019-11-12 14:51:56 -08:00
Junjie Bai
f111f1b1a7 Suppress implicit int-float conversion warning in ROCm build (#29604)
Summary:
```
c10/util/Half.h:467:37: warning: implicit conversion from 'long' to 'double' changes value from 9223372036854775807 to 9223372036854775808 [-Wimplicit-int-float-conversion]
  return f < limit::lowest() || f > limit::max();
                                  ~ ^~~~~~~~~~~~
c10/util/Half.h:497:41: note: in instantiation of function template specialization 'c10::overflows<long, double>' requested here
  if (!std::is_same<To, bool>::value && overflows<To, From>(f)) {
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29604

Differential Revision: D18440713

Pulled By: bddppq

fbshipit-source-id: f059b4e37e90fa84308be52ff5e1070ffd04031e
2019-11-12 10:44:28 -08:00
Hong Xu
21d11e0b64 FindCUDA: Use find_program instead of find_path to find nvcc (#29160)
Summary:
Otherwise nvcc is not found if it is on the env PATH but in a non-standard
location.

Imported from my patch for CMake:
https://gitlab.kitware.com/cmake/cmake/merge_requests/3990

Although we currently do the nvcc search in a Python script, that script will be removed soon in https://github.com/pytorch/pytorch/issues/28617.
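
A minimal sketch of the idea, modeled on the stock FindCUDA two-step search (treat the exact variables as illustrative):
```
# Look for the nvcc executable itself rather than a directory containing it,
# so an nvcc that is only reachable through PATH is still found.
find_program(CUDA_NVCC_EXECUTABLE nvcc
  PATHS "${CUDA_TOOLKIT_ROOT_DIR}/bin"
  NO_DEFAULT_PATH)
find_program(CUDA_NVCC_EXECUTABLE nvcc)   # falls back to searching PATH
```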
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29160

Differential Revision: D18326693

Pulled By: ezyang

fbshipit-source-id: dc7ff3f6026f0655386ff685bce7372e2b061a4b
2019-11-05 08:51:35 -08:00
Sergei Nikolaev
1e2049c566 #26426 fixed (#28715)
Summary:
This is the fix for reverted https://github.com/pytorch/pytorch/issues/26426
houseroad bddppq soumith
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28715

Reviewed By: hl475

Differential Revision: D18146731

Pulled By: houseroad

fbshipit-source-id: 247366451a6334e84df82d00339521f797b33130
2019-11-01 12:53:01 -07:00
Adam J. Stewart
4cf7277d62 Explain how to specify library location for MKL (#28779)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/24334.

I'm still kind of confused why `FindMKL.cmake` was unable to locate my MKL libraries. They are in the standard `/opt/intel/mkl` installation prefix on macOS. But at least with this more detailed error message, it will be easier for people to figure out how to fix the problem.
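
For reference, pointing the build at a non-default MKL prefix can look roughly like this; whether these particular hint variables are honoured depends on FindMKL.cmake, so treat the snippet as an assumption:
```
# Give CMake's header/library search a hint before FindMKL.cmake runs
# (macOS /opt/intel layout from above; exact hint handling depends on FindMKL).
list(APPEND CMAKE_INCLUDE_PATH "/opt/intel/mkl/include")
list(APPEND CMAKE_LIBRARY_PATH "/opt/intel/mkl/lib" "/opt/intel/lib")
find_package(MKL)
if(NOT MKL_FOUND)
  message(WARNING "MKL not found; see the detailed error message for hints")
endif()
```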

zhangguanheng66 xkszltl soumith
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28779

Differential Revision: D18170998

Pulled By: soumith

fbshipit-source-id: 47e61baadd84c758267dca566eb1fb8a081de92f
2019-10-28 08:00:54 -07:00
Junjie Bai
d37c2d7c8d Revert D17495965: TensorRT 6.0 support and PyTorch->ONNX->TRT6 unit test
Test Plan: revert-hammer

Differential Revision:
D17495965

Original commit changeset: 3e8dbe8943f5

fbshipit-source-id: d47fcbec22b0d61df41d7dbf15cfdde196ac818f
2019-10-25 13:58:16 -07:00
Sergei Nikolaev
4996e3aca2 TensorRT 6.0 support and PyTorch->ONNX->TRT6 unit test (#26426)
Summary:
This PR makes Caffe2 compatible with TensorRT 6. To make sure it works well, a new unit test is added. This test checks the PyTorch->ONNX->TRT6 inference flow for all classification models from the TorchVision Zoo.
Note on CMake changes: they have to be done in order to import the onnx-tensorrt project. See https://github.com/pytorch/pytorch/issues/18524 for details.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26426

Reviewed By: hl475

Differential Revision: D17495965

Pulled By: houseroad

fbshipit-source-id: 3e8dbe8943f5a28a51368fd5686c8d6e86e7f693
2019-10-25 13:01:57 -07:00
Peter Bell
03d24dba6c Fix static linking cuDNN without static CUDA (#28378)
Summary:
Fixes https://github.com/pytorch/pytorch/pull/27887#issuecomment-544649765

The logs show that `USE_STATIC_CUDNN` is used but not `CAFFE2_STATIC_LINK_CUDA`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28378

Differential Revision: D18061841

Pulled By: ezyang

fbshipit-source-id: 3b9b49953094e02f808ff12107ba4226688d9986
2019-10-22 10:08:09 -07:00
Edward Yang
a3902c901a Revert "Fix early expansion of CUDA_TOOLKIT_ROOT_DIR in libtorch builds (#27887)" (#28310)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28310

This reverts commit 3d3bff5ff1.

Test Plan: Imported from OSS

Differential Revision: D18042859

Pulled By: ezyang

fbshipit-source-id: cded781dda6fcc04199af6abd07ac09fdc0405de
2019-10-21 14:45:17 -07:00
Peter Bell
3d3bff5ff1 Fix early expansion of CUDA_TOOLKIT_ROOT_DIR in libtorch builds (#27887)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/15476, supersedes https://github.com/pytorch/pytorch/issues/23496, supersedes and closes https://github.com/pytorch/pytorch/issues/27607

As explained by rgommers in https://github.com/pytorch/pytorch/issues/23496, linking against the expanded library path for `libculibos` in `cmake/Dependencies.cmake` hard-codes the path into the distributed cmake files.

Instead, I only link against the targets (e.g. `caffe2::cudnn`) and move the dependency on `libculibos` into the CUDA import targets declared in `cmake/public/cuda.cmake`. That file is distributed with the other cmake files, so the variable is expanded on the user's machine. I am now also using `CMAKE_STATIC_LIBRARY_SUFFIX` instead of `.a` to fix the Windows issue from https://github.com/pytorch/pytorch/issues/15828. I don't have a Windows setup to confirm, though.

Finally, to get pytorch to compile with the extra libraries enabled, I also had to link `__caffe2_nccl` to `torch_python`; otherwise I was getting include errors, as the hard-coded include directory was wrong. `nccl` is built into `build`, not `third_party/build`.
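
The pattern being described, in outline (target and variable names are simplified placeholders, not the actual ones):
```
# In a cmake file that ships with the install (the public cuda config), attach
# culibos to the imported cuDNN target instead of hard-coding an expanded path
# in Dependencies.cmake; the variables expand on the user's machine.
add_library(my_caffe2_cudnn INTERFACE IMPORTED)
set_property(TARGET my_caffe2_cudnn PROPERTY INTERFACE_LINK_LIBRARIES
  "${CUDNN_LIBRARY}"
  "${CUDA_TOOLKIT_ROOT_DIR}/lib64/libculibos${CMAKE_STATIC_LIBRARY_SUFFIX}")
# Consumers then link the target, never an absolute path:
#   target_link_libraries(torch_cuda PRIVATE my_caffe2_cudnn)
```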
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27887

Differential Revision: D17929440

Pulled By: ezyang

fbshipit-source-id: 3db6bd94d758fca2e1d6a64f4f5eea03cc07cf64
2019-10-16 09:21:47 -07:00
Edward Yang
0b6186d778 Remove Tensor.h, TensorMethods.h from src/core. (#27086)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27086

This is a major source of merge conflicts, and AFAICT isn't necessary anymore (it may have been necessary for some mobile build stuff in the past).

This is a commandeer of #25031

Test Plan: Imported from OSS

Reviewed By: ljk53

Differential Revision: D17687345

Pulled By: ezyang

fbshipit-source-id: bf6131af835ed1f9e3c10699c81d4454a240445f
2019-10-06 09:37:50 -07:00
Johannes M Dieterich
17c672e704 enable rocTX API (#27416)
Summary:
ROCm 2.9 brings support for the rocTX API through rocTracer.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27416

Differential Revision: D17777480

Pulled By: bddppq

fbshipit-source-id: 6bce9b54c94e5b4c5787570d2b85736882bd23a7
2019-10-05 01:55:00 -07:00
Junjie Bai
f4d0d0a811 Enable RCCL in ROCm build (#27383)
Summary:
continues https://github.com/pytorch/pytorch/pull/23884
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27383

Differential Revision: D17767248

Pulled By: bddppq

fbshipit-source-id: 3a506844ca6f01d7bbe8be5bde0976999e3a2b90
2019-10-04 17:41:41 -07:00
J M Dieterich
32d009a37f Add gfx908 to the list of per-default compiled architectures. (#27388)
Summary:
ROCm 2.8 added preliminary support for gfx908.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27388

Differential Revision: D17767772

Pulled By: bddppq

fbshipit-source-id: 172daf5bb66d3db86a13e287059af4b9b90a7f57
2019-10-04 14:49:33 -07:00
Mingbo Wan
5379e87a32 Cuda101 upgrade (#26823)
Summary:
test run: https://github.com/pytorch/pytorch/issues/26732
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26823

Reviewed By: soumith

Differential Revision: D17576095

Pulled By: mingbowan

fbshipit-source-id: 269cf443aea18b47bbee63996d035bc5bcd2726b
2019-09-25 14:44:12 -07:00
Hong Xu
5e5cbceeba remove tools/setup_helpers/cudnn.py (#25876)
Summary:
FindCUDNN.cmake and cuda.cmake have done the detection. This commit deletes `tools/setup_helpers/cudnn.py` as it is no longer needed.

Previously, in https://github.com/pytorch/pytorch/issues/25482, one test failed because TensorRT detects cuDNN differently, and there may be situations where we can find cuDNN but TensorRT cannot. This is fixed by passing our detection result down to TensorRT.
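
A rough sketch of what "passing our detection result down" can look like (variable and flag names are illustrative, not the actual wiring):
```
# Reuse the cuDNN already found by FindCUDNN.cmake / cuda.cmake when adding
# the TensorRT subproject, so the two detections cannot disagree.
if(MY_USE_CUDNN)   # hypothetical guard
  set(CUDNN_INCLUDE_DIR "${CUDNN_INCLUDE_PATH}" CACHE PATH "" FORCE)
  set(CUDNN_LIBRARY "${CUDNN_LIBRARY_PATH}" CACHE FILEPATH "" FORCE)
endif()
add_subdirectory(third_party/onnx-tensorrt)
```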
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25876

Differential Revision: D17346270

Pulled By: ezyang

fbshipit-source-id: c1e7ad4a1cb20f964fe07a72906f2f002425d894
2019-09-24 07:44:33 -07:00
jasjuang
e4821012ad prevent generating caffe2::mkldnn for multiple times (#25257)
Summary:
This is a similar problem to https://github.com/pytorch/pytorch/issues/25004. After the merge of https://github.com/pytorch/pytorch/issues/25167, I recompiled torch and discovered another similar bug.

ezyang please take a look
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25257

Differential Revision: D17528116

Pulled By: ezyang

fbshipit-source-id: 1657d9ee6dced3548f246010b05e2b3c25c37dee
2019-09-23 08:53:02 -07:00
Jiakai Liu
d6e3aed032 add eigen blas for mobile build (#26508)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26508

Enable BLAS for the pytorch mobile build using Eigen BLAS.
It's not the juiciest optimization for typical mobile CV models, as we are already
using NNPACK/QNNPACK for most ops there. But it's nice to have a good fallback
implementation for other ops.

Test Plan:
- Create a simple matrix multiplication script model:
```
import torch

class Net(torch.nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.weights = torch.ones(1000, 1000)

    def forward(self, x):
        return torch.mm(x, self.weights)

n = Net()
module = torch.jit.trace_module(n, {'forward': torch.ones(1000, 1000)})
module.save('mm.pk')
```

- Before integrating with Eigen BLAS:
```
adb shell 'cd /data/local/tmp; \
./speed_benchmark_torch \
--model=mm.pk \
--input_dims="1000,1000" \
--input_type=float \
--warmup=5 \
--iter=5'

Milliseconds per iter: 2218.52.
```

- After integrating with Eigen BLAS:
```
adb shell 'cd /data/local/tmp; \
./speed_benchmark_torch_eigen \
--model=mm.pk \
--input_dims="1000,1000" \
--input_type=float \
--warmup=5 \
--iter=5'

Milliseconds per iter: 314.535.
```

- Improve MobileNetV2 single thread perf by ~5%:
```
adb shell 'cd /data/local/tmp; \
./speed_benchmark_torch \
--model=mobilenetv2.pk \
--input_dims="1,3,224,224" \
--input_type=float \
--warmup=5 \
--iter=20 \
--print_output=false \
--caffe2_threadpool_force_inline=true'

Milliseconds per iter: 367.055.

adb shell 'cd /data/local/tmp; \
./speed_benchmark_torch_eigen \
--model=mobilenetv2.pk \
--input_dims="1,3,224,224" \
--input_type=float \
--warmup=5 \
--iter=20 \
--print_output=false \
--caffe2_threadpool_force_inline=true'

Milliseconds per iter: 348.77.
```

Differential Revision: D17489587

fbshipit-source-id: efe542db810a900f680da7ec7e60f215f58db66e
2019-09-20 15:45:11 -07:00
Ashkan Aliabadi
dc851ab5d4 Integrate forked QNNPACK into mobile PyTorch builds. (#25844)
Summary:
Enable forked QNNPACK builds in PyTorch mobile.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25844

Differential Revision: D17336458

Pulled By: AshkanAliabadi

fbshipit-source-id: 6ea09dd6c114b64313e9159bf7f17253bc87bfdb
2019-09-16 20:50:43 -07:00
Tao Xu
3051e36e05 Remove armv7s build from iOS (#26222)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26222

### Summary

The last generation of armv7s devices is the iPhone 5C. As discussed with David offline, we decided not to support iOS armv7s devices.

### Test plan

- CI finishes successfully
- Builds can be run only on X86_64 and arm64 devices

Test Plan: Imported from OSS

Differential Revision: D17385308

Pulled By: xta0

fbshipit-source-id: f883999aed18224ea3386b1f016964a33270fa34
2019-09-14 11:07:37 -07:00
Sebastian Messmer
8321f2592e Register ATen ops with c10 (#26131)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26131

Changes in this PR:
- For each operator with use_c10_dispatcher: True, additionally generate a c10 registration line in TypeDefault.cpp, CPUType.cpp, and other backend files.
- This doesn't change globalATenDispatch yet; the c10 registration is purely additional and the operator calling path doesn't change. A diff further up the stack will change these things.
- Enable the use_c10_dispatcher: True flag for about ~70% of operators
- This also changes the c10->jit operator export, because ATen ops are already exported to JIT directly and we don't want to also export the registered c10 ops, which would clash
- For this, we need a way to recognize whether a certain operator has already been moved from ATen to c10; this is done by generating an OpsAlreadyMovedToC10.cpp file with the list. A diff further up in the stack will also need this file to make sure we don't break the backend extension API for these ops.

Reasons for some ops to be excluded (i.e. not have the `use_c10_dispatcher` flag set to true):
- `Tensor?(a!)` (i.e. optional tensor with annotations) not supported in c++ function schema parser yet
- `-> void` in native_functions.yaml vs `-> ()` expected by function schema parser
- out functions have a different argument order in C++ than in the jit schema
- `Tensor?` (i.e. optional tensor) doesn't work nicely with undefined tensor sometimes being undefined tensor and sometimes being None.
- fixed-size arrays like `int[3]` not supported in c10 yet

These will be fixed in separate diffs and then the exclusion tag will be removed.
ghstack-source-id: 90060748

Test Plan: a diff stacked on top uses these registrations to call these ops from ATen

Differential Revision: D16603131

fbshipit-source-id: 315eb83d0b567eb0cd49973060b44ee1d6d64bfb
2019-09-13 13:52:40 -07:00
Jiakai Liu
075adb4d2d remove pthreadpool.a from install directory (#25977)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25977

Call add_subdirectory() explicitly, before NNPACK/QNNPACK, with the
EXCLUDE_FROM_ALL property, so that the pthreadpool target won't be installed
by default for the libtorch mobile build.
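
In outline (paths are illustrative):
```
# Pull in pthreadpool explicitly, before NNPACK/QNNPACK would do so, and mark
# the subdirectory EXCLUDE_FROM_ALL so its targets are built only on demand
# and stay out of the default install.
add_subdirectory("${PROJECT_SOURCE_DIR}/third_party/pthreadpool"
                 "${CMAKE_BINARY_DIR}/pthreadpool" EXCLUDE_FROM_ALL)
# NNPACK/QNNPACK then link the already-defined pthreadpool target as usual:
#   target_link_libraries(nnpack PRIVATE pthreadpool)
```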

Test Plan: Imported from OSS

Differential Revision: D17312083

Pulled By: ljk53

fbshipit-source-id: 79851d0aa9402c5b9287ef4bbd8d7fd3a341497d
2019-09-11 12:27:56 -07:00
albanD
63df9ffd0b Fix typo in OpenBLAS cmake detection
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/25966

Differential Revision: D17315925

Pulled By: albanD

fbshipit-source-id: 55c6b4a1ddeaf95714034ec66a4d59b0f00ba634
2019-09-11 09:10:42 -07:00
Jiakai Liu
74b48f21c1 remove protobuf from Dependencies.cmake for libtorch mobile build (#25958)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25958

Should have cleaned up the remaining protobuf dependencies before landing PR #25896.

Test Plan: - CI build;

Reviewed By: dreiss

Differential Revision: D17296949

Pulled By: ljk53

fbshipit-source-id: 20c444e63900c7fa054db3cc757d3f18614af630
2019-09-10 18:23:20 -07:00
Johannes M Dieterich
26675b507f Enable libflame as a LAPACK choice (#25795)
Summary:
libflame is BLIS's companion LAPACK from the FLAME project

Mimics my ancient
f5bc78263e
in cmake upstream

libflame WWW: https://github.com/flame/libflame
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25795

Differential Revision: D17286461

Pulled By: bddppq

fbshipit-source-id: 7cd0d27127c78563574791415e4a34f045df30df
2019-09-10 10:34:55 -07:00
Soumith Chintala
73855ecd43 fix cudnn static linkage (#25848)
Summary:
Fix regression caused by https://github.com/pytorch/pytorch/pull/24938

This fixes CUDA nightly breakages
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25848

Differential Revision: D17256348

Pulled By: soumith

fbshipit-source-id: dded577717947d0f092e9d76b423b2bc7c56070a
2019-09-08 21:41:57 -07:00
J M Dieterich
748436a514 Enable BLIS from the FLAME project as a BLAS choice. (#23819)
Summary:
BLIS is AMD's official recommendation for BLAS.

Mimics my ancient
f5bc78263e
in cmake upstream

BLIS WWW: https://github.com/flame/blis
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23819

Differential Revision: D17231360

Pulled By: bddppq

fbshipit-source-id: 68db70d63e410438f99b2bf57986b81ff6b6c5b3
2019-09-06 12:00:25 -07:00
Jiakai Liu
67c530851c get rid of protobuf dependencies (#25650)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25650

This PR removes protobuf dependencies from mobile build altogether:
- caffe2/proto: protobuf files, including caffe2.proto and torch.proto;
- caffe2 components that depend on caffe2.proto, including most of
caffe2/core and caffe2/utils;
- libprotobuf / libprotobuf-lite dependencies;
- protobuf compiler;
- some utils classes, e.g. netdef_converter.cpp;
- introduce a macro to disable third_party/onnx which depends on protobuf;

Test Plan:
- builds;
- link with demo app to make sure it can load and run a model in pickle format;

Differential Revision: D17183548

Pulled By: ljk53

fbshipit-source-id: fe60b48674f29c4a9b58fd1cf8ece44191491531
2019-09-06 08:48:20 -07:00
Hong Xu
cc4211069e Do not pass down USE_GLOO_IBVERBS to CMake (#25720)
Summary:
It doesn't seem to be used anywhere once passed down to CMake, either in this repo or in any submodule.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25720

Differential Revision: D17225088

Pulled By: pietern

fbshipit-source-id: a24b080e6346a203b345e2b834fe095e3b9aece0
2019-09-06 02:40:42 -07:00
Johannes M Dieterich
9c5a899773 Enable jit fusion on ROCm (#22872)
Summary:
As of ROCm 2.6, we support hiprtc - the HIP runtime compilation API. Enable the jit fusion feature depending on the existence of such an API. This entails
* new hipification rules for API_RTC
* add hiprtc APIs to the shim loader
* update cmake infrastructure to find the hiprtc library (it is part of the HIP package)
* enabling of unit tests in the jit_fuser test set
* special casing in resource strings for HIP - the typedefs CUDA requires would be redundant
* for now disable the occupancy calculation we do not support yet and hard-code

Thanks to t-vi for working with me on getting this integration done!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22872

Differential Revision: D17207425

Pulled By: bddppq

fbshipit-source-id: 93409f3051ad0ea06afacc2239fd6c402152debe
2019-09-05 18:22:08 -07:00
Pieter Noordhuis
3556bea5aa Build torch.distributed with Gloo backend on macOS (#25260)
Summary:
In facebookincubator/gloo#212, a libuv based Gloo transport was introduced,
which allows us to use Gloo on macOS (and later perhaps also Windows). This
commit updates CMake code to enable building with USE_DISTRIBUTED=1 on macOS.

A few notes:
* The Caffe2 ops are not compiled, for they depend on `gloo::transport::tcp`.
* The process group implementation uses `gloo::transport::tcp` on Linux (because of `epoll(2)`) and `gloo::transport::uv` on macOS.
* The TCP store works but sometimes crashes on process termination.
* The distributed tests are not yet run.
* The nightly builds don't use `USE_DISTRIBUTED=1`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/25260

Reviewed By: mrshenli

Differential Revision: D17202381

Pulled By: pietern

fbshipit-source-id: ca80a82e78a05b4154271d2fb0ed31c8d9f26a7c
2019-09-05 07:09:50 -07:00
James Reed
817f4502fb Dynamic dispatch for optimized quantized op kernels (#25545)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25545

This re-uses the infrastructure from ATen/native/cpu, which compiles kernels multiple times for different instruction sets and dispatches dynamically based on the CPU's capability flags at runtime. This ensures we use the optimal quantized kernel for the given machine.

Test Plan: Imported from OSS

Differential Revision: D17166369

Pulled By: jamesr66a

fbshipit-source-id: 8c8393f99365e1408819bbaf254c1b5734a34b70
2019-09-04 13:26:40 -07:00