Summary:
I noticed while working on https://github.com/pytorch/pytorch/issues/45163 that edits to Python files in the `tools/codegen/api/` directory wouldn't trigger rebuilds. This change tells CMake about all of the codegen dependencies, so rebuilds are triggered automatically.
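As a rough illustration of what "all of the dependencies" means here (a hypothetical helper, not the actual CMake change in this PR): every Python file the code generator reads should be treated as an input of the generated sources, e.g. something like:
```
# Hypothetical illustration only: enumerate every codegen input so the build
# system can declare the generated sources as depending on them and re-run
# codegen when any of them changes. The real fix lives in the CMake files.
from pathlib import Path

codegen_inputs = sorted(str(p) for p in Path("tools/codegen").rglob("*.py"))
print(";".join(codegen_inputs))
```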
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45275
Reviewed By: zou3519
Differential Revision: D23922805
Pulled By: ezyang
fbshipit-source-id: 0fbf2b6a9b2346c31b9b0384e5ad5e0eb0f70e9b
Summary:
[Tests for Vec256 classes](https://github.com/pytorch/pytorch/issues/15676)
Testing
Current list:
- [x] Blends
- [x] Memory: UnAlignedLoadStore
- [x] Arithmetics: Plus, Minus, Multiplication, Division
- [x] Bitwise: BitAnd, BitOr, BitXor
- [x] Comparison: Equal, NotEqual, Greater, Less, GreaterEqual, LessEqual
- [x] MinMax: Minimum, Maximum, ClampMin, ClampMax, Clamp
- [x] SignManipulation: Absolute, Negate
- [x] Interleave: Interleave, DeInterleave
- [x] Rounding: Round, Ceil, Floor, Trunc
- [x] Mask: ZeroMask
- [x] SqrtAndReciprocal: Sqrt, RSqrt, Reciprocal
- [x] Trigonometric: Sin, Cos, Tan
- [x] Hyperbolic: Tanh, Sinh, Cosh
- [x] InverseTrigonometric: Asin, ACos, ATan, ATan2
- [x] Logarithm: Log, Log2, Log10, Log1p
- [x] Exponents: Exp, Expm1
- [x] ErrorFunctions: Erf, Erfc, Erfinv
- [x] Pow: Pow
- [x] LGamma: LGamma
- [x] Quantization: quantize, dequantize, requantize_from_int
- [x] Quantization: widening_subtract, relu, relu6
Missing:
- [ ] Constructors, initializations
- [ ] Conversion, Cast
- [ ] Additional: imag, conj, angle (note: imag and conj only checked for float complex)
#### Notes on tests and testing framework
- some math functions are tested within a restricted domain range
- the testing framework mostly tests randomly generated values against the std implementation, restricted to the function's domain (or, for some math functions, to the implementation's domain); see the sketch after these notes
- some functions are tested against a local reference version. ~~For example, std::round and the vector version of round differ, so round was tested against the local version~~
- round was tested against PyTorch's at::native::round_impl. ~~For the double type on **VSX, vec_round failed for (even)+0.5 values**~~. This was solved by using vec_rint
- ~~**complex types are not tested**~~ **After enabling complex testing, some of the complex functions failed for VSX and x86 AVX as well, due to precision and domain issues. I will either test them against a local implementation or check within the accepted domain**
- ~~quantizations are not tested~~ Added tests for the quantize, dequantize, requantize_from_int, relu, relu6, and widening_subtract functions
- the testing framework should be improved further
- ~~For now `-DBUILD_MOBILE_TEST=ON` will be used for Vec256Test too~~
Vec256 Test cases will be built for each CPU_CAPABILITY
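To make the domain-restricted random testing above concrete, here is a minimal Python sketch of the idea (an illustration only; the actual Vec256 tests are C++ gtest cases built per CPU_CAPABILITY):
```
# Illustration of domain-restricted random testing against a reference
# implementation; the real Vec256 tests are written in C++ (gtest).
import math
import random

import torch

def check_against_reference(fn, ref, domain, trials=10000, tol=1e-6):
    lo, hi = domain
    for _ in range(trials):
        x = random.uniform(lo, hi)
        assert abs(fn(x) - ref(x)) <= tol, f"mismatch at x={x}"

# e.g. asin is only exercised inside its mathematical domain [-1, 1]
check_against_reference(
    lambda v: torch.asin(torch.tensor(v, dtype=torch.float64)).item(),
    math.asin,
    (-1.0, 1.0),
)
```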
Fixes: https://github.com/pytorch/pytorch/issues/15676
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42685
Reviewed By: malfet
Differential Revision: D23034406
Pulled By: glaringlee
fbshipit-source-id: d1bf03acdfa271c88744c5d0235eeb8b77288ef8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42629
How to approach reviewing this diff:
- The new codegen itself lives in `tools/codegen`. Start with `gen.py`, then read `model.py`, and then the `api/` folder. The comments at the top of the files describe what is going on. The CLI interface of the new codegen is similar to the old one, but (1) it is no longer necessary to explicitly specify cwrap inputs (and we now error if you do so) and (2) the default settings for the source and install dirs are much better, to the extent that if you run the codegen from the root source directory as just `python -m tools.codegen.gen`, something reasonable will happen.
- The old codegen is (nearly) entirely deleted; every Python file in `aten/src/ATen` was deleted except for `common_with_cwrap.py`, which now permanently finds its home in `tools/shared/cwrap_common.py` (previously cmake copied the file there), and `code_template.py`, which now lives in `tools/codegen/code_template.py`. We remove the copying logic for `common_with_cwrap.py`.
- All of the inputs to the old codegen are deleted.
- Build rules now have to be adjusted to not refer to files that no longer exist, and to abide by the (slightly modified) CLI.
- LegacyTHFunctions files have been generated and checked in. We expect these to be deleted as these final functions get ported to ATen. The deletion process is straightforward; just delete the functions of the ones you are porting. There are 39 more functions left to port.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: bhosmer
Differential Revision: D23183978
Pulled By: ezyang
fbshipit-source-id: 6073ba432ad182c7284a97147b05f0574a02f763
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43564
Static dispatch was originally introduced for mobile selective build.
Since we have added selective build support for dynamic dispatch and
tested it in FB production for months, we can deprecate static dispatch
to reduce the complexity of the codebase.
Test Plan: Imported from OSS
Reviewed By: ezyang
Differential Revision: D23324452
Pulled By: ljk53
fbshipit-source-id: d2970257616a8c6337f90249076fca1ae93090c7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43570
Add the default op dependency graph to the source tree - use it if the user runs a custom build in dynamic dispatch mode without providing the graph.
Test Plan: Imported from OSS
Reviewed By: ezyang
Differential Revision: D23326988
Pulled By: ljk53
fbshipit-source-id: 5fefe90ca08bb0ca20284e87b70fe1dba8c66084
Summary:
Add support for including pytorch via an add_subdirectory()
This requires using PROJECT_* instead of CMAKE_*, which refer to
the top-most project including pytorch.
TEST=add_subdirectory() into a pytorch checkout and build.
There are still some hardcoded references to TORCH_SRC_DIR; I will
fix them in a follow-on commit. For now you can create a symlink to
<pytorch>/torch/ in your project.
Change-Id: Ic2a8aec3b08f64e2c23d9e79db83f14a0a896abc
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41387
Reviewed By: zhangguanheng66
Differential Revision: D22539944
Pulled By: ezyang
fbshipit-source-id: b7e9631021938255f0a6ea897a7abb061759093d
Summary:
This PR contains the initial version of Vulkan (GPU) Backend integration.
The primary target environment is Android, but the desktop build is also supported.
## CMake
Introducing three cmake options:
USE_VULKAN:
The main switch; if it is off, the other options have no effect.
USE_VULKAN_WRAPPER:
ON - Vulkan is loaded at runtime as "libvulkan.so" using libdl; every function call is wrapped in vulkan_wrapper.h.
OFF - link against libvulkan.so directly.
USE_VULKAN_SHADERC_RUNTIME:
ON - the shader compilation library is linked, and shaders are compiled at runtime.
OFF - shaders are precompiled and the shader compilation library is not included.
## Codegen
if `USE_VULKAN_SHADERC_RUNTIME` is ON:
Shader precompilation starts in cmake/VulkanCodegen.cmake, which calls `aten/src/ATen/native/vulkan/gen_glsl.py` or `aten/src/ATen/native/vulkan/gen_spv.py` to include the shader source or SPIR-V bytecode inside the binary as a uint32_t array in spv.h, spv.cpp.
if `USE_VULKAN_SHADERC_RUNTIME` is OFF:
The shader source is included as `glsl.h`, `glsl.cpp`.
All codegen output goes to the build directory.
## Build dependencies
cmake/Dependencies.cmake
If the target platform is Android, the Vulkan library, headers, and Vulkan wrapper are taken from the ANDROID_NDK.
The desktop build requires the VULKAN_SDK environment variable, and all Vulkan dependencies are taken from it.
(Desktop build was tested only on Linux).
## Pytorch integration:
Adding "Vulkan" as a new Backend, DispatchKey, and DeviceType.
We are using Strided layout without supporting strides at the moment, but we plan to support them in the future.
Using OpaqueTensorImpl where the OpaqueHandle is a copyable VulkanTensor;
more details are in the comments in `aten/src/ATen/native/vulkan/Vulkan.h`.
Main code location: `aten/src/ATen/native/vulkan`
`aten/src/ATen/native/vulkan/VulkanAten.cpp` - the connection point between ATen and the Vulkan API (Vulkan.h); it converts at::Tensor to VulkanTensor.
`aten/src/ATen/native/vulkan/Vulkan.h` - the Vulkan API that contains the VulkanTensor representation and functions to work with it. We plan to expose it so clients can write their own Vulkan ops.
`aten/src/ATen/native/vulkan/VulkanOps.cpp` - implementations of Vulkan operations using the Vulkan.h API.
## GLSL shaders
Located in `aten/src/ATen/native/vulkan/glsl` as *.glsl files.
All shaders use Vulkan specialized constants for workgroup sizes with ids 1, 2, 3
## Supported operations
Code point:
conv2d no-groups
conv2d depthwise
addmm
upsample nearest 2d
clamp
hardtanh
## Testing
`aten/src/ATen/test/vulkan_test.cpp` - contains tests for
copy from CPU to Vulkan and back
all supported operations
Desktop builds are supported, and testing can be done on a desktop that has a Vulkan-capable GPU or an installed software implementation of Vulkan, such as https://github.com/google/swiftshader
## Vulkan execution
The initial implementation is trivial and waits for every operator's execution to complete.
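As a rough, hypothetical usage sketch from the Python side (my own illustration, assuming a Vulkan-enabled build and that the new DeviceType is reachable from Python under the name "vulkan"; the PR's real coverage is the C++ test above):
```
# Hypothetical sketch only; real coverage for this PR is
# aten/src/ATen/test/vulkan_test.cpp. Assumes a Vulkan-enabled build and that
# the new device is exposed to Python as "vulkan".
import torch

x = torch.rand(1, 3, 64, 64)                 # CPU tensor
x_vk = x.to("vulkan")                        # copy CPU -> Vulkan
y_vk = torch.clamp(x_vk, 0.0, 6.0)           # clamp is one of the supported ops
y = y_vk.to("cpu")                           # copy back for inspection
print(torch.allclose(y, torch.clamp(x, 0.0, 6.0)))
```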
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36491
Differential Revision: D21696709
Pulled By: IvanKobzarev
fbshipit-source-id: da3e5a770b1a1995e9465d7e81963e7de56217fa
Summary:
Replace the hardcoded filelist in aten/src/ATen/CMakeLists.txt with the one from `jit_source_sources`
Fix `append_filelist` to work independently of the location from which it is invoked
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38526
Differential Revision: D21594582
Pulled By: malfet
fbshipit-source-id: c7f216a460edd474a6258ba5ddafd4c4f59b02be
Summary:
The `configure_file` command adds its input as a top-level dependency, triggering makefile regeneration if the file's timestamp has changed
Also abort CMake if `exec` of build_variables.bzl fails for some reason
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36809
Test Plan: Add invalid statement to build_variables.bzl and check that build process fails
Differential Revision: D21100721
Pulled By: malfet
fbshipit-source-id: 79a54aa367fb8dedb269c78b9538b4da203d856b
Summary:
Mimic `.bzl` parsing logic from https://github.com/pytorch/FBGEMM/pull/344
Generate `libtorch_cmake_sources` by running the following script:
```
def read_file(path):
    with open(path) as f:
        return f.read()


def get_cmake_torch_srcs():
    caffe2_cmake = read_file("caffe2/CMakeLists.txt")
    start = caffe2_cmake.find("set(TORCH_SRCS")
    end = caffe2_cmake.find(")", start)
    return caffe2_cmake[start:end + 1]


def get_cmake_torch_srcs_list():
    caffe2_torch_srcs = get_cmake_torch_srcs()
    unfiltered_list = [x.strip() for x in caffe2_torch_srcs.split("\n") if len(x.strip()) > 0]
    return [x.replace("${TORCH_SRC_DIR}/", "torch/") for x in unfiltered_list if 'TORCH_SRC_DIR' in x]


import imp
build_variables = imp.load_source('build_variables', 'tools/build_variables.bzl')

libtorch_core_sources = set(build_variables.libtorch_core_sources)
caffe2_torch_srcs = set(get_cmake_torch_srcs_list())

if not libtorch_core_sources.issubset(caffe2_torch_srcs):
    print("libtorch_core_sources must be a subset of caffe2_torch_srcs")
    print(sorted(caffe2_torch_srcs.difference(libtorch_core_sources)))
```
Move common files between `libtorch_cmake_sources` and `libtorch_extra_sources` to `libtorch_jit_core_sources`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36737
Test Plan: CI
Differential Revision: D21078753
Pulled By: malfet
fbshipit-source-id: f46ca48d48aa122188f028136c14687ff52629ed
Summary:
PR #32521 has several issues with mobile builds:
1. It didn't work with static dispatch (which OSS mobile build currently uses);
2. PR #34275 fixed 1) but it doesn't fix custom build for #32521;
3. manuallyBoxedKernel has a bug with ops which only have catchAllKernel: 2d7ede5f71
Both 1) and 2) have a similar root cause - some JIT-side code expects certain schemas to be registered in the JIT registry.
For example, consider this code snippet: https://github.com/pytorch/pytorch/blob/master/torch/csrc/jit/frontend/builtin_functions.cpp#L10
```
auto scalar_operators_source = CodeTemplate(
    R"SCRIPT(
def mul(a : ${Scalar}, b : Tensor) -> Tensor:
  return b * a
...
```
It expects "aten::mul.Scalar(Tensor self, Scalar other) -> Tensor" to be registered in JIT - it doesn't necessarily need to call the implementation, though; otherwise it will fail some type check: https://github.com/pytorch/pytorch/pull/34013#issuecomment-592982889
Before #32521, all JIT registrations happen in register_aten_ops_*.cpp generated by gen_jit_dispatch.py.
After #32521, for ops with full c10 templated boxing/unboxing support, JIT registrations happen in TypeDefault.cpp/CPUType.cpp/... generated by aten/gen.py, with c10 register API via RegistrationListener in register_c10_ops.cpp. However, c10 registration in TypeDefault.cpp/CPUType.cpp/... are gated by `#ifndef USE_STATIC_DISPATCH`, thus these schemas won't be registered in JIT registry when USE_STATIC_DISPATCH is enabled.
PR #34275 fixes the problem by moving c10 registration out of `#ifndef USE_STATIC_DISPATCH` in TypeDefault.cpp/CPUType.cpp/..., so that all schemas can still be registered in JIT. But it doesn't fix custom build, where we only keep c10 registrations for ops used by specific model directly (for static dispatch custom build) and indirectly (for dynamic dispatch custom build). Currently there is no way for custom build script to know things like "aten::mul.Scalar(Tensor self, Scalar other) -> Tensor" needs to be kept, and in fact the implementation is not needed, only schema needs to be registered in JIT.
Before #32521, the problem was solved by keeping a DUMMY placeholder for unused ops in register_aten_ops_*.cpp: https://github.com/pytorch/pytorch/blob/master/tools/jit/gen_jit_dispatch.py#L326
After #32521, we could do a similar thing by forcing aten/gen.py to register ALL schema strings for selective build - which is what this PR is doing.
Measured impact on custom build size (for MobileNetV2):
```
SELECTED_OP_LIST=MobileNetV2.yaml scripts/build_pytorch_android.sh armeabi-v7a
```
Before: 3,404,978
After: 3,432,569
~28K compressed size increase due to including more schema strings.
The table below summarizes the relationship between codegen flags and 5 build configurations that are related to mobile:
```
+--------------------------------------+-----------------------------------------------------------------------------+--------------------------------------------+
| | Open Source | FB BUCK |
+--------------------------------------+---------------------+---------------------------+---------------------------+---------------+----------------------------+
| | Default Build | Custom Build w/ Stat-Disp | Custom Build w/ Dyna-Disp | Full-JIT | Lite-JIT |
+--------------------------------------+---------------------+---------------------------+---------------------------+---------------+----------------------------+
| Dispatch Type | Static | Static | Dynamic | Dynamic (WIP) | Dynamic (WIP) |
+--------------------------------------+---------------------+---------------------------+---------------------------+---------------+----------------------------+
| ATen/gen.py | | | | | |
+--------------------------------------+---------------------+---------------------------+---------------------------+---------------+----------------------------+
| --op_registration_whitelist | unset | used root ops | closure(used root ops) | unset | closure(possibly used ops) |
| --backend_whitelist | CPU Q-CPU | CPU Q-CPU | CPU Q-CPU | CPU Q-CPU | CPU Q-CPU |
| --per_op_registration | false | false | false | false | true |
| --force_schema_registration | false | true | true | false | false |
+--------------------------------------+---------------------+---------------------------+---------------------------+---------------+----------------------------+
| tools/setup_helpers/generate_code.py | | | | | |
+--------------------------------------+---------------------+---------------------------+---------------------------+---------------+----------------------------+
| --disable-autograd | true | true | true | false | WIP |
| --selected-op-list-path | file(used root ops) | file(used root ops) | file(used root ops) | unset | WIP |
| --disable_gen_tracing | false | false | false | false | WIP |
+--------------------------------------+---------------------+---------------------------+---------------------------+---------------+----------------------------+
```
Differential Revision: D20397421
Test Plan: Imported from OSS
Pulled By: ljk53
fbshipit-source-id: 906750949ecacf68ac1e810fd22ee99f2e968d0b
Summary:
PR #32521 broke static dispatch because some ops are no longer
registered in register_aten_ops_*.cpp - it expects the c10 registers in
TypeDefault.cpp / CPUType.cpp / etc to register these ops. However, all
c10 registers are inside the `#ifndef USE_STATIC_DISPATCH` section.
To measure the OSS mobile build size impact of this PR:
```
# default build: scripts/build_pytorch_android.sh armeabi-v7a
# mobilenetv2 custom build: SELECTED_OP_LIST=MobileNetV2.yaml scripts/build_pytorch_android.sh armeabi-v7a
```
- Before this PR, Android AAR size for arm-v7:
* default build: 5.5M;
* mobilenetv2 custom build: 3.2M;
- After this PR:
* default build: 6.4M;
* mobilenetv2 custom build: 3.3M;
It regressed the default build size by ~1M because more root ops are
registered by the c10 registers, e.g. backward ops, which are filtered out by
gen_jit_dispatch.py for the inference-only mobile build.
The mobilenetv2 custom build size regressed by ~100K, presumably because
the op whitelist is not yet applied to things like BackendSelectRegister.
Differential Revision: D20266240
Test Plan: Imported from OSS
Pulled By: ljk53
fbshipit-source-id: 97a9a06779f8c62fe3ff5cce089aa7fa9dee3c4a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34055
Enable custom mobile build with dynamic dispatch for OSS build.
It calls a Python util script to calculate the transitive dependencies from
the op dependency graph and the list of used root ops, then passes the
result as the op registration whitelist to the aten codegen, so that only
these used ops are registered and kept at link time.
For custom build with dynamic dispatch to work correctly, it's critical
to have an accurate list of used ops. The current assumption is that only
those ops referenced by the TorchScript model are used. It works well if
client code doesn't call libtorch API (e.g. tensor methods) directly;
otherwise the extra used ops need to be added to the whitelist manually,
as shown by the HACK in prepare_model.py.
Also, if JIT starts calling extra ops independent of specific model,
then the extra ops need to be added to the whitelist as well.
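The transitive-dependency step can be pictured with a small sketch like the one below (an illustration of the idea only, not the actual util script; the graph format and names here are assumptions):
```
# Illustration of the transitive-closure step: starting from the root ops used
# by the model, walk the op dependency graph and collect everything reachable.
# (Not the actual util script; graph format and names are assumed.)
def closure(dep_graph, root_ops):
    """dep_graph: {op_name: [op_names it may call]}; returns the whitelist."""
    seen, stack = set(), list(root_ops)
    while stack:
        op = stack.pop()
        if op in seen:
            continue
        seen.add(op)
        stack.extend(dep_graph.get(op, []))
    return sorted(seen)

example_graph = {
    "aten::conv2d": ["aten::convolution"],
    "aten::convolution": ["aten::_convolution"],
}
print(closure(example_graph, ["aten::conv2d"]))
# -> ['aten::_convolution', 'aten::conv2d', 'aten::convolution']
```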
Verified the correctness of the whole process with MobileNetV2:
```
TEST_CUSTOM_BUILD_DYNAMIC=1 test/mobile/custom_build/build.sh
```
Test Plan: Imported from OSS
Reviewed By: bhosmer
Differential Revision: D20193327
Pulled By: ljk53
fbshipit-source-id: 9d369b8864856b098342aea79e0ac8eec04149aa
Summary:
This PR moves glu to ATen (CPU).
Test script:
```
import torch
import torch.nn.functional as F
import time

torch.manual_seed(0)

def _time():
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    return time.time()

device = "cpu"

# warm up
for n in [10, 100, 1000, 10000]:
    input = torch.randn(128, n, requires_grad=True, device=device)
    grad_output = torch.ones(128, n // 2, device=device)
    for i in range(1000):
        output = F.glu(input)
        output.backward(grad_output)

for n in [10, 100, 1000, 10000]:
    fwd_t = 0
    bwd_t = 0
    input = torch.randn(128, n, requires_grad=True, device=device)
    grad_output = torch.ones(128, n // 2, device=device)
    for i in range(10000):
        t1 = _time()
        output = F.glu(input)
        t2 = _time()
        output.backward(grad_output)
        t3 = _time()
        fwd_t = fwd_t + (t2 - t1)
        bwd_t = bwd_t + (t3 - t2)
    fwd_avg = fwd_t / 10000 * 1000
    bwd_avg = bwd_t / 10000 * 1000
    print("input size(128, %d) forward time is %.2f (ms); backward avg time is %.2f (ms)."
          % (n, fwd_avg, bwd_avg))
```
Test device: **skx-8180.**
Before:
```
input size(128, 10) forward time is 0.04 (ms); backward avg time is 0.08 (ms).
input size(128, 100) forward time is 0.06 (ms); backward avg time is 0.14 (ms).
input size(128, 1000) forward time is 0.11 (ms); backward avg time is 0.31 (ms).
input size(128, 10000) forward time is 1.52 (ms); backward avg time is 2.04 (ms).
```
After:
```
input size(128, 10) forward time is 0.02 (ms); backward avg time is 0.05 (ms).
input size(128, 100) forward time is 0.04 (ms); backward avg time is 0.09 (ms).
input size(128, 1000) forward time is 0.07 (ms); backward avg time is 0.17 (ms).
input size(128, 10000) forward time is 0.13 (ms); backward avg time is 1.03 (ms).
```
Fixes https://github.com/pytorch/pytorch/issues/24707 and https://github.com/pytorch/pytorch/issues/24708.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33179
Differential Revision: D19839835
Pulled By: VitalyFedyunin
fbshipit-source-id: e4d3438556a1068da2c4a7e573d6bbf8d2a6e2b9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27086
This is a major source of merge conflicts, and AFAICT isn't necessary anymore (it may have been necessary for some mobile build stuff in the past).
This is a commandeer of #25031
Test Plan: Imported from OSS
Reviewed By: ljk53
Differential Revision: D17687345
Pulled By: ezyang
fbshipit-source-id: bf6131af835ed1f9e3c10699c81d4454a240445f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26131
Changes in this PR:
- For each operator with use_c10_dispatcher: True, additionally generate a c10 registration line in TypeDefault.cpp, CPUType.cpp, and other backend files.
- This doesn't change globalATenDispatch yet; the c10 registration is purely additional and the operator calling path doesn't change. A diff further up the stack will change these things.
- Enable the use_c10_dispatcher: True flag for ~70% of operators
- This also changes the c10->jit operator export, because ATen ops are already exported to JIT directly and we don't want to export the registered c10 ops as well, since they would clash
- For this, we need a way to recognize whether a certain operator has already been moved from ATen to c10; this is done by generating an OpsAlreadyMovedToC10.cpp file with the list. A diff further up in the stack will also need this file to make sure we don't break the backend extension API for these ops.
Reasons for some ops to be excluded (i.e. not have the `use_c10_dispatcher` flag set to true):
- `Tensor?(a!)` (i.e. optional tensor with annotations) not supported in c++ function schema parser yet
- `-> void` in native_functions.yaml vs `-> ()` expected by function schema parser
- out functions have different argument order in C++ as in the jit schema
- `Tensor?` (i.e. optional tensor) doesn't work nicely with undefined tensor sometimes being undefined tensor and sometimes being None.
- fixed-size arrays like `int[3]` not supported in c10 yet
These will be fixed in separate diffs and then the exclusion tag will be removed.
ghstack-source-id: 90060748
Test Plan: a diff stacked on top uses these registrations to call these ops from ATen
Differential Revision: D16603131
fbshipit-source-id: 315eb83d0b567eb0cd49973060b44ee1d6d64bfb
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25545
This re-uses the infrastructure from ATen/native/cpu, which compiles kernels multiple times for different instruction sets and dispatches dynamically based on the CPU's capability flags at runtime. This ensures we use the optimal quantized kernel for the given machine.
Test Plan: Imported from OSS
Differential Revision: D17166369
Pulled By: jamesr66a
fbshipit-source-id: 8c8393f99365e1408819bbaf254c1b5734a34b70
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23888
This is an alternative to https://github.com/pytorch/pytorch/pull/23684.
Instead of splitting a bunch of headers into declaration and definition, we change tensor includes to only include the tensor declaration when the tensor definition isn't needed.
ghstack-source-id: 89357687
Test Plan: waitforsandcastle
Differential Revision: D16673569
fbshipit-source-id: fa1d92809b05de7910a8c2dc2f55abe071ca63bf
Summary:
Recent versions of GCC split unaligned load and store intrinsics into
two 128-bit instructions. On old processors (Sandy Bridge) this was a
bit faster for unaligned data, but a bit slower for aligned data. On new
processors (Intel Haswell+, recent AMD) splitting loads is slower on
both aligned and unaligned data.
Clang, MSVC, and ICC do not split unaligned load and store intrinsics.
There's a good explanation here:
https://stackoverflow.com/questions/52626726/why-doesnt-gcc-resolve-mm256-loadu-pd-as-single-vmovupd#tab-top
Splitting load and store intrinsics makes no sense in our AVX2
configuration because the CPUs that support AVX2 instructions are the
same CPUs where splitting is disadvantageous for all data alignments.
Note that this doesn't change the AVX configuration (used by CPUs that
support AVX but not AVX2). It's possible this would be beneficial for
that configuration too (our data is usually 32-byte aligned), but I'd
prefer the conservative change for now.
torch.add generated assembly (hot loop) (GCC 7.3.0)
before:
https://gist.github.com/colesbury/066376537bccd514daf8fe4ab54d8295
after:
https://gist.github.com/colesbury/8b4b948145001d44b225c51d2428bb91
Timing of `torch.add(x, y, out=z)` for size 10240 (1 thread, Broadwell,
no turbo):
before: 7.35 us; after: 6.39 us
(Take the torch.add timings with a grain of salt. The difference in timings
is much larger than I would expect.)
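For reference, a minimal sketch of how such a micro-benchmark could be reproduced (my own illustration; the exact harness behind the numbers above is not part of this commit):
```
# Rough re-creation of the torch.add micro-benchmark described above
# (illustrative only; numbers will vary by machine and build flags).
import time
import torch

torch.set_num_threads(1)  # the timings above were taken with 1 thread
n, iters = 10240, 100000
x, y, z = torch.randn(n), torch.randn(n), torch.empty(n)

for _ in range(1000):                 # warm-up
    torch.add(x, y, out=z)

start = time.perf_counter()
for _ in range(iters):
    torch.add(x, y, out=z)
elapsed = time.perf_counter() - start
print("torch.add(x, y, out=z): %.2f us per call" % (elapsed / iters * 1e6))
```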
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20609
Differential Revision: D15385800
Pulled By: colesbury
fbshipit-source-id: 66415b148a3b19360b9de9881af594ab46547b6f
Summary:
```
This diff changes the HIPification of ATen to be out-of-place.
We now have the following mappings:
- ATen/cuda => ATen/hip
- ATen/native/cuda => ATen/native/hip
- ATen/native/sparse/cuda => ATen/native/sparse/hip
- THC => THH
- THCUNN => THHUNN
The build system is adjusted to know about these new build paths,
and HIPify is taught how to adjust include paths and
THC_GENERIC_FILE appropriately. ATen_hip is now built as
the ATen_hip library, rather than reusing ATen_cuda.
However, despite these new filepaths, none of the identifiers in ATen
have actually changed. So, e.g., THHGeneral.h still defines functions
named THC_blahblah, and HIP still shows up as CUDA in PyTorch itself.
We'll tackle this in a subsequent PR; this diff is just to get the files
out-of-place.
Minor extra improvements:
- Don't edit tmp_install when hipifying
- HIP no longer builds native_cudnn_cpp; it was unnecessary
- Caffe2_HIP_INCLUDES is now Caffe2_HIP_INCLUDE, for consistency
with all the other variables.
- HIP build now properly respects ATEN_CUDA_FILES_GEN_LIB (it
did not previously.)
- You can now override file extension matching in pyHIPIFY
by explicitly specifying its full name in the matching list.
This is used so we can HIPify CMakeLists.txt in some situations.
A little bit of string and ceiling wax:
- gen.py grows a --rocm flag so that it knows to generate CUDA
files which actually refer to the HIP headers (e.g., THH.h)
We'll get rid of this eventually and generate real HIP files,
but not for this PR.
- Management of HIP dependencies is now completely deleted
from the ATen CMakeLists.txt. The old code was dead (because
it was shoveled in ATen_CUDA_DEPENDENCY_LIBS and promptly
ignored by the Caffe2 build system) and didn't actually work.
```
Stacked on https://github.com/pytorch/pytorch/pull/14849 review last commit only
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14866
Differential Revision: D13419475
Pulled By: ezyang
fbshipit-source-id: cb4c843df69a1d8369314c9fab1b7719520fa3db
Summary:
Performance-oriented code will use AVX/AVX2, so we don't need SSE-specific code anymore. This will also reduce the probability of running into an error on legacy CPUs.
On top of this, convolve is covered by modern libraries such as MKLDNN, which are much more performant and which we now build against by default (even for builds from source).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12109
Differential Revision: D10055134
Pulled By: colesbury
fbshipit-source-id: 789b8a34d5936d9c144bcde410c30f7eb1c776fa
Summary:
ATenCore.h is a dummy header to just test that this is working at all.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10019
Reviewed By: smessmer
Differential Revision: D9067262
Pulled By: ezyang
fbshipit-source-id: 58bab9c0aa83b56335e36b719b9b6505400d8dee
* Have PyTorch depend on minimal libcaffe2.so instead of libATen.so
* Build ATen tests as a part of Caffe2 build
* Hopefully cufft and nvcc fPIC fixes
* Make ATen install components optional
* Add tests back for ATen and fix TH build
* Fixes for test_install.sh script
* Fixes for cpp_build/build_all.sh
* Fixes for aten/tools/run_tests.sh
* Switch ATen cmake calls to USE_CUDA instead of NO_CUDA
* Attempt at fix for aten/tools/run_tests.sh
* Fix typo in last commit
* Fix valgrind call after pushd
* Be forgiving about USE_CUDA disable like PyTorch
* More fixes on the install side
* Link all libcaffe2 during test run
* Make cuDNN optional for ATen right now
* Potential fix for non-CUDA builds
* Use NCCL_ROOT_DIR environment variable
* Pass -fPIC through nvcc to base compiler/linker
* Remove THCUNN.h requirement for libtorch gen
* Add Mac test for -Wmaybe-uninitialized
* Potential Windows and Mac fixes
* Move MSVC target props to shared function
* Disable cpp_build/libtorch tests on Mac
* Disable sleef for Windows builds
* Move protos under BUILD_CAFFE2
* Remove space from linker flags passed with -Wl
* Remove ATen from Caffe2 dep libs since directly included
* Potential Windows fixes
* Preserve options while sleef builds
* Force BUILD_SHARED_LIBS flag for Caffe2 builds
* Set DYLD_LIBRARY_PATH and LD_LIBRARY_PATH for Mac testing
* Pass TORCH_CUDA_ARCH_LIST directly in cuda.cmake
* Fixes for the last two changes
* Potential fix for Mac build failure
* Switch Caffe2 to build_caffe2 dir to not conflict
* Cleanup FindMKL.cmake
* Another attempt at Mac cpp_build fix
* Clear cpp-build directory for Mac builds
* Disable test in Mac build/test to match cmake