- port https://github.com/intel-innersource/frameworks.ai.pytorch.ipex-cpu/pull/740 to `run_cpu`
- use-case raised by https://github.com/pytorch/serve/pull/2166, where `numactl` is unavailable (e.g., it requires `privileged` mode)
This PR automatically tries taskset if numactl core binding doesn't work.
Reference:
`taskset` is added to handle launcher use-cases, such as Docker, where `numactl` requires running in `privileged` mode, and where `privileged` mode "won't work for deployments like sagemaker for example", as raised by TorchServe. Please see the [torchserve ipex docker discussion](https://github.com/pytorch/serve/pull/1401#issuecomment-1090817704) for reference. To address such use-cases, `taskset` can be used in place of `numactl` to set core affinity. Note that, unlike `numactl`, `taskset` does not provide memory binding to local memories; however, memory binding may not be needed in these use-cases, which typically do not span multiple sockets. Hence we automatically try `taskset` if `numactl` doesn't work.
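A minimal sketch of the fallback idea (helper names here are hypothetical, not the actual `run_cpu` code):

```python
import shutil
import subprocess

def _numactl_works():
    # Hypothetical probe: numactl may be installed yet fail at runtime
    # (e.g. inside docker without --privileged), so test it once.
    try:
        subprocess.run(["numactl", "-C", "0", "true"], check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        return True
    except (OSError, subprocess.CalledProcessError):
        return False

def launch_with_core_binding(cores, cmd):
    # Prefer numactl (core binding + local memory binding); fall back to
    # taskset (core binding only) when numactl is unavailable or not permitted.
    core_list = ",".join(str(c) for c in cores)
    if shutil.which("numactl") and _numactl_works():
        prefix = ["numactl", "-C", core_list]
    elif shutil.which("taskset"):
        prefix = ["taskset", "-c", core_list]
    else:
        prefix = []  # no core binding available
    subprocess.run(prefix + cmd, check=True)
```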
Pull Request resolved: https://github.com/pytorch/pytorch/pull/96011
Approved by: https://github.com/jgong5, https://github.com/malfet
Will be needed if one wants to make accurate XFAIL validation
I.e. `torch.backends.mps.is_macos13_or_newer()` returns True if PyTorch is running on macOS 13.0 or newer, `torch.backends.mps.is_macos13_or_newer(1)` returns True on macOS 13.1 or newer, and `torch.backends.mps.is_macos13_or_newer(2)` returns True on macOS 13.2 or newer.
Do not use 13.3 check as `@available` does not really work for shared libraries
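A short usage sketch of the version check described above:

```python
import torch

if torch.backends.mps.is_available():
    print(torch.backends.mps.is_macos13_or_newer())   # True on macOS >= 13.0
    print(torch.backends.mps.is_macos13_or_newer(1))  # True on macOS >= 13.1
    print(torch.backends.mps.is_macos13_or_newer(2))  # True on macOS >= 13.2
```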
Pull Request resolved: https://github.com/pytorch/pytorch/pull/95065
Approved by: https://github.com/albanD
- To check for memory leaks in `test_mps.py`, set the env variable `PYTORCH_TEST_MPS_MEM_LEAK_CHECK=1` when running test_mps.py (the CUDA code was used as a reference).
- Added support for the following new python interfaces in MPS module:
`torch.mps.[empty_cache(), set_per_process_memory_fraction(), current_allocated_memory(), driver_allocated_memory()]`
- Renamed `_is_mps_on_macos_13_or_newer()` to `_mps_is_on_macos_13_or_newer()`, and `_is_mps_available()` to `_mps_is_available()` to be consistent in naming with prefix `_mps`.
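A brief usage sketch of the new Python interfaces listed above (the workload is illustrative):

```python
import torch

if torch.backends.mps.is_available():
    # Cap MPS allocations at 50% of the driver's recommended working-set size.
    torch.mps.set_per_process_memory_fraction(0.5)
    x = torch.randn(1024, 1024, device="mps")
    print("current allocated:", torch.mps.current_allocated_memory())
    print("driver allocated:", torch.mps.driver_allocated_memory())
    del x
    torch.mps.empty_cache()  # release cached blocks back to the driver
```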
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94646
Approved by: https://github.com/malfet
# Summary
- Adds type hinting support for SDPA
- Updates the documentation adding warnings and notes on the context manager
- Adds scaled_dot_product_attention to the non-linear activation function section of nn.functional docs
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94008
Approved by: https://github.com/cpuhrsch
Preferring dash over underscore in command-line options. Add `--command-arg-name` to the argument parser. The old arguments with underscores `--command_arg_name` are kept for backward compatibility.
Both dashes and underscores are used in the PyTorch codebase. Some argument parsers only have dashes or only have underscores in arguments. For example, the `torchrun` utility for distributed training only accepts underscore arguments (e.g., `--master_port`). The dashes are more common in other command-line tools. And it looks to be the default choice in the Python standard library:
`argparse.BooleanOptionalAction`: 4a9dff0e5a/Lib/argparse.py (L893-L895)
```python
class BooleanOptionalAction(Action):
    def __init__(...):
        if option_string.startswith('--'):
            option_string = '--no-' + option_string[2:]
            _option_strings.append(option_string)
```
It adds `--no-argname`, not `--no_argname`. Also, typing `_` requires pressing the Shift (or Caps Lock) key, unlike `-`.
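A minimal argparse sketch of keeping both spellings for backward compatibility (the argument name is illustrative):

```python
import argparse

parser = argparse.ArgumentParser()
# The dashed spelling is the documented one; the underscore spelling is kept
# as an alias so existing scripts keep working.
parser.add_argument("--master-port", "--master_port", dest="master_port",
                    type=int, default=29500)

args = parser.parse_args(["--master_port", "1234"])
print(args.master_port)  # 1234; "--master-port 1234" works as well
```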
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94505
Approved by: https://github.com/ezyang, https://github.com/seemethere
Use Prims to implement group_norm, group_norm_backward and mean_var
Use `torch._ops.ops` instead of `torch.ops` in numerous subpackages in
order to make them importable from `torch/backends/mps/__init__.py`, as the `torch.ops` alias is defined in
15af4b1cee/torch/__init__.py (L1095)
which is executed last during the init process.
Add `__all__` to `torch/backends/mps/__init__.py` as well as alias all imports as private
Add `TestNNMPS.test_group_norm_backward` that validates no NaNs are generated during the backward pass
Fixes https://github.com/pytorch/pytorch/issues/88331
Pull Request resolved: https://github.com/pytorch/pytorch/pull/91190
Approved by: https://github.com/albanD
Essentially the same change as #67946, except that the default is to disallow reduced precision reductions in `BFloat16` GEMMs (for now). If performance is severely regressed, we can change the default, but this option appears to be necessary to pass some `addmm` `BFloat16` tests on H100.
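A brief sketch of the new toggle (assuming a CUDA device; the flag name follows this PR):

```python
import torch

# Opt back in to reduced-precision reductions for BFloat16 GEMMs if the
# accuracy trade-off is acceptable for your workload.
torch.backends.cuda.matmul.allow_bf16_reduced_precision_reduction = True

a = torch.randn(64, 64, device="cuda", dtype=torch.bfloat16)
b = torch.randn(64, 64, device="cuda", dtype=torch.bfloat16)
c = a @ b
```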
CC @ptrblck @ngimel
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89172
Approved by: https://github.com/ngimel
# Summary
Creates a callable native function that can determine which implementation of scaled dot product attention will get called. This allows us to re-order the runtime dispatch of SDP to enable autograd.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89029
Approved by: https://github.com/cpuhrsch
# Summary
Add a torch.backends.cuda flag and update the context manager to pick between the three implementations of scaled_dot_product_attention.
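A hedged usage sketch, assuming the `torch.backends.cuda.sdp_kernel` context manager and its `enable_*` keyword names:

```python
import torch
import torch.nn.functional as F

q = torch.randn(2, 8, 128, 64, device="cuda", dtype=torch.float16)
k = torch.randn(2, 8, 128, 64, device="cuda", dtype=torch.float16)
v = torch.randn(2, 8, 128, 64, device="cuda", dtype=torch.float16)

# Restrict dispatch to the flash implementation only.
with torch.backends.cuda.sdp_kernel(enable_flash=True,
                                    enable_math=False,
                                    enable_mem_efficient=False):
    out = F.scaled_dot_product_attention(q, k, v)
```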
cc @cpuhrsch @jbschlosser @bhosmer @mikaylagawarecki
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87946
Approved by: https://github.com/cpuhrsch
Fixes the confusing situation mentioned here https://github.com/pytorch/pytorch/issues/85224#issuecomment-1278628262 by
- setting better OG defaults
- changing warnings to errors now that we have better defaults
Test plan:
- Ran einsum tests locally + CI
- Uninstalled opt-einsum and ran through setting
  - `enabled` to False (doesn't throw error)
  - `strategy` to anything that's not None (errors)
  - `strategy` to None (noops)
- Installed opt-einsum and ran through setting
  - `enabled` to False (doesn't throw error)
  - `enabled` to True (doesn't throw error, no ops + defaults to 'auto')
  - `strategy` to random string (errors)
  - `strategy` to None (noops, still is 'auto')
  - `strategy` to 'greedy' (is set to 'greedy')
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86985
Approved by: https://github.com/soulitzer
This achieves the same things as https://github.com/pytorch/pytorch/pull/85908 but using backends instead of kwargs (which break torchscript, unfortunately). This does mean we let go of numpy compatibility, BUT the win here is that users can control which opt_einsum behavior they want!
The backend allows for..well you should just read the docs:
```
.. attribute:: torch.backends.opteinsum.enabled
A :class:`bool` that controls whether opt_einsum is enabled (on by default). If so,
torch.einsum will use opt_einsum (https://optimized-einsum.readthedocs.io/en/stable/path_finding.html)
to calculate an optimal path of contraction for faster performance.
.. attribute:: torch.backends.opteinsum.strategy
A :class:`str` that specifies which strategies to try when `torch.backends.opteinsum.enabled` is True.
By default, torch.einsum will try the "auto" strategy, but the "greedy" and "optimal" strategies are
also supported. Note that the "optimal" strategy is factorial on the number of inputs as it tries all
possible paths. See more details in opt_einsum's docs
(https://optimized-einsum.readthedocs.io/en/stable/path_finding.html).
```
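A short usage sketch based on the attributes documented above (note: in released PyTorch the module landed as `torch.backends.opt_einsum`, which is the name used here):

```python
import torch

if torch.backends.opt_einsum.is_available():       # opt_einsum package installed
    torch.backends.opt_einsum.strategy = "greedy"  # or "auto" (default) / "optimal"

a, b, c = torch.randn(16, 32), torch.randn(32, 48), torch.randn(48, 8)
out = torch.einsum("ij,jk,kl->il", a, b, c)
```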
In trying (and failing) to land 85908, I discovered that jit script does NOT actually pull from python's version of einsum (because it cannot support variadic args nor kwargs). Thus I learned that jitted einsum does not subscribe to the new opt_einsum path calculation. Overall, this is fine since jit script is getting deprecated, but where is the best place to document this?
## Test plan:
- added tests to CI
- locally tested that trying to set the strategy to something invalid will error properly
- locally tested that tests will pass even if you don't have opt-einsum
- locally tested that setting the strategy when opt-einsum is not there will also error properly
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86219
Approved by: https://github.com/soulitzer, https://github.com/malfet
# Summary
- This code creates the runtime dispatch system for choosing a performant fused SDP kernel. The only fused kernel choice is currently flash_attention. It also creates Python flags and a context manager that can be used to turn dispatch behavior on and off.
- This also adds support for flash_attention with dense tensors.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85984
Approved by: https://github.com/cpuhrsch
**RFC: Problem statement**
Intel oneMKL and oneDNN are used to accelerate performance on Intel platforms. Both libraries provide verbose functionality to dump detailed operator execution information as well as execution time. These verbose messages are very helpful for performance profiling. However, the verbose functionality applies to the entire execution, while in many scenarios we only want to profile part of the execution process. This feature exposes PyTorch API functions to control oneDNN and oneMKL verbose functionality at runtime.
**Additional context**
The most common performance profiling flow is shown in the following code snippet:
```python
def inference(model, inputs):
    # step0 (optional): jit
    model = torch.jit.trace(model, inputs)
    # step1: warmup
    for _ in range(100):
        model(inputs)
    # step2: performance profiling. We only care about the profiling result,
    # as well as the oneDNN and oneMKL verbose messages, of this step
    model(inputs)
    # step3 (optional): benchmarking
    t0 = time.time()
    for _ in range(100):
        model(inputs)
    t1 = time.time()
    print('dur: {}'.format((t1 - t0) / 100))
    return model(inputs)
```
Since the environment variables MKL_VERBOSE and DNNL_VERBOSE take effect for the entire process, we would get a great number of verbose messages for all 101 iterations (if step3 is not involved). However, we only care about the verbose messages dumped in step2. It is very difficult to filter the unnecessary verbose messages out in a complicated usage scenario. Also, the jit trace brings more undesired verbose messages.
Furthermore, there are more complicated topologies or usages, such as the cascaded topologies below:
```
model1 = Model1()
model2 = Model2()
model3 = Model3()
x1 = inference(model1, x)
x2 = inference(model2, x1)
y = inference(model3, x2)
```
In many cases it is very hard to split these child topologies out. In such scenarios, it is not possible to investigate the performance of each individual topology with `DNNL_VERBOSE` and `MKL_VERBOSE`.
To solve this issue, oneDNN and oneMKL provide API functions that make it possible to control the verbose functionality at runtime.
```
int mkl_verbose (int enable)
status dnnl::set_verbose(int level)
```
oneDNN and oneMKL print verbose messages to stdout when oneMKL or oneDNN ops are executed.
Sample verbose messages:
```
MKL_VERBOSE SGEMM(t,n,768,2048,3072,0x7fff64115800,0x7fa1aca58040,3072,0x1041f5c0,3072,0x7fff64115820,0x981f0c0,768) 8.52ms CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:44
dnnl_verbose,exec,cpu,inner_product,brgemm:avx512_core,forward_training,src_f32::blocked:ab:f0 wei_f32::blocked:AB16b64a:f0 bia_f32::blocked:a:f0 dst_f32::blocked:ab:f0,,,mb16ic768oc768,0.0839844
```
**Design and implementation**
The design is to add Python-facing wrapper functions that invoke the mkl_verbose and dnnl::set_verbose functions.
**Design concern**
- Need to add wrapper C++ functions for mkl_verbose and dnnl::set_verbose functions in torch/csrc and aten/csrc.
- Python API functions will be added to device-specific backends:
  - `with torch.backends.mkl.verbose(1):`
  - `with torch.backends.mkldnn.verbose(1):`
**Use cases**
```python
def inference(model, inputs):
    # step0 (optional): jit
    model = torch.jit.trace(model, inputs)
    # step1: warmup
    for _ in range(100):
        model(inputs)
    # step2: performance profiling
    with torch.backends.mkl.verbose(1), torch.backends.mkldnn.verbose(1):
        model(inputs)
    # step3 (optional): benchmarking
    t0 = time.time()
    for _ in range(100):
        model(inputs)
    t1 = time.time()
    print('dur: {}'.format((t1 - t0) / 100))
    return model(inputs)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63212
Approved by: https://github.com/VitalyFedyunin, https://github.com/malfet
Change cudnn incompatibility message wording
Please refer to: #80637
Test:
```
File "/home/atalman/torch/backends/cudnn/__init__.py", line 67, in version
if not _init():
File "/home/atalman/torch/backends/cudnn/__init__.py", line 50, in _init
raise RuntimeError(
RuntimeError: cuDNN version incompatibility: PyTorch was compiled against (8, 3, 2) but found runtime version (8, 0, 3). PyTorch already comes bundled with cuDNN. One option to resolving this error is to ensure PyTorch can find the bundled cuDNN.Looks like your LD_LIBRARY_PATH contains incompatible version of cudnnPlease either remove it from the path or install cudnn (8, 3, 2)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80877
Approved by: https://github.com/zou3519
**BC-breaking note**:
This PR deprecates `torch.lu` in favor of `torch.linalg.lu_factor`.
An upgrade guide is added to the documentation for `torch.lu`.
Note this PR DOES NOT remove `torch.lu`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77636
Approved by: https://github.com/malfet
Summary:
compilation_preference is one of:
- ANEURALNETWORKS_PREFER_LOW_POWER = 0
- ANEURALNETWORKS_PREFER_FAST_SINGLE_ANSWER = 1
- ANEURALNETWORKS_PREFER_SUSTAINED_SPEED = 2
relax_f32_to_f16 calls Model_relaxComputationFloat32toFloat16
Test Plan:
Tested on device with nnapi models
* Works with existing exported models
* Works with new exported models with options
Differential Revision: D36433236
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78758
Approved by: https://github.com/kimishpatel
(reopening due to botched merge)
The cuDNN V8 API (main support merged in https://github.com/pytorch/pytorch/pull/60755) potentially exposes many more kernels with benchmark=True. While these additional kernels can improve performance, it is often unnecessary to run every kernel returned by the heuristic and doing so may degrade the user experience by causing the first model iteration to be very slow. To alleviate this issue, this PR introduces torch.backends.cudnn.benchmark_limit. benchmark_limit specifies the maximum number of working cuDNN kernels to try for a given workload, with the default being 10 (similar to what TensorFlow does). benchmark_limit = 0 yields the current behavior of trying every kernel returned by the heuristic.
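A brief sketch of the new knob alongside the existing benchmark flag:

```python
import torch

torch.backends.cudnn.benchmark = True      # enable cuDNN autotuning
torch.backends.cudnn.benchmark_limit = 10  # try at most 10 candidate kernels (0 = try all)
```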
CC @ptrblck @ngimel @xwang233
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77002
Approved by: https://github.com/ngimel
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74441
For xirp based segmentation models, we want to support enumerated input shapes. This allows us to support both landscape and portrait mode images without sacrificing performance. P488118264
ghstack-source-id: 151736964
Test Plan: `buck run coreml:xirp -- --model="/home/taox/xirp/xirp_20a.pt" --out="/home/taox/xirp/xirp_20a_coreml_enumerated.ptl"`
Reviewed By: mcr229
Differential Revision: D34803184
fbshipit-source-id: c462c0783846a1489ca7ce4d5a654aa6927c9c44
(cherry picked from commit 67d418c97531daaf3d03d1000ca4a4ff60de2a95)
Summary:
This PR adds a new quantization backend, ONEDNN, with quantized conv and linear kernels in the same code path as the FBGEMM backend.
The ONEDNN backend is an alternative to the FBGEMM and QNNPACK backends. It takes advantage of features of the latest Intel® CPU products: VNNI on Cascade Lake, and the AMX instruction set available on Sapphire Rapids, which has 8X int8 peak TOPS over VNNI.
ONEDNN demonstrates better performance on conv kernels of popular CNN models than FBGEMM. It also supports more fused ops, such as convolution-add-ReLU, than FBGEMM and QNNPACK.
To use this backend, users only need to set the quantization backend to 'onednn' before any calculation, without a single change to their models.
```python
torch.backends.quantized.engine = 'onednn'
```
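Since not every build ships every engine, here is a hedged sketch of selecting the backend only when it is available:

```python
import torch

print(torch.backends.quantized.supported_engines)  # engines compiled into this build
if 'onednn' in torch.backends.quantized.supported_engines:
    torch.backends.quantized.engine = 'onednn'
```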
## Design docs
- https://github.com/pytorch/pytorch/issues/21120#issuecomment-562371983
- https://github.com/pytorch/pytorch/pull/67177#issuecomment-963787096
## File changes
**Add ONEDNN to qengine list**
- aten/src/ATen/Context.cpp
- c10/core/QEngine.h
- torch/ao/quantization/qconfig.py
- torch/backends/quantized/\_\_init\_\_.py
**Implement qconv & qlinear for ONEDNN backend**
- aten/src/ATen/native/quantized/cpu/conv_serialization.h
- aten/src/ATen/native/quantized/cpu/fbgemm_utils.cpp
- aten/src/ATen/native/quantized/cpu/onednn_utils.h
- aten/src/ATen/native/quantized/cpu/qconv.cpp
- aten/src/ATen/native/quantized/cpu/qconv_dynamic.cpp
- aten/src/ATen/native/quantized/cpu/qconv_prepack.cpp
- aten/src/ATen/native/quantized/cpu/qconv_unpack.cpp
- aten/src/ATen/native/quantized/cpu/qlinear.cpp
- aten/src/ATen/native/quantized/cpu/qlinear_dynamic.cpp
- aten/src/ATen/native/quantized/cpu/qlinear_prepack.cpp
- aten/src/ATen/native/quantized/cpu/qlinear_unpack.cpp
**Skip tests that are not supported by ONEDNN**
- test/ao/sparsity/test_kernels.py
- test/quantization/core/test_quantized_module.py
- test/quantization/core/test_quantized_op.py
## Validation results
This PR has passed `test_quantization.py` and `test_mkldnn.py`.
Below are performance data of int8 2d convolution and linear on the Cascade Lake Xeon® platform:
(Note: tested with a single instance on a single core, using the latest oneDNN library.)
**Table 1. Performance comparison of int8 2d convolution operator**
|No.| Shape| FBGEMM| ONEDNN| Gain|
|-|-|-|-|-|
|1| IC=128, OC=128, kernel=3, stride=1, N=4, H=32, W=32, G=1, pad=0| 668.310us| 535.630us| 24.8%|
|2| IC=128, OC=128, kernel=3, stride=2, N=4, H=32, W=32, G=1, pad=0| 290.630us| 281.810us| 3.1%|
|3| IC=128, OC=256, kernel=3, stride=1, N=4, H=32, W=32, G=1, pad=0| 1.045ms| 893.010us| 17.0%|
|4| IC=128, OC=256, kernel=3, stride=2, N=4, H=32, W=32, G=1, pad=0| 385.320us| 373.720us| 3.1%|
|5| IC=256, OC=256, kernel=3, stride=1, N=4, H=32, W=32, G=1, pad=0| 1.876ms| 1.641ms| 14.3%|
|6| IC=256, OC=256, kernel=3, stride=2, N=4, H=32, W=32, G=1, pad=0| 660.460us| 638.470us| 3.4%|
**Table 2. Performance comparison of int8 linear operator**
|No.| Shape (m, n, k)| FBGEMM| ONEDNN| Gap|
|-|-|-|-|-|
|1| 64, 800, 320| 80.550us| 96.770us| 20.10%|
|2| 64, 768, 512| 101.230us| 130.720us| 29.10%|
|3| 16, 256, 512| 30.230us| 51.450us| 70.20%|
|4| 128, 128, 128| 33.810us| 50.480us| 49.30%|
|5| 256, 512, 256| 154.490us| 195.050us| 26.30%|
|6| 1024, 1024, 1024| 3.134ms| 3.514ms| 12.10%|
ONEDNN showed advantages over FBGEMM for convolution. However, it has a performance gap compared to FBGEMM for linear ops. The gap is a known issue, and further optimization is in progress in the oneDNN library. On the latest platforms, ONEDNN achieves better performance for both conv and linear.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69820
Reviewed By: HDCharles
Differential Revision: D33716039
Pulled By: jerryzh168
fbshipit-source-id: 6f7bb807e85798142dfcffccfca8b8bd652fb3dd
(cherry picked from commit 91526b373560f42ba0ad307f9cccfc0eb5218b1f)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70621
Pytorch doesn't have support for qint16 yet. Add an option to handle qint16 via int16 & qint32 data types.
* For qint16 tensors in NNAPI, the user sends a qint32 tensor. We convert the qint32 to int16 for the converter and set the zero point and scale for nnapi
* inputs to the model have to have fixed scale and zero point and are only supported for testing
* Added a flag use_int16_for_qint16, which will be used to maintain backwards compatibility in the converter once true qint16 is supported in PyTorch
ghstack-source-id: 146507483
Test Plan: pytest test/test_nnapi.py
Reviewed By: dreiss
Differential Revision: D33285124
fbshipit-source-id: b6376fa1bb18a0b9f6a18c545f600222b650cb66
Summary:
Per title.
This PR introduces a global flag that lets pytorch prefer one of the many backend implementations while calling linear algebra functions on GPU.
Usage:
```python
torch.backends.cuda.preferred_linalg_library('cusolver')
```
Available options (str): `'default'`, `'cusolver'`, `'magma'`.
Issue https://github.com/pytorch/pytorch/issues/63992 inspired me to write this PR. No heuristic is perfect on all devices, library versions, matrix shapes, workloads, etc. We can obtain better performance if we can conveniently switch linear algebra backends at runtime.
Performance of linear algebra operators after this PR should be no worse than before. The flag is set to **`'default'`** by default, which makes everything the same as before this PR.
The implementation of this PR is basically following that of https://github.com/pytorch/pytorch/pull/67790.
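A hedged sketch of switching backends at runtime to pick the fastest one for a given workload (assumes a CUDA build; timing code is omitted):

```python
import torch

a = torch.randn(512, 512, device="cuda")
a = a @ a.mT + 512 * torch.eye(512, device="cuda")  # make it positive definite

for backend in ("default", "cusolver", "magma"):
    torch.backends.cuda.preferred_linalg_library(backend)
    # time torch.linalg.cholesky(a) here and keep the fastest backend
    L = torch.linalg.cholesky(a)
```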
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67980
Reviewed By: mruberry
Differential Revision: D32849457
Pulled By: ngimel
fbshipit-source-id: 679fee7744a03af057995aef06316306073010a6
Summary:
https://github.com/pytorch/pytorch/issues/67578 disabled reduced precision reductions for FP16 GEMMs. After benchmarking, we've found that this has substantial performance impacts for common GEMM shapes (e.g., those found in popular instantiations of multiheaded attention) on architectures such as Volta. As these performance regressions may come as a surprise to current users, this PR adds a toggle, `torch.backends.cuda.matmul.allow_fp16_reduced_precision_reduction`, to disable reduced precision reductions rather than making that the default behavior.
CC ngimel ptrblck
stas00 Note that the behavior after the previous PR can be replicated with
`torch.backends.cuda.matmul.allow_fp16_reduced_precision_reduction = False`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67946
Reviewed By: zou3519
Differential Revision: D32289896
Pulled By: ngimel
fbshipit-source-id: a1ea2918b77e27a7d9b391e030417802a0174abe
Summary:
NNAPI converter failed with 1 const value and one tensor earlier
Code suggestions from dreiss
Test Plan:
pytest test/test_nnapi.py::TestNNAPI::test_pointwise_binary
Imported from OSS
Reviewed By: anshuljain1
Differential Revision: D28893881
fbshipit-source-id: 59240373fb03c6fdafa4cb2fa4d8408dd20092f6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62225
Rewrote the preprocess function for Android NNAPI delegate.
Previously, `preprocess()` called `convert_model_to_nnapi()` using Pybind and returned an NnapiModule that was serialized for mobile. Now, `preprocess()` calls a sub-function of `convert_model_to_nnapi()` and returns several preprocessed items (which were previously components of NnapiModule).
The returned dictionary contains:
- "shape_compute_module": torch::jit::Module
- "ser_model": torch::Tensor
- "weights": List[torch.Tensor]
- "inp_mem_fmts": List[int]
- "out_mem_fmts": List[int]
**Purpose and Future:**
The purpose of these changes is to move more of the implementation from bytecode and TorchScript to the delegate API, since bytecode is less efficient.
Now, only the shape computation uses bytecode. In the future, shape computation will be moved out of TorchScript as well.
**nnapi_backend_preprocess.cpp:** preprocess implementation
**prepare.py**: refactored a portion of `convert_model_to_nnapi()` to `process_for_nnapi()`, so preprocess can get components of NnapiModule
**Test:**
Ran `python test/test_jit.py TestNnapiBackend` and `python test/test_nnapi.py` on OSS successfully
ghstack-source-id: 134444190
Test Plan: Ran `python test/test_jit.py TestNnapiBackend` and `python test/test_nnapi.py` on OSS successfully
Reviewed By: raziel
Differential Revision: D29922279
fbshipit-source-id: cadcf8908d8a745dc7abbe286e97d6ead937d4ab
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61796
We can easily handle NNAPI conversion for NHWC inputs
that have 1 channel or whose H and W are 1
Test Plan:
pytest test/test_nnapi.py::TestNNAPI::test_flatten
Imported from OSS
Reviewed By: saketh-are
Differential Revision: D29827735
fbshipit-source-id: 65dee4b42fceef1b032bf5dd1c4cc6e020d01e14
Summary:
To add a serializer for custom ops, we can subclass the default serializer
and update ADDER_MAP
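A hypothetical sketch only; the actual class and adder names in `torch/backends/_nnapi/serializer.py` may differ:

```python
from torch.backends._nnapi import serializer as nnapi_ser

class MyCustomSerializer(nnapi_ser._NnapiSerializer):  # assumed class name
    # Copy the default op table and register an adder for a custom op.
    ADDER_MAP = dict(nnapi_ser._NnapiSerializer.ADDER_MAP)
    ADDER_MAP["my_ns::my_custom_op"] = lambda self, node: self.add_my_custom_op(node)

    def add_my_custom_op(self, node):
        # Translate the TorchScript node into NNAPI operations here.
        ...
```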
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61025
Test Plan:
* pytest test/test_nnapi.py::TestNNAPI for current serializer
* Custom serializers to be tested with custom ops
Imported from OSS
Reviewed By: anshuljain1
Differential Revision: D29480745
fbshipit-source-id: 37e3f8de3c97f6c8a486f9879ce11430ea89af34
Summary: As title
Test Plan: pytest test/test_nnapi.py::TestNNAPI::test_cat
Reviewed By: anshuljain1
Differential Revision: D29480747
fbshipit-source-id: 161803054ff1a4c2c750fc30a5f0fc6d8a24b2c9
Summary:
Same as title
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61021
Test Plan: pytest test/test_nnapi.py::TestNNAPI
Reviewed By: anshuljain1
Differential Revision: D29480746
fbshipit-source-id: 7217c8f3a811db8c3c373f3e7ca31caf9502ef22
Summary:
Add support for aten::slice op in the NNAPI model converter
* If start = 0; end = max -> identity
* Flexible shapes can be passed through
* Flexible shapes can't be sliced over
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59364
Test Plan: pytest test/test_nnapi.py::TestNNAPI::test_slice
Reviewed By: anshuljain1
Differential Revision: D28881039
fbshipit-source-id: 3c1c630ff27b5bba6eda403d87570c61d43ae90e
Summary:
* Add support for aten::detach op in the NNAPI model converter as a no-op
* Also add flexible op support for add_pointwise_simple_unary_op
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58543
Test Plan: pytest test/test_nnapi.py::TestNNAPI::test_detatch
Reviewed By: anshuljain1
Differential Revision: D28531942
fbshipit-source-id: 4387dbbbadd8ce6b690841f3a903e68a380b849d
Summary:
Add support for the aten::div op in the NNAPI model converter. Startup-time
variable size support isn't included, since shapes are passed as inputs to the NNAPI op.
Runtime variable size support is to be added soon.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60885
Test Plan: pytest test/test_nnapi.py::TestNNAPI::test_flatten
Reviewed By: anshuljain1
Differential Revision: D29451725
fbshipit-source-id: 8902745f7758c8cc88ad4b4ce02b8301ff894bd4
Summary:
Add support for aten::div op in the NNAPI model converter. Add variable
size input test as well.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58541
Test Plan: pytest test/test_nnapi.py::TestNNAPI::test_div
Reviewed By: anshuljain1
Differential Revision: D28531943
fbshipit-source-id: e96342146f6de216f7b88443618edfc54963747c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58540
Add support for aten::to op in the NNAPI model converter for simple
cases like to("cpu"), to("gpu")
Test Plan: pytest test/test_nnapi.py::TestNNAPI::test_to
Reviewed By: anshuljain1
Differential Revision: D28531941
fbshipit-source-id: 0c934f7aceaff2669307c3426efe32046d8c44f3
Summary:
Add support for aten::softmax op in the NNAPI model converter with
flexible size
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58539
Test Plan: pytest test/test_nnapi.py::TestNNAPI::test_softmax
Reviewed By: anshuljain1
Differential Revision: D28531946
fbshipit-source-id: 8633f3e3f7f52795f9866ff16ad0867ea36a19e8
Summary:
Add support for aten::avgpool2d op in the NNAPI model converter with var
size support
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58538
Test Plan: pytest test/test_nnapi.py::TestNNAPI::test_avgpool2d
Reviewed By: anshuljain1
Differential Revision: D28531944
fbshipit-source-id: 43ff8c9389365698c282f204042b49c7ec84d824
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57563
Add flexible size support for upsample_nearest2d op in nnapi model conversion
Test Plan:
pytest test/test_nnapi.py
Imported from OSS
Reviewed By: dreiss
Differential Revision: D28200847
fbshipit-source-id: 901fe3f6e68e4c16ece730f3ffa68dc88c6ed6c3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57562
Add flexible size support for qadd op in nnapi model conversion
Test Plan:
pytest test/test_nnapi.py
Imported from OSS
Reviewed By: dreiss
Differential Revision: D28200849
fbshipit-source-id: d5b2ea8e9eb8ae405ff2c960f7549cef60bc0991
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57561
Add flexible size support for conv2d op in nnapi model conversion
Test Plan:
pytest test/test_nnapi.py
Imported from OSS
Reviewed By: dreiss
Differential Revision: D28200848
fbshipit-source-id: d94ccf48a3d8453aa8e96c7cac02948c4cd870cc
Summary:
Fixes https://github.com/pytorch/pytorch/issues/48141
~Mypy is complaining about a missing arg in a function call.~
```bash
torch/backends/_nnapi/serializer.py:806: error: Too few arguments for "_do_add_binary" [call-arg]
Found 1 error in 1 file (checked 1140 source files)
```
9392137dbe/torch/backends/_nnapi/serializer.py (L804-L806)
~dreiss, would you mind take a look when you have some cycles to spare and see what would be the appropriated value for `fuse_code` here? Thanks :)~
Edit: https://github.com/pytorch/pytorch/issues/48925 got merged a couple of days ago. The blocking part is now unblocked, and I just pushed the changes to make mypy happy again. This PR is ready for review.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48142
Reviewed By: ezyang
Differential Revision: D28006249
Pulled By: walterddr
fbshipit-source-id: 5e43eeba7143512a549efaad31541f86718add7c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54701
We need NNAPI models to support inputs (and, by extension, intermediate
values and outputs) whose shape is only determined at load time. For
example, a vision models input shape might be dependent on the aspect
ratio of the device camera. While NNAPI has full support for variable
shapes (by setting components of the operand shape to 0), the guidance
we have received is that vendor-provided drivers for real hardware are
not able to support this efficiently. Therefore, we take a hybrid
approach where shapes are calculated at model load time to
semi-dynamically construct our NNAPI model. While this doesn't let us
have truly dynamic input shapes, it does allow us to ensure that the
vendor driver only sees fixed shapes, so we get maximum performance.
In this initial commit, only PReLU supports dynamic shapes. Additional
operators will be converted in separate diffs.
- In order to convert a flexible-shape model, the user supplies inputs
with shapes containing dimensions of size 0 for the flexible
dimensions.
- During conversion, we generate code to compute the shapes of all
intermediates and outputs as a function of the input shapes.
- We no longer run the input model to produce the output templates.
Instead, we generate code to return properly-sized templates, given
the input shapes.
- All of this generated code goes into a "ShapeComputeModule" that is
used by the NnapiModule during initialization.
- The ShapeComputeModule mutates the serialized model to fill in the
computed sizes for each operand. This requires us to change the dtype
for the serialized model to int32, but this should be fine because
everything in it is already 4-byte aligned.
- NnapiInitWrapper no longer exists. Instead, initialization is
performed on the first run, based on the real arguments. We plan to
provide an API for doing eager initialization.
- Unit test updated to allow separate arguments to be given for trace,
conversion, and inference. A flexible-shape test case was added for
PReLU.
Test Plan: Unit test
Reviewed By: axitkhurana
Differential Revision: D27536796
Pulled By: dreiss
fbshipit-source-id: 105585f247987b1e6ec6946a6fe44401237cb0a0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54700
This is an internal method just to make it more clear what
len(self.operands) is doing.
Test Plan: Unit test
Reviewed By: axitkhurana
Differential Revision: D27536794
Pulled By: dreiss
fbshipit-source-id: 678cee8a47df6757dd2e6feabf2560fd82d32e26
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54699
We'll soon be adding support for flexible-size tensors to the NNAPI
converter, but it won't be added to all ops at once. Create
get_tensor_operand_by_jitval_fixed_size as a wrapper for
get_tensor_operand_by_jitval that verifies that the argument has a fixed
shape. Update all call sites. As flexible size support is added to
each op, the call sites can be converted back and proper size checks
added.
Test Plan: Unit test
Reviewed By: axitkhurana
Differential Revision: D27536791
Pulled By: dreiss
fbshipit-source-id: 6fb1fea814d767b6ff263fd8b88240a51be74777
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54698
"mf" was short for memory format, but the concept that this variable
represents was renamed to "dim_order", so rename the variable.
Test Plan: Unit test
Reviewed By: axitkhurana
Differential Revision: D27536793
Pulled By: dreiss
fbshipit-source-id: 2b31c70da1ff221a7833e67486690fa606f01dea
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54697
Previously, models being converted to NNAPI were expected to take inputs
as separate arguments, but the generated NNAPI model could only take
multiple inputs as a list. Now the generated model always takes inputs
(single or multiple) as separate tensor arguments.
Previously, models being converted to NNAPI were expected to return
outputs as a single tensor or tuple of tensors, but the generated NNAPI
model would return multiple outputs as a list. Now the generated model
returns a tuple as well (or single tensor).
Internally, we decide what output format to use (single tensor or tuple)
based on the conversion process, rather than by running the model.
Test Plan: Unit test
Reviewed By: axitkhurana
Differential Revision: D27536790
Pulled By: dreiss
fbshipit-source-id: c0f93c85d450757e568985947cc2f32043795859
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54696
This was originally developed for a Python version where array was not
available.
Test Plan: Unit test
Reviewed By: axitkhurana
Differential Revision: D27536792
Pulled By: dreiss
fbshipit-source-id: 39e5507e37d4f91871113439fe752a4d5373eaba
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48812
This came up in a squeeze-and-excitation model. Starting with an NHWC
tensor T, we perform a mean operation across H and W, giving an NxC
tensor, which (after some fully connected layers) is reshaped to
NxCx1x1, then multiplied with T. To handle this, we detect the specific
case of a binary op with one NHWC input and one contiguous input with
H,W == 1,1 and allow the op to be applied (after transposing the
contiguous input).
Test Plan: Unit test.
Reviewed By: axitkhurana
Differential Revision: D25317939
Pulled By: dreiss
fbshipit-source-id: b4c17ab3b874d1a7defa04664010ba82115f1c20
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54695
Previously, torch.nn.Linear was calling aten::addmm internally. Now
it's calling aten::linear, so add support for that.
Test Plan: Unit test
Reviewed By: axitkhurana
Differential Revision: D27536795
Pulled By: dreiss
fbshipit-source-id: 42c8d2a80b20ac12ed9bba599c5e0e874256bb13
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47518
This was left over from an old version of the code. The idea was that
instead of indexing into separate tensors for each weight, you could
bundle them all into a single file and use different offsets into that
file. With the current design, this is nontrivial to support, so drop
the code for now.
Test Plan: CI
Reviewed By: axitkhurana
Differential Revision: D25317935
Pulled By: dreiss
fbshipit-source-id: e26ab3a8d437cb1bbb50319209fa56d9c571ce61
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47517
While we're unlikely to see this in practice, it comes up in unit tests.
This type annotation is necessary for `torch.jit.script` to figure out
the type of the list if it is empty.
Test Plan: Unit tests in a later diff.
Reviewed By: axitkhurana
Differential Revision: D25317937
Pulled By: dreiss
fbshipit-source-id: de8b6665c6fcd3cd2b39e3c696a39336c064e4c1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46780
This is in prototype status, but pretty functional. There are two major
parts.
- Model converter. This is a pure Python component that consumes a
model in TorchScript format, converts the operations into NNAPI
semantics, and serializes the model in a custom format. It then wraps
the result in a new TorchScript model that can invoke NNAPI under the
hood.
- Runtime. This is a TorchBind object that deserializes the model and
sends the result to NNAPI. This is fairly simple since the serialized
format is basically just a list of NNAPI calls to make, so most of the
code is spent on bounds checking.
A few notes on the design.
- Currently, all tensor sizes need to be fixed, and those fixed sizes
are burned directly into the serialized model. This will probably
need to change. NNAPI supports variable-sized tensors, but the
important hardware backends do not. However, we're seeing use cases
crop up where the input size is not known until around the time that
the model is loaded (for example, it might depend on the camera aspect
ratio). I think the proper fix here is to remove the code in the
converter that eagerly calculates the sizes of the intermediate
tensors and replace it with a code generator that will generate some
TorchScript code that will perform those calculations at model load
time. This way, we will be able to support models that have
variable-sized inputs while still only showing fixed-sized operands to
NNAPI.
- The important hardware backends want operands to be in NHWC order, but
PyTorch natively represents all tensors as NCHW. The strategy for
this is to keep NCHW during most of the conversion process, but track
an additional value per operand representing the "dimension order".
The dimension order gets propagated through convolutions and pointwise
ops. When we're ready to serialize the model, we reorder the
dimensions for "channels last" operands to NHWC.
Test Plan:
Some local testing with FB prod models. I'll need to add some examples
and automated tests.
Reviewed By: iseeyuan
Differential Revision: D24574040
Pulled By: dreiss
fbshipit-source-id: 6adc8571b234877ee3666ec0c0de24da35c38a1f
Summary:
Reland of https://github.com/pytorch/pytorch/issues/38140. It got reverted since it broke slow tests, which are only run on the master branch (thanks mruberry!). Enabling all CI tests in this PR to make sure they pass.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38288
Reviewed By: mruberry
Differential Revision: D21524923
Pulled By: ailzhang
fbshipit-source-id: 3a9ecc7461781066499c677249112434b08d2783
Summary:
I'm mostly done with cleaning up the test/ folder. There are a bunch of remaining call sites, but they're "valid" in that they test `type()` functionality. We cannot remove them until it's fully deprecated.
The next PR will mainly focus on moving some call sites to an internal API.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38140
Differential Revision: D21483808
Pulled By: ailzhang
fbshipit-source-id: 12f5de6151bae59374cfa0372e827651de7e1c0f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34047
This PR integrates the added xnnpack conv2d and linear op via
custom class registration for packed weights. The packed struct
is serializable.
Test Plan:
python test test/test_xnnpack_integration.py
Imported from OSS
Differential Revision: D20185657
fbshipit-source-id: fc7e692d8f913e493b293b02d92f4e78536d7698
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26620
This change updates torch.backends.quantized.engine to accept a string ("fbgemm"/"qnnpack"/"none" for now).
set_qengine and get_qengine return an int which represents the at::QEngine enum
Test Plan:
python test/test_torch.py
Imported from OSS
Differential Revision: D17533582
fbshipit-source-id: 5103263d0d59ff37d43dec27243cb76ba8ba633f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25680
Add a runtime flag to choose between FBGEMM and QNNPACK when compiled with both.
The flag can be set by using torch.backends.quantized.engine = torch.fbgemm/torch.qnnpack or ctx::setPreferredQuantizedEngine(at::QEngine)
ghstack-source-id: 89935643
Test Plan: Verified torch.backends.quantized.engine works
Differential Revision: D17198233
fbshipit-source-id: e5449d06f4136385e0e6d18bd4237f8654a61672
Summary:
This PR adds the torch.backends.mkldnn.enabled flag described in https://github.com/pytorch/pytorch/issues/25186, which can be used to disable mkldnn at runtime, just like torch.backends.cudnn.enabled.
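A tiny sketch of the flag in use:

```python
import torch

# Disable MKL-DNN kernels at runtime, mirroring torch.backends.cudnn.enabled.
torch.backends.mkldnn.enabled = False
# ... run the affected ops ...
torch.backends.mkldnn.enabled = True
```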
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25459
Differential Revision: D17258926
Pulled By: ezyang
fbshipit-source-id: e179ad364cc608fdaa7d0f37e2e762ceb5eda598
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18362
ghimport-source-id: 374b7ab97e2d6a894368007133201f510539296f
Stack from [ghstack](https://github.com/ezyang/ghstack):
* #18242 Test running a CUDA build on CPU machine.
* **#18362 Add ability to query if built with CUDA and MKL-DNN.**
Fixes #18108.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Differential Revision: D14584430
fbshipit-source-id: 7605a1ac4e8f2a7c70d52e5a43ad7f03f0457473
Summary:
This is used commonly in `nn` functions. This PR adds it as a weak
module (and also alters the conversion of weak modules to strong modules
to accept ordinary `object`s)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13057
Differential Revision: D10846618
Pulled By: driazati
fbshipit-source-id: 028b9f852d40e2e53ee85b93282c98cef8cd336b
Summary:
The goal of this PR was to add support for dropout descriptors in the C++ API's RNN class.
The end result is a 4x-5x speedup for our RNN integration tests since they can now use cuDNN instead of autograd when dropout is set.
To achieve this, I had to move `_cudnn_init_dropout_state` to the `TensorOptions` API.
I also fixed a bug around `RNN::cuda()` not flattening parameters for cuDNN.
ebetica ezyang
Closes https://github.com/pytorch/pytorch/pull/9012
Reviewed By: pjh5
Differential Revision: D8689786
Pulled By: goldsborough
fbshipit-source-id: 44fb191f5a38e41c4ded5417306b5bbc012cd56c
* cache cufft plans
* use an LRU cache
* suffix CuFFTParams members with _
* import print_function for py2
* lint
* fix potential race; add dummy impl for CPU only builds
* cpp formatting; remove nccl makefile change
* Use CUDA hooks instead
* comments and doc
* update the error message
* move LRU cachae to a separate file and native::detail namespace
* update comment
* specify NOTE location in CuFFTPlanCache.h
* update disabled_features.yaml to make amd ci work
* another fix for AMD CI in disabled_features.yaml
* Wrap cufft_plan_cache_* methods in __HIP_PLATFORM_HCC__
* improve the notes
* lint
* revert onnx change
* put back inlining for CUFFT_CHECK
* Split libATen.so into libATen_cpu.so and libATen_cuda.so
Previously, ATen could be built with either CPU-only support, or
CPU/CUDA support, but only via a compile-time flag, requiring
two separate builds. This means that if you have a program which
indirectly uses a CPU-only build of ATen, and a CPU/CUDA-build of
ATen, you're gonna have a bad time. And you might want a CPU-only
build of ATen, because it is 15M (versus the 300M of a CUDA build).
This commit splits libATen.so into two libraries, CPU/CUDA, so
that it's not necessary to do a full rebuild to get CPU-only
support; instead, if you link against libATen_cpu.so only, you
are CPU-only; if you additionally link/dlopen libATen_cuda.so,
this enables CUDA support. This brings ATen's dynamic library
structure more similar to Caffe2's. libATen.so is no more
(this is BC BREAKING)
The general principle for how this works is that we introduce
a *hooks* interface, which introduces a dynamic dispatch indirection
between a call site and implementation site of CUDA functionality,
mediated by a static initialization registry. This means that we can continue
to, for example, lazily initialize CUDA from Context (a core, CPU class) without
having a direct dependency on the CUDA bits. Instead, we look up
in the registry if, e.g., CUDA hooks have been loaded (this loading
process happens at static initialization time), and if they
have been we dynamic dispatch to this class. We similarly use
the hooks interface to handle Variable registration.
We introduce a new invariant: if the backend of a type has not
been initialized (e.g., it's library has not been dlopened; for
CUDA, this also includes CUDA initialization), then the Type
pointers in the context registry are NULL. If you access the
registry directly you must maintain this invariant.
There are a few potholes along the way. I document them here:
- Previously, PyTorch maintained a separate registry for variable
types, because no provision for them was made in the Context's
type_registry. Now that we have the hooks mechanism, we can easily
have PyTorch register variables in the main registry. The code
has been refactored accordingly.
- There is a subtle ordering issue between Variable and CUDA.
We permit libATen_cuda.so and PyTorch to be loaded in either
order (in practice, CUDA is always loaded "after" PyTorch, because
it is lazily initialized.) This means that, when CUDA types are
loaded, we must subsequently also initialize their Variable equivalents.
Appropriate hooks were added to VariableHooks to make this possible;
similarly, getVariableHooks() is not referentially transparent, and
will change behavior after Variables are loaded. (This is different
to CUDAHooks, which is "burned in" after you try to initialize CUDA.)
- The cmake is adjusted to separate dependencies into either CPU
or CUDA dependencies. The generator scripts are adjusted to either
generate a file as a CUDA (cuda_file_manager) or CPU file (file_manager).
- I changed all native functions which were CUDA-only (the cudnn functions)
to have dispatches for CUDA only (making it permissible to not specify
all dispatch options.) This uncovered a bug in how we were handling
native functions which dispatch on a Type argument; I introduced a new
self_ty keyword to handle this case. I'm not 100% happy about it
but it fixed my problem.
This also exposed the fact that set_history incompletely handles
heterogeneous return tuples combining Tensor and TensorList. I
swapped this codegen to use flatten() (at the possible cost of
a slight perf regression, since we're allocating another vector now
in this code path).
- thc_state is no longer a public member of Context; use getTHCState() instead
- This PR comes with Registry from Caffe2, for handling static initialization.
I needed to make a bunch of fixes to Registry to make it more portable
- No more ##__VA_ARGS__ token pasting; instead, it is mandatory to pass at
least one argument to the var-args. CUDAHooks and VariableHooks pass a nullary
struct CUDAHooksArgs/VariableHooksArgs to solve the problem. We must get rid of
token pasting because it does not work with MSVC.
- It seems MSVC is not willing to generate code for constructors of template
classes at use sites which cross DLL boundaries. So we explicitly instantiate
the class to get around the problem. This involved tweaks to the boilerplate
generating macros, and also required us to shuffle around namespaces a bit,
because you can't specialize a template unless you are in the same namespace as
the template.
- Insertion of AT_API to appropriate places where the registry must be exported
- We have a general problem which is that on recent Ubuntu distributions,
--as-needed is enabled for shared libraries, which is (cc @apaszke who was
worrying about this in #7160; see also #7160 (comment)). For now, I've hacked
this up in the PR to pass -Wl,--no-as-needed to all of the spots necessary to
make CI work, but a more sustainable solution is to attempt to dlopen
libATen_cuda.so when CUDA functionality is requested.
- The JIT tests somehow manage to try to touch CUDA without loading libATen_cuda.so. So
we pass -Wl,--no-as-needed when linking libATen_cuda.so to _C.so
- There is a very subtle linking issue with LAPACK, which is solved by making sure libATen_cuda.so links against LAPACK. There's a comment in aten/src/ATen/CMakeLists.txt about this, as well as a follow-up bug at #7353
- autogradpp used AT_CUDA_ENABLED directly. We've expunged these uses and added
a few more things to CUDAHooks (getNumGPUs)
- Added manualSeedAll to Generator so that we can invoke it polymorphically (it
only does something different for CUDAGenerator)
- There's a new cuda/CUDAConfig.h header for CUDA-only ifdef macros (AT_CUDNN_ENABLED, most prominently)
- CUDAHooks/VariableHooks structs live in at namespace because Registry's
namespace support is not good enough to handle it otherwise (see Registry
changes above)
- There's some modest moving around of native functions in ReduceOps and
UnaryOps to get the CUDA-only function implementations into separate files, so
they are only compiled into libATen_cuda.so. sspaddmm needed a separate CUDA
function due to object linkage boundaries.
- Some direct uses of native functions in CUDA code has to go away, since these
functions are not exported, so you have to go through the dispatcher
(at::native::empty_like to at::empty_like)
- Code in THC/THCS/THCUNN now properly use THC_API macro instead of TH_API
(which matters now that TH and THC are not in the same library)
- Added code debt in torch/_thnn/utils.py and other THNN parsing code to handle
both TH_API and THC_API
- TensorUtils.h is now properly exported with AT_API
- Dead uses of TH_EXPORTS and co expunged; we now use ATen_cpu_exports and
ATen_cuda_exports (new, in ATenCUDAGeneral.h) consistently
- Fix some incorrect type annotations on _cudnn_rnn_backward, where we didn't
declare a type as possibly undefined when we should have. We didn't catch this
previously because optional annotations are not tested on "pass-through" native
ATen ops (which don't have dispatch). Upstream issue at #7316
- There's a new cmake macro aten_compile_options for applying all of our
per-target compile time options. We use this on the cpu and cuda libraries.
- test/test_cpp_extensions.py can be run directly by invoking in Python,
assuming you've setup your PYTHONPATH setup correctly
- type_from_string does some new funny business to only query for all valid CUDA
types (which causes CUDA initialization) when we see "torch.cuda." in the
requested string
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Last mile libtorch fixes
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* pedantic fix
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Codemod to update our codebase to 0.4 standard
* Update some of the test scripts
* remove Variable in test_clip_grad_value
* fix _symbolic_override_wrapper_maker
* Separate cuda-ness from dtype.
There are no longer torch.cuda.int64, etc; only torch.int64 that correspond to at::ScalarType.
At the python arg parser level, the corresponding ATen type is selected from the combination of (ScalarType, Layout, Device).
There is also currently unused code in here for supporting ScalarType in native_functions; this will be used for specifying aggregate types
on reduction functions.
* Fix test_autograd.
* Add defaults to randint_like.
* Track is_cuda in py tensor types.
* Fix test_sparse.
* Fix multiprocessing.
* Fix rnn.
* Fix test_nn.
* Fix flake8.
This is the first of three PRs that #5537 will be split into.
This PR adds mkl headers to included files, and provides helper functions for MKL fft and cuFFT.
In particular, on POSIX, headers are using mkl-include from conda, and on Windows, it is from a new file @yf225 and I made and uploaded to s3.
* add mkl-include to required packages
* include MKL headers; add AT_MKL_ENABLED flag; add a method to query MKL availability
* Add MKL and CUFFT helpers
* Support native namespace functions with type dispatch.
Use 'ones' as an example. Note this is a "halfway" solution; i.e. the call chain is:
at::ones(shape, dtype) -> dtype.ones(shape, dtype) -> CPUFloatType.ones(shape, dtype) -> at::native::ones(shape, dtype)
The "nicer" solution would probably be something like:
at::ones(shape, dtype) -> dtype.ones(shape) -> CPUFloatType.ones(shape) -> at::native::ones(shape, this)
* Fix type inference.
* Fix test install.
* Fix extensions.
* Put dtype argument at the beginning.
* Fix extension.cpp.
* Fix rnn.
* Move zeros in the same manner.
* Fix cuda.
* Change randn.
* Change rand.
* Change randperm.
* Fix aten contrib.
* Resize in randperm_out.
* Implement eye.
* Fix sparse zeros.
* linspace, logspace.
* arange.
* range.
* Remove type dispatch from gen_python_functions.
* Properly generate maybe_init_cuda for type dispatch functions not named type.
* Don't duplicate dtype, this parameters for native type dispatched functions.
* Call VariableType factory methods from the base type so it gets version number 0.
* Address review comments.
* Port cuDNN RNN dropout state initialization to ATen and make Python code use it.
Fixes #5138.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Variable/Tensor bugfix
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
The Tensor and Variable classes are being merged.
autograd.Function.forward is now called on Variables, but with "no-grad"
mode (torch.no_grad()) enabled.
One benefit is that we no longer have to explicitly track shared
storages.
* Add transpose() to TensorGeometry.
This code is dead; I briefly used it in my RNN patchset but
eventually rewrote it to not be necessary. However, it seemed
like a useful gadget so I kept it. In general, it seems that it
would be useful for TensorGeometry to support all operations that
Tensor does, but it only computes the changes to sizes/strides
instead of actually doing the computation.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Turn on wrap_dim behavior for TensorGeometry
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Support for hard-coded differentiable outputs.
Some outputs of functions are nondifferentiable, and should always
be returned with requires_grad=False. Traditionally, we have used
the presence of 'grad' to signal that only the first output is
differentiable, and the rest are not, but cudnn_rnn (to be
implemented) breaks this pattern; its first three outputs are differentiable,
but its last output is a buffer that is just consumed by backwards.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* TensorGeometry constructor from just sizes
The sizes are assumed to form a contiguous tensor, and we compute
the strides we would get in that case.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Support saving TensorList for backwards.
There is some back story here. Saved TensorList in backwards will
be used by cudnn_rnn, and it is worth asking, why is it necessary to
save a list of tensors? Indeed, *technically* speaking a list of
tensors is not necessary, we only need to save the sizes of each
of the weight tensors. (We need the sizes because cuDNN is only
going to blast the derivative of weights into a flat buffer, but
we need to match the sizes of the views into the buffer when we
eventually return the derivatives.)
However, it was surprisingly awful trying to implement passing just
sizes, because as non-Tensor arguments, the JIT interpreter generation
code is expected to handle all non-Tensor arguments as attributes in the
trace, and our attributes struct doesn't actually know how to do
arrays of arrays. Saved TensorList code was much easier to get working,
so that's what this patch does.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* MatrixRef - an ArrayRef with a stride, making it a 2D ArrayRef.
Like ArrayRef, this class does not own the underlying data, it is expected
to be used in situations where the data resides in some other buffer.
This is intended to be trivially copyable, so it should be passed by
value.
For now, 2D only (so the copies are actually cheap, without having
to write a SmallVector class) and contiguous only (so we can
return non-strided ArrayRef on index).
The intended use-case (not in this commit) is to make it easier to
work with RNN weights, which are num_weights x num_layers matrix of
parameters.
P.S. dimension 0 indexes rows, dimension 1 indexes columns
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Generalize getDataType in Descriptors.h
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Change copy_range to take Tensor, and change cat_tensors_backward accordingly
Should a backward function return a Variable or a Tensor? For the most
part, all of our backward functions return Tensor, except cat_tensors_backward,
which returns a variable_list (which is really the only thing that matters,
because Tensor and Variable are interconvertible). But this is kind of weird,
because it means that you can't implement a backwards in ATen that returns
a std::vector<Tensor>, and then hook it up transparently with the derivatives
code. So I switched it over.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Support 5-ary return Tensor tuple.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Support code generation with mixed Tensor/TensorList in output.
I don't think I ended up using this in cudnn_rnn, but it seems like it
might be useful for someone else later.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Support 4-ary boolean array
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Add support for retain_variables in tools/autograd/derivatives.yaml
This adds 'retain_variables', a bool which is true if the user has specified
that saved variables should be retained in case the backwards is
run again later. This allows an optimization where we can
destroy saved buffers if we know the variables are not going to be retained;
e.g., it is (will be) used by _cudnn_rnn.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Lazily initialize cuDNN descriptors
Previously, cuDNN descriptors were eagerly allocated as soon
as a FooDescriptor object was created. However, in some uses
of TensorDescriptor, this is problematic: some tensors are optional
and cuDNN's API expects to be given a nullptr TensorDescriptor
in this case, not an uninitialized (but allocated) descriptor.
Lazily initializing the descriptors makes it less likely for
us to use uninitialized memory and matches the usual semantics of
unique_ptr. It's good sense!
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
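The pattern can be sketched roughly as follows (a simplified, hypothetical version of what the descriptor wrappers do; error handling omitted): the underlying cudnnTensorDescriptor_t is only created on first mutable access, so an untouched descriptor stays nullptr, which is exactly what cuDNN expects for an absent tensor.
```cpp
#include <cudnn.h>
#include <memory>
#include <type_traits>

// Deleter so the unique_ptr tears down the cuDNN descriptor.
struct TensorDescriptorDeleter {
  void operator()(cudnnTensorDescriptor_t d) const {
    if (d) cudnnDestroyTensorDescriptor(d);
  }
};

// desc() may return nullptr (no descriptor was ever needed);
// mut_desc() allocates lazily on first use.
class LazyTensorDescriptor {
 public:
  cudnnTensorDescriptor_t desc() const { return desc_.get(); }

  cudnnTensorDescriptor_t mut_desc() {
    if (!desc_) {
      cudnnTensorDescriptor_t raw = nullptr;
      cudnnCreateTensorDescriptor(&raw);  // error check omitted for brevity
      desc_.reset(raw);
    }
    return desc_.get();
  }

 private:
  std::unique_ptr<std::remove_pointer<cudnnTensorDescriptor_t>::type,
                  TensorDescriptorDeleter>
      desc_;
};
```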
* Port cuDNN RNNs to ATen.
This brings three new functions:
- _cudnn_rnn_flatten_weight: flatten a matrix of weight tensors into
a single contiguous weight buffer as required by cuDNN
- _cudnn_rnn: run RNN forwards
- _cudnn_rnn_backward: run RNN backwards
RNNs have a lot of parameters, so we restructured what was previously
a single 'fn' object that recorded all the parameters into three
objects: RNNDescriptorParams, TensorDescriptorListParams and
DropoutDescriptorParams.
We make use of MatrixRef to organize the weight tensors (which are
weight/bias x number of layers), but I did not teach the codegen
how to pass these as arguments/return values natively, so instead
a MatrixRef is passed as its constituent ArrayRef and int64_t stride0.
cudnn_rnn has three differentiable outputs and one nondifferentiable
one, so it makes use of the support for hard-coded differentiable outputs.
I haven't deleted all of the descriptor code from Python, because dropout
initialization still goes through this codepath; that should be fixed soon,
but I don't see it as essential for this PR.
This commit also removes the last use of NestedIOFunction from PyTorch.
There are some shenanigans with cuDNN dropout descriptor initialization,
see below:
Note [cuDNN dropout descriptor initialization]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In most cases, setting descriptors in cuDNN is cheap (e.g.,
cudnnSetTensorNdDescriptor). However, this is not the case for
cudnnSetDropoutDescriptor: in cuDNN 6/7 (and possibly others) it does an
expensive precomputation to initialize the random number generator states. In
cuDNN 6, this is the ONLY official mechanism to initialize a dropout descriptor,
which means that law-abiding clients were expected to generate a dropout
descriptor once and cache it. However, our ATen interface is (1) stateless (so
we can't cache the descriptors) and (2) does not accept arbitrary user types in
its interface (so we can't pass the descriptor in). This puts us in a pickle.
In cuDNN 7, a new function, cudnnRestoreDropoutDescriptor was added, which
forgoes the expensive initialization process, and can initialize the
descriptor with a pre-initialized state CUDA tensor. This is great, because
it means we can simply pass in the state tensor and then initialize the
descriptor internally. Unfortunately, this function is not available in
cuDNN 6.
To work around this, we break the cuDNN abstraction barrier and hard-code
the struct layout of the underlying dropout descriptor. With this struct,
we can reimplement cudnnRestoreDropoutDescriptor from scratch. Great!
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
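A rough sketch of the cuDNN 7 path described in the note above (illustrative only, error handling omitted; the cuDNN 6 fallback that reimplements the restore call via the descriptor's internal struct layout is deliberately not shown): pay the expensive cudnnSetDropoutDescriptor call once to fill a state buffer, then rebuild descriptors cheaply from that buffer with cudnnRestoreDropoutDescriptor.
```cpp
#include <cuda_runtime.h>
#include <cudnn.h>

// Sketch: expensive RNG-state setup happens once; later descriptors are
// restored cheaply from the cached state buffer (requires cuDNN >= 7).
void dropout_descriptor_sketch(cudnnHandle_t handle, float dropout,
                               unsigned long long seed) {
  size_t state_size = 0;
  cudnnDropoutGetStatesSize(handle, &state_size);

  void* states = nullptr;
  cudaMalloc(&states, state_size);  // in ATen this would be a CUDA byte tensor

  // Expensive: initializes the RNG states stored in `states`.
  cudnnDropoutDescriptor_t desc;
  cudnnCreateDropoutDescriptor(&desc);
  cudnnSetDropoutDescriptor(desc, handle, dropout, states, state_size, seed);

  // Later, elsewhere: a fresh descriptor can be set up cheaply from the
  // already-initialized state buffer, without redoing the precomputation.
  cudnnDropoutDescriptor_t desc2;
  cudnnCreateDropoutDescriptor(&desc2);
  cudnnRestoreDropoutDescriptor(desc2, handle, dropout, states, state_size, seed);

  cudnnDestroyDropoutDescriptor(desc2);
  cudnnDestroyDropoutDescriptor(desc);
  cudaFree(states);
}
```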
* Fix cuDNN 7 behavior.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Delete some unused, controversial methods from MatrixRef.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Add missing filter_dim_a slice
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Replace nested for-loop with itertools.chain.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* CR comment on mut_desc()
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Refactor DropoutDescriptor API.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Use cached CurrentDeviceProperties from Context.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Document _cudnn_rnn outputs.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Improve fmap docs, convert some functions to use it.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Move IndexRange to autograd/function.h
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Elaborate on CUDNN_STATUS_INVALID_VALUE return some more.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Add an all-in-one setter for RNNDescriptorParams.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Print what the unrecognized RNN mode was
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* RNN TensorDescriptor improvements
- Have an explicit size/stride overload for setting a TensorDescriptor,
so you don't have to create a goofy view to feed in.
- Change the padding to 3D rather than 5D, which is all you actually
need (it's just 2D that is not supported by the cuDNN API); see the sketch below.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
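A sketch of that padding (illustrative, not the actual helper): shapes with fewer than 3 dimensions get trailing size-1 / stride-1 dimensions appended before the cudnnSetTensorNdDescriptor call.
```cpp
#include <cudnn.h>
#include <vector>

// Pad sizes/strides to at least 3 dims, since cuDNN's Nd descriptor API
// does not accept plain 2D shapes.
void set_descriptor_padded(cudnnTensorDescriptor_t desc, cudnnDataType_t dtype,
                           std::vector<int> sizes, std::vector<int> strides) {
  while (sizes.size() < 3) {
    sizes.push_back(1);
    strides.push_back(1);
  }
  cudnnSetTensorNdDescriptor(desc, dtype, static_cast<int>(sizes.size()),
                             sizes.data(), strides.data());
}
```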
* Fix implementation of cudnnRestoreDropoutDescriptor, plus test.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Better comments about input layout.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Add comment about no-DropoutDescriptor argument RNNDescriptor function.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Rename vocab_size back to input_size.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Don't use backslash in comment.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Bugfix for contiguous TensorGeometry calculation.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Don't allocate a dummy tensor when setting TensorDescriptor for flatten_weight.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Make contiguity errors more user-friendly.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* s/fn.dropout.train/fn_train/
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* s/_cudnn_rnn_backward_grad/_cudnn_rnn_backward_input/
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Make dcx properly undefined when not required.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Remove old TODO.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Add state size check in cudnnRestoreDropoutDescriptor
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Explicitly narrow int64_t to size_t
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Restore copyParams comment.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Update benchmark numbers, and slight engineering improvements.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Typofix.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Three-stage plan to put an end to the stupidly weird "why isn't cuDNN enabled"
bugs:
- Add torch.backends.cudnn.disable_global_flags(), which, as its name suggests,
disables global flag setting in cuDNN, so that you are not allowed to
make changes to this state globally. However, the flags() context
manager continues to work (since its changes are non-global).
- Call disable_global_flags() in test/common.py
- Switch all of the manual flag setting/unsetting in test/test_nn.py
to use the context manager.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
- Rename THNN convolution to have thnn_ prefix.
- Propagate CuDNN benchmark and deterministic to at::Context
- Add 'convolution', 'convNd' and 'conv_transposeNd' native wrappers, with defaults
The conv_transposeNd wrappers are updated to have the same argument
order as Python.
- torch.nn.functional directly dispatches to the native wrappers
- Make it possible to turn off tracing for some native wrappers, so I don't
have to write symbolics for all the functions above
- Spectral ops can now make use of CuDNN convolution if possible
- Better commentary on cudnn_batch_norm
- Turn on DCE for all JIT tests.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
This is not currently used by anything, but eventually ATen
will need to decide whether or not to use
CuDNN functions, which means we need to propagate
this variable to ATen.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Check cuDNN version at runtime
This checks that the version from cudnn.h matches the version from
libcudnn.so.
Fixes #1476
* Only check major and minor version numbers
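A sketch of such a check, assuming the pre-cuDNN-9 version encoding major*1000 + minor*100 + patch (illustrative, not the exact PyTorch code): the compile-time version comes from the CUDNN_MAJOR/CUDNN_MINOR macros in cudnn.h, the runtime version from cudnnGetVersion() in libcudnn.so, and only the major and minor components are compared.
```cpp
#include <cudnn.h>
#include <stdexcept>
#include <string>

// Compare the compile-time (cudnn.h) and runtime (libcudnn.so) versions,
// ignoring the patch level.
void check_cudnn_version() {
  size_t runtime_version = cudnnGetVersion();  // e.g. 7005 for 7.0.5
  size_t runtime_major = runtime_version / 1000;
  size_t runtime_minor = (runtime_version % 1000) / 100;

  if (runtime_major != CUDNN_MAJOR || runtime_minor != CUDNN_MINOR) {
    throw std::runtime_error(
        "cuDNN version mismatch: compiled against " +
        std::to_string(CUDNN_MAJOR) + "." + std::to_string(CUDNN_MINOR) +
        " but linked against " +
        std::to_string(runtime_major) + "." + std::to_string(runtime_minor));
  }
}
```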