Summary:
compilation_preference is one of:
ANEURALNETWORKS_PREFER_LOW_POWER = 0
ANEURALNETWORKS_PREFER_FAST_SINGLE_ANSWER = 1
ANEURALNETWORKS_PREFER_SUSTAINED_SPEED = 2
relax_f32_to_f16 calls Model_relaxComputationFloat32toFloat16
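A minimal usage sketch, assuming the two options are exposed as keyword arguments on `convert_model_to_nnapi` (the parameter names and values are taken from the summary above; the exact signature may differ):
```python
import torch
from torch.backends._nnapi.prepare import convert_model_to_nnapi

# Trace a small NNAPI-convertible module.
model = torch.jit.trace(torch.nn.PReLU().eval(), torch.zeros(1, 3, 16, 16))

nnapi_model = convert_model_to_nnapi(
    model,
    torch.zeros(1, 3, 16, 16),
    compilation_preference=1,  # ANEURALNETWORKS_PREFER_FAST_SINGLE_ANSWER
    relax_f32_to_f16=True,     # forwards to Model_relaxComputationFloat32toFloat16
)
```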
Test Plan:
Tested on device with nnapi models
* Works with existing exported models
* Works with new exported models with options
Differential Revision: D36433236
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78758
Approved by: https://github.com/kimishpatel
(reopening due to botched merge)
The cuDNN V8 API (main support merged in https://github.com/pytorch/pytorch/pull/60755) potentially exposes many more kernels with benchmark=True. While these additional kernels can improve performance, it is often unnecessary to run every kernel returned by the heuristic and doing so may degrade the user experience by causing the first model iteration to be very slow. To alleviate this issue, this PR introduces torch.backends.cudnn.benchmark_limit. benchmark_limit specifies the maximum number of working cuDNN kernels to try for a given workload, with the default being 10 (similar to what TensorFlow does). benchmark_limit = 0 yields the current behavior of trying every kernel returned by the heuristic.
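For example (a sketch; the limit only matters when benchmark mode is enabled, and 0 restores the try-everything behavior):
```python
import torch

torch.backends.cudnn.benchmark = True       # enable cuDNN autotuning
torch.backends.cudnn.benchmark_limit = 10   # try at most 10 working kernels per workload (0 = try all)

conv = torch.nn.Conv2d(64, 64, 3, padding=1).cuda()
out = conv(torch.randn(8, 64, 56, 56, device='cuda'))  # first call benchmarks up to the limit
```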
CC @ptrblck @ngimel @xwang233
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77002
Approved by: https://github.com/ngimel
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74441
For xirp-based segmentation models, we want to support enumerated input shapes. This allows us to support both landscape and portrait mode images without sacrificing performance. P488118264
ghstack-source-id: 151736964
Test Plan: `buck run coreml:xirp -- --model="/home/taox/xirp/xirp_20a.pt" --out="/home/taox/xirp/xirp_20a_coreml_enumerated.ptl"`
Reviewed By: mcr229
Differential Revision: D34803184
fbshipit-source-id: c462c0783846a1489ca7ce4d5a654aa6927c9c44
(cherry picked from commit 67d418c97531daaf3d03d1000ca4a4ff60de2a95)
Summary:
This PR adds a new quantization backend, ONEDNN, with quantized conv and linear kernels in the same code path as the FBGEMM backend.
The ONEDNN backend is an alternative to the FBGEMM and QNNPACK backends. It takes advantage of features of the latest Intel® CPU products: it supports VNNI on Cascade Lake and the AMX instruction set available on Sapphire Rapids, which offers 8x the int8 peak TOPS of VNNI.
ONEDNN demonstrates better performance than FBGEMM on the conv kernels of popular CNN models. It also supports more fused ops, such as convolution-add-ReLU, than FBGEMM and QNNPACK.
To use this backend, users only need to set the quantization backend to 'onednn' before running any computation, without any changes to their models.
```python
torch.backends.quantized.engine = 'onednn'
```
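As a sketch of a typical post-training static quantization flow on top of this (assuming `get_default_qconfig` accepts the new `'onednn'` backend string, per the change to torch/ao/quantization/qconfig.py listed below):
```python
import torch
from torch.ao.quantization import QuantWrapper, get_default_qconfig, prepare, convert

torch.backends.quantized.engine = 'onednn'

model = QuantWrapper(torch.nn.Sequential(torch.nn.Conv2d(3, 8, 3), torch.nn.ReLU()))
model.eval()
model.qconfig = get_default_qconfig('onednn')   # backend-specific default qconfig
prepared = prepare(model)                       # insert observers
prepared(torch.randn(4, 3, 32, 32))             # calibrate with representative data
quantized = convert(prepared)                   # lower to the ONEDNN quantized kernels
```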
## Design docs
- https://github.com/pytorch/pytorch/issues/21120#issuecomment-562371983
- https://github.com/pytorch/pytorch/pull/67177#issuecomment-963787096
## File changes
**Add ONEDNN to qengine list**
- aten/src/ATen/Context.cpp
- c10/core/QEngine.h
- torch/ao/quantization/qconfig.py
- torch/backends/quantized/\_\_init\_\_.py
**Implement qconv & qlinear for ONEDNN backend**
- aten/src/ATen/native/quantized/cpu/conv_serialization.h
- aten/src/ATen/native/quantized/cpu/fbgemm_utils.cpp
- aten/src/ATen/native/quantized/cpu/onednn_utils.h
- aten/src/ATen/native/quantized/cpu/qconv.cpp
- aten/src/ATen/native/quantized/cpu/qconv_dynamic.cpp
- aten/src/ATen/native/quantized/cpu/qconv_prepack.cpp
- aten/src/ATen/native/quantized/cpu/qconv_unpack.cpp
- aten/src/ATen/native/quantized/cpu/qlinear.cpp
- aten/src/ATen/native/quantized/cpu/qlinear_dynamic.cpp
- aten/src/ATen/native/quantized/cpu/qlinear_prepack.cpp
- aten/src/ATen/native/quantized/cpu/qlinear_unpack.cpp
**Skip tests that are not supported by ONEDNN**
- test/ao/sparsity/test_kernels.py
- test/quantization/core/test_quantized_module.py
- test/quantization/core/test_quantized_op.py
## Validation results
This PR has passed `test_quantization.py` and `test_mkldnn.py`.
Below are performance data of int8 2d convolution and linear on the Cascade Lake Xeon® platform:
(Note: Tested with single instance on single core. Using the latest oneDNN library.)
**Table 1. Performance comparison of int8 2d convolution operator**
|No.| Shape| FBGEMM| ONEDNN| Gain|
|-|-|-|-|-|
|1| IC=128, OC=128, kernel=3, stride=1, N=4, H=32, W=32, G=1, pad=0| 668.310us| 535.630us| 24.8%|
|2| IC=128, OC=128, kernel=3, stride=2, N=4, H=32, W=32, G=1, pad=0| 290.630us| 281.810us| 3.1%|
|3| IC=128, OC=256, kernel=3, stride=1, N=4, H=32, W=32, G=1, pad=0| 1.045ms| 893.010us| 17.0%|
|4| IC=128, OC=256, kernel=3, stride=2, N=4, H=32, W=32, G=1, pad=0| 385.320us| 373.720us| 3.1%|
|5| IC=256, OC=256, kernel=3, stride=1, N=4, H=32, W=32, G=1, pad=0| 1.876ms| 1.641ms| 14.3%|
|6| IC=256, OC=256, kernel=3, stride=2, N=4, H=32, W=32, G=1, pad=0| 660.460us| 638.470us| 3.4%|
**Table 2. Performance comparison of int8 linear operator**
|No.| Shape (m, n, k)| FBGEMM| ONEDNN| Gap|
|-|-|-|-|-|
|1| 64, 800, 320| 80.550us| 96.770us| 20.10%|
|2| 64, 768, 512| 101.230us| 130.720us| 29.10%|
|3| 16, 256, 512| 30.230us| 51.450us| 70.20%|
|4| 128, 128, 128| 33.810us| 50.480us| 49.30%|
|5| 256, 512, 256| 154.490us| 195.050us| 26.30%|
|6| 1024, 1024, 1024| 3.134ms| 3.514ms| 12.10%|
ONEDNN showed advantages over FBGEMM for convolution. However, it has a performance gap relative to FBGEMM for linear ops. The gap is a known issue and further optimization is in progress in the oneDNN library. On the latest platforms, ONEDNN achieves better performance for both conv and linear.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69820
Reviewed By: HDCharles
Differential Revision: D33716039
Pulled By: jerryzh168
fbshipit-source-id: 6f7bb807e85798142dfcffccfca8b8bd652fb3dd
(cherry picked from commit 91526b373560f42ba0ad307f9cccfc0eb5218b1f)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70621
PyTorch doesn't have support for qint16 yet. Add an option to handle qint16 via the int16 & qint32 data types.
* For qint16 tensors in NNAPI, the user sends a qint32 tensor (illustrated below). We convert the qint32 to int16 for the converter and set the zero point and scale for NNAPI.
* Inputs to the model have to have a fixed scale and zero point and are only supported for testing.
* Added a flag, use_int16_for_qint16, which will be used to maintain backwards compatibility in the converter when true qint16 is supported in PyTorch.
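For illustration, the qint32 stand-in tensor described in the first bullet could be built like this (only the input side is shown; the conversion call and the use_int16_for_qint16 flag are omitted):
```python
import torch

# A qint32 tensor standing in for a qint16 operand: its fixed scale and zero
# point describe the intended qint16 quantization that the converter lowers
# to an int16 NNAPI operand.
x = torch.quantize_per_tensor(torch.randn(1, 16), scale=0.005, zero_point=0, dtype=torch.qint32)
print(x.q_scale(), x.q_zero_point(), x.dtype)
```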
ghstack-source-id: 146507483
Test Plan: pytest test/test_nnapi.py
Reviewed By: dreiss
Differential Revision: D33285124
fbshipit-source-id: b6376fa1bb18a0b9f6a18c545f600222b650cb66
Summary:
Per title.
This PR introduces a global flag that lets pytorch prefer one of the many backend implementations while calling linear algebra functions on GPU.
Usage:
```python
torch.backends.cuda.preferred_linalg_library('cusolver')
```
Available options (str): `'default'`, `'cusolver'`, `'magma'`.
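For example (a sketch; the setting is global and can be switched back at any point):
```python
import torch

torch.backends.cuda.preferred_linalg_library('cusolver')  # prefer cuSOLVER where it has an implementation
A = torch.randn(64, 64, device='cuda')
Ainv = torch.linalg.inv(A)

torch.backends.cuda.preferred_linalg_library('default')   # back to the built-in heuristic
```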
Issue https://github.com/pytorch/pytorch/issues/63992 inspired me to write this PR. No heuristic is perfect on all devices, library versions, matrix shapes, workloads, etc. We can obtain better performance if we can conveniently switch linear algebra backends at runtime.
Performance of linear algebra operators after this PR should be no worse than before. The flag is set to **`'default'`** by default, which makes everything the same as before this PR.
The implementation of this PR is basically following that of https://github.com/pytorch/pytorch/pull/67790.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67980
Reviewed By: mruberry
Differential Revision: D32849457
Pulled By: ngimel
fbshipit-source-id: 679fee7744a03af057995aef06316306073010a6
Summary:
https://github.com/pytorch/pytorch/issues/67578 disabled reduced precision reductions for FP16 GEMMs. After benchmarking, we've found that this has substantial performance impacts for common GEMM shapes (e.g., those found in popular instantiations of multi-headed attention) on architectures such as Volta. As these performance regressions may come as a surprise to current users, this PR adds a toggle to disable reduced precision reductions,
`torch.backends.cuda.matmul.allow_fp16_reduced_precision_reduction = False`,
rather than making disabling them the default behavior.
CC ngimel ptrblck
stas00 Note that the behavior after the previous PR can be replicated with
`torch.backends.cuda.matmul.allow_fp16_reduced_precision_reduction = False`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67946
Reviewed By: zou3519
Differential Revision: D32289896
Pulled By: ngimel
fbshipit-source-id: a1ea2918b77e27a7d9b391e030417802a0174abe
Summary:
The NNAPI converter previously failed with one const value and one tensor.
Code suggestions from dreiss.
Test Plan:
pytest test/test_nnapi.py::TestNNAPI::test_pointwise_binary
Imported from OSS
Reviewed By: anshuljain1
Differential Revision: D28893881
fbshipit-source-id: 59240373fb03c6fdafa4cb2fa4d8408dd20092f6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62225
Rewrote the preprocess function for Android NNAPI delegate.
Previously, `preprocess()` called `convert_model_to_nnapi()` using Pybind and returned a NnapiModule that is serialized for mobile. Now, `preprocess()` calls a sub-function of `convert_model_to_nnapi()` and returns several preprocessed items (that were previously components of NnapiModule).
The returned dictionary contains:
"shape_compute_module": torch::jit::Module,
"ser_model": torch::Tensor,
"weights": List[torch.Tensor],
"inp_mem_fmts": List[int],
"out_mem_fmts": List[int]
**Purpose and Future:**
The purpose of these changes is to move more of the implementation from bytecode and TorchScript to the delegate API, since bytecode is less efficient.
Now, only the shape computation uses bytecode. In the future, shape computation will be moved out of TorchScript as well.
**nnapi_backend_preprocess.cpp:** preprocess implementation
**prepare.py**: refactored a portion of `convert_model_to_nnapi()` to `process_for_nnapi()`, so preprocess can get components of NnapiModule
**Test:**
Ran `python test/test_jit.py TestNnapiBackend` and `python test/test_nnapi.py` on OSS successfully
ghstack-source-id: 134444190
Test Plan: Ran `python test/test_jit.py TestNnapiBackend` and `python test/test_nnapi.py` on OSS successfully
Reviewed By: raziel
Differential Revision: D29922279
fbshipit-source-id: cadcf8908d8a745dc7abbe286e97d6ead937d4ab
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61796
We can easily handle NNAPI conversion for NHWC inputs
that have 1 channel or whose H and W are 1.
Test Plan:
pytest test/test_nnapi.py::TestNNAPI::test_flatten
Imported from OSS
Reviewed By: saketh-are
Differential Revision: D29827735
fbshipit-source-id: 65dee4b42fceef1b032bf5dd1c4cc6e020d01e14
Summary:
To add a serializer for custom ops, we can subclass the default serializer
and update ADDER_MAP, as sketched below.
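A hedged sketch of that pattern (the serializer class name is taken from torch/backends/_nnapi/serializer.py; the adder signature and how the custom serializer is then handed to the converter are assumptions):
```python
from torch.backends._nnapi.serializer import _NnapiSerializer  # default serializer class


def add_my_custom_op(serializer, node):
    # Hypothetical adder: translate the TorchScript node for "custom::my_op"
    # into NNAPI operands/operations using the serializer's helper methods.
    raise NotImplementedError


class MySerializer(_NnapiSerializer):
    # Extend the default op table with an entry for the custom op.
    ADDER_MAP = dict(_NnapiSerializer.ADDER_MAP, **{"custom::my_op": add_my_custom_op})
```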
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61025
Test Plan:
* pytest test/test_nnapi.py::TestNNAPI for current serializer
* Custom serializers to be tested with custom ops
Imported from OSS
Reviewed By: anshuljain1
Differential Revision: D29480745
fbshipit-source-id: 37e3f8de3c97f6c8a486f9879ce11430ea89af34
Summary: As title
Test Plan: pytest test/test_nnapi.py::TestNNAPI::test_cat
Reviewed By: anshuljain1
Differential Revision: D29480747
fbshipit-source-id: 161803054ff1a4c2c750fc30a5f0fc6d8a24b2c9
Summary:
Same as title
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61021
Test Plan: pytest test/test_nnapi.py::TestNNAPI
Reviewed By: anshuljain1
Differential Revision: D29480746
fbshipit-source-id: 7217c8f3a811db8c3c373f3e7ca31caf9502ef22
Summary:
Add support for aten::slice op in the NNAPI model converter
* If start = 0; end = max -> identity
* Flexible shapes can be passed through
* Flexible shapes can't be sliced over
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59364
Test Plan: pytest test/test_nnapi.py::TestNNAPI::test_slice
Reviewed By: anshuljain1
Differential Revision: D28881039
fbshipit-source-id: 3c1c630ff27b5bba6eda403d87570c61d43ae90e
Summary:
* Add support for aten::detach op in the NNAPI model converter as a no-op
* Also add flexible op support for add_pointwise_simple_unary_op
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58543
Test Plan: pytest test/test_nnapi.py::TestNNAPI::test_detatch
Reviewed By: anshuljain1
Differential Revision: D28531942
fbshipit-source-id: 4387dbbbadd8ce6b690841f3a903e68a380b849d
Summary:
Add support for aten::div op in the NNAPI model converter. Startup-time
variable size support isn't available since shapes are passed as inputs to the NNAPI op;
runtime variable size support is to be added soon.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60885
Test Plan: pytest test/test_nnapi.py::TestNNAPI::test_flatten
Reviewed By: anshuljain1
Differential Revision: D29451725
fbshipit-source-id: 8902745f7758c8cc88ad4b4ce02b8301ff894bd4
Summary:
Add support for aten::div op in the NNAPI model converter. Add variable
size input test as well.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58541
Test Plan: pytest test/test_nnapi.py::TestNNAPI::test_div
Reviewed By: anshuljain1
Differential Revision: D28531943
fbshipit-source-id: e96342146f6de216f7b88443618edfc54963747c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58540
Add support for aten::to op in the NNAPI model converter for simple
cases like to("cpu"), to("gpu")
Test Plan: pytest test/test_nnapi.py::TestNNAPI::test_to
Reviewed By: anshuljain1
Differential Revision: D28531941
fbshipit-source-id: 0c934f7aceaff2669307c3426efe32046d8c44f3
Summary:
Add support for aten::softmax op in the NNAPI model converter with
flexible size
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58539
Test Plan: pytest test/test_nnapi.py::TestNNAPI::test_softmax
Reviewed By: anshuljain1
Differential Revision: D28531946
fbshipit-source-id: 8633f3e3f7f52795f9866ff16ad0867ea36a19e8
Summary:
Add support for aten::avgpool2d op in the NNAPI model converter with var
size support
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58538
Test Plan: pytest test/test_nnapi.py::TestNNAPI::test_avgpool2d
Reviewed By: anshuljain1
Differential Revision: D28531944
fbshipit-source-id: 43ff8c9389365698c282f204042b49c7ec84d824
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57563
Add flexible size support for upsample_nearest2d op in nnapi model conversion
Test Plan:
pytest test/test_nnapi.py
Imported from OSS
Reviewed By: dreiss
Differential Revision: D28200847
fbshipit-source-id: 901fe3f6e68e4c16ece730f3ffa68dc88c6ed6c3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57562
Add flexible size support for qadd op in nnapi model conversion
Test Plan:
pytest test/test_nnapi.py
Imported from OSS
Reviewed By: dreiss
Differential Revision: D28200849
fbshipit-source-id: d5b2ea8e9eb8ae405ff2c960f7549cef60bc0991
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57561
Add flexible size support for conv2d op in nnapi model conversion
Test Plan:
pytest test/test_nnapi.py
Imported from OSS
Reviewed By: dreiss
Differential Revision: D28200848
fbshipit-source-id: d94ccf48a3d8453aa8e96c7cac02948c4cd870cc
Summary:
Fixes https://github.com/pytorch/pytorch/issues/48141
~Mypy is complaining about a missing arg in a function call.~
```bash
torch/backends/_nnapi/serializer.py:806: error: Too few arguments for "_do_add_binary" [call-arg]
Found 1 error in 1 file (checked 1140 source files)
```
9392137dbe/torch/backends/_nnapi/serializer.py (L804-L806)
~dreiss, would you mind taking a look when you have some cycles to spare and see what would be the appropriate value for `fuse_code` here? Thanks :)~
Edit: https://github.com/pytorch/pytorch/issues/48925 got merged a couple of days ago. The blocking part is now unblocked, and I just pushed the changes to make mypy happy again. This PR is ready for review.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48142
Reviewed By: ezyang
Differential Revision: D28006249
Pulled By: walterddr
fbshipit-source-id: 5e43eeba7143512a549efaad31541f86718add7c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54701
We need NNAPI models to support inputs (and, by extension, intermediate
values and outputs) whose shape is only determined at load time. For
example, a vision model's input shape might depend on the aspect
ratio of the device camera. While NNAPI has full support for variable
shapes (by setting components of the operand shape to 0), the guidance
we have received is that vendor-provided drivers for real hardware are
not able to support this efficiently. Therefore, we take a hybrid
approach where shapes are calculated at model load time to
semi-dynamically construct our NNAPI model. While this doesn't let us
have truly dynamic input shapes, it does allow us to ensure that the
vendor driver only sees fixed shapes, so we get maximum performance.
In this initial commit, only PReLU supports dynamic shapes. Additional
operators will be converted in separate diffs.
- In order to convert a flexible-shape model, the user supplies inputs
  with shapes containing dimensions of size 0 for the flexible
  dimensions (see the sketch after this list).
- During conversion, we generate code to compute the shapes of all
intermediates and outputs as a function of the input shapes.
- We no longer run the input model to produce the output templates.
Instead, we generate code to return properly-sized templates, given
the input shapes.
- All of this generated code goes into a "ShapeComputeModule" that is
used by the NnapiModule during initialization.
- The ShapeComputeModule mutates the serialized model to fill in the
computed sizes for each operand. This requires us to change the dtype
for the serialized model to int32, but this should be fine because
everything in it is already 4-byte aligned.
- NnapiInitWrapper no longer exists. Instead, initialization is
performed on the first run, based on the real arguments. We plan to
provide an API for doing eager initialization.
- Unit test updated to allow separate arguments to be given for trace,
conversion, and inference. A flexible-shape test case was added for
PReLU.
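A conversion sketch for the first bullet above (treating `convert_model_to_nnapi` as the entry point; running the converted module requires an NNAPI-capable device):
```python
import torch
from torch.backends._nnapi.prepare import convert_model_to_nnapi

model = torch.jit.trace(torch.nn.PReLU().eval(), torch.zeros(1, 16, 16, 16))

# Size-0 dimensions mark the flexible (load-time determined) dimensions.
flexible_template = torch.zeros(1, 16, 0, 0)
nnapi_model = convert_model_to_nnapi(model, flexible_template)

# On an NNAPI-capable device, real fixed-shape inputs are supplied at run time;
# the generated shape-compute code fills in operand sizes on the first run.
out = nnapi_model(torch.randn(1, 16, 224, 224))
```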
Test Plan: Unit test
Reviewed By: axitkhurana
Differential Revision: D27536796
Pulled By: dreiss
fbshipit-source-id: 105585f247987b1e6ec6946a6fe44401237cb0a0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54700
This is an internal method just to make it more clear what
len(self.operands) is doing.
Test Plan: Unit test
Reviewed By: axitkhurana
Differential Revision: D27536794
Pulled By: dreiss
fbshipit-source-id: 678cee8a47df6757dd2e6feabf2560fd82d32e26
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54699
We'll soon be adding support for flexible-size tensors to the NNAPI
converter, but it won't be added to all ops at once. Create
get_tensor_operand_by_jitval_fixed_size as a wrapper for
get_tensor_operand_by_jitval that verifies that the argument has a fixed
shape. Update all call sites. As flexible size support is added to
each op, the call sites can be converted back and proper size checks
added.
Test Plan: Unit test
Reviewed By: axitkhurana
Differential Revision: D27536791
Pulled By: dreiss
fbshipit-source-id: 6fb1fea814d767b6ff263fd8b88240a51be74777
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54698
"mf" was short for memory format, but the concept that this variable
represents was renamed to "dim_order", so rename the variable.
Test Plan: Unit test
Reviewed By: axitkhurana
Differential Revision: D27536793
Pulled By: dreiss
fbshipit-source-id: 2b31c70da1ff221a7833e67486690fa606f01dea
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54697
Previously, models being converted to NNAPI were expected to take inputs
as separate arguments, but the generated NNAPI model could only take
multiple inputs as a list. Now the generated model always takes inputs
(single or multiple) as separate tensor arguments.
Previously, models being converted to NNAPI were expected to return
outputs as a single tensor or tuple of tensors, but the generated NNAPI
model would return multiple outputs as a list. Now the generated model
returns a tuple as well (or single tensor).
Internally, we decide what output format to use (single tensor or tuple)
based on the conversion process, rather than by running the model.
Test Plan: Unit test
Reviewed By: axitkhurana
Differential Revision: D27536790
Pulled By: dreiss
fbshipit-source-id: c0f93c85d450757e568985947cc2f32043795859
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54696
This was originally developed for a Python version where array was not
available.
Test Plan: Unit test
Reviewed By: axitkhurana
Differential Revision: D27536792
Pulled By: dreiss
fbshipit-source-id: 39e5507e37d4f91871113439fe752a4d5373eaba
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48812
This came up in a squeeze-and-excitation model. Starting with an NHWC
tensor T, we perform a mean operation across H and W, giving an NxC
tensor, which (after some fully connected layers) is reshaped to
NxCx1x1, then multiplied with T. To handle this, we detect the specific
case of a binary op with one NHWC input and one contiguous input with
H,W == 1,1 and allow the op to be applied (after transposing the
contiguous input).
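In plain PyTorch terms, the model pattern being handled looks roughly like this (illustrative only; the commit changes how the converter maps the final multiply):
```python
import torch

N, C, H, W = 2, 8, 16, 16
t = torch.randn(N, C, H, W).contiguous(memory_format=torch.channels_last)  # NHWC-ordered data

s = t.mean(dim=[2, 3])                        # squeeze: N x C
s = torch.sigmoid(torch.nn.Linear(C, C)(s))   # excitation: fully connected layer(s)
out = t * s.reshape(N, C, 1, 1)               # broadcast multiply back onto the NHWC tensor
```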
Test Plan: Unit test.
Reviewed By: axitkhurana
Differential Revision: D25317939
Pulled By: dreiss
fbshipit-source-id: b4c17ab3b874d1a7defa04664010ba82115f1c20
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54695
Previously, torch.nn.Linear was calling aten::addmm internally. Now
it's calling aten::linear, so add support for that.
Test Plan: Unit test
Reviewed By: axitkhurana
Differential Revision: D27536795
Pulled By: dreiss
fbshipit-source-id: 42c8d2a80b20ac12ed9bba599c5e0e874256bb13
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47518
This was left over from an old version of the code. The idea was that
instead of indexing into separate tensors for each weight, you could
bundle them all into a single file and use different offsets into that
file. With the current design, this is nontrivial to support, so drop
the code for now.
Test Plan: CI
Reviewed By: axitkhurana
Differential Revision: D25317935
Pulled By: dreiss
fbshipit-source-id: e26ab3a8d437cb1bbb50319209fa56d9c571ce61
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47517
While we're unlikely to see this in practice, it comes up in unit tests.
This type annotation is necessary for `torch.jit.script` to figure out
the type of the list if it is empty.
Test Plan: Unit tests in a later diff.
Reviewed By: axitkhurana
Differential Revision: D25317937
Pulled By: dreiss
fbshipit-source-id: de8b6665c6fcd3cd2b39e3c696a39336c064e4c1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46780
This is in prototype status, but pretty functional. There are two major
parts.
- Model converter. This is a pure Python component that consumes a
model in TorchScript format, converts the operations into NNAPI
semantics, and serializes the model in a custom format. It then wraps
the result in a new TorchScript model that can invoke NNAPI under the
hood.
- Runtime. This is a TorchBind object that deserializes the model and
sends the result to NNAPI. This is fairly simple since the serialized
format is basically just a list of NNAPI calls to make, so most of the
code is spent on bounds checking.
A few notes on the design.
- Currently, all tensor sizes need to be fixed, and those fixed sizes
are burned directly into the serialized model. This will probably
need to change. NNAPI supports variable-sized tensors, but the
important hardware backends do not. However, we're seeing use cases
crop up where the input size is not known until around the time that
the model is loaded (for example, it might depend on the camera aspect
ratio). I think the proper fix here is to remove the code in the
converter that eagerly calculates the sizes of the intermediate
tensors and replace it with a code generator that will generate some
TorchScript code that will perform those calculations at model load
time. This way, we will be able to support models that have
variable-sized inputs while still only showing fixed-sized operands to
NNAPI.
- The important hardware backends want operands to be in NHWC order, but
PyTorch natively represents all tensors as NCHW. The strategy for
this is to keep NCHW during most of the conversion process, but track
an additional value per operand representing the "dimension order".
The dimension order gets propagated through convolutions and pointwise
ops. When we're ready to serialize the model, we reorder the
dimensions for "channels last" operands to NHWC.
Test Plan:
Some local testing with FB prod models. I'll need to add some examples
and automated tests.
Reviewed By: iseeyuan
Differential Revision: D24574040
Pulled By: dreiss
fbshipit-source-id: 6adc8571b234877ee3666ec0c0de24da35c38a1f
Summary:
Reland of https://github.com/pytorch/pytorch/issues/38140. It got reverted since it broke slow tests, which were only run on the master branch (thanks mruberry!). Enabling all CI tests in this PR to make sure they pass.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38288
Reviewed By: mruberry
Differential Revision: D21524923
Pulled By: ailzhang
fbshipit-source-id: 3a9ecc7461781066499c677249112434b08d2783
Summary:
I'm mostly done with cleaning up the test/ folder. There are a bunch of remaining call sites, but they're "valid" in that they test `type()` functionality. We cannot remove them until `type()` is fully deprecated.
The next PR will mainly focus on moving some call sites to an internal API.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38140
Differential Revision: D21483808
Pulled By: ailzhang
fbshipit-source-id: 12f5de6151bae59374cfa0372e827651de7e1c0f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34047
This PR integrates the added xnnpack conv2d and linear op via
custom class registration for packed weights. The packed struct
is serializable.
Test Plan:
python test test/test_xnnpack_integration.py
Imported from OSS
Differential Revision: D20185657
fbshipit-source-id: fc7e692d8f913e493b293b02d92f4e78536d7698
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26620
This change updates torch.backends.quantized.engine to accept strings ("fbgemm"/"qnnpack"/"none" for now).
set_qengine and get_qengine return an int which represents the at::QEngine enum.
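Usage, as it looks in current torch.backends.quantized (a sketch; available strings depend on the build):
```python
import torch

print(torch.backends.quantized.supported_engines)  # e.g. ['none', 'fbgemm'] depending on the build
torch.backends.quantized.engine = 'fbgemm'          # string is mapped to the at::QEngine enum internally
print(torch.backends.quantized.engine)
```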
Test Plan:
python test/test_torch.py
Imported from OSS
Differential Revision: D17533582
fbshipit-source-id: 5103263d0d59ff37d43dec27243cb76ba8ba633f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25680
Add a runtime flag to choose between FBGEMM and QNNPACK when compiled with both.
The flag can be set by using torch.backends.quantized.engine = torch.fbgemm/torch.qnnpack or ctx::setPreferredQuantizedEngine(at::QEngine)
ghstack-source-id: 89935643
Test Plan: Verified torch.backends.quantized.engine works
Differential Revision: D17198233
fbshipit-source-id: e5449d06f4136385e0e6d18bd4237f8654a61672
Summary:
This PR adds the torch.backends.mkldnn.enabled flag described in https://github.com/pytorch/pytorch/issues/25186, which can be used to disable MKL-DNN at runtime, analogous to torch.backends.cudnn.enabled.
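For example, toggling the flag at runtime (a sketch):
```python
import torch

torch.backends.mkldnn.enabled = False   # fall back to the non-MKL-DNN native kernels
y = torch.nn.Conv2d(3, 8, 3)(torch.randn(1, 3, 32, 32))
torch.backends.mkldnn.enabled = True    # re-enable MKL-DNN
```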
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25459
Differential Revision: D17258926
Pulled By: ezyang
fbshipit-source-id: e179ad364cc608fdaa7d0f37e2e762ceb5eda598
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18362
ghimport-source-id: 374b7ab97e2d6a894368007133201f510539296f
Stack from [ghstack](https://github.com/ezyang/ghstack):
* #18242 Test running a CUDA build on CPU machine.
* **#18362 Add ability to query if built with CUDA and MKL-DNN.**
Fixes #18108.
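A sketch of the resulting query surface as it appears in current PyTorch (the exact helpers this PR introduced may differ):
```python
import torch

print(torch.backends.cuda.is_built())        # compiled with CUDA support?
print(torch.backends.mkldnn.is_available())  # compiled with MKL-DNN support?
```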
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Differential Revision: D14584430
fbshipit-source-id: 7605a1ac4e8f2a7c70d52e5a43ad7f03f0457473
Summary:
This is used commonly in `nn` functions. This PR adds it as a weak
module (and also alters the conversion of weak modules to strong modules
to accept ordinary `object`s)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13057
Differential Revision: D10846618
Pulled By: driazati
fbshipit-source-id: 028b9f852d40e2e53ee85b93282c98cef8cd336b
Summary:
The goal of this PR was to add support for dropout descriptors in the C++ API's RNN class.
The end result is a 4x-5x speedup for our RNN integration tests since they can now use cuDNN instead of autograd when dropout is set.
To achieve this, I had to move `_cudnn_init_dropout_state` to the `TensorOptions` API.
I also fixed a bug around `RNN::cuda()` not flattening parameters for cuDNN.
ebetica ezyang
Closes https://github.com/pytorch/pytorch/pull/9012
Reviewed By: pjh5
Differential Revision: D8689786
Pulled By: goldsborough
fbshipit-source-id: 44fb191f5a38e41c4ded5417306b5bbc012cd56c
* cache cufft plans (usage sketch after this list)
* use an LRU cache
* suffix CuFFTParams members with _
* import print_function for py2
* lint
* fix potential race; add dummy impl for CPU only builds
* cpp formatting; remove nccl makefile change
* Use CUDA hooks instead
* comments and doc
* update the error message
* move LRU cache to a separate file and native::detail namespace
* update comment
* specify NOTE location in CuFFTPlanCache.h
* update disabled_features.yaml to make amd ci work
* another fix for AMD CI in disabled_features.yaml
* Wrap cufft_plan_cache_* methods in __HIP_PLATFORM_HCC__
* improve the notes
* lint
* revert onnx change
* put back inlining for CUFFT_CHECK
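A usage sketch of the resulting cache controls as exposed in current PyTorch (the original change predates the torch.fft module, so attribute and function names here are from today's torch.backends.cuda):
```python
import torch

cache = torch.backends.cuda.cufft_plan_cache   # per-device LRU cache of cuFFT plans
cache.max_size = 16                            # LRU capacity for the current device
X = torch.fft.rfft(torch.randn(8, 256, device='cuda'))  # creates/reuses a cached plan
print(cache.size)                              # number of plans currently cached
cache.clear()
```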
* Split libATen.so into libATen_cpu.so and libATen_cuda.so
Previously, ATen could be built with either CPU-only support, or
CPU/CUDA support, but only via a compile-time flag, requiring
two separate builds. This means that if you have a program which
indirectly uses a CPU-only build of ATen, and a CPU/CUDA-build of
ATen, you're gonna have a bad time. And you might want a CPU-only
build of ATen, because it is 15M (versus the 300M of a CUDA build).
This commit splits libATen.so into two libraries, CPU/CUDA, so
that it's not necessary to do a full rebuild to get CPU-only
support; instead, if you link against libATen_cpu.so only, you
are CPU-only; if you additionally link/dlopen libATen_cuda.so,
this enables CUDA support. This brings ATen's dynamic library
structure more similar to Caffe2's. libATen.so is no more
(this is BC BREAKING)
The general principle for how this works is that we introduce
a *hooks* interface, which introduces a dynamic dispatch indirection
between a call site and implementation site of CUDA functionality,
mediated by a static initialization registry. This means that we can continue
to, for example, lazily initialize CUDA from Context (a core, CPU class) without
having a direct dependency on the CUDA bits. Instead, we look up
in the registry if, e.g., CUDA hooks have been loaded (this loading
process happens at static initialization time), and if they
have been we dynamic dispatch to this class. We similarly use
the hooks interface to handle Variable registration.
We introduce a new invariant: if the backend of a type has not
been initialized (e.g., its library has not been dlopened; for
CUDA, this also includes CUDA initialization), then the Type
pointers in the context registry are NULL. If you access the
registry directly you must maintain this invariant.
There are a few potholes along the way. I document them here:
- Previously, PyTorch maintained a separate registry for variable
types, because no provision for them was made in the Context's
type_registry. Now that we have the hooks mechanism, we can easily
have PyTorch register variables in the main registry. The code
has been refactored accordingly.
- There is a subtle ordering issue between Variable and CUDA.
We permit libATen_cuda.so and PyTorch to be loaded in either
order (in practice, CUDA is always loaded "after" PyTorch, because
it is lazily initialized.) This means that, when CUDA types are
loaded, we must subsequently also initialize their Variable equivalents.
Appropriate hooks were added to VariableHooks to make this possible;
similarly, getVariableHooks() is not referentially transparent, and
will change behavior after Variables are loaded. (This is different
to CUDAHooks, which is "burned in" after you try to initialize CUDA.)
- The cmake is adjusted to separate dependencies into either CPU
or CUDA dependencies. The generator scripts are adjusted to either
generate a file as a CUDA (cuda_file_manager) or CPU file (file_manager).
- I changed all native functions which were CUDA-only (the cudnn functions)
to have dispatches for CUDA only (making it permissible to not specify
all dispatch options.) This uncovered a bug in how we were handling
native functions which dispatch on a Type argument; I introduced a new
self_ty keyword to handle this case. I'm not 100% happy about it
but it fixed my problem.
This also exposed the fact that set_history incompletely handles
heterogeneous return tuples combining Tensor and TensorList. I
swapped this codegen to use flatten() (at the possible cost of
a slight perf regression, since we're allocating another vector now
in this code path).
- thc_state is no longer a public member of Context; use getTHCState() instead
- This PR comes with Registry from Caffe2, for handling static initialization.
I needed to make a bunch of fixes to Registry to make it more portable
- No more ##__VA_ARGS__ token pasting; instead, it is mandatory to pass at
least one argument to the var-args. CUDAHooks and VariableHooks pass a nullary
struct CUDAHooksArgs/VariableHooksArgs to solve the problem. We must get rid of
token pasting because it does not work with MSVC.
- It seems MSVC is not willing to generate code for constructors of template
classes at use sites which cross DLL boundaries. So we explicitly instantiate
the class to get around the problem. This involved tweaks to the boilerplate
generating macros, and also required us to shuffle around namespaces a bit,
because you can't specialize a template unless you are in the same namespace as
the template.
- Insertion of AT_API to appropriate places where the registry must be exported
- We have a general problem which is that on recent Ubuntu distributions,
--as-needed is enabled for shared libraries, which is a problem (cc @apaszke, who was
worrying about this in #7160; see also #7160 (comment)). For now, I've hacked
this up in the PR to pass -Wl,--no-as-needed to all of the spots necessary to
make CI work, but a more sustainable solution is to attempt to dlopen
libATen_cuda.so when CUDA functionality is requested.
- The JIT tests somehow manage to try to touch CUDA without loading libATen_cuda.so. So
we pass -Wl,--no-as-needed when linking libATen_cuda.so to _C.so
- There is a very subtle linking issue with LAPACK, which is solved by making sure libATen_cuda.so links against LAPACK. There's a comment in aten/src/ATen/CMakeLists.txt about this, as well as a follow-up bug at #7353
- autogradpp used AT_CUDA_ENABLED directly. We've expunged these uses and added
a few more things to CUDAHooks (getNumGPUs)
- Added manualSeedAll to Generator so that we can invoke it polymorphically (it
only does something different for CUDAGenerator)
- There's a new cuda/CUDAConfig.h header for CUDA-only ifdef macros (AT_CUDNN_ENABLED, most prominently)
- CUDAHooks/VariableHooks structs live in at namespace because Registry's
namespace support is not good enough to handle it otherwise (see Registry
changes above)
- There's some modest moving around of native functions in ReduceOps and
UnaryOps to get the CUDA-only function implementations into separate files, so
they are only compiled into libATen_cuda.so. sspaddmm needed a separate CUDA
function due to object linkage boundaries.
- Some direct uses of native functions in CUDA code has to go away, since these
functions are not exported, so you have to go through the dispatcher
(at::native::empty_like to at::empty_like)
- Code in THC/THCS/THCUNN now properly use THC_API macro instead of TH_API
(which matters now that TH and THC are not in the same library)
- Added code debt in torch/_thnn/utils.py and other THNN parsing code to handle
both TH_API and THC_API
- TensorUtils.h is now properly exported with AT_API
- Dead uses of TH_EXPORTS and co expunged; we now use ATen_cpu_exports and
ATen_cuda_exports (new, in ATenCUDAGeneral.h) consistently
- Fix some incorrect type annotations on _cudnn_rnn_backward, where we didn't
declare a type as possibly undefined when we should have. We didn't catch this
previously because optional annotations are not tested on "pass-through" native
ATen ops (which don't have dispatch). Upstream issue at #7316
- There's a new cmake macro aten_compile_options for applying all of our
per-target compile time options. We use this on the cpu and cuda libraries.
- test/test_cpp_extensions.py can be run directly by invoking it in Python,
assuming you've set up your PYTHONPATH correctly
- type_from_string does some new funny business to only query for all valid CUDA
types (which causes CUDA initialization) when we see "torch.cuda." in the
requested string
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Last mile libtorch fixes
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* pedantic fix
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Codemod to update our codebase to 0.4 standard
* Update some of the test scripts
* remove Variable in test_clip_grad_value
* fix _symbolic_override_wrapper_maker
* Separate cuda-ness from dtype.
There is no longer torch.cuda.int64, etc.; only torch.int64, which corresponds to at::ScalarType.
At the Python arg parser level, the corresponding ATen type is selected from the combination of (ScalarType, Layout, Device); see the sketch at the end of this list.
There is also currently unused code in here for supporting ScalarType in native_functions; this will be used for specifying aggregate types
on reduction functions.
* Fix test_autograd.
* Add defaults to randint_like.
* Track is_cuda in py tensor types.
* Fix test_sparse.
* Fix multiprocessing.
* Fix rnn.
* Fix test_nn.
* Fix flake8.
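A small illustration of the dtype/layout/device separation described in the first item above, using the current API:
```python
import torch

# The ATen type is selected from the (ScalarType, Layout, Device) combination:
x = torch.zeros(2, 3, dtype=torch.float64, layout=torch.strided, device='cuda')
print(x.dtype, x.layout, x.device, x.is_cuda)  # torch.float64 torch.strided cuda:0 True
```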