pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 12:21:27 +01:00

Author	SHA1	Message	Date
Akshit Khurana	1150046d29	NNAPI: Add runtime flexible shapes & return shapes (#70334 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/70334 * Use 0 for load time flexible shapes * -1 for runtime flexible shapes * NNAPI needs return shapes for flexible outputs Test Plan: Tested via upcoming ops Reviewed By: dreiss Differential Revision: D33237922 fbshipit-source-id: 50afdd8e3c6401dfb79b4bc09513c9882a09e5d5	2022-01-04 08:37:09 -08:00
Akshit Khurana	d9106116aa	nnapi: Add int32 type torchscript expressions (#70197 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/70197 Test Plan: * `pytest test/test_nnapi.py` * Testing via ops following this commit Reviewed By: anshuljain1, dreiss Differential Revision: D33237917 fbshipit-source-id: f0493620f28a62ad9fe0b97b67d1e25059d50c24	2022-01-03 19:00:38 -08:00
Xiao Wang	bfe5ad28e6	[Linalg] Add a runtime switch to let pytorch prefer a backend impl in linalg functions on GPU (#67980 ) Summary: Per title. This PR introduces a global flag that lets pytorch prefer one of the many backend implementations while calling linear algebra functions on GPU. Usage: ```python torch.backends.cuda.preferred_linalg_library('cusolver') ``` Available options (str): `'default'`, `'cusolver'`, `'magma'`. Issue https://github.com/pytorch/pytorch/issues/63992 inspired me to write this PR. No heuristic is perfect on all devices, library versions, matrix shapes, workloads, etc. We can obtain better performance if we can conveniently switch linear algebra backends at runtime. Performance of linear algebra operators after this PR should be no worse than before. The flag is set to `'default'` by default, which makes everything the same as before this PR. The implementation of this PR is basically following that of https://github.com/pytorch/pytorch/pull/67790. Pull Request resolved: https://github.com/pytorch/pytorch/pull/67980 Reviewed By: mruberry Differential Revision: D32849457 Pulled By: ngimel fbshipit-source-id: 679fee7744a03af057995aef06316306073010a6	2021-12-03 19:06:30 -08:00
eqy	790763b0fe	Add an option to disable reduced precision reductions for FP16 GEMM (#67946 ) Summary: https://github.com/pytorch/pytorch/issues/67578 disabled reduced precision reductions for FP16 GEMMs. After benchmarking, we've found that this has substantial performance impacts for common GEMM shapes (e.g., those found in popular instantiations of multiheaded-attention) on architectures such as Volta. As these performance regressions may come as a surprise to current users, this PR adds a toggle to disable reduced precision reductions, `torch.backends.cuda.matmul.allow_fp16_reduced_precision_reduction`, rather than making it the default behavior. CC ngimel ptrblck stas00 Note that the behavior after the previous PR can be replicated with `torch.backends.cuda.matmul.allow_fp16_reduced_precision_reduction = False` Pull Request resolved: https://github.com/pytorch/pytorch/pull/67946 Reviewed By: zou3519 Differential Revision: D32289896 Pulled By: ngimel fbshipit-source-id: a1ea2918b77e27a7d9b391e030417802a0174abe	2021-11-09 17:27:20 -08:00
Akshit Khurana	1de8976e85	Add quantized::convtranspose2d (#63914 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63914 Test Plan: Imported from OSS Reviewed By: dreiss Differential Revision: D30531889 fbshipit-source-id: a65e389da2722efbc62e3fe1edf503732326350d	2021-09-24 17:07:29 -07:00
Akshit Khurana	ab5eb56983	add qmul (#63913 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63913 Test Plan: Imported from OSS Reviewed By: dreiss Differential Revision: D30531890 fbshipit-source-id: 29d88cc61bd1e328cc7ae7a91a2f8d4819803c8d	2021-09-24 17:06:17 -07:00
Tao Xu	7dc3858deb	[CoreML][fbcode] Add the `preprocess` python APIs (#64521 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64521 Add the preprocess part for the coreml delegate. Check out the `example.py` for the usage. ghstack-source-id: 138324214 Test Plan: ``` (base) [taox@devvm2780.vll0 ~/fbsource/fbcode/caffe2/fb] buck run coreml:example -- --model="/home/taox/mobilenetv2/mobilenetv2.pt" --out="/home/taox/mobilenetv2/mobilenetv2_coreml.pt" Parsing buck files: finished in 0.5 sec Downloaded 0/1 artifacts, 0.00 bytes, 100.0% cache miss (for updated rules) Building: finished in 10.6 sec (100%) 12611/57623 jobs, 1/57623 updated Total time: 11.1 sec Converting Frontend ==> MIL Ops: 100%\|██████████████████████████████████████████▉\| 382/383 [00:00<00:00, 692.58 ops/s] Running MIL optimization passes: 100%\|███████████████████████████████████████████\| 18/18 [00:00<00:00, 45.55 passes/s] Translating MIL ==> MLModel Ops: 100%\|███████████████████████████████████████████\| 704/704 [00:01<00:00, 468.56 ops/s] input { name: "input_0" type { multiArrayType { shape: 1 shape: 3 shape: 224 shape: 224 dataType: FLOAT32 } } } output { name: "645" type { multiArrayType { dataType: FLOAT32 } } } metadata { userDefined { key: "com.github.apple.coremltools.source" value: "torch==1.10.0a0+fb" } userDefined { key: "com.github.apple.coremltools.version" value: "4.1" } } {'inputs': '[["input_0", "0", "[1, 3, 224, 224]"]]', 'outputs': '[["645", "0", "[1, 1000]"]]', 'config': '{"spec_ver": "4", "backend": "cpu", "allow_low_precision": "True"}', 'metadata': '{"coremltool_ver": "4.1", "torch_ver": "torch==1.10.0a0+fb"}'} WARNING: Logging before InitGoogleLogging() is written to STDERR W0826 13:27:12.690302 2477051 backend_detail.cpp:376] Warning: Backend [coreml] is not available. Execution of this Module is still possible by saving and loading on a device where the backend is available. 
(function codegen_backend_module) graph(%self.1 : torch.jit.LoweredModule.coreml.__torch__.torchvision.models.mobilenetv2.MobileNetV2, %x.1 : Tensor): %51 : str = prim::Constant[value="Exception: Backend is not available."]() %50 : str = prim::Constant[value="AssertionError: "]() %14 : str = prim::Constant[value="forward"]() # <string>:5:62 %48 : Tensor = prim::Uninitialized() %44 : Tensor = prim::Uninitialized() %typed_inputs.1 : Any[] = prim::ListConstruct(%x.1) %__backend.3 : __torch__.torch.classes.__backends__.coreml = prim::GetAttr[name="__backend"](%self.1) %8 : bool = prim::CallMethod[name="is_available"](%__backend.3) # <string>:4:19 %49 : Tensor = prim::If(%8) # <string>:4:16 block0(): %__backend : __torch__.torch.classes.__backends__.coreml = prim::GetAttr[name="__backend"](%self.1) %__handles : Dict(str, Any) = prim::GetAttr[name="__handles"](%self.1) %15 : Any = aten::__getitem__(%__handles, %14) # <string>:5:47 %17 : Any[] = prim::CallMethod[name="execute"](%__backend, %15, %typed_inputs.1) # <string>:5:24 %18 : Any = prim::ListUnpack(%17) %20 : bool = prim::isinstance[types=[Tensor]](%18) %39 : Tensor = prim::If(%20) # <string>:6:18 block0(): %22 : Tensor = prim::unchecked_cast(%18) -> (%22) block1(): = prim::RaiseException(%50) # <string>:6:18 -> (%44) -> (%39) block1(): = prim::RaiseException(%51) # <string>:9:18 -> (%48) return (%49) ``` Reviewed By: raziel Differential Revision: D30585154 fbshipit-source-id: 66c7d2e931be6eaa3c43a0ee131ea8046452449d	2021-09-17 00:25:14 -07:00
Akshit Khurana	2d58f3f56d	NNAPI: Support const values in binary ops Summary: NNAPI converter failed with 1 const value and one tensor earlier Code suggestions from dreiss Test Plan: pytest test/test_nnapi.py::TestNNAPI::test_pointwise_binary Imported from OSS Reviewed By: anshuljain1 Differential Revision: D28893881 fbshipit-source-id: 59240373fb03c6fdafa4cb2fa4d8408dd20092f6	2021-08-20 21:10:26 -07:00
Amy He	73f1e2d1dc	[8/N] Nnapi backend delegation preprocess: New refactored design (#62225 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62225 Rewrote the preprocess function for Android NNAPI delegate. Previously, `preprocess()` called `convert_model_to_nnapi()` using Pybind and returned a NnapiModule that is serialized for mobile. Now, `preprocess()` calls a sub-function of `convert_model_to_nnapi()` and returns several preprocessed items (that were previously components of NnapiModule). Dictionary returned contains: "shape_compute_module": torch::jit::Module, "ser_model": torch::Tensor, "weights": List[torch.Tensor], "inp_mem_fmts": List[int], "out_mem_fmts": List[int] Purpose and Future: The purpose of these changes is to move more implementation from bytecode and Torchscript to the delegate API, since bytecode is less efficient. Now, only the shape computation uses bytecode. In the future, shape computation will be moved out of Torchscript as well. nnapi_backend_preprocess.cpp: preprocess implementation prepare.py: refactored a portion of `convert_model_to_nnapi()` to `process_for_nnapi()`, so preprocess can get components of NnapiModule Test: Ran `python test/test_jit.py TestNnapiBackend` and `python test/test_nnapi.py` on OSS successfully ghstack-source-id: 134444190 Test Plan: Ran `python test/test_jit.py TestNnapiBackend` and `python test/test_nnapi.py` on OSS successfully Reviewed By: raziel Differential Revision: D29922279 fbshipit-source-id: cadcf8908d8a745dc7abbe286e97d6ead937d4ab	2021-07-27 18:52:48 -07:00
Akshit Khurana	8e71f48f0a	Handle simple NNAPI flatten NHWC case (#61796 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61796 We can easily handle NNAPI conversion for NHWC inputs that have 1 channel or where H and W are both 1 Test Plan: pytest test/test_nnapi.py::TestNNAPI::test_flatten Imported from OSS Reviewed By: saketh-are Differential Revision: D29827735 fbshipit-source-id: 65dee4b42fceef1b032bf5dd1c4cc6e020d01e14	2021-07-26 10:59:04 -07:00
Akshit Khurana	a3670ba377	Add option to specify custom NNAPI serializer (#61025 ) Summary: To add serializer for custom ops we can subclass default serializer and update ADDER_MAP Pull Request resolved: https://github.com/pytorch/pytorch/pull/61025 Test Plan: * pytest test/test_nnapi.py::TestNNAPI for current serializer * Custom serializers to be tested with custom ops Imported from OSS Reviewed By: anshuljain1 Differential Revision: D29480745 fbshipit-source-id: 37e3f8de3c97f6c8a486f9879ce11430ea89af34	2021-07-09 15:27:10 -07:00
Akshit Khurana	ae65f63971	Make nnapi flatten converter accept flex inputs (#61024 ) Summary: As title Pull Request resolved: https://github.com/pytorch/pytorch/pull/61024 Test Plan: pytest test/test_nnapi.py::TestNNAPI::test_flatten Reviewed By: anshuljain1 Differential Revision: D29480748 fbshipit-source-id: c334b09600a64d3e552cec843d6da3de28e7d27c	2021-07-09 15:27:02 -07:00
Akshit Khurana	76c0f223d3	Make nnapi cat converter accept flex inputs Summary: As title Test Plan: pytest test/test_nnapi.py::TestNNAPI::test_cat Reviewed By: anshuljain1 Differential Revision: D29480747 fbshipit-source-id: 161803054ff1a4c2c750fc30a5f0fc6d8a24b2c9	2021-07-09 14:27:53 -07:00
Akshit Khurana	9e81d3d869	Make NNAPI linear converter accept flex inputs (#61022 ) Summary: As title Pull Request resolved: https://github.com/pytorch/pytorch/pull/61022 Test Plan: pytest test/test_nnapi.py::TestNNAPI::test_linear Reviewed By: anshuljain1 Differential Revision: D29480749 fbshipit-source-id: 35975861740298c9e16f866c939e7ee3c2151710	2021-07-09 14:27:51 -07:00
Akshit Khurana	9e533a62f6	Make conv2d nnapi converter accept flexible batch (#61021 ) Summary: Same as title Pull Request resolved: https://github.com/pytorch/pytorch/pull/61021 Test Plan: pytest test/test_nnapi.py::TestNNAPI Reviewed By: anshuljain1 Differential Revision: D29480746 fbshipit-source-id: 7217c8f3a811db8c3c373f3e7ca31caf9502ef22	2021-07-09 10:28:10 -07:00
Akshit Khurana	8bd3e52e00	Add conv2d transpose NNAPI converter (#59529 ) Summary: * Conv2d transpose support * Quantize WIP Pull Request resolved: https://github.com/pytorch/pytorch/pull/59529 Test Plan: pytest test/test_nnapi.py::TestNNAPI::test_conv2d_transpose Reviewed By: anshuljain1 Differential Revision: D28926335 fbshipit-source-id: 8f90182f96cee0a13c4f38331d421e1e8ac618de	2021-07-09 09:29:20 -07:00
Ivan Kobzarev	7b6ddb6793	[nnapi] add log_softmax (#61378 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61378 Test Plan: Imported from OSS Reviewed By: axitkhurana Differential Revision: D29597355 Pulled By: IvanKobzarev fbshipit-source-id: 55124749f8eeffa2b2713f7cffd5ccf965561de1	2021-07-07 18:28:39 -07:00
Akshit Khurana	baa518e2f6	Add Int32 support for NNAPI (#59365 ) Summary: Support Int32 tensors in NNAPI converter Pull Request resolved: https://github.com/pytorch/pytorch/pull/59365 Test Plan: Local testing with FB prod models Reviewed By: anshuljain1 Differential Revision: D28881040 fbshipit-source-id: 2dacceffd322a21d91bfefcf2fb2ea400d952d0d	2021-07-07 12:40:49 -07:00
Akshit Khurana	cf285d8eea	Add aten::slice NNAPI converter (#59364 ) Summary: Add support for aten::slice op in the NNAPI model converter * If start = 0; end = max -> identity * Flexible shapes can be passed through * Flexible shapes can't be sliced over Pull Request resolved: https://github.com/pytorch/pytorch/pull/59364 Test Plan: pytest test/test_nnapi.py::TestNNAPI::test_slice Reviewed By: anshuljain1 Differential Revision: D28881039 fbshipit-source-id: 3c1c630ff27b5bba6eda403d87570c61d43ae90e	2021-07-07 12:40:47 -07:00
Akshit Khurana	d26372794a	Add aten::detach NNAPI converter (#58543 ) Summary: * Add support for aten::detach op in the NNAPI model converter as a no-op * Also add flexible op support for add_pointwise_simple_unary_op Pull Request resolved: https://github.com/pytorch/pytorch/pull/58543 Test Plan: pytest test/test_nnapi.py::TestNNAPI::test_detatch Reviewed By: anshuljain1 Differential Revision: D28531942 fbshipit-source-id: 4387dbbbadd8ce6b690841f3a903e68a380b849d	2021-07-07 12:40:46 -07:00
Akshit Khurana	0be228dd5f	Add aten::flatten NNAPI converter (#60885 ) Summary: Add support for aten::flatten op in the NNAPI model converter. Startup-time variable size isn't supported, as shapes go as inputs to the NNAPI op. Runtime variable size support is to be added soon Pull Request resolved: https://github.com/pytorch/pytorch/pull/60885 Test Plan: pytest test/test_nnapi.py::TestNNAPI::test_flatten Reviewed By: anshuljain1 Differential Revision: D29451725 fbshipit-source-id: 8902745f7758c8cc88ad4b4ce02b8301ff894bd4	2021-07-07 12:40:44 -07:00
Akshit Khurana	b297f65b66	Add aten::div NNAPI converter (#58541 ) Summary: Add support for aten::div op in the NNAPI model converter. Add variable size input test as well. Pull Request resolved: https://github.com/pytorch/pytorch/pull/58541 Test Plan: pytest test/test_nnapi.py::TestNNAPI::test_div Reviewed By: anshuljain1 Differential Revision: D28531943 fbshipit-source-id: e96342146f6de216f7b88443618edfc54963747c	2021-07-07 12:40:42 -07:00
Akshit Khurana	eab18a9a40	Add aten::to NNAPI converter (#58540 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/58540 Add support for aten::to op in the NNAPI model converter for simple cases like to("cpu"), to("gpu") Test Plan: pytest test/test_nnapi.py::TestNNAPI::test_to Reviewed By: anshuljain1 Differential Revision: D28531941 fbshipit-source-id: 0c934f7aceaff2669307c3426efe32046d8c44f3	2021-07-07 12:40:41 -07:00
Akshit Khurana	14d604a13e	Add aten::softmax NNAPI converter (#58539 ) Summary: Add support for aten::softmax op in the NNAPI model converter with flexible size Pull Request resolved: https://github.com/pytorch/pytorch/pull/58539 Test Plan: pytest test/test_nnapi.py::TestNNAPI::test_softmax Reviewed By: anshuljain1 Differential Revision: D28531946 fbshipit-source-id: 8633f3e3f7f52795f9866ff16ad0867ea36a19e8	2021-07-07 12:39:31 -07:00
Akshit Khurana	369802a504	Add aten::avgpool2d NNAPI converter (#58538 ) Summary: Add support for aten::avgpool2d op in the NNAPI model converter with var size support Pull Request resolved: https://github.com/pytorch/pytorch/pull/58538 Test Plan: pytest test/test_nnapi.py::TestNNAPI::test_avgpool2d Reviewed By: anshuljain1 Differential Revision: D28531944 fbshipit-source-id: 43ff8c9389365698c282f204042b49c7ec84d824	2021-07-01 14:07:14 -07:00
Akshit Khurana	c4bb6a5781	NNAPI: flex size support for upsample_nearest2d op (#57563 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/57563 Add flexible size support for upsample_nearest2d op in nnapi model conversion Test Plan: pytest test/test_nnapi.py Imported from OSS Reviewed By: dreiss Differential Revision: D28200847 fbshipit-source-id: 901fe3f6e68e4c16ece730f3ffa68dc88c6ed6c3	2021-05-05 13:54:43 -07:00
Akshit Khurana	4c609a9782	NNAPI: Add qadd flexible size support (#57562 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/57562 Add flexible size support for qadd op in nnapi model conversion Test Plan: pytest test/test_nnapi.py Imported from OSS Reviewed By: dreiss Differential Revision: D28200849 fbshipit-source-id: d5b2ea8e9eb8ae405ff2c960f7549cef60bc0991	2021-05-05 13:54:41 -07:00
Akshit Khurana	28cd04ea64	NNAPI: add flexible size support for conv2d (#57561 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/57561 Add flexible size support for conv2d op in nnapi model conversion Test Plan: pytest test/test_nnapi.py Imported from OSS Reviewed By: dreiss Differential Revision: D28200848 fbshipit-source-id: d94ccf48a3d8453aa8e96c7cac02948c4cd870cc	2021-05-05 13:53:33 -07:00
Guilherme Leobas	e7c79cb158	Add type annotations to nnapi (#48142 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/48141 ~Mypy is complaining about a missing arg in a function call.~ ```bash torch/backends/_nnapi/serializer.py:806: error: Too few arguments for "_do_add_binary" [call-arg] Found 1 error in 1 file (checked 1140 source files) ``` `9392137dbe/torch/backends/_nnapi/serializer.py (L804-L806)` ~dreiss, would you mind take a look when you have some cycles to spare and see what would be the appropriated value for `fuse_code` here? Thanks :)~ Edit: https://github.com/pytorch/pytorch/issues/48925 got merged a couple of days ago. The blocking part is now unblocked, and I just pushed the changes to make mypy happy again. This PR is ready for review. Pull Request resolved: https://github.com/pytorch/pytorch/pull/48142 Reviewed By: ezyang Differential Revision: D28006249 Pulled By: walterddr fbshipit-source-id: 5e43eeba7143512a549efaad31541f86718add7c	2021-04-26 19:08:07 -07:00
Sam Estep	75024e228c	Add lint for unqualified `type: ignore` (#56290 ) Summary: The other half of https://github.com/pytorch/pytorch/issues/56272. Pull Request resolved: https://github.com/pytorch/pytorch/pull/56290 Test Plan: CI should pass on the tip of this PR, and we know that the lint works because the following CI runs (before this PR was finished) failed: - https://github.com/pytorch/pytorch/runs/2384511062 - https://github.com/pytorch/pytorch/actions/runs/765036024 Reviewed By: seemethere Differential Revision: D27867219 Pulled By: samestep fbshipit-source-id: e648f07b6822867e70833e23ddafe7fb7eaca235	2021-04-21 08:07:23 -07:00
David Reiss	da7a27b847	[NNAPI] Initial flexible size support (#54701 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/54701 We need NNAPI models to support inputs (and, by extension, intermediate values and outputs) whose shape is only determined at load time. For example, a vision model's input shape might be dependent on the aspect ratio of the device camera. While NNAPI has full support for variable shapes (by setting components of the operand shape to 0), the guidance we have received is that vendor-provided drivers for real hardware are not able to support this efficiently. Therefore, we take a hybrid approach where shapes are calculated at model load time to semi-dynamically construct our NNAPI model. While this doesn't let us have truly dynamic input shapes, it does allow us to ensure that the vendor driver only sees fixed shapes, so we get maximum performance. In this initial commit, only PReLU supports dynamic shapes. Additional operators will be converted in separate diffs. - In order to convert a flexible-shape model, the user supplies inputs with shapes containing dimensions of size 0 for the flexible dimensions. - During conversion, we generate code to compute the shapes of all intermediates and outputs as a function of the input shapes. - We no longer run the input model to produce the output templates. Instead, we generate code to return properly-sized templates, given the input shapes. - All of this generated code goes into a "ShapeComputeModule" that is used by the NnapiModule during initialization. - The ShapeComputeModule mutates the serialized model to fill in the computed sizes for each operand. This requires us to change the dtype for the serialized model to int32, but this should be fine because everything in it is already 4-byte aligned. - NnapiInitWrapper no longer exists. Instead, initialization is performed on the first run, based on the real arguments. We plan to provide an API for doing eager initialization. 
- Unit test updated to allow separate arguments to be given for trace, conversion, and inference. A flexible-shape test case was added for PReLU. Test Plan: Unit test Reviewed By: axitkhurana Differential Revision: D27536796 Pulled By: dreiss fbshipit-source-id: 105585f247987b1e6ec6946a6fe44401237cb0a0	2021-04-06 13:49:43 -07:00
David Reiss	1e3b3a4714	[NNAPI] Create get_next_operand_id (#54700 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/54700 This is an internal method just to make it more clear what len(self.operands) is doing. Test Plan: Unit test Reviewed By: axitkhurana Differential Revision: D27536794 Pulled By: dreiss fbshipit-source-id: 678cee8a47df6757dd2e6feabf2560fd82d32e26	2021-04-06 13:49:41 -07:00
David Reiss	ca67c17e46	[NNAPI] Add fixed-size assertions (#54699 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/54699 We'll soon be adding support for flexible-size tensors to the NNAPI converter, but it won't be added to all ops at once. Create get_tensor_operand_by_jitval_fixed_size as a wrapper for get_tensor_operand_by_jitval that verifies that the argument has a fixed shape. Update all call sites. As flexible size support is added to each op, the call sites can be converted back and proper size checks added. Test Plan: Unit test Reviewed By: axitkhurana Differential Revision: D27536791 Pulled By: dreiss fbshipit-source-id: 6fb1fea814d767b6ff263fd8b88240a51be74777	2021-04-06 13:49:38 -07:00
David Reiss	5936faee7e	[NNAPI] Rename local variable (#54698 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/54698 "mf" was short for memory format, but the concept that this variable represents was renamed to "dim_order", so rename the variable. Test Plan: Unit test Reviewed By: axitkhurana Differential Revision: D27536793 Pulled By: dreiss fbshipit-source-id: 2b31c70da1ff221a7833e67486690fa606f01dea	2021-04-06 13:49:35 -07:00
David Reiss	1f1d26137b	[NNAPI] Use code generation to better support list input/output (#54697 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/54697 Previously, models being converted to NNAPI were expected to take inputs as separate arguments, but the generated NNAPI model could only take multiple inputs as a list. Now the generated model always takes inputs (single or multiple) as separate tensor arguments. Previously, models being converted to NNAPI were expected to return outputs as a single tensor or tuple of tensors, but the generated NNAPI model would return multiple outputs as a list. Now the generated model returns a tuple as well (or single tensor). Internally, we decide what output format to use (single tensor or tuple) based on the conversion process, rather than by running the model. Test Plan: Unit test Reviewed By: axitkhurana Differential Revision: D27536790 Pulled By: dreiss fbshipit-source-id: c0f93c85d450757e568985947cc2f32043795859	2021-04-06 13:49:33 -07:00
David Reiss	d34d6244e7	[NNAPI] Use array instead of struct for serializing ints (#54696 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/54696 This was originally developed for a Python version where array was not available. Test Plan: Unit test Reviewed By: axitkhurana Differential Revision: D27536792 Pulled By: dreiss fbshipit-source-id: 39e5507e37d4f91871113439fe752a4d5373eaba	2021-04-06 13:49:30 -07:00
David Reiss	476c597ae6	[NNAPI] Handle binary ops combining NHWC+NCHW in some cases (#48812 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/48812 This came up in a squeeze-and-excitation model. Starting with an NHWC tensor T, we perform a mean operation across H and W, giving an NxC tensor, which (after some fully connected layers) is reshaped to NxCx1x1, then multiplied with T. To handle this, we detect the specific case of a binary op with one NHWC input and one contiguous input with H,W == 1,1 and allow the op to be applied (after transposing the contiguous input). Test Plan: Unit test. Reviewed By: axitkhurana Differential Revision: D25317939 Pulled By: dreiss fbshipit-source-id: b4c17ab3b874d1a7defa04664010ba82115f1c20	2021-04-06 13:49:25 -07:00
David Reiss	b057d27b0b	[NNAPI] Add support for unsqueeze, cat, and mean (#48811 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/48811 Test Plan: Unit tests. Reviewed By: axitkhurana Differential Revision: D25317936 Pulled By: dreiss fbshipit-source-id: 9b3a0a75b8157ae35ac13d52293a67800bad0ded	2021-04-06 13:49:22 -07:00
David Reiss	8fcf9ca341	[NNAPI] Update support for Linear (#54695 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/54695 Previously, torch.nn.Linear was calling aten::addmm internally. Now it's calling aten::linear, so add support for that. Test Plan: Unit test Reviewed By: axitkhurana Differential Revision: D27536795 Pulled By: dreiss fbshipit-source-id: 42c8d2a80b20ac12ed9bba599c5e0e874256bb13	2021-04-06 13:49:17 -07:00
David Reiss	8d960f7043	[NNAPI] Fix hardtanh (#47520 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/47520 NNAPI defines "RELU1" as clamping to [-1, 1], not [0, 1] as I previously assumed. Fix our implementation to match. Test Plan: Upcoming unit test. Reviewed By: axitkhurana Differential Revision: D25317934 Pulled By: dreiss fbshipit-source-id: 70efd5bb6092b0628ff6b765ce6f6274ef28d741	2021-04-06 13:49:14 -07:00
David Reiss	beca1fdbec	[NNAPI] Fix MUL op (#47519 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/47519 This wasn't updated when _do_add_binary was refactored. Test Plan: Upcoming unit test. Reviewed By: axitkhurana Differential Revision: D25317938 Pulled By: dreiss fbshipit-source-id: 99212404c189481cfa692dd77d8f7c7865b6872b	2021-04-06 13:49:12 -07:00
David Reiss	38a3c28f17	[NNAPI] Remove solid weights support (#47518 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/47518 This was left over from an old version of the code. The idea was that instead of indexing into separate tensors for each weight, you could bundle them all into a single file and use different offsets into that file. With the current design, this is nontrivial to support, so drop the code for now. Test Plan: CI Reviewed By: axitkhurana Differential Revision: D25317935 Pulled By: dreiss fbshipit-source-id: e26ab3a8d437cb1bbb50319209fa56d9c571ce61	2021-04-06 13:49:09 -07:00
David Reiss	1be909f074	[NNAPI] Fix models with no weights (#47517 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/47517 While we're unlikely to see this in practice, it comes up in unit tests. This type annotation is necessary for `torch.jit.script` to figure out the type of the list if it is empty. Test Plan: Unit tests in a later diff. Reviewed By: axitkhurana Differential Revision: D25317937 Pulled By: dreiss fbshipit-source-id: de8b6665c6fcd3cd2b39e3c696a39336c064e4c1	2021-04-06 13:49:06 -07:00
Akshit Khurana	d0fd41dcfe	Add size op in nnapi serializer (#52026 ) Summary: serializer didn't support aten::size Pull Request resolved: https://github.com/pytorch/pytorch/pull/52026 Test Plan: Torchvision Mobilenetv2 [script](https://pytorch.org/tutorials/prototype/nnapi_mobilenetv2.html) works. [Test](`ecfed07cc5`) to be merged after [this PR](https://github.com/pytorch/pytorch/pull/47521/files) is merged Reviewed By: dreiss Differential Revision: D26363133 Pulled By: axitkhurana fbshipit-source-id: 772a6bea62bca69f8bba19c25c582a1734a70eb1	2021-02-10 15:57:01 -08:00
David Reiss	9a9383ef2e	PyTorch NNAPI integration prototype (#46780 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/46780 This is in prototype status, but pretty functional. There are two major parts. - Model converter. This is a pure Python component that consumes a model in TorchScript format, converts the operations into NNAPI semantics, and serializes the model in a custom format. It then wraps the result in a new TorchScript model that can invoke NNAPI under the hood. - Runtime. This is a TorchBind object that deserializes the model and sends the result to NNAPI. This is fairly simple since the serialized format is basically just a list of NNAPI calls to make, so most of the code is spent on bounds checking. A few notes on the design. - Currently, all tensor sizes need to be fixed, and those fixed sizes are burned directly into the serialized model. This will probably need to change. NNAPI supports variable-sized tensors, but the important hardware backends do not. However, we're seeing use cases crop up where the input size is not known until around the time that the model is loaded (for example, it might depend on the camera aspect ratio). I think the proper fix here is to remove the code in the converter that eagerly calculates the sizes of the intermediate tensors and replace it with a code generator that will generate some TorchScript code that will perform those calculations at model load time. This way, we will be able to support models that have variable-sized inputs while still only showing fixed-sized operands to NNAPI. - The important hardware backends want operands to be in NHWC order, but PyTorch natively represents all tensors as NCHW. The strategy for this is to keep NCHW during most of the conversion process, but track an additional value per operand representing the "dimension order". The dimension order gets propagated through convolutions and pointwise ops. 
When we're ready to serialize the model, we reorder the dimensions for "channels last" operands to NHWC. Test Plan: Some local testing with FB prod models. I'll need to add some examples and automated tests. Reviewed By: iseeyuan Differential Revision: D24574040 Pulled By: dreiss fbshipit-source-id: 6adc8571b234877ee3666ec0c0de24da35c38a1f	2020-11-05 21:31:01 -08:00
Jane (Yuan) Xu	1c996b7170	Enable typechecking for torch.testing._internal.common_quantized.* (#44805 ) Summary: Addresses a subproblem of [Issue 42969](https://github.com/pytorch/pytorch/issues/42969) Pull Request resolved: https://github.com/pytorch/pytorch/pull/44805 Reviewed By: malfet Differential Revision: D23742754 Pulled By: janeyx99 fbshipit-source-id: e916a6a0c049cac318549a485d47f19363087d15	2020-09-17 14:24:32 -07:00
Xiang Gao	e48201c5cf	Mention TF32 on related docs (#44690 ) Summary: cc: ptrblck ![image](https://user-images.githubusercontent.com/1032377/93168022-cbbfcb80-f6d6-11ea-8f6e-f2c8a15c5bea.png) Pull Request resolved: https://github.com/pytorch/pytorch/pull/44690 Reviewed By: ngimel Differential Revision: D23727921 Pulled By: mruberry fbshipit-source-id: db7cc8e74cde09c13d6a57683129fd839863b914	2020-09-16 19:18:30 -07:00
Xiang Gao	20ac736200	Remove py2 compatible future imports (#44735 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44735 Reviewed By: mruberry Differential Revision: D23731306 Pulled By: ezyang fbshipit-source-id: 0ba009a99e475ddbe22981be8ac636f8a1c8b02f	2020-09-16 12:55:57 -07:00
Nikita Shulga	c44e4878ae	Enable torch.backends.quantized typechecks (#44794 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/44793 Pull Request resolved: https://github.com/pytorch/pytorch/pull/44794 Reviewed By: walterddr Differential Revision: D23734353 Pulled By: malfet fbshipit-source-id: 491bd7c8f147759715eb296d7537a172685aa066	2020-09-16 12:21:20 -07:00
Gao, Xiang	5e97f251a8	Enable TF32 support for cuDNN (#40737 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/40737 Reviewed By: mruberry Differential Revision: D22801525 Pulled By: ngimel fbshipit-source-id: ac7f7e728b4b3e01925337e8c9996f26a6433fd2	2020-09-01 15:34:24 -07:00
Xiang Gao	23174ca71b	[reland] Enable TF32 support for cuBLAS (#41498 ) Summary: fix rocm Pull Request resolved: https://github.com/pytorch/pytorch/pull/41498 Reviewed By: mruberry Differential Revision: D22560572 Pulled By: ngimel fbshipit-source-id: 5ee79e96cb29e70d9180830d058efb53d1c6c041	2020-07-15 21:00:55 -07:00
Shen Li	3a63a939d4	Revert D22517785: [pytorch][PR] Enable TF32 support for cuBLAS Test Plan: revert-hammer Differential Revision: D22517785 (`288ece89e1`) Original commit changeset: 87334c893561 fbshipit-source-id: 0a0674f49c1bcfc98f7f88af5a8c7de93b76e458	2020-07-15 08:15:48 -07:00
Xiang Gao	288ece89e1	Enable TF32 support for cuBLAS (#40800 ) Summary: Benchmark on a fully connected network and torchvision models (time in seconds) on GA100:

| model | batch size | forward(TF32) | forward(FP32) | backward(TF32) | backward(FP32) |
|--------------------|------------|---------------|---------------|----------------|----------------|
| FC 512-128-32-8 | 512 | 0.000211 | 0.000321 | 0.000499 | 0.000532 |
| alexnet | 512 | 0.0184 | 0.0255 | 0.0486 | 0.0709 |
| densenet161 | 128 | 0.0665 | 0.204 | 0.108 | 0.437 |
| googlenet | 256 | 0.0925 | 0.110 | 0.269 | 0.326 |
| inception_v3 | 256 | 0.155 | 0.214 | 0.391 | 0.510 |
| mnasnet1_0 | 512 | 0.108 | 0.137 | 0.298 | 0.312 |
| mobilenet_v2 | 512 | 0.114 | 0.294 | 0.133 | 0.303 |
| resnet18 | 512 | 0.0722 | 0.100 | 0.182 | 0.228 |
| resnext50_32x4d | 256 | 0.170 | 0.237 | 0.373 | 0.479 |
| shufflenet_v2_x1_0 | 512 | 0.0463 | 0.0473 | 0.125 | 0.123 |
| squeezenet1_0 | 512 | 0.0870 | 0.0948 | 0.205 | 0.214 |
| vgg16 | 256 | 0.167 | 0.234 | 0.401 | 0.502 |
| wide_resnet50_2 | 512 | 0.186 | 0.310 | 0.415 | 0.638 |

Pull Request resolved: https://github.com/pytorch/pytorch/pull/40800 Reviewed By: mruberry Differential Revision: D22517785 Pulled By: ngimel fbshipit-source-id: 87334c8935616f72a6af5abbd3ae69f76923dc3e	2020-07-14 13:21:10 -07:00
Shawn Zhong	21ba3b4f40	Fix `torch.backends.cudnn` mypy error (#38947 ) Summary: Fix https://github.com/pytorch/pytorch/issues/38410 ![image](https://user-images.githubusercontent.com/6421097/82724121-74b26880-9c99-11ea-9b63-e92de2dccdf2.png) Pull Request resolved: https://github.com/pytorch/pytorch/pull/38947 Differential Revision: D21765290 Pulled By: ezyang fbshipit-source-id: 5d2b25f039a653c609d60cdaac4a7ac5812ae291	2020-06-03 10:55:43 -07:00
guol-fnst	42b2dee6c2	`verbose` unused in `torch.backends.cudnn` (#39228 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/39228 Differential Revision: D21818455 Pulled By: ezyang fbshipit-source-id: abf158f2d745fd135cd0966ee30d559cefa456c0	2020-06-01 09:08:03 -07:00
Ailing Zhang	7c13a07286	[Reland] Remove uses of type() part 2 (#38288 ) Summary: Reland of https://github.com/pytorch/pytorch/issues/38140. It got reverted since it broke slow tests which were only run on the master branch (thanks mruberry!). Enabling all CI tests in this PR to make sure they pass. Pull Request resolved: https://github.com/pytorch/pytorch/pull/38288 Reviewed By: mruberry Differential Revision: D21524923 Pulled By: ailzhang fbshipit-source-id: 3a9ecc7461781066499c677249112434b08d2783	2020-05-12 13:37:14 -07:00
Mike Ruberry	f6b1c046b6	Revert D21483808: [pytorch][PR] Remove uses of type() part 2 Test Plan: revert-hammer Differential Revision: D21483808 Original commit changeset: 12f5de6151ba fbshipit-source-id: 2755fa97ae3f342ae88b1531acfa790772a27c17	2020-05-09 00:42:39 -07:00
Ailing Zhang	86d28706e0	Remove uses of type() part 2 (#38140 ) Summary: I'm mostly done with cleaning up test/ folder. There're a bunch of remaining callsites but they're "valid" in testing `type()` functionalities. We cannot remove them until it's fully deprecated. Next PR would mainly focus on move some callsites to an internal API. Pull Request resolved: https://github.com/pytorch/pytorch/pull/38140 Differential Revision: D21483808 Pulled By: ailzhang fbshipit-source-id: 12f5de6151bae59374cfa0372e827651de7e1c0f	2020-05-08 19:30:46 -07:00
Kimish Patel	4c30fc7238	Integrate XNNPACK with custom class for packing weights. (#34047 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/34047 This PR integrates the added xnnpack conv2d and linear op via custom class registration for packed weights. The packed struct is serializable. Test Plan: python test test/test_xnnpack_integration.py Imported from OSS Differential Revision: D20185657 fbshipit-source-id: fc7e692d8f913e493b293b02d92f4e78536d7698	2020-03-14 12:51:56 -07:00
Peter Bell	5fc5cf6571	Stop using ctypes to interface with CUDA libraries. (#33678 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/33016, Continuation of https://github.com/pytorch/pytorch/issues/31160 Pull Request resolved: https://github.com/pytorch/pytorch/pull/33678 Differential Revision: D20249187 Pulled By: ezyang fbshipit-source-id: 172ce4a0fee7fbe01436a421d1af22ef6173b6ed	2020-03-11 07:22:46 -07:00
Jithun Nair	718c538ff9	Add ability to enable/disable MIOpen at runtime (#33118 ) Summary: 1. Set `torch._C.has_cudnn` to `True` for ROCm 2. Make MIOpen invocations respect value of `cudnn_enabled` or `at::globalContext().userEnabledCuDNN()` 3. `torch/backends/cudnn/__init__.py`: Add hip-specific changes (use "hide whitespace changes" option to view simpler diff) Pull Request resolved: https://github.com/pytorch/pytorch/pull/33118 Differential Revision: D19977719 Pulled By: bddppq fbshipit-source-id: 64d4dd1d78afcf96201360d85b8be5950f96dfad	2020-02-20 10:47:57 -08:00
peter	b77c25dec0	Fix dll load logic for Python 3.8 on Windows (#32215 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/31181 and https://github.com/pytorch/pytorch/pull/31162#discussion_r362495611. Pull Request resolved: https://github.com/pytorch/pytorch/pull/32215 Differential Revision: D19501869 Pulled By: ezyang fbshipit-source-id: 363824e52d2592ad968ecf1df345aa4c0daff915	2020-01-22 08:33:34 -08:00
Brian Wignall	e7fe64f6a6	Fix typos (#30606 ) Summary: Should be non-semantic. Uses https://en.wikipedia.org/wiki/Wikipedia:Lists_of_common_misspellings/For_machines to find likely typos. Pull Request resolved: https://github.com/pytorch/pytorch/pull/30606 Differential Revision: D18763028 Pulled By: mrshenli fbshipit-source-id: 896515a2156d062653408852e6c04b429fc5955c	2019-12-02 20:17:42 -08:00
Dmytro Dzhulgakov	764bf826e3	Remove fbgemm_is_cpu_supported in favor of torch.backends.quantized.supported_qengines (#26840 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26840 Cleaning up top-level namespace. Also cosmetic changes to torch.backends.quantized Test Plan: Imported from OSS Differential Revision: D17604403 Pulled By: dzhulgakov fbshipit-source-id: c55af277ea7319d962a82a6120f65ccd47a60abc	2019-09-27 13:45:15 -07:00
Supriya Rao	45391ccecb	Update qengine flag in python to string (#26620 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26620 This change updates torch.backends.quantized.engine to accept a string ("fbgemm"/"qnnpack"/"none" for now). set_qengine and get_qengine return an int which represents the at::QEngine enum Test Plan: python test/test_torch.py Imported from OSS Differential Revision: D17533582 fbshipit-source-id: 5103263d0d59ff37d43dec27243cb76ba8ba633f	2019-09-23 17:56:50 -07:00
Jerry Zhang	8f50ea0f5c	Add NoQEngine to QEngine and refactor the name of set/get qengine (#26471 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26471 att Test Plan: . Imported from OSS Differential Revision: D17491215 fbshipit-source-id: 5790aa0113bfdbeeb838f3d1530397606ccaa1e9	2019-09-19 17:42:09 -07:00
Ailing Zhang	b1ecf4bc82	Revert D17464904: Add NoQEngine to QEngine and refactor the name of set/get qengine Test Plan: revert-hammer Differential Revision: D17464904 Original commit changeset: d8f2cebb978f fbshipit-source-id: 8feb86f7347f455eb51538ce7893d4a096ba0ba4	2019-09-18 20:04:58 -07:00
Jerry Zhang	4f7292f7ee	Add NoQEngine to QEngine and refactor the name of set/get qengine (#26330 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26330 att Test Plan: . Imported from OSS Differential Revision: D17464904 fbshipit-source-id: d8f2cebb978fcbc478bc7e111ba24bc71a6f8915	2019-09-18 19:38:59 -07:00
Supriya Rao	24d5b5f5f9	Add Runtime flag for quantized backend. (#25680 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/25680 Add a runtime flag to choose between FBGEMM and QNNPACK when compiled with both. The flag can be set by using torch.backends.quantized.engine = torch.fbgemm/torch.qnnpack or ctx::setPreferredQuantizedEngine(at::QEngine) ghstack-source-id: 89935643 Test Plan: Verified torch.backends.quantized.engine works Differential Revision: D17198233 fbshipit-source-id: e5449d06f4136385e0e6d18bd4237f8654a61672	2019-09-11 21:37:36 -07:00
jiayisun	b9bf91feb8	Add torch.backends.mkldnn.enabled flag (#25459 ) Summary: This PR adds the torch.backends.mkldnn.enabled flag requested in https://github.com/pytorch/pytorch/issues/25186, which can be used to disable mkldnn at runtime, in the same way as torch.backends.cudnn.enabled. Pull Request resolved: https://github.com/pytorch/pytorch/pull/25459 Differential Revision: D17258926 Pulled By: ezyang fbshipit-source-id: e179ad364cc608fdaa7d0f37e2e762ceb5eda598	2019-09-11 12:09:40 -07:00
peter	d6f62b70f3	Fix cuda and cudnn libraries search process on Windows (#20205 ) Summary: Fixes #20202 Pull Request resolved: https://github.com/pytorch/pytorch/pull/20205 Differential Revision: D15258626 Pulled By: ezyang fbshipit-source-id: 855ad457a8bb7a46accc7cf6ec5cb09e98f6e770	2019-05-08 06:08:47 -07:00
Tongzhou Wang	973d51079b	Add device-specific cuFFT plan caches (#19300 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/19224 Pull Request resolved: https://github.com/pytorch/pytorch/pull/19300 Differential Revision: D14986967 Pulled By: soumith fbshipit-source-id: 8c31237db50d6924bba1472434c10326610d9255	2019-04-18 06:39:35 -07:00
Edward Yang	50df3e5e2e	Add ability to query if built with CUDA and MKL-DNN. (#18362 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18362 ghimport-source-id: 374b7ab97e2d6a894368007133201f510539296f Stack from [ghstack](https://github.com/ezyang/ghstack): * #18242 Test running a CUDA build on CPU machine. * #18362 Add ability to query if built with CUDA and MKL-DNN. Fixes #18108. Signed-off-by: Edward Z. Yang <ezyang@fb.com> Differential Revision: D14584430 fbshipit-source-id: 7605a1ac4e8f2a7c70d52e5a43ad7f03f0457473	2019-03-25 10:39:09 -07:00
SsnL	13422fca32	Add torch.backends.openmp.is_available(); fix some cmake messages (#16425 ) Summary: 1. add `torch.backends.openmp.is_available()` 2. Improve various `cmake` outputs 3. Fix LDFLAGS not respected by `caffe2_pybind11_state_*` targets 4. Fix `MKL` warning message, and QUIET flag. 5. Fix various typos Pull Request resolved: https://github.com/pytorch/pytorch/pull/16425 Differential Revision: D13903395 Pulled By: soumith fbshipit-source-id: d15c5d46f53e1ff1c27fca2887b9d23d0bd85b4d	2019-01-31 16:15:46 -08:00
Lu Fang	b1b00f329e	Fix the flake8 linter Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16549 Reviewed By: bddppq Differential Revision: D13877435 Pulled By: houseroad fbshipit-source-id: dbe575ba3f6dd30d27ac6aa5eec2eea025063540	2019-01-30 09:36:00 -08:00
David Riazati	bc74ec80d0	Add support for torch.backends.cudnn.enabled (#13057 ) Summary: This is used commonly in `nn` functions. This PR adds it as a weak module (and also alters the conversion of weak modules to strong modules to accept ordinary `object`s) Pull Request resolved: https://github.com/pytorch/pytorch/pull/13057 Differential Revision: D10846618 Pulled By: driazati fbshipit-source-id: 028b9f852d40e2e53ee85b93282c98cef8cd336b	2018-10-31 09:31:09 -07:00
sclarkson	2b033332c8	Allow linking to backwards-compatible cuDNN at runtime (#12239 ) Summary: Fixes #12193 Pull Request resolved: https://github.com/pytorch/pytorch/pull/12239 Differential Revision: D10321744 Pulled By: soumith fbshipit-source-id: bf437f7f9b6231158a1585d2dabae8d937396478	2018-10-10 23:56:51 -07:00
Matt Dawkins	87b2f05a9c	Also set stdin to subprocess pipe in FindCUDNN windows popen call (#11435 ) Summary: Same issue as https://github.com/pytorch/pytorch/pull/10379, just in a different place (adding this resolves it) Pull Request resolved: https://github.com/pytorch/pytorch/pull/11435 Differential Revision: D9736396 Pulled By: soumith fbshipit-source-id: 220a52b8009fc2bee9313c5a091443c68f85f62f	2018-09-09 11:40:25 -07:00
Peter Goldsborough	9ce15173fb	Move _cudnn_init_dropout_state to TensorOptions and enable cuDNN dropout in C++ API RNNs (#9012 ) Summary: The goal of this PR was to add support for dropout descriptors in the C++ API's RNN class. The end result is a 4x-5x speedup for our RNN integration tests since they can now use cuDNN instead of autograd when dropout is set. To achieve this, I had to move `_cudnn_init_dropout_state` to the `TensorOptions` API. I also fixed a bug around `RNN::cuda()` not flattening parameters for cuDNN. ebetica ezyang Closes https://github.com/pytorch/pytorch/pull/9012 Reviewed By: pjh5 Differential Revision: D8689786 Pulled By: goldsborough fbshipit-source-id: 44fb191f5a38e41c4ded5417306b5bbc012cd56c	2018-06-29 17:25:23 -07:00
Tongzhou Wang	e6c7b38f94	Cache cufft plans (#8344 ) * cache cufft plans * use an LRU cache * suffix CuFFTParams members with _ * import print_function for py2 * lint * fix potential race; add dummy impl for CPU only builds * cpp formatting; remove nccl makefile change * Use CUDA hooks instead * comments and doc * update the error message * move LRU cache to a separate file and native::detail namespace * update comment * specify NOTE location in CuFFTPlanCache.h * update disabled_features.yaml to make amd ci work * another fix for AMD CI in disabled_features.yaml * Wrap cufft_plan_cache_* methods in __HIP_PLATFORM_HCC__ * improve the notes * lint * revert onnx change * put back inlining for CUFFT_CHECK	2018-06-22 13:02:34 -04:00
Peter Goldsborough	0acddd6cee	Add torch.cuda.cudnn_is_available (#8703 )	2018-06-20 14:18:03 -07:00
Edward Z. Yang	64834f6fb8	Split libATen.so into libATen_cpu.so and libATen_cuda.so (#7275 ) * Split libATen.so into libATen_cpu.so and libATen_cuda.so Previously, ATen could be built with either CPU-only support, or CPU/CUDA support, but only via a compile-time flag, requiring two separate builds. This means that if you have a program which indirectly uses a CPU-only build of ATen, and a CPU/CUDA-build of ATen, you're gonna have a bad time. And you might want a CPU-only build of ATen, because it is 15M (versus the 300M of a CUDA build). This commit splits libATen.so into two libraries, CPU/CUDA, so that it's not necessary to do a full rebuild to get CPU-only support; instead, if you link against libATen_cpu.so only, you are CPU-only; if you additionally link/dlopen libATen_cuda.so, this enables CUDA support. This brings ATen's dynamic library structure more similar to Caffe2's. libATen.so is no more (this is BC BREAKING) The general principle for how this works is that we introduce a hooks interface, which introduces a dynamic dispatch indirection between a call site and implementation site of CUDA functionality, mediated by a static initialization registry. This means that we can continue to, for example, lazily initialize CUDA from Context (a core, CPU class) without having a direct dependency on the CUDA bits. Instead, we look up in the registry if, e.g., CUDA hooks have been loaded (this loading process happens at static initialization time), and if they have been we dynamic dispatch to this class. We similarly use the hooks interface to handle Variable registration. We introduce a new invariant: if the backend of a type has not been initialized (e.g., its library has not been dlopened; for CUDA, this also includes CUDA initialization), then the Type pointers in the context registry are NULL. If you access the registry directly you must maintain this invariant. There are a few potholes along the way. 
I document them here: - Previously, PyTorch maintained a separate registry for variable types, because no provision for them was made in the Context's type_registry. Now that we have the hooks mechanism, we can easily have PyTorch register variables in the main registry. The code has been refactored accordingly. - There is a subtle ordering issue between Variable and CUDA. We permit libATen_cuda.so and PyTorch to be loaded in either order (in practice, CUDA is always loaded "after" PyTorch, because it is lazily initialized.) This means that, when CUDA types are loaded, we must subsequently also initialize their Variable equivalents. Appropriate hooks were added to VariableHooks to make this possible; similarly, getVariableHooks() is not referentially transparent, and will change behavior after Variables are loaded. (This is different to CUDAHooks, which is "burned in" after you try to initialize CUDA.) - The cmake is adjusted to separate dependencies into either CPU or CUDA dependencies. The generator scripts are adjusted to either generate a file as a CUDA (cuda_file_manager) or CPU file (file_manager). - I changed all native functions which were CUDA-only (the cudnn functions) to have dispatches for CUDA only (making it permissible to not specify all dispatch options.) This uncovered a bug in how we were handling native functions which dispatch on a Type argument; I introduced a new self_ty keyword to handle this case. I'm not 100% happy about it but it fixed my problem. This also exposed the fact that set_history incompletely handles heterogenous return tuples combining Tensor and TensorList. I swapped this codegen to use flatten() (at the possible cost of a slight perf regression, since we're allocating another vector now in this code path). - thc_state is no longer a public member of Context; use getTHCState() instead - This PR comes with Registry from Caffe2, for handling static initialization. 
I needed to make a bunch of fixes to Registry to make it more portable - No more ##__VA_ARGS__ token pasting; instead, it is mandatory to pass at least one argument to the var-args. CUDAHooks and VariableHooks pass a nullary struct CUDAHooksArgs/VariableHooksArgs to solve the problem. We must get rid of token pasting because it does not work with MSVC. - It seems MSVC is not willing to generate code for constructors of template classes at use sites which cross DLL boundaries. So we explicitly instantiate the class to get around the problem. This involved tweaks to the boilerplate generating macros, and also required us to shuffle around namespaces a bit, because you can't specialize a template unless you are in the same namespace as the template. - Insertion of AT_API to appropriate places where the registry must be exported - We have a general problem which is that on recent Ubuntu distributions, --as-needed is enabled for shared libraries, which is (cc @apaszke who was worrying about this in #7160 see also #7160 (comment)). For now, I've hacked this up in the PR to pass -Wl,--no-as-needed to all of the spots necessary to make CI work, but a more sustainable solution is to attempt to dlopen libATen_cuda.so when CUDA functionality is requested. - The JIT tests somehow manage to try to touch CUDA without loading libATen_cuda.so. So we pass -Wl,--no-as-needed when linking libATen_cuda.so to _C.so - There is a very subtle linking issue with lapack, which is solved by making sure libATen_cuda.so links against LAPACK. There's a comment in aten/src/ATen/CMakeLists.txt about this as well as a follow up bug at #7353 - autogradpp used AT_CUDA_ENABLED directly. 
We've expunged these uses and added a few more things to CUDAHooks (getNumGPUs) - Added manualSeedAll to Generator so that we can invoke it polymorphically (it only does something different for CUDAGenerator) - There's a new cuda/CUDAConfig.h header for CUDA-only ifdef macros (AT_CUDNN_ENABLED, most prominently) - CUDAHooks/VariableHooks structs live in at namespace because Registry's namespace support is not good enough to handle it otherwise (see Registry changes above) - There's some modest moving around of native functions in ReduceOps and UnaryOps to get the CUDA-only function implementations into separate files, so they are only compiled into libATen_cuda.so. sspaddmm needed a separate CUDA function due to object linkage boundaries. - Some direct uses of native functions in CUDA code have to go away, since these functions are not exported, so you have to go through the dispatcher (at::native::empty_like to at::empty_like) - Code in THC/THCS/THCUNN now properly uses the THC_API macro instead of TH_API (which matters now that TH and THC are not in the same library) - Added code debt in torch/_thnn/utils.py and other THNN parsing code to handle both TH_API and THC_API - TensorUtils.h is now properly exported with AT_API - Dead uses of TH_EXPORTS and co expunged; we now use ATen_cpu_exports and ATen_cuda_exports (new, in ATenCUDAGeneral.h) consistently - Fix some incorrect type annotations on _cudnn_rnn_backward, where we didn't declare a type as possibly undefined when we should have. We didn't catch this previously because optional annotations are not tested on "pass-through" native ATen ops (which don't have dispatch). Upstream issue at #7316 - There's a new cmake macro aten_compile_options for applying all of our per-target compile time options. We use this on the cpu and cuda libraries. 
- test/test_cpp_extensions.py can be run directly by invoking in Python, assuming you've set up your PYTHONPATH correctly - type_from_string does some new funny business to only query for all valid CUDA types (which causes CUDA initialization) when we see "torch.cuda." in the requested string Signed-off-by: Edward Z. Yang <ezyang@fb.com> * Last mile libtorch fixes Signed-off-by: Edward Z. Yang <ezyang@fb.com> * pedantic fix Signed-off-by: Edward Z. Yang <ezyang@fb.com>	2018-05-10 10:28:33 -07:00
Tongzhou Wang	1c01eabd3c	Codemod to update our codebase to 0.4 standard (#6641 ) * Codemod to update our codebase to 0.4 standard * Update some of the test scripts * remove Variable in test_clip_grad_value * fix _symbolic_override_wrapper_maker	2018-04-17 22:06:54 -04:00
gchanan	749d51414a	Separate cuda-ness from dtype. (#6470 ) * Separate cuda-ness from dtype. There are no longer torch.cuda.int64, etc; only torch.int64 that correspond to at::ScalarType. At the python arg parser level, the corresponding ATen type is selected from the combination of (ScalarType, Layout, Device). There is also currently unused code in here to support ScalarType in native_functions; this will be used for specifying aggregate types on reduction functions. * Fix test_autograd. * Add defaults to randint_like. * Track is_cuda in py tensor types. * Fix test_sparse. * Fix multiprocessing. * Fix rnn. * Fix test_nn. * Fix flake8.	2018-04-12 14:05:44 -04:00
Tongzhou Wang	22ef8e5654	[fft][1 of 3] build system and helpers to support cuFFT and MKL (#5855 ) This is the first of three PRs that #5537 will be split into. This PR adds mkl headers to included files, and provides helper functions for MKL fft and cuFFT. In particular, on POSIX, headers are using mkl-include from conda, and on Windows, it is from a new file @yf225 and I made and uploaded to s3. * add mkl-include to required packages * include MKL headers; add AT_MKL_ENABLED flag; add a method to query MKL availability * Add MKL and CUFFT helpers	2018-03-19 15:43:14 -04:00
gchanan	a3442f62bc	Support native namespace functions with type dispatch. (#5576 ) * Support native namespace functions with type dispatch. Use 'ones' as an example. Note this is a "halfway" solution; i.e. the call chain is: at::ones(shape, dtype) -> dtype.ones(shape, dtype) -> CPUFloatType.ones(shape, dtype) -> at::native::ones(shape, dtype) The "nicer" solution would probably be something like: at::ones(shape, dtype) -> dtype.ones(shape) -> CPUFloatType.ones(shape) -> at::native::ones(shape, this) * Fix type inference. * Fix test install. * Fix extensions. * Put dtype argument at the beginning. * Fix extension.cpp. * Fix rnn. * Move zeros in the same manner. * Fix cuda. * Change randn. * Change rand. * Change randperm. * Fix aten contrib. * Resize in randperm_out. * Implement eye. * Fix sparse zeros. * linspace, logspace. * arange. * range. * Remove type dispatch from gen_python_functions. * Properly generate maybe_init_cuda for type dispatch functions not named type. * Don't duplicate dtype, this parameters for native type dispatched functions. * Call VariableType factory methods from the base type so it gets version number 0. * Address review comments.	2018-03-09 10:52:53 -05:00
Edward Z. Yang	0877558e60	Port cuDNN RNN dropout state initialization to ATen and make Python c… (#5383 ) * Port cuDNN RNN dropout state initialization to ATen and make Python code use it. Fixes #5138. Signed-off-by: Edward Z. Yang <ezyang@fb.com> * Variable/Tensor bugfix Signed-off-by: Edward Z. Yang <ezyang@fb.com>	2018-03-02 10:00:00 -05:00
Sam Gross	895aebac08	Use Variable instead of Tensor in Function.forward (#4786 ) The Tensor and Variable classes are being merged. autograd.Function.forward is now called on Variables, but with "no-grad" mode (torch.no_grad()) enabled. One benefit is that we no longer have to explicitly track shared storages.	2018-02-06 17:24:27 -05:00
Edward Z. Yang	7bd2db997e	Port cuDNN RNN bindings to ATen (#4881 ) * Add transpose() to TensorGeometry. This code is dead; I briefly used it in my RNN patchset but eventually rewrote it to not be necessary. However, it seemed like a useful gadget so I kept it. In general, it seems that it would be useful for TensorGeometry to support all operations that Tensor does, but it only computes the changes to sizes/strides instead of actually doing the computation. Signed-off-by: Edward Z. Yang <ezyang@fb.com> * Turn on wrap_dim behavior for TensorGeometry Signed-off-by: Edward Z. Yang <ezyang@fb.com> * Support for hard-coded differentiable outputs. Some outputs of functions are nondifferentiable, and should always be returned with requires_grad=False. Traditionally, we have used the presence of 'grad' to signal that only the first output is differentiable, and the rest are not, but cudnn_rnn (to be implemented) breaks this pattern; its first three outputs are differentiable, but its last output is a buffer that is just consumed by backwards. Signed-off-by: Edward Z. Yang <ezyang@fb.com> * TensorGeometry constructor from just sizes The sizes are assumed to form a contiguous tensor, and we compute the strides we would get in that case. Signed-off-by: Edward Z. Yang <ezyang@fb.com> * Support saving TensorList for backwards. There is some back story here. Saved TensorList in backwards will be used by cudnn_rnn, and it is worth asking, why is it necessary to save a list of tensors? Indeed, technically speaking a list of tensors is not necessary, we only need to save the sizes of each of the weight tensors. (We need the sizes because cuDNN is only going to blast the derivative of weights into a flat buffer, but we need to match the sizes of the views into the buffer when we eventually return the derivatives.) 
However, it was surprisingly awful trying to implement passing just sizes, because as non-Tensor arguments, the JIT interpreter generation code is expected to handle all non-Tensor arguments as attributes in the trace, and our attributes struct doesn't actually know how to do arrays of arrays. Saved TensorList code was much easier to get working, so that's what this patch does. Signed-off-by: Edward Z. Yang <ezyang@fb.com> * MatrixRef - an ArrayRef with a stride, making it a 2D ArrayRef. Like ArrayRef, this class does not own the underlying data, it is expected to be used in situations where the data resides in some other buffer. This is intended to be trivially copyable, so it should be passed by value. For now, 2D only (so the copies are actually cheap, without having to write a SmallVector class) and contiguous only (so we can return non-strided ArrayRef on index). The intended use-case (not in this commit) is to make it easier to work with RNN weights, which are num_weights x num_layers matrix of parameters. P.S. dimension 0 indexes rows, dimension 1 indexes columns Signed-off-by: Edward Z. Yang <ezyang@fb.com> * Generalize getDataType in Descriptors.h Signed-off-by: Edward Z. Yang <ezyang@fb.com> * Change copy_range to take Tensor, and change cat_tensors_backward accordingly Should a backward function return a Variable or a Tensor? For the most part, all of our backward functions return Tensor, except cat_tensors_backward, which returns a variable_list (which is really the only thing that matters, because Tensor and Variable are interconvertible). But this is kind of weird, because it means that you can't implement a backwards in ATen that returns a std::vector<Tensor>, and then hook it up transparently with the derivatives code. So I switched it over. Signed-off-by: Edward Z. Yang <ezyang@fb.com> * Support 5-ary return Tensor tuple. Signed-off-by: Edward Z. Yang <ezyang@fb.com> * Support code generation with mixed Tensor/TensorList in output. 
I don't think I ended up using this in cudnn_rnn, but this seems it might be useful for someone else later. Signed-off-by: Edward Z. Yang <ezyang@fb.com> * Support 4-ary boolean array Signed-off-by: Edward Z. Yang <ezyang@fb.com> * Add support for retain_variables in tools/autograd/derivatives.yaml 'retain_variables', a bool which is true if a user has specified that saved variables should be retained in case the backwards is run again later. This allows an optimization where we can destroy saved buffers if we know variables are not going to be retained, e.g., it is (will be) used by _cudnn_rnn Signed-off-by: Edward Z. Yang <ezyang@fb.com> * Lazily initialize cuDNN descriptors Previously, cuDNN descriptors were eagerly allocated as soon as a FooDescriptor object was created. However, in some uses of TensorDescriptor, this is problematic: some tensors are optional and cuDNN's API expects to be given a nullptr TensorDescriptor in this case, not an uninitialized (but allocated) descriptor. Lazily initializing the descriptors makes it less likely for us to use uninitialized memory and matches the usual semantics of unique_ptr. It's good sense! Signed-off-by: Edward Z. Yang <ezyang@fb.com> * Port cuDNN RNNs to ATen. This brings three new functions: - _cudnn_rnn_flatten_weight: flatten a matrix of weight tensors into a single contiguous weight buffer as required by cuDNN - _cudnn_rnn: run RNN forwards - _cudnn_rnn_backward: run RNN backwards RNNs have a lot of parameters, so we restructured what was previously a single 'fn' object that recorded all the parameters into three objects: RNNDescriptorParams, TensorDescriptorListParams and DropoutDescriptorParams. We make use of MatrixRef to organize the weight tensors (which are weight/bias x number of layers), but I did not teach the codegen how to pass these as arguments/return values natively, so instead a MatrixRef is passed as its constituent ArrayRef and int64_t stride0. 
cudnn_rnn has three differentiable outputs and one nondifferentiable one, so it makes use of the support for hard-coded differentiable outputs. I haven't deleted all of the descriptor code from Python, because dropout initialization still goes through this codepath, that should be fixed soon but I don't see it as essential for this PR. This commit also removes the last use of NestedIOFunction from PyTorch. There are some shenanigans with cuDNN dropout descriptor initialization, see below: Note [cuDNN dropout descriptor initialization] ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ In most cases, setting descriptors in cuDNN is cheap (e.g., cudnnSetTensorNdDescriptor). However, this is not the case for cudnnSetDropoutDescriptor: in cuDNN 6/7 (and possibly others) it does an expensive precomputation to initialize the random number generator states. In cuDNN 6, this is the ONLY official mechanism to initialize a dropout descriptor, which means that law-abiding clients were expected to generate a dropout descriptor once and cache it. However, our ATen interface is (1) stateless (so we can't cache the descriptors) and (2) does not accept arbitrary user types in its interface (so we can't pass the descriptor in). This puts us in a pickle. In cuDNN 7, a new function, cudnnRestoreDropoutDescriptor was added, which forgoes the expensive initialization process, and can initialize the descriptor with a pre-initialized state CUDA tensor. This is great, because it means we can simply pass in the state tensor and then initialize the descriptor internally. Unfortunately, this function is not available in cuDNN 6. To work around this, we break the cuDNN abstraction barrier, and have the struct layout of the underlying dropout descriptor. With this struct, we can reimplement cudnnRestoreDropoutDescriptor from scratch. Great! Signed-off-by: Edward Z. Yang <ezyang@fb.com> * Fix cuDNN 7 behavior. Signed-off-by: Edward Z. 
Yang <ezyang@fb.com> * Delete some unused, controversial methods from MatrixRef. Signed-off-by: Edward Z. Yang <ezyang@fb.com> * Add missing filter_dim_a slice Signed-off-by: Edward Z. Yang <ezyang@fb.com> * Replace nested for-loop with itertools.chain. Signed-off-by: Edward Z. Yang <ezyang@fb.com> * CR comment on mut_desc() Signed-off-by: Edward Z. Yang <ezyang@fb.com> * Refactor DropoutDescriptor API. Signed-off-by: Edward Z. Yang <ezyang@fb.com> * Use cached CurrentDeviceProperties from Context. Signed-off-by: Edward Z. Yang <ezyang@fb.com> * Document _cudnn_rnn outputs. Signed-off-by: Edward Z. Yang <ezyang@fb.com> * Improve fmap docs, convert some functions to use it. Signed-off-by: Edward Z. Yang <ezyang@fb.com> * Move IndexRange to autograd/function.h Signed-off-by: Edward Z. Yang <ezyang@fb.com> * Elaborate on CUDNN_STATUS_INVALID_VALUE return some more. Signed-off-by: Edward Z. Yang <ezyang@fb.com> * Add an all-in-one setter for RNNDescriptorParams. Signed-off-by: Edward Z. Yang <ezyang@fb.com> * Print what the unrecognized RNN mode was Signed-off-by: Edward Z. Yang <ezyang@fb.com> * RNN TensorDescriptor improvements - Have an explicit size/stride overload for set TensorDescriptor, so you don't have to create a goofy view to feed in. - Change the padding to 3D rather than 5D, which is all you actually need (it's just 2D that is not supported by cuDNN API.) Signed-off-by: Edward Z. Yang <ezyang@fb.com> * Fix implementation of cudnnRestoreDropoutDescriptor, plus test. Signed-off-by: Edward Z. Yang <ezyang@fb.com> * Better comments about input layout. Signed-off-by: Edward Z. Yang <ezyang@fb.com> * Add comment about no-DropoutDescriptor argument RNNDescriptor function. Signed-off-by: Edward Z. Yang <ezyang@fb.com> * Rename vocab_size back to input_size. Signed-off-by: Edward Z. Yang <ezyang@fb.com> * Don't use backslash in comment. Signed-off-by: Edward Z. Yang <ezyang@fb.com> * Bugfix for contiguous TensorGeometry calculation. Signed-off-by: Edward Z. 
Yang <ezyang@fb.com> * Don't allocate a dummy tensor when setting TensorDescriptor for flatten_weight. Signed-off-by: Edward Z. Yang <ezyang@fb.com> * Make contiguity errors more user-friendly. Signed-off-by: Edward Z. Yang <ezyang@fb.com> * s/fn.dropout.train/fn_train/ Signed-off-by: Edward Z. Yang <ezyang@fb.com> * s/_cudnn_rnn_backward_grad/_cudnn_rnn_backward_input/ Signed-off-by: Edward Z. Yang <ezyang@fb.com> * Make dcx properly undefined when not required. Signed-off-by: Edward Z. Yang <ezyang@fb.com> * Remove old TODO. Signed-off-by: Edward Z. Yang <ezyang@fb.com> * Add state size check in cudnnRestoreDropoutDescriptor Signed-off-by: Edward Z. Yang <ezyang@fb.com> * Explicitly narrow int64_t to size_t Signed-off-by: Edward Z. Yang <ezyang@fb.com> * Restore copyParams comment. Signed-off-by: Edward Z. Yang <ezyang@fb.com> * Update benchmark numbers, and slight engineering improvements. Signed-off-by: Edward Z. Yang <ezyang@fb.com> * Typofix. Signed-off-by: Edward Z. Yang <ezyang@fb.com>	2018-02-05 13:54:11 -05:00
Edward Z. Yang	7d25a41251	Fix #4492 , make it impossible to forget to reset cudnn flags (#4503 ) Three stage plan to no more stupidly weird "why isn't cuDNN enabled" bugs: - Add torch.backends.cudnn.disable_global_flags(), which as its name suggests, disables global flag setting in cuDNN, so that you are not allowed to make changes to this state. However, the flags() context manager continues to work (since they are non-global changes). - Call disable_global_flags() in test/common.py - Switch all of the manual flag setting/unsetting in test/test_nn.py to use the context manager. Signed-off-by: Edward Z. Yang <ezyang@fb.com>	2018-01-08 12:21:09 -05:00
Edward Z. Yang	5f7c5502b8	Further improvements to ATen convolution (#4287 ) - Rename THNN convolution to have thnn_ prefix. - Propagate CuDNN benchmark and deterministic to at::Context - Add 'convolution', 'convNd' and 'conv_transposeNd' native wrappers, with defaults The conv_transposeNd wrappers are updated to have the same argument order as Python. - torch.nn.functional directly dispatches to the native wrappers - Make it possible to turn off tracing for some native wrappers, so I don't have to write symbolics for all the functions above - Spectral ops can now make use of CuDNN convolution if possible - Better commentary on cudnn_batch_norm - Turn on DCE for all JIT tests. Signed-off-by: Edward Z. Yang <ezyang@fb.com>	2017-12-21 13:03:43 -05:00
Edward Z. Yang	787b9c5202	Propagate CuDNN enabled to ATen library. (#4104 ) This is not currently used by anything, but eventually ATen will need to make decisions about whether or not to use CuDNN functions, which means we need to propagate this variable to ATen. Signed-off-by: Edward Z. Yang <ezyang@fb.com>	2017-12-14 11:29:25 -05:00
Richard Zou	28890b2046	Add rnn args check (#3925 ) * Add rnn args check * Check both hidden sizes for LSTM * RNN args check test	2017-12-13 12:48:00 -05:00
peter	ba3b79b06b	Fix the missing import	2017-11-14 09:36:43 +01:00
Christian Sarofeen	0443c11f7e	Fix for cuDNN half precision RNN for pre-volta archs (#3613 ) * Fix for cuDNN half RNN on pre-volta archs * Fix cuDNN versioning in rnn. * lint fix	2017-11-11 11:34:58 -05:00
peterjc123	aa911939a3	Improve Windows Compatibility (for csrc/scripts) (#2941 )	2017-11-08 19:51:35 +01:00
Sean Naren	cf256ee268	Added tensor op check for cudnn rnns (#3409 )	2017-11-01 05:51:23 -04:00
Priya Goyal	2443fcac0b	Deterministic cudnn algorithms	2017-10-10 10:53:34 -04:00
Adam Paszke	ceb4f84d12	Improve memory usage of cuDNN RNN modules (#2179 )	2017-07-25 04:00:17 +05:30
Gregory Chanan	69287250d1	Add a broadcast parameter to copy_, use it in the library in cases where there are non-broadcasting calls exposed by the tests.	2017-06-11 05:37:59 -04:00
Sam Gross	625850c2c2	Check cuDNN version at runtime (#1586 ) * Check cuDNN version at runtime This checks that the version from cudnn.h matches the version from libcudnn.so. Fixes #1476 * Only check major and minor version numbers	2017-05-19 01:55:09 -04:00
Sam Gross	e6c9509a41	Fix call to Tensor.set_ in rnn.py (#1592 )	2017-05-18 20:28:49 -04:00
Sam Gross	b9379cfab7	Use cuDNN and NCCL symbols from _C library (#1017 ) This ensures that we use the same library at the C++ level and with Python ctypes. It moves the searching for the correct library from run-time to compile-time.	2017-03-16 16:10:17 -04:00
Adam Paszke	1487278fdf	Allow backprop through cuDNN RNN in eval mode Handling of dropout descriptors has been improved too.	2017-03-01 19:42:39 +01:00
Adam Paszke	da725830c2	Add support for variable length sequences in RNNs (#873 )	2017-03-01 17:36:32 +01:00
Christian Sarofeen	04aba1caec	Fix cuDNN dropout desc for multi-gpu (#772 )	2017-02-17 19:16:12 +01:00
bdfhjk	a217fefee1	Update rnn.py Fixed a problem with outputting the RuntimeError if arguments are incorrect in cudnn/rnn.py	2017-02-15 21:49:42 +01:00
Adam Paszke	72c1982734	Add some more asserts to cuDNN RNN	2017-02-14 21:28:50 +01:00
Adam Paszke	63edca44f2	Add tests for non-contiguous inputs and gradients	2017-02-14 21:28:50 +01:00
ngimel	f096fb6859	adding cudnn V6 support (#515 )	2017-01-31 02:01:37 +01:00
Adam Paszke	0180e638e5	Remove unnecessary zero_() calls in cuDNN RNN	2017-01-28 14:36:57 +01:00
Adam Paszke	95c6ae04fb	Fix non-contiguous grad handling in cuDNN RNN	2017-01-28 14:36:57 +01:00
Luke Yeager	e7c1e6a8e3	[pep8] Fix most lint automatically with autopep8 Here's the command I used to invoke autopep8 (in parallel!): git ls-files | grep '\.py$' | xargs -n1 -P`nproc` autopep8 -i Several rules are ignored in setup.cfg. The goal is to let autopep8 handle everything which it can handle safely, and to disable any rules which are tricky or controversial to address. We may want to come back and re-enable some of these rules later, but I'm trying to make this patch as safe as possible. Also configures flake8 to match pep8's behavior. Also configures TravisCI to check the whole project for lint.	2017-01-28 01:15:51 +01:00
ngimel	b32dd4a876	add cudnn deb package installation paths to cudnn discovery, add 5.1.10 to load options (#448 )	2017-01-13 14:32:23 -05:00
ngimel	59b23d79c6	fix cudnn rnn batch_first with tests (#445 ) * fix cudnn rnn batch_first with tests	2017-01-13 13:40:27 -05:00
Adam Lerer	183b3aacd2	Hold CuDNN PRNG state between RNN iterations	2016-12-30 00:14:55 +01:00
Sam Gross	8a29338837	Use cuDNN for Conv3d and ConvTranspose3d (#359 ) I've also updated test_nn.py to run marked tests twice: once with cuDNN enabled and once with it disabled.	2016-12-28 16:14:47 -05:00
Adam Paszke	cd82b2b869	Implement comparison and logical operators for tensors	2016-12-28 00:04:08 +01:00
soumith	a9c2809ce3	change the order of cudnn libs	2016-12-21 05:44:16 -08:00
Sergey Zagoruyko	5586f48ad5	add cudnn 5.0.5 to supported versions (#321 )	2016-12-17 07:57:20 -05:00
Adam Paszke	8e09f0590b	Make sure that C extension was compiled with cuDNN before using it	2016-12-15 00:47:55 +01:00
Adam Paszke	0580f5a928	Add __len__ for tensors	2016-12-01 23:14:41 +01:00
Marat Dukhan	e3f440b1d0	Make torch.backends.cudnn work on OSX	2016-11-22 19:06:08 +01:00
Adam Lerer	7f51af7cbc	adding dropout, bidirection, etc. to RNN (#214 )	2016-11-10 13:25:14 -05:00
Sam Gross	ad2d413c0b	Add C++ bindings for cuDNN (#167 ) The Python ctypes bindings overhead was high enough that it slowed down multi-gpu training when using 4+ Maxwell GPUs.	2016-10-26 19:51:48 -04:00
Adam Lerer	b5d13296c6	addressing comments	2016-10-23 21:11:22 -07:00
Adam Lerer	86288265ad	Adding rnn cell library	2016-10-23 20:23:48 -07:00
Adam Lerer	1eb6870853	add nobias option to rnn	2016-10-23 20:23:48 -07:00
Adam Lerer	942ca477a6	Copying weights for CUDNN	2016-10-23 20:23:48 -07:00
Adam Lerer	b0e33fb473	cudnn + THNN match with parameters	2016-10-23 20:23:48 -07:00
Adam Lerer	d58b627b98	CUDNN RNN bindings	2016-10-23 20:23:48 -07:00
Sam Gross	a02917f502	Fix typo	2016-10-14 14:07:29 -07:00
Sam Gross	70d8bd04c0	Make cuDNN descriptors extend object Fixes weird double __del__ issue	2016-10-14 13:58:20 -07:00
Soumith Chintala	50326e94b1	try cudnn 5.1.5 and 5.1.3 in that order to load them up. This is needed because cudnn for cuda 7.5 ships with 5.1.3 and cudnn for cuda 8.0 ships with 5.1.5	2016-10-09 22:26:43 -04:00
Soumith Chintala	160723b5b4	fix cudnn lib name	2016-10-09 21:19:50 -04:00
soumith	833bedb46b	cudnn relative check in binary builds	2016-10-02 11:45:46 -07:00
Sam Gross	14965cfce9	Run cuDNN operations on the correct device	2016-09-29 16:27:07 -07:00
Sam Gross	cb5d4e836f	Lazy load CUDA and THNN modules (#64 )	2016-09-28 19:29:53 -04:00
Soumith Chintala	412019dbe4	fixing CPU builds by making cuda imports optional	2016-09-28 11:56:18 -04:00
Sam Gross	779a460030	Add cuDNN support for convolutions (#36 )	2016-09-27 17:55:04 -04:00
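
Commit 7d25a41251 in the list above describes a pattern in which global flag assignment is locked while a scoped flags() context manager keeps working. The following is a minimal, self-contained sketch of that idea in plain Python. It is not the PyTorch implementation: the FlagState class, its set_global method, and the flag names are hypothetical and exist only for illustration.

```python
from contextlib import contextmanager


class FlagState:
    """Hypothetical stand-in for a module that holds global backend flags."""

    def __init__(self):
        self.enabled = True
        self.benchmark = False
        self._global_writes_allowed = True

    def disable_global_flags(self):
        # After this call, direct global assignment is rejected.
        self._global_writes_allowed = False

    def set_global(self, name, value):
        # Hypothetical setter used in place of plain attribute assignment.
        if not self._global_writes_allowed:
            raise RuntimeError(
                "global flag setting is disabled; use flags() instead"
            )
        setattr(self, name, value)

    @contextmanager
    def flags(self, **overrides):
        # Save the current values, apply the overrides, and always restore
        # the saved values on exit, even if the body raises.
        saved = {name: getattr(self, name) for name in overrides}
        for name, value in overrides.items():
            setattr(self, name, value)
        try:
            yield self
        finally:
            for name, value in saved.items():
                setattr(self, name, value)


state = FlagState()
state.disable_global_flags()

with state.flags(enabled=False):
    inside = state.enabled   # the override is visible only inside the block

after = state.enabled        # the original value is restored on exit
```

Because the context manager restores the saved values in a finally clause, a test that raises inside the block cannot leave a flag in the wrong state, which is the class of bug the commit message is aimed at.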

290 Commits