* Don't include view ops in autodiff graphs
* Skip view ops in autodiff testing
* Two more tests
* Appease clang-format
* Pacify clang-format
Co-authored-by: eellison <eellison@fb.com>
Co-authored-by: Nikita Shulga <nikita.shulga@gmail.com>
Previously when analyzing a TupleConstruct, we ignored the aliasing
information of the inputs and simply marked all elements of the returned
tuple as wildcards. But since we can fully reason about the contents of
a tuple statically, we should be able to assign them aliasing
information.
This analysis was not only incomplete but produced incorrect results:
if `a` is not a wildcard, then `a noalias wildcard`. So if we looked at
`tuple(a)` and reported the aliasing info as `tuple(wildcard)`, then
`tuple[0] noalias a`, which is...wrong.
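For illustration, a minimal TorchScript sketch of the situation (the function `f` is hypothetical):
```
import torch

@torch.jit.script
def f(a: torch.Tensor):
    t = (a,)      # TupleConstruct: t[0] must alias `a`
    return t[0]   # reporting t[0] as a wildcard would wrongly imply `t[0] noalias a`
```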
Summary:
Add `torch._C._cuda_getArchFlags()` that returns the list of architectures `torch_cuda` was compiled for
Add `torch.cuda.get_arch_list()` and `torch.cuda.get_gencode_flags()` methods that return the architecture list and gencode flags PyTorch was compiled with
Print a warning if any of the GPUs is not compatible with any of the CUBINs
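A quick usage sketch of the new Python-level APIs (output values are illustrative):
```
import torch

if torch.cuda.is_available():
    # Architectures this build of torch_cuda was compiled for, e.g. ['sm_70', 'sm_75']
    print(torch.cuda.get_arch_list())
    # The matching NVCC gencode flags
    print(torch.cuda.get_gencode_flags())
```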
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41173
Differential Revision: D22459998
Pulled By: malfet
fbshipit-source-id: 65d40ae29e54a0ba0f3f2da11b821fdb4d452d95
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40115
Closes https://github.com/pytorch/pytorch/issues/37790
Closes https://github.com/pytorch/pytorch/issues/37944
A user may wish to run DDP's forward + backwards step under a non-default CUDA stream, such as one created by `torch.cuda.Stream()` and entered via `with torch.cuda.stream(s)`. In this case, the user should be responsible for synchronizing events on this stream with other streams used in the program (per the documentation at https://pytorch.org/docs/stable/notes/cuda.html#cuda-semantics), but currently DDP has a bug which causes DDP under non-default streams to fail.
If a user does the following:
```
model = DDP(...)
loss = model(input).sum()
loss.backward()
grad = model.module.weight.grad          # .grad is an attribute, not a method
average = grad.clone()
dist.all_reduce(average)                 # all_reduce reduces in-place
average /= dist.get_world_size()         # cross-rank average; should equal `grad`
```
There is a chance that `average` and `grad` will not be equal. This is because the CUDA kernels corresponding to the `all_reduce` call may run before `loss.backward()`'s kernels are finished. Specifically, in DDP we copy the allreduced gradients back to the model parameter gradients in an autograd engine callback, but this callback runs on the default stream. Note that this can also be fixed by the application synchronizing on the current stream, although this should not be expected, since the application is not using the current stream at all.
This PR fixes the issue by passing the current stream into DDP's callback.
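For completeness, a hedged sketch of the user-side stream discipline, continuing the snippet above:
```
s = torch.cuda.Stream()
with torch.cuda.stream(s):
    loss = model(input).sum()
    loss.backward()
# Make the default stream wait for `s` before reading gradients.
torch.cuda.current_stream().wait_stream(s)
grad = model.module.weight.grad
```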
Tested by adding a UT `test_DistributedDataParallel_non_default_stream` that fails without this PR
ghstack-source-id: 106481208
Differential Revision: D22073353
fbshipit-source-id: 70da9b44e5f546ff8b6d8c42022ecc846dff033e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40624
Previously we didn't clone the schema, so the default schema was used; this was
causing issues for some models.
Test Plan: Imported from OSS
Differential Revision: D22259519
fbshipit-source-id: e2a393a54cb18f55da0c7152a74ddc22079ac350
* [quant] Make `aten::repeat` work for quantized tensors (#40644)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/40644
Test Plan: Imported from OSS
Differential Revision: D22268558
fbshipit-source-id: 3bc9a129bece1b547c519772ecc6b980780fb904
* [quant][graphmode][fix] remove unsupported ops in the list (#40653)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40653
Test Plan: Imported from OSS
Differential Revision: D22271413
fbshipit-source-id: a01611b5d90849ac673fa5a310f910c858e907a3
* [quant][graphmode][fix] dequantize propagation for {add/mul}_scalar (#40596)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40596
Previously the fusion patterns for {add/mul}_scalar were inconsistent: the op pattern
produces a non-quantized tensor while the op replacement graph produces a quantized tensor.
Test Plan: Imported from OSS
Differential Revision: D22251072
fbshipit-source-id: e16eb92cf6611578cca1ed8ebde961f8d0610137
* [quant][graphmode] Support quantization for `aten::append` (#40743)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40743
`aten::append` modifies its input in place and its output is ignored; such ops are not
supported right now, so we first need to make `aten::append` non-inplace
by changing
```
ignored = aten::append(list, x)
```
to
```
x_list = aten::ListConstruct(x)
result = aten::add(list, x_list)
```
and then quantize the aten::add instead.
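The rewrite leans on the usual list equivalence; in plain Python terms:
```
xs = [1, 2]
xs.append(3)        # aten::append: in-place, output ignored
ys = [1, 2] + [3]   # aten::ListConstruct + aten::add
assert xs == ys
```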
Test Plan:
TestQuantizeJitOps.test_general_shape_ops
Imported from OSS
Differential Revision: D22302151
fbshipit-source-id: 931000388e7501e9dd17bec2fad8a96b71a5efc5
* properly skip legacy tests regardless of the default executor (#40381)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/40381
Differential Revision: D22173938
Pulled By: Krovatkin
fbshipit-source-id: 305fc4484977e828cc4cee6e053a1e1ab9f0d6c7
* [JIT] Switch executor from Simple to Legacy.
This is done for 1.6 only in order to recover performance regressions
caused by the Legacy->Simple switch that was done in 1.5. On master we
still plan to use Simple executor and fix the performance issues in 1.7
without falling back to the Legacy executor.
Co-authored-by: Nikolay Korovaiko <korovaikon@gmail.com>
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40549
Previously we didn't check whether `%weight_t` is produced by `aten::t`; this would fuse some `matmul`/`addmm`
ops that are not 2-D into `aten::linear`, which is incorrect.
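A hedged illustration of the distinction (shapes chosen arbitrarily):
```
import torch

x = torch.randn(2, 3)
w2d = torch.randn(4, 3)
y = torch.matmul(x, w2d.t())                # weight produced by aten::t: safe to fuse to aten::linear
w3d = torch.randn(5, 4, 3)
z = torch.matmul(x, w3d.transpose(-2, -1))  # 3-D "weight": must not become aten::linear
```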
Test Plan: Imported from OSS
Differential Revision: D22225921
fbshipit-source-id: 9723e82fdbac6d8e1a7ade22f3a9791321ab12b6
* [WIP][JIT] Add ScriptModule._reconstruct (#39979)
Summary:
**Summary**
This commit adds an instance method `_reconstruct` that permits users
to reconstruct a `ScriptModule` from a given C++ `Module` instance.
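A minimal usage sketch (rebuilding a wrapper around its own C++ module, just to show the call):
```
import torch

class M(torch.nn.Module):
    def forward(self, x):
        return x + 1

sm = torch.jit.script(M())
# Rebuild the Python-side ScriptModule from the underlying C++ Module (`sm._c`).
sm._reconstruct(sm._c)
```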
**Testing**
This commit adds a unit test for `_reconstruct`.
**Fixes**
This pull request fixes https://github.com/pytorch/pytorch/issues/33912.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39979
Differential Revision: D22172323
Pulled By: SplitInfinity
fbshipit-source-id: 9aa6551c422a5a324b822a09cd8d7c660f99ca5c
* [quant][graphmode] Enable inplace option for top level API (#40414)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40414
Now that `_reconstruct` is supported in RecursiveScriptModule (https://github.com/pytorch/pytorch/pull/39979),
we can support the inplace option in the quantization API.
Test Plan: Imported from OSS
Differential Revision: D22178326
fbshipit-source-id: c78bc2bcf2c42b06280c12262bb31aebcadc6c32
Co-authored-by: Meghan Lele <meghanl@fb.com>
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40495
As part of debugging flaky ddp_under_dist_autograd tests, I realized
we were running into the following deadlock.
1) Rank 0 would go into DDP construction, hold the GIL, and wait for the broadcast
inside DDP construction.
2) Rank 3 is a little slower and performs an RRef fetch call before the DDP
construction.
3) The RRef fetch call is done on Rank 0 and tries to acquire the GIL.
4) We now have a deadlock, since Rank 0 is waiting for Rank 3 to enter the
collective and Rank 3 is waiting for Rank 0 to release the GIL.
ghstack-source-id: 106534442
Test Plan:
1) Ran ddp_under_dist_autograd 500 times.
2) waitforbuildbot
Differential Revision: D22205180
fbshipit-source-id: 6afd55342e801b9edb9591ff25158a244a8ea66a
Co-authored-by: Pritam Damania <pritam.damania@fb.com>
* [JIT] Update type of the unsqueeze's output in shape analysis.
* [JIT] Fix shape analysis for aten::masked_select.
The reference says that this op always returns a 1-D tensor, even if
the input and the mask are 0-D.
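For example:
```
import torch

x = torch.tensor(5.0)        # 0-D input
mask = torch.tensor(True)    # 0-D mask
out = torch.masked_select(x, mask)
print(out.shape)             # torch.Size([1]): always 1-D
```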
- fixes #38034
- works around missing slice functionality in Sequential
by casting to tuple and slicing that instead
- supports iterating on the resulting slice but not call()
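A rough sketch of the pattern this enables in script mode (module and shapes are arbitrary):
```
import torch
import torch.nn as nn

class M(nn.Module):
    def __init__(self):
        super().__init__()
        self.seq = nn.Sequential(nn.Linear(4, 4), nn.ReLU(), nn.Linear(4, 2))

    def forward(self, x):
        # Iterating over a slice works; calling the slice itself does not.
        for layer in self.seq[1:]:
            x = layer(x)
        return x

m = torch.jit.script(M())
print(m(torch.randn(3, 4)).shape)  # torch.Size([3, 2])
```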
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40461
It turned out `:inherited-members:` (see [doc](https://www.sphinx-doc.org/en/master/usage/extensions/autodoc.html#directive-autoclass)) is not really usable,
because pybind11 generates docstrings that type `self` as the parent class, `rpc.PyRRef`.
As a workaround, I am pulling the docstrings of the parent class, `PyRRef`, into the subclass, `RRef`, and doing surgery on the docstrings generated by pybind11.
ghstack-source-id: 106472496
P134031188
Differential Revision: D7933834
fbshipit-source-id: c03a8a4c9d98888b64492a8caba1591595bfe247
Co-authored-by: Shihao Xu <shihaoxu@fb.com>
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39962
Adding a simple ref-counted wrapper for CUDA events that
destroys the CUDA event after the last copy is destroyed.
Test Plan: CI cuda profiler tests
Differential Revision: D22027092
Pulled By: ilia-cher
fbshipit-source-id: e0810388aa60b2291eb010896e13af1fad92e472
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40312
As part of https://github.com/pytorch/pytorch/issues/40255, we
realized that GPU support for distributed autograd was broken as part of our
multithreaded autograd change.
To fix this in the short term for 1.6, this PR includes the following changes:
1) Long lived CPU thread in DistEngine to execute GPU->CPU continuations in the
autograd graph.
2) The long lived CPU thread has its own ready_queue and this queue is used for
all GraphTasks created by DistEngine.
3) In thread_main(), the CPU thread cannot exit once the GraphTask is done
processing because of the new CPU thread added in 1).
4) To resolve this, thread_main() now has a parameter `device_thread` instead
of `reentrant_thread`. When device_thread is True, we expect this to be a long
lived device thread that does not exit.
5) When device_thread is False, thread_main is expected to run a GraphTask and
return once done.
ghstack-source-id: 106391329
Test Plan: waitforbuildbot
Differential Revision: D22146183
fbshipit-source-id: dd146b7a95f55db75f6767889b7255e9d62d5825
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40389
The `mpt_uv` channel MultiPlexes over a Transport, namely the UV one. What this means is that it takes a tensor, chunks it into equal parts and sends each of them on a separate UV connection, each running in a separate UV loop. Thus they each have their own socket and thread. This allows them to reach bandwidths that go beyond what a simple single-threaded approach can do, which is necessary to reach the high bandwidths of some modern NICs.
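A hedged Python sketch of the chunking idea (names hypothetical; the real implementation is in C++):
```
import torch

def split_for_connections(t, num_conns):
    # One roughly equal chunk per UV connection/thread, so the payload
    # can use more than one socket's worth of bandwidth.
    return list(torch.chunk(t.flatten(), num_conns))

parts = split_for_connections(torch.randn(1024), 4)  # four chunks of 256 elements
```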
ghstack-source-id: 106375511
Test Plan: Ran a few manual tests myself, for the rest relied on the PyTorch RPC tests.
Differential Revision: D22144380
fbshipit-source-id: ef555fa04c6f13a4acf3bd5f7b03d04d02460d38
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40383
The debug option is not supported for these cases, so we print a warning when this occurs.
Test Plan: Imported from OSS
Differential Revision: D22164071
fbshipit-source-id: 90459530f4efdd6d255df4f015606cb0e9070cd3
Summary:
https://github.com/pytorch/pytorch/pull/40129 fixed the error responsible for the first revert, but exposed another error in the same test.
This PR is intended as the "master copy" for merge, and it runs on full CI.
Two other PRs (restricted to run on a small subset of CI) support debugging DDP failures/hangs with multiple devices per process (`test_c10d.py:DistributedDataParallelTest.test_grad_layout_1devicemodule_2replicaperprocess`):
- https://github.com/pytorch/pytorch/pull/40290 tries the test with purely rowmajor contiguous params on an untouched master. In other words https://github.com/pytorch/pytorch/pull/40290 contains none of this PR's diffs aside from the test itself.
- https://github.com/pytorch/pytorch/pull/40178, for comparison, tries the test with this PR's diffs.
Both fail the same way, indicating failure is unrelated to this PR's other diffs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40358
Differential Revision: D22165785
Pulled By: albanD
fbshipit-source-id: ac7cdd79af5c080ab74341671392dca8e717554e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40066
Builds on top of the previous PR to ensure that all remotely profiled events are prefixed with the key for the RPC that generated them.
The key is generated by the result of `_build_rpc_profiling_key` in `rpc/internal.py` and prefixed onto the event name. In order to do this, we set the current-key when creating the RPC in Python, retrieve the currently-set key in C++ and save a GloballyUniqueId -> key mapping to an in-memory map. When we receive an RPC with profiling information, we expect to receive this ID back, and look up the corresponding profiling key in the map.
The key is then added to all the remote events.
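A hedged sketch of the bookkeeping described above (all names illustrative, not the real C++ API):
```
# GloballyUniqueId -> profiling key, recorded when the RPC is created.
profiling_keys = {}

def on_rpc_created(rpc_id, key):
    profiling_keys[rpc_id] = key

def on_remote_events_received(rpc_id, event_names):
    # Look up the key for this RPC and prefix every remote event with it.
    key = profiling_keys.pop(rpc_id)
    return ["{}#{}".format(key, name) for name in event_names]
```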
Tested by adding tests to ensure the key is added to all the remote events. Also added a UT which tests this under a multi-threading scenario, to ensure that the mapping's correctness is maintained when several RPCs are in the process of being created at once.
ghstack-source-id: 106316106
Test Plan: Unit test
Differential Revision: D22040035
fbshipit-source-id: 9215feb06084b294edbfa6e03385e13c1d730c43
Summary:
Previously, large tensor data in attributes and subgraphs was not stored externally, so ONNX could not serialize models whose total size summed to >= 2GB. This PR enables that.
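A minimal export sketch, assuming the `use_external_data_format` flag as it existed around the 1.6 timeframe:
```
import torch
import torch.nn as nn

model = nn.Linear(8, 8)
dummy = torch.randn(1, 8)
# Store tensor payloads in side files so the .onnx protobuf stays under 2 GB.
torch.onnx.export(model, dummy, "model.onnx", use_external_data_format=True)
```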
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38793
Reviewed By: hl475
Differential Revision: D22111092
Pulled By: houseroad
fbshipit-source-id: 355234e50825d576754de33c86a9690161caaeaf
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38490
A meta tensor is a tensor that is a lot like a normal tensor,
except it doesn't actually have any data associated with it.
You can use them to carry out shape/dtype computations without
actually running the underlying kernels; for example, this could
be used to do shape inference in a JIT analysis pass.
Check out the description in DispatchKey.h for more information.
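A sketch of the intended usage (per the limitations listed below, only `torch.add` is expected to work at this point):
```
import torch

# Meta tensors carry shape/dtype but no storage; running an op on them
# performs only the shape/dtype computation.
a = torch.empty(2, 3, device="meta")
b = torch.empty(2, 3, device="meta")
c = torch.add(a, b)
print(c.shape, c.device)  # torch.Size([2, 3]) meta
```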
Meta tensors are part of a larger project to rationalize how we
write kernels so that we don't have to duplicate shape logic
in CPU kernel, CUDA kernel and meta kernel (this PR makes the
duplication problem worse!) However, that infrastructure can
be built on top of this proof of concept, which just shows how
you can start writing meta kernels today even without this
infrastructure.
There are a lot of things that don't work:
- I special cased printing for dense tensors only; if you try to
allocate a meta sparse / quantized tensor things aren't going
to work.
- The printing formula implies that torch.tensor() can take an
ellipsis, but I didn't add this.
- I wrote an example formula for binary operators, but it isn't
even right! (It doesn't do type promotion or memory layout
correctly.) The most future-proof way to do it right is to
factor the relevant computation out of TensorIterator,
as it is quite involved.
- Nothing besides torch.add works right now
- Meta functions are ALWAYS included in mobile builds (selective
build doesn't work on them). This isn't a big deal for now
but will become more pressing as more meta functions are added.
One reason I'm putting up this PR now is to check with Yinghai Lu
if we can unblock shape inference for accelerators, while we are
still working on a long term plan for how to unify all shape
computation across our kernels.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D21935609
Pulled By: ezyang
fbshipit-source-id: f7d8636eeb8516b6bc296db99a16e56029972eee
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40100
ELU has a range of (-1, inf). In the original PR which added
the quantized operator we decided to pass the quantization params
from the input. However, it makes more sense to require observation
for this op.
This PR changes the API to require observation. Next PRs in this stack
will add the eager and graph mode handling.
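A rough eager-mode sketch of why observation is needed (qparams computed by hand here; a real observer does this):
```
import torch

x = torch.randn(10)
y = torch.nn.functional.elu(x)
# Outputs can be negative, so output qparams must come from observing y,
# not reused from x as before.
scale = float((y.max() - y.min()) / 255)
zero_point = int(round(-float(y.min()) / scale))
qy = torch.quantize_per_tensor(y, scale, zero_point, torch.quint8)
```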
Test Plan:
```
python test/test_quantization.py TestQuantizedOps.test_qelu
```
Imported from OSS
Differential Revision: D22075083
fbshipit-source-id: 0ea0fd05a00cc7a5f122a2b1de09144bbd586f32