pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-06 12:20:52 +01:00

Author	SHA1	Message	Date
Salil Desai	bc68625151	[Vulkan] Add support for Optimization Blocklist to Vulkan Rewrite (#87431 ) Optimization Blocklist will be used in a future diff (D40315730) to make the rewrite to transfer input/output backends optional Differential Revision: [D40315729](https://our.internmc.facebook.com/intern/diff/D40315729/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/87431 Approved by: https://github.com/mcr229, https://github.com/digantdesai	2022-10-31 14:15:51 +00:00
Edward Z. Yang	1ff52225f1	Unify SymIntNode and SymFloatNode into SymNode (#87817 ) This refactor was prompted by challenges handling mixed int/float operations in C++. A previous version of this patch added overloads for each permutation of int/float and was unwieldy https://github.com/pytorch/pytorch/pull/87722/ This PR takes a different approach. The general outline of the patch is to combine the C++ types SymIntNode and SymFloatNode into a single type, SymNode. This is type erased; we no longer know statically at C++ if we have an int/float and have to test it with the is_int()/is_float() virtual methods. This has a number of knock on effects. - We no longer have C++ classes to bind to Python. Instead, we take an entirely new approach to our Python API, where we have a SymInt/SymFloat class defined entirely in Python, which hold a SymNode (which corresponds to the C++ SymNode). However, SymNode is not pybind11-bound; instead, it lives as-is in Python, and is wrapped into C++ SymNode using PythonSymNode when it goes into C++. This implies a userland rename. In principle, it is also possible for the canonical implementation of SymNode to be written in C++, and then bound to Python with pybind11 (we have this code, although it is commented out.) However, I did not implement this as we currently have no C++ implementations of SymNode. Because we do return SymInt/SymFloat from C++ bindings, the C++ binding code needs to know how to find these classes. Currently, this is done just by manually importing torch and getting the attributes. - Because SymInt/SymFloat are easy Python wrappers, __sym_dispatch__ now takes SymInt/SymFloat, rather than SymNode, bringing it in line with how __torch_dispatch__ works. Some miscellaneous improvements: - SymInt now has a constructor that takes SymNode. Note that this constructor is ambiguous if you pass in a subclass of SymNode, so an explicit downcast is necessary. This means toSymFloat/toSymInt are no more. 
This is a mild optimization as it means rvalue reference works automatically. - We uniformly use the caster for c10::SymInt/SymFloat, rather than going the long way via the SymIntNode/SymFloatNode. - Removed some unnecessary toSymInt/toSymFloat calls in normalize_* functions, pretty sure this doesn't do anything. - guard_int is now a free function, since to guard on an int you cannot assume the method exists. A function can handle both int and SymInt inputs. - We clean up the magic method definition code for SymInt/SymFloat/SymNode. ONLY the user classes (SymInt/SymFloat) get magic methods; SymNode gets plain methods; this is to help avoid confusion between the two types. Signed-off-by: Edward Z. Yang <ezyang@fb.com> cc @jansel @mlazos @soumith @voznesenskym @yanboliang @penguinwu @anijain2305 Pull Request resolved: https://github.com/pytorch/pytorch/pull/87817 Approved by: https://github.com/albanD, https://github.com/anjali411	2022-10-27 20:56:02 +00:00
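The layering described in this commit (a type-erased node with plain methods, user classes that carry the magic methods, and `guard_int` as a free function) can be sketched in pure Python. This is a toy model under stated assumptions, not PyTorch's implementation: the class bodies, the `value` attribute, and the `add` method are hypothetical stand-ins chosen for illustration.

```python
class SymNode:
    """Toy type-erased node: may hold an int or a float (hypothetical)."""

    def __init__(self, value):
        self.value = value

    # The static type is erased, so callers must test at runtime.
    def is_int(self):
        return isinstance(self.value, int)

    def is_float(self):
        return isinstance(self.value, float)

    # Plain method, not a magic method.
    def add(self, other):
        return SymNode(self.value + other.value)


class SymInt:
    """Toy user-facing wrapper: only this class gets magic methods."""

    def __init__(self, node):
        self.node = node

    def __add__(self, other):
        other_node = other.node if isinstance(other, SymInt) else SymNode(other)
        return SymInt(self.node.add(other_node))


def guard_int(a):
    """Free function, so it accepts both plain ints and SymInt inputs."""
    if isinstance(a, SymInt):
        return a.node.value
    return a


print(guard_int(SymInt(SymNode(3)) + 4))  # 7
print(guard_int(5))                       # 5
```

Making `guard_int` a free function matters because a plain `int` has no such method; a function can branch on the input type instead.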
samdow	169ec120ef	[Modes] refactor modes to only use a stack in cpp (#86458 ) Refactors the mode code to only have the C++ mode stack and not the "C++ mode" like we originally had. This also simplifies the mode logic in a number of places Pull Request resolved: https://github.com/pytorch/pytorch/pull/86458 Approved by: https://github.com/zou3519	2022-10-21 19:18:23 +00:00
albanD	12b2f70a89	Symintify pad ops (#87046 ) Following comments below, we need to add support for `std::negate`/`std::min`/`std::max`/`operator-` for SymInt Pull Request resolved: https://github.com/pytorch/pytorch/pull/87046 Approved by: https://github.com/ezyang	2022-10-19 21:43:08 +00:00
lezcano	48f0231223	Fix Scalar(bool) handling in toIValue (#87179 ) Previously, bool scalars were cast to `int64`, which breaks quite a few casting rules, for example in `ops.aten`. Quite a vintage bug, circa 2020. With this fix, the following code prints `torch.bool`, rather than `torch.int64`. ```python import torch msk = torch.tensor([False]) b = torch.tensor([False]) print(torch.ops.aten.where.ScalarSelf(msk, True, b).dtype) ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/87179 Approved by: https://github.com/albanD	2022-10-18 18:53:03 +00:00
albanD	c21dcffc00	Very limited pow support (#87042 ) Signed-off-by: Edward Z. Yang <ezyang@fb.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/87042 Approved by: https://github.com/ezyang	2022-10-17 13:14:07 +00:00
albanD	3a4c0900c7	Reland 3 of Merge more symbolic meta kernels and symint changes from branch (#86795 ) Take 3 Contains: - symintification of split* - floor support on SymFloat - pad_backward, gather, scatter meta Pull Request resolved: https://github.com/pytorch/pytorch/pull/86795 Approved by: https://github.com/z-a-f	2022-10-17 02:09:40 +00:00
tangleintel	7980ed95bd	Support unpacking python dictionary in torch.jit.trace() (#81623 ) # Support unpacking python dictionary in torch.jit.trace() ## Problem statement & Motivation ### Problem 1 (usability): Say you have a model whose forward method is defined as follows: `def forward(self, key1=value1, key2=value2, key3=value3)` And you have a dataset in which each data point is a python dict as follows: `data = {key1:value1, key3:value3, key2:value2}` The problem is that if you want to trace the model using dict data from the given dataset, you need to unpack the dictionary, reorder its values manually, and make up a tuple such as `data_tuple = (value1, value2, value3)` as the `example_inputs` parameter of `torch.jit.trace()`. This marshalling process is not user friendly. ### Problem 2 (feasibility): Say you have a model whose forward method is defined as follows: `def forward(self, key1=None, key2=None, key3=None)` -> The default value is None And you have a dataset in which each data point is a python dict as follows: `data = {key1:value1, key3:value3}` -> Only some of the values required by forward are given; the rest use the default value. The problem is that if you want to trace the model using dict data from the given dataset, it's not feasible at all, because you can pass neither a tuple like `T1 = (value1, value3)` nor `T2 = (value1, None, value3)`. T1 will mismatch value3 with key2, and T2 includes a None type, which will be blocked by the tracer's type checking. (Of course you can pass `T3 = (value1,)` to make the trace function finish without an exception, but the traced model you get is probably not what you expect, because different inputs may result in different traced results.) These problems come from HuggingFace's PT models, especially in text-classification tasks with datasets such as [MRPC](https://paperswithcode.com/dataset/mrpc), [MNLI](https://paperswithcode.com/dataset/multinli), etc. 
## Solution To address these two issues, we propose to support a new type, that is, python dict, as the example_inputs parameter for torch.jit.trace(). We can use the runtime type information of the example_inputs object to determine whether we fall back to the original tuple path or go into the new dictionary path. Both problem 1 and problem 2 can be solved by utilizing the "**" operator. ## Limitation & Mitigation 1. If we use a dict as example_inputs to trace the model, then we have to pass a dictionary to the traced model too. (Because we will probably change the order of the debug names of the input parameters in the TorchScript IR, we can't assume the traced model's input parameter order is the same as the original model's.) We need to highlight this in the documentation to mitigate this problem. For example: ``` # fetch a data from dataloader, and the data is a dictionary # and the example_inputs_dict is like: {key1:value1, key3:value3, key2:value2} # the forward() is like: def forward(self, key1=value1, key2=value2, key3=value3) example_inputs_dict = next(iter(dataloader)) jit_model = model.eval() # use the dictionary to trace the model jit_model = torch.jit.trace(jit_model, example_inputs_dict, strict=False) # Now the IR will be graph(%self : __torch__.module.___torch_mangle_n.Mymodule, %key1 : type1, %key3 : type3, %key2 : type2) jit_model = torch.jit.freeze(jit_model) # It's OK to use dict as the parameter for traced model jit_model(example_inputs_dict) example_inputs_tuple = (value1, value3, value2) # It's wrong to rely on the original args order. jit_model(example_inputs_tuple) ``` ## Note 1. This PR will make some UT introduced in [39601](https://github.com/pytorch/pytorch/pull/39601) fail, which I think should be classified as unpacking a tuple containing a single dictionary element in our solution. 
2. I think there is ambiguity since currently we only specify passing a tuple or a single Tensor as the example_inputs parameter in torch.jit.trace()'s documentation, but it seems we can still pass a dictionary. Pull Request resolved: https://github.com/pytorch/pytorch/pull/81623 Approved by: https://github.com/davidberard98	2022-10-15 05:33:09 +00:00
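Both problems in this commit message come down to positional versus keyword argument matching, which plain Python can demonstrate without the tracer. The sketch below is only an illustration of that idea: `forward` and `call_with_example_inputs` are hypothetical names, and the runtime type check merely mirrors the "dict path versus tuple path" decision the message describes, not the actual `torch.jit.trace()` code.

```python
def forward(key1=None, key2=None, key3=None):
    """Stand-in for a model's forward(); returns what each parameter received."""
    return (key1, key2, key3)


def call_with_example_inputs(fn, example_inputs):
    """Pick the call path from the runtime type of example_inputs."""
    if isinstance(example_inputs, dict):
        # Dictionary path: ** matches values to parameters by name.
        return fn(**example_inputs)
    if not isinstance(example_inputs, tuple):
        example_inputs = (example_inputs,)
    # Tuple path: values are matched to parameters by position.
    return fn(*example_inputs)


reordered = {"key1": 1, "key3": 3, "key2": 2}
partial = {"key1": 1, "key3": 3}

print(call_with_example_inputs(forward, reordered))  # (1, 2, 3)
print(call_with_example_inputs(forward, partial))    # (1, None, 3)
print(call_with_example_inputs(forward, (1, 3)))     # (1, 3, None)
```

The last call shows the mismatch from Problem 2: passed positionally, the value meant for `key3` lands in `key2`.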
BowenBao	45274c56a4	[ONNX] Partially re-enable RoiAlign and RoiPool unit tests (#86169 ) This PR depends on https://github.com/pytorch/vision/pull/6685 Pull Request resolved: https://github.com/pytorch/pytorch/pull/86169 Approved by: https://github.com/justinchuby, https://github.com/AllenTiTaiWang, https://github.com/abock	2022-10-13 14:39:44 +00:00
albanD	66cab5245f	Reland 2 min/max support for SymInt/Floats, finish as_strided/scatter/squeeze() backward symint support (#86797 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/86797 Approved by: https://github.com/bdhirsh	2022-10-13 00:31:19 +00:00
PyTorch MergeBot	2aa981ab74	Revert "Reland 2 of Merge more symbolic meta kernels and symint changes from branch (#86334 ) (#86488 )" This reverts commit `978b46d7c9`. Reverted https://github.com/pytorch/pytorch/pull/86488 on behalf of https://github.com/osalpekar due to Broke executorch builds internally with the following message: RuntimeError: Missing out variant for functional op: aten::split.Tensor(Tensor(a -> *) self, SymInt split_size, int dim=0) -> Tensor(a)[] . Make sure you have loaded your custom_ops_generated_lib	2022-10-11 23:39:50 +00:00
PyTorch MergeBot	811b8e012b	Revert "min/max support for SymInt/Floats, finish as_strided/scatter/squeeze() backward symint support (#86643 )" This reverts commit `86f914e996`. Reverted https://github.com/pytorch/pytorch/pull/86643 on behalf of https://github.com/osalpekar due to Need to revert this to cleanly revert https://github.com/pytorch/pytorch/pull/86488. This should be safe to re-land later	2022-10-11 23:12:40 +00:00
albanD	86f914e996	min/max support for SymInt/Floats, finish as_strided/scatter/squeeze() backward symint support (#86643 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/86643 Approved by: https://github.com/anjali411	2022-10-11 17:37:30 +00:00
albanD	978b46d7c9	Reland 2 of Merge more symbolic meta kernels and symint changes from branch (#86334 ) (#86488 ) symintify split_with_sizes, dropout, fused_fake_obs_quant. meta for padding_2d ops add meta_bernoulli_ meta kernel for at::gather get pytorch_struct to pass: meta for scatter_add, fix backward symintify split ops Pull Request resolved: https://github.com/pytorch/pytorch/pull/86488 Approved by: https://github.com/ezyang	2022-10-10 15:54:28 +00:00
PyTorch MergeBot	75df4b5e3d	Revert "Merge more symbolic meta kernels and symint changes from branch (#86334 )" This reverts commit `08e3999fa4`. Reverted https://github.com/pytorch/pytorch/pull/86334 on behalf of https://github.com/seemethere due to Trying to revert https://github.com/pytorch/pytorch/pull/86207, this PR causes merge conflicts with the initial revert so will have to revert this as well	2022-10-07 16:03:30 +00:00
Brian Hirsh	08e3999fa4	Merge more symbolic meta kernels and symint changes from branch (#86334 ) symintify split_with_sizes, dropout, fused_fake_obs_quant. meta for padding_2d ops add meta_bernoulli_ meta kernel for at::gather get pytorch_struct to pass: meta for scatter_add, fix backward symintify split ops Pull Request resolved: https://github.com/pytorch/pytorch/pull/86334 Approved by: https://github.com/ezyang	2022-10-06 23:29:04 +00:00
Edward Z. Yang	79dd621f76	Symbolic shapes mega merge PR (Oct 3) (#86160 ) - TensorGeometry supports symint - check_size supports symint - functorch batch rule improved symint - Some operator support for symint in LTC - More supported operations on SymInt and SymFloat - More symint support in backwards formulas This merge includes code contributions from bdhirsh and anjali411. Signed-off-by: Edward Z. Yang <ezyang@fb.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/86160 Approved by: https://github.com/Chillee	2022-10-04 04:12:09 +00:00
Horace He	82d9592f1b	Batch of symintifications to allow more models to pass in inference (#86104 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/86104 Approved by: https://github.com/ezyang	2022-10-04 04:01:58 +00:00
Edward Z. Yang	cb87983cb8	Decay integer-only (Optional)SymIntArrayRef to IntList in IValue (#86094 ) We have logic that says if you ask for a SymIntList from an IValue, but the IValue is actually an IntList, we will still give it to you in that case (check ivalue_to_arg in aten/src/ATen/core/boxing/impl/make_boxed_from_unboxed_functor.h). However, we also need the inverse version of this logic, which says that if you construct an IValue from a SymIntArrayRef, and it is actually integer only, we need to store it as an IntList, so that toIntList on the IValue will work. The way this works is a bit twisty, but our basic strategy is to disable construction of IValue from list container types that contain SymInt directly, and then directly implement variants of these constructors by hand, which iterate over the elements of the list and test if there are any SymInts or not to decide what type to construct the underlying List. These variants have to be templated, otherwise we will run afoul ambiguous overloads. I only did the overloads that actually occurred in practice; you may need to add more if you SymIntify more stuff. Signed-off-by: Edward Z. Yang <ezyang@fb.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/86094 Approved by: https://github.com/anjali411, https://github.com/albanD	2022-10-03 20:12:32 +00:00
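The decay rule in this commit (iterate over the elements, and store a plain integer list when nothing symbolic is present) can be modelled in a few lines of Python. This is a toy sketch, not the C++ IValue code: `Sym`, `make_list_value`, `to_int_list`, and the string tags are all hypothetical stand-ins.

```python
class Sym:
    """Hypothetical stand-in for a genuinely symbolic integer."""

    def __init__(self, name):
        self.name = name


def make_list_value(elements):
    """Build a tagged list, decaying to 'IntList' when every element is a plain int."""
    elements = list(elements)
    if any(isinstance(e, Sym) for e in elements):
        return ("SymIntList", elements)
    return ("IntList", elements)


def to_int_list(value):
    """Only succeeds on values that were stored as 'IntList'."""
    tag, elements = value
    if tag != "IntList":
        raise TypeError("expected IntList, got " + tag)
    return elements


print(make_list_value([2, 3, 4])[0])         # IntList
print(make_list_value([2, Sym("s0"), 4])[0])  # SymIntList
print(to_int_list(make_list_value([2, 3])))   # [2, 3]
```

Without the decay step, an all-integer list built through the symbolic constructor would carry the symbolic tag and `to_int_list` would reject it.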
Edward Z. Yang	8753703b68	Fix some bugs in SymFloat IValue and toPyObject handling (#86072 ) - Test for symbolic cases first before non-symbolic, as symbolic ints/floats advertise as being ints/floats - Add missing case for toPyObject Signed-off-by: Edward Z. Yang <ezyang@fb.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/86072 Approved by: https://github.com/wconstab	2022-10-03 02:06:38 +00:00
Edward Z. Yang	365498f673	Add rmod support to SymIntNode (#86053 ) Signed-off-by: Edward Z. Yang <ezyang@fb.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/86053 Approved by: https://github.com/wconstab	2022-10-02 02:53:49 +00:00
Edward Z. Yang	0060d871df	Add a bunch of extra functionality to SymFloat (#86046 ) - SymInt to SymFloat conversion - All the basic arithmetic operators on c10::SymFloat Signed-off-by: Edward Z. Yang <ezyang@fb.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/86046 Approved by: https://github.com/wconstab	2022-10-02 02:53:46 +00:00
Edward Z. Yang	07800c9c81	Miscellaneous fixes from symbolic-shapes branch (#86042 ) - Make toIValue accept SymIntNode and SymFloatNode where number (aka Scalar) is expected - Binding for symintlistOptional in python arg parser - Teach translate to convert from IntArrayRef to ArrayRef<int64_t> - Don't query _symint function for meta info in LTC unless LTC is code generating a symint function Signed-off-by: Edward Z. Yang <ezyang@fb.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/86042 Approved by: https://github.com/Chillee	2022-10-01 13:57:58 +00:00
Will Constable	d003757a84	Clone symint on set_sizes_and_strides (#85878 ) From the perspective of having valid sympy expressions for any given size/stride property, we can have tensors inherit SymInts from each other (in cases where the size expression is unchanged, which is a common case). But we also use SymInts to let us build graph traces of our programs, and we need to be able to trace from a SymInt back to the tensor that it originated from in order to trace correct graphs. This change ensures each tensor starts with fresh SymInts. - note: our policy has already been to use PySymIntNode objects to store pointers to proxy-tracer objects for use during tracing - before making this change (to clone symints), sometimes we'd attempt to store more than one proxy-tracer object on the same symint and the last-stored one would clobber all the earlier ones. This would result in tracing the wrong graph in some cases. Pull Request resolved: https://github.com/pytorch/pytorch/pull/85878 Approved by: https://github.com/ezyang	2022-09-30 16:10:31 +00:00
Edward Z. Yang	61b4e8a7bf	More SymFloat support (#85411 ) - Support storing SymFloat in IValue - Add SymFloat to JIT type system (erases to float) - Printing support for SymFloat - add/sub/mul/truediv operator support for SymFloat - Support truediv on integers, it returns a SymFloat - Support parsing SymFloat from Python object Signed-off-by: Edward Z. Yang <ezyang@fb.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/85411 Approved by: https://github.com/albanD	2022-09-22 08:07:22 +00:00
Nikita Shulga	c05ca0dbf2	[torch.futures] Fix nullptr deref (#85304 ) `torch.jit.wait(None)` and `torch.futures.collect_all((None,))` should not crash. Fixes https://github.com/pytorch/pytorch/issues/85237 Pull Request resolved: https://github.com/pytorch/pytorch/pull/85304 Approved by: https://github.com/kit1980	2022-09-20 01:49:04 +00:00
Edward Z. Yang	8c9d7fabd6	Add SymInt::guard_int (#85139 ) This allows you to explicitly guard on the specific integer value of a SymInt so that you can condition on it. If possible, prefer guarding on a boolean expression instead. Signed-off-by: Edward Z. Yang <ezyang@fb.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/85139 Approved by: https://github.com/Chillee	2022-09-17 16:05:07 +00:00
Michael Voznesensky	8ca1839d32	Python Dispatcher integration with C++ dispatcher (#85050 ) #84826 but without ghstack Pull Request resolved: https://github.com/pytorch/pytorch/pull/85050 Approved by: https://github.com/malfet	2022-09-15 00:43:36 +00:00
PyTorch MergeBot	706b990306	Revert "Python Dispatcher integration with C++ dispatcher (#84826 )" This reverts commit `35f6a69191`. Reverted https://github.com/pytorch/pytorch/pull/84826 on behalf of https://github.com/malfet due to Broke dynamo, see `35f6a69191`	2022-09-14 14:07:58 +00:00
Michael Voznesensky	35f6a69191	Python Dispatcher integration with C++ dispatcher (#84826 ) Signed-off-by: Edward Z. Yang <ezyang@fb.com> From @ezyang's original PR: There are a number of situations where we have non-backend kernels (e.g., CompositeImplicitAutograd, batching rules) which we would like to port to Python, but we have no way to integrate these ports with the overall system while using preexisting C++ registrations otherwise. This PR changes that by introducing a Python dispatcher (which can have its own kernels directly in Python), which can be interposed over ordinary C++ dispatch. The ingredients: We introduce a new PythonDispatcher dispatch key that has the same tenor as FuncTorchDynamicLayerFrontMode: it works by getting triggered before every other dispatch key in the dispatch key set, and shunting to a Python implementation. The Python dispatcher is a per-interpreter global object that is enabled/disabled via the guard EnablePythonDispatcher/DisablePythonDispatcher. We don't make it compositional as I have no idea what a compositional version of this feature would look like. Because it is global, we don't need to memory manage it and so I use a simpler SafePyHandle (newly added) to control access to this pointer from non-Python C++. Like __torch_dispatch__, we use PyInterpreter to get to the Python interpreter to handle the dispatch. I need to reimplement dispatch table computation logic in Python. To do this, I expose a lot more helper functions for doing computations on alias dispatch keys and similar. I also improve the pybind11 handling for DispatchKey so that you can either accept the pybind11 bound enum or a string; this simplifies our binding code. See https://github.com/pybind/pybind11/issues/483#issuecomment-1237418106 for how this works; the technique is generally useful. I need to be able to call backend fallbacks. 
I do this by permitting you to call at a dispatch key which doesn't have a kernel for the operator; if the kernel doesn't exist, we check the backend fallback table instead. Signed-off-by: Edward Z. Yang <ezyang@fb.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/84826 Approved by: https://github.com/ezyang	2022-09-14 06:57:19 +00:00
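The fallback rule at the end of this commit message (call at a dispatch key; if the operator has no kernel there, consult the backend fallback table) amounts to a two-level lookup. The sketch below models it with plain dictionaries; `lookup_kernel`, both tables, and the key strings are hypothetical and do not reflect the real dispatcher data structures.

```python
def lookup_kernel(op_kernels, backend_fallbacks, dispatch_key):
    """Return the operator's kernel for dispatch_key, else the backend fallback."""
    kernel = op_kernels.get(dispatch_key)
    if kernel is not None:
        return kernel
    fallback = backend_fallbacks.get(dispatch_key)
    if fallback is not None:
        return fallback
    raise NotImplementedError("no kernel or fallback for " + dispatch_key)


# Hypothetical per-operator table and global fallback table.
op_kernels = {"CPU": lambda x: "cpu kernel: %s" % x}
backend_fallbacks = {"Python": lambda x: "python fallback: %s" % x}

print(lookup_kernel(op_kernels, backend_fallbacks, "CPU")(1))     # cpu kernel: 1
print(lookup_kernel(op_kernels, backend_fallbacks, "Python")(1))  # python fallback: 1
```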
Edward Z. Yang	7e900f204f	Avoid throwing an exception when ScriptList doesn't match. (#84921 ) This prevents 'catch throw' gdb breakpoint pollution and should also improve performance. Signed-off-by: Edward Z. Yang <ezyang@fb.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/84921 Approved by: https://github.com/Chillee	2022-09-13 14:40:01 +00:00
Edward Z. Yang	7a9ab5c232	Move Python argument related functions to cpp file (#84919 ) No changes to contents, just moving things out of header. I only moved the stuff I suspected I'd be editing; maybe more things from this header could migrate out. Signed-off-by: Edward Z. Yang <ezyang@fb.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/84919 Approved by: https://github.com/suo	2022-09-13 07:22:23 +00:00
Wenzhe Xue	a2cccb2d6b	add oneDNN graph fuser context API and unittest (#82491 ) ### Description Add a oneDNN graph context manager API to be consistent with other fusers. NNC and nvFuser can be used in two ways: 1) a function to enable/disable and 2) a context manager. The latter is used extensively in libraries like Dynamo. Currently the oneDNN Graph fuser only supports the former. To promote the usage of the oneDNN graph fuser, this PR creates a context manager for it. This PR should not affect performance. ### Testing A unit test `test_context_manager` is added under `test/test_jit_llga_fuser.py` Pull Request resolved: https://github.com/pytorch/pytorch/pull/82491 Approved by: https://github.com/malfet	2022-09-12 20:09:00 +00:00
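The two usage styles this commit contrasts, a function that enables or disables a fuser and a context manager layered on top of it, follow a common Python pattern. The sketch below shows that pattern with a module-level flag; `enable_fuser`, `is_fuser_enabled`, and `fuser` are hypothetical names used for illustration, not the oneDNN Graph API.

```python
import contextlib

_fuser_enabled = False  # hypothetical global switch


def is_fuser_enabled():
    return _fuser_enabled


def enable_fuser(enabled):
    """Style 1: a plain function; returns the previous setting."""
    global _fuser_enabled
    previous = _fuser_enabled
    _fuser_enabled = enabled
    return previous


@contextlib.contextmanager
def fuser(enabled=True):
    """Style 2: a context manager that restores the previous setting on exit."""
    previous = enable_fuser(enabled)
    try:
        yield
    finally:
        enable_fuser(previous)


with fuser(True):
    print(is_fuser_enabled())  # True
print(is_fuser_enabled())      # False
```

The `try`/`finally` restores the earlier setting even if the body raises, which is the main advantage over calling the enable/disable function by hand.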
Peter Bell	2feb31cb26	Improve torch::jit::as_{module,object} performance (#84399 ) This caches the import of `torch.jit.ScriptModule`, `torch.ScriptObject` and `torch.jit.RecursiveScriptClass`. I measure a ~0.8 us performance uplift locally when calling a `torch.ops` function with a `ScriptObject` argument. Pull Request resolved: https://github.com/pytorch/pytorch/pull/84399 Approved by: https://github.com/ezyang	2022-09-07 16:58:28 +00:00
Peter Bell	f125bd2cbb	Support torch.ScriptObject in torch::jit::as_object (#84398 ) When a torchbind class is returned from an operator, it has the class `torch.ScriptObject`, yet the `torch.ops` interface checks against `torch.jit.RecursiveScriptClass` or else falls back to a much slower path that doesn't return the original c++ object. On my machine I see a 2 us performance improvement when calling a `torch.ops` function with a `ScriptObject` argument. Pull Request resolved: https://github.com/pytorch/pytorch/pull/84398 Approved by: https://github.com/ezyang	2022-09-06 15:00:52 +00:00
Edward Z. Yang	2a332afbf4	Add SymFloat, support SymInt to SymFloat conversion (#84284 ) Signed-off-by: Edward Z. Yang <ezyang@fb.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/84284 Approved by: https://github.com/albanD	2022-09-03 01:30:32 +00:00
Elias Ellison	97b2dff600	Add Initial Support For Fake Tensor Constant Tracking (#84387 ) Adds support for constant tensor tracking within FakeTensors. Copy-pasta'ing from `proxy_tensor.py` why this is useful: ``` # In some circumstances, we will be tracing in a situation where a tensor # is statically known to be a constant (currently, this only happens if # you run torch.tensor; deterministic factory functions like torch.arange # don't get this treatment). When the tensor in question is small, it's # helpful to due constant propagation in case we call item() (in which # case we can return the constant value that is known, rather than give # an error.) ``` This PR only attempts to add support for the tracing scenarios where we run each operation linearly - aot autograd, torchdynamo. It does not yet handle how constant tensors should be handled as part of the persistent fx graph. Additionally, it does not yet attempt to de-duplicate or interact with ProxyMode's only constant tensor handling. Edit: plan is to rely on functionalization for fx graph Pull Request resolved: https://github.com/pytorch/pytorch/pull/84387 Approved by: https://github.com/ezyang	2022-09-02 02:43:04 +00:00
Horace He	6a3ecda5a2	Started storing faketensor/symbolic shape metadata on FX nodes in make_fx (#84114 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/84114 Approved by: https://github.com/SherlockNoMad	2022-08-31 04:39:48 +00:00
Edward Z. Yang	ad44670fa1	Back out "Revert D38984222: Don't introduce new overload for SymInt (#83628 )" (#84173 ) Also Back out "Revert D39075159: [acc_tensor] Use SymIntArrayRef for overloaded empty.memory_format's signature" Original commit changeset: dab4a9dba4fa Original commit changeset: dcaf16c037a9 Original Phabricator Diff: D38984222 Original Phabricator Diff: D39075159 Also update Metal registrations for C++ registration changes. Also update NNPI registration to account for tightened schema checking Differential Revision: [D39084762](https://our.internmc.facebook.com/intern/diff/D39084762/) NOTE FOR REVIEWERS: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D39084762/)! Pull Request resolved: https://github.com/pytorch/pytorch/pull/84173 Approved by: https://github.com/Krovatkin	2022-08-29 18:01:07 +00:00
Kimish Patel	cfd18e105f	[Pytorch][Ondevice quantization] Add device side API to convert model (#83807 ) Summary: This diff adds a device-side API which will convert the model to its quantized equivalent. The input model must have been prepared AOT for quantization. The API is implemented by: - Running reset observers - Running observe method - Running quantize method - And replacing method, e.g. forward, with its quantized equivalent. Test Plan: test/quantization/jit/test_ondevice_quantization.py Reviewers: Subscribers: Tasks: Tags: Differential Revision: [D38889818](https://our.internmc.facebook.com/intern/diff/D38889818) Pull Request resolved: https://github.com/pytorch/pytorch/pull/83807 Approved by: https://github.com/iseeyuan	2022-08-29 17:57:38 +00:00
Kimish Patel	5c7e801c50	[pytorch][on device quant] Finalize method for ondevice quant (#83571 ) Summary: After inserting quant dequant nodes in the graph, we need to: 1. Insert packed param creation and quantized op 2. Create packed_params attribute in the top module. For this we need a graph that is inlined except for calculate_qparams method calls. But they can be inlined too. So perhaps we need to make sure no other callmethods exist. 3. Insert SetAttr for the packed param 4. Insert GetAttr for the packed param 5. Use GetAttr output for quantized op where applicable, e.g. linear_dynamic The above is added to the quantize_<method-name> method created in the previous step. Once the above steps are done, clone the method into quantized_<method-name>. Modify quantize_<method-name>: 1. Remove all outputs from the method. 2. Run dce 3. Remove all inputs from the method except self. Modify quantized_<method-name>: 1. Remove all packed_param setAttr nodes. 2. Run dce. This should result in removal of all nodes that generate packed param. Test Plan: To be written Reviewers: Subscribers: Tasks: Tags: Differential Revision: [D38771416](https://our.internmc.facebook.com/intern/diff/D38771416) Pull Request resolved: https://github.com/pytorch/pytorch/pull/83571 Approved by: https://github.com/jerryzh168	2022-08-29 17:53:11 +00:00
Kimish Patel	446afb5f9f	[On Device Quantization][pytorch]Make insert_quant_dequant support ondevice ptq (#83570 ) Summary: This diff adds a way to: - clone previously observed method - Add calls to observer's calculate_qparams methods - Extract the scale and zero point - Use them to insert quant dequant nodes Now for the forward method we have - observe_forward - quantize_forward observe_forward is used post training to observe statistics. In the case of dynamic PTQ this requires just running that method once to update weight observer statistics. The quantize_forward method will use the observer statistics to calculate quantization parameters and apply them to the quant dequant ops. Subsequent diffs will replace dequant + op with their quantized op counterparts and replace quantize ops with the relevant packed params class where possible Test Plan: To be written Reviewers: Subscribers: Tasks: Tags: Differential Revision: [D38771419](https://our.internmc.facebook.com/intern/diff/D38771419) Pull Request resolved: https://github.com/pytorch/pytorch/pull/83570 Approved by: https://github.com/jerryzh168	2022-08-29 17:51:00 +00:00
Kimish Patel	9189edb3b3	[Quantization][Pytorch] On device quantization support part 1 (#83568 ) Summary: To support on-device quantization, this diff introduces observer insertion. Specifically, observers are inserted by adding a new method with the prefix observe_. The intent is that, post training, this method will be run to record statistics Test Plan: test_ondevice_quantization.py Reviewers: Subscribers: Tasks: Tags: Differential Revision: [D38771417](https://our.internmc.facebook.com/intern/diff/D38771417) Pull Request resolved: https://github.com/pytorch/pytorch/pull/83568 Approved by: https://github.com/jerryzh168	2022-08-29 17:22:30 +00:00
Ivan Yashchuk	3aae6ff1e1	Add nvprims.var_mean (#83508)

This PR adds nvfuser-specific primitive - `var_mean`. Interpretation `torch.var_mean` -> `torch.ops.nvprims.var_mean` is handled by `TorchRefsNvfuserCapabilityMode` context manager. I moved some helper code from `_prims/__init__.py` to `_prims_common`. Correctness is tested with OpInfo tests (see `PythonRefInfo("ops.nvprims.var_mean"`). Layer norm reference now uses `torch.var_mean` instead of `torch._refs.var_mean` to allow interception.

Here's a simple comparison of performance with this PR and master (on 3080ti):

```py
import torch
from torch._prims.context import TorchRefsNvfuserCapabilityMode
from torch.fx.experimental.proxy_tensor import make_fx
from torch._prims.executor import execute

def func(a):
    return torch.native_layer_norm(a, (1024,), None, None, 1e-6)

a = torch.randn(10, 512, 1024, dtype=torch.float16, device="cuda")

with TorchRefsNvfuserCapabilityMode():
    gm = make_fx(func)(a)

for _ in range(10):
    execute(gm, a, executor="strictly_nvfuser")
```

run with `PYTORCH_NVFUSER_DUMP=dump_eff_bandwidth python script.py`

```py
# WITH THIS PR
# kernel1 run in 0.032768 ms, achieved: 641.25 GB/s
# kernel1 run in 0.033792 ms, achieved: 621.818 GB/s
# kernel1 run in 0.032768 ms, achieved: 641.25 GB/s
# kernel1 run in 0.032608 ms, achieved: 644.396 GB/s
# kernel1 run in 0.031744 ms, achieved: 661.935 GB/s
# kernel1 run in 0.031744 ms, achieved: 661.935 GB/s
# kernel1 run in 0.032768 ms, achieved: 641.25 GB/s
# kernel1 run in 0.03072 ms, achieved: 684 GB/s
# kernel1 run in 0.031744 ms, achieved: 661.935 GB/s
# kernel1 run in 0.031744 ms, achieved: 661.935 GB/s

# ON MASTER
# kernel1 run in 0.05632 ms, achieved: 373.091 GB/s
# kernel1 run in 0.044032 ms, achieved: 477.209 GB/s
# kernel1 run in 0.044032 ms, achieved: 477.209 GB/s
# kernel1 run in 0.044032 ms, achieved: 477.209 GB/s
# kernel1 run in 0.043808 ms, achieved: 479.649 GB/s
# kernel1 run in 0.043008 ms, achieved: 488.571 GB/s
# kernel1 run in 0.044032 ms, achieved: 477.209 GB/s
# kernel1 run in 0.043008 ms, achieved: 488.571 GB/s
# kernel1 run in 0.043008 ms, achieved: 488.571 GB/s
# kernel1 run in 0.043008 ms, achieved: 488.571 GB/s
```

So this PR gives about 35% improvement in performance using nvfuser executor with this specific normalized shape.

Also this PR fixes https://github.com/pytorch/pytorch/issues/83506 (see the change in `torch/csrc/jit/python/pybind_utils.cpp`).

Ref. https://github.com/pytorch/pytorch/issues/80187

Pull Request resolved: https://github.com/pytorch/pytorch/pull/83508
Approved by: https://github.com/ngimel	2022-08-28 18:45:25 +00:00
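The "about 35%" figure can be sanity-checked against the bandwidth numbers quoted in the commit message above. The following is a minimal sketch, assuming the first iteration of each run is warm-up and should be dropped; that choice is an assumption, not something the commit states, and the helper names `mean_after_warmup` and `relative_gain` are hypothetical.

```python
from statistics import mean

# Effective bandwidth (GB/s) per iteration, copied from the commit message.
with_pr = [641.25, 621.818, 641.25, 644.396, 661.935,
           661.935, 641.25, 684.0, 661.935, 661.935]
on_master = [373.091, 477.209, 477.209, 477.209, 479.649,
             488.571, 477.209, 488.571, 488.571, 488.571]


def mean_after_warmup(samples, warmup=1):
    """Average the samples after skipping `warmup` leading iterations.

    Treating the first iteration as warm-up is an assumption of this sketch.
    """
    return mean(samples[warmup:])


def relative_gain(new, old, warmup=1):
    """Fractional bandwidth gain of `new` over `old`."""
    return mean_after_warmup(new, warmup) / mean_after_warmup(old, warmup) - 1.0


gain = relative_gain(with_pr, on_master)
print(f"bandwidth gain after warm-up: {gain:.1%}")
```

Keeping the first iteration (`warmup=0`) gives a somewhat larger gain, because the first run on master is noticeably slower than the rest.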
PyTorch MergeBot	b159a5230f	Revert "Add nvprims.var_mean (#83508)" This reverts commit `7e7694b661`. Reverted https://github.com/pytorch/pytorch/pull/83508 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally	2022-08-28 11:30:27 +00:00
Ivan Yashchuk	7e7694b661	Add nvprims.var_mean (#83508)

This PR adds nvfuser-specific primitive - `var_mean`. Interpretation `torch.var_mean` -> `torch.ops.nvprims.var_mean` is handled by `TorchRefsNvfuserCapabilityMode` context manager. I moved some helper code from `_prims/__init__.py` to `_prims_common`. Correctness is tested with OpInfo tests (see `PythonRefInfo("ops.nvprims.var_mean"`). Layer norm reference now uses `torch.var_mean` instead of `torch._refs.var_mean` to allow interception.

Here's a simple comparison of performance with this PR and master (on 3080ti):

```py
import torch
from torch._prims.context import TorchRefsNvfuserCapabilityMode
from torch.fx.experimental.proxy_tensor import make_fx
from torch._prims.executor import execute

def func(a):
    return torch.native_layer_norm(a, (1024,), None, None, 1e-6)

a = torch.randn(10, 512, 1024, dtype=torch.float16, device="cuda")

with TorchRefsNvfuserCapabilityMode():
    gm = make_fx(func)(a)

for _ in range(10):
    execute(gm, a, executor="strictly_nvfuser")
```

run with `PYTORCH_NVFUSER_DUMP=dump_eff_bandwidth python script.py`

```py
# WITH THIS PR
# kernel1 run in 0.032768 ms, achieved: 641.25 GB/s
# kernel1 run in 0.033792 ms, achieved: 621.818 GB/s
# kernel1 run in 0.032768 ms, achieved: 641.25 GB/s
# kernel1 run in 0.032608 ms, achieved: 644.396 GB/s
# kernel1 run in 0.031744 ms, achieved: 661.935 GB/s
# kernel1 run in 0.031744 ms, achieved: 661.935 GB/s
# kernel1 run in 0.032768 ms, achieved: 641.25 GB/s
# kernel1 run in 0.03072 ms, achieved: 684 GB/s
# kernel1 run in 0.031744 ms, achieved: 661.935 GB/s
# kernel1 run in 0.031744 ms, achieved: 661.935 GB/s

# ON MASTER
# kernel1 run in 0.05632 ms, achieved: 373.091 GB/s
# kernel1 run in 0.044032 ms, achieved: 477.209 GB/s
# kernel1 run in 0.044032 ms, achieved: 477.209 GB/s
# kernel1 run in 0.044032 ms, achieved: 477.209 GB/s
# kernel1 run in 0.043808 ms, achieved: 479.649 GB/s
# kernel1 run in 0.043008 ms, achieved: 488.571 GB/s
# kernel1 run in 0.044032 ms, achieved: 477.209 GB/s
# kernel1 run in 0.043008 ms, achieved: 488.571 GB/s
# kernel1 run in 0.043008 ms, achieved: 488.571 GB/s
# kernel1 run in 0.043008 ms, achieved: 488.571 GB/s
```

So this PR gives about 35% improvement in performance using nvfuser executor with this specific normalized shape.

Also this PR fixes https://github.com/pytorch/pytorch/issues/83506 (see the change in `torch/csrc/jit/python/pybind_utils.cpp`).

Ref. https://github.com/pytorch/pytorch/issues/80187

Pull Request resolved: https://github.com/pytorch/pytorch/pull/83508
Approved by: https://github.com/ngimel	2022-08-27 09:05:20 +00:00
PyTorch MergeBot	c7edcd6968	Revert "Don't introduce new overload for SymInt (#83628)" This reverts commit `9790d90e4b`. Reverted https://github.com/pytorch/pytorch/pull/83628 on behalf of https://github.com/malfet due to Breaks internal builds, see D39076487	2022-08-27 01:23:17 +00:00
Edward Z. Yang	9790d90e4b	Don't introduce new overload for SymInt (#83628)

Previously, we introduced new SymInt overloads for every function we wanted. This led to a lot of boilerplate, and also a lot of confusion about how the overloads needed to be implemented.

This PR takes a simpler but more risky approach: just take the original function and change its ints to SymInts.

This is BC-breaking in the following ways:

* The C++ API for registering implementations for aten operators will change from int64_t to SymInt whenever you make this change. Code generated registrations in PyTorch do not change as codegen handles the translation automatically, but manual registrations will need to follow the change. Typically, if you now accept a SymInt where you previously only took int64_t, you have to convert it back manually. This will definitely break XLA, see companion PR https://github.com/pytorch/xla/pull/3914 Note that not all dispatch keys get the automatic translation; all the composite keys and Meta keys are modified to take SymInt directly (because they should handle them directly), and so there are adjustments for this.

This is not BC-breaking in the following ways:

* The user facing C++ API remains compatible. Even if a function changes from int to SymInt, the default C++ binding still takes only ints (e.g., at::empty(IntArrayRef, ...)). To call with SymInts, you must call at::empty_symint instead. This involved adding two more signatures to CppSignatureGroup; in many cases I refactored code to iterate over all signatures in the group instead of hard-coding the two that previously existed.
* This is TorchScript compatible; internally we treat SymInts as ints so there is no change to what happens at runtime in TorchScript. In particular, it's OK to reference an empty schema by its old type (using int types), as long as you're not doing string equality (which you shouldn't be), these parse to the same underlying type.

Structure of the PR:

* The general strategy of this PR is that, even when you write `SymInt` inside `native_functions.yaml`, sometimes, we will treat it as if it were an `int`. This idea pervades the codegen changes, where we have a translation from SymInt to c10::SymInt or int64_t, and this is controlled by a symint kwarg which I added and then audited all call sites to decide which I wanted. Here are some of the major places where we pick one or the other:
  * The C++ FunctionSchema representation represents `SymInt` as `int`. There are a few places we do need to know that we actually have a SymInt and we consult `real_type()` to get the real type in this case. In particular:
    * When we do schema validation of C++ operator registration, we must compare against true schema (as the C++ API will provide `c10::SymInt`, and this will only be accepted if the schema is `SymInt`). This is handled with cloneWithRealTypes before we check for schema differences.
    * In `toIValue` argument parsing, we parse against the true schema value. For backwards compatibility reasons, I do still accept ints in many places where Layout/SymInt/etc were expected. (Well, accepting int where SymInt is expected is not BC, it's just the right logic!)
  * In particular, because SymInt never shows up as type() in FunctionSchema, this means that we no longer need a dedicated Tag::SymInt. This is good, because SymInts never show up in mobile anyway.
* Changes to functorch/aten are mostly about tracking changes to the C++ API registration convention. Additionally, since SymInt overloads no longer exist, registrations for SymInt implementations are deleted. In many cases, the old implementations did not properly support SymInts; I did not add any new functionality with this PR, but I did try to annotate with TODOs where there is work to do. Finally, because the signature of `native::` API changed from int to SymInt, I needed to find alternative APIs for people who were calling these functions directly. Typically, I insert a new dispatch call when perf doesn't matter, or use `at::compositeexplicitautograd` namespace to handle other cases.
* The change to `make_boxed_from_unboxed_functor.h` is so that we accept a plain IntList IValue anywhere a SymIntList is expected; these are read-only arguments so covariant typing is OK.
* I change how unboxing logic works slightly. Previously, we interpreted the C++ type for Layout/etc directly as IntType JIT type, which worked well because the incoming IValue is tagged as an integer. Now, we interpret the C++ type for Layout as its true type, e.g., LayoutType (change to `jit_type.h`), but then we accept an int IValue for it anyway. This makes it symmetric with SymInt, where we interpret the C++ type as SymIntType, and then accept SymInt and int IValues for it.
* I renamed the `empty.names` overload to `empty_names` to make it less confusing (I kept mixing it up with the real empty overload).
* I deleted the `empty.SymInt` overload, which ended up killing a pile of functions. (This was originally a separate PR but the profiler expect test was giving me grief so I folded it in.)
* I deleted the LazyDynamicOpsTest tests. These were failing after these changes, and I couldn't figure out why they used to be passing: they make use of `narrow_copy` which didn't actually support SymInts; they were immediately converted to ints.
* I bashed LTC into working. The patches made here are not the end of the story. The big problem is that SymInt translates into Value, but what if you have a list of SymInt? This cannot be conveniently represented in the IR today, since variadic Values are not supported. To work around this, I translate SymInt[] into plain int[] (this is fine for tests because LTC dynamic shapes never actually worked); but this will need to be fixed for proper LTC SymInt support. The LTC codegen also looked somewhat questionable; I added comments based on my code reading.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83628
Approved by: https://github.com/albanD, https://github.com/bdhirsh	2022-08-26 01:35:40 +00:00
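The commit message above says that a schema written with `int` and one written with `SymInt` parse to the same underlying type, while the true type stays available through `real_type()`. A toy model can illustrate that idea. This is a sketch only: `ToyArgument` and `parse_toy_schema` are hypothetical names, the grammar is deliberately simplistic, and none of it uses or mirrors the actual PyTorch implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyArgument:
    """Hypothetical stand-in for a schema argument (not a PyTorch class)."""
    name: str
    declared: str  # type exactly as written in the schema string

    def type(self) -> str:
        # Present SymInt as int, as the commit message describes.
        return self.declared.replace("SymInt", "int")

    def real_type(self) -> str:
        # Keep the declared type available for callers that need it.
        return self.declared


def parse_toy_schema(schema: str) -> list:
    """Parse 'name(Type a, Type b)' into ToyArgument objects (toy grammar, no defaults)."""
    arg_text = schema[schema.index("(") + 1 : schema.rindex(")")]
    args = []
    for piece in arg_text.split(","):
        declared, name = piece.split()
        args.append(ToyArgument(name=name, declared=declared))
    return args


old = parse_toy_schema("empty(int[] size)")
new = parse_toy_schema("empty(SymInt[] size)")

# The schema strings differ, but the presented types agree.
same_presented_types = [a.type() for a in old] == [a.type() for a in new]
print(same_presented_types)
print(new[0].real_type())
```

In this model, comparing the raw schema strings would report a mismatch while comparing presented types would not, which is the reason the commit message warns against string equality on schemas.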
PyTorch MergeBot	a7edf71360	Revert "Don't introduce new overload for SymInt (#83628)" This reverts commit `8fae7027b3`. Reverted https://github.com/pytorch/pytorch/pull/83628 on behalf of https://github.com/malfet due to breaking internal builds, see https://www.internalfb.com/diff/D38984222	2022-08-25 00:49:40 +00:00
Edward Z. Yang	8fae7027b3	Don't introduce new overload for SymInt (#83628)

Previously, we introduced new SymInt overloads for every function we wanted. This led to a lot of boilerplate, and also a lot of confusion about how the overloads needed to be implemented.

This PR takes a simpler but more risky approach: just take the original function and change its ints to SymInts.

This is BC-breaking in the following ways:

* The C++ API for registering implementations for aten operators will change from int64_t to SymInt whenever you make this change. Code generated registrations in PyTorch do not change as codegen handles the translation automatically, but manual registrations will need to follow the change. Typically, if you now accept a SymInt where you previously only took int64_t, you have to convert it back manually. This will definitely break XLA, see companion PR https://github.com/pytorch/xla/pull/3914 Note that not all dispatch keys get the automatic translation; all the composite keys and Meta keys are modified to take SymInt directly (because they should handle them directly), and so there are adjustments for this.

This is not BC-breaking in the following ways:

* The user facing C++ API remains compatible. Even if a function changes from int to SymInt, the default C++ binding still takes only ints (e.g., at::empty(IntArrayRef, ...)). To call with SymInts, you must call at::empty_symint instead. This involved adding two more signatures to CppSignatureGroup; in many cases I refactored code to iterate over all signatures in the group instead of hard-coding the two that previously existed.
* This is TorchScript compatible; internally we treat SymInts as ints so there is no change to what happens at runtime in TorchScript. In particular, it's OK to reference an empty schema by its old type (using int types), as long as you're not doing string equality (which you shouldn't be), these parse to the same underlying type.

Structure of the PR:

* The general strategy of this PR is that, even when you write `SymInt` inside `native_functions.yaml`, sometimes, we will treat it as if it were an `int`. This idea pervades the codegen changes, where we have a translation from SymInt to c10::SymInt or int64_t, and this is controlled by a symint kwarg which I added and then audited all call sites to decide which I wanted. Here are some of the major places where we pick one or the other:
  * The C++ FunctionSchema representation represents `SymInt` as `int`. There are a few places we do need to know that we actually have a SymInt and we consult `real_type()` to get the real type in this case. In particular:
    * When we do schema validation of C++ operator registration, we must compare against true schema (as the C++ API will provide `c10::SymInt`, and this will only be accepted if the schema is `SymInt`). This is handled with cloneWithRealTypes before we check for schema differences.
    * In `toIValue` argument parsing, we parse against the true schema value. For backwards compatibility reasons, I do still accept ints in many places where Layout/SymInt/etc were expected. (Well, accepting int where SymInt is expected is not BC, it's just the right logic!)
  * In particular, because SymInt never shows up as type() in FunctionSchema, this means that we no longer need a dedicated Tag::SymInt. This is good, because SymInts never show up in mobile anyway.
* Changes to functorch/aten are mostly about tracking changes to the C++ API registration convention. Additionally, since SymInt overloads no longer exist, registrations for SymInt implementations are deleted. In many cases, the old implementations did not properly support SymInts; I did not add any new functionality with this PR, but I did try to annotate with TODOs where there is work to do. Finally, because the signature of `native::` API changed from int to SymInt, I needed to find alternative APIs for people who were calling these functions directly. Typically, I insert a new dispatch call when perf doesn't matter, or use `at::compositeexplicitautograd` namespace to handle other cases.
* The change to `make_boxed_from_unboxed_functor.h` is so that we accept a plain IntList IValue anywhere a SymIntList is expected; these are read-only arguments so covariant typing is OK.
* I change how unboxing logic works slightly. Previously, we interpreted the C++ type for Layout/etc directly as IntType JIT type, which worked well because the incoming IValue is tagged as an integer. Now, we interpret the C++ type for Layout as its true type, e.g., LayoutType (change to `jit_type.h`), but then we accept an int IValue for it anyway. This makes it symmetric with SymInt, where we interpret the C++ type as SymIntType, and then accept SymInt and int IValues for it.
* I renamed the `empty.names` overload to `empty_names` to make it less confusing (I kept mixing it up with the real empty overload).
* I deleted the `empty.SymInt` overload, which ended up killing a pile of functions. (This was originally a separate PR but the profiler expect test was giving me grief so I folded it in.)
* I deleted the LazyDynamicOpsTest tests. These were failing after these changes, and I couldn't figure out why they used to be passing: they make use of `narrow_copy` which didn't actually support SymInts; they were immediately converted to ints.
* I bashed LTC into working. The patches made here are not the end of the story. The big problem is that SymInt translates into Value, but what if you have a list of SymInt? This cannot be conveniently represented in the IR today, since variadic Values are not supported. To work around this, I translate SymInt[] into plain int[] (this is fine for tests because LTC dynamic shapes never actually worked); but this will need to be fixed for proper LTC SymInt support. The LTC codegen also looked somewhat questionable; I added comments based on my code reading.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83628
Approved by: https://github.com/albanD, https://github.com/bdhirsh	2022-08-23 22:04:07 +00:00


690 Commits