Commit Graph

102 Commits

Author SHA1 Message Date
Horace He
12a19a4846 Made tracing of proxy symints lazy (#85185)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85185
Approved by: https://github.com/ezyang
2022-09-17 20:36:35 +00:00
lezcano
98b8ef99e1 Add refs for sinc and sgn (#85142)
This PR superseded https://github.com/pytorch/pytorch/pull/80171

This does not add the ref for `special.sinc` as I was getting some
errors. This should be added to https://github.com/pytorch/pytorch/pull/84957
(cc @nkaretnikov)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85142
Approved by: https://github.com/ngimel, https://github.com/mruberry
2022-09-17 06:09:13 +00:00
Horace He
377b5d6f8b Added additional simplifications/caching for replacements and divisibility (#84918)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84918
Approved by: https://github.com/ezyang
2022-09-17 01:33:48 +00:00
Natalia Gimelshein
6162a04364 fix half_to_float arg in *softmax decomp (#85120)
Fixes https://github.com/pytorch/torchdynamo/issues/1239

Pull Request resolved: https://github.com/pytorch/pytorch/pull/85120
Approved by: https://github.com/Chillee
2022-09-16 15:54:50 +00:00
Horace He
4bdc0af53d Added support for symbolic is_contiguous (#84829)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84829
Approved by: https://github.com/ezyang
2022-09-16 04:54:01 +00:00
Edward Z. Yang
00ce302c07 Performance optimizations to proxy tensor (#85049)
- Lazily allocate FX nodes for size/stride accessors on proxy tensor
- Properly track derived computations on strides/numel/etc
- Remove unnecessary tree_map at end of proxy tensor trace checking
  invariants; we will just have to be smart (it's too expensive)
- Avoid tree_map in sym proxy tracing

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85049
Approved by: https://github.com/wconstab
2022-09-16 00:28:50 +00:00
Michael Voznesensky
8ca1839d32 Python Dispatcher integration with C++ dispatcher (#85050)
#84826 but without ghstack
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85050
Approved by: https://github.com/malfet
2022-09-15 00:43:36 +00:00
Edward Z. Yang
ccade9410f Don't detach when making views; force caller to detach (#84893)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84893
Approved by: https://github.com/soulitzer, https://github.com/SherlockNoMad
2022-09-14 22:32:45 +00:00
PyTorch MergeBot
706b990306 Revert "Python Dispatcher integration with C++ dispatcher (#84826)"
This reverts commit 35f6a69191.

Reverted https://github.com/pytorch/pytorch/pull/84826 on behalf of https://github.com/malfet due to Broke dynamo, see 35f6a69191
2022-09-14 14:07:58 +00:00
Michael Voznesensky
35f6a69191 Python Dispatcher integration with C++ dispatcher (#84826)
Signed-off-by: Edward Z. Yang <ezyangfb.com>

From @ezyang's original PR:

There are a number of situations where we have non-backend kernels (e.g., CompositeImplicitAutograd, batching rules) which we would like to port to Python, but we have no way to integrate these ports with the overall system while using preexisting C++ registrations otherwise. This PR changes that by introducing a Python dispatcher (which can have its own kernels directly in Python), which can be interpose over ordinary C++ dispatch. The ingredients:

We introduce a new PythonDispatcher dispatch key, that has the same tenor as FuncTorchDynamicLayerFrontMode: it works by getting triggered before every other dispatch key in the dispatch key, and shunting to a Python implementation
The Python dispatcher is a per-interpreter global object that is enabled/disabled via the guard EnablePythonDispatcher/DisablePythonDispatcher. We don't make it compositional as I have no idea what a compositional version of this feature would look like. Because it is global, we don't need to memory manage it and so I use a simpler SafePyHandle (newly added) to control access to this pointer from non-Python C++. Like __torch_dispatch__, we use PyInterpreter to get to the Python interpreter to handle the dispatch.
I need to reimplement dispatch table computation logic in Python. To do this, I expose a lot more helper functions for doing computations on alias dispatch keys and similar. I also improve the pybind11 handling for DispatchKey so that you can either accept the pybind11 bound enum or a string; this simplifies our binding code. See https://github.com/pybind/pybind11/issues/483#issuecomment-1237418106 for how this works; the technique is generally useful.

I need to be able to call backend fallbacks. I do this by permitting you to call at a dispatch key which doesn't have a kernel for the operator; if the kernel doesn't exist, we check the backend fallback table instead.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84826
Approved by: https://github.com/ezyang
2022-09-14 06:57:19 +00:00
PyTorch MergeBot
8ca057eb71 Revert "Don't detach when making views; force caller to detach (#84893)"
This reverts commit 3bb8d6a93c.

Reverted https://github.com/pytorch/pytorch/pull/84893 on behalf of https://github.com/malfet due to Broke MPS, see 3bb8d6a93c
2022-09-14 01:09:04 +00:00
Edward Z. Yang
3bb8d6a93c Don't detach when making views; force caller to detach (#84893)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84893
Approved by: https://github.com/soulitzer, https://github.com/SherlockNoMad
2022-09-13 23:31:21 +00:00
kshitij12345
4f6027b78a [opinfo] narrow: add new sample for Tensor overload (#84785)
`narrow` accepts `start` argument to be a Tensor. We add a sample to test this overload.

NOTE: This leads to a bunch of failed tests and hence the skips and xfails
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84785
Approved by: https://github.com/zou3519
2022-09-12 16:59:08 +00:00
Edward Z. Yang
c5a8946e40 Revert "Revert "Redo how custom/python_custom methods on TensorImpl work (#84796)" (#84806)
This reverts commit ca3b2bfbe3.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84806
Approved by: https://github.com/Chillee
2022-09-10 06:17:35 +00:00
Eli Uriegas
ca3b2bfbe3 Revert "Redo how custom/python_custom methods on TensorImpl work (#84796)
This reverts commit 591b75bf98.

Manual revert of https://github.com/pytorch/pytorch/pull/84641

Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/84796
Approved by: https://github.com/izaitsevfb
2022-09-10 00:18:13 +00:00
Ivan Yashchuk
01c54ad6de Remove deprecated torch.eig (#70982)
The time has come to remove deprecated linear algebra related functions. This PR removes `torch.eig`.

cc @jianyuh @nikitaved @pearu @mruberry @walterddr @IvanYashchuk @xwang233 @Lezcano
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70982
Approved by: https://github.com/Lezcano, https://github.com/malfet
2022-09-09 21:31:57 +00:00
Edward Z. Yang
591b75bf98 Redo how custom/python_custom methods on TensorImpl work (#84641)
A longstanding confusion in the implementation of fake tensor and proxy tensor is what to do about torch.ops.aten.sym_sizes and related calls. In particular, when you have a tensor that (1) has symbolic shapes and (2) has a `__torch_dispatch__` call, previously, you would always get `__torch_dispatch__` calls for sizes/strides query, *even if you didn't request it* via the dispatch kwargs in `make_wrapper_subclass`.

The reason for this is because we were previously mixing several concepts: "I want to dispatch to Python", "I want to call a virtual method" and "I have dynamic shapes". A single boolean variable controlled all of these things, and so it was not possible to understand inside TensorImpl what the user had actually originally requested.

In this PR, we track each of these concepts individually so that we can preserve user intent. Then, we combine these into a single "policy" variable that controls whether or not we can use the fastpath or not. For the policy to trigger, we only need one of the exceptional cases to be true.

Billing of changes:
* Rename `set_sizes_strides_policy` to `set_custom_sizes_strides`; in general, you cannot DIRECTLY set policy; you have to indirectly set it by the public functions.
* Some helpers for sizes and strides, since it's more complicated (as it is an enum, rather than just bools as is the case for device and layout). `matches_python_custom` is used to test the Python dispatch user ask. `matches_policy` does the policy test (only used in the user facing functions.)
* I reorged the accessor methods so that they are more logical. This makes the diff bad, so I recommend reading the final code directly.
* The default custom implementations now more reliably call their default() implementations
* As bonus refactor, I devirtualized some functions that don't need to be virtual
* `set_sym_sizes_and_strides` is renamed to `set_sizes_and_strides` to make it easier to use in template contexts; it optionally takes a storage offset now so you can set all three values at the same time. If you use the SymInt overload but there are no symbolic integers, we give you a normal resize.
* This adds `sym_storage_offset` since we had that in the symbolic shapes branch and there's no reason not to put it in (and it reduces merge conflicts)

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84641
Approved by: https://github.com/wconstab
2022-09-09 13:41:13 +00:00
Horace He
1459a909b4 Added mv, mm, and binary_cross_entropy_with_logits decomps (#84451)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84451
Approved by: https://github.com/ngimel
2022-09-08 17:56:18 +00:00
Richard Zou
a47bc96fb7 [composite compliance] fix linalg.eigvals (#84137)
linalg.eigvals fails in some cases with functorch and the root of the
problem is that it is not composite compliant.

In particular, checks that branch on whether or not a Tensor requires
grad do not work with functorch. In order to support functorch with
them, we have to include an additional "if the tensor is a Tensor
Subclass, then assume that it MAY require grad, so we must always go
through the differentiable path".

This PR also changes the batching rule for linalg.eigvals to be a
decomposition instead of what it was previously. What it was previously
was masking the error in functorch's test suite.

Unfortunately we don't comprehensive tests for this on the functorch
side which is why this was not caught before. I'll look into why
that is in the future; it's a bit complicated.

Test Plan:
- wait for tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84137
Approved by: https://github.com/Lezcano, https://github.com/IvanYashchuk, https://github.com/samdow
2022-09-07 14:29:21 +00:00
Richard Zou
c771d73461 [composite compliance] fix max_pool1d (#84127)
max_pool1d has a fast path for CPU tensors that do not require grad that
directly accesses the data_ptr. This PR makes the change that if the
input Tensor is a Tensor Subclass, then we want to walk through the
"slow path" of calling max_pool1d_with_indices.

Test Plan:
- wait for tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84127
Approved by: https://github.com/kshitij12345, https://github.com/samdow, https://github.com/malfet
2022-09-06 17:13:09 +00:00
Edward Z. Yang
2a332afbf4 Add SymFloat, support SymInt to SymFloat conversion (#84284)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84284
Approved by: https://github.com/albanD
2022-09-03 01:30:32 +00:00
Edward Z. Yang
d39490a711 Add meta function for repeat (#84349)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84349
Approved by: https://github.com/Krovatkin
2022-09-01 20:44:52 +00:00
Edward Z. Yang
cd96f3f676 Use register_meta for everything in meta_registrations (#84297)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84297
Approved by: https://github.com/Chillee
2022-08-31 23:58:24 +00:00
PyTorch MergeBot
65f98eb47d Revert "Add meta function for repeat (#84349)"
This reverts commit 44bc6db8f8.

Reverted https://github.com/pytorch/pytorch/pull/84349 on behalf of https://github.com/janeyx99 due to Land race with the revert causing test_fx failures 44bc6db8f8
2022-08-31 18:27:59 +00:00
Horace He
85931eaa6b Rename fake_result to val (#84331)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84331
Approved by: https://github.com/ezyang
2022-08-31 17:44:18 +00:00
Edward Z. Yang
44bc6db8f8 Add meta function for repeat (#84349)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84349
Approved by: https://github.com/Krovatkin
2022-08-31 17:20:21 +00:00
PyTorch MergeBot
14093b5979 Revert "Use register_meta for everything in meta_registrations (#84297)"
This reverts commit 8cd296f680.

Reverted https://github.com/pytorch/pytorch/pull/84297 on behalf of https://github.com/suo due to broke test_proxy_tensor on master
2022-08-31 16:32:24 +00:00
Horace He
a27a4a02fe Refactored proxytensor to clean up separate branches (#84325)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84325
Approved by: https://github.com/ezyang
2022-08-31 09:37:55 +00:00
Horace He
6a3ecda5a2 Started storing faketensor/symbolic shape metadata on FX nodes in make_fx (#84114)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84114
Approved by: https://github.com/SherlockNoMad
2022-08-31 04:39:48 +00:00
Edward Z. Yang
8cd296f680 Use register_meta for everything in meta_registrations (#84297)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84297
Approved by: https://github.com/Chillee
2022-08-31 02:13:21 +00:00
Nikolay Korovaiko
eda217ab67 Reland symint_numel (#84281)
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/84281
Approved by: https://github.com/ezyang
2022-08-30 21:53:34 +00:00
Nikolay Korovaiko
44a975335e Revert "Re-land sym_numel (#82374) (#82726) (#82731) (#82855)" (#84207)
This reverts commit bfebf254dd.

Differential Revision: [D39104562](https://our.internmc.facebook.com/intern/diff/D39104562)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84207
Approved by: https://github.com/robieta
2022-08-30 13:22:58 +00:00
Edward Z. Yang
ad44670fa1 Back out "Revert D38984222: Don't introduce new overload for SymInt (#83628)" (#84173)
Also Back out "Revert D39075159: [acc_tensor] Use SymIntArrayRef for overloaded empty.memory_format's signature"

Original commit changeset: dab4a9dba4fa
Original commit changeset: dcaf16c037a9

Original Phabricator Diff: D38984222
Original Phabricator Diff: D39075159

Also update Metal registrations for C++ registration changes.

Also update NNPI registration to account for tightened schema checking

Differential Revision: [D39084762](https://our.internmc.facebook.com/intern/diff/D39084762/)

**NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D39084762/)!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84173
Approved by: https://github.com/Krovatkin
2022-08-29 18:01:07 +00:00
PyTorch MergeBot
c7edcd6968 Revert "Don't introduce new overload for SymInt (#83628)"
This reverts commit 9790d90e4b.

Reverted https://github.com/pytorch/pytorch/pull/83628 on behalf of https://github.com/malfet due to Breaks internal builds, see D39076487
2022-08-27 01:23:17 +00:00
Edward Z. Yang
9790d90e4b Don't introduce new overload for SymInt (#83628)
Previously, we introduced new SymInt overloads for every function we wanted.  This led to a lot of boilerplate, and also a lot of confusion about how the overloads needed to be implemented.

This PR takes a simpler but more risky approach: just take the original function and changes its ints to SymInts.

This is BC-breaking in the following ways:

* The C++ API for registering implementations for aten operators will change from int64_t to SymInt whenever you make this change. Code generated registrations in PyTorch do not change as codegen handles the translation automatically, but manual registrations will need to follow the change.  Typically, if you now accept a SymInt where you previously only took int64_t, you have to convert it back manually.  This will definitely break XLA, see companion PR https://github.com/pytorch/xla/pull/3914 Note that not all dispatch keys get the automatic translation; all the composite keys and Meta keys are modified to take SymInt directly (because they should handle them directly), and so there are adjustments for this.

This is not BC-breaking in the following ways:

* The user facing C++ API remains compatible.  Even if a function changes from int to SymInt, the default C++ binding still takes only ints.  (e.g., at::empty(IntArrayRef, ...).  To call with SymInts, you must call at::empty_symint instead. This involved adding two more signatures to CppSignatureGroup; in many cases I refactored code to iterate over all signatures in the group instead of hard-coding the two that previously existed.
* This is TorchScript compatible; internally we treat SymInts as ints so there is no change to what happens at runtime in TorchScript. In particular, it's OK to reference an empty schema by its old type (using int types), as long as you're not doing string equality (which you shouldn't be), these parse to the same underyling type.

Structure of the PR:

* The general strategy of this PR is that, even when you write `SymInt` inside `native_functions.yaml`, sometimes, we will treat it *as if* it were an `int`. This idea pervades the codegen changes, where we have a translation from SymInt to c10::SymInt or int64_t, and this is controlled by a symint kwarg which I added and then audited all call sites to decide which I wanted. Here are some of the major places where we pick one or the other:
  * The C++ FunctionSchema representation represents `SymInt` as `int`. There are a few places we do need to know that we actually have a SymInt and we consult `real_type()` to get the real type in this case. In particular:
    * When we do schema validation of C++ operator registration, we must compare against true schema (as the C++ API will provide `c10::SymInt`, and this will only be accepted if the schema is `SymInt`. This is handled with cloneWithRealTypes before we check for schema differences.
    * In `toIValue` argument parsing, we parse against the true schema value. For backwards compatibility reasons, I do still accept ints in many places where Layout/SymInt/etc were expected. (Well, accepting int where SymInt is expected is not BC, it's just the right logic!)
  * In particular, because SymInt never shows up as type() in FunctionSchema, this means that we no longer need a dedicated Tag::SymInt. This is good, because SymInts never show up in mobile anyway.
* Changes to functorch/aten are mostly about tracking changes to the C++ API registration convention. Additionally, since SymInt overloads no longer exist, registrations for SymInt implementations are deleted. In many cases, the old implementations did not properly support SymInts; I did not add any new functionality with this PR, but I did try to annotate with TODOs where this is work to do. Finally, because the signature of `native::` API changed from int to SymInt, I need to find alternative APIs for people who were directly calling these functions to call. Typically, I insert a new dispatch call when perf doesn't matter, or use `at::compositeexplicitautograd` namespace to handle other caes.
* The change to `make_boxed_from_unboxed_functor.h` is so that we accept a plain IntList IValue anywhere a SymIntList is expected; these are read-only arguments so covariant typing is OK.
* I change how unboxing logic works slightly. Previously, we interpret the C++ type for Layout/etc directly as IntType JIT type, which works well because the incoming IValue is tagged as an integer. Now, we interpret the C++ type for Layout as its true type, e.g., LayoutType (change to `jit_type.h`), but then we accept an int IValue for it anyway. This makes it symmetric with SymInt, where we interpret the C++ type as SymIntType, and then accept SymInt and int IValues for it.
* I renamed the `empty.names` overload to `empty_names` to make it less confusing (I kept mixing it up with the real empty overload)
* I deleted the `empty.SymInt` overload, which ended up killing a pile of functions. (This was originally a separate PR but the profiler expect test was giving me grief so I folded it in.)
* I deleted the LazyDynamicOpsTest tests. These were failing after these changes, and I couldn't figure out why they used to be passing: they make use of `narrow_copy` which didn't actually support SymInts; they were immediately converted to ints.
* I bashed LTC into working. The patches made here are not the end of the story. The big problem is that SymInt translates into Value, but what if you have a list of SymInt? This cannot be conveniently represented in the IR today, since variadic Values are not supported. To work around this, I translate SymInt[] into plain int[] (this is fine for tests because LTC dynamic shapes never actually worked); but this will need to be fixed for proper LTC SymInt support. The LTC codegen also looked somewhat questionable; I added comments based on my code reading.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83628
Approved by: https://github.com/albanD, https://github.com/bdhirsh
2022-08-26 01:35:40 +00:00
Mario Lezcano
f5a3515083 Make linalg.inv composite of linalg.solve (#80074)
The `getri` kernel calls inside `getrs` so we can do so explicitly
ourselves and save ourselves from having to maintain an extra kernel.
This way we just need to optimise `lu_factor` and `lu_solve` and `inv`
will be as efficient as it can be, as it'll be choosing the best backend
to perform the factorisation and the best backend (not necessarily the
same) to perform the solve.

Fixes https://github.com/pytorch/pytorch/issues/77498

The benchmarks: https://github.com/pytorch/pytorch/pull/80074#issuecomment-1164309071
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80074
Approved by: https://github.com/IvanYashchuk, https://github.com/albanD, https://github.com/malfet
2022-08-25 09:28:55 +00:00
Horace He
e3c89d0778 Disable autocast cache during aotdispatch (#84035)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84035
Approved by: https://github.com/jansel
2022-08-25 08:58:50 +00:00
Horace He
8b8942b114 Fix dumb make_fx issue (#84011)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84011
Approved by: https://github.com/ezyang
2022-08-25 06:52:01 +00:00
PyTorch MergeBot
a7edf71360 Revert "Don't introduce new overload for SymInt (#83628)"
This reverts commit 8fae7027b3.

Reverted https://github.com/pytorch/pytorch/pull/83628 on behalf of https://github.com/malfet due to breaking internal builds, see https://www.internalfb.com/diff/D38984222
2022-08-25 00:49:40 +00:00
PyTorch MergeBot
5321bf52f2 Revert "Make linalg.inv composite of linalg.solve (#80074)"
This reverts commit 4737b33614.

Reverted https://github.com/pytorch/pytorch/pull/80074 on behalf of https://github.com/malfet due to Depends on the changes from https://github.com/pytorch/pytorch/pull/83628
2022-08-25 00:43:00 +00:00
Mario Lezcano
3e6e0a1d10 Support a stable double backward on linalg.det for real inputs (#80217)
The complex case still fails. I do not know why.

Fixes https://github.com/pytorch/pytorch/issues/62327
Fixes https://github.com/pytorch/pytorch/issues/53364
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80217
Approved by: https://github.com/nikitaved, https://github.com/albanD, https://github.com/malfet
2022-08-24 15:18:56 +00:00
Mario Lezcano
4737b33614 Make linalg.inv composite of linalg.solve (#80074)
The `getri` kernel calls inside `getrs` so we can do so explicitly
ourselves and save ourselves from having to maintain an extra kernel.
This way we just need to optimise `lu_factor` and `lu_solve` and `inv`
will be as efficient as it can be, as it'll be choosing the best backend
to perform the factorisation and the best backend (not necessarily the
same) to perform the solve.

Fixes https://github.com/pytorch/pytorch/issues/77498

The benchmarks: https://github.com/pytorch/pytorch/pull/80074#issuecomment-1164309071
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80074
Approved by: https://github.com/IvanYashchuk, https://github.com/albanD, https://github.com/malfet
2022-08-24 15:18:56 +00:00
Edward Z. Yang
0491e1a13a Support returning symbolic strides from t.stride() in Python (#83842)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83842
Approved by: https://github.com/albanD, https://github.com/Chillee, https://github.com/bdhirsh
2022-08-24 04:32:51 +00:00
Edward Z. Yang
8fae7027b3 Don't introduce new overload for SymInt (#83628)
Previously, we introduced new SymInt overloads for every function we wanted.  This led to a lot of boilerplate, and also a lot of confusion about how the overloads needed to be implemented.

This PR takes a simpler but more risky approach: just take the original function and changes its ints to SymInts.

This is BC-breaking in the following ways:

* The C++ API for registering implementations for aten operators will change from int64_t to SymInt whenever you make this change. Code generated registrations in PyTorch do not change as codegen handles the translation automatically, but manual registrations will need to follow the change.  Typically, if you now accept a SymInt where you previously only took int64_t, you have to convert it back manually.  This will definitely break XLA, see companion PR https://github.com/pytorch/xla/pull/3914 Note that not all dispatch keys get the automatic translation; all the composite keys and Meta keys are modified to take SymInt directly (because they should handle them directly), and so there are adjustments for this.

This is not BC-breaking in the following ways:

* The user facing C++ API remains compatible.  Even if a function changes from int to SymInt, the default C++ binding still takes only ints.  (e.g., at::empty(IntArrayRef, ...).  To call with SymInts, you must call at::empty_symint instead. This involved adding two more signatures to CppSignatureGroup; in many cases I refactored code to iterate over all signatures in the group instead of hard-coding the two that previously existed.
* This is TorchScript compatible; internally we treat SymInts as ints so there is no change to what happens at runtime in TorchScript. In particular, it's OK to reference an empty schema by its old type (using int types), as long as you're not doing string equality (which you shouldn't be), these parse to the same underyling type.

Structure of the PR:

* The general strategy of this PR is that, even when you write `SymInt` inside `native_functions.yaml`, sometimes, we will treat it *as if* it were an `int`. This idea pervades the codegen changes, where we have a translation from SymInt to c10::SymInt or int64_t, and this is controlled by a symint kwarg which I added and then audited all call sites to decide which I wanted. Here are some of the major places where we pick one or the other:
  * The C++ FunctionSchema representation represents `SymInt` as `int`. There are a few places we do need to know that we actually have a SymInt and we consult `real_type()` to get the real type in this case. In particular:
    * When we do schema validation of C++ operator registration, we must compare against true schema (as the C++ API will provide `c10::SymInt`, and this will only be accepted if the schema is `SymInt`. This is handled with cloneWithRealTypes before we check for schema differences.
    * In `toIValue` argument parsing, we parse against the true schema value. For backwards compatibility reasons, I do still accept ints in many places where Layout/SymInt/etc were expected. (Well, accepting int where SymInt is expected is not BC, it's just the right logic!)
  * In particular, because SymInt never shows up as type() in FunctionSchema, this means that we no longer need a dedicated Tag::SymInt. This is good, because SymInts never show up in mobile anyway.
* Changes to functorch/aten are mostly about tracking changes to the C++ API registration convention. Additionally, since SymInt overloads no longer exist, registrations for SymInt implementations are deleted. In many cases, the old implementations did not properly support SymInts; I did not add any new functionality with this PR, but I did try to annotate with TODOs where this is work to do. Finally, because the signature of `native::` API changed from int to SymInt, I need to find alternative APIs for people who were directly calling these functions to call. Typically, I insert a new dispatch call when perf doesn't matter, or use `at::compositeexplicitautograd` namespace to handle other caes.
* The change to `make_boxed_from_unboxed_functor.h` is so that we accept a plain IntList IValue anywhere a SymIntList is expected; these are read-only arguments so covariant typing is OK.
* I change how unboxing logic works slightly. Previously, we interpret the C++ type for Layout/etc directly as IntType JIT type, which works well because the incoming IValue is tagged as an integer. Now, we interpret the C++ type for Layout as its true type, e.g., LayoutType (change to `jit_type.h`), but then we accept an int IValue for it anyway. This makes it symmetric with SymInt, where we interpret the C++ type as SymIntType, and then accept SymInt and int IValues for it.
* I renamed the `empty.names` overload to `empty_names` to make it less confusing (I kept mixing it up with the real empty overload)
* I deleted the `empty.SymInt` overload, which ended up killing a pile of functions. (This was originally a separate PR but the profiler expect test was giving me grief so I folded it in.)
* I deleted the LazyDynamicOpsTest tests. These were failing after these changes, and I couldn't figure out why they used to be passing: they make use of `narrow_copy` which didn't actually support SymInts; they were immediately converted to ints.
* I bashed LTC into working. The patches made here are not the end of the story. The big problem is that SymInt translates into Value, but what if you have a list of SymInt? This cannot be conveniently represented in the IR today, since variadic Values are not supported. To work around this, I translate SymInt[] into plain int[] (this is fine for tests because LTC dynamic shapes never actually worked); but this will need to be fixed for proper LTC SymInt support. The LTC codegen also looked somewhat questionable; I added comments based on my code reading.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83628
Approved by: https://github.com/albanD, https://github.com/bdhirsh
2022-08-23 22:04:07 +00:00
Ivan Yashchuk
cb488e6d2f Allow None arguments for elementwise type promotion wrapper and fix clamp with None arguments (#83586)
Fixes https://github.com/pytorch/torchdynamo/issues/759
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83586
Approved by: https://github.com/ezyang, https://github.com/ngimel
2022-08-23 17:47:10 +00:00
Horace He
7ebdb4c72f Refactored ops on size to be dispatcher ops (#83719)
An example of how the graph looks now.
```
def forward(self, x_1):
    size = torch.ops.math.size(x_1, 0)
    size_1 = torch.ops.math.size(x_1, 1);  x_1 = None
    ones = torch.ops.aten.ones.default([1], device = device(type='cpu'), pin_memory = False)
    expand_sym_int = torch.ops.aten.expand.SymInt(ones, [size, size_1]);  ones = size = size_1 = None
    cos_default = torch.ops.aten.cos.default(expand_sym_int);  expand_sym_int = None
    return (cos_default,)
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/83719
Approved by: https://github.com/ezyang
2022-08-23 15:48:00 +00:00
Horace He
0e0af73ba2 Add support for partial decompositions in make_fx (#83770)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83770
Approved by: https://github.com/ngimel
2022-08-20 01:03:39 +00:00
Edward Z. Yang
9152144944 Coverage for nondeterministic_seeded, respect it in constant prop (#83650)
- nondeterministic_seeded was not applied to enough functions.  I added
  some heuristics to codegen for identifying functions that are likely
  to be random and added a bunch of these tags to functions.  Not sure
  I got all of them.

- Don't constant propagate through nondeterministic functions in FX
  tracing.

It would be better to do some testing for the tag but this would be quite an effort.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83650
Approved by: https://github.com/bdhirsh, https://github.com/eellison
2022-08-18 22:18:10 +00:00
Edward Z. Yang
24acc3155f Be more conservative about propagating constants. (#83648)
If a constant would turn into something large, don't keep
it as a constant, just drop it.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83648
Approved by: https://github.com/eellison
2022-08-18 22:18:10 +00:00
Edward Z. Yang
817a82704f Delete ProxyTensor wrapper subclass (#83330)
I was working on https://github.com/pytorch/torchdynamo/issues/80 and my
working hypothesis for what was causing the error was that proxy tensor
was not advertising correct dispatch keys, causing AMP to operate
differently when you traced.  I could have fixed this directly by
replicating fake tensor's fix for setting dispatch keys to also apply to
proxy tensor, but I was like, "Why must I repeat myself."

This PR is the result.  It completely deletes the ProxyTensor wrapper
subclass, so that when we are tracing, the tensors flowing through the
program are the *original* real or fake tensors, depending on what the
user requested in the top-level API.  There is no more wrapping.  To
store the Proxy objects necessary for actually doing tracing, I store
the property directly on the tensors.  (Note: I never
clean up old entries from the map at the moment, this is easily fixed
by using a weak map)

Benefits of doing this:

* No more tip-toeing around no_dispatch() creation of new ProxyTensors;
  we never create new tensors (except when we call the underlying func),
  so you don't have to worry about accidentally tracing them.

* No more syncing up metadata from in place operators.  In particular
  https://github.com/pytorch/pytorch/issues/81526 is mooted

* This fixes https://github.com/pytorch/torchdynamo/issues/519 as we no longer need to teach proxy tensor to support sparse tensor.

* No more schlepping symbolic integers from the inner fake tensor to the
  outer proxy tensor.  If you can make a fake tensor with symbolic ints,
  you're done, nothing else to do.

To avoid having to rewrite all of the guts, when I get to the actual
proxy tensor handler, I first "fetch" the stored ProxyTensor data from
the weakmap via a tree_map, and then operate on the consequent data as
before.  A more optimized implementation is possible.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83330
Approved by: https://github.com/Chillee
2022-08-18 01:56:07 +00:00