This also comes with some bug fixes that were uncovered in the
process:
- Forward device calls to inner tensor in FunctionalTensorWrapper
- Make legacyExtractDispatchKey exclude Functionalize, so that
it can get at the real device type key. This is noncontroversial.
- Stop stripping Dense from the key set. The reason for this is that
  FunctionalTensorWrapper may be used in contexts where people
  query whether it is dense or not. If it doesn't report this correctly
  (via the dispatch key), it will cause errors. This caused some
  torchbench models to fail when I did one-pass tracing.
- Save and restore reapply views TLS correctly
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88063
Approved by: https://github.com/bdhirsh
`.detach()` worked in basic cases previously, but didn't properly preserve view relationships between the base and the output. This wasn't heavily tested, because autograd doesn't normally encounter `FunctionalTensorWrapper` directly, but could become more common if we fuse functionalization and autograd into a single tracing pass.
This will also be a bug fix for LTC (and for XLA when they use functionalization).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87750
Approved by: https://github.com/ezyang
This fixes an issue with mobile: The output of view_copy ops should always be contiguous.
Later, we can consider adding optional arguments to the `view_copy()` functions to let you explicitly say what the contiguity of the output should be (e.g. channels_last).
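A quick illustration of the intended behavior (the specific op and shapes here are just for illustration):
```
import torch

x = torch.randn(4, 4)
# transpose() would return a non-contiguous view; the corresponding *_copy op
# should now always produce a contiguous output instead of preserving the
# transposed strides.
out = torch.ops.aten.transpose_copy.int(x, 0, 1)
print(out.is_contiguous())  # expected: True after this change
```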
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85747
Approved by: https://github.com/ezyang
I'm testing out turning on re-inplacing + functionalization by default with the AOTAutograd + eager backend on torchbench + huggingface models. This PR contains a few bug fixes from turning re-inplacing on:
(1) More gracefully handle the case where FakeTensorMode is already turned on when you call reinplace
(2) More robust detection of when an inplace variant of an op exists (the dumb bug was that `pow.Scalar` doesn't have an inplace variant: there are several overloads of `pow_`, but none of them are eligible)
(3) Avoid re-inplacing when it would require resizing the input buffer. This isn't allowed, because inplace ops aren't allowed to resize their inputs.
For the last one, I gave the two main examples in more detail in the comments. Important cases are:
```
# Not safe to re-inplace: add() broadcasts here, so the inplace variant would
# have to resize the self tensor (shape [1, 4] vs. [4, 4])
torch.add(tensor_1x4, tensor_4x4)
# Not safe to re-inplace: the inplace and out-of-place variants of the op
# return different dtypes (ge_ writes into self, ge returns a bool tensor)
torch.ge(a, b)
# Note: this means that today, when functionalization converts a ge_(a, b) call
# into ge(a, b), re-inplacing won't properly de-functionalize it. I noted in the
# comments that this optimization is worth adding later.
```
(4) There's some logic around keeping `storage_to_nodes` up to date when we see a view op: if we re-inplace `out = a.add(...)`, and later in the program we encounter a node `out.view(...)` that we need to replace with `a.view(...)`, then we need to update some metadata structures. I had to fix that logic: specifically, when that later node isn't a dispatcher op (e.g. when it's an FX output node), I wasn't properly handling the case where the node's fake_meta info was not a tensor.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83626
Approved by: https://github.com/ezyang
This allows you to directly call into the CompositeImplicitAutograd
implementation of an operator, *without* changing any aspects of the
dispatcher state. In particular, you can use this to recursively call
into a decomposition, dispatching back to your tensor subclass/mode
as desired.
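A minimal sketch of how this can be used, assuming the hook is the `decompose()` method exposed on `OpOverload` in Python (treat the exact spelling as an assumption):
```
import torch

a = torch.randn(2, 3)
w = torch.randn(4, 3)

# aten::linear has a CompositeImplicitAutograd kernel, so this calls straight
# into that decomposition; the matmul/add ops it performs are dispatched
# normally, so a tensor subclass or mode would still intercept them.
out = torch.ops.aten.linear.default.decompose(a, w)

# Ops without a CompositeImplicitAutograd kernel report NotImplemented instead.
print(torch.ops.aten.add.Tensor.decompose(a, a))
```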
Hypothetically, we should also make these available in the
decompositions dictionary, but I'm leaving this as future work, since
enumerating these decompositions is annoying (operators are lazily
registered).
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83075
Approved by: https://github.com/albanD
The functional variant of one of the `arange` overloads has a schema mismatch with the out variant. The functional one has `Scalar step`, but the corresponding out variant has `Scalar step=1`. This isn't allowed, so it had to be special-cased in the python codegen and manually bound. This adds the default `step` value to the functional overload and removes the special-casing.
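Concretely, at the Python level (tensor contents here are just illustrative):
```
import torch

# With the default added, the functional overload can now be called without an
# explicit step, matching the out= variant's schema.
print(torch.ops.aten.arange.start_step(0, 5))   # tensor([0, 1, 2, 3, 4])
print(torch.arange(0, 5, out=torch.empty(5, dtype=torch.long)))
```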
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81380
Approved by: https://github.com/ngimel
Fixes #81774
`TensorOptions` arguments in the JIT schema are optional, but in the Python API these were being translated to non-optional arguments with a default value. This change makes the arguments accept `None` for consistency with the JIT schema. However, it also means that `dtype=c10::nullopt` was previously completely untested, so this also fixes several related bugs.
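A minimal sketch of what this looks like from Python (factory function chosen arbitrarily):
```
import torch

# TensorOptions-style arguments now accept an explicit None, matching the JIT
# schema where they are Optional; None just means "use the default".
t = torch.ones(2, 3, dtype=None, device=None)
assert t.dtype == torch.get_default_dtype()
```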
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82241
Approved by: https://github.com/ngimel
There's an existing assert in functionalization that's probably too restrictive - when you pass an op a list of tensors containing a mix of functional and nonfunctional tensors, we should just selectively unwrap the functional tensors and call the op, rather than erroring.
I added a test for it in `test_functionalization.py` - it looks like this behavior can also show up when tracing with `make_fx()`, when constants get baked in as module properties, which don't get wrapped up when you try to functionalize the module's forward function.
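A rough sketch of the failing pattern (the `functionalize` import path here is from functorch at the time and is an assumption):
```
import torch
from functorch.experimental import functionalize  # import path assumed

const = torch.ones(2)  # e.g. a constant baked into a traced module, never wrapped

def f(x):
    # torch.cat sees a list mixing a functional tensor (x.add(1)) and a plain
    # tensor (const); with this fix the plain tensor is passed through instead
    # of tripping the assert.
    return torch.cat([x.add(1), const])

out = functionalize(f)(torch.ones(2))
```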
Should fix the last of https://github.com/pytorch/torchdynamo/issues/88#issuecomment-1193059940
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82326
Approved by: https://github.com/ezyang
Adds a "reinplacing" FX transform, that goes through an FX graph and tries to convert out-of-place op calls into inplace calls whenever possible.
Followups from this PR include:
- Set up torchbench, and run the whole torchbench suite using the AOTAutograd + functionalize + re-inplacing transforms to surface any issues (this is what I'm currently working on). Right now, I have some basic unit tests just to sanity check that the general logic makes sense.
- Add any missing inplace ops. This is mostly the `*_scatter*` ops, e.g. `diagonal_scatter_`, because these ops will commonly show up in an FX graph after running functionalization.
The criteria for when you can swap an op `b = a.add(...)` with `a.add_(...)` are:
(1) An inplace variant of the operator with the same schema needs to exist (`aten.add` -> `aten.add_`)
(2) `a` (**or any of its aliases**) can't be used as an input to any other operators later on in the graph
(3) `a` can't be one of the inputs to the entire graph. It also can't be an **alias** of any of the inputs ***
*** One thing to note: (3) means that we can't technically guarantee that we'll win back **all** of the memory that functionalization gave up. Functionalization converts input mutations into out-of-place calls, and then adds a `copy_()` to the end of the graph to preserve semantics.
I added logic to handle `copy_()` in this PR because it's a pretty important optimization in the context of functionalization: any program that performs input mutations will have a `copy_()` in it after running functionalization.
There are some examples in the test file, but I think staring at an example of where re-inplacing is/isn't allowed to run is helpful:
```
# Before functionalization
def foo(a):
    tmp1 = a.add_(1)
    tmp2 = a.add(2)

# After functionalization
def foo(a):
    tmp1 = a.add(1)
    tmp2 = a.add(2)
    ...
    a.copy_(tmp1)

# After re-inplacing
def foo(a):
    # The first add() is safe to re-inplace even though a is a program input,
    # because a's data is overwritten later by a copy_()
    tmp1 = a.add_(1)
    # The second add() is NOT safe to re-inplace, because:
    # (1) a and tmp1 are aliased. Note that they weren't aliased in the
    #     original program, but they are now that we've done some re-inplacing.
    # (2) tmp1 is used as an input later in the program
    tmp2 = a.add(2)
    ...
    a.copy_(tmp1)
```
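For intuition, here is a deliberately naive sketch of criteria (2) and (3) as an FX pass. It only handles `aten.add`, and unlike the real transform it does no aliasing/storage tracking and no `copy_()` handling:
```
import torch
import torch.fx as fx

def naive_reinplace(gm: fx.GraphModule) -> fx.GraphModule:
    placeholders = {n for n in gm.graph.nodes if n.op == "placeholder"}
    nodes = list(gm.graph.nodes)
    for i, node in enumerate(nodes):
        if node.op != "call_function" or node.target is not torch.ops.aten.add.Tensor:
            continue
        self_arg = node.args[0]
        # criterion (3): never mutate a graph input (aliasing checks omitted here)
        if not isinstance(self_arg, fx.Node) or self_arg in placeholders:
            continue
        # criterion (2): `self` can't be read again later in the graph
        used_later = any(
            self_arg in n.args or self_arg in n.kwargs.values()
            for n in nodes[i + 1:]
        )
        if used_later:
            continue
        # criterion (1): an inplace variant with the same schema exists
        node.target = torch.ops.aten.add_.Tensor
    gm.graph.lint()
    gm.recompile()
    return gm
```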
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80897
Approved by: https://github.com/ezyang
It's kinda annoying for wrapper subclass tensors (like `FunctionalTensorWrapper`) to include backend dispatch keys in their keyset, because when we occasionally write something buggy, we'll send the wrapper tensor to the backend kernel (which usually segfaults). By ensuring that wrapper tensors don't get backend keys, we'll get a nicer error when that happens.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81471
Approved by: https://github.com/ezyang
fixes https://github.com/pytorch/pytorch/issues/81618
At some point it looks like this became broken (you can see the updated expect test looks better now, and the original was just returning a constant).
I also got a repro that was failing with an assert, which I confirmed now passes:
```
def foo(t, y):
    out_1 = torch.ones(1)
    return torch.add(t, y, out=out_1)

g = make_fx(functionalize(foo))(torch.tensor([1]), torch.tensor([1]))
print(g.code)

out1 = functionalize(foo)(torch.tensor([1]), torch.tensor([1]))
out2 = foo(torch.tensor([1]), torch.tensor([1]))
print(out1 == out2)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81702
Approved by: https://github.com/ezyang
`ProxyTorchDispatchMode` was added recently as part of `make_fx`, which was secretly causing the meta tensor calls used inside of functionalization to get baked into the graph. It also wasn't caught because the functionalization tests in core don't use `make_fx`, and the tests in functorch aren't as comprehensive.
Now that `make_fx` is in core, I also ported the functionalization test infra over to use it, which would have caught the regression. This also makes the tests cleaner, since mode-based tracing lets us pick up factory functions in the trace output.
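For example, with mode-based tracing a factory call shows up as a node in the traced graph instead of getting baked in as a constant (import paths are assumptions from around the time of this change):
```
import torch
from torch.fx.experimental.proxy_tensor import make_fx
from functorch.experimental import functionalize  # import path assumed

def f(x):
    y = torch.ones(2)  # factory call: captured by mode-based tracing
    return x.add_(y)

gm = make_fx(functionalize(f))(torch.ones(2))
print(gm.code)  # expect an aten.ones node and a functionalized (out-of-place) add
```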
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80416
Approved by: https://github.com/ezyang, https://github.com/albanD
This should fix the last issue that @anijain2305 hit when running ResNet with TorchDynamo <> functionalization.
Today if you try to call an `OpOverloadPacket` from python with some arguments, we will use the types of those arguments to perform overload resolution. With some functional variants of ops, this can be ambiguous.
Today this affects just one op: `_fused_moving_avg_obs_fq_helper`, although it would potentially affect e.g. `native_batch_norm` in the future.
Example:
```
# There are technically two overloads:
#   torch.ops.aten._fused_moving_avg_obs_fq_helper.default    (returns 2 outputs, mutates 4 of its inputs in place)
#   torch.ops.aten._fused_moving_avg_obs_fq_helper.functional (returns 6 outputs, mutates none of its inputs)
# We pick the wrong one - just from the call site, there's no way to know that we should pick the functional one.
outs = torch.ops.aten._fused_moving_avg_obs_fq_helper(a, a, a, a, a, a, a, 1.0, 0, 1, 0)
# Raises an error - we resolved to the overload with only 2 returns, so indexing the 6th output fails.
return _fused_moving_avg_obs_fq_helper_functional[5]
```
Specifically, functionalization will bake `_fused_moving_avg_obs_fq_helper.functional` into the graph, but when AOTAutograd tries to compile with TorchScript, it needs to strip the overload name (TS doesn't know how to parse overload names directly, so we drop the name and let it infer the right overload at runtime later) - and it picks the wrong one.
The situation is pretty similar to inplace: `ops.aten.add` and `ops.aten.add_` represent two different `OpOverloadPacket` objects; they can't be overloads of the same op, because their schemas would be ambiguous (the alias annotations are different, but that isn't enough to disambiguate).
In this PR, I try to fix the situation in a pretty similar way to how we handle `inplace` in the data model: `inplace` ops get their own base operator name, but they are represented as a flag inside of `BaseOperatorName` in the data model.
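As a rough illustration of how `inplace` already works in the data model (attribute and method names here are from torchgen and should be treated as assumptions):
```
from torchgen.model import BaseOperatorName

# "add_" parses into base name "add" plus an inplace flag, rather than being a
# distinct base name; this PR models the ".functional" suffix the same way, as
# another flag on BaseOperatorName.
n = BaseOperatorName.parse("add_")
print(n.base, n.inplace)  # add True
```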
Two other important changes that I made as part of this PR:
(1) Originally, there were ~100 different `*_functional` operators: e.g. we had operators named `resize.functional` and `zero.functional`. The `_functional` bit isn't actually necessary in most cases: it's only necessary for operators that **also** have a `SchemaKind.mutable` variant, where `_fused_moving_avg_obs_fq_helper` is the only op that fits that description today. So I removed the unnecessary notion of "functional" from those other ops. I also added a bunch of assertions to force this restriction.
I think that makes more sense in the long run, because it eliminates an unnecessary difference in the model. E.g. we don't have `add_.Tensor` and `add.Tensor_functional`. We just have `add_.Tensor` and `add.Tensor`.
(2) I noticed that we actually still weren't pairing up a bunch of `_foreach` operators correctly, because their input arguments were different (`self` vs. `tensors`). Since they're private API's, I went ahead and changed the argument names directly so they get matched up. Before this PR, we were generating a separate `_foreach_add` and `_foreach_add.functional` variant in a bunch of cases, that really did the same thing (but happened to have a different name for the first argument).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80556
Approved by: https://github.com/ezyang, https://github.com/albanD
Double-header bug fix:
- As reported by jansel, dtypes are still showing up as integers
  when the schema is an optional dtype. This is simple enough to
  fix, and I added a test for it. But while I was at it...
- I noticed that the THPMemoryFormat_new idiom with the "unused" name
  doesn't actually work: the repr of the returned memory format
  object is wrong, and this shows up when we try to log the args/kwargs.
  So I fixed memory format to do it properly, along with everything
  else.
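A small sketch of the kind of logging where this shows up, written against a `__torch_dispatch__` mode (the mode API spelling here is the current one and is an assumption):
```
import torch
from torch.utils._python_dispatch import TorchDispatchMode

class LogCalls(TorchDispatchMode):
    def __torch_dispatch__(self, func, types, args=(), kwargs=None):
        kwargs = kwargs or {}
        # With the fix, an optional dtype kwarg prints as e.g. torch.float64
        # rather than a bare integer, and memory_format objects repr correctly.
        print(func, args, kwargs)
        return func(*args, **kwargs)

with LogCalls():
    torch.ones(2).to(dtype=torch.float64, memory_format=torch.contiguous_format)
```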
Fixes https://github.com/pytorch/pytorch/issues/77135
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77543
Approved by: https://github.com/albanD, https://github.com/jansel