Summary:
Original commit changeset: e11cddf1fecc
Original Phabricator Diff: D49064185
Test Plan:
Comparing PT1 and PT2 performance on the IG Feed Model with this diff backed out: N4274204
Comparing the PT1 and PT2 performance on IG Feed with this diff committed: N4271093
Reviewed By: zou3519
Differential Revision: D49230047
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109199
Approved by: https://github.com/zou3519, https://github.com/xw285cornell
Fixes #106893
There are two main changes:
- Before this PR, the function returned by once_differentiable was
included in skipfiles (because its .co_filename is
torch/autograd/function.py). This PR adds a mechanism to tell Dynamo
to inline a function regardless of whether it is included in skipfiles.
- A bugfix: when we introspect the backward, we need to turn grad mode
off. This accurately models the eager-mode semantics: in eager-mode
PyTorch, if second-order gradients were not requested, then grad mode is
off inside the backward. torch.compile does not support higher-order
gradients and just assumes first-order gradients, so this is OK (see the
example below).
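A minimal sketch of the kind of code this makes compilable (MyCube and the
backend choice are illustrative, not taken from the PR itself):
```python
import torch
from torch.autograd.function import once_differentiable

class MyCube(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return x ** 3

    @staticmethod
    @once_differentiable
    def backward(ctx, grad_out):
        # In eager mode, grad mode is off here when only first-order
        # gradients were requested; Dynamo now models that by turning
        # grad mode off while introspecting this backward.
        (x,) = ctx.saved_tensors
        return grad_out * 3 * x ** 2

@torch.compile(backend="eager")
def f(x):
    return MyCube.apply(x)

x = torch.randn(3, requires_grad=True)
f(x).sum().backward()
```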
Test Plan:
- new test
Differential Revision: [D49064185](https://our.internmc.facebook.com/intern/diff/D49064185)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108686
Approved by: https://github.com/voznesenskym
Fixes #103613.
A requirement for HigherOrderOperators is that after Dynamo capture, the body
function should be functional (i.e. have no observable side effects).
If the body function mutates a variable that is not local to the body, then
that should induce a graph break.
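For example (a hedged sketch using the internal wrap HigherOrderOperator
that Dynamo's tests exercise; the global counter mutation is what should
trigger the fallback):
```python
import torch
from torch._higher_order_ops.wrap import wrap

counter = 0

def fn(x):
    def body(x):
        global counter
        counter += 1  # non-local mutation inside the HOO body
        return x.sin()
    return wrap(body, x)

# Dynamo should refuse to capture the impure body and fall back
# (graph break) instead of emitting a graph with hidden side effects.
opt_fn = torch.compile(fn, backend="eager")
opt_fn(torch.randn(3))
```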
This PR distinguishes between MutableLocals created inside/outside the body
and adds the relevant checks. (Design originally proposed by voznesenskym.)
- We tag each mutable_local with an id that corresponds to where it came
from. The mutable_local may represent an existing object that gets
tracked by Dynamo or an object that is created while Dynamo is
introspecting.
- This id changes when we are introspecting the body of a HigherOrderOperator.
- If Dynamo wants to perform a side effect using a mutable_local, we
check its .scope field against the current scope id and raise Unsupported
in the offending case (a non-local mutation inside a HigherOrderOperator
body); see the sketch after this list.
- The id is a global thread_local variable. I can make this not a global
variable, but it just takes some engineering time to thread a number through
each of the various ways Dynamo can construct a mutable_local.
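A minimal sketch of the scope-id check (the names and classes here are
simplified illustrations, not Dynamo's actual internals):
```python
import threading

_scope = threading.local()

def current_scope_id():
    return getattr(_scope, "value", 0)

class HigherOrderOperatorBodyScope:
    """Entered while introspecting a HigherOrderOperator body."""
    def __enter__(self):
        _scope.value = current_scope_id() + 1
    def __exit__(self, *exc):
        _scope.value = current_scope_id() - 1

class MutableLocal:
    def __init__(self):
        # Remember which scope this object was created in.
        self.scope = current_scope_id()

class Unsupported(RuntimeError):
    pass

def check_mutation_allowed(mutable_local):
    # Mutating something created in an outer scope (e.g. a global or
    # nonlocal captured by the body) must induce a graph break.
    if mutable_local.scope != current_scope_id():
        raise Unsupported("non-local mutation inside HigherOrderOperator body")
```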
Test Plan:
- Add a bunch of new tests. They test combinations of {global, nonlocal} x
{number, Tensor, list, object, nn.Module} and assert that HigherOrderOp
falls back in those cases.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/104077
Approved by: https://github.com/voznesenskym, https://github.com/jansel
Make mutation faster to speed up tracing optimizers; helps with https://github.com/pytorch/torchdynamo/issues/1803.
`replace_all` no longer iterates over the entire variable tracker data structure every time a mutation is performed.
Each variable tracker internally keeps a set of the mutable variable trackers it contains, to provide a hint to `replace_all`. This set is populated with a call to `apply` from `__post_init__` in the base `VariableTracker` (sketched below).
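A simplified sketch of the containment-hint idea (the stand-in classes below
are illustrative, not Dynamo's actual API):
```python
class MutableLocal:
    pass

class VariableTracker:
    # Simplified stand-in; the real class populates the hint via `apply`
    # in `__post_init__`.
    def __init__(self, children=(), mutable_local=None):
        self.children = list(children)
        self.mutable_local = mutable_local
        # Hint: every mutable local reachable from this tracker.
        self.recursively_contains = set()
        if mutable_local is not None:
            self.recursively_contains.add(mutable_local)
        for child in self.children:
            self.recursively_contains |= child.recursively_contains

def replace_all(var, key, new):
    """Replace trackers whose mutable_local is `key` with `new`, skipping
    whole subtrees whose hint says `key` cannot appear inside them, instead
    of walking the full structure on every mutation."""
    if key not in var.recursively_contains:
        return var
    if var.mutable_local is key:
        return new
    var.children = [replace_all(c, key, new) for c in var.children]
    return var
```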
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89170
Approved by: https://github.com/jansel