As title. Without this patch we get the following error:
Tweaking the `allow_non_fake_inputs` flag on tensor mode doesn't quite
work for AOTAutograd, which also needs to fake-tensor-propagate the
`nonstrict_trace`-ed function, but that's _after_ Dynamo has handled the
`nonstrict_trace` processing and put the `flat_apply(...)` node into the graph.
So we can't easily to temporarily enable the `allow_non_fake_inputs`
flag on current fake mode, when AOTAutograd processes a `flat_apply`
node from Dynamo's `nonstrict_trace` handling. And after discussing
with zou3519, I decided to add a global `FakeTensorTLS` that contains a
`allow_non_fake_inputs_override` flag, and patch the `nonstrict_trace`-ed
function to temporarily tweak this flag during its execution.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/147572
Approved by: https://github.com/zou3519
ghstack dependencies: #146714, #146367, #146950, #147571
Fixing this is actually a bit annoying:
(1) FakeTensorMode sees a function where all of its inputs are real tensors, so it tries to run the real compute before converting the output to a FakeTensor
(2) we don't actually want this, because the "real compute" is support to error normally, when you do `meta_tensor.to(device='cpu')`. Instead, we want FakeTensor to actually skip constant prop and run the normal FakeTensor implementation, which will not error
Pull Request resolved: https://github.com/pytorch/pytorch/pull/146729
Approved by: https://github.com/zou3519, https://github.com/SherlockNoMad, https://github.com/albanD
ghstack dependencies: #146642
Summary:
When encountering a mismatched fake kernel that also creates unbacked symbols, draft export will fail with `PendingUnbackedSymbolNotFound` error.
Clearing `shape_env.pending_fresh_unbacked_symbols` fixes this issue.
Test Plan:
```
buck2 run mode/dev-nosan caffe2/test:test_export -- -r test_override_mismatched_fake_kernel_with_unbacked_symbols
```
Differential Revision: D68920990
Pull Request resolved: https://github.com/pytorch/pytorch/pull/146089
Approved by: https://github.com/pianpwk
Pickling GraphModule needs some special handling for wrapping things that normally can't be pickled - but async compile needs to pass them across a wire so we need to be able to serialize it - add some helpers to enable that.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/141659
Approved by: https://github.com/jamesjwu
The codebase has a few locations where callable parameter type information is lost when the unpackings *args and **kwargs are typed as Any. Refactor these instances to retain type information using typing_extensions.ParamSpec.
Also, in these functions, enforce return type with TypeVar.
Addresses #142306
Pull Request resolved: https://github.com/pytorch/pytorch/pull/143797
Approved by: https://github.com/Skylion007
Co-authored-by: Aaron Gokaslan <aaronGokaslan@gmail.com>
Co-authored-by: Xuehai Pan <XuehaiPan@outlook.com>
A few changes to MetaTensorDesc and friends:
1. Change view_func from a raw method to an ADT where the common case (FakeTensor._view_func_unsafe) is a simple representation instead.
2. (minor) Remove and fix some `type: ignore`s added by #141839
3. (minor) Fix _UNSERIALIZABLE to be a set instead of a dict which is converted into a set each time it's used.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/141926
Approved by: https://github.com/ezyang
Currently real tensor tracing raises MetadataMismatchErrors if registered fake kernels don't match the real kernels (e.g. shape, aliasing, dtype, etc.). This adds an option to use fake kernel inference to bypass mismatches - this option defaults to False for real tensor tracing, but is on for draft export.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/139766
Approved by: https://github.com/angelayi, https://github.com/zou3519
* Automatically applies ruff rule 401. Turns loops into equivalent list comprehensions which are faster and do not leak the scope of the loop variables.
* list comprehensions not only often have better typing, but are 50+% faster than for loops on overhead. They also preserve length information etc and are better for the interpreter to optimize.
* Manually went back and made mypy happy after the change.
* Also fixed style lints in files covered by flake8 but not by pyfmt
Pull Request resolved: https://github.com/pytorch/pytorch/pull/140980
Approved by: https://github.com/justinchuby, https://github.com/malfet
Summary:
While testing exportability for PT2 Inference models, we found various cases of invalid op inputs during tracing, for example errors like: `a and b must have same reduction dim`, `expected scalar type Long but found Int`, etc. Looking more closely, these happened to due the same few meta kernels & eager kernels producing mismatched outputs upstream (e.g. different output tensor dtype, int output).
Adding checks to catch mismatched outputs in real tensor prop upstream, so errors are raised at the mismatched op, instead of the downstream ops taking them as inputs. Relies a lot on utils from [CrossRefFakeMode](929797dedb/torch/_subclasses/fake_utils.py (L78))
Follow ups: could add more checks, and maybe have a flag to only enable these for cases like draft mode, so perf doesn't suffer?
Test Plan: test_export, test_fake_tensor
Differential Revision: D64210055
Pull Request resolved: https://github.com/pytorch/pytorch/pull/137747
Approved by: https://github.com/zou3519
As discussed w/ @ezyang offline, one way to de-risk the `specialize_float=False` rollout is to specialize all backed symfloats that we fail to tensorify away. This diff does a few things:
1) It fixes a bug where item_memo gets dropped (due to incorrect epoch invalidation)
2) It updates the tensorify pass to do the backup specialization
This pass was originally part of the [PR](https://github.com/pytorch/pytorch/pull/137782) that flips `specialize_float=False` but we learned that the blast radius is simply too large. We've pivoted to a more milestone driven approach where we learn from the failures of the aforementioned PR and cherry pick fixes into main first. After this current PR lands our strategy is as follows:
1) Integrate turning off specialize float only in the automatic dynamic pass.
2) Put up a canary diff that only turns off specialize float in `backend=eager` mode to sniff out symfloat related bugs in dynamo due to code paths we previously never exercised.
3) Put up a canary diff that only turns off specialize float in `backend=aot_eager` mode to sniff out symfloat related bugs in aotautograd due to code paths we previously never exercised.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/138868
Approved by: https://github.com/ezyang
As discussed w/ @ezyang offline, one way to de-risk the `specialize_float=False` rollout is to specialize all backed symfloats that we fail to tensorify away. This diff does a few things:
1) It fixes a bug where item_memo gets dropped (due to incorrect epoch invalidation)
2) It updates the tensorify pass to do the backup specialization
This pass was originally part of the [PR](https://github.com/pytorch/pytorch/pull/137782) that flips `specialize_float=False` but we learned that the blast radius is simply too large. We've pivoted to a more milestone driven approach where we learn from the failures of the aforementioned PR and cherry pick fixes into main first. After this current PR lands our strategy is as follows:
1) Integrate turning off specialize float only in the automatic dynamic pass.
2) Put up a canary diff that only turns off specialize float in `backend=eager` mode to sniff out symfloat related bugs in dynamo due to code paths we previously never exercised.
3) Put up a canary diff that only turns off specialize float in `backend=aot_eager` mode to sniff out symfloat related bugs in aotautograd due to code paths we previously never exercised.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/138868
Approved by: https://github.com/ezyang
This PR changes real_tensor_prop to also infer fake kernels when the
operator doesn't have it.
We infer the fake output to be of the same properties as the real
output, with unbacked symints in the sizes and some stride order.
Test Plan:
- new tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/139213
Approved by: https://github.com/pianpwk
ghstack dependencies: #139212
When we see a custom op:
- check that its mutation annotations are correct
- check that its aliasing constraints matches our constraints for custom
ops.
Otherwise, there may be undefined behavior.
Test Plan:
- new tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/139212
Approved by: https://github.com/angelayi
Summary:
* Fixed real tensor tracing w/ torchbind objs by passing the cloned tensor obj. For now I just catch the exception and have an error message if the `_clone` fails, but up for discussion on what to do here
* Separate question, should we require people to set up FakeScriptObjects and stuff for draft mode?
* Prevent side effects from happening when we do the first pass of custom ops profiling by cloning/copying everything. Not sure if deepcopying the model will succeed in all cases... But also I guess this path can be removed once custom ops profiling turns into one pass.
Test Plan: `buck2 run @//mode/dev-nosan //scripts/angelayi/draft_export:test_draft_export`
Reviewed By: ydwu4
Differential Revision: D64124825
Pull Request resolved: https://github.com/pytorch/pytorch/pull/138797
Approved by: https://github.com/ydwu4
unwrap_tensor_subclasses -> get_plain_tensors
Is used at runtime. For small models this overhead is feasible in comparison with small compiled kernel.
1/ Removing asserts from runtime path
2/ Removing list creation with using optional output list to append argument
Pull Request resolved: https://github.com/pytorch/pytorch/pull/138498
Approved by: https://github.com/bdhirsh
Summary: Prototyping the custom op meta kernel generation. Rest of the changes are in fbcode/scripts/angelayi
Test Plan: followup diff (D63837739)
Differential Revision: D63837740
Pull Request resolved: https://github.com/pytorch/pytorch/pull/137277
Approved by: https://github.com/zou3519