## What's the problem?
The popular `fx.node.map_arg()` and `fx.node.map_aggregate()` apply operations recursively on `dict`s, `tuple`s, `list`s, etc., and return a new collection of the same type.
Unfortunately, their base input type is `Argument`, which is [very unspecific indeed](5d55a6585d/torch/fx/node.py (L48-L58)): most type information is just thrown away at the call site of either of these functions, as far as the type checker goes.
As `torch` moves to a more typed code base, this would force innocent, unsuspecting developers to add logically unnecessary casts or `# type: ignore` statements.
## What's the solution?
Making these two `node.map_*` functions generic on the first argument and return type means that type information is preserved for the type checker. (The signature of the other parameter, the function that visits the nodes and subnodes, has not changed, nor should it.)
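For illustration, here is a minimal sketch of what "generic on the first argument and return type" means here (the exact TypeVar name and bound in `torch/fx/node.py` may differ):
```python
from typing import Callable, TypeVar

from torch.fx.node import Argument, Node

ArgT = TypeVar("ArgT", bound=Argument)  # hypothetical name

def map_arg(a: ArgT, fn: Callable[[Node], Argument]) -> ArgT:
    """Visit every Node inside `a`; the result keeps the input's concrete type."""
    ...

# At a call site, the checker now preserves the container type:
# reveal_type(map_arg((n1, n2), lambda n: n))  # a tuple, not just Argument
```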
## Won't it break everything?
It doesn't break the type checker - one place needed an extra hint.
There have been code breakages: one has been resolved, and at least one new one has appeared... we'll see!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/146248
Approved by: https://github.com/XuehaiPan, https://github.com/Skylion007
Pickling a GraphModule needs some special handling to wrap things that normally can't be pickled - but async compile needs to pass them across a wire, so we need to be able to serialize them - this PR adds some helpers to enable that.
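For illustration, here is the general wrapping pattern at play (a minimal sketch with hypothetical names, not the actual helpers added in this PR): objects that pickle rejects are replaced by a serializable recipe that rebuilds them on the receiving side.
```python
import pickle

# Minimal sketch of the wrapping idea (hypothetical helper, not the real API):
# stash an importable builder plus its arguments, and let pickle call the
# builder again when the payload is loaded on the other side of the wire.
class PickleByRecipe:
    def __init__(self, build_fn, *args):
        self.build_fn = build_fn  # must be a module-level, picklable callable
        self.args = args

    def __reduce__(self):
        # Instructs pickle to reconstruct the object as build_fn(*args).
        return (self.build_fn, self.args)

# e.g. payload = pickle.dumps(PickleByRecipe(make_unpicklable_thing, "config"))
```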
Pull Request resolved: https://github.com/pytorch/pytorch/pull/141659
Approved by: https://github.com/jamesjwu
Mitigates the deterministic benchmark regression (https://github.com/pytorch/pytorch/issues/144775#issuecomment-2593411844), and maybe the dashboard issue.
fx.Node.is_impure is unexpectedly a hot spot. It gets called for every node in the graph whenever we invoke DCE, which should be okay, EXCEPT we invoke DCE on the full graph ~10 times at various stages of torch.compile, and an insane number of times (>O(parameters)) for the subgraphs traced by the pattern matcher.
I considered addressing this problem by reducing the amount of times DCE is called, but I think we can only trim the ones from the pattern matcher, which will require some refactor/caching solution that I leave out of this PR.
torch.Tag.nondeterministic_seeded is provided by native_functions.yml and is implemented as a list. Most of the time, it has <=2 elements, so it's not really worth it to turn it into a set for fast lookup.
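For illustration, the hot check boils down to a tag-membership test on the op's (small) tag list; a minimal sketch of that lookup:
```python
import torch

# Node.is_impure has to ask whether the op is seeded/nondeterministic; the tag
# container is a short list, so a linear `in` check is already cheap.
op = torch.ops.aten.rand.default
is_random = torch.Tag.nondeterministic_seeded in op.tags
print(is_random)  # True for aten.rand
```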
Using the deterministic instruction count benchmarks:
```
# before
aotdispatcher_partitioner_cpu,compile_time_instruction_count,8914894946
aotdispatcher_partitioner_cpu,compile_time_instruction_count,8866669058
# after
aotdispatcher_partitioner_cpu,compile_time_instruction_count,8770562314
aotdispatcher_partitioner_cpu,compile_time_instruction_count,8779547794
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/145118
Approved by: https://github.com/ezyang, https://github.com/zou3519
Currently there are a few type annotations that falsely state that mypy doesn't support recursive types.
Recursive type support has been available in mypy for a few years already. It was officially enabled in [version 0.991](https://mypy-lang.blogspot.com/2022/11/mypy-0990-released.html). Pyright even had support for recursive types earlier (https://github.com/microsoft/pyright/issues/569), so there is probably no reason not to model these types correctly.
This PR models these types properly now. Since this has turned a few implicit `Any` into fully typed variables that are not narrowed cleanly, a small number of type ignores were necessary.
Note that it is desirable to model `Argument` covariantly (i.e. using `Sequence` and `Mapping`) instead of making it unnecessarily invariant (using `List` and `Dict`). If it were modeled invariantly, a `List[Node]` would, for instance, not type check as `Argument`, because invariance would require it to literally be a `List[Argument]` (i.e., spanning all the branches of the union type). Since even the name of the type, "argument", strongly suggests that it is semantically used as a read-only argument, covariance is the natural choice anyway.
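A simplified sketch of the difference (the real `Argument` union in `torch/fx/node.py` has more members; the alias names here are made up):
```python
from typing import List, Optional, Sequence, Union

from torch.fx.node import Node

# Invariant modeling: containers must hold exactly the union element type.
ArgumentInvariant = Optional[Union[List["ArgumentInvariant"], Node, str, int]]
# Covariant modeling: any Sequence of things assignable to the union is fine.
ArgumentCovariant = Optional[Union[Sequence["ArgumentCovariant"], Node, str, int]]

nodes: List[Node] = []
# accepted: a List[Node] is a Sequence of covariant arguments
# covariant_arg: ArgumentCovariant = nodes
# rejected: List is invariant, so List[Node] is not a List[ArgumentInvariant]
# invariant_arg: ArgumentInvariant = nodes
```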
There are no changes in this PR that affect runtime behavior.
CC @Skylion007
Pull Request resolved: https://github.com/pytorch/pytorch/pull/142300
Approved by: https://github.com/ezyang, https://github.com/Skylion007
Summary: Change fx graph module's _replace_hook from a single hook to a list of hooks. This is to prepare for registering more hooks for inductor provenance tracking, where we might need to register multiple hooks for node replacement.
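A minimal sketch of the shape of this change (attribute and method names are illustrative, not the exact ones in `torch/fx/graph_module.py`):
```python
from typing import Callable, List

# Sketch: keep a list of replace hooks instead of a single optional hook, and
# fan out to all of them whenever a node is replaced.
class _GraphModuleSketch:
    def __init__(self) -> None:
        self._replace_hooks: List[Callable] = []  # was: self._replace_hook = None

    def _register_replace_node_hook(self, hook: Callable) -> None:
        self._replace_hooks.append(hook)

    def _notify_replace(self, old, new, user) -> None:
        for hook in self._replace_hooks:
            hook(old=old, new=new, user=user)
```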
Test Plan:
```
buck run mode/dev-nosan caffe2/test:fx -- -r test_hooks_for_node_update
buck run mode/dev-nosan caffe2/test:test_export -- -r test_replace_hook
```
Differential Revision: D66726724
Pull Request resolved: https://github.com/pytorch/pytorch/pull/142006
Approved by: https://github.com/zhxchen17
Part of #134054.
This corresponds to the pytorch mypy changes from D61493706. Updating takes so
long and touches so many files that it's impossible to land as a whole without conflicting with some other intermediate change.
So we are landing these 'type: ignore's for pytorch in advance of them actually being needed.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/134202
Approved by: https://github.com/Skylion007
Summary:
- make the default DCE pass check the schema,
- need to rebase onto https://github.com/pytorch/pytorch/pull/131651 after it's in Phabricator (for now the change is manually added),
- mark Proxy dump as NotImplemented for a better error message,
- remove Proxy from tensors when dumping models, as Proxy cannot be dumped.
More details in https://docs.google.com/document/d/1G5vmTXjzxoyVGRI2kpA1gQukK_Glyg2NrE0Oh6Nlg9A/edit?usp=sharing.
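For illustration, a minimal sketch of the kind of schema-based impurity check described above (simplified, with a hypothetical helper name; the real logic lives in `Node.is_impure` and the DCE pass):
```python
# Sketch: a call_function node whose op schema is mutable (or has no schema to
# consult) should be treated as impure and must not be eliminated by DCE.
def _is_impure_by_schema(node) -> bool:
    schema = getattr(node.target, "_schema", None)
    if schema is None:
        return True  # nothing to consult; be conservative
    return schema.is_mutable
```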
Test Plan:
CI
```
- buck2 run 'fbcode//mode/dev-nosan' fbcode//caffe2/test/quantization:test_quantization -- -r qat_conv2d
- test_export.py
- buck2 run 'fbcode//mode/dev-nosan' fbcode//modai/test:test_modai -- -r test_qat_stinson_htp_export
- buck2 run 'fbcode//mode/dev-nosan' fbcode//vizard_projects/ml_depth/tests:test_model -- -r test_qat_model_et
- buck2 run 'fbcode//mode/dev-nosan' fbcode//caffe2/test:fx -- -r dce
- buck2 run 'fbcode//mode/dev-nosan' fbcode//bolt/nn/executorch/backends/tests:qnn_test -- -r test_qat_bias=False,use_3d_input=False
- buck2 run 'fbcode//mode/dev-nosan' fbcode//bolt/nn/executorch/backends/tests:qnn_test -- -r test_qat_bias=True,use_3d_input=False
- buck2 run 'fbcode//mode/dev-nosan' fbcode//caffe2/test/quantization:test_quantization -- -r test_fold_bn_erases_bn_node
```
Reviewed By: angelayi
Differential Revision: D60319175
Pull Request resolved: https://github.com/pytorch/pytorch/pull/132764
Approved by: https://github.com/angelayi
The problem was we were shoving SymInts into the constant_args side
table. The root problem is that torch.fx.node.base_types, which we use
to determine what can be put in the graph, doesn't actually have SymInt
in it. This PR fixes base_types to include SymInt.
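For illustration, the effect of the fix is simply that `SymInt` is now one of the leaf types `torch.fx` accepts in the graph:
```python
import torch
from torch.fx.node import base_types

# After this change, SymInt sits alongside int, float, torch.Tensor, etc. in
# the collection of types that may appear directly as graph arguments.
assert torch.SymInt in base_types
```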
Test Plan:
- tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/131363
Approved by: https://github.com/oulgen, https://github.com/justinchuby
Summary:
This diff reverts D59561509
D59561509: [FX][export] DCE pass, check schema for node impurity (#130395) by yushangdi causes the following test failure:
Tests affected:
- [cogwheel:cogwheel_mtia_cmf_m5_shrunk_test#test_flow_with_verification](https://www.internalfb.com/intern/test/844425041436985/)
Here's the Multisect link:
https://www.internalfb.com/multisect/6533402
Here are the tasks that are relevant to this breakage:
T191383430: 10+ tests unhealthy for ads_mtia_inference
The backout may land if someone accepts it.
If this diff has been generated in error, you can Commandeer and Abandon it.
Test Plan: NA
Differential Revision: D60029318
Pull Request resolved: https://github.com/pytorch/pytorch/pull/131341
Approved by: https://github.com/angelayi
This is a copy of Brian's PR https://github.com/pytorch/pytorch/pull/128754, with some changes in the test_distributed_patterns.py unit tests to more closely reflect FSDP2 patterns. Also disabled two tests `test_input_mutation_storage_resize_up_down` and `test_input_mutation_storage_resize_not_supported` in test_aotdispatch.py until we figure out the right behavior for them.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/129203
Approved by: https://github.com/bdhirsh
Currently, if `x` is a CUDA tensor, calling `x.untyped_storage().resize_()` seems to always go into the `built without cuda` branch of `resize_storage_bytes_()`, regardless of whether PyTorch is built with CUDA. I suspect this is because `inductor_ops.cpp` is only included in `libtorch_cpu.so` and thus doesn't have the `USE_CUDA` information or the ability to link to CUDA-related functions.
This PR moves the `resize_storage_bytes_()`-related custom op functions out of `inductor_ops.cpp` into a standalone file, `resize_storage_bytes.cpp`, which is included in `libtorch_python.so` instead. This mimics the setup for `StorageMethods.cpp`. This way, `resize_storage_bytes_()` has access to the CUDA-related functions, which makes the CUDA unit test pass.
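For reference, a minimal sketch of the usage pattern the CUDA unit test exercises (assuming a CUDA build):
```python
import torch

if torch.cuda.is_available():
    x = torch.randn(16, device="cuda")
    nbytes = x.untyped_storage().nbytes()
    x.untyped_storage().resize_(0)       # free the backing storage
    x.untyped_storage().resize_(nbytes)  # reallocate it (contents are uninitialized)
```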
Pull Request resolved: https://github.com/pytorch/pytorch/pull/129215
Approved by: https://github.com/jansel
We have some (limited) support for `set_()` input mutations in `torch.compile`, but one restriction today is that we force them to run outside of the graph, in the opaque runtime epilogue.
This is a problem for ppFSDP. Why? The usage pattern of ppFSDP forward graphs looks something like this:
```
def forward_fsdp(sacrificial_param, sharded_param, inp):
    allgathered_param = allgather(sharded_param)
    sacrificial_param.set_(allgathered_param)  # hidden in an autograd.Function that we trace
    out = matmul(sacrificial_param, inp)
    sacrificial_param.untyped_storage().resize_(0)
    return out
```
When we functionalize this graph, `sacrificial_param` sees two distinct types of input mutations that we must preserve: a `set_`, and a `resize_`. Importantly, at runtime the `set_()` must run **before** the `resize_()`. Why? The `set_()` updates the storage of our sacrificial param to the allgather'd data, which allows the call to `sacrificial_param.resize_()` to free the allgathered data later. If we run the two mutations in reverse order, we will never free the allgathered data.
We want to put the `resize_()` mutation op inside of the graph (see next PR, also there's a much longer description in that PR for anyone interested). However, this will require us to put `set_()` in the graph as well, in order for them to run in the correct order.
In order to do this, I had to add some extra restrictions: you are now required to run `set_()` under `no_grad()` if you use it with `torch.compile`, and if you perform any other mutations to the input, those must be under `no_grad()` as well (otherwise, the mutations may mutate the `grad_fn` of the input, making it no longer safe to keep in the graph). These restrictions are hopefully reasonable, since `set_()` doesn't see much usage today (and the original impetus for adding `set_()` support a few months ago was for FSDP anyway).
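A minimal sketch of what the restriction looks like from the user's side (names mirror the pseudocode above):
```python
import torch

def forward_fsdp(sacrificial_param, allgathered_param, inp):
    # Input mutations must happen under no_grad() for torch.compile to keep
    # them in the graph (and in the right order).
    with torch.no_grad():
        sacrificial_param.set_(allgathered_param)
    out = torch.matmul(sacrificial_param, inp)
    with torch.no_grad():
        sacrificial_param.untyped_storage().resize_(0)
    return out
```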
Pull Request resolved: https://github.com/pytorch/pytorch/pull/122981
Approved by: https://github.com/jansel
ghstack dependencies: #122433, #123646
Summary: This was a micro optimization that I thought would save time but it is not correct. For example, we cannot compare fake tensors.
Test Plan:
```
buck2 run 'fbcode//mode/opt' fbcode//langtech/edge/ns/tools/tests:test_ns_jit_traced_model_all_optimization_f328819347_portal_ns
```
now passes
Differential Revision: D55904083
Pull Request resolved: https://github.com/pytorch/pytorch/pull/123616
Approved by: https://github.com/aakhundov
This is a lot of files changed! Don't panic! Here's how it works:
* Previously, we set `follow_imports = silent` for our mypy.ini configuration. Per https://mypy.readthedocs.io/en/stable/running_mypy.html#follow-imports, what this does is whenever we have an import to a module which is not listed as a file to be typechecked in mypy, we typecheck it as normal but suppress all errors that occurred in that file.
* When mypy is run inside lintrunner, the list of files is precisely the files covered by the glob in lintrunner.toml, but with files in excludes excluded.
* The top-level directive `# mypy: ignore-errors` instructs mypy to typecheck the file as normal, but ignore all errors.
* Therefore, it should be equivalent to set `follow_imports = normal`, if we put `# mypy: ignore-errors` on all files that were previously excluded from the file list.
* Having done this, we can remove the exclude list from .lintrunner.toml, since excluding a file from typechecking is baked into the files themselves.
* torch/_dynamo and torch/_inductor were previously in the exclude list, because they were covered by MYPYINDUCTOR. It is not OK to mark these as `# mypy: ignore-errors` as this will impede typechecking on the alternate configuration. So they are temporarily being checked twice, but I am suppressing the errors in these files as the configurations are not quite the same. I plan to unify the configurations so this is only a temporary state.
* There were some straggler type errors after these changes somehow, so I fixed them as needed. There weren't that many.
In the future, to start type checking a file, just remove the ignore-errors directive from the top of the file.
The codemod was done with this script authored by GPT-4:
```
import glob
exclude_patterns = [
    ...
]
for pattern in exclude_patterns:
    for filepath in glob.glob(pattern, recursive=True):
        if filepath.endswith('.py'):
            with open(filepath, 'r+') as f:
                content = f.read()
                f.seek(0, 0)
                f.write('# mypy: ignore-errors\n\n' + content)
```
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/118414
Approved by: https://github.com/thiagocrepaldi, https://github.com/albanD