Commit Graph

160 Commits

Jason Ansel
5d4e7d58b4 [fx] Move Node._prepend/Node._remove_from_list to C++ (#148261)
Microbenchmarking `fx.symbolic_trace(lambda x: functools.reduce(operator.add, [x, *range(100000)]))`, before:
```
24303536 function calls (23503339 primitive calls) in 10.726 seconds
```
after:
```
20003454 function calls (19203257 primitive calls) in 8.936 seconds
```
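For reference, a minimal way to reproduce this kind of measurement (the numbers above are from the PR author's environment; `cProfile` output will differ by machine):

```python
# Rough reproduction of the microbenchmark quoted above; results are machine-dependent.
import cProfile
import functools
import operator

import torch.fx as fx

cProfile.run(
    "fx.symbolic_trace(lambda x: functools.reduce(operator.add, [x, *range(100000)]))",
    sort="cumulative",
)
```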

Pull Request resolved: https://github.com/pytorch/pytorch/pull/148261
Approved by: https://github.com/oulgen
ghstack dependencies: #148243, #148260
2025-03-10 16:06:11 +00:00
Jason Ansel
bf752c36da [fx] Move Node._update_args_kwargs to C++ (#148260)
Microbenchmarking `fx.symbolic_trace(lambda x: functools.reduce(operator.add, [x, *range(100000)]))`, before:
```
25203549 function calls (24403352 primitive calls) in 12.090 seconds
```
after:
```
24303536 function calls (23503339 primitive calls) in 10.726 seconds
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/148260
Approved by: https://github.com/oulgen
ghstack dependencies: #148243
2025-03-10 16:06:02 +00:00
Jason Ansel
bec7bdad47 [fx] Move map_aggregate to C++ (#148243)
Microbenchmarking `fx.symbolic_trace(lambda x: functools.reduce(operator.add, [x, *range(100000)]))`, before:
```
30603618 function calls (29403419 primitive calls) in 13.744 seconds
```
after:
```
25203549 function calls (24403352 primitive calls) in 12.090 seconds
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/148243
Approved by: https://github.com/oulgen
2025-03-10 16:05:53 +00:00
PyTorch MergeBot
92beda54c8 Revert "[fx] Move map_aggregate to C++ (#148243)"
This reverts commit edaff88f69.

Reverted https://github.com/pytorch/pytorch/pull/148243 on behalf of https://github.com/jovianjaison due to breaking internal builds [T216910920] ([comment](https://github.com/pytorch/pytorch/pull/148243#issuecomment-2698724058))
2025-03-04 19:40:21 +00:00
PyTorch MergeBot
17d003fe75 Revert "[fx] Move Node._update_args_kwargs to C++ (#148260)"
This reverts commit 0135f57f4a.

Reverted https://github.com/pytorch/pytorch/pull/148260 on behalf of https://github.com/jovianjaison due to breaking internal builds [T216910920] ([comment](https://github.com/pytorch/pytorch/pull/148243#issuecomment-2698724058))
2025-03-04 19:40:21 +00:00
PyTorch MergeBot
97b9e68bc6 Revert "[fx] Move Node._prepend/Node._remove_from_list to C++ (#148261)"
This reverts commit 29c2de9ae1.

Reverted https://github.com/pytorch/pytorch/pull/148261 on behalf of https://github.com/jovianjaison due to breaking internal builds [T216910920] ([comment](https://github.com/pytorch/pytorch/pull/148243#issuecomment-2698724058))
2025-03-04 19:40:21 +00:00
Jason Ansel
29c2de9ae1 [fx] Move Node._prepend/Node._remove_from_list to C++ (#148261)
Microbenchmarking `fx.symbolic_trace(lambda x: functools.reduce(operator.add, [x, *range(100000)]))`, before:
```
24303536 function calls (23503339 primitive calls) in 10.726 seconds
```
after:
```
20003454 function calls (19203257 primitive calls) in 8.936 seconds
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/148261
Approved by: https://github.com/oulgen
ghstack dependencies: #148243, #148260
2025-03-02 22:42:31 +00:00
Jason Ansel
0135f57f4a [fx] Move Node._update_args_kwargs to C++ (#148260)
Microbenchmarking `fx.symbolic_trace(lambda x: functools.reduce(operator.add, [x, *range(100000)]))`, before:
```
25203549 function calls (24403352 primitive calls) in 12.090 seconds
```
after:
```
24303536 function calls (23503339 primitive calls) in 10.726 seconds
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/148260
Approved by: https://github.com/oulgen
ghstack dependencies: #148243
2025-03-02 22:42:31 +00:00
Jason Ansel
edaff88f69 [fx] Move map_aggregate to C++ (#148243)
Microbenchmarking `fx.symbolic_trace(lambda x: functools.reduce(operator.add, [x, *range(100000)]))`, before:
```
30603618 function calls (29403419 primitive calls) in 13.744 seconds
```
after:
```
25203549 function calls (24403352 primitive calls) in 12.090 seconds
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/148243
Approved by: https://github.com/oulgen
2025-03-02 22:42:31 +00:00
Xuehai Pan
cba14212e6 [FX] micro-optimization map_aggregate(immutable_dict) (#147691)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/147691
Approved by: https://github.com/Skylion007, https://github.com/jansel
ghstack dependencies: #147699, #144640
2025-02-24 09:14:08 +00:00
Simon Fan
ac88a6c00d [fx] demote node prepend to self log from warning to debug (#147538)
FIXES https://github.com/pytorch/pytorch/issues/147175

This is harmless; it's not clear why it was a user warning. Writing graph-reordering passes is more concise when we can ignore this warning.
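A hedged sketch of the kind of reordering pass where this no-op fires (the traced function and schedule are illustrative):

```python
# Illustrative only: a pass that re-appends nodes in an already-correct order.
# When `anchor.next` is already `node`, the append becomes a prepend-to-self no-op,
# which used to emit a UserWarning and now only logs at debug level.
import torch.fx as fx

gm = fx.symbolic_trace(lambda x: x.relu().sigmoid())
order = list(gm.graph.nodes)  # pretend this is a freshly computed schedule

anchor = order[0]
for node in order[1:]:
    anchor.append(node)  # moves `node` directly after `anchor`; no-op if it's already there
    anchor = node
gm.recompile()
```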

Pull Request resolved: https://github.com/pytorch/pytorch/pull/147538
Approved by: https://github.com/yanboliang
2025-02-21 01:32:34 +00:00
Tom Ritchford
272ead7b5e Make fx.node.map_arg() and .map_aggregate() generic (#146248)
## What's the problem?

The popular `fx.node.map_arg()` and `fx.node.map_aggregate()` apply operations recursively on `dict`s, `tuple`s, `list`s, etc., and return a new collection of the same type.

Unfortunately, their base input type is `Argument`, which is [very unspecific indeed](5d55a6585d/torch/fx/node.py (L48-L58)): most type information is just thrown away at the call site of either of these functions, as far as the type checker goes.

As `torch` moves to a more typed code base, this would force innocent, unsuspecting developers to add logically unnecessary casts or `# type: ignore` statements.

## What's the solution?

Making these two `node.map_*` functions generic on the first argument and return type means that type information is preserved for the type checker. (The signature of the other parameter, the function that visits the nodes and subnodes, has not changed, nor should it.)
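A small illustrative sketch of what the generic signatures buy a caller (the inferred-type comment is an assumption about how a checker now sees the call, not asserted mypy output):

```python
# Illustrative: with the generic signatures, the return type mirrors the input type.
import torch.fx as fx
from torch.fx.node import Node, map_arg

gm = fx.symbolic_trace(lambda x, y: x + y)
add_node = next(n for n in gm.graph.nodes if n.op == "call_function")

def rename(n: Node) -> Node:
    return n  # imagine remapping into a new graph here

new_args = map_arg(add_node.args, rename)
# Before this change, `new_args` was typed as the very broad `Argument`; with the generic
# signature the checker keeps the concrete tuple type, so no cast or type: ignore is needed.
```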

## Won't it break everything?

It doesn't break the type checker - one place needed an extra hint.

There have been code breakages: one resolved, at least one new one... we'll see!

Pull Request resolved: https://github.com/pytorch/pytorch/pull/146248
Approved by: https://github.com/XuehaiPan, https://github.com/Skylion007
2025-02-14 19:25:32 +00:00
Aaron Orenstein
1f8ff94d4f PEP585: Add noqa to necessary tests (#146391)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/146391
Approved by: https://github.com/justinchuby, https://github.com/Skylion007
2025-02-12 15:29:50 +00:00
Aaron Orenstein
57d8278ab9 pickler for GraphModule (#141659)
Pickling a GraphModule needs some special handling for wrapping things that normally can't be pickled, but async compile needs to pass GraphModules across a wire, so we need to be able to serialize them. This adds some helpers to enable that.
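A minimal round-trip sketch, assuming plain `pickle` suffices for a trivially traced module; the helpers added here target the objects a plain pickle cannot wrap:

```python
# Minimal sketch: round-trip a simple traced module through pickle.
import pickle

import torch
import torch.fx as fx

gm = fx.symbolic_trace(torch.nn.Linear(4, 4))
restored = pickle.loads(pickle.dumps(gm))

x = torch.randn(2, 4)
assert torch.allclose(gm(x), restored(x))
```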

Differential Revision: [D68921318](https://our.internmc.facebook.com/intern/diff/D68921318)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/141659
Approved by: https://github.com/jamesjwu
2025-01-31 05:34:28 +00:00
Yidi Wu
d1143c4b37 [export] fix non-strict pre_dispatch exporting while_loop (#145762)
fix https://github.com/pytorch/pytorch/issues/145737.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/145762
Approved by: https://github.com/tugsbayasgalan, https://github.com/zou3519, https://github.com/avikchaudhuri
2025-01-30 18:58:34 +00:00
PyTorch MergeBot
2de53b3b65 Revert "pickler for GraphModule (#141659)"
This reverts commit c6ad08357b.

Reverted https://github.com/pytorch/pytorch/pull/141659 on behalf of https://github.com/ZainRizvi due to Sorry but this is breaking internally, please take a look at D68694181 for more details. ([comment](https://github.com/pytorch/pytorch/pull/141659#issuecomment-2617045120))
2025-01-27 22:39:30 +00:00
leslie-fang-intel
2e80093306 setitem node shouldn't be deadcode eliminated (#145714)
**Summary**
Fix issue https://github.com/pytorch/pytorch/issues/145697. The `operator.setitem` node was being eliminated as dead code, causing a correctness issue. This PR marks it as impure to avoid that side effect.

**TestPlan**
```
python -u -m pytest -s -v test/fx/test_dce_pass.py -k test_keep_setitem
```
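A hedged sketch of the failure mode being fixed (the graph is illustrative, not the repro from the issue):

```python
# Illustrative: operator.setitem mutates its first argument, so eliminating it as dead
# code silently changes the result. Marking it impure keeps it alive during DCE.
import operator

import torch
import torch.fx as fx

g = fx.Graph()
x = g.placeholder("x")
buf = g.call_function(torch.zeros, (3,))
g.call_function(operator.setitem, (buf, 0, x))  # mutation; its return value is never read
g.output(buf)

g.eliminate_dead_code()  # with the fix, the setitem node is treated as impure and kept
print(g)
```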

Pull Request resolved: https://github.com/pytorch/pytorch/pull/145714
Approved by: https://github.com/ezyang
2025-01-27 15:08:21 +00:00
Aaron Orenstein
c6ad08357b pickler for GraphModule (#141659)
Pickling a GraphModule needs some special handling for wrapping things that normally can't be pickled, but async compile needs to pass GraphModules across a wire, so we need to be able to serialize them. This adds some helpers to enable that.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/141659
Approved by: https://github.com/jamesjwu
2025-01-26 19:29:13 +00:00
Simon Fan
27598cd154 [fx] move DCE rand check to import time (#145118)
Mitigates the deterministic benchmark regression: https://github.com/pytorch/pytorch/issues/144775#issuecomment-2593411844. and maybe the dashboard issue.

fx.Node.is_impure is unexpectedly a hot spot. It gets called for every node in the graph whenever we invoke DCE, which should be okay, EXCEPT we invoke DCE on the full graph ~10 times at various stages of torch.compile, and an insane number of times (>O(parameters)) for the subgraphs traced by the pattern matcher.

I considered addressing this problem by reducing the amount of times DCE is called, but I think we can only trim the ones from the pattern matcher, which will require some refactor/caching solution that I leave out of this PR.

torch.Tag.nondeterministic_seeded is provided by native_functions.yml and is implemented as a list. Most of the time, it has <=2 elements, so it's not really worth it to turn it into a set for fast lookup.
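A hedged sketch of the import-time hoisting idea (the names and the op list are illustrative, not the actual implementation):

```python
# Illustrative: do the tag lookup once at module import so the per-node check on the DCE
# hot path reduces to a cheap set membership test.
import torch

# Computed once, when the module is imported.
_RAND_OPS = frozenset(
    op
    for op in (torch.ops.aten.rand.default, torch.ops.aten.randn.default)
    if torch.Tag.nondeterministic_seeded in op.tags
)

def node_uses_rand(node: "torch.fx.Node") -> bool:
    # Hot path: called for every node on every DCE invocation.
    return node.op == "call_function" and node.target in _RAND_OPS
```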

Using the deterministic instruction count benchmarks
```python
# before
aotdispatcher_partitioner_cpu,compile_time_instruction_count,8914894946
aotdispatcher_partitioner_cpu,compile_time_instruction_count,8866669058
# after
aotdispatcher_partitioner_cpu,compile_time_instruction_count,8770562314
aotdispatcher_partitioner_cpu,compile_time_instruction_count,8779547794
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/145118
Approved by: https://github.com/ezyang, https://github.com/zou3519
2025-01-22 02:23:02 +00:00
Aaron Orenstein
0b2a3687b9 PEP585 update - torch/fx (#145166)
See #145101 for details.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/145166
Approved by: https://github.com/bobrenjc93
2025-01-20 18:11:54 +00:00
Simon Fan
7f1946aa9b [aot] don't dce aten rng nodes (#144319)
FIXES https://github.com/pytorch/pytorch/issues/143431

For aot_eager backend, we dce twice in aot. The first dce errs on the side of caution and provides a restrictive dce function: 2e1ea8598f/torch/fx/experimental/proxy_tensor.py (L1173)

The second one is more aggressive: 2e1ea8598f/torch/_functorch/_aot_autograd/dispatch_and_compile_graph.py (L185)
But this deviates from eager accuracy when rand ops are DCE'd.

The repro doesn't work for inductor, but that's a separate issue
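A hedged illustration of why DCE'ing rand nodes breaks eager parity (the function is made up; the actual repro is in the linked issue):

```python
# Illustrative: the first rand's value is unused, but it still advances the RNG state.
# Dropping that node during DCE makes the compiled result diverge from eager.
import torch

def f(x):
    torch.rand(4)              # "dead" in dataflow terms, but has an RNG side effect
    return x + torch.rand(4)   # observes a different RNG state if the line above is removed

torch.manual_seed(0)
eager = f(torch.zeros(4))

torch.manual_seed(0)
compiled = torch.compile(f, backend="aot_eager")(torch.zeros(4))
print(torch.allclose(eager, compiled))  # should be True once rand nodes are not DCE'd
```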

Pull Request resolved: https://github.com/pytorch/pytorch/pull/144319
Approved by: https://github.com/jansel
2025-01-09 05:27:49 +00:00
Fabian Keller
8cb68b136f Proper modeling of recursive types (#142300)
Currently there are a few type annotations that falsely state that mypy doesn't support recursive types.

Recursive type support has been available in mypy for a few years already. It was officially enabled in [version 0.991](https://mypy-lang.blogspot.com/2022/11/mypy-0990-released.html). Pyright even had support for recursive types earlier (https://github.com/microsoft/pyright/issues/569), so there is probably no reason not to model these types correctly.

This PR models these types properly now. Since this has turned a few implicit `Any` into fully typed variables that are not narrowed cleanly, a small number of type ignores were necessary.

Note that regarding `Argument` it is desirable to model it in a covariant way (i.e. using `Sequence` and `Mapping`) instead of making it invariant unnecessarily (using `List` and `Dict`). If it were modeled invariantly, it would for instance mean that a `List[Node]` would not type check as `Argument`, because invariance would mean that it really has to be a `List[Argument]` (i.e., including all the branches of the union type). Since even the name of the type, "Argument", strongly suggests that it is semantically used as an argument, covariance is natural anyway.
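A hedged sketch of the shape of such a recursive, covariant alias (simplified; the real `Argument` union in `torch/fx/node.py` has more branches):

```python
# Simplified, illustrative version of a recursive, covariant Argument-style alias.
# Sequence/Mapping (rather than List/Dict) keep it covariant, so e.g. List[Node] is accepted.
from typing import Mapping, Optional, Sequence, Union

from torch.fx.node import Node

Argument = Optional[
    Union[
        Node,
        str,
        int,
        float,
        bool,
        Sequence["Argument"],
        Mapping[str, "Argument"],
    ]
]
```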

There are no changes in this PR that affect runtime behavior.

CC @Skylion007

Pull Request resolved: https://github.com/pytorch/pytorch/pull/142300
Approved by: https://github.com/ezyang, https://github.com/Skylion007
2024-12-07 21:30:45 +00:00
Shangdi Yu
51cbac4e6a [export] Change fx graph _replace_hook to a list of Callable (#142006)
Summary: Change fx graph module's _replace_hook from a single hook to a list of hooks. This is to prepare for registering more hooks for inductor provenance tracking, where we might need to register multiple hooks for node replacement.

Test Plan:
```
buck run mode/dev-nosan caffe2/test:fx -- -r test_hooks_for_node_update
buck run mode/dev-nosan caffe2/test:test_export -- -r test_replace_hook
```

Differential Revision: D66726724

Pull Request resolved: https://github.com/pytorch/pytorch/pull/142006
Approved by: https://github.com/zhxchen17
2024-12-05 03:26:48 +00:00
angelayi
0fbc0830ba [export] Add device and dtype fields to assert_tensor_metadata (#141071)
Differential Revision: [D66321128](https://our.internmc.facebook.com/intern/diff/D66321128)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/141071
Approved by: https://github.com/yushangdi, https://github.com/zou3519
2024-11-22 20:54:55 +00:00
Xuehai Pan
abbd71d29d [BE][Easy] enable PYFMT for torch.fx (#138443)
Reproduce command:

```bash
ghstack checkout https://github.com/pytorch/pytorch/pull/138443
git checkout HEAD~1 torch/
lintrunner -a --take "PYFMT" --all-files
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/138443
Approved by: https://github.com/ezyang
2024-10-21 19:15:49 +00:00
Jason Ansel
28330a8a39 [reland 1/3][fx] Bypass custom __setattr__ in Node.__init__ (#135733)
Relands #135079 which was reverted by #135562

I broke this up into three parts to test internally.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/135733
Approved by: https://github.com/oulgen
2024-09-12 04:29:37 +00:00
Ivan Zaitsev
440f8f57af Revert "[fx] Bypass custom __setattr__ in Node.__init__ (#135079)" (#135562)
This reverts commit 66da3b3b2a.

#135079 breaks internal tests and needs to be reverted. Revert with mergebot doesn't work as this PR is technically part of the stack, but, according to @jansel, it should be possible to revert it individually.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/135562
Approved by: https://github.com/jansel, https://github.com/seemethere
2024-09-10 18:07:11 +00:00
Jason Ansel
66da3b3b2a [fx] Bypass custom __setattr__ in Node.__init__ (#135079)
Before:
![image](https://github.com/user-attachments/assets/5f0a6ae6-6049-44d0-b5f2-a549a23ad97f)

After:
![image](https://github.com/user-attachments/assets/51c9f91b-f8a0-4043-8362-65813feec823)
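A hedged sketch of the technique (a generic pattern, not the exact `Node.__init__` diff):

```python
# Illustrative pattern: skip a custom __setattr__ hook on the hot construction path.
class Node:
    def __setattr__(self, name, value):
        # Imagine bookkeeping here (e.g. updating users when args/kwargs change).
        super().__setattr__(name, value)

    def __init__(self, graph, name):
        # object.__setattr__ writes the attribute directly, bypassing the hook above,
        # so construction doesn't pay for the bookkeeping on every field.
        object.__setattr__(self, "graph", graph)
        object.__setattr__(self, "name", name)
```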

Pull Request resolved: https://github.com/pytorch/pytorch/pull/135079
Approved by: https://github.com/oulgen
ghstack dependencies: #135070, #135076, #135082, #135084
2024-09-06 06:11:46 +00:00
Jason Ansel
bdfc8d9f96 [fx] Don't use generators in map_aggregate (#135082)
While the generators avoid a copy, they are slow.
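A hedged before/after sketch of the change (simplified; the real `map_aggregate` also handles lists, dicts, slices, and namedtuples):

```python
# Simplified illustration: tuple(<listcomp>) is faster in CPython than tuple(<genexpr>)
# because the list's length is known up front, at the cost of one temporary list.
def map_aggregate_before(a, fn):
    if isinstance(a, tuple):
        return tuple(map_aggregate_before(e, fn) for e in a)   # generator: no copy, but slow
    return fn(a)

def map_aggregate_after(a, fn):
    if isinstance(a, tuple):
        return tuple([map_aggregate_after(e, fn) for e in a])  # list comprehension: faster
    return fn(a)
```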

Before:
![image](https://github.com/user-attachments/assets/70a55a9a-0595-4105-b0ab-22cf77c7409c)

After:
![image](https://github.com/user-attachments/assets/cecb9c59-ae36-47de-8b08-cab2c7cb3d57)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/135082
Approved by: https://github.com/oulgen
ghstack dependencies: #135070, #135076
2024-09-05 23:41:30 +00:00
Jason Ansel
70779dded8 [fx] Compile time optimization in Node.__update_args_kwargs (#135076)
Before this we took two passes over all of the args.

Before:
![image](https://github.com/user-attachments/assets/24ce5628-03f4-4983-9f2d-5ddf0ca5816e)

After:
![image](https://github.com/user-attachments/assets/c9681aa2-32f0-4f6b-a598-fc6f90ffafb5)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/135076
Approved by: https://github.com/Chillee
ghstack dependencies: #135070
2024-09-05 23:41:30 +00:00
Aaron Orenstein
ed86ac2f25 [BE] typing for decorators - fx/_compatibility (#134054)
Summary: See #131429

Test Plan: unit tests pass

Differential Revision: D61493706

Pull Request resolved: https://github.com/pytorch/pytorch/pull/134054
Approved by: https://github.com/oulgen
2024-08-26 04:00:27 +00:00
Aaron Orenstein
d95aedf5fd [BE] typing for decorators - fx/_compatibility (part 1) (#134202)
Part of #134054.

This corresponds to the pytorch mypy changes from D61493706. Updating takes so
long and touches so many files that it's impossible to land as a whole without conflicting with some other intermediate change.
So landing these 'type: ignore' for pytorch in advance of them actually being needed.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/134202
Approved by: https://github.com/Skylion007
2024-08-22 17:07:33 +00:00
Shangdi Yu
825002c9c6 [export][fx] More robust DCE pass (#132764)
Summary:
- Make the default DCE pass check schema.
- Need to rebase onto https://github.com/pytorch/pytorch/pull/131651 after it's in Phabricator (for now the change is manually added).
- Mark Proxy dump as NotImplemented for a better error message.
- Remove Proxy from tensors when dumping models, as Proxy cannot be dumped.

More details in https://docs.google.com/document/d/1G5vmTXjzxoyVGRI2kpA1gQukK_Glyg2NrE0Oh6Nlg9A/edit?usp=sharing.

Test Plan:
CI
```
- buck2 run 'fbcode//mode/dev-nosan'  fbcode//caffe2/test/quantization:test_quantization -- -r  qat_conv2d
- test_export.py
- buck2 run 'fbcode//mode/dev-nosan' fbcode//modai/test:test_modai -- -r test_qat_stinson_htp_export
- buck2 run 'fbcode//mode/dev-nosan' fbcode//vizard_projects/ml_depth/tests:test_model -- -r test_qat_model_et
- buck2 run 'fbcode//mode/dev-nosan'  fbcode//caffe2/test:fx -- -r dce
- buck2 run 'fbcode//mode/dev-nosan' fbcode//bolt/nn/executorch/backends/tests:qnn_test -- -r test_qat_bias=False,use_3d_input=False
- buck2 run 'fbcode//mode/dev-nosan' fbcode//bolt/nn/executorch/backends/tests:qnn_test -- -r test_qat_bias=True,use_3d_input=False
- buck2 run 'fbcode//mode/dev-nosan' fbcode//caffe2/test/quantization:test_quantization -- -r  test_fold_bn_erases_bn_node
```

Reviewed By: angelayi

Differential Revision: D60319175

Pull Request resolved: https://github.com/pytorch/pytorch/pull/132764
Approved by: https://github.com/angelayi
2024-08-06 22:27:22 +00:00
PyTorch MergeBot
945bf78894 Revert "[BE] typing for decorators - fx/_compatibility (#131568)"
This reverts commit 193f62fde9.

Reverted https://github.com/pytorch/pytorch/pull/131568 on behalf of https://github.com/clee2000 due to same as https://github.com/pytorch/pytorch/pull/131572#issuecomment-2254328359 but I clicked the wrong link by accident.  This is where it actually starts ([comment](https://github.com/pytorch/pytorch/pull/131568#issuecomment-2254330781))
2024-07-28 03:43:39 +00:00
Aaron Orenstein
193f62fde9 [BE] typing for decorators - fx/_compatibility (#131568)
See #131429

Pull Request resolved: https://github.com/pytorch/pytorch/pull/131568
Approved by: https://github.com/justinchuby, https://github.com/oulgen, https://github.com/zou3519
2024-07-25 22:24:19 +00:00
rzou
98984422eb [triton_op] fix autotuning (#131363)
The problem was we were shoving SymInts into the constant_args side
table. The root problem is that torch.fx.node.base_types, which we use
to determine what can be put in the graph, doesn't actually have SymInt
in it. This PR fixes base_types to include SymInt.
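A quick check of the fix's effect, assuming `base_types` remains importable from `torch.fx.node`:

```python
# Assumes torch.fx.node.base_types stays a module-level tuple of allowed base argument types.
import torch
from torch.fx.node import base_types

# After this fix, SymInt counts as a type that may appear directly in the graph,
# so symbolic ints no longer get shoved into the constant_args side table.
print(torch.SymInt in base_types)
```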

Test Plan:
- tests

Pull Request resolved: https://github.com/pytorch/pytorch/pull/131363
Approved by: https://github.com/oulgen, https://github.com/justinchuby
2024-07-24 14:03:37 +00:00
Aaron Orenstein
5a0068cc69 [BE] mypy: disallow untyped decorators (#131428)
Untyped decorators strip the types from their decorated function, so even if the underlying function is fully typed, callers to it don't get any benefit from type annotations.
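A hedged sketch of a typed decorator by contrast (a generic pattern; `compatibility` here is a stand-in, not the actual `fx/_compatibility` signature):

```python
# Generic pattern: ParamSpec + TypeVar preserve the decorated function's full signature,
# so callers still get argument and return-type checking.
from typing import Callable, TypeVar
from typing_extensions import ParamSpec

P = ParamSpec("P")
R = TypeVar("R")

def compatibility(is_backward_compatible: bool) -> Callable[[Callable[P, R]], Callable[P, R]]:
    def wrap(fn: Callable[P, R]) -> Callable[P, R]:
        fn._is_backward_compatible = is_backward_compatible  # type: ignore[attr-defined]
        return fn
    return wrap

@compatibility(is_backward_compatible=True)
def map_arg_example(a: int, fn: Callable[[int], int]) -> int:
    return fn(a)
```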

Step 1 - Enable the error and override in all the offending files.

#131429

Pull Request resolved: https://github.com/pytorch/pytorch/pull/131428
Approved by: https://github.com/justinchuby, https://github.com/oulgen
2024-07-23 21:50:55 +00:00
PyTorch MergeBot
6b8ec2b371 Revert "[triton_op] fix autotuning (#131363)"
This reverts commit 154f27455a.

Reverted https://github.com/pytorch/pytorch/pull/131363 on behalf of https://github.com/ZainRizvi due to This was a tricky one, but looking at the code it's the change to torch/fx/node.py that triggered the type violation errors. Reverting since this is now breaking trunk ([comment](https://github.com/pytorch/pytorch/pull/131363#issuecomment-2245899858))
2024-07-23 18:01:09 +00:00
Oguz Ulgen
4ca8705035 Add mypy typing to fx node (#131434)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/131434
Approved by: https://github.com/zou3519
2024-07-23 17:00:31 +00:00
rzou
154f27455a [triton_op] fix autotuning (#131363)
The problem was we were shoving SymInts into the constant_args side
table. The root problem is that torch.fx.node.base_types, which we use
to determine what can be put in the graph, doesn't actually have SymInt
in it. This PR fixes base_types to include SymInt.

Test Plan:
- tests

Pull Request resolved: https://github.com/pytorch/pytorch/pull/131363
Approved by: https://github.com/oulgen
2024-07-23 16:15:00 +00:00
Shangdi Yu
29e2e2afb6 Revert D59561509: Multisect successfully blamed "D59561509: [FX][export] DCE pass, check schema for node impurity (#130395)" for one test failure (#131341)
Summary:
This diff reverts D59561509
D59561509: [FX][export] DCE pass, check schema for node impurity (#130395) by yushangdi causes the following test failure:

Tests affected:
- [cogwheel:cogwheel_mtia_cmf_m5_shrunk_test#test_flow_with_verification](https://www.internalfb.com/intern/test/844425041436985/)

Here's the Multisect link:
https://www.internalfb.com/multisect/6533402
Here are the tasks that are relevant to this breakage:
T191383430: 10+ tests unhealthy for ads_mtia_inference

The backout may land if someone accepts it.

If this diff has been generated in error, you can Commandeer and Abandon it.

Test Plan: NA

Differential Revision: D60029318

Pull Request resolved: https://github.com/pytorch/pytorch/pull/131341
Approved by: https://github.com/angelayi
2024-07-23 05:23:47 +00:00
Shangdi Yu
27ded03545 [FX][export] DCE pass, check schema for node impurity (#130395)
Change the default DCE pass to check node schema for impure nodes.
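A hedged sketch of a schema-based impurity check (simplified; the real `Node.is_impure` handles additional cases such as explicitly side-effectful functions):

```python
# Simplified illustration: consult the op's FunctionSchema to see whether it mutates
# (or writes through an alias to) any of its inputs; such nodes must not be DCE'd.
import torch.fx as fx

def is_impure_by_schema(node: fx.Node) -> bool:
    if node.op != "call_function":
        return False
    schema = getattr(node.target, "_schema", None)  # present on OpOverloads
    if schema is None:
        return False
    return schema.is_mutable or any(
        arg.alias_info is not None and arg.alias_info.is_write
        for arg in schema.arguments
    )
```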

Pull Request resolved: https://github.com/pytorch/pytorch/pull/130395
Approved by: https://github.com/angelayi, https://github.com/jgong5
2024-07-18 16:31:40 +00:00
PyTorch MergeBot
433ef4e444 Revert "[FX][export] DCE pass, check schema for node impurity (#130395)"
This reverts commit e22b0acc76.

Reverted https://github.com/pytorch/pytorch/pull/130395 on behalf of https://github.com/yushangdi due to breaking tests, need to rebase and fix ([comment](https://github.com/pytorch/pytorch/pull/130395#issuecomment-2235192986))
2024-07-18 02:46:03 +00:00
Shangdi Yu
e22b0acc76 [FX][export] DCE pass, check schema for node impurity (#130395)
Change the default DCE pass to check node schema for impure nodes.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/130395
Approved by: https://github.com/angelayi, https://github.com/jgong5
2024-07-18 00:55:20 +00:00
Brian Hirsh
a4d7aa498b [Traceable FSDP2] Add auto-functionalize support for mutable list[Tensor] (copy from Brian's PR #127347); enable E2E inductor unit test for transformer model (#129502)
Copy of Brian's PR: https://github.com/pytorch/pytorch/pull/127347 with additional changes to support mutable `List[Tensor]` in Inductor. Also enable E2E inductor unit test for Traceable FSDP2 + transformer model.

Test commands:
- `pytest -rA test/distributed/_composable/fsdp/test_fully_shard_compile.py::TestFullyShardCompile::test_trace_fsdp_set_`
- `pytest -rA test/distributed/_composable/fsdp/test_fully_shard_compile.py::TestFullyShardCompile::test_simple_mlp_fullgraph_backend_aot_eager`
- `pytest -rA test/distributed/_composable/fsdp/test_fully_shard_compile.py::TestFullyShardCompile::test_simple_mlp_fullgraph_backend_inductor`
- `pytest -rA test/distributed/_composable/fsdp/test_fully_shard_compile.py::TestFullyShardCompile::test_transformer_fullgraph_backend_aot_eager`
- `pytest -rA test/dynamo/test_misc.py::MiscTests::test_auto_functionalize_tensorlist`
- `pytest -rA  test/inductor/test_torchinductor.py::GPUTests::test_fallback_mutable_op_list_cuda`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/129502
Approved by: https://github.com/zou3519
2024-06-27 17:50:57 +00:00
PyTorch MergeBot
45b2931b7e Revert "[Traceable FSDP2] Don't decompose fsdp.split_with_sizes_copy (#129414)"
This reverts commit b24787b757.

Reverted https://github.com/pytorch/pytorch/pull/129414 on behalf of https://github.com/ZainRizvi due to This PR is seems to be causing multiple macos failures.  Looks like it was merged before trunk jobs were started, which would have run those tests ([comment](https://github.com/pytorch/pytorch/pull/129414#issuecomment-2189479505))
2024-06-25 17:05:55 +00:00
Will Feng
b24787b757 [Traceable FSDP2] Don't decompose fsdp.split_with_sizes_copy (#129414)
This makes it easier to do pattern-matching on `fsdp.split_with_sizes_copy` in Inductor passes.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/129414
Approved by: https://github.com/bdhirsh
2024-06-25 03:08:56 +00:00
Brian Hirsh
b91a9dc328 [Brian's PR #128754] Use torch.ops.fsdp.set_ for FSDP2 storage resize; dont functionalize resize_, set_, split_with_sizes_copy.out (#129203)
This is a copy of Brian's PR https://github.com/pytorch/pytorch/pull/128754, with some changes in the test_distributed_patterns.py unit tests to more closely reflect FSDP2 patterns. Also disabled two tests `test_input_mutation_storage_resize_up_down` and `test_input_mutation_storage_resize_not_supported` in test_aotdispatch.py until we figure out the right behavior for them.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/129203
Approved by: https://github.com/bdhirsh
2024-06-23 06:07:19 +00:00
Will Feng
e165a5971f [Traceable FSDP2] Fix support for CUDA resize_storage_bytes_ (#129215)
Currently if `x` is a CUDA tensor, calling `x.untyped_storage().resize_()` seems to always go into the `built without cuda` branch of `resize_storage_bytes_()` regardless of whether PyTorch is built with CUDA. I suspect this is because `inductor_ops.cpp` is only included in `libtorch_cpu.so` and thus doesn't have the `USE_CUDA` information or the ability to link to CUDA-related functions.

This PR moves `resize_storage_bytes_()` related custom op functions out of `inductor_ops.cpp` into its standalone file `resize_storage_bytes.cpp` to be included in `libtorch_python.so` instead. This mimics the setup for `StorageMethods.cpp`. This way, `resize_storage_bytes_()` can have access to the CUDA-related functions, which passes the CUDA unit test.
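A hedged usage sketch of the pattern being exercised (a CUDA device is assumed to be available):

```python
# Illustrative: FSDP2 frees and later re-materializes a parameter's backing memory by
# resizing its untyped storage; resize_() takes a size in bytes.
import torch

p = torch.empty(1024, device="cuda")
nbytes = p.untyped_storage().nbytes()

p.untyped_storage().resize_(0)       # release the backing memory when it isn't needed
p.untyped_storage().resize_(nbytes)  # reallocate (uninitialized) before the next use
```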

Pull Request resolved: https://github.com/pytorch/pytorch/pull/129215
Approved by: https://github.com/jansel
2024-06-22 18:38:47 +00:00
Oguz Ulgen
5b5d269d34 Speed up fx graph iteration by implementing it in C++ (#128288)
Before this change
```
python benchmarks/dynamo/microbenchmarks/fx_microbenchmarks.py
iterating over 100000000 FX nodes took 19.5s (5132266 nodes/s)
```

After this change
```
python benchmarks/dynamo/microbenchmarks/fx_microbenchmarks.py
iterating over 100000000 FX nodes took 3.4s (29114001 nodes/s)
```

5.7x improvement
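A hedged, smaller-scale sketch of the measurement (the real script is `benchmarks/dynamo/microbenchmarks/fx_microbenchmarks.py`; the node count here is reduced so it runs quickly):

```python
# Reduced-size timing of iterating over FX graph nodes; numbers will differ by machine.
import time

import torch
import torch.fx as fx

g = fx.Graph()
x = g.placeholder("x")
for _ in range(100_000):
    x = g.call_function(torch.relu, (x,))
g.output(x)

start = time.perf_counter()
count = sum(1 for _ in g.nodes)
elapsed = time.perf_counter() - start
print(f"iterating over {count} FX nodes took {elapsed:.2f}s ({count / elapsed:.0f} nodes/s)")
```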

Differential Revision: [D58343997](https://our.internmc.facebook.com/intern/diff/D58343997)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/128288
Approved by: https://github.com/jansel, https://github.com/albanD
2024-06-11 05:48:31 +00:00