Commit Graph

95 Commits

Author SHA1 Message Date
Animesh Jain
a32eac345f [dynamo] Return gm.forward for eager backend (#124109)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/124109
Approved by: https://github.com/yanboliang, https://github.com/jansel
ghstack dependencies: #124445
2024-04-20 14:11:05 +00:00
Boyuan Feng
9a71d12d92 [CUDAGraphTree] Support mutated inputs from prior cudagraph pool (#123231)
# PR
This PR supports mutating inputs in cudagraph trees, if these inputs are outputs from previous cudagraph. Please check #121861 for more details.

# Note on Optimistic Mutation Check
To determine whether applying cudagraph, we need to check input mutations, falling into four categories: a) no mutation, b) mutation on parameters/buffers, c) mutation on cudagraph recorded tensors, d) mutation on non-cudagraph recorded tensors. We can apply cudagraph for type a,b,c but cannot for type d. This input mutation types depends on function, current_node, and inputs.

Since `check_for_mutation` is slow, there is a trade-off on making type c or d faster.
- To make type d) faster, we want to `check_for_mutation` and call eager function early. However, this adds unnecessary overhead to type a, b, c due to the extra check.
- To make type c) faster, we want to skip `check_for_mutation` at the beginning and only `check_for_mutation` before `record_function` for a new function. This removes the overhead of `check_for_mutation` for type a, b, c. However, this adds extra overhead to type d due to `check_invariants` for all children nodes.

Instead, we design optimistic mutation check. The assumption is that, given a function and a node, the input mutation types usually remain the same across inputs. So, if we have ever detect a function on a node with type d, we will never detect it as type c. The detailed design is:
- [Slow Path] On the first invocation of a function on a node, we run `check_for_mutation` once and cache the input mutation type as `non_cudagraph_managed_mutation[node_id][func_id]`.
- [Fast Path] On the subsequent invocations of a function on a node, we skip `check_for_mutation`. For `non_cudagraph_managed_mutation[node_id][func_id]` as true, we directly call eager function. Otherwise, we `check_variants` and call cudagraph function.
- [Slow Path] Before `record_function`, we run `check_for_mutation` again.

**Q1: Would there be overhead for type a,b,c,d?**
A: No. We only check input mutation types for the first invocation of a function on a node.

**Q2: If a function happens to be type c during the first invocation on a node, could we detect it as type d in the future?**
A: Yes. This is done by `check_invariants` and guarantees the correctness.

**Q3: If a function happens to be type d during the first invocation on a node, could it still be recognized as type c in the future?**
A: No. But this should happen rarely according to our assumption. In the rare case that it happens, there would not be any correctness issues and the performance is the same as the eager (or inductor optimized) function.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/123231
Approved by: https://github.com/eellison
2024-04-19 10:32:12 +00:00
Xuehai Pan
93e249969b [BE] enable ruff rule RSE and remove useless parentheses in raise statements (#124261)
Remove useless parentheses in `raise` statements if the exception type is raised with no argument.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/124261
Approved by: https://github.com/albanD
2024-04-17 19:29:34 +00:00
Brian Hirsh
01ab5a3104 aot_eager and aot_eager_decomp_partition: include input mutations in graph (#123646)
In the next PR I force `set_()` input mutation to require always been in the graph.

It's a lot easier to do this if we make our other debugging backends allow input mutations in the graph. Input mutations are relatively hardened at this point, so I'd rather just have our debugging backends consistently allow input mutations.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/123646
Approved by: https://github.com/ezyang
ghstack dependencies: #122433
2024-04-11 00:07:20 +00:00
angelayi
493478db4a [effects] Add inductor support for tokens (#122347)
Given the following code/dynamo graph:
```
class GraphModule(torch.nn.Module):
    def forward(self, L_x_ : torch.Tensor):
        l_x_ = L_x_
        _print = torch.ops.aten._print('moo')
        res = l_x_ + l_x_;  l_x_ = None
        _print_1 = torch.ops.aten._print('moo')
        return (res,)
```

AOTAutograd will trace the following program, threading tokens from the inputs, through the effectful operator calls (torch.ops.aten._print), and as an output:
```
class <lambda>(torch.nn.Module):
    def forward(self, arg0_1: "f32[0]", arg1_1: "f32[2, 3]"):
        with_effects = torch._higher_order_ops.effects.with_effects(arg0_1, torch.ops.aten._print.default, 'moo');  arg0_1 = None
        getitem: "f32[0]" = with_effects[0];  with_effects = None
        add: "f32[2, 3]" = torch.ops.aten.add.Tensor(arg1_1, arg1_1);  arg1_1 = None
        with_effects_1 = torch._higher_order_ops.effects.with_effects(getitem, torch.ops.aten._print.default, 'moo');  getitem = None
        getitem_2: "f32[0]" = with_effects_1[0];  with_effects_1 = None
        return (getitem_2, add)
```
However when we get to inductor, since we want the inductor generated code to not have any token inputs/outputs for better readability, we want to modify the aten graph by removing the tokens from inputs, and creating them through `torch.ops.aten._make_dep_token`, and sinking them through the `torch.ops.aten._sink_tokens` operators.
This has to be done *after* the partitioner, otherwise the partitioner will add the make_token/sink_token operators to the backwards graph.
```
class <lambda>(torch.nn.Module):
   def forward(self, arg1_1: "f32[2, 3]"):
       _make_dep_token_default: "f32[0]" = torch.ops.aten._make_dep_token.default()
       with_effects = torch._higher_order_ops.effects.with_effects(_make_dep_token_default, torch.ops.aten._print.default, 'moo');  _make_dep_token_default = None
       getitem: "f32[0]" = with_effects[0];  with_effects = None
       add: "f32[2, 3]" = torch.ops.aten.add.Tensor(arg1_1, arg1_1);  arg1_1 = None
       with_effects_1 = torch._higher_order_ops.effects.with_effects(getitem, torch.ops.aten._print.default, 'moo');  getitem = None
       getitem_2: "f32[0]" = with_effects_1[0];  with_effects_1 = None
       _sink_tokens_default = torch.ops.aten._sink_tokens.default((getitem_2,));  getitem_2 = None
       return (add,)
```
When doing inductor lowering, we convert `with_effects` calls to an `EffectfulKernel`, which just a `FallbackKernel` but with a pointer to previous effectful operator's call. During scheduling, we will create a `StarDep` between the EffectfulKernel and its previous EffectfulKernel so that they don't get reordered. The inductor generated python code looks like:
```
def call(args):
    arg1_1, = args
    args.clear()
    assert_size_stride(arg1_1, (2, 3), (3, 1))
    # Source Nodes: [_print], Original ATen: []
    buf2 = aten._print.default('moo')
    # Source Nodes: [_print_1], Original ATen: []
    buf3 = aten._print.default('moo')
    buf4 = empty_strided_cpu((2, 3), (3, 1), torch.float32)
    cpp_fused_add_0(arg1_1, buf4)
    del arg1_1
    return (buf4, )
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/122347
Approved by: https://github.com/bdhirsh
2024-04-09 03:22:32 +00:00
Simon Fan
8ac0f072e6 [aot eager] Support frontend graphs with list arguments (#123212)
We already support bumpy inputs for 3rd party frontend and compiled backward graph, we should add the behavior to aot_eager too

Pull Request resolved: https://github.com/pytorch/pytorch/pull/123212
Approved by: https://github.com/jansel
ghstack dependencies: #122691, #122746, #123007
2024-04-03 17:07:52 +00:00
eellison
5f46312dbb Reapply "Switch cudagraph backend to cudagraph trees (#121019)" and "Add Cudagraphs disable checking (#121018)" (#121864) (#122713)
This reverts commit 92ed8553a6.

No longer importing codecache or boxed_nop at top level, both of which casued issues.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/122713
Approved by: https://github.com/anijain2305
2024-04-02 16:11:00 +00:00
JackCaoG
e38d60bc07 Remove some stale xla dynamo backend (#122128)
`torchxla_trace_once ` and `aot_torchxla_trivial ` should be removed.

In our internal(hopefully dashboard can be open source soon) torchbench daily runs, `openxla` backend has much higher passing rate and similar perfomrance as the `openxla_eval`(non-aot-auto-grad backend). We still use `openxla_eval` in llama2 example but I think we should move user to `openxla` backend going forward.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/122128
Approved by: https://github.com/alanwaketan, https://github.com/jansel
2024-03-21 01:13:50 +00:00
Animesh Jain
92ed8553a6 Revert "Switch cudagraph backend to cudagraph trees (#121019)" and "Add Cudagraphs disable checking (#121018)" (#121864)
This reverts commit 9373ad0bb8.

Revert "Add Cudagraphs disable checking (#121018)"

This reverts commit 4af0e634bf.

Causes compilation time increase.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/121864
Approved by: https://github.com/eellison
2024-03-15 00:03:09 +00:00
Boyuan Feng
0c1ac4484d Support call_method in DDPOptimizer (#121771)
This PR fixes Issue #111279.

While #111279 reported the issue with `MultiheadAttention`, a minimal reproduction would be:
```python
class ToyModel(nn.Module):
    def __init__(self,):
        super().__init__()
        self.linear = nn.Linear(128, 10)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.linear.forward(x) # Error
        # return self.linear(x) # OK
```

Dynamo treats `self.linear(x)` as `call_module` while treating `self.linear.forward(x)` as a [`get_attr` and a `call_method`](https://github.com/pytorch/pytorch/blob/main/torch/_dynamo/variables/nn_module.py#L358-L378). However, existing DDPOptimizer assumes, for a `get_attr` node, `getattr(gm, node.target)` gives a tensor with the `requires_grad` attribute. Existing DDPOptimizer also does not support `call_method` nodes.

This PR adds support for `call_method` and check on `get_attr`. It also checks if a module's parameters have been added to a bucket to support multiple method calls from the same module.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/121771
Approved by: https://github.com/yf225
2024-03-13 20:03:15 +00:00
eellison
9373ad0bb8 Switch cudagraph backend to cudagraph trees (#121019)
Switch torch.compile(..., backend="cudagraphs") to use cudagraph trees. Enabled a few test in cudagraph_trees and note that there is another test suite existing for cudagraphs backend: https://github.com/pytorch/pytorch/blob/main/test/dynamo/test_cudagraphs.py.

This is basically the inductor cudagraphs without inductor.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/121019
Approved by: https://github.com/ezyang, https://github.com/jansel
ghstack dependencies: #121017, #121018
2024-03-08 22:56:26 +00:00
eellison
4af0e634bf Add Cudagraphs disable checking (#121018)
Adds the same cudagraphs disable checking from inductor - cudagraph trees to cudagraphs backend.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/121018
Approved by: https://github.com/ezyang
ghstack dependencies: #121017
2024-03-08 22:47:24 +00:00
Elias Ellison
937e89f252 cudagraphs backend refactoring (#121017)
This is just some refactoring.. no functional changes

Pull Request resolved: https://github.com/pytorch/pytorch/pull/121017
Approved by: https://github.com/ezyang
2024-03-08 19:47:41 +00:00
Elias Ellison
d03b11ad5b Pass inductor strides forward in ddp optimizer (#120523)
# Note: Returning Fake Tensors on First AOT Autograd Call
            #
            # Inductor will optimize strides of outputs when it deems it profitable.
            # For instance, converting to channels last. When we split the graph here
            # into multiple inductor compilations, we need to make sure that the
            # output strides of one compilation is appropriately passed to the subsequent
            # compilations. However, the mapping from inductor output to dynamo output
            # is non-trivial due to aot_autograd's deduping, de-aliasing, mutation, re-writing,
            # subclass handling, etc. In order to replay all this logic we set a flag such that
            # the first invocation of inductor in aot_autograd will return Fake Tensors with
            # appropriate strides. Then, all of aot autograd's runtime logic is replayed.
            # This gives us the appropriately strided outputs here which will reflect runtime strides.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/120523
Approved by: https://github.com/yf225, https://github.com/bdhirsh
2024-02-29 22:25:00 +00:00
Elias Ellison
9c9bde515c Factor out Submod compilers (#120527)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/120527
Approved by: https://github.com/kadeng
2024-02-28 22:11:47 +00:00
Edward Z. Yang
1a1fc1047d Add structured trace logs (#120289)
Overall design: https://docs.google.com/document/d/1CX_hJ0PNy9f3R1y8TJrfkSeLkvGjjjLU84BSXgS2AZ8/edit

How to read the diff:
* Most files are me augmenting pre-existing logging with structured variants. For the most part it's simple (esp FX graphs, which have a canonical string representation); it gets more complicated when I decided to JSON-ify some data structure instead of keeping the ad hoc printing (notably, guards and dynamo output graph sizes)
* torch/_functorch/_aot_autograd/collect_metadata_analysis.py is some unrelated fixes I noticed while auditing artifact logs
* torch/_logging/_internal.py has the actual trace log implementation. The trace logger is implement as a logger named torch.__trace which is disconnected from the logging hierarchy. It gets its own handler and formatter (TorchLogsFormatter with _is_trace True). `trace_structured` is the main way to emit a trace log. Unusually, there's a separate "metadata" and "payload" field. The metadata field should not be too long (as it is serialized as a single line) and is always JSON (we put contextual things like compile id in it); the payload field can be long and is emitted after the metadata log line and can span multiple lines.
* torch/_logging/structured.py contains some helpers for converting Python data structures into JSON form. Notably, we have a string interning implementation here, which helps reduce the cost of serializing filenames into the log.
* test/dynamo/test_structured_trace.py the tests are cribbed from test_logging.py, but all rewritten to use expect tests on munged versions of what we'd actually output. Payloads are never tested, since they tend not be very stable.

https://github.com/ezyang/tlparse is a POC Rust program that can interpret these logs.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/120289
Approved by: https://github.com/Skylion007
ghstack dependencies: #120712
2024-02-28 01:01:41 +00:00
PyTorch MergeBot
f3dd2a544c Revert "Add structured trace logs (#120289)"
This reverts commit 9dfaef962c.

Reverted https://github.com/pytorch/pytorch/pull/120289 on behalf of https://github.com/kit1980 due to breaking internal builds, see D54230697 ([comment](https://github.com/pytorch/pytorch/pull/120289#issuecomment-1967477120))
2024-02-27 19:49:05 +00:00
Edward Z. Yang
9dfaef962c Add structured trace logs (#120289)
Overall design: https://docs.google.com/document/d/1CX_hJ0PNy9f3R1y8TJrfkSeLkvGjjjLU84BSXgS2AZ8/edit

How to read the diff:
* Most files are me augmenting pre-existing logging with structured variants. For the most part it's simple (esp FX graphs, which have a canonical string representation); it gets more complicated when I decided to JSON-ify some data structure instead of keeping the ad hoc printing (notably, guards and dynamo output graph sizes)
* torch/_functorch/_aot_autograd/collect_metadata_analysis.py is some unrelated fixes I noticed while auditing artifact logs
* torch/_logging/_internal.py has the actual trace log implementation. The trace logger is implement as a logger named torch.__trace which is disconnected from the logging hierarchy. It gets its own handler and formatter (TorchLogsFormatter with _is_trace True). There's a teensy bit of FB specific code to automatically enable trace logging if a /logs directory exists. `trace_structured` is the main way to emit a trace log. Unusually, there's a separate "metadata" and "payload" field. The metadata field should not be too long (as it is serialized as a single line) and is always JSON (we put contextual things like compile id in it); the payload field can be long and is emitted after the metadata log line and can span multiple lines.
* torch/_logging/structured.py contains some helpers for converting Python data structures into JSON form. Notably, we have a string interning implementation here, which helps reduce the cost of serializing filenames into the log.
* test/dynamo/test_structured_trace.py the tests are cribbed from test_logging.py, but all rewritten to use expect tests on munged versions of what we'd actually output. Payloads are never tested, since they tend not be very stable.

https://github.com/ezyang/tlparse is a POC Rust program that can interpret these logs.

Testing that the fbcode detection works at https://www.internalfb.com/mlhub/pipelines/runs/fblearner/534553450 (Meta-only)

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/120289
Approved by: https://github.com/Skylion007
2024-02-27 00:04:23 +00:00
gs-olive
e0f6fa6a7c Windows Dynamo Error Removal CI Check (#115969)
Rebase of #111313 onto `main`, for CI validation

Co-authored-by: Stella Laurenzo <stellaraccident@gmail.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115969
Approved by: https://github.com/PaliC, https://github.com/thiagocrepaldi
2024-02-14 21:14:36 +00:00
PyTorch MergeBot
4a5b2cd6cb Revert "Windows Dynamo Error Removal CI Check (#115969)"
This reverts commit 45e7af5818.

Reverted https://github.com/pytorch/pytorch/pull/115969 on behalf of https://github.com/PaliC due to this pr ended up breaking some of our periodic tests ([comment](https://github.com/pytorch/pytorch/pull/115969#issuecomment-1942934386))
2024-02-14 01:11:46 +00:00
gs-olive
45e7af5818 Windows Dynamo Error Removal CI Check (#115969)
Rebase of #111313 onto `main`, for CI validation

Co-authored-by: Stella Laurenzo <stellaraccident@gmail.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115969
Approved by: https://github.com/ezyang
2024-02-08 21:23:45 +00:00
Will Constable
abe3c55a6a Update DDP dynamo debug docs (#118295)
Refreshes https://github.com/pytorch/pytorch/pull/114201 and updates it to include other log names that also include ddp_optimizer.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/118295
Approved by: https://github.com/LucasLLC, https://github.com/wanchaol
2024-01-29 14:58:26 +00:00
Edward Z. Yang
d03173e88c Unify MYPYINDUCTOR and MYPY (#118432)
The original motivation for MYPYINDUCTOR was a faster type checking configuration that only checked a subset of files. With the removal of `follow_imports = ignore`, we are now able to use dmypy to do fast incremental typechecking, eliminating the need for this.

Perhaps erroneously, when I tee'ed up this PR I elected to delete the `follow_imports = skip` designations in the mypy-inductor.ini. This lead to a number of extra type error suppressions that I manually edited. You will need to review.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/118432
Approved by: https://github.com/Skylion007
ghstack dependencies: #118414, #118418
2024-01-27 17:23:20 +00:00
Will Feng
a27ed4d364 [dynamo / DDP] Add optimize_ddp_lazy_compile config to control lazy compile for DDPOptimizer (False by default) (#116292)
We want to enable `optimize_ddp_lazy_compile` by default as soon as possible, becuase it will fix stride mismatch errors (see motivation: https://github.com/pytorch/pytorch/pull/114154).

However, lazy compile currently causes shape mismatch in other cases (`test_graph_split_inductor_transpose`) and we need to fix them before we can enable it by default.

Differential Revision: D52373445

Pull Request resolved: https://github.com/pytorch/pytorch/pull/116292
Approved by: https://github.com/williamwen42, https://github.com/wconstab
2023-12-21 22:34:24 +00:00
Jon Chuang
2cf0cf8137 [dynamo / DDP] - lazily compile submodules - to propagate real tensor strides to backend compiler (#114154)
Fixes https://github.com/pytorch/pytorch/issues/113812, https://github.com/pytorch/pytorch/issues/102591, Probably fixes: https://github.com/pytorch/pytorch/issues/113740, https://github.com/pytorch/pytorch/issues/113786, https://github.com/pytorch/pytorch/issues/113788

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114154
Approved by: https://github.com/wconstab, https://github.com/yf225
2023-12-06 18:50:14 +00:00
PyTorch MergeBot
e38a3a6079 Revert "[dynamo / DDP] - lazily compile submodules - to propagate real tensor strides to backend compiler (#114154)"
This reverts commit 3f574eadb4.

Reverted https://github.com/pytorch/pytorch/pull/114154 on behalf of https://github.com/clee2000 due to reverted internally, broke internal builds, not sure why bot isn't working ([comment](https://github.com/pytorch/pytorch/pull/114154#issuecomment-1832496040))
2023-11-29 18:43:17 +00:00
Jon Chuang
3f574eadb4 [dynamo / DDP] - lazily compile submodules - to propagate real tensor strides to backend compiler (#114154)
Fixes https://github.com/pytorch/pytorch/issues/113812, https://github.com/pytorch/pytorch/issues/102591, Probably fixes: https://github.com/pytorch/pytorch/issues/113740, https://github.com/pytorch/pytorch/issues/113786, https://github.com/pytorch/pytorch/issues/113788

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114154
Approved by: https://github.com/wconstab
2023-11-28 06:29:43 +00:00
Will Constable
2333d381b2 Make 'distributed' TORCH_LOGS include ddpoptimizer (#114376)
There are now 3 ways to see logs from ddpoptimzer.
1) TORCH_LOGS="distributed"
2) TORCH_LOGS="dynamo"
3) TORCH_LOGS="torch._dynamo.backends.distributed"

(1 and 2 are different supersets of 3 that also include other content)

Note: ddp_graphs is still a separate 'artifact' logger, which just
includes graph dumps from the graph-splitting process.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114376
Approved by: https://github.com/wanchaol
2023-11-28 02:39:28 +00:00
voznesenskym
081c5b3adc Add Stateful/Stateless symbolic contexts, use fresh fake mode for dynamo backends (#113926) (#114526)
Summary:

The primary problem we are setting out to solve here is fake tensor freshness. Before this PR, fake tensors after dynamo represented fake tensors *at the end* of trace, so subsequent retraces like aot_autograd would start off with fake tensors in the wrong (end result) state, rather than their expected fresh state. The solution here is to start a fresh fake mode, and re-fakify the tensors. The nuance comes from ensuring that symbols are uniformly created for the symbolic sizes and strides of the tensor.

This PR is the result of *a lot* of back and forth with ezyang and eellison. Initially, the first pass at this was not super different from what we have in the PR - the broad strokes were the same:

1) We cache source->symbol in shape_env
2) We pass policy objects around, stored at dynamo fakificaiton time, and reused for later fakification
3) We create a new fake mode for backends
(from https://github.com/pytorch/pytorch/pull/113605/files)

This is ugly, and has some layering violations. We detoured our decision making through a few other alternatives. Immutable/mutable fake tensor mode was the most interesting alternative, https://github.com/pytorch/pytorch/pull/113653, and was struck down on concerns of complexity in fake mode combined with it not covering all edge cases. We also detoured on what to do about tensor memoization returning back potentially different tensors than requested, and if that was an anti pattern (it is) we want to hack in with the symbol cache (we don't).

We went back to the drawing board here, but with a few concessions:
1) the cache for source->symbol must live outside of shape_env, for both lifecycle, and layering reasons
2) A good amount of work needs to be done to pipe policy around fake_mode and meta_utils correctly, to cover all the cases (ezyang did this)

cc penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx chenyang78 aakhundov kadeng

imported-using-ghimport

Test Plan: Imported from OSS

Reviewed By: huydhn, Chillee

Differential Revision: D51566250

Pulled By: voznesenskym

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114526
Approved by: https://github.com/Chillee, https://github.com/huydhn
2023-11-26 23:40:32 +00:00
PyTorch MergeBot
2f3beb715c Revert "Add Stateful/Stateless symbolic contexts, use fresh fake mode for dynamo backends (#113926)"
This reverts commit 2ca1119d53.

Reverted https://github.com/pytorch/pytorch/pull/113926 on behalf of https://github.com/DanilBaibak due to Break internal build ([comment](https://github.com/pytorch/pytorch/pull/113926#issuecomment-1822713852))
2023-11-22 12:52:33 +00:00
PyTorch MergeBot
e239a2b2d7 Revert "[dynamo / DDP] - lazily compile submodules - to propagate real tensor strides to backend compiler (#114154)"
This reverts commit 266054c3ca.

Reverted https://github.com/pytorch/pytorch/pull/114154 on behalf of https://github.com/DanilBaibak due to The lower PR in the stack https://github.com/pytorch/pytorch/pull/113926 breaks the internal build ([comment](https://github.com/pytorch/pytorch/pull/114154#issuecomment-1822704476))
2023-11-22 12:46:15 +00:00
Jon Chuang
266054c3ca [dynamo / DDP] - lazily compile submodules - to propagate real tensor strides to backend compiler (#114154)
Fixes https://github.com/pytorch/pytorch/issues/113812, https://github.com/pytorch/pytorch/issues/102591, Probably fixes: https://github.com/pytorch/pytorch/issues/113740, https://github.com/pytorch/pytorch/issues/113786, https://github.com/pytorch/pytorch/issues/113788

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114154
Approved by: https://github.com/wconstab
2023-11-21 22:40:08 +00:00
voznesenskym
2ca1119d53 Add Stateful/Stateless symbolic contexts, use fresh fake mode for dynamo backends (#113926)
The primary problem we are setting out to solve here is fake tensor freshness. Before this PR, fake tensors after dynamo represented fake tensors *at the end* of trace, so subsequent retraces like aot_autograd would start off with fake tensors in the wrong (end result) state, rather than their expected fresh state. The solution here is to start a fresh fake mode, and re-fakify the tensors. The nuance comes from ensuring that symbols are uniformly created for the symbolic sizes and strides of the tensor.

This PR is the result of *a lot* of back and forth with @ezyang and @eellison. Initially, the first pass at this was not super different from what we have in the PR - the broad strokes were the same:

1) We cache source->symbol in shape_env
2) We pass policy objects around, stored at dynamo fakificaiton time, and reused for later fakification
3) We create a new fake mode for backends
(from https://github.com/pytorch/pytorch/pull/113605/files)

This is ugly, and has some layering violations. We detoured our decision making through a few other alternatives. Immutable/mutable fake tensor mode was the most interesting alternative, https://github.com/pytorch/pytorch/pull/113653, and was struck down on concerns of complexity in fake mode combined with it not covering all edge cases. We also detoured on what to do about tensor memoization returning back potentially different tensors than requested, and if that was an anti pattern (it is) we want to hack in with the symbol cache (we don't).

We went back to the drawing board here, but with a few concessions:
1) the cache for source->symbol must live outside of shape_env, for both lifecycle, and layering reasons
2) A good amount of work needs to be done to pipe policy around fake_mode and meta_utils correctly, to cover all the cases (@ezyang did this)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113926
Approved by: https://github.com/ezyang, https://github.com/eellison
2023-11-20 23:06:37 +00:00
Jez Ng
4667e20b3f Delete a bunch of type-ignores (#113990)
* Replaced `ignore[import]` by mypy config file entries
* Removed a bunch of ignores around previously-fixed attr-defined /
  call-arg issues
* Fixed some invalid / undefined types; added a few more type-ignores to
  squelch the downstream errors this exposed

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113990
Approved by: https://github.com/eellison, https://github.com/Skylion007
ghstack dependencies: #113979
2023-11-18 02:48:38 +00:00
Aaron Gokaslan
cb856b08b2 [BE]: Attach cause to some exceptions and enable RUFF TRY200 (#111496)
Did some easy fixes from enabling TRY200. Most of these seem like oversights instead of intentional. The proper way to silence intentional errors is with `from None` to note that you thought about whether it should contain the cause and decided against it.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111496
Approved by: https://github.com/malfet
2023-10-19 21:56:36 +00:00
Aaron Bockover
0e2b22c451 [ONNX] switch from onnxscript-preview to onnxscript (#109139)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109139
Approved by: https://github.com/BowenBao, https://github.com/thiagocrepaldi
2023-09-18 22:24:47 +00:00
Brian Hirsh
da914aed21 error when using _dynamo.optimize_ddp=True and _inductor.keep_output_stride=False together (#108235)
From talking to @wconstab, we agreed that because of the way DDPOptimizer is written, it is (sort of) incompatible with inductor's `keep_output_stride=False` optimizations (and will cause silent correctness problems if you use them ogether). Added an assertion.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108235
Approved by: https://github.com/wconstab
ghstack dependencies: #108081
2023-09-05 20:02:35 +00:00
Aaron Bockover
15e5bd5103 [ONNX] Support torch.compile(backend="onnxrt", options=OrtBackendOptions(...)) (#107973)
This reworks the DORT backend factory function to support the options kwarg of torch.compile, and defines a concrete OrtBackendOptions type that can be used to influence the backend.

Caching is also implemented in order to reuse backends with equal options.

Wrapping the backend in auto_autograd also becomes an option, which allows `OrtBackend` to always be returned as the callable for torch.compile; wrapping happens internally if opted into (True by default).

Lastly, expose options for configuring preferred execution providers (will be attempted first), whether or not to attempt to infer an ORT EP from a torch found device in the graph or inputs, and finally the default/fallback EPs.

### Demo

The following demo runs `Gelu` through `torch.compile(backend="onnxrt")` using various backend options through a dictionary form and a strongly typed form. It additionally exports the model through both the ONNX TorchScript exporter and the new TorchDynamo exporter.

```python
import math

import onnx.inliner
import onnxruntime
import torch
import torch.onnx

torch.manual_seed(0)

class Gelu(torch.nn.Module):
    def forward(self, x):
        return x * (0.5 * torch.erf(math.sqrt(0.5) * x) + 1.0)

@torch.compile(
    backend="onnxrt",
    options={
        "preferred_execution_providers": [
            "NotARealEP",
            "CPUExecutionProvider",
        ],
        "export_options": torch.onnx.ExportOptions(dynamic_shapes=True),
    },
)
def dort_gelu(x):
    return Gelu()(x)

ort_session_options = onnxruntime.SessionOptions()
ort_session_options.log_severity_level = 0

dort_gelu2 = torch.compile(
    Gelu(),
    backend="onnxrt",
    options=torch.onnx._OrtBackendOptions(
        preferred_execution_providers=[
            "NotARealEP",
            "CPUExecutionProvider",
        ],
        export_options=torch.onnx.ExportOptions(dynamic_shapes=True),
        ort_session_options=ort_session_options,
    ),
)

x = torch.randn(10)

torch.onnx.export(Gelu(), (x,), "gelu_ts.onnx")

export_output = torch.onnx.dynamo_export(Gelu(), x)
export_output.save("gelu_dynamo.onnx")
inlined_model = onnx.inliner.inline_local_functions(export_output.model_proto)
onnx.save_model(inlined_model, "gelu_dynamo_inlined.onnx")

print("Torch Eager:")
print(Gelu()(x))

print("DORT:")
print(dort_gelu(x))
print(dort_gelu2(x))
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107973
Approved by: https://github.com/BowenBao
2023-08-26 18:20:18 +00:00
JackCaoG
08e49fe97a Make openxla and opexla_eval backend show up in list_backends (#107905)
The reason to keep the non-aot(openxla_eval) backend is discussed in https://github.com/pytorch/xla/issues/5430#issuecomment-1683191662.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107905
Approved by: https://github.com/jansel
2023-08-25 21:52:17 +00:00
JackCaoG
139437bb84 Make Openxla dynamo backend take boxed input (#107260)
Fixes https://github.com/pytorch/xla/issues/5454

Also adding the inference(non-aot) backend back since we see a speed regression when using the aot-backend compared to the non-aot openxla backend. It is being tracked in https://github.com/pytorch/xla/issues/5430

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107260
Approved by: https://github.com/shunting314, https://github.com/jansel
2023-08-18 16:58:05 +00:00
Wei-Sheng Chin
22f5889753 [Dynamo, ONNX] Replace onnxrt backend with new backend from ONNXRuntime team (#106929)
In https://github.com/pytorch/pytorch/pull/106589, a new ONNXRuntime-based Dynamo backend is introduced. As mentioned in that PR, we hope to replace legacy `onnxrt` with that new backend. This PR remove legacy `onnxrt` and register the new backend under the same name.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/106929
Approved by: https://github.com/thiagocrepaldi, https://github.com/BowenBao, https://github.com/abock, https://github.com/msaroufim, https://github.com/jansel
2023-08-15 22:50:46 +00:00
Ivan Yashchuk
c913f3857f Remove dynamo+nvfuser (#105789)
This PR removes unmaintained Dynamo+nvFuser.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/105789
Approved by: https://github.com/jansel, https://github.com/jjsjann123, https://github.com/albanD
2023-08-08 22:29:32 +00:00
PyTorch MergeBot
891bb259f8 Revert "Remove dynamo+nvfuser (#105789)"
This reverts commit 6030151d37.

Reverted https://github.com/pytorch/pytorch/pull/105789 on behalf of https://github.com/DanilBaibak due to Break a lot of tests on main. ([comment](https://github.com/pytorch/pytorch/pull/105789#issuecomment-1669710571))
2023-08-08 14:20:32 +00:00
Ivan Yashchuk
6030151d37 Remove dynamo+nvfuser (#105789)
This PR removes unmaintained Dynamo+nvFuser.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/105789
Approved by: https://github.com/jansel, https://github.com/jjsjann123, https://github.com/albanD
2023-08-08 13:29:31 +00:00
JackCaoG
c9eb95cca4 Update XLA dyanmo backend name (#106489)
This is to deprecate the old XLA dyanmo backend and rename it `openxla`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/106489
Approved by: https://github.com/jansel, https://github.com/shunting314
2023-08-03 20:00:37 +00:00
Animesh Jain
735e6ae801 [dynamo] Maintainable code - Move decorators in a separate file (#105070)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105070
Approved by: https://github.com/ezyang
2023-07-13 07:41:19 +00:00
Animesh Jain
d0e5c681f5 [dynamo][ddp][ac] Fallback to single bucket when higher order op (#104639)
This helps unblock an internal model. The real fix requires lot of work, which might question the alternate approach of partitioning AOT graphs instead of Dynamo graphs.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104639
Approved by: https://github.com/wconstab
2023-07-06 02:20:15 +00:00
jiayisun
d62a80adc3 remove ipex backend (#104329)
Move IPEX backend from PyTorch to IPEX.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104329
Approved by: https://github.com/jgong5, https://github.com/jansel
2023-07-04 09:21:27 +00:00
Brian Hirsh
875f60399e pre_dispatch tracing: support autocast and no_grad/enable_grad ctx managers, add a pre_dispatch_eager dynamo backend (#103024)
This PR adds support for `enable_grad`/`no_grad`/`autocast` context managers getting properly traced in `pre_dispatch` tracing. The stuff in this PR includes:
- I added a torch function mode that runs during make_fx pre_dispatch tracing, `ProxyTorchFunctionMode`. It directly intercepts the torch ops that run during the above context managers, and adds them to the current graph instead of executing them
- `enable_grad` and `no_grad` currently desugar into `torch._C.set_grad_enabled(bool)`, but this API isn't currently overrideable by torch function so I added the ability to interpose there
- the `torch.amp` context managers don't currently have a nice equivalent, like `set_autocast_enabled(state)`, so I ended up adding two new API's: `torch.amp._set_autocast_enabled` and `torch.amp._set_autocast_disabled`. If you look at how the context manager is implemented, it ends up calling several different state-changing functions, some of which depend on the backend - so I figured that it would be cleaner just to add a new API (that should probably only be used by tracing) - but open to feedback
- I added a new dynamo backend, `compile(backend="pre_dispatch_eager")`. When pre_dispatch tracing becomes always-on in inductor, it will be another potential surface for bugs. I also added a test file for it (`test/dynamo/test_pre_dispatch.py`).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103024
Approved by: https://github.com/ezyang
2023-06-29 14:17:42 +00:00
Animesh Jain
f9c64a1156 [debugging] aot_eager backend to use the min-cut partitioner (#103555)
default_partitioner is kind of broken when it comes to memory footprint. Moving aot_eager to use min-cut partitioner is better debugging experience.

One bad thing though would be that we will much lower speedup numbers, because min cut partitioner will try to recompute ops.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103555
Approved by: https://github.com/eellison, https://github.com/jansel
2023-06-22 09:31:08 +00:00