Pull Request resolved: https://github.com/pytorch/pytorch/pull/166024
Approved by: https://github.com/yiming0416
commit 0a93295da0 (parent 4b898b51b9)
Author: Tugsbayasgalan Manlaibaatar, 2025-10-21 13:32:43 -07:00
Committed by: PyTorch MergeBot


The graph produced by export adheres to the following invariants; more detailed specifications about the IR can be found in the Export IR Specification.
- **Normalized**: There are no Python semantics within the graph. Submodules
from the original programs are inlined to form one fully flattened
computational graph.
- **Graph properties**: By default, the graph may contain both functional and
  non-functional operators (including mutations). To obtain a purely functional
  graph, use `run_decompositions()`, which removes mutations and aliasing.
- **Metadata**: The graph contains metadata captured during tracing, such as a
  stacktrace from the user's code (see the sketch after this list).
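
A minimal sketch of inspecting these invariants on a toy model follows; the `Tiny` module, its shapes, and the printed fields are illustrative and not taken from the surrounding examples.

```python
import torch

class Tiny(torch.nn.Module):  # illustrative module, not from the original docs
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(4, 4)

    def forward(self, x):
        return torch.relu(self.linear(x))

ep = torch.export.export(Tiny(), (torch.randn(2, 4),))

# The submodule call is inlined into a single flat FX graph.
print(ep.graph)

# Each node carries metadata recorded at trace time, e.g. a stacktrace.
for node in ep.graph.nodes:
    if node.op == "call_function":
        print(node.target, bool(node.meta.get("stack_trace")))
```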

Under the hood, `torch.export` leverages the following latest technologies:
- **Torch Dynamo (torch._dynamo)** is an internal API that uses a CPython feature
  called the Frame Evaluation API to safely trace PyTorch graphs. This
  provides a massively improved graph capturing experience, with far fewer
  rewrites needed in order to fully trace PyTorch code.
- **AOT Autograd** ensures the graph is decomposed/lowered to the ATen operator
  set. When using `run_decompositions()`, it can also provide functionalization.
- **Torch FX (torch.fx)** is the underlying representation of the graph,
  allowing flexible Python-based transformations (see the sketch below).
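
As a small illustration of that FX layer, the sketch below reuses the hypothetical `ep` from the earlier sketch and walks the underlying `torch.fx` graph in plain Python to count how often each operator appears.

```python
from collections import Counter

# Tally the call_function nodes in the exported FX graph.
op_counts = Counter(
    str(node.target)
    for node in ep.graph.nodes
    if node.op == "call_function"
)
print(op_counts.most_common())
```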
(training-export)=
## Export IR: Training vs Inference
`torch.export` produces a graph containing only
[ATen operators](https://pytorch.org/cppdocs/#aten), which are the basic unit of
computation in PyTorch. Export provides different IR levels based on your use case:
| IR Type | How to Obtain | Properties | Operator Count | Use Case |
|---------|---------------|------------|----------------|----------|
| Training IR | `torch.export.export()` (default) | May contain mutations | ~3000 | Training with autograd |
| Inference IR | `ep.run_decompositions(decomp_table={})` | Purely functional | ~2000 | Inference deployment |
| Core ATen IR | `ep.run_decompositions(decomp_table=None)` | Purely functional, highly decomposed | ~180 | Minimal backend support |
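
The table's "How to Obtain" column maps to code roughly as in the sketch below; `MyModel` and its example input are placeholders rather than part of the original examples.

```python
import torch

class MyModel(torch.nn.Module):  # placeholder module
    def forward(self, x):
        return torch.sin(x)

example_inputs = (torch.randn(1, 1, 3, 3),)

training_ep = torch.export.export(MyModel(), example_inputs)       # Training IR
inference_ep = training_ep.run_decompositions(decomp_table={})     # Inference IR
core_aten_ep = training_ep.run_decompositions(decomp_table=None)   # Core ATen IR
```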
### Training IR (Default)
By default, export produces a **Training IR** which contains all ATen
operators, including both functional and non-functional (mutating) operators.
A functional operator is one that does not contain any mutations or aliasing
of the inputs, while non-functional operators may modify their inputs in-place.
You can find a list of all ATen operators
[here](https://github.com/pytorch/pytorch/blob/main/aten/src/ATen/native/native_functions.yaml),
and you can check whether an operator is functional by inspecting
`op._schema.is_mutable`, as shown below.
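
For example, the in-place and out-of-place `Tensor` overloads of `add` can be told apart through their schemas:

```python
import torch

print(torch.ops.aten.add_.Tensor._schema.is_mutable)  # True: mutates its first argument
print(torch.ops.aten.add.Tensor._schema.is_mutable)   # False: purely functional
```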
This Training IR, which may contain mutations, is designed for training use
cases and can be used with eager PyTorch Autograd.
```{code-cell}
import torch

class M(torch.nn.Module):  # hypothetical stand-in; the original definition is elided in this excerpt
    def forward(self, x):
        y = x + 1
        y.add_(1)  # in-place op, recorded as a mutation in the Training IR
        return y

ep_for_training = torch.export.export(M(), (torch.randn(1, 1, 3, 3),))
print(ep_for_training.graph_module.print_readable(print_output=False))
```
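
As a quick check, the sketch below (assuming the `ep_for_training` program produced above) lists any mutating operators left in the Training IR graph by consulting each call's schema.

```python
mutating_ops = [
    node.target
    for node in ep_for_training.graph.nodes
    if node.op == "call_function"
    and hasattr(node.target, "_schema")  # skip non-ATen targets such as operator.getitem
    and node.target._schema.is_mutable
]
print(mutating_ops)
```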
### Inference IR (via run_decompositions)
To obtain an **Inference IR** suitable for deployment, use the
{func}`ExportedProgram.run_decompositions` API. This method automatically:
1. Functionalizes the graph (removes all mutations and converts them to functional equivalents)
2. Optionally decomposes ATen operators based on the provided decomposition table
This produces a purely functional graph ideal for inference scenarios.
By specifying an empty decomposition table (`decomp_table={}`), you get just
the functionalization without additional decompositions. This produces an
Inference IR with ~2000 functional operators (compared to 3000+ in Training IR).
```{code-cell}
import torch

# The rest of this cell is elided in the excerpt; a representative call is shown.
ep_for_inference = ep_for_training.run_decompositions(decomp_table={})
print(ep_for_inference.graph_module.print_readable(print_output=False))
```

As we can see, the previously in-place operator,
`torch.ops.aten.add_.default` has now been replaced with
`torch.ops.aten.add.default`, a functional operator.
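
To verify this programmatically, a short sketch (assuming `ep_for_inference` from the cell above) can assert that no mutating operators remain after `run_decompositions()`.

```python
assert not any(
    node.op == "call_function"
    and hasattr(node.target, "_schema")
    and node.target._schema.is_mutable
    for node in ep_for_inference.graph.nodes
), "expected a purely functional graph after run_decompositions()"
```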
### Core ATen IR

We can further lower the Inference IR to the
[Core ATen Operator Set](https://pytorch.org/docs/main/torch.compiler_ir.html#core-aten-ir),
which contains only ~180 operators. This is achieved by passing `decomp_table=None`
(which uses the default decomposition table) to `run_decompositions()`. This IR
is optimal for backends that want to minimize the number of operators they need
to implement.
```{code-cell}
import torch

# The rest of this cell is elided in the excerpt; a representative call is shown.
ep_core_aten = ep_for_training.run_decompositions(decomp_table=None)
print(ep_core_aten.graph_module.print_readable(print_output=False))
```