Add dynamic shapes doc (#159428)

This PR adds new Dynamic Shapes documentation and expands on the existing one.
- Adds a new structure with Intro, Core Concepts, Troubleshooting

Pull Request resolved: https://github.com/pytorch/pytorch/pull/159428
Approved by: https://github.com/bobrenjc93

Co-authored-by: bobrenjc93 <bobren@meta.com>
This commit is contained in:
Svetlana Karslioglu 2025-09-22 21:01:24 +00:00 committed by PyTorch MergeBot
parent 8abc2af9b9
commit 8e62d01f7a
21 changed files with 1292 additions and 146 deletions

View File

@ -0,0 +1,239 @@
(dynamic_shapes_advanced_control_options)=
# Advanced Options to Control Dynamic Behavior
PyTorch provides several advanced options to control dynamic behavior.
These options require a deep understanding of PyTorch internals and
may involve setting up additional tooling. These options include:
* Profile-Guided Optimization (PGO) is a technique that allows the compiler
to save automatic dynamic decisions and reuse them across jobs.
* Compiler Collective is a feature that is used to modify automatic dynamic
shapes behavior by inferring if an input is dynamic based on whether
its size varies across ranks.
## Profile-Guided Optimization (PGO)
Profile-Guided Optimization (PGO) enhances automatic dynamic by sharing profiling decisions across runs of your model. Specifically, it serializes all the choices made by automatic dynamic into a file on disk. You can then copy this file—or store it in a centralized metadata service like S3—and reuse it on other machines to ensure consistent behavior across environments.
For the rest of this tutorial, you can turn on PGO locally with the following environment variables: `TORCH_COMPILE_JOB_ID=1 TORCH_DYNAMO_AUTOMATIC_DYNAMIC_LOCAL_PGO=1`.
(identifying-dynamic-elements-marked-by-pgo)=
### Identifying Dynamic Elements Marked by PGO
Use `tlparse` to find line numbers of interest and check for multiple values
seen for inputs.
To determine which elements are marked as dynamic by Profile-Guided Optimization (PGO),
follow these steps using `tlparse`:
1. In the `tlparse` output, identify the line number of the frame of interest. Example:
```{image} ../_static/img/dynamic_shapes/tlparse4_pgo.png
```
2. Open `local_code` using `put_local_code_state_` or `put_remote_code_state_` for the
latest frame (for example, 6/1).
Each `?` indicates that multiple values have been observed for this input.
For instance, the following output shows that the input `L['m']` has been seen with
multiple sizes at `size[0]`, but the stride has consistently been 1:
```
/data/users/bobren/a/pytorch/r2.py:2:func:
L['m']: fully dynamic scalar or tensor
L['x']: tensor size=[?] stride=[1]
L['y']: tensor size=[?] stride=[1]
L['z']: tensor size=[?] stride=[1]
```
```{note}
If an element is marked as dynamic by PGO, it does not guarantee that it will remain dynamic in the graph. Specialization can revert it to a static state.
```
## Compiler Collective
Different ranks can communicate with each other to share observed sizes. In the second
iteration, automatic dynamic uses this information to determine which elements to mark
as dynamic based on inputs seen across all ranks. Check this [PR](https://github.com/pytorch/pytorch/pull/130935) for more details.
To enable this feature, use `enable_compiler_collectives=True` with the `@config.patch`
decorator.
```python
@config.patch(enable_compiler_collectives=True)
```
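A minimal sketch of how this decorator might be applied is shown below; the `config` import alias, the model, and the step function are illustrative, and a distributed job where all ranks reach compilation together is assumed:
```python
import torch
import torch._dynamo.config as config

model = torch.nn.Linear(8, 8)
compiled_model = torch.compile(model)

@config.patch(enable_compiler_collectives=True)
def run_step(batch):
    # Compilation happens on the first call; every rank must reach it at the same time
    # so the compiler collective can exchange observed sizes.
    return compiled_model(batch)
```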
```{note}
This feature enables the use of collectives during compilation to
synchronize behavior across ranks. Currently, it is used to modify
automatic dynamic shapes behavior by inferring if an input is dynamic
based on whether its size varies across ranks. Since this synchronization
uses collectives, all ranks must run compilation simultaneously; ranks must
not diverge with graph breaks. This is most reliably achieved by ensuring
torch is only run on SPMD programs. Violating this invariant may result in
deadlocking NCCL and encountering a NCCL timeout.
```
## Reducing Compilations: Step by Step
If you have a model that you can run on your master job and a `tlparse` output,
here's what you should do next:
### Step 1: Mark Dynamic Elements
The first step is to reduce initial compilations that are eventually optimized away
by automatic dynamic or PGO. This is straightforward because we know it will work
upfront. If, in one run, a frame starts with static graphs and converges to
dynamic graphs, and if you notice a reduction in the number of compiled
frames in a second (warm) PGO-enabled run, it's likely due to this optimization.
This is a two-step process:
1. Find elements marked as dynamic by PGO or automatic dynamic.
2. Mark them as dynamic using one of the {ref}`user_annotations` (see the sketch below).
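The following is a minimal sketch of the second step, assuming PGO showed that dimension 0 of an input tensor varies across runs; the function, tensor name, and sizes are illustrative:
```python
import torch

@torch.compile
def forward(x):
    return x * 2

x = torch.randn(8, 16)
# Mark dim 0 as dynamic up front so the first compilation is already dynamic.
torch._dynamo.mark_dynamic(x, 0)
forward(x)
forward(torch.randn(32, 16))  # reuses the dynamic graph instead of recompiling
```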
#### How to Identify Elements to Mark as Dynamic
Follow these guidelines:
1. **PGO artifact:** Follow the steps in {ref}`identifying-dynamic-elements-marked-by-pgo`.
2. **Dynamic Logs:** If you have a run with `TORCH_LOGS="+dynamic"`, each
time a new dynamic dimension is allocated, a debug line will specify it
along with the input name.
3. **Compare Graphs:** For frames with reduced compilations across runs,
inspect the Dynamo graphs in the second run or the latest runs in the
cold run. Look for elements marked as dynamic in those graphs. Specifically,
find graphs that are similar (once specialized and once dynamic).
Even without a warm run, you can inspect all graphs for a specific frame
to see if some are similar and converge to a dynamic version.
For example, in the following `tlparse` snapshot, Dynamo graphs 20/0,
20/1, and 20/2 are similar except for different sizes (for example,
graph 20/0 vs. graph 20/2). In the Dynamo graph of 20/2, sizes `s0`,
`s1`, and `s5` are used for `rotary_pos_emb_` and `x`.
```{image} ../_static/img/dynamic_shapes/tlparse5_dynamic_shapes.png
```
```{tip}
Two graphs are considered similar if they have the same sequence of calls for
torch operations and the same tensor inputs. Variations may exist in integer
inputs that could be inlined in the specialized version or arithmetic
computations that only exist in the dynamic version due to inlining in the
static version.
```
### Step 2: Debug and Identify Missed Opportunities
The complexity of debugging can vary greatly depending on the issues you
encounter. The end result is often to find a bug, enable a flag, or modify
user/framework code.
#### Finding Similar Graphs
Start by identifying a group of similar graphs that you might want to combine
into one dynamic graph, as discussed in the previous section on comparing
graphs. If you can't find any similar graphs, there's nothing further to do
in this step.
#### Quick Checks: Fail Fast
After finding similar graphs, you want to understand why they recompile.
Check the following:
1. **Check Recompile Reasons:** For graphs you believe are similar, click on
`recompile_reason` in the `tlparse` output for the later graph. Ensure the
reason is size-related and not due to other factors. For example, while
in this screenshot the recompile reason is size-related:
```{image} ../_static/img/dynamic_shapes/tlparse6_size_related_recompilations.png
```
In the one below it is not, which indicates that dynamic shapes won't resolve it:
```{image} ../_static/img/dynamic_shapes/tlparse7_not_size_related_recompilations.png
:width: 500px
:align: center
```
2. **Compare Guards Files:** Ensure there are no guards on non-size-related
elements that exist in one graph but not the others.
3. **Early Check for Custom Triton Kernels:** Check if your model calls custom
Triton kernels with `tl.constexpr` arguments, as these are always
specialized. If your model receives different values for these arguments,
it could be a source of recompilation.
## Identifying and Fixing Recompilation Causes
1. **Is Something Not Marked Dynamic but Should Be?** Determine if an input was
marked dynamic and got specialized or was not marked dynamic at all. You can
identify this by:
* Checking the Dynamo graph - look for `Sym(number)`. For example:
```
Sym(256) vs Sym(s0)
```
* Using dynamic logs:
```
["TORCH_LOGS=+dynamic"]
create_symbol s2 = 2 for L['self']._modules['cle ...
```
* Reviewing guards files. If a tensor size is dynamic, it will be indicated as `None`:
```
TENSOR_MATCH:check_tensor(L['self'].x._parameters['weight']], Parameter, DispatchKeySet(CPU, BackendSelect, ADInplaceOrView, AutogradCPU), torch.float32, device=None, requires_grad=True, size=[None, None], stride=[None, 1])
```
2. **Why Is It Not Marked Dynamic?** If you determine an element is not marked dynamic, consider:
* Checking if it's an `nn` module property, parameter, or field. Verify the settings of these flags:
* `force_parameter_static_shapes = True`
* `force_nn_module_property_static_shapes = True`
* `allow_unspec_int_on_nn_module = False`
* Or using the dynamic allowlist to mark it dynamic, which takes the highest priority.
```{tip}
Marking elements one by one can be time-consuming. Initially, flip the flags to
identify any blocking specializations, then decide how to mark them
dynamic at the end of the process.
```
* If you think it could be a bug, please file a bug report and tag it
with the `module: dynamic shapes` label. Check the list of known issues in
[this list](https://github.com/pytorch/pytorch/issues?q=sort%3Aupdated-desc+state%3Aopen+label%3A%22module%3A+dynamic+shapes%22).
3. **Is a Dynamic Element Getting Specialized?** Determine why it is specialized.
It could be due to user code (such as an `if` condition), framework code, or a
call to a Triton kernel. To identify the reason for specialization:
* **Using tlparse:** Check the `compilation_metrics` for a specialization section, which will indicate what got specialized and the user and framework stack when it happened. Example:
```{image} ../_static/img/dynamic_shapes/tlparse8_compilation_metrics.png
```
The log above indicates that `s0` is specialized to `33` due to the following code:
```
if self.x == 33   # example4.py, line 16
```
* **+Dynamic Logs:** pass `["TORCH_LOGS=+dynamic"]`. Look for the first specialization, as once a variable is specialized, all dependent variables get specialized too.
Example log:
```
torch/fx/experimental/symbolic_shapes.py:6557] [0/2] eval Eq(s0, 33) [guard added] if self.x ==33: # example4.py:16 in forward (_dynamo/variables/tensor.py:1242 in evaluate_expr), for more info run with TORCHDYNAMO_EXTENDED_DEBUG_GUARD_ADDED="Eq(s0, 33)"
V0228 12:04:24.190000 2990033 torch/fx/experimental/symbolic_shapes.py:6000] [0/2] _update_var_to_range s0 = VR[33, 33] (update)
```
The log above indicates that `s0` is specialized to `33` due to the following code:
```
if self.x == 33   # example4.py, line 16
```

View File

@ -0,0 +1,45 @@
(backed-vs-unbacked-symints)=
# Backed vs Unbacked Symints
Backed `SymInts` are symbolic integers that have a concrete value or "hint"
associated with them. This means that torch can use these values to make
decisions about control flow, such as determining which branch of code
to execute. They are typically derived from operations where the size or
value is known or can be inferred.
Unbacked `SymInts` are symbolic integers that do not have a concrete value or
hint. They often arise from data-dependent operations, such as `.nonzero()`
or `.item()`, where the size or value cannot be determined at compile time.
Since they lack a concrete value, they cannot be used for control flow
decisions, and attempting to do so requires a graph break.
Unbacked `SymInts` use *size-oblivious reasoning*, which is particularly
useful when you are dealing with
{ref}`0/1 specialization recompilation problem <zero-one-specialization>`.
In summary, backed `SymInts` have known values that can be used for
decision-making, while unbacked `SymInts` do not, requiring special handling
to avoid graph breaks.
Unbacked symbolic integers can be too restrictive, causing most PyTorch programs
to fail. To address this, you can use the following methods and APIs as a
workaround:
* Use higher-level APIs like `empty` instead of `empty_strided` to create tensors.
This guarantees the tensor is non-overlapping and dense, avoiding unnecessary stride
sorting, guard creation, and recomputation of these properties.
* Modify your code to make precomputed properties *lazy*. This ensures that
guards on unbacked symbolic integers are only applied when necessary,
reducing computational overhead.
## How to Use Unbacked SymInts
To use unbacked APIs, replace `mark_dynamic` with `mark_unbacked` and
`TORCH_COMPILE_DYNAMIC_SOURCES` with `TORCH_COMPILE_UNBACKED_SOURCES`.
This tells the compiler to treat an input as unbacked.
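A minimal sketch of the swap, using an illustrative function and input tensor:
```python
import torch

@torch.compile
def f(x):
    return x * 2

x = torch.randn(5)
# Instead of torch._dynamo.decorators.mark_dynamic(x, 0):
torch._dynamo.decorators.mark_unbacked(x, 0)  # dim 0 has no hint and no 0/1 specialization
f(x)
```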
```{seealso}
* {ref}`dynamic_shapes`
* {ref}`torch.export`
* {ref}`what_is_a_specialization`
```

View File

@ -0,0 +1,10 @@
(dynamic_shapes_beyond_the_basics)=
# Beyond the Basics
This section covers advanced topics related to dynamic shapes, including a deeper look at how dynamic shapes work, the 0/1 specialization problem, and more.
```{toctree}
:maxdepth: 1
dynamic_shapes_zero_one_specialization
dynamic_shapes_backed_unbacked
```

View File

@ -0,0 +1,134 @@
(dynamic_shapes_core_concepts)=
# Dynamic Shapes Core Concepts
This section describes the core concepts of dynamic shapes in PyTorch. It is intended to be a
reference for engineers working on the PyTorch compiler stack and anyone who wants to understand
the inner workings of dynamic shapes.
## Symbolic integers
Symbolic integers (SymInts) are used to represent variables that can span a range. For example:
```python
import torch

x = torch.randn(5, 5)                        # this tensor has shape [5, 5]
torch._dynamo.decorators.mark_dynamic(x, 0)  # dim 0 is now symbolic: shape [s0, 5]
y = torch.cat([x, x], dim=0)                 # this tensor has shape [2*s0, 5]
```
However, `z = x * y` would throw an error, since pointwise operations like multiply must
operate on same-sized tensors, and we know statically that `s0 != 2 * s0`. Astute readers may point out
that this is not true when `s0 == 0`; the reason that does not matter here is described in
{ref}`zero-one-specialization`.
## Guards
In `torch.compile`, a guard is a mechanism used to ensure that a compiled graph remains valid for new inputs.
By default, when you make a variable dynamic, it can range from `[-inf, inf]`. For example:
```python
def foo(x):
    return x / 2
```
This works for any dynamic `x`. But if your code is:
```python
def foo(x):
    if x > 5:
        return x / 2
    return x / 3
```
If you call `foo(6)`, it returns `x / 2` and adds a guard `x > 5`. Calling `foo(4)` later will
require recompilation because the guard is broken.
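As a rough, runnable sketch of this behavior (using a tensor size rather than a plain integer, and a hypothetical function name), running the following with `TORCH_LOGS=recompiles` shows the size guard being added and then broken:
```python
import torch

@torch.compile
def foo(x):
    if x.shape[0] > 5:
        return x / 2
    return x / 3

a = torch.randn(6)
torch._dynamo.mark_dynamic(a, 0)   # compile with a symbolic dim 0
foo(a)                             # adds a guard: size(0) > 5
foo(torch.randn(4))                # guard fails, so this call recompiles
```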
## Runtime Asserts
You can use runtime asserts to provide hints when you know certain facts, like batch size being less than 100:
```python
def foo(batch_size):
    torch._check(batch_size < 100)
    if batch_size < 100:
        return do_something()        # do_something / do_something_else are placeholders
    return do_something_else()
```
## "Hint" Value
A "hint value" in the context of `torch.compile` refers to the actual values known during the compilation process that help the JIT compiler make decisions about expressions. Hint values are particularly useful for handling dynamic shapes, as they provide concrete information that guides the compilation without requiring recompilation for varying dimensions.
## Dynamic Behavior Overview
PyTorch assumes static shapes by default. When a size change is detected, it attempts to
recompile with dynamic input, although this may fail if there are conditional branches
or missing support for dynamic shapes. To diagnose overspecialization, you can set
`TORCH_LOGS=dynamic` to view "eval" entries that indicate when and why guards are added.
If you anticipate a dimension will be dynamic, you can use `torch._dynamo.mark_dynamic(tensor, dim)`
to mark it in advance, specifying `min` and `max` values if known. Using `torch.compile(dynamic=False)`
disables automatic dynamic shapes, leading to recompilation for each unique size. Conversely,
`torch.compile(dynamic=True)` aims to use dynamic shapes as much as possible, which is most useful
for small operators and may not be suitable for large models due to potential crashes or performance issues.
You can whitelist specific sources to be marked as dynamic using the `TORCH_COMPILE_DYNAMIC_SOURCES` environment variable or `torch.compiler.config.dynamic_sources`. This is particularly useful for large
models with graph breaks, as you can maintain dynamism across graph breaks since
source names stay consistent. You can also use this to mark integers as dynamic. The format is a comma-delimited list of source names, for example, `"L['x'], L['y']"`.
You can also use regexes, for example, `"L\['x.*'\], L\['y.*'\]"`.
This whitelist takes precedence over other flags like `dynamic=False`, `force_nn_module_property_static_shapes`, and `force_parameter_static_shapes`.
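A minimal sketch of the allowlist approach (the function and source names are illustrative):
```python
import torch

# Equivalent to setting TORCH_COMPILE_DYNAMIC_SOURCES="L['x'], L['y']"
torch.compiler.config.dynamic_sources = "L['x'], L['y']"

@torch.compile
def f(x, y):
    return x + y

f(torch.randn(4), torch.randn(4))
f(torch.randn(8), torch.randn(8))  # new sizes reuse the dynamic graph
```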
Sometimes it can be cumbersome to find the right inputs to mark as dynamic. If
you're willing to take a performance hit for the first batch, another affordable
option is the `eager_then_compile` stance, which derives dynamism for you.
See {func}`torch.compiler.set_stance` for more details.
## Overall Architecture
Symbolic shapes workflow:
1. When compiling a frame in Dynamo, we allocate a `ShapeEnv` (attached to `FakeTensorMode`) to
track symbolic shapes.
2. We allocate symbolic sizes for tensors on entry, based on policy decisions.
3. We propagate symbolic sizes through operators, maintaining both FX IR for symbolic compute export
and Sympy expressions for reasoning.
4. We add guards based on conditionals during Dynamo tracing or Inductor optimization, induced from both Python and C++.
5. Guards can simplify symbolic variables. For instance, asserting `s0 == 4` allows replacing all occurrences of `s0` with `4`.
6. After tracing and optimizing, we install all guards with the compiled code, ensuring reusability only if all guards evaluate true.
## Internal API Class Hierarchy
### Python Classes
- **`SymInt`/`SymFloat`/`SymBool`**: User-visible classes that simulate their `int`/`float`/`bool` counterparts. Adding two `SymInts` produces a new `SymInt` that symbolically tracks the integer addition.
- **`SymNode`**: Internal structure (accessible via `symint.node`) that holds actual symbolic tracking information. `SymNode` is type-erased, making it convenient to represent mixed-type operations.
- **`ShapeEnv`**: Per-compile context state that tracks all free symbols and guards accumulated so far. Every `SymNode` records its `ShapeEnv` (but not vice versa; `SymNodes` are only used if they participate in a guard).
### C++ Equivalents
- **`c10::SymInt`/`SymFloat`/`SymBool`**: User-visible classes that simulate `int`/`float`/`bool`
- **`c10::SymNode`/`SymNodeImpl`**: Analogous to Python `SymNode`
- **No C++ `ShapeEnv`**: For debugging ease, the entire symbolic reasoning apparatus remains in Python
When writing code traceable with `make_fx`, it must handle `SymInt`/`SymFloat`/`SymBool` flowing through it.
## Value Ranges and Constraints
Symbolic variables maintain **value ranges** that specify the set of possible values. By default:
- Size-like unbacked `SymInts` have value range `[0, Inf]`
- Regular unbacked `SymInts` have value range `[-Inf, Inf]`
When assertions are made (e.g., `torch._check(x == y)`), the system does the following (see the sketch after this list):
1. Attempts to replace unbacked symbols with equivalent expressions
2. Refines value ranges based on the assertion
3. Remembers boolean expressions that are always true
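A minimal runnable sketch of range refinement, assuming `capture_scalar_outputs` is enabled so that `.item()` does not cause a graph break:
```python
import torch

torch._dynamo.config.capture_scalar_outputs = True

@torch.compile
def f(x):
    u = x.sum().item()     # unbacked SymInt; default range [-inf, inf]
    torch._check(u >= 2)   # refines the value range to [2, inf]
    torch._check(u <= 20)  # refines the value range to [2, 20]
    return torch.zeros(u)  # passing u to a factory function also marks it size-like

f(torch.tensor([3, 4]))
```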
Important files:
- C++ SymInt API: `c10/core/SymInt.h`, `SymFloat.h`, `SymBool.h`
- Python SymInt API: `torch/__init__.py` (look for `SymInt/SymFloat/SymBool`)
- C++ plumbing: `c10/core/SymNodeImpl.h`, `torch/csrc/utils/python_symnode.h`, `torch/csrc/jit/python/init.cpp`
- Python infrastructure: `torch/fx/experimental/symbolic_shapes.py`
- Other important files: `torch/_subclasses/fake_tensor.py`, `torch/_meta_registrations.py`, decomps, PrimTorch refs
```{seealso}
* {ref}`dynamic_shapes`
* {ref}`dynamic_shapes_troubleshooting`
```

View File

@ -0,0 +1,101 @@
(debugging-tlparse-torch-logs)=
# Debugging with `tlparse` and `TORCH_LOGS=dynamic`
`tlparse` is a tool used for analyzing and understanding the compilation
process in PyTorch, particularly when dealing with dynamic shapes. It helps
identify where guards and specializations occur in your code.
`TORCH_LOGS=dynamic` is an environment variable setting that enables detailed
logging of dynamic shape operations, providing insights into how symbolic
shapes are handled during execution.
This section will guide you through using `tlparse` and `TORCH_LOGS=dynamic` to
troubleshoot dynamic shape issues in your code, including debugging
specialization, guards, and more.
## Debugging Specialization
In the following example, `x.shape[0]` is dynamic but becomes specialized due to multiplication:
```python
import torch
@torch.compile
def fn(x, y):
return x * y
x = torch.randn(5)
y = torch.randn(5)
torch._dynamo.decorators.mark_dynamic(x, 0)
fn(x, y)
```
By using `TORCH_LOGS=dynamic`, you can observe this specialization in the logs:
```sh
TORCH_LOGS=dynamic python tl.py
I0721 11:10:00.950000 845259 torch/fx/experimental/symbolic_shapes.py:3776] [0/0] create_env
I0721 11:10:01.030000 845259 torch/fx/experimental/symbolic_shapes.py:5117] [0/0] create_symbol s77 = 5 for L['x'].size()[0] [2, int_oo] return x * y # tl.py:5 in fn (_dynamo/variables/builder.py:3466 in <lambda>), for more info run with TORCHDYNAMO_EXTENDED_DEBUG_CREATE_SYMBOL="s77" or to suppress this message run with TORCHDYNAMO_EXTENDED_ADVICE="0"
I0721 11:10:01.038000 845259 torch/fx/experimental/symbolic_shapes.py:7211] [0/0] eval Eq(s77, 5) [guard added] return x * y # tl.py:5 in fn (_subclasses/fake_impls.py:922 in infer_size), for more info run with TORCHDYNAMO_EXTENDED_DEBUG_GUARD_ADDED="Eq(s77, 5)"
```
The line `eval Eq(s77, 5) [guard added] return x * y # tl.py:5` indicates the specialization.
## Debugging Guards
Consider the following code, which may cause recompilations due to dynamic
shapes:
```python
import torch
@torch.compile
def fn(x, y):
if x.shape[0] < 10:
return x * y
x = torch.randn(5)
y = torch.randn(5)
torch._dynamo.decorators.mark_dynamic(x, 0)
torch._dynamo.decorators.mark_dynamic(y, 0)
fn(x, y)
```
To identify where dynamic shape guards originate, use `tlparse`. Here is an example tlparse output:
```{image} ../_static/img/dynamic_shapes/tlparse9_debugging_guards.png
```
By clicking on the `dynamo_cpp_guards` link, you can view all guards from the compilation, including the symbolic shape guard `L['x'].size()[0] <= 9`.
Astute readers will notice the 0/1 specialization where we guard on `L['x'].size()[0] >= 2`. By modifying the code to use unbacked symbols, this guard is removed:
```python
import torch
@torch.compile
def fn(x, y):
# Necessary runtime assert since we can't guard on unbacked
torch._check(x.shape[0] < 10)
if x.shape[0] < 10:
return x * y
x = torch.randn(5)
y = torch.randn(5)
torch._dynamo.decorators.mark_unbacked(x, 0)
torch._dynamo.decorators.mark_unbacked(y, 0)
fn(x, y)
```
Now, this compiled region can be used for inputs of size 0 and 1:
```{image} ../_static/img/dynamic_shapes/tlparse10_debugging_guards_unbacked.png
```
```{seealso}
* {ref}`dynamic_shapes`
* {ref}`troubleshooting_guardondatadependentsymnode_errors`
```

View File

@ -0,0 +1,14 @@
(dynamic_shapes_troubleshooting)=
# Troubleshooting Dynamic Shapes
This section lists common issues that you may encounter when using
dynamic shapes. It describes how to use `TORCH_LOGS` and `tlparse` to
debug them and provides general tips and tricks to help you
resolve them.
```{toctree}
:maxdepth: 1
dynamic_shapes_debugging_tlparse_torch_logs
dynamic_shapes_troubleshooting_guardon_errors
```

View File

@ -0,0 +1,411 @@
(troubleshooting_guardondatadependentsymnode_errors)=
# Troubleshooting GuardOnDataDependentSymNode Errors
When working with PyTorch models that have data-dependent control flow (using functions
like `item()`, `tolist()`, or `nonzero()`), you may encounter `GuardOnDataDependentSymNode` errors.
This section explains what these errors are and how to fix them.
## Common Error Pattern
The following output shows the common pattern of `GuardOnDataDependentSymNode` errors:
```sh
torch.fx.experimental.symbolic_shapes.GuardOnDataDependentSymNode: Could not guard on data-dependent expression Eq(u2, -1) (unhinted: Eq(u2, -1)). (Size-like symbols: none)
Potential framework code culprit (scroll up for full backtrace):
File "/data/users/ezyang/a/pytorch/torch/_prims_common/__init__.py", line 855, in infer_size
if d == -1:
For more information, run with TORCH_LOGS="dynamic"
For extended logs when we create symbols, also add TORCHDYNAMO_EXTENDED_DEBUG_CREATE_SYMBOL="u2"
If you suspect the guard was triggered from C++, add TORCHDYNAMO_EXTENDED_DEBUG_CPP=1
For more debugging help, see https://docs.google.com/document/d/1HSuTTVvYH1pTew89Rtpeu84Ht3nQEFTYhAX3Ypa_xJs/edit?usp=sharing
```
## Root Cause
These errors occur when PyTorch tries to convert a symbolic quantity (for example, `u2 == -1`)
into a concrete value (such as `False`) to make branching decisions. In a typical scenario,
where data-dependent sizes are not involved, PyTorch can determine the concrete value at
compile time and install a guard to ensure the compilation result remains valid. However,
with data-dependent quantities, the true value is unknown at compile time, resulting in errors.
You can often rewrite your model by adding `torch._check` or `torch._check_is_size` to
bypass these issues. This section explains how.
## Debugging Tools
Here are some of the debugging tools available in PyTorch for troubleshooting these errors:
* `TORCH_LOGS="dynamic"` - Shows detailed logs about symbolic operations
* `TORCHDYNAMO_EXTENDED_DEBUG_CREATE_SYMBOL="u2"` - Provides extended logs for specific symbols
* `TORCHDYNAMO_EXTENDED_DEBUG_CPP=1` - Helps when guards are triggered from C++
## Error Variations
Here is a list of error variations that you might encounter:
| Error Variations | Description |
|------------------|-------------|
| "Could not guard on data-dependent expression" | Occurs when trying to extract a concrete boolean from expressions like u0 == 0 or u0 > 10 |
| "Could not extract specialized integer from data-dependent expression" | Occurs when trying to extract a concrete integer value. <br/> **Common causes:** <br/> - Control flow that depends on the integer (such as, looping `u0` times) <br/> - Overspecialization in code that could work symbolically |
## How to Diagnose Your Problem
### Step 1: Examine the Potential Framework Culprit (Python Backtrace)
The exception provides a backtrace, which often indicates the problem.
Given that PT2 backtraces can be lengthy, the error message will also
suggest a potential framework culprit. For example:
```sh
Potential framework code culprit (scroll up for full backtrace):
File "/data/users/ezyang/a/pytorch/torch/_prims_common/__init__.py", line 855, in infer_size
if d == -1:
```
**Consider the Following:**
* Does it make sense that this condition is triggering a guard on a
data-dependent symbol?
* Should we know if the quantity in question is size-like?
(The exception lists size-like symbols; if a symbol is not listed,
it might be an arbitrary integer.)
* If the equation involves two distinct symbols, should we know
they are actually equal?
* If all symbols are size-like but the equation involves 0 or 1,
are we missing a `guard_size_oblivious` wrapper? (Remember, for
`guard_size_oblivious` between two size tuples, use `sym_eq` instead
of regular equality.)
In the example above, testing if `d` (a data-dependent value) is `-1` suggests
that `d` should be non-negative if it were a size. This indicates a missing
`torch._check_is_size`. If `d` is already size-like but `numel() == 0` fails,
consider wrapping it in `guard_size_oblivious`.
Using `TORCH_LOGS=dynamic` and examining the user stack trace is crucial for
understanding how to fix the problem, as they guide you on how to modify the
user program.
```sh
[INFO] create_unbacked_symint u0 [-9223372036854775808, 9223372036854775807] (w.py:40 in custom_op_meta)
```
This log message indicates where (`w.py:40`) the unbacked `SymInt` was
allocated. An unbacked `SymInt` may be allocated multiple times, so track
their equalities:
```sh
[INFO] set_replacement u1 = u0 (trivial_lhs) ValueRanges(lower=0, upper=9223372036854775807, is_bool=False)
```
### Step 2: Examine the C++ Backtrace
If the framework code culprit is uninformative, the guard might be in C++. You can
force a C++ backtrace by running with `TORCHDYNAMO_EXTENDED_DEBUG_CPP=1`. This
provides a detailed C++ backtrace with Python, CPython, and C10/ATen/libtorch
frames interspersed. Look for symbols in the `at::` or `c10::` namespace that
resemble kernel-specific code, likely related to the kernel executed per the Python
backtrace. If using a non-debug build of PyTorch, inlining may cause missing
frames, requiring source code investigation to locate the issue. For example, see https://github.com/pytorch/pytorch/pull/118579.
Here is an example C++ backtrace from a debugging session:
```
[2024-02-08 08:20:45,259] torch.fx.experimental.symbolic_shapes: [INFO] File "../
__gen_aten__/out/RegisterCompositeImplicitAutograd.cpp", line 2025, in at::
(anonymous namespace)::(anonymous namespace)
::wrapper_CompositeImplicitAutograd_Tensor_narrow(at::Tensor const&, long,
at::Tensor const&, c10::SymInt) [2024-02-08 08:20:45,259] torch.fx.experimental.
symbolic_shapes: [INFO] File "../aten/src/ATen/native/TensorShape.cpp", line 1410,
in at::native::narrow_tensor_symint(at::Tensor const&, long, at::Tensor const&,
c10::SymInt) [2024-02-08 08:20:45,259] torch.fx.experimental.symbolic_shapes:
[INFO] File "../__gen_aten__/out/core/TensorMethods.cpp", line 52, in long
at::Tensor::item<long>() const [2024-02-08 08:20:45,259] torch.fx.experimental.
symbolic_shapes: [INFO] File "../ATen/core/TensorBody.h", line 4274, in
at::Tensor::item() const
```
In this example, `at::native::narrow_tensor_symint` calls into `item`, which
triggers the guard on a data-dependent `SymNode`. You can modify the C++ code to
avoid specializing, or verify if you should be in this C++ code (e.g., `start` was
not expected to be a `Tensor`, and modifying this fixed the problem).
## Tools for Fixing Errors
There are a few important functions which you should use to troubleshoot this problem.
### `torch._check(cond, msg_fn)`
`torch._check` is a function used to assert conditions at runtime, particularly when dealing with symbolic integers (`SymInts`) in PyTorch.
**Example Usage:**
```python
torch._check(x.size(0) == y, lambda: f"size mismatch: {x.size(0)} != {y}")
```
The code above does the following:
* Creates a deferred runtime assertion instead of a compile-time guard
* Teaches the symbolic reasoning system facts about your unbacked SymInts
* Can eliminate unbacked symbols by replacing them with equivalent expressions
* Refines value ranges of symbols
* Remembers boolean expressions that are always true
Semantically, the function behaves like a conditional check:
```python
if not cond:
raise RuntimeError(msg_fn())
```
But there are a number of key differences:
* The condition is always assumed true at compile time, even if it involves unbacked `SymInts`. The actual check is deferred to runtime, avoiding
compile-time errors. Instead of setting up a guard, we implement a
deferred runtime assertion to verify the condition at runtime. At compile
time, we assume the condition won't trigger an error, so we don't need
to determine if it evaluates to `True` or `False`.
* If you perform an equality test `u0 == RHS`, we try to replace all instances
of `u0` with RHS. We will ALWAYS do this if RHS has no unbacked symbols,
as removing unbacked symbols is beneficial: eliminating them prevents
the creation of a `GuardOnDataDependentSymNode`. Even if we are not able
to eliminate `u0`, we can refine its value range. The value range specifies
what the set of possible values for a variable is. By default, size-like
unbacked SymInts have a value range of `[0, Inf]`; if you assert it is
equal to an expression with a refined value range, say `[2, 20]`, then
`u0`'s value range will be updated to `[2, 20]`. We also have limited
support for propagating value ranges in reverse.
* If you perform a boolean test `f(u0)`, we will remember that this expression always evaluates to `True`, and if you evaluate an expression that contains it, we will substitute it with `True`. We also support some limited reasoning on logically equivalent statements. For example, if you `torch._check(u0 < 4)`, we will also know that `u0 >= 4` evaluates to `False`, so performing a test like this in a normal non-check conditional will go through fine; the sketch below illustrates this.
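A minimal runnable sketch of that last point, assuming `capture_scalar_outputs` is enabled so `.item()` stays in the graph:
```python
import torch

torch._dynamo.config.capture_scalar_outputs = True

@torch.compile(fullgraph=True)
def f(x):
    u0 = x.sum().item()
    torch._check(u0 < 4)  # remembered as always-true at compile time
    if u0 >= 4:           # statically known to be False, so no data-dependent guard error
        return x * 2
    return x + 1

f(torch.tensor([1, 2]))
```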
### `torch._check_is_size(size)` and `guard_size_oblivious(cond)`
Example:
```python
u0 = y.item()
torch._check_is_size(u0)
```
**Semantic Equivalent:**
```python
if u0 < 0:
raise RuntimeError("u0 is not a size")
```
**Key Differences:**
Like `torch._check`, this test will always succeed at compile time, and it will establish that `u0 >= 0`. This refines the value range of `u0` to `[0, Inf]` instead of `[-Inf, Inf]`.
Marking `u0` as size-like is crucial. Size-like unbacked `SymInts` behave like
their regular counterparts, except when involved in a boolean expression
evaluated with `guard_size_oblivious`. In such cases, they are assumed not to equal zero or one, temporarily setting their value range to `[2, Inf]`. For instance, a conditional check like `u0 == 1` will evaluate to `False` when `u0` is size-like, instead of causing an error.
For example, `guard_size_oblivious(u0 == 1)` will always return `False` when `u0`
is size-like.
Marking unbacked symbols as size-like is essential in contexts where tensor
sizes are expected. PyTorch internals often check if sizes are zero or one to
handle special cases related to empty or single-element tensors. If you pass an
unbacked symbol to a factory function like `torch.empty`, it will automatically
be marked as size-like. However, some quantities, like arguments to `Tensor.view`,
cannot be inferred as size-like because `-1` is a valid argument. In such cases,
you need to explicitly use `torch._check_is_size` on an unbacked `SymInt` before
passing it to `view`.
In PyTorch framework code, if you need to test a size for zero or one, wrap the
test in `guard_size_oblivious` to assume that size-like unbacked `SymInts` will
not pass this test. Generally, most framework code has logic for the `>= 2`
case, which works for the `0/1` case. If using `guard_size_oblivious` in
PyTorch framework code resolves your issue, it's likely acceptable. However,
avoid using `guard_size_oblivious` in user code, especially if different
behavior is required for the `0/1` case at runtime, such as in a
hand-tracking application.
In C++, this can be done with `TORCH_GUARD_SIZE_OBLIVIOUS(u0.sym_eq(0))`, for example.
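As a rough Python sketch of the framework-side pattern described above (the helper name is made up; in user code you would usually branch on the real size instead):
```python
import torch
from torch.fx.experimental.symbolic_shapes import guard_size_oblivious

def mul2_handling_empty(t):
    # With a size-like unbacked dim, this check evaluates to False instead of erroring,
    # and the general path below handles the empty case correctly anyway.
    if guard_size_oblivious(t.shape[0] == 0):
        return t
    return t * 2
```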
### `torch._check_is_size(size, max=upper_bound)` (New)
This function is semantically equivalent to `torch._check(size <= upper_bound)`.
However, under `guard_size_oblivious`, it assumes that `size < upper_bound`.
This functionality only works when the upper bound is an integer constant. If
`upper_bound` is a symbolic expression, normal semantics apply. There is
potential to extend this functionality to symbolic expressions with further
development.
For more details, see the related issue https://github.com/pytorch/pytorch/issues/120288.
### `torch._constrain_as_value` and `torch._constrain_as_size`
These APIs are more specialized and are effectively equivalent to
`torch._check` and `torch._check_is_size`, with the added capability
of adjusting the value range of a variable by specifying minimum and
maximum values. However, in recommendation models, these functions are
unlikely to resolve `GuardOnDataDependentSymNode` errors effectively.
While `constrain_as_value` might seem like a convenient way to ensure a
variable stays within the bounds of another tensor, it is often impractical.
This is because value ranges only support constant bounds, and it's common
for the tensor you want to index into to have a symbolic dimension (for
example, `s0`). Using its size as the maximum value for a value range
will force specialization, which is usually undesirable. Instead, if
necessary, manually handle range checks by using `torch._check()` on
appropriate expressions based on the errors you encounter.
## Common Fix Patterns
There are several common methods to resolve issues like this. Below,
we outline the most frequently used solutions.
### When It's Unfixable
In some cases, the issue is genuinely unfixable due to the nature of the code.
Consider the following example:
```python
i = x.item()
if i > 4:
return x * 2
else:
return x + 3
```
If the user code is branching on a data-dependent value, it is impossible to
trace as is. In such cases, you may need to consider alternative approaches,
such as using `torch.cond`.
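A minimal sketch of rewriting the branch above with `torch.cond`, keeping the predicate as a tensor rather than calling `.item()`:
```python
import torch

@torch.compile(fullgraph=True)
def f(x):
    # torch.cond traces both branches, so the data-dependent choice stays in the graph.
    return torch.cond(x > 4, lambda x: x * 2, lambda x: x + 3, (x,))

f(torch.tensor(5.0))
```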
Another common pattern involves indexing with a data-dependent value:
```python
return self.mlps[x.item()]
```
Here, `self.mlps` is a Python list or `ModuleList`, and the code branches on a data-dependent value. The simplest solution is to induce a graph break before the indexing operation.
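A rough sketch of that workaround (the module and attribute names are illustrative):
```python
import torch

class Router(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.mlps = torch.nn.ModuleList(
            [torch.nn.Linear(8, 8) for _ in range(4)]
        )

    def forward(self, x, idx_tensor):
        idx = idx_tensor.item()
        torch._dynamo.graph_break()  # run the data-dependent indexing eagerly
        return self.mlps[idx](x)
```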
### `u0` is a Size, but We Don't Know It
Some guards fail on tests that essentially ask, "Is this a size?" but we don't know it is a size. These fall into two categories:
1. **Regular Tests:**
These are tests like `u0 >= 0` or `u0 != -1` that are unconditionally true
for sizes. Adding a `torch._check_is_size(...)` on the relevant size will
assert that these tests are true. This is typically uncommon because if
the test is for error checking, we can infer that the condition must be
true, as an error would occur otherwise. An important exception is APIs
that accept both sizes and `-1`; in such cases, the user must indicate that
the input data-dependent quantity cannot be `-1`, as something unusual would
happen otherwise. For an example, see
https://github.com/pytorch/pytorch/pull/107788.
Sometimes, you can refactor an error-checking API to split a logical
disjunction of conditionals into separate conditionals. If you can do so
to achieve a single `torch._check(x == y)` statement, it will enable
the automatic generation of a deferred runtime assertion. For an example,
see https://github.com/pytorch/pytorch/pull/110979.
2. **Edge Case Tests:**
These are tests like `u0 == 0` or `u0 == 1`, which are not always true for
sizes, but where our choice doesn't really matter. These tests handle edge
cases, such as dealing with an empty tensor or testing for broadcasting when
we want to assume broadcasting is not occurring. To resolve these situations,
two steps are needed:
* First, the guard itself must be evaluated via `guard_size_oblivious`,
which assumes that size-like integers cannot equal zero or one, with the
promise that if they do, something reasonable will happen.
* Second, the symbols themselves must be marked as size-like, either
inferred because they were passed to tensor factory functions or explicitly
specified with `torch._check_is_size(...)`. For examples of making guards
size-oblivious, see https://github.com/pytorch/pytorch/pull/118579.
Sometimes, these tests can occur in C++. While there are corresponding
C++ APIs for these tests, it can be more challenging to localize the problem,
as you do not get a useful backtrace by default.
### `u0` is Actually Equal to `u1`, but We Don't Know It
Multiple unbacked `SymInts` can be known to be equal at compile time:
```python
i0 = x.sum().item()
i1 = x.sum().item()
return torch.randn(i0) + torch.randn(i1)
```
If there is a `torch._check(i0 == i1)` somewhere (in the example above, this
check would occur inside the shape-checking rule for addition), we will
automatically unify the two unbacked `SymInts` and recognize them as equal.
However, if such an assertion is missing, you may need to explicitly add an
assertion to achieve this unification. For an example, see
https://github.com/pytorch/pytorch/issues/111950.
```{note}
If we allocate an unbacked `SymInt` and
immediately set it equal to another, these instances are benign and not easily
eliminated entirely from the framework.
```
### `u0` is a Tensor
Another reason you might be overallocating unbacked `SymInts` is due to passing
around a `Tensor` and relying on its implicit conversion to an integer. Many
functions that accept an integer will also accept a `Tensor` and automatically
call `item()` on it. It's beneficial to examine
`TORCH_LOGS=dynamic` to determine whether the number of unbacked `SymInts` is
as expected or excessive. When this occurs, a new `SymInt` will be allocated at
the line where a PyTorch function is invoked.
This issue is less likely to cause problems now because the return value of
`t.item()` is memoized, ensuring that you consistently receive the same unbacked
`SymInt` if you call it multiple times.
### Overspecialization Issue
In non-strict export mode, consider the following code:
```python
u0 = x.sum().item()
return y[:u0]
```
This code will fail when trying to evaluate `u0` because, when a `SymInt` is
used directly inside a Python slice (without using Dynamo), Python forces the
integer to be specialized and fails if it is unbacked.
To resolve this, you can rewrite the program to avoid specialization.
For the example above, you can fix it by not using slices:
```python
u0 = x.sum().item()
return y.narrow(0, 0, u0)
```
For more details, see the related issue
https://github.com/pytorch/pytorch/issues/111950.
### Use Lengths Instead of Offsets
When working with variable sequence lengths, it's common to have tensors
representing either the lengths or offsets of the sequences. For example, given
`values = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]`, you might have `lengths = [3, 2, 4]`
and `offsets = [0, 3, 5, 9]`. While these representations are interconvertible,
it's better to work with lengths when dealing with them as integers (by calling
`lengths.tolist()`), rather than offsets.
The reason is that when you perform a `torch.split()` on your `values` tensor, you
need to create tensors for each sub-sequence, such as tensors of sizes 3, 2, and 4.
If you have unbacked `SymInts` for sizes, they become `u0`, `u1`, and `u2`. You can
easily indicate that they are size-like, and you're done. However, if you have
unbacked `SymInts` for offsets, they become `u1 - u0`, `u2 - u1`, `u3 - u2`, which
complicates matters. These quantities cannot be conveniently marked as size-like,
leading to potential issues. Since it's relatively straightforward to write code
using either lengths or offsets, you should prefer using lengths.
```{seealso}
* {ref}`dynamic_shapes`
* {ref}`debugging-tlparse-torch-logs`
```

View File

@ -0,0 +1,33 @@
(zero-one-specialization)=
# The Zero-One Specialization Problem
Before you read this section, you should understand the basics of
dynamic shapes. Make sure you have read the following sections:
* {ref}`dynamic_shapes`
* {ref}`torch.export`
* {ref}`what_is_a_specialization`
In `torch.compile`, we specialize automatically on inputs with sizes
0 or 1 and assume that any remaining inputs cannot be 0 or 1. This
simplifies tasks like contiguity and broadcasting checks, as it
avoids adding extra guards. However, this can cause problems for
sparse models with many symbolic integers that in practice have
tensors of size 0, 1, or 2. For example, consider a task such as
collecting likes on a page.
While it's possible to stop specializing on 0/1 upfront, executing
normal PyTorch code often reintroduces 0/1 guards, as many conditions
in PyTorch check for values being 0 or 1. Although models that work
for `N > 2` often generalize to `N = 1`, this isn't guaranteed, especially
with symbolic variables. For example, in hand tracking, a dimension
size of `N = 0`, `1`, or `2` may lead to different graph behaviors.
Simply hoping that the `N > 2` model generalizes can expose soundness issues.
```{seealso}
* {ref}`dynamic_shapes`
* {ref}`torch.export`
* {ref}`what_is_a_specialization`
* {ref}`backed-vs-unbacked-symints`
```

View File

@ -82,55 +82,48 @@ Some of the most commonly used backends include:
## Read More
```{eval-rst}
.. toctree::
:caption: Getting Started for PyTorch Users
:maxdepth: 1
```{toctree}
:caption: Getting Started for PyTorch Users
:maxdepth: 2
torch.compiler_get_started
torch.compiler_api
torch.compiler.config
torch.compiler_fine_grain_apis
torch.compiler_backward
torch.compiler_aot_inductor
torch.compiler_inductor_profiling
torch.compiler_profiling_torch_compile
torch.compiler_faq
torch.compiler_troubleshooting
torch.compiler_performance_dashboard
torch.compiler_inductor_provenance
torch.compiler_get_started
torch.compiler_api
torch.compiler.config
torch.compiler_dynamic_shapes
torch.compiler_fine_grain_apis
torch.compiler_backward
torch.compiler_aot_inductor
torch.compiler_inductor_profiling
torch.compiler_profiling_torch_compile
torch.compiler_faq
torch.compiler_troubleshooting
torch.compiler_performance_dashboard
torch.compiler_inductor_provenance
```
```{eval-rst}
.. toctree::
:caption: `torch.compile` Programming Model
```{toctree}
:caption: torch.compile Programming Model
:maxdepth: 2
compile/programming_model
compile/programming_model
```
% _If you want to contribute a developer-level topic
% that provides in-depth overview of a torch._dynamo feature,
% add in the below toc.
```{toctree}
:caption: Deep Dive for PyTorch Developers
:maxdepth: 1
```{eval-rst}
.. toctree::
:caption: Deep Dive for PyTorch Developers
:maxdepth: 1
torch.compiler_dynamo_overview
torch.compiler_dynamo_deepdive
torch.compiler_dynamic_shapes
torch.compiler_nn_module
torch.compiler_cudagraph_trees
torch.compiler_fake_tensor
torch.compiler_dynamo_overview
torch.compiler_dynamo_deepdive
torch.compiler_nn_module
torch.compiler_cudagraph_trees
torch.compiler_fake_tensor
```
```{eval-rst}
.. toctree::
:caption: HowTo for PyTorch Backend Vendors
:maxdepth: 1
```{toctree}
:caption: HowTo for PyTorch Backend Vendors
:maxdepth: 1
torch.compiler_custom_backends
torch.compiler_transformations
torch.compiler_ir
torch.compiler_custom_backends
torch.compiler_transformations
torch.compiler_ir
```

View File

@ -1,129 +1,295 @@
# Dynamic Shapes
---
file_format: mystnb
kernelspec:
name: python3
mystnb:
execution_timeout: 30
execution_show_tb: True
merge_streams: True
---
Code: [symbolic_shapes.py](https://github.com/pytorch/pytorch/blob/db4572dbf18f1cf50cf662547e272d3117063747/torch/fx/experimental/symbolic_shapes.py)
```{code-cell}
:tags: [remove-cell]
import torch
from compile import header_code
See also: [The dynamic shapes manual](https://docs.google.com/document/d/1GgvOe7C8_NVOMLOCwDaYV1mXXyHMXY7ExoewHqooxrs/edit#heading=h.fh8zzonyw8ng)
## Motivation
Deep learning compilers commonly only work for static shapes, that is to say, they produced compiled programs which only work for a single specific configuration of input shapes, and must recompile if any input shape changes. This assumption works great for the majority of commonly run deep learning models today, but there are a few situations where it is insufficient:
- Some dimensions, such as batch size or sequence length, may vary. For example, an inference service performing adaptive batching will execute inference requests with varying batch sizes depending on how many requests it received within its batching window. We may also want to consider padding out variable size sequences only to the maximum sequence length within a batch, which may vary from batch-to-batch.
- Some models exhibit data-dependent output shapes, that is to say, the size of their outputs and intermediates may depend on the actual input data which may vary across runs. For example, detection models may first generate a variable number of potential bounding boxes before running a more expensive image recognition model to identify if the subject is in a bounding box. The number of bounding boxes is data dependent.
- One particularly important case of data-dependent shapes occurs when dealing with sparse representations, such as sparse tensors, jagged tensors, and graph neural networks. In all of these cases, the amount of data to be processed depends on the sparse structure of the problem, which will typically vary in a data-dependent way.
In supporting dynamic shapes, we chose not to support dynamic rank programs, e.g., programs whose input tensors change in dimensionality, as this pattern rarely occurs in real-world deep learning programs, and it avoids the need to reason inductively over symbolic lists of shapes.
## Abridged public API
The default dynamic behavior in PyTorch 2.1 is:
- PT2 assumes everything is static by default
- If we recompile because a size changed, we will instead attempt to recompile
that size as being dynamic (sizes that have changed are likely to change in
the future). This generalization may fail (e.g., because user code does a
conditional branch on the size in question or missing dynamic shapes support
in PT2). If you are trying to understand why PT2 has overspecialized some
code, run with `TORCH_LOGS=dynamic` and look for "eval" entries that say
when guards are added and why.
- If you know ahead of time something will be dynamic, you can skip the first
recompile with `torch._dynamo.mark_dynamic(tensor, dim)`. If you know ahead of time
the `min` and `max` value this dimension can take, you can specify `torch._dynamo.mark_dynamic(tensor, dim, min=min, max=max)`
- If you say `torch.compile(dynamic=False)`, we will turn off automatic
dynamic shapes on recompiles and always recompile for each distinct size.
Conversely, if you say `torch.compile(dynamic=True)`, we will try to make
everything as dynamic as possible. This is mostly useful for small
operators; if you try it on a big model it will (1) probably crash PT2 and (2) run slow for no good reason.
- You can whitelist specific sources to be marked as dynamic using the
`TORCH_COMPILE_DYNAMIC_SOURCES` environment variable or by setting
`torch.compiler.config.dynamic_sources`. This is particularly useful for large
models with graph breaks, as you can maintain dynamism across graph breaks since
source names stay consistent. You can also use this to mark integers as dynamic.
The format is a comma-delimited list of source names, e.g., `"L['x'], L['y']"`.
You can also use regexes, e.g., `"L\['x.*'\], L\['y.*'\]"`.
This whitelist takes precedence over other flags like `dynamic=False`,
`force_nn_module_property_static_shapes`, and `force_parameter_static_shapes`.
- Sometimes it can be cumbersome to find the right inputs to mark as dynamic. If
you're willing to take a performance hit for the first batch, one other affordable
option we have are the eager_then_compile stances which derive dynamism for you.
See [torch.compiler.set_stance](https://docs.pytorch.org/docs/stable/generated/torch.compiler.set_stance.html) for more details.
## The Guard Model
When considering how to add support for dynamic shapes to TorchDynamo and TorchInductor, we made a major design decision: in order to reuse decompositions and other preexisting code written in Python/C++ targeting the PyTorch API, we must be able to trace through dynamic shapes. Unlike a fully symbolic system which might capture both branches of a conditional, we always pick one branch and specialize our trace under the assumption that we only use this trace when we would have made the same choice for that branch in the future. To do this, we maintain a "hint" for every symbolic size saying what its concrete value is at compile time (as TorchDynamo is a just-in-time compiler, it always knows what the actual input sizes are.) When we perform a condition on a tensor, we simply consult the hint to find out which branch to take.
This greatly simplifies the symbolic shape formulas we produce, but means we have a much more involved system for managing guards. Consider, for example, the following program:
```python
def f(x, y):
z = torch.cat([x, y])
if z.size(0) > 2:
return z.mul(2)
else:
return z.add(2)
torch._logging.set_logs(graph_breaks=True, graph_code=True)
```
The final IR we will compile with TorchInductor will either be `torch.cat([x, y]).add(2)` or `torch.cat([x, y]).mul(2)` (with the condition flattened away), but to determine which branch we are in, we would need to know the size of `z`, an intermediate. Because TorchDynamo must know upfront if a compiled trace is valid (we do not support bailouts, like some JIT compilers), we must be able to reduce `z.size(0)` as an expression in terms of the inputs, `x.size(0) + y.size(0)`. This is done by writing meta functions for all operators in PyTorch which can propagate size information to the output of a tensor without actually performing computation on the node.
(dynamic_shapes)=
# Dynamic Shapes
## Overall architecture
This section explains how to work with dynamic shapes in PyTorch, including how
to debug and fix common errors, implement support for dynamic shapes in
operators, and understand the underlying mechanisms.
Symbolic shapes workflow:
Dynamic shapes allow PyTorch models to handle inputs with varying dimensions
without recompilation. This enables more flexible models that can process
different batch sizes, sequence lengths, or image dimensions in a single
compiled artifact. Dynamic shapes work by symbolically tracing tensor
dimensions rather than using concrete values, creating a computation
graph that adapts to different input shapes at runtime. By default,
PyTorch assumes all input shapes to be static.
1. When we start compiling a frame in Dynamo, we allocate a ShapeEnv (attached to FakeTensorMode) which keeps track of symbolic shapes state.
2. We allocate symbolic sizes for tensors on entry (what is static or dynamic is a policy decision, with some knobs).
3. We propagate the symbolic sizes through operators, maintaining both (1) FX IR so that we can faithfully export symbolic compute, and (2) Sympy expressions representing the size vars, so we can reason about them.
4. When we condition on symbolic sizes, either in Dynamo tracing or in Inductor optimization, we add guards based on the conditional. These can be induced from both Python and C++.
5. These guards can induce further simplifications on symbolic variables. For example, if you assert `s0 == 4`, we can now replace all occurrences of `s0` with `4`.
6. When we're done tracing and optimizing, we install all of these guards with the compiled code; the compiled code is only reusable if all the guards evaluate true.
Typically, deep learning compilers only support static shapes, requiring
recompilation for input shape changes. While this approach covers many use cases,
there are situations where this is insufficient:
Important files:
- **Variable Dimensions** - Batch sizes or sequence lengths vary, such as in
adaptive batching.
- **Data-Dependent Outputs** - Models produce outputs based on input data,
like variable bounding boxes in detection models.
- **Sparse Representations** - Processing depends on data-varying sparse structures,
such as in sparse tensors, jagged tensors, and graph neural networks.
- C++ SymInt API: `c10/core/SymInt.h`, `SymFloat.h`, `SymBool.h`
- Python SymInt API: `torch/__init__.py` (look for `SymInt/SymFloat/SymBool`)
- C++ plumbing: `c10/core/SymNodeImpl.h`, `torch/csrc/utils/python_symnode.h`, `torch/csrc/jit/python/init.cpp`
- Python infrastructure: `torch/fx/experimental/symbolic_shapes.py`
- Other important files: `torch/_subclasses/fake_tensor.py`, `torch/_meta_registrations.py`, decomps, PrimTorch refs
Dynamic shapes do not support dynamic rank programs, that is, programs whose input tensors
change in dimensionality, as this is uncommon and unnecessarily complex.
## Abridged internal API
Understanding the Python class hierarchy:
- SymInt/SymFloat/SymBool: these are user-visible classes that simulate their int/float/bool counterparts. If you add two SymInts, we give you a new SymInt that symbolically tracks that the integer addition occurred.
- SymNode: this is the internal structure (accessible via, for example, `symint.node`) which holds the actual symbolic tracking info. SymNode is type erased; this makes it more convenient to represent mixed-type operations. Note that technically you don't have to call into Python SymNode from SymInt; for example, XLA's C++ `SymNodeImpl` would take the place of SymNode.
- ShapeEnv: per-compile context state which keeps track of all the free symbols and guards we have accumulated so far. Every SymNode records its ShapeEnv (but not vice versa; SymNodes only get used if they participate in a guard).

C++ is fairly similar:

- c10::SymInt/SymFloat/SymBool: user-visible classes that simulate int/float/bool.
- c10::SymNode/SymNodeImpl: analogous to SymNode.
- There is no ShapeEnv in C++; for ease of debugging, the entire symbolic reasoning apparatus is in Python.

When you write code that is traceable with `make_fx`, it must be able to deal with SymInt/SymFloat/SymBool flowing through it. [The dynamic shapes manual](https://docs.google.com/document/d/1GgvOe7C8_NVOMLOCwDaYV1mXXyHMXY7ExoewHqooxrs/edit#heading=h.fh8zzonyw8ng) gives some guidance for how to do this.

## DimDynamic policy

Symbolic reasoning:

- Value ranges
- Sympy usage notes
- Constraints
- DimDynamic/Constraint

## Unbacked SymInts

To resolve control flow, we check the hint, that is, the actual value, of a symbolic integer to determine which branch to take. However, in some cases we may not have a hint: so-called unbacked symbolic integers arise when a size variable emerges from a data-dependent operation like `.nonzero()` or `.item()`. It is illegal to perform control flow on these symbolic integers, so we must graph break on these operations.

Naively implemented, this is too restrictive: most PyTorch programs will immediately fail if you try to do anything with unbacked symbolic integers. Here are the most important enhancements to make this actually work:

- On tensor creation, PyTorch precomputes a lot of data about a tensor; for example, if you use `empty_strided` to create a tensor, we will eagerly sort the strides and determine if the tensor is non-overlapping and dense. Sorts produce a lot of guards. However, it is more common to produce a tensor directly with a higher-level API like `empty`, which is guaranteed to produce a non-overlapping and dense tensor. We modified PyTorch to avoid needlessly recomputing these properties.
- Even if nontrivial compute is needed, sometimes a property is never actually queried at all. Making these precomputed properties lazy allows us to avoid guarding on an unbacked symbolic integer unless it is actually needed.
- The data in an integer tensor is generally not known to be non-negative. However, we provide an API, `constrain_range`, whereby a user can specify that a size is bounded above and below by known limits.

Similar to the dynamic APIs, there are corresponding unbacked APIs: namely, you can use `mark_unbacked` instead of `mark_dynamic` and `TORCH_COMPILE_UNBACKED_SOURCES` instead of `TORCH_COMPILE_DYNAMIC_SOURCES` to tell the compiler to mark an input as unbacked.

In future versions of PT2 (beyond PT2.1), we will extend our reasoning system
to infer that an unbacked symbolic integer is size-like based on usage. For
example, if you pass the result of an `.item()` call to a factory function
like `torch.empty`, we will automatically infer that the result is a size
(because if it were not, it would fail). This assumption is validated
at runtime, raising an error if it is not fulfilled.

## What does it mean for a size/integer to be dynamic?

Dynamic shapes allow avoiding recompilations by making certain dimensions or integers
dynamic. For example, if a function `f(x)` is compiled with a static size, it will need
recompilation for different sizes:

```{code-cell}
import torch

@torch.compile(dynamic=False)
def f(x):
    return x * x.size()[0]

f(torch.rand(10))
f(torch.rand(20))
f(torch.rand(30))
f(torch.rand(40))
```

In the produced output, you can see that four graphs were generated.
See the corresponding <a href="_static/img/dynamic_shapes/tlparse1_dynamic_shapes_false.png" target="_blank">tlparse output</a>.

By making the size dynamic, the function can handle various sizes without recompilation:

```{note}
For simplicity, the following example uses `@torch.compile(dynamic=True)`. Note that
this option is not recommended because it is error prone.
For a recommended way of enabling dynamic shapes, see {ref}`enable-dynamic-behavior`.
```

```{code-cell}
import torch

@torch.compile(dynamic=True)
def f(x):
    return x * x.size()[0]

f(torch.rand(10))
f(torch.rand(20))
f(torch.rand(30))
f(torch.rand(40))
```

With dynamic shapes enabled, only one graph is created. See the
corresponding <a href="_static/img/dynamic_shapes/tlparse2_dynamic_shapes_true.png" target="_blank">tlparse output</a>.
While compilation time differences
are minimal for this small example, more complex use cases would show significant
performance improvements.

(what_is_a_specialization)=
## What is a specialization?

**Specialization** refers to optimizing a computational graph for specific input shapes
by examining shape conditions during control flow. If a branch is taken based on a
shape condition, the graph is tailored for that condition. If a new input doesn't meet
this condition, the system will recompile the graph.

Specialization allows you to create optimized computational graphs for specific input
shapes, which can significantly improve execution speed.
```{code-cell}
import torch

@torch.compile(dynamic=True)
def f(x):
    if x.size()[0] == 10:
        return x * 10
    if x.size()[0] <= 30:
        return x * 200
    return x * x.size()[0]

f(torch.rand(10))
f(torch.rand(20))
f(torch.rand(30))
f(torch.rand(40))
f(torch.rand(50))
```
In the code above, the first branch specializes the graph to an input size of exactly 10, in which
case it returns `x * 10`. If the input size is less than or equal to 30, it returns `x * 200`;
otherwise, it returns `x * x.size()[0]`.
In the output, you can see that this creates three graphs.
See the corresponding <a href="_static/img/dynamic_shapes/tlparse3_specialization.png" target="_blank">tlparse output</a>.
These are the graphs created for the above function:
```{image} _static/img/dynamic_shapes/dynamic_shapes_example_specialization.png
```
(enable-dynamic-behavior)=
## Enabling Dynamic Behavior
There are several ways to make shapes and integers dynamic:
* {ref}`automatic_dynamic`
* {ref}`user_annotations` (preferred)
* {ref}`torch_compile_dynamic_true` (for testing only)
* {ref}`dynamic_shapes_advanced_control_options` (for advanced use cases)
Each of these options is described below.
(automatic_dynamic)=
### Automatic dynamic
**Automatic dynamic** is the default behavior where {func}`torch.compile` performs
the initial compilation assuming static shapes are used, while tracking the
input sizes from that first compilation. When a recompile is triggered, it
uses this information to identify which dimensions have changed and marks
those as dynamic for the second compilation.
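A minimal sketch of this behavior (the function and sizes are illustrative):

```python
import torch

@torch.compile  # no dynamic= argument: automatic dynamic is the default
def f(x):
    return x * x.size(0)

f(torch.randn(10))  # first compilation assumes size 10 is static
f(torch.randn(20))  # the size changed, so dimension 0 is recompiled as dynamic
f(torch.randn(30))  # reuses the dynamic graph; no further recompilation
f(torch.randn(40))
```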
(user_annotations)=
### User Annotations
Several APIs allow users to explicitly mark specific inputs, by name or in code, as dynamic.
This is useful for avoiding the initial static compilations that automatic dynamic would
otherwise recompile as dynamic. It is also used to mark elements that are not automatically
marked as dynamic, such as neural network module parameters. User annotations are the
preferred way to enable dynamic shapes.
#### `mark_dynamic(tensor, dim, min=min, max=max)`
The {func}`torch._dynamo.mark_dynamic` function marks a tensor dimension as dynamic and will fail if it
gets specialized. It does not work for integers. Use this function only if you know
all graphs in the frame using this input converge to a single dynamic graph.
Otherwise, you may encounter a misleading constraint violation error.
In such cases, consider using {func}`torch._dynamo.maybe_mark_dynamic`. Currently,
{func}`torch._dynamo.mark_dynamic`
does not have precedence over `force_parameter_static_shapes = True` or `force_nn_module_property_static_shapes = True`.
If you know in advance that a particular dimension will be dynamic, you
can avoid the initial recompilation by using {func}`torch._dynamo.mark_dynamic(tensor, dim)`.
Additionally, if you already know the minimum and maximum possible
values for this dimension, you can specify them with
{func}`torch._dynamo.mark_dynamic(tensor, dim, min=min, max=max)`.
Here is a quick example:
```{code-cell}
import torch

@torch.compile()
def f(x):
    return x * x.size()[0]

x = torch.randn(10)
torch._dynamo.mark_dynamic(x, 0)
# The first invocation compiles with dimension 0 already marked as dynamic
f(x)
# Subsequent invocations reuse the dynamically compiled code
f(torch.randn(20))
f(torch.randn(30))
f(torch.randn(40))
```
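If you know the valid range in advance, here is a minimal sketch with explicit `min`/`max`
bounds (the bounds below are illustrative):

```python
import torch

@torch.compile()
def f(x):
    return x * x.size(0)

x = torch.randn(16)
# Dimension 0 is dynamic and assumed to stay within [8, 1024].
torch._dynamo.mark_dynamic(x, 0, min=8, max=1024)
f(x)
f(torch.randn(32))  # within bounds, reuses the compiled code
```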
#### `maybe_mark_dynamic(tensor, dim)`
The {func}`torch._dynamo.maybe_mark_dynamic` function shares all properties
with {func}`torch._dynamo.mark_dynamic`
but does not fail if the size gets specialized. Use it for inputs shared by
multiple graphs, or when the number of graphs for a specific frame does not converge
to one. For instance, in the example above, {func}`torch._dynamo.maybe_mark_dynamic` is the
safer choice because the graphs for sizes 0 and 1 will specialize; use
{func}`torch._dynamo.mark_dynamic` instead if you want a hard error whenever specialization occurs.
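A minimal sketch (the sizes are illustrative):

```python
import torch

@torch.compile()
def f(x):
    return x * x.size(0)

x = torch.randn(10)
# Prefer a dynamic dimension 0, but do not raise an error if the compiler
# decides to specialize this dimension anyway.
torch._dynamo.maybe_mark_dynamic(x, 0)
f(x)
f(torch.randn(20))  # reuses the dynamically compiled code
```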
#### `mark_unbacked(tensor, dim)`
The {func}`torch._dynamo.mark_unbacked` function marks a tensor dimension as unbacked. It is
unlikely to be the tool you need, but it can be useful if the specialization occurs inside a
`guard_size_oblivious` condition and marking the dimension as unbacked removes the specialization.
Make sure it actually fixes the specialization and does not introduce a data-dependent error
that turns into a graph break at or before the location of the specialization
you are trying to avoid. It might be better to use the next option.
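For completeness, a minimal sketch of the API (the function is illustrative and does not branch
on the size, so it compiles cleanly with an unbacked dimension):

```python
import torch

@torch.compile(fullgraph=True)
def f(x):
    return x + 1

x = torch.randn(8)
# Treat dimension 0 as unbacked: the compiler gets no hint for it and
# cannot specialize on special values such as 0 or 1.
torch._dynamo.mark_unbacked(x, 0)
f(x)
```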
(dynamic_sources_allow_list)=
#### Dynamic Allow List (`DYNAMIC_SOURCES`)
Use the environment variable `TORCH_COMPILE_DYNAMIC_SOURCES` to pass a comma-separated
list of source names to be marked as dynamic. For example:
`TORCH_COMPILE_DYNAMIC_SOURCES="L['x'],L['y']"`
It is easiest to find these dynamic source names using the PGO artifact in `tlparse`;
you can copy and paste them directly from the PGO artifact. This method works
for integers and tensor sizes, and it takes precedence over all other flags
that force static shapes. It will not throw an error if something marked dynamic
gets specialized or if the provided source does not exist.
Here is an example:
```{code-cell}
import torch

@torch.compile()
def f(x):
    return x * x.size()[0]

with torch.compiler.config.patch(dynamic_sources="L['x']"):
    f(torch.rand(10))
    f(torch.rand(20))
    f(torch.rand(30))
    f(torch.rand(40))
```
(torch.compiler.set_stance_eager_then_compile)=
#### `torch.compiler.set_stance("eager_then_compile")`
At times, identifying the appropriate inputs to mark as dynamic can
be challenging. If you are willing to accept a performance cost for
the first batch, another convenient option is to use the
`eager_then_compile` stances, which automatically determine dynamic
inputs for you. For more information, see {func}`torch.compiler.set_stance` and [Dynamic Compilation Control with torch.compiler.set_stance](https://docs.pytorch.org/tutorials/recipes/torch_compiler_set_stance_tutorial.html).
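A minimal sketch, assuming a PyTorch version where the `eager_then_compile` stance is available:

```python
import torch

@torch.compile
def f(x):
    return x * x.size(0)

# Run the first invocation eagerly, then compile later invocations, using the
# shapes observed during the eager run to decide which dimensions are dynamic.
torch.compiler.set_stance("eager_then_compile")
f(torch.randn(10))  # runs eagerly
f(torch.randn(20))  # compiled
torch.compiler.set_stance("default")
```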
(torch_compile_dynamic_true)=
### `torch.compile(dynamic=True)` (Not recommended)
This setting forces all sizes and integers to be dynamic, increasing the
chance of encountering dynamic shape bugs. It makes every input size dynamic,
which may result in performance regressions and can ultimately increase
compilation time. For these reasons, this option is error prone and not
recommended outside of testing.
PyTorch also provides advanced options for controlling dynamic behavior; see
{ref}`dynamic_shapes_advanced_control_options`.
## Where Do I Go From Here?
If you encounter a framework code bug or an issue with specialization,
file an issue so it can be reviewed and potentially improved. If the issue
is within your user code, consider whether you are willing to rewrite your
code to avoid it. Determine if it affects correctness or if it's a redundant
check. If the issue involves a Triton custom kernel with a `constexpr`
argument, evaluate whether you can rewrite it to address the problem.
```{toctree}
:maxdepth: 1
compile/dynamic_shapes_core_concepts
compile/dynamic_shapes_troubleshooting
compile/dynamic_shapes_advanced_control_options
compile/dynamic_shapes_beyond_the_basics
```
```{seealso}
* [tlparse documentation](https://github.com/pytorch/tlparse)
* [The dynamic shapes manual](https://docs.google.com/document/d/1GgvOe7C8_NVOMLOCwDaYV1mXXyHMXY7ExoewHqooxrs/edit?tab=t.0#heading=h.fh8zzonyw8ng)
```