The primary problem we are setting out to solve here is fake tensor freshness. Before this PR, fake tensors after Dynamo represented fake tensors *at the end* of the trace, so subsequent retraces like aot_autograd would start off with fake tensors in the wrong (end-result) state rather than in their expected fresh state. The solution here is to start a fresh fake mode and re-fakify the tensors. The nuance comes from ensuring that symbols are uniformly created for the symbolic sizes and strides of the tensor.
This PR is the result of *a lot* of back and forth with @ezyang and @eellison. Initially, the first pass at this was not super different from what we have in the PR - the broad strokes were the same:
1) We cache source->symbol in shape_env
2) We pass policy objects around, stored at dynamo fakification time, and reused for later fakification
3) We create a new fake mode for backends
(from https://github.com/pytorch/pytorch/pull/113605/files)
This is ugly, and has some layering violations. We detoured our decision making through a few other alternatives. Immutable/mutable fake tensor mode was the most interesting alternative, https://github.com/pytorch/pytorch/pull/113653, but it was struck down over concerns about added complexity in fake mode and because it did not cover all edge cases. We also detoured on what to do about tensor memoization returning potentially different tensors than requested, and on whether that is an anti-pattern (it is) that we want to hack around with the symbol cache (we don't).
We went back to the drawing board here, but with a few concessions:
1) the cache for source->symbol must live outside of shape_env, for both lifecycle and layering reasons
2) A good amount of work needs to be done to pipe policy around fake_mode and meta_utils correctly, to cover all the cases (@ezyang did this)
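Putting the concessions together, here is a minimal sketch of the symbol-cache idea, with purely illustrative names (not Dynamo's actual APIs):
```
import itertools

# A cache that lives *outside* the ShapeEnv maps (tensor source, dim) -> symbol,
# so fakifying the same input again in a fresh fake mode reuses the same symbols
# for its symbolic sizes/strides instead of allocating new ones.

_counter = itertools.count()
_source_symbol_cache = {}  # (source, dim) -> symbol name


def symbol_for(source: str, dim: int) -> str:
    key = (source, dim)
    if key not in _source_symbol_cache:
        _source_symbol_cache[key] = f"s{next(_counter)}"
    return _source_symbol_cache[key]


# Dynamo fakification and a later backend re-fakification both ask for the
# size symbol of dim 0 of L['x'] and get the same symbol back:
assert symbol_for("L['x']", 0) == symbol_for("L['x']", 0)
```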
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113926
Approved by: https://github.com/ezyang, https://github.com/eellison
During the course of fake tensor propagation (and, potentially, also Dynamo execution, although I do not believe it is possible to exercise this right now), we may generate deferred runtime asserts, which represent "guards" on unbacked symbols which cannot be immediately checked on entry to a code block; instead, they have to be checked at runtime. However, we currently accumulate these deferred runtime asserts into the ShapeEnv, and don't do anything with them.
This PR modifies Dynamo to automatically insert these runtime asserts into the FX graph, before passing it on to the backend compiler. The assert format coincides with the export assert format as practiced in `torch/_export/passes/add_runtime_assertions_for_constraints_pass.py`, but actually these passes are completely disjoint right now as I only handle deferred runtime asserts, while export only handles ranges (which I should probably also handle, but don't in this PR.)
The assertions must be inserted by Dynamo, because you could then pass the graph on to a backend like "eager" that never looks at the ShapeEnv afterwards. Thanks to previous work in export, these asserts are preserved by AOTAutograd, but they are dropped by Inductor, which needs to be fixed in future work. That piece will be a bit awkward, as Inductor would have preferred to work with the Sympy expressions directly, ah well.
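For reference, the test being traced looks roughly like this, reconstructed from the `File:`/`code:` comments in the graph below (the decorator and config details are assumptions, not the exact test setup):
```
import torch

# Assumed setup: capturing .item() requires scalar-output capture to be enabled.
torch._dynamo.config.capture_scalar_outputs = True


@torch.compile()
def f(x):
    y = x.item()              # produces the unbacked SymInt i0
    torch._check_is_size(y)   # records the deferred runtime assert i0 >= 0
    if y >= 0:
        return x * 2


f(torch.tensor([3]))
```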
Here is what the Dynamo traced FX graph looks like for the test in question:
```
<eval_with_key>.0 class GraphModule(torch.nn.Module):
    def forward(self, L_x_ : torch.Tensor):
        l_x_ = L_x_

        # File: /data/users/ezyang/c/pytorch/wu.py:8, code: y = x.item()
        item = l_x_.item()

        # No stacktrace found for following nodes
        ge_1 = item >= 0
        scalar_tensor_default = torch.ops.aten.scalar_tensor.default(ge_1);  ge_1 = None
        _assert_async_msg = torch.ops.aten._assert_async.msg(scalar_tensor_default, "Deferred runtime assert failed: i0 >= 0, where i0 was defined by 'item' (for more information, run with TORCH_LOGS=+dynamo,dynamic)");  scalar_tensor_default = None

        # File: /data/users/ezyang/c/pytorch/wu.py:9, code: torch._check_is_size
        _check_is_size = torch._check_is_size(item)

        # File: /data/users/ezyang/c/pytorch/wu.py:10, code: if y >= 0:
        ge = item >= 0;  item = None

        # File: /data/users/ezyang/c/pytorch/wu.py:11, code: return x * 2
        mul = l_x_ * 2;  l_x_ = None
        return (mul,)
```
Note that we actually keep the `_check_is_size` in the graph redundantly alongside the inserted runtime assert; downstream, the assert_async is retained in the graph, whereas _check_is_size ends up getting DCE'ed.
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113958
Approved by: https://github.com/aakhundov, https://github.com/tugsbayasgalan
ghstack dependencies: #113978
Copied from @ezyang 's #113693.
The motivation for this change is that we'd like to guard on storage offset in inductor, to make assumptions about data alignment.
create_symbolic_sizes_strides_storage_offset() creates the sizes/strides/offset for fake tensors - they can either be integers or symints. This PR changes storage_offset to always be dynamic. In variables/builder.py, we remove a conditional so that all tensors get added to tracked_fakes. This is because the storage offset will be dynamic even if the other logic in builder.py suggests that it will be static; otherwise, we run into this issue:
1e260c851b/torch/fx/experimental/symbolic_shapes.py (L892-L895)
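For intuition, a view's storage offset is what shifts its data pointer within the underlying storage, which is exactly what data-alignment assumptions care about; a tiny illustration:
```
import torch

x = torch.randn(16)
y = x[1:]                       # a view into the same storage
print(y.storage_offset())       # 1: the view's data pointer is shifted by one element,
                                # which is what alignment assumptions need to guard on
```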
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113734
Approved by: https://github.com/ezyang
This prepares the PR where we implement sets in terms of dicts.
To do so, rather than internally storing a dictionary that maps literals
to VariableTrackers, it stores (pretty much) a dictionary from VTs to VTs.
To achieve this, keys are wrapped in an opaque internal class `_Hashable`.
The class is opaque on purpose so that it fails hard if it
inadvertently leaks back into user code.
We also found and fixed a number of latent bugs and inconsistencies
in the way dynamo checked what can be a dict key. More generally, we
make it much clearer what needs to be modified to add a new supported
key type to dicts.
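A minimal sketch of the key-wrapping idea, with purely illustrative code (the real `_Hashable` wraps VariableTrackers inside Dynamo's dict variable):
```
class _Hashable:
    """Illustrative stand-in: wrap dict keys so the wrapped value never leaks out."""

    def __init__(self, underlying):
        self._underlying = underlying

    def __hash__(self):
        return hash(self._underlying)

    def __eq__(self, other):
        return isinstance(other, _Hashable) and self._underlying == other._underlying


# A ConstDictVariable-like container would then key its items on wrapped keys:
items = {_Hashable("a"): "tracker_for_value_a"}
assert _Hashable("a") in items
```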
Fixes https://github.com/pytorch/pytorch/issues/107595
Fixes https://github.com/pytorch/pytorch/issues/111603
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111196
Approved by: https://github.com/jansel
They are used in many contexts that don't actually check whether the returned
value is `None`. I have also created `try_get()` for the cases where we
do actually want an Optional returned.
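A rough sketch of the resulting pattern, using a hypothetical container class (the objects this PR actually changes are not named in this excerpt):
```
from typing import Optional


class VariableMap:
    """Hypothetical container, just to show the get()/try_get() split."""

    def __init__(self) -> None:
        self._vars = {}

    def get(self, name: str):
        # callers rely on a real value; fail loudly instead of returning None
        result = self.try_get(name)
        assert result is not None, f"{name!r} not found"
        return result

    def try_get(self, name: str) -> Optional[object]:
        # the explicit Optional variant for callers that do check for None
        return self._vars.get(name)
```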
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113535
Approved by: https://github.com/ezyang
ghstack dependencies: #113412
Since PyTorch does not have readonly tensors, compiling code with readonly numpy arrays warns about possible undefined behavior. Thus we detect readonly arrays, flip them to be writeable, and clone the resulting tensor.
Note that this is a break away from numpy semantics: the resulting array is writeable.
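Roughly the situation and behavior described above, as an illustration (not the actual Dynamo wrapping code):
```
import numpy as np
import torch

a = np.arange(4.0)
a.setflags(write=False)          # a readonly array

# torch.from_numpy(a) on a readonly array warns about possible undefined behavior.
# A rough model of the behavior described above: flip writeability, then clone
# so the tensor owns its data.
a.setflags(write=True)
t = torch.from_numpy(a).clone()
```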
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112524
Approved by: https://github.com/lezcano
(legality) It is currently impossible, and should remain impossible, to access the same **static** tensor value from a **different source**, because dedup guards ensure all static tensors are unique.
As for `getattr(nn.Module, tensor)` source collisions, we will never instantiate a `nn.Module getattr` source for a static tensor, due to:
- side-effect tracking (as long as we track all static tensors - see also https://github.com/pytorch/pytorch/pull/112025 for extra sanity check)
- See: c8a5bb451e/torch/_dynamo/variables/builder.py (L227)
(no worse) In any case, this field is currently unused.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111911
Approved by: https://github.com/voznesenskym
AutogradFunctionContextVariable was mutating self._saved_tensors, which is generally not allowed since VariableTracker objects should be read-only and are frequently copied via apply/clone. This was causing some test failures up the PR stack.
This moves the mutation into a separate object that is not copied.
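The pattern, in a stripped-down sketch with illustrative class names (not the actual Dynamo variables):
```
import copy
from dataclasses import dataclass, field


@dataclass
class SavedTensorsHolder:          # illustrative name
    tensors: list = field(default_factory=list)


@dataclass
class ContextVariable:             # stand-in for the read-only VariableTracker
    saved: SavedTensorsHolder = field(default_factory=SavedTensorsHolder)

    def clone(self) -> "ContextVariable":
        # clones are shallow copies, so they share the same SavedTensorsHolder
        return copy.copy(self)


ctx = ContextVariable()
other = ctx.clone()
ctx.saved.tensors.append("t0")
assert other.saved.tensors == ["t0"]   # the mutation is visible through every clone
```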
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112216
Approved by: https://github.com/voznesenskym
ghstack dependencies: #112122
The major change in this PR is to make the torch context manager classes a separate ```TorchCtxManagerClassVariable```, since we have Dynamo implementations for these ctx managers.
I was thinking of wrapping them as ```UserDefinedClassVariable``` and dispatching at ```USCVariable.call_function```, but it seems like almost the same amount of work and this way is clearer.
This is on the way to moving ```TorchVariable``` to ```TorchFunctionVariable```, which will only handle the functions that are allowed in graph (e.g., ```torch.sin```) or constant folded (e.g., ```torch.is_floating_point```). All other torch functions would go through skip/inline rules and be wrapped as ```UserFunctionVariable``` (for inlined) or ```SkipFilesVariable``` (for skipped).
The next steps:
* Wrap torch modules, classes, objects as regular ```PythonModuleVariable```, ```UserDefinedClassVariable``` and ```UserDefinedObjectVariable```.
* Generate the allow in graph torch functions list and wrap them as ```TorchFunctionVariable```.
* Finally, merge ```skipfiles.check``` and ```is_allowed``` into one function ```allow_skip.check(fn)``` which would return an Enum of allow, skip, and inline (a rough sketch of this direction follows).
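A rough sketch of that last step, with hypothetical names and a purely illustrative allowlist:
```
from enum import Enum, auto

import torch


class TraceDecision(Enum):              # hypothetical name for the merged result
    ALLOW_IN_GRAPH = auto()
    INLINE = auto()
    SKIP = auto()


ALLOWED_IN_GRAPH = {torch.sin}          # purely illustrative allowlist


def check(fn) -> TraceDecision:
    """Single entry point replacing skipfiles.check / is_allowed (sketch only)."""
    if fn in ALLOWED_IN_GRAPH:
        return TraceDecision.ALLOW_IN_GRAPH
    module = getattr(fn, "__module__", "") or ""
    if module.startswith("torch."):
        return TraceDecision.SKIP       # e.g. torch internals we don't want to trace
    return TraceDecision.INLINE
```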
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111622
Approved by: https://github.com/jansel
This PR implements two things:
1. Support for device-agnostic stream and runtime APIs captured by Dynamo.
2. Support for stream methods (including events) captured by Dynamo.
Here are the details for the first.
Previously, the stream captured in Dynamo was tightly bound to CUDA. Here we implement a global singleton container named `StreamMethodContainer` so that different backends can register their associated stream methods with Dynamo. When the backend's package is imported, the stream operations can be registered directly by calling:
```
device_stream_method = {
    'current_stream': method_1,
    'create_stream_context': method_2,
    'set_stream': method_3,
    'set_stream_by_id': method_4,
}
torch._dynamo.stream.register_stream_method(device_name, device_stream_method)
```
Stream methods need to be passed to this API according to the precise semantics represented by the dict keys in `device_stream_method`. After registration, these methods can be used by Dynamo to capture the stream operations in users' scripts, for example, getting the current stream or setting a specific stream. Additionally, the wrapped stream variable and the stream context variable become device-agnostic; the proxy functions of these variables are assigned from the associated methods in the container.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108312
Approved by: https://github.com/jansel, https://github.com/jgong5
The major change in this PR is to consolidate the skipfiles.check rules by merging the original ```FILE_INLINELIST``` and ```SUBMOD_INLINELIST``` into a new ```MOD_INLINELIST``` plus a legacy ```LEGACY_MOD_INLINELIST```.
Let's use the following example to illustrate the expected behavior of this force-inline list:
fa995626a8/torch/_dynamo/skipfiles.py (L344-L369)
The handling logic is:
* If f2 is inlined, we check both ```MOD_INLINELIST``` and ```LEGACY_MOD_INLINELIST``` to consult the force-inline rules for f3.
* If f2 is skipped, we check only ```LEGACY_MOD_INLINELIST``` for the inline rules for f3.
The reasoning behind this design: if f2 is skipped but we still traced every function it recursively calls, we would end up in very low-level functions (e.g., ```super().__init__```) that cause graph breaks. We treat a skipped f2 as a signal that all functions f2 recursively calls should be skipped as well. This is also a feature many PyTorch developers requested: they just want all recursive calls skipped when they mark an upper-level function as skipped.
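In code, the consultation order described above looks roughly like this (the list contents and function name are illustrative, not the real skipfiles implementation):
```
MOD_INLINELIST = {"torch._dynamo.comptime"}     # illustrative contents only
LEGACY_MOD_INLINELIST = {"torch.optim"}         # illustrative contents only


def force_inline(fn, caller_was_skipped: bool) -> bool:
    """Sketch of the consultation order described above, not the real skipfiles code."""
    mod = getattr(fn, "__module__", "") or ""
    in_legacy = any(mod == m or mod.startswith(m + ".") for m in LEGACY_MOD_INLINELIST)
    in_new = any(mod == m or mod.startswith(m + ".") for m in MOD_INLINELIST)
    if caller_was_skipped:
        # f2 was skipped: only LEGACY_MOD_INLINELIST can still force-inline f3
        return in_legacy
    # f2 was inlined: both lists are consulted for f3
    return in_legacy or in_new
```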
For PyTorch developers, we should only use ```MOD_INLINELIST``` going forward. I think most of the modules in ```LEGACY_MOD_INLINELIST``` are legacy workarounds from when we didn't have a good skip/inline API.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111451
Approved by: https://github.com/ezyang
Fixes #109604
Resubmit gh-109715 + several skips and small fixes to make tests pass.
The main fix here is by @ysiraichi: previously, dynamo did not resume tracing numpy ndarrays after a graph break.
While at it, fix several small issues Yukio's fix uncovered:
- graph break gracefully on numpy dtypes which do not map to torch.dtypes (uint16 etc)
- recognize array scalars in dynamo, treat them as 0D ndarrays
- make sure that iterating over torch.ndarray generates arrays, not bare tensors
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110512
Approved by: https://github.com/lezcano
`sys.modules` is currently treated as a constant dictionary and any reference to
it will result in guards on the full contents of `sys.modules`. This instead
adds a specialized variable tracker which tries to guard only on the modules
referenced by the code. e.g.
```
sys.modules["operator"].add(x, x)
```
will generate the guard
```
___dict_contains('operator', G['sys'].modules)
```
It does this with special support for `__contains__`, `__getitem__`, and `.get`,
which are probably the most commonly used with `sys.modules`. For anything else
we just fall back to building the dict tracker as normal.
While accessing `sys.modules` may seem unusual, it actually comes up when
inlining the `warnings.catch_warnings` context manager which internally accesses
`sys.modules["warnings"]`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110990
Approved by: https://github.com/ezyang
Several improvements for skipfiles:
* Add ```FUNC_INLINELIST``` to support function level skip/inline check.
* Use ```fn.__code__``` to match functions, since sometimes we can't get the function object (see the sketch after this list).
* Use python module string name for ```FILE_INLINELIST``` and ```SUBMODULE_INLINELIST```.
* Use filenames to match files and python modules, which fundamentally resolves the circular import issues introduced by skipfiles.
* Use ```TYPE_CHECKING``` to ensure the python module string name is correct.
* Add unit tests.
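A hedged sketch of the ```fn.__code__``` matching mentioned above (illustrative, not the actual skipfiles data structures):
```
FUNC_INLINELIST = set()


def register_inline(fn):
    # store fn.__code__ because the original function object may not be retrievable
    # at check time (decorators, bound methods, functools.wraps, ...)
    FUNC_INLINELIST.add(fn.__code__)


def is_force_inlined(code_object) -> bool:
    return code_object in FUNC_INLINELIST
```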
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110835
Approved by: https://github.com/ezyang