Summary: To preserve global state guards, we need to make the C++ type serializable. We use JSON because it is simple to implement and the global state holds very little data.
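Below is a minimal sketch of the idea, assuming an nlohmann::json-style serializer; the struct and field names are illustrative stand-ins, not the actual global state layout:

```cpp
#include <nlohmann/json.hpp>
#include <string>

// Hypothetical stand-in for the C++ global state captured by the guard;
// field names are illustrative, not the real PyTorch layout.
struct GlobalStateSketch {
  bool grad_mode_enabled = true;
  bool torch_function_enabled = true;
  int autocast_state = 0;
};

// Generates to_json/from_json for the listed fields.
NLOHMANN_DEFINE_TYPE_NON_INTRUSIVE(GlobalStateSketch,
                                   grad_mode_enabled,
                                   torch_function_enabled,
                                   autocast_state)

// Serialize the state so a guard built from it can be saved and rebuilt.
std::string dump_state(const GlobalStateSketch& s) {
  return nlohmann::json(s).dump();
}

GlobalStateSketch load_state(const std::string& text) {
  return nlohmann::json::parse(text).get<GlobalStateSketch>();
}
```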
Test Plan: test_dynamo -k test_global_state_guard_serialization
Differential Revision: D72410611
Pull Request resolved: https://github.com/pytorch/pytorch/pull/150636
Approved by: https://github.com/williamwen42
Doing this removes the need to collect `id` and therefore facilitates serialization. It also improves the readability of recompilation messages; previously, the recompile message would just show the `id`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/149228
Approved by: https://github.com/jansel
Fix the following build error:
```
/usr/bin/../lib64/gcc/x86_64-pc-linux-gnu/14.2.1/../../../../include/c++/14.2.1/bits/unique_ptr.h:91:16: error: invalid application of 'sizeof' to an incomplete type 'torch::jit::AliasDb::WriteRegistry'
91 | static_assert(sizeof(_Tp)>0,
| ^~~~~~~~~~~
/usr/bin/../lib64/gcc/x86_64-pc-linux-gnu/14.2.1/../../../../include/c++/14.2.1/bits/unique_ptr.h:399:4: note: in instantiation of member function 'std::default_delete<torch::jit::AliasDb::WriteRegistry>::operator()' requested here
399 | get_deleter()(std::move(__ptr));
| ^
../torch/csrc/jit/ir/alias_analysis.cpp:200:10: note: in instantiation of member function 'std::unique_ptr<torch::jit::AliasDb::WriteRegistry>::~unique_ptr' requested here
200 | AliasDb::~AliasDb() = default;
| ^
../torch/csrc/jit/ir/alias_analysis.cpp:200:23: note: in defaulted destructor for 'torch::jit::AliasDb' first required here
200 | AliasDb::~AliasDb() = default;
| ^
../torch/csrc/jit/ir/alias_analysis.h:298:10: note: forward declaration of 'torch::jit::AliasDb::WriteRegistry'
298 | struct WriteRegistry;
| ^
1 error generated.
```
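This is the classic `std::unique_ptr` incomplete-type problem: the defaulted destructor is instantiated at a point where `WriteRegistry` is only forward-declared. A minimal illustration of the pattern and the usual fix (simplified; not the actual `AliasDb` code):

```cpp
// header (simplified)
#include <memory>

namespace demo {

class AliasDb {
 public:
  AliasDb();
  // Declared here, defined in the .cpp after WriteRegistry is complete,
  // so unique_ptr's deleter is instantiated against the full type.
  ~AliasDb();

 private:
  struct WriteRegistry;                      // forward declaration only
  std::unique_ptr<WriteRegistry> registry_;  // fine: no deleter needed here
};

} // namespace demo

// source file (simplified)
namespace demo {

struct AliasDb::WriteRegistry {
  int entries = 0; // full definition must precede the destructor definition
};

AliasDb::AliasDb() : registry_(std::make_unique<WriteRegistry>()) {}

// The defaulted destructor is compiled here, where WriteRegistry is complete.
AliasDb::~AliasDb() = default;

} // namespace demo
```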
Pull Request resolved: https://github.com/pytorch/pytorch/pull/148758
Approved by: https://github.com/Skylion007
In hindsight, we never needed a DICT_SUBCLASS_GUARD_MANAGER, because Dynamo would inline through the overridden keys method. In this PR, we ensure that while creating guards and constructing variable trackers, we get the `d.keys()` value by using `dict.keys(d)`. This ensures that we do not call an overridden keys method. Therefore, the C++ guard can use `PyDict_Next` directly to check the guards.
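A minimal sketch of the C++ side of this, using plain CPython C API calls (simplified relative to the real guard code): `PyDict_Next` walks the dict's underlying storage directly, so an overridden `keys()` on a dict subclass is never invoked.

```cpp
#include <Python.h>
#include <vector>

// Collect the keys of `d` by walking the dict's own storage with
// PyDict_Next. This also works for dict subclasses and never calls an
// overridden keys() method (simplified, guard-style access).
static std::vector<PyObject*> dict_keys_no_override(PyObject* d) {
  std::vector<PyObject*> keys;
  PyObject* key = nullptr;
  PyObject* value = nullptr;
  Py_ssize_t pos = 0;
  while (PyDict_Next(d, &pos, &key, &value)) {
    keys.push_back(key); // borrowed references; no Py_INCREF for brevity
  }
  return keys;
}
```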
Pull Request resolved: https://github.com/pytorch/pytorch/pull/143722
Approved by: https://github.com/jansel
Implements https://github.com/pytorch/pytorch/issues/93753 - move frame local guard accessors to C++.
Previously, we used dict accessors on a manually built Python dict representing the frame's fastlocals. We move these accessors to C++ and additionally use the fastlocal index whenever possible.
Some implementation notes:
- `FrameLocalsMapping` is now initialized as a C++ vector of `PyObject`s. We do not just use the frame's localsplus/fastlocals buffer because we also unbox cells.
- `FrameLocalsMapping` can still be converted into a Python dict representing the frame's fastlocals, but it is done lazily.
- We update `LeafGuard`, `GuardAccessor`, and `GuardManager`'s `check_nopybind` methods to accept `FrameLocalsMapping`. By default, we convert the `FrameLocalsMapping` to a Python dict and run the original `check_nopybind` on it, but in some cases, conversion is not needed (a simplified sketch of this fallback follows the list).
- We add a new guard accessor `FrameLocalsGuardAccessor`, which is similar to `DictGetItemGuardAccessor` but has special handling for `FrameLocalsMapping`. We create a separate class to emphasize different use cases, but we could probably combine these two (can do in a follow up)
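A highly simplified, hypothetical sketch of the mapping and the conversion fallback described above; the names mirror the description but are not the actual PyTorch definitions:

```cpp
#include <Python.h>
#include <cstddef>
#include <string>
#include <vector>

// Toy model of a FrameLocalsMapping: unboxed frame locals stored as a flat
// vector, with a dict view that is only built on demand (names illustrative).
struct FrameLocalsMappingSketch {
  std::vector<std::string> names;
  std::vector<PyObject*> values; // already-unboxed cell contents (borrowed)
  PyObject* dict_view = nullptr; // lazily built

  PyObject* to_dict() {
    if (dict_view == nullptr) {
      dict_view = PyDict_New();
      for (std::size_t i = 0; i < names.size(); ++i) {
        PyDict_SetItemString(dict_view, names[i].c_str(), values[i]);
      }
    }
    return dict_view;
  }
};

// Toy base guard: the default overload falls back to the dict-based check;
// guards that can read the mapping directly override it to skip conversion.
struct LeafGuardSketch {
  virtual ~LeafGuardSketch() = default;
  virtual bool check_nopybind(PyObject* dict) = 0;
  virtual bool check_nopybind(FrameLocalsMappingSketch* locals) {
    return check_nopybind(locals->to_dict());
  }
};
```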
dynamo_guard_eval.py microbenchmark update:
- 713.2us -> 630.0us (3.10)
- 598.8us -> 530.7us (3.12)
Other followups:
- Add `FrameLocalsMapping` version for `check_verbose_nopybind` in order to match behavior between `check_nopybind` and `check_verbose_nopybind`. This can prevent difficult debugging situations where guards fail (`check_nopybind` returns false) but no guard error message is generated (`check_verbose_nopybind` succeeds).
- Rewrite the `SHAPE_ENV` guard into C++ - it is a fairly common guard that results in `FrameLocalsMapping` needing to convert to a dict
Pull Request resolved: https://github.com/pytorch/pytorch/pull/140063
Approved by: https://github.com/jansel
ghstack dependencies: #142117, #142430
This PR moves the logic for computing the overlapping relations between input tensors that
share a storage instance to C++.
In summary, this PR:
- Moves both `tensors_definitely_do_not_overlap` and part of `compute_overlapping_tensors`
to C++
- Introduces a `check_overlapping` function that re-runs `compute_overlapping_tensors`,
checking that the result is consistent with what is expected
- Introduces the `StorageOverlapChecker` class
- Keeps track of overlapping and non-overlapping tensors
- Actually checks the overlapping relation (calls `check_overlapping`) when all tensors
are collected
- Introduces the `STORAGE_OVERLAPPING` relational guard
- Has a reference to a `StorageOverlapChecker`
- Stores the to-be-checked tensors in the checker, and triggers its check
- Introduces `install_storage_overlapping_guard` python function
- Creates an instance of `StorageOverlapChecker`
- Creates 2 instances of the `STORAGE_OVERLAPPING` guard (for overlapping and
non-overlapping tensors), referencing the same `StorageOverlapChecker` instance
**Why is `StorageOverlapChecker` needed?**
The way `GuardManager` is implemented, we have no control over the order in which the
check methods are called, i.e. no control over the order in which the tensors are collected. So, we
can't easily split them into "overlapping" and "non-overlapping" kinds.
Instead, we create 2 instances of the `STORAGE_OVERLAPPING` guard, each of which helps
collect the tensors of one of the kinds mentioned above. They are then used by a
single `StorageOverlapChecker` instance.
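A toy model of this coordination (plain C++, with integers standing in for tensors and a stubbed-out overlap computation; the real checker runs the C++ `compute_overlapping_tensors`): two guard instances feed the same checker, and the consistency check only fires once every expected tensor has been collected.

```cpp
#include <cstddef>
#include <vector>

// Toy stand-in for StorageOverlapChecker: collects "overlapping" and
// "non-overlapping" items from two guard instances and only runs the
// real check once every expected item has arrived.
class StorageOverlapCheckerSketch {
 public:
  StorageOverlapCheckerSketch(std::size_t n_overlapping, std::size_t n_non_overlapping)
      : expected_total_(n_overlapping + n_non_overlapping) {}

  bool add(int item, bool overlapping) {
    (overlapping ? overlapping_ : non_overlapping_).push_back(item);
    if (overlapping_.size() + non_overlapping_.size() < expected_total_) {
      return true; // not all tensors collected yet; nothing to check
    }
    return check_overlapping();
  }

 private:
  // Placeholder for the real overlap computation and consistency check.
  bool check_overlapping() const { return true; }

  std::size_t expected_total_;
  std::vector<int> overlapping_;
  std::vector<int> non_overlapping_;
};

// Toy relational guard: each of the two instances collects one kind of
// tensor and forwards it to the shared checker.
class StorageOverlappingGuardSketch {
 public:
  StorageOverlappingGuardSketch(StorageOverlapCheckerSketch* checker, bool overlapping)
      : checker_(checker), overlapping_(overlapping) {}

  bool check(int item) { return checker_->add(item, overlapping_); }

 private:
  StorageOverlapCheckerSketch* checker_;
  bool overlapping_;
};
```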
Pull Request resolved: https://github.com/pytorch/pytorch/pull/140013
Approved by: https://github.com/bdhirsh
ghstack dependencies: #139554, #139555
A subsequent patch attempts to fix a side-effect issue for range
iterators, which in turn exposed an existing issue with guards for range
iterators -- the following test started failing:
```
PYTORCH_TEST_WITH_DYNAMO=1 python test/test_tensor_creation_ops.py TestTensorCreationCPU.test_hstack_column_stack_cpu_int16
```
This patch adds a `RANGE_ITERATOR_MATCH` guard to make sure that we
properly guard on range iterators, and adds a regression test.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/141902
Approved by: https://github.com/jansel
ghstack dependencies: #141713, #141714, #141715
This is a bug on main exposed by https://github.com/pytorch/pytorch/issues/139476
We have a dict tag optimization where, if the dict tag does not change, we
skip guards on all the items of the dict that are "immutable". We
consider tensors immutable in such scenarios. This is critical for
guard eval performance, because users generally don't change their
parameters.
If I try to remove this optimization, we see slowdowns, e.g., 3.03x to
2.95x on the conv_mixer TIMM benchmark.
So, I am adding a flag which keeps the current behavior but allows
users to remove this optimization. Not ideal, but given how critical guard eval perf is,
we are in the gray area of the unsoundness vs. performance tradeoff.
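To make the tradeoff concrete, here is a toy model of tag-based guard skipping (it does not use the CPython dict version tag or the actual Dynamo config flag): when the container's tag is unchanged, the per-item guards are skipped, so an in-place change to an item that does not bump the tag goes unnoticed.

```cpp
#include <cstdint>
#include <functional>
#include <vector>

// Toy per-item guard: re-validates one "tensor-like" item.
struct ItemGuardSketch {
  std::function<bool()> check;
};

// Toy dict-tag optimization: if the container's tag is unchanged since the
// last successful check, skip the per-item guards entirely.
struct TaggedContainerGuardSketch {
  std::uint64_t last_seen_tag = 0;
  bool validated_once = false;
  std::vector<ItemGuardSketch> item_guards;

  bool check(std::uint64_t current_tag, bool skip_when_tag_matches) {
    if (skip_when_tag_matches && validated_once && current_tag == last_seen_tag) {
      // Fast path: assumes items are unchanged. Unsound if an item was
      // mutated in place without the container's tag changing.
      return true;
    }
    for (const auto& g : item_guards) {
      if (!g.check()) {
        return false;
      }
    }
    last_seen_tag = current_tag;
    validated_once = true;
    return true;
  }
};
```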
Pull Request resolved: https://github.com/pytorch/pytorch/pull/139560
Approved by: https://github.com/jansel