I was debugging an internal NE divergence for a while that ended up being caused by a bad meta. I added a config option and an explicit backend, `aot_eager_decomp_partition_crossref`, to enable the FakeCrossRefMode when running the graph. I added an explicit backend because I suspect it will be useful for internal models, but I'm also happy to leave it as just the config option.
It will only test ops that have a meta implementation, to avoid the memory overhead of hitting the fallback path and running in eager.
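As a sketch of what this looks like at the call site (the backend string is the one added here; the function and inputs are placeholders):

```
import torch

def fn(x):
    return torch.nn.functional.relu(x) * 2

# Run with the explicit backend added in this PR, which enables the fake
# cross-ref checking mode while the graph executes.
compiled = torch.compile(fn, backend="aot_eager_decomp_partition_crossref")
out = compiled(torch.randn(8, 8))
```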
Pull Request resolved: https://github.com/pytorch/pytorch/pull/138651
Approved by: https://github.com/zou3519, https://github.com/bdhirsh
Follow-up to https://github.com/pytorch/pytorch/pull/131936. In the original bisector you had to test inline whether we were disabling a component - `if BisectionManager.disable_subsystem("inductor", "post_grad_passes", debug_info)`. This adds a convenient way of testing config changes for root-causing issues. I've added `emulate_precision_casts` and the `aot_eager_decomp_partition` CSE as the initial ones.
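For context, these are the kinds of config changes the bisector can now toggle for you; checking one by hand looks roughly like this (a sketch - the `config.patch` usage and the exact numeric-comparison setup are my assumptions):

```
import torch
from torch._inductor import config as inductor_config

def fn(x):
    return (x.float().sin() * 0.5).to(torch.bfloat16)

# Manually do what the bisector automates: flip one candidate config and
# re-check numerics. emulate_precision_casts changes how low-precision
# intermediates are handled in the compiled graph, a common source of small
# divergence between eager and compile.
with inductor_config.patch(emulate_precision_casts=True):
    out = torch.compile(fn)(torch.randn(4, dtype=torch.bfloat16))
```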
Pull Request resolved: https://github.com/pytorch/pytorch/pull/137346
Approved by: https://github.com/zou3519
This is a utility to aid torch.compile debugging. You either provide a function that returns True on success and False on failure, or do something out of process and run `bisect_helper good | bad`.
The bisector will first go through the backends - `eager`, `aot_eager`, `aot_eager_decomp_partition`, `inductor` - to find the first failing backend. Then it will go through the subsystems within that backend - currently limited, but they could be expanded - and try to find the first subsystem for which disabling it fixes the problem. Once it has found the failing subsystem, it will find the number of times the subsystem is applied, and then bisect through those applications.
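Driving a bisection in-process looks roughly like this - a sketch, where `do_bisect` is my assumption for the entry-point name and the check function just returns True for "good" and False for "bad":

```
import torch
from torch._inductor.bisect_helper import BisectionManager

model = torch.nn.Sequential(torch.nn.Linear(32, 32), torch.nn.GELU())

def check() -> bool:
    # True while compiled output still matches eager ("good"),
    # False once the divergence appears ("bad").
    x = torch.randn(8, 32)
    expected = model(x)
    actual = torch.compile(model)(x)
    return torch.allclose(expected, actual, atol=1e-4, rtol=1e-4)

# Assumed entry point: hand the bisector the check function and let it walk
# backends, then subsystems, then individual applications within a subsystem.
BisectionManager.do_bisect(check)
```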
An example of how to hook it up for the `aot_eager_decomp_partition` backend and the decomposition subsystem:
```
from torch._inductor.bisect_helper import BisectionManager

# Inside the decomposition dispatch: ask the bisector whether this
# decomposition should be skipped for the current bisection step.
if op in CURRENT_DECOMPOSITION_TABLE:
    if BisectionManager.disable_subsystem("aot_eager_decomp_partition", "decomposition", lambda: repr(op)):
        return NotImplemented
```
Once it has discovered the problematic change, it will print out the associated debug info, and you can pin the same limits with `TORCH_BISECT_BACKEND`, `TORCH_BISECT_SUBSYSTEM`, and `TORCH_BISECT_MAX`.
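For example, to re-run a repro pinned to a state the bisector found (the variable names come from this PR; the specific values below are illustrative):

```
import os

# Pin the bisector to a previously found backend/subsystem/step before
# compiling, so a single run reproduces the offending configuration.
os.environ["TORCH_BISECT_BACKEND"] = "aot_eager_decomp_partition"
os.environ["TORCH_BISECT_SUBSYSTEM"] = "decomposition"
os.environ["TORCH_BISECT_MAX"] = "7"
```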
We could add further options as an automated way of going through a checklist for checking divergence - e.g., the mode to emulate amp casts.
Fix for https://github.com/pytorch/pytorch/issues/126546
Pull Request resolved: https://github.com/pytorch/pytorch/pull/131936
Approved by: https://github.com/ezyang