Commit Graph

27 Commits

Author SHA1 Message Date
chilli
13681382d5 Add heuristic for when evict_first should be set (and some other minor things) (#108841)
Example of when the `evict_first` heuristic helps.
```
import torch
from torch._inductor.utils import do_bench

@torch.compile
def f(a, b):
    return (a * b).sum(dim=-1)

N = 512
# Permuted (non-contiguous) inputs are where the eviction hint matters.
# CUDA device assumed; do_bench times the generated Triton kernel.
inps = (
    torch.randn(N, N, N, device="cuda").permute(2, 1, 0),
    torch.randn(N, N, N, device="cuda").permute(1, 2, 0),
)
print(do_bench(lambda: f(*inps)))
```

This generates code like this: http://ix.io/4HFs

```
Original: 3.8 ms
This PR: 3.54 ms
Always `evict_first`: 5.4 ms
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108841
Approved by: https://github.com/lezcano, https://github.com/jansel
2023-10-01 17:06:12 +00:00
Jez Ng
fe452108fb Enable typechecking for _inductor/debug.py (#109335)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109335
Approved by: https://github.com/eellison
ghstack dependencies: #109269, #109347
2023-09-18 18:12:23 +00:00
willfengg
8010f6bf48 [dynamo][inductor] Provide public API to get compiler options/configs (#105026)
issues resolved: https://github.com/pytorch/pytorch/issues/101832

**context**: retrieve the torch.compile config for further use. E.g., a training platform may want to check whether a model was compiled with cudagraphs enabled and trigger further action accordingly.

**how it is implemented**
   * the core logic is `backend.get_compiler_config()` in torch/_dynamo/eval_frame.py
   * for backend='inductor' / `_TorchCompileInductorWrapper`, there is an inductor-specific implementation of `get_compiler_config` in torch/_inductor/compile_fx.py and torch/__init__.py

**how to use it**: Below is an example.

```
import torch
import torch.nn as nn

class DummyModule(nn.Module):
    # Minimal stand-in for any model.
    def forward(self, x):
        return torch.relu(x)

model = DummyModule()
optimized_module = torch.compile(
    model, options={"triton.cudagraphs": True}
)
compiler_config = optimized_module.get_compiler_config()

if compiler_config["triton.cudagraphs"]:
    pass
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/105026
Approved by: https://github.com/yanboliang, https://github.com/jansel
2023-07-18 06:12:06 +00:00
Edward Z. Yang
26108d5d2b Add --check-str support to after_aot minifier (#104758)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104758
Approved by: https://github.com/janeyx99, https://github.com/voznesenskym
2023-07-08 20:20:55 +00:00
Edward Z. Yang
5b600dee19 Properly preserve --tracing-mode when isolated minify (#104101)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104101
Approved by: https://github.com/voznesenskym
2023-07-05 20:19:11 +00:00
Edward Z. Yang
fd40abb706 Minor bugfix for int inputs in minifier (#104100)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104100
Approved by: https://github.com/albanD
2023-06-23 16:17:12 +00:00
Edward Z. Yang
1506acebaf Detect symbolic tracing_mode with free_symbols (#103515)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103515
Approved by: https://github.com/anijain2305
2023-06-13 17:57:16 +00:00
Edward Z. Yang
7112880cc1 Preserve leaf-ness and requires_grad-ness in minified repros (#102899)
Also some minor refactoring

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/102899
Approved by: https://github.com/albanD
2023-06-05 19:56:00 +00:00
Animesh Jain
68e55bff62 [minifier] add missing import (#102521)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/102521
Approved by: https://github.com/jansel
2023-05-30 20:57:16 +00:00
Shunting Zhang
029c6a9934 [accuracy minifier] cast copied model rather than update the original model (#101901)
This is the fix Ed found during a break at the summit :)

I figured I'd better split it out of https://github.com/pytorch/pytorch/pull/99773 so people don't need to patch that PR to run repro.py.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/101901
Approved by: https://github.com/ezyang
2023-05-20 00:50:32 +00:00
Edward Z. Yang
96487d0d1f Refactor after_dynamo to have a CLI interface too. (#101220)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/101220
Approved by: https://github.com/anijain2305
2023-05-14 19:03:16 +00:00
Edward Z. Yang
ee4cb4b1e7 Add --offload-to-disk support to minifier (#100546)
When minifying extremely large repros, the minifier can run out of memory. This is because, for delta debugging, the minifier keeps a copy of every intermediate output in the network. This can easily put you over the memory limit for your GPU. To make matters worse, we cannot easily delta debug in such a situation, as delta debugging involves replacing intermediates with inputs, but doing so can cause an intermediate to become live longer than its actual extent in the original model (since inputs all have to be allocated up front).

The strategy in this PR is to use `load_tensor` from the previous PR to offer a low memory mode for delta debugging. Instead of putting intermediates as inputs, we instead load them in the middle of the graph in question.  If, through DCE, the load_tensor ends up floating to the top of the graph, we can input-ify it. We now no longer save all intermediates in memory, but instead save them to disk. I used this to successfully minify the repro that helped us solve https://github.com/pytorch/pytorch/pull/100332

The testing is not very good. I can try to add more robust testing, but it will involve a more involved refactor of the FX minifier. Let me know if that's what you want.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100546
Approved by: https://github.com/anijain2305, https://github.com/voznesenskym
2023-05-05 05:25:03 +00:00
Edward Z. Yang
c2556c034d Improve minifier printing to be more chatty when it makes sense (#100486)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100486
Approved by: https://github.com/voznesenskym
2023-05-04 02:51:26 +00:00
Edward Z. Yang
c7e9f40653 Misc accuracy improvements on minifier (#100447)
The changes:

* Add config knob `same_two_models_use_fp64` for toggling whether or not to use fp64
* Add a test showing that RMSE is superior to atol/rtol (see the sketch after this list)
* Add a `--strict-accuracy` option, which also tests against integral/boolean accuracy; regular accuracy, now the default, checks ONLY floating-point divergence. There's a test which exercises this; it's a little delicate, but I had trouble thinking of a better test.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100447
Approved by: https://github.com/voznesenskym
2023-05-04 02:51:26 +00:00
Edward Z. Yang
1bbca4fbc0 Relax after_aot restriction on no buffers, serialize small constants (#100472)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100472
Approved by: https://github.com/bdhirsh, https://github.com/voznesenskym
2023-05-03 03:10:22 +00:00
Edward Z. Yang
0a479d9b9c Simplify minifier testing by incorporating fault injection in prod code (#100357)
Previously, minifier testing injected faults by injecting extra code
into the repro scripts, and then ensuring this code got propagated to
all subsequent subprocess calls.  This was not only quite complicated,
but also induced a big slowdown on the minifier, because to inject the
faults, you had to import torch._inductor, which would cause the
compilation threads to immediately get initialized before you even got
to do anything else in the repro script.

This new approach fixes this problem by incorporating the fault
injection into "prod" code.  Essentially, for inductor fault injection
we introduce some new config flags that let you "configure" Inductor to
be buggy; for Dynamo fault injection we just permanently keep the buggy
testing backends registered.  This is MUCH simpler: we only have to
propagate the buggy config (which is something we're already doing),
and it saves the minifier scripts from having to immediately initialize
inductor on entry.

Also, I enable the test for Triton runtime errors, now that `tl.device_assert` is here.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100357
Approved by: https://github.com/voznesenskym
2023-05-02 11:44:06 +00:00
Edward Z. Yang
2d8deffc1e Refactor repro/minifier into CLI; add analyze (#100226)
This is a two part PR; I can split it if you really want me to.

The first part is a refactor of the after-aot repro/minifier scripts to add a command line interface. I maintain exact BC with the previous interface (so, e.g., you still get a repro.py and a run_minifier.py that do the same thing as before), but each of these scripts now also takes command line arguments which you can use to customize what actually happens. Check `run_repro` for full documentation on the arguments.

The second part is an implementation of the `analyze` subcommand on the new CLI for any repro.

![analyze subcommand output](https://user-images.githubusercontent.com/13564/235045677-8545aab7-5e83-4813-bbec-47783dc60122.png)

This facility is oriented towards accuracy debugging. It does several things:

1. It will run your model twice and check for nondeterminism in inductor/float64, *even* on intermediates (our benchmarking nondeterminism test only checks the final output). This makes it easy to localize which operator is nondeterministic; a sketch of the core check follows below.
2. It will run your compiled model side-by-side with eager and float64 variants, and then report when things diverge too far, as measured by RMSE delta from float64.

Importantly, it does all this without requiring every intermediate to be held in memory (which will cause an OOM on large repros, such as the one I tested this on.)

Some other minor improvements:

* MinifierTestBase now has an easy-to-comment-out spot that you can use to retain the temporary directory; good for debugging
* We print "running minifier" and "running repro" in MinifierTestBase to make it easier to orient where logs are coming from
* `same` takes a `log_error` optional argument which you can use to reroute the error logs when things mismatch
* counters["inductor"]["intermediate_hooks"] tracks the number of intermediate hooks we've codegen'ed; good for populating the tqdm interface
* torch.fx.interpreter gets an official `boxed_run` interface which uses the boxed-arguments calling convention and doesn't retain inputs unnecessarily long (see the sketch after this list)
* torch.utils._content_store gets compute_tensor_metadata/read_tensor_metadata helper functions for computing tensor information without serializing it

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100226
Approved by: https://github.com/bertmaher, https://github.com/bdhirsh, https://github.com/anijain2305
2023-05-01 11:12:38 +00:00
Jason Ansel
884c5c86f1 Pass torch.compile mode/options to all backends (#99645)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99645
Approved by: https://github.com/anijain2305
2023-04-27 19:41:26 +00:00
Edward Z. Yang
67e0913de9 Add support for serializing real tensor data in after aot minifier (#99834)
The new minifier script looks like this:

```
import torch._dynamo.repro.after_aot
reader = torch._dynamo.repro.after_aot.InputReader(save_dir='/tmp/tmpcsngx39e')
buf0 = reader.storage('e2b39c716c0d4efb9fa57375a3902b9dab666893', 16)
t0 = reader.tensor(buf0, (4,))
args = [t0]
mod = make_fx(Repro(), tracing_mode='real')(*args)
```

The real tensor data is stored in the storages folder of the checkpoint dump directory. If you delete this folder or it is otherwise missing, we will transparently fall back to generating random data like before. The tensors are serialized using content store from #99809, which means each storage is content-addressed and we will automatically deduplicate equivalent data (which is useful if you keep dumping out, e.g., your parameters). We don't use the tensor serialization capability from content store; instead, all of the tensor metadata is stored inline inside the repro script (so that everything is in one file if you lose the checkpointed tensors).

We also add a stable_hash option to content store, where we use a slow SHA-1 sum on the data on the CPU side to compute a hash that is stable across systems with the same endianness.

Out of rage, I also added support for `dtype.itemsize` property access.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/99834
Approved by: https://github.com/voznesenskym
2023-04-27 11:52:13 +00:00
Animesh Jain
5f138a6b65 [minifier][after dynamo] clone inputs while retaining gradness (#100066)
Helps with minifying one failure in https://github.com/pytorch/pytorch/issues/98561

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100066
Approved by: https://github.com/ezyang
2023-04-26 21:31:18 +00:00
Aaron Gokaslan
e2a3817dfd [BE] Enable C419 rule for any all shortcircuiting (#99890)
Apparently https://github.com/pytorch/pytorch/pull/78142 made torch.jit allow simple generator expressions, which lets us enable a rule that replaces unnecessary list comprehensions with generators in any/all calls. This was originally part of #99280, but I split it off into this PR so that it can be easily reverted should anything break.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/99890
Approved by: https://github.com/justinchuby, https://github.com/kit1980, https://github.com/malfet
2023-04-25 15:02:13 +00:00
Edward Z. Yang
881c57230d Move more stuff to after_aot (#99557)
Not sure why this didn't work first time around. Second time's a charm.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/99557
Approved by: https://github.com/anijain2305
2023-04-21 16:20:40 +00:00
Edward Z. Yang
c17ff0ed36 Print AOT Autograd graph name when accuracy failed (#99366)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/99366
Approved by: https://github.com/albanD, https://github.com/bdhirsh
2023-04-20 15:35:47 +00:00
Edward Z. Yang
805a6dc8d2 Add an expect test for test_save_graph_repro (#99538)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/99538
Approved by: https://github.com/anijain2305
2023-04-20 00:00:40 +00:00
Edward Z. Yang
bc9eaa7abf Run post-aot compiler at compilation time, not at runtime. (#99457)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/99457
Approved by: https://github.com/anijain2305
2023-04-19 19:36:09 +00:00
Edward Z. Yang
b01edf45f8 Add typing to debug_utils and repro (#99452)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/99452
Approved by: https://github.com/anijain2305
2023-04-19 16:00:19 +00:00
Edward Z. Yang
2e25fb5d55 Refactor debug_utils into after_aot and after_dynamo modules (#99450)
There are no code changes but I did take the opportunity to
reorder and group the functions once they were placed in their
respective modules.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/99450
Approved by: https://github.com/anijain2305
2023-04-19 16:00:19 +00:00