pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 12:21:27 +01:00

Author	SHA1	Message	Date
chilli	13681382d5	Add heuristic for when `evict_first` should be set (and some other minor things) (#108841)
Example of when the `evict_first` heuristic helps.
```
import torch
from torch._inductor.utils import do_bench

@torch.compile
def f(a, b):
    return (a * b).sum(dim=-1)

N = 512
inps = (torch.randn(N, N, N).permute(2, 1, 0), torch.randn(N, N, N).permute(1, 2, 0))
print(do_bench(lambda: f(*inps)))
```
This generates code like this: http://ix.io/4HFs
```
Original: 3.8 ms
This PR: 3.54 ms
Always `evict_first`: 5.4 ms
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108841 Approved by: https://github.com/lezcano, https://github.com/jansel	2023-10-01 17:06:12 +00:00
Jez Ng	fe452108fb	Enable typechecking for _inductor/debug.py (#109335) Pull Request resolved: https://github.com/pytorch/pytorch/pull/109335 Approved by: https://github.com/eellison ghstack dependencies: #109269, #109347	2023-09-18 18:12:23 +00:00
willfengg	8010f6bf48	[dynamo][inductor] Provide public API to get compiler options/configs (#105026)
issues resolved: https://github.com/pytorch/pytorch/issues/101832
context: get the torch.compile config for further usage. E.g., the training platform wants to know whether the model is compiled with cudagraph enabled and trigger further action
how it is implemented
* the core logic is backend.get_compiler_config() in torch/_dynamo/eval_frame.py
* for backend='inductor' / _TorchCompileInductorWrapper, we have an inductor-specific implementation in get_compiler_config in torch/_inductor/compile_fx.py and torch/__init__.py
how to use it: Below is an example.
```
model = DummyModule()
optimized_module = torch.compile(
    model, options={"triton.cudagraphs": True}
)
compiler_config = optimized_module.get_compiler_config()
if compiler_config["triton.cudagraphs"]:
    pass
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105026 Approved by: https://github.com/yanboliang, https://github.com/jansel	2023-07-18 06:12:06 +00:00
Edward Z. Yang	26108d5d2b	Add --check-str support to after_aot minifier (#104758) Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/104758 Approved by: https://github.com/janeyx99, https://github.com/voznesenskym	2023-07-08 20:20:55 +00:00
Edward Z. Yang	5b600dee19	Properly preserve --tracing-mode when isolated minify (#104101) Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/104101 Approved by: https://github.com/voznesenskym	2023-07-05 20:19:11 +00:00
Edward Z. Yang	fd40abb706	Minor bugfix for int inputs in minifier (#104100) Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/104100 Approved by: https://github.com/albanD	2023-06-23 16:17:12 +00:00
Edward Z. Yang	1506acebaf	Detect symbolic tracing_mode with free_symbols (#103515) Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/103515 Approved by: https://github.com/anijain2305	2023-06-13 17:57:16 +00:00
Edward Z. Yang	7112880cc1	Preserve leaf-ness and requires_grad-ness in minified repros (#102899) Also some minor refactoring Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/102899 Approved by: https://github.com/albanD	2023-06-05 19:56:00 +00:00
Animesh Jain	68e55bff62	[minifier] add missing import (#102521) Pull Request resolved: https://github.com/pytorch/pytorch/pull/102521 Approved by: https://github.com/jansel	2023-05-30 20:57:16 +00:00
Shunting Zhang	029c6a9934	[accuracy minifier] cast copied model rather than update the original model (#101901) This is the fix Ed found during the break of the summit :) I think I'd better split it out of https://github.com/pytorch/pytorch/pull/99773 so people don't need to patch that PR to run the repro.py Pull Request resolved: https://github.com/pytorch/pytorch/pull/101901 Approved by: https://github.com/ezyang	2023-05-20 00:50:32 +00:00
Edward Z. Yang	96487d0d1f	Refactor after_dynamo to have a CLI interface too. (#101220) Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/101220 Approved by: https://github.com/anijain2305	2023-05-14 19:03:16 +00:00
Edward Z. Yang	ee4cb4b1e7	Add --offload-to-disk support to minifier (#100546) When minifying extremely large repros, the minifier can run out of memory. This is because, for delta debugging, the minifier keeps a copy of every intermediate output in the network. This can easily put you over the memory limit for your GPU. To make matters worse, we cannot easily delta debug in such a situation, as delta debugging involves replacing intermediates with inputs, but doing so can cause an intermediate to become live longer than its actual extent in the original model (since inputs all have to be allocated up front). The strategy in this PR is to use `load_tensor` from the previous PR to offer a low memory mode for delta debugging. Instead of putting intermediates as inputs, we instead load them in the middle of the graph in question. If, through DCE, the load_tensor ends up floating to the top of the graph, we can input-ify it. We now no longer save all intermediates in memory, but instead save them to disk. I used this to successfully minify the repro that helped us solve https://github.com/pytorch/pytorch/pull/100332 The testing is not very good. I can try to add more robust testing but it will involve a more involved refactor to FX minifier. Let me know if that's what you want. Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/100546 Approved by: https://github.com/anijain2305, https://github.com/voznesenskym	2023-05-05 05:25:03 +00:00
Edward Z. Yang	c2556c034d	Improve minifier printing to be more chatty when it makes sense (#100486) Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/100486 Approved by: https://github.com/voznesenskym	2023-05-04 02:51:26 +00:00
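The low-memory strategy described in #100546 (save intermediates to disk, then load one in the middle of the graph instead of feeding it in as an input) can be sketched in plain Python. This is an illustration only, not the minifier's code: `DiskIntermediates`, `run_and_offload`, and `run_suffix` are hypothetical names, and pickled Python lists stand in for tensors.

```python
import os
import pickle
import tempfile


class DiskIntermediates:
    """Hypothetical store that keeps each intermediate on disk, not in memory."""

    def __init__(self, directory):
        self.directory = directory

    def _path(self, name):
        return os.path.join(self.directory, name + ".pkl")

    def save(self, name, value):
        with open(self._path(name), "wb") as f:
            pickle.dump(value, f)

    def load(self, name):
        with open(self._path(name), "rb") as f:
            return pickle.load(f)


def run_and_offload(stages, x, store):
    """Run every stage, writing each output to disk as soon as it is produced."""
    for name, fn in stages:
        x = fn(x)
        store.save(name, x)
    return x


def run_suffix(stages, start_after, store):
    """Re-run only the stages after `start_after`, loading its output from disk."""
    names = [name for name, _ in stages]
    x = store.load(start_after)
    for _, fn in stages[names.index(start_after) + 1:]:
        x = fn(x)
    return x


stages = [
    ("double", lambda v: [2 * e for e in v]),
    ("inc", lambda v: [e + 1 for e in v]),
    ("total", sum),
]

with tempfile.TemporaryDirectory() as tmp:
    store = DiskIntermediates(tmp)
    full = run_and_offload(stages, [1, 2, 3], store)
    # Candidate sub-graph for delta debugging: everything after "double".
    suffix = run_suffix(stages, "double", store)
```

Only the value needed by the candidate sub-graph is read back, so the cost of keeping every intermediate is paid in disk space rather than memory.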
Edward Z. Yang	c7e9f40653	Misc accuracy improvements on minifier (#100447)
The changes:
* Add config knob `same_two_models_use_fp64` for toggling whether or not to use fp64
* Add a test showing that RMSE is superior to atol/rtol
* Add `--strict-accuracy` options, which allows for testing against integral/boolean accuracy. Regular accuracy by default now ONLY. There's a test which exercises this, it's a little delicate but I had trouble thinking of a good test otherwise.
Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/100447 Approved by: https://github.com/voznesenskym	2023-05-04 02:51:26 +00:00
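To illustrate the RMSE point in #100447, here is a minimal pure-Python sketch of one way an elementwise atol/rtol check can reject a result that an RMSE comparison against a higher-precision reference accepts. The function names, the `multiplier`, and the `tol` term are assumptions chosen for illustration; they are not taken from the PR.

```python
import math


def rmse(reference, result):
    """Root-mean-square error between two equal-length sequences."""
    total = sum((r - x) ** 2 for r, x in zip(reference, result))
    return math.sqrt(total / len(reference))


def allclose(a, b, rtol=1e-5, atol=1e-8):
    """Elementwise |a - b| <= atol + rtol * |b| check."""
    return all(abs(x - y) <= atol + rtol * abs(y) for x, y in zip(a, b))


def same_by_rmse(fp64_ref, eager, compiled, multiplier=2.0, tol=1e-4):
    """Accept `compiled` if it is not much further from the reference than eager is."""
    ref_error = rmse(fp64_ref, eager)
    res_error = rmse(fp64_ref, compiled)
    return res_error <= multiplier * ref_error + tol / 10.0


fp64_ref = [1.0, 2.0, 3.0, 4.0]
eager = [1.001, 2.001, 2.999, 4.001]      # low-precision eager run
compiled = [0.999, 1.999, 3.001, 3.999]   # equally accurate, errors point the other way

elementwise_ok = allclose(eager, compiled, rtol=1e-4, atol=1e-4)
rmse_ok = same_by_rmse(fp64_ref, eager, compiled)
print(elementwise_ok, rmse_ok)
```

Here both runs are equally close to the reference, but because their errors have opposite signs the direct eager-versus-compiled comparison fails while the RMSE comparison passes.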
Edward Z. Yang	1bbca4fbc0	Relax after_aot restriction on no buffers, serialize small constants (#100472) Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/100472 Approved by: https://github.com/bdhirsh, https://github.com/voznesenskym	2023-05-03 03:10:22 +00:00
Edward Z. Yang	0a479d9b9c	Simplify minifier testing by incorporating fault injection in prod code (#100357) Previously, minifier testing injected faults by injecting extra code into the repro scripts, and then ensuring this code got propagated to all subsequent subprocess calls. This was not only quite complicated, but also induced a big slowdown on the minifier, because to inject the faults, you had to import torch._inductor, which would cause the compilation threads to immediately get initialized before you even got to do anything else in the repro script. This new approach fixes this problem by incorporating the fault injection into "prod" code. Essentially, for inductor fault injection we introduce some new config flags that let you "configure" Inductor to be buggy; for Dynamo fault injection we just permanently keep the buggy testing backends registered. This is MUCH simpler: we only have to propagate the buggy config (which is something we're already doing), and it saves the minifier scripts from having to immediately initialize inductor on entry. Also, I enable the test for Triton runtime errors, now that tl.assert_device is here. Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/100357 Approved by: https://github.com/voznesenskym	2023-05-02 11:44:06 +00:00
Edward Z. Yang	2d8deffc1e	Refactor repro/minifier into CLI; add analyze (#100226)
This is a two part PR; I can split it if you really want me to.
The first part is a refactor of the after aot repro/minifier scripts to come with a command line interface. I maintain exact BC with the previous interface (so, e.g., you still get a repro.py and a run_minifier.py that do the same thing as before), but each of these scripts also takes command line arguments now which you can use to customize what actually happens. Check `run_repro` for full documentation on the arguments.
The second part of this is an implementation of the `analyze` subcommand on the new CLI for any repro.
This facility is oriented towards accuracy debugging. It does several things:
1. It will run your model twice and check for nondeterminism in inductor/float64, even on intermediate inputs (our benchmarking nondeterminism test only checks for nondeterminism on the final output). This makes localizing which operator is nondeterministic easy.
2. It will run your compiled model side-by-side with eager and float64 variants, and then report when things diverge too far from RMSE delta from float64. Importantly, it does all this without requiring every intermediate to be held in memory (which will cause an OOM on large repros, such as the one I tested this on.)
Some other minor improvements:
* MinifierTestBase now has an easy to comment out spot that you can use to retain the temporary directory; good for debugging
* We print "running minifier" and "running repro" in MinifierTestBase to make it easier to orient where logs are coming from
* same takes a `log_error` optional argument which you can use to reroute the error logs when things mismatch
* counters["inductor"]["intermediate_hooks"] tracks the number of intermediate hooks we've codegen'ed; good for populating the tqdm interface
* torch.fx.interpreter gets an official `boxed_run` interface which uses the boxed arguments calling convention and doesn't retain inputs unnecessarily long
* torch.utils._content_store gets compute_tensor_metadata/read_tensor_metadata helper functions for computing tensor information without serializing it
Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/100226 Approved by: https://github.com/bertmaher, https://github.com/bdhirsh, https://github.com/anijain2305	2023-05-01 11:12:38 +00:00
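The nondeterminism check described for `analyze` in #100226 (run twice, compare intermediates, not only the final output) can be sketched as follows. This is a toy illustration on plain Python floats, not the tool's implementation; `iter_intermediates`, `first_divergence`, and the stage names are hypothetical.

```python
import random


def iter_intermediates(stages, x):
    """Lazily yield (stage name, output) for each stage, one at a time."""
    for name, fn in stages:
        x = fn(x)
        yield name, x


def first_divergence(run_a, run_b):
    """Return the name of the first stage whose outputs differ, or None."""
    for (name, a), (_, b) in zip(run_a, run_b):
        if a != b:
            return name
    return None


rng = random.Random(0)

noisy_stages = [
    ("scale", lambda v: v * 2.0),
    ("noisy_add", lambda v: v + rng.random()),  # deliberately nondeterministic
    ("square", lambda v: v * v),
]
clean_stages = [
    ("scale", lambda v: v * 2.0),
    ("square", lambda v: v * v),
]

culprit = first_divergence(
    iter_intermediates(noisy_stages, 1.5),
    iter_intermediates(noisy_stages, 1.5),
)
no_culprit = first_divergence(
    iter_intermediates(clean_stages, 1.5),
    iter_intermediates(clean_stages, 1.5),
)
print(culprit, no_culprit)
```

Because the two runs are consumed in lockstep through generators, only one pair of intermediates is alive at a time, and the report names the stage that introduced the difference instead of the final stage where it would otherwise first be noticed.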
Jason Ansel	884c5c86f1	Pass torch.compile mode/options to all backends (#99645) Pull Request resolved: https://github.com/pytorch/pytorch/pull/99645 Approved by: https://github.com/anijain2305	2023-04-27 19:41:26 +00:00
Edward Z. Yang	67e0913de9	Add support for serializing real tensor data in after aot minifier (#99834)
The new minifier script looks like this:
```
import torch._dynamo.repro.after_aot
reader = torch._dynamo.repro.after_aot.InputReader(save_dir='/tmp/tmpcsngx39e')
buf0 = reader.storage('e2b39c716c0d4efb9fa57375a3902b9dab666893', 16)
t0 = reader.tensor(buf0, (4,))
args = [t0]
mod = make_fx(Repro(), tracing_mode='real')(*args)
```
The real tensor data is stored in the storages folder of the checkpoint dump directory. If you delete this folder / it is otherwise missing, we will transparently fall back to generating random data like before. The tensors are serialized using content store from #99809, which means each storage is content-addressed and we will automatically deduplicate equivalent data (which is useful if you keep dumping out, e.g., your parameters.) We don't use the tensor serialization capability from content store, instead all of the tensor metadata is stored inline inside the repro script (so that everything is in one file if you lose the checkpointed tensors). We also add a stable_hash option to content store, where we use a slow SHA-1 sum on the data in CPU side to compute a hash that is stable across systems with the same endianness. Out of rage, I also added support for Dtype.itemsize property access.
Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/99834 Approved by: https://github.com/voznesenskym	2023-04-27 11:52:13 +00:00
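The content-addressing idea in #99834 (each storage is named by a hash of its bytes, so equal data is written once) can be sketched with the standard library. This is a minimal illustration under stated assumptions, not the `torch.utils._content_store` implementation: `write_storage` and `read_storage` are hypothetical helpers, and raw `bytes` stand in for tensor storages.

```python
import hashlib
import os
import tempfile


def write_storage(store_dir, data: bytes) -> str:
    """Write raw bytes under a name derived from their SHA-1; return the hash."""
    digest = hashlib.sha1(data).hexdigest()
    path = os.path.join(store_dir, digest)
    if not os.path.exists(path):  # identical content is only written once
        with open(path, "wb") as f:
            f.write(data)
    return digest


def read_storage(store_dir, digest: str) -> bytes:
    """Load the bytes previously stored under `digest`."""
    with open(os.path.join(store_dir, digest), "rb") as f:
        return f.read()


with tempfile.TemporaryDirectory() as store_dir:
    h1 = write_storage(store_dir, bytes(range(16)))
    h2 = write_storage(store_dir, bytes(range(16)))  # same content dumped again
    h3 = write_storage(store_dir, b"\x00" * 16)      # different content
    files_on_disk = sorted(os.listdir(store_dir))
    roundtrip = read_storage(store_dir, h1)
```

Dumping the same bytes twice yields the same hash and a single file, which is the deduplication behavior the commit message describes for repeatedly dumped parameters.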
Animesh Jain	5f138a6b65	[minifier][after dynamo] clone inputs while retaining gradness (#100066) Helps with minifying one failure in https://github.com/pytorch/pytorch/issues/98561 Pull Request resolved: https://github.com/pytorch/pytorch/pull/100066 Approved by: https://github.com/ezyang	2023-04-26 21:31:18 +00:00
Aaron Gokaslan	e2a3817dfd	[BE] Enable C419 rule for any all shortcircuiting (#99890) Apparently https://github.com/pytorch/pytorch/pull/78142 made torch.JIT allow for simple generator expressions which allows us to enable rules that replace unnecessary list comprehensions with generators in any/all. This was originally part of #99280 but I split it off into this PR so that it can be easily reverted should anything break. Pull Request resolved: https://github.com/pytorch/pytorch/pull/99890 Approved by: https://github.com/justinchuby, https://github.com/kit1980, https://github.com/malfet	2023-04-25 15:02:13 +00:00
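A short sketch of why replacing a list comprehension with a generator inside `any`/`all` matters, as the C419 change in #99890 does: the generator form lets `any` stop at the first match. The helper `is_negative` and the `calls` list are hypothetical, added only to count evaluations.

```python
calls = []


def is_negative(x):
    """Record each call so we can see how many elements were inspected."""
    calls.append(x)
    return x < 0


values = [3, -1, 4, 1, 5]

# List comprehension: every element is evaluated before any() runs.
calls.clear()
result_list = any([is_negative(v) for v in values])
calls_with_list = len(calls)

# Generator expression: any() stops at the first True.
calls.clear()
result_gen = any(is_negative(v) for v in values)
calls_with_gen = len(calls)

print(result_list, calls_with_list)  # True 5
print(result_gen, calls_with_gen)    # True 2
```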
Edward Z. Yang	881c57230d	Move more stuff to after_aot (#99557) Not sure why this didn't work first time around. Second time's a charm. Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/99557 Approved by: https://github.com/anijain2305	2023-04-21 16:20:40 +00:00
Edward Z. Yang	c17ff0ed36	Print AOT Autograd graph name when accuracy failed (#99366) Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/99366 Approved by: https://github.com/albanD, https://github.com/bdhirsh	2023-04-20 15:35:47 +00:00
Edward Z. Yang	805a6dc8d2	Add an expect test for test_save_graph_repro (#99538) Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/99538 Approved by: https://github.com/anijain2305	2023-04-20 00:00:40 +00:00
Edward Z. Yang	bc9eaa7abf	Run post-aot compiler at compilation time, not at runtime. (#99457) Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/99457 Approved by: https://github.com/anijain2305	2023-04-19 19:36:09 +00:00
Edward Z. Yang	b01edf45f8	Add typing to debug_utils and repro (#99452) Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/99452 Approved by: https://github.com/anijain2305	2023-04-19 16:00:19 +00:00
Edward Z. Yang	2e25fb5d55	Refactor debug_utils into after_aot and after_dynamo modules (#99450) There are no code changes but I did take the opportunity to reorder and group the functions once they were placed in their respective modules. Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/99450 Approved by: https://github.com/anijain2305	2023-04-19 16:00:19 +00:00

27 Commits