pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 12:21:27 +01:00

Author	SHA1	Message	Date
Nikita Shulga	acd0873d3b	[CI] Fix `TestDynamoTimed.test_ir_count` for 3.12 (#154268 ) Python-3.12 emits the same bytecode as 3.13 for code in question Pull Request resolved: https://github.com/pytorch/pytorch/pull/154268 Approved by: https://github.com/clee2000, https://github.com/atalman ghstack dependencies: #154237	2025-05-23 20:08:19 +00:00
IvanKobzarev	4439255148	[aotd] Support saved tensors hooks in aot_autograd (#150032 ) https://github.com/pytorch/pytorch/issues/148222 Goal: At the moment, autograd saved tensors hooks are run in eager mode after the compiled forward. They are executed at the same time for all saved tensors. Hooks can be used to reduce the amount of memory used for saved tensors, by quantizing them or offloading them to CPU. Running them all at once is suboptimal for peak memory. A better solution is to put the hooks in the graph, as close as possible to the last usage of the tensor. The goal is therefore to get user-specified autograd saved tensors hooks into the graph. Logic: UX: If the user specifies hooks with torch.autograd.graph.saved_tensors_hooks(pack_gm, unpack_gm), where pack_gm and unpack_gm are torch.fx.GraphModule instances, then AotAutograd will retrace those graph modules, doing decompositions and functionalization in aot_autograd, and inline the resulting graphs in the forward epilogue and backward prologue. The user may want to use control logic in the hooks, for example applying quantization only for specific dtypes and sizes. This is also possible: the user can put the logic into a torch.fx.wrap function and use symbolic trace to make a GraphModule. In that case AotAutograd caching will work only when the user explicitly sets "user_cache_hash" metadata on the torch.fx.wrap call_function node. If this metadata is set, the aot_autograd cache can use the saved cache artifact. If the metadata is not set, the cache is bypassed. Dynamo: Dynamo traces the pack and unpack hooks, installs them as subgraphs, and explicitly adds them to the output_graph (as those subgraphs are not used, they would not be copied into the result by default). The complexity here is that at this moment we do not have example inputs for the hooks. We trace pack_hook with some Tensor from the inputs. The resulting subgraphs are added to the hashing of the AotAutograd cache. In AotAutograd we retrace the graph with the true saved tensors coming from the partitioner. 
Backwards Compatibility: As current hooks are executed in eager mode and not all of them will be traceable, we only try to put in the graph hooks explicitly marked by the user with an annotation (@_inlineable_saved_tensors_hooks). For other hooks, or if compiled autograd is enabled, we keep the same logic. Recompilations: Hooks are guarded with a lambda guard matching the function id, to cause recompilation if the user reruns the compiled function. Aot_autograd: After the partitioner has prepared the forward and backward modules, we trace the graphs prepared at Dynamo for the pack and unpack hooks and inline them in the epilogue of forward and the prologue of backward. Forward outputs and backward inputs are changed, transparently for the user. We do not try to put the hooks close to the last usage etc., relying on Inductor to do this optimization. ``` INFO: TRACED GRAPH ===== Forward graph pre saved_tensors_hooks inlining 3 ===== /data/users/ivankobzarev/a/pytorch/torch/fx/_lazy_graph_module.py class GraphModule(torch.nn.Module): def forward(self, primals_1: "Sym(s0)", primals_2: "Sym(s1)", primals_3: "f32[s0, s1][s1, 1]cuda:0"): # File: /data/users/ivankobzarev/a/pytorch/test/functorch/test_aotdispatch.py:6660 in simple_fn, code: x = x + 1 add: "f32[s0, s1][s1, 1]cuda:0" = torch.ops.aten.add.Tensor(primals_3, 1); primals_3 = None # File: /data/users/ivankobzarev/a/pytorch/test/functorch/test_aotdispatch.py:6661 in simple_fn, code: x = SAF.apply(x) view: "f32[s0, s1][s1, 1]cuda:0" = torch.ops.aten.view.default(add, [primals_1, primals_2]) return (view, add, primals_1, primals_2) INFO: TRACED GRAPH ===== Backward graph pre saved_tensors_hooks inlining 3 ===== /data/users/ivankobzarev/a/pytorch/torch/fx/_lazy_graph_module.py class GraphModule(torch.nn.Module): def forward(self, primals_1: "Sym(s0)", primals_2: "Sym(s1)", primals_3: "f32[s0, s1][s1, 1]cuda:0"): # File: /data/users/ivankobzarev/a/pytorch/test/functorch/test_aotdispatch.py:6660 in simple_fn, code: x = x + 1 add: "f32[s0, s1][s1, 1]cuda:0" = 
torch.ops.aten.add.Tensor(primals_3, 1); primals_3 = None # File: /data/users/ivankobzarev/a/pytorch/test/functorch/test_aotdispatch.py:6661 in simple_fn, code: x = SAF.apply(x) view: "f32[s0, s1][s1, 1]cuda:0" = torch.ops.aten.view.default(add, [primals_1, primals_2]) return (view, add, primals_1, primals_2) INFO: TRACED GRAPH ===== saved_tensors_pack_hook add 3 ===== /data/users/ivankobzarev/a/pytorch/torch/fx/_lazy_graph_module.py class pack_float8(torch.nn.Module): def forward(self, x_1: "f32[s0, s1][s1, 1]cuda:0"): # No stacktrace found for following nodes _to_copy: "f8e4m3fn[s0, s1][s1, 1]cuda:0" = torch.ops.aten._to_copy.default(x_1, dtype = torch.float8_e4m3fn); x_1 = None return (torch.float32, _to_copy) INFO: TRACED GRAPH ===== saved_tensors_unpack_hook add 3 ===== <eval_with_key>.22 from /data/users/ivankobzarev/a/pytorch/torch/fx/experimental/proxy_tensor.py:1225 in wrapped class pack_float8(torch.nn.Module): def forward(self, x_1: "f32[s0, s1][s1, 1]cuda:0"): # No stacktrace found for following nodes _to_copy: "f8e4m3fn[s0, s1][s1, 1]cuda:0" = torch.ops.aten._to_copy.default(x_1, dtype = torch.float8_e4m3fn); x_1 = None return (torch.float32, _to_copy) INFO: TRACED GRAPH ===== Forward graph 3 ===== /data/users/ivankobzarev/a/pytorch/torch/fx/_lazy_graph_module.py class GraphModule(torch.nn.Module): def forward(self, primals_1: "Sym(s0)", primals_2: "Sym(s1)", primals_3: "f32[s0, s1][s1, 1]cuda:0"): # File: /data/users/ivankobzarev/a/pytorch/test/functorch/test_aotdispatch.py:6660 in simple_fn, code: x = x + 1 add: "f32[s0, s1][s1, 1]cuda:0" = torch.ops.aten.add.Tensor(primals_3, 1); primals_3 = None # No stacktrace found for following nodes _to_copy: "f8e4m3fn[s0, s1][s1, 1]cuda:0" = torch.ops.aten._to_copy.default(add, dtype = torch.float8_e4m3fn) # File: /data/users/ivankobzarev/a/pytorch/test/functorch/test_aotdispatch.py:6661 in simple_fn, code: x = SAF.apply(x) view: "f32[s0, s1][s1, 1]cuda:0" = torch.ops.aten.view.default(add, [primals_1, 
primals_2]); add = None return (view, _to_copy, primals_1, primals_2) INFO: TRACED GRAPH ===== Backward graph 3 ===== <eval_with_key>.21 class GraphModule(torch.nn.Module): def forward(self, primals_1: "Sym(s0)", primals_2: "Sym(s1)", add_packed_2: "f8e4m3fn[s0, s1][s1, 1]cuda:0", tangents_1: "f32[s0, s1][s1, 1]cuda:0"): # No stacktrace found for following nodes _to_copy: "f32[s0, s1][s1, 1]cuda:0" = torch.ops.aten._to_copy.default(add_packed_2, dtype = torch.float32); add_packed_2 = None # File: /data/users/ivankobzarev/a/pytorch/test/functorch/test_aotdispatch.py:6661 in simple_fn, code: x = SAF.apply(x) add_7: "f32[s0, s1][s1, 1]cuda:0" = torch.ops.aten.add.Tensor(tangents_1, _to_copy); tangents_1 = _to_copy = None return (None, None, add_7) ``` Differential Revision: [D72187044](https://our.internmc.facebook.com/intern/diff/D72187044) Pull Request resolved: https://github.com/pytorch/pytorch/pull/150032 Approved by: https://github.com/bdhirsh	2025-05-22 14:09:38 +00:00
clr	a952f42bdb	dynamo: Log if we're using dynamic shapes via set_feature_usage (#153490 ) This makes it extremely clear if a specific model didn't use dynamic shapes and should have (except it had a bad config option). Pull Request resolved: https://github.com/pytorch/pytorch/pull/153490 Approved by: https://github.com/jansel	2025-05-16 23:59:00 +00:00
clr	85f97b5a8c	compile_fx: make a compile event that corresponds to the fx_compile waitcounter (#152983 ) This is a pretty minor change, but by having exact correspondence, we can easily confirm data differences between perfetto and wait counters Pull Request resolved: https://github.com/pytorch/pytorch/pull/152983 Approved by: https://github.com/jansel, https://github.com/masnesral	2025-05-14 01:54:42 +00:00
Sam Larsen	dde705864a	Fix test broken by D73809989 (#153413 ) Summary: I forgot to remove this unused field in D73809989. Test Plan: `buck test 'fbcode//mode/opt' fbcode//caffe2/test:fbonly -- --exact 'caffe2/test:fbonly - test_compilation_metrics_logger_in_sync (caffe2.test.fb.test_fb.TestFBOnly)'` Pull Request resolved: https://github.com/pytorch/pytorch/pull/153413 Approved by: https://github.com/c00w	2025-05-13 16:44:30 +00:00
Animesh Jain	7fdd754136	[compile-time traces] Profile large missing gaps in compile time (#151256 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/151256 Approved by: https://github.com/bdhirsh, https://github.com/masnesral, https://github.com/zou3519, https://github.com/jansel	2025-05-13 14:44:51 +00:00
Sam Larsen	e6e1ca1996	[easy] Fix test_dynamo_timed (#152387 ) Summary: I'm just trying to fix the test again. It's out of date because it's disabled and some dynamo_timed-related fields are gone now. Test Plan: `python test/dynamo/test_utils.py -k dynamo_timed` Pull Request resolved: https://github.com/pytorch/pytorch/pull/152387 Approved by: https://github.com/anijain2305	2025-04-29 19:22:56 +00:00
Animesh Jain	159e2f96e3	[dynamo][ci] Fix recently broken test (#151877 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/151877 Approved by: https://github.com/masnesral, https://github.com/jansel	2025-04-22 06:42:03 +00:00
Sam Larsen	80a3877b3d	[easy] Fix test_dynamo_timed (#151816 ) Summary: The structured logging counter is a global that might have been affected by earlier tests. Clear it explicitly. Fixes #148093 Test Plan: `pytest test/dynamo/test_utils.py` Pull Request resolved: https://github.com/pytorch/pytorch/pull/151816 Approved by: https://github.com/ppanchalia	2025-04-22 00:12:31 +00:00
Sam Larsen	f20a266512	[easy] Update test/dynamo/test_utils.py (#151599 ) Summary: test/dynamo/test_utils.py is out of date because of some new dynamo_timed fields. (I guess the test is disabled?). Bring it up to date Test Plan: `python test/dynamo/test_utils.py` Fixes #148093 Pull Request resolved: https://github.com/pytorch/pytorch/pull/151599 Approved by: https://github.com/Skylion007	2025-04-18 18:49:24 +00:00
Sam Larsen	585d03fa39	Record how many parameters we're parsing within dynamo (#148508 ) This allows us to track how many parameters we have in compilations. Pull Request resolved: https://github.com/pytorch/pytorch/pull/148508 Approved by: https://github.com/jansel, https://github.com/anijain2305 Co-authored-by: Sam Larsen <slarsen@meta.com>	2025-04-16 06:15:11 +00:00
Sam Larsen	2a1e2b88ed	[logging] Add pgo remote get/put timings to dynamo_compile (#150322 ) Test Plan: https://fburl.com/scuba/dynamo_compile/sandbox/xf950tw8 Pull Request resolved: https://github.com/pytorch/pytorch/pull/150322 Approved by: https://github.com/ppanchalia	2025-04-07 18:08:26 +00:00
Sam Larsen	90543e90a0	Fix broken dynamo_timed test due to python_version field (#149659 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/149659 Approved by: https://github.com/ppanchalia	2025-03-21 00:27:28 +00:00
Shunting Zhang	6c7d8419e3	fix two accuracy regression (#149172 ) There are 2 accuracy regressions in the 3/12 nightly perf run. I cannot repro them locally, so there is no effective way to bisect. Raise the tolerance to make them pass the accuracy check. - error log for HF MegatronBertForQuestionAnswering https://gist.github.com/shunting314/25322b66e15e98feed32e0d9a1e43316 - error log for TIMM gluon_inception_v3 https://gist.github.com/shunting314/df64ce22327df27a7057bbbd19ef5164 Pull Request resolved: https://github.com/pytorch/pytorch/pull/149172 Approved by: https://github.com/jansel, https://github.com/eellison	2025-03-17 19:34:00 +00:00
Sam Larsen	7cdbb913e7	[logging] Set compile_id in the CachingAutotuner during compilation so we have it for dynamo_timed logging (#148693 ) Summary: This is a simpler alternative to https://github.com/pytorch/pytorch/pull/146455, where we can stick the compileId (and forward/backward bool) in the CachingAutotuner so that we have it for logging `benchmark_all_configs`. Recall that the first attempt put the compileId in the inductor_meta and that interfered with caching. Test Plan: `python benchmarks/dynamo/torchbench.py --performance --training --amp --backend inductor --device cuda --print-compilation-time --repeat 5 --cold-start-latency --only nanogpt` * tlparse: https://fburl.com/e71yn6uc * dynamo_compile: https://fburl.com/scuba/dynamo_compile/sandbox/4ageghhv * pt2_compile_events: https://fburl.com/scuba/pt2_compile_events/4fgv1itq Pull Request resolved: https://github.com/pytorch/pytorch/pull/148693 Approved by: https://github.com/eellison	2025-03-13 03:50:58 +00:00
clr	2a7e997b3f	test/dynamo/test_utils: Fix one broken test on different python versions (#148987 ) We correctly handled different Python versions in the explicit ir_nodes test, but didn't handle them in the dynamo_timed test. Just explicitly delete the fields there so the dynamo_timed test passes on all Python versions. (I noticed it breaking on 3.13). Pull Request resolved: https://github.com/pytorch/pytorch/pull/148987 Approved by: https://github.com/jansel	2025-03-12 02:11:08 +00:00
clr	6b0fd741d1	dynamo: Count number of opcodes processes (#147149 ) This gives us a decent proxy for how big of a graph we functionally had to parse. Note that this is a cumulative counter. If people feel strongly, I can either write into the dynamo_timed datasets with metrics contexts, or clear the counters / write a counter per frame id as well. Pull Request resolved: https://github.com/pytorch/pytorch/pull/147149 Approved by: https://github.com/jansel	2025-03-10 19:20:09 +00:00
Sam Larsen	40c2505f16	[logging] Log individual Triton kernel compilation times to dynamo_compile (#147022 ) Summary: Gather the compilation time of individual triton kernels and log them to dynamo_compile: * Time compilation in `_worker_compile_triton` and pass back to the main process and logged from `get_result()`. * Added a way to track the "top N" (or N most-expensive compiles) in the metrics_context. I did this because I doubt we really care to capture potentially thousands of kernel compile times. That would be problematic for scuba logging anyway, so let's limit the number we track from the beginning. Arbitrarily chose 25 for now. * Format the list of compile times as a json string before logging. Test Plan: `python benchmarks/dynamo/torchbench.py --performance --training --amp --backend inductor --device cuda --print-compilation-time --repeat 5 --cold-start-latency --only nanogpt` Scuba: https://fburl.com/scuba/dynamo_compile/sandbox/nc4dzm3r Pull Request resolved: https://github.com/pytorch/pytorch/pull/147022 Approved by: https://github.com/jamesjwu	2025-03-03 19:32:17 +00:00
Raymond Li	c5bf9aaf1c	Log graph breaks (#146537 ) Graph breaks currently aren't logged to dynamo_compile and pt2_compile_events. We want to log them. Pull Request resolved: https://github.com/pytorch/pytorch/pull/146537 Approved by: https://github.com/c00w	2025-02-27 11:06:33 +00:00
Simon Fan	1d4adf4e1f	[dynamo] log recompile reason to dynamo_compile (#146117 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/146117 Approved by: https://github.com/bobrenjc93	2025-02-03 21:04:04 +00:00
Colin L. Rice	c1161957a4	inductor_config_logging: Don't drop keys (#144700 ) This bit me while I was trying to debug some trace issues. In general this config is already quite large when dumping, so adding more fields doesn't make it significantly worse. Also a number of the items we are type checking for (except the test configs), don't even show up. Primarily this will help us when debugging rocm, halide, and trace configs. Pull Request resolved: https://github.com/pytorch/pytorch/pull/144700 Approved by: https://github.com/ezyang	2025-01-27 23:47:25 +00:00
Animesh Jain	ef60de07a0	[dynamo] Log guard latency (#145132 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/145132 Approved by: https://github.com/ezyang ghstack dependencies: #145509	2025-01-25 03:01:18 +00:00
PyTorch MergeBot	6f60c65a3a	Revert "[dynamo] Log guard latency (#145132 )" This reverts commit `0a310d7388`. Reverted https://github.com/pytorch/pytorch/pull/145132 on behalf of https://github.com/anijain2305 due to CI failures observed after PR was merged ([comment](https://github.com/pytorch/pytorch/pull/145132#issuecomment-2611268421))	2025-01-24 00:11:50 +00:00
Animesh Jain	0a310d7388	[dynamo] Log guard latency (#145132 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/145132 Approved by: https://github.com/ezyang ghstack dependencies: #145351, #145420	2025-01-23 23:30:07 +00:00
Colin L. Rice	73278e6a5d	easy: sort dictionary keys for inductor config when publishing (#143307 ) This means we should get consistent logging strings for the same config on different ranks Pull Request resolved: https://github.com/pytorch/pytorch/pull/143307 Approved by: https://github.com/xmfan	2025-01-09 18:01:20 +00:00
Colin L. Rice	d79fbf6b6d	test/dynamo/test_utils: logging - Stop testing for impossible things. (#143535 ) We don't support assigning to objects or numeric constants at the top level in config modules, no need to test for them. (This specifically breaks later sorting refactoring, since it requires < to be implemented). Pull Request resolved: https://github.com/pytorch/pytorch/pull/143535 Approved by: https://github.com/ppanchalia	2024-12-20 17:21:49 +00:00
bobrenjc93	8850a7b62c	add some logging for tensorify (#143391 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/143391 Approved by: https://github.com/jamesjwu	2024-12-19 20:06:26 +00:00
qiurc	90cc43f270	Support garbage collection after pt2 compilation (#143364 ) Summary: Support garbage collection after pt2 compilation. Add jk to control the global rollout / rollback of this functionality. Add env var to control an individual job's rollout. Test Plan: Test the model training job with / without these changes. Reviewers: @yuxihu, @ezyang, @Yuzhen11 Pull Request resolved: https://github.com/pytorch/pytorch/pull/143364 Approved by: https://github.com/ezyang	2024-12-18 07:25:11 +00:00
Sam Larsen	60c54467db	[logging] Log runtime autotuning timing to scuba (#141919 ) See test plan in internal diff [D66679369](https://our.internmc.facebook.com/intern/diff/D66679369) Pull Request resolved: https://github.com/pytorch/pytorch/pull/141919 Approved by: https://github.com/jamesjwu, https://github.com/ezyang	2024-12-13 21:22:13 +00:00
Sam Larsen	30b61e521c	[logging] Populate compile_time_autotune_time_us (#143104 ) See testing in attached diff Differential Revision: [D67128210](https://our.internmc.facebook.com/intern/diff/D67128210) Pull Request resolved: https://github.com/pytorch/pytorch/pull/143104 Approved by: https://github.com/ezyang	2024-12-12 17:08:43 +00:00
Sam Larsen	692b5e75ed	[logging] Add triton_compile_time_us column to dynamo_compile (#142068 ) Test Plan: See internal diff [D66799565](https://www.internalfb.com/diff/D66799565) Differential Revision: [D66799565](https://our.internmc.facebook.com/intern/diff/D66799565) Pull Request resolved: https://github.com/pytorch/pytorch/pull/142068 Approved by: https://github.com/c00w	2024-12-06 16:11:57 +00:00
Bob Ren	9286c21b22	Fix fbcode tests for automatic dynamic unspecialize float (#141975 ) Differential Revision: D66708552 Pull Request resolved: https://github.com/pytorch/pytorch/pull/141975 Approved by: https://github.com/bdhirsh, https://github.com/atalman	2024-12-03 23:59:06 +00:00
Bob Ren	2f72635a5c	automatic dynamic unspecialize float (#141647 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/141647 Approved by: https://github.com/ezyang	2024-11-29 22:36:53 +00:00
PyTorch MergeBot	9e98b3d73c	Revert "automatic dynamic unspecialize float (#141647 )" This reverts commit `1a32daeb17`. Reverted https://github.com/pytorch/pytorch/pull/141647 on behalf of https://github.com/atalman due to functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_inner_grad [GH job link](https://github.com/pytorch/pytorch/actions/runs/12080983316/job/33697901875) [HUD commit link](`1a32daeb17`) ([comment](https://github.com/pytorch/pytorch/pull/141647#issuecomment-2507980876))	2024-11-29 15:00:33 +00:00
Bob Ren	1a32daeb17	automatic dynamic unspecialize float (#141647 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/141647 Approved by: https://github.com/ezyang	2024-11-29 07:53:53 +00:00
PyTorch MergeBot	ad37afd590	Revert "Always unspecialize float in OSS (#138922 )" This reverts commit `ba5253da9b`. Reverted https://github.com/pytorch/pytorch/pull/138922 on behalf of https://github.com/yf225 due to perf regression on torchbench ([comment](https://github.com/pytorch/pytorch/pull/138922#issuecomment-2499277511))	2024-11-26 00:03:03 +00:00
Sam Larsen	07906f2f2b	[logging] Move population of common MetricsContext fields to record_compilation_metrics (#141291 ) Summary: Fix outstanding TODOs related to logging of CompilationMetrics by moving the population of common fields to record_compilation_metrics() instead of populating those independently wherever we use the metrics_context contextmanager: * Keep track of start and end time in MetricsContext and pass those to record_compilation_metrics() and populate those fields in that function. * Pass exception info to record_compilation_metrics() and populate those fields in that function. * Add a new contextmanager, chromium_event_timed, to create the start/end "dynamo" event. This is important because I want this contextmanager to complete _after_ building the CompilationMetrics. * Populate the compile_id field centrally in record_compilation_metrics(). * Populate the structured_logging_overhead centrally in record_compilation_metrics(). * Add the CompilationMetrics to the current chromium event in record_compilation_metrics(), after all common fields have been added. In a future diff, I can also add _all_ compilation metrics to the chromium event. Test plan: Unit tests. Also see internal testing: * dynamo_compile: https://fburl.com/scuba/dynamo_compile/sandbox/jrascnf9 * pt2_compile_events: https://fburl.com/scuba/pt2_compile_events/l3jnla06 * tlparse: https://fburl.com/bq5a9nqs Pull Request resolved: https://github.com/pytorch/pytorch/pull/141291 Approved by: https://github.com/jamesjwu	2024-11-25 13:18:40 +00:00
Bob Ren	ba5253da9b	Always unspecialize float in OSS (#138922 ) Fixes https://github.com/pytorch/pytorch/issues/107277 Pull Request resolved: https://github.com/pytorch/pytorch/pull/138922 Approved by: https://github.com/ezyang Co-authored-by: Edward Z. Yang <ezyang@meta.com>	2024-11-24 01:58:13 +00:00
PyTorch MergeBot	a8c90e5140	Revert "Always unspecialize float in OSS (#138922 )" This reverts commit `6d779d0549`. Reverted https://github.com/pytorch/pytorch/pull/138922 on behalf of https://github.com/huydhn due to Sorry for reverting your change but there is some slow tests failing after this land ([comment](https://github.com/pytorch/pytorch/pull/138922#issuecomment-2495076878))	2024-11-22 23:18:36 +00:00
Jovian Anthony Jaison	45d62d6fc5	[dynamo] Added cuda and triton versions to dynamo_compile (#141290 ) Opening another PR since #141140 was reverted. Pull Request resolved: https://github.com/pytorch/pytorch/pull/141290 Approved by: https://github.com/masnesral	2024-11-22 20:04:42 +00:00
Bob Ren	6d779d0549	Always unspecialize float in OSS (#138922 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/138922 Approved by: https://github.com/ezyang Co-authored-by: Edward Z. Yang <ezyang@meta.com>	2024-11-22 17:54:42 +00:00
Colin L. Rice	f5d00f1456	pytorch/features: Make a feature logger and record triton bundling (#141056 ) This modifies metrics_context to allow us to store whether a feature was used or not. This also starts recording this for triton bundling. Pull Request resolved: https://github.com/pytorch/pytorch/pull/141056 Approved by: https://github.com/masnesral	2024-11-22 01:31:08 +00:00
Prajesh Praveen Anchalia	4e34fbdcbc	Add inductor_fx_graph_cache stats to dynamo_utils (#141190 ) Summary: Add the following inductor fx graph cache stats to dynamo compile - inductor_fx_cache_hit_count - inductor_fx_cache_miss_count - inductor_fx_cache_backend_type - inductor_fx_cache_hit_keys - inductor_fx_cache_miss_keys - remote_cache_version Test Plan: Run local tests and staging logger: P1683061460 Differential Revision: D66232206 Pull Request resolved: https://github.com/pytorch/pytorch/pull/141190 Approved by: https://github.com/masnesral	2024-11-21 20:59:10 +00:00
Ivan Zaitsev	149677e30c	Revert "[dynamo] Added cuda and triton versions to dynamo_compile" (#141280 ) Reverts pytorch/pytorch#141140 reason: conflicts with https://github.com/pytorch/pytorch/pull/141190 and wasn't merged using mergebot Pull Request resolved: https://github.com/pytorch/pytorch/pull/141280 Approved by: https://github.com/clee2000, https://github.com/kit1980	2024-11-21 20:50:06 +00:00
Jovian Anthony Jaison	11d0ba068f	[dynamo] Added cuda and triton versions to dynamo_compile (#141140 ) Summary: Add cuda and triton versions to dynamo_compile logging site. Test Plan: $ buck2 run mode/opt //scripts/oulgen:runner File changed: fbcode//caffe2/torch/_dynamo/convert_frame.py Buck UI: https://www.internalfb.com/buck2/1a8ada1f-d54e-44b2-a368-b2ff2030e113 Network: Up: 65KiB Down: 0B (reSessionID-8f4d1d6d-a680-4ecc-8e73-c29c932d824b) Jobs completed: 2166. Time elapsed: 7.0s. Cache hits: 0%. Commands: 3 (cached: 0, remote: 0, local: 3) BUILD SUCCEEDED ... Cuda: 12.4.0 Triton: 3.0.0 Reviewed By: masnesral Differential Revision: D66181508	2024-11-21 12:20:02 -08:00
Henry Tsang	4f2543c31d	[logs] Add dynamo_timed to get better compilation time breakdown for AOTI (#140198 ) Adding some dynamo_timed annotations to better understand AOTI compilation time. This will probably require a few more passes. A lot of time is spent in Scheduler.__init__, and there are not enough annotations there. run_command_and_check takes a lot of time as well, but there is probably not much we can do. Maybe we can add a config to tune the C++ optimization level? Traces: screenshot (2024-11-08 at 4:41:10 PM), https://github.com/user-attachments/assets/61645264-b3af-4d4a-804d-700b0f831c7c Differential Revision: D65554141 Pull Request resolved: https://github.com/pytorch/pytorch/pull/140198 Approved by: https://github.com/desertfire	2024-11-19 18:54:17 +00:00
Prajesh Praveen Anchalia	1e234e63b3	[pytorch][dynamo_compile] Log inductor config to dynamo_compile (#140790 ) Summary: Scrubbed inductor config logging to dynamo_compile as json:str. Scrub RE: `r'((^TYPE_CHECKING$)|(._progress$)|(.TESTING.)|(.(rocm|halide).)|(^trace\..)|(^_))'` to save some space. Test Plan: Staging logger: https://fburl.com/data/ltkt08zm P1679697917 {F1958428018} Differential Revision: D65806399 Pull Request resolved: https://github.com/pytorch/pytorch/pull/140790 Approved by: https://github.com/masnesral	2024-11-19 02:39:33 +00:00
Sam Larsen	e2e67a010a	[logging] Add dynamo_compile fields for pre-dispatch/joint/post-dispatch times (#140306 ) Tested internally: P1679622670 Differential Revision: [D65986059](https://our.internmc.facebook.com/intern/diff/D65986059) Pull Request resolved: https://github.com/pytorch/pytorch/pull/140306 Approved by: https://github.com/ezyang	2024-11-15 15:02:08 +00:00
Sam Larsen	b11ff3cf60	[logging] Overhaul dynamo_timed and CompilationMetrics logging. (#139849 ) Here's the overview: There's a new contextmanager singleton called MetricsContext. Entering the MetricsContext is how we demarcate the boundary on which we'll create a single CompilationMetrics object, and therefore, a single dynamo_compile log entry. While we're inside the MetricsContext, we can update/set many different metrics. Most importantly: `dynamo_timed` can also update the in-progress MetricsContext. In the proposal here, we tell `dynamo_timed` that we want it to do so by providing the name of the MetricsContext field to increment. There can be many `dynamo_timed` calls in different parts of the code updating different fields. Then when the MetricsContext exits, that's when the logging of everything gathered finally happens. One potential footgun is trying to use `dynamo_timed` when we haven't entered the MetricsContext, but we assert on that problem. Another problem is that we re-enter the context recursively, but we watch for that and do the logging only when the outermost exits. Some specifics: * Introduce MetricsContext - a context manager that on exit, records the CompilationMetrics (which also logs to dynamo_compile). * Completely remove the concept of frame_phase_timing. Instead, update the MetricsContext during compilation, either directly or via dynamo_timed. * Remove some globals we previously used to accumulate counters to later populate a CompilationMetrics. We use CompilationMetrics set/update/increment APIs instead. * `record_compilation_metrics` is now called on exit from MetricsContext. * Populate legacy CompilationMetrics fields right before logging, inside `record_compilation_metrics`. * Remove the one-off `add_remote_cache_time_saved` helper; capture that timing directly into the MetricsContext. And specifically, several changes to dynamo_timed: * "Modernize" the parameters and update all callsites accordingly. 
* Move the backwards logging of the CompilationMetrics to the backwards compile location. * Add a parameter for which CompilationMetrics field to update Pull Request resolved: https://github.com/pytorch/pytorch/pull/139849 Approved by: https://github.com/ezyang	2024-11-14 19:11:20 +00:00
PyTorch MergeBot	d63eb3c46c	Revert "[logging] Overhaul dynamo_timed and CompilationMetrics logging. (#139849 )" This reverts commit `cb15c15157`. Reverted https://github.com/pytorch/pytorch/pull/139849 on behalf of https://github.com/kit1980 due to Breaking an internal tests + there is a bug according to the author ([comment](https://github.com/pytorch/pytorch/pull/139849#issuecomment-2474459094))	2024-11-13 18:47:51 +00:00
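
Note on the #139849 entry (b11ff3cf60) above: the message describes a context manager that collects metrics while it is entered, asserts if a timer is used outside of it, tolerates recursive re-entry, and logs only when the outermost entry exits. The following is a minimal standard-library sketch of that pattern, not the PyTorch implementation. The names `MetricsContextSketch`, `timed`, `phase_a_us` and `graph_count` are hypothetical and chosen only for illustration.

```python
import time
from contextlib import contextmanager
from typing import Any, Callable, Dict, Iterator


class MetricsContextSketch:
    """Hypothetical, simplified stand-in for the context manager described above."""

    def __init__(self, on_exit: Callable[[Dict[str, Any]], None]) -> None:
        self._on_exit = on_exit          # called once with everything gathered
        self._depth = 0                  # how many times we are currently entered
        self._metrics: Dict[str, Any] = {}

    def __enter__(self) -> "MetricsContextSketch":
        if self._depth == 0:
            self._metrics = {}           # start a fresh record on outermost entry
        self._depth += 1
        return self

    def __exit__(self, exc_type, exc, tb) -> None:
        self._depth -= 1
        if self._depth == 0:             # log only when the outermost entry exits
            self._on_exit(dict(self._metrics))

    def in_progress(self) -> bool:
        return self._depth > 0

    def increment(self, field: str, value: int) -> None:
        assert self.in_progress(), "metric updated outside of the context"
        self._metrics[field] = self._metrics.get(field, 0) + value


@contextmanager
def timed(ctx: MetricsContextSketch, field: str) -> Iterator[None]:
    """Add the elapsed time (in microseconds) of the body to ctx[field]."""
    assert ctx.in_progress(), "timer used outside of the context"
    start = time.perf_counter_ns()
    try:
        yield
    finally:
        ctx.increment(field, (time.perf_counter_ns() - start) // 1000)


logged = []                              # stands in for the real log sink
ctx = MetricsContextSketch(logged.append)
with ctx:
    with timed(ctx, "phase_a_us"):
        pass
    with ctx:                            # recursive re-entry
        ctx.increment("graph_count", 1)
    logged_before_outer_exit = len(logged)
```

In this sketch the inner `with ctx:` block does not trigger logging; a single record containing both fields is handed to the sink when the outer block exits.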

58 Commits
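
Note on the #147022 entry (40c2505f16) above: the message describes tracking only the N most-expensive Triton kernel compiles (25 was chosen) and formatting the list as a JSON string before logging. One way to keep such a bounded "top N" is a min-heap, sketched below with the standard library only. This is an illustrative assumption about how it could be done, not the code from the PR; the class name `TopN` and the kernel names are hypothetical.

```python
import heapq
import json
from typing import List, Tuple


class TopN:
    """Keep only the `at_most` largest (key, value) pairs seen so far."""

    def __init__(self, at_most: int = 25) -> None:
        self.at_most = at_most
        # Min-heap ordered by value: the cheapest retained entry is at index 0.
        self._heap: List[Tuple[int, str]] = []

    def add(self, key: str, val: int) -> None:
        if len(self._heap) < self.at_most:
            heapq.heappush(self._heap, (val, key))
        else:
            # Push the new entry, then drop whichever entry is now the smallest.
            heapq.heappushpop(self._heap, (val, key))

    def items(self) -> List[Tuple[str, int]]:
        # Most expensive first.
        return [(key, val) for val, key in sorted(self._heap, reverse=True)]


tracker = TopN(at_most=2)
for name, micros in [("kernel_a", 300), ("kernel_b", 50), ("kernel_c", 900)]:
    tracker.add(name, micros)

# Serialize as a JSON string, as the commit message says is done before logging.
payload = json.dumps(tracker.items())
```

With `at_most=2`, the cheapest of the three entries (`kernel_b`) is discarded, so the memory used stays bounded no matter how many kernels are compiled.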