pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 00:21:07 +01:00

Author	SHA1	Message	Date
PyTorch MergeBot	5bcfdae71d	Revert "Make PT2 compile backprop through custom op without autograd key a hard error (#166367 )" This reverts commit `4acc66f119`. Reverted https://github.com/pytorch/pytorch/pull/166367 on behalf of https://github.com/atalman due to internal build failures ([comment](https://github.com/pytorch/pytorch/pull/166367#issuecomment-3473150269))	2025-10-31 13:44:05 +00:00
Edward Z. Yang	4acc66f119	Make PT2 compile backprop through custom op without autograd key a hard error (#166367 ) Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/166367 Approved by: https://github.com/bdhirsh	2025-10-30 18:43:07 +00:00
Menglu Yu	e95920e3e6	[Optimus] Rename the post_grad_graph tlparse log (#166109 ) Summary: ezyang observed a cache miss issue, see details in https://github.com/pytorch/pytorch/issues/166012 We thus rename the post_grad_graph tlparse log name to resolve the cache issue. Differential Revision: D85309891 Pull Request resolved: https://github.com/pytorch/pytorch/pull/166109 Approved by: https://github.com/jamesjwu	2025-10-28 00:23:01 +00:00
Scott Wolchok	331b7cc054	Fix double dispatch to Python for detach (#163671 ) This fixes #71725. Differential Revision: [D83857880](https://our.internmc.facebook.com/intern/diff/D83857880) Pull Request resolved: https://github.com/pytorch/pytorch/pull/163671 Approved by: https://github.com/ezyang, https://github.com/albanD	2025-10-15 17:24:50 +00:00
Aleksei Nikiforov	c733072874	Fix IValue from SymBool on big-endian system (#163647 ) Skip test_compiled_autograd_attribution on s390x It fails both on s390x and x86_64 at least under some circumstances. Disable it for now on s390x until it works reliably. Pull Request resolved: https://github.com/pytorch/pytorch/pull/163647 Approved by: https://github.com/malfet	2025-10-14 15:07:48 +00:00
PyTorch MergeBot	267348fe7f	Revert "Fix double dispatch to Python for detach (#163671 )" This reverts commit `a3e3efe474`. Reverted https://github.com/pytorch/pytorch/pull/163671 on behalf of https://github.com/seemethere due to We should've reverted this when we decided to revert https://github.com/pytorch/pytorch/pull/164691 since they were actually stacked ([comment](https://github.com/pytorch/pytorch/pull/163671#issuecomment-3400009953))	2025-10-14 03:55:36 +00:00
Scott Wolchok	a3e3efe474	Fix double dispatch to Python for detach (#163671 ) This fixes #71725. Differential Revision: [D83857880](https://our.internmc.facebook.com/intern/diff/D83857880) Pull Request resolved: https://github.com/pytorch/pytorch/pull/163671 Approved by: https://github.com/ezyang, https://github.com/albanD	2025-10-13 16:10:17 +00:00
Edward Yang	8627454c84	Add local file path to inductor_output_code trace metadata (#160920 ) ## Summary - include local file path in `inductor_output_code` structured trace metadata - adjust structured trace tests for new `file_path` field ## Testing - `python test/dynamo/test_structured_trace.py StructuredTraceTest.test_compile_id_serialization_deserialization` - `lintrunner -a torch/_inductor/codecache.py torch/_inductor/graph.py test/dynamo/test_structured_trace.py` (fails: MYPY failure) ------ https://chatgpt.com/codex/tasks/task_e_68a2b02b54ec8323ae820120605a9f1c Pull Request resolved: https://github.com/pytorch/pytorch/pull/160920 Approved by: https://github.com/oulgen	2025-09-18 18:39:46 +00:00
Shangdi Yu	92c2daebb6	Add inductor provenance tracking artifacts to cache (#161440 ) Summary: - Add inductor provenance tracking artifacts to cache - Update the tlparse version pin to `0.4.0`. The old tlparse version errors out on the new tlparse output. The lowest tlparse version that works is `0.3.42`. tlparse error: ``` thread 'main' panicked at src/parsers.rs:671:71: called `Result::unwrap()` on an `Err` value: Error("EOF while parsing a value", line: 1, column: 0) stack backtrace: 0: 0x55e4ff1c7f00 - <std::sys::backtrace::BacktraceLock::print::DisplayBacktrace as core::fmt::Display>::fmt::h6d42cc84fc840290 1: 0x55e4ff1ee503 - core::fmt::write::h5af61a909e3ec64d 2: 0x55e4ff1c4c33 - std::io::Write::write_fmt::h5a7b54aa6e4a315d 3: 0x55e4ff1c7d52 - std::sys::backtrace::BacktraceLock::print::h555579e7396c26ac 4: 0x55e4ff1c8caf - std::panicking::default_hook::{{closure}}::h9128866118196224 5: 0x55e4ff1c8b1a - std::panicking::default_hook::h52e9e7314e0255f6 6: 0x55e4ff1c9652 - std::panicking::rust_panic_with_hook::h541791bcc774ef34 7: 0x55e4ff1c93fa - std::panicking::begin_panic_handler::{{closure}}::h6479a2f0137c7d19 8: 0x55e4ff1c8419 - std::sys::backtrace::__rust_end_short_backtrace::ha04e7c0fc61ded91 9: 0x55e4ff1c908d - rust_begin_unwind 10: 0x55e4fef7a030 - core::panicking::panic_fmt::h5764ee7030b7a73d 11: 0x55e4fef7a406 - core::result::unwrap_failed::h3ff7104a9ace307a 12: 0x55e4fefb3c56 - <tlparse::parsers::ArtifactParser as tlparse::parsers::StructuredLogParser>::parse::h20bc51a17ffc494a 13: 0x55e4fef9669a - tlparse::run_parser::h20c7729f151eec62 14: 0x55e4fef99a1b - tlparse::parse_path::he4892147f47fbade 15: 0x55e4fef7c760 - tlparse::main::hdc05613b32f4f53b 16: 0x55e4fef89263 - std::sys::backtrace::__rust_begin_short_backtrace::h15f188f3edf42596 17: 0x55e4fef8827d - std::rt::lang_start::{{closure}}::he2c21e32a442538e 18: 0x55e4ff1be0f0 - std::rt::lang_start_internal::h15895544e2012228 19: 0x55e4fef83975 - main 20: 0x7f0b3662a610 - __libc_start_call_main 21: 
0x7f0b3662a6c0 - __libc_start_main_alias_2 22: 0x55e4fef7a610 - <unknown> 23: 0x0 - <unknown> ``` Test Plan: ``` buck run mode/dev-nosan fbcode//caffe2/test/inductor:provenance_tracing -- -r test_kernel_information_generation python test/dynamo/test_structured_trace.py -k test_chromium_event ``` Differential Revision: D80976585 Pull Request resolved: https://github.com/pytorch/pytorch/pull/161440 Approved by: https://github.com/oulgen	2025-08-28 01:16:02 +00:00
Sandeep Narendranath Karjala	ec585ceab4	[inductor] structured-log graph execution order + test (#160448 ) Summary: - Emit a structured trace per compiled graph execution to reconstruct execution order in TLParse. - Adds debug.log_graph_execution(name) called from `CompiledFxGraph.__call__`, producing an artifact named inductor_graph_execution with payload {"graph": "graph_<id>"}. Testing: - Add inline test to verify structure and output Pull Request resolved: https://github.com/pytorch/pytorch/pull/160448 Approved by: https://github.com/xmfan	2025-08-27 18:12:46 +00:00
PyTorch MergeBot	4a1aca11c2	Revert "[inductor] structured-log graph execution order + test (#160448 )" This reverts commit `995397d47a`. Reverted https://github.com/pytorch/pytorch/pull/160448 on behalf of https://github.com/atalman due to internal failure please see associated diff ([comment](https://github.com/pytorch/pytorch/pull/160448#issuecomment-3223939035))	2025-08-26 12:20:37 +00:00
Sandeep Narendranath Karjala	995397d47a	[inductor] structured-log graph execution order + test (#160448 ) Summary: - Emit a structured trace per compiled graph execution to reconstruct execution order in TLParse. - Adds debug.log_graph_execution(name) called from `CompiledFxGraph.__call__`, producing an artifact named inductor_graph_execution with payload {"graph": "graph_<id>"}. Testing: - Add inline test to verify structure and output Pull Request resolved: https://github.com/pytorch/pytorch/pull/160448 Approved by: https://github.com/xmfan	2025-08-25 20:12:18 +00:00
bobrenjc93	9a41570199	[rfc] add hint_override kwarg to mark_dynamic (#161007 ) The motivation for this change can be seen through the following example: ``` import torch GPU_TYPE = "cuda" @torch.compile def no_override(x): return x.sum(dim=0) @torch.compile def override(x): return x.sum(dim=0) x_small = torch.randn(4096, 512, device=GPU_TYPE) no_override(x_small) torch._dynamo.decorators.mark_dynamic(x_small, 0, hint_override=4096 * 1000) override(x_small) ``` Previously, when reductions were split, codegen relied only on the first observed shape. With a small input, this resulted in a small split size: ``` def triton_red_fused_sum_0(in_ptr0, out_ptr0, ks0, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): xnumel = 16384 rnumel = r0_numel ``` With the new scheme, inductor honors hint_override during codegen, producing larger and more appropriate split sizes: ``` def triton_red_fused_sum_0(in_ptr0, out_ptr0, ks0, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): xnumel = 1024000 rnumel = r0_numel ``` This addresses a broader problem with dynamism: performance and numerics previously depended on whichever shape was seen first. For example: ``` f(s0) -> f(s2) f(s1) -> f(s2) ``` could generate different kernels. With the new approach, an explicit override pins the chosen configuration: ``` f(s0, hint_override=s0) -> f(s2) f(s1, hint_override=s0) -> f(s2) ``` ensuring consistent kernel generation regardless of input order. Pull Request resolved: https://github.com/pytorch/pytorch/pull/161007 Approved by: https://github.com/jansel	2025-08-21 02:22:52 +00:00
PyTorch MergeBot	90ea9ccefe	Revert "[rfc] add hint_override kwarg to mark_dynamic (#161007 )" This reverts commit `0533ff2ccb`. Reverted https://github.com/pytorch/pytorch/pull/161007 on behalf of https://github.com/jeffdaily due to failing on both cuda and rocm ([comment](https://github.com/pytorch/pytorch/pull/161007#issuecomment-3206893756))	2025-08-20 15:31:33 +00:00
bobrenjc93	0533ff2ccb	[rfc] add hint_override kwarg to mark_dynamic (#161007 ) The motivation for this change can be seen through the following example: ``` import torch GPU_TYPE = "cuda" @torch.compile def no_override(x): return x.sum(dim=0) @torch.compile def override(x): return x.sum(dim=0) x_small = torch.randn(4096, 512, device=GPU_TYPE) no_override(x_small) torch._dynamo.decorators.mark_dynamic(x_small, 0, hint_override=4096 * 1000) override(x_small) ``` Previously, when reductions were split, codegen relied only on the first observed shape. With a small input, this resulted in a small split size: ``` def triton_per_fused_sum_1(in_ptr0, out_ptr0, xnumel, r0_numel, XBLOCK : tl.constexpr): xnumel = 512 r0_numel = 32 ``` With the new scheme, inductor honors hint_override during codegen, producing larger and more appropriate split sizes: ``` def triton_red_fused_sum_0(in_ptr0, out_ptr0, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): xnumel = 16384 r0_numel = 128 ``` This addresses a broader problem with dynamism: performance and numerics previously depended on whichever shape was seen first. For example: ``` f(s0) -> f(s2) f(s1) -> f(s2) ``` could generate different kernels. With the new approach, an explicit override pins the chosen configuration: ``` f(s0, hint_override=s0) -> f(s2) f(s1, hint_override=s0) -> f(s2) ``` ensuring consistent kernel generation regardless of input order. Pull Request resolved: https://github.com/pytorch/pytorch/pull/161007 Approved by: https://github.com/jansel	2025-08-20 07:51:09 +00:00
Colin Peppler	512fc768e9	Add tlparse artifact for joint graph passes (for inference & non-freezing only) (#160589 ) Summary: Joint graph passes run several FX passes which can modify the graph before it hits Inductor. There's three usages of joint graph passes: - for inference & not freezing (we add structured loggings only for this) - for inference & freezing - for fw/bw split Rollback Plan: Reviewed By: yushangdi Differential Revision: D80130321 Pull Request resolved: https://github.com/pytorch/pytorch/pull/160589 Approved by: https://github.com/yushangdi	2025-08-19 23:18:40 +00:00
Sandeep Narendranath Karjala	c699668009	[inductor] TLParse tensor metadata logging + test (#160132 ) Summary: - Add TLParse artifact logging per op with output tensor shape, stride, and dtype for cross-rank aggregation. Testing: - Add test to verify structure and contents of tlparse artifact Pull Request resolved: https://github.com/pytorch/pytorch/pull/160132 Approved by: https://github.com/xmfan	2025-08-17 04:27:49 +00:00
PyTorch MergeBot	26297c27e2	Revert "[inductor] TLParse tensor metadata logging + test (#160132 )" This reverts commit `2603e40be5`. Reverted https://github.com/pytorch/pytorch/pull/160132 on behalf of https://github.com/clee2000 due to broke lint [GH job link](https://github.com/pytorch/pytorch/actions/runs/17010600949/job/48226137423) [HUD commit link](`2603e40be5`). landrace with another PR that changed some had_cuda related things ([comment](https://github.com/pytorch/pytorch/pull/160132#issuecomment-3193969792))	2025-08-16 23:47:03 +00:00
Sandeep Narendranath Karjala	2603e40be5	[inductor] TLParse tensor metadata logging + test (#160132 ) Summary: - Add TLParse artifact logging per op with output tensor shape, stride, and dtype for cross-rank aggregation. Testing: - Add test to verify structure and contents of tlparse artifact Pull Request resolved: https://github.com/pytorch/pytorch/pull/160132 Approved by: https://github.com/xmfan ghstack dependencies: #160260	2025-08-16 16:37:18 +00:00
Sandeep Narendranath Karjala	fc80f6859e	Fix collective schedule logging and runtime tests (#160260 ) Summary: - Fix collective schedule logging so that only logs when collectives present - Fix runtime estimate test to check if each op has a number value Pull Request resolved: https://github.com/pytorch/pytorch/pull/160260 Approved by: https://github.com/Skylion007	2025-08-11 20:58:52 +00:00
ghostspiders	af10f1f86c	Fix requires_cuda to requires_cuda_and_triton (#160222 ) Fixes #159399 Pull Request resolved: https://github.com/pytorch/pytorch/pull/160222 Approved by: https://github.com/janeyx99	2025-08-10 07:05:52 +00:00
gaoyvfeng	50f23ff6f8	rename-HAS_CUDA-to-HAS_CUDA_AND_TRITON (#159883 ) Fixes #159399 "Modified torch.testing._internal.inductor_utils and test/inductor" Pull Request resolved: https://github.com/pytorch/pytorch/pull/159883 Approved by: https://github.com/janeyx99	2025-08-08 15:44:52 +00:00
Sandeep Narendranath Karjala	8034b2a732	[inductor] Add TLParse artifact for logging runtime of collective and compute ops (#159730 ) Summary: - debug.py: Added log_runtime_estimates() function to dump runtime estimation data as structured tlparse artifacts in JSON format - test_structured_trace.py: Added comprehensive test coverage with testing compute and collective ops Pull Request resolved: https://github.com/pytorch/pytorch/pull/159730 Approved by: https://github.com/yushangdi ghstack dependencies: #159190	2025-08-05 22:06:32 +00:00
Michael Lazos	9f8cfe7476	[Dynamo] Fix arg ordering in tf modes (#159707 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/159707 Approved by: https://github.com/zou3519	2025-08-05 01:43:21 +00:00
Sandeep Narendranath Karjala	85e74d5ace	[inductor] Add logging for distributed collective ops for multi‑rank diagnostics (#159190 ) This change introduces structured logging of the collective communication schedule, enabling downstream tools (e.g. TLParse) to ingest and analyze per‑rank collective‐order information for multi‑rank jobs. - Iterates over scheduler.nodes, filters for _CollectiveKernel nodes - Extracts each op’s python_kernel_name - Emits a structured JSON payload under the inductor_collective_schedule artifact name - Dumps the full schedule list to collective_schedule.json via the PyTorch trace‑structured artifact - Added comprehensive unit tests for collective schedule tracing: Created test_collective_schedule_empty() and test_collective_schedule_real() tests to verify structured trace logging works correctly for both empty collective schedules and real collective operations (like all_reduce and wait_tensor from _c10d_functional ops). Pull Request resolved: https://github.com/pytorch/pytorch/pull/159190 Approved by: https://github.com/yushangdi, https://github.com/xmfan	2025-08-01 21:51:42 +00:00
PyTorch MergeBot	490cb3f1a4	Revert "[inductor] Add logging for distributed collective ops for multi‑rank diagnostics (#159190 )" This reverts commit `bb62e1f769`. Reverted https://github.com/pytorch/pytorch/pull/159190 on behalf of https://github.com/clee2000 due to broke [GH job link](https://github.com/pytorch/pytorch/actions/runs/16658705097/job/47150840171) [HUD commit link](`bb62e1f769`) on mac ([comment](https://github.com/pytorch/pytorch/pull/159190#issuecomment-3141513921))	2025-07-31 22:22:13 +00:00
Sandeep Narendranath Karjala	bb62e1f769	[inductor] Add logging for distributed collective ops for multi‑rank diagnostics (#159190 ) This change introduces structured logging of the collective communication schedule, enabling downstream tools (e.g. TLParse) to ingest and analyze per‑rank collective‐order information for multi‑rank jobs. - Iterates over scheduler.nodes, filters for _CollectiveKernel nodes - Extracts each op’s python_kernel_name - Emits a structured JSON payload under the inductor_collective_schedule artifact name - Dumps the full schedule list to collective_schedule.json via the PyTorch trace‑structured artifact - Added comprehensive unit tests for collective schedule tracing: Created test_collective_schedule_empty() and test_collective_schedule_real() tests to verify structured trace logging works correctly for both empty collective schedules and real collective operations (like all_reduce and wait_tensor from _c10d_functional ops). Pull Request resolved: https://github.com/pytorch/pytorch/pull/159190 Approved by: https://github.com/yushangdi, https://github.com/xmfan	2025-07-31 19:58:07 +00:00
Shangdi Yu	82a1ee1135	Refactor Provenance Tracking (#158399 ) Summary: As inductor provenance tracking is getting more use cases, we want to separate the inductor provenance tracking guarding flag from the general `trace.enabled`, so we can enable provenance tracking without all the overhead of `trace.enabled` - change the guard flag from `trace.enabled` to `trace.provenance_tracking`. It is turned on by either `TORCH_COMPILE_DEBUG=1` or `INDUCTOR_PROVENANCE=1`. - Move the provenance tracking logic and variables out of DebugContext, because DebugContext is only enabled with `trace.enabled`. Since the variables are now global variables, added `reset_provenance_globals()` context manager to reset them for each `compile_fx()` call. - Move `set_kernel_post_grad_provenance_tracing` from `util.py` to `debug.py` so now all provenance related logic is in `debug.py`. In the future, if we want to enable it further, we can change the provenance tracking flag to be enabled when `TORCH_TRACE` is set. I think we should do that in a separate PR, so it's easier to revert if this flag change creates any problem. See more motivation in internal Diff Test Plan: ``` buck2 run mode/dev-nosan fbcode//caffe2/test:fx -- -r test_graph_transform_observer buck run mode/dev-nosan fbcode//caffe2/test:fx -- -r graph_provenance buck2 run mode/dev-nosan fbcode//caffe2/test/inductor:provenance_tracing ``` Differential Revision: D78287976 Pull Request resolved: https://github.com/pytorch/pytorch/pull/158399 Approved by: https://github.com/angelayi	2025-07-17 00:23:00 +00:00
Edward Z. Yang	7afb834f93	Inline dispatch_and_compile into its call site. (#158150 ) Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/158150 Approved by: https://github.com/jamesjwu, https://github.com/wconstab ghstack dependencies: #158149	2025-07-15 19:08:55 +00:00
Boyuan Feng	94995eba07	[Log] add a hook for recompile user context (#157961 ) Users may want to add compile-related but customized logging info to dynamo_compile. One example is logging the current training iteration index when recompilation happens. In general, the current training iteration index is not available to the compiler, since the same compiled function may be called multiple times in the same training iteration. The user can provide the training iteration index in a user hook, and torch.compile logs it when recompilation happens. Pull Request resolved: https://github.com/pytorch/pytorch/pull/157961 Approved by: https://github.com/masnesral	2025-07-11 03:41:33 +00:00
Menglu Yu	701e22112d	[PT2][Optimus][Observability] Refactor the logging to avoid excessive tlparse log (#153584 ) Summary: context: https://fb.workplace.com/groups/943185660584207/permalink/1215335930035844/ Test Plan: before: aps-aps-ig_v4_2t_2_make_baseline_30batch-735703723-f735706162 tlparse: https://manifold.edge.x2p.facebook.net/v0/read/tree/logs/aps-aps-ig_v4_2t_2_make_baseline_30batch-735703723-f735706162/attempt_0/version_0/rank_0/index.html?bucketName=tlparse_reports&apiKey=tlparse_reports-key&withPayload=1&timeoutMsec=10000&fbclid=IwZXh0bgNhZW0CMTEAAR575JfJZUtE7kQCqzIZVCYomv1q03JzuMFVok8qDA_FuGC8oZ6rhhb2EziSQA_aem_abITQJZQP45t51_r-J-cFw Differential Revision: D74776025 Pull Request resolved: https://github.com/pytorch/pytorch/pull/153584 Approved by: https://github.com/jamesjwu	2025-05-19 22:57:29 +00:00
Animesh Jain	ecd74c953f	[dynamo] Recursively realize the stack_values (#152853 ) Might also fix - https://github.com/pytorch/pytorch/issues/135696 Pull Request resolved: https://github.com/pytorch/pytorch/pull/152853 Approved by: https://github.com/Lucaskabela, https://github.com/mlazos, https://github.com/jansel	2025-05-07 02:36:44 +00:00
PyTorch MergeBot	a28dcdba2c	Revert "[aot][ca] save bw_module in AOTAutogradCache (#151860 )" This reverts commit `613bd46272`. Reverted https://github.com/pytorch/pytorch/pull/151860 on behalf of https://github.com/huydhn due to Chatting with @xmfan and decide to revert and reland this instead ([comment](https://github.com/pytorch/pytorch/pull/151860#issuecomment-2856709646))	2025-05-07 00:56:54 +00:00
James Wu	93d8f6ee32	[reland] Detailed triton kernel logging (#152694 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/152694 Approved by: https://github.com/Skylion007	2025-05-05 02:46:57 +00:00
Simon Fan	613bd46272	[aot][ca] save bw_module in AOTAutogradCache (#151860 ) Compiled Autograd retraces AOT's bw_module at backward runtime into a larger graph, and today this runs into an issue on warm cache runs because the bw_module is not restored. This PR adds it to the cache, by first stripping it bare from unserializable metadata. I also intentionally differentiate the cached and non-cached versions to avoid accidental attempts of AOT compilation with a restored bw_module (would probably crash). Note that since the cache entry may be used by runs that use compiled autograd and runs that do not, we need to cache both the lowered backward and the bw_module. Pull Request resolved: https://github.com/pytorch/pytorch/pull/151860 Approved by: https://github.com/jamesjwu ghstack dependencies: #149707	2025-05-01 21:59:43 +00:00
PyTorch MergeBot	835413baed	Revert "[Optimus][Observability] Improve tlparse logging (#151635 )" This reverts commit `06a3c3c8cd`. Reverted https://github.com/pytorch/pytorch/pull/151635 on behalf of https://github.com/clee2000 due to broke dynamo/test_structured_trace.py::StructuredTraceTest::test_ddp_graphs [GH job link](https://github.com/pytorch/pytorch/actions/runs/14600342064/job/40970324075) [HUD commit link](`06a3c3c8cd`), test did fail on PR but dr ci says it matches an existing failure, which it does, but also this PR breaks the test too ([comment](https://github.com/pytorch/pytorch/pull/151635#issuecomment-2822538113))	2025-04-22 21:39:23 +00:00
Menglu Yu	06a3c3c8cd	[Optimus][Observability] Improve tlparse logging (#151635 ) Summary: We improve tlparse logging for Optimus graph transformation to enable easier debugging Test Plan: ``` TORCH_TRACE=~/my_trace_log_dir CUDA_VISIBLE_DEVICES=5 buck2 run mode/opt //aps_models/ads/ecosystem/tooling/tools/efficient_module_suite/pyper_models:pyper_model_perf_benchmark -- --flow_id 720055919 --shrink_model --mfu_profile_module "impl.shared_arch.dense_sparse_interaction" --use_synthetic_data ``` Differential Revision: D73229681 Pull Request resolved: https://github.com/pytorch/pytorch/pull/151635 Approved by: https://github.com/Yuzhen11	2025-04-22 16:56:08 +00:00
Sam Larsen	bd77c3e054	[easy] Update test/dynamo/test_structured_trace.py (#151606 ) Summary: test/dynamo/test_structured_trace.py is out of date because of some new fields. (I guess the test is disabled?). Bring it up to date. Test Plan: `python test/dynamo/test_structured_trace.py` Fixes #149671 Pull Request resolved: https://github.com/pytorch/pytorch/pull/151606 Approved by: https://github.com/Skylion007 ghstack dependencies: #151599	2025-04-18 21:33:13 +00:00
Oguz Ulgen	ef64beb232	Include post grad gm and fx runnable in cache artifacts for tlparse (#151469 ) Fixed #151462 Pull Request resolved: https://github.com/pytorch/pytorch/pull/151469 Approved by: https://github.com/bdhirsh	2025-04-17 17:14:13 +00:00
bobrenjc93	f649ee73ce	Use source hashing to generate consistent symbolic ids (#149665 ) This PR was inspired by internal models that were cache missing due to PGO. At a high level the problem looks as follows Run 1, Invocation 1: We do static compile, save some example values in PGO/automatic dynamic Run 1, Invocation 2: We detect varying inputs, do dynamic compile, get a dynamic graph and save to PGO. Crucially what we save to PGO is actually a superset of what is actually dynamic. If we notice an input was varying, we mark it as dynamic in PGO even if later on that value gets specialized. When a value gets specialized, we actually remove the symbol from the graph. This results in an interesting conundrum where although we are producing the same isomorphic graph, PGO makes the second run cache miss. Let's see how.... Run 2, Invocation 1: We fetch the PGO, over-mark things as dynamic, get a fx graph, look it up in the cache and... whoops! cache miss! This is because of the aforementioned behavior where the PGO profile will cause us to over-allocate symbols. In practice this means we end up saving a graph in cache with symbols x:s1, y:s3 and on second attempt we cache miss with x:s1, y:s6 where symbols s3,s4,s5 were all optimistically marked dynamic by PGO and subsequently specialized. We solve this problem by hashing the source names. This ensures somewhat stable assignment. To prevent catastrophic symbol collisions, we use linear probing to ensure no collisions. Pull Request resolved: https://github.com/pytorch/pytorch/pull/149665 Approved by: https://github.com/Mingming-Ding, https://github.com/laithsakka	2025-03-28 05:36:32 +00:00
PyTorch MergeBot	af7719a2fa	Revert "Use source hashing to generate consistent symbolic ids (#149665 )" This reverts commit `1f92348dc6`. Reverted https://github.com/pytorch/pytorch/pull/149665 on behalf of https://github.com/malfet due to Broke trunk, see `6eb3c2e282/1` ([comment](https://github.com/pytorch/pytorch/pull/149665#issuecomment-2758578187))	2025-03-27 16:02:27 +00:00
bobrenjc93	1f92348dc6	Use source hashing to generate consistent symbolic ids (#149665 ) This PR was inspired by internal models that were cache missing due to PGO. At a high level the problem looks as follows Run 1, Invocation 1: We do static compile, save some example values in PGO/automatic dynamic Run 1, Invocation 2: We detect varying inputs, do dynamic compile, get a dynamic graph and save to PGO. Crucially what we save to PGO is actually a superset of what is actually dynamic. If we notice an input was varying, we mark it as dynamic in PGO even if later on that value gets specialized. When a value gets specialized, we actually remove the symbol from the graph. This results in an interesting conundrum where although we are producing the same isomorphic graph, PGO makes the second run cache miss. Let's see how.... Run 2, Invocation 1: We fetch the PGO, over-mark things as dynamic, get a fx graph, look it up in the cache and... whoops! cache miss! This is because of the aforementioned behavior where the PGO profile will cause us to over-allocate symbols. In practice this means we end up saving a graph in cache with symbols x:s1, y:s3 and on second attempt we cache miss with x:s1, y:s6 where symbols s3,s4,s5 were all optimistically marked dynamic by PGO and subsequently specialized. We solve this problem by hashing the source names. This ensures somewhat stable assignment. To prevent catastrophic symbol collisions, we use linear probing to ensure no collisions. Pull Request resolved: https://github.com/pytorch/pytorch/pull/149665 Approved by: https://github.com/Mingming-Ding, https://github.com/laithsakka	2025-03-27 03:39:27 +00:00
William Wen	6285a71aba	[dynamo] fix bug where non-recursive disable modifies the original function (#148896 ) Fixes https://github.com/pytorch/pytorch/issues/148787. We fix this by: - Wrapping the original function instead of directly modifying it - When we detect that the previous frame is the non-recursive disable wrapper, then skip tracing this frame (non-recursive disable wrapper will always be skipped, so that frame will be present in the traceback) Pull Request resolved: https://github.com/pytorch/pytorch/pull/148896 Approved by: https://github.com/jansel	2025-03-20 18:33:54 +00:00
Simon Fan	7c87ec1b50	[ca] always do initial trace with dynamic shapes (#148801 ) HUD: https://fburl.com/wzvx6tax no regressions (ignore the pass rate improvements, those come from #149030) <img width="864" alt="image" src="https://github.com/user-attachments/assets/d7598f98-b378-4abb-a0c7-e4311162f681" /> Pull Request resolved: https://github.com/pytorch/pytorch/pull/148801 Approved by: https://github.com/jansel ghstack dependencies: #148799, #149030	2025-03-13 17:30:29 +00:00
Sam Larsen	7cdbb913e7	[logging] Set compile_id in the CachingAutotuner during compilation so we have it for dynamo_timed logging (#148693 ) Summary: This is a simpler alternative to https://github.com/pytorch/pytorch/pull/146455, where we can stick the compileId (and forward/backward bool) in the CachingAutotuner so that we have it for logging `benchmark_all_configs`. Recall that the first attempt put the compileId in the inductor_meta and that interfered with caching. Test Plan: `python benchmarks/dynamo/torchbench.py --performance --training --amp --backend inductor --device cuda --print-compilation-time --repeat 5 --cold-start-latency --only nanogpt` * tlparse: https://fburl.com/e71yn6uc * dynamo_compile: https://fburl.com/scuba/dynamo_compile/sandbox/4ageghhv * pt2_compile_events: https://fburl.com/scuba/pt2_compile_events/4fgv1itq Pull Request resolved: https://github.com/pytorch/pytorch/pull/148693 Approved by: https://github.com/eellison	2025-03-13 03:50:58 +00:00
PyTorch MergeBot	b54cf1a281	Revert "[logging] Set compile_id in the CachingAutotuner during compilation so we have it for dynamo_timed logging (#148693 )" This reverts commit `73c8068cf8`. Reverted https://github.com/pytorch/pytorch/pull/148693 on behalf of https://github.com/ZainRizvi due to This is breaking lint on trunk. Please rebase these changes before merging them back in. [GH job link](https://github.com/pytorch/pytorch/actions/runs/13796723235/job/38590020554) [HUD commit link](`73c8068cf8`) ([comment](https://github.com/pytorch/pytorch/pull/148693#issuecomment-2715671875))	2025-03-11 20:50:23 +00:00
Sam Larsen	73c8068cf8	[logging] Set compile_id in the CachingAutotuner during compilation so we have it for dynamo_timed logging (#148693 ) Summary: This is a simpler alternative to https://github.com/pytorch/pytorch/pull/146455, where we can stick the compileId (and forward/backward bool) in the CachingAutotuner so that we have it for logging `benchmark_all_configs`. Recall that the first attempt put the compileId in the inductor_meta and that interfered with caching. Test Plan: `python benchmarks/dynamo/torchbench.py --performance --training --amp --backend inductor --device cuda --print-compilation-time --repeat 5 --cold-start-latency --only nanogpt` * tlparse: https://fburl.com/e71yn6uc * dynamo_compile: https://fburl.com/scuba/dynamo_compile/sandbox/4ageghhv * pt2_compile_events: https://fburl.com/scuba/pt2_compile_events/4fgv1itq Pull Request resolved: https://github.com/pytorch/pytorch/pull/148693 Approved by: https://github.com/eellison	2025-03-11 19:38:40 +00:00
Brian Hirsh	492f3fd5cf	replace usages of upload_graph in inductor with tlparse (v2) (#148720 ) Reland of https://github.com/pytorch/pytorch/pull/148703 Pull Request resolved: https://github.com/pytorch/pytorch/pull/148720 Approved by: https://github.com/mengluy0125	2025-03-10 22:47:58 +00:00
Sam Larsen	187d5c0eb1	[logging] Log cudagraphify timings to dynamo_timed (#143220 ) Summary: this adds some new dynamo_timed calls in cudagraph_trees, primarily with the aim to add cudagraph-related timing to scuba. Things to note: * Uses the changes in https://github.com/pytorch/pytorch/pull/141919 to log "runtime" entries * The logging for chromium/tlparse/scuba relies on us providing a compile_id since it's not available in the environment. A lot of the changes here are just passing around the compile_id * I believe the spirit of the scuba logging is to capture the overheads of `torch.compile`. Therefore, I'm not adding _every_ dynamo_timed to scuba. For example, "run_eager" is the first real execution of the inductor graph -- it's not cudagraph overhead, per se. Watch out for the two instances of `dynamo_compile_runtime_column_us="runtime_cudagraphify_time_us"`. Those are the spots I believe are _extra_ overhead we'd contribute to torch.compile. Test Plan: `python benchmarks/dynamo/torchbench.py --performance --training --amp --backend inductor --device cuda --print-compilation-time --repeat 5 --cold-start-latency --only dcgan`: * tlparse: https://fburl.com/21yrdn8h * scuba: https://fburl.com/scuba/dynamo_compile/sandbox/wt90wnjz `python benchmarks/dynamo/torchbench.py --performance --training --amp --backend inductor --device cuda --print-compilation-time --repeat 5 --cold-start-latency --only nanogpt` * tlparse: https://fburl.com/r9mp7uiv * scuba: https://fburl.com/scuba/dynamo_compile/sandbox/1nvx94re Pull Request resolved: https://github.com/pytorch/pytorch/pull/143220 Approved by: https://github.com/eellison	2025-03-07 23:07:13 +00:00
IvanKobzarev	7ae0e0b2ea	[aotd] Log torch._functorch.config in tlparse (#147883 ) Adding torch._functorch.config to tlparse for better debuggability. E.g. https://github.com/pytorch/pytorch/pull/147638 happened only with `torch._functorch.config.view_replay_for_aliased_outputs=False` which is True by default Pull Request resolved: https://github.com/pytorch/pytorch/pull/147883 Approved by: https://github.com/bdhirsh, https://github.com/jamesjwu	2025-02-27 11:22:45 +00:00
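
The #149665 entries above describe assigning symbol ids by hashing source names and using linear probing to avoid collisions. A minimal sketch of that idea follows. It is not PyTorch's implementation: the helper name `stable_symbol_id`, the SHA-256 hash, the `used` set, and the `table_size` parameter are all assumptions made for illustration.

```python
import hashlib


def stable_symbol_id(source_name, used, table_size=1 << 16):
    """Hypothetical helper: map a source name to a symbol index.

    The starting index depends only on the name, so the same source tends
    to get the same id across runs. Linear probing resolves collisions.
    """
    # Use a content hash rather than Python's built-in hash(), which is
    # randomized per process for strings.
    digest = hashlib.sha256(source_name.encode("utf-8")).digest()
    idx = int.from_bytes(digest[:8], "big") % table_size

    # Linear probing: step to the next slot until a free one is found.
    for _ in range(table_size):
        if idx not in used:
            used.add(idx)
            return idx
        idx = (idx + 1) % table_size
    raise RuntimeError("symbol table is full")


# Two independent "runs" assign the same id to the same source name.
run1, run2 = set(), set()
assert stable_symbol_id("L['x']", run1) == stable_symbol_id("L['x']", run2)
```

Because probing depends on which slots are already taken, ids are only "somewhat stable", as the commit message puts it: a collision can still shift an id when sources are registered in a different order.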


107 Commits