pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-06 12:20:52 +01:00

Author	SHA1	Message	Date
Shunting Zhang	4cc64d6234	[inductor] pre grad graph bisecting (#166344 ) A few things to note: 1. Customers like vllm use a custom backend (e.g. VllmBackend), split the graph, and call standalone_compile for each split. If we let the bisector override the backend, we won't bisect through the custom backend. `test_configs.bisect_keep_custom_backend_for_inductor` is used to keep the custom backend if we are bisecting for inductor. 2. pre_grad_graph bisecting and lowering bisecting so far do not compose well with each other, since an issue may simply be caught by whichever one we try first. `test_configs.bisect_pre_grad_graph` is used to enable the 'pre_grad_graph' bisecting. Pull Request resolved: https://github.com/pytorch/pytorch/pull/166344 Approved by: https://github.com/eellison	2025-11-01 09:22:21 +00:00
Xuehai Pan	e8fadba28c	[pytree] add `treespec_{leaf,tuple,dict}` functions for args_spec modification (#160843 ) The goal of this PR is to provide a standard way to create simple treespec instances and hide the implementation details of the `PyTreeSpec` class. Changes: 1. Add function `treespec_leaf()` to replace `LeafSpec()`. 2. Add function `treespec_tuple(...)` and `treespec_dict(...)` to create treespec for `tuple` / `dict` which is used for `args` / `*kwargs`. This avoids direct modification to `treespec` instances that rely on the implementation details of the `PyTreeSpec` class. 3. Change `len(spec.children_specs)` to `spec.num_children`. 4. Change `isinstance(spec, LeafSpec)` to `spec.is_leaf()`. ------ Pull Request resolved: https://github.com/pytorch/pytorch/pull/160843 Approved by: https://github.com/mlazos	2025-11-01 04:12:11 +00:00
PyTorch MergeBot	85b85f6c2c	Revert "[pytree] add `treespec_{leaf,tuple,dict}` functions for args_spec modification (#160843 )" This reverts commit `108bb224f7`. Reverted https://github.com/pytorch/pytorch/pull/160843 on behalf of https://github.com/atalman due to failing internal builds ([comment](https://github.com/pytorch/pytorch/pull/160843#issuecomment-3474354428))	2025-10-31 18:31:32 +00:00
Xuehai Pan	108bb224f7	[pytree] add `treespec_{leaf,tuple,dict}` functions for args_spec modification (#160843 ) The goal of this PR is to provide a standard way to create simple treespec instances and hide the implementation details of the `PyTreeSpec` class. Changes: 1. Add function `treespec_leaf()` to replace `LeafSpec()`. 2. Add function `treespec_tuple(...)` and `treespec_dict(...)` to create treespec for `tuple` / `dict` which is used for `args` / `*kwargs`. This avoids direct modification to `treespec` instances that rely on the implementation details of the `PyTreeSpec` class. 3. Change `len(spec.children_specs)` to `spec.num_children`. 4. Change `isinstance(spec, LeafSpec)` to `spec.is_leaf()`. ------ Pull Request resolved: https://github.com/pytorch/pytorch/pull/160843 Approved by: https://github.com/mlazos	2025-10-31 10:33:16 +00:00
Bin Bao	08b0a8f11a	[Inductor] Fix an inductor_provenance bug (#166432 ) Summary: Fix an inductor_provenance related error seen when running TORCH_COMPILE_DEBUG generated fx_graph_runnable.py. Pull Request resolved: https://github.com/pytorch/pytorch/pull/166432 Approved by: https://github.com/mlazos	2025-10-30 16:40:12 +00:00
PyTorch MergeBot	972030fe2e	Revert "[pytree] add `treespec_{leaf,tuple,dict}` functions for args_spec modification (#160843 )" This reverts commit `284716a691`. Reverted https://github.com/pytorch/pytorch/pull/160843 on behalf of https://github.com/atalman due to failing internal torchrec test ([comment](https://github.com/pytorch/pytorch/pull/160843#issuecomment-3464647878))	2025-10-29 22:46:48 +00:00
Xuehai Pan	284716a691	[pytree] add `treespec_{leaf,tuple,dict}` functions for args_spec modification (#160843 ) The goal of this PR is to provide a standard way to create simple treespec instances and hide the implementation details of the `PyTreeSpec` class. Changes: 1. Add function `treespec_leaf()` to replace `LeafSpec()`. 2. Add function `treespec_tuple(...)` and `treespec_dict(...)` to create treespec for `tuple` / `dict` which is used for `args` / `*kwargs`. This avoids direct modification to `treespec` instances that rely on the implementation details of the `PyTreeSpec` class. 3. Change `len(spec.children_specs)` to `spec.num_children`. 4. Change `isinstance(spec, LeafSpec)` to `spec.is_leaf()`. ------ Pull Request resolved: https://github.com/pytorch/pytorch/pull/160843 Approved by: https://github.com/mlazos	2025-10-29 09:16:24 +00:00
Menglu Yu	e95920e3e6	[Optimus] Rename the post_grad_graph tlparse log (#166109 ) Summary: ezyang observed a cache miss issue, see details in https://github.com/pytorch/pytorch/issues/166012 We thus rename the post_grad_graph tlparse log name to resolve the cache issue. Differential Revision: D85309891 Pull Request resolved: https://github.com/pytorch/pytorch/pull/166109 Approved by: https://github.com/jamesjwu	2025-10-28 00:23:01 +00:00
James Wu	e4c01011c2	Mark FlexAttentionBackward as cacheable (#165996 ) This probably should have been marked cacheable a long time ago, no reason that it isn't. Test Plan: New regional inductor tests for test_flex_attention now are serializable. Pull Request resolved: https://github.com/pytorch/pytorch/pull/165996 Approved by: https://github.com/oulgen, https://github.com/zou3519, https://github.com/drisspg	2025-10-26 14:39:17 +00:00
Maggie Moss	c7eee49525	Fix pyrefly ignores 1/n (#166239 ) First diff adjusting the syntax for pyrefly: ignore suppressions so they only hide one class of type error. Test: lintrunner pyrefly check Pull Request resolved: https://github.com/pytorch/pytorch/pull/166239 Approved by: https://github.com/oulgen	2025-10-26 00:44:10 +00:00
Maggie Moss	eb83c3ca23	Clean up unused Pyrefly suppressions (#166178 ) Cleaning up ignores that are no longer needed in the repo and adding select suppressions so the main branch is clean. test plan: `lintrunner -a` Pull Request resolved: https://github.com/pytorch/pytorch/pull/166178 Approved by: https://github.com/oulgen	2025-10-25 05:32:21 +00:00
Shunting Zhang	673060beae	[inductor] turn Inductor deterministic mode on with torch.use_deterministic_algorithms (#165950 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/165950 Approved by: https://github.com/v0i0, https://github.com/eellison	2025-10-23 02:48:42 +00:00
Colin L Reliability Rice	5f370f5c42	inductor_provenance: Correctly handle null provenance (#166019 ) Summary: If the provenance is null, we're getting crashes of the form ``` [trainers0]:E1021 10:51:31.990525 2752 PythonApi.h:87] Exception caught in GeneratedDynamoCompileLoggerConfig: <class 'dsi.logger.py3.GeneratedDynamoCompile.LogEntry.thrift_types.GeneratedDynamoCompileLogEntryThriftBase'>: error initializing Thrift struct field 'inductor_provenance_thrift_safe': Cannot create internal string data representation. Expected type <class 'str'>, got: <class 'NoneType'>. ``` Also fixed a type signature that wasn't being enforced. (It's still not enforced, but it's accurate). Test Plan: Added a new test which reproduces the logging issue Differential Revision: D85173596 Pull Request resolved: https://github.com/pytorch/pytorch/pull/166019 Approved by: https://github.com/ppanchalia, https://github.com/yushangdi	2025-10-22 18:21:57 +00:00
Yuanyuan Chen	f9953e0f61	Enable PLC0414 on ruff (#165828 ) This PR enables `PLC0414` that fixes redundant import aliases. Pull Request resolved: https://github.com/pytorch/pytorch/pull/165828 Approved by: https://github.com/albanD	2025-10-22 04:56:52 +00:00
Colin L Reliability Rice	98a488c9aa	Start recording inductor provenance (#162669 ) Summary: This stores information on where fx graphs come from, which makes it significantly easier to debug. One outstanding question: I only stored the kernel stack traces; do we also want the node mappings? Test Plan: I wrote an explicit logging test which makes a module, fx traces it, compiles it, and makes sure the logging information shows up. ``` clr@devvm17763 ~/fbsource/fbcode/caffe2/test/dynamo % buck2 test @//mode/opt fbcode//caffe2/test/dynamo:test_dynamo -- test_utils File changed: fbsource//xplat/caffe2/test/dynamo/test_utils.py File changed: fbcode//caffe2/test/dynamo/test_utils.py Buck UI: https://www.internalfb.com/buck2/528dea32-2416-4a62-a1ec-39f3c0efdd2e Test UI: https://www.internalfb.com/intern/testinfra/testrun/13229324015574003 Network: Up: 0B Down: 0B Executing actions. Remaining 0/2 Command: test. Time elapsed: 17.3s Tests finished: Pass 16. Fail 0. Fatal 0. Skip 0. Build failure 0 ``` Rollback Plan: Differential Revision: D82037582 Pull Request resolved: https://github.com/pytorch/pytorch/pull/162669 Approved by: https://github.com/yushangdi	2025-10-16 23:05:31 +00:00
Shunting Zhang	5171f14064	[inductor] verify determinism with inductor benchmark script (#164904 ) Verify the deterministic mode with torch.compile benchmark scripts. Here is what my testing script does (pasted at the end): - run a model in default mode, save its result - run the model again in default mode, but distort the benchmarking results. Compare it with the saved result. - Do the above again in deterministic mode. I tried to test a few models - BertForMaskedLM and GoogleFnet: I can repro the numeric change by distorting the benchmark result in the default mode. The non-determinism is gone in the deterministic mode - DistillGPT2: I cannot repro the numeric change by distorting the benchmarking result in the default mode. It does not surprise me much. Reduction order change does not always cause numeric change. ``` model=GoogleFnet export TORCHINDUCTOR_WRITE_ARE_DETERMINISTIC_ALGORITHMS_ENABLED=0 export TORCHINDUCTOR_FORCE_DISABLE_CACHES=1 # disable autotune cache export TORCHINDUCTOR_FX_GRAPH_REMOTE_CACHE=0 export TORCHINDUCTOR_FX_GRAPH_CACHE=0 export TORCHINDUCTOR_CACHE_DIR=/tmp/torchinductor_shunting/ export TORCHINDUCTOR_BENCHMARK_KERNEL=1 export TORCHINDUCTOR_UNIQUE_KERNEL_NAMES=1 export INDUCTOR_TEST_DISABLE_FRESH_CACHE=1 # Non deterministic mode # --float32 rather than --amp to make it easier to repro non-deterministic echo "Save results for non-deterministic mode" python benchmarks/dynamo/huggingface.py --backend inductor --float32 --accuracy --only $model --training --disable-cudagraphs --save-model-outputs-to=/tmp/saved-non-deterministic.pkl echo "Compare results with distorted benchmarking in non-deterministic mode" TORCHINDUCTOR_DISTORT_BENCHMARKING_RESULT=inverse python benchmarks/dynamo/huggingface.py --backend inductor --float32 --accuracy --only $model --training --disable-cudagraphs --compare-model-outputs-with=/tmp/saved-non-deterministic.pkl echo "Save results for deterministic mode" TORCHINDUCTOR_DETERMINISTIC=1 python benchmarks/dynamo/huggingface.py --backend inductor --float32 --accuracy --only $model --training --disable-cudagraphs --save-model-outputs-to=/tmp/saved-deterministic.pkl echo "Compare results with distorted benchmarking in deterministic mode" TORCHINDUCTOR_DETERMINISTIC=1 TORCHINDUCTOR_DISTORT_BENCHMARKING_RESULT=inverse python benchmarks/dynamo/huggingface.py --backend inductor --float32 --accuracy --only $model --training --disable-cudagraphs --compare-model-outputs-with=/tmp/saved-deterministic.pkl ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/164904 Approved by: https://github.com/jansel, https://github.com/v0i0	2025-10-12 00:03:42 +00:00
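The remark in #164904 that a change in reduction order does not always cause a numeric change can be seen with plain floating-point addition. The following is a minimal sketch in pure Python, not Inductor code; the helper name `sum_in_order` is hypothetical, and IEEE 754 double precision is assumed.

```python
from functools import reduce


def sum_in_order(values):
    """Left-to-right sum, standing in for one particular reduction order."""
    return reduce(lambda acc, v: acc + v, values)


# Same three numbers, two reduction orders, different rounding:
a = sum_in_order([0.1, 0.2, 0.3])  # (0.1 + 0.2) + 0.3
b = sum_in_order([0.3, 0.2, 0.1])  # (0.3 + 0.2) + 0.1
print(a == b)  # False: the two orders round differently

# Values that are exactly representable are unaffected by the order:
c = sum_in_order([1.0, 2.0, 3.0])
d = sum_in_order([3.0, 2.0, 1.0])
print(c == d)  # True
```

This is consistent with the observation in the message that some models show a numeric change when the benchmarking result is distorted while others do not.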
PyTorch MergeBot	d2cb183344	Revert "[inductor] verify determinism with inductor benchmark script (#164904 )" This reverts commit `a3c700656f`. Reverted https://github.com/pytorch/pytorch/pull/164904 on behalf of https://github.com/huydhn due to Sorry for reverting your PR but there seems to be some failed vLLM failures coming out of this ([comment](https://github.com/pytorch/pytorch/pull/164904#issuecomment-3388443678))	2025-10-10 06:23:07 +00:00
Shunting Zhang	a3c700656f	[inductor] verify determinism with inductor benchmark script (#164904 ) Verify the deterministic mode with torch.compile benchmark scripts. Here is what my testing script does (pasted at the end): - run a model in default mode, save its result - run the model again in default mode, but distort the benchmarking results. Compare it with the saved result. - Do the above again in deterministic mode. I tried to test a few models - BertForMaskedLM and GoogleFnet: I can repro the numeric change by distorting the benchmark result in the default mode. The non-determinism is gone in the deterministic mode - DistillGPT2: I cannot repro the numeric change by distorting the benchmarking result in the default mode. It does not surprise me much. Reduction order change does not always cause numeric change. ``` model=GoogleFnet export TORCHINDUCTOR_WRITE_ARE_DETERMINISTIC_ALGORITHMS_ENABLED=0 export TORCHINDUCTOR_FORCE_DISABLE_CACHES=1 # disable autotune cache export TORCHINDUCTOR_FX_GRAPH_REMOTE_CACHE=0 export TORCHINDUCTOR_FX_GRAPH_CACHE=0 export TORCHINDUCTOR_CACHE_DIR=/tmp/torchinductor_shunting/ export TORCHINDUCTOR_BENCHMARK_KERNEL=1 export TORCHINDUCTOR_UNIQUE_KERNEL_NAMES=1 export INDUCTOR_TEST_DISABLE_FRESH_CACHE=1 # Non deterministic mode # --float32 rather than --amp to make it easier to repro non-deterministic echo "Save results for non-deterministic mode" python benchmarks/dynamo/huggingface.py --backend inductor --float32 --accuracy --only $model --training --disable-cudagraphs --save-model-outputs-to=/tmp/saved-non-deterministic.pkl echo "Compare results with distorted benchmarking in non-deterministic mode" TORCHINDUCTOR_DISTORT_BENCHMARKING_RESULT=inverse python benchmarks/dynamo/huggingface.py --backend inductor --float32 --accuracy --only $model --training --disable-cudagraphs --compare-model-outputs-with=/tmp/saved-non-deterministic.pkl echo "Save results for deterministic mode" TORCHINDUCTOR_DETERMINISTIC=1 python benchmarks/dynamo/huggingface.py --backend inductor --float32 --accuracy --only $model --training --disable-cudagraphs --save-model-outputs-to=/tmp/saved-deterministic.pkl echo "Compare results with distorted benchmarking in deterministic mode" TORCHINDUCTOR_DETERMINISTIC=1 TORCHINDUCTOR_DISTORT_BENCHMARKING_RESULT=inverse python benchmarks/dynamo/huggingface.py --backend inductor --float32 --accuracy --only $model --training --disable-cudagraphs --compare-model-outputs-with=/tmp/saved-deterministic.pkl ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/164904 Approved by: https://github.com/jansel, https://github.com/v0i0 ghstack dependencies: #164801, #164532	2025-10-10 00:00:58 +00:00
Maggie Moss	9944cac6e6	Add suppressions to torch/_inductor (#165062 ) Adds suppressions so that pyrefly will typecheck clean: https://github.com/pytorch/pytorch/issues/163283 Split this directory into two PRs to keep them from being too large. Test plan: dmypy restart && python3 scripts/lintrunner.py -a pyrefly check step 1: delete lines in the pyrefly.toml file from the project-excludes field step 2: run pyrefly check step 3: add suppressions, clean up unused suppressions before: https://gist.github.com/maggiemoss/4b3bf2037014e116bc00706a16aef199 after: INFO 0 errors (6,884 ignored) Pull Request resolved: https://github.com/pytorch/pytorch/pull/165062 Approved by: https://github.com/oulgen, https://github.com/mlazos	2025-10-09 20:34:20 +00:00
Avik Chaudhuri	93e833de0f	[inductor] separate preamble from main work in compile_fx (#164169 ) A couple minor things to clean up the structure of `compile_fx` before we hit pre grad passes: 1. After patching config and recursively calling `compile_fx`, we don't need the patches any more. We make the subsequent logic call a `_maybe_wrap_and_compile_fx_main` (both when cpp wrapper exists and doesn't). 2. There's some recursive wrapping that happens on inputs and outputs before hitting pre grad passes, which are now also separated out before calling a `_compile_fx_main`, where actual work finally happens. These also happen to fix a couple of TODOs in the old code. Differential Revision: D83500704 Pull Request resolved: https://github.com/pytorch/pytorch/pull/164169 Approved by: https://github.com/zhxchen17	2025-10-02 05:44:31 +00:00
Nikita Shulga	f9fa138a39	[BE] Delete all pre py-3.10 checks (#163653 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/163653 Approved by: https://github.com/jansel ghstack dependencies: #163648, #163649	2025-09-23 23:22:53 +00:00
Chang Pan	e0cbab46ad	[Inductor] avoid CUDA__equal when constant tensors are from different device (#163529 ) Summary: otherwise, may hit ``` Exception: Expected all tensors to be on the same device, but got other is on cuda:0, different from other tensors on cpu (when checking argument in method wrapper_CUDA__equal) ``` Test Plan: UTs Reviewed By: yushangdi Differential Revision: D82974062 Pull Request resolved: https://github.com/pytorch/pytorch/pull/163529 Approved by: https://github.com/yushangdi, https://github.com/Skylion007	2025-09-22 22:04:11 +00:00
Laith Sakka	04ddea44fd	Fix: ShapeEnv not propagated properly to inductor SizeVars (#162927 ) Summary: I am really skeptical about inductor SizeVars creating an empty shape env when not provided with one. I think we should fail there if the graph has dynamic shapes and no shape env is provided. However, I wonder if there are actually use cases that depend on the shape env not being there. Reasoning APIs depend on facts in the shape env and assume some stuff exists for specific symbols. Test Plan: Fix the bug reported in https://www.internalfb.com/diff/D82337184 (creating a simple e2e unit test is not trivial). Rollback Plan: Differential Revision: D82412384 Pull Request resolved: https://github.com/pytorch/pytorch/pull/162927 Approved by: https://github.com/ezyang, https://github.com/eellison, https://github.com/jansel	2025-09-18 00:56:22 +00:00
Xuan Zhang	4d4abec80f	allow user to pass in custom partitioner function (#157580 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/157580 Approved by: https://github.com/bdhirsh	2025-09-05 22:49:39 +00:00
zhxchen17	eb78757708	[inductor] Lift fw_compiler and bw_compiler as toplevel functions. (#161762 ) This is a no-op refactor to compile_fx which lifts the logic of fw_compiler and bw_compiler to toplevel, so that they can be reused in a different stack (e.g. precompile). Differential Revision: [D81292968](https://our.internmc.facebook.com/intern/diff/D81292968/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/161762 Approved by: https://github.com/angelayi, https://github.com/yushangdi	2025-08-29 21:46:55 +00:00
Shangdi Yu	92c2daebb6	Add inductor provenance tracking artifacts to cache (#161440 ) Summary: - Add inductor provenance tracking artifacts to cache - Update the tlparse version pin to `0.4.0`. The old tlparse version errors out on the new tlparse output. The lowest tlparse version that works is `0.3.42`. tlparse error: ``` thread 'main' panicked at src/parsers.rs:671:71: called `Result::unwrap()` on an `Err` value: Error("EOF while parsing a value", line: 1, column: 0) stack backtrace: 0: 0x55e4ff1c7f00 - <std::sys::backtrace::BacktraceLock::print::DisplayBacktrace as core::fmt::Display>::fmt::h6d42cc84fc840290 1: 0x55e4ff1ee503 - core::fmt::write::h5af61a909e3ec64d 2: 0x55e4ff1c4c33 - std::io::Write::write_fmt::h5a7b54aa6e4a315d 3: 0x55e4ff1c7d52 - std::sys::backtrace::BacktraceLock::print::h555579e7396c26ac 4: 0x55e4ff1c8caf - std::panicking::default_hook::{{closure}}::h9128866118196224 5: 0x55e4ff1c8b1a - std::panicking::default_hook::h52e9e7314e0255f6 6: 0x55e4ff1c9652 - std::panicking::rust_panic_with_hook::h541791bcc774ef34 7: 0x55e4ff1c93fa - std::panicking::begin_panic_handler::{{closure}}::h6479a2f0137c7d19 8: 0x55e4ff1c8419 - std::sys::backtrace::__rust_end_short_backtrace::ha04e7c0fc61ded91 9: 0x55e4ff1c908d - rust_begin_unwind 10: 0x55e4fef7a030 - core::panicking::panic_fmt::h5764ee7030b7a73d 11: 0x55e4fef7a406 - core::result::unwrap_failed::h3ff7104a9ace307a 12: 0x55e4fefb3c56 - <tlparse::parsers::ArtifactParser as tlparse::parsers::StructuredLogParser>::parse::h20bc51a17ffc494a 13: 0x55e4fef9669a - tlparse::run_parser::h20c7729f151eec62 14: 0x55e4fef99a1b - tlparse::parse_path::he4892147f47fbade 15: 0x55e4fef7c760 - tlparse::main::hdc05613b32f4f53b 16: 0x55e4fef89263 - std::sys::backtrace::__rust_begin_short_backtrace::h15f188f3edf42596 17: 0x55e4fef8827d - std::rt::lang_start::{{closure}}::he2c21e32a442538e 18: 0x55e4ff1be0f0 - std::rt::lang_start_internal::h15895544e2012228 19: 0x55e4fef83975 - main 20: 0x7f0b3662a610 - __libc_start_call_main 21: 0x7f0b3662a6c0 - __libc_start_main_alias_2 22: 0x55e4fef7a610 - <unknown> 23: 0x0 - <unknown> ``` Test Plan: ``` buck run mode/dev-nosan fbcode//caffe2/test/inductor:provenance_tracing -- -r test_kernel_information_generation python test/dynamo/test_structured_trace.py -k test_chromium_event ``` Differential Revision: D80976585 Pull Request resolved: https://github.com/pytorch/pytorch/pull/161440 Approved by: https://github.com/oulgen	2025-08-28 01:16:02 +00:00
Sandeep Narendranath Karjala	ec585ceab4	[inductor] structured-log graph execution order + test (#160448 ) Summary: - Emit a structured trace per compiled graph execution to reconstruct execution order in TLParse. - Adds debug.log_graph_execution(name) called from `CompiledFxGraph.__call__`, producing an artifact named inductor_graph_execution with payload {"graph": "graph_<id>"}. Testing: - Add inline test to verify structure and output Pull Request resolved: https://github.com/pytorch/pytorch/pull/160448 Approved by: https://github.com/xmfan	2025-08-27 18:12:46 +00:00
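The payload described in #160448 is small enough to sketch. The code below is an illustrative stand-in, not the PyTorch implementation: `make_graph_execution_record` and `reconstruct_order` are hypothetical helpers, and only the artifact name `inductor_graph_execution` and the payload shape `{"graph": "graph_<id>"}` come from the commit message.

```python
import json

ARTIFACT_NAME = "inductor_graph_execution"  # name taken from the commit message


def make_graph_execution_record(graph_id: int) -> dict:
    """Build one record per compiled-graph call (hypothetical helper)."""
    payload = json.dumps({"graph": f"graph_{graph_id}"})
    return {"name": ARTIFACT_NAME, "payload": payload}


def reconstruct_order(records: list) -> list:
    """Recover the execution order from records emitted in call order."""
    return [
        json.loads(r["payload"])["graph"]
        for r in records
        if r["name"] == ARTIFACT_NAME
    ]


# Pretend graphs 0, 2, 1, 0 were executed in that order.
records = [make_graph_execution_record(i) for i in (0, 2, 1, 0)]
order = reconstruct_order(records)
print(order)
```

Because one record is emitted per call, a graph that runs twice appears twice, which is what lets a downstream tool rebuild the full execution sequence.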
PyTorch MergeBot	4a1aca11c2	Revert "[inductor] structured-log graph execution order + test (#160448 )" This reverts commit `995397d47a`. Reverted https://github.com/pytorch/pytorch/pull/160448 on behalf of https://github.com/atalman due to internal failure please see associated diff ([comment](https://github.com/pytorch/pytorch/pull/160448#issuecomment-3223939035))	2025-08-26 12:20:37 +00:00
Sandeep Narendranath Karjala	995397d47a	[inductor] structured-log graph execution order + test (#160448 ) Summary: - Emit a structured trace per compiled graph execution to reconstruct execution order in TLParse. - Adds debug.log_graph_execution(name) called from `CompiledFxGraph.__call__`, producing an artifact named inductor_graph_execution with payload {"graph": "graph_<id>"}. Testing: - Add inline test to verify structure and output Pull Request resolved: https://github.com/pytorch/pytorch/pull/160448 Approved by: https://github.com/xmfan	2025-08-25 20:12:18 +00:00
Colin Peppler	512fc768e9	Add tlparse artifact for joint graph passes (for inference & non-freezing only) (#160589 ) Summary: Joint graph passes run several FX passes which can modify the graph before it hits Inductor. There are three uses of joint graph passes: - for inference & not freezing (we add structured logging only for this) - for inference & freezing - for fw/bw split Rollback Plan: Reviewed By: yushangdi Differential Revision: D80130321 Pull Request resolved: https://github.com/pytorch/pytorch/pull/160589 Approved by: https://github.com/yushangdi	2025-08-19 23:18:40 +00:00
angelayi	bab79824cb	[aoti-fx] Initial AOTInductor FX (#160765 ) Using the existing WrapperFxCodegen backend, this PR prototypes an AOT version of it which will directly return a graph module. How to use: ```python exported_gm = torch.export.export(model, inp, dynamic_shapes=dynamic_shapes).module() compiled_gm = torch._inductor.aot_compile( exported_gm, inp, options={"fx_wrapper": True, "compile_threads": 1} ) assert torch.allclose(model(inp), compiled_gm(inp)) ``` The motivation behind this is that backends like ExecuTorch/MTIA would like to use inductor's optimization technologies, but might have their own graph lowering pipelines so they might not want to use AOTI (which generates an so). Pull Request resolved: https://github.com/pytorch/pytorch/pull/160765 Approved by: https://github.com/jansel	2025-08-18 18:14:08 +00:00
angelayi	3c8c509a9c	[export] Fix custom ops in subgraphs (#160004 ) Fixes https://github.com/pytorch/pytorch/issues/159995 Currently there are two problems with extern kernels in subgraphs: 1. They don't get serialized to the extern kernel json file because we only look at the toplevel graph. 2. Since the scope of each extern_kernel list is within its own subgraph, the indices referencing the operators are messed up because each subgraph will start counting from 0. So, this PR moves the extern_kernels list to a global view (under virtualized) so that we can count the extern kernels across subgraphs and the toplevel graph. Pull Request resolved: https://github.com/pytorch/pytorch/pull/160004 Approved by: https://github.com/ydwu4	2025-08-18 15:42:19 +00:00
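The indexing problem described in #160004 can be sketched without any Inductor code. Everything below is hypothetical and for illustration only: `ExternKernelRegistry`, the two indexing helpers, and the `mylib::*` operator names are not taken from PyTorch. The sketch contrasts per-subgraph counting, where every graph restarts at 0, with a single shared list.

```python
class ExternKernelRegistry:
    """Hypothetical shared registry: one list, so indices are unique across graphs."""

    def __init__(self):
        self.kernels = []

    def add(self, op_name: str) -> int:
        self.kernels.append(op_name)
        return len(self.kernels) - 1


def index_per_subgraph(graphs: dict) -> dict:
    """Behaviour described as the problem: every graph starts counting from 0."""
    return {name: list(range(len(ops))) for name, ops in graphs.items()}


def index_globally(graphs: dict):
    """Behaviour described as the fix: one counter shared by all graphs."""
    registry = ExternKernelRegistry()
    indices = {name: [registry.add(op) for op in ops] for name, ops in graphs.items()}
    return indices, registry.kernels


graphs = {"toplevel": ["mylib::foo"], "subgraph_0": ["mylib::bar", "mylib::baz"]}
local_indices = index_per_subgraph(graphs)
global_indices, all_kernels = index_globally(graphs)
print(local_indices)   # index 0 is reused by both graphs
print(global_indices)  # every kernel gets a distinct index
```

With the shared list, the serialized kernel table also contains the subgraph kernels, which corresponds to the first problem listed in the message.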
Sandeep Narendranath Karjala	c699668009	[inductor] TLParse tensor metadata logging + test (#160132 ) Summary: - Add TLParse artifact logging per op with output tensor shape, stride, and dtype for cross-rank aggregation. Testing: - Add test to verify structure and contents of tlparse artifact Pull Request resolved: https://github.com/pytorch/pytorch/pull/160132 Approved by: https://github.com/xmfan	2025-08-17 04:27:49 +00:00
PyTorch MergeBot	26297c27e2	Revert "[inductor] TLParse tensor metadata logging + test (#160132 )" This reverts commit `2603e40be5`. Reverted https://github.com/pytorch/pytorch/pull/160132 on behalf of https://github.com/clee2000 due to broke lint [GH job link](https://github.com/pytorch/pytorch/actions/runs/17010600949/job/48226137423) [HUD commit link](`2603e40be5`). landrace with another PR that changed some had_cuda related things ([comment](https://github.com/pytorch/pytorch/pull/160132#issuecomment-3193969792))	2025-08-16 23:47:03 +00:00
Sandeep Narendranath Karjala	2603e40be5	[inductor] TLParse tensor metadata logging + test (#160132 ) Summary: - Add TLParse artifact logging per op with output tensor shape, stride, and dtype for cross-rank aggregation. Testing: - Add test to verify structure and contents of tlparse artifact Pull Request resolved: https://github.com/pytorch/pytorch/pull/160132 Approved by: https://github.com/xmfan ghstack dependencies: #160260	2025-08-16 16:37:18 +00:00
Shangdi Yu	b74c7cd335	Add kernel stack traces tlparse dump (#160608 ) (#160779 ) Summary: as title This is requested by the zoomer team so they can add stack trace information to profiler result. Test Plan: ``` buck run mode/dev-nosan fbcode//caffe2/test/inductor:provenance_tracing -- -r stack_traces ``` Rollback Plan: Differential Revision: D80050233 Pull Request resolved: https://github.com/pytorch/pytorch/pull/160779 Approved by: https://github.com/angelayi	2025-08-16 03:12:38 +00:00
Shangdi Yu	aa99e0958f	Separate provenance tracking to different levels (#160383 ) Summary: as title. We've received requests from various parties who are interested in turning on the provenance tracking by default. In this PR, we prepare to turn on, by default, the part of the provenance tracking that doesn't have too much overhead. - Change `provenance_tracking` config to `provenance_tracking_level` - turn on the following provenance tracking by default when `basic_provenance_tracking`=True - `set_kernel_post_grad_provenance_tracing` for kernels, this adds a mapping between triton kernels and post_grad nodes - `dump_inductor_provenance_info` if we're dumping tlparse log - `get_graph_provenance_json` and dump `create_mapping_pre_post_grad_nodes`. This creates a mapping between pre_grad and post_grad nodes. Since we're not turning on the provenance tracking in GraphTransformObserver by default, the mapping here may be incomplete/limited. - add stack trace from post grad nodes to inductor IR nodes - add exception swallowing for all functions above Test Plan: CI Rollback Plan: Differential Revision: D80031559 Pull Request resolved: https://github.com/pytorch/pytorch/pull/160383 Approved by: https://github.com/angelayi	2025-08-15 04:59:35 +00:00
Simon Fan	22bedc429f	Extract some HOP utils to be importable (#159705 ) Useful helper function for stage 1 export -> manual partitioner -> stage 2 compile users Pull Request resolved: https://github.com/pytorch/pytorch/pull/159705 Approved by: https://github.com/zou3519 ghstack dependencies: #159134	2025-08-05 23:59:47 +00:00
Sandeep Narendranath Karjala	8034b2a732	[inductor] Add TLParse artifact for logging runtime of collective and compute ops (#159730 ) Summary: - debug.py: Added log_runtime_estimates() function to dump runtime estimation data as structured tlparse artifacts in JSON format - test_structured_trace.py: Added comprehensive test coverage with testing compute and collective ops Pull Request resolved: https://github.com/pytorch/pytorch/pull/159730 Approved by: https://github.com/yushangdi ghstack dependencies: #159190	2025-08-05 22:06:32 +00:00
Sandeep Narendranath Karjala	85e74d5ace	[inductor] Add logging for distributed collective ops for multi‑rank diagnostics (#159190 ) This change introduces structured logging of the collective communication schedule, enabling downstream tools (e.g. TLParse) to ingest and analyze per‑rank collective‐order information for multi‑rank jobs. - Iterates over scheduler.nodes, filters for _CollectiveKernel nodes - Extracts each op’s python_kernel_name - Emits a structured JSON payload under the inductor_collective_schedule artifact name - Dumps the full schedule list to collective_schedule.json via the PyTorch trace‑structured artifact - Added comprehensive unit tests for collective schedule tracing: Created test_collective_schedule_empty() and test_collective_schedule_real() tests to verify structured trace logging works correctly for both empty collective schedules and real collective operations (like all_reduce and wait_tensor from _c10d_functional ops). Pull Request resolved: https://github.com/pytorch/pytorch/pull/159190 Approved by: https://github.com/yushangdi, https://github.com/xmfan	2025-08-01 21:51:42 +00:00
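The steps listed in #159190 (iterate over scheduler nodes, keep the collective kernels, extract each `python_kernel_name`, emit the list as JSON) can be sketched as follows. This is a self-contained illustration under stated assumptions, not the Inductor implementation: the three stub classes stand in for Inductor's real scheduler and IR types, the `node` attribute layout is assumed, and the kernel-name strings are made up for the example.

```python
import json
from dataclasses import dataclass


@dataclass
class CollectiveKernelStub:
    """Stand-in for Inductor's _CollectiveKernel (hypothetical)."""
    python_kernel_name: str


@dataclass
class ComputeKernelStub:
    """Stand-in for any non-collective kernel (hypothetical)."""
    python_kernel_name: str


@dataclass
class SchedulerNodeStub:
    """Stand-in for a scheduler node wrapping an IR node (assumed layout)."""
    node: object


def collective_schedule(scheduler_nodes) -> list:
    """Return the kernel names of collective ops, in scheduler order."""
    return [
        sn.node.python_kernel_name
        for sn in scheduler_nodes
        if isinstance(sn.node, CollectiveKernelStub)
    ]


nodes = [
    SchedulerNodeStub(ComputeKernelStub("example.compute.matmul")),
    SchedulerNodeStub(CollectiveKernelStub("example.collective.all_reduce")),
    SchedulerNodeStub(CollectiveKernelStub("example.collective.wait_tensor")),
]
schedule = collective_schedule(nodes)
payload = json.dumps(schedule)  # what would be written to collective_schedule.json
empty_payload = json.dumps(collective_schedule([]))  # the "empty schedule" case
print(payload)
```

Keeping the list in scheduler order is the point of the artifact: comparing the per-rank lists shows whether ranks issue collectives in the same order.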
PyTorch MergeBot	490cb3f1a4	Revert "[inductor] Add logging for distributed collective ops for multi‑rank diagnostics (#159190 )" This reverts commit `bb62e1f769`. Reverted https://github.com/pytorch/pytorch/pull/159190 on behalf of https://github.com/clee2000 due to broke [GH job link](https://github.com/pytorch/pytorch/actions/runs/16658705097/job/47150840171) [HUD commit link](`bb62e1f769`) on mac ([comment](https://github.com/pytorch/pytorch/pull/159190#issuecomment-3141513921))	2025-07-31 22:22:13 +00:00
Sandeep Narendranath Karjala	bb62e1f769	[inductor] Add logging for distributed collective ops for multi‑rank diagnostics (#159190 ) This change introduces structured logging of the collective communication schedule, enabling downstream tools (e.g. TLParse) to ingest and analyze per‑rank collective‐order information for multi‑rank jobs. - Iterates over scheduler.nodes, filters for _CollectiveKernel nodes - Extracts each op’s python_kernel_name - Emits a structured JSON payload under the inductor_collective_schedule artifact name - Dumps the full schedule list to collective_schedule.json via the PyTorch trace‑structured artifact - Added comprehensive unit tests for collective schedule tracing: Created test_collective_schedule_empty() and test_collective_schedule_real() tests to verify structured trace logging works correctly for both empty collective schedules and real collective operations (like all_reduce and wait_tensor from _c10d_functional ops). Pull Request resolved: https://github.com/pytorch/pytorch/pull/159190 Approved by: https://github.com/yushangdi, https://github.com/xmfan	2025-07-31 19:58:07 +00:00
Lucas Kabela	2b1ae29960	[Dynamo][Better Engineering] Add typing annotations to guard and source (#158397 ) (#159491 ) Summary: X-link: https://github.com/pytorch/executorch/pull/12986 As part of better engineering week, we would like to improve our type support to improve dev experience in dynamo. This PR adds strict typing support to a critical set of files for dynamo, `source.py` and the base `_guards.py` Running ``` mypy torch/_dynamo/source.py torch/_guards.py --linecount-report /tmp/coverage_log ``` \| -------- \| Lines Unannotated \| Lines Total \| % lines covered \| Funcs Unannotated \| Funcs Total \| % funcs covered \| \| -------- \| ------- \| -------- \| ------- \| ------- \| ------- \| ------- \| \| Main \| 1227 \| 2208 \| 55.57% \| 207 \| 362 \| 57.18% \| \| This PR \| 2217 \| 2217 \| 100.00% \| 362 \| 362 \| 100.00% \| \| Delta \| +990 \| +9 \| +44.43% \| +155 \| 0 \| +42.82% \| cc jgong5 mingfeima XiaobingSuper sanchitintel ashokei jingxu10 jerryzh168 voznesenskym penguinwu EikanWang Guobing-Chen zhuhaozhe blzheng wenzhe-nrv jiayisunx ipiszy chenyang78 kadeng muchulee8 amjames chauhang aakhundov coconutruben Test Plan: Imported from GitHub, without a `Test Plan:` line. Rollback Plan: Reviewed By: JacobSzwejbka, yangw-dev Differential Revision: D79199389 Pulled By: Lucaskabela Pull Request resolved: https://github.com/pytorch/pytorch/pull/159491 Approved by: https://github.com/anijain2305, https://github.com/yangw-dev	2025-07-30 22:57:50 +00:00
PyTorch MergeBot	d987a6f7f0	Revert "[Dynamo][Better Engineering] Add typing annotations to guard and source (#158397 )" This reverts commit `abcb24f4de`. Reverted https://github.com/pytorch/pytorch/pull/158397 on behalf of https://github.com/yangw-dev due to Suggested to fix failing internal signals on D78911890 ([comment](https://github.com/pytorch/pytorch/pull/158397#issuecomment-3133823766))	2025-07-29 19:49:40 +00:00
Lucas Kabela	abcb24f4de	[Dynamo][Better Engineering] Add typing annotations to guard and source (#158397 ) As part of better engineering week, we would like to improve our type support to improve dev experience in dynamo This PR adds strict typing support to a critical set of files for dynamo, `source.py` and the base `_guards.py` Running ``` mypy torch/_dynamo/source.py torch/_guards.py --linecount-report /tmp/coverage_log ``` \| -------- \| Lines Unannotated \| Lines Total \| % lines covered \| Funcs Unannotated \| Funcs Total \| % funcs covered \| \| -------- \| ------- \| -------- \| ------- \| ------- \| ------- \| ------- \| \| Main \| 1227 \| 2208 \| 55.57% \| 207 \| 362 \| 57.18% \| \| This PR \| 2217 \| 2217 \| 100.00% \| 362 \| 362 \| 100.00% \| \| Delta \| +990 \| +9 \| +44.43% \| +155 \| 0 \| +42.82% \| Pull Request resolved: https://github.com/pytorch/pytorch/pull/158397 Approved by: https://github.com/anijain2305	2025-07-24 15:55:18 +00:00
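The coverage figures quoted in #158397 can be re-derived from the counts in its table. The sketch below treats the first count in each pair as the number of covered lines or functions, which is the reading the 100.00% row implies. The helper name `coverage_pct` is an illustrative assumption and not part of mypy or PyTorch.

```python
def coverage_pct(covered: int, total: int) -> float:
    """Percentage of covered items, rounded to two decimals."""
    return round(100.0 * covered / total, 2)


# Counts copied from the table in the commit message.
main_lines = coverage_pct(1227, 2208)
main_funcs = coverage_pct(207, 362)
pr_lines = coverage_pct(2217, 2217)
pr_funcs = coverage_pct(362, 362)

# Deltas between the "This PR" row and the "Main" row.
delta_lines = round(pr_lines - main_lines, 2)
delta_funcs = round(pr_funcs - main_funcs, 2)
```

Under that reading the results match the reported 55.57% and 57.18% on main and the deltas of +44.43 and +42.82 percentage points.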
James Wu	1b772de397	Still run TritonBundler with BundledAOTAutogradCache, save autotune results (#158048 ) When running BundledAOTAutogradCache with precompile, we still need to run triton bundling so that the precompiled CompiledFxGraph has triton cuda kernels. We also pre save the autotune results in the precompile artifact. It would be even better to pre trim the cuda kernels on save and apply them, which we can work on later. Pull Request resolved: https://github.com/pytorch/pytorch/pull/158048 Approved by: https://github.com/zhxchen17	2025-07-22 14:12:21 +00:00
PyTorch MergeBot	bc379aebe2	Revert "Still run TritonBundler with BundledAOTAutogradCache, save autotune results (#158048 )" This reverts commit `8e57cdb746`. Reverted https://github.com/pytorch/pytorch/pull/158048 on behalf of https://github.com/jeffdaily due to rocm failures due to unit test introduced in this PR, but no pre-merge signal available ([comment](https://github.com/pytorch/pytorch/pull/158048#issuecomment-3098746624))	2025-07-21 20:45:21 +00:00
Benjamin Glass	22920c9138	Grab bag of (mostly) typing improvements (#158075 ) Collects some scattershot improvements made while attempting to enable training for AOTInductor. Non-typing changes are: 1. Swapping a few custom searches for the output node in an FX graph for calling `graph.output_node()`. 2. Removing two unused parameters from `torch.export._unlift._unlift`. 3. Switching handles to constants in `cpp_wrapper_cpu` to use C++ references for memory efficiency. 4. Cleaning out unused, unexported imports from `torch/export/__init__.py`, and adding one missing export to `__all__`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/158075 Approved by: https://github.com/Skylion007	2025-07-21 19:17:01 +00:00
James Wu	8e57cdb746	Still run TritonBundler with BundledAOTAutogradCache, save autotune results (#158048 ) When running BundledAOTAutogradCache with precompile, we still need to run triton bundling so that the precompiled CompiledFxGraph has triton cuda kernels. We also pre save the autotune results in the precompile artifact. It would be even better to pre trim the cuda kernels on save and apply them, which we can work on later. Pull Request resolved: https://github.com/pytorch/pytorch/pull/158048 Approved by: https://github.com/zhxchen17	2025-07-21 13:35:46 +00:00
Shangdi Yu	1e86fa2e5b	Add stack trace to Inductor IR nodes if `inductor.config.trace.provenance_tracing=True` (#158576 ) Summary: - Split `create_mapping` into `create_mapping_pre_post_grad_nodes` and `create_node_mapping_kernel_to_post_grad` - Store a mapping from pre_grad graph node names to stack traces in `_inductor_pre_grad_node_stack_trace` - Add `stack_traces` member to ir.Node and add it to the string representation of ir.Node - When we create an IR node, if `inductor.config.trace.provenance_tracing=True`, we populate `stack_traces` from `origins`. The nodes in `origins` are post_grad graph nodes. If a node has `node.stack_trace`, we store the stack_trace directly. This is particularly important for backward graph nodes because they don't have a mapping to pre-grad graph nodes. If a node doesn't have `.stack_trace` (such as `linear`-> `addmm` nodes), we use the stack trace of the pre_grad graph nodes that it maps to. - A post grad graph node might not have a stack trace if it corresponds to multiple pre grad graph nodes, e.g. 
[GroupLinearFusion](`a00442421a/torch/_inductor/fx_passes/group_batch_fusion.py (L299)`) Example: ``` scheduling ExternKernelOut( python_kernel_name='extern_kernels.mm', name=buf0, layout=FixedLayout('cuda:0', torch.float32, size=[8, 16], stride=[16, 1]), inputs=[InputBuffer(name='arg2_1', layout=FixedLayout('cuda:0', torch.float32, size=[8, 10], stride=[10, 1])), ReinterpretView( StorageBox( ConstantBuffer(name='fc1_weight', layout=FixedLayout('cuda:0', torch.float32, size=[16, 10], stride=[10, 1])) ), FixedLayout('cuda:0', torch.float32, size=[10, 16], stride=[1, 10]), origins=OrderedSet([mm_default_1]), stack_traces = {, File "/data/users/shangdiy/fbsource/buck-out/v2/gen/fbcode/7b4b7a52e15abb17/scripts/shangdiy/__aot__/aot#link-tree/scripts/shangdiy/aot.py", line 29, in forward, x = self.fc1(x), File "/data/users/shangdiy/fbsource/buck-out/v2/gen/fbcode/7b4b7a52e15abb17/scripts/shangdiy/__aot__/aot#link-tree/torch/nn/modules/linear.py", line 125, in forward, return F.linear(input, self.weight, self.bias), } )], constant_args=(), kwargs={}, output_view=None, python_kernel_name=extern_kernels.mm, cpp_kernel_name=at::mm_out, ordered_kwargs_for_cpp_kernel=(), op_overload=None, arg_properties=[{}, {}], allarg_properties={}, kwarg_properties=None, unbacked_bindings={}, mutation_outputs=[], origin_node=mm_default_1, origins=OrderedSet([mm_default_1]), stack_traces = {, File "/data/users/shangdiy/fbsource/buck-out/v2/gen/fbcode/7b4b7a52e15abb17/scripts/shangdiy/__aot__/aot#link-tree/scripts/shangdiy/aot.py", line 29, in forward, x = self.fc1(x), File "/data/users/shangdiy/fbsource/buck-out/v2/gen/fbcode/7b4b7a52e15abb17/scripts/shangdiy/__aot__/aot#link-tree/torch/nn/modules/linear.py", line 125, in forward, return F.linear(input, self.weight, self.bias), } ) ``` Test Plan: ``` buck2 run mode/dev-nosan fbcode//caffe2/test/inductor:provenance_tracing ``` Rollback Plan: Differential Revision: D78365534 Pull Request resolved: https://github.com/pytorch/pytorch/pull/158576 
Approved by: https://github.com/angelayi	2025-07-18 04:05:17 +00:00
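The fallback rule described in #158576 (use a post-grad node's own stack trace when it has one, otherwise borrow the traces of the pre-grad nodes it maps to) can be sketched with plain dictionaries. This is an illustration under stated assumptions, not the PR's implementation: `resolve_stack_traces`, `post_to_pre`, and `pre_grad_traces` are hypothetical names, and nodes are modelled as dicts rather than FX nodes.

```python
def resolve_stack_traces(origins, post_to_pre, pre_grad_traces):
    """Collect stack traces for an IR node from its post-grad origin nodes.

    origins:          list of dicts with "name" and optional "stack_trace"
    post_to_pre:      hypothetical map, post-grad name -> pre-grad names
    pre_grad_traces:  hypothetical map, pre-grad name -> stack trace
    """
    traces = []
    for node in origins:
        own = node.get("stack_trace")
        if own:
            # Preferred path: the post-grad node carries its own trace
            # (the case the commit message highlights for backward nodes).
            traces.append(own)
            continue
        # Fallback: borrow traces from the mapped pre-grad nodes.
        for pre_name in post_to_pre.get(node["name"], []):
            trace = pre_grad_traces.get(pre_name)
            if trace:
                traces.append(trace)
    # Drop duplicates while keeping first-seen order.
    return list(dict.fromkeys(traces))


origins = [
    {"name": "mm_default_1"},                        # no trace of its own
    {"name": "bwd_node", "stack_trace": "bwd.py:7"},
    {"name": "unmapped"},                            # nothing to fall back on
]
post_to_pre = {"mm_default_1": ["linear"]}
pre_grad_traces = {"linear": "aot.py:29"}

resolved = resolve_stack_traces(origins, post_to_pre, pre_grad_traces)
```

A node with neither its own trace nor a mapped pre-grad trace simply contributes nothing, which mirrors the commit's note that some post-grad nodes may have no stack trace.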

555 Commits