Commit Graph

322 Commits

Author SHA1 Message Date
Joel Schlosser
c4b93e6579 Replace frame_traced_fn hook with get_traced_code() util (#155249)
#153622 introduced a hook for getting the relevant code objects after frame tracing. The idea is to have vLLM use this instead of monkey-patching `inline_call_()` to determine the source code files to hash. Unfortunately, the hook runs too late; the vLLM backend needs access to the set of source code filenames while it's running.

This PR replaces the newly-added hook with a utility function that a backend can call to get this information. I've made the change in vLLM and can verify that this allows the information to be queried at the right time.
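For context, here is a minimal sketch of how a custom backend might consume this utility; the import path (`torch._dynamo.utils.get_traced_code`) and the zero-argument call are assumptions based on this PR's description, not a confirmed API.

```python
# Hedged sketch: a backend querying the traced code objects while it runs.
# The import path and signature of get_traced_code() are assumptions.
import torch
from torch._dynamo.utils import get_traced_code

def hashing_backend(gm: torch.fx.GraphModule, example_inputs):
    # Collect the source files of everything Dynamo traced into this graph.
    source_files = {code.co_filename for code in get_traced_code()}
    print(sorted(source_files))
    return gm.forward  # run the captured graph eagerly

@torch.compile(backend=hashing_backend)
def fn(x):
    return x.sin() + 1
```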

Pull Request resolved: https://github.com/pytorch/pytorch/pull/155249
Approved by: https://github.com/zou3519
2025-06-10 22:40:58 +00:00
Simon Fan
28796f71d0 Redo D75092426: [internal] Expose additional metadata to compilation callbacks (#155063)
Originally https://github.com/pytorch/pytorch/pull/153596
---------------

Summary:
via reverting D75708685

gate the ROCm failure

Test Plan:
Unit tests in OSS, sandcastle

Rollback Plan:

Differential Revision: D75894349

Pull Request resolved: https://github.com/pytorch/pytorch/pull/155063
Approved by: https://github.com/masnesral
2025-06-05 23:40:31 +00:00
PyTorch MergeBot
35fc5c49b4 Revert "[internal] Expose additional metadata to compilation callbacks (#153596)"
This reverts commit f889dea97d.

Reverted https://github.com/pytorch/pytorch/pull/153596 on behalf of https://github.com/izaitsevfb due to introduces bunch of callback-related failures on rocm ([comment](https://github.com/pytorch/pytorch/pull/153596#issuecomment-2923139061))
2025-05-30 18:39:27 +00:00
Simon Fan
f889dea97d [internal] Expose additional metadata to compilation callbacks (#153596)
These hooks are used by internal stuck-job detection to associate compilation events with the compile lease. Previously, we only had events for Dynamo and Inductor compilation, and recently the callback handler was updated to ignore nested events, so the Inductor event was only really used by lazy backward.

Here, I remove the inductor event, and add an explicit lazy backward one. Additionally, I add other runtime compilation events: autotuning and cudagraphs. I also expose the CompileId as a string to avoid imports, this will let internal UIs track each graph's contribution to the timeout.

```python
class CallbackTrigger(enum.Enum):
    # most common case, dynamo attempts to trace a new frame
    DYNAMO = 1
    # backward compilation can be deferred to runtime
    LAZY_BACKWARD = 2
    # some backends autotune at runtime
    TRITON_AUTOTUNING = 3
    # cudagraphs record at runtime
    CUDAGRAPH_RECORDING = 4
```

Differential Revision: [D75092426](https://our.internmc.facebook.com/intern/diff/D75092426)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/153596
Approved by: https://github.com/masnesral
2025-05-30 08:07:04 +00:00
Zhengxu Chen
0f56318152 [precompile] Add Exception type PackageError for unsupported precompile features. (#154430)
Summary:
Today when guard serialization fails, dynamo will raise an internal error like:

```
torch._dynamo.exc.InternalTorchDynamoError: RuntimeError: CLOSURE_MATCH guard cannot be serialized.
```

Adding a dedicated PackageError type to surface the error more clearly.
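For illustration, a hedged sketch of how the new exception type surfaces; the import location (`torch._dynamo.exc.PackageError`) is an assumption based on where Dynamo exceptions usually live.

```python
# Hedged sketch; the import location of PackageError is an assumption.
from torch._dynamo.exc import PackageError

try:
    # Stand-in for guard serialization failing on an unsupported feature.
    raise PackageError("CLOSURE_MATCH guard cannot be serialized.")
except PackageError as e:
    print(f"precompile feature unsupported: {e}")
```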

Test Plan: CI

Differential Revision: D75452124

Pull Request resolved: https://github.com/pytorch/pytorch/pull/154430
Approved by: https://github.com/jamesjwu, https://github.com/jansel
2025-05-28 22:34:51 +00:00
Joel Schlosser
9db7bcb3fe [Dynamo] Introduce hook receiving list of traced code objects (#153622)
This PR:
* Expands `Hooks` with a new, optional `frame_traced_fn` field. It should be a callable receiving the list of traced code objects
* Maintains a list of `traced_code` objects in the `TracingContext` of an `OutputGraph`
    * Whenever an `inline_call()` is encountered, the corresponding code object is added to this list
    * `OutputGraph`'s associated `f_code` is added to the list just before the hook is called

I believe use of this hook should enable the source code hashing that vLLM does in a better way than monkey-patching `inline_call()`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/153622
Approved by: https://github.com/jansel
2025-05-28 15:40:09 +00:00
Animesh Jain
7fdd754136 [compile-time traces] Profile large missing gaps in compile time (#151256)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/151256
Approved by: https://github.com/bdhirsh, https://github.com/masnesral, https://github.com/zou3519, https://github.com/jansel
2025-05-13 14:44:51 +00:00
PyTorch MergeBot
fdc387ec7c Revert "refine fp32 precision api (#125888)"
This reverts commit 4c11b26158.

Reverted https://github.com/pytorch/pytorch/pull/125888 on behalf of https://github.com/huydhn due to Sorry for reverting your change but it seems to cause some failures on ROCm ([comment](https://github.com/pytorch/pytorch/pull/125888#issuecomment-2869274791))
2025-05-11 00:35:46 +00:00
haozhe.zhu
4c11b26158 refine fp32 precision api (#125888)
Based on the [conversation](https://github.com/pytorch/pytorch/issues/121791), we plan to drop "highest, high, medium" as the way to represent fp32 internal computation data types. Instead, we will directly use the algorithm name to represent it.

### Design Choice: Directly use algorithms name like "TF32", "BF16".
#### Pros
 - The names are more informative: 'tf32' says more than a simple "high".
 - Easier to extend to new algorithms like `tf32x3`
#### Cons
 - "HIGHEST, HIGH, MEDIUM" indicated the relative precision between different algorithms. However, we can have more documents to discuss them.

### We provide a layered structure for backends/operators.
('f32' is short for 'fp32_precision')
![image](https://github.com/user-attachments/assets/f89143e5-d6a1-4865-9351-9a50439f5067)

### We provide 3 fp32 compute precisions that can be set:
 - **"ieee"**: Not allowed to use any other internal computation data types.
 - **"tf32"**: Allowed to use tf32 as the internal computation data type.
 - **"bf16"**: Allowed to use bf16 as the internal computation data type.
 - **"none"**: Precision is not set; it can be overridden by its parent node.

### Overriding Precision Settings
A child node can be overridden by its parent node if it is set to the default.
For current default settings:
```
backend = generic, op = all, precision setting = none
    backend = cuda, op = all, precision setting = none
        backend = cuda, op = conv, precision setting = tf32
        backend = cuda, op = rnn, precision setting = tf32
        backend = cuda, op = matmul, precision setting = none
    backend = mkldnn, op = all, precision setting = none
        backend = mkldnn, op = conv, precision setting = none
        backend = mkldnn, op = rnn, precision setting = none
        backend = mkldnn, op = matmul, precision setting = none
```
 - If the user sets `torch.backends.mkldnn.fp32_precision="bf16"`, its child nodes `torch.backends.mkldnn.matmul.fp32_precision` / `torch.backends.mkldnn.conv.fp32_precision` / `torch.backends.mkldnn.rnn.fp32_precision` will also be overridden to "bf16".
 - If the user sets `torch.backends.fp32_precision="bf16"`, `torch.backends.mkldnn.fp32_precision` and its child nodes will also be overridden to "bf16" (see the sketch below).
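A minimal sketch of this override behavior, assuming the attribute names described above exist exactly as written:

```python
# Hedged sketch of the override behavior described above; the attribute names
# are taken from this PR's description and are assumed to exist as written.
import torch

# Backend-level setting: child ops that are still at the default ("none")
# inherit it.
torch.backends.mkldnn.fp32_precision = "bf16"
print(torch.backends.mkldnn.matmul.fp32_precision)  # expected: "bf16"

# An explicitly set child keeps its own value.
torch.backends.cudnn.conv.fp32_precision = "tf32"
print(torch.backends.cudnn.conv.fp32_precision)     # expected: "tf32"
```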

### Backward Compatibility
Since the new API allows the user more fine-grained control, there can be some conflicts. For example, the previous `torch.backends.cudnn.allow_tf32` is not enough to represent the status `torch.backends.cudnn.rnn.fp32_precision="ieee"` plus `torch.backends.cudnn.conv.fp32_precision="tf32"`. Therefore, our goals for backward compatibility are:
 - If the user only uses the previous APIs, they will work as before.
 - If the user uses the **new** API to change to a status that is **un-representable** by the old API and then tries to access it via the **old** API, we raise a RuntimeError and point the user to the documentation.

### Test Plan
```
python test/test_cuda.py -k test_fp32_precision_with_tf32
python test/test_cuda.py -k test_fp32_precision_with_float32_matmul_precision
python test/test_cuda.py -k test_invalid_status_for_legacy_api
python test/test_mkldnn.py -k test_mlkdnn_get_set
python test/test_mkldnn.py -k test_generic_precision
python test/test_mkldnn.py -k test_invalid
python test/test_mkldnn.py -k test_default_use_parent
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/125888
Approved by: https://github.com/jgong5, https://github.com/albanD

Co-authored-by: Jiang, Yanbing <yanbing.jiang@intel.com>
2025-05-10 11:13:04 +00:00
William Wen
5b9df57b50 [dynamo] context manager/decorator for dynamo config patching during tracing (#150586)
Implement traceable config patching for Dynamo: enables restricted patching of Dynamo config, where the user can use a context manager/decorator to change tracing behavior for parts of the code.

The new `dont_skip_tracing` decorator/context manager for ignoring most trace rules is easily implemented with this more generic traceable config patching feature.
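A hedged usage sketch of the decorator follows; the exact public location (`torch._dynamo.dont_skip_tracing`) is an assumption.

```python
# Hedged sketch of the decorator described above; the import path is assumed.
import torch

@torch._dynamo.dont_skip_tracing
def helper(x):
    # Code that would normally be skipped by trace rules gets traced here.
    return x + 1

@torch.compile
def fn(x):
    return helper(x) * 2
```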

Implementation:
- Create a new specialized context manager class representing a wrapper around torch._dynamo.config.patch
- Dynamo doesn't trace into the context manager but updates config at compile time
- Correctness is based on our correctness for handling supported context managers
- Implementation is inspired by how `GradModeVariable` is implemented.

Previous attempts: https://github.com/pytorch/pytorch/pull/148736 (decorator-only global approach) and https://github.com/pytorch/pytorch/pull/149439 (decorator-only traceback approach)

See https://docs.google.com/document/d/1vWNwKL_jpg-PLopifcaSa338wks3GqSVF4GHRguybGg/edit?tab=t.0 for more details on implementation - including previous approaches.

NOTE: this PR fixes a bug where skipped code objects were not tracked by convert_frame.py, leading to cases where code objects would be automatically skipped even after `torch._dynamo.reset()`. This exposed some latent dynamo-wrapped test failures in CI that previously passed in CI but not locally.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/150586
Approved by: https://github.com/jansel, https://github.com/zou3519, https://github.com/anijain2305
2025-04-23 09:12:13 +00:00
PyTorch MergeBot
0bb9b89fb7 Revert "[compile][compile time traces] Add more dynamo traces (#151357)"
This reverts commit 607443b16b.

Reverted https://github.com/pytorch/pytorch/pull/151357 on behalf of https://github.com/wdvr due to stack in a weird state - reverting for now ([comment](https://github.com/pytorch/pytorch/pull/151357#issuecomment-2822369232))
2025-04-22 20:12:44 +00:00
Sam Larsen
529f698ad4 [logging] Put "everything" WaitCounters in dynamo_timed (#151757)
Summary: The main motivation is to capture the cudagraphs overhead in a WaitCounter. We'll combine that with Triton autotuning, and therefore rename to "compile_runtime_overheads". Since we have a couple of WaitCounters where we want to capture all runtime and compile overheads, let's put the accounting in dynamo_timed so we'll automatically capture any toplevel timed regions that get added in the future. Also, dynamo_timed already has to figure out if we're timing a runtime vs. compile-time event, so we can reuse some of that logic.

Test Plan:
Ran an internal model with `TORCHINDUCTOR_BENCHMARK_FUSION=1` (to get benchmarking at compile time in addition to runtime).

Overall compile time from various sources matches up:
* tlparse: https://fburl.com/9fgsstkr. Eyeballing, total time should be 32 ranks x 2175 = ~69.6k s
* ods: https://fburl.com/canvas/r4clhnb7. Right on.
* dynamo_compile: https://fburl.com/scuba/dynamo_compile/ax71aqox. Right on.
* pt2_compile_events: https://fburl.com/scuba/pt2_compile_events/shcjd9ql. Right on.

And the runtime overhead:
* ods: https://fburl.com/canvas/nvgjb282
* dynamo_compile: https://fburl.com/scuba/dynamo_compile/f2dtv0qh

If we compare that to a run of the same model without the changes in this stack, results can mismatch by a lot:
* tlparse: https://fburl.com/cchxwd1s. Eyeballing, total time should be 32 ranks x 2300s = ~73.5k s
* ods: https://fburl.com/canvas/x1i3wvf4. It's kinda close
* dynamo_compile: https://fburl.com/scuba/dynamo_compile/l7sgxdxd. Waaay too high.
* pt2_compile_events: https://fburl.com/scuba/pt2_compile_events/jb4s9z1u. This is the only one that's actually correct.

The discrepancy is even worse if we focus on the runtime events:
* ods: https://fburl.com/canvas/a4o9f7ou
* dynamo_compile: https://fburl.com/scuba/dynamo_compile/95izaes1

Pull Request resolved: https://github.com/pytorch/pytorch/pull/151757
Approved by: https://github.com/ppanchalia
ghstack dependencies: #151749
2025-04-22 03:29:13 +00:00
PyTorch MergeBot
fd04c79878 Revert "[aot autograd][logging] Profile large missing gaps in compile time tracing (#151256)"
This reverts commit 8e373592c8.

Reverted https://github.com/pytorch/pytorch/pull/151256 on behalf of https://github.com/Camyll due to breaking internal tests, cannot import ([comment](https://github.com/pytorch/pytorch/pull/151256#issuecomment-2819244186))
2025-04-21 18:49:23 +00:00
Animesh Jain
607443b16b [compile][compile time traces] Add more dynamo traces (#151357)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/151357
Approved by: https://github.com/williamwen42
ghstack dependencies: #151330, #151256
2025-04-16 20:37:08 +00:00
Animesh Jain
8e373592c8 [aot autograd][logging] Profile large missing gaps in compile time tracing (#151256)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/151256
Approved by: https://github.com/bdhirsh, https://github.com/masnesral
ghstack dependencies: #151330
2025-04-16 20:37:08 +00:00
PyTorch MergeBot
6a3a6d22dc Revert "[dynamo] context manager/decorator for dynamo config patching during tracing (#150586)"
This reverts commit 40ce4fb24a.

Reverted https://github.com/pytorch/pytorch/pull/150586 on behalf of https://github.com/clee2000 due to broke some inductor tests? inductor/test_fuzzer.py::TestConfigFuzzer::test_config_fuzzer_dynamo_bisect [GH job link](https://github.com/pytorch/pytorch/actions/runs/14486513628/job/40635178179) [HUD commit link](40ce4fb24a), bad TD ([comment](https://github.com/pytorch/pytorch/pull/150586#issuecomment-2810064322))
2025-04-16 16:13:47 +00:00
William Wen
40ce4fb24a [dynamo] context manager/decorator for dynamo config patching during tracing (#150586)
Implement traceable config patching for Dynamo: enables restricted patching of Dynamo config, where the user can use a context manager/decorator to change tracing behavior for parts of the code.

The new `dont_skip_tracing` decorator/context manager for ignoring most trace rules is easily implemented with this more generic traceable config patching feature.

Implementation:
- Create a new specialized context manager class representing a wrapper around torch._dynamo.config.patch
- Dynamo doesn't trace into the context manager but updates config at compile time
- Correctness is based on our correctness for handling supported context managers
- Implementation is inspired by how `GradModeVariable` is implemented.

Previous attempts: https://github.com/pytorch/pytorch/pull/148736 (decorator-only global approach) and https://github.com/pytorch/pytorch/pull/149439 (decorator-only traceback approach)

See https://docs.google.com/document/d/1vWNwKL_jpg-PLopifcaSa338wks3GqSVF4GHRguybGg/edit?tab=t.0 for more details on implementation - including previous approaches.

NOTE: this PR fixes a bug where skipped code objects were not tracked by convert_frame.py, leading to cases where code objects would be automatically skipped even after `torch._dynamo.reset()`. This exposed some latent dynamo-wrapped test failures in CI that previously passed in CI but not locally.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/150586
Approved by: https://github.com/jansel, https://github.com/zou3519, https://github.com/anijain2305
2025-04-16 06:49:58 +00:00
Zhengxu Chen
86370fd658 [dynamo] Allow guards to be dropped with custom filter functions. (#150936)
Summary: A follow up of https://github.com/pytorch/pytorch/pull/150689.

Test Plan: test_dynamo -k test_guard_filter_fn

Differential Revision: D72722322

Pull Request resolved: https://github.com/pytorch/pytorch/pull/150936
Approved by: https://github.com/jansel
2025-04-11 03:06:34 +00:00
Prajesh Praveen Anchalia
48e9ffc873 Unify on dynamo_compile as the overall wait counter (#150293)
Summary:
dynamo_compile for the most part has been accounting for compile time except autotuning.

all_compilation_types had earlier been injected on fx_codegen_and_compile, which was incorrect.

Add autotuning to dynamo and deprecate the all_compilation_types counter.

Differential Revision: D72145447

Pull Request resolved: https://github.com/pytorch/pytorch/pull/150293
Approved by: https://github.com/masnesral, https://github.com/jamesjwu
2025-04-01 08:55:51 +00:00
Hollow Man
0692301e25 Catch OSError in general when writing files (#149464)
Redundant exception types in `except (PermissionError, OSError):`.  Write `except OSError:`, which catches exactly the same exceptions.

https://github.com/pytorch/pytorch/actions/runs/13935844871/job/39141062991

When hipifying files or writing cprofile files, catching PermissionError alone is not enough: the file may be located in a place that is not writable at all, or other OS errors may happen while writing files.

This fix makes the code more robust.
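A minimal sketch of the resulting pattern (PermissionError is a subclass of OSError, so the single clause covers both plus other OS-level failures such as read-only file systems):

```python
# PermissionError is a subclass of OSError, so the single except clause below
# catches everything `except (PermissionError, OSError):` did, plus other
# OS-level failures such as read-only file systems.
def try_write(path: str, data: str) -> bool:
    try:
        with open(path, "w", encoding="utf-8") as f:
            f.write(data)
        return True
    except OSError as e:
        print(f"skipping write to {path}: {e}")
        return False
```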

Example error log:
```log
  File "deepspeed/ops/adam/fused_adam.py", line 94, in __init__
    fused_adam_cuda = FusedAdamBuilder().load()
                      ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "deepspeed/ops/op_builder/builder.py", line 540, in load
    return self.jit_load(verbose)
           ^^^^^^^^^^^^^^^^^^^^^^
  File "deepspeed/ops/op_builder/builder.py", line 587, in jit_load
    op_module = load(name=self.name,
                ^^^^^^^^^^^^^^^^^^^^
  File "torch/utils/cpp_extension.py", line 1597, in load
    return _jit_compile(
           ^^^^^^^^^^^^^
  File "torch/utils/cpp_extension.py", line 2031, in _jit_compile
    hipify_result = hipify_python.hipify(
                    ^^^^^^^^^^^^^^^^^^^^^
  File "torch/utils/hipify/hipify_python.py", line 1167, in hipify
    preprocess_file_and_save_result(output_directory, filepath, all_files, header_include_dirs,
  File "torch/utils/hipify/hipify_python.py", line 213, in preprocess_file_and_save_result
    result = preprocessor(output_directory, filepath, all_files, header_include_dirs, stats,
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "torch/utils/hipify/hipify_python.py", line 940, in preprocessor
    output_source = RE_QUOTE_HEADER.sub(mk_repl('#include "{0}"', True), output_source)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "torch/utils/hipify/hipify_python.py", line 919, in repl
    preprocess_file_and_save_result(output_directory,
  File "torch/utils/hipify/hipify_python.py", line 213, in preprocess_file_and_save_result
    result = preprocessor(output_directory, filepath, all_files, header_include_dirs, stats,
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "torch/utils/hipify/hipify_python.py", line 986, in preprocessor
    with clean_ctx.open(fout_path, 'w', encoding='utf-8') as fout:
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "torch/utils/hipify/hipify_python.py", line 123, in open
    return open(fn, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
OSError: [Errno 30] Read-only file system: 'deepspeed/ops/csrc/adam/multi_tensor_apply_hip.cuh'
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/149464
Approved by: https://github.com/janeyx99
2025-03-21 02:42:50 +00:00
William Wen
6285a71aba [dynamo] fix bug where non-recursive disable modifies the original function (#148896)
Fixes https://github.com/pytorch/pytorch/issues/148787.

We fix this by:
- Wrapping the original function instead of directly modifying it (see the sketch below)
- When we detect that the previous frame is the non-recursive disable wrapper, we skip tracing this frame (the non-recursive disable wrapper will always be skipped, so that frame will be present in the traceback)
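For reference, a hedged sketch of non-recursive disable as used in the linked issue; with this fix the decorator returns a wrapper and leaves the original function object untouched.

```python
# Hedged sketch: non-recursive disable. With this fix, `helper` is wrapped
# rather than modified in place, and the wrapper frame is always skipped.
import torch

def inner(x):
    return x * 2  # inner can still be traced because the disable is non-recursive

@torch._dynamo.disable(recursive=False)
def helper(x):
    return inner(x) + 1

@torch.compile
def fn(x):
    return helper(x)
```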

Pull Request resolved: https://github.com/pytorch/pytorch/pull/148896
Approved by: https://github.com/jansel
2025-03-20 18:33:54 +00:00
Animesh Jain
a3c286677b [compile] Switch off inference mode during compilation (#149321)
This PR does the following (see the sketch after the list):
* Turns `inference_mode` off and uses `no_grad` for `convert_frame` if inference_mode is on globally.
* Turns off inference_mode for fake tensor propagation. This ensures that converting a real inference tensor to a fake tensor removes the inference-ness.
* Graph breaks on `is_inference` and `is_inference_mode_enabled`.
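A minimal sketch of the user-visible scenario this addresses (calling a compiled function while inference_mode is on globally):

```python
# Minimal sketch: compilation internally runs with inference_mode turned off,
# even though inference_mode is enabled at the call site.
import torch

@torch.compile
def f(x):
    return x + 1

with torch.inference_mode():
    print(f(torch.randn(2)))
```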

Pull Request resolved: https://github.com/pytorch/pytorch/pull/149321
Approved by: https://github.com/jansel, https://github.com/zou3519
2025-03-19 02:45:27 +00:00
Guilherme Leobas
daff65d671 Correctly propagate exception to parent tx (#146502)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/146502
Approved by: https://github.com/anijain2305, https://github.com/williamwen42, https://github.com/zou3519
ghstack dependencies: #146504, #146499
2025-03-11 18:55:45 +00:00
bobrenjc93
8b65d522e1 refactor delayed compile to use code context (#148530)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/148530
Approved by: https://github.com/williamwen42
ghstack dependencies: #148509
2025-03-06 04:02:30 +00:00
bobrenjc93
da2688f624 Introduce delayed compile via eager_then_compile stance (#147983)
Recently I've been experimenting with introducing new APIs to delay compile as a way to reduce compile times while improving the ergonomics of using dynamic shapes. The high level idea is to run the first invocation of compile in eager, save the example inputs, and on the second invocation we can derive the dynamism in the inputs so that we don't need to waste our time doing a compile with static shapes (which is the status quo today with automatic dynamic).

Another benefit of this is that most users no longer need to annotate their inputs with `mark_dynamic` and `mark_unbacked` calls, since we can derive the dynamism on the very first call. Additionally, we get dynamic ints out of the box in this new regime.

This PR implements this idea through the set_stance APIs. In particular it introduces a new `eager_then_compile` stance.
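A hedged usage sketch, assuming the new stance is selected through `torch.compiler.set_stance`:

```python
# Hedged sketch; assumes the stance is exposed via torch.compiler.set_stance.
import torch

torch.compiler.set_stance("eager_then_compile")

@torch.compile
def f(x):
    return x * 2

f(torch.randn(3))  # first call runs in eager; example inputs are recorded
f(torch.randn(4))  # second call compiles, with dynamism derived from the inputs
```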

Pull Request resolved: https://github.com/pytorch/pytorch/pull/147983
Approved by: https://github.com/williamwen42
2025-03-04 07:46:31 +00:00
William Wen
8f361c808b [dynamo] run-only recursively on recompile limit exceeded (#148021)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/148021
Approved by: https://github.com/anijain2305
2025-03-03 21:01:08 +00:00
bobrenjc93
83ec7cdcd4 Fix recompile reason logging (#148200)
for the following test case

```
        @torch.compile(dynamic=False, backend=cnts)
        def fn(x, y, z):
            return x * y * z[0]

        fn(1, torch.randn(1), {0: torch.randn(1)})
        fn(2, torch.randn(2), {0: torch.randn(2)})
        fn(3, torch.randn(3), {0: torch.randn(3)})
        fn(4, torch.randn(4), {0: torch.randn(4)})
        fn(5, torch.randn(5), {0: torch.randn(5)})
```

previously we would log

```
0/0: L['x'] == 1
0/0: L['x'] == 1
0/0: L['x'] == 1
0/0: L['x'] == 1
```

but after this change we now log

```
0/0: L['x'] == 1
0/1: L['x'] == 2
0/2: L['x'] == 3
0/3: L['x'] == 4
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/148200
Approved by: https://github.com/xmfan
2025-02-28 22:33:37 +00:00
Xuehai Pan
3ce352e389 [BE][PYFMT] migrate PYFMT for torch._dynamo to ruff format (#144549)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/144549
Approved by: https://github.com/jansel
2025-02-28 03:03:53 +00:00
Raymond Li
c5bf9aaf1c Log graph breaks (#146537)
Graph breaks currently aren't logged to dynamo_compile and pt2_compile_events. We want to log them.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/146537
Approved by: https://github.com/c00w
2025-02-27 11:06:33 +00:00
William Wen
cf6d1e6824 [dynamo] add generic graph break hints (#147429)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/147429
Approved by: https://github.com/jansel, https://github.com/zou3519
ghstack dependencies: #147385
2025-02-26 09:20:28 +00:00
William Wen
3fd68e4e2f [dynamo] make some more graph break messages readable in English [2/N] (#147385)
This is for "for some large number Z, make sure the error messages are readable English." - beginning to audit all `unimplemented` sites and making sure that all messages are at least English-readable. Hints may not necessarily be provided.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/147385
Approved by: https://github.com/jansel
2025-02-26 09:20:28 +00:00
Simon Fan
ed83b0b70b [ddp] decouple python reducer from compilation mode (#147123)
The current implementation reads as: we will only actually use the "python_reducer" config if the DDP forward is compiled. Otherwise, we will silently fall back to the C++ reducer + no DDPOptimizer.
I'm changing this behavior to always use the python reducer if the config is specified.
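A hedged sketch of opting in; the config knob name (`torch._dynamo.config.optimize_ddp = "python_reducer"`) is an assumption based on the config referenced above.

```python
# Hedged sketch; the config knob name and value are assumptions.
import torch

torch._dynamo.config.optimize_ddp = "python_reducer"
# With this change, the Python reducer is used whether or not the DDP forward
# is actually compiled (no silent fallback to the C++ reducer).
```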

Pull Request resolved: https://github.com/pytorch/pytorch/pull/147123
Approved by: https://github.com/fegin
2025-02-19 15:51:40 +00:00
William Wen
63e8ad49b8 [dynamo] replace hardcoded eval frame control flags skip_code_recursive_flag/cache_limit_hit_flag (#146355)
This PR and the previous:
- Moves parts of `eval_frame.c` to C++.
- Reduces code duplication in `dynamo__custom_eval_frame` and makes the control flow more clear.
- Enables `convert_frame` to signal to `eval_frame.cpp` in a general manner how to evaluate this frame, recursive frames, and future frames with the same code object (default/compile, skip, run-only). e.g. this will allow us to change skipping/cache limit hit eval_frame behavior directly from convert_frame without requiring changes to C/C++.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/146355
Approved by: https://github.com/jansel
ghstack dependencies: #145603
2025-02-18 21:37:12 +00:00
zeshengzong
c6b331f7d9 Deprecate skip_code_recursive_on_cache_limit_hit config flag (#136970)
Fixes one of #136862

Deprecate the `skip_code_recursive_on_cache_limit_hit` flag.

Affected logic is in here:
6931c1644a/torch/_dynamo/convert_frame.py (L866-L876)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/136970
Approved by: https://github.com/williamwen42
2025-02-18 18:48:23 +00:00
Raymond Li
21c2565f35 Document dynamo (#146736)
Many files in dynamo are currently lacking file/module-level documentation, which makes it hard to know what they do at a glance and without digging into the code. This fixes that.

Note: documentation was AI-generated and could be incorrect, please review carefully.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/146736
Approved by: https://github.com/jansel, https://github.com/StrongerXi, https://github.com/anijain2305, https://github.com/zou3519
2025-02-13 00:02:21 +00:00
Animesh Jain
ee45ea599d [dynamo] Actionable message on recompilations for fullgraph=True (#146550)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/146550
Approved by: https://github.com/zou3519, https://github.com/StrongerXi
ghstack dependencies: #146553
2025-02-07 17:28:43 +00:00
Simon Fan
a14c780c4c [dynamo] fix dynamo_compile logging on RecompileLimitExceeded (#146544)
Logging branches based on whether RecompileLimitExceeded is raised or not. If we exceed the limit, we fall back to eager before even trying to analyze the frame. We handle RecompileLimitExceeded outside of the try/catch/finally that edits the metrics context:
72405b0c0f/torch/_dynamo/convert_frame.py (L908-L935).

dynamo_config and recompile_reason are both known before we raise the RecompileLimitExceeded, so we can add them with the rest of the "common" metrics, which are logged when the metrics context decorator exits (which is always called).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/146544
Approved by: https://github.com/masnesral
2025-02-06 16:20:42 +00:00
Simon Fan
1d4adf4e1f [dynamo] log recompile reason to dynamo_compile (#146117)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/146117
Approved by: https://github.com/bobrenjc93
2025-02-03 21:04:04 +00:00
PyTorch MergeBot
3481c2aec4 Revert "[dynamo] save/restore system random state more carefully (#145750)"
This reverts commit e3d3f2b22e.

Reverted https://github.com/pytorch/pytorch/pull/145750 on behalf of https://github.com/eellison due to bisected perf regression ([comment](https://github.com/pytorch/pytorch/pull/145750#issuecomment-2620028414))
2025-01-28 20:51:07 +00:00
William Wen
e3d3f2b22e [dynamo] save/restore system random state more carefully (#145750)
Reattempt of https://github.com/pytorch/pytorch/pull/145435 since the state of the linked internal diff appears to be messed up.

Note: I have verified that the previously failing internal tests now pass internally.

Differential Revision: [D68723334](https://our.internmc.facebook.com/intern/diff/D68723334)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/145750
Approved by: https://github.com/StrongerXi
2025-01-28 01:34:13 +00:00
Aaron Orenstein
a79100ab11 PEP585 update - torch/_dynamo (#145105)
See #145101 for details.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/145105
Approved by: https://github.com/bobrenjc93
2025-01-18 20:47:11 +00:00
Laith Sakka
c3fcb3606d Profile compile_inner instead of _compile_inner (#144930)
Summary: title

Test Plan: NA

Reviewed By: jamesjwu

Differential Revision: D67990492

Pull Request resolved: https://github.com/pytorch/pytorch/pull/144930
Approved by: https://github.com/jamesjwu
2025-01-16 23:59:27 +00:00
Colin L. Rice
0cd9320c7f easy: dynamo_config: sort keys and set values (#143317)
This creates a consistent ordering of keys when writing, and sorts sets before serializing.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/143317
Approved by: https://github.com/masnesral
ghstack dependencies: #143307
2025-01-11 03:08:04 +00:00
bobrenjc93
1fe3af2c68 Migrate from Tuple -> tuple in torch/_dynamo (#144261)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/144261
Approved by: https://github.com/aorenste, https://github.com/zou3519
2025-01-10 07:45:57 +00:00
James Wu
f2d6cfa677 Introduce CompileEventLogger, replace usages of metrics_context and chromium_event with it (#143420)
**Problem statement**: I want to be able to centralize and simplify the process by which people add columns/data to existing spans. We have MetricsContext and ChromiumEventLogger, and there are various choices you can make to decide where and when to log different levels of observability for your events. To resolve this, I want a central API for "adding to events under dynamo_timed".

**CompileEventLogger** is intended as a frontend for MetricsContext and ChromiumEventLogger so we can use the same class for handling everything.

CompileEventLogger is intended be used within a `dynamo_timed()` context. Its purpose is to 1. log to existing events that are in progress (i.e. within dynamo_timed), and 2. log instant events to chromium that are independent of any specific span.

CompileEventLogger has three log levels:

- CHROMIUM: Log only to chromium events, visible via tlparse.
- PT2_COMPILE: Log to chromium_events + pt2_compile_events
- COMPILATION_METRIC: Log to compilation metrics in addition to the toplevel chromium and pt2_compile_event.

In addition, we have a function CompileEventLogger.add() that automagically chooses the correct log level. For now, it is conservative, and will never automagically choose to log CompilationMetrics (though I could imagine it figuring out that the metadata are all keys in CompilationMetrics and therefore loggable there).

The goal here is to make one single interface to log stuff for observability reasons, and make it as easy as possible.
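A hedged sketch based purely on the description above; the import path and the keyword-argument form of `CompileEventLogger.add()` are assumptions, not the actual API.

```python
# Hedged sketch; the import path and the signature of CompileEventLogger.add()
# are assumptions based on this PR's description.
from torch._dynamo.utils import CompileEventLogger, dynamo_timed

with dynamo_timed("my_compiler_pass"):
    # Attach metadata to the in-progress span; add() picks the log level.
    CompileEventLogger.add(num_graph_nodes=42)
```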

Not included in this diff:
- V1 of this diff will not have implementations of `increment` and `add_to_set` which MetricsContext has, so those usages are not replaced yet. But I'll add those in a followup.

- We don't handle `RuntimeMetricsContext`. It's unclear if I want that to be part of this, because under RuntimeMetricsContext there might not be a toplevel event to log to, so chromium events doesn't make sense in that context. So I might leave that separate for now.

Differential Revision: [D67346203](https://our.internmc.facebook.com/intern/diff/D67346203/)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/143420
Approved by: https://github.com/aorenste
2025-01-04 22:40:34 +00:00
Vinayak Pandey
16a57e232c removed dead code for dynamo flag dead_code_elimination (#140938)
Fixes #136862

1.  removed dead code from torch/_dynamo/convert_frame.py
2.  ran `lintrunner -a` and all the tests passed.
3. ran the unit tests and everything seems to be in order.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/140938
Approved by: https://github.com/zou3519
2024-12-31 09:27:43 +00:00
Jason Ansel
9e5f3fdfc7 [dynamo] Shorten tracebacks for backend compiler errors (#143552)
Fixes #143406

After this PR the error for missing Triton is:
```py
Traceback (most recent call last):
  File "/home/jansel/pytorch/repro.py", line 51, in <module>
    fp32_compiled = optimized_model(low_input)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jansel/pytorch/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jansel/pytorch/torch/nn/modules/module.py", line 1750, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jansel/pytorch/torch/_dynamo/eval_frame.py", line 580, in _fn
    raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jansel/pytorch/torch/_inductor/scheduler.py", line 3624, in create_backend
    raise TritonMissing(inspect.currentframe())
torch._dynamo.exc.TritonMissing: Cannot find a working triton installation. Either the package is not installed or it is too old. More information on installing Triton can be found at: https://github.com/triton-lang/triton

Set TORCH_LOGS="+dynamo" and TORCHDYNAMO_VERBOSE=1 for more information

You can suppress this exception and fall back to eager by setting:
    import torch._dynamo
    torch._dynamo.config.suppress_errors = True
```

Setting `TORCHDYNAMO_VERBOSE=1` yields something like the old error:
```py
Traceback (most recent call last):
  File "/home/jansel/pytorch/repro.py", line 51, in <module>
    fp32_compiled = optimized_model(low_input)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jansel/pytorch/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jansel/pytorch/torch/nn/modules/module.py", line 1750, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jansel/pytorch/torch/_dynamo/eval_frame.py", line 580, in _fn
    raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jansel/pytorch/torch/_dynamo/eval_frame.py", line 576, in _fn
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "/home/jansel/pytorch/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jansel/pytorch/torch/nn/modules/module.py", line 1750, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jansel/pytorch/torch/_dynamo/convert_frame.py", line 1383, in __call__
    return self._torchdynamo_orig_callable(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jansel/pytorch/torch/_dynamo/convert_frame.py", line 1167, in __call__
    result = self._inner_convert(
             ^^^^^^^^^^^^^^^^^^^^
  File "/home/jansel/pytorch/torch/_dynamo/convert_frame.py", line 548, in __call__
    return _compile(
           ^^^^^^^^^
  File "/home/jansel/pytorch/torch/_dynamo/convert_frame.py", line 988, in _compile
    guarded_code = compile_inner(code, one_graph, hooks, transform)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jansel/pytorch/torch/_dynamo/convert_frame.py", line 716, in compile_inner
    return _compile_inner(code, one_graph, hooks, transform)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jansel/pytorch/torch/_utils_internal.py", line 95, in wrapper_function
    return function(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jansel/pytorch/torch/_dynamo/convert_frame.py", line 751, in _compile_inner
    out_code = transform_code_object(code, transform)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jansel/pytorch/torch/_dynamo/bytecode_transformation.py", line 1361, in transform_code_object
    transformations(instructions, code_options)
  File "/home/jansel/pytorch/torch/_dynamo/convert_frame.py", line 232, in _fn
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "/home/jansel/pytorch/torch/_dynamo/convert_frame.py", line 663, in transform
    tracer.run()
  File "/home/jansel/pytorch/torch/_dynamo/symbolic_convert.py", line 2870, in run
    super().run()
  File "/home/jansel/pytorch/torch/_dynamo/symbolic_convert.py", line 1053, in run
    while self.step():
          ^^^^^^^^^^^
  File "/home/jansel/pytorch/torch/_dynamo/symbolic_convert.py", line 963, in step
    self.dispatch_table[inst.opcode](self, inst)
  File "/home/jansel/pytorch/torch/_dynamo/symbolic_convert.py", line 3050, in RETURN_VALUE
    self._return(inst)
  File "/home/jansel/pytorch/torch/_dynamo/symbolic_convert.py", line 3035, in _return
    self.output.compile_subgraph(
  File "/home/jansel/pytorch/torch/_dynamo/output_graph.py", line 1102, in compile_subgraph
    self.compile_and_call_fx_graph(
  File "/home/jansel/pytorch/torch/_dynamo/output_graph.py", line 1383, in compile_and_call_fx_graph
    compiled_fn = self.call_user_compiler(gm)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jansel/pytorch/torch/_dynamo/output_graph.py", line 1433, in call_user_compiler
    return self._call_user_compiler(gm)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jansel/pytorch/torch/_dynamo/output_graph.py", line 1463, in _call_user_compiler
    compiled_fn = compiler_fn(gm, self.example_inputs())
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jansel/pytorch/torch/_dynamo/repro/after_dynamo.py", line 130, in __call__
    compiled_gm = compiler_fn(gm, example_inputs)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jansel/pytorch/torch/__init__.py", line 2314, in __call__
    return compile_fx(model_, inputs_, config_patches=self.config)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jansel/pytorch/torch/_inductor/compile_fx.py", line 1880, in compile_fx
    return aot_autograd(
           ^^^^^^^^^^^^^
  File "/home/jansel/pytorch/torch/_dynamo/backends/common.py", line 83, in __call__
    cg = aot_module_simplified(gm, example_inputs, **self.kwargs)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jansel/pytorch/torch/_functorch/aot_autograd.py", line 1145, in aot_module_simplified
    compiled_fn = AOTAutogradCache.load(
                  ^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jansel/pytorch/torch/_functorch/_aot_autograd/autograd_cache.py", line 754, in load
    compiled_fn = dispatch_and_compile()
                  ^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jansel/pytorch/torch/_functorch/aot_autograd.py", line 1131, in dispatch_and_compile
    compiled_fn, _ = create_aot_dispatcher_function(
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jansel/pytorch/torch/_functorch/aot_autograd.py", line 580, in create_aot_dispatcher_function
    return _create_aot_dispatcher_function(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jansel/pytorch/torch/_functorch/aot_autograd.py", line 830, in _create_aot_dispatcher_function
    compiled_fn, fw_metadata = compiler_fn(
                               ^^^^^^^^^^^^
  File "/home/jansel/pytorch/torch/_functorch/_aot_autograd/jit_compile_runtime_wrappers.py", line 676, in aot_dispatch_autograd
    compiled_fw_func = aot_config.fw_compiler(fw_module, adjusted_flat_args)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jansel/pytorch/torch/_functorch/aot_autograd.py", line 489, in __call__
    return self.compiler_fn(gm, example_inputs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jansel/pytorch/torch/_inductor/compile_fx.py", line 1758, in fw_compiler_base
    return inner_compile(
           ^^^^^^^^^^^^^^
  File "/home/jansel/pytorch/torch/_inductor/compile_fx.py", line 572, in compile_fx_inner
    return wrap_compiler_debug(_compile_fx_inner, compiler_name="inductor")(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jansel/pytorch/torch/_dynamo/repro/after_aot.py", line 102, in debug_wrapper
    inner_compiled_fn = compiler_fn(gm, example_inputs)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jansel/pytorch/torch/_inductor/compile_fx.py", line 686, in _compile_fx_inner
    mb_compiled_graph = fx_codegen_and_compile(
                        ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jansel/pytorch/torch/_inductor/compile_fx.py", line 1129, in fx_codegen_and_compile
    return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jansel/pytorch/torch/_inductor/compile_fx.py", line 1044, in codegen_and_compile
    compiled_fn = graph.compile_to_module().call
                  ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jansel/pytorch/torch/_inductor/graph.py", line 1975, in compile_to_module
    return self._compile_to_module()
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jansel/pytorch/torch/_inductor/graph.py", line 1981, in _compile_to_module
    self.codegen_with_cpp_wrapper() if self.cpp_wrapper else self.codegen()
                                                             ^^^^^^^^^^^^^^
  File "/home/jansel/pytorch/torch/_inductor/graph.py", line 1916, in codegen
    self.scheduler.codegen()
  File "/home/jansel/pytorch/torch/_inductor/scheduler.py", line 3667, in codegen
    return self._codegen()
           ^^^^^^^^^^^^^^^
  File "/home/jansel/pytorch/torch/_inductor/scheduler.py", line 3761, in _codegen
    if device is not None and self.get_backend(device).ready_to_flush():
                              ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jansel/pytorch/torch/_inductor/scheduler.py", line 3631, in get_backend
    self.backends[device] = self.create_backend(device)
                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jansel/pytorch/torch/_inductor/scheduler.py", line 3624, in create_backend
    raise TritonMissing(inspect.currentframe())
torch._dynamo.exc.TritonMissing: Cannot find a working triton installation. Either the package is not installed or it is too old. More information on installing Triton can be found at: https://github.com/triton-lang/triton

You can suppress this exception and fall back to eager by setting:
    import torch._dynamo
    torch._dynamo.config.suppress_errors = True
```

This PR also strips dynamo stack frames from other types of backend compile errors.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/143552
Approved by: https://github.com/yanboliang
2024-12-24 21:48:23 +00:00
Oguz Ulgen
dc55704b48 Rename cache limit to recompile limit in configs (#143709)
This PR renames every cache_limit to recompile_limit via sed.

Old config options are maintained via Config(alias='xyz')

Pull Request resolved: https://github.com/pytorch/pytorch/pull/143709
Approved by: https://github.com/jansel
2024-12-22 10:03:57 +00:00
Simon Fan
d88ebbf822 cleanup chromium event log on dynamo exit rather than on entry (#143175)
Clearing at dynamo start is an issue because it throws away events from compiled autograd.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/143175
Approved by: https://github.com/Skylion007, https://github.com/jamesjwu
ghstack dependencies: #141907
2024-12-21 00:41:24 +00:00
Simon Fan
4ee166b82f [ca] add compiled autograd to CompileId (#141907)
tlparse PR: https://github.com/ezyang/tlparse/pull/83

Pull Request resolved: https://github.com/pytorch/pytorch/pull/141907
Approved by: https://github.com/ezyang
2024-12-21 00:41:24 +00:00