pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 00:21:07 +01:00

Author	SHA1	Message	Date
James Wu	be56a8d7ac	Automatically load and save dynamo entries via caching_precompile (#155913 ) This PR adds a new config option, `caching_precompile`, and a `DynamoCache`, which loads and saves Dynamo Cache entries automatically. It also hooks up DynamoCache to PrecompileContext, so that we can save multiple cache entries. When this configuration is turned on, we: - Automatically create and initialize a CompilePackage on every torch.compile - Automatically use BundledAutogradcache - Automatically save the CompilePackage entry to DynamoCache after every compile You can also use PrecompileContext.serialize() to manually serialize a full object. I've added unit tests to exhibit this behavior. Pull Request resolved: https://github.com/pytorch/pytorch/pull/155913 Approved by: https://github.com/zhxchen17	2025-07-07 23:57:17 +00:00
PyTorch MergeBot	ae1094b72b	Revert "[WIP] Automatically load and save dynamo entries via caching_precompile (#155913 )" This reverts commit `e466dab164`. Reverted https://github.com/pytorch/pytorch/pull/155913 on behalf of https://github.com/huydhn due to Sorry for reverting your change but it seems to fail a test in trunk ([comment](https://github.com/pytorch/pytorch/pull/155913#issuecomment-3045914878))	2025-07-07 16:53:35 +00:00
James Wu	e466dab164	[WIP] Automatically load and save dynamo entries via caching_precompile (#155913 ) This PR adds a new config option, `caching_precompile`, and a `DynamoCache`, which loads and saves Dynamo Cache entries automatically. It also hooks up DynamoCache to PrecompileContext, so that we can save multiple cache entries. When this configuration is turned on, we: - Automatically create and initialize a CompilePackage on every torch.compile - Automatically use BundledAutogradcache - Automatically save the CompilePackage entry to DynamoCache after every compile You can also use PrecompileContext.serialize() to manually serialize a full object. I've added unit tests to exhibit this behavior. Pull Request resolved: https://github.com/pytorch/pytorch/pull/155913 Approved by: https://github.com/zhxchen17	2025-07-07 11:56:30 +00:00
rzou	f0b388665e	Add dynamo_timed to bytecode hook (#157587 ) Test Plan: - ran tlparse on vLLM and saw this Pull Request resolved: https://github.com/pytorch/pytorch/pull/157587 Approved by: https://github.com/jingsh, https://github.com/BoyuanFeng	2025-07-04 01:11:03 +00:00
William Wen	ab2294d828	[dynamo] fix _torchdynamo_orig_callable naming issues (#156901 ) `_torchdynamo_orig_callable` was being used in two distinct places: - to get the original user function from nested eval_frame.py decorators - to get the original backend from nested convert_frame.py callbacks We rename ~the first usage to `_torchdynamo_orig_fn`~ and the second to `_torchdynamo_orig_backend` in order to distinguish these cases. UPDATE: seems like both internal and OSS users depend on `_torchdynamo_orig_callable`, but it only seems in the first context. We should thus keep the original name for the first case then. Pull Request resolved: https://github.com/pytorch/pytorch/pull/156901 Approved by: https://github.com/StrongerXi, https://github.com/jansel	2025-07-02 09:53:55 +00:00
zhxchen17	f096820d0f	[precompile] Detect source code changes for save/load. (#156432 ) Go through all dynamo traced functions and compute checksum for them. While loading a precompilation back to memory, we will always check the checksum and refuse to load when source code changes are detected. Differential Revision: [D76987123](https://our.internmc.facebook.com/intern/diff/D76987123/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/156432 Approved by: https://github.com/jansel, https://github.com/jamesjwu	2025-06-30 21:16:15 +00:00
PyTorch MergeBot	1e4c5b666a	Revert "[dynamo] fix _torchdynamo_orig_callable naming issues (#156901 )" This reverts commit `eb9efb37c8`. Reverted https://github.com/pytorch/pytorch/pull/156901 on behalf of https://github.com/huydhn due to Sorry for reverting your change but it seems to break some internal tests D77411594 ([comment](https://github.com/pytorch/pytorch/pull/156901#issuecomment-3014734151))	2025-06-28 00:37:01 +00:00
William Wen	eb9efb37c8	[dynamo] fix _torchdynamo_orig_callable naming issues (#156901 ) `_torchdynamo_orig_callable` was being used in two distinct places: - to get the original user function from nested eval_frame.py decorators - to get the original backend from nested convert_frame.py callbacks We rename the first usage to `_torchdynamo_orig_fn` and the second to `_torchdynamo_orig_backend` in order to distinguish these cases. Pull Request resolved: https://github.com/pytorch/pytorch/pull/156901 Approved by: https://github.com/StrongerXi, https://github.com/jansel ghstack dependencies: #156527	2025-06-26 23:51:08 +00:00
William Wen	80d89974c1	[dynamo] raise hard error if error is encountered while tracing resume function prologue (#154564 ) This should prevent bad resume function prologues from slipping by. In particular, graph breaks in resume function prologues will now hard error. Implementation details: - The resume function prologue is surrounded by `LOAD_CONST arg, STORE_FAST __is_tracing_resume_prologue` instructions. The first sequence has `arg=True` and the second sequence has `arg=False`. - InstructionTranslator will know when it is tracing a resume function prologue when it detects `STORE_FAST __is_tracing_resume_prologue`. The top of stack will be True to mark the start of the prologue, False to mark the end. - When `convert_frame.py` detects that an error occurred while the InstructionTranslator was tracing a resume function prologue, we will wrap the exception and hard error Pull Request resolved: https://github.com/pytorch/pytorch/pull/154564 Approved by: https://github.com/jansel ghstack dependencies: #154283, #154289, #154782, #156762, #155166	2025-06-26 21:40:38 +00:00
William Wen	dcb8982969	[dynamo] move error_on_graph_break out of config (#156762 ) error_on_graph_break doesn't need to be in config, so we move it out. It should make the functorch_maml_omniglot regression less severe. Pull Request resolved: https://github.com/pytorch/pytorch/pull/156762 Approved by: https://github.com/jansel ghstack dependencies: #154283, #154289, #154782	2025-06-26 21:40:38 +00:00
William Wen	36666033ab	[dynamo] fix set_fullgraph for nested calls (#154782 ) - Make the fullgraph argument of set_fullgraph a positional argument - Fix behavior on nested calls by updating `tracer.error_on_graph_break` in more places. In particular, a tracer's error_on_graph_break is set to the inlined tracer's error_on_graph_break upon the latter's exit. We also track error_on_graph_break in the speculation log now, since if we encounter a nested graph break, we will restart analysis and we need to somehow remember the error_on_graph_break setting after attempting to run the nested function (but we don't actually trace into it in the restart analysis). Pull Request resolved: https://github.com/pytorch/pytorch/pull/154782 Approved by: https://github.com/jansel ghstack dependencies: #154283, #154289	2025-06-26 21:40:38 +00:00
William Wen	7b7eafe7ba	[dynamo] add set_fullgraph decorator/context manager (#154289 ) Implements https://github.com/pytorch/pytorch/issues/144908. Implementation notes: - `set_fullgraph` is implemented using `patch_config`, which changes config correctly during runtime and tracing. - Moved setting `config.error_on_graph_break` from convert_frame.py to eval_frame.py. This is because this should only be done at the top-level decorated function. If we kept this in convert_frame.py, we would be changing `config.error_on_graph_break` on every top-level frame, which causes confusing behavior (see added test for example). - InstructionTranslator reads from `config.error_on_graph_break` every `step()`. This is to determine the value of `config.error_on_graph_break` at the time of the graph break, because tracer cleanup will restore the value of `config.error_on_graph_break` . - `convert_frame.py` determines whether we should abort tracing (fullgraph=True) or continue (fullgraph=False) by reading the value of the tracer's `error_on_graph_break`. If there is no tracer (failed to initialize), then default to reading `config.error_on_graph_break`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/154289 Approved by: https://github.com/jansel, https://github.com/zou3519 ghstack dependencies: #154283	2025-06-26 21:40:38 +00:00
William Wen	1c3f5e902d	[dynamo] control one_graph behavior additionally through config (#154283 ) `torch.compile` now always goes through `torch._dynamo._optimize`. fullgraph is now implemented in `torch.compile` by looking at `config.error_on_graph_break`. Export still goes through `torch._dynamo._optimize_assert`, which uses `tx.one_graph` instead of `config.error_on_graph_break`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/154283 Approved by: https://github.com/jansel, https://github.com/anijain2305	2025-06-26 21:40:38 +00:00
haozhe.zhu	53e0b9c393	refine fp32 precision api (#125888 ) Based on the [conversation](https://github.com/pytorch/pytorch/issues/121791), we plan to drop the "highest, high, medium" to represent fp32 internal computation data types . Instead, we will directly use the algorithm to represent it. ### Design Choice: Directly use algorithms name like "TF32", "BF16". #### Pros - The names are more informative. 'tf32' is more informative than a simple "high". - Easier to extend new algorithm like `tf32x3` #### Cons - "HIGHEST, HIGH, MEDIUM" indicated the relative precision between different algorithms. However, we can have more documents to discuss them. ### We provide a layered structure for backends/operators. ('f32' is short for 'fp32_precision') ![image](https://github.com/user-attachments/assets/f89143e5-d6a1-4865-9351-9a50439f5067) ### We provide 3 fp32 compute precision can be set: - "ieee": Not allowed to use any other internal computation data types . - "tf32": Allowed to use tf32 as internal computation data types. - "bf16": Allowed to use bf16 as internal computation data types. - "none": Precision's are not set. Can be override by its father node. ### Overriding Precision Settings Child node can be override by its father node if it is set to default. For current default settings: ``` backend = generic, op = all, precision setting = none backend = cuda, op = all, precision setting = none backend = cuda, op = conv, precision setting = tf32 backend = cuda, op = rnn, precision setting = tf32 backend = cuda, op = matmul, precision setting = none backend = matmul, op = all, precision setting = none backend = matmul, op = conv, precision setting = none backend = matmul, op = rnn, precision setting = none backend = matmul, op = matmul, precision setting = none ``` - If the user set `torch.backends.mkldnn.fp32_precision="bf16"`, his child nodes `torch.backends.mkldnn.matmul.fp32_precision` / `torch.backends.mkldnn.conv.fp32_precision` / `torch.backends.mkldnn.rnn.fp32_precision` will also be override to "bf16". - If the user set `torch.backends.fp32_precision="bf16"`, `torch.backends.mkldnn.fp32_precision` and his child nodes will also we override to "bf16". ### Backward Compatible Since new API allow user to have more fine-grained control. There will be some conflict. For example, previous `torch.backends.cudnn.allow_tf32` are not enough to represent the status for `torch.backends.cudnn.rnn.fp32_precision="ieee"` and `torch.backends.cudnn.conv.fp32_precision="tf32"`. Therefore, our goal for backward compatible is - If the user only uses previous APIs, it will work as previous expectations. - If the user use new API to change the status to an un-representable status for old API, and try to access the status by old API. We will raise Runtime Error and point the document for user. ### Test Plan ``` python test/test_cuda.py -k test_fp32_precision_with_tf32 python test/test_cuda.py -k test_fp32_precision_with_float32_matmul_precision python test/test_cuda.py -k test_invalid_status_for_legacy_api python test/test_mkldnn.py -k test_mlkdnn_get_set python test/test_mkldnn.py -k test_generic_precision python test/test_mkldnn.py -k test_invalid python test/test_mkldnn.py -k test_default_use_parent ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/125888 Approved by: https://github.com/jgong5, https://github.com/albanD Co-authored-by: Jiang, Yanbing <yanbing.jiang@intel.com>	2025-06-26 10:32:20 +00:00
PyTorch MergeBot	b5c8b8d09f	Revert "[dynamo] control one_graph behavior additionally through config (#154283 )" This reverts commit `b46eb1ccaf`. Reverted https://github.com/pytorch/pytorch/pull/154283 on behalf of https://github.com/ezyang due to All of this is responsible for regression, see https://github.com/pytorch/pytorch/pull/156561 ([comment](https://github.com/pytorch/pytorch/pull/154283#issuecomment-2994242583))	2025-06-22 14:22:07 +00:00
PyTorch MergeBot	5e56db59d4	Revert "[dynamo] add set_fullgraph decorator/context manager (#154289 )" This reverts commit `2c372a0502`. Reverted https://github.com/pytorch/pytorch/pull/154289 on behalf of https://github.com/ezyang due to All of this is responsible for regression, see https://github.com/pytorch/pytorch/pull/156561 ([comment](https://github.com/pytorch/pytorch/pull/154283#issuecomment-2994242583))	2025-06-22 14:22:07 +00:00
PyTorch MergeBot	c10eeb5bad	Revert "[dynamo] fix set_fullgraph for nested calls (#154782 )" This reverts commit `537b0877a8`. Reverted https://github.com/pytorch/pytorch/pull/154782 on behalf of https://github.com/ezyang due to All of this is responsible for regression, see https://github.com/pytorch/pytorch/pull/156561 ([comment](https://github.com/pytorch/pytorch/pull/154283#issuecomment-2994242583))	2025-06-22 14:22:07 +00:00
Pian Pawakapan	92c79f36db	[PGO] frame-specific whitelist logging (#155959 ) Summary: In D75617963, we started logging dynamic whitelist suggestions to PT2 Compile Events. The whitelists were aggregated across all frames, intending to avoid manual work for the user (e.g. if frame 0/1 saw L['x'] turn dynamic, and later 1/1 saw L['y'], we'd log "L['x'],L['y']" on frame 1/1). This switches to frame-specific whitelists, as attributing dynamism changes to certain frames was difficult, and suggestions are sometimes polluted by problematic frames (e.g. optimizer states). The globally aggregated whitelist is still available in tlparse, by looking at the final `put_local_code_state_*` entry. Test Plan: loggercli codegen GeneratedPt2CompileEventsLoggerConfig Rollback Plan: Differential Revision: D76628834 Pull Request resolved: https://github.com/pytorch/pytorch/pull/155959 Approved by: https://github.com/bobrenjc93	2025-06-21 06:15:51 +00:00
PyTorch MergeBot	754c04aa06	Revert "[dynamo] raise hard error if error is encountered while tracing resume function prologue (#154564 )" This reverts commit `0aed855b2b`. Reverted https://github.com/pytorch/pytorch/pull/154564 on behalf of https://github.com/ezyang due to regresses functorch_maml_omniglot ([comment](https://github.com/pytorch/pytorch/pull/154564#issuecomment-2992685744))	2025-06-20 20:18:24 +00:00
William Wen	0aed855b2b	[dynamo] raise hard error if error is encountered while tracing resume function prologue (#154564 ) This should prevent bad resume function prologues from slipping by. In particular, graph breaks in resume function prologues will now hard error. Implementation details: - The resume function prologue is surrounded by `LOAD_CONST arg, STORE_FAST __is_tracing_resume_prologue` instructions. The first sequence has `arg=True` and the second sequence has `arg=False`. - InstructionTranslator will know when it is tracing a resume function prologue when it detects `STORE_FAST __is_tracing_resume_prologue`. The top of stack will be True to mark the start of the prologue, False to mark the end. - When `convert_frame.py` detects that an error occurred while the InstructionTranslator was tracing a resume function prologue, we will wrap the exception and hard error Pull Request resolved: https://github.com/pytorch/pytorch/pull/154564 Approved by: https://github.com/jansel ghstack dependencies: #154283, #154289, #154782, #155166	2025-06-20 07:03:29 +00:00
William Wen	537b0877a8	[dynamo] fix set_fullgraph for nested calls (#154782 ) - Make the fullgraph argument of set_fullgraph a positional argument - Fix behavior on nested calls by updating `tracer.error_on_graph_break` in more places. In particular, a tracer's error_on_graph_break is set to the inlined tracer's error_on_graph_break upon the latter's exit. We also track error_on_graph_break in the speculation log now, since if we encounter a nested graph break, we will restart analysis and we need to somehow remember the error_on_graph_break setting after attempting to run the nested function (but we don't actually trace into it in the restart analysis). Pull Request resolved: https://github.com/pytorch/pytorch/pull/154782 Approved by: https://github.com/jansel ghstack dependencies: #154283, #154289	2025-06-20 07:03:16 +00:00
William Wen	2c372a0502	[dynamo] add set_fullgraph decorator/context manager (#154289 ) Implements https://github.com/pytorch/pytorch/issues/144908. Implementation notes: - `set_fullgraph` is implemented using `patch_config`, which changes config correctly during runtime and tracing. - Moved setting `config.error_on_graph_break` from convert_frame.py to eval_frame.py. This is because this should only be done at the top-level decorated function. If we kept this in convert_frame.py, we would be changing `config.error_on_graph_break` on every top-level frame, which causes confusing behavior (see added test for example). - InstructionTranslator reads from `config.error_on_graph_break` every `step()`. This is to determine the value of `config.error_on_graph_break` at the time of the graph break, because tracer cleanup will restore the value of `config.error_on_graph_break` . - `convert_frame.py` determines whether we should abort tracing (fullgraph=True) or continue (fullgraph=False) by reading the value of the tracer's `error_on_graph_break`. If there is no tracer (failed to initialize), then default to reading `config.error_on_graph_break`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/154289 Approved by: https://github.com/jansel, https://github.com/zou3519 ghstack dependencies: #154283	2025-06-20 07:03:07 +00:00
William Wen	b46eb1ccaf	[dynamo] control one_graph behavior additionally through config (#154283 ) `torch.compile` now always goes through `torch._dynamo._optimize`. fullgraph is now implemented in `torch.compile` by looking at `config.error_on_graph_break`. Export still goes through `torch._dynamo._optimize_assert`, which uses `tx.one_graph` instead of `config.error_on_graph_break`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/154283 Approved by: https://github.com/jansel, https://github.com/anijain2305	2025-06-20 07:02:57 +00:00
Chen Haifeng	134dfb3fe6	[dynamo] Fix cycle reference problem caused by recursive collect_temp_source in codegen (#155791 ) Recursive function collect_temp_source with closure in PyCodegen caused cycle reference issue when torch.compile is used. This issue may cause major tensors will not freed timely even there are no user references to these tensors. We saw OOM issues because of this problem in many cases including training and inference using torch.compile. The fix is to use iterative function implementation to replace the recursive function implementation. Fixes #155778 Pull Request resolved: https://github.com/pytorch/pytorch/pull/155791 Approved by: https://github.com/ezyang	2025-06-19 19:37:44 +00:00
PyTorch MergeBot	ce3406817d	Revert "[dynamo] control one_graph behavior additionally through config (#154283 )" This reverts commit `fe37db4f12`. Reverted https://github.com/pytorch/pytorch/pull/154283 on behalf of https://github.com/atalman due to inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_do_not_trigger_dynamic_shapes_on_empty_block_mask_cuda GH job link HUD commit link ([comment](https://github.com/pytorch/pytorch/pull/154283#issuecomment-2984795214))	2025-06-18 15:53:32 +00:00
PyTorch MergeBot	c5d3e7a4ff	Revert "[dynamo] add set_fullgraph decorator/context manager (#154289 )" This reverts commit `920f6e681e`. Reverted https://github.com/pytorch/pytorch/pull/154289 on behalf of https://github.com/atalman due to inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_do_not_trigger_dynamic_shapes_on_empty_block_mask_cuda GH job link HUD commit link ([comment](https://github.com/pytorch/pytorch/pull/154289#issuecomment-2984774814))	2025-06-18 15:51:06 +00:00
PyTorch MergeBot	408d9884b0	Revert "[dynamo] fix set_fullgraph for nested calls (#154782 )" This reverts commit `3c8c48f793`. Reverted https://github.com/pytorch/pytorch/pull/154782 on behalf of https://github.com/atalman due to inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_do_not_trigger_dynamic_shapes_on_empty_block_mask_cuda GH job link HUD commit link ([comment](https://github.com/pytorch/pytorch/pull/154782#issuecomment-2984764330))	2025-06-18 15:47:21 +00:00
PyTorch MergeBot	8f02161d10	Revert "[dynamo] raise hard error if error is encountered while tracing resume function prologue (#154564 )" This reverts commit `a6a3a44144`. Reverted https://github.com/pytorch/pytorch/pull/154564 on behalf of https://github.com/atalman due to inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_do_not_trigger_dynamic_shapes_on_empty_block_mask_cuda [GH job link](https://github.com/pytorch/pytorch/actions/runs/15726606697/job/44333233942) [HUD commit link](`a6a3a44144`) ([comment](https://github.com/pytorch/pytorch/pull/154564#issuecomment-2984409088))	2025-06-18 14:19:39 +00:00
William Wen	a6a3a44144	[dynamo] raise hard error if error is encountered while tracing resume function prologue (#154564 ) This should prevent bad resume function prologues from slipping by. In particular, graph breaks in resume function prologues will now hard error. Implementation details: - The resume function prologue is surrounded by `LOAD_CONST arg, STORE_FAST __is_tracing_resume_prologue` instructions. The first sequence has `arg=True` and the second sequence has `arg=False`. - InstructionTranslator will know when it is tracing a resume function prologue when it detects `STORE_FAST __is_tracing_resume_prologue`. The top of stack will be True to mark the start of the prologue, False to mark the end. - When `convert_frame.py` detects that an error occurred while the InstructionTranslator was tracing a resume function prologue, we will wrap the exception and hard error Pull Request resolved: https://github.com/pytorch/pytorch/pull/154564 Approved by: https://github.com/jansel ghstack dependencies: #154283, #154289, #154782, #155166	2025-06-18 07:27:20 +00:00
William Wen	3c8c48f793	[dynamo] fix set_fullgraph for nested calls (#154782 ) - Make the fullgraph argument of set_fullgraph a positional argument - Fix behavior on nested calls by updating `tracer.error_on_graph_break` in more places. In particular, a tracer's error_on_graph_break is set to the inlined tracer's error_on_graph_break upon the latter's exit. We also track error_on_graph_break in the speculation log now, since if we encounter a nested graph break, we will restart analysis and we need to somehow remember the error_on_graph_break setting after attempting to run the nested function (but we don't actually trace into it in the restart analysis). Pull Request resolved: https://github.com/pytorch/pytorch/pull/154782 Approved by: https://github.com/jansel ghstack dependencies: #154283, #154289	2025-06-18 07:27:09 +00:00
William Wen	920f6e681e	[dynamo] add set_fullgraph decorator/context manager (#154289 ) Implements https://github.com/pytorch/pytorch/issues/144908. Implementation notes: - `set_fullgraph` is implemented using `patch_config`, which changes config correctly during runtime and tracing. - Moved setting `config.error_on_graph_break` from convert_frame.py to eval_frame.py. This is because this should only be done at the top-level decorated function. If we kept this in convert_frame.py, we would be changing `config.error_on_graph_break` on every top-level frame, which causes confusing behavior (see added test for example). - InstructionTranslator reads from `config.error_on_graph_break` every `step()`. This is to determine the value of `config.error_on_graph_break` at the time of the graph break, because tracer cleanup will restore the value of `config.error_on_graph_break` . - `convert_frame.py` determines whether we should abort tracing (fullgraph=True) or continue (fullgraph=False) by reading the value of the tracer's `error_on_graph_break`. If there is no tracer (failed to initialize), then default to reading `config.error_on_graph_break`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/154289 Approved by: https://github.com/jansel, https://github.com/zou3519 ghstack dependencies: #154283	2025-06-18 07:27:00 +00:00
William Wen	fe37db4f12	[dynamo] control one_graph behavior additionally through config (#154283 ) `torch.compile` now always goes through `torch._dynamo._optimize`. fullgraph is now implemented in `torch.compile` by looking at `config.error_on_graph_break`. Export still goes through `torch._dynamo._optimize_assert`, which uses `tx.one_graph` instead of `config.error_on_graph_break`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/154283 Approved by: https://github.com/jansel, https://github.com/anijain2305	2025-06-18 07:26:52 +00:00
James Wu	b2fc9cfea1	[precompile] Add CompilePackage to serialize dynamo states. (#155118 ) Adding a per torch.compile() object CompilePackage which tracks dynamo artifact. CompilePackage is considered a low level component and should not be directly exposed to end users. It has the following interface: 1. `CompilePackage.__init__()` which optionally takes previously serialized dynamo states. a. when `dynamo` argument is None, it will contruct a brand new CompilePackage object. b. when `dynamo` argument is not None, it will load a pre-compiled dynamo state. 2. `package.save()` which dumps the dynamo states into _DynamoCacheEntry. 3. `package.install(backends)` which will handle all the side-effectful global scope updates with compiled functions and resume functions. This diff focus on making the low level mechanism for precompile. It will be left to upper level interface to use these API to build more user-facing frontend. Differential Revision: [D75956538](https://our.internmc.facebook.com/intern/diff/D75956538/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/155118 Approved by: https://github.com/jamesjwu Co-authored-by: James Wu <jjwu@meta.com>	2025-06-13 13:54:10 +00:00
Joel Schlosser	c4b93e6579	Replace frame_traced_fn hook with get_traced_code() util (#155249 ) #153622 introduced a hook for getting the relevant code objects after frame tracing. The idea is to have vLLM use this instead of monkey-patching `inline_call_()` to determine the source code files to hash. Unfortunately, the hook runs too late; the vLLM backend needs access to the set of source code filenames while it's running. This PR replaces the newly-added hook with a utility function that a backend can call to get this information. I've made the change in vLLM and can verify that this allows the information to be queried at the right time. Pull Request resolved: https://github.com/pytorch/pytorch/pull/155249 Approved by: https://github.com/zou3519	2025-06-10 22:40:58 +00:00
Simon Fan	28796f71d0	Redo D75092426: [internal] Expose additional metadata to compilation callbacks (#155063 ) Originally https://github.com/pytorch/pytorch/pull/153596 --------------- Summary: via reverting D75708685 gate the ROCm failure Test Plan: Unit tests in OSS, sandcastle Rollback Plan: Bifferential Revision: D75894349 Pull Request resolved: https://github.com/pytorch/pytorch/pull/155063 Approved by: https://github.com/masnesral	2025-06-05 23:40:31 +00:00
PyTorch MergeBot	35fc5c49b4	Revert "[internal] Expose additional metadata to compilation callbacks (#153596 )" This reverts commit `f889dea97d`. Reverted https://github.com/pytorch/pytorch/pull/153596 on behalf of https://github.com/izaitsevfb due to introduces bunch of callback-related failures on rocm ([comment](https://github.com/pytorch/pytorch/pull/153596#issuecomment-2923139061))	2025-05-30 18:39:27 +00:00
Simon Fan	f889dea97d	[internal] Expose additional metadata to compilation callbacks (#153596 ) These hooks are used by internal stuck job detection to associate compilation events with the compile lease. Previously, we only had events for Dynamo and Inductor compilation. And recently, the callback handler was updated to ignore nested events. So the Inductor event was only really used by lazy backward. Here, I remove the inductor event, and add an explicit lazy backward one. Additionally, I add other runtime compilation events: autotuning and cudagraphs. I also expose the CompileId as a string to avoid imports, this will let internal UIs track each graph's contribution to the timeout. ```python class CallbackTrigger(enum.Enum): # most common case, dynamo attempts to trace a new frame DYNAMO = 1 # backward compilation can be deferred to runtime LAZY_BACKWARD = 2 # some backends autotune at runtime TRITON_AUTOTUNING = 3 # cudagraphs record at runtime CUDAGRAPH_RECORDING = 4 ``` Differential Revision: [D75092426](https://our.internmc.facebook.com/intern/diff/D75092426) Pull Request resolved: https://github.com/pytorch/pytorch/pull/153596 Approved by: https://github.com/masnesral	2025-05-30 08:07:04 +00:00
Zhengxu Chen	0f56318152	[precompile] Add Exception type PackageError for unsupported precompile features. (#154430 ) Summary: Today when guard serialization fails, dynamo will raise an internal error like: ``` torch._dynamo.exc.InternalTorchDynamoError: RuntimeError: CLOSURE_MATCH guard cannot be serialized. ``` Adding a dedicated PackageError type to surface the error more clearly. Test Plan: CI Differential Revision: D75452124 Pull Request resolved: https://github.com/pytorch/pytorch/pull/154430 Approved by: https://github.com/jamesjwu, https://github.com/jansel	2025-05-28 22:34:51 +00:00
Joel Schlosser	9db7bcb3fe	[Dynamo] Introduce hook receiving list of traced code objects (#153622 ) This PR: * Expands `Hooks` with a new, optional `frame_traced_fn` field. It should be a callable receiving the list of traced code objects * Maintains a list of `traced_code` objects in the `TracingContext` of an `OutputGraph` * Whenever an `inline_call()` is encountered, the corresponding code object is added to this set * `OutputGraph`'s associated `f_code` is added to the list just before the hook is called I believe use of this hook should enable the source code hashing that vLLM does in a better way than monkey-patching `inline_call()`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/153622 Approved by: https://github.com/jansel	2025-05-28 15:40:09 +00:00
Animesh Jain	7fdd754136	[compile-time traces] Profile large missing gaps in compile time (#151256 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/151256 Approved by: https://github.com/bdhirsh, https://github.com/masnesral, https://github.com/zou3519, https://github.com/jansel	2025-05-13 14:44:51 +00:00
PyTorch MergeBot	fdc387ec7c	Revert "refine fp32 precision api (#125888 )" This reverts commit `4c11b26158`. Reverted https://github.com/pytorch/pytorch/pull/125888 on behalf of https://github.com/huydhn due to Sorry for reverting your change but it seems to cause some failures on ROCm ([comment](https://github.com/pytorch/pytorch/pull/125888#issuecomment-2869274791))	2025-05-11 00:35:46 +00:00
haozhe.zhu	4c11b26158	refine fp32 precision api (#125888 ) Based on the [conversation](https://github.com/pytorch/pytorch/issues/121791), we plan to drop the "highest, high, medium" to represent fp32 internal computation data types . Instead, we will directly use the algorithm to represent it. ### Design Choice: Directly use algorithms name like "TF32", "BF16". #### Pros - The names are more informative. 'tf32' is more informative than a simple "high". - Easier to extend new algorithm like `tf32x3` #### Cons - "HIGHEST, HIGH, MEDIUM" indicated the relative precision between different algorithms. However, we can have more documents to discuss them. ### We provide a layered structure for backends/operators. ('f32' is short for 'fp32_precision') ![image](https://github.com/user-attachments/assets/f89143e5-d6a1-4865-9351-9a50439f5067) ### We provide 3 fp32 compute precision can be set: - "ieee": Not allowed to use any other internal computation data types . - "tf32": Allowed to use tf32 as internal computation data types. - "bf16": Allowed to use bf16 as internal computation data types. - "none": Precision's are not set. Can be override by its father node. ### Overriding Precision Settings Child node can be override by its father node if it is set to default. For current default settings: ``` backend = generic, op = all, precision setting = none backend = cuda, op = all, precision setting = none backend = cuda, op = conv, precision setting = tf32 backend = cuda, op = rnn, precision setting = tf32 backend = cuda, op = matmul, precision setting = none backend = matmul, op = all, precision setting = none backend = matmul, op = conv, precision setting = none backend = matmul, op = rnn, precision setting = none backend = matmul, op = matmul, precision setting = none ``` - If the user set `torch.backends.mkldnn.fp32_precision="bf16"`, his child nodes `torch.backends.mkldnn.matmul.fp32_precision` / `torch.backends.mkldnn.conv.fp32_precision` / `torch.backends.mkldnn.rnn.fp32_precision` will also be override to "bf16". - If the user set `torch.backends.fp32_precision="bf16"`, `torch.backends.mkldnn.fp32_precision` and his child nodes will also we override to "bf16". ### Backward Compatible Since new API allow user to have more fine-grained control. There will be some conflict. For example, previous `torch.backends.cudnn.allow_tf32` are not enough to represent the status for `torch.backends.cudnn.rnn.fp32_precision="ieee"` and `torch.backends.cudnn.conv.fp32_precision="tf32"`. Therefore, our goal for backward compatible is - If the user only uses previous APIs, it will work as previous expectations. - If the user use new API to change the status to an un-representable status for old API, and try to access the status by old API. We will raise Runtime Error and point the document for user. ### Test Plan ``` python test/test_cuda.py -k test_fp32_precision_with_tf32 python test/test_cuda.py -k test_fp32_precision_with_float32_matmul_precision python test/test_cuda.py -k test_invalid_status_for_legacy_api python test/test_mkldnn.py -k test_mlkdnn_get_set python test/test_mkldnn.py -k test_generic_precision python test/test_mkldnn.py -k test_invalid python test/test_mkldnn.py -k test_default_use_parent ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/125888 Approved by: https://github.com/jgong5, https://github.com/albanD Co-authored-by: Jiang, Yanbing <yanbing.jiang@intel.com>	2025-05-10 11:13:04 +00:00
William Wen	5b9df57b50	[dynamo] context manager/decorator for dynamo config patching during tracing (#150586 ) Implement traceable config patching for Dynamo: enables restricted patching of Dynamo config where user can use a context manager/decorator to change tracing behavior for parts of the code. The new `dont_skip_tracing` decorator/context manager for ignoring most trace rules is easily implemented with this more generic traceable config patching feature. Implementation: - Create a new specialized context manager class representing a wrapper around torch._dynamo.config.patch - Dynamo doesn't trace into the context manager but updates config at compile time - Correctness is based on our correctness for handling supported context managers - Implementation is inspired by how `GradModeVariable` is implemented. Previous attempts: https://github.com/pytorch/pytorch/pull/148736 (decorator-only global approach) and https://github.com/pytorch/pytorch/pull/149439 (decorator-only traceback approach) See https://docs.google.com/document/d/1vWNwKL_jpg-PLopifcaSa338wks3GqSVF4GHRguybGg/edit?tab=t.0 for more details on implementation - including previous approaches. NOTE: this PR fixes a bug where skipped code objects were not tracked by convert_frame.py, leading to cases where code objects would be automatically skipped even after `torch._dynamo.reset()`. This exposed some latent dynamo-wrapped test failures in CI that previously passed in CI but not locally. Pull Request resolved: https://github.com/pytorch/pytorch/pull/150586 Approved by: https://github.com/jansel, https://github.com/zou3519, https://github.com/anijain2305	2025-04-23 09:12:13 +00:00
PyTorch MergeBot	0bb9b89fb7	Revert "[compile][compile time traces] Add more dynamo traces (#151357 )" This reverts commit `607443b16b`. Reverted https://github.com/pytorch/pytorch/pull/151357 on behalf of https://github.com/wdvr due to stack in a weird state - reverting for now ([comment](https://github.com/pytorch/pytorch/pull/151357#issuecomment-2822369232))	2025-04-22 20:12:44 +00:00
Sam Larsen	529f698ad4	[logging] Put "everything" WaitCounters in dynamo_timed (#151757 ) Summary: The main motivation is to capture the cudagraphs overhead in a WaitCounter. We'll combine that with Triton autotuning, and therefore rename to "compile_runtime_overheads". Since we have a couple WaitCounters where we want to capture all runtime and compile overheads, let's put the accounting in dynamo_timed so we'll automatically capture any toplevel timed regions that get added in the future. Also, dynamo_timed already has to figure out if we're timing a runtime vs. compile-time event, so we can reuse some of that logic. Test Plan: Ran an internal model with `TORCHINDUCTOR_BENCHMARK_FUSION=1` (to get benchmarking at compile time in addition to runtime). Overall compile time from various sources matches up: * tlparse: https://fburl.com/9fgsstkr. Eyeballing, total time should be 32 ranks x 2175 = ~69.6k s * ods: https://fburl.com/canvas/r4clhnb7. Right on. * dynamo_compile: https://fburl.com/scuba/dynamo_compile/ax71aqox. Right on. * pt2_compile_events: https://fburl.com/scuba/pt2_compile_events/shcjd9ql. Right on. And the runtime overhead: * ods: https://fburl.com/canvas/nvgjb282 * dynamo_compile: https://fburl.com/scuba/dynamo_compile/f2dtv0qh If we compare that to a run of the same model without the changes in this stack, results can mismatch by a lot: * tlparse: https://fburl.com/cchxwd1s. Eyeballing, total time should be 32 ranks x 2300s = ~73.5k s * ods: https://fburl.com/canvas/x1i3wvf4. It's kinda close * dynamo_compile: https://fburl.com/scuba/dynamo_compile/l7sgxdxd. Waaay too high. * pt2_compile_events: https://fburl.com/scuba/pt2_compile_events/jb4s9z1u. This is the only one that's actually correct. The discrepancy is even worse if we focus on the runtime events: * ods: https://fburl.com/canvas/a4o9f7ou * dynamo_compile: https://fburl.com/scuba/dynamo_compile/95izaes1 Pull Request resolved: https://github.com/pytorch/pytorch/pull/151757 Approved by: https://github.com/ppanchalia ghstack dependencies: #151749	2025-04-22 03:29:13 +00:00
PyTorch MergeBot	fd04c79878	Revert "[aot autograd][logging] Profile large missing gaps in compile time tracing (#151256 )" This reverts commit `8e373592c8`. Reverted https://github.com/pytorch/pytorch/pull/151256 on behalf of https://github.com/Camyll due to breaking internal tests, cannot import ([comment](https://github.com/pytorch/pytorch/pull/151256#issuecomment-2819244186))	2025-04-21 18:49:23 +00:00
Animesh Jain	607443b16b	[compile][compile time traces] Add more dynamo traces (#151357 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/151357 Approved by: https://github.com/williamwen42 ghstack dependencies: #151330, #151256	2025-04-16 20:37:08 +00:00
Animesh Jain	8e373592c8	[aot autograd][logging] Profile large missing gaps in compile time tracing (#151256 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/151256 Approved by: https://github.com/bdhirsh, https://github.com/masnesral ghstack dependencies: #151330	2025-04-16 20:37:08 +00:00
PyTorch MergeBot	6a3a6d22dc	Revert "[dynamo] context manager/decorator for dynamo config patching during tracing (#150586 )" This reverts commit `40ce4fb24a`. Reverted https://github.com/pytorch/pytorch/pull/150586 on behalf of https://github.com/clee2000 due to broke some inductor tests? inductor/test_fuzzer.py::TestConfigFuzzer::test_config_fuzzer_dynamo_bisect [GH job link](https://github.com/pytorch/pytorch/actions/runs/14486513628/job/40635178179) [HUD commit link](`40ce4fb24a`), bad TD ([comment](https://github.com/pytorch/pytorch/pull/150586#issuecomment-2810064322))	2025-04-16 16:13:47 +00:00
William Wen	40ce4fb24a	[dynamo] context manager/decorator for dynamo config patching during tracing (#150586 ) Implement traceable config patching for Dynamo: enables restricted patching of Dynamo config where user can use a context manager/decorator to change tracing behavior for parts of the code. The new `dont_skip_tracing` decorator/context manager for ignoring most trace rules is easily implemented with this more generic traceable config patching feature. Implementation: - Create a new specialized context manager class representing a wrapper around torch._dynamo.config.patch - Dynamo doesn't trace into the context manager but updates config at compile time - Correctness is based on our correctness for handling supported context managers - Implementation is inspired by how `GradModeVariable` is implemented. Previous attempts: https://github.com/pytorch/pytorch/pull/148736 (decorator-only global approach) and https://github.com/pytorch/pytorch/pull/149439 (decorator-only traceback approach) See https://docs.google.com/document/d/1vWNwKL_jpg-PLopifcaSa338wks3GqSVF4GHRguybGg/edit?tab=t.0 for more details on implementation - including previous approaches. NOTE: this PR fixes a bug where skipped code objects were not tracked by convert_frame.py, leading to cases where code objects would be automatically skipped even after `torch._dynamo.reset()`. This exposed some latent dynamo-wrapped test failures in CI that previously passed in CI but not locally. Pull Request resolved: https://github.com/pytorch/pytorch/pull/150586 Approved by: https://github.com/jansel, https://github.com/zou3519, https://github.com/anijain2305	2025-04-16 06:49:58 +00:00

1 2 3 4 5 ...

355 Commits