pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 12:21:27 +01:00

Author	SHA1	Message	Date
Michael Lazos	96c36a6947	[Dynamo] Implement graph region tracking for deduplication (#141381 ) This PR implements graph region tracking for later extraction into common subgraphs. The algorithm is as follows: `GraphRegionTracker` tracks each node added to the output graph and generates a key based on the source location, instruction pointer, input shapes, and global state at the time the node is inserted into the graph. Nodes with the same key are grouped together in a list of identical nodes. Once graph capture is complete, these nodes are organized into region groups. A region group looks like this: [[IdenticalNode1], [IdenticalNode2], [IdenticalNode3]] and each sublist is called a region. For each region group (starting at the topologically latest region group), the inner regions are gradually expanded one node at a time from the args and kwargs of the node in each region, provided that for all regions in the group, the nodes being added are also identical (i.e., have the same key computed above). The `get_identical_regions` function is the main entry point which will be used by the graph replacement algorithm in #141383 Edge cases to add more testing for in future PRs (in progress): * ~~multiple nodes on the same line~~ (implemented) * ~~dynamic shapes checking (need to verify symbolic inputs are the same across subgraphs)~~ (implemented) * ensure we don't expand regions where it will create a cycle during subgraph replacement * ensure outputs are always tensors (or tuples of tensors iirc) * ~~out of order kwargs, unevenly nested kwargs~~ (implemented) * input aliasing - TBD, we may add support for this in `invoke_subgraph` or reuse the aliasing analysis here to not form regions with these properties * ~~all global state~~ (implemented) Other followups: * consolidate global state checking across all caching infra Pull Request resolved: https://github.com/pytorch/pytorch/pull/141381 Approved by: https://github.com/zou3519	2024-12-11 02:22:21 +00:00
PyTorch MergeBot	e7de245ee1	Revert "[reland][dynamo][guards] Consider tensors as immutable for dict tag matches (#141085 )" This reverts commit `8bfc0094e4`. Reverted https://github.com/pytorch/pytorch/pull/141085 on behalf of https://github.com/williamwen42 due to internal regression ([comment](https://github.com/pytorch/pytorch/pull/141085#issuecomment-2522403360))	2024-12-06 07:50:10 +00:00
Animesh Jain	8bfc0094e4	[reland][dynamo][guards] Consider tensors as immutable for dict tag matches (#141085 ) Reland - https://github.com/pytorch/pytorch/pull/139560 As mentioned in https://github.com/pytorch/pytorch/pull/130341, using `static py::object` can lead to segfaults. I suspect this is the reason for the import system error seen internally (https://www.internalfb.com/sevmanager/view/469592). In this PR, I am removing the `static` part. This is fine and also the right thing to do, because it will catch the case where the user changes the flag in the same process while compiling two different functions. Unfortunately, there is no easy way to trigger this segfault, so I can't write a test. Pull Request resolved: https://github.com/pytorch/pytorch/pull/141085 Approved by: https://github.com/jansel Co-authored-by: William Wen <williamwen@meta.com>	2024-12-06 01:49:55 +00:00
snahir	16ea0ddcdb	Ignore logger methods to avoid graph breaks (#139403 ) Fixes #132635 Calls to logging.logger cause a graph break; this PR allows the user to avoid these graph breaks (for specific methods) by setting DISABLE_LOGS_WHILE_COMPILING to 1. Pull Request resolved: https://github.com/pytorch/pytorch/pull/139403 Approved by: https://github.com/williamwen42	2024-12-05 20:12:26 +00:00
Bob Ren	a5ec09d0cd	Flip specialize_float to default False in fbcode (#142111 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/142111 Approved by: https://github.com/ezyang	2024-12-05 18:23:47 +00:00
William Wen	408669a559	[dynamo, 3.13] disable 3.13.0 warning in dynamo-wrapped tests (#141860 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/141860 Approved by: https://github.com/StrongerXi, https://github.com/atalman ghstack dependencies: #141409, #142003, #141572, #141577, #141605, #141621, #141623, #141673, #141674, #141858, #141862, #139533, #140733, #141859	2024-12-05 00:33:26 +00:00
Bob Ren	43c5f59190	flip capture_autograd_function to default to true and warn if false (#141972 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/141972 Approved by: https://github.com/zou3519 ghstack dependencies: #141932	2024-12-03 19:50:14 +00:00
Bob Ren	e1e3bbc2e1	Set capture_autograd_function=False by default (#141932 ) https://github.com/pytorch/pytorch/pull/136959 cleaned up the flag and added a warning. @Chillee pointed out that we should really default this flag to false otherwise we subject all users that go down this code path to log spew. Pull Request resolved: https://github.com/pytorch/pytorch/pull/141932 Approved by: https://github.com/jansel	2024-12-03 07:59:03 +00:00
Bob Ren	2f72635a5c	automatic dynamic unspecialize float (#141647 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/141647 Approved by: https://github.com/ezyang	2024-11-29 22:36:53 +00:00
PyTorch MergeBot	9e98b3d73c	Revert "automatic dynamic unspecialize float (#141647 )" This reverts commit `1a32daeb17`. Reverted https://github.com/pytorch/pytorch/pull/141647 on behalf of https://github.com/atalman due to functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_inner_grad [GH job link](https://github.com/pytorch/pytorch/actions/runs/12080983316/job/33697901875) [HUD commit link](`1a32daeb17`) ([comment](https://github.com/pytorch/pytorch/pull/141647#issuecomment-2507980876))	2024-11-29 15:00:33 +00:00
Bob Ren	1a32daeb17	automatic dynamic unspecialize float (#141647 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/141647 Approved by: https://github.com/ezyang	2024-11-29 07:53:53 +00:00
PyTorch MergeBot	ad37afd590	Revert "Always unspecialize float in OSS (#138922 )" This reverts commit `ba5253da9b`. Reverted https://github.com/pytorch/pytorch/pull/138922 on behalf of https://github.com/yf225 due to perf regression on torchbench ([comment](https://github.com/pytorch/pytorch/pull/138922#issuecomment-2499277511))	2024-11-26 00:03:03 +00:00
Bob Ren	ba5253da9b	Always unspecialize float in OSS (#138922 ) Fixes https://github.com/pytorch/pytorch/issues/107277 Pull Request resolved: https://github.com/pytorch/pytorch/pull/138922 Approved by: https://github.com/ezyang Co-authored-by: Edward Z. Yang <ezyang@meta.com>	2024-11-24 01:58:13 +00:00
PyTorch MergeBot	a8c90e5140	Revert "Always unspecialize float in OSS (#138922 )" This reverts commit `6d779d0549`. Reverted https://github.com/pytorch/pytorch/pull/138922 on behalf of https://github.com/huydhn due to Sorry for reverting your change but there is some slow tests failing after this land ([comment](https://github.com/pytorch/pytorch/pull/138922#issuecomment-2495076878))	2024-11-22 23:18:36 +00:00
Bob Ren	6d779d0549	Always unspecialize float in OSS (#138922 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/138922 Approved by: https://github.com/ezyang Co-authored-by: Edward Z. Yang <ezyang@meta.com>	2024-11-22 17:54:42 +00:00
PyTorch MergeBot	d276688da6	Revert "[dynamo][guards] Consider tensors as immutable for dict tag matches (#139560 )" This reverts commit `b09eb6ed6a`. Reverted https://github.com/pytorch/pytorch/pull/139560 on behalf of https://github.com/anijain2305 due to internal test failures ([comment](https://github.com/pytorch/pytorch/pull/139560#issuecomment-2486344859))	2024-11-19 17:37:44 +00:00
Animesh Jain	b09eb6ed6a	[dynamo][guards] Consider tensors as immutable for dict tag matches (#139560 ) This is a bug on main exposed by https://github.com/pytorch/pytorch/issues/139476 We have a dict tag optimization where, if the dict tag does not change, we skip guards on all the items of the dict that are "immutable". We considered tensors as immutable in such scenarios. This is critical for guard eval performance, because generally users don't change their parameters. If I try to remove this optimization, we see slowdowns, e.g., 3.03x to 2.95x on the conv_mixer TIMM benchmark. So, I am adding a flag which keeps the current state but allows users to remove this optimization. Not ideal, but given how serious guard eval perf has to be, we are in the gray area of the unsoundness vs. performance tradeoff. Pull Request resolved: https://github.com/pytorch/pytorch/pull/139560 Approved by: https://github.com/jansel	2024-11-05 21:48:07 +00:00
PyTorch MergeBot	4d5cc1b4ef	Revert "[dynamo][guards] Consider tensors as immutable for dict tag matches (#139560 )" This reverts commit `e6ff07f00e`. Reverted https://github.com/pytorch/pytorch/pull/139560 on behalf of https://github.com/ZainRizvi due to Sorry but this seems to be breaking internal tests. Please see D65430317 for more details ([comment](https://github.com/pytorch/pytorch/pull/139560#issuecomment-2457620720))	2024-11-05 16:22:30 +00:00
Animesh Jain	e6ff07f00e	[dynamo][guards] Consider tensors as immutable for dict tag matches (#139560 ) This is a bug on main exposed by https://github.com/pytorch/pytorch/issues/139476 We have a dict tag optimization where, if the dict tag does not change, we skip guards on all the items of the dict that are "immutable". We considered tensors as immutable in such scenarios. This is critical for guard eval performance, because generally users don't change their parameters. If I try to remove this optimization, we see slowdowns, e.g., 3.03x to 2.95x on the conv_mixer TIMM benchmark. So, I am adding a flag which keeps the current state but allows users to remove this optimization. Not ideal, but given how serious guard eval perf has to be, we are in the gray area of the unsoundness vs. performance tradeoff. Pull Request resolved: https://github.com/pytorch/pytorch/pull/139560 Approved by: https://github.com/jansel	2024-11-04 00:54:20 +00:00
Edward Z. Yang	585dbfa583	Profile guided optimization for automatic_dynamic (#139001 ) Previously: https://github.com/pytorch/pytorch/pull/138052 but the implementation is done from scratch, so I opened a new PR. This implements the ability to save and load profiles of automatic dynamic decisions, so on subsequent runs we can directly make something automatically dynamic. Unlike the previous implementation, this cache is never enabled by default; instead, you have to specify a "job id" that says it's OK to share results. We will be able to automatically populate this id for internal MAST jobs but for generic OSS users you will have to explicitly opt into it. Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/139001 Approved by: https://github.com/oulgen	2024-11-03 06:29:57 +00:00
PyTorch MergeBot	92d7f29e59	Revert "Profile guided optimization for automatic_dynamic (#139001 )" This reverts commit `f6be44c74e`. Reverted https://github.com/pytorch/pytorch/pull/139001 on behalf of https://github.com/ezyang due to more fbcode errors ([comment](https://github.com/pytorch/pytorch/pull/139001#issuecomment-2452985581))	2024-11-02 13:11:04 +00:00
Edward Z. Yang	f6be44c74e	Profile guided optimization for automatic_dynamic (#139001 ) Previously: https://github.com/pytorch/pytorch/pull/138052 but the implementation is done from scratch, so I opened a new PR. This implements the ability to save and load profiles of automatic dynamic decisions, so on subsequent runs we can directly make something automatically dynamic. Unlike the previous implementation, this cache is never enabled by default; instead, you have to specify a "job id" that says it's OK to share results. We will be able to automatically populate this id for internal MAST jobs but for generic OSS users you will have to explicitly opt into it. Signed-off-by: Edward Z. Yang <ezyang@meta.com> Differential Revision: [D65065497](https://our.internmc.facebook.com/intern/diff/D65065497) Pull Request resolved: https://github.com/pytorch/pytorch/pull/139001 Approved by: https://github.com/oulgen	2024-11-02 11:50:11 +00:00
PyTorch MergeBot	8d1eaa3da6	Revert "Profile guided optimization for automatic_dynamic (#139001 )" This reverts commit `a6630bcf87`. Reverted https://github.com/pytorch/pytorch/pull/139001 on behalf of https://github.com/ezyang due to internal code triggers import cycle ([comment](https://github.com/pytorch/pytorch/pull/139001#issuecomment-2452833882))	2024-11-02 03:38:15 +00:00
Edward Z. Yang	a6630bcf87	Profile guided optimization for automatic_dynamic (#139001 ) Previously: https://github.com/pytorch/pytorch/pull/138052 but the implementation is done from scratch, so I opened a new PR. This implements the ability to save and load profiles of automatic dynamic decisions, so on subsequent runs we can directly make something automatically dynamic. Unlike the previous implementation, this cache is never enabled by default; instead, you have to specify a "job id" that says it's OK to share results. We will be able to automatically populate this id for internal MAST jobs but for generic OSS users you will have to explicitly opt into it. Signed-off-by: Edward Z. Yang <ezyang@meta.com> Differential Revision: [D65065497](https://our.internmc.facebook.com/intern/diff/D65065497) Pull Request resolved: https://github.com/pytorch/pytorch/pull/139001 Approved by: https://github.com/oulgen	2024-11-01 21:43:25 +00:00
Animesh Jain	2aa5348356	[dynamo][guards] Skip no tensor aliasing guards on parameters (#138954 ) This is another unsound guard eval optimization. It's rare in practice to compile a function with two different parameters as inputs, and then later call the function with one parameter input as two different inputs (aliasing). This further reduces guard overhead from 280 us to 240 us for the model in https://github.com/pytorch/pytorch/issues/138386 Pull Request resolved: https://github.com/pytorch/pytorch/pull/138954 Approved by: https://github.com/jansel ghstack dependencies: #139040	2024-10-29 02:11:47 +00:00
Simon Fan	fd9f4e6770	Back out "[compiled autograd] tls access helpers (#138061 )" and Back out "[compiled autograd] Compiled autograd configs in TLS (#137821 )" (#139086 ) Summary: Original commit changeset: 9bf80c1492d7 Original Phabricator Diff: D64796226 Original commit changeset: aa1d9ef8f6e6 Original Phabricator Diff: D64796212 Differential Revision: D65072644 Pull Request resolved: https://github.com/pytorch/pytorch/pull/139086 Approved by: https://github.com/malfet	2024-10-28 23:37:05 +00:00
Animesh Jain	817b4988e4	[dynamo][config-cleanup] Remove enable_cpp_guard_manager=False codepath (#138512 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/138512 Approved by: https://github.com/williamwen42, https://github.com/jansel	2024-10-25 16:41:55 +00:00
Mark Kim-Mulgrew	c7a20939b4	Remove unused enforce_cond_guards_match Dynamo feature flag. (#138589 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/138589 Approved by: https://github.com/clee2000	2024-10-22 19:36:01 +00:00
Simon Fan	49fa437097	[compiled autograd] Compiled autograd configs in TLS (#137821 ) Multithreading doesn't work yet; this adds Python-side TLS only for the Python-side state. Pull Request resolved: https://github.com/pytorch/pytorch/pull/137821 Approved by: https://github.com/jansel, https://github.com/yf225 ghstack dependencies: #137953	2024-10-22 08:03:52 +00:00
PyTorch MergeBot	361f42bc42	Revert "[compiled autograd] Compiled autograd configs in TLS (#137821 )" This reverts commit `9aba0b91c8`. Reverted https://github.com/pytorch/pytorch/pull/137821 on behalf of https://github.com/wdvr due to Reverting this for now, it is failing test_public_bindings in trunk ([comment](https://github.com/pytorch/pytorch/pull/137821#issuecomment-2417351788))	2024-10-16 16:38:29 +00:00
Simon Fan	9aba0b91c8	[compiled autograd] Compiled autograd configs in TLS (#137821 ) Multithreading doesn't work yet; this adds Python-side TLS only for the Python-side state. Pull Request resolved: https://github.com/pytorch/pytorch/pull/137821 Approved by: https://github.com/jansel, https://github.com/yf225 ghstack dependencies: #137953	2024-10-16 09:28:32 +00:00
Angela Yi	f80ed0b831	[export] Custom op meta kernel generation (two pass) (#137277 ) Summary: Prototyping the custom op meta kernel generation. Rest of the changes are in fbcode/scripts/angelayi Test Plan: followup diff (D63837739) Differential Revision: D63837740 Pull Request resolved: https://github.com/pytorch/pytorch/pull/137277 Approved by: https://github.com/zou3519	2024-10-07 15:34:19 +00:00
Bob Ren	f0ef7fddde	Add ignored/unmaintained comment for capture_autograd_function flag (#137309 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/137309 Approved by: https://github.com/jansel ghstack dependencies: #136961	2024-10-04 20:02:37 +00:00
Bob Ren	a1f1f585ab	clean up error_on_nested_jit_trace flag (#136961 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/136961 Approved by: https://github.com/jansel	2024-10-04 02:07:54 +00:00
Bob Ren	13ec343afe	clean up capture_func_transforms flag (#136960 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/136960 Approved by: https://github.com/guilhermeleobas, https://github.com/jansel	2024-10-04 01:10:52 +00:00
Will Feng	a89e3c2490	Add compiled_autograd_kwargs_override Dynamo config (#136967 ) For Traceable FSDP2, the most common use case is to have `fullgraph=False` for the forward pass (to allow user-level graph breaks), and `fullgraph=True` for the compiled autograd backward pass (required for queue_callback support). With `torch._dynamo.compiled_autograd=True`, we were previously not able to set a different `fullgraph` config value for the forward vs. backward pass, since `rebuild_ctx` just reuses the forward compile config as-is. This PR adds the `torch._dynamo.config.compiled_autograd_kwargs_override` config to allow forcing `fullgraph=True` for CA Dynamo tracing. With this PR, we can remove standalone compiled autograd ctx manager usage in Traceable FSDP2 unit tests, and consolidate on using `torch._dynamo.compiled_autograd=True`. Test commands: - `pytest -rA test/distributed/_composable/fsdp/test_fully_shard_compile.py::TestFullyShardCompile::test_transformer_backend_inductor_fullgraph_True` Pull Request resolved: https://github.com/pytorch/pytorch/pull/136967 Approved by: https://github.com/xmfan	2024-10-02 06:23:59 +00:00
Yuanhao Ji	be169f743b	[Dynamo] Mark `config.dead_code_elimination` as deprecated (#136933 ) part of #136862 For reviewers, all call sites are here: https://github.com/search?q=repo%3Apytorch%2Fpytorch+dead_code_elimination+language%3APython&type=code&l=Python Pull Request resolved: https://github.com/pytorch/pytorch/pull/136933 Approved by: https://github.com/williamwen42, https://github.com/anijain2305	2024-10-01 03:51:59 +00:00
Oguz Ulgen	a28b40fa74	Improve is_fbcode functionality (#136871 ) Summary: Previously is_fbcode just checked whether the checkout was git or not. This is extremely error prone. Let's make it foolproof. Test Plan: unit tests Reviewed By: masnesral Differential Revision: D63545169 Pull Request resolved: https://github.com/pytorch/pytorch/pull/136871 Approved by: https://github.com/masnesral	2024-09-27 21:19:01 +00:00
Edward Z. Yang	a2d2a30311	Add torch._dynamo.config.fail_on_cache_limit_hit (#136767 ) Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/136767 Approved by: https://github.com/albanD, https://github.com/jansel ghstack dependencies: #136533	2024-09-27 03:58:00 +00:00
William Wen	95e976a63f	[dynamo] recursively skip frames when Dynamo cache limit is hit (#135144 ) Fixes https://github.com/pytorch/pytorch/pull/135144 and [T197117723](https://www.internalfb.com/intern/tasks/?t=197117723). In general, adds `SkipCodeRecursiveException` to Dynamo - when raised in Dynamo, convert_frame will return a `skip_code_recursive_flag` back to C Dynamo, signaling it to skip the current frame and all recursive calls. Pull Request resolved: https://github.com/pytorch/pytorch/pull/135144 Approved by: https://github.com/jansel, https://github.com/anijain2305	2024-09-06 21:38:53 +00:00
Animesh Jain	058a69d91a	[fbcode][dynamo] Turn on guard_nn_modules using justknobs_check (#134928 ) As Title Pull Request resolved: https://github.com/pytorch/pytorch/pull/134928 Approved by: https://github.com/ezyang	2024-09-05 22:05:54 +00:00
Laith Sakka	d6091c8726	Add compile time instruction count metric (#133834 ) PYTHONPATH=$(pwd) python benchmarks/update_hint_benchmark.py out as of this diff, compile_time_instruction_count counts the number of instructions from within convert_frame.compile_inner ``` update_hint_regression,compile_time_instruction_count,10522459165 ``` will add result from CI once populated. Pull Request resolved: https://github.com/pytorch/pytorch/pull/133834 Approved by: https://github.com/aorenste	2024-08-27 23:29:02 +00:00
Edward Z. Yang	66d6d8b1b9	Support TORCH_COMPILER_COLLECTIVES envvar (#133696 ) Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/133696 Approved by: https://github.com/Skylion007, https://github.com/c-p-i-o	2024-08-19 20:13:04 +00:00
Will Feng	6790eb52f9	[Traceable FSDP2] Set torch._dynamo.config.skip_fsdp_hooks to True by default (#133531 ) Setting `torch._dynamo.config.skip_fsdp_hooks = True` is required for graph-break compiled FSDP2, thus setting it to default will make this adoption easier. If users want to use Traceable FSDP2, they can set this to False manually (which will allow FSDP2 hooks to be traced through). Pull Request resolved: https://github.com/pytorch/pytorch/pull/133531 Approved by: https://github.com/awgu ghstack dependencies: #133532	2024-08-16 17:18:42 +00:00
Aart Bik	a8490a0762	[traced-graph][sparse] propagate sparsity in fx graph (#131920 ) This PR proceeds with implementing the feature request #117188 by generalizing more cases that already work with COO to work with the compressed sparse formats as well. Feature request: https://github.com/pytorch/pytorch/issues/117188 Rebranch of older PRs (for history): https://github.com/pytorch/pytorch/pull/131474 https://github.com/pytorch/pytorch/pull/128549 Pull Request resolved: https://github.com/pytorch/pytorch/pull/131920 Approved by: https://github.com/ezyang	2024-08-05 15:49:53 +00:00
Xuehai Pan	e74ba1b34a	[BE][Easy][15/19] enforce style for empty lines in import segments in `torch/_d*/` (#129767 ) See https://github.com/pytorch/pytorch/pull/129751#issue-2380881501. Most changes are auto-generated by linter. You can review these PRs via: ```bash git diff --ignore-all-space --ignore-blank-lines HEAD~1 ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/129767 Approved by: https://github.com/anijain2305	2024-07-31 21:18:11 +00:00
Animesh Jain	f44446e851	[dynamo] Turn on inline_inbuilt_nn_modules (#131275 ) Known issues that are deliberately kept open and will be fixed later are tracked here - https://github.com/pytorch/pytorch/issues/131696 Training dashboard ([link](https://hud.pytorch.org/benchmark/compilers?dashboard=torchinductor&startTime=Thu%2C%2018%20Jul%202024%2000%3A03%3A50%20GMT&stopTime=Thu%2C%2025%20Jul%202024%2000%3A03%3A50%20GMT&granularity=hour&suite=torchbench&mode=training&dtype=amp&lBranch=gh/anijain2305/435/head&lCommit=408b9358b8fca3a5d08b39741419fe8a596941aa&rBranch=gh/anijain2305/435/base&rCommit=d31f2ae904ba2cf0884bf24413ba2109c3585d51)) ![image](https://github.com/user-attachments/assets/08ef081c-37d7-436d-905b-4b9e2b470644) Inference dashboard ([link](https://hud.pytorch.org/benchmark/compilers?dashboard=torchinductor&startTime=Thu%2C%2018%20Jul%202024%2000%3A03%3A50%20GMT&stopTime=Thu%2C%2025%20Jul%202024%2000%3A03%3A50%20GMT&granularity=hour&suite=torchbench&mode=inference&dtype=bfloat16&lBranch=gh/anijain2305/435/head&lCommit=914244fa2fe0055917e039e35183b21fa90afdc6&rBranch=gh/anijain2305/435/base&rCommit=d31f2ae904ba2cf0884bf24413ba2109c3585d51)) ![image](https://github.com/user-attachments/assets/32136eff-a39e-4cde-a438-e51a665bc3c9) Inference sees a little bit more perf degradation but we are ok with that. Pull Request resolved: https://github.com/pytorch/pytorch/pull/131275 Approved by: https://github.com/ezyang, https://github.com/jansel ghstack dependencies: #132053	2024-07-29 20:01:51 +00:00
PyTorch MergeBot	7ef927da15	Revert "[dynamo] Turn on inline_inbuilt_nn_modules (#131275 )" This reverts commit `6de65d5dd4`. Reverted https://github.com/pytorch/pytorch/pull/131275 on behalf of https://github.com/atalman due to Broke CI: dynamo/test_structured_trace.py::StructuredTraceTest::test_ddp_graphs [GH job link](https://github.com/pytorch/pytorch/actions/runs/10132084288/job/28016215101) [HUD commit link](`6de65d5dd4`) ([comment](https://github.com/pytorch/pytorch/pull/131275#issuecomment-2255839646))	2024-07-29 12:48:27 +00:00
Animesh Jain	6de65d5dd4	[dynamo] Turn on inline_inbuilt_nn_modules (#131275 ) Known issues that are deliberately kept open and will be fixed later are tracked here - https://github.com/pytorch/pytorch/issues/131696 Training dashboard ([link](https://hud.pytorch.org/benchmark/compilers?dashboard=torchinductor&startTime=Thu%2C%2018%20Jul%202024%2000%3A03%3A50%20GMT&stopTime=Thu%2C%2025%20Jul%202024%2000%3A03%3A50%20GMT&granularity=hour&suite=torchbench&mode=training&dtype=amp&lBranch=gh/anijain2305/435/head&lCommit=408b9358b8fca3a5d08b39741419fe8a596941aa&rBranch=gh/anijain2305/435/base&rCommit=d31f2ae904ba2cf0884bf24413ba2109c3585d51)) ![image](https://github.com/user-attachments/assets/08ef081c-37d7-436d-905b-4b9e2b470644) Inference dashboard ([link](https://hud.pytorch.org/benchmark/compilers?dashboard=torchinductor&startTime=Thu%2C%2018%20Jul%202024%2000%3A03%3A50%20GMT&stopTime=Thu%2C%2025%20Jul%202024%2000%3A03%3A50%20GMT&granularity=hour&suite=torchbench&mode=inference&dtype=bfloat16&lBranch=gh/anijain2305/435/head&lCommit=914244fa2fe0055917e039e35183b21fa90afdc6&rBranch=gh/anijain2305/435/base&rCommit=d31f2ae904ba2cf0884bf24413ba2109c3585d51)) ![image](https://github.com/user-attachments/assets/32136eff-a39e-4cde-a438-e51a665bc3c9) Inference sees a little bit more perf degradation but we are ok with that. Pull Request resolved: https://github.com/pytorch/pytorch/pull/131275 Approved by: https://github.com/ezyang, https://github.com/jansel ghstack dependencies: #131744, #131928, #131948	2024-07-28 13:23:00 +00:00
PyTorch MergeBot	14920c149b	Revert "[dynamo] Turn on inline_inbuilt_nn_modules (#131275 )" This reverts commit `0455344777`. Reverted https://github.com/pytorch/pytorch/pull/131275 on behalf of https://github.com/clee2000 due to I think this broke inductor/test_cpu_select_algorithm.py::TestSelectAlgorithmDynamicShapesCPU::test_quantized_linear_amx_dynamic_shapes_batch_size_16_in_features_4_out_features_64_bias_True_cpu [GH job link](https://github.com/pytorch/pytorch/actions/runs/10102272826/job/27938970118) [HUD commit link](`0455344777`) not run on PR due to bad TD ([comment](https://github.com/pytorch/pytorch/pull/131275#issuecomment-2251609554))	2024-07-26 00:12:40 +00:00
Animesh Jain	0455344777	[dynamo] Turn on inline_inbuilt_nn_modules (#131275 ) Known issues that are deliberately kept open and will be fixed later are tracked here - https://github.com/pytorch/pytorch/issues/131696 Training dashboard ([link](https://hud.pytorch.org/benchmark/compilers?dashboard=torchinductor&startTime=Thu%2C%2018%20Jul%202024%2000%3A03%3A50%20GMT&stopTime=Thu%2C%2025%20Jul%202024%2000%3A03%3A50%20GMT&granularity=hour&suite=torchbench&mode=training&dtype=amp&lBranch=gh/anijain2305/435/head&lCommit=408b9358b8fca3a5d08b39741419fe8a596941aa&rBranch=gh/anijain2305/435/base&rCommit=d31f2ae904ba2cf0884bf24413ba2109c3585d51)) ![image](https://github.com/user-attachments/assets/08ef081c-37d7-436d-905b-4b9e2b470644) Inference dashboard ([link](https://hud.pytorch.org/benchmark/compilers?dashboard=torchinductor&startTime=Thu%2C%2018%20Jul%202024%2000%3A03%3A50%20GMT&stopTime=Thu%2C%2025%20Jul%202024%2000%3A03%3A50%20GMT&granularity=hour&suite=torchbench&mode=inference&dtype=bfloat16&lBranch=gh/anijain2305/435/head&lCommit=914244fa2fe0055917e039e35183b21fa90afdc6&rBranch=gh/anijain2305/435/base&rCommit=d31f2ae904ba2cf0884bf24413ba2109c3585d51)) ![image](https://github.com/user-attachments/assets/32136eff-a39e-4cde-a438-e51a665bc3c9) Inference sees a little bit more perf degradation but we are ok with that. Pull Request resolved: https://github.com/pytorch/pytorch/pull/131275 Approved by: https://github.com/ezyang ghstack dependencies: #131744	2024-07-25 22:14:17 +00:00
Edward Z. Yang	0c6f1ca064	Introduce torch._dynamo.config.enable_compiler_collectives for syncing compilation across ranks (#130935 ) This PR implements an opt-in configuration option for synchronizing compilation across all ranks at the end of Dynamo tracing (and potentially, other places in the future). There are two pieces to this PR: 1. Implementing infrastructure for compiler collectives (DistributedState/LocalState, the actual collective) 2. Using this infrastructure to synchronize automatic dynamic choices across all ranks The infrastructure in part one can be used for other purposes, just add more (serializable) fields to LocalState. Here is how automatic dynamic synchronization works: 1. Preflight in "torch/_dynamo/variables/builder.py": On the first Dynamo trace run, we trace without automatic dynamic at all; we assume all Tensor inputs that are not otherwise marked are static. This run is purely to collect all Tensor input sizes in the program. 2. torch/_dynamo/output_graph.py: At the end of the first Dynamo trace run, we perform a compiler collective to distribute all Tensor input sizes to all ranks. Then, we restart Dynamo. 3. Apply the updates in "torch/_dynamo/variables/builder.py": Now that we have all sizes for every rank, we update frame state with the observed sizes for all ranks, in rank order. Under the assumption that frame state is consistent on all ranks, this series of updates will preserve consistency. For future work, it would be safer if we force a consistent hint on all ranks; this is more involved as we have to interpose in fakification. Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/130935 Approved by: https://github.com/jansel	2024-07-24 11:24:11 +00:00
Pian Pawakapan	988ed4d5db	[export] clean up allow_complex_guards_as_runtime_asserts flag (#130596 ) Summary: removes underscore, cleans up dead code in DimConstraints Test Plan: existing export tests Reviewed By: angelayi Differential Revision: D59612746 Pull Request resolved: https://github.com/pytorch/pytorch/pull/130596 Approved by: https://github.com/angelayi	2024-07-12 17:17:11 +00:00
Yanbo Liang	111f9b5d44	[Dynamo] Add config to skip/inline torchrec (#129912 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/129912 Approved by: https://github.com/anijain2305	2024-07-03 00:14:51 +00:00
Elias Ellison	b8e5678ad2	Delete lazy ddp optimizer (#120727 ) This is no longer necessary now that the normal ddp optimizer works correctly with inductor strides. Differential Revision: [D54858819](https://our.internmc.facebook.com/intern/diff/D54858819) Pull Request resolved: https://github.com/pytorch/pytorch/pull/120727 Approved by: https://github.com/jansel, https://github.com/yf225	2024-06-26 21:53:54 +00:00
Will Feng	575bc1e3af	[Reopen #114036 ] Allow "must recompute" in torch.compile + selective checkpointing (SAC) (#129295 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/129295 Approved by: https://github.com/Chillee	2024-06-25 23:47:08 +00:00
Will Feng	dadc0ed4c8	[Traceable FSDP2] Add `aot_eager` backend E2E tests for transformer model (#129157 ) This PR adds Traceable FSDP2 `aot_eager` backend E2E tests for simple MLP as well as transformer model. Pull Request resolved: https://github.com/pytorch/pytorch/pull/129157 Approved by: https://github.com/awgu ghstack dependencies: #129203	2024-06-23 06:11:11 +00:00
Aaron Orenstein	dcfa7702c3	Flip default value for mypy disallow_untyped_defs [1/11] (#127838 ) See #127836 for details. Pull Request resolved: https://github.com/pytorch/pytorch/pull/127838 Approved by: https://github.com/oulgen	2024-06-08 18:16:33 +00:00
Michael Lazos	2129903aa3	Properly detect nested torch function args (#127496 ) Dynamo was not detecting nested torch function classes in containers. This was due to pytree compatibility for variable trackers being removed. Fixes https://github.com/pytorch/pytorch/issues/127174 Pull Request resolved: https://github.com/pytorch/pytorch/pull/127496 Approved by: https://github.com/anijain2305	2024-06-02 03:43:22 +00:00
Simon Fan	ec098b88b6	[compiled autograd] torch.compile API (#125880 ) - enter existing compiled autograd ctx manager before entering torch.compile frames Pull Request resolved: https://github.com/pytorch/pytorch/pull/125880 Approved by: https://github.com/jansel	2024-05-31 04:38:20 +00:00
PyTorch MergeBot	ce63b676f3	Revert "[compiled autograd] torch.compile API (#125880 )" This reverts commit `e1c322112a`. Reverted https://github.com/pytorch/pytorch/pull/125880 on behalf of https://github.com/atalman due to sorry your PR broke lint, need to revert ([comment](https://github.com/pytorch/pytorch/pull/125880#issuecomment-2139605376))	2024-05-30 13:53:31 +00:00
Simon Fan	e1c322112a	[compiled autograd] torch.compile API (#125880 ) - enter existing compiled autograd ctx manager before entering torch.compile frames Pull Request resolved: https://github.com/pytorch/pytorch/pull/125880 Approved by: https://github.com/jansel	2024-05-30 02:10:06 +00:00
Pian Pawakapan	8a31c2aa84	[export] allow complex guards as runtime asserts (#127129 ) With the current state of export's dynamic shapes, we struggle with guards and constraints that are beyond the current dynamic shapes language, expressed with dims and derived dims. While we can compile and guarantee correctness for guards within the current language (e.g. min/max ranges, linear relationships, integer divisibility) we struggle to dynamically compile guards which extend beyond that. For these "complex" guards, we typically do either of the following: 1) raise a constraint violation error, along the lines of "not all values of <symbol> in the specified range satisfy <guard>", with or without suggested fixes, 2) specialize to the provided static values and suggest removing dynamism, or 3) fail compilation due to some arbitrary unsupported case. Previous [work](https://github.com/pytorch/pytorch/pull/124949) went towards resolving this by disabling forced specializations, instead allowing the user to fail at runtime with incorrect inputs. In this PR, relying on [hybrid backed-unbacked symints](https://github.com/pytorch/pytorch/issues/121749), [deferred runtime asserts](https://github.com/pytorch/pytorch/blob/main/torch/fx/passes/runtime_assert.py), and the function [_is_supported_equivalence()](`d7de4c9d80/torch/fx/experimental/symbolic_shapes.py (L1824)`), we add a flag `_allow_complex_guards_as_runtime_asserts` which allows the user to compile exported programs containing these guards and maintain dynamism, while adding correctness checks as runtime assertions in the graph. Hybrid backed-unbacked symints allow us to easily bypass "implicit" guards emitted from computation - guards that we ~expect to be true. 
Popular examples revolve around reshapes:
```
# reshape
def forward(self, x, y):  # x: [s0, s1], y: [s2]
    return x.reshape([-1]) + y  # guard s0 * s1 = s2

# This leads to the following exported program
class GraphModule(torch.nn.Module):
    def forward(self, x: "f32[s0, s1]", y: "f32[s2]"):
        sym_size_int: "Sym(s2)" = torch.ops.aten.sym_size.int(y, 0)
        mul: "Sym(-s2)" = -1 * sym_size_int;  sym_size_int = None
        sym_size_int_1: "Sym(s0)" = torch.ops.aten.sym_size.int(x, 0)
        sym_size_int_2: "Sym(s1)" = torch.ops.aten.sym_size.int(x, 1)
        mul_1: "Sym(s0*s1)" = sym_size_int_1 * sym_size_int_2;  sym_size_int_1 = sym_size_int_2 = None
        add: "Sym(s0*s1 - s2)" = mul + mul_1;  mul = mul_1 = None
        eq: "Sym(Eq(s0*s1 - s2, 0))" = add == 0;  add = None
        _assert_scalar = torch.ops.aten._assert_scalar.default(eq, "Runtime assertion failed for expression Eq(s0*s1 - s2, 0) on node 'eq'");  eq = None
        view: "f32[s0*s1]" = torch.ops.aten.view.default(x, [-1]);  x = None
        add_1: "f32[s0*s1]" = torch.ops.aten.add.Tensor(view, y);  view = y = None
        return (add_1,)
```
Another case is symbol divisibility:
```
def forward(self, x):  # x: [s0, s1]
    return x.reshape([-1, x.shape[0] - 1])  # Eq(Mod(s0*s1, s0 - 1), 0)
```
Applying deferred runtime asserts also helps dynamic compilation for "explicit" complex guards that typically cause problems for export. For example we can generate runtime asserts for not-equal guards, and complex conditions like the following:
```
class Foo(torch.nn.Module):
    def forward(self, x, y):
        # check that negation of first guard also shows up as runtime assertion
        if x.shape[0] == y.shape[0]:  # False
            return x + y
        elif x.shape[0] == y.shape[0] ** 3:  # False
            return x + 2, y + 3
        elif x.shape[0] ** 2 == y.shape[0] * 3:  # True
            return x * 2.0, y * 3.0
```
For the above graph we will generate 3 runtime assertions: the negation of the first 2, and the 3rd condition as a guard.
One additional benefit here over the current state of exported programs is that this adds further correctness guarantees - previously with explicit complex guards, if compilation succeeded, the guards would be ignored at runtime, treated as given. As shown above, the runtime asserts appear as math ops in the graph, generated by the sympy interpreter, resulting in an _assert_scalar call. There is an option to avoid adding these asserts into the graph, by setting `TORCH_DYNAMO_DO_NOT_EMIT_RUNTIME_ASSERTS=1`. This results in the "original" computation graph, with dynamism, and any incorrect inputs will fail on ops during runtime. Further work could go into prettifying the printer, so the majority of the graph isn't guard-related. Ideally this PR would subsume and remove the recently added [_disable_forced_specializations](https://github.com/pytorch/pytorch/pull/124949) flag, but that flag still handles one additional case of specialization: single-variable equalities where the symbol is solvable for a concrete value: see this [PR](https://github.com/pytorch/pytorch/pull/126925) This PR doesn't change any behavior around data-dependent errors/unbacked symints yet, that could be further work. NOTE: will take naming change suggestions for the flag :) Pull Request resolved: https://github.com/pytorch/pytorch/pull/127129 Approved by: https://github.com/avikchaudhuri	2024-05-29 17:15:25 +00:00
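The reshape guard `s0 * s1 = s2` from #127129 can be pictured as an ordinary shape check that runs before the computation. The following is only a plain-Python analogue under that assumption; `reshape_add_shape` is a hypothetical helper working on shape tuples, not the exported program or any PyTorch API.

```python
from typing import Tuple


def reshape_add_shape(x_shape: Tuple[int, int], y_shape: Tuple[int]) -> Tuple[int]:
    """Return the output shape of x.reshape([-1]) + y, checking the guard first."""
    s0, s1 = x_shape
    (s2,) = y_shape
    # Analogue of the deferred runtime assertion Eq(s0*s1 - s2, 0).
    if s0 * s1 - s2 != 0:
        raise AssertionError(
            "Runtime assertion failed for expression Eq(s0*s1 - s2, 0)"
        )
    return (s0 * s1,)


print(reshape_add_shape((3, 4), (12,)))  # (12,)

try:
    reshape_add_shape((3, 4), (10,))
    failed = False
except AssertionError:
    failed = True
print(failed)  # True
```

The point of the sketch is that dynamism is kept (any `s0`, `s1`, `s2` are accepted at compile time) while incorrect inputs are rejected when the program runs.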
dshi7	0f67d38f0f	add TORCHDYNAMO_CAPTURE_DYNAMIC_OUTPUT_SHAPE_OPS (#127017 ) tlparse prints a failure description like this: > dynamic shape operator: aten._unique2.default; to enable, set torch._dynamo.config.capture_dynamic_output_shape_ops = True This PR adds an OS environment variable so the option is easier to set for testing. Pull Request resolved: https://github.com/pytorch/pytorch/pull/127017 Approved by: https://github.com/jackiexu1992	2024-05-25 05:42:41 +00:00
Alexandre Ghelfi	b3a8a3cbab	Fix typos in `torch._dynamo.config.py` (#126150 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/126150 Approved by: https://github.com/Skylion007	2024-05-14 14:27:35 +00:00
Edward Z. Yang	2ba102f689	Implement native support for float inputs in Dynamo and ShapeEnv (#125325 ) The big idea is that floats are treated as Tensors on input/output to the FX graph, but on the inside, we immediately call item() on the synthetic Tensor and record regular float operations on it. Canonicalization to Tensor operations will happen in a standalone FX pass. This behavior is enabled by setting the `specialize_float` config variable to False. The generated graph looks like this for the test `test_unspec_float_output`:
```
def forward(self, L_x_: "f32[3]", L_y_: "f32[]"):
    l_x_ = L_x_
    l_y_ = L_y_

    # File: /data/users/ezyang/a/pytorch/test/dynamo/test_unspec.py:511 in f, code: return x + 1, y * 2
    add: "f32[3]" = l_x_ + 1;  l_x_ = None
    item: "Sym(zf0)" = l_y_.item();  l_y_ = None
    mul: "Sym(2*zf0)" = item * 2;  item = None
    scalar_tensor: "f32[]" = torch.scalar_tensor(mul);  mul = None
    return (add, scalar_tensor)
```
The ingredients:
* torch/_dynamo/variables/builder.py When `specialize_float` is False, we wrap float literals with `wrap_symfloat`. This is an unholy mashup of `wrap_symint` and `wrap_unspecialized_primitive`. The overall strategy is that we first generate a tensor argument (because that's what we want to show up into the FX graph), but then immediately call item() on the tensor argument to get a SymNodeVariable, which we will do the rest of the tracing with. Importantly, this SymNodeVariable is backed with the source of the original float: this means we can guard on the resulting value (something we could NOT do with UnspecializedPythonVariable). This has to be done manually, because if you literally call item() on the tensor, you will end up with an unbacked float. There is a bit of copy paste from wrap_symint and wrap_unspecialized_primitive which we can try to factor out, but this really is its own thing and you should review every line of code in the function. 
* torch/fx/experimental/symbolic_shapes.py We now can generate guards on float inputs, and these guards are handled inside of ShapeEnv. So we need to be able to allocate (backed!) float symbols, and produce guards for them. Fairly straightforward generalization.
* torch/_dynamo/codegen.py I also need to maintain the invariant that there are no float outputs to the FX graph. I chose to do this at codegen time. When we detect a SymNodeVariable on the return stack for a float, we on the fly convert it (via `as_tensor`) to a TensorVariable, which is the true output. We then special case the output bytecode to call item() on it again. The tensor conversion is memoized on SymNodeVariable since we typically run the code generation process twice.
Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/125325 Approved by: https://github.com/lezcano, https://github.com/jansel	2024-05-14 04:10:01 +00:00
Animesh Jain	0935b3d794	[dynamo] Turn on guard_nn_modules (#125202 ) Turning on guard_nn_modules adds large number of guards, so we are bound to take a perf hit. But the perf hit is small. These are the numbers ![image](https://github.com/pytorch/pytorch/assets/13822661/c8793906-c8c7-432b-9af4-4594713067be) First we observe that compared to Python guards, C++ guards give around 6x speedup. This reduces the total time spent in guards. This is shown in the last column (cpp_guards/inductor_optimized_latency). The worst model is around 1.61%, with most of the models below 1%. I think this is good enough signal to turn the config on. One might also wonder how much guard slowdown occurs with `guard_nn_modules=True`. This is the table ![image](https://github.com/pytorch/pytorch/assets/13822661/932a885b-1c03-424b-8405-5bc8fd35dd39) For most models, the guard overhead with nn module guards is under 2x. There are a few outliers, where the slowdown is really high and for those models we spend 1%-2% time in C++ guards as shown in first table. Pull Request resolved: https://github.com/pytorch/pytorch/pull/125202 Approved by: https://github.com/ezyang	2024-05-11 19:28:24 +00:00
Animesh Jain	b62e89c1b8	[dynamo] Do not turn on record relay with TORCH_COMPILE_DEBUG (#125488 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/125488 Approved by: https://github.com/yanboliang, https://github.com/mlazos	2024-05-04 05:10:31 +00:00
Boyuan Feng	b91f83f181	[cudagraph] add config for cudagraph managed input mutation support (#124754 ) Summary: [#123231](https://github.com/pytorch/pytorch/pull/123231) adds cudagraph supports for more types of functions (i.e., cudagraph managed input mutation). These newly supported functions may have mutated static inputs, leading to assertion errors in some workload which skip cudagraph previously. This diff adds a config to opt in the new feature. Test Plan: ci Differential Revision: D56481353 Pull Request resolved: https://github.com/pytorch/pytorch/pull/124754 Approved by: https://github.com/eellison	2024-04-24 04:23:53 +00:00
Animesh Jain	704fac5618	[dynamo][cpp-guard] Reland Attempt 1 - Enable cpp guard manager (#124231 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/124231 Approved by: https://github.com/jansel ghstack dependencies: #124230, #124237	2024-04-18 06:36:20 +00:00
PyTorch MergeBot	530bf391cc	Revert "[dynamo] Turn on CPP guard manager (#123547 )" This reverts commit `3e98bdd66d`. Reverted https://github.com/pytorch/pytorch/pull/123547 on behalf of https://github.com/DanilBaibak due to Broken trunk ([comment](https://github.com/pytorch/pytorch/pull/123547#issuecomment-2058337419))	2024-04-16 06:38:15 +00:00
willfengg	f1654fd4b0	[PT2D][FSDP] skip FSDP hooks base on dynamo config (#123021 ) unit test: `pytest test/distributed/_composable/fsdp/test_fully_shard_compile.py` For FSDP, we turn compiling of hooks on or off based on `torch._dynamo.config.skip_fsdp_hooks` Pull Request resolved: https://github.com/pytorch/pytorch/pull/123021 Approved by: https://github.com/yf225, https://github.com/anijain2305	2024-04-13 01:47:25 +00:00
Animesh Jain	3e98bdd66d	[dynamo] Turn on CPP guard manager (#123547 ) As title Pull Request resolved: https://github.com/pytorch/pytorch/pull/123547 Approved by: https://github.com/jansel	2024-04-12 23:30:56 +00:00
Jason Ansel	70b8c58f84	[dynamo] Emit warning to turn on capture_scalar_outputs (#123896 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/123896 Approved by: https://github.com/anijain2305 ghstack dependencies: #123700, #123705, #123786, #123790, #123803, #123804	2024-04-12 19:03:13 +00:00
Oguz Ulgen	57a2032c7a	Delete Lark (#123689 ) Now that we are using MLIR bindings inside triton, lets delete Lark parser. Pull Request resolved: https://github.com/pytorch/pytorch/pull/123689 Approved by: https://github.com/jansel	2024-04-11 05:51:06 +00:00
PyTorch MergeBot	6b18daf205	Revert "Delete Lark (#123689 )" This reverts commit `a631461eef`. Reverted https://github.com/pytorch/pytorch/pull/123689 on behalf of https://github.com/PaliC due to This PR seems to be breaking test_binary_ufuncs.py ([comment](https://github.com/pytorch/pytorch/pull/123689#issuecomment-2048489549))	2024-04-10 21:48:04 +00:00
Oguz Ulgen	a631461eef	Delete Lark (#123689 ) Now that we are using MLIR bindings inside triton, lets delete Lark parser. Pull Request resolved: https://github.com/pytorch/pytorch/pull/123689 Approved by: https://github.com/jansel	2024-04-10 19:41:54 +00:00
Peter Bell	8865425ff7	[minifier] Add config flag to ignore non-fp values (#123006 ) When minifying, the after-aot minifier ignores non-floating values by default but does check them when running the initial graph dump step. This means we may capture a graph that doesn't fail the tester and doesn't have any meaningful divergence. For example, the derivative of `elu(x)` depends on `x > 0` so this value is saved for backwards and so becomes a graph output. However, the difference between `FLT_MIN` and `0` in `x` is now enough to trigger an accuracy failure. I fix this by adding a config variable and environment variable to ignore these non floating point values. Pull Request resolved: https://github.com/pytorch/pytorch/pull/123006 Approved by: https://github.com/ezyang ghstack dependencies: #123005	2024-04-09 03:34:09 +00:00
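The `elu` scenario from #123006 can be reproduced with a torch-free comparison function. This is a minimal sketch under stated assumptions: `outputs_match` and its `ignore_non_fp` parameter are hypothetical names for illustration, and the tolerance is arbitrary. The real config variable and environment variable names are not given in the commit message and are not guessed here.

```python
import math
from typing import Sequence, Union

Value = Union[float, int, bool]


def outputs_match(
    reference: Sequence[Value],
    result: Sequence[Value],
    ignore_non_fp: bool,
    tol: float = 1e-6,
) -> bool:
    """Compare two output lists; optionally skip values that are not floats."""
    for ref, res in zip(reference, result):
        if isinstance(ref, float) and isinstance(res, float):
            # Floating point outputs are compared with a tolerance.
            if not math.isclose(ref, res, rel_tol=tol, abs_tol=tol):
                return False
        elif not ignore_non_fp and ref != res:
            # Non-float outputs (such as a saved `x > 0` mask) must match exactly.
            return False
    return True


# x is a tiny positive number in one run and 0 in the other: the float output
# is equal within tolerance, but the saved boolean `x > 0` flips.
reference_outputs = [1e-38, True]
result_outputs = [0.0, False]

print(outputs_match(reference_outputs, result_outputs, ignore_non_fp=False))  # False
print(outputs_match(reference_outputs, result_outputs, ignore_non_fp=True))   # True
```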
Guilherme Leobas	84658d9c4f	Enable `capture_func_transforms` by default (#122211 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/122211 Approved by: https://github.com/zou3519	2024-04-05 03:29:11 +00:00
eellison	5f46312dbb	Reapply "Switch cudagraph backend to cudagraph trees (#121019 )" and "Add Cudagraphs disable checking (#121018 )" (#121864 ) (#122713 ) This reverts commit `92ed8553a6`. No longer importing codecache or boxed_nop at top level, both of which caused issues. Pull Request resolved: https://github.com/pytorch/pytorch/pull/122713 Approved by: https://github.com/anijain2305	2024-04-02 16:11:00 +00:00
Animesh Jain	60f3c092d4	[dynamo] Config option to Inline builtin nn module forward (#122725 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/122725 Approved by: https://github.com/jansel ghstack dependencies: #122646, #122647, #122716, #122769, #122818	2024-03-28 03:01:27 +00:00
Animesh Jain	c108696228	[dynamo][guards-cpp-refactor][easy] Env variable to turn on cpp manager (#122646 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/122646 Approved by: https://github.com/jansel	2024-03-27 19:40:37 +00:00
IvanKobzarev	9b095c3fe6	[dynamo] Config to not emit runtime asserts (#122603 ) Repetition on squashed & merged by mistake https://github.com/pytorch/pytorch/pull/122406 Differential Revision: [D55312394](https://our.internmc.facebook.com/intern/diff/D55312394) Pull Request resolved: https://github.com/pytorch/pytorch/pull/122603 Approved by: https://github.com/ezyang	2024-03-25 21:17:44 +00:00
Animesh Jain	8860c625ea	[dynamo][guards-cpp-refactor] Integrate cpp guard manager with CheckFnManager (#120726 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/120726 Approved by: https://github.com/jansel	2024-03-19 03:11:31 +00:00
Animesh Jain	f84d560236	[dynamo] Raise accumulated cache size limit (#122130 ) Fixes #114511 This was raised by IBM folks, where an LLM compile was failing because it had more than 64 layers. Pull Request resolved: https://github.com/pytorch/pytorch/pull/122130 Approved by: https://github.com/Chillee, https://github.com/jansel ghstack dependencies: #121954, #122005	2024-03-19 02:35:48 +00:00
Animesh Jain	92ed8553a6	Revert "Switch cudagraph backend to cudagraph trees (#121019 )" and "Add Cudagraphs disable checking (#121018 )" (#121864 ) This reverts commit `9373ad0bb8`. Revert "Add Cudagraphs disable checking (#121018)" This reverts commit `4af0e634bf`. Causes compilation time increase. Pull Request resolved: https://github.com/pytorch/pytorch/pull/121864 Approved by: https://github.com/eellison	2024-03-15 00:03:09 +00:00
eellison	4af0e634bf	Add Cudagraphs disable checking (#121018 ) Adds the same cudagraphs disable checking from inductor - cudagraph trees to cudagraphs backend. Pull Request resolved: https://github.com/pytorch/pytorch/pull/121018 Approved by: https://github.com/ezyang ghstack dependencies: #121017	2024-03-08 22:47:24 +00:00
PyTorch MergeBot	3d7cf8f392	Revert "Limit loop unrolling (#120023 )" This reverts commit `6cc7f9a2e6`. Reverted https://github.com/pytorch/pytorch/pull/120023 on behalf of https://github.com/anijain2305 due to breaks llms export ([comment](https://github.com/pytorch/pytorch/pull/120023#issuecomment-1974104633))	2024-03-02 00:04:08 +00:00
angelayi	c844b377fa	[dynamo] Reorder logs (#116106 ) Currently when there is a print/warning in the graph, dynamo graph breaks causing export to fail. However export would like to just skip over these print/warning calls: https://github.com/pytorch/pytorch/issues/113792. Additionally there's a torch.compile feature request to "reorder prints" so that instead of graph breaking when hitting prints/logging, we can skip over these prints to create larger compiled graphs, and then print the results out after those compiled graphs: https://github.com/pytorch/pytorch/issues/93739. This PR also adds the `reorderable_logging_functions` config for users to register logging functions to be reordered (like `print` or a custom logging function). Printout of the bytecode after reordering the prints looks like the following: P914736600 There are some limitations to the printing right now: * You can only register logging functions, not methods * Inputs to the logging functions can only be tensors, constants, and format strings * Inputs to the logging functions which will later be mutated in-place will not be printed correctly TODO: Add the following tests * print function with argument of nested data structure; * print function with argument of nested data structure being updated inside of compile region (this would test if we handle side effect correctly); * custom defined logging functions with nn.Module or nn.Module attribute arguments; * custom defined logging functions with submodule input/output as arguments (we need to handle the mapping and fused-out value); * custom defined logging functions with tensor argument and mutation inside of the function (TBD: this may increase memory usage); Pull Request resolved: https://github.com/pytorch/pytorch/pull/116106 Approved by: https://github.com/yanboliang	2024-03-01 17:04:24 +00:00
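The "reorder prints" idea in #116106 can be sketched without Dynamo: calls to registered logging functions are set aside while the rest of the region runs, then replayed afterwards. The names `run_with_reordered_logs`, `log`, and `double` below are hypothetical, and the `reorderable` set is only a plain-Python stand-in for the `reorderable_logging_functions` config, not how Dynamo rewrites bytecode.

```python
from typing import Any, Callable, List, Set, Tuple

Call = Tuple[Callable[..., Any], Tuple[Any, ...]]


def run_with_reordered_logs(
    calls: List[Call], reorderable: Set[Callable[..., Any]]
) -> List[Any]:
    """Run non-logging calls immediately; replay logging calls at the end."""
    deferred: List[Call] = []
    results: List[Any] = []
    for fn, args in calls:
        if fn in reorderable:
            deferred.append((fn, args))  # skip over the log call for now
        else:
            results.append(fn(*args))
    for fn, args in deferred:
        fn(*args)  # emit the logs after the "compiled" work is done
    return results


events: List[str] = []


def log(message: str) -> None:
    events.append(f"log: {message}")


def double(value: int) -> int:
    events.append(f"compute: {value}")
    return value * 2


results = run_with_reordered_logs(
    [(double, (1,)), (log, ("between ops",)), (double, (2,))],
    reorderable={log},
)
print(results)  # [2, 4]
print(events)   # ['compute: 1', 'compute: 2', 'log: between ops']
```

Because the deferred call holds references to its arguments, an argument mutated in place before the replay would be logged in its mutated state, which mirrors the third limitation listed in the commit message.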
PyTorch MergeBot	63b259492a	Revert "[dynamo] Reorder logs (#116106 )" This reverts commit `c5472628ff`. Reverted https://github.com/pytorch/pytorch/pull/116106 on behalf of https://github.com/clee2000 due to landrace with `342e7929b8`, which removed the import for warnings. Should be an easy fix after rebase `c5472628ff` ([comment](https://github.com/pytorch/pytorch/pull/116106#issuecomment-1972586180))	2024-03-01 06:25:46 +00:00
Angela Yi	c5472628ff	[dynamo] Reorder logs (#116106 ) Currently when there is a print/warning in the graph, dynamo graph breaks causing export to fail. However export would like to just skip over these print/warning calls: https://github.com/pytorch/pytorch/issues/113792. Additionally there's a torch.compile feature request to "reorder prints" so that instead of graph breaking when hitting prints/logging, we can skip over these prints to create larger compiled graphs, and then print the results out after those compiled graphs: https://github.com/pytorch/pytorch/issues/93739. This PR also adds the `reorderable_logging_functions` config for users to register logging functions to be reordered (like `print` or a custom logging function). Printout of the bytecode after reordering the prints looks like the following: P914736600 There are some limitations to the printing right now: * You can only register logging functions, not methods * Inputs to the logging functions can only be tensors, constants, and format strings * Inputs to the logging functions which will later be mutated in-place will not be printed correctly TODO: Add the following tests * print function with argument of nested data structure; * print function with argument of nested data structure being updated inside of compile region (this would test if we handle side effect correctly); * custom defined logging functions with nn.Module or nn.Module attribute arguments; * custom defined logging functions with submodule input/output as arguments (we need to handle the mapping and fused-out value); * custom defined logging functions with tensor argument and mutation inside of the function (TBD: this may increase memory usage); Pull Request resolved: https://github.com/pytorch/pytorch/pull/116106 Approved by: https://github.com/yanboliang	2024-03-01 04:48:44 +00:00
Aaron Orenstein	6cc7f9a2e6	Limit loop unrolling (#120023 ) Tacotron2 causes massive loop unrolling resulting in very large graphs (26k nodes) which was causing inductor (and tracing itself) to choke. The unrolling size is controlled by the environment variable TORCHDYNAMO_MAX_LOOP_UNROLL_NODES which defaults to the arbitrary value 5000. This updates the tacotron2 timings as follows:
eager timing: 3m:23s -> 35s
aot_eager timing: 4m:12s -> 39s
inductor timing: 22m:24s -> 1m
For reference the big loop in tacotron2 was this one (model.py[405]):
```
while len(mel_outputs) < decoder_inputs.size(0) - 1:
    decoder_input = decoder_inputs[len(mel_outputs)]
    mel_output, gate_output, attention_weights = self.decode(decoder_input)
    mel_outputs += [mel_output.squeeze(1)]
    gate_outputs += [gate_output.squeeze(1)]
    alignments += [attention_weights]
```
which gets unrolled and inlined adding about 36 nodes to the graph per iteration. Fixes #98467 Relates to #102839 which hopefully will result in a better fix. Pull Request resolved: https://github.com/pytorch/pytorch/pull/120023 Approved by: https://github.com/yanboliang	2024-02-27 20:44:21 +00:00
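To get a feel for the numbers in #120023, the sketch below counts how many iterations of a loop fit under a node budget. It uses the two figures quoted in the commit message (a default limit of 5000 and roughly 36 nodes per iteration); `iterations_within_budget` is a hypothetical helper, and how Dynamo actually counts nodes or what it does once the limit is reached is not described in the message and is not modelled here.

```python
from typing import Tuple

MAX_LOOP_UNROLL_NODES = 5000  # default quoted in the commit message
NODES_PER_ITERATION = 36      # approximate figure quoted for the tacotron2 loop


def iterations_within_budget(limit: int, nodes_per_iteration: int) -> Tuple[int, int]:
    """Return (iterations unrolled, nodes used) without exceeding the limit."""
    nodes = 0
    iterations = 0
    while nodes + nodes_per_iteration <= limit:
        nodes += nodes_per_iteration
        iterations += 1
    return iterations, nodes


budget = iterations_within_budget(MAX_LOOP_UNROLL_NODES, NODES_PER_ITERATION)
print(budget)  # (138, 4968)
```

Under these assumptions roughly 138 iterations fit in the budget, compared with the 26k-node graph the unbounded unrolling produced.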
Sam Larsen	2fb32a5f07	Enable fake tensor caching in fbcode by default (#118555 ) Summary: Enabled by default in OSS; this switches the default to "on" in fbcode too. Test Plan: Ran torchbench benchmarks in fbcode Differential Revision: [D53771626](https://our.internmc.facebook.com/intern/diff/D53771626) Pull Request resolved: https://github.com/pytorch/pytorch/pull/118555 Approved by: https://github.com/eellison	2024-02-26 17:35:23 +00:00
PyTorch MergeBot	7d780ff86f	Revert "Enable fake tensor caching in fbcode by default (#118555 )" This reverts commit `0f2fbbff10`. Reverted https://github.com/pytorch/pytorch/pull/118555 on behalf of https://github.com/huydhn due to Sorry for reverting your change but it is failing one model test internally. Please take a look at the diff for more info D53189048 ([comment](https://github.com/pytorch/pytorch/pull/118555#issuecomment-1939550273))	2024-02-12 20:51:23 +00:00
Sam Larsen	0f2fbbff10	Enable fake tensor caching in fbcode by default (#118555 ) Summary: Enabled by default in OSS; this switches the default to "on" in fbcode too. Test Plan: Ran torchbench benchmarks in fbcode Differential Revision: [D53189048](https://our.internmc.facebook.com/intern/diff/D53189048) Pull Request resolved: https://github.com/pytorch/pytorch/pull/118555 Approved by: https://github.com/eellison	2024-02-09 05:42:16 +00:00
Chien-Chin Huang	1d2382f141	[DDP] Use compiled_autograd to trace DDP backward allreduce (#110662 ) Summary The reducer of `DistributedDataParallel` is implemented with C++ and it is not easy to trace the allreduce launched in the reducer. This PR modifies `DistributedDataParallel` to launch one allreduce per gradient when `compiled_autograd` is enabled. The changes allow us to use `compiled_autograd` to trace the allreduce and later be optimized (fused) in the Inductor. Key Logic 1. If `ddp_python_hook` is True, we assume `compiled_autograd` is used. `DistributedDataParallel` registers `compiled_accum_grad_hook` for all parameters. 2. In the first forward() call, if `DistributedDataParallel` is not compiled, all `compiled_accum_grad_hook` are deregistered. If `DistributedDataParallel` is compiled, all `compiled_accum_grad_hook` will be compiled by `compiled_autograd`. 3. `compiled_accum_grad_hook` launches an allreduce to reduce the gradient of the parameter. Bucketing The compiled backward is slow because there is no bucketing for the allreduces. We rely on Inductor to bucket the allreduces. The bucketing is done in a separate PR. Differential Revision: [D49428482](https://our.internmc.facebook.com/intern/diff/D49428482/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/110662 Approved by: https://github.com/wconstab	2024-02-08 03:03:15 +00:00
Will Constable	abe3c55a6a	Update DDP dynamo debug docs (#118295 ) Refreshes https://github.com/pytorch/pytorch/pull/114201 and updates it to include other log names that also include ddp_optimizer. Pull Request resolved: https://github.com/pytorch/pytorch/pull/118295 Approved by: https://github.com/LucasLLC, https://github.com/wanchaol	2024-01-29 14:58:26 +00:00
Oguz Ulgen	47b5a6b05d	[Dynamo] Analyze triton kernels via tracing to determine mutations (#117300 ) This PR adds TTIR lexing and parsing in order to analyze which of the user defined triton kernel inputs are mutated. Differential Revision: [D53165999](https://our.internmc.facebook.com/intern/diff/D53165999) Pull Request resolved: https://github.com/pytorch/pytorch/pull/117300 Approved by: https://github.com/jansel	2024-01-29 06:37:08 +00:00
Edward Z. Yang	46712b019d	Enable local_partial_types (#118467 ) When using dmypy, this setting is enabled and cannot be turned off. Force it for regular mypy too. Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/118467 Approved by: https://github.com/Skylion007 ghstack dependencies: #118414, #118418, #118432	2024-01-28 13:38:22 +00:00
Shunting Zhang	fe10b1800f	LazyGraphModule (#117911 ) I feel it's easier to open a new PR rather than iterating on the previous PR (https://github.com/pytorch/pytorch/pull/105257 ) since this is more like a rewrite. In this PR, instead of changing GraphModule directly which can easily causes BC issue, I create a LazyGraphModule class as Zachary & Jason suggested in comments from the previous PR. The difference between LazyGraphModule and GraphModule is mainly about how re-compile for the graph module happens. In GraphModule the recompilation happens 'eagerly': constructing a GraphModule will cause the recompilation. While in LazyGraphModule, we just mark the module as needing recompilation. The real recompilation only happens when absolutely required (e.g. call forward method, access the code property etc.). In a lot of cases in torch.compile, the real recompilation eventually is not triggered at all. This can save a few seconds of compilation time. By default, GraphModule rather than LazyGraphModule is used. `use_lazy_graph_module(True)` context manager can be used to pick LazyGraphModule instead. This has been applied to the torch.compile stack. Pull Request resolved: https://github.com/pytorch/pytorch/pull/117911 Approved by: https://github.com/jansel	2024-01-27 04:10:18 +00:00
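The eager versus lazy recompilation difference described in #117911 can be sketched in a few lines. `LazyModuleSketch` below is a hypothetical toy class, not `LazyGraphModule` itself: it only shows the pattern of marking the module as needing recompilation and doing the real work on first access to `code`.

```python
from typing import List, Optional


class LazyModuleSketch:
    """Toy module that defers code generation until the code is needed."""

    def __init__(self, ops: List[str]) -> None:
        self.ops = ops
        self.recompile_count = 0
        self._code: Optional[str] = None
        self._needs_recompile = True  # only mark; do not generate code yet

    def _real_recompile(self) -> None:
        # Stand-in for the expensive code generation step.
        self._code = "\n".join(f"x = {op}(x)" for op in self.ops)
        self.recompile_count += 1
        self._needs_recompile = False

    @property
    def code(self) -> str:
        # The real recompilation happens only when the code is required.
        if self._needs_recompile:
            self._real_recompile()
        assert self._code is not None
        return self._code


module = LazyModuleSketch(["relu", "sigmoid"])
count_after_construction = module.recompile_count
first = module.code
second = module.code
count_after_access = module.recompile_count
print(count_after_construction, count_after_access)  # 0 1
```

If `code` is never accessed, `_real_recompile` never runs, which corresponds to the case in the commit message where the recompilation is not triggered at all.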

313 Commits