pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-06 12:20:52 +01:00

Author	SHA1	Message	Date
Brian Hirsh	0d71a9dd5b	fix incorrect interaction between DDPOptimizer and donated buffers (#160745 ) This should fix https://x.com/wightmanr/status/1953147089518772254?t=ng_R4t0-tRhO_qQE8NqOhw&s=19. Still working on adding a reasonable test. You can see more of a description of the problem in the code comments. But the TLDR is that: * When using DDPOptimizer, we partition the graph and compile several subgraphs. So 1 dynamo graph becomes N AOT/inductor artifacts * We have some existing logic to stash graph metadata (`fw_metadata`) in dynamo's TracingContext. When using DDPOptimizer, we generate one `fw_metadata` per AOT graph, and we stash it on the 1 TracingContext from dynamo. So we end up clobbering the `fw_metadata` for graph i-1 when AOT and inductor start compiling graph i * This is normally ok, but it becomes a problem if inductor ever wants to read from this `fw_metadata` during backward compilation. Why? We (by default) compile the backwards lazily. So when using DDPOptimizer, we will compile backward graph N, then bw graph N-1, etc. But... at the time that we have started compiling bw graph N-1, its corresponding fw_metadata has already been clobbered! So we end up reusing graph N's metadata for all of our backward graph compilations. With donated buffer metadata, that means we end up donating and writing into the wrong input buffers. The fix that I added was to add more dedicated DDPOptimizer metadata into the TracingContext, so we can properly switch between these N different `fw_metadata` objects in the backward. Pull Request resolved: https://github.com/pytorch/pytorch/pull/160745 Approved by: https://github.com/ezyang, https://github.com/zou3519	2025-09-04 21:57:27 +00:00
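The clobbering described in the commit above can be reproduced without PyTorch. The sketch below is a toy model under stated assumptions: `ToyTracingContext`, `compile_forwards`, `compile_backwards_buggy`, `compile_backwards_fixed`, and the `per_submod_fw_metadata` field are hypothetical names, not dynamo's real API. It only shows why one shared metadata slot breaks when backward graphs are compiled lazily in reverse order, and how keeping one entry per DDPOptimizer submodule avoids that.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ToyTracingContext:
    # Hypothetical stand-in for dynamo's TracingContext (not the real class).
    # Buggy layout: one slot shared by every AOT graph DDPOptimizer produces
    # from a single dynamo graph.
    fw_metadata: Optional[str] = None
    # Fixed layout (hypothetical): one metadata entry per submodule index.
    per_submod_fw_metadata: Dict[int, str] = field(default_factory=dict)
    current_submod: Optional[int] = None


def compile_forwards(ctx: ToyTracingContext, n: int) -> None:
    # Forward graphs are compiled eagerly, in order 0..n-1.
    for i in range(n):
        meta = f"fw_metadata[{i}]"
        ctx.fw_metadata = meta  # overwrites graph i-1's metadata
        ctx.per_submod_fw_metadata[i] = meta  # kept separately per graph


def compile_backwards_buggy(ctx: ToyTracingContext, n: int) -> List[Optional[str]]:
    # Backward graphs are compiled lazily, last graph first; each one reads the
    # shared slot, which still holds the last forward graph's metadata.
    return [ctx.fw_metadata for _ in reversed(range(n))]


def compile_backwards_fixed(ctx: ToyTracingContext, n: int) -> List[str]:
    seen: List[str] = []
    for i in reversed(range(n)):
        ctx.current_submod = i  # switch to the matching forward's metadata
        seen.append(ctx.per_submod_fw_metadata[ctx.current_submod])
    return seen


ctx = ToyTracingContext()
compile_forwards(ctx, 3)
print(compile_backwards_buggy(ctx, 3))  # ['fw_metadata[2]', 'fw_metadata[2]', 'fw_metadata[2]']
print(compile_backwards_fixed(ctx, 3))  # ['fw_metadata[2]', 'fw_metadata[1]', 'fw_metadata[0]']
```

Keying the metadata by submodule index is one plausible way to "switch between these N different `fw_metadata` objects"; the PR itself may store it differently.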
Lucas Kabela	453cfa5153	typing distributed.py (#160365 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/160365 Approved by: https://github.com/StrongerXi ghstack dependencies: #160362, #160363, #160364	2025-08-15 02:09:31 +00:00
Boyuan Feng	38410cf9b5	Fix DDPOptimizer issue on static tensor index (#155746 ) We rely on `_try_get_metadata_from_dynamo()` to get static input indices. When the meta info is missing, it just returns an empty list of static input indices. This wrong list of static input indices leads to repeated cudagraph re-recording, which looks like a hang from the user perspective. `bc3972b80a/torch/_functorch/aot_autograd.py (L1025-L1031)` The root cause is that `split_module` in DDPOptimizer loses meta info and gm attributes. This PR fixes the issue by propagating this metadata from the original module to the submodules. `bc3972b80a/torch/_dynamo/backends/distributed.py (L515-L517)` Fixes #140395 Pull Request resolved: https://github.com/pytorch/pytorch/pull/155746 Approved by: https://github.com/xmfan, https://github.com/bdhirsh	2025-06-14 00:15:58 +00:00
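A toy illustration of the failure mode in the commit above, assuming a simplified module object: `static_input_indices`, `propagate_metadata`, and the `meta["static_names"]` key are hypothetical and do not reflect what `_try_get_metadata_from_dynamo()` actually reads. It shows that when splitting drops metadata, the lookup silently returns an empty list, and that copying the parent's metadata onto each submodule lets the indices be recomputed per submodule.

```python
import copy
from types import SimpleNamespace
from typing import List


def static_input_indices(mod) -> List[int]:
    # Hypothetical lookup: with no metadata it silently falls back to [],
    # the same symptom the commit describes.
    meta = getattr(mod, "meta", None)
    if not meta:
        return []
    static = meta["static_names"]
    return [i for i, name in enumerate(mod.input_names) if name in static]


def propagate_metadata(parent, submods) -> None:
    # Copy the parent's metadata onto every submodule that lost it in the split.
    for sub in submods:
        if not getattr(sub, "meta", None):
            sub.meta = copy.deepcopy(parent.meta)


parent = SimpleNamespace(meta={"static_names": {"w0", "w1"}})
sub0 = SimpleNamespace(input_names=["x", "w0"])  # split output: metadata lost
sub1 = SimpleNamespace(input_names=["h", "w1"])

before = [static_input_indices(s) for s in (sub0, sub1)]  # [[], []]
propagate_metadata(parent, [sub0, sub1])
after = [static_input_indices(s) for s in (sub0, sub1)]  # [[1], [1]]
```

The indices are recomputed from each submodule's own inputs rather than copied, because the parent's input positions do not line up with a submodule's.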
Xuehai Pan	3ce352e389	[BE][PYFMT] migrate PYFMT for `torch._dynamo` to `ruff format` (#144549 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/144549 Approved by: https://github.com/jansel	2025-02-28 03:03:53 +00:00
Raymond Li	21c2565f35	Document dynamo (#146736 ) Many files in dynamo are currently lacking file/module-level documentation, which makes it hard to know what they do at a glance and without digging into the code. This fixes that. Note: documentation was AI-generated and could be incorrect, please review carefully. Pull Request resolved: https://github.com/pytorch/pytorch/pull/146736 Approved by: https://github.com/jansel, https://github.com/StrongerXi, https://github.com/anijain2305, https://github.com/zou3519	2025-02-13 00:02:21 +00:00
Aaron Orenstein	a79100ab11	PEP585 update - torch/_dynamo (#145105 ) See #145101 for details. Pull Request resolved: https://github.com/pytorch/pytorch/pull/145105 Approved by: https://github.com/bobrenjc93	2025-01-18 20:47:11 +00:00
Simon Fan	f4969c8235	fix torch.compile + ddp + non-reentrant AC pack hook firing count (#144271 ) FIXES https://github.com/pytorch/pytorch/issues/144035 In order to preserve hook firing semantics, we disabled pack/unpack hooks for torch.compile: https://github.com/pytorch/pytorch/pull/123196. In DDP under torch.compile, there's this other callsite that we need to disable hooks for Pull Request resolved: https://github.com/pytorch/pytorch/pull/144271 Approved by: https://github.com/bdhirsh, https://github.com/soulitzer	2025-01-07 21:08:52 +00:00
Tom Ritchford	dc23f1944a	Remove unused Python variables in torch/[_-a]* (#133492 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/133492 Approved by: https://github.com/albanD	2024-12-12 17:39:14 +00:00
PyTorch MergeBot	5c97ac9721	Revert "Remove unused Python variables in torch/[_-a]* (#133492 )" This reverts commit `fda975a7b3`. Reverted https://github.com/pytorch/pytorch/pull/133492 on behalf of https://github.com/clee2000 due to Sorry, I need to revert this in order to revert something else. The only thing you need to do is rebase and remerge ([comment](https://github.com/pytorch/pytorch/pull/133492#issuecomment-2536635516))	2024-12-11 17:29:12 +00:00
Tom Ritchford	fda975a7b3	Remove unused Python variables in torch/[_-a]* (#133492 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/133492 Approved by: https://github.com/albanD	2024-12-10 21:48:44 +00:00
Aaron Gokaslan	12e95aa4ee	[BE]: Apply PERF401 autofixes from ruff (#140980 ) * Automatically applies ruff rule 401. Turns loops into equivalent list comprehensions which are faster and do not leak the scope of the loop variables. * list comprehensions not only often have better typing, but are 50+% faster than for loops on overhead. They also preserve length information etc and are better for the interpreter to optimize. * Manually went back and made mypy happy after the change. * Also fixed style lints in files covered by flake8 but not by pyfmt Pull Request resolved: https://github.com/pytorch/pytorch/pull/140980 Approved by: https://github.com/justinchuby, https://github.com/malfet	2024-11-20 17:52:07 +00:00
chilli	392221b390	Made DDPOptimizer work with HOPs (#138787 ) Fixes https://github.com/pytorch/pytorch/issues/137481 Pull Request resolved: https://github.com/pytorch/pytorch/pull/138787 Approved by: https://github.com/yf225 ghstack dependencies: #138733, #138794, #138881	2024-10-25 18:59:01 +00:00
Oguz Ulgen	6e79932543	Add basic mypy annotations to dynamo (#132415 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/132415 Approved by: https://github.com/XuehaiPan, https://github.com/jamesjwu	2024-08-04 18:43:36 +00:00
PyTorch MergeBot	3558a8cf4a	Revert "Add basic mypy annotations to dynamo (#132415 )" This reverts commit `71e22e0959`. Reverted https://github.com/pytorch/pytorch/pull/132415 on behalf of https://github.com/ZainRizvi due to Sorry, this PR has entered a weird state in the diff train. Trying to revert it to skip it, and then we can try relanding it ([comment](https://github.com/pytorch/pytorch/pull/132415#issuecomment-2267631785))	2024-08-04 18:39:29 +00:00
Oguz Ulgen	71e22e0959	Add basic mypy annotations to dynamo (#132415 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/132415 Approved by: https://github.com/XuehaiPan, https://github.com/jamesjwu	2024-08-01 20:14:25 +00:00
Oguz Ulgen	72d2dba992	Add None return type to init (#132335 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/132335 Approved by: https://github.com/albanD	2024-08-01 15:26:45 +00:00
Xuehai Pan	e74ba1b34a	[BE][Easy][15/19] enforce style for empty lines in import segments in `torch/_d*/` (#129767 ) See https://github.com/pytorch/pytorch/pull/129751#issue-2380881501. Most changes are auto-generated by linter. You can review these PRs via: ```bash git diff --ignore-all-space --ignore-blank-lines HEAD~1 ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/129767 Approved by: https://github.com/anijain2305	2024-07-31 21:18:11 +00:00
Elias Ellison	b8e5678ad2	Delete lazy ddp optimizer (#120727 ) This is no longer necessary now that the normal ddp optimizer works correctly with inductor strides. Differential Revision: [D54858819](https://our.internmc.facebook.com/intern/diff/D54858819) Pull Request resolved: https://github.com/pytorch/pytorch/pull/120727 Approved by: https://github.com/jansel, https://github.com/yf225	2024-06-26 21:53:54 +00:00
Animesh Jain	a0604193a2	handle call_function with Parameter args in DDPOptimizer splitting (#128034 ) When nn module inlining is enabled, modules are replaced with the underlying function calls in the output fx graph. Example: ``` class GraphModule(torch.nn.Module): def forward(self, L_x_: "f32[1024, 1024]"): l_x_ = L_x_ # File: /data/users/lsakka/pytorch/pytorch/test/dynamo/test_structured_trace.py:284 in forward, code: return self.layers(x) l__self___layers_0: "f32[1024, 1024]" = self.L__self___layers_0(l_x_); l_x_ = None l__self___layers_1: "f32[1024, 1024]" = self.L__self___layers_1(l__self___layers_0); l__self___layers_0 = None return (l__self___layers_1,) ``` will be ``` class GraphModule(torch.nn.Module): def forward(self, L_self_layers_0_weight: "f32[1024, 1024]", L_self_layers_0_bias: "f32[1024]", L_x_: "f32[1024, 1024]", L_self_layers_1_weight: "f32[1024, 1024]", L_self_layers_1_bias: "f32[1024]"): l_self_layers_0_weight = L_self_layers_0_weight l_self_layers_0_bias = L_self_layers_0_bias l_x_ = L_x_ l_self_layers_1_weight = L_self_layers_1_weight l_self_layers_1_bias = L_self_layers_1_bias # File: /data/users/lsakka/pytorch/pytorch/torch/nn/modules/linear.py:116 in forward, code: return F.linear(input, self.weight, self.bias) input_1: "f32[1024, 1024]" = torch._C._nn.linear(l_x_, l_self_layers_0_weight, l_self_layers_0_bias); l_x_ = l_self_layers_0_weight = l_self_layers_0_bias = None input_2: "f32[1024, 1024]" = torch._C._nn.linear(input_1, l_self_layers_1_weight, l_self_layers_1_bias); input_1 = l_self_layers_1_weight = l_self_layers_1_bias = None return (input_2,) ``` When performing splitting, the DDP optimizer does not handle the inlined graph, because it does not handle function calls: previously, parameters were never passed as inputs to function calls (only to module calls). This diff addresses that: it uses the example_value in the arguments to determine which arguments of a function call are Parameters, and their Parameter properties. This addresses https://github.com/pytorch/pytorch/issues/127552. Running the optimizer on the code above with inlining yields the following splitting: ``` ---submod_0 graph--- graph(): %l_x_ : torch.Tensor [num_users=1] = placeholder[target=l_x_] %l_self_layers_0_weight : torch.nn.parameter.Parameter [num_users=1] = placeholder[target=l_self_layers_0_weight] %l_self_layers_0_bias : torch.nn.parameter.Parameter [num_users=1] = placeholder[target=l_self_layers_0_bias] %linear : [num_users=1] = call_function[target=torch._C._nn.linear](args = (%l_x_, %l_self_layers_0_weight, %l_self_layers_0_bias), kwargs = {}) return linear ---submod_1 graph--- graph(): %input_1 : [num_users=1] = placeholder[target=input_1] %l_self_layers_1_weight : torch.nn.parameter.Parameter [num_users=1] = placeholder[target=l_self_layers_1_weight] %l_self_layers_1_bias : torch.nn.parameter.Parameter [num_users=1] = placeholder[target=l_self_layers_1_bias] %linear : [num_users=1] = call_function[target=torch._C._nn.linear](args = (%input_1, %l_self_layers_1_weight, %l_self_layers_1_bias), kwargs = {}) return linear ---final graph--- graph(): %l_self_layers_0_weight : torch.nn.parameter.Parameter [num_users=1] = placeholder[target=L_self_layers_0_weight] %l_self_layers_0_bias : torch.nn.parameter.Parameter [num_users=1] = placeholder[target=L_self_layers_0_bias] %l_x_ : torch.Tensor [num_users=1] = placeholder[target=L_x_] %l_self_layers_1_weight : torch.nn.parameter.Parameter [num_users=1] = placeholder[target=L_self_layers_1_weight] %l_self_layers_1_bias : torch.nn.parameter.Parameter [num_users=1] = placeholder[target=L_self_layers_1_bias] %submod_0 : [num_users=1] = call_module[target=compiled_submod_0](args = (%l_x_, %l_self_layers_0_weight, %l_self_layers_0_bias), kwargs = {}) %submod_1 : [num_users=1] = call_module[target=compiled_submod_1](args = (%submod_0, %l_self_layers_1_weight, %l_self_layers_1_bias), kwargs = {}) return (submod_1,) --------------- ``` whereas without inlining it used to be ``` ---submod_0 graph--- graph(): %l_x_ : torch.Tensor [num_users=1] = placeholder[target=l_x_] %l__self___layers_0 : [num_users=1] = call_module[target=L__self___layers_0](args = (%l_x_,), kwargs = {}) return l__self___layers_0 /data/users/lsakka/pytorch/pytorch/torch/_inductor/compile_fx.py:133: UserWarning: TensorFloat32 tensor cores for float32 matrix multiplication available but not enabled. Consider setting `torch.set_float32_matmul_precision('high')` for better performance. warnings.warn( ---submod_1 graph--- graph(): %l__self___layers_0 : [num_users=1] = placeholder[target=l__self___layers_0] %l__self___layers_1 : [num_users=1] = call_module[target=L__self___layers_1](args = (%l__self___layers_0,), kwargs = {}) return l__self___layers_1 ---final graph--- graph(): %l_x_ : torch.Tensor [num_users=1] = placeholder[target=L_x_] %submod_0 : [num_users=1] = call_module[target=compiled_submod_0](args = (%l_x_,), kwargs = {}) %submod_1 : [num_users=1] = call_module[target=compiled_submod_1](args = (%submod_0,), kwargs = {}) return (submod_1,) --------------- ``` TESTING: (1) running ``` TORCHDYNAMO_INLINE_INBUILT_NN_MODULES=1 pytest test/distributed/test_dynamo_distributed.py -k ``` results in a reduction in failures from 6 to 2 with this PR. The two remaining failures are FSDP-related, which do not sound trivial and involve many details; I will leave them for future work. Co-authored-by: Animesh Jain <anijain@umich.edu> Pull Request resolved: https://github.com/pytorch/pytorch/pull/128034 Approved by: https://github.com/anijain2305, https://github.com/wconstab	2024-06-13 17:07:27 +00:00
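The core idea of the commit above, inspecting each argument's example value to find the Parameter inputs of an inlined `call_function` node, can be sketched with plain dataclasses. `Node`, `FakeParam`, and `params_of` are hypothetical stand-ins for FX nodes, `nn.Parameter` example values, and the DDPOptimizer logic; in a real dynamo graph the example value is recorded on the node's metadata and is an actual (fake) tensor.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FakeParam:
    # Hypothetical stand-in for an nn.Parameter example value.
    name: str
    numel: int
    requires_grad: bool = True


@dataclass
class Node:
    # Minimal FX-like node; example_value mimics the traced value dynamo records.
    name: str
    op: str
    args: tuple = ()
    example_value: object = None


def params_of(node: Node) -> List[FakeParam]:
    # After nn-module inlining, parameters reach call_function nodes as plain
    # arguments, so inspect each argument's example value instead of a module.
    if node.op != "call_function":
        return []
    return [
        a.example_value
        for a in node.args
        if isinstance(a, Node)
        and isinstance(a.example_value, FakeParam)
        and a.example_value.requires_grad
    ]


x = Node("l_x_", "placeholder", example_value="plain tensor")
w0 = Node("l_self_layers_0_weight", "placeholder",
          example_value=FakeParam("layers.0.weight", 1024 * 1024))
b0 = Node("l_self_layers_0_bias", "placeholder",
          example_value=FakeParam("layers.0.bias", 1024))
linear = Node("linear", "call_function", args=(x, w0, b0))

print([p.name for p in params_of(linear)])  # ['layers.0.weight', 'layers.0.bias']
```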
Boyuan Feng	0c1ac4484d	Support `call_method` in DDPOptimizer (#121771 ) This PR fixes Issue #111279. While #111279 reported the issue with `MultiheadAttention`, a minimal reproduction would be: ```python class ToyModel(nn.Module): def __init__(self,): super().__init__() self.linear = nn.Linear(128, 10) def forward(self, x: torch.Tensor) -> torch.Tensor: return self.linear.forward(x) # Error # return self.linear(x) # OK ``` Dynamo treats `self.linear(x)` as `call_module` while treating `self.linear.forward(x)` as a [`get_attr` and a `call_method`](https://github.com/pytorch/pytorch/blob/main/torch/_dynamo/variables/nn_module.py#L358-L378). However, existing DDPOptimizer assumes, for a `get_attr` node, that `getattr(gm, node.target)` gives a tensor with the `requires_grad` attribute. Existing DDPOptimizer also does not support `call_method` nodes. This PR adds support for `call_method` and a check on `get_attr`. It also checks if a module's parameters have been added to a bucket to support multiple method calls from the same module. Pull Request resolved: https://github.com/pytorch/pytorch/pull/121771 Approved by: https://github.com/yf225	2024-03-13 20:03:15 +00:00
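A minimal sketch of the bucket bookkeeping this commit describes, assuming a simplified node representation of `(op, module_name)` pairs; `collect_bucket_params` is a hypothetical helper. In a real FX graph a `call_method` node's target is the method name (e.g. `forward`) and the module arrives through a preceding `get_attr` node, which this sketch folds into `module_name`.

```python
from typing import Dict, List, Set, Tuple


def collect_bucket_params(
    nodes: List[Tuple[str, str]], module_params: Dict[str, List[str]]
) -> List[str]:
    # nodes: (op, module_name) pairs in graph order. For call_method, module_name
    # is the module the method is invoked on (e.g. self.linear.forward(x)).
    seen: Set[str] = set()
    bucket: List[str] = []
    for op, module_name in nodes:
        if op not in ("call_module", "call_method"):
            continue
        for p in module_params.get(module_name, []):
            if p not in seen:  # a module used twice contributes its params once
                seen.add(p)
                bucket.append(p)
    return bucket


graph = [
    ("placeholder", "x"),
    ("call_module", "linear"),  # self.linear(x)
    ("call_method", "linear"),  # self.linear.forward(x)
    ("output", ""),
]
params = {"linear": ["linear.weight", "linear.bias"]}
print(collect_bucket_params(graph, params))  # ['linear.weight', 'linear.bias']
```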
Elias Ellison	d03b11ad5b	Pass inductor strides forward in ddp optimizer (#120523 ) # Note: Returning Fake Tensors on First AOT Autograd Call # # Inductor will optimize strides of outputs when it deems it profitable. # For instance, converting to channels last. When we split the graph here # into multiple inductor compilations, we need to make sure that the # output strides of one compilation are appropriately passed to the subsequent # compilations. However, the mapping from inductor output to dynamo output # is non-trivial due to aot_autograd's deduping, de-aliasing, mutation, re-writing, # subclass handling, etc. In order to replay all this logic we set a flag such that # the first invocation of inductor in aot_autograd will return Fake Tensors with # appropriate strides. Then, all of aot autograd's runtime logic is replayed. # This gives us the appropriately strided outputs here which will reflect runtime strides. Pull Request resolved: https://github.com/pytorch/pytorch/pull/120523 Approved by: https://github.com/yf225, https://github.com/bdhirsh	2024-02-29 22:25:00 +00:00
Elias Ellison	9c9bde515c	Factor out Submod compilers (#120527 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/120527 Approved by: https://github.com/kadeng	2024-02-28 22:11:47 +00:00
Edward Z. Yang	1a1fc1047d	Add structured trace logs (#120289 ) Overall design: https://docs.google.com/document/d/1CX_hJ0PNy9f3R1y8TJrfkSeLkvGjjjLU84BSXgS2AZ8/edit How to read the diff: * Most files are me augmenting pre-existing logging with structured variants. For the most part it's simple (esp FX graphs, which have a canonical string representation); it gets more complicated when I decided to JSON-ify some data structure instead of keeping the ad hoc printing (notably, guards and dynamo output graph sizes) * torch/_functorch/_aot_autograd/collect_metadata_analysis.py is some unrelated fixes I noticed while auditing artifact logs * torch/_logging/_internal.py has the actual trace log implementation. The trace logger is implemented as a logger named torch.__trace which is disconnected from the logging hierarchy. It gets its own handler and formatter (TorchLogsFormatter with _is_trace True). `trace_structured` is the main way to emit a trace log. Unusually, there's a separate "metadata" and "payload" field. The metadata field should not be too long (as it is serialized as a single line) and is always JSON (we put contextual things like compile id in it); the payload field can be long and is emitted after the metadata log line and can span multiple lines. * torch/_logging/structured.py contains some helpers for converting Python data structures into JSON form. Notably, we have a string interning implementation here, which helps reduce the cost of serializing filenames into the log. * test/dynamo/test_structured_trace.py the tests are cribbed from test_logging.py, but all rewritten to use expect tests on munged versions of what we'd actually output. Payloads are never tested, since they tend not to be very stable. https://github.com/ezyang/tlparse is a POC Rust program that can interpret these logs. Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/120289 Approved by: https://github.com/Skylion007 ghstack dependencies: #120712	2024-02-28 01:01:41 +00:00
PyTorch MergeBot	f3dd2a544c	Revert "Add structured trace logs (#120289 )" This reverts commit `9dfaef962c`. Reverted https://github.com/pytorch/pytorch/pull/120289 on behalf of https://github.com/kit1980 due to breaking internal builds, see D54230697 ([comment](https://github.com/pytorch/pytorch/pull/120289#issuecomment-1967477120))	2024-02-27 19:49:05 +00:00
Edward Z. Yang	9dfaef962c	Add structured trace logs (#120289 ) Overall design: https://docs.google.com/document/d/1CX_hJ0PNy9f3R1y8TJrfkSeLkvGjjjLU84BSXgS2AZ8/edit How to read the diff: * Most files are me augmenting pre-existing logging with structured variants. For the most part it's simple (esp FX graphs, which have a canonical string representation); it gets more complicated when I decided to JSON-ify some data structure instead of keeping the ad hoc printing (notably, guards and dynamo output graph sizes) * torch/_functorch/_aot_autograd/collect_metadata_analysis.py is some unrelated fixes I noticed while auditing artifact logs * torch/_logging/_internal.py has the actual trace log implementation. The trace logger is implemented as a logger named torch.__trace which is disconnected from the logging hierarchy. It gets its own handler and formatter (TorchLogsFormatter with _is_trace True). There's a teensy bit of FB specific code to automatically enable trace logging if a /logs directory exists. `trace_structured` is the main way to emit a trace log. Unusually, there's a separate "metadata" and "payload" field. The metadata field should not be too long (as it is serialized as a single line) and is always JSON (we put contextual things like compile id in it); the payload field can be long and is emitted after the metadata log line and can span multiple lines. * torch/_logging/structured.py contains some helpers for converting Python data structures into JSON form. Notably, we have a string interning implementation here, which helps reduce the cost of serializing filenames into the log. * test/dynamo/test_structured_trace.py the tests are cribbed from test_logging.py, but all rewritten to use expect tests on munged versions of what we'd actually output. Payloads are never tested, since they tend not to be very stable. https://github.com/ezyang/tlparse is a POC Rust program that can interpret these logs. Testing that the fbcode detection works at https://www.internalfb.com/mlhub/pipelines/runs/fblearner/534553450 (Meta-only) Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/120289 Approved by: https://github.com/Skylion007	2024-02-27 00:04:23 +00:00
Will Constable	abe3c55a6a	Update DDP dynamo debug docs (#118295 ) Refreshes https://github.com/pytorch/pytorch/pull/114201 and updates it to include other log names that also include ddp_optimizer. Pull Request resolved: https://github.com/pytorch/pytorch/pull/118295 Approved by: https://github.com/LucasLLC, https://github.com/wanchaol	2024-01-29 14:58:26 +00:00
Edward Z. Yang	d03173e88c	Unify MYPYINDUCTOR and MYPY (#118432 ) The original motivation for MYPYINDUCTOR was a faster type checking configuration that only checked a subset of files. With the removal of `follow_imports = ignore`, we are now able to use dmypy to do fast incremental typechecking, eliminating the need for this. Perhaps erroneously, when I tee'ed up this PR I elected to delete the `follow_imports = skip` designations in the mypy-inductor.ini. This led to a number of extra type error suppressions that I manually edited. You will need to review. Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/118432 Approved by: https://github.com/Skylion007 ghstack dependencies: #118414, #118418	2024-01-27 17:23:20 +00:00
Will Feng	a27ed4d364	[dynamo / DDP] Add optimize_ddp_lazy_compile config to control lazy compile for DDPOptimizer (False by default) (#116292 ) We want to enable `optimize_ddp_lazy_compile` by default as soon as possible, because it will fix stride mismatch errors (see motivation: https://github.com/pytorch/pytorch/pull/114154). However, lazy compile currently causes shape mismatch in other cases (`test_graph_split_inductor_transpose`) and we need to fix them before we can enable it by default. Differential Revision: D52373445 Pull Request resolved: https://github.com/pytorch/pytorch/pull/116292 Approved by: https://github.com/williamwen42, https://github.com/wconstab	2023-12-21 22:34:24 +00:00
Jon Chuang	2cf0cf8137	[dynamo / DDP] - lazily compile submodules - to propagate real tensor strides to backend compiler (#114154 ) Fixes https://github.com/pytorch/pytorch/issues/113812, https://github.com/pytorch/pytorch/issues/102591, Probably fixes: https://github.com/pytorch/pytorch/issues/113740, https://github.com/pytorch/pytorch/issues/113786, https://github.com/pytorch/pytorch/issues/113788 Pull Request resolved: https://github.com/pytorch/pytorch/pull/114154 Approved by: https://github.com/wconstab, https://github.com/yf225	2023-12-06 18:50:14 +00:00
PyTorch MergeBot	e38a3a6079	Revert "[dynamo / DDP] - lazily compile submodules - to propagate real tensor strides to backend compiler (#114154 )" This reverts commit `3f574eadb4`. Reverted https://github.com/pytorch/pytorch/pull/114154 on behalf of https://github.com/clee2000 due to reverted internally, broke internal builds, not sure why bot isn't working ([comment](https://github.com/pytorch/pytorch/pull/114154#issuecomment-1832496040))	2023-11-29 18:43:17 +00:00
Jon Chuang	3f574eadb4	[dynamo / DDP] - lazily compile submodules - to propagate real tensor strides to backend compiler (#114154 ) Fixes https://github.com/pytorch/pytorch/issues/113812, https://github.com/pytorch/pytorch/issues/102591, Probably fixes: https://github.com/pytorch/pytorch/issues/113740, https://github.com/pytorch/pytorch/issues/113786, https://github.com/pytorch/pytorch/issues/113788 Pull Request resolved: https://github.com/pytorch/pytorch/pull/114154 Approved by: https://github.com/wconstab	2023-11-28 06:29:43 +00:00
Will Constable	2333d381b2	Make 'distributed' TORCH_LOGS include ddpoptimizer (#114376 ) There are now 3 ways to see logs from ddpoptimizer. 1) TORCH_LOGS="distributed" 2) TORCH_LOGS="dynamo" 3) TORCH_LOGS="torch._dynamo.backends.distributed" (1 and 2 are different supersets of 3 that also include other content) Note: ddp_graphs is still a separate 'artifact' logger, which just includes graph dumps from the graph-splitting process. Pull Request resolved: https://github.com/pytorch/pytorch/pull/114376 Approved by: https://github.com/wanchaol	2023-11-28 02:39:28 +00:00
voznesenskym	081c5b3adc	Add Stateful/Stateless symbolic contexts, use fresh fake mode for dynamo backends (#113926 ) (#114526 ) Summary: The primary problem we are setting out to solve here is fake tensor freshness. Before this PR, fake tensors after dynamo represented fake tensors at the end of trace, so subsequent retraces like aot_autograd would start off with fake tensors in the wrong (end result) state, rather than their expected fresh state. The solution here is to start a fresh fake mode, and re-fakify the tensors. The nuance comes from ensuring that symbols are uniformly created for the symbolic sizes and strides of the tensor. This PR is the result of a lot of back and forth with ezyang and eellison. Initially, the first pass at this was not super different from what we have in the PR - the broad strokes were the same: 1) We cache source->symbol in shape_env 2) We pass policy objects around, stored at dynamo fakification time, and reused for later fakification 3) We create a new fake mode for backends (from https://github.com/pytorch/pytorch/pull/113605/files) This is ugly, and has some layering violations. We detoured our decision making through a few other alternatives. Immutable/mutable fake tensor mode was the most interesting alternative, https://github.com/pytorch/pytorch/pull/113653, and was struck down on concerns of complexity in fake mode combined with it not covering all edge cases. We also detoured on what to do about tensor memoization returning back potentially different tensors than requested, and if that was an anti pattern (it is) we want to hack in with the symbol cache (we don't). We went back to the drawing board here, but with a few concessions: 1) the cache for source->symbol must live outside of shape_env, for both lifecycle, and layering reasons 2) A good amount of work needs to be done to pipe policy around fake_mode and meta_utils correctly, to cover all the cases (ezyang did this) cc penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx chenyang78 aakhundov kadeng imported-using-ghimport Test Plan: Imported from OSS Reviewed By: huydhn, Chillee Differential Revision: D51566250 Pulled By: voznesenskym Pull Request resolved: https://github.com/pytorch/pytorch/pull/114526 Approved by: https://github.com/Chillee, https://github.com/huydhn	2023-11-26 23:40:32 +00:00
PyTorch MergeBot	2f3beb715c	Revert "Add Stateful/Stateless symbolic contexts, use fresh fake mode for dynamo backends (#113926 )" This reverts commit `2ca1119d53`. Reverted https://github.com/pytorch/pytorch/pull/113926 on behalf of https://github.com/DanilBaibak due to Break internal build ([comment](https://github.com/pytorch/pytorch/pull/113926#issuecomment-1822713852))	2023-11-22 12:52:33 +00:00
PyTorch MergeBot	e239a2b2d7	Revert "[dynamo / DDP] - lazily compile submodules - to propagate real tensor strides to backend compiler (#114154 )" This reverts commit `266054c3ca`. Reverted https://github.com/pytorch/pytorch/pull/114154 on behalf of https://github.com/DanilBaibak due to The lower PR in the stack https://github.com/pytorch/pytorch/pull/113926 breaks the internal build ([comment](https://github.com/pytorch/pytorch/pull/114154#issuecomment-1822704476))	2023-11-22 12:46:15 +00:00
Jon Chuang	266054c3ca	[dynamo / DDP] - lazily compile submodules - to propagate real tensor strides to backend compiler (#114154 ) Fixes https://github.com/pytorch/pytorch/issues/113812, https://github.com/pytorch/pytorch/issues/102591, Probably fixes: https://github.com/pytorch/pytorch/issues/113740, https://github.com/pytorch/pytorch/issues/113786, https://github.com/pytorch/pytorch/issues/113788 Pull Request resolved: https://github.com/pytorch/pytorch/pull/114154 Approved by: https://github.com/wconstab	2023-11-21 22:40:08 +00:00
voznesenskym	2ca1119d53	Add Stateful/Stateless symbolic contexts, use fresh fake mode for dynamo backends (#113926 ) The primary problem we are setting out to solve here is fake tensor freshness. Before this PR, fake tensors after dynamo represented fake tensors at the end of trace, so subsequent retraces like aot_autograd would start off with fake tensors in the wrong (end result) state, rather than their expected fresh state. The solution here is to start a fresh fake mode, and re-fakify the tensors. The nuance comes from ensuring that symbols are uniformly created for the symbolic sizes and strides of the tensor. This PR is the result of a lot of back and forth with @ezyang and @eellison. Initially, the first pass at this was not super different from what we have in the PR - the broad strokes were the same: 1) We cache source->symbol in shape_env 2) We pass policy objects around, stored at dynamo fakification time, and reused for later fakification 3) We create a new fake mode for backends (from https://github.com/pytorch/pytorch/pull/113605/files) This is ugly, and has some layering violations. We detoured our decision making through a few other alternatives. Immutable/mutable fake tensor mode was the most interesting alternative, https://github.com/pytorch/pytorch/pull/113653, and was struck down on concerns of complexity in fake mode combined with it not covering all edge cases. We also detoured on what to do about tensor memoization returning back potentially different tensors than requested, and if that was an anti pattern (it is) we want to hack in with the symbol cache (we don't). We went back to the drawing board here, but with a few concessions: 1) the cache for source->symbol must live outside of shape_env, for both lifecycle, and layering reasons 2) A good amount of work needs to be done to pipe policy around fake_mode and meta_utils correctly, to cover all the cases (@ezyang did this) Pull Request resolved: https://github.com/pytorch/pytorch/pull/113926 Approved by: https://github.com/ezyang, https://github.com/eellison	2023-11-20 23:06:37 +00:00
Brian Hirsh	da914aed21	error when using _dynamo.optimize_ddp=True and _inductor.keep_output_stride=False together (#108235 ) From talking to @wconstab, we agreed that because of the way DDPOptimizer is written, it is (sort of) incompatible with inductor's `keep_output_stride=False` optimizations (and will cause silent correctness problems if you use them together). Added an assertion. Pull Request resolved: https://github.com/pytorch/pytorch/pull/108235 Approved by: https://github.com/wconstab ghstack dependencies: #108081	2023-09-05 20:02:35 +00:00
Animesh Jain	d0e5c681f5	[dynamo][ddp][ac] Fallback to single bucket when higher order op (#104639 ) This helps unblock an internal model. The real fix requires a lot of work, which might question the alternate approach of partitioning AOT graphs instead of Dynamo graphs. Pull Request resolved: https://github.com/pytorch/pytorch/pull/104639 Approved by: https://github.com/wconstab	2023-07-06 02:20:15 +00:00
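A toy version of size-based bucketing with the single-bucket fallback, under these assumptions: parameters are visited in reverse graph order (the DDPOptimizer traversal order described in the "handle subgraphs without outputs" commit in this log), a bucket closes once it reaches a hypothetical `cap` in bytes, and `has_hop` flags that the graph contains a higher-order op. `make_buckets` is illustrative only and is not the real DDPOptimizer code.

```python
from typing import List, Tuple


def make_buckets(
    params: List[Tuple[str, int]], cap: int, has_hop: bool
) -> List[List[str]]:
    # params: (name, size_in_bytes) in forward graph order.
    if has_hop:
        # Fallback: keep every parameter in one bucket, so the graph is not split.
        return [[name for name, _ in reversed(params)]]
    buckets: List[List[str]] = []
    current: List[str] = []
    size = 0
    for name, nbytes in reversed(params):  # walk from the end of the graph
        current.append(name)
        size += nbytes
        if size >= cap:  # bucket is full: close it and start a new one
            buckets.append(current)
            current, size = [], 0
    if current:
        buckets.append(current)
    return buckets


params = [("w0", 4), ("w1", 4), ("w2", 4)]
print(make_buckets(params, cap=8, has_hop=False))  # [['w2', 'w1'], ['w0']]
print(make_buckets(params, cap=8, has_hop=True))   # [['w2', 'w1', 'w0']]
```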
Will Constable	55cf5c00fa	Improve DDPOptimizer Logging (#103489 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/103489 Approved by: https://github.com/ezyang	2023-06-14 22:24:44 +00:00
Will Constable	fee01640df	Make DDPOptimizer handle subgraphs without outputs (#103488 ) Subgraphs are partitions cut out of a whole graph. Outputs of a subgraph are either global outputs of the original graph, or can be outputs of a partition that feed inputs of the subsequent partition. Subgraphs are created using the fx utility 'passes.split_module', which requires that each partition have at least one output node. In cases where DDPOptimizer asked the partitioner to cut the graph around a set of nodes which only performed inplace mutation, the partitioner could be left trying to create a subgraph with no output nodes, violating its assumptions. To circumvent this, DDPOptimizer can expand the set of nodes marked for inclusion in a subgraph that has no outputs until it includes a node that is an output for that subgraph. It still traverses nodes of the original graph in reverse order and only considers widening a subgraph by iterating further in reverse order than it would have ordinarily done (past the cut point dictated by parameter count). It may still be possible the subgraph reaches the input node of the graph without satisfying the subgraph-output condition, in which case an error would still be raised by the partitioner. Fixes #103385 Pull Request resolved: https://github.com/pytorch/pytorch/pull/103488 Approved by: https://github.com/anijain2305	2023-06-14 01:16:04 +00:00
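The widening rule in #103488 can be sketched in plain Python. This is a minimal, hypothetical model and not DDPOptimizer's code: the `Node` dataclass, its `param_bytes`/`users`/`is_output` fields, `has_output`, and `bucket_nodes` are illustrative names, and the `>= cap` comparison is an assumption. Nodes are visited in reverse order, and a bucket that reaches the parameter cap is only closed once it contains a node that is a graph output or is consumed outside the bucket.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    # Hypothetical stand-in for an FX node: only the fields this sketch needs.
    name: str
    param_bytes: int
    users: List[str] = field(default_factory=list)  # names of consuming nodes
    is_output: bool = False  # True if the node is a global graph output


def has_output(bucket: List[Node]) -> bool:
    """A bucket has an output if some node is a graph output or is used outside it."""
    names = {n.name for n in bucket}
    return any(n.is_output or any(u not in names for u in n.users) for n in bucket)


def bucket_nodes(nodes: List[Node], cap: int) -> List[List[str]]:
    """Split forward-ordered nodes into buckets, walking in reverse order."""
    buckets: List[List[Node]] = []
    current: List[Node] = []
    current_bytes = 0
    for node in reversed(nodes):
        current.append(node)
        current_bytes += node.param_bytes
        # Close the bucket at the cap, but keep widening (further in reverse
        # order) while the bucket still has no output node.
        if current_bytes >= cap and has_output(current):
            buckets.append(current)
            current, current_bytes = [], 0
    if current:
        if not has_output(current):
            # Reached the start of the graph without finding an output.
            raise ValueError("subgraph has no outputs")
        buckets.append(current)
    # Restore forward order for both the buckets and the nodes inside them.
    return [[n.name for n in reversed(b)] for b in reversed(buckets)]


# `add_` is an in-place mutation with no users, so it cannot end a bucket alone.
graph = [
    Node("lin1", 4, users=["relu"]),
    Node("relu", 0, users=["lin2", "add_"]),
    Node("lin2", 4, is_output=True),
    Node("add_", 4),
]
print(bucket_nodes(graph, cap=4))  # [['lin1', 'relu'], ['lin2', 'add_']]
```

Without the widening step, `add_` would form a bucket on its own and have no output; with it, the bucket grows to include `lin2`.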
Edward Z. Yang	fa40195fac	Don't set_current_node in DDP. (#101046 ) Fixes https://github.com/pytorch/pytorch/issues/101045 Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/101046 Approved by: https://github.com/wconstab, https://github.com/malfet	2023-05-12 14:37:22 +00:00
Bert Maher	e0bf51d3bf	[dynamo] Add ddp_graphs artifact (#100021 ) I want to be able to decouple DDP graph printing from the rest of dynamo DEBUG-level logging, since frequently these logs are particularly enlightening. Differential Revision: [D45290919](https://our.internmc.facebook.com/intern/diff/D45290919/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/100021 Approved by: https://github.com/wconstab, https://github.com/mlazos	2023-04-27 03:53:23 +00:00
Edward Z. Yang	b09722f540	Convert logging f-strings to use % format, part two (#98700 ) This hits multi-line logging strings Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/98700 Approved by: https://github.com/voznesenskym	2023-04-10 12:19:31 +00:00
Edward Z. Yang	9a8f71f23e	Convert logging f-strings to use % format (#98697 ) Codemod done with https://gist.github.com/ezyang/2e8b0463cdc6be278478495b23ff0530 with assistance from ChatGPT. Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/98697 Approved by: https://github.com/voznesenskym	2023-04-10 12:19:31 +00:00
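The commits do not state their motivation, but one commonly cited benefit of %-style logging arguments can be shown with the standard `logging` module: the arguments are stored and only formatted if the record is emitted, while an f-string is built before the level check runs. A minimal sketch; the logger name and the `Expensive` class are illustrative.

```python
import logging

log = logging.getLogger("ddp_example")  # illustrative logger name
log.setLevel(logging.WARNING)  # DEBUG records are filtered out


class Expensive:
    """Counts how many times it is rendered to a string."""

    def __init__(self):
        self.renders = 0

    def __str__(self):
        self.renders += 1
        return "<expensive graph dump>"


obj = Expensive()

# f-string: the message is built before logging checks the level.
log.debug(f"graph: {obj}")
eager_renders = obj.renders

# %-style: formatting is deferred and skipped because DEBUG is disabled.
log.debug("graph: %s", obj)
lazy_renders = obj.renders - eager_renders

print(eager_renders, lazy_renders)  # prints "1 0"
```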
Edward Z. Yang	d01ee10b25	Add detect_fake_mode (#98321 ) This replaces fake_mode_from_tensors but it preferentially looks for fake_mode in TracingContext and also if there is an active fake mode on the dispatch stack, before groveling in tensors to find it. This advances PegasusForCausalLM, which was previously failing because we generated a graph that had a parameter (non-fake) and a SymInt, and thus previously we failed to detect the correct fake mode. Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/98321 Approved by: https://github.com/voznesenskym	2023-04-05 22:15:16 +00:00
Edward Z. Yang	5df59f957f	Fix G001,G002,G003 in logs to % syntax (#97812 ) Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/97812 Approved by: https://github.com/Skylion007, https://github.com/kiukchung, https://github.com/malfet, https://github.com/mlazos	2023-04-01 01:43:33 +00:00
Andrew Gu	d9cd9a13bc	[BE][DDPOptimizer] De-dup `p` and `param` (#95654 ) The `param` from `param = target.get_parameter(name)` should be the same as `p` from `target.named_parameters()`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/95654 Approved by: https://github.com/wconstab	2023-03-01 01:17:09 +00:00
Kazuaki Ishizaki	46385b3e48	Fix typos under torch/_dynamo directory (#95599 ) This PR fixes typos in comments and messages of `.py` files under `torch/_dynamo` directory Pull Request resolved: https://github.com/pytorch/pytorch/pull/95599 Approved by: https://github.com/ezyang	2023-02-28 03:44:24 +00:00
Will Constable	9fb9219478	Make DDPOptimizer work with torch._dynamo.explain() (#94749 ) GraphModules that were created during DDPOptimizer graph breaking lacked `compile_subgraph_reason`, which caused an exception when running .explain(). Now the reason is provided and users can use .explain() to find out that DDPOptimizer is causing graph breaks. Fixes #94579 Pull Request resolved: https://github.com/pytorch/pytorch/pull/94749 Approved by: https://github.com/voznesenskym	2023-02-14 01:33:47 +00:00

51 Commits