pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 00:21:07 +01:00

Author	SHA1	Message	Date
Will Feng	a89e3c2490	Add compiled_autograd_kwargs_override Dynamo config (#136967 ) For Traceable FSDP2, the most common use case is to have `fullgraph=False` for forward pass (to allow user-level graph breaks), and `fullgraph=True` for compiled autograd backward pass (required for queue_callback support). With `torch._dynamo.compiled_autograd=True`, previously we are not able to set different `fullgraph` config value for forward vs. backward pass, since `rebuild_ctx` just reuses the forward compile config as-is. This PR adds `torch._dynamo.config.compiled_autograd_kwargs_override` config to allow forcing `fullgraph=True` for CA Dynamo tracing. With this PR, we can remove standalone compiled autograd ctx manager usage in Traceable FSDP2 unit tests, and consolidate on using `torch._dynamo.compiled_autograd=True`. Test commands: - `pytest -rA test/distributed/_composable/fsdp/test_fully_shard_compile.py::TestFullyShardCompile::test_transformer_backend_inductor_fullgraph_True` Pull Request resolved: https://github.com/pytorch/pytorch/pull/136967 Approved by: https://github.com/xmfan	2024-10-02 06:23:59 +00:00
Yuanhao Ji	be169f743b	[Dynamo] Mark `config.dead_code_elimination` as deprecated (#136933 ) part of #136862 For reviewers, all call sites are here: https://github.com/search?q=repo%3Apytorch%2Fpytorch+dead_code_elimination+language%3APython&type=code&l=Python Pull Request resolved: https://github.com/pytorch/pytorch/pull/136933 Approved by: https://github.com/williamwen42, https://github.com/anijain2305	2024-10-01 03:51:59 +00:00
Oguz Ulgen	a28b40fa74	Improve is_fbcode functionality (#136871 ) Summary: Previously is_fbcode just checked whether the checkout was git or not. This is extremely error prone. Lets make it fool-proof. Test Plan: unit tests Reviewed By: masnesral Differential Revision: D63545169 Pull Request resolved: https://github.com/pytorch/pytorch/pull/136871 Approved by: https://github.com/masnesral	2024-09-27 21:19:01 +00:00
Edward Z. Yang	a2d2a30311	Add torch._dynamo.config.fail_on_cache_limit_hit (#136767 ) Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/136767 Approved by: https://github.com/albanD, https://github.com/jansel ghstack dependencies: #136533	2024-09-27 03:58:00 +00:00
William Wen	95e976a63f	[dynamo] recursively skip frames when Dynamo cache limit is hit (#135144 ) Fixes https://github.com/pytorch/pytorch/pull/135144 and [T197117723](https://www.internalfb.com/intern/tasks/?t=197117723). In general, adds `SkipCodeRecursiveException` to Dynamo - when raised in Dynamo, convert_frame will return a `skip_code_recursive_flag` back to C Dynamo, signaling it to skip the current frame and all recursive calls. Pull Request resolved: https://github.com/pytorch/pytorch/pull/135144 Approved by: https://github.com/jansel, https://github.com/anijain2305	2024-09-06 21:38:53 +00:00
Animesh Jain	058a69d91a	[fbcode][dynamo] Turn on guard_nn_modules using justknobs_check (#134928 ) As Title Pull Request resolved: https://github.com/pytorch/pytorch/pull/134928 Approved by: https://github.com/ezyang	2024-09-05 22:05:54 +00:00
Laith Sakka	d6091c8726	Add compile time instruction count metric (#133834 ) PYTHONPATH=$(pwd) python benchmarks/update_hint_benchmark.py out as of this diff, compile_time_instruction_count counts the number of instruction from within convert_frame.compile_inner ``` update_hint_regression,compile_time_instruction_count,10522459165 ``` will add result from CI once populated. Pull Request resolved: https://github.com/pytorch/pytorch/pull/133834 Approved by: https://github.com/aorenste	2024-08-27 23:29:02 +00:00
Edward Z. Yang	66d6d8b1b9	Support TORCH_COMPILER_COLLECTIVES envvar (#133696 ) Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/133696 Approved by: https://github.com/Skylion007, https://github.com/c-p-i-o	2024-08-19 20:13:04 +00:00
Will Feng	6790eb52f9	[Traceable FSDP2] Set torch._dynamo.config.skip_fsdp_hooks to True by default (#133531 ) Setting `torch._dynamo.config.skip_fsdp_hooks = True` is required for graph-break compiled FSDP2, thus setting it to default will make this adoption easier. If users want to use Traceable FSDP2, they can set this to False manually (which will allow FSDP2 hooks to be traced through). Pull Request resolved: https://github.com/pytorch/pytorch/pull/133531 Approved by: https://github.com/awgu ghstack dependencies: #133532	2024-08-16 17:18:42 +00:00
Aart Bik	a8490a0762	[traced-graph][sparse] propagate sparsity in fx graph (#131920 ) This PR proceeds with implementing the feature request #117188 by generalizing more cases that already work with COO to work with the compressed sparse formats as well. Feature request: https://github.com/pytorch/pytorch/issues/117188 Rebranch of older PRs (for history): https://github.com/pytorch/pytorch/pull/131474 https://github.com/pytorch/pytorch/pull/128549 Pull Request resolved: https://github.com/pytorch/pytorch/pull/131920 Approved by: https://github.com/ezyang	2024-08-05 15:49:53 +00:00
Xuehai Pan	e74ba1b34a	[BE][Easy][15/19] enforce style for empty lines in import segments in `torch/_d*/` (#129767 ) See https://github.com/pytorch/pytorch/pull/129751#issue-2380881501. Most changes are auto-generated by linter. You can review these PRs via: ```bash git diff --ignore-all-space --ignore-blank-lines HEAD~1 ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/129767 Approved by: https://github.com/anijain2305	2024-07-31 21:18:11 +00:00
Animesh Jain	f44446e851	[dynamo] Turn on inline_inbuilt_nn_modules (#131275 ) Known issues that are deliberately kept open and will be fixed later are tracked here - https://github.com/pytorch/pytorch/issues/131696 Training dashboard ([link](https://hud.pytorch.org/benchmark/compilers?dashboard=torchinductor&startTime=Thu%2C%2018%20Jul%202024%2000%3A03%3A50%20GMT&stopTime=Thu%2C%2025%20Jul%202024%2000%3A03%3A50%20GMT&granularity=hour&suite=torchbench&mode=training&dtype=amp&lBranch=gh/anijain2305/435/head&lCommit=408b9358b8fca3a5d08b39741419fe8a596941aa&rBranch=gh/anijain2305/435/base&rCommit=d31f2ae904ba2cf0884bf24413ba2109c3585d51)) ![image](https://github.com/user-attachments/assets/08ef081c-37d7-436d-905b-4b9e2b470644) Inference dashboard ([link](https://hud.pytorch.org/benchmark/compilers?dashboard=torchinductor&startTime=Thu%2C%2018%20Jul%202024%2000%3A03%3A50%20GMT&stopTime=Thu%2C%2025%20Jul%202024%2000%3A03%3A50%20GMT&granularity=hour&suite=torchbench&mode=inference&dtype=bfloat16&lBranch=gh/anijain2305/435/head&lCommit=914244fa2fe0055917e039e35183b21fa90afdc6&rBranch=gh/anijain2305/435/base&rCommit=d31f2ae904ba2cf0884bf24413ba2109c3585d51)) ![image](https://github.com/user-attachments/assets/32136eff-a39e-4cde-a438-e51a665bc3c9) Inference sees a little bit more perf degradation but we are ok with that. Pull Request resolved: https://github.com/pytorch/pytorch/pull/131275 Approved by: https://github.com/ezyang, https://github.com/jansel ghstack dependencies: #132053	2024-07-29 20:01:51 +00:00
PyTorch MergeBot	7ef927da15	Revert "[dynamo] Turn on inline_inbuilt_nn_modules (#131275 )" This reverts commit `6de65d5dd4`. Reverted https://github.com/pytorch/pytorch/pull/131275 on behalf of https://github.com/atalman due to Broke CI: dynamo/test_structured_trace.py::StructuredTraceTest::test_ddp_graphs [GH job link](https://github.com/pytorch/pytorch/actions/runs/10132084288/job/28016215101) [HUD commit link](`6de65d5dd4`) ([comment](https://github.com/pytorch/pytorch/pull/131275#issuecomment-2255839646))	2024-07-29 12:48:27 +00:00
Animesh Jain	6de65d5dd4	[dynamo] Turn on inline_inbuilt_nn_modules (#131275 ) Known issues that are deliberately kept open and will be fixed later are tracked here - https://github.com/pytorch/pytorch/issues/131696 Training dashboard ([link](https://hud.pytorch.org/benchmark/compilers?dashboard=torchinductor&startTime=Thu%2C%2018%20Jul%202024%2000%3A03%3A50%20GMT&stopTime=Thu%2C%2025%20Jul%202024%2000%3A03%3A50%20GMT&granularity=hour&suite=torchbench&mode=training&dtype=amp&lBranch=gh/anijain2305/435/head&lCommit=408b9358b8fca3a5d08b39741419fe8a596941aa&rBranch=gh/anijain2305/435/base&rCommit=d31f2ae904ba2cf0884bf24413ba2109c3585d51)) ![image](https://github.com/user-attachments/assets/08ef081c-37d7-436d-905b-4b9e2b470644) Inference dashboard ([link](https://hud.pytorch.org/benchmark/compilers?dashboard=torchinductor&startTime=Thu%2C%2018%20Jul%202024%2000%3A03%3A50%20GMT&stopTime=Thu%2C%2025%20Jul%202024%2000%3A03%3A50%20GMT&granularity=hour&suite=torchbench&mode=inference&dtype=bfloat16&lBranch=gh/anijain2305/435/head&lCommit=914244fa2fe0055917e039e35183b21fa90afdc6&rBranch=gh/anijain2305/435/base&rCommit=d31f2ae904ba2cf0884bf24413ba2109c3585d51)) ![image](https://github.com/user-attachments/assets/32136eff-a39e-4cde-a438-e51a665bc3c9) Inference sees a little bit more perf degradation but we are ok with that. Pull Request resolved: https://github.com/pytorch/pytorch/pull/131275 Approved by: https://github.com/ezyang, https://github.com/jansel ghstack dependencies: #131744, #131928, #131948	2024-07-28 13:23:00 +00:00
PyTorch MergeBot	14920c149b	Revert "[dynamo] Turn on inline_inbuilt_nn_modules (#131275 )" This reverts commit `0455344777`. Reverted https://github.com/pytorch/pytorch/pull/131275 on behalf of https://github.com/clee2000 due to I think this broke inductor/test_cpu_select_algorithm.py::TestSelectAlgorithmDynamicShapesCPU::test_quantized_linear_amx_dynamic_shapes_batch_size_16_in_features_4_out_features_64_bias_True_cpu [GH job link](https://github.com/pytorch/pytorch/actions/runs/10102272826/job/27938970118) [HUD commit link](`0455344777`) not run on PR due to bad TD ([comment](https://github.com/pytorch/pytorch/pull/131275#issuecomment-2251609554))	2024-07-26 00:12:40 +00:00
Animesh Jain	0455344777	[dynamo] Turn on inline_inbuilt_nn_modules (#131275 ) Known issues that are deliberately kept open and will be fixed later are tracked here - https://github.com/pytorch/pytorch/issues/131696 Training dashboard ([link](https://hud.pytorch.org/benchmark/compilers?dashboard=torchinductor&startTime=Thu%2C%2018%20Jul%202024%2000%3A03%3A50%20GMT&stopTime=Thu%2C%2025%20Jul%202024%2000%3A03%3A50%20GMT&granularity=hour&suite=torchbench&mode=training&dtype=amp&lBranch=gh/anijain2305/435/head&lCommit=408b9358b8fca3a5d08b39741419fe8a596941aa&rBranch=gh/anijain2305/435/base&rCommit=d31f2ae904ba2cf0884bf24413ba2109c3585d51)) ![image](https://github.com/user-attachments/assets/08ef081c-37d7-436d-905b-4b9e2b470644) Inference dashboard ([link](https://hud.pytorch.org/benchmark/compilers?dashboard=torchinductor&startTime=Thu%2C%2018%20Jul%202024%2000%3A03%3A50%20GMT&stopTime=Thu%2C%2025%20Jul%202024%2000%3A03%3A50%20GMT&granularity=hour&suite=torchbench&mode=inference&dtype=bfloat16&lBranch=gh/anijain2305/435/head&lCommit=914244fa2fe0055917e039e35183b21fa90afdc6&rBranch=gh/anijain2305/435/base&rCommit=d31f2ae904ba2cf0884bf24413ba2109c3585d51)) ![image](https://github.com/user-attachments/assets/32136eff-a39e-4cde-a438-e51a665bc3c9) Inference sees a little bit more perf degradation but we are ok with that. Pull Request resolved: https://github.com/pytorch/pytorch/pull/131275 Approved by: https://github.com/ezyang ghstack dependencies: #131744	2024-07-25 22:14:17 +00:00
Edward Z. Yang	0c6f1ca064	Introduce torch._dynamo.config.enable_compiler_collectives for syncing compilation across ranks (#130935 ) This PR implements an opt-in configuration option for synchronizing compilation across all ranks at the end of Dynamo tracing (and potentially, other places in the future). There are two pieces to this PR: 1. Implementing infrastructure for compiler collectives (DistributedState/LocalState, the actual collective) 2. Using this infrastructure to synchronize automatic dynamic choices across all ranks The infrastructure in part one can be used for other purposes, just add more (serializable) fields to LocalState. Here is how automatic dynamic synchronization works: 1. Preflight in "torch/_dynamo/variables/builder.py": On the first Dynamo trace run, we trace without automatic dynamic at all; we assume all Tensor inputs that are not otherwise marked are static. This run is purely to collect all Tensor input sizes in the program. 2. torch/_dynamo/output_graph.py: At the end of the first Dynamo trace run, we perform a compiler collective to distribute all Tensor input sizes to all ranks. Then, we restart Dynamo 3. Apply the updates in "torch/_dynamo/variables/builder.py": Now that we have all sizes for every rank, we now update frame state with the observed sizes for all ranks, in rank order. Under the assumption that frame state is consistent on all ranks, this series of updates will preserve consistency. For future work, it would be safer if we force a consistent hint on all ranks; this is more involved as we have to interpose in fakification. Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/130935 Approved by: https://github.com/jansel	2024-07-24 11:24:11 +00:00
Pian Pawakapan	988ed4d5db	[export] clean up allow_complex_guards_as_runtime_asserts flag (#130596 ) Summary: removes underscore, cleans up dead code in DimConstraints Test Plan: existing export tests Reviewed By: angelayi Differential Revision: D59612746 Pull Request resolved: https://github.com/pytorch/pytorch/pull/130596 Approved by: https://github.com/angelayi	2024-07-12 17:17:11 +00:00
Yanbo Liang	111f9b5d44	[Dynamo] Add config to skip/inline torchrec (#129912 ) Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/129912 Approved by: https://github.com/anijain2305	2024-07-03 00:14:51 +00:00
Elias Ellison	b8e5678ad2	Delete lazy ddp optimizer (#120727 ) This is no longer necessary now that the normal ddp optimizer works correctly with inductor strides. Differential Revision: [D54858819](https://our.internmc.facebook.com/intern/diff/D54858819) Pull Request resolved: https://github.com/pytorch/pytorch/pull/120727 Approved by: https://github.com/jansel, https://github.com/yf225	2024-06-26 21:53:54 +00:00
Will Feng	575bc1e3af	[Reopen #114036 ] Allow "must recompute" in torch.compile + selective checkpointing (SAC) (#129295 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/129295 Approved by: https://github.com/Chillee	2024-06-25 23:47:08 +00:00
Will Feng	dadc0ed4c8	[Traceable FSDP2] Add `aot_eager` backend E2E tests for transformer model (#129157 ) This PR adds Traceable FSDP2 `aot_eager` backend E2E tests for simple MLP as well as transformer model. Pull Request resolved: https://github.com/pytorch/pytorch/pull/129157 Approved by: https://github.com/awgu ghstack dependencies: #129203	2024-06-23 06:11:11 +00:00
Aaron Orenstein	dcfa7702c3	Flip default value for mypy disallow_untyped_defs [1/11] (#127838 ) See #127836 for details. Pull Request resolved: https://github.com/pytorch/pytorch/pull/127838 Approved by: https://github.com/oulgen	2024-06-08 18:16:33 +00:00
Michael Lazos	2129903aa3	Properly detect nested torch function args (#127496 ) Dynamo was not detecting nested torch function classes in containers. This was due to pytree compatibility for variable trackers being removed. Fixes https://github.com/pytorch/pytorch/issues/127174 Pull Request resolved: https://github.com/pytorch/pytorch/pull/127496 Approved by: https://github.com/anijain2305	2024-06-02 03:43:22 +00:00
Simon Fan	ec098b88b6	[compiled autograd] torch.compile API (#125880 ) - enter existing compiled autograd ctx manager before entering torch.compile frames Pull Request resolved: https://github.com/pytorch/pytorch/pull/125880 Approved by: https://github.com/jansel	2024-05-31 04:38:20 +00:00
PyTorch MergeBot	ce63b676f3	Revert "[compiled autograd] torch.compile API (#125880 )" This reverts commit `e1c322112a`. Reverted https://github.com/pytorch/pytorch/pull/125880 on behalf of https://github.com/atalman due to sorry your PR broke lint, need to revert ([comment](https://github.com/pytorch/pytorch/pull/125880#issuecomment-2139605376))	2024-05-30 13:53:31 +00:00
Simon Fan	e1c322112a	[compiled autograd] torch.compile API (#125880 ) - enter existing compiled autograd ctx manager before entering torch.compile frames Pull Request resolved: https://github.com/pytorch/pytorch/pull/125880 Approved by: https://github.com/jansel	2024-05-30 02:10:06 +00:00
Pian Pawakapan	8a31c2aa84	[export] allow complex guards as runtime asserts (#127129 ) With the current state of export's dynamic shapes, we struggle with guards and constraints that are beyond the current dynamic shapes language, expressed with dims and derived dims. While we can compile and guarantee correctness for guards within the current language (e.g. min/max ranges, linear relationships, integer divisibility) we struggle to dynamically compile guards which extend beyond that. For these "complex" guards, we typically do either of the following: 1) raise a constraint violation error, along the lines of "not all values of <symbol> in the specified range satisfy <guard>", with or without suggested fixes, 2) specialize to the provided static values and suggest removing dynamism, or 3) fail compilation due to some arbitrary unsupported case. Previous [work](https://github.com/pytorch/pytorch/pull/124949) went towards resolving this by disabling forced specializations, instead allowing the user to fail at runtime with incorrect inputs. In this PR, relying on [hybrid backed-unbacked symints](https://github.com/pytorch/pytorch/issues/121749), [deferred runtime asserts](https://github.com/pytorch/pytorch/blob/main/torch/fx/passes/runtime_assert.py), and the function [_is_supported_equivalence()](`d7de4c9d80/torch/fx/experimental/symbolic_shapes.py (L1824)`), we add a flag `_allow_complex_guards_as_runtime_asserts` which allows the user to compile exported programs containing these guards and maintain dynamism, while adding correctness checks as runtime assertions in the graph. Hybrid backed-unbacked symints allow us to easily bypass "implicit" guards emitted from computation - guards that we ~expect to be true. Popular examples revolve around reshapes: ``` # reshape def forward(self, x, y): # x: [s0, s1], y: [s2] return x.reshape([-1]) + y # guard s0 * s1 = s2 This leads to the following exported program class GraphModule(torch.nn.Module): def forward(self, x: "f32[s0, s1]", y: "f32[s2]"): sym_size_int: "Sym(s2)" = torch.ops.aten.sym_size.int(y, 0) mul: "Sym(-s2)" = -1 * sym_size_int; sym_size_int = None sym_size_int_1: "Sym(s0)" = torch.ops.aten.sym_size.int(x, 0) sym_size_int_2: "Sym(s1)" = torch.ops.aten.sym_size.int(x, 1) mul_1: "Sym(s0s1)" = sym_size_int_1 sym_size_int_2; sym_size_int_1 = sym_size_int_2 = None add: "Sym(s0s1 - s2)" = mul + mul_1; mul = mul_1 = None eq: "Sym(Eq(s0s1 - s2, 0))" = add == 0; add = None _assert_scalar = torch.ops.aten._assert_scalar.default(eq, "Runtime assertion failed for expression Eq(s0s1 - s2, 0) on node 'eq'"); eq = None view: "f32[s0s1]" = torch.ops.aten.view.default(x, [-1]); x = None add_1: "f32[s0s1]" = torch.ops.aten.add.Tensor(view, y); view = y = None return (add_1,) ``` Another case is symbol divisibility: ``` def forward(self, x): # x: [s0, s1] return x.reshape([-1, x.shape[0] - 1]) # Eq(Mod(s0 s1, s0 - 1), 0) ``` Applying deferred runtime asserts also helps dynamic compilation for "explicit" complex guards that typically cause problems for export. For example we can generate runtime asserts for not-equal guards, and complex conditions like the following: ``` class Foo(torch.nn.Module): def forward(self, x, y): # check that negation of first guard also shows up as runtime assertion if x.shape[0] == y.shape[0]: # False return x + y elif x.shape[0] == y.shape[0] 3: # False return x + 2, y + 3 elif x.shape[0] 2 == y.shape[0] * 3: # True return x * 2.0, y * 3.0 ``` For the above graph we will generate 3 runtime assertions: the negation of the first 2, and the 3rd condition as a guard. One additional benefit here over the current state of exported programs is that this adds further correctness guarantees - previously with explicit complex guards, if compilation succeeded, the guards would be ignored at runtime, treated as given. As shown above, the runtime asserts appear as math ops in the graph, generated by the sympy interpreter, resulting in an _assert_scalar call. There is an option to avoid adding these asserts into the graph, by setting `TORCH_DYNAMO_DO_NOT_EMIT_RUNTIME_ASSERTS=1`. This results in the "original" computation graph, with dynamism, and any incorrect inputs will fail on ops during runtime. Further work could go into prettifying the printer, so the majority of the graph isn't guard-related. Ideally this PR would subsume and remove the recently added [_disable_forced_specializations](https://github.com/pytorch/pytorch/pull/124949) flag, but that flag still handles one additional case of specialization: single-variable equalities where the symbol is solvable for a concrete value: see this [PR](https://github.com/pytorch/pytorch/pull/126925) This PR doesn't change any behavior around data-dependent errors/unbacked symints yet, that could be further work. NOTE: will take naming change suggestions for the flag :) Pull Request resolved: https://github.com/pytorch/pytorch/pull/127129 Approved by: https://github.com/avikchaudhuri	2024-05-29 17:15:25 +00:00
dshi7	0f67d38f0f	add TORCHDYNAMO_CAPTURE_DYNAMIC_OUTPUT_SHAPE_OPS (#127017 ) tlparse prints failure description like this > dynamic shape operator: aten._unique2.default; to enable, set torch._dynamo.config.capture_dynamic_output_shape_ops = True adding os env var to set it easier for testing Pull Request resolved: https://github.com/pytorch/pytorch/pull/127017 Approved by: https://github.com/jackiexu1992	2024-05-25 05:42:41 +00:00
Alexandre Ghelfi	b3a8a3cbab	Fix typos in `torch._dynamo.config.py` (#126150 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/126150 Approved by: https://github.com/Skylion007	2024-05-14 14:27:35 +00:00
Edward Z. Yang	2ba102f689	Implement native support for float inputs in Dynamo and ShapeEnv (#125325 ) The big idea is that floats are treated as Tensors on input/output to the FX graph, but on the inside, we immediately call item() on the synthetic Tensor and record regular float operations on it. Canonicalization to Tensor operations will happen in a standalone FX pass. This behavior is controlled by `specialize_float` config variable when set to False. The generated graph looks like this for the test `test_unspec_float_output`: ``` def forward(self, L_x_: "f32[3]", L_y_: "f32[]"): l_x_ = L_x_ l_y_ = L_y_ # File: /data/users/ezyang/a/pytorch/test/dynamo/test_unspec.py:511 in f, code: return x + 1, y * 2 add: "f32[3]" = l_x_ + 1; l_x_ = None item: "Sym(zf0)" = l_y_.item(); l_y_ = None mul: "Sym(2zf0)" = item 2; item = None scalar_tensor: "f32[]" = torch.scalar_tensor(mul); mul = None return (add, scalar_tensor) ``` The ingredients: * torch/_dynamo/variables/builder.py When `specialize_float` is False, we wrap float literals with `wrap_symfloat`. This is an unholy mashup of `wrap_symint` and `wrap_unspecialized_primitive`. The overall strategy is that we first generate a tensor argument (because that's what we want to show up into the FX graph), but then immediately call item() on the tensor argument to get a SymNodeVariable, which we will do the rest of the tracing with. Importantly, this SymNodeVariable is backed with the source of the original float: this means we can guard on the resulting value (something we could NOT do with UnspecializedPythonVariable). This has to be done manually, because if you literally call item() on the tensor, you will end up with an unbacked float. There is a bit of copy paste from wrap_symint and wrap_unspecialized_primitive which we can try to factor out, but this really is its own thing and you should review every line of code in the function. * torch/fx/experimental/symbolic_shapes.py We now can generate guards on float inputs, and these guards are handled inside of ShapeEnv. So we need to be able to allocate (backed!) float symbols, and produce guards for them. Fairly straightforward generalization. * torch/_dynamo/codegen.py I also need to maintain the invariant that there are no float outputs to the FX graph. I chose to do this at codegen time. When we detect a SymNodeVariable on the return stack for a float, we on the fly convert it (via `as_tensor`) to a TensorVariable, which is the true output. We then special case the output bytecode to call item() on it again. The tensor conversion is memoized on SymNodeVariable since we typically run the code generation process twice. Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/125325 Approved by: https://github.com/lezcano, https://github.com/jansel	2024-05-14 04:10:01 +00:00
Animesh Jain	0935b3d794	[dynamo] Turn on guard_nn_modules (#125202 ) Turning on guard_nn_modules adds large number of guards, so we are bound to take a perf hit. But the perf hit is small. These are the numbers ![image](https://github.com/pytorch/pytorch/assets/13822661/c8793906-c8c7-432b-9af4-4594713067be) First we observe that compared to Python guards, C++ guards give around 6x speedup. This reduces the total time spent in guards. This is shown in the last column (cpp_guards/inductor_optimized_latency). The worst model is around 1.61%, with most of the models below 1%. I think this is good enough signal to turn the config on. One might also wonder how much guard slowdown occurs with `guard_nn_modules=True`. This is the table ![image](https://github.com/pytorch/pytorch/assets/13822661/932a885b-1c03-424b-8405-5bc8fd35dd39) For most models, the guard overhead with nn module guards is under 2x. There are a few outliers, where the slowdown is really high and for those models we spend 1%-2% time in C++ guards as shown in first table. Pull Request resolved: https://github.com/pytorch/pytorch/pull/125202 Approved by: https://github.com/ezyang	2024-05-11 19:28:24 +00:00
Animesh Jain	b62e89c1b8	[dynamo] Do not turn on record relay with TORCH_COMPILE_DEBUG (#125488 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/125488 Approved by: https://github.com/yanboliang, https://github.com/mlazos	2024-05-04 05:10:31 +00:00
Boyuan Feng	b91f83f181	[cudagraph] add config for cudagraph managed input mutation support (#124754 ) Summary: [#123231](https://github.com/pytorch/pytorch/pull/123231) adds cudagraph supports for more types of functions (i.e., cudagraph managed input mutation). These newly supported functions may have mutated static inputs, leading to assertion errors in some workload which skip cudagraph previously. This diff adds a config to opt in the new feature. Test Plan: ci Differential Revision: D56481353 Pull Request resolved: https://github.com/pytorch/pytorch/pull/124754 Approved by: https://github.com/eellison	2024-04-24 04:23:53 +00:00
Animesh Jain	704fac5618	[dynamo][cpp-guard] Reland Attempt 1 - Enable cpp guard manager (#124231 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/124231 Approved by: https://github.com/jansel ghstack dependencies: #124230, #124237	2024-04-18 06:36:20 +00:00
PyTorch MergeBot	530bf391cc	Revert "[dynamo] Turn on CPP guard manager (#123547 )" This reverts commit `3e98bdd66d`. Reverted https://github.com/pytorch/pytorch/pull/123547 on behalf of https://github.com/DanilBaibak due to Broken trunk ([comment](https://github.com/pytorch/pytorch/pull/123547#issuecomment-2058337419))	2024-04-16 06:38:15 +00:00
willfengg	f1654fd4b0	[PT2D][FSDP] skip FSDP hooks base on dynamo config (#123021 ) unit test: `pytest test/distributed/_composable/fsdp/test_fully_shard_compile.py` For FSDP, we turn on/off compiling hooks base on `torch._dynamo.config.skip_fsdp_hooks` Pull Request resolved: https://github.com/pytorch/pytorch/pull/123021 Approved by: https://github.com/yf225, https://github.com/anijain2305	2024-04-13 01:47:25 +00:00
Animesh Jain	3e98bdd66d	[dynamo] Turn on CPP guard manager (#123547 ) As title Pull Request resolved: https://github.com/pytorch/pytorch/pull/123547 Approved by: https://github.com/jansel	2024-04-12 23:30:56 +00:00
Jason Ansel	70b8c58f84	[dynamo] Emit warning to turn on capture_scalar_outputs (#123896 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/123896 Approved by: https://github.com/anijain2305 ghstack dependencies: #123700, #123705, #123786, #123790, #123803, #123804	2024-04-12 19:03:13 +00:00
Oguz Ulgen	57a2032c7a	Delete Lark (#123689 ) Now that we are using MLIR bindings inside triton, lets delete Lark parser. Pull Request resolved: https://github.com/pytorch/pytorch/pull/123689 Approved by: https://github.com/jansel	2024-04-11 05:51:06 +00:00
PyTorch MergeBot	6b18daf205	Revert "Delete Lark (#123689 )" This reverts commit `a631461eef`. Reverted https://github.com/pytorch/pytorch/pull/123689 on behalf of https://github.com/PaliC due to This PR seems to be breaking test_binary_ufuncs.py ([comment](https://github.com/pytorch/pytorch/pull/123689#issuecomment-2048489549))	2024-04-10 21:48:04 +00:00
Oguz Ulgen	a631461eef	Delete Lark (#123689 ) Now that we are using MLIR bindings inside triton, lets delete Lark parser. Pull Request resolved: https://github.com/pytorch/pytorch/pull/123689 Approved by: https://github.com/jansel	2024-04-10 19:41:54 +00:00
Peter Bell	8865425ff7	[minifier] Add config flag to ignore non-fp values (#123006 ) When minifying, the after-aot minifier ignores non-floating values by default but does check them when running the the initial graph dump step. This means we may capture a graph that doesn't fail the tester and doesn't have any meaningful divergence. For example, the derivative of `elu(x)` depends on `x > 0` so this value is saved for backwards and so becomes a graph output. However, the difference between `FLT_MIN` and `0` in `x` is now enough to trigger an accuracy failure. I fix this by adding a config variable and environment variable to ignore these non floating point values. Pull Request resolved: https://github.com/pytorch/pytorch/pull/123006 Approved by: https://github.com/ezyang ghstack dependencies: #123005	2024-04-09 03:34:09 +00:00
Guilherme Leobas	84658d9c4f	Enable `capture_func_transforms` by default (#122211 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/122211 Approved by: https://github.com/zou3519	2024-04-05 03:29:11 +00:00
eellison	5f46312dbb	Reapply "Switch cudagraph backend to cudagraph trees (#121019 )" and "Add Cudagraphs disable checking (#121018 )" (#121864 ) (#122713 ) This reverts commit `92ed8553a6`. No longer importing codecache or boxed_nop at top level, both of which casued issues. Pull Request resolved: https://github.com/pytorch/pytorch/pull/122713 Approved by: https://github.com/anijain2305	2024-04-02 16:11:00 +00:00
Animesh Jain	60f3c092d4	[dynamo] Config option to Inline builtin nn module forward (#122725 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/122725 Approved by: https://github.com/jansel ghstack dependencies: #122646, #122647, #122716, #122769, #122818	2024-03-28 03:01:27 +00:00
Animesh Jain	c108696228	[dynamo][guards-cpp-refactor][easy] Env variable to turn on cpp manager (#122646 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/122646 Approved by: https://github.com/jansel	2024-03-27 19:40:37 +00:00
IvanKobzarev	9b095c3fe6	[dynamo] Config to not emit runtime asserts (#122603 ) Repetition on squashed & merged by mistake https://github.com/pytorch/pytorch/pull/122406 Differential Revision: [D55312394](https://our.internmc.facebook.com/intern/diff/D55312394) Pull Request resolved: https://github.com/pytorch/pytorch/pull/122603 Approved by: https://github.com/ezyang	2024-03-25 21:17:44 +00:00
Animesh Jain	8860c625ea	[dynamo][guards-cpp-refactor] Integrate cpp guard manager with CheckFnManager (#120726 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/120726 Approved by: https://github.com/jansel	2024-03-19 03:11:31 +00:00
Animesh Jain	f84d560236	[dynamo] Raise accumulated cache size limit (#122130 ) Fixes #114511 This was raised by IBM folks where the a LLM compile was failing because it had more than 64 layers. Pull Request resolved: https://github.com/pytorch/pytorch/pull/122130 Approved by: https://github.com/Chillee, https://github.com/jansel ghstack dependencies: #121954, #122005	2024-03-19 02:35:48 +00:00
Animesh Jain	92ed8553a6	Revert "Switch cudagraph backend to cudagraph trees (#121019 )" and "Add Cudagraphs disable checking (#121018 )" (#121864 ) This reverts commit `9373ad0bb8`. Revert "Add Cudagraphs disable checking (#121018)" This reverts commit `4af0e634bf`. Causes compilation time increase. Pull Request resolved: https://github.com/pytorch/pytorch/pull/121864 Approved by: https://github.com/eellison	2024-03-15 00:03:09 +00:00
eellison	4af0e634bf	Add Cudagraphs disable checking (#121018 ) Adds the same cudagraphs disable checking from inductor - cudagraph trees to cudagraphs backend. Pull Request resolved: https://github.com/pytorch/pytorch/pull/121018 Approved by: https://github.com/ezyang ghstack dependencies: #121017	2024-03-08 22:47:24 +00:00
PyTorch MergeBot	3d7cf8f392	Revert "Limit loop unrolling (#120023 )" This reverts commit `6cc7f9a2e6`. Reverted https://github.com/pytorch/pytorch/pull/120023 on behalf of https://github.com/anijain2305 due to breaks llms export ([comment](https://github.com/pytorch/pytorch/pull/120023#issuecomment-1974104633))	2024-03-02 00:04:08 +00:00
angelayi	c844b377fa	[dynamo] Reorder logs (#116106 ) Currently when there is a print/warning in the graph, dynamo graph breaks causing export to fail. However export would like to just skip over these print/warning calls: https://github.com/pytorch/pytorch/issues/113792. Additionally there's a torch.compile feature request to "reorder prints" so that instead of graph breaking when hitting prints/logging, we can skip over these prints to create larger compiled graphs, and then print the results out after those compiled graphs: https://github.com/pytorch/pytorch/issues/93739. This PR also adds the `reorderable_logging_functions` config for users to register logging functions to be reordered (like `print` or a custom logging function). Printout of the bytecode after reordering the prints looks like the following: P914736600 There are some limitations to the printing right now: * You can only register logging functions, not methods * Inputs to the logging functions can only be tensors, constants, and format strings * Inputs to the logging functions which will later be mutated in-place will not be printed correctly TODO: Add the following tests * print function with argument of nested data structure; * print function with argument of nested data structure being updated inside of compile region (this would test if we handle side effect correctly); * custom defined logging functions with nn.Module or nn.Module attribute arguments; * custom defined logging functions with submodule input/output as arguments (we need to handle the mapping and fused-out value); * custom defined logging functions with tensor argument and mutation inside of the function (TBD: this may increase memory usage); Pull Request resolved: https://github.com/pytorch/pytorch/pull/116106 Approved by: https://github.com/yanboliang	2024-03-01 17:04:24 +00:00
PyTorch MergeBot	63b259492a	Revert "[dynamo] Reorder logs (#116106 )" This reverts commit `c5472628ff`. Reverted https://github.com/pytorch/pytorch/pull/116106 on behalf of https://github.com/clee2000 due to landrace with `342e7929b8`, which removed the import for warnings. Should be an easy fix after rebase `c5472628ff` ([comment](https://github.com/pytorch/pytorch/pull/116106#issuecomment-1972586180))	2024-03-01 06:25:46 +00:00
Angela Yi	c5472628ff	[dynamo] Reorder logs (#116106 ) Currently when there is a print/warning in the graph, dynamo graph breaks causing export to fail. However export would like to just skip over these print/warning calls: https://github.com/pytorch/pytorch/issues/113792. Additionally there's a torch.compile feature request to "reorder prints" so that instead of graph breaking when hitting prints/logging, we can skip over these prints to create larger compiled graphs, and then print the results out after those compiled graphs: https://github.com/pytorch/pytorch/issues/93739. This PR also adds the `reorderable_logging_functions` config for users to register logging functions to be reordered (like `print` or a custom logging function). Printout of the bytecode after reordering the prints looks like the following: P914736600 There are some limitations to the printing right now: * You can only register logging functions, not methods * Inputs to the logging functions can only be tensors, constants, and format strings * Inputs to the logging functions which will later be mutated in-place will not be printed correctly TODO: Add the following tests * print function with argument of nested data structure; * print function with argument of nested data structure being updated inside of compile region (this would test if we handle side effect correctly); * custom defined logging functions with nn.Module or nn.Module attribute arguments; * custom defined logging functions with submodule input/output as arguments (we need to handle the mapping and fused-out value); * custom defined logging functions with tensor argument and mutation inside of the function (TBD: this may increase memory usage); Pull Request resolved: https://github.com/pytorch/pytorch/pull/116106 Approved by: https://github.com/yanboliang	2024-03-01 04:48:44 +00:00
Aaron Orenstein	6cc7f9a2e6	Limit loop unrolling (#120023 ) Tacotron2 causes massive loop unrolling resulting in very large graphs (26k nodes) which was causing inductor (and tracing itself) to choke. The unrolling size is controlled by the environment variable TORCHDYNAMO_MAX_LOOP_UNROLL_NODES which defaults to the arbitrary value 5000. This updates the tacotron2 timings as follows: eager timing: 3m:23s -> 35s aot_eager timing: 4m:12s -> 39s inductor timing: 22m:24s ->1m For reference the big loop in tacotron2 was this one (model.py[405]): ``` while len(mel_outputs) < decoder_inputs.size(0) - 1: decoder_input = decoder_inputs[len(mel_outputs)] mel_output, gate_output, attention_weights = self.decode(decoder_input) mel_outputs += [mel_output.squeeze(1)] gate_outputs += [gate_output.squeeze(1)] alignments += [attention_weights] ``` which gets unrolled and inlined adding about 36 nodes to the graph per iteration. Fixes #98467 Relates to #102839 which hopefully will result in a better fix. Pull Request resolved: https://github.com/pytorch/pytorch/pull/120023 Approved by: https://github.com/yanboliang	2024-02-27 20:44:21 +00:00
Sam Larsen	2fb32a5f07	Enable fake tensor caching in fbcode by default (#118555 ) Summary: Enabled by default in OSS; this switches the default to "on" in fbcode too. Test Plan: Ran torchbench benchmarks in fbcode Differential Revision: [D53771626](https://our.internmc.facebook.com/intern/diff/D53771626) Pull Request resolved: https://github.com/pytorch/pytorch/pull/118555 Approved by: https://github.com/eellison	2024-02-26 17:35:23 +00:00
PyTorch MergeBot	7d780ff86f	Revert "Enable fake tensor caching in fbcode by default (#118555 )" This reverts commit `0f2fbbff10`. Reverted https://github.com/pytorch/pytorch/pull/118555 on behalf of https://github.com/huydhn due to Sorry for reverting your change but it is failing one model test internally. Please take a look at the diff for more info D53189048 ([comment](https://github.com/pytorch/pytorch/pull/118555#issuecomment-1939550273))	2024-02-12 20:51:23 +00:00
Sam Larsen	0f2fbbff10	Enable fake tensor caching in fbcode by default (#118555 ) Summary: Enabled by default in OSS; this switches the default to "on" in fbcode too. Test Plan: Ran torchbench benchmarks in fbcode Differential Revision: [D53189048](https://our.internmc.facebook.com/intern/diff/D53189048) Pull Request resolved: https://github.com/pytorch/pytorch/pull/118555 Approved by: https://github.com/eellison	2024-02-09 05:42:16 +00:00
Chien-Chin Huang	1d2382f141	[DDP] Use compiled_autograd to trace DDP backward allreduce (#110662 ) Summary The reducer of `DistributedDataParallel` is implemented with C++ and it is not easy to trace the allreduce launched in the reducer. This PR modifies `DistributedDataParallel` to launch one allreduce per gradient when `compiled_autograd` is enabled. The changes allow us to use `compiled_autograd` to trace the allreduce and later be optimized (fused) in the Inductor. Key Logic 1. If `ddp_python_hook` is True, we assume `compiled_autograd` is used. `DistributedDataParallel` registers `compiled_accum_grad_hook` for all parameters. 2. In the first forward() call, if `DistributedDataParallel` is not compiled, all `compiled_accum_grad_hook` are deregistered. If `DistributedDataParallel` is compiled, all `compiled_accum_grad_hook` will be compiled by `compiled_autograd`. 3. `compiled_accum_grad_hook` launches an allreduce to reduce the gradient of the parameter. Bucketing The compiled backward is slow because there is no bucketing for the allreduces. We rely on Inductor to bucket the allreduces. The bucketing is done in a separate PR. Differential Revision: [D49428482](https://our.internmc.facebook.com/intern/diff/D49428482/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/110662 Approved by: https://github.com/wconstab	2024-02-08 03:03:15 +00:00
Will Constable	abe3c55a6a	Update DDP dynamo debug docs (#118295 ) Refreshes https://github.com/pytorch/pytorch/pull/114201 and updates it to include other log names that also include ddp_optimizer. Pull Request resolved: https://github.com/pytorch/pytorch/pull/118295 Approved by: https://github.com/LucasLLC, https://github.com/wanchaol	2024-01-29 14:58:26 +00:00
Oguz Ulgen	47b5a6b05d	[Dynamo] Analyze triton kernels via tracing to determine mutations (#117300 ) This PR adds TTIR lexing and parsing in order to analyze which of the user defined triton kernel inputs are mutated. Differential Revision: [D53165999](https://our.internmc.facebook.com/intern/diff/D53165999) Pull Request resolved: https://github.com/pytorch/pytorch/pull/117300 Approved by: https://github.com/jansel	2024-01-29 06:37:08 +00:00
Edward Z. Yang	46712b019d	Enable local_partial_types (#118467 ) When using dmypy, this setting is enabled and cannot be turned off. Force it for regular mypy too. Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/118467 Approved by: https://github.com/Skylion007 ghstack dependencies: #118414, #118418, #118432	2024-01-28 13:38:22 +00:00
Shunting Zhang	fe10b1800f	LazyGraphModule (#117911 ) I feel it's easier to open a new PR rather than iterating on the previous PR (https://github.com/pytorch/pytorch/pull/105257 ) since this is more like a rewrite. In this PR, instead of changing GraphModule directly which can easily causes BC issue, I create a LazyGraphModule class as Zachary & Jason suggested in comments from the previous PR. The difference between LazyGraphModule and GraphModule is mainly about how re-compile for the graph module happens. In GraphModule the recompilation happens 'eagerly': constructing a GraphModule will cause the recompilation. While in LazyGraphModule, we just mark the module as needing recompilation. The real recompilation only happens when absolutely required (e.g. call forward method, access the code property etc.). In a lot of cases in torch.compile, the real recompilation eventually is not triggered at all. This can save a few seconds of compilation time. By default, GraphModule rather than LazyGraphModule is used. `use_lazy_graph_module(True)` context manager can be used to pick LazyGraphModule instead. This has been applied to the torch.compile stack. Pull Request resolved: https://github.com/pytorch/pytorch/pull/117911 Approved by: https://github.com/jansel	2024-01-27 04:10:18 +00:00
Jason Ansel	e5e9f390be	[dynamo] Optimize overheads from _TorchDynamoContext (#118070 ) Based on `python benchmarks/dynamo/microbenchmarks/overheads.py`: - Before `18.1us` - After `12.2us` Pull Request resolved: https://github.com/pytorch/pytorch/pull/118070 Approved by: https://github.com/yanboliang, https://github.com/anijain2305 ghstack dependencies: #118065	2024-01-25 05:04:56 +00:00
Quinn Zhu	12662f4d95	[dynamo] add username in debug path (#117820 ) Summary: No user name may cause conflict and permission error when people share a dev server bypass-github-pytorch-ci-checks Test Plan: ci Differential Revision: D52895486 Pull Request resolved: https://github.com/pytorch/pytorch/pull/117820 Approved by: https://github.com/kflu, https://github.com/DanilBaibak	2024-01-24 10:14:20 +00:00
Sam Larsen	208e64a9ba	Initial implementation of FakeTensor caching (#113873 ) Summary: Cache the result of FakeTensor dispatch and skip re-evaluation on cache hits. Test Plan: New unit tests. Caching is enabled in this diff, so all existing tests exercise the cache as well. Differential Revision: [D52841637](https://our.internmc.facebook.com/intern/diff/D52841637) Pull Request resolved: https://github.com/pytorch/pytorch/pull/113873 Approved by: https://github.com/eellison	2024-01-17 20:38:54 +00:00
Will Feng	a27ed4d364	[dynamo / DDP] Add optimize_ddp_lazy_compile config to control lazy compile for DDPOptimizer (False by default) (#116292 ) We want to enable `optimize_ddp_lazy_compile` by default as soon as possible, becuase it will fix stride mismatch errors (see motivation: https://github.com/pytorch/pytorch/pull/114154). However, lazy compile currently causes shape mismatch in other cases (`test_graph_split_inductor_transpose`) and we need to fix them before we can enable it by default. Differential Revision: D52373445 Pull Request resolved: https://github.com/pytorch/pytorch/pull/116292 Approved by: https://github.com/williamwen42, https://github.com/wconstab	2023-12-21 22:34:24 +00:00
David Berard	89ee3af076	[Reland][Dynamo] Don't log compilation metrics for PyTorch unit tests (#115571 ) Reland #115452, which was reverted to simplify a merge conflict with #115386 Pull Request resolved: https://github.com/pytorch/pytorch/pull/115571 Approved by: https://github.com/yanboliang	2023-12-12 01:15:54 +00:00
David Berard	5c0976fa04	Revert "[dynamo] guarded config (#111299 )" (#115386 ) This reverts commit `5927e9cbf2`. Differential Revision: [D51959266](https://our.internmc.facebook.com/intern/diff/D51959266) Pull Request resolved: https://github.com/pytorch/pytorch/pull/115386 Approved by: https://github.com/yanboliang, https://github.com/malfet ghstack dependencies: #115384, #115401, #115385	2023-12-11 19:35:42 +00:00
PyTorch MergeBot	f06f51b152	Revert "[Dynamo] Don't log compilation metrics for PyTorch unit tests (#115452 )" This reverts commit `cd444aa075`. Reverted https://github.com/pytorch/pytorch/pull/115452 on behalf of https://github.com/davidberard98 due to Merge conflict with #115385, which already landed in fbcode ([comment](https://github.com/pytorch/pytorch/pull/115452#issuecomment-1850729965))	2023-12-11 19:21:40 +00:00
Yanbo Liang	274fdc81f8	[Dynamo][6.3/N] Further cleanup torch.py (#114669 ) A follow-up PR to clean up what I found during the refactor of torch.py Pull Request resolved: https://github.com/pytorch/pytorch/pull/114669 Approved by: https://github.com/jansel	2023-12-11 07:16:03 +00:00
Yanbo Liang	cd444aa075	[Dynamo] Don't log compilation metrics for PyTorch unit tests (#115452 ) Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/115452 Approved by: https://github.com/zou3519	2023-12-09 01:39:36 +00:00
rzou	2847045ed9	Set _dynamo.config.capture_func_transforms=False (#115267 ) Due to not all tests in the Dynamo shard actually running in CI, we've started to bitrot on this implementation. Since our plan is to trace into the functorch implementations instead of construct a HOP (which is what capture_func_transforms=True does), let's turn off this config by default. Test Plan: - Tests Pull Request resolved: https://github.com/pytorch/pytorch/pull/115267 Approved by: https://github.com/voznesenskym, https://github.com/guilhermeleobas	2023-12-07 18:42:15 +00:00
Yanbo Liang	4620170008	[Dynamo] Revert multiple PRs since they triggered compilation stuck internally (#115126 ) Revert the following PRs to mitigate internal compilation stuck: #113432 #114016 #114507 #114196 #114739 #114669 Pull Request resolved: https://github.com/pytorch/pytorch/pull/115126 Approved by: https://github.com/xush6528	2023-12-05 22:35:37 +00:00
Yanbo Liang	ab5385fc50	[Dynamo][6.3/N] Further cleanup torch.py (#114669 ) A follow-up PR to clean up what I found during the refactor of torch.py Pull Request resolved: https://github.com/pytorch/pytorch/pull/114669 Approved by: https://github.com/jansel	2023-12-01 04:08:29 +00:00
Jon Chuang	00b67193ef	[utils] move `config_typing.pyi` to `torch.utils` (#113929 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/113929 Approved by: https://github.com/ezyang, https://github.com/jansel ghstack dependencies: #111299, #111300, #113901, #113916	2023-11-17 18:51:57 +00:00
Jon Chuang	5927e9cbf2	[dynamo] guarded config (#111299 ) --- Fixes: https://github.com/pytorch/pytorch/issues/110682 Replaces: https://github.com/pytorch/pytorch/pull/111074 The guards are installed based on config that is valid at the call to `torch.compile`, rather than at any subsequent call / triggered compilation. Subsequent compilations will restore the config if there is a config mismatch of the existing global config with the saved config. TODO: - [X] add tests Follow up PRs: - [x] add revised cache size computation (follow up PR: #111300 , based on: https://github.com/pytorch/pytorch/pull/107496) - [ ] handle run-only mode? - [ ] config restoration itself is not thread-safe (tracked: https://github.com/pytorch/pytorch/issues/111150) Pull Request resolved: https://github.com/pytorch/pytorch/pull/111299 Approved by: https://github.com/ezyang	2023-11-17 09:59:58 +00:00
Jez Ng	dc63248b76	Make dynamo configs more amenable to static type checking (#112130 ) `install_config_module` makes a regular module into a ConfigModule with extra methods defined on it. mypy thinks those extra methods (or module functions) are undefined since it cannot analyze something so dynamic. As a workaround, I've created a fake module that defines these extra functions, which I import into the config modules during type checking. As part of this change, I've also added more types to config_utils.py and enabled typechecking for torch/_dynamo/config.py. Pull Request resolved: https://github.com/pytorch/pytorch/pull/112130 Approved by: https://github.com/jansel	2023-11-08 21:17:45 +00:00
William Wen	ad1c3467e2	[dynamo] run guard fail hooks for each cache entry for which there is a cache miss (#110325 ) Attempt number 2 at https://github.com/pytorch/pytorch/issues/108950. Improves debugging for guard failures/recompilations by: - only running guard fail reason generation during recompilation, instead of when a guard fails during dynamo cache lookup (so generating guard failure reasons is not on the critical path) - ~~always reporting all guard failures~~ Reports the first-failing guard failure for each cache entry. We don't expect a performance hit since the guard fail reasons are only generated at recompile time rather than runtime. Perf benchmark to check this (https://hud.pytorch.org/benchmark/torchbench/inductor_with_cudagraphs?startTime=Fri,%2027%20Oct%202023%2017:42:43%20GMT&stopTime=Fri,%2003%20Nov%202023%2017:42:43%20GMT&granularity=hour&mode=training&dtype=amp&lBranch=gh/williamwen42/62/head&lCommit=f4724f5ffc6d17ceae513a42fc18627be7b85482&rBranch=main&rCommit=29f3d392bf230072e3bffae37b078e770cae1956). We may also need to verify this on benchmarks where guard fails are common. Sample script: ```python import torch def generate_data(b): return ( torch.randn(b, 3, 32, 32).to(torch.float32).cuda(), torch.randint(1000, (b,)).cuda(), ) from torchvision.models import resnet18 def init_model(): return resnet18().to(torch.float32).cuda() model = init_model() model_opt = torch.compile(model, dynamic=False) for b in range(16, 32): data = generate_data(b) model_opt(data[0]) ``` Sample logs: ```bash (/data/users/williamwen/py310-env) [williamwen@devgpu020.odn1 /data/users/williamwen/pytorch (wwen/log-all-guards)]$ python playground5.py /data/users/williamwen/pytorch/torch/_inductor/compile_fx.py:141: UserWarning: TensorFloat32 tensor cores for float32 matrix multiplication available but not enabled. Consider setting `torch.set_float32_matmul_precision('high')` for better performance. warnings.warn( [2023-11-06 14:50:47,605] torch._dynamo.convert_frame: [WARNING] torch._dynamo hit config.cache_size_limit (8) [2023-11-06 14:50:47,605] torch._dynamo.convert_frame: [WARNING] function: 'forward' (/data/users/williamwen/torchvision/torchvision/models/resnet.py:284) [2023-11-06 14:50:47,605] torch._dynamo.convert_frame: [WARNING] last reason: tensor 'L['x']' size mismatch at index 0. expected 16, actual 24 [2023-11-06 14:50:47,605] torch._dynamo.convert_frame: [WARNING] To log all recompilation reasons, use TORCH_LOGS="recompiles". [2023-11-06 14:50:47,605] torch._dynamo.convert_frame: [WARNING] To diagnose recompilation issues, see https://pytorch.org/docs/master/compile/troubleshooting.html. (/data/users/williamwen/py310-env) [williamwen@devgpu020.odn1 /data/users/williamwen/pytorch (wwen/log-all-guards)]$ TORCH_LOGS="recompiles" python playground5.py /data/users/williamwen/pytorch/torch/_inductor/compile_fx.py:141: UserWarning: TensorFloat32 tensor cores for float32 matrix multiplication available but not enabled. Consider setting `torch.set_float32_matmul_precision('high')` for better performance. warnings.warn( [2023-11-06 14:53:31,591] torch._dynamo.guards.__recompiles: [DEBUG] Recompiling function forward in /data/users/williamwen/torchvision/torchvision/models/resnet.py:284 [2023-11-06 14:53:31,591] torch._dynamo.guards.__recompiles: [DEBUG] triggered by the following guard failure(s): [2023-11-06 14:53:31,591] torch._dynamo.guards.__recompiles: [DEBUG] - tensor 'L['x']' size mismatch at index 0. expected 16, actual 17 [2023-11-06 14:53:41,333] torch._dynamo.guards.__recompiles: [DEBUG] Recompiling function forward in /data/users/williamwen/torchvision/torchvision/models/resnet.py:284 [2023-11-06 14:53:41,333] torch._dynamo.guards.__recompiles: [DEBUG] triggered by the following guard failure(s): [2023-11-06 14:53:41,333] torch._dynamo.guards.__recompiles: [DEBUG] - tensor 'L['x']' size mismatch at index 0. expected 17, actual 18 [2023-11-06 14:53:41,333] torch._dynamo.guards.__recompiles: [DEBUG] - tensor 'L['x']' size mismatch at index 0. expected 16, actual 18 [2023-11-06 14:53:50,463] torch._dynamo.guards.__recompiles: [DEBUG] Recompiling function forward in /data/users/williamwen/torchvision/torchvision/models/resnet.py:284 [2023-11-06 14:53:50,463] torch._dynamo.guards.__recompiles: [DEBUG] triggered by the following guard failure(s): [2023-11-06 14:53:50,463] torch._dynamo.guards.__recompiles: [DEBUG] - tensor 'L['x']' size mismatch at index 0. expected 18, actual 19 [2023-11-06 14:53:50,463] torch._dynamo.guards.__recompiles: [DEBUG] - tensor 'L['x']' size mismatch at index 0. expected 17, actual 19 [2023-11-06 14:53:50,463] torch._dynamo.guards.__recompiles: [DEBUG] - tensor 'L['x']' size mismatch at index 0. expected 16, actual 19 [2023-11-06 14:53:59,848] torch._dynamo.guards.__recompiles: [DEBUG] Recompiling function forward in /data/users/williamwen/torchvision/torchvision/models/resnet.py:284 [2023-11-06 14:53:59,848] torch._dynamo.guards.__recompiles: [DEBUG] triggered by the following guard failure(s): [2023-11-06 14:53:59,848] torch._dynamo.guards.__recompiles: [DEBUG] - tensor 'L['x']' size mismatch at index 0. expected 19, actual 20 [2023-11-06 14:53:59,848] torch._dynamo.guards.__recompiles: [DEBUG] - tensor 'L['x']' size mismatch at index 0. expected 18, actual 20 [2023-11-06 14:53:59,848] torch._dynamo.guards.__recompiles: [DEBUG] - tensor 'L['x']' size mismatch at index 0. expected 17, actual 20 [2023-11-06 14:53:59,848] torch._dynamo.guards.__recompiles: [DEBUG] - tensor 'L['x']' size mismatch at index 0. expected 16, actual 20 [2023-11-06 14:54:08,549] torch._dynamo.guards.__recompiles: [DEBUG] Recompiling function forward in /data/users/williamwen/torchvision/torchvision/models/resnet.py:284 [2023-11-06 14:54:08,549] torch._dynamo.guards.__recompiles: [DEBUG] triggered by the following guard failure(s): [2023-11-06 14:54:08,549] torch._dynamo.guards.__recompiles: [DEBUG] - tensor 'L['x']' size mismatch at index 0. expected 20, actual 21 [2023-11-06 14:54:08,549] torch._dynamo.guards.__recompiles: [DEBUG] - tensor 'L['x']' size mismatch at index 0. expected 19, actual 21 [2023-11-06 14:54:08,549] torch._dynamo.guards.__recompiles: [DEBUG] - tensor 'L['x']' size mismatch at index 0. expected 18, actual 21 [2023-11-06 14:54:08,549] torch._dynamo.guards.__recompiles: [DEBUG] - tensor 'L['x']' size mismatch at index 0. expected 17, actual 21 [2023-11-06 14:54:08,549] torch._dynamo.guards.__recompiles: [DEBUG] - tensor 'L['x']' size mismatch at index 0. expected 16, actual 21 [2023-11-06 14:54:17,795] torch._dynamo.guards.__recompiles: [DEBUG] Recompiling function forward in /data/users/williamwen/torchvision/torchvision/models/resnet.py:284 [2023-11-06 14:54:17,795] torch._dynamo.guards.__recompiles: [DEBUG] triggered by the following guard failure(s): [2023-11-06 14:54:17,795] torch._dynamo.guards.__recompiles: [DEBUG] - tensor 'L['x']' size mismatch at index 0. expected 21, actual 22 [2023-11-06 14:54:17,795] torch._dynamo.guards.__recompiles: [DEBUG] - tensor 'L['x']' size mismatch at index 0. expected 20, actual 22 [2023-11-06 14:54:17,795] torch._dynamo.guards.__recompiles: [DEBUG] - tensor 'L['x']' size mismatch at index 0. expected 19, actual 22 [2023-11-06 14:54:17,795] torch._dynamo.guards.__recompiles: [DEBUG] - tensor 'L['x']' size mismatch at index 0. expected 18, actual 22 [2023-11-06 14:54:17,795] torch._dynamo.guards.__recompiles: [DEBUG] - tensor 'L['x']' size mismatch at index 0. expected 17, actual 22 [2023-11-06 14:54:17,795] torch._dynamo.guards.__recompiles: [DEBUG] - tensor 'L['x']' size mismatch at index 0. expected 16, actual 22 [2023-11-06 14:54:27,430] torch._dynamo.guards.__recompiles: [DEBUG] Recompiling function forward in /data/users/williamwen/torchvision/torchvision/models/resnet.py:284 [2023-11-06 14:54:27,430] torch._dynamo.guards.__recompiles: [DEBUG] triggered by the following guard failure(s): [2023-11-06 14:54:27,430] torch._dynamo.guards.__recompiles: [DEBUG] - tensor 'L['x']' size mismatch at index 0. expected 22, actual 23 [2023-11-06 14:54:27,430] torch._dynamo.guards.__recompiles: [DEBUG] - tensor 'L['x']' size mismatch at index 0. expected 21, actual 23 [2023-11-06 14:54:27,430] torch._dynamo.guards.__recompiles: [DEBUG] - tensor 'L['x']' size mismatch at index 0. expected 20, actual 23 [2023-11-06 14:54:27,430] torch._dynamo.guards.__recompiles: [DEBUG] - tensor 'L['x']' size mismatch at index 0. expected 19, actual 23 [2023-11-06 14:54:27,430] torch._dynamo.guards.__recompiles: [DEBUG] - tensor 'L['x']' size mismatch at index 0. expected 18, actual 23 [2023-11-06 14:54:27,430] torch._dynamo.guards.__recompiles: [DEBUG] - tensor 'L['x']' size mismatch at index 0. expected 17, actual 23 [2023-11-06 14:54:27,430] torch._dynamo.guards.__recompiles: [DEBUG] - tensor 'L['x']' size mismatch at index 0. expected 16, actual 23 [2023-11-06 14:54:36,744] torch._dynamo.guards.__recompiles: [DEBUG] Recompiling function forward in /data/users/williamwen/torchvision/torchvision/models/resnet.py:284 [2023-11-06 14:54:36,744] torch._dynamo.guards.__recompiles: [DEBUG] triggered by the following guard failure(s): [2023-11-06 14:54:36,744] torch._dynamo.guards.__recompiles: [DEBUG] - tensor 'L['x']' size mismatch at index 0. expected 23, actual 24 [2023-11-06 14:54:36,744] torch._dynamo.guards.__recompiles: [DEBUG] - tensor 'L['x']' size mismatch at index 0. expected 22, actual 24 [2023-11-06 14:54:36,744] torch._dynamo.guards.__recompiles: [DEBUG] - tensor 'L['x']' size mismatch at index 0. expected 21, actual 24 [2023-11-06 14:54:36,744] torch._dynamo.guards.__recompiles: [DEBUG] - tensor 'L['x']' size mismatch at index 0. expected 20, actual 24 [2023-11-06 14:54:36,744] torch._dynamo.guards.__recompiles: [DEBUG] - tensor 'L['x']' size mismatch at index 0. expected 19, actual 24 [2023-11-06 14:54:36,744] torch._dynamo.guards.__recompiles: [DEBUG] - tensor 'L['x']' size mismatch at index 0. expected 18, actual 24 [2023-11-06 14:54:36,744] torch._dynamo.guards.__recompiles: [DEBUG] - tensor 'L['x']' size mismatch at index 0. expected 17, actual 24 [2023-11-06 14:54:36,744] torch._dynamo.guards.__recompiles: [DEBUG] - tensor 'L['x']' size mismatch at index 0. expected 16, actual 24 [2023-11-06 14:54:36,744] torch._dynamo.convert_frame: [WARNING] torch._dynamo hit config.cache_size_limit (8) [2023-11-06 14:54:36,744] torch._dynamo.convert_frame: [WARNING] function: 'forward' (/data/users/williamwen/torchvision/torchvision/models/resnet.py:284) [2023-11-06 14:54:36,744] torch._dynamo.convert_frame: [WARNING] last reason: tensor 'L['x']' size mismatch at index 0. expected 16, actual 24 [2023-11-06 14:54:36,744] torch._dynamo.convert_frame: [WARNING] To log all recompilation reasons, use TORCH_LOGS="recompiles". [2023-11-06 14:54:36,744] torch._dynamo.convert_frame: [WARNING] To diagnose recompilation issues, see https://pytorch.org/docs/master/compile/troubleshooting.html. [2023-11-06 14:54:45,922] torch._dynamo.guards.__recompiles: [DEBUG] Recompiling function _forward_impl in /data/users/williamwen/torchvision/torchvision/models/resnet.py:266 [2023-11-06 14:54:45,922] torch._dynamo.guards.__recompiles: [DEBUG] triggered by the following guard failure(s): [2023-11-06 14:54:45,922] torch._dynamo.guards.__recompiles: [DEBUG] - tensor 'L['x']' size mismatch at index 0. expected 24, actual 25 [2023-11-06 14:54:54,691] torch._dynamo.guards.__recompiles: [DEBUG] Recompiling function _forward_impl in /data/users/williamwen/torchvision/torchvision/models/resnet.py:266 [2023-11-06 14:54:54,691] torch._dynamo.guards.__recompiles: [DEBUG] triggered by the following guard failure(s): [2023-11-06 14:54:54,691] torch._dynamo.guards.__recompiles: [DEBUG] - tensor 'L['x']' size mismatch at index 0. expected 25, actual 26 [2023-11-06 14:54:54,691] torch._dynamo.guards.__recompiles: [DEBUG] - tensor 'L['x']' size mismatch at index 0. expected 24, actual 26 [2023-11-06 14:55:03,591] torch._dynamo.guards.__recompiles: [DEBUG] Recompiling function _forward_impl in /data/users/williamwen/torchvision/torchvision/models/resnet.py:266 [2023-11-06 14:55:03,591] torch._dynamo.guards.__recompiles: [DEBUG] triggered by the following guard failure(s): [2023-11-06 14:55:03,591] torch._dynamo.guards.__recompiles: [DEBUG] - tensor 'L['x']' size mismatch at index 0. expected 26, actual 27 [2023-11-06 14:55:03,591] torch._dynamo.guards.__recompiles: [DEBUG] - tensor 'L['x']' size mismatch at index 0. expected 25, actual 27 [2023-11-06 14:55:03,591] torch._dynamo.guards.__recompiles: [DEBUG] - tensor 'L['x']' size mismatch at index 0. expected 24, actual 27 [2023-11-06 14:55:12,384] torch._dynamo.guards.__recompiles: [DEBUG] Recompiling function _forward_impl in /data/users/williamwen/torchvision/torchvision/models/resnet.py:266 [2023-11-06 14:55:12,384] torch._dynamo.guards.__recompiles: [DEBUG] triggered by the following guard failure(s): [2023-11-06 14:55:12,384] torch._dynamo.guards.__recompiles: [DEBUG] - tensor 'L['x']' size mismatch at index 0. expected 27, actual 28 [2023-11-06 14:55:12,384] torch._dynamo.guards.__recompiles: [DEBUG] - tensor 'L['x']' size mismatch at index 0. expected 26, actual 28 [2023-11-06 14:55:12,384] torch._dynamo.guards.__recompiles: [DEBUG] - tensor 'L['x']' size mismatch at index 0. expected 25, actual 28 [2023-11-06 14:55:12,384] torch._dynamo.guards.__recompiles: [DEBUG] - tensor 'L['x']' size mismatch at index 0. expected 24, actual 28 [2023-11-06 14:55:21,442] torch._dynamo.guards.__recompiles: [DEBUG] Recompiling function _forward_impl in /data/users/williamwen/torchvision/torchvision/models/resnet.py:266 [2023-11-06 14:55:21,442] torch._dynamo.guards.__recompiles: [DEBUG] triggered by the following guard failure(s): [2023-11-06 14:55:21,442] torch._dynamo.guards.__recompiles: [DEBUG] - tensor 'L['x']' size mismatch at index 0. expected 28, actual 29 [2023-11-06 14:55:21,442] torch._dynamo.guards.__recompiles: [DEBUG] - tensor 'L['x']' size mismatch at index 0. expected 27, actual 29 [2023-11-06 14:55:21,442] torch._dynamo.guards.__recompiles: [DEBUG] - tensor 'L['x']' size mismatch at index 0. expected 26, actual 29 [2023-11-06 14:55:21,442] torch._dynamo.guards.__recompiles: [DEBUG] - tensor 'L['x']' size mismatch at index 0. expected 25, actual 29 [2023-11-06 14:55:21,442] torch._dynamo.guards.__recompiles: [DEBUG] - tensor 'L['x']' size mismatch at index 0. expected 24, actual 29 [2023-11-06 14:55:30,315] torch._dynamo.guards.__recompiles: [DEBUG] Recompiling function _forward_impl in /data/users/williamwen/torchvision/torchvision/models/resnet.py:266 [2023-11-06 14:55:30,315] torch._dynamo.guards.__recompiles: [DEBUG] triggered by the following guard failure(s): [2023-11-06 14:55:30,315] torch._dynamo.guards.__recompiles: [DEBUG] - tensor 'L['x']' size mismatch at index 0. expected 29, actual 30 [2023-11-06 14:55:30,315] torch._dynamo.guards.__recompiles: [DEBUG] - tensor 'L['x']' size mismatch at index 0. expected 28, actual 30 [2023-11-06 14:55:30,315] torch._dynamo.guards.__recompiles: [DEBUG] - tensor 'L['x']' size mismatch at index 0. expected 27, actual 30 [2023-11-06 14:55:30,315] torch._dynamo.guards.__recompiles: [DEBUG] - tensor 'L['x']' size mismatch at index 0. expected 26, actual 30 [2023-11-06 14:55:30,315] torch._dynamo.guards.__recompiles: [DEBUG] - tensor 'L['x']' size mismatch at index 0. expected 25, actual 30 [2023-11-06 14:55:30,315] torch._dynamo.guards.__recompiles: [DEBUG] - tensor 'L['x']' size mismatch at index 0. expected 24, actual 30 [2023-11-06 14:55:39,839] torch._dynamo.guards.__recompiles: [DEBUG] Recompiling function _forward_impl in /data/users/williamwen/torchvision/torchvision/models/resnet.py:266 [2023-11-06 14:55:39,839] torch._dynamo.guards.__recompiles: [DEBUG] triggered by the following guard failure(s): [2023-11-06 14:55:39,839] torch._dynamo.guards.__recompiles: [DEBUG] - tensor 'L['x']' size mismatch at index 0. expected 30, actual 31 [2023-11-06 14:55:39,839] torch._dynamo.guards.__recompiles: [DEBUG] - tensor 'L['x']' size mismatch at index 0. expected 29, actual 31 [2023-11-06 14:55:39,839] torch._dynamo.guards.__recompiles: [DEBUG] - tensor 'L['x']' size mismatch at index 0. expected 28, actual 31 [2023-11-06 14:55:39,839] torch._dynamo.guards.__recompiles: [DEBUG] - tensor 'L['x']' size mismatch at index 0. expected 27, actual 31 [2023-11-06 14:55:39,839] torch._dynamo.guards.__recompiles: [DEBUG] - tensor 'L['x']' size mismatch at index 0. expected 26, actual 31 [2023-11-06 14:55:39,839] torch._dynamo.guards.__recompiles: [DEBUG] - tensor 'L['x']' size mismatch at index 0. expected 25, actual 31 [2023-11-06 14:55:39,839] torch._dynamo.guards.__recompiles: [DEBUG] - tensor 'L['x']' size mismatch at index 0. expected 24, actual 31 ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/110325 Approved by: https://github.com/ezyang, https://github.com/jon-chuang	2023-11-07 20:10:59 +00:00
Peter Bell	65ecb36621	Move ShapeEnv config out of dynamo (#112933 ) Previously there was a circular dependency between fx and dynamo that happened to work out since ShapeEnv didn't access the config at module init time. Pull Request resolved: https://github.com/pytorch/pytorch/pull/112933 Approved by: https://github.com/ezyang	2023-11-07 01:10:25 +00:00
Jon Chuang	d090c18fca	[dynamo] annotate config with `@compile_ignored` (#111303 ) Fixes: #111221 Pull Request resolved: https://github.com/pytorch/pytorch/pull/111303 Approved by: https://github.com/ezyang	2023-10-26 05:41:29 +00:00
rzou	e5049648be	Add a "pt2 compliant" tag; add config to graph break on non-pt2_compliant ops (#111933 ) This PR: - adds the pt2 compliant tag. This tag specifies that the operator works with the PT2 compilation APIs. A custom op author should test their ops with opcheck if they choose to add this tag. - adds a config for Dynamo to allow only pt2 compliant ops into the graph and graph break on all other OpOverload/OpOverloadPacket. Bikeshedding help wanted on the name of the tag. It should be easily grep-able so we can set up rules for it. Test Plan: - new tests Reviewers: Subscribers: Tasks: Tags: Pull Request resolved: https://github.com/pytorch/pytorch/pull/111933 Approved by: https://github.com/ezyang ghstack dependencies: #111912, #111915, #111948	2023-10-25 21:20:59 +00:00
Edward Z. Yang	126d422cf0	Error if you try to run Dynamo compiled function under torch.jit.trace (#111321 ) Fixes https://github.com/pytorch/pytorch/issues/111319 Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/111321 Approved by: https://github.com/Chillee	2023-10-16 13:52:29 +00:00
Kaichao You	69dcbc02b0	[Dynamo]Expose bytecode hooks and add example usage for decompilation in docs (#110714 ) Dynamo dynamically translate bytecode of python functions, which is powerful but with difficult-to-understand bytecode. Most users cannot understand python bytecode. Although a general purpose way to decompile python bytecode into source code is very difficult, I find that this work can be greatly simplified since Dynamo already cleans up the code: the bytecode generated by Dynamo is a reduced subset of well-behaved python bytecode. I created a tiny decompiler for pytorch 2.0, named `depyf`: https://github.com/youkaichao/depyf . There are several takeaways: - It supports pyton 3.7 - 3.11 (both inclusive), the same python versions supported by pytorch. Since the main usage of this library is to understand pytorch 2.0, I plan to keep pace with pytorch. If pytorch supports a new python version, I can add support for that. (Actually, the core code is just about 1k lines. Adding support for new versions of python bytecode can be done in just several days.) - I have tested the correctness of decompiled source code in torchbench. I capture the modified bytecode generated by Dynamo, decompile it into source code, and then compile it into new bytecode, replace the Dynamo generated bytecode with new bytecode. And it passed all the accuracy tests for timm models. For huggingface models, the situation is more complicated: all failed cases are caused by the compile step: some functions use the `__class__` as closure variables, but decompiler can only get the code object, so it has no way to figure out the `__class__` , leading to a name error when compiling the decompiled code. That said, it passed the rest tests without the `__class__` issue. Please see the log file https://cloud.tsinghua.edu.cn/f/685e4af8d930499baa7c/?dl=1 and https://cloud.tsinghua.edu.cn/f/cab89500e15e4b62890b/?dl=1 for details. With the above efforts, I think it would be great to add an additional logging option in Dynamo: we can try to decompile the generated bytecode into source code, so that users can have a rough idea of what the modified bytecode does. It does not affect the workflow of Dynamo, but just adds more debug information. An example code from the [doc](https://pytorch.org/docs/main/torch.compiler_deepdive.html): ```python from typing import List import torch from torch import _dynamo as torchdynamo def my_compiler(gm: torch.fx.GraphModule, example_inputs: List[torch.Tensor]): print("my_compiler() called with FX graph:") gm.graph.print_tabular() return gm.forward # return a python callable @torchdynamo.optimize(my_compiler) def toy_example(a, b): x = a / (torch.abs(a) + 1) if b.sum() < 0: b = b * -1 return x * b for _ in range(100): toy_example(torch.randn(10), torch.randn(10)) ``` Run with `export TORCH_LOGS="+dynamo,guards,bytecode"`. Bytecode logging: ``` [2023-10-06 23:56:44,929] [0/0] torch._dynamo.convert_frame.__bytecode: [DEBUG] ORIGINAL BYTECODE toy_example /Users/youkaichao/DeepLearning/depyf/ykc_test.py line 8 [2023-10-06 23:56:44,929] [0/0] torch._dynamo.convert_frame.__bytecode: [DEBUG] 10 0 LOAD_FAST 0 (a) [2023-10-06 23:56:44,929] [0/0] torch._dynamo.convert_frame.__bytecode: [DEBUG] 2 LOAD_GLOBAL 0 (torch) [2023-10-06 23:56:44,929] [0/0] torch._dynamo.convert_frame.__bytecode: [DEBUG] 4 LOAD_METHOD 1 (abs) [2023-10-06 23:56:44,929] [0/0] torch._dynamo.convert_frame.__bytecode: [DEBUG] 6 LOAD_FAST 0 (a) [2023-10-06 23:56:44,929] [0/0] torch._dynamo.convert_frame.__bytecode: [DEBUG] 8 CALL_METHOD 1 [2023-10-06 23:56:44,929] [0/0] torch._dynamo.convert_frame.__bytecode: [DEBUG] 10 LOAD_CONST 1 (1) [2023-10-06 23:56:44,929] [0/0] torch._dynamo.convert_frame.__bytecode: [DEBUG] 12 BINARY_ADD [2023-10-06 23:56:44,929] [0/0] torch._dynamo.convert_frame.__bytecode: [DEBUG] 14 BINARY_TRUE_DIVIDE [2023-10-06 23:56:44,929] [0/0] torch._dynamo.convert_frame.__bytecode: [DEBUG] 16 STORE_FAST 2 (x) [2023-10-06 23:56:44,929] [0/0] torch._dynamo.convert_frame.__bytecode: [DEBUG] [2023-10-06 23:56:44,929] [0/0] torch._dynamo.convert_frame.__bytecode: [DEBUG] 11 18 LOAD_FAST 1 (b) [2023-10-06 23:56:44,929] [0/0] torch._dynamo.convert_frame.__bytecode: [DEBUG] 20 LOAD_METHOD 2 (sum) [2023-10-06 23:56:44,929] [0/0] torch._dynamo.convert_frame.__bytecode: [DEBUG] 22 CALL_METHOD 0 [2023-10-06 23:56:44,929] [0/0] torch._dynamo.convert_frame.__bytecode: [DEBUG] 24 STORE_FAST 3 (__temp_2) [2023-10-06 23:56:44,929] [0/0] torch._dynamo.convert_frame.__bytecode: [DEBUG] [2023-10-06 23:56:44,929] [0/0] torch._dynamo.convert_frame.__bytecode: [DEBUG] 12 26 LOAD_FAST 3 (__temp_2) [2023-10-06 23:56:44,929] [0/0] torch._dynamo.convert_frame.__bytecode: [DEBUG] 28 LOAD_CONST 2 (0) [2023-10-06 23:56:44,929] [0/0] torch._dynamo.convert_frame.__bytecode: [DEBUG] 30 COMPARE_OP 0 (<) [2023-10-06 23:56:44,929] [0/0] torch._dynamo.convert_frame.__bytecode: [DEBUG] 32 POP_JUMP_IF_FALSE 21 (to 42) [2023-10-06 23:56:44,929] [0/0] torch._dynamo.convert_frame.__bytecode: [DEBUG] [2023-10-06 23:56:44,929] [0/0] torch._dynamo.convert_frame.__bytecode: [DEBUG] 13 34 LOAD_FAST 1 (b) [2023-10-06 23:56:44,929] [0/0] torch._dynamo.convert_frame.__bytecode: [DEBUG] 36 LOAD_CONST 3 (-1) [2023-10-06 23:56:44,929] [0/0] torch._dynamo.convert_frame.__bytecode: [DEBUG] 38 BINARY_MULTIPLY [2023-10-06 23:56:44,929] [0/0] torch._dynamo.convert_frame.__bytecode: [DEBUG] 40 STORE_FAST 1 (b) [2023-10-06 23:56:44,929] [0/0] torch._dynamo.convert_frame.__bytecode: [DEBUG] [2023-10-06 23:56:44,929] [0/0] torch._dynamo.convert_frame.__bytecode: [DEBUG] 14 >> 42 LOAD_FAST 2 (x) [2023-10-06 23:56:44,929] [0/0] torch._dynamo.convert_frame.__bytecode: [DEBUG] 44 LOAD_FAST 1 (b) [2023-10-06 23:56:44,929] [0/0] torch._dynamo.convert_frame.__bytecode: [DEBUG] 46 BINARY_MULTIPLY [2023-10-06 23:56:44,929] [0/0] torch._dynamo.convert_frame.__bytecode: [DEBUG] 48 RETURN_VALUE [2023-10-06 23:56:44,929] [0/0] torch._dynamo.convert_frame.__bytecode: [DEBUG] [2023-10-06 23:56:44,929] [0/0] torch._dynamo.convert_frame.__bytecode: [DEBUG] [2023-10-06 23:56:44,930] [0/0] torch._dynamo.convert_frame.__bytecode: [DEBUG] MODIFIED BYTECODE toy_example /Users/youkaichao/DeepLearning/depyf/ykc_test.py line 8 [2023-10-06 23:56:44,930] [0/0] torch._dynamo.convert_frame.__bytecode: [DEBUG] 8 0 LOAD_GLOBAL 3 (__compiled_fn_0) [2023-10-06 23:56:44,930] [0/0] torch._dynamo.convert_frame.__bytecode: [DEBUG] 2 LOAD_FAST 0 (a) [2023-10-06 23:56:44,930] [0/0] torch._dynamo.convert_frame.__bytecode: [DEBUG] 4 LOAD_FAST 1 (b) [2023-10-06 23:56:44,930] [0/0] torch._dynamo.convert_frame.__bytecode: [DEBUG] 6 CALL_FUNCTION 2 [2023-10-06 23:56:44,930] [0/0] torch._dynamo.convert_frame.__bytecode: [DEBUG] 8 UNPACK_SEQUENCE 2 [2023-10-06 23:56:44,930] [0/0] torch._dynamo.convert_frame.__bytecode: [DEBUG] 10 STORE_FAST 2 (x) [2023-10-06 23:56:44,930] [0/0] torch._dynamo.convert_frame.__bytecode: [DEBUG] 12 POP_JUMP_IF_FALSE 12 (to 24) [2023-10-06 23:56:44,930] [0/0] torch._dynamo.convert_frame.__bytecode: [DEBUG] 14 LOAD_GLOBAL 4 (__resume_at_34_1) [2023-10-06 23:56:44,930] [0/0] torch._dynamo.convert_frame.__bytecode: [DEBUG] 16 LOAD_FAST 1 (b) [2023-10-06 23:56:44,930] [0/0] torch._dynamo.convert_frame.__bytecode: [DEBUG] 18 LOAD_FAST 2 (x) [2023-10-06 23:56:44,930] [0/0] torch._dynamo.convert_frame.__bytecode: [DEBUG] 20 CALL_FUNCTION 2 [2023-10-06 23:56:44,930] [0/0] torch._dynamo.convert_frame.__bytecode: [DEBUG] 22 RETURN_VALUE [2023-10-06 23:56:44,930] [0/0] torch._dynamo.convert_frame.__bytecode: [DEBUG] >> 24 LOAD_GLOBAL 5 (__resume_at_42_2) [2023-10-06 23:56:44,930] [0/0] torch._dynamo.convert_frame.__bytecode: [DEBUG] 26 LOAD_FAST 1 (b) [2023-10-06 23:56:44,930] [0/0] torch._dynamo.convert_frame.__bytecode: [DEBUG] 28 LOAD_FAST 2 (x) [2023-10-06 23:56:44,930] [0/0] torch._dynamo.convert_frame.__bytecode: [DEBUG] 30 CALL_FUNCTION 2 [2023-10-06 23:56:44,930] [0/0] torch._dynamo.convert_frame.__bytecode: [DEBUG] 32 RETURN_VALUE [2023-10-06 23:56:44,930] [0/0] torch._dynamo.convert_frame.__bytecode: [DEBUG] [2023-10-06 23:56:44,930] [0/0] torch._dynamo.convert_frame.__bytecode: [DEBUG] ``` New output with this PR: ``` [2023-10-06 16:25:21,535] [0/0] torch._dynamo.convert_frame.__bytecode: [DEBUG] possible source code: [2023-10-06 16:25:21,535] [0/0] torch._dynamo.convert_frame.__bytecode: [DEBUG] def toy_example(a, b): [2023-10-06 16:25:21,535] [0/0] torch._dynamo.convert_frame.__bytecode: [DEBUG] __temp_1 = __compiled_fn_0(a, b) [2023-10-06 16:25:21,535] [0/0] torch._dynamo.convert_frame.__bytecode: [DEBUG] x = __temp_1[0] [2023-10-06 16:25:21,535] [0/0] torch._dynamo.convert_frame.__bytecode: [DEBUG] if __temp_1[1]: [2023-10-06 16:25:21,535] [0/0] torch._dynamo.convert_frame.__bytecode: [DEBUG] return __resume_at_34_1(b, x) [2023-10-06 16:25:21,535] [0/0] torch._dynamo.convert_frame.__bytecode: [DEBUG] return __resume_at_42_2(b, x) [2023-10-06 16:25:21,535] [0/0] torch._dynamo.convert_frame.__bytecode: [DEBUG] [2023-10-06 16:25:21,535] [0/0] torch._dynamo.convert_frame.__bytecode: [DEBUG] If you find the decompiled code is wrong,please submit an issue at https://github.com/youkaichao/depyf/issues. ``` The rest two log (please pay attention to the output `possible source code:`): ``` [2023-10-06 16:25:21,566] [1/0] torch._dynamo.convert_frame.__bytecode: [DEBUG] ORIGINAL BYTECODE <resume in toy_example> /workspace/youkaichao/code/pytorch/ykc.py line 12 [2023-10-06 16:25:21,566] [1/0] torch._dynamo.convert_frame.__bytecode: [DEBUG] 12 0 JUMP_ABSOLUTE 22 (to 44) [2023-10-06 16:25:21,566] [1/0] torch._dynamo.convert_frame.__bytecode: [DEBUG] 2 LOAD_FAST 2 (a) [2023-10-06 16:25:21,566] [1/0] torch._dynamo.convert_frame.__bytecode: [DEBUG] 4 LOAD_GLOBAL 0 (torch) [2023-10-06 16:25:21,566] [1/0] torch._dynamo.convert_frame.__bytecode: [DEBUG] 6 LOAD_ATTR 1 (abs) [2023-10-06 16:25:21,566] [1/0] torch._dynamo.convert_frame.__bytecode: [DEBUG] 8 LOAD_FAST 2 (a) [2023-10-06 16:25:21,566] [1/0] torch._dynamo.convert_frame.__bytecode: [DEBUG] 10 CALL_FUNCTION 1 [2023-10-06 16:25:21,566] [1/0] torch._dynamo.convert_frame.__bytecode: [DEBUG] 12 LOAD_CONST 1 (1) [2023-10-06 16:25:21,566] [1/0] torch._dynamo.convert_frame.__bytecode: [DEBUG] 14 BINARY_ADD [2023-10-06 16:25:21,566] [1/0] torch._dynamo.convert_frame.__bytecode: [DEBUG] 16 BINARY_TRUE_DIVIDE [2023-10-06 16:25:21,566] [1/0] torch._dynamo.convert_frame.__bytecode: [DEBUG] 18 STORE_FAST 1 (x) [2023-10-06 16:25:21,566] [1/0] torch._dynamo.convert_frame.__bytecode: [DEBUG] 20 LOAD_FAST 0 (b) [2023-10-06 16:25:21,566] [1/0] torch._dynamo.convert_frame.__bytecode: [DEBUG] 22 LOAD_ATTR 2 (sum) [2023-10-06 16:25:21,566] [1/0] torch._dynamo.convert_frame.__bytecode: [DEBUG] 24 CALL_FUNCTION 0 [2023-10-06 16:25:21,566] [1/0] torch._dynamo.convert_frame.__bytecode: [DEBUG] 26 STORE_FAST 3 (__temp_2) [2023-10-06 16:25:21,566] [1/0] torch._dynamo.convert_frame.__bytecode: [DEBUG] 28 LOAD_FAST 3 (__temp_2) [2023-10-06 16:25:21,566] [1/0] torch._dynamo.convert_frame.__bytecode: [DEBUG] 30 LOAD_CONST 2 (0) [2023-10-06 16:25:21,566] [1/0] torch._dynamo.convert_frame.__bytecode: [DEBUG] 32 COMPARE_OP 0 (<) [2023-10-06 16:25:21,566] [1/0] torch._dynamo.convert_frame.__bytecode: [DEBUG] 34 POP_JUMP_IF_FALSE 22 (to 44) [2023-10-06 16:25:21,566] [1/0] torch._dynamo.convert_frame.__bytecode: [DEBUG] 36 LOAD_FAST 0 (b) [2023-10-06 16:25:21,566] [1/0] torch._dynamo.convert_frame.__bytecode: [DEBUG] 38 LOAD_CONST 3 (-1) [2023-10-06 16:25:21,566] [1/0] torch._dynamo.convert_frame.__bytecode: [DEBUG] 40 BINARY_MULTIPLY [2023-10-06 16:25:21,566] [1/0] torch._dynamo.convert_frame.__bytecode: [DEBUG] 42 STORE_FAST 0 (b) [2023-10-06 16:25:21,566] [1/0] torch._dynamo.convert_frame.__bytecode: [DEBUG] [2023-10-06 16:25:21,566] [1/0] torch._dynamo.convert_frame.__bytecode: [DEBUG] 14 >> 44 LOAD_FAST 1 (x) [2023-10-06 16:25:21,566] [1/0] torch._dynamo.convert_frame.__bytecode: [DEBUG] 46 LOAD_FAST 0 (b) [2023-10-06 16:25:21,566] [1/0] torch._dynamo.convert_frame.__bytecode: [DEBUG] 48 BINARY_MULTIPLY [2023-10-06 16:25:21,566] [1/0] torch._dynamo.convert_frame.__bytecode: [DEBUG] 50 RETURN_VALUE [2023-10-06 16:25:21,566] [1/0] torch._dynamo.convert_frame.__bytecode: [DEBUG] [2023-10-06 16:25:21,566] [1/0] torch._dynamo.convert_frame.__bytecode: [DEBUG] [2023-10-06 16:25:21,566] [1/0] torch._dynamo.convert_frame.__bytecode: [DEBUG] MODIFIED BYTECODE <resume in toy_example> /workspace/youkaichao/code/pytorch/ykc.py line 12 [2023-10-06 16:25:21,566] [1/0] torch._dynamo.convert_frame.__bytecode: [DEBUG] 12 0 LOAD_GLOBAL 3 (__compiled_fn_3) [2023-10-06 16:25:21,566] [1/0] torch._dynamo.convert_frame.__bytecode: [DEBUG] 2 LOAD_FAST 0 (b) [2023-10-06 16:25:21,566] [1/0] torch._dynamo.convert_frame.__bytecode: [DEBUG] 4 LOAD_FAST 1 (x) [2023-10-06 16:25:21,566] [1/0] torch._dynamo.convert_frame.__bytecode: [DEBUG] 6 CALL_FUNCTION 2 [2023-10-06 16:25:21,566] [1/0] torch._dynamo.convert_frame.__bytecode: [DEBUG] 8 UNPACK_SEQUENCE 1 [2023-10-06 16:25:21,566] [1/0] torch._dynamo.convert_frame.__bytecode: [DEBUG] 10 RETURN_VALUE [2023-10-06 16:25:21,566] [1/0] torch._dynamo.convert_frame.__bytecode: [DEBUG] [2023-10-06 16:25:21,566] [1/0] torch._dynamo.convert_frame.__bytecode: [DEBUG] [2023-10-06 16:25:21,567] [1/0] torch._dynamo.convert_frame.__bytecode: [DEBUG] possible source code: [2023-10-06 16:25:21,567] [1/0] torch._dynamo.convert_frame.__bytecode: [DEBUG] def <resume in toy_example>(b, x): [2023-10-06 16:25:21,567] [1/0] torch._dynamo.convert_frame.__bytecode: [DEBUG] return __compiled_fn_3(b, x)[0] [2023-10-06 16:25:21,567] [1/0] torch._dynamo.convert_frame.__bytecode: [DEBUG] [2023-10-06 16:25:21,567] [1/0] torch._dynamo.convert_frame.__bytecode: [DEBUG] If you find the decompiled code is wrong,please submit an issue at https://github.com/youkaichao/depyf/issues. ``` ``` [2023-10-06 16:25:21,579] [2/0] torch._dynamo.convert_frame.__bytecode: [DEBUG] ORIGINAL BYTECODE <resume in toy_example> /workspace/youkaichao/code/pytorch/ykc.py line 12 [2023-10-06 16:25:21,579] [2/0] torch._dynamo.convert_frame.__bytecode: [DEBUG] 12 0 JUMP_ABSOLUTE 18 (to 36) [2023-10-06 16:25:21,579] [2/0] torch._dynamo.convert_frame.__bytecode: [DEBUG] 2 LOAD_FAST 2 (a) [2023-10-06 16:25:21,579] [2/0] torch._dynamo.convert_frame.__bytecode: [DEBUG] 4 LOAD_GLOBAL 0 (torch) [2023-10-06 16:25:21,579] [2/0] torch._dynamo.convert_frame.__bytecode: [DEBUG] 6 LOAD_ATTR 1 (abs) [2023-10-06 16:25:21,579] [2/0] torch._dynamo.convert_frame.__bytecode: [DEBUG] 8 LOAD_FAST 2 (a) [2023-10-06 16:25:21,579] [2/0] torch._dynamo.convert_frame.__bytecode: [DEBUG] 10 CALL_FUNCTION 1 [2023-10-06 16:25:21,579] [2/0] torch._dynamo.convert_frame.__bytecode: [DEBUG] 12 LOAD_CONST 1 (1) [2023-10-06 16:25:21,579] [2/0] torch._dynamo.convert_frame.__bytecode: [DEBUG] 14 BINARY_ADD [2023-10-06 16:25:21,579] [2/0] torch._dynamo.convert_frame.__bytecode: [DEBUG] 16 BINARY_TRUE_DIVIDE [2023-10-06 16:25:21,579] [2/0] torch._dynamo.convert_frame.__bytecode: [DEBUG] 18 STORE_FAST 1 (x) [2023-10-06 16:25:21,579] [2/0] torch._dynamo.convert_frame.__bytecode: [DEBUG] 20 LOAD_FAST 0 (b) [2023-10-06 16:25:21,579] [2/0] torch._dynamo.convert_frame.__bytecode: [DEBUG] 22 LOAD_ATTR 2 (sum) [2023-10-06 16:25:21,579] [2/0] torch._dynamo.convert_frame.__bytecode: [DEBUG] 24 CALL_FUNCTION 0 [2023-10-06 16:25:21,579] [2/0] torch._dynamo.convert_frame.__bytecode: [DEBUG] 26 STORE_FAST 3 (__temp_2) [2023-10-06 16:25:21,579] [2/0] torch._dynamo.convert_frame.__bytecode: [DEBUG] 28 LOAD_FAST 3 (__temp_2) [2023-10-06 16:25:21,579] [2/0] torch._dynamo.convert_frame.__bytecode: [DEBUG] 30 LOAD_CONST 2 (0) [2023-10-06 16:25:21,579] [2/0] torch._dynamo.convert_frame.__bytecode: [DEBUG] 32 COMPARE_OP 0 (<) [2023-10-06 16:25:21,579] [2/0] torch._dynamo.convert_frame.__bytecode: [DEBUG] 34 POP_JUMP_IF_FALSE 22 (to 44) [2023-10-06 16:25:21,579] [2/0] torch._dynamo.convert_frame.__bytecode: [DEBUG] [2023-10-06 16:25:21,579] [2/0] torch._dynamo.convert_frame.__bytecode: [DEBUG] 13 >> 36 LOAD_FAST 0 (b) [2023-10-06 16:25:21,579] [2/0] torch._dynamo.convert_frame.__bytecode: [DEBUG] 38 LOAD_CONST 3 (-1) [2023-10-06 16:25:21,579] [2/0] torch._dynamo.convert_frame.__bytecode: [DEBUG] 40 BINARY_MULTIPLY [2023-10-06 16:25:21,579] [2/0] torch._dynamo.convert_frame.__bytecode: [DEBUG] 42 STORE_FAST 0 (b) [2023-10-06 16:25:21,579] [2/0] torch._dynamo.convert_frame.__bytecode: [DEBUG] [2023-10-06 16:25:21,579] [2/0] torch._dynamo.convert_frame.__bytecode: [DEBUG] 14 >> 44 LOAD_FAST 1 (x) [2023-10-06 16:25:21,579] [2/0] torch._dynamo.convert_frame.__bytecode: [DEBUG] 46 LOAD_FAST 0 (b) [2023-10-06 16:25:21,579] [2/0] torch._dynamo.convert_frame.__bytecode: [DEBUG] 48 BINARY_MULTIPLY [2023-10-06 16:25:21,579] [2/0] torch._dynamo.convert_frame.__bytecode: [DEBUG] 50 RETURN_VALUE [2023-10-06 16:25:21,579] [2/0] torch._dynamo.convert_frame.__bytecode: [DEBUG] [2023-10-06 16:25:21,579] [2/0] torch._dynamo.convert_frame.__bytecode: [DEBUG] [2023-10-06 16:25:21,580] [2/0] torch._dynamo.convert_frame.__bytecode: [DEBUG] MODIFIED BYTECODE <resume in toy_example> /workspace/youkaichao/code/pytorch/ykc.py line 12 [2023-10-06 16:25:21,580] [2/0] torch._dynamo.convert_frame.__bytecode: [DEBUG] 12 0 LOAD_GLOBAL 3 (__compiled_fn_4) [2023-10-06 16:25:21,580] [2/0] torch._dynamo.convert_frame.__bytecode: [DEBUG] 2 LOAD_FAST 0 (b) [2023-10-06 16:25:21,580] [2/0] torch._dynamo.convert_frame.__bytecode: [DEBUG] 4 LOAD_FAST 1 (x) [2023-10-06 16:25:21,580] [2/0] torch._dynamo.convert_frame.__bytecode: [DEBUG] 6 CALL_FUNCTION 2 [2023-10-06 16:25:21,580] [2/0] torch._dynamo.convert_frame.__bytecode: [DEBUG] 8 UNPACK_SEQUENCE 1 [2023-10-06 16:25:21,580] [2/0] torch._dynamo.convert_frame.__bytecode: [DEBUG] 10 RETURN_VALUE [2023-10-06 16:25:21,580] [2/0] torch._dynamo.convert_frame.__bytecode: [DEBUG] [2023-10-06 16:25:21,580] [2/0] torch._dynamo.convert_frame.__bytecode: [DEBUG] [2023-10-06 16:25:21,580] [2/0] torch._dynamo.convert_frame.__bytecode: [DEBUG] possible source code: [2023-10-06 16:25:21,580] [2/0] torch._dynamo.convert_frame.__bytecode: [DEBUG] def <resume in toy_example>(b, x): [2023-10-06 16:25:21,580] [2/0] torch._dynamo.convert_frame.__bytecode: [DEBUG] return __compiled_fn_4(b, x)[0] [2023-10-06 16:25:21,580] [2/0] torch._dynamo.convert_frame.__bytecode: [DEBUG] [2023-10-06 16:25:21,580] [2/0] torch._dynamo.convert_frame.__bytecode: [DEBUG] If you find the decompiled code is wrong,please submit an issue at https://github.com/youkaichao/depyf/issues. ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/110714 Approved by: https://github.com/jansel	2023-10-13 12:36:00 +00:00
Michael Voznesensky	395d0eaea0	Dynamo - config gated torch.distributed allow, exclusion for special leaf funcs (#110894 ) `is_allowed` is a tricky bit of functionality - it sits early up in builder and is used to drive the creation of TorchVariable (more notes here, meta only https://fb.workplace.com/groups/pytorch.dev/permalink/1393563781222098/) If we are tracing distributed in full, we want to route certain calls in distributed to NOT PASS is_allowed (this does not, confusingly, mean that they are not allowed, lol, but rather that we dont want them to become TorchVariable), others, we are fine with preserving. Pull Request resolved: https://github.com/pytorch/pytorch/pull/110894 Approved by: https://github.com/ezyang	2023-10-12 09:25:51 +00:00
Michael Voznesensky	1e7947b3e0	Revert "Reland 3rd try [finishing colesbury's PR 100642] Guard on nn.Module dicts and type (#109323 )" + Forward fixes + test (#110964 ) This reverts commit `f786fbdebd`. Forward fixes Pull Request resolved: https://github.com/pytorch/pytorch/pull/110964 Approved by: https://github.com/ezyang, https://github.com/anijain2305	2023-10-11 05:16:47 +00:00
Jon Chuang	84ad3ed7b2	[dynamo] add config for displaying all guard failures (#110927 ) Fixes https://github.com/pytorch/pytorch/issues/110879 Example output: ``` ('Recompiling function fn in /home/jonch/Desktop/Programming/mlsys/pytorch/test/dynamo/test_misc.py:4578', 'triggered by the following guard failures: ["___check_type_id(L[\'obj\'], 94834370481168)", "L[\'obj\'].x == -0.5"]') ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/110927 Approved by: https://github.com/lezcano	2023-10-10 19:57:44 +00:00
Adnan Akhundov	f74937741e	Remove runtime assertions between export and AOT compilation (#110710 ) Summary: The runtime assertions inserted in the `torch._export.export` by the `_AddRuntimeAssertionsForInlineConstraintsPass` lead to errors in AOT Inductor like #109884. In `torch._export.aot_compile` export and AOT compilation are run consecutively which would lead to the above issue if any assertions are inserted. In this PR, we're adding a new parameter / flag to `torch._export.aot_compile`, `remove_runtime_assertions`, to remove the assertions inserted during export before AOT compilation. The flag is set to `False` for BC. Additionally, we remove the flag `add_runtime_assertions_for_inline_constraints` recently added to `torch._dynamo.config`, as it can lead to undesirable `torch._export` behavior and is 's no longer required for the AOT Inductor testing purposes. Test Plan: CI Reviewers: Subscribers: Tasks: Tags: Pull Request resolved: https://github.com/pytorch/pytorch/pull/110710 Approved by: https://github.com/zhxchen17, https://github.com/chenyang78	2023-10-06 21:09:35 +00:00
chilli	f767a6c57a	Made pattern-matcher diagnostics lazily reported + added TORCH_COMPILE_CPROFILE (#110504 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/110504 Approved by: https://github.com/mlazos, https://github.com/eellison ghstack dependencies: #110501	2023-10-05 15:47:30 +00:00
PyTorch MergeBot	1e4c0641ce	Revert "Made pattern-matcher diagnostics lazily reported + added TORCH_COMPILE_CPROFILE (#110504 )" This reverts commit `9648df1a6a`. Reverted https://github.com/pytorch/pytorch/pull/110504 on behalf of https://github.com/PaliC due to temporarily will revert as it's causing problems with difftrain import ([comment](https://github.com/pytorch/pytorch/pull/110504#issuecomment-1749132253))	2023-10-05 15:28:23 +00:00
chilli	9648df1a6a	Made pattern-matcher diagnostics lazily reported + added TORCH_COMPILE_CPROFILE (#110504 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/110504 Approved by: https://github.com/mlazos, https://github.com/eellison ghstack dependencies: #110501	2023-10-05 01:34:57 +00:00
Yanbo Liang	9bc5e10899	[New][1/N] Dynamo skipfiles refactor (#110330 ) This is the replacement of #109567. Now I preserved all existing semantics and only focusing on API (for developers) and code structure changes. Pull Request resolved: https://github.com/pytorch/pytorch/pull/110330 Approved by: https://github.com/ezyang	2023-10-03 16:50:33 +00:00
Adnan Akhundov	2ead6c2f6e	Skip launching kernels with zero grid in AOT Inductor (#110312 ) Summary: with the grid computed in terms of unbacked `SymInt`s, it can happen that the grid is zero size. This causes CUDA error on `cuLaunchKernel` in the AOT Inductor codegen. In this PR, when the grid contains unbacked `SymInt`s, a check is added around the `launchKernel` in the AOT Inductor's C++ wrapper codegen to make sure that the grid is not zero-size. Pull Request resolved: https://github.com/pytorch/pytorch/pull/110312 Approved by: https://github.com/chenyang78	2023-09-30 09:12:56 +00:00
atalman	b253fc9c93	Revert "[1/N] Dynamo skipfiles refactor (#109567 )" (#110296 ) This reverts commit `84c5435b29`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/110296 Approved by: https://github.com/yanboliang	2023-09-29 20:35:46 +00:00
Yanbo Liang	84c5435b29	[1/N] Dynamo skipfiles refactor (#109567 ) This is 1/N of the dynamo skipfiles/allowed_functions refactor, the major change in this PR includes: * Refactor & define the [skipfiles rules](https://github.com/pytorch/pytorch/pull/109567/files#diff-5aa3ce9db729bf0901ea97a5d3cc51924cc8575d9c516c1c8f572a35de92544aR56) and interface * For every ```skipfiles.check```, we return both the check result and the skip/inline reason and log them for debugging. * We found several latent issues/bugs and incorrect implementations in the codebase, but I'm planning to fix them in follow-up PRs to make the refactor decoupled with bug fixes. * More details in the inline comments. Pull Request resolved: https://github.com/pytorch/pytorch/pull/109567 Approved by: https://github.com/ezyang, https://github.com/jansel, https://github.com/anijain2305	2023-09-28 18:36:46 +00:00
PyTorch MergeBot	75462fd870	Revert "[1/N] Dynamo skipfiles refactor (#109567 )" This reverts commit `f8e0ebec8c`. Reverted https://github.com/pytorch/pytorch/pull/109567 on behalf of https://github.com/huydhn due to Many jobs are failing in trunk after this with FILENAME_ALLOWLIST is not defined error `f8e0ebec8c`. This looks like a landrace ([comment](https://github.com/pytorch/pytorch/pull/109567#issuecomment-1738344950))	2023-09-28 02:22:22 +00:00
Yanbo Liang	f8e0ebec8c	[1/N] Dynamo skipfiles refactor (#109567 ) This is 1/N of the dynamo skipfiles/allowed_functions refactor, the major change in this PR includes: * Refactor & define the [skipfiles rules](https://github.com/pytorch/pytorch/pull/109567/files#diff-5aa3ce9db729bf0901ea97a5d3cc51924cc8575d9c516c1c8f572a35de92544aR56) and interface * For every ```skipfiles.check```, we return both the check result and the skip/inline reason and log them for debugging. * We found several latent issues/bugs and incorrect implementations in the codebase, but I'm planning to fix them in follow-up PRs to make the refactor decoupled with bug fixes. * More details in the inline comments. Pull Request resolved: https://github.com/pytorch/pytorch/pull/109567 Approved by: https://github.com/ezyang, https://github.com/jansel, https://github.com/anijain2305	2023-09-28 01:21:59 +00:00
Animesh Jain	2ac7e52d34	[dynamo][nn_module_guards] Config flag to disable nn_module_guards (#110039 ) This flag is requested by @Chillee who is seeing recompilations with simple gpt experiments. We are observing recompilations because `_parameters` ordered dict keeps changing from run to run, and its unclear why that is happening. Pull Request resolved: https://github.com/pytorch/pytorch/pull/110039 Approved by: https://github.com/Chillee ghstack dependencies: #110023	2023-09-26 06:35:23 +00:00
Yukio Siraichi	f35cc0fb6f	Don't record function call if `ShapeEnv` is not found. (#109904 ) Fix: #109844 - Redirecting execution to original function if `ShapeEnv` instance is not found in its arguments - Removed `dont_record_shape_env_events`, as it wasn't being used anywhere Pull Request resolved: https://github.com/pytorch/pytorch/pull/109904 Approved by: https://github.com/ezyang	2023-09-23 19:48:24 +00:00
Will Feng	3f3e353885	torch.compile + selective activation checkpointing (#105489 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/105489 NOTE: this PR is tagged "not user facing", because it's not ready to be announced externally yet. This PR implements torch.compile + selective activation checkpoint (SAC) integration, by using `TagActivationCheckpoint` (same backend as torch.compile + full activation checkpoint integration). TorchDispatchMode based implementation cannot support including inplace ops in the checkpointed region at the moment (the reason for this needs investigation), and there is also no way to ban them (because TorchDispatchMode now only sees "after-functionalization" ops, so can't detect if an op is in-place). Hence we hide torch.compile + SAC behind a flag (`torch._dynamo.config._experimental_support_context_fn_in_torch_utils_checkpoint`) and will only use it internally for cases that are known to not have in-place ops. This state won't last too long, because in-place op will at least be able to be detected after Brian's mode reordering and related functionalization changes. So next steps after this PR: 1. Wait for Brian's mode reordering and related functionalization changes to land, and then try to enable the "inplace ops" unit test for torch.compile + selective activation checkpoint (if it doesn't work, investigate why). 2. Unify selective- and full-checkpoint under TorchDispatchMode based implementation. Differential Revision: D47497145 Pull Request resolved: https://github.com/pytorch/pytorch/pull/105489 Approved by: https://github.com/anijain2305	2023-09-21 16:24:11 +00:00
Edward Yang	88600e7d2e	[RELAND] Force synced KJT to trace unbacked SymInt (#108960 ) (#109216 ) Summary: The basic concept behind this diff is to modify Dynamo's tracing behavior when it encounters a KeyedJaggedTensor that is synced (aka has `_length_per_key` and `_offset_per_key` populated). These fields are lists of integers; ordinarily, Dynamo will optimistically try to specialize on integers, however, for KJTs, we know that these integers will definitely vary from run-to-run. Furthermore, ordinarily, we would also specialize these integers if they are 0/1, but we will frequently expect features in KJTs to be 0/1. The fix is to detect KJTs and treat these integers as unbacked integers. This is NOT a universally sound optimization: when treating these integers as unbacked, we never report them as equal to zero or one. In return, we always generate graphs that generalize no matter the length of values on features. This is enough to trace through APS sparse arch, torchrec_dlrm and some small split-cat examples. The special integer behavior is triggered by a dynamically scoped `force_unspec_int_unbacked_size_like` variable on TracingContext, which we trigger when we wrap a KJT. There probably are other ways to do this, but this was simple and worked. Test Plan: ``` buck2 test mode/dev-nosan //pytorch/benchmark/fb/test_gpu:run_test_gpu ``` from aakhundov 1. first build feed_lower_benchmark: ``` buck2 build --show-output mode/opt -c python.package_style=inplace -c fbcode.enable_gpu_sections=true -c fbcode.platform=platform010 -c fbcode.split-dwarf=true hpc/new/models/feed/benchmark:feed_lower_benchmark ``` 2. then run the lowering of the model with it: ``` TORCHINDUCTOR_MAX_AUTOTUNE=1 TORCHINDUCTOR_UNIQUE_KERNEL_NAMES=1 TORCH_LOGS="output_code,graph_code" TORCH_COMPILE_DEBUG=1 ../buck-out/v2/gen/fbcode/79c6b019ee0f9469/hpc/new/models/feed/benchmark/__feed_lower_benchmark__/feed_lower_benchmark.par --load=manifold://ig_inference_model/tree/user/facebook/fblearner/predictor/960999465/60/gpu_lowering/input.predictor --skip-trt --skip-ait --sync-mode=0 --enable-aot-inductor --lower-presets="ig_stories" --gpu-trace ``` cf https://docs.google.com/document/d/1yD30xYrdmM8r2HTdmXnZTg0-MHVexfVrAa0294m1AUE/edit?pli=1#heading=h.qiv3fp7e6zg0 From torchrec: https://www.internalfb.com/intern/wiki/Torchrec/Development/Testing_production_models/ From ge0405 baseline (without your diff): f477293168 your diff: f477292363 ``` buck2 test //caffe2/test/dynamo:test_dynamo_torchrec buck2 run 'fbcode//mode/opt' fbcode//pytorch/benchmark/fb/test_gpu:run_test_gpu -- 'pytorch.benchmark.fb.test_gpu.test_gpu.TestBenchmarkFbGpu.test_train_blue_reels_vdd_v3_inductor_speedup' ``` Differential Revision: D49236757 Pull Request resolved: https://github.com/pytorch/pytorch/pull/109216 Approved by: https://github.com/voznesenskym	2023-09-18 14:39:44 +00:00
Animesh Jain	f786fbdebd	Reland 3rd try [finishing colesbury's PR 100642] Guard on nn.Module dicts and type (#109323 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/109323 Approved by: https://github.com/huydhn, https://github.com/voznesenskym	2023-09-15 08:44:14 +00:00
ydwu4	94a54b89aa	[dynamo] Add BACKEND_MATCH guard to detect and recompile when backend changes (#107337 ) Motivation: We try to make torch.cond use torch.compile automatically so that we could error out when there is side-effects in the branches and correctly handle the closures. Before this PR, we have a warning if we don't turn on a config raise_on_backend_change (turning it on gives us an error) for the following code: ```python def foo() # Inside torch.cond, we'd like to do something like torch.compile(foo, backend="eager", fullgraph=True)(...) ... # Users may then call torch.compile somewhere else. # Dynamo will use the cached code of foo for "eager" backend # but we expect dynamo to recompile with "inductor" backend. torch.compile(foo, backend="inductor")(...) ``` This PR adds a BACKEND_MATCH guard. Effectively, it implements a per-backend cache. In the above example, the cached code for "eager" won't work for "inductor" due to guard check failures and the second torch.compile will do a re-compilation. In the future, it might be useful to have something like a configuration guard that guards against dynamo configuration changes across different compiles (e.g. compile a function with fullgraph=False then compile it again with fullgraph=True). Implementation: 1. We add a guarded_backend_cache and check the most_recent_backend against the backend associated with cached code. We also remove the raise_on_backend_change flag. Note: More lines are printed for debug log due to newly added context manager and guard adds . Test Plan: Removed original tests that raise on different backend and add a new test to test whether the BACKEND_MATCH guard can guard against backend change. Pull Request resolved: https://github.com/pytorch/pytorch/pull/107337 Approved by: https://github.com/jansel	2023-09-14 15:49:30 +00:00
Evgeni Burovski	3ac2396e00	Fix `torch._numpy.random` (#108944 ) Fix several issues with `torch._numpy.random` functions on eager 1. actually return scalars when `size is None` 2. fix dispatch with USE_NUMPY_STREAM 3. make tnp.random functions composable: make numpy functions receive numpy arguments, not `tnp.ndarray`s 4. fix random.shuffle for e.g. lists The main need for this gymnastics is due to `np.random` functions returning an ndarray or python scalar depending on the `size` argument. We decided a while ago to replicate this behavior in `tnp.random` and not elsewhere where we always return 0D arrays instead. Pull Request resolved: https://github.com/pytorch/pytorch/pull/108944 Approved by: https://github.com/lezcano	2023-09-13 05:08:19 +00:00
Yukio Siraichi	12e8530b35	Record and replay for ShapeEnv. (#107989 ) This PR introduces record and replay functionality for `ShapeEnv` instances. In short, throughout the execution of a program, we record events (e.g. function calls that modify its state) so that, in the future, we are able to reproduce any intermediary state of the instance. In summary, this PR introduces the following changes (they mostly belong to _symbolic_shapes.py_ unless otherwise stated): - Create `ShapeEnvEvent` class for recording function calls + arguments - Create `record_shapeenv_event` decorator and decorate every function that changes the state of a `ShapeEnv`: it creates an appropriate event and add it to the available ShapeEnv instance (sometimes it has to extract from `SymTypes`). - Create `SymNode.with_shape_env` convenient function for replacing `ShapeEnv` references - Wraps `ShapeEnv` initialization method: so that we also save the exact way a `ShapeEnv` was constructed, i.e. arguments - Introduces a way to compare two `ShapeEnv` instances, defining a concept of state for that class. In short, the state of `ShapeEnv` is every variable that may change the execution flow - Create `check_shape_env_recorded_events` dynamo configuration for enabling the check for equality the state of `ShapeEnv` with another one that was constructed by replaying all the recorded events. This check takes place inside `produce_guards` - Create `replay_shape_env_events` function for replaying given events. It assumes the first event is `ShapeEnv` initialization function Pull Request resolved: https://github.com/pytorch/pytorch/pull/107989 Approved by: https://github.com/ezyang	2023-09-13 00:22:38 +00:00
PyTorch MergeBot	56c2386157	Revert "reland [finishing colesbury's PR 100642] Guard on nn.Module dicts and type (#108883 )" This reverts commit `d4230e5574`. Reverted https://github.com/pytorch/pytorch/pull/108883 on behalf of https://github.com/huydhn due to Per the discussion thread on D49122208, reverting this change ([comment](https://github.com/pytorch/pytorch/pull/108883#issuecomment-1712707853))	2023-09-10 04:40:02 +00:00
Animesh Jain	d4230e5574	reland [finishing colesbury's PR 100642] Guard on nn.Module dicts and type (#108883 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/108883 Approved by: https://github.com/voznesenskym, https://github.com/huydhn	2023-09-09 03:12:31 +00:00
PyTorch MergeBot	72f24d0001	Revert "[dynamo][finishing colesbury's PR 100642] Guard on nn.Module dicts and type (#108528 )" This reverts commit `34bb74c4cf`. Reverted https://github.com/pytorch/pytorch/pull/108528 on behalf of https://github.com/huydhn due to Sorry for reverting your change, but it has some nasty merge conflicts after the revert of D48910794. I need to revert this so the conflict could be resolved. Please help rebase this tomorrow and reland the change ([comment](https://github.com/pytorch/pytorch/pull/108528#issuecomment-1711034781))	2023-09-08 03:49:41 +00:00
PyTorch MergeBot	38fcf77a1b	Revert "[dynamo] Add BACKEND_MATCH guard to detect and recompile when backend changes (#107337 )" This reverts commit `1a64ec7dd4`. Reverted https://github.com/pytorch/pytorch/pull/107337 on behalf of https://github.com/huydhn due to Sorry for reverting your change but inductor perf smoke test starts to regress after this ([comment](https://github.com/pytorch/pytorch/pull/107337#issuecomment-1710974588))	2023-09-08 02:03:48 +00:00
ydwu4	1a64ec7dd4	[dynamo] Add BACKEND_MATCH guard to detect and recompile when backend changes (#107337 ) Motivation: We try to make torch.cond use torch.compile automatically so that we could error out when there is side-effects in the branches and correctly handle the closures. Before this PR, we have a warning if we don't turn on a config raise_on_backend_change (turning it on gives us an error) for the following code: ```python def foo() # Inside torch.cond, we'd like to do something like torch.compile(foo, backend="eager", fullgraph=True)(...) ... # Users may then call torch.compile somewhere else. # Dynamo will use the cached code of foo for "eager" backend # but we expect dynamo to recompile with "inductor" backend. torch.compile(foo, backend="inductor")(...) ``` This PR adds a BACKEND_MATCH guard. Effectively, it implements a per-backend cache. In the above example, the cached code for "eager" won't work for "inductor" due to guard check failures and the second torch.compile will do a re-compilation. In the future, it might be useful to have something like a configuration guard that guards against dynamo configuration changes across different compiles (e.g. compile a function with fullgraph=False then compile it again with fullgraph=True). Implementation: 1. We add a guarded_backend_cache and check the most_recent_backend against the backend associated with cached code. We also remove the raise_on_backend_change flag. 2. Then newly added context manager and guard adds more lines for debug log so we change the uppper limit from 50 to 55. Test Plan: Removed original tests that raise on different backend and add a new test to test whether the BACKEND_MATCH guard can guard against backend change. Pull Request resolved: https://github.com/pytorch/pytorch/pull/107337 Approved by: https://github.com/jansel	2023-09-07 22:45:54 +00:00
Animesh Jain	34bb74c4cf	[dynamo][finishing colesbury's PR 100642] Guard on nn.Module dicts and type (#108528 ) This PR is a 99% copy paste of Sam Gross (@colesbury) work at https://github.com/pytorch/pytorch/pull/100642. Copied from there -------- The NN_MODULE guard now subsumes guards on Module attributes. The check_fn will fail if the module attributes are changed (such as Module.training), parameters, submodules, and buffers are added or removed, and if fields are changed on the type itself. This gives up specificity in the guard check -- if any field is changed the check_fn fails -- for faster overall checks. ----- Pull Request resolved: https://github.com/pytorch/pytorch/pull/108528 Approved by: https://github.com/ezyang	2023-09-07 01:45:47 +00:00
Animesh Jain	29f1097891	[dynamo] Reduce cache size limit to 8 (#108526 ) As title Pull Request resolved: https://github.com/pytorch/pytorch/pull/108526 Approved by: https://github.com/ezyang	2023-09-05 17:56:26 +00:00
Animesh Jain	9d2ffc5dfa	[reland][Dynamo] cache_size policy #107496 (#108069 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/108069 Approved by: https://github.com/yanboliang	2023-08-28 22:06:54 +00:00
PyTorch MergeBot	b4c6c4da88	Revert "[Dynamo] cache_size policy (#107496 )" This reverts commit `4175a6e944`. Reverted https://github.com/pytorch/pytorch/pull/107496 on behalf of https://github.com/ZainRizvi due to Breaking internal builds ([comment](https://github.com/pytorch/pytorch/pull/107496#issuecomment-1693590121))	2023-08-25 16:07:14 +00:00
Animesh Jain	4175a6e944	[Dynamo] cache_size policy (#107496 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/107496 Approved by: https://github.com/ezyang ghstack dependencies: #107645	2023-08-24 21:50:00 +00:00
lezcano	db39a81e1e	Add a flag that allows breaking on NumPy ops (#107687 ) This was removed in `63d406a6a9` Resotiring, as it's rather useful for debugging. Pull Request resolved: https://github.com/pytorch/pytorch/pull/107687 Approved by: https://github.com/larryliu0820	2023-08-23 01:21:22 +00:00
Yukio Siraichi	d8ad74857c	Run translation validation on tracing error. (#106645 ) This PR wraps `InstructionTranslator` run with a try-catch block so as to run the translation validation (TV) if it ends up raising an error. In this context, we run TV so as to catch simplification errors. These may turn `ShapeEnv.divisible` and `ShapeEnv.replacements` incorrect. For example: #101173 describes a SymPy simplification bug that doesn't reach TV, since it's run only in the end of the tracing. Pull Request resolved: https://github.com/pytorch/pytorch/pull/106645 Approved by: https://github.com/ezyang	2023-08-14 13:43:34 +00:00
lezcano	a9dca53438	NumPy support in torch.compile (#106211 ) RFC: https://github.com/pytorch/rfcs/pull/54 First commit is the contents of https://github.com/Quansight-Labs/numpy_pytorch_interop/ We have already been using this in core for the last few months as a external dependency. This PR pulls all these into core. In the next commits, I do a number of things in this order - Fix a few small issues - Make the tests that this PR adds pass - Bend backwards until lintrunner passes - Remove the optional dependency on `torch_np` and simply rely on the upstreamed code - Fix a number dynamo tests that were passing before (they were not tasting anything I think) and are not passing now. Missing from this PR (but not blocking): - Have a flag that deactivates tracing NumPy functions and simply breaks. There used to be one but after the merge stopped working and I removed it. @lezcano to investigate. - https://github.com/pytorch/pytorch/pull/106431#issuecomment-1667079543. @voznesenskym to submit a fix after we merge. All the tests in `tests/torch_np` take about 75s to run. This was a work by @ev-br, @rgommers @honno and I. I did not create this PR via ghstack (which would have been convenient) as this is a collaboration, and ghstack doesn't allow for shared contributions. Pull Request resolved: https://github.com/pytorch/pytorch/pull/106211 Approved by: https://github.com/ezyang	2023-08-11 00:39:32 +00:00
Thomas Ortner	cc21fa75a3	Enable dynamic shapes of torch.nn.Parameter (#105855 ) This PR adds a new configuration that enables shapes of torch.nn.Parameter to be treated as dynamic in order to avoid extensive recompilation when Paramters are used instead of Tensor. This features addresses part of issue #105279 Pull Request resolved: https://github.com/pytorch/pytorch/pull/105855 Approved by: https://github.com/ezyang	2023-08-08 05:40:01 +00:00
Yanbo Liang	df8abaaf5f	[Dynamo] Revert 'Enable torch._dynamo.config.suppress_errors by default' (#106562 ) D47969512 was the original diff to revert this, but the diff train doesn't work well, so I have to split it into two part: this OSS PR and another separate diff to revert the fbcode change. Pull Request resolved: https://github.com/pytorch/pytorch/pull/106562 Approved by: https://github.com/angelayi	2023-08-04 16:46:21 +00:00
Edward Z. Yang	76163a56c0	Refactor stack handling to always use TracingContext to populate real stack on exception (#106277 ) The basic gist of the PR is simple, but it's accompanied with some careful modifications and unit tests to make sure I got it right. Check inline comments for more details. Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/106277 Approved by: https://github.com/albanD, https://github.com/voznesenskym	2023-08-02 00:09:16 +00:00
Tugsbayasgalan Manlaibaatar	df50f91571	Support fx_pytree in dynamo (#105574 ) This PR does two things: 1. Make dynamo trace through fx_pytree (on top of torch.utils._pytree) so that generated graph modules can be retraced. 2. Fix bug where unflatten not returning dynamo VariableTracker. Differential Revision: [D47734623](https://our.internmc.facebook.com/intern/diff/D47734623) Pull Request resolved: https://github.com/pytorch/pytorch/pull/105574 Approved by: https://github.com/yanboliang, https://github.com/ydwu4	2023-07-29 05:08:15 +00:00
Edward Z. Yang	49e047e0f9	Delete dead summarize_dim_constraints (#106053 ) Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/106053 Approved by: https://github.com/ydwu4	2023-07-27 03:08:24 +00:00
PyTorch MergeBot	050d3de07d	Revert "Correct dynamo logging docs (#105658 )" This reverts commit `f3a261e096`. Reverted https://github.com/pytorch/pytorch/pull/105658 on behalf of https://github.com/PaliC due to breaking docs `f3a261e096` ([comment](https://github.com/pytorch/pytorch/pull/105658#issuecomment-1646310865))	2023-07-21 22:38:28 +00:00
David Radley	f3a261e096	Correct dynamo logging docs (#105658 ) Fixes #105657 Pull Request resolved: https://github.com/pytorch/pytorch/pull/105658 Approved by: https://github.com/zou3519	2023-07-21 21:37:02 +00:00
Yanbo Liang	4c73016ff2	[Dynamo] Enable torch._dynamo.config.suppress_errors by default (#105307 ) Summary: We are working toward full model compilation, where when compilation error happens, we just fall back to eager mode rather than error out. But at the same time, we should fix these issues if they are bugs. We will: * 1/ log warnings in OSS; * 2/ log warnings and write them into Scuba in fbcode; to prevent us from ignoring these issues. Test Plan: Manual test Differential Revision: D47506314 Pull Request resolved: https://github.com/pytorch/pytorch/pull/105307 Approved by: https://github.com/jansel	2023-07-21 19:17:46 +00:00
Yukio Siraichi	0b6de0eb1c	Improve validator module behavior if Z3 is not installed. (#105168 ) Fixes: #105143 In summary, the changes are: - Check if Z3 is installed when the module is loaded - Naming consistently as "translation validation" (not "validator") - Skipping tests if Z3 is not installed Pull Request resolved: https://github.com/pytorch/pytorch/pull/105168 Approved by: https://github.com/ezyang	2023-07-19 13:11:22 +00:00
Aleksandar Samardžić	fc2f87b281	Add semi-structured sparse conversions (#103830 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/103830 Approved by: https://github.com/amjames, https://github.com/jcaip, https://github.com/cpuhrsch	2023-07-13 21:09:09 +00:00
Edward Z. Yang	87cf51cc7f	Switch automatic_dynamic_shapes to True by default in fbcode (#104883 ) Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/104883 Approved by: https://github.com/xw285cornell	2023-07-13 17:37:57 +00:00
PyTorch MergeBot	0e7529940d	Revert "Switch automatic_dynamic_shapes to True by default in fbcode (#104883 )" This reverts commit `d1ca98665f`. Reverted https://github.com/pytorch/pytorch/pull/104883 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally ([comment](https://github.com/pytorch/pytorch/pull/104883#issuecomment-1632931223))	2023-07-12 17:30:18 +00:00
Edward Z. Yang	d1ca98665f	Switch automatic_dynamic_shapes to True by default in fbcode (#104883 ) Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/104883 Approved by: https://github.com/xw285cornell	2023-07-10 20:32:22 +00:00
Yukio Siraichi	85cbe7e6fd	Add timeout for translation validation instances. (#104654 ) As of now, translation validation runs to its completion. However, Z3 is time consuming. PR #104464, for example, disables translation validation for a few benchmarks. Instead, this PR introduces a timeout for translation validation. In that case, Z3 will return `unknown`, since it wasn't able to prove or disprove the assertions. Then, we log it as a warning, but don't stop execution. Here's a summary of the changes: - Added an environment variable for turning translation validation on and off - Added an environment variable for setting the translation validation timeout - Possibly reverts the changes in #104464 - ~~Move from "QF_NRA" to "QF_NIRA" logic~~ - ~~It makes more sense, given the nature of the problems~~ - "QF_NRA" seems to solve more instances of _dynamo/test_dynamic_shapes.py_ Pull Request resolved: https://github.com/pytorch/pytorch/pull/104654 Approved by: https://github.com/ezyang	2023-07-08 19:19:00 +00:00
Edward Z. Yang	2385dad4b3	Enable automatic_dynamic_shapes by default (#103623 ) Some notes: * I now manually turn off `_generate` jobs from running with cudagraphs, as it is unrealistic to expect to cudagraph autoregressive generation up to max sequence length, this would imply compiling the entire unrolled sequence generation. Concretely, cm3leon_generate was timing out post this change, likely due to the compile time slowdown of dynamic shapes ON TOP OF accidentally unrolling all the loops * A few torch._dynamo.reset tactically inserted to force recompiles on tests that expected it * expectedFailureAutomaticDynamic flip into patching automatic_dynamic_shapes=False Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/103623 Approved by: https://github.com/voznesenskym	2023-07-05 00:25:02 +00:00
Yukio Siraichi	d0a72ec5e4	Translation validator for dynamo guards. (#102563 ) This PR introduces a translation validator for dynamo guards. In summary, it verifies whether the guards issued as Python code are sound, w.r.t the initial guards. The main changes in this PR are: - Create an FX graph for dynamic shapes - Translate "the original" guards from the FX graph to Z3 - Check if the guards produced by `produce_guards` are sound w.r.t. the ones from the FX graph gh-stack version of the PR #101146. Pull Request resolved: https://github.com/pytorch/pytorch/pull/102563 Approved by: https://github.com/ezyang	2023-06-28 22:32:53 +00:00
Michael Voznesensky	ec24f1e4cc	Simulate treespec flattening/unflattening (#101896 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/101896 Approved by: https://github.com/jansel, https://github.com/anijain2305	2023-06-23 10:53:15 +00:00
kshitij12345	d552c271db	[pt2] grad support (#102264 ) Teach dynamo about grad Pull Request resolved: https://github.com/pytorch/pytorch/pull/102264 Approved by: https://github.com/zou3519	2023-06-21 10:13:09 +00:00
PyTorch MergeBot	e737a8486f	Revert "[pt2] grad support (#102264 )" This reverts commit `85b83954c8`. Reverted https://github.com/pytorch/pytorch/pull/102264 on behalf of https://github.com/huydhn due to This is failing in trunk `85b83954c8` and looks like a landrace ([comment](https://github.com/pytorch/pytorch/pull/102264#issuecomment-1600001309))	2023-06-21 03:02:55 +00:00
kshitij12345	85b83954c8	[pt2] grad support (#102264 ) Teach dynamo about grad Pull Request resolved: https://github.com/pytorch/pytorch/pull/102264 Approved by: https://github.com/zou3519	2023-06-21 01:37:08 +00:00
Edward Z. Yang	bc6ec97e02	Switch dynamic_shapes to True by default (#103597 ) Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/103597 Approved by: https://github.com/voznesenskym	2023-06-15 15:16:20 +00:00
Edward Z. Yang	ddf4cd69ec	Delete ifdyn and ifunspec combinators (#103596 ) Replaced with expect tests for ease of updating. Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/103596 Approved by: https://github.com/voznesenskym	2023-06-15 00:14:17 +00:00
Michael Lazos	6c6c897d6b	Add graph break logging option instead of config flag (#103202 ) Make graph break logging a logging option vs a config setting Pull Request resolved: https://github.com/pytorch/pytorch/pull/103202 Approved by: https://github.com/yanboliang, https://github.com/anijain2305	2023-06-12 19:52:31 +00:00
Edward Z. Yang	414ec6ce97	Turn off automatic_dynamic_shapes in prep for dynamic-by-default (#103320 ) Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/103320 Approved by: https://github.com/Skylion007	2023-06-10 02:49:59 +00:00
PyTorch MergeBot	f79d2b45fb	Revert "Replace _dynamo.config with an object instead of module (#96455 )" This reverts commit `3864207c2a`. Reverted https://github.com/pytorch/pytorch/pull/96455 on behalf of https://github.com/DanilBaibak due to Break internal build ([comment](https://github.com/pytorch/pytorch/pull/96455#issuecomment-1576162237))	2023-06-05 07:06:14 +00:00
Han Qi	3864207c2a	Replace _dynamo.config with an object instead of module (#96455 ) Summary: Replace _dynamo.config with an object instead of module Current usage patterns of setting and reading fields on config will work unchanged. Only changes needed going forward: 1. import torch._dynamo.config will not work. However, just doing import torch._dynamo is sufficient to access dynamo config as torch._dynamo.config. 2. Files inside of _dynamo folder need to access config via from torch._dynamo.config_util import config instead of from torch._dynamo import config. Because _dynamo/__init__.py imports some of the files so it would be circular import. Test Plan: Reviewers: Subscribers: Tasks: Tags: Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/96455 Approved by: https://github.com/jansel	2023-06-03 23:18:41 +00:00
Michael Voznesensky	4c1bc91f42	Support autograd.Function w/ grad (#99483 ) This PR adds support for tracing autograd.Function with grad. A few important bullet points outlining our approach: 1) Our goal is to verify soundness in order to add a call_function to the autograd.Function's `apply` to the graph. 2) We achieve (1) by either verifying soundness or rejecting soundness, by ensuring that both forward and backward of the autograd.Function are sound. 3) For the forward, if we verify soundness, we install its guards into the graph. 4) For the backward, if we verify soundness, we throw it out. However, backwards soundness verification is more onerous, and has a config driven set of banned attrs and methods for tensors. 1-4 above are achieved by turning the forward and backward into UserDefinedFunctionVariables, and inlining through them, relying on dynamo's soundness detection. If we graph break in these, we raise and treat them as unsound. As noted above, backwards is stricter yet. For the tracing, the safety comes from dynamo's HigherOrderOperator system. That system ensures that not only do we trace soundly, but that no new variables are lifted into inputs during the tracing, and that the forward and backwards are entirely self contained. Whenever we reject a function as unsound, we restore back, as usual. Due to some limitations in the lifting logic, we have an escape hatch we implemented for tensors that are known in forward, but cross into backwards through save_tensors (save) /saved_tensors (load). We escape hatch here to avoid having the known saved tensors coming from forward end up being accidentally treated as lifted variables (and rejected). This is sound, but a little hacky feeling. Additionally, due to some limitations in fx node removal, combined with how we produce subgraphs for the traces installed from HigherOrderOperators, we had to improve our node removal logic. In the event of a restore, we remove the old nodes from the graph, as usual in dynamo. However, because the references to these nodes may exist in subgraphs, we traverse any nodes users and remove them first if and only if they are in another graph. This is always sound, because removal should only be downstream of restoration at this point. Pull Request resolved: https://github.com/pytorch/pytorch/pull/99483 Approved by: https://github.com/zou3519	2023-05-19 01:26:21 +00:00
Animesh Jain	8994d9e610	[dynamo] Hide guard_fail_hook behind a flag to improve cache lookup time (+10% DebertaV2) (#100590 ) For TorchDynamo eager backend, DebertaV2 speedup improves from 0.77x to 0.87x. Pull Request resolved: https://github.com/pytorch/pytorch/pull/100590 Approved by: https://github.com/voznesenskym, https://github.com/wconstab	2023-05-04 18:52:21 +00:00
Will Constable	2dca418112	Reland basic dynamo support for traceable collectives (#100476 ) Relative to the original land, this also contains: - Fix torchdeploy import of functional collectives - Can't import torchdynamo utils due to torch._refs being missing Pull Request resolved: https://github.com/pytorch/pytorch/pull/100476 Approved by: https://github.com/kumpera	2023-05-04 04:25:35 +00:00
Edward Z. Yang	c7e9f40653	Misc accuracy improvements on minifier (#100447 ) The changes: * Add config knob `same_two_models_use_fp64` for toggling whether or not to use fp64 * Add a test showing that RMSE is superior to atol/rtol * Add `--strict-accuracy` options, which allows for testing against integral/boolean accuracy. Regular accuracy by default now ONLY. There's a test which exercises this, it's a little delicate but I had trouble thinking of a good test otherwise. Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/100447 Approved by: https://github.com/voznesenskym	2023-05-04 02:51:26 +00:00

1 2 3 4 5 ...

328 Commits