pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 12:21:27 +01:00

Author	SHA1	Message	Date
Guilherme Leobas	4eaa000acc	Teach dynamo about torch.func.jvp (#119926 ) List of changes: - Replace JVP_NESTING by torch._C._functorch.maybe_current_level() - Remove all increment nesting functions from wrap_fx_proxy_cls - fwAD.make_dual receives the dual_level as keyword argument - Add jvp_increment_nesting, set_fwd_grad_enabled and dual_level context managers to dynamo Pull Request resolved: https://github.com/pytorch/pytorch/pull/119926 Approved by: https://github.com/zou3519	2024-03-22 20:25:47 +00:00
Peter Bell	5790096059	[dynamo] Remove uses of `raise unimplemented` (#122136 ) `unimplemented` is a function that raises an error, so `raise unimplemented(...)` never reaches the `raise`. Another related issue is that `raise unimplemented(...) from e` doesn't attach the exception cause correctly. I fix this by adding a `from_exc` argument to `unimplemented`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/122136 Approved by: https://github.com/lezcano	2024-03-22 19:29:58 +00:00
Brian Hirsh	09be5800c8	dynamo: support placement kwargs for DTensor.to_local() (#119947 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/119947 Approved by: https://github.com/wanchaol, https://github.com/yoyoyocmu ghstack dependencies: #118803	2024-03-22 14:42:27 +00:00
Brian Hirsh	2e44b12dd4	dynamo: handle DTensor.device_mesh.device_type (#118803 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/118803 Approved by: https://github.com/wanchaol, https://github.com/yanboliang	2024-03-22 14:42:22 +00:00
chilli	eca30df846	Added load_args to repro (#121624 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/121624 Approved by: https://github.com/ezyang	2024-03-22 08:32:14 +00:00
Jason Ansel	bb75313f0a	[dynamo] Optimize handling of BINARY_OP (#122465 ) This saves ~0.1s on https://dev-discuss.pytorch.org/t/a-torchdynamo-trace-time-ablation-study/1961 Pull Request resolved: https://github.com/pytorch/pytorch/pull/122465 Approved by: https://github.com/oulgen	2024-03-22 08:14:58 +00:00
Joel Schlosser	cd6bfc7965	Proper view support for jagged layout NestedTensor (#113279 ) This PR: * Introduces an ATen op for creating true jagged views from a dense values buffer * `_nested_view_from_jagged(values, offsets, lengths, ragged_idx, dummy)` * This ops is implemented on the Python side using torch.library so we can return a subclass instance * `jagged_from_list()` now uses this instead of the old autograd.Function `NestedViewFromBuffer` * The latter op is used for non-contiguous JTs returned via `torch.nested.narrow()` * `dummy` is an awful hack to ensure that `NestedTensor.__torch_dispatch__()` is invoked for our view * Introduces an ATen op for accessing the `values` component of an NT via a view * `_nested_get_values(nt)` * Removes the autograd.Functions `ViewNestedFromBuffer` and `ViewBufferFromNested` in favor of `nested_from_values_offsets()` / `nested_from_values_offsets_lengths()` and `nt.values()`, respectively. * Changes test code to prefer `as_nested_tensor()` over `jagged_from_list()` directly * Similarly, avoid `buffer_from_jagged()`, preferring `values()` * Depends on general subclass view fake-ification on the PT2 side (handled solely in previous PRs in the stack) With these changes, the semantics of jagged layout NTs are such that they are considered a true view of the underlying `values` buffer. This means views of jagged NTs are views of the underlying buffer as well, simplifying some handling. Differential Revision: [D54269922](https://our.internmc.facebook.com/intern/diff/D54269922) Co-authored-by: voznesenskym <voznesenskym@gmail.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/113279 Approved by: https://github.com/ezyang	2024-03-22 02:12:36 +00:00
PyTorch MergeBot	224beecee6	Revert "Proper view support for jagged layout NestedTensor (#113279 )" This reverts commit `5855c490f0`. Reverted https://github.com/pytorch/pytorch/pull/113279 on behalf of https://github.com/jbschlosser due to Need to fix BC thing ([comment](https://github.com/pytorch/pytorch/pull/113279#issuecomment-2013899762))	2024-03-21 22:03:01 +00:00
William Wen	23524710e6	[dynamo] use proxies to nn.Module in dynamo generated GraphModules (#120756 ) Fixes remaining refleaks found when debugging https://github.com/pytorch/pytorch/issues/119607, tests added in https://github.com/pytorch/pytorch/pull/120657. Also fixes some tests that xfail: https://github.com/pytorch/pytorch/issues/120631 (not entirely sure why), but introduced tests now fail. Pull Request resolved: https://github.com/pytorch/pytorch/pull/120756 Approved by: https://github.com/jansel	2024-03-21 21:23:12 +00:00
PyTorch MergeBot	968c4c4154	Revert "Refactor gpu trace to be device-agnostic (#121794 )" This reverts commit `74deacbf31`. Reverted https://github.com/pytorch/pytorch/pull/121794 on behalf of https://github.com/huydhn due to Sorry for reverting your change but it breaks ROCm jobs in trunk `74deacbf31`, please help take a look and reland the change ([comment](https://github.com/pytorch/pytorch/pull/121794#issuecomment-2013674083))	2024-03-21 20:33:17 +00:00
Adnan Akhundov	885fb9742d	Handle special kwargs in user-written Triton kernel calls (#122280 ) Summary: Special kwargs like `num_warps`, `num_stages`, and `num_ctas` can be passed to the Triton kernel call as kwargs. These kwargs are handled in a special way, not being passed to the underlying kernel function directly. In this PR, we move those special kwargs from `kwargs` of the `TritonKernelVariable` in dynamo to `Autotuner`'s `Config` instances (either already existing or newly created for this purpose). As a result, the special kwargs can be codegened correctly as a part of `Config`, not as direct arguments to the kernel `.run`. Test Plan: ``` python test/inductor/test_triton_kernels.py -k test_triton_kernel_special_kwargs ... ---------------------------------------------------------------------- Ran 6 tests in 6.783s OK ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/122280 Approved by: https://github.com/oulgen	2024-03-21 03:34:07 +00:00
Yu, Guangye	74deacbf31	Refactor gpu trace to be device-agnostic (#121794 ) # Motivation Refactor gpu trace to be device-agnostic. gpu trace is usually used in runtime components, including Device, Stream, Event, Guard, and Allocator. It should be device-agnostic and can be shared among each device backend. # Solution move `_cuda_trace.py` to `_gpu_trace.py`, which makes each device backend owns their callback, respectively. Pull Request resolved: https://github.com/pytorch/pytorch/pull/121794 Approved by: https://github.com/jgong5, https://github.com/albanD, https://github.com/EikanWang, https://github.com/gujinghui	2024-03-21 01:52:58 +00:00
JackCaoG	e38d60bc07	Remove some stale xla dynamo backend (#122128 ) `torchxla_trace_once ` and `aot_torchxla_trivial ` should be removed. In our internal(hopefully dashboard can be open source soon) torchbench daily runs, `openxla` backend has much higher passing rate and similar perfomrance as the `openxla_eval`(non-aot-auto-grad backend). We still use `openxla_eval` in llama2 example but I think we should move user to `openxla` backend going forward. Pull Request resolved: https://github.com/pytorch/pytorch/pull/122128 Approved by: https://github.com/alanwaketan, https://github.com/jansel	2024-03-21 01:13:50 +00:00
Joel Schlosser	5855c490f0	Proper view support for jagged layout NestedTensor (#113279 ) This PR: * Introduces an ATen op for creating true jagged views from a dense values buffer * `_nested_view_from_jagged(values, offsets, lengths, ragged_idx, dummy)` * This ops is implemented on the Python side using torch.library so we can return a subclass instance * `jagged_from_list()` now uses this instead of the old autograd.Function `NestedViewFromBuffer` * The latter op is used for non-contiguous JTs returned via `torch.nested.narrow()` * `dummy` is an awful hack to ensure that `NestedTensor.__torch_dispatch__()` is invoked for our view * Introduces an ATen op for accessing the `values` component of an NT via a view * `_nested_get_values(nt)` * Removes the autograd.Functions `ViewNestedFromBuffer` and `ViewBufferFromNested` in favor of `nested_from_values_offsets()` / `nested_from_values_offsets_lengths()` and `nt.values()`, respectively. * Changes test code to prefer `as_nested_tensor()` over `jagged_from_list()` directly * Similarly, avoid `buffer_from_jagged()`, preferring `values()` * Depends on general subclass view fake-ification on the PT2 side (handled solely in previous PRs in the stack) With these changes, the semantics of jagged layout NTs are such that they are considered a true view of the underlying `values` buffer. This means views of jagged NTs are views of the underlying buffer as well, simplifying some handling. Differential Revision: [D54269922](https://our.internmc.facebook.com/intern/diff/D54269922) Co-authored-by: voznesenskym <voznesenskym@gmail.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/113279 Approved by: https://github.com/ezyang	2024-03-20 23:45:34 +00:00
ydwu4	61ff41f0ca	[while_loop] disable closure capturing and manually set the inputs. (#122244 ) For while_loop operator, it's important to keep the output ordering consistent with input ordering. Previously, we're using set_graph_inputs="automatic", which doesn't respect such ordering. This PR changes it to "manual" and respects the original user inputs' ordering. We disable closures for body and cond fn as they require some additional designs. This PR is just to prevent the bleeding. Repro: ```python import torch from torch._higher_order_ops.while_loop import while_loop from torch._functorch.aot_autograd import aot_export_module class Nested(torch.nn.Module): def forward(self, ci, cj, a, b): def cond_fn(i1, j1, x1, y1): return i1 > 0 def body_fn(i1, j1, x1, y1): def cond_fn_nested(i2, j2, x2, y2): return j2 > 0 def body_fn_nested(i2, j2, x2, y2): return i2.clone(), j2 - 1, x2 + 3.14, y2 - 2.71 i1, j1, x1, y1 = while_loop( cond_fn_nested, body_fn_nested, [i1, j1, x1, y1] ) return i1 - 1, j1.clone(), x1 * 2, y1 / 2 return while_loop(cond_fn, body_fn, (ci, cj, a, b)) nested = Nested() torch.compile(nested, backend="eager", fullgraph=True)(torch.tensor(2), torch.tensor(2), torch.randn(2, 2), torch.randn(2, 2)) ``` Test plan: add new test. Pull Request resolved: https://github.com/pytorch/pytorch/pull/122244 Approved by: https://github.com/aakhundov	2024-03-20 20:14:35 +00:00
PyTorch MergeBot	0696db8202	Revert "Teach dynamo about torch.func.jvp (#119926 )" This reverts commit `17489784b6`. Reverted https://github.com/pytorch/pytorch/pull/119926 on behalf of https://github.com/peterbell10 due to broken mac jobs on main ([comment](https://github.com/pytorch/pytorch/pull/119926#issuecomment-2010327997))	2024-03-20 18:34:43 +00:00
IvanKobzarev	8a94005d46	[dynamo][runtime_asserts] Ignore failures on sorting sympy relations (#122205 ) Differential Revision: [D55075500](https://our.internmc.facebook.com/intern/diff/D55075500) Pull Request resolved: https://github.com/pytorch/pytorch/pull/122205 Approved by: https://github.com/ezyang	2024-03-20 13:25:37 +00:00
Guilherme Leobas	17489784b6	Teach dynamo about torch.func.jvp (#119926 ) List of changes: - Replace JVP_NESTING by torch._C._functorch.maybe_current_level() - Remove all increment nesting functions from wrap_fx_proxy_cls - fwAD.make_dual receives the dual_level as keyword argument - Add jvp_increment_nesting, set_fwd_grad_enabled and dual_level context managers to dynamo Pull Request resolved: https://github.com/pytorch/pytorch/pull/119926 Approved by: https://github.com/zou3519	2024-03-20 13:09:19 +00:00
Jason Ansel	477d154ffd	[dynamo] Add missing _nonvar_fields annotations (#122219 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/122219 Approved by: https://github.com/anijain2305 ghstack dependencies: #122218	2024-03-20 07:53:18 +00:00
Jason Ansel	46bf37b3f7	[dynamo] Replace VariableTracker.apply with visit/realize_all (#122218 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/122218 Approved by: https://github.com/anijain2305	2024-03-20 07:53:18 +00:00
Jason Ansel	a0db2e4237	[dynamo] Fixed handling of ImportError (#122222 ) Fixes #122088 Pull Request resolved: https://github.com/pytorch/pytorch/pull/122222 Approved by: https://github.com/anijain2305	2024-03-20 07:52:01 +00:00
PyTorch MergeBot	e5e0685f61	Revert "[dynamo] Forward OptimizedModule.__setattr__ to the wrapped module (#122098 )" This reverts commit `88ebdbc97c`. Reverted https://github.com/pytorch/pytorch/pull/122098 on behalf of https://github.com/huydhn due to Sorry for reverting your change but the distributed failure looks legit as it is also failing in trunk `88ebdbc97c` ([comment](https://github.com/pytorch/pytorch/pull/122098#issuecomment-2008483316))	2024-03-20 01:12:24 +00:00
Oguz Ulgen	c0b2e56c8f	Support triton.language.dtype with torch.compile -- Second Attempt (#122141 ) This PR is the second attempt at supporting `triton.language.dtype`, now instead of putting it on the graph, we put it on the side table since it is a constant. Pull Request resolved: https://github.com/pytorch/pytorch/pull/122141 Approved by: https://github.com/jansel ghstack dependencies: #122140	2024-03-19 19:40:52 +00:00
Oguz Ulgen	58a805da71	[UserDefinedTriton] Move constant args out of the fx graph (#122140 ) @ezyang mentioned that we should not put constant args on the graph. Especially when there are args that would be trickier to put on the graph. E.g. next PR needs `triton.language.dtype` as an argument on the graph. Pull Request resolved: https://github.com/pytorch/pytorch/pull/122140 Approved by: https://github.com/jansel	2024-03-19 19:40:52 +00:00
PyTorch MergeBot	36e5c1dcab	Revert "Teach dynamo about torch.func.jvp (#119926 )" This reverts commit `edd04b7c16`. Reverted https://github.com/pytorch/pytorch/pull/119926 on behalf of https://github.com/jeanschmidt due to lots of breakages in pull jobs, checking if reverting this one will help ([comment](https://github.com/pytorch/pytorch/pull/119926#issuecomment-2007915919))	2024-03-19 18:59:46 +00:00
Peter Bell	88ebdbc97c	[dynamo] Forward OptimizedModule.__setattr__ to the wrapped module (#122098 ) Fixes #114844 In the linked issue we have ``` compiled_module = torch.compile(module) compiled_module.x = ... compiled_module(...) # Mutates self.x ``` Where since the module mutates `self.x` you would expect `compiled_module.x` to be updated but actually `compiled_module.x = ...` sets an attribute "x" on the `OptimizedModule` object while the forward method of the module mutates `module.x`. This gives the expected behavior by forwarding `compiled_module.__setattr__` down to `module.__setattr__`. There is already a corresponding `__getattr__` so now `compiled_module.x` becomes an alias for `module.x`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/122098 Approved by: https://github.com/ezyang, https://github.com/lezcano	2024-03-19 16:51:43 +00:00
PyTorch MergeBot	f9ed1c432d	Revert "Refactor gpu trace to be device-agnostic (#121794 )" This reverts commit `0ff1109e26`. Reverted https://github.com/pytorch/pytorch/pull/121794 on behalf of https://github.com/jeanschmidt due to Reverting to see if rocm trunk errors are related ([comment](https://github.com/pytorch/pytorch/pull/121794#issuecomment-2007519408))	2024-03-19 15:40:26 +00:00
Jason Ansel	c05bf0037d	[dynamo] Remove copy_graphstate/restore_graphstate (#122067 ) Some dead code cleanup. Pull Request resolved: https://github.com/pytorch/pytorch/pull/122067 Approved by: https://github.com/oulgen	2024-03-19 15:37:53 +00:00
Guilherme Leobas	edd04b7c16	Teach dynamo about torch.func.jvp (#119926 ) List of changes: - Replace JVP_NESTING by torch._C._functorch.maybe_current_level() - Remove all increment nesting functions from wrap_fx_proxy_cls - fwAD.make_dual receives the dual_level as keyword argument - Add jvp_increment_nesting, set_fwd_grad_enabled and dual_level context managers to dynamo Pull Request resolved: https://github.com/pytorch/pytorch/pull/119926 Approved by: https://github.com/zou3519	2024-03-19 13:06:42 +00:00
Yu, Guangye	0ff1109e26	Refactor gpu trace to be device-agnostic (#121794 ) # Motivation Refactor gpu trace to be device-agnostic. gpu trace is usually used in runtime components, including Device, Stream, Event, Guard, and Allocator. It should be device-agnostic and can be shared among each device backend. # Solution move `_cuda_trace.py` to `_gpu_trace.py`, which makes each device backend owns their callback, respectively. Pull Request resolved: https://github.com/pytorch/pytorch/pull/121794 Approved by: https://github.com/jgong5, https://github.com/albanD, https://github.com/EikanWang, https://github.com/gujinghui	2024-03-19 06:02:28 +00:00
Jason Ansel	5bc7f7f977	[dynamo] Make tx.next_instruction lazy (#122066 ) Improves benchmarks/dynamo/microbenchmarks/dynamo_microbenchmarks.py from 2.5s to 2.4s. Pull Request resolved: https://github.com/pytorch/pytorch/pull/122066 Approved by: https://github.com/oulgen, https://github.com/anijain2305 ghstack dependencies: #122039, #122043, #122055, #122058, #122060, #122063	2024-03-19 04:23:30 +00:00
Jason Ansel	153a01833b	[dynamo] Optimize SourcelessBuilder (#122063 ) Improves `benchmarks/dynamo/microbenchmarks/dynamo_microbenchmarks.py` from 2.7s to 2.5s. Pull Request resolved: https://github.com/pytorch/pytorch/pull/122063 Approved by: https://github.com/anijain2305 ghstack dependencies: #122039, #122043, #122055, #122058, #122060	2024-03-19 04:23:30 +00:00
Jason Ansel	8082adcf65	[dynamo] Only rename a proxy once (#122060 ) Improves `benchmarks/dynamo/microbenchmarks/dynamo_microbenchmarks.py` from 3.9s to 2.7s. Pull Request resolved: https://github.com/pytorch/pytorch/pull/122060 Approved by: https://github.com/oulgen ghstack dependencies: #122039, #122043, #122055, #122058	2024-03-19 04:23:27 +00:00
Jason Ansel	2bec55c5f9	[dynamo] Remove VariableTracker.parents_tracker (#122058 ) This is leftover from mutable variable tracker days and no longer needed. Improves benchmarks/dynamo/microbenchmarks/dynamo_microbenchmarks.py from 4.2s to 3.9s. Pull Request resolved: https://github.com/pytorch/pytorch/pull/122058 Approved by: https://github.com/oulgen, https://github.com/anijain2305 ghstack dependencies: #122039, #122043, #122055	2024-03-19 04:23:24 +00:00
Jason Ansel	3c706bf483	[dynamo] Optimize BuiltinVariable (#122055 ) Improves `benchmarks/dynamo/microbenchmarks/dynamo_microbenchmarks.py` from 5.1s to 4.2s (compared to 2 PRs ago). This works by precomputing (and caching) the parts of `BuiltinVariable.call_function` that don't depend on the values of args/kwargs. Pull Request resolved: https://github.com/pytorch/pytorch/pull/122055 Approved by: https://github.com/oulgen, https://github.com/anijain2305 ghstack dependencies: #122039, #122043	2024-03-19 04:23:20 +00:00
Jason Ansel	07caea5c12	[dynamo] Refactor COMPARE_OP and comparison builtins (#122043 ) This removes the duplicate handling of comparison ops between symbolic_convert and bultin and refactors the handling to use the binop infrastructure. This change regresses overheads a bit, but this is fixed in the next PR. New test skips are variants of `type(e) is np.ndarray` previously falling back to eager. Pull Request resolved: https://github.com/pytorch/pytorch/pull/122043 Approved by: https://github.com/anijain2305 ghstack dependencies: #122039	2024-03-19 04:23:17 +00:00
Jason Ansel	769ff86b91	[dynamo] Optimize COMPARE_OP (#122039 ) Improves `benchmarks/dynamo/microbenchmarks/dynamo_microbenchmarks.py` from 5.6 to 5.1s. Pull Request resolved: https://github.com/pytorch/pytorch/pull/122039 Approved by: https://github.com/Skylion007, https://github.com/anijain2305	2024-03-19 04:23:14 +00:00
Animesh Jain	8860c625ea	[dynamo][guards-cpp-refactor] Integrate cpp guard manager with CheckFnManager (#120726 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/120726 Approved by: https://github.com/jansel	2024-03-19 03:11:31 +00:00
Animesh Jain	f84d560236	[dynamo] Raise accumulated cache size limit (#122130 ) Fixes #114511 This was raised by IBM folks where the a LLM compile was failing because it had more than 64 layers. Pull Request resolved: https://github.com/pytorch/pytorch/pull/122130 Approved by: https://github.com/Chillee, https://github.com/jansel ghstack dependencies: #121954, #122005	2024-03-19 02:35:48 +00:00
Animesh Jain	7084528eb9	[dynamo][model_output] Do not include none for CustomizedDictVariable (#122005 ) Fixes https://github.com/pytorch/pytorch/issues/120923 Pull Request resolved: https://github.com/pytorch/pytorch/pull/122005 Approved by: https://github.com/weifengpy, https://github.com/jansel ghstack dependencies: #121954	2024-03-19 02:35:48 +00:00
Jason Ansel	18d94d7165	Make FX nodes sortable (#122071 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/122071 Approved by: https://github.com/oulgen	2024-03-19 01:40:56 +00:00
andrewor14	773ae817f7	Batch Norm Consolidation (#116092 ) Summary: This commit simplifies the existing decomposition hierarchy of batch norm ops by adding a single, backend agnostic op: `batch_norm_with_update`. The existing hierarchy looks like: ``` aten.batch_norm -> aten._batch_norm_impl_index -> [ aten.native_batch_norm -> aten._native_batch_norm_legit (export only) -> _batch_norm_legit_cpu/cuda (kernels, export only) -> _batch_norm_cpu/cuda (kernels) ] OR [ aten.cudnn_batch_norm ] OR [ aten.miopen_batch_norm ] ``` Aside from complexity, an important problem with the above decomposition hierarchy is cuda numerics in export flows. We observed significantly worse convergence when training a mobilenetv2-like model when using the `_batch_norm_cuda` kernel instead of the `cudnn_batch_norm` kernel. This means users who export their models on CPU first then move the models to cuda later may silently see worse accuracies even when cudnn is installed, because they are using the worse kernel. This issue is summarized in https://github.com/pytorch/pytorch/issues/111384. Instead, the new hierarchy proposed by consolidating existing batch norm ops will look like: ``` aten.batch_norm -> aten.batch_norm_with_update -> [ _batch_norm_cpu (kernel) ] OR [ _batch_norm_cuda (kernel) ] OR [ cudnn_batch_norm (kernel) ] OR [ miopen_batch_norm (kernel) ] ``` The new op `batch_norm_with_update` hides backend implementation details and automatically picks the right kernel based on what is installed. This commit also adds the following variants to this op: ``` batch_norm_with_update_functional batch_norm_with_update.out batch_norm_no_update batch_norm_no_update.out batch_norm_backward ``` Note that this commit only adds this op and its variants, but does not actually change the decomps to produce these ops in the graph. This will be done after the 2 week FC window, and the ops used in the old stack is planned to be removed after the 6 month BC window. Test Plan: `OpInfo` tests for `batch_norm_with_update`. Reviewers: albanD, bdhirsh Subscribers: albanD, bdhirsh, supriyar Tasks: https://github.com/pytorch/pytorch/issues/111384 Differential Revision: [D54805279](https://our.internmc.facebook.com/intern/diff/D54805279) Co-authored-by: Tugsbayasgalan Manlaibaatar <tmanlaibaatar@fb.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/116092 Approved by: https://github.com/bdhirsh, https://github.com/albanD	2024-03-18 21:01:30 +00:00
Oguz Ulgen	7c5e29ae71	Back out "Support `triton.language.dtype` with `torch.compile` (#121690 )" (#122108 ) Summary: Some hard to deal with package import/export related problems. Lets revert and start with clean slate. Test Plan: CI Differential Revision: D55024877 Pull Request resolved: https://github.com/pytorch/pytorch/pull/122108 Approved by: https://github.com/ezyang	2024-03-18 20:50:28 +00:00
James Wu	df1cdaedeb	Log restart reasons and extra compile time in CompilationMetrics (#121827 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/121827 Approved by: https://github.com/ezyang, https://github.com/yanboliang	2024-03-18 18:59:25 +00:00
Jason Ansel	5d52b163d1	[dynamo] Optimize load/store/const op handling (#122038 ) Improves `benchmarks/dynamo/microbenchmarks/dynamo_microbenchmarks.py` from 6.7s to 5.6. Pull Request resolved: https://github.com/pytorch/pytorch/pull/122038 Approved by: https://github.com/Skylion007 ghstack dependencies: #122032, #122033, #122034, #122035	2024-03-18 18:08:06 +00:00
Jason Ansel	4034873a31	[dynamo] Optimize builtin handling (#122035 ) Improves `benchmarks/dynamo/microbenchmarks/dynamo_microbenchmarks.py` from 7.3s to 6.7s. Pull Request resolved: https://github.com/pytorch/pytorch/pull/122035 Approved by: https://github.com/Skylion007 ghstack dependencies: #122032, #122033, #122034	2024-03-18 18:08:06 +00:00
Jason Ansel	6ca0323615	[dynamo] Optimize VariableTracker.__post_init__ (#122034 ) Improves `benchmarks/dynamo/microbenchmarks/dynamo_microbenchmarks.py` from 8.6s to 7.3s. Pull Request resolved: https://github.com/pytorch/pytorch/pull/122034 Approved by: https://github.com/Skylion007 ghstack dependencies: #122032, #122033	2024-03-18 18:08:06 +00:00
Animesh Jain	c568b84794	[dynamo][guards] Move backend match to eval_frame (#121954 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/121954 Approved by: https://github.com/jansel	2024-03-17 06:52:10 +00:00
Jason Ansel	4d92928fe2	[dynamo] Add tests for fake FSDP (#121610 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/121610 Approved by: https://github.com/yanboliang ghstack dependencies: #121735, #120965	2024-03-16 04:29:59 +00:00
Jason Ansel	0b7d9711d4	[dynamo] Add support for nn.Parameter constructor (part 2) (#120965 ) This handles the case where the tensor isn't an input. The changes to dynamo tests are cases where we would previously fall back to eager. Pull Request resolved: https://github.com/pytorch/pytorch/pull/120965 Approved by: https://github.com/yanboliang ghstack dependencies: #121735	2024-03-16 04:29:58 +00:00
Jason Ansel	040b925753	[Compiled Autograd] Reorder accumulate grad nodes (#121735 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/121735 Approved by: https://github.com/xmfan	2024-03-16 04:29:56 +00:00
lezcano	d0d09f5977	Fix torch.compile links (#121824 ) Fixes https://github.com/pytorch/pytorch.github.io/issues/1567 Pull Request resolved: https://github.com/pytorch/pytorch/pull/121824 Approved by: https://github.com/svekars, https://github.com/peterbell10, https://github.com/malfet ghstack dependencies: #121823	2024-03-15 19:49:37 +00:00
lezcano	8a5a377190	Move doc links to point to main (#121823 ) The previous links were pointing to an outdated branch Command: `find . -type f -exec sed -i "s:docs/main:docs/master:g" {} + ` Pull Request resolved: https://github.com/pytorch/pytorch/pull/121823 Approved by: https://github.com/albanD, https://github.com/malfet	2024-03-15 19:49:37 +00:00
Animesh Jain	d04faf4531	[dynamo][compile-time] Remove preserve rng state per op (#121923 ) We already have one globally - `02bb2180f4/torch/_dynamo/convert_frame.py (L477)` I don't think we need per op. Saves ~2 seconds on this benchmark ~~~ def fn(x): for _ in range(10000): x = torch.ops.aten.sin(x) return x ~~~ Pull Request resolved: https://github.com/pytorch/pytorch/pull/121923 Approved by: https://github.com/jansel	2024-03-15 18:24:46 +00:00
Jason Ansel	5a2b4fc8f0	[dynamo] Convert invalid args into graph breaks (#121784 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/121784 Approved by: https://github.com/yanboliang	2024-03-15 06:51:27 +00:00
Animesh Jain	a623666066	[dynamo][compile-time] Make output_graph new_var linear (#121858 ) Fixes https://github.com/pytorch/pytorch/issues/121679 Pull Request resolved: https://github.com/pytorch/pytorch/pull/121858 Approved by: https://github.com/jansel	2024-03-15 03:20:04 +00:00
Jason Ansel	60ccf81490	[dynamo] Refactor update_block_stack into a seperate function (#121810 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/121810 Approved by: https://github.com/williamwen42 ghstack dependencies: #121790	2024-03-15 01:01:05 +00:00
Jason Ansel	1e9a7df8fe	[dynamo] Compile time optimizations in tx.step() (#121790 ) `python benchmarks/dynamo/microbenchmarks/dynamo_microbenchmarks.py` - Before: `symbolic_convert_overhead_stress_test: 10.7s` - After: `symbolic_convert_overhead_stress_test: 8.6s` `tx.step()` is a small part of that benchmark, so likely the speedup in that isolated function is larger than the top line. Pull Request resolved: https://github.com/pytorch/pytorch/pull/121790 Approved by: https://github.com/oulgen	2024-03-15 01:01:05 +00:00
Animesh Jain	92ed8553a6	Revert "Switch cudagraph backend to cudagraph trees (#121019 )" and "Add Cudagraphs disable checking (#121018 )" (#121864 ) This reverts commit `9373ad0bb8`. Revert "Add Cudagraphs disable checking (#121018)" This reverts commit `4af0e634bf`. Causes compilation time increase. Pull Request resolved: https://github.com/pytorch/pytorch/pull/121864 Approved by: https://github.com/eellison	2024-03-15 00:03:09 +00:00
Michael Lazos	15abc56bd5	Graph break on step closure in optimizer (#121777 ) Fixes https://github.com/pytorch/pytorch/issues/116494 Pull Request resolved: https://github.com/pytorch/pytorch/pull/121777 Approved by: https://github.com/yanboliang	2024-03-14 03:18:23 +00:00
PyTorch MergeBot	70c6f542f2	Revert "[dynamo] Convert invalid args into graph breaks (#121784 )" This reverts commit `0df39480f6`. Reverted https://github.com/pytorch/pytorch/pull/121784 on behalf of https://github.com/huydhn due to Sorry for reverting your change but I think it breaks ONNX test in trunk `0c1ac4484d` ([comment](https://github.com/pytorch/pytorch/pull/121784#issuecomment-1995979435))	2024-03-13 22:12:43 +00:00
Boyuan Feng	0c1ac4484d	Support `call_method` in DDPOptimizer (#121771 ) This PR fixes Issue #111279. While #111279 reported the issue with `MultiheadAttention`, a minimal reproduction would be: ```python class ToyModel(nn.Module): def __init__(self,): super().__init__() self.linear = nn.Linear(128, 10) def forward(self, x: torch.Tensor) -> torch.Tensor: return self.linear.forward(x) # Error # return self.linear(x) # OK ``` Dynamo treats `self.linear(x)` as `call_module` while treating `self.linear.forward(x)` as a [`get_attr` and a `call_method`](https://github.com/pytorch/pytorch/blob/main/torch/_dynamo/variables/nn_module.py#L358-L378). However, existing DDPOptimizer assumes, for a `get_attr` node, `getattr(gm, node.target)` gives a tensor with the `requires_grad` attribute. Existing DDPOptimizer also does not support `call_method` nodes. This PR adds support for `call_method` and check on `get_attr`. It also checks if a module's parameters have been added to a bucket to support multiple method calls from the same module. Pull Request resolved: https://github.com/pytorch/pytorch/pull/121771 Approved by: https://github.com/yf225	2024-03-13 20:03:15 +00:00
Jason Ansel	0df39480f6	[dynamo] Convert invalid args into graph breaks (#121784 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/121784 Approved by: https://github.com/yanboliang ghstack dependencies: #121615, #121616	2024-03-13 20:02:33 +00:00
Michael Lazos	05df03ec1b	Allow custom attributes for torch function subclasses (#121693 ) Added custom attribute access with test Pull Request resolved: https://github.com/pytorch/pytorch/pull/121693 Approved by: https://github.com/anijain2305	2024-03-13 17:01:57 +00:00
Jason Ansel	559ca13b3f	[dynamo] Refactor TorchInGraphFunctionVariable for compile time (#121616 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/121616 Approved by: https://github.com/oulgen ghstack dependencies: #121615	2024-03-13 14:21:21 +00:00
Avik Chaudhuri	7fe0cc53e9	make _process_dynamic_shapes an implementation detail (#121713 ) Summary: `_process_dynamic_shapes` converts new dynamic shapes to old constraints, but in the future may not need to do so. Preparing for that future. Test Plan: CI Differential Revision: D54780374 Pull Request resolved: https://github.com/pytorch/pytorch/pull/121713 Approved by: https://github.com/tugsbayasgalan	2024-03-13 08:33:00 +00:00
Jason Ansel	a13dd92d88	[dynamo] Minor compile time optimizations in torch.py (#121615 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/121615 Approved by: https://github.com/oulgen	2024-03-13 05:36:22 +00:00
Yanan Cao	7d05c4c093	Remove error anti-pattern when dealing with dynamic shape output (#121681 ) There are cases where capture_dynamic_output_shape_ops=True and we will still see DynamicOutputShapeException. For example, when an op doesn't have a meta kernel implemented to return the correct dynamic shape output. If we blindly give users instructions to set capture_dynamic_output_shape_ops to True, users would try it and see no change. As witnessed in this issue: https://github.com/pytorch/pytorch/issues/121036#issuecomment-1985221435 Pull Request resolved: https://github.com/pytorch/pytorch/pull/121681 Approved by: https://github.com/tugsbayasgalan	2024-03-13 00:45:23 +00:00
Wenting Wang	02bb2180f4	[torch export] replace traceback.extract_stack with CapturedTraceback.extract (#121449 ) Summary: with a simple bench in TestDeserializer.test_basic function: ``` time_start = time.time() for i in range(1000): self.check_graph(MyModule(), inputs) warnings.warn(f"time_taken: {time.time() - time_start}") ``` and forcing FakeTensorConfig.debug to True, record_stack_traces to True, logging level to debug, it shows that the the changed code is consistently ard 20 secs faster (~90s vs originally ~110s) Test Plan: test passed, see summary compared debug trace before and after: - exactly the same for fake tensor and proxy callsite https://www.internalfb.com/intern/diffing/?paste_number=1189883685 - slightly different for the user frame in proxy node https://www.internalfb.com/intern/diffing/?paste_number=1189884347 Differential Revision: D54237017 Pull Request resolved: https://github.com/pytorch/pytorch/pull/121449 Approved by: https://github.com/angelayi	2024-03-13 00:19:05 +00:00
Oguz Ulgen	79ee6bbde3	Support `triton.language.dtype` with `torch.compile` (#121690 ) Putting this PR as an RFC since I have resorted to some horrible hacks in order to make this work. ``` (Pdb) p triton.language.float32 triton.language.fp32 (Pdb) p str(triton.language.float32) 'fp32' (Pdb) p repr(triton.language.float32) 'triton.language.fp32' ``` This means that we need to "rewrite" them for fx graph and inductor execution. This PR allows Mamba2 to work with `torch.compile`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/121690 Approved by: https://github.com/Skylion007	2024-03-12 23:21:46 +00:00
Animesh Jain	22bb24986d	[dynamo][guards] Use lazy variable tracker for func defaults (#121388 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/121388 Approved by: https://github.com/jansel	2024-03-12 22:48:48 +00:00
PyTorch MergeBot	5b506c8bce	Revert "[dynamo][guards] Use lazy variable tracker for func defaults (#121388 )" This reverts commit `04a5d6e8d3`. Reverted https://github.com/pytorch/pytorch/pull/121388 on behalf of https://github.com/osalpekar due to causing executorch model-test failures internally. See [D54707529](https://www.internalfb.com/diff/D54707529) ([comment](https://github.com/pytorch/pytorch/pull/121388#issuecomment-1992619251))	2024-03-12 21:31:18 +00:00
Shunting Zhang	522d972924	[eazy] add more log when accuracy check fail (#121656 ) Add these log to debug the regress of accuracy test for dm_nfnet_f0 model for training. With these extra log when the accuracy check fail, we can verify if it's close to succeed or not. If yes that indicates there is no real issue but just flaky and we probably can tune the tolerance to fix. Pull Request resolved: https://github.com/pytorch/pytorch/pull/121656 Approved by: https://github.com/jansel, https://github.com/Skylion007	2024-03-12 20:58:20 +00:00
Jason Ansel	3c8c7e2a46	[dynamo] Tweak naming for module hook bw_state (#121609 ) Some minor changes not related to the other PRs in the stack Pull Request resolved: https://github.com/pytorch/pytorch/pull/121609 Approved by: https://github.com/yanboliang	2024-03-12 16:27:56 +00:00
Animesh Jain	78b4793c96	[dynamo][compile-time] Caching VTs to reduce compile-time (#121031 ) Reduces the `torch.compile(backend="eager")` for this code ~~~ def fn(x): for _ in range(10000): # x = torch.sin(x) x = torch.ops.aten.sin(x) # x = sin(x) return x ~~~ From 18 seconds to 12 seconds. Pull Request resolved: https://github.com/pytorch/pytorch/pull/121031 Approved by: https://github.com/jansel	2024-03-12 09:19:50 +00:00
PyTorch MergeBot	fd0dbcd891	Revert "Batch Norm Consolidation (#116092 )" This reverts commit `7b4f70eda5`. Reverted https://github.com/pytorch/pytorch/pull/116092 on behalf of https://github.com/osalpekar due to Causes build failure in //caffe2:aten-hip (AMD build) target. See [D54707318](https://www.internalfb.com/diff/D54707318) for more details, may require internal build system changes to resolve. ([comment](https://github.com/pytorch/pytorch/pull/116092#issuecomment-1989542965))	2024-03-11 22:22:41 +00:00
Jason Ansel	7cc476ea16	[dynamo] Fix support for nn.Parameter constructor (part 1) (#120163 ) This captures calls to `torch.nn.Parameter` by lifting them to graph inputs. Pull Request resolved: https://github.com/pytorch/pytorch/pull/120163 Approved by: https://github.com/albanD, https://github.com/yanboliang ghstack dependencies: #121086	2024-03-11 05:14:42 +00:00
Jason Ansel	32488b0664	[dynamo] Support _unsafe_set_version_counter (#121086 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/121086 Approved by: https://github.com/yanboliang	2024-03-11 05:14:42 +00:00
Ze Sheng	7a4e451184	[Dynamo] Fix function overrides (#120885 ) To check existence of `__torch_function__`, the code intended to iterate each element but got `TupleVariable` when the ordinary `has_torch_function()` was being used. Needs further unpack in this case Fixes #120653 Pull Request resolved: https://github.com/pytorch/pytorch/pull/120885 Approved by: https://github.com/yanboliang	2024-03-11 02:18:43 +00:00
Yifu Wang	71d0202627	[dynamo] support rewriting dist.all_reduce with explicitly specified reduce op (#120181 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/120181 Approved by: https://github.com/wconstab, https://github.com/awgu	2024-03-09 08:28:22 +00:00
eellison	9373ad0bb8	Switch cudagraph backend to cudagraph trees (#121019 ) Switch torch.compile(..., backend="cudagraphs") to use cudagraph trees. Enabled a few test in cudagraph_trees and note that there is another test suite existing for cudagraphs backend: https://github.com/pytorch/pytorch/blob/main/test/dynamo/test_cudagraphs.py. This is basically the inductor cudagraphs without inductor. Pull Request resolved: https://github.com/pytorch/pytorch/pull/121019 Approved by: https://github.com/ezyang, https://github.com/jansel ghstack dependencies: #121017, #121018	2024-03-08 22:56:26 +00:00
eellison	4af0e634bf	Add Cudagraphs disable checking (#121018 ) Adds the same cudagraphs disable checking from inductor - cudagraph trees to cudagraphs backend. Pull Request resolved: https://github.com/pytorch/pytorch/pull/121018 Approved by: https://github.com/ezyang ghstack dependencies: #121017	2024-03-08 22:47:24 +00:00
Boyuan Feng	35d3adb4b0	Add ATen Op _chunk_cat and _chunk_cat.out (#121081 ) # Motivation In backward of per-parameter sharding FSDP, each rank performs reduce scatter to sync gradients across ranks. A rank chunks each gradient tensor into `world_size` slices along the 0-th dimension and concatenate all slices along the 1-th dimension. Gradient tensors will be padded before concatenation when tensor.size(0) % world_size != 0. ### Example 1 Consider `world_size=3` and tensors A (2x4), B (3x3), C (1x2): Input tensors: ``` AAAA BBB CC AAAA BBB BBB ``` Reduce-scatter-copy-in Output: ``` AAAABBBCC AAAABBB00 0000BBB00 ``` ### Example 2 Consider `world_size=2` and tensors A (2x4), B (3x3), C(1x2), D(4x2): Input tensors: ``` AAAA BBB CC DD AAAA BBB 00 DD BBB DD 000 DD ``` Reduce-scatter-copy-in first pad: ``` AAAA BBB CC DD AAAA BBB 00 DD BBB DD 000 DD ``` Then chunk and cat along dim as the output: ``` AAAABBBBBBCCDDDD AAAABBB00000DDDD ``` The performance of reduce-scatter-copy-in is critical to per-parameter sharding FSDP. However, reduce-scatter-copy-in via composing existing ATen ops involves `cat` and irregular `pad`, leading redundant data copies and unsatisfactory performance. # PR We provide aten native support for reduce-scatter-copy-in, namely `_chunk_cat()`: ``` _chunk_cat(Tensor[] tensors, int dim, int num_chunks) -> Tensor ``` This PR includes the registration of `_chunk_cat` and `_chunk_cat.out`, OpInfo tests, and basic implementation composing existing ATen ops. In the next PR, we will add the CUDA implementation. Comparing with baselines of composing existing ATen ops, `_chunk_cat()` CUDA implementation improves copy bandwidth from 498 GB/s to 966 GB/s on a production benchmark. ## Requirements on input 1. If input tensors have different ndims, dim should be non-negative and be less than the ndims of every input tensors. If all input tensors have the same ndims, we support both negative and non-negative dim. 2. For wrapped_dim, all tensors should have the same size for 0,...,wrapped_dim-1 dimensions. No requirements for (wrapped_dim, ...)-th dimension. 3. Expect positive num_chunks 4. Expect non-empty input tensor list and each input tensor should have at least 1 element Pull Request resolved: https://github.com/pytorch/pytorch/pull/121081 Approved by: https://github.com/albanD	2024-03-08 21:48:12 +00:00
albanD	53d5276d69	Improve Dynamo support for torch function and class methods in general (#121365 ) I was originally trying to solve https://github.com/pytorch/pytorch/issues/120799 but got sidetracked along the way. This PR contains a couple fixes. Let me know if you want me to split them up! - Properly handle invalid user code when "super()" is called from non-method/classmethod. It will now properly raise the same error as CPython - Fix base VariableTracker `__str__` method shadowing all `__repr__` methods defined in subclasses - Fix accessing a classmethod on a user object to bind "cls" and not "self" - Fix custom class handling of super() call to properly handle mixed regular/class/static methods Locally , test_repros.py -k test_batch_norm_act still fails where the generated graph module is: ``` Call using an FX-traced Module, line 8 of the traced Module's generated forward function: x = self.forward(l_x_); self = l_x_ = None x_1 = self.L__self___act(x); x = None ``` note that "self" is being unset on the first line even though it is used on the second one. For reference, this is the test `c268ce4a6d/test/dynamo/test_repros.py (L1368-L1369)` I cannot figure out where the generated forward function comes from though, any hint would be welcome! Pull Request resolved: https://github.com/pytorch/pytorch/pull/121365 Approved by: https://github.com/jansel	2024-03-08 20:03:49 +00:00
Elias Ellison	937e89f252	cudagraphs backend refactoring (#121017 ) This is just some refactoring.. no functional changes Pull Request resolved: https://github.com/pytorch/pytorch/pull/121017 Approved by: https://github.com/ezyang	2024-03-08 19:47:41 +00:00
Aaron Gokaslan	d55d803812	Add operator length hint support (#121495 ) Seemed like an easy operator to squeeze into Python 2.3 . Added a simple test. Partially addresses #116396 Pull Request resolved: https://github.com/pytorch/pytorch/pull/121495 Approved by: https://github.com/albanD	2024-03-08 19:08:33 +00:00
andrewor14	7b4f70eda5	Batch Norm Consolidation (#116092 ) Summary: This commit simplifies the existing decomposition hierarchy of batch norm ops by adding a single, backend agnostic op: `batch_norm_with_update`. The existing hierarchy looks like: ``` aten.batch_norm -> aten._batch_norm_impl_index -> [ aten.native_batch_norm -> aten._native_batch_norm_legit (export only) -> _batch_norm_legit_cpu/cuda (kernels, export only) -> _batch_norm_cpu/cuda (kernels) ] OR [ aten.cudnn_batch_norm ] OR [ aten.miopen_batch_norm ] ``` Aside from complexity, an important problem with the above decomposition hierarchy is cuda numerics in export flows. We observed significantly worse convergence when training a mobilenetv2-like model when using the `_batch_norm_cuda` kernel instead of the `cudnn_batch_norm` kernel. This means users who export their models on CPU first then move the models to cuda later may silently see worse accuracies even when cudnn is installed, because they are using the worse kernel. This issue is summarized in https://github.com/pytorch/pytorch/issues/111384. Instead, the new hierarchy proposed by consolidating existing batch norm ops will look like: ``` aten.batch_norm -> aten.batch_norm_with_update -> [ _batch_norm_cpu (kernel) ] OR [ _batch_norm_cuda (kernel) ] OR [ cudnn_batch_norm (kernel) ] OR [ miopen_batch_norm (kernel) ] ``` The new op `batch_norm_with_update` hides backend implementation details and automatically picks the right kernel based on what is installed. This commit also adds the following variants to this op: ``` batch_norm_with_update_functional batch_norm_with_update.out batch_norm_no_update batch_norm_no_update.out batch_norm_backward ``` Note that this commit only adds this op and its variants, but does not actually change the decomps to produce these ops in the graph. This will be done after the 2 week FC window, and the ops used in the old stack is planned to be removed after the 6 month BC window. Test Plan: `OpInfo` tests for `batch_norm_with_update`. Reviewers: albanD, bdhirsh Subscribers: albanD, bdhirsh, supriyar Tasks: https://github.com/pytorch/pytorch/issues/111384 Co-authored-by: Tugsbayasgalan Manlaibaatar <tmanlaibaatar@fb.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/116092 Approved by: https://github.com/bdhirsh, https://github.com/albanD	2024-03-08 15:07:15 +00:00
Shunting Zhang	7dc1ab8989	make dyanmo work with _LazyGraphModule.lazy_forward (#121259 ) Fix https://github.com/pytorch/pytorch/issues/121198 . We previously already trigger the real recompilation for LazyGraphModule when it runs thru dynamo context. But people may pass in LazyGraphModule._lazy_forward rather than the LazyGraphModule instance itself. This PR handles that. Pull Request resolved: https://github.com/pytorch/pytorch/pull/121259 Approved by: https://github.com/williamwen42, https://github.com/jansel	2024-03-08 01:37:39 +00:00
Animesh Jain	04a5d6e8d3	[dynamo][guards] Use lazy variable tracker for func defaults (#121388 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/121388 Approved by: https://github.com/jansel	2024-03-08 01:10:46 +00:00
Ivan Zaitsev	5d8e4126b6	Fixup test_trace_rules (#121351 ) Summary: Fixes https://www.internalfb.com/intern/testinfra/diagnostics/7599824578133672.281475099376195.1709732674/ (for some reason this test didn't run in OSS)? Reached out to Yanbo Liang for additional context: {F1465435684} Test Plan: Local: https://www.internalfb.com/intern/testinfra/testconsole/testrun/16325548673376150/ Differential Revision: D54605075 Pull Request resolved: https://github.com/pytorch/pytorch/pull/121351 Approved by: https://github.com/malfet, https://github.com/yanboliang	2024-03-08 00:50:45 +00:00
Jane Xu	24821fec26	Add RAdam capturable API for forloop (#121260 ) Implementation thanks to @MarouaneMaatouk in https://github.com/pytorch/pytorch/pull/118697, though I've since cleaned it up a lot to save perf on the rect < 5 eager case. It also just looks better now :) Added tests and the cudagraph health check. Pull Request resolved: https://github.com/pytorch/pytorch/pull/121260 Approved by: https://github.com/mlazos	2024-03-08 00:00:30 +00:00
Dheeraj Peri	b1657beac1	feat: Add min, max ranges to mark_dynamic API (#119737 ) Fixes https://github.com/pytorch/pytorch/issues/115137 This PR adds: - mark_dynamic API will accept `min`, `max` values to create a bounded constraint on the dim. - test case in test_misc.py which checks if `ConstraintViolationError` is triggered if `torch.compile` gets a input dimension out of bounds. Co-authored-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/119737 Approved by: https://github.com/ezyang, https://github.com/jansel	2024-03-07 23:26:03 +00:00
William Wen	d14d62b7aa	[dynamo] add more refleak tests (#120657 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/120657 Approved by: https://github.com/jansel	2024-03-07 22:25:43 +00:00
Joel Schlosser	ea8f6e2e54	Subclass view fake-ification via reified ViewFuncs (#118405 ) This PR: * Uses reified ViewFuncs to swap in fake tensors / symbolic SymInts for view replay during subclass view fake-ification * Enables automatic dynamic on view bases -> fakeifies according to the resultant symbolic context instead of the old "all-static" approach * Covers the following view types: * subclass -> dense * dense -> subclass * subclass -> subclass * Dense -> dense views are handled the old way via an `as_strided()` call, as it's likely there is no view func available Differential Revision: [D54269082](https://our.internmc.facebook.com/intern/diff/D54269082) Pull Request resolved: https://github.com/pytorch/pytorch/pull/118405 Approved by: https://github.com/ezyang	2024-03-07 19:56:16 +00:00
IvanKobzarev	9a45001905	[dynamo] relax missing symbols runtime assert (#121339 ) Differential Revision: [D54603361](https://our.internmc.facebook.com/intern/diff/D54603361) Pull Request resolved: https://github.com/pytorch/pytorch/pull/121339 Approved by: https://github.com/ezyang	2024-03-07 14:53:38 +00:00
mingfeima	b3065f6899	add int8 packed gemm support on CPU device (#118056 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/118056 Approved by: https://github.com/mikekgfb	2024-03-07 08:41:43 +00:00
laith sakka	eb4d87f237	graph break on sparse tensors constructions (#120458 ) Fix some tests in https://github.com/pytorch/pytorch/issues/119780 sparse_bsc_tensor is not supported https://github.com/pytorch/pytorch/pull/117907 Also more about the issue here. https://docs.google.com/document/d/1EIb4qG88-SjVFn5TloLERliYdxIu2hwYoAA8skjOVfo/edit Pull Request resolved: https://github.com/pytorch/pytorch/pull/120458 Approved by: https://github.com/ezyang	2024-03-07 02:17:41 +00:00
Yichen Yan	e50ded03a6	Use type check for also `is_not` (#113859 ) Handle `is_not` for: `9647a251cb/torch/_dynamo/variables/builtin.py (L1314-L1317)` I noticed https://github.com/pytorch/pytorch/issues/111713 exists, I think it's no harm to land this first. Pull Request resolved: https://github.com/pytorch/pytorch/pull/113859 Approved by: https://github.com/Skylion007	2024-03-06 23:12:42 +00:00
Yifu Wang	d7a5e59647	[dynamo] support group=None when rewriting collectives (#121043 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/121043 Approved by: https://github.com/awgu	2024-03-06 21:37:19 +00:00
Michael Lazos	c5ef4df274	guard on grads being `None` in compiled optimizers (#121291 ) Fixes #115607 We were missing guards when the grads were set to `None`. So if we compiled the optimizer with grads set to their proper value, and then with the grads set to `None` we'd continuously run the `None` version because all of the guards would pass and it would be ordered before the correct version in the cache. Pull Request resolved: https://github.com/pytorch/pytorch/pull/121291 Approved by: https://github.com/Skylion007, https://github.com/anijain2305	2024-03-06 18:33:23 +00:00
PyTorch MergeBot	b529c19bdf	Revert "Batch Norm Consolidation (#116092 )" This reverts commit `5680f565d5`. Reverted https://github.com/pytorch/pytorch/pull/116092 on behalf of https://github.com/jeffdaily due to broke ROCm, PR signal was clean but trunk was not, the merge should have been blocked but wasn't ([comment](https://github.com/pytorch/pytorch/pull/116092#issuecomment-1981373237))	2024-03-06 17:10:01 +00:00
Guilherme Leobas	54d92f2e37	Add jacrev support in torch.compile (#121146 ) Changes are simple. Moved a few entries on trace_rules.py and included tests to compare the graph generated by jacrev Pull Request resolved: https://github.com/pytorch/pytorch/pull/121146 Approved by: https://github.com/zou3519	2024-03-06 16:05:33 +00:00
Yukio Siraichi	aa0b0944d5	[dynamo] Re-dispatch `torch.Tensor.new` into `torch.Tensor.new_empty` method. (#121075 ) Fix: https://github.com/pytorch/xla/issues/6009 This PR adds another case to `TensorVariable.method_new` special case, where it re-dispatches `new` into `new_empty`. Since we are using fake tensors, the `new` call doesn't actually gets to the corresponding backend (e.g. XLA). So, things like the following might happen: ```python @torch.compile(backend="openxla") def foo(x): new_x = x.new(x.size()) # new_x.device() == "xla" # x.device() == "xla:0" return new_x + x a = torch.arange(10) foo(a.to(xm.xla_device())) ``` Resulting in the following error: ```python Traceback (most recent call last): ... File "torch/_dynamo/utils.py", line 1654, in get_fake_value ret_val = wrap_fake_exception( File "torch/_dynamo/utils.py", line 1190, in wrap_fake_exception return fn() File "torch/_dynamo/utils.py", line 1655, in <lambda> lambda: run_node(tx.output, node, args, kwargs, nnmodule) File "torch/_dynamo/utils.py", line 1776, in run_node raise RuntimeError(make_error_message(e)).with_traceback( File "torch/_dynamo/utils.py", line 1758, in run_node return node.target(args, *kwargs) File "torch/utils/_stats.py", line 20, in wrapper return fn(args, *kwargs) File "torch/_subclasses/fake_tensor.py", line 885, in __torch_dispatch__ return self.dispatch(func, types, args, kwargs) File "torch/_subclasses/fake_tensor.py", line 1224, in dispatch return self._cached_dispatch_impl(func, types, args, kwargs) File "torch/_subclasses/fake_tensor.py", line 955, in _cached_dispatch_impl output = self._dispatch_impl(func, types, args, kwargs) File "torch/_subclasses/fake_tensor.py", line 1445, in _dispatch_impl return self.wrap_meta_outputs_with_default_device_logic( File "torch/_subclasses/fake_tensor.py", line 1575, in wrap_meta_outputs_with_default_device_logic return tree_map(wrap, r) File "torch/utils/_pytree.py", line 900, in tree_map return treespec.unflatten(map(func, flat_args)) File "torch/utils/_pytree.py", line 736, in unflatten leaves = list(leaves) File "torch/_subclasses/fake_tensor.py", line 1550, in wrap ) = FakeTensor._find_common_device(func, flat_args) File "torch/_subclasses/fake_tensor.py", line 625, in _find_common_device merge_devices(arg) File "torch/_subclasses/fake_tensor.py", line 620, in merge_devices raise RuntimeError( torch._dynamo.exc.TorchRuntimeError: Failed running call_function <built-in function add>((FakeTensor(..., device='xla', size=(10,), dtype=torch.int64), FakeTensor(..., device='xla:0', size=(10,), dtype=torch.int64)), *{}): Unhandled FakeTensor Device Propagation for aten.add.Tensor, found two different devices xla, xla:0 ``` Using `new_empty`, instead, fixes this error because it uses the device from the source tensor, instead of inferring from the current dispatch key set. Pull Request resolved: https://github.com/pytorch/pytorch/pull/121075 Approved by: https://github.com/jansel	2024-03-06 11:49:27 +00:00
Tugsbayasgalan Manlaibaatar	5680f565d5	Batch Norm Consolidation (#116092 ) Summary: This commit simplifies the existing decomposition hierarchy of batch norm ops by adding a single, backend agnostic op: `batch_norm_with_update`. The existing hierarchy looks like: ``` aten.batch_norm -> aten._batch_norm_impl_index -> [ aten.native_batch_norm -> aten._native_batch_norm_legit (export only) -> _batch_norm_legit_cpu/cuda (kernels, export only) -> _batch_norm_cpu/cuda (kernels) ] OR [ aten.cudnn_batch_norm ] OR [ aten.miopen_batch_norm ] ``` Aside from complexity, an important problem with the above decomposition hierarchy is cuda numerics in export flows. We observed significantly worse convergence when training a mobilenetv2-like model when using the `_batch_norm_cuda` kernel instead of the `cudnn_batch_norm` kernel. This means users who export their models on CPU first then move the models to cuda later may silently see worse accuracies even when cudnn is installed, because they are using the worse kernel. This issue is summarized in https://github.com/pytorch/pytorch/issues/111384. Instead, the new hierarchy proposed by consolidating existing batch norm ops will look like: ``` aten.batch_norm -> aten.batch_norm_with_update -> [ _batch_norm_cpu (kernel) ] OR [ _batch_norm_cuda (kernel) ] OR [ cudnn_batch_norm (kernel) ] OR [ miopen_batch_norm (kernel) ] ``` The new op `batch_norm_with_update` hides backend implementation details and automatically picks the right kernel based on what is installed. This commit also adds the following variants to this op: ``` batch_norm_with_update_functional batch_norm_with_update.out batch_norm_no_update batch_norm_no_update.out batch_norm_backward ``` Note that this commit only adds this op and its variants, but does not actually change the decomps to produce these ops in the graph. This will be done after the 2 week FC window, and the ops used in the old stack is planned to be removed after the 6 month BC window. Test Plan: `OpInfo` tests for `batch_norm_with_update`. Reviewers: albanD, bdhirsh Subscribers: albanD, bdhirsh, supriyar Tasks: https://github.com/pytorch/pytorch/issues/111384 Co-authored-by: Tugsbayasgalan Manlaibaatar <tmanlaibaatar@fb.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/116092 Approved by: https://github.com/bdhirsh, https://github.com/albanD	2024-03-06 04:50:46 +00:00
Joel Schlosser	dad1b76584	Introduce EphemeralSource for symbols that should be simplified out (#120948 ) Context: view fake-ification should handle closed-over state in ViewFuncs for use in view replay by: * fake-ifying tensors * symbolicizing SymInts This avoids invalid specialization during view replay. However, the symbols / tensors created as intermediates in the view chain should not stick around or be guarded on. This PR introduces an `EphemeralSource` intended to be used as a source for this purpose. It has the following properties: * Considered first to be simplified out in symbol simplification logic * Errors if guarded on Differential Revision: [D54561597](https://our.internmc.facebook.com/intern/diff/D54561597) Pull Request resolved: https://github.com/pytorch/pytorch/pull/120948 Approved by: https://github.com/ezyang	2024-03-06 02:30:52 +00:00
Zhengxu Chen	8aeb247a3d	[export] Remove WrapperModule. (#121042 ) Summary: WrapperModule seems a good idea but may introduce some surprising behavior to users, for example, it never registers enclosed modules as submodules and therefore it's unclear that's the state dict for the exported program should look like, because some people may argue to include every state in state dict but others want to keep them as constants. Test Plan: CI Reviewed By: tugsbayasgalan Differential Revision: D54326331 Pull Request resolved: https://github.com/pytorch/pytorch/pull/121042 Approved by: https://github.com/angelayi	2024-03-05 18:10:22 +00:00
Jason Ansel	35004b8ab4	[dynamo] Fix handling of invalid args (#121110 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/121110 Approved by: https://github.com/yanboliang ghstack dependencies: #121106	2024-03-05 17:16:04 +00:00
Jason Ansel	4f19b5f7ef	[dynamo] Remove extra guard for tensor constant attrs (#121106 ) Also deletes some unused code. Pull Request resolved: https://github.com/pytorch/pytorch/pull/121106 Approved by: https://github.com/yanboliang, https://github.com/anijain2305	2024-03-05 17:16:04 +00:00
Yanbo Liang	46c9d646dd	[Dynamo] Fix inspect.getattr_static doesn't work well for torch.utils._cxx_pytree.PyTreeSpec (#120812 ) Fixes #118793 Pull Request resolved: https://github.com/pytorch/pytorch/pull/120812 Approved by: https://github.com/zou3519	2024-03-05 09:05:26 +00:00
Tugsbayasgalan (Tugsuu) Manlaibaatar	a15c02562a	Fix dynamo failure (#121167 ) Summary: Title Test Plan: CI Differential Revision: D54509198 Pull Request resolved: https://github.com/pytorch/pytorch/pull/121167 Approved by: https://github.com/izaitsevfb	2024-03-05 03:19:59 +00:00
Aaron Gokaslan	9deaa2e812	[BE]: FURB187 Use inplace reverse on lists: faster, more readable. (#121140 ) Use `reverse()` method as it's faster and inplace. Pull Request resolved: https://github.com/pytorch/pytorch/pull/121140 Approved by: https://github.com/albanD	2024-03-05 01:36:17 +00:00
chilli	83c312990f	Add missing newline to repro and some utility thing in repro (#121051 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/121051 Approved by: https://github.com/ezyang, https://github.com/shunting314, https://github.com/eellison	2024-03-04 22:52:54 +00:00
angelayi	4cdc2d7096	[dynamo] Remove expected dynamo test failures (#120836 ) Fixes some of the tests in #120643 Pull Request resolved: https://github.com/pytorch/pytorch/pull/120836 Approved by: https://github.com/zou3519	2024-03-04 20:41:49 +00:00
PyTorch MergeBot	a98c17edc7	Revert "add int8 packed gemm support on CPU device (#118056 )" This reverts commit `f84375ca5d`. Reverted https://github.com/pytorch/pytorch/pull/118056 on behalf of https://github.com/izaitsevfb due to breaks internal builds ([comment](https://github.com/pytorch/pytorch/pull/118056#issuecomment-1977368720))	2024-03-04 20:09:40 +00:00
rzou	3ef0befdc9	Better error messages for impl_abstract_pystub (#120959 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/120959 Approved by: https://github.com/drisspg	2024-03-04 15:24:36 +00:00
Pearu Peterson	ce2903080c	Add sparse compressed fake tensor support (#120920 ) As in the title. Pull Request resolved: https://github.com/pytorch/pytorch/pull/120920 Approved by: https://github.com/ezyang	2024-03-04 14:38:45 +00:00
Animesh Jain	b7f2522692	[dynamo][compile-time] Remove unnecessary tree_map_only (#121052 ) Reduces the torch.compile(backend="eager") for this code by 1-2 seconds. ~~~ def fn(x): for _ in range(10000): # x = torch.sin(x) x = torch.ops.aten.sin(x) # x = sin(x) return x ~~~ Pull Request resolved: https://github.com/pytorch/pytorch/pull/121052 Approved by: https://github.com/jansel ghstack dependencies: #121053	2024-03-03 06:59:43 +00:00
Jason Ansel	0e0a621e0c	[dynamo] Minor refactors (#120966 ) These are changes I pulled out of the above PRs due to not being related. Pull Request resolved: https://github.com/pytorch/pytorch/pull/120966 Approved by: https://github.com/yanboliang	2024-03-03 02:20:48 +00:00
Animesh Jain	8e4301077e	[dynamo][comp-time] BuiltinVariableTracker - inspect signature only on failure (#121053 ) Reduces the torch.compile(backend="eager") for this code by 1-2 seconds. ~~~ def fn(x): for _ in range(10000): # x = torch.sin(x) x = torch.ops.aten.sin(x) # x = sin(x) return x ~~~ Pull Request resolved: https://github.com/pytorch/pytorch/pull/121053 Approved by: https://github.com/jansel	2024-03-02 23:03:00 +00:00
mingfeima	f84375ca5d	add int8 packed gemm support on CPU device (#118056 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/118056 Approved by: https://github.com/mikekgfb ghstack dependencies: #117475	2024-03-02 04:35:49 +00:00
Nikita Shulga	a3b81666b1	[Dynamo] Fix guards for code objects (#120909 ) By comparing them only by id, and raising an assert if someone calls into `EQUALS_MATCH` Which render following example compileable: ```python import torch @torch.compile() def foo(x, y): code = compile(y, "foo", "exec") exec(y) return x print(foo(torch.rand(3), "print('Hello World')")) ``` Fixes https://github.com/pytorch/pytorch/issues/120647 Pull Request resolved: https://github.com/pytorch/pytorch/pull/120909 Approved by: https://github.com/jansel	2024-03-02 02:17:17 +00:00
PyTorch MergeBot	3d7cf8f392	Revert "Limit loop unrolling (#120023 )" This reverts commit `6cc7f9a2e6`. Reverted https://github.com/pytorch/pytorch/pull/120023 on behalf of https://github.com/anijain2305 due to breaks llms export ([comment](https://github.com/pytorch/pytorch/pull/120023#issuecomment-1974104633))	2024-03-02 00:04:08 +00:00
Tugsbayasgalan Manlaibaatar	f01a23d01b	Don't aggressively rewrite asserts for symbolic expressions (#120564 ) Fixes: https://github.com/pytorch/pytorch/issues/118417 Pull Request resolved: https://github.com/pytorch/pytorch/pull/120564 Approved by: https://github.com/ezyang	2024-03-01 17:46:36 +00:00
angelayi	c844b377fa	[dynamo] Reorder logs (#116106 ) Currently when there is a print/warning in the graph, dynamo graph breaks causing export to fail. However export would like to just skip over these print/warning calls: https://github.com/pytorch/pytorch/issues/113792. Additionally there's a torch.compile feature request to "reorder prints" so that instead of graph breaking when hitting prints/logging, we can skip over these prints to create larger compiled graphs, and then print the results out after those compiled graphs: https://github.com/pytorch/pytorch/issues/93739. This PR also adds the `reorderable_logging_functions` config for users to register logging functions to be reordered (like `print` or a custom logging function). Printout of the bytecode after reordering the prints looks like the following: P914736600 There are some limitations to the printing right now: * You can only register logging functions, not methods * Inputs to the logging functions can only be tensors, constants, and format strings * Inputs to the logging functions which will later be mutated in-place will not be printed correctly TODO: Add the following tests * print function with argument of nested data structure; * print function with argument of nested data structure being updated inside of compile region (this would test if we handle side effect correctly); * custom defined logging functions with nn.Module or nn.Module attribute arguments; * custom defined logging functions with submodule input/output as arguments (we need to handle the mapping and fused-out value); * custom defined logging functions with tensor argument and mutation inside of the function (TBD: this may increase memory usage); Pull Request resolved: https://github.com/pytorch/pytorch/pull/116106 Approved by: https://github.com/yanboliang	2024-03-01 17:04:24 +00:00
Aaron Orenstein	8861507ba3	Fix guard for SUPPORTED_NODES (#120798 ) The special-case code for handling SUPPORTED_NODES was producing a guard that looked like: ``` "G['torch'].utils._pytree.SUPPORTED_NODES[<class '__main__.CausalLMOutputWithPast'>].type" ``` resulting in a eval error trying to evaluate the guard. This change adds a new source type (`ClassSource`) which is given a class type (in this case `CausalLMOutputWithPast`) and attempts to fetch it from its defining module. It then uses that to build the `SUPPORTED_NODES` guards instead of referring to the type directly. Also added a unit test which fails before this change and passes after. Pull Request resolved: https://github.com/pytorch/pytorch/pull/120798 Approved by: https://github.com/anijain2305	2024-03-01 16:03:21 +00:00
Tugsbayasgalan Manlaibaatar	c646030cd2	Support higher order op functionalization in predispatch IR (#115314 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/115314 Approved by: https://github.com/bdhirsh	2024-03-01 09:13:47 +00:00
PyTorch MergeBot	63b259492a	Revert "[dynamo] Reorder logs (#116106 )" This reverts commit `c5472628ff`. Reverted https://github.com/pytorch/pytorch/pull/116106 on behalf of https://github.com/clee2000 due to landrace with `342e7929b8`, which removed the import for warnings. Should be an easy fix after rebase `c5472628ff` ([comment](https://github.com/pytorch/pytorch/pull/116106#issuecomment-1972586180))	2024-03-01 06:25:46 +00:00
Angela Yi	c5472628ff	[dynamo] Reorder logs (#116106 ) Currently when there is a print/warning in the graph, dynamo graph breaks causing export to fail. However export would like to just skip over these print/warning calls: https://github.com/pytorch/pytorch/issues/113792. Additionally there's a torch.compile feature request to "reorder prints" so that instead of graph breaking when hitting prints/logging, we can skip over these prints to create larger compiled graphs, and then print the results out after those compiled graphs: https://github.com/pytorch/pytorch/issues/93739. This PR also adds the `reorderable_logging_functions` config for users to register logging functions to be reordered (like `print` or a custom logging function). Printout of the bytecode after reordering the prints looks like the following: P914736600 There are some limitations to the printing right now: * You can only register logging functions, not methods * Inputs to the logging functions can only be tensors, constants, and format strings * Inputs to the logging functions which will later be mutated in-place will not be printed correctly TODO: Add the following tests * print function with argument of nested data structure; * print function with argument of nested data structure being updated inside of compile region (this would test if we handle side effect correctly); * custom defined logging functions with nn.Module or nn.Module attribute arguments; * custom defined logging functions with submodule input/output as arguments (we need to handle the mapping and fused-out value); * custom defined logging functions with tensor argument and mutation inside of the function (TBD: this may increase memory usage); Pull Request resolved: https://github.com/pytorch/pytorch/pull/116106 Approved by: https://github.com/yanboliang	2024-03-01 04:48:44 +00:00
PyTorch MergeBot	65d568680c	Revert "[Dynamo] Fix inspect.getattr_static doesn't work well for torch.utils._cxx_pytree.PyTreeSpec (#120812 )" This reverts commit `1104e0798c`. Reverted https://github.com/pytorch/pytorch/pull/120812 on behalf of https://github.com/huydhn due to Sorry for reverting your change but the XLA failure test_simple_model look legit `1104e0798c` ([comment](https://github.com/pytorch/pytorch/pull/120812#issuecomment-1972460001))	2024-03-01 03:53:27 +00:00
Oguz Ulgen	b35551f357	Ban reset_to_zero argument to triton.autotune in user defined kernels (#120938 ) Fixes #120802 Pull Request resolved: https://github.com/pytorch/pytorch/pull/120938 Approved by: https://github.com/chenyang78, https://github.com/jansel	2024-03-01 02:37:24 +00:00
Jason Ansel	e3dbd194f4	[dynamo] Support module backwards hooks (#120685 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/120685 Approved by: https://github.com/yanboliang, https://github.com/xmfan	2024-03-01 02:24:26 +00:00
Guilherme Leobas	fd35aafc26	Teach dynamo about vjp (#119405 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/119405 Approved by: https://github.com/zou3519 ghstack dependencies: #118407	2024-03-01 00:21:10 +00:00
PyTorch MergeBot	33da8d5c12	Revert "Fix guard for SUPPORTED_NODES (#120798 )" This reverts commit `1b8bb027f6`. Reverted https://github.com/pytorch/pytorch/pull/120798 on behalf of https://github.com/kit1980 due to the new test fails internally, see D54343456 ([comment](https://github.com/pytorch/pytorch/pull/120798#issuecomment-1972134227))	2024-02-29 23:19:22 +00:00
Elias Ellison	d03b11ad5b	Pass inductor strides forward in ddp optimizer (#120523 ) # Note: Returning Fake Tensors on First AOT Autograd Call # # Inductor will optimize strides of outputs when it deems it profitable. # For instance, converting to channels last. When we split the graph here # into multiple inductor compilations, we need to make sure that the # output strides of one compilation is appropriately passed to the subsequent # compilations. However, the mapping from inductor output to dynamo output # is non-trivial due to aot_autograd's deduping, de-aliasing, mutation, re-writing, # subclass handling, etc. In order to replay all this logic we set a flag such that # the first invocation of inductor in aot_autograd will return Fake Tensors with # appropriate strides. Then, all of aot autograd's runtime logic is replayed. # This gives us the appropriately strided outputs here which will reflect runtime strides. Pull Request resolved: https://github.com/pytorch/pytorch/pull/120523 Approved by: https://github.com/yf225, https://github.com/bdhirsh	2024-02-29 22:25:00 +00:00
Matthias Reso	772db2a3ae	Fix handling of torch.return_types in dynamo (#120826 ) Handle quasi-namedtuples as a special case in dynamo Fixes #120651 Pull Request resolved: https://github.com/pytorch/pytorch/pull/120826 Approved by: https://github.com/anijain2305	2024-02-29 22:11:35 +00:00
Avik Chaudhuri	f7a809c96a	fix dupe deprecated warning in dynamo export (#120896 ) Summary: When we convert `dynamic_shapes` to `constraints` and pass them to `_dynamo.export`, we shouldn't give a deprecation warning. Such conversion happens when calling `torch.export.export`, e.g. But it can also happen when calling `capture_pre_autograd_graph` (which itself has this deprecation warning when `constraints` are passed directly as well). Since `_log_export_usage` is an indicator of a top-level call (it is `True` by default but set to `False`, or at least passed through, by callers), we can (ab)use it to indicate when to give this deprecation warning. Test Plan: none Differential Revision: D54350172 Pull Request resolved: https://github.com/pytorch/pytorch/pull/120896 Approved by: https://github.com/BoyuanFeng, https://github.com/zhxchen17	2024-02-29 18:57:42 +00:00
Yanbo Liang	1104e0798c	[Dynamo] Fix inspect.getattr_static doesn't work well for torch.utils._cxx_pytree.PyTreeSpec (#120812 ) Fixes #118793 Pull Request resolved: https://github.com/pytorch/pytorch/pull/120812 Approved by: https://github.com/zou3519	2024-02-29 18:19:14 +00:00
Catherine Lee	9e016debeb	[dynamo] Fix inference_mode context variable (#120830 ) <idk what im doing> Fixes #120646 The module for torch.inference_mode should be torch The input to `create` is a bool (mode?) and `_enter_inference_mode` expects a bool but [BlockStackEntry](`50073248ed/torch/_dynamo/symbolic_convert.py (L206)`) expects `target_values` to be a list? [inference_mode](`50073248ed/torch/autograd/grad_mode.py (L205)`) Pull Request resolved: https://github.com/pytorch/pytorch/pull/120830 Approved by: https://github.com/zou3519, https://github.com/anijain2305, https://github.com/tugsbayasgalan	2024-02-29 17:10:06 +00:00
cpuhrsch	576c0482a5	Remove hard numpy dependency from guards.py (#119519 ) I'm not sure if this is the ideal behavior / best fix for this. Pull Request resolved: https://github.com/pytorch/pytorch/pull/119519 Approved by: https://github.com/albanD	2024-02-29 14:37:33 +00:00
Nikita Shulga	14c5ebc8a1	[Dynamo] Do not attempt to make nditer spawned arrays writable (#120868 ) As they are not, converting `numpy.nditer` to writable is too expensive and tensor values are copied anyway Minimal reproducer: ```python import numpy as np import torch @torch.compile def f(x): return x + 1.0 for x in np.nditer(np.arange(3)): print(f(x)) ``` Fixes https://github.com/pytorch/pytorch/issues/119787 Pull Request resolved: https://github.com/pytorch/pytorch/pull/120868 Approved by: https://github.com/jansel	2024-02-29 07:49:59 +00:00
Yanbo Liang	169c220bf8	[torch.compile] Provide capability to register callback on compile start/stop (#120764 ) This is a requirement from Meta internal cases, where ppl wants to register a callback function to detect if a job is stuck during compilation. Pull Request resolved: https://github.com/pytorch/pytorch/pull/120764 Approved by: https://github.com/jansel	2024-02-29 07:37:52 +00:00
Animesh Jain	66d05a8900	[dynamo] Fix source for default dict default_factory (#120864 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/120864 Approved by: https://github.com/yanboliang, https://github.com/Skylion007, https://github.com/jansel	2024-02-29 07:25:13 +00:00
Oleg Khabinov	4b18ab869f	[torch.export] Support is_compiling() flag for non-strict mode (#119602 ) Summary: In non-strict mode of torch.export() we didn't set those `is_compiling()` to `True` which is needed by some models. Test Plan: Unit tests and manual testing. Differential Revision: D53624452 Pull Request resolved: https://github.com/pytorch/pytorch/pull/119602 Approved by: https://github.com/suo	2024-02-29 05:52:51 +00:00
youkaichao	2c0c70f763	[Dynamo] enumerate imported names for eval_frame.py (#120778 ) Fixes https://github.com/pytorch/pytorch/issues/120699 . Pull Request resolved: https://github.com/pytorch/pytorch/pull/120778 Approved by: https://github.com/Skylion007	2024-02-29 03:08:43 +00:00
PyTorch MergeBot	db1cc781db	Revert "[dynamo] Function => FunctionCtx for placeholder obj (#120577 )" This reverts commit `ee01d0807b`. Reverted https://github.com/pytorch/pytorch/pull/120577 on behalf of https://github.com/jansel due to Causing breakages internally ([comment](https://github.com/pytorch/pytorch/pull/120577#issuecomment-1970254363))	2024-02-29 01:56:09 +00:00
Aaron Orenstein	1b8bb027f6	Fix guard for SUPPORTED_NODES (#120798 ) The special-case code for handling SUPPORTED_NODES was producing a guard that looked like: ``` "G['torch'].utils._pytree.SUPPORTED_NODES[<class '__main__.CausalLMOutputWithPast'>].type" ``` resulting in a eval error trying to evaluate the guard. This change adds a new source type (`ClassSource`) which is given a class type (in this case `CausalLMOutputWithPast`) and attempts to fetch it from its defining module. It then uses that to build the `SUPPORTED_NODES` guards instead of referring to the type directly. Also added a unit test which fails before this change and passes after. Pull Request resolved: https://github.com/pytorch/pytorch/pull/120798 Approved by: https://github.com/anijain2305	2024-02-28 23:34:17 +00:00
Elias Ellison	9c9bde515c	Factor out Submod compilers (#120527 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/120527 Approved by: https://github.com/kadeng	2024-02-28 22:11:47 +00:00
Jason Ansel	01ec8df6d8	[Compiled Autograd] Introduce BackwardState capture (#120382 ) This adds support for backwards hooks that are both: 1) Interior to the graph; and 2) Dynamically generated (e.g. lambdas) We do this by creating a BackwardState object that is used to register the hooks in the forward, then populated by dynamo after the forwards runs. Pull Request resolved: https://github.com/pytorch/pytorch/pull/120382 Approved by: https://github.com/xmfan	2024-02-28 20:36:47 +00:00
Guilherme Leobas	491c2b4665	Let torch dynamo inline torch.func.grad (#118407 ) When dynamo sees torch.func.grad, it tries to inline all frames related to. Pull Request resolved: https://github.com/pytorch/pytorch/pull/118407 Approved by: https://github.com/zou3519	2024-02-28 20:05:00 +00:00
Avik Chaudhuri	5472923998	derived dim (#118729 ) With the current `Dim`-based dynamic shapes API for export, one can express that shapes of different input shapes must be equal by reusing the same `Dim`. However, non-trivial relationships between such input shapes cannot be expressed. Recently we are seeing more and more examples of code that require this additional expressibility, e.g., where a pair of shapes might differ by one, or a shape might be double another (or simply even). This PR introduces the concept of a "derived" `Dim`, i.e., a linear arithmetic expression over a `Dim`. By using a combination of `Dim`s and derived `Dim`s to specify input shapes, the desired relationships can be expressed naturally. E.g., a pair of shapes might be `dim` and `dim + 1`, or `dim` and `2dim`, or even `2dim` and `dim + 1`. We extend the current infrastructure that translates `Dim`s to deprecated `dynamic_dim`-based constraints to work with derived `Dim`s. As usual, we raise constraint violation errors when shape guards cannot be verified given a dynamic shapes spec; suggest fixes; and raise runtime errors when future inputs violate the spec. Importantly, some guards that used to cause forced specializations in the constraint solver because they were deemed "too complex" now do not do so, because they can now be specified as constraints. Since this was what motivated the introduction of a `disable_constraint_solver` flag to some internal APIs, we may not need that flag any more. Note that shapes of placeholders in exported programs can now contain symbolic expressions and not just symbols. Differential Revision: D53254587 Pull Request resolved: https://github.com/pytorch/pytorch/pull/118729 Approved by: https://github.com/ezyang	2024-02-28 19:48:32 +00:00
Yu, Guangye	46e3f670b4	refactor code to share across different devices (#120602 ) # Motivation Refactor utils code to make it possible to share across CUDA, XPU, and other backends. # Solution Move `_dummy_type` and `_LazySeedTracker` to torch._utils; # Additional Context When upstreaming, refactor these code changes by isolating them into in an additional PR to minimize their impact on the CUDA code. Pull Request resolved: https://github.com/pytorch/pytorch/pull/120602 Approved by: https://github.com/albanD, https://github.com/jgong5, https://github.com/gujinghui, https://github.com/EikanWang	2024-02-28 09:42:58 +00:00
Animesh Jain	e9a961f66a	[dynamo][refactor] Use originating_source for HASATTR (#120723 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/120723 Approved by: https://github.com/jansel ghstack dependencies: #120520, #120590, #120721	2024-02-28 05:00:59 +00:00
Animesh Jain	5a53c0ff23	[dynamo][refactor] Rename LIST_LENGTH to SEQUENCE_LENGTH, separate DICT_LENGTH (#120721 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/120721 Approved by: https://github.com/jansel ghstack dependencies: #120520, #120590	2024-02-28 02:19:10 +00:00
Edward Z. Yang	1a1fc1047d	Add structured trace logs (#120289 ) Overall design: https://docs.google.com/document/d/1CX_hJ0PNy9f3R1y8TJrfkSeLkvGjjjLU84BSXgS2AZ8/edit How to read the diff: * Most files are me augmenting pre-existing logging with structured variants. For the most part it's simple (esp FX graphs, which have a canonical string representation); it gets more complicated when I decided to JSON-ify some data structure instead of keeping the ad hoc printing (notably, guards and dynamo output graph sizes) * torch/_functorch/_aot_autograd/collect_metadata_analysis.py is some unrelated fixes I noticed while auditing artifact logs * torch/_logging/_internal.py has the actual trace log implementation. The trace logger is implement as a logger named torch.__trace which is disconnected from the logging hierarchy. It gets its own handler and formatter (TorchLogsFormatter with _is_trace True). `trace_structured` is the main way to emit a trace log. Unusually, there's a separate "metadata" and "payload" field. The metadata field should not be too long (as it is serialized as a single line) and is always JSON (we put contextual things like compile id in it); the payload field can be long and is emitted after the metadata log line and can span multiple lines. * torch/_logging/structured.py contains some helpers for converting Python data structures into JSON form. Notably, we have a string interning implementation here, which helps reduce the cost of serializing filenames into the log. * test/dynamo/test_structured_trace.py the tests are cribbed from test_logging.py, but all rewritten to use expect tests on munged versions of what we'd actually output. Payloads are never tested, since they tend not be very stable. https://github.com/ezyang/tlparse is a POC Rust program that can interpret these logs. Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/120289 Approved by: https://github.com/Skylion007 ghstack dependencies: #120712	2024-02-28 01:01:41 +00:00
Edward Z. Yang	213b3ac3f2	[BE] fail_* variables don't need to be shared across restarts, they're set only once (#120712 ) Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/120712 Approved by: https://github.com/yanboliang	2024-02-28 00:48:11 +00:00
Aaron Orenstein	6cc7f9a2e6	Limit loop unrolling (#120023 ) Tacotron2 causes massive loop unrolling resulting in very large graphs (26k nodes) which was causing inductor (and tracing itself) to choke. The unrolling size is controlled by the environment variable TORCHDYNAMO_MAX_LOOP_UNROLL_NODES which defaults to the arbitrary value 5000. This updates the tacotron2 timings as follows: eager timing: 3m:23s -> 35s aot_eager timing: 4m:12s -> 39s inductor timing: 22m:24s ->1m For reference the big loop in tacotron2 was this one (model.py[405]): ``` while len(mel_outputs) < decoder_inputs.size(0) - 1: decoder_input = decoder_inputs[len(mel_outputs)] mel_output, gate_output, attention_weights = self.decode(decoder_input) mel_outputs += [mel_output.squeeze(1)] gate_outputs += [gate_output.squeeze(1)] alignments += [attention_weights] ``` which gets unrolled and inlined adding about 36 nodes to the graph per iteration. Fixes #98467 Relates to #102839 which hopefully will result in a better fix. Pull Request resolved: https://github.com/pytorch/pytorch/pull/120023 Approved by: https://github.com/yanboliang	2024-02-27 20:44:21 +00:00
PyTorch MergeBot	f3dd2a544c	Revert "Add structured trace logs (#120289 )" This reverts commit `9dfaef962c`. Reverted https://github.com/pytorch/pytorch/pull/120289 on behalf of https://github.com/kit1980 due to breaking internal builds, see D54230697 ([comment](https://github.com/pytorch/pytorch/pull/120289#issuecomment-1967477120))	2024-02-27 19:49:05 +00:00
Jason Ansel	5b5c167adc	[dynamo] Add some helpers to PyCodegen (#120684 ) This are used in later PRs in the stack Pull Request resolved: https://github.com/pytorch/pytorch/pull/120684 Approved by: https://github.com/yanboliang	2024-02-27 18:46:51 +00:00
Kai Londenberg	2ad66e6bf0	Fix test failure: Add torch.cuda._get_device_properties to dynamo trace rules (#120620 ) In this PR stack, there were unrelated test failures within test_trace_rules.py - It turned out that torch.cuda._get_device_properties should be registered in _dynamoc/trace_rules.py. A test failed because it was not. This is a small fix which tries to get rid of the test failure by manually registering that function. Note: I am not sure whether this is the best way to fix this, as I am neither familiar with the trace rules nor with the introduction of torch.cuda._get_device_properties. Pull Request resolved: https://github.com/pytorch/pytorch/pull/120620 Approved by: https://github.com/Skylion007	2024-02-27 10:46:01 +00:00
Animesh Jain	e3d64c4d5d	[dynamo] Desugar accumulate_grad, fix .grad handling (#120590 ) Fixes https://github.com/pytorch/pytorch/issues/118435 Fixes https://github.com/pytorch/pytorch/issues/119906 Pull Request resolved: https://github.com/pytorch/pytorch/pull/120590 Approved by: https://github.com/ezyang, https://github.com/jansel ghstack dependencies: #120520	2024-02-27 10:12:26 +00:00
Animesh Jain	8a59f49da2	[dynamo][compile-time] Collect guard debug stack info only with logs enabled (#120520 ) Reduces backend=eager compile time from 33 to 19 seconds for `MobileBertForQuestionAnswering`. This also helps an internal model where guards.add function is taking 124 seconds. Pull Request resolved: https://github.com/pytorch/pytorch/pull/120520 Approved by: https://github.com/mlazos	2024-02-27 01:51:16 +00:00
PyTorch MergeBot	17560eb472	Revert "[Dynamo] Remove deadcode: unwrapping script_if_tracing (#120444 )" This reverts commit `4d2073bc3f`. Reverted https://github.com/pytorch/pytorch/pull/120444 on behalf of https://github.com/kit1980 due to breaking internal builds, see D54192376 ([comment](https://github.com/pytorch/pytorch/pull/120444#issuecomment-1965600268))	2024-02-27 00:58:00 +00:00
Sergii Dymchenko	d341b66e96	Revert [dynamo] support group=None when rewriting collectives (#12018 ) (#120677 ) This reverts commit `298c686d3f`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/120677 Approved by: https://github.com/yifuwang, https://github.com/huydhn	2024-02-27 00:33:35 +00:00
Edward Z. Yang	9dfaef962c	Add structured trace logs (#120289 ) Overall design: https://docs.google.com/document/d/1CX_hJ0PNy9f3R1y8TJrfkSeLkvGjjjLU84BSXgS2AZ8/edit How to read the diff: * Most files are me augmenting pre-existing logging with structured variants. For the most part it's simple (esp FX graphs, which have a canonical string representation); it gets more complicated when I decided to JSON-ify some data structure instead of keeping the ad hoc printing (notably, guards and dynamo output graph sizes) * torch/_functorch/_aot_autograd/collect_metadata_analysis.py is some unrelated fixes I noticed while auditing artifact logs * torch/_logging/_internal.py has the actual trace log implementation. The trace logger is implement as a logger named torch.__trace which is disconnected from the logging hierarchy. It gets its own handler and formatter (TorchLogsFormatter with _is_trace True). There's a teensy bit of FB specific code to automatically enable trace logging if a /logs directory exists. `trace_structured` is the main way to emit a trace log. Unusually, there's a separate "metadata" and "payload" field. The metadata field should not be too long (as it is serialized as a single line) and is always JSON (we put contextual things like compile id in it); the payload field can be long and is emitted after the metadata log line and can span multiple lines. * torch/_logging/structured.py contains some helpers for converting Python data structures into JSON form. Notably, we have a string interning implementation here, which helps reduce the cost of serializing filenames into the log. * test/dynamo/test_structured_trace.py the tests are cribbed from test_logging.py, but all rewritten to use expect tests on munged versions of what we'd actually output. Payloads are never tested, since they tend not be very stable. https://github.com/ezyang/tlparse is a POC Rust program that can interpret these logs. Testing that the fbcode detection works at https://www.internalfb.com/mlhub/pipelines/runs/fblearner/534553450 (Meta-only) Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/120289 Approved by: https://github.com/Skylion007	2024-02-27 00:04:23 +00:00
Yanbo Liang	5a0a964444	[Dynamo] Fix guards for script_if_tracing or lru_cache fn with default args (#120390 ) Fixes #120387 Pull Request resolved: https://github.com/pytorch/pytorch/pull/120390 Approved by: https://github.com/anijain2305	2024-02-26 19:40:14 +00:00
Edward Z. Yang	fd3cf88f27	Rewrite docs about why we guard on dynamic dims (#120566 ) Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/120566 Approved by: https://github.com/desertfire	2024-02-26 18:58:30 +00:00
angelayi	759204253f	[export] Change runtime asserts to using assert_scalar (#119608 ) By changing runtime symbolic asserts to using assert_scalar, the asserts can call into `expect_true` and modify the shape env so that we can run through the traced graph module with fake tensors. With assert_async, the asserts only get hit during runtime, but that means if we run the graph module with fake tensors, the asserts will not affect the shape env, so later data dependent calls to the fake tensors may result in GuardOnDataDependentSymNode errors. https://github.com/pytorch/pytorch/issues/119587 Pull Request resolved: https://github.com/pytorch/pytorch/pull/119608 Approved by: https://github.com/ezyang	2024-02-26 17:56:12 +00:00
Sam Larsen	2fb32a5f07	Enable fake tensor caching in fbcode by default (#118555 ) Summary: Enabled by default in OSS; this switches the default to "on" in fbcode too. Test Plan: Ran torchbench benchmarks in fbcode Differential Revision: [D53771626](https://our.internmc.facebook.com/intern/diff/D53771626) Pull Request resolved: https://github.com/pytorch/pytorch/pull/118555 Approved by: https://github.com/eellison	2024-02-26 17:35:23 +00:00
Jason Ansel	ee01d0807b	[dynamo] Function => FunctionCtx for placeholder obj (#120577 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/120577 Approved by: https://github.com/yanboliang	2024-02-26 17:16:31 +00:00
angelayi	86063b4d03	Add torch._print to dynamo trace_rules (#120533 ) Fixes #114831 Before: ``` (pytorch10) angelayi@devgpu022 ~/local/pytorch [main] $ python test/dynamo/test_trace_rules.py -k test_torch_name_rule_map_updated F ====================================================================== FAIL: test_torch_name_rule_map_updated (__main__.TraceRuleTests) ---------------------------------------------------------------------- Traceback (most recent call last): File "/data/users/angelayi/pytorch/torch/testing/_internal/common_utils.py", line 2739, in wrapper method(args, *kwargs) File "/data/users/angelayi/pytorch/test/dynamo/test_trace_rules.py", line 328, in test_torch_name_rule_map_updated self._check_set_equality( File "/data/users/angelayi/pytorch/test/dynamo/test_trace_rules.py", line 302, in _check_set_equality self.assertTrue(len(x) == 0, msg1) AssertionError: False is not true : New torch objects: {<built-in method _print of type object at 0x7ff477e40ee0>} were not added to `trace_rules.torch_c_binding_in_graph_functions` or `test_trace_rules.ignored_c_binding_in_graph_function_names`. Refer the instruction in `torch/_dynamo/trace_rules.py` for more details. To execute this test, run the following from the base repo dir: python test/dynamo/test_trace_rules.py -k test_torch_name_rule_map_updated This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 ---------------------------------------------------------------------- Ran 1 test in 0.184s FAILED (failures=1) ``` After change: ``` (pytorch10) angelayi@devgpu022 ~/local/pytorch [main] $ python test/dynamo/test_trace_rules.py -k test_torch_name_rule_map_updated . ---------------------------------------------------------------------- Ran 1 test in 0.209s OK ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/120533 Approved by: https://github.com/clee2000, https://github.com/yanboliang, https://github.com/huydhn, https://github.com/Skylion007	2024-02-26 16:52:59 +00:00
Animesh Jain	c18623b7ed	[dynamo] Reland 120147 - - Use EQUALS_MATCH guard for mod.training (#120578 ) To fix Memory leak discovered in https://github.com/pytorch/pytorch/issues/112090 Pull Request resolved: https://github.com/pytorch/pytorch/pull/120578 Approved by: https://github.com/jansel	2024-02-26 03:49:47 +00:00
Shan19900305	685d862c45	Add SparsePrivateUse1 in backend_to_string, layout_from_backend and check_base_legacy_new. (#119263 ) 1) Using items stored in torch._tensor_classes to check item passed from python side; 2) Add SparsePrivateUse1 in backend_to_string, layout_from_backend and check_base_legacy_new; 3) Using more general API to get python module name in get_storage_obj and get_name functions. Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/119263 Approved by: https://github.com/ezyang	2024-02-26 01:54:30 +00:00
Animesh Jain	834c7a1d3e	[dynamo][refactor] Move some helper functions to global scope (#120426 ) This is to prepare for guard C++ refactor work. Pull Request resolved: https://github.com/pytorch/pytorch/pull/120426 Approved by: https://github.com/ezyang	2024-02-25 04:38:20 +00:00
Yifu Wang	298c686d3f	[dynamo] support group=None when rewriting collectives (#120118 ) Resolves case 2 in #120082. Pull Request resolved: https://github.com/pytorch/pytorch/pull/120118 Approved by: https://github.com/wconstab ghstack dependencies: #120370	2024-02-25 03:12:10 +00:00
Edward Z. Yang	0f20cc1e0e	Don't use size on TensorVariable when doing out resize test (#120567 ) Fixes https://github.com/pytorch/pytorch/issues/120482 Fixes https://github.com/pytorch/pytorch/issues/120511 Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/120567 Approved by: https://github.com/Skylion007	2024-02-25 02:21:34 +00:00
Michael Lazos	56203fc407	Add profiling for backward (#120540 ) Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/120540 Approved by: https://github.com/anijain2305	2024-02-24 16:53:28 +00:00
Isuru Fernando	b7df3bba62	add decomposition for frexp (#119217 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/119217 Approved by: https://github.com/peterbell10 ghstack dependencies: #119284, #120027	2024-02-23 21:52:42 +00:00
Yanbo Liang	4d2073bc3f	[Dynamo] Remove deadcode: unwrapping script_if_tracing (#120444 ) After the consolidated ```trace_rules.lookup```, we already unwrap at `2240018c03/torch/_dynamo/variables/builder.py (L712)` Pull Request resolved: https://github.com/pytorch/pytorch/pull/120444 Approved by: https://github.com/anijain2305	2024-02-23 21:22:09 +00:00
Edward Z. Yang	edf1c4e552	[Dynamo] Handle guard_size_oblivious in user code (#120379 ) Fixes https://github.com/pytorch/pytorch/issues/120083 Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/120379 Approved by: https://github.com/yanboliang	2024-02-23 05:38:57 +00:00
PyTorch MergeBot	722afe6171	Revert "[dynamo] Use EQUALS_MATCH guard for mod.training (#120147 )" This reverts commit `b642a18e80`. Reverted https://github.com/pytorch/pytorch/pull/120147 on behalf of https://github.com/williamwen42 due to memory leak, see https://github.com/pytorch/pytorch/issues/112090 ([comment](https://github.com/pytorch/pytorch/pull/120147#issuecomment-1960522018))	2024-02-22 23:46:55 +00:00
Thiago Crepaldi	3588e7f265	Ignore .numpy() under FakeTensorMode() (#120261 ) Fixes #120259 Pull Request resolved: https://github.com/pytorch/pytorch/pull/120261 Approved by: https://github.com/jansel	2024-02-22 22:49:20 +00:00
Yifu Wang	11e4a9266d	Temporarily support ranks + tag as pg identifier in native funcol (#120226 ) As communicated in https://github.com/pytorch/pytorch/issues/93173#issuecomment-1907095208, although we are dropping `(ranks, tag)` as group identifier in funcols, there will be a grace period for migration. This PR adds temporary `(ranks, tag)` support in native funcols. It also helps us decouple the py funcol -> native funcol transition from the API change. Pull Request resolved: https://github.com/pytorch/pytorch/pull/120226 Approved by: https://github.com/wanchaol, https://github.com/wconstab ghstack dependencies: #120042, #120043, #120070	2024-02-22 20:24:16 +00:00
Yanbo Liang	03f7235caa	[Dynamo] Fix dynamo trace rules (#120371 ) ```test_trace_rules.py``` is still failing due to this. Fixes https://github.com/pytorch/pytorch/issues/114831 (Having this here will run the disabled test on the PR) Pull Request resolved: https://github.com/pytorch/pytorch/pull/120371 Approved by: https://github.com/drisspg, https://github.com/huydhn	2024-02-22 14:32:00 +00:00
Bert Maher	0e4bd25a33	[inductor] When generating debug logs don't fail if nvcc not found (#120346 ) If nvcc isn't found subprocess throws a CalledProcessError Differential Revision: [D54028438](https://our.internmc.facebook.com/intern/diff/D54028438/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/120346 Approved by: https://github.com/Skylion007, https://github.com/shunting314	2024-02-22 14:25:34 +00:00
PyTorch MergeBot	8fa6340701	Revert "Ignore .numpy() under FakeTensorMode() (#120261 )" This reverts commit `952b37145b`. Reverted https://github.com/pytorch/pytorch/pull/120261 on behalf of https://github.com/huydhn due to Sorry for reverting your change but it seems breaking trunk on Python 3.12 `952b37145b` ([comment](https://github.com/pytorch/pytorch/pull/120261#issuecomment-1958267417))	2024-02-21 23:09:27 +00:00
Thiago Crepaldi	952b37145b	Ignore .numpy() under FakeTensorMode() (#120261 ) Fixes #120259 Pull Request resolved: https://github.com/pytorch/pytorch/pull/120261 Approved by: https://github.com/jansel	2024-02-21 22:06:29 +00:00
Edward Z. Yang	9199468401	Properly trace into mark_static (#120232 ) Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/120232 Approved by: https://github.com/yanboliang	2024-02-21 13:51:31 +00:00
drisspg	d6801578c3	Update tracing rules for new cudnn functions (#120268 ) # Summary This updates the trace_rules with the new cudnn torch functions for sdpa To repro: `pytest test/dynamo/test_trace_rules.py -k test_torch_name_rule_map_updated` Pull Request resolved: https://github.com/pytorch/pytorch/pull/120268 Approved by: https://github.com/shuqiangzhang, https://github.com/huydhn, https://github.com/yanboliang	2024-02-21 05:22:44 +00:00
Elias Ellison	96092e1f55	Extend aot_graph_input_parser to sym shapes (#120246 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/120246 Approved by: https://github.com/shunting314	2024-02-20 23:24:45 +00:00
Yifu Wang	2d6c0cc81b	Run test_functional_api.py with both legacy and native funcol impls (#119982 ) Additional changes: tests in test_functional_api.py uses multi-threaded pg which is implemented in Python. For the native ops to call into the Python pg implementation, glue code in PyProcessGroup is required for each collective. This PR also adds a few pieces of previously missing glue code, which are necessary for running test_functional_api.py with native funcol. Pull Request resolved: https://github.com/pytorch/pytorch/pull/119982 Approved by: https://github.com/wanchaol	2024-02-20 21:15:37 +00:00
Yanbo Liang	d42ede8ae4	[torch.compile] Log compilation start time for timeline view (#120220 ) Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/120220 Approved by: https://github.com/angelayi	2024-02-20 21:07:40 +00:00
Brian Hirsh	26fbbc3e84	DTensor + dynamo: fix is_shard/replicate always inlining to False (#118668 ) Fixes an internal enablement bug. When dynamo traces `is_sharded`/`is_replicate`, it would unconditioanlly assume the result was False. Pull Request resolved: https://github.com/pytorch/pytorch/pull/118668 Approved by: https://github.com/wconstab, https://github.com/wanchaol ghstack dependencies: #117667, #117666, #118209, #118191, #118667	2024-02-20 15:23:48 +00:00
Jason Ansel	f2cf0768d1	[dynamo][distributed] handle _rank_not_in_group, _get_or_create_default_group (#119628 ) Copy of #117692 Pull Request resolved: https://github.com/pytorch/pytorch/pull/119628 Approved by: https://github.com/yanboliang	2024-02-18 22:34:35 +00:00
Animesh Jain	b642a18e80	[dynamo] Use EQUALS_MATCH guard for mod.training (#120147 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/120147 Approved by: https://github.com/jansel ghstack dependencies: #120132, #120140, #120145	2024-02-18 00:31:36 +00:00
Animesh Jain	0b11b0edd6	[dynamo][refactor] Use existing helper functions for CLOSURE_MATCH (#120145 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/120145 Approved by: https://github.com/jansel, https://github.com/Fidget-Spinner ghstack dependencies: #120132, #120140	2024-02-18 00:31:36 +00:00
Jason Ansel	2fea475215	[dynamo] Refactor reconstruct() not to return anything (#120150 ) This simplifies things slightly and avoids some bugs. Pull Request resolved: https://github.com/pytorch/pytorch/pull/120150 Approved by: https://github.com/yanboliang	2024-02-17 17:13:41 +00:00
Animesh Jain	757fc663a8	[dynamo][refactor] Use TYPE_MATCH instead of manually constructing guard (#120140 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/120140 Approved by: https://github.com/jansel, https://github.com/yanboliang ghstack dependencies: #120132	2024-02-17 16:03:36 +00:00
Animesh Jain	48d96c08f2	[dynamo][guards] Use EQUALS_MATCH for NAME_MATCH (#120132 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/120132 Approved by: https://github.com/jansel, https://github.com/yanboliang	2024-02-17 16:03:36 +00:00
Shunting Zhang	becfda005e	tiny improvement to the cprofile wrapper (#120100 ) 1. right now we double increment the profile counter. The PR avoid that so we don't end up with profile_0, profile_2, profile_4 ... 2. log the latency to run the passed in function with profiling on so we can easily skip those _compile call which returns quickly. Pull Request resolved: https://github.com/pytorch/pytorch/pull/120100 Approved by: https://github.com/eellison	2024-02-17 02:10:25 +00:00
Menglu Yu	7b1f5c874f	[PT2][Optimus][Observability] Log the optimus graph transformation to the scuba (#119745 ) Summary: Current everstore upload logging may cuase excessive compilation time when the model has lots of graph breaks (post: https://fb.workplace.com/groups/257735836456307/permalink/633533465543207/), we here log the transformation only when the graph changed Test Plan: timeout flows: f528209775 f530084719 Differential Revision: D53692344 Pull Request resolved: https://github.com/pytorch/pytorch/pull/119745 Approved by: https://github.com/jackiexu1992	2024-02-16 21:32:04 +00:00
Yanbo Liang	4f4629d522	[Dynamo] Fix ListIteratorVariable repr to avoid log flooding (#120053 ) This issue was found from Meta internal use case. Before: ``` V0215 18:33:41.761000 140489262883968 torch/_dynamo/symbolic_convert.py:682 [0/0] TRACE starts_line /data/users/ybliang/debug/debug4.py:11 in <listcomp> (f) (inline depth: 1) V0215 18:33:41.761000 140489262883968 torch/_dynamo/symbolic_convert.py:682 [0/0] a = [sum(x) for x in result] V0215 18:33:41.761000 140489262883968 torch/_dynamo/symbolic_convert.py:708 [0/0] TRACE BUILD_LIST 0 [] V0215 18:33:41.761000 140489262883968 torch/_dynamo/symbolic_convert.py:708 [0/0] TRACE LOAD_FAST .0 [ListVariable()] V0215 18:33:41.762000 140489262883968 torch/_dynamo/symbolic_convert.py:708 [0/0] TRACE FOR_ITER 18 [ListVariable(), ListIteratorVariable([LazyVariableTracker(), LazyVariableTracker(), LazyVariableTracker(), LazyVariableTracker(), LazyVariableTracker(), LazyVariableTracker(), LazyVariableTracker(), LazyVariableTracker(), LazyVariableTracker(), LazyVariableTracker()], index=0)] V0215 18:33:41.762000 140489262883968 torch/_dynamo/symbolic_convert.py:708 [0/0] TRACE STORE_FAST x [ListVariable(), ListIteratorVariable([LazyVariableTracker(), LazyVariableTracker(), LazyVariableTracker(), LazyVariableTracker(), LazyVariableTracker(), LazyVariableTracker(), LazyVariableTracker(), LazyVariableTracker(), LazyVariableTracker(), LazyVariableTracker()], index=1), LazyVariableTracker()] V0215 18:33:41.762000 140489262883968 torch/_dynamo/symbolic_convert.py:708 [0/0] TRACE LOAD_GLOBAL sum [ListVariable(), ListIteratorVariable([ListVariable(), LazyVariableTracker(), LazyVariableTracker(), LazyVariableTracker(), LazyVariableTracker(), LazyVariableTracker(), LazyVariableTracker(), LazyVariableTracker(), LazyVariableTracker(), LazyVariableTracker()], index=1)] V0215 18:33:41.763000 140489262883968 torch/_dynamo/symbolic_convert.py:708 [0/0] TRACE LOAD_FAST x [ListVariable(), ListIteratorVariable([ListVariable(), LazyVariableTracker(), LazyVariableTracker(), LazyVariableTracker(), LazyVariableTracker(), LazyVariableTracker(), LazyVariableTracker(), LazyVariableTracker(), LazyVariableTracker(), LazyVariableTracker()], index=1), BuiltinVariable(sum)] V0215 18:33:41.763000 140489262883968 torch/_dynamo/symbolic_convert.py:708 [0/0] TRACE CALL_FUNCTION 1 [ListVariable(), ListIteratorVariable([ListVariable(), LazyVariableTracker(), LazyVariableTracker(), LazyVariableTracker(), LazyVariableTracker(), LazyVariableTracker(), LazyVariableTracker(), LazyVariableTracker(), LazyVariableTracker(), LazyVariableTracker()], index=1), BuiltinVariable(sum), ListVariable()] V0215 18:33:41.764000 140489262883968 torch/_dynamo/symbolic_convert.py:708 [0/0] TRACE LIST_APPEND 2 [ListVariable(), ListIteratorVariable([ListVariable(), LazyVariableTracker(), LazyVariableTracker(), LazyVariableTracker(), LazyVariableTracker(), LazyVariableTracker(), LazyVariableTracker(), LazyVariableTracker(), LazyVariableTracker(), LazyVariableTracker()], index=1), ConstantVariable(int: 50)] V0215 18:33:41.765000 140489262883968 torch/_dynamo/symbolic_convert.py:708 [0/0] TRACE JUMP_ABSOLUTE 4 [ListVariable(), ListIteratorVariable([ListVariable(), LazyVariableTracker(), LazyVariableTracker(), LazyVariableTracker(), LazyVariableTracker(), LazyVariableTracker(), LazyVariableTracker(), LazyVariableTracker(), LazyVariableTracker(), LazyVariableTracker()], index=1)] V0215 18:33:41.765000 140489262883968 torch/_dynamo/symbolic_convert.py:708 [0/0] TRACE FOR_ITER 18 [ListVariable(), ListIteratorVariable([ListVariable(), LazyVariableTracker(), LazyVariableTracker(), LazyVariableTracker(), LazyVariableTracker(), LazyVariableTracker(), LazyVariableTracker(), LazyVariableTracker(), LazyVariableTracker(), LazyVariableTracker()], index=1)] V0215 18:33:41.765000 140489262883968 torch/_dynamo/symbolic_convert.py:708 [0/0] TRACE STORE_FAST x [ListVariable(), ListIteratorVariable([ListVariable(), LazyVariableTracker(), LazyVariableTracker(), LazyVariableTracker(), LazyVariableTracker(), LazyVariableTracker(), LazyVariableTracker(), LazyVariableTracker(), LazyVariableTracker(), LazyVariableTracker()], index=2), LazyVariableTracker()] V0215 18:33:41.765000 140489262883968 torch/_dynamo/symbolic_convert.py:708 [0/0] TRACE LOAD_GLOBAL sum [ListVariable(), ListIteratorVariable([ListVariable(), ListVariable(), LazyVariableTracker(), LazyVariableTracker(), LazyVariableTracker(), LazyVariableTracker(), LazyVariableTracker(), LazyVariableTracker(), LazyVariableTracker(), LazyVariableTracker()], index=2)] V0215 18:33:41.765000 140489262883968 torch/_dynamo/symbolic_convert.py:708 [0/0] TRACE LOAD_FAST x [ListVariable(), ListIteratorVariable([ListVariable(), ListVariable(), LazyVariableTracker(), LazyVariableTracker(), LazyVariableTracker(), LazyVariableTracker(), LazyVariableTracker(), LazyVariableTracker(), LazyVariableTracker(), LazyVariableTracker()], index=2), BuiltinVariable(sum)] V0215 18:33:41.766000 140489262883968 torch/_dynamo/symbolic_convert.py:708 [0/0] TRACE CALL_FUNCTION 1 [ListVariable(), ListIteratorVariable([ListVariable(), ListVariable(), LazyVariableTracker(), LazyVariableTracker(), LazyVariableTracker(), LazyVariableTracker(), LazyVariableTracker(), LazyVariableTracker(), LazyVariableTracker(), LazyVariableTracker()], index=2), BuiltinVariable(sum), ListVariable()] V0215 18:33:41.766000 140489262883968 torch/_dynamo/symbolic_convert.py:708 [0/0] TRACE LIST_APPEND 2 [ListVariable(), ListIteratorVariable([ListVariable(), ListVariable(), LazyVariableTracker(), LazyVariableTracker(), LazyVariableTracker(), LazyVariableTracker(), LazyVariableTracker(), LazyVariableTracker(), LazyVariableTracker(), LazyVariableTracker()], index=2), ConstantVariable(int: 68)] V0215 18:33:41.767000 140489262883968 torch/_dynamo/symbolic_convert.py:708 [0/0] TRACE JUMP_ABSOLUTE 4 [ListVariable(), ListIteratorVariable([ListVariable(), ListVariable(), LazyVariableTracker(), LazyVariableTracker(), LazyVariableTracker(), LazyVariableTracker(), LazyVariableTracker(), LazyVariableTracker(), LazyVariableTracker(), LazyVariableTracker()], index=2)] ``` After: ``` V0215 18:27:57.901000 140556649206912 torch/_dynamo/symbolic_convert.py:682 [0/0] TRACE starts_line /data/users/ybliang/debug/debug4.py:11 in <listcomp> (f) (inline depth: 1) V0215 18:27:57.901000 140556649206912 torch/_dynamo/symbolic_convert.py:682 [0/0] a = [sum(x) for x in result] V0215 18:27:57.901000 140556649206912 torch/_dynamo/symbolic_convert.py:708 [0/0] TRACE BUILD_LIST 0 [] V0215 18:27:57.901000 140556649206912 torch/_dynamo/symbolic_convert.py:708 [0/0] TRACE LOAD_FAST .0 [ListVariable()] V0215 18:27:57.901000 140556649206912 torch/_dynamo/symbolic_convert.py:708 [0/0] TRACE FOR_ITER 18 [ListVariable(), ListIteratorVariable(length=10, index=0)] V0215 18:27:57.901000 140556649206912 torch/_dynamo/symbolic_convert.py:708 [0/0] TRACE STORE_FAST x [ListVariable(), ListIteratorVariable(length=10, index=1), LazyVariableTracker()] V0215 18:27:57.902000 140556649206912 torch/_dynamo/symbolic_convert.py:708 [0/0] TRACE LOAD_GLOBAL sum [ListVariable(), ListIteratorVariable(length=10, index=1)] V0215 18:27:57.902000 140556649206912 torch/_dynamo/symbolic_convert.py:708 [0/0] TRACE LOAD_FAST x [ListVariable(), ListIteratorVariable(length=10, index=1), BuiltinVariable(sum)] V0215 18:27:57.903000 140556649206912 torch/_dynamo/symbolic_convert.py:708 [0/0] TRACE CALL_FUNCTION 1 [ListVariable(), ListIteratorVariable(length=10, index=1), BuiltinVariable(sum), ListVariable()] V0215 18:27:57.903000 140556649206912 torch/_dynamo/symbolic_convert.py:708 [0/0] TRACE LIST_APPEND 2 [ListVariable(), ListIteratorVariable(length=10, index=1), ConstantVariable(int: 55)] V0215 18:27:57.904000 140556649206912 torch/_dynamo/symbolic_convert.py:708 [0/0] TRACE JUMP_ABSOLUTE 4 [ListVariable(), ListIteratorVariable(length=10, index=1)] V0215 18:27:57.904000 140556649206912 torch/_dynamo/symbolic_convert.py:708 [0/0] TRACE FOR_ITER 18 [ListVariable(), ListIteratorVariable(length=10, index=1)] V0215 18:27:57.904000 140556649206912 torch/_dynamo/symbolic_convert.py:708 [0/0] TRACE STORE_FAST x [ListVariable(), ListIteratorVariable(length=10, index=2), LazyVariableTracker()] V0215 18:27:57.904000 140556649206912 torch/_dynamo/symbolic_convert.py:708 [0/0] TRACE LOAD_GLOBAL sum [ListVariable(), ListIteratorVariable(length=10, index=2)] V0215 18:27:57.904000 140556649206912 torch/_dynamo/symbolic_convert.py:708 [0/0] TRACE LOAD_FAST x [ListVariable(), ListIteratorVariable(length=10, index=2), BuiltinVariable(sum)] V0215 18:27:57.904000 140556649206912 torch/_dynamo/symbolic_convert.py:708 [0/0] TRACE CALL_FUNCTION 1 [ListVariable(), ListIteratorVariable(length=10, index=2), BuiltinVariable(sum), ListVariable()] V0215 18:27:57.905000 140556649206912 torch/_dynamo/symbolic_convert.py:708 [0/0] TRACE LIST_APPEND 2 [ListVariable(), ListIteratorVariable(length=10, index=2), ConstantVariable(int: 64)] V0215 18:27:57.905000 140556649206912 torch/_dynamo/symbolic_convert.py:708 [0/0] TRACE JUMP_ABSOLUTE 4 [ListVariable(), ListIteratorVariable(length=10, index=2)] V0215 18:27:57.905000 140556649206912 torch/_dynamo/symbolic_convert.py:708 [0/0] TRACE FOR_ITER 18 [ListVariable(), ListIteratorVariable(length=10, index=2)] V0215 18:27:57.905000 140556649206912 torch/_dynamo/symbolic_convert.py:708 [0/0] TRACE STORE_FAST x [ListVariable(), ListIteratorVariable(length=10, index=3), LazyVariableTracker()] V0215 18:27:57.906000 140556649206912 torch/_dynamo/symbolic_convert.py:708 [0/0] TRACE LOAD_GLOBAL sum [ListVariable(), ListIteratorVariable(length=10, index=3)] V0215 18:27:57.906000 140556649206912 torch/_dynamo/symbolic_convert.py:708 [0/0] TRACE LOAD_FAST x [ListVariable(), ListIteratorVariable(length=10, index=3), BuiltinVariable(sum)] V0215 18:27:57.906000 140556649206912 torch/_dynamo/symbolic_convert.py:708 [0/0] TRACE CALL_FUNCTION 1 [ListVariable(), ListIteratorVariable(length=10, index=3), BuiltinVariable(sum), ListVariable()] V0215 18:27:57.907000 140556649206912 torch/_dynamo/symbolic_convert.py:708 [0/0] TRACE LIST_APPEND 2 [ListVariable(), ListIteratorVariable(length=10, index=3), ConstantVariable(int: 56)] ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/120053 Approved by: https://github.com/williamwen42	2024-02-16 21:19:37 +00:00
Brian Hirsh	26343451be	DTensor: make tensor_flatten more compatible for dynamo getattr (#118209 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/118209 Approved by: https://github.com/ezyang, https://github.com/wanchaol ghstack dependencies: #117667, #117666	2024-02-16 21:16:07 +00:00
Brian Hirsh	ee7bcf23db	dynamo: support attribute access on tensor subclasses without sources (#117666 ) Fixes https://github.com/pytorch/pytorch/issues/117596 This was needed for Float8Tensor. Before this PR, dynamo would sometimes handle attribute access on tensor subclasses correctly, but it would choke on tensor subclasses with no source (it would fall back to using a `GetAttrVariable` to represent the attribute access, which is a problem if the attribute is a tensor that we later want to call tensor methods on). I supported two cases: (1) the attribute is a tensor, which is part of the `attrs` returned by the subclass's `__tensor_flatten__`. This creates a `TensorVariable` (2) the attribute is a constant, which is part of the constant metadata returned by `__tensor_flatten__`. As per the contract of tensor_flatten, this should be a `ConstantVariable`. It could be possible that we allow non-constant metadata in the future, but we don't support that today. Pull Request resolved: https://github.com/pytorch/pytorch/pull/117666 Approved by: https://github.com/zou3519 ghstack dependencies: #117667	2024-02-16 21:16:07 +00:00
Brian Hirsh	67f6aca0d0	dynamo: respect autograd.Function + multiple save_for_backward calls (#117667 ) Fixes https://github.com/pytorch/pytorch/issues/117652. Corner case that I hit debugging some Float8 issues. Pull Request resolved: https://github.com/pytorch/pytorch/pull/117667 Approved by: https://github.com/ezyang, https://github.com/zou3519	2024-02-16 21:16:07 +00:00
soulitzer	312ce35c1f	Rename singleton int to nested int (#119661 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/119661 Approved by: https://github.com/ezyang	2024-02-16 19:21:17 +00:00
laith sakka	3693d8f467	Do to convert UnsupportedFakeTensorException into RuntimeError in runNode for proper graph breaking. (#120026 ) Fix: https://github.com/pytorch/pytorch/issues/119779 by properly graph breaking a proper fix is to handle quantized tensors for full complete solution. if when generating a fake tensor, UnsupportedFakeTensorException is thrown, then its handled and converted into a Unimplemented in inside wrap_fake_exception which is then translated to a graph break. However run_node used to convert UnsupportedFakeTensorException into a runtime error, creating runtime errors instead of graph breaks whenever generating a fake tensor for a quantized tensor fails. Pull Request resolved: https://github.com/pytorch/pytorch/pull/120026 Approved by: https://github.com/jansel	2024-02-16 09:21:58 +00:00
Oguz Ulgen	62e5840b36	[Dynamo] Do not create TorchInGraphFunctionVariable for tags (#120005 ) Fixes https://github.com/pytorch/pytorch/issues/119793 Pull Request resolved: https://github.com/pytorch/pytorch/pull/120005 Approved by: https://github.com/yanboliang	2024-02-16 03:37:32 +00:00
Yanbo Liang	2a63dd8889	[Dynamo] Support lazy module with namedtuple/dict input (#119972 ) Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/119972 Approved by: https://github.com/jansel	2024-02-15 23:18:18 +00:00
PyTorch MergeBot	47300221c2	Revert "[export] Change runtime asserts to using assert_scalar (#119608 )" This reverts commit `f4d641ba2f`. Reverted https://github.com/pytorch/pytorch/pull/119608 on behalf of https://github.com/huydhn due to This break ONNX trunk job `65fd8b6730` ([comment](https://github.com/pytorch/pytorch/pull/119608#issuecomment-1947436402))	2024-02-15 22:25:24 +00:00
laith sakka	d707e3c9c6	Fix handling none source in build_torch_function_fn (#119724 ) Fix https://github.com/pytorch/pytorch/issues/119580 When a UserDefinedObjectVariable is created it does not always have a source, i.e: when its an intermediate This diff fix two handling of none source in two locations during an inlining of a user torch function. Pull Request resolved: https://github.com/pytorch/pytorch/pull/119724 Approved by: https://github.com/jansel, https://github.com/mlazos, https://github.com/anijain2305	2024-02-15 21:21:47 +00:00
angelayi	f4d641ba2f	[export] Change runtime asserts to using assert_scalar (#119608 ) By changing runtime symbolic asserts to using assert_scalar, the asserts can call into `expect_true` and modify the shape env so that we can run through the traced graph module with fake tensors. With assert_async, the asserts only get hit during runtime, but that means if we run the graph module with fake tensors, the asserts will not affect the shape env, so later data dependent calls to the fake tensors may result in GuardOnDataDependentSymNode errors. https://github.com/pytorch/pytorch/issues/119587 Pull Request resolved: https://github.com/pytorch/pytorch/pull/119608 Approved by: https://github.com/ezyang	2024-02-15 07:13:42 +00:00
Yifu Wang	cd08dc37f8	Support tracing native functional collective via python APIs (#119103 ) Summary: - Inlined `torch.distributed.distributed_c10d._get_group_size_by_name` - Updated all torch.compile tests in test_c10d_functional_native.py to use funcol python APIs (as opposed to the dispatcher ops) Pull Request resolved: https://github.com/pytorch/pytorch/pull/119103 Approved by: https://github.com/wconstab, https://github.com/fegin, https://github.com/wanchaol	2024-02-15 03:33:49 +00:00
Yanbo Liang	7f5b87c953	[torch.compile] Log more compilation time breakdown (#119865 ) Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/119865 Approved by: https://github.com/ezyang	2024-02-15 02:20:07 +00:00
Alexander Grund	99cb807e25	Skip test_wrap_bad if run under pytest (#115070 ) Pytest replaces sys.stdout/stderr by `TextIOWrapper` instances which do not support `fileno()` Hence skip that test in this case Fixes #115069 Pull Request resolved: https://github.com/pytorch/pytorch/pull/115070 Approved by: https://github.com/clee2000	2024-02-15 00:10:05 +00:00
Zhengxu Chen	8f27fde2f5	[export] Log private api uses. (#119848 ) Summary: as title. The following APIs are logged: - capture_preautograd_graph - torch._export.aot_compile - external usage of _export_to_torch_ir (AOTInductor, Pippy) - constraints API - public use of torch._dynamo.export Test Plan: CI Differential Revision: D53735599 Pull Request resolved: https://github.com/pytorch/pytorch/pull/119848 Approved by: https://github.com/suo	2024-02-14 22:58:23 +00:00
gs-olive	e0f6fa6a7c	Windows Dynamo Error Removal CI Check (#115969 ) Rebase of #111313 onto `main`, for CI validation Co-authored-by: Stella Laurenzo <stellaraccident@gmail.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/115969 Approved by: https://github.com/PaliC, https://github.com/thiagocrepaldi	2024-02-14 21:14:36 +00:00
Aaron Orenstein	2aad3f93f8	Fix guards for field access through properties (#119719 ) When building guards which went through a property we were analyzing the property using getattr_static but the guard wasn't built using getattr_static so if the property was "unusual" it generated misbehaved code which referenced a non-existent `__closure__` field. Fixes #118786 Note that after this change some of the referenced tests are still failing with a different error - but getting further. Pull Request resolved: https://github.com/pytorch/pytorch/pull/119719 Approved by: https://github.com/oulgen	2024-02-14 20:42:55 +00:00
laith sakka	edd9ddf73f	Propagate allow_non_graph_fake between get_fake_values_from_nodes and get_fake_values (#119731 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/119731 Approved by: https://github.com/jansel, https://github.com/anijain2305 ghstack dependencies: #119314, #119435	2024-02-14 15:26:17 +00:00
Yanbo Liang	54a30f6d4e	[Dynamo] Update trace_rules.py and re-enable skipped tests (#119860 ) Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/119860 Approved by: https://github.com/angelayi	2024-02-14 05:22:55 +00:00
Animesh Jain	80379ef0aa	[dynamo-must-fix] Use ID_MATCH for UserDefinedClass (#119853 ) Fixes https://github.com/pytorch/pytorch/issues/119715 Pull Request resolved: https://github.com/pytorch/pytorch/pull/119853 Approved by: https://github.com/jansel	2024-02-14 03:14:42 +00:00
Jason Ansel	75a6d6aef7	[inductor] Support storage resizing (#119749 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/119749 Approved by: https://github.com/yf225 ghstack dependencies: #119647, #119671	2024-02-14 03:03:38 +00:00
PyTorch MergeBot	4a5b2cd6cb	Revert "Windows Dynamo Error Removal CI Check (#115969 )" This reverts commit `45e7af5818`. Reverted https://github.com/pytorch/pytorch/pull/115969 on behalf of https://github.com/PaliC due to this pr ended up breaking some of our periodic tests ([comment](https://github.com/pytorch/pytorch/pull/115969#issuecomment-1942934386))	2024-02-14 01:11:46 +00:00
Guilherme Leobas	3319dbcd23	Update vmap guard to avoid recompilations (#119061 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/119061 Approved by: https://github.com/zou3519	2024-02-13 20:50:23 +00:00
andrewor14	8ec8d78ef2	[quant][pt2e][be] Rename eval_utils -> export_utils (#119725 ) It's not really eval_utils anymore, since we added some training related utils. Instead it should be util functions that are related to general export use cases. Differential Revision: [D53711494](https://our.internmc.facebook.com/intern/diff/D53711494) Pull Request resolved: https://github.com/pytorch/pytorch/pull/119725 Approved by: https://github.com/tugsbayasgalan	2024-02-13 19:10:06 +00:00
Jason Ansel	39c68efd85	[dynamo] Capture untyped_storage().resize_() (#119647 ) This makes storage resizing work with `backend=eager`, the next two PRs make it work for inductor. Pull Request resolved: https://github.com/pytorch/pytorch/pull/119647 Approved by: https://github.com/yf225	2024-02-13 19:03:28 +00:00
Edward Z. Yang	52de407b6c	Avoid performing replacements when it would unrefine ranges (#117356 ) Fixes https://github.com/pytorch/pytorch/issues/117268; check this issue for background. This PR does the following: * Do not perform a replacement if the expression we're replacing the symbol with has a less refined value range than the original. There's a little bit of trickiness around the handling for values close to INT64_MAX; when checking if a range refines another, I only consider the range representable in 64-bit integers. This is enough to prevent us from doing a substitution like `i0 = 10 - i1`, but it appears to still let us do the other substitutions we like, such as `i0 = i1` or `i0 = 12 * i1` * The test above is order dependent: if we assert an equality BEFORE we have refined a range, we might be willing to do the replacement because there isn't a meaningful range. This means that it's important to mark things as sizes, before you start doing other error checking. `split_with_sizes` is adjusted accordingly. It would be good to raise an error if you get the ordering wrong, but I leave this to future work. * It turns out this is not enough to fix AOTAutograd, because we lose the size-ness of unbacked SymInts when AOTAutograd retraces the Dynamo graph. So update deferred runtime assert insertion to also insert size-ness and value ranges annotations. Note that, in principle, it shouldn't be necessary to explicitly do the latter; these should just show up as deferred runtime asserts. That's some extra refactoring for a later day. Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/117356 Approved by: https://github.com/lezcano	2024-02-13 15:56:59 +00:00
Wanchao Liang	bfb9ea1a43	fix compile DTensor.from_local in trace_rule_look up (#119659 ) There's a bug when converting from TorchVariable to trace rule look ups, in some corner cases the DTensor.from_local calls not matching the trace name rule look up, resulting in a None look up, and falling back to the UserFunctionVariable, which makes the tracing silent wrong by tracing into the DTensor.from_local function. Not exactly sure yet why the look up failed This PR fixes the DTensor.from_local tracing to make sure in everycase we should hit the InGraphFunctionVariable Pull Request resolved: https://github.com/pytorch/pytorch/pull/119659 Approved by: https://github.com/yifuwang	2024-02-13 05:21:19 +00:00
laith sakka	ea8e4fd5ac	Support FunctoolsPartialVariable::get_function, fix NamedTupleVariable::as_proxy and handle call_function in get_fake_values_from_nodes (#119435 ) partially address https://github.com/pytorch/pytorch/issues/118785 This diff fixes three things: 1. add get_function to FunctoolsPartialVariable note that it will be available only if all args constant otherwise, it would throw unimplemented in the call to asPythonConstant. 2. NamedTupleVariable takes args dispatched not as list ex: NamedTuple(a, b, c) vs NamedTuple([a, b, c]), hence fix that by specializing asProxy. 3. A call to create_arg from within create_proxy, changes a python NamedTuple to a function call node without associating an example value! Updated get_fake_values_from_nodes to handle such case. Pull Request resolved: https://github.com/pytorch/pytorch/pull/119435 Approved by: https://github.com/jansel, https://github.com/anijain2305 ghstack dependencies: #119314	2024-02-13 01:44:08 +00:00
Jason Ansel	74d55b0e63	[dynamo] Support torch.distributed.fsdp._flat_param._same_storage_size (#119627 ) Replaces #117690 Pull Request resolved: https://github.com/pytorch/pytorch/pull/119627 Approved by: https://github.com/Skylion007	2024-02-13 01:27:37 +00:00
PyTorch MergeBot	472500e32a	Revert "Avoid performing replacements when it would unrefine ranges (#117356 )" This reverts commit `0e6b314fc2`. Reverted https://github.com/pytorch/pytorch/pull/117356 on behalf of https://github.com/huydhn due to Sorry for reverting the change but it looks like the forward fix still needs more work https://github.com/pytorch/pytorch/pull/119712, so it would be cleaner to reland them ([comment](https://github.com/pytorch/pytorch/pull/117356#issuecomment-1940032407))	2024-02-13 01:16:58 +00:00
PyTorch MergeBot	7d780ff86f	Revert "Enable fake tensor caching in fbcode by default (#118555 )" This reverts commit `0f2fbbff10`. Reverted https://github.com/pytorch/pytorch/pull/118555 on behalf of https://github.com/huydhn due to Sorry for reverting your change but it is failing one model test internally. Please take a look at the diff for more info D53189048 ([comment](https://github.com/pytorch/pytorch/pull/118555#issuecomment-1939550273))	2024-02-12 20:51:23 +00:00
Yanbo Liang	0e5b6594b7	[Dynamo] Minor cleanup of redundant function lookup logics (#119666 ) Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/119666 Approved by: https://github.com/angelayi	2024-02-12 19:00:39 +00:00
Tarun Karuturi	bc521f2ce3	In dynamo tracing for index() use None as the default indicator for end and not -1 (#119151 ) Summary: In dynamo tracing, `index()`'s implementation currently has the default begin index as `0` and the default end index as`-1` which means that by default we're dropping the last element. Rather we should be doing `None` which will ensure that the last element is also checked. Test Plan: CI Differential Revision: D53392287 Pull Request resolved: https://github.com/pytorch/pytorch/pull/119151 Approved by: https://github.com/yanboliang	2024-02-12 17:45:05 +00:00
laith sakka	72d9a38118	add get_function to TorchInGraphFunctionVariable (#119314 ) partially address https://github.com/pytorch/pytorch/issues/118785 Pull Request resolved: https://github.com/pytorch/pytorch/pull/119314 Approved by: https://github.com/yanboliang, https://github.com/anijain2305	2024-02-12 16:35:34 +00:00
Taras Tsugrii	a4cc6b85dc	[dynamo][eval][perf] Remove unnecessary dict copies. (#119305 ) Both of these variables are already created using `dict(...)` so making yet another `dict` copy is pure overhead and boilerplate. Pull Request resolved: https://github.com/pytorch/pytorch/pull/119305 Approved by: https://github.com/Skylion007	2024-02-11 20:29:26 +00:00
Yanbo Liang	57d8f67619	[Dynamo][17/N] Rename SkipFilesVariable to SkipFunctionVariable and move to functions.py (#119619 ) This is follow-up-3 from https://github.com/pytorch/pytorch/pull/118971#issue-2114082018 Pull Request resolved: https://github.com/pytorch/pytorch/pull/119619 Approved by: https://github.com/jansel	2024-02-10 19:33:37 +00:00
Oguz Ulgen	e693089c7a	[Dynamo] Refactor tensor methods handling (#119581 ) Fixes part of #119128 Pull Request resolved: https://github.com/pytorch/pytorch/pull/119581 Approved by: https://github.com/jansel, https://github.com/anijain2305	2024-02-10 08:46:50 +00:00
Jason Ansel	e1c1b8c2b2	[dynamo] Improve support for backwards hooks (#119525 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/119525 Approved by: https://github.com/yanboliang, https://github.com/anijain2305	2024-02-10 01:14:03 +00:00
PyTorch MergeBot	25a0fa6d13	Revert "[dynamo] Improve support for backwards hooks (#119525 )" This reverts commit `b1f4b2a63c`. Reverted https://github.com/pytorch/pytorch/pull/119525 on behalf of https://github.com/clee2000 due to broke test_autograd.py::TestAutograd::test_post_accumulate_grad_hook_gets_cleaned_up on dynamo https://github.com/pytorch/pytorch/actions/runs/7847212828/job/21416215820 `b1f4b2a63c`. The failure exists on the PR as well, but got masked by the other test. Putting this as no signal? ([comment](https://github.com/pytorch/pytorch/pull/119525#issuecomment-1936447169))	2024-02-09 18:58:55 +00:00
Yanbo Liang	f3a2094065	[Dynamo][Export] Mitigate legacy issue that aten op as export entrance function (#119528 ) This is going to fix a legacy issue like: ``` torch._dynamo.export(torch.ops.aten.scaled_dot_product_attention, ...)(*inputs,) ``` This is not supported any more, now the top level ```torch.export``` only support ```nn.Module```, but there are still some tests using the internal APIs and caused the ```trace_rules.check``` assertion error. This PR is going to mitigate such cases. Pull Request resolved: https://github.com/pytorch/pytorch/pull/119528 Approved by: https://github.com/ydwu4	2024-02-09 18:24:09 +00:00
Yanbo Liang	5356b5d1f0	[Dynamo][16/N] Move skipfiles to trace_rules.py (#119432 ) This is follow-up-1 for https://github.com/pytorch/pytorch/pull/118971#issue-2114082018. Only code motion and doc update in this PR. Pull Request resolved: https://github.com/pytorch/pytorch/pull/119432 Approved by: https://github.com/jansel	2024-02-09 18:18:23 +00:00
Jason Ansel	b1f4b2a63c	[dynamo] Improve support for backwards hooks (#119525 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/119525 Approved by: https://github.com/yanboliang	2024-02-09 17:02:40 +00:00
PyTorch MergeBot	eff93fbd86	Revert "[Dynamo][16/N] Move skipfiles to trace_rules.py (#119432 )" This reverts commit `56364124af`. Reverted https://github.com/pytorch/pytorch/pull/119432 on behalf of https://github.com/atalman due to Breaks internal tests ([comment](https://github.com/pytorch/pytorch/pull/119432#issuecomment-1936122795))	2024-02-09 15:25:25 +00:00
Edward Z. Yang	0e6b314fc2	Avoid performing replacements when it would unrefine ranges (#117356 ) Fixes https://github.com/pytorch/pytorch/issues/117268; check this issue for background. This PR does the following: * Do not perform a replacement if the expression we're replacing the symbol with has a less refined value range than the original. There's a little bit of trickiness around the handling for values close to INT64_MAX; when checking if a range refines another, I only consider the range representable in 64-bit integers. This is enough to prevent us from doing a substitution like `i0 = 10 - i1`, but it appears to still let us do the other substitutions we like, such as `i0 = i1` or `i0 = 12 * i1` * The test above is order dependent: if we assert an equality BEFORE we have refined a range, we might be willing to do the replacement because there isn't a meaningful range. This means that it's important to mark things as sizes, before you start doing other error checking. `split_with_sizes` is adjusted accordingly. It would be good to raise an error if you get the ordering wrong, but I leave this to future work. * It turns out this is not enough to fix AOTAutograd, because we lose the size-ness of unbacked SymInts when AOTAutograd retraces the Dynamo graph. So update deferred runtime assert insertion to also insert size-ness and value ranges annotations. Note that, in principle, it shouldn't be necessary to explicitly do the latter; these should just show up as deferred runtime asserts. That's some extra refactoring for a later day. Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/117356 Approved by: https://github.com/lezcano	2024-02-09 14:43:58 +00:00
Sam Larsen	0f2fbbff10	Enable fake tensor caching in fbcode by default (#118555 ) Summary: Enabled by default in OSS; this switches the default to "on" in fbcode too. Test Plan: Ran torchbench benchmarks in fbcode Differential Revision: [D53189048](https://our.internmc.facebook.com/intern/diff/D53189048) Pull Request resolved: https://github.com/pytorch/pytorch/pull/118555 Approved by: https://github.com/eellison	2024-02-09 05:42:16 +00:00
Elias Ellison	930b60f5aa	Add Debug Utility To Generate Inputs for AOT Graphs (#119409 ) ``` Takes in a function which has been printed with print_readable() and constructs kwargs to run it. Currently only handles Tensor inputs and a graph module which might have tensor constants. Example: Consider a function `forward` defined as follows: >>> def forward(self, primals_1: "f32[1001, 6]"): ... _tensor_constant0: "i64[4190]" = self._tensor_constant0 ... # Further implementation >>> kwargs = aot_graph_input_parser(forward) >>> forward(**kwargs) """ ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/119409 Approved by: https://github.com/shunting314	2024-02-09 03:55:19 +00:00
Yue Dong	915f9db03c	[Dynamo] Support kwargs for lazy module (#119445 ) Summary: Seems like `kwargs` is already support in `_infer_argument`, so we don't need the extra assertion `len(kwargs) == 0`. This optimization ensures compatibility with torch.compile() for LazyModules with kwargs inputs, preventing graph breaks. Test Plan: Unit tetst and CI Differential Revision: D53558778 Pull Request resolved: https://github.com/pytorch/pytorch/pull/119445 Approved by: https://github.com/yanboliang	2024-02-09 00:46:41 +00:00
gs-olive	45e7af5818	Windows Dynamo Error Removal CI Check (#115969 ) Rebase of #111313 onto `main`, for CI validation Co-authored-by: Stella Laurenzo <stellaraccident@gmail.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/115969 Approved by: https://github.com/ezyang	2024-02-08 21:23:45 +00:00
Angela Yi	8da2f81527	[export] Convert internal tests to using .module() (#119105 ) Test Plan: CI Differential Revision: D53091904 Pull Request resolved: https://github.com/pytorch/pytorch/pull/119105 Approved by: https://github.com/ydwu4	2024-02-08 17:19:07 +00:00
ydwu4	b251bca205	[dynamo] inlining into __iter__ of user defined object (#119243 ) Fixes #119198. This PR make dynamo inline `__iter__` of a user defined object instead of creating a graph break. Also added a new test, which shows: 1. the loop is unrolled 2. the length of the loop is guarded when inlining `__iter__` ```python class Mod: def __init__(self): self.a = [torch.randn(2, 2), torch.randn(2, 2)] def __iter__(self): return iter(self.a) def f(mod): ret = [] for x in mod: ret.append(x + 1) return ret ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/119243 Approved by: https://github.com/jansel	2024-02-08 17:07:30 +00:00

... 3 4 5 6 7 ...

2446 Commits