pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-06 12:20:52 +01:00

Author	SHA1	Message	Date
Yidi Wu	b838bdd4d4	[dynamo] remove unnecessary set_example_value for SymBool input. (#141610 ) These are automatically done in create_graph_input so we can remove them. Code refactoring only. Pull Request resolved: https://github.com/pytorch/pytorch/pull/141610 Approved by: https://github.com/zou3519	2024-12-10 17:33:48 +00:00
Yukio Siraichi	cbfab8b4de	Add `tensor._base` as a tracked fake for `ShapeEnv` guards. (#139554 ) This PR fixes the issue where AOTAutograd would produce a guard that used a symbolic value that came from one of the input's base. ```python @torch.compile(backend="aot_eager", dynamic=True) def f(a, b): a.add_(1) b.add_(1) return a x = torch.ones(10) f(x[1:], x[1:]) ``` In the example above, AOTAutograd functionalizes the mutation by making use of `as_strided_scatter` operation, which produces the guard: `s0 >= s1 + 1`, where: - `s0`: corresponds to `x.size()[0]` - `s1`: corresponds to `a.size()[0]` Pull Request resolved: https://github.com/pytorch/pytorch/pull/139554 Approved by: https://github.com/bdhirsh	2024-12-05 14:43:58 +00:00
Ryan Guo	ff73e2e679	[dynamo] Validate `mutation_type` and `source` in `VariableTracker.__init__` (#141717 ) As title, this also uncovered a few invalid use cases; the cases that cause error are fixed in separate patches prior to this patch, and the rest are fixed in this patch. This patch also moves a few `.source` mutation to variable construction, to increase the coverage of the validation. Fixes #133027. Pull Request resolved: https://github.com/pytorch/pytorch/pull/141717 Approved by: https://github.com/jansel ghstack dependencies: #141713, #141714, #141715, #141902, #141716	2024-12-03 09:18:06 +00:00
Ryan Guo	0efd184685	[dynamo] Fix side effects for range iterator that escapes the graph (#141716 ) `wrap_range_iterator` mistakenly used `ValueMutationNew`, when it should've used `ValueMutationExisting`, because this code path always has a source. Pull Request resolved: https://github.com/pytorch/pytorch/pull/141716 Approved by: https://github.com/jansel ghstack dependencies: #141713, #141714, #141715, #141902	2024-12-03 09:18:06 +00:00
Ryan Guo	7c3c8a662e	[dynamo] Add `RANGE_ITERATOR_MATCH` to properly guard on range iterators (#141902 ) A subsequent patch attempts to fix a side-effect issue for range iterators, which in turn exposed an existing issue on guards for range iterators -- the following test started failing: ``` PYTORCH_TEST_WITH_DYNAMO=1 python test/test_tensor_creation_ops.py TestTensorCreationCPU.test_hstack_column_stack_cpu_int16 ``` This patch adds a `RANGE_ITERATOR_MATCH` guard to make sure that we properly guard on range iterators, and adds a regression test. Pull Request resolved: https://github.com/pytorch/pytorch/pull/141902 Approved by: https://github.com/jansel ghstack dependencies: #141713, #141714, #141715	2024-12-03 09:18:06 +00:00
Ryan Guo	9eb0520d75	[dynamo] Fix side-effect handling for pre-existing `collections.deque` (#141714 ) Previously we never replayed side effects to `DequeVariable` with a source; the bug was already in the `test_deque_input` test, but went unnoticed because we didn't check the deque objects. This patch adds limited but practical support for this (see comments in `side_effects.py` for why limited), and updates the deque tests to check for this. Pull Request resolved: https://github.com/pytorch/pytorch/pull/141714 Approved by: https://github.com/jansel ghstack dependencies: #141713	2024-12-03 09:18:06 +00:00
Bob Ren	2f72635a5c	automatic dynamic unspecialize float (#141647 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/141647 Approved by: https://github.com/ezyang	2024-11-29 22:36:53 +00:00
PyTorch MergeBot	9e98b3d73c	Revert "automatic dynamic unspecialize float (#141647 )" This reverts commit `1a32daeb17`. Reverted https://github.com/pytorch/pytorch/pull/141647 on behalf of https://github.com/atalman due to functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_inner_grad [GH job link](https://github.com/pytorch/pytorch/actions/runs/12080983316/job/33697901875) [HUD commit link](`1a32daeb17`) ([comment](https://github.com/pytorch/pytorch/pull/141647#issuecomment-2507980876))	2024-11-29 15:00:33 +00:00
Bob Ren	1a32daeb17	automatic dynamic unspecialize float (#141647 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/141647 Approved by: https://github.com/ezyang	2024-11-29 07:53:53 +00:00
Ryan Guo	3141e038f0	[dynamo] Fix `VariableBuilder._wrap` on frozenset and enforce invariants on `ConstantVariable` (#141504 ) Prior to this patch, we were using `ConstantVariable.create` to create VT for frozenset objects, and intended yet failed to predicate that on all items being literals (see https://github.com/pytorch/pytorch/pull/140984#discussion_r1847393736). The code was from https://github.com/pytorch/torchdynamo/commit/7c03434 and the original goal was to help DBR quantization, but as the new test in this patch shows, it could lead to silent incorrectness. Upon a closer look, this exposes some subtleties in how Dynamo handles `ConstantVariable` and `LOAD_CONST`, so this patch both fixes the aforementioned issue and documents, enforces, and makes explicit the invariants around `ConstantVariable` and `LOAD_CONST` -- only immutable objects are supported. Specifically, this patch: 1. Refines the checks for wrapping a `frozenset` object, documents why we can't just wrap its items directly due to lack of `Source` for set items, and uses a safe workaround (`SourcelessBuilder`) to ensure soundness while keeping the DBR quantization support. 2. Adds more types to `common_constant_types`, thereby making `ConstantVariable.is_base_literal` more lenient, and strictly checks this property in the constructor of `ConstantVariable`. 3. Changes relevant uses of `create_instruction("LOAD_CONST", ...)` to `create_load_const` which checks `is_safe_constant`, and makes developer overrides explicit by using `create_load_const_unchecked` when needed. 4. In a few places, uses more specific `VariableTracker`, e.g., `TypingVariable` rather than `ConstantVariable`, and `FrozensetVariable` rather than `SetVariable`. (2) and (3) are mainly to future-proof Dynamo against bugs like (1). Pull Request resolved: https://github.com/pytorch/pytorch/pull/141504 Approved by: https://github.com/jansel	2024-11-27 21:58:35 +00:00
Bob Ren	ed9135a732	add jk for unspecialize float killswitch (#141143 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/141143 Approved by: https://github.com/c00w	2024-11-20 23:20:52 +00:00
Bob Ren	7c7c34693d	disable tensorify floats when cuda graphs is on (#140983 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/140983 Approved by: https://github.com/ezyang	2024-11-20 00:33:09 +00:00
Bob Ren	9005156004	don't specialize when grad tracking tensors are activated (#140828 ) Fixes `python test/dynamo/test_inline_inbuilt_nn_modules.py InlineInbuiltNNModulesFuncTorchHigherOrderOpTests.test_grad_non_tensor_input_inline_inbuilt_nn_modules` when `specialize_float=False` Pull Request resolved: https://github.com/pytorch/pytorch/pull/140828 Approved by: https://github.com/ezyang ghstack dependencies: #140830, #140832	2024-11-17 18:28:47 +00:00
Bob Ren	3b8470c461	add special case for __round__ constant variables (#139583 ) Fixes `PYTORCH_TEST_WITH_INDUCTOR=1 tlp python test/test_torch.py TestTorchDeviceTypeCUDA.test_cauchy_cuda_float64` when specialize_float=False Pull Request resolved: https://github.com/pytorch/pytorch/pull/139583 Approved by: https://github.com/ezyang ghstack dependencies: #139569, #139457, #139568, #139572, #139846, #139454, #139896, #139935, #139587	2024-11-09 03:25:53 +00:00
Animesh Jain	a140e65e0f	[dynamo] Support method with different __self__ on user defined objects (#139953 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/139953 Approved by: https://github.com/jansel	2024-11-08 18:44:08 +00:00
Edward Z. Yang	114a0bc306	Make PGO work correctly with NJT inputs (#140046 ) We were actually triggering a latent bug where nested ints were uselessly being incorporated into the automatic dynamic state, even though they were unconditionally ignored afterwards. Now we munge them out before putting them in. Signed-off-by: Edward Z. Yang <ezyang@meta.com> Differential Revision: [D65623303](https://our.internmc.facebook.com/intern/diff/D65623303) Pull Request resolved: https://github.com/pytorch/pytorch/pull/140046 Approved by: https://github.com/jbschlosser, https://github.com/bdhirsh ghstack dependencies: #140042	2024-11-08 04:27:39 +00:00
Animesh Jain	86792a5a8d	[invoke_subgraph] User facing API to support arbitrary args and kwargs (#139162 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/139162 Approved by: https://github.com/zou3519	2024-11-08 03:31:19 +00:00
Bob Ren	85204d0081	Don't wrap inf values as symfloat (#139896 ) Fixes `PYTORCH_OPINFO_SAMPLE_INPUT_INDEX=7 python test/inductor/test_torchinductor_opinfo.py TestInductorOpInfoCPU.test_comprehensive_linalg_norm_cpu_float16` when `specialize_float=False` Pull Request resolved: https://github.com/pytorch/pytorch/pull/139896 Approved by: https://github.com/ezyang ghstack dependencies: #139569, #139457, #139568, #139572, #139846, #139454	2024-11-07 20:03:54 +00:00
Yidi Wu	ab42967238	[hop free symbols] lift free symbols in example_value when create_graph_input (#138363 ) There are 5 parts (they are hard to further break into smaller ones because they're highly coupled) in this PR: 1. Whenever we call create_graph_input, we try to bind the symbols in the graph input. Since we've enforced the invariant that all create_graph_input calls must provide an example value, we can intercept at the create_graph_input calls (this PR only handles free symbols in tensors). 2. We cache the bound_symbols to avoid lifting the same symbol repeatedly. 3. For lifted symbols, we re-use lifted_freevars, i.e. the mapping between a symbol proxy in the parent graph and the lifted phs in the current subgraph, which is how we handle lifted tensors. In this way, all hops that support lifted tensors should be able to handle lifted_symints automatically (at least in the dynamo part). 4. For unbacked symbols created during tracing, we also need to bind these symbols to their proxies. This is to support the test cases where we want to lift unbacked symbols as inputs. We need the proxy of the unbacked symbol in the parent graph in order to properly create the args to the hop. 5. We change all the tests after free symbols are lifted in subgraphs, and also support the lifted symbols in existing higher order ops. The interaction of nested tracers: The previous design for lifting tensor closures is that: suppose we're in nested tracers, whenever we see a new proxy that's not created by the current tracer, we recursively look for the proxy in the parent tracer until we find the tracer that creates this proxy (either a placeholder or some intermediate results). More detail is in Note [Nested SubgraphTracer and free_variable handling]. Given the above design, the plan for lifting the free symbols is: whenever we lift a free tensor to be an input of the current subgraph, we'll look at the symbols in it and bind the symbols at the same time. 
For example, suppose we have the following function: ```python def f(x: [s1, s2]): def true_f(): def true_f_inner(): return x.sin() ``` What will happen, in time order: 1. We create subtracer 1 and start to speculate the outer cond's true_f. 2. We create another subtracer 2 and start to speculate the inner cond's true_f_inner. 3. Dynamo realizes the tensor input x by calling wrap_tensor at top level to create graph input x (tracer 0), and we bind the symbols s1, s2 after the ph for x is created. So the graph now looks like: ```python def gm(s1, s2, x): ``` 4. When seeing TensorVariable.call_method of x, tracer 2 wants to create a call_function(sin, proxy_of_x), but it finds that proxy_of_x is not created by the current tracer. So it recursively looks up its parent tracer 1 and finds that tracer 1 also doesn't track this proxy_of_x; then it finds the root tracer 0, which is the creator of it and tracks it as a ph. Then tracer 1 calls create_graph_input to lift the closure to its input ph1 and adds the (proxy_of_x: ph1) k-v pair to lifted_freevars of tracer 1. Now the graph looks like: ```python def gm(s1, s2, x): def true_gm(x): ``` 5. Since there are free symbols inside this new tensor input, tracer 1 also binds the symbols (maybe_bind_symbol), which calls create_graph_input for s1 and s2. Now the graph looks like: ```python def gm(s1, s2, x): def true_gm(s1, s2, x): ``` 6. Then it goes back to tracer 2, which calls create_graph_input for x and gets ph2; tracer 2's lifted_freevars records (ph1, ph2), and tracer 2 also binds the symbols in this new tensor input. Now the graph looks like: ```python def gm(s1, s2, x): def true_gm(s1, s2, x): def true_gm_inner(s1, s2, x): ``` 7. Finally the sin call_function node is created by tracer 2. This PR also handles the following cases: - What if we lift two tensors that share the same symbol? e.g. x1 [s1, s2], x2 [s2, s3]? Each subtracer maintains bound_symbols as a cache that maps a symbol.expr to its proxy in the current tracer. 
So when we see x1, we'll track s1 and s2 as inputs and bind s1 to ph1, s2 to ph2. So when we try to bind symbols of x2, s2 will already be tracked so no graph input is created. - What if a subgraph closes over a symint? e.g. ```python def f(x): def true_f(): c = x.size(0) def true_fn_inner(): return c ``` When we speculate true_fn_inner, we find proxy_of_c is not tracked by tracer 2, so it recursively looks up its parent. At this point, x and its symbols have been lifted as input of true_f (as a result of lifting x during tracing true_f in tracer 1). Specifically the graph looks like: ```python def gm(s1, s2, x): def true_gm(s1, s2, x): def true_gm_inner(): ``` So tracer 2 is able to find that s1 has been tracked as ph in tracer 1 so it returns back to gm and calls create_graph_input on s1. The graph now looks like: ```python def gm(s1, s2, x): def true_gm(s1, s2, x): def true_gm_inner(s1): return s1 ``` - What if a subgraph closes over an unbacked symint? e.g. ```python def f(x): def true_f(): c = x.item() def true_f_inner(): return c ``` When x.item() is called, proxy_of_c and its symnode variable are created for tracer 1, and we also call track_unbacked_symbols to record this relationship. So when tracer 2 finds proxy_of_c is not created by the current tracer, it recursively looks up its parent tracer and finds that the expression u0 has been tracked as a result of track_unbacked_symbol in tracer 1. So it will stop the recursion and create_graph_input u0 in tracer 2. Graph looks like: ```python def f(x): def true_f(s1, s2, x): c = x.item() def true_gm_inner(u0): return u0 cond(pred, true_gm_inner, false_gm_inner, (c,)) ``` - What if a subgraph closes over a tensor with unbacked symint shape? ```python def f(x): def true_f(): c = x.item() r = torch.randn((c,)) def true_f_inner(): return r + 1 ``` This is the same as the case of closing over tensors with backed shapes. 
where we first lift r, then bind u0 in it, which recursively bind_symint of u0 in its parent and found u0 is tracked in parent tracer as a result of .item() call. Pull Request resolved: https://github.com/pytorch/pytorch/pull/138363 Approved by: https://github.com/zou3519	2024-11-07 04:44:32 +00:00
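Note on #138363: the lifting scheme described in the message above (lift a free tensor through every enclosing tracer, bind its symbols alongside it, and use a per-tracer `bound_symbols` cache so a shared symbol is lifted only once) can be illustrated with a small toy model. This is only a sketch under stated assumptions: `ToyTracer`, `lift_tensor`, `bind_symbol` and the string placeholders are hypothetical names invented here, not Dynamo's `SubgraphTracer` API, and the toy binds symbols before creating the tensor placeholder purely so that the input list reads `(s1, s2, x)` like the graph sketches in the commit message.

```python
from __future__ import annotations


class ToyTracer:
    """Hypothetical stand-in for a (sub)graph tracer; not Dynamo's real class."""

    def __init__(self, name: str, parent: ToyTracer | None = None) -> None:
        self.name = name
        self.parent = parent
        self.inputs: list[str] = []              # graph inputs, in creation order
        self.bound_symbols: dict[str, str] = {}  # symbol -> placeholder in this tracer
        self.lifted_tensors: dict[str, str] = {} # tensor -> placeholder in this tracer

    def create_graph_input(self, name: str) -> str:
        # Record a new placeholder and return a handle that identifies it.
        self.inputs.append(name)
        return f"{self.name}:{name}"

    def bind_symbol(self, sym: str) -> str:
        # Cache hit: the symbol is already an input of this tracer.
        if sym in self.bound_symbols:
            return self.bound_symbols[sym]
        # Otherwise make sure every enclosing tracer has it first.
        if self.parent is not None:
            self.parent.bind_symbol(sym)
        ph = self.create_graph_input(sym)
        self.bound_symbols[sym] = ph
        return ph

    def lift_tensor(self, name: str, symbols: tuple[str, ...]) -> str:
        # A tensor that was already lifted needs no new input.
        if name in self.lifted_tensors:
            return self.lifted_tensors[name]
        # Walk outwards so the tensor exists in every parent tracer.
        if self.parent is not None:
            self.parent.lift_tensor(name, symbols)
        for sym in symbols:
            self.bind_symbol(sym)
        ph = self.create_graph_input(name)
        self.lifted_tensors[name] = ph
        return ph


root = ToyTracer("gm")
outer = ToyTracer("true_gm", parent=root)
inner = ToyTracer("true_gm_inner", parent=outer)

# Closing over x: [s1, s2] from the innermost subgraph lifts x and its symbols everywhere.
inner.lift_tensor("x", ("s1", "s2"))
# A second tensor y: [s2, s3] reuses the cached s2 and only adds s3.
inner.lift_tensor("y", ("s2", "s3"))
print(inner.inputs)
```

In this toy, all three tracers end up with the same input list, and `s2` appears only once even though two lifted tensors mention it.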
Animesh Jain	ac5fa26e07	[dynamo][weakref] Support weakref.ref call (#139914 ) Should fix - https://github.com/pytorch/pytorch/pull/135001 Pull Request resolved: https://github.com/pytorch/pytorch/pull/139914 Approved by: https://github.com/jansel ghstack dependencies: #139856	2024-11-06 23:16:41 +00:00
Ryan Guo	693a0a1bd4	[dynamo][NFC] Rename `mutable_local` and add documentation (#139339 ) This patch addresses the renaming part of #133027, specifically, it renames the following and adds documentation for relevant classes. 1. `VariableTracker.mutable_local` to `mutation_type` 2. `MutableLocal` to `ValueMutationNew` 3. `MutableSideEffects` to `ValueMutationExisting` 4. `MutableLocalSource` to `SourceType` 5. `MutableLocalSource.Local` to `New` Note that (2), (3) and (5) are mainly to bring consistency between them and `AttributeMutationNew`, `AttributeMutationExisting`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/139339 Approved by: https://github.com/jansel, https://github.com/mlazos, https://github.com/anijain2305	2024-11-05 19:11:41 +00:00
Yidi Wu	dc3a6a9d08	[hop free symbols][refactor] make create_graph_input always take example_value (#138428 ) Code refactoring only. We move the wrap_to_fake_tensor_logic out of wrap_fx_proxy for placeholders to provide the invariant that all graph inputs must set their example values when creating the inputs. This invariant helps us to identify all the free symbols in the graph in top-level and sub-graphs. Pull Request resolved: https://github.com/pytorch/pytorch/pull/138428 Approved by: https://github.com/ezyang, https://github.com/zou3519 ghstack dependencies: #138345	2024-11-04 22:47:49 +00:00
Yidi Wu	54c69a785b	[hop free symbols][refactor] make bound_symbols a dictionary (#138345 ) Code refactoring only. Change all self.tx.output.bound_symbols to self.tx.output.root_tracer.bound_symbols. Pull Request resolved: https://github.com/pytorch/pytorch/pull/138345 Approved by: https://github.com/zou3519	2024-11-04 22:47:41 +00:00
Bob Ren	b8b60e0bc5	add is_integer to support example_value function whitelist (#139484 ) Fixes python test/dynamo/test_dynamic_shapes.py DynamicShapesFunctionTests.test_is_integer_dynamic_shapes when specialize_float=False Pull Request resolved: https://github.com/pytorch/pytorch/pull/139484 Approved by: https://github.com/ezyang ghstack dependencies: #139451, #139482	2024-11-03 02:01:38 +00:00
Bob Ren	232af152b5	Fix graph breaks related to specialized float inputs (#139482 ) Fixes issue with timm models where example_value = 0.09999 proxy.node.target = <built-in function sub> would fall through to ``` unimplemented( "torch.* op returned non-Tensor " + f"{typestr(example_value)} {proxy.node.op} {proxy.node.target}", case_name="unsupported_operator", ) ``` and graph break Pull Request resolved: https://github.com/pytorch/pytorch/pull/139482 Approved by: https://github.com/ezyang ghstack dependencies: #139451	2024-11-02 21:58:46 +00:00
Bob Ren	fdd298dcb7	add hex method on SymFloat (#139451 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/139451 Approved by: https://github.com/ezyang	2024-11-02 05:33:19 +00:00
Michael Lazos	46132dc026	[Dynamo] Refactor wrap_fx_proxy (#138933 ) During the work to dedup graphs for hierarchical compilation I tried to tame the `wrap_fx_proxy_cls` mess by separating the wrapping into three distinct scenarios (vs a jumble of conditionals). These are: 1) wrapping a preexisting tensor (`_wrap_fx_preexisting_tensor`) 2) wrapping and tracing a new op into the graph (`_wrap_fx_proxy`) 3) handling a value that is some other proxyable data structure. See `wrap_fx_proxy_cls` for the conditional tree handling these three cases. Pull Request resolved: https://github.com/pytorch/pytorch/pull/138933 Approved by: https://github.com/williamwen42	2024-10-28 08:05:33 +00:00
Edward Z. Yang	14a45d7793	Refactor core algorithm for automatic dynamic shapes (#138717 ) While working on automatic dynamic PGO (https://github.com/pytorch/pytorch/pull/138052) one abstract property I was looking for out of profile information is that it formed a semilattice: I could join together two profiles and get a merged profile that is consistent with the profiles that I saw in both cases. While working on this data structure that supported joins, I realized that the base automatic dynamic algorithm could be implemented in this way, therefore this refactor. The basic recipe is that we now support a join operation on FrameStateSizeEntry. Intuitively, if you join two sizes that are equal, you get back that size (join(2, 2) == 2), but if you join two different sizes you get a special singleton auto_dynamic indicating that the size of the tensor is dynamic (join(2, 3) == auto_dynamic). So now, the automatic dynamic algorithm is: (1) compute the FrameStateSizeEntry that corresponds to the concrete values we've seen, and (2) join it into the ambient FrameStateSizeEntry. As a bonus, compiler collectives can buy into the same abstraction (we're simply distributing FrameStateSizeEntry from each node to every other node). For convenience, I also added the necessary `auto_unset` extra state which is the identity element (which makes our semilattice bounded from both top and bottom). Here, join(2, auto_unset) == 2. While doing this, there was a complication: the infer stride algorithm wasn't technically a semilattice. Here, I did what I suggested in the original code review https://github.com/pytorch/pytorch/pull/130232 which is stop using a heuristic, and instead replicate the stride inference algorithm in automatic dynamic. This means that when I join strides together, I don't join their concrete values, instead, if a stride can be inferred as the contiguous stride for a particular inner dimension, then you represent it as InferStride(dim). 
There's an example in code which I recommend looking at. Some other extra things that are happening in this PR: * I tried to deduplicate the size/stride automatic dynamic logic as much as possible. So hopefully less code to review here. * I had to reimplement all the logging. For the most part I tried to track the logging as closely to the original as possible, but I think we could be emitting less Chrome events here * The `marked_dynamic` handling is still preserved as is, but I kind of don't like it and we should figure out how to put it somewhere else Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/138717 Approved by: https://github.com/bobrenjc93 ghstack dependencies: #138693	2024-10-27 03:08:41 +00:00
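Note on #138717: the join operation described in the message above can be sketched for a single dimension as follows. This is a minimal illustration, assuming only the three properties the message states (`join(2, 2) == 2`, `join(2, 3) == auto_dynamic`, `join(2, auto_unset) == 2`); the `_Auto` enum and the `join` / `join_all` helpers are hypothetical names for this sketch and are not the `FrameStateSizeEntry` implementation, which also covers strides and `InferStride`.

```python
from __future__ import annotations

from enum import Enum
from functools import reduce
from typing import Iterable, Union


class _Auto(Enum):
    UNSET = "auto_unset"      # identity element: nothing observed yet
    DYNAMIC = "auto_dynamic"  # top element: conflicting sizes were observed


auto_unset = _Auto.UNSET
auto_dynamic = _Auto.DYNAMIC
Size = Union[int, _Auto]


def join(a: Size, b: Size) -> Size:
    """Join two observations of one dimension (toy version)."""
    if a is auto_unset:
        return b
    if b is auto_unset:
        return a
    if a is auto_dynamic or b is auto_dynamic:
        return auto_dynamic
    # Two concrete sizes: keep the size if they agree, otherwise go dynamic.
    return a if a == b else auto_dynamic


def join_all(observations: Iterable[Size]) -> Size:
    """Fold a sequence of observations into the ambient state."""
    return reduce(join, observations, auto_unset)


print(join_all([2, 2]))     # size stays static
print(join_all([2, 3, 2]))  # size becomes dynamic
```

Because this `join` is commutative, associative and idempotent, the order in which observations arrive does not matter, which is the property that lets profiles from different runs or ranks be merged.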
Nick Westlake	0d9fb51028	Fix lru_cache where config is used (#134235 ) Ensure that any use of functools.lru_cache does not prevent config from being changed after the function has already run. Pull Request resolved: https://github.com/pytorch/pytorch/pull/134235 Approved by: https://github.com/masnesral	2024-10-24 10:43:34 +00:00
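Note on #134235: the hazard the message above describes is that a `functools.lru_cache`-wrapped function which reads a config value keeps returning the result computed under the old value. The sketch below shows the problem and one generic way around it, which is to pass the config value in as part of the cache key. The `config` namespace, `stale_prefix`, `fresh_prefix` and `_prefix_for` are hypothetical names for illustration only; this is not necessarily the approach taken in the PR.

```python
import functools
from types import SimpleNamespace

# Hypothetical mutable config object standing in for a module-level config.
config = SimpleNamespace(verbose=False)


@functools.lru_cache(maxsize=None)
def stale_prefix() -> str:
    # Reads config inside a cached, zero-argument function: the first result
    # is cached forever, so later config changes are silently ignored.
    return "[verbose] " if config.verbose else ""


@functools.lru_cache(maxsize=None)
def _prefix_for(verbose: bool) -> str:
    # The config value is an explicit argument, so it is part of the cache key.
    return "[verbose] " if verbose else ""


def fresh_prefix() -> str:
    # Read config at call time and cache per config value.
    return _prefix_for(config.verbose)
```

Calling `stale_prefix()` once, flipping `config.verbose`, and calling it again returns the old string, whereas `fresh_prefix()` reflects the new setting while still caching per value.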
Animesh Jain	b1acd0978e	[dynamo] Support range_iterator as a function input (#138657 ) Fixes https://github.com/pytorch/pytorch/issues/138654 Pull Request resolved: https://github.com/pytorch/pytorch/pull/138657 Approved by: https://github.com/williamwen42, https://github.com/jansel	2024-10-24 03:49:26 +00:00
Pian Pawakapan	51045e6251	make DimHints compatible with Dims (#138490 ) Previously we'd been raising UserErrors when `Dim()` and DimHints (`Dim.AUTO/Dim.DYNAMIC`) were both specified in `dynamic_shapes`, this PR stops that, and uses `Dim()` objects to guide DimHints. The key to this was making the `EqualityConstraint` class happy when it checks that inferred equivalence relations were specified in the original `dynamic_shapes` spec, and this introduces a `RelaxedConstraint` object to mark the hinted dimensions, so equality checks between `RelaxedConstraints` and other constraints are treated as valid. Current behavior is that: ``` class Foo(torch.nn.Module): def forward(self, x, y): return x - y inputs = (torch.randn(4, 4), torch.randn(4, 4)) shapes = { "x": (Dim.AUTO, Dim("d1", min=3)), "y": (Dim("d0", max=8), Dim.DYNAMIC), } ep = export(Foo(), inputs, dynamic_shapes=shapes) ``` The dimensions marked `AUTO` and `DYNAMIC` will have max & min ranges of 8 & 3 respectively. Note that inferred equality between `Dim()` objects & `Dim.STATIC` will still raise errors - `Dim()` suggests not specializing to a constant. Differential Revision: D64636101 Pull Request resolved: https://github.com/pytorch/pytorch/pull/138490 Approved by: https://github.com/avikchaudhuri	2024-10-22 07:43:48 +00:00
PyTorch MergeBot	e8b1409dcf	Revert "[user triton] typing triton_kernel_wrap.py (#138230 )" This reverts commit `2f61b69603`. Reverted https://github.com/pytorch/pytorch/pull/138230 on behalf of https://github.com/wdvr due to Reverting this, as it started failing tests on main ([comment](https://github.com/pytorch/pytorch/pull/138230#issuecomment-2423354596))	2024-10-18 23:12:29 +00:00
David Berard	2f61b69603	[user triton] typing triton_kernel_wrap.py (#138230 ) Remove `# mypy: allow-untyped-defs` from triton_kernel_wrap.py, and fixed all the mypy errors. Pull Request resolved: https://github.com/pytorch/pytorch/pull/138230 Approved by: https://github.com/oulgen, https://github.com/Skylion007	2024-10-18 19:29:31 +00:00
Ryan Guo	59158f640c	[dynamo] Support equality comparison between Tensor and `None` (#138289 ) This patch updates the `wrap_fx_proxy_cls` function to allow boolean output when the operation is one of `supported_const_comparison_op_values`. Fixes #120907. Pull Request resolved: https://github.com/pytorch/pytorch/pull/138289 Approved by: https://github.com/williamwen42	2024-10-18 17:49:26 +00:00
Adnan Akhundov	809ff3b274	Add host-side Triton TMA support to Dynamo (#137677 ) This adds Dynamo tracing support for the host-side Triton TMA API (see `create_2d_tma_descriptor` calls on the host in the [Triton tutorial](https://triton-lang.org/main/getting-started/tutorials/09-persistent-matmul.html#sphx-glr-getting-started-tutorials-09-persistent-matmul-py)). A few notes: - Here we assume the availability of the host-side TMA API added to upstream Triton in https://github.com/triton-lang/triton/pull/4498. As of time of writing, this is not a part of the PT2 OSS Triton pin (although back-ported internally). OSS Triton pin update should be done in December 2024. - To capture the chain of calls `t.data_ptr() --> create_{1d,2d}_tma_descriptor(ptr, ...) --> kernel[grid](tma_desc, ...)`, we add three new variable trackers: `DataPtrVariable`, `CreateTMADescriptorVariable` (for the function), `TMADescriptorVariable` (for TMA descriptor object). This is to maintain the path back from the Triton kernel to the Tensor from which the TMA descriptor has been created. - The newly introduced variables have `reconstruct` methods used in case of graph breaks. - The `tma_descriptor_metadata` extracted from the captured `create_{1d,2d}_tma_descriptor` calls is propagated through the HOPs in Dynamo and AOTAutograd to be used by the downstream compiler (e.g., Inductor). See the unit tests for how the captured HOP arguments look like. - In the Dynamo-captured fx graph, we replace the TMA descriptor arguments of the Triton kernel by the underlying Tensors, to be able to track the input/output relationships in terms of Tensors. - In the Triton kernel mutation analysis pass (in AOTAutograd), we use the `tt.experimental_descriptor_store` TTIR op to detect mutations of the underlying tensors via TMA descriptors. So that downstream AOTAutograd can perform functionalizations as required. - JIT Inductor and AOT Inductor support will be implemented in follow-up PRs. 
Differential Revision: [D64404928](https://our.internmc.facebook.com/intern/diff/D64404928) Pull Request resolved: https://github.com/pytorch/pytorch/pull/137677 Approved by: https://github.com/zou3519	2024-10-16 02:18:48 +00:00
James Wu	41ccfc8752	Log chromium event for automatic dynamic reasons (#137491 ) Log a chromium event so that we can see the reasons for invoking automatic dynamic shapes in aggregate internally. Run following code: ``` import torch @torch.compile(backend="eager") def foo(t, x): return t.sin() + x torch._dynamo.config.automatic_dynamic_shapes = True torch._dynamo.config.assume_static_by_default = True # Change size x = torch.randn([1,2]) foo(x, 2) x = torch.randn([2,2]) foo(x, 2) torch._dynamo.reset() # Change dimensionality x = torch.randn([1,2]) foo(x, 2) x = torch.randn([1,2,3]) foo(x, 2) torch._dynamo.reset() # Change stride x = torch.randn([3,3]) foo(x, 2) x = torch.as_strided(x, [3,3], [2,2]) foo(x, 2) torch._dynamo.reset() # Change scalar x = torch.randn([1,2]) foo(x, 2) foo(x, 3) ``` Internal link to perfetto: https://interncache-all.fbcdn.net/manifold/perfetto-artifacts/tree/ui/index.html?url=https%3A%2F%2Finterncache-all.fbcdn.net%2Fmanifold%2Ftlparse_reports%2Ftree%2Flogs%2Fjjwu%2Fcustom%2Fchromium_events.json#!/viewer?url=https%3A%2F%2Finterncache-all.fbcdn.net%2Fmanifold%2Ftlparse_reports%2Ftree%2Flogs%2Fjjwu%2Fcustom%2Fchromium_events.json&local_cache_key The events look like this: <img width="639" alt="image" src="https://github.com/user-attachments/assets/23916333-7f24-47c7-934b-201f33aebeab"> <img width="638" alt="image" src="https://github.com/user-attachments/assets/9f927c8d-04bb-4431-8802-685b032df656"> <img width="640" alt="image" src="https://github.com/user-attachments/assets/342e9e11-0dfc-422d-bd0b-01a8574d38ba"> <img width="635" alt="image" src="https://github.com/user-attachments/assets/dc2c97cd-7180-4069-b019-d6e63ee490bc"> Differential Revision: [D64184625](https://our.internmc.facebook.com/intern/diff/D64184625) Pull Request resolved: https://github.com/pytorch/pytorch/pull/137491 Approved by: https://github.com/Skylion007, https://github.com/oulgen	2024-10-11 16:50:25 +00:00
PyTorch MergeBot	c73d2634b9	Revert "Log chromium event for automatic dynamic reasons (#137491 )" This reverts commit `3c1ab93678`. Reverted https://github.com/pytorch/pytorch/pull/137491 on behalf of https://github.com/jovianjaison due to breaking internal tests ([comment](https://github.com/pytorch/pytorch/pull/137491#issuecomment-2403360486))	2024-10-09 20:24:12 +00:00
Michael Lazos	e41dffbedd	[Dynamo] Trace enter/exit of TorchFunctionModes (#135422 ) (#137114 ) This PR implements tracing of with contexts with TorchFunction modes which have the default enter/exit behavior (ie pushing/popping the mode) Typically the bytecode for a context manager looks like this during a graph break: 1. graph call 2. enter context 3. unsupported code 4. exit context 5. resume call resume fn structure: 1. enter context 2. jump ... 3. exit context The issue with torch function modes is that side effects will replay any mutations to the torch function stack performed during tracing. So, we do not need to enter and exit around the unsupported code in the original function (doing so would result in a duplicate torch function mode entry during execution of the unsupported code), and we don't need to enter again in the resume function (the mode that was pushed from the side effects bytecode would still be on the stack). So for torch function modes the structure of our output code is this: 1. graph call 2. mutate tf mode stack to replay mutations 4. unsupported code 5. on exception restore stack 6. resume function Then our resume fn looks like this: 1. no-op enter torch function mode 2. jump 3. exit tf mode To implement the no-op enter of the torch function mode I added torch function mode in polyfill which no-op enters, but normally exits. This is needed because we still want to trace the with context in the resume function, and exit properly (the exit instructions will still be in the function, so we need to generate instructions to set up the context). Separately from the bytecode, dynamo also tracks contexts on the block stack, which is how the SETUP_* instructions are implemented. Naturally at a graph break, we exit these block stacks to properly reset the contexts entirely, so that we can re-enter around the unsupported code soundly. 
However once again, in the torch function mode case, in the event of a graph break we do not want to perform any exit side effects because we want to preserve the state of the mode stack as is so that we will properly update the stack with bytecode mentioned in the first section. If we exited here, dynamo would pop the mode off of the symbolic stack, and not update the true python torch function mode stack with the suffix bytecode. All in all, for torch function modes we enter exactly once, update the global torch function mode stack with side effects bytecode, re-read this stack when compiling the resume function, and exit exactly once in the resume function. This matches the semantics of eager exactly. Approved by: https://github.com/williamwen42 ghstack dependencies: #134732, #133137, #135443, #135444 Pull Request resolved: https://github.com/pytorch/pytorch/pull/137114 Approved by: https://github.com/yanboliang	2024-10-09 02:29:40 +00:00
PyTorch MergeBot	d34b617bb9	Revert "[Dynamo] Trace enter/exit of TorchFunctionModes (#135422 ) (#137114 )" This reverts commit `51bc839b94`. Reverted https://github.com/pytorch/pytorch/pull/137114 on behalf of https://github.com/huydhn due to The top of the stack has been reverted but it leaves trunk in a broken state, so I try to revert the rest of the stack ([comment](https://github.com/pytorch/pytorch/pull/137114#issuecomment-2400765603))	2024-10-08 20:33:17 +00:00
James Wu	3c1ab93678	Log chromium event for automatic dynamic reasons (#137491 ) Log a chromium event so that we can see the reasons for invoking automatic dynamic shapes in aggregate internally. Run the following code:
```
import torch

@torch.compile(backend="eager")
def foo(t, x):
    return t.sin() + x

torch._dynamo.config.automatic_dynamic_shapes = True
torch._dynamo.config.assume_static_by_default = True

# Change size
x = torch.randn([1,2])
foo(x, 2)
x = torch.randn([2,2])
foo(x, 2)

torch._dynamo.reset()
# Change dimensionality
x = torch.randn([1,2])
foo(x, 2)
x = torch.randn([1,2,3])
foo(x, 2)

torch._dynamo.reset()
# Change stride
x = torch.randn([3,3])
foo(x, 2)
x = torch.as_strided(x, [3,3], [2,2])
foo(x, 2)

torch._dynamo.reset()
# Change scalar
x = torch.randn([1,2])
foo(x, 2)
foo(x, 3)
```
Internal link to perfetto: https://interncache-all.fbcdn.net/manifold/perfetto-artifacts/tree/ui/index.html?url=https%3A%2F%2Finterncache-all.fbcdn.net%2Fmanifold%2Ftlparse_reports%2Ftree%2Flogs%2Fjjwu%2Fcustom%2Fchromium_events.json#!/viewer?url=https%3A%2F%2Finterncache-all.fbcdn.net%2Fmanifold%2Ftlparse_reports%2Ftree%2Flogs%2Fjjwu%2Fcustom%2Fchromium_events.json&local_cache_key The events look like this: <img width="639" alt="image" src="https://github.com/user-attachments/assets/23916333-7f24-47c7-934b-201f33aebeab"> <img width="638" alt="image" src="https://github.com/user-attachments/assets/9f927c8d-04bb-4431-8802-685b032df656"> <img width="640" alt="image" src="https://github.com/user-attachments/assets/342e9e11-0dfc-422d-bd0b-01a8574d38ba"> <img width="635" alt="image" src="https://github.com/user-attachments/assets/dc2c97cd-7180-4069-b019-d6e63ee490bc"> Pull Request resolved: https://github.com/pytorch/pytorch/pull/137491 Approved by: https://github.com/Skylion007, https://github.com/oulgen	2024-10-08 19:53:12 +00:00
Michael Lazos	51bc839b94	[Dynamo] Trace enter/exit of TorchFunctionModes (#135422 ) (#137114 ) This PR implements tracing of `with` contexts with TorchFunction modes that have the default enter/exit behavior (i.e., pushing/popping the mode). Typically, the bytecode for a context manager looks like this during a graph break: 1. graph call 2. enter context 3. unsupported code 4. exit context 5. resume call. The resume fn structure is: 1. enter context 2. jump ... 3. exit context. The issue with torch function modes is that side effects will replay any mutations to the torch function stack performed during tracing. So, we do not need to enter and exit around the unsupported code in the original function (doing so would result in a duplicate torch function mode entry during execution of the unsupported code), and we don't need to enter again in the resume function (the mode that was pushed from the side effects bytecode would still be on the stack). So for torch function modes the structure of our output code is this: 1. graph call 2. mutate tf mode stack to replay mutations 4. unsupported code 5. on exception restore stack 6. resume function. Then our resume fn looks like this: 1. no-op enter torch function mode 2. jump 3. exit tf mode. To implement the no-op enter of the torch function mode, I added a torch function mode in polyfill that no-op enters but exits normally. This is needed because we still want to trace the with context in the resume function, and exit properly (the exit instructions will still be in the function, so we need to generate instructions to set up the context). Separately from the bytecode, dynamo also tracks contexts on the block stack, which is how the SETUP_* instructions are implemented. Naturally, at a graph break, we exit these block stacks to properly reset the contexts entirely, so that we can re-enter around the unsupported code soundly. 
However, once again, in the torch function mode case, in the event of a graph break we do not want to perform any exit side effects, because we want to preserve the state of the mode stack as is so that we will properly update the stack with the bytecode mentioned in the first section. If we exited here, dynamo would pop the mode off of the symbolic stack, and not update the true python torch function mode stack with the suffix bytecode. All in all, for torch function modes we enter exactly once, update the global torch function mode stack with side effects bytecode, re-read this stack when compiling the resume function, and exit exactly once in the resume function. This matches the semantics of eager exactly. Approved by: https://github.com/williamwen42 ghstack dependencies: #134732, #133137, #135443, #135444 Pull Request resolved: https://github.com/pytorch/pytorch/pull/137114 Approved by: https://github.com/yanboliang	2024-10-07 18:55:26 +00:00
Yu, Guangye	d29094888b	Use torch.Stream&torch.Event for Dynamo capture (#134850 ) # Motivation This PR aims to solve the multiple inheritance problem. Pull Request resolved: https://github.com/pytorch/pytorch/pull/134850 Approved by: https://github.com/yf225, https://github.com/EikanWang	2024-10-02 14:15:33 +00:00
chilli	34cef1eaa7	Allow automatic dynamic shapes for closures and set current node properly in flexattention subgraph lowering (#137043 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/137043 Approved by: https://github.com/drisspg ghstack dependencies: #136826	2024-10-01 09:08:08 +00:00
Animesh Jain	289df45cee	Revert "[Dynamo] Trace enter/exit of TorchFunctionModes (#135422 )" (#136590 ) This reverts commit `7743149b2b`. Reverts * https://github.com/pytorch/pytorch/pull/135503 * https://github.com/pytorch/pytorch/pull/135502 * https://github.com/pytorch/pytorch/pull/135422 With this revert, the test below passes. Earlier, the getitem would stay as a getitem in the FX graph, but now fake tensor propagation fails, saying that `.item` is called. It seems that torch function is not being triggered during fake tensor propagation.
```
import torch
from torch.nn.attention.flex_attention import BlockMask, _mask_mod_signature, _score_mod_signature, flex_attention
from torch._inductor.lowering import make_pointwise, register_lowering
from torch._inductor.virtualized import ops
from torch.nn.attention.flex_attention import create_block_mask

torch.set_default_device('cuda')

flex_attention = torch.compile(flex_attention, dynamic=False)

prefix_lengths = torch.arange(8)

def prefix_lm(b, h, q, kv):
    return prefix_lengths[b] >= kv

mask = create_block_mask(prefix_lm, 8, None, 512, 512, _compile=True)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/136590 Approved by: https://github.com/Chillee	2024-09-25 21:10:43 +00:00
Guilherme Leobas	52c917b0ba	Optimize dict reconstruct to not codegen untouched values (#134876 ) PR changes how `reconstruct` is done for a ConstDict. As of today, it works as follows: (1) codegen(...) each pair of key/value (2) create a new dictionary to hold the new items (3) clear the original dictionary (4) update the original dict with the one created in (2). We do a micro-optimization in the generated bytecode to: - Only codegen the items that changed. - Only clear the original dictionary if a key was removed. Fixes: #133487 Pull Request resolved: https://github.com/pytorch/pytorch/pull/134876 Approved by: https://github.com/zou3519	2024-09-23 21:45:44 +00:00
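The two rules in the commit message above (write back only changed items, clear only when a key was removed) can be illustrated at the level of ordinary dictionaries. This is a minimal sketch, assuming a tracked copy of the dict is available; `reconstruct_in_place` is a hypothetical helper written for this note, whereas the actual change emits bytecode inside Dynamo rather than calling a function like this.

```python
# Hypothetical helper for illustration only; Dynamo generates bytecode for
# this step instead of calling a Python function.
def reconstruct_in_place(original, tracked):
    """Make `original` equal to `tracked`, touching as little as possible."""
    key_removed = any(key not in tracked for key in original)
    if key_removed:
        # A removed key forces a clear, so every item must be written back.
        original.clear()
        changed = dict(tracked)
    else:
        # Otherwise only write keys that are new or whose value was replaced.
        changed = {
            key: value
            for key, value in tracked.items()
            if key not in original or original[key] is not value
        }
    original.update(changed)
    return changed


d = {"a": 1, "b": 2}
written = reconstruct_in_place(d, {"a": 1, "b": 3, "c": 4})
print(written)  # {'b': 3, 'c': 4}
```

In the usage above the untouched entry `"a"` is never rewritten; only when a key disappears does the sketch fall back to clearing and rewriting everything.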
Siju Samuel	f1ad680818	[dynamo] Remove stream hardcoding in dynamo VariableBuilder (#131763 ) Fixes #ISSUE_NUMBER A recent change from PR #123487 used torch.cuda.Stream directly, which causes failures for other backends. This PR generalizes the stream handling for all backends, such as cuda/hpu/xpu. Pull Request resolved: https://github.com/pytorch/pytorch/pull/131763 Approved by: https://github.com/yanboliang, https://github.com/yf225	2024-09-18 22:32:34 +00:00
PyTorch MergeBot	3b5e2689a1	Revert "Optimize dict reconstruct to not codegen untouched values (#134876 )" This reverts commit `a1a57a424d`. Reverted https://github.com/pytorch/pytorch/pull/134876 on behalf of https://github.com/jeanschmidt due to new introduced test test_reconstruct.py::ReconstructTest::test_functional_call_reconstruct is breaking internally. @zou3519 may you help get those changes merged back to main? ([comment](https://github.com/pytorch/pytorch/pull/134876#issuecomment-2355697685))	2024-09-17 13:00:01 +00:00
Guilherme Leobas	a1a57a424d	Optimize dict reconstruct to not codegen untouched values (#134876 ) PR changes how `reconstruct` is done for a ConstDict. As of today, it works as follows: (1) codegen(...) each pair of key/value (2) create a new dictionary to hold the new items (3) clear the original dictionary (4) update the original dict with the one created in (2). We do a micro-optimization in the generated bytecode to: - Only codegen the items that changed. - Only clear the original dictionary if a key was removed. Fixes: #133487 Pull Request resolved: https://github.com/pytorch/pytorch/pull/134876 Approved by: https://github.com/zou3519	2024-09-14 23:25:28 +00:00
Michael Lazos	1b9daeb240	[Dynamo] Trace enter/exit of TorchFunctionModes (#135422 ) This PR implements tracing of `with` contexts with TorchFunction modes that have the default enter/exit behavior (i.e., pushing/popping the mode). Typically, the bytecode for a context manager looks like this during a graph break: 1. graph call 2. enter context 3. unsupported code 4. exit context 5. resume call. The resume fn structure is: 1. enter context 2. jump ... 3. exit context. The issue with torch function modes is that side effects will replay any mutations to the torch function stack performed during tracing. So, we do not need to enter and exit around the unsupported code in the original function (doing so would result in a duplicate torch function mode entry during execution of the unsupported code), and we don't need to enter again in the resume function (the mode that was pushed from the side effects bytecode would still be on the stack). So for torch function modes the structure of our output code is this: 1. graph call 2. mutate tf mode stack to replay mutations 4. unsupported code 5. on exception restore stack 6. resume function. Then our resume fn looks like this: 1. no-op enter torch function mode 2. jump 3. exit tf mode. To implement the no-op enter of the torch function mode, I added a torch function mode in polyfill that no-op enters but exits normally. This is needed because we still want to trace the with context in the resume function, and exit properly (the exit instructions will still be in the function, so we need to generate instructions to set up the context). Separately from the bytecode, dynamo also tracks contexts on the block stack, which is how the SETUP_* instructions are implemented. Naturally, at a graph break, we exit these block stacks to properly reset the contexts entirely, so that we can re-enter around the unsupported code soundly. 
However, once again, in the torch function mode case, in the event of a graph break we do not want to perform any exit side effects, because we want to preserve the state of the mode stack as is so that we will properly update the stack with the bytecode mentioned in the first section. If we exited here, dynamo would pop the mode off of the symbolic stack, and not update the true python torch function mode stack with the suffix bytecode. All in all, for torch function modes we enter exactly once, update the global torch function mode stack with side effects bytecode, re-read this stack when compiling the resume function, and exit exactly once in the resume function. This matches the semantics of eager exactly. Pull Request resolved: https://github.com/pytorch/pytorch/pull/135422 Approved by: https://github.com/williamwen42 ghstack dependencies: #134732, #133137, #135443, #135444	2024-09-14 18:52:22 +00:00
PyTorch MergeBot	f3180f0088	Revert "[Dynamo] Trace enter/exit of TorchFunctionModes (#135422 )" This reverts commit `7743149b2b`. Reverted https://github.com/pytorch/pytorch/pull/135422 on behalf of https://github.com/mlazos due to broke python test/quantization/pt2e/test_numeric_debugger.py TestNumericDebugger.test_re_export_preserve_handle modified yesterday ([comment](https://github.com/pytorch/pytorch/pull/134732#issuecomment-2350937008))	2024-09-14 10:02:55 +00:00

572 Commits