Commit Graph

633 Commits

Author SHA1 Message Date
Laith Sakka
0029259bdf Add view_simple as meta function for view, and avoid calling reshape_view_helper. (#154757)
Addresses https://github.com/pytorch/pytorch/issues/153303

Pull Request resolved: https://github.com/pytorch/pytorch/pull/154757
Approved by: https://github.com/bobrenjc93, https://github.com/leslie-fang-intel
2025-06-12 09:58:15 +00:00
Pian Pawakapan
8ad6197b46 [draft export] avoid storing intermediate real tensors in proxies (#154630)
Handles GC for non-strict draft export; GPU memory usage shouldn't be much more than eager mode + input tensors now.

While trying to do draft export CPU offloading, I found out that GC is feasible because, in non-strict mode, there are two places holding references to a `.real_tensor` attribute:
1) The FakeTensors in fake tensor prop. These are held by the actual variables in the model's forward call, so the real tensor gets GC-ed along with the fake one when the variable goes out of scope.
2) A clone of the fake tensor in 1) stored in `proxy.node.meta["val"]`, which was added in https://github.com/pytorch/pytorch/pull/150948. But we don't actually need to store these on intermediate values; the placeholders are enough for retracing/lowering.

By not storing the intermediate values in 2), the values in 1) should be GC-ed naturally, and real-tensor memory usage for non-strict should be close to eager computation.

Strict mode still OOMs; dynamo still holds these in variable tracking, and it's not clear how to GC those.
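
A rough sketch of the idea (illustrative only, not the actual patch; the helper name and the exact meta handling are assumptions):

```
import torch.fx

def strip_intermediate_real_tensors(gm: torch.fx.GraphModule) -> None:
    """Drop real tensors held by intermediate proxies so they can be GC-ed."""
    for node in gm.graph.nodes:
        if node.op == "placeholder":
            continue  # placeholders are enough for retracing/lowering
        val = node.meta.get("val")
        if val is not None and getattr(val, "real_tensor", None) is not None:
            val.real_tensor = None  # let the real tensor be collected
```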

Pull Request resolved: https://github.com/pytorch/pytorch/pull/154630
Approved by: https://github.com/angelayi, https://github.com/yushangdi
2025-06-12 01:18:57 +00:00
Oguz Ulgen
d1947a8707 Migrate from lru_cache to cache (#155613)
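
A minimal illustration of the pattern (the decorated function is hypothetical): `functools.cache` (Python 3.9+) is an unbounded cache equivalent to `functools.lru_cache(maxsize=None)`.

```
import functools

# Before
@functools.lru_cache(maxsize=None)
def compute_capability_old(device_index: int) -> str:
    return f"capability for device {device_index}"

# After: same behavior, less boilerplate
@functools.cache
def compute_capability(device_index: int) -> str:
    return f"capability for device {device_index}"
```
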
Pull Request resolved: https://github.com/pytorch/pytorch/pull/155613
Approved by: https://github.com/ezyang
ghstack dependencies: #155612
2025-06-11 19:44:18 +00:00
Colin Peppler
7b7cd56f5e [export] support linear & layer_norm unbacked (#155260)
## What
- use `definitely_contiguous_for_memory_format` instead of `is_contiguous` when it's fine to treat the tensor as non-contiguous if we hit a data-dependent error (DDE).
- use refs' `contiguous` over ATen's `contiguous`, because ATen's version will raise a DDE and stop tracing, while refs' version uses `definitely_contiguous_for_memory_format` and clones if there's a DDE (see the sketch below).
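
A rough sketch of the pattern (illustrative; the import path and signature of the helper are assumptions):

```
import torch
# assumed location of the helper
from torch._prims_common import definitely_contiguous_for_memory_format

def maybe_contiguous(t: torch.Tensor) -> torch.Tensor:
    # t.is_contiguous() guards and can raise a DDE on unbacked symbols.
    # Instead, ask whether the tensor is *definitely* contiguous; if we
    # can't tell, clone rather than failing tracing.
    if definitely_contiguous_for_memory_format(
        t, memory_format=torch.contiguous_format
    ):
        return t
    return t.clone(memory_format=torch.contiguous_format)
```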

## Example DDEs

- Fixed with `definitely_contiguous_for_memory_format` in `fast_binary_impl`
```
torch._dynamo.exc.UserError: Could not guard on data-dependent expression Eq((u0//387), 0) (unhinted: Eq((u0//387), 0)).  (Size-like symbols: u0)

Caused by: layer_norm = self.layer_norm(linear)  # caffe2/test/export/test_export.py:4566 in forward (_subclasses/fake_impls.py:1022 in fast_binary_impl)
```

- Fixed with `refs.contiguous` instead of calling ATen's contiguous (that would require a bigger rewrite in ATen)
```
  File "c10/core/TensorImpl.h", line 825, in torch::autograd::THPVariable_contiguous(_object*, _object*, _object*)
  File "c10/core/SymbolicShapeMeta.h", line 87, in c10::TensorImpl::is_contiguous_default(c10::MemoryFormat) const
  File "c10/core/SymbolicShapeMeta.cpp", line 250, in c10::SymbolicShapeMeta::init_is_contiguous() const

torch.fx.experimental.symbolic_shapes.GuardOnDataDependentSymNode: Could not guard on data-dependent expression Eq(128*((u0//387)), 0) (unhinted: Eq(128*((u0//387)), 0)).  (Size-like symbols: u0)

Caused by: (_refs/__init__.py:3302 in native_layer_norm)
```

- Fixed with `definitely_contiguous_for_memory_format` in ref's contiguous
```
torch.fx.experimental.symbolic_shapes.GuardOnDataDependentSymNode: Could not guard on data-dependent expression 387*((u0//387)) < 2 (unhinted: 387*((u0//387)) < 2).  (Size-like symbols: u0)

Caused by: (_prims_common/__init__.py:279 in is_contiguous)
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/155260
Approved by: https://github.com/laithsakka
ghstack dependencies: #155499
2025-06-11 16:47:34 +00:00
Brian Hirsh
6c05f2fca0 [test] use JK to force graph break on slow aliasing/mutation/dynamic_shape behavior (#155257)
Summary: test to unblock shampoo, needs cleanup

Test Plan:
CI

Rollback Plan:
steps:
  - jk.update:
      jk: pytorch/compiler:aliased_inputs_with_mutation_and_dyn_shapes_killswitch
      constant_bool: null
      consistent_pass_rate: null
      fractional_host_rollout: null
      sampling_rate: null
  - manual.note:
      content: Set it to false.

Reviewed By: c00w

Differential Revision: D76051868

Pull Request resolved: https://github.com/pytorch/pytorch/pull/155257
Approved by: https://github.com/c00w
2025-06-09 16:21:59 +00:00
Animesh Jain
db491825e0 [invoke_subgraph] Add logging (#155284)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/155284
Approved by: https://github.com/zou3519
ghstack dependencies: #155270
2025-06-07 11:31:53 +00:00
bobrenjc93
fc77269262 Add randint_like tensor overload for high (#154899)
Fixes #135664

Pull Request resolved: https://github.com/pytorch/pytorch/pull/154899
Approved by: https://github.com/StrongerXi
2025-06-06 15:48:00 +00:00
PyTorch MergeBot
5130ac64f4 Revert "Add randint_like tensor overload for high (#154899)"
This reverts commit 72fe1d5f42.

Reverted https://github.com/pytorch/pytorch/pull/154899 on behalf of https://github.com/seemethere due to Failing internal tests see https://fburl.com/diff/bai044ob ([comment](https://github.com/pytorch/pytorch/pull/154899#issuecomment-2942740661))
2025-06-05 04:54:05 +00:00
bobrenjc93
72fe1d5f42 Add randint_like tensor overload for high (#154899)
Fixes #135664

Pull Request resolved: https://github.com/pytorch/pytorch/pull/154899
Approved by: https://github.com/StrongerXi
ghstack dependencies: #154863
2025-06-04 03:37:09 +00:00
Ryan Guo
467235027c [AOTDispatch] Use the proper meta function for _amp_foreach_non_finite_check_and_unscale_ (#154930)
As title, this fixes part of #138412.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/154930
Approved by: https://github.com/zou3519
2025-06-03 18:18:40 +00:00
PyTorch MergeBot
0fab32290a Revert "[draft export] avoid storing intermediate real tensors in proxies (#154630)"
This reverts commit 5acb8d5080.

Reverted https://github.com/pytorch/pytorch/pull/154630 on behalf of https://github.com/malfet due to This still ooms, at least occasionally see 78624679a8/1 ([comment](https://github.com/pytorch/pytorch/pull/154630#issuecomment-2923759745))
2025-05-31 00:07:56 +00:00
Pian Pawakapan
5acb8d5080 [draft export] avoid storing intermediate real tensors in proxies (#154630)
Handles GC for non-strict draft export; GPU memory usage shouldn't be much more than eager mode + input tensors now.

While trying to do draft export CPU offloading, I found out that GC is feasible because, in non-strict mode, there are two places holding references to a `.real_tensor` attribute:
1) The FakeTensors in fake tensor prop. These are held by the actual variables in the model's forward call, so the real tensor gets GC-ed along with the fake one when the variable goes out of scope.
2) A clone of the fake tensor in 1) stored in `proxy.node.meta["val"]`, which was added in https://github.com/pytorch/pytorch/pull/150948. But we don't actually need to store these on intermediate values; the placeholders are enough for retracing/lowering.

By not storing the intermediate values in 2), the values in 1) should be GC-ed naturally, and real-tensor memory usage for non-strict should be close to eager computation.

Strict mode still OOMs; dynamo still holds these in variable tracking, and it's not clear how to GC those.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/154630
Approved by: https://github.com/angelayi, https://github.com/yushangdi
2025-05-30 21:06:55 +00:00
Aaron Orenstein
fc0135ca11 Re-enable FakeTensor caching for SymInts (#152662)
Summary:

This backs out D60320595 which itself turned off FakeTensor caching when a SymInt was present.

There have been a lot of dynamic shape fixes done this year and tests pass, so I'm assuming some of that work fixed what was breaking previously.

Test Plan: Reran the tests listed in T196779132 and they pass.

## Perf
### Instruction Counter Benchmark:
- 26% win on add_loop_eager_dynamic
- 13% win on add_loop_inductor_dynamic_gpu
### Perf Dashboard
Compilation latency wins across the board, especially strong on the dynamic tests (like cudagraphs_dynamic); for example, MobileBertForMaskedLM went from 66s -> 50s.

Differential Revision: [D75467694](https://our.internmc.facebook.com/intern/diff/D75467694)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/152662
Approved by: https://github.com/anijain2305
2025-05-30 17:23:36 +00:00
Yuanhao Ji
0a7eef140b Add torch.Tensor._make_wrapper_subclass to torch/_C/__init__.pyi (#154022)
Fixes #153790
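
For context, a minimal sketch of the call the stub now types (the subclass itself is hypothetical; real wrapper subclasses also implement `__torch_dispatch__`):

```
import torch

class ThinWrapper(torch.Tensor):
    @staticmethod
    def __new__(cls, inner: torch.Tensor):
        # Allocates a tensor-subclass instance that mirrors `inner`'s metadata
        # without owning its storage.
        return torch.Tensor._make_wrapper_subclass(
            cls,
            inner.size(),
            strides=inner.stride(),
            dtype=inner.dtype,
            device=inner.device,
            requires_grad=inner.requires_grad,
        )
```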

Pull Request resolved: https://github.com/pytorch/pytorch/pull/154022
Approved by: https://github.com/Skylion007
2025-05-27 14:10:00 +00:00
Laith Sakka
aaf5cc13d9 [EASY] use guard_or_false instead of gso in Meta converter (#154234)
This was added in https://github.com/pytorch/pytorch/pull/141659; the current change keeps the same intention:
"I do not want to fail here if I can't tell whether the size is zero or not."
I am not familiar enough with this code to know whether we need a runtime check here, but looking at the current
implementation, it seems that `guard_or_false` is appropriate to match the current behaviour and has the same effect as `guard_size_oblivious` here.
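
Roughly, the change swaps a size-oblivious guard for a non-throwing query (an illustrative sketch, not the actual call site):

```
from torch.fx.experimental.symbolic_shapes import guard_or_false

def is_definitely_zero(size) -> bool:
    # guard_or_false(expr) returns True only if expr can be proven True, and
    # returns False (instead of raising a data-dependent error) when it
    # cannot tell, e.g. for unbacked symints.
    return guard_or_false(size == 0)
```
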
Pull Request resolved: https://github.com/pytorch/pytorch/pull/154234
Approved by: https://github.com/bobrenjc93
ghstack dependencies: #154154, #154164, #154167, #154172
2025-05-26 21:59:52 +00:00
PyTorch MergeBot
3f64502c98 Revert "Re-enable FakeTensor caching for SymInts (#152662)"
This reverts commit 7d11c61c26.

Reverted https://github.com/pytorch/pytorch/pull/152662 on behalf of https://github.com/malfet due to Looks like it broke bunch of inductor tests, see 187d38185e/1 ([comment](https://github.com/pytorch/pytorch/pull/152662#issuecomment-2910293593))
2025-05-26 17:13:22 +00:00
Aaron Orenstein
7d11c61c26 Re-enable FakeTensor caching for SymInts (#152662)
Summary:

This backs out D60320595 which itself turned off FakeTensor caching when a SymInt was present.

There have been a lot of dynamic shape fixes done this year and tests pass, so I'm assuming some of that work fixed what was breaking previously.

Test Plan: Reran the tests listed in T196779132 and they pass.

## Perf
### Instruction Counter Benchmark:
- 26% win on add_loop_eager_dynamic
- 13% win on add_loop_inductor_dynamic_gpu
### Perf Dashboard
Compilation latency wins across the board, especially strong on the dynamic tests (like cudagraphs_dynamic); for example, MobileBertForMaskedLM went from 66s -> 50s.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/152662
Approved by: https://github.com/anijain2305
2025-05-26 04:17:56 +00:00
Aaron Orenstein
4b7abce6a4 Fix fake tensor caching when output has unbacked (#153034)
We handle fake tensor caching in two ways:
1. If the inputs have no symbols (SymInt, etc) then we cache on the FakeTensorMode.
2. If the inputs have symbols then we cache on the ShapeEnv.

This way the symbols in the inputs and outputs are associated with the guards in place at the time of the call.

However - it's possible to have an op where there are no symbols in the inputs but there is an unbacked symbol in the output.  In this case we shouldn't cache at all because what would that really mean?

So this PR changes the caching behavior so that if there's a symbol in the output which doesn't come in some way from the input then we refuse to cache that op.

Added a test which checks for this case.
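
An illustrative sketch of the new rule (the helper and its names are hypothetical):

```
def ok_to_cache(input_symbols: set, output_symbols: set) -> bool:
    # If the output carries a symbol (e.g. an unbacked SymInt) that doesn't
    # come from the inputs in some way, there is nothing meaningful to key
    # the cache on, so refuse to cache this op.
    return output_symbols <= input_symbols
```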

While in there I also did a couple other related changes:
1. Added negative caching - if we see that an (op, args) failed to cache previously we don't even bother trying to cache it again.
2. Reworked the inner behavior of _cached_dispatch_impl a little to make it more clear which bits we expect to be able to throw _BypassDispatchCache and add some comments.

The latest version of this also:
1. Addresses the problem that caused #153891.
    The issue was that, with caching, ops are required to support `__eq__`. Unfortunately, _RecordFunction is minimalistic and doesn't support that, so in the off chance that two keys hashed to the same value, the `__eq__` check would raise an exception.

    Apparently this was much more common on macOS, where memory patterns end up with more reuse (so object IDs are the same and give you the same hash value for objects that use pointer hashing).

    Tested locally on macOS, where running
```
python test/inductor/test_torchinductor.py GPUTests
```
was pretty much guaranteed to fail (at least for me) somewhere around test 100-200 and passed all 800 tests after this change.

Another way to test this is to run the inductor tests with `torch._subclasses.fake_tensor._DispatchCacheKey.__hash__` monkey-patched to return a constant (causing all values to hash-collide), but this can't really be checked in since it turns the cache lookup into an O(n) scan, which takes a very long time to run through all the tests...

2. Folds in #153780 to ensure that exceptions raised from the op don't include the context from the cache key bypass.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/153034
Approved by: https://github.com/masnesral, https://github.com/tugsbayasgalan
2025-05-23 15:03:31 +00:00
PyTorch MergeBot
1075bb37d3 Revert "Fix fake tensor caching when output has unbacked (#153034)"
This reverts commit cb5f31a4a1.

Reverted https://github.com/pytorch/pytorch/pull/153034 on behalf of https://github.com/malfet due to Seems to have introduced flakiness in MacOS inductor tests, see https://github.com/pytorch/pytorch/issues/153891 ([comment](https://github.com/pytorch/pytorch/pull/153034#issuecomment-2893059329))
2025-05-20 06:02:38 +00:00
PyTorch MergeBot
9849c79fa2 Revert "FakeTensorMode dispatch shouldn't include bypass in exception context (#153780)"
This reverts commit aa84c037f0.

Reverted https://github.com/pytorch/pytorch/pull/153780 on behalf of https://github.com/malfet due to Reverting to clearly revert https://github.com/pytorch/pytorch/pull/153034, that seems to have introduced flakiness in MacOS inductor tests, see https://github.com/pytorch/pytorch/issues/153891 ([comment](https://github.com/pytorch/pytorch/pull/153780#issuecomment-2893053304))
2025-05-20 05:59:42 +00:00
Aaron Orenstein
aa84c037f0 FakeTensorMode dispatch shouldn't include bypass in exception context (#153780)
In the FakeTensor cache when we get a bypass exception while computing the cache key (call this exc_1) we need to dispatch to the original operation.

It's possible for the dispatch to the original operation to get its own exception which we want to bubble up to the caller (call this exc_2).

If we directly dispatch from within the handler for exc_1 then exc_2 will have a `__context__` of exc_1 - which can cause deviations between cached and non-cached behavior - so we need to be a bit careful when we call the dispatch.
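
In Python terms, the fix is to perform the uncached dispatch outside the `except` block, so that exc_2 does not chain exc_1 as its `__context__` (an illustrative sketch; the helper names are hypothetical):

```
def cached_dispatch(op, args):
    key = None
    try:
        key = make_cache_key(op, args)  # may raise a bypass exception (exc_1)
    except BypassDispatchCache:
        pass
    if key is None:
        # Dispatching here, outside the except block, means any exception the
        # op raises (exc_2) will not carry exc_1 in its __context__.
        return dispatch_original(op, args)
    return lookup_or_compute(key, op, args)
```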

Testing:
test_aotdispatch.py::TestAOTExport::test_aot_export_predispatch_outdtype fails before this change and passes after.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/153780
Approved by: https://github.com/oulgen
2025-05-18 17:21:46 +00:00
Aaron Orenstein
cb5f31a4a1 Fix fake tensor caching when output has unbacked (#153034)
We handle fake tensor caching in two ways:
1. If the inputs have no symbols (SymInt, etc) then we cache on the FakeTensorMode.
2. If the inputs have symbols then we cache on the ShapeEnv.

This way the symbols in the inputs and outputs are associated with the guards in place at the time of the call.

However - it's possible to have an op where there are no symbols in the inputs but there is an unbacked symbol in the output.  In this case we shouldn't cache at all because what would that really mean?

So this PR changes the caching behavior so that if there's a symbol in the output which doesn't come in some way from the input then we refuse to cache that op.

Added a test which checks for this case.

While in there I also did a couple other related changes:
1. Added negative caching - if we see that an (op, args) failed to cache previously we don't even bother trying to cache it again.
2. Reworked the inner behavior of _cached_dispatch_impl a little to make it more clear which bits we expect to be able to throw _BypassDispatchCache and add some comments.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/153034
Approved by: https://github.com/masnesral, https://github.com/tugsbayasgalan
2025-05-15 23:18:52 +00:00
Xuehai Pan
f7a5aa1d8d [torchgen] Refactor and simplify gen_pyi.py to use Generic TypeAlias (PEP 585) and Union Type (PEP 604) (#150727)
https://github.com/pytorch/pytorch/pull/129001#discussion_r1645126801 is the motivation for the whole stack of PRs. In `torch/__init__.py`, `torch._C.Type` shadows `from typing import Type`, and there is no type stub for `torch._C.Type` in `torch/_C/__init__.pyi`. So we need to use `from typing import Type as _Type`. After enabling [Generic TypeAlias (PEP 585)](https://peps.python.org/pep-0585) in the `.pyi` type stub files, we can use `type` instead of `typing.Type` or `from typing import Type as _Type`.

------

- [Generic TypeAlias (PEP 585)](https://peps.python.org/pep-0585): e.g. `typing.List[T] -> list[T]`, `typing.Dict[KT, VT] -> dict[KT, VT]`, `typing.Type[T] -> type[T]`.
- [Union Type (PEP 604)](https://peps.python.org/pep-0604): e.g. `Union[X, Y] -> X | Y`, `Optional[X] -> X | None`, `Optional[Union[X, Y]] -> X | Y | None`.
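
For example, a stub signature rewritten under these two PEPs might change like this (the function is hypothetical):

```
# Before: typing generics and Union/Optional
# def foo(x: Optional[List[int]], y: Dict[str, Type[int]]) -> Union[int, str]: ...

# After: builtin generics (PEP 585) and | unions (PEP 604)
def foo(x: list[int] | None, y: dict[str, type[int]]) -> int | str: ...
```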

Note that in `.pyi` stub files, we do not need `from __future__ import annotations`. So this PR does not violate issue #117449:

- #117449

------

Pull Request resolved: https://github.com/pytorch/pytorch/pull/150727
Approved by: https://github.com/aorenste
ghstack dependencies: #150726
2025-05-15 09:36:42 +00:00
Aaron Gokaslan
3555ebb63d [BE]: Update ruff to 0.11.8 (#153249)
Fixes a ton of false negatives throughout the codebase. Ruff also properly validates NOQA comments now, and most of the changes fix typos there or remove file-wide flake8 suppressions that were also silencing ruff issues.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/153249
Approved by: https://github.com/cyyever, https://github.com/albanD, https://github.com/seemethere
2025-05-12 18:30:52 +00:00
PyTorch MergeBot
e6dccb036e Revert "Fix fake tensor caching when output has unbacked (#153034)"
This reverts commit 4f425a0397.

Reverted https://github.com/pytorch/pytorch/pull/153034 on behalf of https://github.com/malfet due to Broke pr_time_benchmarks, see d07fbd41e3/1 ([comment](https://github.com/pytorch/pytorch/pull/153034#issuecomment-2868100487))
2025-05-09 23:43:56 +00:00
Aaron Orenstein
4f425a0397 Fix fake tensor caching when output has unbacked (#153034)
We handle fake tensor caching in two ways:
1. If the inputs have no symbols (SymInt, etc) then we cache on the FakeTensorMode.
2. If the inputs have symbols then we cache on the ShapeEnv.

This way the symbols in the inputs and outputs are associated with the guards in place at the time of the call.

However - it's possible to have an op where there are no symbols in the inputs but there is an unbacked symbol in the output.  In this case we shouldn't cache at all because what would that really mean?

So this PR changes the caching behavior so that if there's a symbol in the output which doesn't come in some way from the input then we refuse to cache that op.

Added a test which checks for this case.

While in there I also did a couple other related changes:
1. Added negative caching - if we see that an (op, args) failed to cache previously we don't even bother trying to cache it again.
2. Reworked the inner behavior of _cached_dispatch_impl a little to make it more clear which bits we expect to be able to throw _BypassDispatchCache and add some comments.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/153034
Approved by: https://github.com/masnesral, https://github.com/tugsbayasgalan
2025-05-09 21:17:54 +00:00
Pian Pawakapan
4166373908 [dynamic shapes] guard_or_false for infer_size (#152146)
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/152146
Approved by: https://github.com/laithsakka
2025-05-08 21:27:22 +00:00
Pian Pawakapan
5521e6b671 [export] support SymInt minlength for torch.bincount() (#152497)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/152497
Approved by: https://github.com/angelayi
2025-05-01 00:45:58 +00:00
zhxchen17
a34c28e0d2 [dynamo] Add guard serialization for tensor matches. (#151318)
This is a proof-of-concept of how we could serialize a guard and deserialize it back from the bytes.

The main behavioral change introduced in this diff is on CheckFunctionManager:

```
check_fn_manager = CheckFunctionManager(code, output_graph, guards_serialization_mode="save")

guards_state: bytes = check_fn_manager.guards_state
```

Once `guards_serialization_mode` is set to `save`, CheckFunctionManager will return an additional `bytes` object called `guards_state`, which should contain all the information needed to deserialize the guards later.

When we load back the guards state, we set `guards_serialization_mode` to `load`:

```
output_graph_state = pickle.loads(guards_state)
check_fn_manager = CheckFunctionManager(code, output_graph_state, guards_serialization_mode="load")
```

# TENSOR_MATCH

Since we have many types of guards to support, we will break the work into small diffs instead of a single diff that supports every guard.

We kick off the work with TENSOR_MATCH in this diff.

# Testing

For each type of guard, we will test it as follows:
1. Use guard_filter_fn to select 1 type of guard each time.
2. Call InstructionTranslator directly on an example function to get OutputGraph and CheckFunctionManager (reference guard manager)
3. Serialize->deserialize the output graph state and re-build the guards with a new CheckFunctionManager (loaded guard manager)
4. Throw a set of example inputs at both the reference and loaded guard managers to see if their behaviors match.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/151318
Approved by: https://github.com/jansel, https://github.com/anijain2305
2025-04-25 14:16:23 +00:00
PyTorch MergeBot
b1d055fd6a Revert "[dynamo] Add guard serialization for tensor matches. (#151318)"
This reverts commit 81c4369d81.

Reverted https://github.com/pytorch/pytorch/pull/151318 on behalf of https://github.com/zhxchen17 due to macos test failing ([comment](https://github.com/pytorch/pytorch/pull/151318#issuecomment-2828638168))
2025-04-24 19:22:45 +00:00
zhxchen17
81c4369d81 [dynamo] Add guard serialization for tensor matches. (#151318)
This is a proof-of-concept of how we could serialize a guard and deserialize it back from the bytes.

The main behavioral change introduced in this diff is on CheckFunctionManager:

```
check_fn_manager = CheckFunctionManager(code, output_graph, guards_serialization_mode="save")

guards_state: bytes = check_fn_manager.guards_state
```

Once `guards_serialization_mode` is set to `save`, CheckFunctionManager will return an additional `bytes` object called `guards_state`, which should contain all the information needed to deserialize the guards later.

When we load back the guards state, we set `guards_serialization_mode` to `load`:

```
output_graph_state = pickle.loads(guards_state)
check_fn_manager = CheckFunctionManager(code, output_graph_state, guards_serialization_mode="load")
```

# TENSOR_MATCH

Since we have many types of guards to support, we will break the work into small diffs instead of a single diff that supports every guard.

We kick off the work with TENSOR_MATCH in this diff.

# Testing

For each type of guard, we will test it as follows:
1. Use guard_filter_fn to select 1 type of guard each time.
2. Call InstructionTranslator directly on an example function to get OutputGraph and CheckFunctionManager (reference guard manager)
3. Serialize->deserialize the output graph state and re-build the guards with a new CheckFunctionManager (loaded guard manager)
4. Throw a set of example inputs at both the reference and loaded guard managers to see if their behaviors match.

Differential Revision: [D72987485](https://our.internmc.facebook.com/intern/diff/D72987485/)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/151318
Approved by: https://github.com/jansel, https://github.com/anijain2305
2025-04-24 18:07:01 +00:00
Animesh Jain
9c1bc9ce46 [fake tensor] Cache None, integer and SymInts in the output (#151961)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/151961
Approved by: https://github.com/bdhirsh, https://github.com/zou3519
ghstack dependencies: #151409, #151633, #151477, #151957
2025-04-24 16:44:45 +00:00
Animesh Jain
d743a7bd85 [invoke_subgraph] Cache fake tensor if no unbacked symint in the output (#151957)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/151957
Approved by: https://github.com/zou3519, https://github.com/bdhirsh
ghstack dependencies: #151409, #151633, #151477
2025-04-24 14:17:22 +00:00
Animesh Jain
1d73b644a8 [fake tensor cache] Support index with non bool/int8 indices (#151477)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/151477
Approved by: https://github.com/zou3519, https://github.com/bdhirsh
ghstack dependencies: #151409, #151633
2025-04-24 13:48:18 +00:00
Animesh Jain
41285f26e4 [invoke_subgraph][fake tensor] Add finalizer on subgraph instead of the functionalize ctx wrapper (#151633)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/151633
Approved by: https://github.com/zou3519
ghstack dependencies: #151409
2025-04-24 13:32:08 +00:00
PyTorch MergeBot
9344da8bd1 Revert "[fake tensor cache] Support index with non bool/int8 indices (#151477)"
This reverts commit bdb34f55a0.

Reverted https://github.com/pytorch/pytorch/pull/151477 on behalf of https://github.com/wdvr due to reverting confusing ghstack state ([comment](https://github.com/pytorch/pytorch/pull/151477#issuecomment-2825023953))
2025-04-23 17:30:27 +00:00
PyTorch MergeBot
348272e67e Revert "[invoke_subgraph][fake tensor] Add finalizer on subgraph instead of the functionalize ctx wrapper (#151633)"
This reverts commit 02dd096e51.

Reverted https://github.com/pytorch/pytorch/pull/151633 on behalf of https://github.com/wdvr due to reverting confusing ghstack state ([comment](https://github.com/pytorch/pytorch/pull/151633#issuecomment-2825007363))
2025-04-23 17:23:23 +00:00
Shangdi Yu
efdcc981d0 Back out "Do not propagate real tensor in extern kernel" (#151813)
Summary:
D73002775 breaks aot_compile for many draft exported models on PT2I dashboard. Revert.

Example error msg:

```
OrderedSet([]) >= OrderedSet([u1185, u1186, u1187]) (inductor >= fx)
fx node is: %embedding_bag_byte_prepack : [num_users=4] = call_function[target=torch.ops.quantized.embedding_bag_byte_prepack.default](args = (%view_10,), kwargs = {})
new operations are:
```

Differential Revision: D73381032

Pull Request resolved: https://github.com/pytorch/pytorch/pull/151813
Approved by: https://github.com/angelayi, https://github.com/zou3519
2025-04-21 22:54:03 +00:00
Animesh Jain
02dd096e51 [invoke_subgraph][fake tensor] Add finalizer on subgraph instead of the functionalize ctx wrapper (#151633)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/151633
Approved by: https://github.com/zou3519
ghstack dependencies: #151330, #151256, #151357, #151477
2025-04-18 21:23:21 +00:00
Shangdi Yu
931bd05560 Do not propagate real tensor in extern kernel (#151377)
Summary: See internal Diff for more details.

In ExternKernel, the FakeTensors do not have associated real tensors, because they are just created from ir.Node's shape and stride.

Test Plan:
```
buck run fbcode//mode/dev-nosan //caffe2/test/inductor:test_aot_inductor -- -r aoti_data_dependent_ex

buck2 run mode/dev-nosan  fbcode//caffe2/test/inductor:aot_inductor_arrayref_cpu -- -r data_dependent_extern_kernel_op
```

Differential Revision: D73002775

Pull Request resolved: https://github.com/pytorch/pytorch/pull/151377
Approved by: https://github.com/angelayi
2025-04-18 17:28:13 +00:00
Animesh Jain
bdb34f55a0 [fake tensor cache] Support index with non bool/int8 indices (#151477)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/151477
Approved by: https://github.com/zou3519, https://github.com/bdhirsh
ghstack dependencies: #151330, #151256, #151357
2025-04-17 21:51:08 +00:00
angelayi
d5dda82586 [export] Integrate meta kernel generation with draft-export (#150809)
If a custom operator does not have a fake impl, draft-export currently uses real-tensor propagation to get an output for the operator and continue tracing. However, if we retrace the exported model using `ep.run_decompositions` or `export`, or run the exported program with fake tensors, we'll still fail because there's no fake impl.

With this PR, after draft-export we generate an operator profile for each operator call that we encounter and store it on the report attached to the exported program (`ep._report.op_profiles`). Users can then use `torch._library.fake_profile.register_fake_profile` to temporarily generate and register a fake impl based on these operator profiles. This way, future fake-tensor retracing will work.

The workflow would look something like:
```python
class M(torch.nn.Module):
    def forward(self, a, b):
        res = torch.ops.mylib.foo8(a, b)  # no fake impl
        return res

ep = export(M(), (torch.ones(3, 4), torch.ones(3, 4)))  # this fails because there's no fake impl
ep = draft_export(M(), (torch.ones(3, 4), torch.ones(3, 4)))

ep.run_decompositions()  # this fails because there's no fake impl
# this registers fake impls based on the profiles
with torch._library.fake_profile.register_fake_profile(ep._report.op_profiles):
    decomp = ep.run_decompositions()  # this works

new_inp = (
    torch.ones(2, 3, 4),
    torch.ones(2, 3, 4),
)
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/150809
Approved by: https://github.com/zou3519
2025-04-17 20:52:31 +00:00
angelayi
7deed1946f Fix assert_tensor_meta (#150808)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/150808
Approved by: https://github.com/pianpwk
ghstack dependencies: #150806, #150807
2025-04-14 19:28:54 +00:00
Shangdi Yu
92e81cf41a Add real_tensor to the FakeTensor in node.meta["val"] (#150948)
Summary: We need real_tensor on the FakeTensor in node.meta["val"] in order to aot_compile the draft exported programs. Otherwise, we cannot propagate real tensors even when fake_mode.propagate_real_tensors = True.

This also fixes real tensor propagation in `run_decomposition()`.

Test Plan:
```
 buck2 run @mode/dev-nosan  caffe2/test:test_export -- -r test_dedup_data_dependent_failure
```

Differential Revision: D72732714

Pull Request resolved: https://github.com/pytorch/pytorch/pull/150948
Approved by: https://github.com/angelayi
2025-04-10 00:11:46 +00:00
Shangdi Yu
cfab04d01b Fix aten.div type promotion for FakeTensor (#150874)
Summary:
When we divide a FakeTensor by an integer using the fast op implementation, the type promotion should be `ELEMENTWISE_TYPE_PROMOTION_KIND.INT_TO_FLOAT`, so that dividing an int FakeTensor by an integer yields a float.

```
from torch._subclasses.fake_impls import get_fast_op_impls

FAST = get_fast_op_impls()
fast_div = FAST[torch.ops.aten.div.Tensor]
fast_div(fake_tensor, some_int)
```

Test Plan:
```
python test/test_fake_tensor.py -k test_fast_div
```

Differential Revision: D72667430

Pull Request resolved: https://github.com/pytorch/pytorch/pull/150874
Approved by: https://github.com/angelayi
2025-04-09 18:52:01 +00:00
Shangdi Yu
51da241c0a [aoti] Fix cannot determine truth value of Relation error when propagating unbacked symint in lowering (#150570)
Summary: Fix the "cannot determine truth value of Relation" error when propagating unbacked symints in lowering.

Test Plan:
```
buck run fbcode//mode/dev-nosan //caffe2/test/inductor:test_aot_inductor -- -r aoti_runtime_asserts
```

Differential Revision: D72331070

Pull Request resolved: https://github.com/pytorch/pytorch/pull/150570
Approved by: https://github.com/angelayi, https://github.com/henryoier
2025-04-03 20:06:15 +00:00
Pian Pawakapan
90ddb33141 [export] specialize for aten.to (#149235)
Changes decomposition behavior of `aten.to` to respect the aliasing/non-aliasing behavior in eager, and to specialize to the input/conversion dtype & device.

Before this change: we always decompose `aten.to` into `_to_copy`, regardless of aliasing behavior. This leads us to ban mutations on the result of `_to_copy` when aliased, since we can't guarantee correct program semantics. This meant users had to explicitly call `.clone()` before mutating. In the special cases where we don't ban mutations (e.g. dtype conversion), we add runtime assertions on the input & conversion dtype/devices in the decomposed program (see https://github.com/pytorch/pytorch/pull/142420).

After this change: we decompose to the aliasing/non-aliasing behavior that matches eager, allowing mutations in all cases. We also add dtype/device assertions for all `aten.to` ops, starting in the pre-dispatch graph, basically specializing the program to the dtype/devices.
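
For example (illustrative), in eager mode `.to` aliases its input when no conversion is needed:

```
import torch

x = torch.ones(3, dtype=torch.float32)
y = x.to(torch.float32)  # eager: no conversion needed, so y aliases x
y.add_(1)                # mutates x as well

# Before this change, export decomposed .to into _to_copy and banned the
# mutation above, requiring an explicit .clone() first. After this change,
# the decomposition matches eager aliasing and the program is specialized
# (with runtime asserts) to the input/conversion dtype and device.
```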

Differential Revision: D71229547

Pull Request resolved: https://github.com/pytorch/pytorch/pull/149235
Approved by: https://github.com/tugsbayasgalan
2025-04-03 05:20:10 +00:00
Animesh Jain
61ebe999cc [invoke_subgraph] Do not cache fake tensors for AOTDispatcher first pass (#150450)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/150450
Approved by: https://github.com/zou3519
ghstack dependencies: #150082
2025-04-02 02:31:54 +00:00
angelayi
5e34758cef [invoke_subgraph] Support unbacked (#149298)
Differential Revision: [D71420641](https://our.internmc.facebook.com/intern/diff/D71420641)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/149298
Approved by: https://github.com/zou3519
2025-03-31 17:25:09 +00:00
pralay
a9ee797e41 added fake tensor support for foreach_copy (#149127)
Fixes #149111
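
A minimal repro-style sketch of what should now work (illustrative, assuming fake tensors created under `FakeTensorMode`):

```
import torch
from torch._subclasses.fake_tensor import FakeTensorMode

with FakeTensorMode():
    dst = [torch.empty(2, 3), torch.empty(4)]
    src = [torch.ones(2, 3), torch.ones(4)]
    # Previously this had no fake-tensor support; it now fake-propagates.
    torch._foreach_copy_(dst, src)
```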

Pull Request resolved: https://github.com/pytorch/pytorch/pull/149127
Approved by: https://github.com/jansel, https://github.com/jeromean
2025-03-27 09:26:23 +00:00