Commit Graph

2431 Commits

Author SHA1 Message Date
Laith Sakka
39df901b2a introduce definitely_contiguous and use it for reshape and tensor meta data computation. (#153432)
When a tensor has unbacked symbols it can be general enough to represent both contiguous and non-contiguous tensors; in that case we cannot really evaluate is_contiguous. In many places in the codebase we check is_contiguous to take a fast path, but the general path usually works for both contiguous and non-contiguous tensors, so in those cases we probably want to use the definitely_contiguous API instead.

This PR applies that to reshape and to tensor metadata computation: the metadata now has an attribute saying the tensor is contiguous only when it is always contiguous, i.e. we store it only if definitely_contiguous is true.
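A minimal sketch of the intended pattern, assuming a guard_or_false-style helper stands in for the definitely_contiguous check (names here are illustrative, not the exact PR API):

```python
from torch.fx.experimental.symbolic_shapes import guard_or_false

def reshape_path(t, is_contiguous_expr, shape):
    # "definitely contiguous": guard_or_false returns False whenever the
    # (possibly symbolic) condition cannot be decided, so the fast path is
    # taken only when contiguity is provable.
    if guard_or_false(is_contiguous_expr):
        return t.view(shape)      # fast path
    return t.reshape(shape)       # general path: correct either way
```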

Pull Request resolved: https://github.com/pytorch/pytorch/pull/153432
Approved by: https://github.com/bobrenjc93
2025-05-28 03:41:26 +00:00
bobrenjc93
e79790e14b [ez] add docblock for _sympy_from_args (#154376)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/154376
Approved by: https://github.com/Skylion007
ghstack dependencies: #154374, #154375
2025-05-27 23:43:13 +00:00
bobrenjc93
4fd8a54a41 [ez] add docblock for is_accessor_node (#154375)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/154375
Approved by: https://github.com/Skylion007, https://github.com/pianpwk
ghstack dependencies: #154374
2025-05-27 19:47:32 +00:00
bobrenjc93
70fbd5e08c [ez] Add docblock for resolve_unbacked_bindings (#154374)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/154374
Approved by: https://github.com/Skylion007, https://github.com/pianpwk
2025-05-27 17:05:49 +00:00
PyTorch MergeBot
11a51a11af Revert "introduce definitely_contiguous and use it for reshape and tensor meta data computation. (#153432)"
This reverts commit 5c6d7caaaa.

Reverted https://github.com/pytorch/pytorch/pull/153432 on behalf of https://github.com/malfet due to Looks like it broke flex attention tests, see https://hud.pytorch.org/hud/pytorch/pytorch/main/1?per_page=50&name_filter=g6.4xlarge&mergeEphemeralLF=true ([comment](https://github.com/pytorch/pytorch/pull/153432#issuecomment-2912562570))
2025-05-27 13:42:34 +00:00
Laith Sakka
5c6d7caaaa introduce definitely_contiguous and use it for reshape and tensor meta data computation. (#153432)
When a tensor has unbacked symbols it can be general enough to represent both contiguous and non-contiguous tensors; in that case we cannot really evaluate is_contiguous. In many places in the codebase we check is_contiguous to take a fast path, but the general path usually works for both contiguous and non-contiguous tensors, so in those cases we probably want to use the definitely_contiguous API instead.

This PR applies that to reshape and to tensor metadata computation: the metadata now has an attribute saying the tensor is contiguous only when it is always contiguous, i.e. we store it only if definitely_contiguous is true.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/153432
Approved by: https://github.com/bobrenjc93
2025-05-27 08:54:31 +00:00
Laith Sakka
1da2cc52bc [EASY] remove guard_size_oblivious from is_nonzero proxy call check (#154164)
This was added in https://github.com/pytorch/pytorch/pull/149637; torch._check can handle unbacked symbols, so there is no need for size-oblivious reasoning here.

Note this does not make is_nonzero unbacked-friendly, but that is a different story. I ran the test added in https://github.com/pytorch/pytorch/pull/149637 for verification.
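A small sketch of the point above, assuming the usual torch._check call (the condition and message here are illustrative):

```python
import torch

def check_single_element(t):
    # torch._check accepts conditions involving unbacked symbols, so no
    # size-oblivious wrapper is needed around the condition itself.
    torch._check(t.numel() == 1, lambda: f"expected exactly one element, got {t.numel()}")
```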

Pull Request resolved: https://github.com/pytorch/pytorch/pull/154164
Approved by: https://github.com/aorenste, https://github.com/bobrenjc93
ghstack dependencies: #154154
2025-05-26 21:59:29 +00:00
Laith Sakka
f8a2998832 [EASY] used guard_or_false instead of guard_sizes_oblivious in pointless_view (#154154)
The change is direct and clear: the optimization removes pointless_view iff all sizes are the same; if not, we want to return False, and there is no need for size-oblivious reasoning.

This was added in https://github.com/pytorch/pytorch/pull/139136; the existing tests added in that PR were run.
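A hedged sketch of the size comparison described above (helper and argument names are illustrative, not the actual pass code):

```python
from torch.fx.experimental.symbolic_shapes import guard_or_false

def view_is_pointless(input_sizes, view_sizes):
    # Remove the view only when every size provably matches; if any comparison
    # is undecidable (e.g. unbacked symbols), guard_or_false returns False and
    # the view is kept. Sizes may be plain ints or SymInts.
    return len(input_sizes) == len(view_sizes) and all(
        guard_or_false(a == b) for a, b in zip(input_sizes, view_sizes)
    )
```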

Pull Request resolved: https://github.com/pytorch/pytorch/pull/154154
Approved by: https://github.com/bobrenjc93
2025-05-26 21:59:21 +00:00
PyTorch MergeBot
3f64502c98 Revert "Re-enable FakeTensor caching for SymInts (#152662)"
This reverts commit 7d11c61c26.

Reverted https://github.com/pytorch/pytorch/pull/152662 on behalf of https://github.com/malfet due to Looks like it broke bunch of inductor tests, see 187d38185e/1 ([comment](https://github.com/pytorch/pytorch/pull/152662#issuecomment-2910293593))
2025-05-26 17:13:22 +00:00
Aaron Orenstein
7d11c61c26 Re-enable FakeTensor caching for SymInts (#152662)
Summary:

This backs out D60320595 which itself turned off FakeTensor caching when a SymInt was present.

There has been a lot of dynamic shape fixes done this year and tests pass so I'm assuming some of that work fixed what was breaking previously.

Test Plan: Reran the tests listed in T196779132 and they pass.

## Perf
### Instruction Counter Benchmark:
- 26% win on add_loop_eager_dynamic
- 13% win on add_loop_inductor_dynamic_gpu
### Perf Dashboard
Compilation Latency wins across the board but especially strong on the dynamic tests (like cudagraphs_dynamic) - for example MobileBertForMaskedLM went from 66s -> 50s.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/152662
Approved by: https://github.com/anijain2305
2025-05-26 04:17:56 +00:00
bobrenjc93
53ecb8159a Introduce statically_known_false (#154291)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/154291
Approved by: https://github.com/mengluy0125
2025-05-24 14:23:55 +00:00
Aaron Orenstein
6503b4a96e Update to using mypy 1.15 (#154054)
The BC break isn't real - mypy decided to start complaining about the way we were typing that function.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/154054
Approved by: https://github.com/Skylion007
2025-05-24 04:30:57 +00:00
Pian Pawakapan
a19f2cdf29 [draft export] skip when no LOC found (#154190)
Couldn't repro error, but verified fix with @ColinPeppler

Pull Request resolved: https://github.com/pytorch/pytorch/pull/154190
Approved by: https://github.com/ColinPeppler
2025-05-24 02:29:34 +00:00
Laith Sakka
9e089bb5b6 change guard_or impl for better perf and simplicity (#153674)
PR time benchmarks have been showing regressions as we move to guard_or_false; the reason is that the previous implementation did not cache.
The new approach propagates the fallback value to eval and returns it, allowing eval to cache and reducing spammy logs and complexity.
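A rough, self-contained sketch of the caching idea (illustrative only; the real change lives inside ShapeEnv's evaluation path):

```python
from functools import lru_cache

class DataDependentError(Exception):
    """Stand-in for an unresolved data-dependent expression."""

def _evaluate(expr):
    # Stand-in for symbolic evaluation; raises when the expression cannot be decided.
    if expr is None:
        raise DataDependentError
    return bool(expr)

@lru_cache(maxsize=None)
def _eval_with_fallback(expr, fallback):
    # The fallback is part of the call (and thus the cache key), so undecidable
    # expressions are cached as their fallback instead of raising repeatedly.
    try:
        return _evaluate(expr)
    except DataDependentError:
        return fallback

def guard_or_false(expr):
    return _eval_with_fallback(expr, False)

def guard_or_true(expr):
    return _eval_with_fallback(expr, True)
```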

Pull Request resolved: https://github.com/pytorch/pytorch/pull/153674
Approved by: https://github.com/bobrenjc93
2025-05-23 15:24:28 +00:00
Autin Mitra
5623d30228 [Minimizer] Gracefully exit when there is no discrepancy in block mode (#154076)
Summary:
Previously, when there was no discrepancy in results for block mode, net_min_base would throw an out-of-bounds (OOB) error.

This occurs because _block_traverse_impl returns an OOB index after exhausting subgraphs all the way down to a single node.

There is also an issue where we may get an unsound subgraph (i.e. mark an earlier node as the "end" even if the correct end is later). This is due to an incorrect check (start_idx == mid): there can still be two candidate values left when the program prematurely returns.
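A generic, hedged sketch of the off-by-one being described: a binary search for the first failing block should narrow the range until a single candidate remains rather than stopping when start_idx == mid (function and predicate names here are illustrative):

```python
def find_first_bad_block(num_nodes, block_is_bad):
    # block_is_bad(start, end) reports whether the subgraph [start, end] shows
    # a discrepancy; narrow until one node remains instead of stopping early.
    start, end = 0, num_nodes - 1
    while start < end:
        mid = (start + end) // 2
        if block_is_bad(start, mid):
            end = mid          # discrepancy is in the left half (inclusive)
        else:
            start = mid + 1    # discrepancy must be in the right half
    return start
```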

Test Plan:
Buck UI: https://www.internalfb.com/buck2/52524c26-ace5-4593-8a4b-843a54eb206a
Test UI: https://www.internalfb.com/intern/testinfra/testrun/3096224973363310
Network: Up: 0B  Down: 15MiB  (reSessionID-cd404e97-395f-49fc-8381-373e90a1378f)
Executing actions. Remaining     0/1
Command: test.
Time elapsed: 53.7s
Tests finished: Pass 7. Fail 0. Fatal 0. Skip 0. Build failure 0

Differential Revision: D75143242

Pull Request resolved: https://github.com/pytorch/pytorch/pull/154076
Approved by: https://github.com/jfix71
2025-05-23 06:42:07 +00:00
Laith Sakka
c1055f41a6 Data dependent free reshape. (#153198)
#### Change 1: if compute_strides fails for reshape, just clone.

Let's consider the most general case: if torch.compile is asked to reshape a tensor with sizes [u0, u1] and strides [u3, u4] into [u5, u6], what should it do?
Those shapes are general enough to represent both contiguous and non-contiguous tensors: tensors where a clone-free reshape can happen and others where it cannot. The current algorithm will fail due to data-dependent errors.

The general idea is that if it is impossible to tell whether the reshape can happen in place (because for some concrete inputs it can and for others it cannot), then it is OK to take the general path and clone, instead of failing or asking the user for hints.
**Because the user wants a single graph (a single compilation)**, this is the only way it can be done.
Had this been a view, the user would be explicitly asking for a copy-free reshape, and we would fail, asking for more information (hints in torch._check form).

With this change reshape works as follows (see the sketch after the side note below):
1. If we know the input is contiguous, we convert the reshape to a view.
2. If compute_strides succeeds, we use a view. (compute_strides was changed so that it no longer fails when unbacked symbols are present; instead it returns nullptr if it cannot compute the strides, meaning we should clone.)
3. If neither 1 nor 2 works, we clone and then use a view.

Side note: producing a view does not mean that Inductor will not clone; Inductor has a pass that converts all views back to reshapes and has its own logic for dealing with those.
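An illustrative, runnable sketch of steps 1-3 (the helpers here are placeholder stand-ins, not the real ATen code):

```python
import torch

def definitely_contiguous(t):
    # Placeholder: the real check avoids guarding on unbacked symbols.
    return t.is_contiguous()

def compute_strides_or_none(t, new_shape):
    # Stand-in for the modified compute_strides: returns strides when a
    # copy-free view exists, otherwise None (the C++ version returns nullptr).
    try:
        return t.view(new_shape).stride()
    except RuntimeError:
        return None

def reshape_sketch(t, new_shape):
    if definitely_contiguous(t):                       # step 1: just a view
        return t.view(new_shape)
    strides = compute_strides_or_none(t, new_shape)    # step 2: copy-free view
    if strides is not None:
        return t.as_strided(new_shape, strides)
    return t.contiguous().view(new_shape)              # step 3: clone, then view

# e.g. a transposed (non-contiguous) tensor falls into step 3:
print(reshape_sketch(torch.arange(6).reshape(2, 3).t(), (6,)))
```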

#### Change 2: skip _reshape_view_helper and fall back to simpler logic if it fails.
We trace _reshape_view_helper during fake tensor tracing, but not during proxy tracing; hence such tracing does not affect the graph (it only computes output shapes of several operations). We should not fail there, because it should always be possible to pass it in the reshape case.

That is, by the time reshape_symint was called we would have either cloned or compute_strides succeeded, so the view should pass. What I did is the following: we run _reshape_view_helper, and if we fail due to unbacked symbols we call _view_simple, which always succeeds for reshapes (it might fail for views when the view is impossible; in that case we re-throw the DDE thrown by the original algorithm).

Ideally I would register _view_simple as the meta for view and avoid calling _reshape_view_helper entirely, but I am running into issues with the dispatcher and subclasses and do not have time to debug them. Namely, one test ends up calling a C++ view function that does not support symints during meta dispatch when I register a Python meta decomposition:
```python test/dynamo/test_subclasses.py SubclassTests.test_subclass_views_dynamic_True ```
https://github.com/pytorch/pytorch/issues/153303. I will follow up with that change in a separate PR. cc @H-Huang @awgu @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @chenyang78 @kadeng @muchulee8 @amjames @chauhang @aakhundov @bdhirsh

Two other alternatives to registering _view_simple as the meta and to the try/catch approach in this PR are:
1. Call _view_simple if any input is dynamic; see #153521.
2. If we make is_compiling work for framework code tracing (it does not right now), call _view_simple when is_compiling is true.

#### Note:
Reshape can still fail when is_contiguous is called. The next PR will handle that by calling is_known_contiguous.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/153198
Approved by: https://github.com/etaf, https://github.com/bobrenjc93
2025-05-23 01:45:16 +00:00
Yidi Wu
1e0f19e173 auto functionalize base_hop (#151067)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/151067
Approved by: https://github.com/zou3519
2025-05-21 18:55:46 +00:00
Deysha Rivera
aec7cc60d7 add graph_code_verbose_log artifact for fx passes (#153775)
Fixes #153646

This PR refactors the logging behavior in the FX pass insert_deferred_runtime_asserts and runtime_assert.py to separate verbose/intermediate graph logs from the final output graph log. All verbose logs generated during the FX pass are now routed to a new artifact logger, graph_code_verbose, while only the final output graph remains logged to the original graph_code artifact.

Changes

- Added a new artifact logger: [graph_code_log = torch._logging.getArtifactLogger(__name__, "graph_code_verbose")]
- Updated all verbose/intermediate FX pass logs in [insert_deferred_runtime_asserts] to use the new graph_code_verbose artifact.
- Ensured that only the final output graph is logged to the original graph_code artifact.
- No changes to the FX pass logic or output—only logging behavior is affected.

Notes
This change is backward-compatible and does not affect the functional behavior of FX passes.
No changes to user-facing APIs.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/153775
Approved by: https://github.com/williamwen42
2025-05-21 18:31:59 +00:00
PyTorch MergeBot
d81217be2e Revert "Improve torch.ops typing (#153558)"
This reverts commit c5cba39d46.

Reverted https://github.com/pytorch/pytorch/pull/153558 on behalf of https://github.com/yangw-dev due to Your diff will not be landed to fbcode since we suspect it caused the following breakage in an internal test:[D75007157](https://www.internalfb.com/diff/D75007157) for instance: tests_gpu/lookup_gpu_index_test.py:232:8 Undefined attribute [16]: torch._ops._OpNamespace has no attribute simple_index_mm_batch ([comment](https://github.com/pytorch/pytorch/pull/153558#issuecomment-2892506789))
2025-05-19 23:32:36 +00:00
Laith Sakka
0ec8fe46d7 cleanup, refactor and add missing self._dde_suppressed checks (#152657)
Two things beyond the cleanups and refactoring:
1) Do not use propagate_real_tensors to resolve eval under guard_or_true/guard_or_false.
2) Do not guard for dimensions of type DimDynamic.OBLIVIOUS_SIZE under guard_or_true/guard_or_false.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/152657
Approved by: https://github.com/pianpwk
2025-05-19 16:15:14 +00:00
Benjamin Glass
c5cba39d46 Improve torch.ops typing (#153558)
Fixes a longstanding issue where direct references to aten operations are seen as untyped by type checkers. This is accomplished by setting attributes on several classes more consistently, so that `__getattr__` can return a single type in all other cases.

Decisions made along the way:

1. `torch.ops.higher_order` is now implemented by a single-purpose class. This was effectively true before, but the class implementing it attempted to be generalized unnecessarily. Fixing this simplified typing for the `_Ops` class.
2. `__getattr__` is only called when all other lookup methods have failed, so several constant special-cases in the function could be implemented as class variables.

The remainder of this PR is fixing up all the bugs exposed by the updated typing, as well as all the nitpicky typing issues.
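A minimal illustration of the typing pattern described in point 2 (the class shape here is illustrative, not the actual torch._ops code):

```python
class _OpNamespaceSketch:
    # Constant special cases become class attributes, so __getattr__ only ever
    # handles real op lookups and can be annotated with a single return type.
    _dir = ()  # illustrative class-level constant

    def __getattr__(self, op_name: str) -> "_OpPacketSketch":
        return _OpPacketSketch(op_name)

class _OpPacketSketch:
    def __init__(self, name: str) -> None:
        self.name = name

# A type checker now sees namespace.some_op as _OpPacketSketch rather than Any,
# which is the effect the PR is after.
print(type(_OpNamespaceSketch().add).__name__)
```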

Test plan: CI

Pull Request resolved: https://github.com/pytorch/pytorch/pull/153558
Approved by: https://github.com/rec, https://github.com/Skylion007, https://github.com/cyyever
2025-05-19 14:52:32 +00:00
PyTorch MergeBot
1748fa529a Revert "cleanup, refactor and add missing self._dde_suppressed checks (#152657)"
This reverts commit f7fb2f66e3.

Reverted https://github.com/pytorch/pytorch/pull/152657 on behalf of https://github.com/malfet due to Broke lint ([comment](https://github.com/pytorch/pytorch/pull/152657#issuecomment-2887539146))
2025-05-16 19:42:20 +00:00
Laith Sakka
f7fb2f66e3 cleanup, refactor and add missing self._dde_suppressed checks (#152657)
Two things beyond the cleanups and refactoring:
1) Do not use propagate_real_tensors to resolve eval under guard_or_true/guard_or_false.
2) Do not guard for dimensions of type DimDynamic.OBLIVIOUS_SIZE under guard_or_true/guard_or_false.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/152657
Approved by: https://github.com/pianpwk
2025-05-16 19:10:04 +00:00
Pian Pawakapan
befb5bd52a [dynamic shapes] simplify int(x / y) pattern (#153477)
Fixes #138853

Summary: Converts `TruncToInt(IntTrueDiv(x / y))` to `x // y` if divisible, which helps detect symint specializations we previously missed
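A small sympy illustration of the idea (using floor as a stand-in for TruncToInt on positive values):

```python
import sympy

x = sympy.Symbol("x", integer=True, positive=True)
y = sympy.Symbol("y", integer=True, positive=True)

# int(x / y) behaves like floor division when y divides x; e.g. with x = 6*y:
expr = sympy.floor(x / y)
print(expr.subs(x, 6 * y))  # 6, i.e. the same as (6*y) // y
```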

Differential Revision: D74664734

Pull Request resolved: https://github.com/pytorch/pytorch/pull/153477
Approved by: https://github.com/bobrenjc93
2025-05-16 17:32:15 +00:00
PyTorch MergeBot
3443627e07 Revert "[BE]: Enable RUFF TRY400 rule - log.exception (#153473)"
This reverts commit 4f4ecc583e.

Reverted https://github.com/pytorch/pytorch/pull/153473 on behalf of https://github.com/jeanschmidt due to seems to have broken internal signals, @albanD may I count on you to help the author merge his PR? D74837988 ([comment](https://github.com/pytorch/pytorch/pull/153473#issuecomment-2886017075))
2025-05-16 08:29:26 +00:00
angelayi
3fe42d4d5d [export] Dynamo symint support (#152677)
Basically adds native _IntWrapper support to dynamo. Here's my process of trying to make symint input support work on dynamo, and how I ended up with this approach [(doc)](https://docs.google.com/document/d/1GvNRQd8BnxlMay_hrEVgEta6VUeUW_hcFeRuB7q1nDY/edit?tab=t.0).

What I did was, before passing inputs to dynamo.export, I first wrap them with a class, `_IntWrapper`. When processing dynamic shapes, I will then add the corresponding dynamic shape specification to the `dynamism` field stored on the `_IntWrapper`. If there is no dynamism specified, then this will get unwrapped back to an integer. When dynamo tracing, when we encounter an `_IntWrapper`, we will convert this to a symint if the dynamism was specified as `Dim.DYNAMIC/AUTO`. Dynamo will then trace a graph that contains symint inputs, which will get passed to AOTAutograd and so on.
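A hedged sketch of the wrapping step described above (_IntWrapper's actual definition lives in export; the attributes here are assumptions based on this description):

```python
class _IntWrapperSketch:
    def __init__(self, val: int):
        self.val = val
        self.dynamism = None  # later filled in from the dynamic_shapes spec

def wrap_int_inputs(args):
    # Wrap plain ints before handing inputs to dynamo.export; they are
    # unwrapped later if no dynamism was specified, or turned into symints
    # when marked Dim.DYNAMIC/AUTO.
    return tuple(
        _IntWrapperSketch(a) if isinstance(a, int) and not isinstance(a, bool) else a
        for a in args
    )
```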

Pull Request resolved: https://github.com/pytorch/pytorch/pull/152677
Approved by: https://github.com/pianpwk
2025-05-16 07:51:50 +00:00
Angela Yi
459ce6c12a [export] Flatten frame local logs (#153627)
Summary:
Some new errors have been showing up on the PT2 dashboard with
```
Invalid type for lengths: Expected BlobReference or torch.Tensor, got: Tensor(shape: torch.Size([10]), stride: (1,), storage_offset: 0)
```
This is caused by [this piece of code](https://fburl.com/code/5nbi9on7) which maps over a set of nodes (in this case type `IDListFeatureListField`) and turns the results into strings to be displayed later. However during pytree.tree_map we call pytree.tree_unflatten which will call the class's init function, which calls `assert_blob` (https://fburl.com/code/h3ainrn9). Because we've mapped over the values and converted them to strings, the assert_blob fails.

I initially thought to disable assert_blob while tracing (D74684309), but I think we should actually flatten the list first, because tlparse expects just a string of outputs instead of the actual structure.
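A hedged sketch of the "flatten first" approach using torch.utils._pytree (the helper name is illustrative):

```python
import torch.utils._pytree as pytree

def stringify_frame_locals(frame_locals):
    # Flatten to leaves and stringify those, instead of tree_map-ping str over
    # the structure (which would re-run each container's __init__ on string
    # leaves and trip validation such as assert_blob).
    leaves, _spec = pytree.tree_flatten(frame_locals)
    return [str(leaf) for leaf in leaves]
```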

Test Plan: `buck2 run mode/opt sigmoid/inference/ts_migration:pt2i_readiness_main -- --test_suite ads_all --mode test_full_model --model_id 542947220` fails with something else 😅

Differential Revision: D74744326

Pull Request resolved: https://github.com/pytorch/pytorch/pull/153627
Approved by: https://github.com/yiming0416
2025-05-16 04:45:09 +00:00
Aaron Gokaslan
4f4ecc583e [BE]: Enable RUFF TRY400 rule - log.exception (#153473)
Change logging.error to logging.exception to log additional information when relevant. A few logging.error calls have slipped into try/except blocks since I last did a cleanup here, and the rule is now stabilized, so I am enabling it codebase-wide. I have NOQA'd much of our custom exception stack-trace handling for RPC and distributed calls, and tried to fix a few cases based on whether we immediately re-raised or failed to print exception information where it could be useful.
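A tiny example of the rule's effect (the function name is illustrative):

```python
import logging

log = logging.getLogger(__name__)

def load_config(path):
    try:
        with open(path) as f:
            return f.read()
    except OSError:
        # TRY400: inside an except block, log.exception attaches the traceback,
        # unlike log.error(...).
        log.exception("failed to load config from %s", path)
        return None
```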

Pull Request resolved: https://github.com/pytorch/pytorch/pull/153473
Approved by: https://github.com/albanD, https://github.com/cyyever
2025-05-15 13:36:59 +00:00
Will Feng
0139ce9303 Add skip_dtype_check_in_meta_registrations config to torch/fx/experimental/_config (#153513)
Helion relies on torch/fx/experimental's fake_tensor tracing but does its own dtype checking, which conflicts with some meta kernels' existing dtype checks. This PR adds a config so that we skip those dtype checks in meta kernels and rely on the calling system to do the dtype checking.

Currently it only applies to `baddbmm`, but I expect that similar changes will need to be done to other meta kernels in the future.
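A hedged usage sketch, assuming the config name from the PR title is exposed as an attribute on torch.fx.experimental._config:

```python
import torch.fx.experimental._config as fx_config

# Let the caller (e.g. Helion) own dtype validation; meta kernels such as
# baddbmm then skip their own dtype checks during fake-tensor tracing.
fx_config.skip_dtype_check_in_meta_registrations = True
```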

Pull Request resolved: https://github.com/pytorch/pytorch/pull/153513
Approved by: https://github.com/jansel
2025-05-14 09:14:11 +00:00
bobrenjc93
70c8047c2d include user stacks with constraint violation error message (#152924)
Fixes #152918

Before:

```
File "/data/users/bobren/a/pytorch/torch/fx/experimental/symbolic_shapes.py", line 5588, in produce_guards_verbose
    raise ConstraintViolationError(
torch.fx.experimental.symbolic_shapes.ConstraintViolationError: Constraints violated (L['x'].size()[0])! For more information, run with TORCH_LOGS="+dynamic".
  - You marked L['x'].size()[0] as dynamic but your code specialized it to be a constant (5). Either remove the mark_dynamic or use a less strict API such as maybe_mark_dynamic or Dim.AUTO.
```

After:

```
File "/data/users/bobren/a/pytorch/torch/fx/experimental/symbolic_shapes.py", line 5588, in produce_guards_verbose
    raise ConstraintViolationError(
torch.fx.experimental.symbolic_shapes.ConstraintViolationError: Constraints violated (L['x'].size()[0])! For more information, run with TORCH_LOGS="+dynamic".
  - You marked L['x'].size()[0] as dynamic but your code specialized it to be a constant (5). Either remove the mark_dynamic or use a less strict API such as maybe_mark_dynamic or Dim.AUTO.

User stack:
  File "/home/bobren/local/a/pytorch/error.py", line 5, in foo
    return torch.randn(5) * x
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/152924
Approved by: https://github.com/pianpwk
2025-05-10 13:36:47 +00:00
Pian Pawakapan
4166373908 [dynamic shapes] guard_or_false for infer_size (#152146)
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/152146
Approved by: https://github.com/laithsakka
2025-05-08 21:27:22 +00:00
Jim Wan
8b8051f6ed [Minimizer] Fix the path naming (#153130)
Summary:
Added some logging and captured the indexing. See below image.
{F1977773416}

This is why the saved module path is called `/tmp/jimwan/minimizer_a_acc.pt`

Now the updated module paths are `/tmp/jimwan/minimizer_addmm_default_103_acc.pt`.

Test Plan:
```
MTIAC_USE_DIST_REF_KERNELS=all  buck2 run @//mode/opt mtia/accuracy/minimizer:mtia_minimizer_runner --  --mode sequential  --compare_fn allclose  --pt_save_dir  /tmp/debug3  --atol 1e-4 --rtol 1e-4 --all_outputs --start_idx native_layer_norm_default_80 --end_idx getitem_272 2>&1 | tee ~/test.log
```
{F1977773610}

Reviewed By: qcyuan

Differential Revision: D74369107

Pull Request resolved: https://github.com/pytorch/pytorch/pull/153130
Approved by: https://github.com/Skylion007
2025-05-08 19:59:52 +00:00
Aaron Orenstein
7cdf5048ea Fix evaluate_expr to include suppress_guards_tls in cache key (#152661)
ShapeEnv.evaluate_expr() behaves differently based on the (tls) global "suppress_guards" - so its cache key needs to include that value.

This came up because #152662 triggered it in the test `test/dynamo/test_exc.py::ExcTests::test_trigger_bisect_on_error` - fixing this caused that test to work again.
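A generic, hedged illustration of the fix (the memo table and evaluator here are illustrative, not ShapeEnv internals):

```python
_memo = {}

def evaluate_expr_cached(expr, suppress_guards: bool):
    # The TLS flag changes the result's meaning, so it must be part of the key;
    # otherwise a value computed with guards suppressed could be reused when
    # guards are active (or vice versa).
    key = (expr, suppress_guards)
    if key not in _memo:
        _memo[key] = _evaluate(expr, suppress_guards)
    return _memo[key]

def _evaluate(expr, suppress_guards):
    # Stand-in for the real evaluation; here it just returns the truthiness.
    return bool(expr)
```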

Pull Request resolved: https://github.com/pytorch/pytorch/pull/152661
Approved by: https://github.com/laithsakka
2025-05-08 18:25:34 +00:00
Menglu Yu
2d25e4d478 [1/n][Optimus][Auto-AC] Support activation quantization without scaling (#148380)
Summary: We enable activation quantization in the forward pass, and users can customize the dtype they want to quantize to.

Test Plan:
# unit test

```
buck2 test 'fbcode//mode/dev-nosan' fbcode//caffe2/test/inductor:quantization -- test_activation_quantization_aten
```

Buck UI: https://www.internalfb.com/buck2/776d3911-bb86-4ac8-a527-540cf1510b9d
Test UI: https://www.internalfb.com/intern/testinfra/testrun/4785074873051017
Network: Up: 4.3MiB  Down: 42MiB  (reSessionID-fef7e727-68b1-4645-a519-5652854df38d)
Executing actions. Remaining     0/4                                                                                 6.7s exec time total
Command: test.     Finished 2 local
Time elapsed: 3:11.5s
Tests finished: Pass 2. Fail 0. Fatal 0. Skip 0. Build failure 0

# E2E

### How to enable (you can override the dtype; if nothing is given, the default is fp8)

```
post_grad_fusion_options={
            "activation_quantization_aten_pass": {"quant_type": "torch.float8_e5m2"}
        },
```

Differential Revision: D70522237

Pull Request resolved: https://github.com/pytorch/pytorch/pull/148380
Approved by: https://github.com/Mingming-Ding, https://github.com/Hahu803
2025-05-08 04:44:15 +00:00
Aaron Orenstein
42b3e560ee Thread through options so GraphPickler can allow all ops (#152801)
Fixes #151904

In #151904 we discussed the feasibility of including all ops in the GraphPickler. This PR changes it so we can filter which ops are allowed and which are blocked.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/152801
Approved by: https://github.com/masnesral
2025-05-07 14:36:50 +00:00
Aaron Orenstein
7a0781eaad Improve cache key graph printing performance (#151928)
Teach the graph printer to allow overriding how SymTypes (`SymInt`, `SymFloat`, `SymBool`) are printed, then use that to reuse the fast SymNode printing from `torch._inductor.utils.sympy_str()` to make computing the cache key faster.

On my computer the repro from #151823 goes from 480s -> 80s (still terrible... but better).
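A hedged sketch of the reuse being described (assumes a SymInt/SymFloat/SymBool-like input carrying a sympy expression at .node.expr):

```python
from torch._inductor.utils import sympy_str

def print_sym_for_cache_key(sym):
    # Use Inductor's fast sympy printer instead of the default (much slower)
    # printing when building the FX graph cache key.
    return sympy_str(sym.node.expr)
```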

Fixes #151823

Pull Request resolved: https://github.com/pytorch/pytorch/pull/151928
Approved by: https://github.com/laithsakka
2025-05-06 17:39:53 +00:00
PyTorch MergeBot
0e2b948256 Revert "cleanup, refactor and add missing self._dde_suppressed checks (#152657)"
This reverts commit 784c666cae.

Reverted https://github.com/pytorch/pytorch/pull/152657 on behalf of https://github.com/huydhn due to Sorry for reverting your change but it seems to cause a test to fail in trunk ([comment](https://github.com/pytorch/pytorch/pull/152657#issuecomment-2853442594))
2025-05-06 06:45:07 +00:00
Laith Sakka
784c666cae cleanup, refactor and add missing self._dde_suppressed checks (#152657)
Two things beyond the cleanups and refactoring:
1) Do not use propagate_real_tensors to resolve eval under guard_or_true/guard_or_false.
2) Do not guard for dimensions of type DimDynamic.OBLIVIOUS_SIZE under guard_or_true/guard_or_false.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/152657
Approved by: https://github.com/pianpwk
2025-05-06 05:24:09 +00:00
Animesh Jain
97dfd8dd53 [invoke_subgraph] Run missing graph passes recursively (#152675)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/152675
Approved by: https://github.com/bdhirsh, https://github.com/zou3519
ghstack dependencies: #152772, #152770
2025-05-06 02:55:34 +00:00
Animesh Jain
b1d34acac5 [fx] Recursive DCE on subgraphs (#152772)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/152772
Approved by: https://github.com/bdhirsh, https://github.com/zou3519
2025-05-06 02:55:34 +00:00
rzou
2b37a726e0 Refactor layout constraint selection logic (#148104)
This PR:

- cleans up some existing comments that no longer make sense
- hooks the "custom_op_default_layout_constraint" back up (it seems to have broken)
- cleans up the "lazy registration path", which seems to never get hit anymore
- adds dislike_padding to nodes that require exact strides

Test Plan:
- tests + CI

disable padding

Pull Request resolved: https://github.com/pytorch/pytorch/pull/148104
Approved by: https://github.com/shunting314, https://github.com/eellison
2025-05-03 00:02:24 +00:00
Laith Sakka
376529c78b consolidate guard_or_x and definitely_x (#152463)
definitely_true is almost the same as guard_or_false; the potential differences are not meaningful enough to justify the existence of both. The same goes for definitely_false, which can be expressed with guard_or_true and guard_or_false.
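The equivalences, as a hedged sketch:

```python
from torch.fx.experimental.symbolic_shapes import guard_or_false, guard_or_true

def definitely_true(expr):
    # True only when the expression is provably True; unknown -> False.
    return guard_or_false(expr)

def definitely_false(expr):
    # True only when the expression is provably False; when unknown,
    # guard_or_true returns True, so this returns False.
    return not guard_or_true(expr)
```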

Pull Request resolved: https://github.com/pytorch/pytorch/pull/152463
Approved by: https://github.com/bobrenjc93
2025-05-02 18:08:11 +00:00
bobrenjc93
1f898657e6 [ez] fix grammar mistakes in StatefulSymbolicContext comment (#152598)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/152598
Approved by: https://github.com/malfet
ghstack dependencies: #151407
2025-05-02 04:21:16 +00:00
Yidi Wu
447f8241f5 [export][function schema] support exporting hop with function schema argument (#152073)
We need to make function schemas proxyable in order to trace the auto_functionalized hop that takes a function schema as input. The implementation basically follows how we support torchbind objects:

1. Upon seeing an untracked function schema arg, we create a constant get_attr node.
2. We track the function schema argument in export to support lift/unlift.
3. We need to support serde for function schemas; we'll add support for this in follow-up PRs.

However, compared with torchbind objects:
1. We don't need a Dynamo implementation, because the function schema is added as an argument of auto_functionalized when we auto_functionalize a hop. One potential use case is a user re-tracing an exported program with strict mode; since non-strict is the default now, we don't see a use case yet.
2. We don't need an Inductor implementation, because the function schema goes away after the auto_functionalized re-inplacing pass.

Edit: we greatly simplified (and generalized) the implementation following @zou3519's suggestion of using pytree.register_constant.
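A hedged sketch of that approach, assuming torch.utils._pytree.register_constant and torch._C.FunctionSchema are used roughly like this:

```python
import torch
import torch.utils._pytree as pytree

# Treat FunctionSchema values as pytree constants so they can flow through
# tracing (and lift/unlift in export) as opaque leaves.
pytree.register_constant(torch._C.FunctionSchema)
```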

Pull Request resolved: https://github.com/pytorch/pytorch/pull/152073
Approved by: https://github.com/zou3519
ghstack dependencies: #152072
2025-05-01 05:22:02 +00:00
Pian Pawakapan
0d2746092b [ez][export] suggest torch._checks only for booleans (#152499)
We were doing this when the error was coming from int/float casts, suggesting fixes like `torch._check(zuf0), torch._check(~zuf0)`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/152499
Approved by: https://github.com/angelayi
2025-05-01 01:24:46 +00:00
bobrenjc93
e5ea7911ea [ez] Make relaxed constraint error message more user friendly (#151407)
Fixes #151356

Pull Request resolved: https://github.com/pytorch/pytorch/pull/151407
Approved by: https://github.com/Skylion007
2025-04-30 03:51:50 +00:00
Laith Sakka
62f1d0ea78 Log information about suppressed data dependent errors (#151041)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/151041
Approved by: https://github.com/bobrenjc93
2025-04-29 06:08:07 +00:00
Laith Sakka
5a9868b78c Do not log exception when recording is disabled or already recording (#151038)
I am not sure why we log all exceptions here and re-raise them, but at least when recording is disabled this should be transparent; in particular, logging DDEs could be spammy.

before:
<img width="995" alt="Screenshot 2025-04-10 at 12 47 31 PM" src="https://github.com/user-attachments/assets/f90d4557-d958-4558-a917-0d687366cad1" />

Pull Request resolved: https://github.com/pytorch/pytorch/pull/151038
Approved by: https://github.com/bobrenjc93
2025-04-29 02:48:20 +00:00
Nikita Shulga
13966d0bf5 [BE] Migrate dtype_abbrs into one location (#152229)
Namely `torch.utils._dtype_abbrs.dtype_abbrs`

Before that it was defined in various forms of completeness in
c02edba863/torch/fx/graph.py (L215),
c02edba863/torch/testing/_internal/common_utils.py (L5226)
 and c02edba863/torch/testing/_internal/logging_tensor.py (L17)
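A quick usage sketch of the consolidated location (the exact abbreviation string is assumed):

```python
import torch
from torch.utils._dtype_abbrs import dtype_abbrs

# Map a dtype to its short display name, e.g. torch.float32 -> "f32" (assumed).
print(dtype_abbrs[torch.float32])
```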

TODO:
- Add a linter to check that the `torch.testing._internal` module is not referenced from any public-facing APIs, as it can have extra dependencies such as `expect_test`

Fixes https://github.com/pytorch/pytorch/issues/152225

Pull Request resolved: https://github.com/pytorch/pytorch/pull/152229
Approved by: https://github.com/clee2000, https://github.com/Skylion007
2025-04-28 03:52:47 +00:00
Laith Sakka
98bd2bd1ab Do not generate long log messages for suppressed data dependent errors. (#151023)
TORCH_LOGS="all" python test/test_dynamic_shapes.py -k test_guard_or_true

 before:
<img width="1065" alt="Screenshot 2025-04-10 at 9 55 27 AM" src="https://github.com/user-attachments/assets/3ee20de0-2902-4eb1-8ab0-80f1b974fb78" />

after:
<img width="1124" alt="Screenshot 2025-04-10 at 9 54 35 AM" src="https://github.com/user-attachments/assets/4e7e1f0c-856c-417f-8763-bfe183e2450d" />

Note: we actually do not expect to see a log at all; this is an orthogonal issue in recording, which logs each error seen even when recording is not enabled. I will follow up with a PR for that.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/151023
Approved by: https://github.com/bobrenjc93
2025-04-28 00:39:52 +00:00