Commit Graph

89 Commits

Author SHA1 Message Date
Animesh Jain
bcd1d2e832 [dynamo] Introduce UnspecializedNNModule guard source (#132304)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/132304
Approved by: https://github.com/yanboliang
ghstack dependencies: #132302
2024-08-01 04:35:43 +00:00
Animesh Jain
e772547d70 [dynamo][rename/refactor] Rename guard_source NN_MODULE to SPECIALIZED_NN_MODULE (#132302)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/132302
Approved by: https://github.com/yanboliang
2024-08-01 04:35:43 +00:00
Xuehai Pan
973037be6a [BE][Easy] apply autofix for ruff rules unnecessary-collection-call (C408): list() / tuple() / dict() (#130199)
This PR changes empty collection factory calls to Python literals:

- `list()` -> `[]`
- `tuple()` -> `()`
- `dict()` -> `{}`

The Python literals are more performant and safer. For example, the bytecode for building an empty dictionary:

```bash
$ python3 -m dis - <<EOS
import collections

d1 = {}
d2 = dict()

dict = collections.OrderedDict
d3 = dict()
EOS
```

```text
  0           0 RESUME                   0

  1           2 LOAD_CONST               0 (0)
              4 LOAD_CONST               1 (None)
              6 IMPORT_NAME              0 (collections)
              8 STORE_NAME               0 (collections)

  3          10 BUILD_MAP                0
             12 STORE_NAME               1 (d1)

  4          14 PUSH_NULL
             16 LOAD_NAME                2 (dict)
             18 CALL                     0
             26 STORE_NAME               3 (d2)

  6          28 LOAD_NAME                0 (collections)
             30 LOAD_ATTR                8 (OrderedDict)
             50 STORE_NAME               2 (dict)

  7          52 PUSH_NULL
             54 LOAD_NAME                2 (dict)
             56 CALL                     0
             64 STORE_NAME               5 (d3)
             66 RETURN_CONST             1 (None)
```

The dict literal `{}` needs only one bytecode, `BUILD_MAP`, while the factory call `dict()` needs three (`PUSH_NULL`, `LOAD_NAME`, `CALL`). The factory call is also not safe if users override the `dict` name in `locals` or `globals` (see the example of replacing it with `OrderedDict` above).
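As a small runnable illustration of the shadowing point (standard library only; not part of the PR):

```python
import collections

d1 = {}      # literal: always builds a plain dict (single BUILD_MAP bytecode)
d2 = dict()  # factory call: looks up the name "dict" at runtime

dict = collections.OrderedDict  # the name is rebound...
d3 = dict()                     # ...so this now silently builds an OrderedDict

print(type(d1), type(d2), type(d3))
# <class 'dict'> <class 'dict'> <class 'collections.OrderedDict'>
```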

Pull Request resolved: https://github.com/pytorch/pytorch/pull/130199
Approved by: https://github.com/malfet
2024-07-11 17:30:28 +00:00
Oguz Ulgen
54b0006cb2 Evaluate symexprs on load path of cache not write (#128997)
When caching is enabled, an internal model fails with
```
assert_size_stride(bmm_9, (17, s0, 512), (54784, 512, 1))
AssertionError: expected size 17==17, stride 57344==54784 at dim=0
```
Looking at this model, the exact problem is that when the cache is hit on the forward graph, the generated code for backward fails, since the strides of the forward outputs (passed to backward as inputs) are not what we expected.

This PR changes the evaluation logic so that we defer evaluation of output stride exprs to load path as opposed to eagerly doing it on save path.

I have not been able to come up with a unit test repro for this problem.
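A minimal sketch of the deferred-evaluation idea, using hypothetical names (the real change lives in the Inductor FX graph cache, which is not shown here):

```python
import sympy

s0 = sympy.Symbol("s0", positive=True, integer=True)

# Save path: record the output stride *expressions*, not their concrete values.
cached_output_strides = [(s0 * 512, 512, 1)]  # e.g. for an output of size (17, s0, 512)

# Load path: the graph that hits the cache supplies its own size hints,
# so the strides are evaluated against the current shapes.
hints = {s0: 112}
concrete = [
    tuple(int(e.subs(hints)) if isinstance(e, sympy.Expr) else e for e in strides)
    for strides in cached_output_strides
]
print(concrete)  # [(57344, 512, 1)]
```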

Differential Revision: [D58796503](https://our.internmc.facebook.com/intern/diff/D58796503)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/128997
Approved by: https://github.com/ezyang
2024-06-20 08:55:12 +00:00
Xuehai Pan
dd143d44cc [BE] enable UFMT for top-level files torch/*.py (#127707)
Part of #123062

- #123062

Pull Request resolved: https://github.com/pytorch/pytorch/pull/127707
Approved by: https://github.com/ezyang
2024-06-12 20:15:05 +00:00
Aaron Orenstein
ea614fb2b1 Flip default value for mypy disallow_untyped_defs [2/11] (#127839)
See #127836 for details.
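For context, flipping this default means untyped `def`s become errors unless a file opts out; a minimal illustration using mypy's per-file inline configuration comment (illustrative, not taken from this PR):

```python
# mypy: allow-untyped-defs
# With disallow_untyped_defs enabled globally, a file can opt out with the
# comment above while its annotations are filled in incrementally.

def legacy_helper(x):  # untyped def: only allowed because of the opt-out above
    return x * 2

def annotated_helper(x: int) -> int:  # the end state the flipped default pushes toward
    return x * 2
```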

Pull Request resolved: https://github.com/pytorch/pytorch/pull/127839
Approved by: https://github.com/oulgen
2024-06-08 18:23:08 +00:00
Aaron Gokaslan
1dd42e42c4 [BE]: Try TCH autofixes on torch/ (#125536)
Tries TCH autofixes to see what breaks.
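TCH (flake8-type-checking) rules move imports that are only needed for annotations behind `TYPE_CHECKING`; a representative before/after sketch (illustrative, not from this PR):

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Imported only for type checking; skipped at runtime, which saves import
    # cost and can break import cycles.
    from torch.fx import GraphModule

def count_nodes(gm: "GraphModule") -> int:
    return len(gm.graph.nodes)
```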

Pull Request resolved: https://github.com/pytorch/pytorch/pull/125536
Approved by: https://github.com/ezyang
2024-05-05 23:13:59 +00:00
Animesh Jain
37c993546d [dynamo][guards] Bug fix for set_export_info (#125275)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/125275
Approved by: https://github.com/yanboliang
2024-05-01 03:46:26 +00:00
Edward Z. Yang
64491c0811 Restore CompileContext as well in backwards (#124626)
This should fix many of the unknown compile id problems currently
afflicting tlparse backwards analysis.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/124626
Approved by: https://github.com/bdhirsh
2024-04-23 14:39:52 +00:00
Xuehai Pan
93e249969b [BE] enable ruff rule RSE and remove useless parentheses in raise statements (#124261)
Remove useless parentheses in `raise` statements if the exception type is raised with no argument.
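A before/after example of what the autofix does (illustrative):

```python
def before():
    raise NotImplementedError()  # parentheses are redundant when no arguments are passed

def after():
    raise NotImplementedError    # equivalent: Python instantiates the exception class itself
```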

Pull Request resolved: https://github.com/pytorch/pytorch/pull/124261
Approved by: https://github.com/albanD
2024-04-17 19:29:34 +00:00
Jason Ansel
11e6f84ad8 [dynamo] Graph break on uninitialized nn.Module (#123790)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/123790
Approved by: https://github.com/anijain2305
ghstack dependencies: #123700, #123705, #123786
2024-04-12 19:03:13 +00:00
Brian Hirsh
134e56fa33 inductor: log unique id to match output_code to aot graphs (#118647)
I found it helpful to be able to see, given some inductor output code, which AOT graph it came from. When you have large models with multiple graphs floating around, this can be difficult, so I added the aot_config.aot_id to the printed inductor output.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/118647
Approved by: https://github.com/ezyang
2024-04-11 14:37:07 +00:00
Animesh Jain
1346ebf12e [dynamo][guards] Delay DUPLICATE_INPUT guard because of incorrect ordering (#123605)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/123605
Approved by: https://github.com/jansel
ghstack dependencies: #123606
2024-04-10 07:30:02 +00:00
Jason Ansel
1e9a7df8fe [dynamo] Compile time optimizations in tx.step() (#121790)
`python benchmarks/dynamo/microbenchmarks/dynamo_microbenchmarks.py`
- Before: `symbolic_convert_overhead_stress_test: 10.7s`
- After: `symbolic_convert_overhead_stress_test: 8.6s`

`tx.step()` is a small part of that benchmark, so the speedup in that isolated function is likely larger than the top-line number.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/121790
Approved by: https://github.com/oulgen
2024-03-15 01:01:05 +00:00
Jason Ansel
7cc476ea16 [dynamo] Fix support for nn.Parameter constructor (part 1) (#120163)
This captures calls to `torch.nn.Parameter` by lifting them to graph inputs.
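A hedged sketch of the user pattern this targets (illustrative code; the lifting itself happens inside Dynamo's tracing):

```python
import torch

def make_output(x):
    # Constructing a Parameter inside the compiled region; this change captures
    # the call by lifting the constructed parameter to a graph input instead of
    # falling back on it.
    w = torch.nn.Parameter(torch.ones(3, 3))
    return x @ w

out = torch.compile(make_output)(torch.randn(2, 3))
```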

Pull Request resolved: https://github.com/pytorch/pytorch/pull/120163
Approved by: https://github.com/albanD, https://github.com/yanboliang
ghstack dependencies: #121086
2024-03-11 05:14:42 +00:00
Joel Schlosser
dad1b76584 Introduce EphemeralSource for symbols that should be simplified out (#120948)
Context: view fake-ification should handle closed-over state in ViewFuncs for use in view replay by:
* fake-ifying tensors
* symbolicizing SymInts

This avoids invalid specialization during view replay. However, the symbols / tensors created as intermediates in the view chain should not stick around or be guarded on. This PR introduces an `EphemeralSource` intended to be used as a source for this purpose. It has the following properties:
* Considered first to be simplified out in symbol simplification logic
* Errors if guarded on

Differential Revision: [D54561597](https://our.internmc.facebook.com/intern/diff/D54561597)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/120948
Approved by: https://github.com/ezyang
2024-03-06 02:30:52 +00:00
Sam Larsen
06f8af30fa Change FakeTensor serialization to consider only an _active_ FakeTensor mode (#120848)
Summary: https://github.com/pytorch/pytorch/pull/108186 made some changes related to FakeTensor serialization such that saving and loading a tensor will give us a meta tensor, even if FakeTensor mode is not enabled. This means we can't properly save and load Tensors as part of Fx graph caching. This PR changes the logic to check if there's an _active_ FakeTensor mode.

Test Plan:
* New unit tests
* Validated unit tests introduced in https://github.com/pytorch/pytorch/pull/108186 still pass
Pull Request resolved: https://github.com/pytorch/pytorch/pull/120848
Approved by: https://github.com/eellison, https://github.com/thiagocrepaldi
2024-03-01 02:37:21 +00:00
Elias Ellison
d03b11ad5b Pass inductor strides forward in ddp optimizer (#120523)
Note: Returning Fake Tensors on First AOT Autograd Call

Inductor will optimize strides of outputs when it deems it profitable, for instance converting to channels last. When we split the graph here into multiple inductor compilations, we need to make sure that the output strides of one compilation are appropriately passed to the subsequent compilations. However, the mapping from inductor output to dynamo output is non-trivial due to aot_autograd's deduping, de-aliasing, mutation, re-writing, subclass handling, etc. In order to replay all this logic we set a flag such that the first invocation of inductor in aot_autograd will return fake tensors with appropriate strides. Then, all of aot_autograd's runtime logic is replayed. This gives us appropriately strided outputs here that reflect the runtime strides.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/120523
Approved by: https://github.com/yf225, https://github.com/bdhirsh
2024-02-29 22:25:00 +00:00
Jason Ansel
01ec8df6d8 [Compiled Autograd] Introduce BackwardState capture (#120382)
This adds support for backwards hooks that are *both*:
1) Interior to the graph; and
2) Dynamically generated (e.g. lambdas)

We do this by creating a BackwardState object that is used to register the hooks in the forward, then populated by dynamo *after* the forwards runs.
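A hedged example of the kind of hook this enables (illustrative user code; whether it compiles end to end depends on the compiled autograd setup):

```python
import torch

def fn(x, scale):
    y = x.sin()
    # A dynamically generated hook (a lambda closing over `scale`) registered on
    # an intermediate tensor, i.e. a hook interior to the traced graph.
    y.register_hook(lambda grad: grad * scale)
    return y.cos()

x = torch.randn(4, requires_grad=True)
out = torch.compile(fn)(x, 2.0)
out.sum().backward()
```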

Pull Request resolved: https://github.com/pytorch/pytorch/pull/120382
Approved by: https://github.com/xmfan
2024-02-28 20:36:47 +00:00
Animesh Jain
8a59f49da2 [dynamo][compile-time] Collect guard debug stack info only with logs enabled (#120520)
Reduces backend=eager compile time from 33 to 19 seconds for `MobileBertForQuestionAnswering`. This also helps an internal model where the guards.add function was taking 124 seconds.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/120520
Approved by: https://github.com/mlazos
2024-02-27 01:51:16 +00:00
Taras Tsugrii
2c8722182e [dynamo][guards] Avoid unnecessary stack copies. (#119115)
There is no need to make a `frame_summary_stack` copy if it's not modified. The proposed change uses a copy-on-write, functional approach that is easy to understand and is more efficient when `self.loc_in_frame` is `None`.
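A minimal sketch of the copy-on-write idea described above (hypothetical names, not the actual code):

```python
def extended_frame_stack(frame_summary_stack, loc_in_frame):
    # Copy-on-write: only pay for a copy when there is actually a location to
    # append; otherwise hand back the existing list untouched.
    if loc_in_frame is None:
        return frame_summary_stack
    return frame_summary_stack + [loc_in_frame]
```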

Pull Request resolved: https://github.com/pytorch/pytorch/pull/119115
Approved by: https://github.com/Skylion007
2024-02-10 21:56:00 +00:00
Animesh Jain
0c3a1c893e [dynamo] Setup the globals for guard_fn without a reference to f_locals (#118447)
UPDATE - I changed the PR because, from a discussion with @jansel, it was clear that someone else was holding on to a reference to f_locals. This PR now solves that problem first. I removed the eval_frame.c part because it was failing tests that use `exec` or `eval` with a weird error like `no no locals found when storing 'math'`. I will debug that in a separate PR.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/118447
Approved by: https://github.com/Skylion007, https://github.com/jansel
ghstack dependencies: #118975, #118420
2024-02-05 05:39:39 +00:00
Taras Tsugrii
41b63b26c2 [dynamo] Fix incorrect docstring placements in _guards.py. (#119114)
Incorrectly placed docstrings are unavailable to `help()` and other tools that access them.
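For reference, Python only treats a string as a docstring when it is the first statement in the definition; a small illustration (not from this PR):

```python
def misplaced():
    x = 1
    """Not a docstring: this string is not the first statement in the body."""
    return x

def correct():
    """A real docstring: it is the first statement, so help() picks it up."""
    return 1

print(misplaced.__doc__)  # None
print(correct.__doc__)    # "A real docstring: ..."
```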
Pull Request resolved: https://github.com/pytorch/pytorch/pull/119114
Approved by: https://github.com/kit1980
2024-02-03 06:25:54 +00:00
lezcano
eb2bdfae88 Make variables in dict LazyTrackers (not lazily guarded yet) and avoid using DICT_KEYS guard (#117625)
Make variables in dict lazy and remove DICT_KEYS guard.

We build the keys of a dict depth-first and we rely on the guards of
each element in the dict to create the correct guards. This allows us to
remove the rather buggy DICT_KEYS guard and make the guard lazy.
The guards are not completely lazy yet, as we instantiate them in
`_HashableTracker._eq_impl` but it should be possible to make them
truly lazy.

Also, adding new types to the supported types within keys should be less
error prone.

This is marginally less efficient when we graph break, but in turn we
should graph break much less. It also makes the dicts code easier to maintain
(removes `is_hashable_python_var`).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/117625
Approved by: https://github.com/jansel, https://github.com/peterbell10, https://github.com/anijain2305
ghstack dependencies: #117982, #118098, #117983
2024-02-02 14:38:08 +00:00
Edward Z. Yang
46712b019d Enable local_partial_types (#118467)
When using dmypy, this setting is enabled and cannot be turned off. Force it for regular mypy too.
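Roughly, the flag stops mypy from inferring a variable's type from a `None` assignment in one scope plus assignments in another scope, so such variables need explicit annotations; a small illustration (not from this PR):

```python
from typing import Optional

# Without local_partial_types, mypy would infer Optional[int] for `timeout`
# from the assignment inside configure(); with it enabled, the explicit
# annotation below is required.
timeout: Optional[int] = None

def configure(value: int) -> None:
    global timeout
    timeout = value
```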

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/118467
Approved by: https://github.com/Skylion007
ghstack dependencies: #118414, #118418, #118432
2024-01-28 13:38:22 +00:00
voznesenskym
081c5b3adc Add Stateful/Stateless symbolic contexts, use fresh fake mode for dynamo backends (#113926) (#114526)
Summary:

The primary problem we are setting out to solve here is fake tensor freshness. Before this PR, fake tensors after dynamo represented fake tensors *at the end* of trace, so subsequent retraces like aot_autograd would start off with fake tensors in the wrong (end result) state, rather than their expected fresh state. The solution here is to start a fresh fake mode, and re-fakify the tensors. The nuance comes from ensuring that symbols are uniformly created for the symbolic sizes and strides of the tensor.

This PR is the result of *a lot* of back and forth with ezyang and eellison. Initially, the first pass at this was not super different from what we have in the PR - the broad strokes were the same:

1) We cache source->symbol in shape_env
2) We pass policy objects around, stored at dynamo fakification time, and reused for later fakification
3) We create a new fake mode for backends
(from https://github.com/pytorch/pytorch/pull/113605/files)

This is ugly, and has some layering violations. We detoured our decision making through a few other alternatives. Immutable/mutable fake tensor mode was the most interesting alternative, https://github.com/pytorch/pytorch/pull/113653, and was struck down on concerns of complexity in fake mode combined with it not covering all edge cases. We also detoured on what to do about tensor memoization returning potentially different tensors than requested, whether that is an anti-pattern (it is), and whether we want to hack around it with the symbol cache (we don't).

We went back to the drawing board here, but with a few concessions:
1) the cache for source->symbol must live outside of shape_env, for both lifecycle, and layering reasons
2) A good amount of work needs to be done to pipe policy around fake_mode and meta_utils correctly, to cover all the cases (ezyang did this)

cc penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx chenyang78 aakhundov kadeng

imported-using-ghimport

Test Plan: Imported from OSS

Reviewed By: huydhn, Chillee

Differential Revision: D51566250

Pulled By: voznesenskym

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114526
Approved by: https://github.com/Chillee, https://github.com/huydhn
2023-11-26 23:40:32 +00:00
PyTorch MergeBot
2f3beb715c Revert "Add Stateful/Stateless symbolic contexts, use fresh fake mode for dynamo backends (#113926)"
This reverts commit 2ca1119d53.

Reverted https://github.com/pytorch/pytorch/pull/113926 on behalf of https://github.com/DanilBaibak due to Break internal build ([comment](https://github.com/pytorch/pytorch/pull/113926#issuecomment-1822713852))
2023-11-22 12:52:33 +00:00
voznesenskym
2ca1119d53 Add Stateful/Stateless symbolic contexts, use fresh fake mode for dynamo backends (#113926)
The primary problem we are setting out to solve here is fake tensor freshness. Before this PR, fake tensors after dynamo represented fake tensors *at the end* of trace, so subsequent retraces like aot_autograd would start off with fake tensors in the wrong (end result) state, rather than their expected fresh state. The solution here is to start a fresh fake mode, and re-fakify the tensors. The nuance comes from ensuring that symbols are uniformly created for the symbolic sizes and strides of the tensor.

This PR is the result of *a lot* of back and forth with @ezyang and @eellison. Initially, the first pass at this was not super different from what we have in the PR - the broad strokes were the same:

1) We cache source->symbol in shape_env
2) We pass policy objects around, stored at dynamo fakification time, and reused for later fakification
3) We create a new fake mode for backends
(from https://github.com/pytorch/pytorch/pull/113605/files)

This is ugly, and has some layering violations. We detoured our decision making through a few other alternatives. Immutable/mutable fake tensor mode was the most interesting alternative, https://github.com/pytorch/pytorch/pull/113653, and was struck down on concerns of complexity in fake mode combined with it not covering all edge cases. We also detoured on what to do about tensor memoization returning potentially different tensors than requested, whether that is an anti-pattern (it is), and whether we want to hack around it with the symbol cache (we don't).

We went back to the drawing board here, but with a few concessions:
1) the cache for source->symbol must live outside of shape_env, for both lifecycle, and layering reasons
2) A good amount of work needs to be done to pipe policy around fake_mode and meta_utils correctly, to cover all the cases (@ezyang did this)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113926
Approved by: https://github.com/ezyang, https://github.com/eellison
2023-11-20 23:06:37 +00:00
Jez Ng
5b95715bc0 Make {Tracing,Compile}Context.get() return non-optional type (#113535)
They are used in many contexts that don't actually check whether the returned value is `None`. I have also created `try_get()` for the cases where we do actually want an Optional type returned.
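A hedged sketch of the `get()` / `try_get()` split described above (simplified, not the actual implementation):

```python
from typing import Optional

class ContextSketch:
    _current: Optional["ContextSketch"] = None

    @classmethod
    def try_get(cls) -> Optional["ContextSketch"]:
        # For callers that genuinely handle the "no active context" case.
        return cls._current

    @classmethod
    def get(cls) -> "ContextSketch":
        # For the many callers that assume a context exists: fail loudly
        # instead of returning an Optional.
        ctx = cls._current
        assert ctx is not None, "no active context"
        return ctx
```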

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113535
Approved by: https://github.com/ezyang
ghstack dependencies: #113412
2023-11-14 04:31:12 +00:00
Jez Ng
a8cf04fd2a [inductor] Make {output_graph,pad_mm}.py pass follow_imports typechecking (#113413)
I changed OutputGraph.nn_modules' type to `Dict[str, Any]` because it
seems that `register_attr_or_module` can populate it with essentially
any type.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113413
Approved by: https://github.com/Skylion007
2023-11-11 22:15:46 +00:00
Jez Ng
b0ede09682 [inductor] Make pattern_matcher.py pass follow_imports typechecking (#113409)
Import following reveals that a good number of hints were wrong...

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113409
Approved by: https://github.com/Skylion007
2023-11-10 19:58:08 +00:00
Jason Ansel
9664190952 [dynamo] Eagerly install guards (#111415)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111415
Approved by: https://github.com/voznesenskym
ghstack dependencies: #111306
2023-11-07 19:55:19 +00:00
Jason Ansel
4b8a5e1854 [dynamo] Remove VariableTracker.as_specialized (#112363)
My local testing can't seem to find this function actually doing anything.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112363
Approved by: https://github.com/yanboliang
2023-10-30 20:07:55 +00:00
Peter Bell
bbd5b935e4 Use pytree.tree_leaves everywhere (#112324)
This changes all the instances I could find of `tree_flatten(...)[0]` or
`x, _ = tree_flatten` to use `tree_leaves`.
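For example (using the pytree helpers in `torch.utils._pytree`):

```python
import torch.utils._pytree as pytree

data = {"a": [1, 2], "b": (3, {"c": 4})}

# Before: flatten and throw away the tree spec
leaves, _ = pytree.tree_flatten(data)

# After: ask for the leaves directly
leaves = pytree.tree_leaves(data)
print(leaves)  # [1, 2, 3, 4]
```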

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112324
Approved by: https://github.com/lezcano
ghstack dependencies: #112327, #112323
2023-10-30 03:39:04 +00:00
lezcano
c8a5bb451e Do not import sympy within torch._prims_common (#112034)
This is the first of a few PRs that avoid importing SymPy at import time.
The pitch here is that we (almost!) do not have SymPy on our API, so
this should be feasible.

This should speed-up torch imports by a good 15% as per
https://dev-discuss.pytorch.org/t/delving-into-what-happens-when-you-import-torch/1589

In this PR we just move a few global imports into local imports.
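The pattern is simply moving the import into the function that needs it, so the cost is paid on first use rather than at `import torch` time; a minimal illustration (not the actual `torch._prims_common` code):

```python
def is_sympy_expr(x):
    # Local import: sympy is only imported the first time this helper runs,
    # keeping it off the module-import critical path.
    import sympy

    return isinstance(x, sympy.Basic)
```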
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112034
Approved by: https://github.com/ezyang
2023-10-26 12:53:25 +00:00
voznesenskym
9455af58b5 [easy][dynamo] Cleanup guard builder selection (#111723)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111723
Approved by: https://github.com/jon-chuang, https://github.com/jansel
2023-10-21 10:48:32 +00:00
Animesh Jain
58637c4b43 [dynamo] Remove SuperSource (#110475)
The motivation for removing this is already present in the pre-PR comments. Copying it here:

~~~
# NB - SuperSource is a weird one.
# it is our only source with 2 bases, so we use the object
# as the base, rather than the type, since an invocation
# like super(Foo, foo) is represented here, the source object base is more spiritually
# aligned with the instance, rather than the type.
# This whole construction is questionable tho, and we should probably find a way to
# avoid this exception to our otherwise nice source parentage invariant.
~~~

Instead of using super(a, b), we can use `type(b).__mro__[index]`.
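Concretely, the suggested replacement works because the MRO entry after the current class plays the role `super()` does (illustration only, not the Dynamo source construction itself):

```python
class Base:
    def greet(self):
        return "base"

class Child(Base):
    def greet(self):
        return "child"

c = Child()
mro = type(c).__mro__              # (Child, Base, object)
index = mro.index(Child) + 1       # the class that super(Child, c) would dispatch to
print(mro[index].greet(c))         # "base", same as super(Child, c).greet()
```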

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110475
Approved by: https://github.com/jansel
2023-10-08 04:45:06 +00:00
chilli
005e8ddcb9 cache the hash construction on Guard (#110464)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110464
Approved by: https://github.com/zou3519, https://github.com/voznesenskym
2023-10-04 04:49:18 +00:00
Edward Yang
88600e7d2e [RELAND] Force synced KJT to trace unbacked SymInt (#108960) (#109216)
Summary:

The basic concept behind this diff is to modify Dynamo's tracing behavior when it encounters a KeyedJaggedTensor that is synced (aka has `_length_per_key` and `_offset_per_key` populated). These fields are lists of integers; ordinarily, Dynamo will optimistically try to specialize on integers; however, for KJTs, we know that these integers will definitely vary from run to run. Furthermore, ordinarily, we would also specialize these integers if they are 0/1, but we will frequently expect features in KJTs to be 0/1.

The fix is to detect KJTs and treat these integers as *unbacked integers*. This is NOT a universally sound optimization: when treating these integers as unbacked, we never report them as equal to zero or one. In return, we always generate graphs that generalize no matter the length of values on features. This is enough to trace through APS sparse arch, torchrec_dlrm and some small split-cat examples.

The special integer behavior is triggered by a dynamically scoped `force_unspec_int_unbacked_size_like` variable on TracingContext, which we trigger when we wrap a KJT. There probably are other ways to do this, but this was simple and worked.

Test Plan:
```
buck2 test mode/dev-nosan //pytorch/benchmark/fb/test_gpu:run_test_gpu
```

from aakhundov

1. first build feed_lower_benchmark:
```
buck2 build --show-output mode/opt -c python.package_style=inplace -c fbcode.enable_gpu_sections=true -c fbcode.platform=platform010 -c fbcode.split-dwarf=true hpc/new/models/feed/benchmark:feed_lower_benchmark
```
2. then run the lowering of the model with it:
```
TORCHINDUCTOR_MAX_AUTOTUNE=1 TORCHINDUCTOR_UNIQUE_KERNEL_NAMES=1 TORCH_LOGS="output_code,graph_code" TORCH_COMPILE_DEBUG=1 ../buck-out/v2/gen/fbcode/79c6b019ee0f9469/hpc/new/models/feed/benchmark/__feed_lower_benchmark__/feed_lower_benchmark.par --load=manifold://ig_inference_model/tree/user/facebook/fblearner/predictor/960999465/60/gpu_lowering/input.predictor --skip-trt --skip-ait --sync-mode=0 --enable-aot-inductor --lower-presets="ig_stories" --gpu-trace
```
cf https://docs.google.com/document/d/1yD30xYrdmM8r2HTdmXnZTg0-MHVexfVrAa0294m1AUE/edit?pli=1#heading=h.qiv3fp7e6zg0

From torchrec: https://www.internalfb.com/intern/wiki/Torchrec/Development/Testing_production_models/

From ge0405
baseline (without your diff): f477293168
your diff: f477292363

```
buck2 test //caffe2/test/dynamo:test_dynamo_torchrec
buck2 run 'fbcode//mode/opt' fbcode//pytorch/benchmark/fb/test_gpu:run_test_gpu -- 'pytorch.benchmark.fb.test_gpu.test_gpu.TestBenchmarkFbGpu.test_train_blue_reels_vdd_v3_inductor_speedup'
```

Differential Revision: D49236757

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109216
Approved by: https://github.com/voznesenskym
2023-09-18 14:39:44 +00:00
PyTorch MergeBot
1d32c9c7f2 Revert "Force synced KJT to trace unbacked SymInt (#108960)"
This reverts commit f9a250c35b.

Reverted https://github.com/pytorch/pytorch/pull/108960 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally ([comment](https://github.com/pytorch/pytorch/pull/108960#issuecomment-1715850779))
2023-09-12 14:37:36 +00:00
Edward Yang
f9a250c35b Force synced KJT to trace unbacked SymInt (#108960)
Summary:
The basic concept behind this diff is to modify Dynamo's tracing behavior when it encounters a KeyedJaggedTensor that is synced (aka has `_length_per_key` and `_offset_per_key` populated). These fields are lists of integers; ordinarily, Dynamo will optimistically try to specialize on integers; however, for KJTs, we know that these integers will definitely vary from run to run. Furthermore, ordinarily, we would also specialize these integers if they are 0/1, but we will frequently expect features in KJTs to be 0/1.

The fix is to detect KJTs and treat these integers as *unbacked integers*. This is NOT a universally sound optimization: when treating these integers as unbacked, we never report them as equal to zero or one. In return, we always generate graphs that generalize no matter the length of values on features. This is enough to trace through APS sparse arch, torchrec_dlrm and some small split-cat examples.

The special integer behavior is triggered by a dynamically scoped `force_unspec_int_unbacked_size_like` variable on TracingContext, which we trigger when we wrap a KJT. There probably are other ways to do this, but this was simple and worked.

Test Plan:
```
buck2 test mode/dev-nosan //pytorch/benchmark/fb/test_gpu:run_test_gpu
```

from aakhundov

1. first build feed_lower_benchmark:
```
buck2 build --show-output mode/opt -c python.package_style=inplace -c fbcode.enable_gpu_sections=true -c fbcode.platform=platform010 -c fbcode.split-dwarf=true hpc/new/models/feed/benchmark:feed_lower_benchmark
```
2. then run the lowering of the model with it:
```
TORCHINDUCTOR_MAX_AUTOTUNE=1 TORCHINDUCTOR_UNIQUE_KERNEL_NAMES=1 TORCH_LOGS="output_code,graph_code" TORCH_COMPILE_DEBUG=1 ../buck-out/v2/gen/fbcode/79c6b019ee0f9469/hpc/new/models/feed/benchmark/__feed_lower_benchmark__/feed_lower_benchmark.par --load=manifold://ig_inference_model/tree/user/facebook/fblearner/predictor/960999465/60/gpu_lowering/input.predictor --skip-trt --skip-ait --sync-mode=0 --enable-aot-inductor --lower-presets="ig_stories" --gpu-trace
```
cf https://docs.google.com/document/d/1yD30xYrdmM8r2HTdmXnZTg0-MHVexfVrAa0294m1AUE/edit?pli=1#heading=h.qiv3fp7e6zg0

From torchrec: https://www.internalfb.com/intern/wiki/Torchrec/Development/Testing_production_models/

From ge0405
baseline (without your diff): f477293168
your diff: f477292363

Differential Revision: D49019987

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108960
Approved by: https://github.com/voznesenskym
2023-09-12 03:44:24 +00:00
Edward Z. Yang
66f67d9a25 Print restart attempt as part of Dynamo log context (#108864)
Now looks like:

```
[2023-09-08 06:04:48,532] [0/0] torch._dynamo.symbolic_convert: [DEBUG] TRACE STORE_ATTR foo [ConstantVariable(int), NNModuleVariable()]
[2023-09-08 06:04:48,532] [0/0] torch._dynamo.convert_frame: [INFO] Restarting analysis due to _dynamo/variables/nn_module.py:138 in convert_to_unspecialized
[2023-09-08 06:04:48,533] [0/0_1] torch._dynamo.symbolic_convert: [INFO] Step 1: torchdynamo start tracing f /data/users/ezyang/c/pytorch/a.py:6
[2023-09-08 06:04:48,533] [0/0_1] torch._dynamo.symbolic_convert.__trace_source: [DEBUG] TRACE starts_line f /data/users/ezyang/c/pytorch/a.py:6
```

I'm happy to bikeshed the exact formatting of the attempt number if you
want.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108864
Approved by: https://github.com/mlazos, https://github.com/voznesenskym
2023-09-08 23:00:19 +00:00
Huy Do
5a4fe05a15 Revert "Force synced KJT to trace unbacked SymInt (#107788)" (#108684)
This reverts commit 3b92ef814d. The revert bot doesn't seem to work on https://github.com/pytorch/pytorch/pull/107788 (not sure why), so let's revert it manually instead.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108684
Approved by: https://github.com/ezyang
2023-09-06 19:15:45 +00:00
Edward Z. Yang
3b92ef814d Force synced KJT to trace unbacked SymInt (#107788)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107788
Approved by: https://github.com/voznesenskym
2023-09-06 03:18:26 +00:00
Animesh Jain
a506d0ad8f [dynamo] Store originating source in the Guard object (#107634)
Many times, I find myself wanting to know the source for the guard. This PR adds that as a field of guard itself.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107634
Approved by: https://github.com/voznesenskym
ghstack dependencies: #107622
2023-08-22 02:16:31 +00:00
Edward Z. Yang
8292b03c47 Use fast traceback for symbolic shapes (#107439)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107439
Approved by: https://github.com/voznesenskym
ghstack dependencies: #107505, #107516, #107530, #107532, #107562, #107471
2023-08-22 01:03:13 +00:00
Edward Z. Yang
8316affc45 Add frame/recompile counter to all log messages in tracing context (#107530)
All log messages that occur while running Dynamo compilation now have `[X/Y]` added to the beginning of their message. X represents the frame being compiled, while Y says which compilation of the frame. For example, if you are debugging a frame that is repeatedly recompiling, you can look for N/0, N/1, N/2, etc. for the same N.  Here is what the logs look like as you transition from one frame to another:

[Screenshot of the logs showing the [X/Y] prefix as compilation moves from one frame to the next: https://github.com/pytorch/pytorch/assets/13564/4897e368-1e50-4807-b342-54e911bcf087]

To accurately get this prefix added to all messages, I had to expand the scope of the `tracing` context manager. Its scope now coincides with `log_compilation_event`. To do this, I had to populate fake mode lazily in the TracingContext, since it isn't created until later, inside the OutputGraph.

This subsumes the previous X.Y logging that was solely for dynamic shapes.

Unfortunately I had to reindent some stuff. Review the diff with whitespace off.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107530
Approved by: https://github.com/anijain2305
ghstack dependencies: #107505, #107516
2023-08-21 13:02:12 +00:00
Edward Z. Yang
d6d485fa8c Revamp guard debug logging (#107505)
The new guard printout looks like this:

```
[DEBUG] GUARDS:
[DEBUG]   ___check_type_id(L['name'], 7605632)                          # if name == "special_attr":  # test/dynamo/test_misc.py:1155 in __getattribute__
[DEBUG]   L['name'] == '_backward_pre_hooks'                            # if name == "special_attr":  # test/dynamo/test_misc.py:1155 in __getattribute__
[DEBUG]   ___check_obj_id(L['self'], 139746432564960)                   # return super().__getattribute__(name)  # test/dynamo/test_misc.py:1157 in __getattribute__
[DEBUG]   ___check_obj_id(L['__class__'], 1451499216)                   # return super().__getattribute__(name)  # test/dynamo/test_misc.py:1157 in __getattribute__
[DEBUG]   ___is_grad_enabled()                                          # _dynamo/output_graph.py:346 in init_ambient_guards
[DEBUG]   not ___are_deterministic_algorithms_enabled()                 # _dynamo/output_graph.py:342 in init_ambient_guards
[DEBUG]   ___is_torch_function_enabled()                                # _dynamo/output_graph.py:350 in init_ambient_guards
[DEBUG]   utils_device.CURRENT_DEVICE == None                           # _dynamo/output_graph.py:348 in init_ambient_guards
```

Along with the guards, we also print what line of user code caused the guard to be added, or what line of Dynamo internal code added the guard (if there is no user stack trace, which is typically the case for ambient guards.)

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107505
Approved by: https://github.com/mlazos, https://github.com/voznesenskym, https://github.com/anijain2305
2023-08-20 06:50:27 +00:00
Edward Z. Yang
67bb3c05b0 Add verbose_guards logging artifact (#107388)
It looks like this:

```
[DEBUG] GUARD: ___check_type_id(L['z'][L["MyEnum"].BAR], 7640416) and L['z'][L["MyEnum"].BAR] == 10
[DEBUG] Stack:
[DEBUG]   File "/data/users/ezyang/b/pytorch/test/dynamo/test_misc.py", line 6657, in <module>
[DEBUG]     run_tests()
[DEBUG]   File "/data/users/ezyang/b/pytorch/torch/_dynamo/test_case.py", line 38, in run_tests
[DEBUG]     run_tests()
[DEBUG]   File "/data/users/ezyang/b/pytorch/torch/testing/_internal/common_utils.py", line 985, in run_tests
[DEBUG]     unittest.main(argv=argv)
[DEBUG]   File "/home/ezyang/local/b/pytorch-env/lib/python3.10/unittest/main.py", line 101, in __init__
[DEBUG]     self.runTests()
[DEBUG]   File "/home/ezyang/local/b/pytorch-env/lib/python3.10/unittest/main.py", line 271, in runTests
[DEBUG]     self.result = testRunner.run(self.test)
[DEBUG]   File "/home/ezyang/local/b/pytorch-env/lib/python3.10/unittest/runner.py", line 184, in run
[DEBUG]     test(result)
[DEBUG]   File "/home/ezyang/local/b/pytorch-env/lib/python3.10/unittest/suite.py", line 84, in __call__
[DEBUG]     return self.run(*args, **kwds)
[DEBUG]   File "/home/ezyang/local/b/pytorch-env/lib/python3.10/unittest/suite.py", line 122, in run
[DEBUG]     test(result)
[DEBUG]   File "/home/ezyang/local/b/pytorch-env/lib/python3.10/unittest/suite.py", line 84, in __call__
[DEBUG]     return self.run(*args, **kwds)
[DEBUG]   File "/home/ezyang/local/b/pytorch-env/lib/python3.10/unittest/suite.py", line 122, in run
[DEBUG]     test(result)
[DEBUG]   File "/home/ezyang/local/b/pytorch-env/lib/python3.10/unittest/case.py", line 650, in __call__
[DEBUG]     return self.run(*args, **kwds)
[DEBUG]   File "/data/users/ezyang/b/pytorch/torch/testing/_internal/common_utils.py", line 2521, in run
[DEBUG]     self._run_with_retry(
[DEBUG]   File "/data/users/ezyang/b/pytorch/torch/testing/_internal/common_utils.py", line 2450, in _run_with_retry
[DEBUG]     super_run(result=result)
[DEBUG]   File "/home/ezyang/local/b/pytorch-env/lib/python3.10/unittest/case.py", line 591, in run
[DEBUG]     self._callTestMethod(testMethod)
[DEBUG]   File "/home/ezyang/local/b/pytorch-env/lib/python3.10/unittest/case.py", line 549, in _callTestMethod
[DEBUG]     method()
[DEBUG]   File "/data/users/ezyang/b/pytorch/torch/testing/_internal/common_utils.py", line 2377, in wrapper
[DEBUG]     method(*args, **kwargs)
[DEBUG]   File "/data/users/ezyang/b/pytorch/test/dynamo/test_misc.py", line 2529, in test_enum_as_dict_key_with_overloaded_str
[DEBUG]     res = opt_fn(x)
[DEBUG]   File "/data/users/ezyang/b/pytorch/torch/_dynamo/eval_frame.py", line 333, in _fn
[DEBUG]     return fn(*args, **kwargs)
[DEBUG]   File "/data/users/ezyang/b/pytorch/test/dynamo/test_misc.py", line 2519, in fn
[DEBUG]     torch._dynamo.graph_break()
[DEBUG]   File "/data/users/ezyang/b/pytorch/torch/_dynamo/eval_frame.py", line 493, in catch_errors
[DEBUG]     return callback(frame, cache_size, hooks, frame_state)
[DEBUG]   File "/data/users/ezyang/b/pytorch/torch/_dynamo/convert_frame.py", line 637, in _convert_frame
[DEBUG]     result = inner_convert(frame, cache_size, hooks, frame_state)
[DEBUG]   File "/data/users/ezyang/b/pytorch/torch/_dynamo/convert_frame.py", line 133, in _fn
[DEBUG]     return fn(*args, **kwargs)
[DEBUG]   File "/data/users/ezyang/b/pytorch/torch/_dynamo/convert_frame.py", line 371, in _convert_frame_assert
[DEBUG]     return _compile(
[DEBUG]   File "/data/users/ezyang/b/pytorch/torch/_dynamo/convert_frame.py", line 567, in _compile
[DEBUG]     guarded_code = compile_inner(code, one_graph, hooks, transform)
[DEBUG]   File "/data/users/ezyang/b/pytorch/torch/_dynamo/utils.py", line 181, in time_wrapper
[DEBUG]     r = func(*args, **kwargs)
[DEBUG]   File "/data/users/ezyang/b/pytorch/torch/_dynamo/convert_frame.py", line 466, in compile_inner
[DEBUG]     out_code = transform_code_object(code, transform)
[DEBUG]   File "/data/users/ezyang/b/pytorch/torch/_dynamo/bytecode_transformation.py", line 1028, in transform_code_object
[DEBUG]     transformations(instructions, code_options)
[DEBUG]   File "/data/users/ezyang/b/pytorch/torch/_dynamo/convert_frame.py", line 416, in transform
[DEBUG]     tracer = InstructionTranslator(
[DEBUG]   File "/data/users/ezyang/b/pytorch/torch/_dynamo/symbolic_convert.py", line 2018, in __init__
[DEBUG]     self.symbolic_locals = collections.OrderedDict(
[DEBUG]   File "/data/users/ezyang/b/pytorch/torch/_dynamo/symbolic_convert.py", line 2021, in <genexpr>
[DEBUG]     VariableBuilder(
[DEBUG]   File "/data/users/ezyang/b/pytorch/torch/_dynamo/variables/builder.py", line 211, in __call__
[DEBUG]     vt = self._wrap(value).clone(**self.options())
[DEBUG]   File "/data/users/ezyang/b/pytorch/torch/_dynamo/variables/builder.py", line 404, in _wrap
[DEBUG]     result = {
[DEBUG]   File "/data/users/ezyang/b/pytorch/torch/_dynamo/variables/builder.py", line 405, in <dictcomp>
[DEBUG]     k: VariableBuilder(
[DEBUG]   File "/data/users/ezyang/b/pytorch/torch/_dynamo/variables/builder.py", line 211, in __call__
[DEBUG]     vt = self._wrap(value).clone(**self.options())
[DEBUG]   File "/data/users/ezyang/b/pytorch/torch/_dynamo/variables/builder.py", line 354, in _wrap
[DEBUG]     return type_dispatch(self, value)
[DEBUG]   File "/data/users/ezyang/b/pytorch/torch/_dynamo/variables/builder.py", line 837, in wrap_literal
[DEBUG]     return self.wrap_unspecialized_primitive(value)
[DEBUG]   File "/data/users/ezyang/b/pytorch/torch/_dynamo/variables/builder.py", line 1073, in wrap_unspecialized_primitive
[DEBUG]     guards=self.make_guards(GuardBuilder.CONSTANT_MATCH),
[DEBUG]   File "/data/users/ezyang/b/pytorch/torch/_dynamo/variables/builder.py", line 269, in make_guards
[DEBUG]     return {source.make_guard(guard) for guard in guards}
[DEBUG]   File "/data/users/ezyang/b/pytorch/torch/_dynamo/variables/builder.py", line 269, in <setcomp>
[DEBUG]     return {source.make_guard(guard) for guard in guards}
[DEBUG]   File "/data/users/ezyang/b/pytorch/torch/_guards.py", line 641, in make_guard
[DEBUG]     return Guard(self.name(), self.guard_sou
```

One downside is I can't report *why* the guard was added. I'm not entirely sure how to do this; the problem is guards will propagate to a bunch of variables before finally getting included as part of the final set. Maybe a very very verbose version could report stack traces at every handoff point.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107388
Approved by: https://github.com/mlazos
ghstack dependencies: #107438, #107358
2023-08-18 19:05:54 +00:00
eellison
e9ae820279 Unfuse bias add before pointwise ops (#106912)
I get a 2% inference speedup in HF with this PR. I checked to see if there are any models where unfusing was slower than the cublas gelu fusion, and I did not see any, which was surprising to me. Sorry for the cublas-activation api churn 😬

Kicking off another run with cublas 12; it's possible that the results have changed since.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/106912
Approved by: https://github.com/jansel
ghstack dependencies: #106911
2023-08-16 17:22:24 +00:00