Commit Graph

2446 Commits

Author SHA1 Message Date
Guilherme Leobas
4eaa000acc Teach dynamo about torch.func.jvp (#119926)
List of changes:
- Replace JVP_NESTING by torch._C._functorch.maybe_current_level()
- Remove all increment nesting functions from wrap_fx_proxy_cls
- fwAD.make_dual receives the dual_level as keyword argument
- Add jvp_increment_nesting, set_fwd_grad_enabled and dual_level context managers to dynamo

Pull Request resolved: https://github.com/pytorch/pytorch/pull/119926
Approved by: https://github.com/zou3519
2024-03-22 20:25:47 +00:00
Peter Bell
5790096059 [dynamo] Remove uses of raise unimplemented (#122136)
`unimplemented` is a function that raises an error, so
`raise unimplemented(...)` never reaches the `raise`.
Another related issue is that `raise unimplemented(...) from e`
doesn't attach the exception cause correctly. I fix this by adding
a `from_exc` argument to `unimplemented`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/122136
Approved by: https://github.com/lezcano
2024-03-22 19:29:58 +00:00
Brian Hirsh
09be5800c8 dynamo: support placement kwargs for DTensor.to_local() (#119947)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/119947
Approved by: https://github.com/wanchaol, https://github.com/yoyoyocmu
ghstack dependencies: #118803
2024-03-22 14:42:27 +00:00
Brian Hirsh
2e44b12dd4 dynamo: handle DTensor.device_mesh.device_type (#118803)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/118803
Approved by: https://github.com/wanchaol, https://github.com/yanboliang
2024-03-22 14:42:22 +00:00
chilli
eca30df846 Added load_args to repro (#121624)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/121624
Approved by: https://github.com/ezyang
2024-03-22 08:32:14 +00:00
Jason Ansel
bb75313f0a [dynamo] Optimize handling of BINARY_OP (#122465)
This saves ~0.1s on https://dev-discuss.pytorch.org/t/a-torchdynamo-trace-time-ablation-study/1961

Pull Request resolved: https://github.com/pytorch/pytorch/pull/122465
Approved by: https://github.com/oulgen
2024-03-22 08:14:58 +00:00
Joel Schlosser
cd6bfc7965 Proper view support for jagged layout NestedTensor (#113279)
This PR:
* Introduces an ATen op for creating true jagged views from a dense values buffer
    * `_nested_view_from_jagged(values, offsets, lengths, ragged_idx, dummy)`
    * This ops is implemented on the Python side using torch.library so we can return a subclass instance
    * `jagged_from_list()` now uses this instead of the old autograd.Function `NestedViewFromBuffer`
    * The latter op is used for non-contiguous JTs returned via `torch.nested.narrow()`
    * `dummy` is an awful hack to ensure that `NestedTensor.__torch_dispatch__()` is invoked for our view
* Introduces an ATen op for accessing the `values` component of an NT via a view
    * `_nested_get_values(nt)`
* **Removes** the autograd.Functions `ViewNestedFromBuffer` and `ViewBufferFromNested` in favor of `nested_from_values_offsets()` / `nested_from_values_offsets_lengths()` and `nt.values()`, respectively.
* Changes test code to prefer `as_nested_tensor()` over `jagged_from_list()` directly
    * Similarly, avoid `buffer_from_jagged()`, preferring `values()`
* Depends on general subclass view fake-ification on the PT2 side (handled solely in previous PRs in the stack)

With these changes, the semantics of jagged layout NTs are such that they are considered a true view of the underlying `values` buffer. This means views of jagged NTs are views of the underlying buffer as well, simplifying some handling.

Differential Revision: [D54269922](https://our.internmc.facebook.com/intern/diff/D54269922)
Co-authored-by: voznesenskym <voznesenskym@gmail.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113279
Approved by: https://github.com/ezyang
2024-03-22 02:12:36 +00:00
PyTorch MergeBot
224beecee6 Revert "Proper view support for jagged layout NestedTensor (#113279)"
This reverts commit 5855c490f0.

Reverted https://github.com/pytorch/pytorch/pull/113279 on behalf of https://github.com/jbschlosser due to Need to fix BC thing ([comment](https://github.com/pytorch/pytorch/pull/113279#issuecomment-2013899762))
2024-03-21 22:03:01 +00:00
William Wen
23524710e6 [dynamo] use proxies to nn.Module in dynamo generated GraphModules (#120756)
Fixes remaining refleaks found when debugging https://github.com/pytorch/pytorch/issues/119607, tests added in https://github.com/pytorch/pytorch/pull/120657.

Also fixes some tests that xfail: https://github.com/pytorch/pytorch/issues/120631 (not entirely sure why), but introduced tests now fail.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/120756
Approved by: https://github.com/jansel
2024-03-21 21:23:12 +00:00
PyTorch MergeBot
968c4c4154 Revert "Refactor gpu trace to be device-agnostic (#121794)"
This reverts commit 74deacbf31.

Reverted https://github.com/pytorch/pytorch/pull/121794 on behalf of https://github.com/huydhn due to Sorry for reverting your change but it breaks ROCm jobs in trunk 74deacbf31, please help take a look and reland the change ([comment](https://github.com/pytorch/pytorch/pull/121794#issuecomment-2013674083))
2024-03-21 20:33:17 +00:00
Adnan Akhundov
885fb9742d Handle special kwargs in user-written Triton kernel calls (#122280)
Summary: Special kwargs like `num_warps`, `num_stages`, and `num_ctas` can be passed to the Triton kernel call as kwargs. These kwargs are handled in a special way, not being passed to the underlying kernel function directly. In this PR, we move those special kwargs from `kwargs` of the `TritonKernelVariable` in dynamo to `Autotuner`'s `Config` instances (either already existing or newly created for this purpose). As a result, the special kwargs can be codegened correctly as a part of `Config`, not as direct arguments to the kernel `.run`.

Test Plan:

```
python test/inductor/test_triton_kernels.py -k test_triton_kernel_special_kwargs
...
----------------------------------------------------------------------
Ran 6 tests in 6.783s

OK
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/122280
Approved by: https://github.com/oulgen
2024-03-21 03:34:07 +00:00
Yu, Guangye
74deacbf31 Refactor gpu trace to be device-agnostic (#121794)
# Motivation
Refactor gpu trace to be device-agnostic. gpu trace is usually used in runtime components, including Device, Stream, Event, Guard, and Allocator. It should be device-agnostic and can be shared among each device backend.

# Solution
move `_cuda_trace.py` to `_gpu_trace.py`, which makes each device backend owns their callback, respectively.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/121794
Approved by: https://github.com/jgong5, https://github.com/albanD, https://github.com/EikanWang, https://github.com/gujinghui
2024-03-21 01:52:58 +00:00
JackCaoG
e38d60bc07 Remove some stale xla dynamo backend (#122128)
`torchxla_trace_once ` and `aot_torchxla_trivial ` should be removed.

In our internal(hopefully dashboard can be open source soon) torchbench daily runs, `openxla` backend has much higher passing rate and similar perfomrance as the `openxla_eval`(non-aot-auto-grad backend). We still use `openxla_eval` in llama2 example but I think we should move user to `openxla` backend going forward.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/122128
Approved by: https://github.com/alanwaketan, https://github.com/jansel
2024-03-21 01:13:50 +00:00
Joel Schlosser
5855c490f0 Proper view support for jagged layout NestedTensor (#113279)
This PR:
* Introduces an ATen op for creating true jagged views from a dense values buffer
    * `_nested_view_from_jagged(values, offsets, lengths, ragged_idx, dummy)`
    * This ops is implemented on the Python side using torch.library so we can return a subclass instance
    * `jagged_from_list()` now uses this instead of the old autograd.Function `NestedViewFromBuffer`
    * The latter op is used for non-contiguous JTs returned via `torch.nested.narrow()`
    * `dummy` is an awful hack to ensure that `NestedTensor.__torch_dispatch__()` is invoked for our view
* Introduces an ATen op for accessing the `values` component of an NT via a view
    * `_nested_get_values(nt)`
* **Removes** the autograd.Functions `ViewNestedFromBuffer` and `ViewBufferFromNested` in favor of `nested_from_values_offsets()` / `nested_from_values_offsets_lengths()` and `nt.values()`, respectively.
* Changes test code to prefer `as_nested_tensor()` over `jagged_from_list()` directly
    * Similarly, avoid `buffer_from_jagged()`, preferring `values()`
* Depends on general subclass view fake-ification on the PT2 side (handled solely in previous PRs in the stack)

With these changes, the semantics of jagged layout NTs are such that they are considered a true view of the underlying `values` buffer. This means views of jagged NTs are views of the underlying buffer as well, simplifying some handling.

Differential Revision: [D54269922](https://our.internmc.facebook.com/intern/diff/D54269922)
Co-authored-by: voznesenskym <voznesenskym@gmail.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113279
Approved by: https://github.com/ezyang
2024-03-20 23:45:34 +00:00
ydwu4
61ff41f0ca [while_loop] disable closure capturing and manually set the inputs. (#122244)
For while_loop operator, it's important to keep the output ordering consistent with input ordering. Previously, we're using set_graph_inputs="automatic", which doesn't respect such ordering. This PR changes it to "manual" and respects the original user inputs' ordering. We disable closures for body and cond fn as they require some additional designs. This PR is just to prevent the bleeding.

 Repro:
```python
import torch
from torch._higher_order_ops.while_loop import while_loop
from torch._functorch.aot_autograd import aot_export_module

class Nested(torch.nn.Module):
    def forward(self, ci, cj, a, b):
        def cond_fn(i1, j1, x1, y1):
            return i1 > 0
        def body_fn(i1, j1, x1, y1):
            def cond_fn_nested(i2, j2, x2, y2):
                return j2 > 0
            def body_fn_nested(i2, j2, x2, y2):
                return i2.clone(), j2 - 1, x2 + 3.14, y2 - 2.71
            i1, j1, x1, y1 = while_loop(
                cond_fn_nested, body_fn_nested, [i1, j1, x1, y1]
            )
            return i1 - 1, j1.clone(), x1 * 2, y1 / 2
        return while_loop(cond_fn, body_fn, (ci, cj, a, b))

nested = Nested()
torch.compile(nested, backend="eager", fullgraph=True)(torch.tensor(2), torch.tensor(2), torch.randn(2, 2), torch.randn(2, 2))
```

Test plan:
add new test.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/122244
Approved by: https://github.com/aakhundov
2024-03-20 20:14:35 +00:00
PyTorch MergeBot
0696db8202 Revert "Teach dynamo about torch.func.jvp (#119926)"
This reverts commit 17489784b6.

Reverted https://github.com/pytorch/pytorch/pull/119926 on behalf of https://github.com/peterbell10 due to broken mac jobs on main ([comment](https://github.com/pytorch/pytorch/pull/119926#issuecomment-2010327997))
2024-03-20 18:34:43 +00:00
IvanKobzarev
8a94005d46 [dynamo][runtime_asserts] Ignore failures on sorting sympy relations (#122205)
Differential Revision: [D55075500](https://our.internmc.facebook.com/intern/diff/D55075500)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/122205
Approved by: https://github.com/ezyang
2024-03-20 13:25:37 +00:00
Guilherme Leobas
17489784b6 Teach dynamo about torch.func.jvp (#119926)
List of changes:
- Replace JVP_NESTING by torch._C._functorch.maybe_current_level()
- Remove all increment nesting functions from wrap_fx_proxy_cls
- fwAD.make_dual receives the dual_level as keyword argument
- Add jvp_increment_nesting, set_fwd_grad_enabled and dual_level context managers to dynamo

Pull Request resolved: https://github.com/pytorch/pytorch/pull/119926
Approved by: https://github.com/zou3519
2024-03-20 13:09:19 +00:00
Jason Ansel
477d154ffd [dynamo] Add missing _nonvar_fields annotations (#122219)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/122219
Approved by: https://github.com/anijain2305
ghstack dependencies: #122218
2024-03-20 07:53:18 +00:00
Jason Ansel
46bf37b3f7 [dynamo] Replace VariableTracker.apply with visit/realize_all (#122218)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/122218
Approved by: https://github.com/anijain2305
2024-03-20 07:53:18 +00:00
Jason Ansel
a0db2e4237 [dynamo] Fixed handling of ImportError (#122222)
Fixes #122088

Pull Request resolved: https://github.com/pytorch/pytorch/pull/122222
Approved by: https://github.com/anijain2305
2024-03-20 07:52:01 +00:00
PyTorch MergeBot
e5e0685f61 Revert "[dynamo] Forward OptimizedModule.__setattr__ to the wrapped module (#122098)"
This reverts commit 88ebdbc97c.

Reverted https://github.com/pytorch/pytorch/pull/122098 on behalf of https://github.com/huydhn due to Sorry for reverting your change but the distributed failure looks legit as it is also failing in trunk 88ebdbc97c ([comment](https://github.com/pytorch/pytorch/pull/122098#issuecomment-2008483316))
2024-03-20 01:12:24 +00:00
Oguz Ulgen
c0b2e56c8f Support triton.language.dtype with torch.compile -- Second Attempt (#122141)
This PR is the second attempt at supporting `triton.language.dtype`, now instead of putting it on the graph, we put it on the side table since it is a constant.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/122141
Approved by: https://github.com/jansel
ghstack dependencies: #122140
2024-03-19 19:40:52 +00:00
Oguz Ulgen
58a805da71 [UserDefinedTriton] Move constant args out of the fx graph (#122140)
@ezyang mentioned that we should not put constant args on the graph. Especially when there are args that would be trickier to put on the graph. E.g. next PR needs `triton.language.dtype` as an argument on the graph.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/122140
Approved by: https://github.com/jansel
2024-03-19 19:40:52 +00:00
PyTorch MergeBot
36e5c1dcab Revert "Teach dynamo about torch.func.jvp (#119926)"
This reverts commit edd04b7c16.

Reverted https://github.com/pytorch/pytorch/pull/119926 on behalf of https://github.com/jeanschmidt due to lots of breakages in pull jobs, checking if reverting this one will help ([comment](https://github.com/pytorch/pytorch/pull/119926#issuecomment-2007915919))
2024-03-19 18:59:46 +00:00
Peter Bell
88ebdbc97c [dynamo] Forward OptimizedModule.__setattr__ to the wrapped module (#122098)
Fixes #114844

In the linked issue we have
```
compiled_module = torch.compile(module)
compiled_module.x = ...
compiled_module(...)  # Mutates self.x
```
Where since the module mutates `self.x` you would expect `compiled_module.x`
to be updated but actually `compiled_module.x = ...` sets an attribute "x"
on the `OptimizedModule` object while the forward method of the module mutates
`module.x`.

This gives the expected behavior by forwarding `compiled_module.__setattr__`
down to `module.__setattr__`. There is already a corresponding `__getattr__`
so now `compiled_module.x` becomes an alias for `module.x`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/122098
Approved by: https://github.com/ezyang, https://github.com/lezcano
2024-03-19 16:51:43 +00:00
PyTorch MergeBot
f9ed1c432d Revert "Refactor gpu trace to be device-agnostic (#121794)"
This reverts commit 0ff1109e26.

Reverted https://github.com/pytorch/pytorch/pull/121794 on behalf of https://github.com/jeanschmidt due to Reverting to see if rocm trunk errors are related ([comment](https://github.com/pytorch/pytorch/pull/121794#issuecomment-2007519408))
2024-03-19 15:40:26 +00:00
Jason Ansel
c05bf0037d [dynamo] Remove copy_graphstate/restore_graphstate (#122067)
Some dead code cleanup.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/122067
Approved by: https://github.com/oulgen
2024-03-19 15:37:53 +00:00
Guilherme Leobas
edd04b7c16 Teach dynamo about torch.func.jvp (#119926)
List of changes:
- Replace JVP_NESTING by torch._C._functorch.maybe_current_level()
- Remove all increment nesting functions from wrap_fx_proxy_cls
- fwAD.make_dual receives the dual_level as keyword argument
- Add jvp_increment_nesting, set_fwd_grad_enabled and dual_level context managers to dynamo

Pull Request resolved: https://github.com/pytorch/pytorch/pull/119926
Approved by: https://github.com/zou3519
2024-03-19 13:06:42 +00:00
Yu, Guangye
0ff1109e26 Refactor gpu trace to be device-agnostic (#121794)
# Motivation
Refactor gpu trace to be device-agnostic. gpu trace is usually used in runtime components, including Device, Stream, Event, Guard, and Allocator. It should be device-agnostic and can be shared among each device backend.

# Solution
move `_cuda_trace.py` to `_gpu_trace.py`, which makes each device backend owns their callback, respectively.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/121794
Approved by: https://github.com/jgong5, https://github.com/albanD, https://github.com/EikanWang, https://github.com/gujinghui
2024-03-19 06:02:28 +00:00
Jason Ansel
5bc7f7f977 [dynamo] Make tx.next_instruction lazy (#122066)
Improves benchmarks/dynamo/microbenchmarks/dynamo_microbenchmarks.py
from 2.5s to 2.4s.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/122066
Approved by: https://github.com/oulgen, https://github.com/anijain2305
ghstack dependencies: #122039, #122043, #122055, #122058, #122060, #122063
2024-03-19 04:23:30 +00:00
Jason Ansel
153a01833b [dynamo] Optimize SourcelessBuilder (#122063)
Improves `benchmarks/dynamo/microbenchmarks/dynamo_microbenchmarks.py`
from 2.7s to 2.5s.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/122063
Approved by: https://github.com/anijain2305
ghstack dependencies: #122039, #122043, #122055, #122058, #122060
2024-03-19 04:23:30 +00:00
Jason Ansel
8082adcf65 [dynamo] Only rename a proxy once (#122060)
Improves `benchmarks/dynamo/microbenchmarks/dynamo_microbenchmarks.py`
from 3.9s to 2.7s.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/122060
Approved by: https://github.com/oulgen
ghstack dependencies: #122039, #122043, #122055, #122058
2024-03-19 04:23:27 +00:00
Jason Ansel
2bec55c5f9 [dynamo] Remove VariableTracker.parents_tracker (#122058)
This is leftover from mutable variable tracker days and no longer needed.

Improves benchmarks/dynamo/microbenchmarks/dynamo_microbenchmarks.py
from 4.2s to 3.9s.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/122058
Approved by: https://github.com/oulgen, https://github.com/anijain2305
ghstack dependencies: #122039, #122043, #122055
2024-03-19 04:23:24 +00:00
Jason Ansel
3c706bf483 [dynamo] Optimize BuiltinVariable (#122055)
Improves `benchmarks/dynamo/microbenchmarks/dynamo_microbenchmarks.py`
from 5.1s to 4.2s (compared to 2 PRs ago).

This works by precomputing (and caching) the parts of `BuiltinVariable.call_function` that don't depend on the values of args/kwargs.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/122055
Approved by: https://github.com/oulgen, https://github.com/anijain2305
ghstack dependencies: #122039, #122043
2024-03-19 04:23:20 +00:00
Jason Ansel
07caea5c12 [dynamo] Refactor COMPARE_OP and comparison builtins (#122043)
This removes the duplicate handling of comparison ops between symbolic_convert and bultin and refactors the handling to use the binop infrastructure.  This change regresses overheads a bit, but this is fixed in the next PR.

New test skips are variants of `type(e) is np.ndarray` previously falling back to eager.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/122043
Approved by: https://github.com/anijain2305
ghstack dependencies: #122039
2024-03-19 04:23:17 +00:00
Jason Ansel
769ff86b91 [dynamo] Optimize COMPARE_OP (#122039)
Improves `benchmarks/dynamo/microbenchmarks/dynamo_microbenchmarks.py`
from 5.6 to 5.1s.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/122039
Approved by: https://github.com/Skylion007, https://github.com/anijain2305
2024-03-19 04:23:14 +00:00
Animesh Jain
8860c625ea [dynamo][guards-cpp-refactor] Integrate cpp guard manager with CheckFnManager (#120726)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/120726
Approved by: https://github.com/jansel
2024-03-19 03:11:31 +00:00
Animesh Jain
f84d560236 [dynamo] Raise accumulated cache size limit (#122130)
Fixes #114511

This was raised by IBM folks where the a LLM compile was failing because it had more than 64 layers.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/122130
Approved by: https://github.com/Chillee, https://github.com/jansel
ghstack dependencies: #121954, #122005
2024-03-19 02:35:48 +00:00
Animesh Jain
7084528eb9 [dynamo][model_output] Do not include none for CustomizedDictVariable (#122005)
Fixes https://github.com/pytorch/pytorch/issues/120923

Pull Request resolved: https://github.com/pytorch/pytorch/pull/122005
Approved by: https://github.com/weifengpy, https://github.com/jansel
ghstack dependencies: #121954
2024-03-19 02:35:48 +00:00
Jason Ansel
18d94d7165 Make FX nodes sortable (#122071)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/122071
Approved by: https://github.com/oulgen
2024-03-19 01:40:56 +00:00
andrewor14
773ae817f7 Batch Norm Consolidation (#116092)
**Summary:**

This commit simplifies the existing decomposition hierarchy
of batch norm ops by adding a single, backend agnostic op:
`batch_norm_with_update`. The existing hierarchy looks like:

```
aten.batch_norm ->
aten._batch_norm_impl_index ->
[
  aten.native_batch_norm ->
  aten._native_batch_norm_legit (export only) ->
  _batch_norm_legit_cpu/cuda (kernels, export only) ->
  _batch_norm_cpu/cuda (kernels)
] OR
[ aten.cudnn_batch_norm ] OR
[ aten.miopen_batch_norm ]
```

Aside from complexity, an important problem with the
above decomposition hierarchy is cuda numerics in
export flows. We observed significantly worse convergence
when training a mobilenetv2-like model when using the
`_batch_norm_cuda` kernel instead of the `cudnn_batch_norm`
kernel. This means users who export their models on CPU
first then move the models to cuda later may silently
see worse accuracies even when cudnn is installed,
because they are using the worse kernel. This issue is
summarized in https://github.com/pytorch/pytorch/issues/111384.

Instead, the new hierarchy proposed by consolidating
existing batch norm ops will look like:

```
aten.batch_norm ->
aten.batch_norm_with_update ->
[ _batch_norm_cpu (kernel) ] OR
[ _batch_norm_cuda (kernel) ] OR
[ cudnn_batch_norm (kernel) ] OR
[ miopen_batch_norm (kernel) ]
```

The new op `batch_norm_with_update` hides backend
implementation details and automatically picks the right
kernel based on what is installed. This commit also adds
the following variants to this op:

```
batch_norm_with_update_functional
batch_norm_with_update.out
batch_norm_no_update
batch_norm_no_update.out
batch_norm_backward
```

Note that this commit only adds this op and its variants,
but does not actually change the decomps to produce these
ops in the graph. This will be done after the 2 week FC
window, and the ops used in the old stack is planned to
be removed after the 6 month BC window.

Test Plan: `OpInfo` tests for `batch_norm_with_update`.

Reviewers: albanD, bdhirsh

Subscribers: albanD, bdhirsh, supriyar

Tasks: https://github.com/pytorch/pytorch/issues/111384

Differential Revision: [D54805279](https://our.internmc.facebook.com/intern/diff/D54805279)
Co-authored-by: Tugsbayasgalan Manlaibaatar <tmanlaibaatar@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/116092
Approved by: https://github.com/bdhirsh, https://github.com/albanD
2024-03-18 21:01:30 +00:00
Oguz Ulgen
7c5e29ae71 Back out "Support triton.language.dtype with torch.compile (#121690)" (#122108)
Summary: Some hard to deal with package import/export related problems. Lets revert and start with clean slate.

Test Plan: CI

Differential Revision: D55024877

Pull Request resolved: https://github.com/pytorch/pytorch/pull/122108
Approved by: https://github.com/ezyang
2024-03-18 20:50:28 +00:00
James Wu
df1cdaedeb Log restart reasons and extra compile time in CompilationMetrics (#121827)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/121827
Approved by: https://github.com/ezyang, https://github.com/yanboliang
2024-03-18 18:59:25 +00:00
Jason Ansel
5d52b163d1 [dynamo] Optimize load/store/const op handling (#122038)
Improves `benchmarks/dynamo/microbenchmarks/dynamo_microbenchmarks.py`
from 6.7s to 5.6.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/122038
Approved by: https://github.com/Skylion007
ghstack dependencies: #122032, #122033, #122034, #122035
2024-03-18 18:08:06 +00:00
Jason Ansel
4034873a31 [dynamo] Optimize builtin handling (#122035)
Improves `benchmarks/dynamo/microbenchmarks/dynamo_microbenchmarks.py`
from 7.3s to 6.7s.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/122035
Approved by: https://github.com/Skylion007
ghstack dependencies: #122032, #122033, #122034
2024-03-18 18:08:06 +00:00
Jason Ansel
6ca0323615 [dynamo] Optimize VariableTracker.__post_init__ (#122034)
Improves `benchmarks/dynamo/microbenchmarks/dynamo_microbenchmarks.py`
from 8.6s to 7.3s.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/122034
Approved by: https://github.com/Skylion007
ghstack dependencies: #122032, #122033
2024-03-18 18:08:06 +00:00
Animesh Jain
c568b84794 [dynamo][guards] Move backend match to eval_frame (#121954)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/121954
Approved by: https://github.com/jansel
2024-03-17 06:52:10 +00:00
Jason Ansel
4d92928fe2 [dynamo] Add tests for fake FSDP (#121610)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/121610
Approved by: https://github.com/yanboliang
ghstack dependencies: #121735, #120965
2024-03-16 04:29:59 +00:00
Jason Ansel
0b7d9711d4 [dynamo] Add support for nn.Parameter constructor (part 2) (#120965)
This handles the case where the tensor isn't an input.

The changes to dynamo tests are cases where we would previously fall back to eager.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/120965
Approved by: https://github.com/yanboliang
ghstack dependencies: #121735
2024-03-16 04:29:58 +00:00
Jason Ansel
040b925753 [Compiled Autograd] Reorder accumulate grad nodes (#121735)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/121735
Approved by: https://github.com/xmfan
2024-03-16 04:29:56 +00:00
lezcano
d0d09f5977 Fix torch.compile links (#121824)
Fixes https://github.com/pytorch/pytorch.github.io/issues/1567

Pull Request resolved: https://github.com/pytorch/pytorch/pull/121824
Approved by: https://github.com/svekars, https://github.com/peterbell10, https://github.com/malfet
ghstack dependencies: #121823
2024-03-15 19:49:37 +00:00
lezcano
8a5a377190 Move doc links to point to main (#121823)
The previous links were pointing to an outdated branch

Command: `find . -type f -exec sed -i "s:docs/main:docs/master:g" {} + `

Pull Request resolved: https://github.com/pytorch/pytorch/pull/121823
Approved by: https://github.com/albanD, https://github.com/malfet
2024-03-15 19:49:37 +00:00
Animesh Jain
d04faf4531 [dynamo][compile-time] Remove preserve rng state per op (#121923)
We already have one globally - 02bb2180f4/torch/_dynamo/convert_frame.py (L477)

I don't think we need per op.

Saves ~2 seconds on this benchmark

~~~
def fn(x):
    for _ in range(10000):
        x = torch.ops.aten.sin(x)
    return x
~~~

Pull Request resolved: https://github.com/pytorch/pytorch/pull/121923
Approved by: https://github.com/jansel
2024-03-15 18:24:46 +00:00
Jason Ansel
5a2b4fc8f0 [dynamo] Convert invalid args into graph breaks (#121784)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/121784
Approved by: https://github.com/yanboliang
2024-03-15 06:51:27 +00:00
Animesh Jain
a623666066 [dynamo][compile-time] Make output_graph new_var linear (#121858)
Fixes https://github.com/pytorch/pytorch/issues/121679

Pull Request resolved: https://github.com/pytorch/pytorch/pull/121858
Approved by: https://github.com/jansel
2024-03-15 03:20:04 +00:00
Jason Ansel
60ccf81490 [dynamo] Refactor update_block_stack into a seperate function (#121810)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/121810
Approved by: https://github.com/williamwen42
ghstack dependencies: #121790
2024-03-15 01:01:05 +00:00
Jason Ansel
1e9a7df8fe [dynamo] Compile time optimizations in tx.step() (#121790)
`python benchmarks/dynamo/microbenchmarks/dynamo_microbenchmarks.py`
- Before: `symbolic_convert_overhead_stress_test: 10.7s`
- After: `symbolic_convert_overhead_stress_test: 8.6s`

`tx.step()` is a small part of that benchmark, so likely the speedup in that isolated function is larger than the top line.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/121790
Approved by: https://github.com/oulgen
2024-03-15 01:01:05 +00:00
Animesh Jain
92ed8553a6 Revert "Switch cudagraph backend to cudagraph trees (#121019)" and "Add Cudagraphs disable checking (#121018)" (#121864)
This reverts commit 9373ad0bb8.

Revert "Add Cudagraphs disable checking (#121018)"

This reverts commit 4af0e634bf.

Causes compilation time increase.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/121864
Approved by: https://github.com/eellison
2024-03-15 00:03:09 +00:00
Michael Lazos
15abc56bd5 Graph break on step closure in optimizer (#121777)
Fixes https://github.com/pytorch/pytorch/issues/116494

Pull Request resolved: https://github.com/pytorch/pytorch/pull/121777
Approved by: https://github.com/yanboliang
2024-03-14 03:18:23 +00:00
PyTorch MergeBot
70c6f542f2 Revert "[dynamo] Convert invalid args into graph breaks (#121784)"
This reverts commit 0df39480f6.

Reverted https://github.com/pytorch/pytorch/pull/121784 on behalf of https://github.com/huydhn due to Sorry for reverting your change but I think it breaks ONNX test in trunk 0c1ac4484d ([comment](https://github.com/pytorch/pytorch/pull/121784#issuecomment-1995979435))
2024-03-13 22:12:43 +00:00
Boyuan Feng
0c1ac4484d Support call_method in DDPOptimizer (#121771)
This PR fixes Issue #111279.

While #111279 reported the issue with `MultiheadAttention`, a minimal reproduction would be:
```python
class ToyModel(nn.Module):
    def __init__(self,):
        super().__init__()
        self.linear = nn.Linear(128, 10)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.linear.forward(x) # Error
        # return self.linear(x) # OK
```

Dynamo treats `self.linear(x)` as `call_module` while treating `self.linear.forward(x)` as a [`get_attr` and a `call_method`](https://github.com/pytorch/pytorch/blob/main/torch/_dynamo/variables/nn_module.py#L358-L378). However, existing DDPOptimizer assumes, for a `get_attr` node, `getattr(gm, node.target)` gives a tensor with the `requires_grad` attribute. Existing DDPOptimizer also does not support `call_method` nodes.

This PR adds support for `call_method` and check on `get_attr`. It also checks if a module's parameters have been added to a bucket to support multiple method calls from the same module.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/121771
Approved by: https://github.com/yf225
2024-03-13 20:03:15 +00:00
Jason Ansel
0df39480f6 [dynamo] Convert invalid args into graph breaks (#121784)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/121784
Approved by: https://github.com/yanboliang
ghstack dependencies: #121615, #121616
2024-03-13 20:02:33 +00:00
Michael Lazos
05df03ec1b Allow custom attributes for torch function subclasses (#121693)
Added custom attribute access with test

Pull Request resolved: https://github.com/pytorch/pytorch/pull/121693
Approved by: https://github.com/anijain2305
2024-03-13 17:01:57 +00:00
Jason Ansel
559ca13b3f [dynamo] Refactor TorchInGraphFunctionVariable for compile time (#121616)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/121616
Approved by: https://github.com/oulgen
ghstack dependencies: #121615
2024-03-13 14:21:21 +00:00
Avik Chaudhuri
7fe0cc53e9 make _process_dynamic_shapes an implementation detail (#121713)
Summary: `_process_dynamic_shapes` converts new dynamic shapes to old constraints, but in the future may not need to do so. Preparing for that future.

Test Plan: CI

Differential Revision: D54780374

Pull Request resolved: https://github.com/pytorch/pytorch/pull/121713
Approved by: https://github.com/tugsbayasgalan
2024-03-13 08:33:00 +00:00
Jason Ansel
a13dd92d88 [dynamo] Minor compile time optimizations in torch.py (#121615)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/121615
Approved by: https://github.com/oulgen
2024-03-13 05:36:22 +00:00
Yanan Cao
7d05c4c093 Remove error anti-pattern when dealing with dynamic shape output (#121681)
There are cases where capture_dynamic_output_shape_ops=True and we will still see DynamicOutputShapeException. For example, when an op doesn't have a meta kernel implemented to return the correct dynamic shape output. If we blindly give users instructions to set capture_dynamic_output_shape_ops to True, users would try it and see no change. As witnessed in this issue:
https://github.com/pytorch/pytorch/issues/121036#issuecomment-1985221435

Pull Request resolved: https://github.com/pytorch/pytorch/pull/121681
Approved by: https://github.com/tugsbayasgalan
2024-03-13 00:45:23 +00:00
Wenting Wang
02bb2180f4 [torch export] replace traceback.extract_stack with CapturedTraceback.extract (#121449)
Summary:
with a simple bench in TestDeserializer.test_basic function:
```
time_start = time.time()
for i in range(1000):
    self.check_graph(MyModule(), inputs)
warnings.warn(f"time_taken: {time.time() - time_start}")
```
and forcing FakeTensorConfig.debug to True, record_stack_traces to True, logging level to debug, it shows that the the changed code is consistently ard 20 secs faster (~90s vs originally ~110s)

Test Plan:
test passed, see summary

compared debug trace before and after:
- exactly the same for fake tensor and proxy callsite https://www.internalfb.com/intern/diffing/?paste_number=1189883685
- slightly different for the user frame in proxy node https://www.internalfb.com/intern/diffing/?paste_number=1189884347

Differential Revision: D54237017

Pull Request resolved: https://github.com/pytorch/pytorch/pull/121449
Approved by: https://github.com/angelayi
2024-03-13 00:19:05 +00:00
Oguz Ulgen
79ee6bbde3 Support triton.language.dtype with torch.compile (#121690)
Putting this PR as an RFC since I have resorted to some horrible hacks in order to make this work.
```
(Pdb) p triton.language.float32
triton.language.fp32
(Pdb) p str(triton.language.float32)
'fp32'
(Pdb) p repr(triton.language.float32)
'triton.language.fp32'
```
This means that we need to "rewrite" them for fx graph and inductor execution.

This PR allows Mamba2 to work with `torch.compile`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/121690
Approved by: https://github.com/Skylion007
2024-03-12 23:21:46 +00:00
Animesh Jain
22bb24986d [dynamo][guards] Use lazy variable tracker for func defaults (#121388)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/121388
Approved by: https://github.com/jansel
2024-03-12 22:48:48 +00:00
PyTorch MergeBot
5b506c8bce Revert "[dynamo][guards] Use lazy variable tracker for func defaults (#121388)"
This reverts commit 04a5d6e8d3.

Reverted https://github.com/pytorch/pytorch/pull/121388 on behalf of https://github.com/osalpekar due to causing executorch model-test failures internally. See [D54707529](https://www.internalfb.com/diff/D54707529) ([comment](https://github.com/pytorch/pytorch/pull/121388#issuecomment-1992619251))
2024-03-12 21:31:18 +00:00
Shunting Zhang
522d972924 [eazy] add more log when accuracy check fail (#121656)
Add these log to debug the regress of accuracy test for dm_nfnet_f0 model for training.

With these extra log when the accuracy check fail, we can verify if it's close to succeed or not. If yes that indicates there is no real issue but just flaky and we probably can tune the tolerance to fix.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/121656
Approved by: https://github.com/jansel, https://github.com/Skylion007
2024-03-12 20:58:20 +00:00
Jason Ansel
3c8c7e2a46 [dynamo] Tweak naming for module hook bw_state (#121609)
Some minor changes not related to the other PRs in the stack

Pull Request resolved: https://github.com/pytorch/pytorch/pull/121609
Approved by: https://github.com/yanboliang
2024-03-12 16:27:56 +00:00
Animesh Jain
78b4793c96 [dynamo][compile-time] Caching VTs to reduce compile-time (#121031)
Reduces the `torch.compile(backend="eager")` for this code

~~~
def fn(x):
    for _ in range(10000):
        # x = torch.sin(x)
        x = torch.ops.aten.sin(x)
        # x = sin(x)

    return x
~~~

From 18 seconds to 12 seconds.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/121031
Approved by: https://github.com/jansel
2024-03-12 09:19:50 +00:00
PyTorch MergeBot
fd0dbcd891 Revert "Batch Norm Consolidation (#116092)"
This reverts commit 7b4f70eda5.

Reverted https://github.com/pytorch/pytorch/pull/116092 on behalf of https://github.com/osalpekar due to Causes build failure in //caffe2:aten-hip (AMD build) target. See [D54707318](https://www.internalfb.com/diff/D54707318) for more details, may require internal build system changes to resolve. ([comment](https://github.com/pytorch/pytorch/pull/116092#issuecomment-1989542965))
2024-03-11 22:22:41 +00:00
Jason Ansel
7cc476ea16 [dynamo] Fix support for nn.Parameter constructor (part 1) (#120163)
This captures calls to `torch.nn.Parameter` by lifting them to graph inputs.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/120163
Approved by: https://github.com/albanD, https://github.com/yanboliang
ghstack dependencies: #121086
2024-03-11 05:14:42 +00:00
Jason Ansel
32488b0664 [dynamo] Support _unsafe_set_version_counter (#121086)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/121086
Approved by: https://github.com/yanboliang
2024-03-11 05:14:42 +00:00
Ze Sheng
7a4e451184 [Dynamo] Fix function overrides (#120885)
To check existence of `__torch_function__`, the code intended to iterate each element but got `TupleVariable` when the ordinary `has_torch_function()` was being used. Needs further unpack in this case

Fixes #120653

Pull Request resolved: https://github.com/pytorch/pytorch/pull/120885
Approved by: https://github.com/yanboliang
2024-03-11 02:18:43 +00:00
Yifu Wang
71d0202627 [dynamo] support rewriting dist.all_reduce with explicitly specified reduce op (#120181)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/120181
Approved by: https://github.com/wconstab, https://github.com/awgu
2024-03-09 08:28:22 +00:00
eellison
9373ad0bb8 Switch cudagraph backend to cudagraph trees (#121019)
Switch torch.compile(..., backend="cudagraphs") to use cudagraph trees. Enabled a few test in cudagraph_trees and note that there is another test suite existing for cudagraphs backend: https://github.com/pytorch/pytorch/blob/main/test/dynamo/test_cudagraphs.py.

This is basically the inductor cudagraphs without inductor.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/121019
Approved by: https://github.com/ezyang, https://github.com/jansel
ghstack dependencies: #121017, #121018
2024-03-08 22:56:26 +00:00
eellison
4af0e634bf Add Cudagraphs disable checking (#121018)
Adds the same cudagraphs disable checking from inductor - cudagraph trees to cudagraphs backend.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/121018
Approved by: https://github.com/ezyang
ghstack dependencies: #121017
2024-03-08 22:47:24 +00:00
Boyuan Feng
35d3adb4b0 Add ATen Op _chunk_cat and _chunk_cat.out (#121081)
# Motivation

In backward of per-parameter sharding FSDP, each rank performs reduce scatter to sync gradients across ranks. A rank chunks each gradient tensor into `world_size` slices along the 0-th dimension and concatenate all slices along the 1-th dimension. Gradient tensors will be padded before concatenation when tensor.size(0) % world_size != 0.

### Example 1
Consider `world_size=3` and tensors A (2x4), B (3x3), C (1x2):

Input tensors:
```
AAAA   BBB   CC
AAAA   BBB
       BBB
```

Reduce-scatter-copy-in Output:
```
AAAABBBCC
AAAABBB00
0000BBB00
```

### Example 2
Consider `world_size=2` and tensors A (2x4), B (3x3), C(1x2), D(4x2):

Input tensors:
```
AAAA   BBB   CC   DD
AAAA   BBB   00   DD
       BBB        DD
       000        DD
```

Reduce-scatter-copy-in first pad:
```
AAAA   BBB   CC   DD
AAAA   BBB   00   DD
       BBB        DD
       000        DD
```

Then chunk and cat along dim as the output:
```
AAAABBBBBBCCDDDD
AAAABBB00000DDDD
```

The performance of reduce-scatter-copy-in is critical to per-parameter sharding FSDP. However, reduce-scatter-copy-in via composing existing ATen ops involves `cat` and irregular `pad`, leading redundant data copies and unsatisfactory performance.

# PR
We provide aten native support for reduce-scatter-copy-in, namely `_chunk_cat()`:

```
_chunk_cat(Tensor[] tensors, int dim, int num_chunks) -> Tensor
```

This PR includes the registration of `_chunk_cat` and `_chunk_cat.out`, OpInfo tests, and basic implementation composing existing ATen ops.
In the next PR, we will add the CUDA implementation. Comparing with baselines of composing existing ATen ops, `_chunk_cat()` CUDA implementation improves copy bandwidth from 498 GB/s to 966 GB/s on a production benchmark.

## Requirements on input

1. If input tensors have different ndims, dim should be non-negative and be less than the ndims of every input tensors. If all input tensors have the same ndims, we support both negative and non-negative dim.
2. For wrapped_dim, all tensors should have the same size for 0,...,wrapped_dim-1 dimensions. No requirements for (wrapped_dim, ...)-th dimension.
3. Expect positive num_chunks
4. Expect non-empty input tensor list and each input tensor should have at least 1 element

Pull Request resolved: https://github.com/pytorch/pytorch/pull/121081
Approved by: https://github.com/albanD
2024-03-08 21:48:12 +00:00
albanD
53d5276d69 Improve Dynamo support for torch function and class methods in general (#121365)
I was originally trying to solve https://github.com/pytorch/pytorch/issues/120799 but got sidetracked along the way.
This PR contains a couple fixes. Let me know if you want me to split them up!

- Properly handle invalid user code when "super()" is called from non-method/classmethod. It will now properly raise the same error as CPython
- Fix base VariableTracker `__str__` method shadowing all `__repr__` methods defined in subclasses
- Fix accessing a classmethod on a user object to bind "cls" and not "self"
- Fix custom class handling of super() call to properly handle mixed regular/class/static methods

Locally , test_repros.py -k test_batch_norm_act still fails where the generated graph module is:
```
Call using an FX-traced Module, line 8 of the traced Module's generated forward function:
    x = self.forward(l_x_);  self = l_x_ = None
    x_1 = self.L__self___act(x);  x = None
```
note that "self" is being unset on the first line even though it is used on the second one.
For reference, this is the test c268ce4a6d/test/dynamo/test_repros.py (L1368-L1369)
I cannot figure out where the generated forward function comes from though, any hint would be welcome!

Pull Request resolved: https://github.com/pytorch/pytorch/pull/121365
Approved by: https://github.com/jansel
2024-03-08 20:03:49 +00:00
Elias Ellison
937e89f252 cudagraphs backend refactoring (#121017)
This is just some refactoring.. no functional changes

Pull Request resolved: https://github.com/pytorch/pytorch/pull/121017
Approved by: https://github.com/ezyang
2024-03-08 19:47:41 +00:00
Aaron Gokaslan
d55d803812 Add operator length hint support (#121495)
Seemed like an easy operator to squeeze into Python 2.3 . Added a simple test. Partially addresses #116396

Pull Request resolved: https://github.com/pytorch/pytorch/pull/121495
Approved by: https://github.com/albanD
2024-03-08 19:08:33 +00:00
andrewor14
7b4f70eda5 Batch Norm Consolidation (#116092)
**Summary:**

This commit simplifies the existing decomposition hierarchy
of batch norm ops by adding a single, backend agnostic op:
`batch_norm_with_update`. The existing hierarchy looks like:

```
aten.batch_norm ->
aten._batch_norm_impl_index ->
[
  aten.native_batch_norm ->
  aten._native_batch_norm_legit (export only) ->
  _batch_norm_legit_cpu/cuda (kernels, export only) ->
  _batch_norm_cpu/cuda (kernels)
] OR
[ aten.cudnn_batch_norm ] OR
[ aten.miopen_batch_norm ]
```

Aside from complexity, an important problem with the
above decomposition hierarchy is cuda numerics in
export flows. We observed significantly worse convergence
when training a mobilenetv2-like model when using the
`_batch_norm_cuda` kernel instead of the `cudnn_batch_norm`
kernel. This means users who export their models on CPU
first then move the models to cuda later may silently
see worse accuracies even when cudnn is installed,
because they are using the worse kernel. This issue is
summarized in https://github.com/pytorch/pytorch/issues/111384.

Instead, the new hierarchy proposed by consolidating
existing batch norm ops will look like:

```
aten.batch_norm ->
aten.batch_norm_with_update ->
[ _batch_norm_cpu (kernel) ] OR
[ _batch_norm_cuda (kernel) ] OR
[ cudnn_batch_norm (kernel) ] OR
[ miopen_batch_norm (kernel) ]
```

The new op `batch_norm_with_update` hides backend
implementation details and automatically picks the right
kernel based on what is installed. This commit also adds
the following variants to this op:

```
batch_norm_with_update_functional
batch_norm_with_update.out
batch_norm_no_update
batch_norm_no_update.out
batch_norm_backward
```

Note that this commit only adds this op and its variants,
but does not actually change the decomps to produce these
ops in the graph. This will be done after the 2 week FC
window, and the ops used in the old stack is planned to
be removed after the 6 month BC window.

Test Plan: `OpInfo` tests for `batch_norm_with_update`.

Reviewers: albanD, bdhirsh

Subscribers: albanD, bdhirsh, supriyar

Tasks: https://github.com/pytorch/pytorch/issues/111384

Co-authored-by: Tugsbayasgalan Manlaibaatar <tmanlaibaatar@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/116092
Approved by: https://github.com/bdhirsh, https://github.com/albanD
2024-03-08 15:07:15 +00:00
Shunting Zhang
7dc1ab8989 make dyanmo work with _LazyGraphModule.lazy_forward (#121259)
Fix https://github.com/pytorch/pytorch/issues/121198 .

We previously already trigger the real recompilation for LazyGraphModule when it runs thru dynamo context. But people may pass in LazyGraphModule._lazy_forward rather than the LazyGraphModule instance itself. This PR handles that.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/121259
Approved by: https://github.com/williamwen42, https://github.com/jansel
2024-03-08 01:37:39 +00:00
Animesh Jain
04a5d6e8d3 [dynamo][guards] Use lazy variable tracker for func defaults (#121388)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/121388
Approved by: https://github.com/jansel
2024-03-08 01:10:46 +00:00
Ivan Zaitsev
5d8e4126b6 Fixup test_trace_rules (#121351)
Summary:
Fixes
https://www.internalfb.com/intern/testinfra/diagnostics/7599824578133672.281475099376195.1709732674/

(for some reason this test didn't run in OSS)?

Reached out to Yanbo Liang for additional context:
 {F1465435684}

Test Plan:
Local:
https://www.internalfb.com/intern/testinfra/testconsole/testrun/16325548673376150/

Differential Revision: D54605075

Pull Request resolved: https://github.com/pytorch/pytorch/pull/121351
Approved by: https://github.com/malfet, https://github.com/yanboliang
2024-03-08 00:50:45 +00:00
Jane Xu
24821fec26 Add RAdam capturable API for forloop (#121260)
Implementation thanks to @MarouaneMaatouk in https://github.com/pytorch/pytorch/pull/118697, though I've since cleaned it up a lot to save perf on the rect < 5 eager case. It also just looks better now :) Added tests and the cudagraph health check.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/121260
Approved by: https://github.com/mlazos
2024-03-08 00:00:30 +00:00
Dheeraj Peri
b1657beac1 feat: Add min, max ranges to mark_dynamic API (#119737)
Fixes https://github.com/pytorch/pytorch/issues/115137

This PR adds:

- mark_dynamic API will accept `min`, `max` values to create a bounded constraint on the dim.
- test case in test_misc.py which checks if `ConstraintViolationError` is triggered if `torch.compile` gets a input dimension out of bounds.

Co-authored-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/119737
Approved by: https://github.com/ezyang, https://github.com/jansel
2024-03-07 23:26:03 +00:00
William Wen
d14d62b7aa [dynamo] add more refleak tests (#120657)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/120657
Approved by: https://github.com/jansel
2024-03-07 22:25:43 +00:00
Joel Schlosser
ea8f6e2e54 Subclass view fake-ification via reified ViewFuncs (#118405)
This PR:
* Uses reified ViewFuncs to swap in fake tensors / symbolic SymInts for view replay during subclass view fake-ification
* Enables automatic dynamic on view bases -> fakeifies according to the resultant symbolic context instead of the old "all-static" approach
* Covers the following view types:
    * subclass -> dense
    * dense -> subclass
    * subclass -> subclass
* Dense -> dense views are handled the old way via an `as_strided()` call, as it's likely there is no view func available

Differential Revision: [D54269082](https://our.internmc.facebook.com/intern/diff/D54269082)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/118405
Approved by: https://github.com/ezyang
2024-03-07 19:56:16 +00:00
IvanKobzarev
9a45001905 [dynamo] relax missing symbols runtime assert (#121339)
Differential Revision: [D54603361](https://our.internmc.facebook.com/intern/diff/D54603361)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/121339
Approved by: https://github.com/ezyang
2024-03-07 14:53:38 +00:00
mingfeima
b3065f6899 add int8 packed gemm support on CPU device (#118056)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/118056
Approved by: https://github.com/mikekgfb
2024-03-07 08:41:43 +00:00
laith sakka
eb4d87f237 graph break on sparse tensors constructions (#120458)
Fix some tests in https://github.com/pytorch/pytorch/issues/119780
sparse_bsc_tensor is not supported
https://github.com/pytorch/pytorch/pull/117907

Also more about the issue here.
https://docs.google.com/document/d/1EIb4qG88-SjVFn5TloLERliYdxIu2hwYoAA8skjOVfo/edit

Pull Request resolved: https://github.com/pytorch/pytorch/pull/120458
Approved by: https://github.com/ezyang
2024-03-07 02:17:41 +00:00
Yichen Yan
e50ded03a6 Use type check for also is_not (#113859)
Handle `is_not` for:

9647a251cb/torch/_dynamo/variables/builtin.py (L1314-L1317)

I noticed https://github.com/pytorch/pytorch/issues/111713 exists, I think it's no harm to land this first.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113859
Approved by: https://github.com/Skylion007
2024-03-06 23:12:42 +00:00
Yifu Wang
d7a5e59647 [dynamo] support group=None when rewriting collectives (#121043)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/121043
Approved by: https://github.com/awgu
2024-03-06 21:37:19 +00:00
Michael Lazos
c5ef4df274 guard on grads being None in compiled optimizers (#121291)
Fixes #115607

We were missing guards when the grads were set to `None`. So if we compiled the optimizer with grads set to their proper value, and then with the grads set to `None` we'd continuously run the `None` version because all of the guards would pass and it would be ordered before the correct version in the cache.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/121291
Approved by: https://github.com/Skylion007, https://github.com/anijain2305
2024-03-06 18:33:23 +00:00
PyTorch MergeBot
b529c19bdf Revert "Batch Norm Consolidation (#116092)"
This reverts commit 5680f565d5.

Reverted https://github.com/pytorch/pytorch/pull/116092 on behalf of https://github.com/jeffdaily due to broke ROCm, PR signal was clean but trunk was not, the merge should have been blocked but wasn't ([comment](https://github.com/pytorch/pytorch/pull/116092#issuecomment-1981373237))
2024-03-06 17:10:01 +00:00
Guilherme Leobas
54d92f2e37 Add jacrev support in torch.compile (#121146)
Changes are simple. Moved a few entries on trace_rules.py and included tests to compare the graph generated by jacrev

Pull Request resolved: https://github.com/pytorch/pytorch/pull/121146
Approved by: https://github.com/zou3519
2024-03-06 16:05:33 +00:00
Yukio Siraichi
aa0b0944d5 [dynamo] Re-dispatch torch.Tensor.new into torch.Tensor.new_empty method. (#121075)
Fix: https://github.com/pytorch/xla/issues/6009

This PR adds another case to `TensorVariable.method_new` special case, where it
re-dispatches `new` into `new_empty`.

Since we are using fake tensors, the `new` call doesn't actually gets to the corresponding
backend (e.g. XLA). So, things like the following might happen:

```python
@torch.compile(backend="openxla")
def foo(x):
    new_x = x.new(*x.size())

    # new_x.device() == "xla"
    # x.device() == "xla:0"

    return new_x + x

a = torch.arange(10)
foo(a.to(xm.xla_device()))
```

Resulting in the following error:

```python
Traceback (most recent call last):
  ...
  File "torch/_dynamo/utils.py", line 1654, in get_fake_value
    ret_val = wrap_fake_exception(
  File "torch/_dynamo/utils.py", line 1190, in wrap_fake_exception
    return fn()
  File "torch/_dynamo/utils.py", line 1655, in <lambda>
    lambda: run_node(tx.output, node, args, kwargs, nnmodule)
  File "torch/_dynamo/utils.py", line 1776, in run_node
    raise RuntimeError(make_error_message(e)).with_traceback(
  File "torch/_dynamo/utils.py", line 1758, in run_node
    return node.target(*args, **kwargs)
  File "torch/utils/_stats.py", line 20, in wrapper
    return fn(*args, **kwargs)
  File "torch/_subclasses/fake_tensor.py", line 885, in __torch_dispatch__
    return self.dispatch(func, types, args, kwargs)
  File "torch/_subclasses/fake_tensor.py", line 1224, in dispatch
    return self._cached_dispatch_impl(func, types, args, kwargs)
  File "torch/_subclasses/fake_tensor.py", line 955, in _cached_dispatch_impl
    output = self._dispatch_impl(func, types, args, kwargs)
  File "torch/_subclasses/fake_tensor.py", line 1445, in _dispatch_impl
    return self.wrap_meta_outputs_with_default_device_logic(
  File "torch/_subclasses/fake_tensor.py", line 1575, in wrap_meta_outputs_with_default_device_logic
    return tree_map(wrap, r)
  File "torch/utils/_pytree.py", line 900, in tree_map
    return treespec.unflatten(map(func, *flat_args))
  File "torch/utils/_pytree.py", line 736, in unflatten
    leaves = list(leaves)
  File "torch/_subclasses/fake_tensor.py", line 1550, in wrap
    ) = FakeTensor._find_common_device(func, flat_args)
  File "torch/_subclasses/fake_tensor.py", line 625, in _find_common_device
    merge_devices(arg)
  File "torch/_subclasses/fake_tensor.py", line 620, in merge_devices
    raise RuntimeError(
torch._dynamo.exc.TorchRuntimeError: Failed running call_function <built-in function add>(*(FakeTensor(..., device='xla', size=(10,), dtype=torch.int64), FakeTensor(..., device='xla:0', size=(10,), dtype=torch.int64)), **{}):
Unhandled FakeTensor Device Propagation for aten.add.Tensor, found two different devices xla, xla:0
```

Using `new_empty`, instead, fixes this error because it uses the device from the source
tensor, instead of inferring from the current dispatch key set.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/121075
Approved by: https://github.com/jansel
2024-03-06 11:49:27 +00:00
Tugsbayasgalan Manlaibaatar
5680f565d5 Batch Norm Consolidation (#116092)
**Summary:**

This commit simplifies the existing decomposition hierarchy
of batch norm ops by adding a single, backend agnostic op:
`batch_norm_with_update`. The existing hierarchy looks like:

```
aten.batch_norm ->
aten._batch_norm_impl_index ->
[
  aten.native_batch_norm ->
  aten._native_batch_norm_legit (export only) ->
  _batch_norm_legit_cpu/cuda (kernels, export only) ->
  _batch_norm_cpu/cuda (kernels)
] OR
[ aten.cudnn_batch_norm ] OR
[ aten.miopen_batch_norm ]
```

Aside from complexity, an important problem with the
above decomposition hierarchy is cuda numerics in
export flows. We observed significantly worse convergence
when training a mobilenetv2-like model when using the
`_batch_norm_cuda` kernel instead of the `cudnn_batch_norm`
kernel. This means users who export their models on CPU
first then move the models to cuda later may silently
see worse accuracies even when cudnn is installed,
because they are using the worse kernel. This issue is
summarized in https://github.com/pytorch/pytorch/issues/111384.

Instead, the new hierarchy proposed by consolidating
existing batch norm ops will look like:

```
aten.batch_norm ->
aten.batch_norm_with_update ->
[ _batch_norm_cpu (kernel) ] OR
[ _batch_norm_cuda (kernel) ] OR
[ cudnn_batch_norm (kernel) ] OR
[ miopen_batch_norm (kernel) ]
```

The new op `batch_norm_with_update` hides backend
implementation details and automatically picks the right
kernel based on what is installed. This commit also adds
the following variants to this op:

```
batch_norm_with_update_functional
batch_norm_with_update.out
batch_norm_no_update
batch_norm_no_update.out
batch_norm_backward
```

Note that this commit only adds this op and its variants,
but does not actually change the decomps to produce these
ops in the graph. This will be done after the 2 week FC
window, and the ops used in the old stack is planned to
be removed after the 6 month BC window.

Test Plan: `OpInfo` tests for `batch_norm_with_update`.

Reviewers: albanD, bdhirsh

Subscribers: albanD, bdhirsh, supriyar

Tasks: https://github.com/pytorch/pytorch/issues/111384

Co-authored-by: Tugsbayasgalan Manlaibaatar <tmanlaibaatar@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/116092
Approved by: https://github.com/bdhirsh, https://github.com/albanD
2024-03-06 04:50:46 +00:00
Joel Schlosser
dad1b76584 Introduce EphemeralSource for symbols that should be simplified out (#120948)
Context: view fake-ification should handle closed-over state in ViewFuncs for use in view replay by:
* fake-ifying tensors
* symbolicizing SymInts

This avoids invalid specialization during view replay. However, the symbols / tensors created as intermediates in the view chain should not stick around or be guarded on. This PR introduces an `EphemeralSource` intended to be used as a source for this purpose. It has the following properties:
* Considered first to be simplified out in symbol simplification logic
* Errors if guarded on

Differential Revision: [D54561597](https://our.internmc.facebook.com/intern/diff/D54561597)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/120948
Approved by: https://github.com/ezyang
2024-03-06 02:30:52 +00:00
Zhengxu Chen
8aeb247a3d [export] Remove WrapperModule. (#121042)
Summary: WrapperModule seems a good idea but may introduce some surprising behavior to users, for example, it never registers enclosed modules as submodules and therefore it's unclear that's the state dict for the exported program should look like, because some people may argue to include every state in state dict but others want to keep them as constants.

Test Plan: CI

Reviewed By: tugsbayasgalan

Differential Revision: D54326331

Pull Request resolved: https://github.com/pytorch/pytorch/pull/121042
Approved by: https://github.com/angelayi
2024-03-05 18:10:22 +00:00
Jason Ansel
35004b8ab4 [dynamo] Fix handling of invalid args (#121110)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/121110
Approved by: https://github.com/yanboliang
ghstack dependencies: #121106
2024-03-05 17:16:04 +00:00
Jason Ansel
4f19b5f7ef [dynamo] Remove extra guard for tensor constant attrs (#121106)
Also deletes some unused code.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/121106
Approved by: https://github.com/yanboliang, https://github.com/anijain2305
2024-03-05 17:16:04 +00:00
Yanbo Liang
46c9d646dd [Dynamo] Fix inspect.getattr_static doesn't work well for torch.utils._cxx_pytree.PyTreeSpec (#120812)
Fixes #118793

Pull Request resolved: https://github.com/pytorch/pytorch/pull/120812
Approved by: https://github.com/zou3519
2024-03-05 09:05:26 +00:00
Tugsbayasgalan (Tugsuu) Manlaibaatar
a15c02562a Fix dynamo failure (#121167)
Summary: Title

Test Plan: CI

Differential Revision: D54509198

Pull Request resolved: https://github.com/pytorch/pytorch/pull/121167
Approved by: https://github.com/izaitsevfb
2024-03-05 03:19:59 +00:00
Aaron Gokaslan
9deaa2e812 [BE]: FURB187 Use inplace reverse on lists: faster, more readable. (#121140)
Use `reverse()` method as it's faster and inplace.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/121140
Approved by: https://github.com/albanD
2024-03-05 01:36:17 +00:00
chilli
83c312990f Add missing newline to repro and some utility thing in repro (#121051)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/121051
Approved by: https://github.com/ezyang, https://github.com/shunting314, https://github.com/eellison
2024-03-04 22:52:54 +00:00
angelayi
4cdc2d7096 [dynamo] Remove expected dynamo test failures (#120836)
Fixes some of the tests in #120643

Pull Request resolved: https://github.com/pytorch/pytorch/pull/120836
Approved by: https://github.com/zou3519
2024-03-04 20:41:49 +00:00
PyTorch MergeBot
a98c17edc7 Revert "add int8 packed gemm support on CPU device (#118056)"
This reverts commit f84375ca5d.

Reverted https://github.com/pytorch/pytorch/pull/118056 on behalf of https://github.com/izaitsevfb due to breaks internal builds ([comment](https://github.com/pytorch/pytorch/pull/118056#issuecomment-1977368720))
2024-03-04 20:09:40 +00:00
rzou
3ef0befdc9 Better error messages for impl_abstract_pystub (#120959)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/120959
Approved by: https://github.com/drisspg
2024-03-04 15:24:36 +00:00
Pearu Peterson
ce2903080c Add sparse compressed fake tensor support (#120920)
As in the title.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/120920
Approved by: https://github.com/ezyang
2024-03-04 14:38:45 +00:00
Animesh Jain
b7f2522692 [dynamo][compile-time] Remove unnecessary tree_map_only (#121052)
Reduces the torch.compile(backend="eager") for this code by 1-2 seconds.

~~~
def fn(x):
    for _ in range(10000):
        # x = torch.sin(x)
        x = torch.ops.aten.sin(x)
        # x = sin(x)

    return x
~~~

Pull Request resolved: https://github.com/pytorch/pytorch/pull/121052
Approved by: https://github.com/jansel
ghstack dependencies: #121053
2024-03-03 06:59:43 +00:00
Jason Ansel
0e0a621e0c [dynamo] Minor refactors (#120966)
These are changes I pulled out of the above PRs due to not being
related.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/120966
Approved by: https://github.com/yanboliang
2024-03-03 02:20:48 +00:00
Animesh Jain
8e4301077e [dynamo][comp-time] BuiltinVariableTracker - inspect signature only on failure (#121053)
Reduces the torch.compile(backend="eager") for this code by 1-2 seconds.
~~~
def fn(x):
    for _ in range(10000):
        # x = torch.sin(x)
        x = torch.ops.aten.sin(x)
        # x = sin(x)

    return x
~~~

Pull Request resolved: https://github.com/pytorch/pytorch/pull/121053
Approved by: https://github.com/jansel
2024-03-02 23:03:00 +00:00
mingfeima
f84375ca5d add int8 packed gemm support on CPU device (#118056)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/118056
Approved by: https://github.com/mikekgfb
ghstack dependencies: #117475
2024-03-02 04:35:49 +00:00
Nikita Shulga
a3b81666b1 [Dynamo] Fix guards for code objects (#120909)
By comparing them only by id, and raising an assert if someone calls into `EQUALS_MATCH`
Which render following example compileable:
```python
import torch

@torch.compile()
def foo(x, y):
    code = compile(y, "foo", "exec")
    exec(y)
    return x

print(foo(torch.rand(3), "print('Hello World')"))
```

Fixes https://github.com/pytorch/pytorch/issues/120647

Pull Request resolved: https://github.com/pytorch/pytorch/pull/120909
Approved by: https://github.com/jansel
2024-03-02 02:17:17 +00:00
PyTorch MergeBot
3d7cf8f392 Revert "Limit loop unrolling (#120023)"
This reverts commit 6cc7f9a2e6.

Reverted https://github.com/pytorch/pytorch/pull/120023 on behalf of https://github.com/anijain2305 due to breaks llms export ([comment](https://github.com/pytorch/pytorch/pull/120023#issuecomment-1974104633))
2024-03-02 00:04:08 +00:00
Tugsbayasgalan Manlaibaatar
f01a23d01b Don't aggressively rewrite asserts for symbolic expressions (#120564)
Fixes: https://github.com/pytorch/pytorch/issues/118417

Pull Request resolved: https://github.com/pytorch/pytorch/pull/120564
Approved by: https://github.com/ezyang
2024-03-01 17:46:36 +00:00
angelayi
c844b377fa [dynamo] Reorder logs (#116106)
Currently when there is a print/warning in the graph, dynamo graph breaks causing export to fail. However export would like to just skip over these print/warning calls: https://github.com/pytorch/pytorch/issues/113792.

Additionally there's a torch.compile feature request to "reorder prints" so that instead of graph breaking when hitting prints/logging, we can skip over these prints to create larger compiled graphs, and then print the results out after those compiled graphs: https://github.com/pytorch/pytorch/issues/93739. This PR also adds the `reorderable_logging_functions` config for users to register logging functions to be reordered (like `print` or a custom logging function). Printout of the bytecode after reordering the prints looks like the following: P914736600

There are some limitations to the printing right now:
* You can only register logging functions, not methods
* Inputs to the logging functions can only be tensors, constants, and format strings
* Inputs to the logging functions which will later be mutated in-place will not be printed correctly

TODO: Add the following tests
* print function with argument of nested data structure;
* print function with argument of nested data structure being updated inside of compile region (this would test if we handle side effect correctly);
* custom defined logging functions with nn.Module or nn.Module attribute arguments;
* custom defined logging functions with submodule input/output as arguments (we need to handle the mapping and fused-out value);
* custom defined logging functions with tensor argument and mutation inside of the function (TBD: this may increase memory usage);

Pull Request resolved: https://github.com/pytorch/pytorch/pull/116106
Approved by: https://github.com/yanboliang
2024-03-01 17:04:24 +00:00
Aaron Orenstein
8861507ba3 Fix guard for SUPPORTED_NODES (#120798)
The special-case code for handling SUPPORTED_NODES was producing a guard that looked like:
```
"G['torch'].utils._pytree.SUPPORTED_NODES[<class '__main__.CausalLMOutputWithPast'>].type"
```
resulting in a eval error trying to evaluate the guard.

This change adds a new source type (`ClassSource`) which is given a class type (in this case `CausalLMOutputWithPast`) and attempts to fetch it from its defining module.  It then uses that to build the `SUPPORTED_NODES` guards instead of referring to the type directly.

Also added a unit test which fails before this change and passes after.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/120798
Approved by: https://github.com/anijain2305
2024-03-01 16:03:21 +00:00
Tugsbayasgalan Manlaibaatar
c646030cd2 Support higher order op functionalization in predispatch IR (#115314)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115314
Approved by: https://github.com/bdhirsh
2024-03-01 09:13:47 +00:00
PyTorch MergeBot
63b259492a Revert "[dynamo] Reorder logs (#116106)"
This reverts commit c5472628ff.

Reverted https://github.com/pytorch/pytorch/pull/116106 on behalf of https://github.com/clee2000 due to landrace with 342e7929b8, which removed the import for warnings.  Should be an easy fix after rebase c5472628ff ([comment](https://github.com/pytorch/pytorch/pull/116106#issuecomment-1972586180))
2024-03-01 06:25:46 +00:00
Angela Yi
c5472628ff [dynamo] Reorder logs (#116106)
Currently when there is a print/warning in the graph, dynamo graph breaks causing export to fail. However export would like to just skip over these print/warning calls: https://github.com/pytorch/pytorch/issues/113792.

Additionally there's a torch.compile feature request to "reorder prints" so that instead of graph breaking when hitting prints/logging, we can skip over these prints to create larger compiled graphs, and then print the results out after those compiled graphs: https://github.com/pytorch/pytorch/issues/93739. This PR also adds the `reorderable_logging_functions` config for users to register logging functions to be reordered (like `print` or a custom logging function). Printout of the bytecode after reordering the prints looks like the following: P914736600

There are some limitations to the printing right now:
* You can only register logging functions, not methods
* Inputs to the logging functions can only be tensors, constants, and format strings
* Inputs to the logging functions which will later be mutated in-place will not be printed correctly

TODO: Add the following tests
* print function with argument of nested data structure;
* print function with argument of nested data structure being updated inside of compile region (this would test if we handle side effect correctly);
* custom defined logging functions with nn.Module or nn.Module attribute arguments;
* custom defined logging functions with submodule input/output as arguments (we need to handle the mapping and fused-out value);
* custom defined logging functions with tensor argument and mutation inside of the function (TBD: this may increase memory usage);

Pull Request resolved: https://github.com/pytorch/pytorch/pull/116106
Approved by: https://github.com/yanboliang
2024-03-01 04:48:44 +00:00
PyTorch MergeBot
65d568680c Revert "[Dynamo] Fix inspect.getattr_static doesn't work well for torch.utils._cxx_pytree.PyTreeSpec (#120812)"
This reverts commit 1104e0798c.

Reverted https://github.com/pytorch/pytorch/pull/120812 on behalf of https://github.com/huydhn due to Sorry for reverting your change but the XLA failure test_simple_model look legit 1104e0798c ([comment](https://github.com/pytorch/pytorch/pull/120812#issuecomment-1972460001))
2024-03-01 03:53:27 +00:00
Oguz Ulgen
b35551f357 Ban reset_to_zero argument to triton.autotune in user defined kernels (#120938)
Fixes #120802

Pull Request resolved: https://github.com/pytorch/pytorch/pull/120938
Approved by: https://github.com/chenyang78, https://github.com/jansel
2024-03-01 02:37:24 +00:00
Jason Ansel
e3dbd194f4 [dynamo] Support module backwards hooks (#120685)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/120685
Approved by: https://github.com/yanboliang, https://github.com/xmfan
2024-03-01 02:24:26 +00:00
Guilherme Leobas
fd35aafc26 Teach dynamo about vjp (#119405)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/119405
Approved by: https://github.com/zou3519
ghstack dependencies: #118407
2024-03-01 00:21:10 +00:00
PyTorch MergeBot
33da8d5c12 Revert "Fix guard for SUPPORTED_NODES (#120798)"
This reverts commit 1b8bb027f6.

Reverted https://github.com/pytorch/pytorch/pull/120798 on behalf of https://github.com/kit1980 due to the new test fails internally, see D54343456 ([comment](https://github.com/pytorch/pytorch/pull/120798#issuecomment-1972134227))
2024-02-29 23:19:22 +00:00
Elias Ellison
d03b11ad5b Pass inductor strides forward in ddp optimizer (#120523)
# Note: Returning Fake Tensors on First AOT Autograd Call
            #
            # Inductor will optimize strides of outputs when it deems it profitable.
            # For instance, converting to channels last. When we split the graph here
            # into multiple inductor compilations, we need to make sure that the
            # output strides of one compilation is appropriately passed to the subsequent
            # compilations. However, the mapping from inductor output to dynamo output
            # is non-trivial due to aot_autograd's deduping, de-aliasing, mutation, re-writing,
            # subclass handling, etc. In order to replay all this logic we set a flag such that
            # the first invocation of inductor in aot_autograd will return Fake Tensors with
            # appropriate strides. Then, all of aot autograd's runtime logic is replayed.
            # This gives us the appropriately strided outputs here which will reflect runtime strides.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/120523
Approved by: https://github.com/yf225, https://github.com/bdhirsh
2024-02-29 22:25:00 +00:00
Matthias Reso
772db2a3ae Fix handling of torch.return_types in dynamo (#120826)
Handle quasi-namedtuples as a special case in dynamo

Fixes #120651

Pull Request resolved: https://github.com/pytorch/pytorch/pull/120826
Approved by: https://github.com/anijain2305
2024-02-29 22:11:35 +00:00
Avik Chaudhuri
f7a809c96a fix dupe deprecated warning in dynamo export (#120896)
Summary:
When we convert `dynamic_shapes` to `constraints` and pass them to `_dynamo.export`, we shouldn't give a deprecation warning. Such conversion happens when calling `torch.export.export`, e.g. But it can also happen when calling `capture_pre_autograd_graph` (which itself has this deprecation warning when `constraints` are passed directly as well).

Since `_log_export_usage` is an indicator of a top-level call (it is `True` by default but set to `False`, or at least passed through, by callers), we can (ab)use it to indicate when to give this deprecation warning.

Test Plan: none

Differential Revision: D54350172

Pull Request resolved: https://github.com/pytorch/pytorch/pull/120896
Approved by: https://github.com/BoyuanFeng, https://github.com/zhxchen17
2024-02-29 18:57:42 +00:00
Yanbo Liang
1104e0798c [Dynamo] Fix inspect.getattr_static doesn't work well for torch.utils._cxx_pytree.PyTreeSpec (#120812)
Fixes #118793

Pull Request resolved: https://github.com/pytorch/pytorch/pull/120812
Approved by: https://github.com/zou3519
2024-02-29 18:19:14 +00:00
Catherine Lee
9e016debeb [dynamo] Fix inference_mode context variable (#120830)
<idk what im doing>
Fixes #120646

The module for torch.inference_mode should be torch

The input to `create` is a bool (mode?) and `_enter_inference_mode` expects a bool but [BlockStackEntry](50073248ed/torch/_dynamo/symbolic_convert.py (L206)) expects `target_values` to be a list?
[inference_mode](50073248ed/torch/autograd/grad_mode.py (L205))

Pull Request resolved: https://github.com/pytorch/pytorch/pull/120830
Approved by: https://github.com/zou3519, https://github.com/anijain2305, https://github.com/tugsbayasgalan
2024-02-29 17:10:06 +00:00
cpuhrsch
576c0482a5 Remove hard numpy dependency from guards.py (#119519)
I'm not sure if this is the ideal behavior / best fix for this.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/119519
Approved by: https://github.com/albanD
2024-02-29 14:37:33 +00:00
Nikita Shulga
14c5ebc8a1 [Dynamo] Do not attempt to make nditer spawned arrays writable (#120868)
As they are not, converting `numpy.nditer` to writable is too expensive and  tensor values are copied anyway

Minimal reproducer:
```python
import numpy as np
import torch

@torch.compile
def f(x):
    return x + 1.0

for x in np.nditer(np.arange(3)):
    print(f(x))
```

Fixes https://github.com/pytorch/pytorch/issues/119787

Pull Request resolved: https://github.com/pytorch/pytorch/pull/120868
Approved by: https://github.com/jansel
2024-02-29 07:49:59 +00:00
Yanbo Liang
169c220bf8 [torch.compile] Provide capability to register callback on compile start/stop (#120764)
This is a requirement from Meta internal cases, where ppl wants to register a callback function to detect if a job is stuck during compilation.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/120764
Approved by: https://github.com/jansel
2024-02-29 07:37:52 +00:00
Animesh Jain
66d05a8900 [dynamo] Fix source for default dict default_factory (#120864)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/120864
Approved by: https://github.com/yanboliang, https://github.com/Skylion007, https://github.com/jansel
2024-02-29 07:25:13 +00:00
Oleg Khabinov
4b18ab869f [torch.export] Support is_compiling() flag for non-strict mode (#119602)
Summary: In non-strict mode of torch.export() we didn't set those `is_compiling()` to `True` which is needed by some models.

Test Plan: Unit tests and manual testing.

Differential Revision: D53624452

Pull Request resolved: https://github.com/pytorch/pytorch/pull/119602
Approved by: https://github.com/suo
2024-02-29 05:52:51 +00:00
youkaichao
2c0c70f763 [Dynamo] enumerate imported names for eval_frame.py (#120778)
Fixes https://github.com/pytorch/pytorch/issues/120699 .

Pull Request resolved: https://github.com/pytorch/pytorch/pull/120778
Approved by: https://github.com/Skylion007
2024-02-29 03:08:43 +00:00
PyTorch MergeBot
db1cc781db Revert "[dynamo] Function => FunctionCtx for placeholder obj (#120577)"
This reverts commit ee01d0807b.

Reverted https://github.com/pytorch/pytorch/pull/120577 on behalf of https://github.com/jansel due to Causing breakages internally ([comment](https://github.com/pytorch/pytorch/pull/120577#issuecomment-1970254363))
2024-02-29 01:56:09 +00:00
Aaron Orenstein
1b8bb027f6 Fix guard for SUPPORTED_NODES (#120798)
The special-case code for handling SUPPORTED_NODES was producing a guard that looked like:
```
"G['torch'].utils._pytree.SUPPORTED_NODES[<class '__main__.CausalLMOutputWithPast'>].type"
```
resulting in a eval error trying to evaluate the guard.

This change adds a new source type (`ClassSource`) which is given a class type (in this case `CausalLMOutputWithPast`) and attempts to fetch it from its defining module.  It then uses that to build the `SUPPORTED_NODES` guards instead of referring to the type directly.

Also added a unit test which fails before this change and passes after.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/120798
Approved by: https://github.com/anijain2305
2024-02-28 23:34:17 +00:00
Elias Ellison
9c9bde515c Factor out Submod compilers (#120527)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/120527
Approved by: https://github.com/kadeng
2024-02-28 22:11:47 +00:00
Jason Ansel
01ec8df6d8 [Compiled Autograd] Introduce BackwardState capture (#120382)
This adds support for backwards hooks that are *both*:
1) Interior to the graph; and
2) Dynamically generated (e.g. lambdas)

We do this by creating a BackwardState object that is used to register the hooks in the forward, then populated by dynamo *after* the forwards runs.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/120382
Approved by: https://github.com/xmfan
2024-02-28 20:36:47 +00:00
Guilherme Leobas
491c2b4665 Let torch dynamo inline torch.func.grad (#118407)
When dynamo sees torch.func.grad, it tries to inline all frames related
to.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/118407
Approved by: https://github.com/zou3519
2024-02-28 20:05:00 +00:00
Avik Chaudhuri
5472923998 derived dim (#118729)
With the current `Dim`-based dynamic shapes API for export, one can express that shapes of different input shapes must be equal by reusing the same `Dim`. However, non-trivial relationships between such input shapes cannot be expressed.

Recently we are seeing more and more examples of code that require this additional expressibility, e.g., where a pair of shapes might differ by one, or a shape might be double another (or simply even).

This PR introduces the concept of a "derived" `Dim`, i.e., a linear arithmetic expression over a `Dim`. By using a combination of `Dim`s and derived `Dim`s to specify input shapes, the desired relationships can be expressed naturally. E.g., a pair of shapes might be `dim` and `dim + 1`, or `dim` and `2*dim`, or even `2*dim` and `dim + 1`.

We extend the current infrastructure that translates `Dim`s to deprecated `dynamic_dim`-based constraints to work with derived `Dim`s. As usual, we raise constraint violation errors when shape guards cannot be verified given a dynamic shapes spec; suggest fixes; and raise runtime errors when future inputs violate the spec.

Importantly, some guards that used to cause forced specializations in the constraint solver because they were deemed "too complex" now do not do so, because they can now be specified as constraints. Since this was what motivated the introduction of a `disable_constraint_solver` flag to some internal APIs, we may not need that flag any more.

Note that shapes of placeholders in exported programs can now contain symbolic expressions and not just symbols.

Differential Revision: D53254587

Pull Request resolved: https://github.com/pytorch/pytorch/pull/118729
Approved by: https://github.com/ezyang
2024-02-28 19:48:32 +00:00
Yu, Guangye
46e3f670b4 refactor code to share across different devices (#120602)
# Motivation
Refactor utils code to make it possible to share across CUDA, XPU, and other backends.

# Solution
Move `_dummy_type` and `_LazySeedTracker` to torch._utils;

# Additional Context
When upstreaming, refactor these code changes by isolating them into in an additional PR to minimize their impact on the CUDA code.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/120602
Approved by: https://github.com/albanD, https://github.com/jgong5, https://github.com/gujinghui, https://github.com/EikanWang
2024-02-28 09:42:58 +00:00
Animesh Jain
e9a961f66a [dynamo][refactor] Use originating_source for HASATTR (#120723)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/120723
Approved by: https://github.com/jansel
ghstack dependencies: #120520, #120590, #120721
2024-02-28 05:00:59 +00:00
Animesh Jain
5a53c0ff23 [dynamo][refactor] Rename LIST_LENGTH to SEQUENCE_LENGTH, separate DICT_LENGTH (#120721)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/120721
Approved by: https://github.com/jansel
ghstack dependencies: #120520, #120590
2024-02-28 02:19:10 +00:00
Edward Z. Yang
1a1fc1047d Add structured trace logs (#120289)
Overall design: https://docs.google.com/document/d/1CX_hJ0PNy9f3R1y8TJrfkSeLkvGjjjLU84BSXgS2AZ8/edit

How to read the diff:
* Most files are me augmenting pre-existing logging with structured variants. For the most part it's simple (esp FX graphs, which have a canonical string representation); it gets more complicated when I decided to JSON-ify some data structure instead of keeping the ad hoc printing (notably, guards and dynamo output graph sizes)
* torch/_functorch/_aot_autograd/collect_metadata_analysis.py is some unrelated fixes I noticed while auditing artifact logs
* torch/_logging/_internal.py has the actual trace log implementation. The trace logger is implement as a logger named torch.__trace which is disconnected from the logging hierarchy. It gets its own handler and formatter (TorchLogsFormatter with _is_trace True). `trace_structured` is the main way to emit a trace log. Unusually, there's a separate "metadata" and "payload" field. The metadata field should not be too long (as it is serialized as a single line) and is always JSON (we put contextual things like compile id in it); the payload field can be long and is emitted after the metadata log line and can span multiple lines.
* torch/_logging/structured.py contains some helpers for converting Python data structures into JSON form. Notably, we have a string interning implementation here, which helps reduce the cost of serializing filenames into the log.
* test/dynamo/test_structured_trace.py the tests are cribbed from test_logging.py, but all rewritten to use expect tests on munged versions of what we'd actually output. Payloads are never tested, since they tend not be very stable.

https://github.com/ezyang/tlparse is a POC Rust program that can interpret these logs.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/120289
Approved by: https://github.com/Skylion007
ghstack dependencies: #120712
2024-02-28 01:01:41 +00:00
Edward Z. Yang
213b3ac3f2 [BE] fail_* variables don't need to be shared across restarts, they're set only once (#120712)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/120712
Approved by: https://github.com/yanboliang
2024-02-28 00:48:11 +00:00
Aaron Orenstein
6cc7f9a2e6 Limit loop unrolling (#120023)
Tacotron2 causes massive loop unrolling resulting in very large graphs (26k nodes) which was causing inductor (and tracing itself) to choke.

The unrolling size is controlled by the environment variable TORCHDYNAMO_MAX_LOOP_UNROLL_NODES which defaults to the arbitrary value 5000.

This updates the tacotron2 timings as follows:
eager timing: 3m:23s -> 35s
aot_eager timing: 4m:12s -> 39s
inductor timing: 22m:24s ->1m

For reference the big loop in tacotron2 was this one (model.py[405]):
```
        while len(mel_outputs) < decoder_inputs.size(0) - 1:
            decoder_input = decoder_inputs[len(mel_outputs)]
            mel_output, gate_output, attention_weights = self.decode(decoder_input)
            mel_outputs += [mel_output.squeeze(1)]
            gate_outputs += [gate_output.squeeze(1)]
            alignments += [attention_weights]
```
which gets unrolled and inlined adding about 36 nodes to the graph per iteration.

Fixes #98467
Relates to #102839 which hopefully will result in a better fix.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/120023
Approved by: https://github.com/yanboliang
2024-02-27 20:44:21 +00:00
PyTorch MergeBot
f3dd2a544c Revert "Add structured trace logs (#120289)"
This reverts commit 9dfaef962c.

Reverted https://github.com/pytorch/pytorch/pull/120289 on behalf of https://github.com/kit1980 due to breaking internal builds, see D54230697 ([comment](https://github.com/pytorch/pytorch/pull/120289#issuecomment-1967477120))
2024-02-27 19:49:05 +00:00
Jason Ansel
5b5c167adc [dynamo] Add some helpers to PyCodegen (#120684)
This are used in later PRs in the stack

Pull Request resolved: https://github.com/pytorch/pytorch/pull/120684
Approved by: https://github.com/yanboliang
2024-02-27 18:46:51 +00:00
Kai Londenberg
2ad66e6bf0 Fix test failure: Add torch.cuda._get_device_properties to dynamo trace rules (#120620)
In this PR stack, there were unrelated test failures within test_trace_rules.py - It turned out that torch.cuda._get_device_properties should be registered in _dynamoc/trace_rules.py. A test failed because it was not.

This is a small fix which tries to get rid of the test failure by manually registering that function.

Note:
I am not sure whether this is the best way to fix this, as I am neither familiar with the trace rules nor with the introduction of torch.cuda._get_device_properties.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/120620
Approved by: https://github.com/Skylion007
2024-02-27 10:46:01 +00:00
Animesh Jain
e3d64c4d5d [dynamo] Desugar accumulate_grad, fix .grad handling (#120590)
Fixes https://github.com/pytorch/pytorch/issues/118435
Fixes https://github.com/pytorch/pytorch/issues/119906

Pull Request resolved: https://github.com/pytorch/pytorch/pull/120590
Approved by: https://github.com/ezyang, https://github.com/jansel
ghstack dependencies: #120520
2024-02-27 10:12:26 +00:00
Animesh Jain
8a59f49da2 [dynamo][compile-time] Collect guard debug stack info only with logs enabled (#120520)
Reduces backend=eager compile time from 33 to 19 seconds for `MobileBertForQuestionAnswering`. This also helps an internal model where guards.add function is taking 124 seconds.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/120520
Approved by: https://github.com/mlazos
2024-02-27 01:51:16 +00:00
PyTorch MergeBot
17560eb472 Revert "[Dynamo] Remove deadcode: unwrapping script_if_tracing (#120444)"
This reverts commit 4d2073bc3f.

Reverted https://github.com/pytorch/pytorch/pull/120444 on behalf of https://github.com/kit1980 due to breaking internal builds, see D54192376 ([comment](https://github.com/pytorch/pytorch/pull/120444#issuecomment-1965600268))
2024-02-27 00:58:00 +00:00
Sergii Dymchenko
d341b66e96 Revert [dynamo] support group=None when rewriting collectives (#12018) (#120677)
This reverts commit 298c686d3f.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/120677
Approved by: https://github.com/yifuwang, https://github.com/huydhn
2024-02-27 00:33:35 +00:00
Edward Z. Yang
9dfaef962c Add structured trace logs (#120289)
Overall design: https://docs.google.com/document/d/1CX_hJ0PNy9f3R1y8TJrfkSeLkvGjjjLU84BSXgS2AZ8/edit

How to read the diff:
* Most files are me augmenting pre-existing logging with structured variants. For the most part it's simple (esp FX graphs, which have a canonical string representation); it gets more complicated when I decided to JSON-ify some data structure instead of keeping the ad hoc printing (notably, guards and dynamo output graph sizes)
* torch/_functorch/_aot_autograd/collect_metadata_analysis.py is some unrelated fixes I noticed while auditing artifact logs
* torch/_logging/_internal.py has the actual trace log implementation. The trace logger is implement as a logger named torch.__trace which is disconnected from the logging hierarchy. It gets its own handler and formatter (TorchLogsFormatter with _is_trace True). There's a teensy bit of FB specific code to automatically enable trace logging if a /logs directory exists. `trace_structured` is the main way to emit a trace log. Unusually, there's a separate "metadata" and "payload" field. The metadata field should not be too long (as it is serialized as a single line) and is always JSON (we put contextual things like compile id in it); the payload field can be long and is emitted after the metadata log line and can span multiple lines.
* torch/_logging/structured.py contains some helpers for converting Python data structures into JSON form. Notably, we have a string interning implementation here, which helps reduce the cost of serializing filenames into the log.
* test/dynamo/test_structured_trace.py the tests are cribbed from test_logging.py, but all rewritten to use expect tests on munged versions of what we'd actually output. Payloads are never tested, since they tend not be very stable.

https://github.com/ezyang/tlparse is a POC Rust program that can interpret these logs.

Testing that the fbcode detection works at https://www.internalfb.com/mlhub/pipelines/runs/fblearner/534553450 (Meta-only)

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/120289
Approved by: https://github.com/Skylion007
2024-02-27 00:04:23 +00:00
Yanbo Liang
5a0a964444 [Dynamo] Fix guards for script_if_tracing or lru_cache fn with default args (#120390)
Fixes #120387

Pull Request resolved: https://github.com/pytorch/pytorch/pull/120390
Approved by: https://github.com/anijain2305
2024-02-26 19:40:14 +00:00
Edward Z. Yang
fd3cf88f27 Rewrite docs about why we guard on dynamic dims (#120566)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/120566
Approved by: https://github.com/desertfire
2024-02-26 18:58:30 +00:00
angelayi
759204253f [export] Change runtime asserts to using assert_scalar (#119608)
By changing runtime symbolic asserts to using assert_scalar, the asserts can call into `expect_true` and modify the shape env so that we can run through the traced graph module with fake tensors. With assert_async, the asserts only get hit during runtime, but that means if we run the graph module with fake tensors, the asserts will not affect the shape env, so later data dependent calls to the fake tensors may result in GuardOnDataDependentSymNode errors.

https://github.com/pytorch/pytorch/issues/119587

Pull Request resolved: https://github.com/pytorch/pytorch/pull/119608
Approved by: https://github.com/ezyang
2024-02-26 17:56:12 +00:00
Sam Larsen
2fb32a5f07 Enable fake tensor caching in fbcode by default (#118555)
Summary: Enabled by default in OSS; this switches the default to "on" in fbcode too.

Test Plan: Ran torchbench benchmarks in fbcode

Differential Revision: [D53771626](https://our.internmc.facebook.com/intern/diff/D53771626)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/118555
Approved by: https://github.com/eellison
2024-02-26 17:35:23 +00:00
Jason Ansel
ee01d0807b [dynamo] Function => FunctionCtx for placeholder obj (#120577)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/120577
Approved by: https://github.com/yanboliang
2024-02-26 17:16:31 +00:00
angelayi
86063b4d03 Add torch._print to dynamo trace_rules (#120533)
Fixes #114831

Before:
```
(pytorch10) angelayi@devgpu022 ~/local/pytorch [main] $  python test/dynamo/test_trace_rules.py -k test_torch_name_rule_map_updated
F
======================================================================
FAIL: test_torch_name_rule_map_updated (__main__.TraceRuleTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/data/users/angelayi/pytorch/torch/testing/_internal/common_utils.py", line 2739, in wrapper
    method(*args, **kwargs)
  File "/data/users/angelayi/pytorch/test/dynamo/test_trace_rules.py", line 328, in test_torch_name_rule_map_updated
    self._check_set_equality(
  File "/data/users/angelayi/pytorch/test/dynamo/test_trace_rules.py", line 302, in _check_set_equality
    self.assertTrue(len(x) == 0, msg1)
AssertionError: False is not true : New torch objects: {<built-in method _print of type object at 0x7ff477e40ee0>} were not added to `trace_rules.torch_c_binding_in_graph_functions` or `test_trace_rules.ignored_c_binding_in_graph_function_names`. Refer the instruction in `torch/_dynamo/trace_rules.py` for more details.

To execute this test, run the following from the base repo dir:
     python test/dynamo/test_trace_rules.py -k test_torch_name_rule_map_updated

This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0

----------------------------------------------------------------------
Ran 1 test in 0.184s

FAILED (failures=1)
```
After change:
```
(pytorch10) angelayi@devgpu022 ~/local/pytorch [main] $  python test/dynamo/test_trace_rules.py -k test_torch_name_rule_map_updated
.
----------------------------------------------------------------------
Ran 1 test in 0.209s

OK
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/120533
Approved by: https://github.com/clee2000, https://github.com/yanboliang, https://github.com/huydhn, https://github.com/Skylion007
2024-02-26 16:52:59 +00:00
Animesh Jain
c18623b7ed [dynamo] Reland 120147 - - Use EQUALS_MATCH guard for mod.training (#120578)
To fix Memory leak discovered in https://github.com/pytorch/pytorch/issues/112090

Pull Request resolved: https://github.com/pytorch/pytorch/pull/120578
Approved by: https://github.com/jansel
2024-02-26 03:49:47 +00:00
Shan19900305
685d862c45 Add SparsePrivateUse1 in backend_to_string, layout_from_backend and check_base_legacy_new. (#119263)
1) Using items stored in torch._tensor_classes to check item passed from python side;
2) Add SparsePrivateUse1 in backend_to_string, layout_from_backend and check_base_legacy_new;
3) Using more general API to get python module name in get_storage_obj and get_name functions.

Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/119263
Approved by: https://github.com/ezyang
2024-02-26 01:54:30 +00:00
Animesh Jain
834c7a1d3e [dynamo][refactor] Move some helper functions to global scope (#120426)
This is to prepare for guard C++ refactor work.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/120426
Approved by: https://github.com/ezyang
2024-02-25 04:38:20 +00:00
Yifu Wang
298c686d3f [dynamo] support group=None when rewriting collectives (#120118)
Resolves case 2 in #120082.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/120118
Approved by: https://github.com/wconstab
ghstack dependencies: #120370
2024-02-25 03:12:10 +00:00
Edward Z. Yang
0f20cc1e0e Don't use size on TensorVariable when doing out resize test (#120567)
Fixes https://github.com/pytorch/pytorch/issues/120482
Fixes https://github.com/pytorch/pytorch/issues/120511

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/120567
Approved by: https://github.com/Skylion007
2024-02-25 02:21:34 +00:00
Michael Lazos
56203fc407 Add profiling for backward (#120540)
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/120540
Approved by: https://github.com/anijain2305
2024-02-24 16:53:28 +00:00
Isuru Fernando
b7df3bba62 add decomposition for frexp (#119217)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/119217
Approved by: https://github.com/peterbell10
ghstack dependencies: #119284, #120027
2024-02-23 21:52:42 +00:00
Yanbo Liang
4d2073bc3f [Dynamo] Remove deadcode: unwrapping script_if_tracing (#120444)
After the consolidated ```trace_rules.lookup```, we already unwrap at
2240018c03/torch/_dynamo/variables/builder.py (L712)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/120444
Approved by: https://github.com/anijain2305
2024-02-23 21:22:09 +00:00
Edward Z. Yang
edf1c4e552 [Dynamo] Handle guard_size_oblivious in user code (#120379)
Fixes https://github.com/pytorch/pytorch/issues/120083

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/120379
Approved by: https://github.com/yanboliang
2024-02-23 05:38:57 +00:00
PyTorch MergeBot
722afe6171 Revert "[dynamo] Use EQUALS_MATCH guard for mod.training (#120147)"
This reverts commit b642a18e80.

Reverted https://github.com/pytorch/pytorch/pull/120147 on behalf of https://github.com/williamwen42 due to memory leak, see https://github.com/pytorch/pytorch/issues/112090 ([comment](https://github.com/pytorch/pytorch/pull/120147#issuecomment-1960522018))
2024-02-22 23:46:55 +00:00
Thiago Crepaldi
3588e7f265 Ignore .numpy() under FakeTensorMode() (#120261)
Fixes #120259

Pull Request resolved: https://github.com/pytorch/pytorch/pull/120261
Approved by: https://github.com/jansel
2024-02-22 22:49:20 +00:00
Yifu Wang
11e4a9266d Temporarily support ranks + tag as pg identifier in native funcol (#120226)
As communicated in https://github.com/pytorch/pytorch/issues/93173#issuecomment-1907095208, although we are dropping `(ranks, tag)` as group identifier in funcols, there will be a grace period for migration. This PR adds temporary `(ranks, tag)` support in native funcols. It also helps us decouple the py funcol -> native funcol transition from the API change.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/120226
Approved by: https://github.com/wanchaol, https://github.com/wconstab
ghstack dependencies: #120042, #120043, #120070
2024-02-22 20:24:16 +00:00
Yanbo Liang
03f7235caa [Dynamo] Fix dynamo trace rules (#120371)
```test_trace_rules.py``` is still failing due to this.

Fixes https://github.com/pytorch/pytorch/issues/114831
(Having this here will run the disabled test on the PR)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/120371
Approved by: https://github.com/drisspg, https://github.com/huydhn
2024-02-22 14:32:00 +00:00
Bert Maher
0e4bd25a33 [inductor] When generating debug logs don't fail if nvcc not found (#120346)
If nvcc isn't found subprocess throws a CalledProcessError

Differential Revision: [D54028438](https://our.internmc.facebook.com/intern/diff/D54028438/)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/120346
Approved by: https://github.com/Skylion007, https://github.com/shunting314
2024-02-22 14:25:34 +00:00
PyTorch MergeBot
8fa6340701 Revert "Ignore .numpy() under FakeTensorMode() (#120261)"
This reverts commit 952b37145b.

Reverted https://github.com/pytorch/pytorch/pull/120261 on behalf of https://github.com/huydhn due to Sorry for reverting your change but it seems breaking trunk on Python 3.12 952b37145b ([comment](https://github.com/pytorch/pytorch/pull/120261#issuecomment-1958267417))
2024-02-21 23:09:27 +00:00
Thiago Crepaldi
952b37145b Ignore .numpy() under FakeTensorMode() (#120261)
Fixes #120259

Pull Request resolved: https://github.com/pytorch/pytorch/pull/120261
Approved by: https://github.com/jansel
2024-02-21 22:06:29 +00:00
Edward Z. Yang
9199468401 Properly trace into mark_static (#120232)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/120232
Approved by: https://github.com/yanboliang
2024-02-21 13:51:31 +00:00
drisspg
d6801578c3 Update tracing rules for new cudnn functions (#120268)
# Summary
This updates the trace_rules with the new cudnn torch functions for sdpa

To repro:
`pytest test/dynamo/test_trace_rules.py -k test_torch_name_rule_map_updated`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/120268
Approved by: https://github.com/shuqiangzhang, https://github.com/huydhn, https://github.com/yanboliang
2024-02-21 05:22:44 +00:00
Elias Ellison
96092e1f55 Extend aot_graph_input_parser to sym shapes (#120246)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/120246
Approved by: https://github.com/shunting314
2024-02-20 23:24:45 +00:00
Yifu Wang
2d6c0cc81b Run test_functional_api.py with both legacy and native funcol impls (#119982)
Additional changes: tests in test_functional_api.py uses multi-threaded pg which is implemented in Python. For the native ops to call into the Python pg implementation, glue code in PyProcessGroup is required for each collective. This PR also adds a few pieces of previously missing glue code, which are necessary for running test_functional_api.py with native funcol.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/119982
Approved by: https://github.com/wanchaol
2024-02-20 21:15:37 +00:00
Yanbo Liang
d42ede8ae4 [torch.compile] Log compilation start time for timeline view (#120220)
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/120220
Approved by: https://github.com/angelayi
2024-02-20 21:07:40 +00:00
Brian Hirsh
26fbbc3e84 DTensor + dynamo: fix is_shard/replicate always inlining to False (#118668)
Fixes an internal enablement bug. When dynamo traces `is_sharded`/`is_replicate`, it would unconditioanlly assume the result was False.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/118668
Approved by: https://github.com/wconstab, https://github.com/wanchaol
ghstack dependencies: #117667, #117666, #118209, #118191, #118667
2024-02-20 15:23:48 +00:00
Jason Ansel
f2cf0768d1 [dynamo][distributed] handle _rank_not_in_group, _get_or_create_default_group (#119628)
Copy of #117692

Pull Request resolved: https://github.com/pytorch/pytorch/pull/119628
Approved by: https://github.com/yanboliang
2024-02-18 22:34:35 +00:00
Animesh Jain
b642a18e80 [dynamo] Use EQUALS_MATCH guard for mod.training (#120147)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/120147
Approved by: https://github.com/jansel
ghstack dependencies: #120132, #120140, #120145
2024-02-18 00:31:36 +00:00
Animesh Jain
0b11b0edd6 [dynamo][refactor] Use existing helper functions for CLOSURE_MATCH (#120145)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/120145
Approved by: https://github.com/jansel, https://github.com/Fidget-Spinner
ghstack dependencies: #120132, #120140
2024-02-18 00:31:36 +00:00
Jason Ansel
2fea475215 [dynamo] Refactor reconstruct() not to return anything (#120150)
This simplifies things slightly and avoids some bugs.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/120150
Approved by: https://github.com/yanboliang
2024-02-17 17:13:41 +00:00
Animesh Jain
757fc663a8 [dynamo][refactor] Use TYPE_MATCH instead of manually constructing guard (#120140)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/120140
Approved by: https://github.com/jansel, https://github.com/yanboliang
ghstack dependencies: #120132
2024-02-17 16:03:36 +00:00
Animesh Jain
48d96c08f2 [dynamo][guards] Use EQUALS_MATCH for NAME_MATCH (#120132)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/120132
Approved by: https://github.com/jansel, https://github.com/yanboliang
2024-02-17 16:03:36 +00:00
Shunting Zhang
becfda005e tiny improvement to the cprofile wrapper (#120100)
1. right now we double increment the profile counter. The PR avoid that so we don't end up with profile_0, profile_2, profile_4 ...
2. log the latency to run the passed in function with profiling on so we can easily skip those _compile call which returns quickly.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/120100
Approved by: https://github.com/eellison
2024-02-17 02:10:25 +00:00
Menglu Yu
7b1f5c874f [PT2][Optimus][Observability] Log the optimus graph transformation to the scuba (#119745)
Summary: Current everstore upload logging may cuase excessive compilation time when the model has lots of graph breaks (post: https://fb.workplace.com/groups/257735836456307/permalink/633533465543207/), we here log the transformation only when the graph changed

Test Plan:
timeout flows:
f528209775
f530084719

Differential Revision: D53692344

Pull Request resolved: https://github.com/pytorch/pytorch/pull/119745
Approved by: https://github.com/jackiexu1992
2024-02-16 21:32:04 +00:00
Yanbo Liang
4f4629d522 [Dynamo] Fix ListIteratorVariable repr to avoid log flooding (#120053)
This issue was found from Meta internal use case.
Before:
```
V0215 18:33:41.761000 140489262883968 torch/_dynamo/symbolic_convert.py:682 [0/0] TRACE starts_line /data/users/ybliang/debug/debug4.py:11 in <listcomp> (f) (inline depth: 1)
V0215 18:33:41.761000 140489262883968 torch/_dynamo/symbolic_convert.py:682 [0/0]         a = [sum(x) for x in result]
V0215 18:33:41.761000 140489262883968 torch/_dynamo/symbolic_convert.py:708 [0/0] TRACE BUILD_LIST 0 []
V0215 18:33:41.761000 140489262883968 torch/_dynamo/symbolic_convert.py:708 [0/0] TRACE LOAD_FAST .0 [ListVariable()]
V0215 18:33:41.762000 140489262883968 torch/_dynamo/symbolic_convert.py:708 [0/0] TRACE FOR_ITER 18 [ListVariable(), ListIteratorVariable([LazyVariableTracker(), LazyVariableTracker(), LazyVariableTracker(), LazyVariableTracker(), LazyVariableTracker(), LazyVariableTracker(), LazyVariableTracker(), LazyVariableTracker(), LazyVariableTracker(), LazyVariableTracker()], index=0)]
V0215 18:33:41.762000 140489262883968 torch/_dynamo/symbolic_convert.py:708 [0/0] TRACE STORE_FAST x [ListVariable(), ListIteratorVariable([LazyVariableTracker(), LazyVariableTracker(), LazyVariableTracker(), LazyVariableTracker(), LazyVariableTracker(), LazyVariableTracker(), LazyVariableTracker(), LazyVariableTracker(), LazyVariableTracker(), LazyVariableTracker()], index=1), LazyVariableTracker()]
V0215 18:33:41.762000 140489262883968 torch/_dynamo/symbolic_convert.py:708 [0/0] TRACE LOAD_GLOBAL sum [ListVariable(), ListIteratorVariable([ListVariable(), LazyVariableTracker(), LazyVariableTracker(), LazyVariableTracker(), LazyVariableTracker(), LazyVariableTracker(), LazyVariableTracker(), LazyVariableTracker(), LazyVariableTracker(), LazyVariableTracker()], index=1)]
V0215 18:33:41.763000 140489262883968 torch/_dynamo/symbolic_convert.py:708 [0/0] TRACE LOAD_FAST x [ListVariable(), ListIteratorVariable([ListVariable(), LazyVariableTracker(), LazyVariableTracker(), LazyVariableTracker(), LazyVariableTracker(), LazyVariableTracker(), LazyVariableTracker(), LazyVariableTracker(), LazyVariableTracker(), LazyVariableTracker()], index=1), BuiltinVariable(sum)]
V0215 18:33:41.763000 140489262883968 torch/_dynamo/symbolic_convert.py:708 [0/0] TRACE CALL_FUNCTION 1 [ListVariable(), ListIteratorVariable([ListVariable(), LazyVariableTracker(), LazyVariableTracker(), LazyVariableTracker(), LazyVariableTracker(), LazyVariableTracker(), LazyVariableTracker(), LazyVariableTracker(), LazyVariableTracker(), LazyVariableTracker()], index=1), BuiltinVariable(sum), ListVariable()]
V0215 18:33:41.764000 140489262883968 torch/_dynamo/symbolic_convert.py:708 [0/0] TRACE LIST_APPEND 2 [ListVariable(), ListIteratorVariable([ListVariable(), LazyVariableTracker(), LazyVariableTracker(), LazyVariableTracker(), LazyVariableTracker(), LazyVariableTracker(), LazyVariableTracker(), LazyVariableTracker(), LazyVariableTracker(), LazyVariableTracker()], index=1), ConstantVariable(int: 50)]
V0215 18:33:41.765000 140489262883968 torch/_dynamo/symbolic_convert.py:708 [0/0] TRACE JUMP_ABSOLUTE 4 [ListVariable(), ListIteratorVariable([ListVariable(), LazyVariableTracker(), LazyVariableTracker(), LazyVariableTracker(), LazyVariableTracker(), LazyVariableTracker(), LazyVariableTracker(), LazyVariableTracker(), LazyVariableTracker(), LazyVariableTracker()], index=1)]
V0215 18:33:41.765000 140489262883968 torch/_dynamo/symbolic_convert.py:708 [0/0] TRACE FOR_ITER 18 [ListVariable(), ListIteratorVariable([ListVariable(), LazyVariableTracker(), LazyVariableTracker(), LazyVariableTracker(), LazyVariableTracker(), LazyVariableTracker(), LazyVariableTracker(), LazyVariableTracker(), LazyVariableTracker(), LazyVariableTracker()], index=1)]
V0215 18:33:41.765000 140489262883968 torch/_dynamo/symbolic_convert.py:708 [0/0] TRACE STORE_FAST x [ListVariable(), ListIteratorVariable([ListVariable(), LazyVariableTracker(), LazyVariableTracker(), LazyVariableTracker(), LazyVariableTracker(), LazyVariableTracker(), LazyVariableTracker(), LazyVariableTracker(), LazyVariableTracker(), LazyVariableTracker()], index=2), LazyVariableTracker()]
V0215 18:33:41.765000 140489262883968 torch/_dynamo/symbolic_convert.py:708 [0/0] TRACE LOAD_GLOBAL sum [ListVariable(), ListIteratorVariable([ListVariable(), ListVariable(), LazyVariableTracker(), LazyVariableTracker(), LazyVariableTracker(), LazyVariableTracker(), LazyVariableTracker(), LazyVariableTracker(), LazyVariableTracker(), LazyVariableTracker()], index=2)]
V0215 18:33:41.765000 140489262883968 torch/_dynamo/symbolic_convert.py:708 [0/0] TRACE LOAD_FAST x [ListVariable(), ListIteratorVariable([ListVariable(), ListVariable(), LazyVariableTracker(), LazyVariableTracker(), LazyVariableTracker(), LazyVariableTracker(), LazyVariableTracker(), LazyVariableTracker(), LazyVariableTracker(), LazyVariableTracker()], index=2), BuiltinVariable(sum)]
V0215 18:33:41.766000 140489262883968 torch/_dynamo/symbolic_convert.py:708 [0/0] TRACE CALL_FUNCTION 1 [ListVariable(), ListIteratorVariable([ListVariable(), ListVariable(), LazyVariableTracker(), LazyVariableTracker(), LazyVariableTracker(), LazyVariableTracker(), LazyVariableTracker(), LazyVariableTracker(), LazyVariableTracker(), LazyVariableTracker()], index=2), BuiltinVariable(sum), ListVariable()]
V0215 18:33:41.766000 140489262883968 torch/_dynamo/symbolic_convert.py:708 [0/0] TRACE LIST_APPEND 2 [ListVariable(), ListIteratorVariable([ListVariable(), ListVariable(), LazyVariableTracker(), LazyVariableTracker(), LazyVariableTracker(), LazyVariableTracker(), LazyVariableTracker(), LazyVariableTracker(), LazyVariableTracker(), LazyVariableTracker()], index=2), ConstantVariable(int: 68)]
V0215 18:33:41.767000 140489262883968 torch/_dynamo/symbolic_convert.py:708 [0/0] TRACE JUMP_ABSOLUTE 4 [ListVariable(), ListIteratorVariable([ListVariable(), ListVariable(), LazyVariableTracker(), LazyVariableTracker(), LazyVariableTracker(), LazyVariableTracker(), LazyVariableTracker(), LazyVariableTracker(), LazyVariableTracker(), LazyVariableTracker()], index=2)]
```
After:
```
V0215 18:27:57.901000 140556649206912 torch/_dynamo/symbolic_convert.py:682 [0/0] TRACE starts_line /data/users/ybliang/debug/debug4.py:11 in <listcomp> (f) (inline depth: 1)
V0215 18:27:57.901000 140556649206912 torch/_dynamo/symbolic_convert.py:682 [0/0]         a = [sum(x) for x in result]
V0215 18:27:57.901000 140556649206912 torch/_dynamo/symbolic_convert.py:708 [0/0] TRACE BUILD_LIST 0 []
V0215 18:27:57.901000 140556649206912 torch/_dynamo/symbolic_convert.py:708 [0/0] TRACE LOAD_FAST .0 [ListVariable()]
V0215 18:27:57.901000 140556649206912 torch/_dynamo/symbolic_convert.py:708 [0/0] TRACE FOR_ITER 18 [ListVariable(), ListIteratorVariable(length=10, index=0)]
V0215 18:27:57.901000 140556649206912 torch/_dynamo/symbolic_convert.py:708 [0/0] TRACE STORE_FAST x [ListVariable(), ListIteratorVariable(length=10, index=1), LazyVariableTracker()]
V0215 18:27:57.902000 140556649206912 torch/_dynamo/symbolic_convert.py:708 [0/0] TRACE LOAD_GLOBAL sum [ListVariable(), ListIteratorVariable(length=10, index=1)]
V0215 18:27:57.902000 140556649206912 torch/_dynamo/symbolic_convert.py:708 [0/0] TRACE LOAD_FAST x [ListVariable(), ListIteratorVariable(length=10, index=1), BuiltinVariable(sum)]
V0215 18:27:57.903000 140556649206912 torch/_dynamo/symbolic_convert.py:708 [0/0] TRACE CALL_FUNCTION 1 [ListVariable(), ListIteratorVariable(length=10, index=1), BuiltinVariable(sum), ListVariable()]
V0215 18:27:57.903000 140556649206912 torch/_dynamo/symbolic_convert.py:708 [0/0] TRACE LIST_APPEND 2 [ListVariable(), ListIteratorVariable(length=10, index=1), ConstantVariable(int: 55)]
V0215 18:27:57.904000 140556649206912 torch/_dynamo/symbolic_convert.py:708 [0/0] TRACE JUMP_ABSOLUTE 4 [ListVariable(), ListIteratorVariable(length=10, index=1)]
V0215 18:27:57.904000 140556649206912 torch/_dynamo/symbolic_convert.py:708 [0/0] TRACE FOR_ITER 18 [ListVariable(), ListIteratorVariable(length=10, index=1)]
V0215 18:27:57.904000 140556649206912 torch/_dynamo/symbolic_convert.py:708 [0/0] TRACE STORE_FAST x [ListVariable(), ListIteratorVariable(length=10, index=2), LazyVariableTracker()]
V0215 18:27:57.904000 140556649206912 torch/_dynamo/symbolic_convert.py:708 [0/0] TRACE LOAD_GLOBAL sum [ListVariable(), ListIteratorVariable(length=10, index=2)]
V0215 18:27:57.904000 140556649206912 torch/_dynamo/symbolic_convert.py:708 [0/0] TRACE LOAD_FAST x [ListVariable(), ListIteratorVariable(length=10, index=2), BuiltinVariable(sum)]
V0215 18:27:57.904000 140556649206912 torch/_dynamo/symbolic_convert.py:708 [0/0] TRACE CALL_FUNCTION 1 [ListVariable(), ListIteratorVariable(length=10, index=2), BuiltinVariable(sum), ListVariable()]
V0215 18:27:57.905000 140556649206912 torch/_dynamo/symbolic_convert.py:708 [0/0] TRACE LIST_APPEND 2 [ListVariable(), ListIteratorVariable(length=10, index=2), ConstantVariable(int: 64)]
V0215 18:27:57.905000 140556649206912 torch/_dynamo/symbolic_convert.py:708 [0/0] TRACE JUMP_ABSOLUTE 4 [ListVariable(), ListIteratorVariable(length=10, index=2)]
V0215 18:27:57.905000 140556649206912 torch/_dynamo/symbolic_convert.py:708 [0/0] TRACE FOR_ITER 18 [ListVariable(), ListIteratorVariable(length=10, index=2)]
V0215 18:27:57.905000 140556649206912 torch/_dynamo/symbolic_convert.py:708 [0/0] TRACE STORE_FAST x [ListVariable(), ListIteratorVariable(length=10, index=3), LazyVariableTracker()]
V0215 18:27:57.906000 140556649206912 torch/_dynamo/symbolic_convert.py:708 [0/0] TRACE LOAD_GLOBAL sum [ListVariable(), ListIteratorVariable(length=10, index=3)]
V0215 18:27:57.906000 140556649206912 torch/_dynamo/symbolic_convert.py:708 [0/0] TRACE LOAD_FAST x [ListVariable(), ListIteratorVariable(length=10, index=3), BuiltinVariable(sum)]
V0215 18:27:57.906000 140556649206912 torch/_dynamo/symbolic_convert.py:708 [0/0] TRACE CALL_FUNCTION 1 [ListVariable(), ListIteratorVariable(length=10, index=3), BuiltinVariable(sum), ListVariable()]
V0215 18:27:57.907000 140556649206912 torch/_dynamo/symbolic_convert.py:708 [0/0] TRACE LIST_APPEND 2 [ListVariable(), ListIteratorVariable(length=10, index=3), ConstantVariable(int: 56)]
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/120053
Approved by: https://github.com/williamwen42
2024-02-16 21:19:37 +00:00
Brian Hirsh
26343451be DTensor: make tensor_flatten more compatible for dynamo getattr (#118209)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/118209
Approved by: https://github.com/ezyang, https://github.com/wanchaol
ghstack dependencies: #117667, #117666
2024-02-16 21:16:07 +00:00
Brian Hirsh
ee7bcf23db dynamo: support attribute access on tensor subclasses without sources (#117666)
Fixes https://github.com/pytorch/pytorch/issues/117596

This was needed for Float8Tensor. Before this PR, dynamo would sometimes handle attribute access on tensor subclasses correctly, but it would choke on tensor subclasses with no source (it would fall back to using a `GetAttrVariable` to represent the attribute access, which is a problem if the attribute is a tensor that we later want to call tensor methods on).

I supported two cases:

(1) the attribute is a tensor, which is part of the `attrs` returned by the subclass's `__tensor_flatten__`. This creates a `TensorVariable`
(2) the attribute is a constant, which is part of the constant metadata returned by `__tensor_flatten__`. As per the contract of tensor_flatten, this should be a `ConstantVariable`. It could be possible that we allow non-constant metadata in the future, but we don't support that today.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/117666
Approved by: https://github.com/zou3519
ghstack dependencies: #117667
2024-02-16 21:16:07 +00:00
Brian Hirsh
67f6aca0d0 dynamo: respect autograd.Function + multiple save_for_backward calls (#117667)
Fixes https://github.com/pytorch/pytorch/issues/117652. Corner case that I hit debugging some Float8 issues.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/117667
Approved by: https://github.com/ezyang, https://github.com/zou3519
2024-02-16 21:16:07 +00:00
soulitzer
312ce35c1f Rename singleton int to nested int (#119661)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/119661
Approved by: https://github.com/ezyang
2024-02-16 19:21:17 +00:00
laith sakka
3693d8f467 Do to convert UnsupportedFakeTensorException into RuntimeError in runNode for proper graph breaking. (#120026)
Fix: https://github.com/pytorch/pytorch/issues/119779 by properly graph breaking  a proper fix is to handle quantized tensors for full complete solution.

if when generating  a fake tensor, UnsupportedFakeTensorException is thrown, then its handled and converted into a
Unimplemented in inside wrap_fake_exception which is then translated to a graph break.

However run_node used to convert  UnsupportedFakeTensorException into a runtime error, creating runtime
errors instead of graph breaks whenever generating a fake tensor for a quantized tensor fails.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/120026
Approved by: https://github.com/jansel
2024-02-16 09:21:58 +00:00
Oguz Ulgen
62e5840b36 [Dynamo] Do not create TorchInGraphFunctionVariable for tags (#120005)
Fixes https://github.com/pytorch/pytorch/issues/119793

Pull Request resolved: https://github.com/pytorch/pytorch/pull/120005
Approved by: https://github.com/yanboliang
2024-02-16 03:37:32 +00:00
Yanbo Liang
2a63dd8889 [Dynamo] Support lazy module with namedtuple/dict input (#119972)
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/119972
Approved by: https://github.com/jansel
2024-02-15 23:18:18 +00:00
PyTorch MergeBot
47300221c2 Revert "[export] Change runtime asserts to using assert_scalar (#119608)"
This reverts commit f4d641ba2f.

Reverted https://github.com/pytorch/pytorch/pull/119608 on behalf of https://github.com/huydhn due to This break ONNX trunk job 65fd8b6730 ([comment](https://github.com/pytorch/pytorch/pull/119608#issuecomment-1947436402))
2024-02-15 22:25:24 +00:00
laith sakka
d707e3c9c6 Fix handling none source in build_torch_function_fn (#119724)
Fix https://github.com/pytorch/pytorch/issues/119580

When a UserDefinedObjectVariable is created it does not always have a source, i.e: when its an intermediate
This diff fix two handling of none source in two locations during an inlining of a user torch function.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/119724
Approved by: https://github.com/jansel, https://github.com/mlazos, https://github.com/anijain2305
2024-02-15 21:21:47 +00:00
angelayi
f4d641ba2f [export] Change runtime asserts to using assert_scalar (#119608)
By changing runtime symbolic asserts to using assert_scalar, the asserts can call into `expect_true` and modify the shape env so that we can run through the traced graph module with fake tensors. With assert_async, the asserts only get hit during runtime, but that means if we run the graph module with fake tensors, the asserts will not affect the shape env, so later data dependent calls to the fake tensors may result in GuardOnDataDependentSymNode errors.

https://github.com/pytorch/pytorch/issues/119587

Pull Request resolved: https://github.com/pytorch/pytorch/pull/119608
Approved by: https://github.com/ezyang
2024-02-15 07:13:42 +00:00
Yifu Wang
cd08dc37f8 Support tracing native functional collective via python APIs (#119103)
Summary:
- Inlined `torch.distributed.distributed_c10d._get_group_size_by_name`
- Updated all torch.compile tests in test_c10d_functional_native.py to use funcol python APIs (as opposed to the dispatcher ops)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/119103
Approved by: https://github.com/wconstab, https://github.com/fegin, https://github.com/wanchaol
2024-02-15 03:33:49 +00:00
Yanbo Liang
7f5b87c953 [torch.compile] Log more compilation time breakdown (#119865)
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/119865
Approved by: https://github.com/ezyang
2024-02-15 02:20:07 +00:00
Alexander Grund
99cb807e25 Skip test_wrap_bad if run under pytest (#115070)
Pytest replaces sys.stdout/stderr by `TextIOWrapper` instances which do not support `fileno()`
Hence skip that test in this case

Fixes #115069

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115070
Approved by: https://github.com/clee2000
2024-02-15 00:10:05 +00:00
Zhengxu Chen
8f27fde2f5 [export] Log private api uses. (#119848)
Summary:
as title.
The following APIs are logged:
- capture_preautograd_graph
- torch._export.aot_compile
- external usage of _export_to_torch_ir (AOTInductor, Pippy)
- constraints API
- public use of torch._dynamo.export

Test Plan: CI

Differential Revision: D53735599

Pull Request resolved: https://github.com/pytorch/pytorch/pull/119848
Approved by: https://github.com/suo
2024-02-14 22:58:23 +00:00
gs-olive
e0f6fa6a7c Windows Dynamo Error Removal CI Check (#115969)
Rebase of #111313 onto `main`, for CI validation

Co-authored-by: Stella Laurenzo <stellaraccident@gmail.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115969
Approved by: https://github.com/PaliC, https://github.com/thiagocrepaldi
2024-02-14 21:14:36 +00:00
Aaron Orenstein
2aad3f93f8 Fix guards for field access through properties (#119719)
When building guards which went through a property we were analyzing the property using getattr_static but the guard wasn't built using getattr_static so if the property was "unusual" it generated misbehaved code which referenced a non-existent `__closure__` field.

Fixes #118786

Note that after this change some of the referenced tests are still failing with a different error - but getting further.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/119719
Approved by: https://github.com/oulgen
2024-02-14 20:42:55 +00:00
laith sakka
edd9ddf73f Propagate allow_non_graph_fake between get_fake_values_from_nodes and get_fake_values (#119731)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/119731
Approved by: https://github.com/jansel, https://github.com/anijain2305
ghstack dependencies: #119314, #119435
2024-02-14 15:26:17 +00:00
Yanbo Liang
54a30f6d4e [Dynamo] Update trace_rules.py and re-enable skipped tests (#119860)
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/119860
Approved by: https://github.com/angelayi
2024-02-14 05:22:55 +00:00
Animesh Jain
80379ef0aa [dynamo-must-fix] Use ID_MATCH for UserDefinedClass (#119853)
Fixes https://github.com/pytorch/pytorch/issues/119715

Pull Request resolved: https://github.com/pytorch/pytorch/pull/119853
Approved by: https://github.com/jansel
2024-02-14 03:14:42 +00:00
Jason Ansel
75a6d6aef7 [inductor] Support storage resizing (#119749)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/119749
Approved by: https://github.com/yf225
ghstack dependencies: #119647, #119671
2024-02-14 03:03:38 +00:00
PyTorch MergeBot
4a5b2cd6cb Revert "Windows Dynamo Error Removal CI Check (#115969)"
This reverts commit 45e7af5818.

Reverted https://github.com/pytorch/pytorch/pull/115969 on behalf of https://github.com/PaliC due to this pr ended up breaking some of our periodic tests ([comment](https://github.com/pytorch/pytorch/pull/115969#issuecomment-1942934386))
2024-02-14 01:11:46 +00:00
Guilherme Leobas
3319dbcd23 Update vmap guard to avoid recompilations (#119061)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/119061
Approved by: https://github.com/zou3519
2024-02-13 20:50:23 +00:00
andrewor14
8ec8d78ef2 [quant][pt2e][be] Rename eval_utils -> export_utils (#119725)
It's not really eval_utils anymore, since we added some training
related utils. Instead it should be util functions that are
related to general export use cases.

Differential Revision: [D53711494](https://our.internmc.facebook.com/intern/diff/D53711494)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/119725
Approved by: https://github.com/tugsbayasgalan
2024-02-13 19:10:06 +00:00
Jason Ansel
39c68efd85 [dynamo] Capture untyped_storage().resize_() (#119647)
This makes storage resizing work with `backend=eager`, the next two PRs make it work for inductor.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/119647
Approved by: https://github.com/yf225
2024-02-13 19:03:28 +00:00
Edward Z. Yang
52de407b6c Avoid performing replacements when it would unrefine ranges (#117356)
Fixes https://github.com/pytorch/pytorch/issues/117268; check this issue for background.

This PR does the following:

* Do not perform a replacement if the expression we're replacing the symbol with has a less refined value range than the original. There's a little bit of trickiness around the handling for values close to INT64_MAX; when checking if a range refines another, I *only* consider the range representable in 64-bit integers. This is enough to prevent us from doing a substitution like `i0 = 10 - i1`, but it appears to still let us do the other substitutions we like, such as `i0 = i1` or `i0 = 12 * i1`
* The test above is order dependent: if we assert an equality BEFORE we have refined a range, we might be willing to do the replacement because there isn't a meaningful range. This means that it's important to mark things as sizes, before you start doing other error checking. `split_with_sizes` is adjusted accordingly. It would be good to raise an error if you get the ordering wrong, but I leave this to future work.
* It turns out this is not enough to fix AOTAutograd, because we lose the size-ness of unbacked SymInts when AOTAutograd retraces the Dynamo graph. So update deferred runtime assert insertion to also insert size-ness and value ranges annotations. Note that, in principle, it shouldn't be necessary to explicitly do the latter; these should just show up as deferred runtime asserts. That's some extra refactoring for a later day.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/117356
Approved by: https://github.com/lezcano
2024-02-13 15:56:59 +00:00
Wanchao Liang
bfb9ea1a43 fix compile DTensor.from_local in trace_rule_look up (#119659)
There's a bug when converting from TorchVariable to trace rule look ups,
in some corner cases the DTensor.from_local calls not matching the trace
name rule look up, resulting in a None look up, and falling back to the
UserFunctionVariable, which makes the tracing silent wrong by tracing
into the DTensor.from_local function. Not exactly sure yet why the look
up failed

This PR fixes the DTensor.from_local tracing to make sure in everycase
we should hit the InGraphFunctionVariable

Pull Request resolved: https://github.com/pytorch/pytorch/pull/119659
Approved by: https://github.com/yifuwang
2024-02-13 05:21:19 +00:00
laith sakka
ea8e4fd5ac Support FunctoolsPartialVariable::get_function, fix NamedTupleVariable::as_proxy and handle call_function in get_fake_values_from_nodes (#119435)
partially address https://github.com/pytorch/pytorch/issues/118785
This diff fixes three things:
1. add get_function to FunctoolsPartialVariable note that it will be available only if all args constant otherwise,
it would throw unimplemented in the call to asPythonConstant.

2. NamedTupleVariable takes args dispatched not as list ex: NamedTuple(a, b, c) vs NamedTuple([a, b, c]),
 hence fix that by specializing asProxy.

3. A call to create_arg from within create_proxy, changes a python NamedTuple to a function call node without
associating an example value! Updated get_fake_values_from_nodes to handle such case.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/119435
Approved by: https://github.com/jansel, https://github.com/anijain2305
ghstack dependencies: #119314
2024-02-13 01:44:08 +00:00
Jason Ansel
74d55b0e63 [dynamo] Support torch.distributed.fsdp._flat_param._same_storage_size (#119627)
Replaces #117690

Pull Request resolved: https://github.com/pytorch/pytorch/pull/119627
Approved by: https://github.com/Skylion007
2024-02-13 01:27:37 +00:00
PyTorch MergeBot
472500e32a Revert "Avoid performing replacements when it would unrefine ranges (#117356)"
This reverts commit 0e6b314fc2.

Reverted https://github.com/pytorch/pytorch/pull/117356 on behalf of https://github.com/huydhn due to Sorry for reverting the change but it looks like the forward fix still needs more work https://github.com/pytorch/pytorch/pull/119712, so it would be cleaner to reland them ([comment](https://github.com/pytorch/pytorch/pull/117356#issuecomment-1940032407))
2024-02-13 01:16:58 +00:00
PyTorch MergeBot
7d780ff86f Revert "Enable fake tensor caching in fbcode by default (#118555)"
This reverts commit 0f2fbbff10.

Reverted https://github.com/pytorch/pytorch/pull/118555 on behalf of https://github.com/huydhn due to Sorry for reverting your change but it is failing one model test internally. Please take a look at the diff for more info D53189048 ([comment](https://github.com/pytorch/pytorch/pull/118555#issuecomment-1939550273))
2024-02-12 20:51:23 +00:00
Yanbo Liang
0e5b6594b7 [Dynamo] Minor cleanup of redundant function lookup logics (#119666)
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/119666
Approved by: https://github.com/angelayi
2024-02-12 19:00:39 +00:00
Tarun Karuturi
bc521f2ce3 In dynamo tracing for index() use None as the default indicator for end and not -1 (#119151)
Summary: In dynamo tracing, `index()`'s implementation currently has the default begin index as `0` and the default end index as`-1` which means that by default we're dropping the last element. Rather we should be doing `None` which will ensure that the last element is also checked.

Test Plan: CI

Differential Revision: D53392287

Pull Request resolved: https://github.com/pytorch/pytorch/pull/119151
Approved by: https://github.com/yanboliang
2024-02-12 17:45:05 +00:00
laith sakka
72d9a38118 add get_function to TorchInGraphFunctionVariable (#119314)
partially address https://github.com/pytorch/pytorch/issues/118785

Pull Request resolved: https://github.com/pytorch/pytorch/pull/119314
Approved by: https://github.com/yanboliang, https://github.com/anijain2305
2024-02-12 16:35:34 +00:00
Taras Tsugrii
a4cc6b85dc [dynamo][eval][perf] Remove unnecessary dict copies. (#119305)
Both of these variables are already created using `dict(...)` so making yet another `dict` copy is pure overhead and boilerplate.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/119305
Approved by: https://github.com/Skylion007
2024-02-11 20:29:26 +00:00
Yanbo Liang
57d8f67619 [Dynamo][17/N] Rename SkipFilesVariable to SkipFunctionVariable and move to functions.py (#119619)
This is follow-up-3 from https://github.com/pytorch/pytorch/pull/118971#issue-2114082018

Pull Request resolved: https://github.com/pytorch/pytorch/pull/119619
Approved by: https://github.com/jansel
2024-02-10 19:33:37 +00:00
Oguz Ulgen
e693089c7a [Dynamo] Refactor tensor methods handling (#119581)
Fixes part of #119128

Pull Request resolved: https://github.com/pytorch/pytorch/pull/119581
Approved by: https://github.com/jansel, https://github.com/anijain2305
2024-02-10 08:46:50 +00:00
Jason Ansel
e1c1b8c2b2 [dynamo] Improve support for backwards hooks (#119525)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/119525
Approved by: https://github.com/yanboliang, https://github.com/anijain2305
2024-02-10 01:14:03 +00:00
PyTorch MergeBot
25a0fa6d13 Revert "[dynamo] Improve support for backwards hooks (#119525)"
This reverts commit b1f4b2a63c.

Reverted https://github.com/pytorch/pytorch/pull/119525 on behalf of https://github.com/clee2000 due to broke test_autograd.py::TestAutograd::test_post_accumulate_grad_hook_gets_cleaned_up on dynamo https://github.com/pytorch/pytorch/actions/runs/7847212828/job/21416215820 b1f4b2a63c.  The failure exists on the PR as well, but got masked by the other test.  Putting this as no signal? ([comment](https://github.com/pytorch/pytorch/pull/119525#issuecomment-1936447169))
2024-02-09 18:58:55 +00:00
Yanbo Liang
f3a2094065 [Dynamo][Export] Mitigate legacy issue that aten op as export entrance function (#119528)
This is going to fix a legacy issue like:
```
torch._dynamo.export(torch.ops.aten.scaled_dot_product_attention, ...)(*inputs,)
```
This is not supported any more, now the top level ```torch.export``` only support ```nn.Module```, but there are still some tests using the internal APIs and caused the ```trace_rules.check``` assertion error. This PR is going to mitigate such cases.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/119528
Approved by: https://github.com/ydwu4
2024-02-09 18:24:09 +00:00
Yanbo Liang
5356b5d1f0 [Dynamo][16/N] Move skipfiles to trace_rules.py (#119432)
This is follow-up-1 for https://github.com/pytorch/pytorch/pull/118971#issue-2114082018. Only code motion and doc update in this PR.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/119432
Approved by: https://github.com/jansel
2024-02-09 18:18:23 +00:00
Jason Ansel
b1f4b2a63c [dynamo] Improve support for backwards hooks (#119525)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/119525
Approved by: https://github.com/yanboliang
2024-02-09 17:02:40 +00:00
PyTorch MergeBot
eff93fbd86 Revert "[Dynamo][16/N] Move skipfiles to trace_rules.py (#119432)"
This reverts commit 56364124af.

Reverted https://github.com/pytorch/pytorch/pull/119432 on behalf of https://github.com/atalman due to Breaks internal tests ([comment](https://github.com/pytorch/pytorch/pull/119432#issuecomment-1936122795))
2024-02-09 15:25:25 +00:00
Edward Z. Yang
0e6b314fc2 Avoid performing replacements when it would unrefine ranges (#117356)
Fixes https://github.com/pytorch/pytorch/issues/117268; check this issue for background.

This PR does the following:

* Do not perform a replacement if the expression we're replacing the symbol with has a less refined value range than the original. There's a little bit of trickiness around the handling for values close to INT64_MAX; when checking if a range refines another, I *only* consider the range representable in 64-bit integers. This is enough to prevent us from doing a substitution like `i0 = 10 - i1`, but it appears to still let us do the other substitutions we like, such as `i0 = i1` or `i0 = 12 * i1`
* The test above is order dependent: if we assert an equality BEFORE we have refined a range, we might be willing to do the replacement because there isn't a meaningful range. This means that it's important to mark things as sizes, before you start doing other error checking. `split_with_sizes` is adjusted accordingly. It would be good to raise an error if you get the ordering wrong, but I leave this to future work.
* It turns out this is not enough to fix AOTAutograd, because we lose the size-ness of unbacked SymInts when AOTAutograd retraces the Dynamo graph. So update deferred runtime assert insertion to also insert size-ness and value ranges annotations. Note that, in principle, it shouldn't be necessary to explicitly do the latter; these should just show up as deferred runtime asserts. That's some extra refactoring for a later day.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/117356
Approved by: https://github.com/lezcano
2024-02-09 14:43:58 +00:00
Sam Larsen
0f2fbbff10 Enable fake tensor caching in fbcode by default (#118555)
Summary: Enabled by default in OSS; this switches the default to "on" in fbcode too.

Test Plan: Ran torchbench benchmarks in fbcode

Differential Revision: [D53189048](https://our.internmc.facebook.com/intern/diff/D53189048)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/118555
Approved by: https://github.com/eellison
2024-02-09 05:42:16 +00:00
Elias Ellison
930b60f5aa Add Debug Utility To Generate Inputs for AOT Graphs (#119409)
```
    Takes in a function which has been printed with print_readable() and constructs kwargs to run it.
    Currently only handles Tensor inputs and a graph module which might have tensor constants.
    Example:
        Consider a function `forward` defined as follows:
        >>> def forward(self, primals_1: "f32[1001, 6]"):
        ...     _tensor_constant0: "i64[4190]" = self._tensor_constant0
        ...     # Further implementation
        >>> kwargs = aot_graph_input_parser(forward)
        >>> forward(**kwargs)
    """
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/119409
Approved by: https://github.com/shunting314
2024-02-09 03:55:19 +00:00
Yue Dong
915f9db03c [Dynamo] Support kwargs for lazy module (#119445)
Summary:
Seems like `kwargs` is already support in `_infer_argument`, so we don't need the extra assertion `len(kwargs) == 0`.

This optimization ensures compatibility with torch.compile() for LazyModules with kwargs inputs, preventing graph breaks.

Test Plan: Unit tetst and CI

Differential Revision: D53558778

Pull Request resolved: https://github.com/pytorch/pytorch/pull/119445
Approved by: https://github.com/yanboliang
2024-02-09 00:46:41 +00:00
gs-olive
45e7af5818 Windows Dynamo Error Removal CI Check (#115969)
Rebase of #111313 onto `main`, for CI validation

Co-authored-by: Stella Laurenzo <stellaraccident@gmail.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115969
Approved by: https://github.com/ezyang
2024-02-08 21:23:45 +00:00
Angela Yi
8da2f81527 [export] Convert internal tests to using .module() (#119105)
Test Plan: CI

Differential Revision: D53091904

Pull Request resolved: https://github.com/pytorch/pytorch/pull/119105
Approved by: https://github.com/ydwu4
2024-02-08 17:19:07 +00:00
ydwu4
b251bca205 [dynamo] inlining into __iter__ of user defined object (#119243)
Fixes #119198.

This PR make dynamo inline `__iter__` of a user defined object instead of creating a graph break. Also added a new test, which shows:
1. the loop is unrolled
2. the length of the loop is guarded when inlining `__iter__`
```python
class Mod:
    def __init__(self):
        self.a = [torch.randn(2, 2), torch.randn(2, 2)]

    def __iter__(self):
        return iter(self.a)

def f(mod):
    ret = []
    for x in mod:
        ret.append(x + 1)
    return ret
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/119243
Approved by: https://github.com/jansel
2024-02-08 17:07:30 +00:00