Commit Graph

300 Commits

Author SHA1 Message Date
Simon Fan
7e0edafe86 [compiled autograd][dynamo] improve lifted autograd.Function.backward handling and fallback to pseudo-eager (#125661)
- `FakeContext` hides all fields other than `ctx.saved_tensors`, so dynamo errors when `autograd.Function.backward` uses other attrs on `ctx`, and it also doesn't allow fallback to eager.
- Even if we remove it, we still can't fall back to eager: node variables are already freed (`ctx.saved_tensors` throws)
- However, we can fall back to "pseudo-eager" by using a duck-typed ctx and routing `ctx.saved_tensors` to the lifted tensors
- Dynamo tries to inline `external_utils.call_backward` and treats `BackwardCFunction` as an `AutogradFunctionContextVariable` (only used up until we create the fake context: `FakeBackwardCFunction`)
- We `call_function` backward from the forward class `AutogradFunctionVariable`, and we still pass in the fake context as a `UserDefinedObjectVariable` (we can later use `AutogradFunctionContextVariable` + HOO graph speculation)
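The duck-typing idea above can be illustrated with a plain-Python sketch (all names and shapes here are hypothetical, not the actual dynamo implementation): an object that looks like the real ctx, but whose `saved_tensors` routes to lifted values instead of freed autograd state.

```python
class RealCtx:
    """Stand-in for autograd's BackwardCFunction; its saved tensors may be freed."""
    def __init__(self):
        self.scale = 2.0  # a user-set attribute that backward may read


class FakeBackwardCFunction:
    """Duck-typed ctx: forwards attribute access, overrides saved_tensors."""
    def __init__(self, real_ctx, lifted_tensors):
        self._real = real_ctx
        self._lifted = lifted_tensors

    @property
    def saved_tensors(self):
        # route to the lifted values instead of the (freed) node state
        return self._lifted

    def __getattr__(self, name):
        # any other attribute falls through to the real ctx
        return getattr(self._real, name)


fake = FakeBackwardCFunction(RealCtx(), (1.0, 3.0))
assert fake.saved_tensors == (1.0, 3.0)  # lifted values
assert fake.scale == 2.0                 # user attrs still visible
```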

Fixes #125489, #124827

Pull Request resolved: https://github.com/pytorch/pytorch/pull/125661
Approved by: https://github.com/jansel
2024-05-08 21:00:37 +00:00
Yu, Guangye
d17be10df1 make torch.amp.autocast more generic (#125103)
# Motivation
As discussed in [#124479](https://github.com/pytorch/pytorch/pull/124479), `torch.amp.autocast` can NOT be completely equivalent to `torch.cuda.amp.autocast` and `torch.cpu.amp.autocast`, since `torch.amp.autocast` does not have a default `dtype` for CPU (`torch.bfloat16` by default) and CUDA (`torch.float16` by default) respectively. We would like `torch.amp.autocast` to be more generic, to help developers/customers write device-agnostic code: there are not enough reasons to add a device-specific `torch.xxx.amp.autocast` for each device backend.

# Solution
When `None` is passed as `dtype`, we use `torch.get_autocast_dtype` to get the related dtype for each backend. Meanwhile, `torch.get_autocast_dtype` also needs to be supported in the JIT path for BC.

# Additional Context
With this PR, `torch.amp.autocast(device_type='cuda')` is equivalent to `torch.cuda.amp.autocast`.
Add two new UTs to cover this change in eager and jit path respectively.
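The dtype-resolution rule described above can be sketched in plain Python (illustrative only; the defaults shown are the ones stated in the Motivation, and the function name is assumed):

```python
# Per-backend autocast defaults, as described in the Motivation section.
AUTOCAST_DEFAULTS = {"cpu": "bfloat16", "cuda": "float16"}

def resolve_autocast_dtype(device_type, dtype=None):
    """Hypothetical helper: None means 'use the backend's default dtype'."""
    if dtype is None:
        return AUTOCAST_DEFAULTS[device_type]
    return dtype

assert resolve_autocast_dtype("cuda") == "float16"           # backend default
assert resolve_autocast_dtype("cpu") == "bfloat16"           # backend default
assert resolve_autocast_dtype("cpu", "float16") == "float16" # explicit wins
```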

Pull Request resolved: https://github.com/pytorch/pytorch/pull/125103
Approved by: https://github.com/albanD, https://github.com/jgong5, https://github.com/gujinghui
2024-05-08 12:13:26 +00:00
ydwu4
461ffaaaf3 [dynamo] support torchbind object input (#124978)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/124978
Approved by: https://github.com/jansel
2024-05-07 03:02:00 +00:00
Edward Z. Yang
b6bcd09173 Get rid of tabular and sizes, beef up verbosity of output graph (#125507)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/125507
Approved by: https://github.com/Chillee, https://github.com/jansel
ghstack dependencies: #125505
2024-05-06 13:41:58 +00:00
Aaron Gokaslan
1dd42e42c4 [BE]: Try TCH autofixes on torch/ (#125536)
Tries TCH autofixes to see what breaks

Pull Request resolved: https://github.com/pytorch/pytorch/pull/125536
Approved by: https://github.com/ezyang
2024-05-05 23:13:59 +00:00
Edward Z. Yang
650a248d3e Rename is_unspecialized to pass_arg_as_tensor, add comment (#125496)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/125496
Approved by: https://github.com/lezcano
ghstack dependencies: #125395, #125419, #125483, #125494
2024-05-05 16:57:50 +00:00
Animesh Jain
071ee40793 [dynamo][nn module] Check for duplicate tensors in register_attr_or_module (#125421)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/125421
Approved by: https://github.com/mlazos
ghstack dependencies: #125439
2024-05-03 05:08:09 +00:00
Animesh Jain
e68d65dae2 [dynamo][cpp-guards] Differentiate dict guards wrt to guarding on key order (#124779)
We guard on key order
1) when a key is a non-constant object, and
2) when we actually need the key order, e.g. for `.values()`, `.items()`, etc.

For dicts/OrderedDicts that do not require key-order guarding, we just rely on the usual `GuardManager + DictGetItemGuardAccessor`. This is faster than going through the `list(d.keys())`-based design for OrderedDicts.
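The two guard flavors can be sketched in plain Python (a simplified illustration, not the actual C++ guard machinery; the helper name is made up):

```python
def make_dict_guard(d, needs_key_order):
    """Build a guard closure: order-sensitive only when key order matters."""
    if needs_key_order:
        expected = list(d.keys())
        return lambda x: list(x.keys()) == expected  # .values()/.items() depend on order
    expected = set(d.keys())
    return lambda x: set(x.keys()) == expected       # cheaper: order ignored

d = {"a": 1, "b": 2}
ordered = make_dict_guard(d, needs_key_order=True)
unordered = make_dict_guard(d, needs_key_order=False)

reordered = {"b": 2, "a": 1}   # same keys, different insertion order
assert not ordered(reordered)  # order-sensitive guard fails
assert unordered(reordered)    # order-insensitive guard passes
```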

Pull Request resolved: https://github.com/pytorch/pytorch/pull/124779
Approved by: https://github.com/jansel
2024-04-25 08:20:35 +00:00
Animesh Jain
59a1f1f308 [dynamo][inline inbuilt nn modules] Do not inline for export (#124814)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/124814
Approved by: https://github.com/jansel
2024-04-25 06:35:31 +00:00
Yu, Guangye
25f321b84f Refactor autocast C++ APIs to be device-agnostic (#124359)
# Motivation
This PR aims to refactor autocast **C++** APIs to be device-agnostic and deprecate the device-specific autocast  **C++** APIs.
On the C++ side,
- `is_enabled()` -> `is_enabled(device_type)`.
- `set_enabled(new_enabled)` -> `set_enabled(device_type, new_enabled)`.
- `get_autocast_dtype()` -> `get_autocast_dtype(device_type)`
- `set_autocast_dtype(dtype)` -> `set_autocast_dtype(device_type, dtype)`

The following C++ APIs are deprecated and should be removed in PyTorch 2.5:
- `is_cpu_enabled`
- `set_cpu_enabled`
- `get_autocast_cpu_dtype`
- `set_autocast_cpu_dtype`
- `is_xpu_enabled`
- `set_xpu_enabled`
- `get_autocast_xpu_dtype`
- `set_autocast_xpu_dtype`
- `is_ipu_enabled`
- `set_ipu_enabled`
- `get_autocast_ipu_dtype`
- `set_autocast_ipu_dtype`
- `is_hpu_enabled`
- `set_hpu_enabled`
- `get_autocast_hpu_dtype`
- `set_autocast_hpu_dtype`
- `is_xla_enabled`
- `set_xla_enabled`
- `get_autocast_xla_dtype`
- `set_autocast_xla_dtype`
- `is_privateuseone_enabled`
- `set_privateuseone_enabled`
- `get_autocast_privateuseone_dtype`
- `set_autocast_privateuseone_dtype`

On the Python side,
we provide 4 generic autocast APIs:
- `torch.is_autocast_enabled(device_type)`
- `torch.set_autocast_enabled(device_type, new_enabled)`
- `torch.get_autocast_dtype(device_type)`
- `torch.set_autocast_dtype(device_type, dtype)`

# Additional Context
We will submit another PR to refactor autocast **Python** APIs based on this PR.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/124359
Approved by: https://github.com/jgong5, https://github.com/albanD
2024-04-23 10:38:50 +00:00
Boyuan Feng
aa2da0cdd2 [Export] Add runtime assert to non-strict export (#123681)
This PR moves `insert_deferred_runtime_asserts` from dynamo to `torch.fx.passes` and uses it to add runtime assertions to non-strict export.

Differential Revision: D55944267

Pull Request resolved: https://github.com/pytorch/pytorch/pull/123681
Approved by: https://github.com/tugsbayasgalan, https://github.com/angelayi
2024-04-18 16:13:27 +00:00
Edward Z. Yang
bebdbb63ce Introduce set_example_value and use it throughout Dynamo (#124176)
I'm going to set up some extra behavior when we set the example value, so
I need a convenient place to interpose.  I cannot easily do it on
meta itself because it's a generic dict with no interposition point.
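A minimal sketch of the interposition point described above (the `Node` class here is a toy stand-in, not `torch.fx.Node`): all writes go through one helper instead of touching the meta dict directly, so extra behavior can be hooked in later.

```python
class Node:
    """Toy stand-in for an FX node: meta is just a generic dict."""
    def __init__(self):
        self.meta = {}

def set_example_value(node, value):
    # Single choke point for setting the example value; extra behavior
    # (validation, logging, ...) can be added here without touching callers.
    node.meta["example_value"] = value

n = Node()
set_example_value(n, 42)
assert n.meta["example_value"] == 42
```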

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/124176
Approved by: https://github.com/oulgen
ghstack dependencies: #124105, #124059
2024-04-17 22:57:11 +00:00
Simon Fan
67bd43b510 [compiled autograd][dynamo] use aliases for stack restore when partial graphs steal inputs (#124127)
Same idea as https://github.com/pytorch/pytorch/pull/123359, but for when we restore stack variables after calling a partial graph.

Illustrated by the test case:

before:
```python
def function(inputs):
    graph_out_0 = __compiled_fn_2(inputs)
    getitem_1 = graph_out_0[0]
    add = inputs[1]  # <-- error: inputs has already been cleared
    del graph_out_0
    add_1 = add + getitem_1
    add = None
    getitem_1 = None
    cpu = add_1.cpu()
    add_1 = None
    return (cpu,)
```
after:
```python
def function(inputs):
    inputs_ref_0 = inputs[1]
    graph_out_1 = __compiled_fn_2(inputs)
    getitem_1 = graph_out_1[0]
    add = inputs_ref_0
    del graph_out_1
    add_1 = add + getitem_1
    add = None
    getitem_1 = None
    cpu = add_1.cpu()
    add_1 = None
    return (cpu,)
```

Co-authored-by: Jason Ansel <jansel@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/124127
Approved by: https://github.com/jansel
2024-04-16 17:01:34 +00:00
William Wen
9309580d69 [dynamo, 3.12] handle possibility of NULL local variables during graph breaks (#124095)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/124095
Approved by: https://github.com/jansel
2024-04-16 08:44:43 +00:00
Animesh Jain
bb0c768c5b [dynamo][refactor] Move LazyGraphModule handling (#124113)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/124113
Approved by: https://github.com/jansel
ghstack dependencies: #124078
2024-04-16 06:39:45 +00:00
Simon Fan
540b451e91 [compiled autograd][dynamo] Codegen aliases to keep grad mutated tensors alive (#123359)
The current codegen is problematic if __compiled_fn_0 clears the inputs list, since we need it for assignment afterwards
```python
def forward(inputs):
    __compiled_fn_0 = ...  # The actual function needs to be provided
    graph_out_0 = __compiled_fn_0(inputs)  # clears inputs
    temp_list = []
    temp_list.append(graph_out_0[0])
    inputs[4].grad = graph_out_0[1]  # inputs is empty, index error
    inputs[7].grad = graph_out_0[2]
    inputs[8].grad = graph_out_0[3]
    inputs[9].grad = graph_out_0[3]
    del graph_out_0
    return temp_list
```

With this fix, we use aliases to keep the tensors alive
```python
def forward(inputs):
    __compiled_fn_0 = ...  # The actual function needs to be provided
    inputs_ref_1 = inputs[9]
    inputs_ref_2 = inputs[4]
    inputs_ref_3 = inputs[8]
    inputs_ref_4 = inputs[7]
    graph_out_0 = __compiled_fn_0(inputs)
    temp_list = []
    temp_list.append(graph_out_0[0])
    inputs_ref_2.grad = graph_out_0[1]
    inputs_ref_4.grad = graph_out_0[2]
    inputs_ref_3.grad = graph_out_0[3]
    inputs_ref_1.grad = graph_out_0[3]
    del graph_out_0
    return temp_list
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/123359
Approved by: https://github.com/jansel
ghstack dependencies: #123630, #123674, #122353
2024-04-12 10:29:09 +00:00
Animesh Jain
7283c37c98 [dynamo] Keep guards on global function (#123423)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/123423
Approved by: https://github.com/jansel
2024-04-09 04:23:11 +00:00
Oguz Ulgen
287680176b Use graph.find_nodes in dynamo (#122257)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/122257
Approved by: https://github.com/jansel
ghstack dependencies: #121565, #122255, #122256
2024-04-07 18:51:18 +00:00
Animesh Jain
8c84fe3c86 [dynamo][guards] Forward fix for #123302 (#123485)
For some reason, adding a `TYPE_CHECK` in DATA_PTR_MATCH guard in https://github.com/pytorch/pytorch/issues/123302 increases optimizer guard overhead for `MT5ForConditionalGeneration` by 10x. There is nothing special about MT5. As we are going to move towards the CPP guards soon, there is no reason to investigate this deeper.

We can use `ID_MATCH` instead of `DATA_PTR` match. Today neither can be serialized, so there is no preference for one over the other.
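An `ID_MATCH`-style guard can be sketched in plain Python (illustrative only; the real guards are C++/dynamo internals): the guard records the object's identity and later checks that the very same object is seen again.

```python
def make_id_match(obj):
    """Guard that passes only for the identical object, not equal copies."""
    expected = id(obj)
    return lambda x: id(x) == expected

t = [1, 2, 3]
guard = make_id_match(t)
assert guard(t)               # same object: guard passes
assert not guard([1, 2, 3])   # equal value, different object: guard fails
```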

Pull Request resolved: https://github.com/pytorch/pytorch/pull/123485
Approved by: https://github.com/mlazos
2024-04-06 02:34:06 +00:00
Guilherme Leobas
32f9453c2a [dynamo] Emit FUNCTORCH_STACK_MATCH guard in vmap(compile(f)) case (#122786)
Fixes: #122201

Pull Request resolved: https://github.com/pytorch/pytorch/pull/122786
Approved by: https://github.com/zou3519
2024-04-05 15:04:16 +00:00
Michael Lazos
512759a3d7 Fix for tensor attribute missing (#123313)
Tensors would sometimes be realized after we already registered attrs on the root nn module. This ensures all stack values are realized before registering attrs on the root nn module.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/123313
Approved by: https://github.com/anijain2305
2024-04-04 21:11:04 +00:00
rzou
fd60752786 Turn _allow_unsafe_data_ptr_access into a config option (#123291)
We're not planning on having this flag around for very long (see
deprecation in next PR), so it's better as a config option.

Test Plan:
- existing tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/123291
Approved by: https://github.com/eellison
ghstack dependencies: #123261, #123282
2024-04-04 20:35:24 +00:00
Animesh Jain
5b45ec8892 [dynamo][guards] Use DATA_PTR instead of ID_MATCH for tensors (#123302)
We should use `ID_MATCH` guards sparingly. When it comes to performance, `ID_MATCH` is much faster than `DATA_PTR` for Python guards. However, the difference is very small in C++, so it's worth just using `DATA_PTR_MATCH`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/123302
Approved by: https://github.com/mlazos
ghstack dependencies: #123285
2024-04-04 03:52:50 +00:00
Michael Lazos
3e2b7e6052 [dynamo][guard overhead] Data ptr guard optimizer state tensors (#122858)
Stricter (but faster) guarding on optimizer state tensors

Pull Request resolved: https://github.com/pytorch/pytorch/pull/122858
Approved by: https://github.com/anijain2305
2024-04-03 21:42:06 +00:00
Animesh Jain
5d0ac887b9 [dynamo][higher order ops] Make the subgraph sourceless (#123071)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/123071
Approved by: https://github.com/jansel, https://github.com/zou3519
ghstack dependencies: #123046, #123058, #123059
2024-04-01 21:09:41 +00:00
William Wen
abe4a0e9eb [dynamo] pop result of print reordering (#122744)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/122744
Approved by: https://github.com/jansel
ghstack dependencies: #122146, #122335, #122354, #122355, #122356, #122449, #122455, #122456, #122530, #122737, #122738, #122739, #122740, #122741, #122742, #122743
2024-03-27 20:39:39 +00:00
rzou
c81c9ba472 Disallow {FakeTensor,FunctionalTensor}.data_ptr (#122514)
This PR:
- disallows FakeTensor.data_ptr when it is called inside PT2 or fx tracing.
- disallows FunctionalTensor.data_ptr (python FunctionalTensor is only used in
  PT2)

The motivation behind this is that the leading cause of segfaults when
using custom ops with PT2 is calling .data_ptr on FunctionalTensor or
FakeTensor.

This change is BC-breaking. If your code broke as a result of this, it's
because there was a bug in it (these .data_ptr should never be
accessed!). You can either fix the bug (recommended) or get the previous
behavior back with:
```
from torch._subclasses.fake_tensor import FakeTensor
from torch._subclasses.functional_tensor import FunctionalTensor

data_ptr = 0 if isinstance(tensor, (FakeTensor, FunctionalTensor)) else tensor.data_ptr()
```

Test Plan:
- existing tests

Differential Revision: [D55366199](https://our.internmc.facebook.com/intern/diff/D55366199)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/122514
Approved by: https://github.com/ezyang, https://github.com/albanD, https://github.com/yifuwang, https://github.com/kurtamohler
2024-03-26 23:55:42 +00:00
Joel Schlosser
07b618e2d4 Graph break cleanly in Dynamo for module parametrization (#121041)
Fixes #118795

This is a graph breaking partial fix for #120914. We still need -actual- module parametrization tracing support, but at least it doesn't blow up hard now.

**Background**: Module parametrization injects a property as the module parameter attribute that calls a `nn.Module` whose forward takes in a module parameter and returns a reparametrized module parameter.
Example:
```
class MyParametrization(nn.Module):
    def forward(X):
        # This reparametrization just negates the original parameter value
        return -X

m = nn.Linear(...)
p = MyParametrization()
register_parametrization(m, "weight", p)

# Accessing the "weight" attribute will invoke p's forward() on m's original weight and return the output as the new weight.
# m.weight here is now an injected property that does the above instead of an actual Parameter.
# This property is defined in torch/nn/utils/parametrize.py.
m.weight

# NB: Parametrization changes the module type (e.g. torch.nn.utils.parametrize.ParametrizedLinear)
print(type(m))
```
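The injection mechanism can be mimicked in plain Python (a simplified sketch; the real logic lives in torch/nn/utils/parametrize.py and the names below are made up): the original parameter moves aside and a property recomputes the reparametrized value on every access, on a dynamically created subclass, which is why the module's type changes.

```python
class Module:
    """Toy stand-in for an nn.Module holding one parameter."""
    def __init__(self, weight):
        self._original_weight = weight

def negate(x):
    return -x  # stand-in for MyParametrization.forward

def parametrize(module_cls, fn):
    # Build a new type whose "weight" is a property, mirroring how
    # parametrization changes the module's type and injects a property.
    return type(
        "Parametrized" + module_cls.__name__,
        (module_cls,),
        {"weight": property(lambda self: fn(self._original_weight))},
    )

ParametrizedModule = parametrize(Module, negate)
m = ParametrizedModule(3.0)
assert m.weight == -3.0                          # recomputed on access
assert type(m).__name__ == "ParametrizedModule"  # the type changed
```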

**Problem 1**: Dynamo has special tracing rules for things in `torch.nn`. Parametrizing a module changes the type of the module and the parametrized attribute, so now these rules wrongly affect tracing here. To fix this:
* For parametrized modules, call `convert_to_unspecialized()` to restart analysis where Dynamo starts inlining the module.

**Problem 2**: The issue seen in #118795 is that Dynamo will see a dynamically constructed tensor when `m.weight` is called and introduce that to its `tensor_weakref_to_sizes_strides` cache during fake-ification. This tensor is also made to be a graph input, since it's a module parameter. When guards are created for this module parameter input, the logic calls `m.weight` again and tries to look the result up in the cache, but this is a different tensor now, giving the `KeyError` symptom. To fix this:
* Replace Dynamo's `tensor_weakref_to_sizes_strides` cache with a `input_source_to_sizes_strides` cache.
    * This cache was originally introduced in #100128.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/121041
Approved by: https://github.com/anijain2305
2024-03-26 23:44:51 +00:00
Yifu Wang
09cb42ce29 [dynamo] delete graph_out_{n} after restoring local vars (#122658)
At graph breaks, we create a graph_out_{n} symbol to hold the graph output and
use it to restore the local vars. In addition to their own symbols, the local
vars are kept alive by the symbol we created. This means that if the graph
break is the last usage of one of the symbols, the symbol would still be kept
alive upon graph resumption.

This PR deletes the graph_out_{n} symbol after restoring local vars, so the
lifetime of the local vars is governed by the vars themselves.
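The lifetime effect can be demonstrated with a plain-Python analogue (not dynamo bytecode; `Big` stands in for a large tensor and the tuple plays the role of graph_out_0):

```python
import weakref

class Big:
    """Stand-in for a large tensor whose deallocation we want to observe."""
    pass

def resume():
    b = Big()
    ref = weakref.ref(b)
    graph_out_0 = (b,)  # the graph-output symbol keeps b alive
    b = None            # the local's own reference is dropped at the graph break
    assert ref() is not None  # still alive: graph_out_0 holds it
    del graph_out_0           # analogous to the DELETE_FAST this PR emits
    assert ref() is None      # now freed: only its own lifetime governs it

resume()
```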

## Example Problem
Tensor `b`'s last usage is in the graph break. However, it won't be deallocated until `bar()` completes. In the original issue report by @Yuzhen11, `b` is a large tensor and `bar()` is an expensive computation.

```python
import torch

def foo(a):
    return torch.mm(a, a)

@torch._dynamo.disable()
def graph_break_fn(a):
    ret = a.bfloat16()
    return ret

def bar(c):
    return torch.mm(c, c)

def fn(a):
    b = foo(a)
    c = graph_break_fn(b)
    # del b
    return bar(c)

fn_compiled = torch.compile(fn, backend="eager")
a = torch.randn(10000, 10000, device="cuda", requires_grad=True)

fn_compiled(a).sum().backward()
```

Bytecode before this PR:
```
ORIGINAL BYTECODE fn /home/yifu/microbench/del2.py line 18
 19           0 LOAD_GLOBAL              0 (foo)
              2 LOAD_FAST                0 (a)
              4 CALL_FUNCTION            1
              6 STORE_FAST               1 (b)

 20           8 LOAD_GLOBAL              1 (graph_break_fn)
             10 LOAD_FAST                1 (b)
             12 CALL_FUNCTION            1
             14 STORE_FAST               2 (c)

 22          16 LOAD_GLOBAL              2 (bar)
             18 LOAD_FAST                2 (c)
             20 CALL_FUNCTION            1
             22 RETURN_VALUE

MODIFIED BYTECODE fn /home/yifu/microbench/del2.py line 18
 18           0 LOAD_GLOBAL              3 (__compiled_fn_0)
              2 LOAD_FAST                0 (a)
              4 CALL_FUNCTION            1
              6 STORE_FAST               3 (graph_out_0)
              8 LOAD_GLOBAL              1 (graph_break_fn)
             10 LOAD_FAST                3 (graph_out_0)
             12 LOAD_CONST               1 (0)
             14 BINARY_SUBSCR

 20          16 CALL_FUNCTION            1
             18 LOAD_GLOBAL              4 (__resume_at_14_1)
             20 ROT_TWO
             22 CALL_FUNCTION            1
             24 RETURN_VALUE

ORIGINAL BYTECODE torch_dynamo_resume_in_fn_at_20 /home/yifu/microbench/del2.py line 20
 20           0 LOAD_FAST                0 (___stack0)
              2 JUMP_ABSOLUTE            9 (to 18)
              4 LOAD_GLOBAL              0 (foo)
              6 LOAD_FAST                1 (a)
              8 CALL_FUNCTION            1
             10 STORE_FAST               2 (b)
             12 LOAD_GLOBAL              1 (graph_break_fn)
             14 LOAD_FAST                2 (b)
             16 CALL_FUNCTION            1
        >>   18 STORE_FAST               3 (c)

 22          20 LOAD_GLOBAL              2 (bar)
             22 LOAD_FAST                3 (c)
             24 CALL_FUNCTION            1
             26 RETURN_VALUE

MODIFIED BYTECODE torch_dynamo_resume_in_fn_at_20 /home/yifu/microbench/del2.py line 20
 20           0 LOAD_GLOBAL              3 (__compiled_fn_2)
              2 LOAD_FAST                0 (___stack0)
              4 CALL_FUNCTION            1
              6 UNPACK_SEQUENCE          1
              8 RETURN_VALUE
```

Bytecode after this PR:
```
ORIGINAL BYTECODE fn /home/yifu/microbench/del2.py line 18
 19           0 LOAD_GLOBAL              0 (foo)
              2 LOAD_FAST                0 (a)
              4 CALL_FUNCTION            1
              6 STORE_FAST               1 (b)

 20           8 LOAD_GLOBAL              1 (graph_break_fn)
             10 LOAD_FAST                1 (b)
             12 CALL_FUNCTION            1
             14 STORE_FAST               2 (c)

 22          16 LOAD_GLOBAL              2 (bar)
             18 LOAD_FAST                2 (c)
             20 CALL_FUNCTION            1
             22 RETURN_VALUE

MODIFIED BYTECODE fn /home/yifu/microbench/del2.py line 18
 18           0 LOAD_GLOBAL              3 (__compiled_fn_0)
              2 LOAD_FAST                0 (a)
              4 CALL_FUNCTION            1
              6 STORE_FAST               3 (graph_out_0)
              8 LOAD_GLOBAL              1 (graph_break_fn)
             10 LOAD_FAST                3 (graph_out_0)
             12 LOAD_CONST               1 (0)
             14 BINARY_SUBSCR
             16 DELETE_FAST              3 (graph_out_0)

 20          18 CALL_FUNCTION            1
             20 LOAD_GLOBAL              4 (__resume_at_14_1)
             22 ROT_TWO
             24 CALL_FUNCTION            1
             26 RETURN_VALUE

ORIGINAL BYTECODE torch_dynamo_resume_in_fn_at_20 /home/yifu/microbench/del2.py line 20
 20           0 LOAD_FAST                0 (___stack0)
              2 JUMP_ABSOLUTE            9 (to 18)
              4 LOAD_GLOBAL              0 (foo)
              6 LOAD_FAST                1 (a)
              8 CALL_FUNCTION            1
             10 STORE_FAST               2 (b)
             12 LOAD_GLOBAL              1 (graph_break_fn)
             14 LOAD_FAST                2 (b)
             16 CALL_FUNCTION            1
        >>   18 STORE_FAST               3 (c)

 22          20 LOAD_GLOBAL              2 (bar)
             22 LOAD_FAST                3 (c)
             24 CALL_FUNCTION            1
             26 RETURN_VALUE

MODIFIED BYTECODE torch_dynamo_resume_in_fn_at_20 /home/yifu/microbench/del2.py line 20
 20           0 LOAD_GLOBAL              3 (__compiled_fn_2)
              2 LOAD_FAST                0 (___stack0)
              4 CALL_FUNCTION            1
              6 UNPACK_SEQUENCE          1
              8 RETURN_VALUE

```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/122658
Approved by: https://github.com/jansel, https://github.com/anijain2305
2024-03-26 22:49:05 +00:00
IvanKobzarev
9b095c3fe6 [dynamo] Config to not emit runtime asserts (#122603)
Re-submission of https://github.com/pytorch/pytorch/pull/122406, which was squashed & merged by mistake

Differential Revision: [D55312394](https://our.internmc.facebook.com/intern/diff/D55312394)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/122603
Approved by: https://github.com/ezyang
2024-03-25 21:17:44 +00:00
Edward Z. Yang
47a9725de9 Implement prefer_deferred_runtime_asserts_over_guards (#122090)
Fixes https://github.com/pytorch/pytorch/issues/121749

As promised, it is pretty easy.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/122090
Approved by: https://github.com/lezcano
2024-03-25 16:31:16 +00:00
William Wen
23524710e6 [dynamo] use proxies to nn.Module in dynamo generated GraphModules (#120756)
Fixes remaining refleaks found when debugging https://github.com/pytorch/pytorch/issues/119607, tests added in https://github.com/pytorch/pytorch/pull/120657.

Also fixes some tests that xfail (https://github.com/pytorch/pytorch/issues/120631; not entirely sure why, but the newly introduced tests now fail without this change).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/120756
Approved by: https://github.com/jansel
2024-03-21 21:23:12 +00:00
IvanKobzarev
8a94005d46 [dynamo][runtime_asserts] Ignore failures on sorting sympy relations (#122205)
Differential Revision: [D55075500](https://our.internmc.facebook.com/intern/diff/D55075500)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/122205
Approved by: https://github.com/ezyang
2024-03-20 13:25:37 +00:00
Jason Ansel
c05bf0037d [dynamo] Remove copy_graphstate/restore_graphstate (#122067)
Some dead code cleanup.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/122067
Approved by: https://github.com/oulgen
2024-03-19 15:37:53 +00:00
Animesh Jain
c568b84794 [dynamo][guards] Move backend match to eval_frame (#121954)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/121954
Approved by: https://github.com/jansel
2024-03-17 06:52:10 +00:00
Jason Ansel
0b7d9711d4 [dynamo] Add support for nn.Parameter constructor (part 2) (#120965)
This handles the case where the tensor isn't an input.

The changes to dynamo tests are cases where we would previously fall back to eager.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/120965
Approved by: https://github.com/yanboliang
ghstack dependencies: #121735
2024-03-16 04:29:58 +00:00
Animesh Jain
a623666066 [dynamo][compile-time] Make output_graph new_var linear (#121858)
Fixes https://github.com/pytorch/pytorch/issues/121679

Pull Request resolved: https://github.com/pytorch/pytorch/pull/121858
Approved by: https://github.com/jansel
2024-03-15 03:20:04 +00:00
Jason Ansel
3c8c7e2a46 [dynamo] Tweak naming for module hook bw_state (#121609)
Some minor changes not related to the other PRs in the stack

Pull Request resolved: https://github.com/pytorch/pytorch/pull/121609
Approved by: https://github.com/yanboliang
2024-03-12 16:27:56 +00:00
Animesh Jain
78b4793c96 [dynamo][compile-time] Caching VTs to reduce compile-time (#121031)
Reduces the `torch.compile(backend="eager")` compile time for this code

~~~
def fn(x):
    for _ in range(10000):
        # x = torch.sin(x)
        x = torch.ops.aten.sin(x)
        # x = sin(x)

    return x
~~~

From 18 seconds to 12 seconds.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/121031
Approved by: https://github.com/jansel
2024-03-12 09:19:50 +00:00
Jason Ansel
7cc476ea16 [dynamo] Fix support for nn.Parameter constructor (part 1) (#120163)
This captures calls to `torch.nn.Parameter` by lifting them to graph inputs.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/120163
Approved by: https://github.com/albanD, https://github.com/yanboliang
ghstack dependencies: #121086
2024-03-11 05:14:42 +00:00
IvanKobzarev
9a45001905 [dynamo] relax missing symbols runtime assert (#121339)
Differential Revision: [D54603361](https://our.internmc.facebook.com/intern/diff/D54603361)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/121339
Approved by: https://github.com/ezyang
2024-03-07 14:53:38 +00:00
angelayi
c844b377fa [dynamo] Reorder logs (#116106)
Currently when there is a print/warning in the graph, dynamo graph breaks causing export to fail. However export would like to just skip over these print/warning calls: https://github.com/pytorch/pytorch/issues/113792.

Additionally there's a torch.compile feature request to "reorder prints" so that instead of graph breaking when hitting prints/logging, we can skip over these prints to create larger compiled graphs, and then print the results out after those compiled graphs: https://github.com/pytorch/pytorch/issues/93739. This PR also adds the `reorderable_logging_functions` config for users to register logging functions to be reordered (like `print` or a custom logging function). Printout of the bytecode after reordering the prints looks like the following: P914736600

There are some limitations to the printing right now:
* You can only register logging functions, not methods
* Inputs to the logging functions can only be tensors, constants, and format strings
* Inputs to the logging functions which will later be mutated in-place will not be printed correctly
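The reordering idea can be sketched in plain Python (illustrative only, not dynamo's bytecode rewriting; the names are made up): calls to a registered logging function are recorded while the compiled region runs and replayed afterwards, instead of forcing a graph break.

```python
deferred = []

def deferred_print(*args):
    # Record the call instead of printing mid-graph.
    deferred.append(args)

def compiled_region(x):
    """Pretend-compiled region: the print is deferred, not a graph break."""
    y = x * 2
    deferred_print("intermediate:", y)
    return y + 1

out = compiled_region(3)
for args in deferred:  # replay the recorded prints after the region finishes
    print(*args)

assert out == 7
assert deferred == [("intermediate:", 6)]
```

Note the third limitation above: because the print is replayed late, an argument mutated in place inside the region would be printed in its mutated state.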

TODO: Add the following tests
* print function with argument of nested data structure;
* print function with argument of nested data structure being updated inside of compile region (this would test if we handle side effect correctly);
* custom defined logging functions with nn.Module or nn.Module attribute arguments;
* custom defined logging functions with submodule input/output as arguments (we need to handle the mapping and fused-out value);
* custom defined logging functions with tensor argument and mutation inside of the function (TBD: this may increase memory usage);

Pull Request resolved: https://github.com/pytorch/pytorch/pull/116106
Approved by: https://github.com/yanboliang
2024-03-01 17:04:24 +00:00
PyTorch MergeBot
63b259492a Revert "[dynamo] Reorder logs (#116106)"
This reverts commit c5472628ff.

Reverted https://github.com/pytorch/pytorch/pull/116106 on behalf of https://github.com/clee2000 due to landrace with 342e7929b8, which removed the import for warnings.  Should be an easy fix after rebase c5472628ff ([comment](https://github.com/pytorch/pytorch/pull/116106#issuecomment-1972586180))
2024-03-01 06:25:46 +00:00
Angela Yi
c5472628ff [dynamo] Reorder logs (#116106)
Currently when there is a print/warning in the graph, dynamo graph breaks causing export to fail. However export would like to just skip over these print/warning calls: https://github.com/pytorch/pytorch/issues/113792.

Additionally there's a torch.compile feature request to "reorder prints" so that instead of graph breaking when hitting prints/logging, we can skip over these prints to create larger compiled graphs, and then print the results out after those compiled graphs: https://github.com/pytorch/pytorch/issues/93739. This PR also adds the `reorderable_logging_functions` config for users to register logging functions to be reordered (like `print` or a custom logging function). Printout of the bytecode after reordering the prints looks like the following: P914736600

There are some limitations to the printing right now:
* You can only register logging functions, not methods
* Inputs to the logging functions can only be tensors, constants, and format strings
* Inputs to the logging functions which will later be mutated in-place will not be printed correctly

TODO: Add the following tests
* print function with argument of nested data structure;
* print function with argument of nested data structure being updated inside of compile region (this would test if we handle side effect correctly);
* custom defined logging functions with nn.Module or nn.Module attribute arguments;
* custom defined logging functions with submodule input/output as arguments (we need to handle the mapping and fused-out value);
* custom defined logging functions with tensor argument and mutation inside of the function (TBD: this may increase memory usage);

Pull Request resolved: https://github.com/pytorch/pytorch/pull/116106
Approved by: https://github.com/yanboliang
2024-03-01 04:48:44 +00:00
Jason Ansel
01ec8df6d8 [Compiled Autograd] Introduce BackwardState capture (#120382)
This adds support for backward hooks that are *both*:
1) Interior to the graph; and
2) Dynamically generated (e.g. lambdas)

We do this by creating a BackwardState object that is used to register the hooks in the forward, then populated by dynamo *after* the forward runs.
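A minimal sketch of the pattern (the names are illustrative, not the torch._dynamo internals): the forward only captures a reference to an initially empty BackwardState, and the dynamically generated hooks are attached after the forward has already run:

```python
# Sketch of the BackwardState pattern: hooks that are interior to the graph
# and created dynamically (lambdas) are looked up through a state object at
# backward time, so they need not exist when the forward is traced.

class BackwardState:
    def __init__(self):
        self.hooks = []  # populated *after* the forward has run

calls = []

def forward(state, x):
    # The forward only holds a reference to `state`; the hooks it will run
    # in backward may be lambdas created later.
    y = x * 3

    def backward(grad):
        for hook in state.hooks:  # resolved at backward time
            grad = hook(grad)
        return grad

    return y, backward

state = BackwardState()
y, backward = forward(state, 2.0)

# Dynamo would populate the state after tracing the forward; here we attach
# a dynamically created lambda hook by hand.
state.hooks.append(lambda g: (calls.append("hook"), g * 10)[1])

grad_in = backward(1.0)
```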

Pull Request resolved: https://github.com/pytorch/pytorch/pull/120382
Approved by: https://github.com/xmfan
2024-02-28 20:36:47 +00:00
Edward Z. Yang
1a1fc1047d Add structured trace logs (#120289)
Overall design: https://docs.google.com/document/d/1CX_hJ0PNy9f3R1y8TJrfkSeLkvGjjjLU84BSXgS2AZ8/edit

How to read the diff:
* Most files are me augmenting pre-existing logging with structured variants. For the most part it's simple (especially FX graphs, which have a canonical string representation); it gets more complicated when I decided to JSON-ify some data structures instead of keeping the ad hoc printing (notably, guards and dynamo output graph sizes)
* torch/_functorch/_aot_autograd/collect_metadata_analysis.py contains some unrelated fixes I noticed while auditing artifact logs
* torch/_logging/_internal.py has the actual trace log implementation. The trace logger is implemented as a logger named torch.__trace, which is disconnected from the logging hierarchy. It gets its own handler and formatter (TorchLogsFormatter with _is_trace=True). `trace_structured` is the main way to emit a trace log. Unusually, there are separate "metadata" and "payload" fields. The metadata field should not be too long (as it is serialized as a single line) and is always JSON (we put contextual things like compile id in it); the payload field can be long, is emitted after the metadata log line, and can span multiple lines.
* torch/_logging/structured.py contains some helpers for converting Python data structures into JSON form. Notably, we have a string interning implementation here, which helps reduce the cost of serializing filenames into the log.
* test/dynamo/test_structured_trace.py: the tests are cribbed from test_logging.py, but all rewritten to use expect tests on munged versions of what we'd actually output. Payloads are never tested, since they tend not to be very stable.
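A rough sketch of the metadata/payload split and the string interning described above (the function and field names are illustrative, not torch._logging's actual API): metadata is a single JSON line, a long payload follows on subsequent lines, and file names are interned to small integers so repeated log lines stay short.

```python
# Sketch of a structured trace record: one JSON metadata line, then an
# optional multi-line payload. Filenames are interned to integers to
# reduce the cost of serializing them repeatedly.
import json

INTERN_TABLE = {}  # str -> int

def intern_string(s):
    if s not in INTERN_TABLE:
        INTERN_TABLE[s] = len(INTERN_TABLE)
    return INTERN_TABLE[s]

def trace_structured(name, metadata, payload=None):
    # Metadata must stay short: it is serialized as a single line.
    lines = [json.dumps({name: metadata, "interned_strings": INTERN_TABLE})]
    if payload is not None:
        lines.append(payload)  # may span multiple lines
    return "\n".join(lines)

record = trace_structured(
    "dynamo_output_graph",
    {"filename": intern_string("a/b/model.py"), "compile_id": "0/0"},
    payload="graph():\n    %x : placeholder",
)
```

A consumer (such as a tlparse-style tool) can parse the first line as JSON and treat everything after it as the opaque payload.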

https://github.com/ezyang/tlparse is a POC Rust program that can interpret these logs.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/120289
Approved by: https://github.com/Skylion007
ghstack dependencies: #120712
2024-02-28 01:01:41 +00:00
PyTorch MergeBot
f3dd2a544c Revert "Add structured trace logs (#120289)"
This reverts commit 9dfaef962c.

Reverted https://github.com/pytorch/pytorch/pull/120289 on behalf of https://github.com/kit1980 due to breaking internal builds, see D54230697 ([comment](https://github.com/pytorch/pytorch/pull/120289#issuecomment-1967477120))
2024-02-27 19:49:05 +00:00
Edward Z. Yang
9dfaef962c Add structured trace logs (#120289)
Overall design: https://docs.google.com/document/d/1CX_hJ0PNy9f3R1y8TJrfkSeLkvGjjjLU84BSXgS2AZ8/edit

How to read the diff:
* Most files are me augmenting pre-existing logging with structured variants. For the most part it's simple (especially FX graphs, which have a canonical string representation); it gets more complicated when I decided to JSON-ify some data structures instead of keeping the ad hoc printing (notably, guards and dynamo output graph sizes)
* torch/_functorch/_aot_autograd/collect_metadata_analysis.py contains some unrelated fixes I noticed while auditing artifact logs
* torch/_logging/_internal.py has the actual trace log implementation. The trace logger is implemented as a logger named torch.__trace, which is disconnected from the logging hierarchy. It gets its own handler and formatter (TorchLogsFormatter with _is_trace=True). There's a teensy bit of FB-specific code to automatically enable trace logging if a /logs directory exists. `trace_structured` is the main way to emit a trace log. Unusually, there are separate "metadata" and "payload" fields. The metadata field should not be too long (as it is serialized as a single line) and is always JSON (we put contextual things like compile id in it); the payload field can be long, is emitted after the metadata log line, and can span multiple lines.
* torch/_logging/structured.py contains some helpers for converting Python data structures into JSON form. Notably, we have a string interning implementation here, which helps reduce the cost of serializing filenames into the log.
* test/dynamo/test_structured_trace.py: the tests are cribbed from test_logging.py, but all rewritten to use expect tests on munged versions of what we'd actually output. Payloads are never tested, since they tend not to be very stable.

https://github.com/ezyang/tlparse is a POC Rust program that can interpret these logs.

Testing that the fbcode detection works at https://www.internalfb.com/mlhub/pipelines/runs/fblearner/534553450 (Meta-only)

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/120289
Approved by: https://github.com/Skylion007
2024-02-27 00:04:23 +00:00
angelayi
759204253f [export] Change runtime asserts to using assert_scalar (#119608)
By changing runtime symbolic asserts to use assert_scalar, the asserts can call into `expect_true` and modify the shape env, so that we can run through the traced graph module with fake tensors. With assert_async, the asserts only get hit at runtime, which means that if we run the graph module with fake tensors, the asserts will not affect the shape env, so later data-dependent calls on the fake tensors may result in GuardOnDataDependentSymNode errors.
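A toy sketch of the difference (the class and method names only mimic the symbolic-shapes vocabulary; this is not the real ShapeEnv): `expect_true` records the asserted fact so later fake-tensor evaluation can use it, whereas a runtime-only assert leaves the shape env untouched and later data-dependent checks fail.

```python
# Toy model of the assert_scalar vs. assert_async distinction: only the
# expect_true path feeds facts back into the shape env, so fake-tensor
# execution can decide data-dependent conditions without guard errors.

class GuardOnDataDependentSymNode(Exception):
    pass

class ShapeEnv:
    def __init__(self):
        self.facts = set()

    def expect_true(self, expr):
        # assert_scalar path: record the fact for later fake execution
        self.facts.add(expr)
        return True

    def evaluate(self, expr):
        # A data-dependent condition is only decidable if a recorded fact
        # covers it; otherwise fake-tensor execution must raise.
        if expr in self.facts:
            return True
        raise GuardOnDataDependentSymNode(expr)

env = ShapeEnv()
env.expect_true("u0 >= 0")      # runtime assert lowered via assert_scalar
ok = env.evaluate("u0 >= 0")    # later data-dependent check now passes

try:
    env.evaluate("u0 < 10")     # no recorded fact -> guard error
    raised = False
except GuardOnDataDependentSymNode:
    raised = True
```

An assert_async-style check would skip the `expect_true` call entirely, so `env.evaluate("u0 >= 0")` would raise just like the unrecorded condition does.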

https://github.com/pytorch/pytorch/issues/119587

Pull Request resolved: https://github.com/pytorch/pytorch/pull/119608
Approved by: https://github.com/ezyang
2024-02-26 17:56:12 +00:00
PyTorch MergeBot
47300221c2 Revert "[export] Change runtime asserts to using assert_scalar (#119608)"
This reverts commit f4d641ba2f.

Reverted https://github.com/pytorch/pytorch/pull/119608 on behalf of https://github.com/huydhn due to breaking the ONNX trunk job 65fd8b6730 ([comment](https://github.com/pytorch/pytorch/pull/119608#issuecomment-1947436402))
2024-02-15 22:25:24 +00:00