Commit Graph

252 Commits

Author SHA1 Message Date
lezcano
207b06d099 [dynamo] Wrap ndarray dunder methods (#107689)
Fixes https://github.com/pytorch/pytorch/issues/107437

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107689
Approved by: https://github.com/ezyang
ghstack dependencies: #107687, #107688, #107710, #107711, #107746
2023-08-23 13:55:36 +00:00
lezcano
612c8a8c84 Guard numpy imports in the dynamo folder (#107299)
Fixes https://github.com/pytorch/pytorch/issues/107228

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107299
Approved by: https://github.com/atalman
2023-08-21 19:07:20 +00:00
Edward Z. Yang
36bb7a1f42 Add fast traceback utilities (#107358)
This adds some utilities for conveniently working with fast combined CapturedTraceback from Python. The main goal of these utilities is to make it easier for people to use CapturedTraceback as a drop-in replacement for `traceback.extract_stack`, which is 20x slower than CapturedTraceback.
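A hedged sketch of the drop-in usage described above (the class location and method names below are assumptions, not stated in this commit message):

```python
import traceback

# Assumed location and API of the new utility; neither is stated in this commit message.
from torch.utils._traceback import CapturedTraceback

slow = traceback.extract_stack()        # stdlib path this is meant to replace (~20x slower)
fast = CapturedTraceback.extract()      # assumed: cheap capture of the current stack
frames = fast.summary()                 # assumed: materialize a StackSummary-like view on demand
```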

I port symbolic shapes to use the new CapturedTraceback code, to validate that the APIs work and are useful.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107358
Approved by: https://github.com/zdevito, https://github.com/albanD
ghstack dependencies: #107438
2023-08-18 19:05:54 +00:00
Michael Lazos
e0d6072f69 Add API to mark input tensors static for cudagraphs (#107154)
Adds an API to mark a tensor as a static input.
To make this trigger recompiles properly, I'll need to update the tensor match checks to also check for this new attribute.

An additional concern is memory: the tensors will be kept alive, but this is the current behavior for nn modules and parameters.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107154
Approved by: https://github.com/eellison
2023-08-16 04:38:19 +00:00
Yanbo Liang
fbfb9a1648 [Dynamo] Improve PT2 fbcode logging observability (#106932)
Summary:
https://docs.google.com/document/d/1D5K3_ELsda3tIUeSyNL_2yee-M3jVWbirqSQ5BDNvHQ/edit

This is the revamped version of D47908299.

For each frame, we will record a list of compilation metrics: e.g., backend_compile time, entire_frame_compile time, cache_size, co_filename, co_firstlineno, co_name, guards, graph input_count, graph node_count, graph op_count.

With the help of job info: mast_job_name, global_rank, we can satisfy the requirements from `Things I’ve used/wanted to use our logging to determine` in https://docs.google.com/document/d/1D5K3_ELsda3tIUeSyNL_2yee-M3jVWbirqSQ5BDNvHQ/edit (or add more metrics for this framework)

Test Plan:
```
buck2 test //caffe2/test:test_dynamo
```

Differential Revision: D48142400

Pull Request resolved: https://github.com/pytorch/pytorch/pull/106932
Approved by: https://github.com/anijain2305
2023-08-11 20:46:04 +00:00
lezcano
a9dca53438 NumPy support in torch.compile (#106211)
RFC: https://github.com/pytorch/rfcs/pull/54
First commit is the contents of https://github.com/Quansight-Labs/numpy_pytorch_interop/

We have already been using this in core for the last few months as a external dependency. This PR pulls all these into core.

In the next commits, I do a number of things in this order
- Fix a few small issues
- Make the tests that this PR adds pass
- Bend backwards until lintrunner passes
- Remove the optional dependency on `torch_np` and simply rely on the upstreamed code
- Fix a number of dynamo tests that were passing before (they were not testing anything, I think) and are not passing now.

Missing from this PR (but not blocking):
- Have a flag that deactivates tracing NumPy functions and simply breaks. There used to be one but after the merge stopped working and I removed it. @lezcano to investigate.
- https://github.com/pytorch/pytorch/pull/106431#issuecomment-1667079543. @voznesenskym to submit a fix after we merge.

All the tests in `tests/torch_np` take about 75s to run.
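For a flavour of what this enables, a minimal sketch (illustrative only; the supported surface is defined by the tests added here):

```python
import numpy as np
import torch

def f(x):
    a = x.numpy()          # stays a torch.Tensor in the captured graph
    return np.add(a, 1)    # NumPy call routed through the compat layer

print(torch.compile(f, backend="eager")(torch.ones(3)))
```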

This was joint work by @ev-br, @rgommers, @honno and me. I did not create this PR via ghstack (which would have been convenient) as this is a collaboration, and ghstack doesn't allow for shared contributions.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/106211
Approved by: https://github.com/ezyang
2023-08-11 00:39:32 +00:00
Jason Lu
bc88028e8e Back out "Reland "Make adding buffers more like adding parameters (#104069)" (#106224)" (#106743)
Summary:
Original commit changeset: 81319beb97f3

Original Phabricator Diff: D47961182

Test Plan: revert to maintain backward compat with legacy ads_dper3 production package. Read details in: S357822

Reviewed By: atuljangra

Differential Revision: D48131623

@diff-train-skip-merge
(D48131623 landed internally)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/106743
Approved by: https://github.com/malfet
2023-08-08 15:27:34 +00:00
Thomas Ortner
cc21fa75a3 Enable dynamic shapes of torch.nn.Parameter (#105855)
This PR adds a new configuration that enables shapes of torch.nn.Parameter to be treated as dynamic, in order to avoid extensive recompilation when Parameters are used instead of Tensors.

This feature addresses part of issue #105279.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/105855
Approved by: https://github.com/ezyang
2023-08-08 05:40:01 +00:00
Mikayla Gawarecki
d8e5f2aa6d Reland "Make adding buffers more like adding parameters (#104069)" (#106224)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/106224
Approved by: https://github.com/atalman, https://github.com/albanD
2023-07-31 17:18:56 +00:00
Michael Voznesensky
8549abc347 Grab bag of DTensor enablement stuff (Enable whole graph capture for DTensor) (#105787)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105787
Approved by: https://github.com/ezyang
2023-07-30 00:17:45 +00:00
Aaron Gokaslan
6d43c89f37 [BE]: Update Ruff to 0.0.280 (#105724)
Removes unused loop values in Python dictionary iteration. Automated fix from Ruff master.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/105724
Approved by: https://github.com/ezyang, https://github.com/janeyx99
2023-07-22 23:03:34 +00:00
angelayi
b0a04331b4 [dynamo] Fix import if numpy is not installed (#105711)
This [line](https://github.com/pytorch/pytorch/blob/main/torch/_dynamo/allowed_functions.py#L18) results in an import issue if numpy is not installed.
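A minimal sketch of the guard pattern such a fix typically uses (illustrative; the actual change may differ):

```python
# allowed_functions must remain importable when NumPy is absent.
try:
    import numpy as np
except ModuleNotFoundError:
    np = None  # callers check `np is not None` before touching NumPy APIs
```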

Pull Request resolved: https://github.com/pytorch/pytorch/pull/105711
Approved by: https://github.com/yanboliang, https://github.com/ezyang
2023-07-21 05:52:32 +00:00
William Wen
777fc0bb58 [dynamo] fine-grained bytecode-source attribution in python 3.11 (#104676)
Since Python 3.11 bytecode contains endline and column information, for each bytecode, we attribute the source code corresponding to the bytecode in a more accurate way. For example, we can highlight a function call in a series of nested function calls, or highlight a function call spanning multiple lines.

Sample:
```python
import torch
import torch._dynamo
from functorch.experimental.control_flow import cond

def h(x):
    return x * 5

def true_fn(x):
    return x * 2

def false_fn(x):
    return x * 3

def f(pred, x):
    x = h(
        h(h(x))
    )
    x = x[1:][:2]
    torch._dynamo.graph_break()
    x = cond(pred, true_fn, false_fn, [x])

opt_f = torch.compile(f, backend="eager")
opt_f(torch.tensor(True), torch.randn(3, 3, 3, 3))
```

Output:
```
$ TORCH_LOGS="trace_call" python playground9.py
TRACE inlined call h from f /scratch/williamwen/work/pytorch/playground9.py:16
        h(h(x))
          ~^^^
TRACE FX call mul from h /scratch/williamwen/work/pytorch/playground9.py:6 (inline depth: 1)
    return x * 5
           ~~^~~
TRACE inlined call h from f /scratch/williamwen/work/pytorch/playground9.py:16
        h(h(x))
        ~^^^^^^
TRACE FX call mul_1 from h /scratch/williamwen/work/pytorch/playground9.py:6 (inline depth: 1)
    return x * 5
           ~~^~~
TRACE inlined call h from f /scratch/williamwen/work/pytorch/playground9.py:15
    x = h(
        ~^
        h(h(x))
        ^^^^^^^
    )
    ^
TRACE FX call mul_2 from h /scratch/williamwen/work/pytorch/playground9.py:6 (inline depth: 1)
    return x * 5
           ~~^~~
TRACE FX call getitem from f /scratch/williamwen/work/pytorch/playground9.py:18
    x = x[1:][:2]
        ~^^^^
TRACE FX call getitem_1 from f /scratch/williamwen/work/pytorch/playground9.py:18
    x = x[1:][:2]
        ~~~~~^^^^
TRACE inlined call true_fn from <resume in f> /scratch/williamwen/work/pytorch/playground9.py:20
    x = cond(pred, true_fn, false_fn, [x])
        ~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TRACE FX call mul from true_fn /scratch/williamwen/work/pytorch/playground9.py:9 (inline depth: 1)
    return x * 2
           ~~^~~
TRACE inlined call false_fn from <resume in f> /scratch/williamwen/work/pytorch/playground9.py:20
    x = cond(pred, true_fn, false_fn, [x])
        ~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TRACE FX call mul from false_fn /scratch/williamwen/work/pytorch/playground9.py:12 (inline depth: 1)
    return x * 3
           ~~^~~
TRACE FX call cond from <resume in f> /scratch/williamwen/work/pytorch/playground9.py:20
    x = cond(pred, true_fn, false_fn, [x])
        ~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104676
Approved by: https://github.com/ezyang
2023-07-20 17:18:52 +00:00
Andrey Talman
c6653b65d8 Back out "Make adding buffers more like adding parameters (#104069)" (#105581)
Summary:
D47537831 is breaking pyper tests: https://fb.workplace.com/groups/802176577445480/posts/1018902842439518/

with `TypeError: register_buffer() takes 3 positional arguments but 4 were given`

Original commit changeset: d4b4069fbd38

Original Phabricator Diff: D47537831

Test Plan:
```
buck2 run //caffe2/torch/fb/training_toolkit/integration_tests/training_lifecycle/cogwheel_tests/pyper_release_v2:cogwheel_smallworld_inline_cvr_infer_pyper_pyper__canary_offline_training-launcher -- --run-harness-in-tupperware --build-fbpkg ads_dper3 --build-fbpkg training_platform
```

Reviewed By: atalman

Differential Revision: D47600140

Pull Request resolved: https://github.com/pytorch/pytorch/pull/105581
Approved by: https://github.com/mikaylagawarecki
2023-07-20 03:39:53 +00:00
kshitij12345
e137ac6c59 [dynamo][torch_np] support linalg, random and fft module (#105320)
Support tracing through `np.linalg` with `torch_np` installed. Will update with other modules if this approach makes sense.

TODO:
* [x] Add test for `fft` and `random`.

Fixes https://github.com/pytorch/pytorch/issues/105269
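For illustration, a small function touching the newly traceable submodules (a sketch; actual coverage is defined by the tests in this PR):

```python
import numpy as np
import torch

def f(x):
    a = x.numpy()
    return np.linalg.norm(a), np.fft.fft(a)

torch.compile(f, backend="eager")(torch.randn(4))
```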

Pull Request resolved: https://github.com/pytorch/pytorch/pull/105320
Approved by: https://github.com/ezyang, https://github.com/lezcano
2023-07-19 11:06:37 +00:00
Michael Lazos
1597dd7a54 Report guard failures with recompiles logging (#105500)
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/105500
Approved by: https://github.com/Chillee, https://github.com/anijain2305
2023-07-19 02:20:44 +00:00
Wanchao Liang
cb23373264 [dynamo] allow tensor subclass fakification in dynamo (#105308)
This PR adds necessary plumbing through torchdynamo to allow tensor
subclasses with certain contract (i.e. with `__tensor_flatten__` and
`__tensor_unflatten__`) to goes through the dynamo fakification pass by
fakifying the tensor subclass internal components.

Some of the tensor subclass contract logic mostly borrowed from
https://github.com/pytorch/pytorch/pull/97540

Added some tests to verify simply passing through a tensor subclass
(i.e. DTensor) through dynamo eager works as expected.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/105308
Approved by: https://github.com/ezyang
2023-07-18 17:28:04 +00:00
Aleksandar Samardžić
5d473a950f Make conversions from/to sparse semi-structured always @torch.compile-d (#105272)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105272
Approved by: https://github.com/ezyang
2023-07-18 04:51:28 +00:00
lezcano
a26afb9848 Better comparisons for np.ndarrays in dynamo (#105333)
This takes tolerances into account.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/105333
Approved by: https://github.com/larryliu0820
2023-07-17 20:20:50 +00:00
ekamiti
32d422f335 Make adding buffers more like adding parameters (#104069)
Add similar semantics for creating a buffer object similar to creating a parameter. This is done by introducing a new `Buffer` class that can be used for type disambiguation. The underlying functionality of registering a buffer remains the same as the `register_buffer` method has not been changed. The `persistent` parameter in the `Buffer` type is to indicate whether a buffer object should be persistent or not. Other non-test changes have to do with getting the new `Buffer` type recognized by inductor and dynamo. Remaining changes are test changes to make sure that the `Buffer` type can be used as a drop in replacement for `register_buffer` as it just leads to `register_buffer` being called. The addition of this new functionality still allows for normal tensors to be used as buffers so these changes are intended to be backwards compatible.
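A sketch of the intended usage (assuming the new class is exposed as torch.nn.Buffer; the exact spelling is not stated here):

```python
import torch
from torch import nn

class M(nn.Module):
    def __init__(self):
        super().__init__()
        # behaves like self.register_buffer("running_mean", torch.zeros(4), persistent=False)
        self.running_mean = nn.Buffer(torch.zeros(4), persistent=False)

print(dict(M().named_buffers()))
```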

Fixes #35735

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104069
Approved by: https://github.com/mikaylagawarecki
2023-07-17 17:59:05 +00:00
Animesh Jain
95232c216b [dynamo] Bugfix for enums (#105306)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105306
Approved by: https://github.com/yanboliang
2023-07-17 16:39:16 +00:00
lezcano
b190f46514 Allow NumPy code in torch.compile to run on cuda (#104699)
This can be achieved by doing `torch.set_default_device("cuda")`.
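A minimal sketch (requires a CUDA device; illustrative only):

```python
import numpy as np
import torch

torch.set_default_device("cuda")   # NumPy ops inside the compiled region now run on CUDA tensors

@torch.compile
def f(x):
    return np.sin(x.numpy()) + 1

f(torch.randn(8))
```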

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104699
Approved by: https://github.com/ezyang, https://github.com/larryliu0820
2023-07-06 18:43:09 +00:00
Animesh Jain
8c191d8eef [dynamo][ac] Reland #104397 - Remove disable monkeypatching of utils.checkpoint (#104665)
NO CHANGE from before. The ancestor diff was reverted, so this diff got reverted as well.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104665
Approved by: https://github.com/wconstab
2023-07-06 00:48:02 +00:00
Animesh Jain
4005152b92 [dynamo] Organize higherorderops variable trackers (#104565)
The main change is moving the higher-order ops from torch.py to higher_order_ops.py, and creating smaller subclasses of HigherOrderOp for cond, map, etc.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104565
Approved by: https://github.com/zou3519
2023-07-05 22:19:26 +00:00
PyTorch MergeBot
40f53912cf Revert "[dynamo][ac] Remove disable monkeypatching of utils.checkpoint (#104397)"
This reverts commit 537a6c0651.

Reverted https://github.com/pytorch/pytorch/pull/104397 on behalf of https://github.com/huydhn due to This has been reverted internally by D47216591, so I need to also revert it on OSS to keep them in sync ([comment](https://github.com/pytorch/pytorch/pull/104397#issuecomment-1621086360))
2023-07-05 06:11:08 +00:00
Animesh Jain
537a6c0651 [dynamo][ac] Remove disable monkeypatching of utils.checkpoint (#104397)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/104397
Approved by: https://github.com/wconstab
2023-06-30 02:27:06 +00:00
Animesh Jain
2bb83cd45c [dynamo][ac] Minor refactor for better code organization and a bugfix (#104276)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/104276
Approved by: https://github.com/zou3519
2023-06-29 12:57:59 +00:00
cdzhan
c06bb82ba1 fix specialization when you pass an unspec int into slicing on a Python list. (#104142)
Fixes #103545

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104142
Approved by: https://github.com/malfet, https://github.com/jansel
2023-06-28 13:13:07 +00:00
Animesh Jain
75dab587ef [dynamo] FSDP + AC + torch.compile (#103953)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/103953
Approved by: https://github.com/wanchaol
2023-06-24 01:40:56 +00:00
Vinay Kumar Burugu
3c28431a0f Feature: Dump compile_times when TORCH_LOGS=dynamo is enabled. (#104057)
Partial implementation of  https://github.com/pytorch/pytorch/issues/103173. This PR only implements the feature to dump compile_times at the end of the session using the atexit handler.
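For reference, the table being dumped can also be inspected directly via the existing helper (a sketch):

```python
import torch
import torch._dynamo

@torch.compile(backend="eager")
def f(x):
    return x.sin()

f(torch.randn(4))
print(torch._dynamo.utils.compile_times())  # same data the atexit handler dumps under TORCH_LOGS=dynamo
```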

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104057
Approved by: https://github.com/ezyang
2023-06-23 05:25:09 +00:00
Thiago Crepaldi
6f655d4195 Add symbolic tracing support to torch._dynamo.export (fake input + weights) (#100017)
Fixes #95900
Using the following repro as guide:

```python
import torch
import torch._dynamo
from torch._subclasses import fake_tensor
from torch.fx.experimental.symbolic_shapes import ShapeEnv
from torch._dynamo.output_graph import config
class Model(torch.nn.Module):
    def __init__(self) -> None:
        super().__init__()
        self.linear = torch.nn.Linear(2, 2)
        self.linear2 = torch.nn.Linear(2, 2)

    def forward(self, x):
        out = self.linear(x)
        out = self.linear2(out)
        return out

fake_mode = fake_tensor.FakeTensorMode(allow_non_fake_inputs=False,
                                       allow_fallback_kernels=True,
                                       shape_env=ShapeEnv(
                                            allow_scalar_outputs=config.capture_scalar_outputs,
                                            allow_dynamic_output_shape_ops=config.capture_dynamic_output_shape_ops,
                                            frame_id=0
                                        ),
)
# Fakefying input/model before calling torch._dynamo.export
with fake_mode:
    fake_x = torch.rand(5, 2, 2)
    model = Model()

# Calling torch._dynamo.export without active fake mode
graph_module, guards = torch._dynamo.export(
    model,
    fake_x,
    aten_graph=True,
    fake_mode=fake_mode
)
graph_module.print_readable()
graph_module.graph.print_tabular()
```

Summary of changes:

    * Plumb fake_mode through the torch.export API. When specified, it replaces the creation of a new FakeTensorMode at InstructionTranslator on behalf of OutputGraph.
    * Hack FakeTensor.__new__ to prevent a torch.Tensor._make_subclass call for inputs that are already fakefied by the user. This probably needs to be fixed in a nicer way. Any ideas?
    * Removed a few asserts that didn't want faked tensors coming from the user script.
    * Added torch._subclasses.fake_tensor.FakeTensor to the type list of a few assert checks, to allow fake inputs.

The changes above allowed symbolic tracing with both static and dynamic shapes.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100017
Approved by: https://github.com/ezyang
2023-06-15 21:28:10 +00:00
Mengwei Liu
96c23fe212 [dynamo][numpy] Add support for builtin functions (#103457)
In order to be able to run stuff like:
```
def f(x):
    a = x.numpy()
    return a + a
```
This PR adds a branch in `BuiltinVariable` to handle `NumpyNdarrayVariable` case.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103457
Approved by: https://github.com/ezyang
2023-06-15 09:18:45 +00:00
Animesh Jain
16c2090b2d [benchmark][compile] Limit number of bounding boxes to 5 (#103413)
Depends on https://github.com/pytorch/benchmark/pull/1729

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103413
Approved by: https://github.com/ezyang
2023-06-15 01:06:40 +00:00
Edward Z. Yang
ddf4cd69ec Delete ifdyn and ifunspec combinators (#103596)
Replaced with expect tests for ease of updating.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103596
Approved by: https://github.com/voznesenskym
2023-06-15 00:14:17 +00:00
Animesh Jain
bd0ed940b7 [activation checkpoint][dynamo] Wrap AC into Tag based higher order op (#102935)
These are the numbers with this PR

![image](https://github.com/pytorch/pytorch/assets/13822661/63e991d5-80e2-4e94-8e4b-243621c3990e)

There are 3 main followups
* A naive partitioner gives better memory footprint than min-cut partitioner here. Currently, we are using min-cut partitioner. Waiting for @Chillee  to discuss this further to either modify min-cut or add a naive partitioner.
* aot_eager is < 1x memory footprint. This is true even for non AC models. This could hide some inefficiency somewhere.
* inductor is giving very different memory numbers between AOT-traced-AC (duplicate early) vs this implementation. This leads to some inefficiency in inductor that we need to resolve.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/102935
Approved by: https://github.com/jansel
2023-06-14 20:15:43 +00:00
Edward Z. Yang
8b015c166c Don't test dynamic_shapes in tensor_always_has_static_shape (#103517)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103517
Approved by: https://github.com/anijain2305
2023-06-14 07:04:17 +00:00
Mengwei Liu
2eac8bd2b8 [dynamo][numpy] Support ndarray methods (#97537)
This PR adds universal support for ndarray methods. After #100839, each `NumpyNdarrayVariable` should wrap a `torch.Tensor`. This PR adds a `numpy_method_wrapper` which converts the `torch.Tensor` to a `torch_np.ndarray` and then calls the numpy ndarray method. We then also try to return a `torch.Tensor` (returning the value as-is if it is not ndarray-like).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/97537
Approved by: https://github.com/ezyang
2023-06-12 17:21:31 +00:00
Edward Z. Yang
12cd1dbba0 Handle recursive tuple in clone_inputs (#102979)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/102979
Approved by: https://github.com/wconstab
2023-06-05 22:11:48 +00:00
Michael Lazos
c46af25bb3 Initialize optimizer in dynamo to avoid graph break and tracing slowness (#102640)
On calls to `_init_group`, rather than tracing through it, we extract Python values from the arguments and call the initialization eagerly. This avoids having to trace this function, which is very slow with large parameters, and also avoids graph breaking on it. This is sound in this case because the state is only initialized once in the eager case. Guards on the state and params are generated explicitly rather than via tracing the initialization.
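A sketch of the kind of compiled optimizer step this targets (illustrative; the names below are not from this PR):

```python
import torch

model = torch.nn.Linear(10, 10)
opt = torch.optim.Adam(model.parameters())

@torch.compile(backend="eager")
def train_step(x):
    loss = model(x).sum()
    loss.backward()
    opt.step()        # _init_group handled eagerly on the first call, as described above
    opt.zero_grad()

train_step(torch.randn(2, 10))
```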

Caveats:
`_init_group` also gathers various state tensors into lists, via mutating list arguments, to pass to the functional optimizer implementation. These state tensors exist on the optimizer itself, but we don't know exactly how the gathering is done or which tensors correspond to which attributes of the optimizer module (each optimizer has different state). To rectify this, we keep weak references to all of the tensors collected in the lists in globals (similar to how parameter keys are stored for dictionaries). These references are guaranteed to stay alive as long as the optimizer object is alive, provided its internal state is not interfered with, and they are guarded with weakref guards.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/102640
Approved by: https://github.com/jansel
2023-06-03 15:49:51 +00:00
Mengwei Liu
c304fddf68 [dynamo][numpy] Support graph break for numpy ndarray (#100839)
Issue: #93684

In previous PRs #95849 and #99560 we redirected `numpy.*` and `<tensor>.numpy()` calls to `torch_np.*` methods and attributes, by creating `NumpyNdarrayVariable` for those calls.

We need to handle `NumpyNdarrayVariable` when graph break happens.

This PR did 2 things:
1. In `codegen.py` we made sure we can reconstruct the value wrapped by `NumpyNdarrayVariable` as a `torch_np.ndarray` on the stack whenever we recompile the subgraph.
2. In `builder.py` we can wrap the value to be `NumpyNdarrayVariable` and save it as graph input.

-----

Starting from commit 6:

## A new design for supporting numpy in dynamo

In short the core concept doesn't change: we still convert `numpy` API calls to `torch_np` API calls. However, instead of wrapping a `torch_np.ndarray` in `NumpyNdarrayVariable`, the new design wraps a `torch.Tensor`.

The reason for doing this change is because we need to keep `torch.Tensor` everywhere in the captured graph, so that it works well with the backend of dynamo. See discussions in https://github.com/Quansight-Labs/numpy_pytorch_interop/issues/142 for details.

### Flow
This is an example showing how we think about dynamo working on a simple function:
```python
def f(x: torch.Tensor, y: torch.Tensor):
    a, b = x.numpy(), y.numpy()
    c = np.add(a, b)
    return torch.from_numpy(c)
```
```

              +------------+             +------------+
 torch.Tensor |            |numpy.ndarray|            |
 -------------- .numpy()   --------------|            |
              |            |             |            |             +------------------+
              +------------+             | numpy.add  |numpy.ndarray|                  |torch.Tensor
              +------------+             |            --------------| torch.from_numpy --------------
 torch.Tensor |            |numpy.ndarray|            |             |                  |
 -------------- .numpy()   --------------|            |             +------------------+
              |            |             |            |
              +------------+             +------------+

              +------------+             +----------------+
 torch.Tensor |            |torch.Tensor |                |
 -------------- .detach()  --------------|                |
              |            |             |                |                +----------------+            +------------+
              +------------+             |                |torch_np.ndarray|                |torch.Tensor|            |torch.Tensor
                                         | torch_np.add   -----------------| util.to_tensor -------------| .detach()  --------------
              +------------+             |                |                |                |            |            |
 torch.Tensor |            |torch.Tensor |                |                +----------------+            +------------+
 -------------- .detach()  --------------|                |
              |            |             |                |
              +------------+         |   +----------------+                                   |
                                     |                       wrapper on torch_np.add          |
                                     +--------------------------------------------------------+
```

### Approach

`torch_np` APIs can take both `torch_np.ndarray` as well as `torch.Tensor`. What  we need to do is to have a wrapper for these APIs to convert the return value back to `torch.Tensor`. This way only the wrapper is showing up in the captured graph, with `torch.Tensor`s as input and `torch.Tensor` as output.

If we have a graph break or we've traced to the end of the program, we need to inspect all the `NumpyNdarrayVariable` in the stack and convert them back to `numpy.ndarray`, to make sure the compiled version is still behaving the same as the eager version.

### Examples
Here's an example of the graph generated:

```python
def fn(x: np.ndarray, y: np.ndarray):
    a = x.real
    b = y.real
    torch._dynamo.graph_break()
    return np.add(a, 1), np.add(b, 1)
```

Graph generated:

```
[2023-05-16 10:31:48,737] torch._dynamo.output_graph.__graph: [DEBUG] TRACED GRAPH
 __compiled_fn_0 <eval_with_key>.0 opcode         name            target                                                      args                    kwargs
-------------  --------------  ----------------------------------------------------------  ----------------------  --------
placeholder    l_x_            L_x_                                                        ()                      {}
placeholder    l_y_            L_y_                                                        ()                      {}
call_function  from_numpy      <built-in method from_numpy of type object at 0x12b1fdc80>  (l_x_,)                 {}
call_function  from_numpy_1    <built-in method from_numpy of type object at 0x12b1fdc80>  (l_y_,)                 {}
call_function  attr_wrapper    <function attr_wrapper at 0x12e8693a0>                      (from_numpy, 'real')    {}
call_function  attr_wrapper_1  <function attr_wrapper at 0x12e8693a0>                      (from_numpy_1, 'real')  {}
output         output          output                                                      ((),)                   {}

[2023-05-16 10:31:48,908] torch._dynamo.output_graph.__graph: [DEBUG] TRACED GRAPH
 __compiled_fn_2 <eval_with_key>.1 opcode         name           target                                                      args                             kwargs
-------------  -------------  ----------------------------------------------------------  -------------------------------  --------
placeholder    l_a_           L_a_                                                        ()                               {}
placeholder    l_b_           L_b_                                                        ()                               {}
call_function  from_numpy     <built-in method from_numpy of type object at 0x12b1fdc80>  (l_a_,)                          {}
call_function  from_numpy_1   <built-in method from_numpy of type object at 0x12b1fdc80>  (l_b_,)                          {}
call_function  wrapped_add    <Wrapped function <original add>>                           (from_numpy, 1)                  {}
call_function  wrapped_add_1  <Wrapped function <original add>>                           (from_numpy_1, 1)                {}
output         output         output                                                      ((wrapped_add, wrapped_add_1),)  {}

```
### Changes

* `codegen.py`: reconstruct `numpy.ndarray` from `NumpyNdarrayVariable` by adding bytecode to call `utils.to_numpy_helper()`.
*  `output_graph.py`: getting rid of legacy code that does exactly what `codegen.py` does, but which only handled the return case and not the graph break case.
*  `utils.py`: added helpers to convert `numpy.ndarray` to `torch.Tensor` and vice versa. Also added a wrapper class that takes in a function; in `__call__` it calls the function and converts its output to `torch.Tensor` (or a list of them).
* `builder.py`: add method to wrap `numpy.ndarray` graph inputs into `NumpyNdarrayVariable`, by calling `torch.numpy` in the proxy.
* `misc.py`: `numpy` API calls goes into `NumpyVariable` and we find the function with the same name in `torch_np` module, then wrap it with the wrapper defined in `utils.py`.
* `tensor.py`, `torch.py`: proxy `tensor.numpy()` to be `torch.detach()` but wrap it with `NumpyNdarrayVariable`. Similarly, `torch.from_numpy()` -> `torch.detach()` but wrap it with `TensorVariable`. In `NumpyNdarrayVariable`, do the similar `torch_np.ndarray` to `torch.Tensor` wrapping for attributes.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100839
Approved by: https://github.com/ezyang
2023-06-03 00:54:25 +00:00
Edward Z. Yang
90b1b17c9f Fix string concatenation with non-string (#102728)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/102728
Approved by: https://github.com/Skylion007
2023-06-01 20:02:03 +00:00
Animesh Jain
2fa1b563da [dynamo] Activation checkpoint higher order ops - Reland 101028 (#101790)
https://github.com/pytorch/pytorch/pull/101028 was reverted due to internal breakage. Relanding.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/101790
Approved by: https://github.com/zou3519
2023-05-18 19:09:14 +00:00
Yanbo Liang
7052fb37bd [Dynamo] Improve handling UnspecializedNNModuleVariable side effect (#101141)
Fixes #101102

Pull Request resolved: https://github.com/pytorch/pytorch/pull/101141
Approved by: https://github.com/jansel
2023-05-16 03:57:13 +00:00
PyTorch MergeBot
d0db7d624d Revert "[dynamo] Activation checkpointing as higher order op (#101028)"
This reverts commit de15e740a1.

Reverted https://github.com/pytorch/pytorch/pull/101028 on behalf of https://github.com/jeanschmidt due to breaking internal builds ([comment](https://github.com/pytorch/pytorch/pull/101028#issuecomment-1548280970))
2023-05-15 17:47:08 +00:00
Michael Lazos
d75f93603a Flatten exceptions in dynamo (#100779)
Fixes https://github.com/pytorch/pytorch/issues/93571

[before and after](https://gist.github.com/mlazos/256b0e8f0f98495752a22b960e9f4fcb)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100779
Approved by: https://github.com/ezyang
2023-05-13 00:58:57 +00:00
Animesh Jain
de15e740a1 [dynamo] Activation checkpointing as higher order op (#101028)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/101028
Approved by: https://github.com/voznesenskym, https://github.com/zou3519
2023-05-12 03:17:41 +00:00
Jerry Zhang
c3f3cb5b0f [quant][pt2e] Support conv bn fusion in convert step for QAT flow (#100442)
Summary:
This PR adds support for folding bn weights into conv for the QAT flow; this is equivalent to the QAT branch of `from_float` in the eager mode quantized conv module: https://github.com/pytorch/pytorch/blob/main/torch/ao/nn/quantized/modules/conv.py#L223

Items that need follow-up:
* there are some workarounds I did because quantize_per_tensor uses float/int args and dynamo does not support these args; this needs to be fixed after we change the quantized model representation and also change these args to Tensor

Test Plan: buck2 test @//mode/opt //caffe2/test:quantization_pt2e -- --exact 'caffe2/test:quantization_pt2e - test_convert_qat_conv_bn_fusion (quantization.pt2e.test_quantize_pt2e.TestQuantizePT2E)'

Reviewed By: andrewor14

Differential Revision: D45344281

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100442
Approved by: https://github.com/kimishpatel
2023-05-09 19:43:51 +00:00
Bin Bao
86ddfc7f68 [inductor] Move cpp wrapper trigger logic to inner_compile (#100611)
Summary: This enables cpp wrapper for backward as well.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100611
Approved by: https://github.com/jansel
2023-05-08 15:24:02 +00:00
Animesh Jain
3f025c607c summarize graph breaks (#100696)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/100696
Approved by: https://github.com/yanboliang
2023-05-05 22:27:47 +00:00
Edward Z. Yang
ce1ad1c143 Add load_storage (#100519)
This adds a new operator debugprims::load_storage which does the unusual thing of loading a tensor from disk (via ContentStoreReader). This will be used in a later PR to implement delta debugging in the minifier, even when the repro is too big to fit into memory. The way it works is that you specify a name of the tensor you want to load, as well as enough metadata to reconstruct the tensor, if the store isn't available. If there is an active content store, we read and return the tensor from that store; otherwise we use `rand_strided` to create it.

I needed some infra improvements to do this:

* `custom_op` now supports factory functions. Factory functions have to be registered specially via `impl_factory`
* I modified `clone_input` to also support dtype conversion, which I use to change the dtype of a loaded tensor if necessary.
* ContentStore needs to work with a device argument, so we torch.load directly to the correct device. This is for fake tensor support.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100519
Approved by: https://github.com/zou3519, https://github.com/anijain2305
2023-05-05 05:25:03 +00:00
Animesh Jain
8994d9e610 [dynamo] Hide guard_fail_hook behind a flag to improve cache lookup time (+10% DebertaV2) (#100590)
For TorchDynamo eager backend, DebertaV2 speedup improves from 0.77x to 0.87x.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100590
Approved by: https://github.com/voznesenskym, https://github.com/wconstab
2023-05-04 18:52:21 +00:00
Edward Z. Yang
c7e9f40653 Misc accuracy improvements on minifier (#100447)
The changes:

* Add config knob `same_two_models_use_fp64` for toggling whether or not to use fp64
* Add a test showing that RMSE is superior to atol/rtol
* Add a `--strict-accuracy` option, which allows testing against integral/boolean accuracy; by default only regular accuracy is checked now. There's a test which exercises this; it's a little delicate, but I had trouble thinking of a better test.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100447
Approved by: https://github.com/voznesenskym
2023-05-04 02:51:26 +00:00
kshitij12345
8b64dee5d2 [fix] torch_compile_debug don't log with 0 (#100462)
Fixes https://github.com/pytorch/pytorch/issues/99906

Tested locally.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100462
Approved by: https://github.com/mlazos
2023-05-03 08:23:09 +00:00
Richard Zou
984a2397ba Refactor OutputGraph (#99987)
This PR splits OutputGraph into two classes:
- SubgraphTracer (handles FX-tracing)
- OutputGraph (handles Dynamo-specific output graph logic, like
tracking graph inputs, compiling the graph, and executing it).

The motivation behind this is in the next PR up in the stack.
TL;DR is: in order to do higher-order operators, we need nested
SubgraphTracer, one for each level of nesting of the higher-order
operators.

I'm happy to flatten the stack into a single PR, but this separate made
it easier for me to test. Lmk if you want the stack flattened.

Test Plan:
- existing tests

Pull Request resolved: https://github.com/pytorch/pytorch/pull/99987
Approved by: https://github.com/anijain2305, https://github.com/voznesenskym
2023-05-02 17:11:02 +00:00
Michael Voznesensky
aafc6ce8cc Produce constant variables in cases where a SymNode is created with a constant (#100144)
` AOT_DYNAMIC_SHAPES=1 TORCHDYNAMO_DYNAMIC_SHAPES=1  benchmarks/dynamo/huggingface.py --performance  --training --amp --backend eager --disable-cudagraphs --device cuda --only AllenaiLongformerBase --explain`

Looks promising!

Goes from:

Dynamo produced 173 graphs covering 2760 ops with 160 graph breaks (14 unique)

To:

Dynamo produced 6 graphs covering 2298 ops with 15 graph breaks (7 unique)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100144
Approved by: https://github.com/ezyang
2023-05-01 21:32:11 +00:00
Edward Z. Yang
2d8deffc1e Refactor repro/minifier into CLI; add analyze (#100226)
This is a two part PR; I can split it if you really want me to.

The first part is a refactor of the after aot repro/minifier scripts to come with a command line interface. I maintain exact BC with the previous interface (so, e.g., you still get a repro.py and a run_minifier.py that do the same thing as before), but each of these scripts also take command line arguments now which you can use to customize what actually happens. Check `run_repro` for full documentation on the arguments.

The second part of this is an implementation of `analyze` subcommand on the new CLI for any repro.

<img width="1277" alt="image" src="https://user-images.githubusercontent.com/13564/235045677-8545aab7-5e83-4813-bbec-47783dc60122.png">

This facility is oriented towards accuracy debugging. It does several things:

1. It will run your model twice and check for nondeterminism in inductor/float64, *even* on intermediate inputs (our benchmarking nondeterminism test only checks for nondeterminism on the final output). This makes localizing which operator is nondeterministic easy.
2. It will run your compiled model side-by-side with eager and float64 variants, and then report when things diverge too far from RMSE delta from float64.

Importantly, it does all this without requiring every intermediate to be held in memory (which will cause an OOM on large repros, such as the one I tested this on.)

Some other minor improvements:

* MinifierTestBase now has an easy to comment out spot that you can use to retain the temporary directory; good for debugging
* We print "running minifier" and "running repro" in MinifierTestBase to make it easier to orient where logs are coming from
* same takes a `log_error` optional argument which you can use to reroute the error logs when things mismatch
* counters["inductor"]["intermediate_hooks"] tracks the number of intermediate hooks we've codegen'ed; good for populate the tqdm interface
* torch.fx.interpreter gets an official `boxed_run` interface which uses the boxed arguments calling convention and doesn't retain inputs unnecessarily long
* torch.utils._content_store gets compute_tensor_metadata/read_tensor_metadata helper functions for computing tensor information without serializing it

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100226
Approved by: https://github.com/bertmaher, https://github.com/bdhirsh, https://github.com/anijain2305
2023-05-01 11:12:38 +00:00
PyTorch MergeBot
89c43f4108 Revert "Produce constant variables in cases where a SymNode is created with a constant (#100144)"
This reverts commit d7bdfd3454.

Reverted https://github.com/pytorch/pytorch/pull/100144 on behalf of https://github.com/ezyang due to ci failure is real ([comment](https://github.com/pytorch/pytorch/pull/100144#issuecomment-1529587039))
2023-05-01 11:10:48 +00:00
Michael Voznesensky
d7bdfd3454 Produce constant variables in cases where a SymNode is created with a constant (#100144)
` AOT_DYNAMIC_SHAPES=1 TORCHDYNAMO_DYNAMIC_SHAPES=1  benchmarks/dynamo/huggingface.py --performance  --training --amp --backend eager --disable-cudagraphs --device cuda --only AllenaiLongformerBase --explain`

Looks promising!

Goes from:

Dynamo produced 173 graphs covering 2760 ops with 160 graph breaks (14 unique)

To:

Dynamo produced 6 graphs covering 2298 ops with 15 graph breaks (7 unique)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100144
Approved by: https://github.com/ezyang
2023-04-30 17:13:57 +00:00
Animesh Jain
03806eddbf [dynamo] Compile torchvision augmentations (#100292)
Resolves https://github.com/pytorch/pytorch/issues/100112

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100292
Approved by: https://github.com/jansel
2023-04-29 02:59:41 +00:00
Larry Liu
f5853342ea [dynamo][numpy] Handle return value being numpy ndarray (#99560)
On top of #95849, this PR tries to handle the special case of returning numpy values.

Consider the following example:

```
def f(x: torch.Tensor) -> np.ndarray:
	a = x.numpy()
	return a.T
```
With the previous PR this would error out, because we translate `a.T` to be a method call on `torch_np.ndarray.T`, which is also a `torch_np.ndarray`.

This PR handles this case, by conditionally converting a `torch_np.ndarray` to `np.ndarray` before returning, to match the original behavior.

The compiled version will be:

```
def f(x):
    ___tmp_0 = __compiled_fn_0(x)
    if isinstance(___tmp_0, torch_np.ndarray):
        return ___tmp_0.tensor.numpy()
    else:
        return ___tmp_0
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/99560
Approved by: https://github.com/jansel, https://github.com/yanboliang
2023-04-27 16:18:35 +00:00
Larry Liu
687afeb686 [dynamo][numpy] Add NumpyTensorVariable to translate ndarray attribute calls to tensor attributes (#95849)
Issue: #93684

# Problem

Reduce graph breaks when dynamo compiles python functions containing numpy functions and ndarray operations.

# Design (as I know it)

* Use torch_np.ndarray (a wrapper of a tensor) to back a `VariableTracker`: `NumpyTensorVariable`.
* Translate all attributes and methods calls, on ndarray, to torch_np.ndarray equivalent.

This PR adds `NumpyTensorVariable` and supports:
1.  tensor to ndarray, ndarray to tensor
2. numpy functions such as numpy.meshgrid()
3. ndarray attributes such as `itemsize`, `stride`

Next PR will handle returning `np.ndarray` and add support for ndarray methods
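An illustrative function exercising the three supported pieces above (a sketch, not a test from this PR):

```python
import numpy as np
import torch

def fn(x: torch.Tensor):
    a = x.numpy()                                   # tensor -> ndarray
    grid = np.meshgrid(a, a)                        # numpy function
    return a.itemsize, torch.from_numpy(grid[0])    # ndarray attribute, ndarray -> tensor

torch.compile(fn, backend="eager")(torch.arange(3.0))
```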
Pull Request resolved: https://github.com/pytorch/pytorch/pull/95849
Approved by: https://github.com/ezyang
2023-04-27 16:18:35 +00:00
Animesh Jain
3dcc7b396c [easy] iterate dict with sorted keys for accuracy checking (#99793)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99793
Approved by: https://github.com/jansel
2023-04-24 21:26:35 +00:00
Edward Z. Yang
f602b3a6ae Preserve mark_dynamic when cloning inputs (#99617)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/99617
Approved by: https://github.com/ngimel, https://github.com/voznesenskym, https://github.com/anijain2305
2023-04-22 19:46:31 +00:00
Michael Voznesensky
0ac0d9d224 Pass locals to enum_repr to correctly make the guard str for enums (#99680)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99680
Approved by: https://github.com/jansel
2023-04-21 07:14:49 +00:00
Yanbo Liang
05809c7d3b [Dynamo] No graph break for explicit calling Conv{1/2/3}d.forward & ConvTranspose{1/2/3}d.forward (#99015)
Before this PR, if users call ```Conv2d(x)```, dynamo handles it well (no graph break) and puts a ```call_module``` op in the FX graph. However, if users explicitly call ```Conv2d.forward(x)``` in another ```forward``` function, the inlining would fail (causing a graph break). This PR fixes the issue by translating the explicit ```Conv2d.forward(x)``` into ```Conv2d(x)```.
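A sketch of the pattern this fixes (illustrative module, not from the PR):

```python
import torch
from torch import nn

class M(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 8, kernel_size=3)

    def forward(self, x):
        # explicit .forward() call: previously a graph break, now treated like self.conv(x)
        return self.conv.forward(x)

torch.compile(M(), backend="eager")(torch.randn(1, 3, 16, 16))
```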

Pull Request resolved: https://github.com/pytorch/pytorch/pull/99015
Approved by: https://github.com/jansel, https://github.com/wconstab
2023-04-15 08:04:13 +00:00
Michael Voznesensky
10fbdcf72c Re-PR of 90269 - Force all nn_module associated tensors to be static (#99108)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99108
Approved by: https://github.com/ezyang
2023-04-14 05:53:48 +00:00
Angela Yi
1d077f28ed [export] Constraints API (#98433)
Wrapper for users to insert constraints into model code.

The constraints will not be maintained in the graph after tracing through make_fx, so retracing with dynamo/make_fx will not work. This will be supported after torch._assert support is implemented; then we can convert the constrain_range calls to torch._asserts.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/98433
Approved by: https://github.com/avikchaudhuri, https://github.com/tugsbayasgalan
2023-04-13 21:20:10 +00:00
PyTorch MergeBot
ab761605ae Revert "[export] Constraints API (#98433)"
This reverts commit 1510eb4072.

Reverted https://github.com/pytorch/pytorch/pull/98433 on behalf of https://github.com/izaitsevfb due to Breaks internal tests, asked by author to revert
2023-04-12 23:37:19 +00:00
PyTorch MergeBot
629377ea8b Revert "Replace _dynamo.config with an object instead of module (#96455)"
This reverts commit 420104a886.

Reverted https://github.com/pytorch/pytorch/pull/96455 on behalf of https://github.com/jansel due to BC breaking, was landed prematurely
2023-04-12 15:06:14 +00:00
Angela Yi
1510eb4072 [export] Constraints API (#98433)
Wrapper for users to insert constraints into model code.

The constraints will not be maintained in the graph after tracing through make_fx, so retracing with dynamo/make_fx will not work. This will be supported after torch._assert support is implemented; then we can convert the constrain_range calls to torch._asserts.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/98433
Approved by: https://github.com/avikchaudhuri, https://github.com/tugsbayasgalan
2023-04-12 01:32:44 +00:00
Han Qi
420104a886 Replace _dynamo.config with an object instead of module (#96455)
Summary:
    Replace _dynamo.config with an object instead of module

    Current usage patterns of setting and reading fields on config will work
    unchanged.

    Only changes needed going forward:
    1. import torch._dynamo.config will not work. However, just doing
       import torch._dynamo is sufficient to access dynamo config
       as torch._dynamo.config.

    2. Files inside the _dynamo folder need to access config via
       from torch._dynamo.config_util import config instead of
       from torch._dynamo import config, because _dynamo/__init__.py
       imports some of those files, which would create a circular import.

Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/96455
Approved by: https://github.com/williamwen42
2023-04-11 21:23:32 +00:00
Edward Z. Yang
b8b840be3d Convert logging f-strings to use % format, part five (#98765)
This does some annoying but simple cases by hand.
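The pattern being applied, as a sketch:

```python
import logging

log = logging.getLogger(__name__)
name, elapsed = "fn", 0.12   # illustrative values

# before: log.debug(f"compiled {name} in {elapsed:.2f}s")  -- formats even when DEBUG is disabled
log.debug("compiled %s in %.2fs", name, elapsed)            # after: formatting deferred to the handler
```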

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/98765
Approved by: https://github.com/wanchaol
2023-04-11 13:17:59 +00:00
Edward Z. Yang
822464567f Lazily format graphs for debug printing (#98776)
The current code unconditionally formats the graphs, which is a
waste of CPU if no one looks at them.
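A minimal sketch of deferring the formatting until a handler actually renders the record (the PR's exact mechanism may differ):

```python
import logging

log = logging.getLogger(__name__)

class LazyString:
    """Runs the expensive formatter only when the logging machinery calls __str__."""
    def __init__(self, fn):
        self.fn = fn
    def __str__(self):
        return self.fn()

def format_graph():               # stand-in for the expensive graph pretty-printer
    return "...large graph dump..."

log.debug("%s", LazyString(format_graph))   # format_graph() never runs unless DEBUG is enabled
```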

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/98776
Approved by: https://github.com/albanD, https://github.com/mlazos
2023-04-10 22:41:33 +00:00
Edward Z. Yang
b09722f540 Convert logging f-strings to use % format, part two (#98700)
This hits multi-line logging strings

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/98700
Approved by: https://github.com/voznesenskym
2023-04-10 12:19:31 +00:00
Edward Z. Yang
9a8f71f23e Convert logging f-strings to use % format (#98697)
Codemod done with
https://gist.github.com/ezyang/2e8b0463cdc6be278478495b23ff0530 with
assistance from ChatGPT.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/98697
Approved by: https://github.com/voznesenskym
2023-04-10 12:19:31 +00:00
YJ Shi
5ceae85f1c [Dynamo] Include UserDict in clone_inputs (#97725)
Fixes #97724

Pull Request resolved: https://github.com/pytorch/pytorch/pull/97725
Approved by: https://github.com/yanboliang
2023-04-08 00:19:35 +00:00
Horace He
c75dd7c413 grab bag of changes (#98572)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98572
Approved by: https://github.com/shunting314, https://github.com/mlazos
2023-04-07 20:02:59 +00:00
Will Constable
390c51bf87 Skip nnmodule hook guards by default (#98371)
This PR makes basic nnmodule forward hooks work by default, without any overhead. But it leaves silent correctness issues if users modify/remove their hooks later, so it also emits a warning.

- the usual case is to not use hooks, so avoid guard overhead here
- registering any hook before compile will trigger a warning about hook support
- registering a hook later (or removing one) requires user knowledge and opting in,
  currently this isn't warnable (but maybe we can observe compiled nnmodules to make it
  warnable).

Why skip hook guards by default instead of not tracing __call__/hooks by default?
- avoid having a mode flag that alters dynamo tracing behavior (harder to test both codepaths
  in CI with full coverage)
- the most basic hook usecase (registering a hook before compile, and never removing it)
  will work by default with this PR, while it would require enablement and incur overhead
  in the 'not tracing __call__' proposal.
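The basic usecase that now works by default, as a sketch:

```python
import torch

def scale_output(mod, inputs, output):
    return output * 2

m = torch.nn.Linear(2, 2)
m.register_forward_hook(scale_output)   # registered before compile: works, with the warning noted above
opt = torch.compile(m, backend="eager")
opt(torch.randn(1, 2))
# removing or swapping the hook after this point is the unguarded case described above
```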

Pull Request resolved: https://github.com/pytorch/pytorch/pull/98371
Approved by: https://github.com/jansel
2023-04-07 15:10:51 +00:00
Edward Z. Yang
d01ee10b25 Add detect_fake_mode (#98321)
This replaces fake_mode_from_tensors, but it preferentially looks for a
fake_mode in TracingContext, then for an active fake mode on the
dispatch stack, before groveling in tensors to find it.

This advances PegasusForCausalLM, which was previously failing because
we generated a graph that had a parameter (non-fake) and a SymInt,
and thus previously we failed to detect the correct fake mode.
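A hedged sketch of the helper (the import path below is an assumption, not stated in the commit):

```python
import torch
from torch._subclasses.fake_tensor import FakeTensorMode
from torch._guards import detect_fake_mode   # assumed location of the new helper

with FakeTensorMode() as fake_mode:
    x = torch.randn(2, 2)                     # a fake tensor carrying its FakeTensorMode

assert detect_fake_mode([x]) is fake_mode
```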

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/98321
Approved by: https://github.com/voznesenskym
2023-04-05 22:15:16 +00:00
Yanbo Liang
b1c2925493 [Dynamo] Support typing.Union and typing.Optional (#98384)
Fixes #98265
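An illustrative function using the now-supported annotations (a sketch; the exact previously failing pattern is in the linked issue):

```python
from typing import Optional
import torch

def f(x: torch.Tensor, y: Optional[torch.Tensor] = None):
    return x + 1 if y is None else x + y

torch.compile(f, backend="eager")(torch.randn(3))
```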

Pull Request resolved: https://github.com/pytorch/pytorch/pull/98384
Approved by: https://github.com/ezyang
2023-04-05 21:31:52 +00:00
Michael Voznesensky
b1e60bfb6a Pass f_locals as a dict rather than kwargs (#98107)
Fixes https://github.com/pytorch/pytorch/issues/97688

One big problem is that instead of printing x < y we now print
`E["x"] < E["y"]` and now all of the tests wobbled and I'm mad.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/98107
Approved by: https://github.com/ezyang
2023-04-04 00:30:08 +00:00
Yanbo Liang
a6bd21d935 [Dynamo] Eagerly initializing Lazy Module to reduce graph breaks (#97946)
Fixes Meta internal user case.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/97946
Approved by: https://github.com/wconstab
2023-04-03 22:24:43 +00:00
Jason Ansel
35b3309539 Fix graph break from inline patched init (#98150)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98150
Approved by: https://github.com/anijain2305, https://github.com/yanboliang
2023-04-03 01:11:30 +00:00
Michael Lazos
ee9a9b7add Remove old logging callsites (#98095)
Get around GH first issue, OSS only changes for https://github.com/pytorch/pytorch/pull/97182

Pull Request resolved: https://github.com/pytorch/pytorch/pull/98095
Approved by: https://github.com/anijain2305
2023-04-01 00:57:37 +00:00
William Wen
14ef91cea6 [dynamo 3.11] small bug fixes (#96508)
Bugs fixed:
	- CALL_FUNCTION_EX expects null pop in symbolic_convert
	- make_function_with_closure codegen requires a push_null
	- copy over the closure in eval_frame.c
	- add JUMP_FORWARD to terminal opcodes
	- enum repr fix in utils.py
	- fix symbolic_convert's break_graph_if_unsupported wrapper

Pull Request resolved: https://github.com/pytorch/pytorch/pull/96508
Approved by: https://github.com/jansel
2023-03-31 18:18:12 +00:00
David Berard
c218309f88 [dynamo] profiler.record_function on all dynamo_timed functions (#96495)
**Summary**: profiler.record_function inserts an event into the chrome trace generated by the pytorch profiler. This PR adds record_function everywhere that @dynamo_timed is annotated.

dynamo_timed and the CLI viewer torch._dynamo.utils.compile_times() are already useful on their own; but for identifying _when_ these get called, it's nice to be able to view in the profiler chrome trace.

Why not just turn on python stack traces in the profiler to get this information? Dynamo compilation is implemented in python and therefore produces a huge amount of events when it records compilation steps. The resulting trace files are often too large to load in chrome://tracing, and they take a long time to generate. Additionally, the stack traces are deep enough that they are often hard to read. This approach produces much more readable traces with lower overhead.

**Tests**:
- Added in test/dynamo/test_profiler.py. Verified in https://github.com/pytorch/pytorch/actions/runs/4559322864/jobs/8043307798?pr=96495 that the tests are actually running.
- Performance run with `ciflow/inductor-perf-compare` shows no noticeable change in compilation time or speedup numbers. Geomean speedup changes from 1.275 -> 1.277. Geomean compilation times change from 54.2s -> 53.8s. That's likely just due to noise. All individual benchmark numbers regressed by no more than 5% between the two runs; and we see improvements of around the same magnitude, suggesting this is, again, just noise. For meta employees, you can see the results in a google sheets here: https://docs.google.com/spreadsheets/d/1Ki69XvcgxcA3ZnqC5n_jav5KiD4u7Wojlad3VTnIdlk/edit?usp=sharing

**Example**:

Run this:

```python
import torch

def gn(x):
    return x.sin().cos()

def fn(x, y):
    return x.sin() * y.cos()

x, y = [torch.rand((2, 2), device='cuda') for _ in range(2)]

# just to clear out any lazy initialization
with torch.profiler.profile() as prof:
    torch.compile(gn)(x)

with torch.profiler.profile() as prof:
    torch.compile(fn)(x, y)

prof.export_chrome_trace("./dynamo_timed_profile.json")
```

and we can see that the resulting trace shows important dynamo steps, even when python tracing is turned off.

<img width="867" alt="Screenshot 2023-03-29 at 7 26 15 PM" src="https://user-images.githubusercontent.com/5067123/228712263-8ae67ab9-1a52-4765-a9c2-7c5cf0abe2f5.png">

Pull Request resolved: https://github.com/pytorch/pytorch/pull/96495
Approved by: https://github.com/ngimel, https://github.com/mlazos
2023-03-30 21:49:02 +00:00
Edward Z. Yang
fb7f983357 Graph break on operators that fake tensor doesn't support (#97708)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/97708
Approved by: https://github.com/eellison
2023-03-28 19:49:54 +00:00
vfdev
0f424f7f05 Fixed broken link to troubleshooting.html docs page (#97330)
Seen first in error message:
```
[2023-03-22 10:30:39,786] torch._dynamo.convert_frame: [WARNING] torch._dynamo hit config.cache_size_limit (64)
   function: '<resume in paste_mask_in_image>' (/vision/torchvision/models/detection/roi_heads.py:407)
   reasons:  w == 857
to diagnose recompilation issues, see https://pytorch.org/docs/master/dynamo/troubleshooting.html.
[2023-03-22 10:30:40,036] torch._dynamo.convert_frame: [WARNING] torch._dynamo hit config.cache_size_limit (64)
   function: '<resume in paste_mask_in_image>' (/vision/torchvision/models/detection/roi_heads.py:406)
   reasons:  ___stack0 == 207
to diagnose recompilation issues, see https://pytorch.org/docs/master/dynamo/troubleshooting.html.
```

Broken link:
- https://pytorch.org/docs/master/dynamo/troubleshooting.html.

Good link:
- https://pytorch.org/docs/master/compile/troubleshooting.html

Pull Request resolved: https://github.com/pytorch/pytorch/pull/97330
Approved by: https://github.com/zou3519
2023-03-22 16:40:21 +00:00
Will Constable
141a2ebcf1 Clean up Compilation Profiler (#97029)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/97029
Approved by: https://github.com/voznesenskym
2023-03-21 06:24:22 +00:00
Michael Voznesensky
722c4e59a4 Replace source check with assert (#95640)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/95640
Approved by: https://github.com/ezyang
2023-03-19 21:51:59 +00:00
Michael Lazos
a1c46e5f8f component-level configurable logging for dynamo, inductor, aot (#94858)
Summary:

Adds NNC-like logging that is configured through the env var `TORCH_LOGS`
Examples:
`TORCH_LOGS="dynamo,guards" python script.py` - prints dynamo logs at level INFO with guards of all functions that are compiled

`TORCH_LOGS="+dynamo,guards,graph" python script.py` - prints dynamo logs at level DEBUG with guards and graphs (in tabular) format of all graphs that are compiled

[More examples with full output](https://gist.github.com/mlazos/b17f474457308ce15e88c91721ac1cce)

Implementation:
The implementation parses the log settings from the environment, finds any components (aot, dynamo, inductor) or other loggable objects (guards, graph, etc.) and generates a log_state object. This object contains all of the enabled artifacts, and a qualified log name -> level mapping. _init_logs then adds handlers to the highest level logs (the registered logs), and sets any artifact loggers to level DEBUG if the artifact is enabled.

Note: set_logs is an alternative for manipulating the log_state, but if the environment contains TORCH_LOGS, the environment settings will be prioritized.

Adding a new log:
To add a new log, a dev should add their log name to torch._logging._registrations (there are examples there already).

Adding a new artifact:
To add a new artifact, a dev should add their artifact name to torch._logging._registrations as well.
Additionally, wherever the artifact is logged, `torch._logging.getArtifactLogger(__name__, <artifact_name>)` should be used instead of the standard logging implementation.
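
A short, hedged sketch of how these pieces fit together; the artifact name "graph_code" and the exact keyword set of `set_logs` are illustrative and may differ between versions.

```python
import logging
import torch._logging

# Programmatic alternative to the TORCH_LOGS environment variable
# (overridden if TORCH_LOGS is set in the environment, per the note above).
torch._logging.set_logs(dynamo=logging.DEBUG, guards=True, graph=True)

# Inside a component, an artifact logger is obtained like this;
# "graph_code" is used here as an illustrative artifact name.
graph_code_log = torch._logging.getArtifactLogger(__name__, "graph_code")
graph_code_log.debug("only emitted when the artifact is enabled")
```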

[design doc](https://docs.google.com/document/d/1ZRfTWKa8eaPq1AxaiHrq4ASTPouzzlPiuquSBEJYwS8/edit#)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94858
Approved by: https://github.com/ezyang
2023-03-18 04:17:31 +00:00
Edward Z. Yang
384d3ec2b6 Extra CR comments from #95621 (#96043)
Specifically:
063e441471 (r1120306196)
https://github.com/pytorch/pytorch/pull/95621#discussion_r1125015510

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/96043
Approved by: https://github.com/Chillee, https://github.com/albanD
2023-03-10 01:10:48 +00:00
Horace He
5bbec680d7 Fix usages of contextmanager without finally (#96170)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/96170
Approved by: https://github.com/ngimel, https://github.com/malfet
2023-03-08 20:59:27 +00:00
Edward Z. Yang
d303665d33 Make int unspecialization actually work (#95621)
OK, so this PR used to be about reducing the number of constants we specialize on, but it turns out that unspecialization was ~essentially never used (because we still constant specialized way too aggressively) and I ended up having to fix a bunch of issues to actually get tests to pass. So this PR is now "make int unspecialization actually work". As part of this, I have to turn off unspecialization by default, as there are still latent bugs in inductor.

The general strategy is that an unspecialized int is represented as a SymInt. Representing it as a 0d tensor (which is what the code used to do) is untenable: (1) we often need unspecialized ints to participate in size computations, but we have no way of propagating sympy expressions through tensor compute, and (2) a lot of APIs work when passed SymInt, but not when passed a Tensor. However, I continue to represent Numpy scalars as Tensors, as they are rarely used for size computation and they have an explicit dtype, so they are more accurately modeled as 0d tensors.

* I folded in the changes from https://github.com/pytorch/pytorch/pull/95099 as I cannot represent unspecialized ints as SymInts without also turning on dynamic shapes. This also eliminates the necessity for test_unspec.py, as toggling specialization without dynamic shapes doesn't do anything. As dynamic shapes defaults to unspecializing, I just deleted this entirely; for the specialization case, I rely on regular static shape tests to catch it. (Hypothetically, we could also rerun all the tests with dynamic shapes, but WITH int/float specialization, but this seems... not that useful? I mean, I guess export wants it, but I'd kind of like our Source heuristic to improve enough that export doesn't have to toggle this either.)
* Only 0/1 integers get specialized by default now
* A hodgepodge of fixes. I'll comment on the PR about them.
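
A hedged illustration of the 0/1-specialization behavior described above; the exact recompile behavior depends on config defaults and version, and `dynamic=True` is used here only to opt into dynamic shapes.

```python
import torch

@torch.compile(dynamic=True, backend="eager")
def scale(x, k: int):
    return x * k

x = torch.randn(4)
scale(x, 2)   # may compile with k modeled as a SymInt
scale(x, 3)   # ideally reuses the same graph instead of recompiling
scale(x, 1)   # 0 and 1 remain specialized by default, so this may recompile
```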

Fixes https://github.com/pytorch/pytorch/issues/95469

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/95621
Approved by: https://github.com/jansel, https://github.com/Chillee
2023-03-04 01:22:08 +00:00
Michael Voznesensky
34a7c79eac Rename func (#95639)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/95639
Approved by: https://github.com/ezyang
2023-03-01 23:03:09 +00:00
Edward Z. Yang
835122c89f Add missing f-string specifiers (#95707)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/95707
Approved by: https://github.com/Skylion007, https://github.com/albanD
2023-02-28 20:20:05 +00:00
Kazuaki Ishizaki
46385b3e48 Fix typos under torch/_dynamo directory (#95599)
This PR fixes typos in comments and messages of `.py` files under `torch/_dynamo` directory

Pull Request resolved: https://github.com/pytorch/pytorch/pull/95599
Approved by: https://github.com/ezyang
2023-02-28 03:44:24 +00:00
Michael Voznesensky
eff5ae8746 Better mark_dynamic assertions (#95566)
This PR allows us to reuse the static per tensor decision making we make at fake tensorification time. We can use this to avoid setting up dynamic dim guards later if the tensor was never a candidate.
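
For context, a minimal usage sketch of `mark_dynamic` (illustrative only; the assertions this PR improves fire when the marked dim cannot actually be treated as dynamic):

```python
import torch
import torch._dynamo

x = torch.randn(8, 3)
torch._dynamo.mark_dynamic(x, 0)   # ask dynamo to treat dim 0 as dynamic

@torch._dynamo.optimize("eager")
def fn(t):
    return t.sin()

fn(x)
```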

Pull Request resolved: https://github.com/pytorch/pytorch/pull/95566
Approved by: https://github.com/ezyang
2023-02-28 00:02:22 +00:00
David Berard
a4085ab837 [dynamo] support custom __getattr__ on torch.nn.Modules (#94658)
**Summary**: torch.nn.Module subclasses with custom implementations of `__getattr__` were previously unsupported; if a subclass implemented `__getattr__` and we tried to access an attribute that was only reachable through it, dynamo would not check `__getattr__` and would error out with an AttributeError. This PR copies the functionality from UserDefinedObjectVariable into the torch.nn.Module handling so that it also supports `__getattr__`.

Example of a module which previously would fail:

```python
import torch

class MyMod(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.custom_dict = {"queue": [torch.rand((2, 2)) for _ in range(3)]}
        self.other_attr = torch.rand((2, 2))

    def __getattr__(self, name):
        custom_dict = self.custom_dict
        if name in custom_dict:
            return custom_dict[name]
        return super().__getattr__(name)

    def forward(self, x):
        return x @ self.other_attr + self.queue[-1]
```
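
And a hedged usage sketch, continuing from the class above (backend choice is illustrative):

```python
import torch
import torch._dynamo

mod = MyMod()
opt_mod = torch._dynamo.optimize("eager")(mod)
out = opt_mod(torch.rand(2, 2))  # previously raised AttributeError under dynamo
```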

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94658
Approved by: https://github.com/yanboliang, https://github.com/jansel
2023-02-16 04:00:51 +00:00
jon-chuang
d1d5d16df3 dynamo: handle straight-line graph breaks for autocast context manager with constant args (#94137)
Fixes https://github.com/pytorch/pytorch/issues/93890

We do the following:
1. fix the `__init__` constructor for `AutocastModeVariable` with an existing `mode` while copying
2. `resume_execution` is made aware of constant args (`target_values`) by storing said args in `ReenterWith`. To propagate between subgraphs (in straight-line code), we also store the constant args in the downstream's `code_options["co_consts"]` if not already present (see the sketch after this list).
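
A hedged sketch of the kind of straight-line pattern these changes target: an autocast context with constant args that contains a graph break. This is illustrative only and, as written, assumes a CUDA device.

```python
import torch
import torch._dynamo

def fn(x):
    with torch.autocast("cuda", dtype=torch.float16):  # constant target_values
        y = x.sin()
        print("this forces a graph break inside the context")
        return y.cos()

opt_fn = torch._dynamo.optimize("eager")(fn)
opt_fn(torch.randn(4, device="cuda"))
```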

---

Future work:
1. handle instantiating context manager in non-inlineable functions. Simultaneously fix nested grad mode bug.
2. generalize to general `ContextManager`s
3. generalize to variable arguments passed to context manager, with guards around the variable.

---

Actually, if we look at the repro: 74592a43d0/test/dynamo/test_repros.py (L1249), we can see that the method in this PR doesn't work for graph breaks in function calls, in particular, in function calls that don't get inlined.

Why inlining functions with graph breaks is hard:
- When we handle graph breaks, we create a new code object for the remainder of the code. It's hard to do this when we are inside a function, since we would then need a frame stack, and we just want to deal with the current frame as a sequence of straight-line code segments.

Why propagating context manager information is hard:
- If we do not inline the function, the frame does not contain any information about the parent `block_stack` or `co_consts`. So we cannot store it on local objects like the eval frame. It has to be a global object in the output_graph.

---

Anyway, I'm starting to see clearly that dynamo must indeed be optimized for the torch use-case. Supporting more general cases tends to run into endless corner cases and caveats.

One direction that I see as viable to handle function calls which have graph breaks and `has_tensor_in_frame` is stick with not inlining them, while installing a global `ContextManagerManager`, similar to the `CleanupManager` (which cleans up global variables). We can know which context managers are active at any given point, so that we can install their setup/teardown code on those functions and their fragments.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94137
Approved by: https://github.com/yanboliang
2023-02-14 14:00:37 +00:00
Aaron Gokaslan
67d9790985 [BE] Apply almost all remaining flake8-comprehension checks (#94676)
Applies the remaining flake8-comprehension fixes and checks. This change replaces all remaining unnecessary generator expressions with list/dict/set comprehensions, which are more succinct, more performant, and better supported by our torch.jit compiler. It also removes useless generators such as `set(a for a in b)`, resolving them into just the set call.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94676
Approved by: https://github.com/ezyang
2023-02-12 01:01:25 +00:00
Aaron Gokaslan
3d82d8d0ed [BE] Enable more flake8-comprehensions checks (#94601)
I applied some flake8 fixes and enabled checking for them in the linter. I also enabled some checks for my previous comprehensions PR.

This is a follow up to #94323 where I enable the flake8 checkers for the fixes I made and fix a few more of them.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94601
Approved by: https://github.com/ezyang
2023-02-10 23:40:29 +00:00
Xiaodong Wang
88e16849db [pt2] Fix multiple races in log folder (#93407)
Summary:
There are a few races/permission errors in file creation; this fixes them.
OSS:
1. caffe2/torch/_dynamo/utils.py, get_debug_dir: multiple processes may conflict on the directory even though its name uses microsecond timestamps, so add the pid to it
2. caffe2/torch/_dynamo/config.py: it may not be a correct assumption that we have write permission to the cwd

Test Plan: sandcastle

Differential Revision: D42905908

Pull Request resolved: https://github.com/pytorch/pytorch/pull/93407
Approved by: https://github.com/soumith, https://github.com/mlazos
2023-02-09 21:10:14 +00:00
Aaron Gokaslan
8fce9a09cd [BE]: pyupgrade Python to 3.8 - imports and object inheritance only (#94308)
Apply parts of pyupgrade to torch (starting with the safest changes).
This PR only does two things: removes the need to inherit from object and removes unused future imports.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94308
Approved by: https://github.com/ezyang, https://github.com/albanD
2023-02-07 21:10:56 +00:00
Jason Ansel
ee2729890c Refactor dynamo register_backend/BACKENDS (#93389)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/93389
Approved by: https://github.com/voznesenskym
2023-02-02 19:41:48 +00:00
Edward Z. Yang
ca9ebf9e2b Delete dynamo_import and inductor_import (#93851)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/93851
Approved by: https://github.com/albanD, https://github.com/jansel
2023-02-02 01:51:29 +00:00
Edward Z. Yang
902b4dba75 Change capture_scalar_outputs to use SymInt/SymFloat rather than Tensor to model scalars (#93150)
Previously, Dynamo faked support for item() when `capture_scalar_outputs` was True by representing it internally as a Tensor. With dynamic shapes, this is no longer necessary; we can represent it directly as a SymInt/SymFloat. Do so. Doing this requires you to use dynamic shapes; in principle we could support scalar outputs WITHOUT dynamic shapes but I won't do this unless someone hollers for it.
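
A hedged sketch of what this enables; the config knob name is as of this commit and may differ in later versions.

```python
import torch
import torch._dynamo

torch._dynamo.config.capture_scalar_outputs = True

def fn(x):
    s = x.sum().item()   # modeled as a SymFloat instead of forcing a graph break
    return x * s

opt_fn = torch._dynamo.optimize("eager")(fn)
opt_fn(torch.randn(4))
```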

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Differential Revision: [D42885775](https://our.internmc.facebook.com/intern/diff/D42885775)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/93150
Approved by: https://github.com/voznesenskym
2023-01-31 21:23:23 +00:00
Edward Z. Yang
e5235fb62c Convert GuardOnDataDependentSymNode into graph break (#93373)
Extracted from https://github.com/pytorch/pytorch/pull/93150 because
I need it earlier in trunk.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/93373
Approved by: https://github.com/Skylion007
2023-01-31 19:31:44 +00:00
Yanbo Liang
304d8dd6c8 [Dynamo] Support enum.Enum type as dict key (#93026)
Fixes a Meta-internal use case of using the `enum.Enum` type as a dict key; please refer to the added test case for details.
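
A hedged sketch of the supported pattern (all names here are illustrative, not taken from the added test):

```python
import enum
import torch
import torch._dynamo

class Mode(enum.Enum):
    TRAIN = 0
    EVAL = 1

scales = {Mode.TRAIN: 2.0, Mode.EVAL: 1.0}

def fn(x, mode):
    return x * scales[mode]   # dict keyed by an enum member

opt_fn = torch._dynamo.optimize("eager")(fn)
opt_fn(torch.randn(3), Mode.TRAIN)
```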

Pull Request resolved: https://github.com/pytorch/pytorch/pull/93026
Approved by: https://github.com/mlazos
2023-01-29 06:37:10 +00:00
David Berard
58acab4616 [dynamo] support [tensor].type(torch.FloatTensor) (#93043)
For some tensor x, x.type(torch.FloatTensor) will essentially do the same thing as x.to(torch.float). x.type can be called with at least 3 types of inputs:
* a string "torch.FloatTensor"
* a dtype torch.float
* a tensor type torch.FloatTensor

The third option (torch.FloatTensor) fails in fx because fx cannot trace torch.FloatTensor objects. So this PR replaces the torch.FloatTensor type with the string "torch.FloatTensor"
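
For reference, a hedged eager-mode sketch of the three equivalent call forms:

```python
import torch

x = torch.randn(2, 2, dtype=torch.float64)
a = x.type("torch.FloatTensor")   # string
b = x.type(torch.float)           # dtype
c = x.type(torch.FloatTensor)     # tensor type -- the case fx could not trace
assert a.dtype == b.dtype == c.dtype == torch.float32
```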

Why not fix this in fx? Well, it's possible, but I'm not sure of a nice way to do it. We would want to update [torch.fx.node.BaseArgumentTypes](d88bc38b0c/torch/fx/node.py (L17)) to contain torch.FloatTensor etc. We could hard-code a list of tensor types there (the types vary depending on build type, e.g. whether or not cuda tensors are available), but that's not great in case our hardcoded list differs from the actual list registered by python_tensor.cpp. Another option is to dynamically populate the list of types with `Union[tuple(...)]` and fill the tuple with `torch._tensor_classes` (which is directly populated by python_tensor.cpp), but apparently this breaks most typecheckers.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/93043
Approved by: https://github.com/jansel
2023-01-27 21:27:13 +00:00
Michael Voznesensky
d322f82b05 Add @count util to torch, use it to track benchmark stats (#93013)
<img width="1333" alt="image" src="https://user-images.githubusercontent.com/4755252/214687911-f766f072-c162-4298-9aed-c889f1375336.png">

Pull Request resolved: https://github.com/pytorch/pytorch/pull/93013
Approved by: https://github.com/ezyang
2023-01-26 03:09:12 +00:00
Michael Voznesensky
5778c04a15 Add --timing flag, phase timing to @dynamo_timed (#92637)
Ex output:
```
 TIMING:
 entire_frame_compile:8.574629999999999
 backend_compile:5.26806
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/92637
Approved by: https://github.com/ezyang
2023-01-21 10:52:13 +00:00
PyTorch MergeBot
44132cc4b0 Revert "Add --timing flag, phase timing to @dynamo_timed (#92637)"
This reverts commit 773b513435.

Reverted https://github.com/pytorch/pytorch/pull/92637 on behalf of https://github.com/malfet due to Broke lint
2023-01-20 16:23:20 +00:00
Edward Z. Yang
387357539f Log accuracy failure in more cases (#92645)
Fixes https://github.com/pytorch/torchdynamo/issues/1910

But not durably, it's easy to forget if you add more cases.  I'd like
someone else to do that refactor.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/92645
Approved by: https://github.com/Chillee
2023-01-20 15:23:35 +00:00
Michael Voznesensky
773b513435 Add --timing flag, phase timing to @dynamo_timed (#92637)
Ex output:
```
 TIMING:
 entire_frame_compile:8.574629999999999
 backend_compile:5.26806
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/92637
Approved by: https://github.com/ezyang
2023-01-20 05:01:21 +00:00
Michael Lazos
cac217c80a Fix key error formatting and move exc code to exc.py (#92593)
Fixes https://github.com/pytorch/torchdynamo/issues/1953 and moves exception formatting code from convert_frame.py to exc.py

Pull Request resolved: https://github.com/pytorch/pytorch/pull/92593
Approved by: https://github.com/ezyang
2023-01-19 02:54:00 +00:00
lezcano
77b8aa6e43 Wrap a few more functions to ease their tracking during debugging (#92004)
Yup

Pull Request resolved: https://github.com/pytorch/pytorch/pull/92004
Approved by: https://github.com/ezyang
2023-01-17 16:53:36 +00:00
Wu, Chunyuan
a111dd9014 [dynamo] support comparing numpy ndarray (#91870)
The output of Torchbench model `doctr_det_predictor` on CPU is a `numpy ndarray`. When running the accuracy benchmark of this model, the below error is raised: `RuntimeError: unsupported type: ndarray`.
Repro CMD:
```bash
python benchmarks/dynamo/torchbench.py --accuracy --float32 -dcpu  -n50 --inductor  --no-skip --dashboard --only doctr_det_predictor --batch_size 1 --threads 1
```

This PR adds support for comparing `numpy ndarray` outputs in the dynamo utils.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/91870
Approved by: https://github.com/jgong5, https://github.com/Chillee
2023-01-13 12:11:49 +00:00
Andrew M. James
7cd951c21e Properly guard all numpy usage within dynamo and remove UnspecializedNumpyVariable (#90795)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90795
Approved by: https://github.com/ngimel, https://github.com/cpuhrsch
2023-01-06 22:36:38 +00:00
Tugsbayasgalan Manlaibaatar
d4713b4c7d [dynamo] Fix bug in tensor.item fake tensor propagation (#91668)
Previously, when we ran a node with a fake value for tensor.item, it would error because the utility method doesn't know how to handle placeholder nodes. The tensor on which item is called can be a user input, which will be a placeholder node in the graph.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/91668
Approved by: https://github.com/voznesenskym
2023-01-04 19:51:19 +00:00
Mengchi Zhang
d1123c94a7 [pytorch] Update troubleshooting_url (#91298)
Summary: Update the troubleshooting_url; the old one no longer exists.

Test Plan: None

Differential Revision: D42205626

Pull Request resolved: https://github.com/pytorch/pytorch/pull/91298
Approved by: https://github.com/jianyuh
2022-12-22 21:29:54 +00:00
Bin Bao
8992eec781 [inductor] Update how REQUIRE_HIGHER_TOLERANCE is handled (#91227)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/91227
Approved by: https://github.com/kit1980
2022-12-21 05:43:39 +00:00
PyTorch MergeBot
94262efc7d Revert "[inductor] Rewrite Triton templates + epilogue fusion (retry) (#91105)"
This reverts commit d6dd2e97da.

Reverted https://github.com/pytorch/pytorch/pull/91105 on behalf of https://github.com/atalman due to Broke internal builds
2022-12-21 00:02:38 +00:00
lezcano
e6fcf7ad9d Remove breakpoint (#91128)
This was left in https://github.com/pytorch/pytorch/pull/90026

Pull Request resolved: https://github.com/pytorch/pytorch/pull/91128
Approved by: https://github.com/kit1980
2022-12-20 22:14:35 +00:00
Jason Ansel
d6dd2e97da [inductor] Rewrite Triton templates + epilogue fusion (retry) (#91105)
https://github.com/pytorch/pytorch/pull/90738 seems a bit borked. ghimport fails on it, and I unlinked it from the Phabricator diff, but it still won't land. This is an exact copy of that PR without using ghstack.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/91105
Approved by: https://github.com/ngimel
2022-12-20 02:38:23 +00:00
Edward Z. Yang
bbea58d500 Stop using GraphArgs for shape env guard source tracking (#90911)
GraphArgs worked fairly well, but it was still missing sources
sometimes.  Now, we maintain an auxiliary data structure which we
MUST populate whenever we fakeify a tensor / allocate a bare SymInt.
This should guarantee once and for all that every symbol is available.
Should fix swin_base_patch4_window7_224.

While I was at it, I moved fakeification utility back to builder
as it was only used at once call site.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90911
Approved by: https://github.com/voznesenskym
2022-12-16 05:22:56 +00:00
Edward Z. Yang
45109ec30a Completely redo how ShapeEnv guards are generated (#90528)
Instead of inferring shape mappings from a bunch of data structures that were plumbed in InstructionTranslator, we work out mappings by just iterating over the GraphArgs and mapping symbols to arguments as they show up. If multiple argument sizes/strides/offsets map to the same symbol, this means they are duck sized, so we also generate extra equality tests that they must be equal. Finally, we generate 0/1 specialization guards. The resulting code is much shorter, and I think also easier to understand.

TODO: Delete all the tensor ref tracking code, it's unnecessary

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90528
Approved by: https://github.com/voznesenskym
2022-12-10 13:35:04 +00:00
Edward Z. Yang
b68dead20c Keep track of source name on all allocated SymInts (#90295)
Wow, I had to sweat so much to get this PR out lol.

This PR enforces the invariant that whenever we allocate SymInts as part of fakeification, the SymInt is associated with a Source, and in fact we store the string source name on SymbolWithSourceName. We use 'sname' as the shorthand for source name, as 'name' is already used by sympy to name symbols.

In order to store source names, we have to plumb source names from Dynamo to PyTorch. This made doing this PR a bit bone crushing, because there are many points in the Dynamo codebase where we are improperly converting intermediate tensors into fake tensors, where there is no source (and there cannot be, because it's a frickin' intermediate tensor). I've fixed all of the really awful cases in earlier PRs in the stack. This PR is just plumbing in source names from places where we do have it.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90295
Approved by: https://github.com/voznesenskym
2022-12-10 13:17:34 +00:00
Bin Bao
f7cdd3a7a0 [inductor] Use a large tolerance for botnet26t_256 (#90383)
Summary: botnet26t_256 shows random tolerance failures on CI. The root
cause of this randomness is still to be investigated, but let's use a
larger tolerance for now.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90383
Approved by: https://github.com/ezyang
2022-12-07 19:35:06 +00:00
Ram Rachum
351d73b97f Fix exception causes all over the codebase (#90271)
This is the continuation to #90134 and hopefully the final PR in this series.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90271
Approved by: https://github.com/kit1980
2022-12-07 04:29:00 +00:00
Edward Z. Yang
3d4b92b171 Ensure that we fakeify tensor subclasses when they are initially tracked (#90009)
The old code didn't actually fakeify traceable tensor subclasses at the
time they are added as a GraphArg to the module; now we do, by ignoring
the subclass during fakeification and relying on Dynamo to simulate
the subclass on top.  See comments for more details.

BTW, this codepath is super broken, see filed issues linked on the
inside.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90009
Approved by: https://github.com/wconstab, https://github.com/voznesenskym
2022-12-06 22:36:32 +00:00
Michael Voznesensky
3b9a386d48 Add TORCH_FAKE_TENSOR_DEBUG use it to enable storage of traces on fake tensors at init time (#90215)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90215
Approved by: https://github.com/ezyang
2022-12-06 22:28:52 +00:00
Michael Voznesensky
41c3b41b92 Use dynamo fake tensor mode in aot_autograd, move aot_autograd compilation to lowering time [Merger of 89672 and 89773] (#90039)
After all of the preparatory commits, this is a subset of the
changes in https://github.com/pytorch/pytorch/pull/89392 that actually
change us to propagating fake tensors to backends.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

This is the merger of Ed's PR #89672, which is a rewrite of an older PR of mine (#89392), with CI Fixes on top of it (#89773)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90039
Approved by: https://github.com/ezyang
2022-12-05 01:56:50 +00:00
PyTorch MergeBot
4648baa911 Revert "Use dynamo fake tensor mode in aot_autograd, move aot_autograd compilation to lowering time [Merger of 89672 and 89773] (#90039)"
This reverts commit ef0c7ec958.

Reverted https://github.com/pytorch/pytorch/pull/90039 on behalf of https://github.com/clee2000 due to broke xla tests ef0c7ec958 https://github.com/pytorch/pytorch/actions/runs/3606308473/jobs/6077646142
2022-12-04 21:57:30 +00:00
Michael Voznesensky
ef0c7ec958 Use dynamo fake tensor mode in aot_autograd, move aot_autograd compilation to lowering time [Merger of 89672 and 89773] (#90039)
After all of the preparatory commits, this is a subset of the
changes in https://github.com/pytorch/pytorch/pull/89392 that actually
change us to propagating fake tensors to backends.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

This is the merger of Ed's PR #89672, which is a rewrite of an older PR of mine (#89392), with CI Fixes on top of it (#89773)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90039
Approved by: https://github.com/ezyang
2022-12-03 01:19:55 +00:00
Animesh Jain
3162a48a77 [dynamo][benchmarks] Call zero grad (#90026)
Hoping that it might reduce some flakiness

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90026
Approved by: https://github.com/williamwen42
2022-12-02 04:05:57 +00:00
Michael Voznesensky
b5616cd5f4 Add simple assert to detect fake tensors on modules (#89723)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89723
Approved by: https://github.com/ezyang
2022-11-28 08:57:33 +00:00
Edward Z. Yang
6904324781 Remove fake_tensor_propagation (#89646)
You always have to run dynamo with fake tensors.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89646
Approved by: https://github.com/soumith
2022-11-25 03:27:32 +00:00
Edward Z. Yang
94a88b53ed Remove fake_tensors_available (#89637)
As we are one repo now, they are always available.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89637
Approved by: https://github.com/anjali411
2022-11-24 19:28:10 +00:00
Shunting Zhang
e545caa50f dynamo/torchxla integration: trace on xla rather than eager (#88904)
In #87741 we added inference support for the dynamo/torchxla integration. Later, in #88449, we attempted to add training support. That attempt was not smooth because
- we tried 2 things together:
   1. let dynamo trace the model on xla rather than eager
   2. enable training
- It turns out neither of these two tasks is trivial.

Furthermore, item 2 (enable training) depends on item 1 (tracing on xla). We enable training via AOTAutograd, which lifts all model parameters/buffers as graph inputs. Without item 1 being done, we would need to copy all graph inputs (including model parameters/buffers) from the eager device to xla devices, which hurts performance a lot. Having a cache to map eager parameters to XLA parameters does not solve the problem, since an update to either will not automatically sync to the other; they easily go out of sync.

This PR lets dynamo trace the model on XLA rather than eager. This is a preparation step toward enabling training.

Also, tracing on XLA makes data movement more efficient. We see a 1.5x geomean speedup compared to the previous 1.38x.
```
+-------------------------+--------------------+-------------------------+
| Model                   |   XLA (trace once) |   XLA (trace everytime) |
+=========================+====================+=========================+
| resnet18                |            1.38    |                 1.008   |
+-------------------------+--------------------+-------------------------+
| resnet50                |            1.227   |                 0.998   |
+-------------------------+--------------------+-------------------------+
| resnext50_32x4d         |            1.544   |                 1.008   |
+-------------------------+--------------------+-------------------------+
| alexnet                 |            1.085   |                 1.045   |
+-------------------------+--------------------+-------------------------+
| mobilenet_v2            |            2.028   |                 1.013   |
+-------------------------+--------------------+-------------------------+
| mnasnet1_0              |            1.516   |                 0.995   |
+-------------------------+--------------------+-------------------------+
| squeezenet1_1           |            0.868   |                 1.01    |
+-------------------------+--------------------+-------------------------+
| vgg16                   |            1.099   |                 1.008   |
+-------------------------+--------------------+-------------------------+
| BERT_pytorch            |            3.26    |                 1.027   |
+-------------------------+--------------------+-------------------------+
| timm_vision_transformer |            2.182   |                 1.015   |
+-------------------------+--------------------+-------------------------+
| geomean                 |            1.50389 |                 1.01261 |
+-------------------------+--------------------+-------------------------+
```

Example command
```
GPU_NUM_DEVICES=1 python benchmarks/dynamo/torchbench.py --randomize-input --performance --trace-on-xla --only resnet18 --backend=torchxla_trace_once
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/88904
Approved by: https://github.com/wconstab, https://github.com/JackCaoG, https://github.com/jansel
2022-11-22 03:57:04 +00:00
Animesh Jain
82713a1cc4 [inductor][compilation time] Fallback when kernel size for avg/max pool is large (#89448)
This reduces compilation time for yolov3 from 400 seconds to 48 seconds. yolov3 has a 13x13 max_pool2d kernel, which was generating really large Triton code.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89448
Approved by: https://github.com/ngimel
2022-11-22 02:23:24 +00:00
Michael Voznesensky
808bdbab89 Fix try/except flow where DataDependentOutputException is getting wrapped in a RuntimeError (#89314)
Fixed repro:

```
import torch
import torch._dynamo

def fn(a):
    return a.repeat_interleave(14, dim=0).repeat_interleave(14, dim=1)

x = torch.ones(14, 14).to(dtype=torch.int64)
opt_fn = torch._dynamo.optimize("eager")(fn)
opt_fn(x)
```

Fixes [#1886](https://github.com/pytorch/torchdynamo/issues/1886)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89314
Approved by: https://github.com/anijain2305, https://github.com/eellison
2022-11-19 07:16:29 +00:00
Animesh Jain
cad5772c2c [dashboard][huggingface] skip accuracy checks for really large models… (#89273)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89273
Approved by: https://github.com/desertfire
2022-11-19 00:22:45 +00:00
Yanbo Liang
b72f5b9ae3 [Dynamo] Support typing.Mapping & Support function as argument (#88963)
These missing features come from https://github.com/pytorch/benchmark/pull/1302, where we'd like to enable E2E hf_bert dynamo train/eval. The dependent [HuggingFace accelerate library](https://huggingface.co/docs/accelerate/index) requires these improvements.
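
A hedged sketch showing both features in one compiled function (all names here are illustrative):

```python
import typing
import torch
import torch._dynamo

def apply(fn: typing.Callable, cfg: typing.Mapping, x):
    return fn(x) * cfg["scale"]   # a function argument plus a Mapping argument

opt_apply = torch._dynamo.optimize("eager")(apply)
opt_apply(torch.sin, {"scale": 2.0}, torch.randn(3))
```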

Pull Request resolved: https://github.com/pytorch/pytorch/pull/88963
Approved by: https://github.com/jansel
2022-11-17 06:57:42 +00:00
Michael Voznesensky
06ce1338bc [dynamo] Port all pytorch/dynamo and test/dynamo pieces over from symbolic-shapes branch (#88768)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88768
Approved by: https://github.com/jansel, https://github.com/ezyang
2022-11-13 04:50:21 +00:00
Bin Bao
91b71cdbe4 [dynamo] Add torch.device to is_safe_constant (#88766)
Test Plan:
```
PYTORCH_TEST_WITH_DYNAMO=1 python test/test_torch.py -k  test_advancedindex_mixed_cpu_devices_cuda
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/88766
Approved by: https://github.com/jansel
2022-11-11 15:06:17 +00:00
Elias Ellison
2ce2fc133d Disable Current Modes when printing Tensor (#88344)
Fix for https://github.com/pytorch/pytorch/issues/88087

Pull Request resolved: https://github.com/pytorch/pytorch/pull/88344
Approved by: https://github.com/ezyang, https://github.com/samdow
2022-11-04 00:45:35 +00:00
Elias Ellison
9835413009 Fake Tensor For (Conv) Propagation (#87641)
Resubmitting https://github.com/pytorch/pytorch/pull/87302 so it can be ghstack'd with the pr below.

Incorrect strides in any meta impl would lead to runtime assertion errors for fallback kernels, so start by just enabling it for conv.

Replaces https://github.com/pytorch/pytorch/pull/87588.

cc @jansel @lezcano @fdrocha @mlazos @soumith @voznesenskym @yanboliang @penguinwu @anijain2305
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87641
Approved by: https://github.com/jansel
2022-10-29 04:14:01 +00:00
Michael Lazos
44d7ba7efb Fix debug dir bugs and minifier output directories (#87682)
Fixes https://github.com/pytorch/torchdynamo/issues/1758, https://github.com/pytorch/torchdynamo/issues/1752

- minifier_launcher.py now dumps checkpoints to \<cwd\>/checkpoints when run
- a single debug directory is created per script invocation; assertion failures due to a missing directory will no longer occur
- torchinductor debug tracing will now correctly dump to the debug directory, since no prior setup is needed (previously the directory was incorrectly initialized only during dynamo tracing)

cc @jansel @lezcano @fdrocha @soumith @voznesenskym @yanboliang @penguinwu @anijain2305
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87682
Approved by: https://github.com/ezyang
2022-10-25 21:55:28 +00:00
Michael Voznesensky
bc19494814 [Dynamo] Symbolic shape guards (#87570)
**Introduces symbolic shape guards into dynamo.**

In this PR, we take the existing fake tensor infra and plumbing in dynamo and we start passing a shape_env around. This shape_env does not get plumbed down to middle layers / backend yet - it only collects expressions from frontend invocations at the moment. We then translate these expressions into guards at the point where we take other guards installed throughout dynamo - and add them to check_fn.

Part 1 of https://docs.google.com/document/d/1QJ-M4zfMkD-fjHIqW089RptjLl9EgozZGCceUbvmgfY/edit#

cc @jansel @lezcano @fdrocha @mlazos @soumith @yanboliang @penguinwu @anijain2305
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87570
Approved by: https://github.com/ezyang
2022-10-25 21:15:40 +00:00