Commit Graph

133 Commits

Author SHA1 Message Date
ekamiti
32d422f335 Make adding buffers more like adding parameters (#104069)
Add similar semantics for creating a buffer object similar to creating a parameter. This is done by introducing a new `Buffer` class that can be used for type disambiguation. The underlying functionality of registering a buffer remains the same as the `register_buffer` method has not been changed. The `persistent` parameter in the `Buffer` type is to indicate whether a buffer object should be persistent or not. Other non-test changes have to do with getting the new `Buffer` type recognized by inductor and dynamo. Remaining changes are test changes to make sure that the `Buffer` type can be used as a drop in replacement for `register_buffer` as it just leads to `register_buffer` being called. The addition of this new functionality still allows for normal tensors to be used as buffers so these changes are intended to be backwards compatible.

Fixes #35735

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104069
Approved by: https://github.com/mikaylagawarecki
2023-07-17 17:59:05 +00:00
Animesh Jain
95232c216b [dynamo] Bugfix for enums (#105306)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105306
Approved by: https://github.com/yanboliang
2023-07-17 16:39:16 +00:00
lezcano
b190f46514 Allow NumPy code in torch.compile to run on cuda (#104699)
This can be achieved by doing `torch.set_default_device("cuda")`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104699
Approved by: https://github.com/ezyang, https://github.com/larryliu0820
2023-07-06 18:43:09 +00:00
Animesh Jain
8c191d8eef [dynamo][ac] Reland #104397 - Remove disable monkeypatching of utils.checkpoint (#104665)
NO CHANGE from before. The ancestor diff was reverted, so this diff got reverted as well.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104665
Approved by: https://github.com/wconstab
2023-07-06 00:48:02 +00:00
Animesh Jain
4005152b92 [dynamo] Organize higherorderops variable trackers (#104565)
The main change is moving the higherorderops from torch.py to higher_order_ops.py. And creating smaller subclasses of HigherOrderOp for cond, map etc

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104565
Approved by: https://github.com/zou3519
2023-07-05 22:19:26 +00:00
PyTorch MergeBot
40f53912cf Revert "[dynamo][ac] Remove disable monkeypatching of utils.checkpoint (#104397)"
This reverts commit 537a6c0651.

Reverted https://github.com/pytorch/pytorch/pull/104397 on behalf of https://github.com/huydhn due to This has been reverted internally by D47216591, so I need to also revert it on OSS to keep them in sync ([comment](https://github.com/pytorch/pytorch/pull/104397#issuecomment-1621086360))
2023-07-05 06:11:08 +00:00
Animesh Jain
537a6c0651 [dynamo][ac] Remove disable monkeypatching of utils.checkpoint (#104397)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/104397
Approved by: https://github.com/wconstab
2023-06-30 02:27:06 +00:00
Animesh Jain
2bb83cd45c [dynamo][ac] Minor refactor for better code organization and a bugfix (#104276)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/104276
Approved by: https://github.com/zou3519
2023-06-29 12:57:59 +00:00
cdzhan
c06bb82ba1 fix specialization when you pass an unspec int into slicing on a Python list. (#104142)
Fixes #103545

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104142
Approved by: https://github.com/malfet, https://github.com/jansel
2023-06-28 13:13:07 +00:00
Animesh Jain
75dab587ef [dynamo] FSDP + AC + torch.compile (#103953)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/103953
Approved by: https://github.com/wanchaol
2023-06-24 01:40:56 +00:00
Vinay Kumar Burugu
3c28431a0f Feature: Dump compile_times when TORCH_LOGS=dynamo is enabled. (#104057)
Partial implementation of  https://github.com/pytorch/pytorch/issues/103173. This PR only implements the feature to dump compile_times at the end of the session using the atexit handler.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104057
Approved by: https://github.com/ezyang
2023-06-23 05:25:09 +00:00
Thiago Crepaldi
6f655d4195 Add symbolic tracing support to torch._dynamo.export (fake input + weights) (#100017)
Fixes #95900
Using the following repro as guide:

```python
import torch
import torch._dynamo
from torch._subclasses import fake_tensor
from torch.fx.experimental.symbolic_shapes import ShapeEnv
from torch._dynamo.output_graph import config
class Model(torch.nn.Module):
    def __init__(self) -> None:
        super().__init__()
        self.linear = torch.nn.Linear(2, 2)
        self.linear2 = torch.nn.Linear(2, 2)

    def forward(self, x):
        out = self.linear(x)
        out = self.linear2(out)
        return out

fake_mode = fake_tensor.FakeTensorMode(allow_non_fake_inputs=False,
                                       allow_fallback_kernels=True,
                                       shape_env=ShapeEnv(
                                            allow_scalar_outputs=config.capture_scalar_outputs,
                                            allow_dynamic_output_shape_ops=config.capture_dynamic_output_shape_ops,
                                            frame_id=0
                                        ),
)
# Fakefying input/model before calling torch._dynamo.export
with fake_mode:
    fake_x = torch.rand(5, 2, 2)
    model = Model()

# Calling torch._dynamo.export without active fake mode
graph_module, guards = torch._dynamo.export(
    model,
    fake_x,
    aten_graph=True,
    fake_mode=fake_mode
)
graph_module.print_readable()
graph_module.graph.print_tabular()
```

Summary of changes:

    * Plumb fake_mode through torch.export API. When specified, it
    replaces the creation of a new FaketendorMode at InstructionTranslator on behalf of OutputGraph
     Hacks FakeTensor.__new__ to prevent a
    torch.tensor._make_subclass call for inputs that are already fakefied by
    user. This probably need to be fixed in a nicer way. Any idea?
    * Removed a few asserts that didn't want faked tensors coming
    from user script
    * Added torch._subclasses.fake_tensor.FakeTensor to type list on a few
    asserts check to allow fake inputs

The changes above allowed symbolic tracing with both static and dynamic shapes.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100017
Approved by: https://github.com/ezyang
2023-06-15 21:28:10 +00:00
Mengwei Liu
96c23fe212 [dynamo][numpy] Add support for builtin functions (#103457)
In order to be able to run stuff like:
```
def f(x):
	a = x.numpy()
        return a + a
```
This PR adds a branch in `BuiltinVariable` to handle `NumpyNdarrayVariable` case.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103457
Approved by: https://github.com/ezyang
2023-06-15 09:18:45 +00:00
Animesh Jain
16c2090b2d [benchmark][compile] Limit number of bounding boxes to 5 (#103413)
Depends on https://github.com/pytorch/benchmark/pull/1729

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103413
Approved by: https://github.com/ezyang
2023-06-15 01:06:40 +00:00
Edward Z. Yang
ddf4cd69ec Delete ifdyn and ifunspec combinators (#103596)
Replaced with expect tests for ease of updating.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103596
Approved by: https://github.com/voznesenskym
2023-06-15 00:14:17 +00:00
Animesh Jain
bd0ed940b7 [activation checkpoint][dynamo] Wrap AC into Tag based higher order op (#102935)
These are the numbers with this PR

![image](https://github.com/pytorch/pytorch/assets/13822661/63e991d5-80e2-4e94-8e4b-243621c3990e)

There are 3 main followups
* A naive partitioner gives better memory footprint than min-cut partitioner here. Currently, we are using min-cut partitioner. Waiting for @Chillee  to discuss this further to either modify min-cut or add a naive partitioner.
* aot_eager is < 1x memory footprint. This is true even for non AC models. This could hide some inefficiency somewhere.
* inductor is giving very different memory numbers between AOT-traced-AC (duplicate early) vs this implementation. This leads to some inefficiency in inductor that we need to resolve.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/102935
Approved by: https://github.com/jansel
2023-06-14 20:15:43 +00:00
Edward Z. Yang
8b015c166c Don't test dynamic_shapes in tensor_always_has_static_shape (#103517)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103517
Approved by: https://github.com/anijain2305
2023-06-14 07:04:17 +00:00
Mengwei Liu
2eac8bd2b8 [dynamo][numpy] Support ndarray methods (#97537)
This PR adds universal support for ndarray methods. After #100839 each `NumpyNdarrayVariable` should wrap a `torch.Tensor`. This PR adds a `numpy_method_wrapper` which converts the `torch.Tensor` to `torch_np.ndarray` and then call the numpy ndarray method. Then we also try to return a `torch.Tensor` (return as-is if the value is not ndarray-like)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/97537
Approved by: https://github.com/ezyang
2023-06-12 17:21:31 +00:00
Edward Z. Yang
12cd1dbba0 Handle recursive tuple in clone_inputs (#102979)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/102979
Approved by: https://github.com/wconstab
2023-06-05 22:11:48 +00:00
Michael Lazos
c46af25bb3 Initialize optimizer in dynamo to avoid graph break and tracing slowness (#102640)
On calls to `_init_group` rather than tracing through it, extract python values from the arguments, and call the initialization. This avoids having to trace this function which is very slow with large parameters, and also avoids graph breaking on it. This is sound in this case because the state is only initialized once in the eager case. Guards on the state and params are generated explicitly rather than via tracing the initialization.

Caveats:
`_init_group` also gathers various state tensors into lists via mutating list arguments to pass to the functional optimizer implementation. These state tensors exist on the optimizer itself, but we don't know exactly how the gathering is done and which tensors correspond to which attributes of the optimizer module (each optimizer has different states). To rectify this, we keep weak_ptrs to all of the tensors collected in the lists in globals (similar to how parameter keys are stored for dictionaries). These pointers are guaranteed to be alive as long as the optimizer object is alive if the internal state is not interfered with and they are guarded with weakref guards

Pull Request resolved: https://github.com/pytorch/pytorch/pull/102640
Approved by: https://github.com/jansel
2023-06-03 15:49:51 +00:00
Mengwei Liu
c304fddf68 [dynamo][numpy] Support graph break for numpy ndarray (#100839)
Issue: #93684

In previous PRs #95849 #99560 we redirect `numpy.*`, `<tensor>.numpy()` calls to `torch_np.*` methods and attributes, by creating `NumpyNdarrayVariable` for those calls.

We need to handle `NumpyNdarrayVariable` when graph break happens.

This PR did 2 things:
1. In `codegen.py` we made sure we can reconstruct the value wrapped by `NumpyNdarrayVariable`, to be `torch_np.ndarray` in the stack whenerver we recompiles the subgraph.
2. In `builder.py` we can wrap the value to be `NumpyNdarrayVariable` and save it as graph input.

-----

Starting from commit 6:

## A new design for supporting numpy in dynamo

In short the core concept doesn't change: we still convert `numpy` API calls to `torch_np` API calls. However, instead of wrapping a `torch_np.ndarray` in `NumpyNdarrayVariable`, the new design wraps a `torch.Tensor`.

The reason for doing this change is because we need to keep `torch.Tensor` everywhere in the captured graph, so that it works well with the backend of dynamo. See discussions in https://github.com/Quansight-Labs/numpy_pytorch_interop/issues/142 for details.

### Flow
This is an example showing how do we think about dynamo working on a simple function:
```python
def f(x: torch.Tensor, y: torch.Tensor):
    a, b = x.numpy(), y.numpy()
    c = np.add(x, y)
    return torch.from_numpy(c)
```
```

              +------------+             +------------+
 torch.Tensor |            |numpy.ndarray|            |
 -------------- .numpy()   --------------|            |
              |            |             |            |             +------------------+
              +------------+             | numpy.add  |numpy.ndarray|                  |torch.Tensor
              +------------+             |            --------------| torch.from_numpy --------------
 torch.Tensor |            |numpy.ndarray|            |             |                  |
 -------------- .numpy()   --------------|            |             +------------------+
              |            |             |            |
              +------------+             +------------+

              +------------+             +----------------+
 torch.Tensor |            |torch.Tensor |                |
 -------------- .detach()  --------------|                |
              |            |             |                |                +----------------+            +------------+
              +------------+             |                |torch_np.ndarray|                |torch.Tensor|            |torch.Tensor
                                         | torch_np.add   -----------------| util.to_tensor -------------| .detach()  --------------
              +------------+             |                |                |                |            |            |
 torch.Tensor |            |torch.Tensor |                |                +----------------+            +------------+
 -------------- .detach()  --------------|                |
              |            |             |                |
              +------------+         |   +----------------+                                   |
                                     |                       wrapper on torch_np.add          |
                                     +--------------------------------------------------------+
```

### Approach

`torch_np` APIs can take both `torch_np.ndarray` as well as `torch.Tensor`. What  we need to do is to have a wrapper for these APIs to convert the return value back to `torch.Tensor`. This way only the wrapper is showing up in the captured graph, with `torch.Tensor`s as input and `torch.Tensor` as output.

If we have a graph break or we've traced to the end of the program, we need to inspect all the `NumpyNdarrayVariable` in the stack and convert them back to `numpy.ndarray`, to make sure the compiled version is still behaving the same as the eager version.

### Examples
Here's an example of the graph generated:

```python
def fn(x: np.ndarray, y: np.ndarray):
    a = x.real
    b = y.real
    torch._dynamo.graph_break()
    return np.add(a, 1), np.add(b, 1)
```

Graph generated:

```
[2023-05-16 10:31:48,737] torch._dynamo.output_graph.__graph: [DEBUG] TRACED GRAPH
 __compiled_fn_0 <eval_with_key>.0 opcode         name            target                                                      args                    kwargs
-------------  --------------  ----------------------------------------------------------  ----------------------  --------
placeholder    l_x_            L_x_                                                        ()                      {}
placeholder    l_y_            L_y_                                                        ()                      {}
call_function  from_numpy      <built-in method from_numpy of type object at 0x12b1fdc80>  (l_x_,)                 {}
call_function  from_numpy_1    <built-in method from_numpy of type object at 0x12b1fdc80>  (l_y_,)                 {}
call_function  attr_wrapper    <function attr_wrapper at 0x12e8693a0>                      (from_numpy, 'real')    {}
call_function  attr_wrapper_1  <function attr_wrapper at 0x12e8693a0>                      (from_numpy_1, 'real')  {}
output         output          output                                                      ((),)                   {}

[2023-05-16 10:31:48,908] torch._dynamo.output_graph.__graph: [DEBUG] TRACED GRAPH
 __compiled_fn_2 <eval_with_key>.1 opcode         name           target                                                      args                             kwargs
-------------  -------------  ----------------------------------------------------------  -------------------------------  --------
placeholder    l_a_           L_a_                                                        ()                               {}
placeholder    l_b_           L_b_                                                        ()                               {}
call_function  from_numpy     <built-in method from_numpy of type object at 0x12b1fdc80>  (l_a_,)                          {}
call_function  from_numpy_1   <built-in method from_numpy of type object at 0x12b1fdc80>  (l_b_,)                          {}
call_function  wrapped_add    <Wrapped function <original add>>                           (from_numpy, 1)                  {}
call_function  wrapped_add_1  <Wrapped function <original add>>                           (from_numpy_1, 1)                {}
output         output         output                                                      ((wrapped_add, wrapped_add_1),)  {}

```
### Changes

* `codegen.py`: reconstruct `numpy.ndarray` from `NumpyNdarrayVariable` by adding bytecode to call `utils.to_numpy_helper()`.
*  `output_graph.py`: getting rid of legacy code that does exactly what `codegen.py` does, which only handling return case but not graph break case.
*  `utils.py`: added helpers to convert `numpy.ndarray` to `torch.Tensor` and vice versa. Also adding a wrapper class that takes in a function. In `__call__` it calls the function and converts its out to `torch.Tensor` (or a list of it).
* `builder.py`: add method to wrap `numpy.ndarray` graph inputs into `NumpyNdarrayVariable`, by calling `torch.numpy` in the proxy.
* `misc.py`: `numpy` API calls goes into `NumpyVariable` and we find the function with the same name in `torch_np` module, then wrap it with the wrapper defined in `utils.py`.
* `tensor.py`, `torch.py`: proxy `tensor.numpy()` to be `torch.detach()` but wrap it with `NumpyNdarrayVariable`. Similarly, `torch.from_numpy()` -> `torch.detach()` but wrap it with `TensorVariable`. In `NumpyNdarrayVariable`, do the similar `torch_np.ndarray` to `torch.Tensor` wrapping for attributes.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100839
Approved by: https://github.com/ezyang
2023-06-03 00:54:25 +00:00
Edward Z. Yang
90b1b17c9f Fix string concatenation with non-string (#102728)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/102728
Approved by: https://github.com/Skylion007
2023-06-01 20:02:03 +00:00
Animesh Jain
2fa1b563da [dynamo] Activation checkpoint higher order ops - Reland 101028 (#101790)
https://github.com/pytorch/pytorch/pull/101028 was reverted due to internal breakage. Relanding.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/101790
Approved by: https://github.com/zou3519
2023-05-18 19:09:14 +00:00
Yanbo Liang
7052fb37bd [Dynamo] Improve handling UnspecializedNNModuleVariable side effect (#101141)
Fixes #101102

Pull Request resolved: https://github.com/pytorch/pytorch/pull/101141
Approved by: https://github.com/jansel
2023-05-16 03:57:13 +00:00
PyTorch MergeBot
d0db7d624d Revert "[dynamo] Activation checkpointing as higher order op (#101028)"
This reverts commit de15e740a1.

Reverted https://github.com/pytorch/pytorch/pull/101028 on behalf of https://github.com/jeanschmidt due to breaking internal builds ([comment](https://github.com/pytorch/pytorch/pull/101028#issuecomment-1548280970))
2023-05-15 17:47:08 +00:00
Michael Lazos
d75f93603a Flatten exceptions in dynamo (#100779)
Fixes https://github.com/pytorch/pytorch/issues/93571

[before and after](https://gist.github.com/mlazos/256b0e8f0f98495752a22b960e9f4fcb)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100779
Approved by: https://github.com/ezyang
2023-05-13 00:58:57 +00:00
Animesh Jain
de15e740a1 [dynamo] Activation checkpointing as higher order op (#101028)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/101028
Approved by: https://github.com/voznesenskym, https://github.com/zou3519
2023-05-12 03:17:41 +00:00
Jerry Zhang
c3f3cb5b0f [quant][pt2e] Support conv bn fusion in convert step for QAT flow (#100442)
Summary:
This PR adds support for folding bn weights into conv for QAT flow, this is equivalent
to the QAT branch of `from_float` in eager mode quantized conv module: https://github.com/pytorch/pytorch/blob/main/torch/ao/nn/quantized/modules/conv.py#L223

Items that needs followup:
* there are some workaround I did because quantize_per_tensor is using float/int args and dynamo does not support these args, need to fix after we change the quantized model representation and also change these args to Tensor

Test Plan: buck2 test @//mode/opt //caffe2/test:quantization_pt2e -- --exact 'caffe2/test:quantization_pt2e - test_convert_qat_conv_bn_fusion (quantization.pt2e.test_quantize_pt2e.TestQuantizePT2E)'

Reviewed By: andrewor14

Differential Revision: D45344281

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100442
Approved by: https://github.com/kimishpatel
2023-05-09 19:43:51 +00:00
Bin Bao
86ddfc7f68 [inductor] Move cpp wrapper trigger logic to inner_compile (#100611)
Summary: This enables cpp wrapper for backward as well.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100611
Approved by: https://github.com/jansel
2023-05-08 15:24:02 +00:00
Animesh Jain
3f025c607c summarize graph breaks (#100696)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/100696
Approved by: https://github.com/yanboliang
2023-05-05 22:27:47 +00:00
Edward Z. Yang
ce1ad1c143 Add load_storage (#100519)
This adds a new operator debugprims::load_storage which does the unusual thing of loading a tensor from disk (via ContentStoreReader). This will be used in a later PR to implement delta debugging in the minifier, even when the repro is too big to fit into memory. The way it works is that you specify a name of the tensor you want to load, as well as enough metadata to reconstruct the tensor, if the store isn't available. If there is an active content store, we read and return the tensor from that store; otherwise we use `rand_strided` to create it.

I needed some infra improvements to do this:

* `custom_op` now supports factory functions. Factory functions have to be registered specially via `impl_factory`
* I modified `clone_input` to also support dtype conversion, which I use to change the dtype of a loaded tensor if necessary.
* ContentStore needs to work with a device argument, so we torch.load directly to the correct device. This is for fake tensor support.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100519
Approved by: https://github.com/zou3519, https://github.com/anijain2305
2023-05-05 05:25:03 +00:00
Animesh Jain
8994d9e610 [dynamo] Hide guard_fail_hook behind a flag to improve cache lookup time (+10% DebertaV2) (#100590)
For TorchDynamo eager backend, DebertaV2 speedup improves from 0.77x to 0.87x.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100590
Approved by: https://github.com/voznesenskym, https://github.com/wconstab
2023-05-04 18:52:21 +00:00
Edward Z. Yang
c7e9f40653 Misc accuracy improvements on minifier (#100447)
The changes:

* Add config knob `same_two_models_use_fp64` for toggling whether or not to use fp64
* Add a test showing that RMSE is superior to atol/rtol
* Add `--strict-accuracy` options, which allows for testing against integral/boolean accuracy.  Regular accuracy by default now ONLY. There's a test which exercises this, it's a little delicate but I had trouble thinking of a good test otherwise.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100447
Approved by: https://github.com/voznesenskym
2023-05-04 02:51:26 +00:00
kshitij12345
8b64dee5d2 [fix] torch_compile_debug don't log with 0 (#100462)
Fixes https://github.com/pytorch/pytorch/issues/99906

Tested locally.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100462
Approved by: https://github.com/mlazos
2023-05-03 08:23:09 +00:00
Richard Zou
984a2397ba Refactor OutputGraph (#99987)
This PR splits OutputGraph into two classes:
- SubgraphTracer (handles FX-tracing)
- OutputGraph (handles Dynamo-specific output graph logic, like
tracking graph inputs, compiling the graph, and executing it).

The motivation behind this is in the next PR up in the stack.
TL;DR is: in order to do higher-order operators, we need nested
SubgraphTracer, one for each level of nesting of the higher-order
operators.

I'm happy to flatten the stack into a single PR, but this separate made
it easier for me to test. Lmk if you want the stack flattened.

Test Plan:
- existing tests

Pull Request resolved: https://github.com/pytorch/pytorch/pull/99987
Approved by: https://github.com/anijain2305, https://github.com/voznesenskym
2023-05-02 17:11:02 +00:00
Michael Voznesensky
aafc6ce8cc Produce constant variables in cases where a SymNode is created with a constant (#100144)
` AOT_DYNAMIC_SHAPES=1 TORCHDYNAMO_DYNAMIC_SHAPES=1  benchmarks/dynamo/huggingface.py --performance  --training --amp --backend eager --disable-cudagraphs --device cuda --only AllenaiLongformerBase --explain`

Looks promising!

Goes from:

Dynamo produced 173 graphs covering 2760 ops with 160 graph breaks (14 unique)

To:

Dynamo produced 6 graphs covering 2298 ops with 15 graph breaks (7 unique)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100144
Approved by: https://github.com/ezyang
2023-05-01 21:32:11 +00:00
Edward Z. Yang
2d8deffc1e Refactor repro/minifier into CLI; add analyze (#100226)
This is a two part PR; I can split it if you really want me to.

The first part is a refactor of the after aot repro/minifier scripts to come with a command line interface. I maintain exact BC with the previous interface (so, e.g., you still get a repro.py and a run_minifier.py that do the same thing as before), but each of these scripts also take command line arguments now which you can use to customize what actually happens. Check `run_repro` for full documentation on the arguments.

The second part of this is an implementation of `analyze` subcommand on the new CLI for any repro.

<img width="1277" alt="image" src="https://user-images.githubusercontent.com/13564/235045677-8545aab7-5e83-4813-bbec-47783dc60122.png">

This facility is oriented towards accuracy debugging. It does several things:

1. It will run your model twice and check for nondeterminism in inductor/float64, *even* on intermediate inputs (our benchmarking nondeterminism test only checks for nondeterminism on the final output). This makes localizing which operator is nondeterministic easy.
2. It will run your compiled model side-by-side with eager and float64 variants, and then report when things diverge too far from RMSE delta from float64.

Importantly, it does all this without requiring every intermediate to be held in memory (which will cause an OOM on large repros, such as the one I tested this on.)

Some other minor improvements:

* MinifierTestBase now has an easy to comment out spot that you can use to retain the temporary directory; good for debugging
* We print "running minifier" and "running repro" in MinifierTestBase to make it easier to orient where logs are coming from
* same takes a `log_error` optional argument which you can use to reroute the error logs when things mismatch
* counters["inductor"]["intermediate_hooks"] tracks the number of intermediate hooks we've codegen'ed; good for populate the tqdm interface
* torch.fx.interpreter gets an official `boxed_run` interface which uses the boxed arguments calling convention and doesn't retain inputs unnecessarily long
* torch.utils._content_store gets compute_tensor_metadata/read_tensor_metadata helper functions for computing tensor information without serializing it

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100226
Approved by: https://github.com/bertmaher, https://github.com/bdhirsh, https://github.com/anijain2305
2023-05-01 11:12:38 +00:00
PyTorch MergeBot
89c43f4108 Revert "Produce constant variables in cases where a SymNode is created with a constant (#100144)"
This reverts commit d7bdfd3454.

Reverted https://github.com/pytorch/pytorch/pull/100144 on behalf of https://github.com/ezyang due to ci failure is real ([comment](https://github.com/pytorch/pytorch/pull/100144#issuecomment-1529587039))
2023-05-01 11:10:48 +00:00
Michael Voznesensky
d7bdfd3454 Produce constant variables in cases where a SymNode is created with a constant (#100144)
` AOT_DYNAMIC_SHAPES=1 TORCHDYNAMO_DYNAMIC_SHAPES=1  benchmarks/dynamo/huggingface.py --performance  --training --amp --backend eager --disable-cudagraphs --device cuda --only AllenaiLongformerBase --explain`

Looks promising!

Goes from:

Dynamo produced 173 graphs covering 2760 ops with 160 graph breaks (14 unique)

To:

Dynamo produced 6 graphs covering 2298 ops with 15 graph breaks (7 unique)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100144
Approved by: https://github.com/ezyang
2023-04-30 17:13:57 +00:00
Animesh Jain
03806eddbf [dynamo] Compile torchvision augmentations (#100292)
Resolves https://github.com/pytorch/pytorch/issues/100112

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100292
Approved by: https://github.com/jansel
2023-04-29 02:59:41 +00:00
Larry Liu
f5853342ea [dynamo][numpy] Handle return value being numpy ndarray (#99560)
On top of #95849 this PR is trying to handle the special case when dealing with numpy.

Consider the following example:

```
def f(x: torch.Tensor) -> np.ndarray:
	a = x.numpy()
	return a.T
```
In previous PR this will error out because we translate `a.T` to be a method call on `torch_np.ndarray.T` which is also a `torch_np.ndarray`.

This PR handles this case, by conditionally converting a `torch_np.ndarray` to `np.ndarray` before returning, to match the original behavior.

The compiled version will be:

```
def f(x):
    ___tmp_0 = __compiled_fn_0(x)
    if isinstance(___tmp_0, torch_np.ndarray):
        return ___tmp_0.tensor.numpy()
    else:
        return ___tmp_0
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/99560
Approved by: https://github.com/jansel, https://github.com/yanboliang
2023-04-27 16:18:35 +00:00
Larry Liu
687afeb686 [dynamo][numpy] Add NumpyTensorVariable to translate ndarray attribute calls to tensor attributes (#95849)
Issue: #93684

# Problem

Reduce graph breaks when dynamo compiles python functions containing numpy functions and ndarray operations.

# Design (as I know it)

* Use torch_np.ndarray(a wrapper of tensor) to back a `VariableTracker`: `NumpyTensorVariable`.
* Translate all attributes and methods calls, on ndarray, to torch_np.ndarray equivalent.

This PR adds `NumpyTensorVariable` and supports:
1.  tensor to ndarray, ndarray to tensor
2. numpy functions such as numpy.meshgrid()
3. ndarray attributes such as `itemsize`, `stride`

Next PR will handle returning `np.ndarray` and add support for ndarray methods
Pull Request resolved: https://github.com/pytorch/pytorch/pull/95849
Approved by: https://github.com/ezyang
2023-04-27 16:18:35 +00:00
Animesh Jain
3dcc7b396c [easy] iterate dict with sorted keys for accuracy checking (#99793)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99793
Approved by: https://github.com/jansel
2023-04-24 21:26:35 +00:00
Edward Z. Yang
f602b3a6ae Preserve mark_dynamic when cloning inputs (#99617)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/99617
Approved by: https://github.com/ngimel, https://github.com/voznesenskym, https://github.com/anijain2305
2023-04-22 19:46:31 +00:00
Michael Voznesensky
0ac0d9d224 Pass locals to enum_repr to correctly make the guard str for enums (#99680)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99680
Approved by: https://github.com/jansel
2023-04-21 07:14:49 +00:00
Yanbo Liang
05809c7d3b [Dynamo] No graph break for explicit calling Conv{1/2/3}d.forward & ConvTranspose{1/2/3}d.forward (#99015)
Before this PR, if users call ```Conv2d(x)```, dynamo handles it well(no graph break) and puts a ```call_module``` op in the FX graph. However, if users explicitly call ```Conv2d.forward(x)``` in another ```forward``` function, the inlining would be failed(caused graph break). This PR fixed this issue by translating the explicit ```Conv2d.forward(x)``` to ```Conv2d(x)```.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/99015
Approved by: https://github.com/jansel, https://github.com/wconstab
2023-04-15 08:04:13 +00:00
Michael Voznesensky
10fbdcf72c Re-PR of 90269 - Force all nn_module associated tensors to be static (#99108)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99108
Approved by: https://github.com/ezyang
2023-04-14 05:53:48 +00:00
Angela Yi
1d077f28ed [export] Constraints API (#98433)
Wrapper for users to insert constraints into model code.

The constraints will not be maintained in the graph after tracing through make_fx so retracing with dynamo/make_fx will not work. This will be supported after torch._assert supported is implemented. Then we can convert the constrain_range calls to torch._asserts.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/98433
Approved by: https://github.com/avikchaudhuri, https://github.com/tugsbayasgalan
2023-04-13 21:20:10 +00:00
PyTorch MergeBot
ab761605ae Revert "[export] Constraints API (#98433)"
This reverts commit 1510eb4072.

Reverted https://github.com/pytorch/pytorch/pull/98433 on behalf of https://github.com/izaitsevfb due to Breaks internal tests, asked by author to revert
2023-04-12 23:37:19 +00:00
PyTorch MergeBot
629377ea8b Revert "Replace _dynamo.config with an object instead of module (#96455)"
This reverts commit 420104a886.

Reverted https://github.com/pytorch/pytorch/pull/96455 on behalf of https://github.com/jansel due to BC breaking, was landed prematurely
2023-04-12 15:06:14 +00:00