Commit Graph

528 Commits

Nikita Karetnikov
c4a6f86062 [pt2] add metas for max_unpool2d and max_unpool3d (#103821)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/103821
Approved by: https://github.com/Skylion007, https://github.com/Chillee
2023-07-01 01:33:35 +00:00
PyTorch MergeBot
4de1ee6ba4 Revert "Value range refinement using multi-variate expressions. (#97964)"
This reverts commit 2642412207.

Reverted https://github.com/pytorch/pytorch/pull/97964 on behalf of https://github.com/huydhn due to Sorry for reverting your PR, but it is breaking an internal test ([comment](https://github.com/pytorch/pytorch/pull/97964#issuecomment-1615194524))
2023-06-30 21:08:05 +00:00
PyTorch MergeBot
a2a8b4d415 Revert "Turn translation validation on for tests and accuracy runs by default. (#103611)"
This reverts commit e311bed2a8.

Reverted https://github.com/pytorch/pytorch/pull/103611 on behalf of https://github.com/malfet due to Broke inductor tests ([comment](https://github.com/pytorch/pytorch/pull/103611#issuecomment-1614850276))
2023-06-30 15:54:18 +00:00
Yukio Siraichi
2642412207 Value range refinement using multi-variate expressions. (#97964)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/97964
Approved by: https://github.com/ezyang
2023-06-30 01:32:22 +00:00
Yukio Siraichi
ffb526a2e4 Value range refinement using uni-variate expressions. (#97963)
This PR introduces value range refinement of shape symbols by symbolically evaluating the
value range of the involved guards. This should help `_maybe_evaluate_static` to eliminate
more guards.

This is a stack of PRs created from the discussion on: #96616.

In summary, this PR:
- simplifies `FloorDiv` nodes on the left-hand side of an expression so as to isolate a
symbol in the numerator
- tries to match the expression against the form: `<symbol> <relop> <expr>`
- uses the matched expression to refine the value range of `<symbol>` from the range
of `<expr>` (see the sketch below)
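
For illustration, a minimal sketch of the refinement idea, using sympy directly; `range_of` and `refine_from_guard` are hypothetical helper names, not the ones used in this PR:

```python
# Minimal sketch: given a guard already isolated into the form
# `<symbol> <relop> <expr>`, tighten the known range of <symbol>
# using the range of <expr>.
import sympy

def range_of(expr, ranges):
    # Corner-point evaluation of `expr` under per-symbol (lo, hi) bounds.
    # Valid for the monotone expressions used here; the real code does
    # proper interval arithmetic.
    lo = expr.subs({s: b[0] for s, b in ranges.items()})
    hi = expr.subs({s: b[1] for s, b in ranges.items()})
    return sympy.Min(lo, hi), sympy.Max(lo, hi)

def refine_from_guard(guard, ranges):
    # Handle the matched form `s <= <expr>`: shrink s's upper bound.
    if isinstance(guard, sympy.Le) and guard.lhs.is_Symbol:
        s, rhs = guard.lhs, guard.rhs
        lo, hi = ranges[s]
        _, rhs_hi = range_of(rhs, ranges)
        ranges[s] = (lo, sympy.Min(hi, rhs_hi))
    return ranges

s0, s1 = sympy.symbols("s0 s1", integer=True, positive=True)
ranges = {s0: (2, sympy.oo), s1: (2, 100)}
print(refine_from_guard(s0 <= 2 * s1, ranges))  # {s0: (2, 200), s1: (2, 100)}
```
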
Pull Request resolved: https://github.com/pytorch/pytorch/pull/97963
Approved by: https://github.com/ezyang
2023-06-30 01:32:22 +00:00
Yukio Siraichi
e311bed2a8 Turn translation validation on for tests and accuracy runs by default. (#103611)
This PR turns translation validation on by default for tests and accuracy benchmark
runs. It also installs Z3 on CI.

The main changes are:

- Add `--no-translation-validation` as an option in _test/run_tests.py_
    - Set the `PYTORCH_TEST_WITH_TV` environment variable
- Add a `TEST_WITH_TV` variable in _torch/testing/_internal/common_utils.py_ (sketched below)
- Turn translation validation on for accuracy benchmarks in _benchmarks/dynamo/common.py_
- Add Z3 installation to CI scripts
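
A rough illustration of the plumbing in the first three bullets; this is a hypothetical sketch, not the actual _run_tests.py_ / _common_utils.py_ code:

```python
# Hypothetical sketch of the env-var plumbing described above.
import argparse
import os

# Runner side: expose --no-translation-validation and export the env var.
parser = argparse.ArgumentParser()
parser.add_argument("--no-translation-validation", action="store_true")
args = parser.parse_args([])  # empty argv just for this sketch
if not args.no_translation_validation:
    os.environ["PYTORCH_TEST_WITH_TV"] = "1"

# Test side (common_utils-style): read the env var into a module-level flag
# that the suite consults to turn translation validation on.
TEST_WITH_TV = os.getenv("PYTORCH_TEST_WITH_TV") == "1"
print(f"translation validation enabled: {TEST_WITH_TV}")
```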

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103611
Approved by: https://github.com/ezyang
2023-06-30 01:32:21 +00:00
Brian Hirsh
875f60399e pre_dispatch tracing: support autocast and no_grad/enable_grad ctx managers, add a pre_dispatch_eager dynamo backend (#103024)
This PR adds support for `enable_grad`/`no_grad`/`autocast` context managers getting properly traced in `pre_dispatch` tracing. The stuff in this PR includes:
- I added a torch function mode that runs during make_fx pre_dispatch tracing, `ProxyTorchFunctionMode`. It directly intercepts the torch ops that run during the above context managers, and adds them to the current graph instead of executing them
- `enable_grad` and `no_grad` currently desugar into `torch._C.set_grad_enabled(bool)`, but this API isn't currently overridable by torch function, so I added the ability to interpose there
- the `torch.amp` context managers don't currently have a nice equivalent, like `set_autocast_enabled(state)`, so I ended up adding two new APIs: `torch.amp._set_autocast_enabled` and `torch.amp._set_autocast_disabled`. If you look at how the context manager is implemented, it ends up calling several different state-changing functions, some of which depend on the backend - so I figured that it would be cleaner just to add a new API (that should probably only be used by tracing) - but open to feedback
- I added a new dynamo backend, `compile(backend="pre_dispatch_eager")`; a usage sketch follows below. When pre_dispatch tracing becomes always-on in inductor, it will be another potential surface for bugs. I also added a test file for it (`test/dynamo/test_pre_dispatch.py`).
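
A small usage sketch of that backend, relying only on what the bullets above state:

```python
# Sketch: run a function through the new pre_dispatch_eager dynamo backend.
import torch

def fn(x):
    with torch.no_grad():  # grad-mode ctx manager now handled in pre_dispatch tracing
        y = x + 1
    return y * 2

compiled = torch.compile(fn, backend="pre_dispatch_eager")
print(compiled(torch.randn(4, requires_grad=True)))
```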

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103024
Approved by: https://github.com/ezyang
2023-06-29 14:17:42 +00:00
Nikita Karetnikov
e9705c52ac [pt2] add metas for _pdist_forward and _pdist_backward (#103817)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/103817
Approved by: https://github.com/ezyang
2023-06-22 11:18:05 +00:00
Nikita Karetnikov
e48851033a [pt2] add metas for pad ops (#103815)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/103815
Approved by: https://github.com/ezyang
2023-06-22 11:18:05 +00:00
Brian Hirsh
c3c03e7cb8 Reland of https://github.com/pytorch/pytorch/pull/101818 (#103888)
The original PR broke internal builds.

This reverts commit 5ed618132f.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103888
Approved by: https://github.com/albanD
2023-06-21 21:00:56 +00:00
Peter Bell
8b418f197c [decomp] Add decomposition for torch.renorm (#103858)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/103858
Approved by: https://github.com/ezyang, https://github.com/nkaretnikov
2023-06-21 20:57:43 +00:00
Peter Bell
a61096fb94 [decomp] Decompose logaddexp2 (#103765)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/103765
Approved by: https://github.com/Chillee
2023-06-21 20:16:24 +00:00
PyTorch MergeBot
7b6dc72ffa Revert "[decomp] Decompose logaddexp2 (#103765)"
This reverts commit bab21d20eb.

Reverted https://github.com/pytorch/pytorch/pull/103765 on behalf of https://github.com/ezyang due to looks like land race ([comment](https://github.com/pytorch/pytorch/pull/103765#issuecomment-1599030496))
2023-06-20 15:35:02 +00:00
Peter Bell
bab21d20eb [decomp] Decompose logaddexp2 (#103765)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/103765
Approved by: https://github.com/Chillee
2023-06-20 09:24:21 +00:00
Richard Zou
27a67d8699 Refactor and improve make_fx testing (#103196)
This is in preparation for the custom_op_compile_check utility, which
will call the newly refactored function.

This PR:
- splits off code into helper functions
- adds clearer error messages
- stops updating the inputs destructively (leading to slightly slower
tests)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/103196
Approved by: https://github.com/bdhirsh, https://github.com/soulitzer
2023-06-14 14:00:12 +00:00
Nikita Karetnikov
d38b651d51 [pt2] add SymInt support for cosine_similarity (#103400)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/103400
Approved by: https://github.com/ezyang, https://github.com/Skylion007
2023-06-13 21:23:48 +00:00
Nikita Karetnikov
c07634436e [pt2] add SymInt support for bilinear (#103396)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/103396
Approved by: https://github.com/ezyang
2023-06-13 21:23:48 +00:00
Nikita Karetnikov
4a76fb49f3 [pt2] add metas for avg_pool3d and avg_pool3d_backward (#103392)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/103392
Approved by: https://github.com/ezyang
2023-06-13 21:23:46 +00:00
PyTorch MergeBot
5ed618132f Revert "change pre_autograd to pre_dispatch tracing (#101818)"
This reverts commit b0392de2c3.

Reverted https://github.com/pytorch/pytorch/pull/101818 on behalf of https://github.com/izaitsevfb due to Breaks internal builds see D46629736 TypeError: wrap_key() got an unexpected keyword argument pre_autograd ([comment](https://github.com/pytorch/pytorch/pull/101818#issuecomment-1587837667))
2023-06-12 18:16:37 +00:00
Nikita Karetnikov
2b3d955ffd [pt2] add meta and SymInt support for linalg_matrix_exp (#102945)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/102945
Approved by: https://github.com/lezcano
2023-06-09 22:45:16 +00:00
Brian Hirsh
b0392de2c3 change pre_autograd to pre_dispatch tracing (#101818)
We discussed in a composability meeting a few weeks ago that `pre_autograd` should probably be renamed to `pre_dispatch`.

One question in this PR was: should I re-use a dispatch key? Or should I create a new dispatch key (that yet again corresponds to "top of the dispatcher")?

~~For now, I ended up sticking our proxy mode on the mode stack corresponding to `PythonTLSSnapshot`, because it was simple and it works. It looks like one of the functorch dispatch keys has higher priority though, so it's possible that functorch will end up running first. Open to options, but we can consider adding a new dispatch key later if that becomes a problem~~

Update: I added a dedicated dispatch key, `PreDispatch`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/101818
Approved by: https://github.com/ezyang, https://github.com/Neilblaze, https://github.com/albanD, https://github.com/zou3519
2023-06-09 17:30:15 +00:00
Nikita Karetnikov
1fcc67fd8c [pt2] add SymInt support for linalg.tensorsolve (#102466)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/102466
Approved by: https://github.com/Skylion007, https://github.com/lezcano
2023-06-06 08:06:55 +00:00
Nikita Karetnikov
ec0aa965da [pt2] add meta for _linalg_solve_ex (#102454)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/102454
Approved by: https://github.com/lezcano
2023-06-06 08:06:55 +00:00
Nikita Karetnikov
4bda4a7e4d [pt2] add meta for lu_unpack (#102937)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/102937
Approved by: https://github.com/lezcano
2023-06-06 08:06:53 +00:00
Nikita Karetnikov
6ac3352a37 [pt2] add meta for _linalg_slogdet (#102464)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/102464
Approved by: https://github.com/ezyang
2023-06-05 03:17:08 +00:00
Nikita Karetnikov
757791d1e3 [pt2] add SymInt support for linalg.vander (#102469)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/102469
Approved by: https://github.com/Skylion007, https://github.com/lezcano
2023-06-04 09:58:02 +00:00
Edward Z. Yang
8bbef821c3 Add some unit tests from cm3leon involving repeat_interleave (#102733)
These actually were fixed by https://github.com/pytorch/pytorch/pull/102570
but that PR doesn't test guard-freeness, so here you go.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/102733
Approved by: https://github.com/zou3519
2023-06-02 15:35:35 +00:00
PyTorch MergeBot
95cdd58c8f Revert "[pt2] add SymInt support for linalg.tensorsolve (#102466)"
This reverts commit b1b76f614d.

Reverted https://github.com/pytorch/pytorch/pull/102466 on behalf of https://github.com/clee2000 due to reverting b/c stack https://github.com/pytorch/pytorch/pull/102469#issuecomment-1569041604, i think this is the one that actually causes the test to fail ([comment](https://github.com/pytorch/pytorch/pull/102466#issuecomment-1569045123))
2023-05-30 20:26:46 +00:00
PyTorch MergeBot
463df86ce8 Revert "[pt2] add SymInt support for linalg.vander (#102469)"
This reverts commit 05717895aa.

Reverted https://github.com/pytorch/pytorch/pull/102469 on behalf of https://github.com/clee2000 due to broke test_aotdispatch on linux ex 05717895aa https://github.com/pytorch/pytorch/actions/runs/5125654882/jobs/9219389448, shows up as green on pr due to bug with keep-going flag and reruns ([comment](https://github.com/pytorch/pytorch/pull/102469#issuecomment-1569041604))
2023-05-30 20:24:26 +00:00
Nikita Karetnikov
05717895aa [pt2] add SymInt support for linalg.vander (#102469)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/102469
Approved by: https://github.com/Skylion007, https://github.com/lezcano
2023-05-30 19:50:16 +00:00
Nikita Karetnikov
b1b76f614d [pt2] add SymInt support for linalg.tensorsolve (#102466)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/102466
Approved by: https://github.com/Skylion007, https://github.com/lezcano
2023-05-30 19:50:15 +00:00
Nikita Karetnikov
0ba81ce8fe [pt2] add SymInt support for linalg.tensorinv (#102465)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/102465
Approved by: https://github.com/Skylion007, https://github.com/lezcano
2023-05-30 19:50:14 +00:00
Nikita Karetnikov
995ac703cd [pt2] add SymInt support for linalg.pinv (#102367)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/102367
Approved by: https://github.com/lezcano
2023-05-27 11:10:47 +00:00
PyTorch MergeBot
da3aba1e46 Revert "[pt2] add SymInt support for linalg.pinv (#102367)"
This reverts commit 0d5b74da0c.

Reverted https://github.com/pytorch/pytorch/pull/102367 on behalf of https://github.com/kit1980 due to Broke slow tests https://github.com/pytorch/pytorch/actions/runs/5095190248/jobs/9160028124 ([comment](https://github.com/pytorch/pytorch/pull/102367#issuecomment-1565104562))
2023-05-27 00:33:42 +00:00
Nikita Karetnikov
0d5b74da0c [pt2] add SymInt support for linalg.pinv (#102367)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/102367
Approved by: https://github.com/lezcano
2023-05-26 15:20:34 +00:00
vfdev-5
e3d97b6213 [inductor] Added smooth_l1_loss refs (#102077)
Added `smooth_l1_loss` to refs + tests
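
For context, a self-contained sketch of the decomposition such a ref follows (the standard smooth L1 formula; not necessarily the exact code added here):

```python
import torch
import torch.nn.functional as F

def smooth_l1_loss_ref(input, target, beta=1.0, reduction="mean"):
    # Quadratic for |input - target| < beta, linear beyond.
    d = torch.abs(input - target)
    loss = torch.where(d < beta, 0.5 * d * d / beta, d - 0.5 * beta)
    if reduction == "mean":
        return loss.mean()
    if reduction == "sum":
        return loss.sum()
    return loss

x, y = torch.randn(8), torch.randn(8)
print(torch.allclose(smooth_l1_loss_ref(x, y), F.smooth_l1_loss(x, y)))  # True
```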

Pull Request resolved: https://github.com/pytorch/pytorch/pull/102077
Approved by: https://github.com/lezcano, https://github.com/ngimel
2023-05-24 15:07:08 +00:00
Edward Z. Yang
3318a832b3 Tighten FakeTensor reentrancy asserts, add debugging (#102091)
When investigating failures in https://github.com/pytorch/pytorch/pull/100017 I realized that we were reentering FakeTensorMode even though there was already one on the stack. Although we have attempted to assert against these cases in the past, e.g., in https://github.com/pytorch/pytorch/pull/97186, it seems that the existing protections were insufficient.

In this particular case, the reapplication of FakeTensorMode was due to an interaction with NotImplemented multiple dispatch handling. If proxy tensor mode detects an unrecognized tensor type (this includes FakeTensor, if it is not tracked with a proxy), it will return NotImplemented to give this tensor a chance to unpack itself into a proxyable operation. However, this is never the right thing for FakeTensor, where no unpacking is possible. Today, however, FakeTensor attempts to reapply the FakeTensorMode, resulting in FakeTensorMode being on the stack twice.

This PR does a number of things:

* It adds an assert in `FakeTensorMode.__torch_dispatch__` that you must not already have this mode on the stack; this is ALWAYS an error
* It modifies `FakeTensor.__torch_dispatch__` to return `NotImplemented` if the mode is already active. This prevents us from re-adding the mode to the stack (see the toy sketch after this list)
* It adds a new logging artifact `not_implemented` which you can use to get debug logs about all of the times a `__torch_dispatch__` handler returned NotImplemented and why it did so. Your subclass has to manually opt into this logging, but I inserted the necessary logs for ProxyTensorMode and FakeTensor(Mode)
* `with fake_mode` now no-ops if the fake mode is already on the stack, which is what users want anyway
* I am BREAKING pre-autograd tracing, because it is currently doing something weird with the original C++ mode stack. Brian is going to follow up with a fix next week.
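
A toy illustration of the "`with fake_mode` no-ops if already on the stack" behavior, using a plain `TorchDispatchMode` subclass rather than the actual FakeTensorMode code:

```python
import torch
from torch.utils._python_dispatch import TorchDispatchMode

class ToyMode(TorchDispatchMode):
    # Toy: re-entering the same mode is a no-op instead of a double push.
    def __init__(self):
        super().__init__()
        self._depth = 0

    def __enter__(self):
        self._depth += 1
        if self._depth == 1:  # only push onto the dispatch mode stack once
            super().__enter__()
        return self

    def __exit__(self, *exc):
        self._depth -= 1
        if self._depth == 0:
            return super().__exit__(*exc)
        return False

    def __torch_dispatch__(self, func, types, args=(), kwargs=None):
        return func(*args, **(kwargs or {}))

m = ToyMode()
with m:
    with m:  # inner `with` no-ops; the mode is never on the stack twice
        torch.ones(2) + 1
```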

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/102091
Approved by: https://github.com/thiagocrepaldi, https://github.com/eellison, https://github.com/wanchaol, https://github.com/bdhirsh
2023-05-24 05:37:51 +00:00
Nikita Karetnikov
e79d9b9938 [pt2] add SymInt support for linalg.matrix_power (#101940)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/101940
Approved by: https://github.com/lezcano, https://github.com/ezyang
2023-05-24 00:21:52 +00:00
Nikita Karetnikov
42b974e8f7 [pt2] add meta for linalg_lu_solve (#101836)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/101836
Approved by: https://github.com/lezcano
2023-05-24 00:21:50 +00:00
Khushi
1aaf0396eb [reland][opinfo] empty_strided (#101782)
Follows #100223

Previous PR: #100890

Pull Request resolved: https://github.com/pytorch/pytorch/pull/101782
Approved by: https://github.com/ezyang
2023-05-19 03:06:29 +00:00
drisspg
6f13d6892a Add meta support for multinomial (#101324)
# Summary
Found this when trying to compile the text gen loop of nanogpt here: b33289942b/torchbenchmark/models/nanogpt_generate/model.py (L322)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/101324
Approved by: https://github.com/ngimel
2023-05-19 00:04:26 +00:00
Angela Yi
72a73ef67b Add aten.searchsorted.Tensor meta kernel (#101637)
Test Plan: CI

Differential Revision: D45933187

Pull Request resolved: https://github.com/pytorch/pytorch/pull/101637
Approved by: https://github.com/ezyang
2023-05-18 06:55:11 +00:00
PyTorch MergeBot
dfac4364c4 Revert "[opinfo] empty_strided (#100890)"
This reverts commit 01c7106580.

Reverted https://github.com/pytorch/pytorch/pull/100890 on behalf of https://github.com/PaliC due to broke test_ops.py slow test ([comment](https://github.com/pytorch/pytorch/pull/100890#issuecomment-1551903975))
2023-05-17 19:00:15 +00:00
ydwu4
326a4cc815 Support map autograd and pytree in/out. (#101633)
Rebased https://github.com/pytorch/pytorch/pull/100494 and added dummy AOTConfig.

This PR adds autograd and pytree support for the map operator.

Implementation-wise:

1. We temporarily make two HigherOrderOperators, "map" and "map_impl":
- "map" is user-facing. Currently, it unwraps the pytrees in the inputs and creates a flat_fn for them. Dynamo currently cannot deal with pytree.tree_flatten and pytree.tree_unflatten, so we make it a HigherOrderOperator to trigger dynamo's logic for handling HigherOrderOperators.
- "map_impl" is the actual operator that works with the rest of the torch subsystems such as functionalization and make_fx. It accepts flattened arguments, and a num_mapped_args integer denoting how many of the flattened arguments need to be mapped, i.e. their first dimension will be unstacked.

2. We create the forward and backward graph in the autograd key and call torch.autograd.Function. Currently, the backward graph is recomputation-based and we need to partition the joint graph in the future to be more efficient.

Example traced graphs for map operators:
### Case 1: simple f and autograd
```python
def f(x, y):
    return x + y

def g(xs, y):
    out = control_flow.map(f, xs, y)
    return torch.autograd.grad(out, (xs, y), torch.ones_like(out))

gm = make_fx(g, tracing_mode="symbolic")(torch.ones(3, 4, 5, requires_grad=True), torch.ones(5, requires_grad=True))
# gm.print_readable() produces following:
class g(torch.nn.Module):
    def forward(self, xs_1: f32[3, s1, s2], y_1: f32[s2]):
        # No stacktrace found for following nodes
        body_graph_0 = self.body_graph_0
        map_impl = torch.ops.map_impl(body_graph_0, 1, xs_1, y_1);  body_graph_0 = None
        getitem: f32[3, s1, s2] = map_impl[0];  map_impl = None
        ones_like: f32[3, s1, s2] = torch.ops.aten.ones_like.default(getitem, pin_memory = False)
        is_same_size = torch.ops.aten.is_same_size.default(getitem, ones_like);  getitem = None
        body_graph_1 = self.body_graph_1
        map_impl_1 = torch.ops.map_impl(body_graph_1, 2, xs_1, ones_like, y_1);  body_graph_1 = xs_1 = ones_like = None
        getitem_1 = map_impl_1[0]
        getitem_2: f32[3, s1, s2] = map_impl_1[1]
        getitem_3: f32[3, s2] = map_impl_1[2];  map_impl_1 = None
        sum_1: f32[1, s2] = torch.ops.aten.sum.dim_IntList(getitem_3, [0], True);  getitem_3 = None
        sym_size: Sym(s2) = torch.ops.aten.sym_size(y_1, 0);  y_1 = None
        view: f32[s2] = torch.ops.aten.view.default(sum_1, [sym_size]);  sum_1 = sym_size = None
        return (getitem_2, view)

    class <lambda>(torch.nn.Module):
        def forward(self, arg0_1, arg1_1: f32[s1, s2], arg2_1: f32[s2]):
            # No stacktrace found for following nodes
            add: f32[s1, s2] = torch.ops.aten.add.Tensor(arg1_1, arg2_1);  arg1_1 = arg2_1 = None
            return [add]

    class <lambda>(torch.nn.Module):
        def forward(self, arg0_1, arg1_1: f32[s1, s2], arg2_1: f32[s1, s2], arg3_1: f32[s2]):
            # No stacktrace found for following nodes
            add: f32[s1, s2] = torch.ops.aten.add.Tensor(arg1_1, arg3_1);  arg1_1 = None
            is_same_size = torch.ops.aten.is_same_size.default(add, arg2_1);  add = None
            sum_1: f32[1, s2] = torch.ops.aten.sum.dim_IntList(arg2_1, [0], True)
            sym_size: Sym(s2) = torch.ops.aten.sym_size(arg3_1, 0);  arg3_1 = None
            view: f32[s2] = torch.ops.aten.view.default(sum_1, [sym_size]);  sum_1 = sym_size = None
            return [None, arg2_1, view]
```
### Case 2: list input/output f and autograd
```python
def f(x, y):
    return [x[0].cos() + y.sin(), x[1].sin() * y.cos()]

def g(xs, y):
    out = control_flow.map(f, xs, y)
    flat_out, _ = pytree.tree_flatten(out)
    flat_inp, _ = pytree.tree_flatten((xs, y))
    requires_grad_inp = [inp for inp in flat_inp if inp.requires_grad]
    return torch.autograd.grad(flat_out, requires_grad_inp, [torch.ones_like(out) for out in flat_out])

gm = make_fx(g, tracing_mode="symbolic")(
    [torch.ones(3, 4, 5), torch.ones(3, 4, 5, requires_grad=True)],
    torch.ones(5, requires_grad=True))

# gm.print_readable() produces following:
class g(torch.nn.Module):
    def forward(self, xs, y):
        xs_1: f32[3, s1, s2], xs_2: f32[3, s1, s2], y_1: f32[s2], = fx_pytree.tree_flatten_spec([xs, y], self._in_spec)
        # No stacktrace found for following nodes
        body_graph_0 = self.body_graph_0
        map_impl = torch.ops.map_impl(body_graph_0, 2, xs_1, xs_2, y_1);  body_graph_0 = None
        getitem: f32[3, s1, s2] = map_impl[0]
        getitem_1: f32[3, s1, s2] = map_impl[1];  map_impl = None
        ones_like: f32[3, s1, s2] = torch.ops.aten.ones_like.default(getitem, pin_memory = False)
        ones_like_1: f32[3, s1, s2] = torch.ops.aten.ones_like.default(getitem_1, pin_memory = False)
        is_same_size = torch.ops.aten.is_same_size.default(getitem, ones_like);  getitem = None
        is_same_size_1 = torch.ops.aten.is_same_size.default(getitem_1, ones_like_1);  getitem_1 = None
        body_graph_1 = self.body_graph_1
        map_impl_1 = torch.ops.map_impl(body_graph_1, 4, xs_1, xs_2, ones_like, ones_like_1, y_1);  body_graph_1 = xs_1 = xs_2 = ones_like = ones_like_1 = None
        getitem_2 = map_impl_1[0]
        getitem_3 = map_impl_1[1]
        getitem_4: f32[3, s1, s2] = map_impl_1[2]
        getitem_5: f32[3, s2] = map_impl_1[3];  map_impl_1 = None
        sum_1: f32[1, s2] = torch.ops.aten.sum.dim_IntList(getitem_5, [0], True);  getitem_5 = None
        sym_size: Sym(s2) = torch.ops.aten.sym_size(y_1, 0);  y_1 = None
        view: f32[s2] = torch.ops.aten.view.default(sum_1, [sym_size]);  sum_1 = sym_size = None
        return pytree.tree_unflatten([getitem_4, view], self._out_spec)

    class <lambda>(torch.nn.Module):
        def forward(self, arg0_1, arg1_1: f32[s1, s2], arg2_1: f32[s1, s2], arg3_1: f32[s2]):
            # No stacktrace found for following nodes
            cos: f32[s1, s2] = torch.ops.aten.cos.default(arg1_1);  arg1_1 = None
            sin: f32[s2] = torch.ops.aten.sin.default(arg3_1)
            add: f32[s1, s2] = torch.ops.aten.add.Tensor(cos, sin);  cos = sin = None
            sin_1: f32[s1, s2] = torch.ops.aten.sin.default(arg2_1);  arg2_1 = None
            cos_1: f32[s2] = torch.ops.aten.cos.default(arg3_1);  arg3_1 = None
            mul: f32[s1, s2] = torch.ops.aten.mul.Tensor(sin_1, cos_1);  sin_1 = cos_1 = None
            return [add, mul]

    class <lambda>(torch.nn.Module):
        def forward(self, arg0_1, arg1_1: f32[s1, s2], arg2_1: f32[s1, s2], arg3_1: f32[s1, s2], arg4_1: f32[s1, s2], arg5_1: f32[s2]):
            # No stacktrace found for following nodes
            cos: f32[s1, s2] = torch.ops.aten.cos.default(arg1_1);  arg1_1 = None
            sin: f32[s2] = torch.ops.aten.sin.default(arg5_1)
            add: f32[s1, s2] = torch.ops.aten.add.Tensor(cos, sin);  cos = sin = None
            sin_1: f32[s1, s2] = torch.ops.aten.sin.default(arg2_1)
            cos_1: f32[s2] = torch.ops.aten.cos.default(arg5_1)
            mul: f32[s1, s2] = torch.ops.aten.mul.Tensor(sin_1, cos_1)
            is_same_size = torch.ops.aten.is_same_size.default(add, arg3_1);  add = None
            is_same_size_1 = torch.ops.aten.is_same_size.default(mul, arg4_1);  mul = None
            mul_1: f32[s1, s2] = torch.ops.aten.mul.Tensor(arg4_1, sin_1);  sin_1 = None
            mul_2: f32[s1, s2] = torch.ops.aten.mul.Tensor(arg4_1, cos_1);  arg4_1 = cos_1 = None
            sum_1: f32[1, s2] = torch.ops.aten.sum.dim_IntList(mul_1, [0], True);  mul_1 = None
            sym_size: Sym(s2) = torch.ops.aten.sym_size(arg5_1, 0)
            view: f32[s2] = torch.ops.aten.view.default(sum_1, [sym_size]);  sum_1 = None

            #
            sin_2: f32[s2] = torch.ops.aten.sin.default(arg5_1)
            neg: f32[s2] = torch.ops.aten.neg.default(sin_2);  sin_2 = None
            mul_3: f32[s2] = torch.ops.aten.mul.Tensor(view, neg);  view = neg = None
            cos_2: f32[s1, s2] = torch.ops.aten.cos.default(arg2_1);  arg2_1 = None
            mul_4: f32[s1, s2] = torch.ops.aten.mul.Tensor(mul_2, cos_2);  mul_2 = cos_2 = None
            sum_2: f32[1, s2] = torch.ops.aten.sum.dim_IntList(arg3_1, [0], True);  arg3_1 = None
            view_1: f32[s2] = torch.ops.aten.view.default(sum_2, [sym_size]);  sum_2 = sym_size = None
            cos_3: f32[s2] = torch.ops.aten.cos.default(arg5_1);  arg5_1 = None
            mul_5: f32[s2] = torch.ops.aten.mul.Tensor(view_1, cos_3);  view_1 = cos_3 = None
            add_1: f32[s2] = torch.ops.aten.add.Tensor(mul_3, mul_5);  mul_3 = mul_5 = None
            return [None, None, mul_4, add_1]
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/101633
Approved by: https://github.com/zou3519
2023-05-17 16:52:26 +00:00
PyTorch MergeBot
e69198b043 Revert "Support map autograd and pytree in/out (#100494)"
This reverts commit b8fa41be9d.

Reverted https://github.com/pytorch/pytorch/pull/100494 on behalf of https://github.com/PaliC due to breaking tests on trunk, please check hud.pytorch.org for the broken tests ([comment](https://github.com/pytorch/pytorch/pull/100494#issuecomment-1550454835))
2023-05-16 22:50:18 +00:00
ydwu4
b8fa41be9d Support map autograd and pytree in/out (#100494)
This PR adds autograd and pytree support for the map operator.

Implementation-wise:

1. We temporarily make two HigherOrderOperators, "map" and "map_impl":
- "map" is user-facing. Currently, it unwraps the pytrees in the inputs and creates a flat_fn for them. Dynamo currently cannot deal with pytree.tree_flatten and pytree.tree_unflatten, so we make it a HigherOrderOperator to trigger dynamo's logic for handling HigherOrderOperators.
- "map_impl" is the actual operator that works with the rest of the torch subsystems such as functionalization and make_fx. It accepts flattened arguments, and a num_mapped_args integer denoting how many of the flattened arguments need to be mapped, i.e. their first dimension will be unstacked.

2. We create the forward and backward graph in the autograd key and call torch.autograd.Function. Currently, the backward graph is recomputation-based and we need to partition the joint graph in the future to be more efficient.

Example traced graphs for map operators:
### Case 1: simple f and autograd
```python
def f(x, y):
    return x + y

def g(xs, y):
    out = control_flow.map(f, xs, y)
    return torch.autograd.grad(out, (xs, y), torch.ones_like(out))

gm = make_fx(g, tracing_mode="symbolic")(torch.ones(3, 4, 5, requires_grad=True), torch.ones(5, requires_grad=True))
# gm.print_readable() produces following:
class g(torch.nn.Module):
    def forward(self, xs_1: f32[3, s1, s2], y_1: f32[s2]):
        # No stacktrace found for following nodes
        body_graph_0 = self.body_graph_0
        map_impl = torch.ops.map_impl(body_graph_0, 1, xs_1, y_1);  body_graph_0 = None
        getitem: f32[3, s1, s2] = map_impl[0];  map_impl = None
        ones_like: f32[3, s1, s2] = torch.ops.aten.ones_like.default(getitem, pin_memory = False)
        is_same_size = torch.ops.aten.is_same_size.default(getitem, ones_like);  getitem = None
        body_graph_1 = self.body_graph_1
        map_impl_1 = torch.ops.map_impl(body_graph_1, 2, xs_1, ones_like, y_1);  body_graph_1 = xs_1 = ones_like = None
        getitem_1 = map_impl_1[0]
        getitem_2: f32[3, s1, s2] = map_impl_1[1]
        getitem_3: f32[3, s2] = map_impl_1[2];  map_impl_1 = None
        sum_1: f32[1, s2] = torch.ops.aten.sum.dim_IntList(getitem_3, [0], True);  getitem_3 = None
        sym_size: Sym(s2) = torch.ops.aten.sym_size(y_1, 0);  y_1 = None
        view: f32[s2] = torch.ops.aten.view.default(sum_1, [sym_size]);  sum_1 = sym_size = None
        return (getitem_2, view)

    class <lambda>(torch.nn.Module):
        def forward(self, arg0_1, arg1_1: f32[s1, s2], arg2_1: f32[s2]):
            # No stacktrace found for following nodes
            add: f32[s1, s2] = torch.ops.aten.add.Tensor(arg1_1, arg2_1);  arg1_1 = arg2_1 = None
            return [add]

    class <lambda>(torch.nn.Module):
        def forward(self, arg0_1, arg1_1: f32[s1, s2], arg2_1: f32[s1, s2], arg3_1: f32[s2]):
            # No stacktrace found for following nodes
            add: f32[s1, s2] = torch.ops.aten.add.Tensor(arg1_1, arg3_1);  arg1_1 = None
            is_same_size = torch.ops.aten.is_same_size.default(add, arg2_1);  add = None
            sum_1: f32[1, s2] = torch.ops.aten.sum.dim_IntList(arg2_1, [0], True)
            sym_size: Sym(s2) = torch.ops.aten.sym_size(arg3_1, 0);  arg3_1 = None
            view: f32[s2] = torch.ops.aten.view.default(sum_1, [sym_size]);  sum_1 = sym_size = None
            return [None, arg2_1, view]
```
### Case 2: list input/output f and autograd
```python
def f(x, y):
    return [x[0].cos() + y.sin(), x[1].sin() * y.cos()]

def g(xs, y):
    out = control_flow.map(f, xs, y)
    flat_out, _ = pytree.tree_flatten(out)
    flat_inp, _ = pytree.tree_flatten((xs, y))
    requires_grad_inp = [inp for inp in flat_inp if inp.requires_grad]
    return torch.autograd.grad(flat_out, requires_grad_inp, [torch.ones_like(out) for out in flat_out])

gm = make_fx(g, tracing_mode="symbolic")(
    [torch.ones(3, 4, 5), torch.ones(3, 4, 5, requires_grad=True)],
    torch.ones(5, requires_grad=True))

# gm.print_readable() produces following:
class g(torch.nn.Module):
    def forward(self, xs, y):
        xs_1: f32[3, s1, s2], xs_2: f32[3, s1, s2], y_1: f32[s2], = fx_pytree.tree_flatten_spec([xs, y], self._in_spec)
        # No stacktrace found for following nodes
        body_graph_0 = self.body_graph_0
        map_impl = torch.ops.map_impl(body_graph_0, 2, xs_1, xs_2, y_1);  body_graph_0 = None
        getitem: f32[3, s1, s2] = map_impl[0]
        getitem_1: f32[3, s1, s2] = map_impl[1];  map_impl = None
        ones_like: f32[3, s1, s2] = torch.ops.aten.ones_like.default(getitem, pin_memory = False)
        ones_like_1: f32[3, s1, s2] = torch.ops.aten.ones_like.default(getitem_1, pin_memory = False)
        is_same_size = torch.ops.aten.is_same_size.default(getitem, ones_like);  getitem = None
        is_same_size_1 = torch.ops.aten.is_same_size.default(getitem_1, ones_like_1);  getitem_1 = None
        body_graph_1 = self.body_graph_1
        map_impl_1 = torch.ops.map_impl(body_graph_1, 4, xs_1, xs_2, ones_like, ones_like_1, y_1);  body_graph_1 = xs_1 = xs_2 = ones_like = ones_like_1 = None
        getitem_2 = map_impl_1[0]
        getitem_3 = map_impl_1[1]
        getitem_4: f32[3, s1, s2] = map_impl_1[2]
        getitem_5: f32[3, s2] = map_impl_1[3];  map_impl_1 = None
        sum_1: f32[1, s2] = torch.ops.aten.sum.dim_IntList(getitem_5, [0], True);  getitem_5 = None
        sym_size: Sym(s2) = torch.ops.aten.sym_size(y_1, 0);  y_1 = None
        view: f32[s2] = torch.ops.aten.view.default(sum_1, [sym_size]);  sum_1 = sym_size = None
        return pytree.tree_unflatten([getitem_4, view], self._out_spec)

    class <lambda>(torch.nn.Module):
        def forward(self, arg0_1, arg1_1: f32[s1, s2], arg2_1: f32[s1, s2], arg3_1: f32[s2]):
            # No stacktrace found for following nodes
            cos: f32[s1, s2] = torch.ops.aten.cos.default(arg1_1);  arg1_1 = None
            sin: f32[s2] = torch.ops.aten.sin.default(arg3_1)
            add: f32[s1, s2] = torch.ops.aten.add.Tensor(cos, sin);  cos = sin = None
            sin_1: f32[s1, s2] = torch.ops.aten.sin.default(arg2_1);  arg2_1 = None
            cos_1: f32[s2] = torch.ops.aten.cos.default(arg3_1);  arg3_1 = None
            mul: f32[s1, s2] = torch.ops.aten.mul.Tensor(sin_1, cos_1);  sin_1 = cos_1 = None
            return [add, mul]

    class <lambda>(torch.nn.Module):
        def forward(self, arg0_1, arg1_1: f32[s1, s2], arg2_1: f32[s1, s2], arg3_1: f32[s1, s2], arg4_1: f32[s1, s2], arg5_1: f32[s2]):
            # No stacktrace found for following nodes
            cos: f32[s1, s2] = torch.ops.aten.cos.default(arg1_1);  arg1_1 = None
            sin: f32[s2] = torch.ops.aten.sin.default(arg5_1)
            add: f32[s1, s2] = torch.ops.aten.add.Tensor(cos, sin);  cos = sin = None
            sin_1: f32[s1, s2] = torch.ops.aten.sin.default(arg2_1)
            cos_1: f32[s2] = torch.ops.aten.cos.default(arg5_1)
            mul: f32[s1, s2] = torch.ops.aten.mul.Tensor(sin_1, cos_1)
            is_same_size = torch.ops.aten.is_same_size.default(add, arg3_1);  add = None
            is_same_size_1 = torch.ops.aten.is_same_size.default(mul, arg4_1);  mul = None
            mul_1: f32[s1, s2] = torch.ops.aten.mul.Tensor(arg4_1, sin_1);  sin_1 = None
            mul_2: f32[s1, s2] = torch.ops.aten.mul.Tensor(arg4_1, cos_1);  arg4_1 = cos_1 = None
            sum_1: f32[1, s2] = torch.ops.aten.sum.dim_IntList(mul_1, [0], True);  mul_1 = None
            sym_size: Sym(s2) = torch.ops.aten.sym_size(arg5_1, 0)
            view: f32[s2] = torch.ops.aten.view.default(sum_1, [sym_size]);  sum_1 = None

            #
            sin_2: f32[s2] = torch.ops.aten.sin.default(arg5_1)
            neg: f32[s2] = torch.ops.aten.neg.default(sin_2);  sin_2 = None
            mul_3: f32[s2] = torch.ops.aten.mul.Tensor(view, neg);  view = neg = None
            cos_2: f32[s1, s2] = torch.ops.aten.cos.default(arg2_1);  arg2_1 = None
            mul_4: f32[s1, s2] = torch.ops.aten.mul.Tensor(mul_2, cos_2);  mul_2 = cos_2 = None
            sum_2: f32[1, s2] = torch.ops.aten.sum.dim_IntList(arg3_1, [0], True);  arg3_1 = None
            view_1: f32[s2] = torch.ops.aten.view.default(sum_2, [sym_size]);  sum_2 = sym_size = None
            cos_3: f32[s2] = torch.ops.aten.cos.default(arg5_1);  arg5_1 = None
            mul_5: f32[s2] = torch.ops.aten.mul.Tensor(view_1, cos_3);  view_1 = cos_3 = None
            add_1: f32[s2] = torch.ops.aten.add.Tensor(mul_3, mul_5);  mul_3 = mul_5 = None
            return [None, None, mul_4, add_1]
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100494
Approved by: https://github.com/zou3519
2023-05-16 22:05:11 +00:00
Nikita Karetnikov
42e65a2587 [pt2] add meta for linalg_lu_factor_ex (#101375)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/101375
Approved by: https://github.com/lezcano
2023-05-16 20:56:54 +00:00
Khushi
01c7106580 [opinfo] empty_strided (#100890)
Follows: #100223

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100890
Approved by: https://github.com/ezyang
2023-05-15 23:39:39 +00:00
Nikita Karetnikov
9eb1748b2b [pt2] add meta and SymInt support for linalg_lu (#101372)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/101372
Approved by: https://github.com/lezcano, https://github.com/albanD
2023-05-15 20:25:00 +00:00
Nikita Karetnikov
ac4cc63ae2 [pt2] add meta for linalg_ldl_solve (#101367)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/101367
Approved by: https://github.com/lezcano
2023-05-15 20:25:00 +00:00
Nikita Karetnikov
7dd8e08817 [pt2] add meta for linalg_ldl_factor_ex (#101362)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/101362
Approved by: https://github.com/lezcano
2023-05-15 02:56:49 +00:00
Nikita Karetnikov
a8964d6377 [pt2] add meta and SymInt support for linalg_householder_product (#101315)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/101315
Approved by: https://github.com/lezcano
2023-05-15 02:56:49 +00:00
Nikita Karetnikov
6abde61f8e [pt2] add meta function for _linalg_eigh (#100964)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/100964
Approved by: https://github.com/ezyang
2023-05-10 15:45:15 +00:00
Khushi
51fe53e619 [opinfo] item (#100313)
Follows #100223

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100313
Approved by: https://github.com/ezyang
2023-05-10 11:32:45 +00:00
Nikita Karetnikov
1e591a8b64 [pt2] add meta function for solve_triangular (#100829)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/100829
Approved by: https://github.com/ezyang
2023-05-08 13:48:15 +00:00
Nikita Karetnikov
266c84e3ab [pt2] add meta function for linalg_qr (#100714)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/100714
Approved by: https://github.com/ezyang, https://github.com/lezcano
2023-05-06 15:04:02 +00:00
Nikita Karetnikov
37f1be041a [pt2] enable svd in fake_tensor (#100130)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/100130
Approved by: https://github.com/ezyang, https://github.com/lezcano
2023-05-05 06:27:59 +00:00
Michael Voznesensky
fe3ecfe0cf Add AotAutogradFallbackTests to dynamic suite (#100454)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/100454
Approved by: https://github.com/ezyang
2023-05-04 04:28:45 +00:00
Nikita Karetnikov
e87ed2a88d [primTorch] add ref for polar (#100345)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/100345
Approved by: https://github.com/ezyang
2023-05-04 01:37:02 +00:00
Nikita Karetnikov
279f3cd0a6 [pt2] add SymInt support for dsplit, hsplit, vsplit (#100352)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/100352
Approved by: https://github.com/Skylion007, https://github.com/ezyang
2023-05-02 18:51:03 +00:00
Nikita Karetnikov
41361538a9 [pt2] add SymInt support for tensordot and inner (#100356)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/100356
Approved by: https://github.com/ezyang
2023-05-02 14:42:50 +00:00
Brian Hirsh
62fad315c1 fix per-dispatchkey-mode caching bug (#98030)
The bug was this: if you want to move a mode to the autograd key, we need to use the "functionality" key for it (AutogradFunctionality). But when we do that, we need to clear any PythonDispatcher caches for every op for **every** autograd key (since you could run autograd ops with both cpu and cuda tensors underneath the mode, both of which may have been cached).

I didn't add a test, since this ends up getting indirectly tested by export in this PR. If someone would prefer a direct test I can add one.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/98030
Approved by: https://github.com/ezyang
2023-04-25 21:58:14 +00:00
Aaron Gokaslan
e2a3817dfd [BE] Enable C419 rule for any all shortcircuiting (#99890)
Apparently https://github.com/pytorch/pytorch/pull/78142 made torch.JIT allow simple generator expressions, which lets us enable rules that replace unnecessary list comprehensions with generators in any/all. This was originally part of #99280, but I split it off into this PR so that it can be easily reverted should anything break.
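
For illustration, the kind of rewrite C419 performs (a generic example, not a specific hunk from this PR):

```python
xs = range(10**6)

# Before: the list comprehension materializes every element, even though
# any() could stop at the first True.
found = any([x == 3 for x in xs])

# After (C419): a generator expression lets any()/all() short-circuit.
found = any(x == 3 for x in xs)
all_nonneg = all(x >= 0 for x in xs)
```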

Pull Request resolved: https://github.com/pytorch/pytorch/pull/99890
Approved by: https://github.com/justinchuby, https://github.com/kit1980, https://github.com/malfet
2023-04-25 15:02:13 +00:00
Nikita Karetnikov
fbb0ff10a4 [pt2] add SymInt support for trapezoid ops (#99281)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99281
Approved by: https://github.com/ezyang
2023-04-25 00:44:25 +00:00
Wanchao Liang
ff7d5b62d4 Improve ProxyTensor tensor_tree list/tuple handling (#99897)
This PR improves the list/tuple handling by merging the logic into
`wrap_with_proxy` directly, and calling set_meta when we find that the current
proxy is an fx.Proxy. This also solves the problem that, even though `fused_adam`
has `val`, some corresponding `getitem` calls that follow `fused_adam` don't have `val`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99897
Approved by: https://github.com/ezyang
2023-04-24 22:50:02 +00:00
Michael Voznesensky
4c2892944f Guard static shapes alongside tensors, instead of from shape_env, in dynamic_shapes=True (#99566)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99566
Approved by: https://github.com/ezyang
2023-04-22 16:46:52 +00:00
Edward Z. Yang
10c938abef Handle meta['val'] for tuple of lists. (#99724)
Fixes https://github.com/pytorch/pytorch/issues/99356

Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99724
Approved by: https://github.com/wanchaol
2023-04-21 22:33:21 +00:00
Elias Ellison
638feec4e3 Turn on meta converter for complex (#98869)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98869
Approved by: https://github.com/ngimel
2023-04-20 16:42:38 +00:00
Richard Zou
44b09bf673 Reland "Simple Custom Operator API, V0 (#98440)" (#99416)
See the original PR (#98440) for the description. It broke internal
builds due to proxy_tensor.py not importing torch._dynamo, which is
being fixed in the previous PR in the stack.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99416
Approved by: https://github.com/soulitzer, https://github.com/bdhirsh
2023-04-18 23:48:33 +00:00
PyTorch MergeBot
f497031df9 Revert "Simple Custom Operator API, V0 (#98440)"
This reverts commit 0157b2d722.

Reverted https://github.com/pytorch/pytorch/pull/98440 on behalf of https://github.com/DanilBaibak due to Break internal build
2023-04-18 13:04:27 +00:00
Nikita Karetnikov
106ccf4a2a [pt2] add meta function for linalg.cross (#99279)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99279
Approved by: https://github.com/ezyang
2023-04-17 21:21:45 +00:00
Nikita Karetnikov
6f7b434f7b [pt2] add SymInt support for column_stack (#99276)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99276
Approved by: https://github.com/ezyang
2023-04-17 21:21:45 +00:00
PyTorch MergeBot
08dd4ad0b9 Revert "[pt2] add SymInt support for column_stack (#99276)"
This reverts commit 775dd869d0.

Reverted https://github.com/pytorch/pytorch/pull/99276 on behalf of https://github.com/ezyang due to reverting this one too for safety
2023-04-17 19:37:58 +00:00
PyTorch MergeBot
f957334c2b Revert "[pt2] add meta function for linalg.cross (#99279)"
This reverts commit efc3887ea5.

Reverted https://github.com/pytorch/pytorch/pull/99279 on behalf of https://github.com/ezyang due to Apparently this is breaking inductor on master? So weird
2023-04-17 19:33:16 +00:00
Tugsbayasgalan Manlaibaatar
7401f0f8ce Add unbacked symbool support (#98877)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98877
Approved by: https://github.com/ezyang
2023-04-17 17:45:10 +00:00
Richard Zou
0157b2d722 Simple Custom Operator API, V0 (#98440)
This PR introduces CustomOp, a wrapper around a dispatcher operator that allows
users to define custom operators. It adds the skeleton for CustomOp and
some very simple behavior. As of this PR:
- one can create a CustomOp for an operator that does not have inplace or aliasing
- give it CPU/CUDA and Meta implementations
- and trace it into a graph via make_fx.

The design follows
https://docs.google.com/document/d/19Uc5OUCA187q9BZggJb70RT2ZoSTDoG5QQkJkZwd25M/edit
Concretely, we implement the following things mentioned in the doc in this PR:
- Entrypoint 1 (CustomOp.define, creating a new custom operator)
- impl (to define device-specific code) and impl_meta (to define meta
formulas)

The goal for the short term is to get the code to a state where it can be trialed
by the export folks. On top of this PR, the blockers are:
- adding Entrypoint 3 (CustomOp.from_existing)
- adding a way to do data-dependent shape formulas
These will come in future PRs since this one is getting long.

Things that will come in the longer-near-term (before 2.1):
- adding the other entrypoints mentioned in the doc (2 & 3)
- more safety checks and better error messages
- support for views and mutation
- support for defining autograd formulas
- support for functionalization
- making this API public (it's private right now).

Test Plan:
- added a new test case, TestCustomOp. It mostly tests a bunch of error
cases.
- added OpInfos for custom operators and hooked these up to
test_proxy_tensor to test that they work with make_fx. These custom
operators were based off of the ones in the autograd_function_db.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98440
Approved by: https://github.com/ezyang
2023-04-17 12:17:32 +00:00
Nikita Karetnikov
efc3887ea5 [pt2] add meta function for linalg.cross (#99279)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99279
Approved by: https://github.com/ezyang
2023-04-17 03:05:20 +00:00
Nikita Karetnikov
775dd869d0 [pt2] add SymInt support for column_stack (#99276)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99276
Approved by: https://github.com/ezyang
2023-04-17 03:05:20 +00:00
Nikita Karetnikov
21681f36f4 [pt2] add SymInt support for fft ops (#99115)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99115
Approved by: https://github.com/ezyang
2023-04-15 18:01:39 +00:00
Peter Bell
7b91bd2a7b [primTorch] Add count_nonzero (#98995)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98995
Approved by: https://github.com/lezcano
2023-04-13 22:08:19 +00:00
Nikita Karetnikov
8db04e080c [pt2] add SymInt support for cdist (#98881)
Fixes #98853.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98881
Approved by: https://github.com/ezyang
2023-04-12 23:06:40 +00:00
Edward Z. Yang
419ad49e65 Make Tensor.__contains__ accept SymInt/Float/Bool. (#98933)
Fixes https://github.com/pytorch/pytorch/issues/98870

Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98933
Approved by: https://github.com/albanD, https://github.com/Skylion007
2023-04-12 19:16:33 +00:00
Nikita Karetnikov
ff825de442 [primTorch] add ref for cumprod (#98670)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98670
Approved by: https://github.com/ezyang
2023-04-09 15:22:28 +00:00
Nikita Karetnikov
a2e7910dfd [pt2] remove skip for masked.logsumexp in test_proxy_tensor.py (#98676)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98676
Approved by: https://github.com/ezyang
2023-04-09 01:28:16 +00:00
Nikita Karetnikov
b411238d76 [pt2] add meta function for logcumsumexp (#98683)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98683
Approved by: https://github.com/ezyang
2023-04-09 01:26:37 +00:00
Nikita Karetnikov
1c226f5aad [pt2] add meta functions for cummax and cummin (#98552)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98552
Approved by: https://github.com/Chillee
2023-04-07 17:58:28 +00:00
Nikita Karetnikov
7b25976323 [pt2] add meta function for take (#98451)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98451
Approved by: https://github.com/ezyang
2023-04-06 14:48:35 +00:00
Michael Voznesensky
b1e60bfb6a Pass f_locals as a dict rather than kwargs (#98107)
Fixes https://github.com/pytorch/pytorch/issues/97688

One big problem is that instead of printing x < y we now print
`E["x"] < E["y"]` and now all of the tests wobbled and I'm mad.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/98107
Approved by: https://github.com/ezyang
2023-04-04 00:30:08 +00:00
Edward Z. Yang
8372c5dc68 Refactor dynamic dims api, stateless internals, higher level export API (#96699)
The purpose of this PR is to execute a few large components of work:

1) Refactor all the internals of plumbing dynamic dimension information after dynamo to be stateless
2) Decouple allocation controls around dynamic dimensions from verification
3) For (2), for allocation, create an enum that dictates whether we are in DUCK (default today), STATIC (aka assume_static_default in the past), or DYNAMIC (aka user constrained, do not duck shape) - see the sketch after this list
4) For (2), for verification, we separate out the list of dynamic ranges entirely from allocation. This means shape_env does no tracking of what we verify on; instead, it is the caller's job to invoke produce_guards() with the various things they want verified, specifically, with the valid ranges. We do use constrain ranges to refine value ranges when doing analysis.
5) We have decided, therefore, as an extension of (4), to double down on "late" checks versus "eager" checks, primarily because the mechanisms for gathering what actually matters happen during guards, and should be the purview of the caller seeking guards, not the shape env. However, for dynamo, these structures are essentially one and the same.
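
A schematic of the allocation-policy enum from point (3), with hypothetical names; the real enum and its plumbing live in the symbolic shapes machinery:

```python
from enum import Enum, auto

class AllocPolicy(Enum):
    # Hypothetical rendering of point (3); names are illustrative only.
    DUCK = auto()     # default today: duck-shape (reuse symbols for equal sizes)
    STATIC = auto()   # aka assume_static_default in the past: concrete size
    DYNAMIC = auto()  # user constrained: always a fresh symbol, no duck shaping

def policy_for_dim(user_marked_dynamic: bool, assume_static: bool) -> AllocPolicy:
    # Allocation is decided here; verification (the ranges handed to
    # produce_guards()) is a separate concern, per point (4).
    if user_marked_dynamic:
        return AllocPolicy.DYNAMIC
    if assume_static:
        return AllocPolicy.STATIC
    return AllocPolicy.DUCK

print(policy_for_dim(user_marked_dynamic=False, assume_static=False))  # AllocPolicy.DUCK
```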

Pull Request resolved: https://github.com/pytorch/pytorch/pull/96699
Approved by: https://github.com/avikchaudhuri, https://github.com/ezyang
2023-03-29 16:55:49 +00:00
Brian Hirsh
35c9ea89fa dont bake in defaults when tracing *_like factories (#97564)
Quick fix for https://github.com/pytorch/pytorch/issues/97541. Letting CI run to see if there's any fallout.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/97564
Approved by: https://github.com/ezyang
2023-03-27 22:53:44 +00:00
Brian Hirsh
af440c427b [draft for discussion] add per-dispatch key modes (#97052)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/97052
Approved by: https://github.com/ezyang, https://github.com/zou3519
2023-03-21 23:45:45 +00:00
Rohan Gupta
b01d6f2cdb addmv decomp #2 (#96264)
Fixes #94617

Pull Request resolved: https://github.com/pytorch/pytorch/pull/96264
Approved by: https://github.com/ngimel, https://github.com/ezyang
2023-03-16 23:09:45 +00:00
Nikita Karetnikov
0d7c44096a Add baddbmm meta function (#96548)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/96548
Approved by: https://github.com/ezyang
2023-03-11 19:09:24 +00:00
Nikita Karetnikov
8e0d5bf538 [primTorch] add meta implementation for aten.min.dim (#96442)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/96442
Approved by: https://github.com/ngimel
2023-03-11 18:51:51 +00:00
Edward Z. Yang
98ff841a75 Use maxint to bound integers. (#96121)
We don't actually support arbitrary precision integers.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/96121
Approved by: https://github.com/tugsbayasgalan, https://github.com/lezcano
2023-03-07 12:46:19 +00:00
Edward Z. Yang
680214ac11 SymIntify a few more relatively non-controversial schemas (#96100)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/96100
Approved by: https://github.com/Skylion007
2023-03-06 23:12:40 +00:00
Jason Ansel
5dd52e250f [inductor] Add some simple decomps (#96039)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/96039
Approved by: https://github.com/ngimel
2023-03-05 17:07:56 +00:00
Edward Z. Yang
027ebca4d7 Don't use guardless contiguity/stride-like implementations (#95733)
These prevent us from simplifying tests involving unbacked SymInts,
and then you end up with unbacked SymInt in guards, which is bad.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/95733
Approved by: https://github.com/tugsbayasgalan
2023-03-03 21:56:41 +00:00
PyTorch MergeBot
4026c62174 Revert "Don't use guardless contiguity/stride-like implementations (#95733)"
This reverts commit deaf077de8.

Reverted https://github.com/pytorch/pytorch/pull/95733 on behalf of https://github.com/ezyang due to apparently this regresses executorch tests internally
2023-03-03 17:43:05 +00:00
Edward Z. Yang
deaf077de8 Don't use guardless contiguity/stride-like implementations (#95733)
These prevent us from simplifying tests involving unbacked SymInts,
and then you end up with unbacked SymInt in guards, which is bad.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/95733
Approved by: https://github.com/tugsbayasgalan
2023-03-01 23:14:58 +00:00
Edward Z. Yang
e628a3e724 Don't generate guards that refer to unbacked SymInts (#95732)
This regresses unbacked batch resnet, but I have a plan to recover that
too.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/95732
Approved by: https://github.com/tugsbayasgalan
2023-03-01 06:14:27 +00:00
Edward Z. Yang
d78274b759 Automatically guard when SymInt is converted to int (#95479)
During enablement, we disabled int() conversions because they were
an easy way to footgun guards.  We have enough of dynamic shapes
working now that this is causing spurious errors; e.g., if you feed
a symbolic int to x.size(symint).  We now allow implicit conversions
of SymInt to int here, posting a guard.  We expect guard provenance
to help people debug overspecialization.
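
As a small illustration of the new behavior (assuming `torch.compile(dynamic=True)`; exact guard and recompile behavior may vary by version):

```python
import torch

def fn(x):
    # int() on a symbolic size used to error during dynamic-shape tracing;
    # now it succeeds and installs a guard specializing on the value.
    n = int(x.size(0))
    return x.new_zeros(n)

compiled = torch.compile(fn, dynamic=True)
print(compiled(torch.randn(5)).shape)
print(compiled(torch.randn(7)).shape)  # may recompile because of that guard
```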

Fixes https://github.com/pytorch/pytorch/issues/95328

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/95479
Approved by: https://github.com/wconstab, https://github.com/voznesenskym, https://github.com/ngimel
2023-02-25 19:41:51 +00:00
Edward Z. Yang
8efe4fd590 Memoize repeated nonzero calls to the same fake tensor (#95399)
This removes the need to explicitly constrain_unify `x[mask]` and `y[mask]` when mask is a boolean tensor. It's very narrow but it seems to work in practice.

To invalidate the nonzero call when mutation occurs, I use the version counter. I know there are ways to bypass this, but I think it's good enough for now.
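
A toy sketch of version-counter-based memoization (a generic caching pattern, not the FakeTensor implementation):

```python
import torch

_nonzero_cache = {}  # id(tensor) -> (version counter, cached result)

def memoized_nonzero(t):
    # Reuse the previous nonzero() result for this tensor as long as it has
    # not been mutated since (tracked via the tensor's version counter).
    hit = _nonzero_cache.get(id(t))
    if hit is not None and hit[0] == t._version:
        return hit[1]
    out = torch.nonzero(t)
    _nonzero_cache[id(t)] = (t._version, out)
    return out

mask = torch.tensor([True, False, True])
a = memoized_nonzero(mask)
b = memoized_nonzero(mask)   # cache hit
mask[1] = True               # in-place mutation bumps mask._version
c = memoized_nonzero(mask)   # cache miss: recomputed
print(a is b, b is c)        # True False
```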

Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/95399
Approved by: https://github.com/eellison
2023-02-24 00:27:45 +00:00
Edward Z. Yang
4833e47feb Add support for nonzero, some improvements to reduce guards (#95387)
This takes the strategy described in https://docs.google.com/document/d/1lFRYAJo5nrfxRhwIzGnfi2pbLpU6T4ytSRSuLJ5qebI/edit#

It is essentially https://github.com/pytorch/pytorch/pull/95222 but squashed and with changes that are unnecessary given that we assume nonzero returns > 1.

What's in the PR:

* nonzero now supports meta propagation. When `capture_dynamic_output_shape_ops` is enabled, it will return a tensor with an unbacked SymInt representing the size in question (see the sketch after this list).
* The unbacked SymInt is UNSOUNDLY assumed to be not equal to 0/1. We will still error if you guard otherwise.
* PrimTorch pointwise operators are updated to use empty_permuted, to avoid guarding on unbacked SymInt from empty_strided (tested in `test_dynamic_pointwise_scalar`)
* Convolution is updated to skip backend selection if batch is unbacked, to avoid guarding on unbacked SymInt (tested in `test_unbacked_batch_resnet`)
* I kept the helper utilities like `definitely_true` for working with possibly unbacked SymInts. They're not used right now but maybe someone will find them useful.
* Added `constrain_unify` to let you specify two unbacked SymInts must have the same value
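
A small sketch exercising the first bullet, assuming the flag is exposed as `torch._dynamo.config.capture_dynamic_output_shape_ops`; behavior may differ across versions:

```python
import torch
import torch._dynamo

# Allow ops with data-dependent output shapes (like nonzero) to be captured;
# the output length becomes an unbacked SymInt instead of a graph break.
torch._dynamo.config.capture_dynamic_output_shape_ops = True

@torch.compile(fullgraph=True)
def fn(x):
    idx = torch.nonzero(x)  # data-dependent number of rows
    return idx + 1          # keep computing with the dynamically sized result

print(fn(torch.tensor([1, 0, 2, 0, 3])))
```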

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/95387
Approved by: https://github.com/voznesenskym
2023-02-24 00:27:45 +00:00
Edward Z. Yang
3758559a58 Reland "Introduce constrain_range; remove old expr_subs (#95063)" (#95209)
This reverts commit 4e88547c95.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/95209
Approved by: https://github.com/albanD
2023-02-22 18:16:25 +00:00
PyTorch MergeBot
cf6e078c34 Revert "Reland "Introduce constrain_range; remove old expr_subs (#95063)" (#95209)"
This reverts commit f7bf31fff1.

Reverted https://github.com/pytorch/pytorch/pull/95209 on behalf of https://github.com/ezyang due to internal sympy is too old
2023-02-22 01:58:58 +00:00
Edward Z. Yang
f7bf31fff1 Reland "Introduce constrain_range; remove old expr_subs (#95063)" (#95209)
This reverts commit 4e88547c95.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/95209
Approved by: https://github.com/albanD
2023-02-21 18:02:48 +00:00
Edward Z. Yang
ce950b412f Reland "Add torch.empty_permuted (#95069)" (#95208)
This reverts commit 92e03cd583.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/95208
Approved by: https://github.com/albanD
2023-02-21 18:02:48 +00:00
PyTorch MergeBot
92e03cd583 Revert "Add torch.empty_permuted (#95069)"
This reverts commit bedeb1f014.

Reverted https://github.com/pytorch/pytorch/pull/95069 on behalf of https://github.com/jeanschmidt due to Breaking internal builds. More in https://fburl.com/phabricator/ztrxrroq
2023-02-21 12:05:20 +00:00
PyTorch MergeBot
4e88547c95 Revert "Introduce constrain_range; remove old expr_subs (#95063)"
This reverts commit 3711f7c59f.

Reverted https://github.com/pytorch/pytorch/pull/95063 on behalf of https://github.com/jeanschmidt due to Breaking internal builds, more details can be found: https://fburl.com/phabricator/fq5b6k8a
2023-02-21 10:43:39 +00:00
Natalia Gimelshein
286d821e61 Don't replace FloorDiv with floor in simplify, do simplifications for divisible exprs (#95076)
I don't see why `floor` is better than `FloorDiv`, and `solve` doesn't work with `FloorDiv` anyway (the solution wouldn't be unique even if it did).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/95076
Approved by: https://github.com/jansel, https://github.com/malfet, https://github.com/nkaretnikov
2023-02-20 01:53:54 +00:00
Edward Z. Yang
bedeb1f014 Add torch.empty_permuted (#95069)
torch.empty_permuted is a generalized version of torch.empty(memory_format=...), where you can pass an arbitrary physical layout as a tuple of dims, allowing you to set up dense, non-overlapping tensors with a non-standard memory format. Check the docblock for a full description of the semantics.

The initial motivation for this PR is with guard-less unbacked SymInts. Traditionally, the way we allocate dense tensors with arbitrary layout is with `empty_strided`. However, `empty_strided` does not know that the given strides are actually contiguous, and must test this manually to find out if it is the case. With `empty_permuted`, this is known statically to be the case and helps us skip some 0/1 guards.

However, I also think torch.empty_permuted is a useful API in its own right. It is technically possible to simulate this with an empty and a permute; however, there are some downsides:

* The manual incantation is tricky to work out. To allocate an NHWC tensor, the invocation is `torch.empty(N, H, W, C).permute(0, 3, 1, 2)`; the permute call has to take NHWC to NCHW, and is the *inverse* of the permutation people are typically thinking of when they talk about NHWC (0, 2, 3, 1). Instead, torch.empty_permuted lets you say `torch.empty_permuted((N, C, H, W), (0, 2, 3, 1))`, letting you provide the intuitive permutation. It can literally be read off as NHWC if you assign N=0, C=1, H=2, W=3.
* An empty(requires_grad=True).permute() is no longer a leaf tensor. You can force it to be a leaf with a detach(), but it is more straightforward and less error prone to allow directly allocating a tensor with the correct permutation.

It is also technically possible to simulate this with empty_strided. However, this requires the user to manually compute the contiguous output strides and is bad from a reduction of guards perspective. For what it's worth, this is one of the more common uses of as_strided in the wild, and it would be nice to get rid of it.

A nice enhancement of this feature would be to accept `physical_layout` anywhere `memory_format` is accepted. However, this would be a pretty involved change, so I'm doing the easy thing instead.
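A small illustration of the two equivalent incantations described above (sizes chosen arbitrarily):

```python
import torch

N, C, H, W = 2, 3, 4, 5

nhwc_a = torch.empty_permuted((N, C, H, W), (0, 2, 3, 1))  # logical NCHW sizes, NHWC physical layout
nhwc_b = torch.empty(N, H, W, C).permute(0, 3, 1, 2)       # manual equivalent via the inverse permutation

assert nhwc_a.shape == nhwc_b.shape == (N, C, H, W)
assert nhwc_a.stride() == nhwc_b.stride()
assert nhwc_a.is_contiguous(memory_format=torch.channels_last)
```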

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/95069
Approved by: https://github.com/malfet, https://github.com/ngimel, https://github.com/albanD, https://github.com/dagitses
2023-02-20 00:23:10 +00:00
Edward Z. Yang
3711f7c59f Introduce constrain_range; remove old expr_subs (#95063)
This PR introduces a new `constrain_range` function which can be used to constrain the possible values a SymInt/SymFloat can take on. This knowledge can then be used to discharge potential guards (by running range analysis and checking whether the guard must be true given the known range) without adding another guard.

The usage of ranges is very limited right now; ranges are only constrained when the user explicitly instructs the system to do so. However, we could also infer range constraints from guards; this is left for future work.
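A hedged sketch of the intended usage (import path and helper names assumed from this description):

```python
from torch.fx.experimental.symbolic_shapes import ShapeEnv, constrain_range

shape_env = ShapeEnv()
u0 = shape_env.create_unbacked_symint()   # a symbol with no known concrete value
constrain_range(u0, min=0, max=100)
# A would-be guard such as `u0 >= 0` can now be discharged by range analysis
# against [0, 100] instead of being recorded as a new guard.
```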

Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/95063
Approved by: https://github.com/eellison
2023-02-19 23:17:09 +00:00
Fabio Rocha
b652577d8e Change test_torchinductor_opinfo.py to mark skips/xfails in a better way (#94813)
With this change, expected failures will be correctly reported as such by pytest (instead of passes as before).
It was sometimes a little confusing to see operators you did not expect to work in inductor reported as passing their tests.

One downside is that expected failures/skips for test variants now have to be identified by tuples, e.g. `("max", "reduction_no_dim"): {f16},` instead of just `"max.reduction_no_dim": {f16}`. It seems to me it is worth it.
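A hedged sketch of the tuple-keyed format (dict name and dtypes are illustrative only):

```python
import torch

inductor_expected_failures = {
    ("max", "reduction_no_dim"): {torch.float16},  # op + variant identified by a tuple
    "nonzero": {torch.float16, torch.float32},     # ops without a variant keep a plain string key
}
```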

This change would also allow simplifying the `TestInductorOpInfo` class a little, since it wouldn't have to handle the skips/xfails anymore, but that might require dropping support for things like `PYTORCH_COLLECT_EXPECT` and `PYTORCH_FAIL_ON_SUCCESS`, so I didn't do it.

Also a couple of other minor changes:

 - Got rid of c32, c64, c128 in torchinductor_opinfo. We don't support complex numbers, so they shouldn't be necessary.
 - Renamed the TestExpect enum to ExpectedTestResult to get rid of a pytest warning that treats it as a class containing tests.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94813
Approved by: https://github.com/lezcano, https://github.com/jansel
2023-02-16 18:57:01 +00:00
Edward Z. Yang
ef5de0a4cf Don't use PrimTorch decomposition for empty (#94512)
This PR removes the unnecessary == 0 guard when constructing empty tensors, by ensuring that when we create a contiguous tensor we go directly to the C++ torch.empty implementation (instead of indirecting through empty_strided), where we can bypass doing zero tests when computing the size of the storage. This probably also speeds up trace time.

When I did this, I found out that `empty_tensor_restride_symint` was flagrantly wrong (we had never exercised it before because we redirected to `empty_strided` in PrimTorch decomp, which doesn't hit this codepath.) The bugs:

* Stride computation was wrong (only `last_idx` was ever written to)
* Using set_sizes_and_strides with `sym_sizes` input doesn't work, because there is some sort of ordering problem where `clone_symvec` isn't safe when you clone a vector into itself. Probably should fix this.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94512
Approved by: https://github.com/ngimel
2023-02-16 16:04:41 +00:00
Edward Z. Yang
2f32fd7762 Introduce branchless implementations of TensorImpl bools (#94473)
This is the main payload of this diff stack. With it, we are able to construct a 1D tensor from an unbacked SymInt with guards that are equivalent to asserting that the size is non-negative (which makes sense!). To get here, I had to arrange for all of the guards that occur when doing contiguity tests to be lazy. This was done by writing non-branching implementations of each of the tests (the `sympy_is_contiguous` etc. functions), and then using those implementations when we don't branch.
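A hedged sketch of the branchless idea (not the literal `sympy_is_contiguous`): the contiguity test becomes one boolean expression over sizes and strides, with no data-dependent Python branching, so unbacked SymInts do not force guards.

```python
import sympy

def branchless_is_contiguous(sizes, strides):
    expected = 1
    expr = sympy.true
    for size, stride in reversed(list(zip(sizes, strides))):
        # a dim is fine if it is degenerate (length 1) or has exactly the expected stride
        expr = sympy.And(expr, sympy.Or(sympy.Eq(size, 1), sympy.Eq(stride, expected)))
        expected = expected * size
    return expr

u0 = sympy.Symbol("u0", positive=True, integer=True)
print(branchless_is_contiguous([u0], [1]))  # True, without ever branching on u0
```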

I also had to do some bug fixes for `is_non_overlapping_and_dense`, as unbacked SymInts were very untested previously (and that was the only time you would actually hit the Python version of the code.) In particular, we now consistently pass separate sizes/strides lists into each of the boolean computation functions (and only pack them into a single argument list when going to Sympy, which doesn't support lists of variables in custom functions.)

Finally, to actually test that this is doing something, I add a simple assumptions system from https://github.com/pytorch/pytorch/pull/90985 and use this to get the end to end test test_item_to_constructor passing. Soon, I intend to replace this with a range analysis system which will be used for assumptions in the short term. (We still might use Z3, but for all the stray assumptions I've seen range analysis will be good enough.)

Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94473
Approved by: https://github.com/albanD
2023-02-16 16:02:13 +00:00
Edward Z. Yang
89e16c4f18 Assume sympy is always installed (#94903)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94903
Approved by: https://github.com/Skylion007, https://github.com/malfet
2023-02-16 14:09:58 +00:00
PyTorch MergeBot
a049bbb100 Revert "Change test_torchinductor_opinfo.py to mark skips/xfails in a better way (#94813)"
This reverts commit bfc0d5e22c.

Reverted https://github.com/pytorch/pytorch/pull/94813 on behalf of https://github.com/huydhn due to Sorry for reverting your PR, but it causes failures on trunk bfc0d5e22c due to a landrace with b6df987671
2023-02-16 05:08:23 +00:00
Fabio Rocha
bfc0d5e22c Change test_torchinductor_opinfo.py to mark skips/xfails in a better way (#94813)
With this change, expected failures will be correctly reported as such by pytest (instead of passes as before).
It was sometimes a little confusing to see operators you did not expect to work in inductor reported as passing their tests.

One downside is that expected failures/skips for test variants now have to be identified by tuples, e.g. `("max", "reduction_no_dim"): {f16},` instead of just `"max.reduction_no_dim": {f16}`. It seems to me it is worth it.

This change would also allow simplifying the `TestInductorOpInfo` class a little, since it wouldn't have to handle the skips/xfails anymore, but that might require dropping support for things like `PYTORCH_COLLECT_EXPECT` and `PYTORCH_FAIL_ON_SUCCESS`, so I didn't do it.

Also a couple of other minor changes:

 - Got rid of c32, c64, c128 in torchinductor_opinfo. We don't support complex numbers, so they shouldn't be necessary.
 - Renamed the TestExpect enum to ExpectedTestResult to get rid of a pytest warning that treats it as a class containing tests.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94813
Approved by: https://github.com/lezcano, https://github.com/jansel
2023-02-16 03:32:01 +00:00
min-jean-cho
b6df987671 [Inductor] Added aten.normal_ decomp (#91207)
Fixes #91085

Pull Request resolved: https://github.com/pytorch/pytorch/pull/91207
Approved by: https://github.com/jgong5, https://github.com/jansel, https://github.com/lezcano
2023-02-15 21:21:46 +00:00
Edward Z. Yang
abf59f5703 Make _simplified kwarg private (#94782)
CR on https://github.com/pytorch/pytorch/pull/94404

Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94782
Approved by: https://github.com/voznesenskym
2023-02-15 01:52:16 +00:00
Edward Z. Yang
f1f26fe8ec Streamlining guard expect tests (#94404)
Changes:
* Add `simplified` kwarg to let you only render guards that are nontrivial (excludes duck sizing)
* Make a list of strings valid for sources, if you just have some variable names you want to bind to
* Add test helper `show_guards` using these facilities, switch a few tests to it

Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94404
Approved by: https://github.com/Chillee
2023-02-13 23:36:21 +00:00
Aaron Gokaslan
3d82d8d0ed [BE] Enable more flake8-comprehensions checks (#94601)
I applied some flake8 fixes and enabled checking for them in the linter. I also enabled some checks for my previous comprehensions PR.

This is a follow-up to #94323 where I enable the flake8 checkers for the fixes I made and fix a few more of them.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94601
Approved by: https://github.com/ezyang
2023-02-10 23:40:29 +00:00
mingfeima
c620ece726 port sparse_mm.reduce to pytorch and optimize it on CPU (#83727)
### Motivation of this PR

This patch migrates `spmm_reduce` from `torch-sparse` (a 3rd-party dependency for PyG) to `torch`, in response to the initial proposal for fusing **Gather, Apply, Scatter** in the Message Passing step of GNN inference/training. https://github.com/pytorch/pytorch/issues/71300

**GAS** is the major step of Message Passing; its behavior falls into 2 kinds depending on the storage type of `EdgeIndex`, which records the connections between nodes:

* COO: the hotspot is `scatter_reduce`
* CSR: the hotspot is `spmm_reduce`

The reduce type can be chosen from: "sum", "mean", "max", "min".

We extend `torch.sparse.mm` with a `reduce` argument, which maps to `torch.sparse_mm.reduce` internally (see the usage sketch after the output list below).
`sparse_mm_reduce` is registered under the TensorTypeId of `SparseCsrCPU`, and this operator requires an internal interface `_sparse_mm_reduce_impl` which has dual outputs:
* `out` - the actual output
* `arg_out` - records the output indices within the non-zero elements if the reduce type is "max" or "min"; this is only useful for training, so it is not calculated for inference.
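A hedged usage sketch (the exact spelling of the `reduce` argument is assumed from the description above):

```python
import torch

a = torch.randn(4, 5).relu().to_sparse_csr()  # sparse CSR operand on CPU
b = torch.randn(5, 3)
out = torch.sparse.mm(a, b, reduce="sum")     # reduce in {"sum", "mean", "max", "min"} per the description
print(out.shape)                              # torch.Size([4, 3])
```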

### Performance

Benchmarked on GCN for ogbn-products on a single Xeon socket, the workload is improved by `4.3x` with this patch.

The performance benefit for training will be bigger: the original backward impl for `sum|mean` is sequential, and the original backward impl for `max|min` is not fused.

#### before:
```
-----------------------------  ------------  ------------  ------------  ------------  ------------  ------------
                         Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg    # of Calls
-----------------------------  ------------  ------------  ------------  ------------  ------------  ------------
       torch_sparse::spmm_sum        97.09%       56.086s        97.09%       56.088s        6.232s             9
                 aten::linear         0.00%      85.000us         1.38%     795.485ms      88.387ms             9
                 aten::matmul         0.00%      57.000us         1.38%     795.260ms      88.362ms             9
                     aten::mm         1.38%     795.201ms         1.38%     795.203ms      88.356ms             9
                   aten::relu         0.00%      50.000us         0.76%     440.434ms      73.406ms             6
              aten::clamp_min         0.76%     440.384ms         0.76%     440.384ms      73.397ms             6
                   aten::add_         0.57%     327.801ms         0.57%     327.801ms      36.422ms             9
            aten::log_softmax         0.00%      23.000us         0.10%      55.503ms      18.501ms             3
```

#### after
```
-----------------------------  ------------  ------------  ------------  ------------  ------------  ------------
                         Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg    # of Calls
-----------------------------  ------------  ------------  ------------  ------------  ------------  ------------
               aten::spmm_sum        87.35%       11.826s        87.36%       11.827s        1.314s             9
                 aten::linear         0.00%      92.000us         5.87%     794.451ms      88.272ms             9
                 aten::matmul         0.00%      62.000us         5.87%     794.208ms      88.245ms             9
                     aten::mm         5.87%     794.143ms         5.87%     794.146ms      88.238ms             9
                   aten::relu         0.00%      53.000us         3.35%     452.977ms      75.496ms             6
              aten::clamp_min         3.35%     452.924ms         3.35%     452.924ms      75.487ms             6
                   aten::add_         2.58%     348.663ms         2.58%     348.663ms      38.740ms             9
                 aten::argmax         0.42%      57.473ms         0.42%      57.475ms      14.369ms             4
            aten::log_softmax         0.00%      22.000us         0.39%      52.605ms      17.535ms             3
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/83727
Approved by: https://github.com/jgong5, https://github.com/cpuhrsch, https://github.com/rusty1s, https://github.com/pearu
2023-02-10 15:56:40 +00:00
albanD
496c0a207b Make segment_reduce properly private. (#93166)
I am attempting not to change the aten function, to reduce the amount of BC issues on the TorchScript side.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/93166
Approved by: https://github.com/ngimel
2023-02-06 18:32:23 +00:00
Michael Voznesensky
60a3b7425d Small refactor of shape guards to allow for 1:1 code_parts (#93894)
By moving guard string assembly into dynamo's default behavior and letting code_parts do the work, we can have much better shape guard failures.

Before this fix, the guard failure in the test would look like:

```
'x.size()[1] == x.size()[0] and x.stride()[0] == x.[264 chars]!= 1' != 'x.size()[0] < 3'
- x.size()[1] == x.size()[0] and x.stride()[0] == x.size()[0] and x.stride()[1] == 1 and x.storage_offset() == 0 and y.size()[0] == x.size()[0] and y.size()[1] == x.size()[0] and y.stride()[0] == x.size()[0] and y.stride()[1] == 1 and y.storage_offset() == 0 and x.size()[0] < 3 and x.size()[0] != 0 and x.size()[0] != 1
+ x.size()[0] < 3
```
now it is
```
"x.size()[0] < 3"
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/93894
Approved by: https://github.com/ezyang
2023-02-05 09:24:12 +00:00
Michael Suo
4e4293f15f Add meta registration for bucketize (#93893)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/93893
Approved by: https://github.com/zhxchen17
2023-02-02 21:03:08 +00:00
jon-chuang
d5901fcc80 fix(fx): make all make_fx invocations isolated (opaque to higher make_fx invocations) by default (#93290)
Fixes https://github.com/pytorch/pytorch/issues/88996#issuecomment-1409174554

Example code:
```python
import torch
from torch.fx.experimental.proxy_tensor import make_fx, wrapper_and_args_for_make_fx

@torch.fx.wrap
def func(a, b):
    return b.expand([1, a.shape[0], b.shape[-1]])

a = torch.randn(3, 4)
b = torch.randn(4)

class TestMode(torch.overrides.TorchFunctionMode):
    def __torch_function__(self, func, types, args=(), kwargs={}):
        if torch.overrides.resolve_name(func) in ["torch.Tensor.expand"]:
            print(f"TestMode: {func} {args} {kwargs}")
            wrapped, all_args = wrapper_and_args_for_make_fx(func, args, kwargs)
            gm = make_fx(wrapped, tracing_mode="real")(all_args)

        return func(*args, **kwargs)

with TestMode():
    gm = make_fx(func, tracing_mode="symbolic")(a, b)

gm.graph.print_tabular()
```
Before:
```
opcode         name        target               args                              kwargs
-------------  ----------  -------------------  --------------------------------  --------
placeholder    a_1         a_1                  ()                                {}
placeholder    b_1         b_1                  ()                                {}
call_function  detach      aten.detach.default  (b_1,)                            {}
call_function  detach_1    aten.detach.default  (detach,)                         {}
call_function  sym_size    aten.sym_size        (a_1, 0)                          {}
call_function  sym_size_1  aten.sym_size        (b_1, 0)                          {}
call_function  expand      aten.expand.default  (b_1, [1, sym_size, sym_size_1])  {}
call_function  detach_2    aten.detach.default  (expand,)                         {}
call_function  expand_1    aten.expand.default  (b_1, [1, sym_size, sym_size_1])  {}
output         output      output               (expand_1,)                       {}
```

After:
```
opcode         name        target               args                              kwargs
-------------  ----------  -------------------  --------------------------------  --------
placeholder    a_1         a_1                  ()                                {}
placeholder    b_1         b_1                  ()                                {}
call_function  sym_size    aten.sym_size        (a_1, 0)                          {}
call_function  sym_size_1  aten.sym_size        (b_1, 0)                          {}
call_function  expand      aten.expand.default  (b_1, [1, sym_size, sym_size_1])  {}
output         output      output               (expand,)                         {}
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/93290
Approved by: https://github.com/ezyang
2023-02-01 17:28:48 +00:00
Ivan Yashchuk
fba13d94a1 Remove deprecated torch.symeig (#70988)
The time has come to remove deprecated linear algebra related functions. This PR removes `torch.symeig`.

- [x] XLA PR: https://github.com/pytorch/xla/pull/4498

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70988
Approved by: https://github.com/lezcano, https://github.com/kit1980, https://github.com/malfet
2023-01-31 11:59:11 +00:00
Edward Z. Yang
ec2461bbd8 Remove proxy tensor's check for data dependent output (#93265)
We'll rely on the underlying fake tensor to raise an error in these cases.  We only raise the error if there is an input to the data dependent operation that is a real tensor (and thus we are at risk of accidentally burning in real values)

Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/93265
Approved by: https://github.com/albanD
2023-01-31 11:58:49 +00:00
Aaron Gokaslan
e790281a85 SymInt'ify view_as (#93242)
Follow up to #93241
Pull Request resolved: https://github.com/pytorch/pytorch/pull/93242
Approved by: https://github.com/ezyang
2023-01-30 01:56:50 +00:00
Edward Z. Yang
3c570a2be3 SymInt'ify reshape_as (#93241)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/93241
Approved by: https://github.com/Skylion007
2023-01-30 01:46:16 +00:00
Edward Z. Yang
1b5bfe9dd1 Properly compute device for elementwise operations with CPU scalar tensor (#93073)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/93073
Approved by: https://github.com/eellison, https://github.com/bdhirsh
2023-01-26 21:27:57 +00:00
Edward Z. Yang
17803fb36e Make meshgrid support symbolic shapes (#93075)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/93075
Approved by: https://github.com/Skylion007
2023-01-26 20:57:29 +00:00
Joel Schlosser
e5fd7e6d8f Fix to use upsample_bicubic2d.vec decomp for dynamic shape support (#92854)
The `crossvit_9_240` model now works with dynamo.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/92854
Approved by: https://github.com/ezyang
2023-01-25 05:08:02 +00:00
PyTorch MergeBot
01f1097770 Revert "Fix to use upsample_bicubic2d.vec decomp for dynamic shape support (#92854)"
This reverts commit d49187bf88.

Reverted https://github.com/pytorch/pytorch/pull/92854 on behalf of https://github.com/malfet due to Resulted in 50+% flaky failures in dynamo, reverting
2023-01-25 00:10:14 +00:00
Joel Schlosser
d49187bf88 Fix to use upsample_bicubic2d.vec decomp for dynamic shape support (#92854)
The `crossvit_9_240` model now works with dynamo.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/92854
Approved by: https://github.com/ezyang
2023-01-24 21:36:17 +00:00
PyTorch MergeBot
acdd462b1a Revert "Remove deprecated torch.symeig (#70988)"
This reverts commit d70ed68162.

Reverted https://github.com/pytorch/pytorch/pull/70988 on behalf of https://github.com/kit1980 due to Failing XLA tests, forward fix unsuccessful
2023-01-24 19:03:40 +00:00
Ivan Yashchuk
d70ed68162 Remove deprecated torch.symeig (#70988)
The time has come to remove deprecated linear algebra related functions. This PR removes `torch.symeig`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70988
Approved by: https://github.com/lezcano, https://github.com/kit1980
2023-01-23 22:51:40 +00:00
Tugsbayasgalan (Tugsuu) Manlaibaatar
8f3600b966 [RELAND] Add metadata coverage for unsafe_split and unsafe_split_with_sizes (#92802)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/92802
Approved by: https://github.com/soumith
2023-01-23 10:57:10 +00:00
Edward Z. Yang
c4501593c3 Delete get_pyobj() entirely (#92638)
Opt for the shorter and more direct node attribute access.

I need to do this because I'm going to publicly document
SymInt and SymFloat, but I don't want to document get_pyobj().

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/92638
Approved by: https://github.com/Chillee, https://github.com/albanD, https://github.com/voznesenskym, https://github.com/bdhirsh
2023-01-20 19:06:56 +00:00
kshitij12345
274958ef43 [vmap] unsafe_split : batching rule and OpInfo (#92291)
Ref: https://github.com/pytorch/functorch/issues/1089

Pull Request resolved: https://github.com/pytorch/pytorch/pull/92291
Approved by: https://github.com/Chillee
2023-01-20 10:31:56 +00:00
PyTorch MergeBot
827e22ec2d Revert "[vmap] unsafe_split : batching rule and OpInfo (#92291)"
This reverts commit 0510ae59b3.

Reverted https://github.com/pytorch/pytorch/pull/92291 on behalf of https://github.com/kshitij12345 due to Broke trunk
2023-01-19 13:49:43 +00:00
kshitij12345
0510ae59b3 [vmap] unsafe_split : batching rule and OpInfo (#92291)
Ref: https://github.com/pytorch/functorch/issues/1089

Pull Request resolved: https://github.com/pytorch/pytorch/pull/92291
Approved by: https://github.com/Chillee
2023-01-19 06:34:45 +00:00
Peter Bell
8770a7ed6f Decompose more inplace ops (#90967)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90967
Approved by: https://github.com/anijain2305
2023-01-18 21:07:47 +00:00
Richard Zou
5d01277fea Deprecate torch.nn.utils.stateless.functional_call (#92280)
This PR:
- Updates the docs to say it is deprecated
- Raises a UserWarning
- Changes most of the callsites inside PyTorch to use
torch.func.functional_call, minus the test_stateless testing.

The motivation behind this is that we can now align behind a single
functional_call API in PyTorch.
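A minimal sketch of the replacement call path:

```python
import torch
import torch.nn as nn

model = nn.Linear(3, 2)
params_and_buffers = dict(model.named_parameters())  # buffers could be merged in as well

x = torch.randn(4, 3)
out = torch.func.functional_call(model, params_and_buffers, (x,))
print(out.shape)  # torch.Size([4, 2])
```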

Test Plan:
- existing tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/92280
Approved by: https://github.com/albanD
2023-01-18 14:26:25 +00:00
Peter Bell
f0b592dae7 Make masked_fill reference traceable (#90972)
As the comment states, `item()` cannot be used since you can't trace through a
scalar.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90972
Approved by: https://github.com/ngimel
2023-01-18 10:54:42 +00:00
Avik Chaudhuri
bb11e072ae Squash and merge linalg meta kernels (#92335)
Squashed changes from https://github.com/pytorch/pytorch/pull/92021 and https://github.com/pytorch/pytorch/pull/92020 and https://github.com/pytorch/pytorch/pull/92019

Pull Request resolved: https://github.com/pytorch/pytorch/pull/92335
Approved by: https://github.com/avikchaudhuri
2023-01-18 05:55:52 +00:00
lezcano
138a0188e0 Add support for logaddexp(float16) in CUDA and implement its reference (#91869)
The reference is implemented so that it generates efficient and
numerically stable triton code.
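For context, a sketch of the standard numerically stable formulation such a reference typically lowers to (not necessarily the literal decomposition in this PR):

```python
import torch

def logaddexp_ref(a, b):
    m = torch.maximum(a, b)
    return m + torch.log1p(torch.exp(-torch.abs(a - b)))  # avoids overflow in exp()

a = torch.tensor([1000.0], dtype=torch.float16)  # naive log(exp(a) + exp(b)) would overflow
b = torch.tensor([1000.0], dtype=torch.float16)
print(logaddexp_ref(a, b), torch.logaddexp(a.float(), b.float()))
```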

Fixes https://github.com/pytorch/pytorch/issues/91683

Pull Request resolved: https://github.com/pytorch/pytorch/pull/91869
Approved by: https://github.com/ngimel
2023-01-10 00:19:24 +00:00
Xia, Weiwen
de9c82f41a [Meta] Register aten.pixel_shuffle.default for meta (#91605)
**Summary**
Fixes #91551
`aten.pixel_shuffle.default` is not registered for meta, and it always generates outputs with a contiguous (channels-first) layout. This can be reproduced with `torch.compile` (as described in issue #91551) and by running in FakeTensorMode.

**Test plan**
python test/inductor/test_torchinductor.py -k test_pixel_shuffle_channels_last
python test/test_proxy_tensor.py

Pull Request resolved: https://github.com/pytorch/pytorch/pull/91605
Approved by: https://github.com/jgong5, https://github.com/mingfeima, https://github.com/anijain2305
2023-01-06 00:45:14 +00:00
Edward Z. Yang
f8740db410 Properly resolve source_ref when constructing shape guards (#91058)
Whenever you guard on something, you're supposed to tell GuardBuilder about it, so GuardBuilder knows that it has to actually bind it in scope when it creates the guard function. But shape env guards bypass that mechanism completely. Well, now they don't.

For the most part, this didn't matter in practice, because we usually had a `TENSOR_MATCH` guard floating around that made sure that the guard stayed live. But if we ever eliminate those guards (e.g., because we build it into the shape guard directly; something we'll probably want to do when https://github.com/pytorch/pytorch/pull/89707 goes online) then this will indeed matter.

One complication: some of the shape env guards are on globals. You have to make sure to shunt the usage to the correct guard builder in that case. Maybe it would be better if we refactored things so there is only one GuardBuilder. Not sure.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/91058
Approved by: https://github.com/voznesenskym
2022-12-30 05:56:56 +00:00
Edward Z. Yang
bcf15cd93b Store source, not sname, in Symbol (#91057)
I'm going to need this in the follow-up PR. Instead of storing only Source.name() in Symbol, I now store the full Source. Lots of replumbing ensues. In particular:

- Move Source to torch._guards to break cycles
- I have to add TensorPropertySource and NegateSource to handle x.size()[0] and -x codegen that I was doing with string manipulation previously
- I tighten up invariants so that I never pass source=None; instead I pass ConstantSource (these are constant sources right) and test for that rather than source being missing. I think this is more parsimonious
- Some mypy wobbles from new imports

I didn't move LocalSource and friends to torch._guards, but I ended up needing to access them in a few places. The main annoyance with moving these is that then I also need to move the bytecode codegen stuff, and that's not so easy to move without bringing in the kitchen sink.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/91057
Approved by: https://github.com/albanD, https://github.com/voznesenskym, https://github.com/zou3519
2022-12-30 05:56:56 +00:00
Joel Schlosser
8b55b86dbd Move sym_int and sym_float alongside SymInt / SymFloat in base torch package (#91317)
This PR moves the definitions for:
* `sym_int`
* `sym_ceil` (used only for `sym_int`)
* `sym_floor` (used only for `sym_int`)
* `sym_float`

from `torch/fx/experimental/symbolic_shapes.py` to `torch/__init__.py`, where `SymInt` and `SymFloat` are already defined.

This removes the need for several in-line imports, and enables proper JIT script gating for #91318. I'm very open to doing this in a better way!

Pull Request resolved: https://github.com/pytorch/pytorch/pull/91317
Approved by: https://github.com/ezyang, https://github.com/anijain2305
2022-12-28 16:08:16 +00:00
Joel Schlosser
1c40ec46ff Decomps and meta registrations for upsample_nearest 1D / 2D / 3D (#91260)
Adds decompositions and meta registrations for the 1D, 2D, and 3D implementations of `upsample_nearest`. All related OpInfo-based tests for AOTAutograd now pass.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/91260
Approved by: https://github.com/ezyang
2022-12-28 16:03:25 +00:00
PyTorch MergeBot
b68fd7e319 Revert "Store source, not sname, in Symbol (#91057)"
This reverts commit 88c581be87.

Reverted https://github.com/pytorch/pytorch/pull/91057 on behalf of https://github.com/atalman due to causing internal build failures
2022-12-21 22:33:15 +00:00
Edward Z. Yang
88c581be87 Store source, not sname, in Symbol (#91057)
I'm going to need this in the follow-up PR. Instead of storing only Source.name() in Symbol, I now store the full Source. Lots of replumbing ensues. In particular:

- Move Source to torch._guards to break cycles
- I have to add TensorPropertySource and NegateSource to handle x.size()[0] and -x codegen that I was doing with string manipulation previously
- I tighten up invariants so that I never pass source=None; instead I pass ConstantSource (these are constant sources right) and test for that rather than source being missing. I think this is more parsimonious
- Some mypy wobbles from new imports

I didn't move LocalSource and friends to torch._guards, but I ended up needing to access them in a few places. The main annoyance with moving these is that then I also need to move the bytecode codegen stuff, and that's not so easy to move without bringing in the kitchen sink.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/91057
Approved by: https://github.com/albanD, https://github.com/voznesenskym
2022-12-21 04:51:51 +00:00
Edward Z. Yang
e48c91688b DebugInterpreter works with symbolic shapes now, plus test (#90913)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90913
Approved by: https://github.com/voznesenskym
2022-12-16 05:22:56 +00:00
Edward Z. Yang
67436f621a Add utility for binding symbols based on arguments passed to placeholders (#90912)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90912
Approved by: https://github.com/voznesenskym
2022-12-16 05:22:56 +00:00
Edward Z. Yang
54563e6288 Don't put tracing state on Tensor (#90628)
Fixes https://github.com/pytorch/pytorch/issues/89626

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90628
Approved by: https://github.com/voznesenskym
2022-12-15 08:43:08 +00:00
Tugsbayasgalan (Tugsuu) Manlaibaatar
1aab755320 Fakify params and weights under private config (#90417)
Previously, we planned to lift the parameters and weights while exporting and implement our own transformer to "unlift" the lifted weights and params back to the graph as attributes. But this is a bit challenging because:

- We need to maintain correct ordering for weights and parameters that are passed as inputs so that we know how to map them back.
- Some weights are unused in the graph, so our transformer needs to be aware of which weights and parameters are not used. We also need to distinguish which inputs are real user inputs and which are parameters.
- There can be more edge cases we haven't seen in other models yet.

I am aware that @Chillee  and @bdhirsh mentioned that functionalization won't work with fake-tensor attributes but this is fine for the short term as we don't expect users to be modifying weights and params in inference mode. In fact, we explicitly disable attribute mutation in torchdynamo export mode right now.

Given the above conditions, it might be OK to just fakify params when we need to. I use a flag to gate this change.

Differential Revision: [D41891201](https://our.internmc.facebook.com/intern/diff/D41891201)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90417
Approved by: https://github.com/eellison
2022-12-14 09:33:18 +00:00
Joel Schlosser
4a5f4416d0 Make at::outer SymInt-aware (#90714)
Fixes matmul and related ops with meta; no more xfails needed. The non-working case for matmul was the matrix-vector case, which dispatches to `outer`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90714
Approved by: https://github.com/lezcano
2022-12-13 18:16:09 +00:00
Edward Z. Yang
f7365eca90 Add unbacked symints support; item works now (#90624)
The big idea is to add `create_unbacked_symfloat` and `create_unbacked_symint` to ShapeEnv, allowing you to allocate symbolic floats/ints corresponding to data you don't know about at compile time. Then, instead of immediately erroring out when you try to call local_scalar_dense on a FakeTensor, we instead create a fresh symint/symfloat and return that.

There a bunch of odds and ends that need to be handled:

* A number of `numel` calls converted to `sym_numel`
* When we finally return from item(), we need to ensure we actually produce a SymInt/SymFloat when appropriate. The previous binding code assumed that you would have to get a normal Python item. I add a pybind11 binding for Scalar (to PyObject only) and refactor the code to use that. There is some trickiness where you are NOT allowed to go through c10::SymInt if there isn't actually any SymInt involved. See comment.
* One of our unit tests tripped an implicit data dependent access which occurs when you pass a Tensor as an argument to a sizes parameter. This is also converted to support symbolic shapes
* We now support tracking bare SymInt/SymFloat returns in proxy tensor mode (this was already in symbolic-shapes branch)
* Whenever we allocate an unbacked symint, we record the stack trace it was allocated at. These get printed when you attempt data dependent access on the symint (e.g., you try to guard on it)
* Subtlety: unbacked symints are not necessarily > 1. I added a test for this.

These unbacked symints are not very useful right now as you will almost always immediately raise an error later when you try to guard on them. The next logical step is adding an assertion refinement system that lets ShapeEnv learn facts about unbacked symints so it can do a better job eliding guards that are unnecessary.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90624
Approved by: https://github.com/Skylion007, https://github.com/voznesenskym
2022-12-12 13:33:07 +00:00
Edward Z. Yang
e33f1eeeb7 SymIntify resize_ and deduplicate memory format logic (#90442)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90442
Approved by: https://github.com/bdhirsh
2022-12-11 14:38:38 +00:00
Edward Z. Yang
45109ec30a Completely redo how ShapeEnv guards are generated (#90528)
Instead of inferring shape mappings from a bunch of data structures that were plumbed through InstructionTranslator, we work out mappings by just iterating over the GraphArgs and mapping symbols to arguments as they show up. If multiple argument sizes/strides/offsets map to the same symbol, this means they are duck sized, so we also generate extra equality tests asserting that they are equal. Finally, we generate 0/1 specialization guards. The resulting code is much shorter, and I think also easier to understand.

TODO: Delete all the tensor ref tracking code, it's unnecessary

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90528
Approved by: https://github.com/voznesenskym
2022-12-10 13:35:04 +00:00
Edward Z. Yang
49c674e155 Revert guaranteed symint allocation (#90381)
So, uh, I have a new strategy for generating dupe guards, one where I don't actually need to allocate symints for every tensor that is fakeified. So I'm reverting the changes I made from earlier PRs in this one.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90381
Approved by: https://github.com/voznesenskym
2022-12-10 13:17:34 +00:00
Edward Z. Yang
e03cde07e4 Guarantee symbol allocation for all sizes/strides/storage offset (#89879)
We may need to express guards on the size/stride/storage offset of
a tensor, but we cannot do this if it's already been duck sized.
This PR guarantees that we allocate a symbol (or negation of the
symbol) whenever we ask to create a SymInt, and propagates this
symbol to SymNode so that Dynamo can look at it (not in this PR).

This PR doesn't actually add guards, nor does Dynamo do anything
with these symbols.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89879
Approved by: https://github.com/albanD
2022-12-01 13:43:10 +00:00
Nikita Karetnikov
4cb6bbbe27 Symintify embedding (#89327)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89327
Approved by: https://github.com/ezyang
2022-11-24 03:25:00 +00:00
Edward Z. Yang
5266953443 Add crossref debug mode for functionalization, catches stride errors (#89498)
The idea is to add a custom handler to the Functionalize key in the Python
dispatcher that runs the functionalized version alongside a
non-functionalized version, and checks that their outputs agree in the
end.  (Technically, for metadata mutation we should also check the
inputs, but for now we're relying on those functions returning self.)
I turned this on for test_functionalize.py (new TestCrossRefFunctionalize)
and found a bunch of failures that look legit.

This probably doesn't interact that nicely if you're also tracing at
the same time; we probably need more special logic for that (concretely,
just disabling tracing when we create the nested fake tensor mode,
but IDK if there's a more principled way to organize this).

There are some misc fixups which I can split if people really want.

- xfail_inherited_tests moved to test common_utils
- Bindings for _dispatch_tls_set_dispatch_key_included,
  _dispatch_tls_is_dispatch_key_included and _functionalization_reapply_views_tls
- Type stubs for _enable_functionalization, _disable_functionalization
- all_known_overloads utility to let you iterate over all OpOverloads
  in all namespaces.  Iterator support on all torch._ops objects to let
  you iterate over their members.
- suspend_functionalization lets you temporarily disable functionalization mode
  in a context
- check_metadata_matches for easily comparing outputs of functions and see
  if they match (TODO: there are a few copies of this logic, consolidate!)
- _fmt for easily printing the metadata of a tensor without its data
- _uncache_dispatch for removing a particular dispatch key from the cache,
  so that we force it to regenerate
- check_significant_strides new kwarg only_cuda to let you also do stride
  test even when inputs are not CUDA
- Functionalize in torch._C.DispatchKey

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89498
Approved by: https://github.com/malfet
2022-11-23 04:18:25 +00:00
anjali411
9c0bf9387c Meta impl for linalg_cholesky and linalg_cholesky_ex (#89430)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89430
Approved by: https://github.com/ezyang
2022-11-22 17:05:34 +00:00
Sherlock Huang
caf3d5319f Symintify numel(), infer_size, prims.elementwise_meta (#88956)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88956
Approved by: https://github.com/ezyang
2022-11-20 00:42:03 +00:00
PyTorch MergeBot
8ad39536d7 Revert "Symintify numel(), infer_size, prims.elementwise_meta (#88956)"
This reverts commit ce2f8700ba.

Reverted https://github.com/pytorch/pytorch/pull/88956 on behalf of https://github.com/ezyang due to somehow breaks torch.numel
2022-11-19 21:47:55 +00:00
Edward Z. Yang
5582001bd5 Reland 2 "Towards unifying symbolic and non symbolic fake tensor (#89038) (#89143)" (#89346)
This reverts commit 8e4c9828f4.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89346
Approved by: https://github.com/wconstab
2022-11-19 21:14:31 +00:00
Edward Z. Yang
94b5c807fd Detach fake tensors into val, so they aren't affected by metadata mutation (#89140)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89140
Approved by: https://github.com/bdhirsh
2022-11-19 00:08:14 +00:00
lezcano
154e58c032 Add most in-place references/decompositions (#88117)
We add most in-place references in a generic way. We also implement a
wrapper to handle the awkward interface that `nn.functional`
nonlinearities have.
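A minimal sketch of the generic idea (not the actual `_refs` helper): derive the in-place variant from the out-of-place reference by copying the result back into the input.

```python
import torch

def make_inplace(fn):
    def inplace(a, *args, **kwargs):
        return a.copy_(fn(a, *args, **kwargs))  # write the out-of-place result back into `a`
    return inplace

relu_ = make_inplace(torch.relu)
x = torch.tensor([-1.0, 2.0])
relu_(x)
print(x)  # tensor([0., 2.])
```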

Along the way, we fix a couple of decompositions for some non-linearities by
extending the arguments that the references accept.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88117
Approved by: https://github.com/mruberry
2022-11-18 14:59:46 +00:00
Sherlock Huang
f1fb586bc6 Symintify repeat_interleave.self_int (#89111)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89111
Approved by: https://github.com/ezyang
2022-11-18 05:04:02 +00:00
PyTorch MergeBot
8e4c9828f4 Revert "Reland "Towards unifying symbolic and non symbolic fake tensor (#89038)" (#89143)"
This reverts commit e686b8c3ba.

Reverted https://github.com/pytorch/pytorch/pull/89143 on behalf of https://github.com/ZainRizvi due to This seems to be causing the test_make_fx_symbolic_exhaustive_rad2deg_cpu_float32 and test_make_fx_symbolic_exhaustive_inplace_rad2deg_cpu_float32 test to fail across multiple jobs
2022-11-17 17:02:36 +00:00
Edward Z. Yang
e686b8c3ba Reland "Towards unifying symbolic and non symbolic fake tensor (#89038)" (#89143)
This reverts commit cf6003f046.

Differential Revision: [D41363992](https://our.internmc.facebook.com/intern/diff/D41363992)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89143
Approved by: https://github.com/albanD
2022-11-17 13:55:06 +00:00
PyTorch MergeBot
cf6003f046 Revert "Towards unifying symbolic and non symbolic fake tensor (#89038)"
This reverts commit 37d54239c7.

Reverted https://github.com/pytorch/pytorch/pull/89038 on behalf of https://github.com/ezyang due to executorch segfaults
2022-11-16 16:52:47 +00:00
Edward Z. Yang
37d54239c7 Towards unifying symbolic and non symbolic fake tensor (#89038)
Fake tensor behaves pretty differently depending on whether you have
symbolic shapes or not.  This leads to bugs; for example, we
weren't getting correct convolution_backward strides because we
bypassed the correct stride logic in fake tensor on symbolic
shapes.

This PR attempts to unify the two codepaths.  I don't manage to
unify everything, but I get most of it.  The algorithm is delicate
and I'm still hosing down test failures.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89038
Approved by: https://github.com/anjali411
2022-11-16 14:02:43 +00:00
anjali411
dc40d3f93f Add meta impl for grid_sampler_2d_backward (#88745)
TODO: add an OpInfo

Pull Request resolved: https://github.com/pytorch/pytorch/pull/88745
Approved by: https://github.com/ezyang
2022-11-16 13:01:47 +00:00
Sherlock Huang
ce2f8700ba Symintify numel(), infer_size, prims.elementwise_meta (#88956)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88956
Approved by: https://github.com/ezyang
2022-11-16 03:36:00 +00:00
anjali411
b815f1fc50 Symintify view_as_complex and view_as_real (#89052)
Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at bottom):
* __->__ #89052
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89052
Approved by: https://github.com/ezyang
2022-11-15 16:28:36 +00:00
Sherlock Huang
5faa2792fa Symintify decomps for split and upsample_bilinear; Fix decomp for _softmax_backward_data and native_dropout_backward (#88761)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88761
Approved by: https://github.com/ezyang
2022-11-15 13:34:45 +00:00
PyTorch MergeBot
eea506aee1 Revert "Symintify decomps for split and upsample_bilinear; Fix decomp for _softmax_backward_data and native_dropout_backward (#88761)"
This reverts commit 9eabcc370f.

Reverted https://github.com/pytorch/pytorch/pull/88761 on behalf of https://github.com/suo due to much broken 9eabcc370f
2022-11-14 01:58:47 +00:00
Sherlock Huang
9eabcc370f Symintify decomps for split and upsample_bilinear; Fix decomp for _softmax_backward_data and native_dropout_backward (#88761)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88761
Approved by: https://github.com/ezyang
2022-11-13 21:30:53 +00:00
anjali411
52be0c42ab meta function for max_pool2d_with_indices_backward (#88743)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88743
Approved by: https://github.com/lezcano, https://github.com/ezyang
2022-11-13 18:31:56 +00:00
Nikita Karetnikov
1e8f95ace1 Symintify broadcast_to (#88776)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88776
Approved by: https://github.com/ezyang
2022-11-11 15:49:43 +00:00
anjali411
d615d12289 Add meta impl for topk (#88694)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88694
Approved by: https://github.com/ezyang
2022-11-11 15:28:41 +00:00
anjali411
fc9e36dd42 Add meta support for scalar_tensor and argmax (#88590)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88590
Approved by: https://github.com/albanD
2022-11-11 01:31:00 +00:00
PyTorch MergeBot
d157fca59c Revert "Symintify broadcast_to (#88776)"
This reverts commit 3a09d9a129.

Reverted https://github.com/pytorch/pytorch/pull/88776 on behalf of https://github.com/malfet due to Broke functorch/test_aotdispatch on M1, see 3a09d9a129
2022-11-10 18:19:54 +00:00
Nikita Karetnikov
4b898a7304 Symintify adaptive_avg_pool3d (#88783)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88783
Approved by: https://github.com/ezyang
2022-11-10 15:23:54 +00:00
Nikita Karetnikov
3a09d9a129 Symintify broadcast_to (#88776)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88776
Approved by: https://github.com/ezyang
2022-11-10 15:21:50 +00:00
Edward Z. Yang
d81797e845 Meta function for aten.sort and aten.scatter* (#88705)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88705
Approved by: https://github.com/ezyang
2022-11-09 17:47:14 +00:00
Edward Z. Yang
f0e6cea2ed Meta registrations for inplace operators (#88678)
Also, handle non-default alpha correctly.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88678
Approved by: https://github.com/SherlockNoMad, https://github.com/albanD
2022-11-09 01:27:01 +00:00
Edward Z. Yang
a880ddc164 Meta implementation for unsqueeze_ (#88675)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88675
Approved by: https://github.com/SherlockNoMad
2022-11-09 01:27:01 +00:00
Edward Z. Yang
1dab35ca1b Meta implementation for bernoulli (#88676)
For some reason bernoulli uses the legacy memory format; see the linked issue.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88676
Approved by: https://github.com/SherlockNoMad
2022-11-09 01:26:58 +00:00
Edward Z. Yang
1b5373fc83 Mark as_strided_ as supporting SymInt in C++ (#88674)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88674
Approved by: https://github.com/anjali411
2022-11-08 18:45:05 +00:00
lezcano
39d9d2ed70 Implement reference for lerp (#87424)
We follow the vectorised CPU implementation for numerical accuracy
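A hedged sketch of the usual accuracy-preserving formulation (switching forms around weight == 0.5), which is the shape such a reference typically takes:

```python
import torch

def lerp_ref(start, end, weight):
    delta = end - start
    return torch.where(
        weight < 0.5,
        start + weight * delta,        # accurate when weight is small
        end - (1.0 - weight) * delta,  # accurate when weight is close to 1
    )

a, b = torch.tensor([0.0, 10.0]), torch.tensor([1.0, 20.0])
print(lerp_ref(a, b, torch.tensor(0.75)), torch.lerp(a, b, 0.75))
```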

Pull Request resolved: https://github.com/pytorch/pytorch/pull/87424
Approved by: https://github.com/ezyang
2022-11-02 11:21:01 +00:00
Tugsbayasgalan Manlaibaatar
2c7de4a144 Add meta implementation for aten.max.dim (#88005)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88005
Approved by: https://github.com/Chillee, https://github.com/bdhirsh
2022-11-01 18:37:24 +00:00
Edward Z. Yang
2a47b10780 Get the magic method try reverse protocol correct (#88030)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>

cc @mlazos @soumith @voznesenskym @yanboliang @penguinwu @anijain2305 @EikanWang @jgong5 @Guobing-Chen @chunyuan-w @XiaobingSuper @zhuhaozhe @blzheng @Xia-Weiwen @wenzhe-nrv @jiayisunx
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88030
Approved by: https://github.com/anjali411, https://github.com/albanD
2022-10-31 13:19:56 +00:00