Commit Graph

278 Commits

Author SHA1 Message Date
Yanbo Liang
dc58259746 [Inductor] [FX passes] Group linear fusion (#105116)
Summary:
This adds a draft version of a group + batch fusion framework, along with the group linear fusion implementation.
In the future, it should be straightforward to add a new group/batch fusion policy by defining a class with `match` + `fuse` functions (see the sketch below).
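
For illustration, a minimal sketch of what such a policy could look like under the `match` + `fuse` contract; the class and method signatures here are hypothetical, not the actual framework API:

```python
import torch

class GroupLinearFusion:
    """Hypothetical fusion policy: match candidate nodes, then fuse the group."""

    def match(self, node):
        # Decide whether this FX node can join the current fusion group,
        # e.g. an F.linear call whose inputs are mutually batchable.
        return (
            node.op == "call_function"
            and node.target is torch.nn.functional.linear
        )

    def fuse(self, graph, matched_nodes):
        # Rewrite the matched nodes into a single grouped/batched linear op.
        ...
```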

Test Plan: buck2 test 'fbcode//mode/dev-nosan' fbcode//caffe2/test/inductor:group_batch_fusion

Differential Revision: D46956695

Pull Request resolved: https://github.com/pytorch/pytorch/pull/105116
Approved by: https://github.com/jansel
2023-07-18 03:56:42 +00:00
ekamiti
32d422f335 Make adding buffers more like adding parameters (#104069)
Add semantics for creating a buffer object that mirror creating a parameter. This is done by introducing a new `Buffer` class that can be used for type disambiguation. The underlying functionality of registering a buffer remains the same, as the `register_buffer` method has not been changed. The `persistent` parameter on the `Buffer` type indicates whether a buffer object should be persistent or not. The other non-test changes have to do with getting the new `Buffer` type recognized by inductor and dynamo. The remaining changes are test changes to make sure that the `Buffer` type can be used as a drop-in replacement for `register_buffer`, as it just leads to `register_buffer` being called. This new functionality still allows normal tensors to be used as buffers, so these changes are intended to be backwards compatible.
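
A minimal sketch of the intended usage, assuming the new type is exposed as `torch.nn.Buffer`:

```python
import torch
from torch import nn

class Stats(nn.Module):
    def __init__(self):
        super().__init__()
        # Assigning a Buffer registers it, mirroring nn.Parameter semantics;
        # under the hood this still goes through register_buffer.
        self.running_mean = nn.Buffer(torch.zeros(4), persistent=True)

m = Stats()
print("running_mean" in dict(m.named_buffers()))  # True
```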

Fixes #35735

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104069
Approved by: https://github.com/mikaylagawarecki
2023-07-17 17:59:05 +00:00
Edward Z. Yang
1152e86da1 Transmute refined SymInt into int (#104828)
Previously, x.size(0) could return a SymInt, even when the internal
sympy expression was actually already constant (e.g., due to an
introduced guard). We now allow querying the Python object with
maybe_as_int, which lets us transmute these objects back to int
when possible.

It is still possible to end up with a constant SymInt even after this
change, e.g., if you get out a SymInt and while holding onto it
specialize it, but casual users are more likely to get ints when they
want to.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104828
Approved by: https://github.com/Skylion007
2023-07-15 18:46:10 +00:00
kshitij12345
d552c271db [pt2] grad support (#102264)
Teach dynamo about grad

Pull Request resolved: https://github.com/pytorch/pytorch/pull/102264
Approved by: https://github.com/zou3519
2023-06-21 10:13:09 +00:00
PyTorch MergeBot
e737a8486f Revert "[pt2] grad support (#102264)"
This reverts commit 85b83954c8.

Reverted https://github.com/pytorch/pytorch/pull/102264 on behalf of https://github.com/huydhn due to This is failing in trunk 85b83954c8 and looks like a landrace ([comment](https://github.com/pytorch/pytorch/pull/102264#issuecomment-1600001309))
2023-06-21 03:02:55 +00:00
kshitij12345
85b83954c8 [pt2] grad support (#102264)
Teach dynamo about grad

Pull Request resolved: https://github.com/pytorch/pytorch/pull/102264
Approved by: https://github.com/zou3519
2023-06-21 01:37:08 +00:00
Kurt Mohler
ee83c646bb Replace _prims_common.check with torch._check* (#103240)
This relands most of the changes from #102219, which were backed out by #103128. However, instead of removing `_prims_common.check`, it adds a warning and a comment mentioning that it will be removed in the future and that `torch._check*` should be used instead. As mentioned in https://github.com/pytorch/pytorch/pull/103128#pullrequestreview-1466414415, `_prims_common.check` cannot yet be removed because of some internal usage.
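
A minimal sketch of the replacement pattern (`torch._check` takes a condition and an optional callable producing the error message, raising `RuntimeError` when the condition is False):

```python
import torch

def my_op(x: torch.Tensor) -> torch.Tensor:
    # Previously: _prims_common.check(cond, lambda: msg)
    torch._check(x.dim() == 2, lambda: f"expected a 2D tensor, got {x.dim()}D")
    return x.t()
```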

Part of #72948

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103240
Approved by: https://github.com/albanD
2023-06-21 00:46:17 +00:00
Richard Zou
08a054649c [operator_compile_check] Add FakeTensor testing (#103595)
This PR adds dedicated FakeTensor testing to operator_compile_check. We
reuse CrossRefFakeMode to do this and improve the error messages on it.

Note that this only really runs detailed tests for operators that do not
have data-dependent output shape. In the future we should add something
like a dynamic CrossRefFakeMode.

Test Plan:
- existing tests (these now have improved error messages).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/103595
Approved by: https://github.com/ezyang, https://github.com/soulitzer
2023-06-16 16:55:51 +00:00
xuanqi
b27c3558a4 [RFC]: Create aten native op for constrain_range (#103346)
At a high level, the current implementation of the constraint functions (`constrain_as_*`) raises an exception for the following code snippet:
```
def f(x):
    a = x.item()
    constrain_as_size(a, 4, 7)
    return torch.empty((a, 4))

inp = torch.tensor([5])
ep = torch._export.export(f, (inp,))
```

The reason is as follows:
1) The current constraint logic is purely Python, so it won't survive AOT export (the full node is gone after AOT export, since AOT export only maintains aten-level ops).
2) It relies on a side effect to add range constraints to the traced symbol's shape env ([code](9591e52880/torch/fx/experimental/symbolic_shapes.py (L370-L372))).
3) If runtime assertions are turned on (the default), [`_AddRuntimeAssertionsForConstraintsPass`](9591e52880/torch/_export/passes/add_runtime_assertions_for_constraints_pass.py (L98-L100)) will try to append assertion nodes based on the range constraints extracted from each symbol's shape env during another interpretation round.
4) However, because of 1), the range constraint logic won't run for symbols generated during the AOT export round, so no range constraint information is available for the assertion round, which causes the issue.
5) As a result, export fails at `torch.empty((a, 4))` (there is no constraint saying `a` must be positive).

The fix here is to implement the range constraint logic as a native aten op (with a no-op CPU implementation) so that it survives AOT export.
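
A hedged sketch of the idea, using an illustrative namespace and schema (not the actual aten registration): because the constraint is a real op, it appears as a node in the exported aten graph and thus survives AOT export, while costing nothing at runtime.

```python
import torch

lib = torch.library.Library("demo", "DEF")  # hypothetical namespace
lib.define("sym_constrain_range(Scalar size, int min, int max) -> ()")

def _noop(size, min, max):
    # No-op at runtime; the op's value is its presence in the graph,
    # which downstream passes read to recover the range constraint.
    pass

lib.impl("sym_constrain_range", _noop, "CPU")
```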

**NOTE:**
The [logic](2d745b95d7/torch/fx/experimental/symbolic_shapes.py (L350-L365C15)) within [`constrain_range`](2d745b95d7/torch/fx/experimental/symbolic_shapes.py (LL313C74-L313C74)) is split out as `constrain_range_int` to handle the case when a non-`SymInt` is passed in, and is reused in the new `_constrain_range`. The reason is that when a non-`SymInt` is provided:
* If it directly calls `sym_constrain_range`, the C++ version will be called, which is a no-op.
* So in this case it calls `constrain_range_int` instead, to catch issues such as a user providing an input whose tensor shape is out of range during export, as in the following variation of the code example above:
```
...
inp = torch.tensor([10])
ep = torch._export.export(f, (inp,)) # immediately raise error
```

Differential Revision: [D46734204](https://our.internmc.facebook.com/intern/diff/D46734204)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103346
Approved by: https://github.com/tugsbayasgalan
2023-06-16 14:55:40 +00:00
Thiago Crepaldi
6f655d4195 Add symbolic tracing support to torch._dynamo.export (fake input + weights) (#100017)
Fixes #95900
Using the following repro as a guide:

```python
import torch
import torch._dynamo
from torch._subclasses import fake_tensor
from torch.fx.experimental.symbolic_shapes import ShapeEnv
from torch._dynamo.output_graph import config
class Model(torch.nn.Module):
    def __init__(self) -> None:
        super().__init__()
        self.linear = torch.nn.Linear(2, 2)
        self.linear2 = torch.nn.Linear(2, 2)

    def forward(self, x):
        out = self.linear(x)
        out = self.linear2(out)
        return out

fake_mode = fake_tensor.FakeTensorMode(allow_non_fake_inputs=False,
                                       allow_fallback_kernels=True,
                                       shape_env=ShapeEnv(
                                            allow_scalar_outputs=config.capture_scalar_outputs,
                                            allow_dynamic_output_shape_ops=config.capture_dynamic_output_shape_ops,
                                            frame_id=0
                                        ),
)
# Fakefying input/model before calling torch._dynamo.export
with fake_mode:
    fake_x = torch.rand(5, 2, 2)
    model = Model()

# Calling torch._dynamo.export without active fake mode
graph_module, guards = torch._dynamo.export(
    model,
    fake_x,
    aten_graph=True,
    fake_mode=fake_mode
)
graph_module.print_readable()
graph_module.graph.print_tabular()
```

Summary of changes:

    * Plumb `fake_mode` through the `torch._dynamo.export` API. When specified, it
    replaces the creation of a new `FakeTensorMode` at `InstructionTranslator` on behalf of `OutputGraph`.
    * Hack `FakeTensor.__new__` to prevent a
    `torch.Tensor._make_subclass` call for inputs that are already fakefied by
    the user. This probably needs to be fixed in a nicer way. Any idea?
    * Removed a few asserts that rejected fake tensors coming
    from the user script.
    * Added `torch._subclasses.fake_tensor.FakeTensor` to the type list on a few
    assert checks to allow fake inputs.

The changes above allow symbolic tracing with both static and dynamic shapes.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100017
Approved by: https://github.com/ezyang
2023-06-15 21:28:10 +00:00
Edward Z. Yang
1c3a7d9a7e Resolve TODO by deleting assert sparse cannot be meta on SymInt (#103299)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/103299
Approved by: https://github.com/bdhirsh
2023-06-09 17:13:54 +00:00
Edward Z. Yang
96fd283640 Preserve CreationMeta when metafying views. (#103152)
This helps us avoid erroring and generates more accurate error messages
in Dynamo when doing mutations on views.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/103152
Approved by: https://github.com/albanD
2023-06-09 12:34:54 +00:00
Jack Taylor
e936277cc2 [ROCm] force HIP context initialization for inductor UTs (#103149)
Workaround for https://github.com/pytorch/pytorch/issues/102886
related to: https://github.com/pytorch/pytorch/issues/102476 https://github.com/pytorch/pytorch/issues/102475 https://github.com/pytorch/pytorch/issues/102474 https://github.com/pytorch/pytorch/issues/102473 https://github.com/pytorch/pytorch/issues/102472

Since 9aaa12e328, the first inductor (CPU) UT fails until the GPU context is correctly initialised, and the subsequent UTs pass. CUDA observed the same issue, and a workaround was pushed to force initialisation of the CUDA context by declaring an empty tensor (https://github.com/pytorch/pytorch/issues/92627); we have adopted the same approach but opted for `torch.zeros`, which correctly activates the HIP context after the kernel launch.

**Reproducer:**
```
import torch
from torch._subclasses.fake_tensor import FakeTensorMode
import argparse
if __name__ == '__main__':
    parser = argparse.ArgumentParser(description='Swap between torch.empty and torch.randn operations.')
    parser.add_argument('--empty', action='store_true', help='Use torch.empty operation')
    parser.add_argument('--rand', action='store_true', help='Use torch.rand operation')
    args = parser.parse_args()

    torch.cuda.set_device(0)
    if args.empty:
        torch.empty(1, device="cuda")
    elif args.rand:
        torch.rand(1, device="cuda")
    print(f": hasPrimaryContext: {torch._C._cuda_hasPrimaryContext(0)")
    with FakeTensorMode():
        p = torch.randn(4, 2, requires_grad=True, device='cuda')
        x = torch.randn(8, 4, device='cuda')
        y = torch.mm(x, p).square().sum()
        y.backward()
```

**ROCm python repro.py --empty**
0: hasPrimaryContext: False

**ROCm python repro.py --rand**
0: hasPrimaryContext: True

**CUDA python repro.py --empty**
0: hasPrimaryContext: True

**CUDA python repro.py --rand**
0: hasPrimaryContext: True

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103149
Approved by: https://github.com/eellison
2023-06-07 21:42:33 +00:00
Ivan Zaitsev
821493715c Back out "Remove check from _prims_common, replace with torch._check* (#102219)", Back out "Forwatd fix for D46427687" (#103128)
Test Plan: revertitparrot

Reviewed By: malfet

Differential Revision: D46506433

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103128
Approved by: https://github.com/malfet
2023-06-07 01:41:41 +00:00
Richard Zou
5b700fc914 Disable fallback for custom kernels (#101131)
Previous failed attempt was here: https://github.com/pytorch/pytorch/pull/97715.
Basically we tried to disable fallback for all ops (aten + custom) but hit many CI failures due to missing fake tensor coverage. Let's just disable it for custom kernels for now.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/101131
Approved by: https://github.com/zou3519
2023-06-06 23:25:29 +00:00
shibo19
9d20b47e47 make device normalization more generic in faketensor (#102519)
Fixes #ISSUE_NUMBER
Make the device normalization more generic in FakeTensor to support devices like "cuda", "foo", and so on.
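
An illustration of why normalization matters: an unindexed device string compares unequal to its indexed form.

```python
import torch

print(torch.device("cuda") == torch.device("cuda:0"))     # False: index None vs 0
print(torch.device("cuda:0") == torch.device("cuda", 0))  # True
```
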
Pull Request resolved: https://github.com/pytorch/pytorch/pull/102519
Approved by: https://github.com/albanD
2023-06-04 01:44:21 +00:00
Kurt Mohler
a84bb2709a Remove check from _prims_common, replace with torch._check* (#102219)
Part of #72948

Pull Request resolved: https://github.com/pytorch/pytorch/pull/102219
Approved by: https://github.com/lezcano, https://github.com/albanD
2023-06-03 02:23:21 +00:00
PyTorch MergeBot
a7efa0ce35 Revert "Remove check from _prims_common, replace with torch._check* (#102219)"
This reverts commit fb79d43649.

Reverted https://github.com/pytorch/pytorch/pull/102219 on behalf of https://github.com/malfet due to Broke lint, see https://github.com/pytorch/pytorch/actions/runs/5158949959/jobs/9293466925 ([comment](https://github.com/pytorch/pytorch/pull/102219#issuecomment-1574245414))
2023-06-02 20:00:48 +00:00
Kurt Mohler
fb79d43649 Remove check from _prims_common, replace with torch._check* (#102219)
Part of #72948

Pull Request resolved: https://github.com/pytorch/pytorch/pull/102219
Approved by: https://github.com/lezcano, https://github.com/albanD
2023-06-02 19:13:45 +00:00
William Wen
da963d793b Fix aten.copy device mismatch bug in FakeTensor (#102664)
Fixes `pytest ./generated/test_yizhou_wang_RODNet.py -k test_000` failure in https://github.com/pytorch/pytorch/issues/92670.

FakeTensor would raise an error upon trying to run `aten.copy` with inputs with different devices, although this is allowed behavior.

Also fix `aten.slice_scatter`, since it also takes args with different devices.
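
For reference, the eager behavior the fake mode needs to match: cross-device copies are legal. A minimal example, guarded on CUDA availability:

```python
import torch

if torch.cuda.is_available():
    dst = torch.empty(4, device="cuda")
    src = torch.arange(4.0)  # CPU tensor
    dst.copy_(src)           # allowed: aten.copy takes args on different devices
```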

Pull Request resolved: https://github.com/pytorch/pytorch/pull/102664
Approved by: https://github.com/yanboliang
2023-06-01 23:05:20 +00:00
Edward Z. Yang
e03800a93a Add torch._utils.render_call, improve printoptions (#102623)
- Add get_printoptions and printoptions context manager
- Improve edgeitems handling when it is zero
- Add render_call which can be used to conveniently print command
  line arguments of a function call, while suppressing actual
  tensor data
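
A hedged sketch of how the new helpers fit together (import paths assumed from the summary above):

```python
import torch
from torch._tensor_str import printoptions  # context manager added here
from torch._utils import render_call

x = torch.randn(3, 3)
with printoptions(threshold=0, edgeitems=0):
    print(x)  # tensor data suppressed per the options

# Render a call's arguments without dumping tensor contents:
print(render_call(torch.add, (x, x), {"alpha": 2}))
```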

Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/102623
Approved by: https://github.com/albanD
2023-05-31 22:08:04 +00:00
Vishwa Raj Singh
c27cefccd3 Faketensor hpu device normalization (#102512)
FakeTensor doesn't normalize the device index and fails on the test case below:

```
import torch
import habana_frameworks.torch.hpu
from torch._subclasses.fake_tensor import FakeTensorMode

with FakeTensorMode.push():
    a = torch.empty(1, device="hpu")
    b = torch.empty(1, device="hpu:0")
    result = a + b
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/102512
Approved by: https://github.com/albanD
2023-05-31 17:06:44 +00:00
Peter Bell
ce42010722 [inductor][decomp] Add aten._unsafe_index_put for unchecked indexing (#101812)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/101812
Approved by: https://github.com/lezcano
2023-05-24 22:17:32 +00:00
Edward Z. Yang
3318a832b3 Tighten FakeTensor reentrancy asserts, add debugging (#102091)
When investigating failures in https://github.com/pytorch/pytorch/pull/100017, I realized that we were re-entering FakeTensorMode even though there was already one on the stack. Although we have attempted asserts for these cases in the past (e.g., in https://github.com/pytorch/pytorch/pull/97186), it seems the existing protections were insufficient.

In this particular case, the reapplication of FakeTensorMode was due to an interaction with NotImplemented multiple dispatch handling. If proxy tensor mode detects an unrecognized tensor type (this includes FakeTensor, if it is not tracked with a proxy), it will return NotImplemented to give this tensor a chance to unpack itself into a proxyable operation. However, this is never the right thing for FakeTensor, where no unpacking is possible. Yet today, FakeTensor attempts to reapply the FakeTensorMode, resulting in FakeTensorMode being on the stack twice.

This PR does a number of things:

* It adds an assert in `FakeTensorMode.__torch_dispatch__` that you must not already have this mode on the stack, this is ALWAYS an error
* It modifies `FakeTensor.__torch_dispatch__` to return `NotImplemented` if the mode is already active. This prevents us from re-adding the mode to the stack (see the sketch after this list).
* It adds a new logging artifact `not_implemented` which you can use to get debug logs about all of the times a `__torch_dispatch__` handler returned NotImplemented and why it did so. Your subclass has to manually opt into this logging, but I inserted the necessary logs for ProxyTensorMode and FakeTensor(Mode)
* `with fake_mode` now no-ops if the fake mode is already on the stack, which is what users want anyway
* I am BREAKING pre-autograd tracing, because it is currently doing something weird with the original C++ mode stack. Brian is going to follow up with a fix next week.
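
A generic sketch of the `NotImplemented` pattern from the second bullet (simplified, not the actual FakeTensor code):

```python
import torch

_MODE_ACTIVE = False  # stand-in for "is our paired mode on the stack?"

class MySubclass(torch.Tensor):
    @classmethod
    def __torch_dispatch__(cls, func, types, args=(), kwargs=None):
        # If the paired mode is already handling dispatch, defer instead of
        # re-entering it: NotImplemented tells multiple dispatch to move on.
        if _MODE_ACTIVE:
            return NotImplemented
        return func(*args, **(kwargs or {}))
```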

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/102091
Approved by: https://github.com/thiagocrepaldi, https://github.com/eellison, https://github.com/wanchaol, https://github.com/bdhirsh
2023-05-24 05:37:51 +00:00
PyTorch MergeBot
5147fe4969 Revert "[inductor][decomp] Add aten._unsafe_index_put for unchecked indexing (#101812)"
This reverts commit b9721bd705.

Reverted https://github.com/pytorch/pytorch/pull/101812 on behalf of https://github.com/osalpekar due to Causing test_nn_cuda tests to crash during runtime. More details at [D46093942](https://www.internalfb.com/diff/D46093942) ([comment](https://github.com/pytorch/pytorch/pull/101812#issuecomment-1560238085))
2023-05-23 23:06:21 +00:00
Richard Zou
8487105fae [custom_op] Create a new torch._custom_op namespace (#101823)
torch/custom_op.py is getting long, and the autograd pieces are going to
make it even longer. I'm planning on just organizing the files under
a torch/_custom_op folder.

Note that the imports now look a bit crazy (from torch._custom_op.impl
import...) but they will look more OK when we figure out the plan to
make custom_op public (coming later).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/101823
Approved by: https://github.com/ezyang, https://github.com/albanD, https://github.com/bdhirsh
2023-05-23 18:31:29 +00:00
Elias Ellison
e9246b290f Initialize cuda tensor in fake tensor (#102027)
Fix for https://github.com/pytorch/pytorch/issues/92627

Pull Request resolved: https://github.com/pytorch/pytorch/pull/102027
Approved by: https://github.com/ngimel
2023-05-23 06:24:50 +00:00
Peter Bell
b9721bd705 [inductor][decomp] Add aten._unsafe_index_put for unchecked indexing (#101812)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/101812
Approved by: https://github.com/lezcano
2023-05-22 20:39:18 +00:00
Elias Ellison
f99eeb5bdf Check devices on meta functions that return inputs (#101807)
FakeTensor has a default device logic that wraps meta tensors to the right device after running meta kernels and throws on multiple devices. This logic was only running on the wrapping from meta kernels -> fake. For out variants, where the output of the meta kernel was already a fake tensor because it was an input, the device logic wasn't running.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/101807
Approved by: https://github.com/ngimel
2023-05-19 16:13:39 +00:00
Richard Zou
3ffeab7f80 [custom_op] Make repeated registrations error gracefully (#100979)
Previously the error message went through torch.library. This PR changes
it so that on each custom_op.impl_* call:
- we store a (function, location) tuple
- if a (function, location) tuple exists already, then we raise an
error.

This logic already existed for the abstract impl (the impl for meta and
fake tensors), so this PR just extends it to the others.
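
A generic sketch of the duplicate-registration guard described above (not PyTorch's actual code):

```python
import traceback

_registry = {}  # (op_name, kind) -> (function, "file:line" of registration)

def register_impl(op_name, kind, fn):
    key = (op_name, kind)
    if key in _registry:
        _, location = _registry[key]
        raise RuntimeError(f"{key} already registered at {location}")
    frame = traceback.extract_stack(limit=2)[0]  # the caller's frame
    _registry[key] = (fn, f"{frame.filename}:{frame.lineno}")
```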

Test Plan:
- new test
Pull Request resolved: https://github.com/pytorch/pytorch/pull/100979
Approved by: https://github.com/bdhirsh, https://github.com/soulitzer
2023-05-12 13:49:15 +00:00
Bert Maher
d283075282 Reduce fake_tensor create_mode logging (#101074)
A lot of Meta-internal logging is at INFO level, so this produces a lot of spam

Differential Revision: [D45732720](https://our.internmc.facebook.com/intern/diff/D45732720/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/101074
Approved by: https://github.com/eellison, https://github.com/ezyang
2023-05-11 13:26:38 +00:00
Nikita Karetnikov
37f1be041a [pt2] enable svd in fake_tensor (#100130)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/100130
Approved by: https://github.com/ezyang, https://github.com/lezcano
2023-05-05 06:27:59 +00:00
Richard Zou
e6f9bc500b CustomOp simple abstract implementation registration (#99439)
This PR:
- adds an abstract registration API for CustomOp (CustomOp.impl_abstract)
that is used for both FakeTensor and meta tensors
- deletes CustomOp.impl_meta

The user story behind this API is that it is the one-stop shop for
registering implementations for data-less Tensors, i.e. FakeTensor and
Meta tensor.

The abstract implementation provided by the user:
- gets registered as the FakeTensor implementation AND the meta formula
- can be written like a regular meta formula. If the user decides that
they need something more special (i.e. data-dependent output shape),
then they are able to query a current context object (FakeTensorImplCtx)
that has methods to construct new unbacked symints.

Caveats:
- we really need to make FakeTensor/FakeTensorMode public. Otherwise,
there isn't a way for the user to interactively test that their abstract
implementation is correct without running through large pieces of the
PT2 stack (make_fx or torch.compile).
- We do not memoize the symints produced by
ctx.create_unbacked_symint(). It is possible to do this in the
future, but it is difficult to do soundly and I am not convinced of
the utility outside of the nonzero() usecase mentioned in #95399

Public API:
- More docs will come when we actually expose this API to users by
putting it in a public namespace, unless you folks want it now.
- The APIs mentioned in `__all__` are the ones that are intended to be
public.

Test Plan:
- Updated existing custom_op_db operators
- Added new numpy_nonzero and numpy_nms operations that test operations
that have data-dependent output shape.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99439
Approved by: https://github.com/ezyang
2023-04-28 13:45:39 +00:00
Brian Hirsh
9834358e0f Get SchemaCheckMode to error on ops that return inputs directly. Expose as a dynamo backend, eager_debug (#99744)
Talked to @zou3519 and @ezyang on what the right UX is: tentatively, adding a new dynamo backend is cheap and simple, so it seems worth doing. And longer term, we agreed (?) that it's worth seeing if we can get custom ops sanity asserts to run more automatically, instead of needing a separate backend.

Side comment: that actually seems tough: the mode detects secret mutations by cloning every input to every op, running the op, and checking that the data matches between the real input and the cloned input. So I doubt we'll be able to make that behavior always-on? It would need some config at least.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/99744
Approved by: https://github.com/albanD, https://github.com/ezyang, https://github.com/zou3519
2023-04-27 20:12:42 +00:00
Brian Hirsh
1f2d00e537 move SchemaCheckMode to torch/_subclasses (#99743)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99743
Approved by: https://github.com/albanD
2023-04-27 20:12:41 +00:00
Aaron Gokaslan
e2a3817dfd [BE] Enable C419 rule for any all shortcircuiting (#99890)
Apparently https://github.com/pytorch/pytorch/pull/78142 made TorchScript allow simple generator expressions, which lets us enable rules that replace unnecessary list comprehensions with generators in any/all. This was originally part of #99280, but I split it off into this PR so that it can be easily reverted should anything break.
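
The rule in a nutshell:

```python
xs = [1, -2, 3]

# Flagged by C419: materializes the whole list before any() can look at it.
print(any([x > 0 for x in xs]))

# Preferred: a generator expression; any()/all() short-circuit lazily.
print(any(x > 0 for x in xs))
```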

Pull Request resolved: https://github.com/pytorch/pytorch/pull/99890
Approved by: https://github.com/justinchuby, https://github.com/kit1980, https://github.com/malfet
2023-04-25 15:02:13 +00:00
Jiong Gong
e5c9a0fcf5 [dynamo] avoid graph break on repeat_interleave.self_int (#99528)
Address convit_base failure: https://github.com/pytorch/torchdynamo/issues/1886 mentioned in https://github.com/pytorch/pytorch/issues/93777
Also for models like EleutherAI/gpt-j-6B.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/99528
Approved by: https://github.com/ezyang
2023-04-25 04:47:39 +00:00
Edward Z. Yang
41280a0791 Don't detach to create parameters in MetaConverter (#99618)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99618
Approved by: https://github.com/albanD
2023-04-24 19:01:26 +00:00
Michael Voznesensky
5e73569ab4 Add memoized_only mode to tensor conversion (#99741)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99741
Approved by: https://github.com/ezyang
2023-04-22 19:19:39 +00:00
Edward Z. Yang
18fd6394dc Give distinct names to __unknown_tensor (#99729)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/99729
Approved by: https://github.com/albanD
2023-04-21 21:03:43 +00:00
Elias Ellison
638feec4e3 Turn on meta converter for complex (#98869)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98869
Approved by: https://github.com/ngimel
2023-04-20 16:42:38 +00:00
Richard Zou
57e1a50da3 Fix FakeTensor printing (#99205)
I got too confused by the FakeTensor printing, so this PR fixes it to
print normally.

Before:
```
with FakeTensorMode():
    x = torch.empty(2, 2, device="cpu")
    print(x)
    # FakeTensor(FakeTensor(..., device='meta', shape=(2, 2)), cpu)
```
After (Tensor printing doesn't print the default device):
```
FakeTensor(..., shape=(2, 2))
```

Test Plan:
- new test
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99205
Approved by: https://github.com/eellison
2023-04-18 13:26:27 +00:00
Tugsbayasgalan Manlaibaatar
7401f0f8ce Add unbacked symbool support (#98877)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98877
Approved by: https://github.com/ezyang
2023-04-17 17:45:10 +00:00
Michael Voznesensky
d5f7ec8a31 Apply dynamic shapes policy correctly to _base tensor (#99211)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99211
Approved by: https://github.com/ezyang
2023-04-15 17:18:45 +00:00
andrewor14
651c1be885 Recompute flat_arg_fake_tensors after fakification (#98769)
Summary: This fixes the case when some of the input tensors were
real tensors and fakified in `validate_and_convert_non_fake_tensors`,
but `flat_arg_fake_tensors` would not contain all the inputs
because it was computed before the fakification. We fix this by
recomputing `flat_arg_fake_tensors` after fakification as well.

Test Plan:
python test/dynamo/test_export.py ExportTests.test_mixed_real_and_fake_inputs

Reviewers: Chillee, voznesenskym

Pull Request resolved: https://github.com/pytorch/pytorch/pull/98769
Approved by: https://github.com/voznesenskym
2023-04-14 19:14:29 +00:00
Elias Ellison
fc53472ce4 Move/Fix FakeTensor logic for detecting multiple fake modes (#97186)
This was left over from when we had more logic in FakeTensor rather than FakeTensorMode, and it wasn't firing correctly. It also makes more sense for it to be in the other validation function.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/97186
Approved by: https://github.com/bdhirsh
2023-04-13 19:20:01 +00:00
PyTorch MergeBot
4828585019 Revert "Move/Fix FakeTensor logic for detecting multiple fake modes (#97186)"
This reverts commit 8a057c445d.

Reverted https://github.com/pytorch/pytorch/pull/97186 on behalf of https://github.com/huydhn due to This breaks ONNX test in trunk and it looks like a landrace as the CI signal is green
2023-04-12 19:24:54 +00:00
Elias Ellison
8a057c445d Move/Fix FakeTensor logic for detecting multiple fake modes (#97186)
This was left over from when we had more logic in FakeTensor rather than FakeTensorMode, and it wasn't firing correctly. It also makes more sense for it to be in the other validation function.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/97186
Approved by: https://github.com/bdhirsh
2023-04-12 17:40:41 +00:00
Edward Z. Yang
b09722f540 Convert logging f-strings to use % format, part two (#98700)
This hits multi-line logging strings

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/98700
Approved by: https://github.com/voznesenskym
2023-04-10 12:19:31 +00:00
Edward Z. Yang
d01ee10b25 Add detect_fake_mode (#98321)
This replaces fake_mode_from_tensors, but it preferentially looks for
a fake mode in TracingContext, and then for an active fake mode
on the dispatch stack, before groveling in the tensors to find it.
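
A small sketch of the fallback-to-tensors path (import location assumed; current PyTorch exposes the helper from `torch._guards`):

```python
import torch
from torch._guards import detect_fake_mode
from torch._subclasses import FakeTensorMode

with FakeTensorMode() as mode:
    x = torch.empty(2, 2)

# No TracingContext and no active mode here, so the fake mode is
# recovered by groveling in the tensor arguments.
assert detect_fake_mode([x]) is mode
```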

This advances PegasusForCausalLM, which was previously failing because
we generated a graph that had a parameter (non-fake) and a SymInt,
and thus previously we failed to detect the correct fake mode.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/98321
Approved by: https://github.com/voznesenskym
2023-04-05 22:15:16 +00:00
Edward Z. Yang
8372c5dc68 Refactor dynamic dims api, stateless internals, higher level export API (#96699)
The purpose of this API is to execute a few large components of work:

1) Refactor all the internals of plumbing dynamic dimension information after dynamo to be stateless
2) Decouple allocation controls around dynamic dimensions from verification
3) For (2), for allocation, create an enum that dictates whether we are in DUCK (the default today), STATIC (aka assume_static_default in the past), or DYNAMIC (user constrained, do not duck shape) mode; see the sketch after this list.
4) For (2), for verification, we separate the list of dynamic ranges out entirely from allocation. This means the shape_env does no tracking of what we verify on; instead, it is the caller's job to invoke produce_guards() with the various things they want verified, specifically, with the valid ranges. We do use constrained ranges to refine value ranges when doing analysis.
5) We have decided, therefore, as an extension of (4), to double down on "late" checks versus "eager" checks, primarily because the mechanism for gathering what actually matters happens during guards, and should be the purview of the caller seeking guards, not the shape env. However, for dynamo, these structures are essentially one and the same.
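
An illustrative sketch of the allocation policy from point (3); the enum and member names here mirror the description, not necessarily the exact code:

```python
from enum import Enum, auto

class DimDynamic(Enum):
    DUCK = auto()     # default: duck shaping may reuse symbols across equal sizes
    STATIC = auto()   # assume static: specialize the dim to its observed size
    DYNAMIC = auto()  # user-constrained: always allocate a fresh symbol
```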

Pull Request resolved: https://github.com/pytorch/pytorch/pull/96699
Approved by: https://github.com/avikchaudhuri, https://github.com/ezyang
2023-03-29 16:55:49 +00:00
Edward Z. Yang
fb7f983357 Graph break on operators that fake tensor doesn't support (#97708)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/97708
Approved by: https://github.com/eellison
2023-03-28 19:49:54 +00:00
Elias Ellison
6854fd7189 Add Config to Skip Cpp Codegen, Enable in FBCode (#97204)
Differential Revision: [D44353662](https://our.internmc.facebook.com/intern/diff/D44353662)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/97204
Approved by: https://github.com/ngimel, https://github.com/bertmaher, https://github.com/mikekgfb, https://github.com/cpuhrsch
2023-03-28 18:21:15 +00:00
Edward Z. Yang
fa82080016 Don't run fallback if symbolic sizes in fake tensor (#97148)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/97148
Approved by: https://github.com/Skylion007, https://github.com/eellison, https://github.com/bdhirsh
2023-03-21 02:23:44 +00:00
BowenBao
60a68477a6 Bump black version to 23.1.0 (#96578)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/96578
Approved by: https://github.com/ezyang
2023-03-15 06:27:59 +00:00
Edward Z. Yang
98ff841a75 Use maxint to bound integers. (#96121)
We don't actually support arbitrary precision integers.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/96121
Approved by: https://github.com/tugsbayasgalan, https://github.com/lezcano
2023-03-07 12:46:19 +00:00
Edward Z. Yang
027ebca4d7 Don't use guardless contiguity/stride-like implementations (#95733)
These prevent us from simplifying tests involving unbacked SymInts,
and then you end up with unbacked SymInt in guards, which is bad.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/95733
Approved by: https://github.com/tugsbayasgalan
2023-03-03 21:56:41 +00:00
PyTorch MergeBot
4026c62174 Revert "Don't use guardless contiguity/stride-like implementations (#95733)"
This reverts commit deaf077de8.

Reverted https://github.com/pytorch/pytorch/pull/95733 on behalf of https://github.com/ezyang due to apparently this regresses executorch tests internally
2023-03-03 17:43:05 +00:00
Wonjoo Lee
3095c95828 Fixes for PyTorch/XLA functionalization integration (#94537)
Fixes for PyTorch/XLA functionalization integration

---
Some notable changes include:
- More asserts in `FunctionalTensorWrapper`, so bugs show up more cleanly in cases where we e.g. forget to wrap an output
- Make the *_scatter ops `CompositeExplicitAutogradNonFunctional`, so we get a better error message and XLA doesn't accidentally try to use them
- Fix LTC/XLA codegen in core to handle multi-tensor out= ops with no returns
- Better erroring: Allow XLA to use the CPU fallback from core in a way so that it always errors on view ops, which XLA should no longer see.
- Update MetaConverter to exclude XLA tensors in raising NotImplemented…
- Add `_propagate_xla_data` op
- Add meta tensor support for some ops
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94537
Approved by: https://github.com/bdhirsh
2023-03-02 23:02:34 +00:00
Edward Z. Yang
deaf077de8 Don't use guardless contiguity/stride-like implementations (#95733)
These prevent us from simplifying tests involving unbacked SymInts,
and then you end up with unbacked SymInt in guards, which is bad.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/95733
Approved by: https://github.com/tugsbayasgalan
2023-03-01 23:14:58 +00:00
Edward Z. Yang
8efe4fd590 Memoize repeated nonzero calls to the same fake tensor (#95399)
This removes the need to explicitly constrain_unify `x[mask]` and `y[mask]` when mask is a boolean tensor. It's very narrow but it seems to work in practice.

To invalidate the nonzero call when mutation occurs, I use version counter. I know there are ways to bypass this but I think it's good enough for now.
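
A generic sketch of version-counter invalidation (not the actual code):

```python
_cache = {}  # id(tensor) -> (version seen, memoized result)

def memoized_nonzero(t, compute):
    entry = _cache.get(id(t))
    if entry is not None and entry[0] == t._version:
        return entry[1]  # same tensor, not mutated since: reuse the result
    result = compute(t)
    _cache[id(t)] = (t._version, result)  # any mutation bumps _version
    return result
```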

Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/95399
Approved by: https://github.com/eellison
2023-02-24 00:27:45 +00:00
Edward Z. Yang
4833e47feb Add support for nonzero, some improvements to reduce guards (#95387)
This takes the strategy described in https://docs.google.com/document/d/1lFRYAJo5nrfxRhwIzGnfi2pbLpU6T4ytSRSuLJ5qebI/edit#

It is essentially https://github.com/pytorch/pytorch/pull/95222 but squashed and with changes that are unnecessary given that we assume nonzero returns > 1.

What's in the PR:

* nonzero now supports meta propagation. When `capture_dynamic_output_shape_ops` is set, it will return a tensor with an unbacked SymInt representing the size in question (see the sketch after this list).
* The unbacked SymInt is UNSOUNDLY assumed to be not equal to 0/1. We will still error if you guard otherwise.
* PrimTorch pointwise operators are updated to use empty_permuted, to avoid guarding on unbacked SymInt from empty_strided (tested in `test_dynamic_pointwise_scalar`)
* Convolution is updated to skip backend selection if batch is unbacked, to avoid guarding on unbacked SymInt (tested in `test_unbacked_batch_resnet`)
* I kept the helper utilities like `definitely_true` for working with possibly unbacked SymInts. They're not used right now but maybe someone will find them useful.
* Added `constrain_unify` to let you specify two unbacked SymInts must have the same value
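
A sketch of the flag's effect (config name from the first bullet above):

```python
import torch

torch._dynamo.config.capture_dynamic_output_shape_ops = True

@torch.compile(fullgraph=True)
def f(x):
    idx = x.nonzero()  # output size is an unbacked SymInt during tracing
    return idx.sum()

print(f(torch.tensor([0, 1, 0, 1])))
```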

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/95387
Approved by: https://github.com/voznesenskym
2023-02-24 00:27:45 +00:00
Edward Z. Yang
a2f44d82f8 Flag guard unbacked SymInt/SymFloat support (#94987)
I believe this fixes the AllenaiLongformerBase problem in periodic.

The longer version of the problem: we are currently optimistically converting all item() calls into unbacked SymInt/SymFloat, but sometimes this results in a downstream error due to a data-dependent guard. Fallbacks for this case are nonexistent; this will just crash the model. This is bad. So we flag-guard until we get working fallbacks.

What could these fallbacks look like? One idea I have is to optimistically make data-dependent calls unbacked, but then if it results in a crash, restart Dynamo analysis with the plan of graph breaking when the item() call immediately happened.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94987
Approved by: https://github.com/Skylion007, https://github.com/malfet
2023-02-17 00:25:05 +00:00
Edward Z. Yang
ef5de0a4cf Don't use PrimTorch decomposition for empty (#94512)
This PR removes the unnecessary == 0 guard when constructing empty tensors, by ensuring that when we create a contiguous tensor we go directly to the C++ torch.empty implementation (instead of indirecting through empty_strided), where we can bypass doing zero tests when computing the size of the storage. This probably also speeds up trace time.

When I did this, I found out that `empty_tensor_restride_symint` was flagrantly wrong (we had never exercised it before because we redirected to `empty_strided` in PrimTorch decomp, which doesn't hit this codepath.) The bugs:

* Stride computation was wrong (only `last_idx` was ever written to)
* Using set_sizes_and_strides with `sym_sizes` input doesn't work, because there is some sort of ordering problem where `clone_symvec` isn't safe when you clone a vector into itself. Probably should fix this.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94512
Approved by: https://github.com/ngimel
2023-02-16 16:04:41 +00:00
Aaron Gokaslan
8fce9a09cd [BE]: pyupgrade Python to 3.8 - imports and object inheritance only (#94308)
Apply parts of pyupgrade to torch (starting with the safest changes).
This PR only does two things: removes the need to inherit from object and removes unused future imports.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94308
Approved by: https://github.com/ezyang, https://github.com/albanD
2023-02-07 21:10:56 +00:00
Edward Z. Yang
d690a596dc Fast path binary ops in fake tensor (#94047)
Fast path execution of a few binary ops in fake tensor, to speed up trace time. When testing `python benchmarks/dynamo/timm_models.py --accuracy --timing --backend aot_eager --dynamic-shapes --float32 --only hrnet_w18`, I get the following trace speedup.

Before:

```
cuda eval  hrnet_w18                           PASS
TIMING: entire_frame_compile:53.97591 backend_compile:33.60832
STATS: call_* op count: 1369 | FakeTensor.__torch_dispatch__:4995 | FakeTensorMode.__torch_dispatch__:89985 | ProxyTorchDispatchMode.__torch_dispatch__:3010
```

After:

```
cuda eval  hrnet_w18                           PASS
TIMING: entire_frame_compile:40.18931 backend_compile:25.28828
STATS: call_* op count: 1369 | FakeTensor.__torch_dispatch__:4995 | FakeTensorMode.__torch_dispatch__:69478 | attempt fast:4399 | fast is_contiguous:4399 | ProxyTorchDispatchMode.__torch_dispatch__:3010
```

My experiment notebook can be found at https://docs.google.com/document/d/1_dTIQUwjIVnEWmiFAavJQYVF8uzXqD9Dk6b9gGQLF_U/edit#

This is not the "most" optimized version of the code; compared with Horace/Voz roofline experiment:

```
diff --git a/torch/_subclasses/fake_tensor.py b/torch/_subclasses/fake_tensor.py
index e3bf545f3b8..395942c6ffe 100644
--- a/torch/_subclasses/fake_tensor.py
+++ b/torch/_subclasses/fake_tensor.py
@@ -774,6 +774,10 @@ class FakeTensorMode(TorchDispatchMode):
     def __torch_dispatch__(self, func, types, args=(), kwargs=None):
         kwargs = kwargs if kwargs else {}

+        with no_dispatch():
+            if func in {aten.mul.Tensor, aten.add.Tensor, aten.sub.Tensor, aten.relu.default}:
+                return FakeTensor(self, torch.empty(args[0].shape, device='meta'), device='cuda')
+
         if func == torch.ops.prim.device.default:
             assert len(args) == 1 and isinstance(args[0], FakeTensor)
             if args[0].fake_mode.in_kernel_invocation:
```

I am still leaving about 5s of trace time improvement on the table (3s of which is attributable to not yet handling relu.)

The implementation here is based off of https://github.com/pytorch/pytorch/pull/93118/ but I modeled the short circuit logic off of TensorIterator's implementation, for ease of code review and correctness verification. However, there are some important divergences:

* Traditional fast setup in TensorIterator only short circuits if the shapes of all input elements are equal. On hrnet_w18, only 5% of fastpath'ed binary operators actually satisfy this. So instead, I compute the broadcasted shape, but then I only allow the fast path if (1) at least one input tensor has a shape that is exactly the output size, and (2) all the tensors are contiguous (or if all the tensors are channels last).
* I had to manually adjust the logic to handle wrapped numbers (which ordinarily are handled by wrapping into tensors). I think I got this right.

Some evidence that this heuristic is correct is in https://gist.github.com/ezyang/b22fa7b72b7349137211d8dc7041f758: I exhaustively test all dim=3 tensors with sizes [1, 2] and show that we get the same significant strides between PrimTorch and the new algorithm. There ARE differences between this algorithm and PrimTorch, but this algorithm agrees with TensorIterator in the cases where PrimTorch is wrong (sample case: size=(1, 1, 2), stride=(1, 1, 1), stride=(1, 1, 1))

Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94047
Approved by: https://github.com/eellison
2023-02-07 18:34:24 +00:00
Yanbo Liang
605b661805 FakeTensor should constant propagate through ops that allow numbers as scalars (#94145)
Fixes #92655

Thanks @eellison for the code change suggestion.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94145
Approved by: https://github.com/eellison
2023-02-07 06:20:35 +00:00
Edward Z. Yang
2481fc0df4 Add count to FakeTensorMode.__torch_dispatch__ (#93936)
Most calls to fake tensor never hit `FakeTensor.__torch_dispatch__`

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/93936
Approved by: https://github.com/bdhirsh, https://github.com/albanD
2023-02-03 14:21:11 +00:00
Edward Z. Yang
12f22655b1 Short circuit device property access on FakeTensor (#93946)
Before:

```
(/home/ezyang/local/a/pytorch-env) [ezyang@devgpu020.ftw1 ~/local/a/pytorch (ab0e3db0)]$ python benchmarks/dynamo/timm_models.py --accuracy --timing --backend aot_eager --dynamic-shapes --float32 --only hrnet_w18
cuda eval  hrnet_w18                           PASS
TIMING: entire_frame_compile:54.19504 backend_compile:33.86702
STATS: call_* op count: 1369 | FakeTensor.__torch_dispatch__:72549 | FakeTensorMode.__torch_dispatch__:115542 | ProxyTorchDispatchMode.__torch_dispatch__:3103
```

After

```
(/home/ezyang/local/a/pytorch-env) [ezyang@devgpu020.ftw1 ~/local/a/pytorch (ab0e3db0)]$ python benchmarks/dynamo/timm_models.py --accuracy --timing --backend aot_eager --dynamic-shapes --float32 --only hrnet_w18
cuda eval  hrnet_w18                           PASS
TIMING: entire_frame_compile:53.97591 backend_compile:33.60832
STATS: call_* op count: 1369 | FakeTensor.__torch_dispatch__:4995 | FakeTensorMode.__torch_dispatch__:89985 | ProxyTorchDispatchMode.__torch_dispatch__:3010
```

It doesn't really help end-to-end wall time all that much, but it does cut the number of calls to FakeTensor.__torch_dispatch__ by an order of magnitude, which hopefully has other positive effects.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/93946
Approved by: https://github.com/eellison, https://github.com/albanD
2023-02-03 14:20:30 +00:00
Elias Ellison
e4f11e01bd [Fake Tensor] Allow fake meta by default, delete unused ctor args (#93993)
Two small changes that I'm bundling together because one of them needs to touch fbcode and I'm not sure how to do stacked diffs + internal changes + land before release cut.

Remove allow_meta from the ctor, and allow by default: we should be able to trace through meta with fake tensors, so in some senses it's a bit weird to expose a way for the user to disallow this. However, it's still useful debug-wise to error from time to time, so I've added an option to the config that restores the previous behavior.

Remove `throw_on_data_dependent_ops=True`: this was intended as temporary behavior while we smoothed out turning on the erroring. There are no uses anywhere of `throw_on_data_dependent_ops=False` that I could find.

These are technically backward-incompatble, but fake tensor is new since the last release / in a private namespace, and I don't want to release it with baggage that would be hard to remove later.

Fix for https://github.com/pytorch/pytorch/issues/92877.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/93993
Approved by: https://github.com/bdhirsh, https://github.com/ezyang
2023-02-03 09:23:38 +00:00
Jason Ansel
8c09a005c5 [inductor] Pattern matching engine (copy) (#93291)
This is an exact duplicate of https://github.com/pytorch/pytorch/pull/90739

The fbcode workflow for landing that diff seems buggy.  The github-export-checks task is failing with credentials errors.  Plan to try to land it using GH1.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/93291
Approved by: https://github.com/desertfire
2023-01-31 04:51:00 +00:00
Michael Voznesensky
d322f82b05 Add @count util to torch, use it to track benchmark stats (#93013)
[screenshot: benchmark stats output from the @count util]

Pull Request resolved: https://github.com/pytorch/pytorch/pull/93013
Approved by: https://github.com/ezyang
2023-01-26 03:09:12 +00:00
Edward Z. Yang
1237cf6b6c Allow direct Tensor constructor to return preexisting PyObject (#92754)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/92754
Approved by: https://github.com/albanD, https://github.com/voznesenskym
2023-01-23 20:20:43 +00:00
PyTorch MergeBot
db466ae057 Revert "[Modes] Add assert that the mode isn't already on the stack (#90770)"
This reverts commit 702838637d.

Reverted https://github.com/pytorch/pytorch/pull/90770 on behalf of https://github.com/DanilBaibak due to Break internal build
2023-01-12 16:44:29 +00:00
samdow
702838637d [Modes] Add assert that the mode isn't already on the stack (#90770)
Redo of #89726 on a clean PR, thanks @voznesenskym for the first draft!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90770
Approved by: https://github.com/ezyang
2023-01-11 15:19:43 +00:00
Samantha Andow
a7749ae177 [reland] rename DisableTorchFunction to DisableTorchFunctionSubclass (#88218) (#89221)
Summary: First half of #87990. This doesn't change any of the behavior and is just a rename

#88218 got reverted for internal breakages. This is the reland of started from internal

Differential Revision:
D41268423

LaMa Project: L1098534

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89221
Approved by: https://github.com/meliy-meyada, https://github.com/zou3519
2023-01-04 18:32:49 +00:00
Edward Z. Yang
bcf15cd93b Store source, not sname, in Symbol (#91057)
I'm going to need this in the follow up PR. Instead of storing only Source.name() in Symbol, I now store a full on Source. Lots of replumbing reoccurs. In particular:

- Move Source to torch._guards to break cycles
- I have to add TensorPropertySource and NegateSource to handle x.size()[0] and -x codegen that I was doing with string manipulation previously
- I tighten up invariants so that I never pass source=None; instead I pass ConstantSource (these are constant sources right) and test for that rather than source being missing. I think this is more parsimonious
- Some mypy wobbles from new imports

I didn't move LocalSource and friends to torch._guards, but I ended up needing to access them in a few places. The main annoyance with moving these is that then I also need to move the bytecode codegen stuff, and that's not so easy to move without bringing in the kitchen sink.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/91057
Approved by: https://github.com/albanD, https://github.com/voznesenskym, https://github.com/zou3519
2022-12-30 05:56:56 +00:00
Kurt Mohler
08a47549af Rename Tensor._storage to Tensor.untyped_storage and update docs (#91414)
Fixes #89224

Pull Request resolved: https://github.com/pytorch/pytorch/pull/91414
Approved by: https://github.com/ezyang
2022-12-28 19:21:34 +00:00
Nikita Shulga
fd3a7264ae [MPS] Add group_norm[fwd+backward] and mean_var (take 2) (#91190)
Use Prims to implement group_norm, group_norm_backward and mean_var

Use `torch._ops.ops` instead of `torch.ops` in numerous subpackages in
order to make them importable from `torch/backends/mps/__init__.py`, as this alias, defined in
15af4b1cee/torch/__init__.py (L1095),
is executed last during the init process.

Add `__all__` to `torch/backends/mps/__init__.py` as well as alias all imports as private

Add `TestNNMPS.test_group_norm_backward` that validates no NaNs are generated during the backward pass

Fixes https://github.com/pytorch/pytorch/issues/88331
Pull Request resolved: https://github.com/pytorch/pytorch/pull/91190
Approved by: https://github.com/albanD
2022-12-22 08:54:37 +00:00
PyTorch MergeBot
b68fd7e319 Revert "Store source, not sname, in Symbol (#91057)"
This reverts commit 88c581be87.

Reverted https://github.com/pytorch/pytorch/pull/91057 on behalf of https://github.com/atalman due to causing internal build failures
2022-12-21 22:33:15 +00:00
PyTorch MergeBot
645eda0a00 Revert "[MPS] Add group_norm[fwd+backward] and mean_var (#91190)"
This reverts commit 371716eb36.

Reverted https://github.com/pytorch/pytorch/pull/91190 on behalf of https://github.com/kit1980 due to Broke test_correct_module_names because of underscore _ops
2022-12-21 19:37:43 +00:00
Nikita Shulga
371716eb36 [MPS] Add group_norm[fwd+backward] and mean_var (#91190)
Use Prims to implement group_norm, group_norm_backward and mean_var

Use `torch._ops.ops` instead of `torch.ops` in numerous subpackages in
order to make them importable from `torch/backends/mps/__init__.py`, as this alias, defined in
15af4b1cee/torch/__init__.py (L1095),
is executed last during the init process.

Depends on https://github.com/pytorch/pytorch/pull/91203

Fixes https://github.com/pytorch/pytorch/issues/88331
Pull Request resolved: https://github.com/pytorch/pytorch/pull/91190
Approved by: https://github.com/albanD
2022-12-21 17:33:27 +00:00
Edward Z. Yang
88c581be87 Store source, not sname, in Symbol (#91057)
I'm going to need this in the follow up PR. Instead of storing only Source.name() in Symbol, I now store a full on Source. Lots of replumbing reoccurs. In particular:

- Move Source to torch._guards to break cycles
- I have to add TensorPropertySource and NegateSource to handle x.size()[0] and -x codegen that I was doing with string manipulation previously
- I tighten up invariants so that I never pass source=None; instead I pass ConstantSource (these are constant sources right) and test for that rather than source being missing. I think this is more parsimonious
- Some mypy wobbles from new imports

I didn't move LocalSource and friends to torch._guards, but I ended up needing to access them in a few places. The main annoyance with moving these is that then I also need to move the bytecode codegen stuff, and that's not so easy to move without bringing in the kitchen sink.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/91057
Approved by: https://github.com/albanD, https://github.com/voznesenskym
2022-12-21 04:51:51 +00:00
Edward Z. Yang
0b22f5ae9f Deeply rework WeakIdKeyDictionary (#90825)
In the prior patch, I just YOLOed a mutable mapping implementation.
Many edge cases were not handled correctly.  In this PR, I just
copy-pasted the WeakKeyDictionary from CPython and then hacked it up
to use WeakIdRef instead of weakref.ref.  You can see each line
I changed with the comment CHANGED; there aren't many.
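
For background, a sketch of the identity-keyed weak reference the dictionary relies on (mirroring the idea, not necessarily the exact code):

```python
import weakref

class WeakIdRef(weakref.ref):
    # Hash/compare by referent identity rather than by its __eq__/__hash__,
    # which for Tensor would be semantically wrong (and can even throw).
    __slots__ = ["_id"]

    def __init__(self, key, callback=None):
        self._id = id(key)
        super().__init__(key, callback)

    def __hash__(self):
        return self._id

    def __eq__(self, other):
        a, b = self(), other()
        if a is not None and b is not None:
            return a is b
        return self is other
```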

Being exactly API compatible with WeakKeyDictionary means I can also
rob all of the tests from CPython, which I also did for
test/test_weak.py

How to review?  You could either try taking the delta from CPython
(recommended), or review everything from scratch (not recommended).
Can post diff representing delta on request.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90825
Approved by: https://github.com/albanD
2022-12-15 08:43:08 +00:00
Edward Z. Yang
54563e6288 Don't put tracing state on Tensor (#90628)
Fixes https://github.com/pytorch/pytorch/issues/89626

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90628
Approved by: https://github.com/voznesenskym
2022-12-15 08:43:08 +00:00
Tugsbayasgalan (Tugsuu) Manlaibaatar
1aab755320 Fakify params and weights under private config (#90417)
Previously, we planned to lift the parameters and weights while exporting and implement our own transformer to "unlift" the lifted weights and params back into the graph as attributes. But this is a bit challenging because:

- We need to maintain correct ordering for weights and parameters that are passed as inputs so that we know how to map them back.
- Some weights are unused in the graph, so our transformer needs to be aware of which weights and parameters are not used in the graph. And we need to distinguish which are real user inputs and which are parameters.
- There can be more edge cases we haven't seen in other models yet.

I am aware that @Chillee  and @bdhirsh mentioned that functionalization won't work with fake-tensor attributes but this is fine for the short term as we don't expect users to be modifying weights and params in inference mode. In fact, we explicitly disable attribute mutation in torchdynamo export mode right now.

Given the above conditions, it might be OK to just fakify params when we need to. I use a flag to guard this change.

Differential Revision: [D41891201](https://our.internmc.facebook.com/intern/diff/D41891201)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90417
Approved by: https://github.com/eellison
2022-12-14 09:33:18 +00:00
Edward Z. Yang
f7365eca90 Add unbacked symints support; item works now (#90624)
The big idea is to add `create_unbacked_symfloat` and `create_unbacked_symint` to ShapeEnv, allowing you to allocate symbolic floats/ints corresponding to data you don't know about at compile time. Then, instead of immediately erroring out when you try to call local_scalar_dense on a FakeTensor, we instead create a fresh symint/symfloat and return that.

There are a bunch of odds and ends that need to be handled:

* A number of `numel` calls converted to `sym_numel`
* When we finally return from item(), we need to ensure we actually produce a SymInt/SymFloat when appropriate. The previous binding code assumed that you would have to get a normal Python item. I add a pybind11 binding for Scalar (to PyObject only) and refactor the code to use that. There is some trickiness where you are NOT allowed to go through c10::SymInt if there isn't actually any SymInt involved. See comment.
* One of our unit tests tripped an implicit data dependent access which occurs when you pass a Tensor as an argument to a sizes parameter. This is also converted to support symbolic shapes
* We now support tracking bare SymInt/SymFloat returns in proxy tensor mode (this was already in symbolic-shapes branch)
* Whenever we allocate an unbacked symint, we record the stack trace it was allocated at. These get printed when you attempt data dependent access on the symint (e.g., you try to guard on it)
* Subtlety: unbacked symints are not necessarily > 1. I added a test for this.

These unbacked symints are not very useful right now as you will almost always immediately raise an error later when you try to guard on them. The next logical step is adding an assertion refinement system that lets ShapeEnv learn facts about unbacked symints so it can do a better job eliding guards that are unnecessary.
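
A sketch of the allocation API described above (exact signatures may differ across versions):

```python
from torch.fx.experimental.symbolic_shapes import ShapeEnv

shape_env = ShapeEnv()
u0 = shape_env.create_unbacked_symint()    # fresh symbol, no backing hint
f0 = shape_env.create_unbacked_symfloat()

# There is no concrete value behind u0, so any guard on it -- e.g.
# bool(u0 > 1) -- raises a data-dependent error that includes the stack
# trace recorded when u0 was allocated.
```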

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90624
Approved by: https://github.com/Skylion007, https://github.com/voznesenskym
2022-12-12 13:33:07 +00:00
Edward Z. Yang
b68dead20c Keep track of source name on all allocated SymInts (#90295)
Wow, I had to sweat so much to get this PR out lol.

This PR enforces the invariant that whenever we allocate SymInts as part of fakeification, the SymInt is associated with a Source, and in fact we store the string source name on SymbolWithSourceName. We use 'sname' as the shorthand for source name, as 'name' is already used by sympy to name symbols.

In order to store source names, we have to plumb source names from Dynamo to PyTorch. This made doing this PR a bit bone-crushing, because there are many points in the Dynamo codebase where we are improperly converting intermediate tensors into fake tensors, with no source (and there cannot be one, because it's a frickin' intermediate tensor). I've fixed all of the really awful cases in earlier PRs in the stack. This PR is just plumbing in source names from the places where we do have them.
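
Illustratively, the storage side of this is small (a sketch, not the actual class):

```python
import sympy

# Sketch: a sympy Symbol that also remembers its source name. 'sname'
# avoids colliding with sympy's own 'name' attribute.
class SymbolWithSourceName(sympy.Symbol):
    def __new__(cls, name, sname, **assumptions):
        instance = super().__new__(cls, name, **assumptions)
        instance.sname = sname  # e.g. "x.size()[0]", plumbed from Dynamo
        return instance

s0 = SymbolWithSourceName("s0", "x.size()[0]", integer=True, positive=True)
print(s0, s0.sname)  # s0 x.size()[0]
```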

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90295
Approved by: https://github.com/voznesenskym
2022-12-10 13:17:34 +00:00
Ram Rachum
351d73b97f Fix exception causes all over the codebase (#90271)
This is the continuation to #90134 and hopefully the final PR in this series.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90271
Approved by: https://github.com/kit1980
2022-12-07 04:29:00 +00:00
Edward Z. Yang
3d4b92b171 Ensure that we fakeify tensor subclasses when they are initially tracked (#90009)
The old code didn't actually fakeify traceable tensor subclasses at the
time they are added as a GraphArg to the module; now we do, by ignoring
the subclass during fakeification and relying on Dynamo to simulate
the subclass on top.  See comments for more details.

BTW, this codepath is super broken; see the filed issues linked on
the inside.
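
In sketch form (the accessor below is hypothetical; real wrapper subclasses expose their inner tensors differently):

```python
import torch
from torch._subclasses.fake_tensor import FakeTensorMode
from torch.utils._python_dispatch import is_traceable_wrapper_subclass

def fakeify_graph_arg(t: torch.Tensor, mode: FakeTensorMode) -> torch.Tensor:
    if is_traceable_wrapper_subclass(t):
        # Hypothetical accessor: fakeify only the plain tensor inside and
        # let Dynamo simulate the subclass semantics on top of it.
        t = t.elem
    return mode.from_tensor(t)
```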

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90009
Approved by: https://github.com/wconstab, https://github.com/voznesenskym
2022-12-06 22:36:32 +00:00
Michael Voznesensky
3b9a386d48 Add TORCH_FAKE_TENSOR_DEBUG use it to enable storage of traces on fake tensors at init time (#90215)
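A hypothetical sketch of the mechanism (the env var name comes from the title; the truthiness check and attribute name are assumptions):

```python
import os
import traceback

STORE_TRACES = bool(os.environ.get("TORCH_FAKE_TENSOR_DEBUG"))

class FakeTensorSketch:
    def __init__(self):
        if STORE_TRACES:
            # Record where this fake tensor was created so later errors
            # can point back at the allocation site.
            self._debug_trace = traceback.format_stack()
```
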
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90215
Approved by: https://github.com/ezyang
2022-12-06 22:28:52 +00:00
Edward Z. Yang
a1ab06ab65 ShapeEnv.create_symbolic_sizes_strides_storage_offset (#89962)
Instead of having the storage offset hang out on its own, allocate
all of these symbols in one go.
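
In sketch form (argument spelling is an assumption; see ShapeEnv for the real signature):

```python
import torch
from torch.fx.experimental.symbolic_shapes import ShapeEnv
from torch._dynamo.source import ConstantSource

shape_env = ShapeEnv()
t = torch.randn(2, 3)
# One call hands back symbolic sizes, strides, and the storage offset
# together, instead of allocating the offset separately.
size, stride, storage_offset = shape_env.create_symbolic_sizes_strides_storage_offset(
    t, ConstantSource("t")
)
```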

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89962
Approved by: https://github.com/albanD, https://github.com/voznesenskym
2022-12-06 21:27:02 +00:00
Elias Ellison
1a33b7cbfa Make fake tensors preserve dense strides in type conversion (#89803)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89803
Approved by: https://github.com/ngimel
2022-11-30 01:28:51 +00:00
XiaobingSuper
07151a6bd6 TorchDynamo: weight prepack for onednn convolution external call (#88988)
This PR enables weight prepacking using MKLDNN tensors:
1. Enable fake tensor mode for MKLDNN tensor inputs.
2. Make the convolution fusion kernels support MKLDNN tensor inputs.
3. Do the weight prepack at the FX fusion step.

For better performance, we always use channels_last for the CPU convolution path: our measurements show that channels_last performs better than the blocked-input path, and it also avoids layout conversions on the activations (plain to block and block to plain). Currently only a plain-to-plain format conversion is needed.
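
Roughly, on an MKLDNN-enabled build (a sketch of the layout policy, not the fusion pass itself):

```python
import torch

# Activations stay in (plain) channels_last; only the weight gets
# prepacked into the blocked MKLDNN layout, which is what the fused
# kernel would consume after the FX fusion step.
x = torch.randn(1, 3, 32, 32).to(memory_format=torch.channels_last)
conv = torch.nn.Conv2d(3, 8, kernel_size=3)
packed_weight = conv.weight.data.to_mkldnn()  # one-time weight prepack
with torch.no_grad():
    y = conv(x)  # activation path needs no plain<->block conversion
```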

Pull Request resolved: https://github.com/pytorch/pytorch/pull/88988
Approved by: https://github.com/jgong5, https://github.com/jansel
2022-11-25 01:16:11 +00:00
Edward Z. Yang
860bae49e4 Suppress guards on as_strided call only. (#89569)
See comment in meta_utils.py for the whole story.

This doesn't have a substantive impact yet, but will in the next
PR on the stack.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89569
Approved by: https://github.com/albanD
2022-11-24 14:01:12 +00:00
Edward Z. Yang
ea50549ce6 Suppress guards when creating fake tensors (#89349)
When we create fake tensors, we may call operators that introduce
guards in order to accurately reconstruct views.  But these guards are spurious:
if a user is able to present a tensor that "looks the same", they have
implicitly fulfilled the contract that the view is creatable.
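
A self-contained sketch of the suppression pattern (the real mechanism lives on ShapeEnv; the names here are illustrative):

```python
import contextlib

class ShapeEnvSketch:
    def __init__(self):
        self.guards = []
        self._suppress = False

    @contextlib.contextmanager
    def suppress_guards(self):
        prev, self._suppress = self._suppress, True
        try:
            yield
        finally:
            self._suppress = prev

    def add_guard(self, expr):
        if not self._suppress:  # spurious guards get dropped here
            self.guards.append(expr)

env = ShapeEnvSketch()
with env.suppress_guards():
    env.add_guard("s0 == 5")  # emitted while reconstructing a view
assert env.guards == []  # the spurious guard never lands
```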

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89349
Approved by: https://github.com/voznesenskym
2022-11-21 23:14:20 +00:00
Sherlock Huang
caf3d5319f Symintify numel(), infer_size, prims.elementwise_meta (#88956)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88956
Approved by: https://github.com/ezyang
2022-11-20 00:42:03 +00:00
PyTorch MergeBot
8ad39536d7 Revert "Symintify numel(), infer_size, prims.elementwise_meta (#88956)"
This reverts commit ce2f8700ba.

Reverted https://github.com/pytorch/pytorch/pull/88956 on behalf of https://github.com/ezyang due to somehow breaks torch.numel
2022-11-19 21:47:55 +00:00
Edward Z. Yang
5582001bd5 Reland 2 "Towards unifying symbolic and non symbolic fake tensor (#89038) (#89143)" (#89346)
This reverts commit 8e4c9828f4.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89346
Approved by: https://github.com/wconstab
2022-11-19 21:14:31 +00:00
PyTorch MergeBot
8e4c9828f4 Revert "Reland "Towards unifying symbolic and non symbolic fake tensor (#89038)" (#89143)"
This reverts commit e686b8c3ba.

Reverted https://github.com/pytorch/pytorch/pull/89143 on behalf of https://github.com/ZainRizvi due to This seems to be causing the test_make_fx_symbolic_exhaustive_rad2deg_cpu_float32 and test_make_fx_symbolic_exhaustive_inplace_rad2deg_cpu_float32 test to fail across multiple jobs
2022-11-17 17:02:36 +00:00