When investigating failures in https://github.com/pytorch/pytorch/pull/100017, I realized that we were reentering FakeTensorMode even though there was already one on the stack. Although we have attempted to assert against these cases in the past, e.g., in https://github.com/pytorch/pytorch/pull/97186, it seems that the existing protections were insufficient.
In this particular case, the reapplication of FakeTensorMode was due to an interaction with NotImplemented multiple dispatch handling. If proxy tensor mode detects an unrecognized tensor type (this includes FakeTensor, if it is not tracked with a proxy), it will return NotImplemented to give this tensor a chance to unpack itself into a proxyable operation. However, this is never the right thing for FakeTensor, where no unpacking is possible. Today, FakeTensor instead attempts to reapply the FakeTensorMode, resulting in FakeTensorMode appearing twice on the stack.
This PR does a number of things:
* It adds an assert in `FakeTensorMode.__torch_dispatch__` that you must not already have this mode on the stack; this is ALWAYS an error
* It modifies `FakeTensor.__torch_dispatch__` to return `NotImplemented` if the mode is already active. This prevents us from re-adding the mode to the stack (a minimal sketch of this re-entrancy handling follows the list)
* It adds a new logging artifact `not_implemented` which you can use to get debug logs about all of the times a `__torch_dispatch__` handler returned NotImplemented and why it did so. Your subclass has to manually opt into this logging, but I inserted the necessary logs for ProxyTensorMode and FakeTensor(Mode)
* `with fake_mode` now no-ops if the fake mode is already on the stack, which is what users want anyway
* I am BREAKING pre-autograd tracing, because it is currently doing something weird with the original C++ mode stack. Brian is going to follow up with a fix next week.
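To make the re-entrancy handling concrete, here is a minimal sketch (a generic `TorchDispatchMode`, not the actual FakeTensorMode implementation; the class name and depth counter are illustrative) of a mode that refuses to be pushed twice and instead no-ops on nested entry:
```python
import torch
from torch.utils._python_dispatch import TorchDispatchMode

class OncePerStackMode(TorchDispatchMode):
    """Illustrative only: nested `with` no-ops instead of re-pushing the mode."""

    def __init__(self):
        super().__init__()
        self._depth = 0

    def __enter__(self):
        if self._depth == 0:          # only push onto the mode stack once
            super().__enter__()
        self._depth += 1
        return self

    def __exit__(self, *exc):
        self._depth -= 1
        if self._depth == 0:          # only pop when the outermost `with` exits
            return super().__exit__(*exc)

    def __torch_dispatch__(self, func, types, args=(), kwargs=None):
        # FakeTensorMode additionally asserts here that it is not already on
        # the stack; a plain redispatch is enough for this sketch.
        return func(*args, **(kwargs or {}))

m = OncePerStackMode()
with m, m:                            # the inner entry is a no-op, not a re-push
    torch.ones(2) + 1
```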
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/102091
Approved by: https://github.com/thiagocrepaldi, https://github.com/eellison, https://github.com/wanchaol, https://github.com/bdhirsh
The bug was that if you want to move a mode to the autograd key, you need to use the "functionality" key for it (AutogradFunctionality). But when we do that, we need to clear any PythonDispatcher caches for every op for **every** autograd key (since you could run autograd ops with both cpu and cuda tensors underneath the mode, both of which may have been cached).
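A plain-Python sketch of why the invalidation has to sweep every autograd key (this is not the C++ PythonDispatcher; the dict and key names are illustrative):
```python
# Cached Python-dispatcher entries are keyed per (op, dispatch key), so the same
# op can be cached under AutogradCPU *and* AutogradCUDA if the mode saw both.
cache = {
    ("aten::add", "AutogradCPU"): "stale_cpu_autograd_entry",
    ("aten::add", "AutogradCUDA"): "stale_cuda_autograd_entry",
}
AUTOGRAD_KEYS = ("AutogradCPU", "AutogradCUDA", "AutogradXLA")  # illustrative subset

def move_mode_to_autograd_functionality():
    # Clearing only one backend's autograd key would leave stale entries behind,
    # so we drop cached entries for every autograd key.
    for op, key in list(cache):
        if key in AUTOGRAD_KEYS:
            del cache[(op, key)]

move_mode_to_autograd_functionality()
assert not cache
```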
I didn't add a test, since this ends up getting indirectly tested by export in the PR. If someone would prefer a direct test, I can add one.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98030
Approved by: https://github.com/ezyang
This PR improves list/tuple handling by merging the logic into
`wrap_with_proxy` directly and calling `set_meta` when we find that the current
proxy is an `fx.Proxy`. This also solves the problem that, even though `fused_adam`
has `val`, some of the corresponding `getitem` calls that follow `fused_adam` don't have `val`.
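A rough conceptual sketch of the recursion shape (`wrap_with_proxy` and `set_meta` here only mirror the names from this PR; the real helpers live in `proxy_tensor.py` and do more bookkeeping):
```python
import torch.fx as fx

def wrap_with_proxy(value, proxy):
    # When an op like fused_adam returns a list/tuple, give each element its own
    # getitem proxy and recurse.
    if isinstance(value, (list, tuple)):
        return type(value)(
            wrap_with_proxy(v, proxy[i]) for i, v in enumerate(value)
        )
    # Whenever the element's proxy is an fx.Proxy, record the value on the node,
    # so downstream getitem nodes also carry `val` metadata.
    if isinstance(proxy, fx.Proxy):
        proxy.node.meta["val"] = value
    return value
```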
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99897
Approved by: https://github.com/ezyang
This PR introduces CustomOp, a wrapper around a dispatcher operator that allows
users to define custom operators. It adds the skeleton for CustomOp and
some very simple behavior. As of this PR, one can:
- create a CustomOp for an operator that does not have inplace or aliasing semantics,
- give it CPU/CUDA and Meta implementations,
- and trace it into a graph via make_fx.
The design follows
https://docs.google.com/document/d/19Uc5OUCA187q9BZggJb70RT2ZoSTDoG5QQkJkZwd25M/edit
Concretely, we implement the following things mentioned in the doc in this PR:
- Entrypoint 1 (CustomOp.define, creating a new custom operator)
- impl (to define device-specific code) and impl_meta (to define meta formulas); see the sketch after this list
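Since the CustomOp entrypoints themselves are still private and their exact names may shift, here is a minimal sketch of the same kind of registration written against the pre-existing `torch.library` API (namespace and op name are illustrative), just to show the shape of "define + CPU/Meta impls + make_fx tracing":
```python
import torch
from torch.library import Library
from torch.fx.experimental.proxy_tensor import make_fx

lib = Library("my_ns", "DEF")                  # illustrative namespace
lib.define("twice(Tensor x) -> Tensor")

def twice_cpu(x):
    return x * 2

def twice_meta(x):
    # Meta formula: shape/dtype only, no data.
    return torch.empty_like(x)

lib.impl("twice", twice_cpu, "CPU")
lib.impl("twice", twice_meta, "Meta")

# The new op is callable and traceable into a graph.
gm = make_fx(lambda x: torch.ops.my_ns.twice(x))(torch.randn(3))
print(gm.graph)
```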
The goal for the short term is to get the code to a state where it can be trialed
by the export folks. On top of this PR, the blockers are:
- adding Entrypoint 3 (CustomOp.from_existing)
- adding a way to do data-dependent shape formulas
These will come in future PRs since this one is getting long.
Things that will come in the longer term (but still before 2.1):
- adding the other entrypoints mentioned in the doc (2 & 3)
- more safety checks and better error messages
- support for views and mutation
- support for defining autograd formulas
- support for functionalization
- making this API public (it's private right now).
Test Plan:
- added a new test case, TestCustomOp. It mostly tests a bunch of error
cases.
- added OpInfos for custom operators and hooked these up to
test_proxy_tensor to test that they work with make_fx. These custom
operators were based off of the ones in the autograd_function_db.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98440
Approved by: https://github.com/ezyang
The purpose of this refactor is to execute a few large pieces of work:
1) Refactor all the internals of plumbing dynamic dimension information after dynamo to be stateless
2) Decouple allocation controls around dynamic dimensions from verification
3) For (2), for allocation, create an enum that dictates whether we are in DUCK (the default today), STATIC (aka assume_static_default in the past), or DYNAMIC (user constrained, do not duck shape) mode; see the sketch after this list
4) For (2), for verification, we separate the list of dynamic ranges entirely from allocation. This means the shape_env does no tracking of what we verify on; instead, it is the caller's job to invoke produce_guards() with the various things they want verified, specifically with the valid ranges. We do use constrained ranges to refine value ranges when doing analysis.
5) Therefore, as an extension of (4), we have decided to double down on "late" checks versus "eager" checks, primarily because the mechanism for gathering what actually matters happens during guard creation, and should be the purview of the caller seeking guards, not the shape env. However, for dynamo, these structures are essentially one and the same.
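A sketch of the allocation-policy enum described in (3) (the in-tree name and member spelling may differ from this):
```python
from enum import Enum, auto

class DimAllocationPolicy(Enum):
    DUCK = auto()     # default today: duck shape, i.e. reuse a symbol across dims with equal hints
    STATIC = auto()   # previously assume_static_default: no symbol, bake in the hinted size
    DYNAMIC = auto()  # user constrained: always allocate a fresh symbol, never duck shape
```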
Pull Request resolved: https://github.com/pytorch/pytorch/pull/96699
Approved by: https://github.com/avikchaudhuri, https://github.com/ezyang
This removes the need to explicitly constrain_unify `x[mask]` and `y[mask]` when mask is a boolean tensor. It's very narrow but it seems to work in practice.
To invalidate the nonzero call when mutation occurs, I use the version counter. I know there are ways to bypass this, but I think it's good enough for now.
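For illustration, the pattern this is meant to support looks roughly like the following (a hedged sketch; it assumes a build with this change and dynamic output shape capture enabled, and the exact config knob spelling may differ across versions):
```python
import torch
import torch._dynamo

torch._dynamo.config.capture_dynamic_output_shape_ops = True

@torch.compile(fullgraph=True)
def f(x, y, mask):
    # Both indexing ops see the same mask, so the memoized nonzero gives both
    # results the same unbacked size and the add unifies without constrain_unify.
    return x[mask] + y[mask]

print(f(torch.randn(4), torch.randn(4), torch.tensor([True, False, True, True])))
```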
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/95399
Approved by: https://github.com/eellison
This takes the strategy described in https://docs.google.com/document/d/1lFRYAJo5nrfxRhwIzGnfi2pbLpU6T4ytSRSuLJ5qebI/edit#
It is essentially https://github.com/pytorch/pytorch/pull/95222, but squashed and minus the changes that are unnecessary given that we assume nonzero returns > 1.
What's in the PR:
* nonzero now supports meta propagation. When `capture_dynamic_output_shape_ops` is enabled, it will return a tensor with an unbacked SymInt representing the size in question (a small sketch follows this list)
* The unbacked SymInt is UNSOUNDLY assumed to not equal 0/1. We will still error if you guard otherwise.
* PrimTorch pointwise operators are updated to use empty_permuted, to avoid guarding on unbacked SymInt from empty_strided (tested in `test_dynamic_pointwise_scalar`)
* Convolution is updated to skip backend selection if batch is unbacked, to avoid guarding on unbacked SymInt (tested in `test_unbacked_batch_resnet`)
* I kept the helper utilities like `definitely_true` for working with possibly unbacked SymInts. They're not used right now but maybe someone will find them useful.
* Added `constrain_unify` to let you specify two unbacked SymInts must have the same value
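A minimal sketch of what this enables (assuming a build with this PR; the config flag below is the dynamo-level switch for `capture_dynamic_output_shape_ops`):
```python
import torch
import torch._dynamo

torch._dynamo.config.capture_dynamic_output_shape_ops = True

@torch.compile(fullgraph=True)
def f(x):
    idx = torch.nonzero(x)   # output length becomes an unbacked SymInt
    return idx.float() * 2   # pointwise use, so no guard on the unbacked size

print(f(torch.tensor([0.0, 1.0, 0.0, 2.0])))
```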
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/95387
Approved by: https://github.com/voznesenskym
torch.empty_permuted is a generalized version of torch.empty(memory_format=...), where you can pass an arbitrary physical layout as a tuple of dims, allowing you to set up dense, non-overlapping tensors with a non-standard memory format. Check the docblock for a full description of the semantics.
The initial motivation for this PR is with guard-less unbacked SymInts. Traditionally, the way we allocate dense tensors with arbitrary layout is with `empty_strided`. However, `empty_strided` does not know that the given strides are actually contiguous, and must test this manually to find out if it is the case. With `empty_permuted`, this is known statically to be the case and helps us skip some 0/1 guards.
However, I also think torch.empty_permuted is a useful API in its own right. It is technically possible to simulate this with an empty and a permute; however, there are some downsides:
* The manual incant is tricky to work out. To allocate an NHWC tensor, the invocation is `torch.empty(N, H, W, C).permute(0, 3, 1, 2)`; the permute call has to take NHWC to NCHW, and is the *inverse* of the permutation people are typically thinking of when they talk about NHWC (0, 2, 3, 1). Instead, torch.empty_permuted lets you write `torch.empty_permuted((N, C, H, W), (0, 2, 3, 1))`, providing the intuitive permutation: it can literally be read off as NHWC if you assign N=0, C=1, H=2, W=3. (See the check after this list.)
* The result of `empty(requires_grad=True).permute()` is not a leaf tensor. You can force it to be a leaf with a detach(), but it is more straightforward and less error-prone to directly allocate a tensor with the correct permutation.
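A quick check of the equivalence described above (assuming a build that includes torch.empty_permuted):
```python
import torch

N, C, H, W = 2, 3, 4, 5

old = torch.empty(N, H, W, C).permute(0, 3, 1, 2)       # NCHW view of NHWC storage
new = torch.empty_permuted((N, C, H, W), (0, 2, 3, 1))  # same thing, read off as NHWC

assert old.shape == new.shape == (N, C, H, W)
assert old.stride() == new.stride()
assert new.is_contiguous(memory_format=torch.channels_last)
```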
It is also technically possible to simulate this with empty_strided. However, this requires the user to manually compute the contiguous output strides and is bad from a reduction of guards perspective. For what it's worth, this is one of the more common uses of as_strided in the wild, and it would be nice to get rid of it.
A nice enhancement of this feature would be to accept `physical_layout` anywhere `memory_format` is accepted. However, this would be a pretty involved change, so I'm doing the easy thing instead.
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/95069
Approved by: https://github.com/malfet, https://github.com/ngimel, https://github.com/albanD, https://github.com/dagitses
This PR introduces a new `constrain_range` function which can be used to constrain the possible values a SymInt/SymFloat can take on. This knowledge can be then used to discharge potential guards (by running the range analysis, and then seeing if the guard must be true given the original range) without adding another guard.
The usage of ranges is very limited right now; ranges are only constrained when the user explicitly instructs the system to do so. We could also infer range constraints from guards; this is left for future work.
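A plain-Python sketch of the discharge idea (this is not the ShapeEnv implementation; the class and helper below are illustrative):
```python
import math

class ValueRange:
    def __init__(self, lower, upper=math.inf):
        self.lower, self.upper = lower, upper

def guard_ge_is_statically_true(r, bound):
    # A guard of the form "s >= bound" needs no runtime check if the whole
    # known range of s already satisfies it.
    return r.lower >= bound

s_range = ValueRange(2)                          # e.g. the user constrained s to [2, inf)
assert guard_ge_is_statically_true(s_range, 2)   # so "s != 0" / "s != 1" style guards are implied
```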
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/95063
Approved by: https://github.com/eellison
With this change, expected failures will be correctly reported as such by pytest (instead of passes as before).
It was sometimes a little confusing to see operators you did not expect to work in inductor reported as passing their tests.
One downside is that expected failures/skips for test variants now have to be identified by tuples, i.e., `("max", "reduction_no_dim"): {f16},` instead of just `"max.reduction_no_dim": {f16}`. It seems to me it is worth it.
This change would also make it possible to simplify the `TestInductorOpInfo` class a little, since it doesn't have to handle the skips/xfails anymore, but that might require dropping support for things like `PYTORCH_COLLECT_EXPECT` and `PYTORCH_FAIL_ON_SUCCESS`, so I didn't do it.
A couple of other minor changes:
- Got rid of c32, c64, c128 in torchinductor_opinfo. We don't support complex numbers, so they shouldn't be necessary.
- Renamed TestExpect Enum to ExpectedTestResult to get rid of a pytest warning that thinks it is a class that has tests.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94813
Approved by: https://github.com/lezcano, https://github.com/jansel
This PR removes the unnecessary == 0 guard when constructing empty tensors, by ensuring that when we create a contiguous tensor we go directly to the C++ torch.empty implementation (instead of indirecting through empty_strided), where we can bypass doing zero tests when computing the size of the storage. This probably also speeds up trace time.
When I did this, I found out that `empty_tensor_restride_symint` was flagrantly wrong (we had never exercised it before because we redirected to `empty_strided` in PrimTorch decomp, which doesn't hit this codepath.) The bugs:
* Stride computation was wrong (only `last_idx` was ever written to)
* Using set_sizes_and_strides with `sym_sizes` input doesn't work, because there is some sort of ordering problem where `clone_symvec` isn't safe when you clone a vector into itself. Probably should fix this.
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94512
Approved by: https://github.com/ngimel
This is the main payload of this diff stack. With it, we are able to construct a 1D tensor from an unbacked SymInt with guards that are equivalent to asserting that the size is non-negative (which makes sense!). To get here, I had to arrange for all of the guards that occur when doing contiguity tests to be lazy. This was done by writing non-branching implementations of each of the tests in the `sympy_is_contiguous` etc. functions, and then using those implementations when we don't branch.
I also had to do some bug fixes for `is_non_overlapping_and_dense`, as unbacked SymInts were very untested previously (and that was the only time you would actually hit the Python version of the code.) In particular, we now consistently pass separate sizes/strides lists into each of the boolean computation functions (and only pack them into a single argument list when going to Sympy, which doesn't support lists of variables in custom functions.)
Finally, to actually test that this is doing something, I add a simple assumptions system from https://github.com/pytorch/pytorch/pull/90985 and use this to get the end to end test test_item_to_constructor passing. Soon, I intend to replace this with a range analysis system which will be used for assumptions in the short term. (We still might use Z3, but for all the stray assumptions I've seen range analysis will be good enough.)
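A conceptual sketch of what "non-branching" means here (this is not the in-tree `sympy_is_contiguous`, which also handles zero-numel tensors and more; the function below just shows how branching `if`s become one symbolic boolean):
```python
import sympy

def sympy_is_contiguous_sketch(sizes, strides):
    # Build a single boolean expression instead of branching on each size/stride,
    # so no guard is forced on an unbacked SymInt until the expression is needed.
    expected = sympy.Integer(1)
    expr = sympy.true
    for i in range(len(sizes) - 1, -1, -1):
        expr = sympy.And(
            expr,
            sympy.Or(sympy.Eq(sizes[i], 1), sympy.Eq(strides[i], expected)),
        )
        expected = expected * sizes[i]
    return expr

u0 = sympy.Symbol("u0", positive=True, integer=True)        # an unbacked size
print(sympy_is_contiguous_sketch([u0], [sympy.Integer(1)]))  # simplifies to True
```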
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94473
Approved by: https://github.com/albanD
Changes:
* Add `simplified` kwarg to let you only render guards that are nontrivial (excludes duck sizing)
* Make a list of strings valid for sources, if you just have some variable names you want to bind to
* Add test helper `show_guards` using these facilities, switch a few tests to it
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94404
Approved by: https://github.com/Chillee
I applied some flake8 fixes and enabled checking for them in the linter. I also enabled some checks for my previous comprehensions PR.
This is a follow-up to #94323: here I enable the flake8 checkers for the fixes I made there and fix a few more of them.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94601
Approved by: https://github.com/ezyang
### Motivation of this PR
This patch migrates `spmm_reduce` from `torch-sparse` (a third-party dependency of PyG) to `torch`, in response to the initial proposal for fusing **Gather, Apply, Scatter** in message passing for GNN inference/training: https://github.com/pytorch/pytorch/issues/71300
**GAS** is the major step in message passing; its behavior can be classified into two kinds depending on the storage type of `EdgeIndex`, which records the connections of nodes:
* COO: the hotspot is `scatter_reduce`
* CSR: the hotspot is `spmm_reduce`
The reduce type can be chosen from "sum", "mean", "max", "min".
This PR extends `torch.sparse.mm` with a `reduce` argument, which maps to `torch.sparse_mm.reduce` internally (a usage sketch follows this list).
`sparse_mm_reduce` is registered under the TensorTypeId of `SparseCsrCPU`, and this operator requires an internal interface `_sparse_mm_reduce_impl` which has dual outputs:
* `out` - the actual output
* `arg_out` - records the output indices among the non-zero elements when the reduce type is "max" or "min"; this is only useful for training, so it is not calculated for inference.
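A hedged usage sketch of the new entry point (CSR on CPU; it assumes a build with this PR, and the exact set of accepted reduce strings should be checked against the docs):
```python
import torch

crow = torch.tensor([0, 2, 3])
col = torch.tensor([0, 1, 1])
val = torch.tensor([1.0, 2.0, 3.0])
a = torch.sparse_csr_tensor(crow, col, val, size=(2, 2))  # CSR sparse matrix
b = torch.randn(2, 4)                                     # dense matrix

out = torch.sparse.mm(a, b, reduce="sum")   # "mean" and max/min-style reductions also exist
print(out.shape)                            # torch.Size([2, 4])
```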
### Performance
Benchmarked on GCN for ogbn-products on a single Xeon socket, the workload is improved by `4.3x` with this patch.
The performance benefit for training will be bigger: the original backward impl for `sum|mean` is sequential, and the original backward impl for `max|min` is not fused.
#### before:
```
----------------------------- ------------ ------------ ------------ ------------ ------------ ------------
Name Self CPU % Self CPU CPU total % CPU total CPU time avg # of Calls
----------------------------- ------------ ------------ ------------ ------------ ------------ ------------
torch_sparse::spmm_sum 97.09% 56.086s 97.09% 56.088s 6.232s 9
aten::linear 0.00% 85.000us 1.38% 795.485ms 88.387ms 9
aten::matmul 0.00% 57.000us 1.38% 795.260ms 88.362ms 9
aten::mm 1.38% 795.201ms 1.38% 795.203ms 88.356ms 9
aten::relu 0.00% 50.000us 0.76% 440.434ms 73.406ms 6
aten::clamp_min 0.76% 440.384ms 0.76% 440.384ms 73.397ms 6
aten::add_ 0.57% 327.801ms 0.57% 327.801ms 36.422ms 9
aten::log_softmax 0.00% 23.000us 0.10% 55.503ms 18.501ms 3
```
#### after:
```
----------------------------- ------------ ------------ ------------ ------------ ------------ ------------
Name Self CPU % Self CPU CPU total % CPU total CPU time avg # of Calls
----------------------------- ------------ ------------ ------------ ------------ ------------ ------------
aten::spmm_sum 87.35% 11.826s 87.36% 11.827s 1.314s 9
aten::linear 0.00% 92.000us 5.87% 794.451ms 88.272ms 9
aten::matmul 0.00% 62.000us 5.87% 794.208ms 88.245ms 9
aten::mm 5.87% 794.143ms 5.87% 794.146ms 88.238ms 9
aten::relu 0.00% 53.000us 3.35% 452.977ms 75.496ms 6
aten::clamp_min 3.35% 452.924ms 3.35% 452.924ms 75.487ms 6
aten::add_ 2.58% 348.663ms 2.58% 348.663ms 38.740ms 9
aten::argmax 0.42% 57.473ms 0.42% 57.475ms 14.369ms 4
aten::log_softmax 0.00% 22.000us 0.39% 52.605ms 17.535ms 3
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83727
Approved by: https://github.com/jgong5, https://github.com/cpuhrsch, https://github.com/rusty1s, https://github.com/pearu
By moving guard string assembly into dynamo's default behavior and letting code_parts do the work, we can have much better shape guard failures.
Before this fix, the guard failure in the test would look like:
```
'x.size()[1] == x.size()[0] and x.stride()[0] == x.[264 chars]!= 1' != 'x.size()[0] < 3'
- x.size()[1] == x.size()[0] and x.stride()[0] == x.size()[0] and x.stride()[1] == 1 and x.storage_offset() == 0 and y.size()[0] == x.size()[0] and y.size()[1] == x.size()[0] and y.stride()[0] == x.size()[0] and y.stride()[1] == 1 and y.storage_offset() == 0 and x.size()[0] < 3 and x.size()[0] != 0 and x.size()[0] != 1
+ x.size()[0] < 3
```
now it is
```
"x.size()[0] < 3"
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/93894
Approved by: https://github.com/ezyang
We'll rely on the underlying fake tensor to raise an error in these cases. We only raise the error if there is an input to the data-dependent operation that is a real tensor (and thus we are at risk of accidentally burning in real values).
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/93265
Approved by: https://github.com/albanD