Mirror of https://github.com/zebrajr/pytorch.git, synced 2025-12-07 12:21:27 +01:00

3058700f7f · 17 Commits

`3f88e3105f` Reland: Remove remaining global set_default_dtype calls from tests (#108088)

Fixes #68972. Relands #107246.

To avoid causing Meta-internal CI failures, this PR avoids always asserting that the default dtype is float in the `TestCase.setUp`/`tearDown` methods. Instead, the assertion is only performed if `TestCase._default_dtype_check_enabled == True`. `_default_dtype_check_enabled` is set to True in the `if __name__ == "__main__":` blocks of all the relevant test files that required changes for this issue.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108088
Approved by: https://github.com/ezyang

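A minimal sketch of the opt-in guard described above, assuming simplified names; the real logic lives in torch.testing._internal.common_utils and differs in detail:

```python
import unittest
import torch

class TestCase(unittest.TestCase):
    # Off by default, so suites that do not opt in (e.g. internal CI) are unaffected.
    _default_dtype_check_enabled = False

    def setUp(self):
        if self._default_dtype_check_enabled:
            assert torch.get_default_dtype() == torch.float

    def tearDown(self):
        if self._default_dtype_check_enabled:
            assert torch.get_default_dtype() == torch.float

class TestExample(TestCase):
    def test_default_dtype(self):
        self.assertEqual(torch.ones(2).dtype, torch.float32)

if __name__ == "__main__":
    # Relevant test files opt in only when run as the main module.
    TestCase._default_dtype_check_enabled = True
    unittest.main()
```
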
`161ea463e6` Revert "Remove remaining global set_default_dtype calls from tests (#107246)"

This reverts commit

`aa8ea1d787` Remove remaining global set_default_dtype calls from tests (#107246)

Fixes #68972

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107246
Approved by: https://github.com/ezyang

`723f111545` [custom_op] explicit autograd API (#101824)

This PR adds an explicit API for registering a backward formula for a CustomOp. In the end state, we will likely have both this explicit API and a magic API (which is sugar on top of the explicit API), since different groups of users prefer different ones.

Concretely, to define a backward formula for a CustomOp:
- a user must provide a "save for backward" function that accepts (inputs, output) and returns exactly what they want saved for backward
- a user must provide a "backward" function that accepts (ctx, saved, *grads) and returns the grad_inputs, as a dict mapping str to a gradient

Please see the changes in custom_op_db.py for examples of the API.

There are a number of pieces to this PR, and I'm happy to split it if that helps:
- the actual APIs for specifying the two functions (impl_save_for_backward, impl_backward)
- the autograd kernel: we take the functions the user gives us and construct an autograd.Function object that we then register to the Autograd dispatch key
- indirection for the autograd kernel: we add a layer of indirection so that one can swap out the autograd kernel. This is necessary because, by default, we register an "autograd not implemented" kernel as the Autograd implementation and then swap it for the actual kernel once the user provides it.

Test Plan:
- We apply this API to give backward formulas for things in custom_op_db. We then hook up custom_op_db to the Autograd OpInfo tests.
- Various tests in test_python_dispatch.py check error cases.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/101824
Approved by: https://github.com/ezyang

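The contract described above can be pictured with a small, self-contained sketch that wires a user-supplied "save for backward" function and a "backward" function (returning grad_inputs as a dict) into a torch.autograd.Function by hand, which is roughly what the PR's autograd kernel automates. This is not the CustomOp registration API itself; the function and key names below are made up for illustration.

```python
import torch

def save_for_backward_fn(inputs, output):
    # Return exactly what the backward pass needs.
    return {"x": inputs["x"], "y": inputs["y"]}

def backward_fn(ctx, saved, grad_output):
    # grad_inputs are returned as a dict mapping input name -> gradient.
    return {"x": grad_output * saved["y"], "y": grad_output * saved["x"]}

class MulOp(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, y):
        output = x * y
        # A real kernel would route tensors through ctx.save_for_backward.
        ctx.saved = save_for_backward_fn({"x": x, "y": y}, output)
        return output

    @staticmethod
    def backward(ctx, grad_output):
        grads = backward_fn(ctx, ctx.saved, grad_output)
        return grads["x"], grads["y"]

x = torch.randn(3, requires_grad=True)
y = torch.randn(3, requires_grad=True)
MulOp.apply(x, y).sum().backward()
print(x.grad, y.grad)  # equal to y and x respectively
```
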
`326a4cc815` Support map autograd and pytree in/out. (#101633)

Rebased https://github.com/pytorch/pytorch/pull/100494 and added a dummy AOTConfig.

This PR adds autograd and pytree support for the map operator.

Implementation-wise:
1. We temporarily make two HigherOrderOperators, "map" and "map_impl":
   - "map" is user-facing. Currently, it unwraps the pytrees in the inputs and creates a flat_fn for it. Dynamo currently cannot deal with pytree.tree_flatten and pytree.tree_unflatten, so we make "map" a HigherOrderOperator to trigger dynamo's logic for handling HigherOrderOperators.
   - "map_impl" is the actual operator that works with the rest of the torch subsystems, such as functionalization and make_fx. It accepts flattened arguments and a num_mapped_args integer denoting how many of the flattened arguments need to be mapped, i.e. have their first dimension unstacked (a small eager-mode reference sketch follows this commit message).
2. We create the forward and backward graphs in the Autograd key and call torch.autograd.Function. Currently, the backward graph is recomputation-based, and we will need to partition the joint graph in the future to be more efficient.

Example traced graphs for map operators:

### Case 1: simple f and autograd
```python
def f(x, y):
    return x + y

def g(xs, y):
    out = control_flow.map(f, xs, y)
    return torch.autograd.grad(out, (xs, y), torch.ones_like(out))

gm = make_fx(g, tracing_mode="symbolic")(torch.ones(3, 4, 5, requires_grad=True), torch.ones(5, requires_grad=True))
# gm.print_readable() produces following:
class g(torch.nn.Module):
    def forward(self, xs_1: f32[3, s1, s2], y_1: f32[s2]):
        # No stacktrace found for following nodes
        body_graph_0 = self.body_graph_0
        map_impl = torch.ops.map_impl(body_graph_0, 1, xs_1, y_1); body_graph_0 = None
        getitem: f32[3, s1, s2] = map_impl[0]; map_impl = None
        ones_like: f32[3, s1, s2] = torch.ops.aten.ones_like.default(getitem, pin_memory = False)
        is_same_size = torch.ops.aten.is_same_size.default(getitem, ones_like); getitem = None
        body_graph_1 = self.body_graph_1
        map_impl_1 = torch.ops.map_impl(body_graph_1, 2, xs_1, ones_like, y_1); body_graph_1 = xs_1 = ones_like = None
        getitem_1 = map_impl_1[0]
        getitem_2: f32[3, s1, s2] = map_impl_1[1]
        getitem_3: f32[3, s2] = map_impl_1[2]; map_impl_1 = None
        sum_1: f32[1, s2] = torch.ops.aten.sum.dim_IntList(getitem_3, [0], True); getitem_3 = None
        sym_size: Sym(s2) = torch.ops.aten.sym_size(y_1, 0); y_1 = None
        view: f32[s2] = torch.ops.aten.view.default(sum_1, [sym_size]); sum_1 = sym_size = None
        return (getitem_2, view)

class <lambda>(torch.nn.Module):
    def forward(self, arg0_1, arg1_1: f32[s1, s2], arg2_1: f32[s2]):
        # No stacktrace found for following nodes
        add: f32[s1, s2] = torch.ops.aten.add.Tensor(arg1_1, arg2_1); arg1_1 = arg2_1 = None
        return [add]

class <lambda>(torch.nn.Module):
    def forward(self, arg0_1, arg1_1: f32[s1, s2], arg2_1: f32[s1, s2], arg3_1: f32[s2]):
        # No stacktrace found for following nodes
        add: f32[s1, s2] = torch.ops.aten.add.Tensor(arg1_1, arg3_1); arg1_1 = None
        is_same_size = torch.ops.aten.is_same_size.default(add, arg2_1); add = None
        sum_1: f32[1, s2] = torch.ops.aten.sum.dim_IntList(arg2_1, [0], True)
        sym_size: Sym(s2) = torch.ops.aten.sym_size(arg3_1, 0); arg3_1 = None
        view: f32[s2] = torch.ops.aten.view.default(sum_1, [sym_size]); sum_1 = sym_size = None
        return [None, arg2_1, view]
```

### Case 2: list input/output f and autograd
```python
def f(x, y):
    return [x[0].cos() + y.sin(), x[1].sin() * y.cos()]

def g(xs, y):
    out = control_flow.map(f, xs, y)
    flat_out, _ = pytree.tree_flatten(out)
    flat_inp, _ = pytree.tree_flatten((xs, y))
    requires_grad_inp = [inp for inp in flat_inp if inp.requires_grad]
    return torch.autograd.grad(flat_out, requires_grad_inp, [torch.ones_like(out) for out in flat_out])

gm = make_fx(g, tracing_mode="symbolic")(
    [torch.ones(3, 4, 5), torch.ones(3, 4, 5, requires_grad=True)],
    torch.ones(5, requires_grad=True))
# gm.print_readable() produces following:
class g(torch.nn.Module):
    def forward(self, xs, y):
        xs_1: f32[3, s1, s2], xs_2: f32[3, s1, s2], y_1: f32[s2], = fx_pytree.tree_flatten_spec([xs, y], self._in_spec)
        # No stacktrace found for following nodes
        body_graph_0 = self.body_graph_0
        map_impl = torch.ops.map_impl(body_graph_0, 2, xs_1, xs_2, y_1); body_graph_0 = None
        getitem: f32[3, s1, s2] = map_impl[0]
        getitem_1: f32[3, s1, s2] = map_impl[1]; map_impl = None
        ones_like: f32[3, s1, s2] = torch.ops.aten.ones_like.default(getitem, pin_memory = False)
        ones_like_1: f32[3, s1, s2] = torch.ops.aten.ones_like.default(getitem_1, pin_memory = False)
        is_same_size = torch.ops.aten.is_same_size.default(getitem, ones_like); getitem = None
        is_same_size_1 = torch.ops.aten.is_same_size.default(getitem_1, ones_like_1); getitem_1 = None
        body_graph_1 = self.body_graph_1
        map_impl_1 = torch.ops.map_impl(body_graph_1, 4, xs_1, xs_2, ones_like, ones_like_1, y_1); body_graph_1 = xs_1 = xs_2 = ones_like = ones_like_1 = None
        getitem_2 = map_impl_1[0]
        getitem_3 = map_impl_1[1]
        getitem_4: f32[3, s1, s2] = map_impl_1[2]
        getitem_5: f32[3, s2] = map_impl_1[3]; map_impl_1 = None
        sum_1: f32[1, s2] = torch.ops.aten.sum.dim_IntList(getitem_5, [0], True); getitem_5 = None
        sym_size: Sym(s2) = torch.ops.aten.sym_size(y_1, 0); y_1 = None
        view: f32[s2] = torch.ops.aten.view.default(sum_1, [sym_size]); sum_1 = sym_size = None
        return pytree.tree_unflatten([getitem_4, view], self._out_spec)

class <lambda>(torch.nn.Module):
    def forward(self, arg0_1, arg1_1: f32[s1, s2], arg2_1: f32[s1, s2], arg3_1: f32[s2]):
        # No stacktrace found for following nodes
        cos: f32[s1, s2] = torch.ops.aten.cos.default(arg1_1); arg1_1 = None
        sin: f32[s2] = torch.ops.aten.sin.default(arg3_1)
        add: f32[s1, s2] = torch.ops.aten.add.Tensor(cos, sin); cos = sin = None
        sin_1: f32[s1, s2] = torch.ops.aten.sin.default(arg2_1); arg2_1 = None
        cos_1: f32[s2] = torch.ops.aten.cos.default(arg3_1); arg3_1 = None
        mul: f32[s1, s2] = torch.ops.aten.mul.Tensor(sin_1, cos_1); sin_1 = cos_1 = None
        return [add, mul]

class <lambda>(torch.nn.Module):
    def forward(self, arg0_1, arg1_1: f32[s1, s2], arg2_1: f32[s1, s2], arg3_1: f32[s1, s2], arg4_1: f32[s1, s2], arg5_1: f32[s2]):
        # No stacktrace found for following nodes
        cos: f32[s1, s2] = torch.ops.aten.cos.default(arg1_1); arg1_1 = None
        sin: f32[s2] = torch.ops.aten.sin.default(arg5_1)
        add: f32[s1, s2] = torch.ops.aten.add.Tensor(cos, sin); cos = sin = None
        sin_1: f32[s1, s2] = torch.ops.aten.sin.default(arg2_1)
        cos_1: f32[s2] = torch.ops.aten.cos.default(arg5_1)
        mul: f32[s1, s2] = torch.ops.aten.mul.Tensor(sin_1, cos_1)
        is_same_size = torch.ops.aten.is_same_size.default(add, arg3_1); add = None
        is_same_size_1 = torch.ops.aten.is_same_size.default(mul, arg4_1); mul = None
        mul_1: f32[s1, s2] = torch.ops.aten.mul.Tensor(arg4_1, sin_1); sin_1 = None
        mul_2: f32[s1, s2] = torch.ops.aten.mul.Tensor(arg4_1, cos_1); arg4_1 = cos_1 = None
        sum_1: f32[1, s2] = torch.ops.aten.sum.dim_IntList(mul_1, [0], True); mul_1 = None
        sym_size: Sym(s2) = torch.ops.aten.sym_size(arg5_1, 0)
        view: f32[s2] = torch.ops.aten.view.default(sum_1, [sym_size]); sum_1 = None
        #
        sin_2: f32[s2] = torch.ops.aten.sin.default(arg5_1)
        neg: f32[s2] = torch.ops.aten.neg.default(sin_2); sin_2 = None
        mul_3: f32[s2] = torch.ops.aten.mul.Tensor(view, neg); view = neg = None
        cos_2: f32[s1, s2] = torch.ops.aten.cos.default(arg2_1); arg2_1 = None
        mul_4: f32[s1, s2] = torch.ops.aten.mul.Tensor(mul_2, cos_2); mul_2 = cos_2 = None
        sum_2: f32[1, s2] = torch.ops.aten.sum.dim_IntList(arg3_1, [0], True); arg3_1 = None
        view_1: f32[s2] = torch.ops.aten.view.default(sum_2, [sym_size]); sum_2 = sym_size = None
        cos_3: f32[s2] = torch.ops.aten.cos.default(arg5_1); arg5_1 = None
        mul_5: f32[s2] = torch.ops.aten.mul.Tensor(view_1, cos_3); view_1 = cos_3 = None
        add_1: f32[s2] = torch.ops.aten.add.Tensor(mul_3, mul_5); mul_3 = mul_5 = None
        return [None, None, mul_4, add_1]
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/101633
Approved by: https://github.com/zou3519

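For the num_mapped_args convention described at the top of this commit message, here is a tiny eager-mode reference sketch (purely illustrative, not the real map_impl kernel; the helper name is made up): the first num_mapped_args flattened arguments are unstacked along dim 0, the body is applied per slice with the remaining arguments passed through unchanged, and the per-slice results are restacked.

```python
import torch

def map_impl_reference(f, num_mapped_args, *flat_args):
    mapped, rest = flat_args[:num_mapped_args], flat_args[num_mapped_args:]
    # Unstack the mapped args along dim 0 and apply f slice by slice.
    outs = [f(*slices, *rest) for slices in zip(*(x.unbind(0) for x in mapped))]
    # Restack each output position across slices.
    return [torch.stack(per_output) for per_output in zip(*outs)]

xs = torch.arange(6.0).reshape(3, 2)
y = torch.ones(2)
print(map_impl_reference(lambda x, y: [x + y], 1, xs, y))  # one stacked output of shape (3, 2)
```
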
`e69198b043` Revert "Support map autograd and pytree in/out (#100494)"

This reverts commit

`b8fa41be9d` Support map autograd and pytree in/out (#100494)

This PR adds autograd and pytree support for the map operator.

Implementation-wise:
1. We temporarily make two HigherOrderOperators, "map" and "map_impl":
   - "map" is user-facing. Currently, it unwraps the pytrees in the inputs and creates a flat_fn for it. Dynamo currently cannot deal with pytree.tree_flatten and pytree.tree_unflatten, so we make "map" a HigherOrderOperator to trigger dynamo's logic for handling HigherOrderOperators.
   - "map_impl" is the actual operator that works with the rest of the torch subsystems, such as functionalization and make_fx. It accepts flattened arguments and a num_mapped_args integer denoting how many of the flattened arguments need to be mapped, i.e. have their first dimension unstacked.
2. We create the forward and backward graphs in the Autograd key and call torch.autograd.Function. Currently, the backward graph is recomputation-based, and we will need to partition the joint graph in the future to be more efficient.

Example traced graphs for map operators:
### Case 1: simple f and autograd
```python
def f(x, y):
    return x + y

def g(xs, y):
    out = control_flow.map(f, xs, y)
    return torch.autograd.grad(out, (xs, y), torch.ones_like(out))

gm = make_fx(g, tracing_mode="symbolic")(torch.ones(3, 4, 5, requires_grad=True), torch.ones(5, requires_grad=True))
# gm.print_readable() produces following:
class g(torch.nn.Module):
    def forward(self, xs_1: f32[3, s1, s2], y_1: f32[s2]):
        # No stacktrace found for following nodes
        body_graph_0 = self.body_graph_0
        map_impl = torch.ops.map_impl(body_graph_0, 1, xs_1, y_1); body_graph_0 = None
        getitem: f32[3, s1, s2] = map_impl[0]; map_impl = None
        ones_like: f32[3, s1, s2] = torch.ops.aten.ones_like.default(getitem, pin_memory = False)
        is_same_size = torch.ops.aten.is_same_size.default(getitem, ones_like); getitem = None
        body_graph_1 = self.body_graph_1
        map_impl_1 = torch.ops.map_impl(body_graph_1, 2, xs_1, ones_like, y_1); body_graph_1 = xs_1 = ones_like = None
        getitem_1 = map_impl_1[0]
        getitem_2: f32[3, s1, s2] = map_impl_1[1]
        getitem_3: f32[3, s2] = map_impl_1[2]; map_impl_1 = None
        sum_1: f32[1, s2] = torch.ops.aten.sum.dim_IntList(getitem_3, [0], True); getitem_3 = None
        sym_size: Sym(s2) = torch.ops.aten.sym_size(y_1, 0); y_1 = None
        view: f32[s2] = torch.ops.aten.view.default(sum_1, [sym_size]); sum_1 = sym_size = None
        return (getitem_2, view)

class <lambda>(torch.nn.Module):
    def forward(self, arg0_1, arg1_1: f32[s1, s2], arg2_1: f32[s2]):
        # No stacktrace found for following nodes
        add: f32[s1, s2] = torch.ops.aten.add.Tensor(arg1_1, arg2_1); arg1_1 = arg2_1 = None
        return [add]

class <lambda>(torch.nn.Module):
    def forward(self, arg0_1, arg1_1: f32[s1, s2], arg2_1: f32[s1, s2], arg3_1: f32[s2]):
        # No stacktrace found for following nodes
        add: f32[s1, s2] = torch.ops.aten.add.Tensor(arg1_1, arg3_1); arg1_1 = None
        is_same_size = torch.ops.aten.is_same_size.default(add, arg2_1); add = None
        sum_1: f32[1, s2] = torch.ops.aten.sum.dim_IntList(arg2_1, [0], True)
        sym_size: Sym(s2) = torch.ops.aten.sym_size(arg3_1, 0); arg3_1 = None
        view: f32[s2] = torch.ops.aten.view.default(sum_1, [sym_size]); sum_1 = sym_size = None
        return [None, arg2_1, view]
```
### Case 2: list input/output f and autograd
```python
def f(x, y):
    return [x[0].cos() + y.sin(), x[1].sin() * y.cos()]

def g(xs, y):
    out = control_flow.map(f, xs, y)
    flat_out, _ = pytree.tree_flatten(out)
    flat_inp, _ = pytree.tree_flatten((xs, y))
    requires_grad_inp = [inp for inp in flat_inp if inp.requires_grad]
    return torch.autograd.grad(flat_out, requires_grad_inp, [torch.ones_like(out) for out in flat_out])

gm = make_fx(g, tracing_mode="symbolic")(
    [torch.ones(3, 4, 5), torch.ones(3, 4, 5, requires_grad=True)],
    torch.ones(5, requires_grad=True))
# gm.print_readable() produces following:
class g(torch.nn.Module):
    def forward(self, xs, y):
        xs_1: f32[3, s1, s2], xs_2: f32[3, s1, s2], y_1: f32[s2], = fx_pytree.tree_flatten_spec([xs, y], self._in_spec)
        # No stacktrace found for following nodes
        body_graph_0 = self.body_graph_0
        map_impl = torch.ops.map_impl(body_graph_0, 2, xs_1, xs_2, y_1); body_graph_0 = None
        getitem: f32[3, s1, s2] = map_impl[0]
        getitem_1: f32[3, s1, s2] = map_impl[1]; map_impl = None
        ones_like: f32[3, s1, s2] = torch.ops.aten.ones_like.default(getitem, pin_memory = False)
        ones_like_1: f32[3, s1, s2] = torch.ops.aten.ones_like.default(getitem_1, pin_memory = False)
        is_same_size = torch.ops.aten.is_same_size.default(getitem, ones_like); getitem = None
        is_same_size_1 = torch.ops.aten.is_same_size.default(getitem_1, ones_like_1); getitem_1 = None
        body_graph_1 = self.body_graph_1
        map_impl_1 = torch.ops.map_impl(body_graph_1, 4, xs_1, xs_2, ones_like, ones_like_1, y_1); body_graph_1 = xs_1 = xs_2 = ones_like = ones_like_1 = None
        getitem_2 = map_impl_1[0]
        getitem_3 = map_impl_1[1]
        getitem_4: f32[3, s1, s2] = map_impl_1[2]
        getitem_5: f32[3, s2] = map_impl_1[3]; map_impl_1 = None
        sum_1: f32[1, s2] = torch.ops.aten.sum.dim_IntList(getitem_5, [0], True); getitem_5 = None
        sym_size: Sym(s2) = torch.ops.aten.sym_size(y_1, 0); y_1 = None
        view: f32[s2] = torch.ops.aten.view.default(sum_1, [sym_size]); sum_1 = sym_size = None
        return pytree.tree_unflatten([getitem_4, view], self._out_spec)

class <lambda>(torch.nn.Module):
    def forward(self, arg0_1, arg1_1: f32[s1, s2], arg2_1: f32[s1, s2], arg3_1: f32[s2]):
        # No stacktrace found for following nodes
        cos: f32[s1, s2] = torch.ops.aten.cos.default(arg1_1); arg1_1 = None
        sin: f32[s2] = torch.ops.aten.sin.default(arg3_1)
        add: f32[s1, s2] = torch.ops.aten.add.Tensor(cos, sin); cos = sin = None
        sin_1: f32[s1, s2] = torch.ops.aten.sin.default(arg2_1); arg2_1 = None
        cos_1: f32[s2] = torch.ops.aten.cos.default(arg3_1); arg3_1 = None
        mul: f32[s1, s2] = torch.ops.aten.mul.Tensor(sin_1, cos_1); sin_1 = cos_1 = None
        return [add, mul]

class <lambda>(torch.nn.Module):
    def forward(self, arg0_1, arg1_1: f32[s1, s2], arg2_1: f32[s1, s2], arg3_1: f32[s1, s2], arg4_1: f32[s1, s2], arg5_1: f32[s2]):
        # No stacktrace found for following nodes
        cos: f32[s1, s2] = torch.ops.aten.cos.default(arg1_1); arg1_1 = None
        sin: f32[s2] = torch.ops.aten.sin.default(arg5_1)
        add: f32[s1, s2] = torch.ops.aten.add.Tensor(cos, sin); cos = sin = None
        sin_1: f32[s1, s2] = torch.ops.aten.sin.default(arg2_1)
        cos_1: f32[s2] = torch.ops.aten.cos.default(arg5_1)
        mul: f32[s1, s2] = torch.ops.aten.mul.Tensor(sin_1, cos_1)
        is_same_size = torch.ops.aten.is_same_size.default(add, arg3_1); add = None
        is_same_size_1 = torch.ops.aten.is_same_size.default(mul, arg4_1); mul = None
        mul_1: f32[s1, s2] = torch.ops.aten.mul.Tensor(arg4_1, sin_1); sin_1 = None
        mul_2: f32[s1, s2] = torch.ops.aten.mul.Tensor(arg4_1, cos_1); arg4_1 = cos_1 = None
        sum_1: f32[1, s2] = torch.ops.aten.sum.dim_IntList(mul_1, [0], True); mul_1 = None
        sym_size: Sym(s2) = torch.ops.aten.sym_size(arg5_1, 0)
        view: f32[s2] = torch.ops.aten.view.default(sum_1, [sym_size]); sum_1 = None
        #
        sin_2: f32[s2] = torch.ops.aten.sin.default(arg5_1)
        neg: f32[s2] = torch.ops.aten.neg.default(sin_2); sin_2 = None
        mul_3: f32[s2] = torch.ops.aten.mul.Tensor(view, neg); view = neg = None
        cos_2: f32[s1, s2] = torch.ops.aten.cos.default(arg2_1); arg2_1 = None
        mul_4: f32[s1, s2] = torch.ops.aten.mul.Tensor(mul_2, cos_2); mul_2 = cos_2 = None
        sum_2: f32[1, s2] = torch.ops.aten.sum.dim_IntList(arg3_1, [0], True); arg3_1 = None
        view_1: f32[s2] = torch.ops.aten.view.default(sum_2, [sym_size]); sum_2 = sym_size = None
        cos_3: f32[s2] = torch.ops.aten.cos.default(arg5_1); arg5_1 = None
        mul_5: f32[s2] = torch.ops.aten.mul.Tensor(view_1, cos_3); view_1 = cos_3 = None
        add_1: f32[s2] = torch.ops.aten.add.Tensor(mul_3, mul_5); mul_3 = mul_5 = None
        return [None, None, mul_4, add_1]
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/100494
Approved by: https://github.com/zou3519

`4c20c0509d` Split out forward AD tests from test_ops_gradients and reenable slow gradcheck CI (#88216)

Fixes: https://github.com/pytorch/pytorch/issues/88010

This PR does a couple of things to stop slow gradcheck from timing out:
- Splits test_ops_fwd_gradients out of test_ops_gradients, and factors out TestFwdGradients and TestBwdGradients, which both inherit from TestGradients, now situated in common_utils (maybe there is a better place?)
- Skips CompositeCompliance (and several other test files) for slow gradcheck CI, since they do not use gradcheck
- Because test times for test_ops_fwd_gradients and test_ops_gradients are either unknown or wrong, we hardcode them for now to prevent them from being put together. We can undo the hack after we see that actual test times are updated. ("def calculate_shards" randomly divides tests with unknown test times in a round-robin fashion; a toy sketch follows this message.)
- Updates references to test_ops_gradients and TestGradients
- Test files that are skipped for slow gradcheck CI are now centrally located in run_tests.py; this reduces how fine-grained we can be with the skips, so for some skips (one so far) we still use the old skipping mechanism, e.g. for test_mps

Pull Request resolved: https://github.com/pytorch/pytorch/pull/88216
Approved by: https://github.com/albanD

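A toy sketch of the sharding behavior mentioned above (not the actual tools/testing code; names and numbers are illustrative): tests with known times are packed onto the currently lightest shard, while tests with unknown times are dealt out round-robin, which is why two related files can land on different shards unless their times are pinned.

```python
def calculate_shards_sketch(num_shards, tests, known_times):
    shard_times = [0.0] * num_shards
    shard_tests = [[] for _ in range(num_shards)]
    timed = sorted([t for t in tests if t in known_times], key=lambda t: -known_times[t])
    untimed = [t for t in tests if t not in known_times]
    for t in timed:
        i = shard_times.index(min(shard_times))  # greedy: lightest shard so far
        shard_times[i] += known_times[t]
        shard_tests[i].append(t)
    for j, t in enumerate(untimed):
        shard_tests[j % num_shards].append(t)  # round-robin for unknown times
    return list(zip(shard_times, shard_tests))

print(calculate_shards_sketch(
    2,
    ["test_ops_gradients", "test_ops_fwd_gradients", "test_nn"],
    {"test_nn": 3600.0},
))
```
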
`2c1efe7472` Enable some PyTorch core tests with inductor (#87490)

Summary:
1) Graph break on torch.random.set_rng_state, since it blocks running inductor core tests;
2) Add several inductor-specific skips;
3) Enable several core tests for inductor CI.

cc @jansel @mlazos @soumith @voznesenskym @yanboliang @penguinwu @anijain2305

Pull Request resolved: https://github.com/pytorch/pytorch/pull/87490
Approved by: https://github.com/eellison

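A short illustration of what the graph break in item 1) means in practice (a hedged sketch; whether and where the break happens depends on the torch version): dynamo falls back to eager execution around the torch.random.set_rng_state call instead of refusing to compile the surrounding test, so the function still runs end to end.

```python
import torch

def roundtrip_rng(x):
    state = torch.random.get_rng_state()
    torch.random.set_rng_state(state)  # assumed graph-break point: executed eagerly
    return torch.sin(x) + torch.rand_like(x)

compiled = torch.compile(roundtrip_rng)
print(compiled(torch.randn(4)))  # runs despite the break
```
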
`b97ae59e29` Change legacy wrap_dim to work with symint == (#86842)

- Previously, `sizes == vector<T>({0})` failed to hit `SymInt::operator==`, causing the loop to bail out too early and make an invalid call to the downstream maybe_wrap_dim helper.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/86842
Approved by: https://github.com/Chillee, https://github.com/malfet, https://github.com/albanD

`f595467e5c` Reenable slow gradcheck and make it pass (#80514)

Context: For a while, slow gradcheck CI was skipping nearly all tests, and this hid the fact that it should have been failing and timing out (10+ hour runtime for TestGradients). The CI configuration has since been fixed to correct this, revealing the test failures. This PR reenables slow gradcheck CI and makes it pass again.

This PR:
- makes slow and failing tests run in fast gradcheck mode only (the two modes are illustrated after this message)
- reduces the input size for slow gradcheck only for unary/binary ufuncs (alternatively, skip the test entirely)
- skips entire test files on the slow gradcheck runner if they don't use gradcheck (test_ops, test_meta, test_decomp, test_ops_jit)
- reduces the input size for some ops

Follow-ups:
1. Investigate slow mode failures: https://github.com/pytorch/pytorch/issues/80411
2. See if we can re-enable slow gradcheck tests for some of the slow tests by reducing the sizes of their inputs

The following are failing in slow mode; they now run in fast mode only:
```
test_fn_fwgrad_bwgrad___rmod___cuda_float64
test_fn_fwgrad_bwgrad_linalg_householder_product_cuda_complex128
test_fn_fwgrad_bwgrad__masked_prod_cuda_complex128
test_fn_fwgrad_bwgrad__masked_prod_cuda_float64
test_fn_fwgrad_bwgrad_linalg_matrix_power_cuda_complex128
test_fn_fwgrad_bwgrad_cat_cuda_complex128
test_fn_fwgrad_bwgrad_linalg_lu_factor_ex_cuda_float64
test_fn_fwgrad_bwgrad_copysign_cuda_float64
test_fn_fwgrad_bwgrad_cholesky_inverse_cuda_complex128
test_fn_fwgrad_bwgrad_float_power_cuda_complex128
test_fn_fwgrad_bwgrad_fmod_cuda_float64
test_fn_fwgrad_bwgrad_float_power_cuda_float64
test_fn_fwgrad_bwgrad_linalg_lu_cuda_float64
test_fn_fwgrad_bwgrad_remainder_cuda_float64
test_fn_fwgrad_bwgrad_repeat_cuda_complex128
test_fn_fwgrad_bwgrad_prod_cuda_complex128
test_fn_fwgrad_bwgrad_slice_scatter_cuda_float64
test_fn_fwgrad_bwgrad_tile_cuda_complex128
test_fn_fwgrad_bwgrad_pow_cuda_float64
test_fn_fwgrad_bwgrad_pow_cuda_complex128
test_fn_fwgrad_bwgrad_fft_*
test_fn_fwgrad_bwgrad_zero__cuda_complex128
test_fn_gradgrad_linalg_lu_factor_cuda_float64
test_fn_grad_div_trunc_rounding_cuda_float64
test_fn_grad_div_floor_rounding_cuda_float64
```

Marks the OpInfos for the following ops that run slowly in slow gradcheck as `fast_gradcheck` only (the left column is the runtime in seconds):
```
0 918.722 test_fn_fwgrad_bwgrad_nn_functional_conv_transpose3d_cuda_float64
1 795.042 test_fn_fwgrad_bwgrad_nn_functional_unfold_cuda_complex128
2 583.63 test_fn_fwgrad_bwgrad_nn_functional_max_pool3d_cuda_float64
3 516.946 test_fn_fwgrad_bwgrad_svd_cuda_complex128
4 503.179 test_fn_fwgrad_bwgrad_linalg_svd_cuda_complex128
5 460.985 test_fn_fwgrad_bwgrad_linalg_lu_cuda_complex128
6 401.04 test_fn_fwgrad_bwgrad_linalg_lstsq_grad_oriented_cuda_complex128
7 353.671 test_fn_fwgrad_bwgrad_nn_functional_max_pool2d_cuda_float64
8 321.903 test_fn_fwgrad_bwgrad_nn_functional_gaussian_nll_loss_cuda_float64
9 307.951 test_fn_fwgrad_bwgrad_stft_cuda_complex128
10 266.104 test_fn_fwgrad_bwgrad_svd_lowrank_cuda_float64
11 221.032 test_fn_fwgrad_bwgrad_istft_cuda_complex128
12 183.741 test_fn_fwgrad_bwgrad_lu_unpack_cuda_complex128
13 132.019 test_fn_fwgrad_bwgrad_nn_functional_unfold_cuda_float64
14 125.343 test_fn_fwgrad_bwgrad_nn_functional_pad_constant_cuda_complex128
15 124.2 test_fn_fwgrad_bwgrad_kron_cuda_complex128
16 123.721 test_fn_fwgrad_bwgrad_pca_lowrank_cuda_float64
17 121.074 test_fn_fwgrad_bwgrad_nn_functional_max_unpool3d_cuda_float64
18 119.387 test_fn_fwgrad_bwgrad_rot90_cuda_complex128
19 112.889 test_fn_fwgrad_bwgrad__masked_normalize_cuda_complex128
20 107.541 test_fn_fwgrad_bwgrad_dist_cuda_complex128
21 106.727 test_fn_fwgrad_bwgrad_diff_cuda_complex128
22 104.588 test_fn_fwgrad_bwgrad__masked_cumprod_cuda_complex128
23 100.135 test_fn_fwgrad_bwgrad_nn_functional_feature_alpha_dropout_with_train_cuda_float64
24 88.359 test_fn_fwgrad_bwgrad_mH_cuda_complex128
25 86.214 test_fn_fwgrad_bwgrad_nn_functional_max_unpool2d_cuda_float64
26 83.037 test_fn_fwgrad_bwgrad_nn_functional_bilinear_cuda_float64
27 79.987 test_fn_fwgrad_bwgrad__masked_cumsum_cuda_complex128
28 77.822 test_fn_fwgrad_bwgrad_diag_embed_cuda_complex128
29 76.256 test_fn_fwgrad_bwgrad_mT_cuda_complex128
30 74.039 test_fn_fwgrad_bwgrad_linalg_lu_solve_cuda_complex128
```
```
0 334.142 test_fn_fwgrad_bwgrad_unfold_cuda_complex128
1 312.791 test_fn_fwgrad_bwgrad_linalg_lu_factor_cuda_complex128
2 121.963 test_fn_fwgrad_bwgrad_nn_functional_max_unpool3d_cuda_float64
3 108.085 test_fn_fwgrad_bwgrad_diff_cuda_complex128
4 89.418 test_fn_fwgrad_bwgrad_nn_functional_max_unpool2d_cuda_float64
5 72.231 test_fn_fwgrad_bwgrad___rdiv___cuda_complex128
6 69.433 test_fn_fwgrad_bwgrad___getitem___cuda_complex128
7 68.582 test_fn_fwgrad_bwgrad_ldexp_cuda_complex128
8 68.572 test_fn_fwgrad_bwgrad_linalg_pinv_cuda_complex128
9 67.585 test_fn_fwgrad_bwgrad_nn_functional_glu_cuda_float64
10 66.567 test_fn_fwgrad_bwgrad_lu_cuda_float64
```
```
0 630.13 test_fn_gradgrad_nn_functional_conv2d_cuda_complex128
1 81.086 test_fn_gradgrad_linalg_solve_triangular_cuda_complex128
2 71.332 test_fn_gradgrad_norm_cuda_complex128
3 64.308 test_fn_gradgrad__masked_std_cuda_complex128
4 59.519 test_fn_gradgrad_div_no_rounding_mode_cuda_complex128
5 58.836 test_fn_gradgrad_nn_functional_adaptive_avg_pool3
```

Reduces the sizes of the inputs for:
- diff
- diag_embed

Pull Request resolved: https://github.com/pytorch/pytorch/pull/80514
Approved by: https://github.com/albanD

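For context on the slow vs. fast gradcheck modes referred to throughout this message, a small standalone example (torch.sin is just a stand-in op): slow mode perturbs every input element to build the full numerical Jacobian, while fast mode checks a cheaper random projection, which is why slow or failing tests are demoted to fast mode only.

```python
import torch
from torch.autograd import gradcheck

x = torch.randn(4, 6, dtype=torch.double, requires_grad=True)
print(gradcheck(torch.sin, (x,), fast_mode=False))  # exhaustive, slow
print(gradcheck(torch.sin, (x,), fast_mode=True))   # random projection, fast
```
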
`de949a0e59` Various OpInfo architecture improvements

This PR makes the following improvements:
- moves the custom skip list for test_normalize_operator_exhaustive in test_fx_experimental to use the typical OpInfo skip architecture. The skips were updated to xfails, and that identified some operators which were no longer failing the test (a skip vs. xfail illustration follows this message)
- redundant tests with OpInfo-based testing in test_jit.py were removed
- test_dtypes was improved so its error messages are clear, and it makes test_nondifferentiable redundant; the latter test has been removed
- OpInfo.supports_complex_autograd() is removed in favor of a more accurate and general test for whether the particular dtype is in the backward dtypes of the operator
- gradchecks have been improved to verify that an operator doesn't support grad if it claims not to
- gradchecks have been improved to test the gradient of all input tensors that require gradient
- the concept of "default test dtypes" has been removed
- excessive and mostly redundant out testing for elementwise unary operators has been removed
- metadata for whether an op supports nuanced "safe casting" to out behavior has been removed from OpInfos
- numerous skips have been converted to xfails
- numerous OpInfos have had their metadata fixed based on the new checks
- jit-specific utilities in common_methods_invocations.py have been moved to jit_programming_utils.py

Pull Request resolved: https://github.com/pytorch/pytorch/pull/75951
Approved by: https://github.com/ngimel

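The skip-to-xfail conversions matter because an xfail still runs and is reported once it starts passing, which is how the PR noticed operators that no longer failed. A generic unittest illustration (not the OpInfo/DecorateInfo machinery itself):

```python
import unittest

class Example(unittest.TestCase):
    @unittest.skip("never runs, so a fixed operator would go unnoticed")
    def test_skipped(self):
        self.assertEqual(1, 2)

    @unittest.expectedFailure
    def test_xfailed(self):
        # Still runs; reported as an unexpected success once the failure is fixed.
        self.assertEqual(1, 2)

if __name__ == "__main__":
    unittest.main()
```
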
`631f035131` Update forward AD not supported error message

Pull Request resolved: https://github.com/pytorch/pytorch/pull/75105
Approved by: https://github.com/albanD

`637273da4e` [ROCM] unskip test_fn_grad

The skip was not needed.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/75113
Approved by: https://github.com/malfet

`c4321e1396` [ROCm] unskip FFT tests

This PR enables a lot of FFT tests on ROCm; see the full list at https://github.com/ROCmSoftwarePlatform/pytorch/issues/924. After enabling the tests, we found that three tests (test_reference_1d, test_reference_nd, and test_fn_grad) have issues. We skip those tests on ROCm in this PR as well and will address them in a subsequent PR.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/74713
Approved by: https://github.com/malfet

`a1e284d9c8` Remove high priority as an owner for tests (#74555)

Summary: Following triage review discussion, it would be best for these tests not to be triaged as high priority by automation, but by the triagers in the oncall.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/74555
Reviewed By: albanD
Differential Revision: D35099202
Pulled By: janeyx99
fbshipit-source-id: 657a0317141de3a598476a6f601ec26cc26231b1
(cherry picked from commit 057519cb2494d0f9a0b169f359ac87ba9e89f088)

`ebca80ed08` Move test ops gradients and test ops jit to separate files

Fixes #72368

As per the reference issue, test_ops in a single file takes around 3:30-4:00 hours to execute on ASAN jobs. Reference: pytorch_test_times.json
```
{
    "commit": "39535fec6c3ff5bf7c2d322d096c59571c3295ed",
    "JOB_BASE_NAME": "linux-xenial-py3.7-clang7-asan",
    "job_times": {
        "test_ops": 14928.355000000636,  <- This test group is over 4 hours alone
```

Hence separating test_ops into the following parts:
1. TestGradients
2. TestJit
3. TestCommon and TestMathBits

Pull Request resolved: https://github.com/pytorch/pytorch/pull/74297
Approved by: https://github.com/malfet