FallbackKernel wasn't handing mutable ops correctly: it would not report
them in get_mutation_names or get_alias_names. This would lead to silent
incorrectness -- Inductor would incorrectly reorder the mutable op with other
mutable ops.
This PR fixes that:
- we only support mutable operations that are "auto_functionalizable".
That is, they mutate inputs and do not return aliases of any inputs.
- Following the Triton kernel work, any mutated inputs must be specified
in get_alias_names and processed via mark_node_as_mutating
- We also do some minor cleanup by killing dead code (FallbackKernel no
longer processes OpOverloadPacket) and adding some handling around
HOPs.
Test Plan:
- new tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/118649
Approved by: https://github.com/eellison, https://github.com/oulgen
In preparation for the next PR up in the stack, which is going to update
"can_auto_functionalize" to support more operators than just ones that
return nothing. We are unable to auto-generate FakeTensor kernels for
operators that do not return nothing, but we are able to generate
functionalization kernels for operators that return something.
Test Plan:
Existing tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115134
Approved by: https://github.com/bdhirsh
ghstack dependencies: #114955, #114956
Changelog:
- torch.library.impl_abstract optionally accepts a torch.library.Library
object. If passed in, then the lifetime of the registration is tied to
the Library object.
- we've also changed torch.library.impl_abstract to work on all
operators, including overloads.
- we refactored the `torch._custom_ops.*` and `torch._custom_op.*`
impl_abstract APIs and put them under torch._library. This is the
final resting place for them. I will follow-up with deleting
all the `torch._custom_ops.*` stuff later.
- There is a new "SimpleOperatorRegistry" where we actually collect the
abstract_impl. We will expand this to also hold the other
torch._custom_ops.* APIs when we move those to torch.library
NB: Previously we had designed
`impl_abstract` assuming a very high-level Python-only custom op API.
We've revisited that since; now, impl_abstract works for all custom ops,
no matter python or C++, no matter the schema. The new refactored design
reflects this better.
Test Plan:
- existing and new tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109912
Approved by: https://github.com/ezyang