A longstanding confusion in the implementation of fake tensor and proxy tensor is what to do about torch.ops.aten.sym_sizes and related calls. In particular, when you have a tensor that (1) has symbolic shapes and (2) has a `__torch_dispatch__` implementation, previously you would always get `__torch_dispatch__` calls for sizes/strides queries, *even if you didn't request them* via the dispatch kwargs in `make_wrapper_subclass`.
The reason is that we were previously mixing several concepts: "I want to dispatch to Python", "I want to call a virtual method", and "I have dynamic shapes". A single boolean variable controlled all of these things, so it was not possible to tell inside TensorImpl what the user had actually requested.
In this PR, we track each of these concepts individually so that we can preserve user intent. We then combine them into a single "policy" variable that controls whether or not we can use the fast path. The policy triggers if any one of the exceptional cases is true.
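To make the intent concrete, here is a hedged sketch of the user-facing distinction being preserved, written against the Python subclass API. The `NoCustomSizes` class is purely illustrative, and the exact opt-in kwarg name (`dispatch_sizes_strides_policy`) is an assumption on my part; the text above only refers to "the dispatch kwargs in `make_wrapper_subclass`".

```python
import torch

class NoCustomSizes(torch.Tensor):
    """Illustrative subclass that does NOT ask for sizes/strides dispatch
    (no dispatch kwargs passed to _make_wrapper_subclass)."""

    @staticmethod
    def __new__(cls, elem):
        return torch.Tensor._make_wrapper_subclass(cls, elem.shape, dtype=elem.dtype)

    def __init__(self, elem):
        self.elem = elem

    @classmethod
    def __torch_dispatch__(cls, func, types, args=(), kwargs=None):
        # Real ops land here. Metadata queries like size()/stride() should not,
        # because this subclass never opted in; that is the user intent being
        # preserved, even when the tensor carries symbolic shapes.
        kwargs = kwargs or {}
        unwrap = lambda t: t.elem if isinstance(t, NoCustomSizes) else t
        out = func(*[unwrap(a) for a in args],
                   **{k: unwrap(v) for k, v in kwargs.items()})
        return NoCustomSizes(out) if isinstance(out, torch.Tensor) else out

t = NoCustomSizes(torch.randn(2, 3))
print(t.size(), t.stride())   # metadata-only queries: fast path, no dispatch
print((t + t).size())         # aten.add dispatches to Python; the size query does not
```

Opting in would go through the dispatch kwargs mentioned above, which set the policy indirectly rather than exposing it directly (see the first bullet below).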
Billing of changes:
* Rename `set_sizes_strides_policy` to `set_custom_sizes_strides`; in general, you cannot DIRECTLY set the policy; you have to set it indirectly via the public functions.
* Add some helpers for sizes and strides, since their handling is more complicated (the policy is an enum, rather than just a bool as is the case for device and layout). `matches_python_custom` tests what the user asked for via Python dispatch. `matches_policy` does the policy test (only used in the user-facing functions).
* I reorganized the accessor methods into a more logical order. This makes the diff hard to read, so I recommend reading the final code directly.
* The default custom implementations now more reliably call their default() implementations.
* As a bonus refactor, I devirtualized some functions that don't need to be virtual.
* `set_sym_sizes_and_strides` is renamed to `set_sizes_and_strides` to make it easier to use in template contexts; it optionally takes a storage offset now so you can set all three values at the same time. If you use the SymInt overload but there are no symbolic integers, we give you a normal resize.
* This adds `sym_storage_offset` since we had that in the symbolic shapes branch and there's no reason not to put it in (and it reduces merge conflicts)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84641
Approved by: https://github.com/wconstab
This PR:
- updates forward AD codegen in core to generate code that tries calling into decompositions registered to the JIT when
  - (1) the function is not an in-place or out variant,
  - AND (2) the function is differentiable (requires_derivative=True),
  - AND (3) there are no forward AD formulas registered.
- To simplify things, we always generate the if/else (as long as (1) is true), but generate 'false' when either (2) or (3) is false.
- removes the mechanism from functorch
- (follow-up) some functorch tests should be updated here so they no longer have to compute the Jacobian with vjp
- factors out some logic to generate the `any_has_forward_grad` condition
- (bc-breaking) when TensorList inputs unexpectedly have forward grad, the error will no longer contain the name
See https://github.com/pytorch/pytorch/pull/84151#issuecomment-1238519247 for codegen output and more discussion.
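For context, here is a small usage sketch of the forward-mode AD entry point this codegen serves. It is not the generated code itself, and which ops actually fall through to a JIT decomposition depends on whether a forward AD formula is registered for them; `sin` below has a formula and is used only to show the API.

```python
import torch
import torch.autograd.forward_ad as fwAD

x = torch.randn(3)
tangent = torch.randn(3)

with fwAD.dual_level():
    dual = fwAD.make_dual(x, tangent)   # attach a tangent to the primal
    out = torch.sin(dual)               # forward AD propagates the tangent
    primal, jvp = fwAD.unpack_dual(out)

# d/dx sin(x) = cos(x), so the JVP is cos(x) * tangent
assert torch.allclose(jvp, torch.cos(x) * tangent)
```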
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84151
Approved by: https://github.com/samdow, https://github.com/albanD, https://github.com/zou3519
Previously there was a constraint that the bdim is required to be at
the front. As I noted in the comment in the code that I wrote years ago,
this is not necessary for correctness; we were just guarding against
potentially incorrect behavior and assumed most people would not vmap
over dimensions other than 0.
Now, the above assumption did not age very well, because we have batch
rules that return a BatchedTensor where the bdim is something other than
0 (e.g. convolution batch rule).
This PR deletes the check for that assumption and adds additional manual
tests that the as_strided batching rule works when one vmaps over a dimension
other than 0.
Automatic tests don't exist because it's a bit hard to get the
test_vmap_exhaustive test runner to replicate the strides of the inputs
faithfully.
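As an illustration of the newly allowed pattern, here is a hedged sketch (not the actual test added here); it uses functorch's `vmap`, which newer releases also expose as `torch.func.vmap`.

```python
import torch
from functorch import vmap

x = torch.randn(3, 5, 2)  # vmap over dim 1, so the batch dim is not at the front

def f(t):
    # t is a logical (3, 2) slice of x; take a small strided view of it
    return t.as_strided((2, 2), (2, 1))

batched = vmap(f, in_dims=1)(x)
manual = torch.stack([f(x[:, i]) for i in range(x.size(1))])
assert torch.allclose(batched, manual)
```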
Test Plan:
- wait for tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83597
Approved by: https://github.com/samdow
No need to have a lagging op db because there are no more sync issues
between functorch and pytorch. If someone adds a new OpInfo, then we
should explicitly check if we support it or not.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83418
Approved by: https://github.com/samdow
I'm planning on removing the functorch lagging op db because it doesn't make
sense now that functorch is a part of PyTorch. Before that happens,
this PR updates it, and a future PR will delete it.
Test Plan:
- wait for tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83346
Approved by: https://github.com/samdow
This is relatively simple; we just test that `input.clone().inplace_(...)`
gives us the correct gradients while ignoring incompatible sample
inputs.
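Here is a hedged sketch of the pattern being tested (illustrative names, not the actual OpInfo-based test code):

```python
import torch
from functorch import grad  # newer releases also expose this as torch.func.grad

def via_inplace(x):
    # apply the in-place variant to a clone, so the original input is untouched
    return x.clone().mul_(2.0).sum()

x = torch.randn(3)
g = grad(via_inplace)(x)
assert torch.allclose(g, torch.full_like(x, 2.0))  # d/dx sum(2x) = 2
```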
Test Plan:
- wait for tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83114
Approved by: https://github.com/Chillee
I'm not sure why I called this a hack in the first place (perhaps I
wanted to use tree_map and pytrees didn't support namedtuples?). This PR
deletes some comments and the conversion from namedtuple -> tuple
(because that is unnecessary).
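For illustration, a minimal sketch of why the conversion is unnecessary: pytrees treat namedtuples as containers and reconstruct the same type. This uses `torch.utils._pytree`, which may not be the exact tree utility functorch used at the time.

```python
import collections
import torch
from torch.utils._pytree import tree_map

MinMax = collections.namedtuple("MinMax", ["min", "max"])
result = MinMax(torch.tensor(1.0), torch.tensor(5.0))

doubled = tree_map(lambda t: t * 2, result)
print(type(doubled).__name__, doubled.min, doubled.max)  # still a MinMax
```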
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83079
Approved by: https://github.com/Chillee
This includes a configuration for linux CUDA, which will give us enough
test coverage for functorch to confidently begin accepting PRs to it again.
NB: it turns out that previously some tests were not being skipped, even
though we had added a skip decorator.
Test Plan:
- wait for CI
- check that the tests being skipped with a skip decorator are actually
skipped via reading test logs
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82013
Approved by: https://github.com/janeyx99
This PR:
- adds the ability to run functorch tests via run_test.py
- changes the functorch shards in PyTorch CI to invoke functorch tests
via run_test.py
The main motivation for this is so that functorch tests hook into the
standard PyTorch test infrastructure.
Questions for reviewers:
- the functorch tests are located outside of the pytorch/test folder
(they're in the pytorch/functorch/test folder). Is this OK? (run_test.py
works locally for me).
Test Plan:
- checked that `python run_test.py --functorch` ran functorch tests
locally
- Local mock test: added `{"test_compilation_for_dynamic_shape (__main__.TestCompileCache)": ["https://github.com/pytorch/pytorch/issues/82016", ["linux"]]}` to .pytorch-disabled-tests.json, ran functorch tests, and verified that the test was skipped.
- Wait for CI
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82012
Approved by: https://github.com/janeyx99
This PR adds functorch shards to some more linux configurations on Pull
Requests. What's missing so far (and coming in the near future):
- adding a shard for windows
- adding a shard for asan (functorch currently times out under asan)
- adding shards for things that run in trunk, like macOS.
Test Plan:
- wait for tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81919
Approved by: https://github.com/kit1980
I was really annoyed that we preallocate result tensors for everything
and then throw most of them out. The new code variant doesn't do that.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>