This PR:
- changes generate_vmap_rule to either be True or False. Previously it
could be True, False, or not set. This simplifies the implementation a
bit.
- changes the vmap staticmethod to always be on the autograd.Function
rather than sometimes defined.
This is how the other staticmethods (forward, backward, jvp) are
implemented, and it allows us to document it.
There are 4 possible states for the autograd.Function w.r.t. the
above:
- generate_vmap_rule is True, vmap staticmethod overridden. This raises
an error when used with vmap.
- generate_vmap_rule is False, vmap staticmethod overridden. This is
valid.
- generate_vmap_rule is True, vmap staticmethod not overridden. This is
valid.
- generate_vmap_rule is False, vmap staticmethod not overridden. This
raises an error when used with vmap.
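A minimal, torch-free sketch of the four states listed above. The helper `check_vmap_config`, its error messages, and its return strings are hypothetical names used for illustration only; this is not the actual PyTorch implementation.

```python
def check_vmap_config(generate_vmap_rule: bool, vmap_overridden: bool) -> str:
    """Return which vmap rule would be used, or raise for the invalid states.

    Hypothetical helper: it only mirrors the four states described above.
    """
    if generate_vmap_rule and vmap_overridden:
        # Both mechanisms requested at once: ambiguous, so reject.
        raise RuntimeError(
            "generate_vmap_rule=True and a vmap staticmethod override "
            "cannot be used together"
        )
    if not generate_vmap_rule and not vmap_overridden:
        # Neither mechanism available: nothing to run under vmap.
        raise RuntimeError(
            "no vmap rule: set generate_vmap_rule=True or override vmap"
        )
    return "generated" if generate_vmap_rule else "custom"


print(check_vmap_config(True, False))   # generated
print(check_vmap_config(False, True))   # custom
```

The two valid states are exactly those where one, and only one, of the two mechanisms is selected.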
Future:
- setup_context needs the same treatment, but that's a bit trickier to
implement.
Test Plan:
- new unittest
- existing tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/91787
Approved by: https://github.com/soulitzer
The autograd.Function <> functorch interaction is in a mostly completed
state now. There are some minor action items remaining
(https://github.com/pytorch/pytorch/issues/90224), but I want to enable
the feature by default so that PyTorch CI / other parties / etc can
begin testing to see if there is any impact on the original
autograd.Function API (there shouldn't be).
The longer-term plan for the feature flag is:
- keep it around until at least the next release (so that people can
turn off the feature if it breaks something in existing code)
- delete the flag then (either before or after the release, I haven't
decided yet)
Test Plan:
- new test
- wait for CI
Pull Request resolved: https://github.com/pytorch/pytorch/pull/91441
Approved by: https://github.com/albanD, https://github.com/soulitzer
This makes it possible to know, at any point during the backward pass, which Node is currently running and where that Node was created:
```python
import torch
from torch.utils._python_dispatch import TorchDispatchMode
from torch.autograd import detect_anomaly

class MyMode(TorchDispatchMode):
    def __torch_dispatch__(self, func, types, args, kwargs=None):
        node = torch._C._current_autograd_node()
        print(f"Running {func} from within {node}")
        if node is not None:
            print("The Node was created at:")
            print("\n ".join(node.metadata["traceback_"]))
        return func(*args, **(kwargs or {}))


with MyMode(), detect_anomaly():
    print("FW")
    a = torch.rand(10, requires_grad=True)
    b = a.mul(2)
    b = b.div(3)
    b = b.sum()
    print("BW")
    b.backward()
```
Gives
```
$ python foo.py
foo.py:15: UserWarning: Anomaly Detection has been enabled. This mode will increase the runtime and should only be enabled for debugging.
with MyMode(), detect_anomaly():
FW
Running aten.rand.default from within None
Running aten.mul.Tensor from within None
Running aten.div.Tensor from within None
Running aten.sum.default from within None
BW
Running aten.ones_like.default from within None
Running aten.expand.default from within <SumBackward0 object at 0x7fa40c0c6dc0>
The Node was created at:
File "foo.py", line 20, in <module>
b = b.sum()
Running aten.isnan.default from within <SumBackward0 object at 0x7fa40c0c6500>
The Node was created at:
File "foo.py", line 20, in <module>
b = b.sum()
Running aten.any.default from within <SumBackward0 object at 0x7fa32b23a780>
The Node was created at:
File "foo.py", line 20, in <module>
b = b.sum()
Running aten._local_scalar_dense.default from within <SumBackward0 object at 0x7fa40c0c9190>
The Node was created at:
File "foo.py", line 20, in <module>
b = b.sum()
Running aten.div.Tensor from within <DivBackward0 object at 0x7fa40c0c9190>
The Node was created at:
File "foo.py", line 19, in <module>
b = b.div(3)
Running aten.isnan.default from within <DivBackward0 object at 0x7fa40c0c9190>
The Node was created at:
File "foo.py", line 19, in <module>
b = b.div(3)
Running aten.any.default from within <DivBackward0 object at 0x7fa40c0c9190>
The Node was created at:
File "foo.py", line 19, in <module>
b = b.div(3)
Running aten._local_scalar_dense.default from within <DivBackward0 object at 0x7fa40c0c9190>
The Node was created at:
File "foo.py", line 19, in <module>
b = b.div(3)
Running aten.mul.Tensor from within <MulBackward0 object at 0x7fa40c0c9190>
The Node was created at:
File "foo.py", line 18, in <module>
b = a.mul(2)
Running aten.isnan.default from within <MulBackward0 object at 0x7fa40c0c9190>
The Node was created at:
File "foo.py", line 18, in <module>
b = a.mul(2)
Running aten.any.default from within <MulBackward0 object at 0x7fa40c0c9190>
The Node was created at:
File "foo.py", line 18, in <module>
b = a.mul(2)
Running aten._local_scalar_dense.default from within <MulBackward0 object at 0x7fa40c0c9190>
The Node was created at:
File "foo.py", line 18, in <module>
b = a.mul(2)
Running aten.detach.default from within <AccumulateGrad object at 0x7fa40c0c9730>
The Node was created at:
File "foo.py", line 18, in <module>
b = a.mul(2)
Running aten.detach.default from within <AccumulateGrad object at 0x7fa40c0c94b0>
The Node was created at:
File "foo.py", line 18, in <module>
b = a.mul(2)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90867
Approved by: https://github.com/soulitzer
Motivation
- These were previously defined in functorch. They are not
functorch-specific, so I'm moving them to torch.autograd.forward_ad and
the autograd python bindings.
- I need this to avoid some of my cyclic import problems.
Should these be public APIs? Probably. Though this needs discussion, so
punting it to the future.
Test Plan:
- moved the tests of these from test/functorch/test_eager_transforms.py
to test/test_autograd.py
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90240
Approved by: https://github.com/soulitzer
Adds a setup_context staticmethod to autograd.Function.
If it exists, then the user splits the ctx-specific logic from the
forward() and puts it in the setup_context staticmethod.
Docs will come later when we remove the feature flag.
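To illustrate the split described above, here is a torch-free toy model. `ToyCtx`, `ToyFunction`, and the `(ctx, inputs, output)` argument order of `setup_context` are assumptions made for this sketch; they are not taken from this PR and the real dispatch logic may differ.

```python
class ToyCtx:
    """Hypothetical stand-in for the autograd ctx object."""

    def __init__(self):
        self.saved = ()

    def save_for_backward(self, *values):
        self.saved = values


class ToyFunction:
    """Hypothetical stand-in for autograd.Function."""

    @staticmethod
    def setup_context(ctx, inputs, output):
        raise NotImplementedError

    @classmethod
    def apply(cls, *inputs):
        ctx = ToyCtx()
        if cls.setup_context is ToyFunction.setup_context:
            # No override: forward() receives ctx and does everything.
            output = cls.forward(ctx, *inputs)
        else:
            # Override present: forward() is ctx-free, and the
            # ctx-specific logic lives in setup_context().
            output = cls.forward(*inputs)
            cls.setup_context(ctx, inputs, output)
        return output, ctx


class SquareCombined(ToyFunction):
    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return x * x

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved
        return 2 * x * grad_output


class SquareSplit(ToyFunction):
    @staticmethod
    def forward(x):
        return x * x

    @staticmethod
    def setup_context(ctx, inputs, output):
        (x,) = inputs
        ctx.save_for_backward(x)

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved
        return 2 * x * grad_output


out, ctx = SquareSplit.apply(3.0)
print(out, SquareSplit.backward(ctx, 1.0))  # 9.0 6.0
```

Both styles compute the same result; the split version keeps `forward()` free of any reference to `ctx`.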
Test Plan:
- some light tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89859
Approved by: https://github.com/soulitzer
This PR adds a private runtime feature flag for the feature work we're going
to do with extending autograd.Function. The motivation of the feature flag
is:
- to guard the feature against unsuspecting users
- to control the release of the feature until we are ready to release it
We might not even need the feature flag (because we hope to have the
work done in the next month), but it is good practice and it does touch
currently public API (autograd.Function).
Concretely, "autograd.Function extension" refers to:
- adding an optional `setup_context` staticmethod to autograd.Function
- adding an optional `vmap` staticmethod to autograd.Function
- autograd.Function support for functorch
Test Plan:
- new test that the feature flag works
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89858
Approved by: https://github.com/soulitzer
Preparation for the next PR in this stack: #89559.
I replaced
- `self.assertTrue(torch.equal(...))` with `self.assertEqual(..., rtol=0, atol=0, exact_device=True)`,
- the same for `self.assertFalse(...)` with `self.assertNotEqual(...)`, and
- `assert torch.equal(...)` with `torch.testing.assert_close(..., rtol=0, atol=0)` (note that we don't need to set `check_device=True` here since that is the default).
There were a few instances where the result of `torch.equal` is used directly. In those cases I've replaced it with `(... == ...).all().item()`, sometimes also dropping the `.item()` depending on the context.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89527
Approved by: https://github.com/mruberry
Fixes: https://github.com/pytorch/pytorch/issues/88205
The `CreationMeta::NO_GRAD_MODE` path in handle_view_on_rebase wrongly assumes that the tensor would be a leaf, because tensors created in no_grad are always leaf tensors. However, due to creation_meta propagation, a view of a view created in no_grad also has `CreationMeta::NO_GRAD_MODE`, but DOES have grad_fn.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88243
Approved by: https://github.com/albanD
Re-submit of gh-72302
This still has a small performance hit, but it is much smaller. On my
machine I see `_record_function_exit._RecordFunction` takes 1.05 us
compared to the `Tensor` overload taking 0.79 us.
In an overall comparison, I see a 0.7 us slowdown from 6.0 us to
6.7 us for this timeit benchmark
```python
import torch

def foo():
    with torch.profiler.record_function("foo"):
        return torch.eye(3)

%timeit foo()
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76420
Approved by: https://github.com/robieta
In this PR:
- graph_task stores graph roots on construction so that we can later traverse through the graph
- before the nodes are returned, they needed to be converted from raw_ptr to shared_ptr, and this should be OK because the graph is guaranteed to be alive
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87507
Approved by: https://github.com/albanD
`diag` was unnecessarily implemented as a kernel rather than as a composite
function, which made it unnecessarily difficult (explicit backward + all it entails).
We also change a few uses of `diag` on 2D tensors to `diagonal()`. The
latter returns a view rather than creating a new tensor.
We also upgrade its meta implementation to a fully-fledged
decomposition.
I tried implementing the backwards of `diagonal()` via `diag_scatter` (or better `diag_scatter_` to keep the perf) but functionalisation was failing and I was not sure how to fix this, so I moved on. It may be possible to simplify that one as well if @soulitzer or someone knows how to do this.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87180
Approved by: https://github.com/ngimel, https://github.com/albanD, https://github.com/mruberry
Big-bang PR to symintify **all** .sizes() calls in derivatives.yaml, which will be needed for symbolic tracing.
* with the exception of `split()`, which is tougher to land because it requires internal changes.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86610
Approved by: https://github.com/albanD
The rationale for this is that functorch doesn't work with saved
variable hooks at the moment or checkpointing and we need some way to
disable it.
Concretely:
- there's a context manager that does the disabling
- this feature is disabled on a thread-local basis
- one can set an error message or use the default error message that
says the feature has been disabled
Since it is thread local I needed to update ATen/ThreadLocalState. To
make things nicer, this PR refactors all the "saved tensors hooks"
related TLS things into a single struct.
Test Plan:
- new test
Differential Revision: [D39970936](https://our.internmc.facebook.com/intern/diff/D39970936)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85971
Approved by: https://github.com/albanD, https://github.com/soulitzer
The rationale for this is that functorch doesn't work with saved
variable hooks at the moment or checkpointing and we need some way to
disable it.
Concretely:
- there's a context manager that does the disabling
- this feature is disabled on a thread-local basis
- one can set an error message or use the default error message that
says the feature has been disabled
Since it is thread local I needed to update ATen/ThreadLocalState. To
make things nicer, this PR refactors all the "saved tensors hooks"
related TLS things into a single struct.
Test Plan:
- new test
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85553
Approved by: https://github.com/soulitzer
Addresses: https://github.com/pytorch/pytorch/issues/83617
This PR adds a way to query the TLS graph task's exec_info, which is a map from each Node to a bool indicating whether it will be executed in the current backward pass (as determined by the inputs= argument of .grad or .backward).
- this works with both custom Function nodes and normal codegened nodes
- to be able to verify whether the pyobject passed is an actual node, we now store pointers to PyTypeObjects into a set on registration.
- error out when .backward is called without inputs= to avoid silently returning True
Alternatives:
- not sure if it is possible to bind to Python from a raw pointer to Node. At least we wouldn't be able to use existing logic, and the Python object should only hold a weak reference to the Node.
- other solutions to the motivating issue seem to require more extensive modification to the engine
See the issue linked for an example of usage
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84773
Approved by: https://github.com/albanD
Add unit tests and docstrings corresponding to PR https://github.com/pytorch/pytorch/pull/63289
UT:
1. `test_profiler_emit_itt` in `test/test_autograd.py`. This test is merely intended to catch if emit_itt breaks on construction.
2. Test `torch.profiler.itt` functions in `test/test_itt.py`
3. Only testing that emit_itt runs when `record_shapes` option is enabled in `test/test_profiler.py`.
Docstring:
1. add ITT related info into `docs/source/bottleneck.rst`
2. add `torch.profiler.itt` functions to `docs/source/profiler.rst`
3. add docstring to `torch.profiler.itt` functions in `torch/profiler/itt.py`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84848
Approved by: https://github.com/malfet
Fix use-dict-literal pylint suggestions by changing `dict()` to `{}`. This PR should do the change for every Python file except test/jit/test_list_dict.py, where I think the intent is to test the constructor.
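For reference, a small sketch of the kind of rewrite this describes. The variable names are made up for illustration; the point is only that the literal form builds an equal dict.

```python
# Before: constructor calls, which the use-dict-literal check flags.
empty_before = dict()
options_before = dict(lr=0.1, momentum=0.9)

# After: equivalent dict literals.
empty_after = {}
options_after = {"lr": 0.1, "momentum": 0.9}

print(empty_before == empty_after)      # True
print(options_before == options_after)  # True
```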
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83718
Approved by: https://github.com/albanD
Make it so that it is valid to set metadata after detach calls, like `x.detach().resize_(...)`.
This technically lifts some restrictions around `.data`. This PR means that you can now technically call `x.data.resize_(...)`, which can now directly resize `x` instead of erroring.
My understanding: Before the tensor-variable merge, when `x` and `x.data` were really different tensors, you could resize `x.data` independently of `x`, and during the merge, this error was added to avoid silent confusing behavior changes.
It was agreed that this error has been around long enough (several years) that it's acceptable to drop. cc @albanD @ezyang.
(Ed already had a prototype PR [here](https://github.com/pytorch/pytorch/pull/83545) - I ended up making one to try to slog through test failures).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83590
Approved by: https://github.com/ezyang
Per offline discussion, this will be updated to use expand once expand semantics for nested tensor have been fleshed out.
Next steps will be to add support for other features for forward sum mentioned on #82387 and likewise update the backward
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82625
Approved by: https://github.com/albanD
`derivatives.yaml` can now take a `dispatch` entry which registers per-autograd dispatch key derivatives such as
```
- name: foo(Tensor self, Tensor y) -> Tensor
  dispatch:
    Default:
      self: grad
      y: grad.expand(y.sizes())
    AutogradNestedTensor:
      self: grad
      y: NestedTensor_foo_backward(grad, y)
  output_differentiability: [True]
```
However the old schema where there is no `dispatch` entry is still supported.
Would greatly appreciate feedback on *how to improve the testing strategy* of this PR. I have currently registered an aten test op in TestOps.cpp with dummy gradients in derivatives.yaml and have some tests in test_autograd.py:TestAutogradMultipleDispatch, but I am not sure whether these are sufficiently rigorous.
Additionally, this PR also makes the assumption that sets like [VIEW_FUNCTIONS](ff5399e528/tools/autograd/gen_inplace_or_view_type.py (L60)) are per-native-function and not per-native-function-and-dispatch-key. I'm not sure whether this is necessarily the case, *would there ever be a situation where (e.g. a nested_tensor op is a view op but the aten function is not or vice versa?)*
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82801
Approved by: https://github.com/bhosmer, https://github.com/albanD
### Description
cudaProfilerStart and cudaProfilerStop are deprecated but exposed by torch.cuda.cudart(). HIP has corresponding functions stubbed out, hipProfilerStart and hipProfilerStop, but they return hipErrorNotSupported. Profiling in HIP is supported, but not via these deprecated APIs.
See https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__PROFILER__DEPRECATED.html.
These functions are indirectly used by one or more unit tests that would otherwise pass if the non-functional HIP APIs were replaced with a dummy function.
### Testing
Unskipped a related unit test, run by ciflow/trunk.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82778
Approved by: https://github.com/ezyang
Towards fixing https://github.com/pytorch/pytorch/issues/82482
This PR fixes two things:
## 1) memory leak
The .detach() call prevents a true memory leak in some cases where the user function is using multiple ops in a row that save their inputs. The objects in the following chain keep each other alive:
- the `storage` object
- a recomputed Tensor y
- y's grad_fn FooBackward (in c++)
- FooBackward's SavedVariables (in c++)
- SavedVariable Hook
- the `inner_pack` function
- captures `storage`
Since part of this cycle is in c++, the python gc is not able to break it.
Should THPCppFunction_traverse actually visit its SavedVariables, which in turn should visit their hooks? I think the answer is yes, but I haven't dived into which python object is traversing what: if there is non-unique ownership of the c++ object, it makes the traversal a lot trickier. @ezyang do you think we should dive into this more?
In this case, this can be easily solved anyways by storing `y.detach()` in the `storage` object as we don't care about the temporary backward graph that gets created during the second forward call.
## 2) Lifetime of the recomputed buffers
The new storage system is now such that the lifetime of the recomputed buffer is directly linked to the SavedVariable c++ object, meaning that this buffer will get deleted iff the SavedVariable is cleared.
This means that we now get the exact same behavior as the version without the saved variable hook where Tensors are saved directly on the SavedVariable object.
This is great as this solves all the cases where the non-checkpoint version used to work but the checkpoint version does not (even double access or retain_graph=True).
The one drawback of this approach though is that the buffers do NOT get cleared when the user passes in `retain_graph=True`! The next backward won't even re-run the forward as it already has all the buffers available. Is this a problem that you think we would need to find a solution for @rohan-varma, or is it niche enough that we don't care for now?
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82776
Approved by: https://github.com/ezyang, https://github.com/rohan-varma
I don't think there's a way to avoid functions returning undefined tensors as outputs, so codegen will have to detect them before calling _set_fw_grad. Alternatively, we can just make calling _set_fw_grad with undefined self a no-op, but I'm biasing toward keeping _set_fw_grad more strict in case it is called in other areas.
Fixes https://github.com/pytorch/pytorch/issues/81111
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81114
Approved by: https://github.com/albanD
See this doc: https://docs.google.com/document/d/1KiRdnoj6B4cI3yl017hTbCqcOGO1gWIpUf20sldipHM/edit#
Two issues are fixed: (1) one regarding hooks in general and (2) one regarding retains grad hooks. Python hooks, which rely on a different mechanism, are not discussed here:
- Hooks in cpp in general
- (fixed) new hooks registered to a newer version of the tensor no longer get applied to the grad_fn
associated with the older version of the tensor that was current when the first hook was registered
- (unchanged) hooks registered to the older version of the tensor remain active on
- Retains grad hooks
- (fixed) now get moved to the latest grad_fn. NB: To the user, retains_grad is not considered hooks
or expected to behave like hooks (which we consider properties of the grad_fn) vs retains_gradness
which is a property of the tensor.
- (not in this PR) Python hooks
- (will fix) same issue as hooks in cpp where new hooks are being applied to grad_fn associated
with the older version of the tensor
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79996
Approved by: https://github.com/albanD
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77696
https://github.com/pytorch/pytorch/pull/63619 added a RECORD_FUNCTION guard to make calls to `Engine::evaluate_function` visible regardless of the underlying op. While useful, this creates a call that looks like a forward call that somewhat complicates stitching forward and backward ops. I don't want to add complexity (and therefore work) on the hot path; instead it's fairly straightforward to stitch things back together in post. This PR simply propagates sequence number and forward tid info up to the `evaluate_function` event.
Differential Revision: [D36302562](https://our.internmc.facebook.com/intern/diff/D36302562/)
Approved by: https://github.com/aaronenyeshi
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76253
We're observing a large QPS regression on the original PR https://github.com/pytorch/pytorch/pull/72302. For the training job we had, it regressed from 720k QPS to 450k QPS (see the test plan in FB internal). We suspect this is because the api was changed from `_record_function_enter` to `_record_function_enter_new`, and we're running experiments to confirm that. Will add more details when the runs in the test plan have finished. For now, it's better to revert the diff to unblock internal usecases and we can think about how to reland this diff later.
Original commit changeset: dc9939f1fa6d
Original Phabricator Diff: D35257354
Test Plan:
on trunk: f338665947
with this diff: f338502850
Reviewed By: malfet, robieta
Differential Revision: D35853300
fbshipit-source-id: dd38042aeacb848f66756491a4c849c7c652a0e1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71708
In Python 3.2, a number of asserts were deprecated.
In Python 3.11, these asserts are deleted completely. The files in this change still use the deprecated asserts.
Switch over to the supported syntax for 3.2 onwards.
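As an illustration of the supported spelling, the sketch below uses only the current `unittest` method names and notes the deprecated alias each one replaces in a comment. Which aliases the changed files actually used is an assumption here; `ExampleTest` is a made-up test case.

```python
import unittest


class ExampleTest(unittest.TestCase):
    def test_supported_names(self):
        self.assertEqual(1 + 1, 2)              # deprecated alias: assertEquals
        self.assertNotEqual(1 + 1, 3)           # deprecated alias: assertNotEquals
        self.assertTrue(isinstance(1, int))     # deprecated alias: assert_
        self.assertAlmostEqual(0.1 + 0.2, 0.3)  # deprecated alias: assertAlmostEquals
        self.assertRegex("autograd", r"^auto")  # deprecated alias: assertRegexpMatches
        # deprecated alias: assertRaisesRegexp
        with self.assertRaisesRegex(ValueError, "invalid"):
            int("x")


suite = unittest.defaultTestLoader.loadTestsFromTestCase(ExampleTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```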
Test Plan: Tested on the internal test suite runner.
Reviewed By: ajtulloch
Differential Revision: D33503694
fbshipit-source-id: a150f296033260acf8365d77b837ce0679f57361
(cherry picked from commit abf60ed97409265222915d8265aaabedd625fd93)
Summary:
Description of the new behavior is in PythonFallbackKernel.cpp.
The updated test makes sure that we only call alias on the first Tensor.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73925
Reviewed By: samdow
Differential Revision: D34862940
Pulled By: albanD
fbshipit-source-id: 4d020e41c8bb8b10262dcafd524e84a5ad4d7af0
(cherry picked from commit 0aa6b56dbd3dcee830453fb02cd6c83ab7a8be06)
Summary:
Minimal example that deadlocks before but not after:
```python
import torch
from torch.autograd import Function


class Foo(Function):
    @staticmethod
    def forward(ctx, x):
        return x.clone()

    @staticmethod
    def backward(ctx, gO):
        return gO.clone()


def get_out():
    inp = torch.rand(2, requires_grad=True)

    # The python function is first so that it runs
    # last in the backward pass
    right = Foo.apply(inp)

    # An op that creates new memory
    left1 = inp.clone()
    # An op that saves its input
    left2 = left1 ** 2

    # Inplace modify so that the backward for
    # left2 always raises an error
    left1 += 1

    # An op that takes both side as input.
    # After running, both side's last op will be in
    # the ready queue
    # And the op for left will run first as it was
    # executed last during the forward
    out = left2 + right

    return out


# Nothing should be global variables here as, from what
# I can see, python leaks all the global objects
get_out().sum().backward()
```
Since this requires the python interpreter to die, it is hard to test in CI.
Let me know if you have an idea how to do it though.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73961
Reviewed By: malfet
Differential Revision: D34752747
Pulled By: albanD
fbshipit-source-id: 1a537b1f733e161e8d3ff053cd432b37b34d432a
(cherry picked from commit 17943e4c04c782d81deab439e010195f04e75bbd)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72301
First step in resolving #35026.
This adds `PythonRecordFunction` which is a `torch::CustomClassHolder`
for `at::RecordFunction` to keep the ATen code free of torch includes.
It also adds a new, currently unused internal API function,
`_record_function_enter_new`, which returns the torchbind object.
Once the FC period is expired, `torch.profiler.record_function` will
be updated to use this new internal API. Then once BC period is
expired, the cpp_custom_type_hack-based API can be removed.
Test Plan: Imported from OSS
Reviewed By: dagitses
Differential Revision: D34586311
Pulled By: robieta
fbshipit-source-id: d3eb9ffad7b348548a2b22c75203a92d1cb5115b
(cherry picked from commit 92d2ca808e5fbd20c9d6645dcabc3f059f9ef2d3)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72688
Refactor how we know what to run on the cpu queue.
The Lazy Tensor moved there as it is always present as a device guard and would make the number of devices 1 all the time (forcing the creation of a thread).
FYI wconstab you most likely don't care about this unless you ever use multiple Lazy devices?
This should slightly improve the perf if you run backward with Lazy Tensors as the work will be done in the main thread and not a worker thread.
Test Plan: Imported from OSS
Reviewed By: soulitzer
Differential Revision: D34180245
Pulled By: albanD
fbshipit-source-id: 88c5d5bdd631ad01bf271d720d1eab69aba84fc0
(cherry picked from commit da7e9b902f)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72008
Fixes #71119
Technically BC-breaking because when an input does not require grad, previously it was returned as-is instead of a view because it didn't need to. Now we will also return a view in that case (whether or not forward AD runs).
Test Plan: Imported from OSS
Reviewed By: albanD
Differential Revision: D33859553
Pulled By: soulitzer
fbshipit-source-id: 81b3fa371f4c0904630878500aa190492c562367
(cherry picked from commit ee74bc8234)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71707
Why?
- detach should behave like jax.stop_gradient in functorch. Because it
does not detach all the way through, functorch (as well as a Tensor
Subclass wrapping a Tensor subclass) won't see it after the first
layer/subclass handles it.
How?
- This PR changes detach to dispatch all the way through to the backend.
- This PR also modifies native::detach to call shallow_copy_and_detach
instead of native::alias. This is because today, the semantics of detach
and alias are different -- they differ only by
allow_tensor_metadata_change. In the future, we may choose to deprecate
this flag.
- NB: Before and after this PR, detach() shows up twice in
torch_dispatch: https://github.com/pytorch/pytorch/issues/71725. This is
not a regression so I didn't want to fix it in this PR because it is
weird to fix.
Test Plan: - added new tests; run existing tests
Reviewed By: albanD
Differential Revision: D33752860
Pulled By: zou3519
fbshipit-source-id: 40cc2dc8232e75a02586a4ba5b0ef5f16cb76617
(cherry picked from commit f88aae426e)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69827
In general, the current pattern allows for implementing optimisations
for all the backends in a common place (see for example the optimisation
for empty matrices).
After this PR, `torch.svd` is implemented in terms of `linalg.svd` and
`linalg.svdvals`, as expected. This makes it differentiable in the case
when `compute_uv=False`, although this is not particularly important, as
`torch.svd` will eventually be deprecated.
This PR also instantiates smaller `U` / `V` when calling cusolver_gesvdj
in the cases when `full_matrices=False` or `compute_uv=False`.
The memory for auxiliary `U` and `V` in the cases above, needed for some
cuSOLVER routines, is allocated through raw allocators rather than through fully
fledged tensors, as it's just a blob of memory the algorithm requests.
As the code is better structured now, it was easier to see that `U` and
`Vh` needn't be allocated when calling `svd_cusolver_gesvd`.
Now `linalg.svdvals` works as expected wrt the `out=` parameter.
Note that in the test `test_svd_memory_allocation` we were
passing a tensor of the wrong size and dtype and the test seemed to
pass...
This PR also changes the backward formula to avoid saving the input
matrix, as it's not necessary. In a follow up PR, I will clean the
backward formula and make it more numerically stable and efficient.
This PR also does a number of memory optimisations here and there, and fixes
the call to cusolver_gesvd, which was incorrect for m <= n. To test
this path, I compiled the code with a flag to unconditionally execute
the `if (!gesvdj_convergence_check.empty())` branch, and all the tests
passed.
I also took this chance to simplify the tests for these functions in
`test_linalg.py`, as we had lots of tests that were testing some
functionality that is already currently tested in the corresponding
OpInfos. I used xwang233's feature to test both MAGMA and CUDA
backends. This is particularly good for SVD, as cuSOLVER is always
chosen over MAGMA when available, so testing MAGMA otherwise would be
tricky.
cc jianyuh nikitaved pearu mruberry walterddr IvanYashchuk xwang233 Lezcano
Test Plan: Imported from OSS
Reviewed By: mikaylagawarecki
Differential Revision: D33751983
Pulled By: mruberry
fbshipit-source-id: 11d48d977946345583d33d14fb11a170a7d14fd2
(cherry picked from commit a1860bd567)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71569
Not sure if this is the right API
Test Plan: Imported from OSS
Reviewed By: albanD
Differential Revision: D33695395
Pulled By: soulitzer
fbshipit-source-id: 652b5758f15d901f98ff0da94e977030c7f3415b
(cherry picked from commit 9421a6846a)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71531
Based on the comment above the original internal assert, this is the desired check.
1. Don't error, and automatically make jvp return a view for that tensor output (this is easier than I originally thought: https://github.com/pytorch/pytorch/pull/71531#discussion_r789211877)
2. Error (currently doing)
Test Plan: Imported from OSS
Reviewed By: albanD
Differential Revision: D33695399
Pulled By: soulitzer
fbshipit-source-id: dba49890a55ad1dd59ed5c41faa96bf7cfc9e562
(cherry picked from commit fdb0f266f5)
Summary:
When default hooks are set, they are pushed onto a stack.
When nesting context managers, only the inner-most hooks will
be applied.
There is special care needed to update the TLS code. See also https://github.com/pytorch/pytorch/issues/70940 (i.e. do we need to be storing the enabled flag as well?)
Fixes https://github.com/pytorch/pytorch/issues/70134
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70932
Reviewed By: mruberry
Differential Revision: D33530370
Pulled By: albanD
fbshipit-source-id: 3197d585d77563f36c175d3949115a0776b309f4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68631
This PR:
- Adds the check that the storage numel of the base and tangent tensors are the same. This is to support the case when as_strided reveals elements that aren't indexable by the input tensor.
- Skips the check when batched tensors are involved, because using as_strided to reveal elements that are not indexable by the input tensor is already not allowed in vmap.
- Adds tests for the above two cases, as well as an edge case regarding conj bit (what about neg bit?)
For functorch:
- we need to copy the batching rule implemented here
Test Plan: Imported from OSS
Reviewed By: albanD
Differential Revision: D32899678
Pulled By: soulitzer
fbshipit-source-id: 54db9550dd2c93bc66b8fb2d36ce40799ebba794
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69592
Currently, forward AD function for `copy_` (in `VariableTypeManual`) does not handle the broadcasting case. ~EDIT: but that is not a design decision, not a bug. In this PR, we make that clear as a comment.~
Note: `broadcast_to` does not have a batching rule in core, so the ops that rely on `copy_` to broadcast will still fail batched forward grad computation.
Test Plan: Imported from OSS
Reviewed By: albanD
Differential Revision: D33020603
Pulled By: soulitzer
fbshipit-source-id: 09cb702bffc74061964a9c05cfef5121f8164814
Summary:
This fixes the case when `torch.inference_mode` is called with `mode=False` (disabled). When used as a decorator, it ignored the argument and enabled inference mode anyway.
`_DecoratorContextManager` is changed so that a new instance is a copy instead of a new instance with default parameters.
I also added more tests to cover this case.
Current behaviour:
```python
>>> import torch
>>> x = torch.ones(1, 2, 3, requires_grad=True)
>>> @torch.inference_mode(mode=False)
... def func(x):
...     return x * x
...
>>> out = func(x)
>>> out.requires_grad
False
```
New behaviour (fixed):
```python
>>> import torch
>>> x = torch.ones(1, 2, 3, requires_grad=True)
>>> @torch.inference_mode(mode=False)
... def func(x):
...     return x * x
...
>>> out = func(x)
>>> out.requires_grad
True
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68617
Reviewed By: mrshenli
Differential Revision: D32958434
Pulled By: albanD
fbshipit-source-id: 133c69970ef8bffb9fc9ab5142dedcffc4c32945
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68630
Constraints:
1) (functorch) if all the inputs to an op have requires_grad=False and don't have tangents, then their VariableType
kernel should be a no-op i.e., behave like a redispatch. This is due to functorch's DynamicLayerStack
having the autograd key by default (which is so that transformations like vmap still work with autograd)
2) (inference mode) inference tensors in inference mode will call straight into the kernel, we should still do something sensible
inside even if we normally wouldn't redispatch into it.
3) ~Should support potential application of interposition below autograd: `nn.Parameter` is a example of subclassing where the subclass
is not preserved when an operation is performed. There is an exception though: we want calling `make_dual` on a
`nn.Parameter` to preserve its parameterness.~
4) Should avoid calls to shallow_copy_and_detach to avoid spurious calls into `__python_dispatch__`.
This PR:
- does not redispatch to `make_dual` from its `ADInplaceOrView` kernel to satisfy (1)
- calls into `alias` from the kernel in the native namespace so that behavior is consistent with other views in inference mode to satisfy (2)
- discussion of (3). We still wouldn't be able to directly override `make_dual` below autograd. In this PR, instead of not redispatching at all, we choose to redispatch into `at::alias` so that one can override `make_dual`. The side effect is that one would not be able to distinguish calls between the two, which can be problematic (though a straightforward but hacky solution would be to create a new `at::alias_for_make_dual` that would allow users to distinguish the two). This isn't ideal but seems to be the simplest way to satisfy (3). We don't pursue that hacky solution here.
- (4) is satisfied because we remove calls to `shallow_copy_and_detach`
<details>
<summary> A potentially less hacky but more involved solution? (WIP) </summary>
Realizing that make_dual is more like requires_grad, perhaps it shouldn't be autograd explicit? Make make_dual a composite or python-only construct. i.e., it would be a view on the primal followed by something to the effect of primal.set_fw_grad(tangent).
Additional constraints:
5) make_dual needs to be backward-differentiable (I can't think of any applications yet because
technically as a high-order function, jvp's input is the tangent only, "detach" is not applied on
the tangent, so one would still be able to propagate gradients through it).
6) set_fw_grad needs to raise an error if there is a layout mismatch and base is a forward-differentiable view
Possible plan
- (6) implies that a plain view would not suffice. We need a `detach`-like operation to ensure that set_fw_grad
knows the view is not forward differentiable.
- (5) implies that this (new) `detach` would need to be backward differentiable (API TBD).
- (3) is no longer relevant because make_dual is no longer autograd explicit, but perhaps this new detach should behave like the current one? There is a lot of logic to replicate for detach, so this may be hard.
- (1) is satisfied if we use current detach logic, i.e., , and (4) is trivial.
I'm not convinced that this is the right solution either, because in the end does (3) still work?
</details>
Test Plan: Imported from OSS
Reviewed By: jbschlosser
Differential Revision: D32899679
Pulled By: soulitzer
fbshipit-source-id: 98e13ae954e14e1e68dbd03eb5ab3300d5ed2c5e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69508
Original Phabricator Diff: D32704467 (e032dae329)
Reland, fix is to not test traditional checkpoint when input does not require grad as that is unsupported as documented.
Original PR body:
Resubmission of https://github.com/pytorch/pytorch/pull/62964 with the
suggestions and tests discussed in
https://github.com/pytorch/pytorch/issues/65537.
Adds a `use_reentrant=False` flag to `checkpoint` function. When
`use_reentrant=False` is specified, a checkpointing implementation that uses
SavedVariableHooks instead of re-entrant autograd is used. This makes it more
composable with things such as `autograd.grad` as well as DDP (still need to
add thorough distributed testing).
As discussed in https://github.com/pytorch/pytorch/issues/65537, the tests that we need to add are:
- [x] Gradient hooks are called once
- [x] works when input does not require grads but Tensors that require grads are captured (like first layer in a nn)
- [x] works for functions with arbitrary input/output objects
- [x] distributed tests (next PR)
Note that this is only for `torch.utils.checkpoint`, if this approach overall looks good, we will do something similar for `checkpoint_sequential`.
ghstack-source-id: 144948501
Test Plan: CI
Reviewed By: zhaojuanmao
Differential Revision: D32902634
fbshipit-source-id: 2ee87006e5045e5471ff80c36a07fbecc2bea3fe
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69027
Resubmission of https://github.com/pytorch/pytorch/pull/62964 with the
suggestions and tests discussed in
https://github.com/pytorch/pytorch/issues/65537.
Adds a `use_reentrant=False` flag to `checkpoint` function. When
`use_reentrant=False` is specified, a checkpointing implementation that uses
SavedVariableHooks instead of re-entrant autograd is used. This makes it more
composable with things such as `autograd.grad` as well as DDP (still need to
add thorough distributed testing).
As discussed in https://github.com/pytorch/pytorch/issues/65537, we have added
the following tests:
- [ ] Gradient hooks are called once
ghstack-source-id: 144644859
Test Plan: CI
Reviewed By: pbelevich
Differential Revision: D32704467
fbshipit-source-id: 6eea1cce6b935ef5a0f90b769e395120900e4412
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67041
Original PR here: https://github.com/pytorch/pytorch/pull/62246 (The old PR does more things, but now that's split across this stack)
This PR:
- Adds "jacfwd" and "hessian_fwdrev"
- Modifies existing tests to also test the `forward_ad=True` case
Test Plan: Imported from OSS
Reviewed By: gchanan, zou3519
Differential Revision: D32314424
Pulled By: soulitzer
fbshipit-source-id: 785b0e39162b93dc3b3cb9413233447152eddd53
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66294
In this PR:
- OpInfo for forward AD now checks batched forward grad when `op.check_batched_grad=True`
- Adds setting to disable the test for individual ops `check_batched_forward_grad` and disable for the ops here: https://github.com/pytorch/pytorch/issues/66357
Fixes some more failures:
- Make Forward AD metadata less strict by allowing stride to differ when size is 1
- Fix sum batching rule when logical tensor is a scalar and dim is unspecified
- Batching rule for `_reshape_alias`
- ~Batching rules now preserve storage offset for view operator that return non-zero storage offset~ (moved to previous PR)
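A sketch of the relaxed stride check, in plain Python over size/stride tuples. The function name `strides_compatible` is hypothetical; the rationale is that the index into a size-1 dimension is always 0, so its stride never affects which element is addressed.

```python
def strides_compatible(sizes, strides_a, strides_b):
    """Return True if strides match in every dimension whose size is not 1."""
    if not (len(sizes) == len(strides_a) == len(strides_b)):
        return False
    return all(
        size == 1 or sa == sb
        for size, sa, sb in zip(sizes, strides_a, strides_b)
    )

# Strides differ only in the size-1 dimension: accepted.
relaxed_ok = strides_compatible((2, 1, 3), (3, 3, 1), (3, 1, 1))
# Strides differ in a dimension of size 2: still rejected.
real_mismatch = strides_compatible((2, 2), (2, 1), (1, 2))
```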
Test Plan: Imported from OSS
Reviewed By: zou3519, albanD
Differential Revision: D31842020
Pulled By: soulitzer
fbshipit-source-id: 3517a8fb9d6291fccb53c0b1631eab5bbb24ebd1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66292
In this PR:
1. Fix the case where the tangent has a different layout from the base in `set_fw_grad` by adding a native function and its batching rule.
For (1) we replace the following:
```
Tensor new_with_same_meta(const Variable& base) {
  int64_t nelement_in_storage = base.storage().nbytes() / base.itemsize();
  auto new_tensor = at::zeros({nelement_in_storage}, base.options());
  auto res = new_tensor.as_strided(base.sizes(), base.strides(), base.storage_offset());
  return res;
}
```
with a native function so as to enable a batching rule to alter its behavior.
This new function will be similar to `new_zeros_strided` except we also require the `storage_offset` and `storage_numel` arguments.
Possible concerns:
- Why have redundant logic? Why not add new args to `new_zeros_strided`? This is probably a niche use case, so it's better not to complicate the current API.
- Previously the created tensor inherits the TensorOptions of the primal. Now we inherit from the TensorOptions of the tangent.
  - Probably fine. Likely, no one relies on this because the behavior is only triggered when tangent/base have different layouts.
- Why pass in exploded size, stride, and offset? It is possible in the non-batched case to pass in a tensor directly, but not possible when we'd like to have a batching rule. The size, stride, and offset we'd be passing won't belong to any live tensor.
Test Plan: Imported from OSS
Reviewed By: zou3519, albanD
Differential Revision: D31842019
Pulled By: soulitzer
fbshipit-source-id: a58433d814fd173bc43a2c550b395377dba40de2
Summary:
Fixes https://github.com/pytorch/pytorch/issues/67367
- Adds check to make sure forward grad itself does not have forward grad at the same level
- Verify with `python test/test_ops.py -k test_forward_mode_AD_linalg_eigh_cpu_float64` that it fails the check before, but passes after the codegen update
Before:
```
if (_any_has_forward_grad_eigenvalues) {
  auto self_t_raw = toNonOptFwGrad(self);
  auto self_t = self_t_raw.defined() ? self_t_raw : at::zeros_like(toNonOptTensor(self));
  auto eigenvalues_new_fw_grad = eigh_jvp_eigenvalues(self_t, eigenvalues, eigenvectors);
  if (eigenvalues_new_fw_grad.defined()) {
    // The hardcoded 0 here will need to be updated once we support multiple levels.
    eigenvalues._set_fw_grad(eigenvalues_new_fw_grad, /* level */ 0, /* is_inplace_op */ false);
  }
}
if (_any_has_forward_grad_eigenvectors) {
  auto self_t_raw = toNonOptFwGrad(self);
  auto self_t = self_t_raw.defined() ? self_t_raw : at::zeros_like(toNonOptTensor(self));
  auto eigenvectors_new_fw_grad = eigh_jvp_eigenvectors(self_t, eigenvalues, eigenvectors);
  if (eigenvectors_new_fw_grad.defined()) {
    // The hardcoded 0 here will need to be updated once we support multiple levels.
    eigenvectors._set_fw_grad(eigenvectors_new_fw_grad, /* level */ 0, /* is_inplace_op */ false);
  }
}
```
After:
```
c10::optional<at::Tensor> eigenvalues_new_fw_grad_opt = c10::nullopt;
if (_any_has_forward_grad_eigenvalues) {
  auto self_t_raw = toNonOptFwGrad(self);
  auto self_t = self_t_raw.defined() ? self_t_raw : at::zeros_like(toNonOptTensor(self));
  eigenvalues_new_fw_grad_opt = eigh_jvp_eigenvalues(self_t, eigenvalues, eigenvectors);
}
c10::optional<at::Tensor> eigenvectors_new_fw_grad_opt = c10::nullopt;
if (_any_has_forward_grad_eigenvectors) {
  auto self_t_raw = toNonOptFwGrad(self);
  auto self_t = self_t_raw.defined() ? self_t_raw : at::zeros_like(toNonOptTensor(self));
  eigenvectors_new_fw_grad_opt = eigh_jvp_eigenvectors(self_t, eigenvalues, eigenvectors);
}
if (eigenvalues_new_fw_grad_opt.has_value() && eigenvalues_new_fw_grad_opt.value().defined()) {
  // The hardcoded 0 here will need to be updated once we support multiple levels.
  eigenvalues._set_fw_grad(eigenvalues_new_fw_grad_opt.value(), /* level */ 0, /* is_inplace_op */ false);
}
if (eigenvectors_new_fw_grad_opt.has_value() && eigenvectors_new_fw_grad_opt.value().defined()) {
  // The hardcoded 0 here will need to be updated once we support multiple levels.
  eigenvectors._set_fw_grad(eigenvectors_new_fw_grad_opt.value(), /* level */ 0, /* is_inplace_op */ false);
}
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68535
Reviewed By: ngimel
Differential Revision: D32536089
Pulled By: soulitzer
fbshipit-source-id: a3f288540e2d78a4a9ec4bd66d2c0f0e65dd72cd
Summary:
Fixes https://github.com/pytorch/pytorch/issues/67800
Currently when the grad is the same layout as base, we try to assign the same tensor to the forward grad of both the base and the view. However, when the layout of the grad is different from the layout of the view, this triggers a copy to be created, and the tangent of the view (after the inplace) will not have a view relationship with the view of the base.
This PR just changes it so that we only do the above optimization when the layout also matches the layout of self
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67816
Reviewed By: malfet
Differential Revision: D32190021
Pulled By: soulitzer
fbshipit-source-id: b1b2c9b332e83f4df5695ee9686ea76447f9305b
Summary:
Partially fixes https://github.com/pytorch/pytorch/issues/66066
This PR:
- cleans up op-specific testing from test_autograd. test_autograd should be reserved for testing generic autograd functionality
- tests related to an operator are better colocated
- see the tracker for details
What to think about when moving tests to their correct test suite:
- naming, make sure it's not too generic
- how the test is parametrized, sometimes we need to add/remove a device/dtype parameter
- can this be merged with existing tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67413
Reviewed By: jbschlosser, albanD
Differential Revision: D32031480
Pulled By: soulitzer
fbshipit-source-id: 8e13da1e58a38d5cecbfdfd4fe2b4fe6f816897f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66291
In this PR:
- Trivial batching rules for `make_dual` and `is_same_size` that enable forward ad + vmap functionality
- Adds a check in gradcheck that is performed when both `check_batched_grad` and `check_forward_ad` are `True` (an OpInfo using this is added later in the stack).
- Tests for the gradcheck functionality
- Tests that basic out-of-place op works
Test Plan: Imported from OSS
Reviewed By: albanD, saketh-are
Differential Revision: D31842018
Pulled By: soulitzer
fbshipit-source-id: 84b18d9a77eeb19897757e37555581f2a9dc43d8
Summary:
Fixes https://github.com/pytorch/pytorch/issues/61926
1. update the `if` to just use requires_derivative since that should reflect when function is not differentiable
2. if `requires_derivative=True` but no outputs have forward derivatives, we should error as usual
3. ~In the future we may also want to handle the case~ when `len(fw_derivatives) > 0 and len(fw_derivatives) < num_diff_outputs` we should add assert in codegen that this does not happen.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66926
Reviewed By: anjali411
Differential Revision: D31810736
Pulled By: soulitzer
fbshipit-source-id: 11a14477cc7554f576cff2ed1711a448a8c6a66a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64181
This PR replaces all the calls to:
- `transpose(-2, -1)` or `transpose(-1, -2)` by `mT()` in C++ and `mT` in Python
- `conj().transpose(-2, -1)` or `transpose(-2, -1).conj()` or `conj().transpose(-1, -2)` or `transpose(-1, -2).conj()` by `mH()` in C++ and `mH` in Python.
It also simplifies two pieces of code, and fixes one bug where a pair
of parentheses was missing in the function `make_symmetric_matrices`.
Test Plan: Imported from OSS
Reviewed By: H-Huang
Differential Revision: D31692896
Pulled By: anjali411
fbshipit-source-id: e9112c42343663d442dc5bd53ff2b492094b434a
Summary:
Fixes https://github.com/pytorch/pytorch/issues/50209
This adds a new warning handler that stores all warnings in a shared
queue, which can be "replayed" at a later time and, crucially, on
another thread. Then, I use this inside the autograd engine to ensure
that warnings are processed by the handler registered on the main
thread.
For testing, I also add an operator that always warns in the backward
pass and test that the warning is a normal Python warning.
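A rough Python analogue of the queue-and-replay idea, using only the standard library. `DelayedWarningQueue` is a hypothetical name; the actual handler described here is implemented in C++ and is not shown.

```python
import threading
import warnings
from queue import Queue

class DelayedWarningQueue:
    """Hypothetical collector: worker threads enqueue, another thread replays."""

    def __init__(self):
        self._queue = Queue()

    def warn(self, message, category=UserWarning):
        # Called on any thread; nothing is emitted yet.
        self._queue.put((message, category))

    def replay(self):
        # Called on the thread whose warning handling should apply.
        while not self._queue.empty():
            message, category = self._queue.get()
            warnings.warn(message, category, stacklevel=2)

collector = DelayedWarningQueue()

# The warning is raised on a worker thread but only stored.
worker = threading.Thread(target=collector.warn, args=("warned in backward",))
worker.start()
worker.join()

# Replaying on the main thread routes it through the main thread's filters.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    collector.replay()
```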
cc ezyang albanD zou3519 gqchen pearu nikitaved soulitzer Lezcano Varal7
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66235
Reviewed By: ejguan
Differential Revision: D31505413
Pulled By: albanD
fbshipit-source-id: 1a7f60b038f55c20591c0748b9e86735b3fec2f9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65564
- wrap the call into engine with vmap if `batched_grad` is `True`
- improves the comment on the call to engine (somewhat addressing https://github.com/pytorch/pytorch/issues/41659)
- borrows the message from functional.jacobian's vectorized argument concerning usage of the vmap feature
- adds basic test (further testing is done when we replace the usage in vectorized jacobian computation)
TODO:
- create an issue tracking this
Test Plan: Imported from OSS
Reviewed By: albanD
Differential Revision: D31236259
Pulled By: soulitzer
fbshipit-source-id: b33e6b26ea98fa9f70c44da08458fc54ba4df0f7
Summary:
Fixes https://github.com/pytorch/pytorch/issues/64999
- Adds a flag to gradcheck `check_backward_ad` that can be used to disable gradcheck for backward ad
  - This is a bit bc-breaking in terms of positional args, but I prefer this ordering
- In OpInfo tests for forward ad:
  - set `check_backward_ad` False
- In test_ops treat `supports_autograd` as if it is `supports_backward_ad` (it basically already is)
  - the only modification needed is to no longer skip forward ad tests if `supports_autograd` is false
  - test_dtype, test_variant_consistency, etc behave correctly as-is
  - In a follow-up PR, we can rename it to actually be `supports_backward_ad`
- Testing
  - https://github.com/pytorch/pytorch/pull/65060
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65040
Reviewed By: albanD
Differential Revision: D31238177
Pulled By: soulitzer
fbshipit-source-id: f068d4cbe7ffb094930b16cddb210583b9b7b2c4
Summary:
OpInfo tracker: https://github.com/pytorch/pytorch/issues/54261
- Eliminate duplicated testing logic in test_autograd
- Moved tests that rely on this testing logic to use OpInfos
- `cat` already has OpInfo (no action needed)
- Created OpInfo for `block_diag` and `broadcast_tensors`
Running into some FX errors. Added op to skip-list and created an issue here: https://github.com/pytorch/pytorch/issues/64997
Both `block_diag` and `broadcast_tensors` are variadic, so skipping `test_variant_consistency_jit` (from comments on other OpInfos, it looks like JIT does not support variadic tensors)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64993
Reviewed By: jbschlosser
Differential Revision: D30961736
Pulled By: soulitzer
fbshipit-source-id: e169305384a683acae1178c4e12e9e214a67226a
Summary:
Fixes https://github.com/pytorch/pytorch/issues/64813
Raises a TypeError when assigned value to a grad is not a Tensor or
None.
Adds tests.
cc ezyang gchanan
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64876
Reviewed By: anjali411
Differential Revision: D30901678
Pulled By: soulitzer
fbshipit-source-id: dbb3cb5fd0bbac6918e0b2e2f51d340daa43dee0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63554
Following https://github.com/pytorch/pytorch/pull/61840#issuecomment-884087809, this deprecates all the dtype getters publicly exposed in the `torch.testing` namespace. The reason for this is twofold:
1. If someone is not familiar with the C++ dispatch macros PyTorch uses, the names are misleading. For example `torch.testing.floating_types()` will only give you `float32` and `float64` skipping `float16` and `bfloat16`.
2. The dtype getters provide very minimal functionality that can be easily emulated by downstream libraries.
We thought about [providing a replacement](https://gist.github.com/pmeier/3dfd2e105842ad0de4505068a1a0270a), but ultimately decided against it. The major problem is BC: by keeping it, either the namespace is getting messy again after a new dtype is added or we need to somehow version the return values of the getters.
Test Plan: Imported from OSS
Reviewed By: H-Huang
Differential Revision: D30662206
Pulled By: mruberry
fbshipit-source-id: a2bdb10ab02ae665df1b5b76e8afa9af043bbf56
Summary:
Fixes https://github.com/pytorch/pytorch/issues/61767
## Changes
- [x] Add `torch.concat` alias to `torch.cat`
- [x] Add OpInfo for `cat`/`concat`
- [x] Fix `test_out` skips (Use `at::native::resize_output` or `at::native::resize_output_check`)
  - [x] `cat`/`concat`
  - [x] `stack`
  - [x] `hstack`
  - [x] `dstack`
  - [x] `vstack`/`row_stack`
- [x] Remove redundant tests for `cat`/`stack`
~I've not added `cat`/`concat` to OpInfo `op_db` yet, since cat is a little more tricky than other OpInfos (should have a lot of tests) and currently there are no OpInfos for that. I can try to add that in a subsequent PR or maybe here itself, whatever is suggested.~
**Edit**: cat/concat OpInfo has been added.
**Note**: I've added the named tensor support for `concat` alias as well, maybe that's out of spec in `array-api` but it is still useful for consistency in PyTorch.
Thanks to krshrimali for guidance on my first PR :))
cc mruberry rgommers pmeier asmeurer leofang AnirudhDagar asi1024 emcastillo kmaehashi heitorschueroff krshrimali
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62560
Reviewed By: saketh-are
Differential Revision: D30762069
Pulled By: mruberry
fbshipit-source-id: 6985159d1d9756238890488a0ab3ae7699d94337
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64261
Note that this does not preserve byte-for-byte compatibility with
existing names.
Test Plan:
* Rely on CI to catch gross errors.
* Merge after release cut to catch subtle issues.
Reviewed By: albanD
Differential Revision: D30700647
Pulled By: dagitses
fbshipit-source-id: 7b02f34b8fae3041240cc78fbc6bcae498c3acd4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63400
This is the first step to break up test_autograd.py for #63205.
Test Plan: Imported from OSS
Reviewed By: albanD
Differential Revision: D30541499
Pulled By: dagitses
fbshipit-source-id: 8d9d32007938b9eade0e88f95a6a3190e7e2ef01
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63619
Adds a RECORD_FUNCTION with the function that is being evaluated as part
of backwards execution. This has been useful in picking up some operations
in the backwards pass that otherwise would not show up, for example custom cpp
functions that use custom C++ code.
ghstack-source-id: 137041723
Test Plan:
CI
benchmark:
buck run mode/opt //scripts/rvarm1/ddp:bench
Reviewed By: albanD
Differential Revision: D30439492
fbshipit-source-id: 955917770cdf2a2edb0303223ace710b668ba388
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63324
Fix for https://www.internalfb.com/tasks/?t=98258963
`catch_warnings` seems to only trigger once in certain cases where it
should trigger twice.
This test is only meant to test whether hooks are triggered / not triggered,
so changing it to self.assertGreater is ok.
Test Plan: Imported from OSS
Reviewed By: albanD
Differential Revision: D30340833
Pulled By: Varal7
fbshipit-source-id: 1bfb9437befe9e8ab8f95efe5f513337fa9bdc5c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62909
This PR makes saved tensors default hooks thread local.
This allows using default hooks in a multithreaded context.
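As a loose analogy in plain Python, `threading.local` gives each thread its own slot for hooks. The helper names `set_default_hooks` and `get_default_hooks` are hypothetical, and this toy says nothing about how the C++ thread-local state is propagated to other threads.

```python
import threading

_tls = threading.local()  # hypothetical per-thread storage for default hooks

def set_default_hooks(pack, unpack):
    """Store the hooks for the calling thread only."""
    _tls.hooks = (pack, unpack)

def get_default_hooks():
    """Return the calling thread's hooks, or None if it never set any."""
    return getattr(_tls, "hooks", None)

seen_in_worker = []

def worker():
    # A new Python thread has an empty threading.local namespace.
    seen_in_worker.append(get_default_hooks())
    set_default_hooks(str, int)
    seen_in_worker.append(get_default_hooks())

set_default_hooks(repr, str)
t = threading.Thread(target=worker)
t.start()
t.join()

# The worker's hooks did not overwrite the main thread's hooks.
main_hooks = get_default_hooks()
```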
Test Plan: Imported from OSS
Reviewed By: albanD
Differential Revision: D30165416
Pulled By: Varal7
fbshipit-source-id: 10a7d580661d3d94bdaf398c4e076b7bea11c16b
Summary:
When using saved tensors hooks (especially default hooks),
if the user defines a `pack_hook` that modifies its input,
it can cause some surprising behavior.
The goal of this PR is to prevent future user headache by catching
inplace modifications of the input of `pack_hook` and raising an error if
applicable.
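One way such a check could work is to compare a version counter before and after the hook runs. The sketch below uses a toy `ToyTensor` class and a hypothetical `checked_pack` helper; whether the actual implementation uses this exact mechanism is an assumption.

```python
class ToyTensor:
    """Hypothetical tensor stand-in with a version counter bumped on in-place ops."""

    def __init__(self, data):
        self.data = list(data)
        self.version = 0

    def add_(self, value):
        # In-place update: mutate the data and bump the version.
        self.data = [x + value for x in self.data]
        self.version += 1
        return self

def checked_pack(pack_hook, tensor):
    """Run pack_hook and raise if it modified its input in place."""
    version_before = tensor.version
    packed = pack_hook(tensor)
    if tensor.version != version_before:
        raise RuntimeError("pack_hook must not modify its input in place")
    return packed

# A hook that only reads its input is accepted.
ok = checked_pack(lambda t: list(t.data), ToyTensor([1, 2]))

# A hook that mutates its input is rejected.
try:
    checked_pack(lambda t: t.add_(1), ToyTensor([1, 2]))
    raised = False
except RuntimeError:
    raised = True
```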
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62717
Reviewed By: albanD
Differential Revision: D30255243
Pulled By: Varal7
fbshipit-source-id: 8d73f1e1b50b697a59a2849b5e21cf0aa7493b76
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61931
This PR consolidates the profiling code around a new C++ implementation
(profiler_kineto.h/cpp) and uses it unconditionally from
torch.autograd.profiler/torch.profiler:
1. Always use profiler_kineto.h/cpp as the C++ implementation
2. Simplify profiler.py to remove unneeded parts depending on legacy
impl
3. Move some of the legacy logic into profiler_legacy.py (to be fully
deleted later)
Test Plan:
USE_KINETO=1 USE_CUDA=1 USE_MKLDNN=1 BLAS=MKL BUILD_BINARY=1 python setup.py develop install --cmake
python test/test_profiler.py -v
USE_KINETO=0 USE_CUDA=1 USE_MKLDNN=1 BLAS=MKL BUILD_BINARY=1 python setup.py develop install --cmake
python test/test_profiler.py -v
Imported from OSS
Reviewed By: gdankel
Differential Revision: D29801599
fbshipit-source-id: 9794d29f2af38dddbcd90dbce4481fc8575fa29e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61928
Fix #57100.
Creates a function `torch.autograd.graph.set_save_on_cpu_hooks()` which can be used to register default hooks under which all tensors saved during the forward pass are actually copied* to cpu, then copied back to the appropriate device for the backward pass.
*If the tensor was already on cpu, the entire operation is a no op.
If the tensor is on GPU, we copy the tensor to `pin_memory` during packing so that the unpacking can be done asynchronously.
See [benchmark](https://github.com/pytorch/pytorch/pull/61928#issuecomment-885089279) and [note about training large models](https://github.com/pytorch/pytorch/pull/61928#issuecomment-887009448)
Test Plan: Imported from OSS
Reviewed By: soulitzer
Differential Revision: D29848526
Pulled By: Varal7
fbshipit-source-id: 3d289cddd4fa377bd4884ba0d569fa47c777d9e5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62563
Expose a pair of functions to Python users: torch.autograd.graph.set_saved_tensors_default_hooks(pack, unpack) and torch.autograd.graph.reset_saved_tensors_default_hooks().
These functions control the hooks applied to saved tensors: all tensors saved in that context will be packed using the pack function, then unpacked accordingly when needed.
Currently, this works by simply calling register_hooks (cf #60975) directly at the end of the constructor of a SavedVariable. This could be optimized further by not performing the copy before registering default hooks, but this would require a small refactor. Edit: the refactor is done in #61927.
A current limitation is that if users create tensors in this context, they will not be able to register additional hooks on the saved tensor.
For instance, to perform something like #28997, one could define a pack function that saves to disk whenever the tensor size is too big and returns a filename, then unpack simply reads the content of the file and outputs a tensor, e.g.:
```
def pack(x):
    name = os.path.join(tmp_dir, str(uuid.uuid4()))
    torch.save(x, name)
    return name

def unpack(name):
    return torch.load(name)
```
Relanding previous PR: https://github.com/pytorch/pytorch/pull/61834
Original PR led to timeout error in: https://www.internalfb.com/mast/job/yuguo-release_canary_offline_training-inlinecvrp_a-canary_offline_train_28a7ecfc
Now passing: https://www.internalfb.com/mast/job/quach-release_canary_offline_training-inlinecvrp_a-canary_offline_train_9bb57e98
The difference with the new version is we don't need to acquire the GIL when calling `PyDefaultSavedVariableHooks::get_hooks`.
Test Plan: Imported from OSS
Reviewed By: iramazanli
Differential Revision: D30045405
Pulled By: Varal7
fbshipit-source-id: 7f6c07af3a56fe8835d5edcc815c15ea4fb4e332
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61834
Expose a pair of functions to Python users: torch.autograd.graph.set_saved_tensors_default_hooks(pack, unpack) and torch.autograd.graph.reset_saved_tensors_default_hooks().
These functions control the hooks applied to saved tensors: all tensors saved in that context will be packed using the pack function, then unpacked accordingly when needed.
Currently, this works by simply calling register_hooks (cf #60975) directly at the end of the constructor of a SavedVariable. This could be optimized further by not performing the copy before registering default hooks, but this would require a small refactor. Edit: the refactor is done in #61927.
A current limitation is that if users create tensors in this context, they will not be able to register additional hooks on the saved tensor.
For instance, to perform something like #28997, one could define a pack function that saves to disk whenever the tensor size is too big and returns a filename, then unpack simply reads the content of the file and outputs a tensor, e.g.:
```
def pack(x):
    name = os.path.join(tmp_dir, str(uuid.uuid4()))
    torch.save(x, name)
    return name

def unpack(name):
    return torch.load(name)
```
Test Plan: Imported from OSS
Reviewed By: zou3519
Differential Revision: D29792193
Pulled By: Varal7
fbshipit-source-id: 33e931230ef59faa3ec8b5d11ef7c05539bce77c
Summary:
This PR un-reverts https://github.com/pytorch/pytorch/issues/61475 + fixes compilation with MSVC, that does not recognize alternative operator spellings (i.e. using `or` instead of `||` )
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61937
Reviewed By: albanD
Differential Revision: D29805941
Pulled By: malfet
fbshipit-source-id: 01e5963c6717c1b44b260300d87ba0bf57f26ce9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60021
Dropping the imaginary component is expected and gives the correct gradient
formula, so silencing the warning is appropriate.
Test Plan: Imported from OSS
Reviewed By: ngimel
Differential Revision: D29589371
Pulled By: mruberry
fbshipit-source-id: 73e1511cae69207dc9abe576e2769ee1d03f1bbd
Summary:
Partially addresses https://github.com/pytorch/pytorch/issues/49825 by improving the testing
- Rename some of the old tests that had "inplace_view" in their names, but actually mean "inplace_[update_]on_view" so there is no confusion with the naming
- Adds some tests in test_view_ops that verify basic behavior
- Add tests that creation meta is properly handled for no-grad, multi-output, and custom function cases
- Add test that verifies that in the cross dtype view case, the inplace views won't be accounted in the backward graph on rebase as mentioned in the issue.
- Update inference mode tests to also check in-place
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59891
Reviewed By: albanD
Differential Revision: D29272546
Pulled By: soulitzer
fbshipit-source-id: b12acf5f0e3f788167ebe268423cdb58481b56f6
Summary:
The grad() function needs to return the updated values, and hence
needs non-empty inputs to populate.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52016
Test Plan:
Passes Python and C++ unit tests, and added new tests to catch this behavior.
Fixes https://github.com/pytorch/pytorch/issues/47061
Reviewed By: albanD
Differential Revision: D26406444
Pulled By: dagitses
fbshipit-source-id: 023aeca9a40cd765c5bad6a1a2f8767a33b75a1a
Summary:
We only set the value and not the actual VC.
This means that in the context of double backward, if that saved tensor is saved again and the original Tensor is modified inplace, we would not detect it.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60195
Reviewed By: Varal7
Differential Revision: D29208766
Pulled By: albanD
fbshipit-source-id: 81175f8e3f111f89524f8e46f47577b2ea4fc945
Summary:
Fixes https://github.com/pytorch/pytorch/issues/4661
- Add warnings in engine's `execute` function so it can be triggered through both cpp and python codepaths
- Adds an RAII guard version of `c10::Warning::set_warnAlways` and replaces all prior usages of the set_warnAlways with the new one
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59412
Reviewed By: jbschlosser
Differential Revision: D28969294
Pulled By: soulitzer
fbshipit-source-id: b03369c926a3be18ce1cf363b39edd82a14245f0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59483
... for functions that are not implemented
Test Plan: Imported from OSS
Reviewed By: albanD
Differential Revision: D28933806
fbshipit-source-id: dadae1af6609f15419cf0f47a98361dc87dff849
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54987
Based on the prototypes by ezyang (https://github.com/pytorch/pytorch/pull/44799) and bdhirsh (https://github.com/pytorch/pytorch/pull/43702).
Here's a summary of the changes in this PR:
This PR adds a new dispatch key called Conjugate. This enables us to make conjugate operation a view and leverage the specialized library functions that fast path with the hermitian operation (conj + transpose).
1. Conjugate operation will now return a view with conj bit (1) for complex tensors and returns self for non-complex tensors as before. This also means `torch.view_as_real` will no longer be a view on conjugated complex tensors and is hence disabled. To fill the gap, we have added `torch.view_as_real_physical` which would return the real tensor agnostic of the conjugate bit on the input complex tensor. The information about conjugation on the old tensor can be obtained by calling `.is_conj()` on the new tensor.
2. NEW API:
a) `.conj()` -- now returning a view.
b) `.conj_physical()` -- does the physical conjugate operation. If the conj bit for input was set, you'd get `self.clone()`, else you'll get a new tensor with conjugated value in its memory.
c) `.conj_physical_()`, and `out=` variant
d) `.resolve_conj()` -- materializes the conjugation. Returns self if the conj bit is unset, else returns a new tensor with conjugated values and conj bit set to 0.
e) `.resolve_conj_()` in-place version of (d)
f) `view_as_real_physical` -- as described in (1), it's functionally same as `view_as_real`, just that it doesn't error out on conjugated tensors.
g) `view_as_real` -- existing function, but now errors out on conjugated tensors.
3. Conjugate Fallback
a) Vast majority of PyTorch functions would currently use this fallback when they are called on a conjugated tensor.
b) This fallback is well equipped to handle the following cases:
- functional operation e.g., `torch.sin(input)`
- Mutable inputs and in-place operations e.g., `tensor.add_(2)`
- out-of-place operation e.g., `torch.sin(input, out=out)`
- Tensorlist input args
- NOTE: Meta tensors don't work with conjugate fallback.
4. Autograd
a) `resolve_conj()` is an identity function w.r.t. autograd
b) Everything else works as expected.
5. Testing:
a) All method_tests run with conjugate view tensors.
b) OpInfo tests that run with conjugate views
- test_variant_consistency_eager/jit
- gradcheck, gradgradcheck
- test_conj_views (that only run for `torch.cfloat` dtype)
NOTE: functions like `empty_like`, `zeros_like`, `randn_like`, `clone` don't propagate the conjugate bit.
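A minimal sketch of the lazy versus physical conjugation described above, assuming a PyTorch build that includes this change. It only uses `.conj()`, `.is_conj()`, `.resolve_conj()` and `.conj_physical()` from the list in (2); the tensor values are arbitrary.

```python
import torch

x = torch.tensor([1 + 2j, 3 - 4j])

# Lazy conjugation: a view of x with the conj bit set.
y = x.conj()

# Materialize the conjugation into a tensor whose conj bit is unset.
z = y.resolve_conj()

# Physical conjugation: the conjugated values are written to memory directly.
w = x.conj_physical()
```

`z` and `w` should hold the same values; the difference is only in when the conjugation is actually computed.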
Follow up work:
1. conjugate view RFC
2. Add neg bit to re-enable view operation on conjugated tensors
3. Update linalg functions to call into specialized functions that fast path with the hermitian operation.
Test Plan: Imported from OSS
Reviewed By: VitalyFedyunin
Differential Revision: D28227315
Pulled By: anjali411
fbshipit-source-id: acab9402b9d6a970c6d512809b627a290c8def5f
Summary:
Adds `is_inference` as a native function w/ manual cpp bindings.
Also changes instances of `is_inference_tensor` to `is_inference` to be consistent with other properties such as `is_complex`.
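For illustration, a small sketch of how `is_inference` might be queried from Python. It assumes a build that also ships the `torch.inference_mode` context manager, which is introduced by a separate change.

```python
import torch

# Tensors allocated inside inference mode are inference tensors.
with torch.inference_mode():
    inside = torch.ones(3)

# Tensors allocated outside are ordinary tensors.
outside = torch.ones(3)

inside_flag = inside.is_inference()
outside_flag = outside.is_inference()
```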
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58729
Reviewed By: mruberry
Differential Revision: D28874507
Pulled By: soulitzer
fbshipit-source-id: 0fa6bcdc72a4ae444705e2e0f3c416c1b28dadc7
Summary:
There are two main changes here:
- THPVariable will now visit its grad_fn if there are no other references to the C++ Tensor and no other references to the grad_fn. The critical observation compared to the existing comment (thanks Ed!) is that if we also check that the C++ Tensor object is not referenced anywhere else, we can be sure that no one can change the grad_fn refcount between the traverse and the clear.
- THPVariable doesn't need a special clear for this new case, as we're the only owner of the C++ Tensor and so cdata.reset() will necessarily free the Tensor and all its resources.
The two tests are to ensure:
- That the cycles are indeed collectible by the gc
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58271
Reviewed By: ngimel
Differential Revision: D28796461
Pulled By: albanD
fbshipit-source-id: 62c05930ddd0c48422c79b03118db41a73c1355d
Summary:
Fixes https://github.com/pytorch/pytorch/issues/57679
##### Release Notes
This is part of the end of the deprecation of inplace/view:
- `detach_` will now raise an error when invoked on any view created by `split`, `split_with_sizes`, or `chunk`. You should use the non-inplace `detach` instead.
- The error message raised when an in-place operation (other than detach) is performed on a view created by `split`, `split_with_sizes`, or `chunk` has been changed from "This view is **an** output of a function..." to "This view is **the** output of a function...".
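A short sketch of the first release note, assuming a PyTorch version where this deprecation has completed. The exact error text is not checked here, only that the in-place call fails and the out-of-place `detach` works.

```python
import torch

x = torch.ones(4, requires_grad=True)
first, second = x.split(2)  # both outputs are views of x

try:
    first.detach_()  # in-place detach on a view created by split
    error = None
except RuntimeError as err:
    error = err

# The recommended replacement: out-of-place detach.
detached = first.detach()
```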
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58285
Reviewed By: bdhirsh
Differential Revision: D28441980
Pulled By: soulitzer
fbshipit-source-id: e2301d7b8cbc3dcdd328c46f24bcb9eb7f3c0d87
Summary:
Fixes https://github.com/pytorch/pytorch/issues/56608
- Adds binding to the `c10::InferenceMode` RAII class in `torch._C._autograd.InferenceMode` through pybind. Also binds the `torch.is_inference_mode` function.
- Adds context manager `torch.inference_mode` to manage an instance of `c10::InferenceMode` (global). Implemented in `torch.autograd.grad_mode.py` to reuse the `_DecoratorContextManager` class.
- Adds some tests based on those linked in the issue + several more for just the context manager
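A minimal usage sketch of the context manager, both as a `with` block and as a decorator. It assumes the query function is exposed as `torch.is_inference_mode_enabled`, which is the name in current releases and may differ from the binding named above.

```python
import torch

x = torch.ones(2, requires_grad=True)

# As a context manager: no autograd graph is recorded inside the block.
with torch.inference_mode():
    enabled_inside = torch.is_inference_mode_enabled()
    y = x * 2

# As a decorator: the whole function body runs in inference mode.
@torch.inference_mode()
def double(t):
    return t * 2

z = double(x)
enabled_outside = torch.is_inference_mode_enabled()
```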
Issues/todos (not necessarily for this PR):
- Improve short inference mode description
- Small example
- Improved testing since there is no direct way of checking TLS/dispatch keys
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58045
Reviewed By: agolynski
Differential Revision: D28390595
Pulled By: soulitzer
fbshipit-source-id: ae98fa036c6a2cf7f56e0fd4c352ff804904752c
Summary:
Port addmm to a structured kernel
Follow ups
- migrate `mm` and `addbmm` to structured kernels
- move TORCH_CHECKS currently in `addmm_cpu_impl_` and `addmm_out_cuda_impl` to meta
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57417
Reviewed By: bdhirsh
Differential Revision: D28291001
Pulled By: walterddr
fbshipit-source-id: 4eafaa30a465e225fbb4d2a69a36f1e037df9122
Summary:
This one had a tricky usage of `torch.symeig` that had to be replaced. I tested the replacement locally though.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57732
Reviewed By: bdhirsh
Differential Revision: D28328189
Pulled By: mruberry
fbshipit-source-id: 7f000fcbf2b029beabc76e5a89ff158b47977474
Summary:
Backward methods for `torch.lu` and `torch.lu_solve` require the `torch.lu_unpack` method.
However, while `torch.lu` is a Python wrapper over a native function, so its gradient is implemented via `autograd.Function`,
`torch.lu_solve` is a native function, so it cannot access `torch.lu_unpack` as it is implemented in Python.
Hence this PR presents a native (ATen) `lu_unpack` version. It is also possible to update the gradients for `torch.lu` so that backward+JIT is supported (no JIT for `autograd.Function`) with this function.
~~The interface for this method is different from the original `torch.lu_unpack`, so it is decided to keep it hidden.~~
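As an illustration of what `lu_unpack` computes, the sketch below factors a small matrix and rebuilds it from the unpacked factors. It uses `torch.linalg.lu_factor` to produce the packed LU data, which is an assumption about the installed version and not part of this PR.

```python
import torch

A = torch.tensor([[4.0, 3.0], [6.0, 3.0]], dtype=torch.double)

# Packed LU factorization with pivots (assumed available in this version).
LU, pivots = torch.linalg.lu_factor(A)

# Unpack into the permutation matrix P and the triangular factors L and U.
P, L, U = torch.lu_unpack(LU, pivots)

reconstructed = P @ L @ U
```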
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46913
Reviewed By: albanD
Differential Revision: D28355725
Pulled By: mruberry
fbshipit-source-id: 281260f3b6e93c15b08b2ba66d5a221314b00e78
Summary:
Fixes https://github.com/pytorch/pytorch/issues/30696
### Release Notes
Instantiating a custom autograd function is now deprecated. Users should call `.apply()` on the class itself because it is a static method.
--end release notes--
- There are a couple of error messages that we can't entirely remove because accessing these attributes of the autograd function instance may segfault (due to cdata being nullptr). Also added a TORCH_CHECK for the name attribute, which previously segfaulted.
- Error message updated to convey 1) old-style functions have been deprecated 2) this access pattern was once valid
- Updates variable -> Tensor for some error messages
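A minimal sketch of the recommended pattern from the release note: define `forward` and `backward` as static methods and call `.apply()` on the class. The `Square` function is a hypothetical example, not code from this PR.

```python
import torch


class Square(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        # Save the input for use in backward.
        ctx.save_for_backward(x)
        return x * x

    @staticmethod
    def backward(ctx, grad_out):
        (x,) = ctx.saved_tensors
        # d(x^2)/dx = 2x
        return 2 * x * grad_out


x = torch.tensor(3.0, requires_grad=True)
y = Square.apply(x)  # call apply on the class, do not instantiate Square
y.backward()
```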
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57357
Reviewed By: mrshenli
Differential Revision: D28193095
Pulled By: soulitzer
fbshipit-source-id: f021b105e9a3fd4a20d6ee3dfb6a06a8c34b10ca
Summary:
This makes detach both forward and backward non-differentiable by default.
You can pass the `only_backward_mode=True` argument to make it forward differentiable but backward non-differentiable.
The important side effect of this change is that, by default, detach is not tracking any view information.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57820
Reviewed By: ezyang
Differential Revision: D28287633
Pulled By: albanD
fbshipit-source-id: bdc4726fcd05889f6ac84e5a3a3ef71b2ec41015
Summary:
This PR also removes qr and eig tests from test/test_torch.py. They were not skipped if compiled without LAPACK and they are now replaced with OpInfos.
Fixes https://github.com/pytorch/pytorch/issues/55929
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56284
Reviewed By: ejguan
Differential Revision: D27827077
Pulled By: mruberry
fbshipit-source-id: 1dceb955810a9fa34bb6baaccbaf0c8229444d3a
Summary:
Problem arises for sinc'(x) where x != 0, but x ** 2 == 0, which happens for some very small floats.
I realized that my solution from https://github.com/pytorch/pytorch/issues/56763 was incomplete when I did a quick implementation using `torch.autograd.Function` and still got a `NaN` from my derivative.
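A quick way to probe the case described above, assuming a build that contains this fix. The value `1e-30` is chosen here as an illustrative float32 input that is nonzero but whose square underflows to zero.

```python
import torch

x = torch.tensor(1e-30, dtype=torch.float32, requires_grad=True)

# x is nonzero, but x ** 2 underflows to exactly zero in float32.
x_is_nonzero = bool(x.detach() != 0)
square_is_zero = bool(x.detach() ** 2 == 0)

y = torch.sinc(x)
y.backward()

# With the fix, the gradient should be finite instead of NaN.
grad_is_finite = bool(torch.isfinite(x.grad))
```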
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56986
Reviewed By: gchanan
Differential Revision: D28093507
Pulled By: albanD
fbshipit-source-id: 2a30e1065b08c5c60de843a0778dedeb0fb295f4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54153
Currently, sparse tensors only support real floating point tensors. Complex support is added in this PR for CPU/CUDA.
- [x] add complex support (torch.cfloat and torch.cdouble) to torch.sparse_coo_tensor constructors
- [x] add complex support to coalesce function
- [x] add complex support to to_dense function
- [x] add complex support to to_sparse function
- [x] add complex support to sparse_add function
- [x] add unit tests
Note: This PR contains only complex support for the torch.sparse_coo_tensor forward function and the related ops used with this function (coalesce, to_dense, to_sparse, and sparse_add). The following PRs in the ghstack should extend complex support to other sparse operations, specifically those that rely on specific APIs for accelerated linear algebra.
Note: Before using ghstack the original PR was #50984
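A small sketch exercising the operations in the checklist with a `torch.cfloat` sparse tensor, assuming a CPU build that includes this change. The indices and values are arbitrary, and one index is repeated on purpose so that `coalesce` has something to sum.

```python
import torch

# Index (1, 0) appears twice so coalesce must add the two values.
indices = torch.tensor([[0, 1, 1],
                        [2, 0, 0]])
values = torch.tensor([1 + 2j, 3 + 4j, 5 + 6j], dtype=torch.cfloat)

s = torch.sparse_coo_tensor(indices, values, (2, 3))

dense = s.coalesce().to_dense()            # coalesce + to_dense
round_trip = dense.to_sparse().to_dense()  # to_sparse and back
doubled = (s + s).to_dense()               # sparse add
```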
Test Plan: Imported from OSS
Reviewed By: H-Huang
Differential Revision: D27765618
Pulled By: ezyang
fbshipit-source-id: a9cdd31d5c7a7dafd790f6cc148f3df26e884c89
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55692
### Release notes
get_numerical_jacobian and get_analytical_jacobian only support `grad_out=1` and `fn` no longer accepts functions that return complex output
Test Plan: Imported from OSS
Reviewed By: H-Huang
Differential Revision: D28004614
Pulled By: soulitzer
fbshipit-source-id: 9592c9c69584b4035b39be62252f138dce39d3b5
Summary:
Adding cuda synchronization when entering and exiting the profiler
context manager
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56651
Test Plan: CI
Reviewed By: gdankel
Differential Revision: D27926270
Pulled By: ilia-cher
fbshipit-source-id: 5cf30128590c1c71a865f877578975c4a6e2cb48
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55656
### For release notes
What:
- All errors that are silenced by "raise_exception=False" are now GradcheckError (which inherits from RuntimeError).
Why:
- Due to a refactor of gradcheck
Workaround:
- If you catch `RuntimeError` with `except RuntimeError`, no changes are necessary, since GradcheckError inherits from RuntimeError. However, if you explicitly check the error's type via `type(error)`, you'll need to update your code to check for `GradcheckError` instead.
Factors out all the logic handling involving `fail_test`, `raise_exception` into 1) a wrapper around gradcheck that uses try/except 2) gradcheck_helper that always raises exception.
This allows us to avoid having to write the `if not x: return False` logic that is scattered throughout gradcheck currently.
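To make the release note concrete, here is a sketch with a deliberately wrong backward so that gradcheck fails. `BadDouble` is a hypothetical function written for this example, and the import path `torch.autograd.gradcheck.GradcheckError` is assumed to match the installed version.

```python
import torch
from torch.autograd.gradcheck import GradcheckError


class BadDouble(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        return x * 2

    @staticmethod
    def backward(ctx, grad_out):
        # Deliberately wrong: the correct gradient is grad_out * 2.
        return grad_out * 3


x = torch.linspace(0.1, 1.0, 3, dtype=torch.double).requires_grad_()

# Existing `except RuntimeError` handlers keep working.
try:
    torch.autograd.gradcheck(BadDouble.apply, (x,))
    caught = None
except RuntimeError as err:
    caught = err

# With raise_exception=False the failure is reported as a return value.
result = torch.autograd.gradcheck(BadDouble.apply, (x,), raise_exception=False)
```

Code that compares `type(caught)` against `RuntimeError` would no longer match, which is the case the workaround addresses.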
Test Plan: Imported from OSS
Reviewed By: albanD
Differential Revision: D27920809
Pulled By: soulitzer
fbshipit-source-id: 253aef6d9a3b147ee37a6e37a4ce06437981929a
Summary:
Temporary fix to give people extra time to finish the deprecation.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56401
Reviewed By: xw285cornell, drdarshan
Differential Revision: D27862196
Pulled By: albanD
fbshipit-source-id: ed460267f314a136941ba550b904dee0321eb0c6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54480
This PR shouldn't really change the behavior of gradcheck for most ops. However, the changes in test_autograd allow us to run basic checks for both fast and slow (instead of previously just slow). All it should be doing is wrapping the preexisting tests we introduced in prior PRs in a function which takes `fast_mode` as a param. We then call this function twice, once with `fast_mode=True` and once with `fast_mode=False`.
Plan for rollout:
- This PR should only land the code (and runs some basic checks as described above).
- This should help us verify that a) slow is still working as expected b) basic functionality of fast works
- After we land this, but before we run the next PR in the stack, we should land https://github.com/pytorch/pytorch/pull/55182. This is to ensure that there is no gap where the slow tests aren't running.
- The next PR is responsible for enabling the fast_mode=True flag on all tests (where the function has real inputs/outputs), and selectively disabling it for the cases that fail.
- Finally in a later PR, we reenable fast-gradcheck for functions w/ complex inputs/outputs
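The "call the same check twice" pattern described above can be sketched as follows. The helper `check` and the function under test are illustrative, not the actual code in test_autograd; `fast_mode` is the gradcheck keyword introduced by this work.

```python
import torch


def fn(x):
    # A simple real-valued function with a well-behaved gradient.
    return torch.sin(x) * x


def check(fast_mode):
    # Hypothetical wrapper: same test body, parameterized by fast_mode.
    x = torch.linspace(0.1, 1.0, 4, dtype=torch.double).requires_grad_()
    return torch.autograd.gradcheck(fn, (x,), fast_mode=fast_mode)


slow_ok = check(fast_mode=False)
fast_ok = check(fast_mode=True)
```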
TODOs and open questions (not necessarily blocking this PR):
- ~How do we think about atol/rtol~ (scale atol, keep rtol as-is)
- ~reenable fast-gradcheck for complex numbers~
- ~when inputs are uncoalesced we don't truly test this case because we coalesce the inputs before calling function. Revisit this when https://github.com/pytorch/pytorch/pull/52874/files is landed~
### Developer Experience
Sample output when jacobian mismatch occurs:
```
Traceback (most recent call last):
File "/home/s/local/pytorch4/test/test_autograd.py", line 4220, in test_gradcheck_jacobian_mismatch
check(fast_mode=True)
File "/home/s/local/pytorch4/test/test_autograd.py", line 4196, in check
gradcheck(fn, (x,), fast_mode=fast_mode)
File "/home/s/local/pytorch4/torch/testing/_internal/common_utils.py", line 2067, in gradcheck
return torch.autograd.gradcheck(fn, inputs, **kwargs)
File "/home/s/local/pytorch4/torch/autograd/gradcheck.py", line 1020, in gradcheck
if not fast_gradcheck(fail_test, seeded_func, func_out, tupled_inputs, outputs, eps, rtol,
File "/home/s/local/pytorch4/torch/autograd/gradcheck.py", line 915, in fast_gradcheck
return fail_test(get_notallclose_msg(a, n, i, j, prefix) + jacobians_str)
File "/home/s/local/pytorch4/torch/autograd/gradcheck.py", line 996, in fail_test
raise RuntimeError(msg)
RuntimeError: Jacobian mismatch for output 0 with respect to input 0,
numerical:tensor(0.9195)
analytical:tensor(0.9389)
The above quantities relating the numerical and analytical jacobians are computed
in fast mode. See: https://github.com/pytorch/pytorch/issues/53876 for more background
about fast mode. Below, we recompute numerical and analytical jacobians in slow mode:
Numerical:
tensor([[1.0000, 0.0000, 0.0000, 0.0000],
[0.0000, 1.0000, 0.0000, 0.0000],
[0.0000, 0.0000, 1.0000, 0.0000],
[0.0000, 0.0000, 0.0000, 1.0000]])
Analytical:
tensor([[1.0100, 0.0100, 0.0100, 0.0100],
[0.0100, 1.0100, 0.0100, 0.0100],
[0.0100, 0.0100, 1.0100, 0.0100],
[0.0100, 0.0100, 0.0100, 1.0100]])
The max per-element difference (slow mode) is: 0.010000000000054632.
```
Additionally, if the per-element difference is small i.e., `allclose(analytical_slow, numerical_slow, rtol, atol) is True` we follow up with this message:
```
Fast gradcheck failed but element-wise differences are small. This means that the
test might've passed in slow_mode!
If you are adding a new operator, please file an issue and then use one of the
workarounds. The workaround depends on how your test invokes gradcheck/gradgradcheck.
If the test
- manually invokes gradcheck/gradgradcheck, then call gradcheck/gradgradcheck
with `fast_mode=False` as a keyword argument.
- is OpInfo-based (e.g., in test_ops.py), then modify the OpInfo for the test
to have `gradcheck_fast_mode=False`
- is a Module test (e.g., in common_nn.py), then modify the corresponding
module_test entry to have `gradcheck_fast_mode=False`
```
Test Plan: Imported from OSS
Reviewed By: walterddr, ejguan
Differential Revision: D27825160
Pulled By: soulitzer
fbshipit-source-id: 1fe60569d8b697c213b0d262a832622a4e9cf0c7
Summary:
Reland of https://github.com/pytorch/pytorch/pull/49098
See original issue for details.
The only difference from the previous PR is the fix to the _embedding_bag_dense_backward formula so that it stops declaring a backward formula for an argument that does not exist.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56083
Reviewed By: samestep
Differential Revision: D27778221
Pulled By: albanD
fbshipit-source-id: 159ef91ca931ef2ccfbc3d1c46c7880c32919dc9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54378
### For release notes
`torch.autograd.gradcheck.get_numerical_jacobian` (not part of the public api) is being deprecated.
In the future, user code relying on this function will break because, among other changes, `get_numerical_jacobian` now returns `List[Tuple[torch.Tensor]]` instead of `List[torch.Tensor]`.
(more details if necessary)
For a `fn` that takes in M inputs and returns N outputs, we now return a list of M N-tuples of jacobians, where `output[i][j]` represents the numerical jacobian w.r.t. the ith input and the jth output. Previously `get_numerical_jacobian` returned a list of tensors where each tensor represented the jacobian w.r.t. each of the M inputs and a specific output. Finally, the function passed in as the parameter `fn` should expect to handle individual parameters, where previously `fn` was required to expect its parameters wrapped in a tuple.
--- end --
This PR addresses the comment here https://github.com/pytorch/pytorch/pull/53857#discussion_r595429639, to reduce the run-time of old gradcheck's get numerical jacobian by a factor of num_outputs. However, because very few ops actually return multiple outputs, there is not too much real speed up here.
The main benefit of doing this change as part of the refactor is that it helps us isolate the possible bugs that are specific to switching `get numerical jacobian` to run in a per output way vs all outputs at once. Much of the logic implemented here will be the same for the fast gradcheck case, so knowing for certain that everything should pass after this stage will make the next step much simpler.
The get_numerical_jacobian api is also being used in common_nn. So we update the callsite there as well.
Test Plan: Imported from OSS
Reviewed By: jbschlosser
Differential Revision: D27728720
Pulled By: soulitzer
fbshipit-source-id: ee0f90b4f26ddc5fdbe949c4965eaa91c9ed0bb8
Summary:
There are a few autograd tests checking for tensors leaked by reference cycles. This changes them to use `_WeakTensorRef` over `weakref`. `_WeakTensorRef`, added in https://github.com/pytorch/pytorch/issues/52874, accesses the C++ level `TensorImpl` reference count, compared to `weakref` which accesses python refcounts and so can only tell if the python wrapper object gets deallocated. Not only is this less code, it's also more accurately detecting that the Tensor itself is deallocated.
I didn't touch `weakref` usage in [test_anomaly_assign_parent_cleanup](fc349cbcde/test/test_autograd.py (L3733)) and [test_nested_anomaly_printstack_cleanup](fc349cbcde/test/test_autograd.py (L3772)) because these are intentionally testing for python object cleanup.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55726
Reviewed By: ngimel
Differential Revision: D27718526
Pulled By: albanD
fbshipit-source-id: 37a4914360e35dd4ae8db06b29525cebec4d4b84
Summary:
Fixes https://github.com/pytorch/pytorch/issues/53651
I did not put much effort in improving the docs, as I will go over all these docs in future PRs
cc anjali411
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55085
Reviewed By: nikithamalgifb
Differential Revision: D27493604
Pulled By: anjali411
fbshipit-source-id: 413363013e188bc869c404b2d54ce1f87eef4425