The rationale for this is that functorch doesn't currently work with saved
variable hooks or checkpointing, so we need some way to disable them.
Concretely:
- there's a context manager that does the disabling
- this feature is disabled on a thread-local basis
- one can set an error message or use the default error message that
says the feature has been disabled
Since it is thread-local, I needed to update ATen/ThreadLocalState. To
make things nicer, this PR refactors all the "saved tensors hooks"
related TLS state into a single struct.
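A minimal usage sketch, assuming the context manager is exposed as `torch.autograd.graph.disable_saved_tensors_hooks` (the exact name and location are an assumption based on this description):
```python
import torch

# Assumed API: a thread-local context manager that disables the saved tensors
# hooks feature and reports the given error message when the feature is used.
with torch.autograd.graph.disable_saved_tensors_hooks(
        "saved tensors hooks are disabled in this region"):
    try:
        # Registering hooks while disabled should raise with the message above.
        with torch.autograd.graph.saved_tensors_hooks(lambda x: x, lambda x: x):
            pass
    except RuntimeError as e:
        print(e)
```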
Test Plan:
- new test
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85553
Approved by: https://github.com/soulitzer
Addresses: https://github.com/pytorch/pytorch/issues/83617
This PR adds a way to query the TLS graph task's exec_info, which is a map from Node to a bool indicating whether that node will be executed in the current backward pass (as determined by the inputs= argument to .grad or .backward).
- this works with both custom Function nodes and normal codegened nodes
- to be able to verify whether the pyobject passed is an actual node, we now store pointers to PyTypeObjects into a set on registration.
- error out when .backward was called without inputs=, to avoid silently returning True
Alternatives:
- not sure if it is possible to bind to Python from a raw pointer to Node. At least we wouldn't be able to use existing logic, and the Python object should only hold a weak reference to the Node.
- other solutions to the motivating issue seem to require more extensive modification to the engine
See the issue linked for an example of usage
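A rough usage sketch of the query from inside a backward hook; the binding name `torch._C._will_engine_execute_node` is assumed here and should be treated as hypothetical:
```python
import torch

a = torch.randn(2, 2, requires_grad=True)
b = torch.randn(2, 2, requires_grad=True)
c = a * b
out = c.sum()

def hook(grad):
    # Queried during backward: c.grad_fn (MulBackward0) must run to reach `a`,
    # so this should print True given inputs=[a] below. Querying the
    # AccumulateGrad node for `b` would print False instead.
    print(torch._C._will_engine_execute_node(c.grad_fn))  # assumed binding name

out.register_hook(hook)
out.backward(inputs=[a])  # inputs= is required, otherwise the query errors out
```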
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84773
Approved by: https://github.com/albanD
Add unit tests and docstrings corresponding to PR https://github.com/pytorch/pytorch/pull/63289
UT:
1. `test_profiler_emit_itt` in `test/test_autograd.py`. This test is merely intended to catch if emit_itt breaks on construction.
2. Test `torch.profiler.itt` functions in `test/test_itt.py`
3. Only testing that emit_itt runs when `record_shapes` option is enabled in `test/test_profiler.py`.
Docstring:
1. add ITT-related info to `docs/source/bottleneck.rst`
2. add `torch.profiler.itt` functions to `docs/source/profiler.rst`
3. add docstrings to the `torch.profiler.itt` functions in `torch/profiler/itt.py`
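A minimal sketch of the helpers being tested and documented, assuming `range_push`/`range_pop`/`mark` are the exposed `torch.profiler.itt` functions and that the build has ITT support:
```python
import torch

# emit_itt mirrors the other emit_* profiler context managers; record_shapes
# is the option exercised by the test mentioned above.
with torch.autograd.profiler.emit_itt(record_shapes=True):
    torch.profiler.itt.range_push("my_region")
    torch.profiler.itt.mark("inside my_region")
    torch.ones(8).sum()
    torch.profiler.itt.range_pop()
```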
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84848
Approved by: https://github.com/malfet
Fix use-dict-literal pylint suggestions by changing `dict()` to `{}`. This PR should do the change for every Python file except test/jit/test_list_dict.py, where I think the intent is to test the constructor.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83718
Approved by: https://github.com/albanD
Make it so that it is valid to set metadata after detach calls, like `x.detach().resize_(...)`.
This technically lifts some restrictions around `.data`: you can now call `x.data.resize_(...)`, which directly resizes `x` instead of erroring.
My understanding: Before the tensor-variable merge, when `x` and `x.data` were really different tensors, you could resize `x.data` independently of `x`, and during the merge, this error was added to avoid silent confusing behavior changes.
It was agreed that this error has been around long enough (several years) that it's acceptable to drop. cc @albanD @ezyang.
(Ed already had a prototype PR [here](https://github.com/pytorch/pytorch/pull/83545) - I ended up making one to try to slog through test failures).
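A minimal sketch of the pattern this PR allows (both calls previously raised an error):
```python
import torch

x = torch.zeros(2, 3, requires_grad=True)
x.detach().resize_(3, 2)  # setting metadata after a detach() call is now allowed
x.data.resize_(6)         # per this PR, this now resizes x instead of erroring
```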
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83590
Approved by: https://github.com/ezyang
Per offline discussion, this will be updated to use expand once expand semantics for nested tensors have been fleshed out.
Next steps are to add support for the other forward sum features mentioned in #82387 and to update the backward accordingly.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82625
Approved by: https://github.com/albanD
`derivatives.yaml` can now take a `dispatch` entry which registers per-autograd dispatch key derivatives such as
```
- name: foo(Tensor self, Tensor y) -> Tensor
  dispatch:
    Default:
      self: grad
      y: grad.expand(y.sizes())
    AutogradNestedTensor:
      self: grad
      y: NestedTensor_foo_backward(grad, y)
  output_differentiability: [True]
```
However, the old schema without a `dispatch` entry is still supported.
Would greatly appreciate feedback on *how to improve the testing strategy* of this PR. Currently I have registered an aten test op in TestOps.cpp with dummy gradients in derivatives.yaml, and added some tests in test_autograd.py:TestAutogradMultipleDispatch, but I am not sure whether these are sufficiently rigorous.
Additionally, this PR also makes the assumption that sets like [VIEW_FUNCTIONS](ff5399e528/tools/autograd/gen_inplace_or_view_type.py (L60)) are per-native-function and not per-native-function-and-dispatch-key. I'm not sure whether this is necessarily the case: *would there ever be a situation where, e.g., a nested_tensor op is a view op but the aten function is not, or vice versa?*
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82801
Approved by: https://github.com/bhosmer, https://github.com/albanD
### Description
cudaProfilerStart and cudaProfilerStop are deprecated but exposed by torch.cuda.cudart(). HIP has corresponding functions stubbed out, hipProfilerStart and hipProfilerStop, but they return hipErrorNotSupported. Profiling in HIP is supported, but not via these deprecated APIs.
See https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__PROFILER__DEPRECATED.html.
These functions are indirectly used by one or more unit tests that would otherwise pass if the non-functional HIP APIs were replaced with a dummy function.
### Testing
Unskipped a related unit test, run by ciflow/trunk.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82778
Approved by: https://github.com/ezyang
Towards fixing https://github.com/pytorch/pytorch/issues/82482
This PR fixes two things:
## 1) memory leak
The .detach() call prevents a true memory leak in some cases where the user function uses multiple ops in a row that save their inputs. The following chain of objects keeps each other alive:
- the `storage` object
- a recomputed Tensor y
- y's grad_fn FooBackward (in c++)
- FooBackward's SavedVariables (in c++)
- SavedVariable Hook
- the `inner_pack` function, which captures `storage` (closing the cycle)
Since part of this cycle is in c++, the python gc is not able to break it.
Should THPCppFunction_traverse actually visit its SavedVariables, which in turn should visit their hooks? I think the answer is yes, but I haven't dived into which python object is traversing what: if there is non-unique ownership of the c++ object, it makes the traversal a lot trickier. @ezyang do you think we should dive into this more?
In this case, this can be easily solved anyway by storing `y.detach()` in the `storage` object, as we don't care about the temporary backward graph that gets created during the second forward call.
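A simplified illustration of that fix; the names here are illustrative, not the actual `torch.utils.checkpoint` internals:
```python
# Store a detached copy in the recomputation storage so the temporary backward
# graph built during the second forward (and its SavedVariable hooks capturing
# `storage`) cannot form an uncollectible Python/C++ reference cycle.
storage = {}

def inner_pack(y):
    handle = id(y)
    storage[handle] = y.detach()  # drop y's grad_fn; only the values are needed
    return handle

def inner_unpack(handle):
    return storage[handle]
```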
## 2) Lifetime of the recomputed buffers
The new storage system is now such that the lifetime of the recomputed buffer is directly linked to the SavedVariable c++ object, meaning that this buffer will get deleted iff the SavedVariable is cleared.
This means that we now get the exact same behavior as the version without the saved variable hook where Tensors are saved directly on the SavedVariable object.
This is great as this solves all the cases where the non-checkpoint version used to work but the checkpoint version does not (even double access or retain_graph=True).
The one drawback of this approach, though, is that the buffers do NOT get cleared when the user passes in `retain_graph=True`! The next backward won't even re-run the forward as it already has all the buffers available. Is this a problem that you think we would need to find a solution for, @rohan-varma, or is it niche enough that we don't care for now?
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82776
Approved by: https://github.com/ezyang, https://github.com/rohan-varma
I don't think there's a way to avoid functions returning undefined tensors as outputs, so codegen will have to detect them before calling _set_fw_grad. Alternatively, we can just make calling _set_fw_grad with undefined self a no-op, but I'm leaning toward keeping _set_fw_grad stricter in case it is called in other areas.
Fixes https://github.com/pytorch/pytorch/issues/81111
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81114
Approved by: https://github.com/albanD
See this doc: https://docs.google.com/document/d/1KiRdnoj6B4cI3yl017hTbCqcOGO1gWIpUf20sldipHM/edit#
Two issues are fixed: (1) regarding hooks in general and (2) regarding retains_grad hooks. Python hooks, which rely on a different mechanism, are not discussed here:
- Hooks in cpp in general
- (fixed) new hooks registered to a newer version of the tensor no longer get applied to the grad_fn
associated with the older version of the tensor from when the first hook was ever registered
- (unchanged) hooks registered to the older version of the tensor remain active on that older version's grad_fn
- Retains grad hooks
- (fixed) retains_grad hooks now get moved to the latest grad_fn. NB: to the user, retains_grad is not considered a hook
or expected to behave like one; hooks are considered properties of the grad_fn, whereas retains_grad-ness
is a property of the tensor.
- (not in this PR) Python hooks
- (will fix) same issue as hooks in cpp where new hooks are being applied to grad_fn associated
with the older version of the tensor
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79996
Approved by: https://github.com/albanD
https://github.com/pytorch/pytorch/pull/63619 added a RECORD_FUNCTION guard to make calls to `Engine::evaluate_function` visible regardless of the underlying op. While useful, this creates a call that looks like a forward call and somewhat complicates stitching forward and backward ops together. I don't want to add complexity (and therefore work) on the hot path; instead it's fairly straightforward to stitch things back together in post. This PR simply propagates the sequence number and forward tid info up to the `evaluate_function` event.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77696
Differential Revision: [D36302562](https://our.internmc.facebook.com/intern/diff/D36302562/)
Approved by: https://github.com/aaronenyeshi
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76253
We're observing a large QPS regression on the original PR https://github.com/pytorch/pytorch/pull/72302. For the training job we had, it regressed from 720k QPS to 450k QPS (see the test plan in FB internal). We suspect this is because the api was changed from `_record_function_enter` to `_record_function_enter_new`, and we're running experiments to confirm that. Will add more details when the runs in the test plan have finished. For now, it's better to revert the diff to unblock internal use cases, and we can think about how to reland this diff later.
Original commit changeset: dc9939f1fa6d
Original Phabricator Diff: D35257354
Test Plan:
on trunk: f338665947
with this diff: f338502850
Reviewed By: malfet, robieta
Differential Revision: D35853300
fbshipit-source-id: dd38042aeacb848f66756491a4c849c7c652a0e1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71708
In Python 3.2, a number of asserts were deprecated.
In Python 3.11, these asserts are deleted completely. The files in this change still use the deprecated asserts.
Switch over to the supported syntax for 3.2 onwards.
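An example of the kind of substitution involved, assuming the deprecated asserts in question are the old unittest aliases such as `assertEquals`:
```python
import unittest

class ExampleTest(unittest.TestCase):
    def test_sum(self):
        # Before: self.assertEquals(1 + 1, 2)  # deprecated alias
        self.assertEqual(1 + 1, 2)             # supported spelling since 3.2

if __name__ == "__main__":
    unittest.main()
```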
Test Plan: Tested on the internal test suite runner.
Reviewed By: ajtulloch
Differential Revision: D33503694
fbshipit-source-id: a150f296033260acf8365d77b837ce0679f57361
(cherry picked from commit abf60ed97409265222915d8265aaabedd625fd93)
Summary:
Description of the new behavior is in PythonFallbackKernel.cpp.
The updated test makes sure that we only call alias on the first Tensor.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73925
Reviewed By: samdow
Differential Revision: D34862940
Pulled By: albanD
fbshipit-source-id: 4d020e41c8bb8b10262dcafd524e84a5ad4d7af0
(cherry picked from commit 0aa6b56dbd3dcee830453fb02cd6c83ab7a8be06)
Summary:
Minimal example that deadlocks before but not after:
```python
import torch
from torch.autograd import Function

class Foo(Function):
    @staticmethod
    def forward(ctx, x):
        return x.clone()

    @staticmethod
    def backward(ctx, gO):
        return gO.clone()

def get_out():
    inp = torch.rand(2, requires_grad=True)

    # The python function is first so that it runs
    # last in the backward pass
    right = Foo.apply(inp)

    # An op that creates new memory
    left1 = inp.clone()
    # An op that saves its input
    left2 = left1 ** 2

    # Inplace modify so that the backward for
    # left2 always raises an error
    left1 += 1

    # An op that takes both sides as input.
    # After running, both sides' last op will be in
    # the ready queue.
    # And the op for left will run first as it was
    # executed last during the forward.
    out = left2 + right

    return out

# Nothing should be global variables here as, from what
# I can see, python leaks all the global objects
get_out().sum().backward()
```
Since this requires the python interpreter to die, it is hard to test in CI.
Let me know if you have an idea how to do it though.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73961
Reviewed By: malfet
Differential Revision: D34752747
Pulled By: albanD
fbshipit-source-id: 1a537b1f733e161e8d3ff053cd432b37b34d432a
(cherry picked from commit 17943e4c04c782d81deab439e010195f04e75bbd)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72301
First step in resolving #35026.
This adds `PythonRecordFunction`, which is a `torch::CustomClassHolder`
for `at::RecordFunction`, to keep the ATen code free of torch includes.
It also adds a new, currently unused internal API function,
`_record_function_enter_new`, which returns the torchbind object.
Once the FC period is expired, `torch.profiler.record_function` will
be updated to use this new internal API. Then once BC period is
expired, the cpp_custom_type_hack-based API can be removed.
Test Plan: Imported from OSS
Reviewed By: dagitses
Differential Revision: D34586311
Pulled By: robieta
fbshipit-source-id: d3eb9ffad7b348548a2b22c75203a92d1cb5115b
(cherry picked from commit 92d2ca808e5fbd20c9d6645dcabc3f059f9ef2d3)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72688
Refactor how we know what to run on the CPU queue.
Lazy tensors were moved there: the Lazy device is always present as a device guard and would otherwise make the number of devices 1 all the time (forcing the creation of a worker thread).
FYI wconstab you most likely don't care about this unless you ever use multiple Lazy devices?
This should slightly improve the perf if you run backward with Lazy Tensors, as the work will be done in the main thread and not a worker thread.
Test Plan: Imported from OSS
Reviewed By: soulitzer
Differential Revision: D34180245
Pulled By: albanD
fbshipit-source-id: 88c5d5bdd631ad01bf271d720d1eab69aba84fc0
(cherry picked from commit da7e9b902f)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72008
Fixes #71119
Technically BC-breaking because when an input does not require grad, previously it was returned as-is instead of a view because it didn't need to. Now we will also return a view in that case (whether or not forward AD runs).
Test Plan: Imported from OSS
Reviewed By: albanD
Differential Revision: D33859553
Pulled By: soulitzer
fbshipit-source-id: 81b3fa371f4c0904630878500aa190492c562367
(cherry picked from commit ee74bc8234)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71707
Why?
- detach should behave like jax.stop_gradient in functorch. Because it
does not detach all the way through, functorch (as well as a Tensor
Subclass wrapping a Tensor subclass) won't see it after the first
layer/subclass handles it.
How?
- This PR changes detach to dispatch all the way through to the backend.
- This PR also modifies native::detach to call shallow_copy_and_detach
instead of native::alias. This is because today, the semantics of detach
and alias are different -- they differ only by
allow_tensor_metadata_change. In the future, we may choose to deprecate
this flag.
- NB: Before and after this PR, detach() shows up twice in
torch_dispatch: https://github.com/pytorch/pytorch/issues/71725. This is
not a regression so I didn't want to fix it in this PR because it is
weird to fix.
Test Plan: - added new tests; run existing tests
Reviewed By: albanD
Differential Revision: D33752860
Pulled By: zou3519
fbshipit-source-id: 40cc2dc8232e75a02586a4ba5b0ef5f16cb76617
(cherry picked from commit f88aae426e)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69827
In general, the current pattern allows for implementing optimisations
for all the backends in a common place (see for example the optimisation
for empty matrices).
After this PR, `torch.svd` is implemented in terms of `linalg.svd` and
`linalg.svdvals`, as expected. This makes it differentiable in the case
when `compute_uv=False`, although this is not particularly important, as
`torch.svd` will eventually be deprecated.
This PR also instantiates smaller `U` / `V` when calling cusolver_gesvdj
in the cases when `full_matrices=False` or `compute_uv=False`.
The memory for the auxiliary `U` and `V` needed by some cuSOLVER routines in
the cases above is allocated via raw allocators rather than through fully
fledged tensors, as it's just a blob of memory the algorithm requests.
As the code is better structured now, it was easier to see that `U` and
`Vh` needn't be allocated when calling `svd_cusolver_gesvd`.
Now `linalg.svdvals` works as expected wrt the `out=` parameter.
Note that in the test `test_svd_memory_allocation` we were
passing a tensor of the wrong size and dtype and the test seemed to
pass...
This PR also changes the backward formula to avoid saving the input
matrix, as it's not necessary. In a follow up PR, I will clean the
backward formula and make it more numerically stable and efficient.
This PR also does a number of memory optimisations here and there, and fixes
the call to cusolver_gesvd, which was incorrect for m <= n. To test
this path, I compiled the code with a flag to unconditionally execute
the `if (!gesvdj_convergence_check.empty())` branch, and all the tests
passed.
I also took this chance to simplify the tests for these functions in
`test_linalg.py`, as we had lots of tests that were testing some
functionality that is already currently tested in the corresponding
OpInfos. I used xwang233's feature to test both MAGMA and CUDA
backends. This is particularly good for SVD, as cuSOLVER is always
chosen over MAGMA when available, so testing MAGMA otherwise would be
tricky.
cc jianyuh nikitaved pearu mruberry walterddr IvanYashchuk xwang233 Lezcano
Test Plan: Imported from OSS
Reviewed By: mikaylagawarecki
Differential Revision: D33751983
Pulled By: mruberry
fbshipit-source-id: 11d48d977946345583d33d14fb11a170a7d14fd2
(cherry picked from commit a1860bd567)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71569
Not sure if this is the right API
Test Plan: Imported from OSS
Reviewed By: albanD
Differential Revision: D33695395
Pulled By: soulitzer
fbshipit-source-id: 652b5758f15d901f98ff0da94e977030c7f3415b
(cherry picked from commit 9421a6846a)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71531
Based on the comment above the original internal assert, this is the desired check. Options considered:
1. Don't error, and automatically make jvp return a view for that tensor output (this is easier than I originally thought: https://github.com/pytorch/pytorch/pull/71531#discussion_r789211877)
2. Error (currently doing)
Test Plan: Imported from OSS
Reviewed By: albanD
Differential Revision: D33695399
Pulled By: soulitzer
fbshipit-source-id: dba49890a55ad1dd59ed5c41faa96bf7cfc9e562
(cherry picked from commit fdb0f266f5)
Summary:
When default hooks are set, they are pushed onto a stack.
When nesting context managers, only the innermost hooks will
be applied.
There is special care needed to update the TLS code. See also https://github.com/pytorch/pytorch/issues/70940 (i.e. do we need to be storing the enabled flag as well?)
Fixes https://github.com/pytorch/pytorch/issues/70134
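A small sketch of the stacking behavior, assuming the hooks are exposed through the `torch.autograd.graph.saved_tensors_hooks` context manager:
```python
import torch

def make_hooks(tag):
    def pack(x):
        print(f"pack from {tag}")
        return x
    def unpack(x):
        return x
    return pack, unpack

a = torch.ones(2, requires_grad=True)
with torch.autograd.graph.saved_tensors_hooks(*make_hooks("outer")):
    with torch.autograd.graph.saved_tensors_hooks(*make_hooks("inner")):
        y = a * a  # tensors saved here are packed by the inner hooks
    z = a * a      # after the inner context exits, the outer hooks apply again
```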
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70932
Reviewed By: mruberry
Differential Revision: D33530370
Pulled By: albanD
fbshipit-source-id: 3197d585d77563f36c175d3949115a0776b309f4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68631
This PR:
- Adds the check that the storage numel of the base and tangent tensors are the same. This is to support the case when as_strided reveals elements that aren't indexable by the input tensor.
- Skips the check when batched tensors are involved, because using as_strided to reveal elements that are not indexable by the input tensor is already not allowed in vmap.
- Adds tests for the above two cases, as well as an edge case regarding conj bit (what about neg bit?)
For functorch:
- we need to copy the batching rule implemented here
Test Plan: Imported from OSS
Reviewed By: albanD
Differential Revision: D32899678
Pulled By: soulitzer
fbshipit-source-id: 54db9550dd2c93bc66b8fb2d36ce40799ebba794
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69592
Currently, the forward AD function for `copy_` (in `VariableTypeManual`) does not handle the broadcasting case. ~EDIT: but that is a design decision, not a bug. In this PR, we make that clear as a comment.~
Note: `broadcast_to` does not have a batching rule in core, so the ops that rely on `copy_` to broadcast will still fail batched forward grad computation.
Test Plan: Imported from OSS
Reviewed By: albanD
Differential Revision: D33020603
Pulled By: soulitzer
fbshipit-source-id: 09cb702bffc74061964a9c05cfef5121f8164814
Summary:
This fixes the case when `torch.inference_mode` is called with `mode=False` (disabled). When used as a decorator, it ignored the argument and enabled inference mode anyway.
`_DecoratorContextManager` is changed so that a new instance is a copy instead of a new instance with default parameters.
I also added more tests to cover this case.
Current behaviour:
```python
>>> import torch
>>> x = torch.ones(1, 2, 3, requires_grad=True)
>>> @torch.inference_mode(mode=False)
... def func(x):
... return x * x
...
>>> out = func(x)
>>> out.requires_grad
False
```
New behaviour (fixed):
```python
>>> import torch
>>> x = torch.ones(1, 2, 3, requires_grad=True)
>>> @torch.inference_mode(mode=False)
... def func(x):
... return x * x
...
>>> out = func(x)
>>> out.requires_grad
True
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68617
Reviewed By: mrshenli
Differential Revision: D32958434
Pulled By: albanD
fbshipit-source-id: 133c69970ef8bffb9fc9ab5142dedcffc4c32945
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68630
Constraints:
1) (functorch) if all the inputs to an op have requires_grad=False and don't have tangents, then their VariableType
kernel should be a no-op, i.e., behave like a redispatch. This is due to functorch's DynamicLayerStack
having the autograd key by default (which is so that transformations like vmap still work with autograd)
2) (inference mode) inference tensors in inference mode will call straight into the kernel, we should still do something sensible
inside even if we normally wouldn't redispatch into it.
3) ~Should support potential application of interposition below autograd: `nn.Parameter` is a example of subclassing where the subclass
is not preserved when an operation is performed. There is an exception though: we want calling `make_dual` on a
`nn.Parameter` to preserve its parameterness.~
4) Should avoid calls to shallow_copy_and_detach to avoid spurious calls into `__python_dispatch__`.
This PR:
- does not redispatch to `make_dual` from its `ADInplaceOrView` kernel to satisfy (1)
- calls into `alias` from the kernel in the native namespace so that behavior is consistent with other views in inference mode to satisfy (2)
- discussion of (3). We still wouldn't be able to directly override `make_dual` below autograd. In this PR, instead of not redispatching at all, we choose to redispatch into `at::alias` so that one can override `make_dual`. The side effect is that one would not be able to distinguish calls between the two, which can be problematic (though a straightforward but hacky solution would be to create a new `at::alias_for_make_dual` that would allow users to distinguish the two). This isn't ideal but seems to be the simplest way to satisfy (3). We don't pursue that hacky solution here.
- (4) is satisfied because we remove calls to `shallow_copy_and_detach`
<details>
<summary> A potentially less hacky but more involved solution? (WIP) </summary>
Realizing that make_dual is more like requires_grad, perhaps it shouldn't be autograd explicit? Make make_dual a composite or python-only construct. i.e., it would be a view on the primal followed by something to the effect of primal.set_fw_grad(tangent).
Additional constraints:
5) make_dual needs to be backward-differentiable (I can't think of any applications yet because
technically as a high-order function, jvp's input is the tangent only, "detach" is not applied on
the tangent, so one would still be able to propagate gradients through it).
6) set_fw_grad needs to raise an error if there is a layout mismatch and base is a forward-differentiable view
Possible plan
- (6) implies that a plain view would not suffice. We need a `detach`-like operation to ensure that set_fw_grad
knows the view is not forward differentiable.
- (5) implies that this (new) `detach` would need to be backward differentiable (API TBD).
- (3) is no longer relevant because make_dual is no longer autograd explicit, but perhaps this new detach should behave like the current one? There is a lot of logic to replicate for detach, so this may be hard.
- (1) is satisfied if we use the current detach logic, and (4) is trivial.
I'm not convinced that this is the right solution either, because in the end does (3) still work?
</details>
Test Plan: Imported from OSS
Reviewed By: jbschlosser
Differential Revision: D32899679
Pulled By: soulitzer
fbshipit-source-id: 98e13ae954e14e1e68dbd03eb5ab3300d5ed2c5e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69508
Original Phabricator Diff: D32704467 (e032dae329)
Reland; the fix is to not test traditional checkpoint when the input does not require grad, as that is unsupported (as documented).
Original PR body:
Resubmission of https://github.com/pytorch/pytorch/pull/62964 with the
suggestions and tests discussed in
https://github.com/pytorch/pytorch/issues/65537.
Adds a `use_reentrant=False` flag to the `checkpoint` function. When
`use_reentrant=False` is specified, a checkpointing implementation that uses
SavedVariableHooks instead of re-entrant autograd is used. This makes it more
composable with things such as `autograd.grad` as well as DDP (still need to
add thorough distributed testing).
As discussed in https://github.com/pytorch/pytorch/issues/65537, the tests that we need to add are:
- [x] Gradient hooks are called once
- [x] works when the input does not require grad but Tensors that require grad are captured (like the first layer in a nn)
- [x] works for functions with arbitrary input/output objects
- [x] distributed tests (next PR)
Note that this is only for `torch.utils.checkpoint`, if this approach overall looks good, we will do something similar for `checkpoint_sequential`.
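A minimal sketch of the flag, assuming the current `torch.utils.checkpoint.checkpoint` signature:
```python
import torch
from torch.utils.checkpoint import checkpoint

lin = torch.nn.Linear(4, 4)
x = torch.randn(2, 4, requires_grad=True)

# use_reentrant=False selects the SavedVariableHooks-based implementation,
# which composes with torch.autograd.grad.
out = checkpoint(lambda inp: lin(inp).relu(), x, use_reentrant=False)
(grad_x,) = torch.autograd.grad(out.sum(), x)
```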
ghstack-source-id: 144948501
Test Plan: CI
Reviewed By: zhaojuanmao
Differential Revision: D32902634
fbshipit-source-id: 2ee87006e5045e5471ff80c36a07fbecc2bea3fe
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69027
Resubmission of https://github.com/pytorch/pytorch/pull/62964 with the
suggestions and tests discussed in
https://github.com/pytorch/pytorch/issues/65537.
Adds a `use_reentrant=False` flag to the `checkpoint` function. When
`use_reentrant=False` is specified, a checkpointing implementation that uses
SavedVariableHooks instead of re-entrant autograd is used. This makes it more
composable with things such as `autograd.grad` as well as DDP (still need to
add thorough distributed testing).
As discussed in https://github.com/pytorch/pytorch/issues/65537, we have added
the following tests:
- [ ] Gradient hooks are called once
ghstack-source-id: 144644859
Test Plan: CI
Reviewed By: pbelevich
Differential Revision: D32704467
fbshipit-source-id: 6eea1cce6b935ef5a0f90b769e395120900e4412
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67041
Original PR here: https://github.com/pytorch/pytorch/pull/62246 (The old PR does more things, but now that's split across this stack)
This PR:
- Adds "jacfwd" and "hessian_fwdrev"
- Modifies existing tests to also test the `forward_ad=True` case
Test Plan: Imported from OSS
Reviewed By: gchanan, zou3519
Differential Revision: D32314424
Pulled By: soulitzer
fbshipit-source-id: 785b0e39162b93dc3b3cb9413233447152eddd53
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66294
In this PR:
- OpInfo for forward AD now checks batched forward grad when `op.check_batched_grad=True`
- Adds a per-op setting, `check_batched_forward_grad`, to disable the test for individual ops, and disables it for the ops listed here: https://github.com/pytorch/pytorch/issues/66357
Fixes some more failures:
- Make Forward AD metadata less strict by allowing stride to differ when size is 1
- Fix sum batching rule when logical tensor is a scalar and dim is unspecified
- Batching rule for `_reshape_alias`
- ~Batching rules now preserve storage offset for view operator that return non-zero storage offset~ (moved to previous PR)
Test Plan: Imported from OSS
Reviewed By: zou3519, albanD
Differential Revision: D31842020
Pulled By: soulitzer
fbshipit-source-id: 3517a8fb9d6291fccb53c0b1631eab5bbb24ebd1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66292
In this PR:
1. Fix the case when the tangent has a different layout from the base when `set_fw_grad` is called, by adding a native function and its batching rule.
For (1) we replace the following:
```
Tensor new_with_same_meta(const Variable& base) {
int64_t nelement_in_storage = base.storage().nbytes() / base.itemsize();
auto new_tensor = at::zeros({nelement_in_storage}, base.options());
auto res = new_tensor.as_strided(base.sizes(), base.strides(), base.storage_offset());
return res;
}
```
with a native function as to enable a batching rule to alter its behavior.
This new function will be similar to `new_zeros_strided` except we also require the `storage_offset` and `storage_numel` arguments.
Possible concerns:
- Why have redundant logic? Why not add the new args to `new_zeros_strided`? This is probably a niche use case, so it's better not to complicate the current API.
- Previously the created tensor inherits the TensorOptions of the primal. Now we inherit from the TensorOptions of the tangent.
- Probably fine. Likely, no one relies on this because the behavior is only triggered when tangent/base have different layouts.
- Why pass in exploded size, stride, and offset? It is possible in the non-batched case to pass in a tensor directly, but not possible when we'd like to have a batching rule. The size, stride, and offset we'd be passing won't belong to any live tensor.
Test Plan: Imported from OSS
Reviewed By: zou3519, albanD
Differential Revision: D31842019
Pulled By: soulitzer
fbshipit-source-id: a58433d814fd173bc43a2c550b395377dba40de2
Summary:
Fixes https://github.com/pytorch/pytorch/issues/67367
- Adds check to make sure forward grad itself does not have forward grad at the same level
- Verify with `python test/test_ops.py -k test_forward_mode_AD_linalg_eigh_cpu_float64` that it fails the check before, but passes after the codegen update
Before:
```
if (_any_has_forward_grad_eigenvalues) {
auto self_t_raw = toNonOptFwGrad(self);
auto self_t = self_t_raw.defined() ? self_t_raw : at::zeros_like(toNonOptTensor(self));
auto eigenvalues_new_fw_grad = eigh_jvp_eigenvalues(self_t, eigenvalues, eigenvectors);
if (eigenvalues_new_fw_grad.defined()) {
// The hardcoded 0 here will need to be updated once we support multiple levels.
eigenvalues._set_fw_grad(eigenvalues_new_fw_grad, /* level */ 0, /* is_inplace_op */ false);
}
}
if (_any_has_forward_grad_eigenvectors) {
auto self_t_raw = toNonOptFwGrad(self);
auto self_t = self_t_raw.defined() ? self_t_raw : at::zeros_like(toNonOptTensor(self));
auto eigenvectors_new_fw_grad = eigh_jvp_eigenvectors(self_t, eigenvalues, eigenvectors);
if (eigenvectors_new_fw_grad.defined()) {
// The hardcoded 0 here will need to be updated once we support multiple levels.
eigenvectors._set_fw_grad(eigenvectors_new_fw_grad, /* level */ 0, /* is_inplace_op */ false);
}
}
```
After:
```
c10::optional<at::Tensor> eigenvalues_new_fw_grad_opt = c10::nullopt;
if (_any_has_forward_grad_eigenvalues) {
auto self_t_raw = toNonOptFwGrad(self);
auto self_t = self_t_raw.defined() ? self_t_raw : at::zeros_like(toNonOptTensor(self));
eigenvalues_new_fw_grad_opt = eigh_jvp_eigenvalues(self_t, eigenvalues, eigenvectors);
}
c10::optional<at::Tensor> eigenvectors_new_fw_grad_opt = c10::nullopt;
if (_any_has_forward_grad_eigenvectors) {
auto self_t_raw = toNonOptFwGrad(self);
auto self_t = self_t_raw.defined() ? self_t_raw : at::zeros_like(toNonOptTensor(self));
eigenvectors_new_fw_grad_opt = eigh_jvp_eigenvectors(self_t, eigenvalues, eigenvectors);
}
if (eigenvalues_new_fw_grad_opt.has_value() && eigenvalues_new_fw_grad_opt.value().defined()) {
// The hardcoded 0 here will need to be updated once we support multiple levels.
eigenvalues._set_fw_grad(eigenvalues_new_fw_grad_opt.value(), /* level */ 0, /* is_inplace_op */ false);
}
if (eigenvectors_new_fw_grad_opt.has_value() && eigenvectors_new_fw_grad_opt.value().defined()) {
// The hardcoded 0 here will need to be updated once we support multiple levels.
eigenvectors._set_fw_grad(eigenvectors_new_fw_grad_opt.value(), /* level */ 0, /* is_inplace_op */ false);
}
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68535
Reviewed By: ngimel
Differential Revision: D32536089
Pulled By: soulitzer
fbshipit-source-id: a3f288540e2d78a4a9ec4bd66d2c0f0e65dd72cd
Summary:
Fixes https://github.com/pytorch/pytorch/issues/67800
Currently when the grad is the same layout as base, we try to assign the same tensor to the forward grad of both the base and the view. However, when the layout of the grad is different from the layout of the view, this triggers a copy to be created, and the tangent of the view (after the inplace) will not have a view relationship with the view of the base.
This PR just changes it so that we only do the above optimization when the layout also matches the layout of self
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67816
Reviewed By: malfet
Differential Revision: D32190021
Pulled By: soulitzer
fbshipit-source-id: b1b2c9b332e83f4df5695ee9686ea76447f9305b
Summary:
Partially fixes https://github.com/pytorch/pytorch/issues/66066
This PR:
- cleans up op-specific testing from test_autograd. test_autograd should be reserved for testing generic autograd functionality
- tests related to an operator are better colocated
- see the tracker for details
What to think about when moving tests to their correct test suite:
- naming: make sure it's not too generic
- how the test is parametrized, sometimes we need to add/remove a device/dtype parameter
- can this be merged with existing tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67413
Reviewed By: jbschlosser, albanD
Differential Revision: D32031480
Pulled By: soulitzer
fbshipit-source-id: 8e13da1e58a38d5cecbfdfd4fe2b4fe6f816897f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66291
In this PR:
- Trivial batching rules for `make_dual` and `is_same_size` that enable forward ad + vmap functionality
- Adds a check in gradcheck that is performed when both `check_batched_grad` and `check_forward_ad` are `True` (an OpInfo using this is added later in the stack).
- Tests for the gradcheck functionality
- Tests that basic out-of-place op works
Test Plan: Imported from OSS
Reviewed By: albanD, saketh-are
Differential Revision: D31842018
Pulled By: soulitzer
fbshipit-source-id: 84b18d9a77eeb19897757e37555581f2a9dc43d8
Summary:
Fixes https://github.com/pytorch/pytorch/issues/61926
1. update the `if` to just use requires_derivative since that should reflect when function is not differentiable
2. if `requires_derivative=True` but no outputs have forward derivatives, we should error as usual
3. ~In the future we may also want to handle the case~ when `len(fw_derivatives) > 0 and len(fw_derivatives) < num_diff_outputs`, we should add an assert in codegen that this does not happen.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66926
Reviewed By: anjali411
Differential Revision: D31810736
Pulled By: soulitzer
fbshipit-source-id: 11a14477cc7554f576cff2ed1711a448a8c6a66a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64181
This PR replaces all the calls to:
- `transpose(-2, -1)` or `transpose(-1, -2)` by `mT()` in C++ and `mT` in Python
- `conj().transpose(-2, -1)` or `transpose(-2, -1).conj()` or `conj().transpose(-1, -2)` or `transpose(-1, -2).conj()` by `mH()` in C++ and `mH` in Python.
It also simplifies two pieces of code, and fixes one bug where a pair
of parentheses was missing in the function `make_symmetric_matrices`.
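A short sketch of the replacements described above (`.mT`/`.mH` as the matrix-transpose and conjugate-transpose shorthands):
```python
import torch

a = torch.randn(2, 3, dtype=torch.cfloat)
assert torch.equal(a.mT, a.transpose(-2, -1))
assert torch.equal(a.mH, a.transpose(-2, -1).conj())
```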
Test Plan: Imported from OSS
Reviewed By: H-Huang
Differential Revision: D31692896
Pulled By: anjali411
fbshipit-source-id: e9112c42343663d442dc5bd53ff2b492094b434a
Summary:
Fixes https://github.com/pytorch/pytorch/issues/50209
This adds a new warning handler that stores all warnings in a shared
queue, which can be "replayed" at a later time and, crucially, on
another thread. Then, I use this inside the autograd engine to ensure
that warnings are processed by the handler registered on the main
thread.
For testing, I also add an operator that always warns in the backward
pass and test that the warning is a normal Python warning.
cc ezyang albanD zou3519 gqchen pearu nikitaved soulitzer Lezcano Varal7
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66235
Reviewed By: ejguan
Differential Revision: D31505413
Pulled By: albanD
fbshipit-source-id: 1a7f60b038f55c20591c0748b9e86735b3fec2f9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65564
- wraps the call into the engine with vmap if `batched_grad` is `True`
- improves the comment on the call to the engine (somewhat addressing https://github.com/pytorch/pytorch/issues/41659)
- borrows the message from functional.jacobian's vectorized argument concerning usage of the vmap feature
- adds basic test (further testing is done when we replace the usage in vectorized jacobian computation)
TODO:
- create an issue tracking this
Test Plan: Imported from OSS
Reviewed By: albanD
Differential Revision: D31236259
Pulled By: soulitzer
fbshipit-source-id: b33e6b26ea98fa9f70c44da08458fc54ba4df0f7
Summary:
Fixes https://github.com/pytorch/pytorch/issues/64999
- Adds a flag to gradcheck, `check_backward_ad`, that can be used to disable gradcheck for backward AD (usage sketched below)
- This is a bit bc-breaking in terms of positional args, but I prefer this ordering
- In OpInfo tests for forward ad:
- set `check_backward_ad` False
- In test_ops treat `supports_autograd` as if it is `supports_backward_ad` (it basically already is)
- the only modification needed is to no longer skip forward ad tests if `supports_autograd` is false
- test_dtype, test_variant_consistency, etc behave correctly as-is
- In a follow-up PR, we can rename it to actually be `supports_backward_ad`
- Testing
- https://github.com/pytorch/pytorch/pull/65060
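A minimal usage sketch of the new flag, assuming the current `torch.autograd.gradcheck` signature:
```python
import torch
from torch.autograd import gradcheck

x = torch.randn(3, dtype=torch.double, requires_grad=True)
# Check only forward-mode AD; backward-mode checking is disabled.
gradcheck(torch.sin, (x,), check_forward_ad=True, check_backward_ad=False)
```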
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65040
Reviewed By: albanD
Differential Revision: D31238177
Pulled By: soulitzer
fbshipit-source-id: f068d4cbe7ffb094930b16cddb210583b9b7b2c4
Summary:
OpInfo tracker: https://github.com/pytorch/pytorch/issues/54261
- Eliminate duplicated testing logic in test_autograd
- Moved tests that rely on this testing logic to use OpInfos
- `cat` already has OpInfo (no action needed)
- Created OpInfo for `block_diag` and `broadcast_tensors`
Running into some FX errors. Added op to skip-list and created an issue here: https://github.com/pytorch/pytorch/issues/64997
Both `block_diag` and `broadcast_tensors` are variadic, so skipping `test_variant_consistency_jit` (from comments on other OpInfos, it looks like JIT does not support variadic tensors)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64993
Reviewed By: jbschlosser
Differential Revision: D30961736
Pulled By: soulitzer
fbshipit-source-id: e169305384a683acae1178c4e12e9e214a67226a
Summary:
Fixes https://github.com/pytorch/pytorch/issues/64813
Raises a TypeError when the value assigned to a grad is not a Tensor or
None.
Adds tests.
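A short sketch of the behavior described:
```python
import torch

x = torch.ones(2, requires_grad=True)
x.grad = torch.zeros(2)  # OK: a Tensor
x.grad = None            # OK: None
try:
    x.grad = 123         # not a Tensor or None
except TypeError as e:
    print("raised:", e)
```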
cc ezyang gchanan
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64876
Reviewed By: anjali411
Differential Revision: D30901678
Pulled By: soulitzer
fbshipit-source-id: dbb3cb5fd0bbac6918e0b2e2f51d340daa43dee0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63554
Following https://github.com/pytorch/pytorch/pull/61840#issuecomment-884087809, this deprecates all the dtype getters publicly exposed in the `torch.testing` namespace. The reason for this is twofold:
1. If someone is not familiar with the C++ dispatch macros PyTorch uses, the names are misleading. For example `torch.testing.floating_types()` will only give you `float32` and `float64` skipping `float16` and `bfloat16`.
2. The dtype getters provide very minimal functionality that can be easily emulated by downstream libraries.
We thought about [providing an replacement](https://gist.github.com/pmeier/3dfd2e105842ad0de4505068a1a0270a), but ultimately decided against it. The major problem is BC: by keeping it, either the namespace is getting messy again after a new dtype is added or we need to somehow version the return values of the getters.
Test Plan: Imported from OSS
Reviewed By: H-Huang
Differential Revision: D30662206
Pulled By: mruberry
fbshipit-source-id: a2bdb10ab02ae665df1b5b76e8afa9af043bbf56
Summary:
Fixes https://github.com/pytorch/pytorch/issues/61767
## Changes
- [x] Add `torch.concat` alias to `torch.cat`
- [x] Add OpInfo for `cat`/`concat`
- [x] Fix `test_out` skips (Use `at::native::resize_output` or `at::native::resize_output_check`)
- [x] `cat`/`concat`
- [x] `stack`
- [x] `hstack`
- [x] `dstack`
- [x] `vstack`/`row_stack`
- [x] Remove redundant tests for `cat`/`stack`
~I've not added `cat`/`concat` to OpInfo `op_db` yet, since cat is a little more tricky than other OpInfos (should have a lot of tests) and currently there are no OpInfos for that. I can try to add that in a subsequent PR or maybe here itself, whatever is suggested.~
**Edit**: cat/concat OpInfo has been added.
**Note**: I've added the named tensor support for `concat` alias as well, maybe that's out of spec in `array-api` but it is still useful for consistency in PyTorch.
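A tiny sketch of the alias:
```python
import torch

a, b = torch.ones(2), torch.zeros(2)
assert torch.equal(torch.concat([a, b]), torch.cat([a, b]))  # concat aliases cat
```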
Thanks to krshrimali for guidance on my first PR :))
cc mruberry rgommers pmeier asmeurer leofang AnirudhDagar asi1024 emcastillo kmaehashi heitorschueroff krshrimali
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62560
Reviewed By: saketh-are
Differential Revision: D30762069
Pulled By: mruberry
fbshipit-source-id: 6985159d1d9756238890488a0ab3ae7699d94337
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64261
Note that this does not preserve byte-for-byte compatibility with
existing names.
Test Plan:
* Rely on CI to catch gross errors.
* Merge after release cut to catch subtle issues.
Reviewed By: albanD
Differential Revision: D30700647
Pulled By: dagitses
fbshipit-source-id: 7b02f34b8fae3041240cc78fbc6bcae498c3acd4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63400
This is the first step to break up test_autograd.py for #63205.
Test Plan: Imported from OSS
Reviewed By: albanD
Differential Revision: D30541499
Pulled By: dagitses
fbshipit-source-id: 8d9d32007938b9eade0e88f95a6a3190e7e2ef01
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63619
Adds a RECORD_FUNCTION with the function that is being evaluated as part
of backwards execution. This has been useful in picking up some operations
in the backwards pass that otherwise would not show up, for example custom cpp
functions that use custom C++ code.
ghstack-source-id: 137041723
Test Plan:
CI
benchmark:
buck run mode/opt //scripts/rvarm1/ddp:bench
Reviewed By: albanD
Differential Revision: D30439492
fbshipit-source-id: 955917770cdf2a2edb0303223ace710b668ba388
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63324
Fix for https://www.internalfb.com/tasks/?t=98258963
`catch_warnings` seems to only trigger once in certain cases where it
should trigger twice.
This test is only meant to check whether hooks are triggered or not,
so changing it to self.assertGreater is ok.
Test Plan: Imported from OSS
Reviewed By: albanD
Differential Revision: D30340833
Pulled By: Varal7
fbshipit-source-id: 1bfb9437befe9e8ab8f95efe5f513337fa9bdc5c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62909
This PR makes saved tensors default hooks thread local.
This allows using default hooks in a multithreaded context.
Test Plan: Imported from OSS
Reviewed By: albanD
Differential Revision: D30165416
Pulled By: Varal7
fbshipit-source-id: 10a7d580661d3d94bdaf398c4e076b7bea11c16b
Summary:
When using saved tensors hooks (especially default hooks),
if the user defines a `pack_hook` that modifies its input,
it can cause some surprising behavior.
The goal of this PR is to prevent future user headache by catching
inplace modifications of the input of `pack_hook` and raising an error if
applicable.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62717
Reviewed By: albanD
Differential Revision: D30255243
Pulled By: Varal7
fbshipit-source-id: 8d73f1e1b50b697a59a2849b5e21cf0aa7493b76
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61931
This PR consolidates the profiling code around a new C++ implementation
(profiler_kineto.h/cpp) and uses it unconditionally from
torch.autograd.profiler/torch.profiler:
1. Always use profiler_kineto.h/cpp as the C++ implementation
2. Simplify profiler.py to remove unneeded parts depending on legacy
impl
3. Move some of the legacy logic into profiler_legacy.py (to be fully
deleted later)
Test Plan:
USE_KINETO=1 USE_CUDA=1 USE_MKLDNN=1 BLAS=MKL BUILD_BINARY=1 python setup.py develop install --cmake
python test/test_profiler.py -v
USE_KINETO=0 USE_CUDA=1 USE_MKLDNN=1 BLAS=MKL BUILD_BINARY=1 python setup.py develop install --cmake
python test/test_profiler.py -v
Imported from OSS
Reviewed By: gdankel
Differential Revision: D29801599
fbshipit-source-id: 9794d29f2af38dddbcd90dbce4481fc8575fa29e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61928
Fixes #57100.
Creates a function `torch.autograd.graph.set_save_on_cpu_hooks()` which can be used to register default hooks under which all tensors saved during the forward pass are actually copied* to cpu, then copied back to the appropriate device for the backward pass.
*If the tensor was already on cpu, the entire operation is a no op.
If the tensor is on GPU, we copy the tensor to `pin_memory` during packing so that the unpacking can be done asynchronously.
See [benchmark](https://github.com/pytorch/pytorch/pull/61928#issuecomment-885089279) and [note about training large models](https://github.com/pytorch/pytorch/pull/61928#issuecomment-887009448)
Test Plan: Imported from OSS
Reviewed By: soulitzer
Differential Revision: D29848526
Pulled By: Varal7
fbshipit-source-id: 3d289cddd4fa377bd4884ba0d569fa47c777d9e5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62563
Expose a pair of functions to Python users: torch.autograd.graph.set_saved_tensors_default_hooks(pack, unpack) and torch.autograd.graph.reset_saved_tensors_default_hooks().
These functions control the hooks applied to saved tensors: all tensors saved in that context will be packed using the pack function, then unpacked accordingly when needed.
Currently, this works by simply calling register_hooks (cf #60975) directly at the end of the constructor of a SavedVariable. This could be optimized further by not performing the copy before registering default hooks, but this would require a small refactor. Edit: the refactor is done in #61927.
A current limitation is that if users create tensors in this context, they will not be able to register additional hooks on the saved tensor.
For instance, to perform something like #28997, one could define a pack function that saves to disk whenever the tensor size is too big and returns a filename, then unpack simply reads the content of the file and outputs a tensor, e.g.:
```
import os, tempfile, uuid
import torch

tmp_dir = tempfile.mkdtemp()

def pack(x):
    name = os.path.join(tmp_dir, str(uuid.uuid4()))
    torch.save(x, name)
    return name

def unpack(name):
    return torch.load(name)
```
Relanding previous PR: https://github.com/pytorch/pytorch/pull/61834
Original PR led to timeout error in: https://www.internalfb.com/mast/job/yuguo-release_canary_offline_training-inlinecvrp_a-canary_offline_train_28a7ecfc
Now passing: https://www.internalfb.com/mast/job/quach-release_canary_offline_training-inlinecvrp_a-canary_offline_train_9bb57e98
The difference with the new version is we don't need to acquire the GIL when calling `PyDefaultSavedVariableHooks::get_hooks`.
Test Plan: Imported from OSS
Reviewed By: iramazanli
Differential Revision: D30045405
Pulled By: Varal7
fbshipit-source-id: 7f6c07af3a56fe8835d5edcc815c15ea4fb4e332
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61834
Expose a pair of functions to Python users: torch.autograd.graph.set_saved_tensors_default_hooks(pack, unpack) and torch.autograd.graph.reset_saved_tensors_default_hooks().
These functions control the hooks applied to saved tensors: all tensors saved in that context will be packed using the pack function, then unpacked accordingly when needed.
Currently, this works by simply calling register_hooks (cf #60975) directly at the end of the constructor of a SavedVariable. This could be optimized further by not performing the copy before registering default hooks, but this would require a small refactor. Edit: the refactor is done in #61927.
A current limitation is that if users create tensors in this context, they will not be able to register additional hooks on the saved tensor.
For instance, to perform something like #28997, one could define a pack function that saves to disk whenever the tensor size is too big and returns a filename, then unpack simply reads the content of the file and outputs a tensor, e.g.:
```
import os, tempfile, uuid
import torch

tmp_dir = tempfile.mkdtemp()

def pack(x):
    name = os.path.join(tmp_dir, str(uuid.uuid4()))
    torch.save(x, name)
    return name

def unpack(name):
    return torch.load(name)
```
Test Plan: Imported from OSS
Reviewed By: zou3519
Differential Revision: D29792193
Pulled By: Varal7
fbshipit-source-id: 33e931230ef59faa3ec8b5d11ef7c05539bce77c
Summary:
This PR un-reverts https://github.com/pytorch/pytorch/issues/61475 + fixes compilation with MSVC, that does not recognize alternative operator spellings (i.e. using `or` instead of `||` )
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61937
Reviewed By: albanD
Differential Revision: D29805941
Pulled By: malfet
fbshipit-source-id: 01e5963c6717c1b44b260300d87ba0bf57f26ce9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60021
Dropping the imaginary component is expected and gives the correct gradient
formula, so silencing the warning is appropriate.
Test Plan: Imported from OSS
Reviewed By: ngimel
Differential Revision: D29589371
Pulled By: mruberry
fbshipit-source-id: 73e1511cae69207dc9abe576e2769ee1d03f1bbd
Summary:
Partially addresses https://github.com/pytorch/pytorch/issues/49825 by improving the testing
- Rename some of the old tests that had "inplace_view" in their names, but actually mean "inplace_[update_]on_view" so there is no confusion with the naming
- Adds some tests in test_view_ops that verify basic behavior
- Add tests that creation meta is properly handled for no-grad, multi-output, and custom function cases
- Add a test that verifies that in the cross-dtype view case, the inplace views won't be accounted for in the backward graph on rebase, as mentioned in the issue.
- Update inference mode tests to also check in-place
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59891
Reviewed By: albanD
Differential Revision: D29272546
Pulled By: soulitzer
fbshipit-source-id: b12acf5f0e3f788167ebe268423cdb58481b56f6
Summary:
The grad() function needs to return the updated values, and hence
needs a non-empty `inputs` argument to populate.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52016
Test Plan:
Passes Python and C++ unit tests, and added new tests to catch this behavior.
Fixes https://github.com/pytorch/pytorch/issues/47061
Reviewed By: albanD
Differential Revision: D26406444
Pulled By: dagitses
fbshipit-source-id: 023aeca9a40cd765c5bad6a1a2f8767a33b75a1a
Summary:
We only set the value and not the actual version counter (VC).
This means that in the context of double backward, if that saved tensor is saved again and the original Tensor is modified inplace, we would not detect it.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60195
Reviewed By: Varal7
Differential Revision: D29208766
Pulled By: albanD
fbshipit-source-id: 81175f8e3f111f89524f8e46f47577b2ea4fc945
Summary:
Fixes https://github.com/pytorch/pytorch/issues/4661
- Add warnings in engine's `execute` function so it can be triggered through both cpp and python codepaths
- Adds an RAII guard version of `c10::Warning::set_warnAlways` and replaces all prior usages of the set_warnAlways with the new one
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59412
Reviewed By: jbschlosser
Differential Revision: D28969294
Pulled By: soulitzer
fbshipit-source-id: b03369c926a3be18ce1cf363b39edd82a14245f0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59483
... for functions that are not implemented
Test Plan: Imported from OSS
Reviewed By: albanD
Differential Revision: D28933806
fbshipit-source-id: dadae1af6609f15419cf0f47a98361dc87dff849
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54987
Based off of ezyang's (https://github.com/pytorch/pytorch/pull/44799) and bdhirsh's (https://github.com/pytorch/pytorch/pull/43702) prototypes:
Here's a summary of the changes in this PR:
This PR adds a new dispatch key called Conjugate. This enables us to make conjugate operation a view and leverage the specialized library functions that fast path with the hermitian operation (conj + transpose).
1. Conjugate operation will now return a view with conj bit (1) for complex tensors and returns self for non-complex tensors as before. This also means `torch.view_as_real` will no longer be a view on conjugated complex tensors and is hence disabled. To fill the gap, we have added `torch.view_as_real_physical` which would return the real tensor agnostic of the conjugate bit on the input complex tensor. The information about conjugation on the old tensor can be obtained by calling `.is_conj()` on the new tensor.
2. NEW API (see the usage sketch below):
a) `.conj()` -- now returning a view.
b) `.conj_physical()` -- does the physical conjugate operation. If the conj bit for input was set, you'd get `self.clone()`, else you'll get a new tensor with conjugated value in its memory.
c) `.conj_physical_()`, and `out=` variant
d) `.resolve_conj()` -- materializes the conjugation. returns self if the conj bit is unset, else returns a new tensor with conjugated values and conj bit set to 0.
e) `.resolve_conj_()` in-place version of (d)
f) `view_as_real_physical` -- as described in (1), it's functionally same as `view_as_real`, just that it doesn't error out on conjugated tensors.
g) `view_as_real` -- existing function, but now errors out on conjugated tensors.
3. Conjugate Fallback
a) Vast majority of PyTorch functions would currently use this fallback when they are called on a conjugated tensor.
b) This fallback is well equipped to handle the following cases:
- functional operation e.g., `torch.sin(input)`
- Mutable inputs and in-place operations e.g., `tensor.add_(2)`
- out-of-place operation e.g., `torch.sin(input, out=out)`
- Tensorlist input args
- NOTE: Meta tensors don't work with conjugate fallback.
4. Autograd
a) `resolve_conj()` is an identity function w.r.t. autograd
b) Everything else works as expected.
5. Testing:
a) All method_tests run with conjugate view tensors.
b) OpInfo tests that run with conjugate views
- test_variant_consistency_eager/jit
- gradcheck, gradgradcheck
- test_conj_views (that only run for `torch.cfloat` dtype)
NOTE: functions like `empty_like`, `zeros_like`, `randn_like`, `clone` don't propagate the conjugate bit.
Follow up work:
1. conjugate view RFC
2. Add neg bit to re-enable view operation on conjugated tensors
3. Update linalg functions to call into specialized functions that fast path with the hermitian operation.
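As referenced in (2), here is a small usage sketch of the view-based conj API (a minimal sketch of the behavior described above):
```
import torch

z = torch.tensor([1 + 1j, 2 - 3j])
zc = z.conj()                # lazy view: only the conj bit is set, no copy
print(zc.is_conj())          # True
print(z.is_conj())           # False

zm = zc.resolve_conj()       # materializes the conjugation
print(zm.is_conj())          # False; values are now physically conjugated
```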
Test Plan: Imported from OSS
Reviewed By: VitalyFedyunin
Differential Revision: D28227315
Pulled By: anjali411
fbshipit-source-id: acab9402b9d6a970c6d512809b627a290c8def5f
Summary:
Adds `is_inference` as a native function w/ manual cpp bindings.
Also changes instances of `is_inference_tensor` to `is_inference` to be consistent with other properties such as `is_complex`.
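For illustration (a minimal sketch), the property reads like the other `is_*` checks:
```
import torch

t = torch.randn(2, dtype=torch.cfloat)
print(t.is_complex())    # True
print(t.is_inference())  # False; only True for tensors created in inference mode
```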
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58729
Reviewed By: mruberry
Differential Revision: D28874507
Pulled By: soulitzer
fbshipit-source-id: 0fa6bcdc72a4ae444705e2e0f3c416c1b28dadc7
Summary:
There are two main changes here:
- THPVariable will now actually visit its grad_fn if there is no other reference to the c++ Tensor and no other reference to the grad_fn. The critical observation compared to the existing comment (thanks Ed!) is that if we also check that the c++ Tensor object is not referenced somewhere else, we're sure that no one can change the grad_fn refcount between the traverse and the clear.
- THPVariable doesn't need a special clear for this new case: since we're the only owner of the c++ Tensor, cdata.reset() will necessarily free the Tensor and all its resources.
The two tests are to ensure:
- That the cycles are indeed collectible by the gc
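As a rough illustration of the kind of cycle this makes collectible (a minimal sketch; the actual tests live in test_autograd.py):
```
import gc
import torch

x = torch.randn(3, requires_grad=True).clone()
x.grad_fn.metadata["self"] = x   # Python-level cycle: tensor -> grad_fn -> tensor
del x
gc.collect()                     # with this change, the cycle can be collected
```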
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58271
Reviewed By: ngimel
Differential Revision: D28796461
Pulled By: albanD
fbshipit-source-id: 62c05930ddd0c48422c79b03118db41a73c1355d
Summary:
Fixes https://github.com/pytorch/pytorch/issues/57679
##### Release Notes
This is part of the end of the deprecation of inplace/view:
- `detach_` will now raise an error when invoked on any view created by `split`, `split_with_sizes`, or `chunk`. You should use the non-inplace `detach` instead.
- The error message for when an in-place operation (that is not detach) is performed on a view created by `split`, `split_with_sizes`, or `chunk` has been changed from "This view is **an** output of a function..." to "This view is **the** output of a function...".
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58285
Reviewed By: bdhirsh
Differential Revision: D28441980
Pulled By: soulitzer
fbshipit-source-id: e2301d7b8cbc3dcdd328c46f24bcb9eb7f3c0d87
Summary:
Fixes https://github.com/pytorch/pytorch/issues/56608
- Adds binding to the `c10::InferenceMode` RAII class in `torch._C._autograd.InferenceMode` through pybind. Also binds the `torch.is_inference_mode` function.
- Adds context manager `torch.inference_mode` to manage an instance of `c10::InferenceMode` (global). Implemented in `torch.autograd.grad_mode.py` to reuse the `_DecoratorContextManager` class.
- Adds some tests based on those linked in the issue + several more for just the context manager
Issues/todos (not necessarily for this PR):
- Improve short inference mode description
- Small example
- Improved testing since there is no direct way of checking TLS/dispatch keys
-
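A small example of the context manager (a hedged sketch of the intended usage):
```
import torch

with torch.inference_mode():
    x = torch.ones(2, 3)
    y = x * 2
    print(y.is_inference())  # True
    print(y.requires_grad)   # False: no autograd tracking inside inference mode
```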
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58045
Reviewed By: agolynski
Differential Revision: D28390595
Pulled By: soulitzer
fbshipit-source-id: ae98fa036c6a2cf7f56e0fd4c352ff804904752c
Summary:
Port addmm to a structured kernel.
Follow-ups:
- migrate `mm` and `addbmm` to structured kernels
- move the TORCH_CHECKs currently in `addmm_cpu_impl_` and `addmm_out_cuda_impl` to meta
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57417
Reviewed By: bdhirsh
Differential Revision: D28291001
Pulled By: walterddr
fbshipit-source-id: 4eafaa30a465e225fbb4d2a69a36f1e037df9122
Summary:
This one had a tricky usage of `torch.symeig` that had to be replaced. I tested the replacement locally though.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57732
Reviewed By: bdhirsh
Differential Revision: D28328189
Pulled By: mruberry
fbshipit-source-id: 7f000fcbf2b029beabc76e5a89ff158b47977474
Summary:
Backward methods for `torch.lu` and `torch.lu_solve` require the `torch.lu_unpack` method.
However, while `torch.lu` is a Python wrapper over a native function, so its gradient can be implemented via `autograd.Function`,
`torch.lu_solve` is a native function and so cannot access `torch.lu_unpack`, which is implemented in Python.
Hence this PR adds a native (ATen) `lu_unpack`. With this function it is also possible to update the gradients for `torch.lu` so that backward+JIT is supported (there is no JIT support for `autograd.Function`).
~~The interface for this method is different from the original `torch.lu_unpack`, so it is decided to keep it hidden.~~
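A usage sketch of the Python-level API this native kernel backs (using the linalg factorization entry point; `torch.lu` was the entry point at the time of this PR):
```
import torch

A = torch.randn(3, 3)
LU, pivots = torch.linalg.lu_factor(A)
P, L, U = torch.lu_unpack(LU, pivots)
print(torch.allclose(P @ L @ U, A))   # True
```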
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46913
Reviewed By: albanD
Differential Revision: D28355725
Pulled By: mruberry
fbshipit-source-id: 281260f3b6e93c15b08b2ba66d5a221314b00e78
Summary:
Fixes https://github.com/pytorch/pytorch/issues/30696
### Release Notes
Instantiating a custom autograd function is now deprecated. Users should call `.apply()` on the class itself because it is a static method.
--end release notes--
- There are a couple of error messages that we can't entirely remove because accessing these attributes of an autograd function instance may segfault (due to cdata being nullptr). Also added a TORCH_CHECK for the name attribute, which previously segfaulted.
- Error messages were updated to convey that 1) old-style functions have been deprecated and 2) this access pattern was once valid.
- Updates variable -> Tensor in some error messages.
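For reference, the supported (new-style) pattern defines static methods and calls `.apply()` on the class rather than instantiating it (the `Exp` function below is just illustrative):
```
import torch

class Exp(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        out = x.exp()
        ctx.save_for_backward(out)
        return out

    @staticmethod
    def backward(ctx, grad_output):
        (out,) = ctx.saved_tensors
        return grad_output * out

x = torch.randn(3, requires_grad=True)
y = Exp.apply(x)   # do not write Exp()(x); instantiation is deprecated
```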
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57357
Reviewed By: mrshenli
Differential Revision: D28193095
Pulled By: soulitzer
fbshipit-source-id: f021b105e9a3fd4a20d6ee3dfb6a06a8c34b10ca
Summary:
This makes detach both forward and backward non-differentiable by default.
You can pass the `only_backward_mode=True` argument to make it forward differentiable but backward non-differentiable.
The important side effect of this change is that, by default, detach is not tracking any view information.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57820
Reviewed By: ezyang
Differential Revision: D28287633
Pulled By: albanD
fbshipit-source-id: bdc4726fcd05889f6ac84e5a3a3ef71b2ec41015
Summary:
This PR also removes qr and eig tests from test/test_torch.py. They were not skipped if compiled without LAPACK and they are now replaced with OpInfos.
Fixes https://github.com/pytorch/pytorch/issues/55929
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56284
Reviewed By: ejguan
Differential Revision: D27827077
Pulled By: mruberry
fbshipit-source-id: 1dceb955810a9fa34bb6baaccbaf0c8229444d3a
Summary:
The problem arises for sinc'(x) where x != 0 but x ** 2 == 0, which happens for some very small floats.
I realized that my solution from https://github.com/pytorch/pytorch/issues/56763 was incomplete when I did a quick implementation using `torch.autograd.Function` and still got a `NaN` from my derivative.
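A minimal sketch of the failure mode (the value is chosen so that x != 0 but x * x underflows to 0 in float32):
```
import torch

x = torch.tensor([1e-30], dtype=torch.float32, requires_grad=True)
print(x != 0)     # tensor([True])
print(x * x)      # tensor([0.]) -- squaring underflows
torch.sinc(x).backward()
print(x.grad)     # finite after this fix (previously the derivative could be NaN)
```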
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56986
Reviewed By: gchanan
Differential Revision: D28093507
Pulled By: albanD
fbshipit-source-id: 2a30e1065b08c5c60de843a0778dedeb0fb295f4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54153
Currently, sparse tensors only support real floating point tensors. Complex support is added in this PR for CPU/CUDA.
- [x] add complex support (torch.cfloat and torch.cdouble) to torch.sparse_coo_tensor constructors
- [x] add complex support to coalesce function
- [x] add complex support to to_dense function
- [x] add complex support to to_sparse function
- [x] add complex support to sparse_add function
- [x] add unit tests
Note: This PR contains only complex support for the torch.sparse_coo_tensor forward function and the related ops used with this function (coalesce, to_dense, to_sparse, and sparse_add). The following PRs in the ghstack should cover other sparse operations to provide more complete complex sparse support, specifically related to the use of specific APIs for accelerated linear algebra.
Note: Before using ghstack the original PR was #50984
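A small sketch of the newly supported dtypes with the ops covered here:
```
import torch

i = torch.tensor([[0, 1, 1]])
v = torch.tensor([1 + 2j, 3 - 1j, 0.5j], dtype=torch.cfloat)
s = torch.sparse_coo_tensor(i, v, (2,))
sc = s.coalesce()            # complex coalesce
d = sc.to_dense()            # complex to_dense
s2 = (s + s).coalesce()      # complex sparse add
back = d.to_sparse()         # complex to_sparse
```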
Test Plan: Imported from OSS
Reviewed By: H-Huang
Differential Revision: D27765618
Pulled By: ezyang
fbshipit-source-id: a9cdd31d5c7a7dafd790f6cc148f3df26e884c89
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55692
### Release notes
`get_numerical_jacobian` and `get_analytical_jacobian` now only support `grad_out=1`, and `fn` no longer accepts functions that return complex output
Test Plan: Imported from OSS
Reviewed By: H-Huang
Differential Revision: D28004614
Pulled By: soulitzer
fbshipit-source-id: 9592c9c69584b4035b39be62252f138dce39d3b5
Summary:
Adding cuda synchronization when entering and exiting the profiler
context manager
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56651
Test Plan: CI
Reviewed By: gdankel
Differential Revision: D27926270
Pulled By: ilia-cher
fbshipit-source-id: 5cf30128590c1c71a865f877578975c4a6e2cb48
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55656
### For release notes
What:
- All errors that are silenced by "raise_exception=False" are now GradcheckError (which inherits from RuntimeError).
Why:
- Due to a refactor of gradcheck
Workaround:
- If you catch 'RuntimeError' with `except RuntimeError`, no changes are necessary since GradcheckError inherits from RuntimeError. However, if you explicitly check the error's type via `type(error)`, you'll need to update your code to check for `GradcheckError` instead.
Factors out all the logic involving `fail_test` and `raise_exception` into 1) a wrapper around gradcheck that uses try/except and 2) a gradcheck helper that always raises exceptions.
This allows us to avoid having to write the `if not x: return False` logic that is scattered throughout gradcheck currently.
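A sketch of how the new error type surfaces (the deliberately wrong backward in `WrongGrad` is just for illustration):
```
import torch
from torch.autograd.gradcheck import GradcheckError

class WrongGrad(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        return 2 * x

    @staticmethod
    def backward(ctx, grad_output):
        return 3 * grad_output   # deliberately wrong: analytical != numerical

x = torch.randn(3, dtype=torch.double, requires_grad=True)
try:
    torch.autograd.gradcheck(WrongGrad.apply, (x,))
except GradcheckError:           # also caught by `except RuntimeError`
    print("gradcheck failed as expected")
```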
Test Plan: Imported from OSS
Reviewed By: albanD
Differential Revision: D27920809
Pulled By: soulitzer
fbshipit-source-id: 253aef6d9a3b147ee37a6e37a4ce06437981929a
Summary:
Temporary fix to give people extra time to finish the deprecation.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56401
Reviewed By: xw285cornell, drdarshan
Differential Revision: D27862196
Pulled By: albanD
fbshipit-source-id: ed460267f314a136941ba550b904dee0321eb0c6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54480
This PR shouldn't really change the behavior of gradcheck for most ops. However, the changes in test_autograd allow us to run basic checks for both fast and slow (instead of previously just slow). All it should be doing is wrapping the preexisting tests we introduced in prior PRs in a function which takes `fast_mode` as a param. We then call this function twice, once with `fast_mode=True` and once with `fast_mode=False`.
Plan for rollout:
- This PR should only land the code (and runs some basic checks as described above).
- This should help us verify that a) slow is still working as expected b) basic functionality of fast works
- After we land this, but before we run the next PR in the stack, we should land https://github.com/pytorch/pytorch/pull/55182. This is to ensure that there is no gap where the slow tests aren't running.
- The next PR is responsible for enabling the fast_mode=True flag on all tests (where the function has real inputs/outputs), and selectively disabling it for the cases that fail.
- Finally in a later PR, we reenable fast-gradcheck for functions w/ complex inputs/outputs
TODOs and open questions (not necessarily blocking this PR):
- ~How do we think about atol/rtol~ (scale atol, keep rtol as-is)
- ~reenable fast-gradcheck for complex numbers~
- ~when inputs are uncoalesced we don't truly test this case because we coalesce the inputs before calling function. Revisit this when https://github.com/pytorch/pytorch/pull/52874/files is landed~
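For reference, the new flag is just a keyword argument on the public API (minimal sketch):
```
import torch

x = torch.randn(4, dtype=torch.double, requires_grad=True)
torch.autograd.gradcheck(torch.sin, (x,), fast_mode=True)   # fast path
torch.autograd.gradcheck(torch.sin, (x,), fast_mode=False)  # slow (default) path
```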
### Developer Experience
Sample output when jacobian mismatch occurs:
```
Traceback (most recent call last):
File "/home/s/local/pytorch4/test/test_autograd.py", line 4220, in test_gradcheck_jacobian_mismatch
check(fast_mode=True)
File "/home/s/local/pytorch4/test/test_autograd.py", line 4196, in check
gradcheck(fn, (x,), fast_mode=fast_mode)
File "/home/s/local/pytorch4/torch/testing/_internal/common_utils.py", line 2067, in gradcheck
return torch.autograd.gradcheck(fn, inputs, **kwargs)
File "/home/s/local/pytorch4/torch/autograd/gradcheck.py", line 1020, in gradcheck
if not fast_gradcheck(fail_test, seeded_func, func_out, tupled_inputs, outputs, eps, rtol,
File "/home/s/local/pytorch4/torch/autograd/gradcheck.py", line 915, in fast_gradcheck
return fail_test(get_notallclose_msg(a, n, i, j, prefix) + jacobians_str)
File "/home/s/local/pytorch4/torch/autograd/gradcheck.py", line 996, in fail_test
raise RuntimeError(msg)
RuntimeError: Jacobian mismatch for output 0 with respect to input 0,
numerical:tensor(0.9195)
analytical:tensor(0.9389)
The above quantities relating the numerical and analytical jacobians are computed
in fast mode. See: https://github.com/pytorch/pytorch/issues/53876 for more background
about fast mode. Below, we recompute numerical and analytical jacobians in slow mode:
Numerical:
tensor([[1.0000, 0.0000, 0.0000, 0.0000],
[0.0000, 1.0000, 0.0000, 0.0000],
[0.0000, 0.0000, 1.0000, 0.0000],
[0.0000, 0.0000, 0.0000, 1.0000]])
Analytical:
tensor([[1.0100, 0.0100, 0.0100, 0.0100],
[0.0100, 1.0100, 0.0100, 0.0100],
[0.0100, 0.0100, 1.0100, 0.0100],
[0.0100, 0.0100, 0.0100, 1.0100]])
The max per-element difference (slow mode) is: 0.010000000000054632.
```
Additionally, if the per-element difference is small, i.e., `allclose(analytical_slow, numerical_slow, rtol, atol) is True`, we follow up with this message:
```
Fast gradcheck failed but element-wise differences are small. This means that the
test might've passed in slow_mode!
If you are adding a new operator, please file an issue and then use one of the
workarounds. The workaround depends on how your test invokes gradcheck/gradgradcheck.
If the test
- manually invokes gradcheck/gradgradcheck, then call gradcheck/gradgradcheck
with `fast_mode=False` as a keyword argument.
- is OpInfo-based (e.g., in test_ops.py), then modify the OpInfo for the test
to have `gradcheck_fast_mode=False`
- is a Module test (e.g., in common_nn.py), then modify the corresponding
module_test entry to have `gradcheck_fast_mode=False`
```
Test Plan: Imported from OSS
Reviewed By: walterddr, ejguan
Differential Revision: D27825160
Pulled By: soulitzer
fbshipit-source-id: 1fe60569d8b697c213b0d262a832622a4e9cf0c7
Summary:
Reland of https://github.com/pytorch/pytorch/pull/49098
See original issue for details.
The only difference with the previous PR is the fix of the _embedding_bag_dense_backward formula to stop declaring a backward formula for an argument that does not exist.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56083
Reviewed By: samestep
Differential Revision: D27778221
Pulled By: albanD
fbshipit-source-id: 159ef91ca931ef2ccfbc3d1c46c7880c32919dc9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54378
### For release notes
`torch.autograd.gradcheck.get_numerical_jacobian` (not part of the public api) is being deprecated.
In the future, user code relying on this function will break because, among other changes, `get_numerical_jacobian` now returns `List[Tuple[torch.Tensor]]` instead of `List[torch.Tensor]`.
(more details if necessary)
For a `fn` that takes in M inputs and N outputs we now return a list of M N-tuples of jacobians, where `output[i][j]` represents the numerical jacobian of the jth output w.r.t. the ith input. Previously `get_numerical_jacobian` returned a list of tensors where each tensor represents the jacobian w.r.t. each of the M inputs for a specific output. Finally, the function passed in as the parameter `fn` should expect to handle individual parameters, whereas previously `fn` was required to expect its parameters wrapped in a tuple.
--- end --
This PR addresses the comment here https://github.com/pytorch/pytorch/pull/53857#discussion_r595429639, to reduce the run-time of old gradcheck's `get_numerical_jacobian` by a factor of num_outputs. However, because very few ops actually return multiple outputs, there is not too much real speed up here.
The main benefit of doing this change as part of the refactor is that it helps us isolate the possible bugs that are specific to switching `get_numerical_jacobian` to run in a per-output way vs all outputs at once. Much of the logic implemented here will be the same for the fast gradcheck case, so knowing for certain that everything should pass after this stage will make the next step much simpler.
The get_numerical_jacobian api is also being used in common_nn. So we update the callsite there as well.
Test Plan: Imported from OSS
Reviewed By: jbschlosser
Differential Revision: D27728720
Pulled By: soulitzer
fbshipit-source-id: ee0f90b4f26ddc5fdbe949c4965eaa91c9ed0bb8
Summary:
There are a few autograd tests checking for tensors leaked by reference cycles. This changes them to use `_WeakTensorRef` over `weakref`. `_WeakTensorRef`, added in https://github.com/pytorch/pytorch/issues/52874, accesses the C++ level `TensorImpl` reference count, whereas `weakref` accesses Python refcounts and so can only tell whether the Python wrapper object gets deallocated. Not only is this less code, it also more accurately detects that the Tensor itself is deallocated.
I didn't touch `weakref` usage in [test_anomaly_assign_parent_cleanup](fc349cbcde/test/test_autograd.py (L3733)) and [test_nested_anomaly_printstack_cleanup](fc349cbcde/test/test_autograd.py (L3772)) because these are intentionally testing for python object cleanup.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55726
Reviewed By: ngimel
Differential Revision: D27718526
Pulled By: albanD
fbshipit-source-id: 37a4914360e35dd4ae8db06b29525cebec4d4b84
Summary:
Fixes https://github.com/pytorch/pytorch/issues/53651
I did not put much effort in improving the docs, as I will go over all these docs in future PRs
cc anjali411
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55085
Reviewed By: nikithamalgifb
Differential Revision: D27493604
Pulled By: anjali411
fbshipit-source-id: 413363013e188bc869c404b2d54ce1f87eef4425
Summary:
Fixes https://github.com/pytorch/pytorch/issues/52253
In the issue reproducer we can replace `torch.sparse.sum(S)` with `S.coalesce()` and get the same memory leak. The reason is that calling `coalesce()` on an already coalesced tensor returns `self`. With autograd, the result gets its `grad_fn` set to a node that contains a reference to the input tensor, creating a reference cycle. Cloning the tensor fixes this, so `coalesce` always returns a new tensor.
As an aside, `torch.sparse.sum(S)` doesn't need to coalesce. The result should be the same either way.
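A rough sketch of the pattern that leaked (hedged; see the linked issue for the full reproducer):
```
import torch

i = torch.tensor([[0, 1]])
v = torch.tensor([1.0, 2.0])
S = torch.sparse_coo_tensor(i, v, (2,), requires_grad=True)
Sc = S.coalesce()
# Before this fix, calling coalesce() on the already-coalesced Sc returned Sc
# itself, while autograd attached a grad_fn referencing the input (also Sc),
# creating a reference cycle. coalesce now clones and returns a new tensor.
T = Sc.coalesce()
```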
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52874
Reviewed By: bdhirsh
Differential Revision: D27246997
Pulled By: albanD
fbshipit-source-id: 0fe6c11043501a7874a50982afd42964f47470d3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53916
This PR fixes some bugs that are made more clear by the previous refactor.
- make sure gradcheck returns false when it's supposed to fail and when raise_exception=False.
- make sure that when test_batched_grad fails, it returns false when raise_exception=False
Removing checkIfNumericalAnalyticAreClose made sense here to me because underneath it's really doing `torch.allclose`, and using that directly instead of adding another opaque function to call seemed to make the code more clear.
TODO:
- ~add a test to see if when torch.allclose fails, we indeed return false.~
- ~uncomment test from previous PR.~
Test Plan: Imported from OSS
Reviewed By: heitorschueroff
Differential Revision: D27201692
Pulled By: soulitzer
fbshipit-source-id: 8b8dc37c59edb7eebc2e8db6f8839ce98a81d78b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53857
This PR basically just factors a lot of the logic out from the main gradcheck function into individual functions. It aims to avoid any behavior change (but we may not have enough tests to actually verify this). Refactorings that lead to any behavior change are done in the next PR in this stack.
The rationale for this change is 1) to make the main gradcheck function cleaner to read, and 2) also allow us to reuse the same pieces when we add the fast gradcheck.
Maybe this PR is also a good place to add some tests for gradcheck, i.e., make sure gradcheck fails when it should fail, so as to make sure that we are indeed not changing any logic. This will also help us make sure our fast_gradcheck does all the necessary checks:
So far existing tests are:
- test_gradcheck_fail_when_no_differentiable_outputs_and_num_grad_not_zero (test_autograd)
- test_gradcheck_single_input (test_autograd)
- test_gradcheck_sparse_input (test_autograd)
- test_gradcheck_nondeterministic (test_autograd)
- test_gradcheck (test_overrides)
Full coverage would potentially require adding the following missing tests (each test run with both raise_exception=True and raise_exception=False). The methodology for getting the list below is that for every type of error message we spit out, we make sure we can hit it:
- complex:
- when numerical != analytical when tested with imag grad_out
- check_inputs
- ~when inputs are not dense, but check_sparse_nnz is false~
- ~when none of the inputs require grad~
- ~(warning) when inputs are not double precision~
- ~when layout is not mkldnn(aka has strides) and input has a dimension with stride 0.~
- check_no_differentiable_outputs:
- ~when none of the outputs are differentiable, but numerical gradient is not zero~
- check_outputs:
- ~when sparse outputs (always raise)~
- ~when mkldnn outputs (always raise)~
- test_batched_grad
- ~when encounter runtime error while computing batched grad (print big message)~
- when not allclose (print out big message)
- test_backward_mul_by_grad_output
- ~when layout of grad_input is not the same as input~
- ~when grad_input is sparse and has incorrect sparse_dim/dense_dim~
- ~when backward not multiplied by grad_output (sparse/non-sparse case)~
- when grad is incorrect type/size
- test_undefined_grad
- ~when encounter runtime error while running backward~
- when we complete backward but grad inputs (the output of .grad()) is not none
- check_analytical_jacobian_attributes (for both complex/non complex)
- when grad input is incorrect dtype/size
Test Plan: Imported from OSS
Reviewed By: heitorschueroff
Differential Revision: D27201571
Pulled By: soulitzer
fbshipit-source-id: 86670a91e65740d57dd6ada7c6b4512786d15962
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52422
As mentioned in https://github.com/pytorch/pytorch/issues/52415,
`torch.utils.checkpoint` doesn't support checkpointing for functions which have
non-tensor inputs and outputs.
This PR resolves this issue by ensuring the autograd machinery ignores the
non-tensor inputs and outputs and processes the tensors accordingly.
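A minimal sketch of the newly supported pattern (the `fn` below is just illustrative):
```
import torch
from torch.utils.checkpoint import checkpoint

def fn(x, scale):              # mixes tensor and non-tensor inputs
    y = x * scale
    return y, "done"           # mixes tensor and non-tensor outputs

x = torch.randn(3, requires_grad=True)
y, msg = checkpoint(fn, x, 2.0)
y.sum().backward()
```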
ghstack-source-id: 124406867
Test Plan:
1) unit test
2) waitforbuildbot
Reviewed By: albanD
Differential Revision: D26507228
fbshipit-source-id: 0a5a1591570814176185362e83ad18dabd9c84b0
Summary:
Also updates the doc such that the language matches the type. For example, previously the `tensors` argument was specified as `(sequence of tensor)` but had a type annotation of `_TensorOrTensors`. Now it's correctly updated to be `Sequence[Tensor] or Tensor`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53827
Reviewed By: albanD
Differential Revision: D26997541
Pulled By: soulitzer
fbshipit-source-id: e1e609a4e9525139d0fe96f6157175481c90d6f8
Summary:
As per title. Compared to the previous version, it is lighter on the usage of `at::solve` and `at::matmul` methods.
Fixes https://github.com/pytorch/pytorch/issues/51621
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52875
Reviewed By: mrshenli
Differential Revision: D26768653
Pulled By: anjali411
fbshipit-source-id: aab141968d02587440128003203fed4b94c4c655
Summary:
When the saved variable is an output, its grad_fn is not saved in the SavedVariable, so it must be passed in during `unpack`.
Here, we can always pass in grad_fn (whether or not the saved variable is an output) because it is ignored when the saved variable is not an output.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53205
Reviewed By: gchanan, zhangguanheng66
Differential Revision: D26794365
Pulled By: soulitzer
fbshipit-source-id: e039baba20c364c4ab42ff99d0b242dd95c67fb3
Summary:
This PR adds functionality to skip a test based on CUDA version.
This way, we can be more specific when skipping a test, such as when the test only fails for a particular CUDA version.
This allows us to add back the skipped tests for CUDA 11.2 for other CUDA versions, such as 10.1 and 11.1.
I tested this locally (by using 11.0 instead of 11.2), but will run all the CI to make sure it works.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52359
Reviewed By: walterddr
Differential Revision: D26487951
Pulled By: janeyx99
fbshipit-source-id: 45c71cc6105ffd9985054880009cf68ea5ef3f6a
Summary:
Fixes https://github.com/pytorch/pytorch/issues/39784
At the time the issue was filed, there was only issue (1) below.
There are actually now two issues here:
1. We always set all inputs passed in through `inputs` arg as `needed = True` in exec_info. So if we pass in an input that has a grad_fn that is not materialized, we create an entry of exec_info with nullptr as key with `needed = True`. Coincidentally, when we perform simple arithmetic operations, such as "2 * x", one of the next edges of mul is an invalid edge, meaning that its grad_fn is also nullptr. This causes the discovery algorithm to set all grad_fns that have a path to this invalid_edge as `needed = True`.
2. Before the commit that enabled the engine skipped the dummy node, we knew that root node is always needed, i.e., we hardcode `exec_info[&graph_root]=true`. The issue was that this logic wasn't updated after the code was updated to skip the graph root.
To address (1), instead of passing in an invalid edge if an input in `inputs` has no grad_fn, we create a dummy grad_fn. This is done in both python and cpp entry points. The alternative is to add logic for both backward() and grad() cases to check whether the grad_fn is nullptr and set needed=false in that case (the .grad() case would be slightly more complicated than the .backward() case here).
For (2), we perform one final iteration of the discovery algorithm so that we really know whether we need to execute the graph root.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51940
Reviewed By: VitalyFedyunin
Differential Revision: D26369529
Pulled By: soulitzer
fbshipit-source-id: 14a01ae7988a8de621b967a31564ce1d7a00084e
Summary:
Adding CUDA 11.2 to Windows CI.
Disabled tests:
The following ran into `CUDA error: misaligned address` for CUDA 11.2: (issue linked below)
`test_where_scalar_valid_combination_cuda_complex128` in test_torch.py
`test_sgn_complex_cuda` in test_autograd.py
The following ran into `CUDA error: too many resources requested for launch` for CUDA 11.2: (https://github.com/pytorch/pytorch/issues/52002)
test_EmbeddingBag_per_sample_weights_and_new_offsets_cuda_int64_float64
test_EmbeddingBag_per_sample_weights_and_offsets_cuda_int64_float64
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51598
Reviewed By: mrshenli
Differential Revision: D26344965
Pulled By: janeyx99
fbshipit-source-id: 3c9a4ed16d748969e96593220ec0a9f33e1ffcef
Summary:
Fixes flake8 failures in test_autograd.py by using `gradcheck` from `torch.testing._internal.common_utils` rather than directly from`torch.autograd.gradcheck`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51963
Reviewed By: albanD
Differential Revision: D26339107
Pulled By: malfet
fbshipit-source-id: 63e0f12df16b70e394097ad88852984c1848a9e6
Summary:
Fixes https://github.com/pytorch/pytorch/issues/51349
The memory leak happens when 1) `create_graph` is True AND 2) detect anomaly mode is on. When a backward node's constructor is called during backward, the current evaluating node is assigned as a "parent" of the created node. The code that assigns the parent encounters the below issue:
`functionToPyObject(parent_node)` returns a new PyObject (with refcount 1) or if PyObject already exists, increments its refcount by 1. However [PyDict_SetItem](1b55b65638/Objects/dictobject.c (L1532)) calls into [insertdict](https://github.com/python/cpython/blob/v3.8.1/Objects/dictobject.c#L1034) which increments refcount again. This means that when dict is destroyed, the refcount of the PyObject is at least one. This keeps `parent_node` (the backward function) alive, which then keeps the saved tensor alive.
Similar calls in the codebase to `functionToPyObject` won't require Py_DECREF if it is then passed into a tuple (instead of dict), because the analogous PyTuple_SetItem call does not increment refcount.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51610
Reviewed By: albanD
Differential Revision: D26240336
Pulled By: soulitzer
fbshipit-source-id: 2854528f66fab9dbce448f8a7ba732ce386a7310
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51421
Mark memory events that did not happen within an operator context
explicitly in the profiler output.
Test Plan: python test/test_profiler.py -k test_memory_profiler
Reviewed By: ngimel
Differential Revision: D26166518
Pulled By: ilia-cher
fbshipit-source-id: 3c14d3ac25a7137733ea7cc65f0eb48693a98f5e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51638
This PR makes the following doc changes:
- Makes it clear to users that they should use vectorize "at their own
risk"
- Makes it clear that vectorize uses the "experimental prototype vmap"
so that when users see error messages related to vmap they will know
where it is coming from.
This PR also:
- makes it so that {jacobian, hessian} call a version of vmap that
doesn't warn the user that they are using an "experimental prototype".
The regular torch.vmap API does warn the user about this. This is to
improve the UX a little because the user already knows, from discovering
the flag and reading the docs, what they are getting themselves into.
Test Plan:
- Add test that {jacobian, hessian} with vectorize=True don't raise
warnings
Reviewed By: albanD
Differential Revision: D26225402
Pulled By: zou3519
fbshipit-source-id: 1a6db920ecf10597fb2e0c6576f510507d999c34
Summary:
Fixes https://github.com/pytorch/pytorch/issues/49756
## Background
The fix applied here is to remove the grad-enabled check from `collect_next_edges` and unconditionally return the actual collected edges. This pushes the responsibility for determining whether the function should be called without grad mode to its call sites. With this update, `collect_next_edges` will no longer incorrectly return an empty list, which caused the problem described in the issue. Three call sites depended on this behavior and have been updated.
Beyond bad printing side effects, this fix addresses the more general issue of accessing `grad_fn` with grad mode disabled after an in-place operation on a view. The included test verifies this without the use of print.
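A rough sketch of the access pattern covered by the new test (hedged illustration; the included test verifies this without print):
```
import torch

base = torch.randn(3, requires_grad=True).clone()
view = base[:2]          # a differentiable view
view.mul_(2)             # in-place op: the view's grad_fn is rebased lazily
with torch.no_grad():
    # Accessing grad_fn here (e.g. when printing the tensor) used to build an
    # incorrect graph because collect_next_edges returned an empty edge list
    # under no_grad; it now returns the real edges unconditionally.
    print(view.grad_fn)
```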
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51364
Test Plan:
```
python test/test_autograd.py TestAutogradDeviceTypeCPU.test_inplace_view_then_no_grad_cpu
```
Reviewed By: zou3519
Differential Revision: D26190451
Pulled By: jbschlosser
fbshipit-source-id: 9b004a393463f8bd4ac0690e5e53c07a609f87f0
Summary:
Fixes https://github.com/pytorch/pytorch/issues/49824
## Background
When creating a view of a view, there was a possibility that the new view would be less restrictive than the previous view, incorrectly sidestepping the error that should be thrown when using in-place operations on the new view.
The fix addresses this by propagating `CreationMeta` from the previous view to the new view. Currently, the old view's `creation_meta` is only propagated when the new view's `creation_meta == CreationMeta::DEFAULT`. This ensures that the new view is not less restrictive than the previous view wrt. allowing in-place operations.
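A minimal sketch of the restriction being preserved across a view of a view:
```
import torch

base = torch.randn(2, 2, requires_grad=True)
a, b = base.unbind(0)    # views from a multi-output op: in-place ops are forbidden
c = a[:]                 # a view of that view
# Before this fix, `c` could be less restrictive than `a`; the creation metadata
# is now propagated, so the same in-place error is still raised:
# c.mul_(2)   # RuntimeError
```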
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51061
Test Plan:
```
python test/test_autograd.py TestAutogradDeviceTypeCPU.test_inplace_view_of_multiple_output_view_cpu
python test/test_autograd.py TestAutogradDeviceTypeCUDA.test_inplace_view_of_multiple_output_view_cuda
python test/test_autograd.py TestAutogradDeviceTypeCPU.test_inplace_multiple_output_view_of_view_cpu
python test/test_autograd.py TestAutogradDeviceTypeCUDA.test_inplace_multiple_output_view_of_view_cuda
```
Reviewed By: heitorschueroff
Differential Revision: D26076434
Pulled By: jbschlosser
fbshipit-source-id: c47f0ddcef9b8449427b671aff9ad08edca70fcd
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50915
Fixes #50584
Add a vectorize flag to torch.autograd.functional.jacobian and
torch.autograd.functional.hessian (default: False). Under the hood, the
vectorize flag uses vmap as the backend to compute the jacobian and
hessian, respectively, providing speedups to users.
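A short usage sketch of the new flag (the function `f` is just illustrative):
```
import torch
from torch.autograd.functional import jacobian, hessian

def f(x):
    return (x * x).sum()

x = torch.randn(5)
J = jacobian(f, x, vectorize=True)   # vmap-backed batched gradients under the hood
H = hessian(f, x, vectorize=True)
```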
Test Plan:
- I updated all of the jacobian and hessian tests to also use
vectorized=True
- I added some simple sanity check tests that check e.g. jacobian with
vectorized=False vs
jacobian with vectorized=True.
- The mechanism for vectorized=True goes through batched gradient
computation. We have separate tests for those (see other PRs in this
stack).
Reviewed By: heitorschueroff
Differential Revision: D26057674
Pulled By: zou3519
fbshipit-source-id: a8ae7ca0d2028ffb478abd1b377f5b49ee39e4a1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50615
The method tests for some of the ops have been ported to the new OpInfo based tests. This PR removes those op names from `complex_list` in `test_autograd.py`
Test Plan: Imported from OSS
Reviewed By: mruberry
Differential Revision: D25931268
Pulled By: anjali411
fbshipit-source-id: 4d08626431c61c34cdca18044933e4f5b9b25232
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33884
Mitigates https://github.com/pytorch/pytorch/issues/5261.
It's not possible for us to support cudnn RNN double backwards due to
limitations in the cudnn API. This PR makes it so that we raise an error
message if users try to get the double backward on a cudnn RNN; in the
error message we suggest using the non-cudnn RNN.
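A hedged sketch of the workaround the error message points to (requires CUDA + cuDNN):
```
import torch

rnn = torch.nn.LSTM(4, 4).cuda()
x = torch.randn(3, 1, 4, device="cuda", requires_grad=True)

out, _ = rnn(x)
(gx,) = torch.autograd.grad(out.sum(), x, create_graph=True)
# gx.sum().backward()   # double backward through the cudnn RNN now raises

with torch.backends.cudnn.flags(enabled=False):
    out, _ = rnn(x)      # fall back to the non-cudnn RNN, as the error suggests
```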
Test Plan: - added some tests to check the error message
Reviewed By: albanD
Differential Revision: D20143544
Pulled By: zou3519
fbshipit-source-id: c2e49b3d8bdb9b34b561f006150e4c7551a78fac
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50592
This adds a `check_batched_grad=False` option to gradcheck and gradgradcheck.
It defaults to False because gradcheck is a public API and I don't want
to break any existing non-pytorch users of gradcheck.
This:
- runs grad twice with two grad outputs, a & b
- runs a vmapped grad with torch.stack([a, b])
- compares the results of the above against each other.
Furthermore:
- `check_batched_grad=True` is set to be the default for
gradcheck/gradgradcheck inside of test_autograd.py. This is done by
reassigning to the gradcheck object inside test_autograd
- I manually added `check_batched_grad=False` to gradcheck instances
that don't support batched grad.
- I added a denylist for operations that don't support batched grad.
Question:
- Should we have a testing only gradcheck (e.g.,
torch.testing.gradcheck) that has different defaults from our public
API, torch.autograd.gradcheck?
Future:
- The future plan for this is to repeat the above for test_nn.py (the
autogenerated test will require a denylist)
- Finally, we can repeat the above for all pytorch test files that use
gradcheck.
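A minimal sketch of the new option:
```
import torch

x = torch.randn(3, dtype=torch.double, requires_grad=True)
torch.autograd.gradcheck(torch.sin, (x,), check_batched_grad=True)
torch.autograd.gradcheck(torch.sin, (x,), check_batched_grad=False)  # the default
```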
Test Plan: - run tests
Reviewed By: albanD
Differential Revision: D25925942
Pulled By: zou3519
fbshipit-source-id: 4803c389953469d0bacb285774c895009059522f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50632
I'll port the following method tests in follow-up PRs:
`'baddbmm', 'addbmm', 'addmv', 'addr'`
After the tests are ported to OpInfo based tests, it would also be much easier to add tests with complex alpha and beta values.
Edit- it seems like it's hard to port the broadcasting variant tests because one ends up skipping `test_inplace_grad` and `test_variant_consistency_eager` even for the case when inputs are not required to be broadcasted.
Test Plan: Imported from OSS
Reviewed By: navahgar
Differential Revision: D25947471
Pulled By: anjali411
fbshipit-source-id: 9faa7f1fd55a1269bad282adac2b39d19bfa4591
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49120
This adds a `check_batched_grad=False` option to gradcheck and gradgradcheck.
It defaults to False because gradcheck is a public API and I don't want
to break any existing non-pytorch users of gradcheck.
This:
- runs grad twice with two grad outputs, a & b
- runs a vmapped grad with torch.stack([a, b])
- compares the results of the above against each other.
Furthermore:
- `check_batched_grad=True` is set to be the default for
gradcheck/gradgradcheck inside of test_autograd.py. This is done by
reassigning to the gradcheck object inside test_autograd
- I manually added `check_batched_grad=False` to gradcheck instances
that don't support batched grad.
- I added a denylist for operations that don't support batched grad.
Question:
- Should we have a testing only gradcheck (e.g.,
torch.testing.gradcheck) that has different defaults from our public
API, torch.autograd.gradcheck?
Future:
- The future plan for this is to repeat the above for test_nn.py (the
autogenerated test will require a denylist)
- Finally, we can repeat the above for all pytorch test files that use
gradcheck.
Test Plan: - run tests
Reviewed By: albanD
Differential Revision: D25563542
Pulled By: zou3519
fbshipit-source-id: 125dea554abefcef0cb7b487d5400cd50b77c52c
Summary:
Fixes https://github.com/pytorch/pytorch/issues/47671
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49272
Test Plan:
```
x = torch.tensor([-2, -1, 0, 1, 2], dtype=torch.float32, requires_grad=True)
y = torch.nn.functional.elu_(x.clone(), alpha=-2)
grads = torch.tensor(torch.ones_like(y))
y.backward(grads)
```
```
RuntimeError: In-place elu backward calculation is triggered with a negative slope which is not supported.
This is caused by calling in-place forward function with a negative slope, please call out-of-place
version instead.
```
Reviewed By: albanD
Differential Revision: D25569839
Pulled By: H-Huang
fbshipit-source-id: e3c6c0c2c810261566c10c0cc184fd81b280c650
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49552
This PR:
1. Migrates independent autograd test for `hstack`, `dstack`, `vstack`, `movedim`, `moveaxis` from `test_autograd.py` to the new `OpInfo` based tests.
2. Migrates autograd test for `gather`, `index_select` from the method_tests to the new `OpInfo` based tests.
3. Enables complex backward for `stack`, `gather`, `index_select`, `index_add_` and adds tests for complex autograd for all the above mentioned ops.
Test Plan: Imported from OSS
Reviewed By: mruberry
Differential Revision: D25682511
Pulled By: anjali411
fbshipit-source-id: 5d8f89db4a9ec340ab99a6196987d44a23e2c6c6