Summary: A bisect blamed #93333 for GPU memory leakage. This diff backs it out.
Test Plan: Monitor max GPU memory usage to see if there's a leak.
Reviewed By: hyuen, yinbinm
Differential Revision: D43511893
Pull Request resolved: https://github.com/pytorch/pytorch/pull/95565
Approved by: https://github.com/ngimel
Not only is this change usually shorter and more readable, it can also yield better performance. `size()` is not always a constant-time operation (for example, on linked lists), but `empty()` always is.
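As a minimal sketch of the kind of call site this rewrite targets (the container and function names here are purely illustrative, not taken from this PR):
```
#include <vector>

void process(const std::vector<int>& items) {
  // Before: if (items.size() == 0) return;
  // After: empty() states the intent directly and is always constant time.
  if (items.empty()) {
    return;
  }
  // ... work on items ...
}
```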
Pull Request resolved: https://github.com/pytorch/pytorch/pull/93236
Approved by: https://github.com/malfet
Part of #29137
**BC Breaking Note**
This PR breaks C++ API backward compatibility for `at::std`. A call that has argument types `at::std(Tensor, OptionalIntArrayRef, int64_t, bool)` used to resolve to the `std.correction` overload, but now it resolves to the `std.dim` overload. In order to call the `std.correction` overload, the `int64_t` argument can be wrapped in a `c10::optional`, so that the call has the form `at::std(Tensor, OptionalIntArrayRef, optional<int64_t>, bool)`. The same is true for the corresponding arguments of the `std.out` and `std.correction_out` overloads of `at::std_out`.
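A minimal sketch of the resolution change described above, assuming the argument lists given in the note (the tensor and dims values are illustrative):
```
#include <ATen/ATen.h>
#include <c10/util/Optional.h>
#include <vector>

void std_overloads() {
  at::Tensor t = at::randn({4, 5});
  std::vector<int64_t> dims = {0};

  // A plain int64_t third argument now resolves to the std.dim overload:
  auto r_dim = at::std(t, at::OptionalIntArrayRef(dims), int64_t{1}, /*keepdim=*/false);

  // To keep calling the std.correction overload, wrap the value in c10::optional:
  auto r_correction =
      at::std(t, at::OptionalIntArrayRef(dims), c10::optional<int64_t>(1), /*keepdim=*/false);
}
```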
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81845
Approved by: https://github.com/albanD
Summary:
Turn on layer_norm in autodiff
https://github.com/pytorch/pytorch/issues/67732 should have fixed the issue previously exposed by enabling layer_norm in autodiff.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69007
Reviewed By: soulitzer
Differential Revision: D32699108
Pulled By: eellison
fbshipit-source-id: 6951668c0e74e056d3776294f4e1fd3123c763e5
Summary:
Adds native_dropout to provide a reasonable target for TorchScript in autodiff. native_dropout has scale and train as arguments in its signature, which makes it more consistent with other operators and removes conditionals from the autodiff definition.
cc gmagogsfm
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63937
Reviewed By: mruberry
Differential Revision: D32477657
Pulled By: ngimel
fbshipit-source-id: d37b137a37acafa50990f60c77f5cea2818454e4
Summary:
Adds mixed precision autocasting support between fp32/fp16 to TorchScript/JIT. A more in-depth description can be found at [torch/csrc/jit/JIT-AUTOCAST.md](https://github.com/pytorch/pytorch/pull/63939/files#diff-1f1772aaa508841c5bb58b74ab98f49a1e577612cd9ea5c386c8714a75db830b)
This PR implements an autocast optimization pass that inserts casting ops per the AMP rules (torch/csrc/jit/passes/autocast.cpp), mimicking the behavior of eager autocast. The pass also takes the `torch.cuda.amp.autocast` context into account and only inserts casting ops within the enabled context manager, giving feature parity with eager AMP autocast.
We currently provide JIT AMP autocast as a prototype feature, so it is off by default and can be turned on via `torch._C._jit_set_autocast_mode(True)`.
The JIT support for autocast is subject to different constraints compared to the eager mode implementation (mostly related to the fact that TorchScript is statically typed); the restrictions on user-facing Python code are described in torch/csrc/jit/JIT-AUTOCAST.md.
This is a prototype; there are also implementation limitations that were necessary to keep this PR small and get something functioning quickly upstream, so we can iterate on designs.
A few limitations/challenges that are not properly resolved in this PR:
1. Autocast inserts cast operations, which affect the scalar type of the output tensors feeding downstream operations. We are not currently propagating the updated scalar types, which can give wrong results for operations covered by type promotion rules.
2. Backward for autodiff in JIT misses the cast of dgrad to the input scalar type that autograd performs in eager mode. This forces us to explicitly mark the casting operation for certain operations (e.g. binary ops); otherwise, we might feed a dgrad whose scalar type does not match the input, which could break gradient functions that consume dgrad (e.g. gemm backward, which assumes grad_output has the same scalar type as the input).
3. The `torch.autocast` API has an optional `dtype` argument that is not currently supported in JIT autocast; we require a static value.
Credit goes mostly to:
tlemo
kevinstephano
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63939
Reviewed By: navahgar
Differential Revision: D31093381
Pulled By: eellison
fbshipit-source-id: da6e26c668c38b01e296f304507048d6c1794314
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65967
Graph is an implementation detail. If a user wants access to the underlying graph, they should be able to explicitly dynamic cast instead.
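A hedged sketch of that pattern; the types `torch::jit::Function` / `torch::jit::GraphFunction`, the header path, and the `graph()` accessor are assumptions about the relevant JIT classes, not taken from this diff:
```
#include <torch/csrc/jit/api/function_impl.h>

// Callers that really need the graph downcast to the concrete graph-backed
// function type instead of going through the base Function interface.
std::shared_ptr<torch::jit::Graph> try_get_graph(torch::jit::Function& fn) {
  if (auto* graph_fn = dynamic_cast<torch::jit::GraphFunction*>(&fn)) {
    return graph_fn->graph();
  }
  return nullptr;  // not backed by a graph
}
```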
ghstack-source-id: 141659819
Test Plan: no behavior change.
Reviewed By: gmagogsfm
Differential Revision: D31326153
fbshipit-source-id: a0e984f57c6013494b92a7095bf5bb660035eb84
Summary:
Reland of https://github.com/pytorch/pytorch/pull/65242
The last attempt at the reland was automatically rebased onto stable, which did not yet have the revert commit.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66018
Reviewed By: albanD
Differential Revision: D31348822
Pulled By: soulitzer
fbshipit-source-id: 881d701b404530c1352ac9245bd67264e1652b8a
Summary:
Fixes https://github.com/pytorch/pytorch/issues/64000
- updates double backward formula to compute grad wrt output instead of self
- ~~In some of the error messages, we still refer to the dtype of the input, even though we are now checking the dtype of the output~~
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65242
Reviewed By: albanD
Differential Revision: D31238123
Pulled By: soulitzer
fbshipit-source-id: afd319d3676d9ef8d81607e0e8c2a3e6d09f68e4
Summary:
Turns on BN in autodiff:
1. outputs an empty tensor for running stats to bypass the autodiff issue with None;
2. fixes BN inference backward in cuDNN & MIOpen, where backward now falls back to the native batchnorm kernel instead;
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57321
Reviewed By: albanD, ngimel
Differential Revision: D30250419
Pulled By: jansel
fbshipit-source-id: a62553789c20fb50a820003a056f40d9d642dfaa
Summary:
1. extends autodiff by adding an entry for layer_norm in the symbolic script; we now use native_layer_norm_backward
2. adds a backward function `layernorm_double_backward` for `native_layer_norm_backward`, preserving double backward support for LayerNorm in autodiff/ScriptModule
3. adds a Python test to verify autodiff on layer_norm with various configurations of optional tensors (verifies the fix in https://github.com/pytorch/pytorch/issues/49430)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50467
Reviewed By: eellison
Differential Revision: D30232864
Pulled By: jansel
fbshipit-source-id: b9c33075386aff96afff7415df9f94388bfb474a
Co-authored-by: Ryan Spring <rspring@nvidia.com>
Co-authored-by: Jie <jiej@nvidia.com>
Summary:
Removes the `cppcoreguidelines-avoid-non-const-global-variables` clang-tidy check, as the GoogleTest `TEST` macro is non-compliant with it, as is `DEFINE_DISPATCH`.
All changes but the ones to `.clang-tidy` were generated using the following script:
```
# Delete the NOLINTNEXTLINE(cppcoreguidelines-avoid-non-const-global-variables) suppressions from every C/C++ source and header that contains them.
for i in `find . -type f -iname "*.c*" -or -iname "*.h" | xargs grep cppcoreguidelines-avoid-non-const-global-variables | cut -f1 -d: | sort | uniq`; do
  sed -i "/\/\/ NOLINTNEXTLINE(cppcoreguidelines-avoid-non-const-global-variables)/d" $i
done
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62008
Reviewed By: driazati, r-barnes
Differential Revision: D29838584
Pulled By: malfet
fbshipit-source-id: 1b2f8602c945bd4ce50a9bfdd204755556e31d13