In preparation for adopting future rocBLAS library options, it is necessary to track when the backward pass of training is executing. The scope-based helper class `BackwardPassGuard` is provided to toggle this state.
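The scope-based toggling idea can be illustrated with a small Python sketch. This is not the actual `BackwardPassGuard` (which is a C++ helper); the names `backward_pass_guard` and `is_backward_pass`, and the choice of thread-local storage, are assumptions made only for illustration.

```python
import threading
from contextlib import contextmanager

# Hypothetical per-thread flag; the real helper may store its state differently.
_state = threading.local()


def is_backward_pass() -> bool:
    """Return True while a guard is active on the current thread."""
    return getattr(_state, "backward", False)


@contextmanager
def backward_pass_guard():
    """Set the flag on entry and restore the previous value on exit."""
    previous = is_backward_pass()
    _state.backward = True
    try:
        yield
    finally:
        # Restoring (rather than clearing) keeps nested guards correct.
        _state.backward = previous
```

Inside a `with backward_pass_guard():` block, `is_backward_pass()` returns `True`; on exit, including exit via an exception, the previous value is restored.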
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71881
Approved by: https://github.com/albanD
This PR modifies `lu_unpack` by:
- Using less memory when unpacking `L` and `U`
- Fusing the subtraction of `1` with `unpack_pivots_stub`
- Defining tensors of the correct types to avoid copies
- Porting `lu_unpack` to be a structured kernel so that its `_out` version
does not incur extra copies
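To illustrate what fusing the index shift with pivot unpacking means, here is a pure-Python sketch. It assumes LAPACK-style 1-based pivots, where entry `i` says which row was swapped with row `i`; the function name `pivots_to_permutation` is hypothetical and this is not the kernel behind `unpack_pivots_stub`.

```python
def pivots_to_permutation(pivots, num_rows):
    """Turn 1-based row-swap pivots into a 0-based row order.

    The shift from 1-based to 0-based indices happens inside the swap loop,
    so no separate "pivots - 1" sequence is materialised first.
    """
    perm = list(range(num_rows))
    for i, pivot in enumerate(pivots):
        j = pivot - 1  # 1-based -> 0-based, fused into the same pass
        perm[i], perm[j] = perm[j], perm[i]
    return perm
```

For example, the pivots `[3, 3, 3]` on three rows swap row 0 with row 2, then row 1 with row 2, and finally leave row 2 in place.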
Then we implement `linalg.lu` as a structured kernel, as we want to
compute its derivative manually. We do so because composing the
derivatives of `torch.lu_factor` and `torch.lu_unpack` would be less efficient.
This new function and `lu_unpack` come with everything they could come with:
forward and backward AD, decent docs, correctness tests, OpInfo, complex support,
support for metatensors and support for vmap and vmap over the gradients.
I really hope we don't continue adding more features.
This PR also avoids saving some of the tensors that were previously
saved unnecessarily for the backward in `lu_factor_ex_backward` and
`lu_backward` and does some other general improvements here and there
to the forward and backward AD formulae of other related functions.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67833
Approved by: https://github.com/IvanYashchuk, https://github.com/nikitaved, https://github.com/mruberry
Fixes #73298
I don't know whether `where` kernel actually supports type promotion, nor am I in the mood to find out, so it's manual type promotion.
Edit: nah, I can't tell TI to "promote to common dtype" because of the bool condition, so manual type promotion is our only option.
I'll see what tests start failing and fix.
Uses some parts from #62084
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76691
Approved by: https://github.com/mruberry
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74226
Update signature of `scatter_reduce_` to match `scatter_/scatter_add_`
`Tensor.scatter_reduce_(int64 dim, Tensor index, Tensor src, str reduce)`
- Add new reduction options in ScatterGatherKernel.cpp and update `scatter_reduce` to call into the cpu kernel for `scatter.reduce`
- `scatter_reduce` now has the same shape constraints as `scatter_` and `scatter_add_`
- Migrate `test/test_torch.py:test_scatter_reduce` to `test/test_scatter_gather_ops.py`
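The semantics described above can be sketched in pure Python for the 1-D case. This is only an illustration, not the CPU kernel: the function name `scatter_reduce_1d` is hypothetical, the reduction names are assumed to mirror some of the supported options, and the sketch assumes the existing destination values take part in the reduction.

```python
import operator

# Assumed subset of reduction names, mapped to binary Python operations.
_REDUCTIONS = {
    "sum": operator.add,
    "prod": operator.mul,
    "amax": max,
    "amin": min,
}


def scatter_reduce_1d(dest, index, src, reduce):
    """Reduce src[i] into position index[i] of a copy of dest."""
    # Same shape constraint as scatter_/scatter_add_ in 1-D:
    # the index may not be longer than the source.
    if len(index) > len(src):
        raise ValueError("index must not be longer than src")
    op = _REDUCTIONS[reduce]
    out = list(dest)
    for i, target in enumerate(index):
        out[target] = op(out[target], src[i])
    return out
```

With `dest = [0, 0, 0]`, `index = [0, 1, 0, 1]` and `src = [1, 2, 3, 4]`, the `"sum"` reduction accumulates `1 + 3` into slot 0 and `2 + 4` into slot 1, leaving slot 2 untouched.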
Test Plan: Imported from OSS
Reviewed By: ngimel
Differential Revision: D35222842
Pulled By: mikaylagawarecki
fbshipit-source-id: 84930add2ad30baf872c495251373313cb7428bd
(cherry picked from commit 1b45139482e22eb0dc8b6aec2a7b25a4b58e31df)
The doc stated "If a dimension is not specified, the tensor will
be flattened", whereas the actual behavior is that the input tensor is
flattened only if the `dims` argument is not provided at all.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74880
Approved by: https://github.com/albanD
Summary:
This PR ports `index_copy` implementation to structured kernels, also adds an `out` variant.
~Note to the reviewers: This is in draft mode, waiting for the tests from the CI, and I'll give a final look before requesting the review.~
Issue tracker: https://github.com/pytorch/pytorch/issues/55070
cc: bdhirsh ysiraichi
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67329
Reviewed By: ejguan
Differential Revision: D34077219
Pulled By: bdhirsh
fbshipit-source-id: 6accda33957f654b753261c5c3d765a27a64d2c0
(cherry picked from commit f3ac83217a)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71486
This PR adds upgraders for linspace and linspace.out, as the optional `steps` argument will be deprecated soon. Old models will use a `steps` value of 100 when none is provided.
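The upgrader idea can be sketched in pure Python: a wrapper fills in the old default so that models serialized without `steps` keep their behaviour. The names `linspace` and `upgraded_linspace` below are hypothetical stand-ins, not the upgrader added by this PR.

```python
def linspace(start, end, steps):
    """Plain-Python stand-in: `steps` evenly spaced values from start to end."""
    if steps == 0:
        return []
    if steps == 1:
        return [float(start)]
    step = (end - start) / (steps - 1)
    # Emit the endpoint explicitly so rounding never misses it.
    return [start + i * step for i in range(steps - 1)] + [float(end)]


def upgraded_linspace(start, end, steps=None):
    """Hypothetical upgrader: old models that omitted `steps` get 100."""
    if steps is None:
        steps = 100
    return linspace(start, end, steps)
```

Calling `upgraded_linspace(0.0, 1.0)` yields 100 values, while an explicit `steps` is passed through unchanged.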
Test Plan: buck-out/gen/caffe2/test/jit#binary.par -r TestUpgraders.test_aten_linspace
Reviewed By: cccclai, mruberry
Differential Revision: D33654308
fbshipit-source-id: 0e0138091da0b11d4f49156eeb6bcd7e46102a5b
(cherry picked from commit 931ae4af32)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69908
I also took this chance to clarify the documentation of these
functions a bit.
cc brianjo mruberry
Test Plan: Imported from OSS
Reviewed By: anjali411
Differential Revision: D33774417
Pulled By: mruberry
fbshipit-source-id: ab4a9014006783d1f87d432ecb959c854374c2d4
(cherry picked from commit f319a75d78)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69827
In general, the current pattern allows for implementing optimisations
for all the backends in a common place (see for example the optimisation
for empty matrices).
After this PR, `torch.svd` is implemented in terms of `linalg.svd` and
`linalg.svdvals`, as expected. This makes it differentiable in the case
when `compute_uv=False`, although this is not particularly important, as
`torch.svd` will eventually be deprecated.
This PR also instantiates smaller `U` / `V` when calling cusolver_gesvdj
in the cases when `full_matrices=False` or `compute_uv=False`.
The memory for auxiliary `U` and `V` in the cases above, needed for some
cuSOLVER routines, is allocated using raw allocators rather than through fully
fledged tensors, as it's just a blob of memory the algorithm requests.
As the code is better structured now, it was easier to see that `U` and
`Vh` needn't be allocated when calling `svd_cusolver_gesvd`.
Now `linalg.svdvals` works as expected wrt the `out=` parameter.
Note that in the test `test_svd_memory_allocation` we were
passing a tensor of the wrong size and dtype and the test seemed to
pass...
This PR also changes the backward formula to avoid saving the input
matrix, as it's not necessary. In a follow-up PR, I will clean up the
backward formula and make it more numerically stable and efficient.
This PR also does a number of memory optimisations here and there, and fixes
the call to cusolver_gesvd, which was incorrect for m <= n. To test
this path, I compiled the code with a flag to unconditionally execute
the `if (!gesvdj_convergence_check.empty())` branch, and all the tests
passed.
I also took this chance to simplify the tests for these functions in
`test_linalg.py`, as we had lots of tests that were testing some
functionality that is already currently tested in the corresponding
OpInfos. I used xwang233's feature to test both MAGMA and CUDA
backends. This is particularly good for SVD, as cuSOLVER is always
chosen over MAGMA when available, so testing MAGMA otherwise would be
tricky.
cc jianyuh nikitaved pearu mruberry walterddr IvanYashchuk xwang233 Lezcano
Test Plan: Imported from OSS
Reviewed By: mikaylagawarecki
Differential Revision: D33751983
Pulled By: mruberry
fbshipit-source-id: 11d48d977946345583d33d14fb11a170a7d14fd2
(cherry picked from commit a1860bd567)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/65908
Added a new overload instead of updating the current signature. (Had issues with JIT and **maybe** it would have been FC breaking)
TODO:
* [x] Don't compute `std::pow(10, decimals)` for each element.
* [x] Update docs (https://docs-preview.pytorch.org/66195/generated/torch.round.html?highlight=round#torch.round)
* [x] Add tests
* ~~Should we try to make it composite?~~
* ~~Should we add specialized test with more values of `decimals` outside of OpInfo with larger range of values in input tensor?~~
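The first TODO item (not recomputing the power of ten per element) can be illustrated with a pure-Python sketch. The function name `round_decimals` is hypothetical, and this is not the kernel from the PR; it only shows the scale factor being hoisted out of the element loop. It relies on Python's built-in `round`, which rounds halves to even.

```python
def round_decimals(values, decimals=0):
    """Round each value to `decimals` decimal places (negative values allowed)."""
    # Computed once, outside the per-element loop.
    scale = 10.0 ** abs(decimals)
    if decimals >= 0:
        return [round(v * scale) / scale for v in values]
    # Negative decimals round to tens, hundreds, and so on.
    return [round(v / scale) * scale for v in values]
```

For instance, `round_decimals([1234.0], -2)` rounds to the nearest hundred, and `round_decimals([1.234, 5.678], 2)` keeps two decimal places.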
cc mruberry rgommers
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66195
Reviewed By: anjali411
Differential Revision: D31821385
Pulled By: mruberry
fbshipit-source-id: 9a03fcb809440f0c83530108284e69c345e1850f
(cherry picked from commit 50b67c6968)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65993
This PR attempts to port `index_add` to structured kernels, but does more than that:
* Adds an `out=` variant to `index_add`
* Revises `native_functions.yaml` registrations, to not have multiple entries and instead pass default value to `alpha`.
* Updates the `derivatives.yaml` file for autograd support
* Revises error messages, please see: https://github.com/pytorch/pytorch/pull/65993#issuecomment-945441615
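As a reminder of what the operation computes, including the role of `alpha`, here is a 1-D pure-Python sketch. The name `index_add_1d` is hypothetical; returning a fresh list loosely mirrors a non-mutating variant and is not meant to describe the structured kernel.

```python
def index_add_1d(dest, index, source, alpha=1):
    """Return a copy of dest with alpha * source[i] added at position index[i]."""
    if len(index) != len(source):
        raise ValueError("index and source must have the same length")
    out = list(dest)
    for i, target in enumerate(index):
        out[target] += alpha * source[i]
    return out
```

Repeated indices accumulate, so `index_add_1d([0, 0, 0], [0, 2, 0], [1, 2, 3])` adds both `1` and `3` into slot 0.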
Follow-up PRs in near future will attempt to refactor the OpInfo test, and will give another look at tests in `test/test_torch.py` for this function. (hence the use of ghstack for this)
~This is WIP because there are tests failing for `Dimname` variant on mobile/android builds, and I'm working on fixing them.~
Issue tracker: https://github.com/pytorch/pytorch/issues/55070
Test Plan: Imported from OSS
Reviewed By: ejguan
Differential Revision: D32646426
fbshipit-source-id: b035ecf843a9a27d4d1e18b202b035adc2a49ab5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63570
There is a use of `at::triangular_solve_out` in the file
`torch/csrc/jit/tensorexpr/external_functions.cpp` that I have not dared
to move to `at::linalg_solve_triangular_out`.
**Deprecation note:**
This PR deprecates the `torch.triangular_solve` function in favor of
`torch.linalg.solve_triangular`. An upgrade guide is added to the
documentation for `torch.triangular_solve`.
Note that it DOES NOT remove `torch.triangular_solve`, but
`torch.triangular_solve` will be removed in a future PyTorch release.
cc jianyuh nikitaved pearu mruberry walterddr IvanYashchuk xwang233 Lezcano
Test Plan: Imported from OSS
Reviewed By: mruberry
Differential Revision: D32618035
Pulled By: anjali411
fbshipit-source-id: 0bfb48eeb6d96eff3e96e8a14818268cceb93c83
Summary:
Fixes https://github.com/pytorch/pytorch/issues/62146.
Modernizes and clarifies the documentation of torch.tensor and torch.as_tensor, highlighting the distinction in their copying behavior and preservation of autograd history.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63308
Reviewed By: albanD, ngimel
Differential Revision: D30338025
Pulled By: mruberry
fbshipit-source-id: 83a0c113e4f8fce2dfe086054562713fe3f866c2
Summary:
For some reason, the example for `torch.empty` showed the usage of `torch.empty_like` and the other way around. These are now swapped.
Fixes https://github.com/pytorch/pytorch/issues/68799
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68874
Reviewed By: wenleix
Differential Revision: D32646645
Pulled By: ejguan
fbshipit-source-id: c8298bcaca450aaa4abeef2239af2b14cadc05b3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63568
This PR adds the first solver with structure to `linalg`. This solver
has an API compatible with that of `linalg.solve`, preparing these for a
possible future merge of the APIs. The new API:
- Just returns the solution, rather than the solution and a copy of `A`
- Removes the confusing `transpose` argument and replaces it by a
correct handling of conj and strides within the call
- Adds a `left=True` kwarg. This can be achieved via transposes of the
inputs and the result, but it's exposed for convenience.
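The remark that the right-hand-side case can be recovered via transposes follows from a one-line identity, written out here for reference. The notation $\operatorname{solve}(A, B)$ for the solution of $AX = B$ is introduced only for this note.

```latex
% Right-sided system rewritten as a left-sided one by transposing both sides.
XA = B
\;\iff\; (XA)^{\mathsf T} = B^{\mathsf T}
\;\iff\; A^{\mathsf T} X^{\mathsf T} = B^{\mathsf T}
\quad\Longrightarrow\quad
X = \bigl(\operatorname{solve}(A^{\mathsf T}, B^{\mathsf T})\bigr)^{\mathsf T}
```

Note that transposing swaps the triangle: if $A$ is upper triangular then $A^{\mathsf T}$ is lower triangular, so the `upper` flag has to be flipped when using this workaround.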
This PR also implements a dataflow that minimises the number of copies
needed before calling LAPACK / MAGMA / cuBLAS and takes advantage of the
conjugate and neg bits.
This algorithm is implemented for `solve_triangular` (which, for this, is
the most complex of all the solvers due to the `upper` parameters).
Once more solvers are added, we will factor out this calling algorithm,
so that all of them can take advantage of it.
Given the complexity of this algorithm, we implement some thorough
testing. We also added tests for all the backends, which was not done
before.
We also add forward AD support for `linalg.solve_triangular` and improve the
docs of `linalg.solve_triangular`. We also fix a few issues with those of
`torch.triangular_solve`.
Resolves https://github.com/pytorch/pytorch/issues/54258
Resolves https://github.com/pytorch/pytorch/issues/56327
Resolves https://github.com/pytorch/pytorch/issues/45734
cc jianyuh nikitaved pearu mruberry walterddr IvanYashchuk xwang233 Lezcano
Test Plan: Imported from OSS
Reviewed By: jbschlosser
Differential Revision: D32588230
Pulled By: mruberry
fbshipit-source-id: 69e484849deb9ad7bb992cc97905df29c8915910