Summary:
This PR adds autograd support for `torch.orgqr`.
Since `torch.orgqr` is one of few functions that expose LAPACK's naming and all other linear algebra routines were renamed a long time ago, I also added a new function with a new name and `torch.orgqr` now is an alias for it.
The new proposed name is `householder_product`. For a matrix `input` and a vector `tau` LAPACK's orgqr operation takes columns of `input` (called Householder vectors or elementary reflectors) scalars of `tau` that together represent Householder matrices and then the product of these matrices is computed. See https://www.netlib.org/lapack/lug/node128.html.
Other linear algebra libraries that I'm aware of do not expose this LAPACK function, so there is some freedom in naming it. It is usually used internally only for QR decomposition, but can be useful for deep learning tasks now when it supports differentiation.
Resolves https://github.com/pytorch/pytorch/issues/50104
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52637
Reviewed By: agolynski
Differential Revision: D27114246
Pulled By: mruberry
fbshipit-source-id: 9ab51efe52aec7c137aa018c7bd486297e4111ce
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54052
Introduce `fp16_compress_wrapper`, which can give some speedup on top of some gradient compression algorithms like PowerSGD.
ghstack-source-id: 124001805
Test Plan: {F509205173}
Reviewed By: iseessel
Differential Revision: D27076064
fbshipit-source-id: 4845a14854cafe2112c0caefc1e2532efe9d3ed8
Summary:
brianjo
- Add a javascript snippet to close the expandable left navbar sections 'Notes', 'Language Bindings', 'Libraries', 'Community'
- Fix two latex bugs that were causing output in the log that might have been misleading when looking for true doc build problems
- Change the way release versions interact with sphinx. I tested these via building docs twice: once with `export RELEASE=1` and once without.
- Remove perl scripting to turn the static version text into a link to the versions.html document. Instead, put this where it belongs in the layout.html template. This is the way the domain libraries (text, vision, audio) do it.
- There were two separate templates for master and release, with the only difference between them is that the master has an admonition "You are viewing unstable developer preview docs....". Instead toggle that with the value of `release`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53851
Reviewed By: mruberry
Differential Revision: D27085875
Pulled By: ngimel
fbshipit-source-id: c2d674deb924162f17131d895cb53cef08a1f1cb
Summary:
Close https://github.com/pytorch/pytorch/issues/51108
Related https://github.com/pytorch/pytorch/issues/38349
This PR implements the `cpu_kernel_multiple_outputs` to support returning multiple values in a CPU kernel.
```c++
auto iter = at::TensorIteratorConfig()
.add_output(out1)
.add_output(out2)
.add_input(in1)
.add_input(in2)
.build();
at::native::cpu_kernel_multiple_outputs(iter,
[=](float a, float b) -> std::tuple<float, float> {
float add = a + b;
float mul = a * b;
return std::tuple<float, float>(add, mul);
}
);
```
The `out1` will equal to `torch.add(in1, in2)`, while the result of `out2` will be `torch.mul(in1, in2)`.
It helps developers implement new torch functions that return two tensors more conveniently, such as NumPy-like functions [divmod](https://numpy.org/doc/1.18/reference/generated/numpy.divmod.html?highlight=divmod#numpy.divmod) and [frexp](https://numpy.org/doc/stable/reference/generated/numpy.frexp.html#numpy.frexp).
This PR adds `torch.frexp` function to exercise the new functionality provided by `cpu_kernel_multiple_outputs`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51097
Reviewed By: albanD
Differential Revision: D26982619
Pulled By: heitorschueroff
fbshipit-source-id: cb61c7f2c79873ab72ab5a61cbdb9203531ad469
Summary:
Fixes https://github.com/pytorch/pytorch/issues/44378 by providing a wider range of drivers similar to what SciPy is doing.
The supported CPU drivers are `gels, gelsy, gelsd, gelss`.
The CUDA interface has only `gels` implemented but only for overdetermined systems.
The current state of this PR:
- [x] CPU interface
- [x] CUDA interface
- [x] CPU tests
- [x] CUDA tests
- [x] Memory-efficient batch-wise iteration with broadcasting which fixes https://github.com/pytorch/pytorch/issues/49252
- [x] docs
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49093
Reviewed By: albanD
Differential Revision: D26991788
Pulled By: mruberry
fbshipit-source-id: 8af9ada979240b255402f55210c0af1cba6a0a3c
Summary:
This PR proposes to improve the distributed doc:
* [x] putting the init functions together
* [x] moving post-init functions into their own sub-section as they are only available after init and moving that group to after all init sub-sections
If this is too much, could we at least put these 2 functions together:
```
.. autofunction:: init_process_group
.. autofunction:: is_initialized
```
as they are interconnected. and the other functions are not alphabetically sorted in the first place.
Thank you.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52976
Reviewed By: albanD
Differential Revision: D26993933
Pulled By: mrshenli
fbshipit-source-id: 7cacbe28172ebb5849135567b1d734870b49de77
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53855
Remove "noindex" here:
{F492926346}
ghstack-source-id: 123724419
Test Plan:
waitforbuildbot
The failure on doctest does not seem to be relevant.
Reviewed By: rohan-varma
Differential Revision: D26967086
fbshipit-source-id: adf9db1144fa1475573f617402fdbca8177b7c08
Summary:
Fixes https://github.com/pytorch/pytorch/issues/50002
The last commit adds tests for 3d conv with the `SubModelFusion` and `SubModelWithoutFusion` classes.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50003
Reviewed By: mrshenli
Differential Revision: D26325953
Pulled By: jerryzh168
fbshipit-source-id: 7406dd2721c0c4df477044d1b54a6c5e128a9034
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53253
Since GradBucket class becomes public, mention this class in ddp_comm_hooks.rst.
Screenshot:
{F478201008}
ghstack-source-id: 123596842
Test Plan: viewed generated html file
Reviewed By: rohan-varma
Differential Revision: D26812210
fbshipit-source-id: 65b70a45096b39f7d41a195e65b365b722645000
Summary:
This is a more fundamental example, as we may support some amount of shape specialization in the future.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53250
Reviewed By: navahgar
Differential Revision: D26841272
Pulled By: Chillee
fbshipit-source-id: 027c719afafc03828a657e40859cbfbf135e05c9
Summary:
Context: https://github.com/pytorch/pytorch/pull/53299#discussion_r587882857
These are the only hand-written parts of this diff:
- the addition to `.github/workflows/lint.yml`
- the file endings changed in these four files (to appease FB-internal land-blocking lints):
- `GLOSSARY.md`
- `aten/src/ATen/core/op_registration/README.md`
- `scripts/README.md`
- `torch/csrc/jit/codegen/fuser/README.md`
The rest was generated by running this command (on macOS):
```
git grep -I -l ' $' -- . ':(exclude)**/contrib/**' ':(exclude)third_party' | xargs gsed -i 's/ *$//'
```
I looked over the auto-generated changes and didn't see anything that looked problematic.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53406
Test Plan:
This run (after adding the lint but before removing existing trailing spaces) failed:
- https://github.com/pytorch/pytorch/runs/2043032377
This run (on the tip of this PR) succeeded:
- https://github.com/pytorch/pytorch/runs/2043296348
Reviewed By: walterddr, seemethere
Differential Revision: D26856620
Pulled By: samestep
fbshipit-source-id: 3f0de7f7c2e4b0f1c089eac9b5085a58dd7e0d97
Summary:
Provides the implementation for feature request issue https://github.com/pytorch/pytorch/issues/28937.
Adds the `Parametrization` functionality and implements `Pruning` on top of it.
It adds the `auto` mode, on which the parametrization is just computed once per forwards pass. The previous implementation computed the pruning on every forward, which is not optimal when pruning RNNs for example.
It implements a caching mechanism for parameters. This is implemented through the mechanism proposed at the end of the discussion https://github.com/pytorch/pytorch/issues/7313. In particular, it assumes that the user will not manually change the updated parameters between the call to `backwards()` and the `optimizer.step()`. If they do so, they would need to manually call the `.invalidate()` function provided in the implementation. This could be made into a function that gets a model and invalidates all the parameters in it. It might be the case that this function has to be called in the `.cuda()` and `.to` and related functions.
As described in https://github.com/pytorch/pytorch/issues/7313, this could be used, to implement in a cleaner way the `weight_norm` and `spectral_norm` functions. It also allows, as described in https://github.com/pytorch/pytorch/issues/28937, for the implementation of constrained optimization on manifolds (i.e. orthogonal constraints, positive definite matrices, invertible matrices, weights on the sphere or the hyperbolic space...)
TODO (when implementation is validated):
- More thorough test
- Documentation
Resolves https://github.com/pytorch/pytorch/issues/28937
albanD
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33344
Reviewed By: zhangguanheng66
Differential Revision: D26816708
Pulled By: albanD
fbshipit-source-id: 07c8f0da661f74e919767eae31335a9c60d9e8fe
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53084
Adding RemoteModule to master RPC docs since it is a prototype
feature.
ghstack-source-id: 122816689
Test Plan: waitforbuildbot
Reviewed By: rohan-varma
Differential Revision: D26743372
fbshipit-source-id: 00ce9526291dfb68494e07be3e67d7d9c2686f1b
Summary:
Fixes https://github.com/pytorch/pytorch/issues/44378 by providing a wider range of drivers similar to what SciPy is doing.
The supported CPU drivers are `gels, gelsy, gelsd, gelss`.
The CUDA interface has only `gels` implemented but only for overdetermined systems.
The current state of this PR:
- [x] CPU interface
- [x] CUDA interface
- [x] CPU tests
- [x] CUDA tests
- [x] Memory-efficient batch-wise iteration with broadcasting which fixes https://github.com/pytorch/pytorch/issues/49252
- [x] docs
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49093
Reviewed By: H-Huang
Differential Revision: D26723384
Pulled By: mruberry
fbshipit-source-id: c9866a95f14091955cf42de22f4ac9e2da009713
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52141
Remove BufferShuffleDataSet, as it's not being used anywhere within PyTorch (no usage on Github based on a search) and it's not included in the release of PyTorch 1.7.1.
Test Plan: Imported from OSS
Reviewed By: H-Huang
Differential Revision: D26710940
Pulled By: ejguan
fbshipit-source-id: 90023b4bfb105d6aa392753082100f9181ecebd0
Summary:
Fixes https://github.com/pytorch/pytorch/issues/52724.
This fixes the following for the LKJCholesky distribution in master:
- `log_prob` does sample validation when `validate_args=True`.
- exposes documentation for the LKJCholesky distribution.
cc. fehiepsi, fritzo
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52763
Reviewed By: anjali411
Differential Revision: D26657216
Pulled By: neerajprad
fbshipit-source-id: 12e8f8384cf0c3df8a29564c1e1718d2d6a5833f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51807
Implemented torch.linalg.multi_dot similar to [numpy.linalg.multi_dot](https://numpy.org/doc/stable/reference/generated/numpy.linalg.multi_dot.html).
This function does not support broadcasting or batched inputs at the moment.
**NOTE**
numpy.linalg.multi_dot allows the first and last tensors to have more than 2 dimensions despite their docs stating these must be either 1D or 2D. This PR diverges from NumPy in that it enforces this restriction.
**TODO**
- [ ] Benchmark against NumPy
- [x] Add OpInfo testing
- [x] Remove unnecessary copy for out= argument
Test Plan: Imported from OSS
Reviewed By: nikithamalgifb
Differential Revision: D26375734
Pulled By: heitorschueroff
fbshipit-source-id: 839642692424c4b1783606c76dd5b34455368f0b
Summary:
Toward fixing https://github.com/pytorch/pytorch/issues/47624
~Step 1: add `TORCH_WARN_MAYBE` which can either warn once or every time in c++, and add a c++ function to toggle the value.
Step 2 will be to expose this to python for tests. Should I continue in this PR or should we take a different approach: add the python level exposure without changing any c++ code and then over a series of PRs change each call site to use the new macro and change the tests to make sure it is being checked?~
Step 1: add a python and c++ toggle to convert TORCH_WARN_ONCE into TORCH_WARN so the warnings can be caught in tests
Step 2: add a python-level decorator to use this toggle in tests
Step 3: (in future PRs): use the decorator to catch the warnings instead of `maybeWarnsRegex`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48560
Reviewed By: ngimel
Differential Revision: D26171175
Pulled By: mruberry
fbshipit-source-id: d83c18f131d282474a24c50f70a6eee82687158f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51748
Adding docs for `fake_quantize_per_tensor_affine` and `fake_quantize_per_channel_affine`
functions.
Note: not documenting `fake_quantize_per_tensor_affine_cachemask` and
`fake_quantize_per_channel_affine_cachemask` since they are implementation details
of `fake_quantize_per_tensor_affine` and `fake_quantize_per_channel_affine`,
and do not need to be exposed to the user at the moment.
Test Plan: Build the docs locally on Mac OS, it looks good
Reviewed By: supriyar
Differential Revision: D26270514
Pulled By: vkuzo
fbshipit-source-id: 8e3c9815a12a3427572cb4d34a779e9f5e4facdd