pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 12:21:27 +01:00

Author	SHA1	Message	Date
kshitij12345	5c18994674	[special] Add `i1` and `i1e` (#56352 ) Summary: Reference: https://github.com/pytorch/pytorch/issues/50345 * [x] Check Docs https://12721710-65600975-gh.circle-artifacts.com/0/docs/special.html * [x] Investigate fp32 failure on CI?! (Fails on clang. Reproduced locally with clang-11) * [ ] Kernel vs Composite? * [x] Autograd for `i0e` for zero? Pull Request resolved: https://github.com/pytorch/pytorch/pull/56352 Reviewed By: anjali411 Differential Revision: D28700888 Pulled By: mruberry fbshipit-source-id: 91a3cbb94f5b8a3b063589ec38179848c11def83	2021-05-29 20:55:23 -07:00
Adnios	09a8f22bf9	Add mish activation function (#58648 ) Summary: See issus: https://github.com/pytorch/pytorch/issues/58375 Pull Request resolved: https://github.com/pytorch/pytorch/pull/58648 Reviewed By: gchanan Differential Revision: D28625390 Pulled By: jbschlosser fbshipit-source-id: 23ea2eb7d5b3dc89c6809ff6581b90ee742149f4	2021-05-25 10:36:21 -07:00
leslie-fang-intel	0ede83db7a	enable torch.cpu.amp.autocast (#57386 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/57386 Here is the PR for what's discussed in the RFC https://github.com/pytorch/pytorch/issues/55374 to enable the autocast for CPU device. Currently, this PR only enable BF16 as the lower precision datatype. Changes: 1. Enable new API `torch.cpu.amp.autocast` for autocast on CPU device: include the python API, C++ API, new Dispatchkey etc. 2. Consolidate the implementation for each cast policy sharing between CPU and GPU devices. 3. Add the operation lists to corresponding cast policy for cpu autocast. Test Plan: Imported from OSS Reviewed By: soulitzer Differential Revision: D28572219 Pulled By: ezyang fbshipit-source-id: db3db509973b16a5728ee510b5e1ee716b03a152	2021-05-20 17:48:36 -07:00
lezcano	452569dffb	cfloat and cdouble functions (#58137 ) Summary: This adds the methods `Tensor.cfloat()` and `Tensor.cdouble()`. I was not able to find the tests for `.float()` functions. I'd be happy to add similar tests for these functions once someone points me to them. Fixes https://github.com/pytorch/pytorch/issues/56014 Pull Request resolved: https://github.com/pytorch/pytorch/pull/58137 Reviewed By: ejguan Differential Revision: D28412288 Pulled By: anjali411 fbshipit-source-id: ff3653cb3516bcb3d26a97b9ec3d314f1f42f83d	2021-05-13 21:13:37 -07:00
Ivan Yashchuk	c1430c3425	Add torch.linalg.inv_ex without checking for errors by default (#58039 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/58039 The new function has the following signature `inv_ex(Tensor inpit, *, bool check_errors=False) -> (Tensor inverse, Tensor info)`. When `check_errors=True`, an error is thrown if the matrix is not invertible; `check_errors=False` - responsibility for checking the result is on the user. `linalg_inv` is implemented using calls to `linalg_inv_ex` now. Resolves https://github.com/pytorch/pytorch/issues/25095 Test Plan: Imported from OSS Reviewed By: ngimel Differential Revision: D28405148 Pulled By: mruberry fbshipit-source-id: b8563a6c59048cb81e206932eb2f6cf489fd8531	2021-05-13 09:42:15 -07:00
Jeffrey Wan	e71b526e7e	Add inference mode python bindings and tests (#58045 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/56608 - Adds binding to the `c10::InferenceMode` RAII class in `torch._C._autograd.InferenceMode` through pybind. Also binds the `torch.is_inference_mode` function. - Adds context manager `torch.inference_mode` to manage an instance of `c10::InferenceMode` (global). Implemented in `torch.autograd.grad_mode.py` to reuse the `_DecoratorContextManager` class. - Adds some tests based on those linked in the issue + several more for just the context manager Issues/todos (not necessarily for this PR): - Improve short inference mode description - Small example - Improved testing since there is no direct way of checking TLS/dispatch keys - Pull Request resolved: https://github.com/pytorch/pytorch/pull/58045 Reviewed By: agolynski Differential Revision: D28390595 Pulled By: soulitzer fbshipit-source-id: ae98fa036c6a2cf7f56e0fd4c352ff804904752c	2021-05-13 08:55:35 -07:00
Nikita Vedeneev	c790fd2bf8	ATen lu_unpack. Required for making `torch.lu_solve` differentiable. (#46913 ) Summary: Backward methods for `torch.lu` and `torch.lu_solve` require the `torch.lu_unpack` method. However, while `torch.lu` is a Python wrapper over a native function, so its gradient is implemented via `autograd.Function`, `torch.lu_solve` is a native function, so it cannot access `torch.lu_unpack` as it is implemented in Python. Hence this PR presents a native (ATen) `lu_unpack` version. It is also possible to update the gradients for `torch.lu` so that backward+JIT is supported (no JIT for `autograd.Function`) with this function. ~~The interface for this method is different from the original `torch.lu_unpack`, so it is decided to keep it hidden.~~ Pull Request resolved: https://github.com/pytorch/pytorch/pull/46913 Reviewed By: albanD Differential Revision: D28355725 Pulled By: mruberry fbshipit-source-id: 281260f3b6e93c15b08b2ba66d5a221314b00e78	2021-05-11 22:53:21 -07:00
Ilqar Ramazanli	8b816e9010	To implement gradient for Pytorch (#54617 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/56129 Pull Request resolved: https://github.com/pytorch/pytorch/pull/54617 Reviewed By: anjali411 Differential Revision: D28057452 Pulled By: iramazanli fbshipit-source-id: 9bd86679282d34f5e5393e6447121586517eb4f0	2021-05-11 18:52:20 -07:00
Mike Ruberry	3c87fe9b14	Revert D28117714: [pytorch][PR] ATen lu_unpack. Required for making `torch.lu_solve` differentiable. Test Plan: revert-hammer Differential Revision: D28117714 (`5c67d8dfd3`) Original commit changeset: befd33db12ec fbshipit-source-id: 295b2134935542a903a73f90a7998239dfe6cc81	2021-05-09 23:20:06 -07:00
Nikita Vedeneev	5c67d8dfd3	ATen lu_unpack. Required for making `torch.lu_solve` differentiable. (#46913 ) Summary: Backward methods for `torch.lu` and `torch.lu_solve` require the `torch.lu_unpack` method. However, while `torch.lu` is a Python wrapper over a native function, so its gradient is implemented via `autograd.Function`, `torch.lu_solve` is a native function, so it cannot access `torch.lu_unpack` as it is implemented in Python. Hence this PR presents a native (ATen) `lu_unpack` version. It is also possible to update the gradients for `torch.lu` so that backward+JIT is supported (no JIT for `autograd.Function`) with this function. ~~The interface for this method is different from the original `torch.lu_unpack`, so it is decided to keep it hidden.~~ Pull Request resolved: https://github.com/pytorch/pytorch/pull/46913 Reviewed By: astaff Differential Revision: D28117714 Pulled By: mruberry fbshipit-source-id: befd33db12ecc147afacac792418b6f4948fa4a4	2021-05-09 19:12:56 -07:00
Heitor Schueroff	4cf2c646c2	Added torch.linalg.matrix_norm (#57127 ) Summary: This PR is focused on the API for `linalg.matrix_norm` and delegates computations to `linalg.norm` for the moment. The main difference between the norms is when `dim=None`. In this case - `linalg.norm` will compute a vector norm on the flattened input if `ord=None`, otherwise it requires the input to be either 1D or 2D in order to disambiguate between vector and matrix norm - `linalg.vector_norm` will flatten the input - `linalg.matrix_norm` will compute the norm over the last two dimensions, treating the input as batch of matrices In future PRs, the computations will be moved to `torch.linalg.matrix_norm` and `torch.norm` and `torch.linalg.norm` will delegate computations to either `linalg.vector_norm` or `linalg.matrix_norm` based on the arguments provided. Pull Request resolved: https://github.com/pytorch/pytorch/pull/57127 Reviewed By: mrshenli Differential Revision: D28186736 Pulled By: mruberry fbshipit-source-id: 99ce2da9d1c4df3d9dd82c0a312c9570da5caf25	2021-05-09 04:50:33 -07:00
Ivan Yashchuk	58f32fa5fd	Remove compute_uv flag from torch.linalg.svd (#57180 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/57180 We have now a separate function for computing only the singular values. `compute_uv` argument is not needed and it was decided in the offline discussion to remove it. This is a BC-breaking change but our linalg module is beta, therefore we can do it without a deprecation notice. Test Plan: Imported from OSS Reviewed By: ngimel Differential Revision: D28142163 Pulled By: mruberry fbshipit-source-id: 3fac1fcae414307ad5748c9d5ff50e0aa4e1b853	2021-05-07 15:16:42 -07:00
Heitor Schueroff	1f1e2dab6b	Remove optional type for ord parameter in vector_norm (#57662 ) Summary: As per discussion here https://github.com/pytorch/pytorch/pull/57127#discussion_r624948215 Note that we cannot remove the optional type from the `dim` parameter because the default is to flatten the input tensor which cannot be easily captured by a value other than `None` ### BC Breaking Note This PR changes the `ord` parameter of `torch.linalg.vector_norm` so that it no longer accepts `None` arguments. The default behavior of `2` is equivalent to the previous default of `None`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/57662 Reviewed By: albanD, mruberry Differential Revision: D28228870 Pulled By: heitorschueroff fbshipit-source-id: 040fd8055bbe013f64d3c8409bbb4b2c87c99d13	2021-05-06 17:53:25 -07:00
Ivan Yashchuk	75a2a92b02	Add torch.linalg.cholesky_ex without checking for errors by default (#56724 ) Summary: The new function has the following signature `cholesky_ex(Tensor input, *, bool check_errors=False) -> (Tensor L, Tensor infos)`. When `check_errors=True`, an error is thrown if the decomposition fails; `check_errors=False` - responsibility for checking the decomposition is on the user. When `check_errors=False`, we don't have host-device memory transfers for checking the values of the `info` tensor. Rewrote the internal code for `torch.linalg.cholesky`. Added `cholesky_stub` dispatch. `linalg_cholesky` is implemented using calls to `linalg_cholesky_ex` now. Resolves https://github.com/pytorch/pytorch/issues/57032. Ref. https://github.com/pytorch/pytorch/issues/34272, https://github.com/pytorch/pytorch/issues/47608, https://github.com/pytorch/pytorch/issues/47953 Pull Request resolved: https://github.com/pytorch/pytorch/pull/56724 Reviewed By: ngimel Differential Revision: D27960176 Pulled By: mruberry fbshipit-source-id: f05f3d5d9b4aa444e41c4eec48ad9a9b6fd5dfa5	2021-05-01 18:48:27 -07:00
kshitij12345	d4ddb47719	[special] Add `xlog1py` (#55138 ) Summary: Reference : https://github.com/pytorch/pytorch/issues/50345 * [x] Check Rendered Document (https://12494173-65600975-gh.circle-artifacts.com/0/docs/special.html#torch.special.xlog1py) * [x] Tests in Binary Ufunc * [x] OpInfo * [x] Structured Kernel Pull Request resolved: https://github.com/pytorch/pytorch/pull/55138 Reviewed By: ngimel Differential Revision: D27961461 Pulled By: mruberry fbshipit-source-id: 30a8f41970a829bf50254aadf5615e8ce4148c7e	2021-04-30 05:51:13 -07:00
Akifumi Imanishi	9da0f2e95e	Support `__pos__` and `positive` (#55891 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/55604. This PR implements `torch.Tensor.__pos__` and `torch.positive` for the compatibility with NumPy’s interface. (cc: mruberry, rgommers, emcastillo and kmaehashi) Pull Request resolved: https://github.com/pytorch/pytorch/pull/55891 Reviewed By: H-Huang Differential Revision: D28025928 Pulled By: mruberry fbshipit-source-id: e43e329a802f31bf8805f6efab5c2c7ef34c88b9	2021-04-27 13:23:59 -07:00
iramazanli	3e006fc57e	Adding hsplit,vsplit and dsplit methods (#53536 ) Summary: Fixes #{issue number} Pull Request resolved: https://github.com/pytorch/pytorch/pull/53536 Reviewed By: albanD Differential Revision: D27938880 Pulled By: iramazanli fbshipit-source-id: f741119517783ec2bafa296622ee518b587dd127	2021-04-26 09:39:09 -07:00
kshitij12345	298db67220	[OpInfo] Add Function Variant and Opinfo for permute (#56125 ) Summary: Reference: https://github.com/pytorch/pytorch/issues/54261 Pull Request resolved: https://github.com/pytorch/pytorch/pull/56125 Reviewed By: ezyang Differential Revision: D27960312 Pulled By: mruberry fbshipit-source-id: b9dd89f7e69d7dff29f3b53828656c13df898fa5	2021-04-25 21:26:44 -07:00
Ivan Yashchuk	d5ff432615	Add torch.linalg.svdvals (#56684 ) Summary: This PR adds `torch.linalg.svdvals(input, out=None)` that computes only the singular values of `input`. Resolves https://github.com/pytorch/pytorch/issues/54155. Pull Request resolved: https://github.com/pytorch/pytorch/pull/56684 Reviewed By: albanD Differential Revision: D27938229 Pulled By: mruberry fbshipit-source-id: 5ea79ad9cccf818df0fbda1f431299ebf8de3798	2021-04-25 03:42:24 -07:00
M.L. Croci	1f0223d6bb	Fix bug in gaussian_nll_loss (#56469 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/53964. cc albanD almson ## Major changes: - Overhauled the actual loss calculation so that the shapes are now correct (in functional.py) - added the missing doc in nn.functional.rst ## Minor changes (in functional.py): - I removed the previous check on whether input and target were the same shape. This is to allow for broadcasting, say when you have 10 predictions that all have the same target. - I added some comments to explain each shape check in detail. Let me know if these should be shortened/cut. Screenshots of updated docs attached. Let me know what you think, thanks! ## Edit: Description of change of behaviour (affecting BC): The backwards-compatibility is only affected for the `reduction='none'` mode. This was the source of the bug. For tensors with size (N, D), the old returned loss had size (N), as incorrect summation was happening. It will now have size (N, D) as expected. ### Example Define input tensors, all with size (2, 3). `input = torch.tensor([[0., 1., 3.], [2., 4., 0.]], requires_grad=True)` `target = torch.tensor([[1., 4., 2.], [-1., 2., 3.]])` `var = 2*torch.ones(size=(2, 3), requires_grad=True)` Initialise loss with reduction mode 'none'. We expect the returned loss to have the same size as the input tensors, (2, 3). `loss = torch.nn.GaussianNLLLoss(reduction='none')` Old behaviour: `print(loss(input, target, var)) ` `# Gives tensor([3.7897, 6.5397], grad_fn=<MulBackward0>. This has size (2).` New behaviour: `print(loss(input, target, var)) ` `# Gives tensor([[0.5966, 2.5966, 0.5966], [2.5966, 1.3466, 2.5966]], grad_fn=<MulBackward0>)` `# This has the expected size, (2, 3).` To recover the old behaviour, sum along all dimensions except for the 0th: `print(loss(input, target, var).sum(dim=1))` `# Gives tensor([3.7897, 6.5397], grad_fn=<SumBackward1>.` ![doc1](https://user-images.githubusercontent.com/26558092/115391089-f7f47b00-a1d6-11eb-8726-e4da9057aee0.png) ![doc2](https://user-images.githubusercontent.com/26558092/115391094-f925a800-a1d6-11eb-954b-afd187f42bc7.png) Pull Request resolved: https://github.com/pytorch/pytorch/pull/56469 Reviewed By: jbschlosser, agolynski Differential Revision: D27894170 Pulled By: albanD fbshipit-source-id: 197890189c97c22109491c47f469336b5b03a23f	2021-04-22 07:43:48 -07:00
Victor Bittorf	52f1a07b63	Python API for Vitals (#53238 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/53238 There is a tension for the Vitals design: (1) we want a macro based logging API for C++ and (2) we want a clean python API. Furthermore, we want to this to work with "print on destruction" semantics. The unfortunate resolution is that there are (2) ways to define vitals: (1) Use the macros for local use only within C++ - this keeps the semantics people enjoy (2) For vitals to be used through either C++ or Python, we use a global VitalsAPI object. Both these go to the same place for the user: printing to stdout as the globals are destructed. The long history on this diff shows many different ways to try to avoid having 2 different paths... we tried weak pointers & shared pointers, verbose switch cases, etc. Ultimately each ran into an ugly trade-off and this cuts the difference better the alternatives. Test Plan: buck test mode/dev caffe2/test:torch -- --regex vital buck test //caffe2/aten:vitals Reviewed By: orionr Differential Revision: D26736443 fbshipit-source-id: ccab464224913edd07c1e8532093f673cdcb789f	2021-04-15 16:06:43 -07:00
kshitij12345	50057e560b	[special] Add `i0e` (#54409 ) Summary: Reference: https://github.com/pytorch/pytorch/issues/50345 Changes: * Add `i0e` * Move some kernels from `UnaryOpsKernel.cu` to `UnarySpecialOpsKernel.cu` to decrease compilation time per file. Time taken by i0e_vs_scipy tests: around 6.33.s <details> <summary>Test Run Log</summary> ``` (pytorch-cuda-dev) kshiteej@qgpu1:~/Pytorch/pytorch_module_special$ pytest test/test_unary_ufuncs.py -k _i0e_vs ======================================================================= test session starts ======================================================================== platform linux -- Python 3.8.6, pytest-6.1.2, py-1.9.0, pluggy-0.13.1 rootdir: /home/kshiteej/Pytorch/pytorch_module_special, configfile: pytest.ini plugins: hypothesis-5.38.1 collected 8843 items / 8833 deselected / 10 selected test/test_unary_ufuncs.py ...sss.... [100%] ========================================================================= warnings summary ========================================================================= ../../.conda/envs/pytorch-cuda-dev/lib/python3.8/site-packages/torch/backends/cudnn/__init__.py:73 test/test_unary_ufuncs.py::TestUnaryUfuncsCUDA::test_special_i0e_vs_scipy_cuda_bfloat16 /home/kshiteej/.conda/envs/pytorch-cuda-dev/lib/python3.8/site-packages/torch/backends/cudnn/__init__.py:73: UserWarning: PyTorch was compiled without cuDNN/MIOpen support. To use cuDNN/MIOpen, rebuild PyTorch making sure the library is visible to the build system. warnings.warn( -- Docs: https://docs.pytest.org/en/stable/warnings.html ===================================================================== short test summary info ====================================================================== SKIPPED [3] test/test_unary_ufuncs.py:1182: not implemented: Could not run 'aten::_copy_from' with arguments from the 'Meta' backend. This could be because the operator doesn't exist for this backend, or was omitted during the selective/custom build process (if using custom build). If you are a Facebook employee using PyTorch on mobile, please visit https://fburl.com/ptmfixes for possible resolutions. 'aten::_copy_from' is only available for these backends: [BackendSelect, Named, InplaceOrView, AutogradOther, AutogradCPU, AutogradCUDA, AutogradXLA, UNKNOWN_TENSOR_TYPE_ID, AutogradMLC, AutogradNestedTensor, AutogradPrivateUse1, AutogradPrivateUse2, AutogradPrivateUse3, Tracer, Autocast, Batched, VmapMode]. BackendSelect: fallthrough registered at ../aten/src/ATen/core/BackendSelectFallbackKernel.cpp:3 [backend fallback] Named: registered at ../aten/src/ATen/core/NamedRegistrations.cpp:7 [backend fallback] InplaceOrView: fallthrough registered at ../aten/src/ATen/core/VariableFallbackKernel.cpp:56 [backend fallback] AutogradOther: registered at ../torch/csrc/autograd/generated/VariableType_4.cpp:8761 [autograd kernel] AutogradCPU: registered at ../torch/csrc/autograd/generated/VariableType_4.cpp:8761 [autograd kernel] AutogradCUDA: registered at ../torch/csrc/autograd/generated/VariableType_4.cpp:8761 [autograd kernel] AutogradXLA: registered at ../torch/csrc/autograd/generated/VariableType_4.cpp:8761 [autograd kernel] UNKNOWN_TENSOR_TYPE_ID: registered at ../torch/csrc/autograd/generated/VariableType_4.cpp:8761 [autograd kernel] AutogradMLC: registered at ../torch/csrc/autograd/generated/VariableType_4.cpp:8761 [autograd kernel] AutogradNestedTensor: registered at ../torch/csrc/autograd/generated/VariableType_4.cpp:8761 [autograd kernel] AutogradPrivateUse1: registered at ../torch/csrc/autograd/generated/VariableType_4.cpp:8761 [autograd kernel] AutogradPrivateUse2: registered at ../torch/csrc/autograd/generated/VariableType_4.cpp:8761 [autograd kernel] AutogradPrivateUse3: registered at ../torch/csrc/autograd/generated/VariableType_4.cpp:8761 [autograd kernel] Tracer: registered at ../torch/csrc/autograd/generated/TraceType_4.cpp:9348 [kernel] Autocast: fallthrough registered at ../aten/src/ATen/autocast_mode.cpp:250 [backend fallback] Batched: registered at ../aten/src/ATen/BatchingRegistrations.cpp:1016 [backend fallback] VmapMode: fallthrough registered at ../aten/src/ATen/VmapModeRegistrations.cpp:33 [backend fallback] ==================================================== 7 passed, 3 skipped, 8833 deselected, 2 warnings in 6.33s ===================================================== ``` </details> TODO: * [x] Check rendered docs (https://11743402-65600975-gh.circle-artifacts.com/0/docs/special.html) Pull Request resolved: https://github.com/pytorch/pytorch/pull/54409 Reviewed By: jbschlosser Differential Revision: D27760472 Pulled By: mruberry fbshipit-source-id: bdfbcaa798b00c51dc9513c34626246c8fc10548	2021-04-15 06:06:11 -07:00
Kurt Mohler	3fe4718d16	Add `padding_idx` argument to EmbeddingBag (#49237 ) Summary: This PR adds a `padding_idx` parameter to `nn.EmbeddingBag` and `nn.functional.embedding_bag`. As with `nn.Embedding`'s `padding_idx` argument, if an embedding's index is equal to `padding_idx` it is ignored, so it is not included in the reduction. This PR does not add support for `padding_idx` for quantized or ONNX `EmbeddingBag` for opset10/11 (opset9 is supported). In these cases, an error is thrown if `padding_idx` is provided. Fixes https://github.com/pytorch/pytorch/issues/3194 Pull Request resolved: https://github.com/pytorch/pytorch/pull/49237 Reviewed By: walterddr, VitalyFedyunin Differential Revision: D26948258 Pulled By: jbschlosser fbshipit-source-id: 3ca672f7e768941f3261ab405fc7597c97ce3dfc	2021-04-14 09:38:01 -07:00
Sameer Deshmukh	5fb1142702	Add CSR (compressed sparse row) layout for sparse tensors (#50937 ) Summary: Implement compressed sparse row format. Derived from the GCS implementation at https://github.com/pytorch/pytorch/pull/44190 Pull Request resolved: https://github.com/pytorch/pytorch/pull/50937 Reviewed By: mrshenli Differential Revision: D27439865 Pulled By: ezyang fbshipit-source-id: 3ba3dcb9679505b980ff6a5f513e913bbae2fb1d	2021-04-12 10:09:12 -07:00
kshitij12345	902bf0bbbe	[special] Alias for sigmoid and logit & follow-up (#54759 ) Summary: Reference: https://github.com/pytorch/pytorch/issues/50345 Chages: * Alias for sigmoid and logit * Adds out variant for C++ API * Updates docs to link back to `special` documentation Pull Request resolved: https://github.com/pytorch/pytorch/pull/54759 Reviewed By: mrshenli Differential Revision: D27615208 Pulled By: mruberry fbshipit-source-id: 8bba908d1bea246e4aa9dbadb6951339af353556	2021-04-08 00:56:59 -07:00
Ivan Yashchuk	84d18727bd	Added linalg.eig, linalg.eigvals (#52491 ) Summary: This PR adds `torch.linalg.eig`, and `torch.linalg.eigvals` for NumPy compatibility. MAGMA uses a hybrid CPU-GPU algorithm and doesn't have a GPU interface for the non-symmetric eigendecomposition. It means that it forces us to transfer inputs living in GPU memory to CPU first before calling MAGMA, and then transfer results from MAGMA to CPU. That is rather slow for smaller matrices and MAGMA is faster than CPU path only for matrices larger than 3000x3000. Unfortunately, there is no cuSOLVER function for this operation. Autograd support for `torch.linalg.eig` will be added in a follow-up PR. Ref https://github.com/pytorch/pytorch/issues/42666 Pull Request resolved: https://github.com/pytorch/pytorch/pull/52491 Reviewed By: anjali411 Differential Revision: D27563616 Pulled By: mruberry fbshipit-source-id: b42bb98afcd2ed7625d30bdd71cfc74a7ea57bb5	2021-04-06 13:53:26 -07:00
lezcano	fd02fc5d71	Port put_ and take from TH to ATen (#53356 ) Summary: The two ports were don together, as they can be implemented with the same kernel. In TH, they were already implemented with the same kernel. Resolves https://github.com/pytorch/pytorch/issues/24751 Resolves https://github.com/pytorch/pytorch/issues/24614 Resolves https://github.com/pytorch/pytorch/issues/24640 Resolves https://github.com/pytorch/pytorch/issues/24772 This port makes sure that it interacts correctly with the "deterministic algorithms" flag, as done in https://github.com/pytorch/pytorch/pull/51388 This PR also makes these two functions correct in the following aspects (all of them added to the tests as well): - Support for complex numbers - Correct handling of scalar inputs and zero-dimensional inputs - Implementation that does not do any copies nor sorting of any of the input tensors - Faster and more correct implementation of the backwards (now it works as it should when `source.shape() != index.shape()`) - Now `put_(..., accumulate=True)` is implemented correctly with atomic operations on GPU / CPU (when possible) and is deterministic (modulo the loss of precision that might happen due to the reordering of a sum of floats) - Adds the `torch.put` function that was missing, (`index_put` exists, for example) - Corrected docs It also adds a much more thorough testing to the operations and their gradients. There is a BC-breaking change, and that is that now we check that the inputs do not overlap in the `put_` operation. This was handled (some of the cases, other cases were wrong) in the TH implementation by making contiguous copies of the inputs. How should we handle this one? Edit. Benchmarks: <details> <summary>Script</summary> ```python from IPython import get_ipython import torch from itertools import product torch.manual_seed(13) torch.set_num_threads(1) ipython = get_ipython() cpu = torch.device('cpu') cuda = torch.device('cuda') def run_test(ndims, size, index_len, device, cmd): print(f"cmd: {cmd}, ndims: {ndims}, tensor_size: {size}, index_len: {index_len}, device: {device}") large_tensor = torch.rand(([size] ndims), device=device) small_tensor = torch.rand((index_len,), device=device) index = torch.randint(size * ndims, (index_len,), dtype=torch.long, device=device) if cmd == "put": command = "large_tensor.put_(index, small_tensor, accumulate=False)" if device == cuda: command += "; torch.cuda.synchronize()" elif cmd == "accumulate": command = "large_tensor.put_(index, small_tensor, accumulate=True)" if device == cuda: command += "; torch.cuda.synchronize()" elif cmd == "take": command = "torch.take(large_tensor, index)" if device == cuda: command += "; torch.cuda.synchronize()" ipython.magic(f"timeit {command}") print() for method, device in product(["accumulate", "put", "take"], [cpu, cuda]): run_test(3, 1000, 10, device, method) run_test(3, 1000, 1000, device, method) run_test(3, 1000, 10000, device, method) run_test(2, 10000, 100000, device, method) ``` </details> ```python put_(accumulate=False) ``` <details> <summary>ATen CPU (1.5x - 2x speedup)</summary> ```python cmd: put, ndims: 3, tensor_size: 1000, index_len: 10, device: cpu 1.05 µs ± 2.35 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each) cmd: put, ndims: 3, tensor_size: 1000, index_len: 1000, device: cpu 3.15 µs ± 5.13 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each) cmd: put, ndims: 3, tensor_size: 1000, index_len: 10000, device: cpu 21.6 µs ± 13.1 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each) cmd: put, ndims: 2, tensor_size: 10000, index_len: 100000, device: cpu 238 µs ± 781 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each) ``` </details> <details> <summary>TH CPU</summary> ```python cmd: put, ndims: 3, tensor_size: 1000, index_len: 10, device: cpu 722 ns ± 2.67 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each) cmd: put, ndims: 3, tensor_size: 1000, index_len: 1000, device: cpu 4.89 µs ± 18.1 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each) cmd: put, ndims: 3, tensor_size: 1000, index_len: 10000, device: cpu 42.5 µs ± 96.3 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each) cmd: put, ndims: 2, tensor_size: 10000, index_len: 100000, device: cpu 428 µs ± 774 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each) ``` </details> <details> <summary>ATen GPU (same speed)</summary> ```python cmd: put, ndims: 3, tensor_size: 1000, index_len: 10, device: cuda 8.99 µs ± 16 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each) cmd: put, ndims: 3, tensor_size: 1000, index_len: 1000, device: cuda 10.4 µs ± 24.4 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each) cmd: put, ndims: 3, tensor_size: 1000, index_len: 10000, device: cuda 10.4 µs ± 11.2 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each) cmd: put, ndims: 2, tensor_size: 10000, index_len: 100000, device: cuda 15.6 µs ± 1.12 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each) ``` </details> <details> <summary>TH GPU</summary> ```python cmd: put, ndims: 3, tensor_size: 1000, index_len: 10, device: cuda 8.44 µs ± 31.4 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each) cmd: put, ndims: 3, tensor_size: 1000, index_len: 1000, device: cuda 9.09 µs ± 4.3 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each) cmd: put, ndims: 3, tensor_size: 1000, index_len: 10000, device: cuda 9.77 µs ± 0.998 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each) cmd: put, ndims: 2, tensor_size: 10000, index_len: 100000, device: cuda 15.8 µs ± 5.7 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each) ``` </details> ```python put_(accumulate=True) ``` <details> <summary>ATen CPU (x2 speedup)</summary> ```python cmd: accumulate, ndims: 3, tensor_size: 1000, index_len: 10, device: cpu 1.12 µs ± 2.91 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each) cmd: accumulate, ndims: 3, tensor_size: 1000, index_len: 1000, device: cpu 3.14 µs ± 2.05 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each) cmd: accumulate, ndims: 3, tensor_size: 1000, index_len: 10000, device: cpu 20.8 µs ± 25.9 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each) cmd: accumulate, ndims: 2, tensor_size: 10000, index_len: 100000, device: cpu 264 µs ± 263 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each) ``` </details> <details> <summary>TH CPU</summary> ```python cmd: accumulate, ndims: 3, tensor_size: 1000, index_len: 10, device: cpu 814 ns ± 1.87 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each) cmd: accumulate, ndims: 3, tensor_size: 1000, index_len: 1000, device: cpu 5.11 µs ± 6.02 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each) cmd: accumulate, ndims: 3, tensor_size: 1000, index_len: 10000, device: cpu 43.9 µs ± 49.4 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each) cmd: accumulate, ndims: 2, tensor_size: 10000, index_len: 100000, device: cpu 442 µs ± 1.07 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) ``` </details> <details> <summary>ATen GPU (3x - 11x speedup)</summary> ```python cmd: accumulate, ndims: 3, tensor_size: 1000, index_len: 10, device: cuda 9.01 µs ± 14.1 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each) cmd: accumulate, ndims: 3, tensor_size: 1000, index_len: 1000, device: cuda 10.4 µs ± 15.6 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each) cmd: accumulate, ndims: 3, tensor_size: 1000, index_len: 10000, device: cuda 10.3 µs ± 44.3 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each) cmd: accumulate, ndims: 2, tensor_size: 10000, index_len: 100000, device: cuda 12.6 µs ± 19 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each) ``` </details> <details> <summary>TH GPU</summary> ```python cmd: accumulate, ndims: 3, tensor_size: 1000, index_len: 10, device: cuda 34.7 µs ± 131 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each) cmd: accumulate, ndims: 3, tensor_size: 1000, index_len: 1000, device: cuda 38.2 µs ± 116 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each) cmd: accumulate, ndims: 3, tensor_size: 1000, index_len: 10000, device: cuda 61.2 µs ± 50.4 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each) cmd: accumulate, ndims: 2, tensor_size: 10000, index_len: 100000, device: cuda 140 µs ± 24.2 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each) ``` </details> ```python take() ``` <details> <summary>ATen CPU (1.1x speedup)</summary> ```python cmd: take, ndims: 3, tensor_size: 1000, index_len: 10, device: cpu 1.18 µs ± 2.34 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each) cmd: take, ndims: 3, tensor_size: 1000, index_len: 1000, device: cpu 2.79 µs ± 2.96 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each) cmd: take, ndims: 3, tensor_size: 1000, index_len: 10000, device: cpu 16.6 µs ± 10.4 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each) cmd: take, ndims: 2, tensor_size: 10000, index_len: 100000, device: cpu 161 µs ± 984 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each) ``` </details> <details> <summary>TH CPU</summary> ```python cmd: take, ndims: 3, tensor_size: 1000, index_len: 10, device: cpu 1.1 µs ± 3.14 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each) cmd: take, ndims: 3, tensor_size: 1000, index_len: 1000, device: cpu 2.93 µs ± 7.31 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each) cmd: take, ndims: 3, tensor_size: 1000, index_len: 10000, device: cpu 18.6 µs ± 14.5 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each) cmd: take, ndims: 2, tensor_size: 10000, index_len: 100000, device: cpu 178 µs ± 139 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each) ``` </details> <details> <summary>ATen GPU (same speed)</summary> ```python cmd: take, ndims: 3, tensor_size: 1000, index_len: 10, device: cuda 9.38 µs ± 23.1 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each) cmd: take, ndims: 3, tensor_size: 1000, index_len: 1000, device: cuda 10.7 µs ± 9.77 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each) cmd: take, ndims: 3, tensor_size: 1000, index_len: 10000, device: cuda 10.6 µs ± 107 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each) cmd: take, ndims: 2, tensor_size: 10000, index_len: 100000, device: cuda 11.5 µs ± 21.1 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each) ``` </details> <details> <summary>TH GPU</summary> ```python cmd: take, ndims: 3, tensor_size: 1000, index_len: 10, device: cuda 9.31 µs ± 7.57 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each) cmd: take, ndims: 3, tensor_size: 1000, index_len: 1000, device: cuda 9.52 µs ± 5.78 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each) cmd: take, ndims: 3, tensor_size: 1000, index_len: 10000, device: cuda 9.73 µs ± 17.6 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each) cmd: take, ndims: 2, tensor_size: 10000, index_len: 100000, device: cuda 11.7 µs ± 5.7 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each) ``` </details> cc mruberry Pull Request resolved: https://github.com/pytorch/pytorch/pull/53356 Reviewed By: mruberry Differential Revision: D27520243 Pulled By: ngimel fbshipit-source-id: e3979349c2c62d2949e09fb05e5fd4883fbc9093	2021-04-05 18:05:38 -07:00
Peter Bell	2ee02b30b1	Replace rounding_mode="true" with rounding_mode=None (#51988 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/51988 * #51988 Replace rounding_mode="true" with rounding_mode=None Test Plan: Imported from OSS Reviewed By: ngimel Differential Revision: D27561817 Pulled By: mruberry fbshipit-source-id: 60d1d9c389570f60d599fc1876518717367fb368	2021-04-05 14:53:43 -07:00
Heitor Schueroff	d98072b027	Deprecate torch.chain_matmul in favor of torch.linalg.multi_dot (#53453 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/53453 Test Plan: Imported from OSS Reviewed By: H-Huang Differential Revision: D27406282 Pulled By: heitorschueroff fbshipit-source-id: b6e715d1b88e0613ee6b6208cb28ba4757e31717	2021-04-01 04:50:51 -07:00
Heitor Schueroff	5d68b3695c	[Relanding] Implemented torch.linalg.multi_dot (#52859 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/52859 This reverts commit `92a4ee1cf6`. Added support for bfloat16 for CUDA 11 and removed fast-path for empty input tensors that was affecting autograd graph. Test Plan: Imported from OSS Reviewed By: H-Huang Differential Revision: D27402390 Pulled By: heitorschueroff fbshipit-source-id: 73c5ccf54f3da3d29eb63c9ed3601e2fe6951034	2021-04-01 04:49:05 -07:00
kshitij12345	c9d0c855f7	[special] Alias for special.expm1 and special.exp2 (#54670 ) Summary: Reference: https://github.com/pytorch/pytorch/issues/50345 Pull Request resolved: https://github.com/pytorch/pytorch/pull/54670 Reviewed By: H-Huang Differential Revision: D27401440 Pulled By: mruberry fbshipit-source-id: 02b1fd0e8ffd3f5a017d6b6b9229b76b92b4b745	2021-03-30 10:03:13 -07:00
Hameer Abbasi	c690ed0ae8	Fix override for __iter__ (#54702 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/54702 This fixes subclassing for __iter__ so that it returns an iterator over subclasses properly instead of Tensor. Test Plan: Imported from OSS Reviewed By: H-Huang Differential Revision: D27352563 Pulled By: ezyang fbshipit-source-id: 4c195a86c8f2931a6276dc07b1e74ee72002107c	2021-03-30 08:30:50 -07:00
kshitij12345	0527d14248	[numpy] Add torch.take_along_dim (#52833 ) Summary: Reference: https://github.com/pytorch/pytorch/issues/38349 Wrapper around the existing `torch.gather` with broadcasting logic. TODO: * [x] Add Doc entry (see if phrasing can be improved) * [x] Add OpInfo * [x] Add test against numpy * [x] Handle broadcasting behaviour and when dim is not given. Pull Request resolved: https://github.com/pytorch/pytorch/pull/52833 Reviewed By: malfet Differential Revision: D27319038 Pulled By: mruberry fbshipit-source-id: 00f307825f92c679d96e264997aa5509172f5ed1	2021-03-28 05:22:51 -07:00
kshitij12345	6f8328ef44	[special] Add special.entr (#53500 ) Summary: Reference: https://github.com/pytorch/pytorch/issues/50345 TODO: * [x] Verfiy docs rendering (https://11397990-65600975-gh.circle-artifacts.com/0/docs/special.html) Pull Request resolved: https://github.com/pytorch/pytorch/pull/53500 Reviewed By: ngimel Differential Revision: D27287096 Pulled By: mruberry fbshipit-source-id: 6b3dfd53e811a0f023ee444a0b56176f825d39e9	2021-03-24 18:44:42 -07:00
Serhat Yilmaz	7e3cf1ee24	[pytorch] Add native support for segment reduce step1: API definition (#53727 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/53727 This is first diff to add native support for segment reduction in PyTorch. It provides similar functionality like torch.scatter or "numpy.ufunc.reduceat". This diff mainly focuses on API layer to make sure future improvements will not cause backward compatibility issues. Once API is settled, here are next steps I am planning: - Add support for other major reduction types (e.g. min, sum) for 1D tensor - Add Cuda support - Backward support - Documentation for the op - Perf optimizations and benchmark util - Support for multi dimensional tensors (on data and lengths) (not high priority) - Support for 'indices' (not high priority) Test Plan: Added unit test Reviewed By: ngimel Differential Revision: D26952075 fbshipit-source-id: 8040ec96def3013e7240cf675d499ee424437560	2021-03-23 16:00:30 -07:00
Heitor Schueroff	f9e7f132fb	Added torch.linalg.matrix_power (#52608 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/52608 TODO - [x] Add OpInfo - [x] Update documentation - [x] Add more tests and compare against NumPy Test Plan: Imported from OSS Reviewed By: bdhirsh Differential Revision: D27261532 Pulled By: heitorschueroff fbshipit-source-id: c1e4ab297da3683f6d5751be8790602f9dc37b6b	2021-03-23 15:10:06 -07:00
kshitij12345	bfd009836e	[torch.special] Add special.erf{c, inv} (#53260 ) Summary: Reference: https://github.com/pytorch/pytorch/issues/50345 Also adds `overrides` entry for module and the newly added functions. Pull Request resolved: https://github.com/pytorch/pytorch/pull/53260 Reviewed By: agolynski Differential Revision: D27114342 Pulled By: mruberry fbshipit-source-id: b1dd88f373db251bb71df12d33b160382138f63f	2021-03-18 19:06:25 -07:00
Bin Bao	4626886f21	[JIT] Add CUDNN Conv-Add-Relu fusion for Frozen Model Optimization (#52102 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/52102 Test Plan: Imported from OSS Reviewed By: eellison Differential Revision: D26646100 fbshipit-source-id: 7f7a82cc0b42c958b9e0c854b3b5dc6ea7cfff6c	2021-03-18 15:18:52 -07:00
Kurt Mohler	382a47b493	Add torch.linalg.vector_norm function (#51099 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/50214 Pull Request resolved: https://github.com/pytorch/pytorch/pull/51099 Reviewed By: agolynski Differential Revision: D27147360 Pulled By: mruberry fbshipit-source-id: 1056f840e7027ad81971c9d1a9f952ab9648f1b5	2021-03-18 06:41:39 -07:00
Ivan Yashchuk	564456ac44	Added autograd support for torch.orgqr (#52637 ) Summary: This PR adds autograd support for `torch.orgqr`. Since `torch.orgqr` is one of few functions that expose LAPACK's naming and all other linear algebra routines were renamed a long time ago, I also added a new function with a new name and `torch.orgqr` now is an alias for it. The new proposed name is `householder_product`. For a matrix `input` and a vector `tau` LAPACK's orgqr operation takes columns of `input` (called Householder vectors or elementary reflectors) scalars of `tau` that together represent Householder matrices and then the product of these matrices is computed. See https://www.netlib.org/lapack/lug/node128.html. Other linear algebra libraries that I'm aware of do not expose this LAPACK function, so there is some freedom in naming it. It is usually used internally only for QR decomposition, but can be useful for deep learning tasks now when it supports differentiation. Resolves https://github.com/pytorch/pytorch/issues/50104 Pull Request resolved: https://github.com/pytorch/pytorch/pull/52637 Reviewed By: agolynski Differential Revision: D27114246 Pulled By: mruberry fbshipit-source-id: 9ab51efe52aec7c137aa018c7bd486297e4111ce	2021-03-18 05:42:18 -07:00
Xiong Wei	da10ccd35f	Implements cpu_kernel_multiple_outputs and torch.frexp (#51097 ) Summary: Close https://github.com/pytorch/pytorch/issues/51108 Related https://github.com/pytorch/pytorch/issues/38349 This PR implements the `cpu_kernel_multiple_outputs` to support returning multiple values in a CPU kernel. ```c++ auto iter = at::TensorIteratorConfig() .add_output(out1) .add_output(out2) .add_input(in1) .add_input(in2) .build(); at::native::cpu_kernel_multiple_outputs(iter, [=](float a, float b) -> std::tuple<float, float> { float add = a + b; float mul = a * b; return std::tuple<float, float>(add, mul); } ); ``` The `out1` will equal to `torch.add(in1, in2)`, while the result of `out2` will be `torch.mul(in1, in2)`. It helps developers implement new torch functions that return two tensors more conveniently, such as NumPy-like functions [divmod](https://numpy.org/doc/1.18/reference/generated/numpy.divmod.html?highlight=divmod#numpy.divmod) and [frexp](https://numpy.org/doc/stable/reference/generated/numpy.frexp.html#numpy.frexp). This PR adds `torch.frexp` function to exercise the new functionality provided by `cpu_kernel_multiple_outputs`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/51097 Reviewed By: albanD Differential Revision: D26982619 Pulled By: heitorschueroff fbshipit-source-id: cb61c7f2c79873ab72ab5a61cbdb9203531ad469	2021-03-15 10:44:32 -07:00
Nikita Vedeneev	afa1ff8e04	Implements `torch.linalg.lstsq` (#49093 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/44378 by providing a wider range of drivers similar to what SciPy is doing. The supported CPU drivers are `gels, gelsy, gelsd, gelss`. The CUDA interface has only `gels` implemented but only for overdetermined systems. The current state of this PR: - [x] CPU interface - [x] CUDA interface - [x] CPU tests - [x] CUDA tests - [x] Memory-efficient batch-wise iteration with broadcasting which fixes https://github.com/pytorch/pytorch/issues/49252 - [x] docs Pull Request resolved: https://github.com/pytorch/pytorch/pull/49093 Reviewed By: albanD Differential Revision: D26991788 Pulled By: mruberry fbshipit-source-id: 8af9ada979240b255402f55210c0af1cba6a0a3c	2021-03-12 13:25:55 -08:00
Edward Yang	758fb94fcb	Prefix assert_async with underscore, fix some bugs in assert_async CUDA testing (#53276 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/53276 - One of the tests had a syntax error (but the test wasn't fine grained enough to catch this; any error was a pass) - Doesn't work on ROCm Signed-off-by: Edward Z. Yang <ezyang@fb.com> Differential Revision: D26820048 Test Plan: Imported from OSS Reviewed By: mruberry Pulled By: ezyang fbshipit-source-id: b02c4252d10191c3b1b78f141d008084dc860c45	2021-03-05 17:36:01 -08:00
Edward Yang	cfd9360d09	Revert D26837780: Revert D26819810: Revert D26815021: Revert D26744062: Add assert_async Test Plan: revert-hammer Differential Revision: D26837780 Original commit changeset: 21567cab5c0f fbshipit-source-id: 8ea735e5fdc97e32ae3fafd40297a1b8a7cd34b0	2021-03-04 20:45:35 -08:00
Edward Yang	1accffe450	Revert D26819810: Revert D26815021: Revert D26744062: Add assert_async Test Plan: revert-hammer Differential Revision: D26819810 Original commit changeset: e528260e1aa9 fbshipit-source-id: 21567cab5c0ff5f5e60a699d4d4678773a567c30	2021-03-04 18:48:56 -08:00
Edward Yang	9e5e5a7d96	Revert D26815021: Revert D26744062: Add assert_async Test Plan: revert-hammer Differential Revision: D26815021 Original commit changeset: 972eaafcdf14 fbshipit-source-id: e528260e1aa91df1873c73af00aa57addd671607	2021-03-04 09:28:25 -08:00
Mike Ruberry	b864457743	Revert D26744062: Add assert_async Test Plan: revert-hammer Differential Revision: D26744062 (`12d63cc2f5`) Original commit changeset: be6d2653afe5 fbshipit-source-id: 972eaafcdf14d96abdec3dea6bcbd5cac1f3d759	2021-03-04 04:11:25 -08:00
Edward Yang	12d63cc2f5	Add assert_async (#53086 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/53086 Fixes #36853 Signed-off-by: Edward Z. Yang <ezyang@fb.com> Test Plan: Imported from OSS Reviewed By: albanD Differential Revision: D26744062 Pulled By: ezyang fbshipit-source-id: be6d2653afe584adf67a05b5d43185b40764650d	2021-03-03 16:18:07 -08:00
Xiao Wang	d30f4d1dfd	Migrate apex.parallel.SyncBatchNorm channels_last to pytorch (#46906 ) Summary: per title This PR did - Migrate `apex.parallel.SyncBatchNorm` channels_last to pytorch `torch.nn.SyncBatchNorm` - Fix a TODO here by fusing `sum`, `div` kernels into backward elementwise kernel `b167402e2e/torch/nn/modules/_functions.py (L76-L95)` Todo - [x] Discuss a regression introduced in https://github.com/pytorch/pytorch/pull/37133#discussion_r512530389, which is the synchronized copy here `b167402e2e/torch/nn/modules/_functions.py (L32-L34)` Comment: This PR uses apex version for the size check. Test passed and I haven't seen anything wrong so far. - [x] The restriction to use channels_last kernel will be like this ``` inline bool batch_norm_use_channels_last_kernels(const at::Tensor& self) { return self.is_contiguous(at::MemoryFormat::ChannelsLast) \|\| self.ndimension() == 2; } ``` I think we can relax that for channels_last_3d as well? Comment: we don't have benchmark for this now, will check this and add functionality later when needed. - [x] Add test - [x] Add benchmark Detailed benchmark is at https://github.com/xwang233/code-snippet/tree/master/syncbn-channels-last Close https://github.com/pytorch/pytorch/issues/50781 Pull Request resolved: https://github.com/pytorch/pytorch/pull/46906 Reviewed By: albanD Differential Revision: D26771437 Pulled By: malfet fbshipit-source-id: d00387044e9d43ac7e6c0e32a2db22c63d1504de	2021-03-03 15:29:45 -08:00
Mike Ruberry	9c2673df46	Revert D26723384: [pytorch][PR] Implements `torch.linalg.lstsq` Test Plan: revert-hammer Differential Revision: D26723384 (`3ac9013235`) Original commit changeset: c9866a95f140 fbshipit-source-id: 3e5263d71facdc91ca09d7dcbbbe3ba818ee2821	2021-03-03 15:24:25 -08:00
Edward Yang	0f81a69a96	Make meta a device (getting rid of empty_meta) (#53143 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/53143 Meta is now an honest to goodness device type, like cpu, so you can use device='meta' to trigger allocation of meta tensors. This way better than empty_meta since we now have working API for most factory functions (they don't necessarily work yet, though, because need to register Meta versions of those functions.) Some subtleties: - I decided to drop the concept of CPU versus CUDA meta tensors; meta tensors are device agnostic. It's hard to say exactly what the correct level of abstraction here is, but in this particular case implementation considerations trump semantic considerations: it is way easier to have just a meta device, than to have a meta device AND a cpu device AND a cuda device. This may limit the applicability of meta tensors for tracing models that do explicit cpu()/cuda() conversions (unless, perhaps, we make those operations no-ops on meta tensors). - I noticed that the DeviceType uppercase strings are kind of weird. Are they really supposed to be all caps? That's weird. - I moved the Meta dispatch key to live with the rest of the "device" dispatch keys. - I intentionally did NOT add a Backend for Meta. For now, I'm going to hope meta tensors never exercise any of the Backend conversion code; even if it does, better to fix the code to just stop converting to and from Backend. Signed-off-by: Edward Z. Yang <ezyang@fb.com> Test Plan: Imported from OSS Reviewed By: samestep Differential Revision: D26763552 Pulled By: ezyang fbshipit-source-id: 14633b6ca738e60b921db66a763155d01795480d	2021-03-03 11:24:13 -08:00
Nikita Vedeneev	3ac9013235	Implements `torch.linalg.lstsq` (#49093 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/44378 by providing a wider range of drivers similar to what SciPy is doing. The supported CPU drivers are `gels, gelsy, gelsd, gelss`. The CUDA interface has only `gels` implemented but only for overdetermined systems. The current state of this PR: - [x] CPU interface - [x] CUDA interface - [x] CPU tests - [x] CUDA tests - [x] Memory-efficient batch-wise iteration with broadcasting which fixes https://github.com/pytorch/pytorch/issues/49252 - [x] docs Pull Request resolved: https://github.com/pytorch/pytorch/pull/49093 Reviewed By: H-Huang Differential Revision: D26723384 Pulled By: mruberry fbshipit-source-id: c9866a95f14091955cf42de22f4ac9e2da009713	2021-03-02 19:00:07 -08:00
Joel Schlosser	e86476f736	Huber loss (#50553 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/48595. ## Background This PR implements HuberLoss, which differs from SmoothL1Loss by a factor of beta. The current implementation does not share logic between the two. Feedback is welcome for the optimal way to minimize code duplication while remaining performant. I've done some early [benchmarking](https://pytorch.org/tutorials/recipes/recipes/benchmark.html#collecting-instruction-counts-with-callgrind) with Huber calling in to the Smooth L1 kernel and scaling afterwards; for the simple test case I used, instruction counts are as follows: ``` Huber loss calls dedicated Huber kernel: 2,795,300 Huber loss calls Smooth L1 kernel and scales afterwards: 4,523,612 ``` With these numbers, instruction counts are ~62% higher when using the pre-existing Smooth L1 kernel. Pull Request resolved: https://github.com/pytorch/pytorch/pull/50553 Test Plan: ``` python test/test_nn.py TestNN.test_HuberLoss python test/test_nn.py TestNN.test_HuberLoss_delta python test/test_nn.py TestNN.test_huber_loss_invalid_delta python test/test_nn.py TestNNDeviceTypeCPU.test_smooth_l1_loss_vs_huber_loss_cpu python test/test_nn.py TestNNDeviceTypeCUDA.test_smooth_l1_loss_vs_huber_loss_cuda python test/test_nn.py TestNNDeviceTypeCPU.test_invalid_reduction_strings_cpu python test/test_nn.py TestNNDeviceTypeCUDA.test_invalid_reduction_strings_cuda python test/test_nn.py TestNN.test_loss_equal_input_target_shape python test/test_nn.py TestNN.test_pointwise_loss_broadcast python test/test_overrides.py python test/test_jit.py TestJitGeneratedFunctional.test_nn_huber_loss python test/test_type_hints.py python test/test_cpp_api_parity.py build/bin/test_api ``` ## Documentation <img width="677" alt="Screen Shot 2021-01-14 at 4 25 08 PM" src="https://user-images.githubusercontent.com/75754324/104651224-5a445980-5685-11eb-884b-14ea517958c2.png"> <img width="677" alt="Screen Shot 2021-01-14 at 4 24 35 PM" src="https://user-images.githubusercontent.com/75754324/104651190-4e589780-5685-11eb-974d-8c63a89c050e.png"> <img width="661" alt="Screen Shot 2021-01-14 at 4 24 45 PM" src="https://user-images.githubusercontent.com/75754324/104651198-50225b00-5685-11eb-958e-136b36f6f8a8.png"> <img width="869" alt="Screen Shot 2021-01-14 at 4 25 27 PM" src="https://user-images.githubusercontent.com/75754324/104651208-53b5e200-5685-11eb-9fe4-5ff433aa13c5.png"> <img width="862" alt="Screen Shot 2021-01-14 at 4 25 48 PM" src="https://user-images.githubusercontent.com/75754324/104651209-53b5e200-5685-11eb-8051-b0cfddcb07d3.png"> Reviewed By: H-Huang Differential Revision: D26734071 Pulled By: jbschlosser fbshipit-source-id: c98c1b5f32a16f7a2a4e04bdce678080eceed5d5	2021-03-02 17:30:45 -08:00
Luca Wehrstedt	92a4ee1cf6	Revert D26375734: Implemented torch.linalg.multi_dot Test Plan: revert-hammer Differential Revision: D26375734 (`0396f492b9`) Original commit changeset: 839642692424 fbshipit-source-id: cb64db646010128d802e1930d5e9526c1f7aa6a2	2021-02-25 00:43:57 -08:00
Bel H	30cb6ac53c	Introduce `mlc` device (ML Compute device) to PyTorch's device list (#50634 ) Summary: Apple recently announced ML Compute, a new framework available in macOS Big Sur, which enables users to accelerate the training of neural networks on Mac hardware. This PR is the first on a series of PRs that will enable the integration with ML Compute. Most of the integration code will live on a separate subrepo named `mlc`. The integration with `mlc` (ML Compute) will be very similar to that of xla. We rely on registering our ops through: TORCH_LIBRARY_IMPL(aten, PrivateUse1, m) { m.impl_UNBOXED(<op_schema_name>, &customized_op_kernel) ... } Pull Request resolved: https://github.com/pytorch/pytorch/pull/50634 Reviewed By: malfet Differential Revision: D26614213 Pulled By: smessmer fbshipit-source-id: 3b492b346c61cc3950ac880ac01a82fbdddbc07b	2021-02-24 22:39:11 -08:00
Heitor Schueroff	0396f492b9	Implemented torch.linalg.multi_dot (#51807 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/51807 Implemented torch.linalg.multi_dot similar to [numpy.linalg.multi_dot](https://numpy.org/doc/stable/reference/generated/numpy.linalg.multi_dot.html). This function does not support broadcasting or batched inputs at the moment. NOTE numpy.linalg.multi_dot allows the first and last tensors to have more than 2 dimensions despite their docs stating these must be either 1D or 2D. This PR diverges from NumPy in that it enforces this restriction. TODO - [ ] Benchmark against NumPy - [x] Add OpInfo testing - [x] Remove unnecessary copy for out= argument Test Plan: Imported from OSS Reviewed By: nikithamalgifb Differential Revision: D26375734 Pulled By: heitorschueroff fbshipit-source-id: 839642692424c4b1783606c76dd5b34455368f0b	2021-02-24 15:32:30 -08:00
Heitor Schueroff	964d47dfb9	Add torch.linalg to generated annotated_args for test_overrides (#52464 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/52464 Test Plan: Imported from OSS Reviewed By: nikithamalgifb Differential Revision: D26618696 Pulled By: heitorschueroff fbshipit-source-id: 9889fcaafcb307319b4526ee86355389653a6b61	2021-02-24 15:30:32 -08:00
Nikita Vedeneev	9699c703c2	Stable sort for the CPU take 2. (#51790 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/38681. A duplicate of https://github.com/pytorch/pytorch/pull/50052 created to become importable to the fb internal tests. Pull Request resolved: https://github.com/pytorch/pytorch/pull/51790 Reviewed By: agolynski Differential Revision: D26279045 Pulled By: glaringlee fbshipit-source-id: 348e171dee9c370a76002b65d0c82c329f57a421	2021-02-19 09:28:57 -08:00
Vasiliy Kuznetsov	33afb5f19f	fake_quant cachemask: remove Python bindings (#51878 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/51878 `fake_quantize_per_tensor_affine_cachemask` and `fake_quantize_per_channel_affine_cachemask` are implementation details of `fake_quantize_per_tensor_affine` and `fake_quantize_per_channel_affine`, removing the Python bindings for them since there is no need to expose them. Test Plan: ``` python test/test_quantization.py TestFakeQuantize ``` Imported from OSS Reviewed By: albanD, bugra Differential Revision: D26314173 fbshipit-source-id: 733c93a3951453e739b6ed46b72fbad2244f6e97	2021-02-09 23:27:53 -08:00
mattip	b97a040f71	ENH: toggle TORCH_WARN_ONCE to TORCH_WARN for tests (#48560 ) Summary: Toward fixing https://github.com/pytorch/pytorch/issues/47624 ~Step 1: add `TORCH_WARN_MAYBE` which can either warn once or every time in c++, and add a c++ function to toggle the value. Step 2 will be to expose this to python for tests. Should I continue in this PR or should we take a different approach: add the python level exposure without changing any c++ code and then over a series of PRs change each call site to use the new macro and change the tests to make sure it is being checked?~ Step 1: add a python and c++ toggle to convert TORCH_WARN_ONCE into TORCH_WARN so the warnings can be caught in tests Step 2: add a python-level decorator to use this toggle in tests Step 3: (in future PRs): use the decorator to catch the warnings instead of `maybeWarnsRegex` Pull Request resolved: https://github.com/pytorch/pytorch/pull/48560 Reviewed By: ngimel Differential Revision: D26171175 Pulled By: mruberry fbshipit-source-id: d83c18f131d282474a24c50f70a6eee82687158f	2021-02-08 08:21:19 -08:00
albanD	716a8c2153	make forward AD API private (#51693 ) Summary: Avoid leaking private functions in `torch.` namespace. Pull Request resolved: https://github.com/pytorch/pytorch/pull/51693 Reviewed By: gchanan Differential Revision: D26245046 Pulled By: albanD fbshipit-source-id: 5481b57eb56ba96581848598d32ebf5894a7adf0	2021-02-04 19:02:29 -08:00
Peter Bell	b150f150ba	Add division overload with rounding_mode selection (#51706 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/51706 Pull Request resolved: https://github.com/pytorch/pytorch/pull/50280 As mentioned in gh-43874, this adds a `rounding_mode={'true', 'trunc', 'floor'}` argument so `torch.div` can be used as a replacement for `floor_divide` during the transitional period. I've included dedicated kernels for truncated and floor division which aren't strictly necessary for float, but do perform significantly better (~2x) than doing true division followed by a separate rounding kernel. Note: I introduce new overloads for `aten::div` instead of just adding a default `rounding_mode` because various JIT passes rely on the exact operator schema. Test Plan: Imported from OSS Reviewed By: ngimel Differential Revision: D26123271 Pulled By: mruberry fbshipit-source-id: 51a83717602114597ec9c4d946e35a392eb01d46	2021-02-04 13:08:36 -08:00
Jeffrey Wan	b18eeaa80a	Implement `np.diff` for single order differences (#50569 ) Summary: Implements `np.diff` for single order differences only: - method and function variants for `diff` and function variant for `diff_out` - supports out variant, but not in-place since shape changes - adds OpInfo entry, and test in `test_torch` - automatic autograd because we are using the `Math` dispatch _Update: we only support Tensors for prepend and append in this PR. See discussion below and comments for more details._ Currently there is a quirk in the c++ API based on how this is implemented: it is not possible to specify scalar prepend and appends without also specifying all 4 arguments. That is because the goal is to match NumPy's diff signature of `diff(int n=1, int dim=-1, Union[Scalar, Tensor] prepend=None, Union[Scalar, Tensor] append)=None` where all arguments are optional, positional and in the correct order. There are a couple blockers. One is c++ ambiguity. This prevents us from simply doing `diff(int n=1, int dim=-1, Scalar? prepend=None, Tensor? append=None)` etc for all combinations of {Tensor, Scalar} x {Tensor, Scalar}. Why not have append, prepend not have default args and then write out the whole power set of {Tensor, Scalar, omitted} x {Tensor, Scalar, omitted} you might ask. Aside from having to write 18 overloads, this is actually illegal because arguments with defaults must come after arguments without defaults. This would mean having to write `diff(prepend, append, n, dim)` which is not desired. Finally writing out the entire power set of all arguments n, dim, prepend, append is out of the question because that would actually involve 2 * 2 * 3 * 3 = 36 combinations. And if we include the out variant, that would be 72 overloads! With this in mind, the current way this is implemented is actually to still do `diff(int n=1, int dim=-1, Scalar? prepend=None, Tensor? append=None)`. But also make use of `cpp_no_default_args`. The idea is to only have one of the 4 {Tensor, Scalar} x {Tensor, Scalar} provide default arguments for the c++ api, and add `cpp_no_default_args` for the remaining 3 overloads. With this, Python api works as expected, but some calls such as `diff(prepend=1)` won't work on c++ api. We can optionally add 18 more overloads that cover the {dim, n, no-args} x {scalar-tensor, tensor-scalar, scalar-scalar} x {out, non-out} cases for c++ api. _[edit: counting is hard - just realized this number is still wrong. We should try to count the cases we do cover instead and subtract that from the total: (2 * 2 * 3 * 3) - (3 + 2^4) = 17. 3 comes from the 3 of 4 combinations of {tensor, scalar}^2 that we declare to be `cpp_no_default_args`, and the one remaining case that has default arguments has covers 2^4 cases. So actual count is 34 additional overloads to support all possible calls]_ _[edit: thanks to https://github.com/pytorch/pytorch/issues/50767 hacky_wrapper is no longer necessary; it is removed in the latest commit]_ hacky_wrapper was also necessary here because `Tensor?` will cause dispatch to look for the `const optional<Tensor>&` schema but also generate a `const Tensor&` declaration in Functions.h. hacky_wrapper allows us to define our function as `const Tensor&` but wraps it in optional for us, so this avoids both the errors while linking and loading. _[edit: rewrote the above to improve clarity and correct the fact that we actually need 18 more overloads (26 total), not 18 in total to complete the c++ api]_ Pull Request resolved: https://github.com/pytorch/pytorch/pull/50569 Reviewed By: H-Huang Differential Revision: D26176105 Pulled By: soulitzer fbshipit-source-id: cd8e77cc2de1117c876cd71c29b312887daca33f	2021-02-02 20:25:16 -08:00
XiaobingSuper	ec378055c3	add OneDNN linear backward (#49453 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/49453 Test Plan: Imported from OSS Reviewed By: ejguan Differential Revision: D26006889 Pulled By: VitalyFedyunin fbshipit-source-id: 06e2a02b6e01d847395521a31fe84d844f2ee9ae	2021-02-02 12:18:59 -08:00
Hameer Abbasi	b1907f5ebc	Fix pickling for Tensor subclasses (redo) (#47732 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/47051 Redo of https://github.com/pytorch/pytorch/issues/47115 Pull Request resolved: https://github.com/pytorch/pytorch/pull/47732 Reviewed By: izdeby Differential Revision: D25465382 Pulled By: ezyang fbshipit-source-id: 3a8d57281a2d6f57415d5735d34ad307f3526638	2021-02-01 07:32:52 -08:00
Radhakrishnan Venkataramani	3397919dcf	Rowwise Prune op (Add the test to OSS run_test), Make the op private. (#46131 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/46131 Refer to the title. Test Plan: `buck test caffe2/test:pruning` Reviewed By: raghuramank100 Differential Revision: D24230472 fbshipit-source-id: 8f0a83446c23fdf30d0313b8c3f5ff1a463b50c7	2021-01-29 06:08:18 -08:00
Vasiliy Kuznetsov	267e243064	fake_quant: more memory efficient per-channel backward (#51255 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/51255 This is the same as #50561, but for per-channel fake_quant. TODO before land write up better Memory and performance impact (MobileNetV2): TODO Performance impact (microbenchmarks): https://gist.github.com/vkuzo/fbe1968d2bbb79b3f6dd776309fbcffc * forward pass on cpu: 512ms -> 750ms (+46%) * forward pass on cuda: 99ms -> 128ms (+30%) * note: the overall performance impact to training jobs should be minimal, because this is used for weights, and relative importance of fq is dominated by fq'ing the activations * note: we can optimize the perf in a future PR by reading once and writing twice Test Plan: ``` python test/test_quantization.py TestFakeQuantize.test_forward_per_channel_cachemask_cpu python test/test_quantization.py TestFakeQuantize.test_forward_per_channel_cachemask_cuda python test/test_quantization.py TestFakeQuantize.test_backward_per_channel_cachemask_cpu python test/test_quantization.py TestFakeQuantize.test_backward_per_channel_cachemask_cuda ``` Imported from OSS Reviewed By: jerryzh168 Differential Revision: D26117721 fbshipit-source-id: 798b59316dff8188a1d0948e69adf9e5509e414c	2021-01-28 19:39:35 -08:00
Vasiliy Kuznetsov	983b8e6b62	fake_quant: add a more memory efficient version (#50561 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/50561 Not for review yet, a bunch of TODOs need finalizing. tl;dr; add an alternative implementation of `fake_quantize` which saves a ask during the forward pass and uses it to calculate the backward. There are two benefits: 1. the backward function no longer needs the input Tensor, and it can be gc'ed earlier by autograd. On MobileNetV2, this reduces QAT overhead by ~15% (TODO: link, and absolute numbers). We add an additional mask Tensor to pass around, but its size is 4x smaller than the input tensor. A future optimization would be to pack the mask bitwise and unpack in the backward. 2. the computation of `qval` can be done only once in the forward and reused in the backward. No perf change observed, TODO verify with better matrics. TODO: describe in more detail Test Plan: OSS / torchvision / MobileNetV2 ``` python references/classification/train_quantization.py --print-freq 1 --data-path /data/local/packages/ai-group.imagenet-256-smallest-side/prod/ --output-dir ~/nfs/pytorch_vision_tests/ --backend qnnpack --epochs 5 TODO paste results here ``` TODO more Imported from OSS Reviewed By: ngimel Differential Revision: D25918519 fbshipit-source-id: ec544ca063f984de0f765bf833f205c99d6c18b6	2021-01-27 19:36:04 -08:00
Guilherme Leobas	9dfbfe9fca	Add type annotations to torch.overrides (#50824 ) Summary: This is a follow up PR of https://github.com/pytorch/pytorch/issues/48493. Fixes https://github.com/pytorch/pytorch/issues/48492 Pull Request resolved: https://github.com/pytorch/pytorch/pull/50824 Reviewed By: bdhirsh Differential Revision: D26050736 Pulled By: ezyang fbshipit-source-id: 049605fd271cff28c8b6e300c163e9df3b3ea23b	2021-01-25 13:20:09 -08:00
Kurt Mohler	8ab1a1495d	Rename `set_deterministic` to `use_deterministic_algorithms` (#49904 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/49100 Pull Request resolved: https://github.com/pytorch/pytorch/pull/49904 Reviewed By: ezyang, mrshenli Differential Revision: D25956761 Pulled By: mruberry fbshipit-source-id: 86a59289d50825a0ebbd7c358b483c8d8039ffa6	2021-01-22 11:27:07 -08:00
M.L. Croci	8eb90d4865	Add Gaussian NLL Loss (#50886 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/48520. cc albanD (This is a clean retry PR https://github.com/pytorch/pytorch/issues/49807) Pull Request resolved: https://github.com/pytorch/pytorch/pull/50886 Reviewed By: ejguan Differential Revision: D26007435 Pulled By: albanD fbshipit-source-id: 88fe91b40dea6f72e093e6301f0f04fcc842d2f0	2021-01-22 06:56:49 -08:00
Shen Li	1f5c3b3aae	Revert D25958987: [pytorch][PR] Add type annotations to torch.overrides Test Plan: revert-hammer Differential Revision: D25958987 (`2ace4fc01e`) Original commit changeset: aadc065c489b fbshipit-source-id: efd8b7c3cbe03d5ab0afa0d7c695182623285a3a	2021-01-20 08:59:44 -08:00
chengjun	4a8ef4525e	Add new backend type for Intel heterogeneous computation platform. (#49786 ) Summary: Add a new device type 'XPU' ('xpu' for lower case) to PyTorch. Changes are needed for code related to device model and kernel dispatch, e.g. DeviceType, Backend and DispatchKey etc. https://github.com/pytorch/pytorch/issues/48246 Pull Request resolved: https://github.com/pytorch/pytorch/pull/49786 Reviewed By: mrshenli Differential Revision: D25893962 Pulled By: ezyang fbshipit-source-id: 7ff0a316ee34cf0ed6fc7ead08ecdeb7df4b0052	2021-01-20 08:15:18 -08:00
kiyosora	4803eaf502	Implement NumPy-like function torch.fmax() & torch.fmin() (#49312 ) Summary: - Implementing the NumPy-like function`torch.fmax()` and `torch.fmin()` recommended in https://github.com/pytorch/pytorch/issues/48440 Pull Request resolved: https://github.com/pytorch/pytorch/pull/49312 Reviewed By: izdeby Differential Revision: D25887246 Pulled By: heitorschueroff fbshipit-source-id: d762eeff8b328bfcbe7d48b7ee9d2da72c249691	2021-01-20 06:45:25 -08:00
Guilherme Leobas	2ace4fc01e	Add type annotations to torch.overrides (#48493 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/48492 Pull Request resolved: https://github.com/pytorch/pytorch/pull/48493 Reviewed By: mruberry Differential Revision: D25958987 Pulled By: ezyang fbshipit-source-id: aadc065c489bf1a8c6258de14c930e396df763bc	2021-01-20 06:32:22 -08:00
Xinyu Li	7526e38cd3	Revert "Stable sort for CPU (#50052 )" (#50752 ) Summary: This reverts commit `c99f356051`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/50752 Reviewed By: zou3519 Differential Revision: D25958146 Pulled By: glaringlee fbshipit-source-id: f4068d038f9bd337bac8b673eaeb46a4646f6c77	2021-01-19 18:21:25 -08:00
Ivan Yashchuk	f9a5ba7398	Added linalg.slogdet (#49194 ) Summary: This PR adds `torch.linalg.slogdet`. Changes compared to the original torch.slogdet: - Complex input now works as in NumPy - Added out= variant (allocates temporary and makes a copy for now) - Updated `slogdet_backward` to work with complex input Ref. https://github.com/pytorch/pytorch/issues/42666 Pull Request resolved: https://github.com/pytorch/pytorch/pull/49194 Reviewed By: VitalyFedyunin Differential Revision: D25916959 Pulled By: mruberry fbshipit-source-id: cf9be8c5c044870200dcce38be48cd0d10e61a48	2021-01-19 07:28:12 -08:00
nikitaved	c99f356051	Stable sort for CPU (#50052 ) Summary: Fixes [https://github.com/pytorch/pytorch/issues/38681](https://github.com/pytorch/pytorch/issues/38681) for the CPU. Pull Request resolved: https://github.com/pytorch/pytorch/pull/50052 Reviewed By: mrshenli Differential Revision: D25900823 Pulled By: glaringlee fbshipit-source-id: 1a3fa336037d0aa2344d79f46dcacfd478a353d1	2021-01-15 19:34:27 -08:00
Hao Lu	4e76616719	[StaticRuntime][ATen] Add out variant for narrow_copy (#49502 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/49502 It broke the OSS CI the last time I landed it, mostly cuda tests and python bindings. Similar to permute_out, add the out variant of `aten::narrow` (slice in c2) which does an actual copy. `aten::narrow` creates a view, however, an copy is incurred when we call `input.contiguous` in the ops that follow `aten::narrow`, in `concat_add_mul_replacenan_clip`, `casted_batch_one_hot_lengths`, and `batch_box_cox`. {F351263599} Test Plan: Unit test: ``` buck test //caffe2/aten:math_kernel_test buck test //caffe2/test:sparse -- test_narrow ``` Benchmark with the adindexer model: ``` bs = 1 is neutral Before: I1214 21:32:51.919239 3285258 PyTorchPredictorBenchLib.cpp:209] PyTorch run finished. Milliseconds per iter: 0.0886948. Iters per second: 11274.6 After: I1214 21:32:52.492352 3285277 PyTorchPredictorBenchLib.cpp:209] PyTorch run finished. Milliseconds per iter: 0.0888019. Iters per second: 11261 bs = 20 shows more gains probably because the tensors are bigger and therefore the cost of copying is higher Before: I1214 21:20:19.702445 3227229 PyTorchPredictorBenchLib.cpp:209] PyTorch run finished. Milliseconds per iter: 0.527563. Iters per second: 1895.51 After: I1214 21:20:20.370173 3227307 PyTorchPredictorBenchLib.cpp:209] PyTorch run finished. Milliseconds per iter: 0.508734. Iters per second: 1965.67 ``` Reviewed By: ajyu Differential Revision: D25596290 fbshipit-source-id: da2f5a78a763895f2518c6298778ccc4d569462c	2021-01-12 19:35:32 -08:00
Ivan Yashchuk	9384d31af5	Added linalg.pinv (#48399 ) Summary: This PR adds `torch.linalg.pinv`. Changes compared to the original `torch.pinverse`: * New kwarg "hermitian": with `hermitian=True` eigendecomposition is used instead of singular value decomposition. * `rcond` argument can now be a `Tensor` of appropriate shape to apply matrix-wise clipping of singular values. * Added `out=` variant (allocates temporary and makes a copy for now) Ref. https://github.com/pytorch/pytorch/issues/42666 Pull Request resolved: https://github.com/pytorch/pytorch/pull/48399 Reviewed By: zhangguanheng66 Differential Revision: D25869572 Pulled By: mruberry fbshipit-source-id: 0f330a91d24ba4e4375f648a448b27594e00dead	2021-01-12 06:52:06 -08:00
Taylor Robie	d31a760be4	move has_torch_function to C++, and make a special case object_has_torch_function (#48965 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/48965 This PR pulls `__torch_function__` checking entirely into C++, and adds a special `object_has_torch_function` method for ops which only have one arg as this lets us skip tuple construction and unpacking. We can now also do away with the Python side fast bailout for `Tensor` (e.g. `if any(type(t) is not Tensor for t in tensors) and has_torch_function(tensors)`) because they're actually slower than checking with the Python C API. Test Plan: Existing unit tests. Benchmarks are in #48966 Reviewed By: ezyang Differential Revision: D25590732 Pulled By: robieta fbshipit-source-id: 6bd74788f06cdd673f3a2db898143d18c577eb42	2021-01-10 19:23:35 -08:00
Ivan Yashchuk	4774c6800b	Added linalg.inv (#48261 ) Summary: This PR adds `torch.linalg.inv` for NumPy compatibility. `linalg_inv_out` uses in-place operations on provided `result` tensor. I modified `apply_inverse` to accept tensor of Int instead of std::vector, that way we can write a function similar to `linalg_inv_out` but removing the error checks and device memory synchronization. I fixed `lda` (leading dimension parameter which is max(1, n)) in many places to handle 0x0 matrices correctly. Zero batch dimensions are also working and tested. Ref https://github.com/pytorch/pytorch/issues/42666 Pull Request resolved: https://github.com/pytorch/pytorch/pull/48261 Reviewed By: gchanan Differential Revision: D25849590 Pulled By: mruberry fbshipit-source-id: cfee6f1daf7daccbe4612ec68f94db328f327651	2021-01-10 04:00:51 -08:00
Antonio Cuni	5c5abd591d	Implement torch.linalg.svd (#45562 ) Summary: This is related to https://github.com/pytorch/pytorch/issues/42666 . I am opening this PR to have the opportunity to discuss things. First, we need to consider the differences between `torch.svd` and `numpy.linalg.svd`: 1. `torch.svd` takes `some=True`, while `numpy.linalg.svd` takes `full_matrices=True`, which is effectively the opposite (and with the opposite default, too!) 2. `torch.svd` returns `(U, S, V)`, while `numpy.linalg.svd` returns `(U, S, VT)` (i.e., V transposed). 3. `torch.svd` always returns a 3-tuple; `numpy.linalg.svd` returns only `S` in case `compute_uv==False` 4. `numpy.linalg.svd` also takes an optional `hermitian=False` argument. I think that the plan is to eventually deprecate `torch.svd` in favor of `torch.linalg.svd`, so this PR does the following: 1. Rename/adapt the old `svd` C++ functions into `linalg_svd`: in particular, now `linalg_svd` takes `full_matrices` and returns `VT` 2. Re-implement the old C++ interface on top of the new (by negating `full_matrices` and transposing `VT`). 3. The C++ version of `linalg_svd` always returns a 3-tuple (we can't do anything else). So, there is a python wrapper which manually calls `torch._C._linalg.linalg_svd` to tweak the return value in case `compute_uv==False`. Currently, `linalg_svd_backward` is broken because it has not been adapted yet after the `V ==> VT` change, but before continuing and spending more time on it I wanted to make sure that the general approach is fine. Pull Request resolved: https://github.com/pytorch/pytorch/pull/45562 Reviewed By: H-Huang Differential Revision: D25803557 Pulled By: mruberry fbshipit-source-id: 4966f314a0ba2ee391bab5cda4563e16275ce91f	2021-01-08 06:46:16 -08:00
Antonio Cuni	361f5ed91d	Implement torch.linalg.qr (#47764 ) Summary: I am opening this PR early to have a place to discuss design issues. The biggest difference between `torch.qr` and `numpy.linalg.qr` is that the former `torch.qr` takes a boolean parameter `some=True`, while the latter takes a string parameter `mode='reduced'` which can be one of the following: `reduced` this is completely equivalent to `some=True`, and both are the default. `complete` this is completely equivalent to `some=False`. `r` this returns only `r` instead of a tuple `(r, q)`. We have already decided that we don't want different return types depending on the parameters, so I propose to return `(r, empty_tensor)` instead. I think that in this mode it will be impossible to implement the backward pass, so we should raise an appropriate error in that case. `raw` in this mode, it returns `(h, tau)` instead of `(q, r)`. Internally, `h` and `tau` are obtained by calling lapack's `dgeqrf` and are later used to compute the actual values of `(q, r)`. The numpy docs suggest that these might be useful to call other lapack functions, but at the moment none of them is exposed by numpy and I don't know how often it is used in the real world. I suppose the implementing the backward pass need attention to: the most straightforward solution is to use `(h, tau)` to compute `(q, r)` and then use the normal logic for `qr_backward`, but there might be faster alternatives. `full`, `f` alias for `reduced`, deprecated since numpy 1.8.0 `economic`, `e` similar to `raw but it returns only `h` instead of `(h, tau). Deprecated since numpy 1.8.0 To summarize: * `reduce`, `complete` and `r` are straightforward to implement. * `raw` needs a bit of extra care, but I don't know how much high priority it is: since it is used rarely, we might want to not support it right now and maybe implement it in the future? * I think we should just leave `full` and `economic` out, and possibly add a note to the docs explaining what you need to use instead /cc mruberry Pull Request resolved: https://github.com/pytorch/pytorch/pull/47764 Reviewed By: ngimel Differential Revision: D25708870 Pulled By: mruberry fbshipit-source-id: c25c70a23a02ec4322430d636542041e766ebe1b	2020-12-28 17:28:17 -08:00
Mike Ruberry	5acc27c00a	Revert D25690129: [pytorch][PR] Added linalg.inv Test Plan: revert-hammer Differential Revision: D25690129 (`8554b58fbd`) Original commit changeset: edb2d03721f2 fbshipit-source-id: 8679ea18e637423d35919544d2b047a62ac3abd8	2020-12-23 15:27:52 -08:00
Ivan Yashchuk	8554b58fbd	Added linalg.inv (#48261 ) Summary: This PR adds `torch.linalg.inv` for NumPy compatibility. `linalg_inv_out` uses in-place operations on provided `result` tensor. I modified `apply_inverse` to accept tensor of Int instead of std::vector, that way we can write a function similar to `linalg_inv_out` but removing the error checks and device memory synchronization. I fixed `lda` (leading dimension parameter which is max(1, n)) in many places to handle 0x0 matrices correctly. Zero batch dimensions are also working and tested. Ref https://github.com/pytorch/pytorch/issues/42666 Pull Request resolved: https://github.com/pytorch/pytorch/pull/48261 Reviewed By: ngimel Differential Revision: D25690129 Pulled By: mruberry fbshipit-source-id: edb2d03721f22168c42ded8458513cb23dfdc712	2020-12-23 11:29:00 -08:00
Joel Schlosser	68d438c9da	Add PixelUnshuffle (#49334 ) Summary: Adds an implementation of `torch.nn.PixelUnshuffle` as the inverse operation of `torch.nn.PixelShuffle`. This addresses https://github.com/pytorch/pytorch/issues/2456 Pull Request resolved: https://github.com/pytorch/pytorch/pull/49334 Test Plan: ``` # Unit tests. python test/test_nn.py TestNN.test_pixel_shuffle_unshuffle # Module test. python test/test_nn.py TestNN.test_PixelUnshuffle # C++ API tests. build/bin/test_api # C++ / python parity tests. python test/test_cpp_api_parity.py # JIT test. python test/test_jit.py TestJitGeneratedFunctional.test_nn_pixel_unshuffle # Override tests. python test/test_overrides.py # Type hint tests. python test/test_type_hints.py ``` Screenshots of rendered docs: <img width="876" alt="Screen Shot 2020-12-18 at 12 19 05 PM" src="https://user-images.githubusercontent.com/75754324/102642255-6b07bb00-412b-11eb-88fa-e53e7e8ba720.png"> <img width="984" alt="Screen Shot 2020-12-18 at 12 19 26 PM" src="https://user-images.githubusercontent.com/75754324/102642276-70fd9c00-412b-11eb-8548-445082a2db02.png"> <img width="932" alt="Screen Shot 2020-12-18 at 12 19 34 PM" src="https://user-images.githubusercontent.com/75754324/102642704-19abfb80-412c-11eb-9546-95bdd1c3cf22.png"> <img width="876" alt="Screen Shot 2020-12-22 at 12 51 36 PM" src="https://user-images.githubusercontent.com/75754324/102918259-986aa680-4454-11eb-99e7-a0b4c8b3e283.png"> <img width="869" alt="Screen Shot 2020-12-22 at 12 51 44 PM" src="https://user-images.githubusercontent.com/75754324/102918274-9ef91e00-4454-11eb-94bb-91b58aff47d3.png"> Reviewed By: mruberry Differential Revision: D25401439 Pulled By: jbschlosser fbshipit-source-id: 209d92ce7295e51699e83616d0c62170a7ce75c8	2020-12-22 20:14:55 -08:00
kshitij12345	2780400904	[numpy] Add `torch.xlogy` (#48777 ) Summary: Reference https://github.com/pytorch/pytorch/issues/38349 Fixes https://github.com/pytorch/pytorch/issues/22656 TODO: * [x] Add docs * [x] Add tests Pull Request resolved: https://github.com/pytorch/pytorch/pull/48777 Reviewed By: ngimel Differential Revision: D25681346 Pulled By: mruberry fbshipit-source-id: 369e0a29ac8a2c44de95eec115bf75943fe1aa45	2020-12-22 15:05:59 -08:00
albanD	c23808d8e8	Reland: Add base forward grad logic (#49734 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/49734 RFC: https://github.com/pytorch/rfcs/pull/11 This PR add the basic logic to handle forward grad as dual Tensors. It contains the following: - Mechanism to save dual state on a Tensor and clear it up when the dual level ends - C++ and python user facing API - Updated view system that is able to track both forward and backward views The current PR has the following limitations: - Extensive tests are in the next PR in the stack as formulas are needed to write full tests. - Only the manual formulas have been audited and no other formula is actually implemented here (they are in the next PR in the stack) - Only level 0 is allowed for now. This was discussed and agreed that it is not needed for the first version of this PR. - We can save one ViewInfo creation when both the forward and backward views have the same base. This can be done by adding a boolean flag to the DifferentiableViewMeta and extra logic in the `as_view` method. This is left out to keep this PR concise. - We can skip tracking forward views if the base has a forward grad. This can be done by adding extra logic in the `as_view` method. This is left out to keep this PR concise. Reading guide: - Updated view handling in [gen_variable_type.py](https://github.com/pytorch/pytorch/pull/49097/files#diff-f6553cec68caeaea36f6c8b14ff76a6d39dfd774e0ea9ef2f76e8d81fd9af5df), [VariableTypeUtils.h](https://github.com/pytorch/pytorch/pull/49097/files#diff-ec71cfa45954dece1236c661d170e6341879c5be637f4abf52e826d61b40695a), [variable.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-60e3bfe444e89efc7149f25b38e472710525984789934ab83f1bd5671b8ff285) (skip code below "[Forward Grad View]" for now), [variable.h](https://github.com/pytorch/pytorch/pull/49097/files#diff-1604bcd0e4350ed99ec45e437cee7ac9ebe337392c9ea16a236247aeeb35b02bR266-R542) and [custom_function.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-dd85f452082b5bb6612bbc12adb496f8827defa228509f7b493de1d517522d5d). This introduces the new ViewInfo to hold view informations shared for forward and backward. It also updates the differentiable view meta to use this. And it updates the as_view function to handle both forward and backward view. - New forward grad class that handle storing gradients and tracking at each level [forward_grad.h](https://github.com/pytorch/pytorch/pull/49097/files#diff-c6c5b9ab2d7e5dde4102495faa1b6bbbfc23aa3e47deb7359c0bfe1eb004c0cb), [forward_grad.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-de2ab54ade7312701850d71a119a4f4ee4b9fc5a9c42a467cdd4e73c033531dd) and [build_variables.bzl](https://github.com/pytorch/pytorch/pull/49097/files#diff-dfdfa2efb17beddfd9094524f95351fd197db6c8857e96b436fb599870359325). EDIT: These files also contain the new flag to globally disable forward AD that allows us to reduce performance issues while this is in development. - Lowest level API and binding between Tensor and AutogradMeta in [TensorBody.h](https://github.com/pytorch/pytorch/pull/49097/files#diff-7554853205392fa743357bf845ecc350a974ec049383248c12daaf2f4de04911), [TensorImpl.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-052bd9150ef8e09289ddf644b5a6830ede49207201cd41728f6d7cc6d9cead94), [TensorImpl.h](https://github.com/pytorch/pytorch/pull/49097/files#diff-a15aae4cf23da44970db7cece62ff981265575c798c62f7b52d87c8809dfe2e1) and the rest of [variable.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-60e3bfe444e89efc7149f25b38e472710525984789934ab83f1bd5671b8ff285R557-R677) - API to access the forward primal that needs to be a differentiable function (and so in native_functions.yaml) [native_functions.yaml](https://github.com/pytorch/pytorch/pull/49097/files#diff-2f3dbd85efb9b5172f2264eedd3be47dd765e6ab7cc8bf3ade5e62c28ae35991) [NamedRegistrations.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-69bd3bea510c9b64e1633fa18c3ea63d4b8348dbad3a78ad9de844ab3e43dc1d), [VariableMethodsStub.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-23f5fcb737a2b289811fe0f4b65aef775e7c824b2e629ecd343df51405cd434f), [derivatives.yaml](https://github.com/pytorch/pytorch/pull/49097/files#diff-e4c2f99a2404e98c3586e07425da73008f36b1bada790648a7297af141d37f8c), [gen_python_functions.py](https://github.com/pytorch/pytorch/pull/49097/files#diff-e4c2f99a2404e98c3586e07425da73008f36b1bada790648a7297af141d37f8c), [gen_trace_type.py](https://github.com/pytorch/pytorch/pull/49097/files#diff-54e0b976027bf8debefb959ff360b89ae93466970c843365b1b3a03806d868ce), [TraceTypeManual.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-f34636741ad4a23d018e0c289bc750c3bad887b45660e1d6eaf440d234a78fbf) and [part of VariableTypeManual.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-6e19a1bce8cbdba8714b6e2c794a76bc0864b64a49cfa757cb0b5afdc937d1a4R198-R243) - c++ API [autograd.h](https://github.com/pytorch/pytorch/pull/49097/files#diff-349028fbe8291a965a7a263c323b208fe071c35c66179ee997ef84fa81aa4b1e), [autograd.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-a3fe908d67dfec16a1fcde300de68b0701bf68b88db7451f29f2bee255cf30c9) - python binding [init.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-c58a67c85191c22c9b3bb439117d8053edfd9dea839fa010cf967d404c3c630d) - python API [forward_ad.py](https://github.com/pytorch/pytorch/pull/49097/files#diff-a4efad4ba18fffdfb264c21e5475997a24a743089a899f8ec1a5ff962c6738d9), [autograd/__init__.py](https://github.com/pytorch/pytorch/pull/49097/files#diff-743abcafd32ad0e69f39ac5a91df4197b7e1921c135cacee7ef6dc829a8a7af8) - c++ and python printing [Formatting.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-881dba501e71662e2e4818b4b016f739b344c8aed2f5edc6b871eda47a2aced0), [_tensor_str.py](https://github.com/pytorch/pytorch/pull/49097/files#diff-a7911f8d5e73adbff914d99fd7818ace2a7030b6a3748abe06ec6fc6e3df9cc3) - Utility for formulas and updated manual functions to respect new view system as well as forward grad [FunctionsManual.h](https://github.com/pytorch/pytorch/pull/49097/files#diff-6378bb6dc81a64dab676d61731341fa5d1088418f32a1473a33a0ccfc2357dc1), [FunctionsManual.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-4adbd88239afcd60e8198aab65d4f5e43b62314e34b80551e997a1ea503adea5) [rest of VariableTypeManual.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-6e19a1bce8cbdba8714b6e2c794a76bc0864b64a49cfa757cb0b5afdc937d1a4R264-R433) - Ensure SavedVariable save forward grad properly [saved_variable.h](https://github.com/pytorch/pytorch/pull/49097/files#diff-c1b8039d776241abe177d5aa99b79dd9489a9b3e529da8ab24c2e386c1238ae2), [saved_variable.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-cc9fba479b5beae06b2eea2e390d17796e0341c5b037a20b5bcaccbb0c341030) Test Plan: Imported from OSS Reviewed By: gchanan Differential Revision: D25678797 Pulled By: albanD fbshipit-source-id: 3d58550c11b5f58b9b73fd30596d042b857fb9dd	2020-12-22 12:11:27 -08:00
kshitij12345	2df249f0ab	[fix] inplace remainder/% (#49390 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/49214 BC-Breaking Before this PR, `%=` didn't actually do the operation inplace and returned a new tensor. After this PR, `%=` operation is actually inplace and the modified input tensor is returned. Before PR, ```python >>> import torch >>> a = torch.tensor([11,12,13]) >>> id(a) 139627966219328 >>> a %= 10 >>> id(a) 139627966219264 ``` After PR, ```python >>> import torch >>> a = torch.tensor([11,12,13]) >>> id(a) 139804702425280 >>> a %= 10 >>> id(a) 139804702425280 ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/49390 Reviewed By: izdeby Differential Revision: D25560423 Pulled By: zou3519 fbshipit-source-id: 2b92bfda260582aa4ac22c4025376295e51f854e	2020-12-22 07:30:03 -08:00
Walter Shen	f5178bf151	Revert D25607503: Add base forward grad logic Test Plan: revert-hammer Differential Revision: D25607503 (`fdf02eff3d`) Original commit changeset: f1396290de1d fbshipit-source-id: 057206e28ff48ee288856adfe3ca577d4880789f	2020-12-21 19:56:28 -08:00
albanD	fdf02eff3d	Add base forward grad logic (#49097 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/49097 RFC: https://github.com/pytorch/rfcs/pull/11 This PR add the basic logic to handle forward grad as dual Tensors. It contains the following: - Mechanism to save dual state on a Tensor and clear it up when the dual level ends - C++ and python user facing API - Updated view system that is able to track both forward and backward views The current PR has the following limitations: - Extensive tests are in the next PR in the stack as formulas are needed to write full tests. - Only the manual formulas have been audited and no other formula is actually implemented here (they are in the next PR in the stack) - Only level 0 is allowed for now. This was discussed and agreed that it is not needed for the first version of this PR. - We can save one ViewInfo creation when both the forward and backward views have the same base. This can be done by adding a boolean flag to the DifferentiableViewMeta and extra logic in the `as_view` method. This is left out to keep this PR concise. - We can skip tracking forward views if the base has a forward grad. This can be done by adding extra logic in the `as_view` method. This is left out to keep this PR concise. Reading guide: - Updated view handling in [gen_variable_type.py](https://github.com/pytorch/pytorch/pull/49097/files#diff-f6553cec68caeaea36f6c8b14ff76a6d39dfd774e0ea9ef2f76e8d81fd9af5df), [VariableTypeUtils.h](https://github.com/pytorch/pytorch/pull/49097/files#diff-ec71cfa45954dece1236c661d170e6341879c5be637f4abf52e826d61b40695a), [variable.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-60e3bfe444e89efc7149f25b38e472710525984789934ab83f1bd5671b8ff285) (skip code below "[Forward Grad View]" for now), [variable.h](https://github.com/pytorch/pytorch/pull/49097/files#diff-1604bcd0e4350ed99ec45e437cee7ac9ebe337392c9ea16a236247aeeb35b02bR266-R542) and [custom_function.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-dd85f452082b5bb6612bbc12adb496f8827defa228509f7b493de1d517522d5d). This introduces the new ViewInfo to hold view informations shared for forward and backward. It also updates the differentiable view meta to use this. And it updates the as_view function to handle both forward and backward view. - New forward grad class that handle storing gradients and tracking at each level [forward_grad.h](https://github.com/pytorch/pytorch/pull/49097/files#diff-c6c5b9ab2d7e5dde4102495faa1b6bbbfc23aa3e47deb7359c0bfe1eb004c0cb), [forward_grad.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-de2ab54ade7312701850d71a119a4f4ee4b9fc5a9c42a467cdd4e73c033531dd) and [build_variables.bzl](https://github.com/pytorch/pytorch/pull/49097/files#diff-dfdfa2efb17beddfd9094524f95351fd197db6c8857e96b436fb599870359325). EDIT: These files also contain the new flag to globally disable forward AD that allows us to reduce performance issues while this is in development. - Lowest level API and binding between Tensor and AutogradMeta in [TensorBody.h](https://github.com/pytorch/pytorch/pull/49097/files#diff-7554853205392fa743357bf845ecc350a974ec049383248c12daaf2f4de04911), [TensorImpl.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-052bd9150ef8e09289ddf644b5a6830ede49207201cd41728f6d7cc6d9cead94), [TensorImpl.h](https://github.com/pytorch/pytorch/pull/49097/files#diff-a15aae4cf23da44970db7cece62ff981265575c798c62f7b52d87c8809dfe2e1) and the rest of [variable.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-60e3bfe444e89efc7149f25b38e472710525984789934ab83f1bd5671b8ff285R557-R677) - API to access the forward primal that needs to be a differentiable function (and so in native_functions.yaml) [native_functions.yaml](https://github.com/pytorch/pytorch/pull/49097/files#diff-2f3dbd85efb9b5172f2264eedd3be47dd765e6ab7cc8bf3ade5e62c28ae35991) [NamedRegistrations.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-69bd3bea510c9b64e1633fa18c3ea63d4b8348dbad3a78ad9de844ab3e43dc1d), [VariableMethodsStub.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-23f5fcb737a2b289811fe0f4b65aef775e7c824b2e629ecd343df51405cd434f), [derivatives.yaml](https://github.com/pytorch/pytorch/pull/49097/files#diff-e4c2f99a2404e98c3586e07425da73008f36b1bada790648a7297af141d37f8c), [gen_python_functions.py](https://github.com/pytorch/pytorch/pull/49097/files#diff-e4c2f99a2404e98c3586e07425da73008f36b1bada790648a7297af141d37f8c), [gen_trace_type.py](https://github.com/pytorch/pytorch/pull/49097/files#diff-54e0b976027bf8debefb959ff360b89ae93466970c843365b1b3a03806d868ce), [TraceTypeManual.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-f34636741ad4a23d018e0c289bc750c3bad887b45660e1d6eaf440d234a78fbf) and [part of VariableTypeManual.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-6e19a1bce8cbdba8714b6e2c794a76bc0864b64a49cfa757cb0b5afdc937d1a4R198-R243) - c++ API [autograd.h](https://github.com/pytorch/pytorch/pull/49097/files#diff-349028fbe8291a965a7a263c323b208fe071c35c66179ee997ef84fa81aa4b1e), [autograd.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-a3fe908d67dfec16a1fcde300de68b0701bf68b88db7451f29f2bee255cf30c9) - python binding [init.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-c58a67c85191c22c9b3bb439117d8053edfd9dea839fa010cf967d404c3c630d) - python API [forward_ad.py](https://github.com/pytorch/pytorch/pull/49097/files#diff-a4efad4ba18fffdfb264c21e5475997a24a743089a899f8ec1a5ff962c6738d9), [autograd/__init__.py](https://github.com/pytorch/pytorch/pull/49097/files#diff-743abcafd32ad0e69f39ac5a91df4197b7e1921c135cacee7ef6dc829a8a7af8) - c++ and python printing [Formatting.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-881dba501e71662e2e4818b4b016f739b344c8aed2f5edc6b871eda47a2aced0), [_tensor_str.py](https://github.com/pytorch/pytorch/pull/49097/files#diff-a7911f8d5e73adbff914d99fd7818ace2a7030b6a3748abe06ec6fc6e3df9cc3) - Utility for formulas and updated manual functions to respect new view system as well as forward grad [FunctionsManual.h](https://github.com/pytorch/pytorch/pull/49097/files#diff-6378bb6dc81a64dab676d61731341fa5d1088418f32a1473a33a0ccfc2357dc1), [FunctionsManual.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-4adbd88239afcd60e8198aab65d4f5e43b62314e34b80551e997a1ea503adea5) [rest of VariableTypeManual.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-6e19a1bce8cbdba8714b6e2c794a76bc0864b64a49cfa757cb0b5afdc937d1a4R264-R433) - Ensure SavedVariable save forward grad properly [saved_variable.h](https://github.com/pytorch/pytorch/pull/49097/files#diff-c1b8039d776241abe177d5aa99b79dd9489a9b3e529da8ab24c2e386c1238ae2), [saved_variable.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-cc9fba479b5beae06b2eea2e390d17796e0341c5b037a20b5bcaccbb0c341030) Test Plan: Imported from OSS Reviewed By: mrshenli Differential Revision: D25607503 Pulled By: albanD fbshipit-source-id: f1396290de1d75760f3d380c43cdd56e86fa6099	2020-12-21 14:39:43 -08:00
Xiong Wei	3779bdec56	Implementing NumPy-like function torch.broadcast_to (#48997 ) Summary: Related https://github.com/pytorch/pytorch/issues/38349 Implement NumPy-like function `torch.broadcast_to` to broadcast the input tensor to a new shape. Pull Request resolved: https://github.com/pytorch/pytorch/pull/48997 Reviewed By: anjali411, ngimel Differential Revision: D25663937 Pulled By: mruberry fbshipit-source-id: 0415c03f92f02684983f412666d0a44515b99373	2020-12-21 11:24:50 -08:00
Ivan Yashchuk	8be205ae13	Added linalg.solve (#48456 ) Summary: This PR adds `torch.linalg.solve`. `linalg_solve_out` uses in-place operations on the provided result tensor. I modified `apply_solve` to accept tensor of Int instead of std::vector, that way we can write a function similar to `linalg_solve_out` but removing the error checks and device memory synchronization. In comparison to `torch.solve` this routine accepts 1-dimensional tensors and batches of 1-dim tensors for the right-hand-side term. `torch.solve` requires it to be at least 2-dimensional. Ref. https://github.com/pytorch/pytorch/issues/42666 Pull Request resolved: https://github.com/pytorch/pytorch/pull/48456 Reviewed By: izdeby Differential Revision: D25562222 Pulled By: mruberry fbshipit-source-id: a9355c029e2442c2e448b6309511919631f9e43b	2020-12-21 10:11:12 -08:00
Jeffrey Wan	d0a12c5a47	Add sinc operator (#48740 ) Summary: Implements the sinc operator. See https://numpy.org/doc/stable/reference/generated/numpy.sinc.html ![image](https://user-images.githubusercontent.com/13428986/101653855-cdffa080-3a0d-11eb-8426-ecc81c152ebd.png) Pull Request resolved: https://github.com/pytorch/pytorch/pull/48740 Reviewed By: ezyang Differential Revision: D25597565 Pulled By: soulitzer fbshipit-source-id: 6dbcf282ee4eba34930bc9e5c85c0c5e79cf0322	2020-12-18 15:52:24 -08:00
Ryan Spring	65876d3f51	Change aten::native_layer_norm signature to match torch.layer_norm definition (#48971 ) Summary: This PR is to change the `aten::native_layer_norm` and `aten::native_layer_norm_backward` signature to match `torch.layer_norm` definition. The current definition doesn't provide enough information to the PyTorch JIT to fuse layer_norm during training. `native_layer_norm(X, gamma, beta, M, N, eps)` => `native_layer_norm(input, normalized_shape, weight, bias, eps)` `native_layer_norm_backward(dY, X, mean, rstd, gamma, M, N, grad_input_mask)` => `native_layer_norm_backward(dY, input, normalized_shape, mean, rstd, weight, bias, grad_input_mask)` Pull Request resolved: https://github.com/pytorch/pytorch/pull/48971 Reviewed By: izdeby Differential Revision: D25574070 Pulled By: ngimel fbshipit-source-id: 23e2804295a95bda3f1ca6b41a1e4c5a3d4d31b4	2020-12-16 23:09:18 -08:00
Jeffrey Wan	7767dcfc8d	Revert D25564477: [pytorch][PR] Add sinc operator Test Plan: revert-hammer Differential Revision: D25564477 (`bbc71435b7`) Original commit changeset: 13f36a2b84da fbshipit-source-id: 58cbe8109efaf499dd017531878b9fbbb27976bc	2020-12-16 13:19:16 -08:00
Jeffrey Wan	bbc71435b7	Add sinc operator (#48740 ) Summary: Implements the sinc operator. See https://numpy.org/doc/stable/reference/generated/numpy.sinc.html ![image](https://user-images.githubusercontent.com/13428986/101653855-cdffa080-3a0d-11eb-8426-ecc81c152ebd.png) Pull Request resolved: https://github.com/pytorch/pytorch/pull/48740 Reviewed By: izdeby Differential Revision: D25564477 Pulled By: soulitzer fbshipit-source-id: 13f36a2b84dadfb4fd1442a2a40a3a3246cbaecb	2020-12-16 10:33:02 -08:00
Bharat123rox	3aeb9cc85d	[DOCS]Correct docs for torch.lu_solve (#47762 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/43498 by correcting the function signature of `torch.lu_solve` Pull Request resolved: https://github.com/pytorch/pytorch/pull/47762 Reviewed By: ljk53 Differential Revision: D24900259 Pulled By: ailzhang fbshipit-source-id: 2a43170bde57e03d44025b23e3abcda169cfc9e2	2020-12-07 19:35:23 -08:00
Peter Bell	5180caeeb4	Remove deprecated spectral ops from torch namespace (#48594 ) Summary: Ref https://github.com/pytorch/pytorch/issues/42175 This removes the 4 deprecated spectral functions: `torch.{fft,rfft,ifft,irfft}`. `torch.fft` is also now imported by by default. The actual `at::native` functions are still used in `torch.stft` so can't be full removed yet. But will once https://github.com/pytorch/pytorch/issues/47601 has been merged. Pull Request resolved: https://github.com/pytorch/pytorch/pull/48594 Reviewed By: heitorschueroff Differential Revision: D25298929 Pulled By: mruberry fbshipit-source-id: e36737fe8192fcd16f7e6310f8b49de478e63bf0	2020-12-05 04:12:32 -08:00
kiyosora	6ab84ca0f3	Implement NumPy-like function torch.msort() (#48440 ) Summary: - Related with https://github.com/pytorch/pytorch/issues/38349 - Implementing the NumPy-like function `torch.msort()` . Pull Request resolved: https://github.com/pytorch/pytorch/pull/48440 Reviewed By: bdhirsh Differential Revision: D25265753 Pulled By: mruberry fbshipit-source-id: 7709ac5e5667e7541a3dc9048b9c9896b1a6dfa1	2020-12-04 04:32:09 -08:00
Ivan Yashchuk	cb285080b0	Added computing matrix condition numbers (linalg.cond) (#45832 ) Summary: This PR adds `torch.linalg.cond` for NumPy compatibility. Ref https://github.com/pytorch/pytorch/issues/42666. Pull Request resolved: https://github.com/pytorch/pytorch/pull/45832 Reviewed By: ngimel Differential Revision: D25183690 Pulled By: mruberry fbshipit-source-id: a727959bfec2bc2dc36df59d9ef79c0534b68194	2020-12-04 02:23:57 -08:00
Heitor Schueroff	c134f32835	Implemented torch.inner (#46716 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/46716 Implemented torch.inner similar to [numpy.inner](https://numpy.org/doc/stable/reference/generated/numpy.inner.html). For now it's implemented as a composite op. TODO - [x] Add documentation Test Plan: Imported from OSS Reviewed By: malfet Differential Revision: D24860351 Pulled By: heitorschueroff fbshipit-source-id: de5c82f285893495491fdba73b35634f4d00bac8	2020-12-03 11:37:55 -08:00
kshitij12345	5c9cef9a6c	[numpy] Add `torch.moveaxis` (#48581 ) Summary: Reference: https://github.com/pytorch/pytorch/issues/38349 #36048 https://github.com/pytorch/pytorch/pull/41480#issuecomment-734398262 Pull Request resolved: https://github.com/pytorch/pytorch/pull/48581 Reviewed By: bdhirsh Differential Revision: D25276307 Pulled By: mruberry fbshipit-source-id: 3e3e4df1343c5ce5b71457badc43f08c419ec5c3	2020-12-03 10:34:33 -08:00
Fritz Obermeyer	313e77fc06	Add broadcast_shapes() function and use it in MultivariateNormal (#43935 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/43837 This adds a `torch.broadcast_shapes()` function similar to Pyro's [broadcast_shape()](`7c2c22c10d/pyro/distributions/util.py (L151)`) and JAX's [lax.broadcast_shapes()](https://jax.readthedocs.io/en/test-docs/_modules/jax/lax/lax.html). This helper is useful e.g. in multivariate distributions that are parameterized by multiple tensors and we want to `torch.broadcast_tensors()` but the parameter tensors have different "event shape" (e.g. mean vectors and covariance matrices). This helper is already heavily used in Pyro's distribution codebase, and we would like to start using it in `torch.distributions`. - [x] refactor `MultivariateNormal`'s expansion logic to use `torch.broadcast_shapes()` - [x] add unit tests for `torch.broadcast_shapes()` - [x] add docs cc neerajprad Pull Request resolved: https://github.com/pytorch/pytorch/pull/43935 Reviewed By: bdhirsh Differential Revision: D25275213 Pulled By: neerajprad fbshipit-source-id: 1011fdd597d0a7a4ef744ebc359bbb3c3be2aadc	2020-12-03 02:42:04 -08:00
Ivan Yashchuk	74330e0497	Added linalg.matrix_rank (#48206 ) Summary: This PR adds `torch.linalg.matrix_rank`. Changes compared to the original `torch.matrix_rank`: - input with the complex dtype is supported - batched input is supported - "symmetric" kwarg renamed to "hermitian" Should I update the documentation for `torch.matrix_rank`? For the input with no elements (for example 0×0 matrix), the current implementation is divergent from NumPy. NumPy stumbles on not defined max for such input, here I chose to return appropriately sized tensor of zeros. I think that's mathematically a correct thing to do. Ref https://github.com/pytorch/pytorch/issues/42666. Pull Request resolved: https://github.com/pytorch/pytorch/pull/48206 Reviewed By: albanD Differential Revision: D25211965 Pulled By: mruberry fbshipit-source-id: ae87227150ab2cffa07f37b4a3ab228788701837	2020-12-02 03:29:25 -08:00
Hameer Abbasi	4e15877d5c	Add documentation for torch.overrides submodule. (#48170 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/48087 Pull Request resolved: https://github.com/pytorch/pytorch/pull/48170 Reviewed By: ejguan Differential Revision: D25220942 Pulled By: ezyang fbshipit-source-id: a2b7f7b565f5e77173d8ce2fe9676a8131f929b6	2020-11-30 11:25:31 -08:00
kiyosora	272f4db043	Implement NumPy-like function torch.float_power() (#44937 ) Summary: - Related with https://github.com/pytorch/pytorch/issues/38349 - Implementing the NumPy-like function `torch.float_power()` . Pull Request resolved: https://github.com/pytorch/pytorch/pull/44937 Reviewed By: ngimel Differential Revision: D25192119 Pulled By: mruberry fbshipit-source-id: 2e446b8e0c2825f045fe057e30c9419335557a05	2020-11-27 18:01:42 -08:00
Fayçal Arbai	2e0a8b75d8	An implementation of torch.tile as requested in pytorch/pytorch#38349 (#47974 ) Summary: The approach is to simply reuse `torch.repeat` but adding one more functionality to tile, which is to prepend 1's to reps arrays if there are more dimensions to the tensors than the reps given in input. Thus for a tensor of shape (64, 3, 24, 24) and reps of (2, 2) will become (1, 1, 2, 2), which is what NumPy does. I've encountered some instability with the test on my end, where I could get a random failure of the test (due to, sometimes, random value of `self.dim()`, and sometimes, segfaults). I'd appreciate any feedback on the test or an explanation for this instability so I can this. Pull Request resolved: https://github.com/pytorch/pytorch/pull/47974 Reviewed By: ngimel Differential Revision: D25148963 Pulled By: mruberry fbshipit-source-id: bf63b72c6fe3d3998a682822e669666f7cc97c58	2020-11-24 18:07:25 -08:00
Ivan Yashchuk	4ed7f36ed1	Added linalg.eigh, linalg.eigvalsh (#45526 ) Summary: This PR adds `torch.linalg.eigh`, and `torch.linalg.eigvalsh` for NumPy compatibility. The current `torch.symeig` uses (on CPU) a different LAPACK routine than NumPy (`syev` vs `syevd`). Even though it shouldn't matter in practice, `torch.linalg.eigh` uses `syevd` (as NumPy does). Ref https://github.com/pytorch/pytorch/issues/42666 Pull Request resolved: https://github.com/pytorch/pytorch/pull/45526 Reviewed By: gchanan Differential Revision: D25022659 Pulled By: mruberry fbshipit-source-id: 3676b77a121c4b5abdb712ad06702ac4944e900a	2020-11-22 04:57:28 -08:00
Randall Hunt	562d4c3bc5	Add basic ldexp operator for numpy compatibility (#45370 ) Summary: Adds ldexp operator for https://github.com/pytorch/pytorch/issues/38349 I'm not entirely sure the changes to `NamedRegistrations.cpp` were needed but I saw other operators in there so I added it. Normally the ldexp operator is used along with the frexp to construct and deconstruct floating point values. This is useful for performing operations on either the mantissa and exponent portions of floating point values. Sleef, std math.h, and cuda support both ldexp and frexp but not for all data types. I wasn't able to figure out how to get the iterators to play nicely with a vectorized kernel so I have left this with just the normal CPU kernel for now. This is the first operator I'm adding so please review with an eye for errors. Pull Request resolved: https://github.com/pytorch/pytorch/pull/45370 Reviewed By: mruberry Differential Revision: D24333516 Pulled By: ranman fbshipit-source-id: 2df78088f00aa9789aae1124eda399771e120d3f	2020-11-20 04:09:39 -08:00
Ivan Yashchuk	343b3e5cae	Added linalg.tensorinv (#45969 ) Summary: This PR adds `torch.linalg.tensorinv` for NumPy compatibility. Ref https://github.com/pytorch/pytorch/issues/42666 Pull Request resolved: https://github.com/pytorch/pytorch/pull/45969 Reviewed By: zhangguanheng66 Differential Revision: D25060568 Pulled By: mruberry fbshipit-source-id: 3b145ce64e4bd5021bc229f5ffdd791c572673a0	2020-11-19 11:54:50 -08:00
mfkasim91	8819bad86c	Implement igammac (3rd PR) (#48171 ) Summary: Related: https://github.com/pytorch/pytorch/issues/46183 (torch.igamma) This is the regularized upper incomplete gamma function. This is supposed to be exactly the same as https://github.com/pytorch/pytorch/issues/47463, but after rebasing the `viable/strict` branch. cc: mruberry Pull Request resolved: https://github.com/pytorch/pytorch/pull/48171 Reviewed By: zhangguanheng66 Differential Revision: D25060107 Pulled By: mruberry fbshipit-source-id: 89780dea21dbb2141cbc4f7f18192cb78a769b17	2020-11-18 23:44:32 -08:00
kshitij12345	68a3a3f3b5	Add `torch.swapdims` and `torch.swapaxes` (#46041 ) Summary: Reference https://github.com/pytorch/pytorch/issues/38349 Delegates to `torch.transpose` (not sure what is the best way to alias) TODO: * [x] Add test * [x] Add documentation Pull Request resolved: https://github.com/pytorch/pytorch/pull/46041 Reviewed By: gchanan Differential Revision: D25022816 Pulled By: mruberry fbshipit-source-id: c80223d081cef84f523ef9b23fbedeb2f8c1efc5	2020-11-18 11:35:53 -08:00
Ivan Yashchuk	260daf088d	Added linalg.cholesky (#46083 ) Summary: This PR adds `torch.linalg.cholesky` function that matches `numpy.linalg.cholesky`. Fixed `lda` argument to `lapackCholesky` calls. Added `random_hermitian_pd_matrix` helper function for tests. Ref https://github.com/pytorch/pytorch/issues/42666. Pull Request resolved: https://github.com/pytorch/pytorch/pull/46083 Reviewed By: ailzhang Differential Revision: D24861752 Pulled By: mruberry fbshipit-source-id: 214dbceb4e8a2c589df209493efd843962d25593	2020-11-13 16:50:40 -08:00
Richard Zou	59aca02224	Implement Tensor.new_empty_strided(sizes, strides, , dtype, device, requires_grad) (#47225 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/47225 Summary ------- This PR implements Tensor.new_empty_strided. Many of our torch. factory functions have a corresponding new_* method (e.g., torch.empty and torch.new_empty), but there is no corresponding method to torch.empty_strided. This PR adds one. Motivation ---------- The real motivation behind this is for vmap to be able to work through CopySlices. CopySlices shows up a lot in double backwards because a lot of view functions have backward formulas that perform view+inplace. `e0fd590ec9/torch/csrc/autograd/functions/tensor.cpp (L78-L106)` To support vmap through CopySlices, the approach in this stack is to: - add `Tensor.new_empty_strided` and replace `empty_strided` in CopySlices with that so that we can propagate batch information. - Make some slight modifications to AsStridedBackward (and add as_strided batching rule) Please let me know if it would be better if I squashed everything related to supporting vmap over CopySlices together into a single big PR. Test Plan --------- - New tests. Test Plan: Imported from OSS Reviewed By: ejguan Differential Revision: D24741688 Pulled By: zou3519 fbshipit-source-id: b688047d2eb3f92998896373b2e9d87caf2c4c39	2020-11-09 08:31:01 -08:00
Edward Yang	35491412d1	Revert D24649817: [pytorch][PR] Fix pickling for Tensor subclasses. Test Plan: revert-hammer Differential Revision: D24649817 (`c4209f1115`) Original commit changeset: 1872faa36030 fbshipit-source-id: b9832cea45552bd8776909118c4324fbd61fd414	2020-11-05 10:25:48 -08:00
Heitor Schueroff	a4ba018e57	Updated docs/test for dot and vdot (#47242 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/47242 Test Plan: Imported from OSS Reviewed By: ejguan Differential Revision: D24733771 Pulled By: heitorschueroff fbshipit-source-id: 92e3b0e28e0565918335fa85d52abe5db9eeff57	2020-11-05 06:27:50 -08:00
Hameer Abbasi	c4209f1115	Fix pickling for Tensor subclasses. (#47115 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/47051 Pull Request resolved: https://github.com/pytorch/pytorch/pull/47115 Reviewed By: ejguan Differential Revision: D24649817 Pulled By: ezyang fbshipit-source-id: 1872faa3603085f07c0a8a026404161d0715720d	2020-11-04 19:25:32 -08:00
Hameer Abbasi	60ae84754e	Add torch.overrides checks for submodules. (#47285 ) Summary: Partially addresses the override component of https://github.com/pytorch/pytorch/issues/42666 and https://github.com/pytorch/pytorch/issues/42175. Pull Request resolved: https://github.com/pytorch/pytorch/pull/47285 Reviewed By: agolynski Differential Revision: D24706493 Pulled By: ezyang fbshipit-source-id: bf5a742ac7002dce5a9a454a945f1994b4c8b93e	2020-11-04 19:14:04 -08:00
Erjia Guan	f1ac63d324	Implement copysign (#46396 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/46396 Related #38349 [numpy](https://numpy.org/doc/stable/reference/generated/numpy.copysign.html?highlight=copysign#numpy.copysign) - No in-place function - No method - Optional output - Available: byte, char, bool, int, short, long, float, double, half - Integral promoted to float - Not available: float/double complex `c = np.copysign(a, b)` \| a \| b \| c \| a.grad \| \| -1 \| -1 \| -1 \| 1 \| \| -0 \| -1 \| -0 \| 0 \| \| 0 \| -1 \| -0 \| 0 \| \| 1 \| -1 \| -1 \| -1 \| \| -1 \| -0 \| -1 \| 1 \| \| -0 \| -0 \| 0 \| 0 \| \| 0 \| -0 \| 0 \| 0 \| \| 1 \| -0 \| -1 \| -1 \| \| -1 \| 0 \| 1 \| -1 \| \| -0 \| 0 \| 0 \| 0 \| \| 0 \| 0 \| 0 \| 0 \| \| 1 \| 0 \| 1 \| 1 \| \| -1 \| 1 \| 1 \| -1 \| \| -0 \| 1 \| 0 \| 0 \| \| 0 \| 1 \| 0 \| 0 \| \| 1 \| 1 \| 1 \| 1 \| This function becomes non-differentiable at `a=0` for any `b`. So, in my opinion, we may set the gradient for `a=0` to 0. TODO: - [x] test (cpu/gpu) - [x] doc - [x] ~kernel_vec~ Test Plan: Imported from OSS Reviewed By: mruberry Differential Revision: D24401366 Pulled By: ejguan fbshipit-source-id: 3621c5ff74b185376a3705589983bb5197ab896d	2020-11-04 08:08:57 -08:00
Howard Huang	a8ef4d3f0b	Provide 'out' parameter for 'tensordot' (#47278 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/42102 Added an optional out parameter to the tensordot operation to allow using buffers. Pull Request resolved: https://github.com/pytorch/pytorch/pull/47278 Test Plan: pytest test/test_torch.py -k tensordot -v Reviewed By: agolynski Differential Revision: D24706258 Pulled By: H-Huang fbshipit-source-id: eb4bcd114795f67de3a670291034107d2826ea69	2020-11-03 15:56:00 -08:00
Ivan Yashchuk	f276ab55cd	Added Kronecker product of tensors (torch.kron) (#45358 ) Summary: This PR adds a function for calculating the Kronecker product of tensors. The implementation is based on `at::tensordot` with permutations and reshape. Tests pass. TODO: - [x] Add more test cases - [x] Write documentation - [x] Add entry `common_methods_invokations.py` Ref. https://github.com/pytorch/pytorch/issues/42666 Pull Request resolved: https://github.com/pytorch/pytorch/pull/45358 Reviewed By: mrshenli Differential Revision: D24680755 Pulled By: mruberry fbshipit-source-id: b1f8694589349986c3abfda3dc1971584932b3fa	2020-11-03 12:41:41 -08:00
Jeffrey Wan	f5073b0c5a	Add `inputs` argument to `autograd.backward()` (#46855 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/46373 As noted in https://github.com/pytorch/pytorch/issues/46373, there needs to be a flag passed into the engine that indicates whether it was executed through the backward api or grad api. Tentatively named the flag `accumulate_grad` since functionally, backward api accumulates grad into .grad while grad api captures the grad and returns it. Moving changes not necessary to the python api (cpp, torchscript) to a new PR. Pull Request resolved: https://github.com/pytorch/pytorch/pull/46855 Reviewed By: ngimel Differential Revision: D24649054 Pulled By: soulitzer fbshipit-source-id: 6925d5a67d583eeb781fc7cfaec807c410e1fc65	2020-11-02 14:32:38 -08:00
Xiong Wei	74d730c0b5	implement NumPy-like functionality column_stack, row_stack (#46313 ) Summary: Related https://github.com/pytorch/pytorch/issues/38349 This PR implements `column_stack` as the composite ops of `torch.reshape` and `torch.hstack`, and makes `row_stack` as the alias of `torch.vstack`. Todo - [x] docs - [x] alias pattern for `row_stack` Pull Request resolved: https://github.com/pytorch/pytorch/pull/46313 Reviewed By: ngimel Differential Revision: D24585471 Pulled By: mruberry fbshipit-source-id: 62fc0ffd43d051dc3ecf386a3e9c0b89086c1d1c	2020-10-29 12:14:39 -07:00
mfkasim91	6eaa324c9f	Implement torch.igamma (#46183 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/41637 This is regularized lower incomplete gamma function, equivalent to scipy's `gammainc` and tensorflow `igamma`. cc fritzo mruberry Pull Request resolved: https://github.com/pytorch/pytorch/pull/46183 Reviewed By: gchanan Differential Revision: D24479126 Pulled By: mruberry fbshipit-source-id: fdf8ea289fe4ca1b408810732192411e948fcdfe	2020-10-29 11:40:18 -07:00
Ivan Yashchuk	f629fbe235	Added torch.linalg.tensorsolve (#46142 ) Summary: This PR adds `torch.linalg.tensorsolve` function that matches `numpy.linalg.tensorsolve`. Ref https://github.com/pytorch/pytorch/issues/42666. Pull Request resolved: https://github.com/pytorch/pytorch/pull/46142 Reviewed By: izdeby Differential Revision: D24539400 Pulled By: mruberry fbshipit-source-id: 6e38364fe0bc511e739036deb274d9307df119b2	2020-10-29 10:29:28 -07:00
albanD	27e2ea4cea	Make add_relu an internal function (#46676 ) Summary: Cleanup for 1.7 Pull Request resolved: https://github.com/pytorch/pytorch/pull/46676 Reviewed By: gchanan Differential Revision: D24458565 Pulled By: albanD fbshipit-source-id: b1e4b4630233d3f1a4bac20e3077411d1ae17f7b	2020-10-22 18:08:15 -07:00
Ivan Kobzarev	3112e23428	[py][vulkan][reland] Add is_vulkan to py api, add vulkan to device type parsing (#46655 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/46655 Test Plan: Imported from OSS Pulled By: IvanKobzarev Reviewed By: mrshenli Differential Revision: D24448984 fbshipit-source-id: 5000846a06077f7a5a06dd51da422d2a42f70820	2020-10-22 09:35:50 -07:00
albanD	143d1fd9f5	Namespace cleanup for 1.7 Part 2 (#46673 ) Summary: make valgrind_toggle and valgrind_supported_platform private functions Pull Request resolved: https://github.com/pytorch/pytorch/pull/46673 Reviewed By: gchanan Differential Revision: D24458133 Pulled By: albanD fbshipit-source-id: 6f3fad9931d73223085edbd3cd3b7830c569570c	2020-10-22 07:57:51 -07:00
Shen Li	cebe87fe3a	Revert D24379422: [py][vulkan] Add is_vulkan to py api, add vulkan to device type parsing Test Plan: revert-hammer Differential Revision: D24379422 (`e8fbe54cf5`) Original commit changeset: afab89bb9e17 fbshipit-source-id: 743c77e453239f10c155c67490cba5a42ab42f58	2020-10-21 08:23:05 -07:00
Ivan Kobzarev	e8fbe54cf5	[py][vulkan] Add is_vulkan to py api, add vulkan to device type parsing (#46511 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/46511 Test Plan: Imported from OSS Reviewed By: AshkanAliabadi Differential Revision: D24379422 Pulled By: IvanKobzarev fbshipit-source-id: afab89bb9e17c50934083598262bbe14ea82e893	2020-10-20 20:04:24 -07:00
Alexander Grund	5b0f400488	Replace list(map(...)) constructs by list comprehensions (#46461 ) Summary: As discussed in https://github.com/pytorch/pytorch/issues/46392 this makes the code more readable and possibly more performant. It also fixes a bug detected by this where the argument order of `map` was confused: `030a24906e (diff-5bb26bd3a23ee3bb540aeadcc0385df2a4e48de39f87ed9ea76b21990738fe98L1537-R1537)` Fixes https://github.com/pytorch/pytorch/issues/46392 Pull Request resolved: https://github.com/pytorch/pytorch/pull/46461 Reviewed By: ailzhang Differential Revision: D24367015 Pulled By: ezyang fbshipit-source-id: d55a67933cc22346b00544c9671f09982ad920e7	2020-10-19 18:42:49 -07:00
Hameer Abbasi	e366591dc8	Fix incorrect signatures in get_testing_overrides, and add test for incorrect signatures (#45983 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/45494 Pull Request resolved: https://github.com/pytorch/pytorch/pull/45983 Reviewed By: agolynski Differential Revision: D24220048 Pulled By: ezyang fbshipit-source-id: 67826efdb203d849e028467829f7b5ad4559ec67	2020-10-15 07:48:20 -07:00
Erjia Guan	bed3b40523	Implement ravel (#46098 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/46098 Doc: ![image](https://user-images.githubusercontent.com/68879799/95611323-ae5cf380-0a2f-11eb-9b8e-56bf79ce68af.png) Test Plan: Imported from OSS Reviewed By: glaringlee Differential Revision: D24253213 Pulled By: ejguan fbshipit-source-id: 42a866c902272cbe3743a9d0cb3afb9165d51c0b	2020-10-12 16:00:44 -07:00
Heitor Schueroff de Souza	636eb18029	Fixed median nan propagation and implemented nanmedian (#45847 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/45847 Original PR here https://github.com/pytorch/pytorch/pull/45084. Created this one because I was having problems with ghstack. Test Plan: Imported from OSS Reviewed By: mruberry Differential Revision: D24136629 Pulled By: heitorschueroff fbshipit-source-id: dd7c7540a33f6a19e1ad70ba2479d5de44abbdf9	2020-10-08 11:20:21 -07:00
Kurt Mohler	ef4817fe5a	Add `tensor_split` function, based on `numpy.array_split` (#45168 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/9382 Pull Request resolved: https://github.com/pytorch/pytorch/pull/45168 Reviewed By: ngimel Differential Revision: D24166164 Pulled By: mruberry fbshipit-source-id: 795459821e52885bc99623a01a2abec060995ce6	2020-10-07 23:14:48 -07:00
kshitij12345	f65ab89edd	[numpy] Add torch.nan_to_num (#44592 ) Summary: Reference https://github.com/pytorch/pytorch/issues/42515 TODO: * [x] Add tests * [x] Add docs Pull Request resolved: https://github.com/pytorch/pytorch/pull/44592 Reviewed By: colesbury Differential Revision: D24079472 Pulled By: mruberry fbshipit-source-id: 2b67d36cba46eaa7ca16cd72671b57750bd568bc	2020-10-05 01:38:56 -07:00
Taylor Robie	2b13d9413e	Re-land: Add callgrind collection to Timer #44717 (#45586 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/45586 Test Plan: The unit test has been softened to be less platform sensitive. Reviewed By: mruberry Differential Revision: D24025415 Pulled By: robieta fbshipit-source-id: ee986933b984e736cf1525e1297de6b21ac1f0cf	2020-09-30 17:43:06 -07:00
Mike Ruberry	51d0ae9207	Revert D24010742: [pytorch][PR] Add callgrind collection to Timer Test Plan: revert-hammer Differential Revision: D24010742 (`9b27e0926b`) Original commit changeset: df6bc765f8ef fbshipit-source-id: 4c1edd57ea932896f7052716427059c924222501	2020-09-30 10:15:46 -07:00
Taylor Robie	9b27e0926b	Add callgrind collection to Timer (#44717 ) Summary: This PR allows Timer to collect deterministic instruction counts for (some) snippets. Because of the intrusive nature of Valgrind (effectively replacing the CPU with an emulated one) we have to perform our measurements in a separate process. This PR writes a `.py` file containing the Timer's `setup` and `stmt`, and executes it within a `valgrind` subprocess along with a plethora of checks and error handling. There is still a bit of jitter around the edges due to the Python glue that I'm using, but the PyTorch signal is quite good and thus this provides a low friction way of getting signal. I considered using JIT as an alternative, but: A) Python specific overheads (e.g. parsing) are important B) JIT might do rewrites which would complicate measurement. Consider the following bit of code, related to https://github.com/pytorch/pytorch/issues/44484: ``` from torch.utils._benchmark import Timer counts = Timer( "x.backward()", setup="x = torch.ones((1,)) + torch.ones((1,), requires_grad=True)" ).collect_callgrind() for c, fn in counts[:20]: print(f"{c:>12} {fn}") ``` ``` 812800 ???:_dl_update_slotinfo 355600 ???:update_get_addr 308300 work/Python/ceval.c:_PyEval_EvalFrameDefault'2 304800 ???:__tls_get_addr 196059 ???:_int_free 152400 ???:__tls_get_addr_slow 138400 build/../c10/core/ScalarType.h:c10::typeMetaToScalarType(caffe2::TypeMeta) 126526 work/Objects/dictobject.c:_PyDict_LoadGlobal 114268 ???:malloc 101400 work/Objects/unicodeobject.c:PyUnicode_FromFormatV 85900 work/Python/ceval.c:_PyEval_EvalFrameDefault 79946 work/Objects/typeobject.c:_PyType_Lookup 72000 build/../c10/core/Device.h:c10::Device::validate() 70000 /usr/include/c++/8/bits/stl_vector.h:std::vector<at::Tensor, std::allocator<at::Tensor> >::~vector() 66400 work/Objects/object.c:_PyObject_GenericGetAttrWithDict 63000 ???:pthread_mutex_lock 61200 work/Objects/dictobject.c:PyDict_GetItem 59800 ???:free 58400 work/Objects/tupleobject.c:tupledealloc 56707 work/Objects/dictobject.c:lookdict_unicode_nodummy ``` Moreover, if we backport this PR to 1.6 (just copy the `_benchmarks` folder) and load those counts as `counts_1_6`, then we can easily diff them: ``` print(f"Head instructions: {sum(c for c, _ in counts)}") print(f"1.6 instructions: {sum(c for c, _ in counts_1_6)}") count_dict = {fn: c for c, fn in counts} for c, fn in counts_1_6: _ = count_dict.setdefault(fn, 0) count_dict[fn] -= c count_diffs = sorted([(c, fn) for fn, c in count_dict.items()], reverse=True) for c, fn in count_diffs[:15] + [["", "..."]] + count_diffs[-15:]: print(f"{c:>8} {fn}") ``` ``` Head instructions: 7609547 1.6 instructions: 6059648 169600 ???:_dl_update_slotinfo 101400 work/Objects/unicodeobject.c:PyUnicode_FromFormatV 74200 ???:update_get_addr 63600 ???:__tls_get_addr 46800 work/Python/ceval.c:_PyEval_EvalFrameDefault 33512 work/Objects/dictobject.c:_PyDict_LoadGlobal 31800 ???:__tls_get_addr_slow 31700 build/../aten/src/ATen/record_function.cpp:at::RecordFunction::RecordFunction(at::RecordScope) 28300 build/../torch/csrc/utils/python_arg_parser.cpp:torch::FunctionSignature::parse(_object, _object, _object, _object, bool) 27800 work/Objects/object.c:_PyObject_GenericGetAttrWithDict 27401 work/Objects/dictobject.c:lookdict_unicode_nodummy 24115 work/Objects/typeobject.c:_PyType_Lookup 24080 ???:_int_free 21700 work/Objects/dictobject.c:PyDict_GetItemWithError 20700 work/Objects/dictobject.c:PyDict_GetItem ... -3200 build/../c10/util/SmallVector.h:at::TensorIterator::binary_op(at::Tensor&, at::Tensor const&, at::Tensor const&, bool) -3400 build/../aten/src/ATen/native/TensorIterator.cpp:at::TensorIterator::resize_outputs(at::TensorIteratorConfig const&) -3500 /usr/include/c++/8/x86_64-redhat-linux/bits/gthr-default.h:std::unique_lock<std::mutex>::unlock() -3700 build/../torch/csrc/utils/python_arg_parser.cpp:torch::PythonArgParser::raw_parse(_object, _object, _object) -4207 work/Objects/obmalloc.c:PyMem_Calloc -4500 /usr/include/c++/8/bits/stl_vector.h:std::vector<at::Tensor, std::allocator<at::Tensor> >::~vector() -4800 build/../torch/csrc/autograd/generated/VariableType_2.cpp:torch::autograd::VariableType::add__Tensor(at::Tensor&, at::Tensor const&, c10::Scalar) -5000 build/../c10/core/impl/LocalDispatchKeySet.cpp:c10::impl::ExcludeDispatchKeyGuard::ExcludeDispatchKeyGuard(c10::DispatchKey) -5300 work/Objects/listobject.c:PyList_New -5400 build/../torch/csrc/utils/python_arg_parser.cpp:torch::FunctionParameter::check(_object, std::vector<pybind11::handle, std::allocator<pybind11::handle> >&) -5600 /usr/include/c++/8/bits/std_mutex.h:std::unique_lock<std::mutex>::unlock() -6231 work/Objects/obmalloc.c:PyMem_Free -6300 work/Objects/listobject.c:list_repeat -11200 work/Objects/listobject.c:list_dealloc -28900 build/../torch/csrc/utils/python_arg_parser.cpp:torch::FunctionSignature::parse(_object, _object, _object*, bool) ``` Remaining TODOs: Include a timer in the generated script for cuda sync. * Add valgrind to CircleCI machines and add a unit test. Pull Request resolved: https://github.com/pytorch/pytorch/pull/44717 Reviewed By: soumith Differential Revision: D24010742 Pulled By: robieta fbshipit-source-id: df6bc765f8efce7193893edba186cd62b4b23623	2020-09-30 05:52:54 -07:00
Zafar	bb478810e0	[quant] torch.max_pool1d (#45152 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/45152 Test Plan: Imported from OSS Reviewed By: jerryzh168 Differential Revision: D23846473 Pulled By: z-a-f fbshipit-source-id: 38fd611e568e4f8b39b7a00adeb42c7b99576360	2020-09-29 01:45:22 -07:00
Brian Hirsh	439930c81b	adding a beta parameter to the smooth_l1 loss fn (#44433 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44433 Not entirely sure why, but changing the type of beta from `float` to `double in autocast_mode.cpp and FunctionsManual.h fixes my compiler errors, failing instead at link time fixing some type errors, updated fn signature in a few more files removing my usage of Scalar, making beta a double everywhere instead Test Plan: Imported from OSS Reviewed By: mrshenli Differential Revision: D23636720 Pulled By: bdhirsh fbshipit-source-id: caea2a1f8dd72b3b5fd1d72dd886b2fcd690af6d	2020-09-25 16:36:28 -07:00
Supriya Rao	60665ace17	[quant] Add optimized approach to calculate qparams for qembedding_bag (#45149 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/45149 The choose_qparams_optimized calculates the the optimized qparams. It uses a greedy approach to nudge the min and max and calculate the l2 norm and tries to minimize the quant error by doing `torch.norm(x-fake_quant(x,s,z))` Test Plan: Imported from OSS Reviewed By: raghuramank100 Differential Revision: D23848060 fbshipit-source-id: c6c57c9bb07664c3f1c87dd7664543e09f634aee	2020-09-23 19:00:22 -07:00
Mike Ruberry	ef885c10d8	[pytorch] Add triplet margin loss with custom distance (#43680 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/43680 As discussed [here](https://github.com/pytorch/pytorch/issues/43342), adding in a Python-only implementation of the triplet-margin loss that takes a custom distance function. Still discussing whether this is necessary to add to PyTorch Core. Test Plan: python test/run_tests.py Imported from OSS Reviewed By: albanD Differential Revision: D23363898 fbshipit-source-id: 1cafc05abecdbe7812b41deaa1e50ea11239d0cb	2020-09-22 11:35:52 -07:00
anjali411	58b6ab69e5	torch.sgn for complex tensors (#39955 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/39955 resolves https://github.com/pytorch/pytorch/issues/36323 by adding `torch.sgn` for complex tensors. `torch.sgn` returns `x/abs(x)` for `x != 0` and returns `0 + 0j` for `x==0` This PR doesn't test the correctness of the gradients. It will be done as a part of auditing all the ops in future once we decide the autograd behavior (JAX vs TF) and add gradchek. Test Plan: Imported from OSS Reviewed By: mruberry Differential Revision: D23460526 Pulled By: anjali411 fbshipit-source-id: 70fc4e14e4d66196e27cf188e0422a335fc42f92	2020-09-22 08:24:53 -07:00
Mike Ruberry	60709ad1bf	Adds multiply and divide aliases (#44463 ) Summary: These alias are consistent with NumPy. Note that C++'s naming would be different (std::multiplies and std::divides), and that PyTorch's existing names (mul and div) are consistent with Python's dunders. This also improves the instructions for adding an alias to clarify that dispatch keys should be removed when copying native_function.yaml entries to create the alias entries. Pull Request resolved: https://github.com/pytorch/pytorch/pull/44463 Reviewed By: ngimel Differential Revision: D23670782 Pulled By: mruberry fbshipit-source-id: 9f1bdf8ff447abc624ff9e9be7ac600f98340ac4	2020-09-19 15:47:52 -07:00
Peter Bell	caea1adc35	Complex support for stft and istft (#43886 ) Summary: Ref https://github.com/pytorch/pytorch/issues/42175, fixes https://github.com/pytorch/pytorch/issues/34797 This adds complex support to `torch.stft` and `torch.istft`. Note that there are really two issues with complex here: complex signals, and returning complex tensors. ## Complex signals and windows `stft` currently assumes all signals are real and uses `rfft` with `onesided=True` by default. Similarly, `istft` always takes a complex fourier series and uses `irfft` to return real signals. For `stft`, I now allow complex inputs and windows by calling the full `fft` if either are complex. If the user gives `onesided=True` and the signal is complex, then this doesn't work and raises an error instead. For `istft`, there's no way to automatically know what to do when `onesided=False` because that could either be a redundant representation of a real signal or a complex signal. So there, the user needs to pass the argument `return_complex=True` in order to use `ifft` and get a complex result back. ## stft returning complex tensors The other issue is that `stft` returns a complex result, represented as a `(... X 2)` real tensor. I think ideally we want this to return proper complex tensors but to preserver BC I've had to add a `return_complex` argument to manage this transition. `return_complex` defaults to false for real inputs to preserve BC but defaults to True for complex inputs where there is no BC to consider. In order to `return_complex` by default everywhere without a sudden BC-breaking change, a simple transition plan could be: 1. introduce `return_complex`, defaulted to false when BC is an issue but giving a warning. (this PR) 2. raise an error in cases where `return_complex` defaults to false, making it a required argument. 3. change `return_complex` default to true in all cases. Pull Request resolved: https://github.com/pytorch/pytorch/pull/43886 Reviewed By: glaringlee Differential Revision: D23760174 Pulled By: mruberry fbshipit-source-id: 2fec4404f5d980ddd6bdd941a63852a555eb9147	2020-09-18 01:39:47 -07:00
Heitor Schueroff de Souza	28085cbd39	Fixed quantile nan propagation and implemented nanquantile (#44393 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44393 torch.quantile now correctly propagates nan and implemented torch.nanquantile similar to numpy.nanquantile. Test Plan: Imported from OSS Reviewed By: albanD Differential Revision: D23649613 Pulled By: heitorschueroff fbshipit-source-id: 5201d076745ae1237cedc7631c28cf446be99936	2020-09-17 05:53:25 -07:00
Muthu Arivoli	b61d3d8be8	Implement torch.kaiser_window (#44271 ) Summary: Related to https://github.com/pytorch/pytorch/issues/38349 Pull Request resolved: https://github.com/pytorch/pytorch/pull/44271 Reviewed By: ngimel Differential Revision: D23727972 Pulled By: mruberry fbshipit-source-id: b4c931b2eb3a536231ad6d6c3cb66e52a13286ac	2020-09-16 20:41:31 -07:00
Xiang Gao	20ac736200	Remove py2 compatible future imports (#44735 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44735 Reviewed By: mruberry Differential Revision: D23731306 Pulled By: ezyang fbshipit-source-id: 0ba009a99e475ddbe22981be8ac636f8a1c8b02f	2020-09-16 12:55:57 -07:00
kshitij12345	c68a99bd61	[numpy] Add `torch.exp2` (#44184 ) Summary: Reference https://github.com/pytorch/pytorch/issues/42515 TODO * [x] Add tests * [x] Add docs Pull Request resolved: https://github.com/pytorch/pytorch/pull/44184 Reviewed By: ngimel Differential Revision: D23674237 Pulled By: mruberry fbshipit-source-id: 7f4fb1900fad3051cd7fc9d3d7f6d985c5fb093c	2020-09-14 04:05:37 -07:00
Hameer Abbasi	f9a0d0c21e	Allow Tensor-likes in torch.autograd.gradcheck (#43877 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/42942 Pull Request resolved: https://github.com/pytorch/pytorch/pull/43877 Reviewed By: zou3519 Differential Revision: D23493257 Pulled By: ezyang fbshipit-source-id: 6cdaabe17157b484e9491189706ccc15420ac239	2020-09-10 09:02:17 -07:00
Mike Ruberry	83a6e7d342	Adds inequality testing aliases for better NumPy compatibility (#43870 ) Summary: This PR adds the following aliaes: - not_equal for torch.ne - greater for torch.gt - greater_equal for torch.ge - less for torch.lt - less_equal for torch.le This aliases are consistent with NumPy's naming for these functions. Pull Request resolved: https://github.com/pytorch/pytorch/pull/43870 Reviewed By: zou3519 Differential Revision: D23498975 Pulled By: mruberry fbshipit-source-id: 78560df98c9f7747e804a420c1e53fd1dd225002	2020-09-06 09:36:23 -07:00
Muthu Arivoli	719d29dab5	Implement torch.i0 and torch.kaiser_window (#43132 ) Summary: Related to https://github.com/pytorch/pytorch/issues/38349 Pull Request resolved: https://github.com/pytorch/pytorch/pull/43132 Reviewed By: smessmer Differential Revision: D23479072 Pulled By: mruberry fbshipit-source-id: 4fb1de44830771c6a7222cf19f7728d9ac7c043b	2020-09-05 23:11:47 -07:00
kshitij12345	b6b5ebc345	Add `torch.vdot` (#43004 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/42747 Pull Request resolved: https://github.com/pytorch/pytorch/pull/43004 Reviewed By: mruberry Differential Revision: D23318935 Pulled By: anjali411 fbshipit-source-id: 12d4824b7cb42bb9ca703172c54ec5c663d9e325	2020-09-02 09:00:30 -07:00
kiyosora	3682df77db	Implementing NumPy-like function torch.heaviside() (#42523 ) Summary: - Related with https://github.com/pytorch/pytorch/issues/38349 - Implementing the NumPy-like function `torch.heaviside()` . Pull Request resolved: https://github.com/pytorch/pytorch/pull/42523 Reviewed By: ngimel Differential Revision: D23416743 Pulled By: mruberry fbshipit-source-id: 9975bd9c9fa73bd0958fe9879f79a692aeb722d5	2020-08-31 15:54:56 -07:00
Xiang Gao	4ef12be900	Add __complex__ (#43844 ) Summary: fixes https://github.com/pytorch/pytorch/issues/43833 Pull Request resolved: https://github.com/pytorch/pytorch/pull/43844 Reviewed By: ZolotukhinM Differential Revision: D23422000 Pulled By: ngimel fbshipit-source-id: ebc6a27a9b04c77c3977e6c184cefce9e817cc2f	2020-08-31 11:39:41 -07:00
Xiang Gao	a860be898e	[resubmit] Add amax/amin (#43819 ) Summary: Resubmit for landing next week. Pull Request resolved: https://github.com/pytorch/pytorch/pull/43819 Reviewed By: ngimel Differential Revision: D23421906 Pulled By: mruberry fbshipit-source-id: 23dd60d1e365bb1197d660c3bfad7ee07ba3e97f	2020-08-31 04:54:48 -07:00
Mike Ruberry	3aeb70db0b	Documents sub properly, adds subtract alias (#43850 ) Summary: `torch.sub` was undocumented, so this PR adds its documentation, analogous to `torch.add`'s documentation, and adds the alias `torch.subtract` for `torch.sub`, too. This alias comes from NumPy (see https://numpy.org/doc/stable/reference/generated/numpy.subtract.html?highlight=subtract#numpy.subtract) Pull Request resolved: https://github.com/pytorch/pytorch/pull/43850 Reviewed By: ngimel Differential Revision: D23416908 Pulled By: mruberry fbshipit-source-id: 6c4d2ebaf6ecae91f3a6efe484ce6c4dad96f016	2020-08-30 15:44:56 -07:00
Nikita Shulga	64906497cd	Revert D23391941: [pytorch][PR] Implementing NumPy-like function torch.heaviside() Test Plan: revert-hammer Differential Revision: D23391941 (`a1eae6d158`) Original commit changeset: 7b942321a625 fbshipit-source-id: c2a7418a1fedaa9493300945c30e2392fc0d08ee	2020-08-28 19:16:58 -07:00
kiyosora	a1eae6d158	Implementing NumPy-like function torch.heaviside() (#42523 ) Summary: - Related with https://github.com/pytorch/pytorch/issues/38349 - Implementing the NumPy-like function `torch.heaviside()` . Pull Request resolved: https://github.com/pytorch/pytorch/pull/42523 Reviewed By: glaringlee Differential Revision: D23391941 Pulled By: mruberry fbshipit-source-id: 7b942321a62567a5fc0a3679a289f4c4c19e6134	2020-08-28 18:11:20 -07:00
Nikita Shulga	3f0120edb4	Revert D23360705: [pytorch][PR] Add amax/amin Test Plan: revert-hammer Differential Revision: D23360705 (`bcec8cc3f9`) Original commit changeset: 5bdeb08a2465 fbshipit-source-id: 76a9e199823c7585e55328bad0778bcd8cd49381	2020-08-28 18:01:25 -07:00
Mike Ruberry	20abfc21e4	Adds arctanh, arcsinh aliases, simplifies arc* alias dispatch (#43762 ) Summary: Adds two more "missing" NumPy aliases: arctanh and arcsinh, and simplifies the dispatch of other arc* aliases. Pull Request resolved: https://github.com/pytorch/pytorch/pull/43762 Reviewed By: ngimel Differential Revision: D23396370 Pulled By: mruberry fbshipit-source-id: 43eb0c62536615fed221d460c1dec289526fb23c	2020-08-28 13:59:19 -07:00
Gao, Xiang	bcec8cc3f9	Add amax/amin (#43092 ) Summary: Add a max/min operator that only return values. ## Some important decision to discuss \| Question \| Current State \| \|---------------------------------------\|-------------------\| \| Expose torch.max_values to python? \| No \| \| Remove max_values and only keep amax? \| Yes \| \| Should amax support named tensors? \| Not in this PR \| ## Numpy compatibility Reference: https://numpy.org/doc/stable/reference/generated/numpy.amax.html \| Parameter \| PyTorch Behavior \| \|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------\|-----------------------------------------------------------------------------------\| \| `axis`: None or int or tuple of ints, optional. Axis or axes along which to operate. By default, flattened input is used. If this is a tuple of ints, the maximum is selected over multiple axes, instead of a single axis or all the axes as before. \| Named `dim`, behavior same as `torch.sum` (https://github.com/pytorch/pytorch/issues/29137) \| \| `out`: ndarray, optional. Alternative output array in which to place the result. Must be of the same shape and buffer length as the expected output. \| Same \| \| `keepdims`: bool, optional. If this is set to True, the axes which are reduced are left in the result as dimensions with size one. With this option, the result will broadcast correctly against the input array. \| implemented as `keepdim` \| \| `initial`: scalar, optional. The minimum value of an output element. Must be present to allow computation on empty slice. \| Not implemented in this PR. Better to implement for all reductions in the future. \| \| `where`: array_like of bool, optional. Elements to compare for the maximum. \| Not implemented in this PR. Better to implement for all reductions in the future. \| Note from numpy: > NaN values are propagated, that is if at least one item is NaN, the corresponding max value will be NaN as well. To ignore NaN values (MATLAB behavior), please use nanmax. PyTorch has the same behavior Pull Request resolved: https://github.com/pytorch/pytorch/pull/43092 Reviewed By: ngimel Differential Revision: D23360705 Pulled By: mruberry fbshipit-source-id: 5bdeb08a2465836764a5a6fc1a6cc370ae1ec09d	2020-08-28 12:51:03 -07:00
Radhakrishnan Venkataramani	8032dbc117	Add Rowwise Prune PyTorch op (#42708 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/42708 Add rowwise prune pytorch op. This operator introduces sparsity to the 'weights' matrix with the help of the importance indicator 'mask'. A row is considered important and not pruned if the mask value for that particular row is 1(True) and not important otherwise. Test Plan: buck test caffe2/torch/fb/sparsenn:test -- rowwise_prune buck test caffe2/test:pruning Reviewed By: supriyar Differential Revision: D22849432 fbshipit-source-id: 456f4f77c04158cdc3830b2e69de541c7272a46d	2020-08-27 15:16:23 -07:00
Xiong Wei	033b7ae3ef	implement NumPy-like functionality maximum, minimum (#42579 ) Summary: Related to https://github.com/pytorch/pytorch/issues/38349 Implement NumPy-like functions `maximum` and `minimum`. The `maximum` and `minimum` functions compute input tensors element-wise, returning a new array with the element-wise maxima/minima. If one of the elements being compared is a NaN, then that element is returned, both `maximum` and `minimum` functions do not support complex inputs. This PR also promotes the overloaded versions of torch.max and torch.min, by re-dispatching binary `torch.max` and `torch.min` to `torch.maximum` and `torch.minimum`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/42579 Reviewed By: mrshenli Differential Revision: D23153081 Pulled By: mruberry fbshipit-source-id: 803506c912440326d06faa1b71964ec06775eac1	2020-08-26 16:56:12 -07:00
Ralf Gommers	573940f8d7	Fix type annotation errors in torch.functional (#43446 ) Summary: Closes gh-42968 Pull Request resolved: https://github.com/pytorch/pytorch/pull/43446 Reviewed By: albanD Differential Revision: D23280962 Pulled By: malfet fbshipit-source-id: de5386a95a20ecc814c39cbec3e4252112340b3a	2020-08-26 08:27:59 -07:00
Hameer Abbasi	c4e841654d	Add alias torch.negative to torch.neg. (#43400 ) Summary: xref https://github.com/pytorch/pytorch/issues/42515 Pull Request resolved: https://github.com/pytorch/pytorch/pull/43400 Reviewed By: albanD Differential Revision: D23266011 Pulled By: mruberry fbshipit-source-id: ca20b30d99206a255cf26438b09c3ca1f99445c6	2020-08-24 01:15:04 -07:00
Mike Ruberry	e57b89c8dc	Adds arccos, arcsin, arctan aliases (#43319 ) Summary: These aliases are consistent with NumPy (see, for example, https://numpy.org/doc/stable/reference/generated/numpy.arccos.html?highlight=acos). Note that PyTorch's existing names are consistent with Python (see https://docs.python.org/3.10/library/math.html?highlight=acos#math.acos) and C++ (see, for example, https://en.cppreference.com/w/cpp/numeric/math/acos). Pull Request resolved: https://github.com/pytorch/pytorch/pull/43319 Reviewed By: pbelevich Differential Revision: D23260426 Pulled By: mruberry fbshipit-source-id: 98a6c97f69d1f718a396c2182e938a7a260c0889	2020-08-21 10:53:17 -07:00
Hameer Abbasi	e31cd46278	Add alias torch.fix for torch.trunc to be compatible with NumPy. (#43326 ) Summary: xref https://github.com/pytorch/pytorch/issues/42515 Pull Request resolved: https://github.com/pytorch/pytorch/pull/43326 Reviewed By: pbelevich Differential Revision: D23249089 Pulled By: mruberry fbshipit-source-id: 6afa9eb20493983d084e0676022c6245e7463e05	2020-08-20 21:47:39 -07:00
Nikita Vedeneev	888ae1b3d8	Introducing Matrix exponential (#40161 ) Summary: Implements (batched) matrix exponential. Fixes [https://github.com/pytorch/pytorch/issues/9983](https://github.com/pytorch/pytorch/issues/9983). The algorithm follows: ``` Bader, P.; Blanes, S.; Casas, F. Computing the Matrix Exponential with an Optimized Taylor Polynomial Approximation. Mathematics 2019, 7, 1174. ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/40161 Reviewed By: zhangguanheng66 Differential Revision: D22951372 Pulled By: ezyang fbshipit-source-id: aa068cb76d5cf71696b333d3e72cee287b3089e3	2020-08-18 14:15:10 -07:00
Mike Ruberry	e2eb0cb1a9	Adds arccosh alias for acosh and adds an alias consistency test (#43107 ) Summary: This adds the torch.arccosh alias and updates alias testing to validate the consistency of the aliased and original operations. The alias testing is also updated to run on CPU and CUDA, which revealed a memory leak when tracing (see https://github.com/pytorch/pytorch/issues/43119). Pull Request resolved: https://github.com/pytorch/pytorch/pull/43107 Reviewed By: ngimel Differential Revision: D23156472 Pulled By: mruberry fbshipit-source-id: 6155fac7954fcc49b95e7c72ed917c85e0eabfcd	2020-08-16 22:12:25 -07:00
Muthu Arivoli	5bcf9b017a	Implement hstack, vstack, dstack (#42799 ) Summary: Related to https://github.com/pytorch/pytorch/issues/38349 Pull Request resolved: https://github.com/pytorch/pytorch/pull/42799 Reviewed By: izdeby Differential Revision: D23140704 Pulled By: mruberry fbshipit-source-id: 6a36363562c50d0abce87021b84b194bb32825fb	2020-08-15 20:39:14 -07:00
Muthu Arivoli	b8102b1550	Implement torch.nextafter (#42580 ) Summary: Related to https://github.com/pytorch/pytorch/issues/38349. Pull Request resolved: https://github.com/pytorch/pytorch/pull/42580 Reviewed By: smessmer Differential Revision: D23012260 Pulled By: mruberry fbshipit-source-id: ce82a63c4ad407ec6ffea795f575ca7c58cd6137	2020-08-14 00:35:30 -07:00
Will Gan	e4373083a2	torch.complex and torch.polar (#39617 ) Summary: For https://github.com/pytorch/pytorch/issues/35312 and https://github.com/pytorch/pytorch/issues/38458#issuecomment-636066256. Pull Request resolved: https://github.com/pytorch/pytorch/pull/39617 Reviewed By: zhangguanheng66 Differential Revision: D23083926 Pulled By: anjali411 fbshipit-source-id: 1874378001efe2ff286096eaf1e92afe91c55b29	2020-08-14 00:30:11 -07:00
Muthu Arivoli	92885ebe16	Implement hypot (#42291 ) Summary: Related to https://github.com/pytorch/pytorch/issues/38349 Closes https://github.com/pytorch/pytorch/issues/22764 Pull Request resolved: https://github.com/pytorch/pytorch/pull/42291 Reviewed By: malfet Differential Revision: D22951859 Pulled By: mruberry fbshipit-source-id: d0118f2b6437e5c3f775f699ec46e946a8da50f0	2020-08-12 13:18:26 -07:00
kshitij12345	ab0a04dc9c	Add `torch.nansum` (#38628 ) Summary: Reference: https://github.com/pytorch/pytorch/issues/38349 Pull Request resolved: https://github.com/pytorch/pytorch/pull/38628 Reviewed By: VitalyFedyunin Differential Revision: D22860549 Pulled By: mruberry fbshipit-source-id: 87fcbfd096d83fc14b3b5622f2301073729ce710	2020-08-11 22:26:04 -07:00
Mike Ruberry	bee174dc3f	Adds linalg.det alias, fixes outer alias, updates alias testing (#42802 ) Summary: This PR: - updates test_op_normalization.py, which verifies that aliases are correctly translated in the JIT - adds torch.linalg.det as an alias for torch.det - moves the torch.linalg.outer alias to torch.outer (to be consistent with NumPy) The torch.linalg.outer alias was put the linalg namespace erroneously as a placeholder since it's a "linear algebra op" according to NumPy but is actually still in the main NumPy namespace. The updates to test_op_normalization are necessary. Previously it was using method_tests to generate tests, and method_tests assumes test suites using it also use the device generic framework, which test_op_normalization did not. For example, some ops require decorators like `skipCPUIfNoLapack`, which only works in device generic test classes. Moving test_op_normalization to the device generic framework also lets these tests run on CPU and CUDA. Continued reliance on method_tests() is excessive since the test suite is only interested in testing aliasing, and a simpler and more readable `AliasInfo` class is used for the required information. An example impedance mismatch between method_tests and the new tests, for example, was how to handle ops in namespaces like torch.linalg.det. In the future this information will likely be folded into a common 'OpInfo' registry in the test suite. The actual tests performed are similar to what they were previously: a scripted and traced version of the op is run and the test verifies that both graphs do not contain the alias name and do contain the aliased name. The guidance for adding an alias has been updated accordingly. cc mattip Note: ngimel suggests: - deprecating and then removing the `torch.ger` name - reviewing the implementation of `torch.outer` Pull Request resolved: https://github.com/pytorch/pytorch/pull/42802 Reviewed By: zou3519 Differential Revision: D23059883 Pulled By: mruberry fbshipit-source-id: 11321c2a7fb283a6e7c0d8899849ad7476be42d1	2020-08-11 21:48:31 -07:00
Heitor Schueroff de Souza	c660d2a9ae	Initial quantile operator implementation (#42755 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/42755 Attempting to land quantile again after being landed here https://github.com/pytorch/pytorch/pull/39417 and reverted here https://github.com/pytorch/pytorch/pull/41616. Test Plan: Imported from OSS Reviewed By: mruberry Differential Revision: D23030338 Pulled By: heitorschueroff fbshipit-source-id: 124a86eea3aee1fdaa0aad718b04863935be26c7	2020-08-11 12:08:17 -07:00
Mike Ruberry	87970b70a7	Adds 'clip' alias for clamp (#42770 ) Summary: Per title. Also updates our guidance for adding aliases to clarify interned_string and method_test requirements. The alias is tested by extending test_clamp to also test clip. Pull Request resolved: https://github.com/pytorch/pytorch/pull/42770 Reviewed By: ngimel Differential Revision: D23020655 Pulled By: mruberry fbshipit-source-id: f1d8e751de9ac5f21a4f95d241b193730f07b5dc	2020-08-09 02:46:02 -07:00
Hameer Abbasi	3d46e02ea1	Add __torch_function__ for methods (#37091 ) Summary: According to pytorch/rfcs#3 From the goals in the RFC: 1. Support subclassing `torch.Tensor` in Python (done here) 2. Preserve `torch.Tensor` subclasses when calling `torch` functions on them (done here) 3. Use the PyTorch API with `torch.Tensor`-like objects that are _not_ `torch.Tensor` subclasses (done in https://github.com/pytorch/pytorch/issues/30730) 4. Preserve `torch.Tensor` subclasses when calling `torch.Tensor` methods. (done here) 5. Propagating subclass instances correctly also with operators, using views/slices/indexing/etc. (done here) 6. Preserve subclass attributes when using methods or views/slices/indexing. (done here) 7. A way to insert code that operates on both functions and methods uniformly (so we can write a single function that overrides all operators). (done here) 8. The ability to give external libraries a way to also define functions/methods that follow the `__torch_function__` protocol. (will be addressed in a separate PR) This PR makes the following changes: 1. Adds the `self` argument to the arg parser. 2. Dispatches on `self` as well if `self` is not `nullptr`. 3. Adds a `torch._C.DisableTorchFunction` context manager to disable `__torch_function__`. 4. Adds a `torch::torch_function_enabled()` and `torch._C._torch_function_enabled()` to check the state of `__torch_function__`. 5. Dispatches all `torch._C.TensorBase` and `torch.Tensor` methods via `__torch_function__`. TODO: - [x] Sequence Methods - [x] Docs - [x] Tests Closes https://github.com/pytorch/pytorch/issues/28361 Benchmarks in https://github.com/pytorch/pytorch/pull/37091#issuecomment-633657778 Pull Request resolved: https://github.com/pytorch/pytorch/pull/37091 Reviewed By: ngimel Differential Revision: D22765678 Pulled By: ezyang fbshipit-source-id: 53f8aa17ddb8b1108c0997f6a7aa13cb5be73de0	2020-08-05 20:44:13 -07:00

... 3 4 5 6 7 ...

382 Commits