Commit Graph

199 Commits

Shen Li
06a7cb5901 Implementing cuda kernel for tril_indices and triu_indices (#15203)
Summary:
Followup PR of #14904, and the stretch goal of #12653.

Directly calculate coordinates in the original tensor using column index in the result tensor. Every GPU thread takes care of a column (two numbers) in the output tensor.

The implementation detects and handles precision loss when computing the square root of an `int64_t` variable, and supports tensors with up to `row * column = 2 ^ 59` numbers.

Algorithm details are described in [comments of TensorFactories.cu](23ddb6f58a/aten/src/ATen/native/cuda/TensorFactories.cu (L109-L255)).
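For intuition, here is a minimal Python sketch of that closed-form mapping for a square lower triangle with no offset; the actual kernel additionally handles rectangular shapes, offsets, and the int64 precision correction mentioned above, and the helper name below is purely illustrative:

```python
import math

def tril_coords(k):
    # Invert the triangular-number relation r * (r + 1) / 2 <= k to recover
    # the row, then subtract to get the column -- one thread per column k.
    r = int((math.sqrt(8 * k + 1) - 1) / 2)
    c = k - r * (r + 1) // 2
    return r, c

print([tril_coords(k) for k in range(6)])
# [(0, 0), (1, 0), (1, 1), (2, 0), (2, 1), (2, 2)]
```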

zou3519
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15203

Reviewed By: zou3519

Differential Revision: D13517695

Pulled By: mrshenli

fbshipit-source-id: 86b305d22cac08c8962a3b0cf8e9e620b7ec33ea
2018-12-20 10:23:38 -08:00
Brennan Vincent
7a764fe270 multi-dim standard deviation for CUDA. (#14990)
Summary:
This is the CUDA version of #14535 .
It refactors Reduce.cuh to allow more general classes of reductions to be performed -- we no longer assume that the temporary data returned during reduction is just one scalar, and instead allow an arbitrary accumulate type.
We also allow 64-bit indexing when necessary, since in general we will no longer be able to accumulate directly in the output. (In the cases when we can, we continue to split the tensors until they can be addressed with 32-bits, as before).
As an initial use-case, we implement `std` in multiple dimensions.
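A short usage sketch of the new reduction (shapes are illustrative):

```python
import torch

x = torch.randn(4, 5, 6)
# Standard deviation over several dimensions at once; only dim 1 remains.
print(x.std(dim=(0, 2)).shape)  # torch.Size([5])
```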
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14990

Differential Revision: D13405097

Pulled By: umanwizard

fbshipit-source-id: a56c24dc2fd5326d417632089bd3f5c4f9f0d2cb
2018-12-20 08:56:32 -08:00
vishwakftw
41e7e1bc40 Rename potrs to cholesky_solve (#15334)
Summary:
Changelog:
- Renames `potrs` to `cholesky_solve` to remain consistent with TensorFlow and SciPy (not really, they call their function `chol_solve`)
- The default argument for `upper` in `cholesky_solve` is False. This allows a seamless interface between `cholesky` and `cholesky_solve`, since the `upper` argument is the same in both functions.
- Rename all tests
- Create a tentative alias for `cholesky_solve` under the name `potrs`, and add a deprecation warning to discourage its usage.
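A small sketch of the resulting interface, using the era's `torch.cholesky`/`torch.cholesky_solve` API (the matrix construction is illustrative):

```python
import torch

a = torch.randn(3, 3)
a = a @ a.t() + 3 * torch.eye(3)       # build a positive definite matrix
b = torch.randn(3, 2)

u = torch.cholesky(a)                   # lower-triangular factor by default
x = torch.cholesky_solve(b, u)          # upper=False default matches cholesky
print(torch.allclose(a @ x, b, atol=1e-5))  # True, up to numerical error
```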
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15334

Differential Revision: D13507724

Pulled By: soumith

fbshipit-source-id: b826996541e49d2e2bcd061b72a38c39450c76d0
2018-12-19 12:31:24 -08:00
y0ast
4bcb425490 fix cholesky call in potrs example (#15215)
Summary:
Cholesky by default returns the lower triangular matrix, see [docs](https://pytorch.org/docs/stable/torch.html#torch.cholesky).

However `torch.potrs` by default requires the upper triangular matrix. The naming of the variable `u` suggests that the example expects the upper to be returned, so I've added the flag to make that happen in the example.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15215

Differential Revision: D13476468

Pulled By: soumith

fbshipit-source-id: 7b68035f435a2b1be4d363b3f63e407394af949d
2018-12-15 04:43:34 -08:00
Shen Li
90f9e8103c Implement torch.tril_indices and torch.triu_indices (#12653) (#14904)
Summary:
This is an optimized implementation that does the following:

1. creates an empty Tensor of the correct size.
2. fills the Tensor with the correct values.

The following three designs for filling in the Tensor result in roughly the same performance. Hence, the second option is taken for simpler code and to return contiguous tensors.

1. Sequential: fill row coordinates first, then columns. This requires two for-loops and more arithmetic operations.
2. Interleaved: fill in index coordinates one by one, which jumps between the two output Tensor rows in every iteration.
3. Transpose: create an n x 2 Tensor, fill it sequentially, and then transpose it.

![screen shot 2018-12-10 at 3 54 39 pm](https://user-images.githubusercontent.com/16999635/49769172-07bd3580-fc94-11e8-8164-41839185e9f9.png)

NOTE:

This implementation returns a 2D tensor, instead of a tuple of two tensors. It means that users will not be able to do the following:

```python
x = torch.ones(3, 3)
i = torch.tril_indices(3, 3)
x[i]  # need to first convert the 2D tensor into a tuple of two 1D tensors.
```
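A sketch of the conversion the note refers to (the ordering of the selected entries follows the row-major fill described above):

```python
import torch

x = torch.ones(3, 3)
i = torch.tril_indices(3, 3)    # shape (2, 6)

# Plain x[i] would index along dim 0 only; splitting the 2D index tensor
# into a tuple of row/column index tensors selects the lower triangle.
lower = x[i[0], i[1]]           # equivalently: x[tuple(i)]
print(lower.shape)              # torch.Size([6])
```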
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14904

Reviewed By: zou3519

Differential Revision: D13433027

Pulled By: mrshenli

fbshipit-source-id: 41c876aafcf584832d7069f7c5929ffb59e0ae6a
2018-12-12 15:40:14 -08:00
Imran
342e62f1e3 Minor documentation mistake (#15068)
Summary:
keepdim is an optional parameter for torch.max()
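For reference, a minimal sketch of the two call forms:

```python
import torch

x = torch.randn(2, 3)
values, indices = torch.max(x, dim=1)              # keepdim defaults to False
values_k, _ = torch.max(x, dim=1, keepdim=True)
print(values.shape, values_k.shape)                # torch.Size([2]) torch.Size([2, 1])
```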
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15068

Differential Revision: D13437745

Pulled By: zou3519

fbshipit-source-id: b5198c7d4ae17758cd136f6e5aecc6cb5838f174
2018-12-12 15:24:26 -08:00
vishwakftw
fc30e2782c Remove deprecated info argument in btrifact (#14935)
Summary:
As specified in title.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14935

Differential Revision: D13394449

Pulled By: soumith

fbshipit-source-id: 569d59414f3a1a43ea641bded4b5433eb53e3490
2018-12-09 15:59:30 -08:00
Jan Schlüter
1c8d41a08d Allow linspace and logspace with steps=1 and start != end like numpy (#14748)
Summary:
`torch.linspace(0, 1, 1)` fails with `RuntimeError: invalid argument 3: invalid number of points at ../aten/src/TH/generic/THTensorMoreMath.cpp:2119`, while `np.linspace(0, 1, 1)` works fine.
Looking at the code, there is even a comment by gchanan asking: "NumPy allows you to pass different points even if n <= 1 -- should we?"
I would say "yes". Currently, I would need to handle the case of `steps == 1` or `steps == 0` separately, making sure to change the `end` when calling `torch.linspace`. This is impractical. If we support `steps == 1` with `start != end`, there are two possibilities for the result: either we ensure the first value in the resulting sequence always equals `start`, or we ensure the last value in the resulting sequence always equals `end`. NumPy chose the former, which also allows it to support a boolean `endpoint` flag. I'd say we should follow NumPy.

This PR adapts `linspace` and `logspace` to mimic the behavior of numpy, adapts the tests accordingly, and extends the docstrings to make clear what happens when passing `steps=1`.

If you decide against this PR, the error message should become explicit about what I did wrong, and the documentation should be extended to mention this restriction.
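A sketch of the behavior this PR adopts (outputs assume the NumPy-matching semantics described above):

```python
import torch
import numpy as np

# With a single step the one returned point equals `start`, as in NumPy.
print(torch.linspace(0, 1, steps=1))   # tensor([0.])
print(np.linspace(0, 1, num=1))        # [0.]
print(torch.logspace(0, 3, steps=1))   # tensor([1.])
```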
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14748

Differential Revision: D13356136

Pulled By: ezyang

fbshipit-source-id: db85b8f0a98a5e24b3acd766132ab71c91794a82
2018-12-06 09:30:55 -08:00
Brennan Vincent
c638f379b3 Make mean function work across multiple dimensions. (#14252)
Summary:
Multi-dimensional `sum` is already implemented, and it's trivial to implement `mean` in terms of `sum`, so just do it.

Bonus: Fix incomplete language in the `torch.sum` documentation which doesn't take into account multiple dimensions when describing `unsqueeze` (at the same time as introducing similar language in `torch.mean`).
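A short sketch of the added behavior (shapes and values are illustrative):

```python
import torch

x = torch.randn(4, 5, 6)
m = x.mean(dim=(0, 2))            # reduce over the first and last dimensions
print(m.shape)                    # torch.Size([5])
print(torch.allclose(m, x.sum(dim=(0, 2)) / (4 * 6)))   # True
```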
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14252

Differential Revision: D13161157

Pulled By: umanwizard

fbshipit-source-id: c45da692ba83c0ec80815200c5543302128da75c
2018-11-28 06:53:09 -08:00
Brian Vaughan
b08a186153 roll along multiple dimensions
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13874

Differential Revision: D13223669

Pulled By: nairbv

fbshipit-source-id: 1678d52529c326fa4a0614d0994b1820ad12bc04
2018-11-27 20:32:30 -08:00
Brennan Vincent
1ca0ec7299 fix typo in torch.sum documentation (#14250)
Summary:
Notice that an extra colon was added to `:attr:`, so in https://pytorch.org/docs/stable/torch.html#torch.sum , `dim` shows up as ":attr::_dim_". This patch fixes the issue.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14250

Reviewed By: soumith

Differential Revision: D13146363

Pulled By: umanwizard

fbshipit-source-id: f7d03dcb0973aae248b56ab407ba8489f2b1fe36
2018-11-26 16:36:52 -08:00
vishwakftw
a30ade1139 Batched cholesky decomposition (#14017)
Summary:
Implements batching for the Cholesky decomposition.

Performance could be improved with dedicated batched `tril` and `triu` ops; their absence is also impeding autograd operations.

Changes made:
- batching code
- tests in `test_torch.py`, `test_cuda.py` and `test_autograd.py`.
- doc string modification
- autograd modification
- removal of `_batch_potrf` in `MultivariateNormal`.
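A minimal sketch of the batched call (the SPD batch construction is illustrative):

```python
import torch

a = torch.randn(5, 3, 3)
a = a @ a.transpose(-1, -2) + 1e-3 * torch.eye(3)   # batch of SPD matrices
l = torch.cholesky(a)                                # shape (5, 3, 3), lower factors
print(torch.allclose(l @ l.transpose(-1, -2), a, atol=1e-5))  # True
```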
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14017

Differential Revision: D13087945

Pulled By: ezyang

fbshipit-source-id: 2386db887140295475ffc247742d5e9562a42f6e
2018-11-17 10:49:15 -08:00
vishwakftw
a3f39f1ebb Fix randint docs (#14083)
Summary: Closes #14079

Differential Revision: D13095904

Pulled By: soumith

fbshipit-source-id: e39319c5326bfdf6f401eaddebe94474349901c3
2018-11-16 03:04:02 -08:00
lberrada
35a24a9a94 Example with edge case 0 for torch.sign (#13771)
Summary:
The behavior of the edge case 0 is not self-evident for the `torch.sign` function (I personally expected a result of 1):
```python
>>> a = torch.tensor([0.7, -1.2, 0., 2.3])
>>> a
tensor([ 0.7000, -1.2000,  0.0000,  2.3000])
>>> torch.sign(a)
tensor([ 1., -1.,  0.,  1.])
```
This is not currently documented; I think it is worth giving a simple example showing this behavior.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13771

Differential Revision: D13044520

Pulled By: ailzhang

fbshipit-source-id: c3011ccbdf1c13348f6c7242b06a9aa52ebc9204
2018-11-14 09:16:09 -08:00
Taekin Kim
f112aa746a Fix document about torch.get_default_dtype() (#13890)
Summary:
Minor fix.
```
torch.get_default_dtype() → :class:`torch.dtype`
```
→
```
torch.get_default_dtype() → torch.dtype
```
:class: is not rendered in https://pytorch.org/docs/stable/torch.html
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13890

Differential Revision: D13040704

Pulled By: colesbury

fbshipit-source-id: 5fadb01ad365042d5df2bac058f4ae89b281d3b7
2018-11-13 09:25:32 -08:00
Vishwak Srinivasan
7b2fb012a8 Make potrs batched (#13453)
Summary:
- This is a straightforward PR, building on the batch inverse PR, except for one change:
  - The GENERATE_LINALG_HELPER_n_ARGS macro has been removed, since it is not very general and the resulting code is actually not very copy-pasty.

Billing of changes:
- Add batching for `potrs`
- Add relevant tests
- Modify doc string

Minor changes:
- Remove `_gesv_single`, `_getri_single` from `aten_interned_strings.h`.
- Add test for CUDA `potrs` (2D Tensor op)
- Move the batched shape checking to `LinearAlgebraUtils.h`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13453

Reviewed By: soumith

Differential Revision: D12942039

Pulled By: zou3519

fbshipit-source-id: 1b8007f00218e61593fc415865b51c1dac0b6a35
2018-11-09 15:16:26 -08:00
Brian Vaughan
4fadf571fd handle flat rolling (no dim specified) T36264909 (#13588)
Summary:
Update roll to behave like numpy.roll when the dimension to roll is not specified.
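A sketch of the flat-roll behavior (output assumes the NumPy semantics described above):

```python
import torch

x = torch.arange(6).view(2, 3)
# With no dims given the tensor is flattened, rolled, and restored to its shape.
print(torch.roll(x, 1))
# tensor([[5, 0, 1],
#         [2, 3, 4]])
```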
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13588

Differential Revision: D12964295

Pulled By: nairbv

fbshipit-source-id: de9cdea1a937773033f081f8c1505a40e4e08bc1
2018-11-08 12:39:35 -08:00
verhoek
619c2f8b44 small fixes regarding docu of torch tensors (#13635)
Summary:
Removed duplicate doc args block.
Made statements involving 'each element' more precise.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13635

Differential Revision: D12946987

Pulled By: soumith

fbshipit-source-id: a17da92f69086b530ff769cf4662ae29843fd188
2018-11-06 17:24:42 -08:00
Thomas Viehmann
f0ed927b62 Add diag_embed to ATen and torch (#12447)
Summary:
Fixes: #12160
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12447

Differential Revision: D12916234

Pulled By: SsnL

fbshipit-source-id: 512a04efb0c2e0a54295b857a61be66c3aae13da
2018-11-05 08:55:28 -08:00
Brian Vaughan
07f8b61cc6 Roll operator t32802531 (#13261)
Summary:
Adding a roll operator
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13261

Differential Revision: D12922575

Pulled By: nairbv

fbshipit-source-id: ff05c075d9c484a615011192b023debf47da4017
2018-11-05 08:33:36 -08:00
vishwakftw
d714ecf879 Rename potrf to cholesky (#12699)
Summary:
This PR renames the function `potrf`, which performs the Cholesky
decomposition on positive definite matrices, to `cholesky`, as NumPy and TF do.

Billing of changes
- make potrf cname for cholesky in Declarations.cwrap
- modify the function names in ATen/core
- modify the function names in Python frontend
- issue warnings when potrf is called to notify users of the change

Reviewed By: soumith

Differential Revision: D10528361

Pulled By: zou3519

fbshipit-source-id: 19d9bcf8ffb38def698ae5acf30743884dda0d88
2018-11-01 15:10:55 -07:00
Egil Martinsson
518b0d0600 Fix add out=None to digamma docstring (Fixes #13225) (#13307)
Summary:
Fixes #13225
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13307

Differential Revision: D12840231

Pulled By: SsnL

fbshipit-source-id: 2732a2466ac1d2f3fdabfd1eaccddec96e89ba1b
2018-10-30 11:52:35 -07:00
vishwakftw
1fe8278559 Batched Inverse (#9949)
Summary:
Complete billing of changes:

Related to Batch Inverse:
- [x] Add batched inverse (CPU)
- [x] Add batched inverse (CUDA)
- [x] Modify autograd entry
- [x] Add tests
  - [x] test_autograd
  - [x] test_cuda
  - [x] test_torch
- [x] Modify docs
- [x] Remove `_batch_inverse` in `MultivariateNormal`.
- [x] Allow batch matrices as inputs for negative powers in `matrix_power`

Miscellaneous modifications:
- [x] Move all batch operations to BatchLinearAlgebra.cpp/.cu and provide general framework for adding more batch ops.
- [x] Add a RAII structure for MAGMA queue management.
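A short sketch of the batched calls (the well-conditioned batch construction is illustrative):

```python
import torch

a = torch.randn(4, 3, 3)
a = a @ a.transpose(-1, -2) + torch.eye(3)            # well-conditioned SPD batch
a_inv = torch.inverse(a)                              # batched inverse, shape (4, 3, 3)
print(torch.allclose(a @ a_inv, torch.eye(3).expand(4, 3, 3), atol=1e-5))   # True

# matrix_power also accepts batched inputs for negative exponents:
print(torch.allclose(torch.matrix_power(a, -2), a_inv @ a_inv, atol=1e-5))  # True
```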
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9949

Differential Revision: D10559089

Pulled By: zou3519

fbshipit-source-id: 7da24977f8a79d97dd42883302e13e708c1726e4
2018-10-27 23:42:46 -07:00
Soumith Chintala
cf235e0894 fix lint after new flake8 release added new style constraints (#13047)
Summary:
fix lint after new flake8 release added new style constraints
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13047

Differential Revision: D10527804

Pulled By: soumith

fbshipit-source-id: 6f4d02662570b6339f69117b61037c8394b0bbd8
2018-10-24 09:03:38 -07:00
Andrey Malevich
e027f7a913 Fix character with wrong encoding in documentation (#12761)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12761

The character in question is not really a plain `,`, and thus it can make some of the Python 2 imports fail.

Reviewed By: weiyangfb

Differential Revision: D10423231

fbshipit-source-id: 3738c0b9d2f52aa47eef06250f84c5933a38783f
2018-10-17 10:20:45 -07:00
Jaivarsan
1a6071d436 fixing seq to tensors in documentation (#12741)
Summary:
Fixes #12251

In the docs, the keyword argument for the `torch.cat` operation was supposed to be `tensors`, but instead it is given as `seq`.
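For reference, the keyword form in question:

```python
import torch

a = torch.randn(2, 3)
b = torch.randn(2, 3)
# The keyword argument is `tensors`, not `seq` as the docs previously stated.
print(torch.cat(tensors=(a, b), dim=0).shape)   # torch.Size([4, 3])
```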

zou3519 can you review this code? I don't have access to request for code reviews.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12741

Differential Revision: D10419682

Pulled By: ezyang

fbshipit-source-id: a0ec9c3f4aeba23ac3a99e2ae89bd07d2b9ddb58
2018-10-17 09:16:04 -07:00
Thomas Viehmann
0521c47c91 Amend nondeterminism notes (#12217)
Summary:
include atomicAdd commentary as this is less well known

There is some discussion in #12207

Unfortunately, I cannot seem to get the ..include working in `_tensor_docs.py` and `_torch_docs.py`. I could use a hint for that.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12217

Differential Revision: D10419739

Pulled By: SsnL

fbshipit-source-id: eecd04fb7486bd9c6ee64cd34859d61a0a97ec4e
2018-10-16 23:59:26 -07:00
Thomas Viehmann
d34578026c Various example code fixes (#12707)
Summary:
- Fix broken sparse_coo_examples, update output
- Tensor(...) to tensor(...)
- Fix arguments to math.log to be floats

While the last might be debatable, mypy currently complains when passing an int to math.log. As it is not essential for our examples, let's be clean w.r.t. other people's expectations.

These popped up while checking examples in the context of  #12500 .
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12707

Differential Revision: D10415256

Pulled By: SsnL

fbshipit-source-id: c907b576b02cb0f89d8f261173dbf4b3175b4b8d
2018-10-16 21:59:40 -07:00
Wei Yang
81975a497f update docs for sparse tensor (#12221)
Summary:
- update docs examples at sparse tensor after print format changed
- update example to create empty sparse tensor:
```
>>> torch.sparse_coo_tensor(torch.LongTensor(size=[1,0]), [], torch.Size([1]))
tensor(indices=tensor([], size=(1, 0)),
       values=tensor([], size=(0,)),
       size=(1,), nnz=0, layout=torch.sparse_coo)
```

zou3519 SsnL yf225
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12221

Differential Revision: D10412447

Pulled By: weiyangfb

fbshipit-source-id: 155b8cb0965f060e978f12239abdc1b3b41f6ab0
2018-10-16 19:56:51 -07:00
Tongzhou Wang
5b8a640d0b Update fft docs for new cache size (#12665)
Summary:
Follow up of #12553
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12665

Differential Revision: D10385615

Pulled By: SsnL

fbshipit-source-id: 44fe9ec75cb735de37c56270f160a16a1d2bfb64
2018-10-16 01:47:36 -07:00
vishwakftw
0740a5d521 compute_uv for SVD (#12517)
Summary:
Adds a `compute_uv` argument that defaults to `True` for optionally computing the singular vectors during SVD.
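A minimal sketch of the new flag:

```python
import torch

a = torch.randn(5, 3)
u, s, v = torch.svd(a, compute_uv=False)   # skip computing the singular vectors
print(s.shape)                             # torch.Size([3]); singular values only
print(u.abs().sum(), v.abs().sum())        # u and v come back as zero tensors
```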

Closes https://github.com/pytorch/pytorch/issues/12420 .
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12517

Differential Revision: D10384554

Pulled By: SsnL

fbshipit-source-id: 704998a257afa815eda901b8ae830e8a661695be
2018-10-15 12:35:56 -07:00
Natalia Gimelshein
a98958d3bd dtype option for softmax (#11719)
Summary:
Add dtype argument to softmax/log_softmax functions.
Computing softmax in fp32 precision is necessary for mixed precision training. Converting the output of the previous layer into fp32 and then reading it as fp32 in softmax is expensive, memory- and perf-wise; this PR allows one to avoid it.
For most input data/dtype combinations, input data is converted to dtype and then softmax is computed. If input data is half type and dtype is fp32, kernels with the corresponding template arguments are called.
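A sketch of the intended mixed-precision usage (assumes a CUDA device is available):

```python
import torch
import torch.nn.functional as F

x = torch.randn(8, 1000, device='cuda', dtype=torch.half)
# The softmax is computed and returned in fp32 without first materializing
# an fp32 copy of the half-precision input.
y = F.softmax(x, dim=-1, dtype=torch.float32)
print(y.dtype)   # torch.float32
```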
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11719

Reviewed By: ezyang

Differential Revision: D10175514

Pulled By: zou3519

fbshipit-source-id: 06d285af91a0b659932236d41ad63b787eeed243
2018-10-13 17:57:10 -07:00
Wei Yang
ecb3835387 change \gamma to \Gamma (#12214)
Summary:
- revert `\gamma` changes at landed PR: https://github.com/pytorch/pytorch/pull/12126
- minor fix for docs of `torch.norm()`

SsnL
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12214

Differential Revision: D10127337

Pulled By: weiyangfb

fbshipit-source-id: 15eb8abda39ec9e8b2e815e2a22096cae786995a
2018-10-01 11:31:18 -07:00
Wei Yang
5ffc915f26 fix docs (#12126)
Summary:
- fix https://github.com/pytorch/pytorch/issues/12120
- add `torch.argsort`, `torch.pdist`, `broadcast_tensors` to *.rst files
- add parameter dim to `torch.unique` doc
- fix table and args for `torch.norm`
- test plan: make html and check docs in browser

gchanan
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12126

Differential Revision: D10087006

Pulled By: weiyangfb

fbshipit-source-id: 25f65c43d14e02140d0da988d8742c7ade3d8cc9
2018-09-29 22:26:45 -07:00
Wei Yang
817e83fc01 fix PR #11061 (#11815)
Summary:
- fix PR https://github.com/pytorch/pytorch/pull/11061 by moving `detach_()` and `set_requires_grad()` to `torch.tensor_ctor()` and `tensor.new_tensor`, and also removing the warnings and `args_requires_grad` from `internal_new_from_data`
- with this patch, the returned tensor from `tensor_ctor()` and `new_tensor` is detached from the source tensor, with requires_grad set based on the input args
- `torch.as_tensor` retains its behavior as documented

gchanan apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11815

Differential Revision: D9932713

Pulled By: weiyangfb

fbshipit-source-id: 4290cbc57bd449954faadc597c24169a7b2d8259
2018-09-21 11:04:19 -07:00
yya007
b91b15d86e Implementing Matrix Norm for torch.norm (#11261)
Summary:
Currently, the norm function only supports vector norms. This PR extends it to support matrix norms.
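A short sketch of matrix-norm calls (the string `p` values shown are assumptions about the supported spellings, not taken from this PR's text):

```python
import torch

a = torch.randn(4, 5)
print(torch.norm(a))            # Frobenius norm, the default for matrices
print(torch.norm(a, p='fro'))   # the same, spelled out
print(torch.norm(a, p='nuc'))   # nuclear norm (sum of singular values)
```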
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11261

Reviewed By: li-roy

Differential Revision: D9652379

Pulled By: yya007

fbshipit-source-id: 519b3fb80b563c17c56a24675c7b0e46bf5a3a1c
2018-09-20 14:43:13 -07:00
Tongzhou Wang
24e958a0a7 Move bernoulli into ATen (#10273)
Summary:
+ https://github.com/pytorch/pytorch/issues/10236 : torch.bernoulli's out kwarg is broken
  fixed in moving `bernoulli_out` to ATen
+ https://github.com/pytorch/pytorch/issues/9917 : BUG torch.bernoulli(p.expand(shape)) is broken
  fixed in moving all `bernoulli` ops in ATen to use the modern apply utils methods
+ https://github.com/pytorch/pytorch/issues/10357 : torch.bernoulli inconsistent gpu/cpu results
  fixed by adding CUDA asserts

In order to use `curand_uniform4`, I made some changes to `CUDAApplyUtils.cuh`. Specifically, I introduced an optional template parameter `int step` to the `CUDA_tensor_applyN` methods, representing that we want to process `step` values at a time for each of the `N` tensors.

The calling convention for `step = 1` (default) isn't changed. But if `step > 1`, the given lambda `op` must take in `int n` as its first argument, representing the number of valid values, because there may be fewer than `step` valid values at the boundary. E.g., here is what the `bernoulli(self, p_tensor)` call looks like:
```cpp

  // The template argument `4` below indicates that we want to operate on four
  // elements at a time. See NOTE [ CUDA_tensor_applyN helpers ] for details.
  at::cuda::CUDA_tensor_apply2<scalar_t, prob_t, 4>(
      ret, p,
      [seeds] __device__(
          int n, scalar_t& v1, scalar_t& v2, scalar_t& v3, scalar_t& v4,
          const prob_t& p1, const prob_t& p2, const prob_t& p3, const prob_t& p4) {
        curandStatePhilox4_32_10_t state;
        curand_init(
            seeds.first,
            blockIdx.x * blockDim.x + threadIdx.x,
            seeds.second,
            &state);
        float4 rand = curand_uniform4(&state);
        switch (n) {
          case 4: {
            assert(0 <= p4 && p4 <= 1);
            v4 = static_cast<scalar_t>(rand.w <= p4);
          }
          case 3: {
            assert(0 <= p3 && p3 <= 1);
            v3 = static_cast<scalar_t>(rand.z <= p3);
          }
          case 2: {
            assert(0 <= p2 && p2 <= 1);
            v2 = static_cast<scalar_t>(rand.y <= p2);
          }
          case 1: {
            assert(0 <= p1 && p1 <= 1);
            v1 = static_cast<scalar_t>(rand.x <= p1);
          }
        }
      }
    );
```

Benchmarking on `torch.rand(200, 300, 400)` 20 times, each time with 20 loops:

post patch
```
➜  ~ numactl --cpunodebind 1 --membind 1 -- taskset -c 12,13,14,15,16,17,18,19,20,21,22,23 env CUDA_LAUNCH_BLOCKING=1 python bern.py
torch.bernoulli(x)
6.841588497161865 +- 0.05413117632269859
torch.bernoulli(xc)
0.05963418632745743 +- 0.0008014909108169377
x.bernoulli_()
0.4024486541748047 +- 0.0021550932433456182
xc.bernoulli_()
0.02167394384741783 +- 2.3818030967959203e-05

```

pre-patch
```
➜  ~ numactl --cpunodebind 1 --membind 1 -- taskset -c 12,13,14,15,16,17,18,19,20,21,22,23 env CUDA_LAUNCH_BLOCKING=1 python bern.py
torch.bernoulli(x)
12.394511222839355 +- 0.0966421514749527
torch.bernoulli(xc)
0.08970972150564194 +- 0.0038722590543329716
x.bernoulli_()
1.654480218887329 +- 0.02364428900182247
xc.bernoulli_()
0.058352887630462646 +- 0.003094920190051198

```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10273

Differential Revision: D9831294

Pulled By: SsnL

fbshipit-source-id: 65e0655a36b90d5278b675d35cb5327751604088
2018-09-19 16:45:47 -07:00
Amitesh Arora
4ee0a78ee6 varargs for meshgrid (#11600)
Summary:
Adds vararg support for meshgrid and adds checks that all the tensor arguments have the same dtype and device.
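A sketch of the vararg form:

```python
import torch

x = torch.tensor([1, 2, 3])
y = torch.tensor([4, 5])
# Tensors can now be passed directly instead of being wrapped in a list;
# arguments with mismatched dtype or device are rejected.
gx, gy = torch.meshgrid(x, y)
print(gx.shape, gy.shape)   # torch.Size([3, 2]) torch.Size([3, 2])
```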

Fixes: [#10823](https://github.com/pytorch/pytorch/issues/10823), #11446

The earlier pull request was closed without any changes because I had some rebasing issues, so I made another pull request to close out #10823. Sorry for the inconvenience.

Differential Revision: D9892876

Pulled By: ezyang

fbshipit-source-id: 93d96cafc876102ccbad3ca2cc3d81cb4c9bf556
2018-09-18 07:41:31 -07:00
Wei Yang
407a9fee0c make copy constructed tensor a leaf variable when using torch.tensor(sourceTensor) (#11061)
Summary:
- fix https://github.com/pytorch/pytorch/issues/10876
- the cause of the bug is that the copy constructor cannot distinguish between the default value of requires_grad and requires_grad=False, and thus it makes a copy from the source tensor along with its grad_fn if requires_grad=True at the source
- with this fix, the behavior becomes
```
>>> source = torch.randn(2, 2, requires_grad=True)
>>> copy = torch.tensor(source, requires_grad=True)
>>> print(copy)
tensor([[-1.2001,  1.9869],
        [-1.0134,  1.3096]], grad_fn=<CopyBackwards>)

>>> source = torch.randn(2, 2, requires_grad=True)
>>> copy = torch.tensor(source, requires_grad=False)
>>> print(copy)
tensor([[-0.7402,  0.0467],
        [ 0.4344, -0.0420]])

>>> source = torch.randn(2, 2, requires_grad=True)
>>> copy = torch.tensor(source)
>>> print(copy)
tensor([[-0.7402,  0.0467],
        [ 0.4344, -0.0420]])
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11061

Differential Revision: D9569714

Pulled By: weiyangfb

fbshipit-source-id: ea368688bdc0f1ce5997870e164e42835b64b4a1
2018-09-17 23:29:09 -07:00
Vishwak Srinivasan
0d9b9100f9 Fix gesv and gels docs (#11699)
Summary: Closes #9935 and closes #5431 .

Differential Revision: D9830448

Pulled By: soumith

fbshipit-source-id: 4e5320a1d0c1d4c8253a5b26f4842cea76530514
2018-09-14 09:24:45 -07:00
Tongzhou Wang
3a39006d38 Fix some more doc
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11531

Differential Revision: D9776541

Pulled By: SsnL

fbshipit-source-id: 8725485639ea6e9479b6ea95a49f5b75a9457db7
2018-09-11 16:26:55 -07:00
Tongzhou Wang
de460c7ad3 Improvements on conv/pool/fold/stft/ParamDict docs (#11106)
Summary:
Also fixes some incorrect formula rendering.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11106

Differential Revision: D9752433

Pulled By: SsnL

fbshipit-source-id: 535fc8498638e8b645757fc7535d8771992b7d21
2018-09-11 08:56:21 -07:00
Tongzhou Wang
d3f98b5ffc Add matrix power (#11421)
Summary:
vishwakftw Your patch needed some updates because the default native function dispatches changed from `[function, method]` to `[function]`. The CI was run before that change happened so it still shows green, but the internal test caught it.

I did some changes when rebasing and updating so I didn't just force push to your branch. Let's see if this passes CI and internal test. If it does, let me know if you want me to force push to your branch or use this PR instead.

Note to reviewers: patch was already approved at #10068 .

cc yf225
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11421

Differential Revision: D9733407

Pulled By: SsnL

fbshipit-source-id: cf2ed293bb9942dcc5158934ff4def2f63252599
2018-09-08 15:25:56 -07:00
vishwakftw
da4ebc2971 Switch SVD on CPU from gesvd to gesdd (#11194)
Summary:
- Added a note to the doc string for `svd`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11194

Differential Revision: D9683250

Pulled By: soumith

fbshipit-source-id: 2d2c120be346122afa333629c0516a5c9dbb406f
2018-09-07 07:39:57 -07:00
vishwakftw
593d74061f Document torch.allclose (#11185)
Summary:
- Modify torch.autograd.gradcheck to use torch.allclose instead
- Expose doc strings
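A minimal usage sketch:

```python
import torch

a = torch.tensor([1.0, 2.0, 3.0])
b = a + 1e-6
print(torch.allclose(a, b))                        # True with the default tolerances
print(torch.allclose(a, b, rtol=0.0, atol=1e-8))   # False once tolerances are tightened
```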

Closes #10355
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11185

Differential Revision: D9628016

Pulled By: soumith

fbshipit-source-id: 22a30622b9fe52e41b5b3540406137b59d8c5a75
2018-09-02 09:26:07 -07:00
zou3519
396dec0e37 s/spaerse/sparse (#10968)
Summary:
cc SsnL
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10968

Differential Revision: D9546746

Pulled By: zou3519

fbshipit-source-id: a6a4bb8bb04eccf89c3d90a90259070beb484500
2018-08-29 12:13:04 -07:00
Ailing Zhang
a9469c9c8a Fill eigenvector with zeros if not required (#10645)
Summary:
Fix #10345, which only happens in CUDA case.

* Instead of returning some random buffer, we fill it with zeros.

* update torch.symeig doc.
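A sketch of the fixed behavior (assumes a CUDA device, since the issue was CUDA-specific, and uses the era's `torch.symeig` API):

```python
import torch

a = torch.randn(4, 4, device='cuda')
a = a + a.t()                                    # make the input symmetric
e, v = torch.symeig(a, eigenvectors=False)
print(e.shape)                                   # eigenvalues are still returned
print(v.abs().sum())                             # v is now zero-filled, not garbage
```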
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10645

Reviewed By: soumith

Differential Revision: D9395762

Pulled By: ailzhang

fbshipit-source-id: 0f3ed9bb6a919a9c1a4b8eb45188f65a68bfa9ba
2018-08-29 10:55:22 -07:00
zou3519
7169906249 torch.digamma (#10967)
Summary:
Fixes #10307

cc SsnL
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10967

Differential Revision: D9546748

Pulled By: zou3519

fbshipit-source-id: 764e27b1cc8dd487270b3ffa653b806c86f717dd
2018-08-29 09:43:19 -07:00
Vishwak Srinivasan
5fb9b31ed5 Add matrix_rank (#10338)
Summary:
- Similar functionality to NumPy
- Added doc string
- Added tests
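A minimal usage sketch:

```python
import torch

a = torch.randn(5, 3) @ torch.randn(3, 5)   # rank-3 by construction
print(torch.matrix_rank(a))                  # tensor(3)
```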

Differential Revision: D9240850

Pulled By: SsnL

fbshipit-source-id: 1d04cfadb076e99e03bdf699bc41b8fac06831bf
2018-08-22 09:58:38 -07:00
Tongzhou Wang
abb209ef25 Fixes *fft docs (#10760)
Summary:
cc cranmer

fixes #10751
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10760

Differential Revision: D9444473

Pulled By: SsnL

fbshipit-source-id: a4036773a93981801c1283d69f86e30cb0fe3d6d
2018-08-21 21:09:04 -07:00