Commit Graph

583 Commits

Author SHA1 Message Date
Jan Schlüter
1c8d41a08d Allow linspace and logspace with steps=1 and start != end like numpy (#14748)
Summary:
`torch.linspace(0, 1, 1)` fails with `RuntimeError: invalid argument 3: invalid number of points at ../aten/src/TH/generic/THTensorMoreMath.cpp:2119`, while `np.linspace(0, 1, 1)` works fine.
Looking at the code, there is even a comment by gchanan asking: "NumPy allows you to pass different points even if n <= 1 -- should we?"
I would say "yes". Currently, I would need to handle the case of `steps == 1` or `steps == 0` separately, making sure to change the `end` when calling `torch.linspace`. This is impractical. If we support `start != end`, there are two possibilities for the result: Either we ensure the first value in the resulting sequence always equals `start`, or we ensure the last value in the resulting sequence always equals `end`. Numpy chose the former, which also allows it to support a boolean `endpoint` flag. I'd say we should follow numpy.

This PR adapts `linspace` and `logspace` to mimic the behavior of numpy, adapts the tests accordingly, and extends the docstrings to make clear what happens when passing `steps=1`.

If you decide against this PR, the error message should become explicit about what I did wrong, and the documentation should be extended to mention this restriction.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14748

Differential Revision: D13356136

Pulled By: ezyang

fbshipit-source-id: db85b8f0a98a5e24b3acd766132ab71c91794a82
2018-12-06 09:30:55 -08:00
Junjie Bai
ba0ebe33c1 Unify device argument parsing between torch and c10
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14786

Differential Revision: D13334501

Pulled By: bddppq

fbshipit-source-id: ae3536be1fe0dcd6a1552ec93629ecc9554c0d7c
2018-12-05 18:37:32 -08:00
Richard Zou
1921816f85 Fix clamp when min/max are both None (#14716)
Summary:
Before this PR, tensor.clamp() would return an empty tensor if min and
max were not specified. This is a regression from 0.4.1, which would
throw an error. This PR restores that error message.

Fixes #14470
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14716

Differential Revision: D13311031

Pulled By: zou3519

fbshipit-source-id: 87894db582d5749eaccfc22ba06aac4e10983880
2018-12-04 07:07:09 -08:00
Roy Li
0786dfee7c Move THTensor_(copy) to aten (#13603)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13603
P
Moved vectorized CPU copy to aten. Notable changes mainly in _copy_same_type_.

Reviewed By: ezyang

Differential Revision: D12936031

fbshipit-source-id: 00d28813e3160595e73d104f76685e13154971c1
2018-11-30 11:12:54 -08:00
Brennan Vincent
c638f379b3 Make mean function work across multiple dimensions. (#14252)
Summary:
Multi-dimensional `sum` is already implemented, and it's trivial to implement `mean` in terms of `sum`, so just do it.

Bonus: Fix incomplete language in the `torch.sum` documentation which doesn't take into account multiple dimensions when describing `unsqueeze` (at the same time as introducing similar language in `torch.mean`).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14252

Differential Revision: D13161157

Pulled By: umanwizard

fbshipit-source-id: c45da692ba83c0ec80815200c5543302128da75c
2018-11-28 06:53:09 -08:00
Francisco Massa
68251fb931 Fix half tensor printing plus speedup large tensor printing (#14418)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/14344 and https://github.com/pytorch/pytorch/issues/6863

The slowdown was due to the fact that we were only summarizing the tensor (for computing the number of digits to print) if its first dimension was larger than the threshold. It now goes over all the dimensions.

Some quick runtime analysis:

Before this PR:
```python
In [1]: import torch; a = torch.rand(1, 1700, 34, 50)

In [2]: %timeit str(a)
13.6 s ± 84.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
```

After this PR

```python
In [1]: import torch; a = torch.rand(1, 1700, 34, 50)

In [2]: %timeit str(a)
2.08 ms ± 395 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [3]: b = a.cuda()

In [4]: %timeit str(b)
8.39 ms ± 45.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14418

Reviewed By: weiyangfb

Differential Revision: D13226950

Pulled By: soumith

fbshipit-source-id: 19eb4b855db4c8f891d0925a9c56ae8a2824bb23
2018-11-28 06:13:06 -08:00
Brian Vaughan
a0def0b57e check for invalid ranges in torch.arange
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13915

Differential Revision: D13222110

Pulled By: nairbv

fbshipit-source-id: fcff1ad058fbf792d0fdf4aa75d77f22e3b7483b
2018-11-27 20:38:56 -08:00
Brian Vaughan
b08a186153 roll along multiple dimensions
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13874

Differential Revision: D13223669

Pulled By: nairbv

fbshipit-source-id: 1678d52529c326fa4a0614d0994b1820ad12bc04
2018-11-27 20:32:30 -08:00
Ailing Zhang
e387d945c2 allow empty index for scatter_* methods (#14077)
Summary:
Fixes #2027
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14077

Differential Revision: D13095788

Pulled By: ailzhang

fbshipit-source-id: ad2c8bbf83d36e07940782b9206fbdcde8905fd3
2018-11-19 09:50:21 -08:00
vishwakftw
a5891e6124 Remove debugging code in test_cholesky_batched (#14156)
Summary:
They didn't turn up in my tests because I use pytest which doesn't
print debug statements if the tests pass

Differential Revision: D13115227

Pulled By: soumith

fbshipit-source-id: 46a7d47da7412d6b071158a23ab21e7fb0c6e11b
2018-11-17 22:28:21 -08:00
vishwakftw
a30ade1139 Batched cholesky decomposition (#14017)
Summary:
Implements batching for the Cholesky decomposition.

Performance could be improved with a dedicated batched `tril` and `triu` op, which is also impeding autograd operations.

Changes made:
- batching code
- tests in `test_torch.py`, `test_cuda.py` and `test_autograd.py`.
- doc string modification
- autograd modification
- removal of `_batch_potrf` in `MultivariateNormal`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14017

Differential Revision: D13087945

Pulled By: ezyang

fbshipit-source-id: 2386db887140295475ffc247742d5e9562a42f6e
2018-11-17 10:49:15 -08:00
Johannes M Dieterich
ce48958606 enable more unit tests (#13166)
Summary:
This enables the distributions and utils test sets for ROCm.
Individual tests are enabled that now pass due to fixes in HIP/HCC/libraries versions in white rabbit.

For attention: bddppq ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13166

Differential Revision: D12814759

Pulled By: bddppq

fbshipit-source-id: ea70e775c707d7a8d2776fede6154a755adef43e
2018-11-12 18:49:52 -08:00
Vishwak Srinivasan
7b2fb012a8 Make potrs batched (#13453)
Summary:
- This is a straightforward PR, building up on the batch inverse PR, except for one change:
  - The GENERATE_LINALG_HELPER_n_ARGS macro has been removed, since it is not very general and the resulting code is actually not very copy-pasty.

Billing of changes:
- Add batching for `potrs`
- Add relevant tests
- Modify doc string

Minor changes:
- Remove `_gesv_single`, `_getri_single` from `aten_interned_strings.h`.
- Add test for CUDA `potrs` (2D Tensor op)
- Move the batched shape checking to `LinearAlgebraUtils.h`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13453

Reviewed By: soumith

Differential Revision: D12942039

Pulled By: zou3519

fbshipit-source-id: 1b8007f00218e61593fc415865b51c1dac0b6a35
2018-11-09 15:16:26 -08:00
Brian Vaughan
4fadf571fd handle flat rolling (no dim specified) T36264909 (#13588)
Summary:
update roll to behave as in numpy.roll when dimension to roll not specified.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13588

Differential Revision: D12964295

Pulled By: nairbv

fbshipit-source-id: de9cdea1a937773033f081f8c1505a40e4e08bc1
2018-11-08 12:39:35 -08:00
vishwakftw
0a090fe60a Fix torch.dist for infinity, zero and minus infinity norms (#13713)
Summary: Fixes #13559

Differential Revision: D12981556

Pulled By: zou3519

fbshipit-source-id: 99e86abab3ca045257374a9212ca24e7ca59fe9d
2018-11-08 12:03:07 -08:00
Wei Yang
5dd153b1c2 speed up torch.sparse_mask() cpu kernel (#13290)
Summary:
- `sparse_mask(D, S)` is useful to implement backward for `sparse_addmm()`
- previous `sparse_mask(D, S)` cpu kernel is not parallelized
- this PR speed up the cpu kernel for two separated cases:
  - `D.dim == S.sparse_dim`: simply parallelize the kernel
  - `D.dim > S.sparse_dim`: simply use CUDA kernel implementation
- performance:

`D.dim == S.sparse_dim`
```
>>> nnz = 100000
>>> dims = [1000, 1000]
>>> I = torch.cat([torch.randint(0, dims[0], size=(nnz,)),
               torch.randint(0, dims[1], size=(nnz,))], 0).reshape(2, nnz)
>>> V = torch.randn(nnz)
>>> size = torch.Size(dims)

>>> S = torch.sparse_coo_tensor(I, V, size).coalesce()
>>> D = torch.randn(dims)

>>> %timeit D.sparse_mask(S)

======= before change =======
6.4 ms ± 684 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

======= after change =======
333 µs ± 89.5 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
```

`D.dim > S.sparse_dim`
```
>>> nnz = 100000
>>> dims = [1000, 1000, 2, 2]
>>> I = torch.cat([torch.randint(0, dims[0], size=(nnz,)),
               torch.randint(0, dims[1], size=(nnz,))], 0).reshape(2, nnz)
>>> V = torch.randn(nnz, dims[2], dims[3])
>>> size = torch.Size(dims)

>>> S = torch.sparse_coo_tensor(I, V, size).coalesce()
>>> D = torch.randn(dims)
%timeit D.sparse_mask(S)

======= before change =======
495 ms ± 41.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

======= after change =======
594 µs ± 68.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13290

Differential Revision: D12878336

Pulled By: weiyangfb

fbshipit-source-id: 10b5981af382f7c6095a42c0fee7297d6438ce37
2018-11-07 20:02:17 -08:00
Wei Yang
6bfce16873 fix flip() shape bug in CPU (#13344)
Summary:
- a walk around for #13292, a complete fix requires investigation on the root cause when using advanced indexing
- this PR brings in `filp()` CUDA implementation for CPU kernel
- with this change:
```
>>> t = torch.randn(1, 3, 4, 5)
>> t.flip(1, 3).shape
torch.Size([1, 3, 4, 5])
```
- performance:
```
====== with this PR ======
>>> a = torch.randn(1000, 1000)
>>> %timeit -r 100 a.flip(0, 1)
1.98 ms ± 579 µs per loop (mean ± std. dev. of 100 runs, 1000 loops each)

====== Perf at previous PR #7873 ======
100 loops, best of 3: 11 ms per loop
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13344

Differential Revision: D12968003

Pulled By: weiyangfb

fbshipit-source-id: 66f434049d143a0575a35b5c983b3e0577a1a28d
2018-11-07 19:53:49 -08:00
Soumith Chintala
a7ee632dff Various Test and build fixes (#13556)
Summary:
- fixes weights-contiguous requirement for THCUNN Convolutions
- Add tests that conv backward pass works for non-contiguous weights
- fix RNN tests / error messages to be consistent and pass
- relax weight grad precision for fp16 for a particular test
- fix regression of CMAKE_PREFIX_PATH not passing through
- add missing skipIfNoLapack annotations where needed

Differential Revision: D12918456

Pulled By: soumith

fbshipit-source-id: 8642d36bffcc6f2957800d6afa1e10bef2a91d05
2018-11-06 07:13:47 -08:00
Thomas Viehmann
f0ed927b62 Add diag_embed to ATen and torch (#12447)
Summary:
Fixes: #12160
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12447

Differential Revision: D12916234

Pulled By: SsnL

fbshipit-source-id: 512a04efb0c2e0a54295b857a61be66c3aae13da
2018-11-05 08:55:28 -08:00
Brian Vaughan
07f8b61cc6 Roll operator t32802531 (#13261)
Summary:
Adding a roll operator
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13261

Differential Revision: D12922575

Pulled By: nairbv

fbshipit-source-id: ff05c075d9c484a615011192b023debf47da4017
2018-11-05 08:33:36 -08:00
Tongzhou Wang
2f82a06826 Fix half_tensor.bernoulli_(double) (#13474)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/12431
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13474

Differential Revision: D12897834

Pulled By: SsnL

fbshipit-source-id: 598250fd7b9f1d2509ec0e5012724d7895a62daf
2018-11-02 07:46:46 -07:00
Tongzhou Wang
6d2b3cc869 Fix pytest, make it work with run_test.py (#13416)
Summary:
Fixes #13326

Also now you can use `run_test.py` with `pytest`. E.g.,
```
python run_test.py -vci distributed -pt
```

Yes it works with `distributed` and `cpp_extension`.

cc zou3519 vishwakftw
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13416

Differential Revision: D12895622

Pulled By: SsnL

fbshipit-source-id: 2d18106f3a118d642a666bfb1318f41c859c3df7
2018-11-01 19:08:06 -07:00
vishwakftw
d714ecf879 Rename potrf to cholesky (#12699)
Summary:
This PR performs a renaming of the function `potrf` responsible for the Cholesky
decomposition on positive definite matrices to `cholesky` as NumPy and TF do.

Billing of changes
- make potrf cname for cholesky in Declarations.cwrap
- modify the function names in ATen/core
- modify the function names in Python frontend
- issue warnings when potrf is called to notify users of the change

Reviewed By: soumith

Differential Revision: D10528361

Pulled By: zou3519

fbshipit-source-id: 19d9bcf8ffb38def698ae5acf30743884dda0d88
2018-11-01 15:10:55 -07:00
Sam Gross
a4f00c3d1e Fix error message in tensorlist()
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13392

Differential Revision: D12860921

Pulled By: colesbury

fbshipit-source-id: 86da3ef15d70b0343dc922a3842449001c1afffa
2018-10-31 11:19:56 -07:00
Will Feng
11a16961a5 Fix "CUDA Tensor __rsub__ breaks when device is not 0" (#12956)
Summary:
Currently, `a = 1 - torch.tensor([1]).to('cuda:1')` puts `a` in `cuda:1` but reports `a.device` as `cuda:0` which is incorrect, and it causes illegal memory access error when trying to access `a`'s memory (e.g. when printing). This PR fixes the error.

Fixes https://github.com/pytorch/pytorch/issues/10850.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12956

Differential Revision: D12835992

Pulled By: yf225

fbshipit-source-id: 5737703d2012b14fd00a71dafeedebd8230a0b04
2018-10-30 16:29:19 -07:00
Wei Yang
3cb2470bb3 add __deepcopy__ back to Parameter (#12886)
Summary:
- fix https://github.com/pytorch/pytorch/issues/315
- add `__deepcopy__` back to Parameter class
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12886

Differential Revision: D12838771

Pulled By: weiyangfb

fbshipit-source-id: b2ce12244e36f981d89f6c7cdead63237dd820ea
2018-10-30 12:56:26 -07:00
Tongzhou Wang
d8dab6ffa8 Add tensor.to(options) (#13146)
Summary:
ezyang on the template hack
smessmer on SFINAE of the `TensorOptions(Device)`
goldsborough on the C++ API test changes
zdevito on the `jit` codegen changes
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13146

Reviewed By: ezyang

Differential Revision: D12823809

Pulled By: SsnL

fbshipit-source-id: 98d65c401c98fda1c6fa358e4538f86c6495abdc
2018-10-29 16:26:06 -07:00
Tongzhou Wang
8ad69a80e3 Test scripts only run cases defined in the running script (#13250)
Summary:
1. Refactors `TestTorch` into `TestTorchMixin` (subclass of `object`) and `TestTorch` (subclass of `TestCase`, MRO `(TestCase, TestTorchMixin)`, only defined if `__name__ == '__main__'`). So other scripts won't accidentally run it.
2. Adds an assertion in `load_tests` that each script only runs cases defined in itself.

cc yf225 ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13250

Differential Revision: D12823734

Pulled By: SsnL

fbshipit-source-id: 7a169f35fe0794ce76e310d8a137d9a3265c012b
2018-10-29 13:57:40 -07:00
vishwakftw
1fe8278559 Batched Inverse (#9949)
Summary:
Complete billing of changes:

Related to Batch Inverse:
- [x] Add batched inverse (CPU)
- [x] Add batched inverse (CUDA)
- [x] Modify autograd entry
- [x] Add tests
  - [x] test_autograd
  - [x] test_cuda
  - [x] test_torch
- [x] Modify docs
- [x] Remove `_batch_inverse` in `MultivariateNormal`.
- [x] Allow batch matrices as inputs for negative powers in `matrix_power`

Miscellaneous modifications:
- [x] Move all batch operations to BatchLinearAlgebra.cpp/.cu and provide general framework for adding more batch ops.
- [x] Add a RAII structure for MAGMA queue management.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9949

Differential Revision: D10559089

Pulled By: zou3519

fbshipit-source-id: 7da24977f8a79d97dd42883302e13e708c1726e4
2018-10-27 23:42:46 -07:00
Johannes M Dieterich
7a6e0bd77e Skip ROCm tests that fail as per #12824 (#13181)
Summary:
For attention: bddppq
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13181

Differential Revision: D12811207

Pulled By: bddppq

fbshipit-source-id: de1c92e5a8cf4fc634c4644376d07374441c24e3
2018-10-26 21:06:20 -07:00
Zachary DeVito
dae7616078 Shard all of tests based on how many tests exist. (#13160)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13160

Reduces pytorch_core build from 2 hours to 30 minutes

Reviewed By: soumith, dzhulgakov

Differential Revision: D10524261

fbshipit-source-id: 97270ac73404b5ea4c264cd0e9d8d4b1be79b0e9
2018-10-26 18:20:34 -07:00
Ailing Zhang
478886be30 Fix print precision and match numpy behavior (#12746)
Summary:
Fixes #12578 #9395.

* Fix and simplify print logic

* Follow numpy print rule eb2bd11870/numpy/core/arrayprint.py (L859)
> scientific notation is used when absolute value of the smallest number is < 1e-4 or maximum > 1e8 or the ratio of the maximum absolute value to the minimum is > 1e3

I hope I didn't break anything since there seems to be a lot of edge cases here... Here are some easy sanity checks.
```
In [5]: torch.tensor(1)
Out[5]: tensor(1)
Out[2]: array(1) # numpy

In [6]: torch.tensor(10)
Out[6]: tensor(10)
Out[3]: array(10) # numpy

In [8]: torch.tensor(99000000)
Out[8]: tensor(99000000)
Out[5]: array(99000000) # numpy

In [9]: torch.tensor(100000000)
Out[9]: tensor(100000000)
Out[6]: array(100000000) # numpy

In [10]: torch.tensor(100000001)
Out[10]: tensor(100000001)
Out[7]: array(100000001) # numpy

In [11]: torch.tensor(1000000000)
Out[11]: tensor(1000000000)
Out[8]: array(1000000000) # numpy

In [12]: torch.tensor([1, 1000])
Out[12]: tensor([   1, 1000])
Out[9]: array([   1, 1000]) # numpy

In [13]: torch.tensor([1, 1010])
Out[13]: tensor([   1, 1010])
Out[10]: array([   1, 1010]) # numpy
```
For floating points, we use scientific when `max/min > 1000 || max > 1e8 || min < 1e-4`
Lines with "old" are old behaviors that either has precision issue, or not aligned with numpy
```
In [14]: torch.tensor(0.01)
Out[14]: tensor(0.0100)
Out[11]: array(0.01) # numpy

In [15]: torch.tensor(0.1)
Out[15]: tensor(0.1000)
Out[12]: array(0.1) # numpy

In [16]: torch.tensor(0.0001)
Out[16]: tensor(0.0001)
Out[14]: array(0.0001) # numpy

In [17]: torch.tensor(0.00002)
Out[17]: tensor(2.0000e-05)
Out[15]: array(2e-05) # numpy
Out[5]: tensor(0.0000) # old

In [18]: torch.tensor(1e8)
Out[18]: tensor(100000000.)
Out[16]: array(100000000.0) # numpy

In [19]: torch.tensor(1.1e8)
Out[19]: tensor(1.1000e+08)
Out[17]: array(1.1e8) # numpy 1.14.5, In <= 1.13 this was not using scientific print
Out[10]: tensor(110000000.) # old

In [20]: torch.tensor([0.01, 10.])
Out[20]: tensor([ 0.0100, 10.0000])
Out[18]: array([  0.01,  10.  ]) # numpy

In [21]: torch.tensor([0.01, 11.])
Out[21]: tensor([1.0000e-02, 1.1000e+01])
Out[19]: array([  1.00000000e-02,   1.10000000e+01]) # numpy
Out[7]: tensor([ 0.0100, 11.0000]) # old
```
When print floating number in int mode, we still need to respect rules to use scientific mode first
```
In [22]: torch.tensor([1., 1000.])
Out[22]: tensor([   1., 1000.])
Out[20]: array([    1.,  1000.]) # numpy

In [23]: torch.tensor([1., 1010.])
Out[23]: tensor([1.0000e+00, 1.0100e+03])
Out[21]: array([  1.00000000e+00,   1.01000000e+03]) # numpy
Out[9]: tensor([   1., 1010.]) # old
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12746

Differential Revision: D10443800

Pulled By: ailzhang

fbshipit-source-id: f5e4e3fe9bf0b44af2c64c93a9ed42b73fa613f5
2018-10-24 18:12:51 -07:00
James Sun
f4944f0f8a Rename test/common.py to test/common_utils.py (#12794)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12794

common.py is used in base_module for almost all tests in test/. The
name of this file is so common that can easily conflict with other dependencies
if they happen to have another common.py in the base module. Rename the file to
avoid conflict.

Reviewed By: orionr

Differential Revision: D10438204

fbshipit-source-id: 6a996c14980722330be0a9fd3a54c20af4b3d380
2018-10-17 23:04:29 -07:00
Sepehr Sameni
cffeb03a2d fix forward and backward for norm with negative infinity norm (#12722)
Summary:
I found a bug in norm() and fixed it (and added tests to make sure it's fixed)
here is how to reproduce it:
```python
import torch
x = torch.FloatTensor([[10, 12, 13], [4, 0, 12]])
print(torch.norm(x, -40, dim=0, keepdim=True)) #output is tensor([[ 4.0000,  0.0000, 11.9853]])
print(torch.norm(x, float('-inf'), dim=0, keepdim=True)) #output is tensor([[1., 1., 1.]]) which is wrong!
from numpy.linalg import norm as np_norm
x = x.numpy()
print(np_norm(x, ord=-40, axis=0)) #output is array([[4., 0., 11.985261]])
print(np_norm(x, ord=float('-inf'), axis=0)) #output is array([[4., 0., 12.0]])
```
it's related to [#6817](https://github.com/pytorch/pytorch/issues/6817) and [#6969](https://github.com/pytorch/pytorch/pull/6969)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12722

Differential Revision: D10427687

Pulled By: soumith

fbshipit-source-id: 936a7491d1e2625410513ee9c39f8c910e8e6803
2018-10-17 21:07:43 -07:00
Ailing Zhang
25db86cca5 Fix isfinite for int input (#12750)
Summary:
`torch.isfinite()` used to crash on int inputs.
```
>>> import torch
>>> a = torch.tensor([1, 2])
>>> torch.isfinite(a)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/scratch/pytorch/torch/functional.py", line 262, in isfinite
    return (tensor == tensor) & (tensor.abs() != inf)
RuntimeError: value cannot be converted to type int64_t without overflow: inf
```
But this is a easy special case and numpy also supports it.
```
>>> import numpy as np
>>> a = np.array([1, 2])
>>> a.dtype
dtype('int64')
>>> np.isfinite(a)
array([ True,  True], dtype=bool)
```
So added a hacky line to handle non-floating-point input. Since pytorch raises exception when overflow, we can safely assume all valid int tensors are infinite numbers.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12750

Differential Revision: D10428204

Pulled By: ailzhang

fbshipit-source-id: f39b2d0975762c91cdea23c766ff1e21d85d57a5
2018-10-17 11:48:25 -07:00
Thomas Viehmann
50c0aedbec Don't segfault on Tensor.__delitem__ (#12726)
Summary:
The mapping protocol stipulates that when `__delitem__` is called, this is passed to `__setitem__` [(well, the same function in the C extension interface)](https://docs.python.org/3/c-api/typeobj.html#c.PyMappingMethods.mp_ass_subscript) with NULL data.

PyTorch master crashes in this situation, with this patch, it does not anymore.

Test code (careful, sefaults your interpreter):
```python
import torch
a = torch.randn(5)
del a[2]
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12726

Differential Revision: D10414244

Pulled By: colesbury

fbshipit-source-id: c49716e1a0a3d9a117ce88fc394858f1df36ed79
2018-10-16 17:24:18 -07:00
vishwakftw
0740a5d521 compute_uv for SVD (#12517)
Summary:
Adds a `compute_uv` argument that defaults to `True` for optionally computing the singular vectors during SVD.

Closes https://github.com/pytorch/pytorch/issues/12420 .
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12517

Differential Revision: D10384554

Pulled By: SsnL

fbshipit-source-id: 704998a257afa815eda901b8ae830e8a661695be
2018-10-15 12:35:56 -07:00
Mingfei Ma
02695c11db fix masked_fill_ bug on non-contiguous tensor (#12594)
Summary:
bug fix on #12230 , the following script pass after the fix.
```python
x = torch.randn(2, 2, 2)
x = x.permute((2, 0, 1))
y = x.clone()
y.masked_fill_(y > 0, 1)
x.masked_fill_(x > 0, 1)
print((x == y).all())
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12594

Differential Revision: D10377088

Pulled By: soumith

fbshipit-source-id: 88feabe1459d325bfdf9a860412ddbd28686a28b
2018-10-14 23:12:27 -07:00
vishwakftw
48bc57fa8d Introduce chain_matmul (#12380)
Summary:
- This was one of the few functions left out from the list of functions in
  NumPy's `linalg` module
- `multi_mm` is particularly useful for DL research, for quick analysis of
  deep linear networks
- Added tests and doc string
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12380

Differential Revision: D10357136

Pulled By: SsnL

fbshipit-source-id: 52b44fa18d6409bdeb76cbbb164fe4e88224458e
2018-10-12 03:58:12 -07:00
Thomas Viehmann
0cf3c1ce66 Add copy= keyword to Tensor.to (#12571)
Summary:
Fixes: #12454
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12571

Differential Revision: D10356994

Pulled By: SsnL

fbshipit-source-id: d87416078a5a8e5ffa690cd73c09fa6b4e16aa25
2018-10-12 02:10:44 -07:00
Johannes M Dieterich
957142a4fe switch ROCm CI targets to white rabbit release (#12577)
Summary:
* switches docker files over to white rabbit release - removed custom package installs
* skips five tests that regressed in that release
* fixes some case-sensitivity issues in ROCm supplied cmake files by sed'ing them in the docker
* includes first changes to the infrastructure to support upcoming hip-clang compiler
* prints ROCm library versions as part of the build (as discussed w/ ezyang )
* explicitly searches for miopengemm
* installs the new hip-thrust package to be able to remove the explicit Thrust checkout in a future revision
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12577

Differential Revision: D10350165

Pulled By: bddppq

fbshipit-source-id: 60f9c9caf04a48cfa90f4c37e242d944a175ab31
2018-10-11 18:03:11 -07:00
Ailing Zhang
8734b174ca Multinomial raise error (#12490)
Summary:
Fixes #12260 #2896

```
torch.multinomial(torch.FloatTensor([0, 1, 0, 0]), 3, replacement=False)
```
The old behavior is that we return `0` after we run out of postive categories. Now we raise an error based on discussion in the issue thread.

- Add testcase for cpu & cuda case, in cuda case `n_samples=1` is a simple special case, so we test against `n_sample=2` instead.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12490

Differential Revision: D10278794

Pulled By: ailzhang

fbshipit-source-id: d04de7a60f60d0c0d648b975db3f3961fcf42db1
2018-10-10 20:39:04 -07:00
iotamudelta
c96afa3322 topk and sort fixes (#12337)
Summary:
* Topk part 1: fix intrinsincs for 64 wave front (#224)
64 in a wave front - intrinsics change.
* Disable in-place sorting on ROCm. (#237)
It is known to hang - use the Thrust fallback
Skip one test - fails with the fallback.
* Topk fixes (#239)
* Spec (https://docs.nvidia.com/cuda/pdf/ptx_isa_6.3.pdf) Sec 9.7.1.19 (bfe) and 9.7.1.20 (bfi) requires pos and len to be limited to 0...255
* Spec (https://docs.nvidia.com/cuda/pdf/ptx_isa_6.3.pdf) Sec 9.7.1.19 requires extracted bits to be in LSBs
* Correct logic for getLaneMaskLe. Previous logic would return 0x0 instead of 0xffffffffffffffff for lane 63
* Round up blockDim.x to prevent negative index for smem

bddppq ezyang

Note the one additional skipped test resulting from using the thrust sort fallback for all sizes. We are working on getting bitonic to work properly (and always). Until then, this needs to be skipped on ROCm.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12337

Differential Revision: D10259481

Pulled By: ezyang

fbshipit-source-id: 5c8dc6596d7a3103ba7b4b550cba895f38c8148e
2018-10-09 12:08:48 -07:00
Thomas Viehmann
0e44db8b0d Add check for backend of arguments to bmm cpu (#12434)
Summary:
Fixes: #12406
Thank you, jcjohnson, for reporting.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12434

Differential Revision: D10235799

Pulled By: soumith

fbshipit-source-id: 44ee35010bac3791901f604095f5b4bc66b0e7f8
2018-10-07 18:55:42 -07:00
Johannes M Dieterich
c9f7d7b506 mark unit tests as working, skip failing unit test (#12313)
Summary:
* enabled fp16 tests for test_torch

* enable fp16 tests for test_nn

* enabled multilabelmargin loss for fp16

* removed skip for test_pdist_empty_col

* Enable test_nn tests that pass with compiler fixes etc.

* Enable test_legacy_nn tests that pass with compiler fixes etc.

ezyang bddppq
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12313

Differential Revision: D10189922

Pulled By: bddppq

fbshipit-source-id: a5592817c04b14e355cb062d42ebea406f0c92b6
2018-10-03 23:56:26 -07:00
iotamudelta
a2ebbccc9f fix unit tests on CI
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12187

Differential Revision: D10118483

Pulled By: bddppq

fbshipit-source-id: 986c8fb48d61e00103c713548a50e74489a0e442
2018-09-28 23:11:55 -07:00
Wei Yang
de11fe0c83 migrate PReLU to ATen (#11758)
Summary:
- fixes https://github.com/pytorch/pytorch/issues/10723
- migrate PReLU to ATen and deprecate legacy PReLU
- performance:

CPU with weight.numel() = 1
```
>>> m = nn.PReLU()
>>> x = torch.randn(100, 100, 100, requires_grad=True)
>>> %timeit -r 100 y = m(x)
100 loops, best of 100: 9.43 ms per loop

>>> y = m(x).sum()
>>> %timeit -r 100 y.backward(retain_graph=True)
10 loops, best of 100: 24.4 ms per loop

>>> m = nn.PReLU()
>>> x = torch.randn(100, 100, 100, requires_grad=True)
>>> %timeit -r 100 y = m(x)
1000 loops, best of 100: 695 µs per loop

>>> y = m(x).sum()
>>> %timeit -r 100 y.backward(retain_graph=True)
100 loops, best of 100: 2.47 ms per loop
```

CPU with weight.numel() = channels
```
>>> m = nn.PReLU(100)
>>> x = torch.randn(100, 100, 100, requires_grad=True)
>>> %timeit -r 100 y = m(x)
1000 loops, best of 100: 603 µs per loop

>>> y = m(x).sum()
>>> %timeit -r 100 y.backward(retain_graph=True)
100 loops, best of 100: 13.3 ms per loop

>>> m = nn.PReLU(100)
>>> x = torch.randn(100, 100, 100, requires_grad=True)
>>> %timeit -r 100 y = m(x)
1000 loops, best of 100: 655 µs per loop

>>> y = m(x).sum()
>>> %timeit -r 100 y.backward(retain_graph=True)
100 loops, best of 100: 2.45 ms per loop
```

CUDA with weight.numel() = 1
```
>>> m = nn.PReLU().cuda()
>>> x = torch.randn(100, 100, 100, requires_grad=True).cuda()
>>> %timeit -r 100 torch.cuda.synchronize(); y = m(x); torch.cuda.synchronize();
10000 loops, best of 100: 187 µs per loop

>>> y = m(x).sum()
>>> %timeit -r 100 torch.cuda.synchronize(); y.backward(retain_graph=True); torch.cuda.synchronize();
100 loops, best of 100: 2.01 ms per loop

>>> m = nn.PReLU().cuda()
>>> x = torch.randn(100, 100, 100, requires_grad=True).cuda()
>>> %timeit -r 100 torch.cuda.synchronize(); y = m(x); torch.cuda.synchronize();
1000 loops, best of 100: 195 µs per loop

>>> y = m(x).sum()
>>> %timeit -r 100 torch.cuda.synchronize(); y.backward(retain_graph=True); torch.cuda.synchronize();
100 loops, best of 100: 2.28 ms per loop
```

CUDA with weight.numel() = channel
```
>>> m = nn.PReLU(100).cuda()
>>> x = torch.randn(100, 100, 100, requires_grad=True).cuda()
>>> %timeit -r 100 torch.cuda.synchronize(); y = m(x); torch.cuda.synchronize();
1000 loops, best of 100: 174 µs per loop

>>> y = m(x).sum()
>>> %timeit -r 100 torch.cuda.synchronize(); y.backward(retain_graph=True); torch.cuda.synchronize();
100 loops, best of 100: 2.27 ms per loop

>>> m = nn.PReLU(100).cuda()
>>> x = torch.randn(100, 100, 100, requires_grad=True).cuda()
>>> %timeit -r 100 torch.cuda.synchronize(); y = m(x); torch.cuda.synchronize();
10000 loops, best of 100: 181 µs per loop

>>> y = m(x).sum()
>>> %timeit -r 100 torch.cuda.synchronize(); y.backward(retain_graph=True); torch.cuda.synchronize();
100 loops, best of 100: 2.26 ms per loop
```

The huge performance regression in CPU when weight.numel() = 1 is addressed by replacing at::CPU_tensor_apply* with parallelized kernels.

ezyang SsnL zou3519  soumith
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11758

Differential Revision: D9995799

Pulled By: weiyangfb

fbshipit-source-id: d289937c78075f46a54dafbde92fab0cc4b5b86e
2018-09-21 16:26:04 -07:00
Wei Yang
817e83fc01 fix PR #11061 (#11815)
Summary:
- fix PR https://github.com/pytorch/pytorch/pull/11061 by moving `detach_()` and `set_requires_grad()` to `torch.tensor_ctor()` and `tensor.new_tensor`, and also removed warnings and `args_requires_grad` from `internal_new_from_data `
- with this patch, the returned tensor from `tensor_ctor()` and `new_tensor` will be detached from source tensor, and set requires_grad based on the input args
- `torch.as_tensor` retains its behavior as documented

gchanan apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11815

Differential Revision: D9932713

Pulled By: weiyangfb

fbshipit-source-id: 4290cbc57bd449954faadc597c24169a7b2d8259
2018-09-21 11:04:19 -07:00
yya007
b91b15d86e Implementing Matrix Norm for torch.norm (#11261)
Summary:
Currently, norm function only supports vector norm. This PR extends vector norm to matrix norm.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11261

Reviewed By: li-roy

Differential Revision: D9652379

Pulled By: yya007

fbshipit-source-id: 519b3fb80b563c17c56a24675c7b0e46bf5a3a1c
2018-09-20 14:43:13 -07:00
Tongzhou Wang
24e958a0a7 Move bernoulli into ATen (#10273)
Summary:
+ https://github.com/pytorch/pytorch/issues/10236 : torch.bernoulli's out kwarg is broken
  fixed in moving `bernoulli_out` to ATen
+ https://github.com/pytorch/pytorch/issues/9917 : BUG torch.bernoulli(p.expand(shape)) is broken
  fixed in moving all `bernoulli` ops in ATen to use the modern apply utils methods
+ https://github.com/pytorch/pytorch/issues/10357 : torch.bernoulli inconsistent gpu/cpu results
  fixed by adding CUDA asserts

In order to use `curand_uniform4`, I made some changes to `CUDAApplyUtils.cuh`. Specifically, I introduced an optional template parameter `int step` to the `CUDA_tensor_applyN` methods, representing that we want to process `step` values at each time for each of the `N` tensors.

The calling convention for `step = 1` (default) isn't changed. But if `step > 1`, the given lambda `op` must take in `int n` as its first argument, representing the number of valid values, because there may not be full `step` values at the boundary. E.g., here is what the `bernoulli(self, p_tensor)` call look like:
```cpp

  // The template argument `4` below indicates that we want to operate on four
  // element at each time. See NOTE [ CUDA_tensor_applyN helpers ] for details.
  at::cuda::CUDA_tensor_apply2<scalar_t, prob_t, 4>(
      ret, p,
      [seeds] __device__(
          int n, scalar_t& v1, scalar_t& v2, scalar_t& v3, scalar_t& v4,
          const prob_t& p1, const prob_t& p2, const prob_t& p3, const prob_t& p4) {
        curandStatePhilox4_32_10_t state;
        curand_init(
            seeds.first,
            blockIdx.x * blockDim.x + threadIdx.x,
            seeds.second,
            &state);
        float4 rand = curand_uniform4(&state);
        switch (n) {
          case 4: {
            assert(0 <= p4 && p4 <= 1);
            v4 = static_cast<scalar_t>(rand.w <= p4);
          }
          case 3: {
            assert(0 <= p3 && p3 <= 1);
            v3 = static_cast<scalar_t>(rand.z <= p3);
          }
          case 2: {
            assert(0 <= p2 && p2 <= 1);
            v2 = static_cast<scalar_t>(rand.y <= p2);
          }
          case 1: {
            assert(0 <= p1 && p1 <= 1);
            v1 = static_cast<scalar_t>(rand.x <= p1);
          }
        }
      }
    );
```

Benchmarking on `torch.rand(200, 300, 400)` 20 times, each time with 20 loops:

post patch
```
➜  ~ numactl --cpunodebind 1 --membind 1 -- taskset -c 12,13,14,15,16,17,18,19,20,21,22,23 env CUDA_LAUNCH_BLOCKING=1 python bern.py
torch.bernoulli(x)
6.841588497161865 +- 0.05413117632269859
torch.bernoulli(xc)
0.05963418632745743 +- 0.0008014909108169377
x.bernoulli_()
0.4024486541748047 +- 0.0021550932433456182
xc.bernoulli_()
0.02167394384741783 +- 2.3818030967959203e-05

```

pre-patch
```
➜  ~ numactl --cpunodebind 1 --membind 1 -- taskset -c 12,13,14,15,16,17,18,19,20,21,22,23 env CUDA_LAUNCH_BLOCKING=1 python bern.py
torch.bernoulli(x)
12.394511222839355 +- 0.0966421514749527
torch.bernoulli(xc)
0.08970972150564194 +- 0.0038722590543329716
x.bernoulli_()
1.654480218887329 +- 0.02364428900182247
xc.bernoulli_()
0.058352887630462646 +- 0.003094920190051198

```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10273

Differential Revision: D9831294

Pulled By: SsnL

fbshipit-source-id: 65e0655a36b90d5278b675d35cb5327751604088
2018-09-19 16:45:47 -07:00
Amitesh Arora
4ee0a78ee6 varargs for meshgrid (#11600)
Summary:
Adds vararg support for meshgrid and adds checks for all the tensor arguments to have the same dtype and device.

Fixes: [#10823](https://github.com/pytorch/pytorch/issues/10823), #11446

The earlier pull request closed without any changes because I had some rebasing issues, so I made another pull request to close out #10823. Sorry for the inconvenience.

Differential Revision: D9892876

Pulled By: ezyang

fbshipit-source-id: 93d96cafc876102ccbad3ca2cc3d81cb4c9bf556
2018-09-18 07:41:31 -07:00
Wei Yang
407a9fee0c make copy constructed tensor a leaf variable when using torch.tensor(sourceTensor) (#11061)
Summary:
- fix https://github.com/pytorch/pytorch/issues/10876
- the cause of the bug is because copy constructor cannot distinguish between default value of requires_grad and requires_grad=False, thus it makes a copy from source tensor along with its grad_fn if requires_grad=True at source
- with this fix, the behavior becomes
```
>>> source = torch.randn(2, 2, requires_grad=True)
>>> copy = torch.tensor(source, requires_grad=True)
>>> print(copy)
tensor([[-1.2001,  1.9869],
        [-1.0134,  1.3096]], grad_fn=<CopyBackwards>)

>>> source = torch.randn(2, 2, requires_grad=True)
>>> copy = torch.tensor(source, requires_grad=False)
>>> print(copy)
tensor([[-0.7402,  0.0467],
        [ 0.4344, -0.0420]])

>>> source = torch.randn(2, 2, requires_grad=True)
>>> copy = torch.tensor(source)
>>> print(copy)
tensor([[-0.7402,  0.0467],
        [ 0.4344, -0.0420]])
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11061

Differential Revision: D9569714

Pulled By: weiyangfb

fbshipit-source-id: ea368688bdc0f1ce5997870e164e42835b64b4a1
2018-09-17 23:29:09 -07:00
Thomas Viehmann
a02685e109 Fix test_torch's test_potri (#11770)
Summary:
tset_potri -> test_potri, even though it has been like this for a long time

More a curiosity than grave functionality...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11770

Reviewed By: ezyang

Differential Revision: D9884767

Pulled By: soumith

fbshipit-source-id: 9bedde2e94ade281ab1ecc2293ca3cb1a0107387
2018-09-17 21:58:18 -07:00
vishwakftw
47d65ed34f Fix issue 10492 (#11634)
Summary:
- pass infos vector by reference
- checkErrors takes infos vector by reference
- modified gesv tests to not cause infs or nans sporadically
- also clean up error messages

Reviewed By: ezyang

Differential Revision: D9818550

Pulled By: soumith

fbshipit-source-id: 00215205ff88767d6a5e921322394c5fd915d6d8
2018-09-17 12:13:45 -07:00
Gregory Chanan
a8b1755de6 Check device argument makes sense for legacy tensor constructors. (#11669)
Summary:
Fixes: https://github.com/pytorch/pytorch/issues/11427.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11669

Differential Revision: D9817881

Pulled By: gchanan

fbshipit-source-id: 77dc5b0e6bc9884d2616210b96c07e4734058bb6
2018-09-17 08:24:25 -07:00
David Riazati
70e68e755a Casting for binary ops (#11708)
Summary:
Fixes #11663

`TensorIterator` was replacing the op tensors with type casted tensors
which ended up producing side effects in binary ops like `a.float() * b`
where `a` and `b` are `LongTensor`s.

colesbury ezyang apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11708

Differential Revision: D9834016

Pulled By: driazati

fbshipit-source-id: 4082eb9710b31dfc741161a0fbdb9a8eba8fe39d
2018-09-14 13:40:21 -07:00
Edward Yang
72822ee6b2 Fix #11430 (CPU only builds raise opaque error message when calling .… (#11533)
Summary:
…cuda())

While I was at it, I audited all other ways I know how we might get a CUDA
type from PyTorch and fixed more constructors which don't work.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11533

Differential Revision: D9775786

Pulled By: ezyang

fbshipit-source-id: cd07cdd375fdf74945539ec475a48bf08cbc0c17
2018-09-14 09:10:08 -07:00
Roy Li
9abc666745 stop allowing extra positional args in arg parser (#10499)
Summary:
Arg parser allowed additional positional args to be parsed into keyword-only params.

Fixes a couple cases:
- The positional argument happens to be of the right type, and it just works silently. Now, we fail as expected.
- The positional argument fails later down the line. Now, we fail at the appropriate time and get a better error message.

Pre-fix:
```
>>> torch.cuda.LongTensor((6, 0), 1, 1, 0)
tensor([6, 0], device='cuda:1')
```
Post-fix:
```
>>> torch.cuda.LongTensor((6, 0), 1, 1, 0)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: new() received an invalid combination of arguments - got (tuple, int, int, int), but expected one of:
 * (torch.device device)
 * (torch.Storage storage)
 * (Tensor other)
 * (tuple of ints size, torch.device device)
 * (object data, torch.device device)
```

Pre-fix:
```
>>> a = torch.tensor(5)
>>> a.new_zeros((5,5), 0)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: new_zeros(): argument 'dtype' (position 2) must be torch.dtype, not int
```

Post-fix:
```
>>> a = torch.tensor(5)
>>> a.new_zeros((5,5), 0)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: new_zeros() takes 1 positional argument but 2 were given
```

fixes #8351
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10499

Differential Revision: D9811093

Pulled By: li-roy

fbshipit-source-id: ce946270fd11b264ff1b09765db3300879491f76
2018-09-13 11:56:12 -07:00
Johannes M Dieterich
776a9992e1 topk test fix, hgemm integration (#11593)
Summary:
After discussions in #11584 , new PR for just the test skip and hgemm integration.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11593

Differential Revision: D9798527

Pulled By: ezyang

fbshipit-source-id: e2ef5609676571caef2f8e6844909fe3a11d8b3e
2018-09-12 16:56:13 -07:00
Edward Yang
ac9268f25d Conversions to and from complex numbers. (#11420)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11420

Surprisingly tricky!  Here are the major pieces:

- We grow a even yet more ludicrous macro
  AT_FORALL_SCALAR_TYPES_WITH_COMPLEX_EXCEPT_COMPLEX_HALF
  which does what it says on the tin.  This is because I was
  too lazy to figure out how to define the necessary conversions
  in and out of ComplexHalf without triggering ambiguity problems.
  It doesn't seem to be as simple as just Half.  Leave it for
  when someone actually wants this.

- Scalar now can hold std::complex<double>.  Internally, it is
  stored as double[2] because nvcc chokes on a non-POD type
  inside a union.

- overflow() checking is generalized to work with complex.
  When converting *to* std::complex<T>, all we need to do is check
  for overflow against T.  When converting *from* complex, we
  must check (1) if To is not complex, that imag() == 0
  and (2) for overflow componentwise.

- convert() is generalized to work with complex<->real conversions.
  Complex to real drops the imaginary component; we rely on
  overflow checking to tell if this actually loses fidelity. To get
  the specializations and overloads to work out, we introduce
  a new Converter class that actually is specializable.

- Complex scalars convert into Python complex numbers

- This probably fixes complex tensor printing, but there is no way
  to test this right now.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Reviewed By: cpuhrsch

Differential Revision: D9697878

Pulled By: ezyang

fbshipit-source-id: 181519e56bbab67ed1e5b49c691b873e124d7946
2018-09-08 16:39:43 -07:00
Tongzhou Wang
d3f98b5ffc Add matrix power (#11421)
Summary:
vishwakftw Your patch needed some updates because the default native function dispatches changed from `[function, method]` to `[function]`. The CI was run before that change happened so it still shows green, but the internal test caught it.

I did some changes when rebasing and updating so I didn't just force push to your branch. Let's see if this passes CI and internal test. If it does, let me know if you want me to force push to your branch or use this PR instead.

Note to reviewers: patch was already approved at #10068 .

cc yf225
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11421

Differential Revision: D9733407

Pulled By: SsnL

fbshipit-source-id: cf2ed293bb9942dcc5158934ff4def2f63252599
2018-09-08 15:25:56 -07:00
Tongzhou Wang
b9b9ae935b Make torch.randint have default dtype int64 (#11040)
Summary:
cc gchanan apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11040

Differential Revision: D9565728

Pulled By: SsnL

fbshipit-source-id: eb5be9609f30c88f52746fa7e13ad71e2856648e
2018-09-08 07:55:06 -07:00
vishwakftw
733402bef4 Fix issues with certain heterogeneous types in lists during tensor creation (#11377)
Summary:
Closes #9963
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11377

Differential Revision: D9701824

Pulled By: soumith

fbshipit-source-id: 89c5448fd90ece1b365dc42f775b6b0c73ce790c
2018-09-07 12:56:35 -07:00
Peter Goldsborough
fb4e8088f3 Remove methods that start with an underscore from at::Tensor (#11152)
Summary:
This PR cleans up the `at::Tensor` class by removing all methods that start with an underscore in favor of functions in the `at::` namespace. This greatly cleans up the `Tensor` class and makes it clearer what is the public and non-public API.

For this I changed `native_functions.yaml` and `Declarations.cwrap` to make all underscore methods `variant: function` (or add such a statement to begin with), and then fixed all code locations using the underscore methods.

ezyang colesbury gchanan
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11152

Differential Revision: D9683607

Pulled By: goldsborough

fbshipit-source-id: 97f869f788fa56639c05a439e2a33be49f10f543
2018-09-07 11:55:11 -07:00
Erik Brinkman
91089a7e17 Add GPU implementation of pdist (#11102)
Summary:
Add the gpu kernel version.

The parallelism I went with performs poorly when there are a large number of vectors, but they're all short, as I don't allocate the thread pool to wrap in that case.

Test Plan
---------
```
python -m unittest test_torch.TestTorch.test_pdist_{empty,scipy} test_nn.TestNN.test_pdist{,_zeros,_empty_row,_empty_col,_cpu_gradgrad_unimplemented,_cuda_gradgrad_unimplemented} test_jit.TestJitGenerated.test_nn_pdist
```

Current performance specs are a little underwhelming, I'm in the process of debugging.

size | torch | torch cuda | scipy
-----|-------|------------|------
16 x 16 | 9.13 µs ± 3.55 µs | 9.86 µs ± 81.5 ns | 15.8 µs ± 1.2 µs
16 x 1024 | 15 µs ± 224 ns | 9.48 µs ± 88.7 ns | 88.7 µs ± 8.83 µs
1024 x 16 | 852 µs ± 6.03 µs | 7.84 ms ± 6.22 µs | 4.7 ms ± 166 µs
1024 x 1024 | 34.1 ms ± 803 µs | 11.5 ms ± 6.24 µs | 273 ms ± 6.7 ms
2048 x 2048 | 261 ms ± 3.5 ms | 77.5 ms ± 41.5 µs | 2.5 s ± 97.6 ms
4096 x 4096 | 2.37 s ± 154 ms | 636 ms ± 2.97 µs | 25.9 s ± 394 ms
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11102

Differential Revision: D9697305

Pulled By: erikbrinkman

fbshipit-source-id: 2b4f4b816c02b3715a85d8db3f4e77479d19bb99
2018-09-07 09:09:46 -07:00
Edward Yang
49231ab0a8 Reimplement storage slicing. (#11314)
Summary:
In #9466 I got rid of storage views and eliminated all places where
they were used... OR SO I THOUGHT.  In actuality, under certain
conditions (specifically, if you trained a CUDA multiprocessing model
shared over CUDA IPC and then serialized your parameters), you could
also serialize storage slices to the saved model format.  In #9466,
I "fixed" the case when you loaded the legacy model format (really,
just unshared the storages--not strictly kosher but if you aren't
updating the parameters, shouldn't matter), but NOT the modern model format, so
such models would fail.

So, I could have applied the legacy model format fix too, but
hyperfraise remarked that he had applied a fix that was effectively
the same as unsharing the storages, but it had caused his model to
behave differently.  So I looked into it again, and realized that
using a custom deleter, I could simulate the same behavior as old
storage slices.  So back they come.

In principle, I could also reimplement storage views entirely using
our allocators, but I'm not going to do that unless someone really
really wants it.

Fixes #10120.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11314

Reviewed By: ailzhang

Differential Revision: D9671966

Pulled By: ezyang

fbshipit-source-id: fd863783d03b6a6421d6b9ae21ce2f0e44a0dcce
2018-09-06 16:11:59 -07:00
Wei Yang
425ea6b31e fix doc for functional.dropout* (#10417)
Summary:
- fixes #4177
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10417

Differential Revision: D9542876

Pulled By: weiyangfb

fbshipit-source-id: 480ed973d1fe0364f4acb5cd596c2031895b82df
2018-09-05 17:26:00 -07:00
Thomas Viehmann
267e1ec112 Accept more numpy scalars as doubles (#9659)
Summary:
Allows mulitplication of e.g. numpy.float32 with tensors.

This came up with #9468

If you want this and after the other patch is done, I'll add tests (but that would be conflicting, so I prefer to wait).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9659

Differential Revision: D8948078

Pulled By: weiyangfb

fbshipit-source-id: c7dcc57b63e2f100df837f70e1299395692f1a1b
2018-09-05 10:25:55 -07:00
Thomas Viehmann
d4060d2d0e Implement torch.tensordot (#10025)
Summary:
Fixes: #8988
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10025

Reviewed By: ezyang

Differential Revision: D9540967

Pulled By: yf225

fbshipit-source-id: 6ba2a7777162983977db884b693e6f4543b31aeb
2018-09-04 21:10:07 -07:00
Christian Puhrsch
313e89d8db Fix dimension collapsing (#11226)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/11206
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11226

Differential Revision: D9646638

Pulled By: cpuhrsch

fbshipit-source-id: 104f367f75a4478bb7580324ea3661de71b2c8b0
2018-09-04 17:27:52 -07:00
Tongzhou Wang
7e2136c2b5 remove allclose from test_doc skipped list
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11187

Differential Revision: D9628349

Pulled By: SsnL

fbshipit-source-id: 0ff94666542ca049a6d82091bd9fc79ec1699ac6
2018-09-03 09:39:56 -07:00
iotamudelta
33c7cc13ca improve docker packages, fix bugs, enable tests, enable FFT (#10893)
Summary:
* improve docker packages (install OpenBLAS to have at-compile-time LAPACK functionality w/ optimizations for both Intel and AMD CPUs)
* integrate rocFFT (i.e., enable Fourier functionality)
* fix bugs in ROCm caused by wrong warp size
* enable more test sets, skip the tests that don't work on ROCm yet
* don't disable asserts any longer in hipification
* small improvements
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10893

Differential Revision: D9615053

Pulled By: ezyang

fbshipit-source-id: 864b4d27bf089421f7dfd8065e5017f9ea2f7b3b
2018-09-02 08:54:42 -07:00
Tongzhou Wang
1350f76b62 Fix max and min with inf on CUDA (#11091)
Summary:
Fixes #10237 #11084

cc vishwakftw
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11091

Differential Revision: D9582859

Pulled By: SsnL

fbshipit-source-id: 3991c0a2af65ba82fa815b82f9e6b2107912fd10
2018-09-01 23:09:23 -07:00
Erik Brinkman
611a608517 Add ATen pdist CPU kernel (#10782)
Summary:
Also add single grad whitelist to the jit test
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10782

Reviewed By: ezyang

Differential Revision: D9583378

Pulled By: erikbrinkman

fbshipit-source-id: 069e5ae68ea7f3524dec39cf1d5fe9cd53941944
2018-08-30 11:55:27 -07:00
pbialecki
2cc98d8df7 Adds dim argument to torch.unique (#10423)
Summary:
Initial version of `unique` supporting a `dim` argument.

As discussed in [this issue](https://github.com/pytorch/pytorch/issues/9997) I added the `dim` argument to `torch.unique` with the same behavior like [numpy](https://docs.scipy.org/doc/numpy-1.14.0/reference/generated/numpy.unique.html).

Since the implementation is based on `std/thrust::unique`, the `tensor` always needs to be sorted. The `sorted` argument in `torch.unique` does not have any function, as in the CUDA version of the plain `torch.unique`.

To check the performance and equal behavior between `torch.unique` and `np.unique`, I've used [this gist](https://gist.github.com/ptrblck/ac0dc862f4e1766f0e1036c252cdb105).

Currently we achieve the following timings for an input of `x = torch.randint(2, (1000, 1000))`:
(The values are calculated by taking the average of the times for both dimension)

| Device | PyTorch (return_inverse=False) | Numpy (return_inverse=False) | PyTorch (return_inverse=True) | Numpy (return_inverse=True) |
| --- | --- | --- | --- | --- |
| CPU | ~0.007331s | ~0.022452s | ~0.011139s | ~0.044800s |
| GPU | ~0.006154s | - | ~0.105373s | - |

Many thanks to colesbury for the awesome mentoring and the valuable advices on the general implementation and performance issues!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10423

Differential Revision: D9517289

Pulled By: soumith

fbshipit-source-id: a4754f805223589c2847c98b8e4e39d8c3ddb7b5
2018-08-29 16:26:09 -07:00
Tongzhou Wang
e9eed8edb4 Add doc for Tensor.digamma_? (#11008)
Summary:
follow up for #10967

zou3519 vishwakftw
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11008

Differential Revision: D9559889

Pulled By: SsnL

fbshipit-source-id: a05d8fbad92a54bcdb93de6e62a7f94180da1d99
2018-08-29 14:11:16 -07:00
Christian Puhrsch
ec519e8a4a Reduce number of elements within test_abs
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10997

Differential Revision: D9556861

Pulled By: cpuhrsch

fbshipit-source-id: 986ef275e94fcffcc04a5c1103b8b7bfb4ae3ba5
2018-08-29 12:55:54 -07:00
Ailing Zhang
a9469c9c8a Fill eigenvector with zeros if not required (#10645)
Summary:
Fix #10345, which only happens in CUDA case.

* Instead of returning some random buffer, we fill it with zeros.

* update torch.symeig doc.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10645

Reviewed By: soumith

Differential Revision: D9395762

Pulled By: ailzhang

fbshipit-source-id: 0f3ed9bb6a919a9c1a4b8eb45188f65a68bfa9ba
2018-08-29 10:55:22 -07:00
Wei Yang
f1df85d799 bug-fix in normal_( ) (#10846)
Summary:
- fixes #10642
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10846

Differential Revision: D9495014

Pulled By: weiyangfb

fbshipit-source-id: 35a9fc349f9f0c21a24141f29c62853ab6a68dae
2018-08-24 11:26:18 -07:00
Will Feng
b14f2e899c Preserve sparse tensor shape and dim invariants, and add scalar tensor support (#9279)
Summary:
When 0-sized dimension support is added, we expect an empty sparse tensor to be a 1-dimensional tensor of size `[0]`, with `sparseDims == 1` and `denseDims == 0`. Also, we expect the following invariants to be preserved at all times:

```
_sparseDims + _denseDims = len(shape)
_indices.shape: dimensionality: 2,  shape: (_sparseDims, nnz)
_values.shape:  dimensionality: 1 + _denseDims.  shape: (nnz, shape[_sparseDims:])
```

This PR fixes various places where the invariants are not strictly enforced when 0-sized dimension support is enabled.

Tested and `test_sparse.py` passes locally on both CPU and CUDA with the `USE_TH_SIZE_ZERO_DIM` flag.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9279

Differential Revision: D8936683

Pulled By: yf225

fbshipit-source-id: 12f5cd7f52233d3b26af6edc20b4cdee045bcb5e
2018-08-23 10:10:24 -07:00
Vishwak Srinivasan
5fb9b31ed5 Add matrix_rank (#10338)
Summary:
- Similar functionality as NumPy
- Added doc string
- Added tests

Differential Revision: D9240850

Pulled By: SsnL

fbshipit-source-id: 1d04cfadb076e99e03bdf699bc41b8fac06831bf
2018-08-22 09:58:38 -07:00
Vishwak Srinivasan
8013dac43d Fix bincount for empty input (#9757)
Summary:
Added tests too. Fixes #9756 .
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9757

Reviewed By: Yangqing

Differential Revision: D9348485

Pulled By: soumith

fbshipit-source-id: e13afadf8dbea20ee6ee595383c522dcbaf8796a
2018-08-15 20:55:59 -07:00
Thomas Viehmann
151e7de893 varargs for einsum (#10067)
Summary:
Implemented via a wrapper, thank you Richard for the suggestion!

Fixes: #9929
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10067

Differential Revision: D9083388

Pulled By: soumith

fbshipit-source-id: 9ab21cd35278b01962e11d3e70781829bf4a36da
2018-08-15 15:13:25 -07:00
Tongzhou Wang
d043f83019 Add tests for Tensor.* nn.* F.* docs (#10311)
Summary:
Test only for existence for now. I had to skip a lot of them so there a FIXME in the test.

Also I'm not testing torch.* because of namespace issue.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10311

Differential Revision: D9196341

Pulled By: SsnL

fbshipit-source-id: 9c2ca1ffe660bc1cc664474993f8a21198525ccc
2018-08-14 11:39:46 -07:00
Vishwak Srinivasan
7d16e87f14 Fix byte ordering issue in from_numpy (#9508)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/3671 .
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9508

Differential Revision: D9307186

Pulled By: soumith

fbshipit-source-id: 39dcaa6fd2d330d7085802acd6f63c19270164fa
2018-08-13 21:39:16 -07:00
pbialecki
c6fc3ab557 fixes printing non-contiguous tensors
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10405

Differential Revision: D9302794

Pulled By: soumith

fbshipit-source-id: e4a7db8d33400a5a050d05fd1679de8bc3cbcf30
2018-08-13 16:26:20 -07:00
iotamudelta
75651d5b58 improve use of ROCm libraries, enable more tests, small fixes (#10406)
Summary:
* some small leftovers from the last PR review
* enable more unit test sets for CI
* replace use of hcRNG w/ rocRAND (docker image was already updated w/ newer rocRAND)
* use rocBLAS instead of hipBLAS to allow convergence w/ Caffe2
* use strided_batched gemm interface also from the batched internal interface
* re-enable Dropout.cu as we now have philox w/ rocRAND
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10406

Reviewed By: Jorghi12

Differential Revision: D9277093

Pulled By: ezyang

fbshipit-source-id: 7ef2f6fe4ead77e501ed7aea5c3743afe2466ca2
2018-08-13 11:39:43 -07:00
Christian Puhrsch
0b8a0125ab Fixes torch.log after torch.expand giving incorrect results (#10269)
Summary:
fixes #10241
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10269

Differential Revision: D9272472

Pulled By: cpuhrsch

fbshipit-source-id: cd1afbb4386a0d0956ee21b24f0d529755b986ca
2018-08-10 13:39:38 -07:00
Gregory Chanan
209af45614 Back out "[pytorch][PR] Fix bincount for empty input"
Summary: Original commit changeset: 6c4c66c23679

Reviewed By: SsnL

Differential Revision: D9253403

fbshipit-source-id: bf5ee669ed095c06ff58a2871f7350e879261076
2018-08-09 14:25:33 -07:00
Vishwak Srinivasan
b43beec070 Fix bincount for empty input (#9757)
Summary:
Added tests too. Fixes #9756 .
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9757

Differential Revision: D8966879

Pulled By: soumith

fbshipit-source-id: 9f08a9d5d5d037db16319141d7a227a5efa23869
2018-08-09 12:40:45 -07:00
Thomas Viehmann
6e49f933ad Check that result is on CPU for CPU unary ops kernels (#10358)
Summary:
Fixes: #10270
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10358

Differential Revision: D9233066

Pulled By: soumith

fbshipit-source-id: 39b7524fe55ddb899fb27e2c0ef504ce54dbad35
2018-08-08 21:11:53 -07:00
Roy Li
fe68879832 Fix dir(torch) for python 3.7 (#10271)
Summary:
fixes #10160.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10271

Differential Revision: D9188031

Pulled By: li-roy

fbshipit-source-id: a3620553a8ba2b7391acdf78dbe58afcdb6c5f7f
2018-08-07 09:57:51 -07:00
iotamudelta
a38b572de3 enable unit tests and other changes (#10266)
Summary:
This PR for the ROCm target does the following:
* enable some unit tests on ROCm
* fix a missing static_cast that breaks BatchNorm call on ROCm
* fix BatchNorm to work on ROCm w/ ROCm warp sizes etc
* improve the pyhipify script by introducing kernel scope to some transpilations and other improvements
* fix a linking issue on ROCm
* for more unit test sets: mark currently broken tests broken (to be fixed)
* enable THINLTO (phase one) to parallelize linking
* address the first failing of the elementwise kernel by removing non-working ROCm specialization
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10266

Differential Revision: D9184178

Pulled By: ezyang

fbshipit-source-id: 03bcd1fe4ca4dd3241f09634dbd42b6a4c350297
2018-08-06 14:54:01 -07:00
Owen Anderson
7a377b9a53 Add torch.argsort mirroring similar functionality in numpy. (#9600)
Summary:
Per issue #9542
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9600

Differential Revision: D8952338

Pulled By: resistor

fbshipit-source-id: c3f69d62858ad9458ec5ae563e3ff24b1c9283a7
2018-08-03 11:45:47 -07:00
Richard Zou
6b338c8026 Implement torch.broadcast_tensors (#10075)
Summary:
This exposes expand_outplace to python. Fixes #8076. Fixes #10041.

I didn't name it torch.broadcast because numpy.broadcast does something
slightly different (it returns an object with the correct shape
information).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10075

Differential Revision: D9125816

Pulled By: zou3519

fbshipit-source-id: ebe17c8bb54a73ec84b8f76ce14aff3e9c56f4d1
2018-08-01 19:18:34 -07:00
Wei Yang
6f6a1f2d63 fix test_load_error_msg failure (Network is unreachable) (#10021)
Summary:
- fixes [some failure]
- removed use of urlopen in test_load_error_msg]

cc soumith
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10021

Differential Revision: D9068108

Pulled By: weiyangfb

fbshipit-source-id: a9484d4a913508d54731b6a1eef3cddff66604f2
2018-08-01 00:24:01 -07:00
Gregory Chanan
34c7c56c73 Re-enable empty n-dimensional empty tensor and fix parallel CPU on empty tensors (#10077)
Summary:
This is a combination of https://github.com/pytorch/pytorch/pull/9947 (this was reverted) and https://github.com/pytorch/pytorch/pull/10076.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10077

Differential Revision: D9087491

Pulled By: gchanan

fbshipit-source-id: 9fe9905628000f2ff3e47df32533cd7d1f25a354
2018-07-31 16:43:45 -07:00
Gregory Chanan
6fb9acfc16 Revert empty n-dim and ATen in C2 integration builds
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10064

Differential Revision: D9082082

Pulled By: gchanan

fbshipit-source-id: ae49470f5b4c89b13beb55fd825de1ba05b6a4fa
2018-07-31 07:25:56 -07:00
Thomas Viehmann
6c7fb1582f Introduce __array_priority__ on torch.Tensor (#9651)
Summary:
This causes numpy to yield to the torch functions,
e.g. instead of numpy array/scalar __mul__ converting the tensor to
an array, it will now arrange for the Tensor __rmul__ to be called.

Fixes case 2 of #9468
I also makes case 3 and 4 equivalent but does not fix them.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9651

Differential Revision: D8948079

Pulled By: ezyang

fbshipit-source-id: bd42c04e96783da0bd340f37f4ac3559e9bbf8db
2018-07-30 14:39:43 -07:00
vishwakftw
ea3c36b822 NumPy Scalar to PyTorch Scalar (#9225)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/4985 .
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9225

Differential Revision: D8769317

Pulled By: ezyang

fbshipit-source-id: eeaeaf0749c9dc9e372634da68b4bd23e6e3ad28
2018-07-30 14:39:40 -07:00
Thomas Viehmann
faa96c1c47 Deal with spaces in einsum equation string (#9994)
Summary:
Fixes #9930
Thank you, vadimkantorov for the report.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9994

Differential Revision: D9042876

Pulled By: ezyang

fbshipit-source-id: 3bbd1aaaf1b432be40a7652b6a746d80934a216b
2018-07-30 12:57:56 -07:00
Gregory Chanan
ce5f0d40b6 Enable n-dimensional empty tensors. (#9947)
Summary:
These could use some autograd tests, which are coming in a later PR, but using them in autograd is probably pretty rare.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9947

Reviewed By: ezyang

Differential Revision: D9032778

Pulled By: gchanan

fbshipit-source-id: fa5a6509d3bac31ea4fae25143e82de62daabfbd
2018-07-30 12:33:17 -07:00
Sam Gross
829d763c69 Implement add, sub, mul, div using TensorIterator (#8919)
Summary:
```
This adds TensorIterator, a helper class for computing element-wise
operations that's intended to replace the CPU and CUDA apply utils
functions.

CPU kernels are implemented as functions that operate on strided 1-d
tensors compared to CPUApplyUtils which operated individual elements. This
allows the kernels to handle vectorization, while TensorIterator handles
parallelization and non-coalesced dimensions.

GPU kernels continue to operate on elements, but the number of
specializations is reduced. The contiguous case remains the same. The
non-contiguous case uses a single (reduced) shape for all operands and
the fast integer division from THCIntegerDivider. To avoid extra
specializations for indexing with 64-bits, large operations are split
into smaller operations that can be indexed with 32-bits.

Major semantic changes:

 - No more s_add, s_mul, s_div, or s_sub. Broadcasting is handled by
   TensorIterator. The autograd engine performs the reduction assuming
   standard broadcasting if the gradient shape does not match the
   expected shape. Functions that do not use standard broadcasting rules
   should either continue to trace the expand calls or handle the
   reduction in their derivative formula.

 - Use ONNX v7, which supports broadcasting ops.

Performance impact:

 - Small increased fixed overhead (~0.5 us)
 - Larger overhead for wrapped numbers (~2.5 us)
 - No significant change for ops on contiguous tensors
 - Much faster worst-case performance for non-contiguous GPU tensors
 - Faster CPU bias addition (~2x)
 - Faster GPU bias addition (~30% faster)

Future work:

 - Decrease overhead, especially for wrapping numbers in Tensors
 - Handle general inter-type operations
 - Extend to unary ops and reductions
 - Use buffering for compute-bound operations on non-contiguous tensors
   (pull in from CPUApplyUtils)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/8919

Differential Revision: D8677600

Pulled By: colesbury

fbshipit-source-id: 61bc9cc2a36931dfd00eb7153501003fe0584afd
2018-07-27 14:43:24 -07:00
Gregory Chanan
c0bacc6284 Guard test_lapack_empty with has_magma. (#9936)
Summary:
CUDA lapack functions generally don't work unless has_magma is true.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9936

Differential Revision: D9028579

Pulled By: gchanan

fbshipit-source-id: 9b77e3b05253fd49bcabf604d0924ffa0e116055
2018-07-27 10:09:00 -07:00
Wei Yang
302adb7cc8 added torch.rot90() to ATen (#8628)
Summary:
1. fixes #6271
2. implemented torch.rot90() following [numpy.rot90()](6a58e25703/numpy/lib/function_base.py (L54-L138))
Pull Request resolved: https://github.com/pytorch/pytorch/pull/8628

Reviewed By: ezyang

Differential Revision: D8987860

Pulled By: weiyangfb

fbshipit-source-id: 8dac3b2a1f6d3288672977aba8b547706ce97fe9
2018-07-25 15:11:44 -07:00
Gregory Chanan
be163f50a3 Avoid divide-by-zero when bartlett_window size is 0.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9788

Differential Revision: D8980951

Pulled By: gchanan

fbshipit-source-id: 429b341ac687afe4f1429bb141ef070bf315519c
2018-07-25 10:40:39 -07:00
bhushan
ea67a2bd11 Allows negative index to tensor.narrow (Fixes: #9546)
Summary:
Fixes #9546
Test cases added

Reviewed By: ezyang

Differential Revision: D8974842

Pulled By: zou3519

fbshipit-source-id: a7707406c2a21e8e14f9c2a8ad4d64c8b08156df
2018-07-25 09:25:45 -07:00
Edward Yang
0262fd0f91 Delete Tensor::typeString() (#9764)
Summary:
The primary use-site of typeString was checked_cast_tensor.
I did a little more than I needed in this patch, to set
the stage for actually deleting the tensor type.

Specifically, I modified checked_cast_tensor to explicitly
take Backend and ScalarType, the idea being that once we
remove the tensor subclasses, we will delete the T template
parameter.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9764

Differential Revision: D8969196

Pulled By: ezyang

fbshipit-source-id: 9de92b974b2c28f12ddad13429917515810f24c6
2018-07-24 22:26:15 -07:00
Thomas Viehmann
7050d83dd7 Make logsumexp_out inplace (#9755)
Summary:
Fixes: #9754

Maybe this could also make its way into 0.4.1, it  is a severe debugging headache if you hit this...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9755

Reviewed By: ezyang

Differential Revision: D8967178

Pulled By: zou3519

fbshipit-source-id: 151ed24e3a15a0c67014e411ac808fb893929a42
2018-07-24 12:40:48 -07:00
Vishwak Srinivasan
360c1bbd5b Add multivariate log-gamma (mvlgamma) (#9451)
Summary:
1. Add tests in test_cuda, test_torch
2. Add doc strings

Closes https://github.com/pytorch/pytorch/issues/9378 .

Differential Revision: D8859746

Pulled By: ezyang

fbshipit-source-id: 939c309d90940a7aa08f53004c9e7b3b1c9cf54e
2018-07-24 12:10:10 -07:00
Gregory Chanan
6ab5e697b9 Small fixups for enabling zero size dims. (#9724)
Summary:
1) Properly test cpu for alpha/beta addmm cases.
2) Unsqueeze on empty no longer throws an exception.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9724

Reviewed By: ezyang

Differential Revision: D8958513

Pulled By: gchanan

fbshipit-source-id: 6ce2ec4a47201f9b225b8c52354144ace43e9e09
2018-07-24 11:11:39 -07:00
Gregory Chanan
9d6521c3a0 Support n-dimensional empty tensors in CUDA non-reduction dimension f… (#9658)
Summary:
…unctions.

This also unifies the error checkign between scatter/scatterAdd on CUDA.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9658

Differential Revision: D8941527

Pulled By: gchanan

fbshipit-source-id: 750bbac568f607985088211887c4167b67be11ea
2018-07-23 08:40:12 -07:00
Gregory Chanan
3efdece9da Support n-dimensional empty tensors in take/put.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9635

Differential Revision: D8935119

Pulled By: gchanan

fbshipit-source-id: 5035583e7322b1a1720d961945dd0eefb4cb28ef
2018-07-20 15:40:49 -07:00
Gregory Chanan
bae156a481 Support (some) CUDA Lapack on n-dimensional empty tensors.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9631

Reviewed By: ezyang

Differential Revision: D8933202

Pulled By: gchanan

fbshipit-source-id: 1ade4ca439bf26aa921df1da83a827d860f8f48f
2018-07-20 11:40:25 -07:00
Gregory Chanan
f180373d68 Support n-dimensional empty tensors in CUDA BLAS and fix a btrifact bug. (#9573)
Summary:
This is mainly straightforward, with two exceptions:
1) cublasSgemv, cublasDgemv appear to have a bug where (x,0).mv(0) does not handle beta, whereas cublasSgemm, cublasDgemm do for case where (x,0).mm(0,y).  This is handled by manually calling zero / mul.

2) I fixed a bug in btrifact that was broken even when dealing with non-empty tensors.  Basically, if out.stride(0) was 1, because the underlying BLAS call expects column-major matrices, to get a column-major tensor, out.transpose_(0, 1) would be called.  But this is just wrong, as if the batch dimension (0) doesn't match the size of the columns (1), you don't even have a tensor of the correct shape.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9573

Reviewed By: ezyang

Differential Revision: D8906144

Pulled By: gchanan

fbshipit-source-id: de44d239a58afdd74d874db02f2022850dea9a56
2018-07-19 09:50:27 -07:00
Will Feng
e0446fcfa9 Pass dtype to tensor contructor in test_neg (#9558)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/9554.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9558

Differential Revision: D8901085

Pulled By: yf225

fbshipit-source-id: 0edb176fcb18e0c0bcfc6f209343b9097767c9b8
2018-07-19 08:54:39 -07:00
Christian Puhrsch
8769fec03f Move clamp into ATen (#9506)
Summary:
Glue component of https://github.com/pytorch/pytorch/pull/9319

Important to unblock wanchaol
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9506

Reviewed By: wanchaol

Differential Revision: D8879437

Pulled By: cpuhrsch

fbshipit-source-id: 16ea8a93f3f5df2695180b3a30a583834b7004f1
2018-07-18 13:40:11 -07:00
Tongzhou Wang
27455e9c78 Use _six for inf and nan (#9500)
Summary:
Things like `float('inf')` are actually quite expensive.
```py
In [1]: import math

In [2]: %timeit -n 200 math.inf
49.3 ns ± 1.42 ns per loop (mean ± std. dev. of 7 runs, 200 loops each)

In [3]: %timeit -n 200 float('inf')
194 ns ± 39.1 ns per loop (mean ± std. dev. of 7 runs, 200 loops each)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9500

Reviewed By: soumith

Differential Revision: D8876229

Pulled By: SsnL

fbshipit-source-id: 78602b76bb53d5588910b58270930c0bd413d2d7
2018-07-18 10:40:29 -07:00
Gregory Chanan
f277645968 Support N-dimensional empty tensors in CPU BLAS and (a selection of) … (#9522)
Summary:
…CPU LAPACK routines.

Note that the LAPACK functions in general require a different approach, because direct calls with size zero dims do not work.
Here I just selected a reasonable subset of LAPACK routines to support.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9522

Reviewed By: ezyang

Differential Revision: D8888180

Pulled By: gchanan

fbshipit-source-id: 16b9013937806d375d83d1c406815765fda00602
2018-07-18 08:25:21 -07:00
bhushan23
5eaed750c2 Implementing torch.isfinite (#9487)
Summary:
fixes #9132
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9487

Reviewed By: soumith

Differential Revision: D8875529

Pulled By: SsnL

fbshipit-source-id: d1b8aa825d202cfbdca27897da6a8bc1b714f856
2018-07-18 08:25:20 -07:00
vishwakftw
1c3580b6fe Added hash for device (#9246)
Summary:
If this is good, I could write some tests to ensure collision doesn't occur within a given range.

Closes #7228
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9246

Differential Revision: D8872608

Pulled By: ezyang

fbshipit-source-id: 0ed29a73188f4167b42756f59a5c9a3d5cb37326
2018-07-17 17:10:17 -07:00
Gregory Chanan
890037eaaf Fix (non-reduction) ops over a dimension for n-dimensional empty tens… (#9482)
Summary:
…ors (CPU).

This includes (mainly) CPU fixes; CUDA fixes are a little more involved because you can't use an empty grid.
This also includes a fix for index_copy, which checked that self.size(dim) == src.size(0), which isn't correct (the same dimension should be compared).
Finally, also includes a fix for CUDA flip (although it's not tested yet), to get the stride using multiplication rather than division to avoid divide-by-0.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9482

Reviewed By: ezyang

Differential Revision: D8873047

Pulled By: gchanan

fbshipit-source-id: 86523afd3d50277834f654cd559dfbc7875cdffe
2018-07-17 13:11:04 -07:00
Tongzhou Wang
050a2588b5 change stft to have consistent signature with librosa (#9497)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9497

Fixes #7883 by using `rfft`.

It's worth noting that this is BC breaking. And it's impossible to detect the change because the two signatures before and after this change supports a common subset of calling patterns, e.g., `stft(Tensor, int, int)`. (some other calling patterns will raise error).

soumith and I plan to change the current `stft` interface because it is a bit messy and non-standard. rafaelvalle suggested us that `librosa` is a good reference API to align with. After discussing with soumith and ezyang , and given that `stft` is only out for 1 release, I decide to go with directly changing the signature. Also, my understanding is that most researchers in this field will welcome this change as `librosa` seems to be the golden-standard here. (it doesn't yet support all `pad_mode` but those will become available if added to `F.pad`.)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9308

Reviewed By: ezyang

Differential Revision: D8806148

Pulled By: SsnL

fbshipit-source-id: f6e8777d0c34d4a4d7024e638dc9c63242e8bb58
2018-07-17 10:55:43 -07:00
vishwakftw
52cc073212 Implement reshape_as (#9452)
Summary:
1. Added tests
2. Added doc string
3. Remove view_as redundant definition from tensor.py

Closes #9416
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9452

Differential Revision: D8851794

Pulled By: ezyang

fbshipit-source-id: 0aa0430dd0a174e1a5caddbc50a7e2c9eb7802bc
2018-07-17 08:54:42 -07:00
Edward Yang
976f9253a5 Eliminate storage views. (#9466)
Summary:
Storage views were previously used to implement CUDA IPC sharing,
but they weren't necessary.  The new strategy is described in
Note [CUDA IPC and the caching allocator].

This also fixes an unrelated bug, where we weren't actually using
the Tensor forking pickler, because we didn't register a pickler
for torch.Tensor.

Fixes #9447.  Fixes #46.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

CC apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9466

Reviewed By: apaszke

Differential Revision: D8859698

Pulled By: ezyang

fbshipit-source-id: 3362cb92f6ae4aa37084c57d79b31004bd0b4a97
2018-07-16 15:40:24 -07:00
Will Feng
52abcdd0dc Fix out-of-range error for test_neg (#9431)
Summary:
`test_neg` sometimes fails internally because `random_()` can generate an out-of-range value for CharTensor. This PR fixes it.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9431

Reviewed By: SsnL

Differential Revision: D8843284

Pulled By: yf225

fbshipit-source-id: bf516cceb8f780e133fa54f7364c77821eb7c013
2018-07-16 10:26:54 -07:00
bhushan
5eb9d40cc6 Introducing IsInf (#9169)
Summary:
torch.isinf - checks element wise +/- inf implements #9132
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9169

Reviewed By: SsnL

Differential Revision: D8768614

Pulled By: zou3519

fbshipit-source-id: dd1b5f6c976deb421d626e22cdd25500ec04d796
2018-07-15 07:55:09 -07:00
Thomas Viehmann
8444e1660b Only accept continguous tensors in TopK for cuda (#9441)
Summary:
Fixes: #9421

I don't think it is easy to deal with non-contiguous array in cuda topk, so I'm adding a check.
The argument number is a bit confusing when it shows in PyTorch but it is consistent with the other checks. (Not sure whether it would make sense to eliminate argument numbers from the error TH/THC error messages given that they're probably off more than once...)

Do we need a test that it indeed refuses non-contiguous?
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9441

Reviewed By: soumith

Differential Revision: D8850719

Pulled By: ezyang

fbshipit-source-id: d50561bb37ed50ab97aeaf54d8e3fc6c765bdc7c
2018-07-14 12:24:52 -07:00
Gregory Chanan
f09828ee0e Support n-dimensional empty tensors in TensorShape methods. (#9362)
Summary:
This includes either bug fixes or NumPy semantics changes for the following methods:
chunk, diagonal, unfold, repeat, flatten, reshape, split, unsqueeze.

The n-dimensional empty tensor feature is still hidden behind a feature flag.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9362

Reviewed By: ezyang

Differential Revision: D8817002

Pulled By: gchanan

fbshipit-source-id: 6ff704ec96375f00b4dd39ebcd976efac0607fb4
2018-07-13 08:25:40 -07:00
Vishwak Srinivasan
cd3e067e46 Add reversed(torch.Tensor) (#9216)
Summary:
Closes https://github.com/pytorch/pytorch/issues/3376
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9216

Differential Revision: D8753933

Pulled By: soumith

fbshipit-source-id: 5dac9b8b11ff34a205b6478db99b02fda8bd9cce
2018-07-12 19:42:07 -07:00
Alican Bozkurt
d017e1798f add erfc
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9366

Differential Revision: D8816768

Pulled By: soumith

fbshipit-source-id: 7d709f932cf156a2e7ec71c710837beb7f647d66
2018-07-12 08:32:02 -07:00
Gregory Chanan
f92edf7ef4 N-dimensional empty tensors: indexing, factories, reductions. (#9209)
Summary:
This PR implements and tests N-dimensional empty tensors for indexing, factories, and reductions if compiled with -DUSE_TH_SIZE_ZERO_DIM.

Still remaining to add:
1) TensorShape functions
2) Simple linear algebra functions (matrix multiply variants)
3) Other functions that operate over a dimension (but don't reduce).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9209

Reviewed By: ezyang

Differential Revision: D8751257

Pulled By: gchanan

fbshipit-source-id: 2113374dc7af6caf31a99bf67b3893f130a29e23
2018-07-09 19:40:01 -07:00
richard
f48e15624e Unique cuda support (#8899)
Summary:
Add cuda support for unique.

There is a simple test below for a tensor including 1M <int> data.
And the performance is faster.

```python
Performance
cpu: 0.05040597915649414 s
x: tensor([1, 3, 1,  ..., 4, 9, 4])
x output: tensor([1, 2, 3, 4, 5, 6, 7, 8, 9])
x inverse: tensor([0, 2, 0,  ..., 3, 8, 3])

gpu: 0.015192985534667969 s
y: tensor([1, 3, 1,  ..., 4, 9, 4], device='cuda:0')
y output: tensor([1, 2, 3, 4, 5, 6, 7, 8, 9], device='cuda:0')
y inverse: tensor([0, 2, 0,  ..., 3, 8, 3], device='cuda:0')
```

```python
Code
import torch
import time
x=torch.randint(1,10,(1000000,),dtype=torch.long)
device = torch.device("cuda")
y=x.to(device)
start = time.time();
output,inverse = x.unique(sorted=True,return_inverse=True)
stop = time.time();
print('cpu:',stop-start,'s')
print('x:',x)
print('x output:',output)
print('x inverse:',inverse)

start = time.time();
output1,inverse1 = y.unique(sorted=True,return_inverse=True)
torch.cuda.synchronize();
stop = time.time();
print('gpu:',stop-start,'s')
print('y:',y)
print('y output:',output1)
print('y inverse:',inverse1)
```
Closes https://github.com/pytorch/pytorch/pull/8899

Reviewed By: SsnL

Differential Revision: D8677655

Pulled By: ezyang

fbshipit-source-id: 09df3f0602f235c5d36c7a6e7e1d89dbf82570bb
2018-07-08 17:09:26 -07:00
Xiang Gao
a615baa51f move unbind to ATen
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/8587

Differential Revision: D8764086

Pulled By: soumith

fbshipit-source-id: 7f311cf13c341040e1f2cf4a8f05723e32d38947
2018-07-08 16:46:35 -07:00
Tongzhou Wang
a769fae91d Fix TestAutograd.test_pinverse not actually testing (#9192)
Summary:
cc vishwakftw

Also added a check if none of the input tensors in `gradcheck` have `requires_grad=True`.
Closes https://github.com/pytorch/pytorch/pull/9192

Differential Revision: D8739401

Pulled By: SsnL

fbshipit-source-id: 81bb3aa0b5c04eb209b137a4bd978e040e76cbcd
2018-07-05 18:55:00 -07:00
Gao, Xiang
213540cd85 Add meshgrid to PyTorch (#8581)
Summary:
Part of this issue https://github.com/pytorch/pytorch/issues/7580
Closes https://github.com/pytorch/pytorch/pull/8581

Differential Revision: D8661660

Pulled By: soumith

fbshipit-source-id: 4a72fb5152ed6eb4d57f14de691bf09a2a2e5b0c
2018-07-05 11:25:27 -07:00
Vishwak Srinivasan
14cbd9adb8 Implement torch.pinverse : Pseudo-inverse (#9052)
Summary:
1. Used SVD to compute.
2. Tests in test_autograd, test_cuda and test_torch
3. Doc strings in _torch_docs.py and _tensor_docs.py

Closes #6187
Closes https://github.com/pytorch/pytorch/pull/9052

Reviewed By: soumith

Differential Revision: D8714628

Pulled By: SsnL

fbshipit-source-id: 7e006c9d138b9f49e703bd0ffdabe6253be78dd9
2018-07-05 09:11:24 -07:00
vishwakftw
08daed40f7 Fix bug in flip() (#9156)
Summary:
Closes #9147
Added a test to prevent regression in test_torch
Added entries in docs

cc ezyang weiyangfb
Closes https://github.com/pytorch/pytorch/pull/9156

Differential Revision: D8732095

Pulled By: soumith

fbshipit-source-id: 7a6892853cfc0ccb0142b4fd25015818849adf61
2018-07-04 07:24:01 -07:00
Will Feng
90fd4df695 Add flag for disabling tests with multiprocessing spawn start method (#9061)
Summary:
This will resolve some of the timeout issues in CPU and GPU tests internally.
Closes https://github.com/pytorch/pytorch/pull/9061

Reviewed By: ezyang

Differential Revision: D8707471

Pulled By: yf225

fbshipit-source-id: 9dc82a2c9da0c540ae015442f74b9b2b1a67a246
2018-06-30 14:39:11 -07:00
Will Feng
15a75208ee Use std::random_device for generating storage handle (#8971)
Summary:
Currently the `test_RNG_after_pickle` in the PR would fail because pickling a tensor changes the RNG state. This PR aims to fix it.
Closes https://github.com/pytorch/pytorch/pull/8971

Reviewed By: ezyang

Differential Revision: D8677474

Pulled By: yf225

fbshipit-source-id: 1713d9611699ad288b66d92dbb29ce9feb34b8cf
2018-06-28 15:10:27 -07:00
Orion Reblitz-Richardson
edb88b5f3a
Update from Facebook (#8887)
* add opencl + fpga context

adds an opencl context inside caffe2/fb which can be used for fpga access

* [Caffe2] Force tensor inference checks to be triggered during testing

We've started to rely on TensorInference functions more for different analysis.  This diff ensures that the TensorInference function's result matches what is expected from the definition of the operator.

* Enable building //caffe2:torch with @mode/opt

In @mode/opt, python runs out of a PAR, which breaks a lot of
assumptions in the code about where templates/ folders live relative
to __file__. Rather than introduce hacks with parutil, I simply turn
template_path into a parameter for all the relevant functions and
thread it through from the top level.

* [Caffe2] Fix cost models for DotProduct and Div.  Update Tensor Inference for dot product

As title.  DotProduct states that output is a 1-D tensor (https://caffe2.ai/docs/operators-catalogue.html#dotproduct) though code suggests it is either 0- or 1-D depending on inputs.  TensorInference defined to support implementation.

* [SG-MoE] Add an option to make the experts NOT as components

* [nomnigraph] Rename and fixup convertToNeuralNetOperator API

This will make things a bit cleaner

* no longer symlink THNN.h and THCUNN.h

* forced decoder network (onnx export)

Closes https://github.com/pytorch/translate/pull/95

Add networks in ensemble_export.py to create a forced decoding network from PyTorch NMT checkpoints. This network takes an arbitrary numberized (source, target) pair and returns the model score for the translation, including penalties.

Vocabulary reduction networks are also supported, but note that target indices which are not in the possible_translation_tokens generated for the source input will be trea

* Revert schema change to fix production models

Revert schema change to fix production models

* MockLogDeviceReader - rebase on FIX

# Goal

1), Build a make_mock_log_device_reader using make_mock_reader

2), Replace the real log_device_reader here: https://fburl.com/raihwf1p

# Log by D8151734

Real log_device_reader:
```
I0529 20:29:05.373108 954994 tensor.h:839] Tensor print_net/log of type std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >. Dims: (): read_net/ParseOpenTrainingRow:0
I0529 20:29:05.373244 954994 tensor.h:839] Tensor read_net/ParseOpenTrainin

* [C2/D2][1/n]: Nonnegative-Constrained Optimization -- log barrier

implement log barrier as a regularization method

* Add teacher weight screening.

Add teacher weight sceening according to teacher labels. If teacher label is zero, we do not use the distill loss in the objective function.

* Add NormalizerContext

See task for more detail. This implementation is a copy of what exists for RegularizerContext except for how the parameters are defined in the model_definition thrift file.

I'll try an alternative implementation which overrides the default arguments of functions instead like for argscopes in tensorflow.

https://github.com/pytorch/pytorch/compare/master...MaximeBoucher:update-from-facebook-0939578c068c?expand=1

* Adding cosine similarity option in dot processor

Add pairwise cosine similarity option in dot product.
Add an option to concate dot product and cosine similarity.
Add test cases.

* [nomnigraph][redo] Concat elim for sparseNN

Same as D7962948, which was reverted because Operator Schema was not
defined

* [pytorch] Revert pytorch/pytorch#7918 'Release GIL when copying to shared memory', breaks ASAN

Revert this pytorch diff that breaks ASAN when running Filament in dev mode; in opt mode it gives "bad file descriptor" errors. Looks like a race when copying tensors to shared memory in multiple mp.Queue's (which spawn separate threads).

https://github.com/pytorch/pytorch/pull/7918/files

* [nomnigraph][mobile] Enable nomnigraph by default, use -Oz on nomnigraph related code to reduce code size

enables nomnigraph and reduces codesize

* [Warmup] Allow both offline incremental training and online training

Change plan name on saving side and reading side to support both training type

This diff depends on D8128530 and D8168651.

* Revert D7802642: [Warmup] Allow both offline incremental training and online training

This reverts commit afc213cf9b36cecf75333a788391c4d09f4afccc

@bypass-lint

An infra SEV is better than not reverting this diff.
If you copy this password, see you in SEV Review!
@cause_a_sev_many_files

* Add legacy grad logic to fix div op on old graphs.

Add legacy grad logic to fix div op on old graphs.

* Correctly propagate operator failures

Propagate errors from operators that throw exceptions and return false

* Revert D8374829: [caffe2][nomnigraph][redo] Concat elim for sparseNN

This reverts commit 6dda028c463e54bb5c32188bbbe9202107e188a5

@bypass-lint

An infra SEV is better than not reverting this diff.
If you copy this password, see you in SEV Review!
@cause_a_sev_many_files

* [Caffe2] Added extra_info to core.DeviceOption(), enforced extra_info to be inherited in scope.DeviceScope

extra_info is a newly defined field in DeviceOption proto. This diff added extra_info to the core.DeviceOption().  And, In scope.DeviceScope(), this diff enforce the new scope to inherit the extra_info from old scope.

* [opt] hgdirsync wasn't enabled, merge diverged code

Here's the damage, P59732616 basically xplat was left behind but had
the change from assert to CAFFE_ENFORCE

* OMP parallelism over RoIs for RoIAlign op

Simpler to parallelize over RoIs. Shouldn't affect other uses as it relies on
the number of OMP threads set during startup.

PR: https://github.com/pytorch/pytorch/pull/8562

* Use int64_t for shape in FillOps

to avoid overflow of int32

* Implement Rotated RoIAlign op

Based on Rotated RPNs as explained in https://arxiv.org/abs/1703.01086.
The idea is simple - orientation/angle is added as an RPN
anchor parameter and then the angle is further regressed similar to bbox
coords. There are some additional changes related to NMS and IoU, but besides
that it's a direct extension to Faster-RCNN. Further details in https://fb.quip.com/sZHlA1iMfWPZ.

RoIs are represented in [center_x, center_y, width, height, angle] format.
`angle` repre

* Rotated RoIAlign op CUDA forward implementation

CUDA forward impl for D8415490

* RoIAlignRotated op CUDA backward pass implementation

TSIA

* All remaining fixes to eliminate process_github.sh

Most of this diff has already been reviewed separately, except for the parts relating to _thnn/utils.py and _utils._internal.py

remove skipIf(True, 'Fbcode') line from process_github.sh

replace sed of cpp file with #ifdef to control cudnnDestroy use

undo sync-time deletion of .gitattributes, remove process_github.sh

switch to using _utils._internal rather than try-import-except

This diff also fixes the open-source bug where rebuilds have

* Back out "Revert D7802642: [Warmup] Allow both offline incremental training and online training"

Original commit changeset: 7707d2efe60e The original diff is backout becuase the online trainer package is backed out. This code would only work with new online trainer package

* [easy] improve error log in adagrad op

as title

* re-allow use of thnn_h_path

This fixes cffi usage in OSS

* [4/4] [tum] paralyzing layerNorm for GPU full sync

as title

* add compile=False to pytorch tests, remove hack with pyc

* Add shape and type inference for RowWiseArgMax operator

See title

* Revert D8515341: Back out "Revert D7802642: [Warmup] Allow both offline incremental training and online training"

This reverts commit 78167eeef0af16b60f72c82f9dcdda9b41b4dcbd

@bypass-lint

An infra SEV is better than not reverting this diff.
If you copy this password, see you in SEV Review!
@cause_a_sev_many_files

* [fix-flaky-test] mock_hive_reader_test flaky, because GlobalCounter collects local counts intervally

# Problem

`MockHiveReader` uses `GlobalCounter` to limit `max_examples`.

GlobalCounter on server node collect local counts from worker nodes every 1 sec.

This 1 sec delay makes it impossible to limit exactly to the `max_examples`, it will definitely exceed `max_examples`.

# Plan

Given,
```
Expected num_examples = max_examples + num_examples/sec (Read Speed) x 1 sec (GlobalCounter Sync Int

* [Caffe2] Fix FCGradient cost inference.  Prevent overflow in cost inference

FCGradient missed a factor 2 in the `num_outputs == 3` case.  Overflow was occurring with flop calculation for FC.  Changed types to `uint64_t` to prevent future problems.

* Fix binary ops with empty inputs

Fix binary ops with empty inputs

* Support the filling of input blob with provided data

as title for Biz Integrity case

* Back out "Revert D8515341: Back out "Revert D7802642: [Warmup] Allow both offline incremental training and online training""

Original commit changeset: 30c55dd38816 Original diff is reverted due to introducing bad integration test. Fixed the integration test.

* [c2][easy] improve pack ops error loggings

as desc.

* Add ShapeTypeInference for LpNorm operator

As desc

* Shard test_nn to reduce runtime for each test target

Closes https://github.com/pytorch/pytorch/pull/8793

The current test_nn would time out and be disabled in GreenWarden, and we need to have an option to split it up in order to pass the stress test. Right now GreenWarden roughly allows running 100 test cases in test_nn before timing out, and here we have an option to divide test_nn into 30 shards (with ~40 tests in each shard) to allow for some test suite growth in the future.

* Change default caffe2_streams_per_gpu to 1

* Remove IN_SANDCASTLE from common.py and test_nn.py

We prefer to disable the failing tests through Sandcastle UI instead.

* Add a new class for an updated prof_dag.proto

This diff contains:
- An updated prof_dag.proto that contains blob profiles.
- A class to deserialize this information (serialization is in a follow up diff)
- Update to separate profiling information from NeuralNet (and use it as part of the class above).
- Unit tests

* Lambdarank for SparseNN

This diff adds a lambda_rank_layer for SparseNN.
 changes include
1) Adds support for multi sessions in c2 op
2) Adds support for two different loss functions in c2 op
3) Unit tests for op

* Revert D8586950: Back out "Revert D8515341: Back out "Revert D7802642: [Warmup] Allow both offline incremental training and online training""

This reverts commit 012220ed63eccc35659a57b31d16a3625da6317b

@bypass-lint

An infra SEV is better than not reverting this diff.
If you copy this password, see you in SEV Review!
@cause_a_sev_many_files

* [easy] A few fixups to multithread predictor benchmark

(1) support perf on T6 server
(2) remove dead code

* fix a bug about the map size

as title

* Fix reduce sum on in-place case.

Fix reduce sum on in-place case.

* [Warmup] Reland reverted diff Allow both offline incremental training and online training

Closes https://github.com/pytorch/pytorch/pull/8827

fix net transform integration test. Allow offline and online trainer to coexist D7802642.

* Add StoreHandlerNotAvailableException

Add an exception for a store that is not available or has been
deleted.

* Use exception handling for fault tolerance, missing KV store

Remove status blobs to communication ops so that exceptions propagate on
failure.

* [C2/D2][2/n]: Nonnegative-Constrained Optimization -- bounded grad proj

for simple bounded constrained optimization, incl non-negative box constraints.

* [GanH]: Adaptive Weighting with More Estimations

With implemented postivity optimization, we now learn adaptive weights with different
parameterizations.

This improves parameter estimation and training stability.

* Revert some changes for landing

* Remove AutoNoGIL in StorageSharing

* Temporarily disable net_tests

* Revert "[Caffe2] Force tensor inference checks to be triggered during testing"

This reverts commit 67ef05c22b2f71b4a489695384932f968384a2a4.

* Revert "Fix reduce sum on in-place case."

This reverts commit 6cb8a8e1b3db7b6d20941b0053e3f3836068eb64.

* Revert "Revert "Fix reduce sum on in-place case.""

This reverts commit 130a257c0893dc09f4bd6e6a45d112261807fd2c.
2018-06-26 14:55:48 -07:00
Peter Goldsborough
55757357b2
[C++ API] Better forward methods (#8739)
* Better forward methods in C++ API

capitalize error message in test_torch.test_flatten

Support for operator()

* Add operator() to Functional

* Get rid of SigmoidLinear

* Add BoundFunction to FunctionalImpl

* Remove macro from conv because it makes errors more nasty
2018-06-26 13:23:16 -07:00
gchanan
04440d2c57 Fix nonzero and tensor printing of n-dimensional empty tensors. (#8849) 2018-06-25 12:09:47 -04:00
cpuhrsch
46bff5d9ff Set MKL VML error mode to ignore (#8800) 2018-06-22 16:54:47 -04:00
Wei Yang
ce13ca235e added default lambd=0.5 for hardshrink (#8770)
* added default lambd=0.5 and tests

* lint
2018-06-22 09:52:55 -04:00
anderspapitto
48e90e3339 Build system changes (#8627)
* All changes needed to get rid of process_github.sh

* allow thnn_h_path
2018-06-20 17:45:26 -04:00
gchanan
b6af5d40bf
Some 0-sized dimension support, port catArray away from resizeLegacy. (#8666)
* Some 0-sized dimension support, port catArray away from resizeLegacy.

The goal of this PR is to port catArray away from resizeLegacy (so we can delete the legacy resize calls), but since catArray has some weird behavior because
we don't have arbitrary 0-sized dimension support, I made some effort to fix these both in one pass.

The major changes here are:
1) catArray uses the new resize API, no longer the old resizeLegacy API.
2) As 1) is the last usage of resizeLegacy, it is deleted.
3) If compiled with USE_TH_SIZE_ZERO_DIM, catArray will work and properly check shapes for n-dimensional empty tensors.
4) However, we retain the old behavior of "ignoring" size [0] tensors in catArray.  We previously allowed this because we didn't have n-dimensional empty tensors.
5) To get the above to work, we also add support for n-dimensional empty tensors for narrow and slice (ifdef USE_TH_SIZE_ZERO_DIM).
6) We change the stride formula for empty tensors to match NumPy; basically, we never multiply by 0 as the size, always at least 1, so the
   strides are monotonically increasing in the empty tensor case.
7) We print the size of empty tensors if size != [0]; this matches NumPy behavior (even in cases where the size could be inferred from the brackets.
8) For test purposes, we add torch._C._use_zero_size_dim() to add tests for the above.

* Fix flake8.

* Address review comments.
2018-06-20 13:26:08 -04:00
li-roy
cc6b046f48 Implement flatten function (#8578)
* Implement flatten function

* address comments

* allow start_dim=end_dim

* undo submodule change
2018-06-20 12:53:06 -04:00
li-roy
8e4fe5dcf4 Fix serialization for Parameters (#8633)
* Fix serialization for Parameters

* address comments

* addres comments
2018-06-19 22:11:13 -04:00
Peter Goldsborough
372d1d6735
Create ATen tensors via TensorOptions (#7869)
* Created TensorOptions

Storing the type in TensorOptions to solve the Variable problem

Created convenience creation functions for TensorOptions and added tests

Converted zeros to TensorOptions

Converted rand to TensorOptions

Fix codegen for TensorOptions and multiple arguments

Put TensorOptions convenience functions into torch namespace too

All factory functions except *_like support TensorOptions

Integrated with recent JIT changes

Support *_like functions

Fix in place modification

Some cleanups and fixes

Support sparse_coo_tensor

Fix bug in Type.cpp

Fix .empty calls in C++ API

Fix bug in Type.cpp

Trying to fix device placement

Make AutoGPU CPU compatible

Remove some auto_gpu.h uses

Fixing some headers

Fix some remaining CUDA/AutoGPU issues

Fix some AutoGPU uses

Fixes to dispatch_tensor_conversion

Reset version of new variables to zero

Implemented parsing device strings

Random fixes to tests

Self review cleanups

flake8

Undo changes to variable.{h,cpp} because they fail on gcc7.2

Add [cuda] tag to tensor_options_cuda.cpp

Move AutoGPU::set_index_from into .cpp file because Windows is stupid and sucks

Fix linker error in AutoGPU.cpp

Fix bad merge conflict in native_functions.yaml

Fixed caffe2/contrib/aten

Fix new window functions added to TensorFactories.cpp

* Removed torch::TensorOptions

Added code to generate wrapper functions for factory methods

Add implicit constructor from Backend to TensorOptions

Remove Var() from C++ API and use torch:: functions

Use torch:: functions more subtly in C++ API

Make AutoGPU::set_device more exception safe

Check status directly in DynamicCUDAHooksInterface

Rename AutoGPU to DeviceGuard

Removed set_requires_grad from python_variables.h and warn appropriately in Variable::set_requires_grad

remove python_default_init: self.type()

Add back original factory functions, but with deprecation warnings

Disable DeviceGuard for a couple functions in ATen

Remove print statement

Fix DeviceGuard construction from undefined tensor

Fixing CUDA device compiler issues

Moved as many methods as possible into header files

Dont generate python functions for deprecated factories

Remove merge conflict artefact

Fix tensor_options_cuda.cpp

Fix set_requires_grad not being checked

Fix tensor_new.h

TEMPORARILY put some methods in .cpp files to see if it solves issues on windows and mac

Fix bug in DeviceGuard.h

Missing includes

TEMPORARILY moving a few more methods into .cpp to see if it fixes windows

Fixing linker errors

* Fix up SummaryOps to use new factories

Undo device agnostic behavior of DeviceGuard

Use -1 instead of optional for default device index

Also move DeviceGuard methods into header

Fixes around device index after optional -> int32_t switch

Fix use of DeviceGuard in new_with_tensor_copy

Fix tensor_options.cpp

* Fix Type::copy(

* Remove test_non_float_params from ONNX tests

* Set requires_grad=False in ONNX tests that use ints

* Put layout/dtype/device on Tensor

* Post merge fixes

* Change behavior of DeviceGuard to match AutoGPU

* Fix C++ API integration tests

* Fix flip functions
2018-06-16 00:40:35 -07:00