Commit Graph

999 Commits

Author SHA1 Message Date
CamiWilliams
329757a907 Torch.flatten() returns a 1-dim tensor on a 0-dim tensor (#25406)
Summary:
PR for `torch.flatten()` to return a 1-dim tensor on a 0-dim tensor

> torch.tensor(123).shape -> torch.Size([])
> torch.tensor(123).flatten() -> torch.tensor([123])
> torch.tensor(123).flatten().shape -> torch.Size([1])

resolve https://github.com/pytorch/pytorch/issues/22963
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25406

Differential Revision: D17120464

Pulled By: CamiWilliams

fbshipit-source-id: efbecd61f0aefd82f2ab417ca6bb467488ff99de
2019-08-30 08:53:08 -07:00
Brian Vaughan
f0c6021846 fix bug in assertNotEqual for int tensors (#25412)
Summary:
re-apply: https://github.com/pytorch/pytorch/pull/25199
but without a failing quantized test
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25412

Differential Revision: D17131303

Pulled By: nairbv

fbshipit-source-id: edf7736af3ede5e809eded72be9514e922e70db4
2019-08-30 06:52:30 -07:00
Jerry Zhang
e231bd16fb Revert D17112656: [pytorch][PR] fix bug in assertNotEqual for int tensors
Test Plan: revert-hammer

Differential Revision:
D17112656

Original commit changeset: 43e0e7da6d58

fbshipit-source-id: 0a0f7b8b125f24a45023ddb46fe144f21499b723
2019-08-29 10:36:56 -07:00
Hong Xu
2e1c37c95c Move the CUDA implementation of ceil to ATen. (#24866)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24866

Fix #24542

Test Plan: Imported from OSS

Differential Revision: D16965903

Pulled By: VitalyFedyunin

fbshipit-source-id: b9decaa58bec813a23d369b5e1eec627599f41da
2019-08-29 08:48:31 -07:00
Brian Vaughan
1e2b19db6d fix bug in assertNotEqual for int tensors (#25199)
Summary:
assertNotEqual was failing to detect differences in int tensors.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25199

Differential Revision: D17112656

Pulled By: nairbv

fbshipit-source-id: 43e0e7da6d58eb1c837a508d462a748b2065bdd9
2019-08-29 07:32:50 -07:00
Gregory Chanan
f362a5a04b Revert "Let logical_xor support non-bool tensors." (#25269)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25269

This reverts commit 5ca612b55e.

Test Plan: Imported from OSS

Differential Revision: D17080088

fbshipit-source-id: e6b6215b713910c448e9a6b831b08f28b849c64a
2019-08-28 15:41:51 -07:00
SsnL
6100de9b1b implement bool_tensor.bernoulli_ (#25076)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/25072
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25076

Differential Revision: D17073453

Pulled By: ezyang

fbshipit-source-id: 42410da8c9911c1d7b3543bde740c7e66ae0cc1c
2019-08-28 12:25:27 -07:00
Igor Fedan
afb7a162fb Migrate erfinv and erfinv_ from the TH to Aten (CUDA) (#24943)
Summary:
https://github.com/pytorch/pytorch/issues/24560
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24943

Differential Revision: D16996434

Pulled By: ifedan

fbshipit-source-id: 77111a4e47bb2b20f65225d48e7213cd77ddae19
2019-08-28 09:30:08 -07:00
Pavel Belevich
112f249446 Port pow operator from the TH code to Aten (#23492)
Summary:
Fixing https://github.com/pytorch/pytorch/issues/24750
```
DEBUG = 0
OMP_NUM_THREADS = 1

import torch

base = torch.randn(1000000)
exp  = torch.randn(1000000)
out  = torch.empty_like(base)

timeit base.pow(0)							+30x
old 6.26 ms ± 35.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
new 213 µs ± 3.38 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

timeit base.pow(1/3)						+6x
old 56 ms ± 911 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
new 9.41 ms ± 237 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

timeit base.pow(-1/3)						+6x
old 57 ms ± 1.65 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
new 9.49 ms ± 293 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

timeit base.pow(1/2)						+6x
old 4.04 ms ± 14.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
new 620 µs ± 3.35 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

timeit base.pow(-1/2)						+5x
old 6.56 ms ± 43 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
new 1.24 ms ± 19.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

timeit base.pow(1)							no diff
old 322 µs ± 4.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
new 331 µs ± 7.26 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

timeit base.pow(-1)							+3.5x
old 2.48 ms ± 15.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
new 717 µs ± 130 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

timeit base.pow(2)							no diff
old 328 µs ± 7.42 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
new 324 µs ± 4.93 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

timeit base.pow(-2)							+3.5x
old 2.45 ms ± 11.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
new 662 µs ± 3.83 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

timeit base.pow(3)							+7x
old 2.39 ms ± 60.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
new 334 µs ± 7.26 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

timeit base.pow(-3)							+9x
old 93.7 ms ± 5.27 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
new 10.3 ms ± 666 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

timeit base.pow(123456.789)					+5x
old 46.5 ms ± 418 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
new 9.68 ms ± 325 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

timeit base.pow(-123456.789)				+5x
old 46.5 ms ± 784 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
new 10 ms ± 541 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

timeit base.pow(exp)						+6x
old 60.6 ms ± 4 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
new 9.7 ms ± 379 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

timeit torch.pow(0, exp)					no diff
old 18.3 ms ± 859 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
new 21.2 ms ± 333 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

timeit torch.pow(1, exp)					+30x
old 6.01 ms ± 81.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
new 203 µs ± 1.08 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

timeit torch.pow(-1, exp)					+3x
old 30.8 ms ± 5.51 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
new 9.67 ms ± 441 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

timeit torch.pow(42, exp)					+8x
old 80.1 ms ± 1.57 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
new 9.51 ms ± 103 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

timeit torch.pow(-42, exp)					+2x
old 21.8 ms ± 4.37 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
new 9.5 ms ± 89.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

timeit torch.pow(0, exp, out=out)			no diff
old 20.2 ms ± 3.04 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
new 22.1 ms ± 648 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

timeit torch.pow(1, exp, out=out)			+30x
old 6.7 ms ± 397 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
new 203 µs ± 4.64 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

timeit torch.pow(-1, exp, out=out)			+3x
old 32.5 ms ± 3.61 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
new 9.4 ms ± 99.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

timeit torch.pow(42, exp, out=out)			+10x
old 91 ms ± 7.45 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
new 9.64 ms ± 291 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

timeit torch.pow(-42, exp, out=out)			+2.5x
old 25.9 ms ± 5.03 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
new 10.1 ms ± 698 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

```

BC: enforce stronger shape requirements on the output tensor (out= keyword argument) and do not allow output tensor to be resized if it is also used as one of the inputs.
BC: enforce stronger integer tensor base power integer exponent requirement on CPU and CUDA: `Integers to negative integer powers are not allowed.`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23492

Differential Revision: D16731583

Pulled By: pbelevich

fbshipit-source-id: 4e5bf689357fe82a19371e42d48abbb7b4c1c3ca
2019-08-28 09:11:50 -07:00
Igor Fedan
9b1097958e Migrate digamma\digamma_\polygamma\polygamma_ from the TH to Aten (CPU) (#25048)
Summary:
https://github.com/pytorch/pytorch/issues/24612
https://github.com/pytorch/pytorch/issues/24550
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25048

Differential Revision: D16996440

Pulled By: ifedan

fbshipit-source-id: 0d76588d179d4c932e3fc284cb399dcfc77bc622
2019-08-28 08:29:13 -07:00
Patrick Donnelly
883628cb5c Added documentation for nn.functional.bilinear (#24951)
Summary:
Adds documentation for `nn.functional.bilinear`, as requested in https://github.com/pytorch/pytorch/issues/9886.

The format follows that of `nn.functional.linear`, and borrows from `nn.bilinear` in its description of `Tensor` shapes.

I am happy to add more extensive documentation (e.g. "Args," "Example(s)"). From what I gather, the format of comments is inconsistent across functions in `nn.functional.py` and between modules (e.g. `nn.functional` and `nn`). It's my first PR, so guidance for contributing documentation and other code would be greatly appreciated!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24951

Differential Revision: D17091261

Pulled By: soumith

fbshipit-source-id: efe2ad764700dfd6f30eedc03de4e1cd0d10ac72
2019-08-28 08:19:25 -07:00
Funtowicz Morgan
2c22076342 Moving sign function to ATen (#22861)
Summary:
This PR linked to https://github.com/pytorch/pytorch/issues/22806 moving sign function to ATen.

sign(x) supports bool,  and vectorized operation on CPU.
sign(NaN) is defined to return 0.
sign(bool) is a no-op, the resulting tensor will holds the same values than the input one.

- [x] CPU Backend
- [x] CUDA Backend
- [x] Bring support for bool dtype
- [x] Bring support for Half dtype
- [x] Add test for NaN
- [x] Add test for bool dtype
- [x] Delete legacy implementation in THTensorMoreMath.cpp

Performances:
```python
timeit -s 'import torch; x = torch.randn((1000, 1000))' -n 1000 'torch.sign(x)'
timeit -s 'import torch; x = torch.randn((1000, 1000), device="cuda")' -n 1000 'torch.sign(x); torch.cuda.synchronize()'
```

| device |  before  | after |
| :-------------: | :-------------: | :-----: |
| CPU    | 1.24 msec | 33.9 usec |
| GPU    | 680 usec | 7.13 usec  |
| CPU (1 thread) | 0.82 msec | 0.73 msec |
| GPU (1 thread) | 16.1 used | 15.9 usec |
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22861

Differential Revision: D16503452

Pulled By: VitalyFedyunin

fbshipit-source-id: a87ce7fff139642ef4ed791f15873074ad0d53af
2019-08-27 19:01:34 -07:00
Pavel Belevich
30bc65271d torch.from_numpy fix for np.int (#25139)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/22615
Because of different sizeof(long) we have the following relations between NPY_TYPES and NPY_INTXX aliases:
```
int value	Enum			Unix		Windows
1		NPY_BYTE		NPY_INT8	NPY_INT8
3		NPY_SHORT		NPY_INT16	NPY_INT16
5		NPY_INT			NPY_INT32	-
7		NPY_LONG		NPY_INT64	NPY_INT32
9		NPY_LONGLONG		-		NPY_INT64
```
I suggest the following fix for `numpy_dtype_to_aten` method:
```
if (dtype == NPY_INT || dtype == NPY_INT32) {
	return kInt;
} else if (dtype == NPY_LONGLONG || dtype == NPY_INT64) {
	return kLong;
}
```
On Unix it will be replaced with:
Unix:
```
if (dtype == 5 || dtype == 5) {
	return kInt;
} else if (dtype == 9 || dtype == 7) {
	return kLong;
}
```
and on Windows with:
```
if (dtype == 5 || dtype == 7) {
	return kInt;
} else if (dtype == 9 || dtype == 9) {
	return kLong;
}
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25139

Differential Revision: D17048443

Pulled By: pbelevich

fbshipit-source-id: 9f2c27ff2829b893a35d3d57f176a58e7749a468
2019-08-26 05:07:22 -07:00
Kexuan Sun
4b3ea92787 Test if descriptions of args are in the template (#24161)
Summary:
As in https://github.com/pytorch/pytorch/issues/23439, some descriptions of arguments in `_torch_docs.py` have been replaced by `common_args`, it would be helpful to check if any descriptions can be replaced for new docs in the future.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24161

Differential Revision: D16889293

Pulled By: ezyang

fbshipit-source-id: bf6f581494482d6eb32e634f73e84a4586766230
2019-08-20 16:34:50 -07:00
Max Balandat
d33623f7c1 Make SobolEngine use random seed if not specified (#24884)
Summary:
Addresses https://github.com/pytorch/pytorch/issues/24881. Makes behavior consistent with the rest of the random functions.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24884

Test Plan: Unit tests

Reviewed By: sdsingh

Differential Revision: D16912036

Pulled By: Balandat

fbshipit-source-id: eff00cca989926a5d9e20d8846a8674f7cd270cb
2019-08-20 09:22:41 -07:00
Pavel Belevich
6100205eb8 TensorIterator::binary_op input-output overlap check (#24058)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/8212

This fix is based on the idea that in-place ops(e.g. add_(...)) and out ops(e.g. tensor.add(..., out=...)) must check that the output tensor does not partially overlap with any of it's input tensors. Otherwise the result of such op is unexpected to the user. Since TensorIterator is a common backend for such ops and it's already used to check output self-overlapping, this fix is implemented in the same place.

MemOverlapStatus enum class is introduced to model two tensors overlapped state:

- TOO_HARD if at least one of them is not contiguous
- FULL if both are contiguous and share exactly the same memory array [data(), data() + numel() *itemsize()]
- PARTIAL is both are contiguous but underlying memory is shared partially, in other words memory arrays overlap but not identical.
- NO if both are contiguous but have independent non overlapping memory arrays

Performance test of clone/addcmul_/addcdiv_ with check_mem_overlaps:

a = torch.empty(10000000, device='cpu')
b = torch.randn(10000000, device='cpu')
timeit a.copy_(b)
master: 10.3 ms ± 429 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
branch: 10.2 ms ± 946 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

a = torch.empty(10000000, device='cuda')
b = torch.randn(10000000, device='cuda')
timeit a.copy_(b)
master: 373 µs ± 97.9 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
branch: 373 µs ± 120 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)

a = torch.randn(1000000, device='cpu')
b = torch.randn(1000000, device='cpu')
c = torch.randn(1000000, device='cpu')
timeit a.addcmul_(b, c)
master: 2.02 ms ± 212 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
branch: 2.11 ms ± 200 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

a = torch.randn(1000000, device='cuda')
b = torch.randn(1000000, device='cuda')
c = torch.randn(1000000, device='cuda')
timeit a.addcmul_(b, c)
master: 72.6 µs ± 627 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
branch:	72.4 µs ± 18.1 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

a = torch.randn(1000000, device='cpu')
b = torch.randn(1000000, device='cpu')
c = torch.randn(1000000, device='cpu')
timeit a.addcdiv_(b, c)
master: 2.19 ms ± 583 µs per loop (mean ± std. dev. of 7 runs, 1000 loop each)
branch:	1.97 ms ± 125 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

a = torch.randn(1000000, device='cuda')
b = torch.randn(1000000, device='cuda')
c = torch.randn(1000000, device='cuda')
timeit a.addcdiv_(b, c)
master: 71.3 µs ± 1.98 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
branch:	71.7 µs ± 3.96 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

a = torch.empty(100, device='cpu')
b = torch.randn(100, device='cpu')
timeit a.copy_(b)
master: 12.1 µs ± 1.11 µs per loop (mean ± std. dev. of 7 runs, 100000 loops each)
branch:	11.1 µs ± 61.1 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

a = torch.empty(100, device='cuda')
b = torch.randn(100, device='cuda')
timeit a.copy_(b)
master: 20.9 µs ± 1.62 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
branch:	22.8 µs ± 2.63 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

a = torch.randn(100, device='cpu')
b = torch.randn(100, device='cpu')
c = torch.randn(100, device='cpu')
timeit a.addcmul_(b, c)
master: 24.1 µs ± 2.7 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
branch:	24 µs ± 91.6 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

a = torch.randn(100, device='cuda')
b = torch.randn(100, device='cuda')
c = torch.randn(100, device='cuda')
timeit a.addcmul_(b, c)
master: 34.5 µs ± 4.82 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
branch:	29.8 µs ± 496 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

a = torch.randn(100, device='cpu')
b = torch.randn(100, device='cpu')
c = torch.randn(100, device='cpu')
timeit a.addcdiv_(b, c)
master: 21.3 µs ± 210 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
branch:	23.8 µs ± 403 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

a = torch.randn(100, device='cuda')
b = torch.randn(100, device='cuda')
c = torch.randn(100, device='cuda')
timeit a.addcdiv_(b, c)
master: 30.3 µs ± 257 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
branch:	31.8 µs ± 214 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24058

Differential Revision: D16767892

Pulled By: pbelevich

fbshipit-source-id: 0cdaaa471d003a2886b1736f8985842226b8493a
2019-08-19 15:06:04 -07:00
Vishwak Srinivasan
4358cbe01b Allow torch.tril / triu to handle bool and half inputs (#24163)
Summary:
Changelog:
- Enable torch.tril / triu for bool and float16 dtypes
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24163

Test Plan:
- Tests added in test_torch.py for all devices and dtypes (except bfloat16)

Fixes https://github.com/pytorch/pytorch/issues/24035

Differential Revision: D16793315

Pulled By: ezyang

fbshipit-source-id: 2bbc51ce567405a7cb2d8ab567eee6c2e40aa76a
2019-08-19 15:02:53 -07:00
vishwakftw
f849ebf1fe Enable torch.eye for bool and half (#24148)
Summary:
Changelog:
- Enable torch.eye for bool and float16 dtypes
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24148

Test Plan:
- Tests added in test_torch.py for all available devices and dtypes (except torch.bfloat16)

Fixes https://github.com/pytorch/pytorch/issues/24088

Differential Revision: D16891048

Pulled By: ezyang

fbshipit-source-id: 3e86fe271bd434300c396e63f82c1a1f3adac2b4
2019-08-19 14:59:37 -07:00
Iurii Zdebskyi
eee3e92936 Enabled torch.mm and torch.mv for bfloat16
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/24224

Test Plan: Imported from OSS

Differential Revision: D16779996

Pulled By: izdeby

fbshipit-source-id: c859d8945a564edfa3f8a1430f140ae30d484d19
2019-08-16 15:46:15 -07:00
Heungsub Hans Lee
e166811598 Documentation for Tensor.record_stream() (#24078)
Summary:
This patch writes documentation for `Tensor.record_stream()`, which is not a documented API currently. I've discussed publishing it with colesbury in https://github.com/pytorch/pytorch/issues/23729.

The documentation is based on [the introduction at `CUDACachingAllocator.cpp`](25d1496d58/c10/cuda/CUDACachingAllocator.cpp (L47-L50)). ~~I didn't explain full details of the life cycle of memory blocks or stream awareness of the allocator for the consistent level of details with other documentations.~~ I explained about the stream awareness in a note block.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24078

Differential Revision: D16743526

Pulled By: zou3519

fbshipit-source-id: 05819c3cc96733e2ba93c0a7c0ca06933acb22f3
2019-08-16 08:07:33 -07:00
Hong Xu
5ca612b55e Let logical_xor support non-bool tensors.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23978

Test Plan: Imported from OSS

Differential Revision: D16719299

Pulled By: gchanan

fbshipit-source-id: 2fe170be6090733e20410db7cf99266543299c58
2019-08-15 12:21:31 -07:00
Hong Xu
00e4870001 Let logical_not support non-bool tensors. (#23916)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23916

Pull Request resolved: https://github.com/pytorch/pytorch/pull/23916

Test Plan: Imported from OSS

Differential Revision: D16719300

Pulled By: gchanan

fbshipit-source-id: 5be6aeea9a38cc40ad59d0449d25a25f7dfa2787
2019-08-15 12:21:27 -07:00
Iurii Zdebskyi
52b4221bfa Enabled masked methods for bfloat16 (#24183)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24183

-----------
Fix: Enabled masked select/scatter/fill  for BFloat16 on CPU
Test: via unit tests

Test Plan: Imported from OSS

Differential Revision: D16763461

Pulled By: izdeby

fbshipit-source-id: fe733635a2064e5a088a108ff77c2a1a1487a27c
2019-08-15 08:45:24 -07:00
Hong Xu
bc92ce9e07 Recommend logical_not() instead of bitwise_not() when applying sub and neg on bool tensors. (#23860)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23860

Close #23836

Pull Request resolved: https://github.com/pytorch/pytorch/pull/23860

Test Plan: Imported from OSS

Differential Revision: D16678299

Pulled By: gchanan

fbshipit-source-id: b08e77f6a41c3994240849985caaff7c559d3f83
2019-08-15 08:40:29 -07:00
Hong Xu
338f9c860f Add logical_xor operator (#23847)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23847

Related to #23836

Pull Request resolved: https://github.com/pytorch/pytorch/pull/23847

Test Plan: Imported from OSS

Differential Revision: D16678300

Pulled By: gchanan

fbshipit-source-id: 67020aca5830b6bec2f561105954e0a8c2ee37e0
2019-08-15 08:40:25 -07:00
Hong Xu
1f4c73618c Add logical_not operator. (#23839)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23839

Close #23836

Pull Request resolved: https://github.com/pytorch/pytorch/pull/23839

Test Plan: Imported from OSS

Differential Revision: D16678301

Pulled By: gchanan

fbshipit-source-id: 54e7b3f3b04c577e239b88493247e1c036266774
2019-08-15 08:40:21 -07:00
Jie
064d156511 (#23574)
Summary:
Assert that there's no multiple written-to to a single memory location, which
caused corrupted output.
Fixed batched matrix trlu logic, which relies on the previous copy behavior to
support tensors with stride 0 at leading dimension.

This fixes the issue proposed at: https://github.com/pytorch/pytorch/issues/23063
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23574

Differential Revision: D16600717

Pulled By: ezyang

fbshipit-source-id: e41e14f03eccf97398b64ba43647110beb1529e6
2019-08-14 21:12:07 -07:00
Kexuan Sun
e2a6212912 Resolve unused variables in tests (#24075)
Summary:
Variables such as `device` and `sparse` in for loops should be used in tests.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24075

Differential Revision: D16763073

Pulled By: ezyang

fbshipit-source-id: 8735cbc8d9ed695db8489cfc949c895180a7b826
2019-08-14 21:02:52 -07:00
Vitaly Fedyunin
a5872a16a0 Rename torchtest.test_all_device_types to torchtest.for_all_device_types (#24337)
Summary:
Rename decorator to `for_all_device_types` as `test_` prefixed name recognized as test in some environments.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24337

Differential Revision: D16806807

Pulled By: VitalyFedyunin

fbshipit-source-id: 3132366046e183329ba5838a4bc29441fdb5bd4e
2019-08-14 12:09:51 -07:00
Richard Zou
f996f8d61d Update tensor.view_names / tensor.names_ API (#23973)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23973

Without loss of generality, I describe the API for `tensor.view_names`.
`tensor.names_` has an analogous API.

`tensor.view_names(*names)` returns a view on tensor with named dims `names`.
`names` must be of length `tensor.dim()`; otherwise, if '*' is in `names`,
then it (known as the "glob") is expanded greedily to be equal to the
corresponding names from `tensor.names`.

For example,
```
>>> x = torch.empty(2, 3, 5, 7, names=('N', 'C', 'H', 'W'))
>>> x.view_names('*', 'height', 'width').names
('N', 'C', 'height', 'width')

>>> x.view_names('batch', '*', 'width').names
('batch', 'C', 'H', 'width')
```

tensor.view_names(**rename_map) returns a view on tensor that has
renamed dims as specified in the mapping `rename_map`.

For example,
```
>>> x = torch.empty(2, 3, 5, 7, names=('N', 'C', 'H', 'W'))
>>> x.view_names(W='width', H='height').names
('N', 'C', 'height', 'width')
```

These are different(!!!) from the C++ API, which only allows the
following:
- tensor.view_names(optional<DimnameList>)

C++ API parity for named tensors is not important right now; I am
punting that to the future.

Test Plan: - [namedtensor ci]

Differential Revision: D16710916

Pulled By: zou3519

fbshipit-source-id: 7cb8056c0fb4c97b04c3a2d1dd0f737e0a67ce34
2019-08-14 09:40:35 -07:00
Richard Zou
2fcdb3a1f3 Rename set_names -> view_names, set_names_ -> names_ (#23962)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23962

This change should make the semantics clearer.

`tensor.names_(names)` sets tensor.names to be `names`.

`tensor.view_names(names)` returns a view of the tensor with names
`names`.

Test Plan
- [namedtensor ci]

Test Plan: Imported from OSS

Differential Revision: D16710915

Pulled By: zou3519

fbshipit-source-id: c82fa9812624d03c86f7be84b0a460e3c047aaa0
2019-08-14 09:40:31 -07:00
Richard Zou
7030f2c623 Implement tensor.align_to(names), torch.align_tensors(*tensors) (#23804)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23804

`output = tensor.align_to(names)` returns a view of `tensor` such that
`output.names = names`. Dimensions with the same names in `tensor` and
`output` have the same sizes; dimensions with new names have size 1.

The following must be true for this operation to succeed:
1) tensor.names must be a subsequence (not necessarily contiguous) of `names`
2) Aligning tensor.names to names must not change the absolute position from the
   right of any unnamed dimension.

In practice, these constraints mean that aligning cannot transpose
names.

Some examples:
- Tensor[C].align_to(C) -> Tensor[C]
- Tensor[N].align_to([N, C]) -> Tensor[N, C]
- Tensor[H, W].align_to([N, H, W, C]) -> Tensor[N, H, W, C]
- Tensor[None].align_to([N, None]) -> Tensor[N, None]
- Tensor[N].align_to([N, None None]) -> Tensor[N, None, None]

Examples of error cases:
- Tensor[W, H].align_to([N, H, W, C]) -> Error (not a subsequence)
- Tensor[None, H].align_to([None, H, W]) -> Error (would change the
absolute position from the right of a None dimension)

`torch.align_tensors(*tensors)` aligns the named dimensions of each
tensor according to the alignment rules so that they can be used in an
operation. More concretely, it aligns each tensor to the
longest names among the names of the tensors in `tensors`.

This allows users to emulate "broadcasting by names", which is one of
the things named tensors tries to enable. Here is an example:

```
imgs: Tensor[N, C, H, W]
scale: Tensor[N]

// Doesn't work because we do broadcasting by alignment by default
imgs * scale

// Does work
imgs, scale = torch.align_tensors(imgs, scale)
imas * scale
```

Future:
- Consider allowing broadcasting by names by default.

Test Plan:
- The diff looks pretty large but more than half of it is testing.
- new tests [namedtensor ci]

Differential Revision: D16657927

Pulled By: zou3519

fbshipit-source-id: e2f958bf5146c8ee3b694aba57d21b08e928a4e6
2019-08-14 09:40:27 -07:00
Richard Zou
98a3b3d565 Add name propagation for at::alias, add tensor.set_names (#24202)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24202

tensor.set_names(names) is the out-of-place variant of
tensor.set_names_(names). This naming is probably confusing so I am
taking any and all suggestions.

Test Plan: - run tests [namedtensor ci]

Differential Revision: D16773014

Pulled By: zou3519

fbshipit-source-id: 61024303c1a34db631cc4cb2c53757345e40d72c
2019-08-13 17:01:18 -07:00
Iurii Zdebskyi
0ea8f22951 Enabled comparison ops for bfloat16 dtype on CPU (#24182)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24182

-----
Fix: Enabled comparison operations for BFloat16 on CPU
Test: via unit tests

Test Plan: Imported from OSS

Differential Revision: D16763460

Pulled By: izdeby

fbshipit-source-id: 885ff9006d3bd60bb945147c3b86f97cd0d26f7b
2019-08-13 15:44:24 -07:00
Vitaly Fedyunin
6d14f7a214 Simplify tests that should cover all possible devices (#23824)
Summary:
This PR introduce `pytorchtest.test_all_device_types()` decorator which helps to write CPU, CUDA tests faster, iterating single test through all available devices

Simple `test_var_mean_some_dims`  becomes
```
test_var_mean_some_dims (__main__.TestTorch) ... ok
test_var_mean_some_dims_cpu (__main__.TestTorch) ... ok
test_var_mean_some_dims_cuda (__main__.TestTorch) ... ok
```

```python

class pytorchtest():
    """Allows to generate and run per-device unittests.

    This decorator class allows to generate and run per-device unittest.

    Example:

    class _TestTorchMixin(pytorchtest):

        pytorchtest.test_all_device_types()
        def test_zeros_like(self, device):
            expected = torch.zeros((100, 100,), device=device)

    Will execute:

        test_zeros_like (__main__.TestTorch) ... skipped 'Look at test_zeros_like_cpu, test_zeros_like_cuda results.'
        test_zeros_like_cpu (__main__.TestTorch) ... ok
        test_zeros_like_cuda (__main__.TestTorch) ... ok

    To work properly, test class should be inherited from the `pytorchtest`.
    test_all_device_types decorator does not guarantee proper functionality in
    combination with other decorators.

    Please do not extend this decorator to support other cases (such as dtype,
    layouts, etc) without consulting with bigger group. Devices is the special
    case as build flags control additions/removals (see
    https://github.com/pytorch/pytorch/pull/23824 for the reference).
    """
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23824

Differential Revision: D16716959

Pulled By: VitalyFedyunin

fbshipit-source-id: ba39af0f9bce2c4a64da421bbc24d6a1c1d9139d
2019-08-13 09:36:31 -07:00
Brian Vaughan
465b4de9d4 add function name to error messages generated by checked_tensor_unwrap (#24187)
Summary:
Improve error messages by showing the relevant function call that failed.

Before:
```
>>> torch.ones(1, dtype=torch.float) < torch.ones(1, dtype=torch.double)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
RuntimeError: Expected object of scalar type Float but got scalar type Double for argument https://github.com/pytorch/pytorch/issues/2 'other'
```

After:

```
>>> torch.ones(1, dtype=torch.float) < torch.ones(1, dtype=torch.double)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
RuntimeError: Expected object of scalar type Float but got scalar type Double for argument https://github.com/pytorch/pytorch/issues/2 'other' in call to _th_lt
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24187

Differential Revision: D16769167

Pulled By: nairbv

fbshipit-source-id: 4992eb4e86bdac2ab8805cc5356f7f92c63e1255
2019-08-12 14:02:22 -07:00
Richard Zou
75db368031 Revert D16763388: Add name propagation for at::alias, add tensor.set_names
Differential Revision:
D16763388

Original commit changeset: 4b2fb3acc051

fbshipit-source-id: 5be35bdcc2e7c71378af9e34be19305bdd4ba0d1
2019-08-12 13:42:43 -07:00
Richard Zou
1108fa1acb Add name propagation for at::alias, add tensor.set_names (#24105)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24105

tensor.set_names(names) is the out-of-place variant of
tensor.set_names_(names). This naming is probably confusing so I am
taking any and all suggestions.

Test Plan: - run tests [namedtensor ci]

Differential Revision: D16763388

Pulled By: zou3519

fbshipit-source-id: 4b2fb3acc0514515e7ca805dbc5c3d4a9bd96317
2019-08-12 12:44:56 -07:00
Igor Fedan
1d3d92e770 Port addcdiv operator from the TH code to Aten
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/24086

Differential Revision: D16733306

Pulled By: ifedan

fbshipit-source-id: c103bc44e0bb42dff0229252e1a12ce9b4e5aeae
2019-08-09 13:48:01 -07:00
Gregory Chanan
e81f296807 Fixed Bool in IsIntegralType bug (plus review comments) (#23942)
Summary:
Same as https://github.com/pytorch/pytorch/pull/23887, but also includes review comments, so we can kick off a build.

Original PR:
This [PR](https://github.com/pytorch/pytorch/pull/23346) caused [this](https://github.com/pytorch/pytorch/issues/23882) bug.

Fix:
- Deprecate old isIntegralType and add overload which takes a boolean flag which tells if torch.bool should be included in integral types or not.

Testing:
- Added extra test cases
- Tested via running unit tests locally.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23942

Differential Revision: D16688056

Pulled By: gchanan

fbshipit-source-id: eff457e27b13e116c05ffd022b2fb0495abe0e97
2019-08-09 12:25:27 -07:00
Richard Zou
0bba302da5 Revert D16621830: Add name propagation for at::alias, add tensor.set_names
Differential Revision:
D16621830

Original commit changeset: f8a3837d3a37

fbshipit-source-id: 801ab858a0741d98b0b9d56763fa70a9010fe75e
2019-08-09 10:55:18 -07:00
Richard Zou
78f3b883f0 Add name propagation for at::alias, add tensor.set_names (#23624)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23624

tensor.set_names(names) is the out-of-place variant of
tensor.set_names_(names). This naming is probably confusing so I am
taking any and all suggestions.

Test Plan:
- run tests [namedtensor ci]

gh-metadata: pytorch pytorch 23624 gh/zou3519/86/head

Differential Revision: D16621830

Pulled By: zou3519

fbshipit-source-id: f8a3837d3a370b41210e938369348dcbb4aee53a
2019-08-09 09:17:31 -07:00
Edward Yang
21ea0a115c Revert D16627924: [pytorch][PR] Port addcdiv operator from the TH code to Aten
Differential Revision:
D16627924

Original commit changeset: 960856d30fd3

fbshipit-source-id: a375a3ede5ef956a07fb55c7b4a5d4fc34c96ddb
2019-08-09 08:33:44 -07:00
Hong Xu
2e8557778b Refactor randperm test (#23526)
Summary:
CPU and CUDA testing code are largely the same.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23526

Reviewed By: ezyang

Differential Revision: D16586271

Pulled By: VitalyFedyunin

fbshipit-source-id: 91c70c05789120fde4718ce955de243087a8c993
2019-08-09 08:33:35 -07:00
Stefan Krah
478c793065 Remove numpy assert that fails on Windows (older numpy versions). (#24012)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/24001.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24012

Differential Revision: D16732191

Pulled By: ezyang

fbshipit-source-id: 36660a6635ab64d2f63278b1616deb1282dea037
2019-08-09 07:55:02 -07:00
Igor Fedan
fb77f14054 Port addcdiv operator from the TH code to Aten (#23683)
Summary:
https://github.com/pytorch/pytorch/issues/22796
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23683

Differential Revision: D16627924

Pulled By: ifedan

fbshipit-source-id: 960856d30fd3f79394925eddd0152cc5e27b39b3
2019-08-09 07:44:57 -07:00
Igor Fedan
9114089d70 port atan2 from TH to ATen (#23558)
Summary:
https://github.com/pytorch/pytorch/issues/22799
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23558

Differential Revision: D16591638

Pulled By: ifedan

fbshipit-source-id: d12d4c8229337a22a3278f0c7a8bbc9a86d4c9b7
2019-08-09 07:44:53 -07:00
Iurii Zdebskyi
5b9f55f33f Enable Add, sub, mul, and div on CPU for bfloat16 type. (#22851)
Summary:
Enable Add, sub, mul, and div on CPU for bfloat16 type.
Tested via unit tests.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22851

Differential Revision: D16256757

Pulled By: izdeby

fbshipit-source-id: 8b62f7581fc0ca0d2cff48ab40d877a9fcf70a5b
2019-08-08 12:34:25 -07:00
Igor Fedan
341d5934b7 Move addcmul to Aten(CUDA) (#23814)
Summary:
https://github.com/pytorch/pytorch/issues/22797
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23814

Differential Revision: D16712381

Pulled By: ifedan

fbshipit-source-id: aeca4fdb9b10143932f195900b1f424ef6d26c89
2019-08-08 12:34:21 -07:00
Ojas Ahuja
10b1254edd fix crash on torch.Tensor.repeat() for 0 repeats (#23766)
Summary:
This PR fixes https://github.com/pytorch/pytorch/issues/23603
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23766

Differential Revision: D16644866

Pulled By: soumith

fbshipit-source-id: ee7d368afdfe874133d0bd90f4d03a191ee22b13
2019-08-07 09:16:00 -07:00
Hong Xu
f90afff3bd Recommend ~ and bitwise_not() when user tries to apply neg (-) on a bool tensor.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23621

Test Plan: Imported from OSS

Differential Revision: D16656729

Pulled By: ezyang

fbshipit-source-id: d107e8caa2ccfa6ff8a1bd8a31b4d79f142d68fb
2019-08-05 16:21:34 -07:00
Vitaly Fedyunin
6e4a83ab57 Channels last stored in tensor (#23391)
Summary:
Define 4D tensor as stored in channels last memory format, when dimensions order is NCHW and C-strides < W-strides < H-strides < N-strides (If size of any dimension is equal to 1, this dimension strides value is not taken into account).

Channels last contiguous tensor is channel last tensor which occupies contiguous memory block. So x.is_contiguous(memory_format=torch.channels_last) checks if tensor is channels last contiguous.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23391

Differential Revision: D16601414

Pulled By: VitalyFedyunin

fbshipit-source-id: 8d098e7eec2f00fb1d12261bc240b3645d4f5b73
2019-08-05 11:50:29 -07:00
Hong Xu
be7fe1ccb9 Add tests to ensure that both abs(0.0) and abs(-0.0) lead to 0.0 (#23701)
Summary:
As pointed out by colesbury in https://github.com/pytorch/pytorch/pull/23579#discussion_r309798987
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23701

Differential Revision: D16623781

Pulled By: mrshenli

fbshipit-source-id: f48a29499128b08d2ac8bc9e466f2326112ead94
2019-08-05 07:50:06 -07:00
Iurii Zdebskyi
19c675178f Updated docs and added deprecation warnings to acknowledge a bool tensor (#22261)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22261
ghimport-source-id: 1611d62d056a04c0ad15ef662e594a3d206a78e2

Test Plan: Imported from OSS

Differential Revision: D16005990

Pulled By: izdeby

fbshipit-source-id: 2413824aa75a0755719e4df11acd21e6607e5a85
2019-08-05 07:42:34 -07:00
Xiang Gao
520982d1df Zero sized tensor support for repeat_interleave (#23717)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/22753
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23717

Differential Revision: D16623598

Pulled By: mrshenli

fbshipit-source-id: 297a3274fb5a5b2fcc0c3ad601337d7eb29fdca2
2019-08-05 07:36:47 -07:00
Iurii Zdebskyi
865c7eea48 Changed tensor comparison return type from uint8 to bool (#21113)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21113
ghimport-source-id: 9c4ba63457a72bfc41894387e0b01be3fd9a9baf

Test Plan: Imported from OSS

Differential Revision: D15552204

Pulled By: izdeby

fbshipit-source-id: a608213668649d058e22b510d7755cb99e7d0037
2019-08-01 07:54:53 -07:00
vishwakftw
5d130e4232 Allowing batching for det/logdet/slogdet operations (#22909)
Summary:
Changelog:
- Add batching for det / logdet / slogdet operations
- Update derivative computation to support batched inputs (and consequently batched outputs)
- Update docs
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22909

Test Plan:
- Add a `test_det_logdet_slogdet_batched` method in `test_torch.py` to test `torch.det`, `torch.logdet` and `torch.slogdet` on batched inputs. This relies on the correctness of `torch.det` on single matrices (tested by `test_det_logdet_slogdet`). A port of this test is added to `test_cuda.py`
- Add autograd tests for batched inputs

Differential Revision: D16580988

Pulled By: ezyang

fbshipit-source-id: b76c87212fbe621f42a847e3b809b5e60cfcdb7a
2019-07-31 10:01:32 -07:00
Richard Zou
c5482e33e9 Rename tensor.is_named to has_named, expose has_named to python.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23315

Test Plan:
- [namedtensor ci]

gh-metadata: pytorch pytorch 23315 gh/zou3519/79/head

Imported from OSS

Differential Revision: D16494414

Pulled By: zou3519

fbshipit-source-id: d2d6beb45db9288e5df707b68b6046d783ca9f97
2019-07-31 07:14:07 -07:00
Tongzhou Wang
af638ad5d7 pin_memory should not copy on already pinned tensors (#23484)
Summary:
fixes https://github.com/pytorch/pytorch/issues/21076
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23484

Differential Revision: D16546264

Pulled By: ezyang

fbshipit-source-id: 8058e0bbc6336751f36b884d71234feef498a982
2019-07-30 21:16:23 -07:00
Vitaly Fedyunin
401fbb0088 Port resize_as_ and clone from TH to Aten (#23027)
Summary:
API operators now routed to `at::native::resize_as_*_` and `at::native::clone` accordingly.
Internal `THTensor_(resizeAs)`, `THCTensor_(resizeAs)`, `THTensor_(newClone)` and `THCTensor_(newClone)` remains to support older TH code.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23027

Differential Revision: D16362304

Pulled By: VitalyFedyunin

fbshipit-source-id: 4c1e8516da685f3fdea632ff791d143f27aeebeb
2019-07-30 10:40:27 -07:00
vishwakftw
b3a9a7a9b9 Rename gels to lstsq (#23460)
Summary:
Changelog:
- Rename `gels` to `lstsq`
- Fix all callsites
- Rename all tests
- Create a tentative alias for `lstsq` under the name `gels` and add a deprecation warning to not promote usage.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23460

Test Plan: - All tests should pass to confirm that the patch is correct

Differential Revision: D16547834

Pulled By: colesbury

fbshipit-source-id: b3bdb8f4c5d14c7716c3d9528e40324cc544e496
2019-07-30 09:56:04 -07:00
Pavel Belevich
fd61cc9ebc Moved at::assert_no_internal_overlap to TensorIterator
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22917

Differential Revision: D16521429

Pulled By: pbelevich

fbshipit-source-id: 80ae583c6486d6948431b79e1452902bdf2cfbc3
2019-07-30 07:47:33 -07:00
Will Feng
7b081e5d1e Improve error message for changing tensor metadata after .data or .detach() (#23504)
Summary:
When a user tries to change metadata of a tensor created from `.data` or `.detach()`, we currently shows an error message "<function_name> is not allowed on Tensor created from .data or .detach()". However, this error message doesn't suggest what the right fix should look like. This PR improves the error message.

Closes https://github.com/pytorch/pytorch/issues/23393.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23504

Differential Revision: D16547415

Pulled By: yf225

fbshipit-source-id: 37f4a0385442e2b0966386fb14d3d938ecf4230c
2019-07-29 22:25:14 -07:00
Hong Xu
e366af7d87 Add TORCH_CHECK to disable sub for bool tensors (#23519)
Summary:
This resolves two issues in one shot:

- sub shouldn't be available for bool type.
- When sub is applied to an unsupported type, the current error messages
  shows "add_cpu/add_cuda is not implemented for [type]". They should be
  "sub_cpu/sub_cuda" instead.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23519

Differential Revision: D16548770

Pulled By: izdeby

fbshipit-source-id: fe404a2a97b8d11bd180ec41364bf8e68414fb15
2019-07-29 16:28:35 -07:00
Richard Zou
0dcb8755c8 Implement tensor.set_names_, tensor.names setter
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23172

Test Plan:
- [namedtensor ci]

gh-metadata: pytorch pytorch 23172 gh/zou3519/74/head

Imported from OSS

Differential Revision: D16494364

Pulled By: zou3519

fbshipit-source-id: 8d0e26b33346d4eadba30b2e76610f6d7be7c373
2019-07-26 08:50:49 -07:00
Pieter Noordhuis
2938299de1 Fix lint failure introduced in #23346
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23371

Differential Revision: D16489985

Pulled By: pietern

fbshipit-source-id: 914048563bbe7bf5ab897c6f12f4a1bb4bff30e1
2019-07-25 05:17:15 -07:00
Iurii Zdebskyi
cf0f3556f6 Enabled cumsum and cumprod for bool tensors (#23346)
Summary:
```
a = torch.tensor([[True, False, True],
                  [False, False, False],
                  [True, True, True]])

>>> torch.cumsum(a, 0)
tensor([[1, 0, 1],
        [1, 0, 1],
        [2, 1, 2]])

>>> torch.cumsum(a, 1)
tensor([[1, 1, 2],
        [0, 0, 0],
        [1, 2, 3]])
```

Tested via unit tests.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23346

Differential Revision: D16469393

Pulled By: izdeby

fbshipit-source-id: b55f3ca0588f9961a771def40f6ef58932021e1a
2019-07-24 18:16:01 -07:00
Iurii Zdebskyi
200cb836f1 Enabled 'add_cuda' for bool and fixed alpha scalar bug (#23044)
Summary:
Enabled 'add_cuda' for bool
Tested via unit tests

Fixed https://github.com/pytorch/pytorch/issues/22431 #22430
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23044

Differential Revision: D16368095

Pulled By: izdeby

fbshipit-source-id: 033d28095ff1c5df4078905c52782cf4cf9ed6b0
2019-07-24 10:31:34 -07:00
Johannes M Dieterich
4cd726c7b3 Update ROCm CI to python3.6 (#23088)
Summary:
Rehash of https://github.com/pytorch/pytorch/issues/22322 .

Given that python 2.7 will be EOL'd on Jan 1, 2020 and we have models depending on python3.5+, we'd like to update the ROCm CI across the board to python3.6.

This PR adds the skip tests and some semantic changes for PyTorch.

Added pattern match skip for anything but the ROCm CI compared to #223222 for the python find step in the PyTorch build.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23088

Differential Revision: D16448261

Pulled By: bddppq

fbshipit-source-id: 69ece1a213418d9abf1444c496dce1c190ee07c8
2019-07-23 23:07:45 -07:00
Vishwak Srinivasan
0ab19d66ee Port lu_solve to ATen (#22379)
Summary:
Changelog:
- Port TH implementation to ATen/native/BatchLinearAlgebra.cpp
- Port THC implementation to ATen/native/cuda/BatchLinearAlgebra.cu
- Remove TH/THC implementations
- Update doc strings
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22379

Test Plan: - Added new tests in test_torch.py (port to test_cuda.py exists)

Differential Revision: D16089645

Pulled By: zou3519

fbshipit-source-id: dc8561aadacacb23e80c375b4fec687df2b6bbc8
2019-07-23 19:11:35 -07:00
Kexuan Sun
45d3f495ef Add document of function torch.as_strided (#22842)
Summary:
Documentation of `torch.as_strided` and `Tensor.as_strided` is missing. As mentioned in https://github.com/pytorch/pytorch/issues/9886
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22842

Differential Revision: D16254106

Pulled By: soumith

fbshipit-source-id: dee142483fb9ef7bea84bd44a970b6eccdcdc471
2019-07-23 06:06:00 -07:00
Jerry Zhang
3861520603 Verify flatten works for quantized Tensor (#23121)
Summary:
Added a test in `test_torch.py`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/23121
ghstack-source-id: 86983227

Differential Revision: D16391409

fbshipit-source-id: 04e72b2f753a0a6ddbf58d55b794e443b18a2156
2019-07-22 18:34:25 -07:00
Iurii Zdebskyi
22f7c9e31b (#23105)
Summary:
Fixed a [bug](https://github.com/pytorch/pytorch/issues/22992) where passing result tensor into masked_select wont work with bool mask.
Tested via unit tests.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23105

Differential Revision: D16386676

Pulled By: izdeby

fbshipit-source-id: 93a1e9bfbc916c8a8eaa149a70a5553f3711f53e
2019-07-22 07:49:30 -07:00
vishwakftw
6dfecc7e01 Remove deprecated linear algebra functions (and methods) (#22841)
Summary:
Changelog:
- Removed the following linear algebra functions in PyTorch in favor of the renamed operations
  - `btrifact` (use `lu` instead)
  - `btrifact_with_info` (use `lu` with `get_infos=True` instead)
  - `btrisolve` (use `lu_solve` instead)
  - `btriunpack` (use `lu_unpack` instead)
  - `gesv` (use `solve` instead)
  - `pstrf` (use `cholesky` instead)
  - `potrf` (use `cholesky` instead)
  - `potri` (use `cholesky_inverse` instead)
  - `potrs` (use `cholesky_solve` instead)
  - `trtrs` (use `triangular_solve` instead)

- Removed dead code after the removal of `pstrf`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22841

Test Plan:
- All existing tests should pass to verify that the removal is clean

Closes https://github.com/pytorch/pytorch/issues/22832

Differential Revision: D16346184

Pulled By: zou3519

fbshipit-source-id: f748d16ed7609c028de6adcbc28684d5a1af0678
2019-07-19 11:43:06 -07:00
Junjie Bai
eb76b7a564 Revert D16199862: [pytorch][PR] [ROCm] Update ROCm CI to python3.6
Differential Revision:
D16199862

Original commit changeset: 46ca6029a232

fbshipit-source-id: 2843b919f2655674e39dc764053621994061a12b
2019-07-17 14:26:56 -07:00
iotamudelta
031b406c38 Update ROCm CI to python3.6 (#22322)
Summary:
Given that python 2.7 will be EOL'd on Jan 1, 2020 and we have models depending on python3.5+, we'd like to update the ROCm CI across the board to python3.6.

This PR adds the skip tests and some semantic changes for PyTorch.

Open tasks/questions:
* RoiAlignTest.CheckCPUGPUEqual fails in the Caffe2 unit tests. Is this something expects / can be skipped?
* for testing, I've used update-alternatives on CentOS/Ubuntu to select python == python 3.6. Is this the preferred way?
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22322

Differential Revision: D16199862

Pulled By: ezyang

fbshipit-source-id: 46ca6029a232f7d23f3fdb5efc33ae39a379fca8
2019-07-17 13:42:30 -07:00
Pavel Belevich
bcfa023a00 hardshrink_cpu and hardshrink_backward_cpu refactoring with at::native::cpu_kernel
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22459

Differential Revision: D16132625

Pulled By: pbelevich

fbshipit-source-id: d7eb1cd6ed04eba3d0c54feaca1e5ab2836211b5
2019-07-16 18:58:35 -07:00
vishwakftw
f8ad65adb1 Fix torch.triu / torch.tril on contiguous tensors with non-default st… (#22730)
Summary:
…rides

Changelog:
- Fix behavior of `torch.triu` / `torch.tril` on certain unsqueezed tensors that lead to uninitialized values on CPU
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22730

Test Plan:
- Add tests for these cases in test_triu_tril in test_torch

Fixes https://github.com/pytorch/pytorch/issues/22581

Differential Revision: D16222897

Pulled By: zou3519

fbshipit-source-id: b86b060187797e5cd2a7731421dff1ba2b5c9596
2019-07-16 10:09:03 -07:00
vishwakftw
7d055c21b3 Port SVD to ATen, enable batching for matrix inputs (#21588)
Summary:
Changelog:
- Port SVD TH implementation to ATen/native/BatchLinearAlgebra.cpp
- Port SVD THC implementation to ATen/native/cuda/BatchLinearAlgebra.cu
- Allow batches of matrices as arguments to `torch.svd`
- Remove existing implementations in TH and THC
- Update doc string
- Update derivatives to support batching
- Modify nuclear norm implementation to use at::svd instead of _batch_svd
- Remove _batch_svd as it is redundant
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21588

Test Plan:
- Add new test suite for SVD in test_torch.py with port to test_cuda.py
- Add tests in common_methods_invocations.py for derivative testing

Differential Revision: D16266115

Pulled By: nairbv

fbshipit-source-id: e89bb0dbd8f2d58bd758b7830d2389c477aa61fb
2019-07-15 13:34:01 -07:00
Iurii Zdebskyi
bd88fd0793 Added .bfloat16() (#22852)
Summary:
Add conversion method for bfloat16
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22852

Differential Revision: D16256760

Pulled By: izdeby

fbshipit-source-id: 01d75495f9df513a0cdf78791c3eb013ab92bd95
2019-07-15 09:32:18 -07:00
shihongzhi
45cf33a731 add fill_diagonal function (#21892)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/21796
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21892

Differential Revision: D16164678

Pulled By: colesbury

fbshipit-source-id: 85df8ae9b7a6a91b6023fe7295b3a8124e4526ea
2019-07-11 09:20:44 -07:00
Sam Gross
4240220926 Revert D16183577: Delegate Python ~ (invert operator) to Tensor.bitwise_not().
Differential Revision:
D16183577

Original commit changeset: f86838c407db

fbshipit-source-id: bbf53ce52a20b1e90b1fe522d73e558d8044c4ba
2019-07-10 18:29:22 -07:00
Iurii Zdebskyi
fffa7200c1 fixing lint
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22703

Differential Revision: D16188326

Pulled By: izdeby

fbshipit-source-id: 72e6b6f957068c3995010a1b811f24cd2304ff6f
2019-07-10 16:02:21 -07:00
Hong Xu
7750cae722 Refactor and improve randperm tests.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22121

Test Plan: Imported from OSS

Differential Revision: D16153794

Pulled By: li-roy

fbshipit-source-id: 4dbfa6cfcc79f6d431918a6646664215fa9ea0b9
2019-07-10 12:23:33 -07:00
Hong Xu
0f7c3710dd Support Half type in randperm.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22102

Test Plan: Imported from OSS

Differential Revision: D16153586

Pulled By: li-roy

fbshipit-source-id: d58e3dbc5da893005f4eaf521a28b0d752274eff
2019-07-10 12:23:25 -07:00
Hong Xu
9c4c9c3af0 Delegate Python ~ (invert operator) to Tensor.bitwise_not().
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22326

Test Plan: Imported from OSS

Differential Revision: D16183577

Pulled By: colesbury

fbshipit-source-id: f86838c407db4ded9ce70998bf1ab1ffd75b3b58
2019-07-10 12:17:52 -07:00
Hong Xu
574e808680 Add a bitwise NOT operator for integer and Boolean types (CUDA).
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22320

Test Plan: Imported from OSS

Differential Revision: D16183578

Pulled By: colesbury

fbshipit-source-id: 2f72cce5e10fd637be1ac87e1bbfe0937a661034
2019-07-10 12:17:48 -07:00
Hong Xu
e2dc1fc715 Add a bitwise NOT operator for integer and Boolean types (CPU).
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22283

Test Plan: Imported from OSS

Differential Revision: D16183576

Pulled By: colesbury

fbshipit-source-id: 2e539fab8ff885dddb9bff334d1d784b28d65b8f
2019-07-10 12:17:44 -07:00
Iurii Zdebskyi
10c60b601a Added Bfloat16 tensor for cpu with very limited support (#21860)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21860
ghimport-source-id: 5290755b63033cdfdeb911a4ecf4aa282b3db02d

Test Plan: Imported from OSS

Differential Revision: D15856091

Pulled By: izdeby

fbshipit-source-id: 54e7e17be1b5c5a2e80a41feaeaeba75dbb8108f
2019-07-10 09:08:52 -07:00
Iurii Zdebskyi
3a8d7463bd Enabled BFloat16 storage (#21523)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21523
ghimport-source-id: 698b3cbd6b21c09b9ff8bf8011980df8e35c33b0

Test Plan: Imported from OSS

Differential Revision: D15819368

Pulled By: izdeby

fbshipit-source-id: f6b3bba7b3ca8ee677bd80a231dbb3920c07d61c
2019-07-09 21:51:06 -07:00
Brandon Amos
046c4589df lu: When not using pivoting, return the identity permutation instead of zeros (#22242)
Summary:
Some of my qpth users have told me that updating to the latest version of PyTorch and replacing the btrifact/btrisolve calls with the LU ones wasn't working and I didn't believe them until I tried it myself :)

These updates have broken unpivoted LU factorizations/solves on CUDA. The LU factorization code used to return the identity permutation when pivoting wasn't used but now returns all zeros as the pivots. This PR reverts it back to return the identity permutation. I've not yet tested this code as I'm having some trouble compiling PyTorch with this and am hitting https://github.com/pytorch/pytorch/issues/21700 and am not sure how to disable that option.

Here's a MWE to reproduce the broken behavior, and my fix.

```python
torch.manual_seed(0)

n = 4
L = torch.randn(n,n)
A = L.mm(L.t()).unsqueeze(0)
b = torch.randn(1, n)

A_lu_cpu = torch.lu(A)
A_lu_cuda_nopivot = torch.lu(A.cuda(), pivot=False)
A_lu_cuda_pivot = torch.lu(A.cuda(), pivot=True)
print('A_lu_cuda_nopivot\n', A_lu_cuda_nopivot)
print('-----\nA_lu_cuda_pivot\n', A_lu_cuda_nopivot)

x_cpu = b.lu_solve(*A_lu_cpu)
x_cuda_nopivot = b.cuda().lu_solve(*A_lu_cuda_nopivot)
x_cuda_nopivot_fixed = b.cuda().lu_solve(
    A_lu_cuda_nopivot[0], torch.arange(1, n+1, device='cuda:0').int())
x_cuda_pivot = b.cuda().lu_solve(*A_lu_cuda_pivot)

print(x_cpu, x_cuda_nopivot, x_cuda_nopivot_fixed, x_cuda_pivot)
```

Output:

```
A_lu_cuda_nopivot
 (tensor([[[ 2.8465, -0.7560,  0.8716, -1.7337],
         [-0.2656,  5.5724, -1.1316,  0.6678],
         [ 0.3062, -0.2031,  1.4206, -0.5438],
         [-0.6091,  0.1198, -0.3828,  1.5103]]], device='cuda:0'), tensor([[0, 0, 0, 0]], device='cuda:0', dtype=torch.int32))

-----

A_lu_cuda_pivot
 (tensor([[[ 2.8465, -0.7560,  0.8716, -1.7337],
         [-0.2656,  5.5724, -1.1316,  0.6678],
         [ 0.3062, -0.2031,  1.4206, -0.5438],
         [-0.6091,  0.1198, -0.3828,  1.5103]]], device='cuda:0'), tensor([[0, 0, 0, 0]], device='cuda:0', dtype=torch.int32))

(tensor([[-0.3121, -0.1673, -0.4450, -0.2483]]),
 tensor([[-0.1661, -0.1875, -0.5694, -0.4772]], device='cuda:0'),
 tensor([[-0.3121, -0.1673, -0.4450, -0.2483]], device='cuda:0'),
 tensor([[-0.3121, -0.1673, -0.4450, -0.2483]], device='cuda:0'))
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22242

Differential Revision: D16049334

Pulled By: ezyang

fbshipit-source-id: 7eacae810d87ffbdf8e07159bbbc03866dd9979d
2019-07-09 11:16:50 -07:00
Iurii Zdebskyi
304552b61a Enabled masked fill and scatter for bool tensors (#22491)
Summary:
Enabled masked_fill, masked_fill_, masked_scatter_, masked_scatter on bool tensors.
Tested via unit tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22491

Differential Revision: D16108817

Pulled By: izdeby

fbshipit-source-id: 8b1808f41d7a4f65fe6d3797a3c83b2dac3446c7
2019-07-08 10:49:40 -07:00
vishwakftw
9bafe5d4da Fix torch.normal with CUDA tensors (#22533)
Summary:
`addcmul_out` overwrote the samples, which led to constant values being output by `torch.normal`.

Changelog:
- Replace the `addcmul_out` calls with combo of inplace `mul` and `add` and justification for this change.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22533

Test Plan:
- Enable tests for test_normal on all devices

Fixes https://github.com/pytorch/pytorch/issues/22529

Differential Revision: D16141337

Pulled By: ezyang

fbshipit-source-id: 567a399042e0adcd154582f362318ce95a244c62
2019-07-08 08:27:38 -07:00
Brennan Vincent
e210c65097 Add torch.where overload with only condition argument (#21986)
Summary:
Requested in https://github.com/pytorch/pytorch/issues/21798
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21986

Differential Revision: D16081577

Pulled By: zhangguanheng66

fbshipit-source-id: 658c0f451b833aceb1a41ee424c7990eec00bc02
2019-07-02 18:18:15 -07:00
Brennan Vincent
dcd902bdde provide "size" parameter in torch.normal when called with two floats (#20545)
Summary:
This has been requested in https://github.com/pytorch/pytorch/issues/20323

(It is still not exactly the same as NumPy, which allows you to pass tensors at mean/std and broadcast them with size, but the present PR is extremely simple and does the main thing people are asking for)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20545

Differential Revision: D15358736

Pulled By: zhangguanheng66

fbshipit-source-id: 762ea5eab5b8667afbac2df0137df017ba6e413c
2019-07-02 18:18:11 -07:00
iurii zdebskyi
a54acd3755 Update the way boolean tensor are being printed (#22238)
Summary:
In case when the boolean tensor gets printed out, no need to specify the dtype.

Example:
```
>> x = torch.tensor([[True, True, True], [True, True, True]])
>> print(x)
tensor([[True, True, True],
        [True, True, True]])

>> x = torch.tensor([True])
>> print(x)
tensor([True])

>> x = torch.tensor(True)
>> print(x)
tensor(True)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22238

Differential Revision: D15996304

Pulled By: izdeby

fbshipit-source-id: 5699acf3e00abca8a2bbb5384f8271eeb063dce7
2019-07-01 18:04:42 -07:00
Roy Li
9c8f9f0ecb Remove many usages of Type (#21941)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21941
ghimport-source-id: f20cca6229daba9eb8652adb3d959266ae081ef1

Test Plan: Imported from OSS

Differential Revision: D15893331

Pulled By: li-roy

fbshipit-source-id: c988b16008ff0e2725a88c6025afd4aabdaca45a
2019-06-30 04:11:28 -07:00
Hong Xu
e259894e83 Test raising TypeError in torch.from_numpy() (#21607)
Summary:
With some additional cleanup.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21607

Differential Revision: D16046063

Pulled By: li-roy

fbshipit-source-id: 15256a0e94afea39db3cb581c546c2a18a8a7fda
2019-06-27 23:54:47 -07:00
iurii zdebskyi
59c42595e0 Enabled gather and scatter for bool tensor (#21924)
Summary:
- moving stuff around in order to enable bool.
- Added implementation of atomicAdd(bool, bool)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21924

Differential Revision: D15883711

Pulled By: izdeby

fbshipit-source-id: 733f35c2bc3d87cec9f9687d72b62d2d2cd7c03e
2019-06-27 09:07:50 -07:00
Vitaly Fedyunin
516c7e4456 Adding memory_format to empty and empty_like operators (#20558)
Summary:
Original RFC https://github.com/pytorch/pytorch/issues/19092

To ensure that we are not introducing BC breaking change, empty_like returns contiguous tensor by default.

```python
nCwh = torch.randn(N, C, H, W)
nhwC = nCwh.contiguous(memory_format=torch.channels_last)

new_nCwh = torch.empty_like(nhwC)
new_nCwh.is_contiguous(memory_format=torch.channels_last) == False
```

Now we need a way to preserve memory format in `empty_like`

```python
nCwh = torch.randn(N, C, H, W)
nhwC = nCwh.contiguous(memory_format=torch.channels_last)

new_nhwC = torch.empty_like(nhwC, memory_format=torch.preserve_format)
new_nhwC.is_contiguous(memory_format=torch.channels_last) == True

like_nCwh = torch.empty_like(nCwh, memory_format=torch.preserve_format)
like_nCwh.is_contiguous(memory_format=torch.channels_last) == False
```

Usage of `torch.preserve_format` allows us to avoid `if` constructs.

We can also generate different memory format outputs

```python
nCwh = torch.randn(N, C, H, W)
nhwC = nCwh.contiguous(memory_format=torch.channels_last)

new_nhwC = torch.empty_like(nCwh, memory_format=torch.channels_last)
new_nhwC.is_contiguous(memory_format=torch.channels_last) == True

new_nCwh = torch.empty_like(nhwC, memory_format=torch.contiguous_format)
new_nCwh.is_contiguous(memory_format=torch.channels_last) == False
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20558

Differential Revision: D15502474

Pulled By: VitalyFedyunin

fbshipit-source-id: 2e120d57eefad6fb8e04b8322c79871392f64331
2019-06-26 11:48:27 -07:00
Ailing Zhang
e8bc992b03 print device when it's not on default device (#22094)
Summary:
we used to not print device when it's on xla. It's sometimes confusing as it looks the same as cpu tensor...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22094

Differential Revision: D15975405

Pulled By: ailzhang

fbshipit-source-id: f19ceb9e26f5f2f6e7d659de12716f0dfe065f42
2019-06-25 20:28:50 -07:00
vishwakftw
bcb5fd8f06 Port symeig to ATen and enable batching of inputs (#21858)
Summary:
Changelog:
- Port `symeig` from TH/THC to ATen
- Enable batching of matrix inputs for `symeig`
- Modify derivative computation based on batching
- Update docs to reflect the change
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21858

Test Plan: - Added additional tests in `test_torch.py` (with a port to `test_cuda.py`) and `common_methods_invocations.py` to test if both the port and batching work.

Differential Revision: D15981789

Pulled By: soumith

fbshipit-source-id: ab9af8361f8608db42318aabc8421bd99a1ca7ae
2019-06-25 12:13:27 -07:00
Ailing
c68119387d serialize torch.Size object (#20952)
Summary:
fixes https://github.com/pytorch/pytorch/issues/20823
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20952

Differential Revision: D15514274

Pulled By: ailzhang

fbshipit-source-id: 8340a40fadfd06063f7f33b0d99d693e74d5defb
2019-06-25 10:44:35 -07:00
iurii zdebskyi
ede08492e1 Enabled mul for bool tensors on CUDA (#21771)
Summary:
Enable mul_cuda for bool tensors.
This is a helper PR to fix a [test failure](https://circleci.com/gh/pytorch/pytorch/1992191?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link) in the other [PR](https://github.com/pytorch/pytorch/pull/21113).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21771

Differential Revision: D15883737

Pulled By: izdeby

fbshipit-source-id: 4c39644bbe8e80da4d14570862589944285d4bfe
2019-06-24 18:37:29 -07:00
Jerry Zhang
91bf0a9f9d Move quantized tensor tests in test_torch.py to test_quantized_tensor.py (#22089)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22089

att

Reviewed By: jianyuh

Differential Revision: D15950101

fbshipit-source-id: 70acdeeef3a05201d72f986d5a0005832efd75ff
2019-06-21 22:48:34 -07:00
Jerry Zhang
3838324539 Add max/min/argmax/argmin/sort/argsort for quantized Tensor (#21546)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21546

Added following methods for QTensor:
- max, min
- argmax, argmin
- sort, argsort

Reviewed By: dzhulgakov

Differential Revision: D15718117

fbshipit-source-id: 746b978d5722cb75e216fc65585bf206d45a7969
2019-06-20 21:00:03 -07:00
Jerry Zhang
88921feafd change return type for q_scale and q_zero_point (#21709)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21709

Change the return type from Scalar to double/int64_t so we don't need to do conversion when we call other quantize related aten functions

Differential Revision: D15793003

fbshipit-source-id: 510936c69fa17a4d67340a31ebb03415647feb04
2019-06-20 20:30:39 -07:00
Igor Fedan
abd6cffe55 Added some extra tests for std_mean and var_mean for multiple dims. (#20650)
Summary:
Added some extra tests for std_mean and var_mean for multiple dims.
Some refactoring of previously created tests based on PR comments: https://github.com/pytorch/pytorch/pull/18731
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20650

Differential Revision: D15396101

Pulled By: ifedan

fbshipit-source-id: d15c3c2c7084a24d6cfea4018173552fcc9c03a9
2019-06-18 20:36:32 -07:00
Jerry Zhang
fa5263af2c Add set_quantizer_ for QTensor (#21852)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21852

To enable change of q_scale and q_zero_point in `copy_`

Differential Revision: D15793427

fbshipit-source-id: a7040b5b956d161fd6af6176287f4a4aa877c9be
2019-06-18 19:50:12 -07:00
Iurii Zdebskyi
c0f51142cd Added a test case for an index error for the index_copy_ (#21912)
Summary:
Follow up PR with a test for the [fixed](4b45f08f87) [bug](https://github.com/pytorch/pytorch/issues/20322).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21912

Differential Revision: D15878674

Pulled By: izdeby

fbshipit-source-id: c8fef2214606c796d174d0faaaf633531a7bea88
2019-06-18 13:43:56 -07:00
Jerry Zhang
94f903654c Add qscheme() method (#20608)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20608

Exposing QScheme in python as Python objects like `torch.qscheme.per_tensor_affine` etc.

Reviewed By: zafartahirov

Differential Revision: D15364354

fbshipit-source-id: 4d6a96d67e9ead051cf4a8f934553a8c7232fdb7
2019-06-14 16:29:29 -07:00
Stefan Krah
710821875a Fix flaky nuclear_norm() test (#21638)
Summary:
Try to fix a sporadic failure on some CIs.

I've run this test hundreds of times on my machine (GeForce 1060, MAGMA) but I cannot reproduce this.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21638

Differential Revision: D15827779

Pulled By: ezyang

fbshipit-source-id: 3586075e48907b3b84a101c560a34cc733514a02
2019-06-14 11:40:03 -07:00
LeviViana
deb2140c6e Throwing errors for min and max reductions in empty CUDA tensors (#19612)
Summary:
Related to https://github.com/pytorch/pytorch/issues/17750.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19612

Differential Revision: D15813649

Pulled By: gchanan

fbshipit-source-id: aa3dc34dd1e6d8bb24fa4c18891204108759bb35
2019-06-14 08:34:30 -07:00
Sungmann Cho
f59581218f Fix spelling errors (#21665)
Summary:
alloctor -> allocator
excutable -> executable
excution -> execution
foward -> forward
initiaize -> initialize
paralell -> parallel
preprocesor -> preprocessor
tranpose -> transpose
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21665

Differential Revision: D15806155

Pulled By: soumith

fbshipit-source-id: d92b21ec8650a2b32f05faf9af0b7d2b073e992c
2019-06-13 15:21:55 -07:00
vishwakftw
4c03ac7ac4 Allow batch sizes > 65535 for inverse, solve, cholesky_solve and tria… (#21689)
Summary:
…ngular_solve

Changelog:
- Iterate over mini batches of 65535 matrices (maximum)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21689

Differential Revision: D15800254

Pulled By: soumith

fbshipit-source-id: c743ff13f1ba25d26874429d44e41a3c0ed21d6a
2019-06-12 23:30:19 -07:00
Brennan Vincent
699de487db numerical integration "trapz" function. (#21610)
Summary:
This is intended to match [numpy.trapz](https://docs.scipy.org/doc/numpy/reference/generated/numpy.trapz.html): numerical integration based on the trapezoid rule.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21610

Differential Revision: D15747618

Pulled By: umanwizard

fbshipit-source-id: 8eadb2e75c9877b07592d875ca0b2cca6cb72297
2019-06-12 15:30:13 -07:00
Syed Tousif Ahmed
ae342fd076 Refactor Random Number Generators in ATen (#21364)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21364
ghimport-source-id: ca7d37e10190ba46dc8512f437404ca9216d3369

Differential Revision: D15696497

Pulled By: ezyang

fbshipit-source-id: 2e713b8566ae915e175b5a79ac1dd9b86cc2a23d
2019-06-12 13:01:30 -07:00
vishwakftw
9737b166a4 Fix bug in multinomial_alias_draw (#21324)
Summary:
An incorrect increment / decrement caused the samples to not be generated from a multinomial distribution

Changelog:
- Remove the incorrect increment / decrement operation

Fixes https://github.com/pytorch/pytorch/issues/21257, fixes https://github.com/pytorch/pytorch/issues/21508

cc: LeviViana neerajprad
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21324

Differential Revision: D15761029

Pulled By: colesbury

fbshipit-source-id: 2aeb51e2d3cfdb8356806a7d5b12d4b9910e37fb
2019-06-11 15:18:17 -07:00
Stefan Krah
8b9b215dc5 Add a 'dim' argument to nuclear norm (#21022)
Summary:
Addresses #18275.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21022

Differential Revision: D15743515

Pulled By: ezyang

fbshipit-source-id: e4aaea0bd7f863a2abad45c4322d6a9fb02a88e3
2019-06-10 15:18:34 -07:00
Sam Gross
5a48642fde Revert D15717575: [pytorch][PR] Fix bug in multinomial_alias_draw
Differential Revision:
D15717575

Original commit changeset: b1154e226d42

fbshipit-source-id: 305ca010bfda88c9295c52e0626d867452c72f84
2019-06-10 13:28:11 -07:00
vishwakftw
bb1dbdb99b Fix bug in multinomial_alias_draw (#21324)
Summary:
An incorrect increment / decrement caused the samples to not be generated from a multinomial distribution

Changelog:
- Remove the incorrect increment / decrement operation

Fixes #21257, fixes #21508

cc: LeviViana neerajprad
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21324

Differential Revision: D15717575

Pulled By: ezyang

fbshipit-source-id: b1154e226d426c0d412d360c15f7c64aec95d101
2019-06-10 12:05:48 -07:00
Brennan Vincent
74828be4a7 fix segfault in cat on CPU with tensors that can't be indexed with 32-bit ints. (#21530)
Summary:
Should be self-explanatory. This `int` variable is overflowing.

Reported in #21526
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21530

Differential Revision: D15719275

Pulled By: umanwizard

fbshipit-source-id: 24e917a00a5b78bc3af29ef3b8b72eea7e89d5d5
2019-06-09 15:28:06 -07:00
Gregory Chanan
b7f5d1e4c6 Fix size of histc return on CPU when input is 0-dimensional and bins=1. (#21497)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21497
ghimport-source-id: bc03f27408aa772f78d5351afe404b5e91a7c4ce

Differential Revision: D15715478

Pulled By: gchanan

fbshipit-source-id: 90e1b65249b4b12f936ee8877cc0bc5a972d9ceb
2019-06-07 11:23:46 -07:00
Brennan Vincent
0a3fb45d3d allow passing Python built-in types as dtypes (#21215)
Summary:
Another simple bit of syntax that NumPy supports and we don't.

Support int, float, and bool.

```python
>>> torch.randn((2,3), dtype=float)
tensor([[-0.1752, -0.3240, -0.6148],
        [ 0.1861,  1.6472,  0.1687]], dtype=torch.float64)
```

A bit confusingly, Python's "float" actually means double, but nothing we can do about that.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21215

Differential Revision: D15697012

Pulled By: umanwizard

fbshipit-source-id: 9a38d960a610b8e67023486b0c9265edd3c22246
2019-06-06 13:17:23 -07:00
Brennan Vincent
f4f32cecfd numpy like nonzero (called nonzero_tuple) (#20293)
Summary:
No performance degradation compared to Numpy when indexing:

```
In [15]: x=torch.randn((1000,1000))

In [16]: %timeit x[x.nonzero_tuple()]
4.63 ms ± 102 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [17]: y=x.numpy()

In [18]: %timeit y[y.nonzero()]
14.6 ms ± 281 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [20]: x=x.t()

In [22]: %timeit x[x.nonzero_tuple()]
9.01 ms ± 626 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [24]: y=x.numpy()

In [25]: %timeit y[y.nonzero()]
16.8 ms ± 770 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20293

Differential Revision: D15358754

Pulled By: umanwizard

fbshipit-source-id: 1344aabd95c969eeda9780c475a39551231879e1
2019-06-06 12:50:59 -07:00
Iurii Zdebskyi
2e37ab85af Enable bool support for several index methods (#21435)
Summary:
Enable bool tensors for these index methods:
- index_select
- index_copy
- put
- take
- index_fill

Tested via unit tests

TODO:
Enable index_add in a separate PR as it requires more "side" changes.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21435

Differential Revision: D15684964

Pulled By: izdeby

fbshipit-source-id: 48440e4d44873d70c4577e017dd0d8977e0fa15a
2019-06-06 12:14:01 -07:00
Iurii Zdebskyi
f1adddd1c6 Updated sum() logic to properly deal with bool tensor (#21421)
Summary:
`torch.tensor([True, False, True], dtype=torch.bool).sum()` should return **2** instead of **True** as it does now.

Tested via unit tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21421

Differential Revision: D15674203

Pulled By: izdeby

fbshipit-source-id: b00e3d0ca809c9b92b750adc05632522dad50c74
2019-06-06 12:02:23 -07:00
Hong Xu
f891b4338a Test the exceptions raised by isfinite and isinf (#21168)
Summary:
Following up ef1fdc27a3
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21168

Differential Revision: D15696615

Pulled By: ezyang

fbshipit-source-id: 46904974ef3c4cb87c7a1d06871bf01543e61ef2
2019-06-06 10:30:26 -07:00
Iurii Zdebskyi
03617574d3 Сhange type of a tensor with bools (#19097)
Summary:
**This is **bc-breaking** change**
Change dtype of a tensor which was created from bool data.
Old behavior: torch.tensor([True, False]) -> uint8 tensor
Now: torch.tensor([True, False]) -> bool tensor

Tested via tests.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19097

Reviewed By: ezyang

Differential Revision: D15632553

Pulled By: izdeby

fbshipit-source-id: b019150844c561a6845710a3c62b12f06b68bbe3
2019-06-05 10:19:27 -07:00
Brennan Vincent
e268fc97c3 Re-add Tensor.T (#21175)
Summary:
Something flaky is going on with `test_inplace_view_saved_output` on Windows.

With my PR #20598 applied, the test fails, even though there is no obvious reason it should be related, so the PR was reverted.

Based on commenting out various parts of my change and re-building, I think the problem is with the name -- renaming everything from `T` to `asdf` seems to make the test stop failing. I can't be sure that this is actually the case though, since I could just be seeing patterns in non-deterministic build output...

I spoke with colesbury offline and we agreed that it is okay to just disable this test on Windows for now and not block landing the main change. He will look into why it is failing.

**Test Plan:** I will wait to make sure the Windows CI suite passes before landing this.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21175

Differential Revision: D15566970

Pulled By: umanwizard

fbshipit-source-id: edf223375d41faaab0a3a14dca50841f08030da3
2019-06-04 17:38:25 -07:00
Igor Fedan
d348d6405c cdist: pairwise distances between two sets of tensors with batch mode (#20934)
Summary:
Batch implementation for cdist function
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20934

Differential Revision: D15609458

Pulled By: ifedan

fbshipit-source-id: 31c12e120d168baec6a6af913f599838a44034d7
2019-06-04 15:52:52 -07:00
Natalia Gimelshein
ad971a37d0 Improve performance of advanced indexing backward (#20557)
Summary:
This PR improves performance of advanced indexing backward, partially solving #15245 (performance is still worse than gather, but not by such outrageous margins). Before, using benchmarking harness from #15245, cuda 10/V100:
```
Indexing is faster by at most -270.61607820767887 us on N: 16 D: 256 K: 1
Indexing is slower by at most 11127.466280784833 us on N: 16 D: 4096 K: 4096
```
after:
```
Indexing is faster by at most 23.524456737696028 us on N: 512 D: 4096 K: 4096
Indexing is slower by at most 186.24056029472553 us on N: 16 D: 1024 K: 4096
```
Strategy is to reuse embedding backward kernel, adapting it to handle unindexed dimensions in the beginning by launching additional threadblocks, and also allowing it to handle slices that are bigger than `65K*128`, that is hardly ever a problem for embedding. Still, integer indexing is baked in the kernel, and is important for performance, so for now bigger than 2G element tensors are not supported.
The main savings come from not having to expand index to all unindexed dimensions, and not sorting expanded index with incoming gradient values, but rather only sorting unexpanded index.
There are ways to make sorting overhead smaller (thanks mcarilli for suggestions) but I'll get to it when it becomes a real problem, or rather, when cuda graphs will force us to get rid of thrust::sort calls.
I've also added tests for indexing backward, before tests for index_put_ and indexing backward were non-existent.
This PR also fixes #20457 by casting indices to `self` backend.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20557

Differential Revision: D15582434

Pulled By: ezyang

fbshipit-source-id: 91e8f2769580588ec7d18823d99a26f1c0da8e2a
2019-06-03 11:38:53 -07:00
Jerry Zhang
7f960a9c01 remove quantize_linear from Tensor method (#21196)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21196

we'll add `quantize(quantizer)` as a tensor method later when we expose `quantizer` in Python frontend
Python
```
torch.quantize_linear(t, ...)
```
C++
```
at::quantize_linear(t, ...)
```

Differential Revision: D15577123

fbshipit-source-id: d0abeea488418fa9ab212f84b0b97ee237124240
2019-05-31 12:01:10 -07:00
Edward Yang
e161360b62 Revert D15558784: [reland][pt1][quant] remove quantize_linear from Tensor method
Differential Revision:
D15558784

Original commit changeset: 0b194750c423

fbshipit-source-id: d180a7f76bb05ad7470f17bc3d2bd614fab16529
2019-05-31 06:20:05 -07:00
Jerry Zhang
f91f24764e remove quantize_linear from Tensor method (#21156)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21156

we'll add `quantize(quantizer)` as a tensor method later when we expose `quantizer` in Python frontend
Python
```
torch.quantize_linear(t, ...)
```
C++
```
at::quantize_linear(t, ...)
```

Differential Revision: D15558784

fbshipit-source-id: 0b194750c423f51ad1ad5e9387a12b4d58d969a9
2019-05-30 22:02:12 -07:00
Jerry Zhang
277bf69fa0 Add torch.load/torch.save for QTensor (#20830)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20830

att

Reviewed By: dzhulgakov

Differential Revision: D15340701

fbshipit-source-id: 677038c8101f66dec4856c2eccf9f9e394012226
2019-05-30 20:52:19 -07:00
Iurii Zdebskyi
64f06d4964 Enable all and any for bool tensors (#21033)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21033
ghimport-source-id: 35fdcf27b0bde8ec3e5b3051cf0d730f20f94783

Differential Revision: D15530497

Pulled By: izdeby

fbshipit-source-id: 9c15cc960055f59a05ce0276f9d51c567626d966
2019-05-30 16:16:00 -07:00
Iurii Zdebskyi
9a22cb9f49 Enabled add, sum and mul for bool tensor (#21032)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21032
ghimport-source-id: 6ab21752b4af451e8b10a0e02cd5d726aa7472f0

Differential Revision: D15530496

Pulled By: izdeby

fbshipit-source-id: f4f83aa80eafbb4f307aadc1a13d8cdcf3055c24
2019-05-30 16:11:43 -07:00
Edward Yang
c4a90ca18e Revert D15477933: [pt1][quant] remove quantize_linear and dequantize from Tensor method
Differential Revision:
D15477933

Original commit changeset: c8aa81f681e0

fbshipit-source-id: ec494fbbab72e20da262bdd8657887e1fdd173cb
2019-05-30 05:04:12 -07:00
Jerry Zhang
67291ba74f remove quantize_linear and dequantize from Tensor method (#20874)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20874

A criteria for what should go in Tensor method is whether numpy has it, for this one it does not
so we are removing it as a Tensor method, we can still call it as function.
Python
```
torch.quantize_linear(t, ...), torch.dequantize(t)
```
C++
```
at::quantize_linear(t, ...), at::dequantize(t)
```

Reviewed By: dzhulgakov

Differential Revision: D15477933

fbshipit-source-id: c8aa81f681e02f038d72e44f0c700632f1af8437
2019-05-29 19:17:16 -07:00
Jerry Zhang
4900edebcf QTensor permute, transpose and contiguous (#20869)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20869

Adding support for the functions listed in the title, by implementing the copy kernel.

Differential Revision: D15474060

fbshipit-source-id: 9264df6e442cca1cc5d952e3e5dcc9f4a426f317
2019-05-29 16:05:53 -07:00
Jerry Zhang
157fcfc07d Add quantize_linear_per_channel (#20765)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20765

att

Reviewed By: dskhudia

Differential Revision: D15435455

fbshipit-source-id: 77770044411ce8ee02d26d63eb7e79cd10db103e
2019-05-29 14:29:16 -07:00
Yashodhan Ghadge
0ffd20c268 Fix empty tensor for unique_dim (#19000)
Summary:
Fixes: #18408

cc: zasdfgbnm
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19000

Reviewed By: ezyang

Differential Revision: D15470136

Pulled By: VitalyFedyunin

fbshipit-source-id: daf71566b4dbdc91927d164f813b5ee8645af1a2
2019-05-29 13:50:32 -07:00
Iurii Zdebskyi
7cb1aa67b0 Enabled min, max, minall, maxall, cmin, cmax, cmaxValue, cminValue for bool tensors (#21031)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21031
ghimport-source-id: 379b3e9d20872eb5ad14403ed6751cdb0e730bc5

Reviewed By: ezyang

Differential Revision: D15530499

Pulled By: izdeby

fbshipit-source-id: f113d6974ee18ac3dfb5c0bcff66865345d137d2
2019-05-29 13:22:54 -07:00
Jerry Zhang
94b9706017 fix dequantize_linear (#21035)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21035

Fix the dtype error in `dequantize_linear`, it should accept the same dtype argument as `quantize_linear`

Differential Revision: D15521931

fbshipit-source-id: 0114c046a3f1046e42fca49c74c85e487fee8616
2019-05-29 12:18:15 -07:00
Edward Yang
0544a491d5 Revert D15499749: [pytorch][PR] Add Tensor.T attribute to reverse dimensions
Differential Revision:
D15499749

Original commit changeset: f3306b496667

fbshipit-source-id: 7f50431d2ea37bc41bfed62f386ddedea1412878
2019-05-29 04:29:48 -07:00
vishwakftw
f6ec464890 Enable batched QR decomposition and add a some option (#20689)
Summary:
This PR covers two important points with respect to the QR decomposition:
- batching of input matrices (#7500)
- adding `some` as an option in `torch.qr` akin to NumPy's `mode` option (#10538)

Changelog:
- Enable batching for inputs to `torch.qr`
- Move QR decomposition implementation to ATen (CPU and CUDA)
- Remove existing implementations in TH/THC
- Add a `some` option to `torch.qr` that will enable users to switch between complete and reduced decomposition
- Modify doc strings
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20689

Differential Revision: D15529230

Pulled By: soumith

fbshipit-source-id: 16af82b1d2db8a3a758fa8a5f798d83f5f950efb
2019-05-28 17:52:37 -07:00
Brennan Vincent
9294de8c9f Add Tensor.T attribute to reverse dimensions (#20598)
Summary:
For compatibility with numpy
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20598

Differential Revision: D15499749

Pulled By: umanwizard

fbshipit-source-id: f3306b496667f20169e9b28db3150d12183703bc
2019-05-28 16:59:06 -07:00
Nishant Pandit
9d9751f634 Convert dequantize_linear to an internal function _dequantize_linear (#20938)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20938

Dequantize_linear need not be exposed to the front end users.
It will only be used for the jit passes for q-dq insertion and op
substitution.

Differential Revision: D15446097

fbshipit-source-id: a5fbcf2bb72115122c9653e5089d014e2a2e891d
2019-05-27 15:40:21 -07:00
Brennan Vincent
c46c6a4fe6 Zero slice bug (#20914)
Summary:
Bug reported internally at FB:

```python
>>> t=torch.from_numpy(np.empty((0,4)))
>>> t[:,1::2]*=1
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
RuntimeError: Trying to resize storage that is not resizable at ../aten/src/TH/THStorageFunctions.cpp:76
```

This happens because the storage offset of `t[:, 1::2]` is 1, and it has 0 elements. We can fix this by avoiding resizing the storage for no-element arrays.

(We could *also* have avoided it by not modifying the storage index in this case, but I felt this way was more semantically correct -- in general, we should not be assuming it's okay to do anything to the storage when it has zero elements).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20914

Differential Revision: D15497860

Pulled By: umanwizard

fbshipit-source-id: 6af61d73a05edfc5c07ce8be9e530f15bf72e6a9
2019-05-24 15:10:59 -07:00
Sam Gross
dee11a92c1 Use Device instead of Backend in TensorIterator (#20690)
Summary:
This PR also moves Device::validate into the header file, which makes
statements like `Device d = kCPU` effectively free.

Device includes the device's index, so TensorIterator::compute_types
now implicitly checks that all CUDA inputs are on the same GPU.
Previously, this was done ad-hoc in places like TensorIterator::binary_op.

Note that zero-dim Tensor (scalars) are NOT required to be on the
same device as other inputs because they behave almost like Python numbers.
TensorIterator handles copying zero-dim Tensors to the common device.

Prior to this PR, TensorIterator would copy zero-dim Tensors between CPU
and GPU, but not between different GPUs (because Backend didn't encode
the GPU index). This removes that restriction.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20690

Differential Revision: D15414826

Pulled By: colesbury

fbshipit-source-id: 1d0ad1f7d663252af36dd4590bcda418c2f7a09f
2019-05-24 12:14:08 -07:00
Jerry Zhang
9ea009fe8b Add as_quantized_tensor (#20740)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20740

Provide a way to assemble quantized Tensor from int8 Tensor, scale and zero point.

Differential Revision: D15232416

fbshipit-source-id: c3a3d9d7214b1dc569214c019440c2779fbd063b
2019-05-22 15:19:45 -07:00
Sam Gross
7a0c6d528a Fix copy_transpose_valid check (#20759)
Summary:
Fixes #20755

(Was broken in #20685)

cc vadimkantorov
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20759

Differential Revision: D15433712

Pulled By: colesbury

fbshipit-source-id: 29f612f7d4d7b73158d6f5dc1e46fd2f8fb09a2f
2019-05-21 15:37:37 -07:00
Jerry Zhang
cca923c481 Add dequantize_linear for JIT pass (#20107)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20107

att

Reviewed By: nishantpdce

Differential Revision: D15202187

fbshipit-source-id: 7d6274a67fcca695c0425587f35046fecbc2ccdc
2019-05-21 12:26:48 -07:00
Brennan Vincent
987f1ccf49 Add "ndim" property to tensor (#20565)
Summary:
For compatibility with numpy.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20565

Differential Revision: D15374390

Pulled By: umanwizard

fbshipit-source-id: 4ab209a5fb27d8ba27ee7eb6b67b858ce2480594
2019-05-20 16:10:50 -07:00
Iurii Zdebskyi
71260b98e2 Fixed histc return type for CUDA (#20369)
Summary:
Fixing reported [issue](https://github.com/pytorch/pytorch/issues/20208).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20369

Reviewed By: zou3519

Differential Revision: D15300959

Pulled By: izdeby

fbshipit-source-id: 219692f99a66ea433112dfc226132eb6867122cf
2019-05-20 08:08:28 -07:00
Jerry Zhang
85fad0597c Add qint8 type (int8_t) (#19984)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19984

Add qint8 for QTensor, with underlying type of int8_t

Reviewed By: jianyuh

Differential Revision: D15150715

fbshipit-source-id: 57580f599d46f9323af5ce462dbbc464b25e40d7
2019-05-17 20:35:05 -07:00
Stefan Krah
8c9f4c560a Add matmul optimization for the case A.ndim <= 2 && B.ndim >= 3 (#20448)
Summary:
This addresses #18862.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20448

Differential Revision: D15393465

Pulled By: ezyang

fbshipit-source-id: 87e5b0ed8253ea00365f420d98ac96dd4e934028
2019-05-17 09:44:26 -07:00
vishwakftw
690efa5220 Remove checks for CUDA 8 in LU-based tests (#20482)
Summary:
CUDA 8 is no longer supported and removed from CI, so these checks are irrelevant
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20482

Differential Revision: D15393438

Pulled By: ezyang

fbshipit-source-id: ac0979bf660b3314eec502c745e34ce4940bda0e
2019-05-17 08:51:56 -07:00
Jerry Zhang
220e6894c5 Rename qint8 data type (#19932)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19932

In preparation to add int8_t data type for QTensor

Reviewed By: zafartahirov

Differential Revision: D15137838

fbshipit-source-id: 59462c36d6fc5982986d4196bf3f32f49bb294d7
2019-05-16 18:09:28 -07:00
Vitaly Fedyunin
a837c00acd Removing unnecessary comments (+fix flake8)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20589

Differential Revision: D15373655

Pulled By: VitalyFedyunin

fbshipit-source-id: 25277648d3e8f8a09cec7569ceda56e74c2ef0b1
2019-05-16 09:19:34 -07:00
Vitaly Fedyunin
5b78a5eadb Memory format support for contiguous and is_contiguous (#20455)
Summary:
#19975 was separated by 2 PRs.

This one:

Introduce MemoryFormat argument to the `x.is_contiguous(memory_format=torch.channels_last)` and to the `y = x.contiguous(memory_format=torch.channels_last)` functions.

At this moment both functions just operate with strides and doesn't store any tensor state.

(Original RFC #19092)

-----

Expands functionality of two tensor functions `.is_contiguous` and `.contiguous` (both python and c++ api).

Note: We had several complaints about `.to(memory_format)` function, and decided not to support it.

1.  `.contiguous` now support optional keyword-only argument - `memory_format`, which can be either `torch.contiguous_format` or `torch.channels_last`.

    - Using `torch.contiguous_format` will preserve existing `.contiguous()` behavior.

    - Calling `x.contiguous(memory_format=torch.channels_last)` returns new tensor which maintain same semantical layout (NCHW), but have different memory allocation pattern.

        `x.contiguous(memory_format=torch.channels_last)` expects input tensor to be 3d, 4d or 5d; and fails otherwise.

2. `.is_contiguous` now support optional keyword-only argument - `memory_format`, which can be either `torch.contiguous_format` or `torch.channels_last`.

    - `x.is_contiguous(memory_format=torch.contiguous_format)` preserves same functionality as `x.is_contiguous()` and remains unchanged.

    - `x.is_contiguous(memory_format=torch.channels_last)` returns true if A) input tensor is contiguous in memory AND B) allocated in the memory in NWHC (or similar for 3d,5d) format.

Note: By the end of the phase one `x.is_contiguous(memory_format=torch.channels_last)` will calculate state of the Tensor on every call. This functionality going to be updated later.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20455

Differential Revision: D15341577

Pulled By: VitalyFedyunin

fbshipit-source-id: bbb6b4159a8a49149110ad321109a3742383185d
2019-05-16 07:18:24 -07:00
Jerry Zhang
abb3698976 Add QInt32 ScalarType and qint32 data type (#19816)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19816

We need this for quantization for bias
add third argument of ScalarType to `quantize_linear`

Differential Revision: D15094174

fbshipit-source-id: f19ec8f4716cf5fe0aa21b38d45af6d27c9ab377
2019-05-15 18:50:18 -07:00
Igor Fedan
4c23c34e79 Computing var/stddev and mean at the same time (#18731)
Summary:
The current variance kernels compute mean at the same time. Many times we want both statistics together, so it seems reasonable to have a kwarg/function that allows us to get both values without launching an extra kernel.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18731

Differential Revision: D14726082

Pulled By: ifedan

fbshipit-source-id: 473cba0227b69eb2240dca5e61a8f4366df0e029
2019-05-15 16:42:38 -07:00
Brennan Vincent
72bb84c518 Provide a few default args for numpy translation (#20451)
Summary:
Add automatic translations for a few argument names that commonly differ between PyTorch and NumPy.

For now, they are as follows:

* `keepdim` -> `keepdims`
* `dim` -> `axis`
* `input` -> (any of `a`, `x`, `x1`)
* `other` -> `x2`

Basic examples:
```python
>>> t=torch.randn(10,10)
>>> torch.sum(x=t, axis=1)
tensor([ 0.5199, -0.3768,  4.3619, -0.9105,  1.1804,  1.0837, -0.9036,  0.2365,
         1.1171, -0.0999])
```
```python
>>> torch.add(x1=5, x2=6)
tensor(11)
```

The additional overhead is zero when using traditional PyTorch argument names, and a few (usually 1) extra PyDict lookups when using NumPy argument names.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20451

Differential Revision: D15337521

Pulled By: umanwizard

fbshipit-source-id: 7a7d389786f4ccf5c86a14ecb2002c61730c51b5
2019-05-15 10:13:17 -07:00
Philipp Lang
f23fb66e6e Fix in file position logic: file descriptor and Python-side handle (#20270)
Summary:
This addresses #18436

The logic replicates the essence of closing file descriptors in numpy:
bf20e30340/numpy/core/include/numpy/npy_3kcompat.h (L278)

This stores the position of the file descriptor before resetting it to the Python handle offset, then resets to the original position before exit. The Python-side handle is then updated to reflect the new position. Also added somewhat more demanding tests to cover this.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20270

Differential Revision: D15275902

Pulled By: soumith

fbshipit-source-id: 5ca8a52b61c7718d2e69571f72f80b1350b0acdb
2019-05-09 08:20:01 -07:00
Brian Vaughan
c406bf20a0 error instead of crashing on attempt to subclass typed tensors (#20283)
Summary:
https://github.com/pytorch/pytorch/issues/20052

typed tensors (e.g. torch.FloatTensor) can't be subclassed. Was causing
crashes and other errors.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20283

Differential Revision: D15278138

Pulled By: nairbv

fbshipit-source-id: 8493eac4d34dfb76b054362bf0acec02146cd0e2
2019-05-09 07:10:38 -07:00
Ilia Cherniavskii
481b6d0268 Allow a non-OpenMP based build (#19749)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19749
ghimport-source-id: a6636c0acddbdc5fd5b0dcb20b9f80cbdb9159b9

Differential Revision: D15141993

Pulled By: ilia-cher

fbshipit-source-id: 96085608398b2a4c97c68b2948f5184d07f9ad3d
2019-05-06 19:34:48 -07:00
Jerry Zhang
17268a9225 Add print function for QTensor (#19513)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19513

Add support for printing a QTensor in python frontend

Differential Revision: D15017168

fbshipit-source-id: 312d1f18e6ca3c9eb4a5b8bb1c64f7cc8bc1dcf5
2019-05-06 13:12:43 -07:00
Iurii Zdebskyi
ca57dd9332 Fixed log_normal and geometric for CPU (#19938)
Summary:
log_normal_ and geometric_ were disabled for CPU by mistake in [this PR](bc53805f2e), this PR fixes it.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19938

Differential Revision: D15143404

Pulled By: izdeby

fbshipit-source-id: 41c7bd29f046b5a3ac6d601de8c64ab553771d19
2019-04-30 12:18:10 -07:00
iurii zdebskyi
aa6403bae6 Added .bool() method
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/19928

Differential Revision: D15131923

Pulled By: izdeby

fbshipit-source-id: 3909cf4623fe85e98ceaf57fbb57745919899445
2019-04-30 10:34:31 -07:00
iurii zdebskyi
de19eeee99 Enabled masked for a bool tensor (#19140)
Summary:
Added deprecation warnings for the masked methods and enabled them for a bool tensor.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19140

Differential Revision: D14888021

Pulled By: izdeby

fbshipit-source-id: 0e42daf8f3732ca29f36d10485402bfc502716ad
2019-04-29 10:40:12 -07:00
Xiaomeng Yang
2ce39de3fc Add elementwise_affine for layer_norm_op (#19713)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19713

Add elementwise_affine for layer_norm_op

Reviewed By: houseroad

Differential Revision: D15075454

fbshipit-source-id: e8a7d3da1c81e49fa55323f5e74a68bc4ef8d83f
2019-04-26 17:20:01 -07:00
Jerry Zhang
6ec55c13a9 Enable assignment for QTensor in pytorch frontend (#19676)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19676

Make copy work with QTensor, enable assignment of QTensor in pytorch frontend.

Differential Revision: D15064710

fbshipit-source-id: 04f2dc02a825695d41fa1114bfca49e92108fef3
2019-04-24 16:05:34 -07:00
Alex Şuhan
4a65ee95cc Make torch.equal work with boolean CPU tensors
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/19604

Differential Revision: D15056022

Pulled By: li-roy

fbshipit-source-id: 1309b107b2d4ee0a490bce1b43c3c175180a1580
2019-04-24 15:51:10 -07:00
Vitaly Fedyunin
d14abe3aff Add torch.from_file function similar to the Storage.from_file, but returning tensor (#18688)
Summary:
Porting `torch.Storage.from_file(filename, shared, size)` function to `torch.from_file(filename, shared, size, dtype=torch.int)`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18688

Differential Revision: D15012644

Pulled By: VitalyFedyunin

fbshipit-source-id: 3f62ca9e414fad3847fe71b785ff97b5bdc2d2cd
2019-04-24 15:38:56 -07:00
Edward Yang
c42f3f9055 Revert D15008160: Enable assignment for QTensor in pytorch frontend
Differential Revision:
D15008160

Original commit changeset: 5f1166246d76

fbshipit-source-id: 24c7350431ae6a87199d6e3f7ffbbc8ec7d3c28b
2019-04-24 06:58:13 -07:00
Jerry Zhang
309c15e2df Enable assignment for QTensor in pytorch frontend (#19530)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19530
Make copy work with QTensor, enable assignment of QTensor in pytorch frontend.

Differential Revision: D15008160

fbshipit-source-id: 5f1166246d768b23f009cde1fa03e8952368a332
2019-04-23 21:29:31 -07:00
Phúc Lê
9b272affde Add base support to torch.logspace, default base=10 (#19542)
Summary:
Add base support for torch.logspace. See #19220 for details.
SsnL can you feedback? Thanks a lot.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19542

Differential Revision: D15028484

Pulled By: soumith

fbshipit-source-id: fe5a58a203b279103abbc192c754c25d5031498e
2019-04-23 15:06:34 -07:00
jhultman
f767c9ac76 Add docs and test guaranteeing indices from torch.nonzero ordered C-style (#19539)
Summary:
See #17556.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19539

Differential Revision: D15030151

Pulled By: ezyang

fbshipit-source-id: d46ee56a66d89b0113f86e3f8693dc1680d0adb9
2019-04-23 09:29:21 -07:00
Tongzhou Wang
3b4d4ef503 Remove unnecessary printing from tests
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/19606

Differential Revision: D15046583

Pulled By: ezyang

fbshipit-source-id: ea9bb691d23855e7eddbabe68bf112a726641ba4
2019-04-23 09:24:08 -07:00
vishwakftw
c30224ad21 Rename potri to cholesky_inverse (#19498)
Summary:
Changelog:
- Rename `potri` to `cholesky_inverse` to remain consistent with names of `cholesky` methods (`cholesky`, `cholesky_solve`)
- Fix all callsites
- Rename all tests
- Create a tentative alias for `cholesky_inverse` under the name `potri` and add a deprecation warning to not promote usage
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19498

Differential Revision: D15029901

Pulled By: ezyang

fbshipit-source-id: 2074286dc93d8744cdc9a45d54644fe57df3a57a
2019-04-22 08:18:39 -07:00
Jerry Zhang
fc1aadec3b Make empty_affine_quantized private (#19446)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19446

change empty_affine_quantized to _empty_affine_quantized

Reviewed By: dzhulgakov

Differential Revision: D15008757

fbshipit-source-id: c7699ac0c208a8f17d88e95193970c75ba7219d3
2019-04-19 11:21:44 -07:00
Xiang Gao
e1750754c8 Step 4: add support for unique with dim=None (#18651)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18651
ghimport-source-id: e11988130a3f9a73529de0b0d08b4ec25fbc639c

Differential Revision: D15000463

Pulled By: VitalyFedyunin

fbshipit-source-id: 9e258e473dea6a3fc2307da2119b887ba3f7934a
2019-04-18 18:28:07 -07:00
Ailing Zhang
88f70a1670 Fix pickling torch.float32 (#18045)
Summary:
Attempt fix for #14057 . This PR fixes the example script in the issue.
The old behavior is a bit confusing here. What happened to pickling is python2 failed to recognize `torch.float32` is in module `torch`, thus it's looking for `torch.float32` in module `__main__`. Python3 is smart enough to handle it.
According to the doc [here](https://docs.python.org/2/library/pickle.html#object.__reduce__), it seems `__reduce__` should return `float32` instead of the old name `torch.float32`. In this way python2 is able to find `float32` in `torch` module.
> If a string is returned, it names a global variable whose contents are pickled as normal. The string returned by __reduce__() should be the object’s local name relative to its module
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18045

Differential Revision: D14990638

Pulled By: ailzhang

fbshipit-source-id: 816b97d63a934a5dda1a910312ad69f120b0b4de
2019-04-18 12:28:10 -07:00
Jerry Zhang
ad8f34fcca Add empty_quantized (#18960)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18960

empty_affine_quantized creates an empty affine quantized Tensor from scratch.
We might need this when we implement quantized operators.

Differential Revision: D14810261

fbshipit-source-id: f07d8bf89822d02a202ee81c78a17aa4b3e571cc
2019-04-17 16:17:40 -07:00
Richard Zou
eaa14f5f59 Error out on in-place binops on tensors with internal overlap (#19317)
Summary:
This adds checks for `mul_`, `add_`, `sub_`, `div_`, the most common
binops. See #17935 for more details.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19317

Differential Revision: D14972399

Pulled By: zou3519

fbshipit-source-id: b9de331dbdb2544ee859ded725a5b5659bfd11d2
2019-04-17 13:02:07 -07:00
Junjie Bai
33443d083e Fix python lint (#19331)
Summary:
VitalyFedyunin jerryzh168
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19331

Differential Revision: D14969435

Pulled By: bddppq

fbshipit-source-id: c1555c52064758ecbe668f92b837f2d7524f6118
2019-04-16 21:47:30 -07:00
Jerry Zhang
06c28d8a12 Add slicing and int_repr() to QTensor (#19296)
Summary:
Stack:
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; **#19296 [pt1][quant] Add slicing and int_repr() to QTensor**&nbsp;&nbsp;[💛](https://our.intern.facebook.com/intern/diff/D14756833/)
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; #18960 [pt1][quant] Add empty_quantized&nbsp;&nbsp;[💛](https://our.intern.facebook.com/intern/diff/D14810261/)
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; #19312 Use the QTensor with QReLU&nbsp;&nbsp;[💛](https://our.intern.facebook.com/intern/diff/D14819460/)
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; #19319 [RFC] Quantized SumRelu&nbsp;&nbsp;[💛](https://our.intern.facebook.com/intern/diff/D14866442/)

Methods added to pytorch python frontend:
- int_repr() returns a CPUByte Tensor which copies the data of QTensor.
- Added as_strided for QTensorImpl which provides support for slicing a QTensor(see test_torch.py)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19296

Differential Revision: D14756833

Pulled By: jerryzh168

fbshipit-source-id: 6f4c92393330e725c4351d6ff5f5fe9ac7c768bf
2019-04-16 20:17:21 -07:00
Xiang Gao
df67969e6b Step 3: Add support for return_counts to torch.unique for dim not None (#18650)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18650
ghimport-source-id: 75759c95e6c48e27c172b919097dbc40c6bfb5e6

Differential Revision: D14892319

Pulled By: VitalyFedyunin

fbshipit-source-id: ec5d1b80fc879d273ac5a534434fd648468dda1e
2019-04-16 14:06:45 -07:00
Vitaly Fedyunin
1c5073fb4b Adding pin_memory kwarg to zeros, ones, empty, ... tensor constructors (#18952)
Summary:
Make it possible to construct a pinned memory tensor without creating a storage first and without calling pin_memory() function. It is also faster, as copy operation is unnecessary.

Supported functions:
```python
torch.rand_like(t, pin_memory=True)
torch.randn_like(t, pin_memory=True)
torch.empty_like(t, pin_memory=True)
torch.full_like(t, 4, pin_memory=True)
torch.zeros_like(t, pin_memory=True)
torch.ones_like(t, pin_memory=True)
torch.tensor([10,11], pin_memory=True)
torch.randn(3, 5, pin_memory=True)
torch.rand(3, pin_memory=True)
torch.zeros(3, pin_memory=True)
torch.randperm(3, pin_memory=True)
torch.empty(6, pin_memory=True)
torch.ones(6, pin_memory=True)
torch.eye(6, pin_memory=True)
torch.arange(3, 5, pin_memory=True)
```

Part of the bigger: `Remove Storage` plan.

Now compatible with both torch scripts:
 `  _1 = torch.zeros([10], dtype=6, layout=0, device=torch.device("cpu"), pin_memory=False)`
and
`  _1 = torch.zeros([10], dtype=6, layout=0, device=torch.device("cpu"))`

Same checked for all similar functions `rand_like`, `empty_like` and others

It is fixed version of #18455
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18952

Differential Revision: D14801792

Pulled By: VitalyFedyunin

fbshipit-source-id: 8dbc61078ff7a637d0ecdb95d4e98f704d5450ba
2019-04-16 11:06:15 -07:00
Jerry Zhang
e1f38a847d Fix type conversion in dequant and add a test (#19226)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19226

Type conversoin was wrong previously. Thanks zafartahirov for finding it!

Differential Revision: D14926610

fbshipit-source-id: 6824f9813137a3d171694d743fbb437a663b1f88
2019-04-16 10:52:44 -07:00
Jerry Zhang
1c836e7bb9 Add Quantized Backend (#18546)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18546

We'll expose all combinations of various ways of quantization in the top level dispatch key, that is we have AffineCPUTensor, PerChannelAffineCUDATensor, etc.

QTensor method added:
- is_quantized()
- item()

Differential Revision: D14637671

fbshipit-source-id: 346bc6ef404a570f0efd34e8793056ad3c7855f5
2019-04-12 12:55:49 -07:00
Xiang Gao
3f7ddd269c Step 2: Rename _unique_dim2_temporary_will_remove_soon to unique_dim (#18649)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18649
ghimport-source-id: 3411d240a6af5fe299a889667964730184e30645

Differential Revision: D14888292

Pulled By: VitalyFedyunin

fbshipit-source-id: 80da83c264598f74ab8decb165da4a1ce2b352bb
2019-04-12 12:41:20 -07:00
Iurii Zdebskyi
507fe66bea Enable comp ops for bool tensor (#19109)
Summary:
Enabled comparison ops for bool tensors
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19109

Differential Revision: D14871187

Pulled By: izdeby

fbshipit-source-id: cf9951847d69124a93e5e21dd0a39c9568b1037d
2019-04-11 14:37:10 -07:00
iurii zdebskyi
1858773c0c Fixed bool Tensor value change bug (#19096)
Summary:
Fixes #19077
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19096

Differential Revision: D14871044

Pulled By: izdeby

fbshipit-source-id: 61b12559c8c5b9613e00ba5933f478321ea80469
2019-04-10 11:09:07 -07:00
Xiang Gao
ea2405c7dc Add torch.unique_consecutive (#19060)
Summary:
Fixes: https://github.com/pytorch/pytorch/issues/19045

Please review: VitalyFedyunin ngimel

This is independent on the #18649 series. This will cause merge conflicts in #18649 series, but please merge this first, and I will resolve the merge conflicts there.

The new feature is exposed in `_unique2_temporary_will_remove_soon` and `_unique_dim2_temporary_will_remove_soon`. But not at `torch.unique` yet. I will take care of the API after #18649 series get merged completely.

Benchmark on a tensor of shape `torch.Size([15320, 2])`:

```python
print(torch.__version__)
print()
a = tensor.sort().values.to('cpu')
print('cpu, sorted_input=False:')
%timeit torch._unique2_temporary_will_remove_soon(a)
%timeit torch._unique2_temporary_will_remove_soon(a, return_inverse=True)
%timeit torch._unique2_temporary_will_remove_soon(a, return_counts=True)
%timeit torch._unique2_temporary_will_remove_soon(a, return_inverse=True, return_counts=True)
print()
print('cpu, sorted_input=True:')
%timeit torch._unique2_temporary_will_remove_soon(a, sorted_input=True)
%timeit torch._unique2_temporary_will_remove_soon(a, sorted_input=True, return_inverse=True)
%timeit torch._unique2_temporary_will_remove_soon(a, sorted_input=True, return_counts=True)
%timeit torch._unique2_temporary_will_remove_soon(a, sorted_input=True, return_inverse=True, return_counts=True)
print()
a = a.to('cuda')
print('cuda, sorted_input=False:')
%timeit torch._unique2_temporary_will_remove_soon(a); torch.cuda.synchronize()
%timeit torch._unique2_temporary_will_remove_soon(a, return_inverse=True); torch.cuda.synchronize()
%timeit torch._unique2_temporary_will_remove_soon(a, return_counts=True); torch.cuda.synchronize()
%timeit torch._unique2_temporary_will_remove_soon(a, return_inverse=True, return_counts=True); torch.cuda.synchronize()
print()
print('cuda, sorted_input=True:')
%timeit torch._unique2_temporary_will_remove_soon(a, sorted_input=True); torch.cuda.synchronize()
%timeit torch._unique2_temporary_will_remove_soon(a, sorted_input=True, return_inverse=True); torch.cuda.synchronize()
%timeit torch._unique2_temporary_will_remove_soon(a, sorted_input=True, return_counts=True); torch.cuda.synchronize()
%timeit torch._unique2_temporary_will_remove_soon(a, sorted_input=True, return_inverse=True, return_counts=True); torch.cuda.synchronize()
```

```
1.1.0a0+2addccc

cpu, sorted_input=False:
340 µs ± 5.88 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
717 µs ± 14.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
52.3 ms ± 2.75 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
52.3 ms ± 1.79 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

cpu, sorted_input=True:
32.8 µs ± 285 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
49.9 µs ± 557 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
51.6 µs ± 1.08 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
78 µs ± 782 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

cuda, sorted_input=False:
213 µs ± 1.52 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
291 µs ± 3.81 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
250 µs ± 1.05 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
321 µs ± 1.59 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

cuda, sorted_input=True:
45.6 µs ± 2.13 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
110 µs ± 2.47 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
82 µs ± 857 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
143 µs ± 409 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
```

```python
print(torch.__version__)
print()
a1, a2 = tensor.unbind(1)
indices = (a1 * tensor.max() + a2).sort().indices
a = tensor.index_select(0, indices).to('cpu')
print('cpu, sorted_input=False:')
%timeit torch._unique_dim2_temporary_will_remove_soon(a, dim=0)
%timeit torch._unique_dim2_temporary_will_remove_soon(a, dim=0, return_inverse=True)
%timeit torch._unique_dim2_temporary_will_remove_soon(a, dim=0, return_counts=True)
%timeit torch._unique_dim2_temporary_will_remove_soon(a, dim=0, return_inverse=True, return_counts=True)
print()
print('cpu, sorted_input=True:')
%timeit torch._unique_dim2_temporary_will_remove_soon(a, dim=0, sorted_input=True)
%timeit torch._unique_dim2_temporary_will_remove_soon(a, dim=0, sorted_input=True, return_inverse=True)
%timeit torch._unique_dim2_temporary_will_remove_soon(a, dim=0, sorted_input=True, return_counts=True)
%timeit torch._unique_dim2_temporary_will_remove_soon(a, dim=0, sorted_input=True, return_inverse=True, return_counts=True)
print()
a = a.to('cuda')
print('cuda, sorted_input=False:')
%timeit torch._unique_dim2_temporary_will_remove_soon(a, dim=0); torch.cuda.synchronize()
%timeit torch._unique_dim2_temporary_will_remove_soon(a, dim=0, return_inverse=True); torch.cuda.synchronize()
%timeit torch._unique_dim2_temporary_will_remove_soon(a, dim=0, return_counts=True); torch.cuda.synchronize()
%timeit torch._unique_dim2_temporary_will_remove_soon(a, dim=0, return_inverse=True, return_counts=True); torch.cuda.synchronize()
print()
print('cuda, sorted_input=True:')
%timeit torch._unique_dim2_temporary_will_remove_soon(a, dim=0, sorted_input=True); torch.cuda.synchronize()
%timeit torch._unique_dim2_temporary_will_remove_soon(a, dim=0, sorted_input=True, return_inverse=True); torch.cuda.synchronize()
%timeit torch._unique_dim2_temporary_will_remove_soon(a, dim=0, sorted_input=True, return_counts=True); torch.cuda.synchronize()
%timeit torch._unique_dim2_temporary_will_remove_soon(a, dim=0, sorted_input=True, return_inverse=True, return_counts=True); torch.cuda.synchronize()
```

```
cpu, sorted_input=False:
55.4 ms ± 1.12 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
55.8 ms ± 616 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
55.2 ms ± 402 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
55.1 ms ± 725 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

cpu, sorted_input=True:
54.7 ms ± 585 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
55.2 ms ± 1.23 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
54.5 ms ± 865 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
54.9 ms ± 577 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

cuda, sorted_input=False:
171 µs ± 783 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
220 µs ± 1.65 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
203 µs ± 2.95 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
251 µs ± 2.83 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

cuda, sorted_input=True:
59.6 µs ± 757 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
113 µs ± 431 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
93.2 µs ± 2.13 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
147 µs ± 2.81 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
```
The CPU implementation of `unique_dim` is super slow, see https://github.com/pytorch/pytorch/issues/18987, but this PR will not worry about this issue.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19060

Differential Revision: D14866909

Pulled By: ezyang

fbshipit-source-id: d20012cec68c37b05cf770a6f4d6524f910b950f
2019-04-10 07:36:08 -07:00
James Reed
82b570528d Move abs, frac, reciprocal, and neg to TensorIterator (#19041)
Summary:
I've been messing around with vectorizing the fusion compiler in JIT, and noticed that these ops were pathologically slow. I moved them to use TensorIterator + Vec256<> and got some speed wins.

Benchmark script:

```
import torch, time

ops = ['abs', 'neg', 'reciprocal', 'frac']

x = torch.rand(1024, 1024)
NITER = 10000

print('op', 'time per iter (ms)', 'gops/s', 'GB/s', sep='\t')

for op in ops:
    s = time.time()
    for i in range(NITER):
        getattr(x, op)()
    elapsed_sec = ((time.time() - s) / NITER)
    print(op, elapsed_sec * 1000, (1024*1024/elapsed_sec)/1e9, (1024*1024*4*2) / elapsed_sec / 1e9, sep='\t')

```

Before this change (on my mac with a skylake):
```
op      time per iter (ms)      gops/s  GB/s
abs     0.9730974197387695      1.0775652866097343      8.620522292877874
neg     1.0723679780960083      0.9778136063534356      7.822508850827485
reciprocal      1.2610594034194946      0.8315040490215421      6.6520323921723366
frac    1.1681334018707275      0.8976509004200546      7.181207203360437
```

After this change:
```
op      time per iter (ms)      gops/s  GB/s
abs     0.5031076192855835      2.084198210889721       16.673585687117768
neg     0.4433974027633667      2.3648672578256087      18.91893806260487
reciprocal      0.47145988941192624     2.2241043693195985      17.79283495455679
frac    0.5036592721939087      2.0819154096627024      16.65532327730162
```

So, after this change it looks like we are hitting machine peak for bandwidth and are bandwidth bound.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19041

Differential Revision: D14862037

Pulled By: jamesr66a

fbshipit-source-id: e2032ac0ca962dbf4120bb36812277c260e22912
2019-04-09 21:55:00 -07:00
Vishwak Srinivasan
487388d8ad Rename btrisolve to lu_solve (#18726)
Summary:
Changelog:
- Rename `btrisolve` to `lu_solve` to remain consistent with names of solve methods (`cholesky_solve`, `triangular_solve`, `solve`)
- Fix all callsites
- Rename all tests
- Create a tentative alias for `lu_solve` under the name `btrisolve` and add a deprecation warning to not promote usage
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18726

Differential Revision: D14726237

Pulled By: zou3519

fbshipit-source-id: bf25f6c79062183a4153015e0ec7ebab2c8b986b
2019-04-09 15:21:24 -07:00
Edward Yang
29ea08616b Add torch.__config__.show(), reporting detailed version of all libraries. (#18579)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18579
ghimport-source-id: 65124c95e49423de4ad1008c65e75057fea09b94

Differential Revision: D14778507

Pulled By: ezyang

fbshipit-source-id: 1e4bb79f4800a116ce8fb7af2fefbd34da8d102c
2019-04-09 11:13:24 -07:00
Xiang Gao
89145e602b Namedtuple return for gels, triangular_solve, and test refactor (#17195)
Summary:
Partial fix of: https://github.com/pytorch/pytorch/issues/394
- `gels` and `triangular_solve` now returns namedtuple
- refactor test for namedtuple API for better coverage and maintainability
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17195

Differential Revision: D14851875

Pulled By: ezyang

fbshipit-source-id: 9b2cba95564269d2c3a15324ba48751d68ed623c
2019-04-09 09:13:26 -07:00
Gao, Xiang
8c9caf185b Add numpy like repeat as torch.repeat_interleave (#18395)
Summary:
Fixes: https://github.com/pytorch/pytorch/issues/14093
cc: SsnL
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18395

Differential Revision: D14599509

Pulled By: umanwizard

fbshipit-source-id: 2391a1cc135fe5bab38475f1c8ed87c4a96222f3
2019-04-05 18:16:25 -07:00
J M Dieterich
e45e3634d6 add launch bounds, enable more tests (#18909)
Summary:
Add launch bounds annotations for ROCm arising from maxThreadsPerBlock and apply threads use.

Enable tests that now work.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18909

Differential Revision: D14801490

Pulled By: ezyang

fbshipit-source-id: b81c97fc783a2627bc7e31b32036a364cfe40cc7
2019-04-05 10:17:15 -07:00
Vitaly Fedyunin
b7c830b916 Revert "Adding pin_memory kwarg to zeros, ones, empty,... (#18854)
Summary:
This reverts commit c484cf43a0.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18854

Differential Revision: D14778393

Pulled By: VitalyFedyunin

fbshipit-source-id: 4b5a1f5b1c091bbc4a8e75614734cc011d26b452
2019-04-05 06:25:33 -07:00
Iurii Zdebskyi
b4d2df1fee Added bool and half support for resize_as_ and view methods (#18821)
Summary:
Enabled **resize_as_** and **view** methods for bool and half tensors.
tested via unit tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18821

Reviewed By: ezyang

Differential Revision: D14762852

Pulled By: izdeby

fbshipit-source-id: 4312079fb4e893fea6f71ff4f163094b2674f1e8
2019-04-04 13:09:10 -07:00
Gregory Chanan
8732a1b42e Disallow changing the device of a tensor via set_. (#18832)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18832
ghimport-source-id: fde4ad90541ba52dfa02bdd83466f17e6541e535

Stack from [ghstack](https://github.com/ezyang/ghstack):
* #18833 [STACK] Cache device on TensorImpl; clean up TensorImpl constructors.
* **#18832 [STACK] Disallow changing the device of a tensor via set_.**
* #18831 [STACK] Stop swapping in Storages of the wrong device for Tensors.

This is necessary to cache the device on a TensorImpl.

Differential Revision: D14766231

fbshipit-source-id: bba61634b2d6252ac0697b96033c9eea680956e8
2019-04-04 11:15:37 -07:00
Gregory Chanan
486fae563d Stop swapping in Storages of the wrong device for Tensors. (#18831)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18831
ghimport-source-id: 2741e0d70ebe2c2217572c3af54ddd9d2047e342

Stack from [ghstack](https://github.com/ezyang/ghstack):
* #18833 [STACK] Cache device on TensorImpl; clean up TensorImpl constructors.
* #18832 [STACK] Disallow changing the device of a tensor via set_.
* **#18831 [STACK] Stop swapping in Storages of the wrong device for Tensors.**

This is necessary to support device caching, see https://github.com/pytorch/pytorch/pull/18751 and https://github.com/pytorch/pytorch/pull/18578.

In library code, we potentially swap in Storages with the wrong device when device_guard is False.  This happens as follows with "view-like" operations.
1) We allocate a tensor on the 'wrong' device (because device_guard is false).
2) We swap out the 'wrong' storage with the 'right' storage using e.g. THCTensor_setStorage.

Instead, we can just construct the Tensor with the correct Storage from the beginning.  This is what we do with 'view'.

Note there are two other "view-like" cases where this happens:
1) unfold
2) set_()

Because these aren't performance critical, I just added the device_guard instead of applying the above correction.

For completeness, this also includes a test that all `device_guard: false` functions behave properly under these conditions.

Reviewed By: dzhulgakov

Differential Revision: D14766232

fbshipit-source-id: 0865c3ddae3f415df5da7a9869b1ea9f210e81bc
2019-04-04 06:25:33 -07:00
Vitaly Fedyunin
773ce4fbd0 Step 1: Secretly add return_counts to unique, and refactor unique_dim for performance (#18648)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18648
ghimport-source-id: 1cf4a8fe91492621e02217f38cae5d7e0699fb05

Stack from [ghstack](https://github.com/ezyang/ghstack):
* #18661 Step 7: remove _unique
* #18655 Step 6: Rename _unique2 to unique and add int? dim
* #18654 Step 5: remove _unque_dim in favor of unique_dim
* #18651 Step 4: add support for unique with dim=None
* #18650 Step 3: Add support for return_counts to torch.unique for dim not None
* #18649 Step 2: Rename _unique_dim2_temporary_will_remove_soon to unique_dim
* **#18648 Step 1: Secretly add return_counts to unique, and refactor unique_dim for performance**

`unique` is fragile, previously I tried to change it in #18391 and #17097, they all pass OSS tests but finally get reverted due to internal failure. My previous work of refactoring unique #18459 is based on #18391, and after #18391 get reverted, I could not work on #18459. To continue working on #18459, #18391, and #17097 without worrying about internal failures, I am suggesting the following steps for the improvements of `unique` and `unique_dim`. soumith Please take this and there is no need to put #18391 back.

The motivation is basically to move forward as much as possible without causing any internal failures. So I will try to divide it into steps and sort from low probability of internal failure to high probability. (I don't know what the internal failure is, so I have to guess). Let's merge these PR stack one by one until we enounter internal failure.

Step 1: Create two new ATen operators, `_unique2_temporary_will_remove_soon` and `_unique_dim2_temporary_will_remove_soon` and keep `_unique` and `_unique_dim` unchanged. The backend of these two functions and `_unique` and `_unique_dim` are all the same, the only difference is the temporary ones support `return_counts` but not the `_unique` and `_unique_dim`. Step one is mostly #18391 + #18459. The cuda8 errors has been fixed. At this point, there is no user visible API change, so no docs are updated. `torch.unique` does not support `return_counts` yet, and `return_counts` is tested through the newly added temporary operators. This step just added two new ATen operators, so there shouldn't be any internal failure.

Step 2: Rename `_unique_dim2_temporary_will_remove_soon` to `unique_dim`. This should cause no internal failure either, because no change to existing operators. The only thing to worry about is to delete `unique_dim` from python side because we don't want users to use it. At this point, C++ users now have `return_counts` support for `unique_dim`.

Step 3: Update the docs of `torch.unique` and use `unique_dim` inside `torch.unique` to support `return_counts` In the docs, we should say `torch.unique` with None dim support does not support `return_counts` yet. This might cause internal failure.

Step 4: Rename `_unique2_temporary_will_remove_soon` to `_unique2` and use `_unique2` inside `torch.unique` to support `return_counts`. Update the docs saying that `torch.unique` with None dim now support `return_counts`. This might cause internal failure.

Step 5: Remove `_unique_dim`. This might cause internal failure.

Step 6: Rename `_unique2` to `unique`, add optional `dim` argument to make it looks like the signature of Python's `torch.unique`. Inside `torch.unique`, use `unique` and get rid of `unique_dim`. Unbind `unique_dim` totally from Python at codegen. This is likely to cause internal fail.

Step 7: Remove `_unique`. This is very likely to cause internal failure.

This PR
======

This PR is for step 1. This create two new ATen operators, `_unique2_temporary_will_remove_soon` and `_unique_dim2_temporary_will_remove_soon` and implement `return_counts` inside them and do refactor for performance improvements.

Please review ngimel VitalyFedyunin. They are mostly copied from #18391 and #18459, so the review should be easy.

Below is a benchmark on a tensor of shape `torch.Size([15320, 2])`:

Before
---------

```python
print(torch.__version__)
%timeit a.unique(dim=0, sorted=True, return_inverse=False); torch.cuda.synchronize()
%timeit a.unique(dim=0, sorted=True, return_inverse=True); torch.cuda.synchronize()
```

```
1.0.1
192 µs ± 1.61 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
548 ms ± 3.39 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
```

```python
print(torch.__version__)
%timeit a.unique(sorted=True, return_inverse=False); torch.cuda.synchronize()
%timeit a.unique(sorted=True, return_inverse=True); torch.cuda.synchronize()
```

```
1.0.1
226 µs ± 929 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
302 µs ± 7.06 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
```

After
-------

```python
print(torch.__version__)
%timeit a.unique(dim=0, sorted=True, return_inverse=False); torch.cuda.synchronize()
%timeit a.unique(dim=0, sorted=True, return_inverse=True); torch.cuda.synchronize()
%timeit torch._unique_dim2_temporary_will_remove_soon(a, dim=0, sorted=True, return_inverse=False, return_counts=True); torch.cuda.synchronize()
%timeit torch._unique_dim2_temporary_will_remove_soon(a, dim=0, sorted=True, return_inverse=True, return_counts=True); torch.cuda.synchronize()
```

```
1.1.0a0+83ab8ac
190 µs ± 2.14 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
237 µs ± 1.23 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
219 µs ± 2.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
263 µs ± 1.15 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
```

```python
print(torch.__version__)
%timeit a.unique(sorted=True, return_inverse=False); torch.cuda.synchronize()
%timeit a.unique(sorted=True, return_inverse=True); torch.cuda.synchronize()
%timeit torch._unique2_temporary_will_remove_soon(a, sorted=True, return_inverse=False, return_counts=True); torch.cuda.synchronize()
%timeit torch._unique2_temporary_will_remove_soon(a, sorted=True, return_inverse=True, return_counts=True); torch.cuda.synchronize()
```

```
1.1.0a0+83ab8ac
232 µs ± 2.21 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
301 µs ± 1.65 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
264 µs ± 7.67 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
339 µs ± 9.2 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
```

Differential Revision: D14730905

fbshipit-source-id: 10026b4b98628a8565cc28a13317d29adf1225cc
2019-04-03 15:29:55 -07:00
Jerry Zhang
dfcd7b0185 QTensor (#18230)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18230

Implementing minimum qtensor API to unblock other workstreams in quantization

Changes:
- Added Quantizer which represents different quantization schemes
- Added qint8 as a data type for QTensor
- Added a new ScalarType QInt8
- Added QTensorImpl for QTensor
- Added following user facing APIs
  - quantize_linear(scale, zero_point)
  - dequantize()
  - q_scale()
  - q_zero_point()

Reviewed By: dzhulgakov

Differential Revision: D14524641

fbshipit-source-id: c1c0ae0978fb500d47cdb23fb15b747773429e6c
2019-04-03 13:17:11 -07:00
Gregory Chanan
2113ea6fbf Add device and dtype to storage. (#18749)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18749
ghimport-source-id: 9026a037f5e11cdb9ccd386f4b6b5768b9c3259b

Stack from [ghstack](https://github.com/ezyang/ghstack):
* #18751 Disallow changing the device of a tensor via set_.
* #18750 Use non-legacy constructors for tensor deserialization.
* **#18749 Add device and dtype to storage.**

The goal here is to fix our serialization, which currently depends on the legacy constructors.  Having dtype and device on Storage allows us to use the non-legacy constructors.

This fits somewhat along our goal of removing Storage, my having Storage act like a Tensor.

Differential Revision: D14729516

fbshipit-source-id: bf4a3e8669ad4859931f4a3fa56df605cbc08dcb
2019-04-03 07:59:02 -07:00
Iurii Zdebskyi
48f70ea0a2 Added numpy conversion (#18505)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18505
ghimport-source-id: f3c9b9251e5793f9e192f587194ddfebb45facc1

Stack from [ghstack](https://github.com/ezyang/ghstack):
* **#18505 [WIP]Added numpy conversion**
* #18166 Bool Tensor for CUDA

Differential Revision: D14646403

fbshipit-source-id: 79d39d692c778ce1981c1d35b1c33e3d93111041
2019-04-03 07:28:24 -07:00
Igor Fedan
3079d95b6c Fix flake8 issues
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18762

Reviewed By: houseroad

Differential Revision: D14734152

Pulled By: ifedan

fbshipit-source-id: 5adf123f88273895ad34ee9041896358d686de08
2019-04-02 21:18:01 -07:00
Iurii Zdebskyi
b832b99afb Bool Tensor for CUDA (#18166)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18166
ghimport-source-id: a8e2ba2d966e49747a55701c4f6863c5e24d6f14

Stack from [ghstack](https://github.com/ezyang/ghstack):
* **#18166 Bool Tensor for CUDA**
* #18165 Resolved comments from Bool Tensor for CPU PR
------

This PR enables bool tensor creation and some basic operations for the CPU backend. This is a part of Bool Tensor feature implementation work. The whole plan looks like this:
1. Storage Implementation [Done]
2. Tensor Creation.
a) CPU [Done]
b) CUDA [This PR]
3. Tensor Conversions.
4. Tensor Indexing.
5. Tensor Operations.
6. Back compatibility related changes.

Change:
Enable bool tensor in CUDA with the following operations:

    torch.zeros
    torch.tensor
    torch.ones
    torch.rand/rand_like/randint/randint_like
    torch.full
    torch.full_like
    torch.empty
    torch.empty_like

Tested via unit tests and local scripts.

Differential Revision: D14605104

fbshipit-source-id: b7d7340a7d70edd03a109222d271e68becba762c
2019-04-02 16:17:05 -07:00
Igor Fedan
2e97c82470 torch.cross' dim default changed to c10::optional instead of int=-1 (#17582)
Summary:
Argument dim=-1 doesn't work for torch.cross. The signature of the torch.cross has been changed to c10::optional<int64_t> dim instead of int64_t. So based on document "If dim is not given, it defaults to the first dimension found with the size 3." and if dim is specified (even negative) it will use the correspondent dim.

Fixes #17229
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17582

Differential Revision: D14483063

Pulled By: ifedan

fbshipit-source-id: f9699093ec401cb185fd33ca4563c8a46cdcd746
2019-04-02 13:27:00 -07:00
Vitaly Fedyunin
c484cf43a0 Adding pin_memory kwarg to zeros, ones, empty, ... tensor constructors. (#18455)
Summary:
Make it possible to construct a pinned memory tensor without creating a storage first and without calling pin_memory() function. It is also faster, as copy operation is unnecessary.

Supported functions:
```python
torch.rand_like(t, pin_memory=True)
torch.randn_like(t, pin_memory=True)
torch.empty_like(t, pin_memory=True)
torch.full_like(t, 4, pin_memory=True)
torch.zeros_like(t, pin_memory=True)
torch.ones_like(t, pin_memory=True)
torch.tensor([10,11], pin_memory=True)
torch.randn(3, 5, pin_memory=True)
torch.rand(3, pin_memory=True)
torch.zeros(3, pin_memory=True)
torch.randperm(3, pin_memory=True)
torch.empty(6, pin_memory=True)
torch.ones(6, pin_memory=True)
torch.eye(6, pin_memory=True)
torch.arange(3, 5, pin_memory=True)
```

Part of the bigger: `Remove Storage` plan.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18455

Reviewed By: ezyang

Differential Revision: D14672084

Pulled By: VitalyFedyunin

fbshipit-source-id: 9d0997ec00f59500ee018f8b851934d334012124
2019-04-02 08:48:19 -07:00
vishwakftw
baac5489a8 Expose alias multinomial methods to ATen (#17904)
Summary:
This PR exposes the multinomialAliasSetup and multinomialAliasDraw methods.

cc: neerajprad
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17904

Differential Revision: D14700205

Pulled By: ezyang

fbshipit-source-id: 16462fb1f1ef1d560fd586632ea356b23e966ee3
2019-04-02 07:56:41 -07:00
Edward Yang
173f224570 Turn on F401: Unused import warning. (#18598)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18598
ghimport-source-id: c74597e5e7437e94a43c163cee0639b20d0d0c6a

Stack from [ghstack](https://github.com/ezyang/ghstack):
* **#18598 Turn on F401: Unused import warning.**

This was requested by someone at Facebook; this lint is turned
on for Facebook by default.  "Sure, why not."

I had to noqa a number of imports in __init__.  Hypothetically
we're supposed to use __all__ in this case, but I was too lazy
to fix it.  Left for future work.

Be careful!  flake8-2 and flake8-3 behave differently with
respect to import resolution for # type: comments.  flake8-3 will
report an import unused; flake8-2 will not.  For now, I just
noqa'd all these sites.

All the changes were done by hand.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Differential Revision: D14687478

fbshipit-source-id: 30d532381e914091aadfa0d2a5a89404819663e3
2019-03-30 09:01:17 -07:00
Vishwak Srinivasan
e73be58ff7 Rename btriunpack to lu_unpack (#18529)
Summary:
Changelog:
- Renames `btriunpack` to `lu_unpack` to remain consistent with the `lu` function interface.
- Rename all relevant tests, fix callsites
- Create a tentative alias for `lu_unpack` under the name `btriunpack` and add a deprecation warning to not promote usage.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18529

Differential Revision: D14683161

Pulled By: soumith

fbshipit-source-id: 994287eaa15c50fd74c2f1c7646edfc61e8099b1
2019-03-29 13:01:30 -07:00
Vishwak Srinivasan
d859031ebf Rename btrifact* to lu (#18435)
Summary:
Changelog:

- Renames `btrifact` and `btrifact_with_info` to `lu`to remain consistent with other factorization methods (`qr` and `svd`).
- Now, we will only have one function and methods named `lu`, which performs `lu` decomposition. This function takes a get_infos kwarg, which when set to True includes a infos tensor in the tuple.
- Rename all tests, fix callsites
- Create a tentative alias for `lu` under the name `btrifact` and `btrifact_with_info`, and add a deprecation warning to not promote usage.
- Add the single batch version for `lu` so that users don't have to unsqueeze and squeeze for a single square matrix (see changes in determinant computation in `LinearAlgebra.cpp`)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18435

Differential Revision: D14680352

Pulled By: soumith

fbshipit-source-id: af58dfc11fa53d9e8e0318c720beaf5502978cd8
2019-03-29 00:34:30 -07:00
Edward Yang
81e030d9a6 Upgrade flake8-bugbear to master, fix the new lints. (#18507)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18507
ghimport-source-id: 1c3642befad2da78a7e5f39d6d58732b85c76267

Stack from [ghstack](https://github.com/ezyang/ghstack):
* **#18507 Upgrade flake8-bugbear to master, fix the new lints.**

It turns out Facebobok is internally using the unreleased master
flake8-bugbear, so upgrading it grabs a few more lints that Phabricator
was complaining about but we didn't get in open source.

A few of the getattr sites that I fixed look very suspicious (they're
written as if Python were a lazy language), but I didn't look more
closely into the matter.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Differential Revision: D14633682

fbshipit-source-id: fc3f97c87dca40bbda943a1d1061953490dbacf8
2019-03-27 08:07:41 -07:00
Xiang Gao
2ba41c5550 Add some missing docs for tensor methods and attributes, new unittest to enforce tensors.rst no longer miss anything (#16057)
Summary:
This depend on https://github.com/pytorch/pytorch/pull/16039

This prevent people (reviewer, PR author) from forgetting adding things to `tensors.rst`.

When something new is added to `_tensor_doc.py` or `tensor.py` but intentionally not in `tensors.rst`, people should manually whitelist it in `test_docs_coverage.py`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16057

Differential Revision: D14619550

Pulled By: ezyang

fbshipit-source-id: e1c6dd6761142e2e48ec499e118df399e3949fcc
2019-03-26 18:05:56 -07:00
Soumith Chintala
66628f78b7 Revert D14605905: [pytorch][PR] Add return_counts to torch.unique
Differential Revision:
D14605905

Original commit changeset: 555f5a12a8e2

fbshipit-source-id: c7874f5987893e956c022180a37763d88bba38db
2019-03-26 17:18:01 -07:00
Tongzhou Wang
5292685d2f Improve numerical precision of (s)logdet (#18449)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/18448 and https://github.com/pytorch/pytorch/issues/18450
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18449

Differential Revision: D14611638

Pulled By: soumith

fbshipit-source-id: 4f1f27ab5316a92d2783e734169f599afed743cf
2019-03-26 15:32:14 -07:00
Soumith Chintala
436723122e fix arange shape issue inconsistency across cpu and cuda (#18462)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/18363
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18462

Differential Revision: D14620263

Pulled By: soumith

fbshipit-source-id: 223524cdda2f5d55c2ca8d4cdcf6f7a05a6c15eb
2019-03-26 15:27:24 -07:00
Xiang Gao
5bff395a82 Namedtuple return for solve, slogdet, sort, topk (#17093)
Summary:
More ops for https://github.com/pytorch/pytorch/issues/394. ~~Also need to rebase after landing #16186, because we need to update the whitelist of the new unit test added in #16186.~~

cc: ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17093

Differential Revision: D14620068

Pulled By: ezyang

fbshipit-source-id: deec5ffc9bf7624e0350c85392ee59789bad4237
2019-03-26 12:39:08 -07:00
Iurii Zdebskyi
1a742075ee Resolving comments from Bool Tensor for CPU PR (#18165)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18165
ghimport-source-id: 55cb3fb63a25c2faab1725b4ec14c688bf45bd38

Stack from [ghstack](https://github.com/ezyang/ghstack):
* #18166 Bool Tensor for CUDA
* **#18165 Resolved comments from Bool Tensor for CPU PR**
-------
------------
This is a follow up PR that resolves some additional feedback on one the of previous Bool Tensor PRs.

gchanan, here is a list of almost all the comments from the original PR with respective fixes and replies:

**[utils/python_scalars.h]** why is this converting from uint8_t and not bool? (comment?)
When i was adding this, i was testing by creating a tensor and then calling its .tolist(). it worked for bool and uint8_t equally good so i left uint8_t as thought it makes more sense as we are calling PyBool_FromLong. �Changing it to bool.

**[ATen/Dispatch.h]**better name?.
fixed.

**[test/test_torch.py]** what about other factories, such as full? (and more).
There is a test that goes through the factory methods - test_tensor_factories_empty. i added some bool cases above it and added a comment that once CUDA will be done, i will unite them and it will iterate not just between CUDA and CPU but also all types. ��Adding all bool cases now. Will unite in CUDA PR.

**[generic/THTensorMath.h]** any changes in this file actually needed?
Bad merge. Fixed.

**[TH/THTensor.h]** this generates code for random, clampedRandom, and cappedRandom -- do we have tests for all of these with bool?
Added

**[c10/core/ScalarType.h]** I'm not very confident about the lack of Bool here -- can you look at the call sites and see what makes sense to do here?
Added bool to the macro and created a similar one without for a single case which fails the build with errors:

_./torch/csrc/jit/symbolic_variable.h:79:20: error: ambiguous overload for ‘operator*’ (operand types are ‘const torch::jit::SymbolicVariable’ and ‘torch::jit::Value*’)
return (*this) * insertConstant(rhs);_

Differential Revision: D14605105

fbshipit-source-id: abf82d50e8f8c50b386545ac068268651b28496d
2019-03-26 09:59:34 -07:00
vishwakftw
5e462a3ed6 Introduce SobolEngine (#10505)
Summary:
`SobolEngine` is a quasi-random sampler used to sample points evenly between [0,1]. Here we use direction numbers to generate these samples. The maximum supported dimension for the sampler is 1111.

Documentation has been added, tests have been added based on Balandat 's references. The implementation is an optimized / tensor-ized implementation of Balandat 's implementation in Cython as provided in #9332.

This closes #9332 .

cc: soumith Balandat
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10505

Reviewed By: zou3519

Differential Revision: D9330179

Pulled By: ezyang

fbshipit-source-id: 01d5588e765b33b06febe99348f14d1e7fe8e55d
2019-03-26 07:53:07 -07:00
Xiang Gao
e2730ddb21 Add return_counts to torch.unique (#18391)
Summary:
Fixes: https://github.com/pytorch/pytorch/issues/12598

This PR was originally authorized by ptrblck at https://github.com/pytorch/pytorch/pull/15495, but since there was no update for months after the request change, I clone that branch and resolve the code reviews here. Hope everything is good now. Especially, the implementation of count is changed from ptrblck's original algorithm to the one ngimel suggest, i.e. using `unique_by_key` and `adjacent_difference`.

The currently implementation of `_unique_dim` is VERY slow for computing inverse index and counts, see https://github.com/pytorch/pytorch/issues/18405. I will refactor `_unique_dim` in a later PR. For this PR, please allow me to keep the implementation as is.

cc: ptrblck ezyang ngimel colesbury
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18391

Reviewed By: soumith

Differential Revision: D14605905

Pulled By: VitalyFedyunin

fbshipit-source-id: 555f5a12a8e28c38b10dfccf1b6bb16c030bfdce
2019-03-25 20:38:17 -07:00
Edward Yang
50df3e5e2e Add ability to query if built with CUDA and MKL-DNN. (#18362)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18362
ghimport-source-id: 374b7ab97e2d6a894368007133201f510539296f

Stack from [ghstack](https://github.com/ezyang/ghstack):
* #18242 Test running a CUDA build on CPU machine.
* **#18362 Add ability to query if built with CUDA and MKL-DNN.**

Fixes #18108.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Differential Revision: D14584430

fbshipit-source-id: 7605a1ac4e8f2a7c70d52e5a43ad7f03f0457473
2019-03-25 10:39:09 -07:00
Edward Yang
e3da16a99e Add test for #17271 (torch.exp incorrect for 2**31 size tensor) (#18292)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18292
ghimport-source-id: a3e96584db0eef7b6202a1211808f9f6e59dd529

Stack from [ghstack](https://github.com/ezyang/ghstack):
* **#18292 Add test for #17271 (torch.exp incorrect for 2**31 size tensor)**
* #18291 Correctly call superclass setUp in TestCase subclasses.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Differential Revision: D14567642

fbshipit-source-id: c60ee7597a86f5d2c5c0b72cb106f17815950427
2019-03-22 07:50:38 -07:00
vishwakftw
291746f110 Rename trtrs to triangular_solve (#18213)
Summary:
Changelog:
- Renames `trtrs` to `triangular_solve` to remain consistent with `cholesky_solve` and `solve`.
- Rename all tests, fix callsites
- Create a tentative alias for `triangular_solve` under the name `trtrs`, and add a deprecation warning to not promote usage.
- Move `isnan` to _torch_docs.py
- Remove unnecessary imports
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18213

Differential Revision: D14566902

Pulled By: ezyang

fbshipit-source-id: 544f57c29477df391bacd5de700bed1add456d3f
2019-03-21 14:27:21 -07:00
Edward Yang
549c4da917 Add a decorator for marking slow tests. (#18231)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18231
ghimport-source-id: 78c230f60c41877fe91b89c8c979b160f36f856b

Stack from [ghstack](https://github.com/ezyang/ghstack):
* **#18231 Add a decorator for marking slow tests.**

The general strategy:
- It's a normal skip decorator, which triggers a skip if
  PYTORCH_TEST_WITH_SLOW is not set.
- It also annotates the method in question that says it's
  slow.  We use this to implement a catch-all skipper in
  setUp that skips all non-slow tests when
  PYTORCH_TEST_SKIP_FAST is set.

I added a little smoketest to test_torch and showed that I get:

```
Ran 432 tests in 0.017s
OK (skipped=431)
```

when running with PYTORCH_TEST_WITH_SLOW=1 and PYTORCH_TEST_SKIP_FAST=1

CI integration coming in later patch, as well as nontrivial uses of
this decorator.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Differential Revision: D14544441

fbshipit-source-id: 54435ce4ec827193e019887178c09ebeae3ae2c9
2019-03-21 11:17:34 -07:00
Edward Yang
ba81074c40 Fix B902 lint error: invalid first argument. (#18181)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18181
ghimport-source-id: 9c23551584a1a1b0b7ac246367f3a7ae1c50b315

Stack from [ghstack](https://github.com/ezyang/ghstack):
* #18184 Fix B903 lint: save memory for data classes with slots/namedtuple
* **#18181 Fix B902 lint error: invalid first argument.**
* #18178 Fix B006 lint errors: using mutable structure in default argument.
* #18177 Fix lstrip bug revealed by B005 lint

A variety of sins were committed:
- Some code was dead
- Some code was actually a staticmethod
- Some code just named it the wrong way
- Some code was purposely testing the omitted case

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Differential Revision: D14530876

fbshipit-source-id: 292a371d9a76ddc7bfcfd38b6f0da9165290a58e
2019-03-21 09:10:28 -07:00
Gao, Xiang
7e6220393f Cleanup arg{min, max} (#17103)
Summary:
Why do we need this workaround? `PythonArgParser` handles these two cases well.

The discussion started at https://github.com/pytorch/pytorch/pull/6201#issuecomment-378724406. The conclusion at that time by goldsborough was:

> Because we wanted to allow `dim=None` in Python and route to a different function. Essentially the problem was wanting to wrap the C++ function in Python. AFAIK there is no way of translating `dim=None` behavior into C++? So Richard and I came up with this strategy

Maybe at that time `PythonArgParser` was not powerful enough to handle the routing of two function with same name but different C++ signature.

Will keep an eye on the CI.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17103

Differential Revision: D14523503

Pulled By: VitalyFedyunin

fbshipit-source-id: cae3e2678062da2eccd93b51d4050578c7a9ab80
2019-03-20 16:28:27 -07:00
Vishwak Srinivasan
a519217ee7 Add batched version of trtrs (#18025)
Summary:
- Remove single batch TH/THC implementations
- Remove `_batch_trtrs_lower` from `multivariate_normal`
- Add tests for batched behavior
- Modify trtrs_backward to accommodate for batched case
- Modify docs

In a future PR, this will be renamed to `triangular_solve`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18025

Differential Revision: D14523004

Pulled By: ifedan

fbshipit-source-id: 11c6a967d107f969b60e5a5c73ce6bb8099ebbe1
2019-03-20 11:11:32 -07:00
vishwakftw
234bb8719a Add backend checks to solve methods (gesv, cholesky_solve) (#18116)
Summary:
Changelog:
- Incorporate a simple backend check in the linearSolveCheckInputs function in LinearAlgebraUtils.h
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18116

Differential Revision: D14504469

Pulled By: soumith

fbshipit-source-id: 7402b6dbaa8d73048946613b806d54f68bcbd8f4
2019-03-19 10:44:45 -07:00
Vishwak Srinivasan
421b508d55 Rename gesv to solve (#18060)
Summary:
Changelog:

- Renames `gesv` to `solve` to remain consistent with `cholesky_solve`.
- Rename all tests, fix callsites
- Create a tentative alias for `solve` under the name `gesv`, and add a deprecated warning to not promote usage.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18060

Differential Revision: D14503117

Pulled By: zou3519

fbshipit-source-id: 99c16d94e5970a19d7584b5915f051c030d49ff5
2019-03-18 16:04:24 -07:00
Richard Zou
3c977fb7ce Error out on in-place (unary) ops on tensors that have internal overlap (#17927)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17927
ghimport-source-id: 626d321e430b6b5c0ea3aa1eb9df8c1e2d058bf8

Stack:
* #17926 Implement at::has_internal_overlap helper function
* **#17927 Error out on in-place (unary) ops on tensors that have internal overlap**

On the way to #17935.

Works for CPU and CUDA on the following ops:
- abs_, acos_, asin_, atan_, ceil_, cos_, erf_, erfc_, exp_, expm1_
- floor_, log_, log10_, log1p_, log2_, round_, rsqrt_,
- sin_, sqrt_, tan_, tanh_, trunc_

This PR adds a check to see if the out/result tensor has internal
overlap. If it does, then we error out because the result **may** be
incorrect.

This is overly conservative; there are some cases where if the result is
the same as the input, the inplace operation is OK (such as floor_,
round_, and trunc_). However, the current code isn't organized in such a
way that this is easy to check, so enabling those will come in the future.

Reviewed By: ezyang

Differential Revision: D14438871

fbshipit-source-id: 15e12bf1fdb2ab7f74bb806e22bc74840bd6abd1
2019-03-15 07:50:19 -07:00
Richard Zou
a4123decf7 Implement at::has_internal_overlap helper function (#17926)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17926
ghimport-source-id: 9f7572b5d43e474492363fa17dcb86a6c27ca13c

Stack:
* **#17926 Implement at::has_internal_overlap helper function**
* #17927 Error out on in-place (unary) ops on tensors that have internal overlap

On the way to #17935.

Checks if a tensor's sizes/strides indicate that multiple elements share
the same memory location. This problem in general is hard so
at::has_internal_overlap implements two heuristics and avoids solving
the general problem:

if a tensor is contiguous, it cannot have internal overlap
if a tensor has any zero strides, it does have internal overlap
otherwise, return MemOverlap::kTooHard to indicate that there might be
overlap, but we don't know.

Reviewed By: ezyang

Differential Revision: D14438858

fbshipit-source-id: 607ab31771315921ab6165b2a1f072ac3e75925a
2019-03-15 07:50:17 -07:00
J M Dieterich
1ba1ca0acb Update to ROCm2.2 (#18007)
Summary:
ROCm 2.2 was released today, if we respin the CI docker images with the attached, PyTorch/Caffe2 will support ROCm 2.2

Changes necessary:
* for the Ubuntu target, HIP PR 934 needs to be applied to fix the forceinline definition. ROCm 2.3 will contain this.
* two unit tests proof flaky on different platforms, disable them defensively.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18007

Differential Revision: D14473903

Pulled By: bddppq

fbshipit-source-id: b1939f11d1c765a3bf71bb244b15f6ceb0e816d3
2019-03-14 18:47:22 -07:00
Lu Fang
f827f1052a Fix the CI
Summary: https://github.com/pytorch/pytorch/pull/17995 's CI has verified it should fix the CI.

Reviewed By: bddppq

Differential Revision: D14447674

fbshipit-source-id: 50085db9ae7421b5be216ed0a2216234babfdf6c
2019-03-13 17:28:50 -07:00
Guanheng Zhang
26a4c2ada6 Speed up gemm by reordering the for loops (#17730)
Summary:
Optimize the order of the "for" loops.

Note: For "transa = true" cases, the order of the "for" loops has been optimzied in the original code. Therefore, no significant improvement is observed in those case (i.e. "transa && transb" and "transa && !transb")

mode/opt (i.e. static libary)
//////////////////////////////////////////////////////////////////////////////
transa && transb
after:
loops:  2229     x:     128      y:     128      z:     128      time:  2243ns      =>  acceleration multiplier:  0.90
loops:  124      x:     128      y:     1024     z:     128      time:  40381ns      =>  acceleration multiplier:  0.97
loops:  121      x:     1024     y:     128      z:     128      time:  41651ns      =>  acceleration multiplier:  0.96
loops:  15       x:     1024     y:     1024     z:     128      time:  333771ns       =>  acceleration multiplier:  0.98
loops:  4610     x:     128      y:     128      z:     64       time:  1084ns       =>  acceleration multiplier:  0.95
loops:  252      x:     128      y:     1024     z:     64       time:  19860ns      =>  acceleration multiplier:  0.98
loops:  248      x:     1024     y:     128      z:     64       time:  20232ns      =>  acceleration multiplier:  0.98
loops:  30       x:     1024     y:     1024     z:     64       time:  167338ns      =>  acceleration multiplier:  0.99

before:
loops:  2468     x:     128      y:     128      z:     128      time:  2026ns
loops:  128      x:     128      y:     1024     z:     128      time:  39338ns
loops:  126      x:     1024     y:     128      z:     128      time:  39930ns
loops:  16       x:     1024     y:     1024     z:     128      time:  327549ns
loops:  4840     x:     128      y:     128      z:     64       time:  1033ns
loops:  258      x:     128      y:     1024     z:     64       time:  19441ns
loops:  252      x:     1024     y:     128      z:     64       time:  19854ns
loops:  31       x:     1024     y:     1024     z:     64       time:  166254ns

//////////////////////////////////////////////////////////////////////////////
transa && !transb
after:
loops:  4880     x:     128      y:     128      z:     128      time:  1024ns      =>  acceleration multiplier:  0.98
loops:  638      x:     128      y:     1024     z:     128      time:  7839ns      =>  acceleration multiplier:  1.04
loops:  605      x:     1024     y:     128      z:     128      time:  8276ns      =>  acceleration multiplier:  1.01
loops:  77       x:     1024     y:     1024     z:     128      time:  65713ns      =>  acceleration multiplier:  1.00
loops:  9935     x:     128      y:     128      z:     64       time:  503ns      =>  acceleration multiplier:  1.00
loops:  1252     x:     128      y:     1024     z:     64       time:  3994ns      =>  acceleration multiplier:  1.00
loops:  1183     x:     1024     y:     128      z:     64       time:  4226ns      =>  acceleration multiplier:  0.98
loops:  153      x:     1024     y:     1024     z:     64       time:  32766ns      =>  acceleration multiplier:  0.99

before:
loops:  4985     x:     128      y:     128      z:     128      time:  1003ns
loops:  615      x:     128      y:     1024     z:     128      time:  8140ns
loops:  599      x:     1024     y:     128      z:     128      time:  8357ns
loops:  76       x:     1024     y:     1024     z:     128      time:  65934ns
loops:  9897     x:     128      y:     128      z:     64       time:  505ns
loops:  1248     x:     128      y:     1024     z:     64       time:  4008ns
loops:  1203     x:     1024     y:     128      z:     64       time:  4159ns
loops:  154      x:     1024     y:     1024     z:     64       time:  32499ns

//////////////////////////////////////////////////////////////////////////////
!transa && transb
after:
loops:  3919     x:     128      y:     128      z:     128      time:  1276ns      =>  acceleration multiplier:  2.97
loops:  497      x:     128      y:     1024     z:     128      time:  10069ns      =>  acceleration multiplier:  7.85
loops:  449      x:     1024     y:     128      z:     128      time:  11145ns      =>  acceleration multiplier:  4.77
loops:  57       x:     1024     y:     1024     z:     128      time:  88595ns      =>  acceleration multiplier:  7.12
loops:  7575     x:     128      y:     128      z:     64       time:  660ns      =>  acceleration multiplier:  3.00
loops:  967      x:     128      y:     1024     z:     64       time:  5173ns      =>  acceleration multiplier:  7.66
loops:  877      x:     1024     y:     128      z:     64       time:  5702ns      =>  acceleration multiplier:  4.76
loops:  111      x:     1024     y:     1024     z:     64       time:  45232ns      =>  acceleration multiplier:  7.03

before:
loops:  1320     x:     128      y:     128      z:     128      time:  3789ns
loops:  64       x:     128      y:     1024     z:     128      time:  79061ns
loops:  95       x:     1024     y:     128      z:     128      time:  53107ns
loops:  8        x:     1024     y:     1024     z:     128      time:  631161ns
loops:  2521     x:     128      y:     128      z:     64       time:  1983ns
loops:  127      x:     128      y:     1024     z:     64       time:  39604ns
loops:  185      x:     1024     y:     128      z:     64       time:  27128ns
loops:  16       x:     1024     y:     1024     z:     64       time:  318155ns

//////////////////////////////////////////////////////////////////////////////
!transa && !transb
after:
loops:  3895     x:     128      y:     128      z:     128      time:  1283ns      =>  acceleration multiplier:  1.73
loops:  393      x:     128      y:     1024     z:     128      time:  12746ns      =>  acceleration multiplier:  3.36
loops:  411      x:     1024     y:     128      z:     128      time:  12170ns      =>  acceleration multiplier:  1.93
loops:  46       x:     1024     y:     1024     z:     128      time:  110116ns      =>  acceleration multiplier:  3.17
loops:  7404     x:     128      y:     128      z:     64       time:  675ns      =>  acceleration multiplier:  1.58
loops:  636      x:     128      y:     1024     z:     64       time:  7872ns      =>  acceleration multiplier:  2.70
loops:  724      x:     1024     y:     128      z:     64       time:  6911ns      =>  acceleration multiplier:  1.32
loops:  73       x:     1024     y:     1024     z:     64       time:  68502ns      =>  acceleration multiplier:  2.49

before:
loops:  2253     x:     128      y:     128      z:     128      time:  2219ns
loops:  117      x:     128      y:     1024     z:     128      time:  42788ns
loops:  214      x:     1024     y:     128      z:     128      time:  23465ns
loops:  15       x:     1024     y:     1024     z:     128      time:  349076ns
loops:  4694     x:     128      y:     128      z:     64       time:  1065ns
loops:  236      x:     128      y:     1024     z:     64       time:  21251ns
loops:  549      x:     1024     y:     128      z:     64       time:  9108ns
loops:  30       x:     1024     y:     1024     z:     64       time:  170799ns
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17730

Differential Revision: D14325149

Pulled By: zhangguanheng66

fbshipit-source-id: a7a5a83890fdf99fee6eb87a3a5060b7b6bd862f
2019-03-13 08:57:26 -07:00
Edward Yang
6466ddbd86 Fix lint in test_torch.py (#17807)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17807

Lint also detected a bug in test_linspace where we weren't
actually testing the CUDA case.

Differential Revision: D14388241

fbshipit-source-id: e219e46400f4952c6b384bca3baa0724ef94acde
2019-03-12 13:48:28 -07:00
Thomas Viehmann
aba9051a65 kthvalue consistency with sort in the presence of NaN (#17824)
Summary:
This PR causes kthvalue to be consistent with sort
(i.e. treat NaN as larger than any number), so that
`a.kthvalue(n) == a.sort()[n - 1]`.

One drawback is that median with a NaN argument does not return NaN,
which is a deviation from NumPy.

Thank you, ngimel, for raising this.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17824

Differential Revision: D14410092

Pulled By: ezyang

fbshipit-source-id: bdec2d8272dc4c65bcf2f9b8995e237774c44c02
2019-03-12 08:49:19 -07:00
vishwakftw
f268370b42 torch.btrifact for tensors with greater than 3 dimensions (#14964)
Summary:
Motivation:
- Earlier, `torch.btrifact` could not handle tensors with greater than 3 dimensions. This is because of the check:
>   AT_CHECK(THTensor_(nDimension)(a) == 3, "expected 3D tensor, got size: ", a->sizes());

What is in this PR?:
- Move `btrifact` to ATen
- Remove relation to TH/THC.
- Handle tensors with more than three dimensions
- Tests
- Docs modifications: added a note about the non-pivoting variant.

[blocked due to old magma-cuda binaries]
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14964

Differential Revision: D14405106

Pulled By: soumith

fbshipit-source-id: f051f5d6aaa45f85836a2867176c065733563184
2019-03-12 01:46:07 -07:00
Iurii Zdebskyi
4aa22833cf Bool tensor creation (cpu) (#17376)
Summary:
This PR enables bool tensor creation and some basic operations for the CPU backend. This is a part of Bool Tensor feature implementation work. The whole plan looks like this:
    1. Storage Implementation [Done]
    2. Tensor Creation.
        a) CPU (this PR)
        b) CUDA
    3. Tensor Conversions.
    4. Tensor Indexing.
    5. Tensor Operations.
    6. Back compatibility related changes.

**Change**:
Enable CPU tensors and these operations:
- torch.zeros
- torch.tensor
- torch.ones
- torch.randint
- torch.full
- torch.full_like
- torch.empty
- torch.empty_like

**Tested via**:
1) unit tests

2)
torch.zeros(2,2, dtype=torch.bool)
torch.tensor([True, False], dtype=torch.bool)
torch.tensor([-1, -1.1, 0, 1, 1.1, 2], dtype=torch.bool)
torch.ones([1,2], dtype=torch.bool)
torch.randint(10, (2, 2), dtype=torch.bool)
torch.full((2, 3), True, dtype=torch.bool)
torch.empty(4, dtype=torch.bool)

a = torch.tensor([0,0,1])
b = torch.full_like(a, True)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17376

Reviewed By: ezyang

Differential Revision: D14375995

Pulled By: izdeby

fbshipit-source-id: a65490b5360ee0e6e3accc54ce7e32e49ad2d2a8
2019-03-11 17:03:40 -07:00
Gao, Xiang
11c89dde55 Allow structseq to be input of operators where tuple is expected (#17208)
Summary:
Currently the following code gives an error on python 2 because `ret` is a structseq which is not a tuple
```python
ret = a.max(dim=0)
ret1 = torch.max(a, dim=0, out=ret)
```

This PR modify tuple check in python arg parser to allow structseq to be input of operators where tuple is expected, which would make the above code work.

Depend on: https://github.com/pytorch/pytorch/pull/17136
Partially fixes: https://github.com/pytorch/pytorch/issues/16813
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17208

Differential Revision: D14280198

Pulled By: VitalyFedyunin

fbshipit-source-id: beffebfd3951c4f5c7c8fe99a5847616a89491f3
2019-03-11 11:33:35 -07:00
bhushan
b57fe3cc66 Introducing array-like sequence methods __contains__ (#17733)
Summary:
for tensor

Fixes: #17000
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17733

Differential Revision: D14401952

Pulled By: soumith

fbshipit-source-id: c841b128c5a1fceda1094323ed4ef1d0cf494909
2019-03-11 09:00:16 -07:00
bhushan
6bcff88d3e Fix log_softmax and softmax if any dimension is 0-d (#17651)
Summary:
- Test added
- test_dim_function_empty: softmax and log_softmax on last dimension

fixes: #17262
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17651

Differential Revision: D14349009

Pulled By: gchanan

fbshipit-source-id: b6f728f5c6be8ae7615749e3f0c201886632923e
2019-03-10 15:25:58 -07:00
vishwakftw
9d70e199f4 Move lerp to ATen, add functionality for tensor weights (#17348)
Summary:
Changelog:
- Remove TH/THC bindings
- Add tensor weights for `lerp`
- Modify derivatives appropriately
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17348

Differential Revision: D14355845

Pulled By: soumith

fbshipit-source-id: eaede4c09ee589d77ba6cf52583510ea8e3a2fcf
2019-03-07 14:04:58 -08:00