Commit Graph

295 Commits

Author SHA1 Message Date
Yaxun (Sam) Liu
13a684d50b Fix test TestCuda.test_streams_multi_gpu_query (#23912)
Summary:
This is a similar issue to TestCuda.test_events_wait.

The PyTorch test suite sets a policy() method to assertLeaksNoCudaTensors.
Whenever a test is run, assertLeaksNoCudaTensors is called,
which in turn calls CudaMemoryLeakCheck, which in turn calls
initialize_cuda_context_rng, where it executes torch.randn
on each device, launching a kernel on each device.

Since the kernel may not have finished on device 0, the first assertion
self.assertTrue(s0.query()) fails.

The fix is to insert

        torch.cuda.synchronize(d0)
        torch.cuda.synchronize(d1)

at the beginning of the test so that previously launched kernels finish before the real
test begins.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23912

Differential Revision: D16688599

Pulled By: ezyang

fbshipit-source-id: 3de2b555e99f5bbd05727835b9d7c93a026a0519
2019-08-07 07:44:30 -07:00
Hong Xu
be7fe1ccb9 Add tests to ensure that both abs(0.0) and abs(-0.0) lead to 0.0 (#23701)
Summary:
As pointed out by colesbury in https://github.com/pytorch/pytorch/pull/23579#discussion_r309798987
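The subtlety such a test guards against is that `-0.0 == 0.0` evaluates to true, so an equality assertion alone cannot tell whether `abs` actually cleared the sign bit. A minimal pure-Python sketch of a sign-aware check follows; `is_positive_zero` is a hypothetical helper, not part of the PyTorch test suite.

```python
import math


def is_positive_zero(x: float) -> bool:
    """True only for +0.0. Since -0.0 == 0.0, the sign bit has to be
    inspected separately, here with math.copysign."""
    return x == 0.0 and math.copysign(1.0, x) == 1.0


assert -0.0 == 0.0                     # equality cannot distinguish the zeros
assert not is_positive_zero(-0.0)      # but the sign bit can
for value in (0.0, -0.0):
    assert is_positive_zero(abs(value))  # abs() must yield +0.0 in both cases
```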
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23701

Differential Revision: D16623781

Pulled By: mrshenli

fbshipit-source-id: f48a29499128b08d2ac8bc9e466f2326112ead94
2019-08-05 07:50:06 -07:00
vishwakftw
5d130e4232 Allowing batching for det/logdet/slogdet operations (#22909)
Summary:
Changelog:
- Add batching for det / logdet / slogdet operations
- Update derivative computation to support batched inputs (and consequently batched outputs)
- Update docs
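To make the batched semantics concrete, here is a pure-Python sketch (not PyTorch's implementation, which works on tensors and calls into LAPACK/MAGMA): each matrix in the leading batch dimension is reduced independently to `(sign, log|det|)`, and `det` and `logdet` are derived from that pair. All function names are hypothetical.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def slogdet(a: Matrix) -> Tuple[float, float]:
    """(sign, log|det|) of one square matrix via Gaussian elimination with
    partial pivoting; a singular matrix gives (0.0, -inf)."""
    n = len(a)
    m = [list(row) for row in a]  # work on a copy
    sign, logabs = 1.0, 0.0
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if m[piv][col] == 0.0:
            return 0.0, -math.inf
        if piv != col:
            m[col], m[piv] = m[piv], m[col]
            sign = -sign  # each row swap flips the determinant's sign
        p = m[col][col]
        if p < 0.0:
            sign = -sign
        logabs += math.log(abs(p))
        for r in range(col + 1, n):
            f = m[r][col] / p
            for c in range(col, n):
                m[r][c] -= f * m[col][c]
    return sign, logabs


# Batched versions: apply the single-matrix routine over the batch dimension.
def batched_slogdet(batch: List[Matrix]) -> List[Tuple[float, float]]:
    return [slogdet(a) for a in batch]


def batched_det(batch: List[Matrix]) -> List[float]:
    return [s * math.exp(la) for s, la in batched_slogdet(batch)]


def batched_logdet(batch: List[Matrix]) -> List[float]:
    # log(det): -inf for a zero determinant, nan for a negative one.
    return [la if s > 0 else (-math.inf if s == 0 else math.nan)
            for s, la in batched_slogdet(batch)]
```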
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22909

Test Plan:
- Add a `test_det_logdet_slogdet_batched` method in `test_torch.py` to test `torch.det`, `torch.logdet` and `torch.slogdet` on batched inputs. This relies on the correctness of `torch.det` on single matrices (tested by `test_det_logdet_slogdet`). A port of this test is added to `test_cuda.py`
- Add autograd tests for batched inputs

Differential Revision: D16580988

Pulled By: ezyang

fbshipit-source-id: b76c87212fbe621f42a847e3b809b5e60cfcdb7a
2019-07-31 10:01:32 -07:00
Tongzhou Wang
af638ad5d7 pin_memory should not copy on already pinned tensors (#23484)
Summary:
fixes https://github.com/pytorch/pytorch/issues/21076
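The behaviour described in the title amounts to an early return. A toy sketch, assuming a hypothetical `HostBuffer` type whose `pinned` flag plays the role of what `Tensor.is_pinned()` reports:

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class HostBuffer:
    """Hypothetical stand-in for a CPU tensor and its storage."""
    data: List[float] = field(default_factory=list)
    pinned: bool = False


def pin_memory(buf: HostBuffer) -> HostBuffer:
    # Already page-locked: hand back the same object instead of copying.
    if buf.pinned:
        return buf
    # Otherwise copy into a new (pretend) page-locked allocation.
    return HostBuffer(data=list(buf.data), pinned=True)
```

Calling `pin_memory` twice therefore copies only once.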
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23484

Differential Revision: D16546264

Pulled By: ezyang

fbshipit-source-id: 8058e0bbc6336751f36b884d71234feef498a982
2019-07-30 21:16:23 -07:00
vishwakftw
b3a9a7a9b9 Rename gels to lstsq (#23460)
Summary:
Changelog:
- Rename `gels` to `lstsq`
- Fix all callsites
- Rename all tests
- Create a tentative alias for `lstsq` under the name `gels` and add a deprecation warning to not promote usage.
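The "tentative alias plus deprecation warning" pattern used by this and several later renames can be sketched in pure Python as follows. `lstsq` here is a hypothetical one-unknown stand-in, not `torch.lstsq`, and the warning text is illustrative rather than PyTorch's actual message.

```python
import functools
import warnings
from typing import Callable, Sequence


def lstsq(b: Sequence[float], a: Sequence[float]) -> float:
    """Hypothetical stand-in: x minimizing ||a*x - b|| for a single unknown,
    i.e. x = <a, b> / <a, a>."""
    return sum(ai * bi for ai, bi in zip(a, b)) / sum(ai * ai for ai in a)


def deprecated_alias(new_fn: Callable, old_name: str) -> Callable:
    """Expose new_fn under old_name: warn on every call, then forward."""
    @functools.wraps(new_fn)
    def wrapper(*args, **kwargs):
        warnings.warn(
            f"{old_name} is deprecated; use {new_fn.__name__} instead",
            DeprecationWarning,
            stacklevel=2,  # attribute the warning to the caller's line
        )
        return new_fn(*args, **kwargs)

    wrapper.__name__ = old_name
    return wrapper


gels = deprecated_alias(lstsq, "gels")
```

Keeping the old name callable avoids breaking existing code, while the warning nudges callers toward the new name.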
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23460

Test Plan: - All tests should pass to confirm that the patch is correct

Differential Revision: D16547834

Pulled By: colesbury

fbshipit-source-id: b3bdb8f4c5d14c7716c3d9528e40324cc544e496
2019-07-30 09:56:04 -07:00
Yaxun (Sam) Liu
0c9979dd7d Fix TestCuda.test_events_wait (#23520)
Summary:
The PyTorch test suite sets a policy() method to assertLeaksNoCudaTensors.
Whenever a test is run, assertLeaksNoCudaTensors is called,
which in turn calls CudaMemoryLeakCheck, which in turn calls
initialize_cuda_context_rng, where it executes torch.randn
on each device, launching a kernel on each device.

Since the kernel may not have finished on device 1, the assertion
self.assertTrue(s1.query()) fails.

The fix is to insert

        torch.cuda.synchronize(d0)
        torch.cuda.synchronize(d1)

at the beginning of the test so that previously launched kernels finish before the real
test begins.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23520

Differential Revision: D16547701

Pulled By: soumith

fbshipit-source-id: 42ad369f909d534e15555493d08e9bb99dd64b6a
2019-07-29 13:09:41 -07:00
Hong Xu
236149edc5 Make randperm work properly on non-contiguous tensors. (#23043)
Summary:
Close https://github.com/pytorch/pytorch/issues/22710
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23043

Differential Revision: D16446340

Pulled By: VitalyFedyunin

fbshipit-source-id: 1760af310fee71b369e1aaaf96546277058611c9
2019-07-29 11:59:04 -07:00
Johannes M Dieterich
4cd726c7b3 Update ROCm CI to python3.6 (#23088)
Summary:
Rehash of https://github.com/pytorch/pytorch/issues/22322 .

Given that python 2.7 will be EOL'd on Jan 1, 2020 and we have models depending on python3.5+, we'd like to update the ROCm CI across the board to python3.6.

This PR adds the skip tests and some semantic changes for PyTorch.

Compared to #22322, this also adds a pattern-match skip to the python find step in the PyTorch build, so that the match is skipped for anything but the ROCm CI.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23088

Differential Revision: D16448261

Pulled By: bddppq

fbshipit-source-id: 69ece1a213418d9abf1444c496dce1c190ee07c8
2019-07-23 23:07:45 -07:00
Vishwak Srinivasan
0ab19d66ee Port lu_solve to ATen (#22379)
Summary:
Changelog:
- Port TH implementation to ATen/native/BatchLinearAlgebra.cpp
- Port THC implementation to ATen/native/cuda/BatchLinearAlgebra.cu
- Remove TH/THC implementations
- Update doc strings
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22379

Test Plan: - Added new tests in test_torch.py (port to test_cuda.py exists)

Differential Revision: D16089645

Pulled By: zou3519

fbshipit-source-id: dc8561aadacacb23e80c375b4fec687df2b6bbc8
2019-07-23 19:11:35 -07:00
Junjie Bai
eb76b7a564 Revert D16199862: [pytorch][PR] [ROCm] Update ROCm CI to python3.6
Differential Revision:
D16199862

Original commit changeset: 46ca6029a232

fbshipit-source-id: 2843b919f2655674e39dc764053621994061a12b
2019-07-17 14:26:56 -07:00
iotamudelta
031b406c38 Update ROCm CI to python3.6 (#22322)
Summary:
Given that python 2.7 will be EOL'd on Jan 1, 2020 and we have models depending on python3.5+, we'd like to update the ROCm CI across the board to python3.6.

This PR adds the skip tests and some semantic changes for PyTorch.

Open tasks/questions:
* RoiAlignTest.CheckCPUGPUEqual fails in the Caffe2 unit tests. Is this failure expected, or can the test be skipped?
* For testing, I've used update-alternatives on CentOS/Ubuntu to select python == python 3.6. Is this the preferred way?
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22322

Differential Revision: D16199862

Pulled By: ezyang

fbshipit-source-id: 46ca6029a232f7d23f3fdb5efc33ae39a379fca8
2019-07-17 13:42:30 -07:00
vishwakftw
7d055c21b3 Port SVD to ATen, enable batching for matrix inputs (#21588)
Summary:
Changelog:
- Port SVD TH implementation to ATen/native/BatchLinearAlgebra.cpp
- Port SVD THC implementation to ATen/native/cuda/BatchLinearAlgebra.cu
- Allow batches of matrices as arguments to `torch.svd`
- Remove existing implementations in TH and THC
- Update doc string
- Update derivatives to support batching
- Modify nuclear norm implementation to use at::svd instead of _batch_svd
- Remove _batch_svd as it is redundant
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21588

Test Plan:
- Add new test suite for SVD in test_torch.py with port to test_cuda.py
- Add tests in common_methods_invocations.py for derivative testing

Differential Revision: D16266115

Pulled By: nairbv

fbshipit-source-id: e89bb0dbd8f2d58bd758b7830d2389c477aa61fb
2019-07-15 13:34:01 -07:00
Hong Xu
7750cae722 Refactor and improve randperm tests.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22121

Test Plan: Imported from OSS

Differential Revision: D16153794

Pulled By: li-roy

fbshipit-source-id: 4dbfa6cfcc79f6d431918a6646664215fa9ea0b9
2019-07-10 12:23:33 -07:00
Hong Xu
0f7c3710dd Support Half type in randperm.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22102

Test Plan: Imported from OSS

Differential Revision: D16153586

Pulled By: li-roy

fbshipit-source-id: d58e3dbc5da893005f4eaf521a28b0d752274eff
2019-07-10 12:23:25 -07:00
Hong Xu
574e808680 Add a bitwise NOT operator for integer and Boolean types (CUDA).
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22320

Test Plan: Imported from OSS

Differential Revision: D16183578

Pulled By: colesbury

fbshipit-source-id: 2f72cce5e10fd637be1ac87e1bbfe0937a661034
2019-07-10 12:17:48 -07:00
Brandon Amos
046c4589df lu: When not using pivoting, return the identity permutation instead of zeros (#22242)
Summary:
Some of my qpth users have told me that updating to the latest version of PyTorch and replacing the btrifact/btrisolve calls with the LU ones wasn't working and I didn't believe them until I tried it myself :)

These updates have broken unpivoted LU factorizations/solves on CUDA. The LU factorization code used to return the identity permutation when pivoting wasn't used but now returns all zeros as the pivots. This PR reverts it back to return the identity permutation. I've not yet tested this code as I'm having some trouble compiling PyTorch with this and am hitting https://github.com/pytorch/pytorch/issues/21700 and am not sure how to disable that option.

Here's a MWE to reproduce the broken behavior, and my fix.

```python
import torch

torch.manual_seed(0)

n = 4
L = torch.randn(n,n)
A = L.mm(L.t()).unsqueeze(0)
b = torch.randn(1, n)

A_lu_cpu = torch.lu(A)
A_lu_cuda_nopivot = torch.lu(A.cuda(), pivot=False)
A_lu_cuda_pivot = torch.lu(A.cuda(), pivot=True)
print('A_lu_cuda_nopivot\n', A_lu_cuda_nopivot)
print('-----\nA_lu_cuda_pivot\n', A_lu_cuda_nopivot)

x_cpu = b.lu_solve(*A_lu_cpu)
x_cuda_nopivot = b.cuda().lu_solve(*A_lu_cuda_nopivot)
x_cuda_nopivot_fixed = b.cuda().lu_solve(
    A_lu_cuda_nopivot[0], torch.arange(1, n+1, device='cuda:0').int())
x_cuda_pivot = b.cuda().lu_solve(*A_lu_cuda_pivot)

print(x_cpu, x_cuda_nopivot, x_cuda_nopivot_fixed, x_cuda_pivot)
```

Output:

```
A_lu_cuda_nopivot
 (tensor([[[ 2.8465, -0.7560,  0.8716, -1.7337],
         [-0.2656,  5.5724, -1.1316,  0.6678],
         [ 0.3062, -0.2031,  1.4206, -0.5438],
         [-0.6091,  0.1198, -0.3828,  1.5103]]], device='cuda:0'), tensor([[0, 0, 0, 0]], device='cuda:0', dtype=torch.int32))

-----

A_lu_cuda_pivot
 (tensor([[[ 2.8465, -0.7560,  0.8716, -1.7337],
         [-0.2656,  5.5724, -1.1316,  0.6678],
         [ 0.3062, -0.2031,  1.4206, -0.5438],
         [-0.6091,  0.1198, -0.3828,  1.5103]]], device='cuda:0'), tensor([[0, 0, 0, 0]], device='cuda:0', dtype=torch.int32))

(tensor([[-0.3121, -0.1673, -0.4450, -0.2483]]),
 tensor([[-0.1661, -0.1875, -0.5694, -0.4772]], device='cuda:0'),
 tensor([[-0.3121, -0.1673, -0.4450, -0.2483]], device='cuda:0'),
 tensor([[-0.3121, -0.1673, -0.4450, -0.2483]], device='cuda:0'))
```
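The workaround above builds the pivots with `torch.arange(1, n+1)`, i.e. 1-based row indices in the LAPACK `getrf` convention, where entry `i` names the row that was swapped with row `i`. The pure-Python sketch below (helper names are hypothetical) shows why `[1, 2, ..., n]` is the identity permutation, while all-zero pivots do not describe a valid permutation at all.

```python
from typing import List, Sequence


def identity_pivots(n: int) -> List[int]:
    # 1-based: "row i was interchanged with row i", i.e. no swap at all.
    return list(range(1, n + 1))


def apply_pivots(pivots: Sequence[int], b: Sequence[float]) -> List[float]:
    """Apply LAPACK-style row interchanges to b in order: for each i,
    swap entries i and pivots[i] - 1 (pivots are 1-based)."""
    out = list(b)
    for i, p in enumerate(pivots):
        j = p - 1
        # A pivot of 0 gives j == -1, which Python would silently wrap
        # around to the last row, so reject it explicitly.
        if not 0 <= j < len(out):
            raise ValueError(f"invalid pivot {p} at position {i}")
        out[i], out[j] = out[j], out[i]
    return out
```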
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22242

Differential Revision: D16049334

Pulled By: ezyang

fbshipit-source-id: 7eacae810d87ffbdf8e07159bbbc03866dd9979d
2019-07-09 11:16:50 -07:00
iurii zdebskyi
59c42595e0 Enabled gather and scatter for bool tensor (#21924)
Summary:
- moving stuff around in order to enable bool.
- Added implementation of atomicAdd(bool, bool)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21924

Differential Revision: D15883711

Pulled By: izdeby

fbshipit-source-id: 733f35c2bc3d87cec9f9687d72b62d2d2cd7c03e
2019-06-27 09:07:50 -07:00
Edward Yang
8f9e0f77dd Turn off non-default stream testing. (#21793)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21793
ghimport-source-id: 5264fa90ca77fbc79898cfa2f0ee02f47dec27d4

Test Plan: Imported from OSS

Differential Revision: D15874814

Pulled By: ezyang

fbshipit-source-id: 5c51ab9ae431faf2db549b88b07ba00783acab25
2019-06-18 07:00:08 -07:00
Stefan Krah
710821875a Fix flaky nuclear_norm() test (#21638)
Summary:
Try to fix a sporadic failure on some CIs.

I've run this test hundreds of times on my machine (GeForce 1060, MAGMA) but I cannot reproduce this.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21638

Differential Revision: D15827779

Pulled By: ezyang

fbshipit-source-id: 3586075e48907b3b84a101c560a34cc733514a02
2019-06-14 11:40:03 -07:00
vishwakftw
4c03ac7ac4 Allow batch sizes > 65535 for inverse, solve, cholesky_solve and triangular_solve (#21689)
Summary:
Changelog:
- Iterate over mini-batches of at most 65535 matrices
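The chunking idea can be sketched in pure Python: split the batch into `[start, stop)` ranges no longer than the limit and run the batched operation on each slice. `op` is a hypothetical batched routine; only the 65535 limit comes from the changelog.

```python
from typing import Callable, Iterator, List, Tuple, TypeVar

T = TypeVar("T")
R = TypeVar("R")

MAX_MATRICES_PER_CALL = 65535  # limit quoted in the changelog above


def minibatch_ranges(batch_size: int,
                     limit: int = MAX_MATRICES_PER_CALL) -> Iterator[Tuple[int, int]]:
    """Yield [start, stop) ranges covering batch_size items, each at most `limit` long."""
    for start in range(0, batch_size, limit):
        yield start, min(start + limit, batch_size)


def apply_in_minibatches(batch: List[T], op: Callable[[List[T]], List[R]],
                         limit: int = MAX_MATRICES_PER_CALL) -> List[R]:
    """Run the batched `op` on slices no larger than `limit` and concatenate."""
    out: List[R] = []
    for start, stop in minibatch_ranges(len(batch), limit):
        out.extend(op(batch[start:stop]))
    return out
```

For example, a batch of 140000 matrices would be processed as slices of 65535, 65535 and 8930.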
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21689

Differential Revision: D15800254

Pulled By: soumith

fbshipit-source-id: c743ff13f1ba25d26874429d44e41a3c0ed21d6a
2019-06-12 23:30:19 -07:00
vishwakftw
9737b166a4 Fix bug in multinomial_alias_draw (#21324)
Summary:
An incorrect increment / decrement caused samples not to be drawn from the intended multinomial distribution

Changelog:
- Remove the incorrect increment / decrement operation

Fixes https://github.com/pytorch/pytorch/issues/21257, fixes https://github.com/pytorch/pytorch/issues/21508

cc: LeviViana neerajprad
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21324

Differential Revision: D15761029

Pulled By: colesbury

fbshipit-source-id: 2aeb51e2d3cfdb8356806a7d5b12d4b9910e37fb
2019-06-11 15:18:17 -07:00
Stefan Krah
8b9b215dc5 Add a 'dim' argument to nuclear norm (#21022)
Summary:
Addresses #18275.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21022

Differential Revision: D15743515

Pulled By: ezyang

fbshipit-source-id: e4aaea0bd7f863a2abad45c4322d6a9fb02a88e3
2019-06-10 15:18:34 -07:00
Vishwak Srinivasan
3df5a46a99 Skip triangular_solve CUDA test on non-default stream
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21590

Differential Revision: D15742549

Pulled By: ezyang

fbshipit-source-id: fd5b2cbce86e5f229c2ffba114ef362934296d07
2019-06-10 11:38:42 -07:00
huba
b144ba66d5 Change PyTorch tests to use non-default CUDA stream (#21474)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21474
ghimport-source-id: b2477765362248a80557d1a20db02a1290bdcde3

Differential Revision: D15699700

Pulled By: fbhuba

fbshipit-source-id: 1aa4309fec0982c8477cfab29ca5f42d2b171f97
2019-06-07 10:24:48 -07:00
Edward Yang
8c9a88bdab Make test_cuda.py work on Python 2. (#21466)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21466
ghimport-source-id: 0a235c8b8cf994621a5a5afe022340dd35764c91

Differential Revision: D15698096

Pulled By: ezyang

fbshipit-source-id: 1759c2681071e9c7e83de3de86daf4333c5f8f3a
2019-06-07 08:13:03 -07:00
vishwakftw
f6ec464890 Enable batched QR decomposition and add a some option (#20689)
Summary:
This PR covers two important points with respect to the QR decomposition:
- batching of input matrices (#7500)
- adding `some` as an option in `torch.qr` akin to NumPy's `mode` option (#10538)

Changelog:
- Enable batching for inputs to `torch.qr`
- Move QR decomposition implementation to ATen (CPU and CUDA)
- Remove existing implementations in TH/THC
- Add a `some` option to `torch.qr` that will enable users to switch between complete and reduced decomposition
- Modify doc strings
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20689

Differential Revision: D15529230

Pulled By: soumith

fbshipit-source-id: 16af82b1d2db8a3a758fa8a5f798d83f5f950efb
2019-05-28 17:52:37 -07:00
Sam Gross
b85c52923b Re-land "Fix advanced indexing on "huge" Tensors" (#21019)
Summary:
This is #20919 without the changes to aten/src/THC/THCIntegerDivider.cuh
that broke the ROCm build.

cc bddppq

Original summary:

This fixes advanced indexing in cases where there's more than 2^31-1
bytes in the output. The `gpu_index_kernel` was missing the
`can_use_32bit_indexing`/`with_32bit_indexing` check.

This also adds a number of TORCH_INTERNAL_ASSERTs in Loops.cuh,
OffsetCalculator, and IntDivider that check that sizes fit in a signed
32-bit integer.

More comprehensive tests that require a 32 GB GPU are here:
https://gist.github.com/colesbury/e29387f5851521256dff562be07b981e
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21019

Differential Revision: D15518477

Pulled By: colesbury

fbshipit-source-id: 4db5626fda76eb58250793e8aa7d4f2832db3a34
2019-05-28 12:45:56 -07:00
Junjie Bai
5ddbfc97e9 Revert D15501945: [pytorch][PR] Fix advanced indexing on "huge" Tensors
Differential Revision:
D15501945

Original commit changeset: e876e678e866

fbshipit-source-id: 2833eb118a62e301571a983529f6e4fc91442581
2019-05-27 20:26:37 -07:00
Sam Gross
b93bdf6989 Fix advanced indexing on "huge" Tensors (#20919)
Summary:
This fixes advanced indexing in cases where there's more than 2^31-1
bytes in the output. The `gpu_index_kernel` was missing the
`can_use_32bit_indexing`/`with_32bit_indexing` check.

This also adds a number of TORCH_INTERNAL_ASSERTs in Loops.cuh,
OffsetCalculator, and IntDivider that check that sizes fit in a signed
32-bit integer.

More comprehensive tests that require a 32 GB GPU are here:
https://gist.github.com/colesbury/e29387f5851521256dff562be07b981e
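The "check, then split" idea behind `can_use_32bit_indexing`/`with_32bit_indexing` can be illustrated with a simplified pure-Python model. This is an assumption-laden sketch, not ATen's code: it computes the largest byte offset of a strided layout (non-negative strides assumed) and, if that exceeds 2^31-1, how many slices along dim 0 fit into one 32-bit-indexed launch. ATen's real splitting operates on a TensorIterator and may choose dimensions differently; all names below are hypothetical.

```python
from typing import Sequence

INT32_MAX = 2**31 - 1


def max_byte_offset(sizes: Sequence[int], strides: Sequence[int], element_size: int) -> int:
    """Largest byte offset reachable from the start of a strided tensor
    (strides in elements; assumes non-negative strides)."""
    if any(s == 0 for s in sizes):
        return 0  # empty tensor: nothing is addressed
    return element_size * sum((s - 1) * st for s, st in zip(sizes, strides))


def fits_32bit(sizes: Sequence[int], strides: Sequence[int], element_size: int) -> bool:
    return max_byte_offset(sizes, strides, element_size) <= INT32_MAX


def rows_per_chunk(sizes: Sequence[int], strides: Sequence[int], element_size: int) -> int:
    """How many dim-0 slices can share one 32-bit-indexed launch."""
    inner = max_byte_offset(sizes[1:], strides[1:], element_size)
    if inner > INT32_MAX:
        raise ValueError("a single dim-0 slice is already too large")
    per_row = element_size * strides[0]
    if per_row == 0:
        return sizes[0]
    return min(sizes[0], (INT32_MAX - inner) // per_row + 1)
```

For a contiguous float32 tensor of shape `[65536, 32768]` (8 GiB), the check fails and the sketch splits dim 0 into chunks of 16384 rows.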

Fixes #20888
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20919

Differential Revision: D15501945

Pulled By: colesbury

fbshipit-source-id: e876e678e866d2efda8ee92c47a1d2d1310671f0
2019-05-24 16:25:04 -07:00
Sam Gross
dee11a92c1 Use Device instead of Backend in TensorIterator (#20690)
Summary:
This PR also moves Device::validate into the header file, which makes
statements like `Device d = kCPU` effectively free.

Device includes the device's index, so TensorIterator::compute_types
now implicitly checks that all CUDA inputs are on the same GPU.
Previously, this was done ad-hoc in places like TensorIterator::binary_op.

Note that zero-dim Tensors (scalars) are NOT required to be on the
same device as other inputs because they behave almost like Python numbers.
TensorIterator handles copying zero-dim Tensors to the common device.

Prior to this PR, TensorIterator would copy zero-dim Tensors between CPU
and GPU, but not between different GPUs (because Backend didn't encode
the GPU index). This removes that restriction.
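The device rule described above can be modelled in a few lines of pure Python: operands with one or more dimensions must agree on the full device, GPU index included, while zero-dim operands are exempt. Operands are hypothetical `(device, ndim)` pairs, and the all-scalar fallback is an assumption of this sketch, not something the commit specifies.

```python
from typing import Optional, Sequence, Tuple

Operand = Tuple[str, int]  # (device string such as "cuda:0", ndim)


def common_device(operands: Sequence[Operand]) -> str:
    """Pick the device a TensorIterator-like op would run on."""
    device: Optional[str] = None
    for dev, ndim in operands:
        if ndim == 0:
            continue  # scalars behave like Python numbers; copied later
        if device is None:
            device = dev
        elif dev != device:  # "cuda:0" != "cuda:1", so the GPU index matters
            raise ValueError(f"expected all tensors on {device}, found one on {dev}")
    if device is None:
        # Assumption: with only zero-dim inputs, use the first one's device.
        device = operands[0][0]
    return device
```

With `Backend`, `"cuda:0"` and `"cuda:1"` would both have been plain CUDA; comparing full devices is what makes the same-GPU check implicit.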
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20690

Differential Revision: D15414826

Pulled By: colesbury

fbshipit-source-id: 1d0ad1f7d663252af36dd4590bcda418c2f7a09f
2019-05-24 12:14:08 -07:00
Sam Gross
320c38555e Refactor CUDA copy and general copy dispatch (#20685)
Summary:
Copy.cu goes from 308 to 190 lines of code. In general, it uses the same
copy strategy: cudaMemcpyAsync, a pointwise kernel, or a copy
using temporary buffers. The pointwise kernel has slightly improved
performance when broadcasting due to faster index calculation.

This deletes "`s_copy_`", "`_s_copy_from`", and "`_copy_same_type_`". The only
entry-point now is "`copy_`".

A mini-benchmark is here:
https://gist.github.com/colesbury/706de1d4e8260afe046020988410b992

Before:
https://gist.github.com/colesbury/ab454b6fe3791bff420d7bcf8c041f18
After:
https://gist.github.com/colesbury/9024d242b56ab09a9ec985fa6d1620bc

Results were measured on 2.2 GHz Broadwell; no-turbo; one thread;
compiled with GCC 7.3.0. (Results are slower than typical usage due to
turbo being off.)

The only significant difference is in the CUDA [1024] -> [1024, 1024]
broadcasting copy, which is ~25% faster. I don't expect a noticeable
difference in real programs.

CPU copy overhead is slightly lower (~200 ns), but I don't expect
anyone to notice that.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20685

Differential Revision: D15414819

Pulled By: colesbury

fbshipit-source-id: d3c6e04a5020470e3bef15b1fc09503cae5df440
2019-05-20 17:09:44 -07:00
Iurii Zdebskyi
71260b98e2 Fixed histc return type for CUDA (#20369)
Summary:
Fixing reported [issue](https://github.com/pytorch/pytorch/issues/20208).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20369

Reviewed By: zou3519

Differential Revision: D15300959

Pulled By: izdeby

fbshipit-source-id: 219692f99a66ea433112dfc226132eb6867122cf
2019-05-20 08:08:28 -07:00
Roy Li
163f0e182c Fix bug in non_blocking copy (#20305)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20305
ghimport-source-id: eb3dacb10fd93bbb5a6bbe078ed1ec842163d0e6

Differential Revision: D15276094

Pulled By: li-roy

fbshipit-source-id: 4728f419aa050e6c94a4f62231fa1a86caa556a7
2019-05-11 15:20:19 -07:00
Phúc Lê
9b272affde Add base support to torch.logspace, default base=10 (#19542)
Summary:
Add base support for torch.logspace. See #19220 for details.
SsnL, could you give feedback? Thanks a lot.
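The semantics of the new argument can be stated as `base ** linspace(start, end, steps)`: the exponents are evenly spaced, so the values are evenly spaced on a logarithmic scale with the given base. A pure-Python sketch of that definition (not PyTorch's kernel):

```python
from typing import List


def logspace(start: float, end: float, steps: int, base: float = 10.0) -> List[float]:
    """base ** x for `steps` evenly spaced exponents x from start to end (inclusive)."""
    if steps <= 0:
        return []
    if steps == 1:
        return [base ** start]
    step = (end - start) / (steps - 1)
    return [base ** (start + i * step) for i in range(steps)]
```

With the default `base=10.0`, `logspace(0, 3, 4)` gives 1, 10, 100 and 1000; with `base=2` it gives 1, 2, 4 and 8.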
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19542

Differential Revision: D15028484

Pulled By: soumith

fbshipit-source-id: fe5a58a203b279103abbc192c754c25d5031498e
2019-04-23 15:06:34 -07:00
SsnL
dce3d74dfb add torch.cuda.synchronize(device=None) (#19573)
Summary:
fixes https://github.com/pytorch/pytorch/issues/19509
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19573

Differential Revision: D15045730

Pulled By: ezyang

fbshipit-source-id: 732721b4b360fc4348ca7c87d4cd1386e7651bdd
2019-04-23 08:40:38 -07:00
vishwakftw
c30224ad21 Rename potri to cholesky_inverse (#19498)
Summary:
Changelog:
- Rename `potri` to `cholesky_inverse` to remain consistent with names of `cholesky` methods (`cholesky`, `cholesky_solve`)
- Fix all callsites
- Rename all tests
- Create a tentative alias for `cholesky_inverse` under the name `potri` and add a deprecation warning to not promote usage
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19498

Differential Revision: D15029901

Pulled By: ezyang

fbshipit-source-id: 2074286dc93d8744cdc9a45d54644fe57df3a57a
2019-04-22 08:18:39 -07:00
Tongzhou Wang
973d51079b Add device-specific cuFFT plan caches (#19300)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/19224
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19300

Differential Revision: D14986967

Pulled By: soumith

fbshipit-source-id: 8c31237db50d6924bba1472434c10326610d9255
2019-04-18 06:39:35 -07:00
Richard Zou
eaa14f5f59 Error out on in-place binops on tensors with internal overlap (#19317)
Summary:
This adds checks for `mul_`, `add_`, `sub_`, `div_`, the most common
binops. See #17935 for more details.
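"Internal overlap" means that two different indices of the same tensor refer to the same memory, as in a broadcast dimension with stride 0 (for example, the result of `expand`). An in-place `add_` on such a tensor would update one element several times. The sketch below is a conservative pure-Python classifier in that spirit. It is a hypothetical model rather than ATen's exact code: contiguous layouts are safe, a stride-0 dimension of size > 1 definitely overlaps, and anything else is left undecided and not rejected.

```python
from enum import Enum
from typing import Sequence


class Overlap(Enum):
    NO = "no"
    YES = "yes"
    TOO_HARD = "too_hard"


def is_contiguous(sizes: Sequence[int], strides: Sequence[int]) -> bool:
    expected = 1
    for size, stride in zip(reversed(sizes), reversed(strides)):
        if size != 1 and stride != expected:
            return False
        expected *= size
    return True


def internal_overlap(sizes: Sequence[int], strides: Sequence[int]) -> Overlap:
    """Conservative answer to: can two indices map to the same element?"""
    if is_contiguous(sizes, strides):
        return Overlap.NO
    # A broadcast (stride-0) dimension of size > 1 aliases memory for sure.
    if any(size > 1 and stride == 0 for size, stride in zip(sizes, strides)):
        return Overlap.YES
    return Overlap.TOO_HARD  # would need a harder analysis; don't guess


def check_inplace_binop(sizes: Sequence[int], strides: Sequence[int]) -> None:
    if internal_overlap(sizes, strides) is Overlap.YES:
        raise RuntimeError("in-place operation on a tensor with internal overlap")
```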
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19317

Differential Revision: D14972399

Pulled By: zou3519

fbshipit-source-id: b9de331dbdb2544ee859ded725a5b5659bfd11d2
2019-04-17 13:02:07 -07:00
J M Dieterich
31686805f2 Enable unit tests for ROCm 2.3 (#19307)
Summary:
Unit tests that hang on clock64() calls are now fixed.

test_gamma_gpu_sample is now fixed.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19307

Differential Revision: D14953420

Pulled By: bddppq

fbshipit-source-id: efe807b54e047578415eb1b1e03f8ad44ea27c13
2019-04-16 10:58:27 -07:00
Sam Gross
7caad0ed33 Free all blocks with outstanding events on OOM-retry (#19222)
Summary:
The caching allocator tries to free all blocks on an out-of-memory
error. Previously, it did not free blocks that still had outstanding
stream uses. This change synchronizes on the outstanding events and
frees those blocks.
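A toy model of the new OOM-retry behaviour: instead of skipping cached blocks that still have outstanding stream uses, the allocator waits on their recorded events and then releases them too. `PendingEvent` and `CachedBlock` are hypothetical stand-ins; the real allocator is C++ and waits on actual CUDA events.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PendingEvent:
    """Hypothetical stand-in for a CUDA event recorded on another stream."""
    done: bool = False

    def synchronize(self) -> None:
        self.done = True  # a real event would block until the stream reaches it


@dataclass
class CachedBlock:
    size: int
    events: List[PendingEvent] = field(default_factory=list)


def free_all_on_oom(cache: List[CachedBlock]) -> int:
    """Release every cached block, first waiting on outstanding stream uses
    rather than skipping those blocks. Returns the number of bytes released."""
    released = 0
    for block in cache:
        for event in block.events:
            event.synchronize()
        block.events.clear()
        released += block.size
    cache.clear()
    return released
```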

See #19219
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19222

Differential Revision: D14925071

Pulled By: colesbury

fbshipit-source-id: a2e9fe957ec11b00ea8e6c0468436c519667c558
2019-04-15 11:29:27 -07:00
Johannes M Dieterich
d8669a2c7e Enable working ROCm tests (#19169)
Summary:
Enable multi-GPU tests that work with ROCm 2.2. They have been run three times on CI to ensure stability.

While there, remove skipIfRocm annotations for tests that depend on MAGMA. They still skip but now for the correct reason (no MAGMA) to improve our diagnostics.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19169

Differential Revision: D14924812

Pulled By: bddppq

fbshipit-source-id: 8b88f58bba58a08ddcd439e899a0abc6198fef64
2019-04-12 21:51:10 -07:00
Vishwak Srinivasan
487388d8ad Rename btrisolve to lu_solve (#18726)
Summary:
Changelog:
- Rename `btrisolve` to `lu_solve` to remain consistent with names of solve methods (`cholesky_solve`, `triangular_solve`, `solve`)
- Fix all callsites
- Rename all tests
- Create a tentative alias for `lu_solve` under the name `btrisolve` and add a deprecation warning to not promote usage
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18726

Differential Revision: D14726237

Pulled By: zou3519

fbshipit-source-id: bf25f6c79062183a4153015e0ec7ebab2c8b986b
2019-04-09 15:21:24 -07:00
J M Dieterich
e45e3634d6 add launch bounds, enable more tests (#18909)
Summary:
Add launch bounds annotations for ROCm arising from maxThreadsPerBlock and apply threads use.

Enable tests that now work.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18909

Differential Revision: D14801490

Pulled By: ezyang

fbshipit-source-id: b81c97fc783a2627bc7e31b32036a364cfe40cc7
2019-04-05 10:17:15 -07:00
Roy Li
f5741eb855 Store ScalarType and Backend instead of Type in TensorIterator
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17601

Reviewed By: ezyang

Differential Revision: D14274754

fbshipit-source-id: b08880ae586b6ae57d4c0bbeb203796d087926c4
2019-04-04 02:24:16 -07:00
vishwakftw
baac5489a8 Expose alias multinomial methods to ATen (#17904)
Summary:
This PR exposes the multinomialAliasSetup and multinomialAliasDraw methods.
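`multinomialAliasSetup` and `multinomialAliasDraw` are named after the alias method for sampling from a discrete distribution: an O(n) setup builds two tables, after which each draw costs O(1). The sketch below is Vose's variant in pure Python. It illustrates the technique and is not a transcription of the ATen code; the function names are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def alias_setup(probs: Sequence[float]) -> Tuple[List[float], List[int]]:
    """Build the probability table q and the alias table (Vose's method)."""
    n = len(probs)
    total = sum(probs)
    scaled = [p * n / total for p in probs]  # scaled values average to 1
    q = [0.0] * n
    alias = list(range(n))
    small = [i for i, s in enumerate(scaled) if s < 1.0]
    large = [i for i, s in enumerate(scaled) if s >= 1.0]
    while small and large:
        small_i, large_i = small.pop(), large.pop()
        q[small_i] = scaled[small_i]   # keep column small_i with this probability,
        alias[small_i] = large_i       # otherwise fall through to large_i
        scaled[large_i] -= 1.0 - scaled[small_i]  # mass donated to small_i
        (small if scaled[large_i] < 1.0 else large).append(large_i)
    for i in small + large:  # leftovers equal 1 up to rounding error
        q[i] = 1.0
    return q, alias


def alias_draw(q: Sequence[float], alias: Sequence[int], rng: random.Random) -> int:
    """O(1) draw: pick a column uniformly, then flip a biased coin."""
    i = rng.randrange(len(q))
    return i if rng.random() < q[i] else alias[i]
```

Each column holds at most two outcomes, which is why a single uniform index plus one biased coin flip suffices per sample.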

cc: neerajprad
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17904

Differential Revision: D14700205

Pulled By: ezyang

fbshipit-source-id: 16462fb1f1ef1d560fd586632ea356b23e966ee3
2019-04-02 07:56:41 -07:00
Edward Yang
173f224570 Turn on F401: Unused import warning. (#18598)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18598
ghimport-source-id: c74597e5e7437e94a43c163cee0639b20d0d0c6a

Stack from [ghstack](https://github.com/ezyang/ghstack):
* **#18598 Turn on F401: Unused import warning.**

This was requested by someone at Facebook; this lint is turned
on for Facebook by default.  "Sure, why not."

I had to noqa a number of imports in __init__.  Hypothetically
we're supposed to use __all__ in this case, but I was too lazy
to fix it.  Left for future work.

Be careful!  flake8-2 and flake8-3 behave differently with
respect to import resolution for # type: comments.  flake8-3 will
report an import unused; flake8-2 will not.  For now, I just
noqa'd all these sites.

All the changes were done by hand.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Differential Revision: D14687478

fbshipit-source-id: 30d532381e914091aadfa0d2a5a89404819663e3
2019-03-30 09:01:17 -07:00
Vishwak Srinivasan
e73be58ff7 Rename btriunpack to lu_unpack (#18529)
Summary:
Changelog:
- Renames `btriunpack` to `lu_unpack` to remain consistent with the `lu` function interface.
- Rename all relevant tests, fix callsites
- Create a tentative alias for `lu_unpack` under the name `btriunpack` and add a deprecation warning to not promote usage.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18529

Differential Revision: D14683161

Pulled By: soumith

fbshipit-source-id: 994287eaa15c50fd74c2f1c7646edfc61e8099b1
2019-03-29 13:01:30 -07:00
Vishwak Srinivasan
d859031ebf Rename btrifact* to lu (#18435)
Summary:
Changelog:

- Renames `btrifact` and `btrifact_with_info` to `lu` to remain consistent with other factorization methods (`qr` and `svd`).
- Now, we will only have one function and methods named `lu`, which performs `lu` decomposition. This function takes a get_infos kwarg, which when set to True includes an infos tensor in the tuple.
- Rename all tests, fix callsites
- Create a tentative alias for `lu` under the name `btrifact` and `btrifact_with_info`, and add a deprecation warning to not promote usage.
- Add the single batch version for `lu` so that users don't have to unsqueeze and squeeze for a single square matrix (see changes in determinant computation in `LinearAlgebra.cpp`)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18435

Differential Revision: D14680352

Pulled By: soumith

fbshipit-source-id: af58dfc11fa53d9e8e0318c720beaf5502978cd8
2019-03-29 00:34:30 -07:00
jithunnair-amd
fdedc62c26 enable more unit tests (#18537)
Summary:
Enable unit tests that work with ROCm 2.3. In particular, these are unit tests that we previously skipped for double data types, and some tests for multi-GPU setups.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18537

Differential Revision: D14651822

Pulled By: ezyang

fbshipit-source-id: 7dd575504ebe235a91489866c91000e9754b1235
2019-03-27 14:27:23 -07:00
Tongzhou Wang
5292685d2f Improve numerical precision of (s)logdet (#18449)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/18448 and https://github.com/pytorch/pytorch/issues/18450
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18449

Differential Revision: D14611638

Pulled By: soumith

fbshipit-source-id: 4f1f27ab5316a92d2783e734169f599afed743cf
2019-03-26 15:32:14 -07:00