Commit Graph

122 Commits

Author SHA1 Message Date
Du Phan
c345212c86 Support gpu triangle solve (#6648)
* add cuda trtrs

* remove queue

* add test trtrs
2018-04-17 14:33:39 +02:00
Richard Zou
6c0f74089f
More precise digamma (#6517)
* More precise digamma

Fixes #6190.

This is a rebase of #3955 with some tweaks for better performance around
poles. The code is ported over from cephes with permission.

By itself, the cephes code returns inf for the poles.

For better performance around the poles with float32, one intermediate
step is always computed with double precision, regardless of dtype.
This step computes `PI / tan(PI * input)`. This is necessary because small (1e-6)
rounding errors in the inputs to tan have strong effects on the output
(i.e., the derivative of tan is very large at some points).

* Replace usages of finite-differences digamma with newly implemented digamma

* Better behavior near and at poles

* ScalarConvert -> scalar_cast for readability
2018-04-13 11:49:09 -04:00
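As an illustration (not the cephes port in the commit), the double-precision reflection step described above can be sketched in Python; `reflection_term` is a hypothetical helper name:

```python
import math

def reflection_term(x: float) -> float:
    # Sketch of the step the commit computes in double precision regardless
    # of dtype: pi / tan(pi * x), used in the digamma reflection formula.
    # Tiny rounding errors in tan's argument are amplified near the poles,
    # because tan's derivative is very large there.
    return math.pi / math.tan(math.pi * x)
```

For example, at x = 0.5 the term is nearly zero (tan(pi/2) is astronomically large in float arithmetic), while at x = 0.25 it is close to pi.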
gchanan
749d51414a
Separate cuda-ness from dtype. (#6470)
* Separate cuda-ness from dtype.

There are no longer torch.cuda.int64, etc.; only torch.int64, which corresponds to at::ScalarType.
At the Python arg parser level, the corresponding ATen type is selected from the combination of (ScalarType, Layout, Device).

There is also currently unused code in here for supporting ScalarType in native_functions; this will be used for specifying aggregate types
for reduction functions.

* Fix test_autograd.

* Add defaults to randint_like.

* Track is_cuda in py tensor types.

* Fix test_sparse.

* Fix multiprocessing.

* Fix rnn.

* Fix test_nn.

* Fix flake8.
2018-04-12 14:05:44 -04:00
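A hypothetical sketch (not code from the commit) of the selection rule it describes — deriving the backend type from (dtype, layout, device) instead of from a device-qualified dtype like torch.cuda.int64; `backend_for` is an invented name:

```python
def backend_for(dtype: str, layout: str, device: str) -> str:
    # The concrete type is a function of all three properties, so the
    # dtype itself no longer carries cuda-ness.
    backend = {"cpu": "CPU", "cuda": "CUDA"}[device]
    if layout == "sparse":
        backend = "Sparse" + backend
    return f"{backend}:{dtype}"
```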
Tongzhou Wang
37d5c58f4b Skip all TestTorch tests in test_cuda.py (#6489) 2018-04-10 20:31:05 -04:00
albanD
bb097e2a50 [pytorch] Fix signed random_ (#6463)
* Fix cpu signed random

* fix gpu signed tensor

* add test for signed random_

* cleaner tests

* fix lint
2018-04-10 13:07:04 -04:00
Vishwak Srinivasan
0aa35780bf [ready] Implement log2 and log10 in PyTorch (#6272)
* Implemented log2 and log10

* Re-add incorrectly removed files

* Fix minor bugs

* Fix log1p docs

* Add a try-except for python2 math module in log2 test

* Revert changes made to aten/doc/*

* Fix docstring errors

* Fix windows build
2018-04-05 14:28:37 -04:00
Tongzhou Wang
ecd5de0f36 [fft][2 of 3] Forward for fft methods (#5856)
* implement fft ifft rfft irfft

* add tests for fft ifft rfft irfft
2018-03-28 18:44:29 -04:00
gchanan
6ae0576e1c
Remove dtypes from legacy tensor.new(...) (#6081)
This is in preparation for splitting out sparsity (layout) from dtypes; these are complex to maintain,
and tensor.new(...) is a legacy API in any case.
2018-03-28 18:37:21 -04:00
Richard Zou
9923701a0d Fix crash when cat-ing empty cuda tensors (#5971)
Fixes #5739. The CUDA path for `torch.cat` was missing a check for the
case where all input tensors are empty.
2018-03-23 22:22:39 -04:00
Vedanuj Goswami
08b1324ec2 Fix integer overflow in remainder operator (#5906)
* Fix integer overflow in remainder

* Fix remainder operator in CUDA

* Add tests for remainder integer overflow

* Add has_different_sign static function
2018-03-20 22:05:34 -04:00
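An illustrative Python sketch of the semantics this commit protects (not the C/CUDA code itself): torch.remainder returns a result whose sign follows the divisor, and in C the raw `a % b` can overflow for cases like INT_MIN % -1. Python ints do not overflow, so this only demonstrates the sign adjustment via the `has_different_sign` check the commit adds:

```python
import math

def has_different_sign(a: int, b: int) -> bool:
    return (a < 0) != (b < 0)

def remainder(a: int, b: int) -> int:
    # C-style truncated remainder first, then adjust so the result's sign
    # matches the divisor (torch.remainder semantics).
    r = int(math.fmod(a, b))
    if r != 0 and has_different_sign(r, b):
        r += b
    return r
```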
Thomas Viehmann
7cbe63da86 improve handling of precision issue in torch.multinomial (solves #4858) (#5774)
* improve handling of precision issue in torch.multinomial (solves #4858)

* add test

* review feedback - eliminate size check. Thanks!
2018-03-17 10:26:22 -04:00
Tongzhou Wang
940a0ab67b Add logdet and slogdet (#5393)
* 1. Add logdet and slogdet on the ATen side
2. Previously, det could return a result with an incorrect sign for symmetric
   matrices. This was caused by a wrong assumption about SVD (that U=V^T when
   the input is symmetric). This fixes it.
3. Moreover, after fixing 2 now QR is always needed for det forward. So I moved
   SVD to backward call. Since this is a specific variant of SVD, it is named as
   _svd_with_positive_UV_det, with derivative.yaml entry being svd_backward.
4. Updated/added backward functions for det, logdet and slogdet, which uses
   _svd_with_positive_UV_det and svd_backward inside.
5. Optimized svd_backward:
   a. Avoid unnecessary kernels when only sigma has gradient (this is the usual
      case, and also true with *det backward functions).
   b. Fix SVD double backward by avoiding a nan.

* 1. Add/update grad checks for det, logdet, and slogdet.
2. Fix an incorrect check for dim_args_idx in test_autograd.py
3. Add option to only test a subset of output values, specified by
   test_output_indices, for cases like slogdet where only the
   second output is differentiable.
4. Add better doc for the test generating list.

* Add/improve output tests for det, logdet and slogdet
Add a scaling to random matrices so closeness checks are more robust

* Remove unnecessaery Variable wrappers in some test files

* Add logdet slogdet docs

* Improve an err msg in THTensorLapack.c

* add inverse-based backward for invertible matrices
use SVD only for the non-invertible case, so the special variant is no longer needed

* use LU rather than QR
2018-03-16 09:23:00 -04:00
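To illustrate why slogdet exists (this is a generic sketch, not the LAPACK-backed implementation the commit adds): it returns (sign, log|det|), which stays finite where det itself would under- or overflow. A minimal pure-Python version via Gaussian elimination with partial pivoting:

```python
import math

def slogdet(m):
    # Returns (sign, log|det|). Each pivot contributes its sign and the log
    # of its magnitude; a row swap flips the sign; a zero pivot means the
    # matrix is singular, giving (0.0, -inf).
    a = [row[:] for row in m]
    n = len(a)
    sign, logabs = 1.0, 0.0
    for j in range(n):
        p = max(range(j, n), key=lambda i: abs(a[i][j]))  # partial pivot
        if a[p][j] == 0:
            return 0.0, float("-inf")
        if p != j:
            a[j], a[p] = a[p], a[j]
            sign = -sign
        pivot = a[j][j]
        if pivot < 0:
            sign = -sign
        logabs += math.log(abs(pivot))
        for i in range(j + 1, n):
            f = a[i][j] / pivot
            for k in range(j, n):
                a[i][k] -= f * a[j][k]
    return sign, logabs
```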
Richard Zou
74043b69c2 Alias torch.diagonal, torch.diagflat (#5622)
* Alias torch.diagonal, torch.diagflat

* Address comments; Add sanity tests for torch.diagonal and torch.diagflat
2018-03-09 23:46:42 -05:00
Richard Zou
8ab101ccee Implement pow() for integer types (#5526)
* CPU int-types pow()

* CUDA int-type pow()

* Cleanup + fix deleted line

* Tests for integer-types pow

* Fix build

* Fix windows tests

* Make _test_int_pow static
2018-03-08 22:33:32 -05:00
Richard Zou
461e3e3ae0 Allow indexing tensors with both CPU and CUDA tensors (#5583)
* Allow indexing tensors with both CPU and CUDA tensors

* Remove stray import
2018-03-07 10:24:12 -05:00
Will Feng
9235277dba Re-enable some CUDA tests on Windows (#5446)
This PR enables the following tests on Windows again:

CUDA HalfTensor tests in test_torch.py and test_nn.py
test_Conv2d_deterministic_cudnn in test_nn.py
test_*Tensor_qr_big in test_cuda.py

The issues are no longer reproducible, possibly because of an upgrade to the display driver.

* Reenable CUDA HalfTensor tests on Windows

* Reenable test_Conv2d_deterministic_cudnn on Windows

* Reenable test_*Tensor_qr_big on Windows
2018-03-01 12:21:17 -05:00
Sam Gross
509aed6ca3
More Variable/Tensor clean-ups (#5464) 2018-02-28 16:46:47 -05:00
gchanan
94938be367
Support dtypes in legacy new constructors. (#5343)
* Support dtypes in legacy new constructors.

* Add comment about why we don't have dtype for sparse (indices, values).

* separate legacy tensor ctor vs new (new includes dtypes).

* Use TypeError.
2018-02-28 12:52:11 -05:00
Sam Gross
30ec06c140
Merge Variable and Tensor classes (#5225)
This replaces the torch.Tensor constructors with factories that produce
Variables. Similarly, functions on the torch module (e.g. torch.randn)
now return Variables.

To keep the PR to a reasonable size, I've left most of the unused tensor
code. Subsequent PRs will remove the dead code, clean-up calls to
torch.autograd.Variable, and rename Variable to Tensor everywhere.

There are some breaking changes because Variable and Tensors had
slightly different semantics. There's a list of those changes here:

 https://github.com/pytorch/pytorch/wiki/Breaking-Changes-from-Variable-and-Tensor-merge
2018-02-23 18:03:31 -05:00
Ailing
3ef2e484bf Add fp16 testcases in test_cuda (#5122) 2018-02-21 14:35:29 +01:00
Richard Zou
70e71391d2 Fix THCTensor_(max) and THCTensor_(min) inits (#5265)
Their cuda kernels should be initialized with (min_value, 0) and
(max_value, 0), respectively, where the second number is a default index
value. However, they were being initialized with (max, 1) and (min, 1)
instead, probably a remnant from the lua torch days.

This caused bugs in torch.max() and torch.min() when the input is at the
extreme values, and the max value (or min value) occurs at index 0. For example,

  import torch
  x = torch.ByteTensor([[0]])
  x.cuda().max(dim=0)  # returns (0, 1) but the expected result is (0, 0)
2018-02-15 14:41:19 -08:00
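A small Python model of the bug described above (an illustration, not the CUDA kernel): with the correct identity (min_value, 0) the reduction can never report index 1 for a single-element input, but with the buggy (min_value, 1) identity an input sitting exactly at the extreme value at index 0 never wins the comparison, so the stale index 1 leaks out. `reduce_max` and `init_idx` are invented names; 0 stands in for ByteTensor's minimum value:

```python
def reduce_max(values, init_idx):
    # Identity value for a ByteTensor max reduction is 0 (the type minimum);
    # the second slot should start at index 0, not 1.
    best_val, best_idx = 0, init_idx
    for i, v in enumerate(values):
        if v > best_val:  # an input equal to the identity never wins
            best_val, best_idx = v, i
    return best_val, best_idx
```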
Sam Gross
85e22b5475
Reverts force_gpu_half changes from #3660 (#5000)
The test_cuda.py setup purports to test half tensors, but actually just
re-tests FloatTensors because the keys in type_map were str instead of
type. Testing HalfTensors is more complicated, requiring changes to
precision and requires excluding some unimplemented methods.

We should fully test half CUDA tensors. This change just deletes the
duplicate tests of FloatTensor.
2018-02-07 15:33:17 -05:00
Tongzhou Wang
47ee86776e Fix CPU torch.multinomial with noncontiguous prob tensor (#5093)
* fix CPU torch.multinomial not working on noncontiguous probability distributions

* address comments

* change some tabs to spaces in THStorage.c
2018-02-06 22:11:43 -05:00
Peter Goldsborough
86fd5fd524 Replace async with non_blocking for Python 3.7 (#4999)
* Replace async with non_blocking for Python 3.7 upgrade

* Remove trailing whitespace

* Give _cuda and _type kwargs and accept async for compatibility

* Rename async to non_blocking in all C++ code

* Add entries for async in python_variable_methods

* Friendlier backward compatibility for cuda and type
2018-02-02 09:23:51 -05:00
albanD
6c197c2f15 fix triu and tril for zero-strided inputs on gpu (#4962) 2018-01-31 14:38:49 -05:00
Will Feng
82fed06535 disable qr_big cuda test on Windows (#4747) 2018-01-23 21:29:32 -05:00
Richard Zou
c7a2e318ed Restore cuda variable.bernoulli() (#4787) 2018-01-23 21:12:47 -05:00
Adam Paszke
1061d7970d Move broadcast and broadcast_coalesced to C++ 2018-01-18 11:16:45 +01:00
Tongzhou Wang
5918243b0c Methods for checking CUDA memory usage (#4511)
* gpu mem allocated

* add test

* addressed some of @apaszke 's comments

* cache stats

* add more comments about test
2018-01-09 11:47:48 -05:00
Sam Gross
b8fd57a0cc
Fix handling of empty indices in CUDA Tensor.put_ (#4486)
Fixes #4386
2018-01-05 12:58:27 -05:00
Will Feng
c6adee0807 disable CUDA HalfTensor tests in test_cuda for Windows (#4482) 2018-01-04 22:58:13 +01:00
Fritz Obermeyer
35abc4efa2 Add low-precision digamma() and polygamma() functions (#4399) 2018-01-02 11:53:23 +01:00
Vishwak Srinivasan
e519ef5337 Adding torch.expm1() and its inplace function (#4350) 2017-12-28 18:56:03 +09:00
Sam Gross
1632ab2979
Fix default device for Variable.new() (#4307)
Variable.new() should default to the device of "self" if no device is
specified. Previously, we were using the current device. This now
matches Tensor.new().
2017-12-21 18:35:35 -05:00
Tongzhou Wang
d8b2e5d091 Add python only default init expression; Implement stft, hann/hamming/bartlett window. (#4095)
* implement stft

* addressed comments; implemented window functions; added support for python only default initialization
2017-12-18 12:28:23 -05:00
Tongzhou Wang
e0d5d1b7c9 view in certain noncontig case (#4062) 2017-12-18 02:08:17 -05:00
Richard Zou
9394e65b44 Add proper shape checking to torch.cat (#4087)
* Fix catArray in THTensor

Asserts that the inputs have the same size except in the
cat dimension or are empty (or a mix of both).

* Fix catArray for THCTensor

* Document torch.cat shape checks

* Fix types
2017-12-18 02:05:58 -05:00
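The shape rule this commit enforces can be sketched in Python (an illustration of the documented check, not the THTensor code; `check_cat_shapes` is an invented name): every input must either be the legacy "empty" 0-element 1-D tensor or match the other non-empty inputs in every dimension except the cat dimension:

```python
def check_cat_shapes(shapes, dim):
    # Returns True iff the shapes are valid inputs for cat along `dim`.
    ref = None
    for s in shapes:
        if s == (0,):  # legacy empty tensor: always allowed, contributes nothing
            continue
        if ref is None:
            ref = s
        elif len(s) != len(ref) or any(
            a != b for i, (a, b) in enumerate(zip(s, ref)) if i != dim
        ):
            return False
    return True
```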
Sam Gross
bec0349280 Implement Variable.cuda and Variable.type using ATen (#4139)
* Implement Variable.cuda using ATen

This adds an optional async flag to Tensor::copy_, which attempts to do
a non-blocking copy if one of the tensors is in pinned memory and
the other is a CUDA tensor.

* Perform cross-device copy in CopyBackwards

Also call torch.cuda._lazy_init() from Variable.cuda()

* Implement Variable.type via ATen

* Changes from review:

 - remove copy_out
 - remove unnecessary include
 - fix default device for .cuda()

* Combine if statements in dispatch_type
2017-12-18 01:54:35 -05:00
Richard Zou
dac5e6568d Better error messages for blas ops with cuda.LongTensor (#4160)
* Better error messages for blas ops with cuda.LongTensor

Fixes #4157

Test plan

Try matrix multiplying with cuda.LongTensors

>>> import torch
>>> x = torch.randn(4, 4).long().cuda()
>>> y = torch.randn(4, 4).long().cuda()
>>> x.mm(y)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
RuntimeError: addmm for CUDA tensors only supports floating-point types. Try converting the tensors with .float() at /private/home/rzou/pytorch/pytorch/aten/src/THC/generic/THCTensorMathBlas.cu:381
2017-12-14 11:28:59 -05:00
Sam Gross
aeb7a3668d
Implement Variable.new (#4080) 2017-12-11 15:45:43 -05:00
Tongzhou Wang
c681b03d37 Add determinant function on variable; Add backward on svd (#3816)
* determinant on variable

* svd bwd
2017-12-01 13:22:46 -05:00
Adam Paszke
6ae0d477ea Fix cuBLAS arguments for fp16 dot (#3660)
* Fix cuBLAS arguments for fp16 dot

* Enable FloatTensor <-> CUDA HalfTensor checks in test_cuda.py
2017-11-29 07:16:34 -08:00
Richard Zou
ec389f5128 Fix cuda symeig (#3566)
* Fix cuda symeig

* Add symeig test

* Better check for magma
2017-11-08 20:20:14 -05:00
Richard Zou
00d2befba1 THTensor_varOuterDim numeric stability (#3533) 2017-11-07 13:47:20 -05:00
Richard Zou
3d06a1e075 Make THCTensor_varInnermostDim numerically stable using Welford's algorithm (#3425)
* Use Welford's algorithm when reducing along inner dimension for THCTensor's variance fn

* Use accreals in THCTensor's varInnermostDim

* Skip cuda tests if no cuda

* Variance testing
2017-11-06 16:00:29 -05:00
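Welford's algorithm, which this commit applies to the inner-dimension variance reduction, can be sketched in plain Python (a generic single-pass version, not the THCTensor kernel): it updates a running mean and sum of squared deviations, avoiding the catastrophic cancellation of the naive E[x^2] - E[x]^2 formula:

```python
def welford_variance(xs):
    # Single-pass, numerically stable sample variance (Welford's algorithm).
    mean, m2, n = 0.0, 0.0, 0
    for x in xs:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)  # uses both old and new mean
    return m2 / (n - 1) if n > 1 else 0.0
```

The stability shows up with a large common offset: data like [1e9 + 1, 1e9 + 2, 1e9 + 3] still yields a variance of 1, where the naive formula loses all precision.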
SsnL
8fd171a6fd add test_index to test_cuda 2017-11-06 14:21:31 -05:00
Sam Gross
7c0b16c140 Add torch.take and Tensor.put_ (#3263)
* Add torch.take and Tensor.put_

These are similar to numpy.take and numpy.put. The take function allows
you to linearly index into a tensor without viewing it as a 1D tensor
first. The output has the same shape as the indices. The put function
copies value into a tensor also using linear indices.
2017-11-01 06:04:44 -04:00
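The linear-indexing semantics described above can be modeled on plain Python lists (an illustration of the behavior, not the implementation; `take` and `put_` here operate on a 2-D list of lists): take reads the tensor as if flattened to 1-D and returns output shaped like the indices, while put_ writes values at linear indices in place:

```python
def take(tensor_2d, indices):
    # torch.take-like: linear indices into the flattened data.
    flat = [x for row in tensor_2d for x in row]
    return [flat[i] for i in indices]

def put_(tensor_2d, indices, values):
    # Tensor.put_-like: in-place writes at linear indices.
    cols = len(tensor_2d[0])
    for i, v in zip(indices, values):
        tensor_2d[i // cols][i % cols] = v
    return tensor_2d
```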
SsnL
91a8d3325e test sparse dp, broadcast_coalesced, reduce_add_coalesced 2017-10-28 18:52:35 -04:00
Ozan Çağlayan
e43a63a968 tensor: Ensure that the tensor is contiguous before pinning (#3266) (#3273)
* tensor: Ensure that the tensor is contiguous before pinning (#3266)

pin_memory() was producing an out-of-order tensor when the given
tensor was transposed, i.e. in column-major order.
This commit fixes that by calling contiguous() before pinning.

* test: add contiguous test for pin_memory (#3266)
2017-10-25 13:17:54 +02:00
SsnL
634c8315a4 isContiguous problems (#3148)
* with the size=1 case, a single-point check is impossible; replaced with isContiguousRange

* fix stride in desc; fix undef scope

* add test for this case for cudnn

* assertTrue
2017-10-20 10:20:33 -04:00