* 1. Add logdet and slogdet in ATen side
2. Previously, det can return result with incorrect sign upon seeing symmetric
matrices. This is caused by the wrong assumption I had on SVD (when input is
symmetric U=V^T). This fixes it.
3. Moreover, after fixing 2 now QR is always needed for det forward. So I moved
SVD to backward call. Since this is a specific variant of SVD, it is named as
_svd_with_positive_UV_det, with derivative.yaml entry being svd_backward.
4. Updated/added backward functions for det, logdet and slogdet, which uses
_svd_with_positive_UV_det and svd_backward inside.
5. Optimized svd_backward:
a. Avoid unnecessary kernels when only sigma has gradient (this is the usual
case, and also true with *det backward functions).
b. Fix SVD double backward by avoiding a nan.
* 1. Add/update grad checks for det, logdet, and slogdet.
2. Fix an incorrect check for dim_args_idx in test_autograd.py
3. Add option to only test a subset of output values, specified by
test_output_indices, for cases like slogdet where only the
second output is differentiable.
4. Add better doc for the test generating list.
* Add/improve output tests for det, logdet and slogdet
Add a scaling to random matrices so closeness checks are more robust
* Remove unnecessaery Variable wrappers in some test files
* Add logdet slogdet docs
* Improve an err msg in THTensorLapack.c
* add inverse-based backward for invertible matrices
use svd only for non-invertible case, so don't need the special variant anymore
* use LU rather than QR
* CPU int-types pow()
* CUDA int-type pow()
* Cleanup + fix deleted line
* Tests for integer-types pow
* Fix build
* Fix windows tests
* Make _test_int_pow static
This PR enables the following tests on Windows again:
CUDA HalfTensor tests in test_torch.py and test_nn.py
test_Conv2d_deterministic_cudnn in test_nn.py
test_*Tensor_qr_big in test_cuda.py
The issues are no longer reproducible, possibly because of an upgrade to the display driver.
* Reenable CUDA HalfTensor tests on Windows
* Reenable test_Conv2d_deterministic_cudnn on Windows
* Reenable test_*Tensor_qr_big on Windows
* Support dtypes in legacy new constructors.
* Add comment about why we don't have dtype for sparse (indices, values).
* separate legacy tensor ctor vs new (new includes dtypes).
* Use TypeError.
This replaces the torch.Tensor constructors with factories that produce
Variables. Similarly, functions on the torch module (e.g. torch.randn)
now return Variables.
To keep the PR to a reasonable size, I've left most of the unused tensor
code. Subsequent PRs will remove the dead code, clean-up calls to
torch.autograd.Variable, and rename Variable to Tensor everywhere.
There are some breaking changes because Variable and Tensors had
slightly different semantics. There's a list of those changes here:
https://github.com/pytorch/pytorch/wiki/Breaking-Changes-from-Variable-and-Tensor-merge
Their cuda kernels should be initialized with (min_value, 0) and
(max_value, 0), respectively, where the second number is a default index
value. However, they were being initialized with (max, 1) and (min, 1)
instead, probably a remnant from the lua torch days.
This caused bugs in torch.max() and torch.min() when the input is at the
extreme values, and the max value (or min value) occurs at index 0. For example,
import torch
x = torch.ByteTensor([[0]])
x.cuda().max(dim=0) # returns (0, 1) but the expected result is (0, 0)
The test_cuda.py setup purports to test half tensors, but actually just
re-tests FloatTensors because the keys in type_map were str instead of
type. Testing HalfTensors is more complicated, requiring changes to
precision and requires excluding some unimplemented methods.
We should fully test half CUDA tensors. This change just deletes the
duplicate tests of FloatTensor.
* Replace async with non_blocking for Python 3.7 upgrade
* Remove trailing whitespace
* Give _cuda and _type kwargs and accept async for compatibility
* Rename async to non_blocking in all C++ code
* Add entries for async in python_variable_methods
* Friendlier backward compatibility for cuda and type
Variable.new() should default to the device of "self" if no device is
specified. Previously, we were using the current device. This now
matches Tensor.new().
* Fix catArray in THTensor
Asserts that the inputs have the same size except in the
cat dimension or are empty (or a mix of both).
* Fix catArray for THCTensor
* Document torch.cat shape checks
* Fix types
* Implement Variable.cuda using ATen
This adds an optional async flag to Tensor::copy_, which attempts to do
a non-blocking copy if the one of the tensors is in pinned memory and
the other is a CUDA tensor.
* Perform cross-device copy in CopyBackwards
Also call torch.cuda._lazy_init() from Variable.cuda()
* Implement Variable.type via ATen
* Changes from review:
- remove copy_out
- remove unnecessary include
- fix default device for .cuda()
* Combine if statements in dispatch_type
* Better error messages for blas ops with cuda.LongTensor
Fixes#4157
Test plan
Try matrix multiplying with cuda.LongTensors
>>> import torch
>>> x = torch.randn(4, 4).long().cuda()
>>> y = torch.randn(4, 4).long().cuda()
>>> x.mm(y)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
RuntimeError: addmm for CUDA tensors only supports floating-point types. Try converting the tensors with .flo
at() at /private/home/rzou/pytorch/pytorch/aten/src/THC/generic/THCTensorMathBlas.cu:381
* Use Welford's algorithm when reducing along inner dimension for THCTensor's variance fn
* Use accreals in THCTensor's varInnermostDim
* Skip cuda tests if no cuda
* Variance testing
* Add torch.take and Tensor.put_
These are similar to numpy.take and numpy.put. The take function allows
you to linearly index into a tensor without viewing it as a 1D tensor
first. The output has the same shape as the indices. The put function
copies value into a tensor also using linear indices.
* tensor: Ensure that the tensor is contiguous before pinning (#3266)
pin_memory() was producing out-of-order tensor when the given
tensor was transposed, i.e. in column-major order.
This commit fixes this by calling contiguous() before pinning.
* test: add contiguous test for pin_memory (#3266)
* with the size=1 case, impossible to do single point check, replace with isContiguousRange
* fix stride in desc; fix undef scope
* add test for this case for cudnn
* assertTrue
test_FloatTensor_qr_big test is still a bit flaky on K80. Increasing tolerance to improve reliability as tests are moved around and results change for this test.
Here's the command I used to invoke autopep8 (in parallel!):
git ls-files | grep '\.py$' | xargs -n1 -P`nproc` autopep8 -i
Several rules are ignored in setup.cfg. The goal is to let autopep8
handle everything which it can handle safely, and to disable any rules
which are tricky or controversial to address. We may want to come back
and re-enable some of these rules later, but I'm trying to make this
patch as safe as possible.
Also configures flake8 to match pep8's behavior.
Also configures TravisCI to check the whole project for lint.
* Fix error in ELU backward
* Add --seed flag for testst st
* Add test for BatchNorm eval
* Fix autograd.backward docs
* Support cc flags in cuDNN search
* Fix IndexSelect backward formula
See issue #20
The torch.Size class is a tuple subclass which distinguishes sizes from
other tuples so that torch.Tensor(size) is interpreted as size instead
of data.