The test_cuda.py setup purports to test half tensors, but actually just
re-tests FloatTensors because the keys in type_map were str instead of
type (see the sketch below). Testing HalfTensors properly is more
complicated, requiring changes to precision and the exclusion of some
unimplemented methods.
We should fully test half CUDA tensors. This change just deletes the
duplicate tests of FloatTensor.
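A minimal sketch of the failure mode (illustrative only; type_map is the
name from the test, the surrounding lookup is hypothetical):

    import torch
    # Buggy: keyed by str, but lookups use type(t), which is a type,
    # so the mapping never matches and the fallback (FloatTensor) wins.
    type_map = {'torch.cuda.FloatTensor': torch.cuda.HalfTensor}
    t = torch.cuda.FloatTensor(1)
    mapped = type_map.get(type(t), type(t))
    assert mapped is torch.cuda.FloatTensor  # HalfTensor never gets tested
    # Fix: key by type, e.g. {torch.cuda.FloatTensor: torch.cuda.HalfTensor}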
* Replace async with non_blocking for Python 3.7 upgrade
* Remove trailing whitespace
* Give _cuda and _type kwargs and accept async for compatibility (shim sketched below)
* Rename async to non_blocking in all C++ code
* Add entries for async in python_variable_methods
* Friendlier backward compatibility for cuda and type
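A minimal sketch of what such a compatibility shim can look like (the
_cuda call and its exact signature are assumptions, not the shipped code):

    def cuda(self, device=None, non_blocking=False, **kwargs):
        # 'async' is a reserved word in Python 3.7+, so the legacy keyword
        # can only arrive via **kwargs; map it onto non_blocking.
        if 'async' in kwargs:
            non_blocking = kwargs.pop('async')
        if kwargs:
            raise TypeError('unexpected keyword arguments: %s' % list(kwargs))
        return self._cuda(device, non_blocking)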
Variable.new() should default to the device of "self" if no device is
specified. Previously, we were using the current device. This now
matches Tensor.new().
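For example (illustrative session; assumes two visible GPUs):

    import torch
    x = torch.randn(2, 2).cuda(1)   # x lives on GPU 1
    with torch.cuda.device(0):      # current device is GPU 0
        y = x.new(2, 2)             # allocated on GPU 1, matching x
                                    # (previously: GPU 0, the current device)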
* Fix catArray in THTensor
Asserts that the inputs have the same size in every dimension except
the cat dimension, or are empty (or a mix of both); see the example
below.
* Fix catArray for THCTensor
* Document torch.cat shape checks
* Fix types
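An illustration of the documented shape checks (hypothetical session):

    import torch
    a = torch.randn(2, 3)
    torch.cat([a, torch.randn(4, 3)], 0)  # OK: shapes differ only in dim 0 -> (6, 3)
    torch.cat([a, torch.Tensor()], 0)     # OK: empty inputs are skipped
    torch.cat([a, torch.randn(2, 4)], 0)  # RuntimeError: sizes must match outside dim 0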
* Implement Variable.cuda using ATen
This adds an optional async flag to Tensor::copy_, which attempts a
non-blocking copy if one of the tensors is in pinned memory and the
other is a CUDA tensor.
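For example (sketch; the flag is passed positionally here because async
became a Python keyword, and it was later renamed non_blocking as noted
above):

    import torch
    src = torch.randn(1024).pin_memory()  # pinned host memory
    dst = torch.cuda.FloatTensor(1024)    # CUDA destination
    dst.copy_(src, True)                  # non-blocking: may overlap with compute
    dst.copy_(torch.randn(1024))          # not pinned -> ordinary blocking copy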
* Perform cross-device copy in CopyBackwards
Also call torch.cuda._lazy_init() from Variable.cuda()
* Implement Variable.type via ATen
* Changes from review:
- remove copy_out
- remove unnecessary include
- fix default device for .cuda()
* Combine if statements in dispatch_type
* Better error messages for blas ops with cuda.LongTensor
Fixes #4157
Test plan:
Try matrix-multiplying two cuda.LongTensors:
>>> import torch
>>> x = torch.randn(4, 4).long().cuda()
>>> y = torch.randn(4, 4).long().cuda()
>>> x.mm(y)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
RuntimeError: addmm for CUDA tensors only supports floating-point types. Try converting the tensors with .float() at /private/home/rzou/pytorch/pytorch/aten/src/THC/generic/THCTensorMathBlas.cu:381
* Use Welford's algorithm when reducing along the inner dimension for THCTensor's variance fn (sketched below)
* Use accreals in THCTensor's varInnermostDim
* Skip cuda tests if no cuda
* Variance testing
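A minimal Python sketch of Welford's single-pass algorithm (the actual
change lives in THC CUDA code; this just shows the technique):

    def welford_variance(xs, unbiased=True):
        # Numerically stable one-pass mean/variance; assumes len(xs) >= 2.
        mean, m2, n = 0.0, 0.0, 0
        for x in xs:
            n += 1
            delta = x - mean
            mean += delta / n
            m2 += delta * (x - mean)  # second factor uses the updated mean
        # Accumulators stay in double, mirroring THC's use of accreal.
        return m2 / (n - 1) if unbiased else m2 / n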
* Add torch.take and Tensor.put_
These are similar to numpy.take and numpy.put. The take function allows
you to linearly index into a tensor without viewing it as a 1D tensor
first. The output has the same shape as the indices. The put function
copies values into a tensor, also using linear indices.
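For example (matches the semantics described above):

    import torch
    src = torch.Tensor([[4, 3, 5],
                        [6, 7, 8]])
    idx = torch.LongTensor([0, 2, 5])
    torch.take(src, idx)                       # -> [4, 5, 8]; linear indices, no 1D view
    dst = torch.zeros(2, 3)
    dst.put_(idx, torch.Tensor([1., 2., 3.]))  # writes 1, 2, 3 at linear slots 0, 2, 5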
* tensor: Ensure that the tensor is contiguous before pinning (#3266)
pin_memory() was producing an out-of-order tensor when the given
tensor was transposed, i.e. stored in column-major order.
This commit fixes that by calling contiguous() before pinning.
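For example (illustrative):

    import torch
    x = torch.randn(3, 4).t()  # transposed view: non-contiguous, column-major
    p = x.pin_memory()         # with the fix, contiguous() runs first, so p's
                               # values line up with x element-for-element
    assert p[0][0] == x[0][0]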
* test: add contiguous test for pin_memory (#3266)
* For the size=1 case a single-point contiguity check is impossible, so replace it with isContiguousRange
* Fix the stride in the descriptor; fix an undefined-scope issue
* Add a cudnn test for this case
* assertTrue
The test_FloatTensor_qr_big test is still a bit flaky on K80 GPUs. Increase the tolerance to improve reliability, since the results for this test change as tests are moved around.