Commit Graph

332 Commits

Author SHA1 Message Date
Edward Z. Yang
711e5a6ceb
Port THS to ATen. (#8409)
* Port THS to ATen.

The basic structure of the patch:

- All kernels in aten/src/THS got rewritten as native
  functions in aten/src/ATen/native/sparse

  I took the liberty to rename some of the kernels,
  opting for a longer, more transparent names than
  things like 'spaddcmul'.

- Instead of holding fields for sparse tensor in the TH
  C struct THSTensor, they are now held in a C++ class
  SparseTensorImpl (this explains why I had to do this
  all in one go; I can't have *two* reps for sparse
  tensors!)

  Along the way, we change a key internal representation
  invariant: an "empty" sparse tensor has dimI == 1 and
  dimV == 0 (this is different from dimI == 0 and dimV == 0
  we had before); this ensures that we maintain the invariant
  that dim == dimI + dimV.  "Scalar" sparse tensors are
  made illegal, because there really is no way to properly
  express them in COO format.

- Because we haven't ported THCS or any of the traditional
  dense TH implementations, there is a new set of adapter
  functions in native/LegacyBridge.cpp exclusively devoted
  to deciding whether or not to go to the new native implementation
  or back to the legacy TH binding (prefixed with th_).
  The intent is that when everything gets ported, we can
  delete this file.

- I've kept the stubs for all the THS functions, but they now all
  error if you try to actually call them.  Eventually, we should
  replace these with calls to ATen so that everything keeps
  working.

- I gobbled up SparseMM (SparseMM.cpp is no more). It was tasty.

There are some miscellaneous improvements which were needed for other
changes in this patch:

- There is now AT_FORALL_SCALAR_TYPES_EXCEPT_HALF, which does what
  it says on the tin.

- axpy templated function moved to TH/BlasUtils.h, there's a new macro
  which lets you easily forward to all of the TH functions. We also expose
  THBlas_copy.  I'm not terribly pleased with these functions but
  they seem to serve a purpose they need.

- New method on Tensor to get TensorImpl*, unsafeGetTensorImpl

- accessor() is now this-const, since const-correctness on Tensor is a lie

- New toSparse()/toDense() methods on Type; now you can call these
  directly without having to manually apply at::toSparse/toDense
  on the Backend and then running toBackend yourself.

Changes to the kernels:

- Previously, the whole body of all kernels was compiled for
  every supported scalar type.  In our new implementation,
  the scalar dispatch has been pushed into the smallest extent
  which (1) is not in a type loop and (2) requires statically
  knowing the scalar type.  These sites all use
  AT_DISPATCH_ALL_TYPES.  I tried to use lambdas as much as
  possible, but sometimes it was not possible when a OpenMP
  pragma was used.

- Anywhere we tested if the nDimension of a tensor was zero,
  we replaced with a test that numel is zero.  Because, as we
  known, nDimension of zero-size tensors in TH is zero, and
  that's wrong wrong wrong (and not done this way in ATen).

Some subtleties:

- Places where previously fastget1d was used, I now use a
  TensorAccessor.  However, you have to be careful about grabbing
  the accessor, because sometimes you will be accessor'ing
  indices/values and they are empty, which means they will
  be *1D* ("oh, aren't indices always 2D?" Nope. Nyet.)
  So, essentially, it is only safe to grab an accessor *after*
  you have checked that nnz != 0.  All of these shenanigans
  will go away when we properly support zero-size dimensions.

  A few places, we test for this case just by wrapping the loop
  in a conditional on nnz.  Some other places this is not so easy,
  so we instead short-circuit the function with a special case for
  when nnz == 0 (usually, these implementations are degenerate).

- There is a very subtle but important difference between
  _sparse_get_impl(self)->indices() and self._indices();
  the latter may return a view!  This is because nnz is
  not guaranteed to match the dimensions of indices/values;
  you can "truncate" a sparse tensor by setting the nnz.
  Actually, I think this is not a good idea and we should
  enforce a stronger invariant, but for this patch I slavishly
  adhere to the old ways, and as such I have to be very
  careful if I want to resize something, I had better use
  the former and not the latter.

- I had to reimplement broadcasting by hand (thus the s_
  and non-s_ functions in the sparse native files).  There
  is a very important distinction between foo_out and foo_,
  so it is important that the LegacyBridge function always
  call to the lower layer, and not try to avoid boilerplate
  by calling to another LegacyBridge function first.
  I did NOT put broadcasting in LegacyBridge (even though,
  ultimately, that's where it must live), because the th_
  functions which are invoked from LegacyBridge handle
  broadcasting themselves, and I don't want to broadcast
  twice.

- Sparse function MUST explicitly specify the Type they
  dispatch from, otherwise Variable wrapping/unwrapping will
  not work correctly.  If you use _get_sparse_impl, that is
  sufficient to levy this requirement.

- The "has native" tests in LegacyBridge.cpp are not 100%,
  because some of the functions are mixed dense-sparse functions,
  and so you can't just say, "Oh, if it's sparse and CPU, call
  the native sparse implementation."  This is handled on a
  case by case basis.  There is some especially complex
  logic for add(), which has dense-dense, sparse-sparse
  and dense-sparse implementations.

- I added some uses of SparseTensorRef in native_functions.yaml,
  but you will notice that these are all on native_* functions,
  and not the actual, top-level functions.  So the SparseTensorRef
  is purely documentary (helping you not call the wrong overload)
  but there is no magic; we do the wrapping ourselves the hard
  way. (This is in constrast to the TH binding code which is magical.)
  Except for _sparse_mask; _sparse_mask is magical.

- There is a raw_copy_sparse_ method, which is really my way of
  getting around the fact that copy_ has never been implemented
  for sparse tensors (even before this patch), but there IS a
  super secret, internal way of doing these copies that the THS
  code used, and which I needed to get my hands on when I did this
  port.  We should refactor so that either (a) copy_ does support
  sparse-sparse copy natively, or (b) we do this other ways.

- Irritatingly, I must explicitly resize_as_ before copy_ into
  a tensor.  This was not the case with THTensor_(copy) but I don't
  have any direct binding that doesn't have this requirement.

- For some reason, the sparse tensor constructor accepts a scalar
  tensor for the values tensor.  This is kind of weird because
  you always need an nnz-dimension.  However, the old code supported
  this and just expanded it into a 1D size 0 tensor; so we need some
  explicit code to do this.

There are maybe a bit more AT_ASSERTs in some of the kernels
than is wise.  I added them all when I was debugging and was
loathe to remove them.

Some last mile fixes after this commit went into PR

- Move expand outside of dispatch so autograd works (it used to be inside and then we lost all of the recorded broadcasts).
- Hack to duplicate the derivatives for our now two definitions TH and native. Mercifully the derivatives are short.
- Apparently, TH has a special case to make foo_ functions method only, and if you don't do this the Python arg parsing is wrong. We carefully work around this in the native bindings
- Apply DCE to a test_jit case, fixes wobbling due to DCE trick in tracing
- Update test_function's output
- Some last mile fixes for dispatch confusion in sparse_coo_tensor functions.
- New simplified regression test based on failures I saw in ONNX
- Increase tolerance on super resolution test
- More robust dynamic_type normalization, fixes ONNX bug.
  The dynamic_type situation is very delicate; probably need
  to stop having both Scalar and real.
- Make new_with_tensor_sparse more CUDA safe
- Note about CUDA-safety in SparseTensorImpl
- Rename dimI/dimV to sparseDims/denseDims.
- Make localScalar on SparseTensorImpl work.
- Make numel uniformly supported on all types, not just dense
  types
- Add tests for is_nonzero() method (which exercises localScalar)
- Disable constant JIT autogenerated tests, which are fragile and broken
  by this change, but being fixed in a parallel track.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-06-15 17:52:21 -04:00
li-roy
6869a5f0fb Throw error on 0-length tensor slicing (#7775)
* throw error on 0-length tensor slicing

* return empty tensor instead of throwing error

* make 0 slice work for tuples also

* add tests

* move check to aten

* Address comments
2018-06-14 17:40:51 -04:00
Wei Yang
ae55865a3b Migrated hardshrink() to ATen and deprecated nn.Hardshrink() (#8117)
* 1. added hardshrink() to ATen (CPU + GPU); 2. removed nn.Hardshrink(); 3. reusing previous tests for nn.Hardshrink() and included CUDA tests at test_nn; 4. default parameter lambda=0.5 is not working yet

* optimized memory read/write

* 1. pass in lambd as scalar for CPU/CUDA_apply*; 2. removed tests for hardshrink at test_legacy_nn

* fixes test_utils

* 1. replace zeros_like with empty_like; 2. use scalar_cast in cuda

* 1. printing lambd value; 2. default lambd=0.5 is still failing

* getting around Scalar bug buy removing default value of lambd from native_functions.yaml, and declare it at nn/functional.py

* cleaned up debug printf
2018-06-14 16:42:20 -04:00
Chintak Sheth
21609e0fd0 `bincount` feature implementation (#6688)
* Implement CPU bincount feature support

* Incorporate feedback on renaming to SummaryOps file and other nits

* bincount gpu implementation

* refactor cuda code and incorporate nits

* doc fix

* cuda bincount - cast weights to double if integral type

* fix: signed unsigned comparison error

* fix: ssize_t error

* refactor

* make template typenames readable and other nist

* make compatible with v0.5

* incorporate comments

* update test cases to ensure CUDA code coverage
2018-06-14 11:38:04 -04:00
li-roy
6a85b133d3
Improve number formatting in tensor print (#7632)
* Improve number formatting in tensor print

* fix bad rebase

* address comments

* fix test

* fix test

* use assertExpected for tests

* address comments

* address comments
2018-06-13 23:57:16 -07:00
Wei Yang
71a3633e3f
change tensor.set_() argument names to match descriptions in doc (#8403)
Replaced args name `storage` and `sourceStorage` to `source` in tensor.set_() to match the descriptions in docs.
2018-06-13 13:22:50 -07:00
Will Feng
ffffee6aa9 Skip test_multinomial_invalid_probs on Windows (#8360) 2018-06-12 17:00:49 -04:00
Wei Yang
c3e4b3c88b
raise more informative error msg for torch.load not support seek (#7754)
Raising more informative error msg for torch.load() when input file does not support seek() or tell()
2018-06-12 12:57:28 -07:00
Tongzhou Wang
742912512c Move signal window functions to ATen; add Blackman window (#8130)
* Move signal window functions to ATen; add Blackman window

* fix cuda test not checking scipy
2018-06-08 11:37:46 -04:00
Will Feng
89ea6acde2 [NEEDS REVIEW] Add nan and inf probability check to multinomial (#7647)
* Add nan and inf probs check to multinomial

* fix bug

* Spawn CUDA test in subprocess

* Make sure invalid input won't pass the test case

* Try to fix error

* Test failure cases in Python 3 only

* Try to fix Windows error

* Move CUDA test to test_cuda.py

* fix issues

* fix module name error

* no need to check for CUDA existence in test_cuda

* Use PY3
2018-06-06 22:49:12 -04:00
Tongzhou Wang
c0a419e6ba
Add non_blocking to Tensor/Module.to (#7312)
* Add non_blocking to Tensor/Module.to

* flake8

* Add argparse tests

* cpp parse

* Use C++ parser

* use a commong parse function with Tensor.to

* fix test_jit

* use THPObjectPtr

* increase refcount for None, True, and False

* address comments

* address comments
2018-06-04 18:46:52 -04:00
li-roy
bafec1637e support loading gzip (#6490)
* support loading gzip

* address comments

* address comments

* fix lint

* fix test for python2
2018-05-31 15:06:38 -04:00
Seth Hendrickson
e9c33e91d9 Remove python bindings for torch.slice (#7924)
* skip python bindings for slice

* remove tests

* convert slice test to indexing
2018-05-31 13:42:49 -04:00
Richard Zou
b5594ac750 Raise error when torch.load a storage on a non-existing device (#7921)
* Raise error when torch.load a storage on a non-existing device

Before, doing torch.load(...) on a CUDA tensor on a CPU-only machine
would raise an unreadable error:

```
~/pytorch/pytorch/torch/cuda/__init__.py in __enter__(self)
    223         if self.idx is -1:
    224             return
--> 225         self.prev_idx = torch._C._cuda_getDevice()
    226         if self.prev_idx != self.idx:
    227             torch._C._cuda_setDevice(self.idx)

AttributeError: module 'torch._C' has no attribute '_cuda_getDevice'
```

This PR makes it so that torch.load raises a hard error if one tries to
load a storage onto a non-existing device and suggests the user to use
torch.load's map_location feature.

* Address comments

* missing dep
2018-05-31 09:44:50 -04:00
Richard Zou
769f5f7cfe
Handling of scalars in torch.Size (#5676)
* Handling of scalars in torch.Size

torch.Size() constructor uses python_arg_parser

IntList in python_arg_parser can take iter/range

Have IntList take python iterables and ranges.

Address comments: don't use python_arg_parser and instead call __index__ in THPSize_pynew

Address comments

Address comments

* Rebased

* Address nit
2018-05-30 17:50:32 -04:00
Thomas Viehmann
42a68749bf einsum: don't inplace modify arguments (fixes: #7763) (#7765)
Thank you, Pierce Freeman, for the report and minimal example!
2018-05-29 11:26:39 +01:00
Ailing
b4ae80d459 serialization for torch.device (#7713) 2018-05-21 11:34:26 +02:00
Ailing
75cf0faf4c Implement __reduce__ for torch.dtype (#7699) 2018-05-20 14:59:02 +02:00
gchanan
4f20a0e439
Fix various sparse transpose issues; remove dead code from Declaratio… (#7200)
* Fix various sparse transpose issues; remove dead code from Declarations.yaml.

1) Fixes some checks in t_, transpose_ that don't allow transposing empty sparse tensors.
2) Remove out= variants from docs since they don't exist (and haven't since at least v0.3.1).
3) Unify implementations of t_, transpose_, t, transpose.
4) Move dead checking code from Declarations.cwrap to actual implementations.
5) Fix test which never tested transpose_.

* Add test for error with t, t_.

* Address review comments.

* Fix jit tests.

* Fix test_jit.
2018-05-18 19:51:41 +02:00
gchanan
7abdc303c6
Don't allow requires_grad to be set on integer Tensor constructors in… (#7185)
* Don't allow requires_grad to be set on integer Tensor constructors in tensor_new.

* Fix autograd test.

* Fix test_distributions.

* Fix test_jit.

* Fix NN tests.
2018-05-18 19:45:10 +02:00
Seth Hendrickson
32b23a4bfc Throw error on tensor creation when sequence shape cannot be determined (#7583)
* first commit

* unit test

* minor style edits
2018-05-18 19:14:42 +02:00
Thomas Viehmann
bf95dff85b Map digamma +/-inf results to nan in test (fixes #7651) (#7665) 2018-05-18 16:35:00 +02:00
Thomas Viehmann
e1148db7f2 Implement logsumexp (fixes #2591) (#7254)
* Implement logsumexp (fixes #2591)

* Add logsumexp_backward, fix _out declaration.

Thank you Simon and Edward for your comments!
2018-05-14 22:08:14 -04:00
Thomas Viehmann
cfc1d92975 Implement ellipses ('...') and diagonals (e.g. 'ii->i') in einsum. (#7173)
This brings the two most important missing numpy einsum features
to toch.einsum.
2018-05-12 23:39:37 -04:00
Richard Zou
eaa3f2e613 Fix advanced indexing with negative indices (#7345)
* Fix advanced indexing with negative indices

Fixes #7156

Here is some behavior before this PR:
```
In[1]:
x = torch.arange(9).view(3, 3).contiguous()
x[[0], [-1]]  # Should be equivalent to x[0, -1]

Out[1]:
tensor([ 8])
```

The bug is that negative indices are added to the computed linear index
directly. In the above example, the linear index computed is "-1", which
wraps around to "8", giving the last element of a flattened view of `x`.

Instead, we should wrap negative indices around before adding them to
the linear index.

* Use toCLong()
2018-05-12 23:24:40 -04:00
Jon Walsh
857e3f4a5e Throw error in tensor constructor when numpy strides mismatch (#7440) 2018-05-11 11:00:43 +02:00
Ethan Steinberg
9fa1dff66a Allow the use of torch.device for loading (#7339)
* Allow using torch.device for loading

* Make recommended changes

* Better tests
2018-05-10 15:50:00 -04:00
Richard Zou
71626491c4 Add batched linear solver to torch.gesv() (#6100)
* Add batched linear solver to torch.gesv()

Fixes #3164
Picks up from #4502

I moved `gesv` to ATen.
Adds bindings for MAGMA's `gesv_batched` function for CUDA.
For CPU, runs `THLapack(gesv)` in a for loop.

The new function supports arbitrary batch dimensions (and broadcasting
of those dimensions). For example, the 4-d tensor `A x B x M x M` should
be treated as having batch-size `(A x B)`.

The overhead of creating the magma_queue_t is: ~350000 microseconds
the first time it's called and ~6 microseconds every time after that.

* Tests and docs

* Address comments

* Address comments

* Rebase

* Address comments

* Fix rebase

* Addressed comments

* Address comments

* Address comments

* Addressed comments
2018-05-08 17:06:27 -04:00
Adam Paszke
8091388d0f
Add support for __floordiv__ and __rdiv__ for integral tensors (#7245) 2018-05-03 23:34:59 +02:00
gchanan
681baa9254
Restore warning to torch.range. (#7194)
Also, get rid of warning specification in Declarations.cwrap, which currently has no effect.
2018-05-02 21:53:00 -04:00
Thomas Viehmann
07513cfd1d implement sum over multiple dimensions (fixes #2006) (#6152) 2018-05-02 21:50:29 -04:00
cpuhrsch
88a705555a
Add SLEEF for float and double (#6725) 2018-05-02 18:40:44 +00:00
gchanan
8031da5479
Implement torch.as_tensor, similar to numpy.asarray. (#7109)
* Implement torch.as_tensor, similar to numpy.asarray.
torch.as_tensor behaves like torch.tensor except it avoids copies if possible; so also somewhat like tensor.new but without the size overloads.
I didn't add a requires_grad field, because we haven't decided on the semantics such as as_param.

* Remove requires_grad for doc.
2018-05-01 12:54:43 -04:00
Thomas Viehmann
8fbab83c2a only Tensors of floating point dtype can require gradients (see #7021) (#7034) 2018-04-30 10:20:00 +02:00
gchanan
361648a4a7
Fix torch.tensor(...) device-type calculation when used with numpy an… (#6995)
* Fix torch.tensor(...) device-type calculation when used with numpy and type inference.

* Fix tensor device type inference as well.

* Better variable type inference: infer cuda-ness only if device is not specified.
2018-04-27 18:12:33 -04:00
cpuhrsch
ae35e0e924 Support non-contiguous tensors for unary ops (#6119) 2018-04-27 21:31:34 +02:00
gchanan
a6bfa16c17
torch.arange: add numpy-style type inference. (#7016)
* torch.arange: add numpy-style type inference.

This is a backwards-compatibility breaking change.

* Fix flake8.

* Use at::optional.

* Remove unneeded header files.

* Use reference wrapper.

* Update arange for test.

* Address review comments.
2018-04-27 15:11:45 -04:00
gchanan
18ed2160b0
Use Index rather than Long for IntList parsing (#6674)
* Use Index rather than Long for IntList, so floating-point types convertible to ints fail the parsing.

Basically, our unpackLong code works with floating-point types that are convertible to ints, but this isn't often what you want (because of truncation).
What you actually want is to convert to an index, which will usually find such issues.

I made this the minimal change I could because:
1) I didn't want to change unpackLong because the existing code call checkLong before unpackLong, so this should be a non-issue most of the time.  And fixing this properly requires calling checkLong again, which will slow everything down.
2) An exception above is with IntList, which only checks that 1) it is a tuple or 2) it is a varargs tuple (i.e. torch.ones(1, 2, 3)).

* Fix bug.

* Don't conflict tensor and IntList bindings.

* Change function to be consistent between python 2 and 3.

* Check Index.

* Move IntList overloads in legacy new functions to below Tensor overloads.
2018-04-26 19:13:23 -04:00
gchanan
a08091a42d
Implement matmul_out and dot_out. (#6961)
* Implement matmul_out and dot_out.

* Fix autograd by only calling _out variants if we have an out ourselves.

* Disallow mismatched types in dot_out.

* Make sure out variant doesn't have a method.

* Do proper type conversion.
2018-04-26 16:52:58 -04:00
Thomas Viehmann
2b44c420c8 Enhance diagonal (fixes #6479) (#6718)
* Enhance diagonal

This patch
- adds Tensor.diagonal to complement torch.diagonal
- implements diagonal natively in ATen
- makes diagonal a view
- implements taking arbitrary diagonals
- implements diagonal backward instead of referring
  to the (more limited) diag

* add tests, copy diagonal code to backward for double differentiability

* improve tests and doc comment. Thank you, Adam!

* Mark diagonal as view function in gen_autograd.py, use simple backward.
2018-04-26 11:11:20 -04:00
Thomas Viehmann
f98b778086 Fix forward and backward for norm/renorm with infty norm (fixes #6817) (#6969) 2018-04-26 12:54:53 +02:00
gchanan
3d907ef78e
Consistently check 'out' variants against specified dtype/layout/device parameters. (#6973)
We were previously doing this in the most common cases, but not consistently.
2018-04-25 22:46:42 -04:00
Soumith Chintala
333e8c9b22
any/all returns LongTensor, make test expect that (#6957) 2018-04-25 14:05:29 -04:00
Tao He
39d4814933 Make any and all on ByteTensor behave like sum/prod. (#4627) 2018-04-25 10:25:38 +02:00
cpuhrsch
a8bdb561b7 Fix reductions on some contiguous tensors where size(dim) == 1 (#6815) 2018-04-22 13:55:55 -04:00
Richard Zou
d1a992a85e Disallow chunks that are <= in torch.chunk (#6761)
Fixes #6759.

Before, `tensor.chunk(0)` would cause a divide by 0.
`tensor.chunk(-1)` would throw an error complaining that "split_size
needs to be positive".

This PR changes it so that the error message makes it clear that
`chunks` has to be greater than 0.
2018-04-19 18:31:14 -04:00
MRuberry
9c47eb5548 Fixes test_torch.py so that all tests pass on Volta hardware. (#6736)
Issue: "python3 test_cuda.py" currently results in a failure when using Volta hardware.

The failure is in test_advancedindex, and is caused by two "sub-tests." At line 4651 a series of indices are used to compare PyTorch's and Numpy's indexing behavior. At least two of these indices index the same element of the reference tensor multiple times. These are:

[slice(None), [[2]], [[0, 3], [4, 4]]]
[slice(None), [[0, 1], [1, 0]], [[2, 3], [3, 0]]]

The first index selects the 5th element of the third row twice, and the
second index selects the 4th element of the second row twice.

This causes the test to attempt to update the same index with two distinct values simultaneously. On my machine the Numpy created tensor will always take the "latter" of these two values, while the Volta tensor will always take the "former." (Not to say this behavior is guaranteed by either framework.)

The fix is to remove these two indices from test_torch.py. This causes all tests to pass.

While updating test_torch.py I also noticed that assert_get_eq(tensor, indexer) had a bug where it was referring to "reference" instead of "tensor." This bug had no impact on behavior. The fix is to have this function refer to its input tensor, "tensor," instead. All tests still pass after this fix.
2018-04-18 22:44:14 -04:00
Adam Paszke
d26ab68485 Sort declarations when generating Python bindings (#6701)
* Sort declarations when generating Python bindings

This helps resolve ambiguities in argument parsing according to
any rules we will need.

For now, this allows us to make scalar operations more conservarive
wrt. argument types, but makes them commutative again.

* Fix inconsistencies between mod with tensor and scalar

* Fix a stupid mistake
2018-04-18 21:51:35 -04:00
Thomas Viehmann
bd0cc7d364 Implement torch.einsum (fixes #1889) (#6307)
* start at generic trilinear

* Implement einsum (fixes #1889)

This provides a simple implementation of einsum. It is built on
top of the work for computing bilinear (#6110).
It uses a naive left-to-right resolution at the moment.
Autograd is able to differentiate by itself.
The obvious unsupported feature is taking diagonals (einsum('ii->i',(a,)).

* add tests and docs

* fix flake8

* clean diff

* rebase on current master to resolve conflicting String wrapping

* clean up after rebase

* better commentary in einsum and sumproduct_pair

* don't say fixme if it's fixed and rename num_outputs to num_output_dims

* adapt python wrapper to use std::string instead of String to avoid typedef at::String

* typos and some vector to array conversion

* fix accidental python<->python3 change

* really fix bad rebase
2018-04-18 13:41:27 +02:00
Francisco Massa
feb8522f99
randperm supports n=0 (#6656)
This makes it compatible with arange and numpy.random.permutation
2018-04-17 19:03:57 +02:00