Summary:
- Just a simple fix to support `fill_`
- And a fix for indexing in `pytorch-complex`
Differential Revision: D9804061
Pulled By: ezyang
fbshipit-source-id: 631129b3fa220a9670770b3766f14a8e03633bdf
Summary:
Many constructors like `torch.zeros` or `torch.randn` didn't support
size tracing correctly; this is now fixed. The same issue has been
fixed in the legacy tensor constructors.
Additionally, new tensor constructors, which do not participate in
tracing (most notably `torch.tensor`, `torch.as_tensor` and
`torch.from_numpy`), raise a warning when they are used while tracing.
Finally, entering a traceable operation disables tracing in its body.
This is needed because otherwise the trace would record the operations
performed inside its implementation instead of a single node for the
operation itself.
zdevito
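For illustration, a minimal sketch of the intended behavior (the torch.jit.trace call below uses the present-day API and is only an example, not part of this change):
```
import torch

def fn(x):
    # The size passed to torch.zeros should now be traced as a dependence
    # on x.size(0) rather than baked into the graph as a constant.
    return torch.zeros(x.size(0), 4) + x.sum()

traced = torch.jit.trace(fn, (torch.randn(3, 4),))
print(traced.graph)  # the size computation should appear in the graph, not a literal 3
```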
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11288
Reviewed By: ezyang
Differential Revision: D9751183
Pulled By: apaszke
fbshipit-source-id: 51444a39d76a3e164adc396c432fd5ee3c8d5f7f
Summary:
This PR cleans up the `at::Tensor` class by removing all methods that start with an underscore in favor of functions in the `at::` namespace. This greatly cleans up the `Tensor` class and makes it clearer which parts of the API are public and which are not.
For this I changed `native_functions.yaml` and `Declarations.cwrap` to mark all underscore methods as `variant: function` (or added such a statement where it was missing), and then fixed all code locations that used the underscore methods.
ezyang colesbury gchanan
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11152
Differential Revision: D9683607
Pulled By: goldsborough
fbshipit-source-id: 97f869f788fa56639c05a439e2a33be49f10f543
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10824
API additions:
- Tensor(c10::intrusive_ptr<TensorImpl,UndefinedTensor>&&)
- Tensor(const c10::intrusive_ptr<TensorImpl,UndefinedTensor>&)
- Tensor::operator=(Tensor&&) && (for completeness' sake)
- TensorBase::unsafeGetTensorImpl()
- TensorBase::unsafeReleaseTensorImpl()
- TensorBase::getIntrusivePtr()
- TensorImpl::type_id()
- Tensor::set_data()
- Tensor::is_same(Tensor)
- Tensor::use_count()
- Tensor::type_id()
- Tensor::scalar_type()
- WeakTensor::is_same(WeakTensor)
- intrusive_ptr::weak_use_count()
- weak_intrusive_ptr::weak_use_count()
- c10::raw::intrusive_ptr::{incref,decref,make_weak}
- c10::raw::weak_intrusive_ptr::{incref,decref,lock}
API changes:
- Tensor::pImpl is no longer public (and now named tensor_impl_)
- Most methods accessed this way are now accessible on Tensor, with
  maybe_zero_dim() and set_wrapped_number() being prominent exceptions
  (they are now accessed through unsafeGetTensorImpl())
- Type is no longer friend of Tensor
- TensorBase::reset(TensorImpl*) is deleted
- TensorBase::reset(TensorImpl*, bool should_retain) is deleted
- TensorBase::swap(TensorBaseImpl&) is deleted; use std::swap instead
- TensorBase::get() is deleted; use unsafeGetTensorImpl() instead
- TensorBase::detach() is deleted; use unsafeReleaseTensorImpl() instead
- TensorBase::retain() is deleted; use _raw_incref() instead
- TensorBase::release() is deleted; use _raw_decref() instead
- WeakTensor lost most of its methods (it no longer inherits from
TensorBase)
- TensorImpl::storage() is now a const method
- Tensor(TensorBase) constructor removed; instead
  we go through getIntrusivePtr(). I'm not sure about
  this change; I happened to accidentally remove the
  TensorBase constructor and decided to fix call sites,
  but I could go the other way.
- detail::set_data() is deleted; use Tensor::set_data() instead
- c10::raw_intrusive_ptr_target removed; use the functions in c10::raw instead.
(The reason for this change is that it is invalid to cast an intrusive_ptr_target*
to a raw_intrusive_ptr_target* to take advantage of the methods. But there is
no reason the incref/decref methods shouldn't also work on intrusive_ptr_target;
it is primarily an API consideration. We can be more standards compliant by
keeping them as functions, which are universally applicable.)
- intrusive_ptr::reclaim() and weak_intrusive_ptr::reclaim() now work on
pointers of the NullType. (This counts as a bug fix, because the documentation
specified that pointers produced by release() are valid to reclaim(), and
a release() on a null intrusive_ptr produces the NullType::singleton())
Bug fixes:
- Dispatch code for mutable references incorrectly returned
  a reference to a value argument (which would immediately
  go out of scope). It now correctly returns a tensor by
  value.
- intrusive_ptr copy/move assignment did not work correctly when
an object was assigned to itself. We now check for this case and
no-op if so. (This bug manifested itself as a Tensor mysteriously
becoming an UndefinedTensor after lines of code like
'x = x.mul_(y)')
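For context, a purely illustrative sketch of the user-visible pattern the self-assignment fix addresses:
```
import torch

x = torch.randn(3)
y = torch.randn(3)
# Rebinding x to the result of an in-place op on itself previously could
# leave x referring to an undefined tensor; it now behaves as expected.
x = x.mul_(y)
print(x)
```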
Other changes:
- The checked cast functions in Utils.h have now been
renamed and detemplatized into checked unwrap functions.
- Added type_id() and scalar_type() methods to Tensor
- pImpl is no longer public
- Documented what the && overloads are doing
- All occurrences of 'new TensorImpl' (and similar spellings, like 'new THTensor')
have been expunged. This is NO LONGER a valid way to create a new
tensor, and if you do this, upon your first incref, you will catch an ASSERT
failure saying that only tensors created by intrusive_ptr::release() are valid
to reclaim(). Use c10::make_intrusive instead in this situation.
- IValue is adjusted to use intrusive_ptr instead of Retainable, and all
other sub-classes of Retainable were modified to use intrusive_ptr.
When doing this, I had to make the constructors of sub-classes like
ConstantList public, so that c10::make_intrusive could invoke them. Fortunately,
if you incorrectly stack allocate a ConstantList, and then try to get an
intrusive_ptr to it, it will fail, as stack allocated ConstantLists have refcount 0.
- IValue very narrowly sidesteps the problem of handling NullType, as it
considers intrusive_ptr<TensorImpl> identical to intrusive_ptr<TensorImpl, UndefinedTensor>
which is not always true. This was always the case, but there's now a comment
explaining what's going on.
Some MSVC bugs were uncovered during the preparation of this patch.
They are documented as comments in the code.
Reviewed By: gchanan
Differential Revision: D9481140
fbshipit-source-id: 14a8ea0c231ed88b5715fb86d92730926f9f92fc
Summary:
This is a step along the way of removing Tensor as a member of the tagged union in Scalar. This simplifies ordering dependencies, because currently Scalar and Tensor depend on each other (so we introduce a TensorBase). Also, this API isn't particularly useful publicly: we can't autograd through Scalars, so you still need a Tensor overload basically everywhere anyway.
I'm undecided on what the final API should be here. We could keep a Tensor constructor on Scalar, but have it generate a local scalar; this is convenient, but given that this API used to be non-synchronizing, it may not be the best choice.
For now, I'm just using _local_scalar, which is clear, although we should get rid of the prefix _ if that's the API we intend to promote.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10852
Reviewed By: ezyang
Differential Revision: D9496766
Pulled By: gchanan
fbshipit-source-id: 16f39b57536b9707132a5a4d915650c381bb57db
Summary:
```
Use intrusive_ptr in Storage; replace unique_ptr<Storage> with Storage
This patch does two major changes:
- It replaces the use of Retainable in Storage with a new implementation
based on intrusive_ptr. This will be necessary because Caffe2 will
be using this class to implement intrusive_ptrs, and we need to
line these up for the merge. One good thing about the new implementation is
that the default copy/move constructors/assignment operators and destructor
work automatically, instead of needing to be hardcoded into Storage/Tensor.
- It replaces all places where we returned std::unique_ptr<Storage> with
  Storage, collapsing a double indirection that is no longer
  necessary now that we have correctly working copy/move constructors.
I didn't initially want to do step (2), but it was very important to
eliminate all bare uses of new Storage and new StorageImpl, and making
this API change was the most straightforward way to do it.
HOW TO FIX YOUR CODE IN THE NEW API
- You no longer need to dereference the result of tensor.storage() to pass
it to set. So, instead of:
x.set_(*y.storage());
just write:
x.set_(y.storage());
- If you were accessing methods on StorageImpl via the pImpl() method, you
  must now use the dot operator to call pImpl(). Even better, just drop pImpl;
  we now have method forwarding. So, instead of:
storage->pImpl()->data();
just do:
storage->data();
    // storage.pImpl()->data() works too, but it is not recommended
- storage->getDevice() is no more; instead use storage->device().index()
MISC CODE UPDATES
- retain, release, weak_retain, weak_release and weak_lock are now
reimplemented using the "blessed API", and renamed to make it
clearer that their use is discouraged.
- nvcc OS X and general OS X portability improvements to intrusive_ptr
- A new comment in intrusive_ptr describing how stack allocated
intrusive_ptr_targets work differently than heap allocated ones
from c10::make_intrusive
CAVEAT EMPTOR
- THStorage_weakRetain used to work on strong pointers, but it NO LONGER
works with intrusive_ptr. You must reclaim the strong pointer into a
real strong pointer, construct a weak pointer from it, and then release
the strong and weak pointers. See StorageSharing.cpp for an example.
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10488
Reviewed By: gchanan
Differential Revision: D9306134
Pulled By: ezyang
fbshipit-source-id: 02d58ef62dab8e4da6131e1a24834a65c21048e2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10478
- Removed Backend constructor from Device, and fixed all
use-sites to use DeviceType::CPU instead of kCPU, or
use a new function backendToDeviceType to perform
the conversion.
- New method device_type() on Type; it gives you the
underlying device type, e.g., CPU for SparseCPU.
- We add backward compatibility for kCPU/kCUDA uses,
by introducing a new special type which is implicitly
convertible to both DeviceType and Backend. As long as
you don't define a function that's overloaded on both
DeviceType and Backend (but not on BackendOrDeviceType),
the implicit conversions will ensure that uses
of at::Device(at::kCPU) keep working. We fixed use-sites in
the library, but did NOT fix sites in the test code, so that
we can exercise this BC code.
Reviewed By: Yangqing
Differential Revision: D9301861
fbshipit-source-id: 9a9d88620500715c7b37e655b4fd761f6dd72716
Summary:
More clang tidy cleanups in `torch/csrc`. This time:
1. `hicpp-use-equals-default` recommends `= default` instead of `{}` for constructors/destructors. This is better practice because it expresses the intent more clearly (https://stackoverflow.com/questions/6502828/what-does-default-mean-after-a-class-function-declaration)
2. `readability-inconsistent-declaration-parameter-name` enforces that parameter names in the declaration match parameter names in the definition. This is just generally useful and can prevent confusion and bugs.
Also updated my script a little bit.
apaszke ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9737
Differential Revision: D9069069
Pulled By: goldsborough
fbshipit-source-id: f7b3f3a4eb4c9fadc30425a153566d3b613a41ae
Summary:
```
This adds TensorIterator, a helper class for computing element-wise
operations that's intended to replace the CPU and CUDA apply utils
functions.
CPU kernels are implemented as functions that operate on strided 1-d
tensors, whereas CPUApplyUtils operated on individual elements. This
allows the kernels to handle vectorization, while TensorIterator handles
parallelization and non-coalesced dimensions.
GPU kernels continue to operate on elements, but the number of
specializations is reduced. The contiguous case remains the same. The
non-contiguous case uses a single (reduced) shape for all operands and
the fast integer division from THCIntegerDivider. To avoid extra
specializations for 64-bit indexing, large operations are split
into smaller operations that can be indexed with 32 bits.
Major semantic changes:
- No more s_add, s_mul, s_div, or s_sub. Broadcasting is handled by
TensorIterator. The autograd engine performs the reduction assuming
standard broadcasting if the gradient shape does not match the
expected shape. Functions that do not use standard broadcasting rules
should either continue to trace the expand calls or handle the
reduction in their derivative formula.
- Use ONNX v7, which supports broadcasting ops.
Performance impact:
- Small increased fixed overhead (~0.5 us)
- Larger overhead for wrapped numbers (~2.5 us)
- No significant change for ops on contiguous tensors
- Much faster worst-case performance for non-contiguous GPU tensors
- Faster CPU bias addition (~2x)
- Faster GPU bias addition (~30% faster)
Future work:
- Decrease overhead, especially for wrapping numbers in Tensors
- Handle general inter-type operations
- Extend to unary ops and reductions
- Use buffering for compute-bound operations on non-contiguous tensors
(pull in from CPUApplyUtils)
```
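As a small illustration (a sketch, not part of the PR), the kind of element-wise call that now goes through TensorIterator, with broadcasting and a non-contiguous operand:
```
import torch

a = torch.randn(1024, 512).t()   # non-contiguous (transposed) operand, shape (512, 1024)
b = torch.randn(512, 1)          # broadcasts along the second dimension
c = a + b                        # element-wise add; broadcasting handled by TensorIterator
```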
Pull Request resolved: https://github.com/pytorch/pytorch/pull/8919
Differential Revision: D8677600
Pulled By: colesbury
fbshipit-source-id: 61bc9cc2a36931dfd00eb7153501003fe0584afd
* Bag of fixes
* Rename tensor_range.h to tensor_list_view.h
* Post rebase fixes
* Rename torch::tensor namespace to torch::tensors due to name conflict
* Avoid recursion in Module::to
* Created TensorOptions
Storing the type in TensorOptions to solve the Variable problem
Created convenience creation functions for TensorOptions and added tests
Converted zeros to TensorOptions
Converted rand to TensorOptions
Fix codegen for TensorOptions and multiple arguments
Put TensorOptions convenience functions into torch namespace too
All factory functions except *_like support TensorOptions
Integrated with recent JIT changes
Support *_like functions
Fix in place modification
Some cleanups and fixes
Support sparse_coo_tensor
Fix bug in Type.cpp
Fix .empty calls in C++ API
Fix bug in Type.cpp
Trying to fix device placement
Make AutoGPU CPU compatible
Remove some auto_gpu.h uses
Fixing some headers
Fix some remaining CUDA/AutoGPU issues
Fix some AutoGPU uses
Fixes to dispatch_tensor_conversion
Reset version of new variables to zero
Implemented parsing device strings
Random fixes to tests
Self review cleanups
flake8
Undo changes to variable.{h,cpp} because they fail on gcc7.2
Add [cuda] tag to tensor_options_cuda.cpp
Move AutoGPU::set_index_from into .cpp file because Windows is stupid and sucks
Fix linker error in AutoGPU.cpp
Fix bad merge conflict in native_functions.yaml
Fixed caffe2/contrib/aten
Fix new window functions added to TensorFactories.cpp
* Removed torch::TensorOptions
Added code to generate wrapper functions for factory methods
Add implicit constructor from Backend to TensorOptions
Remove Var() from C++ API and use torch:: functions
Use torch:: functions more subtly in C++ API
Make AutoGPU::set_device more exception safe
Check status directly in DynamicCUDAHooksInterface
Rename AutoGPU to DeviceGuard
Removed set_requires_grad from python_variables.h and warn appropriately in Variable::set_requires_grad
remove python_default_init: self.type()
Add back original factory functions, but with deprecation warnings
Disable DeviceGuard for a couple functions in ATen
Remove print statement
Fix DeviceGuard construction from undefined tensor
Fixing CUDA device compiler issues
Moved as many methods as possible into header files
Dont generate python functions for deprecated factories
Remove merge conflict artefact
Fix tensor_options_cuda.cpp
Fix set_requires_grad not being checked
Fix tensor_new.h
TEMPORARILY put some methods in .cpp files to see if it solves issues on windows and mac
Fix bug in DeviceGuard.h
Missing includes
TEMPORARILY moving a few more methods into .cpp to see if it fixes windows
Fixing linker errors
* Fix up SummaryOps to use new factories
Undo device agnostic behavior of DeviceGuard
Use -1 instead of optional for default device index
Also move DeviceGuard methods into header
Fixes around device index after optional -> int32_t switch
Fix use of DeviceGuard in new_with_tensor_copy
Fix tensor_options.cpp
* Fix Type::copy(
* Remove test_non_float_params from ONNX tests
* Set requires_grad=False in ONNX tests that use ints
* Put layout/dtype/device on Tensor
* Post merge fixes
* Change behavior of DeviceGuard to match AutoGPU
* Fix C++ API integration tests
* Fix flip functions
* Port THS to ATen.
The basic structure of the patch:
- All kernels in aten/src/THS got rewritten as native
functions in aten/src/ATen/native/sparse
I took the liberty to rename some of the kernels,
opting for a longer, more transparent names than
things like 'spaddcmul'.
- Instead of holding fields for sparse tensor in the TH
C struct THSTensor, they are now held in a C++ class
SparseTensorImpl (this explains why I had to do this
all in one go; I can't have *two* reps for sparse
tensors!)
Along the way, we change a key internal representation
invariant: an "empty" sparse tensor has dimI == 1 and
dimV == 0 (this is different from dimI == 0 and dimV == 0
we had before); this ensures that we maintain the invariant
that dim == dimI + dimV. "Scalar" sparse tensors are
made illegal, because there really is no way to properly
express them in COO format.
- Because we haven't ported THCS or any of the traditional
dense TH implementations, there is a new set of adapter
functions in native/LegacyBridge.cpp exclusively devoted
to deciding whether or not to go to the new native implementation
or back to the legacy TH binding (prefixed with th_).
The intent is that when everything gets ported, we can
delete this file.
- I've kept the stubs for all the THS functions, but they now all
error if you try to actually call them. Eventually, we should
replace these with calls to ATen so that everything keeps
working.
- I gobbled up SparseMM (SparseMM.cpp is no more). It was tasty.
There are some miscellaneous improvements which were needed for other
changes in this patch:
- There is now AT_FORALL_SCALAR_TYPES_EXCEPT_HALF, which does what
it says on the tin.
- The axpy templated function moved to TH/BlasUtils.h; there's a new macro
  which lets you easily forward to all of the TH functions. We also expose
  THBlas_copy. I'm not terribly pleased with these functions, but they
  seem to serve a needed purpose.
- New method on Tensor to get TensorImpl*, unsafeGetTensorImpl
- accessor() is now this-const, since const-correctness on Tensor is a lie
- New toSparse()/toDense() methods on Type; now you can call these
directly without having to manually apply at::toSparse/toDense
on the Backend and then running toBackend yourself.
Changes to the kernels:
- Previously, the whole body of all kernels was compiled for
every supported scalar type. In our new implementation,
the scalar dispatch has been pushed into the smallest extent
which (1) is not in a type loop and (2) requires statically
knowing the scalar type. These sites all use
AT_DISPATCH_ALL_TYPES. I tried to use lambdas as much as
possible, but sometimes it was not possible when a OpenMP
pragma was used.
- Anywhere we tested if the nDimension of a tensor was zero,
  we replaced it with a test that numel is zero. Because, as we
  know, nDimension of zero-size tensors in TH is zero, and
  that's wrong wrong wrong (and not done this way in ATen).
Some subtleties:
- Places where previously fastget1d was used, I now use a
TensorAccessor. However, you have to be careful about grabbing
the accessor, because sometimes you will be accessor'ing
indices/values and they are empty, which means they will
be *1D* ("oh, aren't indices always 2D?" Nope. Nyet.)
So, essentially, it is only safe to grab an accessor *after*
you have checked that nnz != 0. All of these shenanigans
will go away when we properly support zero-size dimensions.
A few places, we test for this case just by wrapping the loop
in a conditional on nnz. Some other places this is not so easy,
so we instead short-circuit the function with a special case for
when nnz == 0 (usually, these implementations are degenerate).
- There is a very subtle but important difference between
_sparse_get_impl(self)->indices() and self._indices();
the latter may return a view! This is because nnz is
not guaranteed to match the dimensions of indices/values;
you can "truncate" a sparse tensor by setting the nnz.
Actually, I think this is not a good idea and we should
enforce a stronger invariant, but for this patch I slavishly
adhere to the old ways, and as such I have to be very
careful if I want to resize something, I had better use
the former and not the latter.
- I had to reimplement broadcasting by hand (thus the s_
and non-s_ functions in the sparse native files). There
is a very important distinction between foo_out and foo_,
so it is important that the LegacyBridge function always
call to the lower layer, and not try to avoid boilerplate
by calling to another LegacyBridge function first.
I did NOT put broadcasting in LegacyBridge (even though,
ultimately, that's where it must live), because the th_
functions which are invoked from LegacyBridge handle
broadcasting themselves, and I don't want to broadcast
twice.
- Sparse functions MUST explicitly specify the Type they
dispatch from, otherwise Variable wrapping/unwrapping will
not work correctly. If you use _get_sparse_impl, that is
sufficient to levy this requirement.
- The "has native" tests in LegacyBridge.cpp are not 100%,
because some of the functions are mixed dense-sparse functions,
and so you can't just say, "Oh, if it's sparse and CPU, call
the native sparse implementation." This is handled on a
case by case basis. There is some especially complex
logic for add(), which has dense-dense, sparse-sparse
and dense-sparse implementations.
- I added some uses of SparseTensorRef in native_functions.yaml,
but you will notice that these are all on native_* functions,
and not the actual, top-level functions. So the SparseTensorRef
is purely documentary (helping you not call the wrong overload)
but there is no magic; we do the wrapping ourselves the hard
way. (This is in contrast to the TH binding code which is magical.)
Except for _sparse_mask; _sparse_mask is magical.
- There is a raw_copy_sparse_ method, which is really my way of
getting around the fact that copy_ has never been implemented
for sparse tensors (even before this patch), but there IS a
super secret, internal way of doing these copies that the THS
code used, and which I needed to get my hands on when I did this
port. We should refactor so that either (a) copy_ does support
sparse-sparse copy natively, or (b) we do this other ways.
- Irritatingly, I must explicitly resize_as_ before copy_ into
a tensor. This was not the case with THTensor_(copy) but I don't
have any direct binding that doesn't have this requirement.
- For some reason, the sparse tensor constructor accepts a scalar
tensor for the values tensor. This is kind of weird because
you always need an nnz-dimension. However, the old code supported
this and just expanded it into a 1D size 0 tensor; so we need some
explicit code to do this.
There are maybe a bit more AT_ASSERTs in some of the kernels
than is wise. I added them all when I was debugging and was
loath to remove them.
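For orientation, a minimal sketch of the COO representation discussed above, using the Python-level constructor (illustrative only; it exercises the kernels ported here):
```
import torch

i = torch.tensor([[0, 1, 1],
                  [2, 0, 2]])         # indices: one row per sparse dimension (dimI), one column per nnz entry
v = torch.tensor([3.0, 4.0, 5.0])     # values: nnz entries (dimV == 0 here)
s = torch.sparse_coo_tensor(i, v, (2, 3))
print(s._indices(), s._values())      # note: these may be views truncated to nnz
print(s.to_dense())
```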
Some last mile fixes after this commit went into PR
- Move expand outside of dispatch so autograd works (it used to be inside and then we lost all of the recorded broadcasts).
- Hack to duplicate the derivatives for our now two definitions TH and native. Mercifully the derivatives are short.
- Apparently, TH has a special case to make foo_ functions method only, and if you don't do this the Python arg parsing is wrong. We carefully work around this in the native bindings
- Apply DCE to a test_jit case, fixes wobbling due to DCE trick in tracing
- Update test_function's output
- Some last mile fixes for dispatch confusion in sparse_coo_tensor functions.
- New simplified regression test based on failures I saw in ONNX
- Increase tolerance on super resolution test
- More robust dynamic_type normalization, fixes ONNX bug.
The dynamic_type situation is very delicate; probably need
to stop having both Scalar and real.
- Make new_with_tensor_sparse more CUDA safe
- Note about CUDA-safety in SparseTensorImpl
- Rename dimI/dimV to sparseDims/denseDims.
- Make localScalar on SparseTensorImpl work.
- Make numel uniformly supported on all types, not just dense
types
- Add tests for is_nonzero() method (which exercises localScalar)
- Disable constant JIT autogenerated tests, which are fragile and broken
by this change, but being fixed in a parallel track.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Add non_blocking to Tensor/Module.to
* flake8
* Add argparse tests
* cpp parse
* Use C++ parser
* use a common parse function with Tensor.to
* fix test_jit
* use THPObjectPtr
* increase refcount for None, True, and False
* address comments
* address comments
This makes the JIT tracer much more robust by allowing it to record
dependencies on tensor sizes. For example, if you were to trace this
function
def fn(x):
return x.view(x.size(1), -1)
before this patch, then it would embed the actual value of x.size(1)
in the trace as a constant, making it very hard to have e.g. batch size
independent traces. Now, this will correctly record the dependency, and
will retrieve the size of x at every run.
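A runnable sketch of the example above (the torch.jit.trace call uses the present-day API and is only illustrative):
```
import torch

def fn(x):
    return x.view(x.size(1), -1)

traced = torch.jit.trace(fn, (torch.randn(2, 3),))
# The dependency on x.size(1) is recorded, so a different size should still work.
print(traced(torch.randn(2, 5)).shape)
```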
* Fix torch.tensor(...) device-type calculation when used with numpy and type inference.
* Fix tensor device type inference as well.
* Better variable type inference: infer cuda-ness only if device is not specified.
* Use Index rather than Long for IntList, so floating-point types convertible to ints fail the parsing.
Basically, our unpackLong code works with floating-point types that are convertible to ints, but this isn't often what you want (because of truncation).
What you actually want is to convert to an index, which will usually find such issues.
I made this the minimal change I could because:
1) I didn't want to change unpackLong because the existing code calls checkLong before unpackLong, so this should be a non-issue most of the time. And fixing this properly requires calling checkLong again, which will slow everything down.
2) An exception above is with IntList, which only checks that 1) it is a tuple or 2) it is a varargs tuple (i.e. torch.ones(1, 2, 3)).
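A minimal sketch of the intended effect (illustrative):
```
import torch

torch.ones(2, 3)        # integral sizes parse as before
try:
    torch.ones(2.5, 3)  # a float convertible to int should no longer silently truncate
except TypeError as err:
    print(err)
```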
* Fix bug.
* Don't conflict tensor and IntList bindings.
* Change function to be consistent between python 2 and 3.
* Check Index.
* Move IntList overloads in legacy new functions to below Tensor overloads.
* Add dtypes (with reasonable defaults) to sum, prod, cumsum, cumprod.
This adds optional dtypes to torch.sum, torch.prod, torch.cumsum, torch.cumprod.
By default, the dtype is torch.float64 for integral types, and the dtype of the input for floating point types.
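A small usage sketch (illustrative; it passes the dtype explicitly rather than relying on the defaults described above):
```
import torch

t = torch.arange(5)                  # integral input
print(t.sum(dtype=torch.float64))    # accumulate and return in float64
print(t.prod(dtype=torch.float64))
print(t.cumsum(0, dtype=torch.float64))
```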
* Don't use optional<ScalarType>, because the jit can't handle it yet.
Instead, we manually build the overloads. This is fairly painful because of default arguments, but should be easy to pull out once the jit can handle optional<ScalarType>.
* Fix keepdim with out parameters.
* Fix _cudnn_rnn_flatten_weight.
* If dtype is provided to an out function, make sure it matches the dtype of the result.
* Fix typo.
* Separate cuda-ness from dtype.
There are no longer torch.cuda.int64, etc.; only torch.int64, which corresponds to at::ScalarType.
At the python arg parser level, the corresponding ATen type is selected from the combination of (ScalarType, Layout, Device).
There is also currently unused code in here for supporting ScalarType in native_functions; this will be used for specifying aggregate types
on reduction functions.
* Fix test_autograd.
* Add defaults to randint_like.
* Track is_cuda in py tensor types.
* Fix test_sparse.
* Fix multiprocessing.
* Fix rnn.
* Fix test_nn.
* Fix flake8.
* Add string-style devices to all tensors.
Previously, tensors only had a 'get_device' method which would throw an exception on a CPU tensor. This made it necessary to write if/else code that
was meant to be device agnostic.
This PR implements the following:
1) Adds a 'device' property to all tensors that returns a string representation of the device for all tensors.
For cpu tensors this is 'cpu'. For cuda tensors this is 'cuda:X', where X is the cuda device ordinal.
2) Adds a DeviceSpec class. This is just a helper class for separating device_type and device_index specification and to allow partial specification.
For example, you can call DeviceSpec('cuda'), DeviceSpec('cuda:0'), DeviceSpec('cuda', 1).
Also has backwards compatibility support for specifying integers, which are treated as cuda devices.
DeviceSpecs have the following properties:
a) device_type: string representation of the device type (i.e. 'cpu' or 'cuda')
b) device_index: integer for the device index (None if not specified)
c) cuda_device_index: for backwards compatibility; behaves roughly like `get_device` did previously. I.e. if a function previously took integers for cuda devices,
it can now take DeviceSpecs (or strings), and can maintain the old functionality by calling `old_index = DeviceSpec(old).cuda_device_index`.
3) tensor methods and torch. functions that took integer devices can now take integers, strings, or DeviceSpecs. For example:
torch.randn((2,3), dtype=torch.cuda.float32, device='cuda:1')
TODO in future PRs:
A) Split out cuda from dtype so you don't need to overspecify cuda-ness
B) We currently only support strings/DeviceSpecs in tensor methods and torch. functions. We should have equivalents torch.cuda.device(...), torch.cuda.device_of, etc.
at the torch. level that work on strings/DeviceSpecs
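A minimal sketch of string-style devices (using a CPU device so it runs anywhere, and the final torch.device name rather than DeviceSpec; 'cuda:1' as above needs a machine with at least two GPUs):
```
import torch

t = torch.randn(2, 3, device='cpu')
print(t.device)                 # every tensor now reports its device

d = torch.device('cuda', 1)     # device type plus optional index; partial specification is allowed
print(d.type, d.index)
```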
* Add deviceInt64 to python arg parser.
* device_str.
* Remove device_str.
* remove device prefix from attributes.
* Use const char * instead of string.
* Move autogpu index out of Device.
* comment on is_default.
* Rename torch.DeviceSpec to torch.device.
* comment.
* Fix tests.
* Fix flake8.
* Fix sparse_coo_tensor parameter name.
* Improve error message.
* Remove device_ prefix from C++ device object.
* Allocate static strings.
* Return not implemented from rich compare.
* Move torch::Device to THPDevice.
* Remove cuda index.
* Py_RETURN_NOTIMPLEMENTED doesn't exist in python2.
* Introduce torch.layout and split layout from dtypes.
Tensors (and tensor types) now have a 'layout' attribute that returns either 'torch.strided' or 'torch.sparse_coo'.
Previously, dtypes were 1-to-1 with ATen types/PyTensorTypes; the impetus behind this decision was to make things easy in the common case
(i.e. specifying a type in a factory function). But this doesn't really follow for sparsity, which isn't a common case.
It also doesn't properly represent the concept of a dtype, which in numpy is a proper scalar type (i.e. roughly the type returned from indexing the
last dimension of an n-d array). But this should be the same whether or not the tensor is represented via strides, sparsity, etc.
This is accomplished by:
1) having the dtype of tensor return the (device-type, scalar-type) combination, i.e. torch.cuda.float32, so both
torch.cuda.FloatTensor and torch.cuda.sparse.FloatTensor have the same dtype
2) Adding a layout parameter to python functions, where the combination of (dtype, layout) maps to an ATen type that is used for dispatch.
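An illustrative sketch of the layout attribute (to_sparse is used only for brevity here):
```
import torch

dense = torch.zeros(2, 2)
print(dense.layout)             # torch.strided
sparse = dense.to_sparse()
print(sparse.layout)            # torch.sparse_coo
```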
* Formatting, make init throw python_error.
* Fix cuda not enabled error message.
* Fix test.
- Remove some uses of mega-header THP.h
- Use HANDLE_TH_ERRORS in functions that may throw
- Move NumPy includes to common header
- Delete unused allocator
This replaces the torch.Tensor constructors with factories that produce
Variables. Similarly, functions on the torch module (e.g. torch.randn)
now return Variables.
To keep the PR to a reasonable size, I've left most of the unused tensor
code. Subsequent PRs will remove the dead code, clean-up calls to
torch.autograd.Variable, and rename Variable to Tensor everywhere.
There are some breaking changes because Variable and Tensors had
slightly different semantics. There's a list of those changes here:
https://github.com/pytorch/pytorch/wiki/Breaking-Changes-from-Variable-and-Tensor-merge
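A small sketch of the user-visible consequence (factories now return Variables, so autograd works on their results directly):
```
import torch

x = torch.randn(3, requires_grad=True)   # factory call; the result participates in autograd
y = (x * 2).sum()
y.backward()
print(x.grad)
```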
* Various dtype improvements.
1) Add dtypes to the new data-based constructors: Variable.new_tensor and torch.autograd.variable.
2) In the python signatures, use Type instead of Dtype to match the C++ signatures; the error messages still print as dtype.
3) Handle / add a better error message when a dtype is used when ATen was not compiled with that type (e.g. cuda types).
4) Move cuda_lazy_init to its own file.
A later commit will add support to the legacy constructors as well.
* Move implementation of lazy_init to cpp.
* Fix parsed_arg size.
* Add numpy-style dtypes to Variable factories.
1) Add numpy-style dtypes corresponding to torch tensor types. These are:
torch.float16, torch.float32, torch.float64, torch.uint8, torch.int8, torch.int16, torch.int32, torch.int64
as well as torch.cuda, torch.sparse, and torch.cuda.sparse equivalents.
2) Adds "legacy" names for the above dtypes that correspond more closely to existing tensor names. These are:
torch.half, torch.float, torch.double, torch.short, torch.int, torch.long.
torch.byte and torch.char don't exist because they either don't match numpy semantics or differ on different architectures.
3) Adds a "dtype" parameter to Variable factories (e.g. zeros, ones) that allows the user to specify the type without changing the default tensor type.
4) Adds a "dtype" getter to Variables that return the canonical dtype from 1)
This PR is missing the following useful features that should be added in the future:
A) We only add the "dtype" parameter to auto-generated factories; hand-written factories like in tensor_new.cpp don't support this yet.
B) We don't allow type conversions to use dtypes; that should be added to type(param) or a new function.
C) We don't yet have a "device" parameter for these factories; right now, they will only create Variables on the default device.
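An illustrative sketch of points 3) and 4):
```
import torch

a = torch.zeros(3, dtype=torch.float64)   # dtype parameter on a factory
print(a.dtype)                            # torch.float64
b = torch.ones(2, dtype=torch.int16)
print(b.dtype == torch.short)             # True: legacy alias for int16
```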
* backend_to_string can be private.
* Define python binding argument indexes in a simpler way.
* add all_declared_types, still need to hook it up to THPDType.
* Fix all_declared_types for missing types (it's Sparse + Half).
* Ensure cuda dtypes are created even if compiled with NO_CUDA=1.
* Fix case where dtype is provided but dispatch is via namespace.
This happens in ones_like, empty_like, randn_like.
There is some question as to whether we should do:
1) at::ones_like(tensor).toType(dtype)
2) at::ones_like(tensor.toType(dtype))
I did the former because it matches the numpy documentation, i.e.
"Overrides the data type of the result.", and because it's easier to implement.
Note that the above causes an extra copy, either of the input or output.
Here's a better implementation:
1) Make zeros_like, ones_like native functions that take an optional type (named dtype?).
2) Match the type argument with the dtype, so we don't have two different parameters.
3) Call at::zeros_like(input, type) -> at::native::zeros_like(input, type) -> type.zeros(input.sizes())
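A usage sketch of the override-the-result-dtype behavior described above (illustrative):
```
import torch

t = torch.ones(2, 3, dtype=torch.int32)
u = torch.ones_like(t, dtype=torch.float64)  # overrides the dtype of the result
print(u.dtype, u.shape)
```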
* Don't return from maybe_initialize_cuda.
* Don't leak DType name.
* Address cpp review comments.
* Share code between sparse and non-sparse test_dtypes.
* Rewrite _like functions as native functions with an explicit type parameter.
* Use type 'Type' instead of 'dtype' for consistency.
* Address review comments.
* Handle arg_idx when there is requires_grad but no dtype in python_binding_arguments.
* Allow zero-dim tensors to be bound to at::Scalar
This relaxes THPUtils_unpackLong and THPUtils_unpackDouble to allow
values convertible to PyLong and PyFloat objects. This includes NumPy
scalars and zero-dim tensors (Variables).
This is important to maintain backwards compatibility in the Tensor
constructors once scalars are enabled and Variable and Tensor are
merged.
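A small sketch of what this allows (illustrative; the NumPy line assumes numpy is installed):
```
import numpy as np
import torch

t = torch.zeros(3)
t.fill_(torch.tensor(7.0))   # a zero-dim tensor where a scalar is expected
print(t[np.int64(1)])        # a NumPy integer scalar used as an index
```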
* Add comment and unpack PyInt as int64_t
Uses TypeError from torch/csrc/Exceptions.h in python_arg_parser.cpp so
that the exception is interpreted as a Python TypeError instead of
RuntimeError.
This adds overrides in VariableType for the xxx_out ATen functions and
implements Python bindings. There is no support for automatic
differentiation. If any of the inputs (or outputs) requires grad, then the
function will throw an exception unless it's running in "no-grad" mode.
The bindings for calling torch.xxx functions on Variables are moved to a
different object. Previously, they were static methods on VariableBase.
This change prevents users from accidentally calling static methods as if
they were instance methods.
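A minimal sketch of the out= bindings described above (illustrative; none of the inputs require grad, so no exception is raised):
```
import torch

a, b = torch.randn(3), torch.randn(3)
result = torch.empty(3)
torch.add(a, b, out=result)   # out= variants are bound for Variables; no autograd support
print(result)
```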
* Convolution derivatives in ATen
This PR introduces ATen implementation of convolution, which dispatches to
THNN/CuDNN/nnpack based on input parameters. The general strategy is to compose
this function out of the various forward-backward pairs of specific
implementations, rather than write a monolithic function with backwards (which
is what we did before because the boilerplate of doing it otherwise would have
been very high.) The new API provides the following functions:
- _convolution, which is a fully generic, native convolution implementation
that dispatches to various other convolution implementations depending on
input characteristics. This is prefixed with an underscore because it
explicitly takes benchmark, deterministic and cudnn_enabled which are
implementation details for CuDNN. The intent is to eventually provide a
convolution that reads these parameters out of the context using #4104.
- _convolution_nogroup is a convolution implementation for non-CuDNN
algorithms which don't support group convolution natively.
- _convolution_double_backward is the generic double-backwards implementation
for convolution.
In more detail:
- Most functionality from torch/csrc/autograd/functions/convolution.cpp has been
moved into aten/src/ATen/native/Convolution.cpp
- We continue to make use of ConvParams, but we now construct the parameters
upon entry to a function from the function signature (which does not use
ConvParams; having convolution take ConvParams directly would require teaching
the code generator how to accept these as parameters, complicating ATen's API
model) and destruct them when making subprocedure calls.
- I introduce a new idiom, input_r, which represents a const Tensor& reference,
which will subsequently be assigned to a local Tensor input. This is helpful
because a lot of the existing algorithms relied on being able to assign to
locals, which is not permitted with a const reference.
- The native argument parser now supports std::array<bool,2> inputs (NB: there
MUST NOT be a space; this is the same hack as is applied to derivatives.yaml)
- Native parser now supports Tensor? arguments, which indicates a nullable
tensor. Previously this function was only used by NN methods.
- Documentation updates on THNN library
- I added an extra fgradInput argument to VolumetricConvolutionMM_updateOutput
and VolumetricConvolutionMM_accGradParameters so that its buffer list lines up
with the backward argument list. This makes it possible to write derivative
for conv3d which previously was not supported (commented out in
derivatives.yaml)
- Extra double_backward declarations for all convolution backwards functions was
added.
- You can now use the syntax Tensor? in native_functions.yaml to indicate that a
tensor argument is nullable. There are adjustments to propagate this to the
Python argument parser.
- NNPACK was ported to ATen, and ATen now builds and links against NNPACK if
  possible. New AT_NNPACK_ENABLED macro. The nnpack functions are
  nnpack_spatial_convolution.
- Some modest CuDNN convolution refactoring to remove _forward from names.
- There's a new cudnn_convolution_backward function to deal with the fact that
CuDNN convolution double backward requires you to have computed all gradients
in one go.
- Variable set_flags now checks if the tensor is undefined, fixing a silent memory
corruption.
- checkSameType updated to not raise an exception if called with Variable arguments
- "no ATen declaration found for" error message is improved to say what available declarations are
- make_variable now accepts undefined tensors, and returns an undefined tensor in this case.
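For orientation, the user-facing entry point that dispatches through this machinery (a sketch; CPU only, small sizes):
```
import torch
import torch.nn.functional as F

x = torch.randn(1, 3, 8, 8, requires_grad=True)
w = torch.randn(4, 3, 3, 3, requires_grad=True)
y = F.conv2d(x, w, padding=1)
y.sum().backward()            # backward goes through the convolution backward described above
print(x.grad.shape, w.grad.shape)
```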
* Bind cauchy_, exponential_, normal_, uniform_ functions to THPVariable.
Also changes the error messages around the Generator parser; previously, you'd get an error
like: torch._C.Generator is not a torch.Generator; now the check is proper but reports
that only None is supported.
* Support passing Generators to ATen Variable-bound methods.
This involves changing THPGenerator to have an at::Generator rather than a THGenerator.
TH getRNGState, setRNGState are still called directly because they are not bound from ATen yet;
they should probably be on the Generators and return (opaque) GenerateState objects.
* Fix default values.
* Properly use THRandom_initialSeed.
* Update standard gamma to use the new default generator.
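A usage sketch of passing an explicit Generator to the in-place RNG methods bound above (illustrative):
```
import torch

g = torch.Generator().manual_seed(0)
t = torch.empty(3)
t.normal_(mean=0.0, std=1.0, generator=g)
t.uniform_(0.0, 1.0, generator=g)
print(t)
```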
* Generate torch.cat autograd via ATen.
Most of the change is around supporting generation of:
1) TensorList arguments
2) Arguments to "size", "sizes", i.e. "sizes(dim)"
This adds some generated autograd functions implemented in C++, which
are generated from derivatives.yaml. It also generates Python bindings
for the Variable methods. The generated files are:
Functions.cpp/h: subclasses of torch::autograd::Function
VariableType.cpp/h: The at::Type for autograd Variables
python_variable_methods.cpp: Python bindings to torch::autograd::Variable
python_variable_methods_dispatch.h: wrapper which releases GIL and sets the
CUDA device
python_functions.cpp/h: exposes generated autograd functions as Python
objects
The generated functions are mostly shadowed by the definitions in
variable.py. We'll remove the Python implementations in favor of the
generated C++ implementations in a subsequent commit.
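A small sketch exercising the generated TensorList-argument path (illustrative):
```
import torch

a = torch.randn(2, 3, requires_grad=True)
b = torch.randn(4, 3, requires_grad=True)
out = torch.cat([a, b], dim=0)        # TensorList argument
out.sum().backward()
print(a.grad.shape, b.grad.shape)     # gradients flow back through the generated function
```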