* Sort declarations when generating Python bindings
This helps resolve ambiguities in argument parsing according to
any rules we will need.
For now, this allows us to make scalar operations more conservative
w.r.t. argument types while making them commutative again.
* Fix inconsistencies between mod with tensor and scalar
* Fix a stupid mistake
* start at generic trilinear
* Implement einsum (fixes #1889)
This provides a simple implementation of einsum. It is built on
top of the work for computing bilinear (#6110).
It uses a naive left-to-right resolution at the moment.
Autograd is able to differentiate by itself.
The obvious unsupported feature is taking diagonals (einsum('ii->i', (a,))).
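A quick usage sketch (shapes are illustrative; operands are passed as a tuple, as in the example above):
```python
import torch

a = torch.randn(2, 3)
b = torch.randn(3, 4)
torch.einsum('ik,kj->ij', (a, b))          # matrix multiplication, equivalent to a.mm(b)

# The bilinear-style contraction this implementation builds on (#6110):
x = torch.randn(5, 3)
w = torch.randn(6, 3, 4)
y = torch.randn(5, 4)
torch.einsum('bn,anm,bm->ba', (x, w, y))   # result has shape (5, 6)
```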
* add tests and docs
* fix flake8
* clean diff
* rebase on current master to resolve conflicting String wrapping
* clean up after rebase
* better commentary in einsum and sumproduct_pair
* don't say fixme if it's fixed and rename num_outputs to num_output_dims
* adapt python wrapper to use std::string instead of String to avoid typedef at::String
* typos and some vector to array conversion
* fix accidental python<->python3 change
* really fix bad rebase
* Add dtypes (with reasonable defaults) to sum, prod, cumsum, cumprod.
This adds optional dtypes to torch.sum, torch.prod, torch.cumsum, torch.cumprod.
By default, the dtype is torch.float64 for integral types, and the dtype of the input for floating point types.
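A brief example of the new parameter (the dtype is passed explicitly here rather than relying on the defaults described above):
```python
import torch

x = torch.arange(4, dtype=torch.int32)
x.sum(dtype=torch.float64)                   # accumulate in float64
torch.cumsum(x, dim=0, dtype=torch.float64)  # same for the cumulative variants
```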
* Don't use optional<ScalarType>, because the jit can't handle it yet.
Instead, we manually build the overloads. This is fairly painful because of default arguments, but should be easy to pull out once the jit can handle optional<ScalarType>.
* Fix keepdim with out parameters.
* Fix _cudnn_rnn_flatten_weight.
* If dtype is provided to an out function, make sure it matches the dtype of the result.
* Fix typo.
* Separate cuda-ness from dtype.
There are no longer torch.cuda.int64, etc.; only dtypes like torch.int64 that correspond to at::ScalarType.
At the python arg parser level, the corresponding ATen type is selected from the combination of (ScalarType, Layout, Device).
There is also currently unused code in here to support ScalarType in native_functions; this will be used for specifying aggregate types
on reduction functions.
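A minimal sketch of the resulting user-facing behavior (the string device form is introduced by a later commit in this log; the second line assumes a CUDA build):
```python
import torch

x = torch.zeros(2, 3, dtype=torch.int64)                 # CPU storage
y = torch.zeros(2, 3, dtype=torch.int64, device='cuda')  # CUDA storage, same dtype
x.dtype == y.dtype                                       # True: there is no torch.cuda.int64
```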
* Fix test_autograd.
* Add defaults to randint_like.
* Track is_cuda in py tensor types.
* Fix test_sparse.
* Fix multiprocessing.
* Fix rnn.
* Fix test_nn.
* Fix flake8.
* Add string-style devices to all tensors.
Previously, tensors only had a 'get_device' method, which would throw an exception on a CPU tensor. This made it necessary to add if/else branches
around code that was meant to be device agnostic.
This PR implements the following:
1) Adds a 'device' property to all tensors that returns a string representation of the device.
For cpu tensors this is 'cpu'. For cuda tensors this is 'cuda:X', where X is the cuda device ordinal.
2) Adds a DeviceSpec class. This is just a helper class for separating device_type and device_index specification and to allow partial specification.
For example, you can call DeviceSpec('cuda'), DeviceSpec('cuda:0'), DeviceSpec('cuda', 1).
Also has backwards compatibility support for specifying integers, which are treated as cuda devices.
DeviceSpecs have the following properties:
a) device_type: string representation of the device type (i.e. 'cpu' or 'cuda')
b) device_index: integer for the device index (None if not specified)
c) cuda_device_index: for backwards compatibility; behaves roughly like `get_device` did previously. I.e. if a function previously took integers for cuda devices,
it can now take DeviceSpecs (or strings), and can maintain the old functionality by calling `old_index = DeviceSpec(old).cuda_device_index`.
3) tensor methods and torch. functions that took integer devices can now take integers, strings, or DeviceSpecs. For example:
torch.randn((2,3), dtype=torch.cuda.float32, device='cuda:1')
TODO in future PRs:
A) Split out cuda from dtype so you don't need to overspecify cuda-ness
B) We currently only support strings/DeviceSpecs in tensor methods and torch. functions. We should have equivalents of torch.cuda.device(...), torch.cuda.device_of, etc.
at the torch. level that work on strings/DeviceSpecs
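A sketch of the API as this commit describes it (torch.DeviceSpec is renamed to torch.device further down this log, so treat this as illustrative rather than the current API):
```python
import torch

spec = torch.DeviceSpec('cuda', 1)        # equivalent to torch.DeviceSpec('cuda:1')
spec.device_type                          # 'cuda'
spec.device_index                         # 1
torch.DeviceSpec(1).cuda_device_index     # backwards-compatible integer form

x = torch.randn((2, 3), device='cuda:1')  # string form; assumes at least 2 CUDA devices
x.device                                  # 'cuda:1'
```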
* Add deviceInt64 to python arg parser.
* device_str.
* Remove device_str.
* remove device prefix from attributes.
* Use const char * instead of string.
* Move autogpu index out of Device.
* comment on is_default.
* Rename torch.DeviceSpec to torch.device.
* comment.
* Fix tests.
* Fix flake8.
* Fix sparse_coo_tensor parameter name.
* Improve error message.
* Remove device_ prefix from C++ device object.
* Allocate static strings.
* Return not implemented from rich compare.
* Move torch::Device to THPDevice.
* Remove cuda index.
* Py_RETURN_NOTIMPLEMENTED doesn't exist in python2.
* Add max_values and argmax convenience functions to ATen
* Add documentation for torch.argmax/argmin and skip max_values
* Add tests for argmax/argmin
* Don't default the dim argument
* Use dim=0 in test_torch.py for argmax tests
* Implement argmin() and argmax() without dim
* Call .contiguous() before .view(-1)
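Usage sketch of the convenience functions added above:
```python
import torch

x = torch.randn(4, 5)
torch.argmax(x)          # index into the flattened tensor
torch.argmax(x, dim=0)   # per-column indices, shape (5,)
x.argmin(dim=0)          # method form works as well
```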
* Introduce torch.layout and split layout from dtypes.
Tensors (and tensor types) now have a 'layout' attribute that returns either 'torch.strided' or 'torch.sparse_coo'.
Previously, dtypes were 1-to-1 with ATen types/PyTensorTypes; the impetus behind this decision was to make things easy in the common case
(i.e. specifying a type in a factory function). But this doesn't really follow for sparsity, which isn't a common case.
It also doesn't properly represent the concept of a dtype, which in numpy is a proper scalar type (i.e. roughly the type returned from indexing the
last dimension of an n-d array). But this should be the same whether or not the tensor is represented via strides, sparsity, etc.
This is accomplished by:
1) having the dtype of tensor return the (device-type, scalar-type) combination, i.e. torch.cuda.float32, so both
torch.cuda.FloatTensor and torch.cuda.sparse.FloatTensor have the same dtype
2) Adding a layout parameter to python functions, where the combination of (dtype, layout) maps to an ATen type that is used for dispatch.
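A small sketch of the resulting split (assumes a build with sparse support):
```python
import torch

dense = torch.zeros(2, 3)
sparse = torch.zeros(2, 3, layout=torch.sparse_coo)
dense.layout                   # torch.strided
sparse.layout                  # torch.sparse_coo
dense.dtype == sparse.dtype    # True: sparsity is no longer baked into the dtype
```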
* Formatting, make init throw python_error.
* Fix cuda not enabled error message.
* Fix test.
* Support native namespace functions with type dispatch.
Use 'ones' as an example. Note this is a "halfway" solution; i.e. the call chain is:
at::ones(shape, dtype) -> dtype.ones(shape, dtype) -> CPUFloatType.ones(shape, dtype) -> at::native::ones(shape, dtype)
The "nicer" solution would probably be something like:
at::ones(shape, dtype) -> dtype.ones(shape) -> CPUFloatType.ones(shape) -> at::native::ones(shape, this)
* Fix type inference.
* Fix test install.
* Fix extensions.
* Put dtype argument at the beginning.
* Fix extension.cpp.
* Fix rnn.
* Move zeros in the same manner.
* Fix cuda.
* Change randn.
* Change rand.
* Change randperm.
* Fix aten contrib.
* Resize in randperm_out.
* Implement eye.
* Fix sparse zeros.
* linspace, logspace.
* arange.
* range.
* Remove type dispatch from gen_python_functions.
* Properly generate maybe_init_cuda for type dispatch functions not named type.
* Don't duplicate the dtype and this parameters for native type dispatched functions.
* Call VariableType factory methods from the base type so it gets version number 0.
* Address review comments.
* Add support for device python arguments with constructors.
* Fix flake8.
* Simplify device handling.
* Don't use torch._C._VariableFunctions.
* Handle default values for functions that have tensor args (e.g. ones_like).
This replaces the torch.Tensor constructors with factories that produce
Variables. Similarly, functions on the torch module (e.g. torch.randn)
now return Variables.
To keep the PR to a reasonable size, I've left most of the unused tensor
code. Subsequent PRs will remove the dead code, clean-up calls to
torch.autograd.Variable, and rename Variable to Tensor everywhere.
There are some breaking changes because Variables and Tensors had
slightly different semantics. There's a list of those changes here:
https://github.com/pytorch/pytorch/wiki/Breaking-Changes-from-Variable-and-Tensor-merge
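A minimal sketch of the post-merge behavior:
```python
import torch
from torch.autograd import Variable

x = torch.randn(2, 3)      # factory now returns a Variable
isinstance(x, Variable)    # True: Variable and Tensor are merged
x.requires_grad            # False unless explicitly requested
```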
* Various dtype improvements.
1) Add dtypes to the new data-based constructors: Variable.new_tensor and torch.autograd.variable.
2) In the python signatures, use Type instead of Dtype to match the C++ signatures; the error messages still print as dtype.
3) Handle / add a better error message when a dtype is used when ATen was not compiled with that type (e.g. cuda types).
4) Move cuda_lazy_init to its own file.
A later commit will add support to the legacy constructors as well.
* Move implementation of lazy_init to cpp.
* Fix parsed_arg size.
* Add numpy-style dtypes to Variable factories.
1) Add numpy-style dtypes corresponding to torch tensor types. These are:
torch.float16, torch.float32, torch.float64, torch.uint8, torch.int8, torch.int16, torch.int32, torch.int64
as well as torch.cuda, torch.sparse, and torch.cuda.sparse equivalents.
2) Adds "legacy" names for the above dtypes that correspond more closely to existing tensor names. These are:
torch.half, torch.float, torch.double, torch.short, torch.int, torch.long.
torch.byte and torch.char don't exist because they either don't match numpy semantics or differ on different architectures.
3) Adds a "dtype" parameter to Variable factories (e.g. zeros, ones) that allows the user to specify the type without changing the default tensor type.
4) Adds a "dtype" getter to Variables that return the canonical dtype from 1)
This PR is missing the following useful features that should be added in the future:
A) We only add the "dtype" parameter to auto-generated factories; hand-written factories like in tensor_new.cpp don't support this yet.
B) We don't allow type conversions to use dtypes; that should be added to type(param) or a new function.
C) We don't yet have a "device" parameter for these factories; right now, they will only create Variables on the default device.
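Usage sketch of the dtype names and factory parameter described above:
```python
import torch

x = torch.zeros(2, 3, dtype=torch.float64)   # dtype parameter on a factory
x.dtype                                      # torch.float64
torch.float is torch.float32                 # True: "legacy" aliases map to the numpy-style names
torch.long is torch.int64                    # True
```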
* backend_to_string can be private.
* Define python binding argument indexes in a simpler way.
* add all_declared_types, still need to hook it up to THPDType.
* Fix all_declared_types for missing types (it's Sparse + Half).
* Ensure cuda dtypes are created even if compiled with NO_CUDA=1.
* Fix case where dtype is provided but dispatch is via namespace.
This happens in ones_like, empty_like, randn_like.
There is some question if we should do:
1) at::ones_like(tensor).toType(dtype)
2) at::ones_like(tensor.toType(dtype))
I did the former because this matches the numpy documentation, i.e.:
"Overrides the data type of the result." and it's easier to implement.
Note that the above causes an extra copy, either of the input or output.
Here's a better implementation:
1) Make zeros_like, ones_like native functions that take an optional type (named dtype?).
2) Match the type argument with the dtype, so we don't have two different parameters.
3) Call at::zeros_like(input, type) -> at::native::zeros_like(input, type) -> type.zeros(input.sizes())
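A brief example of the user-visible behavior of option 1 above:
```python
import torch

x = torch.zeros(2, 3, dtype=torch.int64)
y = torch.ones_like(x, dtype=torch.float32)  # dtype overrides the data type of the result
y.dtype                                      # torch.float32
```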
* Don't return from maybe_initialize_cuda.
* Don't leak DType name.
* Address cpp review comments.
* Share code between sparse and non-sparse test_dtypes.
* Rewrite _like functions as native function with explicit type parameter.
* Use type 'Type' instead of 'dtype' for consistency.
* Address review comments.
* Handle arg_idx when there is requires_grad but no dtype in python_binding_arguments.
This adds at::_unsafe_view and uses it in matmul. The _unsafe_view
function is identical to view except that the output is not treated
like a view by the automatic differentiation code. This avoids in-place
modifications triggering the more expensive CopySlices/AsStridedBackward
behavior.
The _unsafe_view function is only safe to use on temporaries that will
be immediately discarded and that do not alias other tensors. Otherwise,
in-place modifications may trigger incorrect gradients. The function is
not exposed to Python.
See #5169
* Add transpose() to TensorGeometry.
This code is dead; I briefly used it in my RNN patchset but
eventually rewrote it to not be necessary. However, it seemed
like a useful gadget so I kept it. In general, it seems that it
would be useful for TensorGeometry to support all operations that
Tensor does, except that it would only compute the resulting sizes/strides
instead of actually doing the computation.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Turn on wrap_dim behavior for TensorGeometry
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Support for hard-coded differentiable outputs.
Some outputs of functions are nondifferentiable, and should always
be returned with requires_grad=False. Traditionally, we have used
the presence of 'grad' to signal that only the first output is
differentiable, and the rest are not, but cudnn_rnn (to be
implemented) breaks this pattern; its first three outputs are differentiable,
but its last output is a buffer that is just consumed by backwards.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* TensorGeometry constructor from just sizes
The sizes are assumed to form a contiguous tensor, and we compute
the strides we would get in that case.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
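A minimal Python sketch of the stride rule this constructor applies (the exact handling of empty or zero-sized dimensions in ATen may differ):
```python
def contiguous_strides(sizes):
    # Innermost dimension has stride 1; each outer stride is the
    # product of the sizes to its right (row-major layout).
    strides = [1] * len(sizes)
    for i in reversed(range(len(sizes) - 1)):
        strides[i] = strides[i + 1] * sizes[i + 1]
    return strides

contiguous_strides([2, 3, 4])   # [12, 4, 1]
```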
* Support saving TensorList for backwards.
There is some back story here. Saved TensorList in backwards will
be used by cudnn_rnn, and it is worth asking, why is it necessary to
save a list of tensors? Indeed, *technically* speaking a list of
tensors is not necessary, we only need to save the sizes of each
of the weight tensors. (We need the sizes because cuDNN is only
going to blast the derivative of weights into a flat buffer, but
we need to match the sizes of the views into the buffer when we
eventually return the derivatives.)
However, it was surprisingly awful trying to implement passing just
sizes: as non-Tensor arguments, the JIT interpreter generation code
is expected to handle them as attributes in the trace, and our
attributes struct doesn't actually know how to do arrays of arrays.
Saved TensorList code was much easier to get working,
so that's what this patch does.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* MatrixRef - an ArrayRef with a stride, making it a 2D ArrayRef.
Like ArrayRef, this class does not own the underlying data, it is expected
to be used in situations where the data resides in some other buffer.
This is intended to be trivially copyable, so it should be passed by
value.
For now, 2D only (so the copies are actually cheap, without having
to write a SmallVector class) and contiguous only (so we can
return non-strided ArrayRef on index).
The intended use-case (not in this commit) is to make it easier to
work with RNN weights, which form a num_weights x num_layers matrix of
parameters.
P.S. dimension 0 indexes rows, dimension 1 indexes columns
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
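A conceptual Python sketch of the indexing scheme (the class and names here are illustrative stand-ins, not ATen's implementation):
```python
class MatrixRefSketch:
    # Non-owning 2D view over a flat buffer: indexing dimension 0 (rows)
    # yields a contiguous slice, the analogue of returning an ArrayRef.
    def __init__(self, data, stride0):
        assert len(data) % stride0 == 0
        self.data, self.stride0 = data, stride0

    def __getitem__(self, row):
        start = row * self.stride0
        return self.data[start:start + self.stride0]

weights = MatrixRefSketch(list(range(6)), stride0=3)   # 2 rows of 3 elements
weights[1]                                             # [3, 4, 5]
```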
* Generalize getDataType in Descriptors.h
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Change copy_range to take Tensor, and change cat_tensors_backward accordingly
Should a backward function return a Variable or a Tensor? For the most
part, all of our backward functions return Tensor, except cat_tensors_backward,
which returns a variable_list (which is really the only thing that matters,
because Tensor and Variable are interconvertible). But this is kind of weird,
because it means that you can't implement a backwards in ATen that returns
a std::vector<Tensor>, and then hook it up transparently with the derivatives
code. So I switched it over.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Support 5-ary return Tensor tuple.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Support code generation with mixed Tensor/TensorList in output.
I don't think I ended up using this in cudnn_rnn, but this seems
it might be useful for someone else later.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Support 4-ary boolean array
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Add support for retain_variables in tools/autograd/derivatives.yaml
This adds 'retain_variables', a bool which is true if a user has specified
that saved variables should be retained in case the backwards is
run again later. This allows an optimization where we can
destroy saved buffers if we know variables are not going to be retained,
e.g., it is (will be) used by _cudnn_rnn
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Lazily initialize cuDNN descriptors
Previously, cuDNN descriptors were eagerly allocated as soon
as a FooDescriptor object was created. However, in some uses
of TensorDescriptor, this is problematic: some tensors are optional
and cuDNN's API expects to be given a nullptr TensorDescriptor
in this case, not an uninitialized (but allocated) descriptor.
Lazily initializing the descriptors makes it less likely for
us to use uninitialized memory and matches the usual semantics of
unique_ptr. It's good sense!
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Port cuDNN RNNs to ATen.
This brings three new functions:
- _cudnn_rnn_flatten_weight: flatten a matrix of weight tensors into
a single contiguous weight buffer as required by cuDNN
- _cudnn_rnn: run RNN forwards
- _cudnn_rnn_backward: run RNN backwards
RNNs have a lot of parameters, so we restructured what was previously
a single 'fn' object that recorded all the parameters into three
objects: RNNDescriptorParams, TensorDescriptorListParams and
DropoutDescriptorParams.
We make use of MatrixRef to organize the weight tensors (which are
weight/bias x number of layers), but I did not teach the codegen
how to pass these as arguments/return values natively, so instead
a MatrixRef is passed as its constituent ArrayRef and int64_t stride0.
cudnn_rnn has three differentiable outputs and one nondifferentiable
one, so it makes use of the support for hard-coded differentiable outputs.
I haven't deleted all of the descriptor code from Python, because dropout
initialization still goes through this codepath; that should be fixed soon,
but I don't see it as essential for this PR.
This commit also removes the last use of NestedIOFunction from PyTorch.
There are some shenanigans with cuDNN dropout descriptor initialization,
see below:
Note [cuDNN dropout descriptor initialization]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In most cases, setting descriptors in cuDNN is cheap (e.g.,
cudnnSetTensorNdDescriptor). However, this is not the case for
cudnnSetDropoutDescriptor: in cuDNN 6/7 (and possibly others) it does an
expensive precomputation to initialize the random number generator states. In
cuDNN 6, this is the ONLY official mechanism to initialize a dropout descriptor,
which means that law-abiding clients were expected to generate a dropout
descriptor once and cache it. However, our ATen interface is (1) stateless (so
we can't cache the descriptors) and (2) does not accept arbitrary user types in
its interface (so we can't pass the descriptor in). This puts us in a pickle.
In cuDNN 7, a new function, cudnnRestoreDropoutDescriptor was added, which
forgoes the expensive initialization process, and can initialize the
descriptor with a pre-initialized state CUDA tensor. This is great, because
it means we can simply pass in the state tensor and then initialize the
descriptor internally. Unfortunately, this function is not available in
cuDNN 6.
To work around this, we break the cuDNN abstraction barrier and hard-code
the struct layout of the underlying dropout descriptor. With this struct,
we can reimplement cudnnRestoreDropoutDescriptor from scratch. Great!
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Fix cuDNN 7 behavior.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Delete some unused, controversial methods from MatrixRef.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Add missing filter_dim_a slice
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Replace nested for-loop with itertools.chain.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* CR comment on mut_desc()
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Refactor DropoutDescriptor API.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Use cached CurrentDeviceProperties from Context.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Document _cudnn_rnn outputs.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Improve fmap docs, convert some functions to use it.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Move IndexRange to autograd/function.h
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Elaborate on CUDNN_STATUS_INVALID_VALUE return some more.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Add an all-in-one setter for RNNDescriptorParams.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Print what the unrecognized RNN mode was
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* RNN TensorDescriptor improvements
- Have an explicit size/stride overload for setting a TensorDescriptor,
so you don't have to create a goofy view to feed in.
- Change the padding to 3D rather than 5D, which is all you actually
need (it's just 2D that is not supported by the cuDNN API).
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Fix implementation of cudnnRestoreDropoutDescriptor, plus test.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Better comments about input layout.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Add comment about no-DropoutDescriptor argument RNNDescriptor function.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Rename vocab_size back to input_size.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Don't use backslash in comment.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Bugfix for contiguous TensorGeometry calculation.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Don't allocate a dummy tensor when setting TensorDescriptor for flatten_weight.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Make contiguity errors more user-friendly.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* s/fn.dropout.train/fn_train/
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* s/_cudnn_rnn_backward_grad/_cudnn_rnn_backward_input/
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Make dcx properly undefined when not required.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Remove old TODO.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Add state size check in cudnnRestoreDropoutDescriptor
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Explicitly narrow int64_t to size_t
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Restore copyParams comment.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Update benchmark numbers, and slight engineering improvements.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Typofix.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Add kwarg-only 'requires_grad' parameter to Variable factories.
Functions that create variables, e.g. torch.ones_like, currently always return Variables with requires_grad=False;
this is less convenient than the existing Variable constructor that has a requires_grad parameter. This commit
adds the parameter at the python binding level.
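Usage sketch of the new keyword argument:
```python
import torch

x = torch.ones(2, 3, requires_grad=True)     # kwarg-only on the factory
y = torch.zeros_like(x, requires_grad=True)
x.requires_grad, y.requires_grad             # (True, True)
```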
* Fix flake8.
* Address review comments.
* Match set_requires_grad implementation with tensor_new version.
This adds overrides in VariableType for the xxx_out ATen functions and
implements Python bindings. There is no support for automatic
differentiation. If any of the inputs (or outputs) requires grad, then the
function will throw an exception unless it's running in "no-grad" mode.
The bindings for calling torch.xxx functions on Variables are moved to a
different object. Previously, they were static methods on VariableBase.
This change prevents users from accidentally calling static methods as if
they were instance methods.
For example, this splits threshold into threshold(), which is now
never in-place, and threshold_() which is always in-place.
This simplifies the in-place vs. non-in-place logic in
gen_variable_type.py, which was bug-prone.
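A brief sketch of the out= bindings described above (the no-grad escape hatch follows the text; exact error behavior is not shown):
```python
import torch

a, b = torch.randn(3), torch.randn(3)
out = torch.empty(3)
torch.add(a, b, out=out)          # fine: nothing requires grad

p = torch.randn(3, requires_grad=True)
with torch.no_grad():             # with a grad-requiring input, out= is only
    torch.add(p, b, out=out)      # allowed in "no-grad" mode
```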
The general strategy is that there is a new module, torch.onnx.symbolic, which
contains, for every ATen method name, a function with the ONNX translation.
While implementing this, I took the opportunity to expunge all references
of 'g' from the public API; instead, it is managed by a global variable in
torch.onnx which tracks the "current graph".
Other changes:
- If you pass a Tensor to op as an argument, it will now automatically be
converted into a Constant ONNX node. This lets us remove needing to
implement ONNX
- Rename value to other, wherever there is both a Scalar and Tensor overload.
This way, keyword dispatch can work uniformly in both cases.
- Deleted any autograd Function classes that both had a symbolic and were ported
to the new C++ autograd implementation. There may still be some straggling
classes that didn't have symbolic.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>