Summary:
Generally wildcard imports are bad for the reasons described here: https://www.flake8rules.com/rules/F403.html
This PR replaces wildcard imports with an explicit list of imported items where possible, and adds a `# noqa: F403` comment in the other cases (mostly re-exports in `__init__.py` files).
This is a prerequisite for https://github.com/pytorch/pytorch/issues/55816, because currently [`tools/codegen/dest/register_dispatch_key.py` simply fails if you sort its imports](https://github.com/pytorch/pytorch/actions/runs/742505908).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55838
Test Plan: CI. You can also run `flake8` locally.
Reviewed By: jbschlosser
Differential Revision: D27724232
Pulled By: samestep
fbshipit-source-id: 269fb09cb4168f8a51fd65bfaacc6cda7fb87c34
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51508
No substantive changes. The codegen for this file was getting a
bit long so I moved it off into tools.codegen.dest submodule (I
wanted to do tools.codegen.gen but that conflicts with the existing
module; oy vey!) To do this I had to move some other functions around
so that they were more generally accessible. Otherwise
self-explanatory.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: ljk53
Differential Revision: D26187856
Pulled By: ezyang
fbshipit-source-id: fd3784571d03d01c4acb7ca589fcde4492526408
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49245
This will make it easier to implement the POC in
d534f7d4c5
see also https://github.com/pytorch/pytorch/pull/45666
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: smessmer
Differential Revision: D25594005
Pulled By: ezyang
fbshipit-source-id: e458d3dc3a765ec77425761b9b17f23769cecf9e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49122
cpparguments_exprs has induced a lot of head scratching in many recent PRs for how to structure the code in a good way. This PR eliminates the old algorithm for an entirely new algorithm inspired by logic programming. The net result is shorter, cleaner and should be more robust to future changes.
This PR is a bit of a whopper. Here is the order to review it.
- tools/codegen/api/types.py
- Deleted CppArgument, CppArgumentPackIface (and subclasses), CppExpr, DispatcherExpr, DispatcherArgument, NativeExpr, NativeArgument, MetaArgument. All things previously called XArgument are now Binding. All things previously called XExpr are now Expr. I deleted the `__str__` implementation on Binding and fixed all call sites not to use it. On Binding, I renamed `str_no_default` and `str_default` to `defn` and `decl` for better symmetry with the corresponding signature concepts, although I'm open to naming them back to their original versions.
- Obviously, things are less type safe without the class distinctions. So I introduce a new ADT called CType. CType represents the *semantic C++ type* of a binding: it is both the C++ type (e.g., `const Tensor&`) as well as the argument name that specifies what the binding denotes (e.g., `other`). Every binding now records its CType. The key observation here is that you don't actually care if a given expression is from the cpp or dispatcher or native API; what you care is having enough information to know what the expression means, so you can use it appropriately. CType has this information. For the most part, ArgNames are just the string names of the arguments as you see them in JIT schema, but there is one case (`possibly_redundant_memory_format`) where we encode a little extra information. Unlike the plain strings we previously used to represent C++ types, CType have a little bit of structure around optional and references, because the translation code needs to work around these concepts.
- I took the opportunity to kill all of the private fields like `_arguments` and `_returns_type` (since the argument types don't make sense anymore). Everything is computed for you on the fly. If this is a perf problem in codegen we can start using `cached_property` decorator.
- All of the heavy lifting in CppSignature.argument_packs has been moved to the cpp module. We'll head over there next. Similarly, all of the exprs methods are now calling translate, the new functionality which we haven't gotten to yet
- tools/codegen/api/cpp.py
- We refactor all of the type computation functions to return CType instead of str. Because CTypes need to know the denotation, there is a new `binds: ArgName` argument to most functions that provides the denotation, so we can slot it in. (An alternative would have been to construct CTypes without denotations and then fill them in post-facto, but I didn't do it this way. One downside is there are some places where I need a CType without denotation, so I fill these in with `__placeholder__` whenever this happens).
- `argument` and `arguments` are now extremely simple. There is no more Pack business, just produce one or more Bindings. The one thing of note is that when both a `memory_format` and `options` are in scope, we label the memory format as `possibly_redundant_memory_format`. This will be used in translation
- tools/codegen/api/dispatcher.py and tools/codegen/api/native.py - same deal as cpp.py. One thing is that `cpparguments_exprs` is deleted; that is in the translator
- tools/codegen/api/translate.py - the translator! It uses a very simple backwards deduction engine to work out how to fill in the arguments of functions. There are comments in the file that explain how it works.
- Everything else: just some small call site tweaks for places when I changed API.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: ljk53
Differential Revision: D25455887
Pulled By: ezyang
fbshipit-source-id: 90dc58d420d4cc49281aa8647987c69f3ed42fa6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48249
Introduced autograd related data models at tools.codegen.api.autograd.
Migrated load_derivatives.py to produce the new data models from derivatives.yaml.
It has clean mypy-strict result.
Changed both gen_autograd_functions.py and gen_variable_type.py to consume
the new data model.
Added type annotations to gen_autograd_functions.py - it has clean mypy-strict
result except for the .gen_autograd import (so haven't added it to the strict
config in this PR).
To limit the scope of the PR, gen_variable_type.py is not refactored, and the
main structure of load_derivatives.py / gen_autograd_functions.py is kept. We
only make necessary changes to make it work.
Confirmed byte-for-byte compatible with the old codegen:
```
Run it before and after this PR:
.jenkins/pytorch/codegen-test.sh <baseline_output_dir>
.jenkins/pytorch/codegen-test.sh <test_output_dir>
Then run diff to compare the generated files:
diff -Naur <baseline_output_dir> <test_output_dir>
```
Test Plan: Imported from OSS
Reviewed By: ezyang
Differential Revision: D25086561
Pulled By: ljk53
fbshipit-source-id: 1f43ab0931d9814c24683b9a48ca497c5fc3d729
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47223
This PR enables batched gradient computation for advanced indexing.
Previously, the backward formula was writing parts of the grad tensori
in-place to zeros_like(self). Since grad is a BatchedTensor and self is
not a BatchedTensor, this is not possible.
To solve the problem, we instead create a new tensor with
`grad.new_zeros` and then write to that in-place. This new tensor will
have the same batchedness as the `grad` tensor.
To prevent regressions (the autograd codegen special cases zeros_like
to avoid saving the `self` tensor for backward), we teach the autograd
codegen how to save `self.options()`.
Test Plan:
- new tests
- run old indexing tests
Reviewed By: ejguan
Differential Revision: D24741684
Pulled By: zou3519
fbshipit-source-id: e267999dc079f4fe58c3f0bdf5c263f1879dca92
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24184
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D16764168
Pulled By: ezyang
fbshipit-source-id: cc252a860fd7e4b7fb2b95c5d9fcdbf6935ffeb6
Summary:
re-apply changes reverted in:
https://github.com/pytorch/pytorch/pull/22412
Also change log_softmax to take positional arguments. Long-term we do want the kwarg-only interface, but seems to currently be incompatible with jit serialization.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22456
Differential Revision: D16097159
Pulled By: nairbv
fbshipit-source-id: 8cb73e9ca18fc66b35b873cf4a574b167a578b3d
Summary:
This change is backwards incompatible in *C++ only* on mean(), sum(), and prod() interfaces that accepted either of:
```
Tensor sum(IntArrayRef dim, bool keepdim=false) const;
Tensor sum(IntArrayRef dim, ScalarType dtype) const;
```
but now to specify both the dim and dtype will require the keepdim parameter:
```
Tensor sum(IntArrayRef dim, bool keepdim=false, c10::optional<ScalarType> dtype=c10::nullopt) const;
```
[xla ci]
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21088
Reviewed By: ailzhang
Differential Revision: D15944971
Pulled By: nairbv
fbshipit-source-id: 53473c370813d9470b190aa82764d0aea767ed74
Summary:
For consistency, derivatives.yaml now uses the same schema specification as native_functions.yaml.
Note that there are some small downsides, e.g. changing the default values or return parameter names in native_functions.yaml also now requires updating derivatives.yaml as well. But this has a few nice properties:
1) Able to copy-paste definitions from native_functions to derivatives.
2) Makes it impossible to write derivatives for operators without schemas (e.g. old TH operators).
3) Moves us closer to the ideal situation of co-locating forward and backwards declarations.
Note that this doesn't change any generated code; in particular, this has the same behavior of mapping in-place and out-of-place definitions together.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20916
Differential Revision: D15497800
Pulled By: gchanan
fbshipit-source-id: baee5caf56b675ce78dda4aaf6ce6a34575a6432
Summary:
This only deals with four functions, but is an important first step towards removing BoolTensor and IndexTensor entirely.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17193
Differential Revision: D14157829
Pulled By: cpuhrsch
fbshipit-source-id: a36f16d1d88171036c44cc7de60ac9dfed9d14f2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16751
This was made more complicated by the fact that ivalue::IntList
is a thing. So I had to fix all of the sites where we referring
to IValue post facto.
The following codemods were run, in this order:
```
codemod -m -d . --extensions cc,cpp,cu,cuh,h,hpp,py,cwrap,yaml,in IntList IntArrayRef
codemod -m -d . --extensions cc,cpp,cu,cuh,h,hpp,py,cwrap,yaml,in IntArrayRef::create IntList::create
codemod -m -d . --extensions cc,cpp,cu,cuh,h,hpp,py,cwrap,yaml,in ivalue::IntArrayRef ivalue::IntList
codemod -m -d . --extensions cc,cpp,cu,cuh,h,hpp,py,cwrap,yaml,in Tag::IntArrayRef Tag::IntList
codemod -m -d . --extensions cc,cpp,cu,cuh,h,hpp,py,cwrap,yaml,in isIntArrayRef isIntList
codemod -m -d . --extensions cc,cpp,cu,cuh,h,hpp,py,cwrap,yaml,in toIntArrayRef toIntList
codemod -m -d . --extensions cc,cpp,cu,cuh,h,hpp,py,cwrap,yaml,in 'Shared<IntArrayRef>' 'Shared<IntList>'
codemod -m -d . --extensions cc,cpp,cu,cuh,h,hpp,py,cwrap,yaml,in 'intrusive_ptr<IntArrayRef>' 'intrusive_ptr<IntList>'
```
Some manual fixups were done afterwards; they can be reviewed separately
at https://github.com/pytorch/pytorch/pull/16752
Reviewed By: dzhulgakov
Differential Revision: D13954363
fbshipit-source-id: b5c40aacba042402155a2f5a229fa6db7992ac64
Summary:
Note there was a hacky way of doing this before by specifying "return:" lists manually; this makes the
return names part of the function declaration itself.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14100
Differential Revision: D13101810
Pulled By: gchanan
fbshipit-source-id: 1c80574cd4e8263764fc65126427b122fe36df35
Summary:
This PR adds optional type to ATen native, autograd, JIT schema and Python Arg parser, closes#9513. It allows us to use optional default values (including None) for function signature and implementations like clamp, etc., and also let us remove the python_default_init hack.
Follow up:
remove python_default_init completely.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12582
Differential Revision: D10417423
Pulled By: wanchaol
fbshipit-source-id: 1c80f0727bb528188b47c595629e2996be269b89
* Add transpose() to TensorGeometry.
This code is dead; I briefly used it in my RNN patchset but
eventually rewrote it to not be necessary. However, it seemed
like a useful gadget so I kept it. In general, it seems that it
would be useful for TensorGeometry to support all operations that
Tensor does, but it only computes the changes to sizes/strides
instead of actually doing the computation.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Turn on wrap_dim behavior for TensorGeometry
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Support for hard-coded differentiable outputs.
Some outputs of functions are nondifferentiable, and should always
be returned with requires_grad=False. Traditionally, we have used
the presence of 'grad' to signal that only the first output is
differentiable, and the rest are not, but cudnn_rnn (to be
implemented) breaks this pattern; its first three outputs are differentiable,
but its last output is a buffer that is just consumed by backwards.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* TensorGeometry constructor from just sizes
The sizes are assumed to form a contiguous tensor, and we compute
the strides we would get in that case.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Support saving TensorList for backwards.
There is some back story here. Saved TensorList in backwards will
be used by cudnn_rnn, and it is worth asking, why is it necessary to
save a list of tensors? Indeed, *technically* speaking a list of
tensors is not necessary, we only need to save the sizes of each
of the weight tensors. (We need the sizes because cuDNN is only
going to blast the derivative of weights into a flat buffer, but
we need to match the sizes of the views into the buffer when we
eventually return the derivatives.)
However, it was surprisingly awful trying to implement passing just
sizes, because as non-Tensor arguments, the JIT interpreter generation
code is expected to handle all non-Tensor arguments as attributes in the
trace, and our attributes struct doesn't actually know how to do
arrays of arrays. Saved TensorList code was much easier to get working,
so that's what this patch does.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* MatrixRef - an ArrayRef with a stride, making it a 2D ArrayRef.
Like ArrayRef, this class does not own the underlying data, it is expected
to be used in situations where the data resides in some other buffer.
This is intended to be trivially copyable, so it should be passed by
value.
For now, 2D only (so the copies are actually cheap, without having
to write a SmallVector class) and contiguous only (so we can
return non-strided ArrayRef on index).
The intended use-case (not in this commit) is to make it easier to
work with RNN weights, which are num_weights x num_layers matrix of
parameters.
P.S. dimension 0 indexes rows, dimension 1 indexes columns
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Generalize getDataType in Descriptors.h
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Change copy_range to take Tensor, and change cat_tensors_backward accordingly
Should a backward function return a Variable or a Tensor? For the most
part, all of our backward functions return Tensor, except cat_tensors_backward,
which returns a variable_list (which is really the only thing that matters,
because Tensor and Variable are interconvertible). But this is kind of weird,
because it means that you can't implement a backwards in ATen that returns
a std::vector<Tensor>, and then hook it up transparently with the derivatives
code. So I switched it over.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Support 5-ary return Tensor tuple.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Support code generation with mixed Tensor/TensorList in output.
I don't think I ended up using this in cudnn_rnn, but this seems
it might be useful for someone else later.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Support 4-ary boolean array
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Add support for retain_variables in tools/autograd/derivatives.yaml
'retain_variables', a bool which is true if a user has specified
that saved variables should be retained in case the backwards is
run again later. This allows an optimization where we can
destroy saved buffers if we know variables are not going to be retained,
e.g., it is (will be) used by _cudnn_rnn
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Lazily initialize cuDNN descriptors
Previously, cuDNN descriptors were eagerly allocated as soon
as a FooDescriptor object was created. However, in some uses
of TensorDescriptor, this is problematic: some tensors are optional
and cuDNN's API expects to be given a nullptr TensorDescriptor
in this case, not an uninitialized (but allocated) descriptor.
Lazily initializing the descriptors makes it less likely for
us to use uninitialized memory and matches the usual semantics of
unique_ptr. It's good sense!
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Port cuDNN RNNs to ATen.
This brings three new functions:
- _cudnn_rnn_flatten_weight: flatten a matrix of weight tensors into
a single contiguous weight buffer as required by cuDNN
- _cudnn_rnn: run RNN forwards
- _cudnn_rnn_backward: run RNN backwards
RNNs have a lot of parameters, so we restructured what was previously
a single 'fn' object that recorded all the parameters into three
objects: RNNDescriptorParams, TensorDescriptorListParams and
DropoutDescriptorParams.
We make use of MatrixRef to organize the weight tensors (which are
weight/bias x number of layers), but I did not teach the codegen
how to pass these as arguments/return values natively, so instead
a MatrixRef is passed as its constituent ArrayRef and int64_t stride0.
cudnn_rnn has three differentiable outputs and one nondifferentiable
one, so it makes use of the support for hard-coded differentiable outputs.
I haven't deleted all of the descriptor code from Python, because dropout
initialization still goes through this codepath, that should be fixed soon
but I don't see it as essential for this PR.
This commit also removes the last use of NestedIOFunction from PyTorch.
There are some shenanigans with cuDNN dropout descriptor initialization,
see below:
Note [cuDNN dropout descriptor initialization]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In most cases, setting descriptors in cuDNN is cheap (e.g.,
cudnnSetTensorNdDescriptor). However, this is not the case for
cudnnSetDropoutDescriptor: in cuDNN 6/7 (and possibly others) it does an
expensive precomputation to initialize the random number generator states. In
cuDNN 6, this is the ONLY official mechanism to initialize a dropout descriptor,
which means that law-abiding clients were expected to generate a dropout
descriptor once and cache it. However, our ATen interface is (1) stateless (so
we can't cache the descriptors) and (2) does not accept arbitrary user types in
its interface (so we can't pass the descriptor in). This puts us in a pickle.
In cuDNN 7, a new function, cudnnRestoreDropoutDescriptor was added, which
forgoes the expensive initialization process, and can initialize the
descriptor with a pre-initialized state CUDA tensor. This is great, because
it means we can simply pass in the state tensor and then initialize the
descriptor internally. Unfortunately, this function is not available in
cuDNN 6.
To work around this, we break the cuDNN abstraction barrier, and have
the struct layout of the underlaying dropout descriptor. With this struct,
we can reimplement cudnnRestoreDropoutDescriptor from scratch. Great!
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Fix cuDNN 7 behavior.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Delete some unused, controversial methods from MatrixRef.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Add missing filter_dim_a slice
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Replace nested for-loop with itertools.chain.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* CR comment on mut_desc()
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Refactor DropoutDescriptor API.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Use cached CurrentDeviceProperties from Context.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Document _cudnn_rnn outputs.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Improve fmap docs, convert some functions to use it.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Move IndexRange to autograd/function.h
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Elaborate on CUDNN_STATUS_INVALID_VALUE return some more.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Add an all-in-one setter for RNNDescriptorParams.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Print what the unrecognized RNN mode was
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* RNN TensorDescriptor improvements
- Have an explicit size/stride overload for set TensorDescriptor,
so you don't have to create a goofy view to feed in.
- Change the padding to 3D rather than 5D, which is all you actually
need (it's just 2D that is not supported by cuDNN API.)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Fix implementation of cudnnRestoreDropoutDescriptor, plus test.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Better comments about input layout.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Add comment about no-DropoutDescriptor argument RNNDescriptor function.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Rename vocab_size back to input_size.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Don't use backslash in comment.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Bugfix for contiguous TensorGeometry calculation.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Don't allocate a dummy tensor when setting TensorDescriptor for flatten_weight.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Make contiguity errors more user-friendly.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* s/fn.dropout.train/fn_train/
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* s/_cudnn_rnn_backward_grad/_cudnn_rnn_backward_input/
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Make dcx properly undefined when not required.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Remove old TODO.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Add state size check in cudnnRestoreDropoutDescriptor
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Explicitly narrow int64_t to size_t
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Restore copyParams comment.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Update benchmark numbers, and slight engineering improvements.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Typofix.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
When generating autograd::Function wrappers for ATen functions, we need
to take derivative expressions in derivatives.yaml (identified by name)
and correlate them with the correct index they should take in
grad_inputs (identified positionally only). Previously, this
computation was done *statically* in load_derivatives.py (set_up_derivatives)
and then we hard-coded indices in the generated Functions.cpp.
This is sufficient for supporting ATen operations which consist solely
of Tensor arguments, or a single TensorList argument. However, this
strategy will not work for mixed Tensor/TensorList arguments, as the
index of any Tensor after a TensorList is not known at codegen time,
since it will vary depending on the length of the TensorList, e.g.,
foo({x1, x2}, y) ==> y is index 2
foo({x1, x2, x3}, y) ==> y is index 3
This commit introduces a new strategy for generating these indices which
pushes index computation to *runtime* (though any decent C++ optimizer
can re-optimize the index computation back into constants; this was
verified in Godbolt.) Instead of hard-coding constants, a small
IndexRangeGenerator object is created and used to generate the correct
index ranges (std::pair<size_t, size_t>) for each argument.
Here is an example of mm rewritten in the new codegen format:
variable_list MmBackward::apply(const variable_list& grads) {
IndexRangeGenerator gen;
auto self_ix = gen.range(1);
auto mat2_ix = gen.range(1);
variable_list grad_inputs(gen.size());
auto& grad = grads[0];
auto self = self_.unpack();
auto mat2 = mat2_.unpack();
if (should_compute_output({ mat2_ix })) {
auto grad_result = mm_mat2_backward(grad, self, mat2_sizes, mat2.strides(), 1);
copy_range(grad_inputs, mat2_ix, grad_result);
}
if (should_compute_output({ self_ix })) {
auto grad_result = mm_mat1_backward(grad, mat2, self_sizes, self.strides(), 1);
copy_range(grad_inputs, self_ix, grad_result);
}
return grad_inputs;
}
Unlike before, where self_ix and mat2_ix were hardcoded as 0 and 1,
we derive them by invoking IndexRangeGenerator (which internally
is just a little counter which bumps up each invocation of 'range').
Each _ix variable actually represents a range, as can be seen here.
variable_list CatBackward::apply(const variable_list& grads) {
IndexRangeGenerator gen;
auto tensors_ix = gen.range(tensors_size_);
variable_list grad_inputs(gen.size());
auto& grad = grads[0];
if (should_compute_output({ tensors_ix })) {
auto grad_result = cat_tensors_backward(grad, tensors_sizes_dim, dim);
copy_range(grad_inputs, tensors_ix, grad_result);
}
return grad_inputs;
}
The invocation of 'copy_range' reads a TensorList returned by the
backward function into the correct entries in grad_inputs.
tensors_size_ is a new member of CatBackward which is filled with
the size of the forward input tensor when cat is originally invoked.
With this new code generation strategy, we can completely eliminate
the special cases for Tensor and TensorList in index selection, and
we can smoothly support mixed Tensor/TensorList by making multiple
invocations of gen.range() with non-one arguments.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
This adds overrides in VariableType for the xxx_out ATen functions and
implements Python bindings. There is no support for automatic
differentiation. If any of the inputs (or outputs) requires grad, then the
function will throw an exception unless it's running in "no-grad" mode.
The bindings for calling torch.xxx functions on Variables are moved to a
different object. Previously, they were static method on VariableBase.
This change prevents users from accidentally calling static methods as if
they were instance methods.
This commit fixes double-backwards on batch norm. There were two
bugs:
- Returned buffers from batchnorm backwards were being marked as differentiable
when they shouldn't be. The fix for this is "easy": use 'grad' instead of
'grads[0]' in cudnn_batch_norm's backward definition. (More on this below.)
- I was using toTensor on a Scalar, which gives me a Tensor of the wrong
type when I'm in CUDA world. Using the Scalar add() overload directly
solves the problem.
The differentiability of returned buffers was annoyingly subtle and I nearly
went off and implemented a big pile of infrastructure to "tell" the codegen how
to distinguish between differentiable and non-differentiable outputs before
realizing that there must be a way we do this legitimately, because it works for
THNN. I documented this in derivatives.yaml, and also added tests for the
problem in load_derivatives.py to catch the various ways you could "get it
wrong". Hope this helps someone else.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
The gen_variable_type.py script now is only responsible for generating
VariableType.h/cpp. The parent script, "gen_autograd.py", delegates to
gen_autograd_functions.py, gen_variable_type.py, and
gen_python_functions.py.
I've removed "fallthrough" functions. It's replaced by
DONT_RECORD_TRACE, DONT_PROFILE, and DONT_REQUIRE_DERIVATIVE.
In preparation for binding the _out variants, I changed some static
types to Tensor (from Variable) and we now unpack and name tuple return
values.
This modifies NN binding in ATen so that the xxx_forward functions now
return buffers instead of taking them as inputs. The NN functions with
no suffix are implemented in Type.cpp. They call the xxx_forward
variants and discard any returned buffers.
This simplifies derivatives for NN functions. The derivatives are now
defined on the xxx_forward functions and buffers are treated as any
other input.
This is a step towards removing the special casing of NN functions in gen_variable_type.py. It fixes the signature of in-place NN functions so that they return Tensor & instead of Tensor.