Summary:
Also add single grad whitelist to the jit test
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10782
Reviewed By: ezyang
Differential Revision: D9583378
Pulled By: erikbrinkman
fbshipit-source-id: 069e5ae68ea7f3524dec39cf1d5fe9cd53941944
Summary:
Test only for existence for now. I had to skip a lot of them, so there is a FIXME in the test.
Also, I'm not testing torch.* because of a namespace issue.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10311
Differential Revision: D9196341
Pulled By: SsnL
fbshipit-source-id: 9c2ca1ffe660bc1cc664474993f8a21198525ccc
Summary:
Closes #9702.
cc jph00
Commit structure:
1. Change the index calculation logic. I will explain using 1-D for simplicity.
Previously we had (in pseudo code):
```
// 1. get the float locations from grid
scalar_t x = from_grid()
// 2. find the integral surrounding indices
int x_left = floor(x)
int x_right = x_left + 1
// 3. calculate the linear interpolate weights
scalar_t w_left = x_right - x
scalar_t w_right = x - x_left
// 4. manipulate the integral surrounding indices if needed
// (e.g., clip for border padding_mode)
x_left = manipulate(x_left, padding_mode)
x_right = manipulate(x_right, padding_mode)
// 5. interpolate
output_val = interpolate(w_left, w_right, x_left, x_right)
```
This is actually incorrect (and also unintuitive) because it calculates the
weights before manipulating out-of-boundary indices. Fortunately, this
doesn't manifest in either of the currently supported modes, `'zeros'` and
`'border'` padding:
+ `'zeros'`: doesn't clip
+ `'border'`: clips, but for out-of-bound `x` both `x_left` and `x_right` are
clipped to the same value, so weights don't matter
But this is a problem with reflection padding, since after each time we reflect,
the values of `w_left` and `w_right` should be swapped.
So in this commit I change the algorithm to (numbers corresponding to the
ordering in the above pseudo-code)
```
1. get float location
4. clip the float location
2. find the integral surrounding indices
3. calculate the linear interpolate weights
```
In the backward, because of this change, I need to add new variables to track
`d manipulate_output / d manipulate_input`, which is basically a multiplier
on the gradient calculated for `grid`. In my benchmarks, this addition doesn't
cause any obvious slowdown.
2. Implement reflection padding. The indices keep being reflected until
they fall within the boundary.
Added variants of `clip_coordinates` and `reflect_coordinates` to be used in
backward. E.g.,
```cpp
// clip_coordinates_set_grad works similarly to clip_coordinates except that
// it also returns the `d output / d input` via pointer argument `grad_in`.
// This is useful in the backward pass of grid_sampler.
scalar_t clip_coordinates_set_grad(scalar_t in, int64_t clip_limit, scalar_t *grad_in)
```
For example, if `in` is clipped in `'border'` mode, `grad_in` is set to `0`.
If `in` is reflected an **odd** number of times in `'reflection'` mode, `grad_in`
is set to `-1`.
3. Implement nearest interpolation.
4. Add test cases
5. Add better input checking
Discussed with goldsborough for moving `operator<<` of `at::Device`,
`at::DeviceType` and `at::Layout` into `at` namespace. (Otherwise
`AT_CHECK` can't find them.)
6. Support empty tensors. cc gchanan
+ Make empty tensors not acceptable by cudnn.
+ Add `AT_ASSERT(kernel block size > 0)` if using `GET_BLOCKS`
+ Cache `numel` in `TensorGeometry`
I was going to use `numel` to test if cudnn descriptor should accept a
tensor, but it isn't used eventually. I can revert this if needed.
7. Add more test cases, including on input checking and empty tensors
8. Remove an obsolete comment
9. Update docs. Manually tested by generating docs.
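The reordered index calculation from item 1 and the `*_set_grad` helpers from item 2 can be sketched in plain Python for the 1-D case. This is only an illustration under assumptions, not the ATen kernel: the helpers return the gradient as a second value instead of writing through a pointer, the reflection range is assumed to be `[0, size - 1]`, and `sample_1d` is a hypothetical name.

```python
import math

def clip_coordinates_set_grad(x, clip_limit):
    """Clip x to [0, clip_limit - 1]; also return d(output)/d(input)."""
    if x < 0:
        return 0.0, 0.0
    if x > clip_limit - 1:
        return float(clip_limit - 1), 0.0
    return x, 1.0

def reflect_coordinates_set_grad(x, size):
    """Reflect x into [0, size - 1] (assumed range); also return d(output)/d(input)."""
    span = size - 1
    if span == 0:
        return 0.0, 0.0
    grad = 1.0
    if x < 0:
        # Reflecting at the lower boundary flips the sign of the gradient.
        x, grad = -x, -grad
    flips, extra = divmod(x, span)
    if int(flips) % 2 == 0:
        return extra, grad
    # An odd number of further reflections flips the sign once more.
    return span - extra, -grad

def sample_1d(values, x, padding_mode="zeros"):
    """Linear interpolation at float location x, using the new step order."""
    n = len(values)
    # Step 4 now runs first: manipulate the float location itself.
    if padding_mode == "border":
        x, _ = clip_coordinates_set_grad(x, n)
    elif padding_mode == "reflection":
        x, _ = reflect_coordinates_set_grad(x, n)
    # Steps 2 and 3: surrounding indices and weights from the adjusted x.
    x_left = math.floor(x)
    x_right = x_left + 1
    w_left = x_right - x
    w_right = x - x_left

    # Step 5: out-of-range neighbours contribute zero.
    def at(i):
        return values[i] if 0 <= i < n else 0.0

    return w_left * at(x_left) + w_right * at(x_right)
```

Because the weights are computed after the location is reflected, no swap of `w_left` and `w_right` is needed; the sign flip only shows up in the returned gradient multiplier.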
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10051
Differential Revision: D9123950
Pulled By: SsnL
fbshipit-source-id: ac3b4a0a36b39b5d02e83666cc6730111ce216f6
Summary:
- fixes #9141, #9301
- use logsigmoid at multilabel_soft_margin_loss to make it more stable (NOT fixing legacy MultiLabelSoftMarginCriterion)
- return (N) instead of (N, C) to match the same behavior as MultiMarginLoss
- Note that with this PR, the following behavior is expected:
```
loss = F.multilabel_soft_margin_loss(outputs, labels, reduction='none')
loss_mean = F.multilabel_soft_margin_loss(outputs, labels, reduction='elementwise_mean')
loss_sum = F.multilabel_soft_margin_loss(outputs, labels, reduction='sum')
loss.sum() == loss_sum # True
loss.mean() == loss_mean # True
```
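For reference, a minimal pure-Python sketch of the stabilized computation, assuming the loss is the negative log-sigmoid form averaged over the class dimension. It is not the actual `torch.nn.functional` code; inputs are plain nested lists and the helper names are illustrative.

```python
import math

def logsigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    return min(x, 0.0) - math.log1p(math.exp(-abs(x)))

def multilabel_soft_margin_loss(outputs, labels, reduction="elementwise_mean"):
    """outputs, labels: N rows of C values each (assumed layout)."""
    per_sample = []
    for row, targets in zip(outputs, labels):
        terms = [-(y * logsigmoid(x) + (1 - y) * logsigmoid(-x))
                 for x, y in zip(row, targets)]
        # Average over the C classes, so the unreduced loss has shape (N).
        per_sample.append(sum(terms) / len(terms))
    if reduction == "none":
        return per_sample
    if reduction == "sum":
        return sum(per_sample)
    return sum(per_sample) / len(per_sample)
```

Using `logsigmoid` avoids evaluating `log(sigmoid(x))` directly, which would underflow to `log(0)` for large negative inputs.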
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9965
Differential Revision: D9038402
Pulled By: weiyangfb
fbshipit-source-id: 0fa94c7b3cd370ea62bd6333f1a0e9bd0b8ccbb9
Summary:
There is a regression in softmin in 0.4.1 that was not present in 0.4.0. The behavior of softmin(x) should match softmax(-x); instead, it is implemented (in v0.4.1) as -softmax(x). These are not the same. The fix is trivial because the bug is due to operator precedence.
This is a major regression that broke my training. I'm not sure how a unit test did not catch this.
```
import torch
import torch.nn.functional as F

x = torch.tensor([1, 2, 3.5, 4])
print(F.softmin(x, dim=0)) # this has the wrong output in 0.4.1 but correct in 0.4.0
print(F.softmax(-x, dim=0)) # this is what softmin should be
print(F.softmax(x, dim=0))
print(-F.softmax(x, dim=0)) # this is how softmin is implemented incorrectly
```
In 0.4.1 this produces
tensor([-0.0278, -0.0755, -0.3385, -0.5581])
tensor([0.6668, 0.2453, 0.0547, 0.0332])
tensor([0.0278, 0.0755, 0.3385, 0.5581])
tensor([-0.0278, -0.0755, -0.3385, -0.5581])
In 0.4.0 this produces the correct values
tensor([ 0.6668, 0.2453, 0.0547, 0.0332])
tensor([ 0.6668, 0.2453, 0.0547, 0.0332])
tensor([ 0.0278, 0.0755, 0.3385, 0.5581])
tensor([-0.0278, -0.0755, -0.3385, -0.5581])
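The difference can be reproduced without torch. The sketch below uses hypothetical pure-Python helpers, not the library code, and assumes the faulty expression had the form `-input.softmax(dim)`. In Python, the method call binds tighter than the unary minus, so that expression negates the result rather than the input.

```python
import math

def softmax(xs):
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def softmin(xs):
    # Intended behavior: negate the inputs first, then normalize.
    return softmax([-x for x in xs])

def buggy_softmin(xs):
    # Mimics the regression: the result is negated instead of the input.
    return [-p for p in softmax(xs)]

x = [1.0, 2.0, 3.5, 4.0]
print([round(p, 4) for p in softmin(x)])
print([round(p, 4) for p in buggy_softmin(x)])
```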
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10066
Differential Revision: D9106995
Pulled By: soumith
fbshipit-source-id: 7332503c6077e8461ad6cd72422c749cf6ca595b
Summary:
_pointwise loss has some Python special casing; we converted reduction to ATen enums too early.
fixes #10009
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10018
Differential Revision: D9075489
Pulled By: li-roy
fbshipit-source-id: 4bf2f5e2911e757602c699ee1ec58223c61d0162
Summary:
The CPU and CUDA variants are a direct transposition of Graves et al.'s description of the algorithm with the
modification that it is in log space.
There is also a binding for the (much faster) CuDNN implementation.
This could eventually fix #3420
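To illustrate what "in log space" means here, the following is a small pure-Python sketch of the forward (alpha) recursion for a single sequence. It is a simplified illustration under assumptions, not the kernel code: `ctc_neg_log_likelihood` and `logsumexp` are made-up names, inputs are plain lists, and there is no batching or gradient.

```python
import math

NEG_INF = float("-inf")

def logsumexp(*xs):
    """log(sum(exp(x))) that tolerates -inf entries."""
    m = max(xs)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(x - m) for x in xs))

def ctc_neg_log_likelihood(log_probs, target, blank=0):
    """log_probs: T rows of per-class log-probabilities; target: labels without blanks."""
    # Extended target: blank, l1, blank, l2, ..., blank.
    ext = [blank]
    for label in target:
        ext += [label, blank]
    S, T = len(ext), len(log_probs)

    alpha = [NEG_INF] * S
    alpha[0] = log_probs[0][blank]
    if S > 1:
        alpha[1] = log_probs[0][ext[1]]

    for t in range(1, T):
        new = [NEG_INF] * S
        for s in range(S):
            terms = [alpha[s]]
            if s >= 1:
                terms.append(alpha[s - 1])
            # Skipping a blank is only allowed between two different labels.
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                terms.append(alpha[s - 2])
            new[s] = logsumexp(*terms) + log_probs[t][ext[s]]
        alpha = new

    # Valid paths end in the last label or the trailing blank.
    total = logsumexp(alpha[-1], alpha[-2]) if S > 1 else alpha[-1]
    return -total
```

Working with sums of log-probabilities instead of products of probabilities keeps the recursion from underflowing on long inputs.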
I still need to add tests (TestNN seems much more elaborate than the other testing) and fix the bugs that invariably turn up during the testing. Also, I want to add some more code comments.
I could use feedback on all sorts of things, including:
- Type handling (cuda vs. cpu for the int tensors, dtype for the int tensors)
- Input convention. I use log probs because that is what the gradients are for.
- Launch parameters for the kernels
- Errors and omissions and anything else I'm not even aware of.
Thank you for looking!
In terms of performance, it looks superficially comparable to WarpCTC (but I have not systematically investigated this).
I have read that CuDNN is much faster than other implementations because it does *not* use log-space, but also the gathering step is much, much faster (but I avoided trying tricky things; they seem to contribute to warpctc's fragility). I might think some more about which existing torch function (scatter or index..) I could learn from for that step.
Average timings for the kernels from nvprof for some size:
```
CuDNN:
60.464us compute_alphas_and_betas
16.755us compute_grads_deterministic
Cuda:
121.06us ctc_loss_backward_collect_gpu_kernel (= grads)
109.88us ctc_loss_gpu_kernel (= alphas)
98.517us ctc_loss_backward_betas_gpu_kernel (= betas)
WarpCTC:
299.74us compute_betas_and_grad_kernel
66.977us compute_alpha_kernel
```
Of course, I still have the (silly) outer blocks loop rather than computing consecutive `s` in each thread which I might change, and there are a few other things where one could look for better implementations.
Finally, it might not be unreasonable to start with these implementations, as the performance of the loss has to be seen in the context of the entire training computation, so this would likely dilute the relative speedup considerably.
My performance-measuring test script:
```
import timeit
import sys
import torch

num_labels = 10
target_length = 30
input_length = 50
eps = 1e-5
BLANK = 0  # num_labels
batch_size = 16

torch.manual_seed(5)
activations = torch.randn(input_length, batch_size, num_labels + 1)
log_probs = torch.log_softmax(activations, 2)
probs = torch.exp(log_probs)
targets = torch.randint(1, num_labels+1, (batch_size * target_length,), dtype=torch.long)
targets_2d = targets.view(batch_size, target_length)
target_lengths = torch.tensor(batch_size*[target_length])
input_lengths = torch.tensor(batch_size*[input_length])
activations = log_probs.detach()

# Each timing function runs one forward and one backward pass, synchronizing around them.
def time_cuda_ctc_loss(grout, *args):
    torch.cuda.synchronize()
    culo, culog_alpha = torch._ctc_loss(*args)
    g, = torch.autograd.grad(culo, args[0], grout)
    torch.cuda.synchronize()

def time_cudnn_ctc_loss(grout, *args):
    torch.cuda.synchronize()
    culo, cugra = torch._cudnn_ctc_loss(*args)
    g, = torch.autograd.grad(culo, args[0], grout)
    torch.cuda.synchronize()

def time_warp_ctc_loss(grout, *args):
    torch.cuda.synchronize()
    culo = warpctc.ctc_loss(*args, blank_label=BLANK, size_average=False, length_average=False, reduce=False)
    g, = torch.autograd.grad(culo, args[0], grout)
    torch.cuda.synchronize()

if sys.argv[1] == 'cuda':
    lpcu = log_probs.float().cuda().detach().requires_grad_()
    args = [lpcu, targets_2d.cuda(), input_lengths.cuda(), target_lengths.cuda(), BLANK]
    grout = lpcu.new_ones((batch_size,))
    torch.cuda.synchronize()
    print(timeit.repeat("time_cuda_ctc_loss(grout, *args)", number=1000, globals=globals()))
elif sys.argv[1] == 'cudnn':
    lpcu = log_probs.float().cuda().detach().requires_grad_()
    args = [lpcu, targets.int(), input_lengths.int(), target_lengths.int(), BLANK, True]
    grout = lpcu.new_ones((batch_size,))
    torch.cuda.synchronize()
    print(timeit.repeat("time_cudnn_ctc_loss(grout, *args)", number=1000, globals=globals()))
elif sys.argv[1] == 'warpctc':
    import warpctc
    activations = activations.cuda().detach().requires_grad_()
    args = [activations, input_lengths.int(), targets.int(), target_lengths.int()]
    grout = activations.new_ones((batch_size,), device='cpu')
    torch.cuda.synchronize()
    print(timeit.repeat("time_warp_ctc_loss(grout, *args)", number=1000, globals=globals()))
I'll also link to a notebook that I used for writing up the algorithm in simple form and then testing the implementations against it.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9628
Differential Revision: D8952453
Pulled By: ezyang
fbshipit-source-id: 18e073f40c2d01a7c96c1cdd41f6c70a06e35860
Summary:
As in the title. Lets us simplify a lot of code.
Depends on #9363, so please review only the last commit.
zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9414
Reviewed By: zdevito
Differential Revision: D8836496
Pulled By: apaszke
fbshipit-source-id: 9b3c3d1f001a9dc522f8478abc005b6b86cfa3e3
Summary:
It implements per-channel alpha_dropout. It also creates corresponding function classes and unifies the process of dropout and alpha_dropout.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9073
Differential Revision: D8727008
Pulled By: ezyang
fbshipit-source-id: 9d509f9c5db4e98f7b698cdfc4443505a4d2b331
Summary:
Commits:
1. In extension doc, get rid of all references to `Variable`s (Closes #6947)
+ also add minor improvements
+ also added a section with links to cpp extension :) goldsborough
+ removed mentions of `autograd.Function.requires_grad` as it's not used anywhere and hardcoded to `return_Py_True`.
2. Fix several sphinx warnings
3. Change `*` in equations in `module/conv.py` to `\times`
4. Fix docs for `Fold` and `Unfold`.
+ Added better shape check for `Fold` (it previously could give bogus results when there were not enough blocks). Added test for the checks.
5. Fix doc saying `trtrs` not available for CUDA (#9247)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9239
Reviewed By: soumith
Differential Revision: D8762492
Pulled By: SsnL
fbshipit-source-id: 13cd91128981a94493d5efdf250c40465f84346a
Summary:
This PR addresses #5823.
* fix docstring: upsample doesn't support LongTensor
* Enable float scale up & down sampling for linear/bilinear/trilinear modes. (following SsnL's commit)
* Enable float scale up & down sampling for nearest mode. Note that our implementation is slightly different from TF in that there's actually no "align_corners" concept in this mode.
* Add a new interpolate function API to replace upsample. Add a deprecation warning for upsample.
* Add an area mode which is essentially Adaptive_average_pooling into resize_image.
* Add test cases for interpolate in test_nn.py
* Add a few comments to help understand *linear interpolation code.
* There is only "*cubic" mode missing in resize_images API which is pretty useful in practice. And it's labeled as hackamonth here #1552. I discussed with SsnL that we probably want to implement all new ops in ATen instead of THNN/THCUNN. Depending on the priority, I could either put it in my queue or leave it for a HAMer.
* After the change, the files named *Upsampling*.c work for both up/down sampling. I could rename the files if needed.
Differential Revision: D8729635
Pulled By: ailzhang
fbshipit-source-id: a98dc5e1f587fce17606b5764db695366a6bb56b
Summary:
1. Let `ModuleTest` raise when they fail on non-contiguous inputs. Fix legacy modules.
2. Fix BN (both THNN and cuDNN) not working on non-contiguous inputs.
3. Fix CUDA EmbeddingBag not working on non-contiguous inputs. To prevent calling `.contiguous()` in both `forward` and `backward`,
a. prefix all current `embedding_bag*` functions with `_`, indicating that they require input to be contiguous (there is a check in each function).
b. create `embedding_bag`, which makes input arguments `.contiguous()`, and calls `_embedding_bag`
4. Make many ATen `embedding*` functions work on non-contiguous inputs so we don't need to call `input = input.contiguous()` in Python `nn.functional.embedding`.
5. Fix dense-sparse addition when the sparse input is not coalesced and indices or values tensor is not contiguous. This came up in the test cases of Embedding modules with `sparse=True`. Added tests.
6. Update `TensorUtils.cpp` to use `AT_*` macros.
Request:
review from cpuhrsch on the `Embedding*` changes.
review from ezyang on ATen sparse & BN changes.
Closes https://github.com/pytorch/pytorch/pull/9114
Differential Revision: D8717299
Pulled By: SsnL
fbshipit-source-id: 0acc6f1c9522b5b605361e75112c16bbe1e98527
Summary:
The tests were using the old args, which caused them to emit a lot of deprecation warnings.
closes #9103.
Reviewed By: ezyang
Differential Revision: D8720581
Pulled By: li-roy
fbshipit-source-id: 3b79527f6fe862fb48b99a6394e8d7b89fc7a8c8
* Add pos_weight argument to nn.BCEWithLogitsLoss and F.binary_cross_entropy_with_logits (#5660)
- Add an option to control precision/recall in imbalanced datasets
- Add tests (but new_criterion_tests)
* Move pos_weight to the end of args list in the documentation.
`pos_weight` was moved to the end because it is the last argument in both
`nn.BCEWithLogitsLoss` and `binary_cross_entropy_with_logits`
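As a rough illustration of what `pos_weight` does, here is a pure-Python, per-element sketch (not the library implementation; the function names and the scalar interface are assumptions). The weight multiplies only the positive-target term, so values above 1 penalize missed positives more heavily and trade precision for recall.

```python
import math

def logsigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    return min(x, 0.0) - math.log1p(math.exp(-abs(x)))

def bce_with_logits(logit, target, pos_weight=1.0):
    """Binary cross entropy on a raw logit; pos_weight scales the positive term."""
    # log(1 - sigmoid(x)) equals logsigmoid(-x).
    return -(pos_weight * target * logsigmoid(logit)
             + (1.0 - target) * logsigmoid(-logit))
```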
* 1. added hardshrink() to ATen (CPU + GPU); 2. removed nn.Hardshrink(); 3. reusing previous tests for nn.Hardshrink() and included CUDA tests at test_nn; 4. default parameter lambda=0.5 is not working yet
* optimized memory read/write
* 1. pass in lambd as scalar for CPU/CUDA_apply*; 2. removed tests for hardshrink at test_legacy_nn
* fixes test_utils
* 1. replace zeros_like with empty_like; 2. use scalar_cast in cuda
* 1. printing lambd value; 2. default lambd=0.5 is still failing
* getting around Scalar bug by removing default value of lambd from native_functions.yaml, and declare it at nn/functional.py
* cleaned up debug printf
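For readers unfamiliar with the op, hardshrink zeroes every element whose magnitude does not exceed `lambd`. A minimal list-based sketch, not the ATen kernel:

```python
def hardshrink(values, lambd=0.5):
    """Keep v when |v| > lambd, otherwise return 0."""
    return [v if abs(v) > lambd else 0.0 for v in values]
```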
* move softmax/logsoftmax to ATen
* specify cpu and gpu accum types
* use accreal for CPU
* expose softmax backward to python, fix legacy interface
* fix Distributions.cu to use common AccumulateType
* fix cuda 8 build
* delete commented out lines
* rebase on master, fix breakages
* Add max mode support to EmbeddingBag
* Lint fix
* Fix compilation issue on other platforms
* Rebase + don't waste memory when not in max mode
* Oops, missed a spot
* Fix whitespace from merge
* less precision
* Lower precision to avoid spurious failures
* Minor typo
* Switch to size()
* Added ReLU unit to LP pooling, so the gradient does not become NAN if all inputs are zero.
* Added workaround for odd p. Added a bit of doc.
* Make the linter happy.
* Codemod to update our codebase to 0.4 standard
* Update some of the test scripts
* remove Variable in test_clip_grad_value
* fix _symbolic_override_wrapper_maker
Fixes #5554
Adds an error message for when NLLLoss is passed an input and target
whose batch sizes don't match. Ideally this check should live in ATen
but since there is NLLLoss logic in python the check is there right now.
According to the code in _torch/nn/functional.py:1399_
(```if target.size()[1:] != input.size()[2:]:```),
if the size of input is (N, C, d_1, d_2, ..., d_K), the size of target should be (N, d_1, d_2, ..., d_K).
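The two shape conditions above can be sketched as a standalone check on plain size tuples. This is a hypothetical helper for illustration; the function name and error messages are assumptions, not the text used in `torch/nn/functional.py`.

```python
def check_nll_loss_shapes(input_size, target_size):
    """Validate (N, C, d_1, ..., d_K) input against (N, d_1, ..., d_K) target."""
    if len(input_size) < 2:
        raise ValueError("Expected 2 or more dimensions (got {})".format(len(input_size)))
    # Batch sizes must agree.
    if input_size[0] != target_size[0]:
        raise ValueError("Expected input batch_size ({}) to match target batch_size ({})."
                         .format(input_size[0], target_size[0]))
    # Everything after the class dimension must match the target's trailing dims.
    if tuple(target_size[1:]) != tuple(input_size[2:]):
        raise ValueError("Expected target size {}, got {}"
                         .format((input_size[0],) + tuple(input_size[2:]), tuple(target_size)))
```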
* Changes in bilinear upsampling
* Add align_corners option to upsampling module & functional when using linearly interpolating modes
When align_corners=True, it uses the old original upsampling scheme, which gives visually better results,
but doesn't properly align input and output pixels, and thus causes the output to vary based on input.
This PR adds this align_corners option, and changes the default behavior to align_corners=False, with
a proper warning if this option is not specified when using nn.Upsample or nn.functional.upsample, to make
users aware of this change.
Adds tests in test_nn.py for spatial invariance when align_corners=False, and usual module tests for
align_corners=False.
* remove redundant checks and unnecessary variables; fix the cast
* fix negative indices
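The effect of `align_corners` is easiest to see in 1-D. The sketch below is a pure-Python illustration with hypothetical function names, not the THNN code, and it assumes the usual coordinate mappings: corner pixels are matched exactly when `align_corners=True`, and pixel centers are matched (with clamping at zero) otherwise.

```python
import math

def source_index(dst, in_size, out_size, align_corners):
    """Map an output index to a float location in the input."""
    if align_corners:
        if out_size == 1:
            return 0.0
        return dst * (in_size - 1) / (out_size - 1)
    # Match pixel centers, then clamp so we never index below zero.
    return max((dst + 0.5) * in_size / out_size - 0.5, 0.0)

def upsample_linear_1d(values, out_size, align_corners=False):
    n = len(values)
    out = []
    for dst in range(out_size):
        src = source_index(dst, n, out_size, align_corners)
        left = min(int(math.floor(src)), n - 1)
        right = min(left + 1, n - 1)
        w_right = src - left
        out.append((1.0 - w_right) * values[left] + w_right * values[right])
    return out
```

For `values = [0.0, 1.0]` and `out_size = 4`, the two settings sample the input at different locations, which is why the default change needs a warning.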
This PR addresses issue #5024
* Expose Conv2dBackward in python
* Separate interface for exposing gradients of operators
* Revert old changes
* Add tests
* Add conv1d gradients. Refactor tests for grad convolutions
* Refactor names and change examples
* Remove Variable from tests for conv backward
* add reduce=True arg to MarginRankingLoss
* make default margin arg match for legacy
* remove accidentally added test
* fix test
* fix native_functions.yaml alphabetical order
* support n-d inputs in bilinear and move to aten
* support n-d inputs in bilinear and move to aten
* add asserts to bilinear inputs
* address comments
* cast int64_t in asserts
* implement TripletMarginLoss as a native function
* implement TripletMarginLoss as native function
* fix compile error
* address comments
* address comments
* Add keepdim arg to pairwise distance
* Fix some minor errors in existing docs.
* Fix Convolution and Pooling docs in torch.nn.functional
* Cleaned up torch.nn.functional docs
* Address @SsnL's comments
* Add multiplication sign missing in docs
* Fix more typos, and clear some warnings
* Change infinity symbol in LPPool2d
* Revert some changes in torch.nn.functional
* Few more minor changes
* implement CosineEmbeddingLoss as a native function and add reduce=True arg to it
* fix flake8
* address comments
* add reference function to tests
* fix flake8
The nn.* counterpart of #5443 . Mostly removed Variable wrapper. Also added doc for nn.RReLU.
Notice that torch.randn(*, requires_grad=True) isn't documented until #5462 is done.
This replaces the torch.Tensor constructors with factories that produce
Variables. Similarly, functions on the torch module (e.g. torch.randn)
now return Variables.
To keep the PR to a reasonable size, I've left most of the unused tensor
code. Subsequent PRs will remove the dead code, clean-up calls to
torch.autograd.Variable, and rename Variable to Tensor everywhere.
There are some breaking changes because Variable and Tensors had
slightly different semantics. There's a list of those changes here:
https://github.com/pytorch/pytorch/wiki/Breaking-Changes-from-Variable-and-Tensor-merge
* at::maybe_data_ptr and Check.h => TensorUtils.h
* THNN support for optional BN running_*
* ATen support for optional BN running_*
* Python nn.* support for optional BN running_*; Improve IN and BN doc
* Add tests for IN and BN new option
* Layer Norm
* Fix LRN doc
* functional interface for LN and IN
* Layer norm tests
* fix BN double backward returning undefined tensors
* fix jit test using wrong dim inputs for BN
* add/improve BN, IN and LN GPU tests with half type
* Update docs to be consistent with Conv notation
Fix onnx
Clarified onnx symbolic wrapper
* fix typo
* Address comments
* add reduce=True arg to HingeEmbeddingLoss
* pass arg to super constructor in HingeEmbeddingLoss
* make HingeEmbeddingLoss reference fn work on legacy
* Add criterion scalar tests.
This exposed an issue in MarginRankingLoss with scalars, but the cleanest way to fix is to wait
until forward runs on Variables (so we don't have to wait for the backward to check if something
is a scalar).
* Fix flake8.
* Add error message for margin_ranking_loss with scalars.
This adds overrides in VariableType for the xxx_out ATen functions and
implements Python bindings. There is no support for automatic
differentiation. If any of the inputs (or outputs) requires grad, then the
function will throw an exception unless it's running in "no-grad" mode.
The bindings for calling torch.xxx functions on Variables are moved to a
different object. Previously, they were static methods on VariableBase.
This change prevents users from accidentally calling static methods as if
they were instance methods.
Implements nn.Embedding (lookup table) in ATen.
Breaking change: new optional argument padding_idx in F.embedding to
match nn.Embedding.
Note that there are a few bugs in Embedding that are inherited from the
previous code:
- CUDA renorm has race conditions if index contains duplicate entries
- sparse gradient doesn't work with scale_grad_by_freq
This is a step towards removing the special casing of NN functions in gen_variable_type.py. It fixes the signature of in-place NN functions so that they return Tensor & instead of Tensor.
- Rename THNN convolution to have thnn_ prefix.
- Propagate CuDNN benchmark and deterministic to at::Context
- Add 'convolution', 'convNd' and 'conv_transposeNd' native wrappers, with defaults
The conv_transposeNd wrappers are updated to have the same argument
order as Python.
- torch.nn.functional directly dispatches to the native wrappers
- Make it possible to turn off tracing for some native wrappers, so I don't
have to write symbolics for all the functions above
- Spectral ops can now make use of CuDNN convolution if possible
- Better commentary on cudnn_batch_norm
- Turn on DCE for all JIT tests.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Batchnorm in ATen
This commit moves BatchNorm derivatives into ATen, eliminating
torch/csrc/autograd/functions/batch_normalization.cpp
Some refactoring along the way:
- Functions got renamed to remove _forward from their names
- CuDNN batchnorm forward was modified to return save_mean/save_std instead of
taking them as parameters. To avoid returning undefined Variables, these return
(small) uninitialized tensors when they are not used.
- THNN batch normalization takes care of resizing save_mean and save_std on
forward.
- There are some shenanigans re batchnorm backwards in eval mode. I'm tracking
that in #4284
- I decided not to introduce buffers as a proper concept in ATen, which means
that tensors like running_mean/running_var are variables in ATen. This meant
there needed to be some adjustments to how we *trace* such variables; the
new strategy is if we can't find a Value for a variable, we look and see
if we have a Value for the buffer pointed to by the variable, before
finally falling back on constant.
- This PR finally reliably triggered OOM on Travis builds; I fixed this by reducing
the number of parallel jobs.
- Stop using std::string when it's not necessary.
- Remove training parameter from cudnn_batch_norm_backward, because it
doesn't make sense; cuDNN doesn't implement the math for evaluation mode
batchnorm backwards.
- batchnorm_double_backward is now in an anonymous namespace, as it
no longer needs to be called from torch/csrc
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Convolution derivatives in ATen
This PR introduces ATen implementation of convolution, which dispatches to
THNN/CuDNN/nnpack based on input parameters. The general strategy is to compose
this function out of the various forward-backward pairs of specific
implementations, rather than write a monolithic function with backwards (which
is what we did before because the boilerplate of doing it otherwise would have
been very high.) The new API provides the following functions:
- _convolution, which is a fully generic, native convolution implementation
that dispatches to various other convolution implementations depending on
input characteristics. This is prefixed with an underscore because it
explicitly takes benchmark, deterministic and cudnn_enabled which are
implementation details for CuDNN. The intent is to eventually provide a
convolution that reads these parameters out of the context using #4104.
- _convolution_nogroup is a convolution implementation for non-CuDNN
algorithms which don't support group convolution natively.
- _convolution_double_backward is the generic double-backwards implementation
for convolution.
In more detail:
- Most functionality from torch/csrc/autograd/functions/convolution.cpp has been
moved into aten/src/ATen/native/Convolution.cpp
- We continue to make use of ConvParams, but we now construct the parameters
upon entry to a function from the function signature (which does not use
ConvParams; having convolution take ConvParams directly would require teaching
the code generator how to accept these as parameters, complicating ATen's API
model) and destruct them when making subprocedure calls.
- I introduce a new idiom, input_r, which represents a const Tensor& reference,
which will subsequently be assigned to a local Tensor input. This is helpful
because a lot of the existing algorithms relied on being able to assign to
locals, which is not permitted with a const reference.
- The native argument parser now supports std::array<bool,2> inputs (NB: there
MUST NOT be a space; this is the same hack as is applied to derivatives.yaml)
- Native parser now supports Tensor? arguments, which indicates a nullable
tensor. Previously this function was only used by NN methods.
- Documentation updates on THNN library
- I added an extra fgradInput argument to VolumetricConvolutionMM_updateOutput
and VolumetricConvolutionMM_accGradParameters so that its buffer list lines up
with the backward argument list. This makes it possible to write derivative
for conv3d which previously was not supported (commented out in
derivatives.yaml)
- Extra double_backward declarations for all convolution backwards functions were
added.
- You can now use the syntax Tensor? in native_functions.yaml to indicate that a
tensor argument is nullable. There are adjustments to propagate this to the
Python argument parser.
- NNPACK was ported to ATen, and ATen now builds and links against NNPACK if
possible. New AT_NNPACK_ENABLED macro. The nnpack functions are
nnpack_spatial_convolution.
- Some modest CuDNN convolution refactoring to remove _forward from names.
- There's a new cudnn_convolution_backward function to deal with the fact that
CuDNN convolution double backward requires you to have computed all gradients
in one go.
- Variable set_flags now checks if the tensor is undefined, fixing a silent memory
corruption.
- checkSameType updated to not raise an exception if called with Variable arguments
- "no ATen declaration found for" error message is improved to say what available declarations are
- make_variable now accepts undefined tensors, and returns an undefined tensor in this case.
* add reduce arg to PoissonNLLLoss
* fixed comments except reference function
* fixed unit test
* small indentation fix
* fixing last comments by richard
* lint check
* another linting issue
* Comprehensive rewrite of Torch CuDNN bindings / a bit of ATen infra
The executive summary is that this moves the torch/csrc/cudnn
library into ATen, adding a number of new cudnn_ methods to ATen
for batchnorm, convolution, affine grid generator and grid sampler.
ATen infra changes:
- TensorGeometry was moved to ATen
- TensorGeometry was modified to make its interface resemble that of
Tensor; in particular, sizes is no longer a field, it's a method.
- AT_CUDA_ENABLED macro is set via ATen/Config.h header which is
generated at cmake configure time.
Fixes https://github.com/zdevito/ATen/issues/168
- Change AT_CUDA_ENABLED macro to be a function macro, so that we
error if it is not defined
- Introduce a new TensorArg class, which is a Tensor plus a little
metadata. This helps us give good error messages when checking
dimensions/shapes of tensors.
Fixes https://github.com/zdevito/ATen/issues/169
- Also introduce a TensorGeometryArg class, for when you don't
need the actual tensor data (which is most of the time.)
- Add ATen/Check.h, which contains a number of utility functions
for testing shapes, types and devices of input tensors. This
will be particularly useful for native methods, which don't get
code generated input testing code. These functions take a
'CheckedFrom' argument, at the moment just a string, which
specifies some extra information about what function was
doing the actual checking; this greatly improves error messages.
- Many check functions take initializer lists, which let you
test that all tensors have some property. This API is
peculiar, in that we IGNORE undefined tensors in this case.
This is handled by filterDefined.
- Add AT_CUDNN_ENABLED macro
- CuDNN linking from ATen was improved; for example, we now actually
add the CuDNN headers to our include path.
- Add some missing override specifiers to some methods
- We now actually build tests with CUDA functionality accessible
(previously, AT_CUDA_ENABLED was not defined, meaning that
the headers were missing all CUDA-only functionality.)
- Native functions now support giving explicit names to return
outputs in yaml. This makes it possible to hook into the NN
autogenerated derivatives codepath using native functions.
CuDNN rewrite changes:
- torch/csrc/cudnn now uses ATen (rather than passing around
THVoidTensor) and lives in ATen. This lets us remove tensorPointer
shenanigans. The functions are exposed to ATen as native functions
described in aten/src/ATen/cudnn/cuDNN.yaml
- ATen now builds and links against CuDNN when enabled. The cmake
package script was taken from Caffe2.
- Some header reorganization was done to help reduce dependencies
on headers (this reorg is no longer used but I've kept it)
- Rename CHECK to CUDNN_CHECK
- Rip out old shape/type testing code in favor of modern ATen/Check.h
interface using TensorArg. In many cases, increase the robustness of
the checking code.
- Change the inputs of the public facing functions, so that they can
be bound by ATen
- Delete THCState*; this is retrieved from the global ATen context
- Delete cudnnHandle_t, this is retrieved from the global Handles.h
- Delete cudnnDataType_t, this is retrieved from the Tensor type
- Delete Convolution class, instead its constituent arguments are
passed individually
- Change functions to return tensors, rather than take an appropriately
sized output tensor as an input.
- Redo how transposed convolution / backward convolution is implemented
(knock on effect of returning tensors). Previously it was assumed
that you would always pass an appropriately sized output tensor, but
we don't want to do this anymore. For backwards, we instead give
the desired output tensor (input, really) size, because that is
readily available. For *transposed* convolution, however, we take
output_padding, and otherwise do the shape calculation.
- Redo how legacy group convolution is implemented (knock-on effect from
porting cudnn to ATen.) Previously, group convolution was implemented
by manually constructing sizes and strides and then outputting
appropriately, with macros switching between individual groups and
all-at-once based on CuDNN version. Now, the code looks exactly like what
you'd expect: there's a top-level wrapping function that supports
group convolution no matter the version of CuDNN, and a low-level
wrapper which supports only what CuDNN supports. The top-level
function conditions on CuDNN version, and invokes the low-level
interface 1 or n times.
- There is now a debugging printer for tensor descriptors.
- Convolution struct is replaced with ConvolutionArgs, which is not
part of the public API but is used internally to conveniently
pass around all of the arguments needed for Convolution.
- Add some constexprs for well-known dimensions, reduce amount of
magic numbers in code.
- Put 'deterministic' into ConvParams. Fixes #3659
- Lots more comments.
- Some pessimizations, in the name of code clarity:
- The descriptors are initialized on every invocation of convolution
forward/backward. Previously, the descriptors were cached, so that
you didn't have to initialize them again on backwards. This is
difficult to support in the ATen interface so I didn't support it.
- Legacy group convolution initializes its workspace for *every* group
it performs. I did not feel motivated to fix this because the
legacy codepath is already quite slow.
- Affine grid generator and grid sampler automatically call contiguous
on their arguments as necessary.
- Batchnorm input checking is greatly beefed up; it now checks for
the following input characteristics:
- Definedness
- GPU location
- Type
- Contiguity
- Size
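The shape calculation mentioned for transposed convolution above can be sketched in plain Python. This uses the standard per-dimension formula; the helper name `conv_transpose_output_size` and its argument names are assumptions made for illustration and are not the names used in the ATen code.

```python
def conv_transpose_output_size(input_size, kernel_size, stride=1, padding=0,
                               output_padding=0, dilation=1):
    """Per-dimension output size of a transposed convolution.

    Hypothetical helper for illustration; not the ATen implementation.
    """
    return [
        (i - 1) * stride - 2 * padding + dilation * (k - 1) + output_padding + 1
        for i, k in zip(input_size, kernel_size)
    ]


# With kernel 3, stride 2, padding 1, a forward convolution maps both 7 and 8
# to 4, so output_padding selects which of the two sizes is produced.
small = conv_transpose_output_size([4, 4], [3, 3], stride=2, padding=1)
large = conv_transpose_output_size([4, 4], [3, 3], stride=2, padding=1,
                                   output_padding=1)
```

This is why output_padding is needed as an input: the forward shape function is not invertible when the stride is greater than 1.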
PyTorch binding code changes:
- batchnorm now uses consistent var/data naming
- batchnorm and convolution make use of new ATen bindings
- Affine grid generator and grid sampler make use of ATen CuDNN
bindings via derivatives.yaml. This means I had to restructure
the code a little, since the THNN bindings still go through
a legacy Python class.
- I fixed some warnings:
- s/friend class/friend struct/ on InterpreterStateImpl
- Removed pessimizing move 'detached' in torch/csrc/autograd/variable.cpp
- Removed unused pack_list on Scalar
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
GCC 4.8 buildfix
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Add TensorGeometry to ATen.h
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
CUDNN_CHECK
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Update TODO comment
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Delete return in cudnn_grid_sampler
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
s/cudnnSetStreamToCurrent/setCuDNNStreamToCurrent/g
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Don't allocate a new vector when filtering defined.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Remove Check overloads, convert to pass references.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Some more microbenchmarking.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
For example, this splits threshold into threshold(), which is now
never in-place, and threshold_() which is always in-place.
This simplifies the in-place vs. non-in-place logic in
gen_variable_type.py, which was bug-prone.
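As a rough illustration of the split described above, here is a plain-Python sketch over lists. It is not the ATen implementation; it only mirrors the convention that the plain name never mutates its input and the trailing-underscore name always does.

```python
def threshold(xs, thresh, value):
    """Out-of-place: return a new list and leave `xs` untouched."""
    return [x if x > thresh else value for x in xs]


def threshold_(xs, thresh, value):
    """In-place: overwrite `xs` and return the same object."""
    for i, x in enumerate(xs):
        if not x > thresh:
            xs[i] = value
    return xs


data = [0.2, 0.8, 0.5]
out = threshold(data, 0.5, 0.0)    # `data` is unchanged here
same = threshold_(data, 0.5, 0.0)  # `data` itself is modified
```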
This operator is a warmup I was doing before tackling convolution, as it
has many properties that make it a "first" for implementing things. In
particular, it is the first operator whose backwards has multiple
returns; this means its double backwards is the first backwards for a
function with multiple differentiable outputs. This exercises new code
for output_mask and set_flags.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Prevent numerical issues with poisson_nll_loss when log_input=False
Evaluating the logarithm of the input variable in the Poisson negative log likelihood leads to a NaN loss if the variable being evaluated is zero. A small epsilon is added to prevent this. See the equivalent Keras epsilon here: https://github.com/fchollet/keras/blob/master/keras/losses.py#L68
* PEP8 fix
* Add epsilon support to PoissonNLLLoss in nn.modules.loss
* API changes
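A minimal plain-Python sketch of the epsilon fix for the log_input=False case follows. The function below is illustrative only: it omits the optional Stirling term, always averages, and the default `eps=1e-8` is an assumption of this sketch.

```python
import math


def poisson_nll_loss(inputs, targets, log_input=True, eps=1e-8):
    """Mean Poisson negative log likelihood over paired lists (sketch)."""
    losses = []
    for x, t in zip(inputs, targets):
        if log_input:
            # x is log(rate): loss = exp(x) - t * x
            losses.append(math.exp(x) - t * x)
        else:
            # x is the rate itself; eps keeps log() away from zero
            losses.append(x - t * math.log(x + eps))
    return sum(losses) / len(losses)


# A zero rate with a zero target no longer produces a non-finite loss.
loss = poisson_nll_loss([0.0, 2.0], [0.0, 3.0], log_input=False)
```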
* Implement reduce for THNN ClassNLLCriterion
* Implement reduce keyword for THCUNN ClassNLLCriterion
* Implement reduce for THNN SpatialClassNLLCriterion
* Implement reduce for THCUNN SpatialClassNLLCriterion
* Make legacy NLLLoss work
* Docs for NLLLoss reduce
* reduce keyword for double backwards NLLLoss
* reduce=False tests
* Addressed comments
* Fix trailing whitespace
* Fix test failures in legacy nn
* Rebase: add reduce keyword to aten declarations of NLLLoss
* Add reference functions for all NLLLoss and NLLLoss2d test cases
* Replaced slow get/set fns. Don't use int64_t in kernels.
* Use TH_INDEX_BASE in NLLLoss for consistency
* Fix legacy ClassNLLCriterion tests
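The effect of the reduce keyword can be sketched in plain Python. This is an illustration under simplifying assumptions (no class weights, no ignore_index, 2-D input only), not the THNN kernel: with reduce=False the per-sample losses are returned, otherwise they are summed or averaged depending on size_average.

```python
def nll_loss(log_probs, targets, size_average=True, reduce=True):
    """Negative log likelihood over rows of log-probabilities (sketch)."""
    # Pick the negated log-probability of the target class for each sample.
    per_sample = [-row[t] for row, t in zip(log_probs, targets)]
    if not reduce:
        return per_sample
    total = sum(per_sample)
    return total / len(per_sample) if size_average else total


log_probs = [[-0.1, -2.3], [-1.2, -0.4]]
targets = [0, 1]
unreduced = nll_loss(log_probs, targets, reduce=False)      # one loss per sample
mean_loss = nll_loss(log_probs, targets)                    # averaged
sum_loss = nll_loss(log_probs, targets, size_average=False) # summed
```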
- Cleaned up THNN and THCUNN code and kernels
- Improved THCUNN kernel performance 5x, making it match cuDNN performance
- Added support for computing softmax over arbitrary dims
NOTE: The default dim for 3D inputs is now 1 (used to be 0)
- Both functions now accept inputs with arbitrarily many dimensions
- Autograd functions no longer save the input (it's unnecessary)
- Added cuDNN bindings for softmax, but they are unused as THCUNN
matches or even exceeds cuDNN performance
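To make "softmax over arbitrary dims" concrete, here is a plain-Python sketch over nested lists. It is not the THNN/THCUNN code; the recursive transpose trick is just a compact way to show what normalizing along a chosen dimension means.

```python
import math


def softmax(x, dim):
    """Softmax of a nested list `x` along axis `dim` (sketch)."""
    if dim > 0:
        # Recurse into each slice of the leading axis.
        return [softmax(sub, dim - 1) for sub in x]
    if not isinstance(x[0], list):
        # 1-D case: numerically stabilised softmax.
        m = max(x)
        exps = [math.exp(v - m) for v in x]
        total = sum(exps)
        return [e / total for e in exps]
    # dim == 0 on an N-D input: swap the first two axes, recurse, swap back.
    cols = [list(c) for c in zip(*x)]
    out = [softmax(c, 0) for c in cols]
    return [list(r) for r in zip(*out)]


# 3-D input of shape (2, 3, 2), normalized along dim 1.
x = [[[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]],
     [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]]
y = softmax(x, 1)
```

For every fixed pair of indices on dims 0 and 2, the three values along dim 1 of `y` sum to 1.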
* Fix docs for nn.Embedding and F.embedding.
- add description of 'sparse' argument (#3104)
- fix F.embedding example (resulted in RuntimeError)
* Make EmbeddingBag a New Style Function.
* Add a functional interface for EmbeddingBag
* Fix failing tests: add max_norm and norm_type to context,
and fix typo in backend call.
* Docfix: remove torch.manual_seed from example code.
* Add a note about using sparse keyword in Embedding function.
* Add reduce keyword to MSECriterion API
* Move gradOutput usage from py to backend
* Implement reduce keyword for THNN MSECriterion
* Implement reduce keyword for THCUNN MSECriterion
* Implement reduce keyword for MSE double backwards
* Tests for MSECriterion with reduce keyword
* Documentation for reduce for MSELoss
* Make legacy nn work with reduce keyword by ignoring it
* Apply linter suggestions
* Address comments (small changes)
* Revert "Tests for MSECriterion with reduce keyword"
This reverts commit 1c0be0defa49d336d023d7d9795db4037c92b6fe.
* Undo changes to legacy nn tests
* Reuse module test for MSELoss by creating a wrapper class for MSELoss
* Address comments: refactor MSECriterion.cu to be nicer
* Fix lint & build errors
* Add examples in functional.py
Added examples for F.cross_entropy, F.binary_cross_entropy and F.binary_cross_entropy_with_logits.
* Add ` for PyTorch docs
Added ` for PyTorch docs.
* Add examples in loss.py
Added examples for nn.BCELoss and nn.BCEWithLogitsLoss.
* added tests + removed explicit expand of weight in bce with logits
* add auto broadcasting of weight to BCELoss
* remove the need for _BCELoss
* formatting of warning
* remove TODO
* move across assert from _functions/thnn/loss.py
* flake8 fixes
* add dropout2d and dropout3d to functional
added some loss functions to functional
added tests
using dropout from backend
added docs
fixes
* edited loss modules to call functional
This takes advantage of the broadcasting behavior of torch.matmul to
support inputs with more than two dimensions. The extra dimensions are
treated like part of the batch dimension, much like nn.Bottle in Lua
Torch.
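The batch-like treatment of extra dimensions can be sketched in plain Python over nested lists. This mimics the behavior with recursion rather than with torch.matmul broadcasting, and the function name `linear` and the (out_features, in_features) weight layout are assumptions of the sketch.

```python
def linear(x, weight, bias=None):
    """Apply y = W x + b to the last dimension of a nested list `x` (sketch)."""
    if isinstance(x[0], list):
        # Leading dimensions behave like batch dimensions: map over them.
        return [linear(sub, weight, bias) for sub in x]
    out = [sum(w * v for w, v in zip(row, x)) for row in weight]
    if bias is not None:
        out = [o + b for o, b in zip(out, bias)]
    return out


weight = [[1, 0], [0, 1], [1, 1]]  # 3 output features, 2 input features
bias = [0, 0, 1]
x = [[[1, 2]], [[3, 4]]]           # shape (2, 1, 2)
y = linear(x, weight, bias)        # shape (2, 1, 3)
```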
There are a few related small performance changes:
* Addmm computes the gradient in column-major for inputs in
column-major format
* Variable.mm calls Addmm in-place with the desired output buffer
* Add SELU activation function
* Remove unnecessary case
* Add Function for SELU + tests and fix RReLU inplace
* Fix extra line in doc
* Fix tests
Remove in-place tests for RReLU. For some reason they fail on legacy nn, but pass on nn
* SELU in new-style Function
It also supports double backprop, verified with gradgradcheck
* Fix flake8
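For reference, a scalar plain-Python sketch of SELU with its first and second derivatives, checked against central finite differences. This is in the spirit of gradgradcheck but is not the torch utility; the helper names are illustrative.

```python
import math

ALPHA = 1.6732632423543772
SCALE = 1.0507009873554805


def selu(x):
    """SELU activation: scaled ELU with fixed alpha and scale."""
    return SCALE * (x if x > 0 else ALPHA * (math.exp(x) - 1.0))


def selu_grad(x):
    """First derivative of SELU."""
    return SCALE * (1.0 if x > 0 else ALPHA * math.exp(x))


def selu_grad2(x):
    """Second derivative of SELU (what double backprop needs)."""
    return 0.0 if x > 0 else SCALE * ALPHA * math.exp(x)


def numeric_grad(f, x, h=1e-6):
    """Central finite-difference estimate of f'(x)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


points = [-2.0, -0.5, 0.5, 2.0]
first_ok = all(abs(selu_grad(p) - numeric_grad(selu, p)) < 1e-6 for p in points)
second_ok = all(abs(selu_grad2(p) - numeric_grad(selu_grad, p)) < 1e-6 for p in points)
```

The check points avoid x = 0, where the second derivative is discontinuous.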
Here's the command I used to invoke autopep8 (in parallel!):
git ls-files | grep '\.py$' | xargs -n1 -P`nproc` autopep8 -i
Several rules are ignored in setup.cfg. The goal is to let autopep8
handle everything which it can handle safely, and to disable any rules
which are tricky or controversial to address. We may want to come back
and re-enable some of these rules later, but I'm trying to make this
patch as safe as possible.
Also configures flake8 to match pep8's behavior.
Also configures TravisCI to check the whole project for lint.
* Always compile .numpy() for all types
* Add torch.nn.functional docs and hidden headers
* Use sphinx to generate torchvision docs
* Remove unused import in ffi utils