Commit Graph

262 Commits

Author SHA1 Message Date
Stefan Krah
fc8834df4b Port adaptive_max_pool3d() to ATen (#19547)
Summary:
This is the second part of #18064.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19547

Differential Revision: D15046630

Pulled By: ezyang

fbshipit-source-id: 03f80602b94d47bca66bfd0dcab1b7bb99e5b7f1
2019-04-23 12:51:25 -07:00
Stefan Krah
75ce5173a9 Port adaptive_max_pool2d() to ATen (#19409)
Summary:
This is the first part of  #18064.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19409

Differential Revision: D15037390

Pulled By: ezyang

fbshipit-source-id: 16a3feed2fd9cc66033696da224a7d5fb7208534
2019-04-23 07:37:25 -07:00
Edward Yang
173f224570 Turn on F401: Unused import warning. (#18598)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18598
ghimport-source-id: c74597e5e7437e94a43c163cee0639b20d0d0c6a

Stack from [ghstack](https://github.com/ezyang/ghstack):
* **#18598 Turn on F401: Unused import warning.**

This was requested by someone at Facebook; this lint is turned
on for Facebook by default.  "Sure, why not."

I had to noqa a number of imports in __init__.  Hypothetically
we're supposed to use __all__ in this case, but I was too lazy
to fix it.  Left for future work.

Be careful!  flake8-2 and flake8-3 behave differently with
respect to import resolution for # type: comments.  flake8-3 will
report such an import as unused; flake8-2 will not.  For now, I just
noqa'd all these sites.
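
A minimal sketch (editorial, not taken from the commit) of the pattern described above: an import that is only referenced from a `# type:` comment trips F401 under flake8-3 and gets a targeted noqa. The names are illustrative.

```python
# Hypothetical example of the noqa pattern described above.
from typing import List  # noqa: F401  (only used in the "# type:" comment below)

def head(xs):
    # type: (List[int]) -> int
    return xs[0]

# In __init__.py files, re-exports were silenced the same way instead of
# switching to __all__, e.g.:
# from .submodule import something  # noqa: F401
```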

All the changes were done by hand.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Differential Revision: D14687478

fbshipit-source-id: 30d532381e914091aadfa0d2a5a89404819663e3
2019-03-30 09:01:17 -07:00
Thomas Viehmann
5360984fbd Remove TH(CU)NN Sparse Linear (#17610)
Summary:
Sparse Linear in TH(CU)NN implements sparse linear layers without
using sparse matrices.
It is currently not documented in PyTorch and there is no functional or
module interface. This means it is unused from a PyTorch point of view.

The reason for removing it is twofold:
- The module uses sort, which I would like to move to ATen.
- When we implement a SparseLinear layer, we would want to do it
  using sparse tensors, so it's not all that useful, anyway.

I checked this on slack with soumith, I hope the above is an accurate
representation. All bad ideas are my own.

This is part of the ongoing work to move
sort/topk/mode/median/kthvalue to ATen.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17610

Differential Revision: D14280663

Pulled By: gchanan

fbshipit-source-id: 289231d2c20626855ce2ceecd4f204b460c32378
2019-03-01 12:36:52 -08:00
vishwakftw
8c81a72e87 Switch to CUDA implementation if batch size >= 65536 for affine_grid (#16403)
Summary:
Changelog:

- Add a condition that switches affine_grid to the native CUDA implementation when the batch size is at least 65536

Fixes #16365
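
A small usage sketch (editorial, not part of the PR) showing the shapes `affine_grid` expects; per the commit, the switch to the native CUDA implementation happens internally for CUDA inputs whose batch dimension is at least 65536.

```python
import torch
import torch.nn.functional as F

# Batch of identity affine transforms: theta has shape (N, 2, 3) for a 4-D
# output size (N, C, H, W).
theta = torch.eye(2, 3).repeat(4, 1, 1)
grid = F.affine_grid(theta, size=(4, 3, 8, 8))  # (N, H, W, 2) sampling grid
# Per this commit, if theta is a CUDA tensor and N >= 65536, the native CUDA
# implementation is used instead of the default path.
```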

Differential Revision: D13832192

Pulled By: soumith

fbshipit-source-id: 3f484e6673d71e3ba7627b170cb8f1611e12b9b2
2019-01-26 11:18:57 -08:00
Shen Li
23e28efed4 Porting legacy reflection_pad2d to ATen
Summary:
Other changes:
1. Avoided using `THCDeviceTensor` by re-computing the mapping from the CUDA (blockIdx, threadIdx) coordinates to the input/output tensor index.
2. Changed CamelCase naming to snake_case naming.

Differential Revision: D13546803

fbshipit-source-id: 1df54f13e64934da3d803d9b6586bd5208d42d6d
2019-01-09 20:55:27 -08:00
Lin Huang
2d8b332262 Port replication_pad2d and replication_pad3d to ATen (#15538)
Summary:
Port the 2D and 3D replication padding from the legacy TH API implementation to ATen.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15538

Differential Revision: D13547567

Pulled By: lhuang04

fbshipit-source-id: decfe100d9edfdcfb62f39ee23f37b6cae0d461f
2019-01-04 17:08:14 -08:00
Shen Li
279ca4acd2 Port legacy reflection_pad1d to ATen (#15480)
Summary:
1. Avoided using `THCDeviceTensor` by re-computing the mapping from the CUDA (blockIdx, threadIdx) coordinates to the input/output tensor index.
2. Changed CamelCase naming to snake_case naming.

Profiling:

Legacy:

```bash
$ py.test test/test_nn.py -k ReflectionPad1d -v -s
....
=========== 2 passed, 1258 deselected, 800 warnings in 4.35 seconds ============
```

Now:

```bash
$ py.test test/test_nn.py -k ReflectionPad1d -v -s
...
=========== 2 passed, 1258 deselected, 800 warnings in 4.03 seconds ============
```

I have two questions about the code. Any insights are appreciated. gchanan zou3519

1. I can verify that [this magic](https://github.com/pytorch/pytorch/blob/master/aten/src/THCUNN/TemporalReflectionPadding.cu#L32-L36) correctly maps output indices to input indices in the different cases. But I have no idea how you came up with an algorithm that merges the three cases (in the left padding, in the original input, in the right padding) into a single statement. (A sketch of the three-case mapping follows after these questions.)

2. Why do we need [contiguous](https://github.com/pytorch/pytorch/blob/master/aten/src/THNN/generic/TemporalReflectionPadding.c#L80) tensors when computing the forward and backward passes?
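
A minimal Python sketch (an editorial illustration, not the kernel code) of the three-case mapping that the single-statement formula collapses, checked against `nn.ReflectionPad1d`:

```python
import torch
import torch.nn as nn

def reflect_index(j, width, pad_left):
    """Map output position j to the input position it reads from."""
    i = j - pad_left
    if i < 0:                      # in the left padding: reflect around index 0
        return -i
    if i >= width:                 # in the right padding: reflect around index width - 1
        return 2 * (width - 1) - i
    return i                       # in the original input: identity

x = torch.arange(4.0).view(1, 1, 4)              # input of width 4
y = nn.ReflectionPad1d((2, 2))(x)                # output of width 8
idx = torch.tensor([reflect_index(j, 4, 2) for j in range(8)])
assert torch.equal(y, x[:, :, idx])              # idx == [2, 1, 0, 1, 2, 3, 2, 1]
```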

Reflection_pad2d porting will come in the next PR.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15480

Differential Revision: D13544924

Pulled By: mrshenli

fbshipit-source-id: 182045434f210032a82cab721a190da0cd781fbf
2019-01-03 10:30:37 -08:00
Lin Huang
b7bc49ad70 Port replication_pad1d to ATen (#15507)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15507

Pull Request resolved: https://github.com/pytorch/pytorch/pull/15485

port replication_pad1d

Reviewed By: ezyang

Differential Revision: D13531920

fbshipit-source-id: dcd64ebd2c24b7431996231b8d5addfb600b1072
2018-12-24 06:34:02 -08:00
David Riazati
f3cc9b2218 Remove fully qualified weak script names (#15364)
Summary:
Cleanup to make references to `weak_script` consistent across codebase
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15364

Differential Revision: D13509676

Pulled By: driazati

fbshipit-source-id: 93dbbbe57e9b9b6587895f3cc6fac678babd21de
2018-12-18 16:48:52 -08:00
Roy Li
e0b261a35b Port nn fold and unfold to c++
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14597

Reviewed By: ezyang

Differential Revision: D13272227

fbshipit-source-id: 6eccab5ff5830a977398a96393b778095120edc6
2018-12-17 15:46:37 -08:00
Immanuel Alexander
64b3364209 Move adaptive avg pooling 2d to ATen native (#14714)
Summary:
adaptive_avg_pool1d, adaptive_avg_pool2d, and adaptive_avg_pool3d are neural network functions that are currently implemented in our legacy THNN (CPU) / THCUNN (CUDA) libraries.  It is generally better if these live in our new library ATen, since it is more feature-complete and reduces cognitive overhead.

This change moves adaptive_avg_pool1d and adaptive_avg_pool2d to ATen.
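
For reference, a short usage example (editorial, not from the PR) of the ported functions; the output size is chosen independently of the input size:

```python
import torch
import torch.nn.functional as F

x1 = torch.randn(1, 3, 10)
y1 = F.adaptive_avg_pool1d(x1, output_size=5)        # -> (1, 3, 5)

x2 = torch.randn(1, 3, 17, 23)
y2 = F.adaptive_avg_pool2d(x2, output_size=(5, 7))   # -> (1, 3, 5, 7)
```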

timed relevant cpu tests with this change:
```
[ialex@devgpu064.ash5 ~/pytorch] time python test/test_nn.py
test_AdaptiveAvgPool1d (__main__.TestNN)
test_AdaptiveAvgPool1d_cuda (__main__.TestNN)
test_AdaptiveAvgPool2d_single (__main__.TestNN)
test_AdaptiveAvgPool2d_single_cuda (__main__.TestNN)
test_AdaptiveAvgPool2d_tuple (__main__.TestNN)
test_AdaptiveAvgPool2d_tuple_cuda (__main__.TestNN)
test_AdaptiveAvgPool2d_tuple_none (__main__.TestNN)
test_AdaptiveAvgPool2d_tuple_none_cuda (__main__.TestNN)
test_AdaptiveAvgPool3d_single (__main__.TestNN)
test_AdaptiveAvgPool3d_single_cuda (__main__.TestNN)
test_AdaptiveAvgPool3d_tuple (__main__.TestNN)
test_AdaptiveAvgPool3d_tuple_cuda (__main__.TestNN)
test_AdaptiveAvgPool3d_tuple_none (__main__.TestNN)
test_AdaptiveAvgPool3d_tuple_none_cuda (__main__.TestNN)
test_adaptive_log_softmax (__main__.TestNN)
test_adaptive_pooling_input_size (__main__.TestNN)
test_adaptive_pooling_size_none (__main__.TestNN)
.s.s.s.s.s.s.s...
----------------------------------------------------------------------
Ran 17 tests in 6.273s

OK (skipped=7)

real	0m7.164s
user	3m1.289s
sys	0m0.905s
```

compared to master:
```
[ialex@devgpu064.ash5 ~/pytorch] time python test/test_nn.py
test_AdaptiveAvgPool1d (__main__.TestNN)
test_AdaptiveAvgPool1d_cuda (__main__.TestNN)
test_AdaptiveAvgPool2d_single (__main__.TestNN)
test_AdaptiveAvgPool2d_single_cuda (__main__.TestNN)
test_AdaptiveAvgPool2d_tuple (__main__.TestNN)
test_AdaptiveAvgPool2d_tuple_cuda (__main__.TestNN)
test_AdaptiveAvgPool2d_tuple_none (__main__.TestNN)
test_AdaptiveAvgPool2d_tuple_none_cuda (__main__.TestNN)
test_AdaptiveAvgPool3d_single (__main__.TestNN)
test_AdaptiveAvgPool3d_single_cuda (__main__.TestNN)
test_AdaptiveAvgPool3d_tuple (__main__.TestNN)
test_AdaptiveAvgPool3d_tuple_cuda (__main__.TestNN)
test_AdaptiveAvgPool3d_tuple_none (__main__.TestNN)
test_AdaptiveAvgPool3d_tuple_none_cuda (__main__.TestNN)
test_adaptive_log_softmax (__main__.TestNN)
test_adaptive_pooling_input_size (__main__.TestNN)
test_adaptive_pooling_size_none (__main__.TestNN)
.s.s.s.s.s.s.s...
----------------------------------------------------------------------
Ran 17 tests in 7.232s

OK (skipped=7)

real	0m8.065s
user	3m34.714s
sys	0m2.440s
```

also timed relevant cuda tests with this change:
```
[ialex@devgpu064.ash5 ~/pytorch] time python test/test_nn.py
test_AdaptiveAvgPool1d (__main__.TestNN)
test_AdaptiveAvgPool1d_cuda (__main__.TestNN)
test_AdaptiveAvgPool2d_single (__main__.TestNN)
test_AdaptiveAvgPool2d_single_cuda (__main__.TestNN)
test_AdaptiveAvgPool2d_tuple (__main__.TestNN)
test_AdaptiveAvgPool2d_tuple_cuda (__main__.TestNN)
test_AdaptiveAvgPool2d_tuple_none (__main__.TestNN)
test_AdaptiveAvgPool2d_tuple_none_cuda (__main__.TestNN)
test_AdaptiveAvgPool3d_single (__main__.TestNN)
test_AdaptiveAvgPool3d_single_cuda (__main__.TestNN)
test_AdaptiveAvgPool3d_tuple (__main__.TestNN)
test_AdaptiveAvgPool3d_tuple_cuda (__main__.TestNN)
test_AdaptiveAvgPool3d_tuple_none (__main__.TestNN)
test_AdaptiveAvgPool3d_tuple_none_cuda (__main__.TestNN)
test_adaptive_log_softmax (__main__.TestNN)
test_adaptive_pooling_input_size (__main__.TestNN)
test_adaptive_pooling_size_none (__main__.TestNN)
.................
----------------------------------------------------------------------
Ran 17 tests in 21.049s

OK

real	0m24.106s
user	0m20.890s
sys	0m4.026s
```

compared to master
```
[ialex@devgpu064.ash5 ~/pytorch] time python test/test_nn.py
test_AdaptiveAvgPool1d (__main__.TestNN)
test_AdaptiveAvgPool1d_cuda (__main__.TestNN)
test_AdaptiveAvgPool2d_single (__main__.TestNN)
test_AdaptiveAvgPool2d_single_cuda (__main__.TestNN)
test_AdaptiveAvgPool2d_tuple (__main__.TestNN)
test_AdaptiveAvgPool2d_tuple_cuda (__main__.TestNN)
test_AdaptiveAvgPool2d_tuple_none (__main__.TestNN)
test_AdaptiveAvgPool2d_tuple_none_cuda (__main__.TestNN)
test_AdaptiveAvgPool3d_single (__main__.TestNN)
test_AdaptiveAvgPool3d_single_cuda (__main__.TestNN)
test_AdaptiveAvgPool3d_tuple (__main__.TestNN)
test_AdaptiveAvgPool3d_tuple_cuda (__main__.TestNN)
test_AdaptiveAvgPool3d_tuple_none (__main__.TestNN)
test_AdaptiveAvgPool3d_tuple_none_cuda (__main__.TestNN)
test_adaptive_log_softmax (__main__.TestNN)
test_adaptive_pooling_input_size (__main__.TestNN)
test_adaptive_pooling_size_none (__main__.TestNN)
.................
----------------------------------------------------------------------
Ran 17 tests in 23.021s

OK

real	0m27.095s
user	0m20.121s
sys	0m3.668s
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14714

Differential Revision: D13384084

Pulled By: xnder

fbshipit-source-id: 344442103ccbbda72d3c010d2feea00e9985d226
2018-12-12 12:25:22 -08:00
Elias Ellison
82175f31b4 Move Affine grid to C++ (#14392)
Summary:
Port AffineGrid to C++, because script does not support compiling Function classes.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14392

Differential Revision: D13219698

Pulled By: eellison

fbshipit-source-id: 3ddad8a84c72010b5a6c6f7f9712be614202faa6
2018-11-27 18:38:11 -08:00
William Horton
1bec8f773b Move ConstantPadNd into ATen (#10885)
Summary:
Addresses #9499. Work on the forward function is complete and its tests should be passing; the backward function is still in progress.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10885

Differential Revision: D9643786

Pulled By: SsnL

fbshipit-source-id: 2930d6f3d2975c45b2ba7042c55773cbdc8fa3ac
2018-10-26 15:25:27 -07:00
Adam Paszke
6d6655e6be Port PackedSequences functions to C++ (#11224)
Summary:
zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11224

Differential Revision: D9652703

Pulled By: apaszke

fbshipit-source-id: 558e39457e590cad07516e5bb2ecb12789564950
2018-09-05 06:35:15 -07:00
Adam Paszke
542aadd9a7 Stop using symbolic override for tracing RNNs (#10638)
Summary:
This disables the symbolic override hacks and makes tracing emit the recently added ATen ops for RNNs (`aten::lstm`, `aten::gru`, ...). I managed to reuse pretty much all of the translation code for their symbolics.
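
A small illustration (editorial sketch, assuming current tracing behaviour) of what the change means in practice: tracing an `nn.LSTM` now records a single `aten::lstm` node instead of going through the symbolic-override machinery.

```python
import torch

lstm = torch.nn.LSTM(input_size=4, hidden_size=8, batch_first=True)
x = torch.randn(2, 5, 4)
traced = torch.jit.trace(lstm, (x,))
# The printed graph should contain an aten::lstm node rather than a
# Python-level symbolic override.
print(traced.graph)
```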

zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10638

Differential Revision: D9385830

Pulled By: apaszke

fbshipit-source-id: ff06ef7b1ae7c3b7774825e0991bc3887e1ff59b
2018-08-24 20:25:58 -07:00
Tongzhou Wang
de11a5fb28 Resubmit #8322 with scipy version check
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10775

Differential Revision: D9458207

Pulled By: SsnL

fbshipit-source-id: f2b0dbf2d236134afded9b15d8bf55ff98f50e7b
2018-08-22 13:39:49 -07:00
Adam Paszke
d35f365ad5 Remove all cuDNN specific inputs to RNN functions (#10581)
Summary:
This is still not the final PR, but it removes all blockers for actually using the RNN functions directly in the JIT. The next patch should be final, and will actually remove the symbolic_override code and change it to proper symbolics for those ATen functions. It turns out the symbolic code can also be cleaned up a bit, and I'll do that too.

zdevito ezyang
colesbury (for minor DispatchStub.h) changes

There was no way to handle these cuDNN-specific inputs in the JIT for now, and they
turned out to be completely unnecessary. This should make the Python and C++
module code much simpler too, since all the logic is now centralized
in the native functions.

The downside is that RNN modules no longer own their dropout buffers,
which are shared per-device instead (with appropriate locking and
synchronization). This might appear as a perf regression at first, but
in reality it's highly unlikely that anyone will want to run cuDNN RNNs
on the same GPU in parallel.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10581

Reviewed By: colesbury

Differential Revision: D9365541

Pulled By: apaszke

fbshipit-source-id: 3ef8677ee5481bae60c74a9117a2508665b476b5
2018-08-17 11:09:51 -07:00
Simon Wang
a129f9ad3b Revert D9332335: [pytorch][PR] Implements volumetric (5d) affine grid generation.
Differential Revision:
D9332335

Original commit changeset: 1b3a91d078ef

fbshipit-source-id: 3dcce680257a6da121f5d67918ed4236e0c5bfec
2018-08-15 15:25:11 -07:00
Adam Paszke
86363e1d8e Move RNN implementations to C++ (#10481)
Summary:
This is the first of two changes that are supposed to improve how we handle RNNs in the JIT. They still get traced as `PythonOp`s, but now it will be much easier to actually expose them to the JIT as e.g. `aten::lstm`, and ignore the Python interpreter entirely. This needs some symbolic adjustments that will be part of a second PR.

Even when we fix symbolics, there will still be a bit of a problem with statefulness of the cuDNN API (we need a mutable cache for the dropout state, but our IR has no way of representing that).

zdevito ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10481

Reviewed By: ezyang

Differential Revision: D9341113

Pulled By: apaszke

fbshipit-source-id: 0ae30ead72a1b12044b7c12369d11e5ca8ec30b5
2018-08-15 13:25:41 -07:00
Eli Stevens
f5a4dd89b5 Implements volumetric (5d) affine grid generation. (#8322)
Summary:
I've implemented affine grid generation for volumetric (5-D) inputs. The implementation is based on the spatial implementation, extended by one dimension. I have a few questions about my implementation vs. the existing one that I will add inline.
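
A brief usage sketch (editorial, not from the PR) of the volumetric case: theta has shape (N, 3, 4) and the target size is 5-D, so the generated grid can be fed to the volumetric grid sampler.

```python
import torch
import torch.nn.functional as F

theta = torch.eye(3, 4).repeat(2, 1, 1)             # batch of identity 3-D affine transforms
grid = F.affine_grid(theta, size=(2, 1, 4, 6, 8))   # (N, D, H, W, 3) sampling grid
vol = torch.randn(2, 1, 4, 6, 8)
out = F.grid_sample(vol, grid)                      # volumetric sampling, shape (2, 1, 4, 6, 8)
```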

I have some extensive test cases for the forward pass here: https://gist.github.com/elistevens/6e3bfb20d8d0652b83bd16b3e911285b However, they use `pytest.fixture` extensively, so I'm not sure the best way to incorporate them into the pytorch test suite. Suggestions? I have not tested backwards at all.

Diff probably best viewed with whitespace changes ignored.

Thanks for considering!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/8322

Differential Revision: D9332335

Pulled By: SsnL

fbshipit-source-id: 1b3a91d078ef41a6d0a800514e49298fd817e4df
2018-08-15 11:02:08 -07:00
Adam Paszke
adbcb3c1dc Move dropout and alpha dropout to ATen (#10384)
Summary:
zdevito ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10384

Reviewed By: ezyang

Differential Revision: D9272583

Pulled By: apaszke

fbshipit-source-id: ed5d37b28ce9ff25800bbaa0daf066cfbf1f9921
2018-08-10 14:55:28 -07:00
Adam Paszke
be5fb8f6fd Move fused RNN kernels into ATen (#10305)
Summary:
As in the title. I also did a small refactor that let us lose almost 400 lines of code. This is a first step in moving the RNN code to C++.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10305

Reviewed By: ezyang

Differential Revision: D9196227

Pulled By: apaszke

fbshipit-source-id: 54da905519aade29baa63ab1774a3ee1db5663ba
2018-08-10 09:12:05 -07:00
Natalia Gimelshein
5bb21493fd add fused dropout kernels (#9666)
Summary:
While waiting for dropout to be fully ported to ATen, here's a performance fix for the most common dropout case. Dropout is still a Python function; I just added an efficient path to it. I could not make the in-place variant work, because the generator always emits `return self` for in-place functions, and I need to return both the original tensor and the mask, so the in-place variant stays on the existing path. Even with the non-in-place version, memory use is only a little higher than for in-place dropout, since the mask is now a ByteTensor.
Once dropout is moved to ATen, these kernels can still be used for an efficient implementation.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9666

Reviewed By: SsnL

Differential Revision: D8948077

Pulled By: ezyang

fbshipit-source-id: 52990ef769471d957e464af635e5f9b4e519567a
2018-08-07 13:34:53 -07:00
Sam Gross
829d763c69 Implement add, sub, mul, div using TensorIterator (#8919)
Summary:
```
This adds TensorIterator, a helper class for computing element-wise
operations that's intended to replace the CPU and CUDA apply utils
functions.

CPU kernels are implemented as functions that operate on strided 1-d
tensors, in contrast to CPUApplyUtils, which operated on individual elements. This
allows the kernels to handle vectorization, while TensorIterator handles
parallelization and non-coalesced dimensions.

GPU kernels continue to operate on elements, but the number of
specializations is reduced. The contiguous case remains the same. The
non-contiguous case uses a single (reduced) shape for all operands and
the fast integer division from THCIntegerDivider. To avoid extra
specializations for indexing with 64-bits, large operations are split
into smaller operations that can be indexed with 32-bits.

Major semantic changes:

 - No more s_add, s_mul, s_div, or s_sub. Broadcasting is handled by
   TensorIterator. The autograd engine performs the reduction assuming
   standard broadcasting if the gradient shape does not match the
   expected shape. Functions that do not use standard broadcasting rules
   should either continue to trace the expand calls or handle the
   reduction in their derivative formula.

 - Use ONNX v7, which supports broadcasting ops.

Performance impact:

 - Small increased fixed overhead (~0.5 us)
 - Larger overhead for wrapped numbers (~2.5 us)
 - No significant change for ops on contiguous tensors
 - Much faster worst-case performance for non-contiguous GPU tensors
 - Faster CPU bias addition (~2x)
 - Faster GPU bias addition (~30% faster)

Future work:

 - Decrease overhead, especially for wrapping numbers in Tensors
 - Handle general inter-type operations
 - Extend to unary ops and reductions
 - Use buffering for compute-bound operations on non-contiguous tensors
   (pull in from CPUApplyUtils)
```
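
To make the broadcasting change concrete, here is a small example (editorial, not from the PR): the gradient of the broadcast operand is reduced back to its original shape by the autograd engine.

```python
import torch

a = torch.randn(4, 3, requires_grad=True)
b = torch.randn(3, requires_grad=True)   # broadcast across the first dimension
(a + b).sum().backward()
print(a.grad.shape)  # torch.Size([4, 3])
print(b.grad.shape)  # torch.Size([3]) -- reduced from the broadcast shape (4, 3)
```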
Pull Request resolved: https://github.com/pytorch/pytorch/pull/8919

Differential Revision: D8677600

Pulled By: colesbury

fbshipit-source-id: 61bc9cc2a36931dfd00eb7153501003fe0584afd
2018-07-27 14:43:24 -07:00
Adam Paszke
aa7af94656 Make JIT tracing a thread-local property (#9414)
Summary:
As in the title. Lets us simplify a lot of code.

Depends on #9363, so please review only the last commit.

zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9414

Reviewed By: zdevito

Differential Revision: D8836496

Pulled By: apaszke

fbshipit-source-id: 9b3c3d1f001a9dc522f8478abc005b6b86cfa3e3
2018-07-19 19:09:39 -07:00
tippisum
5c695e3a60 Implement 2D and 3D alpha_dropout (#9073)
Summary:
It implements per-channel alpha_dropout. It also creates the corresponding function classes and unifies the handling of dropout and alpha_dropout.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9073
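
A short usage sketch (editorial; it assumes the per-channel variant is the one exposed today as `feature_alpha_dropout`):

```python
import torch
import torch.nn.functional as F

x = torch.randn(2, 4, 8, 8)
# Drops entire feature maps (channels) rather than individual elements, while
# keeping the self-normalizing statistics that alpha dropout is designed for.
y = F.feature_alpha_dropout(x, p=0.5, training=True)
```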

Differential Revision: D8727008

Pulled By: ezyang

fbshipit-source-id: 9d509f9c5db4e98f7b698cdfc4443505a4d2b331
2018-07-17 17:10:16 -07:00
Roy Li
a47a30b9ce Implement grid_sampler in aten (#8929)
Summary:
Partially addresses #8928.

Maybe #7273?
Pull Request resolved: https://github.com/pytorch/pytorch/pull/8929

Reviewed By: ezyang

Differential Revision: D8668919

Pulled By: li-roy

fbshipit-source-id: 8ad07b224d2ab211c274c4c10f042501efaae32c
2018-07-10 15:10:24 -07:00
Wei Yang
cb1bfe91af Deprecated several functions at torch.nn.functional (#8748)
Summary:
1. Fixes #6245
2. Deprecates tanh and sigmoid in torch.nn.functional (see the example below)
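
A minimal before/after sketch (editorial) of the deprecation:

```python
import torch
import torch.nn.functional as F

x = torch.randn(3)
y_old = F.sigmoid(x)       # deprecated by this change
y_new = torch.sigmoid(x)   # preferred spelling
z_new = torch.tanh(x)      # likewise for tanh
```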
Closes https://github.com/pytorch/pytorch/pull/8748

Differential Revision: D8697975

Pulled By: weiyangfb

fbshipit-source-id: f30714aa0611a1fe870040692f3dbcc8238aece9
2018-07-02 15:54:46 -07:00
Peter Goldsborough
9ce15173fb Move _cudnn_init_dropout_state to TensorOptions and enable cuDNN dropout in C++ API RNNs (#9012)
Summary:
The goal of this PR was to add support for dropout descriptors in the C++ API's RNN class.
The end result is a 4x-5x speedup for our RNN integration tests since they can now use cuDNN instead of autograd when dropout is set.

To achieve this, I had to move `_cudnn_init_dropout_state` to the `TensorOptions` API.

I also fixed a bug around `RNN::cuda()` not flattening parameters for cuDNN.

ebetica ezyang
Closes https://github.com/pytorch/pytorch/pull/9012

Reviewed By: pjh5

Differential Revision: D8689786

Pulled By: goldsborough

fbshipit-source-id: 44fb191f5a38e41c4ded5417306b5bbc012cd56c
2018-06-29 17:25:23 -07:00
Thomas Viehmann
9a9eadacc6 explicitly check device for grid_sampler (fixes: #8599) (#8646) 2018-06-19 11:53:46 -04:00
Soumith Chintala
92f67d9404 fix lint 2018-06-15 18:18:20 -07:00
li-roy
26bed6d83e assert limit on cudnn grid_sampler (#8576) 2018-06-15 17:33:33 -07:00
ngimel
63ae163b24 put dropout states on the input device (#7515)
* put dropout states on the input device

* add assert to aten, add test, fix lint

* only assert device if states are defined
2018-05-13 16:25:37 -04:00
Tongzhou Wang
1c01eabd3c Codemod to update our codebase to 0.4 standard (#6641)
* Codemod to update our codebase to 0.4 standard

* Update some of the test scripts

* remove Variable in test_clip_grad_value

* fix _symbolic_override_wrapper_maker
2018-04-17 22:06:54 -04:00
James Reed
3b0204d43c [JIT] Hacky: Staged symbolics for RNN nodes (#6297)
* Staged symbolic for RNN modules

* Move function to symbolic.py

* Add comments, improve tests, fixup logic
2018-04-12 16:29:25 -07:00
gchanan
749d51414a Separate cuda-ness from dtype. (#6470)
* Separate cuda-ness from dtype.

There is no longer torch.cuda.int64, etc.; only torch.int64, which corresponds to at::ScalarType.
At the Python arg parser level, the corresponding ATen type is selected from the combination of (ScalarType, Layout, Device). (See the example after this commit's change list.)

There is also currently unused code in here for supporting ScalarType in native_functions; this will be used for specifying aggregate types
for reduction functions.

* Fix test_autograd.

* Add defaults to randint_like.

* Track is_cuda in py tensor types.

* Fix test_sparse.

* Fix multiprocessing.

* Fix rnn.

* Fix test_nn.

* Fix flake8.
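
A small example (editorial) of the resulting API: the scalar type and the device are specified independently rather than through a combined torch.cuda.int64 type.

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
t = torch.zeros(3, dtype=torch.int64, device=device)  # dtype and device are orthogonal
print(t.dtype, t.device)                               # torch.int64 cuda:0 (or cpu)
```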
2018-04-12 14:05:44 -04:00
Richard Zou
5d628db0a2 Deprecate ctx.saved_variables via python warning. (#5923)
* Deprecate ctx.saved_variables via python warning.

Advises replacing saved_variables with saved_tensors.
Also replaces all instances of ctx.saved_variables with ctx.saved_tensors in the
codebase.

Test by running:
```
import torch
from torch.autograd import Function

class MyFunction(Function):
    @staticmethod
    def forward(ctx, tensor1, tensor2):
        ctx.save_for_backward(tensor1, tensor2)
        return tensor1 + tensor2

    @staticmethod
    def backward(ctx, grad_output):
        var1, var2 = ctx.saved_variables
        return (grad_output, grad_output)

x = torch.randn((3, 3), requires_grad=True)
y = torch.randn((3, 3), requires_grad=True)
MyFunction.apply(x, y).sum().backward()
```
and assert the warning shows up.

* Address comments

* Add deprecation test for saved_variables
2018-03-26 14:13:45 -04:00
li-roy
e4eee7c2cf Implement MarginRankingLoss as native function and add reduce=True arg to it (#5346)
* add reduce=True arg to MarginRankingLoss

* make default margin arg match for legacy

* remove accidentally added test

* fix test

* fix native_functions.yaml alphabetical order
2018-03-21 15:40:58 -04:00
li-roy
1dcad08537 Support N-D tensors in Bilinear (#5764)
* support n-d inputs in bilinear and move to aten

* support n-d inputs in bilinear and move to aten

* add asserts to bilinear inputs

* address comments

* cast int64_t in asserts
2018-03-17 11:57:43 -04:00
li-roy
4c4a42b3f9 implement CosineEmbeddingLoss as a native function and add reduce arg (#5646)
* implement CosineEmbeddingLoss as a native function and add reduce=True arg to it

* fix flake8

* address comments

* add reference function to tests

* fix flake8
2018-03-08 17:54:24 -05:00
Edward Z. Yang
9de922991c Revert "implement CosineEmbeddingLoss as a native function and add reduce arg" (#5640)
* Revert "implement CosineEmbeddingLoss as a native function and add reduce arg (#5447)"

This reverts commit c16478fe3f.
2018-03-08 14:07:17 -05:00
li-roy
c16478fe3f implement CosineEmbeddingLoss as a native function and add reduce arg (#5447)
forward (new) [1.1905965859768912, 1.160144692985341, 1.1558120870031416]
backward (new) [1.9150976981036365, 1.9792822760064155, 1.8779143309220672]
double backward (new) [3.6898688060464337, 3.5784677929477766, 3.569505032035522]

forward (old) [3.2359962839400396, 3.275224728975445, 3.3409753759624436]
backward (old) [5.668679727939889, 5.722980880062096, 5.585088661056943]
double backward (old) N/A

* implement CosineEmbeddingLoss as a native function and add reduce=True arg to it

* fix flake8

* address comments

* add reference function to tests

* fix flake8
2018-03-08 13:15:12 -05:00
Edward Z. Yang
f064c5aa33 Expunge all occurrences of torch._C._VariableFunctions (#5525)
Some of the call-sites now look a little hokey with this
removed, saving that for another patch.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-03-02 12:19:44 -05:00
Edward Z. Yang
0877558e60 Port cuDNN RNN dropout state initialization to ATen and make Python c… (#5383)
* Port cuDNN RNN dropout state initialization to ATen and make Python code use it.

Fixes #5138.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Variable/Tensor bugfix

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-03-02 10:00:00 -05:00
Soumith Chintala
36abf023bd Added 3d grid sampler (for volumetric transformer networks) (#5453)
* add 3d grid_sample

* add cuda implementation, more testing
2018-02-28 19:32:15 -05:00
gchanan
affe742d31 Add scalar module tests for test_nn. (#5116)
* Add scalar module tests for test_nn.

* Properly return from glu.

* Guard scalar test with skipIf.
2018-02-08 13:53:24 -05:00
anderspapitto
ef14590209 Support calling pack_padded_sequence with Variable lengths (#5113)
This was accidentally lost while addressing review comments on
https://github.com/pytorch/pytorch/pull/4695

pack_padded_sequence may be called either with a list or with a
Variable. If called with a list we convert to Variable internally.

I added a test to test_nn to cover the new code path. The bug was also caught
by the onnx-fb-universe tests (which rely on passing in a Variable).
2018-02-07 17:11:33 -05:00
anderspapitto
b2cfd961d3 Handle sequence lengths correctly when exporting RNNs to ONNX (#4695)
* PackedSequence: store batch_sizes as tensor

rather than converting to a list of python integers. This maintains
the invariant that module's inputs/outputs are collections of
Variables.

In particular, this causes the JIT to no longer choke when flattening
and unflattening arguments.

* Handle sequence lengths correctly when exporting RNNs to ONNX

- when uniform sequence lengths are provided, correctly omit the
  argument when constructing the ONNX graph, so as to not fix the
  graph to the batch size.

- handle PackedSequences by floating them through the graph and
  eliminating them in an optimization pass. ONNX does not have packed
  sequences, but operates on a representation equivalent to
  PaddedSequence, so we hide the representation-switching from ONNX

- as a preliminary step towards handling PackedSequences, not directly
  tied to ONNX export, change batch_sizes from being an argument to
  the RNN operators into being an argument to the forward() function
  of those RNN operators. This more closely models the reality that
  batch_sizes are effectively part of the input sequences.
2018-02-06 21:40:27 -05:00
Sam Gross
895aebac08 Use Variable instead of Tensor in Function.forward (#4786)
The Tensor and Variable classes are being merged.
autograd.Function.forward is now called on Variables, but with "no-grad"
mode (torch.no_grad()) enabled.

One benefit is that we no longer have to explicitly track shared
storages.
2018-02-06 17:24:27 -05:00