282 Commits

Author SHA1 Message Date
Elias Ellison
82175f31b4 Move Affine grid to C++ (#14392)
Summary:
Port AffineGrid to C++, because script does not support compiling Function classes.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14392

Differential Revision: D13219698

Pulled By: eellison

fbshipit-source-id: 3ddad8a84c72010b5a6c6f7f9712be614202faa6
2018-11-27 18:38:11 -08:00
David Riazati
1b80644b4d Revert D13192228: [pytorch][PR] [jit] Add boolean dispatch for function overloading
Differential Revision:
D13192228

Original commit changeset: fce33c400c1f

fbshipit-source-id: 75c9991dc7097f9513c6c89d16eff2de6e287c3b
2018-11-27 13:14:42 -08:00
David Riazati
66c8bbf021 Add boolean dispatch for function overloading (#14081)
Summary:
This PR allows overloading functions based on the value of a parameter (so long as it is a constant). See `max_pool1d` for an example usage.

This is the first step in enabling the use of `max_pool` functions for the standard library that can return `Tensor` or `Tuple[Tensor, Tensor]` based on the `return_indices` flag. This will give the JIT identical results to the Python versions of the functions.
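
The dispatch idea can be sketched in plain Python. This is only an illustration, not the JIT implementation: `boolean_dispatch_sketch` and the toy `max` functions are hypothetical names, and the choice here happens at call time, whereas the PR resolves it at compile time because the flag is a constant.

```python
def boolean_dispatch_sketch(arg_name, arg_index, default, if_true, if_false):
    """Return a function that forwards to if_true or if_false.

    The choice depends on the boolean passed as `arg_name` (keyword) or at
    position `arg_index` (positional); `default` is used when it is omitted.
    """
    def dispatcher(*args, **kwargs):
        flag = default
        if arg_name in kwargs:
            flag = kwargs[arg_name]
        elif arg_index < len(args):
            flag = args[arg_index]
        chosen = if_true if flag else if_false
        return chosen(*args, **kwargs)
    return dispatcher


def _toy_max_with_indices(values, return_indices=False):
    # Returns a (max value, index of max) pair.
    best = max(values)
    return best, values.index(best)


def _toy_max(values, return_indices=False):
    # Returns only the max value.
    return max(values)


# Hypothetical stand-in for a function whose return type depends on a flag.
toy_max = boolean_dispatch_sketch(
    "return_indices", 1, False, _toy_max_with_indices, _toy_max)

print(toy_max([3, 9, 4]))
print(toy_max([3, 9, 4], return_indices=True))
```

Each branch keeps a single, fixed return type, which is what lets a statically typed compiler accept both.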

Depends on #14232 for `Optional[BroadcastingList[T]]`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14081

Differential Revision: D13192228

Pulled By: driazati

fbshipit-source-id: fce33c400c1fd06e59747d98507c5fdcd8d4c113
2018-11-27 10:51:32 -08:00
Wanchao Liang
7fc34a4122 Convert gumbel_softmax, lp pooling weak functions and modules (#14232)
Summary:
1. Support type annotations such as `Optional[BroadcastingList1[int]]`, which accept an `int` or a `List[int]`
2. Convert gumbel_softmax, lp pooling weak functions and modules
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14232

Differential Revision: D13164506

Pulled By: wanchaol

fbshipit-source-id: 6c2a2b9a0613bfe907dbb5934122656ce2b05700
2018-11-21 23:44:24 -08:00
David Riazati
d9cdcc9a3b Add list inequality operator (#14129)
Summary:
This PR adds `aten::neq` for list inequality comparisons and converts
`nll_loss` to weak script
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14129

Differential Revision: D13123894

Pulled By: driazati

fbshipit-source-id: 8c1edf7c163217ec00eb653f95d196db3998613f
2018-11-21 16:32:58 -08:00
David Riazati
8f20d40bb7 Allow undefined tensors as constants (#14120)
Summary:
This PR inserts `prim::None` constants for undefined tensors. This comes up in the standard library when an `Optional[Tensor]` is statically determined to be `None`:

```python
import torch

@torch.jit.script
def fn(x=None):
    # type: (Optional[Tensor]) -> Tensor
    return torch.jit._unwrap_optional(x)

@torch.jit.script
def fn2():
    # type: () -> Tensor
    return fn()
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14120

Differential Revision: D13124625

Pulled By: driazati

fbshipit-source-id: 9eaa82e478c49c503f68ed89d8c770e8273ea569
2018-11-20 16:54:27 -08:00
Wanchao Liang
d6bfc53b9e Export BatchNorm functional and module, add necessary JIT support (#14016)
Summary:
This PR does three things:

1. It exports the BatchNorm functional and module, and rewrites some of the components to stay aligned with the currently supported JIT features
2. In the process of exporting, it adds the necessary compiler support for in-place op augmented assignment
3. It changes the test_jit behavior in add_module_test to use a single rng state during module initialization
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14016

Differential Revision: D13112064

Pulled By: wanchaol

fbshipit-source-id: 31e3aee5fbb509673c781e7dbb6d8884cfa55d91
2018-11-20 14:15:06 -08:00
David Riazati
0d29846d5e Convert more weak functions (#14003)
Summary:
Same deal as #13707
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14003

Differential Revision: D13076403

Pulled By: driazati

fbshipit-source-id: eb3cb3b2c31caf1de591b613bdc4c9a6ed4e1767
2018-11-15 16:45:50 -08:00
Wanchao Liang
6d094224b9 Fix optional import/export, export multi-margin-loss (#13877)
Summary:
This PR does two things:

1. It fixes optional import/export to include any type, including tensor types (previously only base types were supported). This is essential to unblock optional tensor type annotations in our test logic
2. It tries to export the multi_margin_loss functional to serve as an example of the optional undefined tensor use case.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13877

Differential Revision: D13076090

Pulled By: wanchaol

fbshipit-source-id: c9597295efc8cf4b6462f99a93709aae8dcc0df8
2018-11-15 00:45:22 -08:00
Xiang Gao
143ba72264 Move cosine_similarity to ATen (#12199)
Summary:
I'm traveling and don't have access to a good computer to compile and test this myself. I will wait for the outcome of CI.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12199

Differential Revision: D13062326

Pulled By: nairbv

fbshipit-source-id: 85873525caa94906ccaf2c739eb4cd55a72a4ffd
2018-11-14 10:41:44 -08:00
David Riazati
5163a28917 Convert more weak functions (#13707)
Summary:
Convert some more functions to match up with features added. Some
conversions were unsuccessful but the type line was left in for later.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13707

Differential Revision: D13030210

Pulled By: driazati

fbshipit-source-id: 02d5712779b83b7f18d0d55539e336321335e0cc
2018-11-13 13:50:57 -08:00
David Riazati
0c375571f5 Support OptionalType export and type match (#13647)
Summary:
* Adds `OptionalType` support for import/export
    * Optionals get exported along with their contained type, i.e. 'Optional[int]'
* Allows concrete types and `None` to be passed to an op that takes an optional
* Converts `softmax`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13647

Differential Revision: D12954672

Pulled By: driazati

fbshipit-source-id: 159e9bfb7f3e398bec3912d414c393098cc7455a
2018-11-12 12:15:25 -08:00
Wanchao Liang
79ceecec8e Optional undefined tensor support (#13650)
Summary:
This PR is part of the task to unblock standard library export.
* We treat None differently from Tensor and other types: when None is passed as a Tensor, it is an undefined tensor rather than the None IValue.
* Refine the type system so that we have the correct tensor type hierarchy (Dynamic/Tensor/CompleteTensor); Dynamic should be at the top of the inheritance hierarchy.
* It also tries to export bilinear as an example of an undefined tensor (None) input.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13650

Differential Revision: D12967026

Pulled By: wanchaol

fbshipit-source-id: 6aedccc7ce2a12fadd13d9e620c03e1260103a5a
2018-11-09 11:29:57 -08:00
Dan Zheng
51f58f0990 Fix typo in CTC loss doc comments. (#13727)
Summary:
`target_lenghts` -> `target_lengths`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13727

Differential Revision: D12981582

Pulled By: zou3519

fbshipit-source-id: e5e02b26cf3030a91494655ff863273333cc4133
2018-11-08 14:50:48 -08:00
David Riazati
556ff8e7b7 Add builtins for size() and list with defaults (#13639)
Summary:
* `aten::size()` to match `torch.Tensor.size`
* `aten::list_with_default` for semantics of `torch.nn.modules.utils.list_with_default`
* converts `adaptive_avg_pool2d` and `adaptive_avg_pool3d`
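
As a rough sketch of the semantics the new builtin mirrors, assuming the Python helper fills `None` entries of the requested output size from the trailing input dimensions (the function name below is hypothetical):

```python
def list_with_default_sketch(out_size, defaults):
    """Replace None entries in out_size with the matching trailing defaults.

    out_size: an int, or a list whose entries are ints or None.
    defaults: typically the full input size, e.g. [N, C, H, W].
    """
    if isinstance(out_size, int):
        return out_size
    if len(defaults) <= len(out_size):
        raise ValueError(
            "Input dimension should be at least {}".format(len(out_size) + 1))
    # Only the last len(out_size) entries of defaults line up with out_size.
    trailing = defaults[-len(out_size):]
    return [v if v is not None else d for v, d in zip(out_size, trailing)]


# An adaptive pool asked for output size (None, 7) on a 32x48 input
# keeps the height and resizes the width.
print(list_with_default_sketch([None, 7], [1, 3, 32, 48]))
```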
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13639

Differential Revision: D12954670

Pulled By: driazati

fbshipit-source-id: 68c30af0efc02c60af5fb8c9715b2435cc01a0d9
2018-11-08 11:26:35 -08:00
David Riazati
4472ad3b2f Move functional _Reduction to its own module (#13401)
Summary:
To support `_Reduction` in the JIT, this PR moves it out to a new file so that it goes through the paths for Python modules in the script compiler. It also converts `F.ctc_loss` to weak script.

Depends on #13484 for saving rng state
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13401

Differential Revision: D12868501

Pulled By: driazati

fbshipit-source-id: 23cec0fb135744578c73e31ac825e238db495d27
2018-11-08 01:04:10 -08:00
Gregory Chanan
7341ab0a33 Fix range of target examples and JIT test case for CTC loss.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13644

Differential Revision: D12949733

Pulled By: gchanan

fbshipit-source-id: 1c4cacbb6a50d5002165bdd0a7881883db5c8249
2018-11-07 07:04:31 -08:00
David Riazati
fc6a9a19ea Add torch._C._nn built-in, more weak fns (#13322)
Summary:
This PR adds functions defined in `torch._C._nn` as builtin functions (including inplace variants). This allows for the conversion of more functions to weak script

NB: many `torch.nn.functional` functions will have to be slightly rewritten to avoid early returns (as with `threshold` in this PR)
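
The kind of rewrite meant here can be sketched with plain Python lists standing in for tensors. Both function names are hypothetical and this is not the actual `threshold` source; it only shows an early-return body next to an equivalent single-exit body.

```python
def threshold_early_return(values, threshold, value, inplace=False):
    # Original style: the in-place branch returns early.
    if inplace:
        for i, v in enumerate(values):
            if v <= threshold:
                values[i] = value
        return values
    return [v if v > threshold else value for v in values]


def threshold_single_exit(values, threshold, value, inplace=False):
    # Rewritten style: every branch assigns `result`, one return at the end.
    if inplace:
        for i, v in enumerate(values):
            if v <= threshold:
                values[i] = value
        result = values
    else:
        result = [v if v > threshold else value for v in values]
    return result


data = [-1.0, 0.5, 2.0]
print(threshold_early_return(data, 0.5, 0.0))
print(threshold_single_exit(data, 0.5, 0.0))
```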

Converts these functions to weak script:
* `threshold`
* `relu`
* `hardtanh`
* `relu6`
* `elu`
* `selu`
* `celu`
* `leaky_relu`
* `rrelu`
* `tanh`
* `sigmoid`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13322

Differential Revision: D12852203

Pulled By: driazati

fbshipit-source-id: 220670df32cb1ff39d120bdc04aa1bd41209c809
2018-11-05 21:02:18 -08:00
David Riazati
1969898647 Convert functional dropouts to weak script (#13484)
Summary:
To convert `nn.functional.dropout`
* `_VF` had to be exposed as a Python module so this PR adds a module class to forward to `torch._C._VariableFunctions`
* rng state between calls in the tests needed to be made consistent
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13484

Differential Revision: D12929622

Pulled By: driazati

fbshipit-source-id: 78b455db9c8856b94d2dda573fb7dc74d5784f56
2018-11-05 17:13:07 -08:00
Sam Gross
98f5c005da Speed up CPU threshold and relu implementation (#13182)
Summary:
```
The previous threshold implementation was not vectorized or parallelized.
This speeds up ResNet-50 CPU inference [1] from ~88 ms to ~67 ms

CPU timings:
https://gist.github.com/colesbury/d0d1be6974841d62696dbde329a8fde8

1 thread (before vs. after)
10240:  17.4 us vs. 6.9 µs per loop
102400: 141 us vs. 39.8 µs per loop

16 threads (before vs. after)
10240:  17.4 us vs. 6.7 µs per loop
102400: 141 us vs. 14.3 µs per loop

CUDA timings are not measurably different.

[1]: compiled with MKL-DNN, 8 threads, batch norm merged into convolutions
https://gist.github.com/colesbury/8a64897dae97558b3b82da665048c782
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13182

Reviewed By: soumith

Differential Revision: D12825105

Pulled By: colesbury

fbshipit-source-id: 557da608ebb87db8a04adbb0d2882af4f2eb3c15
2018-11-05 12:51:29 -08:00
Tongzhou Wang
99a5d19591 Rename elementwise_mean to mean (#13419)
Summary:
Closes #12459
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13419

Differential Revision: D12883299

Pulled By: SsnL

fbshipit-source-id: 8b4512ff73b66fdc674412904dbb3bf497ba70a7
2018-11-01 10:31:26 -07:00
Ailing Zhang
488d393ea6 Fix pointwise loss broadcast (#12996)
Summary: Fixes #12129 , #12327

Differential Revision: D10513781

Pulled By: ailzhang

fbshipit-source-id: a210008a39ff6c3f056c9fbe3f0576cfcce638ec
2018-10-31 10:17:25 -07:00
Michael Suo
d2659f6689 fix lint
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13346

Differential Revision: D12850686

Pulled By: michaelsuo

fbshipit-source-id: b7474d0a3f3347034592bef45125610c040cff6a
2018-10-30 16:22:58 -07:00
verhoek
0db505bf27 Made docstrings for Embedding more accurate. (#13310)
Summary:
Made the previous description for max_norm more precise, avoiding 'this' and describing what actually happens in the code.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13310

Differential Revision: D12840813

Pulled By: SsnL

fbshipit-source-id: 98090c884267a62ce93cd85da84252d46926dfa5
2018-10-30 12:25:38 -07:00
Jason Gauci
5b15a501da Refactor & unit test feed predictor
Summary:
1. Refactor DDPG predictor.  Merge the critic predictor with ParametricDQNPredictor since they are the same
2. Fix bug where loss was multiplied by the batch size
3. Create DDPGFeedPredictor which uses the feed predictor output format
4. Add support for gridworld simulation memoization to DDPG.  Also memoize normalization tables.

Reviewed By: kittipatv

Differential Revision: D10161240

fbshipit-source-id: 2813890043de1241c1fb9b9c2b6a897403f9fc12
2018-10-30 10:27:47 -07:00
William Horton
1bec8f773b Move ConstantPadNd into ATen (#10885)
Summary:
Addresses #9499. Completed work on the forward function, tests should be passing for that. Working on backward function now.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10885

Differential Revision: D9643786

Pulled By: SsnL

fbshipit-source-id: 2930d6f3d2975c45b2ba7042c55773cbdc8fa3ac
2018-10-26 15:25:27 -07:00
David Riazati
14ea4bf0d1 Make 7 nn modules into weak modules (#12966)
Summary:
Depends on #12682 ([stacked diff](https://github.com/driazati/pytorch/compare/weak_mod...driazati:mod_conv1))

* Adds tests for weak module conversion that creates a `ScriptModule` that uses the weak module and checks its graph
* Adds `torch._jit_internal.weak_module` tags to modules that already work
  * `Sigmoid`
  * `Tanh`
  * `Hardshrink`
  * `PReLU`
  * `Softsign`
  * `Tanhshrink`
  * `PairwiseDistance`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12966

Differential Revision: D10559557

Pulled By: driazati

fbshipit-source-id: dc4bea3aa744b3c44d4fa7dceefd97e951f824d0
2018-10-25 13:59:34 -07:00
Thomas Viehmann
dd823ccd28 small improvements to torch.nn.normalization docs (#12936)
Summary:
Based on a [discussion at the forums](https://discuss.pytorch.org/t/question-about-functional-normalize-and-torch-norm/27755), it might be worthwhile to clarify the documentation.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12936

Differential Revision: D10502139

Pulled By: ezyang

fbshipit-source-id: 480c3c367f8c685dcde107b3018cb4129032322d
2018-10-22 23:14:47 -07:00
David Riazati
1e8064dec0 Convert 2 nn.functional functions to weak script (#12723)
Summary:
* Moves the `weak_script` annotation to `torch/_jit_internal.py` to resolve a dependency issue between `torch.jit` and `torch.nn`
* Add `torch._jit.weak_script` to `tanhshrink` and `softsign`, their tests now pass instead of giving an `unknown builtin op` error
* Blacklist converted `torch.nn.functional` functions from appearing in the builtin op list if they don't actually have corresponding `aten` ops
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12723

Differential Revision: D10452986

Pulled By: driazati

fbshipit-source-id: c7842bc2d3ba0aaf7ca6e1e228523dbed3d63c36
2018-10-21 14:09:55 -07:00
Thomas Viehmann
0521c47c91 Amend nondeterminism notes (#12217)
Summary:
include atomicAdd commentary as this is less well known

There is some discussion in #12207

Unfortunately, I cannot seem to get the ..include working in `_tensor_docs.py` and `_torch_docs.py`. I could use a hint for that.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12217

Differential Revision: D10419739

Pulled By: SsnL

fbshipit-source-id: eecd04fb7486bd9c6ee64cd34859d61a0a97ec4e
2018-10-16 23:59:26 -07:00
Tongzhou Wang
ac994f2c78 Fix SpectralNorm with DataParallel (#12671)
Summary:
There were two problems with SN + DP:

1. In SN, the updated _u vector is saved back to module via a `setattr`. However, in DP, everything is run on a replica, so those updates are lost.
2. In DP, the buffers are broadcast via a `broadcast_coalesced`, so on replicas they are all views. Therefore, the `detach_` call won't work.

Fixes are:
1. Update _u vector in-place so, by the shared storage between 1st replica and the parallelized module, the update is retained
2. Do not call `detach_`.
3. Added comments in SN about the subtlety.
4. Added a note to the DP doc on this particular behavior of DP.
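
The difference between the lost `setattr` update and the retained in-place update can be illustrated by analogy in plain Python. This is a loose sketch: `ToyModule` and `make_replica` are hypothetical, and a shared list stands in for the storage shared between the first replica and the parallelized module.

```python
import copy


class ToyModule:
    def __init__(self):
        self.u = [1.0, 0.0]  # stands in for the `_u` buffer


def make_replica(module):
    # Shallow copy: a new object whose `u` is the same list as the original's.
    return copy.copy(module)


# Problem 1: rebinding the attribute on the replica (like `setattr`).
original = ToyModule()
replica = make_replica(original)
replica.u = [0.0, 1.0]
rebind_result = list(original.u)  # the original never sees the update

# Fix 1: updating the shared storage in place.
original = ToyModule()
replica = make_replica(original)
replica.u[:] = [0.0, 1.0]
inplace_result = list(original.u)  # the update is visible on the original

print(rebind_result, inplace_result)
```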

cc crcrpar taesung89 yaoshengfu

Fixes https://github.com/pytorch/pytorch/issues/11476
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12671

Differential Revision: D10410232

Pulled By: SsnL

fbshipit-source-id: c447951844a30366d8c196bf9436340e88f3b6d9
2018-10-16 16:02:17 -07:00
Ailing Zhang
e15501fb68 fix bce_with_logits with legacy reduce (#12689)
Summary:
Fix #12624 . internal usecase of legacy `reduce`.
Add test in test_nn
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12689

Reviewed By: ezyang

Differential Revision: D10391195

Pulled By: ailzhang

fbshipit-source-id: 1af2b258c4abb2b6527eaaeac63e8bf1762c66a1
2018-10-16 09:46:58 -07:00
Natalia Gimelshein
a98958d3bd dtype option for softmax (#11719)
Summary:
Add dtype argument to softmax/log_softmax functions.
Computing softmax in fp32 precision is necessary for mixed precision training. Converting the output of the previous layer into fp32 and then reading it as fp32 in softmax is expensive, both memory- and perf-wise; this PR allows one to avoid it.
For most input data/dtype combinations, input data is converted to dtype and then softmax is computed. If input data is half type and dtype is fp32, kernels with the corresponding template arguments are called.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11719

Reviewed By: ezyang

Differential Revision: D10175514

Pulled By: zou3519

fbshipit-source-id: 06d285af91a0b659932236d41ad63b787eeed243
2018-10-13 17:57:10 -07:00
Ailing Zhang
5317429e82 move bceWithLogits from python to Aten (#11054)
Summary:
Fixes #10648 .
Perf comparison:
```
import torch
import torch.nn as nn
import time

def bm(testsize, repeat=100, cuda=False):
    total_time = 0.0
    pos_weight= torch.ones(testsize[1], device='cuda' if cuda else 'cpu') / testsize[1]
    # loss = nn.BCEWithLogitsLoss(pos_weight=pos_weight)
    loss = nn.BCEWithLogitsLoss()
    input = torch.randn(testsize, device='cuda' if cuda else 'cpu').clamp_(2.8e-2, 1 - 2.8e-2)
    target = torch.randn(testsize, device='cuda' if cuda else 'cpu').gt(0).float()
    input.requires_grad = True
    target.requires_grad = True
    for _ in range(repeat):
        start = time.time()
        l = loss(input, target)
        l.backward()
        # print(target.grad)
        end = time.time()
        total_time += end - start
    return total_time

for cuda in [False, True]:
    for testsize in [(100, 100), (1000, 1000), (2000, 2000)]:
        # print(testsize, cuda)
        print('{:.5f}'.format(bm(testsize, cuda=cuda)))
```
|    | Python CPU | Aten CPU | Python GPU | Aten GPU |
| ------------- | ------------- | ------------- | ------------- | ------------- |
| (100, 100)  | 0.15813s | 0.10890s | 0.14601s | 0.07070s |
| (1000, 1000)  | 1.74051s | 0.95038s | 0.15158s | 0.10153s |
| (2000, 2000) | 5.36515s | 2.46996s | 0.31322s | 0.200941s |
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11054

Differential Revision: D9728289

Pulled By: ailzhang

fbshipit-source-id: b7c5bc50635f8cc63c317caa4321e32f7df860f8
2018-10-12 11:13:33 -07:00
Wei Yang
de11fe0c83 migrate PReLU to ATen (#11758)
Summary:
- fixes https://github.com/pytorch/pytorch/issues/10723
- migrate PReLU to ATen and deprecate legacy PReLU
- performance:

CPU with weight.numel() = 1
```
>>> m = nn.PReLU()
>>> x = torch.randn(100, 100, 100, requires_grad=True)
>>> %timeit -r 100 y = m(x)
100 loops, best of 100: 9.43 ms per loop

>>> y = m(x).sum()
>>> %timeit -r 100 y.backward(retain_graph=True)
10 loops, best of 100: 24.4 ms per loop

>>> m = nn.PReLU()
>>> x = torch.randn(100, 100, 100, requires_grad=True)
>>> %timeit -r 100 y = m(x)
1000 loops, best of 100: 695 µs per loop

>>> y = m(x).sum()
>>> %timeit -r 100 y.backward(retain_graph=True)
100 loops, best of 100: 2.47 ms per loop
```

CPU with weight.numel() = channels
```
>>> m = nn.PReLU(100)
>>> x = torch.randn(100, 100, 100, requires_grad=True)
>>> %timeit -r 100 y = m(x)
1000 loops, best of 100: 603 µs per loop

>>> y = m(x).sum()
>>> %timeit -r 100 y.backward(retain_graph=True)
100 loops, best of 100: 13.3 ms per loop

>>> m = nn.PReLU(100)
>>> x = torch.randn(100, 100, 100, requires_grad=True)
>>> %timeit -r 100 y = m(x)
1000 loops, best of 100: 655 µs per loop

>>> y = m(x).sum()
>>> %timeit -r 100 y.backward(retain_graph=True)
100 loops, best of 100: 2.45 ms per loop
```

CUDA with weight.numel() = 1
```
>>> m = nn.PReLU().cuda()
>>> x = torch.randn(100, 100, 100, requires_grad=True).cuda()
>>> %timeit -r 100 torch.cuda.synchronize(); y = m(x); torch.cuda.synchronize();
10000 loops, best of 100: 187 µs per loop

>>> y = m(x).sum()
>>> %timeit -r 100 torch.cuda.synchronize(); y.backward(retain_graph=True); torch.cuda.synchronize();
100 loops, best of 100: 2.01 ms per loop

>>> m = nn.PReLU().cuda()
>>> x = torch.randn(100, 100, 100, requires_grad=True).cuda()
>>> %timeit -r 100 torch.cuda.synchronize(); y = m(x); torch.cuda.synchronize();
1000 loops, best of 100: 195 µs per loop

>>> y = m(x).sum()
>>> %timeit -r 100 torch.cuda.synchronize(); y.backward(retain_graph=True); torch.cuda.synchronize();
100 loops, best of 100: 2.28 ms per loop
```

CUDA with weight.numel() = channels
```
>>> m = nn.PReLU(100).cuda()
>>> x = torch.randn(100, 100, 100, requires_grad=True).cuda()
>>> %timeit -r 100 torch.cuda.synchronize(); y = m(x); torch.cuda.synchronize();
1000 loops, best of 100: 174 µs per loop

>>> y = m(x).sum()
>>> %timeit -r 100 torch.cuda.synchronize(); y.backward(retain_graph=True); torch.cuda.synchronize();
100 loops, best of 100: 2.27 ms per loop

>>> m = nn.PReLU(100).cuda()
>>> x = torch.randn(100, 100, 100, requires_grad=True).cuda()
>>> %timeit -r 100 torch.cuda.synchronize(); y = m(x); torch.cuda.synchronize();
10000 loops, best of 100: 181 µs per loop

>>> y = m(x).sum()
>>> %timeit -r 100 torch.cuda.synchronize(); y.backward(retain_graph=True); torch.cuda.synchronize();
100 loops, best of 100: 2.26 ms per loop
```

The huge performance regression in CPU when weight.numel() = 1 is addressed by replacing at::CPU_tensor_apply* with parallelized kernels.
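
For reference, the two weight configurations benchmarked above differ only in whether one slope is shared by all channels or each channel has its own. A minimal pure-Python sketch of the PReLU formula, with nested lists standing in for tensors and a hypothetical function name:

```python
def prelu_sketch(x, weight):
    """Apply PReLU: v if v >= 0 else w * v.

    x: list of channels, each a list of floats.
    weight: a list with one entry (shared slope) or one entry per channel.
    """
    if len(weight) not in (1, len(x)):
        raise ValueError("weight must have 1 entry or one entry per channel")
    out = []
    for c, channel in enumerate(x):
        w = weight[0] if len(weight) == 1 else weight[c]
        out.append([v if v >= 0 else w * v for v in channel])
    return out


x = [[1.0, -2.0], [-4.0, 3.0]]
print(prelu_sketch(x, [0.25]))       # weight.numel() = 1
print(prelu_sketch(x, [0.5, 0.25]))  # weight.numel() = channels
```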

ezyang SsnL zou3519  soumith
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11758

Differential Revision: D9995799

Pulled By: weiyangfb

fbshipit-source-id: d289937c78075f46a54dafbde92fab0cc4b5b86e
2018-09-21 16:26:04 -07:00
Marc Ferradou
e734c94fa2 Quick update to embedding_bag doc (#11784)
Summary:
Related to #11624 adding maxes to the function def of embedding_bag.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11784

Differential Revision: D9892598

Pulled By: ezyang

fbshipit-source-id: e6372ccf631826ddf1e1885b2f8f75f354a36c0b
2018-09-17 23:56:05 -07:00
Gao, Xiang
513fd3dd36 Improve doc of torch.nn.functional.pad (#11623)
Summary:
I'm reading the doc of `torch.nn.functional.pad` and it looks a bit confusing to me. Hopefully this PR makes it clearer.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11623

Differential Revision: D9818255

Pulled By: soumith

fbshipit-source-id: 4f6b17b0211c6927007f44bfdf42df5f84d47536
2018-09-13 19:25:24 -07:00
Tongzhou Wang
760679352e Move Pixel Shuffle to ATen (#9721)
Summary:
<del>#9692 </del>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9721

Differential Revision: D8955829

Pulled By: SsnL

fbshipit-source-id: 4f4d1c7720b6f757fbef9a10f70209ae76f61399
2018-09-13 18:25:48 -07:00
Marc Ferradou
f129da1a47 Add max to the ValueError for EmbeddingBag mode check (#11655)
Summary:
Related to #11624
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11655

Differential Revision: D9815454

Pulled By: SsnL

fbshipit-source-id: 8dd82e0c0aa68362e12b301e095a85af7d7fd71a
2018-09-13 14:39:40 -07:00
Roy Li
75f49befeb move instance_norm to aten (#10792)
Summary:
This also removes the usage of torch.onnx.symbolic_override in instance_norm. Fixes #8439.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10792

Differential Revision: D9800643

Pulled By: li-roy

fbshipit-source-id: fa13a57de5a31fbfa2d4d02639d214c867b9e1f1
2018-09-13 12:26:22 -07:00
Rasmus Diederichsen
35348dab10 WIP: Include note on cudnn determinism in each function backed by cudnn (#11434)
Summary:
Ping ezyang
This addresses your comment in #114. Strangely, when running the doc build (`make html`) none of my changes are actually showing, could you point out what I'm doing wrong?

Once #11329 is merged it might make sense to link to the reproducibility note everywhere.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11434

Differential Revision: D9751208

Pulled By: ezyang

fbshipit-source-id: cc672472449564ff099323c39603e8ff2b2d35c9
2018-09-11 20:27:09 -07:00
Peter Goldsborough
d95fedb436 Use ATen dropout implementation in Dropout module and add FeatureDropout (#11458)
Summary:
This PR does two things:
1. Replaces the implementation of the `Dropout` module with a call to the ATen function,
2. Replaces `Dropout2d` with a new `FeatureDropout` module that shall take the place of `Dropout2d` and `Dropout3d`. I contemplated calling it `Dropout2d` and making `Dropout3d` an alias for it, but similar to our decision for `BatchNorm{1,2,3}d` (c.f. https://github.com/pytorch/pytorch/pull/9188), we can deviate from Python PyTorch in favor of the ideal-world solution, which is to have a single module, since both actually just call `feature_dropout`.

I also replaced the implementation of `dropout3d`  with a call to `dropout2d` in Python. The code is the same and it's easier for developers to parse than having to manually match the tokens to make sure it's really 100% the same code (which it is, if I matched the tokens correctly).

ebetica ezyang SsnL
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11458

Differential Revision: D9756603

Pulled By: goldsborough

fbshipit-source-id: fe847cd2cda2b6da8b06779255d76e32a974807c
2018-09-11 20:16:12 -07:00
Tongzhou Wang
de460c7ad3 Improvements on conv/pool/fold/stft/ParamDict docs (#11106)
Summary:
Also fixes some incorrect formula rendering.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11106

Differential Revision: D9752433

Pulled By: SsnL

fbshipit-source-id: 535fc8498638e8b645757fc7535d8771992b7d21
2018-09-11 08:56:21 -07:00
Wei Yang
425ea6b31e fix doc for functional.dropout* (#10417)
Summary:
- fixes #4177
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10417

Differential Revision: D9542876

Pulled By: weiyangfb

fbshipit-source-id: 480ed973d1fe0364f4acb5cd596c2031895b82df
2018-09-05 17:26:00 -07:00
Erik Brinkman
611a608517 Add ATen pdist CPU kernel (#10782)
Summary:
Also add single grad whitelist to the jit test
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10782

Reviewed By: ezyang

Differential Revision: D9583378

Pulled By: erikbrinkman

fbshipit-source-id: 069e5ae68ea7f3524dec39cf1d5fe9cd53941944
2018-08-30 11:55:27 -07:00
Roy Li
f2bb9f0bb5 speed up kl div loss (#10336)
Summary:
Moved kl div loss to aten.

benchmarks for 5000 iterations on input size (1000,100)

New
```
cuda:
forward [0.9736350309103727, 0.9922929517924786, 0.9694818360731006]
input requires_grad=True:
backward [0.5595634011551738, 0.558339926879853, 0.5546616851352155]
double backward [1.2445648494176567, 1.2245905152522027, 1.2349751549772918]
target requires_grad=True:
backward (new C++) [0.9489959231577814, 0.9553070571273565, 0.9556351029314101]
double backward (new C++) [1.8184774098917842, 1.8164670099504292, 1.845708406995982]

cpu:
forward (new C++) [7.892430987209082, 8.3068826389499, 7.985283812973648]
input requires_grad=True:
backward (new C++) [4.328460982069373, 4.45323242014274, 4.27946363389492]
double backward (new C++) [5.153504415880889, 4.629372010007501, 4.712803596165031]
target requires_grad=True:
backward (new C++) [3.4181493939831853, 3.3771288259886205, 3.7086612950079143]
double backward (new C++) [0.21922698011621833, 0.1858532396145165, 0.19477044604718685]
```

Old
```
cuda:
forward [3.101281268056482, 3.068499860819429, 3.0527669726870954]
input requires_grad=True:
backward [0.5650290949270129, 0.5730433077551425, 0.5588279226794839]
double backward [1.1287697306834161, 1.13834543293342, 1.1298578432761133]
target requires_grad=True:
backward [0.9470391101203859, 0.9560198178514838, 0.9750375030562282]
double backward [1.85760727385059, 1.7989214668050408, 1.788982989732176]

cpu:
forward (new C++) [12.474591840058565, 12.511441555805504, 12.666544185951352]
input requires_grad=True:
backward (new C++) [7.660991386976093, 7.449987292289734, 7.513917901087552]
double backward (new C++) [4.073225498665124, 4.264980792999268, 4.429787891916931]
target requires_grad=True:
backward (new C++) [3.448499082121998, 3.9072313378565013, 3.2433970272541046]
double backward (new C++) [2.126378359273076, 1.9045450473204255, 1.7932004742324352]
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10336

Differential Revision: D9213636

Pulled By: li-roy

fbshipit-source-id: 27cc530f6276f58d35dc7a1d56dfc758a0fc4a7b
2018-08-27 16:10:59 -07:00
Tongzhou Wang
d043f83019 Add tests for Tensor.* nn.* F.* docs (#10311)
Summary:
Test only for existence for now. I had to skip a lot of them, so there is a FIXME in the test.

Also, I'm not testing torch.* because of a namespace issue.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10311

Differential Revision: D9196341

Pulled By: SsnL

fbshipit-source-id: 9c2ca1ffe660bc1cc664474993f8a21198525ccc
2018-08-14 11:39:46 -07:00
Adam Paszke
adbcb3c1dc Move dropout and alpha dropout to ATen (#10384)
Summary:
zdevito ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10384

Reviewed By: ezyang

Differential Revision: D9272583

Pulled By: apaszke

fbshipit-source-id: ed5d37b28ce9ff25800bbaa0daf066cfbf1f9921
2018-08-10 14:55:28 -07:00
Tongzhou Wang
6a55238a3f Grid sampler: nearest interpolation & reflection padding (#10051)
Summary:
closes #9702 .

cc jph00

Commit structure:

1. Change the index calculation logic. I will explain using 1-D for simplicity.

	Previously we have (in pseudo code):

	```
	// 1. get the float locations from grid
	scalar_t x = from_grid()

	// 2. find the integral surrounding indices
	int x_left = floor(x)
	int x_right = x_left + 1

	// 3. calculate the linear interpolate weights
	scalar_t w_left = x_right - x
	scalar_t w_right = x - x_left

	// 4. manipulate the integral surrounding indices if needed
	// (e.g., clip for border padding_mode)
	x_left = manipulate(x_left, padding_mode)
	x_right = manipulate(x_right, padding_mode)

	// 5. interpolate
	output_val = interpolate(w_left, w_right, x_left, x_right)
	```

	This is actually incorrect (and also unintuitive) because it calculates the
	weights before manipulating out-of-boundary indices. Fortunately, this
	isn't manifested in either of the currently supported modes, `'zeros'` and
	`'border'` padding:

	+ `'zeros'`: doesn't clip
	+ `'border'`: clips, but for out-of-bound `x` both `x_left` and `x_right` are
	  clipped to the same value, so weights don't matter

	But this is a problem with reflection padding, since after each time we reflect,
	the values of `w_left` and `w_right` should be swapped.

	So in this commit I change the algorithm to (numbers corresponding to the
	ordering in the above pseudo-code)

	```
	1. get float location
	4. clip the float location
	2. find the integral surrounding indices
	3. calculate the linear interpolate weights
	```

	In the backward, because of this change, I need to add new variables to track
	`d manipulate_output / d manipulate_input`, which is basically a multiplier
	on the gradient calculated for `grid`. From benchmarking this addition doesn't
	cause obvious slow downs.

2. Implement reflection padding. The indices will keep being reflected until
	they become within boundary.

	Added variant of `clip_coordinates` and `reflect_coordinates` to be used in
	backward. E.g.,
	```cpp
	// clip_coordinates_set_grad works similarly to clip_coordinates except that
	// it also returns the `d output / d input` via pointer argument `grad_in`.
	// This is useful in the backward pass of grid_sampler.
	scalar_t clip_coordinates_set_grad(scalar_t in, int64_t clip_limit, scalar_t *grad_in)
	```
	For example, if `in` is clipped in `'border'` mode, `grad_in` is set to `0`.
	If `in` is reflected an **odd** number of times in `'reflection'` mode, `grad_in`
	is set to `-1`.

3. Implement nearest interpolation.

4. Add test cases

5. Add better input checking
  Discussed with goldsborough for moving `operator<<` of `at::Device`,
  `at::DeviceType` and `at::Layout` into `at` namespace. (Otherwise
  `AT_CHECK` can't find them.)

6. Support empty tensors. cc gchanan

    + Make empty tensors not acceptable by cudnn.
    + Add `AT_ASSERT(kernel block size  > 0)` if using `GET_BLOCKS`
    + Cache `numel` in `TensorGeometry`
      I was going to use `numel` to test if cudnn descriptor should accept a
      tensor, but it ended up not being used. I can revert this if needed.

7. Add more test cases, including on input checking and empty tensors

8. Remove an obsolete comment

9. Update docs. Manually tested by generating docs.
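
The reordered index calculation from item 1 and the gradient-tracking helpers from item 2 can be sketched in 1-D with plain Python. This is an illustrative sketch under assumptions, not the ATen kernel: the Python signatures return the multiplier as a second value instead of writing through a pointer, and `sample_1d` is a hypothetical name.

```python
import math


def clip_coordinates_set_grad(x, clip_limit):
    """Clip x into [0, clip_limit - 1]; also return d(output)/d(input)."""
    hi = float(clip_limit - 1)
    if x < 0.0:
        return 0.0, 0.0
    if x > hi:
        return hi, 0.0
    return x, 1.0


def reflect_coordinates_set_grad(x, clip_limit):
    """Reflect x back into [0, clip_limit - 1]; also return d(output)/d(input)."""
    if clip_limit == 1:
        return 0.0, 0.0
    grad = 1.0
    if x < 0.0:
        # Reflecting about 0 flips the sign of the gradient once.
        x, grad = -x, -1.0
    hi = float(clip_limit - 1)
    extra = math.fmod(x, hi)
    flips = int(math.floor(x / hi))
    if flips % 2 == 0:
        return extra, grad
    return hi - extra, -grad


def sample_1d(values, x, padding_mode="zeros"):
    """Linearly interpolate values at float location x (step 1 is the given x).

    Returns (output value, multiplier to apply to the gradient w.r.t. x).
    """
    n = len(values)
    grad_mult = 1.0
    # Step 4, now done first: manipulate the float location itself.
    if padding_mode == "border":
        x, grad_mult = clip_coordinates_set_grad(x, n)
    elif padding_mode == "reflection":
        x, grad_mult = reflect_coordinates_set_grad(x, n)
    elif padding_mode != "zeros":
        raise ValueError("unknown padding_mode: {}".format(padding_mode))
    # Step 2: find the integral surrounding indices.
    x_left = int(math.floor(x))
    x_right = x_left + 1
    # Step 3: linear interpolation weights, computed after the manipulation.
    w_left = x_right - x
    w_right = x - x_left

    # Step 5: interpolate; indices outside the input contribute zero.
    def at(i):
        return values[i] if 0 <= i < n else 0.0

    return w_left * at(x_left) + w_right * at(x_right), grad_mult


row = [10.0, 20.0, 30.0, 40.0]
print(sample_1d(row, 1.5))
print(sample_1d(row, 5.0, "border"))
print(sample_1d(row, -0.5, "reflection"))
```

Because the weights are computed from the already reflected location, they no longer need to be swapped after each reflection.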
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10051

Differential Revision: D9123950

Pulled By: SsnL

fbshipit-source-id: ac3b4a0a36b39b5d02e83666cc6730111ce216f6
2018-08-10 12:43:27 -07:00
Wei Yang
149d4f776b use logsigmoid at multilabel_soft_margin_loss, and change output from shape=(N, C)to (N,) (#9965)
Summary:
- fixes #9141, #9301
- use logsigmoid at multilabel_soft_margin_loss to make it more stable (NOT fixing legacy MultiLabelSoftMarginCriterion)
- return (N) instead of (N, C) to match the same behavior as MultiMarginLoss
- Note that with this PR, the following behavior is expected:
```
loss = F.multilabel_soft_margin_loss(outputs, labels, reduction='none')
loss_mean = F.multilabel_soft_margin_loss(outputs, labels, reduction='elementwise_mean')
loss_sum = F.multilabel_soft_margin_loss(outputs, labels, reduction='sum')

loss.sum() == loss_sum  # True
loss.mean() == loss_mean  # True
```
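
The stability gain from logsigmoid and the new `(N,)` output shape can be sketched in pure Python. This is an illustration under assumptions, not the library code: nested lists stand in for tensors, the function names are hypothetical, and the per-sample loss is taken to be the mean over classes of the negated log-sigmoid terms.

```python
import math


def log_sigmoid(x):
    # Stable form: never exponentiates a large positive number.
    return min(x, 0.0) - math.log1p(math.exp(-abs(x)))


def multilabel_soft_margin_loss_sketch(outputs, labels,
                                       reduction="elementwise_mean"):
    """outputs, labels: N rows of C values. Unreduced result has length N."""
    per_sample = []
    for row, targets in zip(outputs, labels):
        total = sum(y * log_sigmoid(x) + (1 - y) * log_sigmoid(-x)
                    for x, y in zip(row, targets))
        per_sample.append(-total / len(row))
    if reduction == "none":
        return per_sample
    if reduction == "sum":
        return sum(per_sample)
    if reduction == "elementwise_mean":
        return sum(per_sample) / len(per_sample)
    raise ValueError("unknown reduction: {}".format(reduction))


outputs = [[1000.0, -1000.0], [0.0, 0.0]]
labels = [[1, 0], [1, 0]]
loss = multilabel_soft_margin_loss_sketch(outputs, labels, reduction="none")
print(len(loss))  # one entry per sample, i.e. shape (N,)

# The naive form log(1 / (1 + exp(-x))) overflows for very negative inputs.
try:
    math.log(1.0 / (1.0 + math.exp(1000.0)))
    naive_overflowed = False
except OverflowError:
    naive_overflowed = True
print(naive_overflowed)
```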
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9965

Differential Revision: D9038402

Pulled By: weiyangfb

fbshipit-source-id: 0fa94c7b3cd370ea62bd6333f1a0e9bd0b8ccbb9
2018-08-03 17:54:19 -07:00