Commit Graph

588 Commits

Sameer Deshmukh
602394e996 verify input sizes for instance norm and group norm (#29082)
Summary:
Fix for https://github.com/pytorch/pytorch/issues/19250
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29082

Differential Revision: D19373507

Pulled By: ezyang

fbshipit-source-id: 231a79280f4cd7db2c26218a60869356a124bf77
2020-01-27 09:05:56 -08:00
Jianyu Huang
3ada2e0d64 [pytorch][embeddingbag] Parallelize the EmbeddingBag operator (#4049)
Summary:
Pull Request resolved: https://github.com/pytorch/glow/pull/4049

Pull Request resolved: https://github.com/pytorch/pytorch/pull/27477

We would like to add the intra-op parallelization support for the EmbeddingBag operator.

This should bring speedup for the DLRM benchmark:
https://github.com/pytorch/pytorch/pull/24385

Benchmark code:
```
from __future__ import absolute_import, division, print_function, unicode_literals

import torch
import time

eb = torch.nn.EmbeddingBag(1000000, 64, mode='sum')

input = torch.LongTensor(1500).random_(0, 1000000)
offsets = torch.zeros(64, dtype=torch.int64)

niter = 10000
s = time.time()
for _ in range(niter):
    out = eb(input, offsets)
time_per_iter = (time.time() - s) / niter
print('time_per_iter', time_per_iter)
print('GB/s', (input.numel() * 64 * 4 + out.numel() * 4) / time_per_iter / 1e9)
```

The following results are single core on Skylake T6:
- Before our change (with the original caffe2::EmbeddingLookup)
time_per_iter 6.313693523406982e-05
GB/s 6.341517821789133

- After our change, using the EmbeddingLookupIdx API, which takes offsets instead of lengths:
time_per_iter 5.7627105712890626e-05
GB/s 6.947841559053659

- With Intel's PR: https://github.com/pytorch/pytorch/pull/24385
time_per_iter 7.393271923065185e-05
GB/s 5.415518381664018

As for multi-core performance: because Clang doesn't work with OMP, I could only measure single-core performance on SKL T6.
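
For reference, a minimal sketch of exercising the new intra-op parallelism (illustrative only; assumes the build ships an OpenMP/TBB parallel backend, and reuses the benchmark tensors above):
```
import torch

torch.set_num_threads(4)  # intra-op threads used by the parallel kernel

eb = torch.nn.EmbeddingBag(1000000, 64, mode='sum')
input = torch.LongTensor(1500).random_(0, 1000000)
offsets = torch.zeros(64, dtype=torch.int64)
out = eb(input, offsets)
print(out.shape)  # torch.Size([64, 64])
```
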
ghstack-source-id: 97124557

Test Plan:
With D16990830:
```
buck run mode/dev //caffe2/caffe2/perfkernels:embedding_bench
```

With D17750961:
```
buck run mode/opt //experimental/jianyuhuang/embeddingbag:eb
buck run mode/opt-lto //experimental/jianyuhuang/embeddingbag:eb
```

OSS test
```
python run_test.py -i nn -- TestNNDeviceTypeCPU.test_EmbeddingBag_per_sample_weights_and_new_offsets_cpu
```

Buck test
```
buck test mode/dev-nosan //caffe2/test:nn -- "test_EmbeddingBag_per_sample_weights_and_new_offsets_cpu"

OMP_NUM_THREADS=3 buck test mode/opt -c pytorch.parallel_backend=tbb //caffe2/test:nn -- "test_EmbeddingBag_per_sample_weights_and_new_offsets"  --print-passing-details
```

Generate the AVX2 code for embedding_lookup_idx_avx2.cc:
```
python hp_emblookup_codegen.py --use-offsets
```

Differential Revision: D17768404

fbshipit-source-id: 8dcd15a62d75b737fa97e0eff17f347052675700
2020-01-23 21:29:44 -08:00
Guanheng Zhang
db02a4e4ce Support 3D attention mask in MultiheadAttention. (#31996)
Summary:
Support a 3D attention mask for MultiheadAttention. If `attn_mask` has a batch dimension, it will not be unsqueezed. Fixes https://github.com/pytorch/pytorch/issues/30678
Relevant issues/pr:
https://github.com/pytorch/pytorch/pull/25359
https://github.com/pytorch/pytorch/issues/29520
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31996
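
For illustration, a minimal sketch (not from the PR) of passing a batched 3D mask of shape `(N*num_heads, L, S)`, which is now used as-is:
```
import torch
import torch.nn as nn

embed_dim, num_heads, L, S, N = 16, 4, 5, 5, 2
mha = nn.MultiheadAttention(embed_dim, num_heads)
query = torch.randn(L, N, embed_dim)           # (target_len, batch, embed_dim)
key = value = torch.randn(S, N, embed_dim)     # (source_len, batch, embed_dim)

# 3D additive float mask with a leading N*num_heads dimension; it is no longer unsqueezed.
attn_mask = torch.zeros(N * num_heads, L, S)
out, weights = mha(query, key, value, attn_mask=attn_mask)
print(out.shape)  # torch.Size([5, 2, 16])
```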

Differential Revision: D19332816

Pulled By: zhangguanheng66

fbshipit-source-id: 3448af4b219607af60e02655affe59997ad212d9
2020-01-23 13:16:48 -08:00
Tongzhou Wang
cc2d5b15ad F.normalize uses clamp_min_ inplace (#32360)
Summary:
We don't care about autograd when `out != None` anyway.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32360

Differential Revision: D19452402

Pulled By: colesbury

fbshipit-source-id: c54775289f8a700019ca61e951d59ff4894ac980
2020-01-21 10:38:06 -08:00
Alban Desmaison
77c78b7d28 remove .data from torch/nn doc
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/31481

Test Plan: Imported from OSS

Differential Revision: D19303242

Pulled By: albanD

fbshipit-source-id: 4f650df9e9e302a299175967bcc6e30a5099fa2a
2020-01-14 07:30:42 -08:00
BowenBao
c4f10e0fe7 Renaming scales parameter for interpolate (#31526)
Summary:
PR separated from https://github.com/pytorch/pytorch/pull/31274.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31526

Reviewed By: zou3519

Differential Revision: D19221931

Pulled By: gchanan

fbshipit-source-id: 81958a9910867ac9d62f2b47abc49384526c4e51
2020-01-02 08:19:30 -08:00
Lara
97c1e90f46 ONNX Interpolate Add Scales Params (#28324)
Summary:
Fix for : https://github.com/pytorch/pytorch/issues/27176
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28324

Reviewed By: hl475

Differential Revision: D18309133

Pulled By: houseroad

fbshipit-source-id: 348bb41393442c6b107d88fc2cd3224e0afa3ccf
2019-12-11 20:09:15 -08:00
Tongzhou Wang
d6ca93b353 add doc for F.softplus
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30055

Differential Revision: D18762624

Pulled By: zou3519

fbshipit-source-id: 61da88cbb8cd0f37ac26b0fb8aaacdbe85c724ba
2019-12-04 07:16:30 -08:00
Brian Wignall
e7fe64f6a6 Fix typos (#30606)
Summary:
Should be non-semantic.

Uses https://en.wikipedia.org/wiki/Wikipedia:Lists_of_common_misspellings/For_machines to find likely typos.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30606

Differential Revision: D18763028

Pulled By: mrshenli

fbshipit-source-id: 896515a2156d062653408852e6c04b429fc5955c
2019-12-02 20:17:42 -08:00
Christian Puhrsch
7903fb118f Move qkv_same, kv_same into branch (#30142)
Summary:
Perf improvements to multi_head_attention_forward

- qkv_same and kv_same were not used outside of that branch. Further, kv_same was calculated even though it is not used when qkv_same is true.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30142

Differential Revision: D18610938

Pulled By: cpuhrsch

fbshipit-source-id: 19b7456f20aef90032b0f42d7da8c8a2d5563ee3
2019-11-22 10:40:02 -08:00
Vitaly Fedyunin
a4f60b64dc explicitly provide memory format when calling to *_like operators
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29391

Test Plan: Imported from OSS

Differential Revision: D18429726

Pulled By: VitalyFedyunin

fbshipit-source-id: 07dfff568ad776cf792122913530566d53be55fa
2019-11-18 21:47:52 -08:00
Andreas Koepf
c7ed89cf65 Migrate nll_loss2d from TH to ATen (CPU) (#28304)
Summary:
Added a check for indices in the Reduction::None case.
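
For context, a small usage sketch (illustrative only) of the op served by the migrated backend: `F.nll_loss` on 4D log-probabilities with 3D class-index targets.
```
import torch
import torch.nn.functional as F

N, C, H, W = 2, 5, 8, 8
log_probs = F.log_softmax(torch.randn(N, C, H, W), dim=1)
target = torch.randint(0, C, (N, H, W))

per_pixel = F.nll_loss(log_probs, target, reduction='none')  # shape (N, H, W); indices are validated
mean_loss = F.nll_loss(log_probs, target, reduction='mean')
print(per_pixel.shape, mean_loss.item())
```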

### Benchmark results

Note: Due to the size of the input tensors, random number generation is responsible for a significant portion of the total time in this run. It is better to look at the individual net time outputs (which do not include the input preparation).
Script used for the benchmark: [nnl_loss2d_benchmark.py](https://gist.github.com/andreaskoepf/5864aa91e243317cb282c1e7fe576e1b)

#### WITH PR applied
```
using reduction:  none
CPU forward 1000 took 7.916500908322632e-05
CPU forward 10000 took 0.0002642290201038122
CPU forward 100000 took 0.003828087996225804
CPU forward 1000000 took 0.037140720000024885
CPU forward 10000000 took 0.33387596398824826
CPU forward TOTAL time 7.218988707987592

using reduction:  mean
CPU forward 1000 took 9.165197843685746e-05
CPU forward 10000 took 0.0005258890159893781
CPU forward 100000 took 0.0050761590246111155
CPU forward 1000000 took 0.047345594997750595
CPU forward 10000000 took 0.4790863030066248
CPU forward TOTAL time 7.9106070210109465
CPU for- & backward 1000 took 0.0005489500181283802
CPU for- & backward 10000 took 0.0015284279943443835
CPU for- & backward 100000 took 0.015138130984269083
CPU for- & backward 1000000 took 0.15741890601930209
CPU for- & backward 10000000 took 1.6703072849777527
CPU for- & backward TOTAL time 9.555764263990568

using reduction:  sum
CPU forward 1000 took 8.789298590272665e-05
CPU forward 10000 took 0.000514078012201935
CPU forward 100000 took 0.005135576997417957
CPU forward 1000000 took 0.04715992201818153
CPU forward 10000000 took 0.4821214270195924
CPU forward TOTAL time 7.9119505700073205
CPU for- & backward 1000 took 0.00047759301378391683
CPU for- & backward 10000 took 0.0015945070190355182
CPU for- & backward 100000 took 0.018208994006272405
CPU for- & backward 1000000 took 0.15904426100314595
CPU for- & backward 10000000 took 1.5679037219961174
CPU for- & backward TOTAL time 9.495157692988869
```

#### WITHOUT PR (original TH impl)
```
using reduction:  none
CPU forward 1000 took 0.0003981560003012419
CPU forward 10000 took 0.0035912430030293763
CPU forward 100000 took 0.035353766987100244
CPU forward 1000000 took 0.3428319719969295
CPU forward 10000000 took 3.364342701010173
CPU forward TOTAL time 11.166179805004504

using reduction:  mean
CPU forward 1000 took 8.63690220285207e-05
CPU forward 10000 took 0.0004704220045823604
CPU forward 100000 took 0.0045734510058537126
CPU forward 1000000 took 0.046232511987909675
CPU forward 10000000 took 0.4191019559802953
CPU forward TOTAL time 7.846049971994944
CPU for- & backward 1000 took 0.0005974550149403512
CPU for- & backward 10000 took 0.0014057719963602722
CPU for- & backward 100000 took 0.013776941981632262
CPU for- & backward 1000000 took 0.13876214998890646
CPU for- & backward 10000000 took 1.3666698939923663
CPU for- & backward TOTAL time 9.10526105100871

using reduction:  sum
CPU forward 1000 took 7.598899537697434e-05
CPU forward 10000 took 0.00046885499614290893
CPU forward 100000 took 0.0044489419960882515
CPU forward 1000000 took 0.04495517900795676
CPU forward 10000000 took 0.418376043002354
CPU forward TOTAL time 7.789334400993539
CPU for- & backward 1000 took 0.0004464260127861053
CPU for- & backward 10000 took 0.0017732900159899145
CPU for- & backward 100000 took 0.01626713399309665
CPU for- & backward 1000000 took 0.11790941300569102
CPU for- & backward 10000000 took 1.4346664609911386
CPU for- & backward TOTAL time 9.294745502003934
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28304

Differential Revision: D18350157

Pulled By: ezyang

fbshipit-source-id: e9437debe51386a483f4265193c475cdc90b28e4
2019-11-09 18:31:20 -08:00
Xiaomeng Yang
2460dced8f Add torch.nn.GELU for GELU activation (#28944)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28944

Add torch.nn.GELU for GELU activation

Test Plan: buck test mode/dev-nosan //caffe2/test:nn -- "GELU"

Reviewed By: hl475, houseroad

Differential Revision: D18240946

fbshipit-source-id: 6284b30def9bd4c12bf7fb2ed08b1b2f0310bb78
2019-11-03 21:55:05 -08:00
Lu Fang
e9a91756cd Back out "[pytorch][PR] Migrate soft_margin_loss from the TH to Aten (CUDA+CPU)"
Summary: Original commit changeset: 9ddffe4dbbfa

Test Plan: ci

Reviewed By: yf225

Differential Revision: D17939581

fbshipit-source-id: 44a3b843bf1e7059fec57b9e3d12ed4886816145
2019-10-15 21:12:10 -07:00
Edward Yang
2aa84d927b Revert D17939700: Revert D17889288: [pytorch][PR] Migrate soft_margin_loss from the TH to Aten (CUDA+CPU)
Test Plan: revert-hammer

Differential Revision:
D17939700

Original commit changeset: 4fc6156ba388

fbshipit-source-id: dded0a2140d2c14cd2f2a574987ecc164b0e5bfe
2019-10-15 15:24:36 -07:00
Edward Yang
c44e33b578 Revert D17889288: [pytorch][PR] Migrate soft_margin_loss from the TH to Aten (CUDA+CPU)
Test Plan: revert-hammer

Differential Revision:
D17889288

Original commit changeset: 9ddffe4dbbfa

fbshipit-source-id: 4fc6156ba38834512b2f735ac0d03e34e69b7286
2019-10-15 14:35:28 -07:00
Andreas Koepf
9033ace9c4 Migrate soft_margin_loss from the TH to Aten (CUDA+CPU) (#27673)
Summary:
Replaces fused TH kernels with a 2-liner of regular Tensor functions.
Benchmarking revealed that performance improves compared to PyTorch 1.2.

Refs: https://github.com/pytorch/pytorch/issues/24631, https://github.com/pytorch/pytorch/issues/24632, https://github.com/pytorch/pytorch/issues/24764, https://github.com/pytorch/pytorch/issues/24765
VitalyFedyunin

### Benchmarking results on my laptop:

## 1.4.0a0+f63c9e8 output
```
PyTorch version: 1.4.0a0+f63c9e8
CPU Operator sanity check:
tensor(0.5926, grad_fn=<MeanBackward0>)
tensor([-0.0159, -0.0170, -0.0011, -0.0083, -0.0140, -0.0217, -0.0290, -0.0262,
        -0.0078, -0.0129])
double backward
tensor(-0.1540, grad_fn=<SumBackward0>)
ok

GPU Operator sanity check:
tensor(0.5601, device='cuda:0', grad_fn=<MeanBackward0>)
tensor([-0.0393, -0.0316, -0.0233, -0.0140, -0.0141, -0.0161, -0.0322, -0.0238,
        -0.0054, -0.0151], device='cuda:0')
double backward
tensor(-0.2148, device='cuda:0', grad_fn=<SumBackward0>)
ok

CPU warmup 1000 took 9.025700273923576e-05
CPU warmup 10000 took 0.0009383050055475906
CPU warmup 100000 took 0.0015631120040779933
CPU warmup TOTAL time 0.0026368020044174045
CPU forward 1000 took 6.919399311300367e-05
CPU forward 10000 took 0.00014462800754699856
CPU forward 100000 took 0.0011234670091653243
CPU forward 1000000 took 0.014555767003912479
CPU forward 10000000 took 0.13409724000666756
CPU forward 100000000 took 1.246048310000333
CPU forward TOTAL time 1.3961777170043206
CPU for- & backward 1000 took 0.0003219560021534562
CPU for- & backward 10000 took 0.00037290599721018225
CPU for- & backward 100000 took 0.001975035003852099
CPU for- & backward 1000000 took 0.02621342398924753
CPU for- & backward 10000000 took 0.2944270490115741
CPU for- & backward 100000000 took 1.6856628700043075
CPU for- & backward TOTAL time 2.0091958299890393

GPU warmup 1000 took 0.0002462909906171262
GPU warmup 10000 took 9.991199476644397e-05
GPU warmup 100000 took 0.00034347400651313365
GPU warmup TOTAL time 0.0007382350013358518
GPU forward 1000 took 9.67290106927976e-05
GPU forward 10000 took 9.349700121674687e-05
GPU forward 100000 took 9.384499571751803e-05
GPU forward 1000000 took 0.0004975290066795424
GPU forward 10000000 took 0.0017606960027478635
GPU forward 100000000 took 0.003572814996005036
GPU forward TOTAL time 0.006185991995153017
GPU for- & backward 1000 took 0.00035818999458570033
GPU for- & backward 10000 took 0.0003240450023440644
GPU for- & backward 100000 took 0.0003223370003979653
GPU for- & backward 1000000 took 0.00036740700306836516
GPU for- & backward 10000000 took 0.0003690610028570518
GPU for- & backward 100000000 took 0.0003672500024549663
GPU for- & backward TOTAL time 0.002197896988946013
```

## 1.2 output
```
PyTorch version: 1.2.0
CPU Operator sanity check:
tensor(0.5926, grad_fn=<SoftMarginLossBackward>)
tensor([-0.0159, -0.0170, -0.0011, -0.0083, -0.0140, -0.0217, -0.0290, -0.0262,
        -0.0078, -0.0129])
double backward
tensor(-0.1540, grad_fn=<SumBackward0>)
ok

GPU Operator sanity check:
tensor(0.5601, device='cuda:0', grad_fn=<SoftMarginLossBackward>)
tensor([-0.0393, -0.0316, -0.0233, -0.0140, -0.0141, -0.0161, -0.0322, -0.0238,
        -0.0054, -0.0151], device='cuda:0')
double backward
tensor(-0.2148, device='cuda:0', grad_fn=<SumBackward0>)
ok

CPU warmup 1000 took 8.422900282312185e-05
CPU warmup 10000 took 0.00036992700188420713
CPU warmup 100000 took 0.003682684007799253
CPU warmup TOTAL time 0.004169487991021015
CPU forward 1000 took 5.521099956240505e-05
CPU forward 10000 took 0.00036948200431652367
CPU forward 100000 took 0.003762389998883009
CPU forward 1000000 took 0.03725024699815549
CPU forward 10000000 took 0.3614480490068672
CPU forward 100000000 took 3.6139175269927364
CPU forward TOTAL time 4.016912263003178
CPU for- & backward 1000 took 0.0002734809968387708
CPU for- & backward 10000 took 0.0006605249946005642
CPU for- & backward 100000 took 0.005437346000690013
CPU for- & backward 1000000 took 0.051245586000732146
CPU for- & backward 10000000 took 0.5291594529990107
CPU for- & backward 100000000 took 5.23841712900321
CPU for- & backward TOTAL time 5.8253340990049765

GPU warmup 1000 took 0.0005757809994975105
GPU warmup 10000 took 0.0004058420017827302
GPU warmup 100000 took 0.0003764610009966418
GPU warmup TOTAL time 0.0013992580061312765
GPU forward 1000 took 0.0003543390048434958
GPU forward 10000 took 0.0003633670130511746
GPU forward 100000 took 0.0004807310033356771
GPU forward 1000000 took 0.0005875999922864139
GPU forward 10000000 took 0.0016903509967960417
GPU forward 100000000 took 0.014400018990272656
GPU forward TOTAL time 0.0179396449966589
GPU for- & backward 1000 took 0.0006167769897729158
GPU for- & backward 10000 took 0.0006845899915788323
GPU for- & backward 100000 took 0.000631830989732407
GPU for- & backward 1000000 took 0.0010741150035755709
GPU for- & backward 10000000 took 0.0017265130009036511
GPU for- & backward 100000000 took 0.014847910992102697
GPU for- & backward TOTAL time 0.01965981800458394
```

### Code used for performance test
```
import torch
import torch.nn.functional as F
import torch.nn as nn

from timeit import default_timer

torch.manual_seed(0)
cpu = torch.device('cpu')
gpu = torch.device('cuda')

loss_fn = F.soft_margin_loss

def run_benchmark(name, depth, require_grad, device, fn):
    total_start = default_timer()
    for i in range(3, 3 + depth):
        start = default_timer()
        n = 10 ** i
        a = torch.rand(n, requires_grad=require_grad, device=device)
        b = torch.rand(n, device=device)
        fn(a, b)
        end = default_timer()
        print('{} {} took {}'.format(name, n, end-start))
    total_end = default_timer()
    print('{} TOTAL time {}'.format(name, total_end-total_start))

def fwd_only(a, b):
    out = loss_fn(a, b)

def fwd_bck(a, b):
    out = loss_fn(a, b)
    out.backward()

def sanity_check(name, device):
    print('{} Operator sanity check:'.format(name))
    a = torch.rand(10, requires_grad=True, device=device)
    b = torch.rand(10, device=device)
    out = loss_fn(a,b)
    print(out)
    out.backward()
    print(a.grad)
    print('double backward')
    loss = loss_fn(a, b)
    loss2 = torch.autograd.grad(loss, a, create_graph=True)
    z = loss2[0].sum()
    print(z)
    z.backward()
    print('ok')
    print()

print('PyTorch version:', torch.__version__)
sanity_check('CPU', cpu)
sanity_check('GPU', gpu)
print()

run_benchmark('CPU warmup', 3, False, cpu, fwd_only)
run_benchmark('CPU forward', 6, False, cpu, fwd_only)
run_benchmark('CPU for- & backward', 6, True, cpu, fwd_bck)
print()

run_benchmark('GPU warmup', 3, False, gpu, fwd_only)
run_benchmark('GPU forward', 6, False, gpu, fwd_only)
run_benchmark('GPU for- & backward', 6, True, gpu, fwd_bck)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27673

Differential Revision: D17889288

Pulled By: ezyang

fbshipit-source-id: 9ddffe4dbbfab6180847a8fec32443910f18f0a9
2019-10-15 08:44:57 -07:00
zou3519
e5d6b75319 Bag of documentation fixes; fix more sphinx warnings (#27850)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27850

Many of these are real problems in the documentation (i.e., link or
bullet point doesn't display correctly).

Test Plan: - built and viewed the documentation for each change locally.

Differential Revision: D17908123

Pulled By: zou3519

fbshipit-source-id: 65c92a352c89b90fb6b508c388b0874233a3817a
2019-10-15 07:31:14 -07:00
Ailing Zhang
15f9fe1d92 Add missing Optional annotation.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27564

Differential Revision: D17816121

Pulled By: ailzhang

fbshipit-source-id: 5a4ac12ed81bf5d900ec3e7ab616082cb98d832d
2019-10-11 09:04:29 -07:00
Guanheng Zhang
eb93200321 Fix DDP incompatibility issue with nn.MultiheadAttention. (#26826)
Summary:
Fix issue https://github.com/pytorch/pytorch/issues/26698.

With different query/key/value dimensions, `nn.MultiheadAttention` has a DDP incompatibility issue because in that case the `in_proj_weight` attribute is created but not used. Fix it and add a distributed unit test.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26826

Differential Revision: D17583807

Pulled By: zhangguanheng66

fbshipit-source-id: c393584c331ed4f57ebaf2d4015ef04589c973f6
2019-10-08 12:13:34 -07:00
Lara
d396c7332a Update ONNX Export for Interpolate in Opset 11 (#26778)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26778

- Add support for linear and cubic interpolate in opset 11.
- Add support for 1d and 3d interpolate in nearest mode for opset 7 and 8.
- Add tests for all cases of interpolate in ORT tests (nearest/linear/cubic, 1d/2d/3d, upsample/downsample).
Original PR resolved: https://github.com/pytorch/pytorch/pull/24805

Reviewed By: hl475

Differential Revision: D17564911

Pulled By: houseroad

fbshipit-source-id: 591e1f5b361854ace322eca1590f8f84d29c1a5d
2019-09-25 05:43:20 -07:00
Edward Yang
1bb895e1c1 Revert D17330801: [pytorch][PR] Update ONNX Export for Interpolate in Opset 11
Test Plan: revert-hammer

Differential Revision:
D17330801

Original commit changeset: 1bdefff9e72f

fbshipit-source-id: dff07477403170c27260f736ab6e6010f0deca9f
2019-09-24 18:56:45 -07:00
Lara
de3d4686ca Update ONNX Export for Interpolate in Opset 11 (#24805)
Summary:
- Add support for linear and cubic interpolate in opset 11.
- Add support for 1d and 3d interpolate in nearest mode for opset 7 and 8.
- Add tests for all cases of interpolate in ORT tests (nearest/linear/cubic, 1d/2d/3d, upsample/downsample).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24805

Reviewed By: hl475

Differential Revision: D17330801

Pulled By: houseroad

fbshipit-source-id: 1bdefff9e72f5e70c51f4721e1d7347478b7505b
2019-09-24 16:29:57 -07:00
Patrick Donnelly
883628cb5c Added documentation for nn.functional.bilinear (#24951)
Summary:
Adds documentation for `nn.functional.bilinear`, as requested in https://github.com/pytorch/pytorch/issues/9886.

The format follows that of `nn.functional.linear`, and borrows from `nn.Bilinear` in its description of `Tensor` shapes.

I am happy to add more extensive documentation (e.g. "Args," "Example(s)"). From what I gather, the format of comments is inconsistent across functions in `nn.functional.py` and between modules (e.g. `nn.functional` and `nn`). It's my first PR, so guidance for contributing documentation and other code would be greatly appreciated!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24951

Differential Revision: D17091261

Pulled By: soumith

fbshipit-source-id: efe2ad764700dfd6f30eedc03de4e1cd0d10ac72
2019-08-28 08:19:25 -07:00
bnehoran
74b65c32be Add align_corners option to grid_sample and affine_grid, change default to False (#24929)
Summary:
Resolves: https://github.com/pytorch/pytorch/issues/20785
Addresses https://github.com/pytorch/pytorch/issues/24470 for `affine_grid`
Subsumes and closes: https://github.com/pytorch/pytorch/pull/24878 and likewise closes: https://github.com/pytorch/pytorch/issues/24821

Adds the `align_corners` option to `grid_sample` and `affine_grid`, paralleling the option that was added to `interpolate` in version 0.4.0.

In short, setting `align_corners` to `False` allows these functions to be resolution agnostic.
This ensures, for example, that a grid generated from a neural net trained to warp 1024x1024 images will also work to warp the same image upsampled/downsampled to other resolutions like 512x512 or 2048x2048 without producing scaling/stretching artifacts.

Refer to the documentation and https://github.com/pytorch/pytorch/issues/20785 for more details.

#### BC-Breaking Changes

- **Important**: BC-Breaking change because of new default for `align_corners`
The old functionality can still be achieved by setting `align_corners=True`, but the default is now set to `align_corners=False`, since this is the more correct setting, and since this matches the default setting of `interpolate`.

- **Should not cause BC issues**: BC-Breaking change for pathological use case
2D affine transforms on 1D coordinates and 3D affine transforms on 2D coordinates (that is, when one of the spatial dimensions has an empty span) are ill-defined, and not an intended use case of `affine_grid`. Whereas before, all grid point components along such a dimension were set arbitrarily to `-1` (that is, before multiplying by the affine matrix), they are now all set instead to `0`, which is a much more consistent and defensible arbitrary choice. A warning is triggered for such cases.

#### Documentation

- Update `affine_grid` documentation to express that it does indeed support 3D affine transforms. This support was already there but not documented.
- Add documentation warnings for BC-breaking changes in `grid_sample` and `affine_grid` (see above).

#### Refactors

- `affine_grid` no longer dispatches to cuDNN under any circumstances.
The decision point for when the cuDNN `affine_grid_generator` is compatible with the native PyTorch version and when it fails is a headache to maintain (see [these conditions](5377478e94/torch/nn/_functions/vision.py (L7-L8))). The native PyTorch kernel is now used in all cases.

- The kernels for `grid_sample` are slightly refactored to make maintenance easier.

#### Tests
Two new tests are added in `test_nn.py`:
- `test_affine_grid_error_checking` for errors and warnings in `affine_grid`
- `test_affine_grid_3D` for testing `affine_grid`'s 3D functionality. The functionality existed prior to this, but wasn't tested.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24929
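
As a usage sketch (not taken from the PR), pairing the two functions under the new default `align_corners=False`:
```
import torch
import torch.nn.functional as F

img = torch.arange(16.).view(1, 1, 4, 4)
theta = torch.tensor([[[1., 0., 0.],
                       [0., 1., 0.]]])            # identity transform, shape (N, 2, 3)

grid = F.affine_grid(theta, size=(1, 1, 4, 4), align_corners=False)
out = F.grid_sample(img, grid, align_corners=False)
print(torch.allclose(out, img))  # True: identity warp reproduces the input
```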

Differential Revision: D16949064

Pulled By: ailzhang

fbshipit-source-id: b133ce0d47a2a5b3e2140b9d05fb05fca9140926
2019-08-21 21:17:49 -07:00
Ailing Zhang
b0737ccdc1 Revert D16887357: [pytorch][PR] [BC-BREAKING] Add align_corners option to grid_sample and affine_grid, change default to False
Differential Revision:
D16887357

Original commit changeset: ea09aad7853e

fbshipit-source-id: 0bebb159be4e6ebe479771b42c0b483f5a84a094
2019-08-19 22:05:56 -07:00
Barak Nehoran
87217cfd2a Add align_corners option to grid_sample and affine_grid, change default to False (#23923)
Summary:
Resolves: https://github.com/pytorch/pytorch/issues/20785

Adds the `align_corners` option to `grid_sample` and `affine_grid`, paralleling the option that was added to `interpolate` in version 0.4.0.

In short, setting `align_corners` to `False` allows these functions to be resolution agnostic.
This ensures, for example, that a grid generated from a neural net trained to warp 1024x1024 images will also work to warp the same image upsampled/downsampled to other resolutions like 512x512 or 2048x2048 without producing scaling/stretching artifacts.

Refer to the documentation and https://github.com/pytorch/pytorch/issues/20785 for more details.

**Important**: BC-Breaking Change because of new default
The old functionality can still be achieved by setting `align_corners=True`, but the default is now set to `align_corners=False`, since this is the more correct setting, and since this matches the default setting of `interpolate`.

The vectorized 2D cpu version of `grid_sampler` is refactored a bit. I don’t suspect that this refactor would affect the runtime much, since it is mostly done in inlined functions, but I may be wrong, and this has to be verified by profiling.

~The tests are not yet updated to reflect the new default. New tests should probably also be added to test both settings of `align_corners`.~ _Tests are now updated._
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23923

Differential Revision: D16887357

Pulled By: ailzhang

fbshipit-source-id: ea09aad7853ef16536e719a898db8ba31595daa5
2019-08-19 09:45:44 -07:00
Elias Ellison
33a1c30cb1 cleanup torch/nn/functional.py (#23977)
Summary:
Clean up torch/nn/functional now that the JIT:
- Handles multiple returns
- Type-checks exits (exceptions)
- Refines types from assertions
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23977

Differential Revision: D16697750

Pulled By: eellison

fbshipit-source-id: 1f777d6b9ead1105de50120fffd46d523e1e6797
2019-08-07 16:31:36 -07:00
Tongzhou Wang
3107f1dcd5 fix align_corners doc
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23707

Differential Revision: D16617565

Pulled By: ezyang

fbshipit-source-id: 9ae581e9233d8c2b92f35b9486af1dab30ce8e3a
2019-08-02 12:43:35 -07:00
Ailing Zhang
b7d90332ea add notes about overshoot in bicubic mode (#23321)
Summary:
fix https://github.com/pytorch/pytorch/issues/21044

Bicubic interpolation can cause overshoot.

Opencv keeps results dtype aligned with input dtype:
- If input is uint8, the result is clamped [0, 255]
- If input is float, the result is unclamped.

In PyTorch's case, we only accept float input, so we'll keep the result unclamped, and add some notes so that users can explicitly call `torch.clamp()` when necessary.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23321
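
A minimal sketch of the recommended pattern, with the clamp left as an explicit opt-in:
```
import torch
import torch.nn.functional as F

x = torch.rand(1, 3, 8, 8)                      # values in [0, 1]
up = F.interpolate(x, scale_factor=2, mode='bicubic', align_corners=False)
print(up.min().item(), up.max().item())         # may overshoot outside [0, 1]
up_clamped = up.clamp(0, 1)                     # explicit clamping when a bounded range is needed
```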

Differential Revision: D16464796

Pulled By: ailzhang

fbshipit-source-id: 177915e525d1f54c2209e277cf73e40699ed1acd
2019-07-24 14:46:37 -07:00
Igor Fedan
c2df54d6d0 avg_pool2d avg_pool3d for LongTensor (#22433)
Summary:
Generate avg_pool2d/avg_pool3d for LongTensor for CPU.
Added divisor_override parameter.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22433
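
For illustration, a small sketch of the added `divisor_override` argument:
```
import torch
import torch.nn.functional as F

x = torch.arange(16.).view(1, 1, 4, 4)
default = F.avg_pool2d(x, kernel_size=2)                      # window sum / 4
summed = F.avg_pool2d(x, kernel_size=2, divisor_override=1)   # plain window sums
print(torch.allclose(summed, default * 4))                    # True
```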

Differential Revision: D16108809

Pulled By: ifedan

fbshipit-source-id: 8de7ff585a0479702cceafb5ccf9dfea62a9cc50
2019-07-17 19:59:09 -07:00
Tongzhou Wang
332824551c Fix F.one_hot doc signature
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22929

Differential Revision: D16290741

Pulled By: ezyang

fbshipit-source-id: d8b979e64d92b94c5a70bb4ffe2a83042ed6abfc
2019-07-17 13:23:25 -07:00
David Riazati
10c4b98ade Remove weak script (#22212)
Summary:
* Deletes all weak script decorators / associated data structures / methods
   * In order to keep supporting the standard library in script, this enables recursive script on any function defined in `torch.nn`
   * Most changes in `torch/nn` are the result of `ag -Q "weak" torch/nn/ -l | xargs sed -i '/weak/d'`; only `rnn.py` needed manual editing, using `ignore` and `export` to continue supporting the overloaded `forward` methods
* `Sequential`/`ModuleList` no longer need to be added to constants since they are compiled on demand

This should also fix https://github.com/pytorch/pytorch/issues/22212
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22212

Differential Revision: D15988346

Pulled By: driazati

fbshipit-source-id: af223e3ad0580be895377312949997a70e988e4f
2019-07-03 17:28:25 -07:00
Guanheng Zhang
bb0f299f27 Update MultiheadAttention module support key/value with different number of features and allow static key/value (#21288)
Summary:
The changes include:

1. Allow key/value to have a different number of features from query. This supports the case where key and value have feature dimensions different from query's.
2. Support three separate proj_weights, in addition to a single in_proj_weight. The proj_weight of key and value may have a different dimension from that of query, so three separate proj_weights are necessary. In case key and value have the same dimension as query, it is preferred to use a single large proj_weight for performance reasons. However, it should be noted that using a single large weight or three separate weights is a size-dependent decision.
3. Give an option to use static k and v in the multihead_attn operator (see saved_k and saved_v). Those static key/value tensors can now be re-used when training the model.
4. Add more test cases to cover the arguments.

Note: current users should not be affected by the changes.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21288
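
For illustration, a minimal sketch of the new constructor arguments that allow different key/value feature sizes (kdim/vdim):
```
import torch
import torch.nn as nn

embed_dim, num_heads = 16, 4
mha = nn.MultiheadAttention(embed_dim, num_heads, kdim=8, vdim=12)

L, S, N = 5, 7, 2
query = torch.randn(L, N, embed_dim)
key = torch.randn(S, N, 8)      # key feature dim = kdim
value = torch.randn(S, N, 12)   # value feature dim = vdim
out, _ = mha(query, key, value)
print(out.shape)                # torch.Size([5, 2, 16])
```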

Differential Revision: D15738808

Pulled By: zhangguanheng66

fbshipit-source-id: 288b995787ad55fba374184b3d15b5c6fe9abb5c
2019-07-02 18:06:25 -07:00
Lara
34aee933f9 ONNX Export Interpolate (Resize) for opset version 10
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21434

Reviewed By: zrphercule

Differential Revision: D15777197

Pulled By: houseroad

fbshipit-source-id: 517b06a54a234ffdb762401e83f5a732023ed259
2019-06-19 13:40:27 -07:00
Ivan Ogasawara
0f675f9cbc Port im2col and vol2col (#21769)
Summary:
Partially resolves https://github.com/pytorch/pytorch/issues/18353
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21769

Differential Revision: D15854530

Pulled By: ezyang

fbshipit-source-id: 574853c068010d1b7588047d2ab7450077471447
2019-06-17 10:06:26 -07:00
Natalia Gimelshein
efd20de276 fix multihead attention for half (#21658)
Summary:
Currently multihead attention for the half type is broken:
```
  File "/home/ngimel/pytorch/torch/nn/functional.py", line 3279, in multi_head_attention_forward
    attn_output = torch.bmm(attn_output_weights, v)
RuntimeError: Expected object of scalar type Float but got scalar type Half for argument #2 'mat2'
```
because softmax converts half inputs into fp32 inputs. This is unnecessary - all the computations in softmax will be done in fp32 anyway, and the results need to be converted into fp16 for the subsequent batch matrix multiply, so nothing is gained by writing them out in fp32. This PR gets rid of type casting in softmax, so that half works.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21658
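
A quick sketch of the behavior this enables (illustrative; assumes a CUDA device with fp16 support):
```
import torch
import torch.nn.functional as F

attn_weights = torch.randn(8, 16, 16, device='cuda', dtype=torch.half)
v = torch.randn(8, 16, 64, device='cuda', dtype=torch.half)

probs = F.softmax(attn_weights, dim=-1)   # stays half instead of being upcast to float
out = torch.bmm(probs, v)                 # previously raised the Float/Half mismatch above
print(probs.dtype, out.dtype)             # torch.float16 torch.float16
```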

Differential Revision: D15807487

Pulled By: zhangguanheng66

fbshipit-source-id: 4709ec71a36383d0d35a8f01021e12e22b94992d
2019-06-13 15:17:04 -07:00
Kabir Kwatra
26bcadcc61 Gumbel-Softmax Arxiv Docs Link Fix (#21376)
Summary:
Links separated #20297
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21376

Differential Revision: D15696413

Pulled By: ezyang

fbshipit-source-id: 513bd430e41c109aa2d0fbaa9a242acb2a12059b
2019-06-06 10:11:18 -07:00
Xiaomeng Yang
0c6efbd410 Fix gelu documents (#21265)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21265

Fix gelu documents

Reviewed By: hl475

Differential Revision: D15598958

fbshipit-source-id: 483040069102daada705401c36c8990598142d3d
2019-06-02 20:17:56 -07:00
Xiaomeng Yang
93ae040ff0 Add gelu activation in pytorch (#20665)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20665

Add gelu activation forward on CPU in pytorch

Compared to the current Python-implemented version of GELU in the BERT model, e.g.

  def gelu(self, x):
      return x * 0.5 * (1.0 + torch.erf(x / self.sqrt_two))

the torch.nn.functional.gelu function can reduce the forward time from 333ms to 109ms (with MKL) / 112ms (without MKL) for input size = [64, 128, 56, 56] on a devvm.
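
For reference, a quick sketch (illustrative, not the kernel itself) checking that the built-in matches the erf formula above:
```
import math
import torch
import torch.nn.functional as F

x = torch.randn(1000)
manual = x * 0.5 * (1.0 + torch.erf(x / math.sqrt(2.0)))
print(torch.allclose(F.gelu(x), manual, atol=1e-6))  # True
```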

Reviewed By: zheng-xq

Differential Revision: D15400974

fbshipit-source-id: f606b43d1dd64e3c42a12c4991411d47551a8121
2019-06-02 09:08:47 -07:00
Guanheng Zhang
8e3311c5e2 Remove functionality unsupported by the JIT from multi_head_attention_forward. (#20653)
Summary:
Remove the internal functions in multi_head_attention_forward. Those internal functions cause a 10-15% performance regression, and there is possibly a JIT issue.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20653

Differential Revision: D15398888

Pulled By: cpuhrsch

fbshipit-source-id: 0a3f053a4ade5009e73d3974fa6733c2bff9d929
2019-05-27 15:12:58 -07:00
daquexian
a3a458ed30 Fix align corner docs (#20961)
Summary:
I believe the `True` and `False` in the doc are reversed :)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20961

Differential Revision: D15510806

Pulled By: soumith

fbshipit-source-id: 62566bb595e187506b23dedc24892e48f35b1147
2019-05-26 14:57:37 -07:00
Yifu Wang
5e69e76aba Remove padding_mode from torch.nn.functional.conv{1,2,3}d's docstr (#20891)
Summary:
Fixes #20694
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20891

Differential Revision: D15510790

Pulled By: soumith

fbshipit-source-id: aa3630693c7446bf18a390cb49c4df9bc9c59eea
2019-05-26 14:52:51 -07:00
Josef Lindman Hörnlund
87040af498 Fix documentation for attention mask shape (#20850)
Summary:
Attention mask should be of shape `(L, S)` since it is added to `attn_output_weights`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20850

Differential Revision: D15495587

Pulled By: ezyang

fbshipit-source-id: 61d6801da5291df960daab273e874df28aedbf6e
2019-05-24 09:10:11 -07:00
Guanheng Zhang
3caf4e6985 Remove weak_script in MultiheadAttention function. (#20563)
Summary:
Remove weak_script. After recently splitting the forward() function in the MultiheadAttention module, we noticed a memory leak on GPU. Fix the problem by removing the "weak_script" decorators.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20563

Differential Revision: D15368262

Pulled By: zhangguanheng66

fbshipit-source-id: 475db93c9ee0dbaea8fb914c004e7d1e0d419bc2
2019-05-15 20:10:39 -07:00
Jason Lian
6e82b1c77d Split nn.MultiHeadAttention into Module + functional (#20415)
Summary:
Moving functions from torch/nn/modules/activation.py to torch/nn/functional.py. For functions not implemented (_get_input_buffer and _set_input_buffer), a TODO is added.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20415

Differential Revision: D15318078

Pulled By: jamarshon

fbshipit-source-id: 5ca698e2913821442cf8609cc61ac8190496a3c6
2019-05-14 08:41:28 -07:00
interesaaat
35fed93b1e Adding Poisson NLL loss to libtorch (#19316)
Summary:
This PR adds Poisson NLL loss to ATen and substitutes the Python implementation with a call to the C++ one.

Fixes #19186.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19316
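
For illustration, a minimal usage sketch of the functional now backed by the ATen implementation:
```
import torch
import torch.nn.functional as F

log_rate = torch.randn(8, requires_grad=True)      # model output, interpreted as log(rate)
target = torch.poisson(torch.rand(8) * 4)          # observed counts

loss = F.poisson_nll_loss(log_rate, target, log_input=True, reduction='mean')
loss.backward()
print(loss.item())
```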

Differential Revision: D15012957

Pulled By: ezyang

fbshipit-source-id: 0a3f56e8307969c2f9cc321b5357a496c3d1784e
2019-05-10 11:57:49 -07:00
Ailing Zhang
899bddeeb6 fix typo in adaptive methods annotation (#20306)
Summary:
fixes #20215
The confusing behavior was caused by typos in type annotation :(
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20306

Differential Revision: D15276216

Pulled By: ailzhang

fbshipit-source-id: 1b0c9635a72a05c9b537f80d85b117b5077fbec7
2019-05-09 09:29:37 -07:00
Mikhail Zolotukhin
3a0727e58b Fix flake8. (#19832)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19832
ghimport-source-id: 7360a52dbcf83458797c27002afc1fd53ee5907f

Differential Revision: D15115620

Pulled By: ZolotukhinM

fbshipit-source-id: aa62b04facc1e1824a8889a32dace5804daa21df
2019-04-30 12:09:10 -07:00
Tongzhou Wang
42fbeef5d7 update F.grid_sample doc for clarity (#19754)
Summary:
https://github.com/pytorch/pytorch/issues/19717
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19754

Differential Revision: D15085449

Pulled By: soumith

fbshipit-source-id: 0dda05bd395d58a496bf397ca7f1c50a239b0ed1
2019-04-26 16:01:24 -07:00
Wanchao Liang
e9c8f372c4 dispatch max_pools with no indices, expose max_pools to torch namespace (#19449)
Summary:
In the functional interfaces we do boolean dispatch, but always to max_pool\*d_with_indices. This changes it to emit the max_pool\*d op instead when indices are not needed, so the with_indices ops don't have to be exposed to different backends (for the JIT).

It also binds max_pool\*d to the torch namespace, which is the same behavior as avg_pool\*d.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19449
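
A short sketch of the dispatch from the user's side:
```
import torch
import torch.nn.functional as F

x = torch.randn(1, 3, 8, 8)
y = F.max_pool2d(x, kernel_size=2)                                 # emits the plain max_pool2d op
y_idx, idx = F.max_pool2d(x, kernel_size=2, return_indices=True)   # *_with_indices variant
y_torch = torch.max_pool2d(x, kernel_size=2)                       # also exposed on the torch namespace
print(torch.equal(y, y_torch))                                     # True
```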

Differential Revision: D15016839

Pulled By: wanchaol

fbshipit-source-id: f77cd5f0bcd6d8534c1296d89b061023a8288a2c
2019-04-23 11:20:05 -07:00
Richard Zou
2a2007e5ac EmbeddingBag CPU forward with per_sample_weights. (#18735)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18735
ghimport-source-id: d81bef54dafd7167d2451250d7be478d3c013920

Reviewed By: cpuhrsch

Differential Revision: D14851415

Pulled By: zou3519

fbshipit-source-id: cea6039e760ad571b90f0a536e420498f34be325
2019-04-09 18:12:55 -07:00
Zachary DeVito
09c19e1068 Fix interpolate tracing (#19034)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19034
ghimport-source-id: 874e0b0a8685184416152a77fc1850d9a06516ae

Differential Revision: D14837282

Pulled By: zdevito

fbshipit-source-id: b0ed82b607c288a54eecec3d6ed62c4626e5a563
2019-04-08 14:59:26 -07:00
Elias Ellison
e6bbbb017e Fix interpolate trace (#18875)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/10654

The issue is that in tracing `.size` returns an int tensor, and when an int tensor is multiplied by a scalar the int dominates and the scalar gets cast to 0.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18875

Differential Revision: D14814441

Pulled By: eellison

fbshipit-source-id: a4e96a2698f2fcbf3ec4b2bb4c43a30250f30ad9
2019-04-05 17:55:23 -07:00
Joakim Rishaug
b90cbb841d Method is supposed to be in-place (#18684)
Summary:
Tracing models which attempt to return this in-place value doesn't turn out well.

To be honest, I haven't run any tests to confirm the results, but regardless of the outcome the operation happens in-place, so it should work as before.

Sample output from traced model attempting to set `max_norm` on `Embedding`:
```
a leaf Variable that requires grad has been used in an in-place operation. (check_inplace at /pytorch/torch/csrc/autograd/VariableTypeUtils.h:49)
frame #0: std::function<std::string ()>::operator()() const + 0x11 (0x7f0ecc5cc021 in /usr/local/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #1: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x2a (0x7f0ecc5cb8ea in /usr/local/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #2: <unknown function> + 0x38ab2f (0x7f0ecb55ab2f in /usr/local/lib/python3.7/site-packages/torch/lib/libtorch.so.1)
frame #3: torch::autograd::VariableType::embedding_renorm_(at::Tensor&, at::Tensor const&, double, double) const + 0x76 (0x7f0ecb5b5966 in /usr/local/lib/python3.7/site-packages/torch/lib/libtorch.so.1)
frame #4: <unknown function> + 0x56c958 (0x7f0ecb73c958 in /usr/local/lib/python3.7/site-packages/torch/lib/libtorch.so.1)
frame #5: <unknown function> + 0x672286 (0x7f0ecb842286 in /usr/local/lib/python3.7/site-packages/torch/lib/libtorch.so.1)
frame #6: torch::jit::InterpreterState::run(std::vector<c10::IValue, std::allocator<c10::IValue> >&) + 0x22 (0x7f0ecb83d842 in /usr/local/lib/python3.7/site-packages/torch/lib/libtorch.so.1)
frame #7: <unknown function> + 0x65c6ac (0x7f0ecb82c6ac in /usr/local/lib/python3.7/site-packages/torch/lib/libtorch.so.1)
frame #8: <unknown function> + 0x3c8ab4 (0x7f0f06bc0ab4 in /usr/local/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #9: <unknown function> + 0x3ad2c3 (0x7f0f06ba52c3 in /usr/local/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #10: <unknown function> + 0x11663e (0x7f0f0690e63e in /usr/local/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
<omitting python frames>
frame #39: python_call + 0x11 (0x5563c3c521c1 in uwsgi)
frame #40: uwsgi_request_wsgi + 0x100 (0x5563c3c54410 in uwsgi)
frame #41: wsgi_req_recv + 0xac (0x5563c3becabc in uwsgi)
frame #42: simple_loop_run + 0xc4 (0x5563c3c35be4 in uwsgi)
frame #43: simple_loop + 0x10 (0x5563c3c35a00 in uwsgi)
frame #44: uwsgi_ignition + 0x241 (0x5563c3c3a3a1 in uwsgi)
frame #45: uwsgi_worker_run + 0x275 (0x5563c3c3ec35 in uwsgi)
frame #46: <unknown function> + 0x8f22c (0x5563c3c3f22c in uwsgi)
frame #47: <unknown function> + 0x3c13e (0x5563c3bec13e in uwsgi)
frame #48: __libc_start_main + 0xf1 (0x7f0f138922e1 in /lib/x86_64-linux-gnu/libc.so.6)
frame #49: _start + 0x2a (0x5563c3bec16a in uwsgi)
:
operation failed in interpreter:
op_version_set = 0
def forward(self,
    input_1: Tensor) -> Tensor:
  _0 = torch.norm(self.item_embedding.weight, 2, 1, True)
  _1 = torch.div(self.item_embedding.weight, _0)
  m_weight = torch.t(_1)
  input_2 = torch.contiguous(input_1)
  weight_1 = torch.embedding_renorm_(self.item_embedding.weight, input_2, 1., 2.)
             ~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
  x = torch.embedding(weight_1, input_2, -1, False, False)
  input_3 = torch.div(x, torch.norm(x, 2, 2, True))
  max_batch_size = ops.prim.NumToTensor(torch.size(input_3, 0))
  hx = torch.zeros([2, int(max_batch_size), 70], dtype=6, layout=0, device=torch.device("cpu"))
  _2 = [self.lstm_layer.weight_ih_l0, self.lstm_layer.weight_hh_l0, self.lstm_layer.weight_ih_l1, self.lstm_layer.weight_hh_l1]
  input_4, _3, _4 = torch.lstm(input_3, [hx, hx], _2, False, 2, 0.10000000000000001, False, False, True)
  input = torch.matmul(input_4, torch.t(self.rnn2item.weight))
  tastevec = torch.div(input, torch.norm(input, 2, 2, True))
  outputs = torch.matmul(tastevec, m_weight)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18684

Differential Revision: D14782041

Pulled By: ezyang

fbshipit-source-id: 7b2fc19b7d5b6600263644498bb728319a19f39d
2019-04-05 13:00:29 -07:00
Soumith Chintala
cb39bd9c2f pad_circular -> _pad_circular (#18608)
Summary:
pad_circular is really private, as circular padding is exposed via `F.pad`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18608

Differential Revision: D14691704

Pulled By: soumith

fbshipit-source-id: 8c2f90596feed670976115041efed3ca071e8306
2019-03-30 13:27:04 -07:00
Edward Yang
173f224570 Turn on F401: Unused import warning. (#18598)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18598
ghimport-source-id: c74597e5e7437e94a43c163cee0639b20d0d0c6a

Stack from [ghstack](https://github.com/ezyang/ghstack):
* **#18598 Turn on F401: Unused import warning.**

This was requested by someone at Facebook; this lint is turned
on for Facebook by default.  "Sure, why not."

I had to noqa a number of imports in __init__.  Hypothetically
we're supposed to use __all__ in this case, but I was too lazy
to fix it.  Left for future work.

Be careful!  flake8-2 and flake8-3 behave differently with
respect to import resolution for # type: comments.  flake8-3 will
report an import as unused; flake8-2 will not.  For now, I just
noqa'd all these sites.

All the changes were done by hand.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Differential Revision: D14687478

fbshipit-source-id: 30d532381e914091aadfa0d2a5a89404819663e3
2019-03-30 09:01:17 -07:00
Aurélien Roy
12abc8a99a Target and input sizes mismatch warning in L1 Loss / L1 Smooth Loss (#18565)
Summary:
Adding the same warning message already present in the mse_loss function to the L1 losses when input and target sizes are different.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18565

Differential Revision: D14671415

Pulled By: soumith

fbshipit-source-id: 01f5e1fb1ea119dbb2aecf1d94d0cb462f284982
2019-03-28 20:49:51 -07:00
mc-robinson
8bc5b86709 Added tensor size warning to F.mse_loss() (#18349)
Summary:
To address the issue of broadcasting giving the wrong result in `nn.MSELoss()` as mentioned here https://github.com/pytorch/pytorch/issues/16045 . In particular, the issue often arises when computing the loss between tensors with shapes (n, 1) and (n,)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18349
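
A small sketch of the pitfall the new warning targets:
```
import torch
import torch.nn.functional as F

pred = torch.randn(4, 1)
target = torch.randn(4)

broadcasted = F.mse_loss(pred, target)           # broadcasts to (4, 4); now warns about the sizes
intended = F.mse_loss(pred, target.view(4, 1))   # element-wise loss over the 4 intended pairs
print(broadcasted.item(), intended.item())       # generally different values
```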

Differential Revision: D14594176

Pulled By: soumith

fbshipit-source-id: f23ae68a4bf42f3554ad7678a314ba2c7532a6db
2019-03-24 19:22:14 -07:00
Narine Kokhlikyan
670f509984 Circular Convolution Function via circular padding (#17240)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17240

Added circular padding in addition to zero padding to Conv1D, Conv2D and Conv3D based on the solution suggested in: https://github.com/pytorch/pytorch/issues/3858
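
For illustration, a minimal sketch of the new padding mode on a convolution:
```
import torch
import torch.nn as nn

conv = nn.Conv1d(1, 1, kernel_size=3, padding=1, padding_mode='circular', bias=False)
x = torch.arange(8.).view(1, 1, 8)
y = conv(x)                       # border positions see wrapped-around neighbors instead of zeros
print(y.shape)                    # torch.Size([1, 1, 8])
```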

Reviewed By: ezyang

Differential Revision: D14126416

fbshipit-source-id: a2f1587503ee0cfff98d5cb0d5b0a600ef8aaeb4
2019-03-18 12:33:20 -07:00
ZhuBaohe
75f88d4da6 Correct loss docstrings (#17300)
Summary:
In the loss doc descriptions, replace the deprecated 'reduce' and 'size_average' parameters with the 'reduction' parameter.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17300

Differential Revision: D14195789

Pulled By: soumith

fbshipit-source-id: 625e650ec20f13b2d22153a4a535656cf9c8f0eb
2019-03-10 11:56:41 -07:00
zou3519
68c5c66800 Warn about memory overlaps on expanded tensors (#17576)
Summary:
Eventually we should remove these when we're certain that all our ops
handle memory overlaps correctly.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17576

Differential Revision: D14349990

Pulled By: zou3519

fbshipit-source-id: c3a09f6113b9b1bf93e7f13c0b426c45b2cdf21f
2019-03-06 17:44:04 -08:00
ZhuBaohe
19a6de328f Correct docstring of vision/init functions
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17351

Differential Revision: D14276355

Pulled By: soumith

fbshipit-source-id: 9b572b6a04eeb1e44cd93961edac76ed10f7b24e
2019-03-01 11:40:23 -08:00
vishwakftw
724c7e76c6 Fix reduction='none' in poisson_nll_loss (#17358)
Summary:
Changelog:
- Modify `if` to `elif` in reduction mode comparison
- Add error checking for reduction mode
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17358

Differential Revision: D14190523

Pulled By: zou3519

fbshipit-source-id: 2b734d284dc4c40679923606a1aa148e6a0abeb8
2019-02-25 10:35:33 -08:00
ZhuBaohe
e81878e0a9 Correct padding and activations docstrings in nn module
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17197

Differential Revision: D14131284

Pulled By: soumith

fbshipit-source-id: 6edd225b47b1dde81b5ad0a23c588c6621987a69
2019-02-19 08:16:52 -08:00
ZhuBaohe
8852e21245 Correct recurrent/linear/dropout/sparse layers docstrings
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17238

Differential Revision: D14130811

Pulled By: soumith

fbshipit-source-id: d3998ca7da46aec5a59220c6af489f71f3d60735
2019-02-19 05:23:04 -08:00
Krishna
b892f69440 one_hot docs missing (#17142)
Summary:
The one_hot docs are missing [here](https://pytorch.org/docs/master/nn.html#one-hot).

I dug around and could not find a way to get this working properly.

Differential Revision: D14104414

Pulled By: zou3519

fbshipit-source-id: 3f45c8a0878409d218da167f13b253772f5cc963
2019-02-15 10:48:18 -08:00
ZhuBaohe
acf5ec07af Correct conv and pooling docstrings in nn module (#17052)
Summary:
This PR fixes the conv and pooling docstrings in the nn module.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17052

Differential Revision: D14068566

Pulled By: ezyang

fbshipit-source-id: 3ec1de232ff6334b6a544dadefbb0ee6193d443a
2019-02-15 06:58:02 -08:00
David Riazati
48943c3b7a Update Upsample docs to match nn.interpolate
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17134

Reviewed By: ezyang

Differential Revision: D14095694

Pulled By: driazati

fbshipit-source-id: 79afec9ddd50b3b8ce39acf98c2543cf1a3d1127
2019-02-15 06:38:41 -08:00
Ailing Zhang
b0545aa85f maskrcnn & bert AD coverage part 1 (#16689)
Summary:
- Moved a few functions from the `autograd` namespace to the `aten` namespace to be visible from the JIT nativeResolver.
- Added a hack to look up keyword-only arguments. Will add proper support for keyword-only arguments later.
- Simulate function overloads in aten using `_<number>` as a function name suffix.
- Even when `forward` returns multiple outputs, as in `kthvalue`, we currently support at most one output that requires grad.
- Removed the `TensorList`-related ops here since partial `TensorList` support is prone to bugs. Our symbolic diff for `cat` was never tested with autodiff, and it seems broken. Need to find a proper way to support these ops (either by properly supporting `TensorList` or something like `prim::ConstantChunk`) and leave them for the next PR.

Ops supported in this PR:
```
erf
expand_as
index
kthvalue
mean
permute
pow
rsub
select
sqrt
squeeze
t
to
topk
transpose
view
var
embedding
logsumexp
// grad is None
_dim_arange
contiguous
nonzero
ones_like
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16689

Differential Revision: D14020806

Pulled By: ailzhang

fbshipit-source-id: a5e2c144a7be5a0d39d7ac5f93cb402ec12503a5
2019-02-14 15:36:39 -08:00
Theo
3618b52c74 Add module and name to func created with _jit_internal.boolean_dispatch (#16922)
Summary:
The use case for making this PR is the following bug:
(with F = torch.nn.functional)
`F.max_pool2d.__module__` is `torch._jit_internal`
`F.max_pool2d.__name__` is `fn`

With this PR you get:
`F.max_pool2d.__module__` is `torch.nn.functional`
`F.max_pool2d.__name__` is `max_pool2d`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16922

Differential Revision: D14020053

Pulled By: driazati

fbshipit-source-id: c109c1f04640f3b2b69bc4790b16fef7714025dd
2019-02-12 09:38:48 -08:00
Thomas Viehmann
29f096cc70 optionally zero infinite losses in CTCLoss (#16199)
Summary:
Here is a stab at implementing an option to zero out infinite losses (and NaN gradients).
It might be nicer to move the zeroing to the respective kernels.
The default is currently `False` to mimic the old behaviour, but I'd be half inclined to set the default to `True`, because the behaviour wasn't consistent between CuDNN and native anyway and the NaN gradients aren't terribly useful.

This topic seems to come up regularly, e.g. in  #14335
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16199
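
For illustration, a minimal sketch of the new flag (shapes and values here are arbitrary):
```
import torch
import torch.nn as nn

T, N, C = 10, 2, 5                      # time steps, batch, classes (0 = blank)
log_probs = torch.randn(T, N, C).log_softmax(2).requires_grad_()
targets = torch.randint(1, C, (N, 12))  # target length 12 > T, so the loss is infinite
input_lengths = torch.full((N,), T, dtype=torch.long)
target_lengths = torch.full((N,), 12, dtype=torch.long)

ctc = nn.CTCLoss(zero_infinity=True)
loss = ctc(log_probs, targets, input_lengths, target_lengths)
print(loss.item())                      # 0.0 instead of inf
```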

Differential Revision: D14020462

Pulled By: ezyang

fbshipit-source-id: 5ba8936c66ec6e61530aaf01175dc49f389ae428
2019-02-11 13:12:55 -08:00
Wanchao Liang
ac00e85e36 Remove undefined tensor in jit script (#16379)
Summary:
This PR is a follow-up to #15460; it does the following things:

* remove the undefined tensor semantic in jit script/tracing mode
* change ATen/JIT schema for at::index and other index related ops with `Tensor?[]` to align with what at::index is really doing and to adopt `optional[tensor]` in JIT
* change python_print to correctly print the exported script
* register both TensorList and ListOfOptionalTensor in JIT ATen ops to support both
* Backward compatibility for `torch.jit.annotate(Tensor, None)`

List of follow ups:

* remove the undefined tensor semantic in jit autograd, autodiff and grad_of
* remove prim::Undefined fully

For easy reviews, please turn on `hide white space changes` in diff settings.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16379

Differential Revision: D13855677

Pulled By: wanchaol

fbshipit-source-id: 0e21c14d7de250c62731227c81bfbfb7b7da20ab
2019-02-07 11:02:14 -08:00
vishwakftw
34b43baeec Allow list and tuples to be passed as output_size to max_unpool1d (#16489)
Summary:
Changelog:
- Modify concatenation of [1] to a tuple by using cases for list and non-list types.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16489
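
For illustration, a small sketch passing `output_size` as a list:
```
import torch
import torch.nn.functional as F

x = torch.randn(1, 1, 9)
pooled, indices = F.max_pool1d(x, kernel_size=2, return_indices=True)
restored = F.max_unpool1d(pooled, indices, kernel_size=2, output_size=[9])
print(restored.shape)  # torch.Size([1, 1, 9])
```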

Differential Revision: D13875838

Pulled By: soumith

fbshipit-source-id: fade65cc47385986b773b9bde9b4601ab93fe1cf
2019-01-30 11:00:34 -08:00
Lu Fang
b1b00f329e Fix the flake8 linter
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16549

Reviewed By: bddppq

Differential Revision: D13877435

Pulled By: houseroad

fbshipit-source-id: dbe575ba3f6dd30d27ac6aa5eec2eea025063540
2019-01-30 09:36:00 -08:00
Elias Ellison
c2be9f1487 Remove unneeded manual unwrap optionals (#16245)
Summary:
Remove calls to torch.jit._unwrap_optional that are no longer needed.

The remaining instances would require control flow logic for exceptions.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16245

Differential Revision: D13804292

Pulled By: eellison

fbshipit-source-id: 08c5cbe4b956519be2333de5cf4e202488aff626
2019-01-24 15:48:01 -08:00
Egil Martinsson
d6a8dd9538 Cleanup gumbel_softmax (#13339)
Summary:
Fixes #12643, amends to #3341.

- Allow multidimensional input ~~(but apply softmax over `dim=-1`)~~ with `dim` argument
- Cleaner: Less lines of code
- Faster (1.32x speedup vs original, 2x speedup vs using `torch.Distributions`)
- Small fixes in docstring
- Remove some references in the docstring. Was the linked (excellent) ipynb the first to do the straight-through trick? Instead, I propose changing the reference to the two papers best known for it.
- Add deprecationwarning for `eps`. It's not needed anymore.
- Initial commit keeps some code alternatives commented to exploit CI

- As discussed when `gumbel_softmax` was added (#3341), this was merged into `torch.nn.functional` before all the work on `Distributions` and `Pyro`, and there will probably be multiple other best practices for this in the future.
I've tested building on top of the `Distributions` API, but it was too slow; see below.

I therefore propose not using `Distributions` to keep it fast and simple, but adding a comment in docstring that `gumbel_softmax` may be deprecated in the future.

```
dist = torch.distributions.RelaxedOneHotCategorical(temperature=tau, logits=logits, validate_args=False)
y_soft = dist.rsample()
```

Pros:
* Built using tricks like `logsumexp` etc
* Explicitly uses `torch.distributions.utils._finfo` to avoid overflow (old implementation had an `eps` flag)
* Maintained for this exact purpose.

Cons:
* Very slow. Construction of distribution adds overhead see timings below. May be solved in future with speedups of `TransformedDistribution` and `Distribution`.
* Assumes which `dim` to apply softmax over.

```
    y_soft = logits.new(logits.shape)
    y_soft = (logits - y_soft.exponential_().log()) / tau  # Gumbel noise
    y_soft = y_soft.softmax(dim)  # Gumbel softmax noise
```
Pros:
* Faster

```
import time

import torch
from torch.nn.functional import gumbel_softmax

num_draws = 1000000
logits = torch.randn(1, 3)
counts = torch.zeros_like(logits)

start = time.time()
for draw in range(num_draws):
    y_draw = gumbel_softmax(logits, hard=True)
    counts = counts + y_draw
end = time.time()
print(end - start)

>> 12.995795965194702

>> 7.658372640609741

>> 20.3382670879364
```

Decide which path to choose. I'll commit changes to the unit tests in a while to show that it passes both the old and new tests. I'll also remove the commented code about `RelaxedOneHotCategorical`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13339

Differential Revision: D13092434

Pulled By: ezyang

fbshipit-source-id: 4c21788df336f4e9c2ac289022e395b261227b4b
2019-01-17 12:56:35 -08:00
Gregory Chanan
595f767880 Revert batched pdist, improve existing kernel, add test (#15901)
Summary:
1) Reverts https://github.com/pytorch/pytorch/pull/12302 which added support for batched pdist. Except I kept the (non-batched) test improvements that came with that PR, because they are nice to have.  Motivation: https://github.com/pytorch/pytorch/issues/15511
2) For the non-batched pdist, improved the existing kernel by forcing fp64 math and properly checking cuda launch errors
3) Added a 'large tensor' test that at least on my machine, fails on the batch pdist implementation.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15901

Reviewed By: ezyang

Differential Revision: D13616730

Pulled By: gchanan

fbshipit-source-id: 620d3f9b9acd492dc131bad9d2ff618d69fc2954
2019-01-17 10:44:43 -08:00
Chandler Zuo
237c0c3c7a Port the backend of FractionalMaxPool3d from TH to ATen (#15575)
Summary:
1. Port the FractionalMaxPool3d implementation from THNN/THCUNN to ATen.
2. Expose this function to the Python `nn` module (usage sketched below).
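A rough usage sketch of the exposed module (the shapes here are illustrative):

```python
import torch
import torch.nn as nn

x = torch.randn(2, 3, 16, 16, 16)
# either output_size or output_ratio determines the pooled size;
# the pooling regions themselves are chosen pseudo-randomly
pool = nn.FractionalMaxPool3d(kernel_size=3, output_size=(8, 8, 8))
print(pool(x).shape)  # torch.Size([2, 3, 8, 8, 8])
```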
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15575

Differential Revision: D13612848

Pulled By: chandlerzuo

fbshipit-source-id: 5f474b39005efa7788e984e8a805456dcdc43f6c
2019-01-16 14:16:30 -08:00
Elias Ellison
7d601715e5 Constant prop prim::None (#15979)
Summary:
Previously we were only constant propping prim::Constants, but we should be constant propping prim::None as well.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15979

Differential Revision: D13664692

Pulled By: eellison

fbshipit-source-id: 01839403576c21fc030c427e49275b8e1210fa8f
2019-01-15 11:34:51 -08:00
Derek Kim
abdaa477e5 Improved the documentation for torch.nn.functional.pad (#15984)
Summary:
- Fixed a few typos and grammar errors.
- Changed the sentences a bit.
- Changed the format of the tuples to be consistent with the padding notation used elsewhere. For example, `ReflectionPad2d`'s docstring contains :math:`H_{out} = H_{in} + \text{padding\_top} + \text{padding\_bottom}`.

I also made sure that the generated html doesn't break.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15984

Differential Revision: D13649939

Pulled By: soumith

fbshipit-source-id: 0abfa22a7bf1cbc6546ac4859652ce8741d41232
2019-01-14 04:12:45 -08:00
Derek Kim
da753b7ccf Trivial typo fixings in nn.functional dropout* docstrings (#15951)
Summary:
Defualt -> Default
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15951

Differential Revision: D13633875

Pulled By: soumith

fbshipit-source-id: 0da823ef235418396e9322089f6610b592e6990f
2019-01-10 22:42:52 -08:00
Gao, Xiang
a47749cb28 Add at::one_hot (#15208)
Summary: Closes: https://github.com/pytorch/pytorch/issues/15060
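For context, a short sketch of how the new operator is reachable from Python (assuming the `torch.nn.functional.one_hot` binding that wraps `at::one_hot`):

```python
import torch
import torch.nn.functional as F

labels = torch.tensor([0, 2, 1])
# each index becomes a one-hot row of length num_classes
print(F.one_hot(labels, num_classes=3))
# tensor([[1, 0, 0],
#         [0, 0, 1],
#         [0, 1, 0]])
```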

Differential Revision: D13528014

Pulled By: ezyang

fbshipit-source-id: 5a18689a4c5638d92f9390c91517f741e5396293
2018-12-20 14:24:58 -08:00
Erik Brinkman
8db44eda01 Add support for batched pdist (#12302)
Summary:
This updates pdist to work for batched inputs, and updates the
documentation to reflect issues raised.

closes #9406
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12302

Reviewed By: ezyang

Differential Revision: D13528485

Pulled By: erikbrinkman

fbshipit-source-id: 63d93a6e1cc95b483fb58e9ff021758b341cd4de
2018-12-20 09:41:08 -08:00
David Riazati
f3cc9b2218 Remove fully qualified weak script names (#15364)
Summary:
Cleanup to make references to `weak_script` consistent across codebase
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15364

Differential Revision: D13509676

Pulled By: driazati

fbshipit-source-id: 93dbbbe57e9b9b6587895f3cc6fac678babd21de
2018-12-18 16:48:52 -08:00
David Riazati
3118124cd6 Add (Un)Fold modules to standard library (#14759)
Summary:
Depends on #14597 for the corresponding aten ops.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14759

Differential Revision: D13325356

Pulled By: driazati

fbshipit-source-id: 99e39449c1ccfa293de05672c31a11e580bdd11f
2018-12-18 12:03:08 -08:00
Roy Li
e0b261a35b Port nn fold and unfold to c++
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14597

Reviewed By: ezyang

Differential Revision: D13272227

fbshipit-source-id: 6eccab5ff5830a977398a96393b778095120edc6
2018-12-17 15:46:37 -08:00
David Riazati
59d71b9664 Bicubic interpolation for nn.functional.interpolate (#9849)
Summary:
Addresses #918; interpolation results should be similar to TF.

* Adds bicubic interpolation operator to `nn.functional.interpolate`
* Corresponding test in `test_nn.py`

The operator is added in legacy `TH` to be aligned with the other upsampling operators; they can be refactored/moved to ATen all at once when #10482 is resolved
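A minimal sketch of the new mode (sizes are illustrative):

```python
import torch
import torch.nn.functional as F

x = torch.randn(1, 3, 8, 8)
# 'bicubic' joins 'nearest', 'bilinear', 'trilinear', ... as an interpolate mode
y = F.interpolate(x, scale_factor=2, mode='bicubic', align_corners=False)
print(y.shape)  # torch.Size([1, 3, 16, 16])
```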
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9849

Differential Revision: D9007525

Pulled By: driazati

fbshipit-source-id: 93ef49a34ce4e5ffd4bda94cd9a6ddc939f0a4cc
2018-12-17 15:31:48 -08:00
Yuxin Wu
110ccbb689 Improve the docs of interpolate(align_corners=) (#14806)
Summary:
ailzhang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14806

Reviewed By: ailzhang

Differential Revision: D13366332

Pulled By: ppwwyyxx

fbshipit-source-id: 08fcea95d5c86b11cdfe464fdd9daa50050871f1
2018-12-10 12:50:38 -08:00
David Riazati
a66669a110 Enable testing on Loss modules (#14778)
Summary:
This PR adds `None` buffers as parameters (similarly to #14715). It also cleans up a bunch of the `test_jit.py` tests that should be covered by `common_nn.py` and brings in `criterion_tests` to test loss functions.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14778

Differential Revision: D13330849

Pulled By: driazati

fbshipit-source-id: 924cc4cf94e0dcd11e811a55222fd2ebc42a9e76
2018-12-04 18:35:10 -08:00
Ailing Zhang
ef91cfd68b Add new reduction mode in kl_div (#14457)
Summary:
Fixes #6622 .
We used to average over all elements for KL divergence, which is not aligned with its mathematical definition.
This PR corrects the default reduction behavior of KL divergence so that it now averages over the batch dimension.

- In KL, the default behavior `reduction=mean` averages over the batch dimension, while for most other loss functions `reduction=mean` averages over all elements.
- We used to support scalar tensors as well. For BC purposes we still support them; no reduction is performed on a scalar tensor.
- Added a new reduction mode called `batchmean`, which has the correct behavior for KL (see the sketch after this list). Added a warning that `batchmean` will become the default for KL instead of `mean` in the next major release.
- [deprecated] I chose not to add a new reduction option, since "mean over batch dimension" is kind of special, and it only makes sense in a few cases like KL. We don't want to explain why there's an option "batchmean" that isn't applicable to all other functions. I'm open to discussion on this one, as I cannot think of a perfect solution.
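A minimal sketch of the difference between the two reduction modes (shapes are illustrative):

```python
import torch
import torch.nn.functional as F

input = F.log_softmax(torch.randn(4, 5), dim=1)  # log-probabilities
target = F.softmax(torch.randn(4, 5), dim=1)     # probabilities

loss_mean = F.kl_div(input, target, reduction='mean')        # averages over all elements
loss_batch = F.kl_div(input, target, reduction='batchmean')  # sums, then divides by batch size
```

As noted above, `batchmean` is the mode that matches the mathematical definition for a batch of distributions.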
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14457

Differential Revision: D13236016

Pulled By: ailzhang

fbshipit-source-id: 905cc7b3bfc35a11d7cf098b1ebc382170a087a7
2018-12-04 12:24:28 -08:00
David Riazati
a23863fd6f Add Pooling modules to Script (#14527)
Summary:
Depends on #14584
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14527

Differential Revision: D13270773

Pulled By: driazati

fbshipit-source-id: e4acd43ccbce0f4b62d41c30ce8d5c721171e19a
2018-12-03 23:55:04 -08:00
David Riazati
d429e78a9a Add fractional_max_pool2d to standard lib
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14591

Differential Revision: D13270755

Pulled By: driazati

fbshipit-source-id: 138a60256795f5ef8d236c75be2cfd929059b98f
2018-12-03 23:49:38 -08:00
Elias Ellison
404ad939e5 Revert existing no_grad_embedding_renorm_ from aten (#14639)
Summary:
Remove no_grad_embedding_renorm_ from aten. Setting the derivatives of the inputs to false has different semantics from calling with no_grad(), because it will not error if an input is modified and then has its grad accessed.

Instead, make a custom op, and use NoGradGuard.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14639

Differential Revision: D13285604

Pulled By: eellison

fbshipit-source-id: c7d343fe8f22e369669e92799f167674f124ffe7
2018-11-30 16:57:51 -08:00
David Riazati
89c3dbcad8 Add binary cross entropy to standard lib
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14583

Differential Revision: D13269423

Pulled By: driazati

fbshipit-source-id: 7cc1594d8189c3e8f2d4ce0462fdc0a03683006e
2018-11-29 22:23:13 -08:00
David Riazati
15e8bb379e Add List to annotations (#14482)
Summary:
This PR adds a polyfill for `typing.List` for Python versions that don't
support `typing` as a builtin. It also moves the type definitions from
`annotations.py` so that they can be used in `torch.nn`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14482

Differential Revision: D13237570

Pulled By: driazati

fbshipit-source-id: 6575b7025c2d98198aee3b170f9c4323ad5314bd
2018-11-29 17:23:29 -08:00
David Riazati
666d383a00 Add broadcast list default arg support (#14361)
Summary:
To convert `max_unpool` functions to weak script, this PR adds support
for `T` as default arguments for `BroadcastingListN[T]`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14361

Differential Revision: D13192231

Pulled By: driazati

fbshipit-source-id: a25b75a0e88ba3dfa22d6a83775e9778d735e249
2018-11-29 15:15:47 -08:00
David Riazati
9e93a02624 Use nn module tests in test_jit (#14238)
Summary:
This PR adds weak modules for all activation modules and uses `test_nn` module tests to test weak modules that have been annotated with `weak_module` and therefore are in `torch._jit_internal._weak_types`

Also depends on #14379
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14238

Differential Revision: D13252887

Pulled By: driazati

fbshipit-source-id: e9638cf74089884a32b8f0f38396cf432c02c988
2018-11-28 23:31:25 -08:00
Elias Ellison
6d63e9dbff Support Embedding + EmbeddingBag in Script + (Ignore flakey test) (#14509)
Summary:
Resubmitting PR #14415

The tests added for Embedding + EmbeddingBag had random numbers as input, which affected the random number generator and caused the flaky test to break.

Everything but the last two commits have already been accepted
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14509

Differential Revision: D13247917

Pulled By: eellison

fbshipit-source-id: ea6963c47f666c07687787e2fa82020cddc6aa15
2018-11-28 19:16:38 -08:00
Elias Ellison
105fa58748 pointwise_loss (#14134)
Summary:
Adding pointwise loss ops to weak_script
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14134

Differential Revision: D13209455

Pulled By: eellison

fbshipit-source-id: 87fc0222121f34a2f4edb24c2da2a11124b097d8
2018-11-28 18:14:38 -08:00
Edward Yang
5f07b33857 Revert D13219647: [pytorch][PR] Support Embedding + EmbeddingBag in Script
Differential Revision:
D13219647

Original commit changeset: c90706aa6fbd

fbshipit-source-id: d189e717ba0773de43d633876bc3a688830a9303
2018-11-28 13:38:58 -08:00
Elias Ellison
7749804099 Support Embedding + EmbeddingBag in Script (#14415)
Summary:
Add support for Embedding and EmbeddingBag in script. Both functions require `with torch.no_grad()`, which we don't have any plans to support in the near future. To work around this, I added an embedding_renorm function without derivatives.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14415

Reviewed By: wanchaol

Differential Revision: D13219647

Pulled By: eellison

fbshipit-source-id: c90706aa6fbd48686eb10f3efdb65844be7b8717
2018-11-28 10:52:30 -08:00
David Riazati
3d98810fbd Revert D13192230: [pytorch][PR] [jit] Use nn module tests in test_jit
Differential Revision:
D13192230

Original commit changeset: 36488960b6c9

fbshipit-source-id: 63b68bd909b9ef0548f52c986c84f549aecb8909
2018-11-28 00:23:09 -08:00
David Riazati
4cdcbbf410 Use nn module tests in test_jit (#14238)
Summary:
This PR adds weak modules for all activation modules and uses `test_nn` module tests to test weak modules that have been annotated with `weak_module` and therefore are in `torch._jit_internal._weak_types`

Also depends on #14379
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14238

Differential Revision: D13192230

Pulled By: driazati

fbshipit-source-id: 36488960b6c91448b38c0fa65422539a93af8c5e
2018-11-27 21:19:51 -08:00
David Riazati
662f66ebb9 Add poisson_nll_loss to script
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14420

Differential Revision: D13220726

Pulled By: driazati

fbshipit-source-id: 6c08a0050075beafcc8ba413c9603b273870c70c
2018-11-27 19:39:16 -08:00
David Riazati
d75f751bec Add boolean dispatch for function overloading (#14425)
Summary:
This PR allows to overload functions based on the value of a parameter (so long as it is a constant). See max_pool1d for an example usage.

This is the first step in enabling the use of max_pool functions for the standard library that can return `Tensor` or `Tuple[Tensor, Tensor]` based on the `return_indices` flag. This will give the JIT identical results to the Python versions of the functions.

Fixes #14081
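A sketch of the user-visible overload this enables (the dispatch itself is internal; this only shows the two return types):

```python
import torch
import torch.nn.functional as F

x = torch.randn(1, 1, 8)
y = F.max_pool1d(x, kernel_size=2)                            # Tensor
y, idx = F.max_pool1d(x, kernel_size=2, return_indices=True)  # Tuple[Tensor, Tensor]
```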
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14425

Differential Revision: D13222104

Pulled By: driazati

fbshipit-source-id: 8cb676b8b13ebcec3262234698edf4a7d7dcbbe1
2018-11-27 19:36:47 -08:00
Elias Ellison
82175f31b4 Move Affine grid to C++ (#14392)
Summary:
Port AffineGrid to C++, because script does not support compiling Function classes.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14392

Differential Revision: D13219698

Pulled By: eellison

fbshipit-source-id: 3ddad8a84c72010b5a6c6f7f9712be614202faa6
2018-11-27 18:38:11 -08:00
David Riazati
1b80644b4d Revert D13192228: [pytorch][PR] [jit] Add boolean dispatch for function overloading
Differential Revision:
D13192228

Original commit changeset: fce33c400c1f

fbshipit-source-id: 75c9991dc7097f9513c6c89d16eff2de6e287c3b
2018-11-27 13:14:42 -08:00
David Riazati
66c8bbf021 Add boolean dispatch for function overloading (#14081)
Summary:
This PR allows to overload functions based on the value of a parameter (so long as it is a constant). See `max_pool1d` for an example usage.

This is the first step in enabling the use of `max_pool` functions for the standard library that can return `Tensor` or `Tuple[Tensor, Tensor]` based on the `return_indices` flag. This will give the JIT identical results to the Python versions of the functions.

Depends on #14232 for `Optional[BroadcastingList[T]]`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14081

Differential Revision: D13192228

Pulled By: driazati

fbshipit-source-id: fce33c400c1fd06e59747d98507c5fdcd8d4c113
2018-11-27 10:51:32 -08:00
Wanchao Liang
7fc34a4122 Convert gumbel_softmax, lp pooling weak functions and modules (#14232)
Summary:
1. Support `Optional[BroadcastingList1[int]]` like type annotation to accept a int or a list[int]
2. Convert gumbel_softmax, lp pooling weak functions and modules
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14232

Differential Revision: D13164506

Pulled By: wanchaol

fbshipit-source-id: 6c2a2b9a0613bfe907dbb5934122656ce2b05700
2018-11-21 23:44:24 -08:00
David Riazati
d9cdcc9a3b Add list inequality operator (#14129)
Summary:
This PR adds `aten::neq` for list inequality comparisons and converts
`nll_loss` to weak script
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14129

Differential Revision: D13123894

Pulled By: driazati

fbshipit-source-id: 8c1edf7c163217ec00eb653f95d196db3998613f
2018-11-21 16:32:58 -08:00
David Riazati
8f20d40bb7 Allow undefined tensors as constants (#14120)
Summary:
This PR inserts `prim::None` constants for undefined tensors. This comes up in the standard library if an `Optional[Tensor]` is statically determined to be `None`:

```python
@torch.jit.script
def fn(x=None):
    # type: (Optional[Tensor]) -> Tensor
    return torch.jit._unwrap_optional(x)

@torch.jit.script
def fn2():
    # type: () -> Tensor
    return fn()
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14120

Differential Revision: D13124625

Pulled By: driazati

fbshipit-source-id: 9eaa82e478c49c503f68ed89d8c770e8273ea569
2018-11-20 16:54:27 -08:00
Wanchao Liang
d6bfc53b9e Export BatchNorm functional and module, add necessary JIT support (#14016)
Summary:
This PR did three things:

1. It exports the BatchNorm functional and module, and rewrites some of the components to stay aligned with the currently supported JIT features
2. In the process of exporting, it adds the necessary compiler support for in-place augmented-assignment ops
3. It changes the test_jit behavior in add_module_test to use a single RNG state during module initialization
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14016

Differential Revision: D13112064

Pulled By: wanchaol

fbshipit-source-id: 31e3aee5fbb509673c781e7dbb6d8884cfa55d91
2018-11-20 14:15:06 -08:00
David Riazati
0d29846d5e Convert more weak functions (#14003)
Summary:
Same deal as #13707
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14003

Differential Revision: D13076403

Pulled By: driazati

fbshipit-source-id: eb3cb3b2c31caf1de591b613bdc4c9a6ed4e1767
2018-11-15 16:45:50 -08:00
Wanchao Liang
6d094224b9 Fix optional import/export, export multi-margin-loss (#13877)
Summary:
This PR did two things:

1. It fixes optional import/export to include any type, including tensor types (previously we only supported base types); this is essential to unblock optional tensor type annotations in our test logic.
2. It exports the multi_margin_loss functional to serve as an example of the optional undefined tensor use case.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13877

Differential Revision: D13076090

Pulled By: wanchaol

fbshipit-source-id: c9597295efc8cf4b6462f99a93709aae8dcc0df8
2018-11-15 00:45:22 -08:00
Xiang Gao
143ba72264 Move cosine_similarity to ATen (#12199)
Summary:
I'm now traveling and don't have access to a good computer to compile test by myself. Will see the outcome of CI.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12199

Differential Revision: D13062326

Pulled By: nairbv

fbshipit-source-id: 85873525caa94906ccaf2c739eb4cd55a72a4ffd
2018-11-14 10:41:44 -08:00
David Riazati
5163a28917 Convert more weak functions (#13707)
Summary:
Convert some more functions to match up with features added. Some
conversions were unsuccessful but the type line was left in for later.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13707

Differential Revision: D13030210

Pulled By: driazati

fbshipit-source-id: 02d5712779b83b7f18d0d55539e336321335e0cc
2018-11-13 13:50:57 -08:00
David Riazati
0c375571f5 Support OptionalType export and type match (#13647)
Summary:
* Adds `OptionalType` support for import/export
    * Optionals get exported along with their contained type, i.e. 'Optional[int]'
* Allows concrete types and `None` to be passed to an op that takes an optional
* Converts `softmax`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13647

Differential Revision: D12954672

Pulled By: driazati

fbshipit-source-id: 159e9bfb7f3e398bec3912d414c393098cc7455a
2018-11-12 12:15:25 -08:00
Wanchao Liang
79ceecec8e Optional undefined tensor support (#13650)
Summary:
This PR is part of the task to unblock standard library export.
* We treat None differently from Tensor and other types: when passing None as a Tensor, it's an undefined tensor rather than the None IValue.
* Refine the type system so that we have a correct tensor type hierarchy (Dynamic/Tensor/CompleteTensor); Dynamic should be at the top of the inheritance hierarchy.
* It also exports bilinear as an example of undefined-tensor (None) input.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13650

Differential Revision: D12967026

Pulled By: wanchaol

fbshipit-source-id: 6aedccc7ce2a12fadd13d9e620c03e1260103a5a
2018-11-09 11:29:57 -08:00
Dan Zheng
51f58f0990 Fix typo in CTC loss doc comments. (#13727)
Summary:
`target_lenghts` -> `target_lengths`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13727

Differential Revision: D12981582

Pulled By: zou3519

fbshipit-source-id: e5e02b26cf3030a91494655ff863273333cc4133
2018-11-08 14:50:48 -08:00
David Riazati
556ff8e7b7 Add builtins for size() and list with defaults (#13639)
Summary:
* `aten::size()` to match `torch.Tensor.size`
* `aten::list_with_default` for semantics of `torch.nn.modules.utils.list_with_default`
* converts `adaptive_avg_pool2d` and `adaptive_avg_pool3d`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13639

Differential Revision: D12954670

Pulled By: driazati

fbshipit-source-id: 68c30af0efc02c60af5fb8c9715b2435cc01a0d9
2018-11-08 11:26:35 -08:00
David Riazati
4472ad3b2f Move functional _Reduction to its own module (#13401)
Summary:
To support `_Reduction` in the JIT, this PR moves it out to a new file so that it goes through the paths for Python modules in the script compiler, and converts `F.ctc_loss` to weak script.

Depends on #13484 for saving rng state
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13401

Differential Revision: D12868501

Pulled By: driazati

fbshipit-source-id: 23cec0fb135744578c73e31ac825e238db495d27
2018-11-08 01:04:10 -08:00
Gregory Chanan
7341ab0a33 Fix range of target examples and JIT test case for CTC loss.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13644

Differential Revision: D12949733

Pulled By: gchanan

fbshipit-source-id: 1c4cacbb6a50d5002165bdd0a7881883db5c8249
2018-11-07 07:04:31 -08:00
David Riazati
fc6a9a19ea Add torch._C._nn built-in, more weak fns (#13322)
Summary:
This PR adds functions defined in `torch._C._nn` as builtin functions (including inplace variants). This allows for the conversion of more functions to weak script

NB: many `torch.nn.functional` functions will have to be slightly rewritten to avoid early returns (as with `threshold` in this PR)
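As an illustration of that rewrite, a schematic sketch (using placeholder helpers rather than the actual `torch._C._nn` builtins, to keep it self-contained): the early returns are replaced by a single result variable and a single exit point.

```python
import torch

# placeholder stand-ins for the underlying builtin calls, for illustration only
def _threshold_out_of_place(input, threshold, value):
    return torch.where(input > threshold, input, torch.full_like(input, value))

def _threshold_in_place(input, threshold, value):
    return input.masked_fill_(input <= threshold, value)

# before: early returns, which the script compiler could not handle at the time
def threshold_early_return(input, threshold, value, inplace=False):
    if inplace:
        return _threshold_in_place(input, threshold, value)
    return _threshold_out_of_place(input, threshold, value)

# after: one result variable, one return
def threshold_single_exit(input, threshold, value, inplace=False):
    if inplace:
        result = _threshold_in_place(input, threshold, value)
    else:
        result = _threshold_out_of_place(input, threshold, value)
    return result

x = torch.randn(5)
print(threshold_single_exit(x, 0.0, 0.0))
```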

Converts these functions to weak script:
* `threshold`
* `relu`
* `hardtanh`
* `relu6`
* `elu`
* `selu`
* `celu`
* `leaky_relu`
* `rrelu`
* `tanh`
* `sigmoid`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13322

Differential Revision: D12852203

Pulled By: driazati

fbshipit-source-id: 220670df32cb1ff39d120bdc04aa1bd41209c809
2018-11-05 21:02:18 -08:00
David Riazati
1969898647 Convert functional dropouts to weak script (#13484)
Summary:
To convert `nn.functional.dropout`
* `_VF` had to be exposed as a Python module so this PR adds a module class to forward to `torch._C._VariableFunctions`
* rng state between calls in the tests needed to be made consistent
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13484

Differential Revision: D12929622

Pulled By: driazati

fbshipit-source-id: 78b455db9c8856b94d2dda573fb7dc74d5784f56
2018-11-05 17:13:07 -08:00
Sam Gross
98f5c005da Speed up CPU threshold and relu implementation (#13182)
Summary:
```
The previous threshold implementation was not vectorized or parallelized.
This speeds up ResNet-50 CPU inference [1] from ~88 ms to ~67 ms

CPU timings:
https://gist.github.com/colesbury/d0d1be6974841d62696dbde329a8fde8

1 thread (before vs. after)
10240:  17.4 us vs. 6.9 µs per loop
102400: 141 us vs. 39.8 µs per loop

16 threads (before vs. after)
10240:  17.4 us vs. 6.7 µs per loop
102400: 141 us vs. 14.3 µs per loop

CUDA timings are not measurably different.

[1]: compiled with MKL-DNN, 8 threads, batch norm merged into convolutions
https://gist.github.com/colesbury/8a64897dae97558b3b82da665048c782
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13182

Reviewed By: soumith

Differential Revision: D12825105

Pulled By: colesbury

fbshipit-source-id: 557da608ebb87db8a04adbb0d2882af4f2eb3c15
2018-11-05 12:51:29 -08:00
Tongzhou Wang
99a5d19591 Rename elementwise_mean to mean (#13419)
Summary:
Closes #12459
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13419

Differential Revision: D12883299

Pulled By: SsnL

fbshipit-source-id: 8b4512ff73b66fdc674412904dbb3bf497ba70a7
2018-11-01 10:31:26 -07:00
Ailing Zhang
488d393ea6 Fix pointwise loss broadcast (#12996)
Summary: Fixes #12129 , #12327

Differential Revision: D10513781

Pulled By: ailzhang

fbshipit-source-id: a210008a39ff6c3f056c9fbe3f0576cfcce638ec
2018-10-31 10:17:25 -07:00
Michael Suo
d2659f6689 fix lint
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13346

Differential Revision: D12850686

Pulled By: michaelsuo

fbshipit-source-id: b7474d0a3f3347034592bef45125610c040cff6a
2018-10-30 16:22:58 -07:00
verhoek
0db505bf27 Made docstrings for Embedding more accurate. (#13310)
Summary:
Made the previous description for max_norm more precise, avoiding 'this' and describing what actually happens in the code.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13310

Differential Revision: D12840813

Pulled By: SsnL

fbshipit-source-id: 98090c884267a62ce93cd85da84252d46926dfa5
2018-10-30 12:25:38 -07:00
Jason Gauci
5b15a501da Refactor & unit test feed predictor
Summary:
1. Refactor DDPG predictor.  Merge the critic predictor with ParametricDQNPredictor since they are the same
2. Fix bug where loss was multiplied by the batch size
3. Create DDPGFeedPredictor which uses the feed predictor output format
4. Add support for gridworld simulation memoization to DDPG.  Also memoize normalization tables.

Reviewed By: kittipatv

Differential Revision: D10161240

fbshipit-source-id: 2813890043de1241c1fb9b9c2b6a897403f9fc12
2018-10-30 10:27:47 -07:00
William Horton
1bec8f773b Move ConstantPadNd into ATen (#10885)
Summary:
Addresses #9499. Completed work on the forward function, tests should be passing for that. Working on backward function now.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10885

Differential Revision: D9643786

Pulled By: SsnL

fbshipit-source-id: 2930d6f3d2975c45b2ba7042c55773cbdc8fa3ac
2018-10-26 15:25:27 -07:00
David Riazati
14ea4bf0d1 Make 7 nn modules into weak modules (#12966)
Summary:
Depends on #12682 ([stacked diff](https://github.com/driazati/pytorch/compare/weak_mod...driazati:mod_conv1))

* Adds tests for weak module conversion that creates a `ScriptModule` that uses the weak module and checks its graph
* Adds `torch._jit_internal.weak_module` tags to modules that already work
  * `Sigmoid`
  * `Tanh`
  * `Hardshrink`
  * `PReLU`
  * `Softsign`
  * `Tanhshrink`
  * `PairwiseDistance`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12966

Differential Revision: D10559557

Pulled By: driazati

fbshipit-source-id: dc4bea3aa744b3c44d4fa7dceefd97e951f824d0
2018-10-25 13:59:34 -07:00
Thomas Viehmann
dd823ccd28 small improvements to torch.nn.normalization docs (#12936)
Summary:
Based on a [discussion at the forums](https://discuss.pytorch.org/t/question-about-functional-normalize-and-torch-norm/27755), it might be worthwhile to clarify the documentation.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12936

Differential Revision: D10502139

Pulled By: ezyang

fbshipit-source-id: 480c3c367f8c685dcde107b3018cb4129032322d
2018-10-22 23:14:47 -07:00
David Riazati
1e8064dec0 Convert 2 nn.functional functions to weak script (#12723)
Summary:
* Moves `weak_script` annotation to `torch/_jit_internal.py` folder to resolve dependency issue between `torch.jit` and `torch.nn`
* Add `torch._jit.weak_script` to `tanhshrink` and `softsign`, their tests now pass instead of giving an `unknown builtin op` error
* Blacklist converted `torch.nn.functional` functions from appearing in the builtin op list if they don't actually have corresponding `aten` ops
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12723

Differential Revision: D10452986

Pulled By: driazati

fbshipit-source-id: c7842bc2d3ba0aaf7ca6e1e228523dbed3d63c36
2018-10-21 14:09:55 -07:00
Thomas Viehmann
0521c47c91 Amend nondeterminism notes (#12217)
Summary:
include atomicAdd commentary as this is less well known

There is some discussion in #12207

Unfortunately, I cannot seem to get the ..include working in `_tensor_docs.py` and `_torch_docs.py`. I could use a hint for that.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12217

Differential Revision: D10419739

Pulled By: SsnL

fbshipit-source-id: eecd04fb7486bd9c6ee64cd34859d61a0a97ec4e
2018-10-16 23:59:26 -07:00
Tongzhou Wang
ac994f2c78 Fix SpectralNorm with DataParallel (#12671)
Summary:
There were two problems with SN + DP:

1. In SN, the updated _u vector is saved back to module via a `setattr`. However, in DP, everything is run on a replica, so those updates are lost.
2. In DP, the buffers are broadcast via a `broadcast_coalesced`, so on replicas they are all views. Therefore, the `detach_` call won't work.

Fixes are:
1. Update _u vector in-place so, by the shared storage between 1st replica and the parallelized module, the update is retained
2. Do not call `detach_`.
3. Added comments in SN about the subtlety.
4. Added a note to the DP doc on this particular behavior of DP.

cc crcrpar taesung89 The controller you requested could not be found. yaoshengfu

Fixes https://github.com/pytorch/pytorch/issues/11476
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12671

Differential Revision: D10410232

Pulled By: SsnL

fbshipit-source-id: c447951844a30366d8c196bf9436340e88f3b6d9
2018-10-16 16:02:17 -07:00
Ailing Zhang
e15501fb68 fix bce_with_logits with legacy reduce (#12689)
Summary:
Fix #12624, an internal use case of legacy `reduce`.
Added a test in test_nn.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12689

Reviewed By: ezyang

Differential Revision: D10391195

Pulled By: ailzhang

fbshipit-source-id: 1af2b258c4abb2b6527eaaeac63e8bf1762c66a1
2018-10-16 09:46:58 -07:00
Natalia Gimelshein
a98958d3bd dtype option for softmax (#11719)
Summary:
Add dtype argument to softmax/log_softmax functions.
Computing softmax in fp32 precision is necessary for mixed precision training, and converting the output of the previous layer to fp32 and then reading it as fp32 in softmax is expensive memory- and perf-wise; this PR allows one to avoid that.
For most input data/dtype combinations, input data is converted to dtype and then softmax is computed. If input data is half type and dtype is fp32, kernels with the corresponding template arguments are called.
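A short sketch of the new argument (assuming a CUDA device with fp16 support; sizes are illustrative):

```python
import torch
import torch.nn.functional as F

logits = torch.randn(16, 1000, dtype=torch.float16, device='cuda')
# the computation is carried out in fp32 without first materializing an fp32 copy of the input
probs = F.softmax(logits, dim=-1, dtype=torch.float32)
print(probs.dtype)  # torch.float32
```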
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11719

Reviewed By: ezyang

Differential Revision: D10175514

Pulled By: zou3519

fbshipit-source-id: 06d285af91a0b659932236d41ad63b787eeed243
2018-10-13 17:57:10 -07:00
Ailing Zhang
5317429e82 move bceWithLogits from python to Aten (#11054)
Summary:
Fixes #10648 .
Perf comparison:
```
import torch
import torch.nn as nn
import time

def bm(testsize, repeat=100, cuda=False):
    total_time = 0.0
    pos_weight= torch.ones(testsize[1], device='cuda' if cuda else 'cpu') / testsize[1]
    # loss = nn.BCEWithLogitsLoss(pos_weight=pos_weight)
    loss = nn.BCEWithLogitsLoss()
    input = torch.randn(testsize, device='cuda' if cuda else 'cpu').clamp_(2.8e-2, 1 - 2.8e-2)
    target = torch.randn(testsize, device='cuda' if cuda else 'cpu').gt(0).float()
    input.requires_grad = True
    target.requires_grad = True
    for _ in range(repeat):
        start = time.time()
        l = loss(input, target)
        l.backward()
        # print(target.grad)
        end = time.time()
        total_time += end - start
    return total_time

for cuda in [False, True]:
    for testsize in [(100, 100), (1000, 1000), (2000, 2000)]:
        # print(testsize, cuda)
        print('{:.5f}'.format(bm(testsize, cuda=cuda)))
```
|    | Python CPU | Aten CPU | Python GPU | Aten GPU
| ------------- | ------------- | ------------- | ------------- | ------------- |
| (100, 100)  | 0.15813s | 0.10890s | 0.14601s | 0.07070s |
| (1000, 1000)  | 1.74051s | 0.95038s | 0.15158s | 0.10153s |
| (2000, 2000) | 5.36515s | 2.46996s | 0.31322s | 0.200941s |
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11054

Differential Revision: D9728289

Pulled By: ailzhang

fbshipit-source-id: b7c5bc50635f8cc63c317caa4321e32f7df860f8
2018-10-12 11:13:33 -07:00
Wei Yang
de11fe0c83 migrate PReLU to ATen (#11758)
Summary:
- fixes https://github.com/pytorch/pytorch/issues/10723
- migrate PReLU to ATen and deprecate legacy PReLU
- performance:

CPU with weight.numel() = 1
```
>>> m = nn.PReLU()
>>> x = torch.randn(100, 100, 100, requires_grad=True)
>>> %timeit -r 100 y = m(x)
100 loops, best of 100: 9.43 ms per loop

>>> y = m(x).sum()
>>> %timeit -r 100 y.backward(retain_graph=True)
10 loops, best of 100: 24.4 ms per loop

>>> m = nn.PReLU()
>>> x = torch.randn(100, 100, 100, requires_grad=True)
>>> %timeit -r 100 y = m(x)
1000 loops, best of 100: 695 µs per loop

>>> y = m(x).sum()
>>> %timeit -r 100 y.backward(retain_graph=True)
100 loops, best of 100: 2.47 ms per loop
```

CPU with weight.numel() = channels
```
>>> m = nn.PReLU(100)
>>> x = torch.randn(100, 100, 100, requires_grad=True)
>>> %timeit -r 100 y = m(x)
1000 loops, best of 100: 603 µs per loop

>>> y = m(x).sum()
>>> %timeit -r 100 y.backward(retain_graph=True)
100 loops, best of 100: 13.3 ms per loop

>>> m = nn.PReLU(100)
>>> x = torch.randn(100, 100, 100, requires_grad=True)
>>> %timeit -r 100 y = m(x)
1000 loops, best of 100: 655 µs per loop

>>> y = m(x).sum()
>>> %timeit -r 100 y.backward(retain_graph=True)
100 loops, best of 100: 2.45 ms per loop
```

CUDA with weight.numel() = 1
```
>>> m = nn.PReLU().cuda()
>>> x = torch.randn(100, 100, 100, requires_grad=True).cuda()
>>> %timeit -r 100 torch.cuda.synchronize(); y = m(x); torch.cuda.synchronize();
10000 loops, best of 100: 187 µs per loop

>>> y = m(x).sum()
>>> %timeit -r 100 torch.cuda.synchronize(); y.backward(retain_graph=True); torch.cuda.synchronize();
100 loops, best of 100: 2.01 ms per loop

>>> m = nn.PReLU().cuda()
>>> x = torch.randn(100, 100, 100, requires_grad=True).cuda()
>>> %timeit -r 100 torch.cuda.synchronize(); y = m(x); torch.cuda.synchronize();
1000 loops, best of 100: 195 µs per loop

>>> y = m(x).sum()
>>> %timeit -r 100 torch.cuda.synchronize(); y.backward(retain_graph=True); torch.cuda.synchronize();
100 loops, best of 100: 2.28 ms per loop
```

CUDA with weight.numel() = channel
```
>>> m = nn.PReLU(100).cuda()
>>> x = torch.randn(100, 100, 100, requires_grad=True).cuda()
>>> %timeit -r 100 torch.cuda.synchronize(); y = m(x); torch.cuda.synchronize();
1000 loops, best of 100: 174 µs per loop

>>> y = m(x).sum()
>>> %timeit -r 100 torch.cuda.synchronize(); y.backward(retain_graph=True); torch.cuda.synchronize();
100 loops, best of 100: 2.27 ms per loop

>>> m = nn.PReLU(100).cuda()
>>> x = torch.randn(100, 100, 100, requires_grad=True).cuda()
>>> %timeit -r 100 torch.cuda.synchronize(); y = m(x); torch.cuda.synchronize();
10000 loops, best of 100: 181 µs per loop

>>> y = m(x).sum()
>>> %timeit -r 100 torch.cuda.synchronize(); y.backward(retain_graph=True); torch.cuda.synchronize();
100 loops, best of 100: 2.26 ms per loop
```

The huge performance regression in CPU when weight.numel() = 1 is addressed by replacing at::CPU_tensor_apply* with parallelized kernels.

ezyang SsnL zou3519  soumith
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11758

Differential Revision: D9995799

Pulled By: weiyangfb

fbshipit-source-id: d289937c78075f46a54dafbde92fab0cc4b5b86e
2018-09-21 16:26:04 -07:00
Marc Ferradou
e734c94fa2 Quick update to embedding_bag doc (#11784)
Summary:
Related to #11624 adding maxes to the function def of embedding_bag.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11784

Differential Revision: D9892598

Pulled By: ezyang

fbshipit-source-id: e6372ccf631826ddf1e1885b2f8f75f354a36c0b
2018-09-17 23:56:05 -07:00
Gao, Xiang
513fd3dd36 Improve doc of torch.nn.functional.pad (#11623)
Summary:
I'm reading the doc of `torch.nn.functional.pad` and it looks a bit confusing to me. Hopefully this PR makes it clearer.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11623

Differential Revision: D9818255

Pulled By: soumith

fbshipit-source-id: 4f6b17b0211c6927007f44bfdf42df5f84d47536
2018-09-13 19:25:24 -07:00
Tongzhou Wang
760679352e Move Pixel Shuffle to ATen (#9721)
Summary:
<del>#9692 </del>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9721

Differential Revision: D8955829

Pulled By: SsnL

fbshipit-source-id: 4f4d1c7720b6f757fbef9a10f70209ae76f61399
2018-09-13 18:25:48 -07:00
Marc Ferradou
f129da1a47 Add max to the ValueError for EmbeddingBag mode check (#11655)
Summary:
Related to #11624
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11655

Differential Revision: D9815454

Pulled By: SsnL

fbshipit-source-id: 8dd82e0c0aa68362e12b301e095a85af7d7fd71a
2018-09-13 14:39:40 -07:00
Roy Li
75f49befeb move instance_norm to aten (#10792)
Summary:
This also removes the usage of torch.onnx.symbolic_override in instance_norm. Fixes #8439.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10792

Differential Revision: D9800643

Pulled By: li-roy

fbshipit-source-id: fa13a57de5a31fbfa2d4d02639d214c867b9e1f1
2018-09-13 12:26:22 -07:00
Rasmus Diederichsen
35348dab10 WIP: Include note on cudnn determinism in each function backed by cudnn (#11434)
Summary:
Ping ezyang
This addresses your comment in #114. Strangely, when running the doc build (`make html`) none of my changes are actually showing, could you point out what I'm doing wrong?

Once #11329 is merged it might make sense to link to the reproducibility note everywhere.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11434

Differential Revision: D9751208

Pulled By: ezyang

fbshipit-source-id: cc672472449564ff099323c39603e8ff2b2d35c9
2018-09-11 20:27:09 -07:00
Peter Goldsborough
d95fedb436 Use ATen dropout implementation in Dropout module and add FeatureDropout (#11458)
Summary:
This PR does two things:
1. Replaces the implementation of the `Dropout` module with a call to the ATen function,
2. Replaces `Dropout2d` with a new `FeatureDropout` module that shall take the place of `Dropout2d` and `Dropout3d`. I contemplated calling it `Dropout2d` and making `Dropout3d` an alias for it, but similar to our decision for `BatchNorm{1,2,3}d` (c.f. https://github.com/pytorch/pytorch/pull/9188), we can deviate from Python PyTorch in favor of the ideal-world solution, which is to have a single module, since both actually just call `feature_dropout`.

I also replaced the implementation of `dropout3d`  with a call to `dropout2d` in Python. The code is the same and it's easier for developers to parse than having to manually match the tokens to make sure it's really 100% the same code (which it is, if I matched the tokens correctly).

ebetica ezyang SsnL
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11458

Differential Revision: D9756603

Pulled By: goldsborough

fbshipit-source-id: fe847cd2cda2b6da8b06779255d76e32a974807c
2018-09-11 20:16:12 -07:00
Tongzhou Wang
de460c7ad3 Improvements on conv/pool/fold/stft/ParamDict docs (#11106)
Summary:
Also fixes some incorrect formula rendering.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11106

Differential Revision: D9752433

Pulled By: SsnL

fbshipit-source-id: 535fc8498638e8b645757fc7535d8771992b7d21
2018-09-11 08:56:21 -07:00
Wei Yang
425ea6b31e fix doc for functional.dropout* (#10417)
Summary:
- fixes #4177
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10417

Differential Revision: D9542876

Pulled By: weiyangfb

fbshipit-source-id: 480ed973d1fe0364f4acb5cd596c2031895b82df
2018-09-05 17:26:00 -07:00
Erik Brinkman
611a608517 Add ATen pdist CPU kernel (#10782)
Summary:
Also add single grad whitelist to the jit test
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10782

Reviewed By: ezyang

Differential Revision: D9583378

Pulled By: erikbrinkman

fbshipit-source-id: 069e5ae68ea7f3524dec39cf1d5fe9cd53941944
2018-08-30 11:55:27 -07:00
Roy Li
f2bb9f0bb5 speed up kl div loss (#10336)
Summary:
Moved kl div loss to aten.

benchmarks for 5000 iterations on input size (1000,100)

New
```
cuda:
forward [0.9736350309103727, 0.9922929517924786, 0.9694818360731006]
input requires_grad=True:
backward [0.5595634011551738, 0.558339926879853, 0.5546616851352155]
double backward [1.2445648494176567, 1.2245905152522027, 1.2349751549772918]
target requires_grad=True:
backward (new C++) [0.9489959231577814, 0.9553070571273565, 0.9556351029314101]
double backward (new C++) [1.8184774098917842, 1.8164670099504292, 1.845708406995982]

cpu:
forward (new C++) [7.892430987209082, 8.3068826389499, 7.985283812973648]
input requires_grad=True:
backward (new C++) [4.328460982069373, 4.45323242014274, 4.27946363389492]
double backward (new C++) [5.153504415880889, 4.629372010007501, 4.712803596165031]
target requires_grad=True:
backward (new C++) [3.4181493939831853, 3.3771288259886205, 3.7086612950079143]
double backward (new C++) [0.21922698011621833, 0.1858532396145165, 0.19477044604718685]
```

Old
```
cuda:
forward [3.101281268056482, 3.068499860819429, 3.0527669726870954]
input requires_grad=True:
backward [0.5650290949270129, 0.5730433077551425, 0.5588279226794839]
double backward [1.1287697306834161, 1.13834543293342, 1.1298578432761133]
target requires_grad=True:
backward [0.9470391101203859, 0.9560198178514838, 0.9750375030562282]
double backward [1.85760727385059, 1.7989214668050408, 1.788982989732176]

cpu:
forward (new C++) [12.474591840058565, 12.511441555805504, 12.666544185951352]
input requires_grad=True:
backward (new C++) [7.660991386976093, 7.449987292289734, 7.513917901087552]
double backward (new C++) [4.073225498665124, 4.264980792999268, 4.429787891916931]
target requires_grad=True:
backward (new C++) [3.448499082121998, 3.9072313378565013, 3.2433970272541046]
double backward (new C++) [2.126378359273076, 1.9045450473204255, 1.7932004742324352]
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10336

Differential Revision: D9213636

Pulled By: li-roy

fbshipit-source-id: 27cc530f6276f58d35dc7a1d56dfc758a0fc4a7b
2018-08-27 16:10:59 -07:00
Tongzhou Wang
d043f83019 Add tests for Tensor.* nn.* F.* docs (#10311)
Summary:
Test only for existence for now. I had to skip a lot of them so there a FIXME in the test.

Also I'm not testing torch.* because of namespace issue.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10311

Differential Revision: D9196341

Pulled By: SsnL

fbshipit-source-id: 9c2ca1ffe660bc1cc664474993f8a21198525ccc
2018-08-14 11:39:46 -07:00
Adam Paszke
adbcb3c1dc Move dropout and alpha dropout to ATen (#10384)
Summary:
zdevito ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10384

Reviewed By: ezyang

Differential Revision: D9272583

Pulled By: apaszke

fbshipit-source-id: ed5d37b28ce9ff25800bbaa0daf066cfbf1f9921
2018-08-10 14:55:28 -07:00
Tongzhou Wang
6a55238a3f Grid sampler: nearest interpolation & reflection padding (#10051)
Summary:
closes #9702 .

cc jph00

Commit structure:

1. Change the index calculation logic. I will explain using 1-D for simplicity.

	Previously we have (in pseudo code):

	```
	// 1. get the float locations from grid
	scalar_t x = from_grid()

	// 2. find the integral surrounding indices
	int x_left = floor(x)
	int x_right = x_left + 1

	// 3. calculate the linear interpolate weights
	scalar_t w_left = x_right - x
	scalar_t w_right = x - x_left

	// 4. manipulate the integral surrounding indices if needed
	// (e.g., clip for border padding_mode)
	x_left = manipulate(x_left, padding_mode)
	x_right = manipulate(x_right, padding_mode)

	// 5. interpolate
	output_val = interpolate(w_left, w_right, x_left, x_right)
	```

	This is actually incorrect (and also unintuitive) because it calculates the
	weights before manipulate out-of-boundary indices. Fortunately, this
	isn't manifested in both of the current supported modes, `'zeros'` and
	`'border'` padding:

	+ `'zeros'`: doesn't clip
	+ `'border'`: clips, but for out-of-bound `x` both `x_left` and `x_right` are
	  clipped to the same value, so weights don't matter

	But this is a problem with reflection padding, since after each time we reflect,
	the values of `w_left` and `w_right` should be swapped.

	So in this commit I change the algorithm to (numbers corresponding to the
        ordering in the above pseudo-code)

	```
	1. get float location
	4. clip the float location
	2. find the integral surrounding indices
	3. calculate the linear interpolate weights
	```

	In the backward, because of this change, I need to add new variables to track
	`d manipulate_output / d manipulate_input`, which is basically a multiplier
	on the gradient calculated for `grid`. From benchmarking this addition doesn't
	cause obvious slow downs.

2. Implement reflection padding. The indices will keep being reflected until
	they become within boundary.

	Added variant of `clip_coordinates` and `reflect_coordinates` to be used in
	backward. E.g.,
	```cpp
	// clip_coordinates_set_grad works similarly to clip_coordinates except that
	// it also returns the `d output / d input` via pointer argument `grad_in`.
	// This is useful in the backward pass of grid_sampler.
	scalar_t clip_coordinates_set_grad(scalar_t in, int64_t clip_limit, scalar_t *grad_in)
	```
	For example, if `in` is clipped in `'border'` mode, `grad_in` is set to `0`.
	If `in` is reflected **odd** times in `'reflection'` mode, `grad_in`
	is set to `-1`.

3. Implement nearest interpolation (usage sketched after this list).

4. Add test cases

5. Add better input checking
  Discussed with goldsborough for moving `operator<<` of `at::Device`,
  `at::DeviceType` and `at::Layout` into `at` namespace. (Otherwise
  `AT_CHECK` can't find them.)

6. Support empty tensors. cc gchanan

    + Make empty tensors not acceptable by cudnn.
    + Add `AT_ASSERT(kernel block size  > 0)` if using `GET_BLOCKS`
   + Cache `numel` in `TensorGeometry`
      I was going to use `numel` to test if cudnn descriptor should accept a
      tensor, but it isn't used eventually. I can revert this if needed.

7. Add more test cases, including on input checking and empty tensors

8. Remove an obsolete comment

9. Update docs. Manually tested by generating docs.
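A minimal sketch of the two new options together (the identity grid here is illustrative):

```python
import torch
import torch.nn.functional as F

x = torch.randn(1, 1, 4, 4)
theta = torch.tensor([[[1.0, 0.0, 0.0],
                       [0.0, 1.0, 0.0]]])      # identity affine transform
grid = F.affine_grid(theta, size=(1, 1, 4, 4))
# the two additions from this PR: nearest interpolation and reflection padding
y = F.grid_sample(x, grid, mode='nearest', padding_mode='reflection')
```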
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10051

Differential Revision: D9123950

Pulled By: SsnL

fbshipit-source-id: ac3b4a0a36b39b5d02e83666cc6730111ce216f6
2018-08-10 12:43:27 -07:00
Wei Yang
149d4f776b use logsigmoid at multilabel_soft_margin_loss, and change output from shape=(N, C)to (N,) (#9965)
Summary:
- fixes #9141, #9301
- use logsigmoid at multilabel_soft_margin_loss to make it more stable (NOT fixing legacy MultiLabelSoftMarginCriterion)
- return (N) instead of (N, C) to match the same behavior as MultiMarginLoss
- Note that with this PR, the following behavior is expected:
```
loss = F.multilabel_soft_margin_loss(outputs, labels, reduction='none')
loss_mean = F.multilabel_soft_margin_loss(outputs, labels, reduction='elementwise_mean')
loss_sum = F.multilabel_soft_margin_loss(outputs, labels, reduction='sum')

loss.sum() == loss_sum  # True
loss.mean() == loss_mean  # True
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9965

Differential Revision: D9038402

Pulled By: weiyangfb

fbshipit-source-id: 0fa94c7b3cd370ea62bd6333f1a0e9bd0b8ccbb9
2018-08-03 17:54:19 -07:00
Rob Kunkle
6e85112f12 Adding katex rendering of equations, and required edits to equations. (#8848)
Summary:
This fixes issue #8529.

- Adds Katex extension to conf.py and requirements.txt
- Fixes syntax differences in docs
- Should allow documentation pages to render faster
Pull Request resolved: https://github.com/pytorch/pytorch/pull/8848

Reviewed By: soumith

Differential Revision: D8677702

Pulled By: goodlux

fbshipit-source-id: c4a832c5879e0eebcb14763b35a41663331ba23f
2018-08-02 12:25:17 -07:00
Xiang Gao
6fc75eadf0 Add CELU activation to pytorch (#8551)
Summary:
Also fuse input scale multiplication into ELU

Paper:
https://arxiv.org/pdf/1704.07483.pdf
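A minimal sketch of the new activation in its functional and module forms:

```python
import torch
import torch.nn.functional as F

x = torch.randn(4)
y = F.celu(x, alpha=1.0)       # functional form
m = torch.nn.CELU(alpha=1.0)   # module form
assert torch.allclose(y, m(x))
```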
Pull Request resolved: https://github.com/pytorch/pytorch/pull/8551

Differential Revision: D9088477

Pulled By: SsnL

fbshipit-source-id: 877771bee251b27154058f2b67d747c9812c696b
2018-08-01 07:54:44 -07:00
Kyle M. Tarplee
aae37324cc fixed a newly introduced regression in softmax (#10066)
Summary:
There is a regression in softmin in 0.4.1 that was not present in 0.4.0. The behavior of softmin(x) should match softmax(-x); however, it is instead implemented (in v0.4.1) as -softmax(x). These are not the same. The fix is trivial because the bug is due to operator precedence.

This is a major regression that broke my training.  I'm not sure how a unit test did not catch this.

```
x = torch.tensor([1, 2, 3.5, 4])
print(F.softmin(x, dim=0)) # this has the wrong output in 0.4.1 but correct in 0.4.0
print(F.softmax(-x, dim=0)) # this is what softmax should be
print(F.softmax(x, dim=0))
print(-F.softmax(x, dim=0)) # this is how softmax is implemented incorrectly
```
In 0.4.1 this produces
tensor([-0.0278, -0.0755, -0.3385, -0.5581])
tensor([0.6668, 0.2453, 0.0547, 0.0332])
tensor([0.0278, 0.0755, 0.3385, 0.5581])
tensor([-0.0278, -0.0755, -0.3385, -0.5581])

In 0.4.0 this produces the correct values
tensor([ 0.6668,  0.2453,  0.0547,  0.0332])
tensor([ 0.6668,  0.2453,  0.0547,  0.0332])
tensor([ 0.0278,  0.0755,  0.3385,  0.5581])
tensor([-0.0278, -0.0755, -0.3385, -0.5581])
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10066

Differential Revision: D9106995

Pulled By: soumith

fbshipit-source-id: 7332503c6077e8461ad6cd72422c749cf6ca595b
2018-07-31 19:28:30 -07:00
Roy Li
2422801625 fix _pointwise_loss for target gradients (#10018)
Summary:
_pointwise_loss has some Python special casing; we converted reduction to ATen enums too early.

fixes #10009
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10018

Differential Revision: D9075489

Pulled By: li-roy

fbshipit-source-id: 4bf2f5e2911e757602c699ee1ec58223c61d0162
2018-07-31 13:39:58 -07:00
Thomas Viehmann
685224aa14 Add CTC loss (#9628)
Summary:
The CPU and CUDA variants are a direct transposition of Graves et al.'s description of the algorithm, with the modification that it is in log space.
There is also a binding for the (much faster) CuDNN implementation.

This could eventually fix #3420

I still need to add tests (TestNN seems much more elaborate than the other testing) and fix the bugs that invariably turn up during testing. Also, I want to add some more code comments.

I could use feedback on all sorts of things, including:
- Type handling (cuda vs. cpu for the int tensors, dtype for the int tensors)
- Input convention. I use log probs because that is what the gradients are for.
- Launch parameters for the kernels
- Errors and omissions and anything else I'm not even aware of.

Thank you for looking!

In terms of performance it looks like it is superficially comparable to WarpCTC (though I have not systematically investigated this).
I have read that CuDNN is much faster than other implementations because it does *not* use log space, but also because the gathering step is much, much faster (I avoided trying tricky things; they seem to contribute to warpctc's fragility). I might think some more about which existing torch function (scatter or index..) I could learn from for that step.
Average timings for the kernels from nvprof for some size:

```
CuDNN:
60.464us compute_alphas_and_betas
16.755us compute_grads_deterministic
Cuda:
121.06us ctc_loss_backward_collect_gpu_kernel (= grads)
109.88us ctc_loss_gpu_kernel (= alphas)
98.517us ctc_loss_backward_betas_gpu_kernel (= betas)
WarpCTC:
299.74us compute_betas_and_grad_kernel
66.977us compute_alpha_kernel
```

Of course, I still have the (silly) outer blocks loop rather than computing consecutive `s` in each thread which I might change, and there are a few other things where one could look for better implementations.

Finally, it might not be unreasonable to start with these implementations, as the performance of the loss has to be seen in the context of the entire training computation, so this would likely dilute the relative speedup considerably.
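For reference, a minimal sketch of using the resulting `nn.CTCLoss` module (assuming the module wrapper; shapes follow the (T, N, C) log-prob convention used above, and the numbers are illustrative):

```python
import torch
import torch.nn as nn

T, N, C, S = 50, 16, 11, 30  # time steps, batch, classes incl. blank, target length
log_probs = torch.randn(T, N, C).log_softmax(2).detach().requires_grad_()
targets = torch.randint(1, C, (N, S), dtype=torch.long)
input_lengths = torch.full((N,), T, dtype=torch.long)
target_lengths = torch.full((N,), S, dtype=torch.long)

ctc = nn.CTCLoss(blank=0)
loss = ctc(log_probs, targets, input_lengths, target_lengths)
loss.backward()
```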

My performance measuring testing script:
```
import timeit
import sys
import torch
num_labels = 10
target_length  = 30
input_length = 50
eps = 1e-5
BLANK = 0#num_labels
batch_size = 16

torch.manual_seed(5)
activations = torch.randn(input_length, batch_size, num_labels + 1)
log_probs = torch.log_softmax(activations, 2)
probs = torch.exp(log_probs)
targets = torch.randint(1, num_labels+1, (batch_size * target_length,), dtype=torch.long)
targets_2d = targets.view(batch_size, target_length)
target_lengths = torch.tensor(batch_size*[target_length])
input_lengths = torch.tensor(batch_size*[input_length])
activations = log_probs.detach()

def time_cuda_ctc_loss(grout, *args):
    torch.cuda.synchronize()
    culo, culog_alpha = torch._ctc_loss(*args)
    g, = torch.autograd.grad(culo, args[0], grout)
    torch.cuda.synchronize()

def time_cudnn_ctc_loss(groupt, *args):
    torch.cuda.synchronize()
    culo, cugra= torch._cudnn_ctc_loss(*args)
    g, = torch.autograd.grad(culo, args[0], grout)
    torch.cuda.synchronize()

def time_warp_ctc_loss(grout, *args):
    torch.cuda.synchronize()
    culo = warpctc.ctc_loss(*args, blank_label=BLANK, size_average=False, length_average=False, reduce=False)
    g, = torch.autograd.grad(culo, args[0], grout)
    torch.cuda.synchronize()

if sys.argv[1] == 'cuda':
    lpcu = log_probs.float().cuda().detach().requires_grad_()
    args = [lpcu, targets_2d.cuda(), input_lengths.cuda(), target_lengths.cuda(), BLANK]
    grout = lpcu.new_ones((batch_size,))
    torch.cuda.synchronize()
    print(timeit.repeat("time_cuda_ctc_loss(grout, *args)", number=1000, globals=globals()))
elif sys.argv[1] == 'cudnn':
    lpcu = log_probs.float().cuda().detach().requires_grad_()
    args = [lpcu, targets.int(), input_lengths.int(), target_lengths.int(), BLANK, True]
    grout = lpcu.new_ones((batch_size,))
    torch.cuda.synchronize()
    print(timeit.repeat("time_cudnn_ctc_loss(grout, *args)", number=1000, globals=globals()))
elif sys.argv[1] == 'warpctc':
    import warpctc
    activations = activations.cuda().detach().requires_grad_()
    args = [activations, input_lengths.int(), targets.int(), target_lengths.int()]
    grout = activations.new_ones((batch_size,), device='cpu')
    torch.cuda.synchronize()

    print(timeit.repeat("time_warp_ctc_loss(grout, *args)", number=1000, globals=globals()))
```
I'll also link to a notebook that I used for writing up the algorithm in simple form and then test the against implementations against it.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9628

Differential Revision: D8952453

Pulled By: ezyang

fbshipit-source-id: 18e073f40c2d01a7c96c1cdd41f6c70a06e35860
2018-07-31 11:09:48 -07:00
Adam Paszke
aa7af94656 Make JIT tracing a thread-local property (#9414)
Summary:
As in the title. Lets us simplify a lot of code.

Depends on #9363, so please review only the last commit.

zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9414

Reviewed By: zdevito

Differential Revision: D8836496

Pulled By: apaszke

fbshipit-source-id: 9b3c3d1f001a9dc522f8478abc005b6b86cfa3e3
2018-07-19 19:09:39 -07:00
tippisum
5c695e3a60 Implement 2D and 3D alpha_dropout (#9073)
Summary:
It implements per-channel alpha_dropout. It also creates corresponding function classes and unifies the process of dropout and alpha_dropout.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9073

Differential Revision: D8727008

Pulled By: ezyang

fbshipit-source-id: 9d509f9c5db4e98f7b698cdfc4443505a4d2b331
2018-07-17 17:10:16 -07:00
Roy Li
a47a30b9ce Implement grid_sampler in aten (#8929)
Summary:
Partially addresses #8928.

Maybe #7273?
Pull Request resolved: https://github.com/pytorch/pytorch/pull/8929

Reviewed By: ezyang

Differential Revision: D8668919

Pulled By: li-roy

fbshipit-source-id: 8ad07b224d2ab211c274c4c10f042501efaae32c
2018-07-10 15:10:24 -07:00
Tongzhou Wang
e8536c08a1 Update extension docs, fix Fold/Unfold docs (#9239)
Summary:
Commits:
1. In extension doc, get rid of all references of `Variable` s (Closes #6947 )
    + also add minor improvements
    + also added a section with links to cpp extension :) goldsborough
    + removed mentions of `autograd.Function.requires_grad` as it's not used anywhere and hardcoded to return `Py_True`.
2. Fix several sphinx warnings
3. Change `*` in equations in `module/conv.py` to `\times`
4. Fix docs for `Fold` and `Unfold`.
    + Added a better shape check for `Fold` (it previously could give a bogus result when there were not enough blocks). Added a test for the checks.
5. Fix doc saying `trtrs` not available for CUDA (#9247 )
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9239

Reviewed By: soumith

Differential Revision: D8762492

Pulled By: SsnL

fbshipit-source-id: 13cd91128981a94493d5efdf250c40465f84346a
2018-07-08 19:09:39 -07:00
Ailing Zhang
227c8f2654 Implement nn.functional.interpolate based on upsample. (#8591)
Summary:
This PR addresses #5823.

* fix docstring: upsample doesn't support LongTensor

* Enable float-scale up- and downsampling for linear/bilinear/trilinear modes (following SsnL's commit).

* Enable float-scale up- and downsampling for nearest mode. Note that our implementation is slightly different from TF in that there is actually no "align_corners" concept in this mode.

* Add a new interpolate function API to replace upsample, with a deprecation warning for upsample (see the usage sketch after this list).

* Add an area mode which is essentially Adaptive_average_pooling into resize_image.

* Add test cases for interpolate in test_nn.py

* Add a few comments to help understand *linear interpolation code.

* Only the "*cubic" mode is still missing from the resize_images API; it's pretty useful in practice and is labeled as a hackamonth here: #1552. I discussed with SsnL that we probably want to implement all new ops in ATen instead of THNN/THCUNN. Depending on the priority, I could either put it in my queue or leave it for a HAMer.

* After the change, the files named *Upsampling*.c work for both up- and downsampling. I could rename the files if needed.
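A minimal usage sketch of the new API described above (parameter names follow the final `torch.nn.functional.interpolate` signature and may differ slightly from this PR's initial version):
```
import torch
import torch.nn.functional as F

x = torch.randn(1, 3, 32, 32)

# Float scale factors now work for the linearly interpolating modes.
y = F.interpolate(x, scale_factor=0.7, mode='bilinear', align_corners=False)

# 'area' mode behaves like adaptive average pooling.
z = F.interpolate(x, size=(16, 16), mode='area')
print(y.shape, z.shape)  # torch.Size([1, 3, 22, 22]) torch.Size([1, 3, 16, 16])
```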

Differential Revision: D8729635

Pulled By: ailzhang

fbshipit-source-id: a98dc5e1f587fce17606b5764db695366a6bb56b
2018-07-06 15:28:11 -07:00
Tongzhou Wang
7b25cbbef9 Test nn.Module on non-contiguous inputs (#9114)
Summary:
1. Let `ModuleTest` raise when a module fails on non-contiguous inputs. Fix legacy modules.
2. Fix BN (both THNN and cuDNN) not working on non-contiguous inputs.
3. Fix CUDA EmbeddingBag not working on non-contiguous inputs. To avoid calling `.contiguous()` in both `forward` and `backward`,
  a. prefix all current `embedding_bag*` functions with `_`, indicating that they require contiguous input (there is a check in each function).
  b. create `embedding_bag`, which makes input arguments `.contiguous()` and calls `_embedding_bag`.
4. Make many ATen `embedding*` functions work on non-contiguous inputs so we don't need to call `input = input.contiguous()` in Python `nn.functional.embedding`.
5. Fix dense-sparse addition when the sparse input is not coalesced and the indices or values tensor is not contiguous. This came up in the test cases of Embedding modules with `sparse=True`. Added tests.
6. Update `TensorUtils.cpp` to use `AT_*` macros.

Request:
review from cpuhrsch on the `Embedding*` changes.
review from ezyang on ATen sparse & BN changes.
Closes https://github.com/pytorch/pytorch/pull/9114

Differential Revision: D8717299

Pulled By: SsnL

fbshipit-source-id: 0acc6f1c9522b5b605361e75112c16bbe1e98527
2018-07-05 21:09:34 -07:00
Roy Li
21c786071b update nn loss tests to use new reduction arg (#9118)
Summary:
The tests were using the old args, which caused them to emit a lot of deprecation warnings.

closes #9103.

Reviewed By: ezyang

Differential Revision: D8720581

Pulled By: li-roy

fbshipit-source-id: 3b79527f6fe862fb48b99a6394e8d7b89fc7a8c8
2018-07-02 19:41:57 -07:00
Wei Yang
cb1bfe91af Deprecated several functions at torch.nn.functional (#8748)
Summary:
1. fixes #6245
2. deprecated tanh, sigmoid
Closes https://github.com/pytorch/pytorch/pull/8748

Differential Revision: D8697975

Pulled By: weiyangfb

fbshipit-source-id: f30714aa0611a1fe870040692f3dbcc8238aece9
2018-07-02 15:54:46 -07:00
Roy Li
c61f0217a5 combine size_average and reduce args in loss functions (#8018)
Summary:
closes #7929
Closes https://github.com/pytorch/pytorch/pull/8018

Differential Revision: D8682540

Pulled By: li-roy

fbshipit-source-id: 649170dd1a7f373151c1d4e949838bd1c5651936
2018-07-01 05:39:00 -07:00
Peter Goldsborough
f0772c0ab2 Replace max_pool with max_pool_with_indices (#8946)
Summary:
Re-push from https://github.com/pytorch/pytorch/pull/8892
Closes https://github.com/pytorch/pytorch/pull/8946

Differential Revision: D8666862

Pulled By: goldsborough

fbshipit-source-id: 44cd3d63d347316818a7b0f5f89fce8ff7486736
2018-06-28 16:10:08 -07:00
Orion Reblitz-Richardson
9ec0a2aef4 fbshipit-source-id: ba600fcd2b5cefc7621357bdeb05e24cea02e5af 2018-06-27 04:50:56 -07:00
Peter Goldsborough
290d20b094
Replace max_pool with max_pool_with_indices (#8892)
* Create max_poolXd_with_indices

* Match ATen names in ONNX symbolic
2018-06-26 17:09:30 -07:00
Vadim Velikodniy
6e28d4d364 Add pos_weight argument to nn.BCEWithLogitsLoss (#5660) (#6856)
* Add pos_weight argument to nn.BCEWithLogitsLoss and F.binary_cross_entropy_with_logits (#5660)
- Add an option to control precision/recall in imbalanced datasets
- Add tests (but new_criterion_tests)

* Move pos_weight to the end of args list in the documentation.

`pos_weight` was moved to the end because it is the last argument in both
`nn.BCEWithLogitsLoss` and `binary_cross_entropy_with_logits`
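A minimal usage sketch, assuming the `pos_weight` keyword on `nn.BCEWithLogitsLoss` described above (one weight per class, broadcast over the batch):
```
import torch
import torch.nn as nn

logits = torch.randn(8, 3)               # raw scores, not probabilities
targets = torch.empty(8, 3).random_(2)   # binary targets

# pos_weight > 1 trades precision for recall on the positive class.
pos_weight = torch.tensor([1.0, 2.0, 0.5])
criterion = nn.BCEWithLogitsLoss(pos_weight=pos_weight)
loss = criterion(logits, targets)
```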
2018-06-26 12:31:07 -04:00
Peter Goldsborough
8e98a1a84d
Create avg_pool1d in ATen (#8880)
* Create avg_pool1d in ATen

* Put function name into check1d method
2018-06-25 20:31:32 -07:00
li-roy
85f4d2b55a throw error when grid_sample is passed unsupported mode (#8884) 2018-06-25 22:37:41 -04:00
Tongzhou Wang
731273b8d6 Improve convT output_padding docs (#8825)
* improve output_padding doc for convT modules

* Update functional.py

* Update conv.py

* lint
2018-06-23 14:33:18 -04:00
Ailing
ddda7cfea5 allow output_size to contain None in adaptive pooling methods (#8596)
* allow output_size to contain None in adaptive pooling methods

* fix lint

* address comments
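A small sketch of the behavior this enables (a `None` entry keeps the corresponding input size unchanged):
```
import torch
import torch.nn as nn

x = torch.randn(1, 16, 20, 30)

# Height is left at 20; only the width is pooled down to 7.
pool = nn.AdaptiveAvgPool2d((None, 7))
print(pool(x).shape)  # torch.Size([1, 16, 20, 7])
```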
2018-06-22 13:29:15 -04:00
Thomas Viehmann
0ae8b6c027 add fold example and add nn.Fold/nn.Unfold and F.fold/F.unfold to doc (#8600)
* add fold example and add nn.Fold/nn.Unfold and F.fold/F.unfold to doc

and a few drive-by doc fixes

* typo
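A minimal fold/unfold round trip along the lines of the example added to the docs (the exact wording of the documented example may differ):
```
import torch
import torch.nn.functional as F

x = torch.randn(1, 3, 8, 8)

# Extract all non-overlapping 2x2 patches: shape (N, C*kh*kw, L),
# where L is the number of blocks (here 4*4 = 16).
patches = F.unfold(x, kernel_size=2, stride=2)
print(patches.shape)  # torch.Size([1, 12, 16])

# fold is the inverse operation; with non-overlapping blocks it
# reconstructs the original tensor exactly.
y = F.fold(patches, output_size=(8, 8), kernel_size=2, stride=2)
print(torch.equal(x, y))  # True
```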
2018-06-18 09:36:42 -04:00
Wei Yang
ae55865a3b Migrated hardshrink() to ATen and deprecated nn.Hardshrink() (#8117)
* 1. added hardshrink() to ATen (CPU + GPU); 2. removed nn.Hardshrink(); 3. reusing previous tests for nn.Hardshrink() and included CUDA tests at test_nn; 4. default parameter lambda=0.5 is not working yet

* optimized memory read/write

* 1. pass in lambd as scalar for CPU/CUDA_apply*; 2. removed tests for hardshrink at test_legacy_nn

* fixes test_utils

* 1. replace zeros_like with empty_like; 2. use scalar_cast in cuda

* 1. printing lambd value; 2. default lambd=0.5 is still failing

* getting around the Scalar bug by removing the default value of lambd from native_functions.yaml, and declaring it at nn/functional.py

* cleaned up debug printf
2018-06-14 16:42:20 -04:00
Tongzhou Wang
a77b391de7 [SpectralNorm] don't register original weight as buffer (#8170)
* don't register original weight as buffer; fixes for buffers that require grad

* add test
2018-06-12 14:42:05 -04:00
Tongzhou Wang
f9926e4ce5 Fix EmbeddingBag max_norm option (#7959)
* fix EmbeddingBag max_norm option

* flake8

* add warning to the embedding bag arg change
2018-05-31 09:42:56 -04:00
Vedaanta Agarwalla
215fe057ea No Default argument to max_unpool functions (Fixes #7327) (#7388)
* Fix for Issue #7327

* Added testcase for max_unpool
2018-05-29 15:02:23 -04:00
ngimel
a015d579dd move softmax/logsoftmax to ATen (#6786)
* move softmax/logsoftmax to ATen

* specify cpu and gpu accum types

* use accreal for CPU

* expose softmax backward to python, fix legacy interface

* fix Distributions.cu to use common AccumulateType

* fix cuda 8 build

* delete commented out lines

* rebase on master, fix breakages
2018-05-04 14:23:35 -04:00
Ethan Steinberg
ee00a8049a Add max pooling support to EmbeddingBag (#5725)
* Add max mode support to EmbeddingBag

* Lint fix

* Fix compilation issue on other platforms

* Rebase + don't waste memory when not in max mode

* Oops, missed a spot

* Fix whitespace from merge

* less precision

* Lower precision to avoid spurious failures

* Minor typo

* Switch to size()
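A short sketch of the new mode, assuming it is selected with `mode='max'` as in the final API:
```
import torch
import torch.nn as nn

# mode='max' takes an element-wise maximum over each bag
# instead of summing or averaging the embeddings.
eb = nn.EmbeddingBag(10, 4, mode='max')

input = torch.tensor([1, 2, 4, 5, 4, 3, 2, 9])
offsets = torch.tensor([0, 4])   # two bags: [1, 2, 4, 5] and [4, 3, 2, 9]
out = eb(input, offsets)
print(out.shape)  # torch.Size([2, 4])
```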
2018-04-29 16:48:11 -04:00
Emanuel Jöbstl
645ad7ad0c Fixing LP-Pooling stability issues (#6766)
* Added a ReLU unit to LP pooling, so the gradient does not become NaN if all inputs are zero.

* Added workaround for odd p. Added a bit of doc.

* Make the linter happy.
2018-04-25 22:13:15 -04:00
li-roy
d564ecb4a5 Update docs with new tensor repr (#6454)
* Update docs with new tensor repr

* remove cuda in dtype

* remove changes to gloo submodule

* [docs] document tensor.new_* ctor

* [docs] Add docs for tensor.to(), tensor.float(), etc

* [docs] Moar examples for docs.

* [docs] Warning for tensor ctor copy behavior

* Quick fix

* [docs] Document requires_grad_()

* [docs] Add example for requires_grad_()

* update slogdet and *fft

* update tensor rst

* small fixes

* update some docs

* additional doc changes

* update torch and tensor docs

* finish changing tensor docs

* fix flake8

* slogdet with negative det

* Update functional.py tensor ctors

* Fix nll_loss docs

* reorder to move device up

* torch.LongTensor -> torch.tensor or torch.empty in docs

* update tensor constructors in docs

* change tensor constructors

* change constructors

* change more Tensor() to tensor()

* Show requires_grads_ docs

* Fix set_default_dtype docs

* Link to torch.no_grad, etc, from torch doc

* Add dtype aliases to table

* regen docs again

* Tensor attributes stub page

* link to inplace sampling

* Link torch.dtype, device, and layout

* fix dots after nonfinite floats

* better layout docs
2018-04-21 07:35:37 -04:00
Thomas Viehmann
533beab5bb Fix doc for torch.nn.functional.relu (fixes #6742) (#6749)
Thank you Shengyi Qian (JasonQSY) for spotting and reporting.
2018-04-19 11:25:43 +02:00
Tongzhou Wang
1c01eabd3c
Codemod to update our codebase to 0.4 standard (#6641)
* Codemod to update our codebase to 0.4 standard

* Update some of the test scripts

* remove Variable in test_clip_grad_value

* fix _symbolic_override_wrapper_maker
2018-04-17 22:06:54 -04:00
Mike Vella
d5f041aa8b Updated documentation for cross entropy loss to include multi-dimensional input shapes (#6638) 2018-04-17 09:56:43 -04:00
Yannick Soom
fd6d11ae66 Fixed text of error message in case of unexpected target size (#6617) 2018-04-16 11:27:02 -04:00
Tongzhou Wang
59bda9a8c4
Fix reflection padding boundary checks (#6438)
* Fix Reflection padding boundary checks

* Improve padding docs

* fix lint
2018-04-10 10:37:01 -04:00
Kento NOZAWA
3b58b859b2 Fix typos in docs (#6389) 2018-04-07 12:41:15 -04:00
Tongzhou Wang
48ad4546d2 Move LayerNorm to ATen; remove tracking_running_stats functionality (#5983)
* move LN to aten; remove tracking_stats functionality

* Address comments about error message and respect cudnn flag for LayerNorm and GroupNorm
2018-03-30 09:44:11 -07:00
Richard Zou
371e14b807 NLLLoss: error message for mismatched input/target batch sizes (#6072)
Fixes #5554

Adds an error message for when NLLLoss is passed an input and target
whose batch sizes don't match. Ideally this check should live in ATen
but since there is NLLLoss logic in python the check is there right now.
2018-03-28 14:21:38 -07:00
sundw2014
8964aab260 fix docs error in torch.nn.functional.nll_loss (#6060)
According to the code in _torch/nn/functional.py:1399_
(```if target.size()[1:] != input.size()[2:]:```),
if the size of input is (N, C, d_1, d_2, ..., d_K), the size of target should be (N, d_1, d_2, ..., d_K).
2018-03-28 10:05:14 +02:00
Tongzhou Wang
39829c1670 Improve docs (#5999)
* Clarify det and svd doc on when backward is not stable

* Fix some links in nn.functional doc; improve upsampling doc
2018-03-26 14:09:11 -04:00
Tongzhou Wang
5d77709485 Linearly interpolating upsampling fix (#5927)
* Changes in bilinear upsampling

* Add align_corners option to upsampling module & functional when using linearly interpolating modes
When align_corners=True, it uses the old upsampling scheme, which gives visually better results
but doesn't properly align input and output pixels, and thus causes the output to vary depending on the input size.
This PR adds the align_corners option and changes the default behavior to align_corners=False, with
a proper warning if the option is not specified when using nn.Upsample or nn.functional.upsample, so that
users are aware of this change (see the usage sketch below).
Adds tests in test_nn.py for spatial invariance when align_corners=False, and the usual module tests for
align_corners=False.

* remove redundant checks and unnecessary variables; fix the cast

* fix negative indices
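A minimal sketch of the two behaviors, using the `upsample` API that this PR modifies (the function was later renamed to `interpolate`):
```
import torch
import torch.nn.functional as F

x = torch.arange(16, dtype=torch.float).view(1, 1, 4, 4)

# New default: align_corners=False.
a = F.upsample(x, scale_factor=2, mode='bilinear', align_corners=False)

# Old scheme, kept available behind align_corners=True.
b = F.upsample(x, scale_factor=2, mode='bilinear', align_corners=True)
print((a - b).abs().max())  # the two schemes produce different values
```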
2018-03-24 12:21:13 -04:00
Tongzhou Wang
08891b0a4e Group Normalization (#5968)
* Group Normalization

* move to ATen
2018-03-24 12:16:18 -04:00
Vedanuj Goswami
f3e16cc737 Expose gradients w.r.t. input & weight for conv1d, conv2d, conv3d in Python (#5408)
This PR addresses issue #5024

* Expose Conv2dBackward in python

* Separate interface for exposing gradients of operators

* Revert old changes

* Add tests

* Add conv1d gradients. Refactor tests for grad convolutions

* Refactor names and change examples

* Remove Variable from tests for conv backward
2018-03-23 17:49:32 -04:00
li-roy
e4eee7c2cf Implement MarginRankingLoss as native function and add reduce=True arg to it (#5346)
* add reduce=True arg to MarginRankingLoss

* make default margin arg match for legacy

* remove accidentally added test

* fix test

* fix native_functions.yaml alphabetical order
2018-03-21 15:40:58 -04:00
li-roy
1dcad08537 Support N-D tensors in Bilinear (#5764)
* support n-d inputs in bilinear and move to aten

* support n-d inputs in bilinear and move to aten

* add asserts to bilinear inputs

* address comments

* cast int64_t in asserts
2018-03-17 11:57:43 -04:00
li-roy
e876b5d9d0 implement TripletMarginLoss as a native function (#5680)
* implement TripletMarginLoss as a native function

* implement TripletMarginLoss as native function

* fix compile error

* address comments

* address comments

* Add keepdim arg to pairwise distance
2018-03-17 11:10:48 -04:00
Peter Goldsborough
effc568cee Add ReLU to ATen (#5626) 2018-03-13 19:23:24 +01:00
Vishwak Srinivasan
76a283db40 [ready] General Documentation Improvements - 2 (#5685)
* Fix some minor errors in existing docs.

* Fix Convolution and Pooling docs in torch.nn.functional

* Cleaned up torch.nn.functional docs

* Address @SsnL 's comments

* Add multiplication sign missing in docs

* Fix more typos, and clear some warnings

* Change infinity symbol in LPPool2d

* Revert some changes in torch.nn.functional

* Few more minor changes
2018-03-13 09:47:43 -04:00
li-roy
4c4a42b3f9 implement CosineEmbeddingLoss as a native function and add reduce arg (#5646)
* implement CosineEmbeddingLoss as a native function and add reduce=True arg to it

* fix flake8

* address comments

* add reference function to tests

* fix flake8
2018-03-08 17:54:24 -05:00
Edward Z. Yang
9de922991c
Revert "implement CosineEmbeddingLoss as a native function and add reduce arg" (#5640)
* Revert "implement CosineEmbeddingLoss as a native function and add reduce arg (#5447)"

This reverts commit c16478fe3f.
2018-03-08 14:07:17 -05:00
li-roy
c16478fe3f implement CosineEmbeddingLoss as a native function and add reduce arg (#5447)
forward (new) [1.1905965859768912, 1.160144692985341, 1.1558120870031416]
backward (new) [1.9150976981036365, 1.9792822760064155, 1.8779143309220672]
double backward (new) [3.6898688060464337, 3.5784677929477766, 3.569505032035522]

forward (old) [3.2359962839400396, 3.275224728975445, 3.3409753759624436]
backward (old) [5.668679727939889, 5.722980880062096, 5.585088661056943]
double backward (old) N/A

* implement CosineEmbeddingLoss as a native function and add reduce=True arg to it

* fix flake8

* address comments

* add reference function to tests

* fix flake8
2018-03-08 13:15:12 -05:00
Francisco Massa
0f50ca0b48 Add reduce to functional smooth_l1 documentation (#5610)
This has been present in master since https://github.com/pytorch/pytorch/pull/3382 but the doc for the functional interface was not taken into account.
2018-03-07 10:16:40 -05:00
cjsg
15eae9543e Fixed dimensions in docs of conv and conv_transpose (#5543) 2018-03-03 05:49:01 -05:00
Edward Z. Yang
f064c5aa33
Expunge all occurrences of torch._C._VariableFunctions (#5525)
Some of the call-sites now look a little hokey with this
removed, saving that for another patch.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-03-02 12:19:44 -05:00
Tongzhou Wang
27265503ad nn.* doc update after Variable/Tensor merge (#5459)
The nn.* counterpart of #5443. Mostly removed the Variable wrapper. Also added docs for nn.RReLU.

Notice that torch.randn(*, requires_grad=True) isn't documented until #5462 is done.
2018-03-01 18:11:39 -05:00
Soumith Chintala
36abf023bd
Added 3d grid sampler (for volumetric transformer networks) (#5453)
* add 3d grid_sample

* add cuda implementation, more testing
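A minimal sketch of sampling a volume with the 5-D form of `F.grid_sample` (the sampling grid here is random, just to show the expected shapes):
```
import torch
import torch.nn.functional as F

vol = torch.randn(1, 2, 4, 6, 8)      # (N, C, D, H, W)

# Grid of normalized coordinates in [-1, 1], shape (N, D_out, H_out, W_out, 3).
grid = torch.rand(1, 4, 6, 8, 3) * 2 - 1
out = F.grid_sample(vol, grid)
print(out.shape)  # torch.Size([1, 2, 4, 6, 8])
```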
2018-02-28 19:32:15 -05:00
li-roy
5bbeb55f22 add reduce=True arg to MultiMarginLoss (#5150)
* add reduce=True arg to MultiMarginLoss

* Change tests to support legacy

* fix flake8

* address comments

* formatting change

* remove free of unallocated tensor

* fix after variable/tensor merge
2018-02-27 18:35:50 -05:00
Sam Gross
30ec06c140
Merge Variable and Tensor classes (#5225)
This replaces the torch.Tensor constructors with factories that produce
Variables. Similarly, functions on the torch module (e.g. torch.randn)
now return Variables.

To keep the PR to a reasonable size, I've left most of the unused tensor
code. Subsequent PRs will remove the dead code, clean-up calls to
torch.autograd.Variable, and rename Variable to Tensor everywhere.

There are some breaking changes because Variable and Tensors had
slightly different semantics. There's a list of those changes here:

 https://github.com/pytorch/pytorch/wiki/Breaking-Changes-from-Variable-and-Tensor-merge
2018-02-23 18:03:31 -05:00
Tongzhou Wang
1848cad108 [ready] Layer Normalization (#4922)
* at::maybe_data_ptr and Check.h => TensorUtils.h

* THNN support for optional BN running_*

* ATen support for optional BN running_*

* Python nn.* support for optional BN running_*; Improve IN and BN doc

* Add tests for IN and BN new option

* Layer Norm

* Fix LRN doc

* functional interface for LN and IN

* Layer norm tests

* fix BN double backward returning undefined tensors

* fix jit test using wrong dim inputs for BN

* add/improve BN, IN and LN GPU tests with half type

* Update docs to be consistent with Conv notation
Fix onnx
Clarified onnx symbolic wrapper

* fix typo

* Address comments
2018-02-22 11:56:41 -05:00
li-roy
68aed0779d add reduce=True arg to MultiLabelSoftMarginLoss (#5097)
* add reduce=True arg to MultiLabelSoftMarginLoss

* Move some tests to new_criterion_tests

* fix flake8

* fix multilabelsoftmarginloss weights test
2018-02-15 15:29:44 -05:00
Richard Zou
ab18aaeba7 Clarify output shapes of reduce=False losses (#5082) 2018-02-13 10:11:14 -08:00
li-roy
147612e64a add reduce=True arg to SoftMarginLoss (#5071)
* add reduce=True arg to SoftMarginLoss

* add reference function for SoftMarginLoss

* Rebase onto master

* Address comments

* Fix flake8

* Fix rebase error
2018-02-13 10:51:57 -05:00
cpuhrsch
07be53b57f Move EmbeddingBag into ATen (#4856)
This diff creates code related to EmbeddingBag in ATen. It also allows sparse gradients.
2018-02-12 14:20:32 -05:00
li-roy
ce5702fa80 add reduce=True arg to HingeEmbeddingLoss (#5130)
* add reduce=True arg to HingeEmbeddingLoss

* pass arg to super constructor in HingeEmbeddingLoss

* make HingeEmbeddingLoss reference fn work on legacy
2018-02-09 11:38:36 -05:00
gchanan
affe742d31
Add scalar module tests for test_nn. (#5116)
* Add scalar module tests for test_nn.

* Properly return from glu.

* Guard scalar test with skipIf.
2018-02-08 13:53:24 -05:00
Lu Fang
c111cdfd1d Add onnx support for InstanceNorm (#4626)
* Add ONNX symbolic for instancenorm

* Fix some bugs
2018-02-07 10:54:30 -05:00
gchanan
7af433deeb
Add scalar criterion tests (#5087)
* Add criterion scalar tests.

This exposed an issue in MarginRankingLoss with scalars, but the cleanest way to fix it is to wait
until forward runs on Variables (so we don't have to wait for the backward pass to check whether something
is a scalar).

* Fix flake8.

* Add error message for margin_ranking_loss with scalars.
2018-02-06 18:40:37 -05:00
gchanan
fcccd07cc0
Implement hinge_embedding_loss as a native function. (#5080) 2018-02-06 14:43:36 -05:00
li-roy
28f056fed2 add reduce=True argument to MultiLabelMarginLoss (#4924)
* add reduce=True argument to MultiLabelMarginLoss

* Fix lint

* Addressed comments

* Remove unneeded syncthreads calls
2018-02-05 12:28:51 -05:00
Richard Zou
e4ddbeb554 Fix typo (#4846) 2018-01-25 10:33:45 -05:00
Richard Zou
b997474a4f Adds Im2Col and Col2Im (#4729) 2018-01-19 09:37:53 -05:00
Sam Gross
57549b7e44
Bind functions with out= arguments in VariableType (#4565)
This adds overrides in VariableType for the xxx_out ATen functions and
implements Python bindings. There is no support for automatic
differentiation. If any of the inputs (or outputs) requires grad, then the
function will throw an exception unless it's running in "no-grad" mode.

The bindings for calling torch.xxx functions on Variables are moved to a
different object. Previously, they were static method on VariableBase.
This change prevents users from accidentally calling static methods as if
they were instance methods.
2018-01-17 18:27:42 -05:00
Sam Gross
cb83474a57
Fix embedding with sparse=True (#4686)
Fixes #4666
2018-01-16 16:19:20 -05:00
Kai Arulkumaran
2260649fb6 Local Response Normalization (#4667)
* Local Response Normalization

* Add 1D and 3D LRN

* Generalise LRN to higher dims

* Use mean instead of sum

Specify 'across-channels'
2018-01-15 22:23:51 -05:00
David Pollack
05908e8243 current code works with dim = 3, so I added it to dim checks 2018-01-13 12:58:08 +01:00
Riddhiman Dasgupta
f99c7d9429 Padding_idx in Embedding supports negative indexing (#4496) 2018-01-09 12:04:11 +01:00
Neeraj Pradhan
408c84de7c Supporting logits as parameters in Bernoulli and Categorical (#4448)
* Supporting logits as parameters in Bernoulli and Categorical

* address comments

* fix lint

* modify binary_cross_entropy_with_logits

* address comments

* add descriptor for lazy attributes

* address comments
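A short sketch of the logits parameterization (probabilities are derived lazily, as described above):
```
import torch
from torch.distributions import Bernoulli, Categorical

logits = torch.tensor([0.0, 1.0, -1.0])

b = Bernoulli(logits=logits)     # three independent Bernoullis
c = Categorical(logits=logits)   # one categorical over three classes

print(b.probs)   # sigmoid(logits)
print(c.probs)   # softmax(logits)
print(c.sample())
```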
2018-01-05 03:45:05 -05:00
Richard Zou
35c4d73bdb Deprecate nn.NLLLoss2d (#4238)
* Deprecate nn.NLLLoss2d

* Fix legacy tests

* Fix tests

* Remove NLLLoss2d from docs, add deprecation warning instead of error

* fix lint

* Add more to docs
2018-01-04 12:38:04 -05:00
Hugh Perkins
fc0d940c5e add gumbel_softmax, based on Eric Jang's implementation (#3341)
* add gumbel_softmax, based on Eric Jang's implementation

* Make gumbel_softmax CUDA friendly

* gumbel_softmax tweaks
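A minimal usage sketch, assuming the `tau` and `hard` arguments of the functional added here:
```
import torch
import torch.nn.functional as F

logits = torch.randn(4, 10)

# Soft samples: differentiable, each row sums to 1.
soft = F.gumbel_softmax(logits, tau=1.0)

# hard=True returns one-hot samples while keeping gradients through
# the soft sample (straight-through estimator).
hard = F.gumbel_softmax(logits, tau=1.0, hard=True)
print(soft.sum(dim=-1), hard.sum(dim=-1))
```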
2018-01-04 12:23:21 -05:00
Sam Gross
20b5e82155
Implement embedding in ATen (#4322)
Implements nn.Embedding (lookup table) in ATen.

Breaking change: new optional argument padding_idx in F.embedding to
match nn.Embedding.

Note that there are a few bugs in Embedding that are inherited from the
previous code:

 - CUDA renorm has race conditions if index contains duplicate entries
 - sparse gradient doesn't work with scale_grad_by_freq
2018-01-02 15:44:46 -05:00
Sam Gross
98f71912b0
Fix type signature of in-place NN functions (#4389)
This is a step towards removing the special casing of NN functions in gen_variable_type.py. It fixes the signature of in-place NN functions so that they return Tensor & instead of Tensor.
2017-12-28 16:50:09 -05:00
Sam Gross
4dba674324
Move fractional max pooling to ATen (#4290) 2017-12-21 17:07:46 -05:00
Edward Z. Yang
5f7c5502b8
Further improvements to ATen convolution (#4287)
- Rename THNN convolution to have thnn_ prefix.
- Propagate CuDNN benchmark and deterministic to at::Context
- Add 'convolution', 'convNd' and 'conv_transposeNd' native wrappers, with defaults
  The conv_transposeNd wrappers are updated to have the same argument
  order as Python.
- torch.nn.functional directly dispatches to the native wrappers
- Make it possible to turn off tracing for some native wrappers, so I don't
  have to write symbolics for all the functions above
- Spectral ops can now make use of CuDNN convolution if possible
- Better commentary on cudnn_batch_norm
- Turn on DCE for all JIT tests.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-12-21 13:03:43 -05:00
Edward Z. Yang
5b8fe5cbb5
Batchnorm in ATen (#4285)
* Batchnorm in ATen

This commit moves BatchNorm derivatives into ATen, eliminating
torch/csrc/autograd/functions/batch_normalization.cpp

Some refactoring along the way:

- Functions got renamed to remove _forward from their names
- CuDNN batchnorm forward was modified to return save_mean/save_std instead of
  take it as parameters. To avoid returning undefined Variables, these return
  (small) uninitialized tensors when they are not used.
- THNN batch normalization takes care of resizing save_mean and save_std on
  forward.
- There are some shenanigans re batchnorm backwards in eval mode. I'm tracking
  that in #4284
- I decided not to introduce buffers as a proper concept in ATen, which means
  that tensors like running_mean/running_var are variables in ATen.  This meant
  there needed to be some adjustments to how we *trace* such variables; the
  new strategy is if we can't find a Value for a variable, we look and see
  if we have a Value for the buffer pointed to by the variable, before
  finally falling back on constant.
- This PR finally reliably triggered OOM on Travis builds; I fixed this by reducing
  the number of parallel jobs.
- Stop using std::string when it's not necessary.
- Remove training parameter from cudnn_batch_norm_backward, because it
  doesn't make sense; cuDNN doesn't implement the math for evaluation mode
  batchnorm backwards.
- batchnorm_double_backward is now in an anonymous namespace, as it
  no longer needs to be called from torch/csrc

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-12-21 11:38:31 -05:00
Sam Gross
b6a30f7ede
Move SELU to ATen (#4269)
Fuse scale multiplication into ELU
2017-12-20 16:32:21 -05:00
Sam Gross
dad4b2d6cc
Move adaptive avg/max pool1d to ATen (#4266) 2017-12-20 15:50:17 -05:00
Sam Gross
689ef9cba3
Move upsampling to ATen (#4264) 2017-12-20 15:12:07 -05:00
Edward Z. Yang
a88a8ec827
Convolution derivatives in ATen (#4116)
* Convolution derivatives in ATen

This PR introduces ATen implementation of convolution, which dispatches to
THNN/CuDNN/nnpack based on input parameters. The general strategy is to compose
this function out of the various forward-backward pairs of specific
implementations, rather than write a monolithic function with backwards (which
is what we did before because the boilerplate of doing it otherwise would have
been very high.) The new API provides the following functions:

  - _convolution, which is a fully generic, native convolution implementation
    that dispatches to various other convolution implementations depending on
    input characteristics. This is prefixed with an underscore because it
    explicitly takes benchmark, deterministic and cudnn_enabled which are
    implementation details for CuDNN. The intent is to eventually provide a
    convolution that reads these parameters out of the context using #4104.
  - _convolution_nogroup is a convolution implementation for non-CuDNN
    algorithms which don't support group convolution natively.
  - _convolution_double_backward is the generic double-backwards implementation
    for convolution.

In more detail:

- Most functionality from torch/csrc/autograd/functions/convolution.cpp has been
  moved into aten/src/ATen/native/Convolution.cpp
- We continue to make use of ConvParams, but we now construct the parameters
  upon entry to a function from the function signature (which does not use
  ConvParams; having convolution take ConvParams directly would require teaching
  the code generator how to accept these as parameters, complicating ATen's API
  model) and destruct them when making subprocedure calls.
- I introduce a new idiom, input_r, which represents a const Tensor& reference,
  which will subsequently be assigned to a local Tensor input. This is helpful
  because a lot of the existing algorithms relied on being able to assign to
  locals, which is not permitted with a const reference.
- The native argument parser now supports std::array<bool,2> inputs (NB: there
  MUST NOT be a space; this is the same hack as is applied to derivatives.yaml)
- Native parser now supports Tensor? arguments, which indicates a nullable
  tensor. Previously this function was only used by NN methods.
- Documentation updates on THNN library
- I added an extra fgradInput argument to VolumetricConvolutionMM_updateOutput
  and VolumetricConvolutionMM_accGradParameters so that its buffer list lines up
  with the backward argument list. This makes it possible to write derivative
  for conv3d which previously was not supported (commented out in
  derivatives.yaml)
- Extra double_backward declarations for all convolution backwards functions was
  added.
- You can now use the syntax Tensor? in native_functions.yaml to indicate that a
  tensor argument is nullable.  There are adjustments to propagate this to the
  Python argument parser.
- NNPACK was ported to ATen, and ATen now builds and links against ATen if
  possible. New AT_NNPACK_ENABLED macro.  The nnpack functions are
  nnpack_spatial_convolution.
- Some modest CuDNN convolution refactoring to remove _forward from names.
- There's a new cudnn_convolution_backward function to deal with the fact that
  CuDNN convolution double backward requires you to have computed all gradients
  in one go.
- Variable set_flags now checks if the tensor is undefined, fixing a silent memory
  corruption.
- checkSameType updated to not raise an exception if called with Variable arguments
- "no ATen declaration found for" error message is improved to say what available declarations are
- make_variable now accepts undefined tensors, and returns an undefined tensor in this case.
2017-12-20 14:19:27 -05:00
Sam Gross
b476d10c64
Move max_pool1d to ATen (#4257) 2017-12-19 20:10:11 -05:00
Sam Gross
9495595520
Move reflection/replication padding to ATen (#4258) 2017-12-19 18:57:14 -05:00
Sam Gross
227ef1fb60
Move adaptive avg pooling 2d/3d to ATen (#4254)
Move adaptive avg pooling 2d/3d to ATen

Also use ATen for softshrink
2017-12-19 15:45:33 -05:00
James Reed
cb4f6c3148 conv_tbc (#3730)
attempt to rebase

skip conv_tbc in preprocess_nn_functions

Add conv_tbc symbolic

Fix backward issue with dBias

ConvTBC nn wrapper and unit test
2017-12-18 23:52:36 -05:00
Richard Zou
ccf4dc1525 Add reduce arg to BCELoss (#4231)
* Add reduce arg to BCELoss

* Fix test precision

* reduce keyword for BCELoss in derivatives.yaml
2017-12-18 12:28:53 -05:00