Commit Graph

594 Commits

Author SHA1 Message Date
Guanheng Zhang
4b20fc826d Import MultiheadAttention to PyTorch (#18334)
Summary:
Import MultiheadAttention into the core pytorch framework.
Users now can import MultiheadAttention directly from torch.nn.
See "Attention Is All You Need" for more details related to MultiheadAttention function.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18334

Differential Revision: D14577966

Pulled By: zhangguanheng66

fbshipit-source-id: 756c0deff623f3780651d9f9a70ce84516c806d3
2019-04-11 08:07:30 -07:00
Richard Zou
447d74a074 EmbeddingBag w/ differentiable per_sample_weights (#18957)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18957
ghimport-source-id: 7396ca08b137ea40f04285764a9d9a6d4f19227e

Reviewed By: cpuhrsch

Differential Revision: D14856526

Pulled By: zou3519

fbshipit-source-id: 949faea219c7c02ad981b1db610a477194d3f5c9
2019-04-09 18:13:06 -07:00
Richard Zou
c889ff6cf8 EmbeddingBag w/ per_sample_weights CUDA fwd + bwd (#18800)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18800
ghimport-source-id: 17f638dea0e1ac9a86ec06b223c60362ed78449c

Reviewed By: cpuhrsch

Differential Revision: D14851422

Pulled By: zou3519

fbshipit-source-id: 27b114e51e66112e4bc9cfc63d1d1ddfa650d347
2019-04-09 18:13:02 -07:00
Richard Zou
0397d7c0c8 EmbeddingBag w/ per_sample_weights CPU backward (#18799)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18799
ghimport-source-id: 58a6f629e890449013f24a9b6282664ca2a1e3ba

Reviewed By: cpuhrsch

Differential Revision: D14851417

Pulled By: zou3519

fbshipit-source-id: c36b9d469989354bf6cef1c2c3dc4f13e7cb1a25
2019-04-09 18:12:59 -07:00
Richard Zou
2a2007e5ac EmbeddingBag CPU forward with per_sample_weights. (#18735)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18735
ghimport-source-id: d81bef54dafd7167d2451250d7be478d3c013920

Reviewed By: cpuhrsch

Differential Revision: D14851415

Pulled By: zou3519

fbshipit-source-id: cea6039e760ad571b90f0a536e420498f34be325
2019-04-09 18:12:55 -07:00
J M Dieterich
e45e3634d6 add launch bounds, enable more tests (#18909)
Summary:
Add launch bounds annotations for ROCm arising from maxThreadsPerBlock and apply threads use.

Enable tests that now work.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18909

Differential Revision: D14801490

Pulled By: ezyang

fbshipit-source-id: b81c97fc783a2627bc7e31b32036a364cfe40cc7
2019-04-05 10:17:15 -07:00
Sepehr Sameni
b11a8c6aef return missing keys from load_state_dict (#18668)
Summary:
return missing_keys and unexpected_keys from load_state_dict so the user can handle them when strict mode is off; also removed an unused variable
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18668

Differential Revision: D14782073

Pulled By: ezyang

fbshipit-source-id: ab3b855eb77bb7422594d971988067e86eef20f2
2019-04-04 18:11:56 -07:00
Roy Li
c705d9eb1e Introduce DeprecatedTypeProperties class (#17991)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17991

changes:
-Breaks bc: Tensor::type() now returns DeprecatedTypeProperties& rather than Type&.
-Added DeprecatedTypeProperties, it serves as a temporary replacement for Type as the return value of Tensor::type(). This contributes to making Type just for dispatch purposes so that we can make it dtype agnostic.
-Tensor::dispatch_type() now returns Type& like Tensor::type() used to do.
-Changed callsites of Tensor::type() appropriately.

Reviewed By: ezyang

Differential Revision: D14443117

fbshipit-source-id: 239ccb7a09626279a71d1a37f8f82e7f57bf7d9e
2019-04-04 02:24:13 -07:00
Shen Li
7ae0263e1b Support replicating multi-GPU modules (#18687)
Summary:
If the input `network` resides on multiple GPUs, `devices` must be a 2D list with `devices[0]` matching `network`'s devices. See  #18591
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18687

Differential Revision: D14706162

Pulled By: mrshenli

fbshipit-source-id: dca630d3308f2dbcf8b75629c452d7a64092ba42
2019-04-03 14:43:07 -07:00
kshitij12345
0916b5419a Fix dense Embedding to work with double backward (#9078)
Summary:
Fixes : #6469

1. `ATen/native/native_functions.yml` had [dispatch](03e7953a98/aten/src/ATen/native/native_functions.yaml (L451-L455)) variants for for `embedding_dense_backward` , however `embedding_backward` explicitly made [call](03e7953a98/aten/src/ATen/native/Embedding.cpp (L35-L45)) to it, thus leading to error.

2. In case of CUDA type tensor, the function crashed used to crash on dereferencing of indices's data [pointer](03e7953a98/aten/src/ATen/native/Embedding.cpp (L93)).

Both have been solved and checked against (on CUDA and CPU)

1.  As mentioned in the issue
```
import torch

class Test(torch.nn.Module):

    def __init__(self):
        super(Test,self).__init__()
        self.embd = torch.nn.Embedding(1000, 100)
        self.dense = torch.nn.Linear(100, 1)

    def forward(self, inp):
        inp = self.embd(inp)
        return self.dense(inp)

test = Test()
inp = torch.tensor([0,1,2,1,1])
out = test(inp)
raw_loss = out.mean(dim=0)

loss_grad = torch.autograd.grad(outputs=raw_loss,
                         inputs=list(test.parameters()),
                         retain_graph=True, create_graph=True, only_inputs=True)
norm = sum([param.norm()**2 for param in loss_grad])
loss = raw_loss + norm

loss.backward(retain_graph=True)

print(test.embd.weight.grad)

```

2. Test Script
```
import torch
import time
start = time.time()
l = [1,1]*100
input = torch.tensor([[1,0],[1,0]],device='cpu')
embedding_matrix = torch.tensor([[1.0,3.0],[2.0,4]],requires_grad=True,device='cpu')

sq = embedding_matrix * embedding_matrix
emb = torch.nn.functional.embedding(input, sq,scale_grad_by_freq=False)

print('Embedding Matrix')
print(embedding_matrix)
print('-----------------')

sum_ = emb.sum()#prod.sum()

loss_grad, = torch.autograd.grad(outputs=sum_,inputs=embedding_matrix,create_graph=True)

print('Gradient')
print(loss_grad)
print('-----------------')

sum2_ = sum_ + loss_grad.sum()
print(sum2_)
sum2_.backward()

print(embedding_matrix.grad)
print(time.time() - start)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9078

Reviewed By: ezyang

Differential Revision: D14691901

Pulled By: soumith

fbshipit-source-id: 78e2612ba39080be564c876311671eb5a0119a0f
2019-04-03 09:50:34 -07:00
Edward Yang
173f224570 Turn on F401: Unused import warning. (#18598)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18598
ghimport-source-id: c74597e5e7437e94a43c163cee0639b20d0d0c6a

Stack from [ghstack](https://github.com/ezyang/ghstack):
* **#18598 Turn on F401: Unused import warning.**

This was requested by someone at Facebook; this lint is turned
on for Facebook by default.  "Sure, why not."

I had to noqa a number of imports in __init__.  Hypothetically
we're supposed to use __all__ in this case, but I was too lazy
to fix it.  Left for future work.

Be careful!  flake8-2 and flake8-3 behave differently with
respect to import resolution for # type: comments.  flake8-3 will
report an import unused; flake8-2 will not.  For now, I just
noqa'd all these sites.

All the changes were done by hand.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Differential Revision: D14687478

fbshipit-source-id: 30d532381e914091aadfa0d2a5a89404819663e3
2019-03-30 09:01:17 -07:00
jithunnair-amd
fdedc62c26 enable more unit tests (#18537)
Summary:
Enable unit tests working with ROCm 2.3. In particular, these are unit tests where we skipped for double data types previously and some tests for multi-GPU setups.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18537

Differential Revision: D14651822

Pulled By: ezyang

fbshipit-source-id: 7dd575504ebe235a91489866c91000e9754b1235
2019-03-27 14:27:23 -07:00
mc-robinson
8bc5b86709 Added tensor size warning to F.mse_loss() (#18349)
Summary:
To address the issue of broadcasting giving the wrong result in `nn.MSELoss()` as mentioned here https://github.com/pytorch/pytorch/issues/16045 . In particular, the issue often arises when computing the loss between tensors with shapes (n, 1) and (n,)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18349

Differential Revision: D14594176

Pulled By: soumith

fbshipit-source-id: f23ae68a4bf42f3554ad7678a314ba2c7532a6db
2019-03-24 19:22:14 -07:00
Will Feng
7be05b822c Fix incorrect sparse add behavior when the sparse tensor has non-contiguous values (#18179)
Summary:
Currently, this code gives incorrect result:
```python
import torch
indices=torch.tensor([[7, 1, 3]])
values=torch.tensor([[1., 1., 1.],
               [1., 1., 1.],
               [1., 1., 1.]])
x = torch.sparse_coo_tensor(indices, values, size=(10, 3))
values=torch.tensor(1.).expand(3, 3)
y = torch.sparse_coo_tensor(indices, values, size=(10, 3))
z = x + y

tensor(indices=tensor([[7, 1, 3]]),
       values=tensor([[2., 1., 1.],
                      [1., 1., 1.],
                      [1., 1., 1.]]),
       size=(10, 3), nnz=3, layout=torch.sparse_coo)
```

This PR fixes the bug by adding special handling for sparse tensors with non-contiguous values in the addition function (specifically, by cat'ing the indices and values together).

This PR closes https://github.com/pytorch/pytorch/issues/17950 and https://github.com/pytorch/pytorch/issues/17919.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18179

Reviewed By: ezyang

Differential Revision: D14569591

Pulled By: yf225

fbshipit-source-id: f5a14c4a31337fc95eab64596212066b4fb18b1a
2019-03-22 19:35:14 -07:00
Ailing Zhang
fe5d23cf4a Using sqrt for better precision in cosine_similarity (#18250)
Summary:
address comment in #18168 .
Testing in CI...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18250

Differential Revision: D14568601

Pulled By: ailzhang

fbshipit-source-id: 39fbbdb08743b53fa665c7e88e4750cbe0976ec7
2019-03-22 13:33:30 -07:00
Edward Yang
ba81074c40 Fix B902 lint error: invalid first argument. (#18181)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18181
ghimport-source-id: 9c23551584a1a1b0b7ac246367f3a7ae1c50b315

Stack from [ghstack](https://github.com/ezyang/ghstack):
* #18184 Fix B903 lint: save memory for data classes with slots/namedtuple
* **#18181 Fix B902 lint error: invalid first argument.**
* #18178 Fix B006 lint errors: using mutable structure in default argument.
* #18177 Fix lstrip bug revealed by B005 lint

A variety of sins were committed:
- Some code was dead
- Some code was actually a staticmethod
- Some code just named it the wrong way
- Some code was purposely testing the omitted case

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Differential Revision: D14530876

fbshipit-source-id: 292a371d9a76ddc7bfcfd38b6f0da9165290a58e
2019-03-21 09:10:28 -07:00
Ailing Zhang
8895bfba6a fix cosine_similarity (#18168)
Summary:
fixes #18057 according to colesbury 's suggestion. Thanks!
cc: ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18168

Differential Revision: D14520953

Pulled By: ailzhang

fbshipit-source-id: 970e6cfb482d857a81721ec1d0ee4a4df84a0450
2019-03-19 20:09:17 -07:00
Edward Yang
3fe7bdb2ff Fix lint in test_nn.py (#18006)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18006
ghimport-source-id: e267ece1ac03e0d17e01dddf4a77f52421859435

Stack:
* **#18006 Fix lint in test_nn.py**

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Reviewed By: eellison

Differential Revision: D14458108

fbshipit-source-id: 18ee6199447efed55a922cff5b3ad940a21c0536
2019-03-14 08:59:24 -07:00
bhushan
16e50c78e7 Report convolution size mismatch (#17436)
Summary:
1. Kernel size is larger than input
2. Expected output size to be less than zero

Test case added:
- invalid_conv1d
- Relevant test cases for conv2d and conv3d exists

Fixes #17247
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17436

Reviewed By: mrshenli

Differential Revision: D14354272

Pulled By: fmassa

fbshipit-source-id: 94b98621aa03b1f60d151ef9399ed3da55d41b42
2019-03-14 06:35:29 -07:00
Soumith Chintala
a478d41620 Fix nll_loss crash on cpu where ignore_index is out of bounds (#17328)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/15508
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17328

Differential Revision: D14322629

Pulled By: soumith

fbshipit-source-id: 7d02f372be78794782c18affcfc109ce30b1e91c
2019-03-05 14:35:05 -08:00
Tongzhou Wang
44a607b90c Fix autograd with buffers requiring grad in DataParallel (#13352)
Summary:
Causing a problem with spectral norm, although SN won't use that anymore after #13350 .
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13352

Differential Revision: D14209562

Pulled By: ezyang

fbshipit-source-id: f5e3183e1e7050ac5a66d203de6f8cf56e775134
2019-02-26 20:53:19 -08:00
vishwakftw
724c7e76c6 Fix reduction='none' in poisson_nll_loss (#17358)
Summary:
Changelog:
- Modify `if` to `elif` in reduction mode comparison
- Add error checking for reduction mode
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17358

Differential Revision: D14190523

Pulled By: zou3519

fbshipit-source-id: 2b734d284dc4c40679923606a1aa148e6a0abeb8
2019-02-25 10:35:33 -08:00
Tongzhou Wang
3d5968d366 Fix DataParallel(cpu_m).cuda() not working by checking at forward (#17363)
Summary:
Fixes #17362
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17363

Differential Revision: D14175151

Pulled By: soumith

fbshipit-source-id: 7b7e2335d553ed2133287deeaca3f6b6254aea4a
2019-02-22 08:31:36 -08:00
Natalia Gimelshein
5fa78303ed fix double backward for half softmax/logsoftmax (#17330)
Summary:
Fix for #17261, SsnL do you have tests for it in your other PR? If not, I'll add to this. Example from #17261 now does not error out (and same for log_softmax).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17330

Differential Revision: D14171529

Pulled By: soumith

fbshipit-source-id: ee925233feb1b44ef9f1d757db59ca3601aadef2
2019-02-21 14:58:45 -08:00
Soumith Chintala
c63af8837d remove nn.Upsample deprecation warnings from tests (#17352)
Differential Revision: D14168481

Pulled By: soumith

fbshipit-source-id: 63c37c5f04d2529abd4f42558a3d5e81993eecec
2019-02-21 11:27:24 -08:00
Tongzhou Wang
2b57bdb7ab Fix cuda softmax backward with empty input (#17259)
Summary:
Fixes #17256
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17259

Differential Revision: D14142196

Pulled By: soumith

fbshipit-source-id: 1f2dc202951b59b43da27684f9f924314bcd3040
2019-02-19 16:41:52 -08:00
Jie
594a4d7b55 at::native batch norm kernel launch config update (#17047)
Summary:
limit block dimension to avoid configuration error on batch norm kernel launch

This should resolve #16998
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17047

Differential Revision: D14142132

Pulled By: soumith

fbshipit-source-id: 9c8c52dcd1d108cda1f65f5227e625b8fe6e12a0
2019-02-19 16:41:51 -08:00
Sergei Nikolaev
6455d91e4d False alarm about leak in TestNN.test_variable_sequence_cuda (#17242)
Summary:
`TestNN.test_variable_sequence_cuda` sometimes brakes due to CUDA leak.
The cause appears to be too small tolerance breaking float16 sub-test of the test above.
When it breaks it calls abort disrupting correct tear down of the test
and false alarming about the leak.

~~Also, removed annoying **Upsample** module warning.
IMHO this warning is wrong because the module **Upsample** is not deprecated. Seems like it's been mixed
with `nn.functional.upsample` function which is indeed deprecated in favor of `nn.functional.interpolate`, see `torch/nn/functional.py:2387` for details (this replacement is also performed in `test_nn.py`).~~
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17242

Differential Revision: D14141686

Pulled By: soumith

fbshipit-source-id: faa8f87440d94bdc6ab0ff00be6dad82353115c4
2019-02-19 15:59:30 -08:00
Shen Li
472cfc0f2c Enforce module device at DataParallel construction time (#17129)
Summary:
closes #17065

CC douwekiela
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17129

Differential Revision: D14093353

Pulled By: mrshenli

fbshipit-source-id: 9a5a10f16e392337a7f7073223541cf69b402f82
2019-02-15 11:14:46 -08:00
wbydo
6c67dcfb05 Fix AdaptiveLogSoftmaxWithLoss's constructor (#16694)
Summary:
t-ken1 and I are members of a same team.
I have added test codes about the pull request https://github.com/pytorch/pytorch/pull/16656.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16694

Differential Revision: D14070106

Pulled By: ezyang

fbshipit-source-id: ff784dbf45e96a6bcf9a4b5cb9544a661a8acad2
2019-02-15 06:58:00 -08:00
Junjie Bai
9b7f3da74b Skip test_cudnn_multiple_threads_same_device on ROCm (flaky) (#17061)
Summary:
cc iotamudelta
https://ci.pytorch.org/jenkins/job/pytorch-builds/job/py2-clang7-rocmdeb-ubuntu16.04-test/10722//console
https://ci.pytorch.org/jenkins/job/pytorch-builds/job/py2-clang7-rocmdeb-ubuntu16.04-test/10710//console
https://ci.pytorch.org/jenkins/job/pytorch-builds/job/py2-clang7-rocmdeb-ubuntu16.04-test/10753//console
https://ci.pytorch.org/jenkins/job/pytorch-builds/job/py2-devtoolset7-rocmrpm-centos7.5-test/1756//console
```
19:07:18 ======================================================================
19:07:18 FAIL: test_cudnn_multiple_threads_same_device (test_nn.TestNN)
19:07:18 ----------------------------------------------------------------------
19:07:18 Traceback (most recent call last):
19:07:18   File "/var/lib/jenkins/workspace/test/test_nn.py", line 3905, in test_cudnn_multiple_threads_same_device
19:07:18     (2048 - test_iters) * (2048 - test_iters))
19:07:18   File "/var/lib/jenkins/workspace/test/common_utils.py", line 453, in assertEqual
19:07:18     super(TestCase, self).assertLessEqual(abs(x - y), prec, message)
19:07:18 AssertionError: 3794704.0 not less than or equal to 1e-05 :
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17061

Differential Revision: D14069324

Pulled By: bddppq

fbshipit-source-id: e33b09abca217a62a8b577f9c332ea22985ef4ff
2019-02-13 17:18:47 -08:00
Johannes M Dieterich
9d01be1a5a enable more unit tests in test_nn (#16994)
Summary:
These tests work with ROCm 2.1.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16994

Differential Revision: D14059802

Pulled By: bddppq

fbshipit-source-id: 8e2cbb13196c2e0283d3e02b7f761374bc580751
2019-02-12 17:58:44 -08:00
Thomas Viehmann
29f096cc70 optionally zero infinite losses in CTCLoss (#16199)
Summary:
Here is a stab at implementing an option to zero out infinite losses (and NaN gradients).
It might be nicer to move the zeroing to the respective kernels.
The default is currently `False` to mimic the old behaviour, but I'd be half inclined to set the default to `True`, because the behaviour wasn't consistent between CuDNN and Native anyways and the NaN gradients aren't terribly useful.

This topic seems to come up regularly, e.g. in  #14335
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16199

Differential Revision: D14020462

Pulled By: ezyang

fbshipit-source-id: 5ba8936c66ec6e61530aaf01175dc49f389ae428
2019-02-11 13:12:55 -08:00
Johannes M Dieterich
23e1c55cc0 enable unit tests working on ROCm 2.1 (#16871)
Summary:
This is the first round of enabling unit tests that work on ROCm 2.1 in my tests.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16871

Differential Revision: D13997662

Pulled By: bddppq

fbshipit-source-id: d909a3f7dd5fc8f85f126bf0613751c8e4ef949f
2019-02-09 00:30:50 -08:00
Asher Mancinelli
7078b2baf5 Better bounds checks in ctcloss (#16269)
Summary:
Adds better bounds checks for target lengths in CTC loss, checks for integral types for target and prediction lengths, and adds tests for each, according to #15946
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16269

Differential Revision: D13847567

Pulled By: ezyang

fbshipit-source-id: 5d7a975565e02baf78fe388813a1d1ef56dfb212
2019-02-01 08:02:54 -08:00
vishwakftw
34b43baeec Allow list and tuples to be passed as output_size to max_unpool1d (#16489)
Summary:
Changelog:
- Modify concantenation of [1] to a tuple by using cases for list and non-list types.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16489

Differential Revision: D13875838

Pulled By: soumith

fbshipit-source-id: fade65cc47385986b773b9bde9b4601ab93fe1cf
2019-01-30 11:00:34 -08:00
Lu Fang
b1b00f329e Fix the flake8 linter
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16549

Reviewed By: bddppq

Differential Revision: D13877435

Pulled By: houseroad

fbshipit-source-id: dbe575ba3f6dd30d27ac6aa5eec2eea025063540
2019-01-30 09:36:00 -08:00
Gregory Chanan
0cb24098c7 Handle non-contiguous inputs with mkldnn convolution. (#16300)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/16018.

Backwards appears to be fine because the derivative is written in terms of mkldnn_convolution itself.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16300

Differential Revision: D13797776

Pulled By: gchanan

fbshipit-source-id: 68a990b8a3c186412a99d176931314806c9ed7bf
2019-01-24 07:39:31 -08:00
Thomas Viehmann
b662a9b66a add back NNPACK in PyTorch (#15924)
Summary:
This tests the water for adding back NNPACK in PyTorch, it's a lot better than the fallback THNN versions.

In #6151, we (ezyang and soumith) removed NNPACK support from PyTorch. Of course Maratyszcza might have advice, too. (Or an opinion on the CMake changes.)

The only functional changes are to use NNPack more aggressively on mobile and a .contiguous() to match NNPack's assumption (I stumbled over that while using NNPack for style transfer.)
The CMake changes try to use the NNPack we already have in git.

In terms of lines of code this is a large part of the diff of https://lernapparat.de/pytorch-jit-android/ . As far as I can tell, we don't have MKLDNN on mobile and the native THNN implementation are prohibitively expensive in terms of both CPU and memory.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15924

Differential Revision: D13709576

Pulled By: ezyang

fbshipit-source-id: f2e287739909451c173abf046588209a7450ca2c
2019-01-18 15:34:35 -08:00
Richard Zou
ed0a761c82 Improve pack_sequence and pack_padded_sequence error message (#16084)
Summary:
Mention that if enforce_sorted=True, the user can set
enforce_sorted=False. This is a new flag that is probably hard to
discover unless one throughly reads the docs.

Fixes #15567
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16084

Differential Revision: D13701118

Pulled By: zou3519

fbshipit-source-id: c9aeb47ae9769d28b0051bcedb8f2f51a5a5c260
2019-01-18 07:58:54 -08:00
Egil Martinsson
d6a8dd9538 Cleanup gumbel_softmax (#13339)
Summary:
Fixes #12643, amends to #3341.

- Allow multidimensional input ~~(but apply softmax over `dim=-1`)~~ with `dim` argument
- Cleaner: Less lines of code
- Faster (1.32x speedup vs original, 2x speedup vs using `torch.Distributions`)
- Small fixes in docstring
- Remove some references in docstring. Was the linked (excellent) ipynb the first to do the straight-through trick? Instead, I propose changing to reference to the two papers most known for it.
- Add deprecationwarning for `eps`. It's not needed anymore.
- Initial commit keeps some code alternatives commented to exploit CI

- As of discussion when `gumbel_softmax` was added (#3341), this was merged into `torch.nn.functional` before all the work with `Distributions` and `Pyro`, and there will probably be multiple other best practices for this in the future.
I've tested building using the `Distributions`-api, but it was too slow, see below.

I therefore propose not using `Distributions` to keep it fast and simple, but adding a comment in docstring that `gumbel_softmax` may be deprecated in the future.

```
dist = torch.distributions.RelaxedOneHotCategorical(temperature=tau, logits=logits, validate_args=False)
y_soft = dist.rsample()
```

Pros:
* Built using tricks like `logsumexp` etc
* Explicitly uses `torch.distributions.utils._finfo` to avoid overflow (old implementation had an `eps` flag)
* Maintained for this exact purpose.

Cons:
* Very slow. Construction of distribution adds overhead see timings below. May be solved in future with speedups of `TransformedDistribution` and `Distribution`.
* Assumes which `dim` to apply softmax over.

```
    y_soft = logits.new(logits.shape)
    y_soft = (logits - y_soft.exponential_().log()) / tau  # Gumbel noise
    y_soft = y_soft.softmax(dim)  # Gumbel softmax noise
```
Pros:
* Faster

```
    import time
    start = time.time()
    num_draws = 1000000
    logits = torch.randn(1,3)

    for draw in range(num_draws):
        y_draw = gumbel_softmax(logits, hard=True)
        counts = counts + y_draw
    print(end - start)

>> 12.995795965194702

>> 7.658372640609741

>> 20.3382670879364
````

Decide on which path to chose. I'll commit in changes to the unit tests in a while to show that it passes both old tests and new tests. I'll also remove the commented code about `RelaxedOneHotCategorical`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13339

Differential Revision: D13092434

Pulled By: ezyang

fbshipit-source-id: 4c21788df336f4e9c2ac289022e395b261227b4b
2019-01-17 12:56:35 -08:00
Gregory Chanan
595f767880 Revert batched pdist, improve existing kernel, add test (#15901)
Summary:
1) Reverts https://github.com/pytorch/pytorch/pull/12302 which added support for batched pdist. Except I kept the (non-batched) test improvements that came with that PR, because they are nice to have.  Motivation: https://github.com/pytorch/pytorch/issues/15511
2) For the non-batched pdist, improved the existing kernel by forcing fp64 math and properly checking cuda launch errors
3) Added a 'large tensor' test that at least on my machine, fails on the batch pdist implementation.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15901

Reviewed By: ezyang

Differential Revision: D13616730

Pulled By: gchanan

fbshipit-source-id: 620d3f9b9acd492dc131bad9d2ff618d69fc2954
2019-01-17 10:44:43 -08:00
Natalia Gimelshein
461dc9a28b use all_weights instead of _parameters in _flat_weights in rnn (#15766)
Summary:
Fixes #15749
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15766

Differential Revision: D13592320

Pulled By: soumith

fbshipit-source-id: 6c3805f576c3df5a2da8bef1e4305eda379718df
2019-01-08 09:48:36 -08:00
Michael Carilli
e313f1a7bf Cudnn Handle Pool 3: At Wit's End (#15668)
Summary:
ezyang Here's a freshly rebased version of https://github.com/pytorch/pytorch/pull/15080 with the if statement that relieved the hangs that occasionally, nondeterministically, occurred on cudnnCreate on a particular windows build ([example w/debug statements](https://ci.pytorch.org/jenkins/job/pytorch-builds/job/pytorch-win-ws2016-cuda9-cudnn7-py3-test2/19238/console))  in https://github.com/pytorch/pytorch/pull/15280.

I'd like to run the CI over this several times before it's considered mergeable.  Sometimes the windows hang doesn't manifest for 2 or 3 consecutive trials.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15668

Differential Revision: D13579291

Pulled By: soumith

fbshipit-source-id: 3972eb98bad6ece933ca5e67a10fc4bc2ed06068
2019-01-04 06:28:21 -08:00
Ailing Zhang
78442f04fc Add mkldnn conv double backward (#15686)
Summary:
Fixes #15353 .

Like cudnn conv implementation, mkldnn also falls back to the default `_convolution_double_backward` as double backward.

This bug wasn't caught by CI before because mkldnn is only used when input scalar type is float, but our tests are all using double as default.

Adding test for float inputs, but mkldnn seems to have imprecision issues similar to cudnn implementation, so here I only check if double backward exists instead of calling `gradgradcheck`. Please correct me if the precision should actually be checked.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15686

Differential Revision: D13571682

Pulled By: ailzhang

fbshipit-source-id: f1762439762370f276cfd59e8b8b8a4dee960a4b
2019-01-03 10:50:00 -08:00
Will Feng
7b87ecae37 Move autograd metadata from VariableImpl to TensorImpl (#13827)
Summary:
Changes originally in this PR:
1. Move Variable::Impl data members into TensorImpl as `AutogradMeta` struct
2. Change Variable::Impl functions to use data members in `AutogradMeta` struct
3. Add `shallow_copy_and_detach()` function to each subclass of TensorImpl
4. Do shallow copy when the user calls `make_variable(tensor)` / `make_variable_view(tensor)` / `variable.set_data(tensor)` / `variable.detach()`

Changes moved from https://github.com/pytorch/pytorch/pull/13645:
1. Add a flag to Variable to disallow size/stride/storage_ptr changes from in-place operations such as `resize_` / `resize_as_` / `set_` / `transpose_`, and set this flag to true when people call `tensor.data` in Python.
2. Write text in the docs to actively discourage changing the shape or storage of `tensor_detached` and expecting `tensor` to also be updated.

This is the 1st+2nd PR mentioned in https://github.com/pytorch/pytorch/issues/13638.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13827

Differential Revision: D13507173

Pulled By: yf225

fbshipit-source-id: b177b08438d534a8197e34e1ad4a837e2db0ed6a
2018-12-26 16:34:24 -08:00
David Pollack
cdb8edce75 add from_pretrained method to EmbeddingBag (#15273)
Summary:
The `EmbeddingBag` module does not include a `from_pretrained` method like the `Embedding` module.  I added it for consistency between the two modules.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15273

Differential Revision: D13547842

Pulled By: soumith

fbshipit-source-id: 8ffde51ff0c1e8fc8310263b6f375da88089ff7d
2018-12-26 08:35:39 -08:00
Richard Zou
3353064060 Add option to automatically handle unsorted variable-length sequences in RNNs (#15225)
Summary:
Fixes #3584.

Motivation: manually sorting sequences, packing them, and then unsorting them
is something a lot of users have complained about doing, especially when we can
offer library support for them.

Overview: we internally sort sequences before packing them and store a list of
`unsorted_indices` that represent how to unsort the sequences inside
PackedSequence. The packing helper functions return PackedSequence with the
`permutation` field and the unpacking helper functions use it to unsort.

To implement this, the following changes were made:
- PackedSequence now keeps `sorted_indices` and `unsorted_indices`.
  These two can be thought of as permutations and are inverses of each other.
  `sorted_indices` is how the sequences were sorted; `unsorted_indices` is how
  to unsort the sequences.
- Added an `enforce_sorted` argument to pack_sequence and pack_padded_sequence
  that maintains the legacy behavior of error-ing out on unsorted-sequences.
  When `enforce_sorted=True`, these functions maintain their ONNX exportability.
- pack_sequence(sequences, enforce_sorted) takes in unsorted sequences.
- pack_padded_sequence can take in a padded tensor that represents padded,
  unsorted sequences.
- pad_packed_sequence unsorts the PackedSequence such that it is still the
  inverse operation of packed_padded_sequence.
- RNNs apply `sort_indices` to their input hidden state and apply
  `unsort_indices` to their output hidden state. This is to ensure that the
  hidden state batches correspond to the user's ordering of input sequences.

NOT BC-Breaking
- The default for pack_sequence and pack_padded_sequence is
  `enforce_sorted=True` to avoid breaking ONNX export. To use the new
  functionality, pass in `enforce_sorted=False`

Testing Plan
- Modified TestNN.test_pack_sequence, TestNN.test_packed_padded_sequence,
  and TestNN.test_variable_sequence (RNN test) to check the behavior
  of unsorted sequences, sorted sequences, and sorted sequences with
  enforce_sorted=True
- test/test_jit.py has a test to see if RNNs are exportable with
  enforce_sorted=True

cc colesbury
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15225

Reviewed By: soumith

Differential Revision: D13507138

Pulled By: zou3519

fbshipit-source-id: b871dccd6abefffca81bc4e3efef1873faa242ef
2018-12-20 17:37:18 -08:00
Gao, Xiang
a47749cb28 Add at::one_hot (#15208)
Summary: Closes: https://github.com/pytorch/pytorch/issues/15060

Differential Revision: D13528014

Pulled By: ezyang

fbshipit-source-id: 5a18689a4c5638d92f9390c91517f741e5396293
2018-12-20 14:24:58 -08:00
Erik Brinkman
8db44eda01 Add support for batched pdist (#12302)
Summary:
This updates pdist to work for batched inputs, and updates the
documentation to reflect issues raised.

closes #9406
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12302

Reviewed By: ezyang

Differential Revision: D13528485

Pulled By: erikbrinkman

fbshipit-source-id: 63d93a6e1cc95b483fb58e9ff021758b341cd4de
2018-12-20 09:41:08 -08:00
Peter Goldsborough
aec9fdf0a4 Fix _apply in nn.Module (#15305)
Summary:
Fixes an issue that arose from https://github.com/pytorch/pytorch/pull/13481 where `.shared_memory()` couldn't be called. Effectively undoes all changes to `nn.Module` from that PR and solve the relevant problem in a different way (the goal was to be able to call `._apply()` on the Python wrapper for a C++ module).

soumith
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15305

Differential Revision: D13493937

Pulled By: goldsborough

fbshipit-source-id: 4cb8687f90fc8709a536c5e7eacd0dc8edf6f750
2018-12-17 16:22:21 -08:00
David Riazati
59d71b9664 Bicubic interpolation for nn.functional.interpolate (#9849)
Summary:
Addresses #918, interpolation results should be similar to tf

* Adds bicubic interpolation operator to `nn.functional.interpolate`
* Corresponding test in `test_nn.py`

The operator is added in legacy `TH` to be aligned with the other upsampling operators; they can be refactored/moved to ATen all at once when #10482 is resolved
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9849

Differential Revision: D9007525

Pulled By: driazati

fbshipit-source-id: 93ef49a34ce4e5ffd4bda94cd9a6ddc939f0a4cc
2018-12-17 15:31:48 -08:00
Edward Yang
dcd1685282 Revert D13440858: [pytorch][PR] Use a pool of per-thread cudnn handles for each device, updated
Differential Revision:
D13440858

Original commit changeset: 1c6af5c53538

fbshipit-source-id: fda42ea75000d4a4e9c4a8eeaaa5518f7ad9c298
2018-12-14 14:35:01 -08:00
Chaitanya Sri Krishna Lolla
9f1d8f2eeb enabled tests in test_nn, test_cuda and test_sparse (#15232)
Summary:
tests work on ROCm 1.9.2 as present on CI (fp16 bringup, hipMemset and sparse improvements)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15232

Differential Revision: D13470991

Pulled By: bddppq

fbshipit-source-id: 45acc4f9ea5baaaf7672b86eb022948055779925
2018-12-14 14:27:57 -08:00
Michael Carilli
ca4358c8f5 Use a pool of per-thread cudnn handles for each device, updated (#15080)
Summary:
Rebased version of https://github.com/pytorch/pytorch/pull/14861, hopefully addressing ezyang's comments.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15080

Differential Revision: D13440858

Pulled By: ezyang

fbshipit-source-id: 1c6af5c53538b81c6b92cf1dda231ed333f28035
2018-12-13 10:24:06 -08:00
Johannes M Dieterich
6610ace28b use ROCm 1.9.2 fp16 capabilities in rocBLAS and MIOpen interfaces (#14994)
Summary:
* relax MIOpen if statement to allow fp16/fp32 mixed precision training now supported by ROCm 1.9.2
* use gemm_ex API of rocBLAS in ROCm 1.9.2 instead of the previous hgemm API
* with this: enable all but one half test in test_nn

While there, fix also:
* a group convolution issue w/ MIOpen pertaining to initializing MIOpen on multi-GPU systems properly we detected while working on this
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14994

Differential Revision: D13439869

Pulled By: bddppq

fbshipit-source-id: 75e4eb51a59488882e64b5eabdc30555b25be25e
2018-12-12 16:16:47 -08:00
Natalia Gimelshein
27d5ae7afb use datatype dependent tolerance in data parallel tests
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14856

Differential Revision: D13413560

Pulled By: soumith

fbshipit-source-id: b3a0cfe93477ed332e6eaa2e39ef5f4cc8b36481
2018-12-10 22:50:27 -08:00
Johannes M Dieterich
52942e1f09 Enable unit tests known to work on ROCm (#14011)
Summary:
* Enable unit tests known to work on ROCm.
* Disable a few that are known to be flaky for the time being.
* Use std::abs for Half
* No more special casing for ROCm in TensorMathReduce
* Document an important detail for a hardcoded block size w.r.t. ROCm in TensorMathReduce

ezyang bddppq for awareness
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14011

Differential Revision: D13387679

Pulled By: bddppq

fbshipit-source-id: 4177f2a57b09d866ccbb82a24318f273e3292f71
2018-12-07 18:57:32 -08:00
Ailing Zhang
ef91cfd68b Add new reduction mode in kl_div (#14457)
Summary:
Fixes #6622 .
We used to average over all elements for kl divergence, which is not aligned with its math definition.
This PR corrects the default reduction behavior of KL divergence that it now naverages over batch dimension.

- In KL, default behavior `reduction=mean` averages over batch dimension. While for most other loss functions, `reduction=mean` averages over all elements.
- We used to support scalar tensor as well. For BC purpose, we still support it, no reduction is performed on scalar tensor.
- Added a new reduction mode called `batchmean` which has the correct behavior for KL. Add a warning to make `batchmean` as default for KL instead of `mean` in next major release.
- [deprecated]I chose to not add a new reduction option, since "mean over batch dimension" is kinda special, and it only makes sense in few cases like KL. We don't want to explain why there's a option "batchmean" but it's not applicable for all other functions. I'm open to discussion on this one, as I cannot think of a perfect solution for this.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14457

Differential Revision: D13236016

Pulled By: ailzhang

fbshipit-source-id: 905cc7b3bfc35a11d7cf098b1ebc382170a087a7
2018-12-04 12:24:28 -08:00
David Riazati
814b5715ba Move module tests to common_nn (#14578)
Summary:
This moves `new_module_tests` from `test_nn.py` to `common_nn.py` so
that they can be used in `test_jit.py` without running any of
`test_nn.py`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14578

Differential Revision: D13268286

Pulled By: driazati

fbshipit-source-id: 6e8654a4c29ab754d656ac83820c14d1c1843e03
2018-11-30 12:14:59 -08:00
David Riazati
1f6d9f44fc Add InstanceNorm, Distance modules to Script
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14551

Differential Revision: D13272741

Pulled By: driazati

fbshipit-source-id: 3e4fe870d0e268903757f3ae8a56100606906bce
2018-11-29 22:18:55 -08:00
David Riazati
67e3905bc6 Revert D13268293: [pytorch][PR] [jit] Add InstanceNorm, Distance modules to Script
Differential Revision:
D13268293

Original commit changeset: cb33c6dcdadd

fbshipit-source-id: 214a29b74c85b7b25df0eb48e3fdb81539049130
2018-11-29 19:19:35 -08:00
David Riazati
75eccffdfe Add InstanceNorm, Distance modules to Script
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14551

Differential Revision: D13268293

Pulled By: driazati

fbshipit-source-id: cb33c6dcdaddf8c7a49b3535894d77bf5d771ddd
2018-11-29 17:26:29 -08:00
David Riazati
666d383a00 Add broadcast list default arg support (#14361)
Summary:
To convert `max_unpool` functions to weak script, this PR adds support
for `T` as default arguments for `BroadcastingListN[T]`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14361

Differential Revision: D13192231

Pulled By: driazati

fbshipit-source-id: a25b75a0e88ba3dfa22d6a83775e9778d735e249
2018-11-29 15:15:47 -08:00
David Riazati
9e93a02624 Use nn module tests in test_jit (#14238)
Summary:
This PR adds weak modules for all activation modules and uses `test_nn` module tests to test weak modules that have been annotated with `weak_module` and therefore are in `torch._jit_internal._weak_types`

Also depends on #14379
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14238

Differential Revision: D13252887

Pulled By: driazati

fbshipit-source-id: e9638cf74089884a32b8f0f38396cf432c02c988
2018-11-28 23:31:25 -08:00
David Riazati
3d98810fbd Revert D13192230: [pytorch][PR] [jit] Use nn module tests in test_jit
Differential Revision:
D13192230

Original commit changeset: 36488960b6c9

fbshipit-source-id: 63b68bd909b9ef0548f52c986c84f549aecb8909
2018-11-28 00:23:09 -08:00
David Riazati
4cdcbbf410 Use nn module tests in test_jit (#14238)
Summary:
This PR adds weak modules for all activation modules and uses `test_nn` module tests to test weak modules that have been annotated with `weak_module` and therefore are in `torch._jit_internal._weak_types`

Also depends on #14379
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14238

Differential Revision: D13192230

Pulled By: driazati

fbshipit-source-id: 36488960b6c91448b38c0fa65422539a93af8c5e
2018-11-27 21:19:51 -08:00
Jan Schlüter
c19af59a6e Use integer math to compute output size of pooling operations (#14405)
Summary:
As reported in #13386, the pooling operations can return wrong results for large inputs. The root of the problem is that while the output shape is initially being computed with integer operations, it is converted to float32 for division by the stride and applying either a `ceil` or a `floor` depending on the `ceil_mode`. Since even moderately large integers (the smallest being 16,777,217) cannot be expressed exactly in float32, this leads to wrong result shapes.

This PR relies purely on integer operations to perform the shape computation, including the ceil/floor distinction. Since I could not stand all that duplicated code, I pulled it out into a `pooling_shape.h` header, similar to the existing `linear_upsampling.h` header. I hope this is acceptable, let me know if you'd like to see it solved differently. I've also added tests to `test_nn.py` that fail without my changes and pass with my changes. They cover `{max,avg}_pool{1,2,3}d()` for CPU and GPU.

Fixes #13386.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14405

Differential Revision: D13215260

Pulled By: soumith

fbshipit-source-id: 802588ce6cba8db6c346448c3b3c0dac14d12b2d
2018-11-27 09:38:06 -08:00
Brian Vaughan
e4bb56570c Preemptively test for out-of-order length. (#13933)
Summary:
torch.nn.utils.rnn.pack_padded_sequence segment fault if not in
decreasing order #13324

We were seeing this segfault on throw, pre-emptively checking avoids
this:

*** Error in `/home/bvaughan/anaconda3/bin/python': double free or corruption (!prev): 0x00005555566e7510 ***
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13933

Differential Revision: D13090389

Pulled By: nairbv

fbshipit-source-id: 6f6b319e74cb55830be799e9c46bc33aa59256d8
2018-11-16 08:39:05 -08:00
lyuwenyu
1b1cdd944c Keep ModuleList consistent with python list in __setitem__ function. (#13102)
Summary:
`ModuleList` class function `__setitem__` has implicit rist
```
In [26]: mlist = nn.ModuleList([nn.ReLU(), nn.Conv2d(10, 10, 3, 1)])

In [27]: mlist
Out[27]:
ModuleList(
  (0): ReLU()
  (1): Conv2d(10, 10, kernel_size=(3, 3), stride=(1, 1))
)

In [28]: mlist[-1] = nn.ReLU()

In [29]: mlist
Out[29]:
ModuleList(
  (0): ReLU()
  (1): Conv2d(10, 10, kernel_size=(3, 3), stride=(1, 1))
  (-1): ReLU()
)

In [30]: mlist[-1]
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-30-229d1b6823a0> in <module>()
----> 1 mlist[-1]

~/anaconda3/lib/python3.6/site-packages/torch/nn/modules/container.py in __getitem__(self, idx)
    134             return ModuleList(list(self._modules.values())[idx])
    135         else:
--> 136             return self._modules[self._get_abs_string_index(idx)]
    137
    138     def __setitem__(self, idx, module):

KeyError: '2'

```

modified as
```
    def __setitem__(self, idx, module):
        idx = self._get_abs_string_index(idx)
        return setattr(self, str(idx), module)
```
to fix it.

```
In [31]: class NewModuleList(nn.ModuleList):
    ...:     def __setitem__(self, idx, module):
    ...:         idx = self._get_abs_string_index(idx)
    ...:         return setattr(self, str(idx), module)
    ...:

In [32]: mlist = NewModuleList([nn.ReLU(), nn.Conv2d(10, 10, 2, 1)])

In [33]: mlist[-1] = nn.ReLU()

In [34]: mlist
Out[34]:
NewModuleList(
  (0): ReLU()
  (1): ReLU()
)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13102

Differential Revision: D13092480

Pulled By: ezyang

fbshipit-source-id: 7ff7688f66e44bbd263a10d2d09db7bb0df4b749
2018-11-16 07:39:26 -08:00
Gregory Chanan
02152c515e Ensure nn Losses check scalar vs non-scalar values.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13860

Reviewed By: ezyang

Differential Revision: D13029364

Pulled By: gchanan

fbshipit-source-id: 20f1330fa181e52aea1f879dc655a9a6f62b5f53
2018-11-14 16:46:27 -08:00
Gregory Chanan
4341dd2753 Move most sccalar checks from nn.yaml into THNN/THCUNN code. (#13906)
Summary:
This includes everything in nn.yaml except for convolutions, multi_margin_loss, multi_label_margin_loss, nll_loss, and nll_loss2d.

Note that scalar_check False just means we don't do any extra scalar checks (we could elide this from the generated code, which I may do in a later commit).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13906

Reviewed By: ezyang

Differential Revision: D13044507

Pulled By: gchanan

fbshipit-source-id: ebd3bdca2bcf512ca44de1ce3be81946f6c0828e
2018-11-14 07:58:35 -08:00
Sam Gross
c46dd5163f Temporarily disable part of test_spectral_norm (#13908)
Summary:
See #13818 for suggestions about a long-term fix
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13908

Differential Revision: D13047262

Pulled By: colesbury

fbshipit-source-id: 0f29bd5b659bb97826381abbc305fb8a25b131ed
2018-11-13 14:19:16 -08:00
Ailing Zhang
a17c0118a5 fix stability in bce with pos_weight formula (#13863)
Summary:
Fixes #13773
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13863

Differential Revision: D13031803

Pulled By: ailzhang

fbshipit-source-id: 6c9e044f0450eebf4555bbc02c125713d9378e2f
2018-11-12 22:04:24 -08:00
Johannes M Dieterich
ce48958606 enable more unit tests (#13166)
Summary:
This enables the distributions and utils test sets for ROCm.
Individual tests are enabled that now pass due to fixes in HIP/HCC/libraries versions in white rabbit.

For attention: bddppq ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13166

Differential Revision: D12814759

Pulled By: bddppq

fbshipit-source-id: ea70e775c707d7a8d2776fede6154a755adef43e
2018-11-12 18:49:52 -08:00
Thomas Viehmann
14004cbef6 Native batch norm (#13263)
Summary:
- Move batch norm from TH(CU)NN to native
- Speedups in many cases (e.g. #12006) for CUDA due to new block/grid layout and Welford-type mean/variance calculations (the latter for training mode)
- It splits the forward kernel in two pieces and reuses the evaluation kernel for the transformation.
- We change the meaning of save_mean and save_invstd (aka save_var) to accscalar to maintain reasonable precision.

Compared to the ill-fated #12368
- I changed the CPU kernel to not call `.sum()` from within parallel for. This seemed to have caused the breakage (NaN-results) in TestModels.test_dcgan_netG (thank you houseroad for the repro, errors in assessment of the fix are my own)
- I updated the Half->Float upcasting in tensors to go through `t.type().scalarType()` instead of `t.dtype()`.
- I have merged master
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13263

Differential Revision: D12946254

Pulled By: SsnL

fbshipit-source-id: 3bb717ee250fbccaf10afe73722996aa4713d10d
2018-11-06 20:05:54 -08:00
Tongzhou Wang
2cd912bcc2 Fix more spectral norm bugs (#13350)
Summary:
Problems with SN and DP after #12671 :
1. in eval mode, `weight_orig` is not getting correct gradient #12737 .

    Fix: keep `v` vector around as a buffer and always calculate `W = W_orig / (u @ W_orig @ v)` even in eval.

2. in training mode, the `weight` buffer of the parallelized module is never updated, if someone touches `weight_orig` and/or `weight` and makes them not sharing storage. So in `eval` the weight used is wrong.

    Fix: Make `weight` not a buffer anymore and always calculate it as above.

3. #12671 changed SN to update `u` in-place to make DP work correctly, but then it breaks backward through two forwards (e.g., the common GAN loss `D(real) - D(fake)`) because the vectors needed to backprop the 1st forward is changed in the 2nd forward.

    Fix: This PR clones `u` and `v` before using them.

To maintain BC, I added a hook interface for producing and loading state_dict. This is ugly and we should really have better interface for spectral_norm. But for the purpose to fix this issue, I make this patch. Even if we have a better interface, BC mechanism for legacy loading legacy state_dict still needs to be done.

cc The controller you requested could not be found. crcrpar
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13350

Differential Revision: D12931044

Pulled By: SsnL

fbshipit-source-id: 8be6f934eaa62414d76d2c644dedd7e1b7eb31ef
2018-11-06 19:16:13 -08:00
Soumith Chintala
a7ee632dff Various Test and build fixes (#13556)
Summary:
- fixes weights-contiguous requirement for THCUNN Convolutions
- Add tests that conv backward pass works for non-contiguous weights
- fix RNN tests / error messages to be consistent and pass
- relax weight grad precision for fp16 for a particular test
- fix regression of CMAKE_PREFIX_PATH not passing through
- add missing skipIfNoLapack annotations where needed

Differential Revision: D12918456

Pulled By: soumith

fbshipit-source-id: 8642d36bffcc6f2957800d6afa1e10bef2a91d05
2018-11-06 07:13:47 -08:00
Sam Gross
98f5c005da Speed up CPU threshold and relu implementation (#13182)
Summary:
```
The previous threshold implementation was not vectorized or parallelized.
This speeds up ResNet-50 CPU inference [1] from ~88 ms to ~67 ms

CPU timings:
https://gist.github.com/colesbury/d0d1be6974841d62696dbde329a8fde8

1 thread (before vs. after)
10240:  17.4 us vs. 6.9 µs per loop
102400: 141 us vs. 39.8 µs per loop

16 threads (before vs. after)
10240:  17.4 us vs. 6.7 µs per loop
102400: 141 us vs. 14.3 µs per loop

CUDA timings are not measurably different.

[1]: compiled with MKL-DNN, 8 threads, batch norm merged into convolutions
https://gist.github.com/colesbury/8a64897dae97558b3b82da665048c782
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13182

Reviewed By: soumith

Differential Revision: D12825105

Pulled By: colesbury

fbshipit-source-id: 557da608ebb87db8a04adbb0d2882af4f2eb3c15
2018-11-05 12:51:29 -08:00
Tongzhou Wang
9f2b2cac37 Fix handling all empty bags in CUDA embedding bag (#13483)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/11847
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13483

Differential Revision: D12902914

Pulled By: SsnL

fbshipit-source-id: 577a53e815231e988da716b1ee5667e1f36408ca
2018-11-02 10:21:14 -07:00
Tongzhou Wang
99a5d19591 Rename elementwise_mean to mean (#13419)
Summary:
Closes #12459
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13419

Differential Revision: D12883299

Pulled By: SsnL

fbshipit-source-id: 8b4512ff73b66fdc674412904dbb3bf497ba70a7
2018-11-01 10:31:26 -07:00
Ailing Zhang
488d393ea6 Fix pointwise loss broadcast (#12996)
Summary: Fixes #12129 , #12327

Differential Revision: D10513781

Pulled By: ailzhang

fbshipit-source-id: a210008a39ff6c3f056c9fbe3f0576cfcce638ec
2018-10-31 10:17:25 -07:00
Lu Fang
f8864f0505 Revert "Move batch_norm to ATen/native, speed up (#12368)" (#13191)
Summary:
Revert #12368 since it's causing onnx related test cases failing.
https://github.com/pytorch/pytorch/pull/12368

SsnL The controller you requested could not be found.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13191

Reviewed By: BIT-silence

Differential Revision: D12810778

Pulled By: houseroad

fbshipit-source-id: 1c373b92628580097cffcd237dccc5b3d8697577
2018-10-26 23:05:50 -07:00
Zachary DeVito
dae7616078 Shard all of tests based on how many tests exist. (#13160)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13160

Reduces pytorch_core build from 2 hours to 30 minutes

Reviewed By: soumith, dzhulgakov

Differential Revision: D10524261

fbshipit-source-id: 97270ac73404b5ea4c264cd0e9d8d4b1be79b0e9
2018-10-26 18:20:34 -07:00
Thomas Viehmann
dc211c7de4 Move batch_norm to ATen/native, speed up (#12368)
Summary:
- Speed up the case of #12006 in the forward
- The backward still isn't as fast as one might hope (factor 2-3 in the #12006 case).
- More extensive benchmarking shows not so great performance compared
  to CuDNN for cases with many channels, e.g. bs=8-128 / c=1024 / f=1024.
- We change the meaning of save_mean and save_invstd (aka save_var) to accscalar to
  maintain reasonable precision.

Needless to say that I would happily separate the TensorAccessor fixes in a separate PR, as they're fixes and unrelated.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12368

Differential Revision: D10559696

Pulled By: SsnL

fbshipit-source-id: f0d0d1e0912e17b15b8fb7a2c03d0fe757598419
2018-10-25 23:41:10 -07:00
Richard Zou
7863c17b26 Fix convtranspose3d output_size calculation (#12952)
Summary:
Closes #2119.

There was a small bug where the output_size got sliced with `[-2:]`
where we really meant to slice it as `[2:]` (to remove the batch and
channel dimensions).

Added a new test for this.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12952

Differential Revision: D10510678

Pulled By: zou3519

fbshipit-source-id: 4c04a5007fc6d002e1806d6fe981b43d33d6a4f2
2018-10-24 09:23:05 -07:00
Soumith Chintala
cf235e0894 fix lint after new flake8 release added new style constraints (#13047)
Summary:
fix lint after new flake8 release added new style constraints
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13047

Differential Revision: D10527804

Pulled By: soumith

fbshipit-source-id: 6f4d02662570b6339f69117b61037c8394b0bbd8
2018-10-24 09:03:38 -07:00
Wei Yang
710191e292 fix error message of large kernel size in conv2D (#12791)
Summary:
- fix #12565
- test plan:
with this fix, we have:
```
>>> m = nn.Conv2d(in_channels=3, out_channels=33, kernel_size=10, stride=1, bias=True)
>>> input = torch.randn(1, 3, 1, 1)
>>> output = m(input)
```
RuntimeError: Calculated padded input size per channel: (1 x 1). Kernel size: (10 x 10). Kernel size can't be greater than actual input size at ~/pytorch/aten/src/THNN/generic/SpatialConvolutionMM.c:50

not sure why these are `int` instead of `int64_t`:
5ccdd7a626/aten/src/THNN/generic/SpatialConvolutionMM.c (L10)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12791

Differential Revision: D10443045

Pulled By: weiyangfb

fbshipit-source-id: 2620acb40bdd49d29cec06337f6dfb4653d1987c
2018-10-18 00:51:16 -07:00
James Sun
f4944f0f8a Rename test/common.py to test/common_utils.py (#12794)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12794

common.py is used in base_module for almost all tests in test/. The
name of this file is so common that can easily conflict with other dependencies
if they happen to have another common.py in the base module. Rename the file to
avoid conflict.

Reviewed By: orionr

Differential Revision: D10438204

fbshipit-source-id: 6a996c14980722330be0a9fd3a54c20af4b3d380
2018-10-17 23:04:29 -07:00
Thomas Viehmann
ba25e13782 Forbid Module.to with copy argument. (#12617)
Summary:
Module.to uses the Tensor.to parsing facility.
It should not, however, accept "copy" as a keyword/fourth positional
argument.

See #12571 for discussion.

Thank you SsnL for noticing.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12617

Differential Revision: D10392053

Pulled By: ezyang

fbshipit-source-id: b67a5def7993189b4b47193abc7b741b7d07512c
2018-10-16 20:31:44 -07:00
Tongzhou Wang
ac994f2c78 Fix SpectralNorm with DataParallel (#12671)
Summary:
There were two problems with SN + DP:

1. In SN, the updated _u vector is saved back to module via a `setattr`. However, in DP, everything is run on a replica, so those updates are lost.
2. In DP, the buffers are broadcast via a `broadcast_coalesced`, so on replicas they are all views. Therefore, the `detach_` call won't work.

Fixes are:
1. Update _u vector in-place so, by the shared storage between 1st replica and the parallelized module, the update is retained
2. Do not call `detach_`.
3. Added comments in SN about the subtlety.
4. Added a note to the DP doc on this particular behavior of DP.

cc crcrpar taesung89 The controller you requested could not be found. yaoshengfu

Fixes https://github.com/pytorch/pytorch/issues/11476
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12671

Differential Revision: D10410232

Pulled By: SsnL

fbshipit-source-id: c447951844a30366d8c196bf9436340e88f3b6d9
2018-10-16 16:02:17 -07:00
Ailing Zhang
e15501fb68 fix bce_with_logits with legacy reduce (#12689)
Summary:
Fix #12624 . internal usecase of legacy `reduce`.
Add test in test_nn
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12689

Reviewed By: ezyang

Differential Revision: D10391195

Pulled By: ailzhang

fbshipit-source-id: 1af2b258c4abb2b6527eaaeac63e8bf1762c66a1
2018-10-16 09:46:58 -07:00
Natalia Gimelshein
a98958d3bd dtype option for softmax (#11719)
Summary:
Add dtype argument to softmax/log_softmax functions.
Computing softmax in fp32 precision is necessary for mixed precision training, and converting output of the previous layer into fp32 and then reading it as fp32 in softmax is expensive, memory and perf-wise, this PR allows one to avoid it.
For most input data/dtype combinations, input data is converted to dtype and then softmax is computed. If input data is half type and dtype is fp32, kernels with the corresponding template arguments are called.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11719

Reviewed By: ezyang

Differential Revision: D10175514

Pulled By: zou3519

fbshipit-source-id: 06d285af91a0b659932236d41ad63b787eeed243
2018-10-13 17:57:10 -07:00
Tongzhou Wang
d400502b1d Fix a bunch of warnings in TestNN
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12453

Differential Revision: D10244130

Pulled By: SsnL

fbshipit-source-id: e425c76bfb721fe118a32ddd1fa6eca3a3cd86f0
2018-10-08 17:38:23 -07:00
daquexian
f8086845aa Fix bug in grad.py when conv bias != None (#12281)
Summary:
Obviously, the grads of conv weight and conv input are not relevant to the bias, but the original `convXd_input` and `convXd_weight` methods receive a `bias` parameter. What's more, while the doc says `bias` should have the shape `(out_channels,)`, one will get a `RuntimeError` if the bias != None and in_channels != out_channels, for the weight of transposed conv has the shape `(in_channels, out_channels, kH, kW)` while the weight of vanilla conv has the shape `(out_channels, in_channels, kH, kW)`
```
RuntimeError: Given transposed=1, weight of size [channel1, channel2, kH, kW], expected bias to be 1-dimensional with channel2 elements, but got bias of size [channel1] instead
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12281

Differential Revision: D10217370

Pulled By: ezyang

fbshipit-source-id: bc00b439e5ae539276a5e678bdb92af700197bb2
2018-10-05 12:55:14 -07:00
Johannes M Dieterich
c9f7d7b506 mark unit tests as working, skip failing unit test (#12313)
Summary:
* enabled fp16 tests for test_torch

* enable fp16 tests for test_nn

* enabled multilabelmargin loss for fp16

* removed skip for test_pdist_empty_col

* Enable test_nn tests that pass with compiler fixes etc.

* Enable test_legacy_nn tests that pass with compiler fixes etc.

ezyang bddppq
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12313

Differential Revision: D10189922

Pulled By: bddppq

fbshipit-source-id: a5592817c04b14e355cb062d42ebea406f0c92b6
2018-10-03 23:56:26 -07:00
iotamudelta
a2ebbccc9f fix unit tests on CI
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12187

Differential Revision: D10118483

Pulled By: bddppq

fbshipit-source-id: 986c8fb48d61e00103c713548a50e74489a0e442
2018-09-28 23:11:55 -07:00
Wei Yang
de11fe0c83 migrate PReLU to ATen (#11758)
Summary:
- fixes https://github.com/pytorch/pytorch/issues/10723
- migrate PReLU to ATen and deprecate legacy PReLU
- performance:

CPU with weight.numel() = 1
```
>>> m = nn.PReLU()
>>> x = torch.randn(100, 100, 100, requires_grad=True)
>>> %timeit -r 100 y = m(x)
100 loops, best of 100: 9.43 ms per loop

>>> y = m(x).sum()
>>> %timeit -r 100 y.backward(retain_graph=True)
10 loops, best of 100: 24.4 ms per loop

>>> m = nn.PReLU()
>>> x = torch.randn(100, 100, 100, requires_grad=True)
>>> %timeit -r 100 y = m(x)
1000 loops, best of 100: 695 µs per loop

>>> y = m(x).sum()
>>> %timeit -r 100 y.backward(retain_graph=True)
100 loops, best of 100: 2.47 ms per loop
```

CPU with weight.numel() = channels
```
>>> m = nn.PReLU(100)
>>> x = torch.randn(100, 100, 100, requires_grad=True)
>>> %timeit -r 100 y = m(x)
1000 loops, best of 100: 603 µs per loop

>>> y = m(x).sum()
>>> %timeit -r 100 y.backward(retain_graph=True)
100 loops, best of 100: 13.3 ms per loop

>>> m = nn.PReLU(100)
>>> x = torch.randn(100, 100, 100, requires_grad=True)
>>> %timeit -r 100 y = m(x)
1000 loops, best of 100: 655 µs per loop

>>> y = m(x).sum()
>>> %timeit -r 100 y.backward(retain_graph=True)
100 loops, best of 100: 2.45 ms per loop
```

CUDA with weight.numel() = 1
```
>>> m = nn.PReLU().cuda()
>>> x = torch.randn(100, 100, 100, requires_grad=True).cuda()
>>> %timeit -r 100 torch.cuda.synchronize(); y = m(x); torch.cuda.synchronize();
10000 loops, best of 100: 187 µs per loop

>>> y = m(x).sum()
>>> %timeit -r 100 torch.cuda.synchronize(); y.backward(retain_graph=True); torch.cuda.synchronize();
100 loops, best of 100: 2.01 ms per loop

>>> m = nn.PReLU().cuda()
>>> x = torch.randn(100, 100, 100, requires_grad=True).cuda()
>>> %timeit -r 100 torch.cuda.synchronize(); y = m(x); torch.cuda.synchronize();
1000 loops, best of 100: 195 µs per loop

>>> y = m(x).sum()
>>> %timeit -r 100 torch.cuda.synchronize(); y.backward(retain_graph=True); torch.cuda.synchronize();
100 loops, best of 100: 2.28 ms per loop
```

CUDA with weight.numel() = channel
```
>>> m = nn.PReLU(100).cuda()
>>> x = torch.randn(100, 100, 100, requires_grad=True).cuda()
>>> %timeit -r 100 torch.cuda.synchronize(); y = m(x); torch.cuda.synchronize();
1000 loops, best of 100: 174 µs per loop

>>> y = m(x).sum()
>>> %timeit -r 100 torch.cuda.synchronize(); y.backward(retain_graph=True); torch.cuda.synchronize();
100 loops, best of 100: 2.27 ms per loop

>>> m = nn.PReLU(100).cuda()
>>> x = torch.randn(100, 100, 100, requires_grad=True).cuda()
>>> %timeit -r 100 torch.cuda.synchronize(); y = m(x); torch.cuda.synchronize();
10000 loops, best of 100: 181 µs per loop

>>> y = m(x).sum()
>>> %timeit -r 100 torch.cuda.synchronize(); y.backward(retain_graph=True); torch.cuda.synchronize();
100 loops, best of 100: 2.26 ms per loop
```

The huge performance regression in CPU when weight.numel() = 1 is addressed by replacing at::CPU_tensor_apply* with parallelized kernels.

ezyang SsnL zou3519  soumith
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11758

Differential Revision: D9995799

Pulled By: weiyangfb

fbshipit-source-id: d289937c78075f46a54dafbde92fab0cc4b5b86e
2018-09-21 16:26:04 -07:00
Thomas Viehmann
775358e4c2 Add non-legacy test of bilinear (#11935)
Summary:
Fixes: #11905
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11935

Differential Revision: D9991120

Pulled By: soumith

fbshipit-source-id: b00ad4f405440664ae5228b229a2ba0a5d3d92f6
2018-09-21 12:43:35 -07:00
Christian Puhrsch
d8f6be686d Remove torch/legacy (#11823)
Summary:
Largely unused and hinders current development
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11823

Differential Revision: D9925094

Pulled By: cpuhrsch

fbshipit-source-id: c797f62180e2128f9a567b0c57c8347957470ea5
2018-09-20 14:00:54 -07:00