Summary:
Import MultiheadAttention into the core PyTorch framework.
Users can now import MultiheadAttention directly from torch.nn.
See "Attention Is All You Need" for more details on the multi-head attention mechanism.
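A minimal usage sketch of the new module (shapes are illustrative; inputs use the default `(seq_len, batch, embed_dim)` layout):
```
import torch
import torch.nn as nn

# Self-attention over a batch of 2 sequences, length 5, embed_dim 16, 4 heads.
mha = nn.MultiheadAttention(embed_dim=16, num_heads=4)
x = torch.randn(5, 2, 16)  # (seq_len, batch, embed_dim)
attn_out, attn_weights = mha(x, x, x)  # query, key, value all set to x
print(attn_out.shape)      # torch.Size([5, 2, 16])
print(attn_weights.shape)  # torch.Size([2, 5, 5]) -- averaged over heads
```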
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18334
Differential Revision: D14577966
Pulled By: zhangguanheng66
fbshipit-source-id: 756c0deff623f3780651d9f9a70ce84516c806d3
Summary:
Return `missing_keys` and `unexpected_keys` from `load_state_dict` so the user can handle them when strict mode is off; also removed an unused variable.
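A small sketch of the new return value (the state dict contents here are contrived):
```
import torch
import torch.nn as nn

model = nn.Linear(4, 2)
# A state dict that is missing 'weight' and carries an unknown 'extra' key.
sd = {'bias': torch.zeros(2), 'extra': torch.zeros(1)}
result = model.load_state_dict(sd, strict=False)  # no error in non-strict mode
print(result.missing_keys)     # ['weight']
print(result.unexpected_keys)  # ['extra']
```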
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18668
Differential Revision: D14782073
Pulled By: ezyang
fbshipit-source-id: ab3b855eb77bb7422594d971988067e86eef20f2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17991
changes:
-Breaks bc: Tensor::type() now returns DeprecatedTypeProperties& rather than Type&.
-Added DeprecatedTypeProperties; it serves as a temporary replacement for Type as the return value of Tensor::type(). This contributes to making Type just for dispatch purposes so that we can make it dtype agnostic.
-Tensor::dispatch_type() now returns Type& like Tensor::type() used to do.
-Changed callsites of Tensor::type() appropriately.
Reviewed By: ezyang
Differential Revision: D14443117
fbshipit-source-id: 239ccb7a09626279a71d1a37f8f82e7f57bf7d9e
Summary:
If the input `network` resides on multiple GPUs, `devices` must be a 2D list with `devices[0]` matching `network`'s devices. See #18591
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18687
Differential Revision: D14706162
Pulled By: mrshenli
fbshipit-source-id: dca630d3308f2dbcf8b75629c452d7a64092ba42
Summary:
Fixes : #6469
1. `ATen/native/native_functions.yml` had [dispatch](03e7953a98/aten/src/ATen/native/native_functions.yaml (L451-L455)) variants for `embedding_dense_backward`, however `embedding_backward` explicitly made a [call](03e7953a98/aten/src/ATen/native/Embedding.cpp (L35-L45)) to it, thus leading to an error.
2. In case of a CUDA-type tensor, the function used to crash when dereferencing the indices' data [pointer](03e7953a98/aten/src/ATen/native/Embedding.cpp (L93)).
Both issues have been fixed and verified (on CUDA and CPU) with:
1. As mentioned in the issue
```
import torch

class Test(torch.nn.Module):
    def __init__(self):
        super(Test, self).__init__()
        self.embd = torch.nn.Embedding(1000, 100)
        self.dense = torch.nn.Linear(100, 1)

    def forward(self, inp):
        inp = self.embd(inp)
        return self.dense(inp)

test = Test()
inp = torch.tensor([0, 1, 2, 1, 1])
out = test(inp)
raw_loss = out.mean(dim=0)
loss_grad = torch.autograd.grad(outputs=raw_loss,
                                inputs=list(test.parameters()),
                                retain_graph=True, create_graph=True, only_inputs=True)
norm = sum([param.norm()**2 for param in loss_grad])
loss = raw_loss + norm
loss.backward(retain_graph=True)
print(test.embd.weight.grad)
```
2. Test Script
```
import torch
import time
start = time.time()
l = [1,1]*100
input = torch.tensor([[1,0],[1,0]],device='cpu')
embedding_matrix = torch.tensor([[1.0,3.0],[2.0,4]],requires_grad=True,device='cpu')
sq = embedding_matrix * embedding_matrix
emb = torch.nn.functional.embedding(input, sq,scale_grad_by_freq=False)
print('Embedding Matrix')
print(embedding_matrix)
print('-----------------')
sum_ = emb.sum()#prod.sum()
loss_grad, = torch.autograd.grad(outputs=sum_,inputs=embedding_matrix,create_graph=True)
print('Gradient')
print(loss_grad)
print('-----------------')
sum2_ = sum_ + loss_grad.sum()
print(sum2_)
sum2_.backward()
print(embedding_matrix.grad)
print(time.time() - start)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9078
Reviewed By: ezyang
Differential Revision: D14691901
Pulled By: soumith
fbshipit-source-id: 78e2612ba39080be564c876311671eb5a0119a0f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18598
ghimport-source-id: c74597e5e7437e94a43c163cee0639b20d0d0c6a
Stack from [ghstack](https://github.com/ezyang/ghstack):
* **#18598 Turn on F401: Unused import warning.**
This was requested by someone at Facebook; this lint is turned
on for Facebook by default. "Sure, why not."
I had to noqa a number of imports in __init__. Hypothetically
we're supposed to use __all__ in this case, but I was too lazy
to fix it. Left for future work.
Be careful! flake8-2 and flake8-3 behave differently with
respect to import resolution for # type: comments. flake8-3 will
report an import unused; flake8-2 will not. For now, I just
noqa'd all these sites.
All the changes were done by hand.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Differential Revision: D14687478
fbshipit-source-id: 30d532381e914091aadfa0d2a5a89404819663e3
Summary:
Enable unit tests working with ROCm 2.3. In particular, these are unit tests where we skipped for double data types previously and some tests for multi-GPU setups.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18537
Differential Revision: D14651822
Pulled By: ezyang
fbshipit-source-id: 7dd575504ebe235a91489866c91000e9754b1235
Summary:
To address the issue of broadcasting giving the wrong result in `nn.MSELoss()` as mentioned here https://github.com/pytorch/pytorch/issues/16045. In particular, the issue often arises when computing the loss between tensors with shapes (n, 1) and (n,).
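A sketch of the shape mismatch the added warning is meant to catch (the exact warning text may differ between versions):
```
import warnings
import torch
import torch.nn as nn

pred = torch.randn(5, 1)
target = torch.randn(5)
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    loss = nn.MSELoss()(pred, target)  # silently broadcasts to (5, 5)
print(len(caught) >= 1)  # a UserWarning about mismatched sizes was raised
```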
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18349
Differential Revision: D14594176
Pulled By: soumith
fbshipit-source-id: f23ae68a4bf42f3554ad7678a314ba2c7532a6db
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18181
ghimport-source-id: 9c23551584a1a1b0b7ac246367f3a7ae1c50b315
Stack from [ghstack](https://github.com/ezyang/ghstack):
* #18184 Fix B903 lint: save memory for data classes with slots/namedtuple
* **#18181 Fix B902 lint error: invalid first argument.**
* #18178 Fix B006 lint errors: using mutable structure in default argument.
* #18177 Fix lstrip bug revealed by B005 lint
A variety of sins were committed:
- Some code was dead
- Some code was actually a staticmethod
- Some code just named it the wrong way
- Some code was purposely testing the omitted case
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Differential Revision: D14530876
fbshipit-source-id: 292a371d9a76ddc7bfcfd38b6f0da9165290a58e
Summary:
Raise a proper error when:
1. Kernel size is larger than the input
2. Expected output size is less than zero
Test case added:
- invalid_conv1d
- Relevant test cases for conv2d and conv3d exist
Fixes #17247
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17436
Reviewed By: mrshenli
Differential Revision: D14354272
Pulled By: fmassa
fbshipit-source-id: 94b98621aa03b1f60d151ef9399ed3da55d41b42
Summary:
This was causing a problem with spectral norm, although SN won't use that anymore after #13350.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13352
Differential Revision: D14209562
Pulled By: ezyang
fbshipit-source-id: f5e3183e1e7050ac5a66d203de6f8cf56e775134
Summary:
Fix for #17261, SsnL do you have tests for it in your other PR? If not, I'll add to this. Example from #17261 now does not error out (and same for log_softmax).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17330
Differential Revision: D14171529
Pulled By: soumith
fbshipit-source-id: ee925233feb1b44ef9f1d757db59ca3601aadef2
Summary:
`TestNN.test_variable_sequence_cuda` sometimes breaks due to a reported CUDA leak.
The cause appears to be a too-small tolerance breaking the float16 sub-test of the test above.
When it breaks, it calls abort, disrupting the correct tear-down of the test
and raising a false alarm about the leak.
~~Also, removed annoying **Upsample** module warning.
IMHO this warning is wrong because the module **Upsample** is not deprecated. Seems like it's been mixed
with `nn.functional.upsample` function which is indeed deprecated in favor of `nn.functional.interpolate`, see `torch/nn/functional.py:2387` for details (this replacement is also performed in `test_nn.py`).~~
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17242
Differential Revision: D14141686
Pulled By: soumith
fbshipit-source-id: faa8f87440d94bdc6ab0ff00be6dad82353115c4
Summary:
Here is a stab at implementing an option to zero out infinite losses (and NaN gradients).
It might be nicer to move the zeroing to the respective kernels.
The default is currently `False` to mimic the old behaviour, but I'd be half inclined to set the default to `True`, because the behaviour wasn't consistent between CuDNN and Native anyways and the NaN gradients aren't terribly useful.
This topic seems to come up regularly, e.g. in #14335
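For illustration, a sketch of the new flag (values are arbitrary; a target longer than the input has no valid alignment, so the loss would be infinite):
```
import torch
import torch.nn as nn

# zero_infinity=True zeroes infinite losses (and the associated gradients)
# instead of propagating inf/NaN.
ctc = nn.CTCLoss(zero_infinity=True)
log_probs = torch.randn(10, 1, 5).log_softmax(2).requires_grad_()
targets = torch.randint(1, 5, (1, 30), dtype=torch.long)
loss = ctc(log_probs, targets, torch.tensor([10]), torch.tensor([30]))
print(torch.isfinite(loss).item())  # True
```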
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16199
Differential Revision: D14020462
Pulled By: ezyang
fbshipit-source-id: 5ba8936c66ec6e61530aaf01175dc49f389ae428
Summary:
This is the first round of enabling unit tests that work on ROCm 2.1 in my tests.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16871
Differential Revision: D13997662
Pulled By: bddppq
fbshipit-source-id: d909a3f7dd5fc8f85f126bf0613751c8e4ef949f
Summary:
Adds better bounds checks for target lengths in CTC loss, checks for integral types for target and prediction lengths, and adds tests for each, according to #15946
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16269
Differential Revision: D13847567
Pulled By: ezyang
fbshipit-source-id: 5d7a975565e02baf78fe388813a1d1ef56dfb212
Summary:
Changelog:
- Modify concatenation of [1] to a tuple by using cases for list and non-list types.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16489
Differential Revision: D13875838
Pulled By: soumith
fbshipit-source-id: fade65cc47385986b773b9bde9b4601ab93fe1cf
Summary:
This tests the water for adding back NNPACK in PyTorch, it's a lot better than the fallback THNN versions.
In #6151, we (ezyang and soumith) removed NNPACK support from PyTorch. Of course Maratyszcza might have advice, too. (Or an opinion on the CMake changes.)
The only functional changes are to use NNPack more aggressively on mobile and a .contiguous() to match NNPack's assumption (I stumbled over that while using NNPack for style transfer.)
The CMake changes try to use the NNPack we already have in git.
In terms of lines of code this is a large part of the diff of https://lernapparat.de/pytorch-jit-android/ . As far as I can tell, we don't have MKLDNN on mobile and the native THNN implementation are prohibitively expensive in terms of both CPU and memory.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15924
Differential Revision: D13709576
Pulled By: ezyang
fbshipit-source-id: f2e287739909451c173abf046588209a7450ca2c
Summary:
Mention that if enforce_sorted=True, the user can set
enforce_sorted=False. This is a new flag that is probably hard to
discover unless one thoroughly reads the docs.
Fixes #15567
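A quick sketch of the flag being pointed to (lengths are arbitrary):
```
import torch
from torch.nn.utils.rnn import pack_padded_sequence

# Batch of 3 padded sequences with unsorted lengths 2, 5, 3.
padded = torch.randn(5, 3, 4)  # (max_len, batch, features)
packed = pack_padded_sequence(padded, torch.tensor([2, 5, 3]),
                              enforce_sorted=False)  # sorts internally, no error
print(packed.sorted_indices)  # tensor([1, 2, 0])
```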
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16084
Differential Revision: D13701118
Pulled By: zou3519
fbshipit-source-id: c9aeb47ae9769d28b0051bcedb8f2f51a5a5c260
Summary:
Fixes #12643, amends #3341.
- Allow multidimensional input ~~(but apply softmax over `dim=-1`)~~ with `dim` argument
- Cleaner: Less lines of code
- Faster (1.32x speedup vs original, 2x speedup vs using `torch.Distributions`)
- Small fixes in docstring
- Remove some references in docstring. Was the linked (excellent) ipynb the first to do the straight-through trick? Instead, I propose changing to reference to the two papers most known for it.
- Add a deprecation warning for `eps`. It's not needed anymore.
- Initial commit keeps some code alternatives commented to exploit CI
- As of discussion when `gumbel_softmax` was added (#3341), this was merged into `torch.nn.functional` before all the work with `Distributions` and `Pyro`, and there will probably be multiple other best practices for this in the future.
I've tested building using the `Distributions`-api, but it was too slow, see below.
I therefore propose not using `Distributions` to keep it fast and simple, but adding a comment in docstring that `gumbel_softmax` may be deprecated in the future.
```
dist = torch.distributions.RelaxedOneHotCategorical(temperature=tau, logits=logits, validate_args=False)
y_soft = dist.rsample()
```
Pros:
* Built using tricks like `logsumexp` etc
* Explicitly uses `torch.distributions.utils._finfo` to avoid overflow (old implementation had an `eps` flag)
* Maintained for this exact purpose.
Cons:
* Very slow. Construction of distribution adds overhead see timings below. May be solved in future with speedups of `TransformedDistribution` and `Distribution`.
* Assumes which `dim` to apply softmax over.
```
y_soft = logits.new(logits.shape)
y_soft = (logits - y_soft.exponential_().log()) / tau # Gumbel noise
y_soft = y_soft.softmax(dim) # Gumbel softmax noise
```
Pros:
* Faster
```
import time
import torch

start = time.time()
num_draws = 1000000
logits = torch.randn(1, 3)
counts = torch.zeros(3)
for draw in range(num_draws):
    y_draw = gumbel_softmax(logits, hard=True)
    counts = counts + y_draw
end = time.time()
print(end - start)
>> 12.995795965194702
>> 7.658372640609741
>> 20.3382670879364
```
Decide on which path to choose. I'll commit changes to the unit tests in a while to show that it passes both old tests and new tests. I'll also remove the commented code about `RelaxedOneHotCategorical`.
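For reference, a small usage sketch of the resulting `gumbel_softmax` API (soft vs. straight-through hard samples):
```
import torch
import torch.nn.functional as F

logits = torch.randn(4, 10)
y_soft = F.gumbel_softmax(logits, tau=1.0, hard=False)  # differentiable relaxation
y_hard = F.gumbel_softmax(logits, tau=1.0, hard=True)   # one-hot forward, soft backward
print(y_soft.sum(dim=-1))  # each row sums to 1
print(y_hard.sum(dim=-1))  # each row sums to 1 (single sampled index)
```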
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13339
Differential Revision: D13092434
Pulled By: ezyang
fbshipit-source-id: 4c21788df336f4e9c2ac289022e395b261227b4b
Summary:
1) Reverts https://github.com/pytorch/pytorch/pull/12302 which added support for batched pdist. Except I kept the (non-batched) test improvements that came with that PR, because they are nice to have. Motivation: https://github.com/pytorch/pytorch/issues/15511
2) For the non-batched pdist, improved the existing kernel by forcing fp64 math and properly checking cuda launch errors
3) Added a 'large tensor' test that at least on my machine, fails on the batch pdist implementation.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15901
Reviewed By: ezyang
Differential Revision: D13616730
Pulled By: gchanan
fbshipit-source-id: 620d3f9b9acd492dc131bad9d2ff618d69fc2954
Summary:
Fixes #15353.
Like cudnn conv implementation, mkldnn also falls back to the default `_convolution_double_backward` as double backward.
This bug wasn't caught by CI before because mkldnn is only used when input scalar type is float, but our tests are all using double as default.
Adding test for float inputs, but mkldnn seems to have imprecision issues similar to cudnn implementation, so here I only check if double backward exists instead of calling `gradgradcheck`. Please correct me if the precision should actually be checked.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15686
Differential Revision: D13571682
Pulled By: ailzhang
fbshipit-source-id: f1762439762370f276cfd59e8b8b8a4dee960a4b
Summary:
Changes originally in this PR:
1. Move Variable::Impl data members into TensorImpl as `AutogradMeta` struct
2. Change Variable::Impl functions to use data members in `AutogradMeta` struct
3. Add `shallow_copy_and_detach()` function to each subclass of TensorImpl
4. Do shallow copy when the user calls `make_variable(tensor)` / `make_variable_view(tensor)` / `variable.set_data(tensor)` / `variable.detach()`
Changes moved from https://github.com/pytorch/pytorch/pull/13645:
1. Add a flag to Variable to disallow size/stride/storage_ptr changes from in-place operations such as `resize_` / `resize_as_` / `set_` / `transpose_`, and set this flag to true when people call `tensor.data` in Python.
2. Write text in the docs to actively discourage changing the shape or storage of `tensor_detached` and expecting `tensor` to also be updated.
This is the 1st+2nd PR mentioned in https://github.com/pytorch/pytorch/issues/13638.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13827
Differential Revision: D13507173
Pulled By: yf225
fbshipit-source-id: b177b08438d534a8197e34e1ad4a837e2db0ed6a
Summary:
The `EmbeddingBag` module does not include a `from_pretrained` method like the `Embedding` module. I added it for consistency between the two modules.
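A minimal sketch of the added method (weights are arbitrary; `EmbeddingBag` defaults to `mode='mean'` and `from_pretrained` freezes the weights by default, mirroring `Embedding`):
```
import torch
import torch.nn as nn

weights = torch.tensor([[1., 2.], [3., 4.], [5., 6.]])
bag = nn.EmbeddingBag.from_pretrained(weights)  # frozen by default
out = bag(torch.tensor([[0, 2]]))  # mean of rows 0 and 2
print(out)  # tensor([[3., 4.]])
```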
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15273
Differential Revision: D13547842
Pulled By: soumith
fbshipit-source-id: 8ffde51ff0c1e8fc8310263b6f375da88089ff7d
Summary:
Fixes #3584.
Motivation: manually sorting sequences, packing them, and then unsorting them
is something a lot of users have complained about doing, especially when we can
offer library support for them.
Overview: we internally sort sequences before packing them and store a list of
`unsorted_indices` that represent how to unsort the sequences inside
PackedSequence. The packing helper functions return PackedSequence with the
`permutation` field and the unpacking helper functions use it to unsort.
To implement this, the following changes were made:
- PackedSequence now keeps `sorted_indices` and `unsorted_indices`.
These two can be thought of as permutations and are inverses of each other.
`sorted_indices` is how the sequences were sorted; `unsorted_indices` is how
to unsort the sequences.
- Added an `enforce_sorted` argument to pack_sequence and pack_padded_sequence
that maintains the legacy behavior of error-ing out on unsorted-sequences.
When `enforce_sorted=True`, these functions maintain their ONNX exportability.
- pack_sequence(sequences, enforce_sorted) takes in unsorted sequences.
- pack_padded_sequence can take in a padded tensor that represents padded,
unsorted sequences.
- pad_packed_sequence unsorts the PackedSequence such that it is still the
inverse operation of packed_padded_sequence.
- RNNs apply `sort_indices` to their input hidden state and apply
`unsort_indices` to their output hidden state. This is to ensure that the
hidden state batches correspond to the user's ordering of input sequences.
NOT BC-Breaking
- The default for pack_sequence and pack_padded_sequence is
`enforce_sorted=True` to avoid breaking ONNX export. To use the new
functionality, pass in `enforce_sorted=False`
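For illustration, a small sketch of the new unsorted-input path (sequence values are arbitrary):
```
import torch
from torch.nn.utils.rnn import pack_sequence, pad_packed_sequence

# Variable-length sequences in arbitrary (unsorted) order.
seqs = [torch.tensor([1., 2.]), torch.tensor([3., 4., 5.]), torch.tensor([6.])]
packed = pack_sequence(seqs, enforce_sorted=False)
padded, lengths = pad_packed_sequence(packed)
print(lengths)  # tensor([2, 3, 1]) -- the caller's original ordering is restored
```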
Testing Plan
- Modified TestNN.test_pack_sequence, TestNN.test_packed_padded_sequence,
and TestNN.test_variable_sequence (RNN test) to check the behavior
of unsorted sequences, sorted sequences, and sorted sequences with
enforce_sorted=True
- test/test_jit.py has a test to see if RNNs are exportable with
enforce_sorted=True
cc colesbury
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15225
Reviewed By: soumith
Differential Revision: D13507138
Pulled By: zou3519
fbshipit-source-id: b871dccd6abefffca81bc4e3efef1873faa242ef
Summary:
This updates pdist to work for batched inputs, and updates the
documentation to reflect issues raised.
Closes #9406.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12302
Reviewed By: ezyang
Differential Revision: D13528485
Pulled By: erikbrinkman
fbshipit-source-id: 63d93a6e1cc95b483fb58e9ff021758b341cd4de
Summary:
Fixes an issue that arose from https://github.com/pytorch/pytorch/pull/13481 where `.shared_memory()` couldn't be called. Effectively undoes all changes to `nn.Module` from that PR and solve the relevant problem in a different way (the goal was to be able to call `._apply()` on the Python wrapper for a C++ module).
soumith
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15305
Differential Revision: D13493937
Pulled By: goldsborough
fbshipit-source-id: 4cb8687f90fc8709a536c5e7eacd0dc8edf6f750
Summary:
Addresses #918, interpolation results should be similar to tf
* Adds bicubic interpolation operator to `nn.functional.interpolate`
* Corresponding test in `test_nn.py`
The operator is added in legacy `TH` to be aligned with the other upsampling operators; they can be refactored/moved to ATen all at once when #10482 is resolved
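A minimal usage sketch of the new mode (input shape is illustrative):
```
import torch
import torch.nn.functional as F

x = torch.randn(1, 3, 8, 8)  # (N, C, H, W)
up = F.interpolate(x, scale_factor=2, mode='bicubic', align_corners=False)
print(up.shape)  # torch.Size([1, 3, 16, 16])
```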
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9849
Differential Revision: D9007525
Pulled By: driazati
fbshipit-source-id: 93ef49a34ce4e5ffd4bda94cd9a6ddc939f0a4cc
Summary:
tests work on ROCm 1.9.2 as present on CI (fp16 bringup, hipMemset and sparse improvements)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15232
Differential Revision: D13470991
Pulled By: bddppq
fbshipit-source-id: 45acc4f9ea5baaaf7672b86eb022948055779925
Summary:
* relax MIOpen if statement to allow fp16/fp32 mixed precision training now supported by ROCm 1.9.2
* use gemm_ex API of rocBLAS in ROCm 1.9.2 instead of the previous hgemm API
* with this: enable all but one half test in test_nn
While there, fix also:
* a group convolution issue w/ MIOpen pertaining to initializing MIOpen on multi-GPU systems properly we detected while working on this
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14994
Differential Revision: D13439869
Pulled By: bddppq
fbshipit-source-id: 75e4eb51a59488882e64b5eabdc30555b25be25e
Summary:
* Enable unit tests known to work on ROCm.
* Disable a few that are known to be flaky for the time being.
* Use std::abs for Half
* No more special casing for ROCm in TensorMathReduce
* Document an important detail for a hardcoded block size w.r.t. ROCm in TensorMathReduce
ezyang bddppq for awareness
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14011
Differential Revision: D13387679
Pulled By: bddppq
fbshipit-source-id: 4177f2a57b09d866ccbb82a24318f273e3292f71
Summary:
Fixes#6622 .
We used to average over all elements for kl divergence, which is not aligned with its math definition.
This PR corrects the default reduction behavior of KL divergence so that it now averages over the batch dimension.
- In KL, default behavior `reduction=mean` averages over batch dimension. While for most other loss functions, `reduction=mean` averages over all elements.
- We used to support scalar tensor as well. For BC purpose, we still support it, no reduction is performed on scalar tensor.
- Added a new reduction mode called `batchmean` which has the correct behavior for KL. Add a warning to make `batchmean` as default for KL instead of `mean` in next major release.
- [deprecated]I chose to not add a new reduction option, since "mean over batch dimension" is kinda special, and it only makes sense in few cases like KL. We don't want to explain why there's a option "batchmean" but it's not applicable for all other functions. I'm open to discussion on this one, as I cannot think of a perfect solution for this.
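A small sketch contrasting the two reductions (inputs are arbitrary; `kl_div` takes log-probabilities as input and probabilities as target):
```
import torch
import torch.nn.functional as F

log_p = F.log_softmax(torch.randn(4, 10), dim=1)
q = F.softmax(torch.randn(4, 10), dim=1)
# 'batchmean' divides the summed KL by the batch size (the math definition);
# plain 'mean' divides by the total number of elements instead.
kl_batch = F.kl_div(log_p, q, reduction='batchmean')
kl_sum = F.kl_div(log_p, q, reduction='sum')
print(torch.allclose(kl_batch, kl_sum / 4))  # True
```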
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14457
Differential Revision: D13236016
Pulled By: ailzhang
fbshipit-source-id: 905cc7b3bfc35a11d7cf098b1ebc382170a087a7
Summary:
This moves `new_module_tests` from `test_nn.py` to `common_nn.py` so
that they can be used in `test_jit.py` without running any of
`test_nn.py`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14578
Differential Revision: D13268286
Pulled By: driazati
fbshipit-source-id: 6e8654a4c29ab754d656ac83820c14d1c1843e03
Summary:
To convert `max_unpool` functions to weak script, this PR adds support
for `T` as default arguments for `BroadcastingListN[T]`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14361
Differential Revision: D13192231
Pulled By: driazati
fbshipit-source-id: a25b75a0e88ba3dfa22d6a83775e9778d735e249
Summary:
This PR adds weak modules for all activation modules and uses `test_nn` module tests to test weak modules that have been annotated with `weak_module` and therefore are in `torch._jit_internal._weak_types`
Also depends on #14379
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14238
Differential Revision: D13252887
Pulled By: driazati
fbshipit-source-id: e9638cf74089884a32b8f0f38396cf432c02c988
Summary:
This PR adds weak modules for all activation modules and uses `test_nn` module tests to test weak modules that have been annotated with `weak_module` and therefore are in `torch._jit_internal._weak_types`
Also depends on #14379
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14238
Differential Revision: D13192230
Pulled By: driazati
fbshipit-source-id: 36488960b6c91448b38c0fa65422539a93af8c5e
Summary:
As reported in #13386, the pooling operations can return wrong results for large inputs. The root of the problem is that while the output shape is initially being computed with integer operations, it is converted to float32 for division by the stride and applying either a `ceil` or a `floor` depending on the `ceil_mode`. Since even moderately large integers (the smallest being 16,777,217) cannot be expressed exactly in float32, this leads to wrong result shapes.
This PR relies purely on integer operations to perform the shape computation, including the ceil/floor distinction. Since I could not stand all that duplicated code, I pulled it out into a `pooling_shape.h` header, similar to the existing `linear_upsampling.h` header. I hope this is acceptable, let me know if you'd like to see it solved differently. I've also added tests to `test_nn.py` that fail without my changes and pass with my changes. They cover `{max,avg}_pool{1,2,3}d()` for CPU and GPU.
Fixes #13386.
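A pure-Python sketch of the integer-only shape formula (mirroring the `pooling_shape.h` logic; the helper name is illustrative):
```
def pooling_output_shape(input_size, kernel, pad, stride, dilation, ceil_mode):
    """Integer-only pooling output size; avoids float32 rounding of large sizes."""
    numerator = input_size + 2 * pad - dilation * (kernel - 1) - 1
    if ceil_mode:
        out = (numerator + stride - 1) // stride + 1
        # The last pooling window must start inside the input or left padding.
        if (out - 1) * stride >= input_size + pad:
            out -= 1
        return out
    return numerator // stride + 1

# floor mode: pooling with kernel=3, stride=2 on length 10 -> 4
print(pooling_output_shape(10, 3, 0, 2, 1, False))  # 4
# ceil mode keeps a partial trailing window: length 5, kernel 2, stride 2 -> 3
print(pooling_output_shape(5, 2, 0, 2, 1, True))    # 3
```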
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14405
Differential Revision: D13215260
Pulled By: soumith
fbshipit-source-id: 802588ce6cba8db6c346448c3b3c0dac14d12b2d
Summary:
torch.nn.utils.rnn.pack_padded_sequence segment fault if not in
decreasing order #13324
We were seeing this segfault on throw, pre-emptively checking avoids
this:
*** Error in `/home/bvaughan/anaconda3/bin/python': double free or corruption (!prev): 0x00005555566e7510 ***
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13933
Differential Revision: D13090389
Pulled By: nairbv
fbshipit-source-id: 6f6b319e74cb55830be799e9c46bc33aa59256d8
Summary:
This includes everything in nn.yaml except for convolutions, multi_margin_loss, multi_label_margin_loss, nll_loss, and nll_loss2d.
Note that scalar_check False just means we don't do any extra scalar checks (we could elide this from the generated code, which I may do in a later commit).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13906
Reviewed By: ezyang
Differential Revision: D13044507
Pulled By: gchanan
fbshipit-source-id: ebd3bdca2bcf512ca44de1ce3be81946f6c0828e
Summary:
This enables the distributions and utils test sets for ROCm.
Individual tests are enabled that now pass due to fixes in HIP/HCC/libraries versions in white rabbit.
For attention: bddppq ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13166
Differential Revision: D12814759
Pulled By: bddppq
fbshipit-source-id: ea70e775c707d7a8d2776fede6154a755adef43e
Summary:
- Move batch norm from TH(CU)NN to native
- Speedups in many cases (e.g. #12006) for CUDA due to new block/grid layout and Welford-type mean/variance calculations (the latter for training mode)
- It splits the forward kernel in two pieces and reuses the evaluation kernel for the transformation.
- We change the meaning of save_mean and save_invstd (aka save_var) to accscalar to maintain reasonable precision.
Compared to the ill-fated #12368
- I changed the CPU kernel to not call `.sum()` from within parallel for. This seemed to have caused the breakage (NaN-results) in TestModels.test_dcgan_netG (thank you houseroad for the repro, errors in assessment of the fix are my own)
- I updated the Half->Float upcasting in tensors to go through `t.type().scalarType()` instead of `t.dtype()`.
- I have merged master
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13263
Differential Revision: D12946254
Pulled By: SsnL
fbshipit-source-id: 3bb717ee250fbccaf10afe73722996aa4713d10d
Summary:
Problems with SN and DP after #12671 :
1. in eval mode, `weight_orig` is not getting correct gradient #12737 .
Fix: keep `v` vector around as a buffer and always calculate `W = W_orig / (u @ W_orig @ v)` even in eval.
2. in training mode, the `weight` buffer of the parallelized module is never updated, if someone touches `weight_orig` and/or `weight` and makes them not sharing storage. So in `eval` the weight used is wrong.
Fix: Make `weight` not a buffer anymore and always calculate it as above.
3. #12671 changed SN to update `u` in-place to make DP work correctly, but then it breaks backward through two forwards (e.g., the common GAN loss `D(real) - D(fake)`) because the vectors needed to backprop the 1st forward is changed in the 2nd forward.
Fix: This PR clones `u` and `v` before using them.
To maintain BC, I added a hook interface for producing and loading state_dict. This is ugly and we should really have better interface for spectral_norm. But for the purpose to fix this issue, I make this patch. Even if we have a better interface, BC mechanism for legacy loading legacy state_dict still needs to be done.
cc crcrpar
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13350
Differential Revision: D12931044
Pulled By: SsnL
fbshipit-source-id: 8be6f934eaa62414d76d2c644dedd7e1b7eb31ef
Summary:
- fixes weights-contiguous requirement for THCUNN Convolutions
- Add tests that conv backward pass works for non-contiguous weights
- fix RNN tests / error messages to be consistent and pass
- relax weight grad precision for fp16 for a particular test
- fix regression of CMAKE_PREFIX_PATH not passing through
- add missing skipIfNoLapack annotations where needed
Differential Revision: D12918456
Pulled By: soumith
fbshipit-source-id: 8642d36bffcc6f2957800d6afa1e10bef2a91d05
Summary:
```
The previous threshold implementation was not vectorized or parallelized.
This speeds up ResNet-50 CPU inference [1] from ~88 ms to ~67 ms
CPU timings:
https://gist.github.com/colesbury/d0d1be6974841d62696dbde329a8fde8
1 thread (before vs. after)
10240: 17.4 µs vs. 6.9 µs per loop
102400: 141 µs vs. 39.8 µs per loop
16 threads (before vs. after)
10240: 17.4 µs vs. 6.7 µs per loop
102400: 141 µs vs. 14.3 µs per loop
CUDA timings are not measurably different.
[1]: compiled with MKL-DNN, 8 threads, batch norm merged into convolutions
https://gist.github.com/colesbury/8a64897dae97558b3b82da665048c782
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13182
Reviewed By: soumith
Differential Revision: D12825105
Pulled By: colesbury
fbshipit-source-id: 557da608ebb87db8a04adbb0d2882af4f2eb3c15
Summary:
- Speed up the case of #12006 in the forward
- The backward still isn't as fast as one might hope (factor 2-3 in the #12006 case).
- More extensive benchmarking shows not so great performance compared
to CuDNN for cases with many channels, e.g. bs=8-128 / c=1024 / f=1024.
- We change the meaning of save_mean and save_invstd (aka save_var) to accscalar to
maintain reasonable precision.
Needless to say that I would happily separate the TensorAccessor fixes in a separate PR, as they're fixes and unrelated.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12368
Differential Revision: D10559696
Pulled By: SsnL
fbshipit-source-id: f0d0d1e0912e17b15b8fb7a2c03d0fe757598419
Summary:
Closes #2119.
There was a small bug where the output_size got sliced with `[-2:]`
where we really meant to slice it as `[2:]` (to remove the batch and
channel dimensions).
Added a new test for this.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12952
Differential Revision: D10510678
Pulled By: zou3519
fbshipit-source-id: 4c04a5007fc6d002e1806d6fe981b43d33d6a4f2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12794
common.py is used in base_module for almost all tests in test/. The
name of this file is so common that can easily conflict with other dependencies
if they happen to have another common.py in the base module. Rename the file to
avoid conflict.
Reviewed By: orionr
Differential Revision: D10438204
fbshipit-source-id: 6a996c14980722330be0a9fd3a54c20af4b3d380
Summary:
Module.to uses the Tensor.to parsing facility.
It should not, however, accept "copy" as a keyword/fourth positional
argument.
See #12571 for discussion.
Thank you SsnL for noticing.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12617
Differential Revision: D10392053
Pulled By: ezyang
fbshipit-source-id: b67a5def7993189b4b47193abc7b741b7d07512c
Summary:
There were two problems with SN + DP:
1. In SN, the updated _u vector is saved back to module via a `setattr`. However, in DP, everything is run on a replica, so those updates are lost.
2. In DP, the buffers are broadcast via a `broadcast_coalesced`, so on replicas they are all views. Therefore, the `detach_` call won't work.
Fixes are:
1. Update the _u vector in-place; since the first replica shares storage with the parallelized module, the update is retained.
2. Do not call `detach_`.
3. Added comments in SN about the subtlety.
4. Added a note to the DP doc on this particular behavior of DP.
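Fix (1) can be illustrated with plain tensors; the view below stands in for a buffer that `broadcast_coalesced` hands to the first replica (a sketch, not the actual SN code):

```python
import torch

u = torch.zeros(4)   # buffer stored on the original module
replica_u = u[:]     # replica's view into the same storage

# setattr-style rebinding: the replica name now points at a NEW tensor,
# so the original module never sees the update.
replica_u_rebound = torch.ones(4)
assert not torch.equal(u, torch.ones(4))

# in-place update: writes through the shared storage, so the original
# module's buffer retains the new value after the replica's forward.
replica_u.copy_(torch.ones(4))
assert torch.equal(u, torch.ones(4))
```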
cc crcrpar taesung89 yaoshengfu
Fixes https://github.com/pytorch/pytorch/issues/11476
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12671
Differential Revision: D10410232
Pulled By: SsnL
fbshipit-source-id: c447951844a30366d8c196bf9436340e88f3b6d9
Summary:
Add dtype argument to softmax/log_softmax functions.
Computing softmax in fp32 precision is necessary for mixed precision training; converting the output of the previous layer into fp32 and then reading it as fp32 in softmax is expensive, both memory- and perf-wise. This PR allows one to avoid it.
For most input data/dtype combinations, the input data is converted to dtype and then softmax is computed. If the input data is half type and dtype is fp32, kernels with the corresponding template arguments are called.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11719
Reviewed By: ezyang
Differential Revision: D10175514
Pulled By: zou3519
fbshipit-source-id: 06d285af91a0b659932236d41ad63b787eeed243
Summary:
Obviously, the grads of the conv weight and conv input do not depend on the bias, yet the original `convXd_input` and `convXd_weight` methods accept a `bias` parameter. What's more, while the doc says `bias` should have the shape `(out_channels,)`, one gets a `RuntimeError` if bias != None and in_channels != out_channels, because the weight of a transposed conv has shape `(in_channels, out_channels, kH, kW)` while the weight of a vanilla conv has shape `(out_channels, in_channels, kH, kW)`:
```
RuntimeError: Given transposed=1, weight of size [channel1, channel2, kH, kW], expected bias to be 1-dimensional with channel2 elements, but got bias of size [channel1] instead
```
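For reference, the gradient helpers work without any bias argument; the shapes below are hypothetical:

```python
import torch
import torch.nn.functional as F
from torch.nn import grad as nn_grad

x = torch.randn(1, 3, 8, 8)    # (N, in_channels, H, W)
w = torch.randn(6, 3, 3, 3)    # (out_channels, in_channels, kH, kW)
y = F.conv2d(x, w)
gy = torch.ones_like(y)        # stand-in for the upstream gradient

# Neither gradient depends on the bias, so none is passed.
gx = nn_grad.conv2d_input(x.shape, w, gy)
gw = nn_grad.conv2d_weight(x, w.shape, gy)

assert gx.shape == x.shape
assert gw.shape == w.shape
```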
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12281
Differential Revision: D10217370
Pulled By: ezyang
fbshipit-source-id: bc00b439e5ae539276a5e678bdb92af700197bb2
Summary:
- fixes https://github.com/pytorch/pytorch/issues/10723
- migrate PReLU to ATen and deprecate legacy PReLU
- performance:
CPU with weight.numel() = 1
```
>>> m = nn.PReLU()
>>> x = torch.randn(100, 100, 100, requires_grad=True)
>>> %timeit -r 100 y = m(x)
100 loops, best of 100: 9.43 ms per loop
>>> y = m(x).sum()
>>> %timeit -r 100 y.backward(retain_graph=True)
10 loops, best of 100: 24.4 ms per loop
>>> m = nn.PReLU()
>>> x = torch.randn(100, 100, 100, requires_grad=True)
>>> %timeit -r 100 y = m(x)
1000 loops, best of 100: 695 µs per loop
>>> y = m(x).sum()
>>> %timeit -r 100 y.backward(retain_graph=True)
100 loops, best of 100: 2.47 ms per loop
```
CPU with weight.numel() = channels
```
>>> m = nn.PReLU(100)
>>> x = torch.randn(100, 100, 100, requires_grad=True)
>>> %timeit -r 100 y = m(x)
1000 loops, best of 100: 603 µs per loop
>>> y = m(x).sum()
>>> %timeit -r 100 y.backward(retain_graph=True)
100 loops, best of 100: 13.3 ms per loop
>>> m = nn.PReLU(100)
>>> x = torch.randn(100, 100, 100, requires_grad=True)
>>> %timeit -r 100 y = m(x)
1000 loops, best of 100: 655 µs per loop
>>> y = m(x).sum()
>>> %timeit -r 100 y.backward(retain_graph=True)
100 loops, best of 100: 2.45 ms per loop
```
CUDA with weight.numel() = 1
```
>>> m = nn.PReLU().cuda()
>>> x = torch.randn(100, 100, 100, requires_grad=True).cuda()
>>> %timeit -r 100 torch.cuda.synchronize(); y = m(x); torch.cuda.synchronize();
10000 loops, best of 100: 187 µs per loop
>>> y = m(x).sum()
>>> %timeit -r 100 torch.cuda.synchronize(); y.backward(retain_graph=True); torch.cuda.synchronize();
100 loops, best of 100: 2.01 ms per loop
>>> m = nn.PReLU().cuda()
>>> x = torch.randn(100, 100, 100, requires_grad=True).cuda()
>>> %timeit -r 100 torch.cuda.synchronize(); y = m(x); torch.cuda.synchronize();
1000 loops, best of 100: 195 µs per loop
>>> y = m(x).sum()
>>> %timeit -r 100 torch.cuda.synchronize(); y.backward(retain_graph=True); torch.cuda.synchronize();
100 loops, best of 100: 2.28 ms per loop
```
CUDA with weight.numel() = channel
```
>>> m = nn.PReLU(100).cuda()
>>> x = torch.randn(100, 100, 100, requires_grad=True).cuda()
>>> %timeit -r 100 torch.cuda.synchronize(); y = m(x); torch.cuda.synchronize();
1000 loops, best of 100: 174 µs per loop
>>> y = m(x).sum()
>>> %timeit -r 100 torch.cuda.synchronize(); y.backward(retain_graph=True); torch.cuda.synchronize();
100 loops, best of 100: 2.27 ms per loop
>>> m = nn.PReLU(100).cuda()
>>> x = torch.randn(100, 100, 100, requires_grad=True).cuda()
>>> %timeit -r 100 torch.cuda.synchronize(); y = m(x); torch.cuda.synchronize();
10000 loops, best of 100: 181 µs per loop
>>> y = m(x).sum()
>>> %timeit -r 100 torch.cuda.synchronize(); y.backward(retain_graph=True); torch.cuda.synchronize();
100 loops, best of 100: 2.26 ms per loop
```
The huge CPU performance regression when weight.numel() == 1 is addressed by replacing at::CPU_tensor_apply* with parallelized kernels.
ezyang SsnL zou3519 soumith
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11758
Differential Revision: D9995799
Pulled By: weiyangfb
fbshipit-source-id: d289937c78075f46a54dafbde92fab0cc4b5b86e