Summary:
Fixes an issue that arose from https://github.com/pytorch/pytorch/pull/13481 where `.share_memory()` couldn't be called. Effectively undoes all changes to `nn.Module` from that PR and solves the relevant problem in a different way (the goal was to be able to call `._apply()` on the Python wrapper for a C++ module).
soumith
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15305
Differential Revision: D13493937
Pulled By: goldsborough
fbshipit-source-id: 4cb8687f90fc8709a536c5e7eacd0dc8edf6f750
Summary:
Addresses #918; interpolation results should be similar to TensorFlow's.
* Adds bicubic interpolation operator to `nn.functional.interpolate`
* Corresponding test in `test_nn.py`
The operator is added in legacy `TH` to be aligned with the other upsampling operators; they can be refactored/moved to ATen all at once when #10482 is resolved
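A minimal usage sketch of the new mode via the public API (illustrative only; shapes are arbitrary):
```
import torch
import torch.nn.functional as F

x = torch.randn(1, 3, 16, 16)
# Bicubic upsampling alongside the existing nearest/linear/bilinear/trilinear modes.
y = F.interpolate(x, size=(32, 32), mode='bicubic', align_corners=False)
```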
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9849
Differential Revision: D9007525
Pulled By: driazati
fbshipit-source-id: 93ef49a34ce4e5ffd4bda94cd9a6ddc939f0a4cc
Summary:
tests work on ROCm 1.9.2 as present on CI (fp16 bringup, hipMemset and sparse improvements)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15232
Differential Revision: D13470991
Pulled By: bddppq
fbshipit-source-id: 45acc4f9ea5baaaf7672b86eb022948055779925
Summary:
* relax MIOpen if statement to allow fp16/fp32 mixed precision training now supported by ROCm 1.9.2
* use gemm_ex API of rocBLAS in ROCm 1.9.2 instead of the previous hgemm API
* with this: enable all but one half test in test_nn
While there, also fix:
* a group convolution issue with MIOpen, pertaining to properly initializing MIOpen on multi-GPU systems, that we detected while working on this
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14994
Differential Revision: D13439869
Pulled By: bddppq
fbshipit-source-id: 75e4eb51a59488882e64b5eabdc30555b25be25e
Summary:
* Enable unit tests known to work on ROCm.
* Disable a few that are known to be flaky for the time being.
* Use std::abs for Half
* No more special casing for ROCm in TensorMathReduce
* Document an important detail for a hardcoded block size w.r.t. ROCm in TensorMathReduce
ezyang bddppq for awareness
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14011
Differential Revision: D13387679
Pulled By: bddppq
fbshipit-source-id: 4177f2a57b09d866ccbb82a24318f273e3292f71
Summary:
Fixes #6622.
We used to average over all elements for KL divergence, which is not aligned with its math definition.
This PR corrects the default reduction behavior of KL divergence so that it now averages over the batch dimension.
- In KL, the default behavior `reduction=mean` averages over the batch dimension, while for most other loss functions `reduction=mean` averages over all elements.
- We used to support scalar tensors as well. For BC purposes we still support them; no reduction is performed on a scalar tensor.
- Added a new reduction mode called `batchmean` which has the correct behavior for KL. Added a warning that `batchmean` will become the default for KL instead of `mean` in the next major release.
- [deprecated] I chose not to add a new reduction option, since "mean over batch dimension" is somewhat special, and it only makes sense in a few cases like KL. We don't want to explain why there's an option `batchmean` that isn't applicable to all other functions. I'm open to discussion on this one, as I cannot think of a perfect solution for this.
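A short usage sketch of the new mode (illustrative; tensor shapes are arbitrary):
```
import torch
import torch.nn.functional as F

inp = F.log_softmax(torch.randn(8, 5), dim=1)   # input is expected as log-probabilities
tgt = F.softmax(torch.randn(8, 5), dim=1)       # target as probabilities
# 'batchmean' divides the summed loss by the batch size, matching the math
# definition of KL divergence; plain 'mean' divides by the number of elements.
loss = F.kl_div(inp, tgt, reduction='batchmean')
```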
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14457
Differential Revision: D13236016
Pulled By: ailzhang
fbshipit-source-id: 905cc7b3bfc35a11d7cf098b1ebc382170a087a7
Summary:
This moves `new_module_tests` from `test_nn.py` to `common_nn.py` so
that they can be used in `test_jit.py` without running any of
`test_nn.py`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14578
Differential Revision: D13268286
Pulled By: driazati
fbshipit-source-id: 6e8654a4c29ab754d656ac83820c14d1c1843e03
Summary:
To convert `max_unpool` functions to weak script, this PR adds support
for `T` as default arguments for `BroadcastingListN[T]`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14361
Differential Revision: D13192231
Pulled By: driazati
fbshipit-source-id: a25b75a0e88ba3dfa22d6a83775e9778d735e249
Summary:
This PR adds weak modules for all activation modules and uses `test_nn` module tests to test weak modules that have been annotated with `weak_module` and therefore are in `torch._jit_internal._weak_types`
Also depends on #14379
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14238
Differential Revision: D13252887
Pulled By: driazati
fbshipit-source-id: e9638cf74089884a32b8f0f38396cf432c02c988
Summary:
This PR adds weak modules for all activation modules and uses `test_nn` module tests to test weak modules that have been annotated with `weak_module` and therefore are in `torch._jit_internal._weak_types`
Also depends on #14379
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14238
Differential Revision: D13192230
Pulled By: driazati
fbshipit-source-id: 36488960b6c91448b38c0fa65422539a93af8c5e
Summary:
As reported in #13386, the pooling operations can return wrong results for large inputs. The root of the problem is that while the output shape is initially being computed with integer operations, it is converted to float32 for division by the stride and applying either a `ceil` or a `floor` depending on the `ceil_mode`. Since even moderately large integers (the smallest being 16,777,217) cannot be expressed exactly in float32, this leads to wrong result shapes.
This PR relies purely on integer operations to perform the shape computation, including the ceil/floor distinction. Since I could not stand all that duplicated code, I pulled it out into a `pooling_shape.h` header, similar to the existing `linear_upsampling.h` header. I hope this is acceptable, let me know if you'd like to see it solved differently. I've also added tests to `test_nn.py` that fail without my changes and pass with my changes. They cover `{max,avg}_pool{1,2,3}d()` for CPU and GPU.
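For reference, a minimal Python sketch of the integer-only shape math described above (the helper name and exact clamping are illustrative, not the literal `pooling_shape.h` code):
```
def pooling_output_shape(input_size, kernel_size, pad, stride, dilation, ceil_mode):
    # Numerator of the usual output-size formula, kept in integer arithmetic.
    numer = input_size + 2 * pad - dilation * (kernel_size - 1) - 1
    if ceil_mode:
        out = (numer + stride - 1) // stride + 1   # integer ceil division
        # Ensure the last pooling window starts inside the (padded) input.
        if (out - 1) * stride >= input_size + pad:
            out -= 1
    else:
        out = numer // stride + 1                  # integer floor division
    return out
```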
Fixes #13386.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14405
Differential Revision: D13215260
Pulled By: soumith
fbshipit-source-id: 802588ce6cba8db6c346448c3b3c0dac14d12b2d
Summary:
torch.nn.utils.rnn.pack_padded_sequence segfaults if lengths are not in
decreasing order (#13324).
We were seeing this segfault on throw; pre-emptively checking avoids
this:
*** Error in `/home/bvaughan/anaconda3/bin/python': double free or corruption (!prev): 0x00005555566e7510 ***
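A usage sketch of the checked precondition (illustrative data):
```
import torch
from torch.nn.utils.rnn import pack_padded_sequence

seqs = torch.randn(4, 10, 8)           # (batch, max_len, features)
lengths = torch.tensor([10, 7, 5, 2])  # must be sorted in decreasing order
packed = pack_padded_sequence(seqs, lengths, batch_first=True)
# Passing unsorted lengths now raises a Python-level error instead of crashing.
```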
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13933
Differential Revision: D13090389
Pulled By: nairbv
fbshipit-source-id: 6f6b319e74cb55830be799e9c46bc33aa59256d8
Summary:
This includes everything in nn.yaml except for convolutions, multi_margin_loss, multi_label_margin_loss, nll_loss, and nll_loss2d.
Note that scalar_check False just means we don't do any extra scalar checks (we could elide this from the generated code, which I may do in a later commit).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13906
Reviewed By: ezyang
Differential Revision: D13044507
Pulled By: gchanan
fbshipit-source-id: ebd3bdca2bcf512ca44de1ce3be81946f6c0828e
Summary:
This enables the distributions and utils test sets for ROCm.
Individual tests are enabled that now pass due to fixes in HIP/HCC/libraries versions in white rabbit.
For attention: bddppq ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13166
Differential Revision: D12814759
Pulled By: bddppq
fbshipit-source-id: ea70e775c707d7a8d2776fede6154a755adef43e
Summary:
- Move batch norm from TH(CU)NN to native
- Speedups in many cases (e.g. #12006) for CUDA due to the new block/grid layout and Welford-type mean/variance calculations (the latter for training mode; see the sketch after this list)
- It splits the forward kernel into two pieces and reuses the evaluation kernel for the transformation.
- We change the meaning of save_mean and save_invstd (aka save_var) to accscalar to maintain reasonable precision.
Compared to the ill-fated #12368
- I changed the CPU kernel to not call `.sum()` from within parallel for. This seemed to have caused the breakage (NaN-results) in TestModels.test_dcgan_netG (thank you houseroad for the repro, errors in assessment of the fix are my own)
- I updated the Half->Float upcasting in tensors to go through `t.type().scalarType()` instead of `t.dtype()`.
- I have merged master
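As a reference for the Welford-type calculation mentioned above, a minimal Python sketch of the online mean/variance update (illustrative only, not the CUDA kernel):
```
def welford(xs):
    # Single-pass, numerically stable mean/variance (population variance).
    mean, m2, n = 0.0, 0.0, 0
    for x in xs:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)   # uses the updated mean
    return mean, (m2 / n if n else float('nan'))
```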
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13263
Differential Revision: D12946254
Pulled By: SsnL
fbshipit-source-id: 3bb717ee250fbccaf10afe73722996aa4713d10d
Summary:
Problems with SN and DP after #12671 :
1. In eval mode, `weight_orig` is not getting the correct gradient (#12737).
Fix: keep the `v` vector around as a buffer and always calculate `W = W_orig / (u @ W_orig @ v)`, even in eval (see the sketch below).
2. In training mode, the `weight` buffer of the parallelized module is never updated if someone touches `weight_orig` and/or `weight` and makes them no longer share storage, so in eval the weight used is wrong.
Fix: Make `weight` not a buffer anymore and always calculate it as above.
3. #12671 changed SN to update `u` in-place to make DP work correctly, but that breaks backward through two forwards (e.g., the common GAN loss `D(real) - D(fake)`) because the vectors needed to backprop the 1st forward are changed in the 2nd forward.
Fix: This PR clones `u` and `v` before using them.
To maintain BC, I added a hook interface for producing and loading state_dict. This is ugly and we should really have a better interface for spectral_norm, but for the purpose of fixing this issue I made this patch. Even with a better interface, a BC mechanism for loading legacy state_dicts would still be needed.
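A minimal sketch of the recomputation in fixes 1 and 2, assuming `weight_orig`, `u` and `v` are the stored quantities (not the exact library code):
```
import torch

def compute_sn_weight(weight_orig, u, v):
    # Clone u and v so a second forward does not invalidate the first one's graph,
    # then recompute W = W_orig / sigma with sigma = u^T @ W_orig @ v.
    u, v = u.clone(), v.clone()
    w_mat = weight_orig.view(weight_orig.size(0), -1)
    sigma = torch.dot(u, torch.mv(w_mat, v))
    return weight_orig / sigma
```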
cc crcrpar
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13350
Differential Revision: D12931044
Pulled By: SsnL
fbshipit-source-id: 8be6f934eaa62414d76d2c644dedd7e1b7eb31ef
Summary:
- fixes weights-contiguous requirement for THCUNN Convolutions
- Add tests that conv backward pass works for non-contiguous weights
- fix RNN tests / error messages to be consistent and pass
- relax weight grad precision for fp16 for a particular test
- fix regression of CMAKE_PREFIX_PATH not passing through
- add missing skipIfNoLapack annotations where needed
Differential Revision: D12918456
Pulled By: soumith
fbshipit-source-id: 8642d36bffcc6f2957800d6afa1e10bef2a91d05
Summary:
```
The previous threshold implementation was not vectorized or parallelized.
This speeds up ResNet-50 CPU inference [1] from ~88 ms to ~67 ms
CPU timings:
https://gist.github.com/colesbury/d0d1be6974841d62696dbde329a8fde8
1 thread (before vs. after)
10240: 17.4 us vs. 6.9 µs per loop
102400: 141 us vs. 39.8 µs per loop
16 threads (before vs. after)
10240: 17.4 us vs. 6.7 µs per loop
102400: 141 us vs. 14.3 µs per loop
CUDA timings are not measurably different.
[1]: compiled with MKL-DNN, 8 threads, batch norm merged into convolutions
https://gist.github.com/colesbury/8a64897dae97558b3b82da665048c782
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13182
Reviewed By: soumith
Differential Revision: D12825105
Pulled By: colesbury
fbshipit-source-id: 557da608ebb87db8a04adbb0d2882af4f2eb3c15
Summary:
- Speed up the case of #12006 in the forward
- The backward still isn't as fast as one might hope (factor 2-3 in the #12006 case).
- More extensive benchmarking shows not so great performance compared
to CuDNN for cases with many channels, e.g. bs=8-128 / c=1024 / f=1024.
- We change the meaning of save_mean and save_invstd (aka save_var) to accscalar to
maintain reasonable precision.
Needless to say, I would happily move the TensorAccessor fixes into a separate PR, as they're fixes and unrelated.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12368
Differential Revision: D10559696
Pulled By: SsnL
fbshipit-source-id: f0d0d1e0912e17b15b8fb7a2c03d0fe757598419
Summary:
Closes #2119.
There was a small bug where the output_size got sliced with `[-2:]`
where we really meant to slice it as `[2:]` (to remove the batch and
channel dimensions).
Added a new test for this.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12952
Differential Revision: D10510678
Pulled By: zou3519
fbshipit-source-id: 4c04a5007fc6d002e1806d6fe981b43d33d6a4f2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12794
common.py is used in base_module for almost all tests in test/. The
name of this file is so common that it can easily conflict with other dependencies
if they happen to have another common.py in the base module. Rename the file to
avoid conflict.
Reviewed By: orionr
Differential Revision: D10438204
fbshipit-source-id: 6a996c14980722330be0a9fd3a54c20af4b3d380
Summary:
Module.to uses the Tensor.to parsing facility.
It should not, however, accept "copy" as a keyword/fourth positional
argument.
See #12571 for discussion.
Thank you SsnL for noticing.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12617
Differential Revision: D10392053
Pulled By: ezyang
fbshipit-source-id: b67a5def7993189b4b47193abc7b741b7d07512c
Summary:
There were two problems with SN + DP:
1. In SN, the updated _u vector is saved back to the module via a `setattr`. However, in DP, everything is run on a replica, so those updates are lost.
2. In DP, the buffers are broadcast via a `broadcast_coalesced`, so on replicas they are all views. Therefore, the `detach_` call won't work.
Fixes are:
1. Update the _u vector in-place so that, thanks to the shared storage between the 1st replica and the parallelized module, the update is retained
2. Do not call `detach_`.
3. Added comments in SN about the subtlety.
4. Added a note to the DP doc on this particular behavior of DP.
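A usage sketch of the combination this fixes (illustrative; assumes at least one CUDA device):
```
import torch
import torch.nn as nn

# With the in-place update of the `u` buffer, the power-iteration state on the
# first replica shares storage with the original module and survives the forward.
m = nn.DataParallel(nn.utils.spectral_norm(nn.Linear(20, 10)).cuda())
y = m(torch.randn(8, 20, device='cuda'))
```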
cc crcrpar taesung89 yaoshengfu
Fixes https://github.com/pytorch/pytorch/issues/11476
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12671
Differential Revision: D10410232
Pulled By: SsnL
fbshipit-source-id: c447951844a30366d8c196bf9436340e88f3b6d9
Summary:
Add dtype argument to softmax/log_softmax functions.
Computing softmax in fp32 precision is necessary for mixed precision training, and converting the output of the previous layer into fp32 and then reading it as fp32 in softmax is expensive, both memory- and performance-wise; this PR allows one to avoid that.
For most input data/dtype combinations, input data is converted to dtype and then softmax is computed. If input data is half type and dtype is fp32, kernels with the corresponding template arguments are called.
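A usage sketch of the new argument (illustrative; assumes a CUDA device and half-precision input):
```
import torch
import torch.nn.functional as F

x = torch.randn(16, 1000, dtype=torch.half, device='cuda')
# Accumulate and return the softmax in fp32 without first materializing an
# fp32 copy of the input.
y = F.softmax(x, dim=-1, dtype=torch.float32)
```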
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11719
Reviewed By: ezyang
Differential Revision: D10175514
Pulled By: zou3519
fbshipit-source-id: 06d285af91a0b659932236d41ad63b787eeed243
Summary:
Obviously, the grads of the conv weight and conv input are not relevant to the bias, but the original `convXd_input` and `convXd_weight` methods receive a `bias` parameter. What's more, while the doc says `bias` should have the shape `(out_channels,)`, one will get a `RuntimeError` if bias != None and in_channels != out_channels, because the weight of a transposed conv has the shape `(in_channels, out_channels, kH, kW)` while the weight of a vanilla conv has the shape `(out_channels, in_channels, kH, kW)`:
```
RuntimeError: Given transposed=1, weight of size [channel1, channel2, kH, kW], expected bias to be 1-dimensional with channel2 elements, but got bias of size [channel1] instead
```
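A usage sketch of the gradient helpers without the bias argument (shapes are arbitrary; assumes the current `torch.nn.grad` signatures):
```
import torch
from torch.nn import grad as nn_grad

x = torch.randn(1, 3, 8, 8)
w = torch.randn(6, 3, 3, 3)
gout = torch.randn(1, 6, 6, 6)
# Gradients w.r.t. input and weight; no bias is involved in either computation.
gx = nn_grad.conv2d_input(x.shape, w, gout)
gw = nn_grad.conv2d_weight(x, w.shape, gout)
```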
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12281
Differential Revision: D10217370
Pulled By: ezyang
fbshipit-source-id: bc00b439e5ae539276a5e678bdb92af700197bb2
Summary:
- fixes https://github.com/pytorch/pytorch/issues/10723
- migrate PReLU to ATen and deprecate legacy PReLU
- performance:
CPU with weight.numel() = 1
```
>>> m = nn.PReLU()
>>> x = torch.randn(100, 100, 100, requires_grad=True)
>>> %timeit -r 100 y = m(x)
100 loops, best of 100: 9.43 ms per loop
>>> y = m(x).sum()
>>> %timeit -r 100 y.backward(retain_graph=True)
10 loops, best of 100: 24.4 ms per loop
>>> m = nn.PReLU()
>>> x = torch.randn(100, 100, 100, requires_grad=True)
>>> %timeit -r 100 y = m(x)
1000 loops, best of 100: 695 µs per loop
>>> y = m(x).sum()
>>> %timeit -r 100 y.backward(retain_graph=True)
100 loops, best of 100: 2.47 ms per loop
```
CPU with weight.numel() = channels
```
>>> m = nn.PReLU(100)
>>> x = torch.randn(100, 100, 100, requires_grad=True)
>>> %timeit -r 100 y = m(x)
1000 loops, best of 100: 603 µs per loop
>>> y = m(x).sum()
>>> %timeit -r 100 y.backward(retain_graph=True)
100 loops, best of 100: 13.3 ms per loop
>>> m = nn.PReLU(100)
>>> x = torch.randn(100, 100, 100, requires_grad=True)
>>> %timeit -r 100 y = m(x)
1000 loops, best of 100: 655 µs per loop
>>> y = m(x).sum()
>>> %timeit -r 100 y.backward(retain_graph=True)
100 loops, best of 100: 2.45 ms per loop
```
CUDA with weight.numel() = 1
```
>>> m = nn.PReLU().cuda()
>>> x = torch.randn(100, 100, 100, requires_grad=True).cuda()
>>> %timeit -r 100 torch.cuda.synchronize(); y = m(x); torch.cuda.synchronize();
10000 loops, best of 100: 187 µs per loop
>>> y = m(x).sum()
>>> %timeit -r 100 torch.cuda.synchronize(); y.backward(retain_graph=True); torch.cuda.synchronize();
100 loops, best of 100: 2.01 ms per loop
>>> m = nn.PReLU().cuda()
>>> x = torch.randn(100, 100, 100, requires_grad=True).cuda()
>>> %timeit -r 100 torch.cuda.synchronize(); y = m(x); torch.cuda.synchronize();
1000 loops, best of 100: 195 µs per loop
>>> y = m(x).sum()
>>> %timeit -r 100 torch.cuda.synchronize(); y.backward(retain_graph=True); torch.cuda.synchronize();
100 loops, best of 100: 2.28 ms per loop
```
CUDA with weight.numel() = channel
```
>>> m = nn.PReLU(100).cuda()
>>> x = torch.randn(100, 100, 100, requires_grad=True).cuda()
>>> %timeit -r 100 torch.cuda.synchronize(); y = m(x); torch.cuda.synchronize();
1000 loops, best of 100: 174 µs per loop
>>> y = m(x).sum()
>>> %timeit -r 100 torch.cuda.synchronize(); y.backward(retain_graph=True); torch.cuda.synchronize();
100 loops, best of 100: 2.27 ms per loop
>>> m = nn.PReLU(100).cuda()
>>> x = torch.randn(100, 100, 100, requires_grad=True).cuda()
>>> %timeit -r 100 torch.cuda.synchronize(); y = m(x); torch.cuda.synchronize();
10000 loops, best of 100: 181 µs per loop
>>> y = m(x).sum()
>>> %timeit -r 100 torch.cuda.synchronize(); y.backward(retain_graph=True); torch.cuda.synchronize();
100 loops, best of 100: 2.26 ms per loop
```
The huge performance regression on CPU when weight.numel() = 1 is addressed by replacing at::CPU_tensor_apply* with parallelized kernels.
ezyang SsnL zou3519 soumith
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11758
Differential Revision: D9995799
Pulled By: weiyangfb
fbshipit-source-id: d289937c78075f46a54dafbde92fab0cc4b5b86e
Summary:
This PR vectorizes the CPU grid sample 2d forward and backward kernels. Specifically,
1. add `.data()` in `TensorAccessor`
2. support non-void return value for declaring CPU kernel stub
3. add `bool at::geometry_is_contiguous(IntList sizes, IntList strides)`
1. The following vectorized CPU primitives are added:
+ `gather<scale>(baseaddr, vindex)`: `result[i] = baseaddr[vindex[i] * scale]`
+ `mask_gather<scale>(src, baseaddr, vindex, mask)`: `result[i] = mask[i] ? baseaddr[vindex[i] * scale] : src[i]`.
+ comparison ops
+ binary logical ops
+ `min(a, b)`
+ `cast<dst_t, src_t>(src_vec)`: changing dtype but keeping the bit representation
+ `blendv(a, b, mask)`: `result[i] = mask[i] ? b[i] : a[i]`.
+ ctor with multiple values (i.e., `setr`)
+ `arange(start = 0, step = 1)`: constructs a vector with values specified by the arange parameters
+ `convert_to_int_of_same_size(vec)`: convert floating point vector to corresponding integral type of same size
+ `interleave2(a, b)` & `deinterleave2(x, y)`: interleave or deinterleaves two vectors. E.g., for `interleave`:
```
inputs:
{a0, a1, a2, a3, a4, a5, a6, a7}
{b0, b1, b2, b3, b4, b5, b6, b7}
outputs:
{a0, b0, a1, b1, a2, b2, a3, b3}
{a4, b4, a5, b5, a6, b6, a7, b7}
```
2. Grid sample CPU kernel implementations are described in the following note (also in `GridSampleKernel.cpp`):
```
NOTE [ Grid Sample CPU Kernels ]
Implementation of vectorized grid sample CPU kernels is divided into three
parts:
1. `ComputeLocation` struct
Transforms grid values into interpolation locations of the input tensor
for a particular spatial dimension, based on the size of that dimension
in the input tensor and the padding mode.
```
```cpp
template<typename scalar_t, GridSamplerPadding padding>
struct ComputeLocation {
using Vec = Vec256<scalar_t>;
// ctor
ComputeLocation(int64_t size);
// Given grid values `in`, return the interpolation locations after
// un-normalization and padding mechanism (elementwise).
Vec apply(const Vec &in) const;
// Similar to `apply`, but also returns `d apply(in) / d in`
// (elementwise).
// this is often used in gradient computation.
std::pair<Vec, Vec> apply_get_grad(const Vec &in) const;
};
```
```
2. `ApplyGridSample` struct
Owns N `ComputeLocation` structs, where N is the number of spatial
dimensions. Given N input grid vectors (one for each spatial dimension)
and spatial offset, it gets the interpolation locations from
`ComputeLocation`s, applies interpolation procedure, and then writes to
the output (or grad_input & grad_grid in backward).
```
```cpp
template<typename scalar_t, int spatial_dim,
GridSamplerInterpolation interp,
GridSamplerPadding padding>
struct ApplyGridSample {
// ctor
ApplyGridSample(const TensorAccessor<scalar_t, 4>& input);
// Applies grid sampling (forward) procedure:
// 1. computes interpolation locations from grid values `grid_x` and
// `grid_y`,
// 2. interpolates output values using the locations and input data
// in `inp_slice`, and
// 3. writes the first `len` values in the interpolated vector to
// `out_slice` with spatial offset being `offset`.
//
// This assumes that `grid_x` and `grid_y` all contain valid grid
// values \in [-1, 1], even at indices greater than `len`.
//
// The `*_slice` argument names mean samples within a batch (i.e.,
// with the batch dimension sliced out).
void forward(TensorAccessor<scalar_t, 3>& out_slice,
const TensorAccessor<scalar_t, 3>& inp_slice,
int64_t offset, const Vec& grid_x, const Vec& grid_y,
int64_t len) const;
// Applies grid sampling (backward) procedure. Arguments semantics
// and strategy are similar to those of `forward`.
void backward(TensorAccessor<scalar_t, 3>& gInp_slice,
TensorAccessor<scalar_t, 3>& gGrid_slice,
const TensorAccessor<scalar_t, 3>& gOut_slice,
const TensorAccessor<scalar_t, 3>& inp_slice,
int64_t offset, const Vec& grid_x, const Vec& grid_y,
int64_t len) const;
}
```
```
3. `grid_sample_2d_grid_slice_iterator` function
Among the tensors we work with, we know that the output tensors are
contiguous (i.e., `output` in forward, and `grad_input` & `grad_grid` in
backward), we need to randomly read `input` anyways, and `grad_output`
usually comes from autograd and is often contiguous. So we base our
iterating strategy on the geometry of grid.
`grid_sample_2d_grid_slice_iterator` function provides an abstraction to
efficiently iterate through a `grid` slice (without the batch dimension).
See comments of that function on the specific cases and strategies used.
```
```cpp
template<typename scalar_t, typename ApplyFn>
void grid_sample_2d_grid_slice_iterator(
const TensorAccessor<scalar_t, 3>& grid_slice,
const ApplyFn &apply_fn);
// `apply_fn` is a function/lambda that can be called as if it has
// declaration:
// void apply_fn(const Vec256<scalar_t>& grid_x,
// const Vec256<scalar_t>& grid_y,
// int64_t spatial_offset, int64_t len);
```
```
`apply_fn` will be called multiple times, and together cover the entire
output spatial space. Therefore, e.g., to implement forward 2d grid
sample, we can do
```
```cpp
ApplyGridSample<scalar_t, 2, interp, padding> grid_sample(input_accessor);
for (int n = 0; n < input_accessor.size(0); n++) {
grid_sample_2d_grid_slice_iterator(
grid_accessor[n],
[&](const Vec256<scalar_t>& grid_x, const Vec256<scalar_t>& grid_y,
int64_t spatial_offset, int64_t len) {
grid_sample.forward(out_accessor[n], input_accessor[n],
spatial_offset, grid_x, grid_y, len);
});
}
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10980
Differential Revision: D9564867
Pulled By: SsnL
fbshipit-source-id: 5b7c3c7ea63af00eec230ae9ee1c3e6c6c9679b4
Summary:
Add the gpu kernel version.
The parallelism I went with performs poorly when there are a large number of vectors that are all short, as I don't allocate the thread pool to wrap in that case.
Test Plan
---------
```
python -m unittest test_torch.TestTorch.test_pdist_{empty,scipy} test_nn.TestNN.test_pdist{,_zeros,_empty_row,_empty_col,_cpu_gradgrad_unimplemented,_cuda_gradgrad_unimplemented} test_jit.TestJitGenerated.test_nn_pdist
```
Current performance specs are a little underwhelming; I'm in the process of debugging.
size | torch | torch cuda | scipy
-----|-------|------------|------
16 x 16 | 9.13 µs ± 3.55 µs | 9.86 µs ± 81.5 ns | 15.8 µs ± 1.2 µs
16 x 1024 | 15 µs ± 224 ns | 9.48 µs ± 88.7 ns | 88.7 µs ± 8.83 µs
1024 x 16 | 852 µs ± 6.03 µs | 7.84 ms ± 6.22 µs | 4.7 ms ± 166 µs
1024 x 1024 | 34.1 ms ± 803 µs | 11.5 ms ± 6.24 µs | 273 ms ± 6.7 ms
2048 x 2048 | 261 ms ± 3.5 ms | 77.5 ms ± 41.5 µs | 2.5 s ± 97.6 ms
4096 x 4096 | 2.37 s ± 154 ms | 636 ms ± 2.97 µs | 25.9 s ± 394 ms
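A usage sketch (illustrative; assumes a CUDA device for the GPU path):
```
import torch

x = torch.randn(1024, 16, device='cuda')
# Condensed pairwise Euclidean distances, analogous to scipy.spatial.distance.pdist.
d = torch.pdist(x)   # shape: (1024 * 1023 // 2,)
```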
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11102
Differential Revision: D9697305
Pulled By: erikbrinkman
fbshipit-source-id: 2b4f4b816c02b3715a85d8db3f4e77479d19bb99
Summary:
* purge hcSPARSE now that rocSPARSE is available
* integrate a custom hcc and HIP
* hcc brings two important compiler fixes (fixes hundreds of unit tests)
* HIP brings a smart dispatcher that allows us to avoid a lot of static_casts (we haven't yet removed the automatic static_casts but this catches some occurrences the script did not catch)
* mark 5 unit tests as skipped that have regressed w/ the new hcc (we don't know yet what is at fault)
* optimize bitonic sort - the comparator is always an empty struct - therefore passing it by value saves at least 3 bytes. It also removes an ambiguity around passing references to `__global__` functions
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11198
Differential Revision: D9652340
Pulled By: ezyang
fbshipit-source-id: f5af1d891189da820e3d13b7bed91a7a43154690
Summary:
* improve docker packages (install OpenBLAS to have at-compile-time LAPACK functionality w/ optimizations for both Intel and AMD CPUs)
* integrate rocFFT (i.e., enable Fourier functionality)
* fix bugs in ROCm caused by wrong warp size
* enable more test sets, skip the tests that don't work on ROCm yet
* don't disable asserts any longer in hipification
* small improvements
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10893
Differential Revision: D9615053
Pulled By: ezyang
fbshipit-source-id: 864b4d27bf089421f7dfd8065e5017f9ea2f7b3b
Summary:
Also add single grad whitelist to the jit test
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10782
Reviewed By: ezyang
Differential Revision: D9583378
Pulled By: erikbrinkman
fbshipit-source-id: 069e5ae68ea7f3524dec39cf1d5fe9cd53941944
Summary:
When 0-sized dimension support is added, we expect an empty sparse tensor to be a 1-dimensional tensor of size `[0]`, with `sparseDims == 1` and `denseDims == 0`. Also, we expect the following invariants to be preserved at all times:
```
_sparseDims + _denseDims = len(shape)
_indices.shape: dimensionality: 2, shape: (_sparseDims, nnz)
_values.shape: dimensionality: 1 + _denseDims. shape: (nnz, shape[_sparseDims:])
```
This PR fixes various places where the invariants are not strictly enforced when 0-sized dimension support is enabled.
Tested and `test_sparse.py` passes locally on both CPU and CUDA with the `USE_TH_SIZE_ZERO_DIM` flag.
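A small illustration of the invariants using the public constructor (the accessor names `sparse_dim()`/`dense_dim()` are the later public spellings of `_sparseDims`/`_denseDims`):
```
import torch

i = torch.tensor([[0, 1, 1],
                  [2, 0, 2]])        # shape (_sparseDims, nnz) = (2, 3)
v = torch.tensor([3.0, 4.0, 5.0])    # shape (nnz,) since _denseDims == 0
s = torch.sparse_coo_tensor(i, v, (2, 3))
assert s.sparse_dim() + s.dense_dim() == s.dim()
```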
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9279
Differential Revision: D8936683
Pulled By: yf225
fbshipit-source-id: 12f5cd7f52233d3b26af6edc20b4cdee045bcb5e
Summary:
This commit adds the ``buffers()`` and ``named_buffers()`` methods as
analogues of ``parameters()`` and ``named_parameters()``.
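A usage sketch (illustrative module choice):
```
import torch.nn as nn

bn = nn.BatchNorm1d(4)
# Iterate over buffers analogously to parameters()/named_parameters().
for name, buf in bn.named_buffers():
    print(name, tuple(buf.shape))   # e.g. running_mean (4,), running_var (4,)
```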
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10554
Reviewed By: SsnL
Differential Revision: D9367762
Pulled By: jma127
fbshipit-source-id: f2042e46a7e833dce40cb41681dbd80d7885c74e
Summary:
This is the first of two changes that are supposed to improve how we handle RNNs in the JIT. They still get traced as `PythonOp`s, but now it will be much easier to actually expose them to the JIT as e.g. `aten::lstm`, and ignore the Python interpreter entirely. This needs some symbolic adjustments that will be part of a second PR.
Even when we fix symbolics, there will still be a bit of a problem with statefulness of the cuDNN API (we need a mutable cache for the dropout state, but our IR has no way of representing that).
zdevito ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10481
Reviewed By: ezyang
Differential Revision: D9341113
Pulled By: apaszke
fbshipit-source-id: 0ae30ead72a1b12044b7c12369d11e5ca8ec30b5
Summary:
Two tests in the 'nn' test bucket may fail when the torch.half
(float16) data type is used. The assertions used in the tests
intend to allow slight floating point imprecision in the results,
but the tolerances used for the comparisons are too strict for
the half type.
Relax the tolerances so that slight float16 imprecision won't
cause test failures.
The affected tests are:
- test_variable_sequence_cuda
- test_Conv2d_groups_nobias
For more information, see issue:
https://github.com/pytorch/pytorch/issues/7420
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10519
Differential Revision: D9343751
Pulled By: soumith
fbshipit-source-id: 90aedf48f6e22dd4fed9c7bde7cd7c7b6885845a
Summary:
I've implemented affine grid generation for volumetric (5d) inputs. The implementation is based off of the spatial implementation, extended by one dimension. I have a few questions about my implementation vs. the existing one that I will add inline.
I have some extensive test cases for the forward pass here: https://gist.github.com/elistevens/6e3bfb20d8d0652b83bd16b3e911285b However, they use `pytest.fixture` extensively, so I'm not sure the best way to incorporate them into the pytorch test suite. Suggestions? I have not tested backwards at all.
Diff probably best viewed with whitespace changes ignored.
Thanks for considering!
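A usage sketch of the volumetric case (illustrative shapes; the later `align_corners` keyword is omitted):
```
import torch
import torch.nn.functional as F

# Volumetric (5-D) case: theta is (N, 3, 4) and size is (N, C, D, H, W).
theta = torch.zeros(2, 3, 4)
theta[:, 0, 0] = theta[:, 1, 1] = theta[:, 2, 2] = 1.0   # identity transform
grid = F.affine_grid(theta, (2, 1, 4, 8, 8))             # -> (2, 4, 8, 8, 3)
out = F.grid_sample(torch.randn(2, 1, 4, 8, 8), grid)
```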
Pull Request resolved: https://github.com/pytorch/pytorch/pull/8322
Differential Revision: D9332335
Pulled By: SsnL
fbshipit-source-id: 1b3a91d078ef41a6d0a800514e49298fd817e4df
Summary:
Closes #9702.
cc jph00
Commit structure:
1. Change the index calculation logic. I will explain using 1-D for simplicity.
Previously we have (in pseudo code):
```
// 1. get the float locations from grid
scalar_t x = from_grid()
// 2. find the integral surrounding indices
int x_left = floor(x)
int x_right = x_left + 1
// 3. calculate the linear interpolate weights
scalar_t w_left = x_right - x
scalar_t w_right = x - x_left
// 4. manipulate the integral surrounding indices if needed
// (e.g., clip for border padding_mode)
x_left = manipulate(x_left, padding_mode)
x_right = manipulate(x_right, padding_mode)
// 5. interpolate
output_val = interpolate(w_left, w_right, x_left, x_right)
```
This is actually incorrect (and also unintuitive) because it calculates the
weights before manipulating out-of-boundary indices. Fortunately, this
isn't manifested in either of the currently supported modes, `'zeros'` and
`'border'` padding:
+ `'zeros'`: doesn't clip
+ `'border'`: clips, but for out-of-bound `x` both `x_left` and `x_right` are
clipped to the same value, so weights don't matter
But this is a problem with reflection padding, since after each time we reflect,
the values of `w_left` and `w_right` should be swapped.
So in this commit I change the algorithm to (numbers corresponding to the
ordering in the above pseudo-code)
```
1. get float location
4. clip the float location
2. find the integral surrounding indices
3. calculate the linear interpolate weights
```
In the backward, because of this change, I need to add new variables to track
`d manipulate_output / d manipulate_input`, which is basically a multiplier
on the gradient calculated for `grid`. From benchmarking, this addition doesn't
cause obvious slowdowns.
2. Implement reflection padding. The indices will keep being reflected until
they become within boundary.
Added variant of `clip_coordinates` and `reflect_coordinates` to be used in
backward. E.g.,
```cpp
// clip_coordinates_set_grad works similarly to clip_coordinates except that
// it also returns the `d output / d input` via pointer argument `grad_in`.
// This is useful in the backward pass of grid_sampler.
scalar_t clip_coordinates_set_grad(scalar_t in, int64_t clip_limit, scalar_t *grad_in)
```
For example, if `in` is clipped in `'border'` mode, `grad_in` is set to `0`.
If `in` is reflected **odd** times in `'reflection'` mode, `grad_in`
is set to `-1`.
3. Implement nearest interpolation.
4. Add test cases
5. Add better input checking
Discussed with goldsborough about moving `operator<<` of `at::Device`,
`at::DeviceType` and `at::Layout` into `at` namespace. (Otherwise
`AT_CHECK` can't find them.)
6. Support empty tensors. cc gchanan
+ Make empty tensors not acceptable by cudnn.
+ Add `AT_ASSERT(kernel block size > 0)` if using `GET_BLOCKS`
+ Cache `numel` in `TensorGeometry`
I was going to use `numel` to test if cudnn descriptor should accept a
tensor, but it isn't used eventually. I can revert this if needed.
7. Add more test cases, including on input checking and empty tensors
8. Remove an obsolete comment
9. Update docs. Manually tested by generating docs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10051
Differential Revision: D9123950
Pulled By: SsnL
fbshipit-source-id: ac3b4a0a36b39b5d02e83666cc6730111ce216f6
Summary:
While waiting for dropout to be fully ported to ATen, here's a performance fix for the most common dropout case. Dropout is still a Python function; I just added an efficient path to it. I could not make in-place work, because the generator always generates `return self` for in-place functions, and I need to return both the original tensor and the mask, so the in-place version goes through the existing path. Even with the non-inplace version, since the mask is now a ByteTensor, the memory used is only a little larger than for in-place dropout, thanks to savings on the mask.
Once dropout is moved to aten, these kernels still can be used for efficient implementation.
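A minimal Python sketch of the mask-returning approach described above (illustrative only, not the fused kernel):
```
import torch

def dropout_with_mask(x, p=0.5):
    # Sample a byte mask once, scale by 1/(1-p), and return both the output and
    # the mask so the backward pass can reuse it.
    mask = (torch.rand_like(x) > p).to(torch.uint8)
    return x * mask.to(x.dtype) / (1.0 - p), mask
```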
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9666
Reviewed By: SsnL
Differential Revision: D8948077
Pulled By: ezyang
fbshipit-source-id: 52990ef769471d957e464af635e5f9b4e519567a
Summary:
- Fixes #9141, #9301
- use logsigmoid at multilabel_soft_margin_loss to make it more stable (NOT fixing legacy MultiLabelSoftMarginCriterion)
- return shape (N) instead of (N, C) to match the behavior of MultiMarginLoss
- Note that with this PR, the following behavior is expected:
```
loss = F.multilabel_soft_margin_loss(outputs, labels, reduction='none')
loss_mean = F.multilabel_soft_margin_loss(outputs, labels, reduction='elementwise_mean')
loss_sum = F.multilabel_soft_margin_loss(outputs, labels, reduction='sum')
loss.sum() == loss_sum # True
loss.mean() == loss_mean # True
```
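For reference, a minimal sketch of the logsigmoid formulation with the (N)-shaped result (illustrative, not the library code):
```
import torch
import torch.nn.functional as F

def multilabel_soft_margin(x, y):
    # Numerically stable per-sample loss, averaged over classes so the result
    # has shape (N,), matching MultiMarginLoss.
    return -(y * F.logsigmoid(x) + (1 - y) * F.logsigmoid(-x)).mean(dim=-1)
```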
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9965
Differential Revision: D9038402
Pulled By: weiyangfb
fbshipit-source-id: 0fa94c7b3cd370ea62bd6333f1a0e9bd0b8ccbb9
Summary:
There is a regression in softmin in 0.4.1 that was not present in 0.4.0. The behavior of softmin(x) should match softmax(-x); however, it is instead implemented (in v0.4.1) as -softmax(x). These are not the same. The fix is trivial because the bug is due to operator precedence.
This is a major regression that broke my training. I'm not sure how a unit test did not catch this.
```
x = torch.tensor([1, 2, 3.5, 4])
print(F.softmin(x, dim=0)) # this has the wrong output in 0.4.1 but correct in 0.4.0
print(F.softmax(-x, dim=0)) # this is what softmax should be
print(F.softmax(x, dim=0))
print(-F.softmax(x, dim=0)) # this is how softmax is implemented incorrectly
```
In 0.4.1 this produces
tensor([-0.0278, -0.0755, -0.3385, -0.5581])
tensor([0.6668, 0.2453, 0.0547, 0.0332])
tensor([0.0278, 0.0755, 0.3385, 0.5581])
tensor([-0.0278, -0.0755, -0.3385, -0.5581])
In 0.4.0 this produces the correct values
tensor([ 0.6668, 0.2453, 0.0547, 0.0332])
tensor([ 0.6668, 0.2453, 0.0547, 0.0332])
tensor([ 0.0278, 0.0755, 0.3385, 0.5581])
tensor([-0.0278, -0.0755, -0.3385, -0.5581])
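A minimal illustration of the intended definition (not the library implementation):
```
import torch
import torch.nn.functional as F

def softmin(x, dim):
    # softmin(x) == softmax(-x); the 0.4.1 bug effectively computed -softmax(x).
    return F.softmax(-x, dim=dim)

x = torch.tensor([1.0, 2.0, 3.5, 4.0])
assert torch.allclose(softmin(x, dim=0), F.softmax(-x, dim=0))
```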
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10066
Differential Revision: D9106995
Pulled By: soumith
fbshipit-source-id: 7332503c6077e8461ad6cd72422c749cf6ca595b
Summary:
`_pointwise_loss` has some Python special-casing; we converted reduction to ATen enums too early.
Fixes #10009
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10018
Differential Revision: D9075489
Pulled By: li-roy
fbshipit-source-id: 4bf2f5e2911e757602c699ee1ec58223c61d0162
Summary:
The CPU and CUDA variants are a direct transposition of Graves et al.'s description of the algorithm, with the
modification that it is in log space.
There is also a binding for the (much faster) CuDNN implementation.
This could eventually fix#3420
I still need to add tests (TestNN seems much more elaborate than the other testing) and fix the bugs that invariably turn up during testing. Also, I want to add some more code comments.
I could use feedback on all sorts of things, including:
- Type handling (cuda vs. cpu for the int tensors, dtype for the int tensors)
- Input convention. I use log probs because that is what the gradients are for.
- Launch parameters for the kernels
- Errors and omissions and anything else I'm not even aware of.
Thank you for looking!
In terms of performance it looks like it is superficially comparable to WarpCTC (though I have not systematically investigated this).
I have read that CuDNN is much faster than other implementations because it does *not* use log-space, but also that its gathering step is much, much faster (I avoided trying tricky things there, as they seem to contribute to warpctc's fragility). I might think some more about which existing torch function (scatter or index..) I could learn from for that step.
Average timings for the kernels from nvprof for some size:
```
CuDNN:
60.464us compute_alphas_and_betas
16.755us compute_grads_deterministic
Cuda:
121.06us ctc_loss_backward_collect_gpu_kernel (= grads)
109.88us ctc_loss_gpu_kernel (= alphas)
98.517us ctc_loss_backward_betas_gpu_kernel (= betas)
WarpCTC:
299.74us compute_betas_and_grad_kernel
66.977us compute_alpha_kernel
```
Of course, I still have the (silly) outer blocks loop rather than computing consecutive `s` in each thread which I might change, and there are a few other things where one could look for better implementations.
Finally, it might not be unreasonable to start with these implementations, as the performance of the loss has to be seen in the context of the entire training computation, so this would likely dilute the relative speedup considerably.
My performance measuring testing script:
```
import timeit
import sys
import torch
num_labels = 10
target_length = 30
input_length = 50
eps = 1e-5
BLANK = 0#num_labels
batch_size = 16
torch.manual_seed(5)
activations = torch.randn(input_length, batch_size, num_labels + 1)
log_probs = torch.log_softmax(activations, 2)
probs = torch.exp(log_probs)
targets = torch.randint(1, num_labels+1, (batch_size * target_length,), dtype=torch.long)
targets_2d = targets.view(batch_size, target_length)
target_lengths = torch.tensor(batch_size*[target_length])
input_lengths = torch.tensor(batch_size*[input_length])
activations = log_probs.detach()
def time_cuda_ctc_loss(grout, *args):
torch.cuda.synchronize()
culo, culog_alpha = torch._ctc_loss(*args)
g, = torch.autograd.grad(culo, args[0], grout)
torch.cuda.synchronize()
def time_cudnn_ctc_loss(groupt, *args):
torch.cuda.synchronize()
culo, cugra= torch._cudnn_ctc_loss(*args)
g, = torch.autograd.grad(culo, args[0], grout)
torch.cuda.synchronize()
def time_warp_ctc_loss(grout, *args):
torch.cuda.synchronize()
culo = warpctc.ctc_loss(*args, blank_label=BLANK, size_average=False, length_average=False, reduce=False)
g, = torch.autograd.grad(culo, args[0], grout)
torch.cuda.synchronize()
if sys.argv[1] == 'cuda':
lpcu = log_probs.float().cuda().detach().requires_grad_()
args = [lpcu, targets_2d.cuda(), input_lengths.cuda(), target_lengths.cuda(), BLANK]
grout = lpcu.new_ones((batch_size,))
torch.cuda.synchronize()
print(timeit.repeat("time_cuda_ctc_loss(grout, *args)", number=1000, globals=globals()))
elif sys.argv[1] == 'cudnn':
lpcu = log_probs.float().cuda().detach().requires_grad_()
args = [lpcu, targets.int(), input_lengths.int(), target_lengths.int(), BLANK, True]
grout = lpcu.new_ones((batch_size,))
torch.cuda.synchronize()
print(timeit.repeat("time_cudnn_ctc_loss(grout, *args)", number=1000, globals=globals()))
elif sys.argv[1] == 'warpctc':
import warpctc
activations = activations.cuda().detach().requires_grad_()
args = [activations, input_lengths.int(), targets.int(), target_lengths.int()]
grout = activations.new_ones((batch_size,), device='cpu')
torch.cuda.synchronize()
print(timeit.repeat("time_warp_ctc_loss(grout, *args)", number=1000, globals=globals()))
```
I'll also link to a notebook that I used for writing up the algorithm in simple form and then test the against implementations against it.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9628
Differential Revision: D8952453
Pulled By: ezyang
fbshipit-source-id: 18e073f40c2d01a7c96c1cdd41f6c70a06e35860
Summary:
It implements per-channel alpha_dropout. It also creates corresponding function classes and unifies the process of dropout and alpha_dropout.
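A usage sketch, assuming the per-channel variant is exposed as `feature_alpha_dropout` (illustrative shapes):
```
import torch
import torch.nn.functional as F

x = torch.randn(8, 16, 32, 32)
# Per-channel variant: whole feature maps are dropped, and the output keeps the
# zero-mean / unit-variance property required for SELU networks.
y = F.feature_alpha_dropout(x, p=0.2, training=True)
```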
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9073
Differential Revision: D8727008
Pulled By: ezyang
fbshipit-source-id: 9d509f9c5db4e98f7b698cdfc4443505a4d2b331
Summary:
THNN was accumulating the result of reduction loss functions
into real instead of accreal. This was causing precision issues with
MSELoss.
This patch only fixes MSELoss. Some of the other losses exhibit bad precision as well (because they accumulate into real instead of accreal) and require more investigation. I will open an issue for those (#9286)
Fixes #8710
cc li-roy SsnL
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9287
Reviewed By: SsnL
Differential Revision: D8775708
Pulled By: zou3519
fbshipit-source-id: d1a1f159deee0cb90fd8e81e63b246115eea8e9e
Summary:
Commits:
1. In extension doc, get rid of all references to `Variable`s (Closes #6947)
+ also add minor improvements
+ also added a section with links to cpp extension :) goldsborough
+ removed mentions of `autograd.Function.requires_grad` as it's not used anywhere and hardcoded to return `Py_True`.
2. Fix several sphinx warnings
3. Change `*` in equations in `module/conv.py` to `\times`
4. Fix docs for `Fold` and `Unfold`.
+ Added a better shape check for `Fold` (it previously could give a bogus result when there are not enough blocks). Added a test for the checks.
5. Fix doc saying `trtrs` not available for CUDA (#9247)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9239
Reviewed By: soumith
Differential Revision: D8762492
Pulled By: SsnL
fbshipit-source-id: 13cd91128981a94493d5efdf250c40465f84346a
Summary:
This PR addresses #5823.
* fix docstring: upsample doesn't support LongTensor
* Enable float scale up & down sampling for linear/bilinear/trilinear modes. (following SsnL 's commit)
* Enable float scale up & down sampling for nearest mode. Note that our implementation is slightly different from TF in that there's actually no "align_corners" concept in this mode.
* Add a new interpolate function API to replace upsample. Add deprecate warning for upsample.
* Add an area mode which is essentially Adaptive_average_pooling into resize_image.
* Add test cases for interpolate in test_nn.py
* Add a few comments to help understand *linear interpolation code.
* There is only "*cubic" mode missing in resize_images API which is pretty useful in practice. And it's labeled as hackamonth here #1552. I discussed with SsnL that we probably want to implement all new ops in ATen instead of THNN/THCUNN. Depending on the priority, I could either put it in my queue or leave it for a HAMer.
* After the change, the files named *Upsampling*.c work for both up- and downsampling. I could rename the files if needed.
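A usage sketch of the new API with non-integer scale factors, as described above (illustrative shapes):
```
import torch
import torch.nn.functional as F

x = torch.randn(1, 3, 20, 20)
# Non-integer scale factors now work for both up- and downsampling.
up = F.interpolate(x, scale_factor=2.5, mode='bilinear', align_corners=False)
down = F.interpolate(x, scale_factor=0.4, mode='nearest')
```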
Differential Revision: D8729635
Pulled By: ailzhang
fbshipit-source-id: a98dc5e1f587fce17606b5764db695366a6bb56b
Summary:
1. Let `ModuleTest` raise when they fail on non-contiguous inputs. Fix legacy modules.
2. Fix BN (both THNN and cuDNN) not working on non-contiguous inputs.
3. Fix CUDA EmbeddingBag not working on non-contiguous inputs. To prevent calling `.contiguous()` on the input in both `forward` and `backward`,
a. prefix all current `embedding_bag*` functions with `_`, indicating that they require input to be contiguous (there is a check in each function).
b. create `embedding_bag`, which makes input arguments `.contiguous()`, and calls `_embedding_bag`
4. Make many ATen `embedding*` functions work on non-contiguous inputs so we don't need to call `input = input.contiguous()` in Python `nn.functional.embedding`.
5. Fix dense-sparse addition when the sparse input is not coalesced and the indices or values tensor is not contiguous. This came up in the test cases of Embedding modules with `sparse=True`. Added tests.
6. Update `TensorUtils.cpp` to use `AT_*` macros.
Request:
review from cpuhrsch on the `Embedding*` changes.
review from ezyang on ATen sparse & BN changes.
Closes https://github.com/pytorch/pytorch/pull/9114
Differential Revision: D8717299
Pulled By: SsnL
fbshipit-source-id: 0acc6f1c9522b5b605361e75112c16bbe1e98527
Summary:
The tests were using the old args, which caused them to emit a lot of deprecation warnings.
Closes #9103.
Reviewed By: ezyang
Differential Revision: D8720581
Pulled By: li-roy
fbshipit-source-id: 3b79527f6fe862fb48b99a6394e8d7b89fc7a8c8
Summary:
As we left weight to be the last calculated weight in eval mode, we need to detach it from the computation in order to facilitate using backward.
The typical use case is in GANs when the discriminator has spectral norm, is in eval mode and we want to backprop through the discriminator to get weight gradients for the generator.
Closes https://github.com/pytorch/pytorch/pull/9020
Reviewed By: ezyang
Differential Revision: D8694054
Pulled By: SsnL
fbshipit-source-id: 09ee5843687cac3ed4c40759ac577a14c5371730
* add opencl + fpga context
adds an opencl context inside caffe2/fb which can be used for fpga access
* [Caffe2] Force tensor inference checks to be triggered during testing
We've started to rely on TensorInference functions more for different analysis. This diff ensures that the TensorInference function's result matches what is expected from the definition of the operator.
* Enable building //caffe2:torch with @mode/opt
In @mode/opt, python runs out of a PAR, which breaks a lot of
assumptions in the code about where templates/ folders live relative
to __file__. Rather than introduce hacks with parutil, I simply turn
template_path into a parameter for all the relevant functions and
thread it through from the top level.
* [Caffe2] Fix cost models for DotProduct and Div. Update Tensor Inference for dot product
As title. DotProduct states that output is a 1-D tensor (https://caffe2.ai/docs/operators-catalogue.html#dotproduct) though code suggests it is either 0- or 1-D depending on inputs. TensorInference defined to support implementation.
* [SG-MoE] Add an option to make the experts NOT as components
* [nomnigraph] Rename and fixup convertToNeuralNetOperator API
This will make things a bit cleaner
* no longer symlink THNN.h and THCUNN.h
* forced decoder network (onnx export)
Closes https://github.com/pytorch/translate/pull/95
Add networks in ensemble_export.py to create a forced decoding network from PyTorch NMT checkpoints. This network takes an arbitrary numberized (source, target) pair and returns the model score for the translation, including penalties.
Vocabulary reduction networks are also supported, but note that target indices which are not in the possible_translation_tokens generated for the source input will be trea
* Revert schema change to fix production models
Revert schema change to fix production models
* MockLogDeviceReader - rebase on FIX
# Goal
1), Build a make_mock_log_device_reader using make_mock_reader
2), Replace the real log_device_reader here: https://fburl.com/raihwf1p
# Log by D8151734
Real log_device_reader:
```
I0529 20:29:05.373108 954994 tensor.h:839] Tensor print_net/log of type std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >. Dims: (): read_net/ParseOpenTrainingRow:0
I0529 20:29:05.373244 954994 tensor.h:839] Tensor read_net/ParseOpenTrainin
* [C2/D2][1/n]: Nonnegative-Constrained Optimization -- log barrier
implement log barrier as a regularization method
* Add teacher weight screening.
Add teacher weight sceening according to teacher labels. If teacher label is zero, we do not use the distill loss in the objective function.
* Add NormalizerContext
See task for more detail. This implementation is a copy of what exists for RegularizerContext except for how the parameters are defined in the model_definition thrift file.
I'll try an alternative implementation which overrides the default arguments of functions instead like for argscopes in tensorflow.
https://github.com/pytorch/pytorch/compare/master...MaximeBoucher:update-from-facebook-0939578c068c?expand=1
* Adding cosine similarity option in dot processor
Add pairwise cosine similarity option in dot product.
Add an option to concate dot product and cosine similarity.
Add test cases.
* [nomnigraph][redo] Concat elim for sparseNN
Same as D7962948, which was reverted because Operator Schema was not
defined
* [pytorch] Revert pytorch/pytorch#7918 'Release GIL when copying to shared memory', breaks ASAN
Revert this pytorch diff that breaks ASAN when running Filament in dev mode; in opt mode it gives "bad file descriptor" errors. Looks like a race when copying tensors to shared memory in multiple mp.Queue's (which spawn separate threads).
https://github.com/pytorch/pytorch/pull/7918/files
* [nomnigraph][mobile] Enable nomnigraph by default, use -Oz on nomnigraph related code to reduce code size
enables nomnigraph and reduces codesize
* [Warmup] Allow both offline incremental training and online training
Change plan name on saving side and reading side to support both training type
This diff depends on D8128530 and D8168651.
* Revert D7802642: [Warmup] Allow both offline incremental training and online training
This reverts commit afc213cf9b36cecf75333a788391c4d09f4afccc
@bypass-lint
An infra SEV is better than not reverting this diff.
If you copy this password, see you in SEV Review!
@cause_a_sev_many_files
* Add legacy grad logic to fix div op on old graphs.
Add legacy grad logic to fix div op on old graphs.
* Correctly propagate operator failures
Propagate errors from operators that throw exceptions and return false
* Revert D8374829: [caffe2][nomnigraph][redo] Concat elim for sparseNN
This reverts commit 6dda028c463e54bb5c32188bbbe9202107e188a5
@bypass-lint
An infra SEV is better than not reverting this diff.
If you copy this password, see you in SEV Review!
@cause_a_sev_many_files
* [Caffe2] Added extra_info to core.DeviceOption(), enforced extra_info to be inherited in scope.DeviceScope
extra_info is a newly defined field in DeviceOption proto. This diff added extra_info to the core.DeviceOption(). And, In scope.DeviceScope(), this diff enforce the new scope to inherit the extra_info from old scope.
* [opt] hgdirsync wasn't enabled, merge diverged code
Here's the damage, P59732616 basically xplat was left behind but had
the change from assert to CAFFE_ENFORCE
* OMP parallelism over RoIs for RoIAlign op
Simpler to parallelize over RoIs. Shouldn't affect other uses as it relies on
the number of OMP threads set during startup.
PR: https://github.com/pytorch/pytorch/pull/8562
* Use int64_t for shape in FillOps
to avoid overflow of int32
* Implement Rotated RoIAlign op
Based on Rotated RPNs as explained in https://arxiv.org/abs/1703.01086.
The idea is simple - orientation/angle is added as an RPN
anchor parameter and then the angle is further regressed similar to bbox
coords. There are some additional changes related to NMS and IoU, but besides
that it's a direct extension to Faster-RCNN. Further details in https://fb.quip.com/sZHlA1iMfWPZ.
RoIs are represented in [center_x, center_y, width, height, angle] format.
`angle` repre
* Rotated RoIAlign op CUDA forward implementation
CUDA forward impl for D8415490
* RoIAlignRotated op CUDA backward pass implementation
TSIA
* All remaining fixes to eliminate process_github.sh
Most of this diff has already been reviewed separately, except for the parts relating to _thnn/utils.py and _utils._internal.py
remove skipIf(True, 'Fbcode') line from process_github.sh
replace sed of cpp file with #ifdef to control cudnnDestroy use
undo sync-time deletion of .gitattributes, remove process_github.sh
switch to using _utils._internal rather than try-import-except
This diff also fixes the open-source bug where rebuilds have
* Back out "Revert D7802642: [Warmup] Allow both offline incremental training and online training"
Original commit changeset: 7707d2efe60e The original diff is backout becuase the online trainer package is backed out. This code would only work with new online trainer package
* [easy] improve error log in adagrad op
as title
* re-allow use of thnn_h_path
This fixes cffi usage in OSS
* [4/4] [tum] parallelizing layerNorm for GPU full sync
as title
* add compile=False to pytorch tests, remove hack with pyc
* Add shape and type inference for RowWiseArgMax operator
See title
* Revert D8515341: Back out "Revert D7802642: [Warmup] Allow both offline incremental training and online training"
This reverts commit 78167eeef0af16b60f72c82f9dcdda9b41b4dcbd
@bypass-lint
An infra SEV is better than not reverting this diff.
If you copy this password, see you in SEV Review!
@cause_a_sev_many_files
* [fix-flaky-test] mock_hive_reader_test flaky, because GlobalCounter collects local counts intervally
# Problem
`MockHiveReader` uses `GlobalCounter` to limit `max_examples`.
GlobalCounter on server node collect local counts from worker nodes every 1 sec.
This 1 sec delay makes it impossible to limit exactly to the `max_examples`, it will definitely exceed `max_examples`.
# Plan
Given,
```
Expected num_examples = max_examples + num_examples/sec (Read Speed) x 1 sec (GlobalCounter Sync Int
* [Caffe2] Fix FCGradient cost inference. Prevent overflow in cost inference
FCGradient missed a factor 2 in the `num_outputs == 3` case. Overflow was occurring with flop calculation for FC. Changed types to `uint64_t` to prevent future problems.
* Fix binary ops with empty inputs
Fix binary ops with empty inputs
* Support the filling of input blob with provided data
as title for Biz Integrity case
* Back out "Revert D8515341: Back out "Revert D7802642: [Warmup] Allow both offline incremental training and online training""
Original commit changeset: 30c55dd38816 Original diff is reverted due to introducing bad integration test. Fixed the integration test.
* [c2][easy] improve pack ops error loggings
as desc.
* Add ShapeTypeInference for LpNorm operator
As desc
* Shard test_nn to reduce runtime for each test target
Closes https://github.com/pytorch/pytorch/pull/8793
The current test_nn would time out and be disabled in GreenWarden, and we need to have an option to split it up in order to pass the stress test. Right now GreenWarden roughly allows running 100 test cases in test_nn before timing out, and here we have an option to divide test_nn into 30 shards (with ~40 tests in each shard) to allow for some test suite growth in the future.
* Change default caffe2_streams_per_gpu to 1
* Remove IN_SANDCASTLE from common.py and test_nn.py
We prefer to disable the failing tests through Sandcastle UI instead.
* Add a new class for an updated prof_dag.proto
This diff contains:
- An updated prof_dag.proto that contains blob profiles.
- A class to deserialize this information (serialization is in a follow up diff)
- Update to separate profiling information from NeuralNet (and use it as part of the class above).
- Unit tests
* Lambdarank for SparseNN
This diff adds a lambda_rank_layer for SparseNN.
changes include
1) Adds support for multi sessions in c2 op
2) Adds support for two different loss functions in c2 op
3) Unit tests for op
* Revert D8586950: Back out "Revert D8515341: Back out "Revert D7802642: [Warmup] Allow both offline incremental training and online training""
This reverts commit 012220ed63eccc35659a57b31d16a3625da6317b
@bypass-lint
An infra SEV is better than not reverting this diff.
If you copy this password, see you in SEV Review!
@cause_a_sev_many_files
* [easy] A few fixups to multithread predictor benchmark
(1) support perf on T6 server
(2) remove dead code
* fix a bug about the map size
as title
* Fix reduce sum on in-place case.
Fix reduce sum on in-place case.
* [Warmup] Reland reverted diff Allow both offline incremental training and online training
Closes https://github.com/pytorch/pytorch/pull/8827
Fix the net transform integration test. Allow offline and online trainers to coexist (D7802642).
* Add StoreHandlerNotAvailableException
Add an exception for a store that is not available or has been
deleted.
* Use exception handling for fault tolerance, missing KV store
Remove status blobs from communication ops so that exceptions propagate on
failure.
* [C2/D2][2/n]: Nonnegative-Constrained Optimization -- bounded grad proj
for simple bounded constrained optimization, incl non-negative box constraints.
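For context, a minimal PyTorch sketch of bounded gradient projection under a non-negative box constraint; this is illustrative only and is not the Caffe2/D2 operator added here:
```
import torch

def projected_sgd_step(w, grad, lr=0.1, lower=0.0, upper=None):
    """One projected-gradient step: take a plain SGD step, then clamp the
    iterate back into the feasible box [lower, upper]."""
    w = w - lr * grad
    return w.clamp(min=lower) if upper is None else w.clamp(min=lower, max=upper)

w = torch.tensor([0.2, -0.1, 0.5])
g = torch.tensor([3.0, -2.0, 1.0])
print(projected_sgd_step(w, g))  # entries that step negative are projected back to 0
```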
* [GanH]: Adaptive Weighting with More Estimations
With the positivity optimization implemented, we now learn adaptive weights with different
parameterizations.
This improves parameter estimation and training stability.
* Revert some changes for landing
* Remove AutoNoGIL in StorageSharing
* Temporarily disable net_tests
* Revert "[Caffe2] Force tensor inference checks to be triggered during testing"
This reverts commit 67ef05c22b2f71b4a489695384932f968384a2a4.
* Revert "Fix reduce sum on in-place case."
This reverts commit 6cb8a8e1b3db7b6d20941b0053e3f3836068eb64.
* Revert "Revert "Fix reduce sum on in-place case.""
This reverts commit 130a257c0893dc09f4bd6e6a45d112261807fd2c.
* Add pos_weight argument to nn.BCEWithLogitsLoss and F.binary_cross_entropy_with_logits (#5660)
- Add an option to control precision/recall in imbalanced datasets
- Add tests (but new_criterion_tests)
* Move pos_weight to the end of args list in the documentation.
`pos_weight` was moved to the end because it is the last argument in both
`nn.BCEWithLogitsLoss` and `binary_cross_entropy_with_logits`
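A small usage sketch of the new argument (the class imbalance ratio below is made up):
```
import torch
import torch.nn as nn

# Suppose positives are 10x rarer than negatives; weight them up accordingly.
pos_weight = torch.tensor([10.0])
criterion = nn.BCEWithLogitsLoss(pos_weight=pos_weight)

logits = torch.randn(8, 1)
targets = torch.randint(0, 2, (8, 1)).float()
loss = criterion(logits, targets)
```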
* Spectral norm improvements
- Don't do power iterations on the weight in eval mode
To facilitate this, register the weight as a buffer so that a module with
spectral norm can be used in eval mode immediately after loading a state dict (#8208)
- Use weight instead of weight_orig as weight when removing
spectral norm
- Add dim parameter in case the normalization should occur w.r.t.
a dimension other than 0 (#7865)
* add and update spectral norm tests
* More spectral norm tests
Thank you, Simon, for the suggestions.
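Roughly how the updated interface is used; a sketch, not the exact test code:
```
import torch
import torch.nn as nn
from torch.nn.utils import spectral_norm, remove_spectral_norm

# dim selects which dimension is treated as the output dimension, e.g. dim=1
# for ConvTranspose2d, whose weight is laid out as (in, out, kH, kW).
m = spectral_norm(nn.ConvTranspose2d(16, 8, kernel_size=3), dim=1)

m.eval()                      # no power iterations are run in eval mode
_ = m(torch.randn(1, 16, 32, 32))

m = remove_spectral_norm(m)   # the normalized weight is kept as plain `weight`
```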
Buck doesn't support passing arguments to Python unit tests, so we have to use environment variables to pass the sharding options instead. Also, buck test doesn't go through the __name__ == '__main__' code path, so we need to move the env var checking logic to top level.
* Use env var to pass sharding options to test_nn.py
* Move env var checking to top-level
* fix lint
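A hypothetical sketch of this env-var based sharding; the variable names TEST_SHARD and NUM_TEST_SHARDS are placeholders, not the ones actually used by the test suite:
```
import os
import unittest
import zlib

# Placeholder environment variables for illustration only.
SHARD = int(os.environ.get("TEST_SHARD", "0"))
NUM_SHARDS = int(os.environ.get("NUM_TEST_SHARDS", "1"))

def in_this_shard(test_name):
    # CRC32 gives a stable assignment of tests to shards across processes.
    return zlib.crc32(test_name.encode()) % NUM_SHARDS == SHARD

class TestNNShardExample(unittest.TestCase):
    @unittest.skipUnless(in_this_shard("test_linear"), "runs in another shard")
    def test_linear(self):
        self.assertEqual(1 + 1, 2)
```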
test_rnn_args_check generates mismatched input_shape and hidden_shape
args. To do this, it changes a dimension of input_shape or hidden_shape
to have an incorrect size.
Before, the test changed the size of a dimension to -1. However,
this is flawed because an input size such as (6, -1, 2) is invalid in its own right.
This PR fixes it so that the test changes sizes of dimensions to
`bad_size = 7`. As long as none of the other sizes (input_size,
hidden_size, num_layers, batch_size) divide this, we don't have to worry
about that dimension being accidentally broadcasted into working.
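For illustration, the kind of mismatch the test constructs (sizes here are arbitrary):
```
import torch
import torch.nn as nn

rnn = nn.RNN(input_size=3, hidden_size=5, num_layers=1)
inp = torch.randn(6, 4, 3)         # (seq_len, batch, input_size)
bad_hidden = torch.randn(1, 4, 7)  # hidden_size 7 instead of 5 -- bad_size = 7

try:
    rnn(inp, bad_hidden)
except RuntimeError as e:
    print("caught expected shape mismatch:", e)
```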
* 1. added hardshrink() to ATen (CPU + GPU); 2. removed nn.Hardshrink(); 3. reusing previous tests for nn.Hardshrink() and included CUDA tests at test_nn; 4. default parameter lambda=0.5 is not working yet
* optimized memory read/write
* 1. pass in lambd as scalar for CPU/CUDA_apply*; 2. removed tests for hardshrink at test_legacy_nn
* fixes test_utils
* 1. replace zeros_like with empty_like; 2. use scalar_cast in cuda
* 1. printing lambd value; 2. default lambd=0.5 is still failing
* work around a Scalar bug by removing the default value of lambd from native_functions.yaml and declaring it in nn/functional.py
* cleaned up debug printf
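Usage of the ATen-backed op, with the default lambd=0.5 mentioned above; hardshrink zeroes every element whose magnitude is at most lambd:
```
import torch
import torch.nn.functional as F

x = torch.tensor([-1.0, -0.3, 0.0, 0.2, 0.9])
print(F.hardshrink(x, lambd=0.5))
# tensor([-1.0000,  0.0000,  0.0000,  0.0000,  0.9000])
```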
* Add non_blocking to Tensor/Module.to
* flake8
* Add argparse tests
* cpp parse
* Use C++ parser
* use a common parse function with Tensor.to
* fix test_jit
* use THPObjectPtr
* increase refcount for None, True, and False
* address comments
* address comments
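Sketch of the resulting API; non_blocking only has an effect for pinned-memory host-to-device copies:
```
import torch
import torch.nn as nn

if torch.cuda.is_available():
    x = torch.randn(64, 128, pin_memory=True)
    x_gpu = x.to("cuda", non_blocking=True)  # asynchronous host-to-device copy

    model = nn.Linear(128, 10).to("cuda")    # Module.to moves parameters and buffers
```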
* Implement adaptive softmax
* fix test for python 2
* add return_logprob flag
* add a test for cross-entropy path
* address review comments
* Fix docs
* pytorch 0.4 fixes
* address review comments
* don't use no_grad when computing log-probs
* add predict method
* add test for predict
* change methods order
* get rid of hardcoded int values
* Add an optional bias term to the head of AdaptiveSoftmax
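A usage sketch of the resulting module, torch.nn.AdaptiveLogSoftmaxWithLoss; sizes and cutoffs below are made up:
```
import torch
import torch.nn as nn

asm = nn.AdaptiveLogSoftmaxWithLoss(
    in_features=64, n_classes=1000, cutoffs=[10, 100, 500], head_bias=True
)

hidden = torch.randn(32, 64)
target = torch.randint(0, 1000, (32,))

out = asm(hidden, target)         # namedtuple: .output (target log-probs) and .loss
log_probs = asm.log_prob(hidden)  # full (32, 1000) log-probability matrix
preds = asm.predict(hidden)       # predicted class per sample
```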
* Add memory leak check in CUDA tests
* Tracking multi-GPU too
* fix run_test.py not running __name__ == '__main__' content; add test for make_cuda_memory_checked_test
* add a comment
* skip if cuda
* 1. Change the wrapper to a method in common.py:TestCase
2. Refactor common constants/method that initialize CUDA context into common_cuda.py
3. Update some test files to use TEST_CUDA and TEST_MULTIGPU
* Fix MaxUnpool3d forward memory leak
* Fix MultiLabelMarginCriterion forward memory leak
* Fix MultiMarginLoss backward memory leak
* default doCUDAMemoryCheck to False
* make the wrapper skip-able
* use TEST_MULTIGPU
* add align_corners=True/False tests for Upsample; fix TEST_CUDNN
* finalize interface
* VolumetricMaxUnpooling_updateOutput
* fix test_nccl
* rename THC caching allocator methods to be clearer
* make the wrapped function a method
* address comments; revert changes to aten/src/THC/THCCachingAllocator.cpp
* fix renamed var
* pad-sequence no longer requires sorting entries
pad_sequence can get max_len from the list of sequences. Entries only need to be sorted if the output will be used with pack_padded_sequence, which can throw that error itself.
* remove sort requirement from pad-sequence
Picks up from #5974.
Removes the requirement that input sequences to pad_sequence have to be
sorted. Addressed the comments in the PR:
- Updated docstring for pad_sequence
- Remove sort requirement in pad_sequence test
- Test unsorted and sorted sequences in pad_sequence test
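Example of the relaxed requirement; the sequences no longer need to be pre-sorted by length:
```
import torch
from torch.nn.utils.rnn import pad_sequence

seqs = [torch.ones(2), torch.ones(5), torch.ones(3)]  # deliberately unsorted
padded = pad_sequence(seqs, batch_first=True, padding_value=0.0)
print(padded.shape)  # torch.Size([3, 5])
```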
* Don't allow requires_grad to be set on integer Tensor constructors in tensor_new.
* Fix autograd test.
* Fix test_distributions.
* Fix test_jit.
* Fix NN tests.
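The resulting behavior, sketched:
```
import torch

torch.tensor([1.0, 2.0], requires_grad=True)  # fine: floating point

try:
    torch.tensor([1, 2], dtype=torch.int64, requires_grad=True)
except RuntimeError as e:
    print("rejected as expected:", e)  # only floating point tensors can require gradients
```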
* ENH: add to method for PackedSequence
* ENH: return self if possible
* TST: remove extra data
* DOC: add more explanation
* TST: remove extra data
* DOC: minor fix
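Rough usage of the new method; per the commits above, it returns self when no copy is needed:
```
import torch
from torch.nn.utils.rnn import pack_padded_sequence

x = torch.randn(5, 3, 8)           # (seq_len, batch, features)
lengths = torch.tensor([5, 4, 2])  # sorted, as pack_padded_sequence expects
packed = pack_padded_sequence(x, lengths)

packed_cpu = packed.to(torch.device("cpu"))
print(packed_cpu is packed)        # True: already on CPU, so self is returned
if torch.cuda.is_available():
    packed_gpu = packed.to("cuda")  # moves the underlying data
```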
* initial commit for spectral norm
* fix comment
* edit rst
* fix doc
* remove redundant empty line
* fix nit mistakes in doc
* replace l2normalize with F.normalize
* fix chained `by`
* fix docs
fix typos
add comments related to power iteration and epsilon
update link to the paper
make some comments specific
* fix typo
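A condensed sketch of the power iteration at the heart of spectral norm, written with F.normalize as the commits describe; this is not the literal library code:
```
import torch
import torch.nn.functional as F

def spectral_norm_power_iteration(weight, u, eps=1e-12):
    """One power-iteration step estimating the largest singular value sigma
    of a weight matrix; u is the running left-singular-vector estimate."""
    w = weight.reshape(weight.size(0), -1)
    v = F.normalize(torch.mv(w.t(), u), dim=0, eps=eps)
    u = F.normalize(torch.mv(w, v), dim=0, eps=eps)
    sigma = torch.dot(u, torch.mv(w, v))
    return u, sigma

w = torch.randn(8, 16)
u = F.normalize(torch.randn(8), dim=0)
for _ in range(5):
    u, sigma = spectral_norm_power_iteration(w, u)
w_sn = w / sigma  # weight rescaled to have spectral norm ~1
```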
* Add max mode support to EmbeddingBag
* Lint fix
* Fix compilation issue on other platforms
* Rebase + don't waste memory when not in max mode
* Oops, missed a spot
* Fix whitespace from merge
* less precision
* Lower precision to avoid spurious failures
* Minor typo
* Switch to size()
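Usage of the new mode; the bag boundaries below are arbitrary:
```
import torch
import torch.nn as nn

emb = nn.EmbeddingBag(num_embeddings=10, embedding_dim=4, mode='max')

indices = torch.tensor([1, 2, 4, 5, 4, 3, 2, 9])
offsets = torch.tensor([0, 3, 6])   # three bags: [1,2,4], [5,4,3], [2,9]
out = emb(indices, offsets)         # (3, 4): element-wise max over each bag
```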
* Add version counter to module, change load_state_dict to use load_local_state_dict which does class specific loading
* Clarifies version number in docs
* fix jit tests
* fix state_dict tests
* typo
* fix ddp
* exclude version numbers from state dict entries
* Fix jit test and empty modules
* address comments
* test for "."
* revert the private version change in state_dict
* make IN case a hard error
* fix not reporting an error when there is an unexpected submodule
* address comments
* disallow empty string in name and remove trailing dot
* Codemod to update our codebase to 0.4 standard
* Update some of the test scripts
* remove Variable in test_clip_grad_value
* fix _symbolic_override_wrapper_maker
Introducing two updates.
1. Add param to He initialization scheme in torch.nn.init
Problem solved:
The function calculate_gain can take an argument to specify the type of non-linearity used. However, it wasn't possible to pass this argument directly to the He / Kaiming weight initialization function.
2. Add util to clip gradient value in torch.nn.utils.clip_grad
Problem solved:
DL libraries typically provide users with easy access to functions for clipping the gradients both using the norm and a fixed value. However, the utils clip_grad.py only had a function to clip the gradient norm.
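Rough usage of the two additions; the leaky-ReLU slope and clip value below are arbitrary:
```
import torch
import torch.nn as nn

# 1. Pass the nonlinearity's parameter (here the leaky-ReLU negative slope)
#    directly to the Kaiming initializer instead of only to calculate_gain.
w = torch.empty(128, 64)
nn.init.kaiming_uniform_(w, a=0.2, nonlinearity='leaky_relu')

# 2. Clip gradients by value rather than by norm.
model = nn.Linear(64, 10)
model(torch.randn(8, 64)).sum().backward()
nn.utils.clip_grad_value_(model.parameters(), clip_value=0.5)
```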
* add param to He initialization scheme in torch.nn.init
* add util to clip gradient value in torch/nn/utils/clip_grad.py
* update doc in torch.nn.utils.clip_grad
* update and add test for torch.nn.utils.clip_grad
* update function signature in torch.nn.utils.clip_grad to match suffix_ convention
* ensure backward compatibility in torch.nn.utils.clip_grad
* remove DeprecationWarning in torch.nn.utils.clip_grad
* extend test and implementation of torch.nn.utils.clip_grad
* update test and implementation torch.nn.utils.clip_grad
* Separate cuda-ness from dtype.
There is no longer torch.cuda.int64, etc.; only torch.int64, which corresponds to at::ScalarType.
At the Python arg parser level, the corresponding ATen type is selected from the combination of (ScalarType, Layout, Device).
There is also currently unused code in here for supporting ScalarType in native_functions; this will be used for specifying aggregate types
on reduction functions.
* Fix test_autograd.
* Add defaults to randint_like.
* Track is_cuda in py tensor types.
* Fix test_sparse.
* Fix multiprocessing.
* Fix rnn.
* Fix test_nn.
* Fix flake8.
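What the new factoring looks like in user code (a sketch):
```
import torch

# dtype and device are now orthogonal: no torch.cuda.int64, just torch.int64 plus a device.
x = torch.zeros(3, dtype=torch.int64)                      # CPU int64
if torch.cuda.is_available():
    y = torch.zeros(3, dtype=torch.int64, device='cuda')   # same dtype, CUDA device
    z = y.to(device='cpu', dtype=torch.float32)            # move and cast in one call
print(x.dtype, x.device)  # torch.int64 cpu
```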
This PR enables users to print extra information about their subclassed nn.Module.
For now, the user-defined string is simply inserted after the module name, which should be discussed in this PR.
Before this PR, users had to redefine __repr__ and copy & paste the source code from Module.
* Add support for extra information on Module
* Rewrite the repr method of Module
* Fix flake8
* Change the __repr__ to get_extra_repr in Linear
* Fix extra new-line for empty line
* Add test for __repr__ method
* Fix bug of block string indent
* Add indent for multi-line repr test.
* Address review comments
* Update tutorial for creating nn.Module
* Fix flake8, add extra_repr of bilinear
* Refactor DropoutNd
* Change to extra_repr in some Modules
* Fix flake8
* Refactor padding modules
* Refactor pooling module
* Fix typo
* Change to extra_repr
* Fix bug for GroupNorm
* Fix bug for LayerNorm
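A sketch of the resulting interface, assuming the final method name extra_repr from the commits above; the Scale module here is purely illustrative:
```
import torch.nn as nn

class Scale(nn.Module):
    def __init__(self, factor, inplace=False):
        super().__init__()
        self.factor = factor
        self.inplace = inplace

    def forward(self, x):
        return x.mul_(self.factor) if self.inplace else x * self.factor

    def extra_repr(self):
        # Shown inside the parentheses when the module (or a parent) is printed.
        return f'factor={self.factor}, inplace={self.inplace}'

print(Scale(2.0))  # Scale(factor=2.0, inplace=False)
```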
Fixes #5554
Adds an error message for when NLLLoss is passed an input and target
whose batch sizes don't match. Ideally this check should live in ATen
but since there is NLLLoss logic in Python, the check lives there for now.
* Changes in bilinear upsampling
* Add align_corners option to upsampling module & functional when using linearly interpolating modes
When align_corners=True, it uses the original upsampling scheme, which gives visually better results
but doesn't properly align input and output pixels, and thus causes the output to vary depending on the input size.
This PR adds the align_corners option and changes the default behavior to align_corners=False, with a
proper warning if the option is not specified when using nn.Upsample or nn.functional.upsample, to make
users aware of this change.
Adds tests in test_nn.py for spatial invariance when align_corners=False, and usual module tests for
align_corners=False.
* remove redundant checks and unnecessary variables; fix the cast
* fix negative indices
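Example of the new option; at the time the functional form was nn.functional.upsample, while current code would use F.interpolate:
```
import torch
import torch.nn as nn

x = torch.arange(16, dtype=torch.float32).reshape(1, 1, 4, 4)

up_default = nn.Upsample(scale_factor=2, mode='bilinear', align_corners=False)
up_legacy  = nn.Upsample(scale_factor=2, mode='bilinear', align_corners=True)

print(torch.allclose(up_default(x), up_legacy(x)))  # False: corner handling differs
```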
This PR addresses issue #5024
* Expose Conv2dBackward in python
* Separate interface for exposing gradients of operators
* Revert old changes
* Add tests
* Add conv1d gradients. Refactor tests for grad convolutions
* Refactor names and change examples
* Remove Variable from tests for conv backward
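A sketch of using the resulting interface, assuming it is exposed as torch.nn.grad (conv2d_input / conv2d_weight), as in later PyTorch releases:
```
import torch
from torch.nn import grad as nn_grad

x = torch.randn(1, 3, 8, 8)
w = torch.randn(6, 3, 3, 3)
grad_out = torch.randn(1, 6, 6, 6)  # gradient w.r.t. the output of a stride-1, padding-0 conv2d

grad_input = nn_grad.conv2d_input(x.shape, w, grad_out)
grad_weight = nn_grad.conv2d_weight(x, w.shape, grad_out)
print(grad_input.shape, grad_weight.shape)  # (1, 3, 8, 8) (6, 3, 3, 3)
```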