Commit Graph

594 Commits

Author SHA1 Message Date
Peter Goldsborough
aec9fdf0a4 Fix _apply in nn.Module (#15305)
Summary:
Fixes an issue that arose from https://github.com/pytorch/pytorch/pull/13481 where `.shared_memory()` couldn't be called. Effectively undoes all changes to `nn.Module` from that PR and solves the relevant problem in a different way (the goal was to be able to call `._apply()` on the Python wrapper for a C++ module).

soumith
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15305

Differential Revision: D13493937

Pulled By: goldsborough

fbshipit-source-id: 4cb8687f90fc8709a536c5e7eacd0dc8edf6f750
2018-12-17 16:22:21 -08:00
David Riazati
59d71b9664 Bicubic interpolation for nn.functional.interpolate (#9849)
Summary:
Addresses #918; interpolation results should be similar to TensorFlow's.

* Adds bicubic interpolation operator to `nn.functional.interpolate`
* Corresponding test in `test_nn.py`

The operator is added in legacy `TH` to be aligned with the other upsampling operators; they can be refactored/moved to ATen all at once when #10482 is resolved
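A minimal usage sketch of the new mode (not taken from the PR itself; shapes chosen for illustration):
```python
import torch
import torch.nn.functional as F

x = torch.randn(1, 3, 8, 8)                     # NCHW input
y = F.interpolate(x, scale_factor=2, mode='bicubic', align_corners=False)
print(y.shape)                                  # torch.Size([1, 3, 16, 16])
```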
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9849

Differential Revision: D9007525

Pulled By: driazati

fbshipit-source-id: 93ef49a34ce4e5ffd4bda94cd9a6ddc939f0a4cc
2018-12-17 15:31:48 -08:00
Edward Yang
dcd1685282 Revert D13440858: [pytorch][PR] Use a pool of per-thread cudnn handles for each device, updated
Differential Revision:
D13440858

Original commit changeset: 1c6af5c53538

fbshipit-source-id: fda42ea75000d4a4e9c4a8eeaaa5518f7ad9c298
2018-12-14 14:35:01 -08:00
Chaitanya Sri Krishna Lolla
9f1d8f2eeb enabled tests in test_nn, test_cuda and test_sparse (#15232)
Summary:
tests work on ROCm 1.9.2 as present on CI (fp16 bringup, hipMemset and sparse improvements)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15232

Differential Revision: D13470991

Pulled By: bddppq

fbshipit-source-id: 45acc4f9ea5baaaf7672b86eb022948055779925
2018-12-14 14:27:57 -08:00
Michael Carilli
ca4358c8f5 Use a pool of per-thread cudnn handles for each device, updated (#15080)
Summary:
Rebased version of https://github.com/pytorch/pytorch/pull/14861, hopefully addressing ezyang's comments.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15080

Differential Revision: D13440858

Pulled By: ezyang

fbshipit-source-id: 1c6af5c53538b81c6b92cf1dda231ed333f28035
2018-12-13 10:24:06 -08:00
Johannes M Dieterich
6610ace28b use ROCm 1.9.2 fp16 capabilities in rocBLAS and MIOpen interfaces (#14994)
Summary:
* relax MIOpen if statement to allow fp16/fp32 mixed precision training now supported by ROCm 1.9.2
* use gemm_ex API of rocBLAS in ROCm 1.9.2 instead of the previous hgemm API
* with this: enable all but one half test in test_nn

While there, also fix:
* a group-convolution issue with MIOpen, pertaining to properly initializing MIOpen on multi-GPU systems, which we detected while working on this
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14994

Differential Revision: D13439869

Pulled By: bddppq

fbshipit-source-id: 75e4eb51a59488882e64b5eabdc30555b25be25e
2018-12-12 16:16:47 -08:00
Natalia Gimelshein
27d5ae7afb use datatype dependent tolerance in data parallel tests
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14856

Differential Revision: D13413560

Pulled By: soumith

fbshipit-source-id: b3a0cfe93477ed332e6eaa2e39ef5f4cc8b36481
2018-12-10 22:50:27 -08:00
Johannes M Dieterich
52942e1f09 Enable unit tests known to work on ROCm (#14011)
Summary:
* Enable unit tests known to work on ROCm.
* Disable a few that are known to be flaky for the time being.
* Use std::abs for Half
* No more special casing for ROCm in TensorMathReduce
* Document an important detail for a hardcoded block size w.r.t. ROCm in TensorMathReduce

ezyang bddppq for awareness
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14011

Differential Revision: D13387679

Pulled By: bddppq

fbshipit-source-id: 4177f2a57b09d866ccbb82a24318f273e3292f71
2018-12-07 18:57:32 -08:00
Ailing Zhang
ef91cfd68b Add new reduction mode in kl_div (#14457)
Summary:
Fixes #6622.
We used to average over all elements for KL divergence, which is not aligned with its math definition.
This PR corrects the default reduction behavior of KL divergence so that it now averages over the batch dimension.

- In KL, default behavior `reduction=mean` averages over batch dimension. While for most other loss functions, `reduction=mean` averages over all elements.
- We used to support scalar tensor as well. For BC purpose, we still support it, no reduction is performed on scalar tensor.
- Added a new reduction mode called `batchmean` which has the correct behavior for KL. Added a warning to make `batchmean` the default for KL instead of `mean` in the next major release (see the sketch below).
- [deprecated] I chose to not add a new reduction option, since "mean over batch dimension" is kinda special, and it only makes sense in a few cases like KL. We don't want to explain why there's an option "batchmean" that isn't applicable to all other functions. I'm open to discussion on this one, as I cannot think of a perfect solution for this.
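As a quick illustration (a minimal sketch, not from the PR), `reduction='batchmean'` sums the pointwise KL terms and divides by the batch size only:
```python
import torch
import torch.nn.functional as F

log_probs = F.log_softmax(torch.randn(4, 10), dim=1)   # input must be log-probabilities
target = F.softmax(torch.randn(4, 10), dim=1)          # target is probabilities

loss_mean = F.kl_div(log_probs, target, reduction='mean')        # averages over all 4*10 elements
loss_batch = F.kl_div(log_probs, target, reduction='batchmean')  # sums, then divides by batch size 4
```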
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14457

Differential Revision: D13236016

Pulled By: ailzhang

fbshipit-source-id: 905cc7b3bfc35a11d7cf098b1ebc382170a087a7
2018-12-04 12:24:28 -08:00
David Riazati
814b5715ba Move module tests to common_nn (#14578)
Summary:
This moves `new_module_tests` from `test_nn.py` to `common_nn.py` so
that they can be used in `test_jit.py` without running any of
`test_nn.py`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14578

Differential Revision: D13268286

Pulled By: driazati

fbshipit-source-id: 6e8654a4c29ab754d656ac83820c14d1c1843e03
2018-11-30 12:14:59 -08:00
David Riazati
1f6d9f44fc Add InstanceNorm, Distance modules to Script
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14551

Differential Revision: D13272741

Pulled By: driazati

fbshipit-source-id: 3e4fe870d0e268903757f3ae8a56100606906bce
2018-11-29 22:18:55 -08:00
David Riazati
67e3905bc6 Revert D13268293: [pytorch][PR] [jit] Add InstanceNorm, Distance modules to Script
Differential Revision:
D13268293

Original commit changeset: cb33c6dcdadd

fbshipit-source-id: 214a29b74c85b7b25df0eb48e3fdb81539049130
2018-11-29 19:19:35 -08:00
David Riazati
75eccffdfe Add InstanceNorm, Distance modules to Script
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14551

Differential Revision: D13268293

Pulled By: driazati

fbshipit-source-id: cb33c6dcdaddf8c7a49b3535894d77bf5d771ddd
2018-11-29 17:26:29 -08:00
David Riazati
666d383a00 Add broadcast list default arg support (#14361)
Summary:
To convert `max_unpool` functions to weak script, this PR adds support
for `T` as default arguments for `BroadcastingListN[T]`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14361

Differential Revision: D13192231

Pulled By: driazati

fbshipit-source-id: a25b75a0e88ba3dfa22d6a83775e9778d735e249
2018-11-29 15:15:47 -08:00
David Riazati
9e93a02624 Use nn module tests in test_jit (#14238)
Summary:
This PR adds weak modules for all activation modules and uses `test_nn` module tests to test weak modules that have been annotated with `weak_module` and therefore are in `torch._jit_internal._weak_types`

Also depends on #14379
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14238

Differential Revision: D13252887

Pulled By: driazati

fbshipit-source-id: e9638cf74089884a32b8f0f38396cf432c02c988
2018-11-28 23:31:25 -08:00
David Riazati
3d98810fbd Revert D13192230: [pytorch][PR] [jit] Use nn module tests in test_jit
Differential Revision:
D13192230

Original commit changeset: 36488960b6c9

fbshipit-source-id: 63b68bd909b9ef0548f52c986c84f549aecb8909
2018-11-28 00:23:09 -08:00
David Riazati
4cdcbbf410 Use nn module tests in test_jit (#14238)
Summary:
This PR adds weak modules for all activation modules and uses `test_nn` module tests to test weak modules that have been annotated with `weak_module` and therefore are in `torch._jit_internal._weak_types`

Also depends on #14379
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14238

Differential Revision: D13192230

Pulled By: driazati

fbshipit-source-id: 36488960b6c91448b38c0fa65422539a93af8c5e
2018-11-27 21:19:51 -08:00
Jan Schlüter
c19af59a6e Use integer math to compute output size of pooling operations (#14405)
Summary:
As reported in #13386, the pooling operations can return wrong results for large inputs. The root of the problem is that while the output shape is initially being computed with integer operations, it is converted to float32 for division by the stride and applying either a `ceil` or a `floor` depending on the `ceil_mode`. Since even moderately large integers (the smallest being 16,777,217) cannot be expressed exactly in float32, this leads to wrong result shapes.

This PR relies purely on integer operations to perform the shape computation, including the ceil/floor distinction. Since I could not stand all that duplicated code, I pulled it out into a `pooling_shape.h` header, similar to the existing `linear_upsampling.h` header. I hope this is acceptable, let me know if you'd like to see it solved differently. I've also added tests to `test_nn.py` that fail without my changes and pass with my changes. They cover `{max,avg}_pool{1,2,3}d()` for CPU and GPU.

Fixes #13386.
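A Python sketch of the all-integer shape computation described above (mirroring the standard pooling output-size formula that `pooling_shape.h` implements; argument names are illustrative, not the literal C++ code):
```python
def pool_output_size(input_size, kernel_size, pad, stride, dilation, ceil_mode):
    # Pure integer arithmetic: no float32 round-trip, so results stay exact
    # even for sizes above 2**24.
    numerator = input_size + 2 * pad - dilation * (kernel_size - 1) - 1
    if ceil_mode:
        out = (numerator + stride - 1) // stride + 1   # integer ceil-division
        # ensure the last pooling window starts inside the input (or left padding)
        if (out - 1) * stride >= input_size + pad:
            out -= 1
    else:
        out = numerator // stride + 1
    return out
```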
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14405

Differential Revision: D13215260

Pulled By: soumith

fbshipit-source-id: 802588ce6cba8db6c346448c3b3c0dac14d12b2d
2018-11-27 09:38:06 -08:00
Brian Vaughan
e4bb56570c Preemptively test for out-of-order length. (#13933)
Summary:
torch.nn.utils.rnn.pack_padded_sequence segfaults if the lengths are not in
decreasing order (#13324).

We were seeing this segfault on throw; pre-emptively checking avoids
it:

*** Error in `/home/bvaughan/anaconda3/bin/python': double free or corruption (!prev): 0x00005555566e7510 ***
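For reference, a minimal sketch of the expected usage at the time, with lengths pre-sorted in decreasing order (values chosen for illustration):
```python
import torch
from torch.nn.utils.rnn import pack_padded_sequence

seqs = torch.randn(3, 5, 8)           # (batch, max_len, features)
lengths = torch.tensor([5, 3, 2])     # must be sorted in decreasing order
packed = pack_padded_sequence(seqs, lengths, batch_first=True)
```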
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13933

Differential Revision: D13090389

Pulled By: nairbv

fbshipit-source-id: 6f6b319e74cb55830be799e9c46bc33aa59256d8
2018-11-16 08:39:05 -08:00
lyuwenyu
1b1cdd944c Keep ModuleList consistent with python list in __setitem__ function. (#13102)
Summary:
The `ModuleList` method `__setitem__` carries an implicit risk:
```
In [26]: mlist = nn.ModuleList([nn.ReLU(), nn.Conv2d(10, 10, 3, 1)])

In [27]: mlist
Out[27]:
ModuleList(
  (0): ReLU()
  (1): Conv2d(10, 10, kernel_size=(3, 3), stride=(1, 1))
)

In [28]: mlist[-1] = nn.ReLU()

In [29]: mlist
Out[29]:
ModuleList(
  (0): ReLU()
  (1): Conv2d(10, 10, kernel_size=(3, 3), stride=(1, 1))
  (-1): ReLU()
)

In [30]: mlist[-1]
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-30-229d1b6823a0> in <module>()
----> 1 mlist[-1]

~/anaconda3/lib/python3.6/site-packages/torch/nn/modules/container.py in __getitem__(self, idx)
    134             return ModuleList(list(self._modules.values())[idx])
    135         else:
--> 136             return self._modules[self._get_abs_string_index(idx)]
    137
    138     def __setitem__(self, idx, module):

KeyError: '2'

```

Modifying it as
```
    def __setitem__(self, idx, module):
        idx = self._get_abs_string_index(idx)
        return setattr(self, str(idx), module)
```
fixes the problem.

```
In [31]: class NewModuleList(nn.ModuleList):
    ...:     def __setitem__(self, idx, module):
    ...:         idx = self._get_abs_string_index(idx)
    ...:         return setattr(self, str(idx), module)
    ...:

In [32]: mlist = NewModuleList([nn.ReLU(), nn.Conv2d(10, 10, 2, 1)])

In [33]: mlist[-1] = nn.ReLU()

In [34]: mlist
Out[34]:
NewModuleList(
  (0): ReLU()
  (1): ReLU()
)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13102

Differential Revision: D13092480

Pulled By: ezyang

fbshipit-source-id: 7ff7688f66e44bbd263a10d2d09db7bb0df4b749
2018-11-16 07:39:26 -08:00
Gregory Chanan
02152c515e Ensure nn Losses check scalar vs non-scalar values.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13860

Reviewed By: ezyang

Differential Revision: D13029364

Pulled By: gchanan

fbshipit-source-id: 20f1330fa181e52aea1f879dc655a9a6f62b5f53
2018-11-14 16:46:27 -08:00
Gregory Chanan
4341dd2753 Move most scalar checks from nn.yaml into THNN/THCUNN code. (#13906)
Summary:
This includes everything in nn.yaml except for convolutions, multi_margin_loss, multi_label_margin_loss, nll_loss, and nll_loss2d.

Note that scalar_check False just means we don't do any extra scalar checks (we could elide this from the generated code, which I may do in a later commit).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13906

Reviewed By: ezyang

Differential Revision: D13044507

Pulled By: gchanan

fbshipit-source-id: ebd3bdca2bcf512ca44de1ce3be81946f6c0828e
2018-11-14 07:58:35 -08:00
Sam Gross
c46dd5163f Temporarily disable part of test_spectral_norm (#13908)
Summary:
See #13818 for suggestions about a long-term fix
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13908

Differential Revision: D13047262

Pulled By: colesbury

fbshipit-source-id: 0f29bd5b659bb97826381abbc305fb8a25b131ed
2018-11-13 14:19:16 -08:00
Ailing Zhang
a17c0118a5 fix stability in bce with pos_weight formula (#13863)
Summary:
Fixes #13773
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13863

Differential Revision: D13031803

Pulled By: ailzhang

fbshipit-source-id: 6c9e044f0450eebf4555bbc02c125713d9378e2f
2018-11-12 22:04:24 -08:00
Johannes M Dieterich
ce48958606 enable more unit tests (#13166)
Summary:
This enables the distributions and utils test sets for ROCm.
Individual tests are enabled that now pass due to fixes in HIP/HCC/libraries versions in white rabbit.

For attention: bddppq ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13166

Differential Revision: D12814759

Pulled By: bddppq

fbshipit-source-id: ea70e775c707d7a8d2776fede6154a755adef43e
2018-11-12 18:49:52 -08:00
Thomas Viehmann
14004cbef6 Native batch norm (#13263)
Summary:
- Move batch norm from TH(CU)NN to native
- Speedups in many cases (e.g. #12006) for CUDA due to new block/grid layout and Welford-type mean/variance calculations (the latter for training mode)
- It splits the forward kernel in two pieces and reuses the evaluation kernel for the transformation.
- We change the meaning of save_mean and save_invstd (aka save_var) to accscalar to maintain reasonable precision.

Compared to the ill-fated #12368
- I changed the CPU kernel to not call `.sum()` from within parallel for. This seemed to have caused the breakage (NaN-results) in TestModels.test_dcgan_netG (thank you houseroad for the repro, errors in assessment of the fix are my own)
- I updated the Half->Float upcasting in tensors to go through `t.type().scalarType()` instead of `t.dtype()`.
- I have merged master
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13263

Differential Revision: D12946254

Pulled By: SsnL

fbshipit-source-id: 3bb717ee250fbccaf10afe73722996aa4713d10d
2018-11-06 20:05:54 -08:00
Tongzhou Wang
2cd912bcc2 Fix more spectral norm bugs (#13350)
Summary:
Problems with SN and DP after #12671:
1. in eval mode, `weight_orig` is not getting correct gradient #12737 .

    Fix: keep `v` vector around as a buffer and always calculate `W = W_orig / (u @ W_orig @ v)` even in eval.

2. in training mode, the `weight` buffer of the parallelized module is never updated if someone touches `weight_orig` and/or `weight` and makes them stop sharing storage. So in `eval` the weight used is wrong.

    Fix: Make `weight` not a buffer anymore and always calculate it as above.

3. #12671 changed SN to update `u` in-place to make DP work correctly, but then it breaks backward through two forwards (e.g., the common GAN loss `D(real) - D(fake)`) because the vectors needed to backprop the 1st forward is changed in the 2nd forward.

    Fix: This PR clones `u` and `v` before using them.

To maintain BC, I added a hook interface for producing and loading state_dict. This is ugly and we should really have a better interface for spectral_norm. But for the purpose of fixing this issue, I make this patch. Even if we have a better interface, a BC mechanism for loading legacy state_dict still needs to be done.

cc crcrpar
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13350

Differential Revision: D12931044

Pulled By: SsnL

fbshipit-source-id: 8be6f934eaa62414d76d2c644dedd7e1b7eb31ef
2018-11-06 19:16:13 -08:00
Soumith Chintala
a7ee632dff Various Test and build fixes (#13556)
Summary:
- fixes weights-contiguous requirement for THCUNN Convolutions
- Add tests that conv backward pass works for non-contiguous weights
- fix RNN tests / error messages to be consistent and pass
- relax weight grad precision for fp16 for a particular test
- fix regression of CMAKE_PREFIX_PATH not passing through
- add missing skipIfNoLapack annotations where needed

Differential Revision: D12918456

Pulled By: soumith

fbshipit-source-id: 8642d36bffcc6f2957800d6afa1e10bef2a91d05
2018-11-06 07:13:47 -08:00
Sam Gross
98f5c005da Speed up CPU threshold and relu implementation (#13182)
Summary:
```
The previous threshold implementation was not vectorized or parallelized.
This speeds up ResNet-50 CPU inference [1] from ~88 ms to ~67 ms

CPU timings:
https://gist.github.com/colesbury/d0d1be6974841d62696dbde329a8fde8

1 thread (before vs. after)
10240:  17.4 µs vs. 6.9 µs per loop
102400: 141 µs vs. 39.8 µs per loop

16 threads (before vs. after)
10240:  17.4 µs vs. 6.7 µs per loop
102400: 141 µs vs. 14.3 µs per loop

CUDA timings are not measurably different.

[1]: compiled with MKL-DNN, 8 threads, batch norm merged into convolutions
https://gist.github.com/colesbury/8a64897dae97558b3b82da665048c782
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13182

Reviewed By: soumith

Differential Revision: D12825105

Pulled By: colesbury

fbshipit-source-id: 557da608ebb87db8a04adbb0d2882af4f2eb3c15
2018-11-05 12:51:29 -08:00
Tongzhou Wang
9f2b2cac37 Fix handling all empty bags in CUDA embedding bag (#13483)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/11847
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13483

Differential Revision: D12902914

Pulled By: SsnL

fbshipit-source-id: 577a53e815231e988da716b1ee5667e1f36408ca
2018-11-02 10:21:14 -07:00
Tongzhou Wang
99a5d19591 Rename elementwise_mean to mean (#13419)
Summary:
Closes #12459
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13419

Differential Revision: D12883299

Pulled By: SsnL

fbshipit-source-id: 8b4512ff73b66fdc674412904dbb3bf497ba70a7
2018-11-01 10:31:26 -07:00
Ailing Zhang
488d393ea6 Fix pointwise loss broadcast (#12996)
Summary: Fixes #12129, #12327

Differential Revision: D10513781

Pulled By: ailzhang

fbshipit-source-id: a210008a39ff6c3f056c9fbe3f0576cfcce638ec
2018-10-31 10:17:25 -07:00
Lu Fang
f8864f0505 Revert "Move batch_norm to ATen/native, speed up (#12368)" (#13191)
Summary:
Revert #12368 since it's causing ONNX-related test cases to fail.
https://github.com/pytorch/pytorch/pull/12368

SsnL
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13191

Reviewed By: BIT-silence

Differential Revision: D12810778

Pulled By: houseroad

fbshipit-source-id: 1c373b92628580097cffcd237dccc5b3d8697577
2018-10-26 23:05:50 -07:00
Zachary DeVito
dae7616078 Shard all tests based on how many tests exist. (#13160)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13160

Reduces pytorch_core build from 2 hours to 30 minutes

Reviewed By: soumith, dzhulgakov

Differential Revision: D10524261

fbshipit-source-id: 97270ac73404b5ea4c264cd0e9d8d4b1be79b0e9
2018-10-26 18:20:34 -07:00
Thomas Viehmann
dc211c7de4 Move batch_norm to ATen/native, speed up (#12368)
Summary:
- Speed up the case of #12006 in the forward
- The backward still isn't as fast as one might hope (factor 2-3 in the #12006 case).
- More extensive benchmarking shows not so great performance compared
  to CuDNN for cases with many channels, e.g. bs=8-128 / c=1024 / f=1024.
- We change the meaning of save_mean and save_invstd (aka save_var) to accscalar to
  maintain reasonable precision.

Needless to say, I would happily separate the TensorAccessor fixes into a separate PR, as they're unrelated fixes.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12368

Differential Revision: D10559696

Pulled By: SsnL

fbshipit-source-id: f0d0d1e0912e17b15b8fb7a2c03d0fe757598419
2018-10-25 23:41:10 -07:00
Richard Zou
7863c17b26 Fix convtranspose3d output_size calculation (#12952)
Summary:
Closes #2119.

There was a small bug where the output_size got sliced with `[-2:]`
where we really meant to slice it as `[2:]` (to remove the batch and
channel dimensions).

Added a new test for this.
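A minimal sketch of the call path this touches (values are illustrative; `output_size` here includes the batch and channel dims, which the fixed code strips with `[2:]`):
```python
import torch
import torch.nn as nn

deconv = nn.ConvTranspose3d(16, 8, kernel_size=3, stride=2)
x = torch.randn(1, 16, 8, 8, 8)
# Passing the full output size (batch and channel included); the fix slices
# these leading dims off with [2:] rather than [-2:].
y = deconv(x, output_size=[1, 8, 17, 17, 17])
print(y.shape)   # torch.Size([1, 8, 17, 17, 17])
```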
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12952

Differential Revision: D10510678

Pulled By: zou3519

fbshipit-source-id: 4c04a5007fc6d002e1806d6fe981b43d33d6a4f2
2018-10-24 09:23:05 -07:00
Soumith Chintala
cf235e0894 fix lint after new flake8 release added new style constraints (#13047)
Summary:
fix lint after new flake8 release added new style constraints
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13047

Differential Revision: D10527804

Pulled By: soumith

fbshipit-source-id: 6f4d02662570b6339f69117b61037c8394b0bbd8
2018-10-24 09:03:38 -07:00
Wei Yang
710191e292 fix error message of large kernel size in conv2D (#12791)
Summary:
- fix #12565
- test plan:
with this fix, we have:
```
>>> m = nn.Conv2d(in_channels=3, out_channels=33, kernel_size=10, stride=1, bias=True)
>>> input = torch.randn(1, 3, 1, 1)
>>> output = m(input)
```
RuntimeError: Calculated padded input size per channel: (1 x 1). Kernel size: (10 x 10). Kernel size can't be greater than actual input size at ~/pytorch/aten/src/THNN/generic/SpatialConvolutionMM.c:50

not sure why these are `int` instead of `int64_t`:
5ccdd7a626/aten/src/THNN/generic/SpatialConvolutionMM.c (L10)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12791

Differential Revision: D10443045

Pulled By: weiyangfb

fbshipit-source-id: 2620acb40bdd49d29cec06337f6dfb4653d1987c
2018-10-18 00:51:16 -07:00
James Sun
f4944f0f8a Rename test/common.py to test/common_utils.py (#12794)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12794

common.py is used in base_module for almost all tests in test/. The
name of this file is so common that it can easily conflict with other dependencies
if they happen to have another common.py in the base module. Rename the file to
avoid conflict.

Reviewed By: orionr

Differential Revision: D10438204

fbshipit-source-id: 6a996c14980722330be0a9fd3a54c20af4b3d380
2018-10-17 23:04:29 -07:00
Thomas Viehmann
ba25e13782 Forbid Module.to with copy argument. (#12617)
Summary:
Module.to uses the Tensor.to parsing facility.
It should not, however, accept "copy" as a keyword/fourth positional
argument.

See #12571 for discussion.

Thank you SsnL for noticing.
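For context, a minimal sketch of the accepted forms after this change (dtype/device only; there is no `copy` flag on the module version):
```python
import torch
import torch.nn as nn

m = nn.Linear(4, 4)
m.to(torch.float64)                       # dtype only, modifies the module in place
m.to(torch.device('cpu'), torch.float32)  # device and dtype together
# Unlike Tensor.to, passing Tensor.to's `copy` argument here is rejected.
```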
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12617

Differential Revision: D10392053

Pulled By: ezyang

fbshipit-source-id: b67a5def7993189b4b47193abc7b741b7d07512c
2018-10-16 20:31:44 -07:00
Tongzhou Wang
ac994f2c78 Fix SpectralNorm with DataParallel (#12671)
Summary:
There were two problems with SN + DP:

1. In SN, the updated _u vector is saved back to module via a `setattr`. However, in DP, everything is run on a replica, so those updates are lost.
2. In DP, the buffers are broadcast via a `broadcast_coalesced`, so on replicas they are all views. Therefore, the `detach_` call won't work.

Fixes are:
1. Update the _u vector in-place so that, via the storage shared between the 1st replica and the parallelized module, the update is retained
2. Do not call `detach_`.
3. Added comments in SN about the subtlety.
4. Added a note to the DP doc on this particular behavior of DP.

cc crcrpar taesung89 yaoshengfu

Fixes https://github.com/pytorch/pytorch/issues/11476
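A minimal sketch of the combination being fixed (CPU-only here for brevity; the in-place `u` update is what keeps replica-0 state visible to the wrapped module):
```python
import torch
import torch.nn as nn
from torch.nn.utils import spectral_norm

# spectral_norm registers `weight_orig` plus the power-iteration vector `u`
# and recomputes `weight` on every forward pass.
layer = spectral_norm(nn.Linear(20, 40))

# DataParallel broadcasts buffers to replicas as views; updating `u` in place
# (rather than via setattr on the replica) lets the update survive the forward.
model = nn.DataParallel(layer)
out = model(torch.randn(8, 20))
```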
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12671

Differential Revision: D10410232

Pulled By: SsnL

fbshipit-source-id: c447951844a30366d8c196bf9436340e88f3b6d9
2018-10-16 16:02:17 -07:00
Ailing Zhang
e15501fb68 fix bce_with_logits with legacy reduce (#12689)
Summary:
Fix #12624, an internal use case of legacy `reduce`.
Add a test in test_nn.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12689

Reviewed By: ezyang

Differential Revision: D10391195

Pulled By: ailzhang

fbshipit-source-id: 1af2b258c4abb2b6527eaaeac63e8bf1762c66a1
2018-10-16 09:46:58 -07:00
Natalia Gimelshein
a98958d3bd dtype option for softmax (#11719)
Summary:
Add dtype argument to softmax/log_softmax functions.
Computing softmax in fp32 precision is necessary for mixed-precision training, but converting the output of the previous layer into fp32 and then reading it as fp32 in softmax is expensive, both memory- and perf-wise; this PR allows one to avoid that.
For most input data/dtype combinations, input data is converted to dtype and then softmax is computed. If input data is half type and dtype is fp32, kernels with the corresponding template arguments are called.
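A minimal sketch of the new argument (shapes and dtypes chosen for illustration):
```python
import torch
import torch.nn.functional as F

x = torch.randn(8, 100, dtype=torch.float16)
# Softmax is computed in fp32 without first materializing an fp32 copy of x
# in user code; the output comes back as fp32.
probs = F.softmax(x, dim=-1, dtype=torch.float32)
```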
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11719

Reviewed By: ezyang

Differential Revision: D10175514

Pulled By: zou3519

fbshipit-source-id: 06d285af91a0b659932236d41ad63b787eeed243
2018-10-13 17:57:10 -07:00
Tongzhou Wang
d400502b1d Fix a bunch of warnings in TestNN
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12453

Differential Revision: D10244130

Pulled By: SsnL

fbshipit-source-id: e425c76bfb721fe118a32ddd1fa6eca3a3cd86f0
2018-10-08 17:38:23 -07:00
daquexian
f8086845aa Fix bug in grad.py when conv bias != None (#12281)
Summary:
Obviously, the grads of the conv weight and conv input are not relevant to the bias, but the original `convXd_input` and `convXd_weight` methods receive a `bias` parameter. What's more, while the doc says `bias` should have the shape `(out_channels,)`, one will get a `RuntimeError` if bias != None and in_channels != out_channels, because the weight of a transposed conv has the shape `(in_channels, out_channels, kH, kW)` while the weight of a vanilla conv has the shape `(out_channels, in_channels, kH, kW)`
```
RuntimeError: Given transposed=1, weight of size [channel1, channel2, kH, kW], expected bias to be 1-dimensional with channel2 elements, but got bias of size [channel1] instead
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12281

Differential Revision: D10217370

Pulled By: ezyang

fbshipit-source-id: bc00b439e5ae539276a5e678bdb92af700197bb2
2018-10-05 12:55:14 -07:00
Johannes M Dieterich
c9f7d7b506 mark unit tests as working, skip failing unit test (#12313)
Summary:
* enabled fp16 tests for test_torch

* enable fp16 tests for test_nn

* enabled multilabelmargin loss for fp16

* removed skip for test_pdist_empty_col

* Enable test_nn tests that pass with compiler fixes etc.

* Enable test_legacy_nn tests that pass with compiler fixes etc.

ezyang bddppq
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12313

Differential Revision: D10189922

Pulled By: bddppq

fbshipit-source-id: a5592817c04b14e355cb062d42ebea406f0c92b6
2018-10-03 23:56:26 -07:00
iotamudelta
a2ebbccc9f fix unit tests on CI
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12187

Differential Revision: D10118483

Pulled By: bddppq

fbshipit-source-id: 986c8fb48d61e00103c713548a50e74489a0e442
2018-09-28 23:11:55 -07:00
Wei Yang
de11fe0c83 migrate PReLU to ATen (#11758)
Summary:
- fixes https://github.com/pytorch/pytorch/issues/10723
- migrate PReLU to ATen and deprecate legacy PReLU
- performance:

CPU with weight.numel() = 1
```
>>> m = nn.PReLU()
>>> x = torch.randn(100, 100, 100, requires_grad=True)
>>> %timeit -r 100 y = m(x)
100 loops, best of 100: 9.43 ms per loop

>>> y = m(x).sum()
>>> %timeit -r 100 y.backward(retain_graph=True)
10 loops, best of 100: 24.4 ms per loop

>>> m = nn.PReLU()
>>> x = torch.randn(100, 100, 100, requires_grad=True)
>>> %timeit -r 100 y = m(x)
1000 loops, best of 100: 695 µs per loop

>>> y = m(x).sum()
>>> %timeit -r 100 y.backward(retain_graph=True)
100 loops, best of 100: 2.47 ms per loop
```

CPU with weight.numel() = channels
```
>>> m = nn.PReLU(100)
>>> x = torch.randn(100, 100, 100, requires_grad=True)
>>> %timeit -r 100 y = m(x)
1000 loops, best of 100: 603 µs per loop

>>> y = m(x).sum()
>>> %timeit -r 100 y.backward(retain_graph=True)
100 loops, best of 100: 13.3 ms per loop

>>> m = nn.PReLU(100)
>>> x = torch.randn(100, 100, 100, requires_grad=True)
>>> %timeit -r 100 y = m(x)
1000 loops, best of 100: 655 µs per loop

>>> y = m(x).sum()
>>> %timeit -r 100 y.backward(retain_graph=True)
100 loops, best of 100: 2.45 ms per loop
```

CUDA with weight.numel() = 1
```
>>> m = nn.PReLU().cuda()
>>> x = torch.randn(100, 100, 100, requires_grad=True).cuda()
>>> %timeit -r 100 torch.cuda.synchronize(); y = m(x); torch.cuda.synchronize();
10000 loops, best of 100: 187 µs per loop

>>> y = m(x).sum()
>>> %timeit -r 100 torch.cuda.synchronize(); y.backward(retain_graph=True); torch.cuda.synchronize();
100 loops, best of 100: 2.01 ms per loop

>>> m = nn.PReLU().cuda()
>>> x = torch.randn(100, 100, 100, requires_grad=True).cuda()
>>> %timeit -r 100 torch.cuda.synchronize(); y = m(x); torch.cuda.synchronize();
1000 loops, best of 100: 195 µs per loop

>>> y = m(x).sum()
>>> %timeit -r 100 torch.cuda.synchronize(); y.backward(retain_graph=True); torch.cuda.synchronize();
100 loops, best of 100: 2.28 ms per loop
```

CUDA with weight.numel() = channel
```
>>> m = nn.PReLU(100).cuda()
>>> x = torch.randn(100, 100, 100, requires_grad=True).cuda()
>>> %timeit -r 100 torch.cuda.synchronize(); y = m(x); torch.cuda.synchronize();
1000 loops, best of 100: 174 µs per loop

>>> y = m(x).sum()
>>> %timeit -r 100 torch.cuda.synchronize(); y.backward(retain_graph=True); torch.cuda.synchronize();
100 loops, best of 100: 2.27 ms per loop

>>> m = nn.PReLU(100).cuda()
>>> x = torch.randn(100, 100, 100, requires_grad=True).cuda()
>>> %timeit -r 100 torch.cuda.synchronize(); y = m(x); torch.cuda.synchronize();
10000 loops, best of 100: 181 µs per loop

>>> y = m(x).sum()
>>> %timeit -r 100 torch.cuda.synchronize(); y.backward(retain_graph=True); torch.cuda.synchronize();
100 loops, best of 100: 2.26 ms per loop
```

The huge performance regression on CPU when weight.numel() = 1 is addressed by replacing at::CPU_tensor_apply* with parallelized kernels.

ezyang SsnL zou3519  soumith
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11758

Differential Revision: D9995799

Pulled By: weiyangfb

fbshipit-source-id: d289937c78075f46a54dafbde92fab0cc4b5b86e
2018-09-21 16:26:04 -07:00
Thomas Viehmann
775358e4c2 Add non-legacy test of bilinear (#11935)
Summary:
Fixes: #11905
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11935

Differential Revision: D9991120

Pulled By: soumith

fbshipit-source-id: b00ad4f405440664ae5228b229a2ba0a5d3d92f6
2018-09-21 12:43:35 -07:00
Christian Puhrsch
d8f6be686d Remove torch/legacy (#11823)
Summary:
Largely unused and hinders current development
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11823

Differential Revision: D9925094

Pulled By: cpuhrsch

fbshipit-source-id: c797f62180e2128f9a567b0c57c8347957470ea5
2018-09-20 14:00:54 -07:00
Wei Yang
8aedc27a63 checking device types of input and weights at RNN (#10185)
Summary:
- fixes #9534
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10185

Differential Revision: D9141222

Pulled By: weiyangfb

fbshipit-source-id: bb652e42cc15917019df080d6bce2926b18f3476
2018-09-18 20:26:02 -07:00
Xingdong Zuo
e2bc95e1bd add ModuleList.insert (#11664)
Summary:
fixes #11652
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11664

Differential Revision: D9892845

Pulled By: ezyang

fbshipit-source-id: 2c910d6bc0b28a999e25beca6e398fd0f35535c5
2018-09-18 07:41:28 -07:00
Tongzhou Wang
7df6650e9c Fix empty embedding bag on cuda (#11740)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/11739
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11740

Differential Revision: D9881392

Pulled By: SsnL

fbshipit-source-id: 2964d314f199dd9b4bb69e36592b67efdf5e0760
2018-09-17 14:40:03 -07:00
Tongzhou Wang
6f6b03566b Vectorize grid sample 2d CPU kernels (#10980)
Summary:
This PR vectorizes the CPU grid sample 2d forward and backward kernels. Along the way it also:

 + adds `.data()` in `TensorAccessor`
 + supports a non-void return value when declaring a CPU kernel stub
 + adds `bool at::geometry_is_contiguous(IntList sizes, IntList strides)`

1. The following vectorized CPU primitives are added:

    + `gather<scale>(baseaddr, vindex)`: `result[i] = baseaddr[vindex[i] * scale]`
    + `mask_gather<scale>(src, baseaddr, vindex, mask)`: `result[i] = mask[i] ? baseaddr[vindex[i] * scale] : src[i]`.
    + comparison ops
    + binary logical ops
    + `min(a, b)`
    + `cast<dst_t, src_t>(src_vec)`: changing dtype but keeping the bit representation
    + `blendv(a, b, mask)`: `result[i] = mask[i] ? b[i] : a[i]`.
    + ctor with multiple values (i.e., `setr`)
    + `arange(start = 0, step = 1)`: constructs a vector with values specified by the arange parameters
    + `convert_to_int_of_same_size(vec)`: convert floating point vector to corresponding integral type of same size
    + `interleave2(a, b)` & `deinterleave2(x, y)`: interleave or deinterleaves two vectors. E.g., for `interleave`:
        ```
        inputs:
          {a0, a1, a2, a3, a4, a5, a6, a7}
          {b0, b1, b2, b3, b4, b5, b6, b7}
        outputs:
          {a0, b0, a1, b1, a2, b2, a3, b3}
          {a4, b4, a5, b5, a6, b6, a7, b7}
        ```

  2. Grid sample CPU kernel implementations are described in the following note (also in `GridSampleKernel.cpp`:

  ```
   NOTE [ Grid Sample CPU Kernels ]

   Implementation of vectorized grid sample CPU kernels is divided into three
   parts:

   1. `ComputeLocation` struct
      Transforms grid values into interpolation locations of the input tensor
      for a particular spatial dimension, basing on the size of that dimension
      in input tensor, and the padding mode.
```
```cpp
      template<typename scalar_t, GridSamplerPadding padding>
      struct ComputeLocation {
        using Vec = Vec256<scalar_t>;

        // ctor
        ComputeLocation(int64_t size);

        // Given grid values `in`, return the interpolation locations after
        // un-normalization and padding mechanism (elementwise).
        Vec apply(const Vec &in) const;

        // Similar to `apply`, but also returns `d apply(in) / d in`
        // (elementwise).
        // this is often used in gradient computation.
        std::pair<Vec, Vec> apply_get_grad(const Vec &in) const;
      };
```
```
   2. `ApplyGridSample` struct
      Owns N `ComputeLocation` structs, where N is the number of spatial
      dimensions. Given N input grid vectors (one for each spatial dimension)
      and spatial offset, it gets the interpolation locations from
      `ComputeLocation`s, applies interpolation procedure, and then writes to
      the output (or grad_input & grad_grid in backward).
```
```cpp
      template<typename scalar_t, int spatial_dim,
               GridSamplerInterpolation interp,
               GridSamplerPadding padding>
      struct ApplyGridSample {

        // ctor
        ApplyGridSample(const TensorAccessor<scalar_t, 4>& input);

        // Applies grid sampling (forward) procedure:
        //   1. computes interpolation locations from grid values `grid_x` and
        //      `grid_y`,
        //   2. interpolates output values using the locations and input data
        //      in `inp_slice`, and
        //   3. writes the first `len` values in the interpolated vector to
        //      `out_slice` with spatial offset being `offset`.
        //
        // This assumes that `grid_x` and `grid_y` all contain valid grid
        // values \in [-1, 1], even at indices greater than `len`.
        //
        // The `*_slice` argument names mean samples within a batch (i.e.,
        // with the batch dimension sliced out).
        void forward(TensorAccessor<scalar_t, 3>& out_slice,
                     const TensorAccessor<scalar_t, 3>& inp_slice,
                     int64_t offset, const Vec& grid_x, const Vec& grid_y,
                     int64_t len) const;

        // Applies grid sampling (backward) procedure. Arguments semantics
        // and strategy are similar to those of `forward`.
        void backward(TensorAccessor<scalar_t, 3>& gInp_slice,
                      TensorAccessor<scalar_t, 3>& gGrid_slice,
                      const TensorAccessor<scalar_t, 3>& gOut_slice,
                      const TensorAccessor<scalar_t, 3>& inp_slice,
                      int64_t offset, const Vec& grid_x, const Vec& grid_y,
                      int64_t len) const;
      }
```
```
   3. `grid_sample_2d_grid_slice_iterator` function
      Among the tensors we work with, we know that the output tensors are
      contiguous (i.e., `output` in forward, and `grad_input` & `grad_grid` in
      backward), we need to randomly read `input` anyways, and `grad_output`
      usually comes from autograd and is often contiguous. So we base our
      iterating strategy on the geometry of grid.
      The `grid_sample_2d_grid_slice_iterator` function provides an abstraction to
      efficiently iterate through a `grid` slice (without the batch dimension).
      See comments of that function on the specific cases and strategies used.
```
```cpp
      template<typename scalar_t, typename ApplyFn>
      void grid_sample_2d_grid_slice_iterator(
        const TensorAccessor<scalar_t, 3>& grid_slice,
        const ApplyFn &apply_fn);

      // `apply_fn` is a function/lambda that can be called as if it has
      // declaration:
      //   void apply_fn(const Vec256<scalar_t>& grid_x,
      //                 const Vec256<scalar_t>& grid_y,
      //                 int64_t spatial_offset, int64_t len);
```
```
      `apply_fn` will be called multiple times, and together cover the entire
      output spatial space. Therefore, e.g., to implement forward 2d grid
      sample, we can do
```
```cpp
      ApplyGridSample<scalar_t, 2, interp, padding> grid_sample(input_accessor);

      for (int n = 0; n < input_accessor.size(0); n++) {
        grid_sample_2d_grid_slice_iterator(
          grid_accessor[n],
          [&](const Vec256<scalar_t>& grid_x, const Vec256<scalar_t>& grid_y,
              int64_t spatial_offset, int64_t len) {
            grid_sample.forward(out_accessor[n], input_accessor[n],
                                spatial_offset, grid_x, grid_y, len);
          });
      }
   ```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10980

Differential Revision: D9564867

Pulled By: SsnL

fbshipit-source-id: 5b7c3c7ea63af00eec230ae9ee1c3e6c6c9679b4
2018-09-16 20:41:10 -07:00
Edward Yang
74197c7115 Restore support for dim=None on WeightNorm. (#11661)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11661

Reviewed By: veenix

Differential Revision: D9826799

Pulled By: ezyang

fbshipit-source-id: 9eec57bb27a365406669e412f6eb88741b22ed3d
2018-09-14 07:39:43 -07:00
Wei Yang
54107ae8cf convert output_device at data_parallel from torch.device to index (#10189)
Summary:
- fixes #9984
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10189

Differential Revision: D9545390

Pulled By: weiyangfb

fbshipit-source-id: 3a6a705437553ba319e9fd4b7f676ff73857a27e
2018-09-11 20:27:07 -07:00
Erik Brinkman
91089a7e17 Add GPU implementation of pdist (#11102)
Summary:
Add the gpu kernel version.

The parallelization scheme I went with performs poorly when there are a large number of vectors that are all short, as I don't allocate the thread pool to wrap in that case.

Test Plan
---------
```
python -m unittest test_torch.TestTorch.test_pdist_{empty,scipy} test_nn.TestNN.test_pdist{,_zeros,_empty_row,_empty_col,_cpu_gradgrad_unimplemented,_cuda_gradgrad_unimplemented} test_jit.TestJitGenerated.test_nn_pdist
```

Current performance numbers are a little underwhelming; I'm in the process of debugging.

size | torch | torch cuda | scipy
-----|-------|------------|------
16 x 16 | 9.13 µs ± 3.55 µs | 9.86 µs ± 81.5 ns | 15.8 µs ± 1.2 µs
16 x 1024 | 15 µs ± 224 ns | 9.48 µs ± 88.7 ns | 88.7 µs ± 8.83 µs
1024 x 16 | 852 µs ± 6.03 µs | 7.84 ms ± 6.22 µs | 4.7 ms ± 166 µs
1024 x 1024 | 34.1 ms ± 803 µs | 11.5 ms ± 6.24 µs | 273 ms ± 6.7 ms
2048 x 2048 | 261 ms ± 3.5 ms | 77.5 ms ± 41.5 µs | 2.5 s ± 97.6 ms
4096 x 4096 | 2.37 s ± 154 ms | 636 ms ± 2.97 µs | 25.9 s ± 394 ms
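For reference, a minimal usage sketch of the operator being benchmarked (not from the PR):
```python
import torch
import torch.nn.functional as F

x = torch.randn(1024, 16)
d = F.pdist(x)                       # condensed pairwise distances, shape (1024*1023//2,)
if torch.cuda.is_available():
    d_gpu = F.pdist(x.cuda())        # runs the new GPU kernel
```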
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11102

Differential Revision: D9697305

Pulled By: erikbrinkman

fbshipit-source-id: 2b4f4b816c02b3715a85d8db3f4e77479d19bb99
2018-09-07 09:09:46 -07:00
iotamudelta
9de2085806 Use custom hcc/HIP, purge hcSPARSE (#11198)
Summary:
* purge hcSPARSE now that rocSPARSE is available
* integrate a custom hcc and HIP
* hcc brings two important compiler fixes (fixes hundreds of unit tests)
* HIP brings a smart dispatcher that allows us to avoid a lot of static_casts (we haven't yet removed the automatic static_casts but this catches some occurrences the script did not catch)
* mark 5 unit tests skipping that have regressed w/ the new hcc (we don't know yet what is at fault)
* optimize bitonic sort - the comparator is always an empty struct - therefore passing it by value saves at least 3 bytes. It also removes an ambiguity around passing references to `__global__` functions
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11198

Differential Revision: D9652340

Pulled By: ezyang

fbshipit-source-id: f5af1d891189da820e3d13b7bed91a7a43154690
2018-09-06 19:38:07 -07:00
iotamudelta
33c7cc13ca improve docker packages, fix bugs, enable tests, enable FFT (#10893)
Summary:
* improve docker packages (install OpenBLAS to have at-compile-time LAPACK functionality w/ optimizations for both Intel and AMD CPUs)
* integrate rocFFT (i.e., enable Fourier functionality)
* fix bugs in ROCm caused by wrong warp size
* enable more test sets, skip the tests that don't work on ROCm yet
* don't disable asserts any longer in hipification
* small improvements
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10893

Differential Revision: D9615053

Pulled By: ezyang

fbshipit-source-id: 864b4d27bf089421f7dfd8065e5017f9ea2f7b3b
2018-09-02 08:54:42 -07:00
Tongzhou Wang
e85f3fccb3 Fix relying on UB in test_data_parallel_nested_output (#11092)
Summary:
We shouldn't rely on plain `dict` ordering. Example failure: https://ci.pytorch.org/jenkins/job/pytorch-builds/job/pytorch-linux-xenial-cuda8-cudnn6-py3-test1/8417/console
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11092

Reviewed By: ezyang

Differential Revision: D9583274

Pulled By: SsnL

fbshipit-source-id: ba80b96648c98c24c2ec5fa6fd9aa566c095cce7
2018-08-30 13:10:25 -07:00
Erik Brinkman
611a608517 Add ATen pdist CPU kernel (#10782)
Summary:
Also add single grad whitelist to the jit test
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10782

Reviewed By: ezyang

Differential Revision: D9583378

Pulled By: erikbrinkman

fbshipit-source-id: 069e5ae68ea7f3524dec39cf1d5fe9cd53941944
2018-08-30 11:55:27 -07:00
Will Feng
b14f2e899c Preserve sparse tensor shape and dim invariants, and add scalar tensor support (#9279)
Summary:
When 0-sized dimension support is added, we expect an empty sparse tensor to be a 1-dimensional tensor of size `[0]`, with `sparseDims == 1` and `denseDims == 0`. Also, we expect the following invariants to be preserved at all times:

```
_sparseDims + _denseDims = len(shape)
_indices.shape: dimensionality: 2,  shape: (_sparseDims, nnz)
_values.shape:  dimensionality: 1 + _denseDims.  shape: (nnz, shape[_sparseDims:])
```

This PR fixes various places where the invariants are not strictly enforced when 0-sized dimension support is enabled.

Tested and `test_sparse.py` passes locally on both CPU and CUDA with the `USE_TH_SIZE_ZERO_DIM` flag.
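A small sketch illustrating the invariants above (values chosen for illustration):
```python
import torch

i = torch.tensor([[0, 1, 1],
                  [2, 0, 2]])                 # shape (_sparseDims, nnz) = (2, 3)
v = torch.tensor([3.0, 4.0, 5.0])             # shape (nnz,) -- no dense dims here
s = torch.sparse_coo_tensor(i, v, (2, 3)).coalesce()

# _sparseDims + _denseDims == len(shape); indices are 2-D, values have 1 + _denseDims dims
print(s._indices().shape, s._values().shape)  # torch.Size([2, 3]) torch.Size([3])
```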
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9279

Differential Revision: D8936683

Pulled By: yf225

fbshipit-source-id: 12f5cd7f52233d3b26af6edc20b4cdee045bcb5e
2018-08-23 10:10:24 -07:00
Tongzhou Wang
de11a5fb28 Resubmit #8322 with scipy version check
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10775

Differential Revision: D9458207

Pulled By: SsnL

fbshipit-source-id: f2b0dbf2d236134afded9b15d8bf55ff98f50e7b
2018-08-22 13:39:49 -07:00
Tongzhou Wang
c5c1c051ca Fix dropout fused kernel applied in eval mode (#10621)
Summary:
fixes https://github.com/pytorch/pytorch/issues/10584

cc apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10621

Differential Revision: D9379397

Pulled By: SsnL

fbshipit-source-id: 5ff2939ba794af082ce597ef289a09ee757636dc
2018-08-17 14:54:42 -07:00
Jerry Ma
afd7477eaa Add `buffers(), named_buffers()` methods. (#10554)
Summary:
This commit adds the ``buffers()`` and ``named_buffers()`` methods as
analogues of ``parameters()`` and ``named_parameters()``.
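A quick sketch of the new methods (module choice is illustrative):
```python
import torch.nn as nn

bn = nn.BatchNorm1d(4)
for name, buf in bn.named_buffers():
    print(name, tuple(buf.shape))   # e.g. running_mean (4,), running_var (4,), ...
buffers = list(bn.buffers())        # unnamed variant, analogous to parameters()
```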
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10554

Reviewed By: SsnL

Differential Revision: D9367762

Pulled By: jma127

fbshipit-source-id: f2042e46a7e833dce40cb41681dbd80d7885c74e
2018-08-16 16:26:48 -07:00
Simon Wang
a129f9ad3b Revert D9332335: [pytorch][PR] Implements volumetric (5d) affine grid generation.
Differential Revision:
D9332335

Original commit changeset: 1b3a91d078ef

fbshipit-source-id: 3dcce680257a6da121f5d67918ed4236e0c5bfec
2018-08-15 15:25:11 -07:00
Adam Paszke
86363e1d8e Move RNN implementations to C++ (#10481)
Summary:
This is the first of two changes that are supposed to improve how we handle RNNs in the JIT. They still get traced as `PythonOp`s, but now it will be much easier to actually expose them to the JIT as e.g. `aten::lstm`, and ignore the Python interpreter entirely. This needs some symbolic adjustments that will be part of a second PR.

Even when we fix symbolics, there will still be a bit of a problem with statefulness of the cuDNN API (we need a mutable cache for the dropout state, but our IR has no way of representing that).

zdevito ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10481

Reviewed By: ezyang

Differential Revision: D9341113

Pulled By: apaszke

fbshipit-source-id: 0ae30ead72a1b12044b7c12369d11e5ca8ec30b5
2018-08-15 13:25:41 -07:00
Tongzhou Wang
254dedf604 Propagate NaN through threshold (#10277)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/10238
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10277

Reviewed By: SsnL

Differential Revision: D9199825

Pulled By: soumith

fbshipit-source-id: 8ee7f9a72d9546d429f311c3f6028461d3c93fe2
2018-08-15 12:59:31 -07:00
Brian Hart
9cffe783f1 relax tolerance for two torch.half (float16) tests (#10519)
Summary:
Two tests in the 'nn' test bucket may fail when the torch.half
(float16) data type is used. The assertions used in the tests
intend to allow slight floating point imprecision in the results,
but the tolerances used for the comparisons are too strict for
the half type.

Relax the tolerances so that slight float16 imprecision won't
cause test failures.

The affected tests are:

- test_variable_sequence_cuda
- test_Conv2d_groups_nobias

For more information, see issue:

https://github.com/pytorch/pytorch/issues/7420
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10519

Differential Revision: D9343751

Pulled By: soumith

fbshipit-source-id: 90aedf48f6e22dd4fed9c7bde7cd7c7b6885845a
2018-08-15 12:11:20 -07:00
Eli Stevens
f5a4dd89b5 Implements volumetric (5d) affine grid generation. (#8322)
Summary:
I've implemented affine grid generation for volumetric (5d) inputs. The implementation is based off of the spatial implementation, extended by one dimension. I have a few questions about my implementation vs. the existing one that I will add inline.

I have some extensive test cases for the forward pass here: https://gist.github.com/elistevens/6e3bfb20d8d0652b83bd16b3e911285b However, they use `pytest.fixture` extensively, so I'm not sure the best way to incorporate them into the pytorch test suite. Suggestions? I have not tested backwards at all.

Diff probably best viewed with whitespace changes ignored.

Thanks for considering!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/8322

Differential Revision: D9332335

Pulled By: SsnL

fbshipit-source-id: 1b3a91d078ef41a6d0a800514e49298fd817e4df
2018-08-15 11:02:08 -07:00
Tongzhou Wang
6a55238a3f Grid sampler: nearest interpolation & reflection padding (#10051)
Summary:
Closes #9702.

cc jph00

Commit structure:

1. Change the index calculation logic. I will explain using 1-D for simplicity.

	Previously we have (in pseudo code):

	```
	// 1. get the float locations from grid
	scalar_t x = from_grid()

	// 2. find the integral surrounding indices
	int x_left = floor(x)
	int x_right = x_left + 1

	// 3. calculate the linear interpolate weights
	scalar_t w_left = x_right - x
	scalar_t w_right = x - x_left

	// 4. manipulate the integral surrounding indices if needed
	// (e.g., clip for border padding_mode)
	x_left = manipulate(x_left, padding_mode)
	x_right = manipulate(x_right, padding_mode)

	// 5. interpolate
	output_val = interpolate(w_left, w_right, x_left, x_right)
	```

	This is actually incorrect (and also unintuitive) because it calculates the
	weights before manipulating out-of-boundary indices. Fortunately, this
	isn't manifested in both of the current supported modes, `'zeros'` and
	`'border'` padding:

	+ `'zeros'`: doesn't clip
	+ `'border'`: clips, but for out-of-bound `x` both `x_left` and `x_right` are
	  clipped to the same value, so weights don't matter

	But this is a problem with reflection padding, since after each time we reflect,
	the values of `w_left` and `w_right` should be swapped.

	So in this commit I change the algorithm to (numbers corresponding to the
        ordering in the above pseudo-code)

	```
	1. get float location
	4. clip the float location
	2. find the integral surrounding indices
	3. calculate the linear interpolate weights
	```

	In the backward, because of this change, I need to add new variables to track
	`d manipulate_output / d manipulate_input`, which is basically a multiplier
	on the gradient calculated for `grid`. From benchmarking this addition doesn't
	cause obvious slow downs.

2. Implement reflection padding. The indices will keep being reflected until
	they fall within the boundary.

	Added variant of `clip_coordinates` and `reflect_coordinates` to be used in
	backward. E.g.,
	```cpp
	// clip_coordinates_set_grad works similarly to clip_coordinates except that
	// it also returns the `d output / d input` via pointer argument `grad_in`.
	// This is useful in the backward pass of grid_sampler.
	scalar_t clip_coordinates_set_grad(scalar_t in, int64_t clip_limit, scalar_t *grad_in)
	```
	For example, if `in` is clipped in `'border'` mode, `grad_in` is set to `0`.
	If `in` is reflected **odd** times in `'reflection'` mode, `grad_in`
	is set to `-1`.

3. Implement nearest interpolation (a usage sketch of both new modes follows this list).

4. Add test cases

5. Add better input checking
  Discussed with goldsborough about moving `operator<<` of `at::Device`,
  `at::DeviceType` and `at::Layout` into `at` namespace. (Otherwise
  `AT_CHECK` can't find them.)

6. Support empty tensors. cc gchanan

    + Make empty tensors not acceptable by cudnn.
    + Add `AT_ASSERT(kernel block size  > 0)` if using `GET_BLOCKS`
   + Cache `numel` in `TensorGeometry`
      I was going to use `numel` to test if cudnn descriptor should accept a
      tensor, but it isn't used eventually. I can revert this if needed.

7. Add more test cases, including on input checking and empty tensors

8. Remove an obsolete comment

9. Update docs. Manually tested by generating docs.
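A minimal usage sketch of the two new options (identity grid used for illustration):
```python
import torch
import torch.nn.functional as F

inp = torch.randn(1, 3, 8, 8)
# identity sampling grid of shape (N, H, W, 2), values in [-1, 1]
theta = torch.eye(2, 3).unsqueeze(0)
grid = F.affine_grid(theta, (1, 3, 8, 8))

out = F.grid_sample(inp, grid, mode='nearest', padding_mode='reflection')
```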
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10051

Differential Revision: D9123950

Pulled By: SsnL

fbshipit-source-id: ac3b4a0a36b39b5d02e83666cc6730111ce216f6
2018-08-10 12:43:27 -07:00
Natalia Gimelshein
5bb21493fd add fused dropout kernels (#9666)
Summary:
While waiting for dropout to be fully ported to ATen, here's a performance fix for the most common dropout case. Dropout is still a Python function; I just added an efficient path to it. I could not make inplace work, because the generator always generates `return self` for inplace functions, and I need to return both the original tensor and the mask, so inplace stays on the existing path. Even with the non-inplace version, since the mask is now a ByteTensor, memory used is just a little larger than for inplace dropout, due to savings on the mask.
Once dropout is moved to aten, these kernels still can be used for efficient implementation.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9666

Reviewed By: SsnL

Differential Revision: D8948077

Pulled By: ezyang

fbshipit-source-id: 52990ef769471d957e464af635e5f9b4e519567a
2018-08-07 13:34:53 -07:00
Wei Yang
149d4f776b use logsigmoid at multilabel_soft_margin_loss, and change output from shape=(N, C)to (N,) (#9965)
Summary:
- fixes #9141, #9301
- use logsigmoid at multilabel_soft_margin_loss to make it more stable (NOT fixing legacy MultiLabelSoftMarginCriterion)
- return (N) instead of (N, C) to match the same behavior as MultiMarginLoss
- Note that with this PR, the following behavior is expected:
```
loss = F.multilabel_soft_margin_loss(outputs, labels, reduction='none')
loss_mean = F.multilabel_soft_margin_loss(outputs, labels, reduction='elementwise_mean')
loss_sum = F.multilabel_soft_margin_loss(outputs, labels, reduction='sum')

loss.sum() == loss_sum  # True
loss.mean() == loss_mean  # True
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9965

Differential Revision: D9038402

Pulled By: weiyangfb

fbshipit-source-id: 0fa94c7b3cd370ea62bd6333f1a0e9bd0b8ccbb9
2018-08-03 17:54:19 -07:00
Thomas Viehmann
6456b944fd ctc_loss odds and ends (#10112)
Summary:
- Add convenience wrapper to pass tensors as input_lengths, target_lengths
- Fix documentation example
- Check BLANK >= 0

Thank you, Simon and Soumith for the suggestions!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10112

Differential Revision: D9130737

Pulled By: SsnL

fbshipit-source-id: f9a0022a969788bda3db9f360e2564b519ebf2e6
2018-08-03 13:25:18 -07:00
Tongzhou Wang
43b151224e Move grid sampler to ATen (#9961)
Summary:
Spatial version benchmark

|                           | CPUFloat THNN | CPUFloat ATen | CPUDouble THNN | CPUDouble ATen | CUDAHalf THNN | CUDAHalf ATen | CUDAFloat THNN | CUDAFloat ATen | CUDADouble THNN | CUDADouble ATen |
|---------------------------|---------------|---------------|----------------|----------------|---------------|---------------|----------------|----------------|-----------------|-----------------|
| [1024x1x28x28] zero pad   | 2.19281888s   | 0.21280479s   | 2.52922535s    | 0.23944831s    | 0.17494774s   | 0.06242800s   | 0.31270599s    | 0.03706479s    | 0.40542483s     | 0.07391024s     |
| [1024x1x28x28] border pad | 3.04329610s   | 0.24705672s   | 2.29205394s    | 0.22336411s    | 0.17980361s   | 0.06212497s   | 0.31415701s    | 0.03847790s    | 0.43020391s     | 0.07540464s     |
| [32x3x244x244] zero pad   | 18.29301333s  | 2.18566656s   | 19.01662397s   | 3.51552224s    | 1.72487235s   | 0.28933954s   | 2.02466702s    | 0.18178749s    | 2.63671613s     | 0.41391206s     |
| [32x3x244x244] border pad | 18.72205329s  | 2.02600884s   | 20.13017297s   | 3.25979590s    | 1.96455693s   | 0.33070564s   | 2.18666625s    | 0.19546938s    | 2.91268897s     | 0.38465047s     |

For #9702

basics:
+ grid tensors have dimensions `[N, H, W, 2]` (or `[N, D, H, W, 3]` for 3d).
+ input/output tensors have dimensions `[N, C, H, W]` (or `[N, C, D, H ,W]` for 3d)
+ grid sampler maps `input([N, C, inp_H, inp_W]), grid([N, H, W, 2])` to `output([N, C, H, W])` (3d case is similar).

variable naming:
+ `tensor_sH` means the stride of `tensor` at the dimension of `H`.
+ `tensor_ptr_NCH` is a data pointer that always points to the beginning of the `tensor[n][c][h]` slice in the loop.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9961

Differential Revision: D9057175

Pulled By: SsnL

fbshipit-source-id: 9ed8f1dc376ed10229f047fdcf3c90dbd250bee6
2018-08-01 07:54:46 -07:00
Xiang Gao
6fc75eadf0 Add CELU activation to pytorch (#8551)
Summary:
Also fuse input scale multiplication into ELU

Paper:
https://arxiv.org/pdf/1704.07483.pdf
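A small sketch (not from this PR) of the added module next to its elementwise definition, CELU(x) = max(0, x) + min(0, alpha * (exp(x / alpha) - 1)):

```python
import torch
import torch.nn as nn

x = torch.tensor([-2.0, -0.5, 0.0, 1.5])
celu = nn.CELU(alpha=1.0)
y = celu(x)

# Reference computation of the same elementwise formula.
alpha = 1.0
y_ref = x.clamp(min=0) + (alpha * (torch.exp(x / alpha) - 1)).clamp(max=0)
print(torch.allclose(y, y_ref))  # True
```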
Pull Request resolved: https://github.com/pytorch/pytorch/pull/8551

Differential Revision: D9088477

Pulled By: SsnL

fbshipit-source-id: 877771bee251b27154058f2b67d747c9812c696b
2018-08-01 07:54:44 -07:00
Kyle M. Tarplee
aae37324cc fixed a newly introduced regression in softmax (#10066)
Summary:
There is a regression in softmin in 0.4.1 that was not present in 0.4.0.  The behavior of softmin(x) should match softmax(-x); instead, in v0.4.1 it is implemented as -softmax(x).  These are not the same.  The fix is trivial because the bug is due to operator precedence.

This is a major regression that broke my training.  I'm not sure how a unit test did not catch this.

```
x = torch.tensor([1, 2, 3.5, 4])
print(F.softmin(x, dim=0)) # this has the wrong output in 0.4.1 but correct in 0.4.0
print(F.softmax(-x, dim=0)) # this is what softmax should be
print(F.softmax(x, dim=0))
print(-F.softmax(x, dim=0)) # this is how softmax is implemented incorrectly
```
In 0.4.1 this produces
tensor([-0.0278, -0.0755, -0.3385, -0.5581])
tensor([0.6668, 0.2453, 0.0547, 0.0332])
tensor([0.0278, 0.0755, 0.3385, 0.5581])
tensor([-0.0278, -0.0755, -0.3385, -0.5581])

In 0.4.0 this produces the correct values
tensor([ 0.6668,  0.2453,  0.0547,  0.0332])
tensor([ 0.6668,  0.2453,  0.0547,  0.0332])
tensor([ 0.0278,  0.0755,  0.3385,  0.5581])
tensor([-0.0278, -0.0755, -0.3385, -0.5581])
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10066

Differential Revision: D9106995

Pulled By: soumith

fbshipit-source-id: 7332503c6077e8461ad6cd72422c749cf6ca595b
2018-07-31 19:28:30 -07:00
Roy Li
2422801625 fix _pointwise_loss for target gradients (#10018)
Summary:
_pointwise_loss has some Python special-casing; we converted `reduction` to ATen enums too early.

fixes #10009
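A small check along the lines of what the fix enables (not from this PR; the `reduction` spelling follows current releases):

```python
import torch
import torch.nn.functional as F

inp = torch.randn(4, 5, requires_grad=True)
target = torch.randn(4, 5, requires_grad=True)   # gradients w.r.t. the target are wanted too

loss = F.mse_loss(inp, target, reduction='mean')
loss.backward()
print(inp.grad.shape, target.grad.shape)         # both torch.Size([4, 5])
```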
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10018

Differential Revision: D9075489

Pulled By: li-roy

fbshipit-source-id: 4bf2f5e2911e757602c699ee1ec58223c61d0162
2018-07-31 13:39:58 -07:00
Thomas Viehmann
685224aa14 Add CTC loss (#9628)
Summary:
The CPU and CUDA variants are a direct transposition of Graves et al.'s description of the algorithm, with the
modification that it is in log space.
There is also a binding for the (much faster) CuDNN implementation.

This could eventually fix #3420

I still need to add tests (TestNN seems much more elaborate than the other testing) and fix the bugs that invariably turn up during the testing. Also, I want to add some more code comments.

I could use feedback on all sorts of things, including:
- Type handling (cuda vs. cpu for the int tensors, dtype for the int tensors)
- Input convention. I use log probs because that is what the gradients are for.
- Launch parameters for the kernels
- Errors and omissions and anything else I'm not even aware of.

Thank you for looking!

In terms of performance it looks like it is superficially comparable to WarpCTC (but I have not systematically investigated this).
I have read that CuDNN is much faster than other implementations because it does *not* use log space, but also because its gathering step is much faster (I avoided trying tricky things there, as they seem to contribute to warpctc's fragility). I might think some more about which existing torch function (scatter or index..) I could learn from for that step.
Average timings for the kernels from nvprof for some size:

```
CuDNN:
60.464us compute_alphas_and_betas
16.755us compute_grads_deterministic
Cuda:
121.06us ctc_loss_backward_collect_gpu_kernel (= grads)
109.88us ctc_loss_gpu_kernel (= alphas)
98.517us ctc_loss_backward_betas_gpu_kernel (= betas)
WarpCTC:
299.74us compute_betas_and_grad_kernel
66.977us compute_alpha_kernel
```

Of course, I still have the (silly) outer blocks loop rather than computing consecutive `s` in each thread which I might change, and there are a few other things where one could look for better implementations.

Finally, it might not be unreasonable to start with these implementations, as the performance of the loss has to be seen in the context of the entire training computation, so this would likely dilute the relative speedup considerably.

My performance measuring testing script:
```
import timeit
import sys
import torch
num_labels = 10
target_length  = 30
input_length = 50
eps = 1e-5
BLANK = 0#num_labels
batch_size = 16

torch.manual_seed(5)
activations = torch.randn(input_length, batch_size, num_labels + 1)
log_probs = torch.log_softmax(activations, 2)
probs = torch.exp(log_probs)
targets = torch.randint(1, num_labels+1, (batch_size * target_length,), dtype=torch.long)
targets_2d = targets.view(batch_size, target_length)
target_lengths = torch.tensor(batch_size*[target_length])
input_lengths = torch.tensor(batch_size*[input_length])
activations = log_probs.detach()

def time_cuda_ctc_loss(grout, *args):
    torch.cuda.synchronize()
    culo, culog_alpha = torch._ctc_loss(*args)
    g, = torch.autograd.grad(culo, args[0], grout)
    torch.cuda.synchronize()

def time_cudnn_ctc_loss(grout, *args):
    torch.cuda.synchronize()
    culo, cugra = torch._cudnn_ctc_loss(*args)
    g, = torch.autograd.grad(culo, args[0], grout)
    torch.cuda.synchronize()

def time_warp_ctc_loss(grout, *args):
    torch.cuda.synchronize()
    culo = warpctc.ctc_loss(*args, blank_label=BLANK, size_average=False, length_average=False, reduce=False)
    g, = torch.autograd.grad(culo, args[0], grout)
    torch.cuda.synchronize()

if sys.argv[1] == 'cuda':
    lpcu = log_probs.float().cuda().detach().requires_grad_()
    args = [lpcu, targets_2d.cuda(), input_lengths.cuda(), target_lengths.cuda(), BLANK]
    grout = lpcu.new_ones((batch_size,))
    torch.cuda.synchronize()
    print(timeit.repeat("time_cuda_ctc_loss(grout, *args)", number=1000, globals=globals()))
elif sys.argv[1] == 'cudnn':
    lpcu = log_probs.float().cuda().detach().requires_grad_()
    args = [lpcu, targets.int(), input_lengths.int(), target_lengths.int(), BLANK, True]
    grout = lpcu.new_ones((batch_size,))
    torch.cuda.synchronize()
    print(timeit.repeat("time_cudnn_ctc_loss(grout, *args)", number=1000, globals=globals()))
elif sys.argv[1] == 'warpctc':
    import warpctc
    activations = activations.cuda().detach().requires_grad_()
    args = [activations, input_lengths.int(), targets.int(), target_lengths.int()]
    grout = activations.new_ones((batch_size,), device='cpu')
    torch.cuda.synchronize()

    print(timeit.repeat("time_warp_ctc_loss(grout, *args)", number=1000, globals=globals()))
```
I'll also link to a notebook that I used for writing up the algorithm in simple form and then testing the implementations against it.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9628

Differential Revision: D8952453

Pulled By: ezyang

fbshipit-source-id: 18e073f40c2d01a7c96c1cdd41f6c70a06e35860
2018-07-31 11:09:48 -07:00
Owen Anderson
6ed41adb04 Use round-to-negative division when computing output sizes for convolutions involving striding and dilation.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9640

Differential Revision: D8948081

Pulled By: resistor

fbshipit-source-id: 06f2e3ad1bdb448be6f36577cb9bd27c884df595
2018-07-27 13:22:54 -07:00
Tongzhou Wang
27455e9c78 Use _six for inf and nan (#9500)
Summary:
Things like `float('inf')` are actually quite expensive.
```py
In [1]: import math

In [2]: %timeit -n 200 math.inf
49.3 ns ± 1.42 ns per loop (mean ± std. dev. of 7 runs, 200 loops each)

In [3]: %timeit -n 200 float('inf')
194 ns ± 39.1 ns per loop (mean ± std. dev. of 7 runs, 200 loops each)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9500

Reviewed By: soumith

Differential Revision: D8876229

Pulled By: SsnL

fbshipit-source-id: 78602b76bb53d5588910b58270930c0bd413d2d7
2018-07-18 10:40:29 -07:00
tippisum
5c695e3a60 Implement 2D and 3D alpha_dropout (#9073)
Summary:
It implements per-channel alpha_dropout. It also creates corresponding function classes and unifies the process of dropout and alpha_dropout.
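For reference, a minimal sketch of the per-channel behaviour using the feature-wise functional as it exists in current releases (the exact name introduced by this PR may differ):

```python
import torch
import torch.nn.functional as F

x = torch.randn(8, 16, 32, 32)   # [N, C, H, W]
# Per-channel alpha dropout: whole feature maps are dropped while keeping the
# self-normalizing statistics that SELU-style activations assume.
y = F.feature_alpha_dropout(x, p=0.2, training=True)
print(y.shape)  # torch.Size([8, 16, 32, 32])
```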
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9073

Differential Revision: D8727008

Pulled By: ezyang

fbshipit-source-id: 9d509f9c5db4e98f7b698cdfc4443505a4d2b331
2018-07-17 17:10:16 -07:00
Karan Dwivedi
97008a64a1 Add ModuleDict and ParameterDict containers (#8463)
Summary:
Addresses:

https://github.com/pytorch/pytorch/issues/4048 and https://github.com/pytorch/pytorch/pull/5297#issuecomment-394924139
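A short usage sketch (not from this PR; names are made up) showing why dict-style containers are convenient — entries are registered just like `ModuleList`/`ParameterList` entries, but keyed by name:

```python
import torch
import torch.nn as nn

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        # Submodules in a ModuleDict are registered and show up in .parameters()/.state_dict()
        self.activations = nn.ModuleDict({
            'relu': nn.ReLU(),
            'prelu': nn.PReLU(),
        })
        self.scales = nn.ParameterDict({
            'a': nn.Parameter(torch.ones(1)),
            'b': nn.Parameter(torch.zeros(1)),
        })

    def forward(self, x, act='relu', scale='a'):
        return self.activations[act](x) * self.scales[scale]

net = Net()
out = net(torch.randn(2, 3), act='prelu', scale='b')
```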
Pull Request resolved: https://github.com/pytorch/pytorch/pull/8463

Reviewed By: SsnL

Differential Revision: D8689291

Pulled By: ezyang

fbshipit-source-id: 47e67d9bae1b64ec10771a2c00c56229463b1598
2018-07-15 17:40:52 -07:00
Richard Zou
05559b4071 Accumulate MSELoss reduce=True into accreal instead of real (#9287)
Summary:
THNN was accumulating the result of reduction loss functions
into real instead of accreal. This was causing precision issues with
MSELoss.

This patch only fixes MSELoss. Some of the other losses exhibit bad precision as well (because they accumulate into real instead of accreal) and require more investigation. I will open an issue for those (#9286)

Fixes #8710

cc li-roy SsnL
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9287

Reviewed By: SsnL

Differential Revision: D8775708

Pulled By: zou3519

fbshipit-source-id: d1a1f159deee0cb90fd8e81e63b246115eea8e9e
2018-07-11 10:25:36 -07:00
Roy Li
a47a30b9ce Implement grid_sampler in aten (#8929)
Summary:
Partially addresses #8928.

Maybe #7273?
Pull Request resolved: https://github.com/pytorch/pytorch/pull/8929

Reviewed By: ezyang

Differential Revision: D8668919

Pulled By: li-roy

fbshipit-source-id: 8ad07b224d2ab211c274c4c10f042501efaae32c
2018-07-10 15:10:24 -07:00
Tongzhou Wang
e8536c08a1 Update extension docs, fix Fold/Unfold docs (#9239)
Summary:
Commits:
1. In extension doc, get rid of all references of `Variable` s (Closes #6947 )
    + also add minor improvements
    + also added a section with links to cpp extension :) goldsborough
    + removed mentions of `autograd.Function.requires_grad` as it's not used anywhere and hardcoded to return `Py_True`.
2. Fix several sphinx warnings
3. Change `*` in equations in `module/conv.py` to `\times`
4. Fix docs for `Fold` and `Unfold`.
    + Added better shape check for `Fold` (it previously could give a bogus result when there are not enough blocks). Added test for the checks.
5. Fix doc saying `trtrs` not available for CUDA (#9247 )
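For the `Fold`/`Unfold` pair, a small round-trip sketch (not from this PR; sizes are made up) illustrating the shapes the new checks guard:

```python
import torch
import torch.nn as nn

x = torch.randn(1, 3, 10, 12)                       # [N, C, H, W]
unfold = nn.Unfold(kernel_size=(4, 5))
patches = unfold(x)                                 # [N, C*4*5, L] sliding blocks, L = 7*8 = 56
fold = nn.Fold(output_size=(10, 12), kernel_size=(4, 5))
y = fold(patches)                                   # back to [N, C, H, W]; overlapping values are summed
print(patches.shape, y.shape)
```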
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9239

Reviewed By: soumith

Differential Revision: D8762492

Pulled By: SsnL

fbshipit-source-id: 13cd91128981a94493d5efdf250c40465f84346a
2018-07-08 19:09:39 -07:00
Ailing Zhang
227c8f2654 Implement nn.functional.interpolate based on upsample. (#8591)
Summary:
This PR addresses #5823.

* fix docstring: upsample doesn't support LongTensor

* Enable float scale up & down sampling for linear/bilinear/trilinear modes. (following SsnL 's commit)

* Enable float scale up & down sampling for nearest mode. Note that our implementation is slightly different from TF in that there's actually no "align_corners" concept in this mode.

* Add a new interpolate function API to replace upsample. Add a deprecation warning for upsample.

* Add an "area" mode, which is essentially adaptive average pooling, to resize_image.

* Add test cases for interpolate in test_nn.py

* Add a few comments to help understand *linear interpolation code.

* Only the "*cubic" mode, which is pretty useful in practice, is still missing from the resize_images API; it's labeled as hackamonth in #1552. I discussed with SsnL that we probably want to implement all new ops in ATen instead of THNN/THCUNN. Depending on the priority, I could either put it in my queue or leave it for a HAMer.

* After the change, the files named *Upsampling*.c work for both up/down sampling. I could rename the files if needed.
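A rough sketch of the new API (not from this PR) covering the float scale factors and the area mode described above:

```python
import torch
import torch.nn.functional as F

x = torch.randn(1, 3, 24, 24)

# Non-integer scale factors are allowed for both up- and downsampling.
up = F.interpolate(x, scale_factor=2.5, mode='bilinear', align_corners=False)
down = F.interpolate(x, scale_factor=0.5, mode='nearest')
area = F.interpolate(x, size=(7, 7), mode='area')    # adaptive-average-pooling style resize
print(up.shape, down.shape, area.shape)
```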

Differential Revision: D8729635

Pulled By: ailzhang

fbshipit-source-id: a98dc5e1f587fce17606b5764db695366a6bb56b
2018-07-06 15:28:11 -07:00
Tongzhou Wang
7b25cbbef9 Test nn.Module on non-contiguous inputs (#9114)
Summary:
1. Let `ModuleTest` raise when they fail on non-contiguous inputs. Fix legacy modules.
2. Fix BN (both THNN and cuDNN) not working on non-contiguous inputs.
3. Fix CUDA EmbeddingBag not working on non-contiguous inputs. To avoid calling `.contiguous()` in both `forward` and `backward`,
  a. prefix all current `embedding_bag*` functions with `_`, indicating that they require input to be contiguous (there is a check in each function).
  b. create `embedding_bag`, which makes input arguments `.contiguous()`, and calls `_embedding_bag`
4. Make many ATen `embedding*` functions work on non-contiguous inputs so we don't need to call `input = input.contiguous()` in Python `nn.functional.embedding`.
5. Fix dense-sparse addition when the sparse input is not coalesced and indices or values tensor is not contiguous. This came up in the test cases of Embedding modules with `sparse=True`. Added tests.
6. Update `TensorUtils.cpp` to use `AT_*` macros.

Request:
review from cpuhrsch on the `Embedding*` changes.
review from ezyang on ATen sparse & BN changes.
Closes https://github.com/pytorch/pytorch/pull/9114

Differential Revision: D8717299

Pulled By: SsnL

fbshipit-source-id: 0acc6f1c9522b5b605361e75112c16bbe1e98527
2018-07-05 21:09:34 -07:00
Will Feng
ff501c30af Turn on UBSAN in the OSS build (#8813)
Summary:
Copy of https://github.com/pytorch/pytorch/pull/8802
Closes https://github.com/pytorch/pytorch/pull/8813

Differential Revision: D8707364

Pulled By: yf225

fbshipit-source-id: bc201980b50e9fb44c42a17f898b50d3558fc417
2018-07-05 15:55:49 -07:00
Roy Li
21c786071b update nn loss tests to use new reduction arg (#9118)
Summary:
The tests were using the old args, which caused them to emit a lot of deprecation warnings.

closes #9103.
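For context, a small sketch (not from this PR) of the old vs. new argument style; at the time of this commit the mean option was spelled 'elementwise_mean', later renamed to 'mean':

```python
import torch
import torch.nn.functional as F

inp = torch.randn(8, 5)
target = torch.randn(8, 5)

# Old style (deprecated, emits warnings):
# loss = F.mse_loss(inp, target, size_average=False, reduce=True)

# New style, a single `reduction` argument:
loss_sum = F.mse_loss(inp, target, reduction='sum')
loss_none = F.mse_loss(inp, target, reduction='none')
```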

Reviewed By: ezyang

Differential Revision: D8720581

Pulled By: li-roy

fbshipit-source-id: 3b79527f6fe862fb48b99a6394e8d7b89fc7a8c8
2018-07-02 19:41:57 -07:00
Thomas Viehmann
e977485449 detach spectral norm calculated weight in eval mode (#9020)
Summary:
Since in eval mode the weight is left as the last calculated weight, we need to detach it from the computation in order to be able to use backward.
The typical use case is in GANs when the discriminator has spectral norm, is in eval mode and we want to backprop through the discriminator to get weight gradients for the generator.
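A minimal sketch of that use case (not from this PR; the tiny discriminator is made up):

```python
import torch
import torch.nn as nn
from torch.nn.utils import spectral_norm

D = nn.Sequential(spectral_norm(nn.Linear(16, 1)))   # stand-in discriminator
D.eval()                                             # no power iterations; last computed weight is reused

fake = torch.randn(4, 16, requires_grad=True)        # stands in for the generator's output
D(fake).mean().backward()                            # works because the stored weight is detached
print(fake.grad.shape)                               # torch.Size([4, 16])
```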
Closes https://github.com/pytorch/pytorch/pull/9020

Reviewed By: ezyang

Differential Revision: D8694054

Pulled By: SsnL

fbshipit-source-id: 09ee5843687cac3ed4c40759ac577a14c5371730
2018-07-02 10:39:47 -07:00
Tongzhou Wang
623ae0c07c Fix loading 0.4 BN checkpoints (#9004)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/8481
Closes https://github.com/pytorch/pytorch/pull/9004

Reviewed By: soumith

Differential Revision: D8684017

Pulled By: SsnL

fbshipit-source-id: 57820ad5f6b60795358c9447409a364a93ffa1d9
2018-07-01 22:24:21 -07:00
Roy Li
c61f0217a5 combine size_average and reduce args in loss functions (#8018)
Summary:
closes #7929
Closes https://github.com/pytorch/pytorch/pull/8018

Differential Revision: D8682540

Pulled By: li-roy

fbshipit-source-id: 649170dd1a7f373151c1d4e949838bd1c5651936
2018-07-01 05:39:00 -07:00
Karan Dwivedi
a41d433d9d Check key should be string in nn.Module.add_module, parameter and buffer (#8960)
Summary:
Because I probably messed up the rebase in https://github.com/pytorch/pytorch/pull/8905
Closes https://github.com/pytorch/pytorch/pull/8960

Reviewed By: soumith

Differential Revision: D8668202

Pulled By: ezyang

fbshipit-source-id: 41e19803c7ac7aac898c8e70c6a9769314476ca9
2018-06-27 19:40:00 -07:00
Orion Reblitz-Richardson
edb88b5f3a
Update from Facebook (#8887)
* add opencl + fpga context

adds an opencl context inside caffe2/fb which can be used for fpga access

* [Caffe2] Force tensor inference checks to be triggered during testing

We've started to rely on TensorInference functions more for different analysis.  This diff ensures that the TensorInference function's result matches what is expected from the definition of the operator.

* Enable building //caffe2:torch with @mode/opt

In @mode/opt, python runs out of a PAR, which breaks a lot of
assumptions in the code about where templates/ folders live relative
to __file__. Rather than introduce hacks with parutil, I simply turn
template_path into a parameter for all the relevant functions and
thread it through from the top level.

* [Caffe2] Fix cost models for DotProduct and Div.  Update Tensor Inference for dot product

As title.  DotProduct states that output is a 1-D tensor (https://caffe2.ai/docs/operators-catalogue.html#dotproduct) though code suggests it is either 0- or 1-D depending on inputs.  TensorInference defined to support implementation.

* [SG-MoE] Add an option to make the experts NOT as components

* [nomnigraph] Rename and fixup convertToNeuralNetOperator API

This will make things a bit cleaner

* no longer symlink THNN.h and THCUNN.h

* forced decoder network (onnx export)

Closes https://github.com/pytorch/translate/pull/95

Add networks in ensemble_export.py to create a forced decoding network from PyTorch NMT checkpoints. This network takes an arbitrary numberized (source, target) pair and returns the model score for the translation, including penalties.

Vocabulary reduction networks are also supported, but note that target indices which are not in the possible_translation_tokens generated for the source input will be trea

* Revert schema change to fix production models

Revert schema change to fix production models

* MockLogDeviceReader - rebase on FIX

# Goal

1), Build a make_mock_log_device_reader using make_mock_reader

2), Replace the real log_device_reader here: https://fburl.com/raihwf1p

# Log by D8151734

Real log_device_reader:
```
I0529 20:29:05.373108 954994 tensor.h:839] Tensor print_net/log of type std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >. Dims: (): read_net/ParseOpenTrainingRow:0
I0529 20:29:05.373244 954994 tensor.h:839] Tensor read_net/ParseOpenTrainin

* [C2/D2][1/n]: Nonnegative-Constrained Optimization -- log barrier

implement log barrier as a regularization method

* Add teacher weight screening.

Add teacher weight screening according to teacher labels. If the teacher label is zero, we do not use the distill loss in the objective function.

* Add NormalizerContext

See task for more detail. This implementation is a copy of what exists for RegularizerContext except for how the parameters are defined in the model_definition thrift file.

I'll try an alternative implementation which overrides the default arguments of functions instead like for argscopes in tensorflow.

https://github.com/pytorch/pytorch/compare/master...MaximeBoucher:update-from-facebook-0939578c068c?expand=1

* Adding cosine similarity option in dot processor

Add pairwise cosine similarity option in dot product.
Add an option to concate dot product and cosine similarity.
Add test cases.

* [nomnigraph][redo] Concat elim for sparseNN

Same as D7962948, which was reverted because Operator Schema was not
defined

* [pytorch] Revert pytorch/pytorch#7918 'Release GIL when copying to shared memory', breaks ASAN

Revert this pytorch diff that breaks ASAN when running Filament in dev mode; in opt mode it gives "bad file descriptor" errors. Looks like a race when copying tensors to shared memory in multiple mp.Queue's (which spawn separate threads).

https://github.com/pytorch/pytorch/pull/7918/files

* [nomnigraph][mobile] Enable nomnigraph by default, use -Oz on nomnigraph related code to reduce code size

enables nomnigraph and reduces codesize

* [Warmup] Allow both offline incremental training and online training

Change plan name on saving side and reading side to support both training types

This diff depends on D8128530 and D8168651.

* Revert D7802642: [Warmup] Allow both offline incremental training and online training

This reverts commit afc213cf9b36cecf75333a788391c4d09f4afccc

@bypass-lint

An infra SEV is better than not reverting this diff.
If you copy this password, see you in SEV Review!
@cause_a_sev_many_files

* Add legacy grad logic to fix div op on old graphs.

Add legacy grad logic to fix div op on old graphs.

* Correctly propagate operator failures

Propagate errors from operators that throw exceptions and return false

* Revert D8374829: [caffe2][nomnigraph][redo] Concat elim for sparseNN

This reverts commit 6dda028c463e54bb5c32188bbbe9202107e188a5

@bypass-lint

An infra SEV is better than not reverting this diff.
If you copy this password, see you in SEV Review!
@cause_a_sev_many_files

* [Caffe2] Added extra_info to core.DeviceOption(), enforced extra_info to be inherited in scope.DeviceScope

extra_info is a newly defined field in DeviceOption proto. This diff added extra_info to the core.DeviceOption().  And, In scope.DeviceScope(), this diff enforce the new scope to inherit the extra_info from old scope.

* [opt] hgdirsync wasn't enabled, merge diverged code

Here's the damage (P59732616): basically, xplat was left behind but had
the change from assert to CAFFE_ENFORCE

* OMP parallelism over RoIs for RoIAlign op

Simpler to parallelize over RoIs. Shouldn't affect other uses as it relies on
the number of OMP threads set during startup.

PR: https://github.com/pytorch/pytorch/pull/8562

* Use int64_t for shape in FillOps

to avoid overflow of int32

* Implement Rotated RoIAlign op

Based on Rotated RPNs as explained in https://arxiv.org/abs/1703.01086.
The idea is simple - orientation/angle is added as an RPN
anchor parameter and then the angle is further regressed similar to bbox
coords. There are some additional changes related to NMS and IoU, but besides
that it's a direct extension to Faster-RCNN. Further details in https://fb.quip.com/sZHlA1iMfWPZ.

RoIs are represented in [center_x, center_y, width, height, angle] format.
`angle` repre

* Rotated RoIAlign op CUDA forward implementation

CUDA forward impl for D8415490

* RoIAlignRotated op CUDA backward pass implementation

TSIA

* All remaining fixes to eliminate process_github.sh

Most of this diff has already been reviewed separately, except for the parts relating to _thnn/utils.py and _utils._internal.py

remove skipIf(True, 'Fbcode') line from process_github.sh

replace sed of cpp file with #ifdef to control cudnnDestroy use

undo sync-time deletion of .gitattributes, remove process_github.sh

switch to using _utils._internal rather than try-import-except

This diff also fixes the open-source bug where rebuilds have

* Back out "Revert D7802642: [Warmup] Allow both offline incremental training and online training"

Original commit changeset: 7707d2efe60e. The original diff was backed out because the online trainer package was backed out. This code would only work with the new online trainer package.

* [easy] improve error log in adagrad op

as title

* re-allow use of thnn_h_path

This fixes cffi usage in OSS

* [4/4] [tum] parallelizing layerNorm for GPU full sync

as title

* add compile=False to pytorch tests, remove hack with pyc

* Add shape and type inference for RowWiseArgMax operator

See title

* Revert D8515341: Back out "Revert D7802642: [Warmup] Allow both offline incremental training and online training"

This reverts commit 78167eeef0af16b60f72c82f9dcdda9b41b4dcbd

@bypass-lint

An infra SEV is better than not reverting this diff.
If you copy this password, see you in SEV Review!
@cause_a_sev_many_files

* [fix-flaky-test] mock_hive_reader_test flaky, because GlobalCounter collects local counts intervally

# Problem

`MockHiveReader` uses `GlobalCounter` to limit `max_examples`.

GlobalCounter on server node collect local counts from worker nodes every 1 sec.

This 1 sec delay makes it impossible to limit exactly to the `max_examples`, it will definitely exceed `max_examples`.

# Plan

Given,
```
Expected num_examples = max_examples + num_examples/sec (Read Speed) x 1 sec (GlobalCounter Sync Int

* [Caffe2] Fix FCGradient cost inference.  Prevent overflow in cost inference

FCGradient missed a factor 2 in the `num_outputs == 3` case.  Overflow was occurring with flop calculation for FC.  Changed types to `uint64_t` to prevent future problems.

* Fix binary ops with empty inputs

Fix binary ops with empty inputs

* Support the filling of input blob with provided data

as title for Biz Integrity case

* Back out "Revert D8515341: Back out "Revert D7802642: [Warmup] Allow both offline incremental training and online training""

Original commit changeset: 30c55dd38816 Original diff is reverted due to introducing bad integration test. Fixed the integration test.

* [c2][easy] improve pack ops error loggings

as desc.

* Add ShapeTypeInference for LpNorm operator

As desc

* Shard test_nn to reduce runtime for each test target

Closes https://github.com/pytorch/pytorch/pull/8793

The current test_nn would time out and be disabled in GreenWarden, and we need to have an option to split it up in order to pass the stress test. Right now GreenWarden roughly allows running 100 test cases in test_nn before timing out, and here we have an option to divide test_nn into 30 shards (with ~40 tests in each shard) to allow for some test suite growth in the future.

* Change default caffe2_streams_per_gpu to 1

* Remove IN_SANDCASTLE from common.py and test_nn.py

We prefer to disable the failing tests through Sandcastle UI instead.

* Add a new class for an updated prof_dag.proto

This diff contains:
- An updated prof_dag.proto that contains blob profiles.
- A class to deserialize this information (serialization is in a follow up diff)
- Update to separate profiling information from NeuralNet (and use it as part of the class above).
- Unit tests

* Lambdarank for SparseNN

This diff adds a lambda_rank_layer for SparseNN.
 changes include
1) Adds support for multi sessions in c2 op
2) Adds support for two different loss functions in c2 op
3) Unit tests for op

* Revert D8586950: Back out "Revert D8515341: Back out "Revert D7802642: [Warmup] Allow both offline incremental training and online training""

This reverts commit 012220ed63eccc35659a57b31d16a3625da6317b

@bypass-lint

An infra SEV is better than not reverting this diff.
If you copy this password, see you in SEV Review!
@cause_a_sev_many_files

* [easy] A few fixups to multithread predictor benchmark

(1) support perf on T6 server
(2) remove dead code

* fix a bug about the map size

as title

* Fix reduce sum on in-place case.

Fix reduce sum on in-place case.

* [Warmup] Reland reverted diff Allow both offline incremental training and online training

Closes https://github.com/pytorch/pytorch/pull/8827

fix net transform integration test. Allow offline and online trainer to coexist D7802642.

* Add StoreHandlerNotAvailableException

Add an exception for a store that is not available or has been
deleted.

* Use exception handling for fault tolerance, missing KV store

Remove status blobs to communication ops so that exceptions propagate on
failure.

* [C2/D2][2/n]: Nonnegative-Constrained Optimization -- bounded grad proj

for simple bounded constrained optimization, incl non-negative box constraints.

* [GanH]: Adaptive Weighting with More Estimations

With the implemented positivity optimization, we now learn adaptive weights with different
parameterizations.

This improves parameter estimation and training stability.

* Revert some changes for landing

* Remove AutoNoGIL in StorageSharing

* Temporarily disable net_tests

* Revert "[Caffe2] Force tensor inference checks to be triggered during testing"

This reverts commit 67ef05c22b2f71b4a489695384932f968384a2a4.

* Revert "Fix reduce sum on in-place case."

This reverts commit 6cb8a8e1b3db7b6d20941b0053e3f3836068eb64.

* Revert "Revert "Fix reduce sum on in-place case.""

This reverts commit 130a257c0893dc09f4bd6e6a45d112261807fd2c.
2018-06-26 14:55:48 -07:00
Vadim Velikodniy
6e28d4d364 Add pos_weight argument to nn.BCEWithLogitsLoss (#5660) (#6856)
* Add pos_weight argument to nn.BCEWithLogitsLoss and F.binary_cross_entropy_with_logits (#5660)
- Add an option to control precision/recall in imbalanced datasets
- Add tests (but new_criterion_tests)

* Move pos_weight to the end of args list in the documentation.

`pos_weight` was moved to the end because it is the last argument in both
`nn.BCEWithLogitsLoss` and `binary_cross_entropy_with_logits`
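A short usage sketch (not from this PR; the class counts behind the weights are hypothetical):

```python
import torch
import torch.nn as nn

# Three-label problem where positives are rare: weight each label's positive
# term by roughly num_negatives / num_positives.
pos_weight = torch.tensor([3.0, 1.0, 10.0])
criterion = nn.BCEWithLogitsLoss(pos_weight=pos_weight)

logits = torch.randn(8, 3, requires_grad=True)
targets = torch.empty(8, 3).random_(2)          # 0/1 labels
loss = criterion(logits, targets)
loss.backward()
```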
2018-06-26 12:31:07 -04:00
li-roy
85f4d2b55a throw error when grid_sample is passed unsupported mode (#8884) 2018-06-25 22:37:41 -04:00
Thomas Viehmann
fc22bf3e82 Spectral norm improvements (#8590)
* Spectral norm improvements
- Don't do iterations on weight in eval mode
  To facilitate this, register weight as buffer in order to be able
  to use a module with spectral norm in eval mode immediately
  after loading a state dict (#8208)
- Use weight instead of weight_orig as weight when removing
  spectral norm
- Add dim parameter in case the normalization should occur w.r.t.
  a dimension other than 0 (#7865)

* add and update spectral norm tests

* More spectral norm tests

Thank you, Simon, for the suggestions.
2018-06-24 17:15:13 -04:00
Ailing
ddda7cfea5 allow output_size to contain None in adaptive pooling methods (#8596)
* allow output_size to contain None in adaptive pooling methods

* fix lint

* address comments
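A one-liner sketch (not from this PR) of what `None` means in the output size:

```python
import torch
import torch.nn as nn

x = torch.randn(1, 64, 10, 9)
# None keeps that input dimension unchanged; only the width is adapted here.
pool = nn.AdaptiveAvgPool2d((None, 7))
print(pool(x).shape)   # torch.Size([1, 64, 10, 7])
```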
2018-06-22 13:29:15 -04:00
Will Feng
ac068fdabe Use env var to pass sharding options to test_nn.py (#8727)
Buck doesn't support passing arguments to Python unit tests, and we have to use environment variables to pass the sharding options instead. Also, buck test doesn't go through the __name__ == '__main__' code path and we need to move the env var checking logic to top-level.

* Use env var to pass sharding options to test_nn.py

* Move env var checking to top-level

* fix lint
2018-06-21 08:30:28 -04:00
Will Feng
d6c873a393
Shard test_nn to reduce runtime for each test target (#8678)
* Shard test_nn to reduce runtime for each test target

* Use load_tests for selecting tests to enable

* fix lint

* Use arg parser from common.py
2018-06-20 15:01:28 -04:00
Richard Zou
2039c7a38f Fix test_rnn_args_check (#8606)
test_rnn_args_check generates mismatched input_shape and hidden_shape
args. To do this, it changes a dimension of input_shape or hidden_shape
to have an incorrect size.

Before, the test was changing the size of a dimension to -1. However,
this is flawed because an input of size e.g. (6, -1, 2) is invalid.
This PR fixes it so that the test changes sizes of dimensions to
`bad_size = 7`. As long as none of the other sizes (input_size,
hidden_size, num_layers, batch_size) divide this, we don't have to worry
about that dimension being accidentally broadcasted into working.
2018-06-18 14:08:57 -04:00
Wei Yang
ae55865a3b Migrated hardshrink() to ATen and deprecated nn.Hardshrink() (#8117)
* 1. added hardshrink() to ATen (CPU + GPU); 2. removed nn.Hardshrink(); 3. reusing previous tests for nn.Hardshrink() and included CUDA tests at test_nn; 4. default parameter lambda=0.5 is not working yet

* optimized memory read/write

* 1. pass in lambd as scalar for CPU/CUDA_apply*; 2. removed tests for hardshrink at test_legacy_nn

* fixes test_utils

* 1. replace zeros_like with empty_like; 2. use scalar_cast in cuda

* 1. printing lambd value; 2. default lambd=0.5 is still failing

* getting around the Scalar bug by removing the default value of lambd from native_functions.yaml and declaring it in nn/functional.py

* cleaned up debug printf
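For reference, a tiny sketch (not from this PR) of the functional form being migrated:

```python
import torch
import torch.nn.functional as F

x = torch.tensor([-1.0, -0.3, 0.0, 0.2, 0.8])
# hardshrink zeroes values with |x| <= lambd and keeps the rest unchanged.
print(F.hardshrink(x, lambd=0.5))   # tensor([-1.0000, 0.0000, 0.0000, 0.0000, 0.8000])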
2018-06-14 16:42:20 -04:00
Tongzhou Wang
a77b391de7 [SpectralNorm] don't register original weight as buffer (#8170)
* don't register original weight as buffer; fixes for buffers that require grad

* add test
2018-06-12 14:42:05 -04:00
Wei Yang
4c2a1a1a64
Added backward function for kl_div target (#7839)
* added backward fn for target

* added module test for kl_div target, and assuming targets are probabilities
2018-06-07 17:17:18 -07:00
li-roy
73966f65ae Stop BCELoss from returning negative results (#8147)
* Stop BCELoss from returning negative results

* check explicitly for 0 before taking log

* add tests

* fix lint

* address comments
2018-06-07 20:06:04 -04:00
Tongzhou Wang
36b8cc5483 skip CUDA memory leak check on Windows altogether (#8213) 2018-06-06 17:29:53 -04:00
Tongzhou Wang
35f08b930d
Allow parallel_apply to take in list[Tensor] (#8047) 2018-06-06 13:49:52 -04:00
Tongzhou Wang
c0a419e6ba
Add non_blocking to Tensor/Module.to (#7312)
* Add non_blocking to Tensor/Module.to

* flake8

* Add argparse tests

* cpp parse

* Use C++ parser

* use a commong parse function with Tensor.to

* fix test_jit

* use THPObjectPtr

* increase refcount for None, True, and False

* address comments

* address comments
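A hedged usage sketch (not from this PR): `non_blocking` only pays off for pinned host memory copied to the GPU.

```python
import torch
import torch.nn as nn

model = nn.Linear(1024, 10)
x = torch.randn(64, 1024)

if torch.cuda.is_available():
    x = x.pin_memory()                           # non_blocking needs pinned host memory to help
    x = x.to('cuda', non_blocking=True)          # asynchronous host-to-device copy
    model = model.to('cuda', non_blocking=True)  # Module.to accepts the same flag
```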
2018-06-04 18:46:52 -04:00
Marcin Elantkowski
c2046c1e5e Implement adaptive softmax (#5287)
* Implement adaptive softmax

* fix test for python 2

* add return_logprob flag

* add a test for cross-entropy path

* address review comments

* Fix docs

* pytorch 0.4 fixes

* address review comments

* don't use no_grad when computing log-probs

* add predict method

* add test for predict

* change methods order

* get rid of hardcoded int values

* Add an optional bias term to the head of AdaptiveSoftmax
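A minimal sketch of the added module as it exists in current releases (sizes and cutoffs below are made up):

```python
import torch
import torch.nn as nn

# Frequent classes live in the head; rarer classes fall into tail clusters.
asm = nn.AdaptiveLogSoftmaxWithLoss(in_features=64, n_classes=10000,
                                    cutoffs=[100, 1000, 5000])
hidden = torch.randn(32, 64)
target = torch.randint(0, 10000, (32,))

out = asm(hidden, target)
print(out.output.shape, out.loss)     # per-example target log-probs, mean NLL
pred = asm.predict(hidden)            # most likely class per example
```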
2018-06-04 12:12:03 -04:00
Tongzhou Wang
eb2f21f1e4
Skip CUDA memory leak test on BN tests on windows (#8043) 2018-06-01 18:09:14 -04:00
Tongzhou Wang
c6a923f486
Support modules that output scalar in Gather (and data parallel) (#7973)
* Support modules that output scalar in Gather (and data parallel)

* Improve warning msg
2018-06-01 16:20:39 -04:00
Tongzhou Wang
bf29abd908
propagate nan in some activations (#8033)
* propagate nan in some activations

* fix py2 not having math.nan

* flake8
2018-06-01 14:08:01 -04:00
Tongzhou Wang
85ee94b7be
Add memory leak check in CUDA tests (#7270)
* Add memory leak check in CUDA tests

* Tracking multi-GPU too

* fix run_test.py not running __name__ == '__main__' content; add test for make_cuda_memory_checked_test

* add a comment

* skip if cuda

* 1. Change the wrapper to a method in common.py:TestCase
2. Refactor common constants/method that initialize CUDA context into common_cuda.py
3. Update some test files to use TEST_CUDA and TEST_MULTIGPU

* Fix MaxUnpool3d forward memory leak

* Fix MultiLabelMarginCriterion forward memory leak

* Fix MultiMarginLoss backward memory leak

* default doCUDAMemoryCheck to False

* make the wrapper skip-able

* use TEST_MULTIGPU

* add align_corners=True/False tests for Upsample; fix TEST_CUDNN

* finalize interface

* VolumetricMaxUnpooling_updateOutput

* fix test_nccl

* rename THC caching allocator methods to be clearer

* make the wrapped function a method

* address comments; revert changes to aten/src/THC/THCCachingAllocator.cpp

* fix renamed var
2018-05-31 15:09:54 -04:00
Tongzhou Wang
f9926e4ce5 Fix EmbeddingBag max_norm option (#7959)
* fix EmbeddingBag max_norm option

* flake8

* add warning to the embedding bag arg change
2018-05-31 09:42:56 -04:00
Richard Zou
0656ef483d remove sort requirement from pad-sequence (#7928)
* pad-sequence no longer requires sorting entries

pad-sequence can get the max_len from the list of sequences. Entries only need to be sorted if the output will be used for pack_padded_sequence, which can throw the error itself.

* remove sort requirement from pad-sequence

Picks up from #5974.

Removes the requirement that input sequences to pad_sequence have to be
sorted. Addressed the comments in the PR:
- Updated docstring for pad_sequence
- Remove sort requirement in pad_sequence test
- Test unsorted and sorted sequences in pad_sequence test
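A small sketch (not from this PR) of padding an unsorted batch:

```python
import torch
from torch.nn.utils.rnn import pad_sequence

# Variable-length, unsorted sequences; pad_sequence no longer requires sorting.
seqs = [torch.randn(3, 8), torch.randn(5, 8), torch.randn(2, 8)]
padded = pad_sequence(seqs)                                         # [max_len, batch, 8] = [5, 3, 8]
padded_bf = pad_sequence(seqs, batch_first=True, padding_value=0.0) # [3, 5, 8]
```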
2018-05-30 16:36:55 -04:00
Tongzhou Wang
8858b1d519 Fix THCUNN SpatialDepthwiseConvolution assuming contiguity (#7952) 2018-05-30 12:55:02 -04:00
Vedaanta Agarwalla
215fe057ea No Default argument to max_unpool functions (Fixes #7327) (#7388)
* Fix for Issue #7327

* Added testcase for max_unpool
2018-05-29 15:02:23 -04:00
vfdev
6a604f16cc Update test_nn.py (#7787) 2018-05-23 12:28:13 +02:00
Ruotian(RT) Luo
2222fc7666 Add support for accepting Tensor as input in clip_grad_* functions. (#7769) 2018-05-23 12:12:03 +02:00
gchanan
7abdc303c6
Don't allow requires_grad to be set on integer Tensor constructors in… (#7185)
* Don't allow requires_grad to be set on integer Tensor constructors in tensor_new.

* Fix autograd test.

* Fix test_distributions.

* Fix test_jit.

* Fix NN tests.
2018-05-18 19:45:10 +02:00
Thomas Viehmann
f7bc7007d4 return nan in max_pool/adaptive_max_pool for nan args (#7645) (#7670) 2018-05-18 16:39:41 +02:00
ngimel
63ae163b24 put dropout states on the input device (#7515)
* put dropout states on the input device

* add assert to aten, add test, fix lint

* only assert device if states are defined
2018-05-13 16:25:37 -04:00
Karan Dwivedi
dc0faab18d Add zeros_ and ones_ init + tests (#7488)
* Add zeros_ and ones_ init + tests

* Dedup tests

* Remove all occurences of as_variable
2018-05-11 11:07:11 -04:00
danielsimig
b6adf6871c EmbeddingBag to handle empty bags in all modes (#7389) 2018-05-10 15:46:57 -04:00
Yan Facai (颜发才)
f1e38725bf add to method for PackedSequence (#7319)
* ENH: add to method for PackedSequence

* ENH: return self if possible

* TST: remove extra data

* DOC: add more explanation

* TST: remove extra data

* DOC: minor fix
2018-05-07 14:39:03 -04:00
Martin Tutek
c96f2624a2 Speedup sparse init (#6899)
* Sparse initialization speedup

* +empty line

* simplify indexing

* Can't reproduce locally...

* Can't reproduce locally...+

* Can't reproduce locally...+

* Fix test, cleanup
2018-05-03 14:29:12 +01:00
Masaki Kozuki
ba046331e8 add spectral normalization [pytorch] (#6929)
* initial commit for spectral norm

* fix comment

* edit rst

* fix doc

* remove redundant empty line

* fix nit mistakes in doc

* replace l2normalize with F.normalize

* fix chained `by`

* fix docs

fix typos
add comments related to power iteration and epsilon
update link to the paper
make some comments specific

* fix typo
2018-05-01 17:00:30 +08:00
Ethan Steinberg
ee00a8049a Add max pooling support to EmbeddingBag (#5725)
* Add max mode support to EmbeddingBag

* Lint fix

* Fix compilation issue on other platforms

* Rebase + don't waste memory when not in max mode

* Oops, missed a spot

* Fix whitespace from merge

* less precision

* Lower precision to avoid spurious failures

* Minor typo

* Switch to size()
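A short usage sketch (not from this PR) of the new mode:

```python
import torch
import torch.nn as nn

bag = nn.EmbeddingBag(num_embeddings=10, embedding_dim=4, mode='max')
input = torch.tensor([1, 2, 4, 5, 4, 3, 2, 9])
offsets = torch.tensor([0, 4])                 # two bags: input[0:4] and input[4:8]
print(bag(input, offsets).shape)               # torch.Size([2, 4]); elementwise max per bag
```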
2018-04-29 16:48:11 -04:00
gchanan
a6bfa16c17
torch.arange: add numpy-style type inference. (#7016)
* torch.arange: add numpy-style type inference.

This is a backwards-compatibility breaking change.

* Fix flake8.

* Use at::optional.

* Remove unneeded header files.

* Use reference wrapper.

* Update arange for test.

* Address review comments.
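A quick illustration of the post-change behavior (a sketch, matching current releases):

```python
import torch

# arange now infers dtype from its arguments, numpy-style.
print(torch.arange(5).dtype)          # torch.int64 -- integer args give an integer tensor
print(torch.arange(5.0).dtype)        # torch.float32 (the default floating dtype)
print(torch.arange(0, 1, 0.1).dtype)  # torch.float32
```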
2018-04-27 15:11:45 -04:00
Jerry Ma
76d3c30783 Enable resetting of batchnorm running moments and cumulative ("simple") moving average (#6445) 2018-04-26 19:27:24 -07:00
Tongzhou Wang
6a41e2dc47 Add BC mechanism to Module.load_state_dict (#6639)
* Add version counter to module, change load_state_dict to use load_local_state_dict which does class specific loading

* Clarifies version number in docs

* fix jit tests

* fix state_dict tests

* typo

* fix ddp

* exclude version numbers from state dict entries

* Fix jit test and empty modules

* address comments

* test for "."

* revert the private version change in state_dict

* make IN case a hard error

* fix not reporting error when unexpected submodule

* address comments

* disallow empty string in name and remove trailing dot
2018-04-19 15:36:30 -04:00
Tongzhou Wang
1c01eabd3c
Codemod to update our codebase to 0.4 standard (#6641)
* Codemod to update our codebase to 0.4 standard

* Update some of the test scripts

* remove Variable in test_clip_grad_value

* fix _symbolic_override_wrapper_maker
2018-04-17 22:06:54 -04:00
Tony Beltramelli
7fcaf3b49e Update torch.nn.init and torch.nn.utils.clip_grad (#6173)
Introducing two updates.

1. Add param to He initialization scheme in torch.nn.init
Problem solved:
The function calculate_gain can take an argument to specify the type of non-linearity used. However, it wasn't possible to pass this argument directly to the He / Kaiming weight initialization function.

2. Add util to clip gradient value in torch.nn.utils.clip_grad
Problem solved:
DL libraries typically provide users with easy access to functions for clipping the gradients both using the norm and a fixed value. However, the utils clip_grad.py only had a function to clip the gradient norm.
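A sketch of both additions using today's spellings (the trailing-underscore init names were adopted slightly later; names and values below are illustrative):

```python
import torch
import torch.nn as nn
from torch.nn.utils import clip_grad_value_

# 1. Pass the nonlinearity (and its parameter) straight to the He/Kaiming initializer.
w = torch.empty(128, 64)
nn.init.kaiming_normal_(w, a=0.1, nonlinearity='leaky_relu')

# 2. Clip gradients by value rather than by norm.
model = nn.Linear(64, 10)
model(torch.randn(4, 64)).sum().backward()
clip_grad_value_(model.parameters(), clip_value=0.5)
```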

* add param to He initialization scheme in torch.nn.init

* add util to clip gradient value in torch/nn/utils/clip_grad.py

* update doc in torch.nn.utils.clip_grad

* update and add test for torch.nn.utils.clip_grad

* update function signature in torch.nn.utils.clip_grad to match suffix_ convention

* ensure backward compatibility in torch.nn.utils.clip_grad

* remove DeprecationWarning in torch.nn.utils.clip_grad

* extend test and implementation of torch.nn.utils.clip_grad

* update test and implementation torch.nn.utils.clip_grad
2018-04-17 11:32:32 -04:00
Tongzhou Wang
0e93a2c334
Add Module.to (#6629) 2018-04-16 17:46:52 -04:00
gchanan
749d51414a
Separate cuda-ness from dtype. (#6470)
* Separate cuda-ness from dtype.

There are no longer torch.cuda.int64, etc; only torch.int64 that correspond to at::ScalarType.
At the python arg parser level, the corresponding ATen type is selected from the combination of (ScalarType, Layout, Device).

There is also currently unused code in here for support ScalarType in native_functions; this will be used for specifying aggregate types
on reduction functions.
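A quick sketch of the resulting construction style (not from this PR): dtype, layout, and device are specified independently.

```python
import torch

# There is no torch.cuda.int64 any more; the same dtype object is combined with a device.
a = torch.zeros(3, dtype=torch.int64)                     # CPU long tensor
if torch.cuda.is_available():
    b = torch.zeros(3, dtype=torch.int64, device='cuda')  # CUDA long tensor, same dtype
    print(a.dtype == b.dtype, a.is_cuda, b.is_cuda)       # True False True
```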

* Fix test_autograd.

* Add defaults to randint_like.

* Track is_cuda in py tensor types.

* Fix test_sparse.

* Fix multiprocessing.

* Fix rnn.

* Fix test_nn.

* Fix flake8.
2018-04-12 14:05:44 -04:00
Tongzhou Wang
59bda9a8c4
Fix reflection padding boundary checks (#6438)
* Fix Reflection padding boundary checks

* Improve padding docs

* fix lint
2018-04-10 10:37:01 -04:00
Subhash Mullapudi
79c3ebc040 adds correct precision to test_noncontig_conv_grad (#6440) 2018-04-10 12:18:01 +02:00
Tongzhou Wang
4d15442ebc
Add total_length option to pad_packed_sequence (#6327)
* add total_length to pad_packed_sequence; add example on how to use pack->rnn->unpack with DP

* address comments

* fix typo
2018-04-08 20:25:48 -04:00
Tongzhou Wang
dfcd90783c fix sparse embedding backward when input contains only padding_idx (#6211) 2018-04-03 15:53:43 -04:00
Kaiyu Shi
605307f8f3 Add support for printing extra information in Module and refactor redundant codes (#5936)
This PR enables users to print extra information of their subclassed nn.Module.
Now I simply insert the user-defined string at the end of the module name, which should be discussed in this PR.

Before this PR, users had to redefine __repr__ and copy & paste the source code from Module.

* Add support for extra information on Module

* Rewrite the repr method of Module

* Fix flake8

* Change the __repr__ to get_extra_repr in Linear

* Fix extra new-line for empty line

* Add test for __repr__ method

* Fix bug of block string indent

* Add indent for multi-line repr test.

* Address review comments

* Update tutorial for creating nn.Module

* Fix flake8, add extra_repr of bilinear

* Refactor DropoutNd

* Change to extra_repr in some Modules

* Fix flake8

* Refactor padding modules

* Refactor pooling module

* Fix typo

* Change to extra_repr

* Fix bug for GroupNorm

* Fix bug for LayerNorm
2018-04-02 13:52:33 -04:00
mseitzer
92a0f7835e Support returning dictionaries in DataParallel (#6113) 2018-04-02 15:16:44 +02:00
Tongzhou Wang
48ad4546d2 Move LayerNorm to ATen; remove tracking_running_stats functionality (#5983)
* move LN to aten; remove tracking_stats functionaility

* Address comments about error message and respect cudnn flag for LayerNorm and GroupNorm
2018-03-30 09:44:11 -07:00
Tongzhou Wang
4f05cb710e Add underscore to nn.init.* and deprecate the original ones (#6093)
Fixes #5946.

* add underscore to nn.init.* and deprecate the original ones

* add a test for deprecation
2018-03-29 13:26:12 -04:00
Richard Zou
371e14b807 NLLLoss: error message for mismatched input/target batch sizes (#6072)
Fixes #5554

Adds an error message for when NLLLoss is passed an input and target
whose batch sizes don't match. Ideally this check should live in ATen
but since there is NLLLoss logic in python the check is there right now.
2018-03-28 14:21:38 -07:00
Marcin Elantkowski
5583c12888 Fix bias size assert in Bilinear (#5992) 2018-03-27 23:05:04 +02:00
Tongzhou Wang
5d77709485 Linearly interpolating upsampling fix (#5927)
* Changes in bilinear upsampling

* Add align_corners option to upsampling module & functional when using linearly interpolating modes
When align_corners=True, it uses the original upsampling scheme, which gives visually better results,
but doesn't properly align input and output pixels, and thus causes the output to vary depending on the input.
This PR adds this align_corners option, and changes the default behavior to align_corners=False, with a
proper warning if this option is not specified when using nn.Upsample or nn.functional.upsample, to make
users aware of this new change.
Adds tests in test_nn.py for spatial invariance when align_corners=False, and usual module tests for
align_corners=False.

* remove redundant checks and unnecessary variables; fix the cast

* fix negative indices
2018-03-24 12:21:13 -04:00
Tongzhou Wang
08891b0a4e Group Normalization (#5968)
* Group Normalization

* move to ATen
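A one-line usage sketch (not from this PR):

```python
import torch
import torch.nn as nn

x = torch.randn(8, 32, 14, 14)
gn = nn.GroupNorm(num_groups=4, num_channels=32)   # normalizes over groups of 8 channels each
print(gn(x).shape)                                 # torch.Size([8, 32, 14, 14])
```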
2018-03-24 12:16:18 -04:00
Vedanuj Goswami
f3e16cc737 Expose gradients w.r.t. input & weight for conv1d, conv2d, conv3d in Python (#5408)
This PR addresses issue #5024

* Expose Conv2dBackward in python

* Separate interface for exposing gradients of operators

* Revert old changes

* Add tests

* Add conv1d gradients. Refactor tests for grad convolutions

* Refactor names and change examples

* Remove Variable from tests for conv backward
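A sketch (not from this PR; shapes are made up) of the exposed `torch.nn.grad` helpers:

```python
import torch
import torch.nn.functional as F
from torch.nn import grad as nn_grad

x = torch.randn(1, 3, 8, 8)
w = torch.randn(6, 3, 3, 3)
out = F.conv2d(x, w, padding=1)
grad_out = torch.randn_like(out)

# Explicit convolution gradients w.r.t. input and weight.
grad_input = nn_grad.conv2d_input(x.shape, w, grad_out, padding=1)
grad_weight = nn_grad.conv2d_weight(x, w.shape, grad_out, padding=1)
print(grad_input.shape, grad_weight.shape)   # matches x.shape and w.shape
```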
2018-03-23 17:49:32 -04:00
Ailing
d707dae013 Add half test in test_nn for auto generated tests. (#5362)
* add half and double test in NewTestModule

* add half/double/float tests in NewCriterionTest

* resolve merge conflict with master
2018-03-21 16:55:06 -04:00