Summary:
When an exception occurs in one of the modules passed to `parallel_apply()`, it is caught and re-raised in the main thread. This preserves the original exception type and message, but the traceback then points at the position where it is re-raised rather than at the original point of failure.
This PR saves the exception information required to generate the traceback, and includes the original traceback in the message of the exception raised in the main thread.
Before:
```
...
File ".../torch/nn/parallel/data_parallel.py", line 153, in parallel_apply
return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
File ".../torch/nn/parallel/parallel_apply.py", line 84, in parallel_apply
raise output
RuntimeError: expected type torch.FloatTensor but got torch.cuda.FloatTensor
```
After:
```
...
File ".../torch/nn/parallel/data_parallel.py", line 153, in parallel_apply
return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
File ".../torch/nn/parallel/parallel_apply.py", line 88, in parallel_apply
''.join(traceback.format_exception(*exc_info)))
RuntimeError: Caught exception in replica 0. Original traceback and message:
Traceback (most recent call last):
...
File "../models/foo.py", line 319, in bar
baz = asdf / ghij[:, np.newaxis]
RuntimeError: expected type torch.FloatTensor but got torch.cuda.FloatTensor
```
I took care to raise an exception of the original type (in case the main code checks for that), but replaced the message. It helped me find a bug that did not occur outside `data_parallel()`.
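For reference, a minimal sketch of the technique (simplified from the actual `parallel_apply` code; names like `run_parallel` are illustrative only):
```
import sys
import threading
import traceback

def _worker(results, i, fn, arg):
    try:
        results[i] = fn(arg)
    except Exception:
        results[i] = sys.exc_info()   # keep (type, value, traceback) for the main thread

def run_parallel(fns, args):
    results = [None] * len(fns)
    threads = [threading.Thread(target=_worker, args=(results, i, fn, arg))
               for i, (fn, arg) in enumerate(zip(fns, args))]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    outputs = []
    for i, out in enumerate(results):
        if isinstance(out, tuple) and len(out) == 3 and isinstance(out[1], BaseException):
            msg = "Caught exception in replica {}. Original traceback and message:\n{}".format(
                i, "".join(traceback.format_exception(*out)))
            # assumes the original exception type accepts a single message argument
            raise out[0](msg)
        outputs.append(out)
    return outputs
```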
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18055
Differential Revision: D16444972
Pulled By: zhangguanheng66
fbshipit-source-id: ec436c9d4677fad18106a8046cfa835a20a101ce
Summary:
Rehash of https://github.com/pytorch/pytorch/issues/22322 .
Given that python 2.7 will be EOL'd on Jan 1, 2020 and we have models depending on python3.5+, we'd like to update the ROCm CI across the board to python3.6.
This PR adds the skip tests and some semantic changes for PyTorch.
Compared to #22322, added a pattern-match skip of the python find step in the PyTorch build for anything but the ROCm CI.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23088
Differential Revision: D16448261
Pulled By: bddppq
fbshipit-source-id: 69ece1a213418d9abf1444c496dce1c190ee07c8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23003
Adds torch.quantization.fuse_module and the torch.nn._intrinsic convRelu and LinearRelu modules.
The fusion function combines specific module sequences: (conv, bn) and (conv, bn, relu).
In all cases, the modules are replaced in place: the first module is replaced with the _intrinsic fused module and the remaining modules are replaced by nn.Identity.
Supports both training and eval. For training, the modules are "fused" with a sequential container, to allow further module swaps for quantization-aware training.
Also add: torch.nn._intrinsic for convRelu and LinearRelu.
TODO: Add tests for _intrinsic modules.
The Conv/BN fusion code is based on DsKhudia's implementation.
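A hypothetical usage sketch (the exact entry point name and signature here — `fuse_modules`, the list-of-names argument, `inplace` — are assumptions, not taken from this diff):
```
import torch
import torch.nn as nn

class M(nn.Module):
    def __init__(self):
        super(M, self).__init__()
        self.conv = nn.Conv2d(3, 8, 3)
        self.bn = nn.BatchNorm2d(8)
        self.relu = nn.ReLU()

    def forward(self, x):
        return self.relu(self.bn(self.conv(x)))

m = M().eval()
# After fusion, m.conv holds the fused module and m.bn / m.relu become nn.Identity.
torch.quantization.fuse_modules(m, [['conv', 'bn', 'relu']], inplace=True)
```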
Differential Revision: D16199720
fbshipit-source-id: 95fb9ffe72b361d280313b2ec57de2acd4f9dda2
Summary:
Given that python 2.7 will be EOL'd on Jan 1, 2020 and we have models depending on python3.5+, we'd like to update the ROCm CI across the board to python3.6.
This PR adds the skip tests and some semantic changes for PyTorch.
Open tasks/questions:
* RoiAlignTest.CheckCPUGPUEqual fails in the Caffe2 unit tests. Is this expected / can it be skipped?
* for testing, I've used update-alternatives on CentOS/Ubuntu to select python == python 3.6. Is this the preferred way?
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22322
Differential Revision: D16199862
Pulled By: ezyang
fbshipit-source-id: 46ca6029a232f7d23f3fdb5efc33ae39a379fca8
Summary:
Fixes https://github.com/pytorch/pytorch/issues/21935 by using the integer floor division that was introduced for convolution shapes in https://github.com/pytorch/pytorch/issues/9640. Without this fix, the pooling operators can produce a 1-element output in cases they shouldn't.
Disclaimer: I couldn't properly test it locally (it's not picking up the modified version for some reason). I'm marking this WIP until I've checked what the CI tools say...
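As a rough illustration of the shape rule involved (a sketch of my understanding, not the ATen source): floor division has to round toward negative infinity so that a negative numerator, i.e. an input smaller than the kernel, yields output size 0 rather than 1.
```
def pooling_output_size(input_size, kernel, stride, padding=0, dilation=1):
    numerator = input_size + 2 * padding - dilation * (kernel - 1) - 1
    return numerator // stride + 1   # Python's // floors toward -inf

print(pooling_output_size(3, kernel=4, stride=2))  # 0
# Truncating division (int(-1 / 2) == 0 in C-style rounding) would wrongly give 1.
```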
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22304
Differential Revision: D16181955
Pulled By: ezyang
fbshipit-source-id: a2405372753572548b40616d1206848b527c8121
Summary:
Motivation:
The forward method of MultiheadAttention has a kwarg key_padding_mask. This mask is of shape (N,S), where N is the batch size and S is the source sequence length. It is applied prior to the attention softmax, where True values in the mask are set to float('-inf'). This lets you mask position j from attention for every position i in the input sequence; it is typically used to mask padded inputs, so that for a sample in a batch no encoder output depends on the padding. Currently Transformer, TransformerEncoder, and TransformerEncoderLayer do not have this kwarg and only take (S,S), (T,T), and (S,T) masks, which are applied equally across the batch to the source input, target output, and target-source memory respectively. Those masks can't be used for padding; they are instead used for things like subsequent masking in language modeling, by masking the attention of position i to position j.
This diff exposes the key_padding_mask to Transformer, TransformerEncoder, and TransformerEncoderLayer forward methods which is ultimately passed to MultiheadAttention forward.
Open question: should we also allow a key_padding_mask for the decoder layer? Since padding usually sits at the end of each sentence in a batch and sentences decode from left to right, people typically handle padding on decoded outputs by masking those outputs at the loss layer. There might be scenarios where it's needed, though I don't think it would be common, and people can still subclass and override the layers. We could also pass the input key_padding_mask to the memory <> decoder attention layer. I'm not sure that's necessary, though, because the output of position i from each encoder attention layer won't depend on any masked positions in the input (even if position i is itself masked), so there's not really any point in masking position i again.
Adds the key_padding_mask kwarg to Transformer, TransformerEncoder, and TransformerEncoderLayer forward methods.
The standard TransformerEncoderLayer uses a MultiheadAttention layer as self_attn. MultiheadAttention forward method has a key_padding_mask kwarg that allows for masking of values such as padding per sequence in a batch, in contrast to the attn_mask kwarg which is usually of shape (S,S) and applied equally across the batch.
MultiheadAttention calls functional.multi_head_attention_forward, which has the same key_padding_mask kwarg of shape (N,S). Masked (True) values are set to float('-inf').
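A short sketch of the key_padding_mask semantics on MultiheadAttention (the new Transformer/TransformerEncoder kwargs ultimately forward to this; shapes and the boolean dtype here are my assumptions for illustration):
```
import torch
import torch.nn as nn

embed_dim, num_heads, S, N = 16, 4, 5, 2
attn = nn.MultiheadAttention(embed_dim, num_heads)
query = key = value = torch.rand(S, N, embed_dim)      # (S, N, E)

# True marks padded positions; here the last two tokens of sample 0 are padding.
key_padding_mask = torch.zeros(N, S, dtype=torch.bool)
key_padding_mask[0, -2:] = True

out, weights = attn(query, key, value, key_padding_mask=key_padding_mask)
print(weights[0, :, -2:])   # attention onto the padded keys of sample 0 is zero
```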
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22588
Test Plan:
buck test mode/dev caffe2/test:nn -- 'test_transformerencoderlayer \(test_nn\.TestNN\)'
buck test mode/dev caffe2/test:nn -- 'test_Transformer_cell \(test_nn\.TestNN\)'
buck test mode/dev caffe2/test:nn -- 'test_transformer_args_check \(test_nn\.TestNN\)'
Differential Revision: D16112263
Pulled By: lucasgadams
fbshipit-source-id: dc4147dd1f89b55a4c94e8c701f16f0ffdc1d1a2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22023
This diff implements the Linear operation with fp16 weights based on FBGEMM. At a high level, we want to perform the following operation:
Y = X * W + B, with dtypes:
(Y: fp32, X: fp32, W: fp16, B: fp32)
To do that, three steps are needed:
1. Quantize weights from fp32 to fp16, this is done using `PackedGemmMatrixFP16` in the `fbgemm_pack_gemm_matrix_fp16`
2. Conduct matrix multiplication with quantized weights using `cblas_gemm_compute` in `fbgemm_linear_fp16_weight`
3. Add bias to the result from step 2 and return the final Y
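A minimal reference sketch of the intended semantics in plain PyTorch (not the FBGEMM calls themselves):
```
import torch

def linear_fp16_weight_reference(x_fp32, w_fp32, b_fp32):
    w_fp16 = w_fp32.half()              # step 1: quantize weights from fp32 to fp16
    y = x_fp32 @ w_fp16.float().t()     # step 2: matmul with the fp16 weights
    return y + b_fp32                   # step 3: add bias, output stays fp32

x = torch.randn(8, 64)
w = torch.randn(32, 64)
b = torch.randn(32)
print(linear_fp16_weight_reference(x, w, b).dtype)   # torch.float32
```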
Reviewed By: jianyuh
Differential Revision: D15921768
fbshipit-source-id: dc4e5b366f846ce9d58975876940a9b3372b8b8d
Summary:
The changes include:
1. Allow key/value to have a different number of features from the query. This supports the case when key and value have different feature dimensions.
2. Support three separate proj_weights, in addition to a single in_proj_weight. The proj_weight of key and value may have a different dimension from that of the query, so three separate proj_weights are necessary. In case key and value have the same dimension as the query, it is preferred to use a single large proj_weight for performance reasons. However, it should be noted that using a single large weight or three separate weights is a size-dependent decision.
3. Give an option to use static k and v in the multihead_attn operator (see saved_k and saved_v). Those static key/value tensors can now be re-used when training the model.
4. Add more test cases to cover the arguments.
Note: current users should not be affected by the changes.
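A sketch of point 1, assuming the new constructor kwargs are named kdim/vdim (the names are my assumption; the summary only says key/value may have different feature dimensions):
```
import torch
import torch.nn as nn

embed_dim, kdim, vdim, num_heads = 16, 24, 32, 4
attn = nn.MultiheadAttention(embed_dim, num_heads, kdim=kdim, vdim=vdim)

L, S, N = 3, 5, 2
query = torch.rand(L, N, embed_dim)
key = torch.rand(S, N, kdim)     # key features differ from the query features
value = torch.rand(S, N, vdim)   # value features differ as well
out, _ = attn(query, key, value)
print(out.shape)                 # torch.Size([3, 2, 16])
```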
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21288
Differential Revision: D15738808
Pulled By: zhangguanheng66
fbshipit-source-id: 288b995787ad55fba374184b3d15b5c6fe9abb5c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22285
Previously, forward hooks were expected to return None. This PR adds support for overwriting the input and output in `forward_pre_hook` and `forward_hook`, which is used to implement inserting quant/dequant function calls around forward functions.
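A sketch of the resulting behavior (the hook names and the wrapped module are illustrative):
```
import torch
import torch.nn as nn

m = nn.Linear(4, 4)

def pre_hook(module, inp):
    # A non-None return value replaces the forward() inputs (returned as a tuple).
    return (inp[0] * 2,)

def post_hook(module, inp, out):
    # A non-None return value replaces the forward() output.
    return out + 1

m.register_forward_pre_hook(pre_hook)
m.register_forward_hook(post_hook)
y = m(torch.ones(1, 4))
```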
Differential Revision: D16022491
fbshipit-source-id: 02340080745f22c8ea8a2f80c2c08e3a88e37253
Summary:
- PyCQA/flake8-bugbear#53 has been fixed (but not yet closed on their side) and a new version of flake8-bugbear has been released on Mar 28, 2019. Switch CI to use the latest stable version.
- Fix the new B011 errors that flake8-bugbear catches in the current codebase.
---
B011: Do not call `assert False`, since `python -O` removes these calls. Instead, callers should raise `AssertionError()`.
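For example, the rewrite looks like this:
```
# Before (stripped when running under `python -O`):
#     assert False, "unreachable"
# After:
raise AssertionError("unreachable")
```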
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21944
Differential Revision: D15974842
Pulled By: soumith
fbshipit-source-id: de5c2c07015f7f1c50cb3904c651914b8c83bf5c
Summary:
The bug is that when target_length == 0, there is no preceding BLANK state, and the original implementation leads to an out-of-bounds pointer access.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21910
Differential Revision: D15960239
Pulled By: ezyang
fbshipit-source-id: 7bbbecb7bf91842735c14265612c7e5049c4d9b3
Summary:
Previously, any assert failure would leave the updated setting in place, making the test-suite semantics dependent on the order in which the tests are run.
The diff is large only due to the indentation change (might be good to review without whitespace changes).
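The resulting pattern is essentially a try/finally around each test body; a minimal sketch (the particular flag used here is just a stand-in, not necessarily the one this PR touches):
```
import torch

def run_with_cudnn_enabled(test_fn, value):
    prev = torch.backends.cudnn.enabled
    torch.backends.cudnn.enabled = value
    try:
        test_fn()                              # may raise an assertion failure
    finally:
        torch.backends.cudnn.enabled = prev    # restored regardless of the outcome
```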
cc yf225
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22115
Differential Revision: D15960875
Pulled By: soumith
fbshipit-source-id: 9313695277fc2d968786f13371719e03fff18519
Summary:
# Motivation
We allow overriding JIT module serialization with `__getstate__/__setstate__` in order to cover cases where parameters are not serializable. Use cases include MKLDNN integration: a388c78350/torch/utils/mkldnn.py (L18-L26)
and also fbgemm prepacked format integration for quantized tensors.
However, many Eager scripts use the `torch.save(module.state_dict())` form of serialization. There are several ways to make it work:
* make packed_weight itself pickleable (e.g. by binding `__getstate__/__setstate__` on the C++ UDT level)
  * change: we'd need to allow module buffers to be of arbitrary, non-Tensor types
  * pro: no change to state_dict behavior
  * cons: might not be directly inspectable by a user calling .state_dict(), especially if packed weights represent several tensors fused together
* make packed_weight a proper Tensor layout
  * pro: no change to state_dict or buffers behavior
  * cons: adding new tensor layouts is pretty costly today
  * cons: doesn't work if multiple tensors are packed in one interleaved representation
* *[this approach]* allow Modules to override state_dict and return regular tensors
  * pro: most flexible and hackable
  * pro: maintains the semantic meaning of state_dict as all data necessary to represent the module's state
  * cons: complicates state_dict logic
  * cons: potential code duplication between `__getstate__/__setstate__`
Based on discussions with zdevito and gchanan we decided to pick the latter approach. Rationale: this behavior is fully opt-in and will impact only modules that need it. For those modules the requirement listed above won't hold, but we do preserve the requirement that all elements of state_dict are tensors. (https://fburl.com/qgybrug4 for internal discussion)
In the future we might also implement one of the approaches above but those are more involved.
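A rough sketch of what an opt-in override can look like using the existing `_save_to_state_dict`/`_load_from_state_dict` extension points (whether this PR's hook uses exactly these names is not stated in the summary; the fp16 "packing" below is a stand-in for a real packed format):
```
import torch
import torch.nn as nn
import torch.nn.functional as F

class PackedLinear(nn.Module):
    def __init__(self, out_features, in_features):
        super(PackedLinear, self).__init__()
        # Weight kept in a packed (here: fp16) format that is awkward to expose directly.
        self.packed_weight = torch.randn(out_features, in_features).half()

    def forward(self, x):
        return F.linear(x, self.packed_weight.float())

    def _save_to_state_dict(self, destination, prefix, keep_vars):
        # Export a plain fp32 tensor so every state_dict entry is still a Tensor.
        destination[prefix + "weight"] = self.packed_weight.float()

    def _load_from_state_dict(self, state_dict, prefix, local_metadata, strict,
                              missing_keys, unexpected_keys, error_msgs):
        self.packed_weight = state_dict[prefix + "weight"].half()
```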
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21933
Differential Revision: D15937678
Pulled By: dzhulgakov
fbshipit-source-id: 3cb5d1a8304d04def7aabc0969d0a2e7be182367
Summary:
https://github.com/pytorch/pytorch/pull/17072 breaks `model.to(xla_device)`, because moving `model` to XLA device involves changing its parameters' TensorImpl type, and the current implementation of `nn.Module.to()` doesn't support changing module parameters' TensorImpl type:
```python
# 6dc445e1a8/torch/nn/modules/module.py (L192-L208)
def _apply(self, fn):
...
for param in self._parameters.values():
if param is not None:
# Tensors stored in modules are graph leaves, and we don't
# want to create copy nodes, so we have to unpack the data.
param.data = fn(param.data) # NOTE: this doesn't allow changing `param.data`'s TensorImpl type
if param._grad is not None:
param._grad.data = fn(param._grad.data) # NOTE: this doesn't allow changing `param._grad.data`'s TensorImpl type
...
```
yf225 TODO: fix the description here when we finish the implementation
To fix this problem, we introduce a new API `model.to_()` that always assign new tensors to the parameters (thus supporting changing the parameters to any TensorImpl type), and also bump the version counter of the original parameters correctly so that they are invalidated in any autograd graph they participate in.
We also add warning to the current `model.to()` API to inform users about the upcoming behavior change of `model.to()`: in future releases, it would create and return a new model instead of in-place updating the current model.
This unblocks adding XLA to our CI test suite, which also allows XLA to catch up with other changes in our codebase, notably the c10 dispatcher.
[xla ci]
cc. resistor ailzhang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21613
Differential Revision: D15895387
Pulled By: yf225
fbshipit-source-id: b79f230fb06019122a37fdf0711bf2130a016fe6
Summary:
When we pass `fn` to `nn.Module._apply()` and `fn` is an in-place operation, the correct behavior should also include bumping the parameters' and their gradients' version counters. This PR fixes the old incorrect behavior and makes sure the new behavior is right.
Note that this PR is BC-breaking in the following way:
Previously, passing an in-place operation to `nn.Module._apply()` did not bump the module's parameters' and their gradients' version counters. After this PR, the version counters are correctly bumped by the in-place operation, which invalidates the parameters and gradients in any autograd graph they previously participated in.
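A quick way to observe the new behavior, assuming the counter is exposed via `Tensor._version` (an internal attribute, used here purely for illustration):
```
import torch
import torch.nn as nn

m = nn.Linear(2, 2)
before = m.weight._version
m._apply(lambda t: t.zero_())       # in-place fn passed to _apply
print(m.weight._version > before)   # True once the counters are bumped correctly
```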
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21865
Differential Revision: D15881952
Pulled By: yf225
fbshipit-source-id: 62f9244a4283a110147e9f20145ff232a5579fbd
Summary:
https://github.com/pytorch/pytorch/issues/11866 corrected this issue in `host_softmax` (aten/src/ATen/native/SoftMax.cpp), but when I tried the example proposed in https://github.com/pytorch/pytorch/issues/11752, `log_softmax` was still not working for big logits.
Looking into the source code, I found that the example calls `vec_host_softmax_lastdim`, not `host_softmax`.
This code fixes the issue in `_vec_log_softmax_lastdim` and adds a test for `log_softmax`.
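The failure mode with big logits is the usual overflow in exp(); a toy illustration (not the vectorized code path this PR touches):
```
import torch

x = torch.tensor([1000.0, 1000.0])
naive = x - x.exp().sum().log()          # exp() overflows -> tensor([-inf, -inf])
stable = torch.log_softmax(x, dim=0)     # max-subtraction trick -> tensor([-0.6931, -0.6931])
print(naive, stable)
```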
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21672
Differential Revision: D15856327
Pulled By: VitalyFedyunin
fbshipit-source-id: 7a1fd3c0a03d366c99eb873e235361e4fcfa7567
Summary:
Currently multihead attention for half type is broken
```
File "/home/ngimel/pytorch/torch/nn/functional.py", line 3279, in multi_head_attention_forward
attn_output = torch.bmm(attn_output_weights, v)
RuntimeError: Expected object of scalar type Float but got scalar type Half for argument #2 'mat2'
```
because softmax converts half inputs into fp32 inputs. This is unnecessary - all the computations in softmax will be done in fp32 anyway, and the results need to be converted into fp16 for the subsequent batch matrix multiply, so nothing is gained by writing them out in fp32. This PR gets rid of type casting in softmax, so that half works.
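A quick sanity check of the fixed path (requires a GPU; this is a sketch, not the PR's test):
```
import torch
import torch.nn as nn

if torch.cuda.is_available():
    attn = nn.MultiheadAttention(16, 4).cuda().half()
    x = torch.randn(5, 2, 16, device="cuda", dtype=torch.half)
    out, _ = attn(x, x, x)   # previously raised the Float/Half mismatch in bmm
    print(out.dtype)         # torch.float16
```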
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21658
Differential Revision: D15807487
Pulled By: zhangguanheng66
fbshipit-source-id: 4709ec71a36383d0d35a8f01021e12e22b94992d
Summary:
Accidentally rebased the old PR and made it too messy. Find it here (https://github.com/pytorch/pytorch/pull/19274).
Creating a PR for comments. The model is still WIP, but I want to get some feedback before moving too far. The transformer model depends on several modules, like MultiheadAttention (landed).
Transformer is implemented based on the paper (https://papers.nips.cc/paper/7181-attention-is-all-you-need.pdf). Users have the flexibility to build a transformer with self-defined and/or built-in components (i.e. encoder, decoder, encoder_layer, decoder_layer). Users can use the Transformer class to build a standard transformer model and modify sub-layers as needed.
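A minimal usage sketch of the standard model (hyperparameters here are arbitrary, not the paper's defaults):
```
import torch
import torch.nn as nn

model = nn.Transformer(d_model=32, nhead=4, num_encoder_layers=2, num_decoder_layers=2)
src = torch.rand(10, 8, 32)   # (source length, batch, d_model)
tgt = torch.rand(7, 8, 32)    # (target length, batch, d_model)
out = model(src, tgt)
print(out.shape)              # torch.Size([7, 8, 32])
```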
Add a few unit tests for the transformer module, as follows:
TestNN.test_Transformer_cell
TestNN.test_transformerencoderlayer
TestNN.test_transformerdecoderlayer
TestNN.test_transformer_args_check
TestScript.test_scriptmodule_transformer_cuda
There is another demonstration example for applying transformer module on the word language problem. https://github.com/pytorch/examples/pull/555
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20170
Differential Revision: D15417983
Pulled By: zhangguanheng66
fbshipit-source-id: 7ce771a7e27715acd9a23d60bf44917a90d1d572
Summary:
Retry #21197
The previous one failed because it used some Python3-only syntax.
ezyang Do we still have multi-GPU py2 tests? I am curious why the CI tests did not catch this error.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21262
Differential Revision: D15598941
Pulled By: mrshenli
fbshipit-source-id: 95f416589448c443685d6d236d205b011998a715
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20665
Add gelu activation forward on CPU in pytorch
Compared to the current Python implementation of gelu in the BERT model, e.g.
```
def gelu(self, x):
    return x * 0.5 * (1.0 + torch.erf(x / self.sqrt_two))
```
The torch.nn.functional.gelu function can reduce the forward time from 333ms to 109ms (with MKL) / 112ms (without MKL) for input size = [64, 128, 56, 56] on a devvm.
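For example (a sketch; the timings above are the diff author's, not reproduced here):
```
import torch
import torch.nn.functional as F

x = torch.randn(64, 128, 56, 56)
y = F.gelu(x)   # native CPU gelu forward
```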
Reviewed By: zheng-xq
Differential Revision: D15400974
fbshipit-source-id: f606b43d1dd64e3c42a12c4991411d47551a8121
Summary:
Fixes #21108
When grad is disabled, Python autograd function outputs are [wrapped as detached aliases](8cde4c4d22/torch/csrc/autograd/python_function.cpp (L395-L399)), which prevents calling `Tensor.set_()` on them after recent changes in Tensors and Variables. This becomes a problem when users want to call `rnn.flatten_parameters()` in the forward pass, as that function [calls `set_()`](9d09f5df6c/aten/src/ATen/native/cudnn/RNN.cpp (L669)).
The proposed solution is to avoid using an autograd Broadcast if in no_grad mode.
apsdehal
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21197
Differential Revision: D15577342
Pulled By: mrshenli
fbshipit-source-id: 1a024c572171a3f2daca9454fd3ee6450d112f7c
Summary:
Remove the internal functions in multi_head_attention_forward. Those internal functions cause a 10-15% performance regression, and there is possibly a JIT issue.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20653
Differential Revision: D15398888
Pulled By: cpuhrsch
fbshipit-source-id: 0a3f053a4ade5009e73d3974fa6733c2bff9d929
Summary:
load_state_dict includes a recursive inner function `load` that captures
Tensors through the closed-over variable `state_dict`. Because it's
recursive, it also captures itself, leading to a reference cycle.
This breaks the reference cycle so that any Tensors in state_dict can be
collected immediately instead of waiting until the next GC cycle.
Alternatively, we could have passed `state_dict` and `metadata` as
arguments to load to prevent capture of Tensors. (That would still
result in cyclic garbage, but not any cyclic garbage of Tensors).
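A self-contained illustration of the cycle (the `big` object stands in for the Tensors captured via `state_dict`; `load = None` is one way to break it):
```
def run():
    big = bytearray(10 ** 7)     # stands in for the captured Tensors

    def load(n):
        _ = big                  # closes over `big`
        if n:
            load(n - 1)          # closes over itself -> function participates in a cycle

    load(3)
    load = None                  # break the cycle; `big` is freed when run() returns,
                                 # without waiting for the garbage collector

run()
```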
See:
https://github.com/pytorch/pytorch/issues/20199#issuecomment-491089004
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20397
Differential Revision: D15414834
Pulled By: colesbury
fbshipit-source-id: 4c2275a08b2d8043deb3779db28be03bda15872d
Summary:
Fully supporting the incremental_state function requires several additional utils available in fairseq, and we lack a suitable problem to exercise it in a unit test. Therefore, the incremental_state function will be disabled for now; if it is needed in the future, a feature request can be created. Fixes #20132
Add some unit tests to cover the arguments of MultiheadAttention module, including bias, add_bias_kv, add_zero_attn, key_padding_mask, need_weights, attn_mask.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20177
Differential Revision: D15304575
Pulled By: cpuhrsch
fbshipit-source-id: ebd8cc0f11a4da0c0998bf0c7e4e341585e5685a
Summary:
Fixes #19650
When driazati started the bicubic implementation, we used the TF result as ground truth. It turns out the OpenCV version of bicubic resize is more commonly used.
This PR does two things:
- Fix a bug where we didn't use area mode to compute the source index.
- Follow the OpenCV logic to handle computed negative source indices (we used to bound them by 0).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19703
Differential Revision: D15078159
Pulled By: ailzhang
fbshipit-source-id: 06a32baf2fbc93b90a156b863b4f9fab326d3242
Summary:
This was actually getting pretty poor throughput with respect to memory bandwidth. I used this test to measure the memory bandwidth specifically for the AXPY call: https://gist.github.com/jamesr66a/b27ff9ecbe036eed5ec310c0a3cc53c5
And I got ~8 GB/s before this change, but ~14 GB/s after this change.
This seems to speed up the operator overall by around 1.3x (benchmark: https://gist.github.com/jamesr66a/c533817c334d0be432720ef5e54a4166):
== Before ==
time_per_iter 0.0001298875093460083
GB/s 3.082544287868467
== After ==
time_per_iter 0.00010104801654815674
GB/s 3.9623142905451076
The large difference between the local BW increase and the full-op BW increase likely indicates significant time is being spent elsewhere in the op, so I will investigate that.
EDIT: I updated this PR to include a call into caffe2/perfkernels. This is the progression:
before
time_per_iter 8.983819484710693e-05
GB/s 4.456723564864611
After no axpy
time_per_iter 7.19951868057251e-05
GB/s 5.56126065872172
After perfkernels
time_per_iter 5.6699180603027346e-05
GB/s 7.061548257694262
After perfkernels no grad
time_per_iter 4.388842582702637e-05
GB/s 9.122769670026413
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19329
Reviewed By: dzhulgakov
Differential Revision: D14969630
Pulled By: jamesr66a
fbshipit-source-id: 42d1015772c87bedd119e33c0aa2c8105160a738
Summary:
I audited the relevant kernel and saw that it accumulates a good deal into float,
so it should be fine.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19293
Differential Revision: D14942274
Pulled By: zou3519
fbshipit-source-id: 36996ba0fbb29fbfb12b27bfe9c0ad1eb012ba3c
Summary:
Enable multi-GPU tests that work with ROCm 2.2. They have been run three times on CI to ensure stability.
While there, remove skipIfRocm annotations for tests that depend on MAGMA. They still skip, but now for the correct reason (no MAGMA), to improve our diagnostics.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19169
Differential Revision: D14924812
Pulled By: bddppq
fbshipit-source-id: 8b88f58bba58a08ddcd439e899a0abc6198fef64