**Previous behavior**: compute inner product, then normalize.
**This patch**: first normalize, then compute the inner product. This should be more numerically stable because it avoids losing precision in the inner product for inputs with large norms.
By design this ensures that the cosine similarity is within `[-1.0, +1.0]`, so it should fix [#29442](https://github.com/pytorch/pytorch/issues/29442).
P.S. I had to change tests because this implementation handles division by 0 differently.
This PR computes cosine similarity as follows: <x/max(eps, ||x||), y/max(eps, ||y||)>.
Let f(x,y) = <x,y>/(||x|| * ||y||), then
df/dx = y/(||x|| * ||y||) - (||y||/||x|| * <x,y> * x)/(||x|| * ||y||)^2.
The changed test checks division by zero in backward when x=0 and y != 0.
For this case the non-zero part of the gradient is just y / (||x|| * ||y||).
The previous implementation evaluates y/(||x|| * ||y||) as y / eps; with this PR it evaluates to (1/eps) * y/||y||.
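A minimal PyTorch-level sketch of the normalize-then-dot formulation described above (illustrative only, not the actual ATen kernel):
```python
import torch

def cosine_similarity_ref(x, y, dim=-1, eps=1e-8):
    # Normalize each input first, clamping the norm by eps, then take the inner product.
    x_n = x / x.norm(dim=dim, keepdim=True).clamp_min(eps)
    y_n = y / y.norm(dim=dim, keepdim=True).clamp_min(eps)
    return (x_n * y_n).sum(dim=dim)

# The result is guaranteed to lie in [-1.0, +1.0] by Cauchy-Schwarz, even for huge norms.
print(cosine_similarity_ref(torch.randn(3, 5) * 1e20, torch.randn(3, 5) * 1e20))
```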
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31378
Approved by: https://github.com/ezyang, https://github.com/albanD
Summary:
In this PR, we optimize the PReLU op on the CPU path and enable BFloat16 support on top of the optimized PReLU.
The original implementation uses parallel_for to parallelize the computation, but it is not vectorized. It can be optimized with TensorIterator, which provides both parallelization and vectorization.
The difference between PReLU and other activation ops is that PReLU supports a learnable parameter `weight`. When called without arguments, nn.PReLU() uses a single parameter `weight` across all input channels. If called with nn.PReLU(nChannels), a separate `weight` is used for each input channel. So we cannot use TensorIterator directly, because `weight` can differ per input channel.
In order to use TensorIterator, `weight` is broadcast to the `input` shape. With vectorization and parallel_for, this implementation is much faster than the original one. Another advantage is that the implementation no longer needs separate paths for `share weights` and `multiple weights`.
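A rough reference of the idea in plain PyTorch (not the TensorIterator-based kernel): `weight` is reshaped so it broadcasts against `input`, and the op is applied elementwise, which covers both the shared-weight and per-channel cases with one code path. Assumes the input has batch and channel dimensions.
```python
import torch

def prelu_reference(x, weight):
    # weight holds either 1 element (shared) or x.size(1) elements (one per channel);
    # reshape it so it broadcasts along the channel dimension of x.
    w = weight.view(1, -1, *([1] * (x.dim() - 2)))
    return torch.where(x >= 0, x, w * x)
```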
We compare the performance of the PReLU implementation in public PyTorch against the optimized PReLU in this PR, covering fp32/bf16, forward/backward, and share-weights/multiple-weights configurations. bf16 in public PyTorch directly reuses `Vectorized<scalar_t>` for `BFloat16`.
Share weights:


Multiple weights:


cc albanD mruberry jbschlosser walterddr
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63634
Reviewed By: yinghai
Differential Revision: D34031616
Pulled By: frank-wei
fbshipit-source-id: 04e2a0f9e92c658fba7ff56b1010eacb7e8ab44c
(cherry picked from commit ed262b15487557720bb0d498f9f2e8fcdba772d9)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75421
As part of FSDP work, we will be relying on `_register_load_state_dict_pre_hook` to manage some specific logic related to loading state dicts.
This PR adds a test to ensure that `_register_load_state_dict_pre_hook` can be used to register hooks on modules that will be used in a nested way, and that calling `load_state_dict` on the overall module still invokes those hooks appropriately.
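A rough sketch of the scenario the test covers (illustrative module names, not the test's actual code; the hook registration API is private):
```python
import torch.nn as nn

calls = []

class Sub(nn.Module):
    def __init__(self):
        super().__init__()
        self.lin = nn.Linear(2, 2)
        # private API under test: the hook runs before this module's state dict is loaded
        self._register_load_state_dict_pre_hook(lambda *args: calls.append("sub"))

model = nn.Sequential(Sub(), Sub())          # hooks registered on nested submodules
model.load_state_dict(model.state_dict())    # loading the outer module...
assert len(calls) == 2                        # ...still fires each nested hook
```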
Differential Revision: [D35434726](https://our.internmc.facebook.com/intern/diff/D35434726/)
Approved by: https://github.com/albanD
Summary: The primary issue for enabling sparsity to work with QAT convert (unlike normal quantization convert) is that when the parametrized module undergoes the QAT convert, the parametrizations need to be maintained. If the parametrizations don't get transferred during the convert, the sparsifier would lose its connection to the model. In practice this was handled using the transfer_parametrizations_and_params function to move the weight and bias and any associated parametrizations to the new module. This PR also adds tests for transfer_parametrizations_and_params and type_before_parametrizations to test_nn.py and adds comments to the test code for composability.
Test Plan: python test/test_ao_sparsity.py TestComposability
python test/test_nn.py TestNN
Reviewers:
Subscribers:
Tasks:
Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74848
Approved by: https://github.com/vkuzo, https://github.com/Lezcano
Summary:
Add BFloat16 support for logsigmoid, hardsigmoid, hardshrink, softshrink, hardswish and softplus on CPU, and optimize the performance of softshrink.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63134
Reviewed By: yinghai
Differential Revision: D34897992
Pulled By: frank-wei
fbshipit-source-id: 4c778f5271d6fa54dd78158258941def8d9252f5
(cherry picked from commit decda0e3debf56cc5c4d7faea41b1165a7cabe12)
For a GroupNorm module, if num_channels is not divisible by num_groups, the error should be reported when the module is defined rather than when it is run.
example:
```
import torch
m = torch.nn.GroupNorm(5, 6)
x = torch.randn(1, 6, 4, 4)
y = m(x)
```
before:
```
Traceback (most recent call last):
File "group_norm_test.py", line 8, in <module>
y = m(x)
File "/home/xiaobinz/miniconda3/envs/pytorch_mater/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1111, in _call_impl
return forward_call(*input, **kwargs)
File "/home/xiaobinz/miniconda3/envs/pytorch_mater/lib/python3.7/site-packages/torch/nn/modules/normalization.py", line 271, in forward
input, self.num_groups, self.weight, self.bias, self.eps)
File "/home/xiaobinz/miniconda3/envs/pytorch_mater/lib/python3.7/site-packages/torch/nn/functional.py", line 2500, in group_norm
return torch.group_norm(input, num_groups, weight, bias, eps, torch.backends.cudnn.enabled)
RuntimeError: Expected number of channels in input to be divisible by num_groups, but got input of shape [1, 6, 4, 4] and num_groups=5
```
after:
```
Traceback (most recent call last):
File "group_norm_test.py", line 6, in <module>
m = torch.nn.GroupNorm(5, 6)
File "/home/xiaobinz/miniconda3/envs/pytorch_test/lib/python3.7/site-packages/torch/nn/modules/normalization.py", line 251, in __init__
raise ValueError('num_channels must be divisible by num_groups')
```
This PR also updates the documentation of num_groups.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74293
Approved by: https://github.com/jbschlosser
Fixes #71415
I have implemented the changes that replicate what @to-mi did in this [PR](https://github.com/pytorch/pytorch/pull/65986#issue-1012959443) for the 3D case:
> Fixes #64977
>
> Avoids creating a tensor for and calculating `input` gradient if it's not needed in the backward pass of `grid_sample` (2d case, native CPU & CUDA kernels). Especially the tensor creation seemed time consuming (see #64977).
>
> Brief description of the changes:
>
> * I have tried to go with rather minimal changes. It would probably be possible to make a more elegant version with a bit larger refactoring (or possibly with better understanding of PyTorch internals and C++ functionalities).
>
> * Changed the `native_functions.yaml` and `derivatives.yaml` so that the gradient input mask is passed to the functions.
>
> * Changed the CPU kernels:
> (1) added `bool input_requires_grad` template parameter to the `backward` function,
> (2) added if branches based on it to remove `input` gradient computations if it's not requested,
> (3) feed in `TensorAccessor<scalar_t, 3>* gInp_slice_ptr` instead of `TensorAccessor<scalar_t, 3>& gInp_slice` so that I can pass a `nullptr` in case gradient for `input` is not requested. (A bit inelegant perhaps, but allows to keep one signature for `backward` function and not require breaking it to smaller pieces. Perhaps there's a more elegant way to achieve this?)
>
> * Changed CUDA kernel:
> (1) added ~`bool input_requires_grad` template parameter~ `const bool input_requires_grad` argument to the `backward` function,
> (2) added if branches based on it to remove `input` gradient computations if it's not requested,
> (3) feed in `TensorInfo<scalar_t, index_t>()` instead of `getTensorInfo<scalar_t, index_t>(grad_input)` in case gradient for `input` is not requested.
>
> * Modified tests in `test/test_nn.py` so that they run also cases with no `input` gradient needed.
>
> * Have not touched the CPU fallback kernel.
Note: the changes numbered (3) are N/A in this case.
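For reference, a hedged example of the case this optimizes in 3D: only `grid` requires grad, so the backward pass no longer needs to allocate or fill a gradient for `input`.
```python
import torch
import torch.nn.functional as F

inp = torch.randn(1, 1, 4, 4, 4)                          # input does not require grad
grid = torch.rand(1, 2, 2, 2, 3, requires_grad=True) * 2 - 1  # 3D case: 5-D grid in [-1, 1]
out = F.grid_sample(inp, grid, align_corners=False)
out.sum().backward()                                       # only grid.grad is computed
```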
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71759
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72941
Simple test for MHA; it uses cosine similarity as the metric since scaling generates mismatches. CUDA is validated; a CPU fix will follow (we can land this with the onlyCUDA flag and remove it once CPU is also done).
Test Plan:
For cuda:
buck build mode/opt -c fbcode.enable_gpu_sections=true caffe2/test:nn && buck-out/gen/caffe2/test/nn\#binary.par -r test_native_multihead_attention_cuda_float32 2>&1 | pastry
Reviewed By: swolchok
Differential Revision: D33906921
fbshipit-source-id: ad447401eb7002f22ed533d620a6b544524b3f58
(cherry picked from commit 45b778da27)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72944
Doesn't make sense to develop it in core right now.
ghstack-source-id: 149456040
Test Plan:
CI
run MHA benchmark in benchmark_transformers.py to make sure it doesn't crash
Reviewed By: zrphercule
Differential Revision: D34283104
fbshipit-source-id: 4f0c7a6bc066f938ceac891320d4cf4c3f8a9cd6
(cherry picked from commit b9df65e97c)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72671
The existing kernel did not handle cases where D % 4 != 0 or dim_per_head % 4 != 0. Now we have a non-vectorized kernel for these cases.
ghstack-source-id: 149201477
Test Plan: Updated test_nn to cover these cases.
Reviewed By: zrphercule, ngimel
Differential Revision: D34119371
fbshipit-source-id: 4e9b4d9b636224ef2c433593f6f236df040de782
(cherry picked from commit f5393878e4)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72464
We had some trouble getting this component (and this test!) right, so let's test it.
ghstack-source-id: 149201478
Test Plan: new test passes
Reviewed By: zrphercule
Differential Revision: D33992477
fbshipit-source-id: cc377eed5d4a4412b42bdabf360601c6e52947cf
(cherry picked from commit 9832867b12)
Summary:
https://github.com/pytorch/pytorch/issues/71521 attempted to fix an issue where the `test_conv_large` test was producing `NaN` values after the backward pass, yielding a bogus comparison between the result and the expected result. While tweaking the initialization of the conv layer seemed to fix this behavior, it was actually just masking the real issue, which was that `grad_weight` is not guaranteed to be initialized in `raw_cudnn_convolution_backward_weight_out` when the backward operation is split.
Specifically, the `grad_weight` tensor is expected to be directly written to by a `cudnn` kernel (which does occur in most cases), so it does not need to be initialized, but splitting introduces an intermediate `grad_weight_` tensor that holds the intermediate gradients and then accumulates into `grad_weight` without initializing it first. This PR tweaks this behavior so that the accumulation is now done into a zeroed tensor, and the accumulation is also performed in an accumulation dtype. The hacky workaround masking the issue is also reverted, with the safeguard against comparing `NaN` values (using the reference tensor for scale computation) kept in place.
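A conceptual Python sketch of the fixed accumulation (the real code is C++ inside the cuDNN path; the function and names here are illustrative): accumulate the per-split results into a zero-initialized buffer kept in an accumulation dtype, then cast back.
```python
import torch

def accumulate_split_grad_weights(split_grads, out_dtype):
    # Zero-initialize the accumulator instead of adding into uninitialized memory,
    # and accumulate in float32 before casting back to the output dtype.
    acc = torch.zeros_like(split_grads[0], dtype=torch.float32)
    for g in split_grads:
        acc += g.float()
    return acc.to(out_dtype)
```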
CC ngimel ptrblck
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72157
Reviewed By: malfet
Differential Revision: D34147547
Pulled By: ngimel
fbshipit-source-id: 056c19f727eeef96347db557528272e24eae4223
(cherry picked from commit 24c7f77a81)
Summary:
The only difference with plain list/dict now is that nn.Parameters are
handled specially and registered as parameters properly.
test_nn and parametrization tests pass locally.
We will see in CI whether DP is fixed as well.
Tentative fix for https://github.com/pytorch/pytorch/issues/36035
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70499
Reviewed By: jbschlosser, alexeib
Differential Revision: D34005332
Pulled By: albanD
fbshipit-source-id: 7e76b0873d0fec345cb537e2a6ecba0258e662b9
(cherry picked from commit dc1e6f8d86)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/71720
This PR removes the old warnings for `recompute_scale_factor` and `align_corners`.
Looking at this, I realize that the tests I modified don't really catch whether or not a warning is created for `recompute_scale_factor`. If desired, I can add a couple lines into the tests there to pass a floating point in the `scale_factors` kwarg, along with `recompute_scale_factor=None`.
Let me know how this looks, thanks so much!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72093
Reviewed By: mruberry
Differential Revision: D33917615
Pulled By: albanD
fbshipit-source-id: e822f0a15b813ecf312cdc6ed0b693e7f1d1ca89
(cherry picked from commit c14852b85c)
Summary:
Pull Request resolved: https://github.com/pytorch/torchrec/pull/39
Pull Request resolved: https://github.com/facebookresearch/torchrec/pull/6
This makes it so that shared parameters get their own entry in `named_parameters`.
More broadly, this makes it so that
```
params_and_buffers = {**dict(mod.named_parameters(remove_duplicate=False)), **dict(mod.named_buffers(remove_duplicate=False))}
_stateless.functional_call(mod, params_and_buffers, args, kwargs)
```
is identical to calling the original module's forward pass.
cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71542
Reviewed By: jbschlosser, albanD
Differential Revision: D33716716
Pulled By: Chillee
fbshipit-source-id: ff1ed9980bd1a3f7ebaf695ee5e401202b543213
(cherry picked from commit d6e3ad3cd0)
Summary:
Hi,
The PR fixes https://github.com/pytorch/pytorch/issues/71096. It scans all the test files and replaces `ALL_TENSORTYPES` and `ALL_TENSORTYPES2` with `get_all_fp_dtypes`.
I'm looking forward to your viewpoints!
Thanks!
cc: janeyx99 kshitij12345
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71153
Reviewed By: jbschlosser, mruberry
Differential Revision: D33533346
Pulled By: anjali411
fbshipit-source-id: 75e79ca2756c1ddaf0e7e0289257fca183a570b3
(cherry picked from commit da54b54dc5)
Summary:
This PR twiddles the parameters of the conv layer in `test_conv_large` to better avoid NaN values. Previously, this test would cause a NaN to be computed for `scale` (propagated from `.mean()` on the `.grad` tensor). This NaN would then be propagated to the scaled gradients via division, resulting in a bogus `assertEqual` check as `NaN == NaN` is by default true. (This behavior was observed on V100 and A100).
To improve visibility of failures in the event of NaNs in `grad1`, scale is now computed from `grad2`.
Interestingly enough, we discovered this issue when trying out some less common setups that broke this test; it turns out those breakages were cases where there were no NaN values (leading to an actual `assertEqual` check that would fail for `float16`).
CC ptrblck ngimel puririshi98
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71521
Reviewed By: anjali411
Differential Revision: D33776705
Pulled By: ngimel
fbshipit-source-id: a1ec4792cba04c6322b22ef5b80ce08579ea4cf6
(cherry picked from commit d207bd9b87)
Summary:
We found a discrepancy between CPU & CUDA when using RNN modules: input shapes containing 0s cause an invalid configuration argument error on CUDA (kernel grid size is 0), while a valid tensor is returned in the CPU case.
A reproducer:
```
import torch
x = torch.zeros((5, 0, 3)).cuda()
gru = torch.nn.GRU(input_size=3, hidden_size=4).to("cuda")
gru(x)
```
Run with `CUDA_LAUNCH_BLOCKING=1` set.
cc ngimel albanD
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71696
Reviewed By: mikaylagawarecki
Differential Revision: D33743674
Pulled By: ngimel
fbshipit-source-id: e9334175d10969fdf1f9c63985910d944bbd26e7
(cherry picked from commit 70838ba69b)
Summary:
Helps fix a part of https://github.com/pytorch/pytorch/issues/69865
The first commit just migrates everything as is.
The second commit uses the "device" variable instead of passing "cuda" everywhere
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70872
Reviewed By: jbschlosser
Differential Revision: D33455941
Pulled By: janeyx99
fbshipit-source-id: 9d9ec8c95f1714c40d55800e652ccd69b0c314dc
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69727
Still need to test the backward ones. We would need to update gradgradcheck to check forward over backward.
Test Plan: Imported from OSS
Reviewed By: albanD
Differential Revision: D33031728
Pulled By: soulitzer
fbshipit-source-id: 86c59df5d2196b5c8dbbb1efed9321e02ab46d30
Summary:
Fixes https://github.com/pytorch/pytorch/issues/68476
We implemented all of the following `dict` methods for `ParameterDict`
- `get `
- `setdefault`
- `popitem`
- `fromkeys`
- `copy`
- `__or__`
- `__ior__`
- `__reversed__`
- `__ror__`
The behavior of these new methods matches the expected behavior of python `dict` as defined by the language itself: https://docs.python.org/3/library/stdtypes.html#typesmapping
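A hedged usage sketch of the new methods (illustrative values):
```python
import torch
import torch.nn as nn

pd = nn.ParameterDict({"w": nn.Parameter(torch.randn(2))})
pd.setdefault("b", nn.Parameter(torch.zeros(2)))        # insert only if the key is missing
w = pd.get("w")                                          # dict-style lookup, None if absent
pd2 = pd.copy()                                          # shallow copy
merged = pd | nn.ParameterDict({"u": nn.Parameter(torch.ones(1))})  # __or__
for key in reversed(pd):                                 # __reversed__
    print(key, pd[key].shape)
```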
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69403
Reviewed By: albanD
Differential Revision: D33187111
Pulled By: jbschlosser
fbshipit-source-id: ecaa493837dbc9d8566ddbb113b898997e2debcb
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69272
In transformer encoder and MHA, masked_softmax's mask is a 2D tensor (B, D), while the input is a 4D tensor (B, H, D, D).
This mask could simply be broadcast to a (B, H, D, D) tensor like the input and a regular masked_softmax applied; however, that would make the mask non-contiguous and consume more memory.
In this diff, we keep the mask's shape unchanged and compute the corresponding mask element for the input in each CUDA thread.
This new layout is not supported on CPU yet.
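The reference semantics of the (B, D) mask applied to a (B, H, D, D) input, written in plain PyTorch for clarity (assuming True marks masked-out positions; the CUDA kernel computes the per-thread mask index instead of materializing the broadcast):
```python
import torch

def masked_softmax_ref(x, mask):
    # x: (B, H, D, D) attention scores; mask: (B, D) boolean, True = masked out (assumption)
    m = mask[:, None, None, :]                        # broadcastable to (B, H, D, D)
    return torch.softmax(x.masked_fill(m, float("-inf")), dim=-1)
```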
Test Plan: buck build mode/opt -c fbcode.enable_gpu_sections=true caffe2/test:nn && buck-out/gen/caffe2/test/nn\#binary.par -r test_masked_softmax
Reviewed By: ngimel
Differential Revision: D32605557
fbshipit-source-id: ef37f86981fdb2fb264d776f0e581841de5d68d2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69268
This diff enables native masked softmax on CUDA and also expands our current warp_softmax to accept masking.
The mask in this masked softmax has to have the same shape as the input and has to be contiguous.
In a follow-up diff, I will include the encoder mask layout, where the input is (B, H, D, D) and the mask is (B, D).
Test Plan: buck build mode/opt -c fbcode.enable_gpu_sections=true caffe2/test:nn && buck-out/gen/caffe2/test/nn\#binary.par -r test_masked_softmax
Reviewed By: ngimel
Differential Revision: D32338419
fbshipit-source-id: 48c3fde793ad4535725d9dae712db42e2bdb8a49
Summary:
Towards [convolution consolidation](https://fb.quip.com/tpDsAYtO15PO).
Introduces the general `convolution_backward` function that uses the factored-out backend routing logic from the forward function.
Some notes:
* `finput` is now recomputed in the backward pass for the slow 2d / 3d kernels instead of being saved from the forward pass. The logic for this is based on the forward computation and lives in the `compute_finput2d` / `compute_finput3d` functions in `ConvUtils.h`.
* Using structured kernels for `convolution_backward` requires extra copying since the backend-specific backward functions return tensors. Porting to structured is left as future work.
* The tests that check the routing logic have been renamed from `test_conv_backend_selection` -> `test_conv_backend` and now also include gradcheck validation using an `autograd.Function` hooking up `convolution` to `convolution_backward`. This was done to ensure that gradcheck passes for the same set of inputs / backends.
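A rough sketch of the kind of `autograd.Function` hookup described in the last bullet, assuming the current ATen signatures for `convolution` / `convolution_backward` (simplified; not the test's exact code):
```python
import torch

class MyConv2d(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, w, b):
        ctx.save_for_backward(x, w)
        return torch.convolution(x, w, b, stride=[1, 1], padding=[0, 0],
                                 dilation=[1, 1], transposed=False,
                                 output_padding=[0, 0], groups=1)

    @staticmethod
    def backward(ctx, grad_out):
        x, w = ctx.saved_tensors
        gx, gw, gb = torch.ops.aten.convolution_backward(
            grad_out, x, w, [w.size(0)],                  # bias_sizes
            [1, 1], [0, 0], [1, 1], False, [0, 0], 1,     # stride/padding/dilation/...
            [True, True, True])                           # output_mask
        return gx, gw, gb

x = torch.randn(1, 2, 5, 5, dtype=torch.double, requires_grad=True)
w = torch.randn(3, 2, 3, 3, dtype=torch.double, requires_grad=True)
b = torch.randn(3, dtype=torch.double, requires_grad=True)
torch.autograd.gradcheck(MyConv2d.apply, (x, w, b))
```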
The forward pass routing is done as shown in this flowchart (probably need to download it for it to be readable since it's ridiculous):


Pull Request resolved: https://github.com/pytorch/pytorch/pull/65219
Reviewed By: mruberry
Differential Revision: D32611368
Pulled By: jbschlosser
fbshipit-source-id: 26d759b7c908ab8f19ecce627acea7bd3d5f59ba
Summary:
Adds native_dropout to have a reasonable target for torchscript in autodiff. native_dropout has scale and train as arguments in its signature; this makes native_dropout more consistent with other operators and removes conditionals in the autodiff definition.
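For reference, a small (hedged) illustration of the signature described above on builds where the op is exposed in Python: `native_dropout` returns the output together with the mask, and `train` is passed explicitly.
```python
import torch

x = torch.randn(4, 4)
out, mask = torch.native_dropout(x, 0.5, True)   # (output, mask); train passed explicitly
print(out.shape, mask.dtype)
```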
cc gmagogsfm
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63937
Reviewed By: mruberry
Differential Revision: D32477657
Pulled By: ngimel
fbshipit-source-id: d37b137a37acafa50990f60c77f5cea2818454e4
Summary:
Fixes https://github.com/pytorch/pytorch/issues/53647
With this, if a test forgets to add `dtypes` while using `dtypesIf`, the following error is raised:
```
AssertionError: dtypes is mandatory when using dtypesIf however 'test_exponential_no_zero' didn't specify it
```
**Tested Locally**
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68186
Reviewed By: VitalyFedyunin
Differential Revision: D32468581
Pulled By: mruberry
fbshipit-source-id: 805e0855f988b77a5d8d4cd52b31426c04c2200b
Summary:
This PR introduces a new function `_select_conv_backend` that returns a `ConvBackend` enum representing the selected backend for a given set of convolution inputs and params.
The function and enum are exposed to python for testing purposes through `torch/csrc/Module.cpp` (please let me know if there's a better place to do this).
A new set of tests validates that the correct backend is selected for several sets of inputs + params. Some backends aren't tested yet:
* nnpack (for mobile)
* xnnpack (for mobile)
* winograd 3x3 (for mobile)
Some flowcharts for reference:


Pull Request resolved: https://github.com/pytorch/pytorch/pull/67790
Reviewed By: zou3519
Differential Revision: D32280878
Pulled By: jbschlosser
fbshipit-source-id: 0ce55174f470f65c9b5345b9980cf12251f3abbb
Summary:
This PR makes several changes:
- Changed function `bool cudnn_conv_use_channels_last(...)` to `at::MemoryFormat cudnn_conv_suggest_memory_format(...)`
- Removed `resize_` in cudnn convolution code. Added a new overloading method `TensorDescriptor::set` that also passes the desired memory format of the tensor.
- Disabled the usage of double + channels_last on cuDNN Conv-Relu and Conv-Bias-Relu. Call `.contiguous(memory_format)` before passing data to cuDNN functions.
- Disabled the usage of cuDNN fused Conv-Bias-Relu in cuDNN < 8.0 version due to a CUDNN_STATUS_NOT_SUPPORTED error. Instead, use the native fallback path.
- Let Conv-Bias-Relu code respect the global `allow_tf32` flag.
According to the cuDNN documentation, double + NHWC is generally not supported.
Close https://github.com/pytorch/pytorch/pull/66968
Fix https://github.com/pytorch/pytorch/issues/55301
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65594
Reviewed By: jbschlosser, malfet
Differential Revision: D32175766
Pulled By: ngimel
fbshipit-source-id: 7ba079c9f7c46fc56f8bfef05bad0854acf380d7
Summary:
Partially fixes https://github.com/pytorch/pytorch/issues/66066
This PR:
- cleans up op-specific testing from test_autograd. test_autograd should be reserved for testing generic autograd functionality
- tests related to an operator are better colocated
- see the tracker for details
What to think about when moving tests to their correct test suite:
- naming: make sure it's not too generic
- how the test is parametrized, sometimes we need to add/remove a device/dtype parameter
- can this be merged with existing tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67413
Reviewed By: jbschlosser, albanD
Differential Revision: D32031480
Pulled By: soulitzer
fbshipit-source-id: 8e13da1e58a38d5cecbfdfd4fe2b4fe6f816897f
Summary:
Fix https://github.com/pytorch/pytorch/issues/67239
The CUDA kernels for `adaptive_max_pool2d` (forward and backward) were written for contiguous output. If outputs are non-contiguous, first create a contiguous copy and let the kernel write output to the contiguous memory space. Then copy the output from contiguous memory space to the original non-contiguous memory space.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67697
Reviewed By: ejguan
Differential Revision: D32112443
Pulled By: ngimel
fbshipit-source-id: 0e3bf06d042200c651a79d13b75484526fde11fe
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66879
This adds a quantized implementation for bilinear gridsample. Bicubic interpolation cannot be supported as easily since we rely on the linearity of quantization to operate on the raw values, i.e.
f(q(a), q(b)) = q(f(a, b)) where f is the linear interpolation function.
ghstack-source-id: 141321116
Test Plan: test_quantization
Reviewed By: kimishpatel
Differential Revision: D31656893
fbshipit-source-id: d0bc31da8ce93daf031a142decebf4a155943f0f
Summary:
Removes the 3D special case logic in `_convolution_double_backward()` that never worked.
The logic was never called previously since `convolution()` expands input / weight from 3D -> 4D before passing them to backends; backend-specific backward calls thus save the 4D version to pass to `_convolution_double_backward()`.
The new general `convolution_backward()` saves the original 3D input / weight, uncovering the bug.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67283
Reviewed By: anjali411
Differential Revision: D32021100
Pulled By: jbschlosser
fbshipit-source-id: 0916bcaa77ef49545848b344d6385b33bacf473d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64181
This PR replaces all the calls to:
- `transpose(-2, -1)` or `transpose(-1, -2)` by `mT()` in C++ and `mT` in Python
- `conj().transpose(-2, -1)` or `transpose(-2, -1).conj()` or `conj().transpose(-1, -2)` or `transpose(-1, -2).conj()` by `mH()` in C++ and `mH` in Python.
It also simplifies two pieces of code and fixes one bug where a pair
of parentheses was missing in the function `make_symmetric_matrices`.
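A quick illustration of the equivalences behind the replacements (using the `mT` / `mH` attributes, which exist on recent PyTorch):
```python
import torch

a = torch.randn(2, 3, 4)
assert torch.equal(a.mT, a.transpose(-2, -1))             # matrix transpose of the last two dims

c = torch.randn(2, 3, 4, dtype=torch.complex64)
assert torch.allclose(c.mH, c.conj().transpose(-2, -1))   # conjugate (Hermitian) transpose
```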
Test Plan: Imported from OSS
Reviewed By: H-Huang
Differential Revision: D31692896
Pulled By: anjali411
fbshipit-source-id: e9112c42343663d442dc5bd53ff2b492094b434a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64572
Fixes https://github.com/pytorch/pytorch/issues/64256
It also fixes an inconsistent treatment of the case `reduction = "mean"`
when the whole target is equal to `ignore_index`. It now returns `NaN`
in this case, consistently with what it returns when computing the mean
over an empty tensor.
We add tests for all these cases.
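A hedged illustration of the `reduction="mean"` corner case described above:
```python
import torch
import torch.nn.functional as F

logits = torch.randn(3, 5)
target = torch.full((3,), -100, dtype=torch.long)   # every target equals ignore_index
loss = F.nll_loss(torch.log_softmax(logits, dim=1), target,
                  ignore_index=-100, reduction="mean")
print(loss)  # nan, consistent with the mean over an empty tensor
```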
Test Plan: Imported from OSS
Reviewed By: ezyang
Differential Revision: D31116297
Pulled By: albanD
fbshipit-source-id: cc44e79205f5eeabf1efd7d32fe61e26ba701b52
Summary:
- Added 2D-Convolution NHWC support
- on ROCm 4.3, with `PYTORCH_MIOPEN_SUGGEST_NHWC=1` flag
- May need to force MIOpen to search for solutions (see examples below for flags)
**PYTORCH_MIOPEN_SUGGEST_NHWC Environment Flag**
MIOpen does not officially support NHWC yet, although convolution support has been added to tip-of-tree of MIOpen. This flag is intended to be a short-lived flag to explicitly turn on NHWC support until ROCm officially supports NHWC and performance is verified.
**Examples**
1. Example usage 1 : Run test on ROCm4.3
`PYTORCH_TEST_WITH_ROCM=1 PYTORCH_MIOPEN_SUGGEST_NHWC=1 MIOPEN_FIND_ENFORCE=4 MIOPEN_DEBUG_CONV_GEMM=0 MIOPEN_FIND_MODE=1 pytest test_nn.py -v -k "test_conv_cudnn_nhwc" `
2. Example usage 2: Run the following with `PYTORCH_MIOPEN_SUGGEST_NHWC=1` on ROCm4.3.
```
#!/usr/bin/env python3
import torch
model = torch.nn.Conv2d(8, 4, 3).cuda().half()
model = model.to(memory_format=torch.channels_last)
input = torch.randint(1, 10, (2, 8, 4, 4), dtype=torch.float32, requires_grad=True)
input = input.to(device="cuda", memory_format=torch.channels_last, dtype=torch.float16)
# should print True for is_contiguous(channels_last), and strides must match NHWC format
print(input.is_contiguous(memory_format=torch.channels_last), input.shape, input.stride() )
out = model(input)
# should print True for is_contiguous(channels_last), and strides must match NHWC format
print("Contiguous channel last :", out.is_contiguous(memory_format=torch.channels_last), " out shape :", out.shape, "out stride :", out.stride() )
```
See https://pytorch.org/tutorials/intermediate/memory_format_tutorial.html for more examples.
cc jeffdaily sunway513 jithunnair-amd ROCmSupport
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63617
Reviewed By: saketh-are
Differential Revision: D30730800
Pulled By: ezyang
fbshipit-source-id: 61906a0f30be8299e6547d312ae6ac91cc7c3238
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63554
Following https://github.com/pytorch/pytorch/pull/61840#issuecomment-884087809, this deprecates all the dtype getters publicly exposed in the `torch.testing` namespace. The reason for this is twofold:
1. If someone is not familiar with the C++ dispatch macros PyTorch uses, the names are misleading. For example `torch.testing.floating_types()` will only give you `float32` and `float64` skipping `float16` and `bfloat16`.
2. The dtype getters provide very minimal functionality that can be easily emulated by downstream libraries.
We thought about [providing a replacement](https://gist.github.com/pmeier/3dfd2e105842ad0de4505068a1a0270a), but ultimately decided against it. The major problem is BC: by keeping it, either the namespace gets messy again after a new dtype is added, or we need to somehow version the return values of the getters.
Test Plan: Imported from OSS
Reviewed By: H-Huang
Differential Revision: D30662206
Pulled By: mruberry
fbshipit-source-id: a2bdb10ab02ae665df1b5b76e8afa9af043bbf56
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64385
It was deleted in https://github.com/pytorch/pytorch/pull/63276.
The numerics test was meant to check LayerNorm behavior on large inputs,
but we deleted it without realizing that.
Test Plan: - wait for tests.
Reviewed By: ngimel
Differential Revision: D30702950
Pulled By: zou3519
fbshipit-source-id: a480e26c45ec38fb628938b70416cdb22d976a46
Summary:
Implements an orthogonal / unitary parametrisation.
It does pass the tests, and I have trained a couple of models with this implementation, so I believe it should be somewhat correct. Now, the implementation is very subtle. I'm tagging nikitaved and IvanYashchuk as reviewers in case they have comments / see some room for optimisation of the code, in particular of the `forward` function.
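A hedged usage sketch of the new parametrisation (assuming the `torch.nn.utils.parametrizations.orthogonal` entry point added here):
```python
import torch
import torch.nn as nn
from torch.nn.utils.parametrizations import orthogonal

layer = orthogonal(nn.Linear(5, 5))
Q = layer.weight
print(torch.allclose(Q.T @ Q, torch.eye(5), atol=1e-5))  # weight is (numerically) orthogonal
```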
Fixes https://github.com/pytorch/pytorch/issues/42243
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62089
Reviewed By: ezyang
Differential Revision: D30639063
Pulled By: albanD
fbshipit-source-id: 988664f333ac7a75ce71ba44c8d77b986dff2fe6
Summary:
Fixes https://github.com/pytorch/pytorch/issues/64039
There are two distinct problems here.
1. If `grad_output` is channels last but the input is not, then the input would be read as if it were channels last, i.e. the wrong values are read.
2. `use_channels_last_kernels` doesn't guarantee that `suggest_memory_format` will actually return channels last, so use `empty_like` instead so the strides always match.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64100
Reviewed By: mruberry
Differential Revision: D30622127
Pulled By: ngimel
fbshipit-source-id: e28cc57215596817f1432fcdd6c49d69acfedcf2
Summary:
I think the original intention here is to only take effect in the case of align_corners (because output_size = 1 and the divisor will be 0), but it affects non-align_corners too. For example:
```python
import numpy as np
import torch

input = torch.tensor(
    np.arange(1, 5, dtype=np.float32).reshape((1, 1, 2, 2)))  # float so bilinear interpolation is supported
m = torch.nn.Upsample(scale_factor=0.5, mode="bilinear")
of_out = m(input)
```
The expected result is [[[[2.5]]]],
but PyTorch returns [[[[1.0]]]], which differs from OpenCV and PIL; this PR tries to fix it.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61166
Reviewed By: malfet
Differential Revision: D30543178
Pulled By: heitorschueroff
fbshipit-source-id: 21a4035483981986b0ae4a401ef0efbc565ccaf1
Summary:
Fixes https://github.com/pytorch/pytorch/issues/62094
Introduces functionality for adding arbitrary objects to module state_dicts. To take advantage of this, the following functions can be defined on a module:
* `get_extra_state(self) -> dict` - Returns a dict defining any extra state this module wants to save
* `set_extra_state(self, state)` - Subsumes the given state within the module
Under the hood, a sub-dictionary is stored in the state_dict under the key `_extra_state` for each module that provides extra state.
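A minimal sketch of the hooks in use (hypothetical module, illustrative field name):
```python
import torch.nn as nn

class MyModule(nn.Module):
    def __init__(self):
        super().__init__()
        self.version = 3                      # arbitrary non-tensor state

    def get_extra_state(self):
        return {"version": self.version}      # saved under the '_extra_state' key

    def set_extra_state(self, state):
        self.version = state["version"]

m, m2 = MyModule(), MyModule()
m2.version = 0
m2.load_state_dict(m.state_dict())
assert m2.version == 3
```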
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62976
Reviewed By: heitorschueroff
Differential Revision: D30518657
Pulled By: jbschlosser
fbshipit-source-id: 5fb35ab8e3d36f35e3e96dcd4498f8c917d1f386
Summary:
Interestingly enough, the original code did have a mechanism that aims to prevent this very issue,
but it performs a clone AFTER modifying u and v in-place.
This wouldn't work though because we can later use the cloned u and v in operations that save for backward, and the next time we execute forward, we modify the same cloned u and v in-place.
So if the idea is that we want to avoid modifying saved variable in-place we should clone it BEFORE the in-place operation.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62293
Reviewed By: bdhirsh
Differential Revision: D30489750
Pulled By: soulitzer
fbshipit-source-id: cbe8dea885aef97adda8481f7a822e5bd91f7889
Summary:
As discussed in https://github.com/pytorch/pytorch/pull/62897, the BF16/non-last-dim Softmax path is missing the subtraction of the max value, which causes overflow in the `exp()` calculation when the input values are large, e.g. `1000.0`.
To avoid this issue, this PR adds the max-value subtraction and the corresponding test cases.
Note that without the max-value subtraction (e.g. due to accidental reverts or changes), the test case fails with the following error message:
```
AssertionError: False is not true : Tensors failed to compare as equal!With rtol=0.05 and atol=0.05, found 103984 element(s) (out of 126720) whose difference(s) exceeded the margin of error (including 103984 nan comparisons). The greatest difference was nan (0.0 vs. nan), which occurred at index (0, 0, 0, 1).
```
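For reference, the standard max-subtraction trick the fix restores, written as plain PyTorch (not the vectorized kernel):
```python
import torch

def softmax_ref(x, dim):
    x = x - x.amax(dim=dim, keepdim=True)   # subtract the max so exp() cannot overflow
    e = x.exp()
    return e / e.sum(dim=dim, keepdim=True)

x = torch.full((2, 3), 1000.0, dtype=torch.bfloat16)
print(softmax_ref(x, dim=0))                # finite values, no NaN/inf
```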
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63132
Reviewed By: VitalyFedyunin
Differential Revision: D30280792
Pulled By: cpuhrsch
fbshipit-source-id: 722821debf983bbb4fec878975fa8a4da0d1d866
Summary:
This PR fixes a part of https://github.com/pytorch/pytorch/issues/12013, which is summarized concretely in https://github.com/pytorch/pytorch/issues/38115.
This PR allows `MaxPool` and `AdaptiveMaxPool` to accept tensors whose batch size is 0. Some changes have been made to modernize the tests so that they show the name of the C++ function that throws an error.
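A hedged illustration of the newly accepted zero-batch inputs:
```python
import torch

pool = torch.nn.AdaptiveMaxPool2d((2, 2))
x = torch.randn(0, 3, 8, 8)        # batch size 0
print(pool(x).shape)                # torch.Size([0, 3, 2, 2])
```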
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62088
Reviewed By: bdhirsh
Differential Revision: D30281285
Pulled By: jbschlosser
fbshipit-source-id: 52bffc67bfe45a78e11e4706b62cce1469eba1b9
Summary: Skip the ROCm test for test_cudnn_convolution_relu.
Test Plan: This skips a test
Reviewed By: ngimel
Differential Revision: D30233620
fbshipit-source-id: 31eab8b03c3f15674e0d262a8f55965c1aa6b809
Summary:
Currently when cudnn_convolution_relu is passed a channels last Tensor it will return a contiguous Tensor. This PR changes this behavior and bases the output format on the input format.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62482
Reviewed By: ngimel
Differential Revision: D30049905
Pulled By: cpuhrsch
fbshipit-source-id: 98521d14ee03466e7128a1912b9f754ffe10b448
Summary:
Enable Gelu bf16/fp32 in the CPU path using the MKL-DNN implementation. Users don't need to call to_mkldnn() explicitly. The new Gelu fp32 performs better than the original one.
Add Gelu backward for https://github.com/pytorch/pytorch/pull/53615.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58525
Reviewed By: ejguan
Differential Revision: D29940369
Pulled By: ezyang
fbshipit-source-id: df9598262ec50e5d7f6e96490562aa1b116948bf
Summary:
Fixes https://github.com/pytorch/pytorch/issues/11959
Alternative approach to creating a new `CrossEntropyLossWithSoftLabels` class. This PR simply adds support for "soft targets" AKA class probabilities to the existing `CrossEntropyLoss` and `NLLLoss` classes.
Implementation is dumb and simple right now, but future work can add higher performance kernels for this case.
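A hedged example of the class-probability ("soft target") support added here:
```python
import torch
import torch.nn.functional as F

logits = torch.randn(4, 3)
soft_targets = torch.softmax(torch.randn(4, 3), dim=1)   # each row is a probability vector
loss = F.cross_entropy(logits, soft_targets)              # targets given as probabilities, not indices
print(loss)
```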
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61044
Reviewed By: zou3519
Differential Revision: D29876894
Pulled By: jbschlosser
fbshipit-source-id: 75629abd432284e10d4640173bc1b9be3c52af00
Summary:
Fixes Python part of https://github.com/pytorch/pytorch/issues/60747
Enhances the Python versions of `Transformer`, `TransformerEncoderLayer`, and `TransformerDecoderLayer` to support callables as their activation functions. The old way of specifying activation function still works as well.
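A hedged example of passing a callable activation:
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

layer = nn.TransformerEncoderLayer(d_model=16, nhead=4, activation=F.gelu)  # callable, not "gelu"
out = layer(torch.randn(5, 2, 16))
print(out.shape)   # torch.Size([5, 2, 16])
```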
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61355
Reviewed By: bdhirsh
Differential Revision: D29967302
Pulled By: jbschlosser
fbshipit-source-id: 8ee6f20083d49dcd3ab432a18e6ad64fe1e05705
Summary:
This PR enables the softmax calculation with the `bfloat16` data type when not along the last dim.
* Use bf16 specialization for forward calculation to reduce the bf16/fp32 cast in vec template.
* Remove the bf16 limitation for the backward calculation.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60371
Reviewed By: ejguan
Differential Revision: D29563109
Pulled By: cpuhrsch
fbshipit-source-id: f6b439fa3850a6c633f35db65ea3d735b747863e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62281
Closes gh-24646, Closes gh-24647
There is no `TensorIterator` equivalent to these kernels so this is just
migrating the existing kernels over to the ATen style.
I've benchmarked for contiguous tensors with this script:
```
import torch
shape = (10, 10, 100, 100)
x = torch.randn(*shape, device='cuda')
w = torch.randn((10, 1, 5, 5), device='cuda')
for _ in range(100):
    torch.nn.functional.conv2d(x, w, groups=10)
```
and similarly for backwards. I see these as the same to within measurement error.
|                   | Master (us) | This PR (us) |
|------------------:|:-------------------:|:--------------------:|
| Forward | 133.5 | 133.6 |
| Backward (input) | 1,102 | 1,119 |
| Backward (weight) | 2,220 | 2,217 |
Test Plan: Imported from OSS
Reviewed By: ejguan
Differential Revision: D29943062
Pulled By: ngimel
fbshipit-source-id: fc5d16496eb733743face7c5a14e532d7b8ee26a
Summary:
Part of the fix for https://github.com/pytorch/pytorch/issues/12013
Checks whether the inputs and outputs are non-empty in order to allow the Bilinear layer to accept a batch size of 0. The if-check covers both input and output dim sizes since the `_trilinear` function is written to work with both forward and backward for Bilinear.
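A hedged illustration of the zero-batch case now accepted:
```python
import torch

m = torch.nn.Bilinear(4, 5, 3)
x1, x2 = torch.randn(0, 4), torch.randn(0, 5)   # batch size 0
print(m(x1, x2).shape)                           # torch.Size([0, 3])
```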
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47106
Reviewed By: ejguan
Differential Revision: D29935589
Pulled By: jbschlosser
fbshipit-source-id: 607d3352bd4f88e2528c64408f04999960be049d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62006
Closes gh-24646, gh-24647
There is no `TensorIterator` equivalent to these kernels so this is just
migrating the existing kernels over to the ATen style.
I've benchmarked for contiguous tensors with this script:
```
import torch
shape = (10, 10, 100, 100)
x = torch.randn(*shape, device='cuda')
w = torch.randn((10, 1, 5, 5), device='cuda')
for _ in range(100):
    torch.nn.functional.conv2d(x, w, groups=10)
```
and similarly for backwards. I see these as the same to within measurement error.
|                   | Master (us) | This PR (us) |
|------------------:|:-------------------:|:--------------------:|
| Forward | 133.5 | 133.6 |
| Backward (input) | 1,102 | 1,119 |
| Backward (weight) | 2,220 | 2,217 |
Test Plan: Imported from OSS
Reviewed By: jbschlosser
Differential Revision: D29883676
Pulled By: ngimel
fbshipit-source-id: 9b2ac62cdd8a84e1a23ffcd66035b2b2fe2374d8
Summary:
Fixes https://github.com/pytorch/pytorch/issues/61924
The fused backward kernel was using the weight dtype to detect mixed-precision usage, but the weights can be None while `running_mean` and `running_var` are still mixed precision. So, I update the check to look at those variables as well.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61962
Reviewed By: albanD
Differential Revision: D29825516
Pulled By: ngimel
fbshipit-source-id: d087fbf3bed1762770cac46c0dcec30c03a86fda
Summary:
Fixes https://github.com/pytorch/pytorch/issues/58816
- enhance the backward of `nn.SmoothL1Loss` to allow integral `target`
- add test cases in `test_nn.py` to compare `input.grad` between the integral `target` and its floating-point counterpart.
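A hedged sketch of the case being tested (assuming the forward already accepts an integral target via type promotion):
```python
import torch
import torch.nn.functional as F

inp = torch.randn(5, requires_grad=True)
target = torch.randint(0, 10, (5,))            # integral target
loss_ref = F.smooth_l1_loss(inp, target.float())  # floating-point reference
loss_int = F.smooth_l1_loss(inp, target)           # integral-target path exercised here
loss_int.backward()
print(inp.grad)
```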
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61112
Reviewed By: mrshenli
Differential Revision: D29775660
Pulled By: albanD
fbshipit-source-id: 544eabb6ce1ea13e1e79f8f18c70f148e92be508
Summary:
Fixes https://github.com/pytorch/pytorch/issues/61242
The previous code wrongly checked whether a tensor is a buffer in a module by comparing values; the fix compares names instead.
Docs need some updating as well; the current plan is to defer that to a separate PR, but I'm happy to do it here as well if preferred.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61429
Reviewed By: gchanan
Differential Revision: D29712341
Pulled By: jbschlosser
fbshipit-source-id: 41f29ab746505e60f13de42a9053a6770a3aac22
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61584
add_relu does not work with broadcasting. This registers a scalar version of add_relu in native_functions that casts to a tensor before calling the regular function. TensorIterator handles broadcasting analogously to the existing add.
ghstack-source-id: 133480068
Test Plan: python3 test/test_nn.py TestAddRelu
Reviewed By: kimishpatel
Differential Revision: D29641768
fbshipit-source-id: 1b0ecfdb7eaf44afed83c9e9e74160493c048cbc
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60517
This fixes module support for LazyModuleMixin, addressing bug issue #60132.
See the link: https://github.com/pytorch/pytorch/issues/60132
We also have to update lazy_extension, given its dependency on module.py, and update the unit test as well.
Test Plan:
Unit test passes
torchrec test passes
Reviewed By: albanD
Differential Revision: D29274068
fbshipit-source-id: 1c20f7f0556e08dc1941457ed20c290868346980
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59987
Similar to GroupNorm, improve the numerical stability of LayerNorm via the Welford algorithm and pairwise summation.
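For reference, the textbook Welford update used for the running mean/variance (generic sketch, not the ATen kernel):
```python
def welford_update(count, mean, m2, new_value):
    # Numerically stable single-pass update of the mean and the sum of squared deviations.
    count += 1
    delta = new_value - mean
    mean += delta / count
    m2 += delta * (new_value - mean)
    return count, mean, m2          # variance = m2 / count
```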
Test Plan: buck test mode/dev-nosan //caffe2/test:nn -- "LayerNorm"
Reviewed By: ngimel
Differential Revision: D29115235
fbshipit-source-id: 5183346c3c535f809ec7d98b8bdf6d8914bfe790
Summary:
Fixes https://github.com/pytorch/pytorch/issues/24610
Aten Umbrella issue https://github.com/pytorch/pytorch/issues/24507
Related to https://github.com/pytorch/pytorch/issues/59765
The performance does not change between this PR and master with the following benchmark script:
<details>
<summary>Benchmark script</summary>
```python
import torch
import torch.nn as nn
import time
torch.manual_seed(0)
def _time():
    torch.cuda.synchronize()
    MS_PER_SECOND = 1000
    return time.perf_counter() * MS_PER_SECOND

device = "cuda"
C = 30
softmax = nn.LogSoftmax(dim=1)
n_runs = 250
for reduction in ["none", "mean", "sum"]:
    for N in [100_000, 500_000, 1_000_000]:
        fwd_t = 0
        bwd_t = 0
        data = torch.randn(N, C, device=device)
        target = torch.empty(N, dtype=torch.long, device=device).random_(0, C)
        loss = nn.NLLLoss(reduction=reduction)
        input = softmax(data)
        for i in range(n_runs):
            t1 = _time()
            result = loss(input, target)
            t2 = _time()
            fwd_t = fwd_t + (t2 - t1)
        fwd_avg = fwd_t / n_runs
        print(
            f"input size({N}, {C}), reduction: {reduction} "
            f"forward time is {fwd_avg:.2f} (ms)"
        )
    print()
```
</details>
## master
```
input size(100000, 30), reduction: none forward time is 0.02 (ms)
input size(500000, 30), reduction: none forward time is 0.08 (ms)
input size(1000000, 30), reduction: none forward time is 0.15 (ms)
input size(100000, 30), reduction: mean forward time is 1.81 (ms)
input size(500000, 30), reduction: mean forward time is 8.24 (ms)
input size(1000000, 30), reduction: mean forward time is 16.46 (ms)
input size(100000, 30), reduction: sum forward time is 1.66 (ms)
input size(500000, 30), reduction: sum forward time is 8.24 (ms)
input size(1000000, 30), reduction: sum forward time is 16.46 (ms)
```
## this PR
```
input size(100000, 30), reduction: none forward time is 0.02 (ms)
input size(500000, 30), reduction: none forward time is 0.08 (ms)
input size(1000000, 30), reduction: none forward time is 0.15 (ms)
input size(100000, 30), reduction: mean forward time is 1.80 (ms)
input size(500000, 30), reduction: mean forward time is 8.24 (ms)
input size(1000000, 30), reduction: mean forward time is 16.46 (ms)
input size(100000, 30), reduction: sum forward time is 1.66 (ms)
input size(500000, 30), reduction: sum forward time is 8.24 (ms)
input size(1000000, 30), reduction: sum forward time is 16.46 (ms)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60097
Reviewed By: mrshenli
Differential Revision: D29303099
Pulled By: ngimel
fbshipit-source-id: fc0d636543a79ea81158d286dcfb84043bec079a
Summary:
Before this change, it was implemented with the assumption that the number of groups, input channels, and output channels are the same, which is not always the case.
Extends the implementation to support any number of output channels as long as the number of groups equals the number of input channels (i.e. kernel.size(1) == 1).
Fixes https://github.com/pytorch/pytorch/issues/60176
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60460
Reviewed By: albanD
Differential Revision: D29299693
Pulled By: malfet
fbshipit-source-id: 31130c71ce86535ccfba2f4929eee3e2e287b2f0
Summary:
Fixes https://github.com/pytorch/pytorch/issues/50192
It has been discussed in the issue that RNN APIs currently do not support inputs with `seq_len=0`, and the error message does not reflect this clearly. This PR addresses the issue by adding a clearer error message stating that none of the RNN APIs (nn.RNN, nn.GRU and nn.LSTM) support `seq_len=0`, for either one-directional or bi-directional layers.
```
import torch
input_size = 5
hidden_size = 6
rnn = torch.nn.GRU(input_size, hidden_size)
for seq_len in reversed(range(4)):
    output, h_n = rnn(torch.zeros(seq_len, 10, input_size))
    print('{}, {}'.format(output.shape, h_n.shape))
```
Previously this gave the following output:
```
torch.Size([3, 10, 6]), torch.Size([1, 10, 6])
torch.Size([2, 10, 6]), torch.Size([1, 10, 6])
torch.Size([1, 10, 6]), torch.Size([1, 10, 6])
Traceback (most recent call last):
File "test.py", line 8, in <module>
output, h_n = rnn(torch.zeros(seq_len, 10, input_size))
File "/opt/miniconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/opt/miniconda3/lib/python3.8/site-packages/torch/nn/modules/rnn.py", line 739, in forward
result = _VF.gru(input, hx, self._flat_weights, self.bias, self.num_layers,
RuntimeError: stack expects a non-empty TensorList
```
However, with this PR, the error message changes for any combination of
[RNN, GRU and LSTM] x [one-directional, bi-directional].
Let's illustrate the change with the following code snippet:
```
import torch
input_size = 5
hidden_size = 6
rnn = torch.nn.LSTM(input_size, hidden_size, bidirectional=True)
output, h_n = rnn(torch.zeros(0, 10, input_size))
```
gives the following output:
```
Traceback (most recent call last):
File "<stdin>", line 2, in <module>
File "/fsx/users/iramazanli/pytorch/torch/nn/modules/module.py", line 1054, in _call_impl
return forward_call(*input, **kwargs)
File "/fsx/users/iramazanli/pytorch/torch/nn/modules/rnn.py", line 837, in forward
result = _VF.gru(input, hx, self._flat_weights, self.bias, self.num_layers,
RuntimeError: Expected sequence length to be larger than 0 in RNN
```
***********************************
The change for PackedSequence did not seem necessary because, as the following code snippet shows, the error message is already clear about the issue:
```
import torch
import torch.nn.utils.rnn as rnn_utils
import torch.nn as nn
packed = rnn_utils.pack_sequence([])
```
returns:
```
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/fsx/users/iramazanli/pytorch/torch/nn/utils/rnn.py", line 398, in pack_sequence
return pack_padded_sequence(pad_sequence(sequences), lengths, enforce_sorted=enforce_sorted)
File "/fsx/users/iramazanli/pytorch/torch/nn/utils/rnn.py", line 363, in pad_sequence
return torch._C._nn.pad_sequence(sequences, batch_first, padding_value)
RuntimeError: received an empty list of sequences
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60269
Reviewed By: mrshenli
Differential Revision: D29299914
Pulled By: iramazanli
fbshipit-source-id: 5ca98faa28d4e6a5a2f7600a30049de384a3b132
Summary:
Partially addresses https://github.com/pytorch/pytorch/issues/49825 by improving the testing
- Rename some of the old tests that had "inplace_view" in their names, but actually mean "inplace_[update_]on_view" so there is no confusion with the naming
- Adds some tests in test_view_ops that verify basic behavior
- Add tests that creation meta is properly handled for no-grad, multi-output, and custom function cases
- Add a test that verifies that in the cross-dtype view case, the inplace views won't be accounted for in the backward graph on rebase, as mentioned in the issue.
- Update inference mode tests to also check in-place
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59891
Reviewed By: albanD
Differential Revision: D29272546
Pulled By: soulitzer
fbshipit-source-id: b12acf5f0e3f788167ebe268423cdb58481b56f6
Summary:
Fixes https://github.com/pytorch/pytorch/issues/27655
This PR adds a C++ and Python version of ReflectionPad3d with structured kernels. The implementation uses lambdas extensively to better share code from the backward and forward pass.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59791
Reviewed By: gchanan
Differential Revision: D29242015
Pulled By: jbschlosser
fbshipit-source-id: 18e692d3b49b74082be09f373fc95fb7891e1b56
Summary:
Following https://github.com/pytorch/pytorch/issues/59624 I observed some straggling failing tests on Ampere due to TF32 thresholds. This PR just twiddles some more thresholds to fix the (6) failing tests I saw on A100.
CC Flamefire ptrblck ngimel
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60209
Reviewed By: gchanan
Differential Revision: D29220508
Pulled By: ngimel
fbshipit-source-id: 7c83187a246e1b3a24b181334117c0ccf2baf311
Summary:
Makes it possible for the first registered parametrization to depend on several parameters rather than just one. Examples of these types of parametrizations are `torch.nn.utils.weight_norm` and low-rank parametrizations via the multiplication of an `n x k` tensor by a `k x m` tensor with `k <= m, n`.
Follows the plan outlined in https://github.com/pytorch/pytorch/pull/33344#issuecomment-768574924. A short summary of the idea is: we call `right_inverse` when registering a parametrization to generate the tensors that we are going to save. If `right_inverse` returns a sequence of tensors, then we save them as `original0`, `original1`... If it returns a `Tensor` or a sequence of length 1, we save it as `original`.
We only allow many-to-one parametrizations for the first parametrization registered; subsequent parametrizations need to be one-to-one.
There were a number of choices in the implementation:
If the `right_inverse` returns a sequence of parameters, then we unpack it in the forward. This allows writing code such as:
```python
class Sum(nn.Module):
    def forward(self, X, Y):
        return X + Y

    def right_inverse(self, Z):
        return Z, torch.zeros_like(Z)
```
rather than having to manually unpack a list or a tuple within the `forward` function.
At the moment the errors are a bit all over the place. This is to avoid having to check some properties of `forward` and `right_inverse` when they are registered. I left this like this for now, but I believe it'd be better to call these functions when they are registered to make sure the invariants hold and throw errors as soon as possible.
The invariants are the following:
1. The following code should be well-formed
```python
X = module.weight
Y = param.right_inverse(X)
assert isinstance(Y, Tensor) or isinstance(Y, collections.Sequence)
Z = param(Y) if isinstance(Y, Tensor) else param(*Y)
```
in other words, if `Y` is a `Sequence` of `Tensor`s (we also check that the elements of the sequence are Tensors), then it has the same length as the number of parameters `param.forward` accepts.
2. Always: `X.dtype == Z.dtype and X.shape == Z.shape`. This is to protect the user from shooting themselves in the foot, as it's too odd for a parametrization to change the metadata of a tensor.
3. If it's one-to-one: `X.dtype == Y.dtype`. This is to be able to do `X.set_(Y)` so that if a user first instantiates the optimiser and then puts the parametrisation, then we reuse `X` and the user does not need to add a new parameter to the optimiser. Alas, this is not possible when the parametrisation is many-to-one. The current implementation of `spectral_norm` and `weight_norm` does not seem to care about this, so this would not be a regression. I left a warning in the documentation though, as this case is a bit tricky.
I still need to go over the formatting of the documentation; I'll do that tomorrow.
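A hedged end-to-end sketch of registering the `Sum` example above as a many-to-one parametrization (the `original0` / `original1` names follow the convention described earlier):
```python
import torch
import torch.nn as nn
import torch.nn.utils.parametrize as parametrize

class Sum(nn.Module):
    def forward(self, X, Y):
        return X + Y

    def right_inverse(self, Z):
        return Z, torch.zeros_like(Z)

lin = nn.Linear(3, 3)
parametrize.register_parametrization(lin, "weight", Sum())
p = lin.parametrizations.weight
print(p.original0.shape, p.original1.shape)   # the two saved tensors
print(lin.weight)                              # recomputed as original0 + original1
```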
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58488
Reviewed By: soulitzer
Differential Revision: D29100708
Pulled By: albanD
fbshipit-source-id: b9e91f439cf6b5b54d5fa210ec97c889efb9da38
Summary:
Implements a number of changes discussed with soulitzer offline.
In particular:
- Initialise `u`, `v` in `__init__` rather than in `_update_vectors`
- Initialise `u`, `v` to some reasonable vectors by doing 15 power iterations at the start
- Simplify the code of `_reshape_weight_to_matrix` (and make it faster) by using `flatten`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59564
Reviewed By: ailzhang
Differential Revision: D29066238
Pulled By: soulitzer
fbshipit-source-id: 6a58e39ddc7f2bf989ff44fb387ab408d4a1ce3d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58950
Use tensor iterator's API to set grain size in order to parallelize gelu op.
ghstack-source-id: 130947174
Test Plan: test_gelu
Reviewed By: ezyang
Differential Revision: D28689819
fbshipit-source-id: 0a02066d47a4d9648323c5ec27d7e0e91f4c303a
Summary:
Make sure tests run explicitly without TF32 don't use TF32 operations.
Fixes https://github.com/pytorch/pytorch/issues/52278
After the tf32 accuracy tolerance was increased to 0.05 this is the only remaining change required to fix the above issue (for TestNN.test_Conv3d_1x1x1_no_bias_cuda)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59624
Reviewed By: heitorschueroff
Differential Revision: D28996279
Pulled By: ngimel
fbshipit-source-id: 7f1b165fd52cfa0898a89190055b7a4b0985573a
Summary:
As per title. Resolves https://github.com/pytorch/pytorch/issues/56683.
`gradgradcheck` will fail once `target.requires_grad() == True` because of the limitations of the current double backward implementation.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59447
Reviewed By: agolynski
Differential Revision: D28910140
Pulled By: albanD
fbshipit-source-id: 20934880eb4d22bec34446a6d1be0a38ef95edc7
Summary:
This PR introduces a helper function named `torch.nn.utils.skip_init()` that accepts a module class object + `args` / `kwargs` and instantiates the module while skipping initialization of parameter / buffer values. See discussion at https://github.com/pytorch/pytorch/issues/29523 for more context. Example usage:
```python
import torch
m = torch.nn.utils.skip_init(torch.nn.Linear, 5, 1)
print(m.weight)
m2 = torch.nn.utils.skip_init(torch.nn.Linear, 5, 1, device='cuda')
print(m2.weight)
m3 = torch.nn.utils.skip_init(torch.nn.Linear, in_features=5, out_features=1)
print(m3.weight)
```
```
Parameter containing:
tensor([[-3.3011e+28, 4.5915e-41, -3.3009e+28, 4.5915e-41, 0.0000e+00]],
requires_grad=True)
Parameter containing:
tensor([[-2.5339e+27, 4.5915e-41, -2.5367e+27, 4.5915e-41, 0.0000e+00]],
device='cuda:0', requires_grad=True)
Parameter containing:
tensor([[1.4013e-45, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00]],
requires_grad=True)
```
Bikeshedding on the name / namespace is welcome, as well as comments on the design itself - just wanted to get something out there for discussion.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57555
Reviewed By: zou3519
Differential Revision: D28640613
Pulled By: jbschlosser
fbshipit-source-id: 5654f2e5af5530425ab7a9e357b6ba0d807e967f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48919
move data indexing utils
parallel inference contiguous path
parallel inference channels last path
add dim apply
optimize update stats
add channels last support for backward
Revert "add channels last support for backward"
This reverts commit cc5e29dce44395250f8e2abf9772f0b99f4bcf3a.
Revert "optimize update stats"
This reverts commit 7cc6540701448b9cfd5833e36c745b5015ae7643.
Revert "add dim apply"
This reverts commit b043786d8ef72dee5cf85b5818fcb25028896ecd.
bug fix
add batchnorm nhwc test for cpu, including C=1 and HW=1
Test Plan: Imported from OSS
Reviewed By: glaringlee
Differential Revision: D25399468
Pulled By: VitalyFedyunin
fbshipit-source-id: a4cd7a09cd4e1a8f5cdd79c7c32c696d0db386bd
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48918
enable test case on AvgPool2d channels last for CPU
Test Plan: Imported from OSS
Reviewed By: glaringlee
Differential Revision: D25399466
Pulled By: VitalyFedyunin
fbshipit-source-id: 9477b0c281c0de5ed981a97e2dcbe6072d7f0aef
Summary:
Adds a new file under `torch/nn/utils/parametrizations.py` which should contain all the parametrization implementations
For spectral_norm we add the `SpectralNorm` module which can be registered using `torch.nn.utils.parametrize.register_parametrization` or using a wrapper: `spectral_norm`, the same API the old implementation provided.
Most of the logic is borrowed from the old implementation:
- Just like the old implementation, there should be cases when retrieving the weight should perform another power iteration (thus updating the weight) and cases where it shouldn't. For example in eval mode (`self.training=False`), we do not perform power iteration.
There are also some differences/difficulties with the new implementation:
- Using the new parametrization functionality as-is, there doesn't seem to be a good way to tell whether a 'forward' call was the result of parametrizations being unregistered (with `leave_parametrizations=True`) or of the injected property's getter being invoked. The issue is that we want to perform power iteration in the latter case but not the former, but we don't have this control as-is. So, in this PR I modified the parametrization functionality to change the module to eval mode before triggering their forward call
- Updates the vectors based on the weight on initialization to fix https://github.com/pytorch/pytorch/issues/51800 (this avoids silently updating weights in eval mode). This also means that we perform twice as many power iterations by the first forward.
- right_inverse is just the identity for now, but maybe it should assert that the passed value already satisfies the constraints
- So far, all the old spectral_norm tests have been cloned, but maybe we don't need so much testing now that the core functionality is already well tested
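For reference, a minimal usage sketch of the registration described above (hedged; it assumes the `spectral_norm` wrapper lives under `torch.nn.utils.parametrizations`):
```python
import torch
from torch.nn.utils import parametrize
from torch.nn.utils.parametrizations import spectral_norm

m = spectral_norm(torch.nn.Linear(5, 5))         # wrapper with the same API as the old one
print(parametrize.is_parametrized(m, "weight"))  # True: weight is now recomputed on access
with torch.no_grad():
    # after the initial power iterations the spectral norm should be roughly 1
    print(torch.linalg.norm(m.weight, ord=2))
```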
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57784
Reviewed By: ejguan
Differential Revision: D28413201
Pulled By: soulitzer
fbshipit-source-id: e8f1140f7924ca43ae4244c98b152c3c554668f2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55189
Currently EmbeddingBag and its variants support either int32 or int64 indices/offsets. We have use cases with a mix of int32 and int64 indices, which is not supported yet. To avoid introducing too many branches, we simply cast the offsets type to the indices type when they are not the same.
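A hedged usage sketch of the behaviour described above (the cast happens internally; shapes and values here are just for illustration):
```python
import torch
import torch.nn.functional as F

weight = torch.randn(10, 3)
indices = torch.tensor([1, 2, 4, 5, 4, 3], dtype=torch.int32)
offsets = torch.tensor([0, 3], dtype=torch.int64)  # different integer dtype than indices
# per this change, the int64 offsets are expected to be cast to the indices dtype internally
out = F.embedding_bag(indices, weight, offsets)
print(out.shape)  # two bags of dimension 3 -> torch.Size([2, 3])
```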
Test Plan: unit tests
Reviewed By: allwu
Differential Revision: D27482738
fbshipit-source-id: deeadd391d49ff65d17d016092df1839b82806cc
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57558
Fixes #53359
If someone directly saves an nn.LSTM in PyTorch 1.7 and then loads it in PyTorch
1.8, it errors out with the following:
```
(In PyTorch 1.7)
import torch
model = torch.nn.LSTM(2, 3)
torch.save(model, 'lstm17.pt')
(In PyTorch 1.8)
model = torch.load('lstm17.pt')
AttributeError: 'LSTM' object has no attribute 'proj_size'
```
Although we do not officially support this (directly saving modules via
torch.save), it used to work and the fix is very simple. This PR adds an
extra line to `__setstate__`: if the state we are passed does not have
a `proj_size` attribute, we assume it was saved from PyTorch 1.7 and
older and set `proj_size` equal to 0.
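Conceptually the fix amounts to something like the following (a simplified, hypothetical sketch, not the actual `nn.LSTM` code):
```python
# hypothetical stand-in for the real nn.LSTM __setstate__, just to illustrate the fallback
class RNNLike:
    def __setstate__(self, state):
        self.__dict__.update(state)
        # checkpoints from PyTorch 1.7 and older predate proj_size
        if 'proj_size' not in self.__dict__:
            self.proj_size = 0

old_state = {'input_size': 2, 'hidden_size': 3}   # no 'proj_size' key, as in a 1.7 checkpoint
m = RNNLike.__new__(RNNLike)
m.__setstate__(old_state)
print(m.proj_size)   # 0
```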
Test Plan:
I wrote a test that tests `__setstate__`. But also,
Run the following:
```
(In PyTorch 1.7)
import torch
x = torch.ones(32, 5, 2)
model = torch.nn.LSTM(2, 3)
torch.save(model, 'lstm17.pt')
y17 = model(x)
(Using this PR)
model = torch.load('lstm17.pt')
x = torch.ones(32, 5, 2)
y18 = model(x)
```
and finally compare y17 and y18.
Reviewed By: mrshenli
Differential Revision: D28198477
Pulled By: zou3519
fbshipit-source-id: e107d1ebdda23a195a1c3574de32a444eeb16191
Summary:
Fix a numerical issue of CUDA channels-last SyncBatchNorm
The added test is a repro for the numerical issue. Thanks for the help from jjsjann123 who identified the root cause. Since pytorch SBN channels-last code was migrated from [nvidia/apex](https://github.com/nvidia/apex), apex SBN channels-last also has this issue. We will submit a fix there soon.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57077
Reviewed By: mruberry
Differential Revision: D28107672
Pulled By: ngimel
fbshipit-source-id: 0c80e79ddb48891058414ad8a9bedd80f0f7f8df
Summary:
Fixes https://github.com/pytorch/pytorch/issues/45687
Fix changes the input size check for `InstanceNorm*d` to be more restrictive and correctly reject sizes with only a single spatial element, regardless of batch size, to avoid infinite variance.
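A hedged illustration of the stricter check (assuming it surfaces as a `ValueError`):
```python
import torch

m = torch.nn.InstanceNorm1d(3)
x = torch.randn(5, 3, 1)      # any batch size, but only a single spatial element per channel
try:
    m(x)
except ValueError as err:
    print(err)                 # with this fix, an error is expected: variance is undefined
```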
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56659
Reviewed By: pbelevich
Differential Revision: D27948060
Pulled By: jbschlosser
fbshipit-source-id: 21cfea391a609c0774568b89fd241efea72516bb
Summary:
Fixes https://github.com/pytorch/pytorch/issues/56380
BC-breaking note:
This changes the behavior of full backward hooks as they will now fire properly even if no input to the Module require gradients.
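A small, hedged sketch of the case that now works:
```python
import torch

def hook(module, grad_input, grad_output):
    print("full backward hook fired")

m = torch.nn.Linear(3, 1)
m.register_full_backward_hook(hook)
x = torch.randn(2, 3)          # note: the input does NOT require grad
m(x).sum().backward()          # with this change the hook is expected to fire anyway
```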
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56693
Reviewed By: ezyang
Differential Revision: D27947030
Pulled By: albanD
fbshipit-source-id: e8353d769ba5a2c1b6bdf3b64e2d61308cf624a2
Summary:
Fixes https://github.com/pytorch/pytorch/issues/55587
The fix converts the binary `TensorIterator` used by softplus backwards to a ternary one, adding in the original input for comparison against `beta * threshold`.
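For intuition, a rough sketch of the gradient rule that now needs the original input (an approximation, not the actual kernel):
```python
import torch

def softplus_backward_sketch(grad_output, x, beta=1.0, threshold=20.0):
    # d/dx softplus(x) = sigmoid(beta * x); above the threshold softplus is treated as identity
    grad = grad_output * torch.sigmoid(beta * x)
    return torch.where(beta * x > threshold, grad_output, grad)
```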
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56484
Reviewed By: malfet
Differential Revision: D27908372
Pulled By: jbschlosser
fbshipit-source-id: 73323880a5672e0242879690514a17886cbc29cd
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55237
In this PR, we reenable fast-gradcheck and resolve misc issues that arise:
Before landing this PR, land #55182 so that slow tests are still being run periodically.
Bolded indicates the issue is handled in this PR, otherwise it is handled in a previous PR.
**Non-determinism issues**:
- ops that do not have deterministic implementation (as documented https://pytorch.org/docs/stable/generated/torch.use_deterministic_algorithms.html#torch.use_deterministic_algorithms)
- test_pad_cuda (replication_pad2d) (test_nn)
- interpolate (test_nn)
- cummin, cummax (scatter_add_cuda_kernel) (test_ops)
- test_fn_gradgrad_prod_cpu_float64 (test_ops)
Randomness:
- RRelu (new module tests) - we fix by using our own generator as to avoid messing with user RNG state (handled in #54480)
Numerical precision issues:
- jacobian mismatch: test_gelu (test_nn, float32, not able to replicate locally) - we fixed this by disabling for float32 (handled in previous PR)
- cholesky_solve (test_linalg): #56235 handled in previous PR
- **cumprod** (test_ops) - #56275 disabled fast gradcheck
Not yet replicated:
- test_relaxed_one_hot_categorical_2d (test_distributions)
Test Plan: Imported from OSS
Reviewed By: albanD
Differential Revision: D27920906
fbshipit-source-id: 894dd7bf20b74f1a91a5bc24fe56794b4ee24656
Summary:
Fixes https://github.com/pytorch/pytorch/issues/53964. cc albanD almson
## Major changes:
- Overhauled the actual loss calculation so that the shapes are now correct (in functional.py)
- added the missing doc in nn.functional.rst
## Minor changes (in functional.py):
- I removed the previous check on whether input and target were the same shape. This is to allow for broadcasting, say when you have 10 predictions that all have the same target.
- I added some comments to explain each shape check in detail. Let me know if these should be shortened/cut.
Screenshots of updated docs attached.
Let me know what you think, thanks!
## Edit: Description of change of behaviour (affecting BC):
The backwards-compatibility is only affected for the `reduction='none'` mode. This was the source of the bug. For tensors with size (N, D), the old returned loss had size (N), as incorrect summation was happening. It will now have size (N, D) as expected.
### Example
Define input tensors, all with size (2, 3).
`input = torch.tensor([[0., 1., 3.], [2., 4., 0.]], requires_grad=True)`
`target = torch.tensor([[1., 4., 2.], [-1., 2., 3.]])`
`var = 2*torch.ones(size=(2, 3), requires_grad=True)`
Initialise loss with reduction mode 'none'. We expect the returned loss to have the same size as the input tensors, (2, 3).
`loss = torch.nn.GaussianNLLLoss(reduction='none')`
Old behaviour:
`print(loss(input, target, var)) `
`# Gives tensor([3.7897, 6.5397], grad_fn=<MulBackward0>). This has size (2).`
New behaviour:
`print(loss(input, target, var)) `
`# Gives tensor([[0.5966, 2.5966, 0.5966], [2.5966, 1.3466, 2.5966]], grad_fn=<MulBackward0>)`
`# This has the expected size, (2, 3).`
To recover the old behaviour, sum along all dimensions except for the 0th:
`print(loss(input, target, var).sum(dim=1))`
`# Gives tensor([3.7897, 6.5397], grad_fn=<SumBackward1>).`


Pull Request resolved: https://github.com/pytorch/pytorch/pull/56469
Reviewed By: jbschlosser, agolynski
Differential Revision: D27894170
Pulled By: albanD
fbshipit-source-id: 197890189c97c22109491c47f469336b5b03a23f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54812
Needed for quantization since different attributes might refer to the same module instance
Test Plan: Imported from OSS
Reviewed By: vkuzo
Differential Revision: D27408376
fbshipit-source-id: cada85c4a1772d3dd9502c3f6f9a56d690d527e7
Summary:
Fixes https://github.com/pytorch/pytorch/issues/25100, #43112
EDIT: pardon my inexperience since this is my first PR here; I did not realize the doc should not have any trailing white spaces, and that `[E712] comparison to False should be 'if cond is False:' or 'if not cond:'`; both are now fixed.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55285
Reviewed By: mruberry
Differential Revision: D27765694
Pulled By: jbschlosser
fbshipit-source-id: c34774fa065d67c0ac130de20a54e66e608bdbf4
Summary:
This PR adds a `padding_idx` parameter to `nn.EmbeddingBag` and `nn.functional.embedding_bag`. As with `nn.Embedding`'s `padding_idx` argument, if an embedding's index is equal to `padding_idx` it is ignored, so it is not included in the reduction.
This PR does not add support for `padding_idx` for quantized or ONNX `EmbeddingBag` for opset10/11 (opset9 is supported). In these cases, an error is thrown if `padding_idx` is provided.
Fixes https://github.com/pytorch/pytorch/issues/3194
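A short, hedged usage sketch of the new argument:
```python
import torch

bag = torch.nn.EmbeddingBag(10, 3, mode='sum', padding_idx=0)
indices = torch.tensor([0, 2, 0, 4])   # the zeros are padding entries and are skipped
offsets = torch.tensor([0, 2])
out = bag(indices, offsets)            # shape (2, 3); each bag reduces only non-padding rows
```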
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49237
Reviewed By: walterddr, VitalyFedyunin
Differential Revision: D26948258
Pulled By: jbschlosser
fbshipit-source-id: 3ca672f7e768941f3261ab405fc7597c97ce3dfc
Summary:
Fixes https://github.com/pytorch/pytorch/issues/25100, #43112
EDIT: pardon my inexperience since this is my first PR here; I did not realize the doc should not have any trailing white spaces, and that `[E712] comparison to False should be 'if cond is False:' or 'if not cond:'`; both are now fixed.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55285
Reviewed By: ngimel
Differential Revision: D27710107
Pulled By: jbschlosser
fbshipit-source-id: c4363a4604548c0d84628c4997dd23d6b3afb4d9
Summary:
This PR adds the functionality to use channels_last_3d, aka NDHWC, in Conv3d. It's only enabled when the cuDNN version is greater than or equal to 8.0.5.
Todo:
- [x] add memory_format test
- [x] add random shapes functionality test
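A hedged usage sketch of the feature described above (requires a CUDA build with cuDNN >= 8.0.5):
```python
import torch

if torch.cuda.is_available():   # plus the cuDNN version requirement noted above
    conv = torch.nn.Conv3d(8, 16, kernel_size=3).cuda().to(memory_format=torch.channels_last_3d)
    x = torch.randn(2, 8, 8, 16, 16, device='cuda').to(memory_format=torch.channels_last_3d)
    out = conv(x)
    print(out.is_contiguous(memory_format=torch.channels_last_3d))
```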
Close https://github.com/pytorch/pytorch/pull/52547
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48430
Reviewed By: mrshenli
Differential Revision: D27641452
Pulled By: ezyang
fbshipit-source-id: 0e98957cf30c50c3390903d307dd43bdafd28880
Summary:
There was an error when removing a parametrization with `leave_parametrized=True`. It had escaped the previous tests. This PR should fix that.
**Edit.**
I also took this chance to fix a few mistakes that the documentation had, and to also write the `set_original_` in a more compact way.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55456
Reviewed By: mrshenli
Differential Revision: D27620481
Pulled By: albanD
fbshipit-source-id: f1298ddbcf24566ef48850c62a1eb4d8a3576152
Summary:
Non-backwards-compatible change introduced in https://github.com/pytorch/pytorch/pull/53843 is tripping up a lot of code. Better to set it to False initially and then potentially flip to True in the later version to give people time to adapt.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55169
Reviewed By: mruberry
Differential Revision: D27511150
Pulled By: jbschlosser
fbshipit-source-id: 1ac018557c0900b31995c29f04aea060a27bc525
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48917
max_pool2d channels last support forward path
max_pool2d channels last support backward path
vectorize channels last forward path
rename the header file
fix windows build
combine PoolingKernel.h into Pool.h
add data type check
loosen test_max_pool2d_nhwc to cover device CPU
Test Plan: Imported from OSS
Reviewed By: glaringlee
Differential Revision: D25399470
Pulled By: VitalyFedyunin
fbshipit-source-id: b49b9581f1329a8c2b9c75bb10f12e2650e4c65a
Summary:
This PR enables using MIOpen for RNN FP16 on ROCM.
It does this by altering use_miopen to allow fp16. In the special case where LSTMs use projections we use the default implementation, as it is not implemented in MIOpen at this time. We do send out a warning once to let the user know.
We then remove the various asserts that are no longer necessary since we handle the case.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52475
Reviewed By: H-Huang
Differential Revision: D27449150
Pulled By: malfet
fbshipit-source-id: 06499adb94f28d4aad73fa52890d6ba361937ea6
Summary:
Skips the tests indicated as failing in https://github.com/pytorch/pytorch/issues/54535.
During the ROCm CI upgrade from 4.0.1 to 4.1, some tests regressed. Specifically, FFT tests in test_spectral_ops.py and test_grid_sample in test_nn.py. In order to keep a passing CI signal, we need to disable these temporarily.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54536
Reviewed By: H-Huang
Differential Revision: D27442974
Pulled By: malfet
fbshipit-source-id: 07dffb957757a5fc7afaa5bf78b935a427251ef4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54901
Some subtleties:
- Need to make sure not to clobber composite definitions when deciding when to generate
- I was lazy and so I didn't make inplace on TensorList work, nor did I make inplace functions that returned void work
- A few tests started complaining that these noop meta functions weren't raising the errors they needed. This is tracked in https://github.com/pytorch/pytorch/issues/54897
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: jbschlosser
Differential Revision: D27407232
Pulled By: ezyang
fbshipit-source-id: 5e706a267496368acdafd128942c310954e43d29
Summary:
Fixes https://github.com/pytorch/pytorch/issues/54452
The assertion that fails in the issue is necessary to appease mypy. Instead, I fix `_ntuple` to always return a `tuple`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54911
Reviewed By: H-Huang
Differential Revision: D27411088
Pulled By: jbschlosser
fbshipit-source-id: 7f5045c58dd4f5f3b07b4826d9b4ca85606c5bce
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53655
Currently EmbeddingBag and its variants support either int32 or int64 indices/offsets. We have use cases with a mix of int32 and int64 indices, which is not supported yet. To avoid introducing too many branches, we simply cast the offsets type to the indices type when they are not the same.
Test Plan: unit tests
Reviewed By: qizzzh
Differential Revision: D26820202
fbshipit-source-id: 3e8f09523329ea12393ea92ee9a6315aa40a0b7f
Summary:
**BC-breaking note**: This change throws errors for cases that used to silently pass. The old behavior can be obtained by setting `error_if_nonfinite=False`
Fixes https://github.com/pytorch/pytorch/issues/46849
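For illustration, a hedged sketch of opting into the old behaviour via the new flag:
```python
import torch
from torch.nn.utils import clip_grad_norm_

m = torch.nn.Linear(3, 1)
m(torch.randn(2, 3)).sum().backward()
# error_if_nonfinite=True raises if the total gradient norm is nan/inf;
# passing False keeps the previous, silent behaviour
clip_grad_norm_(m.parameters(), max_norm=1.0, error_if_nonfinite=False)
```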
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53843
Reviewed By: malfet
Differential Revision: D27291838
Pulled By: jbschlosser
fbshipit-source-id: 216d191b26e1b5919a44a3af5cde6f35baf825c4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54744
Fixes https://github.com/pytorch/pytorch/issues/54590
After the porting the upsample operators to be structured, they now forward memory_format information to the output. This is a problem for the cuda kernels, which are not implemented to deal with `torch.channels_last` memory format. The operators are:
* upsample_nearest2d
* upsample_bilinear2d
* upsample_nearest3d
* upsample_trilinear3d
This fix just allocates a temporary, contiguous output tensor when that happens, writes the results to the temporary and copies the results back to the output tensor.
I held off on adding tests to get the fix out quickly, but I wrote a script and ran some manual tests, that basically just asserts that the outputs are the same for cpu and cuda, for some threshold. I ran it for all 4 operators:
```
import torch
def basically_equal(t1, t2):
    epsilon = 1e-4
    diffs = torch.abs(t1 - t2)
    print(torch.all(diffs < epsilon))
# upsample 2d
a = torch.arange(48).reshape(2, 2, 3, 4).contiguous(memory_format=torch.channels_last).float()
out_cpu = torch.nn.functional.interpolate(a, scale_factor=2, mode='nearest')
out_cuda = torch.nn.functional.interpolate(a.to('cuda'), scale_factor=2, mode='nearest')
basically_equal(out_cpu, out_cuda.to("cpu"))
out_cpu = torch.nn.functional.interpolate(a, scale_factor=2, mode='bilinear', align_corners=True)
out_cuda = torch.nn.functional.interpolate(a.to('cuda'), scale_factor=2, mode='bilinear', align_corners=True)
basically_equal(out_cpu, out_cuda.to("cpu"))
# upsample 3d
a = torch.arange(96).reshape(2, 2, 2, 3, 4).contiguous(memory_format=torch.channels_last_3d).float()
out_cpu = torch.nn.functional.interpolate(a, scale_factor=3, mode='nearest')
out_cuda = torch.nn.functional.interpolate(a.to('cuda'), scale_factor=3, mode='nearest')
basically_equal(out_cpu, out_cuda.to("cpu"))
out_cpu = torch.nn.functional.interpolate(a, scale_factor=3, mode='trilinear', align_corners=True)
out_cuda = torch.nn.functional.interpolate(a.to('cuda'), scale_factor=3, mode='trilinear', align_corners=True)
basically_equal(out_cpu, out_cuda.to("cpu"))
```
prints
```
tensor(True)
tensor(True)
tensor(True)
tensor(True)
```
One thing that was weird: `upsample_bilinear2d` and `upsample_trilinear3d` were only accurate across cpu/cuda with an epsilon of `1e-4`. That tentatively sounds close enough to say that cuda isn't "wrong" (?), but that's not exactly "equal"... and I also ran the script before my change, and `bilinear2d` and `trilinear3d` were also the same across cpu/cuda with an epsilon of `1e-4`.
Test Plan: Imported from OSS
Reviewed By: ezyang
Differential Revision: D27351393
Pulled By: bdhirsh
fbshipit-source-id: b33f46e4855dc8b49b363770190b639beebbf5a7
Summary:
The fallback thnn 2d convolution uses `im2col` to get patches and `gemm` to implement convolution.
It has a shortcut to use `gemm` directly for kernel size 1, but this only works for stride == 1 and padding == 0.
This PR adds checks for stride == 1 and padding == 0 when determining whether `im2col` can be skipped.
Fixes https://github.com/pytorch/pytorch/issues/54036
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54080
Reviewed By: ejguan
Differential Revision: D27170482
Pulled By: zou3519
fbshipit-source-id: 055d6502239d34945934de409d78144d8a5c56f4
Summary:
Also modify the `tf32_on_and_off` decorator to make it support functions without a `device` argument.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52871
Reviewed By: ngimel
Differential Revision: D27286674
Pulled By: mruberry
fbshipit-source-id: 14f6d558271bd6a1d0bc40691c170d47e81de1ff
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45667
First part of #3867 (Pooling operators still to do)
This adds a `padding='same'` mode to the interface of `conv{n}d` and `nn.Conv{n}d`. This should match the behaviour of `tensorflow`. I couldn't find it explicitly documented but through experimentation I found `tensorflow` returns the shape `ceil(len/stride)` and always adds any extra asymmetric padding onto the right side of the input.
Since the `native_functions.yaml` schema doesn't seem to support strings or enums, I've moved the function interface into python and it now dispatches between the numerically padded `conv{n}d` and the `_conv{n}d_same` variant. Underscores because I couldn't see any way to avoid exporting a function into the `torch` namespace.
A note on asymmetric padding. The total padding required can be odd if both the kernel-length is even and the dilation is odd. mkldnn has native support for asymmetric padding, so there is no overhead there, but for other backends I resort to padding the input tensor by 1 on the right hand side to make the remaining padding symmetrical. In these cases, I use `TORCH_WARN_ONCE` to notify the user of the performance implications.
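A brief, hedged usage sketch of the new mode (stride 1, which is the straightforward case):
```python
import torch
import torch.nn.functional as F

x = torch.randn(1, 3, 10)
conv = torch.nn.Conv1d(3, 8, kernel_size=3, padding='same')
print(conv(x).shape)                         # spatial size preserved: torch.Size([1, 8, 10])

w = torch.randn(8, 3, 4)                     # even kernel size -> the asymmetric padding case
print(F.conv1d(x, w, padding='same').shape)  # torch.Size([1, 8, 10])
```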
Test Plan: Imported from OSS
Reviewed By: ejguan
Differential Revision: D27170744
Pulled By: jbschlosser
fbshipit-source-id: b3d8a0380e0787ae781f2e5d8ee365a7bfd49f22
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53665
ngimel pointed out to me where we already test the behavior of the `Upsample` ops in `test_nn.py`. This PR deletes my bespoke tests in `test_torch.py` and updates those in `test_nn.py` to test memory format properly.
There were two reasons the original test didn't pick up on a memory format regression:
- They didn't test the memory format of the output tensor explicitly, i.e. `output.is_contiguous(memory_format=...)`
- Even with that change, the test tensors were too simple to fail the tests. From some trial and error, it looks like one of the first two dimensions in the inputs needs to be > 1 in order for the `channels_last` memory format to actually re-order the strides.
Test Plan: Imported from OSS
Reviewed By: ngimel
Differential Revision: D26929683
Pulled By: bdhirsh
fbshipit-source-id: d17bc660ff031e9b3e2c93c60a9e9308e56ea612
Summary:
Provides the implementation for feature request issue https://github.com/pytorch/pytorch/issues/28937.
Adds the `Parametrization` functionality and implements `Pruning` on top of it.
It adds the `auto` mode, in which the parametrization is computed just once per forward pass. The previous implementation computed the pruning on every forward, which is not optimal when pruning RNNs for example.
It implements a caching mechanism for parameters. This is implemented through the mechanism proposed at the end of the discussion https://github.com/pytorch/pytorch/issues/7313. In particular, it assumes that the user will not manually change the updated parameters between the call to `backward()` and the `optimizer.step()`. If they do so, they would need to manually call the `.invalidate()` function provided in the implementation. This could be made into a function that gets a model and invalidates all the parameters in it. It might be the case that this function has to be called in the `.cuda()`, `.to()` and related functions.
As described in https://github.com/pytorch/pytorch/issues/7313, this could be used to implement the `weight_norm` and `spectral_norm` functions in a cleaner way. It also allows, as described in https://github.com/pytorch/pytorch/issues/28937, for the implementation of constrained optimization on manifolds (i.e. orthogonal constraints, positive definite matrices, invertible matrices, weights on the sphere or the hyperbolic space...)
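For illustration, a hedged sketch written against the `torch.nn.utils.parametrize` API that this work evolved into (the names used in this PR, e.g. `.invalidate()`, may differ):
```python
import torch
from torch.nn.utils import parametrize

class Symmetric(torch.nn.Module):
    def forward(self, X):
        # constrain a square weight to be symmetric
        return X.triu() + X.triu(1).transpose(-1, -2)

m = torch.nn.Linear(4, 4)
parametrize.register_parametrization(m, "weight", Symmetric())
with parametrize.cached():          # compute the parametrized weight once and reuse it
    y = m(torch.randn(2, 4)) + m(torch.randn(2, 4))
```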
TODO (when implementation is validated):
- More thorough test
- Documentation
Resolves https://github.com/pytorch/pytorch/issues/28937
albanD
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33344
Reviewed By: zhangguanheng66
Differential Revision: D26816708
Pulled By: albanD
fbshipit-source-id: 07c8f0da661f74e919767eae31335a9c60d9e8fe
Summary:
Fixes https://github.com/pytorch/pytorch/issues/38137
As mentioned in the issue, this is a workaround for [python issue 43367](https://bugs.python.org/issue43367). There are a number of other places where `sys.modules` is modified, if something changes in python perhaps those should be reviewed as well.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53107
Reviewed By: zou3519
Differential Revision: D26753571
Pulled By: ezyang
fbshipit-source-id: 2bda03bab39ff9ca58ce4bc13befe021da91b9c4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52671
Code is written with the assumption that new_size is an unsigned value,
and when the function is called with a negative value it silently returns a nullptr rather than raising an exception.
Fix the above-mentioned logic by converting new_size to an unsigned type and letting cpu_allocator raise an exception on a negative alloc.
Unroll nested if blocks by returning early if new_size is 0.
Add TestNN.test_adaptive_pooling_size_overflow to indirectly validate the fix.
Fixes https://github.com/pytorch/pytorch/issues/50960
Test Plan: Imported from OSS
Reviewed By: walterddr
Differential Revision: D26607549
Pulled By: malfet
fbshipit-source-id: e3d4f7548b098f24fa5aba42d8f4e9288ece1e2e
Summary:
Fixes https://github.com/pytorch/pytorch/issues/52257
## Background
Reverts MHA behavior for `bias` flag to that of v1.5: flag enables or disables both in and out projection biases.
Updates type annotations for both in and out projections biases from `Tensor` to `Optional[Tensor]` for `torch.jit.script` usage.
Note: With this change, `_LinearWithBias` defined in `torch/nn/modules/linear.py` is no longer utilized. Completely removing it would require updates to quantization logic in the following files:
```
test/quantization/test_quantized_module.py
torch/nn/quantizable/modules/activation.py
torch/nn/quantized/dynamic/modules/linear.py
torch/nn/quantized/modules/linear.py
torch/quantization/quantization_mappings.py
```
This PR takes a conservative initial approach and leaves these files unchanged.
**Is it safe to fully remove `_LinearWithBias`?**
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52537
Test Plan:
```
python test/test_nn.py TestNN.test_multihead_attn_no_bias
```
## BC-Breaking Note
In v1.6, the behavior of `MultiheadAttention`'s `bias` flag was incorrectly changed to affect only the in projection layer. That is, setting `bias=False` would fail to disable the bias for the out projection layer. This regression has been fixed, and the `bias` flag now correctly applies to both the in and out projection layers.
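A hedged illustration of the restored behaviour:
```python
import torch

mha = torch.nn.MultiheadAttention(embed_dim=8, num_heads=2, bias=False)
print(mha.in_proj_bias)    # None: in projection bias disabled
print(mha.out_proj.bias)   # None: out projection bias disabled again, matching v1.5
```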
Reviewed By: bdhirsh
Differential Revision: D26583639
Pulled By: jbschlosser
fbshipit-source-id: b805f3a052628efb28b89377a41e06f71747ac5b
Summary:
Some minor improvement for lazy modules introduced in https://github.com/pytorch/pytorch/issues/44538, https://github.com/pytorch/pytorch/issues/47350 and https://github.com/pytorch/pytorch/issues/51548.
This PR mainly turns the bias into an `UninitializedParameter`, and instead of creating empty tensors like
```python
self.bias = Parameter(torch.Tensor(0))
self.bias = UninitializedParameter()
```
I think it would be better to
```python
self.register_parameter('bias', None)
self.bias = UninitializedParameter()
```
In addition, I change the constructor of the `LazyBatchNorm` from
```python
self.running_mean = UninitializedBuffer()
```
to
```python
self.register_buffer('running_mean', UninitializedBuffer())
```
as the original one would not change the underlying `self._buffers`.
Thank you for your time on reviewing this PR :).
Gently ping albanD, mruberry
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52212
Reviewed By: jbschlosser
Differential Revision: D26504508
Pulled By: albanD
fbshipit-source-id: 7094d0bb4fa9e2a40a07b79d350ea12a6ebfd080
Summary:
Temporarily disabling OneDNN conv for group size = 24, as the OneDNN update came too late to be fully tested https://github.com/pytorch/pytorch/issues/50042
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52327
Reviewed By: agolynski
Differential Revision: D26474186
Pulled By: VitalyFedyunin
fbshipit-source-id: 8d6964d33c8dcab70e207088c3940810eabbd068
Summary:
Because this pull request (https://github.com/pytorch/pytorch/issues/40801) has become an important part of recent 3D models, brings a significant improvement in speed, and has also been open for a while, I decided to resolve the previous review comment and modify it a bit so that it can be merged into the latest version of PyTorch.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51027
Reviewed By: albanD
Differential Revision: D26414116
Pulled By: ngimel
fbshipit-source-id: 562c099f4d7f6d603a9c2f2e2a518bc577b0d8ee
Summary:
Adding CUDA 11.2 to Windows CI.
Disabled tests:
The following ran into `CUDA error: misaligned address` for CUDA 11.2: (issue linked below)
`test_where_scalar_valid_combination_cuda_complex128` in test_torch.py
`test_sgn_complex_cuda` in test_autograd.py
The following ran into `CUDA error: too many resources requested for launch` for CUDA 11.2: (https://github.com/pytorch/pytorch/issues/52002)
test_EmbeddingBag_per_sample_weights_and_new_offsets_cuda_int64_float64
test_EmbeddingBag_per_sample_weights_and_offsets_cuda_int64_float64
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51598
Reviewed By: mrshenli
Differential Revision: D26344965
Pulled By: janeyx99
fbshipit-source-id: 3c9a4ed16d748969e96593220ec0a9f33e1ffcef
Summary:
For non-supported input, we should not do the check in a parallel region; this PR first does the dtype check, and then does the parallel_for.
Fixes https://github.com/pytorch/pytorch/issues/51352.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51443
Reviewed By: izdeby
Differential Revision: D26305584
Pulled By: ngimel
fbshipit-source-id: 6faa3148af5bdcd7246771c0ecb4db2b31ac82c6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50794
Original commit changeset: b4a7948088c0
There are some subtle extra tweaks on top of the original. I can unbundle them, but I've opted to keep it with the port because it's the easiest way to make sure the changes are exercised.
* There's a bugfix in the codegen to test if a dispatch key is structured *before* short circuiting because the dispatch key was missing in the table. This accounts for mixed structured-nonstructured situations where the dispatch table is present, but the relevant structured key isn't (because the dispatch table only exists to register, e.g., QuantizedCPU)
* Dispatch tables for functions which delegate to structured kernels don't have Math entries generated for them.
* It's now illegal to specify a structured dispatch key in a delegated structured kernel (it will be ignored!); add is now fixed to follow this
* There are some extra sanity checks for NativeFunctions validation
* Finally, unlike the original PR, I switched the .vec variant of upsample_nearest2d to also be DefaultBackend, bringing it inline with upsample_nearest1d.
ghstack-source-id: 120038038
Test Plan:
```
buck test mode/dev //coreai/tiefenrausch:python_tests -- --exact 'coreai/tiefenrausch:python_tests - test_can_run_local_async_inference_cpu (coreai.tiefenrausch.tests.python_test.TiefenrauschPY)' --run-disabled
```
Reviewed By: ngimel
Differential Revision: D25962873
fbshipit-source-id: d29a9c97f15151db3066ae5efe7a0701e6dc05a3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50739
This does not turn on batched grad testing for autogenerated NewModuleTest
tests and CriterionTest tests. Those are coming later.
Test Plan: - run tests
Reviewed By: ejguan
Differential Revision: D25997677
Pulled By: zou3519
fbshipit-source-id: b4b2d68e0f99c3d573faf237e1e531d0b3fced40
Summary:
Fixes [#24991](https://github.com/pytorch/pytorch/issues/24991)
I used a value of 0.75 as suggested in the forums by Thomas: https://discuss.pytorch.org/t/calculate-gain-tanh/20854/6
I verified that the value keeps the gradient stable for a 100-layer network.
Code to reproduce (from [jpeg729](https://discuss.pytorch.org/t/calculate-gain-tanh/20854/4)):
```python
import torch
import torch.nn.functional as F
import sys
a = torch.randn(1000,1000, requires_grad=True)
b = a
print (f"in: {a.std().item():.4f}")
for i in range(100):
    l = torch.nn.Linear(1000,1000, bias=False)
    torch.nn.init.xavier_normal_(l.weight, torch.nn.init.calculate_gain("selu"))
    b = getattr(F, 'selu')(l(b))
    if i % 10 == 0:
        print (f"out: {b.std().item():.4f}", end=" ")
        a.grad = None
        b.sum().backward(retain_graph=True)
        print (f"grad: {a.grad.abs().mean().item():.4f}")
```
Output:
```
in: 1.0008
out: 0.7968 grad: 0.6509
out: 0.3127 grad: 0.2760
out: 0.2404 grad: 0.2337
out: 0.2062 grad: 0.2039
out: 0.2056 grad: 0.1795
out: 0.2044 grad: 0.1977
out: 0.2005 grad: 0.2045
out: 0.2042 grad: 0.2273
out: 0.1944 grad: 0.2034
out: 0.2085 grad: 0.2464
```
I included the necessary documentation change, and it passes the _test_calculate_gain_nonlinear_ unittest.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50664
Reviewed By: mruberry
Differential Revision: D25942217
Pulled By: ngimel
fbshipit-source-id: 29ff1be25713484fa7c516df71b12fdaecfb9af8
Summary:
Fixes https://github.com/pytorch/pytorch/issues/42588
The contiguity check used to be for memory format suggested by `grad_output->suggest_memory_format()`, but an invariant guaranteed by derivatives.yaml is `input->suggest_memory_format()`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50659
Reviewed By: mruberry
Differential Revision: D25938921
Pulled By: ngimel
fbshipit-source-id: a945bfef6ce3d91b17e7ff96babe89ffd508939a
Summary:
Building on top of the work of anjali411 (https://github.com/pytorch/pytorch/issues/46640)
Things added in this PR:
1. Modify backward and double-backward formulas
2. Add complex support for `new module tests` and criterion tests (and add complex tests for L1)
3. Modify some existing tests to support complex
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49912
Reviewed By: zhangguanheng66
Differential Revision: D25853036
Pulled By: soulitzer
fbshipit-source-id: df619f1b71c450ab2818eb17804e0c55990aa8ad
Summary:
Fixes https://github.com/pytorch/pytorch/issues/49726
Just cleaned up the unnecessary `ModuleAttributeError`
BC-breaking note:
`ModuleAttributeError` was added in the previous unsuccessful [PR](https://github.com/pytorch/pytorch/pull/49879) and removed here. If a user catches `ModuleAttributeError` specifically, this will no longer work. They should catch `AttributeError` instead.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50298
Reviewed By: mrshenli
Differential Revision: D25907620
Pulled By: jbschlosser
fbshipit-source-id: cdfa6b1ea76ff080cd243287c10a9d749a3f3d0a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48378
This commit adds support for accepting custom importance scores to use for pruning mask computation, rather than only using the parameter.
This is useful if one wants to prune based on scores from different technique such as activations, gradients, weighted scoring of parameters, etc.
An alternative to the above approach would be to pass the custom mask to the already available interface. However, the ability to accept importance scores is easier since it can leverage the mask computation logic that has already been baked in.
In addition, the commit also makes some minor lint fixes.
Test Plan:
* Unit tests
* Circle CI
Differential Revision: D24997355
fbshipit-source-id: 30797897977b57d3e3bc197987da20e88febb1fa
Summary:
Fixes https://github.com/pytorch/pytorch/issues/598
This is BC-breaking as we now explicitly don't call the hook when there are no Tensors at the top level of the output.
This feature was not working anyways as the returned grad_input/grad_output were wrong (not respecting the output structure and wrong inputs for multi-Node Module).
This is also BC-breaking as we now report the correct gradients for `nn.Module`s that contain multiple autograd `Node`s while we use to return bad results before.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46163
Reviewed By: ailzhang, mruberry
Differential Revision: D24894180
Pulled By: albanD
fbshipit-source-id: e1b5d193d2818eb2f51e2a2722c7405c8bd13c2b
Summary:
Fixes https://github.com/pytorch/pytorch/issues/46213
I didn't update the documentation yet; I will add those changes soon. There are a few other things that I didn't do, but want to clarify whether I should:
1. I didn't expose projections in c++ API: torch/csrc/api/src/nn/modules/rnn.cpp. Let me know if this is desirable and I will add those changes.
2. I didn't expose projections in "lstm_cell" function and "_thnn_differentiable_lstm_cell_backward" functions from aten/src/ATen/native/RNN.cpp. As far as I understand, they are not needed for nn.LSTM CPU execution. For lstm_cell, projections don't bring any real benefit, since if cell is used separately, it can be easily added in Python. For "_thnn_differentiable_lstm_cell_backward", I'm actually not sure where exactly that function is used, so I also disabled projections there for now. Please let me know if I should change that.
3. I added check that projections are not supported for quantized LSTMs to quantized_lstm_<data/input> functions. But I didn't add any checks to LSTMCell code. It seems that since I disabled projections in "lstm_cell" function, they should also not be available for quantized models through any other API than quantized_lstm_<data/input>. Please let me know if I'm not correct and I will add checks to other places.
4. Projections are not supported for CuDNN versions < 7.1.2. Should I add the check for CuDNN version and disable projections in that case? If so, what will be the best way to do that?
5. Currently I added projection weight as the last weight, so the layout is "w_ih, w_hh, b_ih, b_hh, w_hr". This breaks the assumption that biases come after weights and thus I had to add additional if-s in various places. Alternative way would be to have "w_ih, w_hh, w_hr, b_ih, b_hh" layout, in which case the assumption will be true. But in that case I will need to split the loop in get_parameters function from aten/src/ATen/native/cudnn/RNN.cpp. And in some cases, I will still need to add an "undefined" tensor in the 3rd position, because we get all 5 weights from CuDNN most of the time. So I'm not sure which way is better. Let me know if you think I should change to the weights-then-biases layout.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47725
Reviewed By: zou3519
Differential Revision: D25449794
Pulled By: ngimel
fbshipit-source-id: fe6ce59e481d1f5fd861a8ff7fa13d1affcedb0c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49187
Expands the implementation of PixelShuffle to support any number of batch dimensions
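A small, hedged usage sketch of the extended shape handling:
```python
import torch

x = torch.randn(2, 3, 4 * 9, 5, 5)          # two leading batch dimensions, upscale_factor=3
out = torch.nn.functional.pixel_shuffle(x, 3)
print(out.shape)                             # torch.Size([2, 3, 4, 15, 15])
```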
Test Plan: `buck test caffe2/test:nn -- test_pixel_shuffle`
Reviewed By: mruberry
Differential Revision: D25399058
fbshipit-source-id: ab0a7f593b276cafc9ebb46a177e2c1dce56d0de
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48916
optimize adaptive average pool2d forward path
optimize adaptive average pool2d backward path
remove unused headers
minor change
minor change
rename the header; add adaptive max pooling in future.
minor change
loosen adaptive_pool2d test on nhwc to cover both cuda and cpu devices
minor change
Test Plan: Imported from OSS
Reviewed By: ngimel
Differential Revision: D25399469
Pulled By: VitalyFedyunin
fbshipit-source-id: 86f9fda35194f21144bd4667b778c861c05a5bac
Summary:
Fixes https://github.com/pytorch/pytorch/issues/46983.
The solution is based on two components:
1. The introduction of the `_initialized` attribute. This will be used during ParameterList/Dict creation methods `__init__` (introduced in https://github.com/pytorch/pytorch/issues/47772) and `__setstate__` to not trigger warnings when setting general `Module` attributes.
2. The introduction of the `not hasattr(self, key)` check to avoid triggering warnings when changing general `Module` attributes such as `.training` during the `train()` and `eval()` methods.
Tests related to the fix are added.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48315
Reviewed By: mrshenli
Differential Revision: D25130217
Pulled By: albanD
fbshipit-source-id: 79e2abf1eab616f5de74f75f370c2fe149bed4cb
Summary:
Fixed test:
- `test_is_nonzero`: this asserts an exact match, which is flaky when `TORCH_SHOW_CPP_STACKTRACES=1`, so I changed it to a non-exact assert
- `test_pinverse` TF32
- `test_symeig` TF32
- `test_triangular_solve_batched_many_batches_cpu_float64` precision on CPU BLAS
- `test_qr` TF32, as well as the tensor factory forgets a `dtype=dtype`
- `test_lu` TF32
- `ConvTranspose2d` TF32
- `Conv3d_1x1x1_no_bias` TF32
- `Transformer*` TF32
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46941
Reviewed By: heitorschueroff
Differential Revision: D24852725
Pulled By: mruberry
fbshipit-source-id: ccd4740cc643476178d81059d1c78da34e5082ed
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46758
It's in general helpful to support int32 indices and offsets, especially when such tensors are large and need to be transferred to accelerator backends. Since it may not be very useful to support the combination of int32 indices and int64 offsets, here we enforce that these two must have the same type.
Test Plan: unit tests
Reviewed By: ngimel
Differential Revision: D24470808
fbshipit-source-id: 94b8a1d0b7fc9fe3d128247aa042c04d7c227f0b
Summary:
Fix https://github.com/pytorch/pytorch/issues/44601
I added a bicubic grid sampler on both the cpu and cuda side, but haven't added it in AVX2.
There is a [colab notebook](https://colab.research.google.com/drive/1mIh6TLLj5WWM_NcmKDRvY5Gltbb781oU?usp=sharing) showing some test results. The notebook uses bilinear for the test, since I could only use the distributed version of PyTorch in it. You could just download it and modify `mode_torch=bicubic` to show the results.
There is some duplicate code for getting and setting values, since the helper function used in bilinear first clips the coordinate beyond the boundary, and then gets or sets the value. However, in bicubic, there are more points to consider. I could refactor that part after making sure the overall calculation is correct.
Thanks
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44780
Reviewed By: mrshenli
Differential Revision: D24681114
Pulled By: mruberry
fbshipit-source-id: d39c8715e2093a5a5906cb0ef040d62bde578567
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46558
This PR fixes a bug with how pooling output shape was computed.
## BC Breaking Notes
Previously, a bug in the pooling code allowed a sliding window to be entirely off bounds. Now, sliding windows must start inside the input or left padding (not right padding, see https://github.com/pytorch/pytorch/issues/46929) and may only go off-bounds if ceil_mode=True.
fixes #45357
TODO
- [x] Ensure existing tests are checking for the correct output size
Test Plan: Imported from OSS
Reviewed By: albanD
Differential Revision: D24633372
Pulled By: heitorschueroff
fbshipit-source-id: 55925243a53df5d6131a1983076f11cab7516d6b
Summary:
This PR disables the test_softmax and test_softmax_results in test_nn.py that were enabled in https://github.com/pytorch/pytorch/issues/46363. The softmax tests are causing failure on gfx906 machines. Disabling those until we root cause and fix them on 906.
cc: jeffdaily ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46793
Reviewed By: izdeby
Differential Revision: D24539211
Pulled By: ezyang
fbshipit-source-id: 633cb9dc497ad6359af85b85a711c4549d772b2a
Summary:
This pull request enables the following tests on ROCm:
* TestCuda.test_tiny_half_norm_
* TestNNDeviceTypeCUDA.test_softmax_cuda_float16
* TestNNDeviceTypeCUDA.test_softmax_cuda_float32
* TestNNDeviceTypeCUDA.test_softmax_results_cuda_float16
* TestNNDeviceTypeCUDA.test_softmax_results_cuda_float32
The earlier failures, because of which the tests were skipped, were because of a precision issue for FP16 compute on MI25 hardware with ROCm 3.7 and older. The fix was delivered in the compiler in ROCm 3.8.
The pull request fixes https://github.com/pytorch/pytorch/issues/37493
cc: jeffdaily ezyang malfet mruberry
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46363
Reviewed By: heitorschueroff
Differential Revision: D24325639
Pulled By: ezyang
fbshipit-source-id: a7dbb238cf38c04b6592baad40b4d71725a358c9
Summary:
Close https://github.com/pytorch/pytorch/issues/31690
I have verified the functionality of ConvTranspose2d (with this PR) on roughly 32,000 random shapes on V100, A100, using cuDNN 8.0.4 and CUDA 11.1. The 32,000 shapes contain 4x8,000 of (fp16, fp32) x (nchw, nhwc) each.
The random shapes are sampled from
```jsonc
{
"batch_size": {"low": 1, "high": 8},
"in_channels": {"low": 16, "high": 128},
"out_channels": {"low": 16, "high": 128},
"height": {"low": 16, "high": 224},
"stride": {"set": [[1, 1], [2, 2]]},
"padding": {"set": [[0, 0]]},
"output_padding": {"set": [[0, 0], [1, 1], [0, 1], [1, 0]]},
"kernel_size": {"set": [[3, 3], [1, 1], [1, 3], [3, 1], [2, 2]]},
"dilation": {"set": [[1, 1]]},
"deterministic": {"set": [true, false]},
"benchmark": {"set": [true, false]},
"allow_tf32": {"set": [true, false]},
"groups": {"set": [1, IN_CHANNELS]}
}
```
- Input `width` is the same as `height`.
- `groups` can be either 1, or the same as `in_channels` (grouped convolution). When `groups` is 1, `out_channels` is random; when `groups` is the same as `in_channels`, `out_channels` is also the same as `in_channels`
All of the checked shapes can be found in csv files here https://github.com/xwang233/code-snippet/tree/master/convtranspose2d-dilation/functionality-check-cudnn8.0.4.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46290
Reviewed By: mruberry
Differential Revision: D24422091
Pulled By: ngimel
fbshipit-source-id: 9f0120f2995ae1575c0502f1b2742390d7937b24
Summary:
Follow-up of https://github.com/pytorch/pytorch/issues/46461 with a similar goal
Makes them more readable and possibly faster. Care has to be taken because `map` applies the function immediately while `(x for x in xs)` is a generator expression which gets evaluated later. This is a benefit in some cases where it is not required to actually create the list of values in memory (e.g. when passing to `tuple` or `extend` or `join`)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46462
Reviewed By: zou3519
Differential Revision: D24422343
Pulled By: ezyang
fbshipit-source-id: 252e33499c92ac0b15238f2df32681dbbda2b237
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46572
When `num_samples == 0`, the grid becomes zero. Although CUDA just silently proceeds, `cudaGetLastError()` will complain about `Error: invalid configuration argument`. So it actually fails in some later place, which becomes really hard to debug.
Reviewed By: jianyuh
Differential Revision: D24409874
fbshipit-source-id: ca54de13b1ab48204bbad265e3f55b56b94a1a2f
Summary:
This PR makes it possible to cast the parameters of nn.Module to complex dtypes.
The following code works with the proposed changes.
```python
In [1]: import torch
In [2]: lin = torch.nn.Linear(5, 1).to(torch.complex64)
In [3]: lin(torch.zeros(3, 5, dtype=torch.complex64))
Out[3]:
tensor([[-0.1739+0.j],
[-0.1739+0.j],
[-0.1739+0.j]], grad_fn=<AddmmBackward>)
```
Fixes https://github.com/pytorch/pytorch/issues/43477.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44788
Reviewed By: zou3519
Differential Revision: D24307225
Pulled By: anjali411
fbshipit-source-id: dacc4f5c8c9a99303f74d1f5d807cd657b3b69b5
Summary:
Retake on https://github.com/pytorch/pytorch/issues/40493 after all the feedback from albanD
This PR implements the generic Lazy mechanism and a sample `LazyLinear` layer with the `UninitializedParameter`.
The main differences with the previous PR are two;
Now `torch.nn.Module` remains untouched.
We don't require an explicit initialization or a dummy forward pass before starting the training or inference of the actual module, which makes this much simpler to use from the user side.
As we discussed offline, there was the suggestion of not using a mixin, but changing the `__class__` attribute of `LazyLinear` to become `Linear` once it's completely initialized. While this can be useful, for the time being we need `LazyLinear` to be a `torch.nn.Module` subclass since there are many checks that rely on the modules being instances of `torch.nn.Module`.
This can cause problems when we create complex modules such as
```
class MyNetwork(torch.nn.Module):
    def __init__(self):
        super(MyNetwork, self).__init__()
        self.conv = torch.nn.Conv2d(20, 4, 2)
        self.linear = torch.nn.LazyLinear(10)

    def forward(self, x):
        y = self.conv(x).clamp(min=0)
        return self.linear(y)
```
Here, when the __setattr__ function is called at the time LazyLinear is registered, it won't be added to the child modules of `MyNetwork`, so we have to manually do it later, but currently there is no way to do such thing as we can't access the parent module from LazyLinear once it becomes the Linear module. (We can add a workaround to this if needed).
TODO:
Add convolutions once the design is OK
Fix docstrings
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44538
Reviewed By: ngimel
Differential Revision: D24162854
Pulled By: albanD
fbshipit-source-id: 6d58dfe5d43bfb05b6ee506e266db3cf4b885f0c
Summary:
This PR patches the ReplicationPad modules in `torch.nn` to be compatible with 0-dim batch sizes.
EDIT: this is part of the work on gh-12013 (make all nn layers accept empty batch size)
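A minimal, hedged sketch of the now-supported case:
```python
import torch

pad = torch.nn.ReplicationPad2d(1)
x = torch.randn(0, 3, 4, 4)      # empty batch
print(pad(x).shape)              # expected after this change: torch.Size([0, 3, 6, 6])
```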
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39137
Reviewed By: albanD
Differential Revision: D24131386
Pulled By: ngimel
fbshipit-source-id: 3d93057cbe14d72571943c8979d5937e4bbf743a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45474
When batchnorm affine is set to false, weight and bias are set to None, which is not supported in this case. Added a fix to set the weights to 1 and the bias to 0 if they are not set.
Test Plan: Add unit test for testing fusing conv, batchnorm where batchnorm is in affine=False mode.
Reviewed By: z-a-f
Differential Revision: D23977080
fbshipit-source-id: 2782be626dc67553f3d27d8f8b1ddc7dea022c2a
Summary:
- The thresholds of some tests are bumped up. Depending on the random generator, sometimes these tests fail with things like 0.0059 is not smaller than 0.005. I ran `test_nn.py` and `test_torch.py` for 10+ times to check these are no longer flaky.
- Add `tf32_on_and_off` to new `matrix_exp` tests.
- Disable TF32 on test suites other than `test_nn.py` and `test_torch.py`
cc: ptrblck
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44240
Reviewed By: mruberry
Differential Revision: D23882498
Pulled By: ngimel
fbshipit-source-id: 44a9ec08802c93a2efaf4e01d7487222478b6df8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43680
As discussed [here](https://github.com/pytorch/pytorch/issues/43342),
adding in a Python-only implementation of the triplet-margin loss that takes a
custom distance function. Still discussing whether this is necessary to add to
PyTorch Core.
Test Plan:
python test/run_tests.py
Imported from OSS
Reviewed By: albanD
Differential Revision: D23363898
fbshipit-source-id: 1cafc05abecdbe7812b41deaa1e50ea11239d0cb
Summary:
Fix https://discuss.pytorch.org/t/illegal-memory-access-when-i-use-groupnorm/95800
`dX` is a Tensor, comparing `dX` with `nullptr` was wrong.
cc BIT-silence who wrote the kernel.
The test couldn't pass with `rtol=0` and `x.requires_grad=True`, so I had to update that to `1e-5`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44863
Reviewed By: mruberry
Differential Revision: D23754101
Pulled By: BIT-silence
fbshipit-source-id: 2eb0134dd489480e5ae7113a7d7b84629104cd49
Summary:
This PR adds dilation to _ConvTransposeNd._output_padding method and tests using a bunch of different sized inputs.
Fixes https://github.com/pytorch/pytorch/issues/14272
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43793
Reviewed By: zou3519
Differential Revision: D23493313
Pulled By: ezyang
fbshipit-source-id: bca605c428cbf3a97d3d24316d8d7fde4bddb307
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44398
These end up executing the same tests, so no reason to have them separate.
Test Plan: Imported from OSS
Reviewed By: mruberry
Differential Revision: D23600855
Pulled By: gchanan
fbshipit-source-id: 0952492771498bf813f1bf8e1d7c8dce574ec965
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44382
This is to fix a typo that was introduced in #44032.
Test Plan: Imported from OSS
Reviewed By: mrshenli
Differential Revision: D23601316
Pulled By: glaringlee
fbshipit-source-id: 17d6de5900443ea46c7a6ee9c7614fe6f2d92890
Summary:
Previously, `at::native::embedding` implicitly assumed that the `weight` argument would be 1-D or greater. Given a 0-D tensor, it would segfault. This change makes it throw a RuntimeError instead.
Fixes https://github.com/pytorch/pytorch/issues/41780
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42550
Reviewed By: smessmer
Differential Revision: D23040744
Pulled By: albanD
fbshipit-source-id: d3d315850a5ee2d2b6fcc0bdb30db2b76ffffb01
Summary:
Fixes https://github.com/pytorch/pytorch/issues/41656
For the CPU version, this is a regression introduced in https://github.com/pytorch/pytorch/issues/10980 which vectorized the `grid_sampler_2d` implementation. It uses the AVX2 gather intrinsic which for `float` requires 32-bit indexing to match the number of floats in the AVX register. There is also an `i64gather_ps` variant but this only utilizes half of the vector width so would be expected to give worse performance in the more likely case where 32-bit indexing is acceptable. So, I've left the optimised AVX version as-is and reinstated the old non-vectorized version as a fallback.
For the CUDA version, this operation has never supported 32-bit indexing so this isn't a regression. I've templated the kernel on index type and added 64-bit variants. Although I gather in some places a simple `TORCH_CHECK(canUse32BitIndexMath(...))` is used instead. So, there is a decision to be made here.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41923
Reviewed By: glaringlee
Differential Revision: D22925931
Pulled By: zou3519
fbshipit-source-id: 920816107aae26360c5e7f4e9c729fa9057268bb
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42215
Specifically on https://github.com/pytorch/pytorch/pull/27477#discussion_r371402079
We would like `include_last=True` to be supported for other reduction types like mean and max as well. It now causes further code fragmentation in DPER (https://www.internalfb.com/intern/diff/D22794469/).
More details: https://www.internalfb.com/intern/diff/D22794469/?dest_fbid=309597093427021&transaction_id=631457624153457
ghstack-source-id: 108733009
Test Plan:
```
buck test mode/dev-nosan //caffe2/test:nn -- "test_EmbeddingBag_per_sample_weights_and_new_offsets_cpu"
```
```
(base) [jianyuhuang@devbig281.ftw3.facebook.com: ~/fbsource/fbcode/caffe2/test] $ TORCH_SHOW_CPP_STACKTRACES=1 buck test mode/dev-nosan //caffe2/test:
nn -- "test_EmbeddingBag_per_sample_weights_and_new_offsets_cpu" --print-passing-details
Parsing buck files: finished in 1.2 sec
Building: finished in 5.5 sec (100%) 10130/10130 jobs, 2 updated
Total time: 6.7 sec
More details at https://www.internalfb.com/intern/buck/build/dbdc2063-69d8-45cb-9146-308a9e8505ef
First unknown argument: --print-passing-details.
Falling back to TestPilot classic.
Trace available for this run at /tmp/testpilot.20200728-195414.1422748.log
TestPilot test runner for Facebook. See https://fburl.com/testpilot for details.
Testpilot build revision cd2638f1f47250eac058b8c36561760027d16add fbpkg f88726c8ebde4ba288e1172a348c7f46 at Mon Jul 27 18:11:43 2020 by twsvcscm from /usr/local/fbprojects/packages/testinfra.testpilot/887/t.par
Discovering tests
Running 1 test
Started new test run: https://our.intern.facebook.com/intern/testinfra/testrun/844425097242375
✓ caffe2/test:nn - test_EmbeddingBag_per_sample_weights_and_new_offsets_cpu (test_nn.TestNNDeviceTypeCPU) 0.162 1/1 (passed)
Test output:
> /data/users/jianyuhuang/fbsource/fbcode/buck-out/dev/gen/caffe2/test/nn#binary,link-tree/torch/_utils_internal.py:103: DeprecationWarning: This is a NOOP in python >= 3.7, its just too dangerous with how we write code at facebook. Instead we patch os.fork and multiprocessing which can raise exceptions if a deadlock would happen.
> threadSafeForkRegisterAtFork()
> /usr/local/fbcode/platform007/lib/python3.7/importlib/_bootstrap.py:219: ImportWarning: can't resolve package from __spec__ or __package__, falling back on __name__
and __path__
> return f(*args, **kwds)
> test_EmbeddingBag_per_sample_weights_and_new_offsets_cpu (test_nn.TestNNDeviceTypeCPU) ... Couldn't download test skip set, leaving all tests enabled...
> ok
>
> ----------------------------------------------------------------------
> Ran 1 test in 0.162s
>
> OK
Finished test run: https://our.intern.facebook.com/intern/testinfra/testrun/844425097242375
Summary (total time 5.54s):
PASS: 1
FAIL: 0
SKIP: 0
FATAL: 0
TIMEOUT: 0
OMIT: 0
Did _not_ run with tpx. See https://fburl.com/tpx for details.
```
Reviewed By: dzhulgakov
Differential Revision: D22801881
fbshipit-source-id: 80a624465727081bb9bf55c28419695a3d79c6e5
Summary:
Reland PR https://github.com/pytorch/pytorch/issues/40056
A new overload of upsample_linear1d_backward_cuda was added in a recent commit, so I had to add the nondeterministic alert to it.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41538
Reviewed By: zou3519
Differential Revision: D22608376
Pulled By: ezyang
fbshipit-source-id: 54a2aa127e069197471f1feede6ad8f8dc6a2f82
Summary:
This PR implements a feature extension discussed in https://github.com/pytorch/pytorch/issues/41516.
I followed PR https://github.com/pytorch/pytorch/issues/22245 as a template for adding the new module. While I was at it, I also added the missing `extra_repr()` method to `Flatten`.
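A quick usage sketch, assuming the new module is `nn.Unflatten` (the counterpart of `nn.Flatten` discussed in the linked issue):
```python
import torch
import torch.nn as nn

m = nn.Sequential(nn.Flatten(), nn.Linear(3 * 4, 3 * 4), nn.Unflatten(1, (3, 4)))
x = torch.randn(2, 3, 4)
print(m(x).shape)    # torch.Size([2, 3, 4])
print(nn.Flatten())  # repr now includes start_dim/end_dim via extra_repr()
```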
I see there are no unit tests for these modules. Should I add those too? If so, where would be the best place to put them?
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41564
Reviewed By: gchanan
Differential Revision: D22636766
Pulled By: albanD
fbshipit-source-id: f9efdefd3ffe7d9af9482087625344af8f990943
Summary:
This test function is confusing since our `assertEqual` behavior allows for tolerance to be specified, and this is a redundant mechanism.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41514
Reviewed By: ngimel
Differential Revision: D22569348
Pulled By: mruberry
fbshipit-source-id: 2b2ff8aaa9625a51207941dfee8e07786181fe9f
Summary:
BCELoss currently uses different broadcasting semantics than numpy. Since previous versions of PyTorch have thrown a warning in these cases telling the user that input sizes should match, and since the CUDA and CPU results differ when sizes do not match, it makes sense to upgrade the size mismatch warning to an error.
We can consider supporting numpy broadcasting semantics in BCELoss in the future if needed.
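A small sketch of the new behavior (the exact exception type and message are assumptions here):
```python
import torch
import torch.nn as nn

criterion = nn.BCELoss()
pred = torch.rand(4, 1)
target = torch.rand(4, 4)  # broadcastable, but the sizes do not match
try:
    criterion(pred, target)  # previously only a warning; now an error
except (ValueError, RuntimeError) as e:
    print(e)
```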
Closes https://github.com/pytorch/pytorch/issues/40023
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41426
Reviewed By: zou3519
Differential Revision: D22540841
Pulled By: ezyang
fbshipit-source-id: 6c6d94c78fa0ae30ebe385d05a9e3501a42b3652
Summary:
Closes https://github.com/pytorch/pytorch/issues/36977
This avoids the division by zero that was causing NaNs to appear in the output. `AvgPool2d` and `AvgPool3d` both had this issue on CPU and CUDA.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41368
Reviewed By: ailzhang
Differential Revision: D22520013
Pulled By: ezyang
fbshipit-source-id: 3ece7829f858f5bc17c2c1d905266ac510f11194
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39342
Many networks, such as ResNet, have adds followed by ReLUs. This op is the
first step toward enabling a fused implementation.
Once we have the fused add_relu op, a JIT pass will be written to
replace add + relu patterns with add_relu.
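For reference, a sketch of the pattern the future JIT pass would target; the fused op itself is internal, so only the unfused reference composition is shown here:
```python
import torch

def add_relu_reference(a, b):
    # The pattern to be fused: elementwise add followed by relu.
    return torch.relu(a + b)

a, b = torch.randn(4), torch.randn(4)
print(add_relu_reference(a, b))
```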
Test Plan:
python test/test_nn.py TestAddRelu
Imported from OSS
Differential Revision: D21822397
fbshipit-source-id: 03df83a3e46ddb48a90c5a6f755227a7e361a0e8
Summary:
fix https://github.com/pytorch/pytorch/issues/40227
Removed the key-sorting operation from the ModuleDict class and updated the docstring.
Also removed a sort operation in the corresponding unit test, which would otherwise cause the test to fail.
BC Note: from Python 3.7 onward (and in CPython 3.6 as an implementation detail), plain dicts preserve insertion order.
Example:
A Python 3.6+ user who initializes a ModuleDict from the plain Python dict
{
    "b": torch.nn.MaxPool2d(3),
    "a": torch.nn.MaxPool2d(3)
}
gets a ModuleDict that preserves the insertion order:
ModuleDict(
  (b): MaxPool2d(kernel_size=3, stride=3, padding=0, dilation=1, ceil_mode=False)
  (a): MaxPool2d(kernel_size=3, stride=3, padding=0, dilation=1, ceil_mode=False)
)
A Python 3.5 user with the same input could instead get:
ModuleDict(
  (a): MaxPool2d(kernel_size=3, stride=3, padding=0, dilation=1, ceil_mode=False)
  (b): MaxPool2d(kernel_size=3, stride=3, padding=0, dilation=1, ceil_mode=False)
)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40905
Differential Revision: D22357480
Pulled By: albanD
fbshipit-source-id: 0e2502769647bb64f404978243ca1ebe5346d573
Summary:
This PR aims at tackling https://github.com/pytorch/pytorch/issues/37823 by:
- ensuring that buffers are used for the normalization computation but are not updated when the buffers are not None and `track_running_stats=False`
- adding a corresponding unit test to ensure the expected behaviour
Any feedback is welcome!
_Note: we might want to update the docstrings of `BatchNorm*d`, feel free to share any suggestion!_
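A minimal sketch of the scenario in the first bullet above; how the buffers come to exist is an assumption here (tracking is enabled at construction and then turned off):
```python
import torch
import torch.nn as nn

# Construct with tracking enabled so the buffers exist, then turn tracking off.
bn = nn.BatchNorm2d(3)                 # track_running_stats=True by default
bn.track_running_stats = False         # buffers remain, but stop updating them

before = bn.running_mean.clone()
_ = bn(torch.randn(8, 3, 4, 4))        # training-mode forward
print(torch.equal(bn.running_mean, before))  # True: buffers not updated

bn.eval()
_ = bn(torch.randn(8, 3, 4, 4))        # eval still normalizes with the buffers
```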
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38084
Differential Revision: D22047871
Pulled By: ezyang
fbshipit-source-id: 5acbcad9773e7901f26d625db71d43d7dc236d3e
Summary:
This allows registering hooks that will be executed for every module.
This idea arose in a discussion with tkerola, and niboshi kindly proposed this approach.
The use case for this is to avoid boilerplate code when registering the same hook for all the modules in a complex model; the internal use case was to allow every model to accept a NumPy array in the forward pass in a simpler way. Other use cases include general mechanisms for plotting, tracing, and debugging.
Currently, the hooks are shared across all modules, but this could be extended to share hooks only per module type.
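A small usage sketch, assuming the global registration entry point is `torch.nn.modules.module.register_module_forward_hook` (the name is an assumption here, not stated in this summary):
```python
import torch
import torch.nn as nn

def log_shapes(module, inputs, output):
    # Fired after the forward of every module until the handle is removed.
    print(type(module).__name__, output.shape)

handle = nn.modules.module.register_module_forward_hook(log_shapes)
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))
_ = model(torch.randn(1, 4))
handle.remove()
```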
If this functionality is not needed feel free to close the PR.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38972
Differential Revision: D22091364
Pulled By: albanD
fbshipit-source-id: 204ff5f9e119eff5bdd9140c64cb5dc467bb23a2
Summary:
This updates assertEqual and assertEqual-like functions to either require both or neither of atol and rtol be specified. This should improve clarity around handling precision in the test suite, and it allows us to remove the legacy positional atol argument from assertEqual. In addition, the "message" kwarg is replaced with a kwarg-only "msg" argument whose name is consistent with unittest's assertEqual argument.
In the future we could make "msg" an optional third positional argument to be more consistent with unittest's assertEqual, but requiring it be specified should be clear, and we can easily update the signature to make "msg" an optional positional argument in the future, too.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38872
Differential Revision: D21740237
Pulled By: mruberry
fbshipit-source-id: acbc027aa1d7877a49664d94db9a5fff91a07042
Summary:
Fix https://github.com/pytorch/pytorch/issues/38764
The current problem is that the `top_diff` and `top_mask` pointers are shifted cumulatively within the for-n and for-c loops. This may cause overflow and illegal memory access when the loop counts are greater than one, that is, n > 65535 or c > 65535 (the case in https://github.com/pytorch/pytorch/issues/38764). Since neither n > 65535 nor c > 65535 is common, this has not been seen before. The simple fix is to use new pointer variables for the n & c offsets instead of directly modifying `top_diff` or `top_mask`.
However, I think the current nchw max_pool2d GPU impl still has plenty of room for performance improvement. We can check that in a later PR if needed.
This PR also slightly cleans up the indentation and adds tests that use the CPU impl as a reference check.
cc skrah
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38953
Differential Revision: D21721930
Pulled By: ezyang
fbshipit-source-id: fef7d911d814f8ed9fd67c60cabe5d52f8fd3d57
Summary:
This updates assertEqual and assertEqual-like functions to either require both or neither of atol and rtol be specified. This should improve clarity around handling precision in the test suite, and it allows us to remove the legacy positional atol argument from assertEqual. In addition, the "message" kwarg is replaced with a kwarg-only "msg" argument whose name is consistent with unittest's assertEqual argument.
In the future we could make "msg" an optional third positional argument to be more consistent with unittest's assertEqual, but requiring it be specified should be clear, and we can easily update the signature to make "msg" an optional positional argument in the future, too.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38872
Differential Revision: D21717199
Pulled By: mruberry
fbshipit-source-id: 9feb856f94eee911b44f6c7140a1d07c1b026d3a
Summary:
Fixes https://github.com/pytorch/pytorch/issues/38839. Previously, if the magnitude of the input values was large, the `log(sum)` term was essentially ignored when computing `max + log(sum)`. The result is now computed as `x - max - log(sum)`, which has a better chance of preserving accuracy.
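A small sketch of the precision issue in float32 (values chosen purely for illustration):
```python
import torch

x = torch.tensor([1e8, 1e8], dtype=torch.float32)
max_ = x.max()
lse = torch.log(torch.exp(x - max_).sum())
# Old order: 1e8 + 0.6931 rounds back to 1e8 in float32, losing log(sum).
print(x - (max_ + lse))  # tensor([0., 0.])
# New order keeps the small term:
print(x - max_ - lse)    # tensor([-0.6931, -0.6931])
```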
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38945
Differential Revision: D21712483
Pulled By: ngimel
fbshipit-source-id: c1a3599ed981ba7a7fd130cbd7040a706b7eace0
Summary:
CC ezyang xw285cornell sunway513
Commit 59d92e442b (https://github.com/pytorch/pytorch/issues/38557) has caused this test to regularly fail on ROCm CI gfx900 hosts. Skipping test until root cause analysis can complete.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38724
Differential Revision: D21645815
Pulled By: xw285cornell
fbshipit-source-id: 4087e9565710c271ca5c026a5ae0c5132e56f44d
Summary:
Per title.
We move all the individual gradient norms to a single device before stacking (a no-op if all the gradients are already on a single device). `clip_coef` is copied to the device of each gradient, which may be suboptimal as there could be multiple copies, but it is no worse than when we were synchronizing for each parameter. In the simple case where all gradients are on a single device, there should be no synchronization.
Also, we no longer error out if the parameter list is empty or none of the parameters have gradients; we return a total_norm of 0 instead.
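A tiny sketch of the new empty-input behavior:
```python
import torch
from torch.nn.utils import clip_grad_norm_

# No parameters (or no grads): no error, total norm reported as 0.
print(clip_grad_norm_([], max_norm=1.0))  # tensor(0.)

p = torch.nn.Parameter(torch.randn(3))    # has no .grad yet
print(clip_grad_norm_([p], max_norm=1.0)) # tensor(0.)
```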
Fixes https://github.com/pytorch/pytorch/issues/38605
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38615
Reviewed By: ailzhang
Differential Revision: D21634588
Pulled By: ngimel
fbshipit-source-id: ea4d08d4f3445438260052820c7ca285231a156b
Summary:
Edit: this has been updated to reflect the PR's current status, which has changed after review.
This PR updates the behavior of the assertEqual, assertNotEqual, and assert_allclose to be consistent with each other and torch.isclose. It corrects several additional bugs in the current implementations and adds extensive testing and comments, too.
These updates follow from changes to assertEqual like https://github.com/pytorch/pytorch/pull/34258 and https://github.com/pytorch/pytorch/pull/37069, and from our discussion of torch.isclose for complex tensors (see https://github.com/pytorch/pytorch/issues/36462), where we decided to implement a NumPy-compatible mathematical notion of "closeness" for complex tensors that is not a great fit for our testing framework.
The detailed changelist is:
- New test framework functions for comparing tensors and scalars
  - Tensors are compared using isclose; the real and imaginary parts of complex tensors are compared independently
  - Scalars are compared using the same algorithm
  - assertEqual and assert_allclose now use this common comparison function, instead of each implementing their own with divergent behavior
  - assertEqual-like debug messages are now available for all tensor and scalar comparisons, with additional context when comparing the components of sparse, quantized, and complex tensors
  - Extensive testing of the comparison behavior and debug messages
- Small Updates
  - assertEqual now takes an "exact_device" argument, analogous to "exact_dtype", which should be useful in multidevice tests
  - assertEqual now takes an "equal_nan" argument for argument consistency with torch.isclose
  - assertEqual no longer takes the "allow_inf" keyword, which misleadingly only applied to scalar comparisons, was only ever set (rarely) to true, and is not supported by torch.isclose
- Bug fixes:
  - the exact_dtype attribute has been removed (no longer needed after https://github.com/pytorch/pytorch/pull/38103)
  - message arguments passed to assertEqual are now handled correctly
  - bool x other dtype comparisons are now supported
  - uint8 and int8 tensor comparisons now function properly
  - rtol for integer comparisons is now supported (default is zero)
  - rtol and atol for scalar comparisons are now supported
  - complex scalar comparisons are now supported, analogous to complex tensor comparisons
  - assertNotEqual is now equivalent to the logical negation of assertEqual
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37294
Differential Revision: D21596830
Pulled By: mruberry
fbshipit-source-id: f2576669f7113a06f82581fc71883e6b772de19b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35620
Python 2 has reached end-of-life and is no longer supported by PyTorch.
`self.subTest` can be used directly in Python 3.
Test Plan: CI
Differential Revision: D20842872
Pulled By: dreiss
fbshipit-source-id: 6ad42550c01e6959821ff07df767fc14b58c5a9e
Summary:
Add read/write vectorization to non-persistent softmax kernels only. At this point launch logic has minimal changes, and `ILP=vectorization=2` is always used (the code can handle other values, but `ILP=2` has been the most consistent performer).
Dispatch to persistent / non-persistent kernels is unchanged.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36485
Differential Revision: D21477775
Pulled By: ngimel
fbshipit-source-id: 9ff7fd243695d7bbf4121390085b64db0bbdef35
Summary:
Fix https://github.com/pytorch/pytorch/issues/37680
Makes two changes:
- Add `argmin`, `argmax` and `argsort` to the list of non-differentiable functions to prevent them from generating outputs with `requires_grad=True` (see the short sketch after this list).
- Add a check to make sure we don't add such functions to the codegen by mistake.
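A short sketch of the resulting behavior:
```python
import torch

x = torch.randn(5, requires_grad=True)
print(x.argmax().requires_grad)   # False
print(x.argsort().requires_grad)  # False
```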
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37789
Differential Revision: D21389201
Pulled By: albanD
fbshipit-source-id: 6a7617e389e893f6f813d50f02700d32300b1386