Summary:
Partially fixes https://github.com/pytorch/pytorch/issues/66066
This PR:
- cleans up op-specific testing from test_autograd. test_autograd should be reserved for testing generic autograd functionality
- tests related to an operator are better colocated
- see the tracker for details
What to think about when moving tests to their correct test suite:
- naming: make sure it's not too generic
- how the test is parametrized, sometimes we need to add/remove a device/dtype parameter
- can this be merged with existing tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67413
Reviewed By: jbschlosser, albanD
Differential Revision: D32031480
Pulled By: soulitzer
fbshipit-source-id: 8e13da1e58a38d5cecbfdfd4fe2b4fe6f816897f
Summary:
Fix https://github.com/pytorch/pytorch/issues/67239
The CUDA kernels for `adaptive_max_pool2d` (forward and backward) were written for contiguous output. If outputs are non-contiguous, first create a contiguous copy and let the kernel write output to the contiguous memory space. Then copy the output from contiguous memory space to the original non-contiguous memory space.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67697
Reviewed By: ejguan
Differential Revision: D32112443
Pulled By: ngimel
fbshipit-source-id: 0e3bf06d042200c651a79d13b75484526fde11fe
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66879
This adds a quantized implementation for bilinear gridsample. Bicubic interpolation cannot be supported as easily since we rely on the linearity of quantization to operate on the raw values, i.e.
f(q(a), q(b)) = q(f(a, b)) where f is the linear interpolation function.
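For illustration (not part of this diff), a minimal sketch of the linearity property that the bilinear path relies on; the values, scale and zero point below are arbitrary:
```python
import torch

a = torch.tensor([0.25, -1.5, 3.0, 0.6])
b = torch.tensor([1.0, 2.0, -0.5, 0.1])
scale, zero_point = 0.1, 128

def f(x, y, t=0.3):
    # linear interpolation, the building block of bilinear grid sampling
    return x + t * (y - x)

qa = torch.quantize_per_tensor(a, scale, zero_point, torch.quint8)
qb = torch.quantize_per_tensor(b, scale, zero_point, torch.quint8)

# interpolating the raw quantized values ...
lhs = f(qa.int_repr().float(), qb.int_repr().float())
# ... matches quantizing the interpolation of the original values (up to rounding)
rhs = torch.quantize_per_tensor(f(a, b), scale, zero_point, torch.quint8).int_repr().float()
print(torch.allclose(lhs, rhs, atol=1.0))  # True
```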
ghstack-source-id: 141321116
Test Plan: test_quantization
Reviewed By: kimishpatel
Differential Revision: D31656893
fbshipit-source-id: d0bc31da8ce93daf031a142decebf4a155943f0f
Summary:
Removes the 3D special case logic in `_convolution_double_backward()` that never worked.
The logic was never called previously since `convolution()` expands input / weight from 3D -> 4D before passing them to backends; backend-specific backward calls thus save the 4D version to pass to `_convolution_double_backward()`.
The new general `convolution_backward()` saves the original 3D input / weight, uncovering the bug.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67283
Reviewed By: anjali411
Differential Revision: D32021100
Pulled By: jbschlosser
fbshipit-source-id: 0916bcaa77ef49545848b344d6385b33bacf473d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64181
This PR replaces all the calls to:
- `transpose(-2, -1)` or `transpose(-1, -2)` by `mT()` in C++ and `mT` in Python
- `conj().transpose(-2, -1)` or `transpose(-2, -1).conj()` or `conj().transpose(-1, -2)` or `transpose(-1, -2).conj()` by `mH()` in C++ and `mH` in Python.
It also simplifies two pieces of code, and fixes one bug where a pair
of parentheses were missing in the function `make_symmetric_matrices`.
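For reference, a small sketch of the equivalences this PR relies on (assuming a build that already exposes `Tensor.mT` / `Tensor.mH`):
```python
import torch

x = torch.randn(2, 3, 4, dtype=torch.complex64)
print(torch.allclose(x.mT, x.transpose(-2, -1)))         # True
print(torch.allclose(x.mH, x.transpose(-2, -1).conj()))  # True
```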
Test Plan: Imported from OSS
Reviewed By: H-Huang
Differential Revision: D31692896
Pulled By: anjali411
fbshipit-source-id: e9112c42343663d442dc5bd53ff2b492094b434a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64572
Fixes https://github.com/pytorch/pytorch/issues/64256
It also fixes an inconsistent treatment of the case `reduction = "mean"`
when the whole target is equal to `ignore_index`. It now returns `NaN`
in this case, consistently with what it returns when computing the mean
over an empty tensor.
We add tests for all these cases.
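For illustration, a minimal sketch (not from the PR) of the `reduction = "mean"` corner case described above:
```python
import torch
import torch.nn.functional as F

logp = torch.log_softmax(torch.randn(4, 3), dim=1)
target = torch.full((4,), 2, dtype=torch.long)  # every target equals ignore_index
loss = F.nll_loss(logp, target, ignore_index=2, reduction="mean")
print(loss)                   # nan
print(torch.empty(0).mean())  # nan, the mean over an empty tensor, for consistency
```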
Test Plan: Imported from OSS
Reviewed By: ezyang
Differential Revision: D31116297
Pulled By: albanD
fbshipit-source-id: cc44e79205f5eeabf1efd7d32fe61e26ba701b52
Summary:
- Added 2D-Convolution NHWC support
- on ROCm 4.3, with `PYTORCH_MIOPEN_SUGGEST_NHWC=1` flag
- May need to force MIOpen to search for solutions ( see examples below for flags )
**PYTORCH_MIOPEN_SUGGEST_NHWC Environment Flag**
MIOpen does not officially support NHWC yet, although convolution support has been added to tip-of-tree of MIOpen. This flag is intended to be a short-lived flag to explicitly turn on NHWC support until ROCm officially supports NHWC and performance is verified.
**Examples**
1. Example usage 1 : Run test on ROCm4.3
`PYTORCH_TEST_WITH_ROCM=1 PYTORCH_MIOPEN_SUGGEST_NHWC=1 MIOPEN_FIND_ENFORCE=4 MIOPEN_DEBUG_CONV_GEMM=0 MIOPEN_FIND_MODE=1 pytest test_nn.py -v -k "test_conv_cudnn_nhwc" `
2. Example usage 2: Run the following with `PYTORCH_MIOPEN_SUGGEST_NHWC=1` on ROCm4.3.
```
#!/usr/bin/env python3
import torch
model = torch.nn.Conv2d(8, 4, 3).cuda().half()
model = model.to(memory_format=torch.channels_last)
input = torch.randint(1, 10, (2, 8, 4, 4), dtype=torch.float32, requires_grad=True)
input = input.to(device="cuda", memory_format=torch.channels_last, dtype=torch.float16)
# should print True for is_contiguous(channels_last), and strides must match NHWC format
print(input.is_contiguous(memory_format=torch.channels_last), input.shape, input.stride() )
out = model(input)
# should print True for is_contiguous(channels_last), and strides must match NHWC format
print("Contiguous channel last :", out.is_contiguous(memory_format=torch.channels_last), " out shape :", out.shape, "out stride :", out.stride() )
```
See https://pytorch.org/tutorials/intermediate/memory_format_tutorial.html for more examples.
cc jeffdaily sunway513 jithunnair-amd ROCmSupport
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63617
Reviewed By: saketh-are
Differential Revision: D30730800
Pulled By: ezyang
fbshipit-source-id: 61906a0f30be8299e6547d312ae6ac91cc7c3238
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63554
Following https://github.com/pytorch/pytorch/pull/61840#issuecomment-884087809, this deprecates all the dtype getters publicly exposed in the `torch.testing` namespace. The reason for this is twofold:
1. If someone is not familiar with the C++ dispatch macros PyTorch uses, the names are misleading. For example `torch.testing.floating_types()` will only give you `float32` and `float64` skipping `float16` and `bfloat16`.
2. The dtype getters provide very minimal functionality that can be easily emulated by downstream libraries.
We thought about [providing a replacement](https://gist.github.com/pmeier/3dfd2e105842ad0de4505068a1a0270a), but ultimately decided against it. The major problem is BC: by keeping it, either the namespace gets messy again after a new dtype is added, or we need to somehow version the return values of the getters.
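As a hedged illustration of point 1 (only in releases where the deprecated getters still exist), and of how trivially a downstream library can maintain its own list:
```python
import torch

# Misleading name: half-precision types are not included.
print(torch.testing.floating_types())  # (torch.float32, torch.float64)

# A downstream library can simply keep its own tuple instead, e.g.:
FLOATING_TYPES = (torch.float16, torch.bfloat16, torch.float32, torch.float64)
```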
Test Plan: Imported from OSS
Reviewed By: H-Huang
Differential Revision: D30662206
Pulled By: mruberry
fbshipit-source-id: a2bdb10ab02ae665df1b5b76e8afa9af043bbf56
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64385
It was deleted in https://github.com/pytorch/pytorch/pull/63276.
The numerics test was meant to check LayerNorm behavior on large inputs,
but we deleted it without realizing that.
Test Plan: - wait for tests.
Reviewed By: ngimel
Differential Revision: D30702950
Pulled By: zou3519
fbshipit-source-id: a480e26c45ec38fb628938b70416cdb22d976a46
Summary:
Implements an orthogonal / unitary parametrisation.
It passes the tests and I have trained a couple of models with this implementation, so I believe it should be somewhat correct. That said, the implementation is quite subtle. I'm tagging nikitaved and IvanYashchuk as reviewers in case they have comments / they see some room for optimisation of the code, in particular of the `forward` function.
Fixes https://github.com/pytorch/pytorch/issues/42243
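A minimal usage sketch, assuming the parametrisation is exposed as `torch.nn.utils.parametrizations.orthogonal` (as in later releases):
```python
import torch
import torch.nn as nn
from torch.nn.utils.parametrizations import orthogonal

linear = orthogonal(nn.Linear(5, 5))  # registers the parametrization on `weight`
Q = linear.weight
print(torch.allclose(Q.T @ Q, torch.eye(5), atol=1e-4))  # True: weight is kept orthogonal
```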
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62089
Reviewed By: ezyang
Differential Revision: D30639063
Pulled By: albanD
fbshipit-source-id: 988664f333ac7a75ce71ba44c8d77b986dff2fe6
Summary:
Fixes https://github.com/pytorch/pytorch/issues/64039
There are two distinct problems here.
1. If `grad_output` is channels last but the input is not, then the input would be read as if it were channels last, i.e. the wrong values would be read.
2. `use_channels_last_kernels` doesn't guarantee that `suggest_memory_format` will actually return channels last, so use `empty_like` instead so the strides always match.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64100
Reviewed By: mruberry
Differential Revision: D30622127
Pulled By: ngimel
fbshipit-source-id: e28cc57215596817f1432fcdd6c49d69acfedcf2
Summary:
I think the original intention here was to only take effect in the align_corners case (because output_size = 1 and the divisor would be 0), but it affects the non-align_corners case too. For example:
```python
import numpy as np
import torch

input = torch.tensor(np.arange(1, 5, dtype=np.int32).reshape((1, 1, 2, 2)))
m = torch.nn.Upsample(scale_factor=0.5, mode="bilinear")
of_out = m(input)
```
The expected result is [[[[2.5]]]], but PyTorch returns [[[[1.0]]]], which differs from OpenCV and PIL; this PR tries to fix that.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61166
Reviewed By: malfet
Differential Revision: D30543178
Pulled By: heitorschueroff
fbshipit-source-id: 21a4035483981986b0ae4a401ef0efbc565ccaf1
Summary:
Fixes https://github.com/pytorch/pytorch/issues/62094
Introduces functionality for adding arbitrary objects to module state_dicts. To take advantage of this, the following functions can be defined on a module:
* `get_extra_state(self) -> dict` - Returns a dict defining any extra state this module wants to save
* `set_extra_state(self, state)` - Subsumes the given state within the module
Under the hood, a sub-dictionary is stored in the state_dict under the key `_extra_state` for each module that requires extra state.
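A minimal sketch of the new hooks (module and attribute names are illustrative):
```python
import torch
import torch.nn as nn

class Counter(nn.Module):
    def __init__(self):
        super().__init__()
        self.calls = 0                 # plain Python state, not a tensor

    def forward(self, x):
        self.calls += 1
        return x

    def get_extra_state(self):
        return {"calls": self.calls}   # saved under `_extra_state`

    def set_extra_state(self, state):
        self.calls = state["calls"]

m = Counter()
m(torch.zeros(1))
m(torch.zeros(1))
m2 = Counter()
m2.load_state_dict(m.state_dict())     # extra state travels inside the state_dict
print(m2.calls)                        # 2
```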
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62976
Reviewed By: heitorschueroff
Differential Revision: D30518657
Pulled By: jbschlosser
fbshipit-source-id: 5fb35ab8e3d36f35e3e96dcd4498f8c917d1f386
Summary:
Interestingly enough, the original code did have a mechanism that aims to prevent this very issue, but it performs the clone AFTER modifying u and v in-place.
This wouldn't work, though, because we can later use the cloned u and v in operations that save for backward, and the next time we execute forward, we modify the same cloned u and v in-place.
So if the idea is that we want to avoid modifying a saved variable in-place, we should clone it BEFORE the in-place operation.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62293
Reviewed By: bdhirsh
Differential Revision: D30489750
Pulled By: soulitzer
fbshipit-source-id: cbe8dea885aef97adda8481f7a822e5bd91f7889
Summary:
As discussed in https://github.com/pytorch/pytorch/pull/62897, the BF16/non-last-dim Softmax path is missing the subtraction of the max value, which causes overflow in the `exp()` calculation when input values are large, such as `1000.0`.
To avoid this issue, we add the max-value subtraction and the corresponding test cases in this PR.
Note that without the max-value subtraction (e.g. after accidental reverts or changes), the test case fails with the following error message:
```
AssertionError: False is not true : Tensors failed to compare as equal!With rtol=0.05 and atol=0.05, found 103984 element(s) (out of 126720) whose difference(s) exceeded the margin of error (including 103984 nan comparisons). The greatest difference was nan (0.0 vs. nan), which occurred at index (0, 0, 0, 1).
```
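For reference, a standalone sketch (not the actual kernel code) of why the max subtraction matters:
```python
import torch

def naive_softmax(x, dim):
    e = torch.exp(x)                       # overflows to inf for large inputs
    return e / e.sum(dim=dim, keepdim=True)

def stable_softmax(x, dim):
    x = x - x.amax(dim=dim, keepdim=True)  # subtract the max first
    e = torch.exp(x)
    return e / e.sum(dim=dim, keepdim=True)

x = torch.full((2, 3), 1000.0)
print(naive_softmax(x, dim=0))   # nan everywhere
print(stable_softmax(x, dim=0))  # 0.5 everywhere
```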
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63132
Reviewed By: VitalyFedyunin
Differential Revision: D30280792
Pulled By: cpuhrsch
fbshipit-source-id: 722821debf983bbb4fec878975fa8a4da0d1d866
Summary:
This PR fixes a part of https://github.com/pytorch/pytorch/issues/12013, which is summarized concretely in https://github.com/pytorch/pytorch/issues/38115.
This PR allows `MaxPool` and `AdaptiveMaxPool` to accept tensors whose batch size is 0. Some changes have been made to modernize the tests so that they show the name of the C++ function that throws an error.
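With this change, a sketch like the following is expected to work (shapes are illustrative):
```python
import torch
import torch.nn as nn

x = torch.randn(0, 16, 8, 8)             # batch size 0
print(nn.MaxPool2d(2)(x).shape)          # torch.Size([0, 16, 4, 4])
print(nn.AdaptiveMaxPool2d(1)(x).shape)  # torch.Size([0, 16, 1, 1])
```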
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62088
Reviewed By: bdhirsh
Differential Revision: D30281285
Pulled By: jbschlosser
fbshipit-source-id: 52bffc67bfe45a78e11e4706b62cce1469eba1b9
Summary: skip rocm test for test_cudnn_convolution_relu
Test Plan: This skips a test
Reviewed By: ngimel
Differential Revision: D30233620
fbshipit-source-id: 31eab8b03c3f15674e0d262a8f55965c1aa6b809
Summary:
Currently when cudnn_convolution_relu is passed a channels last Tensor it will return a contiguous Tensor. This PR changes this behavior and bases the output format on the input format.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62482
Reviewed By: ngimel
Differential Revision: D30049905
Pulled By: cpuhrsch
fbshipit-source-id: 98521d14ee03466e7128a1912b9f754ffe10b448
Summary:
Enable Gelu bf16/fp32 in the CPU path using the MKL-DNN implementation. Users don't need to call to_mkldnn() explicitly. The new Gelu fp32 implementation performs better than the original one.
Add Gelu backward for https://github.com/pytorch/pytorch/pull/53615.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58525
Reviewed By: ejguan
Differential Revision: D29940369
Pulled By: ezyang
fbshipit-source-id: df9598262ec50e5d7f6e96490562aa1b116948bf
Summary:
Fixes https://github.com/pytorch/pytorch/issues/11959
Alternative approach to creating a new `CrossEntropyLossWithSoftLabels` class. This PR simply adds support for "soft targets" AKA class probabilities to the existing `CrossEntropyLoss` and `NLLLoss` classes.
Implementation is dumb and simple right now, but future work can add higher performance kernels for this case.
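A minimal usage sketch of the new soft-target path (values are illustrative):
```python
import torch
import torch.nn as nn

logits = torch.randn(3, 5, requires_grad=True)
probs = torch.softmax(torch.randn(3, 5), dim=1)  # class probabilities instead of class indices
loss = nn.CrossEntropyLoss()(logits, probs)
loss.backward()
print(loss)
```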
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61044
Reviewed By: zou3519
Differential Revision: D29876894
Pulled By: jbschlosser
fbshipit-source-id: 75629abd432284e10d4640173bc1b9be3c52af00
Summary:
Fixes Python part of https://github.com/pytorch/pytorch/issues/60747
Enhances the Python versions of `Transformer`, `TransformerEncoderLayer`, and `TransformerDecoderLayer` to support callables as their activation functions. The old way of specifying activation function still works as well.
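A minimal sketch of the new callable form (the string form still works):
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

layer = nn.TransformerEncoderLayer(d_model=16, nhead=4, activation=F.gelu)
out = layer(torch.randn(10, 2, 16))  # (seq_len, batch, d_model)
print(out.shape)                     # torch.Size([10, 2, 16])
```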
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61355
Reviewed By: bdhirsh
Differential Revision: D29967302
Pulled By: jbschlosser
fbshipit-source-id: 8ee6f20083d49dcd3ab432a18e6ad64fe1e05705
Summary:
This PR enables the softmax calculation with `bfloat16` data type when not along the last dim.
* Use bf16 specialization for forward calculation to reduce the bf16/fp32 cast in vec template.
* Release the bf16 limitation for backward calculation.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60371
Reviewed By: ejguan
Differential Revision: D29563109
Pulled By: cpuhrsch
fbshipit-source-id: f6b439fa3850a6c633f35db65ea3d735b747863e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62281
Closes gh-24646, Closes gh-24647
There is no `TensorIterator` equivalent to these kernels so this is just
migrating the existing kernels over to the ATen style.
I've benchmarked for contiguous tensors with this script:
```
import torch
shape = (10, 10, 100, 100)
x = torch.randn(*shape, device='cuda')
w = torch.randn((10, 1, 5, 5), device='cuda')
for _ in range(100):
    torch.nn.functional.conv2d(x, w, groups=10)
```
and similarly for backwards. I see these as the same to within measurement error.
| | Master (us) | This PR (us) |
|------------------:|:-------------------:|:--------------------:|
| Forward | 133.5 | 133.6 |
| Backward (input) | 1,102 | 1,119 |
| Backward (weight) | 2,220 | 2,217 |
Test Plan: Imported from OSS
Reviewed By: ejguan
Differential Revision: D29943062
Pulled By: ngimel
fbshipit-source-id: fc5d16496eb733743face7c5a14e532d7b8ee26a
Summary:
Part of the fix for https://github.com/pytorch/pytorch/issues/12013
Checks if the inputs and outputs are non-zero in order to allow the Bilinear layer to accept 0-dim batch sizes. The if-check for this checks for both input and output dim sizes since the `_trilinear` function is written to work with both forward and backward for Bilinear.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47106
Reviewed By: ejguan
Differential Revision: D29935589
Pulled By: jbschlosser
fbshipit-source-id: 607d3352bd4f88e2528c64408f04999960be049d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62006
Closes gh-24646, gh-24647
There is no `TensorIterator` equivalent to these kernels so this is just
migrating the existing kernels over to the ATen style.
I've benchmarked for contiguous tensors with this script:
```
import torch
shape = (10, 10, 100, 100)
x = torch.randn(*shape, device='cuda')
w = torch.randn((10, 1, 5, 5), device='cuda')
for _ in range(100):
    torch.nn.functional.conv2d(x, w, groups=10)
```
and similarly for backwards. I see these as the same to within measurement error.
| | Master (us) | This PR (us) |
|------------------:|:-------------------:|:--------------------:|
| Forward | 133.5 | 133.6 |
| Backward (input) | 1,102 | 1,119 |
| Backward (weight) | 2,220 | 2,217 |
Test Plan: Imported from OSS
Reviewed By: jbschlosser
Differential Revision: D29883676
Pulled By: ngimel
fbshipit-source-id: 9b2ac62cdd8a84e1a23ffcd66035b2b2fe2374d8
Summary:
Fixes https://github.com/pytorch/pytorch/issues/61924
The fused backward kernel was using the weight dtype to detect mixed precision usage, but the weights can be None while `running_mean` and `running_var` are still in mixed precision. So I updated the check to look at those variables as well.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61962
Reviewed By: albanD
Differential Revision: D29825516
Pulled By: ngimel
fbshipit-source-id: d087fbf3bed1762770cac46c0dcec30c03a86fda
Summary:
Fixes https://github.com/pytorch/pytorch/issues/58816
- enhance the backward of `nn.SmoothL1Loss` to allow integral `target`
- add test cases in `test_nn.py` to check that `input.grad` matches between the integral target and its floating-point counterpart.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61112
Reviewed By: mrshenli
Differential Revision: D29775660
Pulled By: albanD
fbshipit-source-id: 544eabb6ce1ea13e1e79f8f18c70f148e92be508
Summary:
Fixes https://github.com/pytorch/pytorch/issues/61242
The previous code wrongly checked whether a tensor is a buffer in a module by comparing values; the fix compares names instead.
Docs need some updating as well; the current plan is to bump that to a separate PR, but I'm happy to do it here if preferred.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61429
Reviewed By: gchanan
Differential Revision: D29712341
Pulled By: jbschlosser
fbshipit-source-id: 41f29ab746505e60f13de42a9053a6770a3aac22
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61584
add_relu does not work with broadcasting. This registers a scalar version of add_relu in native_functions that casts to a tensor before calling the regular function. TensorIterator handles broadcasting analogously to the existing add.
ghstack-source-id: 133480068
Test Plan: python3 test/test_nn.py TestAddRelu
Reviewed By: kimishpatel
Differential Revision: D29641768
fbshipit-source-id: 1b0ecfdb7eaf44afed83c9e9e74160493c048cbc
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60517
This fixes module support for LazyModuleMixin, reported in issue #60132.
See: https://github.com/pytorch/pytorch/issues/60132
We also have to update lazy_extension, given its dependency on module.py, and update the unit test as well.
Test Plan:
Unit test passes
torchrec test passes
Reviewed By: albanD
Differential Revision: D29274068
fbshipit-source-id: 1c20f7f0556e08dc1941457ed20c290868346980
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59987
Similar to GroupNorm, improve the numerical stability of LayerNorm by using the Welford algorithm and pairwise sum.
Test Plan: buck test mode/dev-nosan //caffe2/test:nn -- "LayerNorm"
Reviewed By: ngimel
Differential Revision: D29115235
fbshipit-source-id: 5183346c3c535f809ec7d98b8bdf6d8914bfe790
Summary:
Fixes https://github.com/pytorch/pytorch/issues/24610
Aten Umbrella issue https://github.com/pytorch/pytorch/issues/24507
Related to https://github.com/pytorch/pytorch/issues/59765
The performance does not change between this PR and master with the following benchmark script:
<details>
<summary>Benchmark script</summary>
```python
import torch
import torch.nn as nn
import time
torch.manual_seed(0)
def _time():
    torch.cuda.synchronize()
    MS_PER_SECOND = 1000
    return time.perf_counter() * MS_PER_SECOND

device = "cuda"
C = 30
softmax = nn.LogSoftmax(dim=1)
n_runs = 250
for reduction in ["none", "mean", "sum"]:
    for N in [100_000, 500_000, 1_000_000]:
        fwd_t = 0
        bwd_t = 0
        data = torch.randn(N, C, device=device)
        target = torch.empty(N, dtype=torch.long, device=device).random_(0, C)
        loss = nn.NLLLoss(reduction=reduction)
        input = softmax(data)
        for i in range(n_runs):
            t1 = _time()
            result = loss(input, target)
            t2 = _time()
            fwd_t = fwd_t + (t2 - t1)
        fwd_avg = fwd_t / n_runs
        print(
            f"input size({N}, {C}), reduction: {reduction} "
            f"forward time is {fwd_avg:.2f} (ms)"
        )
    print()
```
</details>
## master
```
input size(100000, 30), reduction: none forward time is 0.02 (ms)
input size(500000, 30), reduction: none forward time is 0.08 (ms)
input size(1000000, 30), reduction: none forward time is 0.15 (ms)
input size(100000, 30), reduction: mean forward time is 1.81 (ms)
input size(500000, 30), reduction: mean forward time is 8.24 (ms)
input size(1000000, 30), reduction: mean forward time is 16.46 (ms)
input size(100000, 30), reduction: sum forward time is 1.66 (ms)
input size(500000, 30), reduction: sum forward time is 8.24 (ms)
input size(1000000, 30), reduction: sum forward time is 16.46 (ms)
```
## this PR
```
input size(100000, 30), reduction: none forward time is 0.02 (ms)
input size(500000, 30), reduction: none forward time is 0.08 (ms)
input size(1000000, 30), reduction: none forward time is 0.15 (ms)
input size(100000, 30), reduction: mean forward time is 1.80 (ms)
input size(500000, 30), reduction: mean forward time is 8.24 (ms)
input size(1000000, 30), reduction: mean forward time is 16.46 (ms)
input size(100000, 30), reduction: sum forward time is 1.66 (ms)
input size(500000, 30), reduction: sum forward time is 8.24 (ms)
input size(1000000, 30), reduction: sum forward time is 16.46 (ms)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60097
Reviewed By: mrshenli
Differential Revision: D29303099
Pulled By: ngimel
fbshipit-source-id: fc0d636543a79ea81158d286dcfb84043bec079a
Summary:
Before this change it was implemented with the assumption that the number of groups, input channels and output channels are all the same, which is not always the case.
Extend the implementation to support any number of output channels as long as the number of groups equals the number of input channels (i.e. kernel.size(1) == 1)
Fixes https://github.com/pytorch/pytorch/issues/60176
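For illustration only, a generic `F.conv2d` sketch of the configuration described above (the actual fix targets a specific backend kernel):
```python
import torch
import torch.nn.functional as F

x = torch.randn(1, 8, 16, 16)
w = torch.randn(24, 1, 3, 3)  # out_channels != in_channels, kernel.size(1) == 1
y = F.conv2d(x, w, groups=8)  # groups == number of input channels
print(y.shape)                # torch.Size([1, 24, 14, 14])
```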
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60460
Reviewed By: albanD
Differential Revision: D29299693
Pulled By: malfet
fbshipit-source-id: 31130c71ce86535ccfba2f4929eee3e2e287b2f0
Summary:
Fixes #https://github.com/pytorch/pytorch/issues/50192
It has been discussed in the issue that RNN APIs currently do not support inputs with `seq_len=0` and that the error message does not reflect this clearly. This PR addresses the issue by adding a clearer error message stating that none of the RNN APIs (nn.RNN, nn.GRU and nn.LSTM) support `seq_len=0`, for either one-directional or bi-directional layers.
```
import torch
input_size = 5
hidden_size = 6
rnn = torch.nn.GRU(input_size, hidden_size)
for seq_len in reversed(range(4)):
    output, h_n = rnn(torch.zeros(seq_len, 10, input_size))
    print('{}, {}'.format(output.shape, h_n.shape))
```
Previously was giving output as :
```
torch.Size([3, 10, 6]), torch.Size([1, 10, 6])
torch.Size([2, 10, 6]), torch.Size([1, 10, 6])
torch.Size([1, 10, 6]), torch.Size([1, 10, 6])
Traceback (most recent call last):
File "test.py", line 8, in <module>
output, h_n = rnn(torch.zeros(seq_len, 10, input_size))
File "/opt/miniconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/opt/miniconda3/lib/python3.8/site-packages/torch/nn/modules/rnn.py", line 739, in forward
result = _VF.gru(input, hx, self._flat_weights, self.bias, self.num_layers,
RuntimeError: stack expects a non-empty TensorList
```
However, with this PR, the error message changes for any combination of
[RNN, GRU and LSTM] x [one-directional, bi-directional].
Let's illustrate the change with the following code snippet:
```
import torch
input_size = 5
hidden_size = 6
rnn = torch.nn.LSTM(input_size, hidden_size, bidirectional=True)
output, h_n = rnn(torch.zeros(0, 10, input_size))
```
would give output as following:
```
Traceback (most recent call last):
File "<stdin>", line 2, in <module>
File "/fsx/users/iramazanli/pytorch/torch/nn/modules/module.py", line 1054, in _call_impl
return forward_call(*input, **kwargs)
File "/fsx/users/iramazanli/pytorch/torch/nn/modules/rnn.py", line 837, in forward
result = _VF.gru(input, hx, self._flat_weights, self.bias, self.num_layers,
RuntimeError: Expected sequence length to be larger than 0 in RNN
```
***********************************
The change didn't seem necessary for PackedSequence, because as the following code snippet shows, the error message is already clear about the issue:
```
import torch
import torch.nn.utils.rnn as rnn_utils
import torch.nn as nn
packed = rnn_utils.pack_sequence([])
```
returns:
```
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/fsx/users/iramazanli/pytorch/torch/nn/utils/rnn.py", line 398, in pack_sequence
return pack_padded_sequence(pad_sequence(sequences), lengths, enforce_sorted=enforce_sorted)
File "/fsx/users/iramazanli/pytorch/torch/nn/utils/rnn.py", line 363, in pad_sequence
return torch._C._nn.pad_sequence(sequences, batch_first, padding_value)
RuntimeError: received an empty list of sequences
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60269
Reviewed By: mrshenli
Differential Revision: D29299914
Pulled By: iramazanli
fbshipit-source-id: 5ca98faa28d4e6a5a2f7600a30049de384a3b132
Summary:
Partially addresses https://github.com/pytorch/pytorch/issues/49825 by improving the testing
- Rename some of the old tests that had "inplace_view" in their names, but actually mean "inplace_[update_]on_view" so there is no confusion with the naming
- Adds some tests in test_view_ops that verify basic behavior
- Add tests that creation meta is properly handled for no-grad, multi-output, and custom function cases
- Add test that verifies that in the cross dtype view case, the inplace views won't be accounted in the backward graph on rebase as mentioned in the issue.
- Update inference mode tests to also check in-place
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59891
Reviewed By: albanD
Differential Revision: D29272546
Pulled By: soulitzer
fbshipit-source-id: b12acf5f0e3f788167ebe268423cdb58481b56f6
Summary:
Fixes https://github.com/pytorch/pytorch/issues/27655
This PR adds a C++ and Python version of ReflectionPad3d with structured kernels. The implementation uses lambdas extensively to better share code from the backward and forward pass.
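A minimal usage sketch of the new module:
```python
import torch
import torch.nn as nn

pad = nn.ReflectionPad3d(1)
x = torch.arange(8.0).reshape(1, 1, 2, 2, 2)
print(pad(x).shape)  # torch.Size([1, 1, 4, 4, 4])
```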
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59791
Reviewed By: gchanan
Differential Revision: D29242015
Pulled By: jbschlosser
fbshipit-source-id: 18e692d3b49b74082be09f373fc95fb7891e1b56
Summary:
Following https://github.com/pytorch/pytorch/issues/59624 I observed some straggling failing tests on Ampere due to TF32 thresholds. This PR just twiddles some more thresholds to fix the (6) failing tests I saw on A100.
CC Flamefire ptrblck ngimel
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60209
Reviewed By: gchanan
Differential Revision: D29220508
Pulled By: ngimel
fbshipit-source-id: 7c83187a246e1b3a24b181334117c0ccf2baf311
Summary:
Makes it possible for the first registered parametrization to depend on several parameters rather than just one. Examples of these types of parametrizations are `torch.nn.utils.weight_norm` and low rank parametrizations via the multiplication of a `n x k` tensor by a `k x m` tensor with `k <= m, n`.
Follows the plan outlined in https://github.com/pytorch/pytorch/pull/33344#issuecomment-768574924. A short summary of the idea is: we call `right_inverse` when registering a parametrization to generate the tensors that we are going to save. If `right_inverse` returns a sequence of tensors, then we save them as `original0`, `original1`... If it returns a `Tensor` or a sequence of length 1, we save it as `original`.
We only allow to have many-to-one parametrizations in the first parametrization registered. The next parametrizations would need to be one-to-one.
There were a number of choices in the implementation:
If the `right_inverse` returns a sequence of parameters, then we unpack it in the forward. This allows writing code such as:
```python
class Sum(nn.Module):
    def forward(self, X, Y):
        return X + Y

    @staticmethod
    def right_inverse(Z):
        return Z, torch.zeros_like(Z)
```
rather than having to unpack manually a list or a tuple within the `forward` function.
At the moment the errors are a bit all over the place. This is to avoid having to check some properties of `forward` and `right_inverse` when they are registered. I left this like this for now, but I believe it'd be better to call these functions when they are registered to make sure the invariants hold and throw errors as soon as possible.
The invariants are the following:
1. The following code should be well-formed
```python
X = module.weight
Y = param.right_inverse(X)
assert isinstance(Y, Tensor) or isinstance(Y, collections.Sequence)
Z = param(Y) if isinstance(Y, Tensor) else param(*Y)
```
in other words, if `Y` is a `Sequence` of `Tensor`s (we also check that the elements of the sequence are Tensors), then its length matches the number of parameters that `param.forward` accepts.
2. Always: `X.dtype == Z.dtype and X.shape == Z.shape`. This is to protect the user from shooting themselves in the foot, as it's too odd for a parametrization to change the metadata of a tensor.
3. If it's one-to-one: `X.dtype == Y.dtype`. This is to be able to do `X.set_(Y)` so that if a user first instantiates the optimiser and then puts the parametrisation, then we reuse `X` and the user does not need to add a new parameter to the optimiser. Alas, this is not possible when the parametrisation is many-to-one. The current implementation of `spectral_norm` and `weight_norm` does not seem to care about this, so this would not be a regression. I left a warning in the documentation though, as this case is a bit tricky.
I still need to go over the formatting of the documentation; I'll do that tomorrow.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58488
Reviewed By: soulitzer
Differential Revision: D29100708
Pulled By: albanD
fbshipit-source-id: b9e91f439cf6b5b54d5fa210ec97c889efb9da38
Summary:
Implements a number of changes discussed with soulitzer offline.
In particular:
- Initialise `u`, `v` in `__init__` rather than in `_update_vectors`
- Initialise `u`, `v` to some reasonable vectors by doing 15 power iterations at the start
- Simplify the code of `_reshape_weight_to_matrix` (and make it faster) by using `flatten`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59564
Reviewed By: ailzhang
Differential Revision: D29066238
Pulled By: soulitzer
fbshipit-source-id: 6a58e39ddc7f2bf989ff44fb387ab408d4a1ce3d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58950
Use tensor iterator's API to set grain size in order to parallelize gelu op.
ghstack-source-id: 130947174
Test Plan: test_gelu
Reviewed By: ezyang
Differential Revision: D28689819
fbshipit-source-id: 0a02066d47a4d9648323c5ec27d7e0e91f4c303a
Summary:
Make sure tests explicitly run without TF32 don't use TF32 operations.
Fixes https://github.com/pytorch/pytorch/issues/52278
After the tf32 accuracy tolerance was increased to 0.05 this is the only remaining change required to fix the above issue (for TestNN.test_Conv3d_1x1x1_no_bias_cuda)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59624
Reviewed By: heitorschueroff
Differential Revision: D28996279
Pulled By: ngimel
fbshipit-source-id: 7f1b165fd52cfa0898a89190055b7a4b0985573a
Summary:
As per title. Resolves https://github.com/pytorch/pytorch/issues/56683.
`gradgradcheck` will fail once `target.requires_grad() == True` because of the limitations of the current double backward implementation.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59447
Reviewed By: agolynski
Differential Revision: D28910140
Pulled By: albanD
fbshipit-source-id: 20934880eb4d22bec34446a6d1be0a38ef95edc7
Summary:
This PR introduces a helper function named `torch.nn.utils.skip_init()` that accepts a module class object + `args` / `kwargs` and instantiates the module while skipping initialization of parameter / buffer values. See discussion at https://github.com/pytorch/pytorch/issues/29523 for more context. Example usage:
```python
import torch
m = torch.nn.utils.skip_init(torch.nn.Linear, 5, 1)
print(m.weight)
m2 = torch.nn.utils.skip_init(torch.nn.Linear, 5, 1, device='cuda')
print(m2.weight)
m3 = torch.nn.utils.skip_init(torch.nn.Linear, in_features=5, out_features=1)
print(m3.weight)
```
```
Parameter containing:
tensor([[-3.3011e+28, 4.5915e-41, -3.3009e+28, 4.5915e-41, 0.0000e+00]],
requires_grad=True)
Parameter containing:
tensor([[-2.5339e+27, 4.5915e-41, -2.5367e+27, 4.5915e-41, 0.0000e+00]],
device='cuda:0', requires_grad=True)
Parameter containing:
tensor([[1.4013e-45, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00]],
requires_grad=True)
```
Bikeshedding on the name / namespace is welcome, as well as comments on the design itself - just wanted to get something out there for discussion.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57555
Reviewed By: zou3519
Differential Revision: D28640613
Pulled By: jbschlosser
fbshipit-source-id: 5654f2e5af5530425ab7a9e357b6ba0d807e967f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48919
move data indexing utils
parallel inference contiguous path
parallel inference channels last path
add dim apply
optimize update stats
add channels last support for backward
Revert "add channels last support for backward"
This reverts commit cc5e29dce44395250f8e2abf9772f0b99f4bcf3a.
Revert "optimize update stats"
This reverts commit 7cc6540701448b9cfd5833e36c745b5015ae7643.
Revert "add dim apply"
This reverts commit b043786d8ef72dee5cf85b5818fcb25028896ecd.
bug fix
add batchnorm nhwc test for cpu, including C=1 and HW=1
Test Plan: Imported from OSS
Reviewed By: glaringlee
Differential Revision: D25399468
Pulled By: VitalyFedyunin
fbshipit-source-id: a4cd7a09cd4e1a8f5cdd79c7c32c696d0db386bd
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48918
enable test case on AvgPool2d channels last for CPU
Test Plan: Imported from OSS
Reviewed By: glaringlee
Differential Revision: D25399466
Pulled By: VitalyFedyunin
fbshipit-source-id: 9477b0c281c0de5ed981a97e2dcbe6072d7f0aef
Summary:
Adds a new file under `torch/nn/utils/parametrizations.py` which should contain all the parametrization implementations
For spectral_norm we add the `SpectralNorm` module which can be registered using `torch.nn.utils.parametrize.register_parametrization` or using a wrapper: `spectral_norm`, the same API the old implementation provided.
Most of the logic is borrowed from the old implementation:
- Just like the old implementation, there are cases where retrieving the weight should perform another power iteration (thus updating the weight) and cases where it shouldn't. For example, in eval mode (`self.training=False`) we do not perform power iteration.
There are also some differences/difficulties with the new implementation:
- Using the new parametrization functionality as-is, there doesn't seem to be a good way to tell whether a 'forward' call was the result of parametrizations being unregistered (with leave_parametrized=True) or of the injected property's getter being invoked. The issue is that we want to perform power iteration in the latter case but not the former, and we don't have this control as-is. So, in this PR I modified the parametrization functionality to switch the module to eval mode before triggering its forward call
- Updates the vectors based on the weight on initialization to fix https://github.com/pytorch/pytorch/issues/51800 (this avoids silently updating weights in eval mode). This also means that we perform twice as many power iterations by the first forward.
- right_inverse is just the identity for now, but maybe it should assert that the passed value already satisfies the constraints
- So far, all the old spectral_norm tests have been cloned, but maybe we don't need so much testing now that the core functionality is already well tested
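A minimal usage sketch of the wrapper described above (assuming the new location `torch.nn.utils.parametrizations.spectral_norm`):
```python
import torch
import torch.nn as nn
from torch.nn.utils import parametrize
from torch.nn.utils.parametrizations import spectral_norm

m = spectral_norm(nn.Linear(8, 8))
print(parametrize.is_parametrized(m, "weight"))  # True
w = m.weight    # in training mode, accessing the weight runs a power-iteration step
print(w.shape)  # torch.Size([8, 8])
```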
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57784
Reviewed By: ejguan
Differential Revision: D28413201
Pulled By: soulitzer
fbshipit-source-id: e8f1140f7924ca43ae4244c98b152c3c554668f2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55189
Currently EmbeddingBag and its variants support either int32 or int64 indices/offsets. We have use cases with a mix of int32 and int64 indices, which is not supported yet. To avoid introducing too many branches we simply cast the offsets type to the indices type when they are not the same.
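With this change, a mixed-dtype call like the following sketch is expected to work (shapes and values are illustrative):
```python
import torch
import torch.nn.functional as F

weight = torch.randn(10, 3)
indices = torch.tensor([1, 2, 4, 5, 4, 3], dtype=torch.int64)
offsets = torch.tensor([0, 3], dtype=torch.int32)  # different dtype than indices
out = F.embedding_bag(indices, weight, offsets)    # offsets get cast to the indices dtype
print(out.shape)                                   # torch.Size([2, 3])
```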
Test Plan: unit tests
Reviewed By: allwu
Differential Revision: D27482738
fbshipit-source-id: deeadd391d49ff65d17d016092df1839b82806cc
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57558
Fixes #53359
If someone directly saves an nn.LSTM in PyTorch 1.7 and then loads it in PyTorch
1.8, it errors out with the following:
```
(In PyTorch 1.7)
import torch
model = torch.nn.LSTM(2, 3)
torch.save(model, 'lstm17.pt')
(In PyTorch 1.8)
model = torch.load('lstm17.pt')
AttributeError: 'LSTM' object has no attribute 'proj_size'
```
Although we do not officially support this (directly saving modules via
torch.save), it used to work and the fix is very simple. This PR adds an
extra line to `__setstate__`: if the state we are passed does not have
a `proj_size` attribute, we assume it was saved from PyTorch 1.7 and
older and set `proj_size` equal to 0.
Test Plan:
I wrote a test that tests `__setstate__`. But also,
Run the following:
```
(In PyTorch 1.7)
import torch
x = torch.ones(32, 5, 2)
model = torch.nn.LSTM(2, 3)
torch.save(model, 'lstm17.pt')
y17 = model(x)
(Using this PR)
model = torch.load('lstm17.pt')
x = torch.ones(32, 5, 2)
y18 = model(x)
```
and finally compare y17 and y18.
Reviewed By: mrshenli
Differential Revision: D28198477
Pulled By: zou3519
fbshipit-source-id: e107d1ebdda23a195a1c3574de32a444eeb16191
Summary:
Fix a numerical issue of CUDA channels-last SyncBatchNorm
The added test is a repro for the numerical issue. Thanks for the help from jjsjann123 who identified the root cause. Since pytorch SBN channels-last code was migrated from [nvidia/apex](https://github.com/nvidia/apex), apex SBN channels-last also has this issue. We will submit a fix there soon.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57077
Reviewed By: mruberry
Differential Revision: D28107672
Pulled By: ngimel
fbshipit-source-id: 0c80e79ddb48891058414ad8a9bedd80f0f7f8df
Summary:
Fixes https://github.com/pytorch/pytorch/issues/45687
The fix changes the input size check for `InstanceNorm*d` to be more restrictive and correctly reject sizes with only a single spatial element, regardless of batch size, to avoid infinite variance.
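With this change, an input with a single spatial element is rejected; a hedged sketch (the exact error type/message is whatever the new check raises):
```python
import torch
import torch.nn as nn

m = nn.InstanceNorm1d(4)
try:
    m(torch.randn(8, 4, 1))  # one spatial element -> variance would be infinite
except (ValueError, RuntimeError) as e:
    print(e)
```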
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56659
Reviewed By: pbelevich
Differential Revision: D27948060
Pulled By: jbschlosser
fbshipit-source-id: 21cfea391a609c0774568b89fd241efea72516bb
Summary:
Fixes https://github.com/pytorch/pytorch/issues/56380
BC-breaking note:
This changes the behavior of full backward hooks as they will now fire properly even if no input to the Module requires gradients.
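A minimal sketch of the new behavior (after this change the hook fires even though the input does not require grad):
```python
import torch
import torch.nn as nn

m = nn.Linear(3, 2)
m.register_full_backward_hook(lambda mod, grad_in, grad_out: print("hook fired"))

x = torch.randn(4, 3)  # input does NOT require grad; only the parameters do
m(x).sum().backward()  # prints "hook fired"
```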
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56693
Reviewed By: ezyang
Differential Revision: D27947030
Pulled By: albanD
fbshipit-source-id: e8353d769ba5a2c1b6bdf3b64e2d61308cf624a2
Summary:
Fixes https://github.com/pytorch/pytorch/issues/55587
The fix converts the binary `TensorIterator` used by softplus backwards to a ternary one, adding in the original input for comparison against `beta * threshold`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56484
Reviewed By: malfet
Differential Revision: D27908372
Pulled By: jbschlosser
fbshipit-source-id: 73323880a5672e0242879690514a17886cbc29cd
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55237
In this PR, we reenable fast-gradcheck and resolve misc issues that arise:
Before landing this PR, land #55182 so that slow tests are still being run periodically.
Bolded indicates the issue is handled in this PR, otherwise it is handled in a previous PR.
**Non-determinism issues**:
- ops that do not have deterministic implementation (as documented https://pytorch.org/docs/stable/generated/torch.use_deterministic_algorithms.html#torch.use_deterministic_algorithms)
- test_pad_cuda (replication_pad2d) (test_nn)
- interpolate (test_nn)
- cummin, cummax (scatter_add_cuda_kernel) (test_ops)
- test_fn_gradgrad_prod_cpu_float64 (test_ops)
Randomness:
- RRelu (new module tests) - we fix by using our own generator as to avoid messing with user RNG state (handled in #54480)
Numerical precision issues:
- jacobian mismatch: test_gelu (test_nn, float32, not able to replicate locally) - we fixed this by disabling for float32 (handled in previous PR)
- cholesky_solve (test_linalg): #56235 handled in previous PR
- **cumprod** (test_ops) - #56275 disabled fast gradcheck
Not yet replicated:
- test_relaxed_one_hot_categorical_2d (test_distributions)
Test Plan: Imported from OSS
Reviewed By: albanD
Differential Revision: D27920906
fbshipit-source-id: 894dd7bf20b74f1a91a5bc24fe56794b4ee24656
Summary:
Fixes https://github.com/pytorch/pytorch/issues/53964. cc albanD almson
## Major changes:
- Overhauled the actual loss calculation so that the shapes are now correct (in functional.py)
- added the missing doc in nn.functional.rst
## Minor changes (in functional.py):
- I removed the previous check on whether input and target were the same shape. This is to allow for broadcasting, say when you have 10 predictions that all have the same target.
- I added some comments to explain each shape check in detail. Let me know if these should be shortened/cut.
Screenshots of updated docs attached.
Let me know what you think, thanks!
## Edit: Description of change of behaviour (affecting BC):
The backwards-compatibility is only affected for the `reduction='none'` mode. This was the source of the bug. For tensors with size (N, D), the old returned loss had size (N), as incorrect summation was happening. It will now have size (N, D) as expected.
### Example
Define input tensors, all with size (2, 3).
`input = torch.tensor([[0., 1., 3.], [2., 4., 0.]], requires_grad=True)`
`target = torch.tensor([[1., 4., 2.], [-1., 2., 3.]])`
`var = 2*torch.ones(size=(2, 3), requires_grad=True)`
Initialise loss with reduction mode 'none'. We expect the returned loss to have the same size as the input tensors, (2, 3).
`loss = torch.nn.GaussianNLLLoss(reduction='none')`
Old behaviour:
`print(loss(input, target, var)) `
`# Gives tensor([3.7897, 6.5397], grad_fn=<MulBackward0>. This has size (2).`
New behaviour:
`print(loss(input, target, var)) `
`# Gives tensor([[0.5966, 2.5966, 0.5966], [2.5966, 1.3466, 2.5966]], grad_fn=<MulBackward0>)`
`# This has the expected size, (2, 3).`
To recover the old behaviour, sum along all dimensions except for the 0th:
`print(loss(input, target, var).sum(dim=1))`
`# Gives tensor([3.7897, 6.5397], grad_fn=<SumBackward1>.`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56469
Reviewed By: jbschlosser, agolynski
Differential Revision: D27894170
Pulled By: albanD
fbshipit-source-id: 197890189c97c22109491c47f469336b5b03a23f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54812
Needed for quantization since different attribute might refer to the same module instance
Test Plan: Imported from OSS
Reviewed By: vkuzo
Differential Revision: D27408376
fbshipit-source-id: cada85c4a1772d3dd9502c3f6f9a56d690d527e7
Summary:
Fixes https://github.com/pytorch/pytorch/issues/25100, #43112
EDIT: pardon my inexperience since this is my first PR here; I did not realize the doc should not have any trailing white spaces, and that `[E712] comparison to False should be 'if cond is False:' or 'if not cond:'`; both are now fixed.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55285
Reviewed By: mruberry
Differential Revision: D27765694
Pulled By: jbschlosser
fbshipit-source-id: c34774fa065d67c0ac130de20a54e66e608bdbf4
Summary:
This PR adds a `padding_idx` parameter to `nn.EmbeddingBag` and `nn.functional.embedding_bag`. As with `nn.Embedding`'s `padding_idx` argument, if an embedding's index is equal to `padding_idx` it is ignored, so it is not included in the reduction.
This PR does not add support for `padding_idx` for quantized or ONNX `EmbeddingBag` for opset10/11 (opset9 is supported). In these cases, an error is thrown if `padding_idx` is provided.
Fixes https://github.com/pytorch/pytorch/issues/3194
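A minimal usage sketch of the new argument (values are illustrative):
```python
import torch
import torch.nn as nn

bag = nn.EmbeddingBag(10, 3, mode="sum", padding_idx=0)
indices = torch.tensor([[0, 2, 0, 5]])  # entries equal to padding_idx are skipped
out = bag(indices)
print(torch.allclose(out, bag.weight[2] + bag.weight[5]))  # True
```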
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49237
Reviewed By: walterddr, VitalyFedyunin
Differential Revision: D26948258
Pulled By: jbschlosser
fbshipit-source-id: 3ca672f7e768941f3261ab405fc7597c97ce3dfc
Summary:
Fixes https://github.com/pytorch/pytorch/issues/25100, #43112
EDIT: pardon my inexperience since this is my first PR here; I did not realize the doc should not have any trailing white spaces, and that `[E712] comparison to False should be 'if cond is False:' or 'if not cond:'`; both are now fixed.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55285
Reviewed By: ngimel
Differential Revision: D27710107
Pulled By: jbschlosser
fbshipit-source-id: c4363a4604548c0d84628c4997dd23d6b3afb4d9
Summary:
This PR adds the functionality to use channels_last_3d, aka NDHWC, in Conv3d. It's only enabled when the cuDNN version is greater than or equal to 8.0.5.
Todo:
- [x] add memory_format test
- [x] add random shapes functionality test
Close https://github.com/pytorch/pytorch/pull/52547
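A minimal usage sketch (requires a CUDA build; whether the NDHWC path is actually taken depends on cuDNN >= 8.0.5):
```python
import torch

conv = torch.nn.Conv3d(8, 4, 3).cuda().to(memory_format=torch.channels_last_3d)
x = torch.randn(2, 8, 8, 8, 8, device="cuda").to(memory_format=torch.channels_last_3d)
out = conv(x)
# expected True when the channels-last-3d path is used
print(out.is_contiguous(memory_format=torch.channels_last_3d))
```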
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48430
Reviewed By: mrshenli
Differential Revision: D27641452
Pulled By: ezyang
fbshipit-source-id: 0e98957cf30c50c3390903d307dd43bdafd28880
Summary:
There was an error when removing a parametrization with `leave_parametrized=True`. It had escaped the previous tests. This PR should fix that.
**Edit.**
I also took this chance to fix a few mistakes that the documentation had, and to also write the `set_original_` in a more compact way.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55456
Reviewed By: mrshenli
Differential Revision: D27620481
Pulled By: albanD
fbshipit-source-id: f1298ddbcf24566ef48850c62a1eb4d8a3576152
Summary:
Non-backwards-compatible change introduced in https://github.com/pytorch/pytorch/pull/53843 is tripping up a lot of code. Better to set it to False initially and then potentially flip to True in the later version to give people time to adapt.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55169
Reviewed By: mruberry
Differential Revision: D27511150
Pulled By: jbschlosser
fbshipit-source-id: 1ac018557c0900b31995c29f04aea060a27bc525
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48917
max_pool2d channels last support forward path
max_pool2d channels last support backward path
vectorize channels last forward path
rename the header file
fix windows build
combine PoolingKernel.h into Pool.h
add data type check
loosen test_max_pool2d_nhwc to cover device CPU
Test Plan: Imported from OSS
Reviewed By: glaringlee
Differential Revision: D25399470
Pulled By: VitalyFedyunin
fbshipit-source-id: b49b9581f1329a8c2b9c75bb10f12e2650e4c65a
Summary:
This PR enables using MIOpen for RNN FP16 on ROCM.
It does this by altering use_miopen to allow fp16. In the special case where LSTMs use projections we use the default implementation, as it is not implemented in MIOpen at this time. We do send out a warning once to let the user know.
We then remove the various asserts that are no longer necessary since we handle the case.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52475
Reviewed By: H-Huang
Differential Revision: D27449150
Pulled By: malfet
fbshipit-source-id: 06499adb94f28d4aad73fa52890d6ba361937ea6
Summary:
Skips the tests indicated as failing in https://github.com/pytorch/pytorch/issues/54535.
During the ROCm CI upgrade from 4.0.1 to 4.1, some tests regressed. Specifically, FFT tests in test_spectral_ops.py and test_grid_sample in test_nn.py. In order to keep a passing CI signal, we need to disable these temporarily.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54536
Reviewed By: H-Huang
Differential Revision: D27442974
Pulled By: malfet
fbshipit-source-id: 07dffb957757a5fc7afaa5bf78b935a427251ef4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54901
Some subtleties:
- Need to make sure not to clobber composite definitions when
deciding when to generate
- I was lazy and so I didn't make inplace on TensorList work,
nor did I make inplace functions that returned void work
- A few tests started complaining that these noop meta functions
weren't raising the errors they needed. This is tracked
in https://github.com/pytorch/pytorch/issues/54897
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: jbschlosser
Differential Revision: D27407232
Pulled By: ezyang
fbshipit-source-id: 5e706a267496368acdafd128942c310954e43d29
Summary:
Fixes https://github.com/pytorch/pytorch/issues/54452
The assertion that fails in the issue is necessary to appease mypy. Instead, I fix `_ntuple` to always return a `tuple`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54911
Reviewed By: H-Huang
Differential Revision: D27411088
Pulled By: jbschlosser
fbshipit-source-id: 7f5045c58dd4f5f3b07b4826d9b4ca85606c5bce
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53655
Currently EmbeddingBag and its variants support either int32 or int64 indices/offsets. We have use cases with a mix of int32 and int64 indices, which is not supported yet. To avoid introducing too many branches we simply cast the offsets type to the indices type when they are not the same.
Test Plan: unit tests
Reviewed By: qizzzh
Differential Revision: D26820202
fbshipit-source-id: 3e8f09523329ea12393ea92ee9a6315aa40a0b7f
Summary:
**BC-breaking note**: This change throws errors for cases that used to silently pass. The old behavior can be obtained by setting `error_if_nonfinite=False`
Fixes https://github.com/pytorch/pytorch/issues/46849
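A minimal sketch of both behaviors (gradient value is illustrative):
```python
import torch

p = torch.nn.Parameter(torch.randn(3))
p.grad = torch.tensor([float("inf"), 0.0, 0.0])

try:
    torch.nn.utils.clip_grad_norm_([p], max_norm=1.0, error_if_nonfinite=True)
except RuntimeError as e:
    print(e)  # a non-finite total norm now raises

# the old silent behavior:
torch.nn.utils.clip_grad_norm_([p], max_norm=1.0, error_if_nonfinite=False)
```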
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53843
Reviewed By: malfet
Differential Revision: D27291838
Pulled By: jbschlosser
fbshipit-source-id: 216d191b26e1b5919a44a3af5cde6f35baf825c4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54744
Fixes https://github.com/pytorch/pytorch/issues/54590
After porting the upsample operators to be structured, they now forward memory_format information to the output. This is a problem for the cuda kernels, which are not implemented to deal with `torch.channels_last` memory format. The operators are:
* upsample_nearest2d
* upsample_bilinear2d
* upsample_nearest3d
* upsample_trilinear3d
This fix just allocates a temporary, contiguous output tensor when that happens, writes the results to the temporary and copies the results back to the output tensor.
I held off on adding tests to get the fix out quickly, but I wrote a script and ran some manual tests that basically just assert that the outputs are the same for cpu and cuda, within some threshold. I ran it for all 4 operators:
```
import torch
def basically_equal(t1, t2):
    epsilon = 1e-4
    diffs = torch.abs(t1 - t2)
    print(torch.all(diffs < epsilon))
# upsample 2d
a = torch.arange(48).reshape(2, 2, 3, 4).contiguous(memory_format=torch.channels_last).float()
out_cpu = torch.nn.functional.interpolate(a, scale_factor=2, mode='nearest')
out_cuda = torch.nn.functional.interpolate(a.to('cuda'), scale_factor=2, mode='nearest')
basically_equal(out_cpu, out_cuda.to("cpu"))
out_cpu = torch.nn.functional.interpolate(a, scale_factor=2, mode='bilinear', align_corners=True)
out_cuda = torch.nn.functional.interpolate(a.to('cuda'), scale_factor=2, mode='bilinear', align_corners=True)
basically_equal(out_cpu, out_cuda.to("cpu"))
# upsample 3d
a = torch.arange(96).reshape(2, 2, 2, 3, 4).contiguous(memory_format=torch.channels_last_3d).float()
out_cpu = torch.nn.functional.interpolate(a, scale_factor=3, mode='nearest')
out_cuda = torch.nn.functional.interpolate(a.to('cuda'), scale_factor=3, mode='nearest')
basically_equal(out_cpu, out_cuda.to("cpu"))
out_cpu = torch.nn.functional.interpolate(a, scale_factor=3, mode='trilinear', align_corners=True)
out_cuda = torch.nn.functional.interpolate(a.to('cuda'), scale_factor=3, mode='trilinear', align_corners=True)
basically_equal(out_cpu, out_cuda.to("cpu"))
```
prints
```
tensor(True)
tensor(True)
tensor(True)
tensor(True)
```
One thing that was weird: `upsample_bilinear2d` and `upsample_trilinear3d` were only accurate across cpu/cuda with an epsilon of `1e-4`. That tentatively sounds close enough to say that cuda isn't "wrong" (?), but that's not exactly "equal"... and I also ran the script before my change, and `bilinear2d` and `trilinear3d` were also the same across cpu/cuda with an epsilon of `1e-4`.
Test Plan: Imported from OSS
Reviewed By: ezyang
Differential Revision: D27351393
Pulled By: bdhirsh
fbshipit-source-id: b33f46e4855dc8b49b363770190b639beebbf5a7
Summary:
The fallback thnn 2d convolution uses `im2col` to get patches and `gemm` to implement convolution.
It has a shortcut to use `gemm` directly for kernel size 1, but this only works for stride == 1 and padding == 0.
This PR adds checks for stride == 1 and padding == 0 when determining whether `im2col` can be skipped.
Fixes https://github.com/pytorch/pytorch/issues/54036
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54080
Reviewed By: ejguan
Differential Revision: D27170482
Pulled By: zou3519
fbshipit-source-id: 055d6502239d34945934de409d78144d8a5c56f4
Summary:
Also modify the `tf32_on_and_off` decorator to make it support functions without a `device` argument.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52871
Reviewed By: ngimel
Differential Revision: D27286674
Pulled By: mruberry
fbshipit-source-id: 14f6d558271bd6a1d0bc40691c170d47e81de1ff
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45667
First part of #3867 (Pooling operators still to do)
This adds a `padding='same'` mode to the interface of `conv{n}d`and `nn.Conv{n}d`. This should match the behaviour of `tensorflow`. I couldn't find it explicitly documented but through experimentation I found `tensorflow` returns the shape `ceil(len/stride)` and always adds any extra asymmetric padding onto the right side of the input.
Since the `native_functions.yaml` schema doesn't seem to support strings or enums, I've moved the function interface into python and it now dispatches between the numerically padded `conv{n}d` and the `_conv{n}d_same` variant. Underscores because I couldn't see any way to avoid exporting a function into the `torch` namespace.
A note on asymmetric padding. The total padding required can be odd if both the kernel-length is even and the dilation is odd. mkldnn has native support for asymmetric padding, so there is no overhead there, but for other backends I resort to padding the input tensor by 1 on the right hand side to make the remaining padding symmetrical. In these cases, I use `TORCH_WARN_ONCE` to notify the user of the performance implications.
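A minimal usage sketch (even kernel length, so the asymmetric-padding path described above is exercised):
```python
import torch
import torch.nn.functional as F

x = torch.randn(1, 3, 17, 17)
w = torch.randn(8, 3, 4, 4)         # even kernel -> one extra row/column of padding on one side
y = F.conv2d(x, w, padding="same")
print(y.shape)                      # torch.Size([1, 8, 17, 17])
```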
Test Plan: Imported from OSS
Reviewed By: ejguan
Differential Revision: D27170744
Pulled By: jbschlosser
fbshipit-source-id: b3d8a0380e0787ae781f2e5d8ee365a7bfd49f22
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53665
ngimel pointed out to me where we already test the behavior of the `Upsample` ops in `test_nn.py`. This PR deletes my bespoke tests in `test_torch.py` and updates those in `test_nn.py` to test memory format properly.
There were two reasons the original test didn't pick up on a memory format regression:
- They didn't test the memory format of the output tensor explicitly, i.e. `output.is_contiguous(memory_format=...)`
- Even with that change, the test tensors were too simple to fail the tests. From some trial and error, it looks like one of the first two dimensions in the inputs needs to be > 1 in order for the `channels_last` memory format to actually re-order the strides.
Test Plan: Imported from OSS
Reviewed By: ngimel
Differential Revision: D26929683
Pulled By: bdhirsh
fbshipit-source-id: d17bc660ff031e9b3e2c93c60a9e9308e56ea612
Summary:
Provides the implementation for feature request issue https://github.com/pytorch/pytorch/issues/28937.
Adds the `Parametrization` functionality and implements `Pruning` on top of it.
It adds the `auto` mode, on which the parametrization is just computed once per forwards pass. The previous implementation computed the pruning on every forward, which is not optimal when pruning RNNs for example.
It implements a caching mechanism for parameters. This is implemented through the mechanism proposed at the end of the discussion https://github.com/pytorch/pytorch/issues/7313. In particular, it assumes that the user will not manually change the updated parameters between the call to `backwards()` and the `optimizer.step()`. If they do so, they would need to manually call the `.invalidate()` function provided in the implementation. This could be made into a function that gets a model and invalidates all the parameters in it. It might be the case that this function has to be called in the `.cuda()` and `.to` and related functions.
As described in https://github.com/pytorch/pytorch/issues/7313, this could be used, to implement in a cleaner way the `weight_norm` and `spectral_norm` functions. It also allows, as described in https://github.com/pytorch/pytorch/issues/28937, for the implementation of constrained optimization on manifolds (i.e. orthogonal constraints, positive definite matrices, invertible matrices, weights on the sphere or the hyperbolic space...)
TODO (when implementation is validated):
- More thorough test
- Documentation
Resolves https://github.com/pytorch/pytorch/issues/28937
albanD
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33344
Reviewed By: zhangguanheng66
Differential Revision: D26816708
Pulled By: albanD
fbshipit-source-id: 07c8f0da661f74e919767eae31335a9c60d9e8fe
Summary:
Fixes https://github.com/pytorch/pytorch/issues/38137
As mentioned in the issue, this is a workaround for [python issue 43367](https://bugs.python.org/issue43367). There are a number of other places where `sys.modules` is modified, if something changes in python perhaps those should be reviewed as well.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53107
Reviewed By: zou3519
Differential Revision: D26753571
Pulled By: ezyang
fbshipit-source-id: 2bda03bab39ff9ca58ce4bc13befe021da91b9c4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52671
The code is written with the assumption that new_size is an unsigned value, and when the function is called with a negative value it silently returns a nullptr rather than raising an exception.
Fix the above-mentioned logic by converting new_size to an unsigned type and letting cpu_allocator raise an exception on a negative alloc.
Unroll nested if blocks by returning early if new_size is 0.
Add TestNN.test_adaptive_pooling_size_overflow to indirectly validate the fix.
Fixes https://github.com/pytorch/pytorch/issues/50960
Test Plan: Imported from OSS
Reviewed By: walterddr
Differential Revision: D26607549
Pulled By: malfet
fbshipit-source-id: e3d4f7548b098f24fa5aba42d8f4e9288ece1e2e
Summary:
Fixes https://github.com/pytorch/pytorch/issues/52257
## Background
Reverts MHA behavior for `bias` flag to that of v1.5: flag enables or disables both in and out projection biases.
Updates type annotations for both in and out projections biases from `Tensor` to `Optional[Tensor]` for `torch.jit.script` usage.
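With this fix, a quick sketch of the restored behavior:
```python
import torch.nn as nn

mha = nn.MultiheadAttention(embed_dim=16, num_heads=4, bias=False)
print(mha.in_proj_bias)   # None
print(mha.out_proj.bias)  # None (previously this bias was still created)
```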
Note: With this change, `_LinearWithBias` defined in `torch/nn/modules/linear.py` is no longer utilized. Completely removing it would require updates to quantization logic in the following files:
```
test/quantization/test_quantized_module.py
torch/nn/quantizable/modules/activation.py
torch/nn/quantized/dynamic/modules/linear.py
torch/nn/quantized/modules/linear.py
torch/quantization/quantization_mappings.py
```
This PR takes a conservative initial approach and leaves these files unchanged.
**Is it safe to fully remove `_LinearWithBias`?**
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52537
Test Plan:
```
python test/test_nn.py TestNN.test_multihead_attn_no_bias
```
## BC-Breaking Note
In v1.6, the behavior of `MultiheadAttention`'s `bias` flag was incorrectly changed to affect only the in projection layer. That is, setting `bias=False` would fail to disable the bias for the out projection layer. This regression has been fixed, and the `bias` flag now correctly applies to both the in and out projection layers.
Reviewed By: bdhirsh
Differential Revision: D26583639
Pulled By: jbschlosser
fbshipit-source-id: b805f3a052628efb28b89377a41e06f71747ac5b
Summary:
Some minor improvement for lazy modules introduced in https://github.com/pytorch/pytorch/issues/44538, https://github.com/pytorch/pytorch/issues/47350 and https://github.com/pytorch/pytorch/issues/51548.
This PR mainly turns the bias into an `UninitializedParameter`, and instead of creating empty tensors like
```python
self.bias = Parameter(torch.Tensor(0))
self.bias = UninitializedParameter()
```
I think it would be better to
```python
self.register_parameter('bias', None)
self.bias = UninitializedParameter()
```
In addition, I change the constructor of the `LazyBatchNorm` from
```python
self.running_mean = UninitializedBuffer()
```
to
```python
self.register_buffer('running_mean', UninitializedBuffer())
```
as the original one would not change the underlying `self._buffers`.
Thank you for your time on reviewing this PR :).
Gently ping albanD, mruberry
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52212
Reviewed By: jbschlosser
Differential Revision: D26504508
Pulled By: albanD
fbshipit-source-id: 7094d0bb4fa9e2a40a07b79d350ea12a6ebfd080
Summary:
Temporarily disabling OneDNN conv for group size = 24, as the OneDNN update came too late to be fully tested https://github.com/pytorch/pytorch/issues/50042
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52327
Reviewed By: agolynski
Differential Revision: D26474186
Pulled By: VitalyFedyunin
fbshipit-source-id: 8d6964d33c8dcab70e207088c3940810eabbd068