Commit Graph

803 Commits

Author SHA1 Message Date
Gao, Xiang
5e97f251a8 Enable TF32 support for cuDNN (#40737)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/40737
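
For context, a minimal sketch of the user-facing switch this feature family exposes, assuming the `torch.backends.cudnn.allow_tf32` flag that ships in later releases:
```
import torch

# Assumption: TF32 math for cuDNN convolutions is gated by this backend flag.
torch.backends.cudnn.allow_tf32 = True   # allow TF32 in cuDNN convolutions
torch.backends.cudnn.allow_tf32 = False  # require full FP32 precision
```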

Reviewed By: mruberry

Differential Revision: D22801525

Pulled By: ngimel

fbshipit-source-id: ac7f7e728b4b3e01925337e8c9996f26a6433fd2
2020-09-01 15:34:24 -07:00
Heitor Schueroff de Souza
13a48ac1f3 MaxPool1d without indices optimization (#43745)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43745

This is part of a larger effort to refactor and optimize the pooling code. I previously started on MaxPool2d in https://github.com/pytorch/pytorch/pull/43267, but since it uses MaxPool1d as a subroutine, it made more sense to test and optimize 1D first, then move up to 2D and then 3D.

Below are some benchmarking results; the Python script I used follows the results.

## Benchmarking
```
Name (time in us)                            Min                   Max                Mean             StdDev              Median                 IQR            Outliers  OPS (Kops/s)            Rounds  Iterations
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_googlenet[(3, 2, 0, 1, 0)-new]      79.7659 (1.03)     1,059.6327 (5.32)      90.6280 (1.01)     19.1196 (1.41)      84.2176 (1.01)       2.4289 (1.0)     1079;2818       11.0341 (0.99)       9055           1
test_googlenet[(3, 2, 0, 1, 0)-old]     505.1531 (6.55)       830.8962 (4.17)     563.4763 (6.29)     65.3974 (4.81)     538.3361 (6.43)      80.5371 (33.16)      242;99        1.7747 (0.16)       1742           1
test_googlenet[(3, 2, 0, 1, 1)-new]      80.2949 (1.04)       233.0020 (1.17)      97.6498 (1.09)     19.1228 (1.41)      89.2282 (1.07)      18.5743 (7.65)     1858;741       10.2407 (0.92)       9587           1
test_googlenet[(3, 2, 0, 1, 1)-old]     513.5350 (6.66)       977.4677 (4.91)     594.4559 (6.63)     69.9372 (5.15)     577.9080 (6.90)      79.8218 (32.86)      503;84        1.6822 (0.15)       1675           1
test_googlenet[(3, 2, 1, 1, 0)-new]      77.1061 (1.0)        199.1168 (1.0)       89.6529 (1.0)      13.5864 (1.0)       83.7557 (1.0)        7.5139 (3.09)    1419;1556       11.1541 (1.0)        7434           1
test_googlenet[(3, 2, 1, 1, 0)-old]     543.6055 (7.05)       964.5708 (4.84)     636.9867 (7.11)     84.0732 (6.19)     616.7777 (7.36)     100.4562 (41.36)      434;65        1.5699 (0.14)       1552           1
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_inception[(3, 2, 0, 1, 0)-new]      84.5827 (1.00)       184.2827 (1.0)       90.5438 (1.01)      9.6324 (1.0)       89.3027 (1.05)      4.5672 (1.03)      637;759       11.0444 (0.99)       6274           1
test_inception[(3, 2, 0, 1, 0)-old]     641.2268 (7.59)     1,704.8977 (9.25)     686.9383 (7.65)     57.2499 (5.94)     682.5905 (8.01)     58.3753 (13.17)       86;21        1.4557 (0.13)        802           1
test_inception[(3, 2, 0, 1, 1)-new]      84.5008 (1.0)      1,093.6335 (5.93)      89.8233 (1.0)      14.0443 (1.46)      85.2682 (1.0)       4.4331 (1.0)      802;1106       11.1330 (1.0)        9190           1
test_inception[(3, 2, 0, 1, 1)-old]     643.7078 (7.62)       851.4188 (4.62)     687.4905 (7.65)     41.1116 (4.27)     685.1386 (8.04)     60.2733 (13.60)      286;14        1.4546 (0.13)       1300           1
test_inception[(3, 2, 1, 1, 0)-new]     106.0739 (1.26)       258.5649 (1.40)     115.3597 (1.28)     17.5436 (1.82)     106.9643 (1.25)      5.5470 (1.25)     894;1402        8.6685 (0.78)       7635           1
test_inception[(3, 2, 1, 1, 0)-old]     651.0504 (7.70)       955.2278 (5.18)     698.0295 (7.77)     45.5097 (4.72)     692.8109 (8.13)     64.6794 (14.59)      145;15        1.4326 (0.13)        909           1
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_large_batch_size[new]       2.9608 (1.0)        5.1127 (1.0)        3.3096 (1.0)      0.1936 (1.0)        3.3131 (1.0)      0.2093 (1.0)          71;6  302.1515 (1.0)         297           1
test_large_batch_size[old]     130.6583 (44.13)    152.9521 (29.92)    137.1385 (41.44)    7.4352 (38.40)    135.1784 (40.80)    5.1358 (24.53)         1;1    7.2919 (0.02)          7           1
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_large_channel_size[new]      2.9696 (1.0)       5.5595 (1.0)       3.5997 (1.0)      0.5836 (1.0)       3.3497 (1.0)      0.3445 (1.0)         58;54  277.8014 (1.0)         277           1
test_large_channel_size[old]     19.6838 (6.63)     22.6637 (4.08)     21.1775 (5.88)     0.8610 (1.48)     21.3739 (6.38)     1.4930 (4.33)         13;0   47.2199 (0.17)         36           1
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_large_width[new]      1.7714 (1.0)       2.4104 (1.0)       1.8988 (1.0)      0.0767 (1.0)       1.8911 (1.0)      0.0885 (1.0)         86;13  526.6454 (1.0)         373           1
test_large_width[old]     19.5708 (11.05)    22.8755 (9.49)     20.7987 (10.95)    0.7009 (9.14)     20.6623 (10.93)    0.8584 (9.70)         14;1   48.0799 (0.09)         46           1
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_multithreaded[new]      15.0560 (1.0)       24.2891 (1.0)       16.1627 (1.0)      1.5657 (1.0)       15.7182 (1.0)      0.7598 (1.0)           4;6  61.8709 (1.0)          65           1
test_multithreaded[old]     115.7614 (7.69)     120.9670 (4.98)     118.3004 (7.32)     1.6259 (1.04)     118.4164 (7.53)     1.9613 (2.58)          2;0   8.4531 (0.14)          8           1
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Legend:
  Outliers: 1 Standard Deviation from Mean; 1.5 IQR (InterQuartile Range) from 1st Quartile and 3rd Quartile.
  OPS: Operations Per Second, computed as 1 / Mean
```

### Benchmarking script
To run the benchmark, make sure you have pytest-benchmark installed (`pip install pytest-benchmark`) and use the following command: `pytest benchmark.py --benchmark-sort='name'`

```
import torch
import pytest

def _test_speedup(benchmark, batches=1, channels=32, width=32,
                  kernel_size=2, stride=None, padding=0, dilation=1, ceil_mode=False, return_indices=False):
    torch.set_num_threads(1)
    x = torch.randn((batches, channels, width))
    model = torch.nn.MaxPool1d(kernel_size, stride, padding, dilation, return_indices, ceil_mode)
    benchmark(model, x)

@pytest.mark.benchmark(group="inception")
@pytest.mark.parametrize("return_indices", [True, False], ids=["old", "new"])
@pytest.mark.parametrize("params", [(3, 2), (3, 2, 0, 1, True), (3, 2, 1)],
                         ids=["(3, 2, 0, 1, 0)",
                              "(3, 2, 0, 1, 1)",
                              "(3, 2, 1, 1, 0)"])
def test_inception(benchmark, params, return_indices):
    _test_speedup(benchmark, 10, 64, 147, *params, return_indices=return_indices)

@pytest.mark.benchmark(group="googlenet")
@pytest.mark.parametrize("return_indices", [True, False], ids=["old", "new"])
@pytest.mark.parametrize("params", [(3, 2), (3, 2, 0, 1, True), (3, 2, 1)],
                         ids=["(3, 2, 0, 1, 0)",
                              "(3, 2, 0, 1, 1)",
                              "(3, 2, 1, 1, 0)"])
def test_googlenet(benchmark, params, return_indices):
    _test_speedup(benchmark, 10, 64, 112, *params, return_indices=return_indices)

@pytest.mark.benchmark(group="large batch size")
@pytest.mark.parametrize("return_indices", [True, False], ids=["old", "new"])
def test_large_batch_size(benchmark, return_indices):
    _test_speedup(benchmark, 100000, 1, 32, return_indices=return_indices)

@pytest.mark.benchmark(group="large channel size")
@pytest.mark.parametrize("return_indices", [True, False], ids=["old", "new"])
def test_large_channel_size(benchmark, return_indices):
    _test_speedup(benchmark, 1, 100000, 32, return_indices=return_indices)

@pytest.mark.benchmark(group="large width")
@pytest.mark.parametrize("return_indices", [True, False], ids=["old", "new"])
def test_large_width(benchmark, return_indices):
    _test_speedup(benchmark, 1, 32, 100000, return_indices=return_indices)

@pytest.mark.benchmark(group="multithreading")
@pytest.mark.parametrize("return_indices", [True, False], ids=["old", "new"])
def test_multithreaded(benchmark, return_indices):
    x = torch.randn((40, 10000, 32))
    model = torch.nn.MaxPool1d(2, return_indices=return_indices)
    benchmark(model, x)
```

## Discussion

The new algorithm is on average 7x faster than the old one. Moreover, because the old algorithm parallelized poorly and used the cache inefficiently, one can construct input parameters (such as a large batch size) for which the new algorithm is dramatically faster than the original.

Test Plan: Imported from OSS

Reviewed By: glaringlee

Differential Revision: D23425348

Pulled By: heitorschueroff

fbshipit-source-id: 3fa3f9b8e71200da48424a95510124a83f50d7b2
2020-09-01 08:40:01 -07:00
Gregory Chanan
a67246b2d4 Add reduction string test for ctc_loss. (#43884)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/43884
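
A rough sketch of the behavior this test presumably checks (the exact error type is an assumption):
```
import torch
import torch.nn.functional as F

log_probs = torch.randn(50, 16, 20).log_softmax(2)
targets = torch.randint(1, 20, (16, 30), dtype=torch.long)
input_lengths = torch.full((16,), 50, dtype=torch.long)
target_lengths = torch.randint(10, 30, (16,), dtype=torch.long)

F.ctc_loss(log_probs, targets, input_lengths, target_lengths, reduction='mean')  # valid
try:
    F.ctc_loss(log_probs, targets, input_lengths, target_lengths, reduction='bogus')
except (ValueError, RuntimeError) as e:
    print(e)  # an invalid reduction string should be rejected
```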

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D23427907

Pulled By: gchanan

fbshipit-source-id: 889bd92e9d3e0528b57e3952fc83e25bc7abe293
2020-09-01 07:01:54 -07:00
Gregory Chanan
42c895de4d Properly check that reduction strings are valid for l1_loss, smoothl1_loss, and mse_loss. (#43527)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/43527
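
A sketch of the validated behavior for the three losses (the exact error type is an assumption):
```
import torch
import torch.nn.functional as F

a, b = torch.randn(4), torch.randn(4)
for fn in (F.l1_loss, F.smooth_l1_loss, F.mse_loss):
    try:
        fn(a, b, reduction='not-a-reduction')
    except (ValueError, RuntimeError):
        print(fn.__name__, 'rejected the invalid reduction string')
```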

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D23306786

Pulled By: gchanan

fbshipit-source-id: f3b7c9c02ae02813da116cb6b247a95727c47587
2020-08-31 09:53:56 -07:00
Peter Bell
065ebdb92f TensorIterator: Check for memory overlap in all binary_ops (#43419)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/43419
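
A sketch of the kind of aliasing this check catches (the exact message is an assumption):
```
import torch

x = torch.randn(6)
try:
    # the out tensor partially overlaps an input in memory
    torch.add(x[:-1], 1, out=x[1:])
except RuntimeError as e:
    print(e)
```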

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D23298655

Pulled By: zou3519

fbshipit-source-id: 82e0ff308a6a7e46b4342d57ddb4c1d73745411a
2020-08-28 08:40:19 -07:00
Peter Bell
bdee8e02c0 TensorIterator: Check memory overlap in all unary_ops (#43418)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/43418

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D23298651

Pulled By: zou3519

fbshipit-source-id: 84be498f5375813fd10cf30b8beabbd2d15210a3
2020-08-28 08:39:13 -07:00
Nikita Shulga
4afbf39737 Add nn.functional.adaptive_avg_pool size empty tests (#42857)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/42857

Reviewed By: seemethere

Differential Revision: D23053677

Pulled By: malfet

fbshipit-source-id: b3d0d517cddc96796461332150e74ae94aac8090
2020-08-11 12:59:58 -07:00
Kurt Mohler
42b4a7132e Raise error if at::native::embedding is given 0-D weight (#42550)
Summary:
Previously, `at::native::embedding` implicitly assumed that the `weight` argument would be 1-D or greater. Given a 0-D tensor, it would segfault. This change makes it throw a RuntimeError instead.

Fixes https://github.com/pytorch/pytorch/issues/41780
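
A sketch of the new behavior:
```
import torch

weight = torch.tensor(1.0)  # a 0-D weight used to segfault
indices = torch.tensor([0])
try:
    torch.nn.functional.embedding(indices, weight)
except RuntimeError as e:
    print(e)  # now a clean error instead of a crash
```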

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42550

Reviewed By: smessmer

Differential Revision: D23040744

Pulled By: albanD

fbshipit-source-id: d3d315850a5ee2d2b6fcc0bdb30db2b76ffffb01
2020-08-11 08:26:45 -07:00
Nikita Shulga
3cf2551f2f Fix torch.nn.functional.grid_sample crashes if grid has NaNs (#42703)
Summary:
In `clip_coordinates`, replace the `minimum(maximum(in))` composition with `clamp_max(clamp_min(in))`, and swap the order of the `clamp_min` operands so that NaNs in the grid are clamped to 0.

Fixes https://github.com/pytorch/pytorch/issues/42616
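
A sketch of the fixed behavior (the padding mode is an assumption, chosen because `clip_coordinates` serves border padding):
```
import torch

inp = torch.randn(1, 1, 4, 4)
grid = torch.full((1, 2, 2, 2), float('nan'))  # all-NaN sampling grid
# clamp_min(NaN, low) evaluates to the bound, so NaN coordinates behave as 0
out = torch.nn.functional.grid_sample(inp, grid, mode='bilinear',
                                      padding_mode='border', align_corners=False)
print(torch.isfinite(out).all())
```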

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42703

Reviewed By: ezyang

Differential Revision: D22987447

Pulled By: malfet

fbshipit-source-id: a8a2d6de8043d6b77c8707326c5412d0250efae6
2020-08-10 16:20:09 -07:00
Peter Bell
33519e19ab Fix 64-bit indexing in GridSampler (#41923)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/41656

For the CPU version, this is a regression introduced in https://github.com/pytorch/pytorch/issues/10980 which vectorized the `grid_sampler_2d` implementation. It uses the AVX2 gather intrinsic which for `float` requires 32-bit indexing to match the number of floats in the AVX register. There is also an `i64gather_ps` variant but this only utilizes half of the vector width so would be expected to give worse performance in the more likely case where 32-bit indexing is acceptable. So, I've left the optimised AVX version as-is and reinstated the old non-vectorized version as a fallback.

For the CUDA version, this operation has never supported 64-bit indexing, so this isn't a regression. I've templated the kernel on index type and added 64-bit variants, although I gather in some places a simple `TORCH_CHECK(canUse32BitIndexMath(...))` is used instead. So, there is a decision to be made here.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41923

Reviewed By: glaringlee

Differential Revision: D22925931

Pulled By: zou3519

fbshipit-source-id: 920816107aae26360c5e7f4e9c729fa9057268bb
2020-08-06 16:08:09 -07:00
Jianyu Huang
1c5c289b62 [pt] Add include_last_offset option to EmbeddingBag mean and max (#42215)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42215

Specifically on https://github.com/pytorch/pytorch/pull/27477#discussion_r371402079

We would like include_last_offset=True to be supported overall for other reduction types like mean and max. The current limitation causes further code fragmentation in DPER (https://www.internalfb.com/intern/diff/D22794469/).

More details: https://www.internalfb.com/intern/diff/D22794469/?dest_fbid=309597093427021&transaction_id=631457624153457
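
A sketch of the extended semantics for mode='mean' (CSR-style offsets with include_last_offset=True):
```
import torch

bag = torch.nn.EmbeddingBag(10, 3, mode='mean', include_last_offset=True)
inp = torch.tensor([1, 2, 4, 5, 4, 3, 2, 9])
# One extra trailing offset equal to len(inp): the bags are inp[0:4] and inp[4:8].
offsets = torch.tensor([0, 4, 8])
print(bag(inp, offsets).shape)  # torch.Size([2, 3])
```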

ghstack-source-id: 108733009

Test Plan:
```
buck test mode/dev-nosan //caffe2/test:nn -- "test_EmbeddingBag_per_sample_weights_and_new_offsets_cpu"
```

```
(base) [jianyuhuang@devbig281.ftw3.facebook.com: ~/fbsource/fbcode/caffe2/test] $ TORCH_SHOW_CPP_STACKTRACES=1 buck test mode/dev-nosan //caffe2/test:nn -- "test_EmbeddingBag_per_sample_weights_and_new_offsets_cpu" --print-passing-details
Parsing buck files: finished in 1.2 sec
Building: finished in 5.5 sec (100%) 10130/10130 jobs, 2 updated
  Total time: 6.7 sec
More details at https://www.internalfb.com/intern/buck/build/dbdc2063-69d8-45cb-9146-308a9e8505ef
First unknown argument: --print-passing-details.
Falling back to TestPilot classic.
Trace available for this run at /tmp/testpilot.20200728-195414.1422748.log
TestPilot test runner for Facebook. See https://fburl.com/testpilot for details.
Testpilot build revision cd2638f1f47250eac058b8c36561760027d16add fbpkg f88726c8ebde4ba288e1172a348c7f46 at Mon Jul 27 18:11:43 2020 by twsvcscm from /usr/local/fbprojects/packages/testinfra.testpilot/887/t.par
Discovering tests
Running 1 test
Started new test run: https://our.intern.facebook.com/intern/testinfra/testrun/844425097242375
      ✓ caffe2/test:nn - test_EmbeddingBag_per_sample_weights_and_new_offsets_cpu (test_nn.TestNNDeviceTypeCPU) 0.162 1/1 (passed)
Test output:
> /data/users/jianyuhuang/fbsource/fbcode/buck-out/dev/gen/caffe2/test/nn#binary,link-tree/torch/_utils_internal.py:103: DeprecationWarning: This is a NOOP in python >= 3.7, its just too dangerous with how we write code at facebook. Instead we patch os.fork and multiprocessing which can raise exceptions if a deadlock would happen.
>   threadSafeForkRegisterAtFork()
> /usr/local/fbcode/platform007/lib/python3.7/importlib/_bootstrap.py:219: ImportWarning: can't resolve package from __spec__ or __package__, falling back on __name__
and __path__
>   return f(*args, **kwds)
> test_EmbeddingBag_per_sample_weights_and_new_offsets_cpu (test_nn.TestNNDeviceTypeCPU) ... Couldn't download test skip set, leaving all tests enabled...
> ok
>
> ----------------------------------------------------------------------
> Ran 1 test in 0.162s
>
> OK
Finished test run: https://our.intern.facebook.com/intern/testinfra/testrun/844425097242375
Summary (total time 5.54s):
  PASS: 1
  FAIL: 0
  SKIP: 0
  FATAL: 0
  TIMEOUT: 0
  OMIT: 0
Did _not_ run with tpx. See https://fburl.com/tpx for details.
```

Reviewed By: dzhulgakov

Differential Revision: D22801881

fbshipit-source-id: 80a624465727081bb9bf55c28419695a3d79c6e5
2020-07-29 01:20:00 -07:00
X Wang
b0424a895c Raise RuntimeError for zero stride pooling (#41819)
Summary:
Close https://github.com/pytorch/pytorch/issues/41767
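
A sketch of the new error (the message text is an assumption):
```
import torch

x = torch.randn(1, 3, 8, 8)
try:
    torch.nn.functional.max_pool2d(x, kernel_size=2, stride=0)
except RuntimeError as e:
    print(e)  # e.g. "stride should not be zero"
```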

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41819

Reviewed By: mrshenli

Differential Revision: D22780634

Pulled By: ngimel

fbshipit-source-id: 376ce5229ad5bd60804d839340d2c6505cf3288d
2020-07-28 11:07:12 -07:00
Alvaro
3e121d9688 Amend docstring and add test for Flatten module (#42084)
Summary:
I noticed that when PR https://github.com/pytorch/pytorch/issues/22245 introduced `nn.Flatten`, the docstring had a bug that kept it from rendering properly on the web; this PR addresses that. Additionally, it adds a unit test for this module.

**Actual**
![image](https://user-images.githubusercontent.com/13088001/88483672-cf896a00-cf3f-11ea-8b1b-a30d152e1368.png)

**Expected**
![image](https://user-images.githubusercontent.com/13088001/88483642-86391a80-cf3f-11ea-8333-0964a027a172.png)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42084

Reviewed By: mrshenli

Differential Revision: D22756662

Pulled By: ngimel

fbshipit-source-id: 60c58c18c9a68854533196ed6b9e9fb0d4f83520
2020-07-27 11:04:28 -07:00
Kurt Mohler
ec683299eb Reland Add non-deterministic alert to CUDA operations that use atomicAdd() (#41538)
Summary:
Reland PR https://github.com/pytorch/pytorch/issues/40056

A new overload of upsample_linear1d_backward_cuda was added in a recent commit, so I had to add the nondeterministic alert to it.
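
A sketch of how the alert surfaces, assuming a CUDA device and using the modern name of the deterministic-mode toggle (around this commit it was the experimental `torch.set_deterministic`):
```
import torch

torch.use_deterministic_algorithms(True)
x = torch.randn(1, 2, 8, device='cuda', requires_grad=True)
y = torch.nn.functional.interpolate(x, scale_factor=2, mode='linear',
                                    align_corners=False)
# At the time of this commit, upsample_linear1d's CUDA backward used
# atomicAdd, so this backward call would trigger the alert.
y.sum().backward()
```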

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41538

Reviewed By: zou3519

Differential Revision: D22608376

Pulled By: ezyang

fbshipit-source-id: 54a2aa127e069197471f1feede6ad8f8dc6a2f82
2020-07-22 13:12:29 -07:00
Vinnam Kim
825a387ea2 Fix bug on the backpropagation of LayerNorm when create_graph=True (#41595)
Summary:
Solves https://github.com/pytorch/pytorch/issues/41332.

I found that the bug reported in https://github.com/pytorch/pytorch/issues/41332 is caused by LayerNorm.

Current implementations of LayerNorm have a disparity between
1. [`create_graph=False` CUDA implementation](dde3d5f4a8/aten/src/ATen/native/cuda/layer_norm_kernel.cu (L145))
2. [`create_graph=True` implementation](dde3d5f4a8/tools/autograd/templates/Functions.cpp (L2536))

With this bug-fix, https://github.com/pytorch/pytorch/issues/41332 is solved.
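
A minimal double-backward exercise of the code path in question:
```
import torch

ln = torch.nn.LayerNorm(8)
x = torch.randn(4, 8, requires_grad=True)
y = ln(x).sum()
# create_graph=True goes through the manually derived backward in
# Functions.cpp, which must agree with the fused kernel used otherwise.
g, = torch.autograd.grad(y, x, create_graph=True)
g.sum().backward()
```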

Ailing BIT-silence

Signed-off-by: Vinnam Kim <vinnamkim@gmail.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41595

Reviewed By: houseroad

Differential Revision: D22598415

Pulled By: BIT-silence

fbshipit-source-id: 63e390724bd935dc8e028b4dfb75d34a80558c3a
2020-07-22 00:19:12 -07:00
Alvaro
c89c294ef9 Add Unflatten Module (#41564)
Summary:
This PR implements a feature extension discussed in https://github.com/pytorch/pytorch/issues/41516.

I followed the earlier PR https://github.com/pytorch/pytorch/issues/22245 to add this module. While I was at it, I also added the `extra_repr()` method to `Flatten`, which was missing.

I see there are no unit tests for these modules. Should I add those too? If so, where is the best place to put them?
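
For reference, a sketch of the new module's usage:
```
import torch

m = torch.nn.Sequential(
    torch.nn.Linear(16, 12),
    torch.nn.Unflatten(1, (3, 4)),  # expand dim 1 back into shape (3, 4)
)
print(m(torch.randn(2, 16)).shape)  # torch.Size([2, 3, 4])
```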

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41564

Reviewed By: gchanan

Differential Revision: D22636766

Pulled By: albanD

fbshipit-source-id: f9efdefd3ffe7d9af9482087625344af8f990943
2020-07-21 07:43:02 -07:00
Mike Ruberry
b2b8af9645 Removes assertAlmostEqual (#41514)
Summary:
This test function is confusing since our `assertEqual` behavior allows a tolerance to be specified, making this a redundant mechanism.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41514

Reviewed By: ngimel

Differential Revision: D22569348

Pulled By: mruberry

fbshipit-source-id: 2b2ff8aaa9625a51207941dfee8e07786181fe9f
2020-07-16 10:35:12 -07:00
Zhang, Xiaobing
b48ee175e6 [reland][DNNL]:enable conv3d (#40691)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/40691

Test Plan: Imported from OSS

Differential Revision: D22296548

Pulled By: VitalyFedyunin

fbshipit-source-id: 8e2a7cf14e8bdfa2f29b735a89e8c83f6119e68d
2020-07-15 13:54:41 -07:00
Shen Li
954c260061 Revert D22480638: [pytorch][PR] Add non-deterministic alert to CUDA operations that use atomicAdd()
Test Plan: revert-hammer

Differential Revision:
D22480638 (6ff306b8b5)

Original commit changeset: 4cc913cb3ca6

fbshipit-source-id: e47fa14b5085bb2b74a479bd0830efc2d7604eea
2020-07-15 12:10:05 -07:00
Kurt Mohler
6ff306b8b5 Add non-deterministic alert to CUDA operations that use atomicAdd() (#40056)
Summary:
Issue https://github.com/pytorch/pytorch/issues/15359

Pull Request resolved: https://github.com/pytorch/pytorch/pull/40056

Differential Revision: D22480638

Pulled By: ezyang

fbshipit-source-id: 4cc913cb3ca6d4206de80f4665bbc9031aa3ca01
2020-07-15 10:57:32 -07:00
Wojciech Baranowski
20f3051f7d [adaptive_]max_pool{1,2,3}d: handle edge case when input is filled with -inf (#40665)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/40131

Pull Request resolved: https://github.com/pytorch/pytorch/pull/40665

Differential Revision: D22463538

Pulled By: ezyang

fbshipit-source-id: 7e08fd0205926911d45aa150012154637e64a8d4
2020-07-14 21:51:40 -07:00
Kurt Mohler
0b73ea0ea2 Change BCELoss size mismatch warning into an error (#41426)
Summary:
BCELoss currently uses broadcasting semantics that differ from NumPy's. Since previous versions of PyTorch have thrown a warning in these cases telling the user that input sizes should match, and since the CUDA and CPU results differ when sizes do not match, it makes sense to upgrade the size-mismatch warning to an error.

We can consider supporting numpy broadcasting semantics in BCELoss in the future if needed.

Closes https://github.com/pytorch/pytorch/issues/40023
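
A sketch of the upgraded check (the exact error type is an assumption):
```
import torch

loss = torch.nn.BCELoss()
inp = torch.rand(4, 1)   # would silently broadcast under numpy semantics
target = torch.rand(4)
try:
    loss(inp, target)    # previously a warning, now an error
except (ValueError, RuntimeError) as e:
    print(e)
```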

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41426

Reviewed By: zou3519

Differential Revision: D22540841

Pulled By: ezyang

fbshipit-source-id: 6c6d94c78fa0ae30ebe385d05a9e3501a42b3652
2020-07-14 20:34:06 -07:00
Peter Bell
87bf04fe12 AvgPool: Ensure all cells are valid in ceil mode (#41368)
Summary:
Closes https://github.com/pytorch/pytorch/issues/36977

This avoids the division by zero that was causing NaNs to appear in the output. `AvgPool2d` and `AvgPool3d` both had this issue on CPU and CUDA.
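
A sketch in the spirit of the linked issue (the exact reproducing parameters are an assumption):
```
import torch

x = torch.randn(1, 1, 3, 3)
# With ceil_mode, the last window can start entirely inside the padding; with
# count_include_pad=False its divisor was then 0, producing NaN.
out = torch.nn.functional.avg_pool2d(x, kernel_size=2, stride=2, padding=1,
                                     ceil_mode=True, count_include_pad=False)
assert not torch.isnan(out).any()
```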

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41368

Reviewed By: ailzhang

Differential Revision: D22520013

Pulled By: ezyang

fbshipit-source-id: 3ece7829f858f5bc17c2c1d905266ac510f11194
2020-07-14 09:24:30 -07:00
Kimish Patel
82c9f79e0e Add fused add_relu op. (#39342)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39342

Many networks, such as ResNet, have adds followed by ReLUs. This op is the
first step in enabling a fused implementation.
Once we have the fused add_relu op, a JIT pass will be written to
replace add + relu patterns with add_relu.
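
A sketch of the pattern being fused (the Python-level binding for the fused op is internal, so only the equivalent eager pattern is shown):
```
import torch

a, b = torch.randn(8), torch.randn(8)
# The pattern the planned JIT pass will rewrite: an elementwise add
# immediately followed by relu, computed as max(a + b, 0) in one kernel.
out = torch.relu(a + b)
```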

Test Plan:
python test/test_nn.py TestAddRelu

Imported from OSS

Differential Revision: D21822397

fbshipit-source-id: 03df83a3e46ddb48a90c5a6f755227a7e361a0e8
2020-07-09 16:25:11 -07:00
Liu
54d7a1e3f4 Fix module dict key ordering (#40905)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/40227
Removed the key-sorting in the ModuleDict class and updated the docstring accordingly. Also removed a sort in the corresponding unit test, which would otherwise make the test fail.

BC Note: from Python 3.6 onward, plain dicts preserve the insertion order of keys.
Example: a Python 3.6+ user who initializes a ModuleDict from the plain dict
{
    "b": torch.nn.MaxPool2d(3),
    "a": torch.nn.MaxPool2d(3)
}
gets a ModuleDict that preserves that order:
ModuleDict(
  (b): MaxPool2d(kernel_size=3, stride=3, padding=0, dilation=1, ceil_mode=False)
  (a): MaxPool2d(kernel_size=3, stride=3, padding=0, dilation=1, ceil_mode=False)
)

A Python 3.5 user, given the same input, could instead get:
ModuleDict(
  (a): MaxPool2d(kernel_size=3, stride=3, padding=0, dilation=1, ceil_mode=False)
  (b): MaxPool2d(kernel_size=3, stride=3, padding=0, dilation=1, ceil_mode=False)
)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40905

Differential Revision: D22357480

Pulled By: albanD

fbshipit-source-id: 0e2502769647bb64f404978243ca1ebe5346d573
2020-07-06 06:40:48 -07:00
Sameer Deshmukh
cf8a9b50ca Allow ReflectionPad to accept 0-dim batch sizes. (#39231)
Summary:
Allows ReflectionPad1d and ReflectionPad2d to accept 0-dim batch sizes; see the sketch after the issue list below.

Related to issues:

* https://github.com/pytorch/pytorch/issues/38115
* https://github.com/pytorch/pytorch/issues/12013
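
A sketch of the newly accepted input:
```
import torch

pad = torch.nn.ReflectionPad2d(1)
x = torch.randn(0, 3, 8, 8)   # zero-sized batch dimension
print(pad(x).shape)           # torch.Size([0, 3, 10, 10])
```
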
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39231

Reviewed By: ezyang

Differential Revision: D22205717

Pulled By: mruberry

fbshipit-source-id: 6744661002fcbeb4aaafd8693fb550ed53f3e00f
2020-06-24 22:24:05 -07:00
Xiao Wang
17d3f74ea3 Relax cudnn conditions for channels-last convolutions (#38904)
Summary:
Follow up of https://github.com/pytorch/pytorch/issues/38044. Thanks ptrblck, mcarilli for the help on discussing the changes!

Could fix https://github.com/pytorch/pytorch/issues/37725 by skipping the depthwise-workload check introduced in https://github.com/pytorch/pytorch/issues/22302. This PR also relaxed dilated convolution for channels-last.

The testing script is https://gist.github.com/xwang233/82a707f69bb710cb612349280a2c5f41. About 387k conv arguments were tested and no cudnn exception was thrown.
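
A sketch of a case the relaxation admits (assumes a CUDA device with cuDNN):
```
import torch

conv = torch.nn.Conv2d(32, 32, 3, dilation=2).cuda() \
           .to(memory_format=torch.channels_last)
x = torch.randn(8, 32, 56, 56, device='cuda') \
         .to(memory_format=torch.channels_last)
out = conv(x)  # dilated channels-last convolution, now eligible for cuDNN
print(out.is_contiguous(memory_format=torch.channels_last))
```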

cc ngimel VitalyFedyunin ptrblck mcarilli
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38904

Differential Revision: D22155797

Pulled By: VitalyFedyunin

fbshipit-source-id: 81b5736cec67ea263029121521c6acafd9dddba6
2020-06-22 10:59:37 -07:00
F-G Fernandez
881c1adfcd Fixed buffer update in BatchNorm when track_running_stats is set to False (#38084)
Summary:
This PR aims at tackling https://github.com/pytorch/pytorch/issues/37823 by:
- ensuring that buffers will be used for normalization computation but won't be updated, when buffers are not None, and `track_running_stats=False`
- adding a corresponding unittest to ensure expected behaviour

Any feedback is welcome!

_Note: we might want to update the docstrings of  `BatchNorm*d`, feel free to share any suggestion!_
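
A sketch of the intended behavior:
```
import torch

bn = torch.nn.BatchNorm2d(3)    # buffers created at construction
bn.track_running_stats = False  # toggled afterwards, so buffers are not None
before = bn.running_mean.clone()
bn(torch.randn(2, 3, 4, 4))
assert torch.equal(bn.running_mean, before)  # buffers are no longer updated
```
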
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38084

Differential Revision: D22047871

Pulled By: ezyang

fbshipit-source-id: 5acbcad9773e7901f26d625db71d43d7dc236d3e
2020-06-22 08:17:31 -07:00
Xiao Wang
1670ea9474 Remove overload of GPU max_pool3d with kernel_width; fix nan, inf in GPU {fractional,adaptive} max_pool{2,3}d (#39903)
Summary:
Fix https://github.com/pytorch/pytorch/issues/39846.
Fix https://github.com/pytorch/pytorch/issues/39044

The problem was that `max_pool3d_with_indices_single_out_frame` has an overload in which kernel_width is a template argument. The two overloaded kernels were supposed to be identical; however, they were not.

The general version
da3073e9b1/aten/src/ATen/native/cuda/DilatedMaxPool3d.cu (L69-L73)

The overloaded version
da3073e9b1/aten/src/ATen/native/cuda/DilatedMaxPool3d.cu (L130-L134)

When max_pool3d is "switch-case"-ed to the overloaded version, the NaN value comparison is skipped. Also, maintaining two overloaded versions of such a complicated kernel would be hard, and I'm not sure the overloaded version even gives a large performance benefit. So I propose removing the kernel_width-overloaded version.

Also, the current max_pool_XD_nan tests forgot the device kwarg; I added that.

Edit: profiling before and after
script: https://github.com/xwang233/code-snippet/blob/master/maxpool-3d-kw-template-arg/a.py
plot: https://github.com/xwang233/code-snippet/blob/master/maxpool-3d-kw-template-arg/b.ipynb

The performance difference is within +- 5%.
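
A sketch of the restored NaN behavior (assumes a CUDA device):
```
import torch

x = torch.randn(1, 1, 4, 4, 4, device='cuda')
x[0, 0, 0, 0, 0] = float('nan')
out = torch.nn.functional.max_pool3d(x, kernel_size=2)
assert torch.isnan(out[0, 0, 0, 0, 0])  # NaN must win the max comparison
```
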
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39903

Differential Revision: D22080759

Pulled By: ngimel

fbshipit-source-id: 4dacdd266a0522b3ff432eb9d58b131fa86821e9
2020-06-17 16:18:33 -07:00
Emilio Castillo
5e77999ecb Add global hooks to torch.nn.Module (#38972)
Summary:
This allows registering hooks that will be executed for every module.

This idea arose in a discussion with tkerola, and niboshi kindly proposed this approach.

The use case is to avoid boilerplate when registering the same hook for all the modules of a complex model; the internal use case was to let every model accept a NumPy array in the forward pass in a simpler way. Other use cases involve general mechanisms for plotting, tracing, and debugging.

Currently, this is shared for all the modules but this can be worked out to have the hooks shared only per type of module.

If this functionality is not needed feel free to close the PR.
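
A sketch of the new API, assuming the name it landed under:
```
import torch
from torch.nn.modules.module import register_module_forward_hook

def log_output_shape(module, inputs, output):
    print(type(module).__name__, tuple(output.shape))

handle = register_module_forward_hook(log_output_shape)  # fires for every module
model = torch.nn.Sequential(torch.nn.Linear(4, 8), torch.nn.ReLU())
model(torch.randn(2, 4))
handle.remove()
```
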
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38972

Differential Revision: D22091364

Pulled By: albanD

fbshipit-source-id: 204ff5f9e119eff5bdd9140c64cb5dc467bb23a2
2020-06-17 12:20:35 -07:00
Emilio Castillo
5200814cfa Fix test_hook_* issues (#40135)
Summary:
Follows https://github.com/pytorch/pytorch/issues/38972

Some of the changes asked by albanD in the above review are appliable to the regular hooks tests.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40135

Differential Revision: D22091389

Pulled By: albanD

fbshipit-source-id: e1004213276bfb189167b9870e1a88b3d23b458c
2020-06-17 08:50:42 -07:00
jiej
bfcb687b9c Nearest interpolation gpu implementation fix [Resolves issue #38985] (#39055)
Summary:
Fixes the nearest-upsample dgrad bug, where the window computation was previously wrong;
fixes the Python test, which previously did not exercise the GPU implementation.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39055

Differential Revision: D21763242

Pulled By: albanD

fbshipit-source-id: 9b1d5365f40176450f529136110542fd36bd7f58
2020-05-28 08:07:14 -07:00
Ailing
20397285c6 Replace use of np.allclose in tests. (#34287)
Summary:
fixes https://github.com/pytorch/pytorch/issues/34096
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34287

Differential Revision: D21735525

Pulled By: ailzhang

fbshipit-source-id: 611da17cfc5a3fee77d482abccf8f9854f504263
2020-05-27 15:29:35 -07:00
Mike Ruberry
13120bf677 Updates assertEqual to require atol and rtol, removes positional atol (#38872)
Summary:
This updates assertEqual and assertEqual-like functions to either require both or neither of atol and rtol be specified. This should improve clarity around handling precision in the test suite, and it allows us to remove the legacy positional atol argument from assertEqual. In addition, the "message" kwarg is replaced with a kwarg-only "msg" argument whose name is consistent with unittest's assertEqual argument.

In the future we could make "msg" an optional third positional argument to be more consistent with unittest's assertEqual, but requiring it be specified should be clear, and we can easily update the signature to make "msg" an optional positional argument in the future, too.
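
A sketch of the resulting test-suite idiom (the class and tensors are illustrative):
```
import torch
from torch.testing._internal.common_utils import TestCase

class Example(TestCase):
    def test_close(self):
        a = torch.tensor([1.0000, 2.0001])
        b = torch.tensor([1.0001, 2.0000])
        # atol and rtol must now be passed together (or not at all), and
        # the message is the keyword-only `msg`:
        self.assertEqual(a, b, atol=1e-3, rtol=0, msg="tensors differ")
        # self.assertEqual(a, b, 1e-3)  # positional atol: no longer accepted
```
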
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38872

Differential Revision: D21740237

Pulled By: mruberry

fbshipit-source-id: acbc027aa1d7877a49664d94db9a5fff91a07042
2020-05-27 06:31:07 -07:00
Rohan Varma
63e545e0fe Revert D21717199: [pytorch][PR] Updates assertEqual to require atol and rtol, removes positional atol
Test Plan: revert-hammer

Differential Revision:
D21717199

Original commit changeset: 9feb856f94ee

fbshipit-source-id: bfde9c39a5ce99f0ca6183a7dde703c65b7c8259
2020-05-26 18:23:59 -07:00
Xiao Wang
e4a3c584d5 Fix max_pool2d nchw backward bug (#38953)
Summary:
Fix https://github.com/pytorch/pytorch/issues/38764

The current problem is that the `top_diff` and `top_mask` pointers are shifted cumulatively across the for-n and for-c loops. This can cause overflow and illegal memory access when a loop count is greater than one, that is, when n > 65535 or c > 65535 (the case in https://github.com/pytorch/pytorch/issues/38764). Since neither n > 65535 nor c > 65535 is common, this had not been seen before. The simple fix is to use new pointer variables for the n and c offsets instead of directly modifying `top_diff` or `top_mask`.

However, I think the current nchw max_pool2d GPU impl still has plenty of room for performance improvement. We can check that in a later PR if needed.

Slightly cleaned up the indentation. Also added tests that use the CPU impl as a reference check.
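
A sketch of the previously failing configuration (assumes a CUDA device; c > 65535 makes the for-c loop run more than once):
```
import torch

x = torch.randn(1, 70000, 4, 4, device='cuda', requires_grad=True)
out = torch.nn.functional.max_pool2d(x, kernel_size=2)
out.sum().backward()        # previously an illegal memory access
print(x.grad.shape)
```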

cc skrah
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38953

Differential Revision: D21721930

Pulled By: ezyang

fbshipit-source-id: fef7d911d814f8ed9fd67c60cabe5d52f8fd3d57
2020-05-26 12:00:31 -07:00
Xiao Wang
583ff947e1 Fix max_pool2d for returning wrong shape with return_indices=True on cuda (#38992)
Summary:
Fix https://github.com/pytorch/pytorch/issues/38986

The current code only resizes the pooling output but forgets to resize the indices as well.
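
A sketch of the fixed invariant (assumes a CUDA device):
```
import torch

x = torch.randn(2, 3, 8, 8, device='cuda')
out, idx = torch.nn.functional.max_pool2d(x, 2, return_indices=True)
assert out.shape == idx.shape  # indices are now resized alongside the output
```
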
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38992

Differential Revision: D21718324

Pulled By: ngimel

fbshipit-source-id: 7cf937966d38ab2167be79979475c4e0cacbf82c
2020-05-26 11:27:36 -07:00
Mike Ruberry
6ddca30b2d Updates assertEqual to require atol and rtol, removes positional atol (#38872)
Summary:
This updates assertEqual and assertEqual-like functions to either require both or neither of atol and rtol be specified. This should improve clarity around handling precision in the test suite, and it allows us to remove the legacy positional atol argument from assertEqual. In addition, the "message" kwarg is replaced with a kwarg-only "msg" argument whose name is consistent with unittest's assertEqual argument.

In the future we could make "msg" an optional third positional argument to be more consistent with unittest's assertEqual, but requiring it be specified should be clear, and we can easily update the signature to make "msg" an optional positional argument in the future, too.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38872

Differential Revision: D21717199

Pulled By: mruberry

fbshipit-source-id: 9feb856f94eee911b44f6c7140a1d07c1b026d3a
2020-05-26 08:30:23 -07:00
Natalia Gimelshein
c34b333230 improve accuracy of logsoftmax computation on cuda (#38945)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/38839. Previously, if the magnitude of the input values was large, the `log(sum)` term was essentially ignored when computing `max + log(sum)`; now the result is computed as `x - max - log(sum)`, which has a better chance of preserving accuracy.
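
A sketch of the numerical issue (assumes a CUDA device). With all entries equal, log_softmax should give log(1/4) ≈ -1.3863 everywhere; at magnitude 2**24 the fp32 spacing is 2, so the old `max + log(sum)` grouping rounds the log-sum correction away:
```
import torch

x = torch.full((4,), 2.0**24, device='cuda')
print(torch.log_softmax(x, dim=0))  # ≈ -1.3863 with the fixed formulation
```
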
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38945

Differential Revision: D21712483

Pulled By: ngimel

fbshipit-source-id: c1a3599ed981ba7a7fd130cbd7040a706b7eace0
2020-05-26 08:29:56 -07:00
jiej
5b8a79ab49 fix the device inconsistency for import convert_sync_batchnorm (#38729)
Summary:
This fixes the device inconsistency reported in https://github.com/pytorch/pytorch/issues/37930
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38729

Differential Revision: D21671039

Pulled By: ngimel

fbshipit-source-id: 17fdb4eae2ddaf64560dd026fe39958536ab313f
2020-05-20 15:42:53 -07:00
Jeff Daily
55914f8e83 Add skipCUDAIfRocm to test_nn test_softmax_results. (#38724)
Summary:
CC ezyang xw285cornell sunway513

Commit 59d92e442b (https://github.com/pytorch/pytorch/issues/38557) has caused this test to regularly fail on ROCm CI gfx900 hosts.  Skipping test until root cause analysis can complete.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38724

Differential Revision: D21645815

Pulled By: xw285cornell

fbshipit-source-id: 4087e9565710c271ca5c026a5ae0c5132e56f44d
2020-05-19 13:20:34 -07:00
Natalia Gimelshein
54d4b419db fix clip_grad_norm to work with parameters on the different devices (#38615)
Summary:
Per title.
We move all the individual gradient norms to a single device before stacking (a no-op if all the gradients are already on one device); `clip_coef` is copied to each gradient's device, which may be suboptimal since there can be multiple copies, but it is no worse than when we were synchronizing for each parameter. In the simple case where all gradients are on a single device, there should be no synchronization.
Also, we no longer error out if the parameter list is empty or none of the parameters have gradients, and return a total_norm of 0 instead.
Fixes https://github.com/pytorch/pytorch/issues/38605
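
A sketch of both changes (assumes a CUDA device for the mixed-device case):
```
import torch

params = [torch.randn(3, requires_grad=True),
          torch.randn(3, device='cuda', requires_grad=True)]
for p in params:
    p.grad = torch.ones_like(p)
total = torch.nn.utils.clip_grad_norm_(params, max_norm=1.0)  # mixed devices OK
# An empty parameter list (or no grads) now returns 0 instead of raising:
print(torch.nn.utils.clip_grad_norm_([], max_norm=1.0))
```
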
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38615

Reviewed By: ailzhang

Differential Revision: D21634588

Pulled By: ngimel

fbshipit-source-id: ea4d08d4f3445438260052820c7ca285231a156b
2020-05-19 10:33:40 -07:00
Simon Layton
59d92e442b Vectorize non-persistent Softmax (#38557)
Summary:
Resubmit of https://github.com/pytorch/pytorch/issues/36485 with bug fix & enhanced testing.

Moved `test_softmax_backward` -> `test_softmax_results`, check fprop & bgrad against CPU implementation for all cases.

\cc ngimel
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38557

Differential Revision: D21620805

Pulled By: ngimel

fbshipit-source-id: 4f736b3e59f79142e1b982eb643c592dedcbe111
2020-05-18 13:05:36 -07:00
Mike Ruberry
9cfc10d52e Updates assertEqual to use torch.isclose-like logic (#37294)
Summary:
Edit: this has been updated to reflect the PR's current status, which has changed after review.

This PR updates the behavior of assertEqual, assertNotEqual, and assert_allclose to be consistent with each other and with torch.isclose. It corrects several additional bugs in the current implementations and adds extensive testing and comments, too.

These updates follow from changes to assertEqual like https://github.com/pytorch/pytorch/pull/34258 and https://github.com/pytorch/pytorch/pull/37069, and from our discussion of torch.isclose for complex tensors (see https://github.com/pytorch/pytorch/issues/36462), where we decided to implement a NumPy-compatible mathematical notion of "closeness" for complex tensors that is not a great fit for our testing framework.

The detailed changelist is:

- New test framework functions for comparing tensors and scalars
  - Tensors are compared using isclose; the real and imaginary parts of complex tensors are compared independently
  - Scalars are compared using the same algorithm
  - assertEqual and assert_allclose now use this common comparison function, instead of each implementing their own with divergent behavior
  - assertEqual-like debug messages are now available for all tensor and scalar comparisons, with additional context when comparing the components of sparse, quantized, and complex tensors
- Extensive testing of the comparison behavior and debug messages
- Small Updates
  - assertEqual now takes an "exact_device" argument, analogous to "exact_dtype", which should be useful in multidevice tests
  - assertEqual now takes an "equal_nan" argument for argument consistency with torch.isclose
  - assertEqual no longer takes the "allow_inf" keyword, which misleadingly only applied to scalar comparisons, was only ever set (rarely) to true, and is not supported by torch.isclose
- Bug fixes:
  - the exact_dtype attribute has been removed (no longer needed after https://github.com/pytorch/pytorch/pull/38103)
  - message arguments passed to assertEqual are now handled correctly
  - bool x other dtype comparisons are now supported
  - uint8 and int8 tensor comparisons now function properly
  - rtol for integer comparisons is now supported (default is zero)
  - rtol and atol for scalar comparisons are now supported
  - complex scalar comparisons are now supported, analogous to complex tensor comparisons
  - assertNotEqual is now equivalent to the logical negation of assertEqual
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37294

Differential Revision: D21596830

Pulled By: mruberry

fbshipit-source-id: f2576669f7113a06f82581fc71883e6b772de19b
2020-05-15 16:24:03 -07:00
Natalia Gimelshein
c0bc182761 Revert "Vectorize non-persistent Softmax kernels (#36485)" (#38534)
Summary:
This reverts commit c879c6fb98.
(it produces incorrect results)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38534

Reviewed By: soumith

Differential Revision: D21589251

Pulled By: ngimel

fbshipit-source-id: 66d5324848d0245d15b7ef5f1fe4302ed0992b56
2020-05-14 23:17:59 -07:00
David Reiss
d060deb5bb Remove _compatible_subtest (#35620)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35620

Python 2 has reached end-of-life and is no longer supported by PyTorch.
`self.subTest` can be used directly in Python 3.

Test Plan: CI

Differential Revision: D20842872

Pulled By: dreiss

fbshipit-source-id: 6ad42550c01e6959821ff07df767fc14b58c5a9e
2020-05-14 10:07:48 -07:00
Robert Wang
2b2d2168e8 Issue #27441 Fix: Bug in updating ModuleDict & ParameterDict (#27814)
Summary:
Fix a bug in `nn.ModuleDict.update` and `nn.ParameterDict.update` when another dictionary of the same type is passed as input.
Related issue: [Issue https://github.com/pytorch/pytorch/issues/27441](https://github.com/pytorch/pytorch/issues/27441)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27814

Differential Revision: D21518099

Pulled By: ezyang

fbshipit-source-id: 9e6bb6fcc26c8070e137e2e52c65f69a1fcaab37
2020-05-14 08:01:41 -07:00
Jeff Daily
138769b1b8 [ROCm] add exact_dtype=False to bfloat16 test (#38381)
Summary:
CC rohithkrn ezyang xw285cornell

Fixes
- TestNNDeviceTypeCUDA.test_activations_bfloat16_cuda
- TestNNDeviceTypeCUDA.test_pooling_bfloat16_cuda
- TestNNDeviceTypeCUDA.test_softmax_bfloat16_cuda
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38381

Differential Revision: D21549636

Pulled By: ezyang

fbshipit-source-id: acb290c57eff4077b040a696267ecde613f0a433
2020-05-13 08:48:18 -07:00
Vitaly Fedyunin
57d01be92b Replacing assertEqual with assertEqualIgnoreType wherever types missmatch (#38102)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/38102

Test Plan: Imported from OSS

Differential Revision: D21477060

Pulled By: VitalyFedyunin

fbshipit-source-id: 25e0fd837ca9bfccf0ce994c80f7790c894096d4
2020-05-09 14:48:55 -07:00
Simon Layton
c879c6fb98 Vectorize non-persistent Softmax kernels (#36485)
Summary:
Add read/write vectorization to the non-persistent softmax kernels only. At this point the launch logic has minimal changes, and `ILP=vectorization=2` is always used (the code can handle other values, but `ILP=2` has been the most consistent performer).

Dispatch to persistent / non-persistent kernels is unchanged.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36485

Differential Revision: D21477775

Pulled By: ngimel

fbshipit-source-id: 9ff7fd243695d7bbf4121390085b64db0bbdef35
2020-05-08 15:20:33 -07:00