Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31455
In VS 15.9, __FUNCSIG__ unwraps `using` definitions and also preserves noexcept qualifiers.
Test Plan: Build caffe2 on Windows using VS2017
Differential Revision: D19166204
fbshipit-source-id: b6c5f70e5262d13adf585f77b92223cf5f1e78dd
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30429
Also fix a bug in uncoalesced division.
General approach here is that we:
* compute the common dtype based on input tensors
* error if the output tensor is specified and the common type can't be cast back to the output type (e.g. for inplace ops)
* convert input tensor (values) to the common dtype
* perform the op as normal (computing at the common dtype instead of the result type).
* convert/copy the result values back to that of the result tensor (for in-place ops).
For uncoalesced division we need to coalesce first, because an integral tensor with values=[1, 1] at the same index divided by 2 would give 1/2 + 1/2 = 0 instead of 2/2 = 1.
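For illustration, a minimal sketch (not the PR's test code) of why coalescing matters for integral division:
```
import torch

# Two values stored at the same index; the logical entry at index 0 is 1 + 1 = 2.
i = torch.tensor([[0, 0]])
v = torch.tensor([1, 1])
s = torch.sparse_coo_tensor(i, v, (1,))

# Dividing the raw stored values with integer semantics would yield
# 1/2 + 1/2 = 0 + 0 = 0, whereas coalescing first gives 2/2 = 1.
print(s.coalesce().values())  # tensor([2])
```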
Test Plan: Imported from OSS
Differential Revision: D19143223
Pulled By: nairbv
fbshipit-source-id: 480fa334c0b2b3df046818f2342cfd4e2d9d892a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31507
This script is used to generate a model with bound shape inference and
blob reorder, which are requirements for big model loading on T17.
1. Load existing model.
2. Do bound shape inference and blob reorder (put embedding blobs at the end).
3. Save the modified model.
Test Plan:
Generated a new model and tested on NNPI.
P124181047 (mismatch is AA variance)
Reviewed By: ipiszy
Differential Revision: D19165467
fbshipit-source-id: c3522fc5dc53b7ec652420558e9e8bf65a1ccfae
Summary:
https://github.com/pytorch/pytorch/pull/30330 got rid of the need to send a `MessageType::SHUTDOWN` message, so we can now remove the logic/utils for this type of message.
I think we can also delete the enum entry in the `enum MessageType`, but we may want to keep it in case the logic in https://github.com/pytorch/pytorch/pull/30710 is ever moved to C++.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31270
Test Plan: All existing unit tests pass
Differential Revision: D19146983
Pulled By: rohan-varma
fbshipit-source-id: 35b185411f9446d7d4dfc37a6cb5477cf041e647
Summary:
Fixes a bad merge that is breaking distributed tests on master
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31492
Pulled By: driazati
Differential Revision: D19180978
fbshipit-source-id: f69f525e2c7f61194686f07cf75db00eb642882f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31293
Previously we checked the number of elements in scale to determine whether we are using per-channel quantization,
but we should get the qscheme information from the observer module directly, and we'll expose this information
to the caller as well.
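A rough sketch of the idea (the helper name is illustrative, not the actual internals; it assumes a standard observer module such as torch.quantization.PerChannelMinMaxObserver that exposes .qscheme):
```
import torch

# Decide per-channel vs. per-tensor from the observer's qscheme rather than
# from scale.numel().
def is_per_channel(observer):
    return observer.qscheme in (
        torch.per_channel_affine,
        torch.per_channel_symmetric,
    )
```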
Test Plan:
.
Imported from OSS
Differential Revision: D19146669
fbshipit-source-id: ea430eeae0ef8f441be39aa6dcc1bb530b065554
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31473
Mitigates #6313
A common use case for the autograd profiler is to run it over an
entire model, including dataloading. The following will crash:
- run the autograd profiler in CUDA mode
- use a multi-worker DataLoader (presumably with the 'fork' start
method)
This crashes because the autograd profiler initializes CUDA, and forking after CUDA is
initialized is bad.
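A minimal sketch of that failing pattern (assuming a CUDA build; dataset and sizes are arbitrary):
```
import torch
from torch.utils.data import DataLoader, TensorDataset

# Profiling in CUDA mode initializes CUDA in the parent process; the
# DataLoader workers then fork after CUDA is initialized.
ds = TensorDataset(torch.randn(8, 3))
loader = DataLoader(ds, num_workers=2)

with torch.autograd.profiler.profile(use_cuda=True):
    for (x,) in loader:
        x.sum()
```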
This PR puts in a nice error message when this happens so that users
aren't too confused. The new error message looks like:
https://gist.github.com/zou3519/903f15c3e86bad4585b7e5ce14cc1b70
Test Plan:
- Tested locally.
- I didn't add a test case for this because it's hard to write a test
case that doesn't completely stop the rest of our test suite from
running.
Differential Revision: D19178080
Pulled By: zou3519
fbshipit-source-id: c632525ba1f7b168324f1aa55416e5250f56a086
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31484
See https://github.com/pytorch/pytorch/issues/26123 for context.
Previously, when someone googles for `pytorch "adaptive_max_pool2d"`,
https://pytorch.org/docs/stable/_modules/torch/nn/modules/pooling.html
is the first result. This PR changes the docs build script to exclude
all such generated source docs under `_modules/` from Google.
It does this by doing a search for `<head>` and then appending
`<meta name="robots" content="noindex">`.
The [google developer
docs](https://support.google.com/webmasters/answer/93710?hl=en) suggest
that this is the right way to prevent google from indexing the page.
In the future, when the CI
builds documentation (both master and stable docs), the newly created
docs under _modules will have the meta noindex tag.
Test Plan:
- I ran `find "$install_path/_modules" -name "*.html" -print0 | xargs -0
sed -i '/<head>/a \ \ <meta name="robots" content="noindex">'` on a docs
build locally and checked that it does indeed append the meta noindex
tag after `<head>`.
- In a few days we should rerun the search to see if these pages are
still being indexed.
Differential Revision: D19180300
Pulled By: zou3519
fbshipit-source-id: 5f5aa95a85dd9f065607c2a16f4cdd24ed699a83
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31436
Tensor::has_names is slower than it should be for unnamed tensors
because of the following:
- it always tries to access the TLS for NamesMode. Unnamed tensors don't
need to peek at NamesMode to determine if they have names or not.
- There is some virtual function being called because TensorImpl is in
c10 and NamedTensorMeta is in libtorch.
This PR short-circuits Tensor::has_names for unnamed tensors by
checking whether the underlying TensorImpl holds a pointer to NamedTensorMeta.
If the NamedTensorMeta is nullptr, then the tensor is definitely
unnamed.
Benchmarks:
- I have a dedicated benchmarking machine where I isolate a single CPU
and make sure it runs at a fixed frequency.
- I benchmarked torch.add, which calls `tensor::has_names` three times.
- The TL;DR is that torch.add between size-1 unnamed tensors gets sped up by
~200ns after this change, which is a 9% improvement.
- Before, on my machine:
https://gist.github.com/zou3519/dfd648a1941d584711d850754e0694bc
- After on my machine:
https://gist.github.com/zou3519/e78f0d8980b43d0d9c3e3e78ecd0d4d5
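A rough reproduction of the benchmark's core measurement (the exact harness is in the linked gists):
```
import timeit
import torch

# Time torch.add on size-1 unnamed tensors, the case sped up by this change.
a = torch.randn(1)
b = torch.randn(1)
print(timeit.timeit(lambda: torch.add(a, b), number=1_000_000))
```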
Test Plan: - run tests
Differential Revision: D19166510
Pulled By: zou3519
fbshipit-source-id: 1888a4e92d29152a5e3b778a95e531087e532f53
Summary:
Reference: https://github.com/pytorch/pytorch/issues/23159
Currently we don't support reduction operations for tensors with dim >= 64, so we should give a descriptive RuntimeError indicating this.
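A quick sketch of the now-covered case (assuming construction of a 65-dim tensor succeeds on CPU, which the linked issue implies):
```
import torch

# Reductions over tensors with 64 or more dims are unsupported; after this
# change the failure is a descriptive RuntimeError instead of an obscure one.
x = torch.zeros([1] * 65)
try:
    x.sum()
except RuntimeError as e:
    print(e)
```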
Diff: D19179039
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31476
Differential Revision: D19179039
Pulled By: anjali411
fbshipit-source-id: 58568f64627bf3df6b3e00a1498544c030e74a0e
Summary:
Reference: https://github.com/pytorch/pytorch/issues/31385
In the current documentation for NLLLoss, it's unclear what `y` refers to in the math section of the loss description. There was an issue (https://github.com/pytorch/pytorch/issues/31295) filed earlier where there was confusion about whether the loss returned for reduction='mean' is correct, perhaps because of the lack of clarity in the formula's symbol descriptions in the current documentation.
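For context, a small illustration of the symbols involved: `x` is the log-probability input and `y` (the target) holds class indices.
```
import torch
import torch.nn as nn

# N = 3 samples, C = 5 classes; y contains one class index per sample.
x = torch.log_softmax(torch.randn(3, 5), dim=1)
y = torch.tensor([1, 0, 4])
print(nn.NLLLoss(reduction='mean')(x, y))
```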
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31488
Differential Revision: D19181391
Pulled By: anjali411
fbshipit-source-id: 8b75f97aef93c92c26ecbce55b3faf2cd01d3e74
Summary:
The current numba version doesn't appear to actually work with our numba-cuda tests (numba.cuda.is_available() fails).
Previous attempts to upgrade were blocked by https://github.com/numba/numba/issues/4368.
It's a bit unclear to me, but I believe 0.46.0 fixes the above issue. I'm verifying that we catch that issue in CI via https://github.com/pytorch/pytorch/pull/31434.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31435
Differential Revision: D19166865
Pulled By: gchanan
fbshipit-source-id: e01fa48c577e35de178423db7a7f79ac3dd3894d
Summary:
Previously we would only catch `py::cast_error` which led to incomprehensible error messages like: `TypeError: 'NoneType' object is not iterable`. We are running arbitrary pybind code here, and not doing anything with the error message, so we should be less restrictive with the types of errors we catch.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31398
Differential Revision: D19166655
Pulled By: eellison
fbshipit-source-id: 84db8b3714c718b475913f2f4bb6f19e62f2d9ec
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31011
`getAttribute` is supposed to throw when the attribute is not
found rather than return a `nullptr`.
Test Plan:
.
Imported from OSS
Differential Revision: D18898417
fbshipit-source-id: 0fe7d824b978ad19bb5ef094d3aa560e9fc57f87
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31357
If a user selects a subset of a Tensor and sends it in an RPC, we were sending
the whole original Tensor Storage over the network.
While this sounds reasonable, in practice, we observed view-like Tensors being sent
over rpc, where only 1% of the data in the provided Tensor's Storage was
actually used/needed.
The simple solution here is to just force a clone in the serializer code if we see that
less than (arbitrary) half the bits are used, and the tensor is more than a nominal few KB.
Add related tests to ensure this doesn't break.
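A Python-level sketch of that heuristic (the serializer change itself is in C++; the function name and exact thresholds here are illustrative):
```
import torch

def maybe_clone_for_rpc(t, min_storage_bytes=4096, used_fraction=0.5):
    # Clone (and thus re-pack) the tensor if it refers to well under half of
    # its Storage and the Storage is larger than a nominal few KB.
    storage_bytes = t.storage().size() * t.element_size()
    used_bytes = t.numel() * t.element_size()
    if storage_bytes > min_storage_bytes and used_bytes < used_fraction * storage_bytes:
        return t.clone()
    return t
```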
An alternate approach would be to modify the Pickler. That said, since Pickler is shared by more
components, the logic might be harder to tailor appropriately at that layer (particularly
given that the Pickler has explicit logic to share a single Storage* among several Tensors
that commonly point to the same Storage*).
It's possible that we might want to further refine the basic thresholds in this change.
In practice, we've seen a mostly bimodal distribution thus far for the percent of Tensor
Storage referred by a Tensor in observed rpcs (i.e. either 90%+ or sub-10% of the Storage
referenced), hence the existing 50% threshold here is probably not an unreasonable
starting point.
ghstack-source-id: 95925474
Test Plan: buck test mode/dev caffe2/test/cpp/rpc/...
Differential Revision: D19137056
fbshipit-source-id: e2b3a4dd0cc6e1de820fd0740aa1d59883dbf8d4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31393
pytorch build was set up with the include paths (-I) relative to fbcode/. This works well for fbcode builds, but doesn't work for the new fbcode_deps args for xplat build targets that work across xplat and fbcode. When these targets are built, the include paths need to be relative to fbsource, so the fbcode/ prefix needs to be added to those paths.
Longer term, to properly fix this, we need to use raw_headers with public_include_directories specified for all of these targets.
Test Plan: buck test mode/dev //papaya/integration/service/local/test:mnist_federated_system_test -- 'MnistFederatedSystemTest\.test' --run-disabled
Reviewed By: mzlee
Differential Revision: D19148465
fbshipit-source-id: a610e84bf4cad5838e54e94bae71b957c4b6d4b5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31346
This makes it so that if profiling is enabled/disabled from a different thread while a RecordFunction span is active via an op, it doesn't crash the process.
We currently see this when using torch.distributed.rpc to enable/disable profiling on other nodes while other things are running.
Test Plan: buck test //caffe2/test:autograd -- test_record_function
Reviewed By: albanD
Differential Revision: D19133258
fbshipit-source-id: 30712b06c6aa051789948de2918dcfb9b78967ba
Summary:
Fixes #27495
This adds builtins as another piece of a concrete type. They're separate from normal functions since they represent the `BuiltinFunction` sugared value (which is a direct call to a builtin op). It also moves the builtins related logic from `jit/__init__.py` to `jit/_builtins.py` so it can be used from `jit/_recursive.py` to look up functions in the builtins table.
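A tiny case that exercises this path during recursive scripting (illustrative only; the PR's own tests live in the test suite): a call to a builtin like torch.add resolves through the builtins table to a `BuiltinFunction` sugared value.
```
import torch

class M(torch.nn.Module):
    def forward(self, x):
        # torch.add is looked up in the builtins table, not compiled from Python.
        return torch.add(x, 1)

scripted = torch.jit.script(M())
print(scripted(torch.zeros(2)))  # tensor([1., 1.])
```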
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31269
Pulled By: driazati
Differential Revision: D19149779
fbshipit-source-id: d4e5e5d7d7d528b75a2f503e6004394251a4e82d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24341
ConvTransposeOp doesn't crash for zero-batch, but it doesn't modify the output blob. This leads to buggy behaviour especially when running the same network twice using different input, or backprop during training.
Seems `ConvTransposeUnpoolBase<Context>::GetOutputSize` works for zero-batch, so I remove the check for `input.numel() > 0`, and reshape the output blob before returning.
For CudnnConvTransposeGradientOp, it's a bit verbose to set `dfilter` and `dbias`, and it seems cuDNN can handle it, so simply remove the `X.numel() == 0` branch.
Test Plan: buck test mode/dev-nosan caffe2/caffe2/python/operator_test:conv_transpose_test -- --run-disabled
Reviewed By: BIT-silence
Differential Revision: D16807606
fbshipit-source-id: 0d72c5bd8f2e03c34465e7b530cca548d9bdd5e1
Summary:
Stacked PRs
* #29940 - [jit] Fix parsing of big float literals
* **#29935 - [jit] Fix hex literal parsing**
* #29931 - [jit] Throw a better error for int too big for int64_t
Previously these were all parsed as `0`
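A quick check of the behavior this stack fixes (a hex literal inside TorchScript should keep its value):
```
import torch

@torch.jit.script
def f() -> int:
    # Before the fix this hex literal parsed as 0.
    return 0xFF

print(f())  # 255
```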
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29935
Pulled By: driazati
Differential Revision: D19124944
fbshipit-source-id: 1ee0c1dee589933363a5efba069a2cfaf94373c5
Summary:
Add a section for unsupported ops and modules. Automatically generate the list of properties and attributes that aren't bound, and for ops that have semantic mismatches, set up tests so the docs stay up to date.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31329
Differential Revision: D19164472
Pulled By: eellison
fbshipit-source-id: 46290bb8a64d9de928cfb1eda5ff4558c3799c88
Summary:
Fix: https://github.com/pytorch/pytorch/issues/24631, https://github.com/pytorch/pytorch/issues/24632, https://github.com/pytorch/pytorch/issues/24764, https://github.com/pytorch/pytorch/issues/24765
Port of TH SoftMarginCriterion to ATen using un-fused tensor operators but with custom backward code. This is a follow-up/fix of the reverted PR https://github.com/pytorch/pytorch/issues/27673.
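The benchmark below times the forward and backward of the ported loss; a minimal sketch of that core measurement (the full harness producing the tables is not shown here):
```
import torch
import torch.nn as nn

loss_fn = nn.SoftMarginLoss()
x = torch.randn(100000, requires_grad=True)
y = torch.randint(0, 2, (100000,)).float() * 2 - 1  # targets in {-1, +1}
out = loss_fn(x, y)
out.backward()
```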
Benchmark results:
CPU became faster, GPU slower. To reach the previous TH perf, manual fusion is probably necessary.
### WITH patch
```
CPU warmup 1000 took 7.997200009413064e-05
CPU warmup 10000 took 0.0008116499957395718
CPU warmup 100000 took 0.0012691459996858612
CPU warmup TOTAL time 0.0021982479956932366
CPU forward 1000 took 7.320100849028677e-05
CPU forward 10000 took 0.00015837099635973573
CPU forward 100000 took 0.0010471990099176764
CPU forward 1000000 took 0.01238470000680536
CPU forward 10000000 took 0.12747182900784537
CPU forward 100000000 took 1.2076255190040683
CPU forward TOTAL time 1.3488940890092636
CPU for- & backward 1000 took 0.00032587299938313663
CPU for- & backward 10000 took 0.0006926299975020811
CPU for- & backward 100000 took 0.002146183993318118
CPU for- & backward 1000000 took 0.019158899012836628
CPU for- & backward 10000000 took 0.2957490350090666
CPU for- & backward 100000000 took 1.7630806300003314
CPU for- & backward TOTAL time 2.081367089995183
GPU warmup 1000 took 0.0004558280052151531
GPU warmup 10000 took 0.0002567449992056936
GPU warmup 100000 took 0.0001593509950907901
GPU warmup TOTAL time 0.0009442300070077181
GPU forward 1000 took 0.00015061900194268674
GPU forward 10000 took 0.00015258099301718175
GPU forward 100000 took 0.00015409699699375778
GPU forward 1000000 took 0.0008183339959941804
GPU forward 10000000 took 0.004424853003001772
GPU forward 100000000 took 0.04356115800328553
GPU forward TOTAL time 0.04938192600093316
GPU for- & backward 1000 took 0.0008062430133577436
GPU for- & backward 10000 took 0.0006074949924368411
GPU for- & backward 100000 took 0.0007091690058587119
GPU for- & backward 1000000 took 0.001022183001623489
GPU for- & backward 10000000 took 0.009945805999450386
GPU for- & backward 100000000 took 0.0944173600000795
GPU for- & backward TOTAL time 0.28060428200114984
```
### WITHOUT patch
```
CPU warmup 1000 took 6.394000956788659e-05
CPU warmup 10000 took 0.00038220599526539445
CPU warmup 100000 took 0.0034939230099553242
CPU warmup TOTAL time 0.003981974994530901
CPU forward 1000 took 4.7855006414465606e-05
CPU forward 10000 took 0.000347569992300123
CPU forward 100000 took 0.003367935001733713
CPU forward 1000000 took 0.03605044000141788
CPU forward 10000000 took 0.35935167300340254
CPU forward 100000000 took 3.630371332008508
CPU forward TOTAL time 4.029640004009707
CPU for- & backward 1000 took 0.00028494100843090564
CPU for- & backward 10000 took 0.0006738200027029961
CPU for- & backward 100000 took 0.0051178760040784255
CPU for- & backward 1000000 took 0.04925115800870117
CPU for- & backward 10000000 took 0.7172313440096332
CPU for- & backward 100000000 took 5.441953932997421
CPU for- & backward TOTAL time 6.21466830400459
GPU warmup 1000 took 0.001803738996386528
GPU warmup 10000 took 0.00041877900366671383
GPU warmup 100000 took 0.0003870719956466928
GPU warmup TOTAL time 0.0026561370032140985
GPU forward 1000 took 0.00037833399255760014
GPU forward 10000 took 0.00038825398951303214
GPU forward 100000 took 0.0003841099969577044
GPU forward 1000000 took 0.0007090550061548129
GPU forward 10000000 took 0.0016171559982467443
GPU forward 100000000 took 0.013463679002597928
GPU forward TOTAL time 0.017010531009873375
GPU for- & backward 1000 took 0.0007374050037469715
GPU for- & backward 10000 took 0.0006343529967125505
GPU for- & backward 100000 took 0.0006375070079229772
GPU for- & backward 1000000 took 0.0007550300069851801
GPU for- & backward 10000000 took 0.002672752001672052
GPU for- & backward 100000000 took 0.023170708998804912
GPU for- & backward TOTAL time 0.20251446698966902
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28135
Differential Revision: D18001447
Pulled By: VitalyFedyunin
fbshipit-source-id: ad90dc1cca42dcaf3ea9e17e4f8fd79cee0a293e
Summary:
VitalyFedyunin, this PR ports the LeakyReLU activation to ATen:
**Test script:**
```
import torch
import torch.nn as nn
import time
torch.manual_seed(0)
def _time():
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    return time.time()

device = "cpu"
m = nn.LeakyReLU()
if torch.cuda.is_available():
    device = "cuda"
    m = m.cuda()

#warm up
for n in [100, 10000]:
    input = torch.randn(128, n, requires_grad=True, device=device)
    grad_output = torch.ones(128, n, device=device)
    for i in range(1000):
        output = m(input)
        output.backward(grad_output)

for n in [100, 10000]:
    fwd_t = 0
    bwd_t = 0
    input = torch.randn(128, n, requires_grad=True, device=device)
    grad_output = torch.ones(128, n, device=device)
    for i in range(10000):
        t1 = _time()
        output = m(input)
        t2 = _time()
        output.backward(grad_output)
        t3 = _time()
        fwd_t = fwd_t + (t2 - t1)
        bwd_t = bwd_t + (t3 - t2)
    fwd_avg = fwd_t / 10000 * 1000
    bwd_avg = bwd_t / 10000 * 1000
    print("input size(128, %d) forward time is %.2f (ms); backwad avg time is %.2f (ms)."
          % (n, fwd_avg, bwd_avg))
```
Test Device: CPU: skx-8180, GPU: Tesla P40.
Performance:
Before:
```
GPU:
input size(128, 100) forward time is 0.05 (ms); backwad avg time is 0.11 (ms).
input size(128, 10000) forward time is 0.06 (ms); backwad avg time is 0.17 (ms).
CPU:
OMP_NUM_THREADS=56
input size(128, 100) forward time is 0.05 (ms); backwad avg time is 0.14 (ms).
input size(128, 10000) forward time is 4.21 (ms); backwad avg time is 8.02 (ms).
OMP_NUM_THREADS=1
input size(128, 100) forward time is 0.02 (ms); backwad avg time is 0.07 (ms).
input size(128, 10000) forward time is 1.98 (ms); backwad avg time is 6.21 (ms)
```
After:
```
GPU:
input size(128, 100) forward time is 0.05 (ms); backwad avg time is 0.11 (ms).
input size(128, 10000) forward time is 0.06 (ms); backwad avg time is 0.17 (ms).
CPU:
OMP_NUM_THREADS=56
input size(128, 100) forward time is 0.02 (ms); backwad avg time is 0.04 (ms).
input size(128, 10000) forward time is 0.03 (ms); backwad avg time is 0.09 (ms).
OMP_NUM_THREADS=1
input size(128, 100) forward time is 0.01 (ms); backwad avg time is 0.02 (ms).
input size(128, 10000) forward time is 0.47 (ms); backwad avg time is 1.02 (ms).
```
How to set the number of threads? Use the following script:
```
num_threads=$1
script=$2
last_core=`expr $num_threads - 1`
echo "using $num_threads OMP threads"
echo "bind cores to 0~$last_core"
export OMP_NUM_THREADS=$num_threads
export KMP_AFFINITY=granularity=fine,compact,1,0
numactl --physcpubind=0-$last_core --membind=0 python $script
```
and run **./run.sh num_threads test.py**.
Fixes https://github.com/pytorch/pytorch/issues/24583, https://github.com/pytorch/pytorch/issues/24584, https://github.com/pytorch/pytorch/issues/24720, https://github.com/pytorch/pytorch/issues/24721
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29899
Differential Revision: D18816231
Pulled By: VitalyFedyunin
fbshipit-source-id: afb1e43a99317d17f50cff1b593cd8f7a0a83da2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31335
When an error occurs in a net we end up cancelling all the async ops. If one error occurs it's highly likely other errors will occur as well.
Typically we see:
1. SendOp failed due to a network error
2. async scheduling cancels all other ops via `SetFinished("Cancelled");`
3. Another SendOp fails due to a network error and crashes the process when the exception is thrown.
This changes caffe2 ops to allow failing twice.
Test Plan: buck test //caffe2/caffe2:caffe2_test_cpu
Reviewed By: andrewwdye
Differential Revision: D19106548
fbshipit-source-id: 4b7882258a240894cc16d061a563c83a3214d3d9