pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 00:21:07 +01:00

Author	SHA1	Message	Date
Tongzhou Wang	b6f43afaca	Fix tensordot allowing negative dims (#31954 ) Summary: fixes https://github.com/pytorch/pytorch/issues/31926 Pull Request resolved: https://github.com/pytorch/pytorch/pull/31954 Differential Revision: D19331847 Pulled By: zou3519 fbshipit-source-id: e30dd9517917c056a52be7d16f23247fe28f4e28	2020-01-10 07:42:04 -08:00
Rohan Varma	8ea49e7a08	add missing braces for format in rpc _to_worker_info (#31969 ) Summary: This was missing and resulted in the incorrect `name` passed into `_to_worker_info` not being printed out in the error message. Pull Request resolved: https://github.com/pytorch/pytorch/pull/31969 Differential Revision: D19331927 Pulled By: rohan-varma fbshipit-source-id: e74d47daec3224c2d9b9da3c0a6404cfa67baf65	2020-01-09 23:18:46 -08:00
Edward Yang	67c1d930eb	Lock graph_task before writing leaf_streams. (#31995 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/31995 Fixes #31906. Signed-off-by: Edward Z. Yang <ezyang@fb.com> Test Plan: Imported from OSS Differential Revision: D19331259 Pulled By: ezyang fbshipit-source-id: 5d24bf3555e632211a9b6f8e50ff241603c18b3d	2020-01-09 13:26:36 -08:00
TH3CHARLie	1296e2d55e	C++ API parity: isinf (#31099 ) Summary: fixes https://github.com/pytorch/pytorch/issues/31021, port the legacy binding method of `isinf` to C++ therefore support JIT Pull Request resolved: https://github.com/pytorch/pytorch/pull/31099 Differential Revision: D19314733 Pulled By: yf225 fbshipit-source-id: 5725c51d19c33b4fddd0fc9e7034078580bd534e	2020-01-09 13:16:13 -08:00
Sameer Deshmukh	cfdfdf70d7	remove JSON dumping dependency (#30724 ) Summary: Fix for https://github.com/pytorch/pytorch/issues/19420 So after actually writing a C++ JSON dumping class I figured that a faster and cleaner way would be simply rewrite the Python without the JSON module since the JSON that we need to output is so simple. For now I decided to not touch the `parse_cpu_trace` function since only changing `export_chrome_trace` shows a 4x speedup. Here's the script I used for benchmarking: ``` python import time import torch x = torch.ones(2, 2) start = time.time() with torch.autograd.profiler.profile() as prof: for _ in range(10000): x * x for i in range(50): prof.export_chrome_trace("trace.json") stop = time.time() print(stop-start) ``` master branch (using json dump) -> 8.07515025138855 new branch (without json dump) -> 2.0943689346313477 I checked the trace file generated in the [test](https://github.com/pytorch/pytorch/blob/master/test/test_autograd.py#L2659) and it does work fine. Please let me know what you think. If you still insist on the C++ version I can send a new patch soon enough. CC ezyang rgommers Pull Request resolved: https://github.com/pytorch/pytorch/pull/30724 Differential Revision: D19298955 Pulled By: ezyang fbshipit-source-id: b0d7324ea5f90884ab8a00dd272f3aa3d9bc0427	2020-01-09 12:56:16 -08:00
jlquinn	bc68a8745f	Spelling fix in transformer docs Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/31973 Differential Revision: D19330660 Pulled By: zou3519 fbshipit-source-id: 29ea1e790a34f0241cb7aba85110f087cdc069ba	2020-01-09 11:13:23 -08:00
Edward Yang	ddff4efa26	Don't use RTLD_GLOBAL to load _C. (#31162 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/31162 This should help us resolve a multitude of weird segfaults and crashes when PyTorch is imported along with other packages. Those would often happen because libtorch symbols were exposed globally and could be used as a source of relocations in shared libraries loaded after libtorch. Fixes #3059. Some of the subtleties in preparing this patch: * Getting ASAN to play ball was a pain in the ass. The basic problem is that when we load with `RTLD_LOCAL`, we now may load a library multiple times into the address space; this happens when we have custom C++ extensions. Since the libraries are usually identical, this is usually benign, but it is technically undefined behavior and UBSAN hates it. I sprayed a few ways of getting things to "work" correctly: I preload libstdc++ (so that it is seen consistently over all library loads) and added turned off vptr checks entirely. Another possibility is we should have a mode where we use RTLD_GLOBAL to load _C, which would be acceptable in environments where you're sure C++ lines up correctly. There's a long comment in the test script going into more detail about this. * Making some of our shared library dependencies load with `RTLD_LOCAL` breaks them. OpenMPI and MKL don't work; they play linker shenanigans to look up their symbols which doesn't work when loaded locally, and if we load a library with `RLTD_LOCAL` we aren't able to subsequently see it with `ctypes`. To solve this problem, we employ a clever device invented by apaszke: we create a dummy library `torch_global_deps` with dependencies on all of the libraries which need to be loaded globally, and then load that with `RTLD_GLOBAL`. As long as none of these libraries have C++ symbols, we can avoid confusion about C++ standard library. Signed-off-by: Edward Z. Yang <ezyang@fb.com> Differential Revision: D19262579 Test Plan: Imported from OSS Pulled By: ezyang fbshipit-source-id: 06a48a5d2c9036aacd535f7e8a4de0e8fe1639f2	2020-01-09 07:28:15 -08:00
Edward Yang	8614860210	Uniformly apply Windows logic in cpp_extensions everywhere (#31161 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/31161 Previously, it wasn't necessary to specify `DT_NEEDED` in C++ extensions on Linux (aka pass `-l` flags) because all of the symbols would have already been loaded with `RTLD_GLOBAL`, so there wouldn't be any undefined symbols. But when we switch to loading `_C` with `RTLD_LOCAL`, it's now necessary for all the C++ extensions to know what libraries to link with. The resulting code is clearer and more uniform, so it's wins all around. Signed-off-by: Edward Z. Yang <ezyang@fb.com> Test Plan: Imported from OSS Differential Revision: D19262578 Pulled By: ezyang fbshipit-source-id: a893cc96f2e9aad1c064a6de4f7ccf79257dec3f	2020-01-09 07:28:11 -08:00
Elias Ellison	8ecd3f783d	check for object equality in constant pooling (#31800 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/31800 If we know that two constants are the same object, we can ignore other constraints and pool them together. This fixes an issue introduced by the other PR where quantization relied on constant pooling happening for correctness. Test Plan: Imported from OSS Differential Revision: D19269499 Pulled By: eellison fbshipit-source-id: 9d4396125aa6899cb081863d463d4f024135cbf4	2020-01-08 16:47:07 -08:00
Elias Ellison	319cc21108	Add AliasDb API For Changing Aliasing (#31501 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/31501 We have a number of places in our code base where we should be checking if it's safe to change the alias relationship between two sets of values. This PR adds an api to Alias Db to consolidate the logic, and refactors Constant Pooling and `CSE` to use the new api. Next steps: add api usage in peephole.cpp where applicable. Happy to bikeshed `AliasDb::safeToChangeAliasingRelationship`. Previously I suggested `AliasDb::safeToIntroduceAliasing`, however that's not quite accurate, because this API also handles when it is unsafe to remove aliasing. Alternate suggestions: `safeToChangeAliasing`, `validToChangeAliasing`, `validToChangeAliasingRelationship` Related: https://github.com/pytorch/pytorch/issues/28360 Test Plan: Imported from OSS Differential Revision: D19254413 Pulled By: eellison fbshipit-source-id: 17f7f52ad2d1526d303132767cbbb32f8189ae15	2020-01-08 16:47:03 -08:00
davidriazati	883fb5434a	Use real argument names for Python functions (#29300 ) Summary: This hooks up `inspect` so that Python functions get their parameters names attached instead of naming them `0, 1, 2, ...`. This also fixes issue #28537 where `ignore` functions were improperly typing `self`. ](https://our.intern.facebook.com/intern/diff/19256434/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/29300 Pulled By: driazati Differential Revision: D19256434 fbshipit-source-id: 6a1fe7bd0afab708b8439517798955d0abfeb44c	2020-01-08 15:41:28 -08:00
xiaobing.zhang	9ba6a768de	Add op bitwise_or (#31559 ) Summary: ezyang , this PR add bitwise_or operator as https://github.com/pytorch/pytorch/pull/31104 . Benchmark script : ``` import timeit import torch torch.manual_seed(1) for n, t in [(10, 100000),(1000, 10000)]: print('__or__ (a.numel() == {}) for {} times'.format(n, t)) for device in ('cpu', 'cuda'): for dtype in ('torch.int8', 'torch.uint8', 'torch.int16', 'torch.int32', 'torch.int64'): print(f'device: {device}, dtype: {dtype}, {t} times', end='\t\t') print(timeit.timeit(f'a \| b\nif "{device}" == "cuda": torch.cuda.synchronize()', setup=f'import torch; a = torch.randint(0, 10, ({n},), dtype = {dtype}, device="{device}"); b = torch.randint(0, 10, ({n},), dtype = {dtype}, device="{device}")', number=t)) for n, t in [(10, 100000),(1000, 10000)]: print('__ior__ (a.numel() == {}) for {} times'.format(n, t)) for device in ('cpu', 'cuda'): for dtype in ('torch.int8', 'torch.uint8', 'torch.int16', 'torch.int32', 'torch.int64'): print(f'device: {device}, dtype: {dtype}, {t} times', end='\t\t') print(timeit.timeit(f'a \| b\nif "{device}" == "cuda": torch.cuda.synchronize()', setup=f'import torch; a = torch.randint(0, 10, ({n},), dtype = {dtype}, device="{device}"); b = torch.tensor(5, dtype = {dtype}, device="{device}")', number=t)) ``` Device: Tesla P100, skx-8180 Cuda verison: 9.0.176 Before: ``` __or__ (a.numel() == 10) for 100000 times device: cpu, dtype: torch.int8, 100000 times 0.17616272252053022 device: cpu, dtype: torch.uint8, 100000 times 0.17148233391344547 device: cpu, dtype: torch.int16, 100000 times 0.17616403382271528 device: cpu, dtype: torch.int32, 100000 times 0.17717823758721352 device: cpu, dtype: torch.int64, 100000 times 0.1801931718364358 device: cuda, dtype: torch.int8, 100000 times 1.270583058707416 device: cuda, dtype: torch.uint8, 100000 times 1.2636413089931011 device: cuda, dtype: torch.int16, 100000 times 1.2839747751131654 device: cuda, dtype: torch.int32, 100000 times 1.2548385225236416 device: cuda, dtype: torch.int64, 100000 times 1.2650810535997152 __or__ (a.numel() == 1000) for 10000 times device: cpu, dtype: torch.int8, 10000 times 0.031136621721088886 device: cpu, dtype: torch.uint8, 10000 times 0.030786747112870216 device: cpu, dtype: torch.int16, 10000 times 0.02391665056347847 device: cpu, dtype: torch.int32, 10000 times 0.024147341027855873 device: cpu, dtype: torch.int64, 10000 times 0.024414129555225372 device: cuda, dtype: torch.int8, 10000 times 0.12741921469569206 device: cuda, dtype: torch.uint8, 10000 times 0.1249831635504961 device: cuda, dtype: torch.int16, 10000 times 0.1283819805830717 device: cuda, dtype: torch.int32, 10000 times 0.12591975275427103 device: cuda, dtype: torch.int64, 10000 times 0.12655890546739101 __ior__ (a.numel() == 10) for 100000 times device: cpu, dtype: torch.int8, 100000 times 0.3908365070819855 device: cpu, dtype: torch.uint8, 100000 times 0.38267823681235313 device: cpu, dtype: torch.int16, 100000 times 0.38239253498613834 device: cpu, dtype: torch.int32, 100000 times 0.3817988149821758 device: cpu, dtype: torch.int64, 100000 times 0.3901665909215808 device: cuda, dtype: torch.int8, 100000 times 1.4211318120360374 device: cuda, dtype: torch.uint8, 100000 times 1.4215159295126796 device: cuda, dtype: torch.int16, 100000 times 1.4307750314474106 device: cuda, dtype: torch.int32, 100000 times 1.4123614141717553 device: cuda, dtype: torch.int64, 100000 times 1.4480243818834424 __ior__ (a.numel() == 1000) for 10000 times device: cpu, dtype: torch.int8, 10000 times 0.06468924414366484 device: cpu, dtype: torch.uint8, 10000 times 0.06442475505173206 device: cpu, dtype: torch.int16, 10000 times 0.05267547257244587 device: cpu, dtype: torch.int32, 10000 times 0.05286940559744835 device: cpu, dtype: torch.int64, 10000 times 0.06211103219538927 device: cuda, dtype: torch.int8, 10000 times 0.15332304500043392 device: cuda, dtype: torch.uint8, 10000 times 0.15353196952492 device: cuda, dtype: torch.int16, 10000 times 0.15300503931939602 device: cuda, dtype: torch.int32, 10000 times 0.15274472255259752 device: cuda, dtype: torch.int64, 10000 times 0.1512152962386608 ``` After: ``` __or__ (a.numel() == 10) for 100000 times device: cpu, dtype: torch.int8, 100000 times 0.2465507509186864 device: cpu, dtype: torch.uint8, 100000 times 0.2472386620938778 device: cpu, dtype: torch.int16, 100000 times 0.2469814233481884 device: cpu, dtype: torch.int32, 100000 times 0.2535214088857174 device: cpu, dtype: torch.int64, 100000 times 0.24855613708496094 device: cuda, dtype: torch.int8, 100000 times 1.4351346511393785 device: cuda, dtype: torch.uint8, 100000 times 1.4434308474883437 device: cuda, dtype: torch.int16, 100000 times 1.4520929995924234 device: cuda, dtype: torch.int32, 100000 times 1.4456610176712275 device: cuda, dtype: torch.int64, 100000 times 1.4580101007595658 __or__ (a.numel() == 1000) for 10000 times device: cpu, dtype: torch.int8, 10000 times 0.029985425993800163 device: cpu, dtype: torch.uint8, 10000 times 0.03024935908615589 device: cpu, dtype: torch.int16, 10000 times 0.026356655173003674 device: cpu, dtype: torch.int32, 10000 times 0.027377349324524403 device: cpu, dtype: torch.int64, 10000 times 0.029163731262087822 device: cuda, dtype: torch.int8, 10000 times 0.14540370367467403 device: cuda, dtype: torch.uint8, 10000 times 0.1456305105239153 device: cuda, dtype: torch.int16, 10000 times 0.1450125053524971 device: cuda, dtype: torch.int32, 10000 times 0.1472016740590334 device: cuda, dtype: torch.int64, 10000 times 0.14709716010838747 __ior__ (a.numel() == 10) for 100000 times device: cpu, dtype: torch.int8, 100000 times 0.27195510920137167 device: cpu, dtype: torch.uint8, 100000 times 0.2692424338310957 device: cpu, dtype: torch.int16, 100000 times 0.27726674638688564 device: cpu, dtype: torch.int32, 100000 times 0.2815811652690172 device: cpu, dtype: torch.int64, 100000 times 0.2852728571742773 device: cuda, dtype: torch.int8, 100000 times 1.4743850827217102 device: cuda, dtype: torch.uint8, 100000 times 1.4766502184793353 device: cuda, dtype: torch.int16, 100000 times 1.4774163831025362 device: cuda, dtype: torch.int32, 100000 times 1.4749693805351853 device: cuda, dtype: torch.int64, 100000 times 1.5772947426885366 __ior__ (a.numel() == 1000) for 10000 times device: cpu, dtype: torch.int8, 10000 times 0.03614502027630806 device: cpu, dtype: torch.uint8, 10000 times 0.03619729354977608 device: cpu, dtype: torch.int16, 10000 times 0.0319912089034915 device: cpu, dtype: torch.int32, 10000 times 0.03319283854216337 device: cpu, dtype: torch.int64, 10000 times 0.0343862259760499 device: cuda, dtype: torch.int8, 10000 times 0.1581476852297783 device: cuda, dtype: torch.uint8, 10000 times 0.15974601730704308 device: cuda, dtype: torch.int16, 10000 times 0.15957212820649147 device: cuda, dtype: torch.int32, 10000 times 0.16002820804715157 device: cuda, dtype: torch.int64, 10000 times 0.16129320487380028 ``` Fix https://github.com/pytorch/pytorch/issues/24511, https://github.com/pytorch/pytorch/issues/24515, https://github.com/pytorch/pytorch/issues/24658, https://github.com/pytorch/pytorch/issues/24662. Pull Request resolved: https://github.com/pytorch/pytorch/pull/31559 Differential Revision: D19315875 Pulled By: ezyang fbshipit-source-id: 4a3ca88fdafbeb796079687e676228111eb44aad	2020-01-08 15:06:30 -08:00
Alban Desmaison	1314f7f4f4	Ensure the original grad_mode is restored during backward (#31884 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/31884 Fix #31715 Test Plan: Imported from OSS Differential Revision: D19301076 Pulled By: albanD fbshipit-source-id: 2d20c01bfb6364fa96c8fe5aa5ce7ea39defa3ce	2020-01-08 14:16:51 -08:00
Edward Yang	5dfcfeebb8	Revert D19298735: Emit warning from deprecated torch function signatures Test Plan: revert-hammer Differential Revision: D19298735 Original commit changeset: 03cb78af1765 fbshipit-source-id: 304a6d4412f53a8fc822d36897c96815432e0f70	2020-01-08 13:04:41 -08:00
Shen Li	7f723cbd8a	Revert D19290954: Implement backend-agnostic rpc._wait_all_workers() utility Test Plan: revert-hammer Differential Revision: D19290954 Original commit changeset: cdb22203c2f2 fbshipit-source-id: 2ae194a06a645e4f48879271eccf0588b0956cd3	2020-01-08 10:25:51 -08:00
Shihao Xu	6664703842	Implement backend-agnostic rpc._wait_all_workers() utility (#31888 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/31888 We need a backend-agnostic mechanism to do barrier-like operation before locally destroy RRef context and shutdown RPC Agent. - Sort worker names. - Elect the first name as the leader in the ordered worker names. - Followers reports therir intent to synchronize to the leader. - Leader also reports to itself, when `_wait_all_workers()` called. - If all workers report their intent to proceed, leader send the command to every one to proceed. ghstack-source-id: 96386210 Test Plan: # Unit tests ``` buck test mode/dev-nosan //caffe2/test:rpc_fork buck-out/gen/caffe2/test/rpc_fork\#binary.par -r test_wait_all_workers buck-out/gen/caffe2/test/rpc_fork\#binary.par -r test_rref_leak ``` ``` buck test mode/dev-nosan //caffe2/test:rpc_fork_thrift buck-out/gen/caffe2/test/rpc_fork\#binary.par -r test_wait_all_workers buck-out/gen/caffe2/test/rpc_fork_thrift\#binary.par -r test_worker_id ``` # Stress runs ``` buck test mode/dev-nosan //caffe2/test:rpc_fork_thrift -- test_stress_light_rpc --stress-runs 10 ``` ``` buck test mode/dev-nosan //caffe2/test:rpc_spawn_thrift -- test_stress_light_rpc --stress-runs 10 ``` ``` buck test mode/dev-nosan //caffe2/test:rpc_fork_thrift -- test_stress_heavy_rpc --stress-runs 10 ``` ``` buck test mode/dev-nosan //caffe2/test:rpc_spawn_thrift -- test_stress_heavy_rpc --stress-runs 10 ``` Differential Revision: D19290954 fbshipit-source-id: cdb22203c2f27b5e0d0ad5b2d3b279d438c22dcf	2020-01-08 01:00:25 -08:00
davidriazati	3c07eb33bb	Better error for `torch::jit::load`ing a eager file (#31709 ) Summary: This adds a check to catch the case where someone `torch.save`s something then `torch::jit::load`s it in C++. Relevant for #31620 ](https://our.intern.facebook.com/intern/diff/19252172/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/31709 Pulled By: driazati Differential Revision: D19252172 fbshipit-source-id: f2a9b4442647285418b2778306629b4ff77c15e5	2020-01-07 16:20:42 -08:00
Shihao Xu	a730920a3d	Make RRef leak detection always print a warning log (#31922 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/31922 For better debugging, `test_rref_leak` failure in https://app.circleci.com/jobs/github/pytorch/pytorch/4135881, as per discussion in https://github.com/pytorch/pytorch/pull/31888. ghstack-source-id: 96375261 Test Plan: # Unit tests ``` buck test mode/dev-nosan //caffe2/test:rpc_fork buck-out/gen/caffe2/test/rpc_fork\#binary.par -r test_wait_all_workers buck-out/gen/caffe2/test/rpc_fork\#binary.par -r test_rref_leak ``` # Stress runs ``` buck test mode/dev-nosan //caffe2/test:rpc_fork_thrift -- test_stress_light_rpc --stress-runs 10 ``` ``` buck test mode/dev-nosan //caffe2/test:rpc_spawn_thrift -- test_stress_light_rpc --stress-runs 10 ``` ``` buck test mode/dev-nosan //caffe2/test:rpc_fork_thrift -- test_stress_heavy_rpc --stress-runs 10 ``` ``` buck test mode/dev-nosan //caffe2/test:rpc_spawn_thrift -- test_stress_heavy_rpc --stress-runs 10 ``` Differential Revision: D19302814 fbshipit-source-id: 51632aede98e01689f8bc0f266788a9b020daa15	2020-01-07 15:18:00 -08:00
Karl Ostmo	227d1a43a4	Revert D18838848: disable __torch_function__ overides for operators in torch.functional Test Plan: revert-hammer Differential Revision: D18838848 Original commit changeset: 22b8015d7b2f fbshipit-source-id: fdaeffcd112990ed379782cf7216d3f1beeb2cb1	2020-01-07 15:03:15 -08:00
Nathan Goldbaum	ca72df06ae	disable __torch_function__ overides for operators in torch.functional (#30839 ) Summary: For now I'm just removing the decorators from all of the currently overridable functions in `torch.functional`. This means they are no longer overridable, however this should fix the benchmark regressions reported in https://github.com/pytorch/pytorch/issues/30831. Moving forward we'll be looking at reducing the overhead of the python-level override mechanism and failing that, re-implementing all of these operators in C++. cc hl475 Pull Request resolved: https://github.com/pytorch/pytorch/pull/30839 Differential Revision: D18838848 Pulled By: ezyang fbshipit-source-id: 22b8015d7b2f7a947f1ebc9632c998e081b48ad8	2020-01-07 12:27:28 -08:00
Artem Volkhin	3a2757c682	Fix tracing for modules with List[Tensor] as output (#31343 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/31343 Fix an issue in TorchScript tracing for modules with `c10::List<at::Tensor>` as an output. TensorList was not supported properly. Test Plan: unit tests Reviewed By: wanchaol Differential Revision: D18850722 fbshipit-source-id: 87a223104d1361fe754d55deceeb1e8bbcad629b	2020-01-07 11:57:25 -08:00
Pritam Damania	bf8e1c0710	Integrate async mode for autograd engine with distributed autograd. (#31508 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/31508 This PR builds on top of https://github.com/pytorch/pytorch/pull/31230 to ensure that distributed autograd doesn't block an RPC thread anymore during the backward pass. I've also added a unit test where all ranks hammer rank 0 without about 60 backward calls (which would cause a deadlock earlier), but now such a test passes without any issues. ghstack-source-id: 96345097 Test Plan: waitforbuildbot Differential Revision: D19188749 fbshipit-source-id: b21381b38175699afd0f9dce1ddc8ea6a220f589	2020-01-07 11:01:16 -08:00
Peter Bell	0e5a6700cc	Emit warning from deprecated torch function signatures (#31514 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/28430 The unpythonic signatures for functions such as `torch.addcdiv` are already seperated in [`deprecated.yaml`] and the signatures marked as deprecated in `PythonArgParser`. However, nothing was done with this information previously. So, this now emits a warning when the deprecated signatures are used. One minor complication is that if all arguments are passed as keyword args then there is nothing to differentiate the deprecated overload. This can lead to false warnings being emitted. So, I've also modified `PythonArgParser` to prefer non-deprecated signatures. [`deprecated.yaml`]: https://github.com/pytorch/pytorch/blob/master/tools/autograd/deprecated.yaml Pull Request resolved: https://github.com/pytorch/pytorch/pull/31514 Differential Revision: D19298735 Pulled By: ezyang fbshipit-source-id: 03cb78af17658eaab9d577cd2497c6f413f07647	2020-01-07 10:57:53 -08:00
Pritam Damania	5cc62f2913	Ensure autograd callbacks are called only once for reentrant backward. (#31909 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/31909 https://github.com/pytorch/pytorch/pull/31230 introduced a bug where we would end up calling `graph_task_post_processing` twice for reentrant backward calls (once when we mark the future completed and then we we called graph_task_post_processing in execute_with_graph_task). This PR fixes the issues by verifying the future we return in that case is completed and we remove the call to graph_task_post_processing. In addition to that I added a test that reproduced the problem and verified it is fixed by this PR. ghstack-source-id: 96349102 Test Plan: waitforbuildbot Differential Revision: D19296363 fbshipit-source-id: dc01a4e95989709ad163bb0357b1d191ef5a4fb2	2020-01-07 10:35:04 -08:00
Sameer Deshmukh	2f5eefe525	Raise ValueError if CUDA device is specified without specifying the : (#29087 ) Summary: Fix for https://github.com/pytorch/pytorch/issues/19076 Pull Request resolved: https://github.com/pytorch/pytorch/pull/29087 Differential Revision: D19298959 Pulled By: ezyang fbshipit-source-id: 878ea4840682012f07177d8d159a77c0e5afada6	2020-01-07 10:29:49 -08:00
Edward Yang	3c7db5ccbc	Don't unconditionally compile runJITCPPTests (#31236 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/31236 It is not compiled on Windows Signed-off-by: Edward Z. Yang <ezyang@fb.com> Test Plan: Imported from OSS Differential Revision: D19262581 Pulled By: ezyang fbshipit-source-id: 80bfa553333a946f00291aaca6ad26313caaa9e6	2020-01-07 10:24:52 -08:00
Andreas Koepf	22044c6f7c	Use TORCH_CHECK instead of AT_ASSERT in torch::cuda::gather() (#27456 ) Summary: The error message produced by AT_ASSERT() in gather() encouraged users to file a bug report ("please report a bug to PyTorch..."). The assertion should be a regular argument check since it can be triggered by passing tensors with different dimensionality, e.g. `torch.cuda.comm.gather([torch.rand(1, device='cuda'), torch.rand(1, 1, device='cuda')])`. See: https://github.com/pytorch/pytorch/issues/26400 Pull Request resolved: https://github.com/pytorch/pytorch/pull/27456 Differential Revision: D19300270 Pulled By: ezyang fbshipit-source-id: ec87d225e23445020b377521e0daccceb4748215	2020-01-07 10:04:24 -08:00
yyb1995	20c5dd59bd	Add stub for transformer.py and MultiheadAttention Class. (#28396 ) Summary: Add stub for `transformer.py` and `class MultiheadAttention`. Add import for `transformer.py` and `class MultiheadAttention` in `__init__.pyi.in`. I've tested the code hint in PyCharm and all works file. Relate issue: [https://github.com/pytorch/pytorch/issues/27842](https://github.com/pytorch/pytorch/issues/27842) ezyang Pull Request resolved: https://github.com/pytorch/pytorch/pull/28396 Differential Revision: D19300287 Pulled By: ezyang fbshipit-source-id: 1a79d6518b5edd4643892c46a959108385c739ad	2020-01-07 09:13:36 -08:00
Peter Bell	5d80f63478	no_grad, enable_grad: support for decorating generator functions (#31792 ) Summary: Closes https://github.com/pytorch/pytorch/issues/31497 This allows `torch.no_grad` and `torch.enable_grad` to be used as decorators for generator functions. In which case it disables/enables grad only inside the body of the generator and restores the context outside of the generator. https://github.com/pytorch/pytorch/issues/31497 doesn't include a complete reproducer but the included test with `torch.is_grad_enabled` show this is working where it failed before. Pull Request resolved: https://github.com/pytorch/pytorch/pull/31792 Differential Revision: D19274971 Pulled By: albanD fbshipit-source-id: fde6d3fd95d76c8d324ad02db577213a4b68ccbe	2020-01-06 15:21:20 -08:00
Edward Yang	58cffbff91	Add missing TORCH_CUDA_API annotation to throw_nccl_error (#31157 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/31157 Signed-off-by: Edward Z. Yang <ezyang@fb.com> Test Plan: Imported from OSS Differential Revision: D19262583 Pulled By: ezyang fbshipit-source-id: 8fb87b41ab53770329b38e1e2fe679fb868fee12	2020-01-06 14:39:51 -08:00
neginraoof	112196fdee	Fix index put (#31552 ) Summary: This change is required for cases like: x[1:] = data or x[:3] = data Pull Request resolved: https://github.com/pytorch/pytorch/pull/31552 Reviewed By: hl475 Differential Revision: D19238815 Pulled By: houseroad fbshipit-source-id: 56c9837d86b341ea92b0a71d55034ce189d12e6c	2020-01-06 14:09:48 -08:00
neginraoof	78cba90a8c	Enable constant folding for Reshape (#31054 ) Summary: Enabled constant folding for onnx::Reshape Pull Request resolved: https://github.com/pytorch/pytorch/pull/31054 Reviewed By: hl475 Differential Revision: D18946951 Pulled By: houseroad fbshipit-source-id: 499e8bf5fb091a94f7a27cbdf4311a23b1a6e3d3	2020-01-06 13:35:44 -08:00
anjali411	ddff014b79	fixed scale_factor calculation for uint8 tensor (#31778 ) Summary: When calling the add_images() method on the tensorboard SummaryWriter with a uint8 NCHW tensor, the tensor is incorrectly scaled, resulting in overflow behavior. This leads to incorrect images being displayed in tensorboard. Issue: https://github.com/pytorch/pytorch/issues/31459 Local Testing (ran this code with and without the PR changes and printed scale_factor): import torch import torchvision from torch.utils.tensorboard import SummaryWriter writer = SummaryWriter() x=torch.tensor([[[[1, 2, 3], [4, 5, 6]]]], dtype=torch.uint8) writer.add_images("images", x) Before- scale_factor: 255, After- scale_factor: 1 Pull Request resolved: https://github.com/pytorch/pytorch/pull/31778 Differential Revision: D19289189 Pulled By: anjali411 fbshipit-source-id: 350a1650337244deae4fd8f8b7fb0e354ae6986b	2020-01-06 10:27:35 -08:00
meganset	1ba1799a66	C++ added 3rd arg of false to BatchNorm/InstanceNorm register_parameter … (#31873 ) Summary: Fix for issue https://github.com/pytorch/pytorch/issues/31680 C++ BatchNorm & InstanceNorm attempt to register undefined tensors when affine is false. Fixes https://github.com/pytorch/pytorch/issues/31680 Pull Request resolved: https://github.com/pytorch/pytorch/pull/31873 Differential Revision: D19287087 Pulled By: yf225 fbshipit-source-id: 0d57f10c49083386919b703d72b520a73a8e9e7f	2020-01-06 01:46:24 -08:00
Shen Li	33430cf094	Revert D18643137: Implement backend-agnostic rpc._wait_all_workers() utility Test Plan: revert-hammer Differential Revision: D18643137 Original commit changeset: d669d4fc9ad6 fbshipit-source-id: fe1f8ed77c1c5760638fef06e67ba100b86c33e9	2020-01-05 11:58:51 -08:00
Pritam Damania	fde94e7556	Provide async mode for local autograd engine. (#31230 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/31230 A major issue with distributed autograd currently is that we block an RPC thread when we call Engine::execute_with_graph_task. To resolve this issue, I've made modifications to the local autograd engine such that `execute_with_graph_task` returns a Future instead. The `execute()` methods for Engine::execute() and DistEngine::execute() still wait() on this Future which ensures there is no change in behavior yet. In follow up PRs we can modify the distributed autograd engine to take advantage of this Future. Closes #26359 ghstack-source-id: 96298057 Test Plan: waitforbuildbot Differential Revision: D18999709 fbshipit-source-id: 388f54467fd2415a0acb7df17bd063aedc105229	2020-01-05 00:29:28 -08:00
James Noeckel	3f0b330736	corrected keyword argument name in docs for Tensor.scatter (#31617 ) Summary: See https://github.com/pytorch/pytorch/issues/31601 Pull Request resolved: https://github.com/pytorch/pytorch/pull/31617 Differential Revision: D19268872 Pulled By: mruberry fbshipit-source-id: 52f0213f4aab991fd549b7623556a2ced61631a6	2020-01-04 21:48:30 -08:00
Shihao Xu	502533cfe6	Implement backend-agnostic rpc._wait_all_workers() utility (#30710 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30710 We need a backend-agnostic mechanism to do barrier-like operation before locally destroy RRef context and shutdown RPC Agent. - Sort worker names. - Elect the first name as the leader in the ordered worker names. - Followers reports therir intent to synchronize to the leader. - Leader also reports to itself, when `_wait_all_workers()` called. - If all workers report their intent to proceed, leader send the command to every one to proceed. Test Plan: # Unit tests ``` buck test mode/dev-nosan //caffe2/test:rpc_fork -- test_wait_all_workers buck-out/gen/caffe2/test/rpc_fork\#binary.par -r test_wait_all_workers$ buck-out/gen/caffe2/test/rpc_fork\#binary.par -r test_rref_leak buck-out/gen/caffe2/test/rpc_fork\#binary.par -r test_rref_forward_chain ``` ``` buck test mode/dev-nosan //caffe2/test:rpc_fork_thrift -- test_wait_all_workers buck-out/gen/caffe2/test/rpc_fork_thrift\#binary.par -r test_wait_all_workers$ ``` # Stress runs ``` buck test mode/dev-nosan //caffe2/test:rpc_fork_thrift -- test_stress_light_rpc --stress-runs 10 ``` ``` buck test mode/dev-nosan //caffe2/test:rpc_spawn_thrift -- test_stress_light_rpc --stress-runs 10 ``` ``` buck test mode/dev-nosan //caffe2/test:rpc_fork_thrift -- test_stress_heavy_rpc --stress-runs 10 ``` ``` buck test mode/dev-nosan //caffe2/test:rpc_spawn_thrift -- test_stress_heavy_rpc --stress-runs 10 ``` # Debug ``` buck test mode/dev-nosan caffe2/test:rpc_fork -- test_shutdown ``` ``` buck test mode/dev-nosan //caffe2/test:dist_autograd_fork -- test_clean_context_during_backward buck build mode/dev-nosan //caffe2/test:dist_autograd_fork buck-out/gen/caffe2/test/dist_autograd_fork\#binary.par -r test_clean_context_during_backward ``` https://our.intern.facebook.com/intern/testinfra/diagnostics/281475127895800.844424945328750.1575664368/ ``` I1206 12:27:47.491420 185619 process_group_agent.cpp:211] Shutting down ProcessGroupAgent. I1206 12:27:47.493880 185630 process_group_agent.cpp:211] Shutting down ProcessGroupAgent. I1206 12:27:47.494526 185625 process_group_agent.cpp:211] Shutting down ProcessGroupAgent. I1206 12:27:47.495390 185636 process_group_agent.cpp:211] Shutting down ProcessGroupAgent. E1206 12:27:47.544198 185627 pair.cc:642] 1 --->>> 0, read ERROR: AsyncSocketException: Network error, type = Network error, errno = 104 (Connection reset by peer) E1206 12:27:47.544203 185633 pair.cc:642] 2 --->>> 0, read ERROR: AsyncSocketException: Network error, type = Network error, errno = 104 (Connection reset by peer) E1206 12:27:47.544210 185639 pair.cc:642] 3 --->>> 0, read ERROR: AsyncSocketException: Network error, type = Network error, errno = 104 (Connection reset by peer) ``` This should mean the UDF in the request has been run, so Python proceeded and ran to `_agent.shutdown()`. While the RpcAgents on followers wanted to send back the response, but the leader has closed RPC. Need to re-trigger "pytorch_rpc-buck" to reproduce the rare-seen issue. Differential Revision: D18643137 fbshipit-source-id: d669d4fc9ad65ed48bed1329a4eb1c32ba51323c	2020-01-04 17:13:44 -08:00
Martin Yuan	f362cd510d	Move prim ops from JIT registration to C10 (#30612 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30612 The first version to move prim ops to c10 registration. After the reviewers are fine with the initial changes, more operators will be moved in the same style. Test Plan: Imported from OSS Differential Revision: D19237648 Pulled By: iseeyuan fbshipit-source-id: c5a519604efffb80564a556536f17d829f71d9f9	2020-01-04 13:47:44 -08:00
Jerry Zhang	5579611544	Enable foldbn tests (#29220 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29220 Support for accessing constant is added in previous PRs, this PR re-enables the foldbn tests Test Plan: test_jit.py Imported from OSS Differential Revision: D18846848 fbshipit-source-id: 90ceaf42539ffee80b984e0d8b2420da66c263c3	2020-01-04 11:47:01 -08:00
Jerry Zhang	ebe69236d1	Expose class constant through `attr` and `setattr` in object (#29219 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29219 We added class constant in previous PRs, this PR allows access to class constant in the object API Test Plan: build/bin/test_jit python test/test_jit.py Imported from OSS Differential Revision: D18846851 fbshipit-source-id: 888a6517d5f747d1f8ced283c0c2c30b2f6c72c6	2020-01-04 11:09:35 -08:00
Jerry Zhang	6f62c311a1	Add unsafeRemoveConstant for ClassType (#30787 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30787 This is needed when we fuse conv bn modules, where we need to rewrite a constant bias (None) of conv to an attribute bias of Tensor Test Plan: build/bin/test_jit Imported from OSS Differential Revision: D18846850 fbshipit-source-id: 9fd5fe85d93d07226e180b75d2e068fe00ca25fe	2020-01-04 01:11:59 -08:00
Jerry Zhang	2bac76969c	Fix getConstant (#31012 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/31012 - getConstant should throw when the item is not found - add another getConstant which takes slot index as argument Test Plan: test_class_type.cpp Imported from OSS Differential Revision: D18898418 fbshipit-source-id: d3a23a4896fdbf5fa98e1c55c9c4d6205840014b	2020-01-03 23:06:11 -08:00
Michael Suo	8420f205ee	Remove refs from ArrayRef arguments (#31845 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/31845 ArrayRef is trivially copyable and should be passed by value. Removing unnecessary `&`s. Test Plan: Imported from OSS Differential Revision: D19278523 Pulled By: suo fbshipit-source-id: 026db693ea98d19246b02c48d49d1929ecb6478e	2020-01-03 22:50:55 -08:00
Jerry Zhang	5fe3604987	Preserve constant from ConcreteModuleType to ClassType (#29218 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29218 We need to be able to access constant in module. Test Plan: tbd Imported from OSS Differential Revision: D18846847 fbshipit-source-id: 22d2c485c3c449bc14ad798f6e1a0c64fc8fb346	2020-01-03 21:30:04 -08:00
Rohan Varma	28c9dd4436	fix ProcessGroupGlooTest (#31255 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/31255 This test had 2 issues. A timeout would occasionally happen due to a timeout of 50ms, and CUDA could would get compiled and run on CPU, leading to errors. This PR fixes those issues. Differential Revision: D19028231 fbshipit-source-id: e50752228affe0021e7c0caa83bce78d76473759	2020-01-03 18:35:29 -08:00
Rohan Varma	457c57d9f7	use unordered_set instead of vector for futureTimeouts key in (#31813 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/31813 Closes https://github.com/pytorch/pytorch/issues/31804. We were using an `std::vector` for the key for a map that keeps track of futures to mark them if they timeout, but we can instead use an `unordered_set`. This results in a faster lookup in the code block where we remove futureIDs from this set when they complete successfully. Previously we were finding them via a linear `std::find`. Switching it to a constant time find will help performance in the case where a large number of futures are scheduled to time out at the same time, or if there is no timeout enforced. To benchmark a rough perf improvement, I created 50k futures with the same timeout. Before this PR, the lookup `std::find(futuresAtTime.begin(), futuresAtTime.end(), id)` took ~200us, now it takes 1us. ghstack-source-id: 96251355 Test Plan: Unit tests pass. Differential Revision: D19269798 fbshipit-source-id: 1a0fa84a478ee27a16ab0b9fa6f5413b065a663e	2020-01-03 13:21:23 -08:00
Edward Yang	9c9d3cd550	Revert D19262570: Fix race condition when creating build dir Test Plan: revert-hammer Differential Revision: D19262570 Original commit changeset: bb18c72e4264 fbshipit-source-id: 40675ef6ef4c98629deaaef0b25956f92534ff50	2020-01-03 11:17:42 -08:00
xiaobing.zhang	b47e9b97a2	Add op bitwise_and (#31104 ) Summary: Refer to https://github.com/pytorch/pytorch/pull/25665, add `bitwise_and` operator. Benchmark script : ``` import timeit #for __and__ for n, t in [(10, 100000),(1000, 10000)]: print('__and__ (a.numel() == {}) for {} times'.format(n, t)) for device in ('cpu', 'cuda'): for dtype in ('torch.int8', 'torch.uint8', 'torch.int16', 'torch.int32', 'torch.int64'): print(f'device: {device}, dtype: {dtype}, {t} times', end='\t\t') print(timeit.timeit(f'a & b\nif "{device}" == "cuda": torch.cuda.synchronize()', setup=f'import torch; a = torch.randint(0, 10, ({n},), dtype = {dtype}, device="{device}"); b = torch.randint(0, 10, ({n},), dtype = {dtype}, device="{device}")', number=t)) #for __iand__ for n, t in [(10, 100000),(1000, 10000)]: print('__iand__ (a.numel() == {}) for {} times'.format(n, t)) for device in ('cpu', 'cuda'): for dtype in ('torch.int8', 'torch.uint8', 'torch.int16', 'torch.int32', 'torch.int64'): print(f'device: {device}, dtype: {dtype}, {t} times', end='\t\t') print(timeit.timeit(f'a & b\nif "{device}" == "cuda": torch.cuda.synchronize()', setup=f'import torch; a = torch.randint(0, 10, ({n},), dtype = {dtype}, device="{device}"); b = torch.tensor(5, dtype = {dtype}, device="{device}")', number=t)) ``` Device: Tesla P100, skx-8180 Cuda verison: 9.0.176 Before: ``` __and__ (a.numel() == 10) for 100000 times device: cpu, dtype: torch.int8, 100000 times 0.1766007635742426 device: cpu, dtype: torch.uint8, 100000 times 0.17322628945112228 device: cpu, dtype: torch.int16, 100000 times 0.17650844901800156 device: cpu, dtype: torch.int32, 100000 times 0.17711848113685846 device: cpu, dtype: torch.int64, 100000 times 0.18240160401910543 device: cuda, dtype: torch.int8, 100000 times 1.273967768996954 device: cuda, dtype: torch.uint8, 100000 times 1.2778537990525365 device: cuda, dtype: torch.int16, 100000 times 1.2753686187788844 device: cuda, dtype: torch.int32, 100000 times 1.2797665279358625 device: cuda, dtype: torch.int64, 100000 times 1.2933144550770521 __and__ (a.numel() == 1000) for 10000 times device: cpu, dtype: torch.int8, 10000 times 0.031139614060521126 device: cpu, dtype: torch.uint8, 10000 times 0.03091452084481716 device: cpu, dtype: torch.int16, 10000 times 0.022756479680538177 device: cpu, dtype: torch.int32, 10000 times 0.025045674294233322 device: cpu, dtype: torch.int64, 10000 times 0.024164282716810703 device: cuda, dtype: torch.int8, 10000 times 0.12820732593536377 device: cuda, dtype: torch.uint8, 10000 times 0.12775669433176517 device: cuda, dtype: torch.int16, 10000 times 0.12697868794202805 device: cuda, dtype: torch.int32, 10000 times 0.12832533661276102 device: cuda, dtype: torch.int64, 10000 times 0.1280576130375266 __iand__ (a.numel() == 10) for 100000 times device: cpu, dtype: torch.int8, 100000 times 0.3687064303085208 device: cpu, dtype: torch.uint8, 100000 times 0.36253443732857704 device: cpu, dtype: torch.int16, 100000 times 0.362891579978168 device: cpu, dtype: torch.int32, 100000 times 0.37680106051266193 device: cpu, dtype: torch.int64, 100000 times 0.3689364707097411 device: cuda, dtype: torch.int8, 100000 times 1.419940729625523 device: cuda, dtype: torch.uint8, 100000 times 1.4247053815051913 device: cuda, dtype: torch.int16, 100000 times 1.4191444097086787 device: cuda, dtype: torch.int32, 100000 times 1.4305962566286325 device: cuda, dtype: torch.int64, 100000 times 1.4567416654899716 __iand__ (a.numel() == 1000) for 10000 times device: cpu, dtype: torch.int8, 10000 times 0.06224383972585201 device: cpu, dtype: torch.uint8, 10000 times 0.06205617543309927 device: cpu, dtype: torch.int16, 10000 times 0.05016433447599411 device: cpu, dtype: torch.int32, 10000 times 0.05216377507895231 device: cpu, dtype: torch.int64, 10000 times 0.06139362137764692 device: cuda, dtype: torch.int8, 10000 times 0.14827249851077795 device: cuda, dtype: torch.uint8, 10000 times 0.14801877550780773 device: cuda, dtype: torch.int16, 10000 times 0.14952312968671322 device: cuda, dtype: torch.int32, 10000 times 0.14999118447303772 device: cuda, dtype: torch.int64, 10000 times 0.14951884001493454 ``` After: ``` __and__ (a.numel() == 10) for 100000 times device: cpu, dtype: torch.int8, 100000 times 0.23157884553074837 device: cpu, dtype: torch.uint8, 100000 times 0.23063660878688097 device: cpu, dtype: torch.int16, 100000 times 0.23005440644919872 device: cpu, dtype: torch.int32, 100000 times 0.23748818412423134 device: cpu, dtype: torch.int64, 100000 times 0.24106105230748653 device: cuda, dtype: torch.int8, 100000 times 1.4394256137311459 device: cuda, dtype: torch.uint8, 100000 times 1.4436759827658534 device: cuda, dtype: torch.int16, 100000 times 1.4631587155163288 device: cuda, dtype: torch.int32, 100000 times 1.459101552143693 device: cuda, dtype: torch.int64, 100000 times 1.4784048134461045 __and__ (a.numel() == 1000) for 10000 times device: cpu, dtype: torch.int8, 10000 times 0.028442862443625927 device: cpu, dtype: torch.uint8, 10000 times 0.028130197897553444 device: cpu, dtype: torch.int16, 10000 times 0.025318274274468422 device: cpu, dtype: torch.int32, 10000 times 0.02519288007169962 device: cpu, dtype: torch.int64, 10000 times 0.028299466706812382 device: cuda, dtype: torch.int8, 10000 times 0.14342594426125288 device: cuda, dtype: torch.uint8, 10000 times 0.145280827768147 device: cuda, dtype: torch.int16, 10000 times 0.14673697855323553 device: cuda, dtype: torch.int32, 10000 times 0.14499565307050943 device: cuda, dtype: torch.int64, 10000 times 0.14582364354282618 __iand__ (a.numel() == 10) for 100000 times device: cpu, dtype: torch.int8, 100000 times 0.25548241566866636 device: cpu, dtype: torch.uint8, 100000 times 0.2552562616765499 device: cpu, dtype: torch.int16, 100000 times 0.25905191246420145 device: cpu, dtype: torch.int32, 100000 times 0.26635489892214537 device: cpu, dtype: torch.int64, 100000 times 0.26269810926169157 device: cuda, dtype: torch.int8, 100000 times 1.485458506271243 device: cuda, dtype: torch.uint8, 100000 times 1.4742380809038877 device: cuda, dtype: torch.int16, 100000 times 1.507783885113895 device: cuda, dtype: torch.int32, 100000 times 1.4926990242674947 device: cuda, dtype: torch.int64, 100000 times 1.519851053133607 __iand__ (a.numel() == 1000) for 10000 times device: cpu, dtype: torch.int8, 10000 times 0.03425929415971041 device: cpu, dtype: torch.uint8, 10000 times 0.03293587639927864 device: cpu, dtype: torch.int16, 10000 times 0.029559112153947353 device: cpu, dtype: torch.int32, 10000 times 0.030915481969714165 device: cpu, dtype: torch.int64, 10000 times 0.03292469773441553 device: cuda, dtype: torch.int8, 10000 times 0.15792148280888796 device: cuda, dtype: torch.uint8, 10000 times 0.16000914946198463 device: cuda, dtype: torch.int16, 10000 times 0.1600684942677617 device: cuda, dtype: torch.int32, 10000 times 0.16162546630948782 device: cuda, dtype: torch.int64, 10000 times 0.1629159888252616 ``` Fix https://github.com/pytorch/pytorch/issues/24508, https://github.com/pytorch/pytorch/issues/24509, https://github.com/pytorch/pytorch/issues/24655, https://github.com/pytorch/pytorch/issues/24656. Pull Request resolved: https://github.com/pytorch/pytorch/pull/31104 Differential Revision: D18938930 Pulled By: VitalyFedyunin fbshipit-source-id: a77e805a0b84e8ace16c6e648c2f67dad44f2e44	2020-01-03 10:32:36 -08:00
Kaiyu Shi	8c425dd201	Fix race condition when creating build dir (#30956 ) Summary: The original `check-and-act` style can raise `FileExistsError` when multiple processes are jit-compiling the extension on the same node. Pull Request resolved: https://github.com/pytorch/pytorch/pull/30956 Differential Revision: D19262570 Pulled By: ezyang fbshipit-source-id: bb18c72e42648770b47f9378ac7c3929c3c03efc	2020-01-03 07:58:26 -08:00

1 2 3 4 5 ...

8526 Commits