Commit Graph

8526 Commits

Author SHA1 Message Date
Tongzhou Wang
b6f43afaca Fix tensordot allowing negative dims (#31954)
Summary:
fixes https://github.com/pytorch/pytorch/issues/31926
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31954

Differential Revision: D19331847

Pulled By: zou3519

fbshipit-source-id: e30dd9517917c056a52be7d16f23247fe28f4e28
2020-01-10 07:42:04 -08:00
Rohan Varma
8ea49e7a08 add missing braces for format in rpc _to_worker_info (#31969)
Summary:
This was missing and resulted in the incorrect `name` passed into `_to_worker_info` not being printed out in the error message.
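
A minimal sketch of this kind of bug in plain Python. The message text and function names below are hypothetical and are not taken from the RPC source: when a format string has no `{}` placeholder, `str.format` accepts the argument and silently drops it.

```python
def describe_missing_braces(name):
    # No "{}" placeholder: format() takes the argument and ignores it.
    return "Unknown destination worker".format(name)


def describe_with_braces(name):
    # With the placeholder, the offending name shows up in the message.
    return "Unknown destination worker {}".format(name)


bad_message = describe_missing_braces("worker9")
good_message = describe_with_braces("worker9")
```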
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31969

Differential Revision: D19331927

Pulled By: rohan-varma

fbshipit-source-id: e74d47daec3224c2d9b9da3c0a6404cfa67baf65
2020-01-09 23:18:46 -08:00
Edward Yang
67c1d930eb Lock graph_task before writing leaf_streams. (#31995)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31995

Fixes #31906.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D19331259

Pulled By: ezyang

fbshipit-source-id: 5d24bf3555e632211a9b6f8e50ff241603c18b3d
2020-01-09 13:26:36 -08:00
TH3CHARLie
1296e2d55e C++ API parity: isinf (#31099)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/31021. Ports the legacy binding of `isinf` to C++ so that it is supported by the JIT.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31099

Differential Revision: D19314733

Pulled By: yf225

fbshipit-source-id: 5725c51d19c33b4fddd0fc9e7034078580bd534e
2020-01-09 13:16:13 -08:00
Sameer Deshmukh
cfdfdf70d7 remove JSON dumping dependency (#30724)
Summary:
Fix for https://github.com/pytorch/pytorch/issues/19420

So after actually writing a C++ JSON dumping class I figured that
a faster and cleaner way would be to simply rewrite the Python without
the JSON module since the JSON that we need to output is so simple.

For now I decided to not touch the `parse_cpu_trace` function since
only changing `export_chrome_trace` shows a 4x speedup.
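
As a rough illustration of the idea (not the code from this PR), the sketch below formats trace events by hand and checks the result with the `json` module. The field names follow the Chrome Trace Event Format; the helper names, the sample events, and the assumption that event names need no JSON escaping are all hypothetical.

```python
import json


def format_event(name, start_us, duration_us, thread_id):
    # Hand-written JSON for one "complete" event (ph == "X").
    # Assumes `name` contains no characters that need JSON escaping.
    return (
        '{"name": "%s", "ph": "X", "ts": %s, "dur": %s, '
        '"tid": %s, "pid": "CPU functions", "args": {}}'
        % (name, start_us, duration_us, thread_id)
    )


def export_trace(events):
    # Join the pre-formatted events into a single JSON array.
    return "[" + ", ".join(format_event(*event) for event in events) + "]"


events = [("mul", 0, 5, 1), ("add", 6, 2, 1)]
trace_text = export_trace(events)
parsed = json.loads(trace_text)  # the output is still valid JSON
```

Skipping the generic serializer only works because every event has the same fixed, flat shape.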

Here's the script I used for benchmarking:
``` python
import time
import torch

x = torch.ones(2, 2)

start = time.time()
with torch.autograd.profiler.profile() as prof:
  for _ in range(10000):
    x * x

for i in range(50):
  prof.export_chrome_trace("trace.json")

stop = time.time()

print(stop-start)
```
master branch (using json dump) -> 8.07515025138855
new branch (without json dump) ->  2.0943689346313477

I checked the trace file generated in the [test](https://github.com/pytorch/pytorch/blob/master/test/test_autograd.py#L2659)
and it does work fine.

Please let me know what you think.

If you still insist on the C++ version I can send a new patch soon enough.

CC ezyang rgommers
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30724

Differential Revision: D19298955

Pulled By: ezyang

fbshipit-source-id: b0d7324ea5f90884ab8a00dd272f3aa3d9bc0427
2020-01-09 12:56:16 -08:00
jlquinn
bc68a8745f Spelling fix in transformer docs
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/31973

Differential Revision: D19330660

Pulled By: zou3519

fbshipit-source-id: 29ea1e790a34f0241cb7aba85110f087cdc069ba
2020-01-09 11:13:23 -08:00
Edward Yang
ddff4efa26 Don't use RTLD_GLOBAL to load _C. (#31162)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31162

This should help us resolve a multitude of weird segfaults and crashes
when PyTorch is imported along with other packages. Those would often
happen because libtorch symbols were exposed globally and could be used
as a source of relocations in shared libraries loaded after libtorch.

Fixes #3059.

Some of the subtleties in preparing this patch:

* Getting ASAN to play ball was a pain in the ass. The basic problem is that when we load with `RTLD_LOCAL`, we now may load a library multiple times into the address space; this happens when we have custom C++ extensions. Since the libraries are usually identical, this is usually benign, but it is technically undefined behavior and UBSAN hates it. I tried a few ways of getting things to "work" correctly: I preload libstdc++ (so that it is seen consistently over all library loads) and turned off vptr checks entirely. Another possibility is to have a mode where we use `RTLD_GLOBAL` to load `_C`, which would be acceptable in environments where you're sure C++ lines up correctly. There's a long comment in the test script going into more detail about this.
* Making some of our shared library dependencies load with `RTLD_LOCAL` breaks them. OpenMPI and MKL don't work; they play linker shenanigans to look up their symbols, which doesn't work when loaded locally, and if we load a library with `RTLD_LOCAL` we aren't able to subsequently see it with `ctypes`. To solve this problem, we employ a clever device invented by apaszke: we create a dummy library `torch_global_deps` with dependencies on all of the libraries which need to be loaded globally, and then load that with `RTLD_GLOBAL`. As long as none of these libraries have C++ symbols, we can avoid confusion about the C++ standard library.
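
For readers unfamiliar with how a Python process chooses between the two modes, here is a minimal Unix-only sketch using `sys.getdlopenflags` and `sys.setdlopenflags`, which control the flags the interpreter passes to `dlopen` when importing extension modules. The helper name is hypothetical, and the callable passed in merely stands in for an import; this is not the loading code from this patch.

```python
import os
import sys


def call_with_dlopen_flags(fn, flags):
    """Call fn() with the given dlopen flags, then restore the previous ones."""
    previous = sys.getdlopenflags()
    sys.setdlopenflags(flags)
    try:
        return fn()
    finally:
        sys.setdlopenflags(previous)


if hasattr(sys, "getdlopenflags"):  # not available on Windows
    before = sys.getdlopenflags()
    # sys.getdlopenflags stands in for "import the extension module here".
    seen = call_with_dlopen_flags(
        sys.getdlopenflags, os.RTLD_GLOBAL | os.RTLD_LAZY
    )
    flags_applied = bool(seen & os.RTLD_GLOBAL)
    flags_restored = sys.getdlopenflags() == before
```

Restoring the flags in `finally` keeps the global setting from leaking into later, unrelated imports.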

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Differential Revision: D19262579

Test Plan: Imported from OSS

Pulled By: ezyang

fbshipit-source-id: 06a48a5d2c9036aacd535f7e8a4de0e8fe1639f2
2020-01-09 07:28:15 -08:00
Edward Yang
8614860210 Uniformly apply Windows logic in cpp_extensions everywhere (#31161)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31161

Previously, it wasn't necessary to specify `DT_NEEDED` in C++ extensions on Linux (aka pass `-l` flags) because all of the symbols would have already been loaded with `RTLD_GLOBAL`, so there wouldn't be any undefined symbols. But when we switch to loading `_C` with `RTLD_LOCAL`, it's now necessary for all the C++ extensions to know what libraries to link with. The resulting code is clearer and more uniform, so it's a win all around.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D19262578

Pulled By: ezyang

fbshipit-source-id: a893cc96f2e9aad1c064a6de4f7ccf79257dec3f
2020-01-09 07:28:11 -08:00
Elias Ellison
8ecd3f783d check for object equality in constant pooling (#31800)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31800

If we know that two constants are the same object, we can ignore other constraints and pool them together. This fixes an issue introduced by the other PR where quantization relied on constant pooling happening for correctness.
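
A loose plain-Python analogy of the rule described above, not the JIT pass itself. The function name and the `safe_to_merge` callback (standing in for the "other constraints") are hypothetical: constants that are the same object are pooled unconditionally, while merely equal constants are pooled only if the extra check allows it.

```python
def pool_constants(constants, safe_to_merge):
    """Return, for each constant, the index of the constant it is pooled into."""
    representative = []
    for i, const in enumerate(constants):
        target = i
        for j in range(i):
            if representative[j] != j:
                continue  # compare only against canonical entries
            other = constants[j]
            if const is other:
                # Same object: pool without consulting other constraints.
                target = j
                break
            if const == other and safe_to_merge(other, const):
                target = j
                break
        representative.append(target)
    return representative


shared = [1, 2]
constants = [shared, shared, [1, 2]]
strict = pool_constants(constants, lambda a, b: False)
relaxed = pool_constants(constants, lambda a, b: True)
```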

Test Plan: Imported from OSS

Differential Revision: D19269499

Pulled By: eellison

fbshipit-source-id: 9d4396125aa6899cb081863d463d4f024135cbf4
2020-01-08 16:47:07 -08:00
Elias Ellison
319cc21108 Add AliasDb API For Changing Aliasing (#31501)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31501

We have a number of places in our code base where we should be checking if it's safe to change the alias relationship between two sets of values. This PR adds an api to Alias Db to consolidate the logic, and refactors Constant Pooling and `CSE` to use the new api. Next steps: add api usage in peephole.cpp where applicable.

Happy to bikeshed `AliasDb::safeToChangeAliasingRelationship`. Previously I suggested `AliasDb::safeToIntroduceAliasing`, however that's not quite accurate, because this API also handles when it is unsafe to remove aliasing.

Alternate suggestions: `safeToChangeAliasing`, `validToChangeAliasing`, `validToChangeAliasingRelationship`

Related:  https://github.com/pytorch/pytorch/issues/28360

Test Plan: Imported from OSS

Differential Revision: D19254413

Pulled By: eellison

fbshipit-source-id: 17f7f52ad2d1526d303132767cbbb32f8189ae15
2020-01-08 16:47:03 -08:00
davidriazati
883fb5434a Use real argument names for Python functions (#29300)
Summary:
This hooks up `inspect` so that Python functions get their parameters
names attached instead of naming them `0, 1, 2, ...`. This also fixes
issue #28537 where `ignore` functions were improperly typing `self`.
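
A minimal sketch of how `inspect` exposes parameter names, assuming a simple helper (`parameter_names` and `scale` are hypothetical names, not part of this change):

```python
import inspect


def parameter_names(fn):
    """Return the declared parameter names of a Python function, in order."""
    return list(inspect.signature(fn).parameters)


def scale(tensor, factor):
    return tensor * factor


names = parameter_names(scale)
# Index-based naming of the kind the summary describes: "0", "1", "2", ...
positional_names = [str(i) for i in range(len(names))]
```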
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29300

Pulled By: driazati

Differential Revision: D19256434

fbshipit-source-id: 6a1fe7bd0afab708b8439517798955d0abfeb44c
2020-01-08 15:41:28 -08:00
xiaobing.zhang
9ba6a768de Add op bitwise_or (#31559)
Summary:
ezyang, this PR adds the bitwise_or operator, in the same way as https://github.com/pytorch/pytorch/pull/31104.
Benchmark script:
```
import timeit
import torch
torch.manual_seed(1)

for n, t in [(10, 100000),(1000, 10000)]:
    print('__or__ (a.numel() == {}) for {} times'.format(n, t))
    for device in ('cpu', 'cuda'):
        for dtype in ('torch.int8', 'torch.uint8', 'torch.int16', 'torch.int32', 'torch.int64'):
            print(f'device: {device}, dtype: {dtype}, {t} times', end='\t\t')
            print(timeit.timeit(f'a | b\nif "{device}" == "cuda": torch.cuda.synchronize()', setup=f'import torch; a = torch.randint(0, 10, ({n},), dtype = {dtype}, device="{device}"); b = torch.randint(0, 10, ({n},), dtype = {dtype}, device="{device}")', number=t))

for n, t in [(10, 100000),(1000, 10000)]:
    print('__ior__ (a.numel() == {}) for {} times'.format(n, t))
    for device in ('cpu', 'cuda'):
        for dtype in ('torch.int8', 'torch.uint8', 'torch.int16', 'torch.int32', 'torch.int64'):
            print(f'device: {device}, dtype: {dtype}, {t} times', end='\t\t')
            print(timeit.timeit(f'a | b\nif "{device}" == "cuda": torch.cuda.synchronize()', setup=f'import torch; a = torch.randint(0, 10, ({n},), dtype = {dtype}, device="{device}"); b = torch.tensor(5, dtype = {dtype}, device="{device}")', number=t))
```
Device: **Tesla P100, skx-8180**
CUDA version: **9.0.176**

Before:
```
__or__ (a.numel() == 10) for 100000 times
device: cpu, dtype: torch.int8, 100000 times            0.17616272252053022
device: cpu, dtype: torch.uint8, 100000 times           0.17148233391344547
device: cpu, dtype: torch.int16, 100000 times           0.17616403382271528
device: cpu, dtype: torch.int32, 100000 times           0.17717823758721352
device: cpu, dtype: torch.int64, 100000 times           0.1801931718364358
device: cuda, dtype: torch.int8, 100000 times           1.270583058707416
device: cuda, dtype: torch.uint8, 100000 times          1.2636413089931011
device: cuda, dtype: torch.int16, 100000 times          1.2839747751131654
device: cuda, dtype: torch.int32, 100000 times          1.2548385225236416
device: cuda, dtype: torch.int64, 100000 times          1.2650810535997152
__or__ (a.numel() == 1000) for 10000 times
device: cpu, dtype: torch.int8, 10000 times             0.031136621721088886
device: cpu, dtype: torch.uint8, 10000 times            0.030786747112870216
device: cpu, dtype: torch.int16, 10000 times            0.02391665056347847
device: cpu, dtype: torch.int32, 10000 times            0.024147341027855873
device: cpu, dtype: torch.int64, 10000 times            0.024414129555225372
device: cuda, dtype: torch.int8, 10000 times            0.12741921469569206
device: cuda, dtype: torch.uint8, 10000 times           0.1249831635504961
device: cuda, dtype: torch.int16, 10000 times           0.1283819805830717
device: cuda, dtype: torch.int32, 10000 times           0.12591975275427103
device: cuda, dtype: torch.int64, 10000 times           0.12655890546739101
__ior__ (a.numel() == 10) for 100000 times
device: cpu, dtype: torch.int8, 100000 times            0.3908365070819855
device: cpu, dtype: torch.uint8, 100000 times           0.38267823681235313
device: cpu, dtype: torch.int16, 100000 times           0.38239253498613834
device: cpu, dtype: torch.int32, 100000 times           0.3817988149821758
device: cpu, dtype: torch.int64, 100000 times           0.3901665909215808
device: cuda, dtype: torch.int8, 100000 times           1.4211318120360374
device: cuda, dtype: torch.uint8, 100000 times          1.4215159295126796
device: cuda, dtype: torch.int16, 100000 times          1.4307750314474106
device: cuda, dtype: torch.int32, 100000 times          1.4123614141717553
device: cuda, dtype: torch.int64, 100000 times          1.4480243818834424
__ior__ (a.numel() == 1000) for 10000 times
device: cpu, dtype: torch.int8, 10000 times             0.06468924414366484
device: cpu, dtype: torch.uint8, 10000 times            0.06442475505173206
device: cpu, dtype: torch.int16, 10000 times            0.05267547257244587
device: cpu, dtype: torch.int32, 10000 times            0.05286940559744835
device: cpu, dtype: torch.int64, 10000 times            0.06211103219538927
device: cuda, dtype: torch.int8, 10000 times            0.15332304500043392
device: cuda, dtype: torch.uint8, 10000 times           0.15353196952492
device: cuda, dtype: torch.int16, 10000 times           0.15300503931939602
device: cuda, dtype: torch.int32, 10000 times           0.15274472255259752
device: cuda, dtype: torch.int64, 10000 times           0.1512152962386608
```
After:
```
__or__ (a.numel() == 10) for 100000 times
device: cpu, dtype: torch.int8, 100000 times            0.2465507509186864
device: cpu, dtype: torch.uint8, 100000 times           0.2472386620938778
device: cpu, dtype: torch.int16, 100000 times           0.2469814233481884
device: cpu, dtype: torch.int32, 100000 times           0.2535214088857174
device: cpu, dtype: torch.int64, 100000 times           0.24855613708496094
device: cuda, dtype: torch.int8, 100000 times           1.4351346511393785
device: cuda, dtype: torch.uint8, 100000 times          1.4434308474883437
device: cuda, dtype: torch.int16, 100000 times          1.4520929995924234
device: cuda, dtype: torch.int32, 100000 times          1.4456610176712275
device: cuda, dtype: torch.int64, 100000 times          1.4580101007595658
__or__ (a.numel() == 1000) for 10000 times
device: cpu, dtype: torch.int8, 10000 times             0.029985425993800163
device: cpu, dtype: torch.uint8, 10000 times            0.03024935908615589
device: cpu, dtype: torch.int16, 10000 times            0.026356655173003674
device: cpu, dtype: torch.int32, 10000 times            0.027377349324524403
device: cpu, dtype: torch.int64, 10000 times            0.029163731262087822
device: cuda, dtype: torch.int8, 10000 times            0.14540370367467403
device: cuda, dtype: torch.uint8, 10000 times           0.1456305105239153
device: cuda, dtype: torch.int16, 10000 times           0.1450125053524971
device: cuda, dtype: torch.int32, 10000 times           0.1472016740590334
device: cuda, dtype: torch.int64, 10000 times           0.14709716010838747
__ior__ (a.numel() == 10) for 100000 times
device: cpu, dtype: torch.int8, 100000 times            0.27195510920137167
device: cpu, dtype: torch.uint8, 100000 times           0.2692424338310957
device: cpu, dtype: torch.int16, 100000 times           0.27726674638688564
device: cpu, dtype: torch.int32, 100000 times           0.2815811652690172
device: cpu, dtype: torch.int64, 100000 times           0.2852728571742773
device: cuda, dtype: torch.int8, 100000 times           1.4743850827217102
device: cuda, dtype: torch.uint8, 100000 times          1.4766502184793353
device: cuda, dtype: torch.int16, 100000 times          1.4774163831025362
device: cuda, dtype: torch.int32, 100000 times          1.4749693805351853
device: cuda, dtype: torch.int64, 100000 times          1.5772947426885366
__ior__ (a.numel() == 1000) for 10000 times
device: cpu, dtype: torch.int8, 10000 times             0.03614502027630806
device: cpu, dtype: torch.uint8, 10000 times            0.03619729354977608
device: cpu, dtype: torch.int16, 10000 times            0.0319912089034915
device: cpu, dtype: torch.int32, 10000 times            0.03319283854216337
device: cpu, dtype: torch.int64, 10000 times            0.0343862259760499
device: cuda, dtype: torch.int8, 10000 times            0.1581476852297783
device: cuda, dtype: torch.uint8, 10000 times           0.15974601730704308
device: cuda, dtype: torch.int16, 10000 times           0.15957212820649147
device: cuda, dtype: torch.int32, 10000 times           0.16002820804715157
device: cuda, dtype: torch.int64, 10000 times           0.16129320487380028
```
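
The `__or__` and `__ior__` labels in the output above are the names of Python's operator hooks for `|` and `|=`. As a plain-Python illustration of the difference (the `Bits` class is hypothetical and unrelated to the tensor implementation):

```python
class Bits:
    def __init__(self, values):
        self.values = list(values)

    def __or__(self, other):
        # Out-of-place: "a | b" builds and returns a new object.
        return Bits(x | y for x, y in zip(self.values, other.values))

    def __ior__(self, other):
        # In-place: "a |= b" updates a and returns it.
        self.values = [x | y for x, y in zip(self.values, other.values)]
        return self


a = Bits([1, 2, 4])
b = Bits([4, 4, 4])
c = a | b          # calls a.__or__(b)
same_object = a
a |= b             # calls a.__ior__(b)
```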

Fix  https://github.com/pytorch/pytorch/issues/24511, https://github.com/pytorch/pytorch/issues/24515, https://github.com/pytorch/pytorch/issues/24658, https://github.com/pytorch/pytorch/issues/24662.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31559

Differential Revision: D19315875

Pulled By: ezyang

fbshipit-source-id: 4a3ca88fdafbeb796079687e676228111eb44aad
2020-01-08 15:06:30 -08:00
Alban Desmaison
1314f7f4f4 Ensure the original grad_mode is restored during backward (#31884)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31884

Fix #31715

Test Plan: Imported from OSS

Differential Revision: D19301076

Pulled By: albanD

fbshipit-source-id: 2d20c01bfb6364fa96c8fe5aa5ce7ea39defa3ce
2020-01-08 14:16:51 -08:00
Edward Yang
5dfcfeebb8 Revert D19298735: Emit warning from deprecated torch function signatures
Test Plan: revert-hammer

Differential Revision:
D19298735

Original commit changeset: 03cb78af1765

fbshipit-source-id: 304a6d4412f53a8fc822d36897c96815432e0f70
2020-01-08 13:04:41 -08:00
Shen Li
7f723cbd8a Revert D19290954: Implement backend-agnostic rpc._wait_all_workers() utility
Test Plan: revert-hammer

Differential Revision:
D19290954

Original commit changeset: cdb22203c2f2

fbshipit-source-id: 2ae194a06a645e4f48879271eccf0588b0956cd3
2020-01-08 10:25:51 -08:00
Shihao Xu
6664703842 Implement backend-agnostic rpc._wait_all_workers() utility (#31888)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31888

We need a backend-agnostic mechanism to perform a barrier-like operation before locally destroying the RRef context and shutting down the RPC agent.

- Sort worker names.
- Elect the first name in the ordered worker names as the leader.
- Followers report their intent to synchronize to the leader.
- The leader also reports to itself when `_wait_all_workers()` is called.
- Once all workers have reported their intent to proceed, the leader sends the command to proceed to everyone.
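
The steps above can be simulated in a single process. This is only a sketch under strong assumptions: threads stand in for workers, a shared `threading.Event` stands in for the leader's "proceed" message, and all names (`LeaderState`, `run_workers`) are hypothetical rather than part of the RPC API.

```python
import threading


class LeaderState:
    """State kept by the elected leader (hypothetical helper)."""

    def __init__(self, worker_names):
        self.expected = set(worker_names)
        self.reported = set()
        self.lock = threading.Lock()
        self.proceed = threading.Event()

    def report_intent(self, name):
        # Called by every worker, including the leader itself.
        with self.lock:
            self.reported.add(name)
            if self.reported == self.expected:
                self.proceed.set()  # leader tells everyone to proceed


def elect_leader(worker_names):
    # The first name in sorted order is the leader.
    return sorted(worker_names)[0]


def run_workers(worker_names):
    leader = elect_leader(worker_names)
    state = LeaderState(worker_names)
    finished = []
    finished_lock = threading.Lock()

    def worker(name):
        state.report_intent(name)
        state.proceed.wait(timeout=5)  # block until the leader releases us
        with finished_lock:
            finished.append(name)

    threads = [threading.Thread(target=worker, args=(n,)) for n in worker_names]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return leader, sorted(finished)


leader, done = run_workers(["worker2", "worker0", "worker1"])
```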
ghstack-source-id: 96386210

Test Plan:
# Unit tests

```
buck test mode/dev-nosan //caffe2/test:rpc_fork

buck-out/gen/caffe2/test/rpc_fork\#binary.par -r test_wait_all_workers
buck-out/gen/caffe2/test/rpc_fork\#binary.par -r test_rref_leak
```

```
buck test mode/dev-nosan //caffe2/test:rpc_fork_thrift

buck-out/gen/caffe2/test/rpc_fork\#binary.par -r test_wait_all_workers
buck-out/gen/caffe2/test/rpc_fork_thrift\#binary.par -r test_worker_id
```

# Stress runs
```
buck test mode/dev-nosan //caffe2/test:rpc_fork_thrift -- test_stress_light_rpc --stress-runs 10
```

```
buck test mode/dev-nosan //caffe2/test:rpc_spawn_thrift -- test_stress_light_rpc --stress-runs 10
```

```
buck test mode/dev-nosan //caffe2/test:rpc_fork_thrift -- test_stress_heavy_rpc --stress-runs 10
```

```
buck test mode/dev-nosan //caffe2/test:rpc_spawn_thrift -- test_stress_heavy_rpc --stress-runs 10
```

Differential Revision: D19290954

fbshipit-source-id: cdb22203c2f27b5e0d0ad5b2d3b279d438c22dcf
2020-01-08 01:00:25 -08:00
davidriazati
3c07eb33bb Better error for torch::jit::loading a eager file (#31709)
Summary:
This adds a check to catch the case where someone `torch.save`s something then `torch::jit::load`s it in C++.

Relevant for #31620
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31709

Pulled By: driazati

Differential Revision: D19252172

fbshipit-source-id: f2a9b4442647285418b2778306629b4ff77c15e5
2020-01-07 16:20:42 -08:00
Shihao Xu
a730920a3d Make RRef leak detection always print a warning log (#31922)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31922

To make the `test_rref_leak` failure in https://app.circleci.com/jobs/github/pytorch/pytorch/4135881 easier to debug, always print a warning log when an RRef leak is detected, as discussed in https://github.com/pytorch/pytorch/pull/31888.

ghstack-source-id: 96375261

Test Plan:
# Unit tests

```
buck test mode/dev-nosan //caffe2/test:rpc_fork

buck-out/gen/caffe2/test/rpc_fork\#binary.par -r test_wait_all_workers
buck-out/gen/caffe2/test/rpc_fork\#binary.par -r test_rref_leak
```

# Stress runs
```
buck test mode/dev-nosan //caffe2/test:rpc_fork_thrift -- test_stress_light_rpc --stress-runs 10
```

```
buck test mode/dev-nosan //caffe2/test:rpc_spawn_thrift -- test_stress_light_rpc --stress-runs 10
```

```
buck test mode/dev-nosan //caffe2/test:rpc_fork_thrift -- test_stress_heavy_rpc --stress-runs 10
```

```
buck test mode/dev-nosan //caffe2/test:rpc_spawn_thrift -- test_stress_heavy_rpc --stress-runs 10
```

Differential Revision: D19302814

fbshipit-source-id: 51632aede98e01689f8bc0f266788a9b020daa15
2020-01-07 15:18:00 -08:00
Karl Ostmo
227d1a43a4 Revert D18838848: disable __torch_function__ overides for operators in torch.functional
Test Plan: revert-hammer

Differential Revision:
D18838848

Original commit changeset: 22b8015d7b2f

fbshipit-source-id: fdaeffcd112990ed379782cf7216d3f1beeb2cb1
2020-01-07 15:03:15 -08:00
Nathan Goldbaum
ca72df06ae disable __torch_function__ overides for operators in torch.functional (#30839)
Summary:
For now I'm just removing the decorators from all of the currently overridable functions in `torch.functional`. This means they are no longer overridable, however this should fix the benchmark regressions reported in https://github.com/pytorch/pytorch/issues/30831. Moving forward we'll be looking at reducing the overhead of the python-level override mechanism and failing that, re-implementing all of these operators in C++.

cc hl475
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30839

Differential Revision: D18838848

Pulled By: ezyang

fbshipit-source-id: 22b8015d7b2f7a947f1ebc9632c998e081b48ad8
2020-01-07 12:27:28 -08:00
Artem Volkhin
3a2757c682 Fix tracing for modules with List[Tensor] as output (#31343)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31343

Fix an issue in TorchScript tracing for modules with `c10::List<at::Tensor>` as an output. TensorList was not supported properly.

Test Plan: unit tests

Reviewed By: wanchaol

Differential Revision: D18850722

fbshipit-source-id: 87a223104d1361fe754d55deceeb1e8bbcad629b
2020-01-07 11:57:25 -08:00
Pritam Damania
bf8e1c0710 Integrate async mode for autograd engine with distributed autograd. (#31508)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31508

This PR builds on top of https://github.com/pytorch/pytorch/pull/31230
to ensure that distributed autograd doesn't block an RPC thread anymore during
the backward pass.

I've also added a unit test where all ranks hammer rank 0 with about 60
backward calls (which would have caused a deadlock earlier); such a test now
passes without any issues.
ghstack-source-id: 96345097

Test Plan: waitforbuildbot

Differential Revision: D19188749

fbshipit-source-id: b21381b38175699afd0f9dce1ddc8ea6a220f589
2020-01-07 11:01:16 -08:00
Peter Bell
0e5a6700cc Emit warning from deprecated torch function signatures (#31514)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/28430

The unpythonic signatures for functions such as `torch.addcdiv` are already separated in [`deprecated.yaml`] and the signatures are marked as deprecated in `PythonArgParser`. However, nothing was done with this information previously. So, this now emits a warning when the deprecated signatures are used.

One minor complication is that if all arguments are passed as keyword args then there is nothing to differentiate the deprecated overload. This can lead to false warnings being emitted. So, I've also modified `PythonArgParser` to prefer non-deprecated signatures.
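
A toy sketch of the behavior described above, written in plain Python on scalars. The function `addcdiv_sketch` is hypothetical and only loosely modeled on the two argument orders of `torch.addcdiv`; the real dispatch happens in `PythonArgParser`, not in Python code like this.

```python
import warnings


def addcdiv_sketch(input, *args, value=1):
    """Toy scalar version that accepts both argument orders."""
    if len(args) == 3:
        # Deprecated order: (input, value, tensor1, tensor2)
        warnings.warn(
            "this signature is deprecated; pass value as a keyword argument",
            UserWarning,
            stacklevel=2,
        )
        value, tensor1, tensor2 = args
    else:
        # Preferred order: (input, tensor1, tensor2, *, value=1)
        tensor1, tensor2 = args
    return input + value * tensor1 / tensor2


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    new_style = addcdiv_sketch(1.0, 6.0, 3.0, value=2)
    warnings_after_new = len(caught)
    old_style = addcdiv_sketch(1.0, 2, 6.0, 3.0)
    warnings_after_old = len(caught)
```

Both calls compute the same result; only the deprecated positional form triggers the warning.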

[`deprecated.yaml`]: https://github.com/pytorch/pytorch/blob/master/tools/autograd/deprecated.yaml
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31514

Differential Revision: D19298735

Pulled By: ezyang

fbshipit-source-id: 03cb78af17658eaab9d577cd2497c6f413f07647
2020-01-07 10:57:53 -08:00
Pritam Damania
5cc62f2913 Ensure autograd callbacks are called only once for reentrant backward. (#31909)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31909

https://github.com/pytorch/pytorch/pull/31230 introduced a bug where
we would end up calling `graph_task_post_processing` twice for reentrant
backward calls (once when we mark the future completed and then again when we called
graph_task_post_processing in execute_with_graph_task).

This PR fixes the issue by verifying the future we return in that case is
completed and we remove the call to graph_task_post_processing.

In addition to that I added a test that reproduced the problem and verified it
is fixed by this PR.
ghstack-source-id: 96349102

Test Plan: waitforbuildbot

Differential Revision: D19296363

fbshipit-source-id: dc01a4e95989709ad163bb0357b1d191ef5a4fb2
2020-01-07 10:35:04 -08:00
Sameer Deshmukh
2f5eefe525 Raise ValueError if CUDA device is specified without specifying the : (#29087)
Summary:
Fix for https://github.com/pytorch/pytorch/issues/19076
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29087

Differential Revision: D19298959

Pulled By: ezyang

fbshipit-source-id: 878ea4840682012f07177d8d159a77c0e5afada6
2020-01-07 10:29:49 -08:00
Edward Yang
3c7db5ccbc Don't unconditionally compile runJITCPPTests (#31236)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31236

It is not compiled on Windows

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D19262581

Pulled By: ezyang

fbshipit-source-id: 80bfa553333a946f00291aaca6ad26313caaa9e6
2020-01-07 10:24:52 -08:00
Andreas Koepf
22044c6f7c Use TORCH_CHECK instead of AT_ASSERT in torch::cuda::gather() (#27456)
Summary:
The error message produced by AT_ASSERT() in gather() encouraged users to file a bug report ("please report a bug to PyTorch..."). The assertion should be a regular argument check since it can be triggered by passing tensors with different dimensionality, e.g. `torch.cuda.comm.gather([torch.rand(1, device='cuda'), torch.rand(1, 1, device='cuda')])`.

See: https://github.com/pytorch/pytorch/issues/26400
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27456

Differential Revision: D19300270

Pulled By: ezyang

fbshipit-source-id: ec87d225e23445020b377521e0daccceb4748215
2020-01-07 10:04:24 -08:00
yyb1995
20c5dd59bd Add stub for transformer.py and MultiheadAttention Class. (#28396)
Summary:
Add stub for `transformer.py` and `class MultiheadAttention`. Add import for `transformer.py` and `class MultiheadAttention` in `__init__.pyi.in`. I've tested the code hints in PyCharm and everything works fine.
Related issue: [https://github.com/pytorch/pytorch/issues/27842](https://github.com/pytorch/pytorch/issues/27842)
cc ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28396

Differential Revision: D19300287

Pulled By: ezyang

fbshipit-source-id: 1a79d6518b5edd4643892c46a959108385c739ad
2020-01-07 09:13:36 -08:00
Peter Bell
5d80f63478 no_grad, enable_grad: support for decorating generator functions (#31792)
Summary:
Closes https://github.com/pytorch/pytorch/issues/31497

This allows `torch.no_grad` and `torch.enable_grad` to be used as decorators for generator functions. In that case, grad is disabled/enabled only inside the body of the generator, and the previous mode is restored outside of the generator.

https://github.com/pytorch/pytorch/issues/31497 doesn't include a complete reproducer, but the included test with `torch.is_grad_enabled` shows this is working where it failed before.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31792

Differential Revision: D19274971

Pulled By: albanD

fbshipit-source-id: fde6d3fd95d76c8d324ad02db577213a4b68ccbe
2020-01-06 15:21:20 -08:00
Edward Yang
58cffbff91 Add missing TORCH_CUDA_API annotation to throw_nccl_error (#31157)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31157

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D19262583

Pulled By: ezyang

fbshipit-source-id: 8fb87b41ab53770329b38e1e2fe679fb868fee12
2020-01-06 14:39:51 -08:00
neginraoof
112196fdee Fix index put (#31552)
Summary:
This change is required for cases like `x[1:] = data` or `x[:3] = data`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31552

Reviewed By: hl475

Differential Revision: D19238815

Pulled By: houseroad

fbshipit-source-id: 56c9837d86b341ea92b0a71d55034ce189d12e6c
2020-01-06 14:09:48 -08:00
neginraoof
78cba90a8c Enable constant folding for Reshape (#31054)
Summary:
Enabled constant folding for onnx::Reshape
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31054

Reviewed By: hl475

Differential Revision: D18946951

Pulled By: houseroad

fbshipit-source-id: 499e8bf5fb091a94f7a27cbdf4311a23b1a6e3d3
2020-01-06 13:35:44 -08:00
anjali411
ddff014b79 fixed scale_factor calculation for uint8 tensor (#31778)
Summary:
When calling the add_images() method on the tensorboard SummaryWriter with a uint8 NCHW tensor, the tensor is incorrectly scaled, resulting in overflow behavior. This leads to incorrect images being displayed in tensorboard.

Issue: https://github.com/pytorch/pytorch/issues/31459

Local Testing (ran this code with and without the PR changes and printed scale_factor):

import torch
import torchvision
from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter()
x=torch.tensor([[[[1, 2, 3], [4, 5, 6]]]], dtype=torch.uint8)
writer.add_images("images", x)

Before- scale_factor: 255, After- scale_factor: 1
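
To see why a scale factor of 255 is wrong for data that is already `uint8`, here is a small arithmetic sketch in plain Python. The helper names are hypothetical, and modelling the overflow as 8-bit wrap-around is an assumption made for illustration.

```python
def calc_scale_factor(dtype_name):
    # uint8 data is assumed to be in [0, 255] already; other data in [0, 1].
    return 1 if dtype_name == "uint8" else 255


def store_as_uint8(value, scale):
    # Emulate wrap-around when the scaled value is stored in 8 bits.
    return int(value * scale) % 256


pixel = 4  # one value from the uint8 tensor in the test above
wrong = store_as_uint8(pixel, 255)  # 4 * 255 = 1020 does not fit in 8 bits
right = store_as_uint8(pixel, calc_scale_factor("uint8"))
float_pixel = store_as_uint8(0.5, calc_scale_factor("float32"))
```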
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31778

Differential Revision: D19289189

Pulled By: anjali411

fbshipit-source-id: 350a1650337244deae4fd8f8b7fb0e354ae6986b
2020-01-06 10:27:35 -08:00
meganset
1ba1799a66 C++ added 3rd arg of false to BatchNorm/InstanceNorm register_parameter … (#31873)
Summary:
Fix for issue https://github.com/pytorch/pytorch/issues/31680
C++ BatchNorm & InstanceNorm attempt to register undefined tensors when affine is false.

Fixes https://github.com/pytorch/pytorch/issues/31680
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31873

Differential Revision: D19287087

Pulled By: yf225

fbshipit-source-id: 0d57f10c49083386919b703d72b520a73a8e9e7f
2020-01-06 01:46:24 -08:00
Shen Li
33430cf094 Revert D18643137: Implement backend-agnostic rpc._wait_all_workers() utility
Test Plan: revert-hammer

Differential Revision:
D18643137

Original commit changeset: d669d4fc9ad6

fbshipit-source-id: fe1f8ed77c1c5760638fef06e67ba100b86c33e9
2020-01-05 11:58:51 -08:00
Pritam Damania
fde94e7556 Provide async mode for local autograd engine. (#31230)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31230

A major issue with distributed autograd currently is that we block an
RPC thread when we call Engine::execute_with_graph_task.

To resolve this issue, I've made modifications to the local autograd engine
such that `execute_with_graph_task` returns a Future instead. The `execute()`
methods for Engine::execute() and DistEngine::execute() still wait() on this
Future which ensures there is no change in behavior yet.

In follow up PRs we can modify the distributed autograd engine to take
advantage of this Future.
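
The shape of this change can be sketched with the standard library. This is an analogy only: `concurrent.futures` stands in for the engine's own Future type, and the two functions below are hypothetical Python counterparts of the C++ methods named above.

```python
from concurrent.futures import ThreadPoolExecutor

_pool = ThreadPoolExecutor(max_workers=2)


def execute_with_graph_task(work):
    """Start the work and return a Future immediately instead of blocking."""
    return _pool.submit(work)


def execute(work):
    """Blocking entry point: unchanged behavior, now built on the Future."""
    return execute_with_graph_task(work).result()


blocking_result = execute(lambda: 21 * 2)

# A caller that must not block can hold on to the Future
# and collect the result later.
future = execute_with_graph_task(lambda: "backward done")
async_result = future.result(timeout=5)
```

Keeping the blocking wrapper means existing callers see no change, while later callers can hold on to the Future instead of tying up a thread.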

Closes #26359
ghstack-source-id: 96298057

Test Plan: waitforbuildbot

Differential Revision: D18999709

fbshipit-source-id: 388f54467fd2415a0acb7df17bd063aedc105229
2020-01-05 00:29:28 -08:00
James Noeckel
3f0b330736 corrected keyword argument name in docs for Tensor.scatter (#31617)
Summary:
See https://github.com/pytorch/pytorch/issues/31601
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31617

Differential Revision: D19268872

Pulled By: mruberry

fbshipit-source-id: 52f0213f4aab991fd549b7623556a2ced61631a6
2020-01-04 21:48:30 -08:00
Shihao Xu
502533cfe6 Implement backend-agnostic rpc._wait_all_workers() utility (#30710)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30710

We need a backend-agnostic mechanism to perform a barrier-like operation before locally destroying the RRef context and shutting down the RPC agent.

- Sort worker names.
- Elect the first name as the leader in the ordered worker names.
- Followers report their intent to synchronize to the leader.
- The leader also reports to itself when `_wait_all_workers()` is called.
- Once all workers have reported their intent to proceed, the leader sends the command to proceed to everyone.
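
The steps above can be sketched as a small in-process simulation. This is a minimal sketch, not the RPC implementation: threads stand in for workers, a shared set stands in for the reports sent to the leader, and every name here (`wait_all_workers_sketch`, `report_intent`) is hypothetical.

```python
import threading

def wait_all_workers_sketch(worker_names):
    """Simulate the barrier with one thread per worker (all names hypothetical)."""
    leader = sorted(worker_names)[0]   # first name in sorted order is the leader
    reported = set()                   # workers that have reported their intent
    lock = threading.Lock()
    proceed = threading.Event()        # stands in for the leader's proceed command
    finished = []

    def report_intent(name):
        # Stands in for an RPC to the leader; the leader reports to itself too.
        with lock:
            reported.add(name)
            if reported == set(worker_names):
                proceed.set()          # everyone has reported: release all workers

    def worker(name):
        report_intent(name)
        proceed.wait()                 # block until the leader says to proceed
        with lock:
            finished.append(name)

    threads = [threading.Thread(target=worker, args=(n,)) for n in worker_names]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return leader, sorted(finished)
```

No worker passes `proceed.wait()` until the last report arrives, which is the barrier-like property the summary asks for.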

Test Plan:
# Unit tests

```
buck test mode/dev-nosan //caffe2/test:rpc_fork -- test_wait_all_workers

buck-out/gen/caffe2/test/rpc_fork\#binary.par -r test_wait_all_workers$
buck-out/gen/caffe2/test/rpc_fork\#binary.par -r test_rref_leak
buck-out/gen/caffe2/test/rpc_fork\#binary.par -r test_rref_forward_chain
```

```
buck test mode/dev-nosan //caffe2/test:rpc_fork_thrift -- test_wait_all_workers

buck-out/gen/caffe2/test/rpc_fork_thrift\#binary.par -r test_wait_all_workers$
```

# Stress runs
```
buck test mode/dev-nosan //caffe2/test:rpc_fork_thrift -- test_stress_light_rpc --stress-runs 10
```

```
buck test mode/dev-nosan //caffe2/test:rpc_spawn_thrift -- test_stress_light_rpc --stress-runs 10
```

```
buck test mode/dev-nosan //caffe2/test:rpc_fork_thrift -- test_stress_heavy_rpc --stress-runs 10
```

```
buck test mode/dev-nosan //caffe2/test:rpc_spawn_thrift -- test_stress_heavy_rpc --stress-runs 10
```

# Debug

```
buck test mode/dev-nosan caffe2/test:rpc_fork -- test_shutdown
```

```
buck test mode/dev-nosan //caffe2/test:dist_autograd_fork -- test_clean_context_during_backward

buck build mode/dev-nosan //caffe2/test:dist_autograd_fork

buck-out/gen/caffe2/test/dist_autograd_fork\#binary.par -r test_clean_context_during_backward
```

https://our.intern.facebook.com/intern/testinfra/diagnostics/281475127895800.844424945328750.1575664368/

```
I1206 12:27:47.491420 185619 process_group_agent.cpp:211] Shutting down ProcessGroupAgent.
I1206 12:27:47.493880 185630 process_group_agent.cpp:211] Shutting down ProcessGroupAgent.
I1206 12:27:47.494526 185625 process_group_agent.cpp:211] Shutting down ProcessGroupAgent.
I1206 12:27:47.495390 185636 process_group_agent.cpp:211] Shutting down ProcessGroupAgent.
E1206 12:27:47.544198 185627 pair.cc:642] 1 --->>> 0, read ERROR: AsyncSocketException: Network error, type = Network error, errno = 104 (Connection reset by peer)
E1206 12:27:47.544203 185633 pair.cc:642] 2 --->>> 0, read ERROR: AsyncSocketException: Network error, type = Network error, errno = 104 (Connection reset by peer)
E1206 12:27:47.544210 185639 pair.cc:642] 3 --->>> 0, read ERROR: AsyncSocketException: Network error, type = Network error, errno = 104 (Connection reset by peer)
```
This should mean the UDF in the request has been run, so Python proceeded and ran to `_agent.shutdown()`.

The RpcAgents on the followers then tried to send back the response, but the leader had already closed RPC.

Need to re-trigger "pytorch_rpc-buck" to reproduce this rarely seen issue.

Differential Revision: D18643137

fbshipit-source-id: d669d4fc9ad65ed48bed1329a4eb1c32ba51323c
2020-01-04 17:13:44 -08:00
Martin Yuan
f362cd510d Move prim ops from JIT registration to C10 (#30612)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30612

The first version to move prim ops to c10 registration. After the reviewers are fine with the initial changes, more operators will be moved in the same style.

Test Plan: Imported from OSS

Differential Revision: D19237648

Pulled By: iseeyuan

fbshipit-source-id: c5a519604efffb80564a556536f17d829f71d9f9
2020-01-04 13:47:44 -08:00
Jerry Zhang
5579611544 Enable foldbn tests (#29220)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29220

Support for accessing constants was added in previous
PRs; this PR re-enables the foldbn tests.

Test Plan:
test_jit.py

Imported from OSS

Differential Revision: D18846848

fbshipit-source-id: 90ceaf42539ffee80b984e0d8b2420da66c263c3
2020-01-04 11:47:01 -08:00
Jerry Zhang
ebe69236d1 Expose class constant through attr and setattr in object (#29219)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29219

We added class constants in previous PRs; this PR allows access to
class constants in the object API.

Test Plan:
build/bin/test_jit
python test/test_jit.py

Imported from OSS

Differential Revision: D18846851

fbshipit-source-id: 888a6517d5f747d1f8ced283c0c2c30b2f6c72c6
2020-01-04 11:09:35 -08:00
Jerry Zhang
6f62c311a1 Add unsafeRemoveConstant for ClassType (#30787)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30787

This is needed when we fuse conv bn modules,
where we need to rewrite a constant bias (None) of conv to an attribute
bias of Tensor

Test Plan:
build/bin/test_jit

Imported from OSS

Differential Revision: D18846850

fbshipit-source-id: 9fd5fe85d93d07226e180b75d2e068fe00ca25fe
2020-01-04 01:11:59 -08:00
Jerry Zhang
2bac76969c Fix getConstant (#31012)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31012

- getConstant should throw when the item is not found
- add another getConstant which takes slot index as argument
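
The two lookups listed above can be illustrated with a small Python sketch. It assumes constants are stored in parallel name and value lists indexed by slot; the class and method names are hypothetical and do not mirror the C++ `ClassType` API.

```python
class ClassTypeSketch:
    """Hypothetical stand-in for a type that stores named constants in slots."""

    def __init__(self):
        self._names = []    # slot index -> constant name
        self._values = []   # slot index -> constant value

    def add_constant(self, name, value):
        """Append a constant and return its slot index."""
        self._names.append(name)
        self._values.append(value)
        return len(self._names) - 1

    def get_constant(self, name):
        """Look up by name; raise instead of returning a placeholder."""
        if name not in self._names:
            raise KeyError(f"no constant named {name!r}")
        return self._values[self._names.index(name)]

    def get_constant_by_slot(self, slot):
        """Look up by slot index, rejecting out-of-range slots."""
        if not 0 <= slot < len(self._values):
            raise IndexError(f"slot {slot} out of range")
        return self._values[slot]
```

Raising on a missing name keeps a lookup failure from being mistaken for a legitimately stored value such as `None`.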

Test Plan:
test_class_type.cpp

Imported from OSS

Differential Revision: D18898418

fbshipit-source-id: d3a23a4896fdbf5fa98e1c55c9c4d6205840014b
2020-01-03 23:06:11 -08:00
Michael Suo
8420f205ee Remove refs from ArrayRef arguments (#31845)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31845

ArrayRef is trivially copyable and should be passed by value. Removing
unnecessary `&`s.

Test Plan: Imported from OSS

Differential Revision: D19278523

Pulled By: suo

fbshipit-source-id: 026db693ea98d19246b02c48d49d1929ecb6478e
2020-01-03 22:50:55 -08:00
Jerry Zhang
5fe3604987 Preserve constant from ConcreteModuleType to ClassType (#29218)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29218

We need to be able to access constant in module.

Test Plan:
tbd

Imported from OSS

Differential Revision: D18846847

fbshipit-source-id: 22d2c485c3c449bc14ad798f6e1a0c64fc8fb346
2020-01-03 21:30:04 -08:00
Rohan Varma
28c9dd4436 fix ProcessGroupGlooTest (#31255)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31255

This test had 2 issues. It would occasionally time out because of a 50ms timeout, and CUDA code would get compiled and run on CPU, leading to errors. This PR fixes those issues.

Differential Revision: D19028231

fbshipit-source-id: e50752228affe0021e7c0caa83bce78d76473759
2020-01-03 18:35:29 -08:00
Rohan Varma
457c57d9f7 use unordered_set instead of vector for futureTimeouts key in (#31813)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31813

Closes https://github.com/pytorch/pytorch/issues/31804. We were using
an `std::vector` for the key for a map that keeps track of futures to mark them
if they time out, but we can instead use an `unordered_set`. This results in a
faster lookup in the code block where we remove futureIDs from this set when
they complete successfully. Previously we were finding them via a linear
`std::find`. Switching it to a constant time find will help performance in the
case where a large number of futures are scheduled to time out at the same
time, or if there is no timeout enforced.
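
The data-structure change can be sketched in Python, where a `set` plays the role of `unordered_set`. This is an illustration under assumptions, not the agent's C++ code: the mapping from deadline to future ids and the helper names are hypothetical.

```python
from collections import defaultdict

# Hypothetical stand-in: map each timeout deadline to the ids of futures due then.
futures_at_time = defaultdict(set)

def schedule(future_id, deadline):
    """Record that future_id times out at deadline."""
    futures_at_time[deadline].add(future_id)

def mark_completed(future_id, deadline):
    """Drop a completed future; set removal is O(1) on average."""
    ids = futures_at_time.get(deadline)
    if ids is None:
        return
    ids.discard(future_id)
    if not ids:
        del futures_at_time[deadline]  # no futures left at this deadline
```

With a list in place of the set, each removal would scan up to every id sharing the deadline, which is the linear cost the summary describes.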

To benchmark a rough perf improvement, I created 50k futures with the same
timeout. Before this PR, the lookup `std::find(futuresAtTime.begin(),
futuresAtTime.end(), id)` took ~200us, now it takes 1us.
ghstack-source-id: 96251355

Test Plan: Unit tests pass.

Differential Revision: D19269798

fbshipit-source-id: 1a0fa84a478ee27a16ab0b9fa6f5413b065a663e
2020-01-03 13:21:23 -08:00
Edward Yang
9c9d3cd550 Revert D19262570: Fix race condition when creating build dir
Test Plan: revert-hammer

Differential Revision:
D19262570

Original commit changeset: bb18c72e4264

fbshipit-source-id: 40675ef6ef4c98629deaaef0b25956f92534ff50
2020-01-03 11:17:42 -08:00
xiaobing.zhang
b47e9b97a2 Add op bitwise_and (#31104)
Summary:
Refer to https://github.com/pytorch/pytorch/pull/25665, add `bitwise_and` operator.
Benchmark script:
```
import timeit
#for __and__
for n, t in [(10, 100000),(1000, 10000)]:
    print('__and__ (a.numel() == {}) for {} times'.format(n, t))
    for device in ('cpu', 'cuda'):
        for dtype in ('torch.int8', 'torch.uint8', 'torch.int16', 'torch.int32', 'torch.int64'):
            print(f'device: {device}, dtype: {dtype}, {t} times', end='\t\t')
            print(timeit.timeit(f'a & b\nif "{device}" == "cuda": torch.cuda.synchronize()', setup=f'import torch; a = torch.randint(0, 10, ({n},), dtype = {dtype}, device="{device}"); b = torch.randint(0, 10, ({n},), dtype = {dtype}, device="{device}")', number=t))
#for __iand__
for n, t in [(10, 100000),(1000, 10000)]:
    print('__iand__ (a.numel() == {}) for {} times'.format(n, t))
    for device in ('cpu', 'cuda'):
        for dtype in ('torch.int8', 'torch.uint8', 'torch.int16', 'torch.int32', 'torch.int64'):
            print(f'device: {device}, dtype: {dtype}, {t} times', end='\t\t')
            print(timeit.timeit(f'a & b\nif "{device}" == "cuda": torch.cuda.synchronize()', setup=f'import torch; a = torch.randint(0, 10, ({n},), dtype = {dtype}, device="{device}"); b = torch.tensor(5, dtype = {dtype}, device="{device}")', number=t))
```
Device: **Tesla P100, skx-8180**
CUDA version: **9.0.176**

Before:
```
__and__ (a.numel() == 10) for 100000 times
device: cpu, dtype: torch.int8, 100000 times            0.1766007635742426
device: cpu, dtype: torch.uint8, 100000 times           0.17322628945112228
device: cpu, dtype: torch.int16, 100000 times           0.17650844901800156
device: cpu, dtype: torch.int32, 100000 times           0.17711848113685846
device: cpu, dtype: torch.int64, 100000 times           0.18240160401910543
device: cuda, dtype: torch.int8, 100000 times           1.273967768996954
device: cuda, dtype: torch.uint8, 100000 times          1.2778537990525365
device: cuda, dtype: torch.int16, 100000 times          1.2753686187788844
device: cuda, dtype: torch.int32, 100000 times          1.2797665279358625
device: cuda, dtype: torch.int64, 100000 times          1.2933144550770521
__and__ (a.numel() == 1000) for 10000 times
device: cpu, dtype: torch.int8, 10000 times             0.031139614060521126
device: cpu, dtype: torch.uint8, 10000 times            0.03091452084481716
device: cpu, dtype: torch.int16, 10000 times            0.022756479680538177
device: cpu, dtype: torch.int32, 10000 times            0.025045674294233322
device: cpu, dtype: torch.int64, 10000 times            0.024164282716810703
device: cuda, dtype: torch.int8, 10000 times            0.12820732593536377
device: cuda, dtype: torch.uint8, 10000 times           0.12775669433176517
device: cuda, dtype: torch.int16, 10000 times           0.12697868794202805
device: cuda, dtype: torch.int32, 10000 times           0.12832533661276102
device: cuda, dtype: torch.int64, 10000 times           0.1280576130375266
__iand__ (a.numel() == 10) for 100000 times
device: cpu, dtype: torch.int8, 100000 times            0.3687064303085208
device: cpu, dtype: torch.uint8, 100000 times           0.36253443732857704
device: cpu, dtype: torch.int16, 100000 times           0.362891579978168
device: cpu, dtype: torch.int32, 100000 times           0.37680106051266193
device: cpu, dtype: torch.int64, 100000 times           0.3689364707097411
device: cuda, dtype: torch.int8, 100000 times           1.419940729625523
device: cuda, dtype: torch.uint8, 100000 times          1.4247053815051913
device: cuda, dtype: torch.int16, 100000 times          1.4191444097086787
device: cuda, dtype: torch.int32, 100000 times          1.4305962566286325
device: cuda, dtype: torch.int64, 100000 times          1.4567416654899716
__iand__ (a.numel() == 1000) for 10000 times
device: cpu, dtype: torch.int8, 10000 times             0.06224383972585201
device: cpu, dtype: torch.uint8, 10000 times            0.06205617543309927
device: cpu, dtype: torch.int16, 10000 times            0.05016433447599411
device: cpu, dtype: torch.int32, 10000 times            0.05216377507895231
device: cpu, dtype: torch.int64, 10000 times            0.06139362137764692
device: cuda, dtype: torch.int8, 10000 times            0.14827249851077795
device: cuda, dtype: torch.uint8, 10000 times           0.14801877550780773
device: cuda, dtype: torch.int16, 10000 times           0.14952312968671322
device: cuda, dtype: torch.int32, 10000 times           0.14999118447303772
device: cuda, dtype: torch.int64, 10000 times           0.14951884001493454
```
After:
```
__and__ (a.numel() == 10) for 100000 times
device: cpu, dtype: torch.int8, 100000 times            0.23157884553074837
device: cpu, dtype: torch.uint8, 100000 times           0.23063660878688097
device: cpu, dtype: torch.int16, 100000 times           0.23005440644919872
device: cpu, dtype: torch.int32, 100000 times           0.23748818412423134
device: cpu, dtype: torch.int64, 100000 times           0.24106105230748653
device: cuda, dtype: torch.int8, 100000 times           1.4394256137311459
device: cuda, dtype: torch.uint8, 100000 times          1.4436759827658534
device: cuda, dtype: torch.int16, 100000 times          1.4631587155163288
device: cuda, dtype: torch.int32, 100000 times          1.459101552143693
device: cuda, dtype: torch.int64, 100000 times          1.4784048134461045
__and__ (a.numel() == 1000) for 10000 times
device: cpu, dtype: torch.int8, 10000 times             0.028442862443625927
device: cpu, dtype: torch.uint8, 10000 times            0.028130197897553444
device: cpu, dtype: torch.int16, 10000 times            0.025318274274468422
device: cpu, dtype: torch.int32, 10000 times            0.02519288007169962
device: cpu, dtype: torch.int64, 10000 times            0.028299466706812382
device: cuda, dtype: torch.int8, 10000 times            0.14342594426125288
device: cuda, dtype: torch.uint8, 10000 times           0.145280827768147
device: cuda, dtype: torch.int16, 10000 times           0.14673697855323553
device: cuda, dtype: torch.int32, 10000 times           0.14499565307050943
device: cuda, dtype: torch.int64, 10000 times           0.14582364354282618
__iand__ (a.numel() == 10) for 100000 times
device: cpu, dtype: torch.int8, 100000 times            0.25548241566866636
device: cpu, dtype: torch.uint8, 100000 times           0.2552562616765499
device: cpu, dtype: torch.int16, 100000 times           0.25905191246420145
device: cpu, dtype: torch.int32, 100000 times           0.26635489892214537
device: cpu, dtype: torch.int64, 100000 times           0.26269810926169157
device: cuda, dtype: torch.int8, 100000 times           1.485458506271243
device: cuda, dtype: torch.uint8, 100000 times          1.4742380809038877
device: cuda, dtype: torch.int16, 100000 times          1.507783885113895
device: cuda, dtype: torch.int32, 100000 times          1.4926990242674947
device: cuda, dtype: torch.int64, 100000 times          1.519851053133607
__iand__ (a.numel() == 1000) for 10000 times
device: cpu, dtype: torch.int8, 10000 times             0.03425929415971041
device: cpu, dtype: torch.uint8, 10000 times            0.03293587639927864
device: cpu, dtype: torch.int16, 10000 times            0.029559112153947353
device: cpu, dtype: torch.int32, 10000 times            0.030915481969714165
device: cpu, dtype: torch.int64, 10000 times            0.03292469773441553
device: cuda, dtype: torch.int8, 10000 times            0.15792148280888796
device: cuda, dtype: torch.uint8, 10000 times           0.16000914946198463
device: cuda, dtype: torch.int16, 10000 times           0.1600684942677617
device: cuda, dtype: torch.int32, 10000 times           0.16162546630948782
device: cuda, dtype: torch.int64, 10000 times           0.1629159888252616
```
Fixes https://github.com/pytorch/pytorch/issues/24508, https://github.com/pytorch/pytorch/issues/24509, https://github.com/pytorch/pytorch/issues/24655, https://github.com/pytorch/pytorch/issues/24656.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31104

Differential Revision: D18938930

Pulled By: VitalyFedyunin

fbshipit-source-id: a77e805a0b84e8ace16c6e648c2f67dad44f2e44
2020-01-03 10:32:36 -08:00
Kaiyu Shi
8c425dd201 Fix race condition when creating build dir (#30956)
Summary:
The original `check-and-act` style can raise `FileExistsError` when multiple processes are jit-compiling the extension on the same node.
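
One common way to avoid the check-then-act window in Python is to attempt the creation unconditionally and tolerate an existing directory. The sketch below shows that general technique; it is not necessarily the exact change made in this PR, and `ensure_build_dir` is a hypothetical helper name.

```python
import os

def ensure_build_dir(path):
    """Create path if needed, without a separate existence check beforehand."""
    # exist_ok=True makes the call succeed even if another process created the
    # directory between our call and the underlying mkdir.
    os.makedirs(path, exist_ok=True)
    return path
```

The racy alternative, `if not os.path.exists(path): os.makedirs(path)`, can raise `FileExistsError` when a second process creates the directory between the check and the call.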
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30956

Differential Revision: D19262570

Pulled By: ezyang

fbshipit-source-id: bb18c72e42648770b47f9378ac7c3929c3c03efc
2020-01-03 07:58:26 -08:00