Commit Graph

23444 Commits

Author SHA1 Message Date
Tongzhou Wang
b6f43afaca Fix tensordot allowing negative dims (#31954)
Summary:
fixes https://github.com/pytorch/pytorch/issues/31926
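A minimal usage sketch of what this enables, assuming the fix makes negative `dims` entries behave like their positive counterparts (shapes and dims chosen for illustration):
```python
import torch

a = torch.randn(3, 4, 5)
b = torch.randn(5, 4, 2)

# Contract a's last two dims with b's first two dims; the negative indices
# should resolve to the same result as the positive ones.
out_neg = torch.tensordot(a, b, dims=([-2, -1], [1, 0]))
out_pos = torch.tensordot(a, b, dims=([1, 2], [1, 0]))
assert torch.allclose(out_neg, out_pos)
```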
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31954

Differential Revision: D19331847

Pulled By: zou3519

fbshipit-source-id: e30dd9517917c056a52be7d16f23247fe28f4e28
2020-01-10 07:42:04 -08:00
Rohan Varma
8ea49e7a08 add missing braces for format in rpc _to_worker_info (#31969)
Summary:
The braces were missing, so the incorrect `name` passed into `_to_worker_info` was not printed in the error message.
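Purely for illustration (this is not the actual RPC code), the failure mode of a format string missing its braces looks like this:
```python
name = "bad_worker"

# Missing `{}` placeholder: format() silently drops the argument.
print("Unknown worker name: ".format(name))    # Unknown worker name:

# With the braces, the offending name actually appears in the message.
print("Unknown worker name: {}".format(name))  # Unknown worker name: bad_worker
```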
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31969

Differential Revision: D19331927

Pulled By: rohan-varma

fbshipit-source-id: e74d47daec3224c2d9b9da3c0a6404cfa67baf65
2020-01-09 23:18:46 -08:00
Jiakai Liu
4e84661139 update llvmlite to 0.30.0 (#31858)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31858

Trying to upgrade the docker image, but ran into the following error:

```
Running test_nn ... [2020-01-04 18:05:12.537860]
Traceback (most recent call last):
  File "test_nn.py", line 45, in <module>
    from common_cuda import TEST_CUDA, TEST_MULTIGPU, TEST_CUDNN, TEST_CUDNN_VERSION
  File "/var/lib/jenkins/workspace/test/common_cuda.py", line 16, in <module>
    import numba.cuda
  File "/opt/conda/lib/python3.6/site-packages/numba/__init__.py", line 178, in <module>
    _ensure_llvm()
  File "/opt/conda/lib/python3.6/site-packages/numba/__init__.py", line 100, in _ensure_llvm
    raise ImportError(msg)
ImportError: Numba requires at least version 0.30.0 of llvmlite.
Installed version is 0.28.0.
```

Test Plan: Imported from OSS

Differential Revision: D19282923

Pulled By: ljk53

fbshipit-source-id: bdeefbf4f6c0c97df622282f76e77eb1eadba436
2020-01-09 19:28:08 -08:00
Shen Li
62f93443e5 Explain RPC behavior when using Tensor as arg or return value
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/31968

Test Plan: Imported from OSS

Differential Revision: D19321380

Pulled By: mrshenli

fbshipit-source-id: e3431f1f02963cc8d8266a420ab03866106f26ac
2020-01-09 16:42:24 -08:00
Zafar Takhirov
6abfa9ad8a Quantized H Tangent function (#31031)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31031

This activation will be needed for the LSTM implementation.
Also includes the QNNPack implementation.

Test Plan: Imported from OSS

Differential Revision: D19334280

Pulled By: z-a-f

fbshipit-source-id: ae14399765a47afdf9b1e072d3967c24ff473e8d
2020-01-09 16:16:17 -08:00
Bram Wasti
021e1e20c1 Revert D19320493: Javadoc changes
Test Plan: revert-hammer

Differential Revision:
D19320493

Original commit changeset: cc76b2a2acbe

fbshipit-source-id: 3b36dd2d2591acc60a06a421dd625c21adbe578a
2020-01-09 14:23:30 -08:00
Jiakai Liu
700d1c5cbc update CI script to take string docker image version (#31857)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31857

According to mingbowan, we will switch to a string docker image
version because the tag is no longer an integer now that the docker
image build job has moved to Circle CI:
http://ossci-docker.s3-website.us-east-1.amazonaws.com/pytorch.html

Test Plan: - with stacked PR

Differential Revision: D19282726

Pulled By: ljk53

fbshipit-source-id: 7a12ae89a11cf15163b905734d50fed6dc98cb07
2020-01-09 14:15:10 -08:00
Lu Fang
67ff051ddd Remove temporary fix for torchbind in BC check (#31982)
Summary:
Remove the patch
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31982

Reviewed By: hl475

Differential Revision: D19333205

Pulled By: houseroad

fbshipit-source-id: 1d16fd31ede7266789141238520d47b762a7a340
2020-01-09 13:58:16 -08:00
Alban Desmaison
2968faf154 Update doc about output_differentiability keyword in derivatives.yaml
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/31925

Test Plan: Imported from OSS

Differential Revision: D19303833

Pulled By: albanD

fbshipit-source-id: 291a9f122720844a5f8386b22cf6abc66ae86e4d
2020-01-09 13:48:06 -08:00
Edward Yang
67c1d930eb Lock graph_task before writing leaf_streams. (#31995)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31995

Fixes #31906.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D19331259

Pulled By: ezyang

fbshipit-source-id: 5d24bf3555e632211a9b6f8e50ff241603c18b3d
2020-01-09 13:26:36 -08:00
TH3CHARLie
1296e2d55e C++ API parity: isinf (#31099)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/31021: ports the legacy binding method of `isinf` to C++, thereby supporting JIT.
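A small sketch of the kind of usage this unlocks (function name is illustrative):
```python
import torch

@torch.jit.script
def any_inf(x):
    # isinf is now usable from TorchScript since it is implemented in C++.
    return torch.isinf(x).any()

print(any_inf(torch.tensor([1.0, float("inf")])))  # tensor(True)
```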
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31099

Differential Revision: D19314733

Pulled By: yf225

fbshipit-source-id: 5725c51d19c33b4fddd0fc9e7034078580bd534e
2020-01-09 13:16:13 -08:00
Sameer Deshmukh
cfdfdf70d7 remove JSON dumping dependency (#30724)
Summary:
Fix for https://github.com/pytorch/pytorch/issues/19420

After actually writing a C++ JSON dumping class, I figured that
a faster and cleaner way would be to simply rewrite the Python without
the JSON module, since the JSON that we need to output is so simple.

For now I decided not to touch the `parse_cpu_trace` function, since
changing only `export_chrome_trace` already shows a 4x speedup.

Here's the script I used for benchmarking:
``` python
import time
import torch

x = torch.ones(2, 2)

start = time.time()
with torch.autograd.profiler.profile() as prof:
  for _ in range(10000):
    x * x

for i in range(50):
  prof.export_chrome_trace("trace.json")

stop = time.time()

print(stop-start)
```
master branch (using json dump) -> 8.07515025138855
new branch (without json dump) ->  2.0943689346313477
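The gist of the change, as a rough sketch (the event fields below are illustrative, not the profiler's exact schema): build the chrome-trace JSON by plain string formatting rather than going through the `json` module.
```python
# Hypothetical (name, start_us, duration_us, thread_id) events.
events = [("mul", 10, 5, 0), ("mul", 20, 5, 0)]

chunks = []
for name, start, dur, tid in events:
    chunks.append(
        '{{"name": "{}", "ph": "X", "ts": {}, "dur": {}, "tid": {}, "pid": 0}}'
        .format(name, start, dur, tid))

with open("trace.json", "w") as f:
    f.write("[" + ", ".join(chunks) + "]")
```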

I checked the trace file generated in the [test](https://github.com/pytorch/pytorch/blob/master/test/test_autograd.py#L2659)
and it does work fine.

Please let me know what you think.

If you still insist on the C++ version I can send a new patch soon enough.

CC ezyang rgommers
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30724

Differential Revision: D19298955

Pulled By: ezyang

fbshipit-source-id: b0d7324ea5f90884ab8a00dd272f3aa3d9bc0427
2020-01-09 12:56:16 -08:00
jlquinn
bc68a8745f Spelling fix in transformer docs
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/31973

Differential Revision: D19330660

Pulled By: zou3519

fbshipit-source-id: 29ea1e790a34f0241cb7aba85110f087cdc069ba
2020-01-09 11:13:23 -08:00
Jessica Lin
26f552a3d1 Javadoc changes (#31956)
Summary:
- Add Javadoc url in index.rst
- Delete no longer needed java rst files
- Remove intersphinx extension from conf.oy
- Remove javasphinx from docs/requirements.txt
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31956

Differential Revision: D19320493

Pulled By: jlin27

fbshipit-source-id: cc76b2a2acbe2ecdabcd3339e1cc3182f0c906ae
2020-01-09 10:55:24 -08:00
xiaobing.zhang
e59e5ba5a3 Move geometric to Aten(CPU) (#31878)
Summary:
Fix https://github.com/pytorch/pytorch/issues/24704.
Benchmark script:
```
import torch
import torch.nn as nn
import time

torch.manual_seed(0)

def _time():
    return time.time()

device = "cpu"

#warm up
for n in [10, 100, 1000]:
    input = torch.randn(128, n, requires_grad=False, device=device)
    for i in range(1000):
        input.geometric_(0.5)

for n in [1, 10, 100, 1000]:
    fwd_t = 0
    input = torch.randn(128, n, requires_grad=False, device=device)
    for i in range(10000):
        t1 = _time()
        input.geometric_(0.5)
        t2 = _time()
        fwd_t = fwd_t + (t2 -t1)
    fwd_avg = fwd_t / 10000 * 1000
    print("input size(128, %d) forward time is %.4f (ms)." % (n, fwd_avg))
```
Test device: **skx-8180**.
Before:
```
input size(128, 1) forward time is 0.0092 (ms).
input size(128, 10) forward time is 0.0802 (ms).
input size(128, 100) forward time is 0.7994 (ms).
input size(128, 1000) forward time is 7.8403 (ms).
```
After:
```
input size(128, 1) forward time is 0.0088 (ms).
input size(128, 10) forward time is 0.0781 (ms).
input size(128, 100) forward time is 0.7815 (ms).
input size(128, 1000) forward time is 7.7163 (ms).
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31878

Differential Revision: D19314510

Pulled By: ezyang

fbshipit-source-id: 2d95bf9938c8becf280890acf9e37223ddd08a39
2020-01-09 10:47:56 -08:00
xiaobing.zhang
99b3f9cac4 Move log_sigmoid to Aten(CPU) (#30958)
Summary:
VitalyFedyunin, this PR ports the LogSigmoid activation to ATen.
Test script:
```
import torch
import torch.nn as nn
import time

torch.manual_seed(0)

def _time():
    return time.time()

device = "cpu"
m = nn.LogSigmoid()
#warm up
for n in [1, 10, 100, 1000]:
    input = torch.randn(128, n, requires_grad=True, device=device)
    grad_output = torch.randn(128, n, device=device)
    for i in range(1000):
        output = m(input)
        output.backward(grad_output)

for n in [1, 10, 100, 1000]:
    input = torch.randn(128, n, requires_grad=True, device=device)
    grad_output = torch.randn(128, n, device=device)
    fwd_t = 0
    bwd_t = 0
    for i in range(10000):
        t1 = _time()
        output = m(input)
        t2 = _time()
        output.backward(grad_output)
        t3 = _time()
        fwd_t = fwd_t + (t2 -t1)
        bwd_t = bwd_t + (t3 - t2)
    fwd_avg = fwd_t / 10000 * 1000
    bwd_avg = bwd_t / 10000 * 1000
    print("input size(128, %d) forward time is %.2f (ms); backwad avg time is %.2f (ms)."
          % (n, fwd_avg, bwd_avg))
```
**Before:**
```
input size(128, 1) forward time is 0.02 (ms); backward avg time is 0.02 (ms).
input size(128, 10) forward time is 0.10 (ms); backward avg time is 0.03 (ms).
input size(128, 100) forward time is 0.90 (ms); backward avg time is 0.09 (ms).
input size(128, 1000) forward time is 9.04 (ms); backward avg time is 0.87 (ms).
```
**After:**
```
input size(128, 1) forward time is 0.02 (ms); backward avg time is 0.02 (ms).
input size(128, 10) forward time is 0.02 (ms); backward avg time is 0.02 (ms).
input size(128, 100) forward time is 0.04 (ms); backward avg time is 0.03 (ms).
input size(128, 1000) forward time is 0.28 (ms); backward avg time is 0.07 (ms).
```
**OMP_NUM_THREADS=1:**
```
Before:
input size(128, 1) forward time is 0.02 (ms); backward avg time is 0.02 (ms).
input size(128, 10) forward time is 0.10 (ms); backward avg time is 0.03 (ms).
input size(128, 100) forward time is 0.88 (ms); backward avg time is 0.10 (ms).
input size(128, 1000) forward time is 8.72 (ms); backward avg time is 0.81 (ms).
After:
input size(128, 1) forward time is 0.01 (ms); backward avg time is 0.02 (ms).
input size(128, 10) forward time is 0.02 (ms); backward avg time is 0.02 (ms).
input size(128, 100) forward time is 0.07 (ms); backward avg time is 0.03 (ms).
input size(128, 1000) forward time is 0.63 (ms); backward avg time is 0.15 (ms).
```

Fix https://github.com/pytorch/pytorch/issues/24724, https://github.com/pytorch/pytorch/issues/24725.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30958

Differential Revision: D19275111

Pulled By: ezyang

fbshipit-source-id: bbfe82e58fb27a4fb21c1914c6547a9050072e5c
2020-01-09 10:30:00 -08:00
xiaobing.zhang
5a76335aaa Move lshift to Aten (#31566)
Summary:
VitalyFedyunin, this PR moves lshift to ATen.
Benchmark script:
```
import timeit
import torch
torch.manual_seed(1)

for n, t in [(10, 100000),(1000, 10000)]:
    print('__lshift__ (a.numel() == {}) for {} times'.format(n, t))
    for device in ('cpu', 'cuda'):
        for dtype in ('torch.int8', 'torch.uint8', 'torch.int16', 'torch.int32', 'torch.int64'):
            print(f'device: {device}, dtype: {dtype}, {t} times', end='\t\t')
            print(timeit.timeit(f'a << b\nif "{device}" == "cuda": torch.cuda.synchronize()', setup=f'import torch; a = torch.randint(0, 10, ({n},), dtype = {dtype}, device="{device}"); b = torch.randint(0, 10, ({n},), dtype = {dtype}, device="{device}")', number=t))
        for dtype in ('torch.float32', 'torch.float64'):
            print(f'device: {device}, dtype: {dtype}, {t} times', end='\t\t')
            print(timeit.timeit(f'a << b\nif "{device}" == "cuda": torch.cuda.synchronize()', setup=f'import torch; a = torch.randn({n}, dtype = {dtype}, device="{device}"); b = torch.randn({n}, dtype = {dtype}, device="{device}")', number=t))

for n, t in [(10, 100000),(1000, 10000)]:
    print('__ilshift__ (a.numel() == {}) for {} times'.format(n, t))
    for device in ('cpu', 'cuda'):
        for dtype in ('torch.int8', 'torch.uint8', 'torch.int16', 'torch.int32', 'torch.int64'):
            print(f'device: {device}, dtype: {dtype}, {t} times', end='\t\t')
            print(timeit.timeit(f'a << b\nif "{device}" == "cuda": torch.cuda.synchronize()', setup=f'import torch; a = torch.randint(0, 10, ({n},), dtype = {dtype}, device="{device}"); b = torch.tensor(5, dtype = {dtype}, device="{device}")', number=t))
        for dtype in ('torch.float32', 'torch.float64'):
            print(f'device: {device}, dtype: {dtype}, {t} times', end='\t\t')
            print(timeit.timeit(f'a << b\nif "{device}" == "cuda": torch.cuda.synchronize()', setup=f'import torch; a = torch.randn({n}, dtype = {dtype}, device="{device}"); b = torch.tensor(5, dtype = {dtype}, device="{device}")', number=t))
```
Device: **Tesla P100, skx-8180**
Cuda verison: **9.0.176**

Before:
```
__lshift__ (a.numel() == 10) for 100000 times
device: cpu, dtype: torch.int8, 100000 times            0.31618343852460384
device: cpu, dtype: torch.uint8, 100000 times           0.31258584931492805
device: cpu, dtype: torch.int16, 100000 times           0.3140896391123533
device: cpu, dtype: torch.int32, 100000 times           0.34389012958854437
device: cpu, dtype: torch.int64, 100000 times           0.339566046372056
device: cpu, dtype: torch.float32, 100000 times         0.4180623721331358
device: cpu, dtype: torch.float64, 100000 times         0.4165227338671684
device: cuda, dtype: torch.int8, 100000 times           1.7851383443921804
device: cuda, dtype: torch.uint8, 100000 times          1.7842160519212484
device: cuda, dtype: torch.int16, 100000 times          1.789359962567687
device: cuda, dtype: torch.int32, 100000 times          1.7822618428617716
device: cuda, dtype: torch.int64, 100000 times          1.7968465769663453
device: cuda, dtype: torch.float32, 100000 times                1.8066061967983842
device: cuda, dtype: torch.float64, 100000 times                1.8046843251213431
__lshift__ (a.numel() == 1000) for 10000 times
device: cpu, dtype: torch.int8, 10000 times             0.04618230368942022
device: cpu, dtype: torch.uint8, 10000 times            0.04634759668260813
device: cpu, dtype: torch.int16, 10000 times            0.040676115080714226
device: cpu, dtype: torch.int32, 10000 times            0.04404774494469166
device: cpu, dtype: torch.int64, 10000 times            0.04511771444231272
device: cpu, dtype: torch.float32, 10000 times          0.6887832451611757
device: cpu, dtype: torch.float64, 10000 times          0.5559549620375037
device: cuda, dtype: torch.int8, 10000 times            0.17996764183044434
device: cuda, dtype: torch.uint8, 10000 times           0.17970609478652477
device: cuda, dtype: torch.int16, 10000 times           0.17873135022819042
device: cuda, dtype: torch.int32, 10000 times           0.1781835313886404
device: cuda, dtype: torch.int64, 10000 times           0.17846618220210075
device: cuda, dtype: torch.float32, 10000 times         0.18056879844516516
device: cuda, dtype: torch.float64, 10000 times         0.18132662680000067
__ilshift__ (a.numel() == 10) for 100000 times
device: cpu, dtype: torch.int8, 100000 times            0.61110960226506
device: cpu, dtype: torch.uint8, 100000 times           0.6333359787240624
device: cpu, dtype: torch.int16, 100000 times           0.6345370784401894
device: cpu, dtype: torch.int32, 100000 times           0.6470990972593427
device: cpu, dtype: torch.int64, 100000 times           0.6587044578045607
device: cpu, dtype: torch.float32, 100000 times         0.7269002720713615
device: cpu, dtype: torch.float64, 100000 times         0.7217964073643088
device: cuda, dtype: torch.int8, 100000 times           1.9880435159429908
device: cuda, dtype: torch.uint8, 100000 times          1.986489498987794
device: cuda, dtype: torch.int16, 100000 times          2.0059875370934606
device: cuda, dtype: torch.int32, 100000 times          1.995262237265706
device: cuda, dtype: torch.int64, 100000 times          1.9974954994395375
device: cuda, dtype: torch.float32, 100000 times                2.00442770216614
device: cuda, dtype: torch.float64, 100000 times                2.009664717130363
__ilshift__ (a.numel() == 1000) for 10000 times
device: cpu, dtype: torch.int8, 10000 times             0.08199594635516405
device: cpu, dtype: torch.uint8, 10000 times            0.08096733782440424
device: cpu, dtype: torch.int16, 10000 times            0.0734213450923562
device: cpu, dtype: torch.int32, 10000 times            0.0769620593637228
device: cpu, dtype: torch.int64, 10000 times            0.08650507684797049
device: cpu, dtype: torch.float32, 10000 times          0.7196345143020153
device: cpu, dtype: torch.float64, 10000 times          0.597336508333683
device: cuda, dtype: torch.int8, 10000 times            0.19723015930503607
device: cuda, dtype: torch.uint8, 10000 times           0.19754122477024794
device: cuda, dtype: torch.int16, 10000 times           0.19710093270987272
device: cuda, dtype: torch.int32, 10000 times           0.19611249305307865
device: cuda, dtype: torch.int64, 10000 times           0.19750046730041504
device: cuda, dtype: torch.float32, 10000 times         0.19680574722588062
device: cuda, dtype: torch.float64, 10000 times         0.19689027685672045
```
After:
```
__lshift__ (a.numel() == 10) for 100000 times
device: cpu, dtype: torch.int8, 100000 times            0.3031281465664506
device: cpu, dtype: torch.uint8, 100000 times           0.30772678554058075
device: cpu, dtype: torch.int16, 100000 times           0.3088294789195061
device: cpu, dtype: torch.int32, 100000 times           0.30907699652016163
device: cpu, dtype: torch.int64, 100000 times           0.31315001379698515
device: cpu, dtype: torch.float32, 100000 times         0.38823566399514675
device: cpu, dtype: torch.float64, 100000 times         0.39300001971423626
device: cuda, dtype: torch.int8, 100000 times           1.3225595457479358
device: cuda, dtype: torch.uint8, 100000 times          1.31739442050457
device: cuda, dtype: torch.int16, 100000 times          1.3198596313595772
device: cuda, dtype: torch.int32, 100000 times          1.309600466862321
device: cuda, dtype: torch.int64, 100000 times          1.3264533821493387
device: cuda, dtype: torch.float32, 100000 times                1.3377520674839616
device: cuda, dtype: torch.float64, 100000 times                1.3343619462102652
__lshift__ (a.numel() == 1000) for 10000 times
device: cpu, dtype: torch.int8, 10000 times             0.02718757465481758
device: cpu, dtype: torch.uint8, 10000 times            0.02701799664646387
device: cpu, dtype: torch.int16, 10000 times            0.025483975186944008
device: cpu, dtype: torch.int32, 10000 times            0.025557605549693108
device: cpu, dtype: torch.int64, 10000 times            0.026179466396570206
device: cpu, dtype: torch.float32, 10000 times          0.0962932649999857
device: cpu, dtype: torch.float64, 10000 times          0.1611471576616168
device: cuda, dtype: torch.int8, 10000 times            0.13165222201496363
device: cuda, dtype: torch.uint8, 10000 times           0.13358880020678043
device: cuda, dtype: torch.int16, 10000 times           0.1342075066640973
device: cuda, dtype: torch.int32, 10000 times           0.1328689968213439
device: cuda, dtype: torch.int64, 10000 times           0.13336248509585857
device: cuda, dtype: torch.float32, 10000 times         0.1345295710489154
device: cuda, dtype: torch.float64, 10000 times         0.14084953162819147
__ilshift__ (a.numel() == 10) for 100000 times
device: cpu, dtype: torch.int8, 100000 times            0.19080814253538847
device: cpu, dtype: torch.uint8, 100000 times           0.18541878275573254
device: cpu, dtype: torch.int16, 100000 times           0.19136024825274944
device: cpu, dtype: torch.int32, 100000 times           0.1916898973286152
device: cpu, dtype: torch.int64, 100000 times           0.1973192635923624
device: cpu, dtype: torch.float32, 100000 times         0.2668355852365494
device: cpu, dtype: torch.float64, 100000 times         0.24472137168049812
device: cuda, dtype: torch.int8, 100000 times           1.3581306440755725
device: cuda, dtype: torch.uint8, 100000 times          1.3522163443267345
device: cuda, dtype: torch.int16, 100000 times          1.366145665757358
device: cuda, dtype: torch.int32, 100000 times          1.3674909211695194
device: cuda, dtype: torch.int64, 100000 times          1.3734915973618627
device: cuda, dtype: torch.float32, 100000 times                1.3831533305346966
device: cuda, dtype: torch.float64, 100000 times                1.396162535995245
__ilshift__ (a.numel() == 1000) for 10000 times
device: cpu, dtype: torch.int8, 10000 times             0.02847585454583168
device: cpu, dtype: torch.uint8, 10000 times            0.02960751298815012
device: cpu, dtype: torch.int16, 10000 times            0.028516249731183052
device: cpu, dtype: torch.int32, 10000 times            0.02842544950544834
device: cpu, dtype: torch.int64, 10000 times            0.029186096973717213
device: cpu, dtype: torch.float32, 10000 times          0.0999628696590662
device: cpu, dtype: torch.float64, 10000 times          0.16676222812384367
device: cuda, dtype: torch.int8, 10000 times            0.13856443110853434
device: cuda, dtype: torch.uint8, 10000 times           0.13766566663980484
device: cuda, dtype: torch.int16, 10000 times           0.13652489613741636
device: cuda, dtype: torch.int32, 10000 times           0.13678150344640017
device: cuda, dtype: torch.int64, 10000 times           0.13749946560710669
device: cuda, dtype: torch.float32, 10000 times         0.13879029918462038
device: cuda, dtype: torch.float64, 10000 times         0.14587809145450592
```

Fixes https://github.com/pytorch/pytorch/issues/24510, #24514, https://github.com/pytorch/pytorch/issues/24657, and https://github.com/pytorch/pytorch/issues/24661.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31566

Differential Revision: D19314251

Pulled By: ezyang

fbshipit-source-id: 52df17b2c18ef1880374c6dbcf18fb1118086552
2020-01-09 09:41:36 -08:00
Richard Zou
5c423cae72 Add precision tests for CUDA half linspace+logspace (#31962)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31962

I added precision tests for CUDA half, float, and double.

The precision for CUDA half seems bad, but I checked the numbers against
previous versions of PyTorch: the output of CUDA half linspace+logspace
is exactly the same as in 1.2.0.
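A minimal sketch of this kind of check (the reference dtype and tolerance handling are illustrative, not the exact test code):
```python
import torch

if torch.cuda.is_available():
    ref = torch.linspace(0, 10, steps=100, device="cuda", dtype=torch.double)
    half = torch.linspace(0, 10, steps=100, device="cuda", dtype=torch.half)
    # Compare the half output against a double reference; the test asserts
    # closeness within a dtype-dependent tolerance.
    print((half.double() - ref).abs().max())
```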

Test Plan: - Run CI

Differential Revision: D19320182

Pulled By: zou3519

fbshipit-source-id: 38d3d4dea2807875ed0b0ec2b93b19c10a289988
2020-01-09 07:35:52 -08:00
Iurii Zdebskyi
5d5f156558 Revert D18903453: Quantized H Tangent function
Test Plan: revert-hammer

Differential Revision:
D18903453

Original commit changeset: 0050b1cebb1d

fbshipit-source-id: 205978f71d5688d4068861f7cf2dff40fbb311c6
2020-01-09 07:30:49 -08:00
Edward Yang
ddff4efa26 Don't use RTLD_GLOBAL to load _C. (#31162)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31162

This should help us resolve a multitude of weird segfaults and crashes
when PyTorch is imported along with other packages. Those would often
happen because libtorch symbols were exposed globally and could be used
as a source of relocations in shared libraries loaded after libtorch.

Fixes #3059.

Some of the subtleties in preparing this patch:

* Getting ASAN to play ball was a pain in the ass. The basic problem is that when we load with `RTLD_LOCAL`, we may now load a library multiple times into the address space; this happens when we have custom C++ extensions. Since the libraries are usually identical, this is usually benign, but it is technically undefined behavior and UBSAN hates it. I sprayed a few ways of getting things to "work" correctly: I preload libstdc++ (so that it is seen consistently over all library loads) and turned off vptr checks entirely. Another possibility is that we should have a mode where we use RTLD_GLOBAL to load _C, which would be acceptable in environments where you're sure C++ lines up correctly. There's a long comment in the test script going into more detail about this.
* Making some of our shared library dependencies load with `RTLD_LOCAL` breaks them. OpenMPI and MKL don't work; they play linker shenanigans to look up their symbols, which doesn't work when loaded locally, and if we load a library with `RTLD_LOCAL` we aren't able to subsequently see it with `ctypes`. To solve this problem, we employ a clever device invented by apaszke: we create a dummy library `torch_global_deps` with dependencies on all of the libraries which need to be loaded globally, and then load that with `RTLD_GLOBAL`. As long as none of these libraries have C++ symbols, we can avoid confusion about the C++ standard library. A rough sketch of this loading pattern follows below.
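A rough `ctypes` sketch of that loading pattern (library file names are illustrative; in the real build the Python import machinery loads `_C`):
```python
import ctypes

# Dummy library that links against everything that must stay globally visible
# (MKL, OpenMPI, ...); loading it with RTLD_GLOBAL exposes their symbols.
ctypes.CDLL("libtorch_global_deps.so", mode=ctypes.RTLD_GLOBAL)

# The extension module itself is loaded with RTLD_LOCAL, so libtorch's C++
# symbols are no longer a relocation source for later dlopen() calls.
ctypes.CDLL("_C.so", mode=ctypes.RTLD_LOCAL)
```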

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Differential Revision: D19262579

Test Plan: Imported from OSS

Pulled By: ezyang

fbshipit-source-id: 06a48a5d2c9036aacd535f7e8a4de0e8fe1639f2
2020-01-09 07:28:15 -08:00
Edward Yang
8614860210 Uniformly apply Windows logic in cpp_extensions everywhere (#31161)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31161

Previously, it wasn't necessary to specify `DT_NEEDED` in C++ extensions on Linux (aka pass `-l` flags) because all of the symbols would have already been loaded with `RTLD_GLOBAL`, so there wouldn't be any undefined symbols. But when we switch to loading `_C` with `RTLD_LOCAL`, it's now necessary for all the C++ extensions to know what libraries to link with. The resulting code is clearer and more uniform, so it's a win all around.
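A hedged sketch of what the explicit linking looks like from the extension-building side (file and library names are illustrative):
```python
from setuptools import setup
from torch.utils.cpp_extension import BuildExtension, CppExtension

setup(
    name="my_ext",
    ext_modules=[
        CppExtension(
            "my_ext",
            ["my_ext.cpp"],
            # With _C loaded RTLD_LOCAL, the extension must carry explicit
            # DT_NEEDED entries (-l flags) for the torch libraries it uses.
            libraries=["c10", "torch"],
        )
    ],
    cmdclass={"build_ext": BuildExtension},
)
```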

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D19262578

Pulled By: ezyang

fbshipit-source-id: a893cc96f2e9aad1c064a6de4f7ccf79257dec3f
2020-01-09 07:28:11 -08:00
Negin Raoof
0dbd5c0bfe Added torchvision tests as part of ORT tests (#31835)
Summary:
Added torchvision tests as part of ORT tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31835

Reviewed By: hl475

Differential Revision: D19278607

Pulled By: houseroad

fbshipit-source-id: 18a6a85ce3019bcc9aee9517af1378964b585afd
2020-01-08 21:04:29 -08:00
Supriya Rao
6d9a9e379d Fix segfault in caffe2 slice test (#31801)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31801

Tries to fix issue #30764.

Test Plan:
python test/onnx/test_utility_funs.py TestUtilityFuns

Imported from OSS

Differential Revision: D19315046

fbshipit-source-id: de3595969280e4ebe762cb098ff0891f8b5a9a90
2020-01-08 17:13:29 -08:00
Hector Yuen
9e9ca6ec37 add conversion functions to embedding tables (#31083)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31083

Add (fp32/fp16) <-> (int8 rowwise quantized with fp32/fp16 scale and bias) conversion functions.
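A rough sketch of the rowwise-quantized representation involved (one scale/bias pair per row; the layout here is illustrative, not the exact fused on-disk format):
```python
import torch

def rowwise_quantize(x):
    # Per-row scale and bias; payload stored as uint8.
    mins = x.min(dim=1, keepdim=True).values
    maxs = x.max(dim=1, keepdim=True).values
    scales = (maxs - mins) / 255.0
    q = torch.round((x - mins) / scales).to(torch.uint8)
    return q, scales, mins

def rowwise_dequantize(q, scales, mins):
    return q.float() * scales + mins

x = torch.randn(4, 16)
q, scales, mins = rowwise_quantize(x)
print((rowwise_dequantize(q, scales, mins) - x).abs().max())  # small error
```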

Test Plan:
added unit tests
enhanced shape inference tests

Reviewed By: jspark1105

Differential Revision: D18920547

fbshipit-source-id: 6b3d7cb93f9d1669ecf511817d73976177632891
2020-01-08 16:56:12 -08:00
jjsjann123
eb23171bce TensorIterator norm update (#31903)
Summary:
Special-case `norm` out where p == 2: instead of calling `pow`,
we use multiplication as a faster code path.
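The identity being exploited, as a small sketch:
```python
import torch

x = torch.randn(1000)

# For p == 2, pow(2) can be replaced by a plain multiply before the reduction.
norm_pow = x.abs().pow(2).sum().sqrt()
norm_mul = (x * x).sum().sqrt()
assert torch.allclose(norm_pow, norm_mul)
assert torch.allclose(norm_mul, torch.norm(x, p=2))
```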
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31903

Differential Revision: D19312749

Pulled By: ngimel

fbshipit-source-id: 73732b7b37a243a14438609784795b920271a0b5
2020-01-08 16:50:42 -08:00
Elias Ellison
8ecd3f783d check for object equality in constant pooling (#31800)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31800

If we know that two constants are the same object, we can ignore other constraints and pool them together. This fixes an issue introduced by the other PR where quantization relied on constant pooling happening for correctness.

Test Plan: Imported from OSS

Differential Revision: D19269499

Pulled By: eellison

fbshipit-source-id: 9d4396125aa6899cb081863d463d4f024135cbf4
2020-01-08 16:47:07 -08:00
Elias Ellison
319cc21108 Add AliasDb API For Changing Aliasing (#31501)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31501

We have a number of places in our code base where we should be checking if it's safe to change the alias relationship between two sets of values. This PR adds an api to Alias Db to consolidate the logic, and refactors Constant Pooling and `CSE` to use the new api. Next steps: add api usage in peephole.cpp where applicable.

Happy to bikeshed `AliasDb::safeToChangeAliasingRelationship`. Previously I suggested `AliasDb::safeToIntroduceAliasing`, however that's not quite accurate, because this API also handles when it is unsafe to remove aliasing.

Alternate suggestions: `safeToChangeAliasing`, `validToChangeAliasing`, `validToChangeAliasingRelationship`

Related:  https://github.com/pytorch/pytorch/issues/28360

Test Plan: Imported from OSS

Differential Revision: D19254413

Pulled By: eellison

fbshipit-source-id: 17f7f52ad2d1526d303132767cbbb32f8189ae15
2020-01-08 16:47:03 -08:00
davidriazati
5cc49ed45f Document IValue (#31904)
Summary:
This is a first-pass attempt at documenting `IValue` to help with problems like in #17165. Most users are probably concerned with
 * how to make an `IValue` that matches the input type to their graph (most of the constructors are pretty self-explanatory, so as long as they are in the docs I think it's enough)
 * how to extract the results after running their graph (there is a small note on the behavior of `.toX()` based on confusion we've had in the past)

Preview:
https://driazati.github.io/pytorch_doc_previews/31904/api/structc10_1_1_i_value.html#exhale-struct-structc10-1-1-i-value

There are also some random CSS fixes to clean up the style.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31904

Pulled By: driazati

Differential Revision: D19318733

fbshipit-source-id: b29dae3349d5a7ea5a3b8e09cd23f7ff8434edb4
2020-01-08 16:08:35 -08:00
davidriazati
883fb5434a Use real argument names for Python functions (#29300)
Summary:
This hooks up `inspect` so that Python functions get their parameter
names attached instead of naming them `0, 1, 2, ...`. This also fixes
issue #28537 where `ignore` functions were improperly typing `self`.
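The mechanism in a nutshell (toy function, not the TorchScript internals): `inspect` exposes the declared parameter names, which can then be attached to the graph inputs.
```python
import inspect

def scale(tensor, factor=2.0):
    return tensor * factor

print(list(inspect.signature(scale).parameters))  # ['tensor', 'factor']
```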
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29300

Pulled By: driazati

Differential Revision: D19256434

fbshipit-source-id: 6a1fe7bd0afab708b8439517798955d0abfeb44c
2020-01-08 15:41:28 -08:00
davidriazati
09a22f3301 Remove C++ docs contributing page (#31908)
Summary:
Stacked PRs
 * **#31908 - Remove C++ docs contributing page**
 * #31905 - Add doc previewing instructions

We should have one source of truth for contribution instructions (CONTRIBUTING.md).
This PR moves the instructions from the C++ docs page there instead of keeping a
separate page.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31908

Pulled By: driazati

Differential Revision: D19296366

fbshipit-source-id: c1daf004259342bd09e09dea3b80e34db47066ec
2020-01-08 15:37:35 -08:00
davidriazati
8c59d48281 Add doc previewing instructions (#31905)
Summary:
Stacked PRs
 * #31908 - Remove C++ docs contributing page
 * **#31905 - Add doc previewing instructions**

This adds some instructions on how to get started with GitHub Pages so you can show reviewers your documentation changes. Hopefully we can delete this eventually and build docs automatically on relevant PRs in CI.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31905

Pulled By: driazati

Differential Revision: D19296364

fbshipit-source-id: df47fa1a8d7be029c3efcf6521298583ad9f7a95
2020-01-08 15:37:31 -08:00
xiaobing.zhang
dedd16b418 remove THConv code which never be used (#31879)
Summary:
Just remove dead code in TH.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31879

Differential Revision: D19315818

Pulled By: ezyang

fbshipit-source-id: dbeb2475e19e9ebf769df2649cc859c08d3d184d
2020-01-08 15:14:27 -08:00
xiaobing.zhang
9a3cb1e859 Move cauchy to Aten(CPU) (#31824)
Summary:
Fix https://github.com/pytorch/pytorch/issues/24684.
Benchmark script:
```
import torch
import torch.nn as nn
import time

torch.manual_seed(0)

def _time():
    return time.time()

device = "cpu"

#warm up
for n in [10, 100, 1000]:
    input = torch.randn(128, n, requires_grad=False, device=device)
    for i in range(1000):
        input.cauchy_()

for n in [1, 10, 100, 1000]:
    fwd_t = 0
    input = torch.randn(128, n, requires_grad=False, device=device)
    for i in range(10000):
        t1 = _time()
        input.cauchy_()
        t2 = _time()
        fwd_t = fwd_t + (t2 -t1)
    fwd_avg = fwd_t / 10000 * 1000
    print("input size(128, %d) forward time is %.4f (ms)." % (n, fwd_avg))
```
Test device: **skx-8180**.
Before:
```
input size(128, 1) forward time is 0.0071 (ms).
input size(128, 10) forward time is 0.0596 (ms).
input size(128, 100) forward time is 0.5798 (ms).
input size(128, 1000) forward time is 5.8395 (ms).
```
After:
```
input size(128, 1) forward time is 0.0070 (ms).
input size(128, 10) forward time is 0.0583 (ms).
input size(128, 100) forward time is 0.5714 (ms).
input size(128, 1000) forward time is 5.7674 (ms).
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31824

Differential Revision: D19314411

Pulled By: ezyang

fbshipit-source-id: 58098546face3e5971b023f702cfe44ff1cccfbc
2020-01-08 15:10:53 -08:00
xiaobing.zhang
9ba6a768de Add op bitwise_or (#31559)
Summary:
ezyang, this PR adds the bitwise_or operator, following https://github.com/pytorch/pytorch/pull/31104.
Benchmark script:
```
import timeit
import torch
torch.manual_seed(1)

for n, t in [(10, 100000),(1000, 10000)]:
    print('__or__ (a.numel() == {}) for {} times'.format(n, t))
    for device in ('cpu', 'cuda'):
        for dtype in ('torch.int8', 'torch.uint8', 'torch.int16', 'torch.int32', 'torch.int64'):
            print(f'device: {device}, dtype: {dtype}, {t} times', end='\t\t')
            print(timeit.timeit(f'a | b\nif "{device}" == "cuda": torch.cuda.synchronize()', setup=f'import torch; a = torch.randint(0, 10, ({n},), dtype = {dtype}, device="{device}"); b = torch.randint(0, 10, ({n},), dtype = {dtype}, device="{device}")', number=t))

for n, t in [(10, 100000),(1000, 10000)]:
    print('__ior__ (a.numel() == {}) for {} times'.format(n, t))
    for device in ('cpu', 'cuda'):
        for dtype in ('torch.int8', 'torch.uint8', 'torch.int16', 'torch.int32', 'torch.int64'):
            print(f'device: {device}, dtype: {dtype}, {t} times', end='\t\t')
            print(timeit.timeit(f'a | b\nif "{device}" == "cuda": torch.cuda.synchronize()', setup=f'import torch; a = torch.randint(0, 10, ({n},), dtype = {dtype}, device="{device}"); b = torch.tensor(5, dtype = {dtype}, device="{device}")', number=t))
```
Device: **Tesla P100, skx-8180**
Cuda verison: **9.0.176**

Before:
```
__or__ (a.numel() == 10) for 100000 times
device: cpu, dtype: torch.int8, 100000 times            0.17616272252053022
device: cpu, dtype: torch.uint8, 100000 times           0.17148233391344547
device: cpu, dtype: torch.int16, 100000 times           0.17616403382271528
device: cpu, dtype: torch.int32, 100000 times           0.17717823758721352
device: cpu, dtype: torch.int64, 100000 times           0.1801931718364358
device: cuda, dtype: torch.int8, 100000 times           1.270583058707416
device: cuda, dtype: torch.uint8, 100000 times          1.2636413089931011
device: cuda, dtype: torch.int16, 100000 times          1.2839747751131654
device: cuda, dtype: torch.int32, 100000 times          1.2548385225236416
device: cuda, dtype: torch.int64, 100000 times          1.2650810535997152
__or__ (a.numel() == 1000) for 10000 times
device: cpu, dtype: torch.int8, 10000 times             0.031136621721088886
device: cpu, dtype: torch.uint8, 10000 times            0.030786747112870216
device: cpu, dtype: torch.int16, 10000 times            0.02391665056347847
device: cpu, dtype: torch.int32, 10000 times            0.024147341027855873
device: cpu, dtype: torch.int64, 10000 times            0.024414129555225372
device: cuda, dtype: torch.int8, 10000 times            0.12741921469569206
device: cuda, dtype: torch.uint8, 10000 times           0.1249831635504961
device: cuda, dtype: torch.int16, 10000 times           0.1283819805830717
device: cuda, dtype: torch.int32, 10000 times           0.12591975275427103
device: cuda, dtype: torch.int64, 10000 times           0.12655890546739101
__ior__ (a.numel() == 10) for 100000 times
device: cpu, dtype: torch.int8, 100000 times            0.3908365070819855
device: cpu, dtype: torch.uint8, 100000 times           0.38267823681235313
device: cpu, dtype: torch.int16, 100000 times           0.38239253498613834
device: cpu, dtype: torch.int32, 100000 times           0.3817988149821758
device: cpu, dtype: torch.int64, 100000 times           0.3901665909215808
device: cuda, dtype: torch.int8, 100000 times           1.4211318120360374
device: cuda, dtype: torch.uint8, 100000 times          1.4215159295126796
device: cuda, dtype: torch.int16, 100000 times          1.4307750314474106
device: cuda, dtype: torch.int32, 100000 times          1.4123614141717553
device: cuda, dtype: torch.int64, 100000 times          1.4480243818834424
__ior__ (a.numel() == 1000) for 10000 times
device: cpu, dtype: torch.int8, 10000 times             0.06468924414366484
device: cpu, dtype: torch.uint8, 10000 times            0.06442475505173206
device: cpu, dtype: torch.int16, 10000 times            0.05267547257244587
device: cpu, dtype: torch.int32, 10000 times            0.05286940559744835
device: cpu, dtype: torch.int64, 10000 times            0.06211103219538927
device: cuda, dtype: torch.int8, 10000 times            0.15332304500043392
device: cuda, dtype: torch.uint8, 10000 times           0.15353196952492
device: cuda, dtype: torch.int16, 10000 times           0.15300503931939602
device: cuda, dtype: torch.int32, 10000 times           0.15274472255259752
device: cuda, dtype: torch.int64, 10000 times           0.1512152962386608
```
After:
```
__or__ (a.numel() == 10) for 100000 times
device: cpu, dtype: torch.int8, 100000 times            0.2465507509186864
device: cpu, dtype: torch.uint8, 100000 times           0.2472386620938778
device: cpu, dtype: torch.int16, 100000 times           0.2469814233481884
device: cpu, dtype: torch.int32, 100000 times           0.2535214088857174
device: cpu, dtype: torch.int64, 100000 times           0.24855613708496094
device: cuda, dtype: torch.int8, 100000 times           1.4351346511393785
device: cuda, dtype: torch.uint8, 100000 times          1.4434308474883437
device: cuda, dtype: torch.int16, 100000 times          1.4520929995924234
device: cuda, dtype: torch.int32, 100000 times          1.4456610176712275
device: cuda, dtype: torch.int64, 100000 times          1.4580101007595658
__or__ (a.numel() == 1000) for 10000 times
device: cpu, dtype: torch.int8, 10000 times             0.029985425993800163
device: cpu, dtype: torch.uint8, 10000 times            0.03024935908615589
device: cpu, dtype: torch.int16, 10000 times            0.026356655173003674
device: cpu, dtype: torch.int32, 10000 times            0.027377349324524403
device: cpu, dtype: torch.int64, 10000 times            0.029163731262087822
device: cuda, dtype: torch.int8, 10000 times            0.14540370367467403
device: cuda, dtype: torch.uint8, 10000 times           0.1456305105239153
device: cuda, dtype: torch.int16, 10000 times           0.1450125053524971
device: cuda, dtype: torch.int32, 10000 times           0.1472016740590334
device: cuda, dtype: torch.int64, 10000 times           0.14709716010838747
__ior__ (a.numel() == 10) for 100000 times
device: cpu, dtype: torch.int8, 100000 times            0.27195510920137167
device: cpu, dtype: torch.uint8, 100000 times           0.2692424338310957
device: cpu, dtype: torch.int16, 100000 times           0.27726674638688564
device: cpu, dtype: torch.int32, 100000 times           0.2815811652690172
device: cpu, dtype: torch.int64, 100000 times           0.2852728571742773
device: cuda, dtype: torch.int8, 100000 times           1.4743850827217102
device: cuda, dtype: torch.uint8, 100000 times          1.4766502184793353
device: cuda, dtype: torch.int16, 100000 times          1.4774163831025362
device: cuda, dtype: torch.int32, 100000 times          1.4749693805351853
device: cuda, dtype: torch.int64, 100000 times          1.5772947426885366
__ior__ (a.numel() == 1000) for 10000 times
device: cpu, dtype: torch.int8, 10000 times             0.03614502027630806
device: cpu, dtype: torch.uint8, 10000 times            0.03619729354977608
device: cpu, dtype: torch.int16, 10000 times            0.0319912089034915
device: cpu, dtype: torch.int32, 10000 times            0.03319283854216337
device: cpu, dtype: torch.int64, 10000 times            0.0343862259760499
device: cuda, dtype: torch.int8, 10000 times            0.1581476852297783
device: cuda, dtype: torch.uint8, 10000 times           0.15974601730704308
device: cuda, dtype: torch.int16, 10000 times           0.15957212820649147
device: cuda, dtype: torch.int32, 10000 times           0.16002820804715157
device: cuda, dtype: torch.int64, 10000 times           0.16129320487380028
```

Fix  https://github.com/pytorch/pytorch/issues/24511, https://github.com/pytorch/pytorch/issues/24515, https://github.com/pytorch/pytorch/issues/24658, https://github.com/pytorch/pytorch/issues/24662.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31559

Differential Revision: D19315875

Pulled By: ezyang

fbshipit-source-id: 4a3ca88fdafbeb796079687e676228111eb44aad
2020-01-08 15:06:30 -08:00
xiaobing.zhang
4f9d2f74e2 Port softplus activation to Aten(CPU+CUDA) (#30504)
Summary:
VitalyFedyunin, this PR ports the Softplus activation to ATen.
**Test script:**
```
import torch
import torch.nn as nn
import time

torch.manual_seed(0)
def _time():
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    return time.time()

device = "cpu"
m = nn.Softplus()
if torch.cuda.is_available():
    device = "cuda"
    m = m.cuda()

#warm up
for n in [100, 10000]:
    input = torch.randn(128, n, requires_grad=True, device=device)
    grad_output = torch.ones(128, n, device=device)
    for i in range(1000):
        output = m(input)
        output.backward(grad_output)

for n in [100, 10000]:
    input = torch.randn(128, n, requires_grad=True, device=device)
    grad_output = torch.ones(128, n, device=device)
    fwd_t = 0
    bwd_t = 0
    for i in range(10000):
        t1 = _time()
        output = m(input)
        t2 = _time()
        output.backward(grad_output)
        t3 = _time()
        fwd_t = fwd_t + (t2 -t1)
        bwd_t = bwd_t + (t3 - t2)
    fwd_avg = fwd_t / 10000 * 1000
    bwd_avg = bwd_t / 10000 * 1000
    print("input size(128, %d) forward time is %.2f (ms); backwad avg time is %.2f (ms)."
          % (n, fwd_avg, bwd_avg))
```
Test Device: CPU: skx-8180, GPU: Tesla P40.
Performance:
Before:
```
GPU:
input size(128, 100) forward time is 0.06 (ms); backward avg time is 0.12 (ms).
input size(128, 10000) forward time is 0.06 (ms); backward avg time is 0.18 (ms).
CPU:
input size(128, 100) forward time is 1.16 (ms); backward avg time is 0.69 (ms).
input size(128, 10000) forward time is 60.19 (ms); backward avg time is 31.86 (ms).
```
After:
```
GPU:
input size(128, 100) forward time is 0.05 (ms); backward avg time is 0.11 (ms).
input size(128, 10000) forward time is 0.06 (ms); backward avg time is 0.17 (ms).
CPU:
input size(128, 100) forward time is 0.43 (ms); backward avg time is 0.16 (ms).
input size(128, 10000) forward time is 1.65 (ms); backward avg time is 0.83 (ms).
```
`OMP_NUM_THREADS=1:`
```
Before:
input size(128, 100) forward time is 0.53 (ms); backward avg time is 0.28 (ms).
input size(128, 10000) forward time is 51.33 (ms); backward avg time is 25.48 (ms).
After:
input size(128, 100) forward time is 0.44 (ms); backward avg time is 0.16 (ms).
input size(128, 10000) forward time is 42.05 (ms); backward avg time is 13.97 (ms).
```

Fix https://github.com/pytorch/pytorch/issues/24633, https://github.com/pytorch/pytorch/issues/24634, https://github.com/pytorch/pytorch/issues/24766, https://github.com/pytorch/pytorch/issues/24767.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30504

Differential Revision: D19274913

Pulled By: ezyang

fbshipit-source-id: 21b29e8459dcba5a040cc68333887b45a858328e
2020-01-08 15:03:53 -08:00
Yinghai Lu
d2fdf140af Combine all the user inputs together and convert them to fp16 (#31898)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31898

Att

Reviewed By: tracelogfb

Differential Revision: D19291357

fbshipit-source-id: 747ed5234ca042ceeaff2d094701ead7597ac3ee
2020-01-08 14:36:42 -08:00
Yinghai Lu
8b4feff01d Use simd version for fp16 conversions (#31897)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31897

The previous version only used AVX2. The _simd version uses AVX-512 if the CPU supports it.

Test Plan: Unitttest

Reviewed By: tracelogfb

Differential Revision: D19291499

fbshipit-source-id: 3b1ee0ba756e5c9defbd5caf7f68982d9b2ca06c
2020-01-08 14:36:38 -08:00
Alban Desmaison
1314f7f4f4 Ensure the original grad_mode is restored during backward (#31884)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31884

Fix #31715
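A minimal sketch of the invariant being restored (the actual failure in #31715 is more subtle than this):
```python
import torch

x = torch.randn(3, requires_grad=True)
y = (x * x).sum()

with torch.no_grad():
    y.backward()
    # The engine toggles grad mode internally while running the graph, but the
    # mode that was active when backward() was called must be restored.
    assert torch.is_grad_enabled() is False
```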

Test Plan: Imported from OSS

Differential Revision: D19301076

Pulled By: albanD

fbshipit-source-id: 2d20c01bfb6364fa96c8fe5aa5ce7ea39defa3ce
2020-01-08 14:16:51 -08:00
Alban Desmaison
c299cb05ef temporary fix for jit test backward compatibility issues
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/31949

Test Plan: Imported from OSS

Differential Revision: D19314763

Pulled By: albanD

fbshipit-source-id: b5eff0ed53a371d260596ca85d914c8bddb0a8aa
2020-01-08 13:32:08 -08:00
Mingbo Wan
462bfc7fe7 docker hub image info (#31923)
Summary:
result: http://docker.pytorch.org/docker_hub.html
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31923

Differential Revision: D19316770

Pulled By: mingbowan

fbshipit-source-id: 57f34d8983d26772bb0d310fa0a4085674c860e5
2020-01-08 13:20:06 -08:00
Edward Yang
5dfcfeebb8 Revert D19298735: Emit warning from deprecated torch function signatures
Test Plan: revert-hammer

Differential Revision:
D19298735

Original commit changeset: 03cb78af1765

fbshipit-source-id: 304a6d4412f53a8fc822d36897c96815432e0f70
2020-01-08 13:04:41 -08:00
Zafar Takhirov
620060cb0c Quantized H Tangent function (#31031)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31031

This activation will be needed for the LSTM implementation.
Also includes the QNNPack implementation.

Test Plan: Imported from OSS

Differential Revision: D18903453

Pulled By: z-a-f

fbshipit-source-id: 0050b1cebb1ddb179b7ecbcb114fe70705070f67
2020-01-08 12:59:39 -08:00
Peter Bell
54777b1e73 Avoid reference invalidation in cuda SpectralOps' plan_caches (#31861)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/31412

The root cause is `plan_caches` being resized in one thread while another holds a reference to an existing `CuFFTParamsLRUCache`, which then becomes invalidated.

Without this fix applied I was able to reproduce the crash very reliably, and with it I no longer see it. Being a race condition, it's hard to say for sure, though.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31861

Differential Revision: D19312314

Pulled By: ezyang

fbshipit-source-id: 06e4561128d503f2d70cdfe1982be0f3db2a8cf8
2020-01-08 11:50:05 -08:00
Shen Li
7f723cbd8a Revert D19290954: Implement backend-agnostic rpc._wait_all_workers() utility
Test Plan: revert-hammer

Differential Revision:
D19290954

Original commit changeset: cdb22203c2f2

fbshipit-source-id: 2ae194a06a645e4f48879271eccf0588b0956cd3
2020-01-08 10:25:51 -08:00
Xiang Gao
c66ca74f03 Add device debug info to CUDA build (#31929)
Summary:
Also print NVCC flags in the summary
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31929

Differential Revision: D19312079

Pulled By: ezyang

fbshipit-source-id: cd20d5a385f61174c1907a9ad883c04de66ef037
2020-01-08 09:56:20 -08:00
Sebastian Messmer
f0072b3af5 Remove C++11 compatibility from c10::optional (#30919)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30919

deletecode
ghstack-source-id: 96383227

Test Plan: waitforsandcastle

Differential Revision: D18869641

fbshipit-source-id: c08345d17a291cea3749af20473b6acddc78ab27
2020-01-08 09:19:59 -08:00
Sebastian Messmer
f67851d69a Fix c10::util::get_fully_qualified_type_name for MSVC (#31313)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31313

This is a bugfix. The reason we couldn't enable the constexpr-ness for it before is that it was buggy,
and without constexpr it crashed at runtime rather than at compile time, which unfortunately slipped past our CI...
ghstack-source-id: 96380160

Test Plan: Now it works even when enabling constexpr for it

Differential Revision: D19087471

fbshipit-source-id: 28be107389f4507d35d08eab4b089a405690529b
2020-01-08 09:11:10 -08:00
Sebastian Messmer
2a294aace6 Remove memory ordering from LeftRight (#31026)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31026

This is error-prone and probably wrong. Since we don't use LeftRight on the hot path anymore, let's remove this.
ghstack-source-id: 96369644

Test Plan: none

Differential Revision: D18902165

fbshipit-source-id: 7b9478cd7cc071f403d75da20c7c889c27248b5c
2020-01-08 08:59:30 -08:00
James Donald
84dfa96f62 Fix -Wundef warning in conversions.h
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/31911

Test Plan:
* CI builds including GPU and OSS-build tests
* The `defined(__HIP_DEVICE_COMPILE__)` instance a few lines below is proof that this is a define/undef flag, not a define01 flag

Reviewed By: hlu1

Differential Revision: D19296560

fbshipit-source-id: 1c45069aec534b0bf4a87751a74680675c985e06
2020-01-08 08:39:37 -08:00
Alban Desmaison
ee817012b2 Add more tests to the autograd wrt view and inplace (#31147)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31147

The goal here is to add more tests of the current autograd behavior around views and in-place ops, to make sure no regressions are introduced when modifying it.
Do let me know if you think of other corner cases I missed.
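One example of the kind of corner case exercised (illustrative, not necessarily one of the added tests): an in-place op on a view of a non-leaf tensor, with gradients checked through the rebased history.
```python
import torch

base = torch.randn(4, requires_grad=True)
work = base.clone()
view = work[:2]      # view of a non-leaf tensor
view.mul_(2)         # in-place op on the view rebases its autograd history

work.sum().backward()
# The viewed elements pick up the factor of 2, the rest keep gradient 1.
assert torch.equal(base.grad, torch.tensor([2.0, 2.0, 1.0, 1.0]))
```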

Test Plan: Imported from OSS

Differential Revision: D19301082

Pulled By: albanD

fbshipit-source-id: 2cb07dcf99e56eb1f2c56a179796f2e6042d5a2d
2020-01-08 07:14:52 -08:00