Commit Graph

999 Commits

xiaobing.zhang
b47e9b97a2 Add op bitwise_and (#31104)
Summary:
Following https://github.com/pytorch/pytorch/pull/25665, this adds the `bitwise_and` operator.
Benchmark script:
```
import timeit
#for __and__
for n, t in [(10, 100000),(1000, 10000)]:
    print('__and__ (a.numel() == {}) for {} times'.format(n, t))
    for device in ('cpu', 'cuda'):
        for dtype in ('torch.int8', 'torch.uint8', 'torch.int16', 'torch.int32', 'torch.int64'):
            print(f'device: {device}, dtype: {dtype}, {t} times', end='\t\t')
            print(timeit.timeit(f'a & b\nif "{device}" == "cuda": torch.cuda.synchronize()', setup=f'import torch; a = torch.randint(0, 10, ({n},), dtype = {dtype}, device="{device}"); b = torch.randint(0, 10, ({n},), dtype = {dtype}, device="{device}")', number=t))
#for __iand__
for n, t in [(10, 100000),(1000, 10000)]:
    print('__iand__ (a.numel() == {}) for {} times'.format(n, t))
    for device in ('cpu', 'cuda'):
        for dtype in ('torch.int8', 'torch.uint8', 'torch.int16', 'torch.int32', 'torch.int64'):
            print(f'device: {device}, dtype: {dtype}, {t} times', end='\t\t')
            print(timeit.timeit(f'a &= b\nif "{device}" == "cuda": torch.cuda.synchronize()', setup=f'import torch; a = torch.randint(0, 10, ({n},), dtype = {dtype}, device="{device}"); b = torch.tensor(5, dtype = {dtype}, device="{device}")', number=t))
```
Device: **Tesla P100, skx-8180**
CUDA version: **9.0.176**

Before:
```
__and__ (a.numel() == 10) for 100000 times
device: cpu, dtype: torch.int8, 100000 times            0.1766007635742426
device: cpu, dtype: torch.uint8, 100000 times           0.17322628945112228
device: cpu, dtype: torch.int16, 100000 times           0.17650844901800156
device: cpu, dtype: torch.int32, 100000 times           0.17711848113685846
device: cpu, dtype: torch.int64, 100000 times           0.18240160401910543
device: cuda, dtype: torch.int8, 100000 times           1.273967768996954
device: cuda, dtype: torch.uint8, 100000 times          1.2778537990525365
device: cuda, dtype: torch.int16, 100000 times          1.2753686187788844
device: cuda, dtype: torch.int32, 100000 times          1.2797665279358625
device: cuda, dtype: torch.int64, 100000 times          1.2933144550770521
__and__ (a.numel() == 1000) for 10000 times
device: cpu, dtype: torch.int8, 10000 times             0.031139614060521126
device: cpu, dtype: torch.uint8, 10000 times            0.03091452084481716
device: cpu, dtype: torch.int16, 10000 times            0.022756479680538177
device: cpu, dtype: torch.int32, 10000 times            0.025045674294233322
device: cpu, dtype: torch.int64, 10000 times            0.024164282716810703
device: cuda, dtype: torch.int8, 10000 times            0.12820732593536377
device: cuda, dtype: torch.uint8, 10000 times           0.12775669433176517
device: cuda, dtype: torch.int16, 10000 times           0.12697868794202805
device: cuda, dtype: torch.int32, 10000 times           0.12832533661276102
device: cuda, dtype: torch.int64, 10000 times           0.1280576130375266
__iand__ (a.numel() == 10) for 100000 times
device: cpu, dtype: torch.int8, 100000 times            0.3687064303085208
device: cpu, dtype: torch.uint8, 100000 times           0.36253443732857704
device: cpu, dtype: torch.int16, 100000 times           0.362891579978168
device: cpu, dtype: torch.int32, 100000 times           0.37680106051266193
device: cpu, dtype: torch.int64, 100000 times           0.3689364707097411
device: cuda, dtype: torch.int8, 100000 times           1.419940729625523
device: cuda, dtype: torch.uint8, 100000 times          1.4247053815051913
device: cuda, dtype: torch.int16, 100000 times          1.4191444097086787
device: cuda, dtype: torch.int32, 100000 times          1.4305962566286325
device: cuda, dtype: torch.int64, 100000 times          1.4567416654899716
__iand__ (a.numel() == 1000) for 10000 times
device: cpu, dtype: torch.int8, 10000 times             0.06224383972585201
device: cpu, dtype: torch.uint8, 10000 times            0.06205617543309927
device: cpu, dtype: torch.int16, 10000 times            0.05016433447599411
device: cpu, dtype: torch.int32, 10000 times            0.05216377507895231
device: cpu, dtype: torch.int64, 10000 times            0.06139362137764692
device: cuda, dtype: torch.int8, 10000 times            0.14827249851077795
device: cuda, dtype: torch.uint8, 10000 times           0.14801877550780773
device: cuda, dtype: torch.int16, 10000 times           0.14952312968671322
device: cuda, dtype: torch.int32, 10000 times           0.14999118447303772
device: cuda, dtype: torch.int64, 10000 times           0.14951884001493454
```
After:
```
__and__ (a.numel() == 10) for 100000 times
device: cpu, dtype: torch.int8, 100000 times            0.23157884553074837
device: cpu, dtype: torch.uint8, 100000 times           0.23063660878688097
device: cpu, dtype: torch.int16, 100000 times           0.23005440644919872
device: cpu, dtype: torch.int32, 100000 times           0.23748818412423134
device: cpu, dtype: torch.int64, 100000 times           0.24106105230748653
device: cuda, dtype: torch.int8, 100000 times           1.4394256137311459
device: cuda, dtype: torch.uint8, 100000 times          1.4436759827658534
device: cuda, dtype: torch.int16, 100000 times          1.4631587155163288
device: cuda, dtype: torch.int32, 100000 times          1.459101552143693
device: cuda, dtype: torch.int64, 100000 times          1.4784048134461045
__and__ (a.numel() == 1000) for 10000 times
device: cpu, dtype: torch.int8, 10000 times             0.028442862443625927
device: cpu, dtype: torch.uint8, 10000 times            0.028130197897553444
device: cpu, dtype: torch.int16, 10000 times            0.025318274274468422
device: cpu, dtype: torch.int32, 10000 times            0.02519288007169962
device: cpu, dtype: torch.int64, 10000 times            0.028299466706812382
device: cuda, dtype: torch.int8, 10000 times            0.14342594426125288
device: cuda, dtype: torch.uint8, 10000 times           0.145280827768147
device: cuda, dtype: torch.int16, 10000 times           0.14673697855323553
device: cuda, dtype: torch.int32, 10000 times           0.14499565307050943
device: cuda, dtype: torch.int64, 10000 times           0.14582364354282618
__iand__ (a.numel() == 10) for 100000 times
device: cpu, dtype: torch.int8, 100000 times            0.25548241566866636
device: cpu, dtype: torch.uint8, 100000 times           0.2552562616765499
device: cpu, dtype: torch.int16, 100000 times           0.25905191246420145
device: cpu, dtype: torch.int32, 100000 times           0.26635489892214537
device: cpu, dtype: torch.int64, 100000 times           0.26269810926169157
device: cuda, dtype: torch.int8, 100000 times           1.485458506271243
device: cuda, dtype: torch.uint8, 100000 times          1.4742380809038877
device: cuda, dtype: torch.int16, 100000 times          1.507783885113895
device: cuda, dtype: torch.int32, 100000 times          1.4926990242674947
device: cuda, dtype: torch.int64, 100000 times          1.519851053133607
__iand__ (a.numel() == 1000) for 10000 times
device: cpu, dtype: torch.int8, 10000 times             0.03425929415971041
device: cpu, dtype: torch.uint8, 10000 times            0.03293587639927864
device: cpu, dtype: torch.int16, 10000 times            0.029559112153947353
device: cpu, dtype: torch.int32, 10000 times            0.030915481969714165
device: cpu, dtype: torch.int64, 10000 times            0.03292469773441553
device: cuda, dtype: torch.int8, 10000 times            0.15792148280888796
device: cuda, dtype: torch.uint8, 10000 times           0.16000914946198463
device: cuda, dtype: torch.int16, 10000 times           0.1600684942677617
device: cuda, dtype: torch.int32, 10000 times           0.16162546630948782
device: cuda, dtype: torch.int64, 10000 times           0.1629159888252616
```
Fixes https://github.com/pytorch/pytorch/issues/24508, https://github.com/pytorch/pytorch/issues/24509, https://github.com/pytorch/pytorch/issues/24655, https://github.com/pytorch/pytorch/issues/24656.
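For context, here is a minimal usage sketch of the new operator (illustrative only; see the PR for the full set of overloads):
```python
import torch

a = torch.tensor([0b1100, 0b1010], dtype=torch.int32)
b = torch.tensor([0b1010, 0b0110], dtype=torch.int32)

# Functional, operator, and in-place forms of the new op.
print(torch.bitwise_and(a, b))  # tensor([8, 2], dtype=torch.int32)
print(a & b)                    # `&` dispatches to bitwise_and for integral/bool tensors
a &= b                          # __iand__ / bitwise_and_
print(a)                        # tensor([8, 2], dtype=torch.int32)
```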
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31104

Differential Revision: D18938930

Pulled By: VitalyFedyunin

fbshipit-source-id: a77e805a0b84e8ace16c6e648c2f67dad44f2e44
2020-01-03 10:32:36 -08:00
leetanenbaum
0b9cd410a9 Fix cumsum error for tensors with zero elements (#31694)
Summary:
Currently `cumsum` crashes for tensors that have at least one dimension but zero elements, which happens when some dimension has size zero. This commit fixes the error by checking both `dim()` and `numel()` in the cumsum backward pass.

Fixes https://github.com/pytorch/pytorch/issues/31515
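A minimal sketch of the kind of case this guards against (a tensor with dimensions but no elements going through the backward pass); the exact repro from the linked issue may differ:
```python
import torch

# dim() == 2 but numel() == 0, because one dimension has size zero.
x = torch.zeros(0, 5, requires_grad=True)

# Backward through cumsum on the empty tensor used to crash; it should be a no-op.
x.cumsum(dim=0).sum().backward()
print(x.grad.shape)  # torch.Size([0, 5])
```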
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31694

Reviewed By: mrshenli

Differential Revision: D19266613

Pulled By: leedtan

fbshipit-source-id: 9407e0aa55440fed911c01a3580bb6c5eab62a16
2020-01-03 10:16:46 -08:00
BowenBao
c4f10e0fe7 Renaming scales parameter for interpolate (#31526)
Summary:
PR separated from https://github.com/pytorch/pytorch/pull/31274.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31526

Reviewed By: zou3519

Differential Revision: D19221931

Pulled By: gchanan

fbshipit-source-id: 81958a9910867ac9d62f2b47abc49384526c4e51
2020-01-02 08:19:30 -08:00
anjali411
ae214f67a5 updated code to ensure error check for negative dims
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/31636

Differential Revision: D19233031

Pulled By: anjali411

fbshipit-source-id: c29265ddd1f887f1a0b98aca56a2691d7584353d
2019-12-27 14:39:57 -08:00
Gregory Chanan
68e5172382 Support optional float parameters (float?, optional<double>). (#31517)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31517

This is going to be used by upsample (which currently uses magic values to represent optionals).

For now, we just introduce a fake function for testing (torch._test_optional_float(x)).

Test Plan: Imported from OSS

Differential Revision: D19198721

Pulled By: gchanan

fbshipit-source-id: 0a1382fde0927c5d277d02d62bfb31fb574b8c74
2019-12-23 08:33:39 -08:00
anjali411
9d9bc93bfb Added error message to indicate that reduction operations are not supported for dim>=64 (#31476)
Summary:
Reference: https://github.com/pytorch/pytorch/issues/23159
Reduction operations are currently unsupported for tensors with dim >= 64, so we should raise a descriptive RuntimeError saying so.
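An illustrative sketch of the behavior (assuming a tensor of that rank can be constructed at all; the exact error text is defined in the PR):
```python
import torch

try:
    # 65 dimensions, size 1 along each; reductions on dim() >= 64 are unsupported
    # and should now fail with a descriptive RuntimeError.
    x = torch.zeros([1] * 65)
    x.sum()
except RuntimeError as e:
    print("RuntimeError:", e)
```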
Diff: D19179039
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31476

Differential Revision: D19179039

Pulled By: anjali411

fbshipit-source-id: 58568f64627bf3df6b3e00a1498544c030e74a0e
2019-12-19 13:00:53 -08:00
Iurii Zdebskyi
58d2dd5b73 Enabled flip for bool tensors (#31267)
Summary:
Fix this [issue](https://github.com/pytorch/pytorch/issues/31213)
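A quick sketch of what this enables:
```python
import torch

mask = torch.tensor([True, False, False, True, True])
# flip on a bool tensor previously raised; it now reverses along the given dims.
print(torch.flip(mask, dims=[0]))  # tensor([ True,  True, False, False,  True])
```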
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31267

Differential Revision: D19047249

Pulled By: izdeby

fbshipit-source-id: f58ca3ac88aab28742b8d345400270f7d31c3856
2019-12-18 09:01:32 -08:00
Kurt Mohler
3694749cd1 Detect dill version in torch.save/load (#30985)
Summary:
Fix for issue https://github.com/pytorch/pytorch/issues/28313
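For reference, the workflow this affects is roughly the following (dill is a third-party pickling library passed in via the existing `pickle_module` argument; the minimum dill version enforced is specified in the PR):
```python
import torch
import dill  # third-party pickle replacement

obj = {"weights": torch.randn(3)}

# With this change, an incompatible dill version is detected up front
# instead of failing in a confusing way during save/load.
torch.save(obj, "checkpoint.pt", pickle_module=dill)
loaded = torch.load("checkpoint.pt", pickle_module=dill)
print(loaded["weights"])
```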
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30985

Differential Revision: D19142947

Pulled By: zou3519

fbshipit-source-id: 10e3a182a99e80ca8c9c8328b6f8764b27d78eb3
2019-12-18 08:05:08 -08:00
Xiang Gao
ffe0c1ae4d Make test_torch.py pass cuda-memcheck (#29243)
Summary:
Make the following changes:
- When there are more than 10k errors, cuda-memcheck only shows the first 10k; in this case we shouldn't raise an Exception
- Add an UNDER_CUDA_MEMCHECK environment variable to allow disabling `pin_memory` tests when running cuda-memcheck
- Add a `--ci` command option; when turned on, the script writes output to stdout instead of to a file and exits with an error if cuda-memcheck fails
- Add a `--nohang` command option; when turned on, a hang is treated as a pass instead of an error
- Do simple filtering on the tests to run: skip a test if `'cpu'` is in its name but `'cuda'` is not
- Add `--split` and `--rank` to allow splitting the work (NVIDIA CI has a 3-hour limit, so we have to split the work to stay under it)
- The error summary could be `ERROR SUMMARY: 1 error` or `ERROR SUMMARY: 2 errors`; the tail is `error` or `errors`, so the lines differ in length. The script is fixed to handle this.
- Ignore errors from `cufft`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29243

Differential Revision: D18941701

Pulled By: mruberry

fbshipit-source-id: 2048428f32b66ef50c67444c03ce4dd9491179d2
2019-12-14 20:29:58 -08:00
Vitaly Fedyunin
c35cddb306 Switch default memory format of clone operator to Preserve
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30089
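A short sketch of the behavior change, using a channels-last input (illustrative; current API names):
```python
import torch

x = torch.randn(2, 3, 8, 8).contiguous(memory_format=torch.channels_last)

# With the default switched to preserve, the clone keeps the channels-last
# strides instead of silently becoming contiguous.
print(x.clone().is_contiguous(memory_format=torch.channels_last))  # True

# The old behavior remains available explicitly.
print(x.clone(memory_format=torch.contiguous_format).is_contiguous())  # True
```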

Test Plan: Imported from OSS

Differential Revision: D18624985

Pulled By: VitalyFedyunin

fbshipit-source-id: 8d315b08b7b5858fd0a81d3375b44ccb94787ad4
2019-12-14 20:29:06 -08:00
Vitaly Fedyunin
fde3d707ad Switch default memory format of to (and similar) operators to Preserve
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30088

Test Plan: Imported from OSS

Differential Revision: D18624984

Pulled By: VitalyFedyunin

fbshipit-source-id: 54901786d7496c7dce785140b0585ac9093b1d86
2019-12-14 20:29:01 -08:00
Vitaly Fedyunin
927588df8e Switch default memory format of _like operators to Preserve
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30087

Test Plan: Imported from OSS

Differential Revision: D18624986

Pulled By: VitalyFedyunin

fbshipit-source-id: 8e434966f872ffaddf1249248ea445cbbab300ce
2019-12-14 20:28:57 -08:00
Xiang Gao
9954739956 Refactor test for unique and unique_consecutive and fix some bugs (#31211)
Summary:
Tests for unique_dim will be refactored in a separate PR.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31211

Differential Revision: D19034968

Pulled By: ngimel

fbshipit-source-id: 855d326b37638b5944f11fbbce03394cf000daf9
2019-12-14 20:28:38 -08:00
Iurii Zdebskyi
f6c31f61c5 Enabled roll for bool tensor (#31194)
Summary:
Fixed this [issue](https://github.com/pytorch/pytorch/issues/31079).
Tested via unit test
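A quick sketch of what this enables:
```python
import torch

flags = torch.tensor([True, False, False, True])
# roll on a bool tensor previously failed; it now shifts elements circularly.
print(torch.roll(flags, shifts=1))  # tensor([ True,  True, False, False])
```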
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31194

Differential Revision: D18958141

Pulled By: izdeby

fbshipit-source-id: 119bf4d31df10ee02c277f5a4663038470cf7780
2019-12-12 13:48:14 -08:00
Brian Vaughan
945ce71b18 Correctly handle scalar types, fix parse of numpy ints (#30486)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30486

Fixes: https://github.com/pytorch/pytorch/issues/29252

There is some incorrect code in the handling of parsing Python numbers that led to issue #29252:

When we allow a zero-dim numpy integer value to be interpreted as a scalar in PyTorch, we incorrectly parse the int as a float.

This PR also fixes the issue described in the "FIXME" here:
https://github.com/pytorch/pytorch/pull/27628/files#diff-f539198dd366265fb8dc2d661bc5d5bcR1487

Test Plan: Added a unit test based on the example given in the issue.
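A sketch of the kind of check such a test presumably makes (the zero-dim numpy integer should be treated as an integral scalar, not a float):
```python
import numpy as np
import torch

t = torch.arange(3)   # int64 tensor
s = np.int64(2)       # zero-dim numpy integer

# With the fix, s is parsed as an integral Scalar, so the result keeps an
# integer dtype instead of going through an incorrect float parse.
print((t + s).dtype)  # torch.int64
```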

Differential Revision: D18932520

Pulled By: nairbv

fbshipit-source-id: f6416f28dfd73ac72c1042042851d76beb5fcf65
2019-12-11 15:35:57 -08:00
Alban Desmaison
717274c001 Add useful warnings for t.grad when it won't be populated for known reasons (#30531)
Summary:
Fix https://github.com/pytorch/pytorch/issues/2362 and https://github.com/pytorch/pytorch/issues/19778

To avoid issues with frozen models, we only warn for Tensors that require gradients, are not leaves, and do not retain gradients.
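An illustrative example of when the warning fires and how to opt out of it:
```python
import torch

x = torch.randn(3, requires_grad=True)  # leaf
y = x * 2                               # non-leaf, does not retain grad
y.sum().backward()

print(x.grad)  # populated: x is a leaf
print(y.grad)  # None, and accessing it now emits a UserWarning explaining why

z = x * 2
z.retain_grad()                         # explicit opt-in silences the warning
z.sum().backward()
print(z.grad)                           # tensor([1., 1., 1.])
```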
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30531

Differential Revision: D18832767

Pulled By: albanD

fbshipit-source-id: 743e863dc14ab57713e66da78b2e4d759dfba0ff
2019-12-11 09:47:18 -08:00
Michael Suo
62b10721fb Actually make flake8 do something (#30892)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30892

Fixes all outstanding lints and actually installs a properly configured
flake8

Test Plan: Imported from OSS

Differential Revision: D18862825

Pulled By: suo

fbshipit-source-id: 08e9083338a7309272e17bb803feaa42e348aa85
2019-12-06 17:50:50 -08:00
Gregory Chanan
377131b0eb MultiMarginCriterion: fix scalar_check in the case where reduction == None. (#30826)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30826

Previously the scalar_check for the reduction None case was:
input.dim() <= 1, but it should be target based, i.e.:
target.dim() == 0.  This follows from the "correct cases", i.e.
(N, C) X (N,) -> (N,)
(C,) X () -> ()
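A shape-level sketch of the two cases via the functional API (illustrative only):
```python
import torch
import torch.nn.functional as F

# (N, C) input with (N,) target -> (N,) output under reduction='none'
out = F.multi_margin_loss(torch.randn(4, 3), torch.tensor([0, 2, 1, 2]), reduction='none')
print(out.shape)  # torch.Size([4])

# (C,) input with () target -> () output: the 0-dim result follows the target
out = F.multi_margin_loss(torch.randn(3), torch.tensor(1), reduction='none')
print(out.shape)  # torch.Size([])
```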

Test Plan: Imported from OSS

Differential Revision: D18833660

Pulled By: gchanan

fbshipit-source-id: 26338b842a8311718c4b89da3e2f1b726d5409b8
2019-12-06 09:04:38 -08:00
Gregory Chanan
e5d571ae25 Remove scalar_check from topk, move it to the THC implementation.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30852

Test Plan: Imported from OSS

Differential Revision: D18842662

Pulled By: gchanan

fbshipit-source-id: b5e8a4367fce9441be2ddbd026495f1911038221
2019-12-06 07:50:20 -08:00
Edward Yang
6e38d50352 Revert D18117070: Migrate max and min (binary) from TH to ATen.
Test Plan: revert-hammer

Differential Revision:
D18117070

Original commit changeset: e06d37a8a140

fbshipit-source-id: 49dd33f52e7e3ffcaafc02109a0a0a67545ec7e8
2019-12-05 14:43:29 -08:00
Edward Yang
2ced81f289 Revert "Default to not build Caffe2 operators on Windows. (#29061)" (#30740)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30740

This reverts commit 7102aceaf8.

Test Plan: Imported from OSS

Differential Revision: D18834315

Pulled By: ezyang

fbshipit-source-id: 2dbd1cf686864b9840365083182cd6188a285399
2019-12-05 14:01:59 -08:00
Hong Xu
1578a28692 Migrate max and min (binary) from TH to ATen. (#27185)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27185

The TH implementation will be removed after the unary max and min are migrated.

Benchmark: (Debian 10, Release build, gcc 7.4, no turbo)

```python
import timeit
for device in ('cpu', 'cuda'):
    print(f'device: {device}')
    for op in ('max', 'min'):
        for dtype in ('torch.double', 'torch.float', 'torch.int16', 'torch.int32', 'torch.int64'):
            for n, t in [(10_000, 200000),
                         (100_000, 20000)]:
                print(f'torch.{op}(a, b), numel() == {n} for {t} times, dtype={dtype}')
                # a and b are created on the benchmarked device; b holds the constant n // 2
                print(timeit.timeit(
                    f'torch.{op}(a, b)' + (';torch.cuda.synchronize()' if device == 'cuda' else ''),
                    setup=f'import torch; '
                          f'a = torch.arange({n}, dtype={dtype}, device="{device}"); '
                          f'b = torch.full(({n},), {n} // 2, dtype={dtype}, device="{device}")',
                    number=t))
    print()
```

Before:

```
device: cpu
torch.max(a, b), numel() == 10000 for 200000 times, dtype=torch.double
2.241763713000182
torch.max(a, b), numel() == 100000 for 20000 times, dtype=torch.double
1.7138833169992722
torch.max(a, b), numel() == 10000 for 200000 times, dtype=torch.float
2.2183356810000987
torch.max(a, b), numel() == 100000 for 20000 times, dtype=torch.float
1.7031846980007685
torch.max(a, b), numel() == 10000 for 200000 times, dtype=torch.int16
1.7704679510006827
torch.max(a, b), numel() == 100000 for 20000 times, dtype=torch.int16
1.289198366999699
torch.max(a, b), numel() == 10000 for 200000 times, dtype=torch.int32
1.7937613740014058
torch.max(a, b), numel() == 100000 for 20000 times, dtype=torch.int32
1.2930124340000475
torch.max(a, b), numel() == 10000 for 200000 times, dtype=torch.int64
1.8032857640009752
torch.max(a, b), numel() == 100000 for 20000 times, dtype=torch.int64
1.2908709189996443
torch.min(a, b), numel() == 10000 for 200000 times, dtype=torch.double
1.8829010000008566
torch.min(a, b), numel() == 100000 for 20000 times, dtype=torch.double
1.2994690759987861
torch.min(a, b), numel() == 10000 for 200000 times, dtype=torch.float
1.8037853410005482
torch.min(a, b), numel() == 100000 for 20000 times, dtype=torch.float
1.2929310759991495
torch.min(a, b), numel() == 10000 for 200000 times, dtype=torch.int16
1.8075240359994496
torch.min(a, b), numel() == 100000 for 20000 times, dtype=torch.int16
1.2932477679987642
torch.min(a, b), numel() == 10000 for 200000 times, dtype=torch.int32
1.7868400779989315
torch.min(a, b), numel() == 100000 for 20000 times, dtype=torch.int32
1.2885970789993735
torch.min(a, b), numel() == 10000 for 200000 times, dtype=torch.int64
1.8389664830010588
torch.min(a, b), numel() == 100000 for 20000 times, dtype=torch.int64
1.29402057399966

device: cuda
torch.max(a, b), numel() == 10000 for 200000 times, dtype=torch.double
4.787109836999662
torch.max(a, b), numel() == 100000 for 20000 times, dtype=torch.double
1.842438002999188
torch.max(a, b), numel() == 10000 for 200000 times, dtype=torch.float
3.429616614999759
torch.max(a, b), numel() == 100000 for 20000 times, dtype=torch.float
1.835390076999829
torch.max(a, b), numel() == 10000 for 200000 times, dtype=torch.int16
2.940423873000327
torch.max(a, b), numel() == 100000 for 20000 times, dtype=torch.int16
1.4108991760003846
torch.max(a, b), numel() == 10000 for 200000 times, dtype=torch.int32
2.9318018840003788
torch.max(a, b), numel() == 100000 for 20000 times, dtype=torch.int32
1.4168134739993548
torch.max(a, b), numel() == 10000 for 200000 times, dtype=torch.int64
2.9610764919998473
torch.max(a, b), numel() == 100000 for 20000 times, dtype=torch.int64
1.4189234130008117
torch.min(a, b), numel() == 10000 for 200000 times, dtype=torch.double
2.960172712999338
torch.min(a, b), numel() == 100000 for 20000 times, dtype=torch.double
1.4162539499993727
torch.min(a, b), numel() == 10000 for 200000 times, dtype=torch.float
2.8985912560001452
torch.min(a, b), numel() == 100000 for 20000 times, dtype=torch.float
1.4113489299998037
torch.min(a, b), numel() == 10000 for 200000 times, dtype=torch.int16
2.9160250799995993
torch.min(a, b), numel() == 100000 for 20000 times, dtype=torch.int16
1.4128787690005993
torch.min(a, b), numel() == 10000 for 200000 times, dtype=torch.int32
2.8806865219994506
torch.min(a, b), numel() == 100000 for 20000 times, dtype=torch.int32
1.4086357010000938
torch.min(a, b), numel() == 10000 for 200000 times, dtype=torch.int64
2.9362181240012433
torch.min(a, b), numel() == 100000 for 20000 times, dtype=torch.int64
1.4151225870009512

```

After:

```
device: cpu
torch.max(a, b), numel() == 10000 for 200000 times, dtype=torch.double
2.2685823729998447
torch.max(a, b), numel() == 100000 for 20000 times, dtype=torch.double
1.72004808300062
torch.max(a, b), numel() == 10000 for 200000 times, dtype=torch.float
2.212242640000113
torch.max(a, b), numel() == 100000 for 20000 times, dtype=torch.float
1.7089235590001408
torch.max(a, b), numel() == 10000 for 200000 times, dtype=torch.int16
1.7767087259999244
torch.max(a, b), numel() == 100000 for 20000 times, dtype=torch.int16
1.2916517639996528
torch.max(a, b), numel() == 10000 for 200000 times, dtype=torch.int32
1.8265984959998605
torch.max(a, b), numel() == 100000 for 20000 times, dtype=torch.int32
1.3002885240002797
torch.max(a, b), numel() == 10000 for 200000 times, dtype=torch.int64
1.8084679720004715
torch.max(a, b), numel() == 100000 for 20000 times, dtype=torch.int64
1.3012119999993956
torch.min(a, b), numel() == 10000 for 200000 times, dtype=torch.double
1.8800218449996464
torch.min(a, b), numel() == 100000 for 20000 times, dtype=torch.double
1.3060645710002063
torch.min(a, b), numel() == 10000 for 200000 times, dtype=torch.float
2.4905043950002437
torch.min(a, b), numel() == 100000 for 20000 times, dtype=torch.float
1.9126290209997023
torch.min(a, b), numel() == 10000 for 200000 times, dtype=torch.int16
1.7972335520007618
torch.min(a, b), numel() == 100000 for 20000 times, dtype=torch.int16
1.2918074379995232
torch.min(a, b), numel() == 10000 for 200000 times, dtype=torch.int32
1.8047651860006226
torch.min(a, b), numel() == 100000 for 20000 times, dtype=torch.int32
1.2992197730000044
torch.min(a, b), numel() == 10000 for 200000 times, dtype=torch.int64
1.8526509560006161
torch.min(a, b), numel() == 100000 for 20000 times, dtype=torch.int64
1.3030709570002728

device: cuda
torch.max(a, b), numel() == 10000 for 200000 times, dtype=torch.double
4.700986622000528
torch.max(a, b), numel() == 100000 for 20000 times, dtype=torch.double
1.8415469050005413
torch.max(a, b), numel() == 10000 for 200000 times, dtype=torch.float
3.3051693249999516
torch.max(a, b), numel() == 100000 for 20000 times, dtype=torch.float
1.8321999460004008
torch.max(a, b), numel() == 10000 for 200000 times, dtype=torch.int16
2.8086475109994353
torch.max(a, b), numel() == 100000 for 20000 times, dtype=torch.int16
1.405110773999695
torch.max(a, b), numel() == 10000 for 200000 times, dtype=torch.int32
2.913458047999484
torch.max(a, b), numel() == 100000 for 20000 times, dtype=torch.int32
1.4236377289998927
torch.max(a, b), numel() == 10000 for 200000 times, dtype=torch.int64
2.9386842409994642
torch.max(a, b), numel() == 100000 for 20000 times, dtype=torch.int64
1.4230227469997772
torch.min(a, b), numel() == 10000 for 200000 times, dtype=torch.double
3.0341797270002644
torch.min(a, b), numel() == 100000 for 20000 times, dtype=torch.double
1.4289592409995748
torch.min(a, b), numel() == 10000 for 200000 times, dtype=torch.float
3.6091147850002017
torch.min(a, b), numel() == 100000 for 20000 times, dtype=torch.float
2.036691903999781
torch.min(a, b), numel() == 10000 for 200000 times, dtype=torch.int16
2.8256167649997224
torch.min(a, b), numel() == 100000 for 20000 times, dtype=torch.int16
1.4078955400000268
torch.min(a, b), numel() == 10000 for 200000 times, dtype=torch.int32
2.8631781489993955
torch.min(a, b), numel() == 100000 for 20000 times, dtype=torch.int32
1.4210130069996012
torch.min(a, b), numel() == 10000 for 200000 times, dtype=torch.int64
3.0112479260005784
torch.min(a, b), numel() == 100000 for 20000 times, dtype=torch.int64
1.4297719679998409

```

Partly solves #24594 and #24595

Close #25016

Test Plan: Imported from OSS

Differential Revision: D18117070

Pulled By: VitalyFedyunin

fbshipit-source-id: e06d37a8a1405848ba0b9e398870a77eb52bae8b
2019-12-05 09:55:56 -08:00
Gregory Chanan
2607772959 Turn off scalar_checks for SpatialDepthwiseConvolution and SpatialConvolutionMM. (#30789)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30789

The input(s) can't be 0-dimensional, so it's irrelevant.

Restacked version of: https://github.com/pytorch/pytorch/pull/30438

Test Plan: Imported from OSS

Differential Revision: D18825716

Pulled By: gchanan

fbshipit-source-id: a4883b795163efcb9d8dba6166d0f2102b6728a2
2019-12-05 08:07:31 -08:00
Gregory Chanan
50625798df Fix scalar check of MultiLabelMarginLoss. (#30768)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30768

The behavior didn't match the documentation, because the documentation (for 'none' reduction) reads:
input X target -> output
(N, C) X (N, C) -> (N,)
(C,) X (C,) -> ()

but the latter case would output (1,).  This also changes the case to:
() X (C,) -> ()
from:
() X (C,) -> (C,)
which makes more sense with the above formulas.
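
A shape-level sketch of the documented cases via the functional API (illustrative; targets are class indices padded with -1):
```python
import torch
import torch.nn.functional as F

# (N, C) input with (N, C) target -> (N,) under reduction='none'
inp = torch.randn(2, 4)
tgt = torch.tensor([[3, 0, -1, -1],
                    [1, 2, 3, -1]])
print(F.multilabel_margin_loss(inp, tgt, reduction='none').shape)  # torch.Size([2])

# (C,) input with (C,) target -> () after this change (previously (1,))
out = F.multilabel_margin_loss(torch.randn(4), torch.tensor([0, 2, -1, -1]), reduction='none')
print(out.shape)  # torch.Size([])
```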

Restacked version of: https://github.com/pytorch/pytorch/pull/30748

Test Plan: Imported from OSS

Differential Revision: D18821554

Pulled By: gchanan

fbshipit-source-id: 3df77c51cf25648cb5fab62a68b09f49c91dab4e
2019-12-05 08:07:20 -08:00
Gregory Chanan
473a044835 Fix a CUDA memory leak in MultiLabelMarginCriterion error checking. (#30767)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30767

Restacked version of: https://github.com/pytorch/pytorch/pull/30733

Test Plan: Imported from OSS

Differential Revision: D18821553

Pulled By: gchanan

fbshipit-source-id: 8bf0365ce54dd2f07a5d6d0937332d0baf75b350
2019-12-05 08:07:15 -08:00
Gregory Chanan
786de33832 Move scalar_check logic from codegen to code in NLLLoss. (#30670)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30670

Also turn off scalar_check for grad_input: it isn't necessary because the input can't be 0-dimensional.

Test Plan: Imported from OSS

Differential Revision: D18784523

Pulled By: gchanan

fbshipit-source-id: 246d30970457075a0403dd0089317659a2cd2dd4
2019-12-04 12:30:23 -08:00
Gregory Chanan
fa2aa245cf Simplify scalar_check of nll_loss. (#30669)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30669

The inputs can't be 0-d, so we don't need that check in the scalar_check.

Test Plan: Imported from OSS

Differential Revision: D18784524

Pulled By: gchanan

fbshipit-source-id: d44222dffc91880a6e8c7be69e6e146e60040d43
2019-12-04 12:30:19 -08:00
Hong Xu
bb5dcaf24f Add logical_and and logical_or (#30521)
Summary:
This relands 8bbafa0b32 with the CI failure it caused fixed (the lambdas in the CUDA kernels had an incorrect return type).
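Basic usage of the two new ops:
```python
import torch

a = torch.tensor([True, False, True, False])
b = torch.tensor([True, True, False, False])

print(torch.logical_and(a, b))  # tensor([ True, False, False, False])
print(torch.logical_or(a, b))   # tensor([ True,  True,  True, False])

# Non-bool inputs are treated as "nonzero means True" and yield a bool result.
print(torch.logical_and(torch.tensor([0, 1, 2]), torch.tensor([1, 0, 5])))
# tensor([False, False,  True])
```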
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30521

Differential Revision: D18770151

Pulled By: ailzhang

fbshipit-source-id: 02f0fe1d5718c34d24da6dbb5884ee8b247ce39a
2019-12-03 18:24:54 -08:00
Hong Xu
4ac614191a Remove exp10 in TH (unused)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30422

Test Plan: Imported from OSS

Differential Revision: D18764186

Pulled By: VitalyFedyunin

fbshipit-source-id: 9343a5a7e4edf61ba3b85eaf846b2e149ed6529a
2019-12-03 18:17:15 -08:00
Brian Vaughan
a376dd344c Added check for torch.where on CPU that both arguments have same dtype (#30662)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30662

Cherry picked from: https://github.com/pytorch/pytorch/pull/29081

Test Plan: Imported from OSS

Differential Revision: D18782295

Pulled By: nairbv

fbshipit-source-id: 897ab25ddf8819ca34f5e86c5d3f41debb56cb04

Co-authored-by: ifedan
2019-12-03 15:19:52 -08:00
Gregory Chanan
8b29701ae5 Turn off scalar_checks for _th_reciprocal. (#30436)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30436

The underlying TH implementation is correct.

Test Plan: Imported from OSS

Differential Revision: D18699088

Pulled By: gchanan

fbshipit-source-id: e75a588ae4afb0506922ba98208546d5c0de623a
2019-12-03 07:04:53 -08:00
Gregory Chanan
61798865e3 Turn off scalar_checks for torch.clamp. (#30435)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30435

The underlying THC implementations are correct.

Test Plan: Imported from OSS

Differential Revision: D18699089

Pulled By: gchanan

fbshipit-source-id: f5d1319bf48eae36903296dad0b98ed80661f732
2019-12-03 07:04:47 -08:00
Brian Vaughan
e5b947a3a8 Raise an error for is_signed on quantized types (#30527)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30527

When we introduced dtype.is_signed we allowed for support of
quantized types, but we're not sure what the correct result should be.

See discussion at https://github.com/pytorch/pytorch/pull/29511
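
A small sketch of the resulting behavior (the quantized case is guarded with try/except since, per this change, it raises rather than answering; later releases may define it):
```python
import torch

# Regular dtypes report signedness as before.
print(torch.int8.is_signed)     # True
print(torch.uint8.is_signed)    # False
print(torch.float32.is_signed)  # True

# As of this change, quantized dtypes refuse to answer instead of guessing.
try:
    print(torch.qint8.is_signed)
except RuntimeError as e:
    print("RuntimeError:", e)
```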

Test Plan: Imported from OSS

Differential Revision: D18765410

Pulled By: nairbv

fbshipit-source-id: c87cfe999b604cfcbbafa561e04d0d5cdbf41e6d
2019-12-03 06:34:53 -08:00
Gregory Chanan
569729527b Turn off scalar_checks for exp, cos, cosh, tan, atan, tanh, erf, erfc. (#30434)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30434

These are all pointwise ops that are implemented correctly wrt shapes in THC.

Test Plan: Imported from OSS

Differential Revision: D18699087

Pulled By: gchanan

fbshipit-source-id: 82cb91b00c77bfaca75be497c87fc7ae52daf46c
2019-12-02 16:10:25 -08:00
Gregory Chanan
0b25371f5d Turn off scalar_check for _th_normal.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29955

Test Plan: Imported from OSS

Differential Revision: D18548051

Pulled By: gchanan

fbshipit-source-id: c652999ac9e37d2592aa85ef022040fe0700b5cf
2019-11-27 14:52:06 -08:00
Richard Zou
ec5c08de74 Revert D18580867: Add logical_and and logical_or
Test Plan: revert-hammer

Differential Revision:
D18580867

Original commit changeset: 7e4d7c37da4d

fbshipit-source-id: 81fb604c7aef8d847f518f5faa016e7bd0423016
2019-11-27 09:27:00 -08:00
Hong Xu
8bbafa0b32 Add logical_and and logical_or (#28162)
Summary:
Superseding https://github.com/pytorch/pytorch/issues/24379 as type promotion has been implemented.

Close https://github.com/pytorch/pytorch/issues/24379
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28162

Differential Revision: D18580867

Pulled By: ailzhang

fbshipit-source-id: 7e4d7c37da4dc8df87314bd4f1f6a7539e46586a
2019-11-26 17:38:22 -08:00
Gregory Chanan
dbce53fe32 Turn off scalar_check for _th_gather. (#29954)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29954

The underlying op handles scalar_check correctly.

Test Plan: Imported from OSS

Differential Revision: D18548054

Pulled By: gchanan

fbshipit-source-id: a1b44afa80c2928b78abbfba8b8b5d3608ac0fd3
2019-11-26 10:23:42 -08:00
Gregory Chanan
72ac45662b Turn off scalar_checks for torch.take. (#29953)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29953

The underlying function handles it correctly.

Test Plan: Imported from OSS

Differential Revision: D18548055

Pulled By: gchanan

fbshipit-source-id: cc2d0ae37d9689423363d115c6a653cb64840528
2019-11-26 10:23:37 -08:00
Gregory Chanan
79a830af56 Turn off scalar_check for Tensor.set_(Tensor) (#29952)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29952

The underlying op handles the check correctly.

Test Plan: Imported from OSS

Differential Revision: D18548048

Pulled By: gchanan

fbshipit-source-id: 9ac6fde743408e59ccdfc61bd574ebe6e2862238
2019-11-26 10:23:33 -08:00
Gregory Chanan
0c67311878 Turn off scalar_check for set_(Storage, ...) (#29950)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29950

The underlying code handles it correctly.

Test Plan: Imported from OSS

Differential Revision: D18548052

Pulled By: gchanan

fbshipit-source-id: 88b737572c816fb0026ac5e66da7e3f4ab686773
2019-11-25 14:52:22 -08:00
Gregory Chanan
7160300638 Turn off scalar_check for reductions _th_max, _th_min. (#29949)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29949

The underlying functions handle this already.

Test Plan: Imported from OSS

Differential Revision: D18548047

Pulled By: gchanan

fbshipit-source-id: 123c9297db4e4315da9b1d996ac8b41aa1b4c7bc
2019-11-25 14:52:17 -08:00
Gregory Chanan
16606e1725 Turn off scalar_check for mode; the underlying code is correct.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29948

Test Plan: Imported from OSS

Differential Revision: D18548053

Pulled By: gchanan

fbshipit-source-id: 15cdfc24d3e5123497c72dc09c5e6b28cb5e1f88
2019-11-25 14:52:12 -08:00
Gregory Chanan
b8eba7aca9 Turn off scalar_check for ormqr. (#29947)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29947

It requires > 0-dimensional tensors.

Test Plan: Imported from OSS

Differential Revision: D18548049

Pulled By: gchanan

fbshipit-source-id: ce80a42515b59513a0e5ef2b32e2c2b90b4d64f5
2019-11-25 14:52:07 -08:00
Gregory Chanan
7c6cc1d6d4 Turn off scalar_checks for _th_multinomial_alias_draw. (#29946)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29946

It requires > 0-dimensional tensors.

Test Plan: Imported from OSS

Differential Revision: D18548050

Pulled By: gchanan

fbshipit-source-id: 4d1e3b53bd701137cc2cb674f95627a5e064a274
2019-11-25 14:52:02 -08:00
Gregory Chanan
ce5f1a1b25 Turn off scalar_check for masked_select. (#29923)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29923

Note that this changes the behavior of masked_select when both "self" and "mask" are 0-dimensional.

In previous versions of PyTorch, this would return a 0-dimensional tensor.  But the documentation reads:
"Returns a new 1-D tensor which indexes the input tensor according to the boolean mask mask which is a BoolTensor."

Test Plan: Imported from OSS

Differential Revision: D18539560

Pulled By: gchanan

fbshipit-source-id: 1637ed2c434fcf8ceead0073aa610581f4a19d21
2019-11-25 14:51:51 -08:00
Gregory Chanan
0c9c62ba6e Turn off scalar_checks for __and__ and clone.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29880

Test Plan: Imported from OSS

Differential Revision: D18521732

Pulled By: gchanan

fbshipit-source-id: 7fdf5d8a7b93b43ac32067222cb8df5e790900de
2019-11-25 14:51:46 -08:00
Gregory Chanan
94ad7544ae Turn off scalar_check for __or__
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29879

Test Plan: Imported from OSS

Differential Revision: D18521745

Pulled By: gchanan

fbshipit-source-id: 93d17d5e9cad5dd6d2c20221d87408c838d74eca
2019-11-25 14:51:40 -08:00
Gregory Chanan
f994377d28 Turn off scalar_check for lshift, rshift.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29878

Test Plan: Imported from OSS

Differential Revision: D18521746

Pulled By: gchanan

fbshipit-source-id: 11fd7db79ac8ae76b1a5df25fb0ff59d81fcf394
2019-11-25 14:51:34 -08:00
Gerard Goossen
faacbfa8bf Migrate index_add cpu from TH to ATen (#28421)
Summary:
Migrate index_add cpu from TH to ATen.

I couldn't find a replacement for get1d and set1d, so I'm doing the pointer arithmetic inline.
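The operator being migrated, for reference (behavior is unchanged by the port):
```python
import torch

x = torch.zeros(5, 3)
index = torch.tensor([0, 4, 2])
source = torch.ones(3, 3)

# Rows of `source` are accumulated into rows 0, 4 and 2 of `x` along dim 0.
x.index_add_(0, index, source)
print(x)
```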
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28421

Test Plan: existing tests

Differential Revision: D18060971

Pulled By: ggoossen

fbshipit-source-id: 413719990cdb2fe578964cde14e93577e48a4342
2019-11-22 06:25:13 -08:00