69 Commits

Author SHA1 Message Date
yanbing-j
cd33e412a2 Enable fp32/bf16 PRelu forward and backward in MkldnnCPU path (#60427)
Enable fp32/bf16 PRelu forward and backward in MkldnnCPU path.

Fixes https://github.com/pytorch/pytorch/issues/58896

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60427
Approved by: https://github.com/VitalyFedyunin, https://github.com/ngimel, https://github.com/malfet
2022-05-10 17:29:11 +00:00
Nikita Shulga
b08633917d Revert D29463782: opitimze ConvTransposedND with mkldnn float32 and bfloat16 on CPU
Test Plan: revert-hammer

Differential Revision:
D29463782 (479e0d64e6)

Original commit changeset: 74b3d6138945

Original Phabricator Diff: D29463782 (479e0d64e6)

fbshipit-source-id: a9765f67f9c8c01faad82450e3c6a8d0c0abbe4b
(cherry picked from commit 12ce4ef02a13da85aa9bfe6c92ac41d4e0b8d2b0)
2022-05-06 19:34:41 +00:00
mingfeima
479e0d64e6 opitimze ConvTransposedND with mkldnn float32 and bfloat16 on CPU (#58348)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/58348

Test Plan: Imported from OSS

Reviewed By: mikaylagawarecki

Differential Revision: D29463782

Pulled By: VitalyFedyunin

fbshipit-source-id: 74b3d613894526280996c8211e0df918ac09364d
(cherry picked from commit 2db963bfaee7823bf5ecb2ef909405eb02db0613)
2022-05-06 17:19:05 +00:00
mingfeima
dbfb9a823d enable BFloat16 mkldnn_convolution on both contiguous and channels last memory format (#55864)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/55864

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D27941367

Pulled By: VitalyFedyunin

fbshipit-source-id: c6bcb73c41652cc0aca11c1d1e0697a8a2fa43ad
(cherry picked from commit 3fc0b992a7dccbc31042dc35afec9ae3dc59a05a)
2022-05-02 22:23:10 +00:00
mingfeima
92a9c0e3e0 add channels last (2d) support for mkldnn_convolution (#55584)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/55584

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D27941368

Pulled By: VitalyFedyunin

fbshipit-source-id: 7dd6f02a5787efa1995f31cdbd3244b25653840c
(cherry picked from commit bb555ed0fedafd529cb552807326384e95c90df9)
2022-04-20 22:34:44 +00:00
yanbing-j
12026124cc Avoid the view for mkldnn case in 1D convolution (#68166)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/68034

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68166

Reviewed By: mrshenli

Differential Revision: D32432444

Pulled By: jbschlosser

fbshipit-source-id: fc4e626d497d9e4597628a18eb89b94518bb3b33
2021-11-15 11:56:45 -08:00
Jane Xu
6e67150f57 [skip ci] Set test owner for test_mkldnn.py (#66845)
Summary:
Action following https://github.com/pytorch/pytorch/issues/66232

cc gujinghui PenghuiCheng XiaobingSuper jianyuh VitalyFedyunin

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66845

Reviewed By: anjali411

Differential Revision: D31803377

Pulled By: janeyx99

fbshipit-source-id: 4fcf77d3e4bf976449a0b1ab4d750619db3493a1
2021-10-20 12:38:56 -07:00
Shen Li
1022443168 Revert D30279364: [codemod][lint][fbcode/c*] Enable BLACK by default
Test Plan: revert-hammer

Differential Revision:
D30279364 (b004307252)

Original commit changeset: c1ed77dfe43a

fbshipit-source-id: eab50857675c51e0088391af06ec0ecb14e2347e
2021-08-12 11:45:01 -07:00
Zsolt Dollenstein
b004307252 [codemod][lint][fbcode/c*] Enable BLACK by default
Test Plan: manual inspection & sandcastle

Reviewed By: zertosh

Differential Revision: D30279364

fbshipit-source-id: c1ed77dfe43a3bde358f92737cd5535ae5d13c9a
2021-08-12 10:58:35 -07:00
yanbing-j
c7a7c2b62f Enable Gelu fp32/bf16 in CPU path using Mkldnn implementation (#58525)
Summary:
Enable Gelu bf16/fp32 in the CPU path using the Mkldnn implementation. Users do not need to call to_mkldnn() explicitly. The new Gelu fp32 performs better than the original one.

Add Gelu backward for https://github.com/pytorch/pytorch/pull/53615.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58525

Reviewed By: ejguan

Differential Revision: D29940369

Pulled By: ezyang

fbshipit-source-id: df9598262ec50e5d7f6e96490562aa1b116948bf
2021-08-03 06:52:23 -07:00
XiaobingSuper
4f46943e3d enable check trace when tracing a mkldnn model (#61241)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/43039. When tracing an MKLDNN model with **check_trace=True**, an error is raised: **RuntimeError: unsupported memory format option Preserve**. This PR fixes that error.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61241

Reviewed By: anjali411

Differential Revision: D29737365

Pulled By: suo

fbshipit-source-id: e8f7f124bc6256f10b9d29969e0c65d332514625
2021-07-19 11:03:53 -07:00
Nikita Shulga
c7d8d8f925 [BE] Improve has_bf16_support (#57408)
Summary:
- Use `functools.lru_cache` to avoid calling this function multiple times
- Check that we are running on a Linux platform before trying to open "/proc/cpuinfo"
- Do not spawn a new process; simply call open("/proc/cpuinfo").read() and search the output for the keywords
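
The three points above can be sketched roughly as follows. This is an illustration, not the code from the PR: the function names and the keyword tuple are assumptions, since the commit message does not say which keywords are searched for.

```python
import functools
import sys

# Hypothetical keyword list: the commit message does not name the real keywords.
ASSUMED_KEYWORDS = ("avx512bw", "avx512vl", "avx512dq")


def cpuinfo_has_keywords(cpuinfo_text, keywords=ASSUMED_KEYWORDS):
    """Return True only if every keyword occurs in the given cpuinfo text."""
    return all(word in cpuinfo_text for word in keywords)


@functools.lru_cache(maxsize=None)
def has_bf16_support_sketch():
    """Cached check, so /proc/cpuinfo is read at most once per process."""
    # /proc/cpuinfo only exists on Linux, so return early on other platforms.
    if not sys.platform.startswith("linux"):
        return False
    try:
        # Read the file directly instead of spawning a subprocess.
        with open("/proc/cpuinfo", encoding="utf-8", errors="replace") as f:
            return cpuinfo_has_keywords(f.read())
    except OSError:
        return False
```

Splitting the text search into its own function keeps the keyword matching testable on machines that have no "/proc/cpuinfo".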

Fixes https://github.com/pytorch/pytorch/issues/57360

Pull Request resolved: https://github.com/pytorch/pytorch/pull/57408

Reviewed By: driazati

Differential Revision: D28136769

Pulled By: malfet

fbshipit-source-id: ab476774c3be2913cb576d98d47a2f7ec03c19aa
2021-05-03 09:11:04 -07:00
Masaya, Kato
473d193966 Use mkldnn copy for copy_ when self and src are Mkldnn layout (#54248)
Summary:
Currently, when copy_ is called with Mkldnn layout, a RuntimeError is raised.

**Environment**
- CPU : Intel(R) Xeon(R) CPU E5-2690 v4 @ 2.60GHz
- PyTorch master(1772e26f63)
- build with USE_MKLDNN=1

**Sample code to reproduce:**
```python
import torch

x = torch.randn(4, 5, dtype=torch.float32)
mkldnn_x = x.to_mkldnn()
mkldnn_y = torch.randn(4, 5, dtype=torch.float32).to_mkldnn()
mkldnn_y.copy_(mkldnn_x)

print(x)
print(mkldnn_y.to_dense())
```

**Results:**
Actual:
```sh
Traceback (most recent call last):
  File "mkldnn_copy.py", line 6, in <module>
    mkldnn_y.copy_(mkldnn_x)
RuntimeError: unsupported tensor layout: Mkldnn
```

Expected:
```sh
# x
tensor([[ 0.1276, -0.1179,  1.1970,  2.4836,  1.9059],
        [-1.9647,  0.8613, -0.5060,  0.1555,  0.3661],
        [-0.1560, -0.2133,  0.3414, -1.7095, -2.3431],
        [ 1.3291,  0.3083,  0.5523, -2.0577, -0.4740]])
# mkldnn_y
tensor([[ 0.1276, -0.1179,  1.1970,  2.4836,  1.9059],
        [-1.9647,  0.8613, -0.5060,  0.1555,  0.3661],
        [-0.1560, -0.2133,  0.3414, -1.7095, -2.3431],
        [ 1.3291,  0.3083,  0.5523, -2.0577, -0.4740]])
```

This is because `copy_` does not support Mkldnn layout.
So I modified `copy_` to call `copy_mkldnn_` when both `self` and `src` are Mkldnn layout.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/54248

Reviewed By: mrshenli

Differential Revision: D27641352

Pulled By: ezyang

fbshipit-source-id: 70a37cdacb4a40b250ca16f2f6ddb6b71ff52d90
2021-04-08 06:35:39 -07:00
Akao, Kazutoshi
d2a58bfe6f Add mkldnn tanh operator (#54656)
Summary:
## 🚀 Feature
Add Mkl-Layout kernel for tanh.

## Motivation
We want to add a Mkl-Layout kernel for tanh to improve tanh's performance when the input Tensor is Mkl-Layout.
Because PyTorch does not have a Mkl-Layout kernel for tanh, it cannot execute tanh on a Mkl-Layout Tensor.
Of course, you can work around this by calling to_dense/to_mkldnn, but performance drops significantly due to the copy overhead (1.6-4.3 times slower than the CPU kernel).

## Performance results

### Environment
- CPU: Intel(R) Core(TM) i7-8086K CPU @ 4.00GHz
- OS: 18.04.1 LTS
- compiler: gcc 7.5.0
- branch: master
- commit ID: fe2c126
- build Environment variable: USE_CUDA=0
- Python: 3.6.9
- Intel MKL(Math Kernel Library): 2020.2-254
- Intel oneDNN: 1.8.1

### Benchmark script
``` python
import torch
import torch.nn as nn

torch.manual_seed(1)

x = torch.randn(2048, 2048)
x_mkl = x.to_mkldnn()

print("### CPU tanh")
with torch.autograd.profiler.profile(record_shapes=True) as prof:
    for i in range(100):
        output = x.tanh()
print(prof.key_averages().table(sort_by="self_cpu_time_total"))

print("\n### CPU tanh_")
with torch.autograd.profiler.profile(record_shapes=True) as prof:
    for i in range(100):
        x.tanh_()
print(prof.key_averages().table(sort_by="self_cpu_time_total"))

print("\n### to_dense/to_mkldnn + tanh")
with torch.autograd.profiler.profile(record_shapes=True) as prof:
    for i in range(100):
        output = x_mkl.to_dense().tanh().to_mkldnn()
print(prof.key_averages().table(sort_by="self_cpu_time_total"))

print("\n### to_dense/to_mkldnn + tanh_")
with torch.autograd.profiler.profile(record_shapes=True) as prof:
    for i in range(100):
        x_mkl.to_dense().tanh_().to_mkldnn()
print(prof.key_averages().table(sort_by="self_cpu_time_total"))

print("\n### Mkl-Layout tanh")
with torch.autograd.profiler.profile(record_shapes=True) as prof:
    for i in range(100):
        output = x_mkl.tanh()
print(prof.key_averages().table(sort_by="self_cpu_time_total"))

print("\n### Mkl-Layout tanh_")
with torch.autograd.profiler.profile(record_shapes=True) as prof:
    for i in range(100):
        x_mkl.tanh_()
print(prof.key_averages().table(sort_by="self_cpu_time_total"))
```

### Results
#### OMP_NUM_THREADS=1 Results(Self CPU time total ms)
| Operation | CPU kernel | to_dense/to_mkldnn+CPU kernel | Mkl-Layout kernel(This PR) |
| ---------- | ---------- | ----------------------------- | -------------------------- |
| tanh | 579.662 | 1658.000 | 617.565 |
| tanh_ | 554.477 | 881.997 | 589.426 |

#### OMP_NUM_THREADS=6 Results(Self CPU time total ms)
| Operation | CPU kernel | to_dense/to_mkldnn+CPU kernel | Mkl-Layout kernel(This PR) |
| ---------- | ---------- | ----------------------------- | -------------------------- |
| tanh | 182.387 | 421.336 | 136.226 |
| tanh_ | 94.331 | 404.931 | 99.254 |
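
The "1.6-4.3 times slower" figure quoted in the Motivation can be re-derived from the two tables by dividing each round-trip time by the matching CPU kernel time. The sketch below only repeats that arithmetic on the numbers copied from the tables; the variable and function names are illustrative.

```python
# Self CPU time totals (ms), copied from the tables above.
# (threads, op) -> (CPU kernel, to_dense/to_mkldnn + CPU kernel, Mkl-Layout kernel)
results = {
    (1, "tanh"): (579.662, 1658.000, 617.565),
    (1, "tanh_"): (554.477, 881.997, 589.426),
    (6, "tanh"): (182.387, 421.336, 136.226),
    (6, "tanh_"): (94.331, 404.931, 99.254),
}


def ratios_vs_cpu(row):
    """Return (round-trip / CPU, Mkl-Layout / CPU) for one table row."""
    cpu, round_trip, mkl_layout = row
    return round_trip / cpu, mkl_layout / cpu


ratios = {key: ratios_vs_cpu(row) for key, row in results.items()}

for (threads, op), (round_trip_ratio, mkl_ratio) in ratios.items():
    print(
        f"threads={threads} {op}: "
        f"round trip {round_trip_ratio:.2f}x, Mkl-Layout {mkl_ratio:.2f}x"
    )
```

The round-trip ratios span roughly 1.6 to 4.3, while the Mkl-Layout kernel stays within about 7% of the CPU kernel, and is faster for out-of-place tanh with 6 threads.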

## Modification policy for the code
oneDNN already supports the tanh operation.

[oneDNN: Elementwise](https://spec.oneapi.com/versions/latest/elements/oneDNN/source/primitives/eltwise.html)

A sigmoid implementation that uses the same Elementwise API as tanh already exists, so we wrote this PR's code with reference to the sigmoid implementation.

527c1e0e37/aten/src/ATen/native/mkldnn/UnaryOps.cpp (L28-L42)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/54656

Test Plan:
A test for sigmoid has already been created as shown below.
So, I added a new test of tanh referring to the test of sigmoid.

527c1e0e37/test/test_mkldnn.py (L944-L954)

### mkldnn tanh test result

```
$ python3 test/test_mkldnn.py TestMkldnn.test_tanh
Couldn't download test skip set, leaving all tests enabled...
.
----------------------------------------------------------------------
Ran 1 test in 0.004s

OK
```

Reviewed By: gchanan

Differential Revision: D27395827

Pulled By: ezyang

fbshipit-source-id: d4481332de187e2dea095f9b6aabc73a497960fe
2021-04-05 00:00:16 -07:00
Masaya, Kato
2c4a64589b fix mkldnn_add in-place behavior (#51687)
Summary:
There are the following two patterns for calling add in-place.

```python
torch.add(a, b, out=a) # (1) a in-placed
torch.add(a, b, out=b) # (2) b in-placed
```

If a and b are mkldnn Tensors, the result differs from the expected value in case (2).

**Sample code to reproduce the behavior:**

```python
import torch

torch.manual_seed(4)
a = torch.randn(4, 4)
b = torch.randn(4, 4)
b.fill_(1.0)

a_mkl = a.to_mkldnn()
b_mkl = b.to_mkldnn()

torch.add(b, a, alpha=1.0, out=a)
torch.add(b_mkl, a_mkl, alpha=1.0, out=a_mkl)

print(a)
print(a_mkl)
```

**Results:**

Actual:

```python
tensor([[ 0.0586,  2.2632,  0.8162,  1.1505],
        [ 1.1075,  0.7220, -1.6021,  1.6245],
        [ 0.1316,  0.7949,  1.3976,  1.6699],
        [ 0.9463,  1.0467, -0.7671, -1.1205]])
tensor([[2., 2., 2., 2.],
        [2., 2., 2., 2.],
        [2., 2., 2., 2.],
        [2., 2., 2., 2.]], layout=torch._mkldnn)
```

Expected:

```python
tensor([[ 0.0586,  2.2632,  0.8162,  1.1505],
        [ 1.1075,  0.7220, -1.6021,  1.6245],
        [ 0.1316,  0.7949,  1.3976,  1.6699],
        [ 0.9463,  1.0467, -0.7671, -1.1205]])
tensor([[ 0.0586,  2.2632,  0.8162,  1.1505],
        [ 1.1075,  0.7220, -1.6021,  1.6245],
        [ 0.1316,  0.7949,  1.3976,  1.6699],
        [ 0.9463,  1.0467, -0.7671, -1.1205]], layout=torch._mkldnn)
```

This is because `dnnl::sum` called in `mkldnn_add` has the following specifications:

[oneDNN doc : Sum](https://oneapi-src.github.io/oneDNN/dev_guide_sum.html)

> The sum primitive supports in-place operation, meaning that the src0 tensor can be used as both input and output.
> In-place operation overwrites the original data. Using in-place operation requires the memory footprint of the
> output tensor to be either bigger than or equal to the size of the dst memory descriptor used for primitive creation.

However, in case (2) the output tensor is the second argument rather than the first (src0), so the in-place requirement is not met.
So, we modified it so that a and b are swapped and passed to "sum" in case (2).
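
The swap can be illustrated with a toy model in plain Python. This is not the oneDNN or ATen code: `weighted_sum_into_first` is a hypothetical stand-in for a sum primitive that can only write in place into its first source, and `add_out` is an assumed wrapper name.

```python
def weighted_sum_into_first(scales, srcs):
    """Toy in-place sum: overwrites srcs[0] with scales[0]*srcs[0] + scales[1]*srcs[1]."""
    first, second = srcs
    for i in range(len(first)):
        first[i] = scales[0] * first[i] + scales[1] * second[i]
    return first


def add_out(a, b, out, alpha=1.0):
    """Compute a + alpha * b into `out`, where `out` may alias a or b."""
    if out is a:
        # Case (1): the output is already the first source.
        return weighted_sum_into_first((1.0, alpha), (a, b))
    if out is b:
        # Case (2): swap the operands (and their scales) so that the
        # aliased output becomes the first source.
        return weighted_sum_into_first((alpha, 1.0), (b, a))
    # No aliasing: write the result into a separate buffer.
    out[:] = [x + alpha * y for x, y in zip(a, b)]
    return out
```

Swapping the scales together with the operands keeps the result equal to `a + alpha * b` even when `alpha` is not 1.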

**Environment**
- CPU : Intel(R) Xeon(R) CPU E5-2690 v4 @ 2.60GHz
- build with USE_MKLDNN=1

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51687

Reviewed By: jbschlosser

Differential Revision: D27062172

Pulled By: VitalyFedyunin

fbshipit-source-id: bf76d36f9fdb1b4337d71d87bcdbaf4edb11f12f
2021-03-16 12:54:27 -07:00
XiaobingSuper
793a29a7d5 add OneDNN batch_norm backward (#50460)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/50460

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D26006887

Pulled By: VitalyFedyunin

fbshipit-source-id: 472398772af01a31594096ccc714fd487ed33dd4
2021-03-15 13:30:17 -07:00
XiaobingSuper
33e3deed4f add OneDNN relu backward and reshape backward (#49455)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/49455

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D26006886

Pulled By: VitalyFedyunin

fbshipit-source-id: c81ef115205171b80652800a76170dd759905e28
2021-03-15 13:27:56 -07:00
Elias Ellison
f41c80c267 Dont error on 0-dim in convolution (#51922)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/51922

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D26696701

Pulled By: eellison

fbshipit-source-id: f8b2c19e134931971fac00246920c1584dd43581
2021-03-01 21:22:30 -08:00
Elias Ellison
42bfda36e1 Add 0-dim support for binary mkldnn ops (#51921)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/51921

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D26696696

Pulled By: eellison

fbshipit-source-id: 96ca79c0d6b5ed7c32c14dc4e7c383f2522a85cb
2021-03-01 21:22:26 -08:00
XiaobingSuper
420fc42eab add OneDNN pooling backward (#49454)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/49454

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D26006888

Pulled By: VitalyFedyunin

fbshipit-source-id: 6a4930982db784819fea70ffc9029441d673d90e
2021-02-23 14:45:55 -08:00
XiaobingSuper
8f3ed60d3e enable mkldnn conv2d backward to support mkldnn tensor input (#48994)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/48994

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D25537189

Pulled By: VitalyFedyunin

fbshipit-source-id: d81d247798fad3815b735468d66ef9d62c07ef77
2021-02-18 10:23:10 -08:00
XiaobingSuper
324c6aada1 BFloat16: enable prepacked weights's inference (#48922)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/48922

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D25537188

Pulled By: VitalyFedyunin

fbshipit-source-id: ab6eb1ba8cffb5ba9d00d05db8ef616628f8c932
2021-02-17 11:20:00 -08:00
jiej
bc1b1e8253 fixing mkldnn_linear & backward with silent error (#51713)
Summary:
mkldnn_linear & mkldnn_linear_backward_input give wrong results when the weight is non-contiguous.

Issue exposed in PR https://github.com/pytorch/pytorch/issues/51613

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51713

Reviewed By: zhangguanheng66

Differential Revision: D26282319

Pulled By: ngimel

fbshipit-source-id: 96516e10c9dc72c30dac278fce09b746aa5f51b2
2021-02-05 18:36:30 -08:00
XiaobingSuper
ec378055c3 add OneDNN linear backward (#49453)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/49453

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D26006889

Pulled By: VitalyFedyunin

fbshipit-source-id: 06e2a02b6e01d847395521a31fe84d844f2ee9ae
2021-02-02 12:18:59 -08:00
Jeffrey Wan
c0966914bc Internal gradcheck wrapper in testing._internal that sets certain flags to True (#51133)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/49409

There are many call sites where gradcheck/gradgradcheck is now implicitly invoked with `check_batched_grad` set to True, where it was previously False. Cases fall into two basic categories:
1) the call site previously used `torch.autograd.gradcheck` but now uses the globally imported function instead
2) the call site already used the globally imported function, but does not explicitly pass the `check_batched_grad` flag

Only in the _assertGradAndGradgradChecks cases, which are infrequent, did I assume that the author is aware that omitting the flag means not applying check_batched_grad=True (but maybe that is not the case?).

Overall, this PR in its current state assumes that unless the author explicitly specified `check_batched_grad=False`, they were probably just not aware of this flag and did not mean for it to be False.
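
The defaulting behavior described above can be sketched with a generic wrapper. This is a minimal illustration, not the wrapper added to `testing._internal`: `with_default_flags` and `fake_gradcheck` are hypothetical names, and the stand-in checker only echoes the flag it receives.

```python
import functools


def with_default_flags(check_fn, **defaults):
    """Wrap check_fn so that the given keyword flags default to the supplied values."""

    @functools.wraps(check_fn)
    def wrapper(*args, **kwargs):
        # Only fill in flags the caller did not pass explicitly.
        for name, value in defaults.items():
            kwargs.setdefault(name, value)
        return check_fn(*args, **kwargs)

    return wrapper


def fake_gradcheck(fn, inputs, check_batched_grad=False):
    """Stand-in for the real checker: reports which flag value it was called with."""
    return check_batched_grad


# The wrapped function turns the flag on unless the caller opts out.
gradcheck = with_default_flags(fake_gradcheck, check_batched_grad=True)
```

A call site that omits the flag now gets `check_batched_grad=True`, while one that passes `check_batched_grad=False` explicitly keeps its previous behavior.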

So far exceptions to the above (as discovered by CI) include:
 - Mkldnn (opaque tensors do not have strides) https://app.circleci.com/pipelines/github/pytorch/pytorch/264416/workflows/e4d87886-6247-4305-8526-2696130aa9a4/jobs/10401882/tests
 - all cases in test_sparse (https://app.circleci.com/pipelines/github/pytorch/pytorch/264553/workflows/3c1cbe30-830d-4acd-b240-38d833dccd9b/jobs/10407103)
 - all cases in test_overrides (https://app.circleci.com/pipelines/github/pytorch/pytorch/264553/workflows/3c1cbe30-830d-4acd-b240-38d833dccd9b/jobs/10407236)
 - test_autograd (test_LSTM_grad_and_gradgrad) - (https://app.circleci.com/pipelines/github/pytorch/pytorch/264553/workflows/3c1cbe30-830d-4acd-b240-38d833dccd9b/jobs/10407235)
 - test_data_parallel (test_data_parallel_buffers_requiring_grad) - *SIGSEGV* (https://app.circleci.com/pipelines/github/pytorch/pytorch/264820/workflows/14d89503-040d-4e3d-9f7b-0bc04833589b/jobs/10422697)
 - test_nn (https://app.circleci.com/pipelines/github/pytorch/pytorch/264919/workflows/df79e3ed-8a31-4a8e-b584-858ee99686ff/jobs/10427315)

Possible TODO is to prevent new tests from invoking external gradcheck.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51133

Reviewed By: ezyang

Differential Revision: D26147919

Pulled By: soulitzer

fbshipit-source-id: dff883b50f337510a89f391ea2fd87de2d531432
2021-01-29 09:13:37 -08:00
XiaobingSuper
f66147ebca BFloat16: add explicit dtype support for to_mkldnn and to_dense (#48881)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/48881

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D25537190

Pulled By: VitalyFedyunin

fbshipit-source-id: a61a433c638e2e95576f88f081b64ff171b2316e
2020-12-16 16:09:42 -08:00
Xiang Gao
20ac736200 Remove py2 compatible future imports (#44735)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44735

Reviewed By: mruberry

Differential Revision: D23731306

Pulled By: ezyang

fbshipit-source-id: 0ba009a99e475ddbe22981be8ac636f8a1c8b02f
2020-09-16 12:55:57 -07:00
XiaobingSuper
b72da0cf28 OneDNN: report error for dilation max_pooling and replace AT_ERROR with TORCH_CHECK in oneDNN codes (#43538)
Summary:
Fix https://github.com/pytorch/pytorch/issues/43514.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43538

Reviewed By: agolynski

Differential Revision: D23364302

Pulled By: ngimel

fbshipit-source-id: 8d17752cf33dcacd34504e32b5e523e607cfb497
2020-08-28 10:57:19 -07:00
Zhang, Xiaobing
2b14f2d368 [reland][DNNL]:enable max_pool3d and avg_pool3d (#40996)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/40996

Test Plan: Imported from OSS

Differential Revision: D22440766

Pulled By: VitalyFedyunin

fbshipit-source-id: 242711612920081eb4a7e5a7e80bc8b2d4c9f978
2020-07-16 10:26:45 -07:00
Zhang, Xiaobing
2b8db35c7e [reland][DNNL]:enable batchnorm3d (#40995)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/40995

Test Plan: Imported from OSS

Differential Revision: D22440765

Pulled By: VitalyFedyunin

fbshipit-source-id: b4bf427bbb7010ee234a54e81ade371627f9e82c
2020-07-15 13:56:47 -07:00
Zhang, Xiaobing
b48ee175e6 [reland][DNNL]:enable conv3d (#40691)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/40691

Test Plan: Imported from OSS

Differential Revision: D22296548

Pulled By: VitalyFedyunin

fbshipit-source-id: 8e2a7cf14e8bdfa2f29b735a89e8c83f6119e68d
2020-07-15 13:54:41 -07:00
Zhang, Xiaobing
fc4824aa4a enable mkldnn dilation conv (#40483)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/40483

Reviewed By: ezyang

Differential Revision: D22213696

Pulled By: ngimel

fbshipit-source-id: 0321eee8fcaf144b20a5182aa76f98d505c65400
2020-06-24 13:28:05 -07:00
Yanli Zhao
016cf7d66e Revert D22102408: DNNL: enable conv3d
Test Plan: revert-hammer

Differential Revision:
D22102408

Original commit changeset: 1e95cede429f

fbshipit-source-id: a20b725164177e8571320007548a58cc4779d669
2020-06-22 15:41:51 -07:00
Yanli Zhao
17fe0e2b8a Revert D22102407: DNNL: enable batchnorm3d
Test Plan: revert-hammer

Differential Revision:
D22102407

Original commit changeset: c9dbb61d0538

fbshipit-source-id: d40976aa8120d2d0839624bf02c082d7d1eb610d
2020-06-22 15:39:37 -07:00
Yanli Zhao
13a8ec3cc5 Revert D22102406: DNNL: enable max_pool3d and avg_pool3d
Test Plan: revert-hammer

Differential Revision:
D22102406

Original commit changeset: 296a87188b79

fbshipit-source-id: ff023be5e8dd4bfcd68770cab305da6ba2e03893
2020-06-22 15:23:01 -07:00
Yanli Zhao
9498e24ca8 Revert D22138737: DNNL: enable dilation conv
Test Plan: revert-hammer

Differential Revision:
D22138737

Original commit changeset: 4225bc7d2624

fbshipit-source-id: 7bbafbe9f412a8f9167e3ae4425dbc933ec67c6b
2020-06-22 15:20:55 -07:00
Zhang, Xiaobing
dbcc5b7533 DNNL: enable dilation conv (#40220)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/40220

Test Plan: Imported from OSS

Differential Revision: D22138737

Pulled By: VitalyFedyunin

fbshipit-source-id: 4225bc7d26241b443d18ef9d56326e5a9e6bbeda
2020-06-22 13:14:09 -07:00
Zhang, Xiaobing
c873895722 DNNL: enable max_pool3d and avg_pool3d (#35664)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/35664

Test Plan: Imported from OSS

Differential Revision: D22102406

Pulled By: VitalyFedyunin

fbshipit-source-id: 296a87188b79545741f6b7e136a58e4380564f25
2020-06-22 11:57:12 -07:00
Zhang, Xiaobing
8df35fd755 DNNL: enable batchnorm3d (#35663)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/35663

Test Plan: Imported from OSS

Differential Revision: D22102407

Pulled By: VitalyFedyunin

fbshipit-source-id: c9dbb61d0538ab9e1e76b2815564030b5f89d33e
2020-06-22 11:57:09 -07:00
Zhang, Xiaobing
6ba807cb43 DNNL: enable conv3d (#35662)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/35662

Test Plan: Imported from OSS

Differential Revision: D22102408

Pulled By: VitalyFedyunin

fbshipit-source-id: 1e95cede429f1a950f26bc7052ab33d198857df3
2020-06-22 11:55:04 -07:00
Zhang, Xiaobing
5d4a662846 DNNL: fix F.max_pool2d and F.avg_pool2 issue when stride=None (#39221)
Summary:
For F.max_pool2d and F.avg_pool2d, a **RuntimeError** is raised when stride is **None**. This PR fixes it.
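
The commit message does not describe the fix itself. The sketch below only illustrates the documented default of these functions, where an omitted stride falls back to the kernel size; `resolve_pool_stride` is a hypothetical helper name.

```python
def resolve_pool_stride(kernel_size, stride=None):
    """Return a 2-tuple stride, falling back to kernel_size when stride is None."""

    def as_pair(value):
        # Accept either a single int or a 2-element sequence.
        if isinstance(value, int):
            return (value, value)
        pair = tuple(value)
        if len(pair) != 2:
            raise ValueError("expected an int or a pair of ints")
        return pair

    # A missing stride means "use the kernel size as the stride".
    if stride is None:
        return as_pair(kernel_size)
    return as_pair(stride)
```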
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39221

Differential Revision: D22059565

Pulled By: ngimel

fbshipit-source-id: 2080e1e010815aedd904c58552e92be9f7443d38
2020-06-15 21:00:12 -07:00
Mingfei Ma
9ad14f6b43 cover nn.Conv1d in mkldnn model conversion logic (#38528)
Summary:
The current `to_mkldnn` model conversion logic under `torch.utils.mkldnn` does not cover `nn.Conv1d`. This patch fills the gap, using logic similar to that for `nn.Conv2d`. The model conversion removes unnecessary memory format reorders of input/output tensors and thus speeds up the model.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38528

Differential Revision: D21640325

Pulled By: albanD

fbshipit-source-id: c3340153b5c524e020c097eb4b9e2ffcbde8896d
2020-05-19 13:04:18 -07:00
Wanchao Liang
3526627f46 Use unittest assertWarns instead (#36411)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36411

This PR removes the PyTorch-specific assertWarns and uses the unittest
one instead; it also reformats some tests.

Test Plan: Imported from OSS

Differential Revision: D20998159

Pulled By: wanchaol

fbshipit-source-id: 1280ecff2dd293b95a639d13cc7417fc819c2201
2020-04-13 15:56:42 -07:00
pinzhenx
bd604cb5b7 Upgrade MKL-DNN to DNNL v1.2 (#32422)
Summary:
## Motivation

This PR upgrades MKL-DNN from v0.20 to DNNL v1.2 and resolves https://github.com/pytorch/pytorch/issues/30300.

DNNL (Deep Neural Network Library) is the new brand of MKL-DNN, which improves performance, quality, and usability over the old version.

This PR focuses on the migration of all existing functionalities, including minor fixes, performance improvement and code clean up. It serves as the cornerstone of our future efforts to accommodate new features like OpenCL support, BF16 training, INT8 inference, etc. and to let the Pytorch community derive more benefits from the Intel Architecture.

<br>

## What's included?

Even though DNNL has many breaking changes to the API, we managed to absorb most of them in ideep. This PR contains minimal changes to the integration code in pytorch. Below is a summary of the changes:

<br>

**General:**

1. Replace op-level allocator with global-registered allocator

```
// before
ideep::sum::compute<AllocForMKLDNN>(scales, {x, y}, z);

// after
ideep::sum::compute(scales, {x, y}, z);
```

The allocator is now registered at `aten/src/ATen/native/mkldnn/IDeepRegistration.cpp`. Thereafter all tensors derived from the `cpu_engine` (by default) will use the c10 allocator.

```
RegisterEngineAllocator cpu_alloc(
  ideep::engine::cpu_engine(),
  [](size_t size) {
    return c10::GetAllocator(c10::DeviceType::CPU)->raw_allocate(size);
  },
  [](void* p) {
    c10::GetAllocator(c10::DeviceType::CPU)->raw_deallocate(p);
  }
);
```
------

2. Simplify group convolution

We had such a scenario in convolution where ideep tensor shape mismatched aten tensor: when `groups > 1`, DNNL expects weights tensors to be 5-d with an extra group dimension, e.g. `goihw` instead of `oihw` in 2d conv case.

As shown below, a lot of extra checks came with this difference in shape before. Now we've completely hidden this difference in ideep and all tensors are going to align with pytorch's definition. So we could safely remove these checks from both aten and c2 integration code.

```
// aten/src/ATen/native/mkldnn/Conv.cpp

if (w.ndims() == x.ndims() + 1) {
  AT_ASSERTM(
      groups > 1,
      "Only group _mkldnn_conv2d weights could have been reordered to 5d");
  kernel_size[0] = w.get_dim(0) * w.get_dim(1);
  std::copy_n(
      w.get_dims().cbegin() + 2, x.ndims() - 1, kernel_size.begin() + 1);
} else {
  std::copy_n(w.get_dims().cbegin(), x.ndims(), kernel_size.begin());
}
```
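
For readers more familiar with Python, the removed check can be paraphrased as below. This is a plain-Python restatement of the C++ snippet above for illustration only; `logical_kernel_size` is a hypothetical name, and shapes are given as lists of ints.

```python
def logical_kernel_size(weight_dims, input_ndims, groups):
    """Map a possibly 5-d grouped weight shape back to pytorch's 4-d style shape."""
    if len(weight_dims) == input_ndims + 1:
        # Extra leading group dimension, e.g. goihw instead of oihw.
        assert groups > 1, "only grouped conv weights may carry the extra dimension"
        group_dim, out_per_group, *rest = weight_dims
        # Fold the group dimension back into the output-channel dimension.
        return [group_dim * out_per_group, *rest]
    # Shape already matches the input rank.
    return list(weight_dims[:input_ndims])
```

For example, a `goihw` weight of shape `[2, 8, 4, 3, 3]` with a 4-d input maps to `[16, 4, 3, 3]`.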

------

3. Enable DNNL built-in cache

Previously, we stored DNNL jitted kernels along with intermediate buffers inside ideep using an LRU cache. Now we are switching to the newly added DNNL built-in cache, and **no longer** caching buffers in order to reduce memory footprint.

This change will be mainly reflected in lower memory usage from memory profiling results. On the code side, we removed a couple of lines of `op_key_` that depended on the ideep cache before.

------

4. Use 64-bit integer to denote dimensions

We changed the type of `ideep::dims` from `vector<int32_t>` to `vector<int64_t>`. This renders ideep dims no longer compatible with the 32-bit dims used by caffe2. So we use something like `{stride_.begin(), stride_.end()}` to cast the parameter `stride_` into an int64 vector.

<br>

**Misc changes in each commit:**

**Commit:** change build options

Some build options were slightly changed, mainly to avoid name collisions with other projects that include DNNL as a subproject. In addition, DNNL built-in cache is enabled by option `DNNL_ENABLE_PRIMITIVE_CACHE`.

Old | New
-- | --
WITH_EXAMPLE | MKLDNN_BUILD_EXAMPLES
WITH_TEST | MKLDNN_BUILD_TESTS
MKLDNN_THREADING | MKLDNN_CPU_RUNTIME
MKLDNN_USE_MKL | N/A (MKL is no longer used)

------

**Commit:** aten reintegration

- aten/src/ATen/native/mkldnn/BinaryOps.cpp

    Implement binary ops using new operation `binary` provided by DNNL

- aten/src/ATen/native/mkldnn/Conv.cpp

    Clean up group convolution checks
    Simplify conv backward integration

- aten/src/ATen/native/mkldnn/MKLDNNConversions.cpp

    Simplify prepacking convolution weights

- test/test_mkldnn.py

    Fixed an issue in the conv2d unit test: it previously did not compare conv results between the mkldnn and aten implementations. Instead, it compared mkldnn with mkldnn, because the default cpu path also goes into mkldnn. Now we use `torch.backends.mkldnn.flags` to fix this issue

- torch/utils/mkldnn.py

    Prepack weight tensor on module `__init__` to achieve significantly better performance

------

**Commit:** caffe2 reintegration

- caffe2/ideep/ideep_utils.h

    Clean up unused type definitions

- caffe2/ideep/operators/adam_op.cc & caffe2/ideep/operators/momentum_sgd_op.cc

   Unify tensor initialization with `ideep::tensor::init`. Obsolete `ideep::tensor::reinit`

- caffe2/ideep/operators/conv_op.cc & caffe2/ideep/operators/quantization/int8_conv_op.cc

    Clean up group convolution checks
    Revamp convolution API

- caffe2/ideep/operators/conv_transpose_op.cc

    Clean up group convolution checks
    Clean up deconv workaround code

------

**Commit:** custom allocator

- Register c10 allocator as mentioned above

<br><br>

## Performance

We tested inference on some common models based on user scenarios, and most performance numbers are either better than or on par with DNNL 0.20.

ratio: new / old | Latency (batch=1 4T) | Throughput (batch=64 56T)
-- | -- | --
pytorch resnet18 | 121.4% | 99.7%
pytorch resnet50 | 123.1% | 106.9%
pytorch resnext101_32x8d | 116.3% | 100.1%
pytorch resnext50_32x4d | 141.9% | 104.4%
pytorch mobilenet_v2 | 163.0% | 105.8%
caffe2 alexnet | 303.0% | 99.2%
caffe2 googlenet-v3 | 101.1% | 99.2%
caffe2 inception-v1 | 102.2% | 101.7%
caffe2 mobilenet-v1 | 356.1% | 253.7%
caffe2 resnet101 | 100.4% | 99.8%
caffe2 resnet152 | 99.8% | 99.8%
caffe2 shufflenet | 141.1% | 69.0% †
caffe2 squeezenet | 98.5% | 99.2%
caffe2 vgg16 | 136.8% | 100.6%
caffe2 googlenet-v3 int8 | 100.0% | 100.7%
caffe2 mobilenet-v1 int8 | 779.2% | 943.0%
caffe2 resnet50 int8 | 99.5% | 95.5%

_Configuration:
Platform: Skylake 8180
Latency Test: 4 threads, warmup 30, iteration 500, batch size 1
Throughput Test: 56 threads, warmup 30, iteration 200, batch size 64_

† Shufflenet is one of the few models that require temp buffers during inference. The performance degradation is expected since we no longer cache any buffers in ideep. As a solution, we suggest users opt for a caching allocator such as **jemalloc** as a drop-in replacement for the system allocator in such heavy workloads.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32422

Test Plan:
Perf results: https://our.intern.facebook.com/intern/fblearner/details/177790608?tab=Experiment%20Results

10% improvement for ResNext with avx512, neutral on avx2

More results: https://fb.quip.com/ob10AL0bCDXW#NNNACAUoHJP

Reviewed By: yinghai

Differential Revision: D20381325

Pulled By: dzhulgakov

fbshipit-source-id: 803b906fd89ed8b723c5fcab55039efe3e4bcb77
2020-03-26 22:07:59 -07:00
Dmytro Dzhulgakov
67608cc018 Fix MKLDNN conv2d 5d weight handling (#34115)
Summary:
Effectively backporting c5c00c119f before that PR lands

The bug didn't manifest itself earlier because the MkldnnConv2d constructor didn't reorder the weights. So the issue arose only on the second serialization/deserialization. This also fixes the constructor to deliver better perf right away.

Note that I still serialize a 5d tensor: it was the previous behavior, we have to handle it anyway, and with https://github.com/pytorch/pytorch/issues/32422 the output of `mkldnn_reorder_conv2d_weight` will always be 4d.

cc pinzhenx
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34115

Reviewed By: wanchaol

Differential Revision: D20224685

Pulled By: dzhulgakov

fbshipit-source-id: 24ca9227c4eb4c139096a64ae348808d7478d7dc
2020-03-04 11:26:38 -08:00
Pritam Damania
f050b16dd9 Move pytorch distributed tests to separate folder for contbuild. (#30445)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30445

Create distributed and rpc directories under caffe/test for better management
of unit tests.

Differential Revision: D18702786

fbshipit-source-id: e9daeed0cfb846ef68806f6decfcb57c0e0e3606
2020-01-22 21:16:59 -08:00
Gregory Chanan
29f345831e Error out if legacy Tensor.new is called on alternate layouts / dtypes (#31485)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31485

Fixes: https://github.com/pytorch/pytorch/issues/22158

Test Plan: Imported from OSS

Differential Revision: D19196499

Pulled By: gchanan

fbshipit-source-id: a01ea7641b5fcd00a9d267243539ff64a5492e5f
2019-12-26 07:27:24 -08:00
Jiakai Liu
3b1c3996e1 remove RTTI check for TensorImpl shadow copy (#22773)
Summary:
We introduced RTTI in a recent change: https://github.com/pytorch/pytorch/pull/21613

For the internal mobile build we don't enable '-frtti' yet. This diff replaces
RTTI with an alternative approach.

According to dzhulgakov, we could compare two tensors' type_id directly in most cases,
which is stricter than comparing the TensorImpl subclass type, as the TensorImpl -> type_id
mapping is 1-to-n, but it's more appropriate for this use case.

The only two cases where we can relax direct type comparison (for legacy reasons) are:
1. CPUTensor <-> CUDATensor;
2. SparseCPUTensor <-> SparseCUDATensor;
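
The comparison rule can be sketched in Python as below. This is an illustration of the rule as stated in this message, not the C++ code from the diff: type ids are modelled as plain strings, and `type_ids_compatible` is a hypothetical name.

```python
# Pairs for which the direct type_id comparison is relaxed (from the list above).
RELAXED_PAIRS = {
    frozenset({"CPUTensor", "CUDATensor"}),
    frozenset({"SparseCPUTensor", "SparseCUDATensor"}),
}


def type_ids_compatible(type_id_a, type_id_b):
    """Return True if two type ids match exactly or form one of the relaxed pairs."""
    if type_id_a == type_id_b:
        return True
    return frozenset({type_id_a, type_id_b}) in RELAXED_PAIRS
```

Using unordered pairs makes the relaxed check symmetric, so `CPUTensor <-> CUDATensor` passes in either direction.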
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22773

Differential Revision: D16277696

Pulled By: ljk53

fbshipit-source-id: 043e264fbacc37b7a11af2046983c70ddb62a599
2019-07-15 23:21:57 -07:00
Your Name
d632b1ff3c Expose is_mkldnn to python and register it as torchscript prim op
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22386

Differential Revision: D16074722

Pulled By: bddppq

fbshipit-source-id: b9b2a05a894847640084f063fba68d9db4e6aec1
2019-07-01 12:31:59 -07:00
Junjie Bai
7d81e62562 Add mkldnn tests for running end to end resnet models
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22041

Differential Revision: D15928786

Pulled By: bddppq

fbshipit-source-id: 4b12e5bda2da13aba2d63d357a0a854d59317362
2019-06-20 22:42:49 -07:00