Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23837
This is a temporary workaround for an issue in MKL-DNN's convolution backward implementation: https://github.com/pytorch/pytorch/issues/23825
It is used only to unblock quantization testing.
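For context, a hedged sketch of how a test could route around the affected MKL-DNN backward kernel while the upstream issue is open (this assumes the `torch.backends.mkldnn.flags` context manager; the actual mechanism used by this PR may differ):
```python
import torch

# Hypothetical usage: disable MKL-DNN so convolution backward falls back to
# the native ATen implementation, avoiding the buggy kernel.
with torch.backends.mkldnn.flags(enabled=False):
    conv = torch.nn.Conv2d(3, 8, kernel_size=3)
    out = conv(torch.randn(1, 3, 32, 32, requires_grad=True))
    out.sum().backward()  # backward runs on the non-MKL-DNN path
```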
Test Plan: Imported from OSS
Differential Revision: D16659081
Pulled By: jamesr66a
fbshipit-source-id: de18ebe98dec2a042f28b23373e20da2b44a42a2
Summary:
This PR aims to improve BERT performance on CPU by using the `mkldnn` inner product for `nn.Linear()`.
The current logic uses `mkldnn` only when the `input` tensor has mkldnn layout. This PR loosens that condition: `mkldnn` is also used for `nn.Linear()` when the `input` tensor has dense layout. The ATen tensor is viewed in place as an `mkldnn` tensor, without an additional memory copy. Two cases are handled (see the sketch after this list):
1. when `input.dim() >= 3`, it is viewed as a 2d tensor, e.g. `[T, N, C]` is treated as `[TN, C]`;
2. when `input` is not contiguous, it is first copied to contiguous memory, since the `mkldnn` inner product can't handle non-contiguous memory.
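A minimal Python sketch of the logic above (a hypothetical standalone helper for illustration; the actual dispatch happens inside ATen's C++ `linear` path):
```python
import torch

def linear_via_2d_view(input, weight, bias=None):
    # Case 2: the mkldnn inner product needs contiguous memory,
    # so copy only if the input is non-contiguous.
    x = input.contiguous()
    # Case 1: collapse the leading dims, e.g. [T, N, C] -> [TN, C].
    leading = x.shape[:-1]
    x2d = x.view(-1, x.shape[-1])
    out = torch.nn.functional.linear(x2d, weight, bias)
    # Restore the leading dims: [TN, K] -> [T, N, K].
    return out.view(*leading, out.shape[-1])
```
Called with an `input` of shape `[T, N, C]` and a `weight` of shape `[K, C]`, this returns `[T, N, K]`, matching `nn.Linear` semantics.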
With this PR, BERT inference on `glue/MRPC` (batch size = 1) on a single Xeon 6148 socket (20 cores @ 2.5GHz) improves by `44%`:
1. before (unit: iterations/sec):
```bash
408/408 [00:24<00:00, 16.69it/s]
```
2. after (unit: iterations/sec):
```bash
408/408 [00:16<00:00, 24.06it/s]
```
Correspondingly, per-iteration latency drops from `59.92 ms` to `41.56 ms`.
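As a quick sanity check on these numbers (simple arithmetic, not taken from the PR itself):
```python
# Per-iteration latency is the inverse of throughput.
print(1000 / 16.69)       # ≈ 59.92 ms before
print(1000 / 24.06)       # ≈ 41.56 ms after
print(24.06 / 16.69 - 1)  # ≈ 0.44, i.e. the reported 44% speedup
```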
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21851
Differential Revision: D16056334
Pulled By: dzhulgakov
fbshipit-source-id: 9b70ed58323b5e2f3f4e3ebacc766a74a8b68a8a