Commit Graph

101 Commits

Author SHA1 Message Date
Xia, Weiwen
97a291f6bd [ONEDNN][BC-breaking] update onednn from v2.7.3 to v3.1.1 (#97957)
**Summary**
Update onednn from v2.7.3 to v3.1.1.
It is BC-breaking since some APIs have changed on the oneDNN side. Changes include:
- PyTorch code where oneDNN is directly called (one example sketched below)
- Submodule `third_party/ideep`, to adapt to oneDNN's new API
- CMake files, to fix build issues
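
As one illustration of the kind of oneDNN-side API change an update like this has to absorb, here is a hedged sketch based on oneDNN's published v3.x migration notes (not necessarily a change touched by this PR): v3.x removed `primitive_attr::set_output_scales` in favor of a scales mask plus runtime scale arguments.

```
#include <dnnl.hpp>

void quantization_attr(dnnl::primitive_attr& attr) {
  // oneDNN v2.x style (removed in v3.x):
  //   attr.set_output_scales(/*mask=*/0, {0.5f});
  // oneDNN v3.x: declare a scales mask here; the actual scale values are
  // passed as a runtime memory argument at execute() time.
  attr.set_scales_mask(DNNL_ARG_SRC, /*mask=*/0);
}
```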

**Test plan**
Building issues and correctness are covered by CI checks.
For performance, we ran TorchBench models to ensure there is no regression. Below is a comparison before and after the oneDNN update.
![image](https://github.com/pytorch/pytorch/assets/12522207/415a4ff0-7566-40c6-aed0-24997a475b0e)

Note:
- Base commit of PyTorch: da322ea
- CPU: Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz (Ice Lake)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/97957
Approved by: https://github.com/jgong5, https://github.com/jerryzh168
2023-08-25 12:13:18 +00:00
cyy
483f748dd5 [BE] Enforce missing override keyword (#104032)
This PR enables `-Winconsistent-missing-destructor-override` and `-Winconsistent-missing-override`
and fixes violations.

### <samp>🤖 Generated by Copilot at 47e904e</samp>

This pull request updates the code of various classes and operators in the `caffe2` and `aten` subdirectories to use the `override` specifier instead of the `virtual` keyword for destructors and other virtual functions that override a base class function. This improves the code readability, quality, and consistency with C++ best practices. It also modifies the `./CMakeLists.txt` file to enable warnings for these specifiers, but disable errors.
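
For illustration, a minimal sketch (hypothetical class names) of the pattern these warnings catch: once a class overrides anything, every overriding member, including the destructor, should say so explicitly.

```
struct Base {
  virtual ~Base() = default;
  virtual void run() {}
};

struct Derived : Base {
  // Before: `virtual ~Derived() = default;`, flagged by
  // -Winconsistent-missing-destructor-override once siblings use `override`.
  ~Derived() override = default;
  // Before: `virtual void run() {}`, flagged by
  // -Winconsistent-missing-override.
  void run() override {}
};
```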

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104032
Approved by: https://github.com/malfet
2023-06-24 02:34:24 +00:00
Xia, Weiwen
3613ff06b1 [MKLDNN] Rename pooling_avg to pooling_avg_exclude_padding (#90247)
**Summary**
Rename `pooling_avg` to `pooling_avg_exclude_padding` to align with oneDNN v3.0. It does not affect correctness or performance. Same as https://github.com/pytorch/pytorch/pull/87851, which apparently did not cover all occurrences.
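
A minimal sketch of the rename on the oneDNN side (assuming oneDNN headers are available; the old `pooling_avg` was an alias for the exclude-padding behavior):

```
#include <dnnl.hpp>

int main() {
  // Pre-v3.0 code could spell this dnnl::algorithm::pooling_avg; v3.0 keeps
  // only the explicit names, so call sites must pick one:
  auto alg = dnnl::algorithm::pooling_avg_exclude_padding;  // padding zeros excluded
  // auto alg = dnnl::algorithm::pooling_avg_include_padding;  // zeros counted
  (void)alg;
  return 0;
}
```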

**Test plan**
python test/test_mkldnn.py
python caffe2/python/ideep/pool_op_test.py

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90247
Approved by: https://github.com/jgong5, https://github.com/malfet
2023-01-12 00:08:30 +00:00
Will Constable
4f34cd6d1e Replace all CHECK_ and DCHECK_ with TORCH_* macros (#82032)
Avoid exposing defines that conflict with google logging, since this blocks external usage of libtorch in certain cases.

All the 'interesting' changes should be in these two files; the rest should just be mechanical changes via sed:
c10/util/logging_is_not_google_glog.h
c10/util/logging_is_google_glog.h
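
A hedged sketch of what the mechanical sed rename looks like at a call site (`resize_bytes` is a hypothetical function; the streaming form assumes the macros keep glog-style message streaming):

```
#include <c10/util/Logging.h>

void resize_bytes(int64_t n) {
  // Before (macro name collided with glog's CHECK_GE when both were visible):
  //   CHECK_GE(n, 0) << "n must be non-negative";
  // After: the TORCH_-prefixed spelling, same semantics, no collision.
  TORCH_CHECK_GE(n, 0) << "n must be non-negative";
}
```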

Fixes https://github.com/pytorch/pytorch/issues/81415

cc @miladm @malfet
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82032
Approved by: https://github.com/soumith, https://github.com/miladm
2022-07-26 01:20:44 +00:00
Michael Andreas Dagitses
606b234336 turn on -Werror=unused-function in our Bazel CPU build
Summary:
We also fix any existing issues. Note that we only do this for the CPU
build because nvcc is considered a C++ toolchain but it does not have
the same flag support. Adding flags to the GPU build will cause nvcc
errors.
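
A minimal illustration (hypothetical code) of what `-Werror=unused-function` now rejects, and one conventional fix:

```
// With -Werror=unused-function, an uncalled internal-linkage function is a
// hard error:
//   static int helper() { return 42; }
//   // error: unused function 'helper' [-Werror,-Wunused-function]
// Fixes: delete it, call it, or annotate intentional cases:
[[maybe_unused]] static int helper() { return 42; }

int main() { return 0; }
```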

Test Plan: Built locally, rely on CI to confirm.

Reviewers: malfet

Subscribers:

Tasks:

Tags:

Pull Request resolved: https://github.com/pytorch/pytorch/pull/79154

Approved by: https://github.com/seemethere, https://github.com/osalpekar, https://github.com/albanD
2022-06-10 22:11:54 +00:00
PyTorch MergeBot
bcd7a20953 Revert "turn on -Werror=unused-function in our Bazel CPU build"
This reverts commit 67d313a032.

Reverted https://github.com/pytorch/pytorch/pull/79154 on behalf of https://github.com/malfet due to Breaks bazel build: 67d313a032
2022-06-10 20:43:03 +00:00
Michael Andreas Dagitses
67d313a032 turn on -Werror=unused-function in our Bazel CPU build
Summary:
We also fix any existing issues. Note that we only do this for the CPU
build because nvcc is considered a C++ toolchain but it does not have
the same flag support. Adding flags to the GPU build will cause nvcc
errors.

Test Plan: Built locally, rely on CI to confirm.

Reviewers: malfet

Subscribers:

Tasks:

Tags:

Pull Request resolved: https://github.com/pytorch/pytorch/pull/79154

Approved by: https://github.com/seemethere, https://github.com/osalpekar, https://github.com/albanD
2022-06-10 18:30:08 +00:00
Wei Wei
bd3db019a0 Update fbcode symlinks for mkl-dnn ideep 2.5.2
Summary: as titled

Test Plan: buck test caffe2/test:nn

Reviewed By: VitalyFedyunin, luciang

Differential Revision: D34285331

fbshipit-source-id: 5144b3ae1dce02e995d1d633443fb660c57df101
(cherry picked from commit 61f12557b7c924d8cdb97c7be791bccc06e7d30d)
2022-03-04 06:40:08 +00:00
Nikita Shulga
269e92669a [c2] Remove unused private fields (#69709)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69709

Fix logical bug in `caffe2/ideep/operators/conv_op.cc`, which
contained an always-false condition (`fusion_type_ == X && fusion_type_ == Y`)
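
A minimal sketch (hypothetical enumerator names) of why such a conjunction is dead code, and the disjunction presumably intended:

```
enum FusionType { FUSION_CONV_RELU, FUSION_CONV_SUM };

bool always_false(FusionType fusion_type_) {
  // One variable cannot equal two distinct values at once.
  return fusion_type_ == FUSION_CONV_RELU && fusion_type_ == FUSION_CONV_SUM;
}

bool presumably_intended(FusionType fusion_type_) {
  return fusion_type_ == FUSION_CONV_RELU || fusion_type_ == FUSION_CONV_SUM;
}
```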

Test Plan: Imported from OSS

Reviewed By: r-barnes

Differential Revision: D32997006

Pulled By: malfet

fbshipit-source-id: 23e4db1b17cf8a77eae6a8691847ffa484d4736c
2021-12-14 11:31:08 -08:00
Richard Barnes
1433160a36 use irange for loops 6 (#66742)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66742

Modified loops in files under fbsource/fbcode/caffe2/ from the format

`for(TYPE var=x0;var<x_max;x++)`

to the format

`for(const auto var: irange(xmax))`

This was achieved by running r-barnes's loop upgrader script (D28874212), with some modifications to exclude all files under /torch/jit, plus a number of reversions and unused-variable suppressions added by hand.
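
A compilable sketch of the transformation, using `c10::irange` from c10/util/irange.h:

```
#include <c10/util/irange.h>
#include <cstdint>

int64_t sum_upto(int64_t x_max) {
  int64_t total = 0;
  // Before: for (int64_t var = 0; var < x_max; var++) total += var;
  // After: the index is const and the bounds are stated once.
  for (const auto var : c10::irange(x_max)) {
    total += var;
  }
  return total;
}
```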

Test Plan: Sandcastle

Reviewed By: malfet

Differential Revision: D31705366

fbshipit-source-id: be58222426c192406a7f93c21582c3f6f2082401
2021-12-07 16:07:50 -08:00
Stephen Macke
27cc11226d make broadcast fastpath the default for currently rolled-out ops (#68365)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68365

As titled: broadcast fastpath has been running fine for the enabled ops for a while now, so make it the default for these ops.

Test Plan: diff is a no-op, so sandcastle

Differential Revision: D32107847

fbshipit-source-id: b239b127b219985bf7df6a0eea2d879b8e9c79a4
2021-11-15 21:41:57 -08:00
Xue Li
2f099c7555 Revert D30652629: use irange for loops
Test Plan: revert-hammer

Differential Revision:
D30652629 (687c2267d4)

Original commit changeset: 0ae6c4bbbb55

fbshipit-source-id: 5c4f067b584a021c8c9656454d1ee60999600fb3
2021-10-15 15:23:10 -07:00
Richard Barnes
687c2267d4 use irange for loops (#66234)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66234

Modified loops in files under fbsource/fbcode/caffe2/ from the format

`for(TYPE var=x0;var<x_max;x++)`

to the format

`for(const auto var: irange(xmax))`

This was achieved by running r-barnes's loop upgrader script (D28874212), with some modifications to exclude all files under /torch/jit, plus a number of reversions and unused-variable suppressions added by hand.

bypass_size_limit
allow-large-files

Test Plan: Sandcastle

Reviewed By: ngimel

Differential Revision: D30652629

fbshipit-source-id: 0ae6c4bbbb554bad42e372792a6430e1acf15e3e
2021-10-15 13:50:33 -07:00
Stephen Macke
9f9244aabe [dte] scaffolding for c2 operator broadcasting fastpath (1/x) (#62369)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62369

This diff is a big no-op that just sets up scaffolding for passing the "allow_broadcast_fastpath" flag from caffe2 operator protos created in Python down to C++. To facilitate this, we create helper template wrappers that pass the "allow_broadcast_fastpath" flag down to elementwise functors. This flag will determine whether to try to take the broadcast fastpath, which we will add in subsequent diffs.
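
A hedged sketch (all names hypothetical, not the actual caffe2 scaffolding) of how a template wrapper can thread such a flag down to an elementwise functor without changing behavior:

```
#include <cstddef>

// Hypothetical elementwise functor interface.
struct AddFunctor {
  template <typename T>
  void operator()(std::size_t n, const T* a, const T* b, T* out) const {
    for (std::size_t i = 0; i < n; ++i) out[i] = a[i] + b[i];
  }
};

// Wrapper that carries the flag; the fastpath itself lands in later diffs,
// so with the flag unset this is a pure no-op change.
template <typename Functor>
struct WithBroadcastFastpath {
  bool allow_broadcast_fastpath = false;
  template <typename T>
  void operator()(std::size_t n, const T* a, const T* b, T* out) const {
    // if (allow_broadcast_fastpath) { /* future broadcast fastpath */ }
    Functor{}(n, a, b, out);  // default: unchanged slow path
  }
};
```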

Test Plan: sandcastle + let github CI run

Differential Revision: D28154475

fbshipit-source-id: 15750a0bcd2994fbc6a61fb5653d8cae6b0177dd
2021-07-29 16:31:02 -07:00
Nikita Shulga
a9b0a921d5 Disable avoid-non-const-global-variables lint check (#62008)
Summary:
The GoogleTest `TEST` macro is non-compliant with this check, as is `DEFINE_DISPATCH`.

All changes except the ones to `.clang-tidy` are generated using the following script:
```
for i in `find . -type f -iname "*.c*" -or -iname "*.h"|xargs grep cppcoreguidelines-avoid-non-const-global-variables|cut -f1 -d:|sort|uniq`;  do sed -i "/\/\/ NOLINTNEXTLINE(cppcoreguidelines-avoid-non-const-global-variables)/d" $i; done
```
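
For reference, a minimal example of what the now-disabled check flags, and the shape of the suppression lines deleted by the sed loop above:

```
// clang-tidy: cppcoreguidelines-avoid-non-const-global-variables
// Suppression lines of exactly this form were removed:
// NOLINTNEXTLINE(cppcoreguidelines-avoid-non-const-global-variables)
int g_counter = 0;

int main() { return g_counter; }
```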

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62008

Reviewed By: driazati, r-barnes

Differential Revision: D29838584

Pulled By: malfet

fbshipit-source-id: 1b2f8602c945bd4ce50a9bfdd204755556e31d13
2021-07-22 18:04:40 -07:00
Nikita Shulga
3a66a1cb99 [clang-tidy] Exclude cppcoreguidelines-avoid-magic-numbers (#57841)
Summary:
Add cppcoreguidelines-avoid-magic-numbers exclusion to clang-tidy
Remove existing nolint suppressions using the following script:
```
for file in `git ls-files | grep -v \.py`; do gsed '/^ *\/\/ NOLINTNEXTLINE(cppcoreguidelines-avoid-magic-numbers)/d' -i  $file; done
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/57841

Reviewed By: samestep

Differential Revision: D28295045

Pulled By: malfet

fbshipit-source-id: 7c6e8d1213c9593f169ed3df6a916498f1a97163
2021-05-07 20:02:33 -07:00
Nikita Shulga
4cb534f92e Make PyTorch code-base clang-tidy compliant (#56892)
Summary:
This is an automatic change generated by the following script:
```
#!/usr/bin/env python3
from subprocess import check_output, check_call
import os

def get_compiled_files_list():
    import json
    with open("build/compile_commands.json") as f:
        data = json.load(f)
    files = [os.path.relpath(node['file']) for node in data]
    for idx, fname in enumerate(files):
        if fname.startswith('build/') and fname.endswith('.DEFAULT.cpp'):
            files[idx] = fname[len('build/'):-len('.DEFAULT.cpp')]
    return files

def run_clang_tidy(fname):
    check_call(["python3", "tools/clang_tidy.py", "-c", "build", "-x", fname,"-s"])
    changes = check_output(["git", "ls-files", "-m"])
    if len(changes) == 0:
        return
    check_call(["git", "commit","--all", "-m", f"NOLINT stubs for {fname}"])

def main():
    git_files = check_output(["git", "ls-files"]).decode("ascii").split("\n")
    compiled_files = get_compiled_files_list()
    for idx, fname in enumerate(git_files):
        if fname not in compiled_files:
            continue
        if fname.startswith("caffe2/contrib/aten/"):
            continue
        print(f"[{idx}/{len(git_files)}] Processing {fname}")
        run_clang_tidy(fname)

if __name__ == "__main__":
    main()
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/56892

Reviewed By: H-Huang

Differential Revision: D27991944

Pulled By: malfet

fbshipit-source-id: 5415e1eb2c1b34319a4f03024bfaa087007d7179
2021-04-28 14:10:25 -07:00
Sam Estep
8c798e0622 Forbid trailing whitespace (#53406)
Summary:
Context: https://github.com/pytorch/pytorch/pull/53299#discussion_r587882857

These are the only hand-written parts of this diff:
- the addition to `.github/workflows/lint.yml`
- the file endings changed in these four files (to appease FB-internal land-blocking lints):
  - `GLOSSARY.md`
  - `aten/src/ATen/core/op_registration/README.md`
  - `scripts/README.md`
  - `torch/csrc/jit/codegen/fuser/README.md`

The rest was generated by running this command (on macOS):
```
git grep -I -l ' $' -- . ':(exclude)**/contrib/**' ':(exclude)third_party' | xargs gsed -i 's/ *$//'
```

I looked over the auto-generated changes and didn't see anything that looked problematic.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/53406

Test Plan:
This run (after adding the lint but before removing existing trailing spaces) failed:
- https://github.com/pytorch/pytorch/runs/2043032377

This run (on the tip of this PR) succeeded:
- https://github.com/pytorch/pytorch/runs/2043296348

Reviewed By: walterddr, seemethere

Differential Revision: D26856620

Pulled By: samestep

fbshipit-source-id: 3f0de7f7c2e4b0f1c089eac9b5085a58dd7e0d97
2021-03-05 17:22:55 -08:00
Yury Gitman
9c8f5cb61d Ensure IDEEP transpose operator works correctly
Summary: I found that without exporting to public format, the IDEEP transpose operator in the middle of a convolution net produces incorrect results (probably reading some out-of-bounds memory). Exporting to public format might not be the most efficient solution, but it at least ensures correct behavior.

Test Plan: Running ConvFusion followed by transpose should give identical results on CPU and IDEEP

Reviewed By: bwasti

Differential Revision: D22970872

fbshipit-source-id: 1ddca16233e3d7d35a367c93e72d70632d28e1ef
2020-08-11 12:58:31 -07:00
pinzhenx
bd604cb5b7 Upgrade MKL-DNN to DNNL v1.2 (#32422)
Summary:
## Motivation

This PR upgrades MKL-DNN from v0.20 to DNNL v1.2 and resolves https://github.com/pytorch/pytorch/issues/30300.

DNNL (Deep Neural Network Library) is the new brand of MKL-DNN, which improves performance, quality, and usability over the old version.

This PR focuses on the migration of all existing functionality, including minor fixes, performance improvements, and code cleanup. It serves as the cornerstone of our future efforts to accommodate new features like OpenCL support, BF16 training, and INT8 inference, and to let the PyTorch community derive more benefits from the Intel Architecture.

<br>

## What's included?

Even though DNNL has many breaking changes to its API, we managed to absorb most of them in ideep. This PR contains minimal changes to the integration code in PyTorch. Below is a summary of the changes:

<br>

**General:**

1. Replace op-level allocator with global-registered allocator

```
// before
ideep::sum::compute<AllocForMKLDNN>(scales, {x, y}, z);

// after
ideep::sum::compute(scales, {x, y}, z);
```

The allocator is now registered at `aten/src/ATen/native/mkldnn/IDeepRegistration.cpp`. Thereafter, all tensors derived from the `cpu_engine` (by default) will use the c10 allocator.

```
RegisterEngineAllocator cpu_alloc(
  ideep::engine::cpu_engine(),
  [](size_t size) {
    return c10::GetAllocator(c10::DeviceType::CPU)->raw_allocate(size);
  },
  [](void* p) {
    c10::GetAllocator(c10::DeviceType::CPU)->raw_deallocate(p);
  }
);
```
------

2. Simplify group convolution

We had a scenario in convolution where the ideep tensor shape mismatched the aten tensor shape: when `groups > 1`, DNNL expects weight tensors to be 5-d with an extra group dimension, e.g. `goihw` instead of `oihw` in the 2d conv case.

As shown below, a lot of extra checks came with this difference in shape before. Now we've completely hidden this difference in ideep, and all tensors align with PyTorch's definition, so we can safely remove these checks from both the aten and c2 integration code.

```
// aten/src/ATen/native/mkldnn/Conv.cpp

if (w.ndims() == x.ndims() + 1) {
  AT_ASSERTM(
      groups > 1,
      "Only group _mkldnn_conv2d weights could have been reordered to 5d");
  kernel_size[0] = w.get_dim(0) * w.get_dim(1);
  std::copy_n(
      w.get_dims().cbegin() + 2, x.ndims() - 1, kernel_size.begin() + 1);
} else {
  std::copy_n(w.get_dims().cbegin(), x.ndims(), kernel_size.begin());
}
```

------

3. Enable DNNL built-in cache

Previously, we stored DNNL jitted kernels along with intermediate buffers inside ideep using an LRU cache. Now we are switching to the newly added DNNL built-in cache, and **no longer** caching buffers in order to reduce memory footprint.

This change will mainly be reflected in lower memory usage in memory profiling results. On the code side, we removed a couple of lines of `op_key_` bookkeeping that depended on the ideep cache.

------

4. Use 64-bit integer to denote dimensions

We changed the type of `ideep::dims` from `vector<int32_t>` to `vector<int64_t>`. This renders ideep dims no longer compatible with the 32-bit dims used by caffe2, so we use something like `{stride_.begin(), stride_.end()}` to convert a parameter such as `stride_` into an int64 vector.
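
A minimal sketch of that widening conversion (standard library only; `stride_` is illustrative):

```
#include <cstdint>
#include <vector>

int main() {
  std::vector<int> stride_{2, 2};  // caffe2-style 32-bit parameters
  // Iterator-range construction widens each element to int64_t, which is
  // what `{stride_.begin(), stride_.end()}` does at the call sites.
  std::vector<int64_t> dims(stride_.begin(), stride_.end());
  return static_cast<int>(dims.size());
}
```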

<br>

**Misc changes in each commit:**

**Commit:** change build options

Some build options were slightly changed, mainly to avoid name collisions with other projects that include DNNL as a subproject. In addition, DNNL built-in cache is enabled by option `DNNL_ENABLE_PRIMITIVE_CACHE`.

Old | New
-- | --
WITH_EXAMPLE | MKLDNN_BUILD_EXAMPLES
WITH_TEST | MKLDNN_BUILD_TESTS
MKLDNN_THREADING | MKLDNN_CPU_RUNTIME
MKLDNN_USE_MKL | N/A (not use MKL anymore)

------

**Commit:** aten reintegration

- aten/src/ATen/native/mkldnn/BinaryOps.cpp

    Implement binary ops using new operation `binary` provided by DNNL

- aten/src/ATen/native/mkldnn/Conv.cpp

    Clean up group convolution checks
    Simplify conv backward integration

- aten/src/ATen/native/mkldnn/MKLDNNConversions.cpp

    Simplify prepacking convolution weights

- test/test_mkldnn.py

    Fixed an issue in the conv2d unit test: it did not actually compare conv results between the mkldnn and aten implementations. Instead, it compared mkldnn with mkldnn, since the default CPU path also goes through mkldnn. Now we use `torch.backends.mkldnn.flags` to fix this.

- torch/utils/mkldnn.py

    Prepack the weight tensor in the module's `__init__` to achieve significantly better performance

------

**Commit:** caffe2 reintegration

- caffe2/ideep/ideep_utils.h

    Clean up unused type definitions

- caffe2/ideep/operators/adam_op.cc & caffe2/ideep/operators/momentum_sgd_op.cc

    Unify tensor initialization with `ideep::tensor::init`, obsoleting `ideep::tensor::reinit`

- caffe2/ideep/operators/conv_op.cc & caffe2/ideep/operators/quantization/int8_conv_op.cc

    Clean up group convolution checks
    Revamp convolution API

- caffe2/ideep/operators/conv_transpose_op.cc

    Clean up group convolution checks
    Clean up deconv workaround code

------

**Commit:** custom allocator

- Register c10 allocator as mentioned above

<br><br>

## Performance

We tested inference on some common models based on user scenarios, and most performance numbers are either better than or on par with DNNL 0.20.

ratio: new / old | Latency (batch=1 4T) | Throughput (batch=64 56T)
-- | -- | --
pytorch resnet18 | 121.4% | 99.7%
pytorch resnet50 | 123.1% | 106.9%
pytorch resnext101_32x8d | 116.3% | 100.1%
pytorch resnext50_32x4d | 141.9% | 104.4%
pytorch mobilenet_v2 | 163.0% | 105.8%
caffe2 alexnet | 303.0% | 99.2%
caffe2 googlenet-v3 | 101.1% | 99.2%
caffe2 inception-v1 | 102.2% | 101.7%
caffe2 mobilenet-v1 | 356.1% | 253.7%
caffe2 resnet101 | 100.4% | 99.8%
caffe2 resnet152 | 99.8% | 99.8%
caffe2 shufflenet | 141.1% | 69.0% †
caffe2 squeezenet | 98.5% | 99.2%
caffe2 vgg16 | 136.8% | 100.6%
caffe2 googlenet-v3 int8 | 100.0% | 100.7%
caffe2 mobilenet-v1 int8 | 779.2% | 943.0%
caffe2 resnet50 int8 | 99.5% | 95.5%

_Configuration:
Platform: Skylake 8180
Latency Test: 4 threads, warmup 30, iteration 500, batch size 1
Throughput Test: 56 threads, warmup 30, iteration 200, batch size 64_

† Shufflenet is one of the few models that require temp buffers during inference. The performance degradation is expected, since we no longer cache any buffers in ideep. As a solution, we suggest users opt for a caching allocator like **jemalloc** as a drop-in replacement for the system allocator in such heavy workloads.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32422

Test Plan:
Perf results: https://our.intern.facebook.com/intern/fblearner/details/177790608?tab=Experiment%20Results

10% improvement for ResNext with avx512, neutral on avx2

More results: https://fb.quip.com/ob10AL0bCDXW#NNNACAUoHJP

Reviewed By: yinghai

Differential Revision: D20381325

Pulled By: dzhulgakov

fbshipit-source-id: 803b906fd89ed8b723c5fcab55039efe3e4bcb77
2020-03-26 22:07:59 -07:00
xiaobing.zhang
19bb496a0d Enable mkldnn on windows (#31355)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/15982.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31355

Differential Revision: D19428979

Pulled By: ezyang

fbshipit-source-id: bee304c5913e70e8dead3098e9796051861cd666
2020-01-27 09:00:02 -08:00
Pieter Noordhuis
3556bea5aa Build torch.distributed with Gloo backend on macOS (#25260)
Summary:
In facebookincubator/gloo#212, a libuv based Gloo transport was introduced,
which allows us to use Gloo on macOS (and later perhaps also Windows). This
commit updates CMake code to enable building with USE_DISTRIBUTED=1 on macOS.

A few notes:
* The Caffe2 ops are not compiled, for they depend on `gloo::transport::tcp`.
* The process group implementation uses `gloo::transport::tcp` on Linux (because of `epoll(2)`) and `gloo::transport::uv` on macOS.
* The TCP store works but sometimes crashes on process termination.
* The distributed tests are not yet run.
* The nightly builds don't use `USE_DISTRIBUTED=1`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/25260

Reviewed By: mrshenli

Differential Revision: D17202381

Pulled By: pietern

fbshipit-source-id: ca80a82e78a05b4154271d2fb0ed31c8d9f26a7c
2019-09-05 07:09:50 -07:00
Cheng,Penghui
7ee82d48a8 Removed workaround for convolution transpose op since the bug has been fixed in v0.18 (#22184)
Summary:
The workaround is no longer needed since the bug has been fixed in v0.18.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22184

Differential Revision: D15982627

Pulled By: bddppq

fbshipit-source-id: 8725d5b5e5b68e029ffb08af12b416bd310c9638
2019-06-25 14:34:34 -07:00
Jinghui
29c849ff34 implement transpose operator for MKLDNN (#19955)
Summary:
Implement the transpose operator for MKLDNN:
1. upgrade mkldnn-bridge to support ND transpose;
2. implement the transpose operator in caffe2.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19955

Differential Revision: D15701832

Pulled By: bddppq

fbshipit-source-id: e4337cd0ba6f8180a35c8c70cbb6830a0a84182f
2019-06-11 01:55:13 -07:00
Cheng,Penghui
74f6c55f0f support negative axis in concat and split operators
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17955

Differential Revision: D14476031

Pulled By: ezyang

fbshipit-source-id: e0e57e8595ed2005ded9e923572a40fe62aca5a7
2019-06-10 15:26:29 -07:00
Tim Khatkevich
a5cca4d342 add failback for Sign operator (#21343)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21343

Needed to binarise features

Reviewed By: yinghai

Differential Revision: D15625653

fbshipit-source-id: 52f48259a040dac35a7000bb1eea9feb5c7ef1ab
2019-06-07 10:56:22 -07:00
Gu, Jinghui
b675f07bb6 Remove useless input shape checker in conv (#19608)
Summary:
The input shape checkers in the conv/int8_conv operators aim to avoid an issue when running with MKL-DNN Winograd: the weights have to be reordered each time the input shape changes.
However, the checkers cause a big performance regression due to the frequent reorders.

Meanwhile, in mkldnn-bridge, this case has already been fixed by correcting the prop_kind.
Therefore, we remove the useless checkers to fix the performance regression.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19608

Differential Revision: D15061169

Pulled By: yinghai

fbshipit-source-id: 649a43ae6fce989e84939210f6dffb143ec3d350
2019-04-24 11:39:43 -07:00
Gu, Jinghui
575aebc182 implement operators for DNNLOWP (#18656)
Summary:
Implement operators for DNNLOWP, including int8_conv, int8_FC, int8_pooling, int8_relu, int8_sum, quantize/dequantize, and order_switch operators.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18656

Differential Revision: D14767092

Pulled By: yinghai

fbshipit-source-id: 1f3e24929a358a42214da333bd304c593ea4468f
2019-04-10 12:04:39 -07:00
Gu, Jinghui
a7b82a44c4 Upgrade mkldnn-bridge for dnnlowp support (#16308)
Summary:
The mkldnn-bridge is upgraded in this PR to support DNNLOWP operators.
Meanwhile, APIs have been updated in caffe2 to use the latest version.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16308

Differential Revision: D14697018

Pulled By: yinghai

fbshipit-source-id: ca952589098accb08295fd5aa92924c61e74d69c
2019-04-03 12:47:17 -07:00
Tim Khatkevich
6aacc1b2dd Support failback for more operators in ideep (#17747)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17747

RMACRegions, Normalize and RoIPooling

Reviewed By: dskhudia

Differential Revision: D14365096

fbshipit-source-id: dafcb7077515e03c2880832a442015b70fc7140d
2019-03-08 05:48:22 -08:00
Michael Liu
f9ba3831ef Apply modernize-use-override (4)
Summary:
Use C++11’s override and remove virtual where applicable.
Changes are automatically generated.
bypass-lint
drop-conflicts

Reviewed By: ezyang

Differential Revision: D14191981

fbshipit-source-id: 1f3421335241cbbc0cc763b8c1e85393ef2fdb33
2019-02-25 08:31:27 -08:00
Gu, Jinghui
60de0b885f fallback operators to CPU for onnx support (#15270)
Summary:
fallback operators to CPU for onnx support
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15270

Differential Revision: D14099496

Pulled By: yinghai

fbshipit-source-id: 52b744aa5917700a802bdf19f7007cdcaa6e640a
2019-02-22 10:47:53 -08:00
Cheng,Penghui
376bb40379 Implementation convolutionTranspose operator for mkl-dnn (#12866)
Summary:
The speed-up of a single operation is up to 2-3X on BDW.
This PR depends on https://github.com/pytorch/pytorch/pull/14308
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12866

Differential Revision: D13936110

Pulled By: ezyang

fbshipit-source-id: 34e3c2ca982a41e8bf556e2aa0477c999fc939d3
2019-02-20 17:26:10 -08:00
Michael Liu
5f866d0ea2 Apply modernize-use-override (2nd iteration)
Summary:
Use C++11’s override and remove virtual where applicable.
Changes are automatically generated.

Reviewed By: Orvid

Differential Revision: D14086124

fbshipit-source-id: 2005227d095d776ca3b4309a57f54e25782b9b58
2019-02-14 16:52:57 -08:00
Gu, Jinghui
5ada54e0bc Impl ExpandDims op and fallback to CPU if needed (#15264)
Summary:
Impl ExpandDims op and fallback to CPU if needed
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15264

Differential Revision: D13808797

Pulled By: yinghai

fbshipit-source-id: 7795ec303a46e85f84e5490273db0ec76e8b9374
2019-02-08 12:04:53 -08:00
Gu, Jinghui
887080e92a Fallback sum/add to CPU if needed (#15267)
Summary:
Fallback sum/add to CPU if needed
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15267

Differential Revision: D13935064

Pulled By: yinghai

fbshipit-source-id: eb228683d00a0462a1970f849d35365bc98340d6
2019-02-06 09:35:14 -08:00
Hui Wu
31ab03e34f Add Winograd Conv method for CPU (#15196)
Summary:
Add the Winograd conv method. Users can select direct conv or Winograd conv in the model file.
We closed the original PR https://github.com/pytorch/pytorch/pull/12154 and created this new one for better rebasing.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15196

Differential Revision: D13463721

Pulled By: yinghai

fbshipit-source-id: c5cd5c8aa7622ae7e52aeabd3dbb8ffb99b9b4ee
2019-02-01 16:41:30 -08:00
Xiaomeng Yang
4ae9ab24b6 Update conv_base to support empty batch (#16603)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16603

Update conv_base to support empty batch

Reviewed By: houseroad

Differential Revision: D13894111

fbshipit-source-id: fc4370ff16ba6046f374e77bd845d28e6af05ea3
2019-01-31 23:46:18 -08:00
Tim Khatkevich
2ed5569bd6 Support fallback for more operators (#16566)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16566

it's a follow-up to https://github.com/pytorch/pytorch/pull/16456

Reviewed By: yinghai

Differential Revision: D13881462

fbshipit-source-id: eff063580ac8f622477417ed4b25320299451811
2019-01-30 13:21:20 -08:00
Yinghai Lu
fa717cba63 Support int64_t shape data for ideep reshape op
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16533

Reviewed By: jerryzh168

Differential Revision: D13867402

fbshipit-source-id: ff53a851f142ef915ad69da3868bb3aab4d48987
2019-01-30 09:00:09 -08:00
Tim Khatkevich
7d7855ea31 Fallback support for more operators (#16456)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16456

Adding fallbacks for more operators and fixing ifndef for expand_op.h

Reviewed By: yinghai

Differential Revision: D13845382

fbshipit-source-id: b7c5b7f7f176707b9ddffade139562a8085967ed
2019-01-30 03:54:11 -08:00
Jerry Zhang
539894d70a Remove caffe2::ShareData (#16139)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16139

Original commit changeset: 4b15a4c62995

Reviewed By: dzhulgakov

Differential Revision: D13677464

fbshipit-source-id: 1a644a88fac02b44feebac48ccc01bc72cc47edb
2019-01-25 15:39:11 -08:00
Gu, Jinghui
0e6791b275 Impl Shape op for mkldnn (#15266)
Summary:
Impl Shape op for mkldnn
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15266

Differential Revision: D13804558

Pulled By: yinghai

fbshipit-source-id: 8a35f608c23973d7a15c3d645aee4059eb55f245
2019-01-25 11:04:57 -08:00
Shane Li
620ff25bdb Enhance cpu support on gloo based multi-nodes mode. (#11330)
Summary:
1. Add some gloo communication operators to the related fallback list;
2. Work around compile errors when using a fallback operator whose CPU operator inherits directly from `OperatorBase`, like PrefetchOperator;
3. Add new CPU context support for some Python module files and the resnet50 training example file.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11330

Reviewed By: yinghai

Differential Revision: D13624519

Pulled By: wesolwsk

fbshipit-source-id: ce39d57ddb8cd7786db2e873bfe954069d972f4f
2019-01-15 11:47:10 -08:00
Jerry Zhang
6371bc76a9 Back out "[pt1][tensor] Remove caffe2::ShareData" (#15983)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15983

Original commit changeset: 6e4275d02f4c

Reviewed By: supertopher, Yangqing

Differential Revision: D13644123

fbshipit-source-id: 4b15a4c62995c0e68aad58465600409e302e6504
2019-01-12 07:07:22 -08:00
Cheng,Penghui
926e718d5f Add/fallback some operators for mkl-dnn (#11696)
Summary:
Implement the LeakyRelu operator for mkl-dnn; the speed-up of a single operation is up to 10X on BDW.
Implement the reshape operator for mkl-dnn; it resolves an occasional crash when using the fallback reshape operator.
Implement the CreateBlobQueue and SafeEnqueueBlobs operators; this resolves a crash when using the fallback operators.
Fall back the CreateBlobsQueueDBOp, TensorProtosDBInput, and CloseBlobsQueue operators.
Implement the adam operator for mkl-dnn; the speed-up of a single operator is up to 6X on BDW.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11696

Reviewed By: yinghai

Differential Revision: D10100438

Pulled By: wesolwsk

fbshipit-source-id: 0b6e06897cc11e0a8e349d80a870b1e72e47f10d
2019-01-11 12:53:06 -08:00
Gu, Jinghui
07ea3e035e Fix fallback issues to handle inplace case (#15726)
Summary:
Fix fallback issues to handle inplace case
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15726

Differential Revision: D13591243

Pulled By: yinghai

fbshipit-source-id: 6897f1daacb36beabcdfc22c39242bbdfdd0e534
2019-01-10 19:47:09 -08:00
Jerry Zhang
ede1f4ad05 Remove caffe2::ShareData (#15418)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15418

Previously we were using Resize + ShareData.
Instead, we create a function on Tensor that clones itself with the same storage.

Suppose we want `t` to `ShareData` with `t0`. Previously:
```
Tensor t(dims, CPU);
t.Resize(t0.sizes());
t.ShareData(t0);
```
Now:
```
Tensor t = t0.Alias();
```

Reviewed By: dzhulgakov

Differential Revision: D13507609

fbshipit-source-id: 6e4275d02f4c3356cbce91127f1b01111dc86b9f
2019-01-08 11:01:56 -08:00
Gu, Jinghui
2ebeb33697 Fallback to CPU concat op to handle TensorCPU inputs (#15263)
Summary:
Fallback to CPU concat op to handle TensorCPU inputs
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15263

Differential Revision: D13587030

Pulled By: yinghai

fbshipit-source-id: 010a8579d61c3beb8556eb92493a552b2ab0030c
2019-01-07 11:13:23 -08:00
Cheng,Penghui
1488c5dd03 support 0 size in any of the tensor dimensions in mkldnn (#15295)
Summary:
support 0 size in any of the tensor dimensions in mkldnn
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15295

Differential Revision: D13573747

Pulled By: yinghai

fbshipit-source-id: 5bf7a0b9e2567e80f44981a7823be5407fc94e53
2019-01-04 22:33:18 -08:00