
1044 Commits

Author SHA1 Message Date
Luca Wehrstedt
e5242aaf89 Update TensorPipe submodule (#45433)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45433

Primarily to pick up the fix landed in https://github.com/pytorch/tensorpipe/pull/225, which fixes the handling of scopes in link-local IPv6 addresses, an issue reported by a user.
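A link-local address such as `fe80::1` is only unique per interface, so it carries a scope (zone) ID after a `%`, as in `fe80::1%eth0`. The Python sketch below shows one way to separate the two parts. It is not TensorPipe's C++ code; the function name `split_ipv6_scope` and the interface name `eth0` are hypothetical.

```python
import ipaddress
from typing import Optional, Tuple


def split_ipv6_scope(literal: str) -> Tuple[ipaddress.IPv6Address, Optional[str]]:
    """Split an IPv6 literal such as 'fe80::1%eth0' into (address, scope)."""
    # The scope (zone) ID follows the first '%'; it names the interface
    # that a link-local address belongs to.
    host, sep, scope = literal.partition("%")
    address = ipaddress.IPv6Address(host)  # raises ValueError if malformed
    if sep and not scope:
        raise ValueError(f"empty scope ID in {literal!r}")
    return address, (scope if sep else None)
```

For example, `split_ipv6_scope("fe80::1%eth0")` yields the address `fe80::1` and the scope `"eth0"`, while a global address without `%` yields a scope of `None`.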

Test Plan: The specific upstream change is covered by new unit tests. The submodule update will be validated by the PyTorch CI.

Reviewed By: beauby

Differential Revision: D23962289

fbshipit-source-id: 4ed762fc19c4aeb1398d1337d61b3188c4c228be
2020-09-28 10:32:06 -07:00
Michael Suo
5a0514e3e6 [pytorch] Update fmt to 7.0.3 (#45304)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45304

As title

Test Plan: sandcastle

Reviewed By: malfet

Differential Revision: D23916328

fbshipit-source-id: 47c76886c1f17233304dc59289ff6baa16c50b8d
2020-09-25 11:33:36 -07:00
Shen Li
bfdf4323ac Bump up NCCL to 2.7.8 (#45251)
Summary:
Use latest NCCL

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45251

Reviewed By: mingzhe09088

Differential Revision: D23893064

Pulled By: mrshenli

fbshipit-source-id: 820dd166039e61a5aa59b4c5bbc615a7b18be8c3
2020-09-24 09:33:57 -07:00
Jordan Fix
c760bc8fb1 Add GlowLoadAOTModel flag (#45189)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45189

Pull Request resolved: https://github.com/pytorch/glow/pull/4902

Test Plan: Test locally

Reviewed By: yinghai

Differential Revision: D23810445

fbshipit-source-id: 56e717d80abbfe76b15d0f4249e1e399a9722753
2020-09-23 20:50:04 -07:00
gunandrose4u
acc2a1e5fa Update submodule gloo (#45025)
Summary:
Includes commits that fix the Windows CI failure of the "enable distributed training on Windows" PR.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45025

Reviewed By: beauby

Differential Revision: D23807995

Pulled By: mrshenli

fbshipit-source-id: a2f4c1684927ca66d7d3e9920ecb588fb4386f7c
2020-09-21 10:28:37 -07:00
Lucas Hosseini
ac8c7c4e9f Make Channel API accept buffer structs rather than raw pointers. (#45014)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45014

Pull Request resolved: https://github.com/pytorch/tensorpipe/pull/219

Pull Request resolved: https://github.com/pytorch/tensorpipe/pull/212

+ Introduce buffer.h defining the buffer struct(s). The `CpuBuffer`
struct is always defined, while the `CudaBuffer` struct is defined
only when `TENSORPIPE_SUPPORTS_CUDA` is true.
+ Update all channels to take a `CpuBuffer` or `CudaBuffer` for
`send`/`recv` rather than a raw pointer and a length.
+ Make the base `Channel`/`Context` classes templated on `TBuffer`,
effectively creating two channel hierarchies (one for CPU channels,
one for CUDA channels).
+ Update the Pipe and the generic channel tests to use the new API. So
far, generic channel tests are CPU only, and tests for the CUDA IPC
channel are (temporarily) disabled. A subsequent PR will take care of
refactoring tests so that generic tests work for CUDA channels.
Another PR will add support for CUDA tensors in the Pipe.
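To illustrate the shape of this change, here is a Python sketch; TensorPipe itself is C++, where `Channel` and `Context` are class templates on `TBuffer`. The field names, the `stream` attribute and `LoopbackCpuChannel` are hypothetical. Only the ideas of passing a typed buffer instead of a raw pointer and a length, and of one channel hierarchy per buffer type, come from the description above.

```python
from dataclasses import dataclass
from typing import Callable, Generic, TypeVar


@dataclass
class CpuBuffer:
    data: bytearray  # stands in for the (pointer, length) pair


@dataclass
class CudaBuffer:
    data: bytearray
    stream: int  # hypothetical stand-in for a CUDA stream handle


TBuffer = TypeVar("TBuffer", CpuBuffer, CudaBuffer)
Callback = Callable[[], None]


class Channel(Generic[TBuffer]):
    """Base channel parameterized on its buffer type, like Channel<TBuffer>."""

    def send(self, buffer: TBuffer, callback: Callback) -> None:
        raise NotImplementedError

    def recv(self, buffer: TBuffer, callback: Callback) -> None:
        raise NotImplementedError


class LoopbackCpuChannel(Channel[CpuBuffer]):
    """Toy CPU-only channel: recv() copies the last payload passed to send()."""

    def __init__(self) -> None:
        self._pending = bytearray()

    def send(self, buffer: CpuBuffer, callback: Callback) -> None:
        self._pending = bytearray(buffer.data)  # copy the outgoing payload
        callback()

    def recv(self, buffer: CpuBuffer, callback: Callback) -> None:
        if len(buffer.data) != len(self._pending):
            raise ValueError("receive buffer size does not match payload")
        buffer.data[:] = self._pending
        callback()
```

Because a CPU channel is typed on `CpuBuffer`, a type checker can flag an attempt to hand it a `CudaBuffer`, which mirrors how the two C++ hierarchies stay separate.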

Differential Revision: D23598033

Test Plan: Imported from OSS

Reviewed By: lw

Pulled By: beauby

fbshipit-source-id: 1d6c3f91e288420858835cd5e7962e8da051b44b
2020-09-21 10:18:45 -07:00
Natalia Gimelshein
620c999979 update gloo submodule (#45008)
Summary:
Revert accidental gloo submodule changes in https://github.com/pytorch/pytorch/issues/41977

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45008

Reviewed By: malfet

Differential Revision: D23799892

Pulled By: ngimel

fbshipit-source-id: e8dab244c6abad32ed60efe3c26cab40837e57c8
2020-09-18 19:02:36 -07:00
Nikita Shulga
1c15452703 Update Windows builders to latest VS2019 (#44746)
Summary:
Restore https://github.com/pytorch/pytorch/issues/44706, which should work around the VC compiler crash and was reverted by https://github.com/pytorch/pytorch/issues/41977.
Update configs to use ":stable" Windows images

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44746

Reviewed By: walterddr

Differential Revision: D23793682

Pulled By: malfet

fbshipit-source-id: bfdc36c35b920f58798a18c15642ec7efc68f00e
2020-09-18 18:46:44 -07:00
Sameer Deshmukh
e18a2219dd Implement scatter reductions (CUDA), remove divide/subtract (#41977)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/33394.

This PR does two things:
1. Implement CUDA scatter reductions with revamped GPU atomic operations.
2. Remove support for divide and subtract for CPU reduction, as discussed with ngimel.

I've also updated the docs to reflect the existence of only multiply and add.
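The reduction semantics can be modeled in pure Python for the 1-D case. This is a reference sketch, not the CUDA kernel (which applies updates for duplicate indices with atomics); the function name `scatter_reduce_1d` is hypothetical. In PyTorch the feature is exposed through the `reduce` argument of `Tensor.scatter_`, which accepts `'add'` or `'multiply'`.

```python
from typing import List, Sequence


def scatter_reduce_1d(out: Sequence[float], index: Sequence[int],
                      src: Sequence[float], reduce: str) -> List[float]:
    """Reference semantics: result[index[i]] = result[index[i]] <op> src[i]."""
    # After this PR only 'add' and 'multiply' remain supported.
    if reduce not in ("add", "multiply"):
        raise ValueError(f"unsupported reduction: {reduce!r}")
    result = list(out)
    for i, j in enumerate(index):
        if reduce == "add":
            result[j] += src[i]
        else:
            result[j] *= src[i]
    return result
```

With `index = [0, 2, 0]`, position 0 receives two contributions and position 1 receives none, so it keeps its original value.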

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41977

Reviewed By: mruberry

Differential Revision: D23748888

Pulled By: ngimel

fbshipit-source-id: ea643c0da03c9058e433de96db02b503514c4e9c
2020-09-16 23:25:21 -07:00
pinzhenx
72b5665c4f Upgrade oneDNN (mkl-dnn) to v1.6 (#44706)
Summary:
- Bump oneDNN (mkl-dnn) to 1.6 for bug fixes
    - Fixes https://github.com/pytorch/pytorch/issues/42446. RuntimeError: label is redefined for convolutions with large filter size on Intel AVX512
    - Implemented workaround for internal compiler error when building oneDNN with Microsoft Visual Studio 2019 (https://github.com/pytorch/pytorch/pull/43169)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44706

Reviewed By: ngimel

Differential Revision: D23705967

Pulled By: albanD

fbshipit-source-id: 65e8fecc52a76c9f3324403a8b60ffa8a8948bc6
2020-09-15 09:30:01 -07:00
Facebook Community Bot
a91c2be2a9 Automated submodule update: FBGEMM (#44647)
Summary:
This is an automated pull request to update the first-party submodule for [pytorch/FBGEMM](https://github.com/pytorch/FBGEMM).

New submodule commit: 1d710393d5

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44647

Test Plan: Ensure that CI jobs succeed on GitHub before landing.

Reviewed By: dskhudia

Differential Revision: D23684528

fbshipit-source-id: 316ff2e448707a6e5a83248c9b22e58118bc8741
2020-09-14 16:43:59 -07:00
Facebook Community Bot
870f647040 Automated submodule update: FBGEMM (#44581)
Summary:
This is an automated pull request to update the first-party submodule for [pytorch/FBGEMM](https://github.com/pytorch/FBGEMM).

New submodule commit: 0725301da5

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44581

Test Plan: Ensure that CI jobs succeed on GitHub before landing.

Reviewed By: dskhudia, VitalyFedyunin

Differential Revision: D23665173

fbshipit-source-id: 03cee22335eef0517e561827795bbe2036942ea0
2020-09-13 21:26:56 -07:00
gunandrose4u
9a3b83cbf2 Update submodule gloo to the latest commits so that it works on Windows (#44529)
Summary:

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44529

Reviewed By: rohan-varma

Differential Revision: D23650123

Pulled By: mrshenli

fbshipit-source-id: b5b891cbcec51a14379d6604af63c714c32d93e7
2020-09-11 08:47:02 -07:00
Facebook Community Bot
1130de790c Automated submodule update: FBGEMM (#44177)
Summary:
This is an automated pull request to update the first-party submodule for [pytorch/FBGEMM](https://github.com/pytorch/FBGEMM).

New submodule commit: d5ace7ca70

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44177

Test Plan: Ensure that CI jobs succeed on GitHub before landing.

Reviewed By: dskhudia

Differential Revision: D23533561

fbshipit-source-id: 9e580f8dbfb83e57bebc28f8e459caa0c5fc7317
2020-09-08 10:12:21 -07:00
Daya Khudia
7d95eb8633 [fbgemm] manual submodule update (#44082)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44082

The automated submodule update is running into some test failures, and I am not sure how to rebase it.

automated submodule update:
https://github.com/pytorch/pytorch/pull/43817

Test Plan: CI tests

Reviewed By: jianyuh

Differential Revision: D23489240

fbshipit-source-id: a49b01786ebf0a59b719a0abf22398e1eafa90af
2020-09-03 10:07:46 -07:00
Nikita Shulga
04ccd3ed77 Fix bazel dependencies (#43688)
Summary:
Add `header_template_rule` to `substitution.bzl`
Use it in BUILD.bazel to specify dependencies on autogenerated headers
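The `header_template_rule` added here generates a header from a template at build time. Its exact attributes live in `substitution.bzl` and are not shown in this log, so the Python sketch below models only the core idea, under the assumption that each key in a substitutions mapping is replaced literally by its value. The function name and the `@VERSION@` placeholder are hypothetical.

```python
from typing import Dict


def render_header_template(template: str, substitutions: Dict[str, str]) -> str:
    """Apply literal key -> value replacements to a header template."""
    # Replace longer keys first so a key that is a prefix of another key
    # cannot clobber the longer one.
    for key in sorted(substitutions, key=len, reverse=True):
        template = template.replace(key, substitutions[key])
    return template
```

Declaring the generated header as a rule output, rather than expecting it on disk, is what lets Bazel see the dependency and build it before the targets that include it.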

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43688

Test Plan: bazel build --sandbox_writable_path=$HOME/.ccache -c dbg :caffe2

Reviewed By: seemethere

Differential Revision: D23374702

Pulled By: malfet

fbshipit-source-id: 180dd996d1382df86258bb6abab9f2c7e964152e
2020-08-27 12:11:34 -07:00
Facebook Community Bot
018b4d7abb Automated submodule update: FBGEMM (#43251)
Summary:
This is an automated pull request to update the first-party submodule for [pytorch/FBGEMM](https://github.com/pytorch/FBGEMM).

New submodule commit: 685149bbc0

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43251

Test Plan: Ensure that CI jobs succeed on GitHub before landing.

Reviewed By: YazhiGao

Differential Revision: D23207016

fbshipit-source-id: 54e13b246bb5189260ed11316ddf3d26d52c6b24
2020-08-19 11:42:16 -07:00
Facebook Community Bot
d60d6d0d7b Automated submodule update: FBGEMM (#42834)
Summary:
This is an automated pull request to update the first-party submodule for [pytorch/FBGEMM](https://github.com/pytorch/FBGEMM).

New submodule commit: 29d5eb9f3c

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42834

Test Plan: Ensure that CI jobs succeed on GitHub before landing.

Reviewed By: jspark1105

Differential Revision: D23040145

fbshipit-source-id: 1d7209ea1910419b7837703122b8a4c76380ca4a
2020-08-14 05:43:20 -07:00
Facebook Community Bot
77305c1e44 Automated submodule update: FBGEMM (#42781)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42781

This is an automated pull request to update the first-party submodule for [pytorch/FBGEMM](https://github.com/pytorch/FBGEMM).

New submodule commit: fbd813e29f

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42771

Test Plan: Ensure that CI jobs succeed on GitHub before landing.

Reviewed By: dskhudia

Differential Revision: D23015890

Pulled By: jspark1105

fbshipit-source-id: f0f62969f8744df96a4e7f5aff2ce95baabb2f76
2020-08-10 10:14:56 -07:00
Luca Wehrstedt
05f00532f5 Fix TensorPipe submodule (#42789)
Summary:
Not sure what happened, but I possibly landed a PR on PyTorch that updated the TensorPipe submodule to the commit hash of a *PR* of TensorPipe. Now that the latter PR has been merged, though, that same commit has a different hash, so the commit referenced by PyTorch has become orphaned. This is causing some issues.

Hence I am updating the commit here; this does not change a single line of code.
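A commit is orphaned when no branch or tag can reach it by following parent links. The toy model below, with made-up commit names, shows how a submodule pinned to a PR commit can end up at an unreachable hash once the PR lands under a different hash. That the merge rewrote the commit (for example by a rebase or squash) and that the PR branch was then deleted are assumptions for illustration.

```python
from typing import Dict, Iterable, List, Set


def reachable(heads: Iterable[str], parents: Dict[str, List[str]]) -> Set[str]:
    """Return every commit reachable from the given branch heads."""
    seen: Set[str] = set()
    stack = list(heads)
    while stack:
        commit = stack.pop()
        if commit in seen:
            continue
        seen.add(commit)
        stack.extend(parents.get(commit, []))
    return seen


# "pr1" was the PR's commit; "pr1_merged" is the same change as it landed on main.
parents = {"b": ["a"], "pr1": ["a"], "pr1_merged": ["b"]}
main_history = reachable(["pr1_merged"], parents)
```

Here `"pr1"` is not in `main_history`, so a submodule pointer to it refers to a commit that a fresh clone of main does not contain.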

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42789

Reviewed By: houseroad

Differential Revision: D23023238

Pulled By: lw

fbshipit-source-id: ca2dcf6b7e07ab64fb37e280a3dd7478479f87fd
2020-08-10 02:15:44 -07:00
Facebook Community Bot
4eb66b814e Automated submodule update: FBGEMM (#42713)
Summary:
This is an automated pull request to update the first-party submodule for [pytorch/FBGEMM](https://github.com/pytorch/FBGEMM).

New submodule commit: a989b99279

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42713

Test Plan: Ensure that CI jobs succeed on GitHub before landing.

Reviewed By: amylittleyang

Differential Revision: D22990108

Pulled By: jspark1105

fbshipit-source-id: 3252a0f5ad9546221ef2fe908ce6b896252e1887
2020-08-07 13:41:54 -07:00
Xiang Gao
576aab5084 Bump up NCCL to 2.7.6 (#42645)
Summary:
Because 2.7.3 has a bug on GA100 that is fixed in 2.7.6.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42645

Reviewed By: malfet

Differential Revision: D22977280

Pulled By: mrshenli

fbshipit-source-id: 74779eff90d7d660a988ff33659f3a2237ca7e29
2020-08-06 08:45:59 -07:00
Luca Wehrstedt
c30bc6d4d7 Update TensorPipe submodule (#42522)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42522

Main changes:
- Consolidated CMake files to have a single entry point, rather than having a specialized one for PyTorch.
- Changed the way the preprocessor flags are provided, and changed their name.

There were a few instances in PyTorch's CMake files where we were directly adding TensorPipe's source directory as an include path, which, however, doesn't contain the auto-generated header we now added. We fix that by linking those targets to the `tensorpipe` CMake target, so that they pick up the include directories defined by TensorPipe, which contain that auto-generated header.

I'm turning off SHM and CMA for now because they have never been covered by the CI. I'll enable them in a separate PR so that if they turn out to be flaky we can revert that change without reverting this one.

Test Plan: CI

Reviewed By: malfet

Differential Revision: D22959472

fbshipit-source-id: 1959a41c4a66ef78bf0f3bd5e3964969a2a1bf67
2020-08-06 02:14:58 -07:00
Stephen Chen
54ffb05eff better error message between C2 and glow (#41603)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41603

Pull Request resolved: https://github.com/pytorch/glow/pull/4704

Previously, in the Glow ONNXIFI path, when an error was encountered we logged it to stderr and simply returned ONNXIFI_STATUS_INTERNAL_ERROR to C2. C2 then did CAFFE2_ENFORCE_EQUAL(return_code, ONNXIFI_STATUS_SUCCESS), so the error message that eventually reached the user was something like

   [enforce fail at onnxifi_op.cc:545] eventStatus == ONNXIFI_STATUS_SUCCESS. 1030 vs 0

This diff adds plumbing to get human-readable error messages out of Glow into C2.
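The change amounts to carrying a message alongside the status code so that the caller can surface it. Below is a minimal Python sketch of that pattern. `Status`, `RunResult` and `enforce_success` are hypothetical names, the numeric codes are illustrative rather than real ONNXIFI values, and the actual plumbing between Glow and C2 is C++.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional


class Status(IntEnum):
    SUCCESS = 0
    INTERNAL_ERROR = 1  # illustrative value, not the real ONNXIFI code


@dataclass
class RunResult:
    status: Status
    message: Optional[str] = None  # human-readable detail from the backend


def enforce_success(result: RunResult) -> None:
    """Raise with the backend's message instead of only the bare status code."""
    if result.status != Status.SUCCESS:
        detail = result.message or "no message provided"
        raise RuntimeError(f"{result.status.name} ({int(result.status)}): {detail}")
```

Compared with checking only `status == SUCCESS`, the caller now sees why the backend refused the request, as in the improved log shown in the test plan.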

Test Plan:
Run the ads replayer and overload it with traffic. Previously, the error message sent back to the client was

  E0707 00:57:45.697196 3709559 Caffe2DisaggAcceleratorTask.cpp:493] During running REMOTE_OTHER net: [enforce fail at onnxifi_op.cc:545] eventStatus == ONNXIFI_STATUS_SUCCESS. 1030 vs 0 (Error from operator:....

Now it's

```
E0707 16:46:48.366263 1532943 Client.cpp:966] Exception when calling caffe2_run_disagg_accelerator on remote predictor for model 190081310_0 : apache::thrift::TApplicationException: c10::Error: [enforce fail at onnxifi_op.cc:556] .
Error code: RUNTIME_REQUEST_REFUSED
Error message: The number of allowed queued requests has been exceeded. queued requests: 100 allowed requests: 100
Error return stack:
glow/glow/lib/Runtime/HostManager/HostManager.cpp:673
glow/glow/lib/Onnxifi/HostMana (Error from operator:...
```

Reviewed By: gcatron, yinghai

Differential Revision: D22416857

fbshipit-source-id: 564bc7644d9666eb660725c2dca5637affae9b73
2020-08-05 16:25:13 -07:00
Facebook Community Bot
eb8a5fed38 Automated submodule update: FBGEMM (#42584)
Summary:
This is an automated pull request to update the first-party submodule for [pytorch/FBGEMM](https://github.com/pytorch/FBGEMM).

New submodule commit: 4abc34af1a

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42584

Test Plan: Ensure that CI jobs succeed on GitHub before landing.

Reviewed By: dskhudia

Differential Revision: D22941475

fbshipit-source-id: 29863cad7f77939edb44d337918693879b35cfaa
2020-08-05 09:19:27 -07:00
Facebook Community Bot
c3e2ee725f Automated submodule update: FBGEMM (#42496)
Summary:
This is an automated pull request to update the first-party submodule for [pytorch/FBGEMM](https://github.com/pytorch/FBGEMM).

New submodule commit: 87c378172a

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42496

Test Plan: Ensure that CI jobs succeed on GitHub before landing.

Reviewed By: dskhudia

Differential Revision: D22911638

fbshipit-source-id: f20c83908b51ff56d8bf1d8b46961f70d023c81a
2020-08-04 16:15:26 -07:00
Edward Yang
352e15f1a2 Revert D22812445: Update TensorPipe submodule
Test Plan: revert-hammer

Differential Revision:
D22812445 (2335430086)

Original commit changeset: e6d824bb28f5

fbshipit-source-id: 606632a9aaf2513b5ac949e4d6687aa7563eae5d
2020-07-31 10:16:48 -07:00
Facebook Community Bot
86b2faeb53 Automated submodule update: FBGEMM (#42302)
Summary:
This is an automated pull request to update the first-party submodule for [pytorch/FBGEMM](https://github.com/pytorch/FBGEMM).

New submodule commit: e04b9ce034

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42302

Test Plan: Ensure that CI jobs succeed on GitHub before landing.

Reviewed By: efiks

Differential Revision: D22841424

fbshipit-source-id: 211463b0207da986fc5b451242ae99edf32b9f68
2020-07-30 08:56:34 -07:00
Luca Wehrstedt
2335430086 Update TensorPipe submodule (#42225)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42225

Main changes:
- Consolidated CMake files to have a single entry point, rather than having a specialized one for PyTorch.
- Changed the way the preprocessor flags are provided, and changed their name.

There were a few instances in PyTorch's CMake files where we were directly adding TensorPipe's source directory as an include path, which, however, doesn't contain the auto-generated header we now added. We fix that by adding the `tensorpipe` CMake target as a dependency, so that the include paths defined by TensorPipe, which contain that auto-generated header, are used.

I'm turning off SHM and CMA for now because they have never been covered by the CI. I'll enable them in a separate PR so that if they turn out to be flaky we can revert that change without reverting this one.

Test Plan: CircleCI is all green.

Reviewed By: beauby

Differential Revision: D22812445

fbshipit-source-id: e6d824bb28f5afe75fd765de0430968174f3531f
2020-07-30 02:32:52 -07:00
Facebook Community Bot
c8e15842aa Automated submodule update: FBGEMM (#42205)
Summary:
This is an automated pull request to update the first-party submodule for [pytorch/FBGEMM](https://github.com/pytorch/FBGEMM).

New submodule commit: cad1c21404

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42205

Test Plan: Ensure that CI jobs succeed on GitHub before landing.

Reviewed By: dskhudia

Differential Revision: D22806731

Pulled By: efiks

fbshipit-source-id: 779a9f7f00645e7e65f183e2832dc79117eae5fd
2020-07-29 09:26:18 -07:00
Natalia Gimelshein
b00c05c86c update cub submodule (#42042)
Summary:
Per title

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42042

Reviewed By: mruberry

Differential Revision: D22752345

Pulled By: ngimel

fbshipit-source-id: 363735bfe3d49bab12fedef43b68c9dc9e372815
2020-07-25 17:52:45 -07:00
Facebook Community Bot
9fbcfe848b Automated submodule update: FBGEMM (#41814)
Summary:
This is an automated pull request to update the first-party submodule for [pytorch/FBGEMM](https://github.com/pytorch/FBGEMM).

New submodule commit: 139c6f2292

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41814

Test Plan: Ensure that CI jobs succeed on GitHub before landing.

Reviewed By: dskhudia

Differential Revision: D22648844

fbshipit-source-id: 4cfa8d83585407f870ea2bdee74e1c1f371082eb
2020-07-22 09:38:15 -07:00
Nikita Shulga
30551ea7b2 Update NCCL from 2.4.8 to 2.7.3 (#41608)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/41608

Reviewed By: mrshenli, ngimel

Differential Revision: D22604953

Pulled By: malfet

fbshipit-source-id: 28151e2d5b6ea360b79896cb79c761756687d121
2020-07-20 13:21:47 -07:00
Alphons Jaimon
ce443def01 Grammar patch 1 (.md) (#41599)
Summary:
A minor spell check!
I have gone through a dozen .md files to fix typos.
zou3519, take a look!

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41599

Reviewed By: ezyang

Differential Revision: D22601629

Pulled By: zou3519

fbshipit-source-id: 68d8f77ad18edc1e77874f778b7dadee04b393ef
2020-07-20 10:19:08 -07:00
Stanislau Hlebik
b774ce54f8 remediation of S205607
fbshipit-source-id: 798decc90db4f13770e97cdce3c0df7d5421b2a3
2020-07-17 17:19:47 -07:00
Stanislau Hlebik
8fdea489af remediation of S205607
fbshipit-source-id: 5113fe0c527595e4227ff827253b7414abbdf7ac
2020-07-17 17:17:03 -07:00
Facebook Community Bot
58244a9586 Automated submodule update: FBGEMM (#40332)
Summary:
This is an automated pull request to update the first-party submodule for [pytorch/FBGEMM](https://github.com/pytorch/FBGEMM).

New submodule commit: 73ea1f5828

Pull Request resolved: https://github.com/pytorch/pytorch/pull/40332

Test Plan: Ensure that CI jobs succeed on GitHub before landing.

Reviewed By: gchanan, yns88

Differential Revision: D22150737

fbshipit-source-id: fe7e6787adef9e2fedee5d1a0a1e57bc4760b88c
2020-07-16 10:32:39 -07:00
Hongyi Jia
f27e395a4a [Gloo] update gloo submodule for PyTorch (#41462)
Summary:
To include alltoall

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41462

Test Plan: CI

Reviewed By: osalpekar

Differential Revision: D22544255

Pulled By: jiayisuse

fbshipit-source-id: ad55a50a31e5e5affaf3e14e2401d38f99657dc9
2020-07-15 21:50:08 -07:00
Ashkan Aliabadi
c8deca8ea8 Update pthreadpool to pthreadpool:029c88620802e1361ccf41d1970bd5b07fd6b7bb. (#40524)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/40524

Reviewed By: ezyang

Differential Revision: D22215742

Pulled By: AshkanAliabadi

fbshipit-source-id: ef594e0901337a92b21ddd44e554da66c723eb7c
2020-07-09 10:00:36 -07:00
Zhang, Xiaobing
63e5a53b8c DNNL: fix build error when DNNL using TBB threading pool (#40699)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/40699

Differential Revision: D22286334

Pulled By: albanD

fbshipit-source-id: 0635a0a5e4bf80d44d90c86945d92e98e26ef480
2020-06-29 13:53:18 -07:00
Luca Wehrstedt
a62f8805e7 Update TensorPipe submodule (#40614)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40614

This update pulls in a one-line fix that sets the TCP_NODELAY option on the TCP sockets of the UV transport. It yields large latency gains, about a 25x improvement in one simple benchmark. This resolves a regression that TensorPipe had compared to the ProcessGroup agent and, in fact, ends up beating it by 2x.
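TCP_NODELAY disables Nagle's algorithm, which otherwise coalesces small writes and can hold them back while earlier data is unacknowledged; that hurts small request/response traffic such as RPC. TensorPipe's UV transport is C++ on top of libuv, which offers `uv_tcp_nodelay`. The Python sketch below only shows the same socket option through the standard `socket` module and is not TensorPipe code; the helper name is hypothetical.

```python
import socket


def make_low_latency_socket() -> socket.socket:
    """Create a TCP socket with Nagle's algorithm disabled."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # TCP_NODELAY=1 sends small segments immediately instead of holding
    # them back until earlier unacknowledged data is ACKed.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock
```

The option can be read back with `sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY)`, which returns a nonzero value once it is enabled.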

The benchmark I ran is this, with the two endpoints pinned to different cores of the same machine:
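```
import torch
import torch.distributed.rpc as rpc

# Assumes rpc.init_rpc(...) has already been called on both workers,
# with the callee registered under the name "rhs".

@torch.jit.script
def remote_fn(t: int):
    return t

@torch.jit.script
def local_fn():
    # One million sequential round trips; each waits for its reply.
    for _ in range(1_000_000):
        fut = rpc.rpc_async("rhs", remote_fn, (42,))
        fut.wait()
```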

And the average round-trip time (one iteration) is:
- TensorPipe with SHM: 97.2 µs
- TensorPipe with UV _after the fix_: 205 µs
- Gloo: 440 µs
- TensorPipe with UV _before the fix_: 5 ms
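As a quick sanity check of the headline figures, the ratios can be computed from the averages listed above, with 5 ms converted to 5000 µs. The dictionary keys are just labels chosen for this sketch.

```python
# Average round-trip times reported above, in microseconds.
rtt_us = {
    "tensorpipe_shm": 97.2,
    "tensorpipe_uv_after": 205.0,
    "gloo": 440.0,
    "tensorpipe_uv_before": 5000.0,
}

fix_speedup = rtt_us["tensorpipe_uv_before"] / rtt_us["tensorpipe_uv_after"]
vs_gloo = rtt_us["gloo"] / rtt_us["tensorpipe_uv_after"]
print(f"UV fix speedup: {fix_speedup:.1f}x")  # UV fix speedup: 24.4x
print(f"UV (after) vs Gloo: {vs_gloo:.2f}x")  # UV (after) vs Gloo: 2.15x
```

The first ratio matches the "about 25x" improvement and the second the "beating it by 2x" claim.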

Test Plan: Ran PyTorch RPC test suite

Differential Revision: D22255393

fbshipit-source-id: 3f6825d03317d10313704c05a9280b3043920507
2020-06-26 11:45:51 -07:00
Ashkan Aliabadi
71edd7f175 Update FP16 to FP16:4dfe081cf6bcd15db339cf2680b9281b8451eeb3. (#40526)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/40526

Differential Revision: D22215600

Pulled By: AshkanAliabadi

fbshipit-source-id: 6ff0c17d17f118b64ae34c0007b705c7127f07ef
2020-06-24 16:58:40 -07:00
Ashkan Aliabadi
a208a272cb Update cpuinfo to cpuinfo:63b254577ed77a8004a9be6ac707f3dccc4e1fd9. (#40516)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/40516

Differential Revision: D22215554

Pulled By: AshkanAliabadi

fbshipit-source-id: f779cf6e08cf344b87071c2ffc9b3f7cf4659085
2020-06-24 16:47:24 -07:00
Ashkan Aliabadi
cef35e339f Update FXdiv to FXdiv:b408327ac2a15ec3e43352421954f5b1967701d1. (#40520)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/40520

Differential Revision: D22215614

Pulled By: AshkanAliabadi

fbshipit-source-id: 5e41a3a69522cbfe1cc4ac76a0d1f3e90a58528d
2020-06-24 16:31:25 -07:00
Ashkan Aliabadi
4a0ba62ded Update psimd to psimd:072586a71b55b7f8c584153d223e95687148a900. (#40522)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/40522

Differential Revision: D22215685

Pulled By: AshkanAliabadi

fbshipit-source-id: 78c103c4f7ad21e78069dc86a8ee47aebc9aa73e
2020-06-24 16:21:25 -07:00
Luca Wehrstedt
0e146d2df4 Update TensorPipe submodule (#40374)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40374

To pick up two fixes to MPT:
4b1b855f21
462200aad3

MPT isn't yet used by PyTorch, so this should have no effect.

Test Plan: Export to CircleCI and test

Reviewed By: patricklabatut

Differential Revision: D22160029

fbshipit-source-id: 202ea7487fcde015e5856f71ad6aebdfa6564ee1
2020-06-22 09:40:17 -07:00
Luca Wehrstedt
c3ce35e67b Update TensorPipe submodule
Summary:
This is to import a few features:
- a fix to a race condition happening in SHM's use of epoll
- a new XTH channel, that uses a memcpy to transfer between threads of the same process
- a new MPT channel, that chunks and multiplexes tensors over multiple transport event loops
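MPT is described here only as chunking tensors and multiplexing them over multiple transport event loops. The Python sketch below models that idea with a round-robin assignment of fixed-size chunks to lanes. The scheduling policy, chunk size and function names are assumptions, not MPT's actual algorithm. Reassembly works because chunk `n` always lands in lane `n % num_lanes` at position `n // num_lanes`.

```python
from typing import List


def chunk_round_robin(payload: bytes, num_lanes: int, chunk_size: int) -> List[List[bytes]]:
    """Split payload into fixed-size chunks and deal them across lanes."""
    if num_lanes < 1 or chunk_size < 1:
        raise ValueError("num_lanes and chunk_size must be positive")
    lanes: List[List[bytes]] = [[] for _ in range(num_lanes)]
    for n, start in enumerate(range(0, len(payload), chunk_size)):
        lanes[n % num_lanes].append(payload[start:start + chunk_size])
    return lanes


def reassemble(lanes: List[List[bytes]]) -> bytes:
    """Inverse of chunk_round_robin: read chunk n from lane n % len(lanes)."""
    out: List[bytes] = []
    n = 0
    while True:
        lane = lanes[n % len(lanes)]
        position = n // len(lanes)
        if position >= len(lane):
            break  # chunks were dealt in order, so the first gap is the end
        out.append(lane[position])
        n += 1
    return b"".join(out)
```

In a real channel each lane would be drained by its own event loop in parallel; the sketch only shows the bookkeeping that makes the split reversible.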

Test Plan: Run in CircleCI

Reviewed By: patricklabatut

Differential Revision: D22140736

fbshipit-source-id: a3cee8a3839d98a42b8438844a9fd24fd85b2744
2020-06-19 13:22:06 -07:00
pinzhenx
7f270233fb Upgrade DNNL to 1.5 (#40088)
Summary:
- Bump DNNL to 1.5
- Bug fixes and improvements in ideep
  - suppress g++ Wreorder warning
  - avoid rebuilding `libmkldnn.so` https://github.com/oneapi-src/oneDNN/issues/743
  - enable conv3d (integration code was checked in by Xiaobing https://github.com/pytorch/pytorch/pull/35662)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/40088

Differential Revision: D22071530

Pulled By: albanD

fbshipit-source-id: e7a53d7421e8a7a03e36a7dfb68edc565a2f00df
2020-06-16 11:42:30 -07:00
Luca Wehrstedt
14099374bd Update TensorPipe submodule (#39945)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39945

In order to pick up 8fb1fe66f8.

Test Plan: Export to CircleCI and make sure tests pass.

Reviewed By: patricklabatut

Differential Revision: D22019033

fbshipit-source-id: eb192ea3950e4f27ed222f84e2d9de8bf6eb927c
2020-06-12 12:57:53 -07:00
Luca Wehrstedt
68b8740611 Update TensorPipe submodule (#39783)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39783

This is needed to pick up the new pipe method used in https://github.com/pytorch/pytorch/pull/39781.

Test Plan: CircleCI

Reviewed By: patricklabatut

Differential Revision: D21974131

fbshipit-source-id: 4b74064279ad4881cbd95e408423566a1cd62c2a
2020-06-10 12:41:32 -07:00