Commit Graph

842 Commits

Author SHA1 Message Date
mattip
f10fbcc820 Split up documentation into subpages and clean up some warnings (#37419)
Summary:
xref gh-32838, gh-34032

This is a major refactor of parts of the documentation to split it up using sphinx's `autosummary` feature, which will build out `autofunction` and `autoclass` stub files and link to them. The end result is that the top module pages like torch.nn.rst and torch.rst are now more like tables of contents for the actual single-class or single-function documentation pages.

Along the way, I modified many of the docstrings to eliminate sphinx warnings when building. I think the only thing I changed from a non-documentation perspective is to add names to `__all__` when adding them to `globals()` in `torch.__init__.py`

I do not know the CI system: are the documentation build artifacts available after the build, so reviewers can preview before merging?
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37419

Differential Revision: D21337640

Pulled By: ezyang

fbshipit-source-id: d4ad198780c3ae7a96a9f22651e00ff2d31a0c0f
2020-05-04 09:39:22 -07:00
Shen Li
ba7461c135 Add pointer to RPC parameter server tutorial (#37667)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/37667

Test Plan: Imported from OSS

Differential Revision: D21351052

Pulled By: mrshenli

fbshipit-source-id: 8c3f78215f40b5641983f1aea4ac92152a9c136a
2020-05-01 12:18:45 -07:00
Shen Li
49c8a37a0d Fix doc-gen warnings in RPC (#37666)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37666

Add `:orphan:` to avoid "WARNING: document isn't included in any toctree".

Test Plan: Imported from OSS

Differential Revision: D21351053

Pulled By: mrshenli

fbshipit-source-id: 6ff67c418fc1de410c7dc39ad9a0be5c30d07122
2020-05-01 12:17:15 -07:00
Jesse Brizzi
bca82801e7 add support for generating Vandermonde matrices (#36725)
Summary:
Adds support for generating Vandermonde matrices based off of the Numpy implementation found [here](https://github.com/numpy/numpy/blob/v1.17.0/numpy/lib/twodim_base.py#L475-L563).

Adds a test to ensure the generated matrix matches the expected NumPy implementation. Note that the tests are limited to torch.long and torch.double due to differences in how PyTorch and NumPy handle type promotion.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36725

Differential Revision: D21075138

Pulled By: jessebrizzi

fbshipit-source-id: 6bb1559e8247945714469b0e2b07c6f4d5fd1fd0
2020-04-29 13:16:26 -07:00
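For readers unfamiliar with the construction referenced in bca82801e7, the following is a minimal pure-Python sketch of a Vandermonde matrix that follows the NumPy `vander` convention (decreasing powers by default). The function name `vander` and the list-of-rows representation are illustrative assumptions here; this is not the PyTorch implementation.

```python
def vander(x, n=None, increasing=False):
    """Return the Vandermonde matrix of the sequence x as a list of rows.

    Each row holds the powers of one element of x. With the default
    increasing=False the powers run from n-1 down to 0, mirroring
    numpy.vander's column order.
    """
    if n is None:
        n = len(x)  # square matrix by default
    powers = range(n) if increasing else range(n - 1, -1, -1)
    return [[value ** p for p in powers] for value in x]


rows = vander([1, 2, 3])
print(rows)
```

Passing `increasing=True` reverses the column order, so the first column is all ones.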
Ilia Cherniavskii
d068a456d3 [resubmit] Enable global observers API (#37382)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37382

After adding c10::DispatchKey::Profiler the behavior of RecordFunction
observers is also controlled by the dispatch key,
this PR moves the logic outside of the profiler into the record function

Reviewed By: jamesr66a

Differential Revision: D21268320

fbshipit-source-id: 93207e3b55325d20dcc5b1e8f448ab86933321da
2020-04-28 10:49:31 -07:00
Michael Suo
a4383266f0 Revert D21262421: [pytorch][PR] [doc] Fix JIT code highlighting
Test Plan: revert-hammer

Differential Revision:
D21262421

Original commit changeset: 4fb62cce9543

fbshipit-source-id: 4e852e178a2469d94ddbf8ee18903ed8cebd4906
2020-04-27 18:30:18 -07:00
Michael Suo
20143e5f27 Revert D21245094: [resubmit] Enable global observers API
Test Plan: revert-hammer

Differential Revision:
D21245094

Original commit changeset: 595e41b18206

fbshipit-source-id: 90344b361857d76ce5db75438c949dad1f5f186b
2020-04-27 16:19:46 -07:00
Wanchao Liang
1039b95ff0 [autograd] add documentation about multithread autograd (#37020)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37020

Add multithread autograd documentation to the doc note.

Test Plan: Imported from OSS

Differential Revision: D21260996

Pulled By: wanchaol

fbshipit-source-id: 91d523560268ae62d4c6d773121b282ba837a561
2020-04-27 15:53:21 -07:00
Shawn Zhong
023c3575f0 [doc] Fix JIT code highlighting (#37338)
Summary:
Fix https://github.com/pytorch/pytorch/issues/36216

| Before | After |
| --- | --- |
| ![image](https://user-images.githubusercontent.com/6421097/80353700-55abec80-883b-11ea-9ae2-72f37ba23c16.png)| ![image](https://user-images.githubusercontent.com/6421097/80353403-ef26ce80-883a-11ea-885b-2a2963f79d20.png) |
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37338

Differential Revision: D21262421

Pulled By: mrshenli

fbshipit-source-id: 4fb62cce9543e6a4852828f58a279c36565f8c44
2020-04-27 15:04:42 -07:00
Ilia Cherniavskii
5fab4c30dd [resubmit] Enable global observers API (#37292)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37292

After adding c10::DispatchKey::Profiler the behavior of RecordFunction
observers is also controlled by the dispatch key,
this PR moves the logic outside of the profiler into the record function

Reviewed By: jamesr66a

Differential Revision: D21245094

fbshipit-source-id: 595e41b18206d2ba4cf639cb320f630907868b3f
2020-04-27 14:24:51 -07:00
Ilia Cherniavskii
856e8cf028 Revert D21213786: Enable global observers API
Test Plan: revert-hammer

Differential Revision:
D21213786

Original commit changeset: e618254da74a

fbshipit-source-id: 425ea5d44fa55655ec0dd586c5075996b926177b
2020-04-25 00:59:24 -07:00
Ilia Cherniavskii
6e659e928b Enable global observers API (#37195)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37195

After adding c10::DispatchKey::Profiler the behavior of RecordFunction
observers is also controlled by the dispatch key,
this PR moves the logic outside of the profiler into the record function

Reviewed By: ngimel

Differential Revision: D21213786

fbshipit-source-id: e618254da74a4f1ce16c51a3869bbd75a4f561ad
2020-04-24 23:49:28 -07:00
moto
5a27ec09b8 Add Inverse Short Time Fourier Transform in ATen native (#35569)
Summary:
Ported `torchaudio`'s implementation (test, and documentation as well) to ATen.

Note
 - Batch packing/unpacking is performed in Python. ATen implementation expects 4D input tensor.
 - `hop_length` is initialized in the same way as in the `stft` implementation. [The torchaudio version tried to mimic the same behavior but is slightly different](7da61a4bee/torchaudio/functional.py (L152-L157)).

Closes https://github.com/pytorch/pytorch/issues/34827
Relates https://github.com/pytorch/pytorch/issues/3775
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35569

Differential Revision: D21178090

Pulled By: mthrok

fbshipit-source-id: 2701a8b241a36a6fb1b740c2fb2b07cb938185d4
2020-04-24 12:14:55 -07:00
Alban Desmaison
3799d1d74a Fix many doc issues (#37099)
Summary:
Fix https://github.com/pytorch/pytorch/issues/35643 https://github.com/pytorch/pytorch/issues/37063 https://github.com/pytorch/pytorch/issues/36307 https://github.com/pytorch/pytorch/issues/35861 https://github.com/pytorch/pytorch/issues/35299 https://github.com/pytorch/pytorch/issues/23108 https://github.com/pytorch/pytorch/issues/4661

Just a bunch of small updates on the doc.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37099

Differential Revision: D21185713

Pulled By: albanD

fbshipit-source-id: 4ac06d6709dc0da6109a6ad3daae75667ee5863e
2020-04-23 10:01:03 -07:00
Mike Ruberry
4a2372bc90 Implements torch.isclose for complex tensors (#36456)
Summary:
Previously torch.isclose would raise a RuntimeError when called on complex tensors. This PR updates torch.isclose to run on complex tensors and be consistent with [NumPy](https://numpy.org/doc/1.18/reference/generated/numpy.isclose.html). However, NumPy's handling of NaN, -inf, and inf values is odd, so I adopted Python's [cmath.isclose](https://docs.python.org/3/library/cmath.html) behavior when dealing with them. See https://github.com/numpy/numpy/issues/15959 for more on NumPy's behavior.

While implementing complex isclose I also simplified the isclose algorithm to:

- A is close to B if A and B are equal, if equal_nan is true then NaN is equal to NaN
- If A and B are finite, then A is close to B if `abs(a - b) <= (atol + abs(rtol * b))`

This PR also documents torch.isclose, since it was undocumented, and adds multiple tests for its behavior to test_torch.py since it had no dedicated tests.

The PR leaves equal_nan=True with complex inputs an error for now, pending the outcome of https://github.com/numpy/numpy/issues/15959.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36456

Differential Revision: D21159853

Pulled By: mruberry

fbshipit-source-id: fb18fa7048e6104cc24f5ce308fdfb0ba5e4bb30
2020-04-21 19:53:55 -07:00
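The simplified closeness rule described in 4a2372bc90 can be sketched for scalar values in plain Python. This is only an illustration of the two rules in the commit message: the function name, the default tolerances `rtol=1e-05` and `atol=1e-08`, and the conversion of every input to `complex` are assumptions made for the sketch, and it does not reproduce the error the PR raises for `equal_nan=True` with complex inputs.

```python
import cmath


def isclose(a, b, rtol=1e-05, atol=1e-08, equal_nan=False):
    """Scalar sketch of the rule: equal values are close, and finite
    values are close when abs(a - b) <= atol + abs(rtol * b)."""
    a, b = complex(a), complex(b)
    # Rule 1: equal values are close (this covers matching infinities).
    if a == b:
        return True
    # Optional: treat NaN as equal to NaN.
    if equal_nan and cmath.isnan(a) and cmath.isnan(b):
        return True
    # Rule 2: the tolerance check applies only to finite values.
    if cmath.isfinite(a) and cmath.isfinite(b):
        return abs(a - b) <= atol + abs(rtol * b)
    return False


print(isclose(1 + 1j, 1 + 1.000000001j))
print(isclose(float("inf"), 1.0))
```

Note that the check is asymmetric in `a` and `b`, because the relative tolerance is scaled by `b` only.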
Shen Li
b982a6a247 Expose torch.distributed.is_available() API (#37021)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/37021

Test Plan: Imported from OSS

Differential Revision: D21164318

Pulled By: mrshenli

fbshipit-source-id: 08a446af342cbe54f3eb4994956ffa7ef4922bcf
2020-04-21 18:38:46 -07:00
Jesse Brizzi
28f439d4f4 add absolute alias for abs (#36597)
Summary:
Adds an absolute alias for the abs function to match Numpy's use of both:
https://docs.scipy.org/doc/numpy/reference/generated/numpy.absolute.html

Adds test to ensure the output from abs and absolute are the same.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36597

Differential Revision: D21024458

Pulled By: jessebrizzi

fbshipit-source-id: 4f2987e7bc7cde444d0a93e833a0350844b48d44
2020-04-20 14:49:51 -07:00
Joseph Spisak
3c55b5a8ef Update persons_of_interest.rst
2020-04-19 20:26:02 -07:00
Karel Ha
5d9b4d5720 Update contribution_guide.rst (#36438)
Summary:
Fix formatting: change "Frequently Asked Questions" into an RST header, which is clickable and gives a URL to the FAQ section
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36438

Differential Revision: D21106180

Pulled By: mruberry

fbshipit-source-id: 370dafd1883bd57285b478cf2faa14ae2f86e3ba
2020-04-18 02:27:38 -07:00
Michael Carilli
e6bc34f549 Amp gradient accumulation example (#36601)
Summary:
Several people have asked me about proper Amp usage with gradient accumulation.  In particular, it's [unclear to people](https://github.com/NVIDIA/apex/issues/439#issuecomment-610351482) that you should only call `scaler.unscale_()` (if desired) and `scaler.update()` in iterations where you actually plan to step.  This PR adds a minimal accumulation example.

I built the docs locally and it looks free from sphinx errors, at least.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36601

Differential Revision: D21082295

Pulled By: ngimel

fbshipit-source-id: b2faa6c02b9f7e1972618a0f1d5360a03f0450ac
2020-04-17 09:56:36 -07:00
Jessica Lin
ac950bb9c8 Update docs for master to remove Python 2 references (#36336)
Summary:
Fix compile error from original PR in jit_language_references.rst: https://github.com/pytorch/pytorch/pull/36114

Full details in task: https://our.intern.facebook.com/intern/tasks/?t=64776265

With PyTorch 1.5+ we remove Python 2 support from PyTorch. All documentation under docs/ and on the pytorch.org website needs to remove Python 2 references.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36336

Differential Revision: D21057507

Pulled By: jlin27

fbshipit-source-id: 993a763f1ecb16dad859bc02a07625ddc023645d
2020-04-16 10:15:48 -07:00
Shen Li
049dede3be Move rpc.rst back to the source folder to preserve existing doc URLs (#36675)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/36675

Test Plan: Imported from OSS

Differential Revision: D21048628

Pulled By: mrshenli

fbshipit-source-id: 3cb1b35ddc1f40c673b0db9048d77dfa024be1e7
2020-04-16 08:12:34 -07:00
Omkar Salpekar
5927a6731c [PyTorch Docs] Updated RRef docs to indicate RPC Retries (#36678)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36678

Updated the docs to explicitly indicate that RRef control messages are
idempotent and retried upon failure.
ghstack-source-id: 102225791

Test Plan: build bot

Differential Revision: D20828041

fbshipit-source-id: ca4d71c65a453664c16c32134c47637a966b1a19
2020-04-15 17:33:20 -07:00
Kurt Mohler
2bc49a4b85 block_diag dense (#33449)
Summary:
Add block_diag function for dense tensors, based on scipy.linalg.block_diag

Closes https://github.com/pytorch/pytorch/issues/31932
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33449

Differential Revision: D20943099

Pulled By: zou3519

fbshipit-source-id: 8b5c9476fb5af959aafa4169612c660396d9b717
2020-04-13 10:04:55 -07:00
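As a rough illustration of what a dense block-diagonal construction (2bc49a4b85) produces, here is a pure-Python sketch in the spirit of `scipy.linalg.block_diag`. It assumes every block is a non-empty 2-D list of rows; the function name and representation are hypothetical and this is not the ATen implementation.

```python
def block_diag(*blocks):
    """Place 2-D blocks along the diagonal of a zero-filled matrix."""
    total_cols = sum(len(block[0]) for block in blocks)
    result = []
    col_offset = 0  # column where the current block starts
    for block in blocks:
        width = len(block[0])
        for row in block:
            left = [0] * col_offset
            right = [0] * (total_cols - col_offset - width)
            result.append(left + list(row) + right)
        col_offset += width
    return result


for row in block_diag([[1, 2], [3, 4]], [[5]]):
    print(row)
```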
Hameer Abbasi
1875c2e4bd Add torch.Tensor.as_subclass method. (#34369)
Summary:
This is according to pytorch/rfcs#3.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34369

Differential Revision: D20963929

Pulled By: ezyang

fbshipit-source-id: e618af6fd36e1dfaeda617162314ad5840f55358
2020-04-10 09:16:35 -07:00
Edward Yang
6016f694c0 Revert D20901746: [pytorch][PR] Update docs for master to remove Python 2 references
Test Plan: revert-hammer

Differential Revision:
D20901746

Original commit changeset: 07f8dc8e6fab

fbshipit-source-id: 13c55597f9f79b8473210cf35a5a0f1fb34bae39
2020-04-08 14:49:11 -07:00
Jessica Lin
373dc7c8ef Group libraries in TOC and add PyTorch Elastic (#34928)
Summary:
Move XLA out of Notes and group with other libraries. Also adds link to PyTorch Elastic

![image](https://user-images.githubusercontent.com/8042156/76912125-f76d1080-686f-11ea-99d5-bb7be199adbd.png)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34928

Differential Revision: D20901732

Pulled By: jlin27

fbshipit-source-id: a5da915bb435a3aa8995d8bbe87f53ef79fd3ce6
2020-04-07 16:37:45 -07:00
Jessica Lin
43234be525 Update docs for master to remove Python 2 references (#36114)
Summary:
Full details in task: https://our.intern.facebook.com/intern/tasks/?t=64776265

With PyTorch 1.5+ we remove Python 2 support from PyTorch. All documentation under docs/ and on the pytorch.org website needs to remove Python 2 references.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36114

Differential Revision: D20901746

Pulled By: jlin27

fbshipit-source-id: 07f8dc8e6fab0b232e5048a63079cab0c433c85f
2020-04-07 16:13:18 -07:00
Orion Reblitz-Richardson
2d8dbcd3ef Remove python2 and 3.5 from requirements.txt, README and docs (#35677)
Summary:
Some more cleanup now that we no longer support Python 2 or 3.5 on master or, eventually, in the PyTorch 1.6 release.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35677

Differential Revision: D20838097

Pulled By: orionr

fbshipit-source-id: 95d553a1e8769f3baa395e0bc6d4ce7cd93236e9
2020-04-03 11:05:43 -07:00
Feng Tian
762270c51f add c10d dynamic loading mechanism and unit test (#28068)
Summary:
The original behavior of pytorch c10d only supports built-in c10d backends, such as
nccl/gloo/mpi. This patch is used to extend the c10d capability to support dynamically
loading 3rd party communication libraries which are derived from ProcessGroup base class.

The related RFC is: https://github.com/pytorch/pytorch/issues/27955

This way, users only need to specify a 3rd party c10d backend name when invoking
torch.distributed.init_process_group(). The proposed logic will try to load the corresponding
c10d backend cpp extension automatically. For how to develop a new 3rd party c10d backend
through a cpp extension, please refer to test/cpp_extensions/cpp_c10d_extension.cpp
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28068

Differential Revision: D19174838

Pulled By: agolynski

fbshipit-source-id: 3409a504a43ce7260e6f9d1207c00e87471fac62
2020-04-02 15:46:51 -07:00
anjali411
c070e8fb26 Updated canCast to disallow complex -> non complex conversion (#35883)
Summary:
fixes https://github.com/pytorch/pytorch/issues/35675
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35883

Differential Revision: D20818130

Pulled By: anjali411

fbshipit-source-id: c9b4b6112897639d1e9b7073c5dac7a29b9cd990
2020-04-02 15:12:38 -07:00
Rohan Varma
6616fad92e [Docs] Fix typo in RPC docs (#35809)
Summary:
It's also fixed in the cherry pick PR https://github.com/pytorch/pytorch/pull/35808
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35809

Differential Revision: D20803338

Pulled By: rohan-varma

fbshipit-source-id: 1925f367703faf053ab4b1c0ff0acb86230c5d89
2020-04-01 21:16:12 -07:00
Dhiraj D Kalamkar
945d7a7408 Add All-to-all comms support to distributed module and MPI backend (#32361)
Summary:
As described in https://github.com/pytorch/pytorch/issues/32345, this is a prototype implementation that adds an alltoall communication primitive to the torch.distributed module and the ProcessGroup abstract interface. It also implements alltoall in the ProcessGroupMPI backend.

mnaumovfb JianpingChen066 dmudiger srinivas212 Jianhui-Li mshiryaev ftian1

cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini xush6528 osalpekar
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32361

Reviewed By: mrshenli

Differential Revision: D20635481

Pulled By: srinivas212

fbshipit-source-id: 3dd0af800ce55d02f02813cde550e3a0f1a287d2
2020-04-01 08:57:12 -07:00
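The data movement of the alltoall primitive added in 945d7a7408 can be pictured without any communication library. In the sketch below, `inputs[src][dst]` is the chunk that rank `src` sends to rank `dst`. The function name and the in-process list representation are assumptions for illustration only; no real `torch.distributed` call is made.

```python
def all_to_all(inputs):
    """Simulate alltoall: rank dst receives one chunk from every rank.

    inputs[src][dst] is the chunk rank src sends to rank dst, so the
    result is the transpose: outputs[dst][src] == inputs[src][dst].
    """
    world_size = len(inputs)
    return [
        [inputs[src][dst] for src in range(world_size)]
        for dst in range(world_size)
    ]


outputs = all_to_all([["a0", "a1"], ["b0", "b1"]])
print(outputs)
```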
Rohan Varma
1f06db2579 Refactored rpc docs (#35109)
Summary:
Reorganize as per jlin27's comments. Screenshots added in comments.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35109

Differential Revision: D20788774

Pulled By: rohan-varma

fbshipit-source-id: 7d64be70ef76ed6ff303d05d39c338293c234766
2020-04-01 02:01:34 -07:00
Ilia Cherniavskii
bc6bd0bb1a Debug Information Guard
Summary: This diff fixes the issues with current handling of debug information passed along the execution of the model. (For example, it is possible that multiple calls to the debug guard may override each other)

Test Plan: CI test/cpp/jit

Reviewed By: dzhulgakov

Differential Revision: D20602775

fbshipit-source-id: 4683957954028af81a1a0f1f12b243650230c9bb
2020-04-01 01:55:29 -07:00
Ilia Cherniavskii
800d5617c0 Recording of TorchScript functions (#34710)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34710

Extending RecordFunction API to support new recording scopes (such as TorchScript functions), as well as giving more flexibility to set sampling rate.

Test Plan: unit test (test_misc.cpp/testRecordFunction)

Reviewed By: gdankel, dzhulgakov

Differential Revision: D20158523

fbshipit-source-id: a9e0819d21cc06f4952d92d43246587c36137582
2020-03-31 00:33:23 -07:00
Mike Ruberry
860790de88 Makes torch.real and torch.imag NumPy compatible, but disables them for complex tensors (#35560)
Summary:
The current implementations of torch.real and torch.imag are not NumPy compatible. In particular:

- torch.real on a real tensor does not return the real tensor, like contiguous
- torch.real on a complex tensor does not return a real-valued view of the real part
- torch.imag on a complex tensor does not return a real-valued view of the imaginary part
- torch.Tensor.real and torch.Tensor.imag exist as methods, but in NumPy they are writable attributes

This PR makes the functions NumPy compatible by removing the method variants and out kwarg, restricting them to work on only real tensors, and updating the behavior of torch.real to return its input. New tests are added to test_torch.py to verify the behavior, a couple existing complex tests are skipped, and the documentation is updated to reflect the change.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35560

Differential Revision: D20714568

Pulled By: mruberry

fbshipit-source-id: 5dd092f45757b620c8426c829dd15ee997246a26
2020-03-29 02:09:00 -07:00
pinzhenx
bd604cb5b7 Upgrade MKL-DNN to DNNL v1.2 (#32422)
Summary:
## Motivation

This PR upgrades MKL-DNN from v0.20 to DNNL v1.2 and resolves https://github.com/pytorch/pytorch/issues/30300.

DNNL (Deep Neural Network Library) is the new brand of MKL-DNN, which improves performance, quality, and usability over the old version.

This PR focuses on the migration of all existing functionalities, including minor fixes, performance improvement and code clean up. It serves as the cornerstone of our future efforts to accommodate new features like OpenCL support, BF16 training, INT8 inference, etc. and to let the Pytorch community derive more benefits from the Intel Architecture.

<br>

## What's included?

Even though DNNL has many breaking changes to the API, we managed to absorb most of them in ideep. This PR contains minimal changes to the integration code in pytorch. Below is a summary of the changes:

<br>

**General:**

1. Replace op-level allocator with global-registered allocator

```
// before
ideep::sum::compute<AllocForMKLDNN>(scales, {x, y}, z);

// after
ideep::sum::compute(scales, {x, y}, z);
```

The allocator is now registered in `aten/src/ATen/native/mkldnn/IDeepRegistration.cpp`. Thereafter all tensors derived from the `cpu_engine` (by default) will use the c10 allocator.

```
RegisterEngineAllocator cpu_alloc(
  ideep::engine::cpu_engine(),
  [](size_t size) {
    return c10::GetAllocator(c10::DeviceType::CPU)->raw_allocate(size);
  },
  [](void* p) {
    c10::GetAllocator(c10::DeviceType::CPU)->raw_deallocate(p);
  }
);
```
------

2. Simplify group convolution

In convolution, there was a scenario where the ideep tensor shape did not match the aten tensor shape: when `groups > 1`, DNNL expects weights tensors to be 5-d with an extra group dimension, e.g. `goihw` instead of `oihw` in the 2d conv case.

As shown below, a lot of extra checks came with this difference in shape before. Now we've completely hidden this difference in ideep and all tensors are going to align with pytorch's definition. So we could safely remove these checks from both aten and c2 integration code.

```
// aten/src/ATen/native/mkldnn/Conv.cpp

if (w.ndims() == x.ndims() + 1) {
  AT_ASSERTM(
      groups > 1,
      "Only group _mkldnn_conv2d weights could have been reordered to 5d");
  kernel_size[0] = w.get_dim(0) * w.get_dim(1);
  std::copy_n(
      w.get_dims().cbegin() + 2, x.ndims() - 1, kernel_size.begin() + 1);
} else {
  std::copy_n(w.get_dims().cbegin(), x.ndims(), kernel_size.begin());
}
```

------

3. Enable DNNL built-in cache

Previously, we stored DNNL jitted kernels along with intermediate buffers inside ideep using an LRU cache. Now we are switching to the newly added DNNL built-in cache, and **no longer** caching buffers in order to reduce memory footprint.

This change will mainly be reflected in lower memory usage in memory profiling results. On the code side, we removed a couple of lines of `op_key_` that depended on the ideep cache before.

------

4. Use 64-bit integer to denote dimensions

We changed the type of `ideep::dims` from `vector<int32_t>` to `vector<int64_t>`. This renders ideep dims no longer compatible with the 32-bit dims used by caffe2. So we use something like `{stride_.begin(), stride_.end()}` to cast parameter `stride_` into an int64 vector.

<br>

**Misc changes in each commit:**

**Commit:** change build options

Some build options were slightly changed, mainly to avoid name collisions with other projects that include DNNL as a subproject. In addition, DNNL built-in cache is enabled by option `DNNL_ENABLE_PRIMITIVE_CACHE`.

Old | New
-- | --
WITH_EXAMPLE | MKLDNN_BUILD_EXAMPLES
WITH_TEST | MKLDNN_BUILD_TESTS
MKLDNN_THREADING | MKLDNN_CPU_RUNTIME
MKLDNN_USE_MKL | N/A (MKL is no longer used)

------

**Commit:** aten reintegration

- aten/src/ATen/native/mkldnn/BinaryOps.cpp

    Implement binary ops using new operation `binary` provided by DNNL

- aten/src/ATen/native/mkldnn/Conv.cpp

    Clean up group convolution checks
    Simplify conv backward integration

- aten/src/ATen/native/mkldnn/MKLDNNConversions.cpp

    Simplify prepacking convolution weights

- test/test_mkldnn.py

    Fixed an issue in conv2d unit test: it didn't check conv results between mkldnn and aten implementation before. Instead, it compared the mkldnn with mkldnn as the default cpu path will also go into mkldnn. Now we use `torch.backends.mkldnn.flags` to fix this issue

- torch/utils/mkldnn.py

    Prepack weight tensor on module `__init__` to achieve significantly better performance

------

**Commit:** caffe2 reintegration

- caffe2/ideep/ideep_utils.h

    Clean up unused type definitions

- caffe2/ideep/operators/adam_op.cc & caffe2/ideep/operators/momentum_sgd_op.cc

   Unify tensor initialization with `ideep::tensor::init`. Obsolete `ideep::tensor::reinit`

- caffe2/ideep/operators/conv_op.cc & caffe2/ideep/operators/quantization/int8_conv_op.cc

    Clean up group convolution checks
    Revamp convolution API

- caffe2/ideep/operators/conv_transpose_op.cc

    Clean up group convolution checks
    Clean up deconv workaround code

------

**Commit:** custom allocator

- Register c10 allocator as mentioned above

<br><br>

## Performance

We tested inference on some common models based on user scenarios, and most performance numbers are either better than or on par with DNNL 0.20.

ratio: new / old | Latency (batch=1 4T) | Throughput (batch=64 56T)
-- | -- | --
pytorch resnet18 | 121.4% | 99.7%
pytorch resnet50 | 123.1% | 106.9%
pytorch resnext101_32x8d | 116.3% | 100.1%
pytorch resnext50_32x4d | 141.9% | 104.4%
pytorch mobilenet_v2 | 163.0% | 105.8%
caffe2 alexnet | 303.0% | 99.2%
caffe2 googlenet-v3 | 101.1% | 99.2%
caffe2 inception-v1 | 102.2% | 101.7%
caffe2 mobilenet-v1 | 356.1% | 253.7%
caffe2 resnet101 | 100.4% | 99.8%
caffe2 resnet152 | 99.8% | 99.8%
caffe2 shufflenet | 141.1% | 69.0% †
caffe2 squeezenet | 98.5% | 99.2%
caffe2 vgg16 | 136.8% | 100.6%
caffe2 googlenet-v3 int8 | 100.0% | 100.7%
caffe2 mobilenet-v1 int8 | 779.2% | 943.0%
caffe2 resnet50 int8 | 99.5% | 95.5%

_Configuration:
Platform: Skylake 8180
Latency Test: 4 threads, warmup 30, iteration 500, batch size 1
Throughput Test: 56 threads, warmup 30, iteration 200, batch size 64_

† Shufflenet is one of the few models that require temp buffers during inference. The performance degradation is an expected issue since we no longer cache any buffers in ideep. As a solution, we suggest users opt for a caching allocator such as **jemalloc** as a drop-in replacement for the system allocator in such heavy workloads.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32422

Test Plan:
Perf results: https://our.intern.facebook.com/intern/fblearner/details/177790608?tab=Experiment%20Results

10% improvement for ResNext with avx512, neutral on avx2

More results: https://fb.quip.com/ob10AL0bCDXW#NNNACAUoHJP

Reviewed By: yinghai

Differential Revision: D20381325

Pulled By: dzhulgakov

fbshipit-source-id: 803b906fd89ed8b723c5fcab55039efe3e4bcb77
2020-03-26 22:07:59 -07:00
Ailing Zhang
7580470cc5 Update view op list. (#35399)
Summary:
Adding ops to the list based on our discussion. :D
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35399

Differential Revision: D20651393

Pulled By: ailzhang

fbshipit-source-id: 8cf9026d10c0d74117953dbb68ebc2f537be956a
2020-03-25 16:15:00 -07:00
Yuichiro Ueno
aadd0fda8b Document reduce_scatter collective operation (#35274)
Summary:
I don't know why the reduce_scatter collective operation is not documented, so I added it to the documentation.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35274

Differential Revision: D20645850

Pulled By: mrshenli

fbshipit-source-id: 0a4458bff1a4e15a4593dd4dcc25e4e0f6e2265d
2020-03-25 13:36:29 -07:00
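For context on the collective documented in aadd0fda8b: reduce_scatter reduces the contributions of all ranks and leaves each rank with one chunk of the result. The sketch below simulates that in a single process, assuming a sum reduction and scalar chunks. The function name and data layout are hypothetical; it is not the `torch.distributed` API.

```python
def reduce_scatter(inputs):
    """Simulate reduce_scatter with a sum reduction.

    inputs[src][dst] is the chunk that rank src contributes toward
    rank dst. Rank dst ends up with the sum over all src of that chunk.
    """
    world_size = len(inputs)
    return [
        sum(inputs[src][dst] for src in range(world_size))
        for dst in range(world_size)
    ]


# Two ranks, each contributing one chunk per destination rank.
print(reduce_scatter([[1, 2], [10, 20]]))
```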
anjali411
c73e97033a Added type promotion logic for complex numbers (#34093)
Summary:
Issue: https://github.com/pytorch/pytorch/issues/33780
After this PR:
1. dtype promotion logic will correctly work for ops involving complex scalars
2. added alias for complex64 (cfloat) and complex128 (cdouble)
3. added an internal function get_complex_default_dtype (consciously not exposed in public API)
   - sets the default complex dtype to be double if default_dtype is set to double, else float https://github.com/pytorch/pytorch/pull/34093#discussion_r392350224
>>> 1j*torch.ones(2)
tensor([(0.0000 + 1.0000j), (0.0000 + 1.0000j)], dtype=torch.complex64)

>>> torch.set_default_dtype(torch.float64)
>>> 1j*torch.ones(2)
tensor([(0.0000 + 1.0000j), (0.0000 + 1.0000j)], dtype=torch.complex128)

>>> 1j + torch.ones(2)
tensor([(1.0000 + 1.0000j), (1.0000 + 1.0000j)], dtype=torch.complex128)

>>> torch.tensor(1j) + torch.ones(2,2)
tensor([[(1.0000 + 1.0000j), (1.0000 + 1.0000j)],
        [(1.0000 + 1.0000j), (1.0000 + 1.0000j)]], dtype=torch.complex128)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34093

Differential Revision: D20537125

Pulled By: anjali411

fbshipit-source-id: 05fb1f81b8ba039d0b698cdd2c0bbf8b0ce0b767
2020-03-25 09:12:21 -07:00
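The default-dtype rule stated in item 3 of c73e97033a (complex double when the default dtype is double, complex float otherwise) is small enough to sketch directly. The function name `complex_default_dtype` and the use of dtype name strings instead of real `torch.dtype` objects are assumptions made for this illustration; the actual internal function is not reproduced here.

```python
def complex_default_dtype(default_dtype):
    """Map a default floating dtype name to the default complex dtype name.

    Strings stand in for torch dtypes: "float64" selects "complex128",
    and anything else falls back to "complex64".
    """
    return "complex128" if default_dtype == "float64" else "complex64"


print(complex_default_dtype("float32"))
print(complex_default_dtype("float64"))
```

This matches the REPL output in the commit message, where `1j*torch.ones(2)` is `complex64` by default and `complex128` after `torch.set_default_dtype(torch.float64)`.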
Michael Carilli
0f0271e255 [RELAND2] Eager autocasting, out-of-place ops only (with MSVC 2017 fix) (#35102)
Summary:
This is the second reland attempt for https://github.com/pytorch/pytorch/pull/32140.

The first reland attempt https://github.com/pytorch/pytorch/pull/35011 failed due to a [small incompatible change](https://github.com/pytorch/pytorch/pull/35011#issuecomment-601754216) in recent master (`skipIfRocm` was removed from `test_data_parallel.py`).

The present PR restores skipIfRocm.

Description from first reland attempt https://github.com/pytorch/pytorch/pull/35011:

> https://github.com/pytorch/pytorch/pull/32140 was approved and merged, but [reverted](d0577e19f0) because it broke builds with versions of Visual Studio older than 15.8 that were not represented in public CI.  The build failures were caused by a [known VS bug](https://developercommunity.visualstudio.com/content/problem/27729/allow-function-with-internal-linkage-as-template-n.html), fixed in versions 15.8 and newer.
>
> The present PR reverts the revert (restoring https://github.com/pytorch/pytorch/pull/32140 's diffs) and adds a workaround to enable compilation with VS < 15.8.  The workaround isn't pretty, but it's guarded by macros such that it's only used when compiling with VS < 15.8.  All other builds compile with the same code/control flow as was merged in https://github.com/pytorch/pytorch/pull/32140.
>
> Original description of https://github.com/pytorch/pytorch/pull/32140:
> > Initial integration of eager autocasting, supporting out-of-place ops only for easier review.
> Relevant issue/RFC: https://github.com/pytorch/pytorch/issues/25081
>
> > In-place ops and ops with user-supplied out=... can certainly be supported as well (my initial WIP https://github.com/pytorch/pytorch/issues/29552 handled many) but require substantially more complex special casing in the autocasting backend and tests. Support for these ops (much of which has already been written) will be broken into later PRs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35102

Differential Revision: D20596918

Pulled By: ezyang

fbshipit-source-id: 60caa279bb0ce4a9bb0b28c1d585d42cf1cc7e50
2020-03-24 09:08:04 -07:00
Mike Ruberry
7c1ea736ba Extends true_divide to be a method (#34794)
Summary:
Per title. See related https://github.com/pytorch/pytorch/pull/34570.

In PyTorch 1.7 the plan is for torch.div and Python's division operator to perform "true" division, like Python 3, JAX, and NumPy. To facilitate this change, this PR expands true_divide to be a method so it can cover all of torch.div's use cases.

New true_divide tests are added to test_torch.py, test_type_promotion.py, and test_sparse.py.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34794

Differential Revision: D20545507

Pulled By: mruberry

fbshipit-source-id: 55286f819716c8823d1930441a69008560ac2bd5
2020-03-23 23:12:23 -07:00
Peter Bell
bd0ef784e0 FAQ: Add note about recovering from OOM (#35214)
Summary:
Closes https://github.com/pytorch/pytorch/issues/18853

This documents the workaround needed to solve the issues in https://github.com/pytorch/pytorch/issues/18853
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35214

Differential Revision: D20604877

Pulled By: ezyang

fbshipit-source-id: 71ed13cfa567d8e88fa9f18180a171cd174fb528
2020-03-23 20:22:46 -07:00
Vitaly Fedyunin
40da01db6a Add docs about memory format (#34818)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/34818

Test Plan: Imported from OSS

Differential Revision: D20601336

Pulled By: VitalyFedyunin

fbshipit-source-id: d34ad226be950bf134c6b383a4810ea6aa75599e
2020-03-23 15:06:33 -07:00
Jerry Zhang
3fa7813b9f [quant] Add dequantize.tensors (#34348)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34348

We need this function to do swap dequantize for prim::ListConstruct since
the output of prim::ListConstruct is a list of Tensors

Test Plan:
.

Imported from OSS

Differential Revision: D20504454

fbshipit-source-id: e6155e37da98e2219a6f79737cd46fe32a509c9f
2020-03-20 22:51:51 -07:00
Xiang Gao
df8d6eeb19 Update docs about DP and DDP for CUDA (#35063)
Summary:
We should recommend DDP instead of DP. Hope we can also cherry-pick this for 1.5
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35063

Differential Revision: D20549621

Pulled By: ngimel

fbshipit-source-id: 86b1b2134664065cc6070ea4212895f993eaf543
2020-03-20 20:06:37 -07:00
Mike Ruberry
fe276d541e Revert D20541921: [pytorch][PR] [RELAND] Eager autocasting, out-of-place ops only (with MSVC 2017 fix)
Test Plan: revert-hammer

Differential Revision:
D20541921

Original commit changeset: abb5488dca86

fbshipit-source-id: d2c6038978f80e5429632f8b49107090a8a247f4
2020-03-19 22:39:12 -07:00
Michael Carilli
991b97277a [RELAND] Eager autocasting, out-of-place ops only (with MSVC 2017 fix) (#35011)
Summary:
https://github.com/pytorch/pytorch/pull/32140 was approved and merged, but [reverted](d0577e19f0) because it broke builds with versions of Visual Studio older than 15.8 that were not represented in public CI.  The build failures were caused by a [known VS bug](https://developercommunity.visualstudio.com/content/problem/27729/allow-function-with-internal-linkage-as-template-n.html), fixed in versions 15.8 and newer.

The present PR reverts the revert (restoring https://github.com/pytorch/pytorch/pull/32140 's diffs) and adds a workaround to enable compilation with VS < 15.8.  The workaround isn't pretty, but it's guarded by macros such that it's only used when compiling with VS < 15.8.  All other builds compile with the same code/control flow as was merged in https://github.com/pytorch/pytorch/pull/32140.

Original description of https://github.com/pytorch/pytorch/pull/32140:
> Initial integration of eager autocasting, supporting out-of-place ops only for easier review.
Relevant issue/RFC: https://github.com/pytorch/pytorch/issues/25081

> In-place ops and ops with user-supplied out=... can certainly be supported as well (my initial WIP https://github.com/pytorch/pytorch/issues/29552 handled many) but require substantially more complex special casing in the autocasting backend and tests. Support for these ops (much of which has already been written) will be broken into later PRs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35011

Differential Revision: D20541921

Pulled By: ezyang

fbshipit-source-id: abb5488dca8620b0daac4306ebf2bb47fc36e4f5
2020-03-19 20:18:18 -07:00
albanD
1f4a4aaf64 functional autograd api (#34066)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34066

Basic implementation of https://github.com/pytorch/pytorch/issues/30632

Test Plan: Imported from OSS

Differential Revision: D20260307

Pulled By: albanD

fbshipit-source-id: 7db5c2411ddc3e954ff8fbbe93eb3b96a2bcfb2f
2020-03-19 08:24:07 -07:00