Commit Graph

823 Commits

Author SHA1 Message Date
Michael Carilli
e6bc34f549 Amp gradient accumulation example (#36601)
Summary:
Several people have asked me about proper Amp usage with gradient accumulation.  In particular, it's [unclear to people](https://github.com/NVIDIA/apex/issues/439#issuecomment-610351482) that you should only call `scaler.unscale_()` (if desired) and `scaler.update()` in iterations where you actually plan to step.  This PR adds a minimal accumulation example.

I built the docs locally and it looks free from sphinx errors, at least.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36601

Differential Revision: D21082295

Pulled By: ngimel

fbshipit-source-id: b2faa6c02b9f7e1972618a0f1d5360a03f0450ac
2020-04-17 09:56:36 -07:00
Jessica Lin
ac950bb9c8 Update docs for master to remove Python 2 references (#36336)
Summary:
Fix compile error from original PR in jit_language_references.rst: https://github.com/pytorch/pytorch/pull/36114

Full details in task: https://our.intern.facebook.com/intern/tasks/?t=64776265

With pytroch 1.5+ we remove python2 support from PyTorch. All documentation under docs/ and on the pytorch.org website needs to remove Python 2 references.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36336

Differential Revision: D21057507

Pulled By: jlin27

fbshipit-source-id: 993a763f1ecb16dad859bc02a07625ddc023645d
2020-04-16 10:15:48 -07:00
Shen Li
049dede3be Move rpc.rst back to the source folder to preserve existing doc URLs (#36675)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/36675

Test Plan: Imported from OSS

Differential Revision: D21048628

Pulled By: mrshenli

fbshipit-source-id: 3cb1b35ddc1f40c673b0db9048d77dfa024be1e7
2020-04-16 08:12:34 -07:00
Omkar Salpekar
5927a6731c [PyTorch Docs] Updated RRef docs to indicate RPC Retries (#36678)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36678

Updated the docs to explicitly indicate that RRef control messages are
idempotent and retried upon failure.
ghstack-source-id: 102225791

Test Plan: build bot

Differential Revision: D20828041

fbshipit-source-id: ca4d71c65a453664c16c32134c47637a966b1a19
2020-04-15 17:33:20 -07:00
Kurt Mohler
2bc49a4b85 block_diag dense (#33449)
Summary:
Add block_diag function for dense tensors, based on scipy.linalg.block_diag

Closes https://github.com/pytorch/pytorch/issues/31932
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33449

Differential Revision: D20943099

Pulled By: zou3519

fbshipit-source-id: 8b5c9476fb5af959aafa4169612c660396d9b717
2020-04-13 10:04:55 -07:00
Hameer Abbasi
1875c2e4bd Add torch.Tensor.as_subclass method. (#34369)
Summary:
This is according to pytorch/rfcs#3.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34369

Differential Revision: D20963929

Pulled By: ezyang

fbshipit-source-id: e618af6fd36e1dfaeda617162314ad5840f55358
2020-04-10 09:16:35 -07:00
Edward Yang
6016f694c0 Revert D20901746: [pytorch][PR] Update docs for master to remove Python 2 references
Test Plan: revert-hammer

Differential Revision:
D20901746

Original commit changeset: 07f8dc8e6fab

fbshipit-source-id: 13c55597f9f79b8473210cf35a5a0f1fb34bae39
2020-04-08 14:49:11 -07:00
Jessica Lin
373dc7c8ef Group libraries in TOC and add PyTorch Elastic (#34928)
Summary:
Move XLA out of Notes and group with other libraries. Also adds link to PyTorch Elastic

![image](https://user-images.githubusercontent.com/8042156/76912125-f76d1080-686f-11ea-99d5-bb7be199adbd.png)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34928

Differential Revision: D20901732

Pulled By: jlin27

fbshipit-source-id: a5da915bb435a3aa8995d8bbe87f53ef79fd3ce6
2020-04-07 16:37:45 -07:00
Jessica Lin
43234be525 Update docs for master to remove Python 2 references (#36114)
Summary:
Full details in task: https://our.intern.facebook.com/intern/tasks/?t=64776265

With pytroch 1.5+ we remove python2 support from PyTorch. All documentation under docs/ and on the pytorch.org website needs to remove Python 2 references.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36114

Differential Revision: D20901746

Pulled By: jlin27

fbshipit-source-id: 07f8dc8e6fab0b232e5048a63079cab0c433c85f
2020-04-07 16:13:18 -07:00
Orion Reblitz-Richardson
2d8dbcd3ef Remove python2 and 3.5 from requirements.txt, README and docs (#35677)
Summary:
Some more cleanup now that we no longer support python2 or 3.5 on master and eventually PyTorch 1.6 release.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35677

Differential Revision: D20838097

Pulled By: orionr

fbshipit-source-id: 95d553a1e8769f3baa395e0bc6d4ce7cd93236e9
2020-04-03 11:05:43 -07:00
Feng Tian
762270c51f add c10d dynamic loading mechanism and unit test (#28068)
Summary:
The original behavior of pytorch c10d only supports built-in c10d backends, such as
nccl/gloo/mpi. This patch is used to extend the c10d capability to support dynamically
loading 3rd party communication libraries which are derived from ProcessGroup base class.

related RFC is in: https://github.com/pytorch/pytorch/issues/27955

Through this way, user just need specify a 3rd party c10d backend name when invoking
torch.distributed.init_process_group(). The proposed logic will try to load corresponding
c10d backend cpp extension automatically. as for how to develop a new 3rd party c10d backend
through cpp extension, pls refer to test/cpp_extensions/cpp_c10d_extension.cpp
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28068

Differential Revision: D19174838

Pulled By: agolynski

fbshipit-source-id: 3409a504a43ce7260e6f9d1207c00e87471fac62
2020-04-02 15:46:51 -07:00
anjali411
c070e8fb26 Updated canCast to disallow complex -> non complex conversion (#35883)
Summary:
fixes https://github.com/pytorch/pytorch/issues/35675
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35883

Differential Revision: D20818130

Pulled By: anjali411

fbshipit-source-id: c9b4b6112897639d1e9b7073c5dac7a29b9cd990
2020-04-02 15:12:38 -07:00
Rohan Varma
6616fad92e [Docs] Fix typo in RPC docs (#35809)
Summary:
It's also fixed in the cherry pick PR https://github.com/pytorch/pytorch/pull/35808
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35809

Differential Revision: D20803338

Pulled By: rohan-varma

fbshipit-source-id: 1925f367703faf053ab4b1c0ff0acb86230c5d89
2020-04-01 21:16:12 -07:00
Dhiraj D Kalamkar
945d7a7408 Add All-to-all comms support to distributed module and MPI backend (#32361)
Summary:
As described in https://github.com/pytorch/pytorch/issues/32345, a prototype implementation to add an alltoall communication primitive to torch.distributed module and ProcessGroup abstract interface. Also, implements alltoall in ProcessGroupMPI backend.

mnaumovfb JianpingChen066 dmudiger srinivas212 Jianhui-Li mshiryaev ftian1

cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini xush6528 osalpekar
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32361

Reviewed By: mrshenli

Differential Revision: D20635481

Pulled By: srinivas212

fbshipit-source-id: 3dd0af800ce55d02f02813cde550e3a0f1a287d2
2020-04-01 08:57:12 -07:00
Rohan Varma
1f06db2579 Refactored rpc docs (#35109)
Summary:
Reorganize as per jlin27 's comments. Screenshots added in comments.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35109

Differential Revision: D20788774

Pulled By: rohan-varma

fbshipit-source-id: 7d64be70ef76ed6ff303d05d39c338293c234766
2020-04-01 02:01:34 -07:00
Ilia Cherniavskii
bc6bd0bb1a Debug Information Guard
Summary: This diff fixes the issues with current handling of debug information passed along the execution of the model. (For example, it is possible that multiple calls to the debug guard may override each other)

Test Plan: CI test/cpp/jit

Reviewed By: dzhulgakov

Differential Revision: D20602775

fbshipit-source-id: 4683957954028af81a1a0f1f12b243650230c9bb
2020-04-01 01:55:29 -07:00
Ilia Cherniavskii
800d5617c0 Recording of TorchScript functions (#34710)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34710

Extending RecordFunction API to support new recording scopes (such as TorchScript functions), as well as giving more flexibility to set sampling rate.

Test Plan: unit test (test_misc.cpp/testRecordFunction)

Reviewed By: gdankel, dzhulgakov

Differential Revision: D20158523

fbshipit-source-id: a9e0819d21cc06f4952d92d43246587c36137582
2020-03-31 00:33:23 -07:00
Mike Ruberry
860790de88 Makes torch.real and torch.imag NumPy compatible, but disables them for complex tensors (#35560)
Summary:
The current implementations of torch.real and torch.imag are not NumPy compatible. In particular:

- torch.real on a real tensor does not return the real tensor, like contiguous
- torch.real on a complex tensor does not return a real-valued view of the real part
- torch.imag on a complex tensor does not return a real-valued view of the imaginary part
- torch.Tensor.real and torch.Tensor.imag exist as methods, but in NumPy they are writable attributes

This PR makes the functions NumPy compatible by removing the method variants and out kwarg, restricting them to work on only real tensors, and updating the behavior of torch.real to return its input. New tests are added to test_torch.py to verify the behavior, a couple existing complex tests are skipped, and the documentation is updated to reflect the change.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35560

Differential Revision: D20714568

Pulled By: mruberry

fbshipit-source-id: 5dd092f45757b620c8426c829dd15ee997246a26
2020-03-29 02:09:00 -07:00
pinzhenx
bd604cb5b7 Upgrade MKL-DNN to DNNL v1.2 (#32422)
Summary:
## Motivation

This PR upgrades MKL-DNN from v0.20 to DNNL v1.2 and resolves https://github.com/pytorch/pytorch/issues/30300.

DNNL (Deep Neural Network Library) is the new brand of MKL-DNN, which improves performance, quality, and usability over the old version.

This PR focuses on the migration of all existing functionalities, including minor fixes, performance improvement and code clean up. It serves as the cornerstone of our future efforts to accommodate new features like OpenCL support, BF16 training, INT8 inference, etc. and to let the Pytorch community derive more benefits from the Intel Architecture.

<br>

## What's included?

Even DNNL has many breaking changes to the API, we managed to absorb most of them in ideep. This PR contains minimalist changes to the integration code in pytorch. Below is a summary of the changes:

<br>

**General:**

1. Replace op-level allocator with global-registered allocator

```
// before
ideep::sum::compute<AllocForMKLDNN>(scales, {x, y}, z);

// after
ideep::sum::compute(scales, {x, y}, z);
```

The allocator is now being registeted at `aten/src/ATen/native/mkldnn/IDeepRegistration.cpp`. Thereafter all tensors derived from the `cpu_engine` (by default) will use the c10 allocator.

```
RegisterEngineAllocator cpu_alloc(
  ideep::engine::cpu_engine(),
  [](size_t size) {
    return c10::GetAllocator(c10::DeviceType::CPU)->raw_allocate(size);
  },
  [](void* p) {
    c10::GetAllocator(c10::DeviceType::CPU)->raw_deallocate(p);
  }
);
```
------

2. Simplify group convolution

We had such a scenario in convolution where ideep tensor shape mismatched aten tensor: when `groups > 1`, DNNL expects weights tensors to be 5-d with an extra group dimension, e.g. `goihw` instead of `oihw` in 2d conv case.

As shown below, a lot of extra checks came with this difference in shape before. Now we've completely hidden this difference in ideep and all tensors are going to align with pytorch's definition. So we could safely remove these checks from both aten and c2 integration code.

```
// aten/src/ATen/native/mkldnn/Conv.cpp

if (w.ndims() == x.ndims() + 1) {
  AT_ASSERTM(
      groups > 1,
      "Only group _mkldnn_conv2d weights could have been reordered to 5d");
  kernel_size[0] = w.get_dim(0) * w.get_dim(1);
  std::copy_n(
      w.get_dims().cbegin() + 2, x.ndims() - 1, kernel_size.begin() + 1);
} else {
  std::copy_n(w.get_dims().cbegin(), x.ndims(), kernel_size.begin());
}
```

------

3. Enable DNNL built-in cache

Previously, we stored DNNL jitted kernels along with intermediate buffers inside ideep using an LRU cache. Now we are switching to the newly added DNNL built-in cache, and **no longer** caching buffers in order to reduce memory footprint.

This change will be mainly reflected in lower memory usage from memory profiling results. On the code side, we removed couple of lines of `op_key_` that depended on the ideep cache before.

------

4. Use 64-bit integer to denote dimensions

We changed the type of `ideep::dims` from `vector<int32_t>` to `vector<int64_t>`. This renders ideep dims no longer compatible with 32-bit dims used by caffe2. So we use something like `{stride_.begin(), stride_.end()}` to cast parameter `stride_` into a int64 vector.

<br>

**Misc changes in each commit:**

**Commit:** change build options

Some build options were slightly changed, mainly to avoid name collisions with other projects that include DNNL as a subproject. In addition, DNNL built-in cache is enabled by option `DNNL_ENABLE_PRIMITIVE_CACHE`.

Old | New
-- | --
WITH_EXAMPLE | MKLDNN_BUILD_EXAMPLES
WITH_TEST | MKLDNN_BUILD_TESTS
MKLDNN_THREADING | MKLDNN_CPU_RUNTIME
MKLDNN_USE_MKL | N/A (not use MKL anymore)

------

**Commit:** aten reintegration

- aten/src/ATen/native/mkldnn/BinaryOps.cpp

    Implement binary ops using new operation `binary` provided by DNNL

- aten/src/ATen/native/mkldnn/Conv.cpp

    Clean up group convolution checks
    Simplify conv backward integration

- aten/src/ATen/native/mkldnn/MKLDNNConversions.cpp

    Simplify prepacking convolution weights

- test/test_mkldnn.py

    Fixed an issue in conv2d unit test: it didn't check conv results between mkldnn and aten implementation before. Instead, it compared the mkldnn with mkldnn as the default cpu path will also go into mkldnn. Now we use `torch.backends.mkldnn.flags` to fix this issue

- torch/utils/mkldnn.py

    Prepack weight tensor on module `__init__` to achieve better performance significantly

------

**Commit:** caffe2 reintegration

- caffe2/ideep/ideep_utils.h

    Clean up unused type definitions

- caffe2/ideep/operators/adam_op.cc & caffe2/ideep/operators/momentum_sgd_op.cc

   Unify tensor initialization with `ideep::tensor::init`. Obsolete `ideep::tensor::reinit`

- caffe2/ideep/operators/conv_op.cc & caffe2/ideep/operators/quantization/int8_conv_op.cc

    Clean up group convolution checks
    Revamp convolution API

- caffe2/ideep/operators/conv_transpose_op.cc

    Clean up group convolution checks
    Clean up deconv workaround code

------

**Commit:** custom allocator

- Register c10 allocator as mentioned above

<br><br>

## Performance

We tested inference on some common models based on user scenarios, and most performance numbers are either better than or on par with DNNL 0.20.

ratio: new / old | Latency (batch=1 4T) | Throughput (batch=64 56T)
-- | -- | --
pytorch resnet18 | 121.4% | 99.7%
pytorch resnet50 | 123.1% | 106.9%
pytorch resnext101_32x8d | 116.3% | 100.1%
pytorch resnext50_32x4d | 141.9% | 104.4%
pytorch mobilenet_v2 | 163.0% | 105.8%
caffe2 alexnet | 303.0% | 99.2%
caffe2 googlenet-v3 | 101.1% | 99.2%
caffe2 inception-v1 | 102.2% | 101.7%
caffe2 mobilenet-v1 | 356.1% | 253.7%
caffe2 resnet101 | 100.4% | 99.8%
caffe2 resnet152 | 99.8% | 99.8%
caffe2 shufflenet | 141.1% | 69.0% †
caffe2 squeezenet | 98.5% | 99.2%
caffe2 vgg16 | 136.8% | 100.6%
caffe2 googlenet-v3 int8 | 100.0% | 100.7%
caffe2 mobilenet-v1 int8 | 779.2% | 943.0%
caffe2 resnet50 int8 | 99.5% | 95.5%

_Configuration:
Platform: Skylake 8180
Latency Test: 4 threads, warmup 30, iteration 500, batch size 1
Throughput Test: 56 threads, warmup 30, iteration 200, batch size 64_

† Shufflenet is one of the few models that require temp buffers during inference. The performance degradation is an expected issue since we no longer cache any buffer in the ideep. As for the solution, we suggest users opt for caching allocator like **jemalloc** as a drop-in replacement for system allocator in such heavy workloads.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32422

Test Plan:
Perf results: https://our.intern.facebook.com/intern/fblearner/details/177790608?tab=Experiment%20Results

10% improvement for ResNext with avx512, neutral on avx2

More results: https://fb.quip.com/ob10AL0bCDXW#NNNACAUoHJP

Reviewed By: yinghai

Differential Revision: D20381325

Pulled By: dzhulgakov

fbshipit-source-id: 803b906fd89ed8b723c5fcab55039efe3e4bcb77
2020-03-26 22:07:59 -07:00
Ailing Zhang
7580470cc5 Update view op list. (#35399)
Summary:
Adding ops to the list based on our discussion. :D
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35399

Differential Revision: D20651393

Pulled By: ailzhang

fbshipit-source-id: 8cf9026d10c0d74117953dbb68ebc2f537be956a
2020-03-25 16:15:00 -07:00
Yuichiro Ueno
aadd0fda8b Document reduce_scatter collective operation (#35274)
Summary:
I don't know why reduce_scatter collective operation is not documented so I add it to the document.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35274

Differential Revision: D20645850

Pulled By: mrshenli

fbshipit-source-id: 0a4458bff1a4e15a4593dd4dcc25e4e0f6e2265d
2020-03-25 13:36:29 -07:00
anjali411
c73e97033a Added type promotion logic for complex numbers (#34093)
Summary:
Issue: https://github.com/pytorch/pytorch/issues/33780
After this PR:
1. dtype promotion logic will correctly work for ops involving complex scalars
2. added alias for complex64 (cfloat) and complex128 (cdouble)
3. added an internal function get_complex_default_dtype (consciously not exposed in public API)
   - sets the default complex dtype to be double if default_dtype is set to double, else float https://github.com/pytorch/pytorch/pull/34093#discussion_r392350224
>>> 1j*torch.ones(2)
tensor([(0.0000 + 1.0000j), (0.0000 + 1.0000j)], dtype=torch.complex64)

>>> torch.set_default_dtype(torch.float64)
>>> 1j*torch.ones(2)
tensor([(0.0000 + 1.0000j), (0.0000 + 1.0000j)], dtype=torch.complex128)

>>> 1j + torch.ones(2)
tensor([(1.0000 + 1.0000j), (1.0000 + 1.0000j)], dtype=torch.complex128)

>>> torch.tensor(1j) + torch.ones(2,2)
tensor([[(1.0000 + 1.0000j), (1.0000 + 1.0000j)],
        [(1.0000 + 1.0000j), (1.0000 + 1.0000j)]], dtype=torch.complex128)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34093

Differential Revision: D20537125

Pulled By: anjali411

fbshipit-source-id: 05fb1f81b8ba039d0b698cdd2c0bbf8b0ce0b767
2020-03-25 09:12:21 -07:00
Michael Carilli
0f0271e255 [RELAND2] Eager autocasting, out-of-place ops only (with MSVC 2017 fix) (#35102)
Summary:
This is the second reland attempt for https://github.com/pytorch/pytorch/pull/32140.

The first reland attempt https://github.com/pytorch/pytorch/pull/35011 failed due a [small incompatible change](https://github.com/pytorch/pytorch/pull/35011#issuecomment-601754216) in recent master (`skipIfRocm` was removed from `test_data_parallel.py`).

The present PR restores skipIfRocm.

Description from first reland attempt https://github.com/pytorch/pytorch/pull/35011:

> https://github.com/pytorch/pytorch/pull/32140 was approved and merged, but [reverted](d0577e19f0) because it broke builds with versions of Visual Studio older than 15.8 that were not represented in public CI.  The build failures were caused by a [known VS bug](https://developercommunity.visualstudio.com/content/problem/27729/allow-function-with-internal-linkage-as-template-n.html), fixed in versions 15.8 and newer.
>
> The present PR reverts the revert (restoring https://github.com/pytorch/pytorch/pull/32140 's diffs) and adds a workaround to enable compilation with VS < 15.8.  The workaround isn't pretty, but it's guarded by macros such that it's only used when compiling with VS < 15.8.  All other builds compile with the same code/control flow as was merged in https://github.com/pytorch/pytorch/pull/32140.
>
> Original description of https://github.com/pytorch/pytorch/pull/32140:
> > Initial integration of eager autocasting, supporting out-of-place ops only for easier review.
> Relevant issue/RFC: https://github.com/pytorch/pytorch/issues/25081
>
> > In-place ops and ops with user-supplied out=... can certainly be supported as well (my initial WIP https://github.com/pytorch/pytorch/issues/29552 handled many) but require substantially more complex special casing in the autocasting backend and tests. Support for these ops (much of which has already been written) will be broken into later PRs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35102

Differential Revision: D20596918

Pulled By: ezyang

fbshipit-source-id: 60caa279bb0ce4a9bb0b28c1d585d42cf1cc7e50
2020-03-24 09:08:04 -07:00
Mike Ruberry
7c1ea736ba Extends true_divide to be a method (#34794)
Summary:
Per title. See related https://github.com/pytorch/pytorch/pull/34570.

In PyTorch 1.7 the plan is for torch.div and Python's division operator to perform "true" division, like Python 3, JAX, and NumPy. To facilitate this change, this PR expands true_divide to be a method so it can cover all of torch.div's use cases.

New true_divide tests are added to test_torch.py, test_type_promotion.py, and test_sparse.py.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34794

Differential Revision: D20545507

Pulled By: mruberry

fbshipit-source-id: 55286f819716c8823d1930441a69008560ac2bd5
2020-03-23 23:12:23 -07:00
Peter Bell
bd0ef784e0 FAQ: Add note about recovering from OOM (#35214)
Summary:
Closes https://github.com/pytorch/pytorch/issues/18853

This documents the workaround needed to solve the issues in https://github.com/pytorch/pytorch/issues/18853
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35214

Differential Revision: D20604877

Pulled By: ezyang

fbshipit-source-id: 71ed13cfa567d8e88fa9f18180a171cd174fb528
2020-03-23 20:22:46 -07:00
Vitaly Fedyunin
40da01db6a Add docs about memory format (#34818)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/34818

Test Plan: Imported from OSS

Differential Revision: D20601336

Pulled By: VitalyFedyunin

fbshipit-source-id: d34ad226be950bf134c6b383a4810ea6aa75599e
2020-03-23 15:06:33 -07:00
Jerry Zhang
3fa7813b9f [quant] Add dequantize.tensors (#34348)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34348

We need this function to do swap dequantize for prim::ListConstruct since
the output of prim::ListConstruct is a list of Tensors

Test Plan:
.

Imported from OSS

Differential Revision: D20504454

fbshipit-source-id: e6155e37da98e2219a6f79737cd46fe32a509c9f
2020-03-20 22:51:51 -07:00
Xiang Gao
df8d6eeb19 Update docs about DP and DDP for CUDA (#35063)
Summary:
We should recommend DDP instead of DP. Hope we can also cherry-pick this for 1.5
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35063

Differential Revision: D20549621

Pulled By: ngimel

fbshipit-source-id: 86b1b2134664065cc6070ea4212895f993eaf543
2020-03-20 20:06:37 -07:00
Mike Ruberry
fe276d541e Revert D20541921: [pytorch][PR] [RELAND] Eager autocasting, out-of-place ops only (with MSVC 2017 fix)
Test Plan: revert-hammer

Differential Revision:
D20541921

Original commit changeset: abb5488dca86

fbshipit-source-id: d2c6038978f80e5429632f8b49107090a8a247f4
2020-03-19 22:39:12 -07:00
Michael Carilli
991b97277a [RELAND] Eager autocasting, out-of-place ops only (with MSVC 2017 fix) (#35011)
Summary:
https://github.com/pytorch/pytorch/pull/32140 was approved and merged, but [reverted](d0577e19f0) because it broke builds with versions of Visual Studio older than 15.8 that were not represented in public CI.  The build failures were caused by a [known VS bug](https://developercommunity.visualstudio.com/content/problem/27729/allow-function-with-internal-linkage-as-template-n.html), fixed in versions 15.8 and newer.

The present PR reverts the revert (restoring https://github.com/pytorch/pytorch/pull/32140 's diffs) and adds a workaround to enable compilation with VS < 15.8.  The workaround isn't pretty, but it's guarded by macros such that it's only used when compiling with VS < 15.8.  All other builds compile with the same code/control flow as was merged in https://github.com/pytorch/pytorch/pull/32140.

Original description of https://github.com/pytorch/pytorch/pull/32140:
> Initial integration of eager autocasting, supporting out-of-place ops only for easier review.
Relevant issue/RFC: https://github.com/pytorch/pytorch/issues/25081

> In-place ops and ops with user-supplied out=... can certainly be supported as well (my initial WIP https://github.com/pytorch/pytorch/issues/29552 handled many) but require substantially more complex special casing in the autocasting backend and tests. Support for these ops (much of which has already been written) will be broken into later PRs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35011

Differential Revision: D20541921

Pulled By: ezyang

fbshipit-source-id: abb5488dca8620b0daac4306ebf2bb47fc36e4f5
2020-03-19 20:18:18 -07:00
albanD
1f4a4aaf64 functional autograd api (#34066)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34066

Basic implementation of https://github.com/pytorch/pytorch/issues/30632

Test Plan: Imported from OSS

Differential Revision: D20260307

Pulled By: albanD

fbshipit-source-id: 7db5c2411ddc3e954ff8fbbe93eb3b96a2bcfb2f
2020-03-19 08:24:07 -07:00
Mike Ruberry
9c4683e8e3 Revert D20312366: [pytorch][PR] Added type promotion logic for complex numbers
Test Plan: revert-hammer

Differential Revision:
D20312366

Original commit changeset: 90f00a1a916d

fbshipit-source-id: 4510739a888b2eec5d8a72e792998ac46da6d82a
2020-03-19 05:55:57 -07:00
anjali411
c8f665dcb6 Added type promotion logic for complex numbers (#34093)
Summary:
Issue: https://github.com/pytorch/pytorch/issues/33780
After this PR:
1. dtype promotion logic will correctly work for ops involving complex scalars
2. torch.ComplexFloatTensor, torch.ComplexDoubleTensor works
3. added alias for complex64 (cfloat) and complex128 (cdouble)
4. added an internal function get_complex_default_dtype (consciously not exposed in public API)

>>> 1j*torch.ones(2)
tensor([(0.0000 + 1.0000j), (0.0000 + 1.0000j)], dtype=torch.complex64)

>>> torch.set_default_dtype(torch.float64)
>>> 1j*torch.ones(2)
tensor([(0.0000 + 1.0000j), (0.0000 + 1.0000j)], dtype=torch.complex128)

>>> 1j + torch.ones(2)
tensor([(1.0000 + 1.0000j), (1.0000 + 1.0000j)], dtype=torch.complex128)

>>> torch.tensor(1j) + torch.ones(2,2)
tensor([[(1.0000 + 1.0000j), (1.0000 + 1.0000j)],
        [(1.0000 + 1.0000j), (1.0000 + 1.0000j)]], dtype=torch.complex128)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34093

Differential Revision: D20312366

Pulled By: anjali411

fbshipit-source-id: 90f00a1a916d9c8eeda101eb6e9d250fce569815
2020-03-18 23:36:13 -07:00
Mike Ruberry
3b7e1cd2cc Makes floor_divide a method, adds sparse floor division (#34552)
Summary:
(Updated per review feedback)

`torch.floor_divide` is currently a function that can operate on two tensors or a tensor and a scalar (scalar x scalar floor division is handled natively by Python and the JIT has a builtin function for it). This PR updates it to:

- have an out variant: `floor_divide(x, y, out=z)`
- be a method on a tensor: `x.floor_divide(y)`
- have an in-place variant: `x.floor_divide_(y)`
- work with sparse tensors

Tests are added to test_sparse.py and test_torch.py for these new behaviors.

In addition, this PR:

- cleans up the existing sparse division and true_division code and improves their error message
- adds testing of sparse true_division to test_sparse.py
- extends existing floor_divide testing in test_torch to run on CUDA, too, not just the CPU

Unfortunately, making floor_divide a method requires breaking backwards compatibility, and floor_divide has been added to the BC whitelist since this is international. The BC issue is that the first parameter name to torch.floor_divide is changing from input to self. If you previously called torch.floor_divide with keyword arguments, e.g. torch.floor_divide(input=x, other=y), you will need to update to torch.floor_divide(self=x, other=y), or the more common torch.floor_divide(x, y).

The intent of this PR is to allow floor_divide to be substituted for division (torch.div, /) wherever division was previously used. In 1.6 we expect torch.div to perform true_division, and floor_divide is how users can continue to perform integer division with tensors.

There are two potential follow-up issues suggested by this PR:

- the test framework might benefit from additional tensor construction classes, like one to create dividends and divisors for multiple dtypes
- the test framework might benefit from a universal function test class. while methods have reasonable coverage as part of test_torch.py's TestTensorOp tests, function coverage is spotty. Universal functions are similar enough it should be possible to generate tests for them.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34552

Differential Revision: D20509850

Pulled By: mruberry

fbshipit-source-id: 2cd3c828aad67191c77f2ed8470411e246f604f8
2020-03-18 15:00:53 -07:00
Edward Yang
d0577e19f0 Revert D20346700: [pytorch][PR] Eager autocasting, out-of-place ops only
Test Plan: revert-hammer

Differential Revision:
D20346700

Original commit changeset: 12d77b391731

fbshipit-source-id: 108d72bf24232f443c0be293ec932c0c478d6a60
2020-03-18 11:42:51 -07:00
Michael Carilli
aaa8f02156 Eager autocasting, out-of-place ops only (#32140)
Summary:
Initial integration of eager autocasting, supporting out-of-place ops only for easier review.
Relevant issue/RFC: https://github.com/pytorch/pytorch/issues/25081

In-place ops and ops with user-supplied `out=...` can certainly be supported as well (my initial WIP https://github.com/pytorch/pytorch/pull/29552 handled many) but require substantially more complex special casing in the autocasting backend and tests.  Support for these ops (much of which has already been written) will be broken into later PRs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32140

Differential Revision: D20346700

Pulled By: ezyang

fbshipit-source-id: 12d77b3917310186fbddf11c59b2794dc859131f
2020-03-18 10:28:21 -07:00
Mike Ruberry
a1eaaea288 Revert D20497453: [pytorch][PR] Makes floor_divide a method, adds sparse floor division
Test Plan: revert-hammer

Differential Revision:
D20497453

Original commit changeset: ac326f2007d8

fbshipit-source-id: b94b89b1a25521506e3d0a6b072d3d4d8c55e63d
2020-03-18 01:48:50 -07:00
Mike Ruberry
b7129050e7 Makes floor_divide a method, adds sparse floor division (#34552)
Summary:
(Updated per review feedback)

`torch.floor_divide` is currently a function that can operate on two tensors or a tensor and a scalar (scalar x scalar floor division is handled natively by Python and the JIT has a builtin function for it). This PR updates it to:

- have an out variant: `floor_divide(x, y, out=z)`
- be a method on a tensor: `x.floor_divide(y)`
- have an in-place variant: `x.floor_divide_(y)`
- work with sparse tensors

Tests are added to test_sparse.py and test_torch.py for these new behaviors.

In addition, this PR:

- cleans up the existing sparse division and true_division code and improves their error message
- adds testing of sparse true_division to test_sparse.py
- extends existing floor_divide testing in test_torch to run on CUDA, too, not just the CPU

Unfortunately, making floor_divide a method requires breaking backwards compatibility, and floor_divide has been added to the BC whitelist since this is international. The BC issue is that the first parameter name to torch.floor_divide is changing from input to self. If you previously called torch.floor_divide with keyword arguments, e.g. torch.floor_divide(input=x, other=y), you will need to update to torch.floor_divide(self=x, other=y), or the more common torch.floor_divide(x, y).

The intent of this PR is to allow floor_divide to be substituted for division (torch.div, /) wherever division was previously used. In 1.6 we expect torch.div to perform true_division, and floor_divide is how users can continue to perform integer division with tensors.

There are two potential follow-up issues suggested by this PR:

- the test framework might benefit from additional tensor construction classes, like one to create dividends and divisors for multiple dtypes
- the test framework might benefit from a universal function test class. while methods have reasonable coverage as part of test_torch.py's TestTensorOp tests, function coverage is spotty. Universal functions are similar enough it should be possible to generate tests for them.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34552

Differential Revision: D20497453

Pulled By: mruberry

fbshipit-source-id: ac326f2007d8894f730d1278fef84d63bcb07b5d
2020-03-18 00:01:45 -07:00
Shen Li
3c48aadd98 Update descriptions for transmitting CUDA tensors (#34888)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/34888

Test Plan: Imported from OSS

Differential Revision: D20491408

Pulled By: mrshenli

fbshipit-source-id: 4ca35ac9edd4c1af4f2bae2cfb0f1f6060658d5c
2020-03-17 17:43:48 -07:00
Shen Li
800bdcf000 Removing experimental tag in for RPC and adding experimental tag for RPC+TorchScript (#34887)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/34887

Test Plan: Imported from OSS

Differential Revision: D20491409

Pulled By: mrshenli

fbshipit-source-id: ce79c9706eb70a3a52a4032de4f0bd538b694332
2020-03-17 17:43:42 -07:00
Hameer Abbasi
6b701de130 Add types argument to __torch_function__ (#34303)
Summary:
This PR adds the `types` argument to `__torch_function__` as per RFC 0001: https://github.com/pytorch/rfcs/pull/3
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34303

Differential Revision: D20474992

Pulled By: ezyang

fbshipit-source-id: cdd40b3b38f3bda4ece8812a629f5db87e919d01
2020-03-17 13:32:00 -07:00
Pearu Peterson
8bae1ed144 PCA and SVD for low-rank matrices, LOBPCG for positive-defined generalized eigenvalue problem - copy (#34721)
Summary:
This is a copy of PR https://github.com/pytorch/pytorch/issues/29488 to help the merging process.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34721

Differential Revision: D20444270

Pulled By: vincentqb

fbshipit-source-id: 042c56c8c0dae37834f52b4aee2deae7dd6fa659
2020-03-16 14:13:30 -07:00
Rohan Varma
fd35596585 [docs][1.5] Update distributed autograd note (#34657)
Summary:
- Update API calls `backward` and `optim.step` now that we require `context_id`
- Add notes to clarify purpose of distributed autograd context (this was a source of confusion in some feedback)
- Add note that details why optimizer requires context_id
- Clearly specify that we don't have SMART mode yet
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34657

Differential Revision: D20427667

Pulled By: rohan-varma

fbshipit-source-id: 5f8a3539ccf648a78e9e9a0dfdfe389c678b1606
2020-03-12 22:56:32 -07:00
gabloa
a74fbea345 Continuous bernoulli distribution (take 2) (#34619)
Summary:
We recently had a NeurIPS paper (https://arxiv.org/abs/1907.06845 and https://papers.nips.cc/paper/9484-the-continuous-bernoulli-fixing-a-pervasive-error-in-variational-autoencoders) where we introduce a new [0,1]-supported distribution: the continuous Bernoulli. This pull request implements this distribution in pytorch.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34619

Differential Revision: D20403123

Pulled By: ngimel

fbshipit-source-id: d807c7d0d372c6daf6cb6ef09df178bc7491abb2
2020-03-12 11:53:18 -07:00
Nathan Goldbaum
3f1ba3c465 Redo of "Add API for listing functions overridable by __torch_function__" (#34240)
Summary:
This is a redo of https://github.com/pytorch/pytorch/pull/33791, which was reverted because it introduced a flaky test. The test was flaky and only flaky on Python3.5 because of dict order randomization.

I've fixed the issue with tests clobbering each other in b539fec and removed the override tests for `torch.nn.functional.tanh` and `torch.nn.functional.sigmoid`, which are deprecated and shouldn't be overridable in e0d7402. I also verified that no more test clobbering is happening.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34240

Differential Revision: D20252442

Pulled By: cpuhrsch

fbshipit-source-id: 069568e342a41c90e1dc76cbf85ba4aed47f24be
2020-03-12 10:33:17 -07:00
Michael Suo
c235be42dd [jit] kill script namespace (#34515)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34515

Once upon a time we thought this was necessary. In reality it is not, so
removing it.

For backcompat, our public interface (defined in `api/`) still has
typedefs to the old `script::` names.

There was only one collision: `Pass` as a `Stmt` and `Pass` as a graph
transform. I renamed one of them.

Test Plan: Imported from OSS

Differential Revision: D20353503

Pulled By: suo

fbshipit-source-id: 48bb911ce75120a8c9e0c6fb65262ef775dfba93
2020-03-11 23:32:48 -07:00
Samuel
b039bca4db Fix typo in data.rst (#34624)
Summary:
Fix minor typo
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34624

Differential Revision: D20401946

Pulled By: ngimel

fbshipit-source-id: 0c6a7d838aa15120b3ecb8b9ba4b57550c9bcd32
2020-03-11 19:40:18 -07:00
Edward Yang
4b929e5466 Revert D20193196: [pytorch][PR] PCA and SVD for low-rank matrices, LOBPCG for positive-defined generalized eigenvalue problem
Test Plan: revert-hammer

Differential Revision:
D20193196

Original commit changeset: 78a487991242

fbshipit-source-id: 8da4f8cb17c45af41e8c0ce80bc72581eb10dbb8
2020-03-11 09:24:34 -07:00
Pearu Peterson
2ec779d46c PCA and SVD for low-rank matrices, LOBPCG for positive-defined generalized eigenvalue problem (#29488)
Summary:
This PR implements the following linear algebra algorithms for low-rank matrices:
- [x] Approximate `A` as `Q Q^H A` - using Algorithm 4.4 from [Halko et al, 2009](http://arxiv.org/abs/0909.4061).
  + exposed as `torch.lowrank.get_approximate_basis(A, q, niter=2, M=None) -> Q`
  + [x] dense matrices
  + [x] batches of dense matrices
  + [x] sparse matrices
  + [x] documentation
- [x] SVD - using Algorithm 5.1 from [Halko et al, 2009](http://arxiv.org/abs/0909.4061).
  + uses `torch.lowrank.get_approximate_basis`
  + exposed as `torch.svd_lowrank(A, q=6, niter=2, M=None) -> (U, S, V)`
  + [x] dense matrices
  + [x] batches of dense matrices
  + [x] sparse matrices
  + [x] documentation
- [x] PCA - using `torch.svd_lowrank`
  + uses `torch.svd_lowrank`
  + exposed as `torch.pca_lowrank(A, center=True, q=None, niter=2) -> (U, S, V)`
  + [x] dense matrices
  + [x] batches of dense matrices
  + [x] sparse matrices, uses non-centered sparse matrix algorithm
  + [x] documentation
- [x] generalized eigenvalue solver using the original LOBPCG algorithm [Knyazev, 2001](https://epubs.siam.org/doi/abs/10.1137/S1064827500366124)
  + exposed as `torch.lobpcg(A, B=None, k=1, method="basic", ...)`
  + [x] dense matrices
  + [x] batches of dense matrices
  + [x] sparse matrices
  + [x] documentation
- [x] generalized eigenvalue solver using robust LOBPCG with orthogonal basis selection [Stathopoulos, 2002](https://epubs.siam.org/doi/10.1137/S1064827500370883)
  + exposed as `torch.lobpcg(A, B=None, k=1, method="ortho", ...)`
  + [x] dense matrices
  + [x] batches of dense matrices
  + [x] sparse matrices
  + [x] documentation
- [x] generalized eigenvalue solver using the robust and efficient LOBPCG Algorithm 8 from [Duersch et al, 2018](https://epubs.siam.org/doi/abs/10.1137/17M1129830) that switches to orthogonal basis selection automatically
  + the "ortho" method improves iterations so rapidly that in the current test cases it does not make sense to use the basic iterations at all. If users will have matrices for which basic iterations could improve convergence then the `tracker` argument allows breaking the iteration process at user choice so that the user can switch to the orthogonal basis selection if needed. In conclusion, there is no need to implement Algorithm 8 at this point.
- [x] benchmarks
  + [x] `torch.svd` vs `torch.svd_lowrank`, see notebook [Low-rank SVD](https://github.com/Quansight/pearu-sandbox/blob/master/pytorch/Low-rank%20SVD.ipynb). In conclusion, the low-rank SVD is going to be useful only for large sparse matrices where the full-rank SVD will fail due to memory limitations.
  + [x] `torch.lobpcg` vs `scipy.sparse.linalg.lobpcg`, see notebook [LOBPCG - pytorch vs scipy](https://github.com/Quansight/pearu-sandbox/blob/master/pytorch/LOBPCG%20-%20pytorch%20vs%20scipy.ipynb). In conculsion, both implementations give the same results (up to numerical errors from different methods), scipy lobpcg implementation is generally faster.
  + [x] On very small tolerance cases, `torch.lobpcg` is more robust than `scipy.sparse.linalg.lobpcg` (see `test_lobpcg_scipy` results)

Resolves https://github.com/pytorch/pytorch/issues/8049.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29488

Differential Revision: D20193196

Pulled By: vincentqb

fbshipit-source-id: 78a4879912424595e6ea95a95e483a37487a907e
2020-03-11 07:33:49 -07:00
Mike Ruberry
3671036ef3 Adds true_divide function, analogous to Python 's, JAX's, NumPy's (true) division (#34236)
Summary:
See NumPy's division documentation here: https://numpy.org/doc/1.18/reference/generated/numpy.divide.html#numpy.divide.

True division is the same as PyTorch's default division except when both inputs are integer or bool tensors. In the latter case the inputs are (conceptually) cast to the default floating type before the division is performed.

The function is implemented for dense and sparse tensors and supports exporting to ONNX from PyTorch's eager mode or JIT traces. The function is inherently incompatible with exporting to ONNX via JIT script, and is another datapoint suggesting we should deprecate exporting scripted graphs to ONNX.

Tests are added for the type promotion, named tensor, and ONNX export behavior.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34236

Reviewed By: houseroad

Differential Revision: D20334087

Pulled By: mruberry

fbshipit-source-id: 83d00d886f46f713215d7d9e02ffd043164c57f1
2020-03-09 21:06:33 -07:00