Commit Graph

187 Commits

Author SHA1 Message Date
Michael Carilli
3e6bb5233f Reference amp tutorial (recipe) from core amp docs (#44725)
Summary:
https://pytorch.org/tutorials/recipes/recipes/amp_recipe.html is live.  Core amp docs should reference it.

Also, I fixed some typos in the `zero_grad` docs that we missed when git was behaving weirdly during ngimel's merge of https://github.com/pytorch/pytorch/pull/44423.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44725

Reviewed By: mruberry

Differential Revision: D23723807

Pulled By: ngimel

fbshipit-source-id: ca0b76365f8ca908bd978e3b38bf81857fa6c2a3
2020-09-16 11:37:58 -07:00
Michael Carilli
2fd142a2ef Small clarification to amp gradient penalty example (#44667)
Summary:
requested by https://discuss.pytorch.org/t/what-is-the-correct-way-of-computing-a-grad-penalty-using-amp/95827/3

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44667

Reviewed By: mruberry

Differential Revision: D23692768

Pulled By: ngimel

fbshipit-source-id: 83c61b94e79ef9f86abed2cc066f188dce0c8456
2020-09-14 21:56:09 -07:00
Gao, Xiang
5e97f251a8 Enable TF32 support for cuDNN (#40737)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/40737

Reviewed By: mruberry

Differential Revision: D22801525

Pulled By: ngimel

fbshipit-source-id: ac7f7e728b4b3e01925337e8c9996f26a6433fd2
2020-09-01 15:34:24 -07:00
Kurt Mohler
d7ee84c9b5 Update determinism documentation (#41692)
Summary:
Add user-facing documentation for `set_deterministic`.
Also update grammar and readability in the Reproducibility page.

Issue https://github.com/pytorch/pytorch/issues/15359
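
A minimal usage sketch of the API this documentation covers, assuming the `torch.set_deterministic` / `torch.is_deterministic` names from this era (later releases renamed these to `torch.use_deterministic_algorithms` / `torch.are_deterministic_algorithms_enabled`):

```python
import torch

# Ask PyTorch to raise an error when an op without a deterministic
# implementation is used, instead of silently running nondeterministically.
torch.set_deterministic(True)
print(torch.is_deterministic())  # True
```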

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41692

Reviewed By: ailzhang

Differential Revision: D23433061

Pulled By: mruberry

fbshipit-source-id: 4c4552950803c2aaf80f7bb4792d2095706d07cf
2020-08-31 21:06:24 -07:00
peterjc123
9b05fbd92e Correct the windows docs (#43479)
Summary:
Fixes https://discuss.pytorch.org/t/i-cannot-use-the-pytorch-that-was-built-successfully-from-source-dll-initialization-routine-failed-error-loading-caffe2-detectron-ops-gpu-dll/93243/5?u=peterjc123.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43479

Reviewed By: mrshenli, ngimel

Differential Revision: D23294211

Pulled By: ezyang

fbshipit-source-id: d67df7d0355c2783153d780c94f959758b246d36
2020-08-25 13:41:24 -07:00
Heitor Schueroff de Souza
ffc3da35f4 Don't materialize output grads (#41821)
Summary:
Added a new option in AutogradContext to tell autograd not to materialize output grad tensors, that is, not to expand undefined/None tensors into tensors full of zeros before passing them as inputs to the backward function.

This PR is the second part that closes https://github.com/pytorch/pytorch/issues/41359. The first PR is https://github.com/pytorch/pytorch/pull/41490.
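
A minimal sketch of the option described above, assuming the Python-side `ctx.set_materialize_grads` API from the companion PR; the `ScaleOnly` function name is made up for illustration:

```python
import torch

class ScaleOnly(torch.autograd.Function):
    @staticmethod
    def forward(ctx, a, b):
        ctx.set_materialize_grads(False)  # don't expand undefined grads into zeros
        return a * 2, b * 3

    @staticmethod
    def backward(ctx, grad_a, grad_b):
        # With materialization off, gradients of unused outputs arrive as None
        # instead of zero-filled tensors, so backward must handle None explicitly.
        ga = grad_a * 2 if grad_a is not None else None
        gb = grad_b * 3 if grad_b is not None else None
        return ga, gb

a = torch.randn(3, requires_grad=True)
b = torch.randn(3, requires_grad=True)
out_a, out_b = ScaleOnly.apply(a, b)
out_a.sum().backward()  # the grad for the second output is never materialized
```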

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41821

Reviewed By: albanD

Differential Revision: D22693163

Pulled By: heitorschueroff

fbshipit-source-id: a8d060405a17ab1280a8506a06a2bbd85cb86461
2020-08-11 04:27:07 -07:00
Hameer Abbasi
3d46e02ea1 Add __torch_function__ for methods (#37091)
Summary:
According to pytorch/rfcs#3

From the goals in the RFC:

1. Support subclassing `torch.Tensor` in Python (done here)
2. Preserve `torch.Tensor` subclasses when calling `torch` functions on them (done here)
3. Use the PyTorch API with `torch.Tensor`-like objects that are _not_ `torch.Tensor`
   subclasses (done in https://github.com/pytorch/pytorch/issues/30730)
4. Preserve `torch.Tensor` subclasses when calling `torch.Tensor` methods. (done here)
5. Propagating subclass instances correctly also with operators, using
   views/slices/indexing/etc. (done here)
6. Preserve subclass attributes when using methods or views/slices/indexing. (done here)
7. A way to insert code that operates on both functions and methods uniformly
   (so we can write a single function that overrides all operators). (done here)
8. The ability to give external libraries a way to also define
   functions/methods that follow the `__torch_function__` protocol. (will be addressed in a separate PR)

This PR makes the following changes:

1. Adds the `self` argument to the arg parser.
2. Dispatches on `self` as well if `self` is not `nullptr`.
3. Adds a `torch._C.DisableTorchFunction` context manager to disable `__torch_function__`.
4. Adds a `torch::torch_function_enabled()` and `torch._C._torch_function_enabled()` to check the state of `__torch_function__`.
5. Dispatches all `torch._C.TensorBase` and `torch.Tensor` methods via `__torch_function__`.

TODO:

- [x] Sequence Methods
- [x] Docs
- [x] Tests

Closes https://github.com/pytorch/pytorch/issues/28361

Benchmarks in https://github.com/pytorch/pytorch/pull/37091#issuecomment-633657778
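
A minimal sketch of the subclassing behavior these changes enable; `LoggingTensor` is a hypothetical subclass, and the classmethod form follows the protocol described in the RFC:

```python
import torch

class LoggingTensor(torch.Tensor):
    @classmethod
    def __torch_function__(cls, func, types, args=(), kwargs=None):
        if kwargs is None:
            kwargs = {}
        print("dispatched:", func)  # intercepts functions *and* methods
        return super().__torch_function__(func, types, args, kwargs)

x = LoggingTensor([1.0, 2.0, 3.0])
y = (x * 2)[1:]   # operator and indexing both go through __torch_function__
print(type(y))    # subclass is preserved: <class '__main__.LoggingTensor'>
```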

Pull Request resolved: https://github.com/pytorch/pytorch/pull/37091

Reviewed By: ngimel

Differential Revision: D22765678

Pulled By: ezyang

fbshipit-source-id: 53f8aa17ddb8b1108c0997f6a7aa13cb5be73de0
2020-08-05 20:44:13 -07:00
peter
192487d716 Update MAGMA to 2.5.3 for Windows (#42410)
Summary:
In order to introduce CUDA 11 build jobs.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42410

Reviewed By: malfet

Differential Revision: D22892025

Pulled By: ezyang

fbshipit-source-id: 11bd7507f623d654a589ba00a138f6b947990f4c
2020-08-03 07:43:09 -07:00
Xiao Wang
60e2baf5e0 [doc] Add LSTM non-deterministic workaround (#40893)
Summary:
Related: https://github.com/pytorch/pytorch/issues/35661

Preview
![image](https://user-images.githubusercontent.com/24860335/86861581-4b4c7100-c07c-11ea-950a-3145bfae9af9.png)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/40893

Reviewed By: vincentqb

Differential Revision: D22535418

Pulled By: ngimel

fbshipit-source-id: f194ddaff8ec6d03a3616c87466e2cbbe7e429a9
2020-07-21 16:20:02 -07:00
Mike Ruberry
a0e58996fb Makes the use of the term "module" consistent through the serialization note (#41563)
Summary:
module -> torch.nn.Module or ScriptModule, as appropriate. + bonus grammar fix.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41563

Reviewed By: gchanan

Differential Revision: D22584173

Pulled By: mruberry

fbshipit-source-id: 8c90f1f9a194bfdb277c97cf02c9b8c1c6ddc601
2020-07-16 14:59:49 -07:00
Mike Ruberry
f49d97a848 Notes for lcm and gcd, formatting doc fixes (#41526)
Summary:
A small PR fixing some formatting in lcm, gcd, and the serialization note. Adds a note to lcm and gcd explaining behavior that is not always defined.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41526

Reviewed By: ngimel

Differential Revision: D22569341

Pulled By: mruberry

fbshipit-source-id: 5f5ff98c0831f65e82b991ef444a5cee8e3c8b5a
2020-07-16 13:15:29 -07:00
anjali411
b9442bb03e Doc note for complex (#41252)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/41252

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D22553266

Pulled By: anjali411

fbshipit-source-id: f6dc409da048496d72b29b0976dfd3dd6645bc4d
2020-07-16 08:53:27 -07:00
Xiang Gao
23174ca71b [reland] Enable TF32 support for cuBLAS (#41498)
Summary:
fix rocm

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41498

Reviewed By: mruberry

Differential Revision: D22560572

Pulled By: ngimel

fbshipit-source-id: 5ee79e96cb29e70d9180830d058efb53d1c6c041
2020-07-15 21:00:55 -07:00
Mike Ruberry
60f2fa6a84 Updates serialization note to explain versioned symbols and dynamic versioning (#41395)
Summary:
Doc update intended to clarify and expand our documentation of current serialization behavior, including explaining the differences between torch.save/torch.load, torch.nn.Module.state_dict/torch.nn.Module.load_state_dict, and torch.jit.save/torch.jit.load. Also explains, for the time being, when historic serialized TorchScript behavior is preserved, and our recommendation for preserving behavior (consuming a model with the same PyTorch version that produced it).
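
A minimal sketch contrasting the three API pairs the note discusses (file names are placeholders):

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 2)

# torch.save / torch.load: pickles arbitrary Python objects, here the whole module
torch.save(model, "model.pt")
restored = torch.load("model.pt")

# state_dict / load_state_dict: save only parameters and buffers (recommended)
torch.save(model.state_dict(), "weights.pt")
model.load_state_dict(torch.load("weights.pt"))

# torch.jit.save / torch.jit.load: serialize a ScriptModule, loadable without the Python class
scripted = torch.jit.script(model)
torch.jit.save(scripted, "scripted.pt")
restored_scripted = torch.jit.load("scripted.pt")
```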

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41395

Reviewed By: ngimel

Differential Revision: D22560538

Pulled By: mruberry

fbshipit-source-id: dbc2f1bb92ab61ff2eca4888febc21f7dda76ba1
2020-07-15 19:05:19 -07:00
Shen Li
3a63a939d4 Revert D22517785: [pytorch][PR] Enable TF32 support for cuBLAS
Test Plan: revert-hammer

Differential Revision:
D22517785 (288ece89e1)

Original commit changeset: 87334c893561

fbshipit-source-id: 0a0674f49c1bcfc98f7f88af5a8c7de93b76e458
2020-07-15 08:15:48 -07:00
Xiang Gao
288ece89e1 Enable TF32 support for cuBLAS (#40800)
Summary:
Benchmark on a fully connected network and torchvision models (time in seconds) on GA100:

| model              | batch size | forward(TF32) | forward(FP32) | backward(TF32) | backward(FP32) |
|--------------------|------------|---------------|---------------|----------------|----------------|
| FC 512-128-32-8    | 512        | 0.000211      | 0.000321      | 0.000499       | 0.000532       |
| alexnet            | 512        | 0.0184        | 0.0255        | 0.0486         | 0.0709         |
| densenet161        | 128        | 0.0665        | 0.204         | 0.108          | 0.437          |
| googlenet          | 256        | 0.0925        | 0.110         | 0.269          | 0.326          |
| inception_v3       | 256        | 0.155         | 0.214         | 0.391          | 0.510          |
| mnasnet1_0         | 512        | 0.108         | 0.137         | 0.298          | 0.312          |
| mobilenet_v2       | 512        | 0.114         | 0.294         | 0.133          | 0.303          |
| resnet18           | 512        | 0.0722        | 0.100         | 0.182          | 0.228          |
| resnext50_32x4d    | 256        | 0.170         | 0.237         | 0.373          | 0.479          |
| shufflenet_v2_x1_0 | 512        | 0.0463        | 0.0473        | 0.125          | 0.123          |
| squeezenet1_0      | 512        | 0.0870        | 0.0948        | 0.205          | 0.214          |
| vgg16              | 256        | 0.167         | 0.234         | 0.401          | 0.502          |
| wide_resnet50_2    | 512        | 0.186         | 0.310         | 0.415          | 0.638          |
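
A minimal sketch of how TF32 is toggled from Python, assuming the `allow_tf32` flags introduced by this PR and its cuDNN counterpart (an Ampere-class GPU is required for TF32 to actually take effect):

```python
import torch

torch.backends.cuda.matmul.allow_tf32 = True   # TF32 for cuBLAS matmuls
torch.backends.cudnn.allow_tf32 = True         # TF32 for cuDNN convolutions

a = torch.randn(1024, 1024, device="cuda")
b = torch.randn(1024, 1024, device="cuda")
c = a @ b   # runs as a TF32 matmul when the hardware supports it
```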

Pull Request resolved: https://github.com/pytorch/pytorch/pull/40800

Reviewed By: mruberry

Differential Revision: D22517785

Pulled By: ngimel

fbshipit-source-id: 87334c8935616f72a6af5abbd3ae69f76923dc3e
2020-07-14 13:21:10 -07:00
Michael Carilli
d927aee312 Small clarification of torch.cuda.amp multi-model example (#41203)
Summary:
Some people have been confused by `retain_graph` in the snippet; they thought it was an additional requirement imposed by amp.
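
A minimal self-contained sketch of the pattern the clarification is about (model and optimizer names are made up): `retain_graph=True` is needed only because both losses backpropagate through the shared features, not because of amp.

```python
import torch
import torch.nn as nn

dev = "cuda"
shared = nn.Linear(8, 8).to(dev)
head0, head1 = nn.Linear(8, 1).to(dev), nn.Linear(8, 1).to(dev)
opt0 = torch.optim.SGD(list(shared.parameters()) + list(head0.parameters()), lr=0.1)
opt1 = torch.optim.SGD(head1.parameters(), lr=0.1)
scaler = torch.cuda.amp.GradScaler()

x = torch.randn(4, 8, device=dev)
with torch.cuda.amp.autocast():
    feat = shared(x)
    loss0 = head0(feat).mean()
    loss1 = head1(feat).mean()

scaler.scale(loss0).backward(retain_graph=True)  # graph reused by loss1's backward
scaler.scale(loss1).backward()
scaler.step(opt0)
scaler.step(opt1)
scaler.update()
```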

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41203

Differential Revision: D22463700

Pulled By: ngimel

fbshipit-source-id: e6fc8871be2bf0ecc1794b1c6f5ea99af922bf7e
2020-07-10 11:13:26 -07:00
anjali411
db38487ece Autograd Doc for Complex Numbers (#41012)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/41012

Test Plan: Imported from OSS

Differential Revision: D22476911

Pulled By: anjali411

fbshipit-source-id: 7da20cb4312a0465272bebe053520d9911475828
2020-07-10 09:57:43 -07:00
Edward Leardi
6b50874cb7 Fix HTTP links in documentation to HTTPS (#40878)
Summary:
I ran `make linkcheck` using `sphinx.builders.linkcheck` on the documentation and noticed a few links weren't using HTTPS so I quickly updated them all.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/40878

Differential Revision: D22404647

Pulled By: ngimel

fbshipit-source-id: 9c9756db59197304023fddc28f252314f6cf4af3
2020-07-06 20:05:21 -07:00
Ailing Zhang
d7cd16858f Add documentation about storage sharing is preserved and serialized f… (#40412)
Summary:
…ile size.
Fixes https://github.com/pytorch/pytorch/issues/40157
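
A minimal sketch of the behavior the added documentation describes: tensors that share storage keep sharing it after a save/load round trip, and the shared storage is written out once (which is what determines the serialized file size).

```python
import torch

base = torch.zeros(1000)
view = base[:10]                 # shares storage with `base`

torch.save([base, view], "tensors.pt")
loaded_base, loaded_view = torch.load("tensors.pt")

loaded_view[0] = 1.0
print(loaded_base[0].item())     # 1.0 -- the sharing survived serialization
```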
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40412

Reviewed By: ezyang

Differential Revision: D22265639

Pulled By: ailzhang

fbshipit-source-id: 16b0301f16038bd784e7e92f63253fedc7820adc
2020-06-29 17:23:29 -07:00
Jeong Ukjae
b4db529352 Fix wrong link in docs/source/notes/ddp.rst (#40484)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/40484

Differential Revision: D22259834

Pulled By: mrshenli

fbshipit-source-id: 4ec912c600c81010bdb2778c35cbb0321480199f
2020-06-28 13:55:56 -07:00
Wanchao Liang
eebd492dcf [doc] fix autograd doc subsubsection display issue (#40582)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40582

There's a misuse of `~~~~~~` under `requires_grad`; `~~~~` is not an official section marker. Change it to `^^^^^` to denote subsubsections, and also fix the other places where we should use the subsection marker `-----` instead of the subsubsection marker `^^^^`.

see https://www.sphinx-doc.org/en/master/usage/restructuredtext/basics.html#sections

Before:
<img width="712" alt="rst_before" src="https://user-images.githubusercontent.com/9443650/85789835-2226fa80-b6e4-11ea-97b6-2b19fdf324a4.png">
After:
<img width="922" alt="rst_after" src="https://user-images.githubusercontent.com/9443650/85789856-281cdb80-b6e4-11ea-925f-cb3f4ebaa2bf.png">

Test Plan: Imported from OSS

Differential Revision: D22245747

Pulled By: wanchaol

fbshipit-source-id: 11548ed42f627706863bb74d4269827d1b3450d4
2020-06-25 23:28:33 -07:00
Michael Carilli
3b040c478a Make custom_fwd a no-op when not executed under autocast (#36171)
Summary:
Currently, a custom autograd function written with
```
@torch.cuda.amp.custom_fwd(cast_inputs=dtype)
def forward(ctx, *args):
    ...
```
casts incoming floating-point CUDA tensors to `dtype` unconditionally, regardless of whether the function executes in an autocast-enabled region.  I think I had the wrong idea there.  Autocast-disabled regions should give the user control of input types.  Also, `custom_fwd(cast_inputs=dtype)`-decorated functions' behavior should align with native fp32list/fp16list functions.  C++-side casting wrappers have no effect when autocast is disabled, and  `custom_fwd`'s casting should behave the same way.

The present PR changes `custom_fwd` so it only casts in autocast-enabled regions (also updates custom_fwd to ignore fp64 inputs, like the C++ wrappers).
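
A minimal sketch of the decorator usage after this change; `MyMM` is a made-up example op. Inside an autocast-enabled region, floating-point CUDA inputs are cast to `cast_inputs`, while outside autocast the decorators are no-ops:

```python
import torch
from torch.cuda.amp import custom_fwd, custom_bwd

class MyMM(torch.autograd.Function):
    @staticmethod
    @custom_fwd(cast_inputs=torch.float32)
    def forward(ctx, a, b):
        ctx.save_for_backward(a, b)
        return a.mm(b)

    @staticmethod
    @custom_bwd
    def backward(ctx, grad):
        a, b = ctx.saved_tensors
        return grad.mm(b.t()), a.t().mm(grad)

a = torch.randn(4, 4, device="cuda", requires_grad=True)
b = torch.randn(4, 4, device="cuda", requires_grad=True)
with torch.cuda.amp.autocast():
    out = MyMM.apply(a, b)   # inputs are cast per cast_inputs inside autocast
out.sum().backward()
```
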
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36171

Differential Revision: D22179511

Pulled By: ngimel

fbshipit-source-id: 5a93d070179a43206066bce19da0a5a19ecaabbd
2020-06-23 10:23:02 -07:00
Rohan Varma
ae2f1f0372 [DDP Note] Remove refs to RoundRobin PG until we officially support it (#40380)
Summary:
Removes the line mentioning `ProcessGroupRoundRobin` since we don't intend it to be used as a public API just yet. We can add this back when we officially support the API.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40380

Differential Revision: D22165556

Pulled By: rohan-varma

fbshipit-source-id: 24d0477d881dc74f2ff579de61dfd1ced2b09e75
2020-06-22 16:19:29 -07:00
anjali411
8ec2ae9a9f Add view_as_real, view_as_complex for complex tensors (#39099)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/39099

Test Plan: Imported from OSS

Differential Revision: D22057886

Pulled By: anjali411

fbshipit-source-id: bad5ba7097ba0dd13f2c549b2463094dee9afa14
2020-06-22 15:15:27 -07:00
James Reed
c73095e78f Add note to serialization docs about zipfile format (#40288)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/40288

Test Plan: Imported from OSS

Differential Revision: D22140324

Pulled By: jamesr66a

fbshipit-source-id: 01d7aa642ed2f4e4bdac4b7f3223bf4d7e62fd4d
2020-06-19 13:40:08 -07:00
Alban Desmaison
b88b7d552f Prevent custom Functions from creating non differentiable type that requires grad (#38326)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/38326

Test Plan: Imported from OSS

Differential Revision: D21668740

Pulled By: albanD

fbshipit-source-id: f452f65e76003492055311523a652937b1300183
2020-05-21 08:30:14 -07:00
Ilia Cherniavskii
43dd8760d7 Move ThreadLocalDebugInfo to c10 (#37774)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37774

Move ThreadLocalDebugInfo from ATen to C10

Test Plan: Imported from OSS

Differential Revision: D21384249

Pulled By: ilia-cher

fbshipit-source-id: f9b5089a868f84a2ee013695a481fcc883d3c6b2
2020-05-11 19:27:41 -07:00
毛毛
19d6e32e9a fix sample code (#38002)
Summary:
Makes the Linear layer sample code work correctly when bias is False.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38002

Differential Revision: D21509679

Pulled By: malfet

fbshipit-source-id: c7077992cf414ecc557b39e5ed1e39ef01c8b347
2020-05-11 15:34:09 -07:00
Ilia Cherniavskii
2d708cefcc Move RecordFunction into ATen (#37548)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37548

Moving RecordFunction from torch::autograd::profiler into at namespace

Test Plan:
CI

Imported from OSS

Differential Revision: D21315852

fbshipit-source-id: 4a4dbabf116c162f9aef0da8606590ec3f3847aa
2020-05-07 14:52:39 -07:00
Ilia Cherniavskii
c24c5f9684 Make RecordFunction callbacks thread local and modernize interface (#37491)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37491

This PR modernizes RecordFunction API and adds thread local callbacks
in addition to the global ones

Changes:
 - support for TLS callbacks; this is going to be the foundation of the profiler and other tools
 - modernize the interface around a simple set of functions (add|remove|has|clear)(Global|ThreadLocal)(Callback), and add RecordFunctionCallback to easily construct the callbacks to be passed
 - we also add `.setShouldRun` to the callback interface to support cases where simple uniform sampling is not enough
 - to properly support add/remove, introduce the idea of a callback handle returned by add
 - the internal implementation still uses SmallVector to store intermediate state (as before); in this case these are vectors of handles of the callbacks that were picked to run
 - to speed up runtime we keep these vectors sorted; this way we can quickly enumerate the callbacks that need to be run
 - added tests for the new functionality

Test Plan:
BUILD_BINARY=1 USE_BLAS=MKL USE_MKLDNN=0 USE_CUDA=0 python setup.py
develop install
./build/bin/test_jit
CI

record_function_benchmark: https://gist.github.com/ilia-cher/f1e094dae47fe23e55e7672ac4dcda2f

Imported from OSS

Differential Revision: D21300448

fbshipit-source-id: 6d55c26dbf20b33d35c3f1604dcc07bb063c8c43
2020-05-07 14:51:02 -07:00
Edward Yang
4fef3763dd Revert "Revert D21337640: [pytorch][PR] Split up documentation into subpages and clean up some warnings" (#37778)
Summary:
Original PR: https://github.com/pytorch/pytorch/pull/37419

cc mattip suo
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37778

Differential Revision: D21385774

Pulled By: ezyang

fbshipit-source-id: 5de532faab8bae132736b6b5189e0ee2ac9935be
2020-05-04 14:32:35 -07:00
Michael Suo
20f7e62b1d Revert D21337640: [pytorch][PR] Split up documentation into subpages and clean up some warnings
Test Plan: revert-hammer

Differential Revision:
D21337640

Original commit changeset: d4ad198780c3

fbshipit-source-id: fa9ba6ac542173a50bdb45bfa12f3fec0ed704fb
2020-05-04 10:57:55 -07:00
mattip
f10fbcc820 Split up documentation into subpages and clean up some warnings (#37419)
Summary:
xref gh-32838, gh-34032

This is a major refactor of parts of the documentation to split it up using sphinx's `autosummary` feature, which builds out `autofunction` and `autoclass` stub files and links to them. The end result is that the top module pages like torch.nn.rst and torch.rst are now more like tables of contents pointing to the actual single-class or single-function documentation pages.

Along the way, I modified many of the docstrings to eliminate sphinx warnings when building. I think the only thing I changed from a non-documentation perspective is to add names to `__all__` when adding them to `globals()` in `torch.__init__.py`

I do not know the CI system: are the documentation build artifacts available after the build, so reviewers can preview before merging?
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37419

Differential Revision: D21337640

Pulled By: ezyang

fbshipit-source-id: d4ad198780c3ae7a96a9f22651e00ff2d31a0c0f
2020-05-04 09:39:22 -07:00
Ilia Cherniavskii
d068a456d3 [resubmit] Enable global observers API (#37382)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37382

After adding c10::DispatchKey::Profiler, the behavior of RecordFunction observers is also controlled by the dispatch key; this PR moves the logic out of the profiler into the record function.

Reviewed By: jamesr66a

Differential Revision: D21268320

fbshipit-source-id: 93207e3b55325d20dcc5b1e8f448ab86933321da
2020-04-28 10:49:31 -07:00
Michael Suo
20143e5f27 Revert D21245094: [resubmit] Enable global observers API
Test Plan: revert-hammer

Differential Revision:
D21245094

Original commit changeset: 595e41b18206

fbshipit-source-id: 90344b361857d76ce5db75438c949dad1f5f186b
2020-04-27 16:19:46 -07:00
Wanchao Liang
1039b95ff0 [autograd] add documentation about multithread autograd (#37020)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37020

Add multithread autograd documentation to the doc note.
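
A minimal sketch of the behavior the new note documents, using only the public autograd API: several Python threads can each run their own backward pass concurrently.

```python
import threading
import torch

def train_step():
    # each thread builds and backpropagates through its own graph
    x = torch.randn(16, 16, requires_grad=True)
    loss = (x * x).sum()
    loss.backward()

threads = [threading.Thread(target=train_step) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```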

Test Plan: Imported from OSS

Differential Revision: D21260996

Pulled By: wanchaol

fbshipit-source-id: 91d523560268ae62d4c6d773121b282ba837a561
2020-04-27 15:53:21 -07:00
Ilia Cherniavskii
5fab4c30dd [resubmit] Enable global observers API (#37292)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37292

After adding c10::DispatchKey::Profiler, the behavior of RecordFunction observers is also controlled by the dispatch key; this PR moves the logic out of the profiler into the record function.

Reviewed By: jamesr66a

Differential Revision: D21245094

fbshipit-source-id: 595e41b18206d2ba4cf639cb320f630907868b3f
2020-04-27 14:24:51 -07:00
Ilia Cherniavskii
856e8cf028 Revert D21213786: Enable global observers API
Test Plan: revert-hammer

Differential Revision:
D21213786

Original commit changeset: e618254da74a

fbshipit-source-id: 425ea5d44fa55655ec0dd586c5075996b926177b
2020-04-25 00:59:24 -07:00
Ilia Cherniavskii
6e659e928b Enable global observers API (#37195)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37195

After adding c10::DispatchKey::Profiler, the behavior of RecordFunction observers is also controlled by the dispatch key; this PR moves the logic out of the profiler into the record function.

Reviewed By: ngimel

Differential Revision: D21213786

fbshipit-source-id: e618254da74a4f1ce16c51a3869bbd75a4f561ad
2020-04-24 23:49:28 -07:00
Alban Desmaison
3799d1d74a Fix many doc issues (#37099)
Summary:
Fix https://github.com/pytorch/pytorch/issues/35643 https://github.com/pytorch/pytorch/issues/37063 https://github.com/pytorch/pytorch/issues/36307 https://github.com/pytorch/pytorch/issues/35861 https://github.com/pytorch/pytorch/issues/35299 https://github.com/pytorch/pytorch/issues/23108 https://github.com/pytorch/pytorch/issues/4661

Just a bunch of small updates on the doc.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37099

Differential Revision: D21185713

Pulled By: albanD

fbshipit-source-id: 4ac06d6709dc0da6109a6ad3daae75667ee5863e
2020-04-23 10:01:03 -07:00
Michael Carilli
e6bc34f549 Amp gradient accumulation example (#36601)
Summary:
Several people have asked me about proper Amp usage with gradient accumulation.  In particular, it's [unclear to people](https://github.com/NVIDIA/apex/issues/439#issuecomment-610351482) that you should only call `scaler.unscale_()` (if desired) and `scaler.update()` in iterations where you actually plan to step.  This PR adds a minimal accumulation example.
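
A minimal sketch of the accumulation pattern the example adds (hypothetical model and step counts); `unscale_`, `step`, and `update` are only called in iterations where the optimizer actually steps:

```python
import torch
import torch.nn as nn

model = nn.Linear(8, 1).cuda()
opt = torch.optim.SGD(model.parameters(), lr=0.1)
scaler = torch.cuda.amp.GradScaler()
accum_steps = 4

for i in range(16):
    x = torch.randn(4, 8, device="cuda")
    with torch.cuda.amp.autocast():
        loss = model(x).mean() / accum_steps
    scaler.scale(loss).backward()
    if (i + 1) % accum_steps == 0:
        scaler.unscale_(opt)                                   # optional, e.g. for clipping
        torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
        scaler.step(opt)
        scaler.update()
        opt.zero_grad()
```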

I built the docs locally and it looks free from sphinx errors, at least.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36601

Differential Revision: D21082295

Pulled By: ngimel

fbshipit-source-id: b2faa6c02b9f7e1972618a0f1d5360a03f0450ac
2020-04-17 09:56:36 -07:00
Jessica Lin
ac950bb9c8 Update docs for master to remove Python 2 references (#36336)
Summary:
Fix compile error from original PR in jit_language_references.rst: https://github.com/pytorch/pytorch/pull/36114

Full details in task: https://our.intern.facebook.com/intern/tasks/?t=64776265

With PyTorch 1.5+ we removed Python 2 support. All documentation under docs/ and on the pytorch.org website needs to remove Python 2 references.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36336

Differential Revision: D21057507

Pulled By: jlin27

fbshipit-source-id: 993a763f1ecb16dad859bc02a07625ddc023645d
2020-04-16 10:15:48 -07:00
Edward Yang
6016f694c0 Revert D20901746: [pytorch][PR] Update docs for master to remove Python 2 references
Test Plan: revert-hammer

Differential Revision:
D20901746

Original commit changeset: 07f8dc8e6fab

fbshipit-source-id: 13c55597f9f79b8473210cf35a5a0f1fb34bae39
2020-04-08 14:49:11 -07:00
Jessica Lin
43234be525 Update docs for master to remove Python 2 references (#36114)
Summary:
Full details in task: https://our.intern.facebook.com/intern/tasks/?t=64776265

With PyTorch 1.5+ we removed Python 2 support. All documentation under docs/ and on the pytorch.org website needs to remove Python 2 references.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36114

Differential Revision: D20901746

Pulled By: jlin27

fbshipit-source-id: 07f8dc8e6fab0b232e5048a63079cab0c433c85f
2020-04-07 16:13:18 -07:00
Rohan Varma
1f06db2579 Refactored rpc docs (#35109)
Summary:
Reorganize as per jlin27's comments. Screenshots added in comments.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35109

Differential Revision: D20788774

Pulled By: rohan-varma

fbshipit-source-id: 7d64be70ef76ed6ff303d05d39c338293c234766
2020-04-01 02:01:34 -07:00
Ilia Cherniavskii
bc6bd0bb1a Debug Information Guard
Summary: This diff fixes issues with the current handling of debug information passed along during the execution of the model. (For example, multiple calls to the debug guard could override each other.)

Test Plan: CI test/cpp/jit

Reviewed By: dzhulgakov

Differential Revision: D20602775

fbshipit-source-id: 4683957954028af81a1a0f1f12b243650230c9bb
2020-04-01 01:55:29 -07:00
Ilia Cherniavskii
800d5617c0 Recording of TorchScript functions (#34710)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34710

Extends the RecordFunction API to support new recording scopes (such as TorchScript functions) and gives more flexibility in setting the sampling rate.

Test Plan: unit test (test_misc.cpp/testRecordFunction)

Reviewed By: gdankel, dzhulgakov

Differential Revision: D20158523

fbshipit-source-id: a9e0819d21cc06f4952d92d43246587c36137582
2020-03-31 00:33:23 -07:00
pinzhenx
bd604cb5b7 Upgrade MKL-DNN to DNNL v1.2 (#32422)
Summary:
## Motivation

This PR upgrades MKL-DNN from v0.20 to DNNL v1.2 and resolves https://github.com/pytorch/pytorch/issues/30300.

DNNL (Deep Neural Network Library) is the new brand of MKL-DNN, which improves performance, quality, and usability over the old version.

This PR focuses on the migration of all existing functionality, including minor fixes, performance improvements, and code cleanup. It serves as the cornerstone of our future efforts to accommodate new features like OpenCL support, BF16 training, and INT8 inference, and to let the PyTorch community derive more benefits from the Intel Architecture.

<br>

## What's included?

Even though DNNL has many breaking changes to its API, we managed to absorb most of them in ideep. This PR contains minimalist changes to the integration code in pytorch. Below is a summary of the changes:

<br>

**General:**

1. Replace op-level allocator with global-registered allocator

```
// before
ideep::sum::compute<AllocForMKLDNN>(scales, {x, y}, z);

// after
ideep::sum::compute(scales, {x, y}, z);
```

The allocator is now being registered at `aten/src/ATen/native/mkldnn/IDeepRegistration.cpp`. Thereafter all tensors derived from the `cpu_engine` (by default) will use the c10 allocator.

```
RegisterEngineAllocator cpu_alloc(
  ideep::engine::cpu_engine(),
  [](size_t size) {
    return c10::GetAllocator(c10::DeviceType::CPU)->raw_allocate(size);
  },
  [](void* p) {
    c10::GetAllocator(c10::DeviceType::CPU)->raw_deallocate(p);
  }
);
```
------

2. Simplify group convolution

We had a scenario in convolution where the ideep tensor shape mismatched the aten tensor shape: when `groups > 1`, DNNL expects weight tensors to be 5-d with an extra group dimension, e.g. `goihw` instead of `oihw` in the 2d conv case.

As shown below, a lot of extra checks previously came with this difference in shape. Now we've completely hidden this difference in ideep, and all tensors align with PyTorch's definition, so we can safely remove these checks from both the aten and c2 integration code.

```
// aten/src/ATen/native/mkldnn/Conv.cpp

if (w.ndims() == x.ndims() + 1) {
  AT_ASSERTM(
      groups > 1,
      "Only group _mkldnn_conv2d weights could have been reordered to 5d");
  kernel_size[0] = w.get_dim(0) * w.get_dim(1);
  std::copy_n(
      w.get_dims().cbegin() + 2, x.ndims() - 1, kernel_size.begin() + 1);
} else {
  std::copy_n(w.get_dims().cbegin(), x.ndims(), kernel_size.begin());
}
```

------

3. Enable DNNL built-in cache

Previously, we stored DNNL jitted kernels along with intermediate buffers inside ideep using an LRU cache. Now we are switching to the newly added DNNL built-in cache, and **no longer** caching buffers in order to reduce memory footprint.

This change will mainly be reflected in lower memory usage in memory profiling results. On the code side, we removed a couple of lines involving `op_key_` that depended on the ideep cache.

------

4. Use 64-bit integer to denote dimensions

We changed the type of `ideep::dims` from `vector<int32_t>` to `vector<int64_t>`. This renders ideep dims no longer compatible with the 32-bit dims used by caffe2, so we use something like `{stride_.begin(), stride_.end()}` to cast the parameter `stride_` into an int64 vector.

<br>

**Misc changes in each commit:**

**Commit:** change build options

Some build options were slightly changed, mainly to avoid name collisions with other projects that include DNNL as a subproject. In addition, the DNNL built-in cache is enabled by the option `DNNL_ENABLE_PRIMITIVE_CACHE`.

Old | New
-- | --
WITH_EXAMPLE | MKLDNN_BUILD_EXAMPLES
WITH_TEST | MKLDNN_BUILD_TESTS
MKLDNN_THREADING | MKLDNN_CPU_RUNTIME
MKLDNN_USE_MKL | N/A (not use MKL anymore)

------

**Commit:** aten reintegration

- aten/src/ATen/native/mkldnn/BinaryOps.cpp

    Implement binary ops using new operation `binary` provided by DNNL

- aten/src/ATen/native/mkldnn/Conv.cpp

    Clean up group convolution checks
    Simplify conv backward integration

- aten/src/ATen/native/mkldnn/MKLDNNConversions.cpp

    Simplify prepacking convolution weights

- test/test_mkldnn.py

    Fixed an issue in the conv2d unit test: it didn't compare conv results between the mkldnn and aten implementations before. Instead, it compared mkldnn with mkldnn, since the default CPU path also goes through mkldnn. Now we use `torch.backends.mkldnn.flags` to fix this issue.

- torch/utils/mkldnn.py

    Prepack the weight tensor in the module's `__init__` to significantly improve performance
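
A minimal sketch of the user-facing entry point, assuming the existing `torch.utils.mkldnn.to_mkldnn` helper; with this change the weights are prepacked when the wrapped module is constructed:

```python
import torch
from torch.utils import mkldnn as mkldnn_utils

model = torch.nn.Conv2d(3, 8, 3).eval()
mkldnn_model = mkldnn_utils.to_mkldnn(model)   # weights converted/prepacked here

x = torch.randn(1, 3, 32, 32)
y = mkldnn_model(x.to_mkldnn()).to_dense()     # inputs/outputs use the MKL-DNN layout
```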

------

**Commit:** caffe2 reintegration

- caffe2/ideep/ideep_utils.h

    Clean up unused type definitions

- caffe2/ideep/operators/adam_op.cc & caffe2/ideep/operators/momentum_sgd_op.cc

   Unify tensor initialization with `ideep::tensor::init`. Obsolete `ideep::tensor::reinit`

- caffe2/ideep/operators/conv_op.cc & caffe2/ideep/operators/quantization/int8_conv_op.cc

    Clean up group convolution checks
    Revamp convolution API

- caffe2/ideep/operators/conv_transpose_op.cc

    Clean up group convolution checks
    Clean up deconv workaround code

------

**Commit:** custom allocator

- Register c10 allocator as mentioned above

<br><br>

## Performance

We tested inference on some common models based on user scenarios, and most performance numbers are either better than or on par with DNNL 0.20.

ratio: new / old | Latency (batch=1 4T) | Throughput (batch=64 56T)
-- | -- | --
pytorch resnet18 | 121.4% | 99.7%
pytorch resnet50 | 123.1% | 106.9%
pytorch resnext101_32x8d | 116.3% | 100.1%
pytorch resnext50_32x4d | 141.9% | 104.4%
pytorch mobilenet_v2 | 163.0% | 105.8%
caffe2 alexnet | 303.0% | 99.2%
caffe2 googlenet-v3 | 101.1% | 99.2%
caffe2 inception-v1 | 102.2% | 101.7%
caffe2 mobilenet-v1 | 356.1% | 253.7%
caffe2 resnet101 | 100.4% | 99.8%
caffe2 resnet152 | 99.8% | 99.8%
caffe2 shufflenet | 141.1% | 69.0% †
caffe2 squeezenet | 98.5% | 99.2%
caffe2 vgg16 | 136.8% | 100.6%
caffe2 googlenet-v3 int8 | 100.0% | 100.7%
caffe2 mobilenet-v1 int8 | 779.2% | 943.0%
caffe2 resnet50 int8 | 99.5% | 95.5%

_Configuration:
Platform: Skylake 8180
Latency Test: 4 threads, warmup 30, iteration 500, batch size 1
Throughput Test: 56 threads, warmup 30, iteration 200, batch size 64_

† Shufflenet is one of the few models that require temp buffers during inference. The performance degradation is an expected issue since we no longer cache any buffers in ideep. As a solution, we suggest users opt for a caching allocator like **jemalloc** as a drop-in replacement for the system allocator in such heavy workloads.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32422

Test Plan:
Perf results: https://our.intern.facebook.com/intern/fblearner/details/177790608?tab=Experiment%20Results

10% improvement for ResNext with avx512, neutral on avx2

More results: https://fb.quip.com/ob10AL0bCDXW#NNNACAUoHJP

Reviewed By: yinghai

Differential Revision: D20381325

Pulled By: dzhulgakov

fbshipit-source-id: 803b906fd89ed8b723c5fcab55039efe3e4bcb77
2020-03-26 22:07:59 -07:00
Michael Carilli
0f0271e255 [RELAND2] Eager autocasting, out-of-place ops only (with MSVC 2017 fix) (#35102)
Summary:
This is the second reland attempt for https://github.com/pytorch/pytorch/pull/32140.

The first reland attempt https://github.com/pytorch/pytorch/pull/35011 failed due to a [small incompatible change](https://github.com/pytorch/pytorch/pull/35011#issuecomment-601754216) in recent master (`skipIfRocm` was removed from `test_data_parallel.py`).

The present PR restores skipIfRocm.

Description from first reland attempt https://github.com/pytorch/pytorch/pull/35011:

> https://github.com/pytorch/pytorch/pull/32140 was approved and merged, but [reverted](d0577e19f0) because it broke builds with versions of Visual Studio older than 15.8 that were not represented in public CI.  The build failures were caused by a [known VS bug](https://developercommunity.visualstudio.com/content/problem/27729/allow-function-with-internal-linkage-as-template-n.html), fixed in versions 15.8 and newer.
>
> The present PR reverts the revert (restoring https://github.com/pytorch/pytorch/pull/32140 's diffs) and adds a workaround to enable compilation with VS < 15.8.  The workaround isn't pretty, but it's guarded by macros such that it's only used when compiling with VS < 15.8.  All other builds compile with the same code/control flow as was merged in https://github.com/pytorch/pytorch/pull/32140.
>
> Original description of https://github.com/pytorch/pytorch/pull/32140:
> > Initial integration of eager autocasting, supporting out-of-place ops only for easier review.
> Relevant issue/RFC: https://github.com/pytorch/pytorch/issues/25081
>
> > In-place ops and ops with user-supplied out=... can certainly be supported as well (my initial WIP https://github.com/pytorch/pytorch/issues/29552 handled many) but require substantially more complex special casing in the autocasting backend and tests. Support for these ops (much of which has already been written) will be broken into later PRs.
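
For context, a minimal sketch of what the eager autocasting being relanded looks like from the Python side (out-of-place ops only at this stage):

```python
import torch

a = torch.randn(8, 8, device="cuda")
b = torch.randn(8, 8, device="cuda")

with torch.cuda.amp.autocast():
    c = a @ b                      # out-of-place matmul runs in float16 under autocast
    d = torch.nn.functional.relu(c)
print(c.dtype)                     # torch.float16
```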
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35102

Differential Revision: D20596918

Pulled By: ezyang

fbshipit-source-id: 60caa279bb0ce4a9bb0b28c1d585d42cf1cc7e50
2020-03-24 09:08:04 -07:00