Commit Graph

428 Commits

Author SHA1 Message Date
Eddie Yan
d892d5d682 [CUBLAS][TF32][CUDNN] Update numerical_accuracy.rst (#79537)
CC @mruberry @ptrblck
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79537
Approved by: https://github.com/ngimel, https://github.com/mruberry
2022-09-07 18:30:26 +00:00
Christian Jauvin
089101fc82 Fix small typo in cuda.rst (#84012)
This fixes a very minor typo in the CUDA semantics doc.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/84012
Approved by: https://github.com/malfet
2022-08-26 04:53:49 +00:00
soulitzer
e60f8f4f60 Improve autograd custom function docs (#81340)
Fixes https://github.com/pytorch/pytorch/issues/81223

Pull Request resolved: https://github.com/pytorch/pytorch/pull/81340
Approved by: https://github.com/albanD
2022-07-21 19:54:30 +00:00
Danielle Pintz
8926b5b9c2 Fix typos in docs: Profiler and CUDA semantics (#80406)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80406
Approved by: https://github.com/robieta
2022-07-13 18:53:02 +00:00
eqy
eff74ed7bd [AMP] Use generic autocast in example, specify dtype (#79579)
CC @mruberry @ptrblck
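
For reference, a minimal sketch of the generic form (the model and shapes are illustrative):

```python
import torch

model = torch.nn.Linear(8, 8).cuda()              # illustrative model
x = torch.randn(4, 8, device="cuda")

# Generic form with an explicit dtype, instead of torch.cuda.amp.autocast()
with torch.autocast(device_type="cuda", dtype=torch.float16):
    y = model(x)
```
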
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79579
Approved by: https://github.com/mruberry, https://github.com/ngimel
2022-06-17 21:32:51 +00:00
Rhys Goodall
62ba548cac [DOC] Missing line in serialization notes (#79454)
Small typo fix to serialization docs where there was a missing line in one of the examples.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/79454
Approved by: https://github.com/mruberry
2022-06-17 18:26:47 +00:00
Mike Ruberry
1d47e0df5a Updates TF32 docs (#79401)
Updates TF32 docs to reflect PyTorch 1.12 updates.
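
For context, the Python-side switches the note documents (a sketch; defaults as of PyTorch 1.12):

```python
import torch

# As of PyTorch 1.12, TF32 for matmuls is off by default; opt in explicitly:
torch.backends.cuda.matmul.allow_tf32 = True
# TF32 for cuDNN convolutions remains enabled by default:
torch.backends.cudnn.allow_tf32 = True
```
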
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79401
Approved by: https://github.com/ngimel
2022-06-13 21:02:00 +00:00
lezcano
a8ea58afee Add randomness case to the autograd notes
I also took this chance to clean up the Sphinx formatting a bit and
reworded a few minor things.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/78617

Approved by: https://github.com/soulitzer, https://github.com/albanD
2022-06-08 21:27:03 +00:00
Kurt Mohler
a4403c17c7 Improve reproducibility docs for RNG (#78849)
* Mention that operations may change RNG state and how to deal with it (see the sketch after this list)
* Add link to Reproducibility note in `use_deterministic_algorithms` docs
* Also fix a broken link
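
For illustration (not part of the commit), a minimal sketch of snapshotting and restoring RNG state around an RNG-consuming op:

```python
import torch

torch.manual_seed(0)
state = torch.get_rng_state()          # snapshot the CPU RNG state
noise = torch.randn(3)                 # consumes RNG state
torch.set_rng_state(state)             # restore it, so the draw repeats
assert torch.equal(noise, torch.randn(3))
```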

Fixes #77206

Pull Request resolved: https://github.com/pytorch/pytorch/pull/78849
Approved by: https://github.com/mruberry
2022-06-06 14:53:59 +00:00
albanD
b30b1f3dec update mps note with more details (#78669)
Follow up to the comments in https://github.com/pytorch/pytorch/pull/77767#pullrequestreview-978807521
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78669
Approved by: https://github.com/kulinseth, https://github.com/anjali411
2022-06-02 20:53:19 +00:00
vfdev
642fc94501 Update extending.rst (#78707)
Follow-up fix for https://github.com/pytorch/pytorch/pull/78073 : https://github.com/pytorch/pytorch/pull/78073#discussion_r887621219

Pull Request resolved: https://github.com/pytorch/pytorch/pull/78707
Approved by: https://github.com/albanD
2022-06-02 17:24:00 +00:00
Philip Meier
288b23bc52 fix MetadataTensor example (#78073)
```py
[bar if bar for bar in foo]
```

is invalid Python syntax. The `if` clause needs to be at the end:

```py
[bar for bar in foo if bar]
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/78073
Approved by: https://github.com/albanD
2022-05-31 21:34:19 +00:00
Alban Desmaison
dcd2ba3538 improve mps note to describe the different functions available (#77767)
Fixing https://github.com/pytorch/pytorch/issues/77748
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77767
Approved by: https://github.com/soulitzer
2022-05-18 20:17:23 +00:00
Jeff Daily
de86146c61 rocblas alt impl during backward pass only (#71881)
In preparation for adopting future rocblas library options, it is necessary to track when the backward pass of training is executing. The scope-based helper class `BackwardPassGuard` is provided to toggle state.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71881
Approved by: https://github.com/albanD
2022-05-18 19:42:58 +00:00
Kulin Seth
e011a8e18b Enable PyTorch operations on MPS Backend. (#77343)
Add PyTorch operations to MPS backend.

- https://github.com/pytorch/pytorch/issues/77394
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77343
Approved by: https://github.com/albanD
2022-05-13 18:28:53 +00:00
James Reed
286d788029 Properly capitalize PyTorch (#77308)
pytorch -> PyTorch
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77308
Approved by: https://github.com/bertmaher, https://github.com/mthrok
2022-05-12 18:07:32 +00:00
Alban Desmaison
d5210a4269 Add gradient choice detail to autograd doc
Trying to clarify what our backward functions should compute.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76898
Approved by: https://github.com/soulitzer, https://github.com/Lezcano
2022-05-06 21:12:25 +00:00
Smark
ab57876420 fix docs error in Autograd Mechanics
Fixes #74682

Pull Request resolved: https://github.com/pytorch/pytorch/pull/74807
Approved by: https://github.com/albanD
2022-03-29 18:32:16 +00:00
leslie-fang-intel
3a112ebb57 add autocast cpu doc
As discussed in https://github.com/pytorch/pytorch/issues/55374#issuecomment-968333614, here we update the CPU autocast operation list in the autocast API documentation.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68567
Approved by: https://github.com/ezyang
2022-03-22 02:02:43 +00:00
Jaewon Lee
11ea09effc [CUDACachingAlloc/GPUInference] Implement garbage collection without GPU sync (#74261)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74261

### Goal
Implement a cheap way to reclaim GPU memory (garbage collection) without incurring GPU sync.

### Why do we need this?
Currently, there are only two ways to reclaim GPU memory blocks already assigned to a particular stream.

- `release_available_cached_blocks(params)`: Free blocks exceeding the `CachingAllocatorConfig::max_split_size()` until we can satisfy the request.

Issue: If the `max_split_size` is unset (default), this function is a no-op. Even if this is set, the reclamation is quite conservative (e.g., never frees blocks under max_split_size).

- `release_cached_blocks()`: Waits for all the in-flight events and then reclaims blocks.

Issue: waiting for all events is very expensive, as it will likely stall all GPU operations. Many GPU applications without proper handling of potential GPU throttling would suffer or crash.

### Proposed idea
- If the garbage collection threshold is set, try to reclaim some memory blocks *without* synchronization (see the sketch after this list). It should be safe to do so, as `release_available_cached_blocks` essentially does the same thing (but less aggressively).
- GC is triggered only when we fail to serve a `malloc` request from the block pool. No need to free blocks when the block pool is functioning just fine.
- Prioritize reclaiming blocks that weren't reused for a long time. Reclamation stops once the used memory capacity drops below the threshold.
- This code path is totally optional; by default it won't be invoked.
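
A minimal sketch of opting in (the key name matches the setting this PR adds; the 0.8 value is illustrative, and the variable must be set before CUDA initialization):

```python
import os

# Illustrative threshold; must be set before the first CUDA allocation
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "garbage_collection_threshold:0.8"

import torch
x = torch.empty(1024, device="cuda")   # allocator can now garbage-collect lazily
```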

Test Plan:
- Unit tests
- Manually checked that the GPU memory usage stays as indicated by the garbage collector. If not, the caching allocator at least keeps trying to free blocks.

Reviewed By: jianyuh

Differential Revision: D34482514

fbshipit-source-id: d5eae62ac60b94b0bca851f9d233a092d086e3c2
(cherry picked from commit 05780f1ed4b176f05e765b2411c9eaa2eaeb48b0)
2022-03-21 18:46:02 +00:00
Banit Agrawal
ac3effd150 [PyTorch GPU Allocator] Better use of blocks with rounding of allocation sizes (#74213)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74213

In the current CUDACachingAllocator, sizes are rounded up in multiples of the 512-byte block size, so this works well for smaller sizes. However, for large sizes we can end up with many differently sized blocks in the larger pool. This is problematic when we have variable batch sizes (1001, 1021, 1023): each goes to a different block size, which creates lots of unused blocks and wastes GPU memory capacity.

This diff adds a rounding approach to the allocation size. It rounds the size up to the nearest power-of-2 division, and the number of divisions can be changed with an environment variable.

   For example, suppose we need to round up a size of 1200 and the number of
   divisions is 4. The size 1200 lies between 1024 and 2048, and 4 divisions
   between them give the values 1024, 1280, 1536, and 1792. So the function
   will return 1280 as the nearest power-2-division ceiling.
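
An illustrative helper reproducing that arithmetic (not the allocator's actual C++ code):

```python
def roundup_power2_division(size: int, divisions: int) -> int:
    """Round `size` up to the nearest power-2-division ceiling."""
    high = 1
    while high < size:
        high *= 2                                # smallest power of two >= size
    low = high // 2
    step = max(1, (high - low) // divisions)     # e.g. (2048 - 1024) // 4 == 256
    candidate = low
    while candidate < size:
        candidate += step
    return candidate

assert roundup_power2_division(1200, 4) == 1280
```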

env setting:
   export PYTORCH_CUDA_ALLOC_CONF=roundup_power2_divisions:4
ghstack-source-id: 151446017

Reviewed By: ezyang

Differential Revision: D34868036

fbshipit-source-id: 494785add16e6b37c920dcb5a2b81d4c637b554a
(cherry picked from commit 548454ccacbd8700e7ffd2d762e40b4ba37abbae)
2022-03-16 02:53:53 +00:00
Rohit Goswami
801abc0cdd MAINT, DOC: Trivial spellings and warnings (#72745)
Summary:
Fixes N/A.
Just minor annoyances.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/72745

Reviewed By: samdow

Differential Revision: D34216016

Pulled By: albanD

fbshipit-source-id: b65600b50e41a1dd7bf7d076b0dd3e2d1c99caf9
(cherry picked from commit b959392a5f)
2022-02-14 21:55:19 +00:00
Felix Divo
340fae4363 [Doc] Better formatting in autograd.rst (#72586)
Summary:
See title.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/72586

Reviewed By: soulitzer

Differential Revision: D34177704

Pulled By: albanD

fbshipit-source-id: 1adf6ebed4f64ec4d8fff160df300c8e6ee528ea
(cherry picked from commit bbb586d67d)
2022-02-11 22:46:10 +00:00
Felix Divo
25fba4a019 [DOC] Add link to "double backward" from "extending pytorch" page (#72584)
Summary:
It is probably the most user friendly to link to that (lesser known?) feature.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/72584

Reviewed By: soulitzer

Differential Revision: D34173999

Pulled By: albanD

fbshipit-source-id: 99fff7a55412faf54888f8317ab2388f4d7d30e4
(cherry picked from commit 2191ee7657)
2022-02-11 20:34:13 +00:00
Mike Ruberry
9b9b878c89 Fixes jiterator cache macro include + updates CUDA note with cache variables (#71452)
Summary:
Per title.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/71452

Reviewed By: ngimel

Differential Revision: D33646495

Pulled By: mruberry

fbshipit-source-id: bbf627e6d7a724a83a3ea2ae9c0f50430f8d578e
(cherry picked from commit d1e72b144a)
2022-01-19 03:45:05 +00:00
Rohan Varma
4fd1992a60 [Docs][BE] DDP doc fix (#71363)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71363

Looks like DDP example is currently broken as per
https://discuss.pytorch.org/t/official-ddp-example-is-broken/141493. Fix the
issue by setting the correct env variable.
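
For reference, a hedged sketch of the rendezvous environment setup such an example needs (the address and port values are illustrative):

```python
import os
import torch.distributed as dist

os.environ["MASTER_ADDR"] = "localhost"
os.environ["MASTER_PORT"] = "29500"   # any free port

# Single-process illustration; real DDP launches one process per rank
dist.init_process_group("gloo", rank=0, world_size=1)
```
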
ghstack-source-id: 147080377

Test Plan: CI

Reviewed By: mrshenli

Differential Revision: D33607250

fbshipit-source-id: e0e7d03cc365c186253b959c4c5405a5e3609218
(cherry picked from commit 32472884ec)
2022-01-18 22:24:51 +00:00
Jake Tae
23f902f7e4 Fix incorrect variable in autograd docs (#70884)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/68362.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70884

Reviewed By: mruberry

Differential Revision: D33463331

Pulled By: ngimel

fbshipit-source-id: 834ba9c450972710e0424cc92af222551f0b4a4a
2022-01-06 20:53:10 -08:00
Peter Bell
e279963eef Remove remaining THC code (#69039)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69039

Test Plan: Imported from OSS

Reviewed By: anjali411

Differential Revision: D32872476

Pulled By: ngimel

fbshipit-source-id: 7972aacc24aef9450fb59b707ed6396c501bcb31
2021-12-08 12:18:08 -08:00
Rodrigo Bermúdez Schettino
1a202b0c39 Docs: Fix broken code syntax in autograd.rst (#69362)
Summary:
The backticks around `nn.Parameters` were not rendered correctly because the word was enclosed in an italics block.
Spotted the issue on https://pytorch.org/docs/stable/notes/autograd.html#locally-disable-grad-doc.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69362

Reviewed By: zou3519

Differential Revision: D32924093

Pulled By: albanD

fbshipit-source-id: 5a310ac3f3d13a5116f7aa911817b9452eee711d
2021-12-07 12:03:15 -08:00
Michael Carilli
da023611d7 [CUDA graphs] Fixes make_graphed_callables example typos (#69379)
Summary:
cc mcarilli

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69379

Reviewed By: mruberry

Differential Revision: D32841260

Pulled By: ngimel

fbshipit-source-id: a7d0b9db0578526907547b201eddd55827812b63
2021-12-03 16:51:14 -08:00
Elio
088a4feb41 Update the documentation for AMP with DataParallel (#69218)
Summary:
Following https://github.com/pytorch/pytorch/issues/60540 and pull request https://github.com/pytorch/pytorch/issues/43102

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69218

Reviewed By: gchanan

Differential Revision: D32803814

Pulled By: ngimel

fbshipit-source-id: 06fdbbee2c7734153271be70ec4bc24263c8c367
2021-12-03 14:58:47 -08:00
Vansh Sharma
ff125a3624 Minor changes in documentation (#68557)
Summary:
Fixed some small typos

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68557

Reviewed By: mruberry

Differential Revision: D32538749

Pulled By: ngimel

fbshipit-source-id: 09a9cd4031463b6a40d7307bd8fcb7d364444ac3
2021-11-18 17:57:16 -08:00
eqy
790763b0fe Add an option to disable reduced precision reductions for FP16 GEMM (#67946)
Summary:
https://github.com/pytorch/pytorch/issues/67578 disabled reduced precision reductions for FP16 GEMMs. After benchmarking, we've found that this has substantial performance impacts for common GEMM shapes (e.g., those found in popular instantiations of multiheaded attention) on architectures such as Volta. As these performance regressions may come as a surprise to current users, this PR adds a toggle,
`torch.backends.cuda.matmul.allow_fp16_reduced_precision_reduction`,
to disable reduced precision reductions rather than making disabling them the default behavior.

CC ngimel ptrblck
stas00 Note that the behavior after the previous PR can be replicated with
`torch.backends.cuda.matmul.allow_fp16_reduced_precision_reduction = False`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67946

Reviewed By: zou3519

Differential Revision: D32289896

Pulled By: ngimel

fbshipit-source-id: a1ea2918b77e27a7d9b391e030417802a0174abe
2021-11-09 17:27:20 -08:00
Alban Desmaison
708f7b1209 Update extending doc to cover forward mode AD (#66962)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66962

Reviewed By: VitalyFedyunin

Differential Revision: D31897782

Pulled By: albanD

fbshipit-source-id: 64164783a14a7ed4cedc17da28f1181d9807a499
2021-10-27 14:18:38 -07:00
Natalia Gimelshein
fdd9f49cf5 add a note on numerical accuracy (#65947)
Summary:
Per title
Fixes https://github.com/pytorch/pytorch/issues/54437

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65947

Reviewed By: albanD

Differential Revision: D31612445

Pulled By: ngimel

fbshipit-source-id: 5c155891a088aef3b9813f253d0dc1ee4d51ae1c
2021-10-13 12:43:55 -07:00
Rodrigo Berriel
7e772e7685 Update link to tutorial on defining NN modules (#65534)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/65527. Please, see my comment in the issue: https://github.com/pytorch/pytorch/issues/65527#issuecomment-925863193. The file was renamed in ce58d5904c (diff-e5ef486bd89eb38de15752211d9437953681b8caa8f44d7c86bb820d13151df2), but the link in this repository was not updated.

It doesn't change the fact that the old link is still working, but I guess this has to be fixed in [pytorch/tutorials](https://github.com/pytorch/tutorials) instead of here.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65534

Reviewed By: soulitzer

Differential Revision: D31144269

Pulled By: H-Huang

fbshipit-source-id: f70744a21113b7dc84510e2992d87f0fed793985
2021-09-23 11:26:50 -07:00
Rodrigo Berriel
f0ada4bd54 [docs] Remove .data from some docs (#65358)
Summary:
Related to https://github.com/pytorch/pytorch/issues/30987. Fix the following task:

- [ ] Remove the use of `.data` in all our internal code:
  - [ ] ...
  - [x] `docs/source/scripts/build_activation_images.py` and `docs/source/notes/extending.rst`

In `docs/source/scripts/build_activation_images.py`, I used `nn.init` because the snippet already assumes `nn` is available (the class inherits from `nn.Module`).

cc albanD

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65358

Reviewed By: malfet

Differential Revision: D31061790

Pulled By: albanD

fbshipit-source-id: be936c2035f0bdd49986351026fe3e932a5b4032
2021-09-21 06:32:31 -07:00
Michael Carilli
e3210ca184 [CUDA graphs] Beta, not prototype (#65247)
Summary:
Powers have decided this API should be listed as beta.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65247

Reviewed By: malfet

Differential Revision: D31057940

Pulled By: ngimel

fbshipit-source-id: 137b63cbd2c7409fecdc161a22135619bfc96bfa
2021-09-20 13:32:36 -07:00
albanD
473e55d5b2 Use classmethods for overrides (#64841)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64841

Test Plan: Imported from OSS

Reviewed By: heitorschueroff

Differential Revision: D30991424

Pulled By: albanD

fbshipit-source-id: 551e2119768f3a4292713f3bfa83930f5506adbd
2021-09-17 08:32:49 -07:00
Jane Xu
4c4c03124b Remove old references to 9.2 in documentation (#65059)
Summary:
Removes references in .rst and README.md and comments in the Dockerfile

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65059

Reviewed By: malfet

Differential Revision: D30961110

Pulled By: janeyx99

fbshipit-source-id: 702a9a81bf08125ec4ac38bc656fc2c128c30018
2021-09-16 13:24:05 -07:00
Michael Carilli
36cac2be4d [CUDA graphs] moves memory sharing intro paragraph (#64996)
Summary:
Puts the memory-sharing intro under the "Sharing memory..." header, where it should have been all along.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64996

Reviewed By: mruberry

Differential Revision: D30948619

Pulled By: ngimel

fbshipit-source-id: 5d9dd267b34e9d3fc499d4738377b58a22da1dc2
2021-09-14 17:53:43 -07:00
Michael Carilli
8d08b103be [CUDA graphs] Prototype API and documentation (#63269)
Summary:
RFC: https://github.com/pytorch/pytorch/issues/61880

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63269

Reviewed By: mruberry

Differential Revision: D30596643

Pulled By: ngimel

fbshipit-source-id: b1f8061406364b667e2c2d4d30fbce1f0d8456be
2021-08-31 13:34:23 -07:00
Joel Schlosser
196fd3ee7a Modules note v2 (#63963)
Summary:
This PR expands the [note on modules](https://pytorch.org/docs/stable/notes/modules.html) with additional info for 1.10.

It adds the following:
* Examples of using hooks
* Examples of using apply()
* Examples for ParameterList / ParameterDict
* register_parameter() / register_buffer() usage
* Discussion of train() / eval() modes
* Distributed training overview / links
* TorchScript overview / links
* Quantization overview / links
* FX overview / links
* Parametrization overview / link to tutorial
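
As one illustration of the hooks material (a minimal sketch, not taken from the note itself):

```python
import torch
import torch.nn as nn

def print_shape_hook(module, inputs, output):
    print(f"{module.__class__.__name__}: {output.shape}")

m = nn.Linear(4, 2)
handle = m.register_forward_hook(print_shape_hook)
m(torch.randn(3, 4))   # prints "Linear: torch.Size([3, 2])"
handle.remove()
```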

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63963

Reviewed By: albanD

Differential Revision: D30606604

Pulled By: jbschlosser

fbshipit-source-id: c1030b19162bcb5fe7364bcdc981a2eb6d6e89b4
2021-08-27 11:30:18 -07:00
Jithun Nair
730ce29baf Add note on ifdefing based on CUDA_VERSION for ROCm path (#62850)
Summary:
CUDA_VERSION and HIP_VERSION follow very unrelated versioning schemes, so it does not make sense to use CUDA_VERSION to determine the ROCm path. This note explicitly addresses it.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62850

Reviewed By: mruberry

Differential Revision: D30547562

Pulled By: malfet

fbshipit-source-id: 02990fa66a88466c2330ab85f446b25b78545150
2021-08-25 15:02:03 -07:00
Victor Quach
b95ce1591d Add docs describing saved tensor hooks (#62362)
Summary:
Add section to the Autograd mechanics docs to describe the recently
exposed saved tensors (https://github.com/pytorch/pytorch/issues/52451), how to register packing / unpacking
hooks (https://github.com/pytorch/pytorch/issues/60975) and how to use default hooks (https://github.com/pytorch/pytorch/issues/61834)

Sister PR: https://github.com/pytorch/pytorch/issues/62361 (will add a link from autograd.rst to notes/autograd in whatever PR does not land first)
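
A minimal sketch of the pack/unpack hook API the section describes (it shipped as `torch.autograd.graph.saved_tensors_hooks`; the hooks here simply move the saved tensor to CPU and back):

```python
import torch

def pack(t):                 # runs when autograd saves a tensor for backward
    return t.cpu()

def unpack(t):               # runs when backward needs the tensor again
    return t.cuda()

x = torch.randn(4, device="cuda", requires_grad=True)
with torch.autograd.graph.saved_tensors_hooks(pack, unpack):
    y = x.pow(2)             # the saved copy of x lives on the CPU
y.sum().backward()
```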

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62362

Reviewed By: soulitzer

Differential Revision: D30453177

Pulled By: Varal7

fbshipit-source-id: f5759977b069ff0ef36a47b08856d297691a6caa
2021-08-20 11:10:51 -07:00
soulitzer
2f615f6313 Improve custom function docs (#60312)
Summary:
- Adds some code examples for `ctx` methods and makes the requirements on arguments clearer
- Type annotations for `save_for_backward`, `mark_dirty`, `mark_non_differentiable`, and `set_materialize_grads` (BC-breaking?)
- Refactor `torch.autograd.Function` doc
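
For context, the canonical shape such a custom function takes, using the `ctx` methods named above (a standard sketch):

```python
import torch

class Exp(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        result = x.exp()
        ctx.save_for_backward(result)   # stash what backward needs
        return result

    @staticmethod
    def backward(ctx, grad_output):
        result, = ctx.saved_tensors
        return grad_output * result

x = torch.randn(3, requires_grad=True)
Exp.apply(x).sum().backward()
```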

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60312

Reviewed By: VitalyFedyunin

Differential Revision: D30314961

Pulled By: soulitzer

fbshipit-source-id: a284314b65662e26390417bd2b6b12cd85e68dc8
2021-08-18 11:31:31 -07:00
kyshel
e75ed4a4b5 add comma to prevent syntax errors (#62492)
Summary:

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62492

Reviewed By: VitalyFedyunin

Differential Revision: D30304684

Pulled By: ezyang

fbshipit-source-id: db08ca39bcecbfd79ea50df18536bf4e87f51e15
2021-08-16 12:27:31 -07:00
cpatru
6d896cb545 Update faq.rst so OOM section mentions checkpoint (#62709)
Summary:
This FAQ has a section on CUDA OOMs with lots of don'ts, which limits modeling solutions. Deep nets can blow up memory due to output caching during training.
It's a known problem with a known solution: trade off compute for memory via checkpointing.
The FAQ should mention it.
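
A minimal sketch of that trade-off with `torch.utils.checkpoint` (the module and shapes are illustrative):

```python
import torch
from torch.utils.checkpoint import checkpoint

block = torch.nn.Sequential(torch.nn.Linear(128, 128), torch.nn.ReLU())
x = torch.randn(16, 128, requires_grad=True)

# Activations inside `block` are recomputed in backward instead of stored
y = checkpoint(block, x)
y.sum().backward()
```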

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62709

Reviewed By: nairbv

Differential Revision: D30103326

Pulled By: ezyang

fbshipit-source-id: 3a8b465a7fbe19aae88f83cc50fe82ebafcb56c9
2021-08-05 07:40:08 -07:00
Victor Quach
5830f122f1 Add docstrings for save_on_cpu hooks (#62410)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62410

This PR adds docstrings for CPU hooks introduced in #61928.

Also uncomments the warning about pinned memory in CUDA semantics docs.

Depends on: #62361.

For now docstrings are an orphan page at https://docs-preview.pytorch.org/62410/generated/torch.autograd.graph.set_save_on_cpu_hooks.html#torch-autograd-graph-set-save-on-cpu-hooks
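
The hooks eventually shipped as a context manager; a hedged sketch using the later spelling, `torch.autograd.graph.save_on_cpu` (the orphan docstring above predates the final name):

```python
import torch

x = torch.randn(64, 64, device="cuda", requires_grad=True)
with torch.autograd.graph.save_on_cpu(pin_memory=True):
    y = x.matmul(x)      # tensors saved for backward live in pinned CPU memory
y.sum().backward()
```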

Test Plan: Imported from OSS

Reviewed By: soulitzer

Differential Revision: D29990129

Pulled By: Varal7

fbshipit-source-id: 7a98eeee6a0abb11e2c2d9169cd1aa35ad7ba3f4
2021-08-03 17:53:45 -07:00
Michael Dagitses
58df01c3b8 clarify default value of requires_grad for tensors (#61038)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61038

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D29491984

Pulled By: dagitses

fbshipit-source-id: 7e6b7f8e81d77f38c881b86a68c17d3cf5483dad
2021-07-12 12:57:37 -07:00
Jithun Nair
336970c03e Add note on torch.distributed backends on ROCm (#58975)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/58975

Reviewed By: soulitzer

Differential Revision: D29595510

Pulled By: rohan-varma

fbshipit-source-id: 384bb67fcd003d65b76e957a474406b2a38099b9
2021-07-10 03:51:19 -07:00
Michael Carilli
2fa6c7627e [CUDA graphs][BC-breaking] Removes post-backward syncs on default stream (#60421)
Summary:
Before https://github.com/pytorch/pytorch/pull/57833, calls to backward() or grad() synced only the calling thread's default stream with autograd leaf streams at the end of backward. This made the following weird pattern safe:
```python
with torch.cuda.stream(s):
    # imagine forward used many streams, so backward leaf nodes may run on many streams
    loss.backward()
# no sync
use grads
```

but a more benign-looking pattern was unsafe:
```python
with torch.cuda.stream(s):
    # imagine forward used a lot of streams, so backward leaf nodes may run on many streams
    loss.backward()
    # backward() syncs the default stream with all the leaf streams, but does not sync s with anything,
    # so counterintuitively (even though we're in the same stream context as backward()!)
    # it is NOT SAFE to use grads here, and there's no easy way to make it safe,
    # unless you manually sync on all the streams you used in forward,
    # or move "use grads" back to default stream outside the context.
    use grads
```
mruberry, ngimel, and I decided backward() should have the [same user-facing stream semantics as any cuda op](https://pytorch.org/docs/master/notes/cuda.html#stream-semantics-of-backward-passes).** In other words, the weird pattern should be unsafe, and the benign-looking pattern should be safe. Implementation-wise, this meant backward() should sync its calling thread's current stream, not the default stream, with the leaf streams.

After https://github.com/pytorch/pytorch/pull/57833, backward syncs the calling thread's current stream AND default stream with all leaf streams at the end of backward. The default stream syncs were retained for temporary backward compatibility.

This PR finishes https://github.com/pytorch/pytorch/pull/57833's work by deleting syncs on the default stream.

With this PR, graph-capturing an entire backward() call should be possible (see the [test_graph_grad_scaling diffs](https://github.com/pytorch/pytorch/compare/master...mcarilli:streaming_backwards_remove_default_syncs?expand=1#diff-893b1eea27352f336f4cd832919e48d721e4e90186e63400b8596db6b82e7450R3641-R3642)).

** The first paragraph has a formatting error, which this PR should also fix.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60421

Reviewed By: albanD

Differential Revision: D29370344

Pulled By: ngimel

fbshipit-source-id: 3248bc5fb92fc517db0c15c897e5d7250f67d7fe
2021-06-24 17:34:02 -07:00
Luca Wehrstedt
bb9e1150ea Revert D29342234: [pytorch][PR] [CUDA graphs][BC-breaking] Removes post-backward syncs on default stream
Test Plan: revert-hammer

Differential Revision:
D29342234 (675cea1adb)

Original commit changeset: 98e6be7fdd85

fbshipit-source-id: 84022973248b2254210eee57402df2c4f4bc43c6
2021-06-24 04:49:28 -07:00
Michael Carilli
675cea1adb [CUDA graphs][BC-breaking] Removes post-backward syncs on default stream (#60421)
Summary:
Before https://github.com/pytorch/pytorch/pull/57833, calls to backward() or grad() synced only the calling thread's default stream with autograd leaf streams at the end of backward. This made the following weird pattern safe:
```python
with torch.cuda.stream(s):
    # imagine forward used many streams, so backward leaf nodes may run on many streams
    loss.backward()
# no sync
use grads
```

but a more benign-looking pattern was unsafe:
```python
with torch.cuda.stream(s):
    # imagine forward used a lot of streams, so backward leaf nodes may run on many streams
    loss.backward()
    # backward() syncs the default stream with all the leaf streams, but does not sync s with anything,
    # so counterintuitively (even though we're in the same stream context as backward()!)
    # it is NOT SAFE to use grads here, and there's no easy way to make it safe,
    # unless you manually sync on all the streams you used in forward,
    # or move "use grads" back to default stream outside the context.
    use grads
```
mruberry, ngimel, and I decided backward() should have the [same user-facing stream semantics as any cuda op](https://pytorch.org/docs/master/notes/cuda.html#stream-semantics-of-backward-passes).** In other words, the weird pattern should be unsafe, and the benign-looking pattern should be safe. Implementation-wise, this meant backward() should sync its calling thread's current stream, not the default stream, with the leaf streams.

After https://github.com/pytorch/pytorch/pull/57833, backward syncs the calling thread's current stream AND default stream with all leaf streams at the end of backward. The default stream syncs were retained for temporary backward compatibility.

This PR finishes https://github.com/pytorch/pytorch/pull/57833's work by deleting syncs on the default stream.

With this PR, graph-capturing an entire backward() call should be possible (see the [test_graph_grad_scaling diffs](https://github.com/pytorch/pytorch/compare/master...mcarilli:streaming_backwards_remove_default_syncs?expand=1#diff-893b1eea27352f336f4cd832919e48d721e4e90186e63400b8596db6b82e7450R3641-R3642)).

** The first paragraph has a formatting error, which this PR should also fix.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60421

Reviewed By: VitalyFedyunin, albanD

Differential Revision: D29342234

Pulled By: ngimel

fbshipit-source-id: 98e6be7fdd8550872f0a78f9a66cb8dfe75abf63
2021-06-23 23:35:24 -07:00
Michael Wootton
2f3be2735f Don't split oversize cached blocks (#44742)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/35901

This change is designed to prevent fragmentation in the Caching Allocator. Permissive block splitting in the allocator allows very large blocks to be split into many pieces. Once split too finely, it is unlikely all pieces will be 'free' at the same time, so the original allocation can never be returned. Anecdotally, we've seen a model run out of memory failing to allocate a 50 MB block on a 32 GB card while the caching allocator was holding 13 GB of 'split free blocks'.

Approach:

- Large blocks above a certain size are designated "oversize". This limit is currently set one decade above "large", at 200 MB
- Oversize blocks cannot be split
- Oversize blocks must closely match the requested size (e.g., a 200 MB request will match an existing 205 MB block, but not a 300 MB block)
- In lieu of splitting oversize blocks, there is a mechanism to quickly free a single oversize block (to the system allocator) to allow an appropriately sized block to be allocated. This is activated under memory pressure and prevents `release_cached_blocks()` from triggering

Initial performance tests show this is similar or quicker than the original strategy.  Additional tests are ongoing.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44742

Reviewed By: zou3519

Differential Revision: D29186394

Pulled By: ezyang

fbshipit-source-id: c88918836db3f51df59de6d1b3e03602ebe306a9
2021-06-21 11:46:08 -07:00
Michael Carilli
be038d8989 [CUDA graphs] Make stream semantics of backward calls consistent with other cuda ops (ci-all edition) (#57833)
Summary:
ci-all resubmit of https://github.com/pytorch/pytorch/pull/54227.

Tests look good except for a few distributed autograd failures (pytorch_linux_xenial_cuda10_2_cudnn7_py3_multigpu_test) and rocm failures (pr/pytorch-linux-bionic-rocm4.1-py3.6).

The common denominator in rocm failures appears to be multi-gpu activity: some [multiprocess DDP failures](https://ci.pytorch.org/jenkins/job/pytorch-builds/job/pytorch-linux-bionic-rocm4.1-py3.6-test1/8115/console), some [single-process failures](https://ci.pytorch.org/jenkins/job/pytorch-builds/job/pytorch-linux-bionic-rocm4.1-py3.6-test2/8115/console) where the single process has autograd ops that span devices. jeffdaily jithunnair-amd sunway513, could one of you take a look? The streaming backward change is also beneficial to rocm, I expect.

For debugging rocm failures, I think we should ignore the multiprocess/DDP tests and focus on the single process cases. The root cause is probably the same and the single process cases are simpler.

----------------------------------

Update: Rocm failures are due to https://github.com/pytorch/pytorch/issues/59750.
2718a54032 is a workaround, to be updated once https://github.com/pytorch/pytorch/issues/59750 is fixed.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/57833

Reviewed By: mruberry

Differential Revision: D28942391

Pulled By: ngimel

fbshipit-source-id: d6047e971c5f1c6386334bf3641402a92f12e2f8
2021-06-13 12:09:56 -07:00
Jeffrey Wan
a7a5992d7d Add no-grad inference mode note (#58513)
Summary:
Adds a note to the autograd docs explaining the difference between several often-conflated mechanisms.
Also adds a link to this note from the docs in `grad_mode` and `nn.module`.
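
A tiny sketch contrasting two of the mechanisms the note covers:

```python
import torch

x = torch.ones(2, requires_grad=True)

with torch.no_grad():
    y = x * 2            # y.requires_grad is False; y is otherwise ordinary

with torch.inference_mode():
    z = x * 2            # z is an "inference tensor": it can never be
                         # recorded by autograd, even after the block exits
```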

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58513

Reviewed By: gchanan

Differential Revision: D28651129

Pulled By: soulitzer

fbshipit-source-id: af9eb1749b641fc1b632815634eea36bf7979156
2021-05-25 13:06:54 -07:00
Jithun Nair
ab6b5fa036 Add HIP (ROCm) semantics doc (#57871)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/57871

Reviewed By: agolynski

Differential Revision: D28385510

Pulled By: malfet

fbshipit-source-id: 9cf69e52d026a1cf74cc12d8727ca17ae026235e
2021-05-12 12:34:07 -07:00
albanD
d16ed1ee8a Add first draft of gradcheck note (#55966)
Summary:
You can find the latest rendered version in the `python_doc_build` CI job below, in the artifact tab of that build on circle CI

Pull Request resolved: https://github.com/pytorch/pytorch/pull/55966

Reviewed By: H-Huang

Differential Revision: D28032446

Pulled By: albanD

fbshipit-source-id: 227ad37b03d39894d736c19cae3195b4d56fc62f
2021-04-27 14:33:42 -07:00
Ilqar Ramazanli
70d9be0f42 Replace duplicative s with alpha (#56804)
Summary:
It is always easier to read a document when different objects/concepts are denoted with different variables/representations.
In this PR we make sure that in the [complex autograd](https://pytorch.org/docs/master/notes/autograd.html#autograd-for-complex-numbers) documentation, the variables for the output and the step size diverge.

Fixes https://github.com/pytorch/pytorch/issues/53633

Pull Request resolved: https://github.com/pytorch/pytorch/pull/56804

Reviewed By: anjali411

Differential Revision: D27989959

Pulled By: iramazanli

fbshipit-source-id: c271590ee744c8aeeff62bfaa2295429765ef64e
2021-04-25 16:27:09 -07:00
Erjia Guan
8cf85a1152 [DataLoader][doc] Randomness for base_seed generator and NumPy seed (#56528)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56528

Tried to search across internal and external usage of DataLoader. People haven't started using the `generator` argument of `DataLoader` yet.
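
For context, the seeding pattern the randomness docs recommend (a sketch; the dataset is a stand-in):

```python
import random

import numpy as np
import torch
from torch.utils.data import DataLoader, TensorDataset

def seed_worker(worker_id):
    worker_seed = torch.initial_seed() % 2**32   # derived from the base_seed
    np.random.seed(worker_seed)
    random.seed(worker_seed)

g = torch.Generator()
g.manual_seed(0)

dataset = TensorDataset(torch.arange(8.0))       # stand-in dataset
loader = DataLoader(dataset, batch_size=2, num_workers=2,
                    worker_init_fn=seed_worker, generator=g)
```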

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D27908487

Pulled By: ejguan

fbshipit-source-id: 14c83ed40d4ba4dc988b121968a78c2732d8eb93
2021-04-22 09:40:45 -07:00
Natalia Gimelshein
f94c95a2dd Revert D23752058: [pytorch][PR] Don't split oversize cached blocks
Test Plan: revert-hammer

Differential Revision:
D23752058 (67dcd62310)

Original commit changeset: ccb7c13e3cf8

fbshipit-source-id: 12ae9702135ea510e9714ed97fb75ca3b9f97c27
2021-04-14 09:24:08 -07:00
Michael Wootton
67dcd62310 Don't split oversize cached blocks (#44742)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/35901

This change is designed to prevent fragmentation in the Caching Allocator. Permissive block splitting in the allocator allows very large blocks to be split into many pieces. Once split too finely, it is unlikely all pieces will be 'free' at the same time, so the original allocation can never be returned. Anecdotally, we've seen a model run out of memory failing to allocate a 50 MB block on a 32 GB card while the caching allocator was holding 13 GB of 'split free blocks'.

Approach:

- Large blocks above a certain size are designated "oversize". This limit is currently set one decade above "large", at 200 MB
- Oversize blocks cannot be split
- Oversize blocks must closely match the requested size (e.g., a 200 MB request will match an existing 205 MB block, but not a 300 MB block)
- In lieu of splitting oversize blocks, there is a mechanism to quickly free a single oversize block (to the system allocator) to allow an appropriately sized block to be allocated. This is activated under memory pressure and prevents `release_cached_blocks()` from triggering

Initial performance tests show this is similar or quicker than the original strategy.  Additional tests are ongoing.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44742

Reviewed By: ngimel

Differential Revision: D23752058

Pulled By: ezyang

fbshipit-source-id: ccb7c13e3cf8ef2707706726ac9aaac3a5e3d5c8
2021-04-14 03:04:41 -07:00
Yukio Siraichi
93bf0ae6fc Remove legacy constructor calls from pytorch codebase. (#54142)
Summary:
Follow up from https://github.com/pytorch/pytorch/issues/53889
Related to https://github.com/pytorch/pytorch/issues/47112

Removing every occurrence of the legacy constructor call present in PyTorch at:
- _docs_
- _benchmarks_
- _test_
- _caffe2_
- _CONTRIBUTING.md_
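
For reference, the shape of the substitutions (a representative sketch, not an actual diff from the PR):

```python
import torch

# Legacy constructor: dtype-ambiguous and discouraged
a = torch.Tensor([1, 2, 3])    # always float32, regardless of input dtype

# Preferred factory functions
b = torch.tensor([1, 2, 3])    # infers dtype (int64 here)
c = torch.empty(3)             # explicit uninitialized float32 allocation
```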

Pull Request resolved: https://github.com/pytorch/pytorch/pull/54142

Reviewed By: ngimel

Differential Revision: D27699450

Pulled By: mruberry

fbshipit-source-id: 530aa3f5746cc8bc1407d5d51b2bbd8075e30546
2021-04-11 15:45:17 -07:00
James Reed
ec38dda1cc Remove extra close bracket in extending.rst (#55409)
Summary:
Small typo

Pull Request resolved: https://github.com/pytorch/pytorch/pull/55409

Reviewed By: pbelevich

Differential Revision: D27611177

Pulled By: jamesr66a

fbshipit-source-id: 8a5ff702e4ab8a7eb2403432889f8b7a5a69484b
2021-04-07 21:15:46 -07:00
Peter Bell
8ac0619784 Avoid infinite recursion in __torch_function__ example (#55391)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/55284

This gets the example to run but probably doesn't help the readability of the example.

Thoughts?
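
One common way such examples break the loop (a hedged sketch using the `torch._C.DisableTorchFunction` context manager; not necessarily the exact fix in this PR):

```python
import torch

class MyTensor(torch.Tensor):
    @classmethod
    def __torch_function__(cls, func, types, args=(), kwargs=None):
        if kwargs is None:
            kwargs = {}
        # Calling func directly here would dispatch straight back into this
        # method; disabling the protocol for the inner call breaks the loop.
        with torch._C.DisableTorchFunction():
            return func(*args, **kwargs)

t = torch.randn(2).as_subclass(MyTensor)
print(t + t)     # dispatches through __torch_function__ exactly once
```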

Pull Request resolved: https://github.com/pytorch/pytorch/pull/55391

Reviewed By: mrshenli

Differential Revision: D27621096

Pulled By: ezyang

fbshipit-source-id: d02c4fb0001e54139a167b477fd3b4a229e4dc8c
2021-04-07 20:31:46 -07:00
James Reed
c96f076248 Fix typo in extending.rst (#55408)
Summary:
Small typo in docs

Pull Request resolved: https://github.com/pytorch/pytorch/pull/55408

Reviewed By: pbelevich

Differential Revision: D27611175

Pulled By: jamesr66a

fbshipit-source-id: a83a6220054c0411329792c7ac6afceb2b699f44
2021-04-07 03:46:01 -07:00
Sam Estep
5bcbbf5373 Lint trailing newlines (#54737)
Summary:
*Context:* https://github.com/pytorch/pytorch/issues/53406 added a lint for trailing whitespace at the ends of lines. However, in order to pass FB-internal lints, that PR also had to normalize the trailing newlines in four of the files it touched. This PR adds an OSS lint to normalize trailing newlines.

The changes to the following files (made in 54847d0adb9be71be4979cead3d9d4c02160e4cd) are the only manually-written parts of this PR:

- `.github/workflows/lint.yml`
- `mypy-strict.ini`
- `tools/README.md`
- `tools/test/test_trailing_newlines.py`
- `tools/trailing_newlines.py`

I would have liked to make this just a shell one-liner like the other three similar lints, but nothing I could find quite fit the bill. Specifically, all the answers I tried from the following Stack Overflow questions were far too slow (at least a minute and a half to run on this entire repository):

- [How to detect file ends in newline?](https://stackoverflow.com/q/38746)
- [How do I find files that do not end with a newline/linefeed?](https://stackoverflow.com/q/4631068)
- [How to list all files in the Git index without newline at end of file](https://stackoverflow.com/q/27624800)
- [Linux - check if there is an empty line at the end of a file [duplicate]](https://stackoverflow.com/q/34943632)
- [git ensure newline at end of each file](https://stackoverflow.com/q/57770972)

To avoid giving false positives during the few days after this PR is merged, we should probably only merge it after https://github.com/pytorch/pytorch/issues/54967.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/54737

Test Plan:
Running the shell script from the "Ensure correct trailing newlines" step in the `quick-checks` job of `.github/workflows/lint.yml` should print no output and exit in a fraction of a second with a status of 0. That was not the case prior to this PR, as shown by this failing GHA workflow run on an earlier draft of this PR:

- https://github.com/pytorch/pytorch/runs/2197446987?check_suite_focus=true

In contrast, this run (after correcting the trailing newlines in this PR) succeeded:

- https://github.com/pytorch/pytorch/pull/54737/checks?check_run_id=2197553241

To unit-test `tools/trailing_newlines.py` itself (this is run as part of our "Test tools" GitHub Actions workflow):
```
python tools/test/test_trailing_newlines.py
```

Reviewed By: malfet

Differential Revision: D27409736

Pulled By: samestep

fbshipit-source-id: 46f565227046b39f68349bbd5633105b2d2e9b19
2021-03-30 13:09:52 -07:00
Jeff Yang
475251631b docs: reference links to serialization.html (#54659)
Summary:
fixes https://github.com/pytorch/pytorch/issues/54311
https://11811979-65600975-gh.circle-artifacts.com/0/docs/generated/torch.save.html

Pull Request resolved: https://github.com/pytorch/pytorch/pull/54659

Reviewed By: ailzhang

Differential Revision: D27328281

Pulled By: zou3519

fbshipit-source-id: b88d02e5407238a338d537d013a297ae9cdf922b
2021-03-29 10:15:07 -07:00
Eric Jang
c2ccb3578e Fix inport -> import typo in documentation (#53589)
Summary:
Fixes a small documentation typo

Pull Request resolved: https://github.com/pytorch/pytorch/pull/53589

Reviewed By: ngimel

Differential Revision: D26907045

Pulled By: Chillee

fbshipit-source-id: 15c35bec8d75dd897fe8886d0e0e1b889df65b24
2021-03-08 23:56:42 -08:00
Sam Estep
8c798e0622 Forbid trailing whitespace (#53406)
Summary:
Context: https://github.com/pytorch/pytorch/pull/53299#discussion_r587882857

These are the only hand-written parts of this diff:
- the addition to `.github/workflows/lint.yml`
- the file endings changed in these four files (to appease FB-internal land-blocking lints):
  - `GLOSSARY.md`
  - `aten/src/ATen/core/op_registration/README.md`
  - `scripts/README.md`
  - `torch/csrc/jit/codegen/fuser/README.md`

The rest was generated by running this command (on macOS):
```
git grep -I -l ' $' -- . ':(exclude)**/contrib/**' ':(exclude)third_party' | xargs gsed -i 's/ *$//'
```

I looked over the auto-generated changes and didn't see anything that looked problematic.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/53406

Test Plan:
This run (after adding the lint but before removing existing trailing spaces) failed:
- https://github.com/pytorch/pytorch/runs/2043032377

This run (on the tip of this PR) succeeded:
- https://github.com/pytorch/pytorch/runs/2043296348

Reviewed By: walterddr, seemethere

Differential Revision: D26856620

Pulled By: samestep

fbshipit-source-id: 3f0de7f7c2e4b0f1c089eac9b5085a58dd7e0d97
2021-03-05 17:22:55 -08:00
peter
8870c391e9 Update mkl to 2020.2.254 (#52964)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/52907

Pull Request resolved: https://github.com/pytorch/pytorch/pull/52964

Reviewed By: H-Huang

Differential Revision: D26726464

Pulled By: seemethere

fbshipit-source-id: 8f3067292e6416e299b4b040c8fb73510134f02e
2021-03-01 11:13:57 -08:00
Joel Schlosser
a0137808a7 Note on Modules for 1.8 docs (#51536)
Summary:
A new note on Modules for 1.8 documentation.

Rendered form can be seen here: https://alband.github.io/doc_view/notes/modules.html
(thanks Alban!)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51536

Reviewed By: albanD

Differential Revision: D26254282

Pulled By: jbschlosser

fbshipit-source-id: 09cbd46aa268a29b6f54fd48ffe1d6b98db0ff31
2021-02-04 11:28:11 -08:00
anjali411
34d4d79966 Autograd doc note fix (#51661)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/51661

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D26230912

Pulled By: anjali411

fbshipit-source-id: 94323d7bce631a4c5781020e9650495461119ede
2021-02-03 15:08:35 -08:00
Mike Ruberry
40c0fffb4b Fixes docs (#51439)
Summary:
pytorch_python_doc_build is failing with:

```
Jan 31 04:30:45 /var/lib/jenkins/workspace/docs/source/notes/broadcasting.rst:6: WARNING: 'any' reference target not found: numpy.doc.broadcasting
```

this removes the incorrect reference and adds an updated link.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51439

Reviewed By: ngimel

Differential Revision: D26170232

Pulled By: mruberry

fbshipit-source-id: 829999db52e1e860d36d626d0d9f26e31283d14b
2021-01-31 22:00:26 -08:00
anjali411
fd9a85d21b Doc update for complex numbers (#51129)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/51129

Test Plan: Imported from OSS

Reviewed By: pbelevich

Differential Revision: D26094947

Pulled By: anjali411

fbshipit-source-id: 4e1cdf8915a8c6a86ac3462685cdce881e1bcffa
2021-01-27 07:32:26 -08:00
Hameer Abbasi
f7b339d11c Clarify wording around overrides subclasses. (#51031)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/47117

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51031

Reviewed By: bdhirsh

Differential Revision: D26047498

Pulled By: albanD

fbshipit-source-id: dd0a7d9f97c0f6469b3050d2e3b4473f1bee3820
2021-01-25 08:19:13 -08:00
Kurt Mohler
8ab1a1495d Rename set_deterministic to use_deterministic_algorithms (#49904)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/49100
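
For reference, the renamed call (a one-line sketch):

```python
import torch

torch.use_deterministic_algorithms(True)   # formerly torch.set_deterministic(True)
# Ops that lack a deterministic implementation now raise a RuntimeError
```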

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49904

Reviewed By: ezyang, mrshenli

Differential Revision: D25956761

Pulled By: mruberry

fbshipit-source-id: 86a59289d50825a0ebbd7c358b483c8d8039ffa6
2021-01-22 11:27:07 -08:00
Jeffrey Wan
1833009202 Fix typo in complex autograd docs (#49755)
Summary:
Update complex autograd docs to fix a typo

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49755

Reviewed By: mruberry

Differential Revision: D25692649

Pulled By: soulitzer

fbshipit-source-id: 43c2113b4c8f2d1828880102189a5a9b887dc784
2020-12-23 14:42:34 -08:00
pbialecki
1451d84766 Minor doc fix: change truncating to rounding in TF32 docs (#49625)
Summary:
Minor doc fix in clarifying that the input data is rounded not truncated.

CC zasdfgbnm ngimel

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49625

Reviewed By: mruberry

Differential Revision: D25668244

Pulled By: ngimel

fbshipit-source-id: ac97e41e0ca296276544f9e9f85b2cf1790d9985
2020-12-22 13:46:33 -08:00
Peter Bell
5180caeeb4 Remove deprecated spectral ops from torch namespace (#48594)
Summary:
Ref https://github.com/pytorch/pytorch/issues/42175

This removes the 4 deprecated spectral functions: `torch.{fft,rfft,ifft,irfft}`. `torch.fft` is also now imported by default.

The actual `at::native` functions are still used in `torch.stft`, so they can't be fully removed yet. But they will be once https://github.com/pytorch/pytorch/issues/47601 has been merged.
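
The migration this implies, sketched (shapes illustrative):

```python
import torch

x = torch.randn(8)

# Removed: torch.rfft(x, 1) and the other legacy spectral functions
# Replacement: the torch.fft module, now imported by default
X = torch.fft.rfft(x)
x_back = torch.fft.irfft(X, n=x.numel())
```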

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48594

Reviewed By: heitorschueroff

Differential Revision: D25298929

Pulled By: mruberry

fbshipit-source-id: e36737fe8192fcd16f7e6310f8b49de478e63bf0
2020-12-05 04:12:32 -08:00
peter
3c5db30eaa Update magma to 2.5.4 for Windows (#48656)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/48527

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48656

Reviewed By: zhangguanheng66

Differential Revision: D25261601

Pulled By: malfet

fbshipit-source-id: 4ba0036ca882bccd1990108d13596455d179d06e
2020-12-02 09:45:21 -08:00
Hameer Abbasi
4e15877d5c Add documentation for torch.overrides submodule. (#48170)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/48087

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48170

Reviewed By: ejguan

Differential Revision: D25220942

Pulled By: ezyang

fbshipit-source-id: a2b7f7b565f5e77173d8ce2fe9676a8131f929b6
2020-11-30 11:25:31 -08:00
Hameer Abbasi
3a2aad9314 Fix documentation to point to torch.overrides instead of _overrides. (#47842)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/47697

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47842

Reviewed By: smessmer

Differential Revision: D24951750

Pulled By: ezyang

fbshipit-source-id: df62ec2e52f1c561c864a50bac4abf4a55e4f8e6
2020-11-16 08:28:53 -08:00
Masaki Kozuki
2eb1e866e8 Update links in DDP note (#47663)
Summary:
Update the links in https://pytorch.org/docs/stable/notes/ddp.html#.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47663

Reviewed By: smessmer

Differential Revision: D24951684

Pulled By: ezyang

fbshipit-source-id: c1c104d76cf0292a7fc75a627bf76bb56fea72d0
2020-11-13 21:26:28 -08:00
Xiang Gao
4a7de2746f Add docs on how to toggle TF32 flags on C++ (#47331)
Summary:
I have been asked several times how to toggle this flag in libtorch. I think it would be good to mention it in the docs.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47331

Reviewed By: glaringlee

Differential Revision: D24777576

Pulled By: mruberry

fbshipit-source-id: cc2a338c477bb57e0bb74b8960c47fde99665e41
2020-11-08 01:29:24 -08:00
anjali411
ac245f6b45 Complex autograd doc fix (#46258)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/46258

Test Plan: Imported from OSS

Reviewed By: ailzhang

Differential Revision: D24286512

Pulled By: anjali411

fbshipit-source-id: 60bc98d69336101c0d8fe5ab542b9757b5e7faac
2020-10-13 14:36:50 -07:00
Vitaly Fedyunin
31ee5d8d8b Adding information how to control randomness with DataLoader (#45749)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/45749

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D24088407

Pulled By: VitalyFedyunin

fbshipit-source-id: 398b73ec5e8c83000ebc692001da847fc0aaa48f
2020-10-12 16:57:58 -07:00
anjali411
89256611b5 Doc note update for complex autograd (#45270)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45270

<img width="1679" alt="Screen Shot 2020-10-07 at 1 45 59 PM" src="https://user-images.githubusercontent.com/20081078/95368324-fa7b2d00-08a3-11eb-9066-2e659a4085a2.png">
<img width="1673" alt="Screen Shot 2020-10-07 at 1 46 10 PM" src="https://user-images.githubusercontent.com/20081078/95368332-fbac5a00-08a3-11eb-9be5-77ce6deb8967.png">
<img width="1667" alt="Screen Shot 2020-10-07 at 1 46 30 PM" src="https://user-images.githubusercontent.com/20081078/95368337-fe0eb400-08a3-11eb-80a2-5ad23feeeb83.png">
<img width="1679" alt="Screen Shot 2020-10-07 at 1 46 48 PM" src="https://user-images.githubusercontent.com/20081078/95368345-00710e00-08a4-11eb-96d9-e2d544554a4b.png">
<img width="1680" alt="Screen Shot 2020-10-07 at 1 47 03 PM" src="https://user-images.githubusercontent.com/20081078/95368350-023ad180-08a4-11eb-89b3-f079480741f4.png">
<img width="1680" alt="Screen Shot 2020-10-07 at 1 47 12 PM" src="https://user-images.githubusercontent.com/20081078/95368364-0535c200-08a4-11eb-82fc-9435a046e4ca.png">

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D24203257

Pulled By: anjali411

fbshipit-source-id: cd637dade5fb40cecf5d9f4bd03d508d36e26fcd
2020-10-08 15:04:52 -07:00
Michael Carilli
5640b79bf8 Allow consumer ops to sync on GraphRoot's gradient (#45787)
Summary:
Currently, a GraphRoot instance doesn't have an associated stream.  Streaming backward synchronization logic assumes the instance ran on the default stream, and tells consumer ops to sync with the default stream.  If the gradient the GraphRoot instance passes to consumer backward ops was populated on a non-default stream, we have a race condition.

The race condition can exist even if the user doesn't give a manually populated gradient:
```python
with torch.cuda.stream(side_stream):
    # loss.backward() implicitly synthesizes a one-element 1.0 tensor on side_stream
    # GraphRoot passes it to consumers, but consumers first sync on default stream, not side_stream.
    loss.backward()

    # Internally to backward(), streaming-backward logic takes over, stuff executes on the same stream it ran on in forward,
    # and the side_stream context is irrelevant.  GraphRoot's interaction with its first consumer(s) is the spot where
    # the side_stream context causes a problem.
```

This PR fixes the race condition by associating a GraphRoot instance, at construction time, with the current stream(s) on the device(s) of the grads it will pass to consumers. (I think this relies on GraphRoot executing in the main thread, before backward thread(s) fork, because the grads were populated on the main thread.)

The test demonstrates the race condition. It fails reliably without the PR's GraphRoot diffs and passes with the GraphRoot diffs.

With the GraphRoot diffs, manually populating an incoming-gradient arg for `backward` (or `torch.autograd.grad`) and the actual call to `autograd.backward` will have the same stream-semantics relationship as any other pair of ops:
```python
# implicit population is safe
with torch.cuda.stream(side_stream):
    loss.backward()

# explicit population in side stream then backward in side stream is safe
with torch.cuda.stream(side_stream):
    kickoff_grad = torch.ones_like(loss)
    loss.backward(gradient=kickoff_grad)

# explicit population in one stream then backward kickoff in another stream
# is NOT safe, even with this PR's diffs, but that unsafety is consistent with
# stream-semantics relationship of any pair of ops
kickoff_grad = torch.ones_like(loss)
with torch.cuda.stream(side_stream):
    loss.backward(gradient=kickoff_grad)

# Safe, as you'd expect for any pair of ops
kickoff_grad = torch.ones_like(loss)
side_stream.wait_stream(torch.cuda.current_stream())
with torch.cuda.stream(side_stream):
    loss.backward(gradient=kickoff_grad)
```
This PR also adds the last three examples above to cuda docs and references them from autograd docstrings.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45787

Reviewed By: nairbv

Differential Revision: D24138376

Pulled By: albanD

fbshipit-source-id: bc4cd9390f9f0358633db530b1b09f9c1080d2a3
2020-10-07 08:53:53 -07:00
Bert Maher
03342af3a3 Add env variable to bypass CUDACachingAllocator for debugging (#45294)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45294

While tracking down a recent memory corruption bug we found that
cuda-memcheck wasn't finding the bad accesses, and ngimel pointed out that
it's because we use a caching allocator so a lot of "out of bounds" accesses
land in a valid slab.

This PR adds a runtime knob (`PYTORCH_NO_CUDA_MEMORY_CACHING`) that, when set,
bypasses the caching allocator's caching logic so that allocations go straight
to cudaMalloc.  This way, cuda-memcheck will actually work.

Test Plan:
Insert some memory errors and run a test under cuda-memcheck;
observe that cuda-memcheck flags an error where expected.

Specifically I removed the output-masking logic here:
https://github.com/pytorch/pytorch/blob/master/torch/csrc/jit/tensorexpr/cuda_codegen.cpp#L819-L826

And ran:
```
PYTORCH_NO_CUDA_MEMORY_CACHING=1 cuda-memcheck pytest -k test_superslomo test_jit_fuser_te.py
```

Reviewed By: ngimel

Differential Revision: D23964734

Pulled By: bertmaher

fbshipit-source-id: 04efd11e8aff037b9edde80c70585cb820ee6e39
2020-09-28 11:40:04 -07:00
Michael Carilli
3e6bb5233f Reference amp tutorial (recipe) from core amp docs (#44725)
Summary:
https://pytorch.org/tutorials/recipes/recipes/amp_recipe.html is live.  Core amp docs should reference it.

Also i fixed some typos in the `zero_grad` docs we ignored when git was behaving weirdly during ngimel 's merge of https://github.com/pytorch/pytorch/pull/44423.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44725

Reviewed By: mruberry

Differential Revision: D23723807

Pulled By: ngimel

fbshipit-source-id: ca0b76365f8ca908bd978e3b38bf81857fa6c2a3
2020-09-16 11:37:58 -07:00
Michael Carilli
2fd142a2ef Small clarification to amp gradient penalty example (#44667)
Summary:
requested by https://discuss.pytorch.org/t/what-is-the-correct-way-of-computing-a-grad-penalty-using-amp/95827/3
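
For context, the core of the gradient-penalty pattern the example clarifies (a condensed sketch; the model and data are stand-ins):

```python
import torch

model = torch.nn.Linear(8, 1).cuda()        # stand-in model and data
x = torch.randn(16, 8, device="cuda")
scaler = torch.cuda.amp.GradScaler()

with torch.cuda.amp.autocast():
    loss = model(x).pow(2).mean()

# Differentiate the *scaled* loss, then unscale the resulting grads
scaled_grads = torch.autograd.grad(scaler.scale(loss), model.parameters(),
                                   create_graph=True)
inv_scale = 1.0 / scaler.get_scale()
penalty = sum((g * inv_scale).pow(2).sum() for g in scaled_grads).sqrt()

with torch.cuda.amp.autocast():
    loss = loss + penalty                   # penalty joins the final loss
scaler.scale(loss).backward()
```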

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44667

Reviewed By: mruberry

Differential Revision: D23692768

Pulled By: ngimel

fbshipit-source-id: 83c61b94e79ef9f86abed2cc066f188dce0c8456
2020-09-14 21:56:09 -07:00
Gao, Xiang
5e97f251a8 Enable TF32 support for cuDNN (#40737)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/40737

Reviewed By: mruberry

Differential Revision: D22801525

Pulled By: ngimel

fbshipit-source-id: ac7f7e728b4b3e01925337e8c9996f26a6433fd2
2020-09-01 15:34:24 -07:00
Kurt Mohler
d7ee84c9b5 Update determinism documentation (#41692)
Summary:
Add user-facing documentation for set_deterministic.
Also update grammar and readability on the Reproducibility page.
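The API being documented, in brief (`set_deterministic` was the name at the time; it was later renamed to `use_deterministic_algorithms`):
```
import torch

# Raise an error when an op has no deterministic implementation
torch.set_deterministic(True)
```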

Issue https://github.com/pytorch/pytorch/issues/15359

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41692

Reviewed By: ailzhang

Differential Revision: D23433061

Pulled By: mruberry

fbshipit-source-id: 4c4552950803c2aaf80f7bb4792d2095706d07cf
2020-08-31 21:06:24 -07:00
peterjc123
9b05fbd92e Correct the windows docs (#43479)
Summary:
Fixes https://discuss.pytorch.org/t/i-cannot-use-the-pytorch-that-was-built-successfully-from-source-dll-initialization-routine-failed-error-loading-caffe2-detectron-ops-gpu-dll/93243/5?u=peterjc123.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43479

Reviewed By: mrshenli, ngimel

Differential Revision: D23294211

Pulled By: ezyang

fbshipit-source-id: d67df7d0355c2783153d780c94f959758b246d36
2020-08-25 13:41:24 -07:00
Heitor Schueroff de Souza
ffc3da35f4 Don't materialize output grads (#41821)
Summary:
Added a new option to AutogradContext that tells autograd not to materialize output grad tensors; that is, not to expand undefined/None tensors into tensors full of zeros before passing them as input to the backward function.

This PR is the second part that closes https://github.com/pytorch/pytorch/issues/41359. The first PR is https://github.com/pytorch/pytorch/pull/41490.
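For illustration, a minimal sketch of the analogous Python-side knob, `ctx.set_materialize_grads` (a hypothetical example, not code from this PR, which targets the C++ `AutogradContext`):
```
import torch

class SplitInTwo(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        ctx.set_materialize_grads(False)  # don't expand undefined grads to zeros
        return x.clone(), x.clone()

    @staticmethod
    def backward(ctx, g1, g2):
        # With materialization off, an unused output's grad arrives as None
        # rather than a zero-filled tensor, so handle it explicitly.
        grad = torch.zeros_like(g1 if g1 is not None else g2)
        if g1 is not None:
            grad = grad + g1
        if g2 is not None:
            grad = grad + g2
        return grad
```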

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41821

Reviewed By: albanD

Differential Revision: D22693163

Pulled By: heitorschueroff

fbshipit-source-id: a8d060405a17ab1280a8506a06a2bbd85cb86461
2020-08-11 04:27:07 -07:00
Hameer Abbasi
3d46e02ea1 Add __torch_function__ for methods (#37091)
Summary:
According to pytorch/rfcs#3

From the goals in the RFC:

1. Support subclassing `torch.Tensor` in Python (done here)
2. Preserve `torch.Tensor` subclasses when calling `torch` functions on them (done here)
3. Use the PyTorch API with `torch.Tensor`-like objects that are _not_ `torch.Tensor`
   subclasses (done in https://github.com/pytorch/pytorch/issues/30730)
4. Preserve `torch.Tensor` subclasses when calling `torch.Tensor` methods. (done here)
5. Propagating subclass instances correctly also with operators, using
   views/slices/indexing/etc. (done here)
6. Preserve subclass attributes when using methods or views/slices/indexing. (done here)
7. A way to insert code that operates on both functions and methods uniformly
   (so we can write a single function that overrides all operators). (done here)
8. The ability to give external libraries a way to also define
   functions/methods that follow the `__torch_function__` protocol. (will be addressed in a separate PR)

This PR makes the following changes:

1. Adds the `self` argument to the arg parser.
2. Dispatches on `self` as well if `self` is not `nullptr`.
3. Adds a `torch._C.DisableTorchFunction` context manager to disable `__torch_function__`.
4. Adds a `torch::torch_function_enabled()` and `torch._C._torch_function_enabled()` to check the state of `__torch_function__`.
5. Dispatches all `torch._C.TensorBase` and `torch.Tensor` methods via `__torch_function__`.
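As a rough illustration of what goals 1, 4, and 7 enable after this change (a hypothetical subclass, not code from the PR):
```
import torch

class LoggingTensor(torch.Tensor):
    @classmethod
    def __torch_function__(cls, func, types, args=(), kwargs=None):
        if kwargs is None:
            kwargs = {}
        print(f"intercepted {func.__name__}")
        return super().__torch_function__(func, types, args, kwargs)

x = torch.randn(3).as_subclass(LoggingTensor)
y = x + 1          # Tensor methods/operators now dispatch here too
z = torch.sin(x)   # torch-namespace functions already did
```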

TODO:

- [x] Sequence Methods
- [x] Docs
- [x] Tests

Closes https://github.com/pytorch/pytorch/issues/28361

Benchmarks in https://github.com/pytorch/pytorch/pull/37091#issuecomment-633657778

Pull Request resolved: https://github.com/pytorch/pytorch/pull/37091

Reviewed By: ngimel

Differential Revision: D22765678

Pulled By: ezyang

fbshipit-source-id: 53f8aa17ddb8b1108c0997f6a7aa13cb5be73de0
2020-08-05 20:44:13 -07:00
peter
192487d716 Update MAGMA to 2.5.3 for Windows (#42410)
Summary:
In order to introduce CUDA 11 build jobs.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42410

Reviewed By: malfet

Differential Revision: D22892025

Pulled By: ezyang

fbshipit-source-id: 11bd7507f623d654a589ba00a138f6b947990f4c
2020-08-03 07:43:09 -07:00
Xiao Wang
60e2baf5e0 [doc] Add LSTM non-deterministic workaround (#40893)
Summary:
Related: https://github.com/pytorch/pytorch/issues/35661

Preview
![image](https://user-images.githubusercontent.com/24860335/86861581-4b4c7100-c07c-11ea-950a-3145bfae9af9.png)
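The workaround documented there, as I recall the reproducibility note (treat the exact values as assumptions and check the linked page):
```
import os

# cuDNN RNNs (e.g. nn.LSTM) can be made deterministic via environment
# variables set before any CUDA work starts:
#   CUDA 10.1:          CUDA_LAUNCH_BLOCKING=1
#   CUDA 10.2 or later: CUBLAS_WORKSPACE_CONFIG=:4096:2 (or :16:8)
os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:2"

import torch  # import after setting the variable
```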

Pull Request resolved: https://github.com/pytorch/pytorch/pull/40893

Reviewed By: vincentqb

Differential Revision: D22535418

Pulled By: ngimel

fbshipit-source-id: f194ddaff8ec6d03a3616c87466e2cbbe7e429a9
2020-07-21 16:20:02 -07:00
Mike Ruberry
a0e58996fb Makes the use of the term "module" consistent through the serialization note (#41563)
Summary:
module -> torch.nn.Module or ScriptModule, as appropriate, plus a bonus grammar fix.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41563

Reviewed By: gchanan

Differential Revision: D22584173

Pulled By: mruberry

fbshipit-source-id: 8c90f1f9a194bfdb277c97cf02c9b8c1c6ddc601
2020-07-16 14:59:49 -07:00
Mike Ruberry
f49d97a848 Notes for lcm and gcd, formatting doc fixes (#41526)
Summary:
A small PR fixing some formatting in lcm, gcd, and the serialization note. Adds a note to lcm and gcd explaining behavior that is not always defined.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41526

Reviewed By: ngimel

Differential Revision: D22569341

Pulled By: mruberry

fbshipit-source-id: 5f5ff98c0831f65e82b991ef444a5cee8e3c8b5a
2020-07-16 13:15:29 -07:00
anjali411
b9442bb03e Doc note for complex (#41252)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/41252

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D22553266

Pulled By: anjali411

fbshipit-source-id: f6dc409da048496d72b29b0976dfd3dd6645bc4d
2020-07-16 08:53:27 -07:00
Xiang Gao
23174ca71b [reland] Enable TF32 support for cuBLAS (#41498)
Summary:
fix rocm

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41498

Reviewed By: mruberry

Differential Revision: D22560572

Pulled By: ngimel

fbshipit-source-id: 5ee79e96cb29e70d9180830d058efb53d1c6c041
2020-07-15 21:00:55 -07:00
Mike Ruberry
60f2fa6a84 Updates serialization note to explain versioned symbols and dynamic versioning (#41395)
Summary:
Doc update intended to clarify and expand our current serialization behavior, including explaining the difference between torch.save/torch.load, torch.nn.Module.state_dict/torch.nn.Module.load_state_dict, and torch.jit.save/torch.jit.load. Also explains, for the time being, when historic serialized TorchScript behavior is preserved, and our recommendation for preserving behavior (use the same PyTorch version to consume a model as the one that produced it).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41395

Reviewed By: ngimel

Differential Revision: D22560538

Pulled By: mruberry

fbshipit-source-id: dbc2f1bb92ab61ff2eca4888febc21f7dda76ba1
2020-07-15 19:05:19 -07:00
Shen Li
3a63a939d4 Revert D22517785: [pytorch][PR] Enable TF32 support for cuBLAS
Test Plan: revert-hammer

Differential Revision:
D22517785 (288ece89e1)

Original commit changeset: 87334c893561

fbshipit-source-id: 0a0674f49c1bcfc98f7f88af5a8c7de93b76e458
2020-07-15 08:15:48 -07:00
Xiang Gao
288ece89e1 Enable TF32 support for cuBLAS (#40800)
Summary:
Benchmark on a fully connected network and torchvision models (time in seconds) on GA100:

| model              | batch size | forward(TF32) | forward(FP32) | backward(TF32) | backward(FP32) |
|--------------------|------------|---------------|---------------|----------------|----------------|
| FC 512-128-32-8    | 512        | 0.000211      | 0.000321      | 0.000499       | 0.000532       |
| alexnet            | 512        | 0.0184        | 0.0255        | 0.0486         | 0.0709         |
| densenet161        | 128        | 0.0665        | 0.204         | 0.108          | 0.437          |
| googlenet          | 256        | 0.0925        | 0.110         | 0.269          | 0.326          |
| inception_v3       | 256        | 0.155         | 0.214         | 0.391          | 0.510          |
| mnasnet1_0         | 512        | 0.108         | 0.137         | 0.298          | 0.312          |
| mobilenet_v2       | 512        | 0.114         | 0.294         | 0.133          | 0.303          |
| resnet18           | 512        | 0.0722        | 0.100         | 0.182          | 0.228          |
| resnext50_32x4d    | 256        | 0.170         | 0.237         | 0.373          | 0.479          |
| shufflenet_v2_x1_0 | 512        | 0.0463        | 0.0473        | 0.125          | 0.123          |
| squeezenet1_0      | 512        | 0.0870        | 0.0948        | 0.205          | 0.214          |
| vgg16              | 256        | 0.167         | 0.234         | 0.401          | 0.502          |
| wide_resnet50_2    | 512        | 0.186         | 0.310         | 0.415          | 0.638          |
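
For context, the user-facing toggles this feature ended up behind (a sketch of the released flags, not part of this diff):
```
import torch

# Matmuls through cuBLAS are what this PR covers; convolutions are
# governed by the separate cuDNN flag.
torch.backends.cuda.matmul.allow_tf32 = True
torch.backends.cudnn.allow_tf32 = True
```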

Pull Request resolved: https://github.com/pytorch/pytorch/pull/40800

Reviewed By: mruberry

Differential Revision: D22517785

Pulled By: ngimel

fbshipit-source-id: 87334c8935616f72a6af5abbd3ae69f76923dc3e
2020-07-14 13:21:10 -07:00
Michael Carilli
d927aee312 Small clarification of torch.cuda.amp multi-model example (#41203)
Summary:
Some people have been confused by `retain_graph` in the snippet; they thought it was an additional requirement imposed by amp.
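A condensed sketch of the snippet in question (placeholder `model0`/`model1`; `retain_graph=True` is needed only because the two losses share part of the graph, not because of amp):
```
with torch.cuda.amp.autocast():
    out0, out1 = model0(input), model1(input)
    loss0 = loss_fn(2 * out0 + 3 * out1, target)
    loss1 = loss_fn(out0 - out1, target)

scaler.scale(loss0).backward(retain_graph=True)  # graph is reused by loss1
scaler.scale(loss1).backward()
```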

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41203

Differential Revision: D22463700

Pulled By: ngimel

fbshipit-source-id: e6fc8871be2bf0ecc1794b1c6f5ea99af922bf7e
2020-07-10 11:13:26 -07:00
anjali411
db38487ece Autograd Doc for Complex Numbers (#41012)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/41012

Test Plan: Imported from OSS

Differential Revision: D22476911

Pulled By: anjali411

fbshipit-source-id: 7da20cb4312a0465272bebe053520d9911475828
2020-07-10 09:57:43 -07:00
Edward Leardi
6b50874cb7 Fix HTTP links in documentation to HTTPS (#40878)
Summary:
I ran `make linkcheck` using `sphinx.builders.linkcheck` on the documentation and noticed a few links weren't using HTTPS so I quickly updated them all.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/40878

Differential Revision: D22404647

Pulled By: ngimel

fbshipit-source-id: 9c9756db59197304023fddc28f252314f6cf4af3
2020-07-06 20:05:21 -07:00
Ailing Zhang
d7cd16858f Add documentation about storage sharing is preserved and serialized f… (#40412)
Summary:
…ile size.
fixes https://github.com/pytorch/pytorch/issues/40157
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40412

Reviewed By: ezyang

Differential Revision: D22265639

Pulled By: ailzhang

fbshipit-source-id: 16b0301f16038bd784e7e92f63253fedc7820adc
2020-06-29 17:23:29 -07:00
Jeong Ukjae
b4db529352 Fix wrong link in docs/source/notes/ddp.rst (#40484)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/40484

Differential Revision: D22259834

Pulled By: mrshenli

fbshipit-source-id: 4ec912c600c81010bdb2778c35cbb0321480199f
2020-06-28 13:55:56 -07:00
Wanchao Liang
eebd492dcf [doc] fix autograd doc subsubsection display issue (#40582)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40582

There's a misuse of "~~~~" in the `requires_grad` section; "~~~~" is not an official section marker, so change it to "^^^^" to denote subsubsections. Also fix the other places where we should use the subsection marker "-----" instead of the subsubsection marker "^^^^".

see https://www.sphinx-doc.org/en/master/usage/restructuredtext/basics.html#sections

Before:
<img width="712" alt="rst_before" src="https://user-images.githubusercontent.com/9443650/85789835-2226fa80-b6e4-11ea-97b6-2b19fdf324a4.png">
After:
<img width="922" alt="rst_after" src="https://user-images.githubusercontent.com/9443650/85789856-281cdb80-b6e4-11ea-925f-cb3f4ebaa2bf.png">

Test Plan: Imported from OSS

Differential Revision: D22245747

Pulled By: wanchaol

fbshipit-source-id: 11548ed42f627706863bb74d4269827d1b3450d4
2020-06-25 23:28:33 -07:00
Michael Carilli
3b040c478a Make custom_fwd a no-op when not executed under autocast (#36171)
Summary:
Currently, a custom autograd function written with
```
@torch.cuda.amp.custom_fwd(cast_inputs=dtype)
def forward(ctx, *args):
    ...
```
casts incoming floating-point CUDA tensors to `dtype` unconditionally, regardless of whether the function executes in an autocast-enabled region.  I think I had the wrong idea there.  Autocast-disabled regions should give the user control of input types.  Also, `custom_fwd(cast_inputs=dtype)`-decorated functions' behavior should align with native fp32list/fp16list functions.  C++-side casting wrappers have no effect when autocast is disabled, and  `custom_fwd`'s casting should behave the same way.

The present PR changes `custom_fwd` so it only casts in autocast-enabled regions (also updates custom_fwd to ignore fp64 inputs, like the C++ wrappers).
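A fuller sketch of the decorated pattern this describes (a hypothetical Function; `custom_bwd` is the companion decorator in the same module):
```
import torch
from torch.cuda.amp import custom_fwd, custom_bwd

class FP32MM(torch.autograd.Function):
    @staticmethod
    @custom_fwd(cast_inputs=torch.float32)  # after this PR: casts only inside autocast regions
    def forward(ctx, a, b):
        ctx.save_for_backward(a, b)
        return a.mm(b)

    @staticmethod
    @custom_bwd  # runs backward with the same autocast state that forward saw
    def backward(ctx, grad):
        a, b = ctx.saved_tensors
        return grad.mm(b.t()), a.t().mm(grad)
```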
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36171

Differential Revision: D22179511

Pulled By: ngimel

fbshipit-source-id: 5a93d070179a43206066bce19da0a5a19ecaabbd
2020-06-23 10:23:02 -07:00
Rohan Varma
ae2f1f0372 [DDP Note] Remove refs to RoundRobin PG until we officially support it (#40380)
Summary:
Removes the line mentioning `ProcessGroupRoundRobin`, since we don't intend it to be used as a public API just yet. We can add this back when we officially support the API.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40380

Differential Revision: D22165556

Pulled By: rohan-varma

fbshipit-source-id: 24d0477d881dc74f2ff579de61dfd1ced2b09e75
2020-06-22 16:19:29 -07:00
anjali411
8ec2ae9a9f Add view_as_real, view_as_complex for complex tensors (#39099)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/39099

Test Plan: Imported from OSS

Differential Revision: D22057886

Pulled By: anjali411

fbshipit-source-id: bad5ba7097ba0dd13f2c549b2463094dee9afa14
2020-06-22 15:15:27 -07:00
James Reed
c73095e78f Add note to serialization docs about zipfile format (#40288)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/40288

Test Plan: Imported from OSS

Differential Revision: D22140324

Pulled By: jamesr66a

fbshipit-source-id: 01d7aa642ed2f4e4bdac4b7f3223bf4d7e62fd4d
2020-06-19 13:40:08 -07:00
Alban Desmaison
b88b7d552f Prevent custom Functions from creating non differentiable type that requires grad (#38326)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/38326

Test Plan: Imported from OSS

Differential Revision: D21668740

Pulled By: albanD

fbshipit-source-id: f452f65e76003492055311523a652937b1300183
2020-05-21 08:30:14 -07:00
Ilia Cherniavskii
43dd8760d7 Move ThreadLocalDebugInfo to c10 (#37774)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37774

Move ThreadLocalDebugInfo from ATen to C10

Test Plan: Imported from OSS

Differential Revision: D21384249

Pulled By: ilia-cher

fbshipit-source-id: f9b5089a868f84a2ee013695a481fcc883d3c6b2
2020-05-11 19:27:41 -07:00
毛毛
19d6e32e9a fix sample code (#38002)
Summary:
Makes the Linear layer sample code work correctly when bias is False.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38002

Differential Revision: D21509679

Pulled By: malfet

fbshipit-source-id: c7077992cf414ecc557b39e5ed1e39ef01c8b347
2020-05-11 15:34:09 -07:00
Ilia Cherniavskii
2d708cefcc Move RecordFunction into ATen (#37548)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37548

Moving RecordFunction from torch::autograd::profiler into at namespace

Test Plan:
CI

Imported from OSS

Differential Revision: D21315852

fbshipit-source-id: 4a4dbabf116c162f9aef0da8606590ec3f3847aa
2020-05-07 14:52:39 -07:00
Ilia Cherniavskii
c24c5f9684 Make RecordFunction callbacks thread local and modernize interface (#37491)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37491

This PR modernizes RecordFunction API and adds thread local callbacks
in addition to the global ones

Changes:
 - support for TLS callbacks; this is going to be the foundation of the profiler and other tools
 - modernize the interface around a simple set of functions (add|remove|has|clear)(Global|ThreadLocal)(Callback) and add RecordFunctionCallback to easily construct the callbacks to be passed
 - we also add `.setShouldRun` to the callback interface to support cases where simple uniform sampling is not enough
 - to properly support add/remove, introduce the idea of a callback handle returned by add
 - the internal implementation still uses SmallVector to store intermediate state (as before) - in this case these are vectors of handles of the callbacks that were picked to run
 - to speed up runtime we keep these vectors sorted; this way we can quickly enumerate the callbacks that need to be run
 - added tests for the new functionality

Test Plan:
BUILD_BINARY=1 USE_BLAS=MKL USE_MKLDNN=0 USE_CUDA=0 python setup.py
develop install
./build/bin/test_jit
CI

record_function_benchmark: https://gist.github.com/ilia-cher/f1e094dae47fe23e55e7672ac4dcda2f

Imported from OSS

Differential Revision: D21300448

fbshipit-source-id: 6d55c26dbf20b33d35c3f1604dcc07bb063c8c43
2020-05-07 14:51:02 -07:00
Edward Yang
4fef3763dd Revert "Revert D21337640: [pytorch][PR] Split up documentation into subpages and clean up some warnings" (#37778)
Summary:
Original PR: https://github.com/pytorch/pytorch/pull/37419

cc mattip suo
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37778

Differential Revision: D21385774

Pulled By: ezyang

fbshipit-source-id: 5de532faab8bae132736b6b5189e0ee2ac9935be
2020-05-04 14:32:35 -07:00
Michael Suo
20f7e62b1d Revert D21337640: [pytorch][PR] Split up documentation into subpages and clean up some warnings
Test Plan: revert-hammer

Differential Revision:
D21337640

Original commit changeset: d4ad198780c3

fbshipit-source-id: fa9ba6ac542173a50bdb45bfa12f3fec0ed704fb
2020-05-04 10:57:55 -07:00
mattip
f10fbcc820 Split up documentation into subpages and clean up some warnings (#37419)
Summary:
xref gh-32838, gh-34032

This is a major refactor of parts of the documentation to split it up using sphinx's `autosummary` feature, which will build out `autofunction` and `autoclass` stub files and link to them. The end result is that the top module pages like torch.nn.rst and torch.rst are now more like tables of contents for the actual single-class or single-function documentation pages.

Along the way, I modified many of the docstrings to eliminate sphinx warnings when building. I think the only thing I changed from a non-documentation perspective is to add names to `__all__` when adding them to `globals()` in `torch.__init__.py`

I do not know the CI system: are the documentation build artifacts available after the build, so reviewers can preview before merging?
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37419

Differential Revision: D21337640

Pulled By: ezyang

fbshipit-source-id: d4ad198780c3ae7a96a9f22651e00ff2d31a0c0f
2020-05-04 09:39:22 -07:00
Ilia Cherniavskii
d068a456d3 [resubmit] Enable global observers API (#37382)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37382

After adding c10::DispatchKey::Profiler, the behavior of RecordFunction
observers is also controlled by the dispatch key;
this PR moves the logic out of the profiler into the record function

Reviewed By: jamesr66a

Differential Revision: D21268320

fbshipit-source-id: 93207e3b55325d20dcc5b1e8f448ab86933321da
2020-04-28 10:49:31 -07:00
Michael Suo
20143e5f27 Revert D21245094: [resubmit] Enable global observers API
Test Plan: revert-hammer

Differential Revision:
D21245094

Original commit changeset: 595e41b18206

fbshipit-source-id: 90344b361857d76ce5db75438c949dad1f5f186b
2020-04-27 16:19:46 -07:00
Wanchao Liang
1039b95ff0 [autograd] add documentation about multithread autograd (#37020)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37020

Add multithread autograd documentation to the doc note.

Test Plan: Imported from OSS

Differential Revision: D21260996

Pulled By: wanchaol

fbshipit-source-id: 91d523560268ae62d4c6d773121b282ba837a561
2020-04-27 15:53:21 -07:00
Ilia Cherniavskii
5fab4c30dd [resubmit] Enable global observers API (#37292)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37292

After adding c10::DispatchKey::Profiler, the behavior of RecordFunction
observers is also controlled by the dispatch key;
this PR moves the logic out of the profiler into the record function

Reviewed By: jamesr66a

Differential Revision: D21245094

fbshipit-source-id: 595e41b18206d2ba4cf639cb320f630907868b3f
2020-04-27 14:24:51 -07:00
Ilia Cherniavskii
856e8cf028 Revert D21213786: Enable global observers API
Test Plan: revert-hammer

Differential Revision:
D21213786

Original commit changeset: e618254da74a

fbshipit-source-id: 425ea5d44fa55655ec0dd586c5075996b926177b
2020-04-25 00:59:24 -07:00
Ilia Cherniavskii
6e659e928b Enable global observers API (#37195)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37195

After adding c10::DispatchKey::Profiler, the behavior of RecordFunction
observers is also controlled by the dispatch key;
this PR moves the logic out of the profiler into the record function

Reviewed By: ngimel

Differential Revision: D21213786

fbshipit-source-id: e618254da74a4f1ce16c51a3869bbd75a4f561ad
2020-04-24 23:49:28 -07:00
Alban Desmaison
3799d1d74a Fix many doc issues (#37099)
Summary:
Fix https://github.com/pytorch/pytorch/issues/35643 https://github.com/pytorch/pytorch/issues/37063 https://github.com/pytorch/pytorch/issues/36307 https://github.com/pytorch/pytorch/issues/35861 https://github.com/pytorch/pytorch/issues/35299 https://github.com/pytorch/pytorch/issues/23108 https://github.com/pytorch/pytorch/issues/4661

Just a bunch of small updates to the docs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37099

Differential Revision: D21185713

Pulled By: albanD

fbshipit-source-id: 4ac06d6709dc0da6109a6ad3daae75667ee5863e
2020-04-23 10:01:03 -07:00
Michael Carilli
e6bc34f549 Amp gradient accumulation example (#36601)
Summary:
Several people have asked me about proper Amp usage with gradient accumulation.  In particular, it's [unclear to people](https://github.com/NVIDIA/apex/issues/439#issuecomment-610351482) that you should only call `scaler.unscale_()` (if desired) and `scaler.update()` in iterations where you actually plan to step.  This PR adds a minimal accumulation example.

I built the docs locally and it looks free from sphinx errors, at least.
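The minimal pattern the example boils down to (placeholder names; the key point is that `unscale_`/`step`/`update` happen only on stepping iterations):
```
scaler = torch.cuda.amp.GradScaler()

for i, (input, target) in enumerate(data):
    with torch.cuda.amp.autocast():
        loss = loss_fn(model(input), target) / accum_steps

    scaler.scale(loss).backward()  # accumulate scaled grads

    if (i + 1) % accum_steps == 0:
        scaler.unscale_(optimizer)  # optional, e.g. to clip unscaled grads
        scaler.step(optimizer)
        scaler.update()
        optimizer.zero_grad()
```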
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36601

Differential Revision: D21082295

Pulled By: ngimel

fbshipit-source-id: b2faa6c02b9f7e1972618a0f1d5360a03f0450ac
2020-04-17 09:56:36 -07:00
Jessica Lin
ac950bb9c8 Update docs for master to remove Python 2 references (#36336)
Summary:
Fix compile error from original PR in jit_language_references.rst: https://github.com/pytorch/pytorch/pull/36114

Full details in task: https://our.intern.facebook.com/intern/tasks/?t=64776265

With PyTorch 1.5+ we removed Python 2 support from PyTorch. All documentation under docs/ and on the pytorch.org website needs to remove Python 2 references.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36336

Differential Revision: D21057507

Pulled By: jlin27

fbshipit-source-id: 993a763f1ecb16dad859bc02a07625ddc023645d
2020-04-16 10:15:48 -07:00
Edward Yang
6016f694c0 Revert D20901746: [pytorch][PR] Update docs for master to remove Python 2 references
Test Plan: revert-hammer

Differential Revision:
D20901746

Original commit changeset: 07f8dc8e6fab

fbshipit-source-id: 13c55597f9f79b8473210cf35a5a0f1fb34bae39
2020-04-08 14:49:11 -07:00
Jessica Lin
43234be525 Update docs for master to remove Python 2 references (#36114)
Summary:
Full details in task: https://our.intern.facebook.com/intern/tasks/?t=64776265

With PyTorch 1.5+ we removed Python 2 support from PyTorch. All documentation under docs/ and on the pytorch.org website needs to remove Python 2 references.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36114

Differential Revision: D20901746

Pulled By: jlin27

fbshipit-source-id: 07f8dc8e6fab0b232e5048a63079cab0c433c85f
2020-04-07 16:13:18 -07:00
Rohan Varma
1f06db2579 Refactored rpc docs (#35109)
Summary:
Reorganize as per jlin27's comments. Screenshots added in comments.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35109

Differential Revision: D20788774

Pulled By: rohan-varma

fbshipit-source-id: 7d64be70ef76ed6ff303d05d39c338293c234766
2020-04-01 02:01:34 -07:00
Ilia Cherniavskii
bc6bd0bb1a Debug Information Guard
Summary: This diff fixes the issues with the current handling of debug information passed along during the execution of the model. (For example, it is possible that multiple calls to the debug guard may override each other.)

Test Plan: CI test/cpp/jit

Reviewed By: dzhulgakov

Differential Revision: D20602775

fbshipit-source-id: 4683957954028af81a1a0f1f12b243650230c9bb
2020-04-01 01:55:29 -07:00
Ilia Cherniavskii
800d5617c0 Recording of TorchScript functions (#34710)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34710

Extending RecordFunction API to support new recording scopes (such as TorchScript functions), as well as giving more flexibility to set sampling rate.

Test Plan: unit test (test_misc.cpp/testRecordFunction)

Reviewed By: gdankel, dzhulgakov

Differential Revision: D20158523

fbshipit-source-id: a9e0819d21cc06f4952d92d43246587c36137582
2020-03-31 00:33:23 -07:00
pinzhenx
bd604cb5b7 Upgrade MKL-DNN to DNNL v1.2 (#32422)
Summary:
## Motivation

This PR upgrades MKL-DNN from v0.20 to DNNL v1.2 and resolves https://github.com/pytorch/pytorch/issues/30300.

DNNL (Deep Neural Network Library) is the new brand of MKL-DNN, which improves performance, quality, and usability over the old version.

This PR focuses on the migration of all existing functionalities, including minor fixes, performance improvements and code clean-up. It serves as the cornerstone of our future efforts to accommodate new features like OpenCL support, BF16 training, INT8 inference, etc., and to let the PyTorch community derive more benefits from the Intel Architecture.

<br>

## What's included?

Even though DNNL has many breaking changes to the API, we managed to absorb most of them in ideep. This PR contains minimalist changes to the integration code in PyTorch. Below is a summary of the changes:

<br>

**General:**

1. Replace op-level allocator with global-registered allocator

```
// before
ideep::sum::compute<AllocForMKLDNN>(scales, {x, y}, z);

// after
ideep::sum::compute(scales, {x, y}, z);
```

The allocator is now being registered at `aten/src/ATen/native/mkldnn/IDeepRegistration.cpp`. Thereafter all tensors derived from the `cpu_engine` (by default) will use the c10 allocator.

```
RegisterEngineAllocator cpu_alloc(
  ideep::engine::cpu_engine(),
  [](size_t size) {
    return c10::GetAllocator(c10::DeviceType::CPU)->raw_allocate(size);
  },
  [](void* p) {
    c10::GetAllocator(c10::DeviceType::CPU)->raw_deallocate(p);
  }
);
```
------

2. Simplify group convolution

We had a scenario in convolution where the ideep tensor shape mismatched the aten tensor's: when `groups > 1`, DNNL expects weight tensors to be 5-d with an extra group dimension, e.g. `goihw` instead of `oihw` in the 2d conv case.

As shown below, a lot of extra checks came with this difference in shape before. Now we've completely hidden this difference in ideep, and all tensors align with PyTorch's definition. So we can safely remove these checks from both the aten and c2 integration code.

```
// aten/src/ATen/native/mkldnn/Conv.cpp

if (w.ndims() == x.ndims() + 1) {
  AT_ASSERTM(
      groups > 1,
      "Only group _mkldnn_conv2d weights could have been reordered to 5d");
  kernel_size[0] = w.get_dim(0) * w.get_dim(1);
  std::copy_n(
      w.get_dims().cbegin() + 2, x.ndims() - 1, kernel_size.begin() + 1);
} else {
  std::copy_n(w.get_dims().cbegin(), x.ndims(), kernel_size.begin());
}
```

------

3. Enable DNNL built-in cache

Previously, we stored DNNL jitted kernels along with intermediate buffers inside ideep using an LRU cache. Now we are switching to the newly added DNNL built-in cache, and **no longer** caching buffers in order to reduce memory footprint.

This change will be mainly reflected in lower memory usage in memory profiling results. On the code side, we removed a couple of lines of `op_key_` that depended on the ideep cache before.

------

4. Use 64-bit integer to denote dimensions

We changed the type of `ideep::dims` from `vector<int32_t>` to `vector<int64_t>`. This renders ideep dims no longer compatible with the 32-bit dims used by caffe2, so we use something like `{stride_.begin(), stride_.end()}` to cast the parameter `stride_` into an int64 vector.

<br>

**Misc changes in each commit:**

**Commit:** change build options

Some build options were slightly changed, mainly to avoid name collisions with other projects that include DNNL as a subproject. In addition, DNNL built-in cache is enabled by option `DNNL_ENABLE_PRIMITIVE_CACHE`.

Old | New
-- | --
WITH_EXAMPLE | MKLDNN_BUILD_EXAMPLES
WITH_TEST | MKLDNN_BUILD_TESTS
MKLDNN_THREADING | MKLDNN_CPU_RUNTIME
MKLDNN_USE_MKL | N/A (not use MKL anymore)

------

**Commit:** aten reintegration

- aten/src/ATen/native/mkldnn/BinaryOps.cpp

    Implement binary ops using new operation `binary` provided by DNNL

- aten/src/ATen/native/mkldnn/Conv.cpp

    Clean up group convolution checks
    Simplify conv backward integration

- aten/src/ATen/native/mkldnn/MKLDNNConversions.cpp

    Simplify prepacking convolution weights

- test/test_mkldnn.py

Fixed an issue in the conv2d unit test: it didn't check conv results between the mkldnn and aten implementations before. Instead, it compared mkldnn with mkldnn, as the default CPU path also goes into mkldnn. Now we use `torch.backends.mkldnn.flags` to fix this issue.

- torch/utils/mkldnn.py

Prepack the weight tensor in the module's `__init__` to achieve significantly better performance.

------

**Commit:** caffe2 reintegration

- caffe2/ideep/ideep_utils.h

    Clean up unused type definitions

- caffe2/ideep/operators/adam_op.cc & caffe2/ideep/operators/momentum_sgd_op.cc

   Unify tensor initialization with `ideep::tensor::init`. Obsolete `ideep::tensor::reinit`

- caffe2/ideep/operators/conv_op.cc & caffe2/ideep/operators/quantization/int8_conv_op.cc

    Clean up group convolution checks
    Revamp convolution API

- caffe2/ideep/operators/conv_transpose_op.cc

    Clean up group convolution checks
    Clean up deconv workaround code

------

**Commit:** custom allocator

- Register c10 allocator as mentioned above

<br><br>

## Performance

We tested inference on some common models based on user scenarios, and most performance numbers are either better than or on par with DNNL 0.20.

ratio: new / old | Latency (batch=1 4T) | Throughput (batch=64 56T)
-- | -- | --
pytorch resnet18 | 121.4% | 99.7%
pytorch resnet50 | 123.1% | 106.9%
pytorch resnext101_32x8d | 116.3% | 100.1%
pytorch resnext50_32x4d | 141.9% | 104.4%
pytorch mobilenet_v2 | 163.0% | 105.8%
caffe2 alexnet | 303.0% | 99.2%
caffe2 googlenet-v3 | 101.1% | 99.2%
caffe2 inception-v1 | 102.2% | 101.7%
caffe2 mobilenet-v1 | 356.1% | 253.7%
caffe2 resnet101 | 100.4% | 99.8%
caffe2 resnet152 | 99.8% | 99.8%
caffe2 shufflenet | 141.1% | 69.0% †
caffe2 squeezenet | 98.5% | 99.2%
caffe2 vgg16 | 136.8% | 100.6%
caffe2 googlenet-v3 int8 | 100.0% | 100.7%
caffe2 mobilenet-v1 int8 | 779.2% | 943.0%
caffe2 resnet50 int8 | 99.5% | 95.5%

_Configuration:
Platform: Skylake 8180
Latency Test: 4 threads, warmup 30, iteration 500, batch size 1
Throughput Test: 56 threads, warmup 30, iteration 200, batch size 64_

† Shufflenet is one of the few models that require temp buffers during inference. The performance degradation is an expected issue, since we no longer cache any buffers in ideep. As for the solution, we suggest users opt for a caching allocator like **jemalloc** as a drop-in replacement for the system allocator in such heavy workloads.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32422

Test Plan:
Perf results: https://our.intern.facebook.com/intern/fblearner/details/177790608?tab=Experiment%20Results

10% improvement for ResNext with avx512, neutral on avx2

More results: https://fb.quip.com/ob10AL0bCDXW#NNNACAUoHJP

Reviewed By: yinghai

Differential Revision: D20381325

Pulled By: dzhulgakov

fbshipit-source-id: 803b906fd89ed8b723c5fcab55039efe3e4bcb77
2020-03-26 22:07:59 -07:00
Michael Carilli
0f0271e255 [RELAND2] Eager autocasting, out-of-place ops only (with MSVC 2017 fix) (#35102)
Summary:
This is the second reland attempt for https://github.com/pytorch/pytorch/pull/32140.

The first reland attempt https://github.com/pytorch/pytorch/pull/35011 failed due a [small incompatible change](https://github.com/pytorch/pytorch/pull/35011#issuecomment-601754216) in recent master (`skipIfRocm` was removed from `test_data_parallel.py`).

The present PR restores skipIfRocm.

Description from first reland attempt https://github.com/pytorch/pytorch/pull/35011:

> https://github.com/pytorch/pytorch/pull/32140 was approved and merged, but [reverted](d0577e19f0) because it broke builds with versions of Visual Studio older than 15.8 that were not represented in public CI.  The build failures were caused by a [known VS bug](https://developercommunity.visualstudio.com/content/problem/27729/allow-function-with-internal-linkage-as-template-n.html), fixed in versions 15.8 and newer.
>
> The present PR reverts the revert (restoring https://github.com/pytorch/pytorch/pull/32140 's diffs) and adds a workaround to enable compilation with VS < 15.8.  The workaround isn't pretty, but it's guarded by macros such that it's only used when compiling with VS < 15.8.  All other builds compile with the same code/control flow as was merged in https://github.com/pytorch/pytorch/pull/32140.
>
> Original description of https://github.com/pytorch/pytorch/pull/32140:
> > Initial integration of eager autocasting, supporting out-of-place ops only for easier review.
> Relevant issue/RFC: https://github.com/pytorch/pytorch/issues/25081
>
> > In-place ops and ops with user-supplied out=... can certainly be supported as well (my initial WIP https://github.com/pytorch/pytorch/issues/29552 handled many) but require substantially more complex special casing in the autocasting backend and tests. Support for these ops (much of which has already been written) will be broken into later PRs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35102

Differential Revision: D20596918

Pulled By: ezyang

fbshipit-source-id: 60caa279bb0ce4a9bb0b28c1d585d42cf1cc7e50
2020-03-24 09:08:04 -07:00
Peter Bell
bd0ef784e0 FAQ: Add note about recovering from OOM (#35214)
Summary:
Closes https://github.com/pytorch/pytorch/issues/18853

This documents the workaround needed to solve the issues in https://github.com/pytorch/pytorch/issues/18853
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35214

Differential Revision: D20604877

Pulled By: ezyang

fbshipit-source-id: 71ed13cfa567d8e88fa9f18180a171cd174fb528
2020-03-23 20:22:46 -07:00
Xiang Gao
df8d6eeb19 Update docs about DP and DDP for CUDA (#35063)
Summary:
We should recommend DDP instead of DP. Hope we can also cherry-pick this for 1.5.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35063

Differential Revision: D20549621

Pulled By: ngimel

fbshipit-source-id: 86b1b2134664065cc6070ea4212895f993eaf543
2020-03-20 20:06:37 -07:00
Mike Ruberry
fe276d541e Revert D20541921: [pytorch][PR] [RELAND] Eager autocasting, out-of-place ops only (with MSVC 2017 fix)
Test Plan: revert-hammer

Differential Revision:
D20541921

Original commit changeset: abb5488dca86

fbshipit-source-id: d2c6038978f80e5429632f8b49107090a8a247f4
2020-03-19 22:39:12 -07:00
Michael Carilli
991b97277a [RELAND] Eager autocasting, out-of-place ops only (with MSVC 2017 fix) (#35011)
Summary:
https://github.com/pytorch/pytorch/pull/32140 was approved and merged, but [reverted](d0577e19f0) because it broke builds with versions of Visual Studio older than 15.8 that were not represented in public CI.  The build failures were caused by a [known VS bug](https://developercommunity.visualstudio.com/content/problem/27729/allow-function-with-internal-linkage-as-template-n.html), fixed in versions 15.8 and newer.

The present PR reverts the revert (restoring https://github.com/pytorch/pytorch/pull/32140 's diffs) and adds a workaround to enable compilation with VS < 15.8.  The workaround isn't pretty, but it's guarded by macros such that it's only used when compiling with VS < 15.8.  All other builds compile with the same code/control flow as was merged in https://github.com/pytorch/pytorch/pull/32140.

Original description of https://github.com/pytorch/pytorch/pull/32140:
> Initial integration of eager autocasting, supporting out-of-place ops only for easier review.
Relevant issue/RFC: https://github.com/pytorch/pytorch/issues/25081

> In-place ops and ops with user-supplied out=... can certainly be supported as well (my initial WIP https://github.com/pytorch/pytorch/issues/29552 handled many) but require substantially more complex special casing in the autocasting backend and tests. Support for these ops (much of which has already been written) will be broken into later PRs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35011

Differential Revision: D20541921

Pulled By: ezyang

fbshipit-source-id: abb5488dca8620b0daac4306ebf2bb47fc36e4f5
2020-03-19 20:18:18 -07:00
Edward Yang
d0577e19f0 Revert D20346700: [pytorch][PR] Eager autocasting, out-of-place ops only
Test Plan: revert-hammer

Differential Revision:
D20346700

Original commit changeset: 12d77b391731

fbshipit-source-id: 108d72bf24232f443c0be293ec932c0c478d6a60
2020-03-18 11:42:51 -07:00
Michael Carilli
aaa8f02156 Eager autocasting, out-of-place ops only (#32140)
Summary:
Initial integration of eager autocasting, supporting out-of-place ops only for easier review.
Relevant issue/RFC: https://github.com/pytorch/pytorch/issues/25081

In-place ops and ops with user-supplied `out=...` can certainly be supported as well (my initial WIP https://github.com/pytorch/pytorch/pull/29552 handled many) but require substantially more complex special casing in the autocasting backend and tests.  Support for these ops (much of which has already been written) will be broken into later PRs.
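For orientation, the basic usage this enables looks roughly like the following (placeholder names; a sketch of the eventual `torch.cuda.amp.autocast` API):
```
with torch.cuda.amp.autocast():
    output = model(input)           # eligible out-of-place ops run in float16
    loss = loss_fn(output, target)

loss.backward()  # backward runs outside the autocast context
```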
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32140

Differential Revision: D20346700

Pulled By: ezyang

fbshipit-source-id: 12d77b3917310186fbddf11c59b2794dc859131f
2020-03-18 10:28:21 -07:00
Shen Li
800bdcf000 Removing experimental tag in for RPC and adding experimental tag for RPC+TorchScript (#34887)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/34887

Test Plan: Imported from OSS

Differential Revision: D20491409

Pulled By: mrshenli

fbshipit-source-id: ce79c9706eb70a3a52a4032de4f0bd538b694332
2020-03-17 17:43:42 -07:00
Hameer Abbasi
6b701de130 Add types argument to __torch_function__ (#34303)
Summary:
This PR adds the `types` argument to `__torch_function__` as per RFC 0001: https://github.com/pytorch/rfcs/pull/3
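A sketch of how an implementation can use the new `types` argument to defer when it doesn't recognize every participating type (`MyArray` and `my_array_impl` are hypothetical):
```
import torch

class MyArray:
    def __torch_function__(self, func, types, args=(), kwargs=None):
        if kwargs is None:
            kwargs = {}
        # `types` enumerates the distinct __torch_function__-bearing types
        # among the arguments; bail out if any of them is unknown to us.
        if not all(issubclass(t, (torch.Tensor, MyArray)) for t in types):
            return NotImplemented
        return my_array_impl(func, args, kwargs)  # hypothetical dispatcher
```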
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34303

Differential Revision: D20474992

Pulled By: ezyang

fbshipit-source-id: cdd40b3b38f3bda4ece8812a629f5db87e919d01
2020-03-17 13:32:00 -07:00
Rohan Varma
fd35596585 [docs][1.5] Update distributed autograd note (#34657)
Summary:
- Update API calls `backward` and `optim.step` now that we require `context_id`
- Add notes to clarify purpose of distributed autograd context (this was a source of confusion in some feedback)
- Add note that details why optimizer requires context_id
- Clearly specify that we don't have SMART mode yet
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34657

Differential Revision: D20427667

Pulled By: rohan-varma

fbshipit-source-id: 5f8a3539ccf648a78e9e9a0dfdfe389c678b1606
2020-03-12 22:56:32 -07:00
Nathan Goldbaum
3f1ba3c465 Redo of "Add API for listing functions overridable by __torch_function__" (#34240)
Summary:
This is a redo of https://github.com/pytorch/pytorch/pull/33791, which was reverted because it introduced a flaky test. The test was flaky, and flaky only on Python 3.5, because of dict order randomization.

I've fixed the issue with tests clobbering each other in b539fec and removed the override tests for `torch.nn.functional.tanh` and `torch.nn.functional.sigmoid`, which are deprecated and shouldn't be overridable in e0d7402. I also verified that no more test clobbering is happening.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34240

Differential Revision: D20252442

Pulled By: cpuhrsch

fbshipit-source-id: 069568e342a41c90e1dc76cbf85ba4aed47f24be
2020-03-12 10:33:17 -07:00
Michael Suo
c235be42dd [jit] kill script namespace (#34515)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34515

Once upon a time we thought this was necessary. In reality it is not, so
removing it.

For backcompat, our public interface (defined in `api/`) still has
typedefs to the old `script::` names.

There was only one collision: `Pass` as a `Stmt` and `Pass` as a graph
transform. I renamed one of them.

Test Plan: Imported from OSS

Differential Revision: D20353503

Pulled By: suo

fbshipit-source-id: 48bb911ce75120a8c9e0c6fb65262ef775dfba93
2020-03-11 23:32:48 -07:00
Duncan Riach
516a587438 Enhance reproducibility documentation (#33795)
Summary:
Improves explanation of non-determinism when running on GPUs. Adds info about `torch.nn.BCELoss` operating non-deterministically on GPUs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33795

Differential Revision: D20284880

Pulled By: ngimel

fbshipit-source-id: d543959636d261a80c234150304344b19a37ba5d
2020-03-06 15:32:04 -08:00
Shen Li
ac6e75a165 Revert D20195053: [pytorch][PR] Add API for listing functions overridable by __torch_function__
Test Plan: revert-hammer

Differential Revision:
D20195053

Original commit changeset: 1585f4e405f5

fbshipit-source-id: 3c1aab9c60e3138d40d200ae4238bda0cddf8896
2020-03-04 10:13:54 -08:00
peter
5f4a01b2ea Update MAGMA to 2.5.2 for Windows (#34205)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/34205

Differential Revision: D20248224

Pulled By: soumith

fbshipit-source-id: f5e0fe06aa8f8ee551abe45db1d55d06e95ab928
2020-03-04 08:28:09 -08:00
Nathan Goldbaum
ad2825a2c9 Add API for listing functions overridable by __torch_function__ (#33791)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/33182

This adds private API functions that developers of types that implement `__torch_function__` can use to ensure full coverage of the subset of the PyTorch API that can be overrided.

I've refactored some of the code in the tests into a new `torch._overrides.get_overridable_functions` function. I've also changed `TENSOR_LIKE_TORCH_OVERRIDES` into `torch._overrides.get_testing_overrides` and `IGNORED_TORCH_FUNCTIONS` into `torch._overrides.get_ignored_functions`. Making these two static global variables in the tests into functions should allow rewriting their implementation to construct their return values instead of just statically defining the return value as is done here. Currently that is blocked on not being able to inspect function signatures of compiled kernels in PyTorch (see https://github.com/pytorch/pytorch/issues/28233). See the docs I've added for usage examples of these new functions. I also refactored the existing override tests to make use of these new functions, which should be a good forcing function to make sure they're kept up-to-date.
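Roughly, the new private entry points look like this (names per the description above; in later releases this surface moved to `torch.overrides`):
```
from torch import _overrides

overridable = _overrides.get_overridable_functions()  # namespace -> functions
dummies = _overrides.get_testing_overrides()          # function -> dummy override
ignored = _overrides.get_ignored_functions()          # functions that can't be overridden
```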

Finally, while working on this I discovered that `TestTorchFunctionOverrides.test_mean` and `TestTorchFunctionOverrides.test_mm` weren't ever being run because they were getting clobbered by the other dynamically generated override tests. I fixed that by renaming the tests and then fixing the actual test code. I've verified that all the subclassing semantics is correct and that the updated test answers are correct. I'm happy to put the fixes to the existing tests in as a separate pull request if that would be easier to review.

ping cpuhrsch since the feature request originally came from them.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33791

Differential Revision: D20195053

Pulled By: cpuhrsch

fbshipit-source-id: 1585f4e405f5223932b410eae03a288dc8eb627e
2020-03-03 12:40:34 -08:00
Omkar Salpekar
24dd800e6a [Dist Autograd] Functional API for Dist Autograd and Dist Optimizer (#33711)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33711

Fixed #33480

This makes `dist_autograd.backward` and `dist_optimizer.step` functional by making the user explicitly pass in the `context_id` as opposed to relying on the confusing thread_local context_id.

This diff incorporates these API changes and all places where these functions are called.

More concretely, this code:

```
with dist_autograd.context():
    # Forward pass.
    dist_autograd.backward([loss.sum()])
    dist_optim.step()
```

should now be written as follows:

```
with dist_autograd.context() as context_id:
    # Forward pass.
    dist_autograd.backward(context_id, [loss.sum()])
    dist_optim.step(context_id)
```

Test Plan: Ensuring all existing dist_autograd and dist_optimizer tests pass with the new API. Also added a new test case for input checking.

Differential Revision: D20011710

fbshipit-source-id: 216e12207934a2a79c7223332b97c558d89d4d65
2020-02-26 19:08:28 -08:00
Michael Carilli
fc6a153688 [WIP] Reanimate gradient scaling API with original scale update heuristic (#33366)
Summary:
Also, Windows memory failures responsible for the earlier reversion have been fixed.

This PR (initially) contains 2 commits:
* a revert of the revert
* all changes to implement the original Apex scale update heuristic, squashed into a single commit for easier diff review
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33366

Differential Revision: D20099026

Pulled By: ngimel

fbshipit-source-id: 339b9b6bd5134bf055057492cd1eedb7e4461529
2020-02-25 19:00:34 -08:00
peter
adbe289870 Update MKL to 2020.0.166 for Windows (#33690)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/33690

Differential Revision: D20089300

Pulled By: ezyang

fbshipit-source-id: 887c006fbdb2c837f0a1c607a196811f44f1fb35
2020-02-24 22:43:34 -08:00
Edward Yang
ae53f8dd25 Revert D19859905: [pytorch][PR] Gradient scaling API
Test Plan: revert-hammer

Differential Revision:
D19859905

Original commit changeset: bb8ae6966214

fbshipit-source-id: 28f1c93e8a00e3a4bbe8cc981499b15468f0b970
2020-02-14 11:03:27 -08:00
Michael Carilli
40246fa63c Gradient scaling API (#26512)
Summary:
This PR implements the gradient scaling API that mruberry, jjsjann123, ngimel, zdevito, gchanan and I have been discussing.  Relevant issue/RFC: https://github.com/pytorch/pytorch/issues/25081.

Volume-wise, this PR is mostly documentation and tests.  The Python API (found entirely in `torch/cuda/amp/amp_scaler.py`) is lightweight.  The exposed functions are intended to make the implementation and control flow of gradient scaling convenient, intuitive, and performant.
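The core loop those functions are built around (a sketch using the eventual public name `torch.cuda.amp.GradScaler` and placeholder names):
```
scaler = torch.cuda.amp.GradScaler()

for input, target in data:
    optimizer.zero_grad()
    loss = loss_fn(model(input), target)
    scaler.scale(loss).backward()  # backward on a scaled loss
    scaler.step(optimizer)         # unscales grads; skips step on inf/nan
    scaler.update()                # adjusts the scale for the next iteration
```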

The API is probably easiest to digest by looking at the documentation and examples. `docs/source/amp.rst` is the homepage for the Automatic Mixed Precision package.  `docs/source/notes/amp_examples.rst` includes several examples demonstrating common but not-immediately-obvious use cases.  Examples are backed by tests in `test_cuda.py` (and thankfully the tests pass :P).

Two small utility kernels have been added in `native/cuda/AmpKernels.cu` to improve performance and avoid host-device synchronizations wherever possible.

Existing optimizers, both in the wild and in Pytorch core, do not need to change to use the scaling API.

However, the API was also designed to establish a contract between user scripts and optimizers such that writers of _new_ custom optimizers have the control points they need to implement fast, optionally sync-free updates.  User scripts that obey the scaling API can drop such custom optimizers in and reap performance benefits without having to change anything aside from the optimizer constructor itself.  [I know what the contract with custom optimizers should be](35829f24ef/torch/cuda/amp/amp_scaler.py (L179-L184)), but I'm waiting for review on the rest of the API before I go about documenting it (it will be given a dedicated section in `docs/source/notes/amp_examples.rst`).

Currently, the gradient scaling examples do not include the auto-casting API as discussed in https://github.com/pytorch/pytorch/issues/25081.  The gradient scaling API is intended to be orthogonal/modular relative to autocasting.  Without auto-casting the gradient scaling API is fully use-_able_, but not terribly use-_ful_, so it's up to you guys whether you want to wait until auto-casting is ready before merging the scaling API as well.

### Todo
- [ ] How do I get c10 registered status for my two custom kernels?  They're very simple.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26512

Differential Revision: D19859905

Pulled By: mruberry

fbshipit-source-id: bb8ae6966214718dfee11345db824389e4286923
2020-02-13 11:06:06 -08:00
Ilia Cherniavskii
04829e924a Update CPU threading doc (#33083)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33083

Added more recommendations, some notes and warning

Test Plan: cd docs ; make html

Differential Revision: D19829133

Pulled By: ilia-cher

fbshipit-source-id: b9fbd89f5875b3ce35cc42ba75a3b44bb132c506
2020-02-11 14:13:51 -08:00
Brian Wignall
f326045b37 Fix typos, via a Levenshtein-type corrector (#31523)
Summary:
Should be non-semantic.

Uses https://en.wikipedia.org/wiki/Wikipedia:Lists_of_common_misspellings/For_machines to find likely typos, with https://github.com/bwignall/typochecker to help automate the checking.

Uses an updated version of the tool used in https://github.com/pytorch/pytorch/pull/30606 .
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31523

Differential Revision: D19216749

Pulled By: mrshenli

fbshipit-source-id: 7fd489cb9a77cd7e4950c1046f925d57524960ea
2020-01-17 16:03:19 -08:00
Shen Li
322f34b245 Adding DDP Design Note
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/32158

Test Plan: Imported from OSS

Differential Revision: D19405980

Pulled By: mrshenli

fbshipit-source-id: 808ef1c71b637546f8872375bf1828967b1a5a60
2020-01-15 14:10:45 -08:00
Vamshi Chowdary
05088da8e9 [pytorch][PR] Fixed error in sample code of documentation (#31682)
Summary:
"in_features" and "out_features" are not defined. Possibly a typo. They should be "input_features" and "output_features" instead
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31682

Differential Revision: D19251685

Pulled By: zou3519

fbshipit-source-id: ac9e524e792a1853a16e8876d76b908495d8f35e
2020-01-15 10:34:07 -08:00
Rohan Varma
a561a8448b minor doc tweak to use mp.spawn in example (#30381)
Summary:
Per pietern's comment in https://github.com/pytorch/pytorch/issues/30022, we can make this example launcher a bit simpler by using `torch.multiprocessing`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30381

Differential Revision: D19292080

Pulled By: rohan-varma

fbshipit-source-id: 018ace945601166ef3af05d8c3e69d900bd77c3b
2020-01-06 22:19:01 -08:00
Nathan Goldbaum
9d3402e4cb Add the __torch_function__ API override mechanism (#30730)
Summary:
This is a re-do of https://github.com/pytorch/pytorch/issues/27064, which was reverted (b8792c0438). This was landed at the same time as other work that added new operators to the `torch` namespace, so the check for whether the `torch` namespace is exhaustively checked for overridability was triggering test failures.

I've temporarily disabled that check and added an explanatory comment that the check will be re-enabled in a future PR that will be merged during a time when the commit velocity on PyTorch is lower.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30730

Differential Revision: D18813270

Pulled By: ezyang

fbshipit-source-id: 70477c4656dca8fea6e7bc59259555041fcfbf68
2019-12-04 13:19:07 -08:00
Edward Yang
b8792c0438 Revert D18645954: add __torch_function__ API override mechanism
Test Plan: revert-hammer

Differential Revision:
D18645954

Original commit changeset: 54b5e4344d7a

fbshipit-source-id: 4a7aebb483e6b001130d6f384ccc53c5a808ab13
2019-12-04 07:41:47 -08:00
Prasun Anand
d12786b24f add __torch_function__ API override mechanism (#27064)
Summary:
Closes https://github.com/pytorch/pytorch/issues/24015 (see description of that issue for more details).

For a toy example, see the `DiagonalTensor` and `SubDiagonalTensor` class in test/test_overrides.py.

This PR currently contains:

* tests for `__torch_function__` behavior
* modifications to `gen_python_functions` and `parse` to handle function signatures and dispatch to the correct overloaded argument.

This feature is inspired by and analogous to NumPy's `__array_function__` protocol ([see NumPy Enhancement Proposal 18](https://numpy.org/neps/nep-0018-array-function-protocol.html#trying-array-function-methods-until-the-right-one-works)).

### Benchmarks:
See Nathan's comment below: https://github.com/pytorch/pytorch/pull/27064#issuecomment-554601189
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27064

Differential Revision: D18645954

Pulled By: ezyang

fbshipit-source-id: 54b5e4344d7afdbcf996bb57191b0bdadc7b1767
2019-12-04 05:56:46 -08:00
peterjc123
6deb41c88d Update magma to 2.5.1 for Windows and switch CUDA in CI to 9.2
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30513

Differential Revision: D18764184

Pulled By: ezyang

fbshipit-source-id: 4992869fd6a89471a5d25eb6a9b44ad8eceb480f
2019-12-02 11:56:10 -08:00
Rohan Varma
1350b99de4 Add local shutdown to process group agent (#30330)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30330

This is now possible due to previous changes made in `gloo` and `ProcessGroupGloo`. We `abort` the listener thread that is waiting for a message, and join all other threads. The API is changed so that the previous `wait_all_workers` does not destroy the agent, and this is now done in a new `shutdown` method. All callsites are updated appropriately.

ghstack-source-id: 94673884
ghstack-source-id: 94673884

Test Plan: Unit tests pass.

Reviewed By: mrshenli

Differential Revision: D18661775

fbshipit-source-id: 5aaa7c14603e18253394224994f6cd43234301c2
2019-11-27 22:34:08 -08:00
Rohan Varma
5c6705e62c add default arg for init_method (#30208)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30208

Adds a default arg for init_method so users don't have to pass it in,
and moves it to the `RpcBackendOptions` struct. Removes the `init_method` arg from rpc.init_rpc. Also fixes some docs.
ghstack-source-id: 94500475

Test Plan: Unit tests pass.

Reviewed By: mrshenli

Differential Revision: D18630074

fbshipit-source-id: 04b7dd7ec96f4c4da311b71d250233f1f262135a
2019-11-25 14:52:48 -08:00
Shen Li
a9f3f48f88 Revert D5578006: Add local shutdown to process group agent
Test Plan: revert-hammer

Differential Revision:
D5578006

Original commit changeset: 6258879fb44c

fbshipit-source-id: 11b893b3a280a8383eeb20a0548626811616dca1
2019-11-22 11:31:04 -08:00
Rohan Varma
c478a92b93 Add local shutdown to process group agent (#30020)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30020
This is now possible due to previous changes made in `gloo` and `ProcessGroupGloo`. We `abort` the listener thread that is waiting for a message, and join all other threads. The destructor calls this same `localShutdown` method, but we ensure this is not called multiple times.

ghstack-source-id: 94415336

Test Plan: Unit tests pass.

Differential Revision: D5578006

fbshipit-source-id: 6258879fb44c9fca97fdfad64468c1488c16ac02
2019-11-22 10:03:00 -08:00
Shen Li
063e22b7c2 Fix RRef design doc warning (#30240)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30240

Get rid of the following warning when building the docs:

```
/Users/shenli/Project/pytorch/docs/source/notes/rref.rst:184: WARNING: Error in "code" directive:
maximum 1 argument(s) allowed, 6 supplied.

.. code::
  import torch
  import torch.distributed.rpc as rpc

  # on worker A
  rref = rpc.remote('B', torch.add, args=(torch.ones(2), 1))
  # say the rref has RRefId 100 and ForkId 1
  rref.to_here()
```

Test Plan: Imported from OSS

Differential Revision: D18640016

Pulled By: mrshenli

fbshipit-source-id: d527827f01183411d4b4c73e0a976bdd7fccbf49
2019-11-21 16:22:39 -08:00
Shen Li
e0325011e4 Add link to RRef protocol in RPC doc
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30218

Test Plan: Imported from OSS

Differential Revision: D18638881

Pulled By: mrshenli

fbshipit-source-id: ca6fae6f8cea8cdcc33d275dd71a347fbb5dd45c
2019-11-21 16:22:35 -08:00
Alban Desmaison
a78e7eadbd Fix typo in extending doc
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30159

Differential Revision: D18619060

Pulled By: albanD

fbshipit-source-id: 1109c8da6242dffd6315b0c9de0f8ca34df0b276
2019-11-21 08:12:32 -08:00
Shen Li
73cf4d468f Design doc for Remote Reference (#30066)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30066

This commit adds design reasoning and walks through four scenarios
for RRef.

Test Plan: Imported from OSS

Reviewed By: rohan-varma

Differential Revision: D18595094

Pulled By: mrshenli

fbshipit-source-id: 134102901ce515a44a2e7cd013b62143a6158120
2019-11-20 12:42:28 -08:00
Rohan Varma
f304bd5062 rename join_rpc to wait_all_workers in public api (#30050)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30050

Renames this API to wait_all_workers as discussed.
ghstack-source-id: 94273005

Test Plan: Unit tests pass

Differential Revision: D18581466

fbshipit-source-id: 4ff5d5fb2d528f17252d5b5f30c3047d2efb92bf
2019-11-20 12:38:35 -08:00
Pritam Damania
88ef402cb5 Add distributed optimizer section to distributed autograd design doc. (#30068)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30068

ghstack-source-id: 94228719

Test Plan: waitforbuildbot

Differential Revision: D18556536

fbshipit-source-id: decd6927bfdd1ee3c81fef7430aa7095d7f38d33
2019-11-19 22:43:03 -08:00
Pritam Damania
ab93b3df60 Polish distributed autograd docs. (#29942)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29942

1) Added links to the design.
2) Fixed function signatures.
3) Expanded examples.
ghstack-source-id: 94162372

Test Plan: waitforbuildbot

Differential Revision: D18547103

fbshipit-source-id: 067ba166c107ed14085af8ee3306d3f8a9dcebe7
2019-11-18 18:13:08 -08:00
Pritam Damania
eb29276623 Update distributed autograd design doc with appropriate links. (#29927)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29927

With the docs page now up, we can update the links in the design doc
to point to the docs page.
ghstack-source-id: 94055423

Test Plan: waitforbuildbot

Differential Revision: D18541878

fbshipit-source-id: f44702d9a8296ccc0a5d58d56c3b6dc8a822b520
2019-11-15 21:10:53 -08:00
Pritam Damania
c3b2c2e353 Design doc for distributed autograd. (#29175)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29175

Updates our docs to include a design doc for distributed autograd.
Currently, this doc only covers the FAST mode algorithm. The Smart mode
algorithm section just refers to the original RFC.

There is a section for Distributed Optimizer that we can complete once we've
finalized the API for the same.
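
For orientation, a condensed sketch of the FAST-mode flow the doc describes, using the API as it eventually stabilized (names such as `dist_autograd.context` were still in flux at the time of this commit):

```py
import torch
import torch.distributed.autograd as dist_autograd
import torch.distributed.rpc as rpc

# On the driver worker of an already-initialized RPC world:
t = torch.rand(3, 3, requires_grad=True)
with dist_autograd.context() as context_id:
    # The forward pass may hop across workers via RPC.
    loss = rpc.rpc_sync("worker1", torch.add, args=(t, t)).sum()
    # Distributed backward traverses the autograd graph across workers;
    # gradients accumulate per-context rather than in .grad.
    dist_autograd.backward(context_id, [loss])
    grads = dist_autograd.get_gradients(context_id)
    # A DistributedOptimizer would call .step(context_id) here.
```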
ghstack-source-id: 93701129

Test Plan: look at docs.

Differential Revision: D18318949

fbshipit-source-id: 670ea1b6bb84692f07facee26946bbc6ce8c650c
2019-11-12 15:04:23 -08:00
Alban Desmaison
f5edb62a7f Clean extending autograd doc for output size 1 (#28860)
Summary:
Fix https://github.com/pytorch/pytorch/issues/28583
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28860

Differential Revision: D18224497

Pulled By: albanD

fbshipit-source-id: 0fa4eacce6f6092d555e509dc23bd75206f78d41
2019-10-30 13:57:10 -07:00
Prasun Anand
4230132baf Added docs for context method mixins. Fixes issue #27365 (#28643)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/27365 .

This PR:
1. Makes Context method docs available.
2. Links [Extending torch autograd](https://pytorch.org/docs/stable/notes/extending.html#extending-torch-autograd) notes to Context method docs.
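
For reference, a minimal custom `Function` exercising the now-documented `ctx` methods (`save_for_backward` / `saved_tensors`); the `Scale` example itself is illustrative, not from the PR:

```py
import torch
from torch.autograd import Function

class Scale(Function):
    @staticmethod
    def forward(ctx, x, factor):
        ctx.save_for_backward(x)   # ctx method covered by the new docs
        ctx.factor = factor        # non-tensor state can hang off ctx directly
        return x * factor

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        # One gradient per forward input; the non-tensor `factor` gets None.
        return grad_output * ctx.factor, None

x = torch.ones(2, requires_grad=True)
Scale.apply(x, 3.0).sum().backward()
print(x.grad)  # tensor([3., 3.])
```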
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28643

Differential Revision: D18170089

Pulled By: albanD

fbshipit-source-id: a1119ea8e2f8a71f0d1aadf416f2f98343aa9b7b
2019-10-28 08:31:35 -07:00
zou3519
e5d6b75319 Bag of documentation fixes; fix more sphinx warnings (#27850)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27850

Many of these are real problems in the documentation (i.e., link or
bullet point doesn't display correctly).

Test Plan: - built and viewed the documentation for each change locally.

Differential Revision: D17908123

Pulled By: zou3519

fbshipit-source-id: 65c92a352c89b90fb6b508c388b0874233a3817a
2019-10-15 07:31:14 -07:00
Jerry Ma
1610ea8ef8 Comprehensive-ish instrumentation for CUDA memory allocator (#27361)
Summary:
Adds comprehensive memory instrumentation to the CUDA caching memory allocator.

# Counters

Added comprehensive instrumentation for the following stats:
  - Allocation requests (`allocation`)
  - Allocated memory (`allocated_bytes`)
  - Reserved segments from cudaMalloc (`segment`)
  - Reserved memory (`reserved_bytes`)
  - Active memory blocks (`active`)
  - Active memory (`active_bytes`)
  - Inactive, non-releasable blocks (`inactive_split`)
  - Inactive, non-releasable memory (`inactive_split_bytes`)
  - Number of failed cudaMalloc calls that result in a cache flush and retry (`cuda_malloc_retries`)
  - Number of OOMs (`num_ooms`)

Except for the last two, these stats are segmented between all memory, large blocks, and small blocks. Along with the current value of each stat, historical counts of allocs/frees as well as peak usage are tracked by the allocator.

# Snapshots

Added the capability to get a "memory snapshot" – that is, to generate a complete dump of the allocator block/segment state.

# Implementation: major changes

- Added `torch.cuda.memory_stats()` (and associated C++ changes) which returns all instrumented stats as a dictionary.
- Added `torch.cuda.snapshot()` (and associated C++ changes) which returns a complete dump of the allocator block/segment state as a list of segments.
- Added memory summary generator in `torch.cuda.memory_summary()` for ease of client access to the instrumentation stats. Potentially useful to dump when catching OOMs. Sample output here: https://pastebin.com/uKZjtupq
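
A quick sketch of reading these stats from Python (key names follow the `<stat>.<pool>.<field>` scheme of the landed API; `memory_snapshot()` is the public name the snapshot call shipped under):

```py
import torch

x = torch.empty(1024, 1024, device="cuda")

stats = torch.cuda.memory_stats()
print(stats["allocated_bytes.all.current"])  # bytes currently allocated
print(stats["allocated_bytes.all.peak"])     # high-water mark since last reset

segments = torch.cuda.memory_snapshot()      # full block/segment dump
print(torch.cuda.memory_summary())           # human-readable table; handy on OOM
```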

# Implementation: minor changes

- Add error-checking helper functions for Python dicts and lists in `torch/csrc/utils/`.
- Existing memory management functions in `torch.cuda` moved from `__init__.py` to `memory.py` and star-imported to the main CUDA module.
- Add various helper functions to `torch.cuda` to return individual items from `torch.cuda.memory_stats()`.
- `torch.cuda.reset_max_memory_cached()` and `torch.cuda.reset_max_memory_allocated()` are deprecated in favor of `reset_peak_stats`. It's a bit difficult to think of a case where only one of those stats should be reset, and IMO this makes the peak stats collectively more consistent.
- `torch.cuda.memory_cached()` and `torch.cuda.max_memory_cached()` are deprecated in favor of `*memory_reserved()`.
- Style (add access modifiers in the allocator class, random nit fixes, etc.)

# Testing

- Added consistency check for stats in `test_cuda.py`. This verifies that the data from `memory_stats()` is faithful to the data from `snapshot()`.
- Ran on various basic workflows (toy example, CIFAR)

# Performance

Running the following speed benchmark: https://pastebin.com/UNndQg50

- Before this PR: 45.98 microseconds per tensor creation
- After this PR: 46.65 microseconds per tensor creation
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27361

Differential Revision: D17758747

Pulled By: jma127

fbshipit-source-id: 5a84e82d696c40c505646b9a1b4e0c3bba38aeb6
2019-10-08 15:42:48 -07:00
Tongzhou Wang
98d3d1659e Document benchmarking practice for CUDA
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23910

Differential Revision: D16732365

Pulled By: ezyang

fbshipit-source-id: 24e055602d479293da3e00a7143bba8f92bb7c4a
2019-08-13 15:07:23 -07:00
Ilia Cherniavskii
936632b120 Thread local debug info
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22365

Test Plan:
USE_CUDA=0 python setup.py develop
./build/bin/test_jit

Imported from OSS

Reviewed By: ajyu

Differential Revision: D16065129

Pulled By: ilia-cher

fbshipit-source-id: f300985459a83c2c1049ed8c4fefd23a3144047a
2019-08-12 14:53:57 -07:00
Ilia Cherniavskii
41dfe7204b Threading and CPU Inference note
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23417

Test Plan:
cd docs; make html

Imported from OSS

Differential Revision: D16523781

Pulled By: ilia-cher

fbshipit-source-id: d6c09e8a85d39e6185bbdc4b312fea44fcdfff06
2019-07-29 15:45:49 -07:00
Dmytro Dzhulgakov
d6dcec37b6 Add docs about prod ecosystem features (#23010)
Summary:
Covers fleet-wide profiling, API logging, etc.

It's my first time writing rst, so suggestions are definitely welcome.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23010

Differential Revision: D16456721

Pulled By: dzhulgakov

fbshipit-source-id: 3d3018f41499d04db0dca865bb3a9652d8cdf90a
2019-07-24 14:15:33 -07:00
Tongzhou Wang
058beae411 Add IterableDataset (#19228)
Summary:
This is a modified version of https://github.com/pytorch/pytorch/pull/14705, since the commit structure for that PR is quite messy.

1. Add `IterableDataset`.
2. So we now have two data-loading modes: `Iterable` and `Map`.

    1. `Iterable` if the `dataset` is an instance of `IterableDataset`
    2. `Map` otherwise.

3. Add better support for non-batch loading (i.e., `batch_size=None` and `batch_sampler=None`). This is useful for things like bulk loading.
4. Refactor `DataLoaderIter` into two classes, `_SingleProcessDataLoaderIter` and `_MultiProcessingDataLoaderIter`. Rename some methods to be more generic, e.g., `get_batch` -> `get_data`.
5. Add `torch.utils.data.get_worker_info`, which returns worker information in a worker process (e.g., worker id, dataset object copy, etc.) and can be used in `IterableDataset.__iter__` and `worker_init_fn` to do per-worker configuration (see the sketch after this list).
6. Add `ChainDataset`, the analog of `ConcatDataset` for `IterableDataset`.
7. Import torch.utils.data in `torch/__init__.py`.
8. Add data loader examples and documentation.
9. Use `get_worker_info` to detect whether we are in a worker process in `default_collate`.
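
A minimal runnable sketch of an `IterableDataset` that uses `get_worker_info` for per-worker sharding (the `RangeStream` class is illustrative, not from the PR):

```py
import torch
from torch.utils.data import DataLoader, IterableDataset, get_worker_info

class RangeStream(IterableDataset):
    """Streams integers in [start, end); workers take disjoint shards."""
    def __init__(self, start, end):
        self.start, self.end = start, end

    def __iter__(self):
        info = get_worker_info()
        if info is None:                      # single-process loading
            lo, hi = self.start, self.end
        else:                                 # split the range across workers
            per = (self.end - self.start) // info.num_workers
            lo = self.start + info.id * per
            hi = self.end if info.id == info.num_workers - 1 else lo + per
        yield from range(lo, hi)

for batch in DataLoader(RangeStream(0, 8), num_workers=2, batch_size=4):
    print(batch)
```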

Closes https://github.com/pytorch/pytorch/issues/17909, https://github.com/pytorch/pytorch/issues/18096, https://github.com/pytorch/pytorch/issues/19946, and some of https://github.com/pytorch/pytorch/issues/13023
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19228

Reviewed By: bddppq

Differential Revision: D15058152

fbshipit-source-id: 9e081a901a071d7e4502b88054a34b450ab5ddde
2019-06-20 20:12:44 -07:00
Bram Vanroy
38d68ad803 Update randomness.rst (#21337)
Summary:
Following [this question on the forums](https://discuss.pytorch.org/t/reproducibility-and-performance/46504), I propose the following doc change. It clarifies that 'performance reduction' concerns the processing speed (and not the training accuracy).

Related website commit: https://github.com/pytorch/pytorch.github.io/pull/211
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21337

Differential Revision: D15622151

Pulled By: soumith

fbshipit-source-id: f0edeb20049f2ee715c400e7c57abb966864d621
2019-06-04 07:38:00 -07:00
Tongzhou Wang
bb89827e1d Update cuda pinned memory note to include tensor.to (#20977)
Summary:
Splits out separate bits of changes from #19228.
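
The pattern the updated note covers, sketched (device string assumed):

```py
import torch

x = torch.randn(64, 128).pin_memory()   # page-locked host memory
# tensor.to, like .cuda(), can overlap the copy when the source is pinned:
y = x.to("cuda", non_blocking=True)
```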
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20977

Differential Revision: D15511919

Pulled By: soumith

fbshipit-source-id: 5015a29cdac6d6e160388c493182c330f0da63ec
2019-05-26 22:22:06 -07:00
Tongzhou Wang
83fe92870d Update multiprocessing note now that shared CUDA tensors are refcounted (#19904)
Summary:
The multiprocessing notes were not updated after https://github.com/pytorch/pytorch/pull/16854 (the torch.multiprocessing page was).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19904

Differential Revision: D15509661

Pulled By: soumith

fbshipit-source-id: 7c11e14a6c804498dda3adbf19710e63e6a564a0
2019-05-25 17:40:42 -07:00
Sergii Dymchenko
a5c90aaf47 Use "length of the RNN input" instead of "length of the RNN"
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20873

Differential Revision: D15495570

Pulled By: ezyang

fbshipit-source-id: e3b4cd67ccf97d0053ac053c3bcb74415b928c0a
2019-05-24 09:03:50 -07:00
peter
39b885cbbf Add magma for CUDA 10.1 to Windows docs
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/19914

Differential Revision: D15123029

Pulled By: soumith

fbshipit-source-id: a3d4b498a763e1a9829896d44211be00400ec39d
2019-04-29 10:13:21 -07:00
Tongzhou Wang
6d307db5b4 Move cuFFT plan cache note outside Best Practices (#19538)
Summary:
I mistakenly put it there.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19538

Differential Revision: D15026500

Pulled By: soumith

fbshipit-source-id: 0c13499571fdfd789c3bd1c4b58abd870725d422
2019-04-20 21:39:59 -07:00
Tongzhou Wang
973d51079b Add device-specific cuFFT plan caches (#19300)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/19224
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19300

Differential Revision: D14986967

Pulled By: soumith

fbshipit-source-id: 8c31237db50d6924bba1472434c10326610d9255
2019-04-18 06:39:35 -07:00
Edward Yang
48a35135fb Convert all tabs to spaces, add CI. (#18959)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18959
ghimport-source-id: a934163fa34cb2019732d5f49dc7290c376bf156

Differential Revision: D14831246

Pulled By: ezyang

fbshipit-source-id: beb92dc4ee8c82f4c8259c081dd72e477fe7a9d0
2019-04-09 08:12:26 -07:00