Commit Graph

1393 Commits

Author SHA1 Message Date
Aliaksandr Ivanou
4e181dfc35 [torch] Various improvements to torch.distributed.launch and torch.distributed.run (#60925)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60925

* Make `torch.distributed.launch` restarts to 0
* Remove unnecessary `-use_env` warning, move `-use_env` warnings
* Move `-use_env` warnings to `torch.distributed.launch`
* Make default log level WARNING
* Add new doc section around transitioning to `torch.distributed.run`
* Make `torch.distributed.launch` not use error-propagation
* Set default events handler to `null` that does not print events to console
* Add reference from `torch.distributed.launch` to `torch.distributed.run`
* Set correct preexec function that sends SIGTERM to child processes when parent dies

Issues resolved:

https://github.com/pytorch/pytorch/issues/60716
https://github.com/pytorch/pytorch/issues/60754

Test Plan:
sandcastle

    python -m torch.distributed.launch --nproc_per_node 2 main.py -> uses 0 restarts
    python -m torch.distributed.run --nproc_per_node 2 main.py -> uses default for torchelastic, 0 restarts

    python -m torch.distributed.launch --nproc_per_node=4  --use_env --no_python  main.py -> produces error
    python -m torch.distributed.launch --nproc_per_node=4  --use_env main.py -> no warning
    python -m torch.distributed.launch --nproc_per_node=4  --no_python  main.py ->warning

Output of running torch.distributed.launch without --use_env:

    $path/torch/distributed/launch.py:173: FutureWarning: The module torch.distributed.launch is deprecated
    and will be removed in future. Use torch.distributed.run.
    Note that --use_env is set by default in torch.distributed.run.
    If your script expects `--local_rank` argument to be set, please
    change it to read from `os.environ('LOCAL_RANK')` instead.

New section:

{F628923078}

{F628974089}

Reviewed By: kiukchung, cbalioglu

Differential Revision: D29413019

fbshipit-source-id: 323bfbad9d0e4aba3b10ddd7a243ca6e48169630
2021-06-30 23:31:02 -07:00
Heitor Schueroff
f32f85e6da Implemented torch.corrcoef (#60420)
Summary:
Implements `torch.corrcoef` similar to [`np.corrcoef`](https://numpy.org/doc/stable/reference/generated/numpy.corrcoef.html) using `torch.cov` implemented in https://github.com/pytorch/pytorch/pull/58311.

closes https://github.com/pytorch/pytorch/issues/1254

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60420

Reviewed By: mruberry

Differential Revision: D29474687

Pulled By: heitorschueroff

fbshipit-source-id: f3c7c5610363aebd88274a51fc77e3cf879cb611
2021-06-30 12:36:02 -07:00
Heitor Schueroff
ec9c03c234 Implemented torch.cov (#58311)
Summary:
Based from https://github.com/pytorch/pytorch/pull/50466

Adds the initial implementation of `torch.cov` similar to `numpy.cov`. For simplicity, we removed support for many parameters in `numpy.cov` that are either redundant such as `bias`, or have simple workarounds such as `y` and `rowvar`.

cc PandaBoi

closes https://github.com/pytorch/pytorch/issues/19037

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58311

Reviewed By: jbschlosser

Differential Revision: D29431651

Pulled By: heitorschueroff

fbshipit-source-id: 167dea880f534934b145ba94291a9d634c25b01b
2021-06-29 14:02:39 -07:00
Jeff Yang
a8057e7ef1 docs: add permute in torch docs (#60821)
Summary:
fix https://github.com/pytorch/pytorch/issues/60181

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60821

Reviewed By: VitalyFedyunin

Differential Revision: D29431949

Pulled By: jbschlosser

fbshipit-source-id: 2353afceaa188315cde1f0c955897c4750809c8e
2021-06-28 11:20:35 -07:00
Michael Carilli
2fa6c7627e [CUDA graphs][BC-breaking] Removes post-backward syncs on default stream (#60421)
Summary:
Before https://github.com/pytorch/pytorch/pull/57833, calls to backward() or grad() synced only the calling thread's default stream with autograd leaf streams at the end of backward. This made the following weird pattern safe:
```python
with torch.cuda.stream(s):
    # imagine forward used many streams, so backward leaf nodes may run on many streams
    loss.backward()
# no sync
use grads
```

but a more benign-looking pattern was unsafe:
```python
with torch.cuda.stream(s):
    # imagine forward used a lot of streams, so backward leaf nodes may run on many streams
    loss.backward()
    # backward() syncs the default stream with all the leaf streams, but does not sync s with anything,
    # so counterintuitively (even though we're in the same stream context as backward()!)
    # it is NOT SAFE to use grads here, and there's no easy way to make it safe,
    # unless you manually sync on all the streams you used in forward,
    # or move "use grads" back to default stream outside the context.
    use grads
```
mruberry ngimel and I decided backward() should have the [same user-facing stream semantics as any cuda op](https://pytorch.org/docs/master/notes/cuda.html#stream-semantics-of-backward-passes).** In other words, the weird pattern should be unsafe, and the benign-looking pattern should be safe. Implementationwise, this meant backward() should sync its calling thread's current stream, not default stream, with the leaf streams.

After https://github.com/pytorch/pytorch/pull/57833, backward syncs the calling thread's current stream AND default stream with all leaf streams at the end of backward. The default stream syncs were retained for temporary backward compatibility.

This PR finishes https://github.com/pytorch/pytorch/pull/57833's work by deleting syncs on the default stream.

With this PR, graph-capturing an entire backward() call should be possible (see the [test_graph_grad_scaling diffs](https://github.com/pytorch/pytorch/compare/master...mcarilli:streaming_backwards_remove_default_syncs?expand=1#diff-893b1eea27352f336f4cd832919e48d721e4e90186e63400b8596db6b82e7450R3641-R3642)).

** first paragraph has a formatting error which this PR should also fix.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60421

Reviewed By: albanD

Differential Revision: D29370344

Pulled By: ngimel

fbshipit-source-id: 3248bc5fb92fc517db0c15c897e5d7250f67d7fe
2021-06-24 17:34:02 -07:00
sawradip
eddc5f40f9 Added GLU and FeatureAlphaDropout to nn docs (#60590)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/60563 and https://github.com/pytorch/pytorch/issues/60570

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60590

Reviewed By: albanD

Differential Revision: D29352372

Pulled By: jbschlosser

fbshipit-source-id: f81dd65deab1848a68dc202df252c416ce5214d0
2021-06-24 08:00:18 -07:00
Luca Wehrstedt
bb9e1150ea Revert D29342234: [pytorch][PR] [CUDA graphs][BC-breaking] Removes post-backward syncs on default stream
Test Plan: revert-hammer

Differential Revision:
D29342234 (675cea1adb)

Original commit changeset: 98e6be7fdd85

fbshipit-source-id: 84022973248b2254210eee57402df2c4f4bc43c6
2021-06-24 04:49:28 -07:00
kshitij12345
dfd2edc025 [special] add zeta (#59623)
Summary:
Reference https://github.com/pytorch/pytorch/issues/50345

`zeta` was already present in the codebase to support computation of `polygamma`.

However, `zeta` only had `double(double, double)` signature **for CPU** before the PR (which meant that computation `polygamma` were always upcasted to `double` for zeta part).

With this PR, float computations will take place in float and double in double.

Have also refactored the code and moved the duplicate code from `Math.cuh` to `Math.h`

**Note**: For scipy, q is optional, and if it is `None`, it defaults `1` which corresponds to Reimann-Zeta. However, for `torch.specia.zeta`, I made it mandatory cause for me it feels odd without `q` this is Reimann-Zeta and with `q` it is the general Hurwitz Zeta. I think sticking to just general made more sense as passing `1` for q sounds trivial.

Verify:
* [x] Docs https://14234587-65600975-gh.circle-artifacts.com/0/docs/special.html#torch.special.zeta

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59623

Reviewed By: ngimel

Differential Revision: D29348269

Pulled By: mruberry

fbshipit-source-id: a3f9ebe1f7724dbe66de2b391afb9da1cfc3e4bb
2021-06-24 00:00:12 -07:00
Akifumi Imanishi
26cdec6ce4 Support torch.bitwise_{left/right}_shift and __rlshift__, __rrshift__ (#59544)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/58121

This PR implements `torch.bitwise_left_shift` and `torch.bitwise_right_shift` and `torch.Tensor.{__rlshift__/__rrshift__}`for compatibility with Python array API standard.
(cc: mruberry, rgommers, emcastillo, kmaehashi)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59544

Reviewed By: ngimel

Differential Revision: D29348869

Pulled By: mruberry

fbshipit-source-id: 329aee296cf890735e8a9f858bccfe87c03d06ca
2021-06-23 23:57:16 -07:00
Michael Carilli
675cea1adb [CUDA graphs][BC-breaking] Removes post-backward syncs on default stream (#60421)
Summary:
Before https://github.com/pytorch/pytorch/pull/57833, calls to backward() or grad() synced only the calling thread's default stream with autograd leaf streams at the end of backward. This made the following weird pattern safe:
```python
with torch.cuda.stream(s):
    # imagine forward used many streams, so backward leaf nodes may run on many streams
    loss.backward()
# no sync
use grads
```

but a more benign-looking pattern was unsafe:
```python
with torch.cuda.stream(s):
    # imagine forward used a lot of streams, so backward leaf nodes may run on many streams
    loss.backward()
    # backward() syncs the default stream with all the leaf streams, but does not sync s with anything,
    # so counterintuitively (even though we're in the same stream context as backward()!)
    # it is NOT SAFE to use grads here, and there's no easy way to make it safe,
    # unless you manually sync on all the streams you used in forward,
    # or move "use grads" back to default stream outside the context.
    use grads
```
mruberry ngimel and I decided backward() should have the [same user-facing stream semantics as any cuda op](https://pytorch.org/docs/master/notes/cuda.html#stream-semantics-of-backward-passes).** In other words, the weird pattern should be unsafe, and the benign-looking pattern should be safe. Implementationwise, this meant backward() should sync its calling thread's current stream, not default stream, with the leaf streams.

After https://github.com/pytorch/pytorch/pull/57833, backward syncs the calling thread's current stream AND default stream with all leaf streams at the end of backward. The default stream syncs were retained for temporary backward compatibility.

This PR finishes https://github.com/pytorch/pytorch/pull/57833's work by deleting syncs on the default stream.

With this PR, graph-capturing an entire backward() call should be possible (see the [test_graph_grad_scaling diffs](https://github.com/pytorch/pytorch/compare/master...mcarilli:streaming_backwards_remove_default_syncs?expand=1#diff-893b1eea27352f336f4cd832919e48d721e4e90186e63400b8596db6b82e7450R3641-R3642)).

** first paragraph has a formatting error which this PR should also fix.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60421

Reviewed By: VitalyFedyunin, albanD

Differential Revision: D29342234

Pulled By: ngimel

fbshipit-source-id: 98e6be7fdd8550872f0a78f9a66cb8dfe75abf63
2021-06-23 23:35:24 -07:00
Ilqar Ramazanli
63219f1f9f To add Rectified Adam Algorithm to Optimizers (#58968)
Summary:
Fixes : https://github.com/pytorch/pytorch/issues/24892

In the paper : https://arxiv.org/pdf/1908.03265.pdf  Liyuan Liu et al. suggested a new optimization algorithm with an essence of similar to Adam Algorithm.

It has been discussed in the paper that, without warmup heuristic, in the early stage of adaptive optimization / learning algorithms sometimes we can get undesirable large variance which can slow overall convergence process.

Authors proposed the idea of rectification of variance of adaptive learning rate when it is expected to be high.

Differing from the paper, we selected variance tractability cut-off as 5 instead of 4. This adjustment is common practice, and could be found in the code-repository and also tensorflow swift optim library as well :

2f03dd1970/radam/radam.py (L156)

f51ee4618d/Sources/TensorFlow/Optimizers/MomentumBased.swift (L638)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58968

Reviewed By: vincentqb

Differential Revision: D29310601

Pulled By: iramazanli

fbshipit-source-id: b7bd487f72f1074f266687fd9c0c6be264a748a9
2021-06-23 18:27:57 -07:00
Ilqar Ramazanli
e8690dacb2 To add Nesterov Adam Algorithm to Optimizers (#59009)
Summary:
Fixes : https://github.com/pytorch/pytorch/issues/5804

In the paper : https://openreview.net/forum?id=OM0jvwB8jIp57ZJjtNEZ  Timothy Dozat suggested a new optimization algorithm with an essence of combination of NAG and Adam algorithms.

It is known that the idea of momentum can be improved with the Nesterov acceleration in optimization algorithms, and Dozat is investigating to apply this idea to momentum component of Adam algorithm. Author provided experiment evidence in their work to show excellence of the idea.

In this PR we are implementing the proposed algorithm NAdam in the mentioned paper. Author has a preliminary work http://cs229.stanford.edu/proj2015/054_report.pdf  where he shows the decay base constant should be taken as 0.96 which we also followed the same phenomenon here in this implementation similar to Keras. Moreover, implementation / coding practice have been followed similar to Keras in some other places as well:

f9d3868495/tensorflow/python/keras/optimizer_v2/nadam.py

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59009

Reviewed By: gchanan, vincentqb

Differential Revision: D29220375

Pulled By: iramazanli

fbshipit-source-id: 4b4bb4b15f7e16f7527f368bbf4207ed345751aa
2021-06-23 08:21:43 -07:00
Weiqiang Wu
6a87e8d087 Implement erfcx() (#58194)
Summary:
Implement erfcx() https://github.com/pytorch/pytorch/issues/31945

Reference: https://github.com/pytorch/pytorch/issues/50345

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58194

Reviewed By: ngimel

Differential Revision: D29285979

Pulled By: mruberry

fbshipit-source-id: 5bcfe77fddfabbeb8c8068658ba6d9fec6430399
2021-06-22 12:38:38 -07:00
Sam Estep
1abf45e37f Revert D29241736: [pytorch][PR] To add Rectified Adam Algorithm to Optimizers
Test Plan: revert-hammer

Differential Revision:
D29241736 (0d2a936176)

Original commit changeset: 288b9b1f3125

fbshipit-source-id: 56c4ec98647c6f1822b130726741a1c9ca193670
2021-06-22 12:08:31 -07:00
Ilqar Ramazanli
0d2a936176 To add Rectified Adam Algorithm to Optimizers (#58968)
Summary:
Fixes : https://github.com/pytorch/pytorch/issues/24892

In the paper : https://arxiv.org/pdf/1908.03265.pdf  Liyuan Liu et al. suggested a new optimization algorithm with an essence of similar to Adam Algorithm.

It has been discussed in the paper that, without warmup heuristic, in the early stage of adaptive optimization / learning algorithms sometimes we can get undesirable large variance which can slow overall convergence process.

Authors proposed the idea of rectification of variance of adaptive learning rate when it is expected to be high.

Differing from the paper, we selected variance tractability cut-off as 5 instead of 4. This adjustment is common practice, and could be found in the code-repository and also tensorflow swift optim library as well :

2f03dd1970/radam/radam.py (L156)

f51ee4618d/Sources/TensorFlow/Optimizers/MomentumBased.swift (L638)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58968

Reviewed By: gchanan

Differential Revision: D29241736

Pulled By: iramazanli

fbshipit-source-id: 288b9b1f3125fdc6c7a7bb23fde1ea5c201c0448
2021-06-22 10:38:41 -07:00
Saketh Are
729f7cd52f Implement histogram operator on CPU (#58780)
Summary:
The existing [torch.histc](https://pytorch.org/docs/stable/generated/torch.histc.html) operator is limited in comparison to [numpy.histogram](https://numpy.org/doc/stable/reference/generated/numpy.histogram.html). This PR adds torch.histogram on CPU. The new operator replicates numpy.histogram's behavior, including support for caller-specified bin edges and weights. It was motivated by previous community requests for histogram.

The implementation was [benchmarked](https://docs.google.com/spreadsheets/d/1xCR0jODchVvwdVSAjiLsNCkmyictA6j1LNfDpWOafjw/edit?usp=sharing) against numpy.histogram as well as torch.histc. This implementation is weakly faster than numpy.histogram across all types of inputs tested, and performs in line with torch.histc for the limited inputs histc supports.

mruberry

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58780

Test Plan:
Added unit tests, OpInfo for the new torch.histogram operator.

Tested execution time on a variety of input sizes and compared to numpy.histogram performance: https://docs.google.com/spreadsheets/d/1xCR0jODchVvwdVSAjiLsNCkmyictA6j1LNfDpWOafjw/edit?usp=sharing

Reviewed By: ezyang

Differential Revision: D29134626

Pulled By: saketh-are

fbshipit-source-id: f2773085de1697f6bc6ffdeffe9a81267f51bdfc
2021-06-22 10:06:04 -07:00
kshitij12345
01e0296eb7 [special] migrate log1p, sinc, round to special namespace (#55878)
Summary:
Reference : https://github.com/pytorch/pytorch/issues/50345

Pull Request resolved: https://github.com/pytorch/pytorch/pull/55878

Reviewed By: zou3519, janeyx99

Differential Revision: D29160593

Pulled By: mruberry

fbshipit-source-id: f3ca9c541382bab33fb85d7817ce8ddc117c6826
2021-06-21 12:34:29 -07:00
Michael Wootton
2f3be2735f Don't split oversize cached blocks (#44742)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/35901

This change is designed to prevent fragmentation in the Caching Allocator.  Permissive block splitting in the allocator allows very large blocks to be split into many pieces.  Once split too finely it is unlikely all pieces will be 'free' at that same time so the original allocation can never be returned.   Anecdotally, we've seen a model run out of memory failing to alloc a 50 MB block on a 32 GB card while the caching allocator is holding 13 GB of 'split free blocks'

Approach:

- Large blocks above a certain size are designated "oversize".  This limit is currently set 1 decade above large, 200 MB
- Oversize blocks can not be split
- Oversize blocks must closely match the requested size (e.g. a 200 MB request will match an existing 205 MB block, but not a 300 MB block)
- In lieu of splitting oversize blocks there is a mechanism to quickly free a single oversize block (to the system allocator) to allow an appropriate size block to be allocated.  This will be activated under memory pressure and will prevent _release_cached_blocks()_ from triggering

Initial performance tests show this is similar or quicker than the original strategy.  Additional tests are ongoing.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44742

Reviewed By: zou3519

Differential Revision: D29186394

Pulled By: ezyang

fbshipit-source-id: c88918836db3f51df59de6d1b3e03602ebe306a9
2021-06-21 11:46:08 -07:00
Thomas J. Fan
c16f87949f ENH Adds nn.ReflectionPad3d (#59791)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/27655

This PR adds a C++ and Python version of ReflectionPad3d with structured kernels. The implementation uses lambdas extensively to better share code from the backward and forward pass.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59791

Reviewed By: gchanan

Differential Revision: D29242015

Pulled By: jbschlosser

fbshipit-source-id: 18e692d3b49b74082be09f373fc95fb7891e1b56
2021-06-21 10:53:14 -07:00
Michael Carilli
f89ae9cb8d Moves grid_sampler to autocast promote list (#58618)
Summary:
Should close https://github.com/pytorch/pytorch/issues/42218

Numerically, `grid_sampler` is fine in fp16 or fp32, but takes several inputs and expects their dtypes to match, so it belongs on the autocast promote list.

`grid_sampler` currently uses `gpuAtomicAdd`, notoriously slow in fp16 because it calls cuda's atomicAdd __half overload which uses a software compare-and-swap loop internally. To allow good performance if both inputs happen to be FP16, the PR also modifies `grid_sampler_[2,3]d_backward_kernel`s to use `fastAtomicAdd` instead.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58618

Reviewed By: mruberry

Differential Revision: D29257199

Pulled By: ngimel

fbshipit-source-id: 3cc7505945b480427f2fc1beb36bee80bf3853b3
2021-06-21 10:22:36 -07:00
kshitij12345
5ec4ad7f54 [special] Add special.ndtri (#58650)
Summary:
Reference: https://github.com/pytorch/pytorch/issues/50345

TODO
* [x] Add docs https://13865352-65600975-gh.circle-artifacts.com/0/docs/special.html#torch.special.ndtri
* [x] Add comments on implementation
* [x] Clean-up

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58650

Reviewed By: H-Huang

Differential Revision: D29160170

Pulled By: mruberry

fbshipit-source-id: 50e4ea663920e97b8437d03d5b52bcd9dedc1a8d
2021-06-19 18:36:54 -07:00
Patrick Wang
8b55e9feaf removed cat, equal, and stack from autocast promote list (#59497)
Summary:
Fixes #{issue number}

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59497

Reviewed By: zou3519

Differential Revision: D29185909

Pulled By: ngimel

fbshipit-source-id: db96239106d9e46a2704b8f457fd0463dacc1f5c
2021-06-17 21:13:22 -07:00
Patrick
5948e6f653 removed gelu from autocast fp32 list (#59639)
Summary:
Fixes #{issue number}

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59639

Reviewed By: H-Huang

Differential Revision: D29155914

Pulled By: ezyang

fbshipit-source-id: feb117181894c2355768d5b1189b3d5f1649fc0b
2021-06-16 16:29:57 -07:00
Michael Suo
15f236f3e3 [package] fix tutorial link (#60113)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60113

The tutorial link in the docs was to an fb-only colab.

Test Plan: Imported from OSS

Reviewed By: SplitInfinity

Differential Revision: D29169818

Pulled By: suo

fbshipit-source-id: 374807c234a185bd515b8ffe1300e6cf8d821636
2021-06-16 11:27:25 -07:00
BowenBao
55530e2276 Update Autograd Export Docs (#56594) (#59534)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59534

Update autograd export docs

Test Plan: Imported from OSS

Reviewed By: nikithamalgifb, ansley

Differential Revision: D29046606

Pulled By: SplitInfinity

fbshipit-source-id: 36057f6bdfd3e5c071dbca05d327de7952904120

Co-authored-by: neginraoof <neginmr@utexas.edu>
2021-06-15 12:23:00 -07:00
Joel Schlosser
c645d39a77 Implementation of torch.isin() (#53125)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/3025

## Background

This PR implements a function similar to numpy's [`isin()`](https://numpy.org/doc/stable/reference/generated/numpy.isin.html#numpy.isin).

The op supports integral and floating point types on CPU and CUDA (+ half & bfloat16 for CUDA). Inputs can be one of:
* (Tensor, Tensor)
* (Tensor, Scalar)
* (Scalar, Tensor)

Internally, one of two algorithms is selected based on the number of elements vs. test elements. The heuristic for deciding which algorithm to use is taken from [numpy's implementation](fb215c7696/numpy/lib/arraysetops.py (L575)): if `len(test_elements) < 10 * len(elements) ** 0.145`, then a naive brute-force checking algorithm is used. Otherwise, a stablesort-based algorithm is used.

I've done some preliminary benchmarking to verify this heuristic on a devgpu, and determined for a limited set of tests that a power value of `0.407` instead of `0.145` is a better inflection point. For now, the heuristic has been left to match numpy's, but input is welcome for the best way to select it or whether it should be left the same as numpy's.

Tests are adapted from numpy's [isin and in1d tests](7dcd29aaaf/numpy/lib/tests/test_arraysetops.py).

Note: my locally generated docs look terrible for some reason, so I'm not including the screenshot for them until I figure out why.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/53125

Test Plan:
```
python test/test_ops.py   # Ex: python test/test_ops.py TestOpInfoCPU.test_supported_dtypes_isin_cpu_int32
python test/test_sort_and_select.py   # Ex: python test/test_sort_and_select.py TestSortAndSelectCPU.test_isin_cpu_int32
```

Reviewed By: soulitzer

Differential Revision: D29101165

Pulled By: jbschlosser

fbshipit-source-id: 2dcc38d497b1e843f73f332d837081e819454b4e
2021-06-14 13:50:53 -07:00
Meghan Lele
8e92a3a8b0 [docs] Add pickle security warning to package docs (#59959)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59959

**Summary**
This commit replaces the warning on the `torch.package` documentation
page about the module not being publicly released (which will no longer
be true as of 1.9) with one that warns about security issues caused by
the use of the `pickle` module.

**Test Plan**
1) Built the docs locally.
2) Continuous integration.

<img width="877" alt="Captura de Pantalla 2021-06-14 a la(s) 11 22 05 a  m" src="https://user-images.githubusercontent.com/4392003/121940300-c98cab00-cd02-11eb-99dc-08e29632079a.png">

Test Plan: Imported from OSS

Reviewed By: suo

Differential Revision: D29108429

Pulled By: SplitInfinity

fbshipit-source-id: 3a0aeac0dc804a31203bc5071efb1c5bd6ef9725
2021-06-14 13:03:05 -07:00
Kushashwa Ravi Shrimali
cf38b20c61 Alias for digamma as psi to special namespace (#59143)
Summary:
See https://github.com/pytorch/pytorch/issues/50345

cc: mruberry kshitij12345

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59143

Reviewed By: jbschlosser

Differential Revision: D28986909

Pulled By: mruberry

fbshipit-source-id: bc8ff0375de968f3662b224689fa0a6b117f9c4e
2021-06-14 03:05:14 -07:00
Michael Carilli
be038d8989 [CUDA graphs] Make stream semantics of backward calls consistent with other cuda ops (ci-all edition) (#57833)
Summary:
ci-all resubmit of https://github.com/pytorch/pytorch/pull/54227.

Tests look good except for a few distributed autograd failures (pytorch_linux_xenial_cuda10_2_cudnn7_py3_multigpu_test) and rocm failures (pr/pytorch-linux-bionic-rocm4.1-py3.6).

The common denominator in rocm failures appears to be multi-gpu activity: some [multiprocess DDP failures](https://ci.pytorch.org/jenkins/job/pytorch-builds/job/pytorch-linux-bionic-rocm4.1-py3.6-test1/8115/console), some [single-process failures](https://ci.pytorch.org/jenkins/job/pytorch-builds/job/pytorch-linux-bionic-rocm4.1-py3.6-test2/8115/console) where the single process has autograd ops that span devices. jeffdaily jithunnair-amd sunway513, could one of you take a look? The streaming backward change is also beneficial to rocm, I expect.

For debugging rocm failures, I think we should ignore the multiprocess/DDP tests and focus on the single process cases. The root cause is probably the same and the single process cases are simpler.

----------------------------------

Update: Rocm failures are due to https://github.com/pytorch/pytorch/issues/59750.
2718a54032 is a workaround, to be updated once https://github.com/pytorch/pytorch/issues/59750 is fixed.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/57833

Reviewed By: mruberry

Differential Revision: D28942391

Pulled By: ngimel

fbshipit-source-id: d6047e971c5f1c6386334bf3641402a92f12e2f8
2021-06-13 12:09:56 -07:00
Mike Ruberry
92513038e8 Revert D28994140: [pytorch][PR] Implemented torch.cov
Test Plan: revert-hammer

Differential Revision:
D28994140 (23c232554b)

Original commit changeset: 1890166c0a9c

fbshipit-source-id: 73dfe1b00464e38f004f99960cdeeb604ed4b20a
2021-06-13 02:33:37 -07:00
Heitor Schueroff
23c232554b Implemented torch.cov (#58311)
Summary:
Based from https://github.com/pytorch/pytorch/pull/50466

Adds the initial implementation of `torch.cov` similar to `numpy.cov`. For simplicity, we removed support for many parameters in `numpy.cov` that are either redundant such as `bias`, or have simple workarounds such as `y` and `rowvar`.

cc PandaBoi

TODO

- [x] Improve documentation

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58311

Reviewed By: mruberry

Differential Revision: D28994140

Pulled By: heitorschueroff

fbshipit-source-id: 1890166c0a9c01e0a536acd91571cd704d632f44
2021-06-11 09:40:50 -07:00
Meghan Lele
4025f95a20 [docs] Add table of contents to torch.package docs (#59842)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59842

Test Plan:
Continuous integration.

<img width="544" alt="Captura de Pantalla 2021-06-10 a la(s) 5 13 07 p  m" src="https://user-images.githubusercontent.com/4392003/121612390-2ccec280-ca0f-11eb-87ad-fef632ba05ca.png">

Reviewed By: Lilyjjo

Differential Revision: D29050627

Pulled By: SplitInfinity

fbshipit-source-id: 76c25ed4002cbaf072036e2e14e7857c15077df7
2021-06-10 19:52:50 -07:00
Meghan Lele
0e222db087 [docs] Add explanation section to torch.package docs (#59833)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59833

**Summary**
This commit adds an explanation section to the `torch.package`
documentation. This section clarifies and illuminates various aspects of
the internals of `torch.package` that might be of interest to users.

**Test Plan**
Continuous integration.

Test Plan: Imported from OSS

Reviewed By: Lilyjjo

Differential Revision: D29050626

Pulled By: SplitInfinity

fbshipit-source-id: 78e0cda00f69506ef2dfc52d6df63694b502269e
2021-06-10 19:52:48 -07:00
Meghan Lele
062dde7285 [docs] Add "how do I" section to torch.package docs (#59503)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59503

**Summary**
This commit adds a "how do I..." section to the `torch.package`
documentation. This section contains short guides about how to solve
real-world problems that frequently recur while using `torch.package`.

**Test Plan**
Continuous integration.

<img width="877" alt="Captura de Pantalla 2021-06-04 a la(s) 9 19 54 p  m" src="https://user-images.githubusercontent.com/4392003/120879911-98321380-c57b-11eb-8664-c582c92b7837.png">

Test Plan: Imported from OSS

Reviewed By: Lilyjjo

Differential Revision: D29050629

Pulled By: SplitInfinity

fbshipit-source-id: 2b7800732e0a3c1c947f110c05562aed5174a87f
2021-06-10 19:52:47 -07:00
Meghan Lele
6a18ca7a07 [docs] Add tutorials section to torch.package docs (#59499)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59499

**Summary**
This commit adds a tutorials section to the torch.package docs.

**Test Plan**
Continuous integration.

<img width="870" alt="Captura de Pantalla 2021-06-04 a la(s) 5 10 31 p  m" src="https://user-images.githubusercontent.com/4392003/120874257-b9ced300-c55a-11eb-84dd-721cb7ac73ab.png">

Test Plan: Imported from OSS

Reviewed By: Lilyjjo

Differential Revision: D29050628

Pulled By: SplitInfinity

fbshipit-source-id: c17ab0100a9d63e7af8da7a618143cedbd0a5872
2021-06-10 19:52:45 -07:00
Meghan Lele
a3db8e0a26 [docs] Add torch.package documentation preamble (#59491)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59491

**Summary**
This commit adds a preamble to the `torch.package` documentation page
that explains briefly what `torch.package` is.

**Test Plan**
Continous integration.

<img width="881" alt="Captura de Pantalla 2021-06-04 a la(s) 3 57 01 p  m" src="https://user-images.githubusercontent.com/4392003/120872203-d535e000-c552-11eb-841d-b38df19bc992.png">

Test Plan: Imported from OSS

Reviewed By: Lilyjjo

Differential Revision: D29050630

Pulled By: SplitInfinity

fbshipit-source-id: 70a3fd43f076751c6ea83be3ead291686c641158
2021-06-10 19:51:37 -07:00
Rohan Varma
2f395f3b54 [reland] Document debugability features in torch.distributed (#59726)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59726

Reland of https://github.com/pytorch/pytorch/pull/59604 with indentation fix
ghstack-source-id: 130979356

Test Plan: ci

Reviewed By: SciPioneer

Differential Revision: D29001923

fbshipit-source-id: 225d9dc5054c223b453f3b39749e2b62f61b9a2c
2021-06-09 16:40:11 -07:00
Luca Wehrstedt
f1786b293d Revert D28972444: [pytorch][PR] Document debugability features in torch.distributed
Test Plan: revert-hammer

Differential Revision:
D28972444 (a9d2810817)

Original commit changeset: da5e8ee84f0d

fbshipit-source-id: 94d3b3b75ddec74ea5b2b76f6a7519dc921ee2a7
2021-06-09 03:04:36 -07:00
Rohan Varma
a9d2810817 Document debugability features in torch.distributed (#59604)
Summary:
Adds comprehensive documentation around debugability features added to `torch.distributed` recently, including the `monitored_barrier` and TORCH_DISTRIBUTED_DEBUG env variable.

![dist_one](https://user-images.githubusercontent.com/8039770/121102672-0f052180-c7b3-11eb-974c-81dbbe102cb6.png)
![dist_two](https://user-images.githubusercontent.com/8039770/121102734-39ef7580-c7b3-11eb-94f7-c75469351440.png)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59604

Reviewed By: jbschlosser, SciPioneer

Differential Revision: D28972444

Pulled By: rohan-varma

fbshipit-source-id: da5e8ee84f0d6f252c703c4d70ff2a0d5817cc4e
2021-06-08 23:52:19 -07:00
Jeffrey Wan
f52e202840 Add warning when accessing Tensor::grad() in the C++ API (#59362)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/35379

 - Adds  `retains_grad` attribute backed by cpp as a native function. The python bindings for the function are skipped to be consistent with `is_leaf`.
   - Tried writing it without native function, but the jit test `test_tensor_properties` seems to require that it be a native function (or alternatively maybe it could also work if we manually add a prim implementation?).
 - Python API now uses `retain_grad` implementation from cpp

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59362

Reviewed By: jbschlosser

Differential Revision: D28969298

Pulled By: soulitzer

fbshipit-source-id: 335f2be50b9fb870cd35dc72f7dadd6c8666cc02
2021-06-08 19:43:21 -07:00
James Reed
02d380450d [FX][docs][EZ] Fix link to fuser example (#59670)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59670

Test Plan: Imported from OSS

Reviewed By: jansel

Differential Revision: D28975704

Pulled By: jamesr66a

fbshipit-source-id: 2fb759224b5b1ecc62c0ab26563d2a35ed422794
2021-06-08 17:32:55 -07:00
Vasiliy Kuznetsov
dafa4b3517 quantization: improve documentation on natively supported backends (#58925)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58925

Cleans up documentation on natively supported backends.  In particular:
* adds a section title
* deduplicates information about fbgemm/qnnpack
* clarifies what `torch.backends.quantized.engine` does
* adds code samples with default settings for `fbgemm` and `qnnpack`

Test Plan: Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D28681840

Pulled By: vkuzo

fbshipit-source-id: 51a6ab66934f657553351f6c84a638fd5f7b4e12
2021-06-07 17:29:03 -07:00
Thomas J. Fan
6ff001c125 DOC Improve documentation for LayerNorm (#59178)
Summary:
Closes https://github.com/pytorch/pytorch/issues/51455

I think the current implementation is aggregating over the correct dimensions. The shape of `normalized_shape` is only used to determine the dimensions to aggregate over. The actual values of `normalized_shape` are used when `elementwise_affine=True` to initialize the weights and biases.

This PR updates the docstring to clarify how `normalized_shape` is used. Here is a short script comparing the implementations for tensorflow and pytorch:

```python
import torch
import torch.nn as nn

import tensorflow as tf
from tensorflow.keras.layers import LayerNormalization

rng = np.random.RandomState()
x = rng.randn(10, 20, 64, 64).astype(np.float32)
# slightly non-trival
x[:, :10, ...] = x[:, :10, ...] * 10 + 20
x[:, 10:, ...] = x[:, 10:, ...] * 30 - 100

# Tensorflow Layer norm
x_tf = tf.convert_to_tensor(x)
layer_norm_tf = LayerNormalization(axis=[-3, -2, -1], epsilon=1e-5)
output_tf = layer_norm_tf(x_tf)
output_tf_np = output_tf.numpy()

# PyTorch Layer norm
x_torch = torch.as_tensor(x)
layer_norm_torch = nn.LayerNorm([20, 64, 64], elementwise_affine=False)
output_torch = layer_norm_torch(x_torch)
output_torch_np = output_torch.detach().numpy()

# check tensorflow and pytorch
torch.testing.assert_allclose(output_tf_np, output_torch_np)

# manual comutation
manual_output = ((x_torch - x_torch.mean(dim=(-3, -2, -1), keepdims=True)) /
                 (x_torch.var(dim=(-3, -2, -1), keepdims=True, unbiased=False) + 1e-5).sqrt())

torch.testing.assert_allclose(output_torch, manual_output)
```

To get to the layer normalization as shown here:

<img width="157" alt="Screen Shot 2021-05-29 at 2 13 52 PM" src="https://user-images.githubusercontent.com/5402633/120080691-1e37f100-c088-11eb-9060-4f263e4cd093.png">

One needs to pass in `normalized_shape` with shape `x.dim() - 1` with the size of the channels and all spatial dimensions.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59178

Reviewed By: ejguan

Differential Revision: D28931877

Pulled By: jbschlosser

fbshipit-source-id: 193e05205b9085bb190c221428c96d2ca29f2a70
2021-06-07 14:34:10 -07:00
anjali411
3607478ecd Conjugate View (#54987)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54987

Based off of ezyang (https://github.com/pytorch/pytorch/pull/44799) and bdhirsh (https://github.com/pytorch/pytorch/pull/43702) 's prototype:

Here's a summary of the changes in this PR:
This PR adds a new dispatch key called Conjugate. This enables us to make conjugate operation a view and leverage the specialized library functions that fast path with the hermitian operation (conj + transpose).

1. Conjugate operation will now return a view with conj bit (1) for complex tensors and returns self for non-complex tensors as before. This also means `torch.view_as_real` will no longer be a view on conjugated complex tensors and is hence disabled. To fill the gap, we have added `torch.view_as_real_physical` which would return the real tensor agnostic of the conjugate bit on the input complex tensor. The information about conjugation on the old tensor can be obtained by calling `.is_conj()` on the new tensor.
2. NEW API:
    a) `.conj()` -- now returning a view.
    b) `.conj_physical()` -- does the physical conjugate operation. If the conj bit for input was set, you'd get `self.clone()`, else you'll get a new tensor with conjugated value in its memory.
    c) `.conj_physical_()`, and `out=` variant
    d) `.resolve_conj()`  -- materializes the conjugation. returns self if the conj bit is unset, else returns a new tensor with conjugated values and conj bit set to 0.
    e) `.resolve_conj_()` in-place version of (d)
    f) `view_as_real_physical` -- as described in (1), it's functionally same as `view_as_real`, just that it doesn't error out on conjugated tensors.
    g) `view_as_real` -- existing function, but now errors out on conjugated tensors.
3. Conjugate Fallback
    a) Vast majority of PyTorch functions would currently use this fallback when they are called on a conjugated tensor.
    b) This fallback is well equipped to handle the following cases:
        - functional operation e.g., `torch.sin(input)`
        - Mutable inputs and in-place operations e.g., `tensor.add_(2)`
        - out-of-place operation e.g., `torch.sin(input, out=out)`
        - Tensorlist input args
        - NOTE: Meta tensors don't work with conjugate fallback.
4. Autograd
    a) `resolve_conj()` is an identity function w.r.t. autograd
    b) Everything else works as expected.
5. Testing:
    a) All method_tests run with conjugate view tensors.
    b) OpInfo tests that run with conjugate views
        - test_variant_consistency_eager/jit
        - gradcheck, gradgradcheck
        - test_conj_views (that only run for `torch.cfloat` dtype)

NOTE: functions like `empty_like`, `zero_like`, `randn_like`, `clone` don't propagate the conjugate bit.

Follow up work:
1. conjugate view RFC
2. Add neg bit to re-enable view operation on conjugated tensors
3. Update linalg functions to call into specialized functions that fast path with the hermitian operation.

Test Plan: Imported from OSS

Reviewed By: VitalyFedyunin

Differential Revision: D28227315

Pulled By: anjali411

fbshipit-source-id: acab9402b9d6a970c6d512809b627a290c8def5f
2021-06-04 14:12:41 -07:00
Jeffrey Wan
4ae5764d47 Add is_inference to native functions (#58729)
Summary:
Adds `is_inference` as a native function w/ manual cpp bindings.
Also changes instances of `is_inference_tensor` to `is_inference` to be consistent with other properties such as `is_complex`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58729

Reviewed By: mruberry

Differential Revision: D28874507

Pulled By: soulitzer

fbshipit-source-id: 0fa6bcdc72a4ae444705e2e0f3c416c1b28dadc7
2021-06-04 08:59:11 -07:00
Kushashwa Ravi Shrimali
44c20ce676 Alias for i0 to special namespace (#59141)
Summary:
See https://github.com/pytorch/pytorch/issues/50345

cc: mruberry kshitij12345

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59141

Reviewed By: ngimel

Differential Revision: D28784097

Pulled By: mruberry

fbshipit-source-id: 9b61a21906ef337292686fd40e328502a79e6f09
2021-06-01 23:04:09 -07:00
Thomas J. Fan
8af6281201 DOC Adds register_module_full_backward_hook into docs (#58954)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/54443

Adds `register_module_full_backward_hook` into the index so it is rendered in the html docs.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58954

Reviewed By: ngimel

Differential Revision: D28801816

Pulled By: jbschlosser

fbshipit-source-id: a2e737fe983e5d7e4e26d7639183bca34b571cb8
2021-06-01 15:47:10 -07:00
kshitij12345
fea7a79e0b [special] Add ndtr (#58126)
Summary:
Reference: https://github.com/pytorch/pytorch/issues/50345

Plot:
![image](https://user-images.githubusercontent.com/19503980/117942099-54efd680-b328-11eb-8948-c3080779ce19.png)
https://colab.research.google.com/drive/1Of67A042rOImj8wrLF_fUTgoy_wVEOZS?usp=sharing

TODO:
* [x] Add docs (https://13385714-65600975-gh.circle-artifacts.com/0/docs/special.html#torch.special.ndtr)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58126

Reviewed By: anjali411

Differential Revision: D28700957

Pulled By: mruberry

fbshipit-source-id: 5b9991e97ec1e8fd01518cc9d9849108d35fe406
2021-05-30 21:12:04 -07:00
kshitij12345
5c18994674 [special] Add i1 and i1e (#56352)
Summary:
Reference: https://github.com/pytorch/pytorch/issues/50345

* [x] Check Docs https://12721710-65600975-gh.circle-artifacts.com/0/docs/special.html
* [x] Investigate fp32 failure on CI?! (Fails on clang. Reproduced locally with clang-11)
* [ ] Kernel vs Composite?
* [x] Autograd for `i0e` for zero?

Pull Request resolved: https://github.com/pytorch/pytorch/pull/56352

Reviewed By: anjali411

Differential Revision: D28700888

Pulled By: mruberry

fbshipit-source-id: 91a3cbb94f5b8a3b063589ec38179848c11def83
2021-05-29 20:55:23 -07:00
Jeffrey Wan
9e60c7dee3 Add docstring for is_inference_mode_enabled (#59047)
Summary:
Fixes` #{issue number}

Testing:
```
>>> import torch
>>> torch.is_inference_mode_enabled.__doc__
'\nis_inference_mode_enabled(input) -> (bool)\n\nReturns True if inference mode is currently enabled.\n\nArgs:\n    input (Tensor): the input tensor.\n'
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59047

Reviewed By: ailzhang

Differential Revision: D28726991

Pulled By: soulitzer

fbshipit-source-id: c117c7d73e551a1b5f0e215f2aed528bf558ef7c
2021-05-26 19:27:33 -07:00