Summary:
This PR fixes the formatting issues in the new language reference
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56042
Reviewed By: gmagogsfm
Differential Revision: D27830179
Pulled By: nikithamalgifb
fbshipit-source-id: bce3397d4de3f1536a1a8f0a16f10a703e7d4406
Summary:
Reference: https://github.com/pytorch/pytorch/issues/50345
Changes:
* Add `i0e`
* Move some kernels from `UnaryOpsKernel.cu` to `UnarySpecialOpsKernel.cu` to decrease compilation time per file.
Time taken by i0e_vs_scipy tests: around 6.33 s
<details>
<summary>Test Run Log</summary>
```
(pytorch-cuda-dev) kshiteej@qgpu1:~/Pytorch/pytorch_module_special$ pytest test/test_unary_ufuncs.py -k _i0e_vs
======================================================================= test session starts ========================================================================
platform linux -- Python 3.8.6, pytest-6.1.2, py-1.9.0, pluggy-0.13.1
rootdir: /home/kshiteej/Pytorch/pytorch_module_special, configfile: pytest.ini
plugins: hypothesis-5.38.1
collected 8843 items / 8833 deselected / 10 selected
test/test_unary_ufuncs.py ...sss.... [100%]
========================================================================= warnings summary =========================================================================
../../.conda/envs/pytorch-cuda-dev/lib/python3.8/site-packages/torch/backends/cudnn/__init__.py:73
test/test_unary_ufuncs.py::TestUnaryUfuncsCUDA::test_special_i0e_vs_scipy_cuda_bfloat16
/home/kshiteej/.conda/envs/pytorch-cuda-dev/lib/python3.8/site-packages/torch/backends/cudnn/__init__.py:73: UserWarning: PyTorch was compiled without cuDNN/MIOpen support. To use cuDNN/MIOpen, rebuild PyTorch making sure the library is visible to the build system.
warnings.warn(
-- Docs: https://docs.pytest.org/en/stable/warnings.html
===================================================================== short test summary info ======================================================================
SKIPPED [3] test/test_unary_ufuncs.py:1182: not implemented: Could not run 'aten::_copy_from' with arguments from the 'Meta' backend. This could be because the operator doesn't exist for this backend, or was omitted during the selective/custom build process (if using custom build). If you are a Facebook employee using PyTorch on mobile, please visit https://fburl.com/ptmfixes for possible resolutions. 'aten::_copy_from' is only available for these backends: [BackendSelect, Named, InplaceOrView, AutogradOther, AutogradCPU, AutogradCUDA, AutogradXLA, UNKNOWN_TENSOR_TYPE_ID, AutogradMLC, AutogradNestedTensor, AutogradPrivateUse1, AutogradPrivateUse2, AutogradPrivateUse3, Tracer, Autocast, Batched, VmapMode].
BackendSelect: fallthrough registered at ../aten/src/ATen/core/BackendSelectFallbackKernel.cpp:3 [backend fallback]
Named: registered at ../aten/src/ATen/core/NamedRegistrations.cpp:7 [backend fallback]
InplaceOrView: fallthrough registered at ../aten/src/ATen/core/VariableFallbackKernel.cpp:56 [backend fallback]
AutogradOther: registered at ../torch/csrc/autograd/generated/VariableType_4.cpp:8761 [autograd kernel]
AutogradCPU: registered at ../torch/csrc/autograd/generated/VariableType_4.cpp:8761 [autograd kernel]
AutogradCUDA: registered at ../torch/csrc/autograd/generated/VariableType_4.cpp:8761 [autograd kernel]
AutogradXLA: registered at ../torch/csrc/autograd/generated/VariableType_4.cpp:8761 [autograd kernel]
UNKNOWN_TENSOR_TYPE_ID: registered at ../torch/csrc/autograd/generated/VariableType_4.cpp:8761 [autograd kernel]
AutogradMLC: registered at ../torch/csrc/autograd/generated/VariableType_4.cpp:8761 [autograd kernel]
AutogradNestedTensor: registered at ../torch/csrc/autograd/generated/VariableType_4.cpp:8761 [autograd kernel]
AutogradPrivateUse1: registered at ../torch/csrc/autograd/generated/VariableType_4.cpp:8761 [autograd kernel]
AutogradPrivateUse2: registered at ../torch/csrc/autograd/generated/VariableType_4.cpp:8761 [autograd kernel]
AutogradPrivateUse3: registered at ../torch/csrc/autograd/generated/VariableType_4.cpp:8761 [autograd kernel]
Tracer: registered at ../torch/csrc/autograd/generated/TraceType_4.cpp:9348 [kernel]
Autocast: fallthrough registered at ../aten/src/ATen/autocast_mode.cpp:250 [backend fallback]
Batched: registered at ../aten/src/ATen/BatchingRegistrations.cpp:1016 [backend fallback]
VmapMode: fallthrough registered at ../aten/src/ATen/VmapModeRegistrations.cpp:33 [backend fallback]
==================================================== 7 passed, 3 skipped, 8833 deselected, 2 warnings in 6.33s =====================================================
```
</details>
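For illustration, a minimal check of the new op against SciPy (input values are arbitrary):
```python
import torch
import scipy.special

x = torch.tensor([0.0, 1.0, 10.0], dtype=torch.float64)
res = torch.special.i0e(x)
expected = torch.from_numpy(scipy.special.i0e(x.numpy()))
assert torch.allclose(res, expected)
```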
TODO:
* [x] Check rendered docs (https://11743402-65600975-gh.circle-artifacts.com/0/docs/special.html)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54409
Reviewed By: jbschlosser
Differential Revision: D27760472
Pulled By: mruberry
fbshipit-source-id: bdfbcaa798b00c51dc9513c34626246c8fc10548
Summary:
Related to https://github.com/pytorch/pytorch/issues/52256
Use autosummary instead of autofunction to create subpages for optim and cuda functions/classes.
Also fix some minor formatting issues in the optim.LBFGS and cuda.stream docstrings
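For illustration, the change amounts to roughly this directive swap in the `.rst` sources (module and entries here are examples, not the full list); each listed entry gets its own generated subpage:
```rst
.. currentmodule:: torch.optim

.. autosummary::
    :toctree: generated
    :nosignatures:

    Adam
    SGD
```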
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55673
Reviewed By: jbschlosser
Differential Revision: D27747741
Pulled By: zou3519
fbshipit-source-id: 070681f840cdf4433a44af75be3483f16e5acf7d
Summary:
Related to https://github.com/pytorch/pytorch/issues/52256
Use autosummary instead of autofunction to create subpages for autograd functions. I left the autoclass parts intact but manually laid out their members.
Also, the LaTeX formatting of the special page emitted a warning (solved by adding `\begin{align}...\end{align}`), and the alignment of equations was fixed (by using `&=` instead of `=`).
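For illustration, the fix follows this pattern (the equations here are placeholders):
```latex
\begin{align}
    f(x)  &= x^{2} \\
    f'(x) &= 2x
\end{align}
```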
zou3519
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55672
Reviewed By: jbschlosser
Differential Revision: D27736855
Pulled By: zou3519
fbshipit-source-id: addb56f4f81c82d8537884e0ff243c1e34969a6e
Summary:
Fixes https://github.com/pytorch/pytorch/issues/35901
This change is designed to prevent fragmentation in the Caching Allocator. Permissive block splitting in the allocator allows very large blocks to be split into many pieces. Once a block is split too finely, it is unlikely that all pieces will be free at the same time, so the original allocation can never be returned. Anecdotally, we've seen a model run out of memory failing to allocate a 50 MB block on a 32 GB card while the caching allocator was holding 13 GB of 'split free blocks'.
Approach:
- Large blocks above a certain size are designated "oversize". This limit is currently set one decade above "large", at 200 MB
- Oversize blocks cannot be split
- Oversize blocks must closely match the requested size (e.g. a 200 MB request will match an existing 205 MB block, but not a 300 MB block)
- In lieu of splitting oversize blocks, there is a mechanism to quickly free a single oversize block (to the system allocator) to allow an appropriately sized block to be allocated. This is activated under memory pressure and prevents _release_cached_blocks()_ from triggering
Initial performance tests show this is similar to or quicker than the original strategy. Additional tests are ongoing.
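A minimal Python sketch of the matching policy described above (the real implementation lives in the C++ caching allocator; the function name and the slack constant here are assumptions for illustration):
```python
OVERSIZE_THRESHOLD = 200 * 1024 * 1024  # blocks above this are "oversize"
OVERSIZE_SLACK = 20 * 1024 * 1024       # assumed tolerance for a "close" match

def pick_block(free_block_sizes, request):
    """Return a cached block size to reuse for `request` bytes, or None."""
    candidates = sorted(s for s in free_block_sizes if s >= request)
    if not candidates:
        return None  # fall through to a fresh alloc (or freeing one oversize block)
    best = candidates[0]
    if request >= OVERSIZE_THRESHOLD:
        # Oversize blocks are never split, so only reuse one that closely
        # matches (e.g. a 205 MB block for a 200 MB request, not 300 MB).
        return best if best - request <= OVERSIZE_SLACK else None
    return best  # smaller blocks keep the old behavior and may be split
```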
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44742
Reviewed By: ngimel
Differential Revision: D23752058
Pulled By: ezyang
fbshipit-source-id: ccb7c13e3cf8ef2707706726ac9aaac3a5e3d5c8
Summary:
Related to https://github.com/pytorch/pytorch/issues/52256
Use autosummary instead of autofunction to create subpages for `torch.fft` and `torch.linalg` functions.
zou3519
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55748
Reviewed By: jbschlosser
Differential Revision: D27739282
Pulled By: heitorschueroff
fbshipit-source-id: 37aa06cb8959721894ffadc15ae8c3b83481a319
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55817
**Summary**
This commit makes minor edits to the docstrings of `PackageExporter` so
that they render properly in the `torch.package` API reference.
**Test Plan**
Continuous integration (especially the docs tests).
Test Plan: Imported from OSS
Reviewed By: gmagogsfm
Differential Revision: D27726817
Pulled By: SplitInfinity
fbshipit-source-id: b81276d7278f586fceded83d23cb4d0532f7c629
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55812
**Summary**
This commit creates a barebones API reference doc for `torch.package`.
The content is sourced from the docstrings in the `torch.package` source.
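For context, a minimal hedged example of the API the new reference documents (the file path and pickled object are hypothetical):
```python
from torch.package import PackageExporter, PackageImporter

# Write a package containing a pickled object.
with PackageExporter("my_package.pt") as exporter:
    exporter.save_pickle("config", "config.pkl", {"lr": 0.1})

# Read it back.
importer = PackageImporter("my_package.pt")
loaded = importer.load_pickle("config", "config.pkl")
```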
**Test Plan**
Continuous integration (specifically the docs tests).
Test Plan: Imported from OSS
Reviewed By: gmagogsfm
Differential Revision: D27726816
Pulled By: SplitInfinity
fbshipit-source-id: 5e9194536f80507e337b81c5ec3b5635d7121818
Summary:
Reference: https://github.com/pytorch/pytorch/issues/50345
Changes:
* Alias for sigmoid and logit
* Adds out variant for C++ API
* Updates docs to link back to `special` documentation
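A quick sanity check of the aliases (input values are arbitrary):
```python
import torch

x = torch.randn(4)
p = torch.rand(4)
assert torch.allclose(torch.special.expit(x), torch.sigmoid(x))  # expit aliases sigmoid
assert torch.allclose(torch.special.logit(p), torch.logit(p))    # logit alias
```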
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54759
Reviewed By: mrshenli
Differential Revision: D27615208
Pulled By: mruberry
fbshipit-source-id: 8bba908d1bea246e4aa9dbadb6951339af353556
Summary:
Related to https://github.com/pytorch/pytorch/issues/52256
Splits the torch.nn.functional docs into a table-of-contents page and many subpages, one for each function
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55038
Reviewed By: gchanan
Differential Revision: D27502677
Pulled By: zou3519
fbshipit-source-id: 38e450a0fee41c901eb56f94aee8a32f4eefc807
Summary:
This PR adds `torch.linalg.eig` and `torch.linalg.eigvals` for NumPy compatibility.
MAGMA uses a hybrid CPU-GPU algorithm and doesn't have a GPU interface for the non-symmetric eigendecomposition. This forces us to transfer inputs living in GPU memory to CPU before calling MAGMA, and then transfer the results back to GPU. That is rather slow for smaller matrices; MAGMA is faster than the CPU path only for matrices larger than 3000x3000.
Unfortunately, there is no cuSOLVER function for this operation.
Autograd support for `torch.linalg.eig` will be added in a follow-up PR.
Ref https://github.com/pytorch/pytorch/issues/42666
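A hedged usage sketch of the new functions (random matrix, for illustration):
```python
import torch

A = torch.randn(3, 3, dtype=torch.float64)
w, V = torch.linalg.eig(A)        # eigenvalues and right eigenvectors (complex)
w_only = torch.linalg.eigvals(A)  # eigenvalues only
# Verify A @ V == V @ diag(w) up to numerical error:
assert torch.allclose(A.to(V.dtype) @ V, V @ torch.diag(w))
```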
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52491
Reviewed By: anjali411
Differential Revision: D27563616
Pulled By: mruberry
fbshipit-source-id: b42bb98afcd2ed7625d30bdd71cfc74a7ea57bb5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55253
Previously, DDP communication hooks took a tensor list as input. Now they take a single tensor, in preparation for retiring SPMD and providing only a single model replica to DDP communication hooks.
The next step is limiting Reducer to a single model replica.
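A hedged sketch of what a hook looks like after this change (the exact bucket method name is an assumption based on this diff):
```python
import torch
import torch.distributed as dist

def allreduce_hook(process_group, bucket) -> torch.futures.Future:
    # The bucket now yields one flattened tensor, not a list of tensors.
    tensor = bucket.get_tensor()
    tensor.div_(dist.get_world_size(process_group))
    fut = dist.all_reduce(tensor, group=process_group, async_op=True).get_future()
    return fut.then(lambda f: f.value()[0])
```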
ghstack-source-id: 125677637
Test Plan: waitforbuildbot
Reviewed By: zhaojuanmao
Differential Revision: D27533898
fbshipit-source-id: 5db92549c440f33662cf4edf8e0a0fd024101eae
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55031
It turns out that PowerSGD hooks work with the PyTorch native AMP package, but not with the Apex AMP package, which can somehow mutate gradients during the execution of communication hooks.
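For reference, a hedged sketch of the working combination (model/optimizer setup omitted; `ddp_model`, `inputs`, and `optimizer` are assumed to exist, with `ddp_model` a `DistributedDataParallel` instance):
```python
import torch
from torch.distributed.algorithms.ddp_comm_hooks import powerSGD_hook as powerSGD

state = powerSGD.PowerSGDState(process_group=None, matrix_approximation_rank=1)
ddp_model.register_comm_hook(state, powerSGD.powerSGD_hook)

scaler = torch.cuda.amp.GradScaler()
with torch.cuda.amp.autocast():
    loss = ddp_model(inputs).sum()
scaler.scale(loss).backward()  # native AMP: gradients stay intact for the hook
scaler.step(optimizer)
scaler.update()
```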
ghstack-source-id: 125268206
Test Plan:
Used the native AMP backend for the same pytext model, and it worked:
f261564342
f261561664
Reviewed By: rohan-varma
Differential Revision: D27436484
fbshipit-source-id: 2b63eb683ce373f9da06d4d224ccc5f0a3016c88
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52859
This reverts commit 92a4ee1cf6.
Added support for bfloat16 for CUDA 11 and removed fast-path for empty input tensors that was affecting autograd graph.
Test Plan: Imported from OSS
Reviewed By: H-Huang
Differential Revision: D27402390
Pulled By: heitorschueroff
fbshipit-source-id: 73c5ccf54f3da3d29eb63c9ed3601e2fe6951034
Summary:
*Context:* https://github.com/pytorch/pytorch/issues/53406 added a lint for trailing whitespace at the ends of lines. However, in order to pass FB-internal lints, that PR also had to normalize the trailing newlines in four of the files it touched. This PR adds an OSS lint to normalize trailing newlines.
The changes to the following files (made in 54847d0adb9be71be4979cead3d9d4c02160e4cd) are the only manually-written parts of this PR:
- `.github/workflows/lint.yml`
- `mypy-strict.ini`
- `tools/README.md`
- `tools/test/test_trailing_newlines.py`
- `tools/trailing_newlines.py`
I would have liked to make this just a shell one-liner like the other three similar lints, but nothing I could find quite fit the bill. Specifically, all the answers I tried from the following Stack Overflow questions were far too slow (at least a minute and a half to run on this entire repository):
- [How to detect file ends in newline?](https://stackoverflow.com/q/38746)
- [How do I find files that do not end with a newline/linefeed?](https://stackoverflow.com/q/4631068)
- [How to list all files in the Git index without newline at end of file](https://stackoverflow.com/q/27624800)
- [Linux - check if there is an empty line at the end of a file [duplicate]](https://stackoverflow.com/q/34943632)
- [git ensure newline at end of each file](https://stackoverflow.com/q/57770972)
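For illustration, a hedged sketch of the kind of check `tools/trailing_newlines.py` performs (reading filenames from stdin is an assumption about the interface, not a description of the actual script):
```python
import sys

def has_correct_trailing_newline(filename: str) -> bool:
    with open(filename, "rb") as f:
        data = f.read()
    # Empty files are fine; otherwise require exactly one trailing newline.
    return not data or (data.endswith(b"\n") and not data.endswith(b"\n\n"))

if __name__ == "__main__":
    failed = False
    for line in sys.stdin:
        name = line.strip()
        if name and not has_correct_trailing_newline(name):
            print(name)
            failed = True
    sys.exit(1 if failed else 0)
```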
To avoid giving false positives during the few days after this PR is merged, we should probably only merge it after https://github.com/pytorch/pytorch/issues/54967.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54737
Test Plan:
Running the shell script from the "Ensure correct trailing newlines" step in the `quick-checks` job of `.github/workflows/lint.yml` should print no output and exit in a fraction of a second with a status of 0. That was not the case prior to this PR, as shown by this failing GHA workflow run on an earlier draft of this PR:
- https://github.com/pytorch/pytorch/runs/2197446987?check_suite_focus=true
In contrast, this run (after correcting the trailing newlines in this PR) succeeded:
- https://github.com/pytorch/pytorch/pull/54737/checks?check_run_id=2197553241
To unit-test `tools/trailing_newlines.py` itself (this is run as part of our "Test tools" GitHub Actions workflow):
```
python tools/test/test_trailing_newlines.py
```
Reviewed By: malfet
Differential Revision: D27409736
Pulled By: samestep
fbshipit-source-id: 46f565227046b39f68349bbd5633105b2d2e9b19
Summary:
This is to prepare for the new language reference spec, which needs to describe `torch.jit.Attribute` and `torch.jit.annotate`
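For context, a small example of the two constructs the spec will cover (the module itself is illustrative):
```python
import torch
from typing import List

class M(torch.nn.Module):
    def __init__(self):
        super().__init__()
        # torch.jit.Attribute pins the attribute's type for the compiler
        self.names = torch.jit.Attribute([], List[str])

    def forward(self, x: torch.Tensor) -> List[torch.Tensor]:
        # torch.jit.annotate gives an otherwise-ambiguous empty list a type
        outputs = torch.jit.annotate(List[torch.Tensor], [])
        outputs.append(x)
        return outputs

scripted = torch.jit.script(M())
```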
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54485
Reviewed By: SplitInfinity, nikithamalgifb
Differential Revision: D27406843
Pulled By: gmagogsfm
fbshipit-source-id: 98983b9df0f974ed69965ba4fcc03c1a18d1f9f5
Summary:
Reference: https://github.com/pytorch/pytorch/issues/38349
Wrapper around the existing `torch.gather` with broadcasting logic.
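Assuming the new wrapper is `torch.take_along_dim` (the NumPy counterpart in the referenced issue is `np.take_along_axis`), a usage sketch:
```python
import torch

t = torch.tensor([[10, 30, 20], [60, 40, 50]])
idx = torch.argsort(t, dim=1)
sorted_t = torch.take_along_dim(t, idx, dim=1)
# tensor([[10, 20, 30],
#         [40, 50, 60]])
```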
TODO:
* [x] Add Doc entry (see if phrasing can be improved)
* [x] Add OpInfo
* [x] Add test against numpy
* [x] Handle broadcasting behaviour and the case where `dim` is not given.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52833
Reviewed By: malfet
Differential Revision: D27319038
Pulled By: mruberry
fbshipit-source-id: 00f307825f92c679d96e264997aa5509172f5ed1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54645
Had to replace RRef[..] with just RRef in the return signature, since
Sphinx seemed to completely mess up the rendering of RRef[..]
ghstack-source-id: 125024783
Test Plan: View locally.
Reviewed By: SciPioneer
Differential Revision: D27314609
fbshipit-source-id: 2dd9901e79f31578ac7733f79dbeb376f686ed75