Commit Graph

1820 Commits

Author SHA1 Message Date
mikey dagitses
60729d02f1 remove unused nn_path from generate_code (#74563)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74563

This flag is passed inconsistently across the generate_code program
invocations. In any case, nothing consumes it, so we can
safely remove it.

This was removed in #25353.
ghstack-source-id: 152249818

Test Plan: Should be a no-op, rely on CI.

Reviewed By: malfet

Differential Revision: D35053096

fbshipit-source-id: 3ad19e83ca14649b514dc163c3caff6cbd118e14
(cherry picked from commit a43f05bb43553249caac3c3479986cbc45d286ae)
2022-03-31 18:35:30 +00:00
Sherlockk Huang
bbf7e159e0 Implement torch.special.log_ndtr
Implements torch.special.log_ndtr

Issue: https://github.com/pytorch/pytorch/issues/50345

TODO:
- [x] adding proper reference to scipy implementation
- [x] double check if the changes in test/test_unary_ufuncs.py are really necessary
- [x] check setting for UnaryUfuncInfo
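A minimal usage sketch of the new operator (illustrative values; `torch.special.ndtr` is assumed to be available for the comparison):

```python
import torch

x = torch.tensor([-3.0, 0.0, 3.0])
out = torch.special.log_ndtr(x)          # log of the standard normal CDF, element-wise
ref = torch.log(torch.special.ndtr(x))   # equivalent, but less stable for very negative x
print(torch.allclose(out, ref))
```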
cc: @kshitij12345 @mruberry
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74795
Approved by: https://github.com/anjali411
2022-03-29 23:13:37 +00:00
Smark
ab57876420 fix docs error in Autograd Mechanics
Fixes #74682

Pull Request resolved: https://github.com/pytorch/pytorch/pull/74807
Approved by: https://github.com/albanD
2022-03-29 18:32:16 +00:00
PyTorch MergeBot
ea44645c9a Revert "Allow specifying tags for aten operators in native_functions.yaml"
This reverts commit 1dab71ab25.

Reverted https://github.com/pytorch/pytorch/pull/72549 on behalf of https://github.com/malfet
2022-03-28 18:04:38 +00:00
Janakan
923a922b1b Grammatically updated quantization tech doc
Improved PyTorch technical documentation consistency for the "quantization API summary" section.
![Screen Shot 2022-03-19 at 4 07 46 PM](https://user-images.githubusercontent.com/72175053/160317638-51e26ec0-903e-44ba-ba59-aa114d4fda93.png)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/74436
Approved by: https://github.com/albanD
2022-03-28 16:48:25 +00:00
Kurt Mohler
5375b2e994 Resolve int[]? arguments to new OptionalIntArrayRef class
This PR uses the `OptionalArrayRef` template class that was drafted in #64084.

Fixes #44409
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70864
Approved by: https://github.com/ezyang
2022-03-26 01:45:50 +00:00
anjali411
1dab71ab25 Allow specifying tags for aten operators in native_functions.yaml
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72549

Approved by: https://github.com/ezyang
2022-03-25 21:17:52 +00:00
Slava Kovalevskyi
f7317d3c51 Jinja2 for docs/cpp build set to version 3.0
Fixes https://github.com/pytorch/pytorch/issues/74684

Pull Request resolved: https://github.com/pytorch/pytorch/pull/74718
Approved by: https://github.com/malfet
2022-03-24 23:39:26 +00:00
Slava Kovalevskyi
7f996b855c Jinja2 version pinned to 3.0.* (#74690)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/74684

Pull Request resolved: https://github.com/pytorch/pytorch/pull/74690

Reviewed By: malfet

Differential Revision: D35119993

Pulled By: b0noI

fbshipit-source-id: f53b2643000e24662644fda8718a7c4e1bfaa273
(cherry picked from commit 6dfadffff864f1d57eaea088c6dae0b673496bd7)
2022-03-24 21:58:28 +00:00
Kurt Mohler
79ddc72b85 Virtualize <type>Storage classes (#66970)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/66228

cc ezyang bhosmer smessmer ljk53 bdhirsh

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66970

Reviewed By: bdhirsh

Differential Revision: D33245612

Pulled By: ezyang

fbshipit-source-id: 4c61c2cb029e2b94b0e68927c377d3e1c358dd7c
(cherry picked from commit d29fcdfb4bc2cc17b1795d4349e4b56fa0d1cf12)
2022-03-22 23:44:48 +00:00
Paulo Valente
8f55b1d87e docs: expose at::native::unfold (#74224)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/74091

Pull Request resolved: https://github.com/pytorch/pytorch/pull/74224

Reviewed By: zou3519

Differential Revision: D34895555

Pulled By: H-Huang

fbshipit-source-id: 201ffa76681f6940eb8f180409f4e94974ae3e4f
(cherry picked from commit e0f8461d8990e951fb2c1585c871125c9dd3da29)
2022-03-22 19:42:43 +00:00
leslie-fang-intel
3a112ebb57 add autocast cpu doc
As discussed in https://github.com/pytorch/pytorch/issues/55374#issuecomment-968333614, here we update the CPU autocast operation list in the autocast API documentation.
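A minimal sketch of CPU autocast as covered by the updated docs (the model and shapes are made up for illustration; bfloat16 is assumed as the CPU autocast dtype):

```python
import torch

model = torch.nn.Linear(8, 4)
x = torch.randn(2, 8)
with torch.autocast(device_type="cpu", dtype=torch.bfloat16):
    y = model(x)   # ops on the autocast list run in bfloat16, others stay in float32
print(y.dtype)
```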

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68567
Approved by: https://github.com/ezyang
2022-03-22 02:02:43 +00:00
Michael Suo
e5bf87963d Revert D34584878: [pytorch][PR] Add JIT graph fuser for oneDNN Graph API (Preview4)
Test Plan: revert-hammer

Differential Revision:
D34584878 (7dd0823011)

Original commit changeset: ce817aa8cc90

Original Phabricator Diff: D34584878 (7dd0823011)

fbshipit-source-id: a941aaad34f8fe5f0c51f719f9f5c29b811c4d5b
(cherry picked from commit a43262ec7521b1665b02a64d3f279e72ee2344b9)
2022-03-21 23:07:14 +00:00
chunyuan
7dd0823011 Add JIT graph fuser for oneDNN Graph API (Preview4) (#68111)
Summary:
## Description
Preview4 PR of this [RFC](https://github.com/pytorch/pytorch/issues/49444).

On the basis of https://github.com/pytorch/pytorch/pull/50256, the below improvements are included:

- The [preview4 release branch](https://github.com/oneapi-src/oneDNN/releases/tag/graph-v0.4.1) of the oneDNN Graph API is used
- The fuser now works with the profiling graph executor. We have inserted type check nodes to guard the profiled tensor properties.

### User API:
The optimization pass is disabled by default. Users could enable it by:
```
torch.jit.enable_onednn_fusion(True)
```
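A hedged end-to-end sketch (the model is a stand-in; only Float32 inference on Linux is targeted by this PR):

```python
import torch

torch.jit.enable_onednn_fusion(True)  # opt in to the oneDNN Graph fuser

model = torch.nn.Sequential(torch.nn.Conv2d(3, 8, 3), torch.nn.ReLU()).eval()
with torch.no_grad():
    scripted = torch.jit.freeze(torch.jit.script(model))
    x = torch.randn(1, 3, 32, 32)
    # a few warm-up runs let the profiling executor record shapes before fusion kicks in
    for _ in range(3):
        out = scripted(x)
```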

### Performance:
[pytorch/benchmark](https://github.com/pytorch/benchmark) tool is used to compare the performance:
- SkyLake 8180 (1 socket of 28 cores):

  ![image](https://user-images.githubusercontent.com/65992142/151162305-05e44425-a24e-4d5e-94e1-743b40b87a8c.png)

- SkyLake 8180 (single thread):

  ![image](https://user-images.githubusercontent.com/65992142/151162528-69f90b79-d08d-46b8-8775-d80a6ccbce8a.png)
 \* By mapping hardswish to oneDNN Graph, it’s 8% faster than PyTorch JIT (NNC + OFI)
  \** We expect performance gain after mapping transpose, contiguous & view to oneDNN graph ops

### Directory structure of the integration code
Fuser-related code is placed under:
```
torch/csrc/jit/codegen/onednn/
```

Optimization pass registration is done in:
```
torch/csrc/jit/passes/onednn_graph_fuser.h
```

CMake for the integration code is:
```
caffe2/CMakeLists.txt
```

## Limitations

- In this PR, the optimization is supported only on the Linux platform. Support for Windows and macOS will be enabled as a next step.
- We have only optimized the inference use case.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68111

Reviewed By: eellison

Differential Revision: D34584878

Pulled By: malfet

fbshipit-source-id: ce817aa8cc9052ee9ed930c9cf66be83449e61a4
(cherry picked from commit cd17683aa7d9c0947df45a1ab53627feff795587)
2022-03-21 22:12:19 +00:00
Jaewon Lee
11ea09effc [CUDACachingAlloc/GPUInference] Implement garbage collection without GPU sync (#74261)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74261

### Goal
Implement a cheap way to reclaim GPU memory (garbage collection) without incurring GPU sync.

### Why do we need this?
Currently, there are only two ways to reclaim GPU memory blocks already assigned to a particular stream.

- `release_available_cached_blocks(params)`: Free blocks exceeding `CachingAllocatorConfig::max_split_size()` until we can satisfy the request.

Issue: If `max_split_size` is unset (the default), this function is a no-op. Even if it is set, reclamation is quite conservative (e.g., it never frees blocks under max_split_size).

- `release_cached_blocks()`: Waits for all in-flight events and then reclaims blocks.

Issue: Waiting for all events is very expensive, as it will likely stall all GPU operations. Many GPU applications that do not properly handle potential GPU throttling would suffer or crash.

### Proposed idea
- If the garbage collection threshold is set, try to reclaim some memory blocks *without* synchronization. It should be safe to do so, as `release_available_cached_blocks` essentially does the same thing (but less aggressively).
- GC is triggered only when we fail to serve a `malloc` request from the block pool. No need to free blocks when the block pool is functioning just fine.
- Prioritize reclaiming blocks that haven't been reused for a long time. Reclamation stops once the used memory capacity < threshold.
- This code path is totally optional; by default it won't be invoked.
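A hypothetical configuration sketch for the threshold described above (the `garbage_collection_threshold` key and its value are assumptions; set it before the allocator is first used):

```python
import os

os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "garbage_collection_threshold:0.8"

import torch  # the caching allocator reads this setting when CUDA is initialized
```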

Test Plan:
- Unit tests
- Manually checked that the GPU memory usage stays at the level indicated by the garbage collection threshold; if not, the caching allocator at least keeps trying to free blocks.

Reviewed By: jianyuh

Differential Revision: D34482514

fbshipit-source-id: d5eae62ac60b94b0bca851f9d233a092d086e3c2
(cherry picked from commit 05780f1ed4b176f05e765b2411c9eaa2eaeb48b0)
2022-03-21 18:46:02 +00:00
BowenBao
54a6942f8d [ONNX] ONNX Exporter logging (#71342)
Summary:
Add an ONNX exporter logging facility, supporting both the C++ and Python logging APIs. Logging can be turned on/off, and the logging output stream can be set to either `stdout` or `stderr`.

A few other changes:
* When an exception is raised in a pass, the current IR graph being processed will be logged.
* When an exception is raised from `_jit_pass_onnx` (the pass that converts nodes from the `ATen` namespace to `ONNX`), both the ATen IR graph and the ONNX IR graph under construction will be logged.
* The exception message for ConstantFolding is truncated to avoid being too verbose.
* Update the final printed IR graph to include the node name from the ONNX ModelProto as a node attribute. Torch IR nodes do not have names, so adding this to the printed IR graph helps debugging.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/71342

Reviewed By: msaroufim

Differential Revision: D34433473

Pulled By: malfet

fbshipit-source-id: 4b137dfd6a33eb681a5f2612f19aadf5dfe3d84a
(cherry picked from commit 67a8ebed5192c266f604bdcca931df6fe589699f)
2022-03-17 19:40:03 +00:00
Banit Agrawal
ac3effd150 [PyTorch GPU Allocator] Better use of blocks with rounding of allocation sizes (#74213)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74213

In the current CUDACachingAllocator, sizes are rounded up in multiples of a block size of 512, which works well for smaller sizes. However, for large sizes we can end up with many differently sized blocks in the larger pool. This is problematic with variable batch sizes (e.g., 1001, 1021, 1023): each maps to a different block size, creating blocks of different sizes, leaving lots of unused blocks, and wasting GPU memory capacity.

This diff adds a rounding approach for allocation sizes. It rounds the size up to the nearest power-of-2 division, and the number of divisions can be changed with an environment variable.

   For example, if we need to round up a size of 1200 and the number of divisions is 4,
   the size 1200 lies between 1024 and 2048; dividing that interval into 4 gives
   the values 1024, 1280, 1536, and 1792. So the function will return 1280 as the
   nearest power-of-2 division ceiling.

env setting:
   export PYTORCH_CUDA_ALLOC_CONF=roundup_power2_divisions:4
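A small sketch of the rounding rule described above (illustrative only, not the allocator code):

```python
import math

def roundup_power2_division(size: int, divisions: int = 4) -> int:
    # Round size up to the nearest of `divisions` equally spaced points
    # between the surrounding powers of two.
    lower = 1 << (size.bit_length() - 1)   # largest power of 2 <= size
    if lower == size:
        return size                        # already a power of 2
    step = lower // divisions              # width of each division
    return lower + step * math.ceil((size - lower) / step)

assert roundup_power2_division(1200, 4) == 1280
```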
ghstack-source-id: 151446017

Reviewed By: ezyang

Differential Revision: D34868036

fbshipit-source-id: 494785add16e6b37c920dcb5a2b81d4c637b554a
(cherry picked from commit 548454ccacbd8700e7ffd2d762e40b4ba37abbae)
2022-03-16 02:53:53 +00:00
Ke Wen
1f04a00ccf [PyTorch Distributed] Update documentation about NCCL environment variables (#74006)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74006

updated recommendations about environment variables to use during debugging
and performance tuning

Test Plan: `make html`

Reviewed By: rohan-varma

Differential Revision: D34767454

fbshipit-source-id: 08cd58469bf72b58702e50e82020fa19b43b5911
(cherry picked from commit ac7e6630f8043f85d3d16be17c6a8ad1ebb2990c)
2022-03-11 23:57:17 +00:00
Alban Desmaison
734281c3d6 Cleanup all module references in doc (#73983)
Summary:
Working towards https://docs.google.com/document/d/10yx2-4gs0gTMOimVS403MnoAWkqitS8TUHX73PN8EjE/edit?pli=1#

This PR:
- Ensure that all the submodules are listed in an rst file (this ensures they are considered by the coverage tool)
- Remove some long-deprecated code that just errors out on import
- Remove the allow list altogether to ensure nothing gets added back there

Pull Request resolved: https://github.com/pytorch/pytorch/pull/73983

Reviewed By: anjali411

Differential Revision: D34787908

Pulled By: albanD

fbshipit-source-id: 163ce61e133b12b2f2e1cbe374f979e3d6858db7
(cherry picked from commit c9edfead7a01dc45bfc24eaf7220d2a84ab1f62e)
2022-03-10 22:26:29 +00:00
Alban Desmaison
238f7d9cbf rename config module file to work with gh pages better
Fixes https://github.com/pytorch/pytorch/issues/62018

Pull Request resolved: https://github.com/pytorch/pytorch/pull/74038
Approved by: https://github.com/mruberry, https://github.com/seemethere
2022-03-10 20:41:44 +00:00
Rohit Goswami
979a78f8b2 Sphinx panel
Fixes https://github.com/pytorch/pytorch/issues/73835.

The full context for this is detailed in the issue, but briefly:

- Adds `sphinx-panel`

Other PRs will demonstrate usage.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73836
Approved by: https://github.com/albanD
2022-03-07 14:50:09 +00:00
Pritam Damania
71aa3ab020 Add note in RPC docs about retries. (#73601)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73601

Some users had questions about how the RPC framework deals with
failures and whether we retry. Adding a note about this to our docs to
elaborate on our current behavior and why we chose that approach.
ghstack-source-id: 150359866

Test Plan: view docs.

Reviewed By: mrshenli

Differential Revision: D34560199

fbshipit-source-id: ee33ceed7fa706270d4ca5c8fcff7535583490ff
(cherry picked from commit 954a906240cc40aacf08ca13f6554a35303a678a)
2022-03-03 00:29:31 +00:00
Ren Pang
e8b10b6e34 fix wrong indexing of class names in docs
Fixes #73631

Locally built and tested.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73632
Approved by: jbschlosser
2022-03-02 22:21:21 +00:00
Christian Puhrsch
484c0de670 Minimal NestedTensor (#72881)
Summary:
This PR adds a minimal version of a NestedTensor. It introduces the general harness that future development can be built around.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/72881

Reviewed By: albanD

Differential Revision: D34259177

Pulled By: cpuhrsch

fbshipit-source-id: 0245c36f603424e20f3b09651043c207f526d760
(cherry picked from commit 10764e8d427f29b364567e4cbc86ed73c3933158)
2022-03-02 16:31:51 +00:00
Nikita Shulga
8ac7393565 Revert D33767740: [pytorch][PR] Sparse CSR CPU: cuSolverSP backend for linalg.solve
Test Plan: revert-hammer

Differential Revision:
D33767740 (199d9a992c)

Original commit changeset: a945f065210c

Original Phabricator Diff: D33767740 (199d9a992c)

fbshipit-source-id: b7934df18118f8d6d5f165deb5aae9887953ae43
(cherry picked from commit d3ddbb021b227e3638f6f7c22c6eadfa73695e31)
2022-03-01 18:33:23 +00:00
Kushashwa Ravi Shrimali
199d9a992c Sparse CSR CPU: cuSolverSP backend for linalg.solve (#71399)
Summary:
This PR introduces the `cuSolverSP` backend for `linalg.solve` with sparse CSR input matrices. The motivation comes from the issue: https://github.com/pytorch/pytorch/issues/69538.

`cuSolver` provides the [`cusolverSp<t>csrlsvluHost`](https://docs.nvidia.com/cuda/cusolver/index.html#cusolver-lt-t-gt-csrlsvlu) API; a few things to note:

1. As mentioned in the documentation: `only CPU (Host) path is provided.` From the profiling, there doesn't seem to be any GPU kernel launch for optimization; please see the profiling below.
2. Since only the `host` path is provided, the CPU path uses `csrlsvluHost` (but this requires PyTorch to be installed/built with CUDA support).
3. The documentation mentions that reordering helps performance, but it isn't clear how much it affects it. There are several reordering options, so we stick to `reorder = 0` as the default choice.

`cuSolver` has [`csrlsvqr`](https://docs.nvidia.com/cuda/cusolver/index.html#cusolver-lt-t-gt-csrlsvqr) function which provides a `device` path to solve the linear system. This function is used for the CUDA path in this PR.

**Gist:**

For CPU Path: we call [`csrlsvluHost` function of cuSolver](https://docs.nvidia.com/cuda/cusolver/index.html#cusolver-lt-t-gt-csrlsvlu).
For CUDA Path: we call [`csrlsvqr` function of cuSolver](https://docs.nvidia.com/cuda/cusolver/index.html#cusolver-lt-t-gt-csrlsvqr).

**Profiling:** (on a sparse input tensor of size 1000 x 1000, with a vector of length 1000), for the `csrlsvlu` function (to show that there is no GPU optimization)

```
==3999651== Profiling result:
            Type  Time(%)      Time     Calls       Avg       Min       Max  Name
 GPU activities:  100.00%  2.1440us         1  2.1440us  2.1440us  2.1440us  [CUDA memcpy HtoD]
      API calls:   99.72%  1.07199s         9  119.11ms     500ns  1.07164s  cudaFree
                    0.11%  1.2182ms       398  3.0600us     140ns  137.94us  cuDeviceGetAttribute
                    0.06%  674.45us         4  168.61us  165.50us  173.64us  cuDeviceTotalMem
                    0.03%  357.07us         4  89.268us  2.7800us  201.89us  cudaMalloc
                    0.03%  309.29us         1  309.29us  309.29us  309.29us  cudaGetDeviceProperties
                    0.01%  160.47us       332     483ns     350ns  3.3300us  cudaFuncSetAttribute
                    0.01%  115.12us         4  28.780us  26.290us  33.410us  cuDeviceGetName
                    0.00%  28.591us         5  5.7180us     440ns  16.921us  cudaGetDevice
                    0.00%  22.061us         4  5.5150us     871ns  18.690us  cudaDeviceSynchronize
                    0.00%  20.370us        18  1.1310us     410ns  6.9900us  cudaEventDestroy
                    0.00%  16.390us         1  16.390us  16.390us  16.390us  cudaMemcpy
                    0.00%  11.540us         2  5.7700us  1.4900us  10.050us  cuDeviceGetPCIBusId
                    0.00%  10.510us        18     583ns     430ns  1.6200us  cudaEventCreateWithFlags
                    0.00%  7.9100us        21     376ns     290ns     700ns  cudaDeviceGetAttribute
                    0.00%  1.4300us         6     238ns     150ns     590ns  cuDeviceGet
                    0.00%  1.2200us         4     305ns     190ns     500ns  cuDeviceGetCount
                    0.00%     900ns         1     900ns     900ns     900ns  cuInit
                    0.00%     860ns         4     215ns     180ns     260ns  cuDeviceGetUuid
                    0.00%     240ns         1     240ns     240ns     240ns  cuDriverGetVersion
                    0.00%     230ns         1     230ns     230ns     230ns  cudaGetDeviceCount
```

Script:

```python
import torch

def solve(x, other, out):
    torch.linalg.solve(x, other, out=out)

if __name__ == "__main__":
    dense_inp = torch.randn((1000, 1000), dtype=torch.float64)
    # Set 50% of the values to 0 randomly
    dense_inp = torch.nn.functional.dropout(dense_inp, p=0.5)
    sparse_inp = dense_inp.to_sparse_csr()

    other = torch.randint(100, (1000,), dtype=torch.float64)
    out = torch.randint(1, (1000,), dtype=torch.float64)

    solve(sparse_inp, other, out)
```

The following error is raised when the function is used on a CPU device with PyTorch built/installed without CUDA support:

```python
/home/krshrimali/pytorch/torch/autograd/profiler.py:151: UserWarning: CUDA is not available, disabling CUDA profiling
  warn("CUDA is not available, disabling CUDA profiling")
Traceback (most recent call last):
  File "/home/krshrimali/pytorch/test_sp.py", line 17, in <module>
    solve(x, other, out)
  File "/home/krshrimali/pytorch/test_sp.py", line 5, in solve
    torch.linalg.solve(x, other, out=out)
RuntimeError: PyTorch was not built with CUDA support. Please use PyTorch built CUDA support
```

**Performance Comparison** (vs SciPy's [`scipy.sparse.linalg.spsolve`](https://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.linalg.spsolve.html)):

Time taken by `scipy.sparse.linalg.spsolve` : 0.595 seconds

On CPU: Time taken by `torch.linalg.solve` : 4.565 seconds
On CUDA: Time taken by `torch.linalg.solve`: 1.838 seconds

The inputs are of dimensions: (17281, 17281) and (17281, 1), and were taken from https://math.nist.gov/MatrixMarket/extreme.html.

Thanks to IvanYashchuk for helping me with the PR, and guiding me through it.

cc: IvanYashchuk pearu nikitaved cpuhrsch

cc nikitaved pearu cpuhrsch

Pull Request resolved: https://github.com/pytorch/pytorch/pull/71399

Reviewed By: VitalyFedyunin

Differential Revision: D33767740

Pulled By: cpuhrsch

fbshipit-source-id: a945f065210cd719096eb8d7cdbf8e8937c2fce9
(cherry picked from commit f4f35c17da414e1ca6c6d91402933521857aa1ea)
2022-03-01 05:32:35 +00:00
Vasiliy Kuznetsov
01bd6f4357 pytorch: fix typo in quantization docs (#73511)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73511

Fixes typo in describing the `torch.qint32` data type.

Test Plan: CI

Reviewed By: andrewor14

Differential Revision: D34522741

Pulled By: vkuzo

fbshipit-source-id: f05f8440d9708281213a4b3736e8f59199dd7b1a
(cherry picked from commit ca9e598d60cac016e58fda9cd0f329ca412ec36b)
2022-02-28 23:11:52 +00:00
Peter Bell
f437ca6e8e Remove legacy tensor constructors for complex dtypes
PR #72405 added four new types to the public python API:
`torch.ComplexFloatTensor`, `torch.ComplexDoubleTensor`,
`torch.cuda.ComplexFloatTensor` and `torch.cuda.ComplexDoubleTensor`.

I believe this was unintentional, and a clarifying comment as to the
purpose of `all_declared_types` is needed to avoid this in the future.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/73370
2022-02-28 15:13:44 +00:00
Philip Meier
c6f1bbc0ac promote torch.testing to stable (#73348)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/73348

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D34457727

Pulled By: mruberry

fbshipit-source-id: 2cc812b643e0d1e753bead2751ee79b3f03fde20
(cherry picked from commit bcdaca1a019a679b8b274e2fb5f19bfd08874ce9)
2022-02-25 06:30:31 +00:00
Jacob Hepkema
91261feb7b Add SoftplusTransform (#52300)
Summary:
This pull request introduces `SoftplusTransform` to `torch.distributions.transforms`. `SoftplusTransform` transforms via the mapping `Softplus(x) = log(1 + exp(x))`. Note that the transform is different to [`torch.nn.Softplus`](https://pytorch.org/docs/stable/generated/torch.nn.Softplus.html#torch.nn.Softplus), as that has additional `beta` and `threshold` parameters. Inverse and `log_abs_det_jacobian` for a more complex `SoftplusTransform` can be added in the future.
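
A minimal usage sketch (assuming `SoftplusTransform` behaves like the other transforms in `torch.distributions.transforms`):

```python
import torch
from torch.distributions.transforms import SoftplusTransform

t = SoftplusTransform()
x = torch.tensor([-2.0, 0.0, 3.0])
y = t(x)                            # log(1 + exp(x)), always positive
x_back = t.inv(y)                   # the inverse transform recovers x
ldj = t.log_abs_det_jacobian(x, y)  # log|dy/dx| = log(sigmoid(x))
print(torch.allclose(x, x_back))
```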

vitkl fritzo

Addresses the issue discussed here: [pyro issue 855](https://github.com/pyro-ppl/numpyro/issues/855)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/52300

Reviewed By: albanD, ejguan

Differential Revision: D34082655

Pulled By: neerajprad

fbshipit-source-id: 6114e74ee5d73c1527191bed612a142d691e2094
(cherry picked from commit a181a3a9e53a34214a503d38760ad7778d08a680)
2022-02-25 02:30:03 +00:00
Can Balioglu
0e7a7a5fe7 Add documentation for c10d log levels (#73361)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73361

This PR adds the documentation for the newly introduced `TORCH_CPP_LOG_LEVEL` and how it can be used along with `TORCH_DISTRIBUTED_DEBUG` to adjust the log level of c10d.
ghstack-source-id: 149874995
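A hypothetical usage sketch (the values `INFO` and `DETAIL` are assumptions; both variables must be set before c10d initializes its logging):

```python
import os

os.environ.setdefault("TORCH_CPP_LOG_LEVEL", "INFO")
os.environ.setdefault("TORCH_DISTRIBUTED_DEBUG", "DETAIL")

import torch.distributed as dist
# dist.init_process_group(...) as usual; c10d now emits more detailed logs.
```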

Test Plan: Locally rendered and checked the documentation.

Reviewed By: rohan-varma

Differential Revision: D34452352

fbshipit-source-id: ecb54590f3030ddef9921a7152ca9f7fc9438345
(cherry picked from commit f4c7c6f3b27dbd3006686cf26a6e9e53cd2c8f09)
2022-02-24 20:38:15 +00:00
Edgar Andrés Margffoy Tuay
86deecd7be Check clang++/g++ version when compiling CUDA extensions (#63230)
Summary:
See https://github.com/pytorch/pytorch/issues/55267

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63230

Reviewed By: soulitzer

Differential Revision: D34159119

Pulled By: malfet

fbshipit-source-id: 6eef7582388bf6a42dcc1d82b6e4b1f40f418dd7
(cherry picked from commit 2056d0a0be7951602de22f8d3b4efc28dd71b6c2)
2022-02-24 08:32:32 +00:00
Can Balioglu
e1db2f13ce Refactor TORCH_DISTRIBUTED_DEBUG implementation (#73166)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73166

This PR refactors, cleans up, and optimizes the implementation of `TORCH_DISTRIBUTED_DEBUG`. It also introduces three new user APIs: `get_debug_level()`, `set_debug_level()`, and `set_debug_level_from_env()` to retrieve and modify the debug level after a process has started.
ghstack-source-id: 149778566
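A hedged sketch of the new APIs (their exposure under `torch.distributed` and the `DebugLevel` enum are assumptions):

```python
import torch.distributed as dist

level = dist.get_debug_level()                # current debug level
dist.set_debug_level(dist.DebugLevel.DETAIL)  # raise verbosity at runtime
dist.set_debug_level_from_env()               # re-read TORCH_DISTRIBUTED_DEBUG
```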

Test Plan: Run the existing unit tests.

Reviewed By: rohan-varma

Differential Revision: D34371226

fbshipit-source-id: e18443b411adcbaf39b2ec999178c198052fcd5b
(cherry picked from commit 26d6bb1584b83a0490d8b766482656a5887fa21d)
2022-02-24 02:33:05 +00:00
Nikita Karetnikov
75db05c3fd Check if the iterator is valid before dereferencing it (#72405)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72405

Fixes #71674.

This shouldn't segfault now:

```
import torch
d = torch.complex64
torch.set_default_dtype(d)
```

Test Plan: Imported from OSS

Reviewed By: jbschlosser

Differential Revision: D34423660

Pulled By: anjali411

fbshipit-source-id: cac92a6f56846f2c0727a120b5f568aa75baa21e
(cherry picked from commit eaab813a0fddced24303b3bd50e4fcdba1516e46)
2022-02-23 18:33:46 +00:00
Nikita Shulga
cfb6c942fe scatter_reduce documentation (#73125)
Summary:
Reland of https://github.com/pytorch/pytorch/issues/68580 (which was milestoned for 1.11) plus a partial revert of https://github.com/pytorch/pytorch/pull/72543

Pull Request resolved: https://github.com/pytorch/pytorch/pull/73125

Reviewed By: bdhirsh

Differential Revision: D34355217

Pulled By: malfet

fbshipit-source-id: 325ecdeaf53183d653b44ee5e6e8839ceefd9200
(cherry picked from commit 71db31748a)
2022-02-22 19:33:46 +00:00
Gary Miguel
dbac0f5cdc Update persons of interest for ONNX (#72072)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/72072

Reviewed By: H-Huang

Differential Revision: D34230534

Pulled By: malfet

fbshipit-source-id: ed5abdfacf0d9628c6cc99957fa578d71a79d025
(cherry picked from commit 4669c346c4)
2022-02-16 23:01:13 +00:00
Elias Ellison
f8a2efc190 Make fusion strategy api public (#72639)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/72639

Test Plan: Imported from OSS

Reviewed By: soulitzer

Differential Revision: D34159123

Pulled By: eellison

fbshipit-source-id: 27e4d9694a83e8d6829009882715be4308c96a9f
(cherry picked from commit 1cadcd2f75)
2022-02-16 03:45:15 +00:00
Kurt Mohler
8e7fe87630 Rename Typed/UntypedStorage to _Typed/_UntypedStorage (#72540)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/72540

Reviewed By: jbschlosser

Differential Revision: D34216823

Pulled By: bdhirsh

fbshipit-source-id: 1bc9930ab582771ebf02308e035576cd1a0dbe47
(cherry picked from commit 329238f612)
2022-02-15 23:53:01 +00:00
Nikita Shulga
cb00d9601c Revert D33800694: [pytorch][PR] scatter_reduce documentation
Test Plan: revert-hammer

Differential Revision:
D33800694 (12a1df27c7)

Original commit changeset: 2e09492a29ce

Original Phabricator Diff: D33800694 (12a1df27c7)

fbshipit-source-id: 2a4775c0042551607fe3ab77f5bfe9f2e4b6b78e
(cherry picked from commit 4bd6c0d2bb)
2022-02-15 20:10:26 +00:00
rusty1s
12a1df27c7 scatter_reduce documentation (#68580)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/63780 (part 2)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68580

Reviewed By: atalman

Differential Revision: D33800694

Pulled By: malfet

fbshipit-source-id: 2e09492a29cef115a7cca7c8209d1dcb6ae24eb9
(cherry picked from commit 696ff75940)
2022-02-15 19:43:54 +00:00
Huamin Li
32dd4a8639 move fx_acc out of pytorch core (#72803)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72803

as title

Reviewed By: jfix71

Differential Revision: D34101788

fbshipit-source-id: a9fd84671929af21405c049603e9895ec68de3d8
(cherry picked from commit e98fd1c32d)
2022-02-15 16:13:43 +00:00
mattip
fb4504da2f DOC: release documentation version should be major.minor (#72706)
Summary:
Fixes pytorch/pytorch.github.io#929

The pytorch doc team would like to move to only major.minor documentation at https://pytorch.org/docs/versions.html, not major.minor.patch. This has been done in the CI scripts, but the generated documentation still has the patch version. Remove it when building RELEASE documentation. This allows simplifying the logic, using `'.'.join(torch_version.split('.')[:2])` since we no longer care about trimming off the HASH: it automatically gets removed.
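A quick illustration of the trimming expression (version strings are examples):

```python
for torch_version in ("1.11.0", "1.12.0a0+git1234abc"):
    print('.'.join(torch_version.split('.')[:2]))  # -> 1.11, 1.12
```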

holly1238, brianjo

Pull Request resolved: https://github.com/pytorch/pytorch/pull/72706

Reviewed By: samdow

Differential Revision: D34215815

Pulled By: albanD

fbshipit-source-id: 8437036cc6636674d9ab8b1666f37b561d0527e1
(cherry picked from commit d8caf988f9)
2022-02-14 23:37:43 +00:00
Rohit Goswami
801abc0cdd MAINT, DOC: Trivial spellings and warnings (#72745)
Summary:
Fixes N/A.
Just minor annoyances.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/72745

Reviewed By: samdow

Differential Revision: D34216016

Pulled By: albanD

fbshipit-source-id: b65600b50e41a1dd7bf7d076b0dd3e2d1c99caf9
(cherry picked from commit b959392a5f)
2022-02-14 21:55:19 +00:00
Kurt Mohler
47c6993355 Update from_dlpack tests and documentation (#70543)
Summary:
Part of https://github.com/pytorch/pytorch/issues/58742
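A minimal sketch of the DLPack round trip that the updated tests and docs cover (illustrative tensor values):

```python
import torch
from torch.utils.dlpack import to_dlpack, from_dlpack

t = torch.arange(4.0)
capsule = to_dlpack(t)     # export t through the DLPack protocol
u = from_dlpack(capsule)   # zero-copy: u shares t's memory
u[0] = 42.0
print(t[0].item())         # 42.0, since the storage is shared
```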

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70543

Reviewed By: soulitzer

Differential Revision: D34172475

Pulled By: mruberry

fbshipit-source-id: d498764b8651a8b7a19181b3421aeebf28a5db2b
(cherry picked from commit 05332f164c)
2022-02-14 03:35:17 +00:00
Felix Divo
340fae4363 [Doc] Better formatting in autograd.rst (#72586)
Summary:
See title.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/72586

Reviewed By: soulitzer

Differential Revision: D34177704

Pulled By: albanD

fbshipit-source-id: 1adf6ebed4f64ec4d8fff160df300c8e6ee528ea
(cherry picked from commit bbb586d67d)
2022-02-11 22:46:10 +00:00
BowenBao
9257de7efa [ONNX] Minor doc update (#69501) (#69550)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69550

Fix the wiki URL.

Also minor reorganization in onnx.rst.

Test Plan: Imported from OSS

Reviewed By: msaroufim

Differential Revision: D32994269

Pulled By: malfet

fbshipit-source-id: 112acfe8b7c778d7e3c2cef684023fdaf2c6ec9c
(cherry picked from commit f0787fabde)
2022-02-11 22:05:15 +00:00
BowenBao
ce5b155ccb [ONNX] Link to the wiki (#68505) (#72663)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72663

Test Plan: Imported from OSS

Reviewed By: msaroufim

Differential Revision: D34150535

Pulled By: malfet

fbshipit-source-id: 230b786f6235549fff764083eac2c3744c6bff88

Co-authored-by: Gary Miguel <garymiguel@microsoft.com>
(cherry picked from commit c848c582d1)
2022-02-11 22:05:15 +00:00
Felix Divo
25fba4a019 [DOC] Add link to "double backward" from "extending pytorch" page (#72584)
Summary:
It is probably the most user friendly to link to that (lesser known?) feature.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/72584

Reviewed By: soulitzer

Differential Revision: D34173999

Pulled By: albanD

fbshipit-source-id: 99fff7a55412faf54888f8317ab2388f4d7d30e4
(cherry picked from commit 2191ee7657)
2022-02-11 20:34:13 +00:00
BowenBao
04c5d978b9 [ONNX] Refactor _run_symbolic_function (#67573) (#68491)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68491

* Allows implementing symbolic functions for domains other than `aten`, for example `prim`, in symbolic_opset#.py.
* Allows symbolic functions to access extra context if needed, through `SymbolicFunctionState`.
  * In particular, the `prim::PythonOp` special case can access the node without needing to pass it through the inputs. Updates will be made downstream, and in a follow-up PR we will remove the previous workaround in the exporter.
* `prim::Loop`, `prim::If`, etc. are now moved out of `_run_symbolic_function` in utils.py and into symbolic_opset9.py.

Motivation for this change:
- Better maintainability and reduced complexity. It is easier to add symbolic functions for operators, both simple and complex ones (that need additional context), without the former needing to know about the existence of the latter.
- The design idea was long outdated. prim ops are no longer rare special cases, and they shouldn't all be handled inside `_run_symbolic_function`; as a result, that function has become too clumsy. There were also prim op symbolics added in symbolic_opset#.py with the signature `prim_[opname]`, creating separation and confusion.

Test Plan: Imported from OSS

Reviewed By: jansel

Differential Revision: D32483782

Pulled By: malfet

fbshipit-source-id: f9affc31b1570af30ffa6668da9375da111fd54a

Co-authored-by: BowenBao <bowbao@microsoft.com>
(cherry picked from commit 1e04ffd2fd)
2022-02-11 18:35:35 +00:00
Mike Ruberry
2fa34fb7b9 Revert D34154832: [pytorch][PR] Add multi_head_attention_forward to functional rst docs
Test Plan: revert-hammer

Differential Revision:
D34154832 (bafaf0d610)

Original commit changeset: 7279d05f31d4

Original Phabricator Diff: D34154832 (bafaf0d610)

fbshipit-source-id: fcbc896b25f3b51a7ce0c5dc1dca652f57f7218c
(cherry picked from commit afa53acdfd)
2022-02-11 05:08:46 +00:00