Commit Graph

1679 Commits

Author SHA1 Message Date
PyTorch MergeBot
4ebc4890dd Revert "Add linalg.lu_solve"
This reverts commit fc5b4a5a33.

Reverted https://github.com/pytorch/pytorch/pull/72935 on behalf of https://github.com/malfet
2022-05-09 19:12:30 +00:00
Alban Desmaison
d5210a4269 Add gradient choice detail to autograd doc
Trying to clarify what our backward functions should compute.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76898
Approved by: https://github.com/soulitzer, https://github.com/Lezcano
2022-05-06 21:12:25 +00:00
Sherlockk Huang
8b6a78f39f Python Interface for Jiterator
This PR allows users to author a CUDA kernel in Python.

```
import torch
from torch.cuda.jiterator import create_jit_fn

code_string = "template <typename T> T my_kernel(T x, T y, T alpha) { return  -x * y + x - y + alpha; }"
jitted_fn = create_jit_fn(code_string, alpha=0)

a = torch.rand(3, device='cuda')
b = torch.rand(3, device='cuda')
result = jitted_fn(a, b, alpha=1.0)
```

Limitations:
- Only supports elementwise kernels
- 1~8 tensor inputs (empty input, e.g. factory methods, is not supported)
- input tensors must live on a CUDA device
- CPU Scalar is not supported
- kwargs must be pre-declared when calling create_jit_fn
- kwargs must be convertible to at::Scalar, one of float64, int64_t, bool (complex is not supported for now)

TODOs:
- [x] consolidate union and c10::variant implementation
- [x] plug into existing op testing framework
- [ ] rename files, place files in the right folder
- [ ] place util functions in the right file
- [x] enforce assumptions in python interface, e.g. at most 8 inputs, kwargs types
- [x] Add user-facing documentation
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76394
Approved by: https://github.com/mruberry
2022-05-06 18:44:28 +00:00
zhoubo
fd6991e714 add trunc_normal_ function to doc of torch.nn.init
Fixes #72517

Pull Request resolved: https://github.com/pytorch/pytorch/pull/76896
Approved by: https://github.com/jbschlosser
2022-05-06 14:33:08 +00:00
lezcano
621ff0f973 Add linalg.vander
This PR adds `linalg.vander`, the linalg version of `torch.vander`.

We add autograd support and support for batched inputs.

We also take this chance to improve the docs (TODO: Check that they
render correctly!) and add an OpInfo.

**Discussion**: The current default for the `increasing` kwarg is extremely
odd as it is the opposite of the classical definition (see
[wiki](https://en.wikipedia.org/wiki/Vandermonde_matrix)). This is
reflected in the docs, where I make explicit both the odd default that we
use and the classical definition. See also [this stackoverflow
post](https://stackoverflow.com/a/71758047/5280578), which shows how
people are confused by this default.

My take on this would be to correct the default to be `increasing=True`
and document the divergence with NumPy (as we do for other `linalg`
functions) as:

- It is what people expect
- It gives the correct determinant called "the Vandermonde determinant" rather than (-1)^{n(n-1)/2} times the Vandermonde det (ugh).
- [Minor] It is more efficient (no `flip` needed)
- Since it's under `linalg.vander`, it's strictly not a drop-in replacement for `np.vander`.
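
To make the two column orderings concrete, here is a minimal pure-Python sketch. The helper `vander` below is hypothetical and written only for illustration; it is not the PyTorch implementation. With `increasing=True` the columns are x^0, x^1, ..., which matches the classical definition; with `increasing=False` (the `np.vander` default) the same columns appear in reverse order.

```python
def vander(x, n=None, increasing=False):
    """Build a Vandermonde matrix as a list of rows (illustrative helper)."""
    n = len(x) if n is None else n
    # Exponents run 0..n-1 for the classical form, n-1..0 otherwise.
    powers = range(n) if increasing else range(n - 1, -1, -1)
    return [[xi ** p for p in powers] for xi in x]


x = [1, 2, 3]
classical = vander(x, increasing=True)   # columns: x^0, x^1, x^2
flipped = vander(x, increasing=False)    # columns: x^2, x^1, x^0

# The decreasing form is the classical one with its columns reversed,
# which is why an extra flip is needed to produce it.
assert flipped == [row[::-1] for row in classical]
```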

We will deprecate `torch.vander` in a PR after this one in this stack
(once we settle on what's the correct default).

Thoughts? mruberry

cc kgryte rgommers as they might have some context for the defaults of
NumPy.

Fixes https://github.com/pytorch/pytorch/issues/60197

Pull Request resolved: https://github.com/pytorch/pytorch/pull/76303

Approved by: https://github.com/albanD, https://github.com/mruberry
2022-05-06 08:44:14 +00:00
PyTorch MergeBot
8ac6b0a010 Revert "Contribution- Grammatical Corrections in the documentation"
This reverts commit a0ebf1d386.

Reverted https://github.com/pytorch/pytorch/pull/57411 on behalf of https://github.com/malfet
2022-05-05 23:13:10 +00:00
Sanskar
a0ebf1d386 Contribution- Grammatical Corrections in the documentation
Fixes #{issue number}

Pull Request resolved: https://github.com/pytorch/pytorch/pull/57411
Approved by: https://github.com/svekars, https://github.com/holly1238, https://github.com/malfet
2022-05-05 22:35:08 +00:00
lezcano
fc5b4a5a33 Add linalg.lu_solve
This PR adds `linalg.lu_solve`. While doing so, I found a bug in MAGMA
when calling the batched MAGMA backend with trans=True. We work around
that by solving two triangular systems instead.
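
The idea of replacing one LU solve with two triangular solves can be sketched in plain Python. This is an illustration only: it ignores pivoting, batching and the transposed case, and the helper names `forward_sub` and `back_sub` are hypothetical, not PyTorch functions. Given A = LU, solving Ax = b amounts to solving Ly = b and then Ux = y.

```python
def forward_sub(L, b):
    """Solve L y = b for a lower-triangular L (forward substitution)."""
    y = []
    for i, row in enumerate(L):
        s = sum(row[j] * y[j] for j in range(i))
        y.append((b[i] - s) / row[i])
    return y


def back_sub(U, y):
    """Solve U x = y for an upper-triangular U (back substitution)."""
    n = len(U)
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(U[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (y[i] - s) / U[i][i]
    return x


# Factors of A = [[4, 3], [2, 1]], i.e. A = L @ U.
L = [[1.0, 0.0], [0.5, 1.0]]
U = [[4.0, 3.0], [0.0, -0.5]]
b = [10.0, 4.0]

# Two triangular solves in sequence give the solution of A x = b.
x = back_sub(U, forward_sub(L, b))
print(x)  # [1.0, 2.0]
```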

We also update the heuristics for this function, as they were fairly
outdated. We found that cuSolver is king, so luckily we do not need to
rely on the buggy backend from MAGMA for this function.

We added tests testing this function left and right. We also added tests
for the different backends. We also activated the tests for AMD, as
those should work as well.

Fixes https://github.com/pytorch/pytorch/issues/61657

Pull Request resolved: https://github.com/pytorch/pytorch/pull/72935

Approved by: https://github.com/IvanYashchuk, https://github.com/mruberry
2022-05-05 19:02:13 +00:00
sanchitintel
4ee29d6033 [Reland take-2] Add JIT graph fuser for oneDNN Graph API (v0.5)
Re-landing #68111/#74596

## Description
v0.5 PR of this [RFC](https://github.com/pytorch/pytorch/issues/49444).

On the basis of #50256, the below improvements are included:

 * The [v0.5 release branch](https://github.com/oneapi-src/oneDNN/releases/tag/graph-v0.5) of the oneDNN Graph API is used
 * The fuser now works with the profiling graph executor. We have inserted type check nodes to guard the profiled tensor properties.

 ### User API:
The optimization pass is disabled by default. Users could enable it by:

```
 torch.jit.enable_onednn_fusion(True)
```
`torch.jit.freeze` should be used after tracing (recommended) or scripting a model.

 ### Performance:
 [pytorch/benchmark](https://github.com/pytorch/benchmark) tool is used to compare the performance:

 * SkyLake 8180 (1 socket of 28 cores):
   ![image](https://user-images.githubusercontent.com/65992142/151162305-05e44425-a24e-4d5e-94e1-743b40b87a8c.png)
* SkyLake 8180 (single thread):
   ![image](https://user-images.githubusercontent.com/65992142/151162528-69f90b79-d08d-46b8-8775-d80a6ccbce8a.png)
   * By mapping hardswish to oneDNN Graph, it’s 8% faster than PyTorch JIT (NNC + OFI)
   ** We expect performance gain after mapping transpose, contiguous & view to oneDNN graph ops

 ### Directory structure of the integration code
 Fuser-related code is placed under:

 ```
 torch/csrc/jit/codegen/onednn/
 ```

 Optimization pass registration is done in:

 ```
 torch/csrc/jit/passes/onednn_graph_fuser.h
 ```

 CMake for the integration code is in:

 ```
 caffe2/CMakeLists.txt
 cmake/public/mkldnn.cmake
 cmake/Modules/FindMKLDNN.cmake
 ```

 ## Limitations
 * In this PR, we only support PyTorch-oneDNN-Graph integration on the Linux platform. Support on Windows and macOS will be enabled as a next step.
 * We have only optimized the inference use-case.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/76622
Approved by: https://github.com/eellison
2022-05-05 16:57:03 +00:00
lezcano
7cb7cd5802 Add linalg.lu
This PR modifies `lu_unpack` by:
- Using less memory when unpacking `L` and `U`
- Fusing the subtraction by `-1` with `unpack_pivots_stub`
- Defining tensors of the correct types to avoid copies
- Porting `lu_unpack` to be a structured kernel so that its `_out` version
does not incur extra copies

Then we implement `linalg.lu` as a structured kernel, as we want to
compute its derivative manually. We do so because composing the
derivatives of `torch.lu_factor` and `torch.lu_unpack` would be less efficient.

This new function and `lu_unpack` come with everything they can come with:
forward and backward AD, decent docs, correctness tests, OpInfo, complex support,
support for meta tensors, and support for vmap and vmap over the gradients.

I really hope we don't continue adding more features.

This PR also avoids saving some of the tensors that were previously
saved unnecessarily for the backward in `lu_factor_ex_backward` and
`lu_backward` and does some other general improvements here and there
to the forward and backward AD formulae of other related functions.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67833

Approved by: https://github.com/IvanYashchuk, https://github.com/nikitaved, https://github.com/mruberry
2022-05-05 09:17:05 +00:00
Eddie Yan
e838137b3e Add high level control of fp32 matmul precision; disable TF32 for matmuls by default
#76440

CC @mruberry @ptrblck

Pull Request resolved: https://github.com/pytorch/pytorch/pull/76509
Approved by: https://github.com/ngimel
2022-05-04 20:40:13 +00:00
Shawn Zhong
9c902f4749 Add TORCH_CPP_LOG_LEVEL to the docs
Fixes #70667

`TORCH_CPP_LOG_LEVEL=INFO` is needed for `TORCH_DISTRIBUTED_DEBUG` to be effective.

For reference, https://github.com/pytorch/pytorch/pull/71746 introduced the environment variable `TORCH_CPP_LOG_LEVEL` and https://github.com/pytorch/pytorch/pull/73361 documented it.
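
As a minimal sketch of the point above, both variables can be set together from Python (exporting them in the shell works equally well). `DETAIL` is one of the accepted `TORCH_DISTRIBUTED_DEBUG` values and is used here only as an example; setting the variables before importing `torch` is a conservative assumption about initialization order, not something this commit states.

```python
import os

# Set both variables early, before torch is imported and before the
# process group is initialized, so the C++ logger can pick them up.
os.environ["TORCH_CPP_LOG_LEVEL"] = "INFO"
os.environ["TORCH_DISTRIBUTED_DEBUG"] = "DETAIL"
```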

Pull Request resolved: https://github.com/pytorch/pytorch/pull/76625
Approved by: https://github.com/rohan-varma
2022-05-03 17:01:11 +00:00
Shabab Ayub
3e08b18167 Back out "Back out "[torch deploy] Update deploy.rst with working simple example"" (#76713)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76713

Original commit changeset: d8deed7d0b7f

Original Phabricator Diff: D36073344 (d16ce8a2f6)

Test Plan: n/a

Reviewed By: osalpekar

Differential Revision: D36086703

fbshipit-source-id: 15d03bdb478c02a4c5253a2023828147ee1438e0
(cherry picked from commit fdc27f0fda4b63703839c9ddb620e4708a6360fa)
2022-05-03 14:12:18 +00:00
Shabab Ayub
d16ce8a2f6 Back out "[torch deploy] Update deploy.rst with working simple example"
Summary:
Original commit changeset: d78bb2886f94

Original Phabricator Diff: D35998155

Test Plan: n/a

Reviewed By: osalpekar

Differential Revision: D36073344

fbshipit-source-id: d8deed7d0b7fe716251bfed2450bf971a2dd394c
(cherry picked from commit 689d84be98c106a1883f07343b64326560c920ce)
2022-05-02 22:07:42 +00:00
Shabab Ayub
a240d45277 [torch deploy] Update deploy.rst with working simple example (#76538)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76538

when running the example from the docs, I found that these steps were not working.

These are the updates necessary to get the example working.

Test Plan: n/a

Reviewed By: PaliC

Differential Revision: D35998155

fbshipit-source-id: d78bb2886f94889abae5a3af5239fcd306cd5e09
(cherry picked from commit 6893812efe7443b437ccafb7b1ff6bc7bd2e6670)
2022-05-02 22:07:42 +00:00
PyTorch MergeBot
bc5307347f Revert "Add linalg.vander"
This reverts commit 1ea49c68d0.

Reverted https://github.com/pytorch/pytorch/pull/76303 on behalf of https://github.com/malfet
2022-05-02 18:50:08 +00:00
lezcano
1ea49c68d0 Add linalg.vander
This PR adds `linalg.vander`, the linalg version of `torch.vander`.

We add autograd support and support for batched inputs.

We also take this chance to improve the docs (TODO: Check that they
render correctly!) and add an OpInfo.

**Discussion**: The current default for the `increasing` kwarg is extremely
odd as it is the opposite of the classical definition (see
[wiki](https://en.wikipedia.org/wiki/Vandermonde_matrix)). This is
reflected in the docs, where I make explicit both the odd default that we
use and the classical definition. See also [this stackoverflow
post](https://stackoverflow.com/a/71758047/5280578), which shows how
people are confused by this default.

My take on this would be to correct the default to be `increasing=True`
and document the divergence with NumPy (as we do for other `linalg`
functions) as:

- It is what people expect
- It gives the correct determinant called "the Vandermonde determinant" rather than (-1)^{n(n-1)/2} times the Vandermonde det (ugh).
- [Minor] It is more efficient (no `flip` needed)
- Since it's under `linalg.vander`, it's strictly not a drop-in replacement for `np.vander`.

We will deprecate `torch.vander` in a PR after this one in this stack
(once we settle on what's the correct default).

Thoughts? mruberry

cc kgryte rgommers as they might have some context for the defaults of
NumPy.

Fixes https://github.com/pytorch/pytorch/issues/60197

Pull Request resolved: https://github.com/pytorch/pytorch/pull/76303

Approved by: https://github.com/albanD
2022-05-02 15:26:44 +00:00
PyTorch MergeBot
3dcd67a1b3 Revert "[Re-landing 68111] Add JIT graph fuser for oneDNN Graph API (Preview4.1)"
This reverts commit 8b11d81058.

Reverted https://github.com/pytorch/pytorch/pull/74596 on behalf of https://github.com/janeyx99
2022-04-29 15:40:17 +00:00
chunyuan
8b11d81058 [Re-landing 68111] Add JIT graph fuser for oneDNN Graph API (Preview4.1)
Re-landing https://github.com/pytorch/pytorch/pull/68111

## Description
Preview4 PR of this [RFC](https://github.com/pytorch/pytorch/issues/49444).

On the basis of https://github.com/pytorch/pytorch/pull/50256, the below improvements are included:

- The [preview4 release branch](https://github.com/oneapi-src/oneDNN/releases/tag/graph-v0.4.1) of the oneDNN Graph API is used
- The fuser now works with the profiling graph executor. We have inserted type check nodes to guard the profiled tensor properties.

### User API:
The optimization pass is disabled by default. Users could enable it by:
```
torch.jit.enable_onednn_fusion(True)
```

### Performance:
[pytorch/benchmark](https://github.com/pytorch/benchmark) tool is used to compare the performance:
- SkyLake 8180 (1 socket of 28 cores):

  ![image](https://user-images.githubusercontent.com/65992142/151162305-05e44425-a24e-4d5e-94e1-743b40b87a8c.png)

- SkyLake 8180 (single thread):

  ![image](https://user-images.githubusercontent.com/65992142/151162528-69f90b79-d08d-46b8-8775-d80a6ccbce8a.png)
 \* By mapping hardswish to oneDNN Graph, it’s 8% faster than PyTorch JIT (NNC + OFI)
  \** We expect performance gain after mapping transpose, contiguous & view to oneDNN graph ops

### Directory structure of the integration code
Fuser-related code is placed under:
```
torch/csrc/jit/codegen/onednn/
```

Optimization pass registration is done in:
```
torch/csrc/jit/passes/onednn_graph_fuser.h
```

CMake for the integration code is:
```
caffe2/CMakeLists.txt
```

## Limitations

- In this PR, the optimization is only supported on the Linux platform. Support for Windows and macOS will be enabled as the next step.
- We have only optimized the inference use case.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74596
Approved by: https://github.com/malfet
2022-04-29 01:01:33 +00:00
Ivan Yashchuk
8bb7203049 Add torch.linalg.ldl_factor_ex and torch.linalg.ldl_solve
This PR adds a function for computing the LDL decomposition and a function that can solve systems of linear equations using this decomposition. The result of `torch.linalg.ldl_factor_ex` is in a compact form and should be used only through `torch.linalg.ldl_solve`. In the future, we could provide an `ldl_unpack` function that transforms the compact representation into explicit matrices.

Fixes https://github.com/pytorch/pytorch/issues/54847.

cc @jianyuh @nikitaved @pearu @mruberry @walterddr @IvanYashchuk @xwang233 @Lezcano
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69828
Approved by: https://github.com/Lezcano, https://github.com/mruberry, https://github.com/albanD
2022-04-28 19:23:37 +00:00
Jerry Zhang
30342f6ba6 [quant][docs] Fix formatting for quantization.rst (#76223)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76223

Small formatting fixes that were missed because I didn't check the generated doc last time

Test Plan:
visual inspection of the generated docs for this PR

Imported from OSS

Reviewed By: HDCharles

Differential Revision: D35853174

fbshipit-source-id: 4454a4bf5d0c998d866bbae1d6b5286827082033
(cherry picked from commit 125f60356ccc9cd6888c515889bd27ff9860ec74)
2022-04-26 03:16:39 +00:00
Elias Ellison
0d7be81c9c [JIT] Add Context Manager to force strict fusion
Fixes https://github.com/pytorch/pytorch/issues/75464 Adds a context manager that will throw if the ops in the context are not fused.

API is:
```
with torch.jit.strict_fusion():
    ...
```

A few TODOs:
[+] Compose/figure out how to do with autodiff - right now it will run on autodiff as well
[+] Support all of the nvfuser operators that are added in guarding
[+] Figure out what to do with control flow that isn't taken (right now it will just error). this is probably a source of the original issue :/  - will just error
[+] (After those are figured out) add to docs

Pull Request resolved: https://github.com/pytorch/pytorch/pull/75777
Approved by: https://github.com/davidberard98
2022-04-25 16:08:57 +00:00
Jerry Zhang
056627ddce [quant][docs] Add more docs for quantization.rst (#75998)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75998

Add more details to user facing docs quantization.rst, which will be displayed in the official quantization doc page: https://pytorch.org/docs/stable/quantization.html
This includes:
* docs for quantization stack (quantized tensor, quantized operator and modules, observer, fake_quantize, QConfig, quantization flow)
* Added support table for quantization mode, quantization flow mode and backend, (also moved around operator support table)
* restructured eager mode and fx mode docs as well

Test Plan:
inspect the doc that's built by github ci

Imported from OSS

Reviewed By: dzdang

Differential Revision: D35739111

fbshipit-source-id: 3762d387479bdd37472cb17d5c49da2f520effbb
(cherry picked from commit db5e6411c52c08dd9c45f841ab86713d36a75d51)
2022-04-22 06:42:39 -07:00
albanD
a6a5e6cecf move the stateless util to public API!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75834
Approved by: https://github.com/zou3519, https://github.com/jbschlosser
2022-04-21 13:42:24 +00:00
kshitij12345
aa51704ce5 [complex32] add chalf alias for complex32 and chalf method
Reference: https://github.com/pytorch/pytorch/issues/74537

Adds chalf alias for complex32 and also adds method `chalf` similar to `cfloat, cdouble`

TODO:
* [x] Add docs
* [x] Add override
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75320
Approved by: https://github.com/anjali411
2022-04-20 23:44:47 +00:00
Jerry Zhang
74454bdb46 [quant][fx] Move backend_config folder to torch.ao.quantization
Summary:
Following https://github.com/pytorch/rfcs/blob/master/RFC-0019-Extending-PyTorch-Quantization-to-Custom-Backends.md we implemented
the backend configuration for the fbgemm/qnnpack backend. It currently lives under the fx folder, but we'd like to use it for all
workflows, including eager, fx graph and define-by-run quantization. This PR moves it to the torch.ao.quantization namespace so that
it can be shared by different workflows.
Also moves some utility functions specific to fx to fx/backend_config_utils.py; some files are kept in the fx folder (quantize_handler.py and fuse_handler.py)

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
python test/test_quantization.py TestQuantizeFxModels
python test/test_quantization.py TestAOMigrationQuantization
python test/test_quantization.py TestAOMigrationQuantizationFx

Pull Request resolved: https://github.com/pytorch/pytorch/pull/75823

Approved by: https://github.com/vkuzo
2022-04-19 15:38:57 +00:00
Alban Desmaison
bd7e99cbb9 Fix doc build
Regression introduced in https://github.com/pytorch/pytorch/pull/73224
The caller for this script has never been updated to pass in main: 2ecc59086a/.github/workflows/_docs.yml (L81-L85)

So this change made it so that all PR doc is built as-if it was a release (for example https://github.com/pytorch/pytorch/runs/6031182009?check_suite_focus=true) and so the coverage test for the doc didn't run for a month :(
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75997
Approved by: https://github.com/musebc, https://github.com/seemethere
2022-04-19 04:07:47 +00:00
Brian Johnson
990d155c9c Update Index.rst to add TorchRec to domain list.
Adds TorchRec and TorchData to domain library list.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/73229
Approved by: https://github.com/colin2328, https://github.com/jamesr66a
2022-04-15 02:39:12 +00:00
Nikita Shulga
348881deaf Update doc copyrights to 2022
Also, s/Torch/PyTorch/
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75690
Approved by: https://github.com/kit1980, https://github.com/soumith
2022-04-13 00:25:23 +00:00
Yulv-git
ac2d2e3a3d Fix some typos.
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/75561
Approved by: https://github.com/albanD
2022-04-11 21:55:59 +00:00
Nuno-Mota
0bd3354547 Update onnx.rst
Fixes #75508

Pull Request resolved: https://github.com/pytorch/pytorch/pull/75509
Approved by: https://github.com/BowenBao
2022-04-08 20:07:01 +00:00
Mikayla Gawarecki
11f1fef981 Update documentation for scatter_reduce
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74608

Approved by: https://github.com/cpuhrsch
2022-04-07 15:41:23 +00:00
Thiago Crepaldi
89e79f844d Add list of supported ATen ops by ONNX converter into torch.onnx page
This PR introduces a new documentation page with a list of supported ATen operators by the ONNX converter.

When `make html` (or similar) is called, a python script will generate a temporary CSV file inside the doc build folder with a list of operators/opsets currently supported by the PyTorch ONNX exporter. That CSV is used by Sphinx to build an HTML table using the same theme as the rest of the documentation.
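
The generation step might look roughly like the sketch below. It is not the script added by this PR: the function name, the file name, the column layout and the two sample rows are all placeholders chosen for illustration.

```python
import csv
import os
import tempfile


def write_supported_ops_csv(rows, build_dir, filename="supported_ops.csv"):
    """Write (operator, opset) rows to a CSV file that a docs table can read."""
    path = os.path.join(build_dir, filename)
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["Operator", "Opset versions"])  # header row
        writer.writerows(rows)
    return path


# Placeholder rows, not the exporter's real support matrix.
sample_rows = [
    ("aten::example_op_a", "Since opset 9"),
    ("aten::example_op_b", "Since opset 11"),
]
build_dir = tempfile.mkdtemp()  # stands in for the doc build folder
csv_path = write_supported_ops_csv(sample_rows, build_dir)
```

On the Sphinx side, the `csv-table` directive can load such a file through its `:file:` option.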

That page is linked to the existing `onnx.rst`, including its table of contents.

@BowenBao @shubhambhokare1 Feel free to add more details on how the script cross-references ONNX symbolics and the ATen operator list from the torch JIT API.

Below is the workflow for the changed pages:

The initial torch.onnx page was modified to add a link to the list of supported aten operators
![image](https://user-images.githubusercontent.com/5469809/159046387-c459bffc-c9b2-4fcb-8468-8181fdddf911.png)

The screen below highlights the text structure changes to the `ATen operators` section
![image](https://user-images.githubusercontent.com/5469809/159046730-ccd1e594-c8e6-4b8d-a9ec-8bf6ad58a435.png)

Finally the new page with the list of supported operators is shown below
![image](https://user-images.githubusercontent.com/5469809/159046872-0d99b769-8b95-4c2b-99a9-a8cfdd0b6ecf.png)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/74397
Approved by: https://github.com/garymm, https://github.com/malfet
2022-04-07 00:05:44 +00:00
Vasiliy Kuznetsov
74b23b2066 quantization: autogenerate quantization backend configs for documentation (#75126)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75126

Quantization has a high volume of configurations of how to quantize an
op for a reference model representation which is useful for a lowering
step for a backend. An example of this is:

```
{'dtype_configs': [{'input_dtype': torch.quint8,
                    'output_dtype': torch.quint8}],
 'observation_type': <ObservationType.OUTPUT_USE_DIFFERENT_OBSERVER_AS_INPUT: 0>,
 'pattern': <class 'torch.nn.modules.conv.ConvTranspose1d'>},
```

These configs are checked into master, and they are created with Python functions.
Therefore, there is no easy way for the user to see what the configs actually
are without running some Python code.

This PR is one approach to document these configs. Here is what this is doing:
1. during documentation build, write a text file of the configs
2. render that text file on a quantization page, with some additional context
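
Step 1 could be sketched as follows. This is an assumption-laden illustration, not the code from this PR: `write_configs_txt`, the output file name and the placeholder config (plain strings instead of real dtypes and module classes) are all hypothetical.

```python
import os
import pprint
import tempfile


def write_configs_txt(configs, path):
    """Dump backend configs as pretty-printed text for the docs to include."""
    with open(path, "w") as f:
        f.write(pprint.pformat(configs))
        f.write("\n")


# Placeholder config; real entries hold torch dtypes and module classes.
configs = [
    {
        "pattern": "ConvTranspose1d",
        "dtype_configs": [{"input_dtype": "quint8", "output_dtype": "quint8"}],
    }
]
out_path = os.path.join(tempfile.mkdtemp(), "quantization_backend_configs.txt")
write_configs_txt(configs, out_path)
```

For step 2, a directive such as Sphinx's `literalinclude` is one way to render the resulting text file on a page.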

In the future, this could be extended to autogenerate better looking tables
such as: op support per backend and dtype, op support per valid quantization settings per backend,
etc.

Test Plan:
```
cd docs
make html
cd html
python -m http.server 8000
// render http://[::]:8000/quantization-backend-configuration.html
// it renders correctly
```

Reviewed By: ejguan

Differential Revision: D35365461

Pulled By: vkuzo

fbshipit-source-id: d60f776ccb57da9db3d09550e4b27bd5e725635a
(cherry picked from commit 14865c0e23bc080120342c8f9278f0fae8eb8fbd)
2022-04-04 22:22:30 +00:00
Sherlockk Huang
bbf7e159e0 Implement torch.special.log_ndtr
Implements torch.special.log_ndtr

Issue: https://github.com/pytorch/pytorch/issues/50345

TODO:
- [x] adding proper reference to scipy implementation
- [x] double check if the changes in test/test_unary_ufuncs.py is really necessary
- [x] check setting for UnaryUfuncInfo
cc: @kshitij12345 @mruberry
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74795
Approved by: https://github.com/anjali411
2022-03-29 23:13:37 +00:00
Smark
ab57876420 fix docs error in Autograd Mechanics
Fixes #74682

Pull Request resolved: https://github.com/pytorch/pytorch/pull/74807
Approved by: https://github.com/albanD
2022-03-29 18:32:16 +00:00
Janakan
923a922b1b Grammatically updated quantization tech doc
Improved PyTorch technical documentation consistency for the "quantization API summary" section.
![Screen Shot 2022-03-19 at 4 07 46 PM](https://user-images.githubusercontent.com/72175053/160317638-51e26ec0-903e-44ba-ba59-aa114d4fda93.png)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/74436
Approved by: https://github.com/albanD
2022-03-28 16:48:25 +00:00
Kurt Mohler
79ddc72b85 Virtualize <type>Storage classes (#66970)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/66228

cc ezyang bhosmer smessmer ljk53 bdhirsh

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66970

Reviewed By: bdhirsh

Differential Revision: D33245612

Pulled By: ezyang

fbshipit-source-id: 4c61c2cb029e2b94b0e68927c377d3e1c358dd7c
(cherry picked from commit d29fcdfb4bc2cc17b1795d4349e4b56fa0d1cf12)
2022-03-22 23:44:48 +00:00
leslie-fang-intel
3a112ebb57 add autocast cpu doc
As discussed in https://github.com/pytorch/pytorch/issues/55374#issuecomment-968333614, here we update the cpu autocast operation list in autocast API document.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68567
Approved by: https://github.com/ezyang
2022-03-22 02:02:43 +00:00
Michael Suo
e5bf87963d Revert D34584878: [pytorch][PR] Add JIT graph fuser for oneDNN Graph API (Preview4)
Test Plan: revert-hammer

Differential Revision:
D34584878 (7dd0823011)

Original commit changeset: ce817aa8cc90

Original Phabricator Diff: D34584878 (7dd0823011)

fbshipit-source-id: a941aaad34f8fe5f0c51f719f9f5c29b811c4d5b
(cherry picked from commit a43262ec7521b1665b02a64d3f279e72ee2344b9)
2022-03-21 23:07:14 +00:00
chunyuan
7dd0823011 Add JIT graph fuser for oneDNN Graph API (Preview4) (#68111)
Summary:
## Description
Preview4 PR of this [RFC](https://github.com/pytorch/pytorch/issues/49444).

On the basis of https://github.com/pytorch/pytorch/pull/50256, the below improvements are included:

- The [preview4 release branch](https://github.com/oneapi-src/oneDNN/releases/tag/graph-v0.4.1) of the oneDNN Graph API is used
- The fuser now works with the profiling graph executor. We have inserted type check nodes to guard the profiled tensor properties.

### User API:
The optimization pass is disabled by default. Users could enable it by:
```
torch.jit.enable_onednn_fusion(True)
```

### Performance:
[pytorch/benchmark](https://github.com/pytorch/benchmark) tool is used to compare the performance:
- SkyLake 8180 (1 socket of 28 cores):

  ![image](https://user-images.githubusercontent.com/65992142/151162305-05e44425-a24e-4d5e-94e1-743b40b87a8c.png)

- SkyLake 8180 (single thread):

  ![image](https://user-images.githubusercontent.com/65992142/151162528-69f90b79-d08d-46b8-8775-d80a6ccbce8a.png)
 \* By mapping hardswish to oneDNN Graph, it’s 8% faster than PyTorch JIT (NNC + OFI)
  \** We expect performance gain after mapping transpose, contiguous & view to oneDNN graph ops

### Directory structure of the integration code
Fuser-related code is placed under:
```
torch/csrc/jit/codegen/onednn/
```

Optimization pass registration is done in:
```
torch/csrc/jit/passes/onednn_graph_fuser.h
```

CMake for the integration code is:
```
caffe2/CMakeLists.txt
```

## Limitations

- In this PR, the optimization is only supported on the Linux platform. Support for Windows and macOS will be enabled as the next step.
- We have only optimized the inference use case.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68111

Reviewed By: eellison

Differential Revision: D34584878

Pulled By: malfet

fbshipit-source-id: ce817aa8cc9052ee9ed930c9cf66be83449e61a4
(cherry picked from commit cd17683aa7d9c0947df45a1ab53627feff795587)
2022-03-21 22:12:19 +00:00
Jaewon Lee
11ea09effc [CUDACachingAlloc/GPUInference] Implement garbage collection without GPU sync (#74261)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74261

### Goal
Implement a cheap way to reclaim GPU memory (garbage collection) without incurring GPU sync.

### Why do we need this?
Currently, there are only two ways to reclaim GPU memory blocks already assigned to a particular stream.

- `release_available_cached_blocks(params)`: Free blocks exceeding the `CachingAllocatorConfig::max_split_size()` until we can satisfy the request.

Issue: If the `max_split_size` is unset (default), this function is a no-op. Even if this is set, the reclamation is quite conservative (e.g., never frees blocks under max_split_size).

- `release_cached_blocks()`: Waits for all the in-flight events and then reclaims blocks.

Issue: Waiting for all events is very expensive, as it will likely stall all GPU operations. Many GPU applications without proper handling of potential GPU throttling would suffer or crash.

### Proposed idea
- If the garbage collection threshold is set, try to reclaim some memory blocks *without* synchronization. It should be safe to do so, as `release_available_cached_blocks` essentially does the same thing (but less aggressively).
- GC is triggered only when we fail to serve a `malloc` request from the block pool. No need to free blocks when the block pool is functioning just fine.
- Prioritize reclaiming blocks that weren't reused for a long time. Reclamation stops once the used memory capacity < threshold.
- This code path is totally optional; by default it won't be invoked.
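
The policy in the list above can be modelled with a small toy in Python. It is not the allocator's C++ code: `CachedBlock`, its `idle_rounds` age counter and `garbage_collect` are hypothetical names, and no CUDA streams or events are modelled.

```python
from dataclasses import dataclass


@dataclass
class CachedBlock:
    size: int         # bytes held by this cached, currently unused block
    idle_rounds: int  # hypothetical age: how long since the block was reused


def garbage_collect(cached, in_use_bytes, threshold_bytes):
    """Free the longest-idle cached blocks until usage drops below the threshold.

    Returns (kept_blocks, freed_blocks).
    """
    total = in_use_bytes + sum(b.size for b in cached)
    kept, freed = [], []
    # Visit the longest-idle blocks first.
    for block in sorted(cached, key=lambda b: b.idle_rounds, reverse=True):
        if total < threshold_bytes:
            kept.append(block)   # already under the threshold: stop freeing
        else:
            freed.append(block)
            total -= block.size
    return kept, freed


cached = [CachedBlock(400, 1), CachedBlock(300, 7), CachedBlock(200, 3)]
kept, freed = garbage_collect(cached, in_use_bytes=500, threshold_bytes=1000)
print([b.size for b in freed])  # [300, 200]
print([b.size for b in kept])   # [400]
```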

Test Plan:
- Unit tests
- Manually checked that the GPU memory usage stays as indicated by the garbage collector. If not, the caching allocator at least tries to keep freeing the blocks.

Reviewed By: jianyuh

Differential Revision: D34482514

fbshipit-source-id: d5eae62ac60b94b0bca851f9d233a092d086e3c2
(cherry picked from commit 05780f1ed4b176f05e765b2411c9eaa2eaeb48b0)
2022-03-21 18:46:02 +00:00
BowenBao
54a6942f8d [ONNX] ONNX Exporter logging (#71342)
Summary:
Add an ONNX exporter logging facility. It supports both the C++ and Python logging APIs. Logging can be turned on/off. The logging output stream can be set to either `stdout` or `stderr`.

A few other changes:
* When an exception is raised in passes, the current IR graph being processed will be logged.
* When an exception is raised from `_jit_pass_onnx` (the pass that converts nodes from namespace `ATen` to `ONNX`), both the ATen IR graph and the ONNX IR graph under construction will be logged.
* The exception message for ConstantFolding is truncated to avoid being too verbose.
* Update the final printed IR graph with the node name in the ONNX ModelProto as a node attribute. Torch IR nodes do not have names. Adding this to the printed IR graph helps debugging.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/71342

Reviewed By: msaroufim

Differential Revision: D34433473

Pulled By: malfet

fbshipit-source-id: 4b137dfd6a33eb681a5f2612f19aadf5dfe3d84a
(cherry picked from commit 67a8ebed5192c266f604bdcca931df6fe589699f)
2022-03-17 19:40:03 +00:00
Banit Agrawal
ac3effd150 [PyTorch GPU Allocator] Better use of blocks with rounding of allocation sizes (#74213)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74213

In the current CUDACachingAllocator, sizes are rounded up to a multiple of the block size of 512, which works for smaller sizes. For large sizes, however, we can end up with many different block sizes in the larger pool. This is problematic with variable batch sizes such as 1001, 1021, and 1023: each goes to a different block size and creates blocks of a different size. This leaves many unused blocks and wastes GPU memory capacity.

This diff adds a rounding approach for allocation sizes. It rounds the size up to the nearest power-of-2 division, and the number of divisions can be changed with an environment variable setting.

   For example, if we need to round up a size of 1200 and the number of divisions is 4,
   the size 1200 lies between 1024 and 2048, and if we do 4 divisions between
   them, the values are 1024, 1280, 1536, and 1792. So the function will
   return 1280 as the nearest ceiling of the power-of-2 division.
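
The arithmetic in this example can be sketched in a few lines of Python. This is a minimal sketch of the rounding rule as described in this message, not the allocator's implementation: the function name `roundup_power2_divisions` is borrowed from the config key for illustration, and the sketch ignores the separate rounding to multiples of 512.

```python
def roundup_power2_divisions(size: int, divisions: int) -> int:
    """Round size up to the next boundary when each power-of-2 interval
    [2^k, 2^(k+1)) is split into `divisions` equal steps."""
    if size <= 0 or divisions <= 0:
        raise ValueError("size and divisions must be positive")
    floor_pow2 = 1 << (size.bit_length() - 1)   # largest power of 2 <= size
    step = max(floor_pow2 // divisions, 1)      # width of one division
    # Ceiling of (size - floor_pow2) in units of `step`.
    steps_needed = (size - floor_pow2 + step - 1) // step
    return floor_pow2 + steps_needed * step


print(roundup_power2_divisions(1200, 4))
print({roundup_power2_divisions(n, 4) for n in (1001, 1021, 1023)})
```

Under this sketch, 1200 rounds to 1280 as in the example above, and the batch sizes 1001, 1021, and 1023 all land on the same rounded size instead of three different ones.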

env setting:
   export PYTORCH_CUDA_ALLOC_CONF=roundup_power2_divisions:4
ghstack-source-id: 151446017

Reviewed By: ezyang

Differential Revision: D34868036

fbshipit-source-id: 494785add16e6b37c920dcb5a2b81d4c637b554a
(cherry picked from commit 548454ccacbd8700e7ffd2d762e40b4ba37abbae)
2022-03-16 02:53:53 +00:00
Ke Wen
1f04a00ccf [PyTorch Distributed] Update documentation about NCCL environment variables (#74006)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74006

Updated recommendations about which environment variables to use during
debugging and performance tuning.

Test Plan: `make html`

Reviewed By: rohan-varma

Differential Revision: D34767454

fbshipit-source-id: 08cd58469bf72b58702e50e82020fa19b43b5911
(cherry picked from commit ac7e6630f8043f85d3d16be17c6a8ad1ebb2990c)
2022-03-11 23:57:17 +00:00
Alban Desmaison
734281c3d6 Cleanup all module references in doc (#73983)
Summary:
Working towards https://docs.google.com/document/d/10yx2-4gs0gTMOimVS403MnoAWkqitS8TUHX73PN8EjE/edit?pli=1#

This PR:
- Ensures that all submodules are listed in an rst file (which ensures they are considered by the coverage tool)
- Removes some long-deprecated code that just errors out on import
- Removes the allow list altogether to ensure nothing gets added back there

Pull Request resolved: https://github.com/pytorch/pytorch/pull/73983

Reviewed By: anjali411

Differential Revision: D34787908

Pulled By: albanD

fbshipit-source-id: 163ce61e133b12b2f2e1cbe374f979e3d6858db7
(cherry picked from commit c9edfead7a01dc45bfc24eaf7220d2a84ab1f62e)
2022-03-10 22:26:29 +00:00
Alban Desmaison
238f7d9cbf rename config module file to work with gh pages better
Fixes https://github.com/pytorch/pytorch/issues/62018

Pull Request resolved: https://github.com/pytorch/pytorch/pull/74038
Approved by: https://github.com/mruberry, https://github.com/seemethere
2022-03-10 20:41:44 +00:00
Rohit Goswami
979a78f8b2 Sphinx panel
Fixes https://github.com/pytorch/pytorch/issues/73835.

The full context for this is detailed in the issue, but briefly:

- Adds `sphinx-panel`

Other PRs will demonstrate usage.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73836
Approved by: https://github.com/albanD
2022-03-07 14:50:09 +00:00
Pritam Damania
71aa3ab020 Add note in RPC docs about retries. (#73601)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73601

Some users had questions about how the RPC framework deals with
failures and whether we retry. Adding a note about this to our docs to
elaborate on our current behavior and why we chose that approach.
ghstack-source-id: 150359866

Test Plan: view docs.

Reviewed By: mrshenli

Differential Revision: D34560199

fbshipit-source-id: ee33ceed7fa706270d4ca5c8fcff7535583490ff
(cherry picked from commit 954a906240cc40aacf08ca13f6554a35303a678a)
2022-03-03 00:29:31 +00:00
Ren Pang
e8b10b6e34 fix wrong indexing of class names in docs
Fixes #73631

Locally built and tested.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73632
Approved by: jbschlosser
2022-03-02 22:21:21 +00:00