Commit Graph

2350 Commits

Author SHA1 Message Date
Svetlana Karslioglu
4d3ea5df65 Restructure torch.compile docs (#105376)
Current torch.compile docs have become a bit of a mess with the docs expanded in the left nav. This PR moves them under the torch.compiler menu item in the left nav. A bunch of rewrites were made in collaboration with @msaroufim to address formatting issues, latest updates that moved some of the APIs to the public torch.compiler namespace were addressed as well. The documentation is broken down in three categories that address three main audiences: PyTorch users, Pytorch Developers and PyTorch backend vendors. While, the user-facing documentation was significantly rewritten, dev docs and vendor docs kept mostly untouched. This can be addressed in the follow up PRs.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/105376
Approved by: https://github.com/msaroufim
2023-07-28 20:58:57 +00:00
Mikayla Gawarecki
035124774a Enable registering fallthroughs to (op, dk) from torch.library (#106086)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/106086
Approved by: https://github.com/zou3519, https://github.com/albanD
2023-07-28 19:37:59 +00:00
fduwjj
487ebcac3b Clean up unsed MHA code to avoid confusion (#105956)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105956
Approved by: https://github.com/wz337, https://github.com/ezyang, https://github.com/wanchaol
2023-07-27 17:10:17 +00:00
Edward Z. Yang
edebdaf182 Change _dynamo.explain to be explain(f)(*args, **kwargs) (#106066)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/106066
Approved by: https://github.com/wanchaol, https://github.com/voznesenskym
2023-07-27 03:21:52 +00:00
Edward Z. Yang
f70844bec7 Enable UFMT on a bunch of low traffic Python files outside of main files (#106052)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/106052
Approved by: https://github.com/albanD, https://github.com/Skylion007
2023-07-27 01:01:17 +00:00
Jerry Zhang
3a77f9aaaf [quant][api] Move torch.ao.quantization.pt2e.quantizer to torch.ao.quantization.quantizer (#105885)
Summary: moving quantizer to torch.ao.quantization to make it a public api, since pt2e is a folder for implementations

Test Plan:
CIs

sanity check: "buck test //executorch/backends/xnnpack/test:test_xnnpack_quantized_models -- test_resnet18"

Differential Revision: D47727838

Pull Request resolved: https://github.com/pytorch/pytorch/pull/105885
Approved by: https://github.com/andrewor14
2023-07-26 18:20:09 +00:00
Danni Li
c0c208516b [Doc] Add Tensor.Shape (#104750)
Summary:
Add `Tensor.Shape` doc.

Fix: #104038

Ref:

- https://github.com/pytorch/pytorch/issues/5544
- https://github.com/pytorch/pytorch/issues/1980

Differential Revision: D47278630

CC: @svekars @carljparker

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104750
Approved by: https://github.com/mikaylagawarecki
2023-07-26 16:30:15 +00:00
albanD
9d2e15882e Add torch.utils to the docs page, remove dead code and fix docstrings (#105142)
As per title.
Note that the c++ side code for the minidumps part was removed. So trying to call any of these 3 functions today results in an error saying that `torch._C` doesn't have these attributes.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105142
Approved by: https://github.com/janeyx99
2023-07-26 14:24:58 +00:00
Andrew Gu
c9edf11073 [FSDP][Docs] Make model/optim state dict configs visible in docs (#105848)
This closes https://github.com/pytorch/pytorch/issues/104717.

Rendered docs:
![Screenshot 2023-07-25 at 11 15 23 AM](https://github.com/pytorch/pytorch/assets/31054793/3c38166a-70c0-472c-805d-452d3bd9c700)
![Screenshot 2023-07-25 at 11 15 30 AM](https://github.com/pytorch/pytorch/assets/31054793/6d275d94-020a-44a2-a64c-0eeba083d47f)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/105848
Approved by: https://github.com/rohan-varma
2023-07-25 16:23:53 +00:00
Ruoxi
5afc2f5069 Documentation for torch.autocast (#95760)
- [x] Corrected examples for CUDA devices.
- [x] Information about availability of `torch.autocast`.

Fixes #95547

Pull Request resolved: https://github.com/pytorch/pytorch/pull/95760
Approved by: https://github.com/leslie-fang-intel, https://github.com/kit1980
2023-07-22 03:56:34 +00:00
PyTorch MergeBot
050d3de07d Revert "Correct dynamo logging docs (#105658)"
This reverts commit f3a261e096.

Reverted https://github.com/pytorch/pytorch/pull/105658 on behalf of https://github.com/PaliC due to breaking docs f3a261e096 ([comment](https://github.com/pytorch/pytorch/pull/105658#issuecomment-1646310865))
2023-07-21 22:38:28 +00:00
David Radley
f3a261e096 Correct dynamo logging docs (#105658)
Fixes #105657

Pull Request resolved: https://github.com/pytorch/pytorch/pull/105658
Approved by: https://github.com/zou3519
2023-07-21 21:37:02 +00:00
PyTorch MergeBot
117325862c Revert "Add torch.utils to the docs page, remove dead code and fix docstrings (#105142)"
This reverts commit e985719e98.

Reverted https://github.com/pytorch/pytorch/pull/105142 on behalf of https://github.com/huydhn due to Sorry for reverting this but it is failing python doc build job in trunk e985719e98 ([comment](https://github.com/pytorch/pytorch/pull/105142#issuecomment-1644874540))
2023-07-21 01:47:49 +00:00
albanD
e985719e98 Add torch.utils to the docs page, remove dead code and fix docstrings (#105142)
As per title.
Note that the c++ side code for the minidumps part was removed. So trying to call any of these 3 functions today results in an error saying that `torch._C` doesn't have these attributes.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105142
Approved by: https://github.com/janeyx99
2023-07-21 00:14:59 +00:00
ydwu4
6abb8c382c [export] add kwargs support for export. (#105337)
Solving #105242.

During export, the exported function's signature changes multiple times. Suppose we'd like to export f as shown in following example:
```python
def f(arg1, arg2, kw1, kw2):
  pass

args = (arg1, arg2)
kwargs =  {"kw2":arg3, "kw1":arg4}

torch.export(f, args, kwargs)
```
The signature changes mutiple times during export process in the following order:
1. **gm_torch_level = dynamo.export(f, *args, \*\*kwargs)**. In this step, we turn all  kinds of parameters such as **postional_only**, **var_positioinal**, **kw_only**, and **var_kwargs** into **positional_or_kw**.It also preserves the positional and kword argument names in original function (i.e. f in this example) [here](https://github.com/pytorch/pytorch/blob/main/torch/_dynamo/export.py#L546C13-L546C27). The order of kwargs will be the **key order** of kwargs (after python 3.6, the order is the insertion of order of keys) instead of the original function signature and the order is baked into a _orig_args varaible of gm_torch_level's pytree info. So we'll have:
```python
def gm_torch_level(arg1, arg2, kw2, kw1)
```
Such difference is acceptable as it's transparent to users of export.

2. **gm_aot_export = aot_export_module(gm_torch_level, pos_or_kw_args)**. In this step, we need to turn kwargs into positional args in the order of how gm_torch_level expected, which is stored in _orig_args. The returned gm_aot_export has the graph signature of flat_args, in_spec = pytree.tree_flatten(pos_or_kw_args):
``` python
flat_args, _ = pytree.tree_flatten(pos_or_kw_args)
def gm_aot_export(*flat_args)
```

3. **exported_program(*args, \*\*kwargs)**. The epxorted artifact is exported_program, which is a wrapper over gm_aot_export and has the same calling convention as the original function "f". To do this, we need to 1. specialize the order of kwargs into pos_or_kw_args and 2. flatten the pos_or_kw_args into what gm_aot_export expected.  We can combine the two steps into one with :
```python
_, in_spec = pytree.tree_flatten((args, kwargs))

# Then during exported_program.__call__(*args, **kwargs)
flat_args  = fx_pytree.tree_flatten_spec((args, kwargs), in_spec)
```
, where kwargs is treated as a normal pytree whose keyorder is preserved in in_spec.

Implementation-wise, we treat _orig_args in dynamo exported graph module as single source of truth and kwags are ordered following it.

Test plan:
See added tests in test_export.py.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105337
Approved by: https://github.com/angelayi, https://github.com/tugsbayasgalan
2023-07-20 19:53:08 +00:00
Andrey Talman
c6653b65d8 Back out "Make adding buffers more like adding parameters (#104069)" (#105581)
Summary:
D47537831 is breaking pyper tests: https://fb.workplace.com/groups/802176577445480/posts/1018902842439518/

with `TypeError: register_buffer() takes 3 positional arguments but 4 were given`

Original commit changeset: d4b4069fbd38

Original Phabricator Diff: D47537831

Test Plan:
```
buck2 run //caffe2/torch/fb/training_toolkit/integration_tests/training_lifecycle/cogwheel_tests/pyper_release_v2:cogwheel_smallworld_inline_cvr_infer_pyper_pyper__canary_offline_training-launcher -- --run-harness-in-tupperware --build-fbpkg ads_dper3 --build-fbpkg training_platform
```

Reviewed By: atalman

Differential Revision: D47600140

Pull Request resolved: https://github.com/pytorch/pytorch/pull/105581
Approved by: https://github.com/mikaylagawarecki
2023-07-20 03:39:53 +00:00
Justin Chu
14d87bb5ff [BE] Enable ruff's UP rules and autoformat tools and scripts (#105428)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105428
Approved by: https://github.com/albanD, https://github.com/soulitzer, https://github.com/malfet
2023-07-19 01:24:44 +00:00
ekamiti
32d422f335 Make adding buffers more like adding parameters (#104069)
Add similar semantics for creating a buffer object similar to creating a parameter. This is done by introducing a new `Buffer` class that can be used for type disambiguation. The underlying functionality of registering a buffer remains the same as the `register_buffer` method has not been changed. The `persistent` parameter in the `Buffer` type is to indicate whether a buffer object should be persistent or not. Other non-test changes have to do with getting the new `Buffer` type recognized by inductor and dynamo. Remaining changes are test changes to make sure that the `Buffer` type can be used as a drop in replacement for `register_buffer` as it just leads to `register_buffer` being called. The addition of this new functionality still allows for normal tensors to be used as buffers so these changes are intended to be backwards compatible.

Fixes #35735

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104069
Approved by: https://github.com/mikaylagawarecki
2023-07-17 17:59:05 +00:00
Jerry Zhang
7b4d080496 [quant][pt2e] Rename _pt2e to pt2e (#104668)
Summary:
X-link: https://github.com/pytorch/executorch/pull/3

att

Test Plan: Imported from OSS

Differential Revision: D47202807

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104668
Approved by: https://github.com/andrewor14
2023-07-15 06:34:17 +00:00
Aleksandar Samardžić
d7e6040efa Update sparse semi-structured linear operator (#104608)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/104608
Approved by: https://github.com/cpuhrsch
2023-07-13 23:52:39 +00:00
Aleksandar Samardžić
fc2f87b281 Add semi-structured sparse conversions (#103830)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/103830
Approved by: https://github.com/amjames, https://github.com/jcaip, https://github.com/cpuhrsch
2023-07-13 21:09:09 +00:00
William Wen
15c67ca95c Update troubleshooting.rst (#105018)
Update with `TORCH_LOGS` information

Pull Request resolved: https://github.com/pytorch/pytorch/pull/105018
Approved by: https://github.com/mlazos
2023-07-12 21:42:53 +00:00
Rodrigo Kumpera
fc012d716d [core] Bring cpu device module closer to cuda's. (#103172)
By implementing some of the functionality used by CUDA we make
implementing device agnostic code a lot easier.

With this set of changes it's now possible to get FSDP wrap a trivial
module. FWD/BWD still TBD.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103172
Approved by: https://github.com/wz337, https://github.com/wanchaol
2023-07-12 19:43:22 +00:00
Zaili Wang
16d3638c11 Add best practices for CPU backend doc (#105051)
Content same as #103948
@svekars the PR content is updated per your comment, but when trying to solve the conflict the original PR was closed by a mis-operation. Would you help handle this new one? sorry for the inconvenience.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/105051
Approved by: https://github.com/svekars
2023-07-12 18:04:51 +00:00
Svetlana Karslioglu
eb03af44ee Fixes to the torch.compile doc and doctest (#104911)
Fixing the user warning in doctest by removing autosummary from the compile/index.rst :
```
/opt/conda/envs/py_3.8/lib/python3.8/site-packages/torch/__init__.py:docstring of torch.compile:1: WARNING: duplicate object description of torch.compile, other instance in compile/generated/torch.compile, use :noindex: for one of them
```
The error is no longer present in the log: https://github.com/pytorch/pytorch/actions/runs/5513741050/jobs/10052379357?pr=104911
Pull Request resolved: https://github.com/pytorch/pytorch/pull/104911
Approved by: https://github.com/kit1980, https://github.com/malfet
2023-07-11 17:54:12 +00:00
Thiago Crepaldi
f1bff6601c [ONNX] Add fake tensor support to torch.onnx.dynamo_export (#103865)
## Context prior to this PR

https://github.com/pytorch/pytorch/pull/100017/ was merged onto PyTorch `main` branch with the goal of enabling `torch._dynamo.export` to perform symbolic tracing.
In that context, symbolic tracing is defined as tracing of a model using fake inputs and weights. An input is Fake when `torch.nn.Tensor` is replaced by `torch._subclasses.FakeTensor`, whereas a weight is fake when a `torch.nn.Parameter` is replaced by `torch._subclasses.FakeTensor`.

For additional context, several strategies were discussed with Meta to enable this feature, including 1) calling `torch._dynamo.export` within a `torch._subclass.FakeTensorMode` context and 2) **fake**fying input and model as separate step and then call `torch._dynamo.export` without an active `torch._subclass.FakeTensorMode` context. At the end, 2) was preferred and implemented by #100017 to minimize the number of side-effects the fake tensor mode has on the code base.

As a consequence, `torch._dynamo.export` API introduced a new argument called `fake_mode`. When symbolic tracing is used, the user must pass in the `fake_mode` used to fakefy both the input and the model. Internally, `torch._dynamo.export` will adopt this `fake_mode` instead of creating its own instance. This is needed because each instance of `FakeTensorMode` has metadata on the tensor/parameter it fakefied. Thus, using real tensor/model and specify a `fake_mode` to `torch._dynamo.export` is an error. Also, specify a `fake_mode` instance to `torch._dynamo.export` different than the one used to fakefy the model and input is also an error.

## Changes introduced from this PR

This PR is intended to integrate `torch._dynamo.export(fake_mode=...)` through `torch.onnx.dynamo_export`. In essence, it
* Introduces a new public API `ONNXFakeContext` which wraps a `FakeTensorMode` under the hood. This removes complexity from the user side while still allow the exporter to leverage the fake mode.
* Adds a new public API `enable_fake_mode` *context manager* that instantiates and return a `ONNXFakeContext`.
* Adds a new `ExportOptions.fake_context` that will be used to persist the `ONNXFakeContext` created by `enable_fake_mode` and plumb through until it reaches the call to `torch._dynamo.export`.
* Adds a `model_state_dict` argument to `ExportOutput.save` API.
  * When model is exported with fake tensors, no actual data exist in the FX module and, therefore, in the ONNX graph.
    * In fact, `torch.fx.make_fx` lifts initializers as model input when fake tensors are used
      * https://github.com/pytorch/pytorch/pull/104493 is needed to enforce name matching between Parameters and inputs
    *  A model checkpoint file or state_dict is needed to populate the ONNX graph with real initializers through `export_output.save(model_state_dict=...)` API

Symbolic tracing, or onnx fake mode, is only enabled when the user instantiates the input and model within the `enable_fake_mode` context. Otherwise, real tracing is done, which preserves the current behavior.

## Usability

Because symbolic tracing depends a lot on having changes made on Dynamo side before it can be consumed on ONNX exporter, this feature may have its API and assumptions changed as symbolic tracing matures upstream. Nonetheless, it is still important to have this feature merged ASAP on the ONNX exporter side to "lock" changes on Dynamo that would otherwise break ONNX exporter without warning.

Example:

```python
class Model(torch.nn.Module):
    def __init__(self) -> None:
        super().__init__()
        self.linear = torch.nn.Linear(2, 2)

    def forward(self, x):
        out = self.linear(x)
        return out

with torch.onnx.enable_fake_mode() as fake_context:
    x = torch.rand(5, 2, 2)
    model = Model()

# Export the model with fake inputs and parameters
export_options = ExportOptions(fake_context=fake_context)
export_output = torch.onnx.dynamo_export(
    model, x, export_options=export_options
)

model_state_dict = Model().state_dict()  # optional
export_output.save("/path/to/model.onnx", model_state_dict=model_state_dict)
```

## Next steps

* Add unit tests running the exported model with ORT
Today this is not possible yet because `make_fx` used by our Decomposition pass lifts initializers as model inputs. However, the initializer names are not preserved by FX tracing, causing a mismatch between the initializer and input name.
https://github.com/pytorch/pytorch/pull/104493 and https://github.com/pytorch/pytorch/pull/104741 should fix the initializer mismatch, enabling model execution

* Revisit `ONNXTorchPatcher` and how the ONNX initializers are saved in the graph as external data
We can try to get rid of the PyTorch patcher. If we can't, we might prefer to create specific patchers, say `FXSymbolicTracePatcher` used specifically during an export using `torch.fx.symbolic_trace` and maybe a `ExportOutputSavePacther` used specifically for `ExportOutput.save` to prevent "patching too many pytorch API that we don't need

## References

* [FakeTensor implementation](https://github.com/pytorch/pytorch/blob/main/torch/_subclasses/fake_tensor.py)
* [PR that adds fake tensor support to torch._dynamo.export](https://github.com/pytorch/pytorch/pull/100017)
* [Short fake tensor documentation](https://pytorch.org/torchdistx/latest/fake_tensor.html)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/103865
Approved by: https://github.com/BowenBao
2023-07-11 03:17:17 +00:00
David Radley
dbc2216800 Add autograd modes table to docs (#104774)
Fixes #104461

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104774
Approved by: https://github.com/soulitzer
2023-07-08 03:14:10 +00:00
Aleksei Nikiforov
c42fd73cf9 Add functions to get and set default endianness in load() functions (#101973)
By default interpret tensor data as native endian, but add an option to interpret data as little endian or big endian.

Related to #101688

Pull Request resolved: https://github.com/pytorch/pytorch/pull/101973
Approved by: https://github.com/mikaylagawarecki
2023-07-06 20:12:56 +00:00
toma
2abbed42ee correct the generated code and corresponding text to make them consistent (#104596)
Fixes #104500

As discussed in #104500, the [corresponding doc](https://pytorch.org/docs/stable/dynamo/get-started.html#getting-started) for dynamo is inconsistent between the code and explanation. I have run the code example to get the correct code.
![image](https://github.com/pytorch/pytorch/assets/6964699/d11e0f2f-2225-4ba9-8934-b06c9fc78721)
This PR fixes the problem and makes the doc more readable.

cc:
@davidberard98 @ezyang  please help me check this PR, thanks!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/104596
Approved by: https://github.com/ezyang
2023-07-04 22:56:03 +00:00
angelayi
828b275740 [exportdb] Setup website (#104288)
<img width="1109" alt="image" src="https://github.com/pytorch/pytorch/assets/10901756/e67ff8a9-adb1-466f-8285-fb7d3653d139">

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104288
Approved by: https://github.com/zhxchen17
2023-07-01 01:03:56 +00:00
Jesse Cai
2da6cae43c [core][pruning][sparse][feature] SparseSemiStructured tensor subclass (#102135)
This PR adds in support for semi-structured sparsity via a tensor
subclass. It currently uses the CUTLASS kernels merged in PR #100881.

In the future we plan to add in cuSPARSELt support (see the other PRs in
the stack), which will give us larger performance gains.

This PR adds in 2 things:
- a Tensor subclass, `SparseSemiStructuredTensor` to store the
  sparse tensor in copmressed form and override `__torch_dispatch__`.
- a conversion function that takes in a dense tensor and a
  semi-structured sparse bool mask and creates an instance of the
  subclass.

**SparseSemiStructuredTensor**

The subclass stores the dense tensor in a contiguous flattened tensor
for future compatability with cuSPARSELt, which expects this format.
Note that the CUTLASS kernels do not have this limitation, as the
specified values and the metadata are passed separately in
`_structured_sparse_linear`. In the future we can use the cuSPARSELT bindings
[here](https://github.com/pytorch/pytorch/pull/103700) for faster matmul, better dtype converage, and relaxed shape
constraints.

Since we currently don't have a way to go back from the sparse
representation to the dense representation, and we store the weights in
compressed form, we don't have a great way to handle .t().

Instead, we keep track of how often we've called transpose on our
tensor, and if it's an unexpected number we throw an error. When the first
argument is sparse, we expect an even number of calls to transpose,
while when the second argument is sparse, we expect an odd number of
calls. This is because we support second argument sparse matrix
multiplications by using transpose properties.

**to_sparse_semi_structured**

This is a conversion function to convert a dense tensor and a
semi-structured sparse bool mask into a subclass. Currently, we must
pass in a bool mask, since we can't infer it becuase there may be
additional zero elements in the dense tensor, so `tensor !=0` is not 2:4
sparse.

Once we add either a method to derive the mask from the dense tensor or
cuSPARSELt, we no longer need to pass in the mask. cuSPARSELt has it's
own helper functions to create the metadata mask.

**User Details**

We have implemented support for the following ops for `torch.float16`
and `torch.int8`:
```
torch.addmm(bias, dense, sparse.t())
torch.mm(dense, sparse)
torch.mm(sparse, dense)
aten.linear.default
aten.t.default
aten.t.detach
```

The end user interface to accelerate a nn.Linaer module with the
subclass would look like this:

```
from torch.sparse import to_sparse_semi_structured

mask = torch.Tensor([0, 0, 1, 1]).tile(128, 32).cuda().bool()
linear = Model(128, 128).half().cuda()

linear.weight = nn.Parameter(to_sparse_semi_structured(linear.weight,
                                                       mask=linear.weight.bool())

```

This also updates tests and the `torch.sparse` module docstring to
reflect these changes.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/102135
Approved by: https://github.com/albanD
2023-06-27 19:21:06 +00:00
PyTorch MergeBot
b76a040b18 Revert "[core][pruning][sparse][feature] SparseSemiStructured tensor subclass (#102135)"
This reverts commit aea771de30.

Reverted https://github.com/pytorch/pytorch/pull/102135 on behalf of https://github.com/huydhn due to test_sparse_semi_structured.py::TestSparseSemiStructuredCUDA::test_mm_sparse_first_NT_cuda_int8 is still failing CUDA trunk jobs aea771de30 ([comment](https://github.com/pytorch/pytorch/pull/102135#issuecomment-1608744110))
2023-06-27 03:49:31 +00:00
Jesse Cai
aea771de30 [core][pruning][sparse][feature] SparseSemiStructured tensor subclass (#102135)
This PR adds in support for semi-structured sparsity via a tensor
subclass. It currently uses the CUTLASS kernels merged in PR #100881.

In the future we plan to add in cuSPARSELt support (see the other PRs in
the stack), which will give us larger performance gains.

This PR adds in 2 things:
- a Tensor subclass, `SparseSemiStructuredTensor` to store the
  sparse tensor in copmressed form and override `__torch_dispatch__`.
- a conversion function that takes in a dense tensor and a
  semi-structured sparse bool mask and creates an instance of the
  subclass.

**SparseSemiStructuredTensor**

The subclass stores the dense tensor in a contiguous flattened tensor
for future compatability with cuSPARSELt, which expects this format.
Note that the CUTLASS kernels do not have this limitation, as the
specified values and the metadata are passed separately in
`_structured_sparse_linear`. In the future we can use the cuSPARSELT bindings
[here](https://github.com/pytorch/pytorch/pull/103700) for faster matmul, better dtype converage, and relaxed shape
constraints.

Since we currently don't have a way to go back from the sparse
representation to the dense representation, and we store the weights in
compressed form, we don't have a great way to handle .t().

Instead, we keep track of how often we've called transpose on our
tensor, and if it's an unexpected number we throw an error. When the first
argument is sparse, we expect an even number of calls to transpose,
while when the second argument is sparse, we expect an odd number of
calls. This is because we support second argument sparse matrix
multiplications by using transpose properties.

**to_sparse_semi_structured**

This is a conversion function to convert a dense tensor and a
semi-structured sparse bool mask into a subclass. Currently, we must
pass in a bool mask, since we can't infer it becuase there may be
additional zero elements in the dense tensor, so `tensor !=0` is not 2:4
sparse.

Once we add either a method to derive the mask from the dense tensor or
cuSPARSELt, we no longer need to pass in the mask. cuSPARSELt has it's
own helper functions to create the metadata mask.

**User Details**

We have implemented support for the following ops for `torch.float16`
and `torch.int8`:
```
torch.addmm(bias, dense, sparse.t())
torch.mm(dense, sparse)
torch.mm(sparse, dense)
aten.linear.default
aten.t.default
aten.t.detach
```

The end user interface to accelerate a nn.Linaer module with the
subclass would look like this:

```
from torch.sparse import to_sparse_semi_structured

mask = torch.Tensor([0, 0, 1, 1]).tile(128, 32).cuda().bool()
linear = Model(128, 128).half().cuda()

linear.weight = nn.Parameter(to_sparse_semi_structured(linear.weight,
                                                       mask=linear.weight.bool())

```

This also updates tests and the `torch.sparse` module docstring to
reflect these changes.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/102135
Approved by: https://github.com/albanD
2023-06-27 02:37:00 +00:00
Mikayla Gawarecki
981f24e806 Add docstring to torch.serialization.register_package (#104046)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/104046
Approved by: https://github.com/albanD
2023-06-26 23:28:32 +00:00
PyTorch MergeBot
bfa08a1c67 Revert "[core][pruning][sparse][feature] SparseSemiStructured tensor subclass (#102135)"
This reverts commit cf5262a84f.

Reverted https://github.com/pytorch/pytorch/pull/102135 on behalf of https://github.com/huydhn due to Sorry for reverting your PR but test_sparse_semi_structured.py::TestSparseSemiStructuredCUDA::test_mm_sparse_first_NT_cuda_int8 is failing CUDA trunk jobs cf5262a84f. This looks like a landrace ([comment](https://github.com/pytorch/pytorch/pull/102135#issuecomment-1608423849))
2023-06-26 22:54:16 +00:00
Jesse Cai
cf5262a84f [core][pruning][sparse][feature] SparseSemiStructured tensor subclass (#102135)
This PR adds in support for semi-structured sparsity via a tensor
subclass. It currently uses the CUTLASS kernels merged in PR #100881.

In the future we plan to add in cuSPARSELt support (see the other PRs in
the stack), which will give us larger performance gains.

This PR adds in 2 things:
- a Tensor subclass, `SparseSemiStructuredTensor` to store the
  sparse tensor in copmressed form and override `__torch_dispatch__`.
- a conversion function that takes in a dense tensor and a
  semi-structured sparse bool mask and creates an instance of the
  subclass.

**SparseSemiStructuredTensor**

The subclass stores the dense tensor in a contiguous flattened tensor
for future compatability with cuSPARSELt, which expects this format.
Note that the CUTLASS kernels do not have this limitation, as the
specified values and the metadata are passed separately in
`_structured_sparse_linear`. In the future we can use the cuSPARSELT bindings
[here](https://github.com/pytorch/pytorch/pull/103700) for faster matmul, better dtype converage, and relaxed shape
constraints.

Since we currently don't have a way to go back from the sparse
representation to the dense representation, and we store the weights in
compressed form, we don't have a great way to handle .t().

Instead, we keep track of how often we've called transpose on our
tensor, and if it's an unexpected number we throw an error. When the first
argument is sparse, we expect an even number of calls to transpose,
while when the second argument is sparse, we expect an odd number of
calls. This is because we support second argument sparse matrix
multiplications by using transpose properties.

**to_sparse_semi_structured**

This is a conversion function to convert a dense tensor and a
semi-structured sparse bool mask into a subclass. Currently, we must
pass in a bool mask, since we can't infer it becuase there may be
additional zero elements in the dense tensor, so `tensor !=0` is not 2:4
sparse.

Once we add either a method to derive the mask from the dense tensor or
cuSPARSELt, we no longer need to pass in the mask. cuSPARSELt has it's
own helper functions to create the metadata mask.

**User Details**

We have implemented support for the following ops for `torch.float16`
and `torch.int8`:
```
torch.addmm(bias, dense, sparse.t())
torch.mm(dense, sparse)
torch.mm(sparse, dense)
aten.linear.default
aten.t.default
aten.t.detach
```

The end user interface to accelerate a nn.Linaer module with the
subclass would look like this:

```
from torch.sparse import to_sparse_semi_structured

mask = torch.Tensor([0, 0, 1, 1]).tile(128, 32).cuda().bool()
linear = Model(128, 128).half().cuda()

linear.weight = nn.Parameter(to_sparse_semi_structured(linear.weight,
                                                       mask=linear.weight.bool())

```

This also updates tests and the `torch.sparse` module docstring to
reflect these changes.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/102135
Approved by: https://github.com/albanD
2023-06-26 21:30:43 +00:00
Sergii Dymchenko
adf9595c2f Update CODEOWNERS (#103934)
Remove users that no longer have write access to the repo, resolving CODEOWNERS errors.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103934
Approved by: https://github.com/ZainRizvi, https://github.com/atalman, https://github.com/malfet
2023-06-26 19:29:29 +00:00
ZhaoqiongZ
7cef7195f6 [draft] Update Multiprocessing best practices with CPU device (#103229)
Fixes [#102498](https://github.com/pytorch/pytorch/issues/102498)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103229
Approved by: https://github.com/mingfeima, https://github.com/svekars, https://github.com/jgong5
2023-06-25 06:26:40 +00:00
Zachary DeVito
afc788a99c Re-land _cycleviz.py: visualize reference cycles holding cuda memory (#104051)
Reference cycles are freed by the cycle collector rather than being cleaned up
when the objects in the cycle first become unreachable. If a cycle points to a tensor,
the CUDA memory for that tensor will not be freed until garbage collection runs.
Accumulation of CUDA allocations can lead to out of memory errors (OOMs), as well as
non-deterministic allocation behavior which is harder to debug.

This visualizer installs a garbage collection hook to look for cycles containing
CUDA tensors and saves a visualization of the garbage:

```
from torch.cuda._cycleviz import warn_tensor_cycles
warn_tensor_cycles()
# do some work that results in a cycle getting garbage collected
# ...
> WARNING:root:Reference cycle includes a CUDA Tensor see visualization of cycle /tmp/tmpeideu9gl.html
```

Reland to make windows skip the test.

This reverts commit 7b3b6dd426.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/104051
Approved by: https://github.com/aaronenyeshi, https://github.com/malfet
2023-06-23 13:44:58 +00:00
PyTorch MergeBot
7b3b6dd426 Revert "_cycleviz.py: visualize reference cycles holding cuda memory (#102656)"
This reverts commit dba67f71c9.

Reverted https://github.com/pytorch/pytorch/pull/102656 on behalf of https://github.com/huydhn due to Sorry for reverting your PR. But I think the change is failing on Windows CUDA https://github.com/pytorch/pytorch/actions/runs/5341701630/jobs/9683293600 ([comment](https://github.com/pytorch/pytorch/pull/102656#issuecomment-1603035364))
2023-06-22 17:16:47 +00:00
albanD
4143b6b89b Add torch_dispatch and modes to extending.rst note (#102087)
The following subjects are not in this PR and will be done in a follow up:
- Go through torch_function section and update to the latest phrasing and link to the proper new sections
- Go through torch.library and custom device docs to add links to the new sections as appropriate
- Top level explanations on which component should be used
Pull Request resolved: https://github.com/pytorch/pytorch/pull/102087
Approved by: https://github.com/janeyx99
2023-06-22 12:56:35 +00:00
Zachary DeVito
dba67f71c9 _cycleviz.py: visualize reference cycles holding cuda memory (#102656)
Reference cycles are freed by the cycle collector rather than being cleaned up
when the objects in the cycle first become unreachable. If a cycle points to a tensor,
the CUDA memory for that tensor will not be freed until garbage collection runs.
Accumulatin of CUDA allocations can lead to out of memory errors (OOMs), as well as
non-deterministic allocation behavior which is harder to debug.

This visualizer installs a garbage collection hook to look for cycles containing
CUDA tensors and saves a visualization of the garbage:

```
from torch.cuda._cycleviz import warn_tensor_cycles
warn_tensor_cycles()
# do some work that results in a cycle getting garbage collected
# ...
> WARNING:root:Reference cycle includes a CUDA Tensor see visualization of cycle /tmp/tmpeideu9gl.html
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/102656
Approved by: https://github.com/aaronenyeshi
2023-06-22 04:00:28 +00:00
Michael Suo
a475ea4542 [fx] change from #users to num_users in graph printout (#101140)
`#users` means stuff in various chat apps, which makes it annoying to copypasta graphs into them.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/101140
Approved by: https://github.com/ezyang
2023-06-20 21:24:32 +00:00
PyTorch MergeBot
e031dd23b0 Revert "To add brief intro for CPU backend optimization (#103666)"
This reverts commit 013ffe457e.

Reverted https://github.com/pytorch/pytorch/pull/103666 on behalf of https://github.com/huydhn due to Failing doc tests in trunk 013ffe457e ([comment](https://github.com/pytorch/pytorch/pull/103666#issuecomment-1599301270))
2023-06-20 18:33:01 +00:00
Zaili Wang
013ffe457e To add brief intro for CPU backend optimization (#103666)
This PR is about adding brief introduction for x86 CPU backend optimization. Per previous discussion, the former PR #103307 was closed and creating this one, the contents are put into a new file.
@Guobing-Chen @jgong5 @mingfeima @jingxu10 please help review, thanks.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/103666
Approved by: https://github.com/jgong5, https://github.com/malfet
2023-06-20 17:35:22 +00:00
leslie-fang-intel
9832cfbbfe Quantization oneDNN backend only support VNNI CPU (#103653)
**Summary**

- Update the quantization document that default qconfig with oneDNN backend is recommended to be used on CPUs with Vector Neural Network Instruction support.
- Add the warning message when user uses default qconfig with oneDNN backend on CPU without Vector Neural Network Instruction support.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103653
Approved by: https://github.com/jgong5, https://github.com/malfet
2023-06-19 09:50:07 +00:00
albanD
918fe519a0 Use the new analytics ID (#103766)
Re: https://github.com/pytorch/pytorch.github.io/issues/1397
Following the migration to latest google analytics
FYI @malfet
Pull Request resolved: https://github.com/pytorch/pytorch/pull/103766
Approved by: https://github.com/svekars
2023-06-16 23:21:08 +00:00
Edward Z. Yang
bc6ec97e02 Switch dynamic_shapes to True by default (#103597)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103597
Approved by: https://github.com/voznesenskym
2023-06-15 15:16:20 +00:00
Mark Saroufim
ea384cd377 torch.compiler public namespace (#102182)
# torch.compiler public API

## Goal

The goal of this document is to describe the public facing API for torchdynamo and torchinductor.

Today both dynamo and torchinductor are in `torch/_dynamo` and `torch/_inductor` namespace with the only public function

`torch.compile()` which is directly placed in `torch/__init__.py`

This poses a few problems for users trying to take dependencies on PyTorch 2.0
1. Unclear BC guarantees
2. No builtin discovery mechanism outside of reading the source code
3. No hard requirements for docstrings or type annotations

Most importantly it mixes two personas the PyTorch 2.0 developer vs the PyTorch 2.0 customer so this is an attempt to address this. We draw a lot of inspiration from the `functorch` migration to the `func` namespace.

## Alternate names

We did discuss some other alternative names

1. `torch.compile` -> problem is this would break BC on the existing `torch.compile` function
2. `torch.dynamo` -> `dynamo` is so far not something we've deliberately hidden from users but problem is now figuring out what it's `_dynamo` vs `dynamo` might be confusing
3. `torch.compiler` -> 1 would be better but to keep BC this is a good compromise

# The general approach
## Proposal 1
In https://github.com/pytorch/pytorch/blob/main/torch/_dynamo/__init__.py

We have function called `reset()`, this function is essential if users are trying to `torch.compile()` a model under different settings

```python
# in _dynamo/
def reset():
    do_reset_stuff()
```

Instead we propose

```python
# in compiler/
def reset():
    do_reset_stuff() # As in copy paste the logic from _dynamo.reset

# in _dynamo/
import warnings
import inspect

def reset():
    function_name = inspect.currentframe().f_code.co_name
    warnings.warn(f"{function_name} is deprecated, use compiler.{function_name} instead", DeprecationWarning)
    return compiler.reset()

```
## Proposal 2

```python
# in compiler/
def reset():
    “””
    Docstrings here
    “””
    _dynamo.reset()

# in _dynamo/
No changes
```
Consensus so far seems to be proposal 2 since fewer warnings will be less jarring and it’ll make it quite easy to merge the public API

## Docstrings

The above was an example of a function that has no inputs or outputs but there are other functions which could use an improvement in their docstrings, for example allow_in_graph actually works over lists of functions but that’s not mentioned anywhere in the example only if you read the source code.

def allow_in_graph(fn):
    """
    Customize which functions TorchDynamo will include in the generated
    graph. Similar to `torch.fx.wrap()`.

    Parameters:
        fn (callable or list/tuple): The function(s) to be allowed in the graph.

    Returns:
        callable or list/tuple: The input function(s) included in the graph.

    Examples:
        Customize inclusion of a single function:
        ::
            torch._dynamo.allow_in_graph(my_custom_function)

        Customize inclusion of multiple functions:
        ::
            torch._dynamo.allow_in_graph([my_custom_function1, my_custom_function2])

        @torch._dynamo.optimize(...)
        def fn(a):
            x = torch.add(x, 1)
            x = my_custom_function(x)
            x = torch.add(x, 1)
            return x

        fn(...)

    Notes:
        The `allow_in_graph` function allows customization of which functions TorchDynamo
        includes in the generated graph. It can be used to include specific functions that
        are not automatically captured by TorchDynamo.

        If `fn` is a list or tuple, `allow_in_graph` will be called recursively on each
        element in the sequence.

        Once a function is allowed in the graph using `allow_in_graph`, it will be captured
        in the graph generated by TorchDynamo. This customization enables more fine-grained
        control over the functions included in the graph.

        Note that `allow_in_graph` expects the input `fn` to be a callable.

    """
    if isinstance(fn, (list, tuple)):
        return [allow_in_graph(x) for x in fn]
    assert callable(fn), "allow_in_graph expects a callable"
    allowed_functions._allowed_function_ids.add(id(fn))
    allowed_functions._disallowed_function_ids.remove(id(fn))
    return fn

So to make the API public, we’d have to write similar docstrings for all public functions we’d like to create.

The benefit of this approach is that
1. No BC risks, internal and external users relying on our tooling can slowly wean off the private functions.
2. We will also have to write correct docstrings which will automatically make our documentation easier to maintain and render correctly on pytorch.org
3. We already have some BC guarantees already, we don’t kill OptimizedModule, we rejected the PR to change the config system

The con of this approach is that
Will be stuck with some potentially suboptimal functions/classes that you can’t kill

## Testing strategy
If the approach is to mostly make a public function call an already tested private function then all we need to do is ensure that the function signatures don't change

## Which functions should be in the public API

Our heuristic for deciding whether something should be public or not is are users already relying on it for lack of other options or have we recommended some non public functions for users to debug their PT 2.0 programs.

Heuristic for not putting something in public is that it’s an experimental subsystem with the goal of turning it on by default, it’s very core dev centric, meta centric, a bunch of different configs that should be batched into a single user facing one, or something that needs to be renamed because the name is confusing

#### Top level
`torch.compile()` -> already is a public API it does require some minor improvements like having configs be passed in to any backend and not just inductor (EDIT: This was already done https://github.com/pytorch/pytorch/pull/99645l) and renaming `mode=reduce-overhead` to `mode=cudagraph`

To make sure that PT 2.0 is supported with a given pytorch version users can create a new public function and this would replace the need for `try/except` blocks around `import torch._dynamo` that has been populating user code.

```python
def pt2_enabled():
    if hasattr(torch, 'compile'):
        return True
    else:
        return False
```

For all of the below they will be translated to `torch.compiler.function_name()`

#### From _dynamo

As a starting point we looked at https://github.com/pytorch/pytorch/blob/main/torch/_dynamo/__init__.py and we suggest redefining these functions in `pytorch/torch/compiler/__init__.py`

It might also make sense to split them over multiple files and import them in `__init__.py` but because the number of functions is small it'd probably be fine to add them all into a single compiler/__init__.py until this list becomes larger

1. `reset()`
2. `allow_in_graph()`
10. `list_backends()`
12. `compile()`:  torch.compile() would be mostly a shell function passing arguments to torch.compiler.compile()
13. `assume_constant_result()`: TODO: Double check how this is useful
15. `torch._dynamo.disable()`

Some notable omissions
11. `explain()`: We need to clean up the output for this function, make it a data class and pretty printable
1. `forbid_in_graph()`: Considered adding this but should instead consolidate on `disallow_in_graph`
2. `optimize_assert()`: Already covered by `torch.compile(fullgraph=True)`
3. `check_if_dynamo_supported()`: this would be supplanted by pt2_enabled()
4. `compilation_metrics`, `graph_breaks_reasons` ..: would all be accessed via `torch.compiler.explain()`
5. `replay` does not seem useful to end customers
6. . `graph_break()`: Mostly useful for debugging or unit tests
9. `register_backend()`: End users will just pass a string backend to torch.compile, only devs will create new backends
10. `export()` : Eventually this needs to public but for now it’s not ready so just highlighting that it will be in the public API eventually
11. `disallow_in_graph()`: Usage is limited
12. `mark_static()`: we can keep this private until dynamic=True is recommended in stable
13. `mark_dynamic()`:  we can keep this private until dynamic=True is recommended in trunk
14. 8. `OptimizedModule`: This is the only class that we'd expose but is crucial since users are running code like `if isinstance(mod, OptimizedModule): torch.save(mod._orig_mod)` EDIT: because we fixed pickling we no longer need to
expose this
15. `is_compiling()`: Still not clear how this useful to end users

There are also config variables which we need to expose https://github.com/pytorch/pytorch/blob/main/torch/_dynamo/config.py

Some of our configs are useful dev flags, others are to gate experimental functionality and others are essential debugging tools and we seperate out the essential debugging and logging tools to a public facing config.

TODO: I still need to think of a good way of porting the config in a BC way here are some ideas
1. Just make all passes available and controllable via `torch.compile(options={})` but only show docstrings for the ones users should care about.

The current problem with our config system is we have 3 ways of setting them once via `options={}`, environment variables and variables in `config.py`, it'd be worth settling on one source of truth and have that be the public API.

The configs we should make public are
1. `log_file_name`
2. `verbose`
3. `cache_size_limit`
4. `repro_level` and `repro_after`: Although we can rename these to minifier and give human readable names to the levels

Everything else should stay private in particular

1. `print_graph_breaks`, `print_specializations`: should be supplanted by `explain()` for public users
2. dynamic shape configs : Users should only have to worry about `torch.compile(dynamic=True/False)`
3. The distributed flags, hook or guard configs: If we tell a user to use FSDP and DDP then the flag should be enabled by default or be in a private namespace
4. The fbcode flags: Obviously no need to be user facing
5. Skip/Allow lists: Not something normal users should play around with

#### From _inductor
Very little of inductor should be exposed in a public facing API, our core audience as in people writing models mostly just need information on what certain passes mean and how to control them a high level and they can do this with `torch.compile(options={})` so the goal here should be more to make available passes clearer and ideally consolidate them into `torch.compile()` docstrings or modes.

There are some exceptions though from https://github.com/pytorch/pytorch/blob/main/torch/_inductor/__init__.py

1. `list_mode_options()`
2. `list_options()`: this needs an additional pass to hide internal or debug options

For both of these we’d rename them to compiler.inductor_list_mode_options and compiler.inductor_list_options() since they would be in the same init file as the one for dynamo

Notable omissions
1. `_inductor.compile()`: Because of users are coming in with their own fx graph, they are likely developers
2. `_inductor.aot_compile()`:Again this is about capturing and modifying fx graphs so users APIs don't need to be public

However the configs are a slightly different story, because we can choose to either
1. Make all configs public
2. Make some configs public and keep most of the private ones. If public config is set it should override the private version
3. Make all configs controllable via `torch.compile(options={})` but make list_options() hide more things

For now 3 seems like the most reasonable choice with some high level configs we’ll keep like TORCH_COMPILE_DEBUG

Regardless here's what should probably be public or advertised more
1. `disable_progress` and verbose_progress:  Combine and enable by default
2. `fallback_random`: We could make the case this shouldn't be public if a top level deterministic mode enables this
3. `profile_bandwidth`: Or could make the case that this should be in TORCH_COMPILE_DEBUG

Notable omissions
1. Any config that would generally improve performance for most that we should probably enable by default but might be disabled in the short term because of stability: example `epilogue_fusion`, `pattern_matcher`, `reordering`
2. Autotuning flags: Should just sit behind `torch.compile(mode="max-autotune")` like `max_autotune`, `max_autotune_gemm`
3. `coordinate_descent_tuning`: This one I'm a but mixed about, maybe it just also fall into `mode="max-autotune"`
4. `trace`: `TORCH_COMPILE_DEBUG` is the best flag for all of this
5. `triton.cudagraphs`: Default should be `torch.compile(mode="reduce-overhead")` - I'd go further and rename the `mode=cudagraph` and we can keep reduce-overhead for BC reasons
6. `triton_unique_kernel_names`: Mostly useful for devs debugging
7. `dce`: which doesnt really do anything
8. `shape_padding`: Elias is working on enabling this by default in which case we also remove it

## Mechanics

This PR would include the public functions with their docstrings

Another PR will take a stab at the configs

And for work where the APIs are still being cleaned up whether its minifier or escape hatches, export or dynamic shapes, aot_inductor etc.. we’ll keep them private until a public commitment can be made

Pull Request resolved: https://github.com/pytorch/pytorch/pull/102182
Approved by: https://github.com/jansel, https://github.com/albanD
2023-06-13 19:52:17 +00:00
Michael Lazos
6c6c897d6b Add graph break logging option instead of config flag (#103202)
Make graph break logging a logging option vs a config setting

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103202
Approved by: https://github.com/yanboliang, https://github.com/anijain2305
2023-06-12 19:52:31 +00:00