Commit Graph

2108 Commits

Author SHA1 Message Date
PyTorch MergeBot
258d398eec Revert "torch.compiler public namespace (#102182)"
This reverts commit b5840f99c3.

Reverted https://github.com/pytorch/pytorch/pull/102182 on behalf of https://github.com/DanilBaibak due to Break internal build ([comment](https://github.com/pytorch/pytorch/pull/102182#issuecomment-1576144551))
2023-06-05 06:52:37 +00:00
Mark Saroufim
b5840f99c3 torch.compiler public namespace (#102182)
# torch.compiler public API

## Goal

The goal of this document is to describe the public facing API for torchdynamo and torchinductor.

Today both dynamo and torchinductor are in `torch/_dynamo` and `torch/_inductor` namespace with the only public function

`torch.compile()` which is directly placed in `torch/__init__.py`

This poses a few problems for users trying to take dependencies on PyTorch 2.0
1. Unclear BC guarantees
2. No builtin discovery mechanism outside of reading the source code
3. No hard requirements for docstrings or type annotations

Most importantly it mixes two personas the PyTorch 2.0 developer vs the PyTorch 2.0 customer so this is an attempt to address this. We draw a lot of inspiration from the `functorch` migration to the `func` namespace.

## Alternate names

We did discuss some other alternative names

1. `torch.compile` -> problem is this would break BC on the existing `torch.compile` function
2. `torch.dynamo` -> `dynamo` is so far not something we've deliberately hidden from users but problem is now figuring out what it's `_dynamo` vs `dynamo` might be confusing
3. `torch.compiler` -> 1 would be better but to keep BC this is a good compromise

# The general approach
## Proposal 1
In https://github.com/pytorch/pytorch/blob/main/torch/_dynamo/__init__.py

We have function called `reset()`, this function is essential if users are trying to `torch.compile()` a model under different settings

```python
# in _dynamo/
def reset():
    do_reset_stuff()
```

Instead we propose

```python
# in compiler/
def reset():
    do_reset_stuff() # As in copy paste the logic from _dynamo.reset

# in _dynamo/
import warnings
import inspect

def reset():
    function_name = inspect.currentframe().f_code.co_name
    warnings.warn(f"{function_name} is deprecated, use compiler.{function_name} instead", DeprecationWarning)
    return compiler.reset()

```
## Proposal 2

```python
# in compiler/
def reset():
    “””
    Docstrings here
    “””
    _dynamo.reset()

# in _dynamo/
No changes
```
Consensus so far seems to be proposal 2 since fewer warnings will be less jarring and it’ll make it quite easy to merge the public API

## Docstrings

The above was an example of a function that has no inputs or outputs but there are other functions which could use an improvement in their docstrings, for example allow_in_graph actually works over lists of functions but that’s not mentioned anywhere in the example only if you read the source code.

def allow_in_graph(fn):
    """
    Customize which functions TorchDynamo will include in the generated
    graph. Similar to `torch.fx.wrap()`.

    Parameters:
        fn (callable or list/tuple): The function(s) to be allowed in the graph.

    Returns:
        callable or list/tuple: The input function(s) included in the graph.

    Examples:
        Customize inclusion of a single function:
        ::
            torch._dynamo.allow_in_graph(my_custom_function)

        Customize inclusion of multiple functions:
        ::
            torch._dynamo.allow_in_graph([my_custom_function1, my_custom_function2])

        @torch._dynamo.optimize(...)
        def fn(a):
            x = torch.add(x, 1)
            x = my_custom_function(x)
            x = torch.add(x, 1)
            return x

        fn(...)

    Notes:
        The `allow_in_graph` function allows customization of which functions TorchDynamo
        includes in the generated graph. It can be used to include specific functions that
        are not automatically captured by TorchDynamo.

        If `fn` is a list or tuple, `allow_in_graph` will be called recursively on each
        element in the sequence.

        Once a function is allowed in the graph using `allow_in_graph`, it will be captured
        in the graph generated by TorchDynamo. This customization enables more fine-grained
        control over the functions included in the graph.

        Note that `allow_in_graph` expects the input `fn` to be a callable.

    """
    if isinstance(fn, (list, tuple)):
        return [allow_in_graph(x) for x in fn]
    assert callable(fn), "allow_in_graph expects a callable"
    allowed_functions._allowed_function_ids.add(id(fn))
    allowed_functions._disallowed_function_ids.remove(id(fn))
    return fn

So to make the API public, we’d have to write similar docstrings for all public functions we’d like to create.

The benefit of this approach is that
1. No BC risks, internal and external users relying on our tooling can slowly wean off the private functions.
2. We will also have to write correct docstrings which will automatically make our documentation easier to maintain and render correctly on pytorch.org
3. We already have some BC guarantees already, we don’t kill OptimizedModule, we rejected the PR to change the config system

The con of this approach is that
Will be stuck with some potentially suboptimal functions/classes that you can’t kill

## Testing strategy
If the approach is to mostly make a public function call an already tested private function then all we need to do is ensure that the function signatures don't change

## Which functions should be in the public API

Our heuristic for deciding whether something should be public or not is are users already relying on it for lack of other options or have we recommended some non public functions for users to debug their PT 2.0 programs.

Heuristic for not putting something in public is that it’s an experimental subsystem with the goal of turning it on by default, it’s very core dev centric, meta centric, a bunch of different configs that should be batched into a single user facing one, or something that needs to be renamed because the name is confusing

#### Top level
`torch.compile()` -> already is a public API it does require some minor improvements like having configs be passed in to any backend and not just inductor (EDIT: This was already done https://github.com/pytorch/pytorch/pull/99645l) and renaming `mode=reduce-overhead` to `mode=cudagraph`

To make sure that PT 2.0 is supported with a given pytorch version users can create a new public function and this would replace the need for `try/except` blocks around `import torch._dynamo` that has been populating user code.

```python
def pt2_enabled():
    if hasattr(torch, 'compile'):
        return True
    else:
        return False
```

For all of the below they will be translated to `torch.compiler.function_name()`

#### From _dynamo

As a starting point we looked at https://github.com/pytorch/pytorch/blob/main/torch/_dynamo/__init__.py and we suggest redefining these functions in `pytorch/torch/compiler/__init__.py`

It might also make sense to split them over multiple files and import them in `__init__.py` but because the number of functions is small it'd probably be fine to add them all into a single compiler/__init__.py until this list becomes larger

1. `reset()`
2. `allow_in_graph()`
10. `list_backends()`
12. `compile()`:  torch.compile() would be mostly a shell function passing arguments to torch.compiler.compile()
13. `assume_constant_result()`: TODO: Double check how this is useful
15. `torch._dynamo.disable()`

Some notable omissions
11. `explain()`: We need to clean up the output for this function, make it a data class and pretty printable
1. `forbid_in_graph()`: Considered adding this but should instead consolidate on `disallow_in_graph`
2. `optimize_assert()`: Already covered by `torch.compile(fullgraph=True)`
3. `check_if_dynamo_supported()`: this would be supplanted by pt2_enabled()
4. `compilation_metrics`, `graph_breaks_reasons` ..: would all be accessed via `torch.compiler.explain()`
5. `replay` does not seem useful to end customers
6. . `graph_break()`: Mostly useful for debugging or unit tests
9. `register_backend()`: End users will just pass a string backend to torch.compile, only devs will create new backends
10. `export()` : Eventually this needs to public but for now it’s not ready so just highlighting that it will be in the public API eventually
11. `disallow_in_graph()`: Usage is limited
12. `mark_static()`: we can keep this private until dynamic=True is recommended in stable
13. `mark_dynamic()`:  we can keep this private until dynamic=True is recommended in trunk
14. 8. `OptimizedModule`: This is the only class that we'd expose but is crucial since users are running code like `if isinstance(mod, OptimizedModule): torch.save(mod._orig_mod)` EDIT: because we fixed pickling we no longer need to
expose this
15. `is_compiling()`: Still not clear how this useful to end users

There are also config variables which we need to expose https://github.com/pytorch/pytorch/blob/main/torch/_dynamo/config.py

Some of our configs are useful dev flags, others are to gate experimental functionality and others are essential debugging tools and we seperate out the essential debugging and logging tools to a public facing config.

TODO: I still need to think of a good way of porting the config in a BC way here are some ideas
1. Just make all passes available and controllable via `torch.compile(options={})` but only show docstrings for the ones users should care about.

The current problem with our config system is we have 3 ways of setting them once via `options={}`, environment variables and variables in `config.py`, it'd be worth settling on one source of truth and have that be the public API.

The configs we should make public are
1. `log_file_name`
2. `verbose`
3. `cache_size_limit`
4. `repro_level` and `repro_after`: Although we can rename these to minifier and give human readable names to the levels

Everything else should stay private in particular

1. `print_graph_breaks`, `print_specializations`: should be supplanted by `explain()` for public users
2. dynamic shape configs : Users should only have to worry about `torch.compile(dynamic=True/False)`
3. The distributed flags, hook or guard configs: If we tell a user to use FSDP and DDP then the flag should be enabled by default or be in a private namespace
4. The fbcode flags: Obviously no need to be user facing
5. Skip/Allow lists: Not something normal users should play around with

#### From _inductor
Very little of inductor should be exposed in a public facing API, our core audience as in people writing models mostly just need information on what certain passes mean and how to control them a high level and they can do this with `torch.compile(options={})` so the goal here should be more to make available passes clearer and ideally consolidate them into `torch.compile()` docstrings or modes.

There are some exceptions though from https://github.com/pytorch/pytorch/blob/main/torch/_inductor/__init__.py

1. `list_mode_options()`
2. `list_options()`: this needs an additional pass to hide internal or debug options

For both of these we’d rename them to compiler.inductor_list_mode_options and compiler.inductor_list_options() since they would be in the same init file as the one for dynamo

Notable omissions
1. `_inductor.compile()`: Because of users are coming in with their own fx graph, they are likely developers
2. `_inductor.aot_compile()`:Again this is about capturing and modifying fx graphs so users APIs don't need to be public

However the configs are a slightly different story, because we can choose to either
1. Make all configs public
2. Make some configs public and keep most of the private ones. If public config is set it should override the private version
3. Make all configs controllable via `torch.compile(options={})` but make list_options() hide more things

For now 3 seems like the most reasonable choice with some high level configs we’ll keep like TORCH_COMPILE_DEBUG

Regardless here's what should probably be public or advertised more
1. `disable_progress` and verbose_progress:  Combine and enable by default
2. `fallback_random`: We could make the case this shouldn't be public if a top level deterministic mode enables this
3. `profile_bandwidth`: Or could make the case that this should be in TORCH_COMPILE_DEBUG

Notable omissions
1. Any config that would generally improve performance for most that we should probably enable by default but might be disabled in the short term because of stability: example `epilogue_fusion`, `pattern_matcher`, `reordering`
2. Autotuning flags: Should just sit behind `torch.compile(mode="max-autotune")` like `max_autotune`, `max_autotune_gemm`
3. `coordinate_descent_tuning`: This one I'm a but mixed about, maybe it just also fall into `mode="max-autotune"`
4. `trace`: `TORCH_COMPILE_DEBUG` is the best flag for all of this
5. `triton.cudagraphs`: Default should be `torch.compile(mode="reduce-overhead")` - I'd go further and rename the `mode=cudagraph` and we can keep reduce-overhead for BC reasons
6. `triton_unique_kernel_names`: Mostly useful for devs debugging
7. `dce`: which doesnt really do anything
8. `shape_padding`: Elias is working on enabling this by default in which case we also remove it

## Mechanics

This PR would include the public functions with their docstrings

Another PR will take a stab at the configs

And for work where the APIs are still being cleaned up whether its minifier or escape hatches, export or dynamic shapes, aot_inductor etc.. we’ll keep them private until a public commitment can be made

Pull Request resolved: https://github.com/pytorch/pytorch/pull/102182
Approved by: https://github.com/jansel
2023-06-02 14:38:55 +00:00
Weiming Zhao
b76af5f9a6 Fix broken link in Dynamo's guards doc (#102183) (#102185)
This PR fixes broken link for the code referenced in the guards doc.

Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/102185
Approved by: https://github.com/mikaylagawarecki, https://github.com/ezyang
2023-06-02 14:36:28 +00:00
Thomas J. Fan
0d17bd5fa4 DOC Fixes unpacking issue in dynamo explain docs (#101761)
This PR updates the docs to be consistent with `torch.explain` which currently returns 6 items:

bfb3941ad8/torch/_dynamo/eval_frame.py (L622-L629)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/101761
Approved by: https://github.com/desertfire
2023-05-25 22:32:15 +00:00
Elias Ellison
aa83a52742 Profiling doc (#101895)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/101895
Approved by: https://github.com/msaroufim, https://github.com/shunting314
2023-05-25 04:57:38 +00:00
Elias Ellison
4692ea76a0 Fine grained apis docs (#101897)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/101897
Approved by: https://github.com/msaroufim
2023-05-23 19:03:44 +00:00
Elias Ellison
2bce7c8f46 CUDAGraph trees doc (#101902)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/101902
Approved by: https://github.com/msaroufim
2023-05-23 03:35:43 +00:00
Ramil Nugmanov
2ae87a1f87 missed StackDataset documentation (#101927)
New dataset class added by #101338 missed in documentation.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/101927
Approved by: https://github.com/kit1980
2023-05-22 21:12:16 +00:00
Ren Pang
a630328695 Fix Backend docs search items (#101214)
Fixes #100944

## New

<img width="1142" alt="image" src="https://github.com/pytorch/pytorch/assets/13214530/79102f2e-8a8f-4169-be53-9248397e653c">

<img width="765" alt="image" src="https://github.com/pytorch/pytorch/assets/13214530/4e5f17e7-a445-4822-ac8a-0d73c9ed71ee">

## Old

<img width="1341" alt="image" src="https://github.com/pytorch/pytorch/assets/13214530/985b4ec9-6d11-4962-8619-3c14ec09c3d9">

<img width="1112" alt="image" src="https://github.com/pytorch/pytorch/assets/13214530/e8dcf1a9-73e7-4fd6-8adc-eb036b1bb87b">

Pull Request resolved: https://github.com/pytorch/pytorch/pull/101214
Approved by: https://github.com/albanD
2023-05-22 14:58:38 +00:00
Rickey K. Liang
807d81155f [CUDA][CUBLAS] Fix BF16 reduced precision reduction note in Numerical accuracy docs (#101884)
Fixes #100966

Ref #101044

Align implementation and documentation. (This is what's previously missed from the above issue and PR)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/101884
Approved by: https://github.com/eqy, https://github.com/ezyang
2023-05-21 17:38:00 +00:00
Mark Saroufim
3666ca9d97 Dynamic Shape Doc (#101885)
<!--
copilot:poem
-->
### <samp>🤖 Generated by Copilot at 2f25c1e</samp>

> _Dynamic shapes guide_
> _`TorchDynamo` and `TorchInductor`_
> _Learn from data flow_

Thanks @ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/101885
Approved by: https://github.com/eellison, https://github.com/ezyang
2023-05-19 21:43:22 +00:00
Mark Saroufim
ff5b9428aa Fake Tensor Docs (#101882)
<!--
copilot:poem
-->
### <samp>🤖 Generated by Copilot at 75f33ae</samp>

> _Fake tensors help_
> _compile and optimize code_
> _`PT2` in autumn_

Thanks @ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/101882
Approved by: https://github.com/eellison, https://github.com/ezyang
2023-05-19 21:39:34 +00:00
Mark Saroufim
581d13a069 Add Logging Doc to compile index (#101888)
<!--
copilot:poem
-->
### <samp>🤖 Generated by Copilot at ba85a41</samp>

> _`logging` module_
> _documents PyTorch events_
> _cutting through the fog_

Thanks @mlazos
Pull Request resolved: https://github.com/pytorch/pytorch/pull/101888
Approved by: https://github.com/eellison
2023-05-19 21:29:25 +00:00
Mark Saroufim
2dd33c71c1 Docs for torchcompile and functorch (#101881)
<!--
copilot:poem
-->
### <samp>🤖 Generated by Copilot at b5f48b6</samp>

> _`torch.compile` docs_
> _Add a new section for `func`_
> _Winter of features_

Thanks @zou3519
Pull Request resolved: https://github.com/pytorch/pytorch/pull/101881
Approved by: https://github.com/eellison, https://github.com/zou3519
2023-05-19 21:23:43 +00:00
Jane Xu
cde597efa1 [docs] Warn that GradScaler can scale under 1 (#101569)
Completes action item 1 in https://github.com/pytorch/pytorch/issues/99640

Pull Request resolved: https://github.com/pytorch/pytorch/pull/101569
Approved by: https://github.com/ngimel
2023-05-16 23:56:07 +00:00
PyTorch MergeBot
66eef31444 Revert "[fx] change from #users to num_users in graph printout (#101140)"
This reverts commit e568c5a18d.

Reverted https://github.com/pytorch/pytorch/pull/101140 on behalf of https://github.com/jeanschmidt due to There are internal changes to this commit that are preventing landing, so I am reverting to unblock the diff train ([comment](https://github.com/pytorch/pytorch/pull/101140#issuecomment-1547989487))
2023-05-15 14:35:22 +00:00
Ramin Azarmehr
0be53d83fc [MPS] Add support for MPSProfiler Python bindings (#101002)
- Added torch.mps.profiler.[start() and stop()] APIs with RST documentation
- Added test case in test_mps
Pull Request resolved: https://github.com/pytorch/pytorch/pull/101002
Approved by: https://github.com/malfet
2023-05-12 21:55:34 +00:00
Yueming Hao
a12b640dc9 Fix typos in troubleshooting.rst (#101305)
There are several typos in the troubleshooting documentation.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/101305
Approved by: https://github.com/desertfire
2023-05-12 21:05:13 +00:00
Ran Ding
b5c8d0359c Update autograd.rst (#101007)
Fixes #ISSUE_NUMBER

typo fix and small change to improve clarity

Pull Request resolved: https://github.com/pytorch/pytorch/pull/101007
Approved by: https://github.com/lezcano, https://github.com/anjali411
2023-05-12 11:47:51 +00:00
Michael Suo
e568c5a18d [fx] change from #users to num_users in graph printout (#101140)
`#users` means stuff in various chat apps, which makes it annoying to copypasta graphs into them.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/101140
Approved by: https://github.com/ezyang
2023-05-12 04:34:01 +00:00
eqy
33f3dca6b5 [CUDA][CUBLAS] Fix BF16 reduced precision reduction note in docs (#101044)
#100966

CC @ngimel @ezyang

Pull Request resolved: https://github.com/pytorch/pytorch/pull/101044
Approved by: https://github.com/ngimel
2023-05-10 06:50:58 +00:00
eqy
6e2efd16d8 [CUDA][CUBLAS] Add cuBLAS workspace allocation behavior to docs (#100919)
Adding to the docs for now, hopefully we can move to `cudaMallocAsync`-backed cuBLAS workspaces soon which should alleviate the recent confusion around `cuBLAS` "leaking" memory through workspaces.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100919
Approved by: https://github.com/ngimel
2023-05-10 06:40:26 +00:00
fduwjj
953aa6d90e [TP] Enable more generic attn in Tensor Parallelism (#100508)
To make TP more generic for Attention module, we come up with this new col/rowwise parallel style.

Basically, the idea behind is that:
We only do DTensor op for Col/Rowwise sharded part. For the rest of ATen ops, we will leave it to Tensor ops.

And we set this behavior as default for Colwise and Rowwise parallel style. If people want to customize it, they can always pass in different prepare_input or prepare_output

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100508
Approved by: https://github.com/wanchaol
2023-05-07 18:15:49 +00:00
Michael Lazos
850556ed6e Add "all" option to logging (#100664)
Adds the long-promised "all" option to logging.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100664
Approved by: https://github.com/lezcano
2023-05-06 01:11:18 +00:00
Michael Lazos
c525440ba3 Logging documentation updates (#100595)
Updated the logging.rst with info about the env var.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100595
Approved by: https://github.com/msaroufim, https://github.com/lezcano
2023-05-04 21:54:02 +00:00
Animesh Jain
8994d9e610 [dynamo] Hide guard_fail_hook behind a flag to improve cache lookup time (+10% DebertaV2) (#100590)
For TorchDynamo eager backend, DebertaV2 speedup improves from 0.77x to 0.87x.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100590
Approved by: https://github.com/voznesenskym, https://github.com/wconstab
2023-05-04 18:52:21 +00:00
Bin Bao
edebad81a9 Add a rst doc for the performance dashboard (#100592)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/100592
Approved by: https://github.com/msaroufim, https://github.com/huydhn
2023-05-04 18:28:09 +00:00
Richard Barnes
9c185b6b46 [codemod] Replace hasattr with getattr in caffe2/docs/source/notes/extending.rst (#100598)
Summary:
The pattern
```
X.Y if hasattr(X, "Y") else Z
```
can be replaced with
```
getattr(X, "Y", Z)
```

The [getattr](https://www.w3schools.com/python/ref_func_getattr.asp) function gives more succinct code than the [hasattr](https://www.w3schools.com/python/ref_func_hasattr.asp) function. Please use it when appropriate.

**This diff is very low risk. Green tests indicate that you can safely Accept & Ship.**

Test Plan: Sandcastle

Differential Revision: D44886464

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100598
Approved by: https://github.com/Skylion007
2023-05-04 16:36:15 +00:00
Angela Yi
8eb82135d1 [docs] Docs for writing ATen IR passes + FX Pattern matching (#100577)
I'm not really sure where to put this...maybe just link it somewhere in torch.compile docs?
Pull Request resolved: https://github.com/pytorch/pytorch/pull/100577
Approved by: https://github.com/msaroufim
2023-05-04 05:17:10 +00:00
shibo
6aeb85add8 add checkpoint support for custom device (#99626)
Fixes #ISSUE_NUMBER
1、add checkpoint support for custom device
2、add a device argument, I want to add a device="cuda" parameter to the func `forward` of `CheckpointFunction`, and I can specify the device type when using it, but the func `apply` of `torch.autograd.Function` does not support `kwargs`, so I added a variable named `_device`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99626
Approved by: https://github.com/soulitzer
2023-05-04 00:23:42 +00:00
vfdev-5
6a12f10b08 Publicly exposing torch.backends.cpu.get_cpu_capability() (#100164)
Description:

- As suggested by Nikita, created `torch.backends.cpu` submodule and exposed `get_cpu_capability`.

- In torchvision Resize method we want to know current cpu capability in order to pick appropriate codepath depending on cpu capablities

Newly coded vectorized resize of uint8 images on AVX2 supported CPUs is now faster than older way (uint8->float->resize->uint8). However, on non-avx hardware (e.g. Mac M1) certain configs are slower using native uint8.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100164
Approved by: https://github.com/albanD, https://github.com/malfet
2023-05-03 19:02:07 +00:00
Svetlana Karslioglu
d425da8bf3 Replace master with main in links and docs/conf.py (#100176)
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100176
Approved by: https://github.com/albanD, https://github.com/malfet
2023-05-02 18:20:32 +00:00
Hirochika Matsumoto
f143c92739 [docs] Fix typo in get-started.rst (#100355)
This PR changes `""nvprims_nvfuser"` which should be a typo to `"nvprims_nvfuser"`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100355
Approved by: https://github.com/Skylion007, https://github.com/kit1980
2023-05-02 00:29:53 +00:00
BowenBao
c94b6a6712 [ONNX] Introduce 'diagnostics' to 'dynamo_export' api (#99668)
Summary
* Introduce `DiagnosticContext` to `torch.onnx.dynamo_export`.
* Remove `DiagnosticEngine` in preparations to update 'diagnostics' in `dynamo_export` to drop dependencies on global diagnostic context. No plans to update `torch.onnx.export` diagnostics.

Next steps
* Separate `torch.onnx.export` diagnostics and `torch.onnx.dynamo_export` diagnostics.
* Drop dependencies on global diagnostic context. https://github.com/pytorch/pytorch/pull/100219
* Replace 'print's with 'logger.log'.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99668
Approved by: https://github.com/justinchuby, https://github.com/abock
2023-05-01 19:58:49 +00:00
pbialecki
8fe91d16b0 Remove CUDA 11.6 note from complex docs (#100118)
Removes note in the complex docs pointing to the CUDA 11.6 wheels introduced in https://github.com/pytorch/pytorch/pull/80363.
Background: this warning was added via https://github.com/pytorch/pytorch/issues/79876 which pointed out a slow compilation time in 11.3. The 11.6 pip wheels were thus recommended but are not build anymore as our current support is 11.7, 11.8 (and 12.1 experimental in nightlies).

The note is confusing users as it doesn't explain why 11.6 is needed.
Reference: https://discuss.pytorch.org/t/complex-numbers-cuda-11-6-documentation-warning/178588/1

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100118
Approved by: https://github.com/msaroufim
2023-04-27 16:26:27 +00:00
milesial
45bf3f6216 Optimized EMA implementation (#94820)
This PR proposes an optimized way to do Exponential Moving Average (EMA), which is faster than the current way using `swa_utils.AveragedModel` described in https://pytorch.org/docs/stable/optim.html#custom-averaging-strategies.

This implementation is asynchronous, and is built as an optimizer wrapper so that the EMA weight update happens without any additional CPU/GPU sync, just after optimizer steps, and with limited code changes.

Example usage:
```
model = Model().to(device)
opt = torch.optim.Adam(model.parameters())

opt = EMAOptimizer(opt, device, 0.9999)

for epoch in range(epochs):
    training_loop(model, opt)

    regular_eval_accuracy = evaluate(model)

    with opt.swap_ema_weights():
        ema_eval_accuracy = evaluate(model)
```

Here are some benchmarks (time per iteration) on various torchvision models:

|model|this PR iteration time                      |swa_utils.AveragedModel iteration time| iteration speedup                                      |
|-----|-----------------------------|-----------------------|---------------------------------------------|
|     |                             |                       |                                             |
|regnet_x_1_6gf|62.73                        |67.998                 |1.08                                         |
|regnet_x_3_2gf|101.75                       |109.422                |1.08                                         |
|regnet_x_400mf|25.13                        |32.005                 |1.27                                         |
|regnet_x_800mf|33.01                        |37.466                 |1.13                                         |
|regnet_x_8gf|128.13                       |134.868                |1.05                                         |
|regnet_y_16gf|252.91                       |261.292                |1.03                                         |
|regnet_y_1_6gf|72.14                        |84.22                  |1.17                                         |
|regnet_y_3_2gf|99.99                        |109.296                |1.09                                         |
|regnet_y_400mf|29.53                        |36.506                 |1.24                                         |
|regnet_y_800mf|37.82                        |43.634                 |1.15                                         |
|regnet_y_8gf|196.63                       |203.317                |1.03                                         |
|resnet101|128.80                       |137.434                |1.07                                         |
|resnet152|182.85                       |196.498                |1.07                                         |
|resnet18|29.06                        |29.975                 |1.03                                         |
|resnet34|50.73                        |53.443                 |1.05                                         |
|resnet50|76.88                        |80.602                 |1.05                                         |
|resnext101_32x8d|277.29                       |280.759                |1.01                                         |
|resnext101_64x4d|269.56                       |281.052                |1.04                                         |
|resnext50_32x4d|100.73                       |101.102                |1.00                                         |
|shufflenet_v2_x0_5|10.56                        |15.419                 |1.46                                         |
|shufflenet_v2_x1_0|13.11                        |18.525                 |1.41                                         |
|shufflenet_v2_x1_5|18.05                        |23.132                 |1.28                                         |
|shufflenet_v2_x2_0|25.04                        |30.008                 |1.20                                         |
|squeezenet1_1|14.26                        |14.325                 |1.00                                         |
|swin_b|264.52                       |274.613                |1.04                                         |
|swin_s|180.66                       |188.914                |1.05                                         |
|swin_t|108.62                       |112.632                |1.04                                         |
|swin_v2_s|220.29                       |231.153                |1.05                                         |
|swin_v2_t|127.27                       |133.586                |1.05                                         |
|vgg11|95.52                        |103.714                |1.09                                         |
|vgg11_bn|106.49                       |120.711                |1.13                                         |
|vgg13|132.94                       |147.063                |1.11                                         |
|vgg13_bn|149.73                       |165.256                |1.10                                         |
|vgg16|158.19                       |172.865                |1.09                                         |
|vgg16_bn|177.04                       |192.888                |1.09                                         |
|vgg19|184.76                       |194.194                |1.05                                         |
|vgg19_bn|203.30                       |213.334                |1.05                                         |
|vit_b_16|217.31                       |219.748                |1.01                                         |
|vit_b_32|69.47                        |75.692                 |1.09                                         |
|vit_l_32|223.20                       |258.487                |1.16                                         |
|wide_resnet101_2|267.38                       |279.836                |1.05                                         |
|wide_resnet50_2|145.06                       |154.918                |1.07                                         |

You can see that in all cases it is faster than using `AveragedModel`. In fact in many cases, adding EMA does not add any overhead since the computation is hidden behind the usual iteration flow.

This is a similar implementation to the one currently in [NVIDIA NeMo](https://github.com/NVIDIA/NeMo).

If the team is interested in merging this, let me know and I'll add some documentation similar to `swa_utils` and tests.

Credits to @szmigacz for the implementation.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94820
Approved by: https://github.com/janeyx99
2023-04-26 18:02:11 +00:00
Chris Gottbrath
f0e28b1cb9 Adding the maintainers approved in 2023Q1 Core Maintainers meeting (#98520)
Added Nikita to Core Maintainers
Merged MKLDNN with CPU Performance
Renamed CUDA to GPU Performance
Added Jiong to Compiler and CPU Performance
Added Xiaobing to CPU Performance
Marking Vitaly and Jian Hui as Emeritus
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98520
Approved by: https://github.com/ezyang, https://github.com/soumith, https://github.com/dzhulgakov
2023-04-24 17:58:18 +00:00
Kurt Mohler
1e8cf6ad7f Add documentation for torch._logging.set_logs (#99219)
Part of #98871

Pull Request resolved: https://github.com/pytorch/pytorch/pull/99219
Approved by: https://github.com/mlazos, https://github.com/lezcano
2023-04-24 08:06:57 +00:00
BowenBao
51742a467d [ONNX] Fix missing import numpy for docs example (#99663)
Fixes https://github.com/pytorch/pytorch/issues/99408
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99663
Approved by: https://github.com/justinchuby
2023-04-21 04:06:45 +00:00
Simon Seo
9f95032101 Fix broken links in contribution_guide.rst (#99295)
mainly from `master` to `main`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/99295
Approved by: https://github.com/kit1980
2023-04-20 22:20:56 +00:00
Will Constable
e6aa8e0729 Test and document dynamo backward hooks support (#99382)
No new support added, but backward hooks are working and now there is a test and some documentation about the limitations (hooks firing after whole graph).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/99382
Approved by: https://github.com/yanboliang
2023-04-18 03:03:29 +00:00
Will Constable
6eab5e88c8 Graph-break on allowed modules if they have hooks (#97184)
Allowed modules are stuck into dynamo's fx graph as call_module
nodes, without dynamo doing any tracing of the module.  This means
during AOT trace time, hooks will fire during tracing when the
call_module is executed, but the hooks themselves will disappear
after that and not be present in the compiled program.
  (worse, if they performed any tensor operations, those would get
   traced so you could end up with part of the hook's functionality).

To circumvent this, there are two options for 'allowed modules' with hooks.
1) don't treat them as 'allowed' - trace into them
2) graph-break, so the module is no longer part of the dynamo trace at all

(1) will fail for users that opted into allowed modules becuase they know
    their module has problems being traced by dynamo.
(2) causes graph breaks on common modules such as nn.Linear, just because they
    are marked as 'allowed'.

It would help matters if we could differentiate between types of allowed modules
  (A) allowed to avoid overheads - used for common ops like nn.Linear
  (B) allowed to avoid dynamo graphbreaks caused by unsupported code

Ideally, we'd use method (1) for group (A) and (2) for (B).

For now, graph-break on all cases of allowed modules.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/97184
Approved by: https://github.com/jansel
2023-04-15 01:46:15 +00:00
BowenBao
606ce5b653 [ONNX] Introduce Input/Ouptut adapter; Switch to 'DynamoExporter' (#98421)
Summary
* Introduce input/output adapter. Due to design differences, input/output format
between PyTorch model and exported ONNX model are often not the same. E.g., `None`
inputs are allowed for PyTorch model, but are not supported by ONNX. Nested constructs
of tensors are allowed for PyTorch model, but only flattened tensors are supported by ONNX,
etc. The new input/output adapter is exported with the model. Providing an interface to
automatically convert and validate inputs/outputs format.
* As suggested by #98251,
provide extension for unwrapping user defined python classes for `dynamo.export` based
exporter. Unblock huggingface models.
* Re-wire tests to run through `DynamoExporter` w/ `dynamo_export` api. Kept
`DynamoOptimizeExporter` in the tests for now for coverage of this change.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98421
Approved by: https://github.com/justinchuby, https://github.com/titaiwangms, https://github.com/thiagocrepaldi
2023-04-15 01:13:00 +00:00
PyTorch MergeBot
dda7ce4bb3 Revert "[core][pruning][be] Rename sparsifier folder to pruner (#98758)"
This reverts commit 778fd1922a.

Reverted https://github.com/pytorch/pytorch/pull/98758 on behalf of https://github.com/jcaip due to https://www.internalfb.com/diff/D44905951 need to fix broken import in fbcode
2023-04-13 16:30:47 +00:00
Tugsbayasgalan Manlaibaatar
39fd7f945f Add Symbool support in python to C++ translation (#98453)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98453
Approved by: https://github.com/ezyang
2023-04-12 03:21:57 +00:00
Mark Saroufim
bc8cb62bcb torch.compile benchmark utility (#97699)
I've had many exchanges that look like this https://github.com/rasbt/faster-pytorch-blog/pull/2 so this is an attempt to get make this problem easier

Pull Request resolved: https://github.com/pytorch/pytorch/pull/97699
Approved by: https://github.com/ezyang
2023-04-12 03:02:06 +00:00
soulitzer
367051e47e [docs] Add missing functions to autograd.rst (#98854)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98854
Approved by: https://github.com/albanD
2023-04-11 20:45:49 +00:00
Jesse Cai
778fd1922a [core][pruning][be] Rename sparsifier folder to pruner (#98758)
Summary:
att

Test Plan:
```
python test/test_ao_sparsity.py
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98758
Approved by: https://github.com/jerryzh168
2023-04-11 17:26:29 +00:00
Edward Z. Yang
b8b840be3d Convert logging f-strings to use % format, part five (#98765)
This does some annoying but simple cases by hand.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/98765
Approved by: https://github.com/wanchaol
2023-04-11 13:17:59 +00:00
Guspan Tanadi
ab385bd49e docs: Linking ResNeXt PyTorch Hub Pipeline (#98689)
Introducing ResNeXt model as link to PyTorch Hub see Skip connections section.
Handle issue in #98690.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98689
Approved by: https://github.com/zou3519, https://github.com/kit1980
2023-04-11 02:20:26 +00:00