Commit Graph

2485 Commits

Author SHA1 Message Date
Lucas Pasqualin
ff8e33556e Enables load balancing duplicates in DCP (#116469)
Enables the deduplication of saved entries by load balancing duplicates across ranks.

Tested with existing and modified tests. Additionally tested with the following code snippet, which saves a 20GB DDP model in **~3 seconds on 8 ranks**.  Before this PR, the same operation has been measured at ~19 seconds.

```
def run(local_rank, world_size, param_size, num_params, work_dir):

    os.environ["RANK"] = str(local_rank)
    os.environ["MASTER_ADDR"] = "localhost"
    os.environ["MASTER_PORT"] = "12355"
    device = torch.device(f"cuda:{local_rank}")
    torch.cuda.set_device(device)
    dist.init_process_group(backend="nccl", rank=local_rank, world_size=world_size)

    model = Model(param_size=param_size, num_params=num_params)
    model = DistributedDataParallel(model, gradient_as_bucket_view=True)
    _patch_model_state_dict(model)

    sz = sum(t.nelement() * t.element_size() for t in model.parameters())
    rank_0_print(f"Model size: {sz / 1_000_000_000.0} GB")
    rank_0_print("Saving the model with DCP...")

    checkpointer = _FileSystemCheckpointer(
        f"{args.work_dir}/dcp",
        sync_files=False,
        single_file_per_rank=False,
        thread_count=1
    )

    begin_ts = time.monotonic()
    checkpointer.save(state_dict={"model": model})
    end_ts = time.monotonic()
    rank_0_print(f"Took {end_ts - begin_ts} seconds with DCP")
```

Differential Revision: [D52435926](https://our.internmc.facebook.com/intern/diff/D52435926/)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/116469
Approved by: https://github.com/fegin, https://github.com/wz337
2024-01-26 22:34:14 +00:00
Sherlock Huang
6596a3f23d [Export] Remove ScriptObjectMeta (#118241)
Summary: As title. Use CustomObjArgument as ScriptObjectMeta

Test Plan: CIs

Reviewed By: zhxchen17

Differential Revision: D53062230

Pull Request resolved: https://github.com/pytorch/pytorch/pull/118241
Approved by: https://github.com/zhxchen17
2024-01-26 00:37:19 +00:00
drisspg
4e29f01bf2 Remove sdp_kernel and replace with sdpa_kernel in attention namespace (#114689)
# Summary
Simplification of Backend Selection

This PR deprecates the `torch.backends/cuda/sdp_kernel` context manager and replaces it with a new context manager `torch.nn.attention.sdpa_kernel`. This context manager also changes the api for this context manager.

For `sdp_kernel` one would specify the backend choice by taking the negation of what kernel they would like to run. The purpose of this backend manager was to only to be a debugging tool, "turn off the math backend" and see if you can run one of the fused implementations.

Problems:
- This pattern makes sense if majority of users don't care to know anything about the backends that can be run. However, if users are seeking to use this context manager then they are explicitly trying to run a specific backend.
- This is not scalable. We are working on adding the cudnn backend and this API makes it so so that more implementations will need to be turned off if user wants to explicitly run a given backend.
- Discoverability of the current context manager. It is somewhat un-intutive that this backend manager is in backends/cuda/init when this now also controls the CPU fused kernel behavior. I think centralizing to attention namespace will be helpful.

Other concerns:
- Typically backends (kernels) for operators are entirely hidden from users and implementation details of the framework. We have exposed this to users already, albeit not by default and with beta warnings. Does making backends choices even more explicit lead to problems when we potentially want to remove existing backends, (perhaps inputs shapes will get covered by newer backends).

A nice side effect is now that we aren't using the `BACKEND_MAP` in test_transformers many, many dynamo failures are passing for CPU tests.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114689
Approved by: https://github.com/cpuhrsch
2024-01-24 22:28:04 +00:00
Zhengxu Chen
abd759d50d [fx] Add hooks to intercept node replacements. (#117825)
Summary: Adding an experimental API to FX graph module to place "hooks" every time when we are changing or replacing nodes in a graph, so that we can properly update the new name in graph signature and potentially other places.

Test Plan:
buck test mode/opt  -c fbcode.enable_gpu_sections=true caffe2/test/distributed/_tensor/experimental:tp_transform

buck test mode/opt caffe2/test:test_export -- -r test_replace_hook

Differential Revision: D52896531

Pull Request resolved: https://github.com/pytorch/pytorch/pull/117825
Approved by: https://github.com/avikchaudhuri
2024-01-23 22:28:40 +00:00
Matteo Migliarini
fdac55c35d Added example regarding weight_decay distinction with per-parameter API (#117436)
Added new example and description regarding per-parameter `weight_decay` distinction for bias and non-bias terms.

Fixes #115935

Pull Request resolved: https://github.com/pytorch/pytorch/pull/117436
Approved by: https://github.com/janeyx99
2024-01-22 21:26:02 +00:00
Wanchao Liang
2bb2cc0b71 [tp] add clarification to doc and improve TP examples (#117618)
This PR adds a clarification about evenly sharded assumption in the main
tp doc and improved the tp examples by adding device mesh constructions

fixes https://github.com/pytorch/pytorch/issues/100044

Pull Request resolved: https://github.com/pytorch/pytorch/pull/117618
Approved by: https://github.com/wconstab, https://github.com/awgu
2024-01-22 18:56:50 +00:00
Stas Bekman
86b4b27e26 [docs] start a new FSDP notes doc (#117323)
As discussed on [slack](https://pytorch.slack.com/archives/C3PDTEV8E/p1703699711772289) adding Andrew Gu's advanced FSDP design notes with a few additions from myself based on our discussion.

I hope I did the RST right, I haven't done RST in a while.

- The first section is Andrew's words verbatim + formatting
- The second section is Andrew's words verbatim + formatting + a few of my additions that were confirmed by Andrew, and which hopefully should help understand the process better.

tagging @albanD as requested.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/117323
Approved by: https://github.com/awgu
2024-01-22 15:46:35 +00:00
PyTorch MergeBot
02209b5880 Revert "[docs] start a new FSDP notes doc (#117323)"
This reverts commit 7f474da6bc.

Reverted https://github.com/pytorch/pytorch/pull/117323 on behalf of https://github.com/awgu due to broke docs ([comment](https://github.com/pytorch/pytorch/pull/117323#issuecomment-1902740900))
2024-01-21 19:47:27 +00:00
Stas Bekman
7f474da6bc [docs] start a new FSDP notes doc (#117323)
As discussed on [slack](https://pytorch.slack.com/archives/C3PDTEV8E/p1703699711772289) adding Andrew Gu's advanced FSDP design notes with a few additions from myself based on our discussion.

I hope I did the RST right, I haven't done RST in a while.

- The first section is Andrew's words verbatim + formatting
- The second section is Andrew's words verbatim + formatting + a few of my additions that were confirmed by Andrew, and which hopefully should help understand the process better.

tagging @albanD as requested.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/117323
Approved by: https://github.com/albanD, https://github.com/awgu
2024-01-21 15:11:24 +00:00
suo
4057d005ff Initial torchbind support in PT2 (#117697)
This PR adds the bare minimum functionality to get torchbind working in an e2e testable way on PT2.

It implements:
* ProxyTensor support
* Simple torch.export support (proxytensor-only path, e.g. non-strict).
* add some tests exercising the path.

Because all this is not fully baked, I hide the functionality behind a feature flag (`enable_torchbind_tracing()`) so it does not affect regular users for now.

Still on the agenda:
* Dynamo support
* Actual FakeMode support
* Mutability support

Hoping to get this first bit in as a standalone, as it will unblock some more extensive experimentation/testing going on internally.

Differential Revision: [D51825372](https://our.internmc.facebook.com/intern/diff/D51825372/)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/117697
Approved by: https://github.com/SherlockNoMad
2024-01-19 06:28:20 +00:00
PyTorch MergeBot
2f84a9d37c Revert "[CUDNN][SDPA] Experimental cuDNN Flash Attention v2 Inference (#115663)"
This reverts commit 5aa92b5090.

Reverted https://github.com/pytorch/pytorch/pull/115663 on behalf of https://github.com/PaliC due to Unfortunately, this pr breaks cuda builds internally ([comment](https://github.com/pytorch/pytorch/pull/115663#issuecomment-1899388813))
2024-01-18 23:40:30 +00:00
Angela Yi
92d718aed1 [export] Add lifted constant obj to input (#116985)
Test Plan: wip

Differential Revision: D52556070

Pull Request resolved: https://github.com/pytorch/pytorch/pull/116985
Approved by: https://github.com/suo
2024-01-18 22:10:53 +00:00
suo
ccc8440609 [export] introduce WrapperModule (#117571)
Simple module to wrap a callable. This is a useful utility for when we start requiring that torch.export take an nn.Module.

Differential Revision: [D52791310](https://our.internmc.facebook.com/intern/diff/D52791310/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/117571
Approved by: https://github.com/tugsbayasgalan, https://github.com/avikchaudhuri
ghstack dependencies: #117570
2024-01-18 03:40:34 +00:00
Eddie Yan
5aa92b5090 [CUDNN][SDPA] Experimental cuDNN Flash Attention v2 Inference (#115663)
#113713

Going to clean up some of the checks and will remove draft status after.
Can be tested on SM80+ with `TORCH_CUDNN_MHA_ENABLED=1`.

CC @drisspg @ptrblck
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115663
Approved by: https://github.com/drisspg
2024-01-18 01:20:36 +00:00
Kurman Karabukaev
a60b566d37 [TorchElastic] Support for overprovisioning in C10 based rendezvous (#117066)
Summary:
Allow TorchElastic to manage more nodes than a maximum nnodes specifed in a job. It will be used as a spare capacity/warm nodes for schedulers that support elasticity.

RFC: https://github.com/pytorch/pytorch/issues/114097

Test Plan: Integration tests

Differential Revision: D52343874

Pull Request resolved: https://github.com/pytorch/pytorch/pull/117066
Approved by: https://github.com/zdevito
2024-01-18 01:16:55 +00:00
Peter Bell
001585f446 [fx][inductor] Add statically_known_true utility for SymBool (#117359)
This adds a function `statically_known_true` for `SymBool` that works
like inductor's `is_expr_static_and_true`. That is, it tries to simplify the
expression to a constant or returns `False` if it cannot be simplified.

This is useful in cases that can be optimized if the condition is met,
otherwise it doesn't effect correctness so we can avoid adding guards.

I also use this new function in inductor for `FakeTensorUpdater` and
`remove_noop_pass` which both generated unexpected guards previously.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/117359
Approved by: https://github.com/lezcano
2024-01-15 18:01:10 +00:00
Sai-Pra
19502ff6aa Fixed typo in build_activation_images.py (#117458)
In line 24 of build_activation_images.py, I changed "programmaticly" to "programmatically" to be dramatically correct.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/117458
Approved by: https://github.com/malfet
2024-01-15 03:27:40 +00:00
vasiliy
a6d33614d6 add float8 types to dtypes table (#117375)
Summary:

As titled

Test Plan:

CI

Reviewers:

Subscribers:

Tasks:

Tags:

Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/117375
Approved by: https://github.com/ezyang
2024-01-15 00:23:07 +00:00
Edward Z. Yang
d006cae2a8 Update documentation for unsigned int types (#116804)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/116804
Approved by: https://github.com/albanD
ghstack dependencies: #116595, #116803
2024-01-08 22:02:10 +00:00
Guo Yejun
5323b2daa5 [docs] add mode="reduce-overhead" into torch.compile to enable cuda g… (#116529)
…raph

Pull Request resolved: https://github.com/pytorch/pytorch/pull/116529
Approved by: https://github.com/eellison
2024-01-05 22:54:20 +00:00
Angela Yi
6413511713 [export][refactor][4/n] Make equality_constraints optional (#116233)
Summary: needed to remove equality_contraints eventually :P

Test Plan: CI

Differential Revision: D52351709

Pull Request resolved: https://github.com/pytorch/pytorch/pull/116233
Approved by: https://github.com/tugsbayasgalan
2024-01-05 00:50:52 +00:00
Mikayla Gawarecki
0f6f582c0d Add config to disable TransformerEncoder/MHA fastpath (#112212)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112212
Approved by: https://github.com/jbschlosser
2024-01-02 23:59:30 +00:00
lezcano
b18d8d4595 Add a wrapper to transform a NumPy function into a PyTorch function (#114610)
A less general version of this wrapper was used in the keynote on
`torch.compile(numpy)`. We expose a generic version of the wrapper
that works seamlessly with `torch.compile`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114610
Approved by: https://github.com/albanD
2024-01-02 18:35:29 +00:00
Anupam Bhatnagar
4371939751 Removing HTA documentation (#116513)
Removing HTA documentation

Pull Request resolved: https://github.com/pytorch/pytorch/pull/116513
Approved by: https://github.com/aaronenyeshi, https://github.com/malfet, https://github.com/atalman
2023-12-28 23:04:23 +00:00
angelayi
6b91e6907e Add setUserEnabledNNPACK config (#116152)
When exporting a model with a convolution kernel on cpu, if mkldnn is disabled and nnpack is enabled, export will go down the nnpack optimized convolution kernel for certain shapes ((code pointer)[cd449e260c/aten/src/ATen/native/Convolution.cpp (L542-L552)]). This means that we will automatically create a guard on that certain shape. If users want to export without any restrictions, one option is to disable nnpack. However, no config function exists for this, so this PR is adding a config function, similar to the `set_mkldnn_enabled` function.

Original context is in https://fb.workplace.com/groups/1075192433118967/posts/1349589822345892/?comment_id=1349597102345164&reply_comment_id=1349677642337110.

To test the flag, the following script runs successfully:
```
import os

import torch
from torchvision.models import ResNet18_Weights, resnet18

torch.set_float32_matmul_precision("high")

model = resnet18(weights=ResNet18_Weights.DEFAULT)
model.eval()

with torch.no_grad():
    # device = "cuda" if torch.cuda.is_available() else "cpu"
    torch.backends.mkldnn.set_flags(False)
    torch.backends.nnpack.set_flags(False)   # <--- Added config
    device = "cpu"
    model = model.to(device=device)
    example_inputs = (torch.randn(2, 3, 224, 224, device=device),)
    batch_dim = torch.export.Dim("batch", min=2, max=32)
    so_path = torch._export.aot_compile(
        model,
        example_inputs,
        # Specify the first dimension of the input x as dynamic
        dynamic_shapes={"x": {0: batch_dim}},
        # Specify the generated shared library path
        options={
            "aot_inductor.output_path": os.path.join(os.getcwd(), "resnet18_pt2.so"),
            "max_autotune": True,
        },
    )

```

I'm not sure who to add as reviewer, so please feel free to add whoever is relevant!

Pull Request resolved: https://github.com/pytorch/pytorch/pull/116152
Approved by: https://github.com/malfet
2023-12-27 06:00:16 +00:00
Lucas Pasqualin
b342286646 adds async save, makes checkpointer private (#116293)
Adds Async Save and also makes `Checkpointer` classes private.

The original PR was here: https://github.com/pytorch/pytorch/pull/115864

Pull Request resolved: https://github.com/pytorch/pytorch/pull/116293
Approved by: https://github.com/fegin
2023-12-22 05:22:39 +00:00
suo
bc3ef1684e [export] refactor unflatten.py to be a top-level API (#115466)
This is in preparation for the merging of the internal and external versions of
the unflattener. Unflatten needs to be its own API because we are adding more
options to it in forthcoming diffs.

Differential Revision: [D52001133](https://our.internmc.facebook.com/intern/diff/D52001133/)

@diff-train-skip-merge
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115466
Approved by: https://github.com/zhxchen17
2023-12-21 20:52:29 +00:00
Damien
2d2016fdf8 WIP Add compatibility with channels_last_3d for conv3d (#114790)
Part of a multi-PR work to fix #59168

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114790
Approved by: https://github.com/albanD
2023-12-20 19:28:25 +00:00
Bin Bao
fabf9433e7 [AOTI][refactor] Organize model runner files (#116022)
Summary: Move runner util files into a subdirectory and put AOTIModelContainerRunnerCpu into a separate file

Differential Revision: [D52300693](https://our.internmc.facebook.com/intern/diff/D52300693)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/116022
Approved by: https://github.com/khabinov
2023-12-20 15:35:34 +00:00
FFFrog
327bdcdb14 Some tiny modification about torch.set/get_default_device (#116014)
1. fix bug of torch.set_default_device in multi-threading
2. add new interface named torch.get_default_device

Fixes #115333
Fixes #115917

Pull Request resolved: https://github.com/pytorch/pytorch/pull/116014
Approved by: https://github.com/malfet, https://github.com/jansel
2023-12-19 05:08:06 +00:00
Wanchao Liang
61abacf829 [tp] improve documentation (#115880)
Improve the TP documentation in terms of format and descriptions

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115880
Approved by: https://github.com/XilunWu
2023-12-15 18:44:22 +00:00
Will Constable
28e4004286 Add doc for torch.distributed.breakpoint (#115656)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115656
Approved by: https://github.com/wanchaol, https://github.com/fegin
ghstack dependencies: #115705
2023-12-14 14:45:36 +00:00
angelayi
dd9a989b83 [export][reland][refactor][1/n] Split dynamic shapes (#115556)
Reland of https://github.com/pytorch/pytorch/pull/114764
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115556
Approved by: https://github.com/zhxchen17
2023-12-12 05:36:41 +00:00
atalman
b88be1686d Revert "[export][refactor][1/n] Move dynamic shapes logic (#114764)" (#115508)
GitHub first oncall.
This reverts commit 53bf8cfcf9.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115508
Approved by: https://github.com/malfet, https://github.com/angelayi
2023-12-11 14:54:51 +00:00
William Wen
f614ed78b8 [docs, dynamo] fix typos in dynamo custom backend docs (#115444)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115444
Approved by: https://github.com/eellison
2023-12-08 23:58:26 +00:00
albanD
a2b89154bf New swap function (#111747)
This PR is proposing a new approach to solve the nn/optim only linked by python object identity problem.
The idea is to have a function that can swap the content of two Tensors t1 and t2 while preserving all the old references.
This would allow us to swap the `model.weight` with a new Tensor (can be any subclass of Tensor and any TensorImpl (xla, sparse, nested tensorimpl would work)). The use within nn will be done in a follow up.

This is done by swapping the whole content of the PyObject and then putting back the fields associated with external references (refcount, gc tracking and weakrefs).
Note that we have to properly handle all the cases where there is memory used before the public pointer PyObject* and where the PyObject is bigger due to dict/weakref being inlined (older CPython version) or due to slots.

The main limitation of this approach is that the number of slots need to match for the objects being swapped and thus limit usage of slots in subclasses.

Draft right now to see what @colesbury thinks about doing this?

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111747
Approved by: https://github.com/colesbury
2023-12-08 18:49:35 +00:00
Linus
5f2ff29569 Fix typo in https://pytorch.org/docs/stable/sparse.html (#115282)
Fixes #111473

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115282
Approved by: https://github.com/svekars
2023-12-08 18:31:33 +00:00
Wongboo
68f74dd162 Add python and C++ support for LPPool3d (#114199)
Add python and C++ support for LPPool3d to Fixes #114114

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114199
Approved by: https://github.com/mikaylagawarecki
2023-12-08 18:18:44 +00:00
Iris Zhang (PyTorch)
23fa9621e4 [DeviceMesh] Rename _device_mesh.py to device_mesh.py to prepare for beta (#115099) (#115193)
Summary:

Rename _device_mesh.py to device_mesh.py, update all callsites, add documentation.
We created stubs for public class and methods in torch.distributed.device_mesh so that torch.distributed.device_mesh can be imported with or without distributed is available().

Original diff reverted: D51629761
Original PR reverted: https://github.com/pytorch/pytorch/pull/115099
Prior to landing, CI signals are all passed. Shipit added the "ci/trunk" label to the PR and DID NOT wait for it and went ahead committing. More context can be found in the reverted PR above.

Test Plan: CI.

Differential Revision: D51861018

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115193
Approved by: https://github.com/fegin
2023-12-08 08:44:32 +00:00
Lucas Pasqualin
5432088098 Adds Checkpointer Wrapper for DCP [3/N] (#114603)
Adds a useful high level wrapper for calling `dist.save/load` with the correct storage readers and writers.

Instead of doing:

```
DCP.save(
    state_dict={...},
    storage_writer=StorageWriter(...)
)

DCP.load(
    state_dict={...},
    storage_reader=StorageReader(...)
)
```

We can now do:

```
checkpointer = Checkpointer(...)

checkpointer.save(state_dict={...})
checkpointer.load(state_dict={...})
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114603
Approved by: https://github.com/fegin, https://github.com/wz337
2023-12-08 01:03:21 +00:00
Howard Huang
3e66385ddd Add Work to distributed docs (#115172)
Summary:
Documenting the `Work` object

For a collective (broadcast, all_reduce, etc.) when async_op=True we return a `Work` object to which users can call `.wait()`, `.is_success()`, among other things but this class is not documented

Test Plan: Preview the docs build in OSS

Differential Revision: D51854974

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115172
Approved by: https://github.com/wconstab
2023-12-07 18:12:10 +00:00
angelayi
53bf8cfcf9 [export][refactor][1/n] Move dynamic shapes logic (#114764)
1/n of refactoring export code:
* Moved dynamic shapes/constraints/dynamic_dims logic in torch/_export/__init__.py and torch/export/__init__.py to torch/export/dynamic_shapes.py

Differential Revision: [D51823962](https://our.internmc.facebook.com/intern/diff/D51823962)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114764
Approved by: https://github.com/ydwu4
2023-12-06 16:46:38 +00:00
drisspg
d4c79a3078 Add an attention bias subclass for a lower right causal masking (#114823)
# Summary
This PR introduces a new Tensor subclass that is designed to be used with torch.nn.functional.scaled_dot_product_attention. Currently we have a boolean `is_causal` flag that allows users to do do causal masking without the need to actually create the "realized" attention bias and pass into sdpa. We originally added this flag since there is native support in both fused kernels we support. This provides a big performance gain ( the kernels only need to iterate over ~0.5x the sequence, and for very large sequence lengths this can provide vary large memory improvements.

The flag was introduced when the early on in the kernel development and at the time it was implicitly meant to "upper_left" causal attention. This distinction only matters when the attention_bias is not square. For a more detailed break down see: https://github.com/pytorch/pytorch/issues/108108. The kernels default behavior has since changed, largely due to the rise of autogressive text generation. And unfortunately this would lead to a BC break. In the long term it may actually be beneficial to change the default meaning of `is_causal` to represent lower_right causal masking.

The larger theme though is laid here: https://github.com/pytorch/pytorch/issues/110681. The thesis being that there is alot of innovation in SDPA revolving around the attention_bias being used. This is the first in hopefully a few more attention_biases that we would like to add. The next interesting one would be `sliding_window` which is used by the popular mistral model family.

Results from benchmarking, I improved the meff_attention perf hence the slightly decreased max perf.
```Shell
+---------+--------------------+------------+-----------+-----------+-----------+-----------+----------------+----------+
|  Type   |      Speedup       | batch_size | num_heads | q_seq_len | k_seq_len | embed_dim |     dtype      | head_dim |
+---------+--------------------+------------+-----------+-----------+-----------+-----------+----------------+----------+
| Average | 1.2388050062214226 |            |           |           |           |           |                |          |
|   Max   | 1.831672915579016  |    128     |    32     |   1024    |   2048    |   2048    | torch.bfloat16 |    64    |
|   Min   | 0.9430534166730135 |     1      |    16     |    256    |    416    |   2048    | torch.bfloat16 |   128    |
+---------+--------------------+------------+-----------+-----------+-----------+-----------+----------------+----------+
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114823
Approved by: https://github.com/cpuhrsch
2023-12-06 08:29:26 +00:00
Joel Schlosser
22704426c3 Expand dynamic dims support for traceable subclasses (#114311)
Continuation of #112185, following the design in this [doc](https://docs.google.com/document/d/1ipSxcTzEMMOAPvxP-YJlD5JBZZmIGgh8Q34ixtOUCRo).

Summary:
* Introduce `SubclassSymbolicPolicy` containing separate dynamic dim / constraint policies for the outer and inner tensors
    * Expand the automatic dynamic algorithm to recurse into inner tensors and produce one of these for a subclass instance
    * Maintain legacy behavior for subclasses by recursively calling `mark_dynamic()` on inner tensors *of the same dim as outer* when `mark_dynamic(outer, ...)` is called
    * Addresses this: 6a86cf00ad/torch/_dynamo/variables/builder.py (L1750)
* Add `outer_size` and `outer_stride` arguments to `__tensor_unflatten__()` so that you can find out what symbols were allocated for the outer size / stride (you are expected to return a tensor that compares equal to the outer symbols)
    * Signatures now:
    ```python
    # attrs is a list of inner tensor attributes on x; inner_tensor = getattr(x, attr)
    # ctx is anything useful for rebuilding the class we want to guard on
    attrs, ctx = x.__tensor_flatten__()
    ...
    # inner_tensors is a dict of {attr -> tensor}
    # ctx is taken unmodified from flattening and (eventually) guarded on
    # outer_size is the expected size of the output; possibly symbolic
    # outer_stride is the expected strides of the output; possibly symbolic
    y = MySubclass.__tensor_unflatten__(inner_tensors, ctx, outer_size, outer_stride)

    # at the __tensor_unflatten__() call-site in PT2, we assert y.shape == outer_size and y.stride() == outer_stride
    # the assert simplifies symbols when there are relationships between outer and inner symbols
    ```
    * Size info needed for `NestedTensor` at least, stride info needed for `DTensor` at least
    * Punting on `outer_storage_offset` because storage_offset handling is horribly broken in PT2 right now
* ~~Add new `__tensor_mark_dynamic__()` to allow overriding the behavior of mark_dynamic on a per-subclass basis~~ (booted to future work)
* ~~Add guards for tensor subclasses by calling `__tensor_flatten__()` in the guard to test equality on `ctx`~~
    * Now handled in #114469
* Next PR: add TENSOR_MATCH guards on inner tensors

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114311
Approved by: https://github.com/ezyang, https://github.com/drisspg, https://github.com/voznesenskym, https://github.com/bdhirsh
2023-12-05 21:09:25 +00:00
angelayi
5fdae89c03 [docs][aoti] Link to export docs in AOTI docs (#115088)
Context: https://fb.workplace.com/groups/1075192433118967/posts/1341833143121560/?comment_id=1341841786454029

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115088
Approved by: https://github.com/desertfire
2023-12-05 20:22:42 +00:00
Anupam Bhatnagar
85d4708512 HTA docs (#115060)
Added documentation for Holistic Trace Analysis

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115060
Approved by: https://github.com/aaronenyeshi
2023-12-05 19:38:09 +00:00
Nikita Shulga
a827ac71f2 Revert "[DeviceMesh] Rename _device_mesh.py to device_mesh.py to prepare for beta (#115099)"
This reverts commit eaa64339d6.
2023-12-05 08:59:36 -08:00
Iris Zhang (PyTorch)
eaa64339d6 [DeviceMesh] Rename _device_mesh.py to device_mesh.py to prepare for beta (#115099)
Summary:
Rename _device_mesh.py to device_mesh.py, update all callsites, adds documentation.

Original diff reverted: D51629761
Original PR reverted: https://github.com/pytorch/pytorch/pull/114991
It was failing because failing a public module binding tests in MacOS, and this is due to the change in import order for torch/distributed/fsdp/_common_utils.py. Since this original import would still work, we remove the changes in this file.

Test Plan: CI.

Differential Revision: D51825114

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115099
Approved by: https://github.com/wanchaol, https://github.com/fegin
2023-12-05 05:44:52 +00:00
soulitzer
a7bcc78bff Make it clearer that current selective AC is PT2-only and private (#115081)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115081
Approved by: https://github.com/albanD
2023-12-04 23:01:22 +00:00
Xuehai Pan
55064a4ef9 [BE] add parentheses to kwargs unpacking func(*args, **(kwargs or {})) (#115026)
This PR adds parentheses to kwargs unpacking `func(*args, **(kwargs or {}))` for better code readability.

With/without the parentheses are semantic equivalent because they produce the same bytecode.

```console
$ echo "func(*args, **kwargs or {})" | python3 -m dis -
  0           0 RESUME                   0

  1           2 PUSH_NULL
              4 LOAD_NAME                0 (func)
              6 LOAD_NAME                1 (args)
              8 BUILD_MAP                0
             10 LOAD_NAME                2 (kwargs)
             12 JUMP_IF_TRUE_OR_POP      1 (to 16)
             14 BUILD_MAP                0
        >>   16 DICT_MERGE               1
             18 CALL_FUNCTION_EX         1
             20 POP_TOP
             22 LOAD_CONST               0 (None)
             24 RETURN_VALUE

$ echo "func(*args, **(kwargs or {}))" | python3 -m dis -
  0           0 RESUME                   0

  1           2 PUSH_NULL
              4 LOAD_NAME                0 (func)
              6 LOAD_NAME                1 (args)
              8 BUILD_MAP                0
             10 LOAD_NAME                2 (kwargs)
             12 JUMP_IF_TRUE_OR_POP      1 (to 16)
             14 BUILD_MAP                0
        >>   16 DICT_MERGE               1
             18 CALL_FUNCTION_EX         1
             20 POP_TOP
             22 LOAD_CONST               0 (None)
             24 RETURN_VALUE
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115026
Approved by: https://github.com/Skylion007
2023-12-03 20:03:26 +00:00
PyTorch MergeBot
3a2e2044cd Revert "[DeviceMesh] Rename _device_mesh.py to device_mesh.py to prepare for beta (#114710) (#114991)"
This reverts commit 729ac7317a.

Reverted https://github.com/pytorch/pytorch/pull/114991 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally ([comment](https://github.com/pytorch/pytorch/pull/114991#issuecomment-1837214567))
2023-12-02 17:55:51 +00:00
Wanchao Liang
28925902fa [TP] fully rewrite Tensor Parallel APIs (#114732)
This PR rewrites Tensor Parallel implementation. Tensor Parallel APIs
supposed to be a very thin-wrapper to DTensor APIs, but the current
implementation got too messy and buggy. It's really hard to debug what
went wrong when using it. It's crucially important for advanced users or
developers to understand the API and its implementation easily without
going through all different types of functions and utils, so that
they could trust what happen under the hood.

In particular this PR:

* Make ParallelStyle to be a real contract API for parallelize_module to
  take, each concrete ParallelStyle only needs to implement `apply` to
apply the sharding to nn.Module, remove all non-necessary fields. This
also enable easier ParallelStyle authoring going forward.
* Keep the ColwiseParallel and RowwiseParallel public interface, but
  refactor them in a way that makes the parameter sharding, inputs and
outputs handling lives within the style itself, so that it's easy to
understand how Linear/Embedding layers are sharded and how the inputs/outputs
transformations are performed.
* remove all those private _prepare_input/_prepare_output_fn fields for
  both ColwiseParallel/RowwiseParallel. Since we throw deprecation
messages in nightly for a while and TP is on prototype release, the
fields are also private, it should be safe to remove them
* Refactor the recently landed PrepareModuleInput/Output style, change
  output_layouts to desired_input/output_layouts, group
  the function inside the style itself, no default arguments for these
two styles and user need to specify them to think about the sharding
layouts. Fixed bugs about not handling
`use_local_output` flag.
* Make default arguments be None instead of Placement object, this is
  standard python practice to not have custom object instance as default
argument
* Remove all dead APIs (i.e. PairwiseParallel and SequenceParallel
  style, all prepare input/output functions) as we throw deprecation
 msgs for a while, and in the progress of removing all of them from the tests.
* throw deprecation warning for `tp_mesh_dim` as we recomemnd use device
  mesh slice/indexing instead of manually specify mesh dim
* Rewrite all documentations for every ParallelStyle and make the
  documentation more clear about what each style is doing

TODOs:
* Rewrite TP tests to adjust for the changes we have in this PR
* add more tests to guard the bug fixes

Differential Revision: [D51761183](https://our.internmc.facebook.com/intern/diff/D51761183)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114732
Approved by: https://github.com/wz337, https://github.com/fduwjj
2023-12-02 08:18:12 +00:00
Iris Zhang (PyTorch)
729ac7317a [DeviceMesh] Rename _device_mesh.py to device_mesh.py to prepare for beta (#114710) (#114991)
Summary:

Same content of changes as https://github.com/pytorch/pytorch/pull/114710

Rename _device_mesh.py to device_mesh.py, update all callsites, adds documentation.
ghstack-source-id: 208980207
exported-using-ghexport

Test Plan: CI.

Reviewed By: wanchaol

Differential Revision: D51629761

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114991
Approved by: https://github.com/wanchaol, https://github.com/fduwjj, https://github.com/fegin
2023-12-02 04:39:41 +00:00
Rohan Varma
3c78ea4c9d [DDP][Compile] Test to Ensure torch.compile works w/static_graph=True (#114621)
Resolves https://github.com/pytorch/pytorch/issues/93672. This was
actually fixed by https://github.com/pytorch/pytorch/pull/103487 but I didn't
realize that PR also fixes torch compile at the time.

Differential Revision: [D51596148](https://our.internmc.facebook.com/intern/diff/D51596148/)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114621
Approved by: https://github.com/wconstab
2023-12-01 22:18:45 +00:00
Lucas Pasqualin
f073dcd4f7 Stateful Checkpointing for Distributed [1/N] (#113867)
First pass at adding a save/load API, as well as definition of Stateful objects.

Amongst a couple todo's, we still need to explore adding an `all_gather` & potentially a `barrier` while iterating through state keys.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113867
Approved by: https://github.com/fegin, https://github.com/wz337
2023-12-01 19:21:03 +00:00
Philip Meier
373f2060ba fix extending torch native API docs (#114863)
Couldn't think of a better `release notes:` label. Feel free to set a more fitting one
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114863
Approved by: https://github.com/mikaylagawarecki
2023-12-01 06:09:35 +00:00
Jerry Zhang
64fd706b21 [quant][pt2e] Add generate_numeric_debug_handle pass (#114315)
Summary:
This is a util for numeric suite in pt2 export so that we can build
a more streamlined UX for numerical debugging in quant + executorch stack

Test Plan:
python test/test_quantization.py TestGenerateNumericDebugHandle

Reviewers:

Subscribers:

Tasks:

Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114315
Approved by: https://github.com/zhxchen17
2023-12-01 03:38:17 +00:00
William Wen
38ae17d166 [dynamo, docs] update dynamo backend registration docs (#114820)
Update docs to reflect current backend registration API. Add `lookup_backend` to root `dynamo` module.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114820
Approved by: https://github.com/eellison
2023-11-30 21:41:05 +00:00
Nikita Shulga
a9d5133207 [ez][doc] Fix sample code in onnx_dynamo.rst (#114770)
By adding `import torch.nn as nn`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114770
Approved by: https://github.com/atalman, https://github.com/thiagocrepaldi
2023-11-29 19:27:52 +00:00
Guo Yejun
4aa2c51a09 [doc] fix typo on graph 3 that is recorded (#114666)
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114666
Approved by: https://github.com/eellison
2023-11-28 20:40:13 +00:00
Guo Yejun
4a35ec3c0e [docs] correct the code for cudagraph trees integration (#114583)
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114583
Approved by: https://github.com/eellison
2023-11-28 20:28:52 +00:00
lezcano
4ba3e6758d Canonicalize runtime asserts (#114509)
This allows us to remove quite a few redundant runtime asserts, and potentially a number of guards as well.

On
```
python test/dynamo/test_subclasses.py -k test_unbind
```
we go from
```
inserting runtime assert i0 <= s0
inserting runtime assert 0 <= -i0 + s0
inserting runtime assert i0 + i1 <= s0
inserting runtime assert i0 <= -i1 + s0
inserting runtime assert i0 + i1 + i2 <= s0
inserting runtime assert i0 + i1 <= -i2 + s0
inserting runtime assert Eq(i0 + i1 + i2 + i3, s0)
inserting runtime assert i0 + i1 + i2 + i3 <= s0
inserting runtime assert i0 + i1 + i2 <= -i3 + s0
```
to
```
inserting runtime assert i0 - s0 <= 0
inserting runtime assert i0 + i1 - s0 <= 0
inserting runtime assert i0 + i1 + i2 - s0 <= 0
inserting runtime assert Eq(i0 + i1 + i2 + i3, s0)
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114509
Approved by: https://github.com/voznesenskym
2023-11-28 01:38:47 +00:00
voznesenskym
081c5b3adc Add Stateful/Stateless symbolic contexts, use fresh fake mode for dynamo backends (#113926) (#114526)
Summary:

The primary problem we are setting out to solve here is fake tensor freshness. Before this PR, fake tensors after dynamo represented fake tensors *at the end* of trace, so subsequent retraces like aot_autograd would start off with fake tensors in the wrong (end result) state, rather than their expected fresh state. The solution here is to start a fresh fake mode, and re-fakify the tensors. The nuance comes from ensuring that symbols are uniformly created for the symbolic sizes and strides of the tensor.

This PR is the result of *a lot* of back and forth with ezyang and eellison. Initially, the first pass at this was not super different from what we have in the PR - the broad strokes were the same:

1) We cache source->symbol in shape_env
2) We pass policy objects around, stored at dynamo fakificaiton time, and reused for later fakification
3) We create a new fake mode for backends
(from https://github.com/pytorch/pytorch/pull/113605/files)

This is ugly, and has some layering violations. We detoured our decision making through a few other alternatives. Immutable/mutable fake tensor mode was the most interesting alternative, https://github.com/pytorch/pytorch/pull/113653, and was struck down on concerns of complexity in fake mode combined with it not covering all edge cases. We also detoured on what to do about tensor memoization returning back potentially different tensors than requested, and if that was an anti pattern (it is) we want to hack in with the symbol cache (we don't).

We went back to the drawing board here, but with a few concessions:
1) the cache for source->symbol must live outside of shape_env, for both lifecycle, and layering reasons
2) A good amount of work needs to be done to pipe policy around fake_mode and meta_utils correctly, to cover all the cases (ezyang did this)

cc penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx chenyang78 aakhundov kadeng

imported-using-ghimport

Test Plan: Imported from OSS

Reviewed By: huydhn, Chillee

Differential Revision: D51566250

Pulled By: voznesenskym

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114526
Approved by: https://github.com/Chillee, https://github.com/huydhn
2023-11-26 23:40:32 +00:00
Akihiro Nitta
d37c4c6995 Update torch.compiler_troubleshooting.rst (#114530)
If you copy and paste the env var in the docs:
```console
TORCHDYNAMO_REPRO_AFTER=“aot”
```
it leads to this error:
```python
    @functools.wraps(unconfigured_compiler_fn)
    def debug_wrapper(gm, example_inputs, **kwargs):
        compiler_fn = functools.partial(unconfigured_compiler_fn, **kwargs)
>       assert config.repro_after in ("dynamo", "aot", None)
E       torch._dynamo.exc.BackendCompilerFailed: backend='inductor' raised:
E       AssertionError:
```
because `config.repro_after` is being `'“aot”'` but not `'aot'`.

---

It would've saved a few minutes of my time 😄
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114530
Approved by: https://github.com/Chillee
2023-11-25 23:15:47 +00:00
PyTorch MergeBot
2f3beb715c Revert "Add Stateful/Stateless symbolic contexts, use fresh fake mode for dynamo backends (#113926)"
This reverts commit 2ca1119d53.

Reverted https://github.com/pytorch/pytorch/pull/113926 on behalf of https://github.com/DanilBaibak due to Break internal build ([comment](https://github.com/pytorch/pytorch/pull/113926#issuecomment-1822713852))
2023-11-22 12:52:33 +00:00
Thiago Crepaldi
3f736c2d77 Add ONNXProgram.__call__ API to run model with ONNX Runtime (#113495)
Currently the user can use torch.onnx.dynamo_export to export the model.
to ONNX.

```python
import torch

class Model(torch.nn.Module):
    def forward(self, x):
        return x + 1.0

onnx_program = torch.onnx.dynamo_export(
    Model(),
    torch.randn(1, 1, 2, dtype=torch.float),
)
```

The next step would be instantiating a ONNX runtime to execute it.

```python
import onnxruntime  # type: ignore[import]

onnx_input = self.adapt_torch_inputs_to_onnx(*args, **kwargs)
options = options or {}
providers = options.get("providers", onnxruntime.get_available_providers())
onnx_model = self.model_proto.SerializeToString()
ort_session = onnxruntime.InferenceSession(onnx_model, providers=providers)

def to_numpy(tensor):
    return (
        tensor.detach().cpu().numpy()
        if tensor.requires_grad
        else tensor.cpu().numpy()
    )

onnxruntime_input = {
    k.name: to_numpy(v) for k, v in zip(ort_session.get_inputs(), onnx_input)
}

return ort_session.run(None, onnxruntime_input)
```

This PR provides the `ONNXProgram.__call__` method as facilitator to use ONNX Runtime under the hood, similar to how `torch.export.ExportedProgram.__call__` which allows the underlying `torch.fx.GraphModule` to be executed.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113495
Approved by: https://github.com/titaiwangms
2023-11-22 01:48:45 +00:00
Antonio Kim
7fc292930c Add support for torch.Generator type in TorchScript (#110413)
- Add support for `torch.Generator` type in TorchScript
- Add `generator` args to all `torch.nn.init` functions that call `uniform_` or `normal_`
- Add support for `torch.Generator` in LTC's TorchScript backend (CC: @wconstab)

CC: @eellison @davidberard98 @GlebKazantaev @behzad-a
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110413
Approved by: https://github.com/wconstab, https://github.com/albanD, https://github.com/glebk-cerebras, https://github.com/davidberard98
2023-11-21 23:07:21 +00:00
HDCharles
18e1a37c4e [ao] updating embedding_bag support for fx and eager (#107623)
Summary: our docs were saying dynamic embedding bag wasn't supported but
it actually is (at least at the same level as embeddings were) it just wasn't previously tested/listed.

Test Plan: python test/test_quantization.py -k "test_embedding"

Reviewers:

Subscribers:

Tasks:

Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107623
Approved by: https://github.com/jerryzh168
2023-11-21 03:54:00 +00:00
Ke Wen
dc65f6c601 [c10d] Remove deprecated multi-gpu-per-thread APIs (#114156)
As of today, PyTorch Distributed's preferred programming model is one device per thread, as exemplified by the APIs in its document.  The multi-GPU functions (which stand for multiple GPUs per CPU thread) have been deprecated for three versions. Removing them now before 2.2 release.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114156
Approved by: https://github.com/albanD, https://github.com/fduwjj, https://github.com/H-Huang
2023-11-21 03:50:23 +00:00
voznesenskym
2ca1119d53 Add Stateful/Stateless symbolic contexts, use fresh fake mode for dynamo backends (#113926)
The primary problem we are setting out to solve here is fake tensor freshness. Before this PR, fake tensors after dynamo represented fake tensors *at the end* of trace, so subsequent retraces like aot_autograd would start off with fake tensors in the wrong (end result) state, rather than their expected fresh state. The solution here is to start a fresh fake mode, and re-fakify the tensors. The nuance comes from ensuring that symbols are uniformly created for the symbolic sizes and strides of the tensor.

This PR is the result of *a lot* of back and forth with @ezyang and @eellison. Initially, the first pass at this was not super different from what we have in the PR - the broad strokes were the same:

1) We cache source->symbol in shape_env
2) We pass policy objects around, stored at dynamo fakificaiton time, and reused for later fakification
3) We create a new fake mode for backends
(from https://github.com/pytorch/pytorch/pull/113605/files)

This is ugly, and has some layering violations. We detoured our decision making through a few other alternatives. Immutable/mutable fake tensor mode was the most interesting alternative, https://github.com/pytorch/pytorch/pull/113653, and was struck down on concerns of complexity in fake mode combined with it not covering all edge cases. We also detoured on what to do about tensor memoization returning back potentially different tensors than requested, and if that was an anti pattern (it is) we want to hack in with the symbol cache (we don't).

We went back to the drawing board here, but with a few concessions:
1) the cache for source->symbol must live outside of shape_env, for both lifecycle, and layering reasons
2) A good amount of work needs to be done to pipe policy around fake_mode and meta_utils correctly, to cover all the cases (@ezyang did this)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113926
Approved by: https://github.com/ezyang, https://github.com/eellison
2023-11-20 23:06:37 +00:00
Edward Z. Yang
aeb5fd52c7 Remove dead tensor_has_hints. (#114071)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114071
Approved by: https://github.com/aakhundov
2023-11-20 16:02:24 +00:00
Pearu Peterson
0bd4d1f4ab Add sparse tensors support to dataloader. (#112842)
Fixes https://github.com/pytorch/pytorch/issues/106837

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112842
Approved by: https://github.com/cpuhrsch, https://github.com/gokulavasan
2023-11-19 16:05:27 +00:00
Edward Z. Yang
e2b114ab9f [BE] Package dynamic_dims/constraint_dims into CreateSymbolicPolicy (#113802)
This will make it more convenient to propagate more information through
all of these functions in the future (e.g., for storage offset
information.)

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113802
Approved by: https://github.com/davidberard98, https://github.com/voznesenskym
2023-11-17 18:22:46 +00:00
Edward Z. Yang
3a3a979984 Add torch.distributed.breakpoint (#113775)
I tested it works by patching

```
diff --git a/test/distributed/test_dynamo_distributed.py b/test/distributed/test_dynamo_distributed.py
index 96b3a82bdfa..dea9bac9302 100644
--- a/test/distributed/test_dynamo_distributed.py
+++ b/test/distributed/test_dynamo_distributed.py
@@ -18,6 +18,7 @@ from torch._dynamo import config
 from torch._dynamo.utils import same
 from torch._dynamo.testing import collect_results
 from torch.utils._triton import has_triton
+import torch.distributed as dist
 from torch.distributed.fsdp.wrap import transformer_auto_wrap_policy, lambda_auto_wrap_policy
 from torch._higher_order_ops.wrap import tag_activation_checkpoint
 from torch.nn.parallel import DistributedDataParallel as DDP
@@ -398,6 +399,7 @@ class TestMultiProc(DynamoDistributedMultiProcTestCase):
     @unittest.skipIf(not has_triton(), "Inductor+gpu needs triton and recent GPU arch")
     def test_fsdp_activation_checkpointing(self):
         with _dynamo_dist_per_rank_init(self.rank, self.world_size):
+            dist.breakpoint()
             model, inputs = get_toy_model_for_activation_checkpointing(f"cuda:{self.rank}")
             is_inner = lambda module: isinstance(module, ToyInnerModel)  # noqa: E731
             wrap_policy = functools.partial(lambda_auto_wrap_policy, lambda_fn=is_inner)
```

and then running `python test/distributed/test_dynamo_distributed.py -k test_fsdp_activation_checkpointing`

It prints:

```
ATTENTION!!!

Type 'up' to get to the frame that called dist.breakpoint(rank=0)

> /data/users/ezyang/c/pytorch/torch/distributed/__init__.py(71)breakpoint()
-> barrier()
(Pdb) up
> /data/users/ezyang/c/pytorch/test/distributed/test_dynamo_distributed.py(402)test_fsdp_activation_checkpointing()
-> dist.breakpoint()
(Pdb) list
397
398         @skip_if_lt_x_gpu(1)
399         @unittest.skipIf(not has_triton(), "Inductor+gpu needs triton and recent GPU arch")
400         def test_fsdp_activation_checkpointing(self):
401             with _dynamo_dist_per_rank_init(self.rank, self.world_size):
402  ->             dist.breakpoint()
403                 model, inputs = get_toy_model_for_activation_checkpointing(f"cuda:{self.rank}")
404                 is_inner = lambda module: isinstance(module, ToyInnerModel)  # noqa: E731
405                 wrap_policy = functools.partial(lambda_auto_wrap_policy, lambda_fn=is_inner)
406                 model = apply_fsdp_with_checkpointing(model, wrap_policy, is_inner)
407                 correct_outputs = model(inputs)
```

Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113775
Approved by: https://github.com/wconstab, https://github.com/wanchaol
2023-11-16 19:30:57 +00:00
Mu-Chu Lee
eddce3c054 [AOTInductor] Rename model_runner to model_container_runner (#111324)
Summary:
We rename the model_runner to model_container_runner to prepare for
adding tests of pure model without container.

Test Plan:
commit itself is a test.

Reviewers:

Subscribers:

Tasks:

Tags:

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111324
Approved by: https://github.com/desertfire, https://github.com/chenyang78
2023-11-16 19:14:22 +00:00
Tongzhou Wang
275403be16 [doc] Add nn.parametrizations.weight_norm (#113783)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113783
Approved by: https://github.com/albanD
2023-11-16 17:42:48 +00:00
Iris Zhang
72ce5dd13e [2D] Remove enable_2d_with_fsdp() API and make remove_enable_2d_with_fsdp private (#112473)
As we have our new 2D flow out, we want to remove `enable_2d_with_fsdp()`.
In addition, we change pre_dp_module_transform to private, as we may need to change the UX later on.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112473
Approved by: https://github.com/fegin, https://github.com/wanchaol
2023-11-16 01:14:00 +00:00
gs-olive
757f36b988 [docs] Fix torch.compile "tensorrt" backend docs (#113711)
- Update description from ONNX to current state (Torch-TensorRT)
- Add clarification about import

Fixes documentation on this page: https://pytorch.org/docs/stable/torch.compiler.html

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113711
Approved by: https://github.com/msaroufim
2023-11-15 08:42:53 +00:00
drisspg
9b0f2f8d94 expose sdpa helpers to python (#110496)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110496
Approved by: https://github.com/jbschlosser
2023-11-15 07:34:34 +00:00
PyTorch MergeBot
252e68a83b Revert "Add support for torch.Generator type in TorchScript (#110413)"
This reverts commit 54493fe8c4.

Reverted https://github.com/pytorch/pytorch/pull/110413 on behalf of https://github.com/huydhn due to Sorry for reverting your change but it is, unfortunately, still breaking internal builds ([comment](https://github.com/pytorch/pytorch/pull/110413#issuecomment-1811625557))
2023-11-15 00:51:23 +00:00
Antonio Kim
54493fe8c4 Add support for torch.Generator type in TorchScript (#110413)
- Add support for `torch.Generator` type in TorchScript
- Add `generator` args to all `torch.nn.init` functions that call `uniform_` or `normal_`
- Add support for `torch.Generator` in LTC's TorchScript backend (CC: @wconstab)

CC: @eellison @davidberard98 @GlebKazantaev @behzad-a
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110413
Approved by: https://github.com/wconstab, https://github.com/albanD, https://github.com/glebk-cerebras, https://github.com/davidberard98
2023-11-13 23:18:14 +00:00
Justin Chu
47a59ee4d1 [ONNX] Update exporter issue report instructions for quantized models (#113494)
Update the instructions to point users to the right place for creating issues.

https://github.com/onnx/onnx/issues/5674#issuecomment-1806505240

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113494
Approved by: https://github.com/jerryzh168
2023-11-13 18:18:19 +00:00
Bin Bao
c197c48ceb [aotinductor] Add a demo tutorial (#112457)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112457
Approved by: https://github.com/msaroufim, https://github.com/albanD
2023-11-10 17:01:03 +00:00
Thiago Crepaldi
574e313643 Add thiagocrepaldi as person of interest for onnx exporter (#113402)
@malfet @kit1980

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113402
Approved by: https://github.com/malfet
2023-11-10 15:19:58 +00:00
Sergii Dymchenko
bb06725ee0 Update mentions of deprecated functions if complex_numbers.rst (#113391)
`torch.svd` is deprecated, and `torch.solve` is completely removed.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113391
Approved by: https://github.com/malfet, https://github.com/lezcano
2023-11-09 22:32:26 +00:00
Jerry Zhang
501d118255 [quant][pt2e] Add transform_for_annotation method in Quantizer (#113115)
Summary:
Adding the method so that people can do some transformations before annotation to make the graph easier to annotate

Test Plan:
python test/test_quantization.py TestQuantizePT2E.test_transform_for_annotation

Reviewers:

Subscribers:

Tasks:

Tags:

Differential Revision: [D51141080](https://our.internmc.facebook.com/intern/diff/D51141080)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113115
Approved by: https://github.com/kimishpatel
2023-11-09 20:23:29 +00:00
Nikita Shulga
81bf0bd68d [no ci] Fix typo in persons_of_interest.rst (#113283)
There are no `c` in `Hirsh`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113283
Approved by: https://github.com/bdhirsh
2023-11-08 19:36:32 +00:00
Edward Z. Yang
1f3fa13f0a Handle unbacked SymInt sized outputs in AOTAutograd (#113159)
Thanks aakhundov for constructing the test case. This PR was constructed by running the failing test case, and then fixing problems until we got all the way to the end. There are a few distinct fixes:

* AOTAutograd performs equality tests on tensor metadata to determine if a metadata mutation had occurred. If we test i0 vs i1, we should report these are NOT equal, since obviously we have somehow resized the tensor from i0 to i1 (even if, on a particular run, it is possible i0 == i1).
* There's a sketchy fix for `test_aot_autograd_exhaustive_matmul_cpu_float32` where we check if the output shape equals the tangent shape. Unfortunately, the same `definitely_true` treatment does not work here, it still fails on the example. I piled an extra sketchy fix on top of it, where I just try my best to avoid doing the view. Maybe we should have some sort of logging here.
* Partitioner needs to get out a size for unbacked SymInt when partitioning. I just feed it a random heuristic value in this case, similar to how we've been dealing with this in Inductor.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113159
Approved by: https://github.com/aakhundov, https://github.com/bdhirsh
2023-11-08 04:28:38 +00:00
PyTorch MergeBot
9a28a7b498 Revert "Add support for torch.Generator type in TorchScript (#110413)"
This reverts commit 27e31ab6e8.

Reverted https://github.com/pytorch/pytorch/pull/110413 on behalf of https://github.com/PaliC due to breaking internal builds ([comment](https://github.com/pytorch/pytorch/pull/110413#issuecomment-1799003164))
2023-11-07 15:53:32 +00:00
Thiago Crepaldi
eefe327b11 Rename torch.onnx.ExportOutput* to ONNXProgram* (#112263)
Since PyTorch 2.1, torch.export API was introduced and the term "export"
got overloaded due to the already existing torch.onnx.export API.

The torch.onnx.dynamo_export API was introduced on pyTorch 2.0 and it
exposed a torch.onnx.ExportOutput which now can be confused with
torch.export.export output

To prevent such ambiguity and standardize names around the new
torch.export.ExportedProgram, this PR renames torch.onnx.ExportOutput to
torch.onnx.ONNXProgram

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112263
Approved by: https://github.com/BowenBao
ghstack dependencies: #112444
2023-11-06 22:27:15 +00:00
Antonio Kim
27e31ab6e8 Add support for torch.Generator type in TorchScript (#110413)
- Add support for `torch.Generator` type in TorchScript
- Add `generator` args to all `torch.nn.init` functions that call `uniform_` or `normal_`
- Add support for `torch.Generator` in LTC's TorchScript backend (CC: @wconstab)

CC: @eellison @davidberard98 @GlebKazantaev @behzad-a
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110413
Approved by: https://github.com/wconstab, https://github.com/albanD, https://github.com/glebk-cerebras, https://github.com/davidberard98
2023-11-06 21:27:02 +00:00
Peter Bell
718035791d Prefer e.is_number over not e.free_symbols in SymPy (#112688)
We spend somewhere on the order 1% in `sympy.Expr.free_symbols` as it is called millions of times.
Most of the time we actually just want to know "is this a constant", however `e.is_constant()` is
horribly slow. It turns out though that there is another propery `is_number` that does what we want.

> property is_number:
>
> Returns True if self has no free symbols and no undefined functions (AppliedUndef, to be precise). It will be faster
> than if not self.free_symbols, however, since is_number will fail as soon as it hits a free symbol or undefined
> function.

Even further, we also avoid the overhead of building the unnecessary set object.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112688
Approved by: https://github.com/lezcano
2023-11-06 20:05:13 +00:00
Chien-Chin Huang
9d0c3e21d0 [state_dict][9/N] Add get and set APIs for model and optimizer state_dict (#112203)
The original get_state_dict and set_state_dict pair is too complicated because of the possible combinations of usages. This PR adds the APIs to get/set model_state_dict and optimizer_state_dict seperately.

Differential Revision: [D50713584](https://our.internmc.facebook.com/intern/diff/D50713584/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112203
Approved by: https://github.com/wz337
ghstack dependencies: #112167
2023-11-02 22:03:57 +00:00
Zhengxu Chen
50767a075a [export] Clean up verifier [1/n]. (#112505)
Summary: Some adjustments to verifier so that it's easier to use it correctly. We will enable verifier later, so the current diff is no-op.

Test Plan: CI

Differential Revision: D50839295

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112505
Approved by: https://github.com/tugsbayasgalan, https://github.com/angelayi
2023-11-02 19:36:06 +00:00
Jerry Zhang
6929ebf2b0 [quant][docs] Add x86 inductor quant docs (#112648)
Summary:
att

Test Plan:
.

Reviewers:

Subscribers:

Tasks:

Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112648
Approved by: https://github.com/leslie-fang-intel, https://github.com/jgong5, https://github.com/andrewor14
2023-11-02 17:02:09 +00:00
Edward Z. Yang
09df6b771b Add a note about performant record_stream use. (#112526)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112526
Approved by: https://github.com/albanD
2023-11-02 15:50:22 +00:00
David Berard
8191fb3e06 [Reland2] [inductor][BE] split triton_meta and inductor_meta (#112351)
triton_meta is intended to be passed directly to triton. Previous we were also putting other metadata into triton_meta; but we should split out the other metadata into a separate dict to avoid possible conficts in the future.

This PR splits out triton_meta and inductor_meta so we have a place to put additional metadata that isn't intended to be passed to triton.

Tests - wait for CI

Differential Revision: [D50864493](https://our.internmc.facebook.com/intern/diff/D50864493)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112351
Approved by: https://github.com/eellison
2023-11-02 00:40:12 +00:00
angelayi
131e0f1b75 [export] Separate out graph signature (#112412)
Differential Revision: [D50800524](https://our.internmc.facebook.com/intern/diff/D50800524)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112412
Approved by: https://github.com/zhxchen17
2023-11-02 00:18:28 +00:00
Kurt Mohler
fd209543d5 Add torch.utils.deterministic.fill_uninitialized_memory flag (#111377)
Part of #109802

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111377
Approved by: https://github.com/albanD, https://github.com/aaronenyeshi
2023-11-01 16:10:09 +00:00
Till Hoffmann
5296c14094 Add inverse gamma distribution and fix sign bug in PowerTransform. (#104501)
This PR comprises a few small contributions:

1. `PowerTransform` returned a sign of `+1` irrespective of exponent. However, it should return the sign of the exponent because the gradient has the same sign as the exponent. That issue has been fixed.
2. Added tests to catch errors akin to 1. in the future.
3. Added an `InverseGamma` distribution as a `TransformedDistribution` with `PowerTransform(-1)` and `Gamma` base distribution. The `InverseGamma` is often used as a prior for the length scale of Gaussian processes to aggressively suppress short length scales (see [here](https://betanalpha.github.io/assets/case_studies/gaussian_processes.html#323_Informative_Prior_Model) for a discussion).

Note: I added a `positive` constraint for the support of the inverse gamma distribution because the `PowerTransform(-1)` can fail for `nonnegative` constraints if the random variable is zero.

```python
>>> torch.distributions.InverseGamma(0.5, 1.0).log_prob(torch.zeros(1))
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-8-758aa22deacd> in <module>
----> 1 torch.distributions.InverseGamma(0.5, 1.0).log_prob(torch.zeros(1))

~/git/pytorch/torch/distributions/transformed_distribution.py in log_prob(self, value)
    140         """
    141         if self._validate_args:
--> 142             self._validate_sample(value)
    143         event_dim = len(self.event_shape)
    144         log_prob = 0.0

~/git/pytorch/torch/distributions/distribution.py in _validate_sample(self, value)
    298         valid = support.check(value)
    299         if not valid.all():
--> 300             raise ValueError(
    301                 "Expected value argument "
    302                 f"({type(value).__name__} of shape {tuple(value.shape)}) "

ValueError: Expected value argument (Tensor of shape (1,)) to be within the support (GreaterThan(lower_bound=0.0)) of the distribution InverseGamma(), but found invalid values:
tensor([0.])
```

This differs from the scipy implementation.

```python
>>> scipy.stats.invgamma(0.5).pdf(0)
0.0
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104501
Approved by: https://github.com/fritzo, https://github.com/ezyang
2023-11-01 02:26:25 +00:00
Tugsbayasgalan Manlaibaatar
36164265ae [export oncall] add some examples during oncall (#112445)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112445
Approved by: https://github.com/ydwu4
2023-10-31 18:33:03 +00:00
Devang Aggarwal
69b9e54d45 Add openvino backend into torch.compile docs (#112321)
The torch.compile [docs page](https://pytorch.org/docs/stable/torch.compiler.html) shows commonly used backends through torch.compile. Recently, the OpenVINO backend for torch.compile was released. This PR adds the torch.compile openvino backend into the torch.compile docs page.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112321
Approved by: https://github.com/msaroufim
2023-10-30 20:13:41 +00:00
PyTorch MergeBot
ace2713d1e Revert "Add torch.utils.deterministic.fill_uninitialized_memory flag (#111377)"
This reverts commit f1785373c0.

Reverted https://github.com/pytorch/pytorch/pull/111377 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally ([comment](https://github.com/pytorch/pytorch/pull/111377#issuecomment-1784179040))
2023-10-29 17:41:55 +00:00
agunapal
1460e5b7f5 updated aarch64 maintainers in docs (#112047)
This PR adds a new section for maintainers of `aarch64`.

Adding @snadampal to the list

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112047
Approved by: https://github.com/atalman
2023-10-27 21:09:36 +00:00
lezcano
47ccf04885 Split SymNode into its own file (#112037)
This PR:

- Moves TrueDiv, LShift, RShift, IsNonOverlappingAndDenseIndicator to `_sympy.functions.py`
- Moves SymNode to `fx.experimental.sym_node`.
  - This file does not have any SymPy dependencies at import time
  - It installs the magic methods in Sym{Bool,Int,Float}.
  - N.b. With this split, we may be able to move Sym{Bool,Int,Float} to this file, and remove quite a few of the hacks around these classes
- Imports `sym_node` in `torch/__init__.py` rather than the whole `symbolic_shapes.py`.
  This breaks the import-time dependency between torch and SymPy

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112037
Approved by: https://github.com/peterbell10
ghstack dependencies: #112035, #112036
2023-10-26 23:32:27 +00:00
Kurt Mohler
f1785373c0 Add torch.utils.deterministic.fill_uninitialized_memory flag (#111377)
Part of #109802

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111377
Approved by: https://github.com/albanD
2023-10-26 02:39:06 +00:00
eellison
7fe51e3e9b Add cudagraph_mark_step_begin in torch.compiler, reference in error message (#111722)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111722
Approved by: https://github.com/ezyang, https://github.com/msaroufim
2023-10-25 21:53:21 +00:00
Mikayla Gawarecki
b54ab57522 Document torch.from_file and fix UntypedStorage.from_file docs (#111688)
Fixes https://github.com/pytorch/pytorch/issues/37439

Also threads through filename so it is accessible via `t.storage().filename`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111688
Approved by: https://github.com/albanD
2023-10-25 19:28:11 +00:00
Thiago Crepaldi
9d4dbebc34 Add support to ExportedProgram as input to torch.onnx.dynamo_export (#111497)
Fixes #109889

This PR adds `torch.export.export` as another `FXGraphExtractor` implementation. `torch.onnx.dynamo_export` automatically uses this new FX tracer when a `torch.export.ExportedProgram` is specified as `model`

Implementation is back compatible, thus non `ExportedProgram` models are handled the exact same way as before
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111497
Approved by: https://github.com/BowenBao
2023-10-25 18:11:19 +00:00
PyTorch MergeBot
5120c97f32 Revert "Add support to ExportedProgram as input to torch.onnx.dynamo_export (#111497)"
This reverts commit 4f42edfb6e.

Reverted https://github.com/pytorch/pytorch/pull/111497 on behalf of https://github.com/huydhn due to Sorry for reverting your change, it is failing ONNX test in trunk 4f42edfb6e, possibly a landrace ([comment](https://github.com/pytorch/pytorch/pull/111497#issuecomment-1778519212))
2023-10-25 05:07:00 +00:00
Thiago Crepaldi
4f42edfb6e Add support to ExportedProgram as input to torch.onnx.dynamo_export (#111497)
Fixes #109889

This PR adds `torch.export.export` as another `FXGraphExtractor` implementation. `torch.onnx.dynamo_export` automatically uses this new FX tracer when a `torch.export.ExportedProgram` is specified as `model`

Implementation is back compatible, thus non `ExportedProgram` models are handled the exact same way as before
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111497
Approved by: https://github.com/BowenBao
2023-10-25 00:17:43 +00:00
PyTorch MergeBot
e62c887bab Revert "[inductor][BE] split triton_meta and inductor_meta (#111397)"
This reverts commit 070b94dc08.

Reverted https://github.com/pytorch/pytorch/pull/111397 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally ([comment](https://github.com/pytorch/pytorch/pull/111397#issuecomment-1776282039))
2023-10-24 00:52:24 +00:00
Richard Zou
0ea9646cdd Rewrite torch.library's documentation (#111310)
We mention the higher-level torch.library APIs and put the original docs
into a low-level API section.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111310
Approved by: https://github.com/soulitzer
ghstack dependencies: #111380, #111659
2023-10-23 23:02:41 +00:00
Nikita Shulga
d22e5e4b52 Fix DDP notes (#111833)
To include `import os` otherwise sample is not syntactically correct Reported in https://github.com/pytorch/pytorch.github.io/pull/1490

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111833
Approved by: https://github.com/wanchaol
2023-10-23 22:05:36 +00:00
David Berard
070b94dc08 [inductor][BE] split triton_meta and inductor_meta (#111397)
triton_meta is intended to be passed directly to triton. Previous we were also putting other metadata into triton_meta; but we should split out the other metadata into a separate dict to avoid possible conficts in the future.

This PR splits out triton_meta and inductor_meta so we have a place to put additional metadata that isn't intended to be passed to triton.

Tests - wait for CI

Differential Revision: [D50442547](https://our.internmc.facebook.com/intern/diff/D50442547)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111397
Approved by: https://github.com/shunting314, https://github.com/eellison
2023-10-23 21:38:21 +00:00
ydwu4
f3d02d9ae6 Add support for sym_ite (#111440)
This PR supports sym_ite. This is useful for converting SymBool to SymInt in e.g. #109916. Internally, it uses sympy.Piecewise. We cannot use sympy.ITE because it expects the arguments and output all to be boolean type but we want return SymInt type when converting a SymBool to SymInt. So we use sympy.Piecewise to denote the symbolic relationship.

Note that this pr uses the range analysis for sympy.Piecewise implemented in https://github.com/pytorch/pytorch/blob/main/torch/utils/_sympy/value_ranges.py.

Test Plan:
See added test.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111440
Approved by: https://github.com/ezyang
2023-10-23 16:17:43 +00:00
eqy
894b9957c8 [DOCS][CUDA] Update TF32 docs for sm90 (#111337)
For #110252.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111337
Approved by: https://github.com/msaroufim
2023-10-19 09:36:13 +00:00
PyTorch MergeBot
7a740e2b85 Revert "direct runtime assertions (#111262)"
This reverts commit e6d9350d7f.

Reverted https://github.com/pytorch/pytorch/pull/111262 on behalf of https://github.com/jeanschmidt due to Breaking internal builds ([comment](https://github.com/pytorch/pytorch/pull/111262#issuecomment-1765881675))
2023-10-17 08:04:36 +00:00
Chien-Chin Huang
19a6487ad4 [state_dict][6/N] Change API names to avoid conflict and simplify the API signatures (#111120)
`state_dict` is a very common variable name people use to represent a local
state_dict and `load_state_dict` conflicts with DCP's `load_state_dict`.

This PR changes `state_dict` to `get_state_dict`. `get_state_dict` is more close to what is this API does -- users use the API to get the current state_dict for saving or for loading (passed to DCP for loading in-place)..

This PR also changes `load_state_dict` to `set_state_dict`. `set_state_dict` is less ideal compared to `get_state_dict` but is symetric. We can still change the API name before it goes to beta.

This PR also simplies the API signatures. `model_only` is removed and `optim_only` only exists for `get_state_dict`.

Differential Revision: [D50213931](https://our.internmc.facebook.com/intern/diff/D50213931/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111120
Approved by: https://github.com/wz337
ghstack dependencies: #111106, #111107, #111275, #111109, #111110
2023-10-17 00:15:31 +00:00
Avik Chaudhuri
e6d9350d7f direct runtime assertions (#111262)
Previously we were generating a graph to add runtime assertions on inputs and then running that graph to check input constraints. This PR checks input constraints directly.

Differential Revision: D50289970

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111262
Approved by: https://github.com/zhxchen17
2023-10-15 05:15:09 +00:00
fduwjj
ff3d773dd9 [TP] Add deprecation warnings in the documentations for Pairwise parallel, sequence parallel and other prepare input/output functions (#111176)
As part of TP UX improvements, we want to keep our API simple (not easy) so that users get the flexibility to do what they want and avoid a too generic API which tries to solve everything and get things too complicated. We are updating the doc accordingly.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111176
Approved by: https://github.com/wanchaol
ghstack dependencies: #111160, #111166
2023-10-15 00:39:24 +00:00
fduwjj
8085e08a84 [TP] Add prepareInput and output for input/output DTensor layout annotation in the parent module in TP API (#111166)
In some use cases, we found that users might want to annote the input/output DTensor layout for the parent module rather than the submodule whose parameters are to be distributed so that we want to have these two class for users to annote input/output DTensor layouts so that we register pre-FWD/FWD hook for the TP-lized module.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111166
Approved by: https://github.com/wanchaol
ghstack dependencies: #111160
2023-10-14 15:37:52 +00:00
Chien-Chin Huang
7c67139e7b [state_dict][3/N] Cleanup StateDictOptions, make it more readable (#111275)
This is a reland PR for https://github.com/pytorch/pytorch/pull/111108 with the proper docstring fix.

1. Rename DistributedStateDictOptions to StateDictOptions.
2. Remove cpu_offload as we have not yet required this option.
3. Rename save_frozen_parameters to ignore_frozen_params.

Differential Revision: [D50294352](https://our.internmc.facebook.com/intern/diff/D50294352/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111275
Approved by: https://github.com/wz337
ghstack dependencies: #111106, #111107
2023-10-14 15:34:52 +00:00
yewentao
c151163333 Documentation Clarification on torch.compile Example (#110942)
Fixes #110917
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110942
Approved by: https://github.com/msaroufim, https://github.com/malfet
2023-10-13 22:46:42 +00:00
Kaichao You
69dcbc02b0 [Dynamo]Expose bytecode hooks and add example usage for decompilation in docs (#110714)
Dynamo dynamically translate bytecode of python functions, which is powerful but with difficult-to-understand bytecode. Most users cannot understand python bytecode. Although a general purpose way to decompile python bytecode into source code is very difficult, I find that this work can be greatly simplified since Dynamo already cleans up the code: the bytecode generated by Dynamo is a reduced subset of well-behaved python bytecode.

I created a tiny decompiler for pytorch 2.0, named `depyf`: https://github.com/youkaichao/depyf .

There are several takeaways:

- **It supports pyton 3.7 - 3.11 (both inclusive), the same python versions supported by pytorch.** Since the main usage of this library is to understand pytorch 2.0, I plan to keep pace with pytorch. If pytorch supports a new python version, I can add support for that. (Actually, the core code is just about 1k lines. Adding support for new versions of python bytecode can be done in just several days.)
- **I have tested the correctness of decompiled source code in torchbench.** I capture the modified bytecode generated by Dynamo, decompile it into source code, and then compile it into new bytecode, replace the Dynamo generated bytecode with new bytecode. And **it passed all the accuracy tests for timm models**. For huggingface models, the situation is more complicated: all failed cases are caused by the compile step: some functions use the `__class__`  as closure variables, but decompiler can only get the code object, so it has no way to figure out the `__class__` , leading to a name error when compiling the decompiled code. That said, it passed the rest tests without the `__class__` issue. Please see the log file https://cloud.tsinghua.edu.cn/f/685e4af8d930499baa7c/?dl=1 and https://cloud.tsinghua.edu.cn/f/cab89500e15e4b62890b/?dl=1 for details.

With the above efforts, I think it would be great to add an additional logging option in Dynamo: we can try to decompile the generated bytecode into source code, so that users can have a rough idea of what the modified bytecode does. It does not affect the workflow of Dynamo, but just adds more debug information.

An example code from the [doc](https://pytorch.org/docs/main/torch.compiler_deepdive.html):

```python
from typing import List
import torch
from torch import _dynamo as torchdynamo
def my_compiler(gm: torch.fx.GraphModule, example_inputs: List[torch.Tensor]):
    print("my_compiler() called with FX graph:")
    gm.graph.print_tabular()
    return gm.forward  # return a python callable

@torchdynamo.optimize(my_compiler)
def toy_example(a, b):
    x = a / (torch.abs(a) + 1)
    if b.sum() < 0:
        b = b * -1
    return x * b
for _ in range(100):
    toy_example(torch.randn(10), torch.randn(10))
```

Run with `export TORCH_LOGS="+dynamo,guards,bytecode"`.

Bytecode logging:

```
[2023-10-06 23:56:44,929] [0/0] torch._dynamo.convert_frame.__bytecode: [DEBUG] ORIGINAL BYTECODE toy_example /Users/youkaichao/DeepLearning/depyf/ykc_test.py line 8
[2023-10-06 23:56:44,929] [0/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]  10           0 LOAD_FAST                0 (a)
[2023-10-06 23:56:44,929] [0/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]               2 LOAD_GLOBAL              0 (torch)
[2023-10-06 23:56:44,929] [0/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]               4 LOAD_METHOD              1 (abs)
[2023-10-06 23:56:44,929] [0/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]               6 LOAD_FAST                0 (a)
[2023-10-06 23:56:44,929] [0/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]               8 CALL_METHOD              1
[2023-10-06 23:56:44,929] [0/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]              10 LOAD_CONST               1 (1)
[2023-10-06 23:56:44,929] [0/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]              12 BINARY_ADD
[2023-10-06 23:56:44,929] [0/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]              14 BINARY_TRUE_DIVIDE
[2023-10-06 23:56:44,929] [0/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]              16 STORE_FAST               2 (x)
[2023-10-06 23:56:44,929] [0/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]
[2023-10-06 23:56:44,929] [0/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]  11          18 LOAD_FAST                1 (b)
[2023-10-06 23:56:44,929] [0/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]              20 LOAD_METHOD              2 (sum)
[2023-10-06 23:56:44,929] [0/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]              22 CALL_METHOD              0
[2023-10-06 23:56:44,929] [0/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]              24 STORE_FAST               3 (__temp_2)
[2023-10-06 23:56:44,929] [0/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]
[2023-10-06 23:56:44,929] [0/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]  12          26 LOAD_FAST                3 (__temp_2)
[2023-10-06 23:56:44,929] [0/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]              28 LOAD_CONST               2 (0)
[2023-10-06 23:56:44,929] [0/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]              30 COMPARE_OP               0 (<)
[2023-10-06 23:56:44,929] [0/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]              32 POP_JUMP_IF_FALSE       21 (to 42)
[2023-10-06 23:56:44,929] [0/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]
[2023-10-06 23:56:44,929] [0/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]  13          34 LOAD_FAST                1 (b)
[2023-10-06 23:56:44,929] [0/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]              36 LOAD_CONST               3 (-1)
[2023-10-06 23:56:44,929] [0/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]              38 BINARY_MULTIPLY
[2023-10-06 23:56:44,929] [0/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]              40 STORE_FAST               1 (b)
[2023-10-06 23:56:44,929] [0/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]
[2023-10-06 23:56:44,929] [0/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]  14     >>   42 LOAD_FAST                2 (x)
[2023-10-06 23:56:44,929] [0/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]              44 LOAD_FAST                1 (b)
[2023-10-06 23:56:44,929] [0/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]              46 BINARY_MULTIPLY
[2023-10-06 23:56:44,929] [0/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]              48 RETURN_VALUE
[2023-10-06 23:56:44,929] [0/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]
[2023-10-06 23:56:44,929] [0/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]
[2023-10-06 23:56:44,930] [0/0] torch._dynamo.convert_frame.__bytecode: [DEBUG] MODIFIED BYTECODE toy_example /Users/youkaichao/DeepLearning/depyf/ykc_test.py line 8
[2023-10-06 23:56:44,930] [0/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]   8           0 LOAD_GLOBAL              3 (__compiled_fn_0)
[2023-10-06 23:56:44,930] [0/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]               2 LOAD_FAST                0 (a)
[2023-10-06 23:56:44,930] [0/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]               4 LOAD_FAST                1 (b)
[2023-10-06 23:56:44,930] [0/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]               6 CALL_FUNCTION            2
[2023-10-06 23:56:44,930] [0/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]               8 UNPACK_SEQUENCE          2
[2023-10-06 23:56:44,930] [0/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]              10 STORE_FAST               2 (x)
[2023-10-06 23:56:44,930] [0/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]              12 POP_JUMP_IF_FALSE       12 (to 24)
[2023-10-06 23:56:44,930] [0/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]              14 LOAD_GLOBAL              4 (__resume_at_34_1)
[2023-10-06 23:56:44,930] [0/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]              16 LOAD_FAST                1 (b)
[2023-10-06 23:56:44,930] [0/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]              18 LOAD_FAST                2 (x)
[2023-10-06 23:56:44,930] [0/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]              20 CALL_FUNCTION            2
[2023-10-06 23:56:44,930] [0/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]              22 RETURN_VALUE
[2023-10-06 23:56:44,930] [0/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]         >>   24 LOAD_GLOBAL              5 (__resume_at_42_2)
[2023-10-06 23:56:44,930] [0/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]              26 LOAD_FAST                1 (b)
[2023-10-06 23:56:44,930] [0/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]              28 LOAD_FAST                2 (x)
[2023-10-06 23:56:44,930] [0/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]              30 CALL_FUNCTION            2
[2023-10-06 23:56:44,930] [0/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]              32 RETURN_VALUE
[2023-10-06 23:56:44,930] [0/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]
[2023-10-06 23:56:44,930] [0/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]
```

New output with this PR:

```
[2023-10-06 16:25:21,535] [0/0] torch._dynamo.convert_frame.__bytecode: [DEBUG] possible source code:
[2023-10-06 16:25:21,535] [0/0] torch._dynamo.convert_frame.__bytecode: [DEBUG] def toy_example(a, b):
[2023-10-06 16:25:21,535] [0/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]     __temp_1 = __compiled_fn_0(a, b)
[2023-10-06 16:25:21,535] [0/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]     x = __temp_1[0]
[2023-10-06 16:25:21,535] [0/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]     if __temp_1[1]:
[2023-10-06 16:25:21,535] [0/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]         return __resume_at_34_1(b, x)
[2023-10-06 16:25:21,535] [0/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]     return __resume_at_42_2(b, x)
[2023-10-06 16:25:21,535] [0/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]
[2023-10-06 16:25:21,535] [0/0] torch._dynamo.convert_frame.__bytecode: [DEBUG] If you find the decompiled code is wrong,please submit an issue at https://github.com/youkaichao/depyf/issues.
```

The rest two log (please pay attention to the output `possible source code:`):

```
[2023-10-06 16:25:21,566] [1/0] torch._dynamo.convert_frame.__bytecode: [DEBUG] ORIGINAL BYTECODE <resume in toy_example> /workspace/youkaichao/code/pytorch/ykc.py line 12
[2023-10-06 16:25:21,566] [1/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]  12           0 JUMP_ABSOLUTE           22 (to 44)
[2023-10-06 16:25:21,566] [1/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]               2 LOAD_FAST                2 (a)
[2023-10-06 16:25:21,566] [1/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]               4 LOAD_GLOBAL              0 (torch)
[2023-10-06 16:25:21,566] [1/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]               6 LOAD_ATTR                1 (abs)
[2023-10-06 16:25:21,566] [1/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]               8 LOAD_FAST                2 (a)
[2023-10-06 16:25:21,566] [1/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]              10 CALL_FUNCTION            1
[2023-10-06 16:25:21,566] [1/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]              12 LOAD_CONST               1 (1)
[2023-10-06 16:25:21,566] [1/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]              14 BINARY_ADD
[2023-10-06 16:25:21,566] [1/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]              16 BINARY_TRUE_DIVIDE
[2023-10-06 16:25:21,566] [1/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]              18 STORE_FAST               1 (x)
[2023-10-06 16:25:21,566] [1/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]              20 LOAD_FAST                0 (b)
[2023-10-06 16:25:21,566] [1/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]              22 LOAD_ATTR                2 (sum)
[2023-10-06 16:25:21,566] [1/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]              24 CALL_FUNCTION            0
[2023-10-06 16:25:21,566] [1/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]              26 STORE_FAST               3 (__temp_2)
[2023-10-06 16:25:21,566] [1/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]              28 LOAD_FAST                3 (__temp_2)
[2023-10-06 16:25:21,566] [1/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]              30 LOAD_CONST               2 (0)
[2023-10-06 16:25:21,566] [1/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]              32 COMPARE_OP               0 (<)
[2023-10-06 16:25:21,566] [1/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]              34 POP_JUMP_IF_FALSE       22 (to 44)
[2023-10-06 16:25:21,566] [1/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]              36 LOAD_FAST                0 (b)
[2023-10-06 16:25:21,566] [1/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]              38 LOAD_CONST               3 (-1)
[2023-10-06 16:25:21,566] [1/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]              40 BINARY_MULTIPLY
[2023-10-06 16:25:21,566] [1/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]              42 STORE_FAST               0 (b)
[2023-10-06 16:25:21,566] [1/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]
[2023-10-06 16:25:21,566] [1/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]  14     >>   44 LOAD_FAST                1 (x)
[2023-10-06 16:25:21,566] [1/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]              46 LOAD_FAST                0 (b)
[2023-10-06 16:25:21,566] [1/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]              48 BINARY_MULTIPLY
[2023-10-06 16:25:21,566] [1/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]              50 RETURN_VALUE
[2023-10-06 16:25:21,566] [1/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]
[2023-10-06 16:25:21,566] [1/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]
[2023-10-06 16:25:21,566] [1/0] torch._dynamo.convert_frame.__bytecode: [DEBUG] MODIFIED BYTECODE <resume in toy_example> /workspace/youkaichao/code/pytorch/ykc.py line 12
[2023-10-06 16:25:21,566] [1/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]  12           0 LOAD_GLOBAL              3 (__compiled_fn_3)
[2023-10-06 16:25:21,566] [1/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]               2 LOAD_FAST                0 (b)
[2023-10-06 16:25:21,566] [1/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]               4 LOAD_FAST                1 (x)
[2023-10-06 16:25:21,566] [1/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]               6 CALL_FUNCTION            2
[2023-10-06 16:25:21,566] [1/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]               8 UNPACK_SEQUENCE          1
[2023-10-06 16:25:21,566] [1/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]              10 RETURN_VALUE
[2023-10-06 16:25:21,566] [1/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]
[2023-10-06 16:25:21,566] [1/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]
[2023-10-06 16:25:21,567] [1/0] torch._dynamo.convert_frame.__bytecode: [DEBUG] possible source code:
[2023-10-06 16:25:21,567] [1/0] torch._dynamo.convert_frame.__bytecode: [DEBUG] def <resume in toy_example>(b, x):
[2023-10-06 16:25:21,567] [1/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]     return __compiled_fn_3(b, x)[0]
[2023-10-06 16:25:21,567] [1/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]
[2023-10-06 16:25:21,567] [1/0] torch._dynamo.convert_frame.__bytecode: [DEBUG] If you find the decompiled code is wrong,please submit an issue at https://github.com/youkaichao/depyf/issues.
```

```
[2023-10-06 16:25:21,579] [2/0] torch._dynamo.convert_frame.__bytecode: [DEBUG] ORIGINAL BYTECODE <resume in toy_example> /workspace/youkaichao/code/pytorch/ykc.py line 12
[2023-10-06 16:25:21,579] [2/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]  12           0 JUMP_ABSOLUTE           18 (to 36)
[2023-10-06 16:25:21,579] [2/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]               2 LOAD_FAST                2 (a)
[2023-10-06 16:25:21,579] [2/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]               4 LOAD_GLOBAL              0 (torch)
[2023-10-06 16:25:21,579] [2/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]               6 LOAD_ATTR                1 (abs)
[2023-10-06 16:25:21,579] [2/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]               8 LOAD_FAST                2 (a)
[2023-10-06 16:25:21,579] [2/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]              10 CALL_FUNCTION            1
[2023-10-06 16:25:21,579] [2/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]              12 LOAD_CONST               1 (1)
[2023-10-06 16:25:21,579] [2/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]              14 BINARY_ADD
[2023-10-06 16:25:21,579] [2/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]              16 BINARY_TRUE_DIVIDE
[2023-10-06 16:25:21,579] [2/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]              18 STORE_FAST               1 (x)
[2023-10-06 16:25:21,579] [2/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]              20 LOAD_FAST                0 (b)
[2023-10-06 16:25:21,579] [2/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]              22 LOAD_ATTR                2 (sum)
[2023-10-06 16:25:21,579] [2/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]              24 CALL_FUNCTION            0
[2023-10-06 16:25:21,579] [2/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]              26 STORE_FAST               3 (__temp_2)
[2023-10-06 16:25:21,579] [2/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]              28 LOAD_FAST                3 (__temp_2)
[2023-10-06 16:25:21,579] [2/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]              30 LOAD_CONST               2 (0)
[2023-10-06 16:25:21,579] [2/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]              32 COMPARE_OP               0 (<)
[2023-10-06 16:25:21,579] [2/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]              34 POP_JUMP_IF_FALSE       22 (to 44)
[2023-10-06 16:25:21,579] [2/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]
[2023-10-06 16:25:21,579] [2/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]  13     >>   36 LOAD_FAST                0 (b)
[2023-10-06 16:25:21,579] [2/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]              38 LOAD_CONST               3 (-1)
[2023-10-06 16:25:21,579] [2/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]              40 BINARY_MULTIPLY
[2023-10-06 16:25:21,579] [2/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]              42 STORE_FAST               0 (b)
[2023-10-06 16:25:21,579] [2/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]
[2023-10-06 16:25:21,579] [2/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]  14     >>   44 LOAD_FAST                1 (x)
[2023-10-06 16:25:21,579] [2/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]              46 LOAD_FAST                0 (b)
[2023-10-06 16:25:21,579] [2/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]              48 BINARY_MULTIPLY
[2023-10-06 16:25:21,579] [2/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]              50 RETURN_VALUE
[2023-10-06 16:25:21,579] [2/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]
[2023-10-06 16:25:21,579] [2/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]
[2023-10-06 16:25:21,580] [2/0] torch._dynamo.convert_frame.__bytecode: [DEBUG] MODIFIED BYTECODE <resume in toy_example> /workspace/youkaichao/code/pytorch/ykc.py line 12
[2023-10-06 16:25:21,580] [2/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]  12           0 LOAD_GLOBAL              3 (__compiled_fn_4)
[2023-10-06 16:25:21,580] [2/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]               2 LOAD_FAST                0 (b)
[2023-10-06 16:25:21,580] [2/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]               4 LOAD_FAST                1 (x)
[2023-10-06 16:25:21,580] [2/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]               6 CALL_FUNCTION            2
[2023-10-06 16:25:21,580] [2/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]               8 UNPACK_SEQUENCE          1
[2023-10-06 16:25:21,580] [2/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]              10 RETURN_VALUE
[2023-10-06 16:25:21,580] [2/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]
[2023-10-06 16:25:21,580] [2/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]
[2023-10-06 16:25:21,580] [2/0] torch._dynamo.convert_frame.__bytecode: [DEBUG] possible source code:
[2023-10-06 16:25:21,580] [2/0] torch._dynamo.convert_frame.__bytecode: [DEBUG] def <resume in toy_example>(b, x):
[2023-10-06 16:25:21,580] [2/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]     return __compiled_fn_4(b, x)[0]
[2023-10-06 16:25:21,580] [2/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]
[2023-10-06 16:25:21,580] [2/0] torch._dynamo.convert_frame.__bytecode: [DEBUG] If you find the decompiled code is wrong,please submit an issue at https://github.com/youkaichao/depyf/issues.
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110714
Approved by: https://github.com/jansel
2023-10-13 12:36:00 +00:00
Zhengxu Chen
168bad5f23 [export] Reland "Fix graph signature data model to list of specs." (#111136)
Summary: reland D49876258

Test Plan: CI

Differential Revision: D50224384

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111136
Approved by: https://github.com/angelayi
2023-10-13 02:04:29 +00:00
Matthew Hoffman
ad4472833c define public API for torch.nn.utils (#111026)
Adding modules imported here and the following functions to the `__all__`:
* [clip_grad_norm_](https://pytorch.org/docs/stable/generated/torch.nn.utils.clip_grad_norm_.html)
* [clip_grad_value_](https://pytorch.org/docs/stable/generated/torch.nn.utils.clip_grad_value_.html)
* [remove_weight_norm](https://pytorch.org/docs/stable/generated/torch.nn.utils.remove_weight_norm.html)
* [parameters_to_vector](https://pytorch.org/docs/stable/generated/torch.nn.utils.parameters_to_vector.html)
* [vector_to_parameters](https://pytorch.org/docs/stable/generated/torch.nn.utils.vector_to_parameters.html)
* [remove_spectral_norm](https://pytorch.org/docs/stable/generated/torch.nn.utils.remove_spectral_norm.html)
* [skip_init](https://pytorch.org/docs/stable/generated/torch.nn.utils.skip_init.html)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111026
Approved by: https://github.com/mikaylagawarecki
2023-10-12 23:05:23 +00:00
PyTorch MergeBot
42b89aea4b Revert "[export] Fix graph signature data model to list of specs. (#111017)"
This reverts commit 33b69509d3.

Reverted https://github.com/pytorch/pytorch/pull/111017 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally ([comment](https://github.com/pytorch/pytorch/pull/111017#issuecomment-1759292161))
2023-10-12 09:52:33 +00:00
Tugsbayasgalan Manlaibaatar
5614023f5e Move export.constrain_as_* to torch._constrain_as_* (#110757)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110757
Approved by: https://github.com/avikchaudhuri
ghstack dependencies: #109859
2023-10-12 05:37:44 +00:00
PyTorch MergeBot
6ce3a38050 Revert "Move export.constrain_as_* to torch._constrain_as_* (#110757)"
This reverts commit 5aee22e0e0.

Reverted https://github.com/pytorch/pytorch/pull/110757 on behalf of https://github.com/kit1980 due to Depends on https://github.com/pytorch/pytorch/pull/109859 that needs to be reverted ([comment](https://github.com/pytorch/pytorch/pull/110757#issuecomment-1758908371))
2023-10-12 04:53:29 +00:00
albanD
5e8be63e99 Allow specifiying inputs as GradientEdge in autograd APIs (#110867)
This can be useful for advanced users (like AOTAutograd) who don't want to keep the corresponding Tensor alive (for memory reasons for example) or when inplace op will change the Tensor's grad_fn (but gradients wrt to the original value is needed).

I went minimal API change but open to suggestions.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110867
Approved by: https://github.com/soulitzer
2023-10-12 04:08:44 +00:00
Zhengxu Chen
33b69509d3 [export] Fix graph signature data model to list of specs. (#111017)
Summary:
Previously we design the GraphSignature format as a bunch of inputs and outputs node names. After a discussion in the design meeting we decide to change the format to make signature more self-contained. Now the signature format look like the following:
```
[
InputSpec(
   kind=InputKind.USER_INPUT,
   arg=TensorArgument(name="arg0_1"),
   target=None,
),
...
]
```

Test Plan: CI

Reviewed By: angelayi

Differential Revision: D49876258

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111017
Approved by: https://github.com/angelayi
2023-10-12 03:39:04 +00:00
Kurt Mohler
5292a92e03 Add torch.unravel_index (#110580)
Fixes #35674

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110580
Approved by: https://github.com/lezcano, https://github.com/kulinseth
2023-10-12 00:55:51 +00:00
Michael Voznesensky
1e7947b3e0 Revert "Reland 3rd try [finishing colesbury's PR 100642] Guard on nn.Module dicts and type (#109323)" + Forward fixes + test (#110964)
This reverts commit f786fbdebd.

Forward fixes

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110964
Approved by: https://github.com/ezyang, https://github.com/anijain2305
2023-10-11 05:16:47 +00:00
wz337
a614281ea9 Add current_device() to torch.cpu (#110987)
Better support device agnostic, add a "cpu" return for `current_device()` in torch.cpu so that we won't run into `AttributeError: module 'torch.cpu' has no attribute 'current_device'`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110987
Approved by: https://github.com/wanchaol
2023-10-11 05:13:10 +00:00
Tugsbayasgalan Manlaibaatar
5aee22e0e0 Move export.constrain_as_* to torch._constrain_as_* (#110757)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110757
Approved by: https://github.com/avikchaudhuri
ghstack dependencies: #109859
2023-10-11 02:37:55 +00:00
soulitzer
c9eb8d8d90 Add set_checkpoint_debug_enabled that overrides local setting (#110728)
People access activation checkpoint through many layers of config and it is not always guaranteed that all the layers of wrapping around checkpoint properly propagate all the kwargs, e.g. debug mode. This context manager offers an alternative way to enable debug mode that bypasses the need for all layers to propagate kwargs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110728
Approved by: https://github.com/albanD
ghstack dependencies: #110673, #110674, #110675, #110676
2023-10-11 02:12:31 +00:00
Jerry Zhang
7a69e3d30b [fx][subgraph_matcher] Add a matcher that supports name to node map (#110743)
Summary:
We want the matcher to return a name -> node in target graph
so that we can refer to the node by name, this is useful for downstream applications like
quantization.

and also we can use the torch API as source of truth instead of matching aten API directly.

Test Plan:
python test/fx/test_matcher_utils.py

Reviewers:

Subscribers:

Tasks:

Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110743
Approved by: https://github.com/SherlockNoMad
2023-10-10 22:21:24 +00:00
angelayi
3704bf4ee8 [export] Update custom ops docs (#110492)
Updating the doc links in the custom ops documentation in export
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110492
Approved by: https://github.com/avikchaudhuri
2023-10-09 23:40:40 +00:00
Wanchao Liang
28d7d7fc42 device agnostic: torch.cpu.set_device (#110716)
to support device agnostic, add a dummpy placeholder in torch.cpu

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110716
Approved by: https://github.com/albanD
2023-10-09 23:00:15 +00:00
Kazuaki Ishizaki
50bd252863 Fix typo the the (#110869)
This PR fixes typo `the the` of comments and exception message in files.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110869
Approved by: https://github.com/soulitzer
2023-10-09 19:32:45 +00:00
ydwu4
d84bcb9c8c [HigherOrderOp] expose torch.cond (#110293)
This pr expose torch._higher_order_ops.cond as torch.cond.

1. Need to add #noqa: F811 to the _check calls in torch/__init__.py to address some confusing linter error "Redefinition of unused 'cond'" but only one cond is imported and for these lines that have this error, they don't define the cond but just use it as an argument.
2. Also add cond to the list that allows it to be traced through so as dynamo could trigger the CondHigherOrder logic instead of creating a TorchVariable.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110293
Approved by: https://github.com/zou3519
2023-10-07 20:39:52 +00:00
albanD
a0bbd075b2 Add the Mode section in the extending doc (#110073)
Cover the basic principles of Mode and an example on how to use them and their behavior.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110073
Approved by: https://github.com/janeyx99
2023-10-06 23:50:55 +00:00
PyTorch MergeBot
576b80d23e Revert "[HigherOrderOp] expose torch.cond (#110293)"
This reverts commit 601f872831.

Reverted https://github.com/pytorch/pytorch/pull/110293 on behalf of https://github.com/ydwu4 due to Sorry, didn't check the error carefully on the PR. A doc error is related to this pr ([comment](https://github.com/pytorch/pytorch/pull/110293#issuecomment-1751176719))
2023-10-06 17:44:17 +00:00
ydwu4
601f872831 [HigherOrderOp] expose torch.cond (#110293)
This pr expose torch._higher_order_ops.cond as torch.cond.

1. Need to add #noqa: F811 to the _check calls in torch/__init__.py to address some confusing linter error "Redefinition of unused 'cond'" but only one cond is imported and for these lines that have this error, they don't define the cond but just use it as an argument.
2. Also add cond to the list that allows it to be traced through so as dynamo could trigger the CondHigherOrder logic instead of creating a TorchVariable.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110293
Approved by: https://github.com/zou3519
2023-10-06 17:04:31 +00:00
albanD
c4db607607 Doc test non packages (#110568)
Add non-package python modules to the public API checks.
The original change is to remove the `ispkg` check in this line
https://github.com/pytorch/pytorch/blob/main/docs/source/conf.py#L518

Everything else is to add the appropriate modules to the rst files, make sure every module we provide can be imported (fixed by either making optional dependencies optional or just deleting files that have been un-importable for 3 years), make API that are both modules and functions (like torch.autograd.gradcheck) properly rendered on the docs website without confusion and add every non-documented API to the allow list (~3k of them).

Next steps will be to try and fix these missing docs
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110568
Approved by: https://github.com/zou3519
2023-10-06 14:16:01 +00:00
Banit Agrawal
64583c4d04 [CUDA Host Allocator] Add support of CudaHostRegister (#108488)
Summary: This diff adds another option to create cuda pinned memory using cudaHostRegister.

Differential Revision: D45843715

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108488
Approved by: https://github.com/zdevito
2023-10-06 04:13:02 +00:00
Zhengxu Chen
be5dc3a00d [export] Update ArgumentSpec definition. (#110612)
Summary: Changing ArgumentSpec into a true union type in Python without changing serialization format.

Test Plan: CI

Differential Revision: D49871088

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110612
Approved by: https://github.com/angelayi
2023-10-06 03:14:45 +00:00
Angela Yi
a93337ed55 [export] Add ir spec (#110394)
Summary: Copied IR spec over from Executorch

Test Plan: _docs_

Differential Revision: D49829187

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110394
Approved by: https://github.com/ydwu4, https://github.com/gmagogsfm
2023-10-05 03:06:30 +00:00
ydwu4
6db3853eeb Add doc for torch.cond (#108691)
We add a doc for torch.cond. This PR is a replacement of https://github.com/pytorch/pytorch/pull/107977.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108691
Approved by: https://github.com/zou3519
2023-10-04 21:24:14 +00:00