Commit Graph

143 Commits

Author SHA1 Message Date
Oguz Ulgen
920f0426ae Add None return type to init -- tests rest (#132376)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/132376
Approved by: https://github.com/jamesjwu
ghstack dependencies: #132335, #132351, #132352
2024-08-01 15:44:51 +00:00
Animesh Jain
bcd1d2e832 [dynamo] Introduce UnspecializedNNModule guard source (#132304)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/132304
Approved by: https://github.com/yanboliang
ghstack dependencies: #132302
2024-08-01 04:35:43 +00:00
ekamiti
9e473fd868 Make adding Buffers more like adding Parameters (#125971)
Add similar semantics for creating a buffer object similar to creating a parameter. This is done by introducing a new `Buffer` class that can be used for type disambiguation. The underlying functionality of registering a buffer remains the same as the `register_buffer` method has not been changed. The `persistent` parameter in the `Buffer` type is to indicate whether a buffer object should be persistent or not. Other non-test changes have to do with getting the new `Buffer` type recognized by inductor and dynamo. Remaining changes are test changes to make sure that the `Buffer` type can be used as a drop in replacement for `register_buffer` as it just leads to `register_buffer` being called. The addition of this new functionality still allows for normal tensors to be used as buffers so these changes are intended to be backwards compatible.
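
A minimal usage sketch, assuming a PyTorch build that includes the `torch.nn.Buffer` class introduced here (module and attribute names below are illustrative):

```python
import torch
import torch.nn as nn

class Normalizer(nn.Module):
    def __init__(self, size: int = 4):
        super().__init__()
        # Assigning a Buffer registers it, just as assigning an nn.Parameter
        # registers a parameter; register_buffer() itself is unchanged.
        self.running_mean = nn.Buffer(torch.zeros(size))
        self.scale = nn.Buffer(torch.ones(size), persistent=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return (x - self.running_mean) * self.scale

m = Normalizer()
print([name for name, _ in m.named_buffers()])  # ['running_mean', 'scale']
print(list(m.state_dict().keys()))              # only the persistent buffer: ['running_mean']
```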

Fixes #35735

Co-authored-by: Mikayla Gawarecki <mikaylagawarecki@gmail.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/125971
Approved by: https://github.com/albanD, https://github.com/anijain2305, https://github.com/mlazos
2024-07-31 10:32:40 +00:00
Edward Z. Yang
e55e9d8126 Clear speculation log when restarting due to compiler collective (#131983)
The compiler collective can trigger an input to become dynamic, which
can trigger operations to be recorded to the graph, which would change
the speculation log entries (since they only start being recorded once
we have a non-empty output graph).  Test case triggers this situation.

Production instance:
https://www.internalfb.com/mlhub/pipelines/runs/mast/f584750649-TrainingApplication?job_attempt=2&version=0&env=PRODUCTION

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/131983
Approved by: https://github.com/anijain2305, https://github.com/mlazos
2024-07-29 22:32:10 +00:00
Edward Z. Yang
0c6f1ca064 Introduce torch._dynamo.config.enable_compiler_collectives for syncing compilation across ranks (#130935)
This PR implements an opt-in configuration option for synchronizing compilation across all ranks at the end of Dynamo tracing (and potentially, other places in the future). There are two pieces to this PR:

1. Implementing infrastructure for compiler collectives (DistributedState/LocalState, the actual collective)
2. Using this infrastructure to synchronize automatic dynamic choices across all ranks

The infrastructure in part one can be used for other purposes; just add more (serializable) fields to LocalState.

Here is how automatic dynamic synchronization works:

1. Preflight in "torch/_dynamo/variables/builder.py": On the first Dynamo trace run, we trace without automatic dynamic at all; we assume all Tensor inputs that are not otherwise marked are static. This run is purely to collect all Tensor input sizes in the program.
2. torch/_dynamo/output_graph.py: At the end of the first Dynamo trace run, we perform a compiler collective to distribute all Tensor input sizes to all ranks. Then, we restart Dynamo.
3. Apply the updates in "torch/_dynamo/variables/builder.py": Now that we have the sizes from every rank, we update frame state with the observed sizes for all ranks, in rank order. Under the assumption that frame state is consistent on all ranks, this series of updates will preserve consistency.

For future work, it would be safer if we force a consistent hint on all ranks; this is more involved as we have to interpose in fakification.
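
A minimal opt-in sketch; the flag name is taken from the title, and the collective only runs when a distributed process group is initialized, so this is inert in a single process:

```python
import torch
import torch._dynamo

# Opt in to the compiler collective at the end of Dynamo tracing.
torch._dynamo.config.enable_compiler_collectives = True

@torch.compile
def step(x):
    return (x * 2).sum()

# In a multi-rank job (e.g. under DDP), the first time each rank traces `step`,
# the observed tensor input sizes are exchanged so that automatic-dynamic
# decisions match across ranks.
```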

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/130935
Approved by: https://github.com/jansel
2024-07-24 11:24:11 +00:00
Edward Z. Yang
0099e15b47 Also put unbacked symbols in symbol_to_node in split_module pass (#130535)
This is not a complete fix, but it is a simple one; the full fix is tracked
in https://github.com/pytorch/pytorch/issues/130534

Internal xref:
https://fb.workplace.com/groups/6829516587176185/posts/7510238679103969/

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/130535
Approved by: https://github.com/malfet
2024-07-15 16:56:01 +00:00
Animesh Jain
f2f4dde2d3 [dynamo] Remove ID_MATCH for FSDPModuleVariable (#129015)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/129015
Approved by: https://github.com/yf225
ghstack dependencies: #129098
2024-06-20 19:23:32 +00:00
Will Feng
ad2593cb86 [Animesh's PR #125340] [dynamo][fsdp] Track FSDPNNModuleVariable for mutations (#129045)
This is a copy of Animesh's work in https://github.com/pytorch/pytorch/pull/125340, with very small changes to the unit test. It's needed sooner for the Traceable FSDP2 work, so I copy it here and will work through landing it.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/129045
Approved by: https://github.com/anijain2305
2024-06-20 04:02:36 +00:00
Animesh Jain
c0b87afcad [RELAND2][dynamo][nn-modules] Trace through nn.Module dunder methods for UnspecializedNNModule (#126578)
Tracing through `__init__` is important because it initializes members (calls STORE_ATTR on them). By doing that, we kick in mutation tracking for these objects, so things like mutating `_modules` are tracked automatically.

Fixes https://github.com/pytorch/pytorch/issues/111837

Pull Request resolved: https://github.com/pytorch/pytorch/pull/126578
Approved by: https://github.com/jansel
2024-06-12 04:09:23 +00:00
PyTorch MergeBot
adb699189b Revert "[RELAND][dynamo][nn-modules] Trace through nn.Module dunder methods for UnspecializedNNModule (#126578)"
This reverts commit b2d602306a.

Reverted https://github.com/pytorch/pytorch/pull/126578 on behalf of https://github.com/clee2000 due to failed internal test D58394084.  Author has forward fix but includes external changes so reverting is a bit easier to coordinate ([comment](https://github.com/pytorch/pytorch/pull/126578#issuecomment-2161481839))
2024-06-11 19:41:41 +00:00
Animesh Jain
b2d602306a [RELAND][dynamo][nn-modules] Trace through nn.Module dunder methods for UnspecializedNNModule (#126578)
Tracing through `__init__` is important because it initializes members (calls STORE_ATTR on them). By doing that, we kick in mutation tracking for these objects, so things like mutating `_modules` are tracked automatically.

Fixes https://github.com/pytorch/pytorch/issues/111837

Pull Request resolved: https://github.com/pytorch/pytorch/pull/126578
Approved by: https://github.com/jansel
ghstack dependencies: #128295
2024-06-10 23:11:04 +00:00
PyTorch MergeBot
44371bd432 Revert "[dynamo][nn-modules] Trace through nn.Module dunder methods for UnspecializedNNModule (#126578)"
This reverts commit 7ede78f9f5.

Reverted https://github.com/pytorch/pytorch/pull/126578 on behalf of https://github.com/anijain2305 due to pippy tests fail ([comment](https://github.com/pytorch/pytorch/pull/126578#issuecomment-2155836555))
2024-06-08 06:35:34 +00:00
Animesh Jain
7ede78f9f5 [dynamo][nn-modules] Trace through nn.Module dunder methods for UnspecializedNNModule (#126578)
Tracing through `__init__` is important because it initializes members (calls STORE_ATTR on them). By doing that, we kick in mutation tracking for these objects, so things like mutating `_modules` are tracked automatically.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/126578
Approved by: https://github.com/jansel
ghstack dependencies: #128001
2024-06-06 23:05:49 +00:00
Xuehai Pan
26f4f10ac8 [5/N][Easy] fix typo for usort config in pyproject.toml (kown -> known): sort torch (#127126)
The `usort` config in `pyproject.toml` has no effect due to a typo. Fixing the typo makes `usort` do more and generates the changes in this PR. Except for `pyproject.toml`, all changes are generated by `lintrunner -a --take UFMT --all-files`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/127126
Approved by: https://github.com/kit1980
2024-05-27 14:49:57 +00:00
PyTorch MergeBot
55c0ab2887 Revert "[5/N][Easy] fix typo for usort config in pyproject.toml (kown -> known): sort torch (#127126)"
This reverts commit 7763c83af6.

Reverted https://github.com/pytorch/pytorch/pull/127126 on behalf of https://github.com/XuehaiPan due to Broken CI ([comment](https://github.com/pytorch/pytorch/pull/127126#issuecomment-2133044286))
2024-05-27 09:22:08 +00:00
Xuehai Pan
7763c83af6 [5/N][Easy] fix typo for usort config in pyproject.toml (kown -> known): sort torch (#127126)
The `usort` config in `pyproject.toml` has no effect due to a typo. Fixing the typo makes `usort` do more and generates the changes in this PR. Except for `pyproject.toml`, all changes are generated by `lintrunner -a --take UFMT --all-files`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/127126
Approved by: https://github.com/kit1980
ghstack dependencies: #127122, #127123, #127124, #127125
2024-05-27 04:22:18 +00:00
Animesh Jain
ae5e2ab92e [dynamo][fsdp] Use Tensor match for FSDP modules (#125827)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/125827
Approved by: https://github.com/yf225, https://github.com/jansel
ghstack dependencies: #125828, #125805
2024-05-09 21:26:15 +00:00
Animesh Jain
5ba777f46e [guards][cpp-guards] Optimize NN module getattr guards (#124522)
Improves the guard overhead of MobileBert model with nn module guards from 92000 units to 20000 units.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/124522
Approved by: https://github.com/jansel
ghstack dependencies: #125439, #125421
2024-05-04 22:08:56 +00:00
Yuanhao Ji
e3effa5855 Enable UFMT on all of test/distributed (#123539)
Partially addresses #123062

Ran lintrunner on:

- `test/distributed`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/123539
Approved by: https://github.com/ezyang
2024-04-17 06:46:02 +00:00
PyTorch MergeBot
52be63eb2c Revert "Enable UFMT on all of test/distributed (#123539)"
This reverts commit 89ac37fe91.

Reverted https://github.com/pytorch/pytorch/pull/123539 on behalf of https://github.com/DanilBaibak due to Broken trunk ([comment](https://github.com/pytorch/pytorch/pull/123539#issuecomment-2058329471))
2024-04-16 06:33:21 +00:00
Yuanhao Ji
89ac37fe91 Enable UFMT on all of test/distributed (#123539)
Partially addresses #123062

Ran lintrunner on:

- `test/distributed`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/123539
Approved by: https://github.com/ezyang
2024-04-16 03:23:56 +00:00
Boyuan Feng
0c1ac4484d Support call_method in DDPOptimizer (#121771)
This PR fixes Issue #111279.

While #111279 reported the issue with `MultiheadAttention`, a minimal reproduction would be:
```python
class ToyModel(nn.Module):
    def __init__(self,):
        super().__init__()
        self.linear = nn.Linear(128, 10)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.linear.forward(x) # Error
        # return self.linear(x) # OK
```

Dynamo treats `self.linear(x)` as `call_module` while treating `self.linear.forward(x)` as a [`get_attr` and a `call_method`](https://github.com/pytorch/pytorch/blob/main/torch/_dynamo/variables/nn_module.py#L358-L378). However, the existing DDPOptimizer assumes that, for a `get_attr` node, `getattr(gm, node.target)` gives a tensor with the `requires_grad` attribute; it also does not support `call_method` nodes.

This PR adds support for `call_method` and a check on `get_attr`. It also checks whether a module's parameters have already been added to a bucket, in order to support multiple method calls from the same module.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/121771
Approved by: https://github.com/yf225
2024-03-13 20:03:15 +00:00
Elias Ellison
d03b11ad5b Pass inductor strides forward in ddp optimizer (#120523)
Note: Returning Fake Tensors on First AOT Autograd Call

Inductor will optimize the strides of outputs when it deems it profitable, for instance converting to channels-last. When we split the graph here into multiple Inductor compilations, we need to make sure that the output strides of one compilation are appropriately passed to the subsequent compilations. However, the mapping from Inductor output to Dynamo output is non-trivial due to aot_autograd's deduping, de-aliasing, mutation, re-writing, subclass handling, etc. In order to replay all this logic, we set a flag such that the first invocation of Inductor in aot_autograd returns fake tensors with the appropriate strides. Then, all of aot_autograd's runtime logic is replayed. This gives us appropriately strided outputs here, which reflect the runtime strides.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/120523
Approved by: https://github.com/yf225, https://github.com/bdhirsh
2024-02-29 22:25:00 +00:00
Alexander Grund
b5b36cf0c4 Fix failure of test_dynamo_distributed & test_inductor_collectives (#117741)
When CUDA is not available `c10d.init_process_group("nccl"...)` will fail with
> RuntimeError: ProcessGroupNCCL is only supported with GPUs, no GPUs found!

Hence add a corresponding skip marker to the classes deriving from DynamoDistributedSingleProcTestCase next to the `requires_nccl` marker.
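
An illustrative version of the described guard, using a generic `unittest` skip rather than the exact markers in the test file:

```python
import unittest
import torch

@unittest.skipUnless(torch.cuda.is_available(), "NCCL backend requires at least one GPU")
class SingleProcCollectiveSmokeTest(unittest.TestCase):
    def test_cuda_visible(self):
        # Only runs when a GPU is present, mirroring the intent of requires_nccl.
        self.assertTrue(torch.cuda.is_available())

if __name__ == "__main__":
    unittest.main()
```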

Pull Request resolved: https://github.com/pytorch/pytorch/pull/117741
Approved by: https://github.com/ezyang, https://github.com/malfet
2024-01-25 13:25:36 +00:00
Edward Z. Yang
5c700f60a5 Properly preserve SymInt input invariant when splitting graphs (#117406)
Fixes https://github.com/pytorch/pytorch/issues/111636
Fixes https://github.com/pytorch/pytorch/issues/108877
Fixes https://github.com/pytorch/pytorch/issues/116956

Inductor has an invariant that every dynamic shape symbol s0, s1, etc. which is referenced by an input tensor must also be passed in explicitly as an argument. It has some capability of reverse engineering symbols if it's obvious how to get them (e.g., if you pass in `arg: f32[s0, 4]` it will know that it can retrieve `s0 = arg.size(0)`) but in full generality it is not always possible to derive this (e.g., if the only mention of s0 is in `arg2: f32[s0 + s1, 4]`).  However, the graph splitter used by optimize_ddp did not respect this invariant. This PR makes it respect it.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/117406
Approved by: https://github.com/wconstab
2024-01-15 15:04:57 +00:00
Will Feng
a27ed4d364 [dynamo / DDP] Add optimize_ddp_lazy_compile config to control lazy compile for DDPOptimizer (False by default) (#116292)
We want to enable `optimize_ddp_lazy_compile` by default as soon as possible, because it will fix stride mismatch errors (see motivation: https://github.com/pytorch/pytorch/pull/114154).

However, lazy compile currently causes shape mismatch in other cases (`test_graph_split_inductor_transpose`) and we need to fix them before we can enable it by default.
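
A hedged sketch of the opt-in; the flag name is taken from the title, it is off by default, and it only affects models run through DDPOptimizer (i.e. wrapped in DistributedDataParallel):

```python
import torch._dynamo

# Compile DDPOptimizer submodules lazily so the real output strides of one
# submodule are available when the next submodule is compiled.
torch._dynamo.config.optimize_ddp_lazy_compile = True
```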

Differential Revision: D52373445

Pull Request resolved: https://github.com/pytorch/pytorch/pull/116292
Approved by: https://github.com/williamwen42, https://github.com/wconstab
2023-12-21 22:34:24 +00:00
Jon Chuang
2cf0cf8137 [dynamo / DDP] - lazily compile submodules - to propagate real tensor strides to backend compiler (#114154)
Fixes https://github.com/pytorch/pytorch/issues/113812, https://github.com/pytorch/pytorch/issues/102591, Probably fixes: https://github.com/pytorch/pytorch/issues/113740, https://github.com/pytorch/pytorch/issues/113786, https://github.com/pytorch/pytorch/issues/113788

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114154
Approved by: https://github.com/wconstab, https://github.com/yf225
2023-12-06 18:50:14 +00:00
willfengg
01afa54df5 [dynamo][FSDP] unit test: FSDP should not be lifted as fx graph attrs (#115112)
This was a SEV: FSDP modules were being registered as graph attributes. This unit test prevents it from happening again.

without SEV fix: D48810186
```
python test/distributed/test_dynamo_distributed.py -k
test_fsdp_skip_register_attr_or_module

  File "/data/users/weif/pytorch/torch/_dynamo/repro/after_dynamo.py",
line 117, in debug_wrapper
    compiled_gm = compiler_fn(gm, example_inputs)
  File
"/data/users/weif/pytorch/test/distributed/test_dynamo_distributed.py", line 897, in debug_compiler
    self.assertFalse(name in node.name, f"FSDP module {name} should not
be registered as attributes")
torch._dynamo.exc.BackendCompilerFailed: backend='debug_compiler' raised:
AssertionError: True is not false : FSDP module l__self___net_0_weight should not be registered as attributes
```

with SEV fix: D48810186
```
python test/distributed/test_dynamo_distributed.py -k test_fsdp_skip_register_attr_or_module

Ran 1 test in 6.438s
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115112
Approved by: https://github.com/mlazos
2023-12-05 19:16:03 +00:00
Rohan Varma
3c78ea4c9d [DDP][Compile] Test to Ensure torch.compile works w/static_graph=True (#114621)
Resolves https://github.com/pytorch/pytorch/issues/93672. This was actually fixed by https://github.com/pytorch/pytorch/pull/103487, but I didn't realize at the time that that PR also fixed torch.compile.

Differential Revision: [D51596148](https://our.internmc.facebook.com/intern/diff/D51596148/)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114621
Approved by: https://github.com/wconstab
2023-12-01 22:18:45 +00:00
PyTorch MergeBot
e38a3a6079 Revert "[dynamo / DDP] - lazily compile submodules - to propagate real tensor strides to backend compiler (#114154)"
This reverts commit 3f574eadb4.

Reverted https://github.com/pytorch/pytorch/pull/114154 on behalf of https://github.com/clee2000 due to reverted internally, broke internal builds, not sure why bot isn't working ([comment](https://github.com/pytorch/pytorch/pull/114154#issuecomment-1832496040))
2023-11-29 18:43:17 +00:00
Jon Chuang
3f574eadb4 [dynamo / DDP] - lazily compile submodules - to propagate real tensor strides to backend compiler (#114154)
Fixes https://github.com/pytorch/pytorch/issues/113812, https://github.com/pytorch/pytorch/issues/102591, Probably fixes: https://github.com/pytorch/pytorch/issues/113740, https://github.com/pytorch/pytorch/issues/113786, https://github.com/pytorch/pytorch/issues/113788

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114154
Approved by: https://github.com/wconstab
2023-11-28 06:29:43 +00:00
PyTorch MergeBot
e239a2b2d7 Revert "[dynamo / DDP] - lazily compile submodules - to propagate real tensor strides to backend compiler (#114154)"
This reverts commit 266054c3ca.

Reverted https://github.com/pytorch/pytorch/pull/114154 on behalf of https://github.com/DanilBaibak due to The lower PR in the stack https://github.com/pytorch/pytorch/pull/113926 breaks the internal build ([comment](https://github.com/pytorch/pytorch/pull/114154#issuecomment-1822704476))
2023-11-22 12:46:15 +00:00
PyTorch MergeBot
2c4930a91d Revert "[fx/DDP] add nested ctx_manager test for DDP Dynamo (#114056)"
This reverts commit d5d62e8561.

Reverted https://github.com/pytorch/pytorch/pull/114056 on behalf of https://github.com/malfet due to Breaks inductor_distributed, see d5d62e8561 ([comment](https://github.com/pytorch/pytorch/pull/114056#issuecomment-1822006423))
2023-11-22 02:52:31 +00:00
Edward Z. Yang
6187153753 Consolidate sym/non-sym overloads for _make_wrapper_subclass (#114236)
I'm not sure why we needed two overloads previously, let's find out! Removing the int overload is load bearing because it now forces specialization on SymInt arguments instead of falling through to the SymInt overload, see new test.

I decided NOT to allow storage offset simultaneously with None strides.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114236
Approved by: https://github.com/albanD
2023-11-22 02:03:29 +00:00
Jon Chuang
d5d62e8561 [fx/DDP] add nested ctx_manager test for DDP Dynamo (#114056)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114056
Approved by: https://github.com/wconstab
2023-11-22 01:08:25 +00:00
Jon Chuang
266054c3ca [dynamo / DDP] - lazily compile submodules - to propagate real tensor strides to backend compiler (#114154)
Fixes https://github.com/pytorch/pytorch/issues/113812, https://github.com/pytorch/pytorch/issues/102591, Probably fixes: https://github.com/pytorch/pytorch/issues/113740, https://github.com/pytorch/pytorch/issues/113786, https://github.com/pytorch/pytorch/issues/113788

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114154
Approved by: https://github.com/wconstab
2023-11-21 22:40:08 +00:00
Jon Chuang
54d04553ea [fx, DDP] fx.split_module will setup/unwind autocast & grad_mode (#113374)
---

Replaces: https://github.com/pytorch/pytorch/pull/112231
Fixes: https://github.com/pytorch/pytorch/issues/111794

DDPOptimizer splits modules. We need to set up/unwind global state (autocast, grad_enabled) for each split, as this affects downstream compilation.

---

See before and after this PR for the split fx modules here (for autocast mode): https://github.com/pytorch/pytorch/pull/112231#issuecomment-1804274605

---

### Discussion
We don't actually have to do this for grad mode: https://github.com/pytorch/pytorch/pull/112231#issuecomment-1804280031. It's not wrong to do it anyway, but maybe unnecessary? But may still be better to keep this PR's changes so we're sure what the grad mode state ought to be for each subgraph.

It may come in handy in the future.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113374
Approved by: https://github.com/wconstab
2023-11-21 21:29:59 +00:00
Peter Bell
9f47580ad7 [BE] Don't mutate torch.compile global config in tests (#113882)
We should uniformly use `config.patch` so that configuration changes don't affect other tests.
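
A minimal sketch of the requested pattern; `suppress_errors` is used here only as an example config entry:

```python
import torch._dynamo.config as dynamo_config

@dynamo_config.patch(suppress_errors=True)   # scoped to this test only
def test_with_patched_config():
    assert dynamo_config.suppress_errors is True

def test_with_context_manager():
    with dynamo_config.patch(suppress_errors=True):
        assert dynamo_config.suppress_errors is True
    # the previous value is restored automatically here
```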

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113882
Approved by: https://github.com/lezcano
2023-11-17 16:49:48 +00:00
Kazuaki Ishizaki
9089242048 Fix typo under test directory (#112346)
This PR fixes typos in comments and messages under the `test` directory. It also fixes related typos in messages under the `torch` directory.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112346
Approved by: https://github.com/kit1980, https://github.com/ezyang
2023-11-03 07:53:33 +00:00
Jon Chuang
2ed3a73e40 [dynamo] treat torch.device, torch.dtype as constant literal; revise guards to have access to torch module (#112426)
Just like containers of constant literals (e.g. a list or set of them), these are themselves constant literals.

We follow up to https://github.com/pytorch/pytorch/pull/112416, enforcing that we always use `ConstantVariable` to represent these.

This replaces https://github.com/pytorch/pytorch/pull/112284 and https://github.com/pytorch/pytorch/pull/112332, which are incomplete, in case there is no movement there.

Ought to fix: https://github.com/pytorch/pytorch/issues/109910

We remove the old guard special-casing, which fell back on string equality when the `torch` module was not accessible in `eval`.
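
An illustrative consequence of treating these values as constant literals (not code from the PR): dtype arguments can be branched on directly, with Dynamo guarding on the constant value.

```python
import torch

@torch.compile
def cast_and_scale(x, dtype):
    # dtype is a ConstantVariable, so this branch is resolved at trace time.
    if dtype is torch.float16:
        return x.to(dtype) * 2
    return x.to(dtype)

print(cast_and_scale(torch.ones(2), torch.float32).dtype)  # torch.float32
print(cast_and_scale(torch.ones(2), torch.float16).dtype)  # torch.float16 (recompiles for the new constant)
```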

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112426
Approved by: https://github.com/ezyang
2023-11-01 05:28:28 +00:00
Oguz Ulgen
1df14f1bf8 Move has_triton to top level triton utils so that dynamo can also access (#109832)
it without creating cyclic dependencies

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109832
Approved by: https://github.com/zou3519
2023-09-22 19:33:41 +00:00
Animesh Jain
2b6d983b8b Reland [dynamo][activation checkpointing] Trace through ActivationWrapper (#109327)
Fixes https://github.com/pytorch/pytorch/issues/108269
Original reverted PR - https://github.com/pytorch/pytorch/pull/108599

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109327
Approved by: https://github.com/aakhundov
2023-09-15 03:43:59 +00:00
CK Luk
366baf690b Back out "[Dynamo x FSDP] Add support for params, buffers, submodules on FSDPManagedNNModuleVariable (#107923)" (#108823)
Summary:
Original commit changeset: 33650f7cb0fb

Original Phabricator Diff: D48833682

Test Plan: See T162942232 for how we figured out that this diff caused significant numeric difference.

Reviewed By: voznesenskym

Differential Revision: D49082219

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108823
Approved by: https://github.com/xw285cornell
2023-09-08 14:39:43 +00:00
PyTorch MergeBot
77691e8bc3 Revert "[dynamo][activation checkpointing] Trace through ActivationWrapper (#108599)"
This reverts commit 9efe0f7bf2.

Reverted https://github.com/pytorch/pytorch/pull/108599 on behalf of https://github.com/huydhn due to Sorry for reverting your change, but test_ddp_activation_checkpointing is failing distributed ROCm test in trunk ([comment](https://github.com/pytorch/pytorch/pull/108599#issuecomment-1710479387))
2023-09-07 16:47:40 +00:00
Animesh Jain
9efe0f7bf2 [dynamo][activation checkpointing] Trace through ActivationWrapper (#108599)
Fixes https://github.com/pytorch/pytorch/issues/108269

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108599
Approved by: https://github.com/rohan-varma
2023-09-07 00:32:18 +00:00
wz337
66af4f6ec7 [HSDP] Add device_mesh to FSDP kwarg and add dtensor state_dict support for HSDP (#107533)
This PR:
1) Add a device_mesh kwarg to FSDP. Remove init_device_mesh() from _runtime_utils.py, as device_mesh is now passed in by the user as a kwarg.
2) Change the use_dtensor flag for state_dict_config and optim_state_dict_config to be private. If device_mesh is used with a sharded model/optim state dict, the _use_dtensor flag is set to True and the model/optim state dict returns a DTensor state_dict. Otherwise, the _use_dtensor flag is set to False and the model/optim state dict returns a sharded_tensor state_dict.
3) Update _optim_utils.py, _shard_utils.py, and _state_dict_utils.py to add support for HSDP to return a 2D DTensor state_dict.
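
A hedged sketch of the new kwarg; the import path, mesh shape, and helper name here are assumptions for illustration, and the code needs an initialized process group plus 8 GPUs for the (2, 4) mesh, so treat it as launcher pseudocode:

```python
import torch
from torch.distributed.device_mesh import init_device_mesh
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP, ShardingStrategy

def wrap_hsdp(model: torch.nn.Module) -> FSDP:
    # Outer mesh dim replicates, inner mesh dim shards (HSDP).
    mesh_2d = init_device_mesh("cuda", (2, 4), mesh_dim_names=("replicate", "shard"))
    return FSDP(
        model,
        device_mesh=mesh_2d,                              # kwarg added by this PR
        sharding_strategy=ShardingStrategy.HYBRID_SHARD,
    )
```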

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107533
Approved by: https://github.com/fegin, https://github.com/awgu, https://github.com/wanchaol
2023-09-05 21:21:21 +00:00
voznesenskym
f3a8d57aea [Dynamo x FSDP] Add support for params, buffers, submodules on FSDPManagedNNModuleVariable (#107923)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107923
Approved by: https://github.com/wconstab
2023-08-29 08:54:13 +00:00
Jason Lu
bc88028e8e Back out "Reland "Make adding buffers more like adding parameters (#104069)" (#106224)" (#106743)
Summary:
Original commit changeset: 81319beb97f3

Original Phabricator Diff: D47961182

Test Plan: revert to maintain backward compat with legacy ads_dper3 production package. Read details in: S357822

Reviewed By: atuljangra

Differential Revision: D48131623

@diff-train-skip-merge
(D48131623 landed internally)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/106743
Approved by: https://github.com/malfet
2023-08-08 15:27:34 +00:00
Mikayla Gawarecki
d8e5f2aa6d Reland "Make adding buffers more like adding parameters (#104069)" (#106224)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/106224
Approved by: https://github.com/atalman, https://github.com/albanD
2023-07-31 17:18:56 +00:00
Michael Voznesensky
8549abc347 Grab bag of DTensor enablement stuff (Enable whole graph capture for DTensor) (#105787)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105787
Approved by: https://github.com/ezyang
2023-07-30 00:17:45 +00:00
Edward Z. Yang
edebdaf182 Change _dynamo.explain to be explain(f)(*args, **kwargs) (#106066)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
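
A small sketch of the new calling convention from the title:

```python
import torch
import torch._dynamo as dynamo

def fn(x):
    return x + 1

# Old: dynamo.explain(fn, torch.randn(3)); new: explain(fn) returns a callable.
explanation = dynamo.explain(fn)(torch.randn(3))
print(explanation)
```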

Pull Request resolved: https://github.com/pytorch/pytorch/pull/106066
Approved by: https://github.com/wanchaol, https://github.com/voznesenskym
2023-07-27 03:21:52 +00:00
Andrey Talman
c6653b65d8 Back out "Make adding buffers more like adding parameters (#104069)" (#105581)
Summary:
D47537831 is breaking pyper tests: https://fb.workplace.com/groups/802176577445480/posts/1018902842439518/

with `TypeError: register_buffer() takes 3 positional arguments but 4 were given`

Original commit changeset: d4b4069fbd38

Original Phabricator Diff: D47537831

Test Plan:
```
buck2 run //caffe2/torch/fb/training_toolkit/integration_tests/training_lifecycle/cogwheel_tests/pyper_release_v2:cogwheel_smallworld_inline_cvr_infer_pyper_pyper__canary_offline_training-launcher -- --run-harness-in-tupperware --build-fbpkg ads_dper3 --build-fbpkg training_platform
```

Reviewed By: atalman

Differential Revision: D47600140

Pull Request resolved: https://github.com/pytorch/pytorch/pull/105581
Approved by: https://github.com/mikaylagawarecki
2023-07-20 03:39:53 +00:00
ekamiti
32d422f335 Make adding buffers more like adding parameters (#104069)
Add similar semantics for creating a buffer object similar to creating a parameter. This is done by introducing a new `Buffer` class that can be used for type disambiguation. The underlying functionality of registering a buffer remains the same as the `register_buffer` method has not been changed. The `persistent` parameter in the `Buffer` type is to indicate whether a buffer object should be persistent or not. Other non-test changes have to do with getting the new `Buffer` type recognized by inductor and dynamo. Remaining changes are test changes to make sure that the `Buffer` type can be used as a drop in replacement for `register_buffer` as it just leads to `register_buffer` being called. The addition of this new functionality still allows for normal tensors to be used as buffers so these changes are intended to be backwards compatible.

Fixes #35735

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104069
Approved by: https://github.com/mikaylagawarecki
2023-07-17 17:59:05 +00:00
Jack Taylor
c9a806be28 [ROCm] enable additional inductor/dynamo UTs (#104624)
Enables additional inductor UTs on ROCm and removes outdated skips.

I have also removed a group of entries from the failure list in `test_torchinductor_opinfo` for tests that now pass on both CUDA and ROCm:

```
-    # The following 3 tests fail on CUDA with AssertionError: expected size 5==5, stride 5==1 at dim=0
-    # linalg._svd's return value has different strides on CUDA vs CPU which causes this
-    # In test_meta.py there is a mechanism to skipping strides checks for some ops
-    # (including _linalg_svd), possibly we should have something similar here
-    "linalg.cond": {f32, f64},
-    "linalg.svdvals": {f32, f64},
-    "linalg.matrix_rank": {f32, f64},
-    "linalg.svd": {f32, f64},
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104624
Approved by: https://github.com/malfet
2023-07-11 20:44:02 +00:00
Animesh Jain
d0e5c681f5 [dynamo][ddp][ac] Fallback to single bucket when higher order op (#104639)
This helps unblock an internal model. The real fix requires a lot of work, which raises the question of whether we should take the alternate approach of partitioning AOT graphs instead of Dynamo graphs.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104639
Approved by: https://github.com/wconstab
2023-07-06 02:20:15 +00:00
Animesh Jain
75dab587ef [dynamo] FSDP + AC + torch.compile (#103953)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/103953
Approved by: https://github.com/wanchaol
2023-06-24 01:40:56 +00:00
Jack Taylor
ede1965f5d Enable additional inductor test suites on ROCm (#102270)
Enables additional inductor UTs on ROCm, following from https://github.com/pytorch/pytorch/pull/100981

Pull Request resolved: https://github.com/pytorch/pytorch/pull/102270
Approved by: https://github.com/malfet
2023-06-22 00:36:35 +00:00
Mark Saroufim
95fced4483 Pretty dataclass dynamo explain (#102869)
Also, thinking out loud: maybe we should only print graph break reasons, and have a verbose print that prints everything else?

TODO: some tests are failing based on what they expect a guard string to look like; easy to fix, I'll do it early next week.

## After

```
(sourcetorch) ubuntu@ip-172-31-1-136:~/test$ python pretty.py
BREAK
Graph Count: 2
Graph Break Count: 1
Op Count: 2
Break Reasons:
  Break Reason 1:
    Reason: call_function BuiltinVariable(print) [ConstantVariable(str)] {}
    User Stack:
      <FrameSummary file /home/ubuntu/test/pretty.py, line 6 in fn>
Ops per Graph:
  Ops 1:
    <built-in function add>
  Ops 2:
    <built-in function add>
Out Guards:
  Guard 1:
    Name: ''
    Source: global
    Create Function: GRAD_MODE
    Guard Types: ['GRAD_MODE']
    Code List: ['___is_grad_enabled()']
    Object Weakref: None
    Guarded Class Weakref: None
  Guard 2:
    Name: ''
    Source: global
    Create Function: DEFAULT_DEVICE
    Guard Types: ['DEFAULT_DEVICE']
    Code List: ['utils_device.CURRENT_DEVICE == None']
    Object Weakref: None
    Guarded Class Weakref: None
  Guard 3:
    Name: "G['print']"
    Source: global
    Create Function: BUILTIN_MATCH
    Guard Types: None
    Code List: None
    Object Weakref: None
    Guarded Class Weakref: None
  Guard 4:
    Name: ''
    Source: global
    Create Function: DETERMINISTIC_ALGORITHMS
    Guard Types: ['DETERMINISTIC_ALGORITHMS']
    Code List: ['not ___are_deterministic_algorithms_enabled()']
    Object Weakref: None
    Guarded Class Weakref: None
  Guard 5:
    Name: "L['x']"
    Source: local
    Create Function: TENSOR_MATCH
    Guard Types: None
    Code List: None
    Object Weakref: None
    Guarded Class Weakref: None
  Guard 6:
    Name: ''
    Source: global
    Create Function: GRAD_MODE
    Guard Types: ['GRAD_MODE']
    Code List: ['___is_grad_enabled()']
    Object Weakref: None
    Guarded Class Weakref: None
  Guard 7:
    Name: ''
    Source: global
    Create Function: DEFAULT_DEVICE
    Guard Types: ['DEFAULT_DEVICE']
    Code List: ['utils_device.CURRENT_DEVICE == None']
    Object Weakref: None
    Guarded Class Weakref: None
  Guard 8:
    Name: ''
    Source: global
    Create Function: DETERMINISTIC_ALGORITHMS
    Guard Types: ['DETERMINISTIC_ALGORITHMS']
    Code List: ['not ___are_deterministic_algorithms_enabled()']
    Object Weakref: None
    Guarded Class Weakref: None
  Guard 9:
    Name: "L['x']"
    Source: local
    Create Function: TENSOR_MATCH
    Guard Types: None
    Code List: None
    Object Weakref: None
    Guarded Class Weakref: None
Compile Times: TorchDynamo compilation metrics:
Function                        Runtimes (s)
------------------------------  --------------
_compile                        0.0164, 0.0035
OutputGraph.call_user_compiler  0.0000, 0.0000
```

## Before

```
('Dynamo produced 2 graphs with 1 graph break and 2 ops', [{Guard(name='print', source=<GuardSource.GLOBAL: 1>, create_fn=<function GuardBuilder.BUILTIN_MATCH at 0x7f92ea5009d0>, is_volatile=False, guard_types=None, code_list=None, obj_weakref=None, guarded_class_weakref=None), Guard(name='x', source=<GuardSource.LOCAL: 0>, create_fn=<function GuardBuilder.TENSOR_MATCH at 0x7f92ea501000>, is_volatile=False, guard_types=['TENSOR_MATCH'], code_list=None, obj_weakref=<weakref at 0x7f9224d28f40; dead>, guarded_class_weakref=<weakref at 0x7f92d81734c0; to 'torch._C._TensorMeta' at 0x540b610 (Tensor)>)}, {Guard(name='x', source=<GuardSource.LOCAL: 0>, create_fn=<function GuardBuilder.TENSOR_MATCH at 0x7f92ea501000>, is_volatile=False, guard_types=['TENSOR_MATCH'], code_list=None, obj_weakref=<weakref at 0x7f9224d5e700; dead>, guarded_class_weakref=<weakref at 0x7f92d81734c0; to 'torch._C._TensorMeta' at 0x540b610 (Tensor)>)}], [GraphModule(), GraphModule()], [[<built-in function add>], [<built-in function add>]], [GraphCompileReason(reason='call_function BuiltinVariable(print) [ConstantVariable(str)] {}', user_stack=[<FrameSummary file <ipython-input-1-9e2ddb639697>, line 6 in fn>]), GraphCompileReason(reason='return_value', user_stack=[<FrameSummary file <ipython-input-1-9e2ddb639697>, line 8 in <graph break in fn>>])], 'Dynamo produced 2 graphs with 1 graph break and 2 ops\n Break reasons: \n\n1. call_function BuiltinVariable(print) [ConstantVariable(str)] {}\n  File "<ipython-input-1-9e2ddb639697>", line 6, in fn\n    print("BREAK")\n \n2. return_value\n  File "<ipython-input-1-9e2ddb639697>", line 8, in <graph break in fn>\n    return x\n \nTorchDynamo compilation metrics:\nFunction                        Runtimes (s)\n------------------------------  --------------\n_compile                        0.0418, 0.0084\nOutputGraph.call_user_compiler  0.0001, 0.0001')

```

## Program

```python
import torch
import torch._dynamo

def fn(x):
    x = x + 1
    print("BREAK")
    x = x + 1
    return x

out = torch._dynamo.explain(fn, torch.randn(10))
print(out)

```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/102869
Approved by: https://github.com/voznesenskym
2023-06-07 22:38:57 +00:00
Aaron Gokaslan
3e2ea32dab [BE]: Enable ruff rule TRY302 and apply fixes (#101874)
Removes useless try statements and unreachable code.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/101874
Approved by: https://github.com/malfet
2023-05-19 17:30:52 +00:00
Jack Taylor
187eb7ca88 Enable default workflow PyT 2.0 UTs on ROCm stack (#100981)
PR to enable default workflow PyTorch 2.0 unit tests for the ROCm stack.

- Enables all the dynamo unit test suites
- Enables some of the inductor unit test suites
       - `test_config`
       - `test_cpp_wrapper` (cpu only)
       - `test_minifier`
       - `test_standalone_compile`
       - `test_torchinductor_dynamic_shapes`
       - `test_torchinductor_opinfo`
       - `test_torchinductor`
       - `test_triton_wrapper`
- Introduces TEST_WITH_ROCM conditions for unit test skip/fail dictionaries in test_torchinductor_dynamic_shapes.py and test_torchinductor_opinfo.py

Note this PR follows on from the discussions for the previous UT enablement PR https://github.com/pytorch/pytorch/pull/97988, we have opted to only enable a few inductor suites at the moment to ease the upstreaming effort as these files are changing very quickly.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100981
Approved by: https://github.com/jithunnair-amd, https://github.com/malfet
2023-05-15 23:45:04 +00:00
Andrew Gu
23de2e0620 [Dynamo] Fix staticmethods for FSDP (#100117)
This PR fixes capturing static methods for FSDP-managed modules. Previously, if a static method was invoked using `self.<staticmethod>`, then Dynamo would pass `self` twice to the method, causing a graph break due to the method being "unsupported". This PR fixes that by checking for `staticmethod` and using `UserFunctionVariable` instead of `UserMethodVariable`, which handles the correct calling convention.
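
A toy illustration (not the HuggingFace repro) of the pattern the fix targets, i.e. a static method invoked through `self.` inside a compiled module:

```python
import torch
import torch.nn as nn

class Toy(nn.Module):
    @staticmethod
    def _bucket(x: torch.Tensor) -> torch.Tensor:
        return torch.clamp(x, min=0)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Previously Dynamo passed `self` twice here, causing a graph break.
        return self._bucket(x) + 1

compiled = torch.compile(Toy())
print(compiled(torch.randn(4)))
```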

This fixes FSDP + PT2 on HuggingFace's `T5ForConditionalGeneration`, which otherwise reports an error like the following based on the most recent trunk:
```
Output 0 of AsStridedBackward0 is a view of a view which was created in no_grad mode and is being modified inplace with grad mode enabled.
```
This is in reference to the `scores` tensor in `scores += position_bias_masked` ([code](a0ae2310ec/src/transformers/models/t5/modeling_t5.py (L559))).

I am not clear whether this PR's fix is actually masking a different problem, though. I wonder if there are edge cases with respect to Dynamo resuming execution and input mutations. Possibly, this PR only sidesteps the problem because there is no more recompilation at the static method `_relative_position_bucket()` ([code](a0ae2310ec/src/transformers/models/t5/modeling_t5.py (L443))).

In `UserDefinedObjectVariable.var_getattr()`, there is an existing branch:
e5291e633f/torch/_dynamo/variables/user_defined.py (L395-L398)
I am not clear on when this branch can be triggered since if `subobj` is a static method, it still takes the `FunctionTypes` branch:
e5291e633f/torch/_dynamo/variables/user_defined.py (L403-L404)
To preserve backward compatibility, the current version of this PR only modifies this `FunctionTypes` branch to differentiate between `staticmethod` and not `staticmethod`.

The PR that added this `FunctionTypes` branch is https://github.com/pytorch/pytorch/pull/92050/, and I checked that the added test `test_torch_distributions_functions()` only exercises the non-`staticmethod` case (since `Independent.log_prob` is not a `staticmethod`).

The last commit in `pytorch` that touched the `staticmethod` branch before https://github.com/pytorch/pytorch/pull/92050/ was the move from the `torchdynamo` repo into `pytorch`, so I cannot easily tell which test cases it corresponds to.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100117
Approved by: https://github.com/anijain2305
2023-04-28 14:31:20 +00:00
Aaron Gokaslan
e2a3817dfd [BE] Enable C419 rule for any all shortcircuiting (#99890)
Apparently https://github.com/pytorch/pytorch/pull/78142 made TorchScript accept simple generator expressions, which allows us to enable rules that replace unnecessary list comprehensions with generators in any/all. This was originally part of #99280 but I split it off into this PR so that it can be easily reverted should anything break.
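
An illustrative before/after of what the rule rewrites:

```python
nums = [1, 2, 3]
assert any([n > 2 for n in nums])  # before: materializes a temporary list
assert any(n > 2 for n in nums)    # after: generator expression, short-circuits
```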

Pull Request resolved: https://github.com/pytorch/pytorch/pull/99890
Approved by: https://github.com/justinchuby, https://github.com/kit1980, https://github.com/malfet
2023-04-25 15:02:13 +00:00
Andrew Gu
3c5a825f3c [AOTAutograd] Fix is-duplicate check in de-dup guard logic (#98932)
**Context**
The existing check to see if an arg is duped is `if dupe_arg_pos != kept_pos:`. However, this incorrectly considers every arg after a true duped arg to also be a duped arg.

Consider `flat_args = [a, b, b, c]`, where indices `1` and `2` are duped.
- `add_dupe_map = {0: 0, 1: 1, 2: 1, 3: 2}`
- For `dupe_arg_pos=2, kept_pos=1`, `2 != 1`, so the check correctly identifies the second `b` to be a duped arg.
- For `dupe_arg_pos=3, kept_pos=2`, `3 != 2`, so the check incorrectly identifies the `c` to be a duped arg.

Indeed, if there were more args like `[a, b, b, c, d, e, ...]`, every arg after the second `b` will be considered a duped arg since its `kept_pos` will always be 1 lower than its `dupe_arg_pos`.

**Overview**
This PR changes `add_dupe_map` to be implemented as a `List[int]`, where the list index implicitly represents the `dupe_arg_pos` and the list element represents the `kept_pos`. We use a list to have stable in-order iteration and because we know the keys to be in `{0, 1, ..., len(flat_args) - 1}`.

With `add_dupe_map` as a list, the `is_dupe_arg` condition is whether the entry in `add_dupe_map` shows a new not-yet-seen index in the iteration. One way to do this is to count the number of unique args so far and compare against that.
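
A toy walkthrough of the example above, in plain Python for illustration (not the actual AOTAutograd code):

```python
flat_args = ["a", "b", "b", "c"]
add_dupe_map = {0: 0, 1: 1, 2: 1, 3: 2}  # dupe_arg_pos -> kept_pos

# Old check: flags everything after the first real dupe.
old_dupes = [pos for pos, kept in add_dupe_map.items() if pos != kept]
print(old_dupes)  # [2, 3] -- index 3 (`c`) is wrongly treated as a dupe

# Fixed idea: an arg is a dupe iff its kept_pos is not a new, unseen index.
unique_seen = 0
new_dupes = []
for pos in range(len(flat_args)):
    if add_dupe_map[pos] == unique_seen:
        unique_seen += 1  # first occurrence of a kept arg
    else:
        new_dupes.append(pos)
print(new_dupes)  # [2] -- only the second `b` is a dupe
```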

This closes https://github.com/pytorch/pytorch/issues/98883, where now the guards change from
```
GUARDS ___guarded_code.valid
and ___check_type_id(L['self'], 93996836333040)
and ___check_obj_id(L['self'], 140119034997536)
and not ___are_deterministic_algorithms_enabled()
and ___check_tensors(L['x'])
and L['self']._buf is L['self']._buf_module._buf
and L['self']._buf_module._buf is L['self']._param
```
to without the final incorrect `L['self']._buf_module._buf is L['self']._param` guard.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/98932
Approved by: https://github.com/ezyang
2023-04-12 22:22:50 +00:00
Andrew Gu
c9adc4c376 [Dynamo] De-dup graph inputs (#98775)
###  Overview
This PR de-duplicates graph inputs in TorchDynamo, using the `Source` as the unique identifier for each input. This closes https://github.com/pytorch/pytorch/issues/98743 and https://github.com/pytorch/pytorch/issues/98625.

### Details
`VariableBuilder.wrap_tensor()` should return a `VariableTracker` for the passed-in `value: Tensor`. If `value` is duplicated, we should avoid calling `OutputGraph.create_graph_input()` and `OutputGraph.add_grapharg()`.
- Note that `create_graph_input()` and `add_grapharg()` are not 1:1. For a constant source and either `wrap_sym()` or `wrap_unspecialized_primitive()`, TorchDynamo still calls `create_graph_input()` but not `add_grapharg()`.
- Note that `create_graph_input()` should be called before constructing the corresponding `VariableTracker`. TorchDynamo needs the `fx.Proxy` object to pass to `wrap_fx_proxy()`.

In this PR, the `OutputGraph` saves an additional mapping `input_source_to_var` from each graph input's `Source` to its `VariableTracker`, which works because `Source` is now hashable. This mapping should be updated each time `create_graph_input()` is called. However, since we must construct the `VariableTracker` after `create_graph_input()` returns, we must have a separate call to the `OutputGraph` to update the mapping.

If anyone has any suggestion on how to coalesce this logic and avoid having to remember to update `input_source_to_var` for each `create_graph_input()`, I would love to hear it.

<details>
<summary> Alternate Approach</summary>

Initially, I tried having TorchDynamo construct a new but equivalent `VariableTracker` for the duplicated tensor. However, I abandoned this approach after hitting an assertion in `def wrap_fx_proxy_cls()` due to `"example_value"` already being in the proxy node's metadata because we were reusing the primary tensor's `Proxy` object. Reusing the exact `VariableTracker` also seems less error-prone instead of requiring constructing a new but identical `VariableTracker`.
</details>

### Testing
#### Global Variable Test
```
import torch
@torch.compile()
def f():
    return x + x
x = torch.randn(3)
f()
```

Before:
```
====== Forward graph 0 ======
 <eval_with_key>.6 class <lambda>(torch.nn.Module):
    def forward(self, arg0_1: f32[3], arg1_1: f32[3]):
        # File: /data/users/ezyang/b/pytorch/ff.py:5, code: return x + x
        add: f32[3] = torch.ops.aten.add.Tensor(arg0_1, arg1_1);  arg0_1 = arg1_1 = None
        return (add,)
```

After (only `arg0_1` and no more `arg1_1`):
```
 ====== Forward graph 0 ======
 <eval_with_key>.4 class <lambda>(torch.nn.Module):
    def forward(self, arg0_1: f32[3]):
        # File: dynamo/test_dup_global.py:8, code: return x + x
        add: f32[3] = torch.ops.aten.add.Tensor(arg0_1, arg0_1);  arg0_1 = None
        return (add,)
```

#### FSDP Test
Before we error on
```
File "/.../pytorch/torch/_guards.py", line 244, in __post_init__
    assert self.input_source_a != self.input_source_b
```
and now there is no error.

---
The rename from `name_to_input` to `input_name_to_proxy` is not part of the core logic change and is a remnant from initial attempts. I can undo it later if desired, but I also feel that the new name is more informative. It also fixes the type annotation.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/98775
Approved by: https://github.com/ezyang, https://github.com/voznesenskym
2023-04-11 18:07:20 +00:00
Michael Voznesensky
b1e60bfb6a Pass f_locals as a dict rather than kwargs (#98107)
Fixes https://github.com/pytorch/pytorch/issues/97688

One big problem is that instead of printing x < y we now print
`E["x"] < E["y"]` and now all of the tests wobbled and I'm mad.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/98107
Approved by: https://github.com/ezyang
2023-04-04 00:30:08 +00:00
Will Constable
c1a6dde79e Make dynamo-FSDP skip guards (#97463)
Create a new GuardSource for FSDP modules, and use it
to opt out of guard installation.

Based on @awgu's work in https://github.com/pytorch/pytorch/pull/97091

Pull Request resolved: https://github.com/pytorch/pytorch/pull/97463
Approved by: https://github.com/voznesenskym, https://github.com/jansel, https://github.com/awgu
2023-03-28 04:04:34 +00:00
Will Constable
9fb9219478 Make DDPOptimizer work with torch._dynamo.explain() (#94749)
GraphModules that were created during DDPOptimizer graph breaking
lacked `compile_subgraph_reason`, which caused an exception when
running .explain().

Now the reason is provided and users can use .explain() to find out
that DDPOptimizer is causing graph breaks.

Fixes #94579

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94749
Approved by: https://github.com/voznesenskym
2023-02-14 01:33:47 +00:00
Xuehai Pan
046e88a291 [BE] [3/3] Rewrite super() calls in test (#94592)
Rewrite Python built-in class `super()` calls. Only non-semantic changes should be applied.

- #94587
- #94588
- #94592

Also, methods with only a `super()` call are removed:

```diff
class MyModule(nn.Module):
-   def __init__(self):
-       super().__init__()
-
    def forward(self, ...):
        ...
```

Some cases that change the semantics should be kept unchanged. E.g.:

f152a79be9/caffe2/python/net_printer.py (L184-L190)

f152a79be9/test/test_jit_fuser_te.py (L2628-L2635)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94592
Approved by: https://github.com/ezyang, https://github.com/seemethere
2023-02-12 22:20:53 +00:00
Jason Ansel
2b0d7e63f0 Move dynamo.optimizations.distributed to backends (#93408)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/93408
Approved by: https://github.com/wconstab
2023-02-02 20:42:17 +00:00
Will Constable
ac791bddce Refactor dynamo distributed test helpers to be reusable (#93187)
The point is to let test helpers previously defined and used in `test_dynamo_distributed.py` be reused from a new file, `test_traceable_collectives.py`, later in this stack.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/93187
Approved by: https://github.com/kumpera
2023-02-01 06:09:42 +00:00
Will Constable
648202ceb9 Improve DDPOptimizer by avoiding small preamble graph (#93162)
This optimizes an edge case where some compute-only ops (e.g. add)
could end up in an orphan graph at the input side due to the bucket
for the next graph being full already.  The fix is to fuse this
graph (which is "empty" in parameter count) together with the adjoining
"full" bucket.

Note: I encountered this when trying to repro some suspected duplicate
argument errors, but this is unrelated and I have not yet repro'd
a duplicate arg issue.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/93162
Approved by: https://github.com/davidberard98
2023-01-28 15:33:53 +00:00
Will Constable
5441f2c067 Fix DDPOptimizer fake_mode execution (#92986)
Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at bottom):

* __->__ #92986

When running compiled submods for the purpose of producing outputs to pass
to the compilation step for the next submod, we use fake parameters and
assume fake inputs, but we forgot to activate our fake_mode during execution.

This caused certain edge cases where tensors other than activations or parameters
got created during execution, such as scalar->tensor expansion in the case
of executing torch.where(tensor, scalar, scalar).

Also add a test and clarify behavior of DDPOptimizer via comments.

Fixes #92941

Pull Request resolved: https://github.com/pytorch/pytorch/pull/92986
Approved by: https://github.com/bdhirsh
2023-01-26 00:37:54 +00:00
Edward Z. Yang
6dcc214ac2 Fix AssertionError fake_mode is not None in distributed (#90392)
Fixes https://github.com/pytorch/pytorch/issues/90375

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90392
Approved by: https://github.com/voznesenskym
2022-12-07 20:12:39 +00:00
Ram Rachum
351d73b97f Fix exception causes all over the codebase (#90271)
This is the continuation of #90134 and hopefully the final PR in this series.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90271
Approved by: https://github.com/kit1980
2022-12-07 04:29:00 +00:00
Will Constable
705ad36cc5 Dynamo asserts FSDP wrapped modules use_orig_param (#89523)
- This is a strict requirement given the way dynamo+FSDP is implemented,
  but isn't convenient to assert.
- By plumbing use_orig_param field on all wrapped modules, we can
  do this assertion inside dynamo

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89523
Approved by: https://github.com/awgu
2022-11-29 05:27:23 +00:00
Edward Z. Yang
95563b3eda Reland "Add single process version of dynamo distributed hf_Bert tests (#89721)" (#89756)
This reverts commit 0d9a615af4.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89756
Approved by: https://github.com/anjali411, https://github.com/malfet
2022-11-28 19:15:03 +00:00
PyTorch MergeBot
0d9a615af4 Revert "Add single process version of dynamo distributed hf_Bert tests (#89721)"
This reverts commit 1a2dd6b15e.

Reverted https://github.com/pytorch/pytorch/pull/89721 on behalf of https://github.com/ezyang due to this broke inductor_distributed job
2022-11-28 14:56:54 +00:00
Edward Z. Yang
49eb43fc45 Don't modify log level in dynamo distributed test (#89655)
Let the developer decide!

Taken from voz's https://github.com/pytorch/pytorch/pull/89392

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89655
Approved by: https://github.com/albanD
2022-11-28 14:47:52 +00:00
Edward Z. Yang
1a2dd6b15e Add single process version of dynamo distributed hf_Bert tests (#89721)
It's a lot easier to debug problems in the Dynamo optimization pass if
you aren't actually triggering a multiprocessing run.  Keep these tests
around.

I think the other tests can probably get this treatment too, leaving
this to future work.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89721
Approved by: https://github.com/voznesenskym
2022-11-28 03:16:47 +00:00
David Berard
304b5de1b0 Re-enable test_hf_bert_fsdp (#89223)
It looks like this failure was actually caused by https://github.com/pytorch/pytorch/pull/88629; see the revert message on that PR. It probably just looked like a flaky test on CI because of how quickly the PR was reverted.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89223
Approved by: https://github.com/voznesenskym
2022-11-18 21:40:27 +00:00
Michael Voznesensky
06ce1338bc [dynamo] Port all pytorch/dynamo and test/dynamo pieces over from symbolic-shapes branch (#88768)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88768
Approved by: https://github.com/jansel, https://github.com/ezyang
2022-11-13 04:50:21 +00:00
Will Constable
a3f3ec8fac [FSDP+dynamo]: forward treats parameter-views as params (#88781)
Dynamo+AotAutograd needs a way to wrap all tensors (whether
inputs or params/buffers) in FakeTensor wrappers, and
FSDP's mangling of parameters hides them from this wrapping.

This PR unblocks running hf_bert and hf_T5 with FSDP under dynamo, whether using recursive wrapping around transformer layers or only applying FSDP around the whole model.  Perf/memory validation and possibly optimization is the next step.
`python benchmarks/dynamo/distributed.py --torchbench_model hf_Bert --fsdp --dynamo aot_eager`
`python benchmarks/dynamo/distributed.py --torchbench_model hf_Bert --fsdp --dynamo aot_eager --fsdp_wrap`
`python benchmarks/dynamo/distributed.py --torchbench_model hf_T5 --fsdp --dynamo aot_eager`
`python benchmarks/dynamo/distributed.py --torchbench_model hf_T5 --fsdp --dynamo aot_eager --fsdp_wrap`

The problem:
Dynamo (actually aot_autograd) trips up with FSDP because it must
wrap all input tensors in FakeTensor wrappers, and it only knows
to wrap graph inputs or named_(parameters, buffers).  FSDP's
pre_forward hook sets views (which are not nn.param) into the flatparam
as attrs on the module with the same name as the original param, but
they will not show up in named_parameters.

- in use_orig_params mode, FSDP still de-registers
  params during pre-forward hook, then re-registers them
  post-forward
- during forward (between the hooks), the params are setattr'd
  on the module as regular view tensors, not nn.Parameters
- note: use_orig_params is the recommended way to use FSDP,
  and use_orig_params=False is being deprecated. So I only consider
  use_orig_params=True for this enablement

The solution:
- adding them to named_buffers is not possible because it interferes
  with how FSDP's `_apply` works
- since they are not actual nn.parameters, register_parameter will
  complain about registering them
- simply setting `module._parameters[name] = view` seems to be a viable
  workaround, despite being hacky, and FSDP code does modify _parameters
  directly already.
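
A minimal sketch of that workaround, using a toy `nn.Linear` in place of an FSDP-flattened module (illustrative only, not FSDP's actual code):

```python
import torch
import torch.nn as nn

# Toy stand-in for an FSDP-managed submodule.
module = nn.Linear(4, 4, bias=False)

# Pretend this is FSDP's flat parameter; the slice below is a plain view
# tensor, not an nn.Parameter.
flat_param = torch.zeros(16)
weight_view = flat_param.view(4, 4)

# register_parameter() would reject weight_view because it is not an
# nn.Parameter, so write into _parameters directly (hacky, but FSDP
# already mutates _parameters itself).
module._parameters["weight"] = weight_view

# The view now shows up where dynamo/aot_autograd looks for tensors to fake.
assert dict(module.named_parameters())["weight"] is weight_view
```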

Note: Manual checkpointing still isn't working with FSDP+dynamo,
so that will have to be addressed in a follow up.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/88781
Approved by: https://github.com/ezyang, https://github.com/awgu
2022-11-12 01:17:23 +00:00
Will Constable
3fd0729bb6 DDPOptimizer replace debug=True/False with using torchdynamo logger (#88480)
Example output:

```
[2022-11-04 05:09:29,525] torch._dynamo.optimizations.distributed: [INFO]
DDPOptimizer bucket assignments
┌─────────┬────────────┬───────────────────┐
│   Index │   Size (b) │ Param Names       │
├─────────┼────────────┼───────────────────┤
│       0 │  100120020 │ self_net_6_weight │
├─────────┼────────────┼───────────────────┤
│         │            │ self_net_6_bias   │
├─────────┼────────────┼───────────────────┤
│         │            │ self_net_4_weight │
├─────────┼────────────┼───────────────────┤
│         │            │ self_net_4_bias   │
├─────────┼────────────┼───────────────────┤
│       1 │  100020000 │ self_net_2_weight │
├─────────┼────────────┼───────────────────┤
│         │            │ self_net_2_bias   │
├─────────┼────────────┼───────────────────┤
│       2 │     220000 │ self_net_0_weight │
├─────────┼────────────┼───────────────────┤
│         │            │ self_net_0_bias   │
└─────────┴────────────┴───────────────────┘
[2022-11-04 05:09:29,527] torch._dynamo.optimizations.distributed: [DEBUG]
---orig graph---
graph():
    %inputs : torch.Tensor [#users=1] = placeholder[target=inputs]
    %self_net_0 : [#users=1] = call_module[target=self_net_0](args = (%inputs,), kwargs = {})
    %self_net_1 : [#users=1] = call_module[target=self_net_1](args = (%self_net_0,), kwargs = {})
    %self_net_2 : [#users=1] = call_module[target=self_net_2](args = (%self_net_1,), kwargs = {})
    %self_net_3 : [#users=1] = call_module[target=self_net_3](args = (%self_net_2,), kwargs = {})
    %self_net_4 : [#users=1] = call_module[target=self_net_4](args = (%self_net_3,), kwargs = {})
    %self_net_5 : [#users=1] = call_module[target=self_net_5](args = (%self_net_4,), kwargs = {})
    %self_net_6 : [#users=1] = call_module[target=self_net_6](args = (%self_net_5,), kwargs = {})
    %self_net_7 : [#users=1] = call_module[target=self_net_7](args = (%self_net_6,), kwargs = {})
    return (self_net_7,)

---split graph---
graph():
    %inputs : torch.Tensor [#users=1] = placeholder[target=inputs]
    %submod_0 : [#users=1] = call_module[target=submod_0](args = (%inputs,), kwargs = {})
    %submod_1 : [#users=1] = call_module[target=submod_1](args = (%submod_0,), kwargs = {})
    %submod_2 : [#users=1] = call_module[target=submod_2](args = (%submod_1,), kwargs = {})
    return (submod_2,)

---submod_0 graph---
graph():
    %inputs : [#users=1] = placeholder[target=inputs]
    %self_net_0 : [#users=1] = call_module[target=self_net_0](args = (%inputs,), kwargs = {})
    %self_net_1 : [#users=1] = call_module[target=self_net_1](args = (%self_net_0,), kwargs = {})
    return self_net_1

---submod_1 graph---
graph():
    %self_net_1 : [#users=1] = placeholder[target=self_net_1]
    %self_net_2 : [#users=1] = call_module[target=self_net_2](args = (%self_net_1,), kwargs = {})
    %self_net_3 : [#users=1] = call_module[target=self_net_3](args = (%self_net_2,), kwargs = {})
    return self_net_3

---submod_2 graph---
graph():
    %self_net_3 : [#users=1] = placeholder[target=self_net_3]
    %self_net_4 : [#users=1] = call_module[target=self_net_4](args = (%self_net_3,), kwargs = {})
    %self_net_5 : [#users=1] = call_module[target=self_net_5](args = (%self_net_4,), kwargs = {})
    %self_net_6 : [#users=1] = call_module[target=self_net_6](args = (%self_net_5,), kwargs = {})
    %self_net_7 : [#users=1] = call_module[target=self_net_7](args = (%self_net_6,), kwargs = {})
    return self_net_7

---------------
```
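
As a small usage sketch (the logger name is taken from the output above; exact setup may vary by environment):

```python
import logging

# Make sure log records have somewhere to go, then raise the DDPOptimizer
# logger to DEBUG so the bucket table and per-submodule graphs are printed.
logging.basicConfig(level=logging.INFO)
logging.getLogger("torch._dynamo.optimizations.distributed").setLevel(logging.DEBUG)
```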

Pull Request resolved: https://github.com/pytorch/pytorch/pull/88480
Approved by: https://github.com/anj-s, https://github.com/davidberard98
2022-11-05 02:40:51 +00:00
Will Constable
678d038001 Support DDP ignored parameters in DDPOptimizer (#88460)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88460
Approved by: https://github.com/aazzolini
2022-11-04 21:42:15 +00:00
Will Constable
70b00b1383 Add hf_bert + DDP multigpu test (#88435)
Spot-checks an e2e model working with ddp.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88435
Approved by: https://github.com/davidberard98
2022-11-04 03:17:48 +00:00
Will Constable
a51da28551 Support multi-gpu CI for inductor-distributed (#87996)
This test by itself isn't the end goal; it is a minimal test that exercises multi-GPU, and the focus of this PR is the infra behind enabling that.  I'll follow up with more tests using actual models, etc.

cc @malfet @desertfire for awareness/feedback on the infra side
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87996
Approved by: https://github.com/aazzolini
2022-11-02 03:52:20 +00:00
Will Constable
82a9de16d4 Change dynamo/distributed tests to use cuda/nccl (#88133)
- FSDP tests require nccl
- also run in inductor shard and skip inductor in distributed shard
- inductor shard has newer GPU and supports triton/inductor, but only runs on trunk
- distributed shard runs on PR, but inductor shard only runs on trunk/opt-in

Pull Request resolved: https://github.com/pytorch/pytorch/pull/88133
Approved by: https://github.com/davidberard98
2022-11-01 15:35:44 +00:00
Will Constable
91c95ff7c5 Enable graph_split_inductor test as it runs now (#87762)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87762
Approved by: https://github.com/davidberard98
2022-10-26 22:06:03 +00:00
Will Constable
aa66c6e01e Fix missing weight init and clean up helper (#87760)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87760
Approved by: https://github.com/davidberard98
2022-10-26 19:29:35 +00:00
Will Constable
233305a852 Improvements for DDP Optimizer (#87549)
- adds support for 'first_bucket_cap' arg, to align bucketing more precisely
  with DDP, which may start a smaller first bucket (see the sketch after this list)
- refactors the bucket splitting logic to be cleaner
- adds pretty-print for bucket info, and a way to access bucket info
  from the DDPOptimizer class from a test case or benchmark
- dumps debug logs to stdout
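
A rough standalone sketch (assumed behavior, not DDPOptimizer's actual code) of reverse-order bucket splitting with a smaller first bucket, which is what `first_bucket_cap` is meant to align with on the DDP side:

```python
# Splits parameters (name, size-in-bytes) into gradient buckets, walking in
# reverse registration order since output-side gradients become ready first.
def split_buckets(param_sizes, bucket_cap_bytes, first_bucket_cap_bytes):
    buckets, current, current_bytes = [], [], 0
    cap = first_bucket_cap_bytes  # the first (output-side) bucket may be smaller
    for name, nbytes in reversed(param_sizes):
        if current and current_bytes + nbytes > cap:
            buckets.append(current)
            current, current_bytes = [], 0
            cap = bucket_cap_bytes  # later buckets use the regular cap
        current.append(name)
        current_bytes += nbytes
    if current:
        buckets.append(current)
    return buckets

# Hypothetical sizes in bytes; 25 MiB regular cap, 1 MiB first-bucket cap.
params = [("net.0.weight", 880_000), ("net.2.weight", 4_000_000),
          ("net.4.weight", 40_000_000), ("net.6.weight", 40_000_000)]
print(split_buckets(params, 25 * 2**20, 1 * 2**20))
# -> [['net.6.weight'], ['net.4.weight'], ['net.2.weight', 'net.0.weight']]
```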

cc @jansel @lezcano @fdrocha @mlazos @soumith @voznesenskym @yanboliang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87549
Approved by: https://github.com/soumith
2022-10-24 03:40:43 +00:00
PyTorch MergeBot
0ef0a78196 Revert "Improvements for DDP Optimizer (#87525)"
This reverts commit cf693a02e0.

Reverted https://github.com/pytorch/pytorch/pull/87525 on behalf of https://github.com/ZainRizvi due to The macos error messages look like they were indeed caused by this PR
2022-10-22 04:51:33 +00:00
Will Constable
cf693a02e0 Improvements for DDP Optimizer (#87525)
- adds support for 'first_bucket_cap' arg, to align bucketing more precisely
  with DDP, which may start a smaller first bucket
- refactors the bucket splitting logic to be cleaner
- adds pretty-print for bucket info, and a way to access bucket info
  from the DDPOptimizer class from a test case or benchmark
- dumps debug logs to stdout

cc @jansel @lezcano @fdrocha @mlazos @soumith @voznesenskym @yanboliang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87525
Approved by: https://github.com/davidberard98
2022-10-22 03:44:12 +00:00
Will Constable
b18fadae88 Re-enable dynamo ddp tests (#87524)
- Move dynamo dist tests to another shard
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87524
Approved by: https://github.com/davidberard98
2022-10-22 03:29:02 +00:00