Commit Graph

252 Commits

Author SHA1 Message Date
Brian Hirsh
ba86dfcd83 AOTDispatch subclass (#104483)
This is a PoC of AOTDispatch support. This PR actually works on basic examples, and I'm working on testing it out on `DTensor` (with @wanchaol), `SemiStructuredSparsityTensor` (with @jcaip), and `FP8Tensor`.

There are some design decisions baked into the PR that I think we need consensus on though - so I'm planning on writing a larger design doc to go over the changes.
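For context, here is a minimal sketch (not from this PR) of the kind of wrapper tensor subclass that AOTDispatch support targets; the class name and fields are hypothetical, and real subclasses like `DTensor` carry much more metadata:

```python
import torch
from torch.utils._pytree import tree_map_only

class TwoTensor(torch.Tensor):
    # Hypothetical wrapper subclass holding two inner tensors. AOTDispatch-style
    # tracing has to desugar the wrapper into its dense inner tensors and back.
    @staticmethod
    def __new__(cls, a, b):
        return torch.Tensor._make_wrapper_subclass(cls, a.shape, dtype=a.dtype, device=a.device)

    def __init__(self, a, b):
        self.a = a
        self.b = b

    @classmethod
    def __torch_dispatch__(cls, func, types, args=(), kwargs=None):
        kwargs = kwargs or {}
        # Unwrap, run the op once per inner tensor, then re-wrap the results.
        args_a, kwargs_a = tree_map_only(TwoTensor, lambda t: t.a, (args, kwargs))
        args_b, kwargs_b = tree_map_only(TwoTensor, lambda t: t.b, (args, kwargs))
        return TwoTensor(func(*args_a, **kwargs_a), func(*args_b, **kwargs_b))
```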

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104483
Approved by: https://github.com/ezyang
2023-10-10 16:13:16 +00:00
chilli
201d02ef77 stop non-differentiable values from being materialized in aotautograd (#110721)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110721
Approved by: https://github.com/bdhirsh
ghstack dependencies: #110720
2023-10-09 20:18:19 +00:00
chilli
c596db762f refactor aotautograd to set requires_grad on info rather than a separate array (#110720)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110720
Approved by: https://github.com/bdhirsh
2023-10-09 20:18:19 +00:00
Kazuaki Ishizaki
b5f9696d81 Fix typo under torch directory (#110824)
This PR fixes typo `the the` of comments and exception messages in files under `torch` directory.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110824
Approved by: https://github.com/H-Huang
2023-10-09 19:16:43 +00:00
chilli
6d23193aab Added strict=True to zip in aot_autograd (#110668)
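For reference, `strict=True` (Python 3.10+) makes `zip` raise instead of silently truncating when the iterables have different lengths:

```python
xs = [1, 2, 3]
ys = ["a", "b"]

print(list(zip(xs, ys)))  # [(1, 'a'), (2, 'b')] -- the trailing 3 is silently dropped

try:
    list(zip(xs, ys, strict=True))
except ValueError as e:
    print(e)  # zip() argument 2 is shorter than argument 1
```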
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110668
Approved by: https://github.com/ezyang
ghstack dependencies: #110501, #110504, #110591
2023-10-06 05:12:05 +00:00
chilli
f767a6c57a Made pattern-matcher diagnostics lazily reported + added TORCH_COMPILE_CPROFILE (#110504)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110504
Approved by: https://github.com/mlazos, https://github.com/eellison
ghstack dependencies: #110501
2023-10-05 15:47:30 +00:00
PyTorch MergeBot
1e4c0641ce Revert "Made pattern-matcher diagnostics lazily reported + added TORCH_COMPILE_CPROFILE (#110504)"
This reverts commit 9648df1a6a.

Reverted https://github.com/pytorch/pytorch/pull/110504 on behalf of https://github.com/PaliC due to temporarily will revert as it's causing problems with difftrain import ([comment](https://github.com/pytorch/pytorch/pull/110504#issuecomment-1749132253))
2023-10-05 15:28:23 +00:00
chilli
9648df1a6a Made pattern-matcher diagnostics lazily reported + added TORCH_COMPILE_CPROFILE (#110504)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110504
Approved by: https://github.com/mlazos, https://github.com/eellison
ghstack dependencies: #110501
2023-10-05 01:34:57 +00:00
chilli
e686341f64 Consider that ops can be fused into cat in the min-cut partitioner (#110501)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110501
Approved by: https://github.com/eellison
2023-10-05 01:34:57 +00:00
Xuehai Pan
0daa7d4815 [test][docs] Fix doctest warnings for syntax errors (#110517)
Fixes some syntax errors in doctests found in CI tests.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110517
Approved by: https://github.com/albanD
2023-10-05 00:00:06 +00:00
Brian Hirsh
b457e3f79a Reland attempt 2 of "Update AOTAutograd to use FunctionalTensorMode instead of C++ functionalization (#106406)" (#109906)" (#110079)
The first reland broke internal (failing diff: D49617462).

The major error looks like it's because there's an internal-only higher order op that needs a new functionalization rule. I'm going to land an internal diff for that and confirm tests pass before relanding this PR.

Also confirmed that the issue from https://github.com/pytorch/pytorch/issues/110121 is fixed, and added a test.

This reverts commit 1b90f07f5a.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110079
Approved by: https://github.com/ezyang
2023-10-03 18:50:25 +00:00
Ubuntu
16e3f158b9 Add function to port FX minified graph to HLO via StableHLO (#109084)
If the `XLA_HLO_DEBUG` flag is enabled, generate a minified HLO graph when using the minifier. This function enables HLO minification support by porting the minified FX graph to StableHLO via the `save_torch_model_as_stablehlo` function.

This allows users to port the minified graph to compilers that are not compatible with the TorchDynamo/Inductor workflow and use XLA instead. The purpose of this PR is to help XLA users debug accuracy and compilation errors. It will also be helpful for the existing TorchDynamo/XLA workflow on the `torchxla_trace_once` backend.
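A rough usage sketch of the export path, assuming the `save_torch_model_as_stablehlo` entry point lives in `torch_xla.stablehlo` and takes (model, args, output_dir); check the torch_xla documentation for the exact signature:

```python
import torch
# Assumed import path and argument order; verify against the torch_xla docs.
from torch_xla.stablehlo import save_torch_model_as_stablehlo

model = torch.nn.Linear(4, 4)
sample_args = (torch.randn(2, 4),)

# Serialize the (possibly minified) model as a StableHLO bundle on disk, which
# compilers outside the TorchDynamo/Inductor workflow can then consume.
save_torch_model_as_stablehlo(model, sample_args, "/tmp/stablehlo_out")
```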

Fixes [#5461](https://github.com/pytorch/xla/issues/5461) in Torch XLA repo. CC @GleasonK @qihqi
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109084
Approved by: https://github.com/anijain2305
2023-10-02 19:36:04 +00:00
PyTorch MergeBot
1b90f07f5a Revert "Reland "Update AOTAutograd to use FunctionalTensorMode instead of C++ functionalization (#106406)" (#109906)"
This reverts commit d0fe8fa5db.

Reverted https://github.com/pytorch/pytorch/pull/109906 on behalf of https://github.com/atalman due to Breaks internal tests ([comment](https://github.com/pytorch/pytorch/pull/109906#issuecomment-1735416852))
2023-09-26 12:10:25 +00:00
Brian Hirsh
d0fe8fa5db Reland "Update AOTAutograd to use FunctionalTensorMode instead of C++ functionalization (#106406)" (#109906)
I'm pretty sure this is fixed, but I'll run inductor and trunk CI. The failing test in trunk previously was that the recently landed selective activation checkpointing code assumes it can detect whether or not AOTAutograd is running by checking whether the inputs to SAC are C++ `FunctionalTensorWrapper`s.

The previous land broke some inductor trunk tests.

This reverts commit 629a628cc8.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109906
Approved by: https://github.com/ezyang
2023-09-25 14:53:54 +00:00
PyTorch MergeBot
629a628cc8 Revert "Update AOTAutograd to use FunctionalTensorMode instead of C++ functionalization (#106406)"
This reverts commit b5d6e831a9.

Reverted https://github.com/pytorch/pytorch/pull/106406 on behalf of https://github.com/malfet due to Broke lots of tests on trunk ([comment](https://github.com/pytorch/pytorch/pull/106406#issuecomment-1731524917))
2023-09-22 14:32:34 +00:00
Brian Hirsh
b5d6e831a9 Update AOTAutograd to use FunctionalTensorMode instead of C++ functionalization (#106406)
Now that FunctionalTensor and `FunctionalTensorMode` are lower down in this stack, the changes in this PR are more mechanical: Everywhere in AOTAutograd that I used to use the C++ functionalization API, I now use the python functionalization API.

Note that this doesn't actually cause functionalization to run underneath torch_dispatch. I'm saving that re-ordering for later in the stack.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/106406
Approved by: https://github.com/ezyang
ghstack dependencies: #108654, #109662, #109632, #109023
2023-09-22 07:09:04 +00:00
willfengg
772e104dfd [inductor] visualize fused ops in svg graph (#107752)
Example usage
* `TORCH_COMPILE_DEBUG=1 INDUCTOR_ORIG_FX_SVG=1 INDUCTOR_POST_FUSION_SVG=1 python trig.py`: shows the original fx node name, file, and code. See snapshot 2, where we have origin_0, 1, 2
* trig.py can be found in P816304818 (a hypothetical stand-in is sketched after the screenshot below)

Implementation
* keep the original fx graph in GraphLowering: ```self.orig_gm: torch.fx.GraphModule = gm.__copy__()```
* draw the original fx graph with origins at ir_post_fusion: ```V.debug.draw_orig_fx_graph(self.orig_gm, self.scheduler.nodes)```; node.meta["buff_meta"] tracks buf_name

<img width="350" alt="Screenshot 2023-08-29 at 12 40 24 PM" src="https://github.com/pytorch/pytorch/assets/134637289/c4e197cb-ab3b-4a09-a584-c1356376accb">
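As referenced above, `trig.py` is an internal paste (P816304818); a hypothetical stand-in that exercises the same flow could be as small as:

```python
# Hypothetical stand-in for the internal trig.py; any small compiled function works.
import torch

def f(x):
    return torch.sin(x) + torch.cos(x) ** 2

compiled = torch.compile(f)
compiled(torch.randn(1024, device="cuda" if torch.cuda.is_available() else "cpu"))
```

Running it with the environment variables shown above emits the original-FX and post-fusion SVGs under the debug output directory (`torch_compile_debug/` by default).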

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107752
Approved by: https://github.com/mlazos
2023-09-21 08:03:05 +00:00
Brian Hirsh
238fb66085 python functionalization: support higher order ops (#108656)
We now have two types of functionalization, C++ Functionalization (through the `Functionalize` dispatch key), and python functionalization (through the `FunctionalTensorMode` torch_dispatch mode).

This means that all higher order ops need custom functionalization rules for the python variant too. I added them here, as well as a helper function `dispatch_functionalize()` - equivalent to `torch.func.functionalize()`, except that it uses `FunctionalTensorMode`.

In theory we could have secretly switched `torch.func.functionalize` to use `FunctionalTensorMode`. This would be BC-breaking, though, since `FunctionalTensorMode` isn't composable with the other functorch transforms (the functorch layer-mode stack doesn't know how to re-order torch_dispatch modes arbitrarily).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108656
Approved by: https://github.com/zou3519
ghstack dependencies: #109024, #109248
2023-09-20 04:37:31 +00:00
Brian Hirsh
25e81f19f3 reland "python functionalization: add helpers, functionalize_sync and mirror_autograd_meta (#107917)" (#109518)
Reland - the previous PR was reverted by internal with this error:
```
  File "/data/sandcastle/boxes/eden-trunk-hg-fbcode-fbsource/buck-out/v2/gen/fbcode/363cd7e240f5d021/caffe2/torch/fb/trainer/data_modules/tests/__test_dataloader__/test_dataloader#link-tree/torch/__init__.py", line 29, in <module>
    from ._utils_internal import _functionalize_sync as _sync
ImportError: cannot import name '_functionalize_sync' from 'torch._utils_internal'
```

I couldn't figure out why internal was unhappy with the import. One potential reason is that I see a build rule for *another* `_utils_internal.py` in the fb folder here ([link](https://www.internalfb.com/code/fbsource/[30ed85cd88409af98b7490be137aaa5dfd7afd01]/fbcode/caffe2/TARGETS?lines=444))

Rather than burn more time investigating, I confirmed internally that the error goes away if I move the util from `torch/_utils_internal.py` to `torch/_utils.py`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109518
Approved by: https://github.com/albanD
2023-09-19 13:25:24 +00:00
PyTorch MergeBot
49b18ae546 Revert "python functionalization: add helpers, functionalize_sync and mirror_autograd_meta (#107917)"
This reverts commit 0ad595954a.

Reverted https://github.com/pytorch/pytorch/pull/107917 on behalf of https://github.com/clee2000 due to breaking internal builds D49346637 ([comment](https://github.com/pytorch/pytorch/pull/107917#issuecomment-1722566885))
2023-09-17 20:57:41 +00:00
Aaron Gokaslan
247e2f8461 [BE]: Update ruff to v0.0.290 (#109435)
Updates our ruff linter to the latest and fixes a few false negatives along the way.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109435
Approved by: https://github.com/ezyang
2023-09-16 18:43:34 +00:00
Brian Hirsh
0ad595954a python functionalization: add helpers, functionalize_sync and mirror_autograd_meta (#107917)
Added two new utils to help with turning python functionalization on in AOTAutograd (next PR):

(1) updated `torch._sync()`. Previously, this API could only handle `torch.Tensor` instances that had a `FunctionalTensorWrapper` TensorImpl. It now needs to handle python `FunctionalTensor`s. In theory I can probably break BC and change this API (since it's private?), but I decided not to do it in this PR stack to minimize the chance of reverts. Instead of updating that API directly (which is in C++), I just added a python shim that first tries to unwrap the python `FunctionalTensor` if there is one, then calls the existing C++ logic.

(2) `mirror_autograd_meta` is now a standalone API that tries to mirror the `requires_grad` and `is_leaf` autograd metadata from one tensor to another. Previously this was hardcoded into `torch._to_functional_tensor()`. But I now need to use it in a more standalone way: later in AOTAutograd, when we unwrap and re-wrap a tensor subclass, we need to manually mirror the autograd metadata from the original to the updated version of the subclass.
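A rough, heavily simplified sketch of the two helpers described above (names and the unwrapping attribute are assumptions; the real implementations live in the torch internals):

```python
import torch

def functionalize_sync_sketch(t):
    # Python shim: if `t` is a python FunctionalTensor, unwrap it first, then
    # call the existing C++ sync logic via the private torch._sync().
    elem = getattr(t, "elem", None)  # assumed attribute on the python FunctionalTensor
    torch._sync(elem if elem is not None else t)

def mirror_autograd_meta_sketch(src, dst):
    # Mirror requires_grad from src onto dst. is_leaf follows from how dst was
    # created (whether it has a grad_fn), so the real helper handles that separately.
    dst.requires_grad_(src.requires_grad)
    return dst
```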

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107917
Approved by: https://github.com/ezyang
ghstack dependencies: #106404
2023-09-15 20:19:25 +00:00
max
e066056414 fix 'Node' object is not iterable in functorch.compile.minifier (#103011)
Fixes #102169

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103011
Approved by: https://github.com/Chillee
2023-09-12 23:47:40 +00:00
redwrasse
9118073fe7 assign var for "not populated" str (#108844)
Minor cleanup: assign a variable for the 'not populated' string value referenced in several places in `vmapify_autograd_function`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108844
Approved by: https://github.com/zou3519
2023-09-12 20:53:48 +00:00
Animesh Jain
8b7b824dca [inductor][ac] preserve recompute tags through pattern matching (#107742)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107742
Approved by: https://github.com/eellison
2023-08-25 03:48:26 +00:00
soulitzer
3cc5c42a23 Fix aot sequence_nr to reset bwd flag (#107210)
The way aot autograd sequence_nr tracking works is that when we run the aot export logic, the dynamo-captured forward graph is run under an fx.Interpreter, which iterates through the nodes of the forward graph while setting the `current_metadata`.
Since what runs during backward doesn't correspond to any node from the forward graph, we fall back to the global `current_metadata`. And since this global metadata ends up being shared between runs, forgetting to reset things leads to weirdness, e.g. the printed results differ depending on whether this is the first test run.
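A rough sketch of the metadata-propagation pattern described above (simplified; the real logic lives in the aot export path):

```python
import torch.fx as fx

current_metadata = {}  # module-level fallback, analogous to the global described above

class MetadataInterpreter(fx.Interpreter):
    def run_node(self, n):
        global current_metadata
        # While executing each forward node, expose its metadata (e.g. its
        # sequence_nr) so anything triggered underneath can pick it up.
        current_metadata = n.meta
        try:
            return super().run_node(n)
        finally:
            # Reset so stale metadata doesn't leak into later runs -- the missing
            # reset of an analogous flag is exactly what this commit fixes.
            current_metadata = {}
```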

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107210
Approved by: https://github.com/bdhirsh
2023-08-24 16:58:12 +00:00
Elias Ellison
918df10198 [Easy] use dtype.itemsize in partitions (#107749)
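For reference, `torch.dtype.itemsize` gives the per-element size in bytes directly:

```python
import torch

assert torch.float32.itemsize == 4
assert torch.float16.itemsize == 2
assert torch.int64.itemsize == 8
```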
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107749
Approved by: https://github.com/davidberard98
2023-08-24 16:07:05 +00:00
vasiliy
61fe49b8ed pt2: make aot_eager backend handle basic float8 operations (#107783)
Summary:

Reland of https://github.com/pytorch/pytorch/pull/107642 with a fix for tests on Windows.

Makes aot_eager backend of torch.compile handle basic float8 operations.

This is useful for float8 training UX.

Test Plan:

```
python test/test_quantization.py -k test_pt2_traceable_aot_eager
```
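A minimal sketch of the kind of program this enables (not the test from the PR; assumes a build that exposes `torch.float8_e4m3fn`):

```python
import torch

def f(x):
    # Cast to float8 and back; aot_eager should now trace this without erroring.
    x_fp8 = x.to(torch.float8_e4m3fn)
    return x_fp8.to(torch.float32) * 2.0

compiled = torch.compile(f, backend="aot_eager")
print(compiled(torch.randn(8, 8)))
```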

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107783
Approved by: https://github.com/albanD
2023-08-23 18:10:53 +00:00
PyTorch MergeBot
5025fb9213 Revert "pt2: make aot_eager backend handle basic float8 operations (#107642)"
This reverts commit 24147a8e1c.

Reverted https://github.com/pytorch/pytorch/pull/107642 on behalf of https://github.com/huydhn due to Sorry for reverting this, but it is failing Windows CPU test in trunk. The Windows failures on your PR looks related I think ([comment](https://github.com/pytorch/pytorch/pull/107642#issuecomment-1688999380))
2023-08-22 22:17:36 +00:00
vasiliy
24147a8e1c pt2: make aot_eager backend handle basic float8 operations (#107642)
Summary:

Makes aot_eager backend of torch.compile handle basic float8 operations.

This is useful for float8 training UX.

Test Plan:

```
python test/test_quantization.py -k test_pt2_traceable_aot_eager
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107642
Approved by: https://github.com/albanD
2023-08-22 18:57:14 +00:00
Brian Hirsh
8c44cfef5e Add some support for detecting false aliasing in AOTAutograd (#106461)
This is a partial fix for https://github.com/pytorch/pytorch/issues/106457. In the examples with the shampoo optimizer that I ran, the new checks were enough to remove the parameter aliasing in shampoo.

I added some new logic for detecting if two inputs have overlapping memory in specific cases: if they're both 2D tensors with stride 1. In that case (the case for shampoo), I try to compute a bunch of contiguous intervals on the two tensors, and check if any of the intervals overlap. In theory this is slow, since if our two tensors are e.g. of size (256, N), we'll need to create 256 intervals to check for overlap on. This seems... probably fine, since I think we do more egregious things in the compile stack to cause slowness. Open to suggestions though!
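A rough sketch of that interval-based check, assuming both tensors are 2D, share the same storage, and have stride 1 in the last dimension (the real implementation adds more preconditions):

```python
def storage_intervals(t):
    # Each row of a 2D tensor with stride (s0, 1) occupies one contiguous interval
    # of the underlying storage: [offset + i * s0, offset + i * s0 + n_cols).
    off, (s0, _), (n_rows, n_cols) = t.storage_offset(), t.stride(), t.shape
    return [(off + i * s0, off + i * s0 + n_cols) for i in range(n_rows)]

def may_alias(a, b):
    # Conservatively report aliasing if any pair of row intervals overlaps.
    for start_a, end_a in storage_intervals(a):
        for start_b, end_b in storage_intervals(b):
            if start_a < end_b and start_b < end_a:
                return True
    return False
```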

Pull Request resolved: https://github.com/pytorch/pytorch/pull/106461
Approved by: https://github.com/albanD
ghstack dependencies: #106460
2023-08-15 17:27:37 +00:00
Brian Hirsh
517ba2add7 AOTAutograd: allow input mutations on inputs that are non-contiguous (#106460)
Fixes https://github.com/pytorch/pytorch/issues/106456

I also had to update the logic in functionalization's resize_() kernel to convey to AOTAutograd that resize_() is a metadata mutation.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/106460
Approved by: https://github.com/ezyang
2023-08-15 17:27:37 +00:00
kshitij12345
cce2c52b0b [pt2] support vmap (#101707)
Teach dynamo about `vmap`
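A minimal illustration of the kind of program this enables: `vmap` appearing inside a `torch.compile`d function rather than causing a graph break.

```python
import torch

def per_sample_dot(x, y):
    # dynamo now understands vmap, so this can be captured in the compiled graph
    return torch.vmap(torch.dot)(x, y)

compiled = torch.compile(per_sample_dot, backend="aot_eager")
x, y = torch.randn(16, 8), torch.randn(16, 8)
print(compiled(x, y))
```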

Pull Request resolved: https://github.com/pytorch/pytorch/pull/101707
Approved by: https://github.com/zou3519
2023-08-09 03:39:33 +00:00
Kshiteej K
a899333ffc fix: nll_loss batch rule with negative ignore_idx (#106118)
We use python decompositions instead of writing our own batching rules.

Fixes https://github.com/pytorch/pytorch/issues/105736

Pull Request resolved: https://github.com/pytorch/pytorch/pull/106118
Approved by: https://github.com/lezcano, https://github.com/zou3519
2023-08-04 07:43:02 +00:00
Alex Settle
9ba0558d48 Add sequence_nr to aot_autograd to map forward ops to their corresponding backward ops (#103129)
Fixes #102375

Sequence_nr increments in the forward pass and decrements in the backward pass. Backward ops with the same sequence_nr as a forward op represent the backward implementation for that op. The long-term goal is to make this information available to the profiler so users can observe which ops are fused into the Inductor-generated OpenAI Triton kernels.

Added a test for this feature: **test/dynamo/test_aot_autograd.py::AotAutogradFallbackTests::test_aot_sequence_nr**. The test case uses **aot_export_module()** to create a joint fwd/bwd fx graph, then walks all the nodes in the fx graph via fx_graph.graph.nodes. The seq_nr of each node is recorded in node.meta: during the fwd pass the seq_nr increments, and it decrements during the bwd pass. This allows the user to map forward ops to their corresponding bwd ops, which is useful for performance analysis (a rough node-walking sketch follows the expected output below).

Expected output from the test case.

```
SeqNr|OrigAten|SrcFn
0|aten.convolution.default|l__self___conv1
0|aten.add.Tensor|l__self___bn1
1|aten._native_batch_norm_legit_functional.default|l__self___bn1
2|aten.relu.default|l__self___relu1
3|aten.add.Tensor|add
4|aten.view.default|flatten
5|aten.t.default|l__self___fc1
6|aten.unsqueeze.default|l__self___fc1
7|aten.mm.default|l__self___fc1
8|aten.squeeze.dim|l__self___fc1
9|aten.add.Tensor|l__self___fc1
10|aten.sub.Tensor|l__self___loss_fn
11|aten.abs.default|l__self___loss_fn
12|aten.mean.default|l__self___loss_fn
12|aten.ones_like.default|
12|aten.expand.default|
12|aten.div.Scalar|
11|aten.sgn.default|
11|aten.mul.Tensor|
8|aten.unsqueeze.default|
7|aten.t.default|
7|aten.mm.default|
7|aten.t.default|
7|aten.t.default|
7|aten.mm.default|
6|aten.squeeze.dim|
5|aten.t.default|
4|aten.view.default|
2|aten.threshold_backward.default|
1|aten.native_batch_norm_backward.default|
0|aten.convolution_backward.default|
0|aten.add.Tensor|
```
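A rough sketch of the node-walking pattern the test uses; the import path, arguments, and the `seq_nr` metadata key are assumptions based on the description above, so see the test itself for the exact invocation:

```python
import torch
from torch._functorch.aot_autograd import aot_export_module  # assumed import path

class M(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = torch.nn.Linear(4, 4)

    def forward(self, x):
        return (self.fc(x).abs().mean(),)  # scalar "loss" as output 0

# trace_joint=True with output_loss_index=0 asks for a joint fwd/bwd graph.
fx_graph, _ = aot_export_module(M(), [torch.randn(2, 4)], trace_joint=True, output_loss_index=0)

for node in fx_graph.graph.nodes:
    # seq_nr increments through the forward ops and decrements through the
    # backward ops, pairing each backward op with the forward op it came from.
    print(node.meta.get("seq_nr"), node.target)
```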

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103129
Approved by: https://github.com/soulitzer
2023-08-02 00:52:52 +00:00
Brian Hirsh
4a549dd57a AOTAutograd: correctness fix when tracing custom autograd functions that alias inputs (#102992)
Fixes https://github.com/pytorch/pytorch/issues/102970. See the comment [here](https://github.com/pytorch/pytorch/issues/102970#issuecomment-1577223773) for details.

We normally treat "outputs that alias inputs" specially in AOTAutograd, by replaying the views at runtime, instead of baking them into the graph. For views that are part of custom autograd functions though, we can't do that view-replay, since it will clobber the backwards function that the user specified in their custom autograd.Function.

Right now in this PR, I distinguish between "aliased inputs that are normal views" vs. "aliased inputs that are views that came from an autograd.Function call" by checking the output's `.grad_fn` field, to see if it inherits from our custom CBackward function class. Then I added a new `OutputType` enum value that we effectively treat the "normal" way (the same way that we treat ordinary, non-aliased outputs). The new enum value is mostly for debugging - so we can print it and know that our graph had custom autograd.Function aliased outputs in it.
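A rough sketch of that detection heuristic (the real check and the new `OutputType` value live in AOTAutograd's metadata collection):

```python
import torch

def output_is_alias_from_custom_autograd_fn(out):
    # Outputs produced inside a custom autograd.Function get a grad_fn that is an
    # instance of the generated C backward class, unlike ordinary view grad_fns
    # (ViewBackward0 etc.), which is how the two cases are told apart.
    return out.grad_fn is not None and isinstance(
        out.grad_fn, torch.autograd.function.BackwardCFunction
    )
```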

Pull Request resolved: https://github.com/pytorch/pytorch/pull/102992
Approved by: https://github.com/ezyang, https://github.com/zou3519
2023-07-31 19:02:12 +00:00
XiaobingSuper
afd621ddde inductor: fix CSE issue when have symbolic shape input at the freezing path (#105651)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105651
Approved by: https://github.com/jgong5, https://github.com/eellison
2023-07-26 08:07:31 +00:00
PyTorch MergeBot
48cd8e29c1 Revert "Slightly improve AOTAutograd logging with ViewAndMutationMeta (#105702)"
This reverts commit cc137342d0.

Reverted https://github.com/pytorch/pytorch/pull/105702 on behalf of https://github.com/PaliC due to breaking internal export tests (relevant details shared with author) ([comment](https://github.com/pytorch/pytorch/pull/105702#issuecomment-1650492077))
2023-07-25 20:17:27 +00:00
Edward Z. Yang
cc137342d0 Slightly improve AOTAutograd logging with ViewAndMutationMeta (#105702)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105702
Approved by: https://github.com/albanD
2023-07-25 00:47:38 +00:00
Jason Ansel
c902b84e0b Compiled autograd (#103822)
This branch:
1) converts the autograd tape into an FX graph
2) caches that conversion using a "shadow" graph
3) compiles and runs the generated FX graph instead of the normal autograd

What works currently:
1) Caching, capture, and initial integration
2) Backwards hooks
3) Inlining AotAutograd generated subgraphs
4) torch.compiling the generated FX graph
5) Auto-detecting dynamic shapes based on changes

Future work:
1) Larger scale testing
2) Boxed calling convention, so memory can be freed incrementally
3) Support hooks on SavedTensor
4) Additional testing by running eager autograd tests under compiled_autograd.enable() (a usage sketch follows below)
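A rough usage sketch, assuming the module path `torch._dynamo.compiled_autograd` and that `enable()` takes a compiler callable which receives the generated FX graph:

```python
import torch
import torch._dynamo.compiled_autograd as compiled_autograd  # assumed module path

def compiler_fn(gm):
    # Compile the FX graph that compiled autograd builds from the autograd tape.
    return torch.compile(gm, backend="inductor")

model = torch.nn.Linear(8, 8)
x = torch.randn(4, 8)

# The backward pass below runs through the compiled FX graph instead of the
# normal eager autograd engine.
with compiled_autograd.enable(compiler_fn):
    loss = model(x).sum()
    loss.backward()
```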

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103822
Approved by: https://github.com/ezyang, https://github.com/albanD
2023-07-24 21:12:05 +00:00
Animesh Jain
0b11da0ccb [partitioners][ac][dynamic] Fix output signature of fwd with symints (#105771)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105771
Approved by: https://github.com/Chillee
2023-07-22 03:04:11 +00:00
Justin Chu
8a688277a2 [BE] Enable ruff's UP rules and autoformat dynamo / functorch and refs (#105432)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105432
Approved by: https://github.com/ezyang
2023-07-19 13:48:44 +00:00
Edward Z. Yang
2fa7d11b64 Immediately compile backwards graph in AOTAutograd if dynamic shapes (#104971)
Previously, we made backwards graph compilation lazy to avoid paying
for compilation if the user didn't actually end up using the backwards
graph.  This was useful in the old days when a lot of things in Inductor
didn't work and we could bypass errors this way.

However, this has a bad implication for dynamic shapes: the backwards
graph compilation can trigger extra guards, which are too late to
install in the Dynamo context if we wait until backwards is being run.
So in this PR I move us back to compiling backwards graph immediately
if we capture any SymInts for backwards.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104971
Approved by: https://github.com/Chillee
2023-07-17 15:37:17 +00:00
Nikita Shulga
5837e95d30 [Reland] Update mypy to 1.4.1 (#105227)
This PR re-lands
- [Typing] Fix PEP 484 Violation (#105022)
- Update mypy to 1.4.1 (#91983)

That were reverted due to the conflict with internal source repo.

Mostly fixes for PEP-484 violations (i.e. when a default arg is set to None, but the type is not annotated as optional)
Plus a few real fixes:
  - Add missing `_get_upgraders_entry_map` to `torch/_C/__init__.pyi`
  - Add missing return statement to `torch._export.deserialize_graph`
  - Fix error message in `torch.ao.ns.fx.weight_utils.get_lstm_mod_weights`
  - Add assert in `torch/optim/optimizer.py` that an Optional list is not None
TODO (in followup PR):
  - Fix erroneous `isinstance` check in `torch/ao/quantization/_pt2e/qat_utils.py`

Unrelated, to bypass CI failures due to the gcc9 dependency update in Ubuntu-18.04:
- Add hack to squash older libstdc++ from the conda environment in favor of the one from the OS in `.ci/docker/install_conda.sh`
- Update bazel cuda builds to focal, as with libstdc++-6.0.32 bazel builds lose the ability to catch exceptions (probably because they link with cupti statically, but I could not find where it is done)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105227
Approved by: https://github.com/atalman, https://github.com/albanD, https://github.com/Skylion007
2023-07-15 20:30:20 +00:00
PyTorch MergeBot
15fd1ea118 Revert "[Reland] Update mypy to 1.4.1 (#105227)"
This reverts commit c9c4f8efc3.

Reverted https://github.com/pytorch/pytorch/pull/105227 on behalf of https://github.com/atalman due to trying to mitigate ci sev #105248 ([comment](https://github.com/pytorch/pytorch/pull/105227#issuecomment-1636510935))
2023-07-14 22:28:35 +00:00
Nikita Shulga
c9c4f8efc3 [Reland] Update mypy to 1.4.1 (#105227)
This PR re-lands
- [Typing] Fix PEP 484 Violation (#105022)
- Update mypy to 1.4.1 (#91983)

That were reverted due to the conflict with internal source repo.

Mostly fixes for PEP-484 violations (i.e. when a default arg is set to None, but the type is not annotated as optional)
Plus a few real fixes:
  - Add missing `_get_upgraders_entry_map` to `torch/_C/__init__.pyi`
  - Add missing return statement to `torch._export.deserialize_graph`
  - Fix error message in `torch.ao.ns.fx.weight_utils.get_lstm_mod_weights`
  - Add assert in `torch/optim/optimizer.py` that an Optional list is not None
TODO (in followup PR):
  - Fix erroneous `isinstance` check in `torch/ao/quantization/_pt2e/qat_utils.py`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105227
Approved by: https://github.com/atalman, https://github.com/albanD, https://github.com/Skylion007
2023-07-14 20:45:12 +00:00
Edward Z. Yang
10cbc9a063 Enable cuda graphs for dynamic shapes (#105064)
The general idea is to do a separate CUDA graph for each size. Because of cuda graph trees, these graphs will all share the same memory pool, so your memory usage will only be the worst case memory usage of the biggest dynamic size you want. This requires an extra dispatch in the cudagraphified callable. You must pay for a CUDA graph recording for every dynamic size you encounter, but this is MUCH cheaper than running the entire PT2 compile stack, so I expect you to still see benefits.

This was surprisingly easy to do.
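A heavily simplified sketch of the per-size dispatch idea (no shared memory pool or cuda graph trees, which the real implementation relies on):

```python
import torch

def cudagraphify_per_size(fn):
    graphs = {}  # one recorded graph plus static buffers per input shape

    def run(x):  # x must be a CUDA tensor
        key = tuple(x.shape)
        if key not in graphs:
            static_in = x.clone()
            fn(static_in)  # warm-up run before capture
            g = torch.cuda.CUDAGraph()
            with torch.cuda.graph(g):
                static_out = fn(static_in)
            graphs[key] = (g, static_in, static_out)
        g, static_in, static_out = graphs[key]
        static_in.copy_(x)  # the extra dispatch: route this size to its own graph
        g.replay()
        return static_out.clone()

    return run
```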

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/105064
Approved by: https://github.com/voznesenskym
2023-07-14 16:13:50 +00:00
PyTorch MergeBot
b4d91b1c5b Revert "[Typing] Fix PEP 484 Violation (#105022)"
This reverts commit 4148b7bada.

Reverted https://github.com/pytorch/pytorch/pull/105022 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally ([comment](https://github.com/pytorch/pytorch/pull/105022#issuecomment-1635967734))
2023-07-14 14:45:09 +00:00
Edward Z. Yang
979f826015 Read out real strides from compilation result, rather than real args (#105010)
This prefigures a refactor that will move the backward compilation
to entirely ahead of time, so I need to extract these strides some
other way.  Straight from the compiler's mouth will do it.

I can't easily get the information via the return result of `fw_compiler` without changing the calling convention, so instead I smuggle it via TracingContext. TracingContext may be None when we are compiling patterns for the joint graph pattern matcher.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/105010
Approved by: https://github.com/shunting314
2023-07-12 11:33:08 +00:00
Nikita Shulga
4148b7bada [Typing] Fix PEP 484 Violation (#105022)
Not sure how it worked before, but arguments must be annotated as Optional if they default to None

Towards enabling mypy-1.4.1 in lintrunner
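The pattern being fixed, for reference:

```python
from typing import Optional

def resize(t, size: int = None): ...                   # flagged: default is None but type is not Optional
def resize_fixed(t, size: Optional[int] = None): ...   # annotated per PEP 484
```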

### <samp>🤖 Generated by Copilot at 5e1b9f4</samp>

> _We annotate the arguments of doom_
> _To show the `None` values of gloom_
> _We improve the type checking and readability_
> _With `Optional` annotations of metal-ity_

Pull Request resolved: https://github.com/pytorch/pytorch/pull/105022
Approved by: https://github.com/izaitsevfb, https://github.com/huydhn, https://github.com/Skylion007
2023-07-12 10:20:48 +00:00