Commit Graph

686 Commits

Author SHA1 Message Date
Simon Fan
1d4adf4e1f [dynamo] log recompile reason to dynamo_compile (#146117)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/146117
Approved by: https://github.com/bobrenjc93
2025-02-03 21:04:04 +00:00
Aaron Orenstein
23695ea002 Fix dynamo use of list[int] in graph break (#145554)
This reintroduces the change backed out by #145393 and fixes the underlying problem.

Although using a BuiltinVariable was better than nothing when we saw a GenericAlias it had problems if there was a graph break and we had to reconstruct the original python code which BuiltinVariable did as a simple `list` instead of a `list[int]`.

This changes it to use a TypingVariable instead and then teaches TypingVariable how to reconstruct.

Original commit changeset: 77b9193acb23

python test/dynamo/test_repros.py ReproTests.test_graph_break_on_jit_isinstance

Pull Request resolved: https://github.com/pytorch/pytorch/pull/145554
Approved by: https://github.com/anijain2305
ghstack dependencies: #145551, #145552, #145553
2025-01-30 22:21:40 +00:00
IvanKobzarev
894ef8c1e3 [torchbench] Inductor freezing bfloat16 conv folding needs high tolerance (#145623)
Issue:
https://github.com/pytorch/pytorch/issues/144888

Torchbench of timm lcnet_050 model fails on accuracy in case of `--frezing` `--inference` `--bfloat16`
`res_error==0.12`
If to turn off convolution inductor constant folding - `res_error==0.016`

`float16 error ~ 0.00669`
`float16 without conv folding ~ 0.0018`

convolution folding results in increase of error almost at one order of magnitude.

I think we should revisit and try to do something to improve the accuracy for conv folding.
E.g. For example doing conv folding at compilation time with float64?

At the moment I am adding counters to identify if convolution folding happened, and in case of bfloat16 and conv_folding - increase multiplier to the max level (10) to pass accuracy test.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/145623
Approved by: https://github.com/eellison
2025-01-30 12:46:35 +00:00
PyTorch MergeBot
1185b81c51 Revert "[dynamo] Use polyfill to implement comparison operators (#144485)"
This reverts commit d1f82de2bf.

Reverted https://github.com/pytorch/pytorch/pull/144485 on behalf of https://github.com/huydhn due to This seems to break dynamo tests in trunk after landing ([comment](https://github.com/pytorch/pytorch/pull/144485#issuecomment-2622893294))
2025-01-29 21:30:42 +00:00
Animesh Jain
d1f82de2bf [dynamo] Use polyfill to implement comparison operators (#144485)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/144485
Approved by: https://github.com/jansel
2025-01-29 17:37:40 +00:00
Colin L. Rice
c1161957a4 inductor_config_logging: Don't drop keys (#144700)
This bit me while I was trying to debug some trace issues.
In general this config is already quite large when dumping, so adding
more fields doesn't make it significantly worse.

Also a number of the items we are type checking for (except the test
configs), don't even show up. Primarily this will help us when debugging
rocm, halide, and trace configs.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/144700
Approved by: https://github.com/ezyang
2025-01-27 23:47:25 +00:00
Animesh Jain
7e1c7253e9 [dynamo][builtin-skipfile-cleanup] Support tuple.__new__ (#145558)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/145558
Approved by: https://github.com/jansel, https://github.com/StrongerXi
ghstack dependencies: #145519, #145547
2025-01-27 21:42:43 +00:00
Animesh Jain
ef60de07a0 [dynamo] Log guard latency (#145132)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/145132
Approved by: https://github.com/ezyang
ghstack dependencies: #145509
2025-01-25 03:01:18 +00:00
Aishwarya Sivaraman
457facf7e2 [caffe2] Use the manifold cache backend as the default (#144773)
Test Plan: CI

D68155591

Pull Request resolved: https://github.com/pytorch/pytorch/pull/144773
Approved by: https://github.com/izaitsevfb
2025-01-24 19:48:34 +00:00
PyTorch MergeBot
6f60c65a3a Revert "[dynamo] Log guard latency (#145132)"
This reverts commit 0a310d7388.

Reverted https://github.com/pytorch/pytorch/pull/145132 on behalf of https://github.com/anijain2305 due to CI failures observed after PR was merged ([comment](https://github.com/pytorch/pytorch/pull/145132#issuecomment-2611268421))
2025-01-24 00:11:50 +00:00
Animesh Jain
0a310d7388 [dynamo] Log guard latency (#145132)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/145132
Approved by: https://github.com/ezyang
ghstack dependencies: #145351, #145420
2025-01-23 23:30:07 +00:00
Aaron Orenstein
a79100ab11 PEP585 update - torch/_dynamo (#145105)
See #145101 for details.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/145105
Approved by: https://github.com/bobrenjc93
2025-01-18 20:47:11 +00:00
Colin L. Rice
95c363cc9b dynamo: Don't crash with internal error if getattr on a tensor fails (#144817)
This prevents crashes when getattr is called on a tensor for something
which doesn't exist.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/144817
Approved by: https://github.com/williamwen42, https://github.com/jansel
2025-01-16 22:04:06 +00:00
James Wu
e58c823ab8 Implement increment and add_to_set for CompileEventLogger (#143427)
This diff implements `increment` and `add_to_set`, which are features of MetricsContext, but not ChromiumEventLogger. This allows us to add a bunch of other metricscontext callsites to use CompileEventLogger instead.

Differential Revision: [D67354867](https://our.internmc.facebook.com/intern/diff/D67354867/)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/143427
Approved by: https://github.com/masnesral
2025-01-14 02:42:49 +00:00
bobrenjc93
1fe3af2c68 Migrate from Tuple -> tuple in torch/_dynamo (#144261)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/144261
Approved by: https://github.com/aorenste, https://github.com/zou3519
2025-01-10 07:45:57 +00:00
Colin L. Rice
73278e6a5d easy: sort dictionary keys for inductor config when publishing (#143307)
This means we should get consistent logging strings for the same
config on different ranks

Pull Request resolved: https://github.com/pytorch/pytorch/pull/143307
Approved by: https://github.com/xmfan
2025-01-09 18:01:20 +00:00
Aaron Gokaslan
373541fbf4 [BE]: Remove unnecessary copy of gradients in util (#144329)
No need to copy gradients to CPU too

Pull Request resolved: https://github.com/pytorch/pytorch/pull/144329
Approved by: https://github.com/awgu, https://github.com/cyyever
2025-01-08 16:52:15 +00:00
James Wu
f2d6cfa677 Introduce CompileEventLogger, replace usages of metrics_context and chromium_event with it (#143420)
**Problem statement**: I want to be able to centralize and simplify the process by which people add columns/data to existing spans. We have MetricsContext and ChromiumEventLogger, and there's various choices you can make to decide where and when to log different levels of observability for your events. To resolve this, I want a central API for "adding to events under dynamo_timed".

**CompileEventLogger** is intended as a frontend for MetricsContext and ChromiumEventLogger so we can use the same class for handling everything.

CompileEventLogger is intended be used within a `dynamo_timed()` context. Its purpose is to 1. log to existing events that are in progress (i.e. within dynamo_timed), and 2. log instant events to chromium that are independent of any specific span.

CompileEventLogger has three log levels:

- CHROMIUM: Log only to chromium events, visible via tlparse.
- PT2_COMPILE: Log to chromium_events + pt2_compile_events
- COMPILATION_METRIC: Log to compilation metrics in addition to the toplevel chromium and pt2_compile_event.

In addition, we have a function CompileEventLogger.add() that automagically chooses the correct log level. For now, it is conservative, and will never automagically choose to log CompilationMetrics (though I could imagine it figuring out the metadata are all keys in CompilationMetric and therefore loggable there).

The goal here is to make one single interface to log stuff for observability reasons, and make it as easy as possible.

Not included in this diff:
- V1 of this diff will not have implementations of `increment` and `add_to_set` which MetricsContext has, so those usages are not replaced yet. But I'll add those in a followup.

- We don't handle `RuntimeMetricsContext`. It's unclear if I want that to be part of this, because under RuntimeMetricsContext there might not be a toplevel event to log to, so chromium events doesn't make sense in that context. So I might leave that separate for now.

Differential Revision: [D67346203](https://our.internmc.facebook.com/intern/diff/D67346203/)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/143420
Approved by: https://github.com/aorenste
2025-01-04 22:40:34 +00:00
Animesh Jain
816328fa51 [dynamo][lazy] LazyVT utils to get original value/source and is_hashable (#144160)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/144160
Approved by: https://github.com/williamwen42, https://github.com/jansel
ghstack dependencies: #144129, #144130, #144141, #144158, #144163
2025-01-04 06:23:05 +00:00
Animesh Jain
3292220c43 [dynamo][easy] Move symnode helpers to utils (#144158)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/144158
Approved by: https://github.com/williamwen42, https://github.com/jansel
ghstack dependencies: #144129, #144130, #144141
2025-01-04 02:52:58 +00:00
Animesh Jain
e296bab614 [dynamo] Remove DICT_SUBCLASS_GUARD_MANAGER and use dict.keys (#143722)
In hinsight, we never needed a DICT_SUBCLASS_GUARD_MANAGER, because Dynamo would inline through the overridden keys method. In this PR, we ensure that while creating guards and constructing variable trackers, we get the `d.keys()` value by using `dict.keys(d)`. This ensures that we do not call overridden keys method. Therefore, the C++ guard can use `PyDict_Next` directly to check the guards.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/143722
Approved by: https://github.com/jansel
2024-12-27 04:51:35 +00:00
PyTorch MergeBot
26364428f5 Revert "[dynamo] Remove DICT_SUBCLASS_GUARD_MANAGER and use dict.keys (#143722)"
This reverts commit fe95cbe018.

Reverted https://github.com/pytorch/pytorch/pull/143722 on behalf of https://github.com/wdvr due to failing internal tests ([comment](https://github.com/pytorch/pytorch/pull/143722#issuecomment-2563127017))
2024-12-26 22:04:36 +00:00
Animesh Jain
fe95cbe018 [dynamo] Remove DICT_SUBCLASS_GUARD_MANAGER and use dict.keys (#143722)
In hinsight, we never needed a DICT_SUBCLASS_GUARD_MANAGER, because Dynamo would inline through the overridden keys method. In this PR, we ensure that while creating guards and constructing variable trackers, we get the `d.keys()` value by using `dict.keys(d)`. This ensures that we do not call overridden keys method. Therefore, the C++ guard can use `PyDict_Next` directly to check the guards.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/143722
Approved by: https://github.com/jansel
2024-12-24 02:00:18 +00:00
Sam Larsen
4271a95590 [logging] A few fixes/updates to record_compilation_metrics (#143332)
Summary: Mostly cosmetic, but one bug fix:
* Bug fix: Make sure compile_id is converted to a string in the compilation metrics so it's printed as, e.g., "0/1" instead of "[0, 1]"
* Sort collections in `collection_to_str`
* Print non-string elements as `"<unknown>"` instead of None (since we don't expect non-strings)
* Move the population of the legacy metrics and any pre-processing to a new factory method in CompilationMetrics

Test Plan:
```
python test/dynamo/test_structured_trace.py
python test/dynamo/test_utils.py
```
Internal testing: https://fburl.com/scuba/dynamo_compile/sandbox/l0me8auf

Pull Request resolved: https://github.com/pytorch/pytorch/pull/143332
Approved by: https://github.com/ppanchalia
2024-12-23 23:10:11 +00:00
Oguz Ulgen
dc55704b48 Rename cache limit to recompile limit in configs (#143709)
This PR renames every cache_limit to recompile_limit via sed.

Old config options are maintained via Config(alias='xyz')

Pull Request resolved: https://github.com/pytorch/pytorch/pull/143709
Approved by: https://github.com/jansel
2024-12-22 10:03:57 +00:00
Animesh Jain
4627cfd1f9 [dynamo] Support user defined dicts (#143548)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/143548
Approved by: https://github.com/yanboliang, https://github.com/jansel, https://github.com/williamwen42
2024-12-21 01:46:14 +00:00
Simon Fan
d88ebbf822 cleanup chromium event log on dynamo exit rather than on entry (#143175)
clearing at dynamo start is an issue because it throws away events from compiled autograd

Pull Request resolved: https://github.com/pytorch/pytorch/pull/143175
Approved by: https://github.com/Skylion007, https://github.com/jamesjwu
ghstack dependencies: #141907
2024-12-21 00:41:24 +00:00
PyTorch MergeBot
ad7ab5ef84 Revert "[logging] A few fixes/updates to record_compilation_metrics (#143332)"
This reverts commit a9c753bbc8.

Reverted https://github.com/pytorch/pytorch/pull/143332 on behalf of https://github.com/malfet due to Surprisingly failure is caused by this PR ([comment](https://github.com/pytorch/pytorch/pull/143332#issuecomment-2557899120))
2024-12-21 00:06:44 +00:00
Sam Larsen
a9c753bbc8 [logging] A few fixes/updates to record_compilation_metrics (#143332)
Summary: Mostly cosmetic, but one bug fix:
* Bug fix: Make sure compile_id is converted to a string in the compilation metrics so it's printed as, e.g., "0/1" instead of "[0, 1]"
* Sort collections in `collection_to_str`
* Print non-string elements as `"<unknown>"` instead of None (since we don't expect non-strings)
* Move the population of the legacy metrics and any pre-processing to a new factory method in CompilationMetrics

Test Plan:
```
python test/dynamo/test_structured_trace.py
python test/dynamo/test_utils.py
```
Internal testing: https://fburl.com/scuba/dynamo_compile/sandbox/l0me8auf

Pull Request resolved: https://github.com/pytorch/pytorch/pull/143332
Approved by: https://github.com/ppanchalia
2024-12-20 21:42:32 +00:00
bobrenjc93
8850a7b62c add some logging for tensorify (#143391)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/143391
Approved by: https://github.com/jamesjwu
2024-12-19 20:06:26 +00:00
qiurc
90cc43f270 Support garbage collection after pt2 compilation (#143364)
Summary:
Support garbage collection after pt2 compilation.
Add jk to control the global rollout / rollback of this functionality
Add env var to control individual job's rollout

Test Plan:
Test the model training job with / without this changes

Reviewers:
@yuxihu @ezyang , @Yuzhen11 ,

Subscribers:

Tasks:

Tags:

Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/143364
Approved by: https://github.com/ezyang
2024-12-18 07:25:11 +00:00
Sam Larsen
60c54467db [logging] Log runtime autotuning timing to scuba (#141919)
See test plan in internal diff [D66679369](https://our.internmc.facebook.com/intern/diff/D66679369)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/141919
Approved by: https://github.com/jamesjwu, https://github.com/ezyang
2024-12-13 21:22:13 +00:00
Tom Ritchford
dc23f1944a Remove unused Python variables in torch/[_-a]* (#133492)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/133492
Approved by: https://github.com/albanD
2024-12-12 17:39:14 +00:00
Sam Larsen
30b61e521c [logging] Populate compile_time_autotune_time_us (#143104)
See testing in attached diff

Differential Revision: [D67128210](https://our.internmc.facebook.com/intern/diff/D67128210)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/143104
Approved by: https://github.com/ezyang
2024-12-12 17:08:43 +00:00
PyTorch MergeBot
5c97ac9721 Revert "Remove unused Python variables in torch/[_-a]* (#133492)"
This reverts commit fda975a7b3.

Reverted https://github.com/pytorch/pytorch/pull/133492 on behalf of https://github.com/clee2000 due to Sorry, I need to revert this in order to revert something else.  The only thing you need to do is rebase and remerge ([comment](https://github.com/pytorch/pytorch/pull/133492#issuecomment-2536635516))
2024-12-11 17:29:12 +00:00
Tom Ritchford
fda975a7b3 Remove unused Python variables in torch/[_-a]* (#133492)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/133492
Approved by: https://github.com/albanD
2024-12-10 21:48:44 +00:00
Sam Larsen
a751558467 [logging] Fix bug involving missing compilation_metrics fields in tlparse logs (#142423)
Summary: The line of code that's compiling the set of compilation_metrics to include in the corresponding tlparse log is missing the "legacy" and "common" fields populated above. Fix is to make sure we consider all fields in the compilation_metrics object.

Test Plan:
Before: https://fburl.com/d6em8csg (e.g, https://fburl.com/c19s7ny0)
After: https://fburl.com/5zr6kbvf (e.g, https://fburl.com/3hp14ht2)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/142423
Approved by: https://github.com/ezyang
2024-12-10 15:58:43 +00:00
Sam Larsen
692b5e75ed [logging] Add triton_compile_time_us column to dynamo_compile (#142068)
Test Plan: See internal diff [D66799565](https://www.internalfb.com/diff/D66799565)

Differential Revision: [D66799565](https://our.internmc.facebook.com/intern/diff/D66799565)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/142068
Approved by: https://github.com/c00w
2024-12-06 16:11:57 +00:00
James Wu
288b73cb14 [Redo] Set remote cache version and backend type once in compilation metrics (#141967)
(Got reverted due to a silly bug, fixed now.)

This is causing FbFxGraphRemoteCache.init to no longer be idempotent, i.e. only safe to call once per compile. AOTAutogradCache initializes a new remote cache for the forward and the backward.
Technically, we could make AOTAutogradCache smart and globally thread through a single FbFxGraphRemoteCache everywhere. But there's no reason to do so, as this class is just the handle to access the cache. Plus, it's very brittle for FbFxGraphRemoteCache to not be safe to call multiple times

Differential Revision: [D66701970](https://our.internmc.facebook.com/intern/diff/D66701970/)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/141967
Approved by: https://github.com/laithsakka
2024-12-04 03:07:53 +00:00
Colin L. Rice
6b620423a3 dynamo_timed: Add a log_waitcounter option. (#141402)
This logs a waitcounter of the name pytorch.dynamo_timed.{key}.

Primarily sending this now to make sure everyone likes the API, then
I'll add tests, and migrate one dynamo_timed to use it. (likely starting
with
https://github.com/pytorch/pytorch/pull/141379).

Testing is a bit harder, since we don't normally have any way to read
_WaitCounter state AFAICT. I want to poke around and see if I can figure
out a way to read the state, otherwise I'll just mock it to at least
make sure it's mostly working.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/141402
Approved by: https://github.com/jamesjwu, https://github.com/masnesral
2024-12-03 19:24:29 +00:00
Ryan Guo
7c3c8a662e [dynamo] Add RANGE_ITERATOR_MATCH to properly guard on range iterators (#141902)
A subsequeunt patch attempts to fix a side-effect issue for range
iterators, which in turn exposed an exising issue on guards for range
iterators -- the following test started failing:
```
PYTORCH_TEST_WITH_DYNAMO=1 python test/test_tensor_creation_ops.py TestTensorCreationCPU.test_hstack_column_stack_cpu_int16
```

This patch adds a `RANGE_ITERATOR_MATCH` guard to make sure that we
properly guard on range iterators, and adds a regression test.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/141902
Approved by: https://github.com/jansel
ghstack dependencies: #141713, #141714, #141715
2024-12-03 09:18:06 +00:00
PyTorch MergeBot
ce86119503 Revert "Set remote cache version and backend type once in compilation metrics (#141707)"
This reverts commit d633cf1f55.

Reverted https://github.com/pytorch/pytorch/pull/141707 on behalf of https://github.com/malfet due to It breaks tests by referencing FbRemoteFxGraphCache, but CI was green ([comment](https://github.com/pytorch/pytorch/pull/141707#issuecomment-2513555185))
2024-12-03 05:01:02 +00:00
Aaron Gokaslan
08db735629 [BE]: Update mypy to 1.13.0 (#140808)
Update mypy to 1.13.0 . Should hopefully reduce linting time. Has support for orjson cache serialization which should improve mypy cache perf if orjson is installed.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/140808
Approved by: https://github.com/ezyang, https://github.com/malfet
2024-12-03 02:50:10 +00:00
James Wu
d633cf1f55 Set remote cache version and backend type once in compilation metrics (#141707)
This is causing FbFxGraphRemoteCache.init to no longer be idempotent, i.e. only safe to call once per compile. AOTAutogradCache initializes a new remote cache for the forward and the backward.
Technically, we could make AOTAutogradCache smart and globally thread through a single FbFxGraphRemoteCache everywhere. But there's no reason to do so, as this class is just the handle to access the cache. Plus, it's very brittle for FbFxGraphRemoteCache to not be safe to call multiple times.

(Same problem, different fix of D66502138)

Differential Revision: [D66508492](https://our.internmc.facebook.com/intern/diff/D66508492/)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/141707
Approved by: https://github.com/ezyang
2024-12-03 01:49:11 +00:00
PyTorch MergeBot
daa77f3d9f Revert "[BE]: Update mypy to 1.13.0 (#140808)"
This reverts commit 00134d68af.

Reverted https://github.com/pytorch/pytorch/pull/140808 on behalf of https://github.com/huydhn due to This is failing a distributed test in trunk, target determination missed this test and did not run it on PR ([comment](https://github.com/pytorch/pytorch/pull/140808#issuecomment-2512788426))
2024-12-02 20:47:43 +00:00
Aaron Gokaslan
00134d68af [BE]: Update mypy to 1.13.0 (#140808)
Update mypy to 1.13.0 . Should hopefully reduce linting time. Has support for orjson cache serialization which should improve mypy cache perf if orjson is installed.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/140808
Approved by: https://github.com/ezyang, https://github.com/malfet
2024-12-02 18:47:54 +00:00
Ryan Guo
3141e038f0 [dynamo] Fix VariableBuilder._wrap on frozenset and enforce invariants on ConstantVariable (#141504)
Prior to this patch, we are using `ConstantVariable.create` to create VT
for frozenset objects, and intended yet failed to predicate that on all
itmes being literals (see https://github.com/pytorch/pytorch/pull/140984#discussion_r1847393736).

The code was from https://github.com/pytorch/torchdynamo/commit/7c03434 and
the original goal was to help DBR quantization, but as the new test in
this patch shows, it could lead to silent incorrectness.

Upon a closer look, this exposes some subtleties in how Dynamo handles
`ConstantVariable` and `LOAD_CONST`, so this patch both fixes the
aforementioned issue and documents, enforces, and makes explicit the
invariants around `ConstantVariable` and `LOAD_CONST` -- only immutable
objects are supported.

Specifically, this patch:
1. refine the checks for wrapping a `frozenset` object, document why we
   can't just wrap its items directly due to lack of `Sourcec` for set
   items, and use a safe workaround (`SourcelessBuilder`) to ensure
   soundness while keeping the DBR quantization support.
2. Adds more types to `common_constant_types`, thereby making
   `ConstantVariable.is_base_literal` more lenient, and strictly checks
   this property in the constructor of `ConstantVariable`.
3. Change relevant uses of `create_instruction("LOAD_CONST", ...)` to
   `create_load_const` which checks `is_safe_constant`, and makes
   developer overrides explicit by using `create_load_const_unchecked`
   when needed.
4. In a few places, use more specific `VariableTracker`, e.g.,
   `TypingVariable` rather than `ConstantVariable`, and
   `FrozensetVariable` rather than `SetVariable`.

(2) and (3) are mainly to future-proof Dynamo against bugs like (1).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/141504
Approved by: https://github.com/jansel
2024-11-27 21:58:35 +00:00
James Wu
a7ca6a9113 Enable autograd cache on inductor tests (#140890)
This turns on AOTAutogradCache for all inductor tests. It clears AOTAutogradCache on each test as well, by virtue of the local cache using the same directory to store cache entries.

I've also tested with INDUCTOR_TEST_DISABLE_FRESH_CACHE=1, running all the tests. AOTAutogradCache successfully caches 99% of these. There are a few tests that use view_replay and therefore save functional tensors, which cause AOTAutogradCache to fail to pickle its result. Will look into next steps there, but for now, it seems okay if the cache just misses on those cases where it can't serialize the result. It would be better to check before pickling, though.

I've made the following small bugfixes to get this working:
- Inductor is sometimes used in a standalone mode without dynamo, which leads to attribute errors in check_can_cache. In general, we should *never* crash in cache checking, only bypass. So I change a try catch to check Exception instead of just a specific exception.
- Add extra structured logging for metadata on cache hits

Pull Request resolved: https://github.com/pytorch/pytorch/pull/140890
Approved by: https://github.com/bdhirsh
2024-11-27 20:41:43 +00:00
Xuehai Pan
cdde73033e [dynamo] fix generic namedtuple support when the class is created via class MyTuple(NamedTuple, Generic[T]): ... (#141360)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/141360
Approved by: https://github.com/jansel
2024-11-27 00:21:58 +00:00
Sam Larsen
07906f2f2b [logging] Move population of common MetricsContext fields to record_compilation_metrics (#141291)
Summary: Fix outstanding TODOs related to logging of CompilationMetrics by moving the population of common fields to record_compilation_metrics() instead of populating those independently wherever we use a the metrics_context contextmanager:
* Keep track of start and end time in MetricsContext and pass those to record_compilation_metrics() and populate those fields in that function.
* Pass exception info to record_compilation_metrics() and populate those field in that function.
* Add a new contextmanager, chromium_event_timed, to create the start/end "dynamo" event. This is important because I want this contextmanager to complete _after_ building the CompilationMetrics.
* Populate the compile_id field centrally in record_compilation_metrics().
* Populate the structured_logging_overhead centrally in record_compilation_metrics().
* Add the CompilationMetrics to the current chromium event in record_compilation_metrics(), after all common fields have been added. In a future diff, I can also add _all_ compilation metrics to the chromium event.

Test plan: Unit tests. Also see internal testing:
* dynamo_compile: https://fburl.com/scuba/dynamo_compile/sandbox/jrascnf9
* pt2_compile_events: https://fburl.com/scuba/pt2_compile_events/l3jnla06
* tlparse: https://fburl.com/bq5a9nqs

Pull Request resolved: https://github.com/pytorch/pytorch/pull/141291
Approved by: https://github.com/jamesjwu
2024-11-25 13:18:40 +00:00
Colin L. Rice
1aea642393 pytorch/feature: Record if inductor fx cache is enabled (#141059)
This uses the underlying infrastructure and records if the fx cache is
enabled.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/141059
Approved by: https://github.com/masnesral
2024-11-23 01:55:27 +00:00
Jovian Anthony Jaison
45d62d6fc5 [dynamo] Added cuda and triton versions to dynamo_compile (#141290)
Opening another PR since #141140 was reverted.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/141290
Approved by: https://github.com/masnesral
2024-11-22 20:04:42 +00:00
Simon Fan
db4e8a1d8a [ca] expose option to collect sizes as dynamic (#141153)
This is to address recompiles from eager nodes that saved dynamic activations

Pull Request resolved: https://github.com/pytorch/pytorch/pull/141153
Approved by: https://github.com/jansel
ghstack dependencies: #141152
2024-11-22 19:26:27 +00:00
Colin L. Rice
f5d00f1456 pytorch/features: Make a feature logger and record triton bundling (#141056)
This modifies metrics_context to allow us to store whether a feature was
used or not.

This also starts recording this for triton bundling.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/141056
Approved by: https://github.com/masnesral
2024-11-22 01:31:08 +00:00
Prajesh Praveen Anchalia
4e34fbdcbc Add inductor_fx_graph_cache stats to dynamo_utils (#141190)
Summary:
Add the following inductor fx graph cache stats to dynamo compile

- inductor_fx_cache_hit_count
- inductor_fx_cache_miss_count
- inductor_fx_cache_backend_type
- inductor_fx_cache_hit_keys
- inductor_fx_cache_miss_keys
- remote_cache_version

Test Plan: Run local tests and staging logger: P1683061460

Differential Revision: D66232206

Pull Request resolved: https://github.com/pytorch/pytorch/pull/141190
Approved by: https://github.com/masnesral
2024-11-21 20:59:10 +00:00
Ivan Zaitsev
149677e30c Revert "[dynamo] Added cuda and triton versions to dynamo_compile" (#141280)
Reverts pytorch/pytorch#141140

reason: conflicts with https://github.com/pytorch/pytorch/pull/141190 and wasn't merged using mergebot

Pull Request resolved: https://github.com/pytorch/pytorch/pull/141280
Approved by: https://github.com/clee2000, https://github.com/kit1980
2024-11-21 20:50:06 +00:00
Jovian Anthony Jaison
11d0ba068f
[dynamo] Added cuda and triton versions to dynamo_compile (#141140)
[dynamo] Added cuda and triton versions to dynamo_compile (#141140)

Summary:

Add cuda and triton versions to dynamo_compile logging site.

Test Plan:
$ buck2 run mode/opt //scripts/oulgen:runner
File changed: fbcode//caffe2/torch/_dynamo/convert_frame.py
Buck UI: https://www.internalfb.com/buck2/1a8ada1f-d54e-44b2-a368-b2ff2030e113
Network: Up: 65KiB  Down: 0B  (reSessionID-8f4d1d6d-a680-4ecc-8e73-c29c932d824b)
Jobs completed: 2166. Time elapsed: 7.0s.
Cache hits: 0%. Commands: 3 (cached: 0, remote: 0, local: 3)
BUILD SUCCEEDED
...
Cuda: 12.4.0
Triton: 3.0.0

Reviewed By: masnesral

Differential Revision: D66181508
2024-11-21 12:20:02 -08:00
Aaron Gokaslan
12e95aa4ee [BE]: Apply PERF401 autofixes from ruff (#140980)
* Automatically applies ruff rule 401. Turns loops into equivalent list comprehensions which are faster and do not leak the scope of the loop variables.
* list comprehensions not only often have better typing, but are 50+% faster than for loops on overhead. They also preserve length information etc and are better for the interpreter to optimize.
* Manually went back and made mypy happy after the change.
* Also fixed style lints in files covered by flake8 but not by pyfmt

Pull Request resolved: https://github.com/pytorch/pytorch/pull/140980
Approved by: https://github.com/justinchuby, https://github.com/malfet
2024-11-20 17:52:07 +00:00
Sam Larsen
ff17d2b83e [easy][logging] Remove dynamo_timed fwd_only param (#140993)
Summary: It's ignored; remove it

Test Plan: CI

Pull Request resolved: https://github.com/pytorch/pytorch/pull/140993
Approved by: https://github.com/ezyang
2024-11-20 02:31:51 +00:00
Prajesh Praveen Anchalia
1e234e63b3 [pytorch][dynamo_compile] Log inductor config to dynamo_compile (#140790)
Summary:
Scrubbed inductor config logging to dynamo_compile as json:str.

Scrub RE: `r'((^TYPE_CHECKING$)|(.*_progress$)|(.*TESTING.*)|(.*(rocm|halide).*)|(^trace\..*)|(^_))'`to save some space.

Test Plan:
Staging logger: https://fburl.com/data/ltkt08zm

P1679697917

{F1958428018}

Differential Revision: D65806399

Pull Request resolved: https://github.com/pytorch/pytorch/pull/140790
Approved by: https://github.com/masnesral
2024-11-19 02:39:33 +00:00
James Wu
8d5b3eeaa6 Remove __start__ stack, log backward compile to empty stack (#140431)
Summary:
This diff removes "__start__" from all stacks in Pt2 Compile Events, as it's unnecessary.

It also starts logging events for backward compile, because otherwise we have no toplevel event representing full backward compilation. This gives us a toplevel event outside of the inductor compile.

Test Plan:
New chromium events:

https://interncache-all.fbcdn.net/manifold/perfetto-artifacts/tree/ui/index.html?url=https%3A%2F%2Finterncache-all.fbcdn.net%2Fmanifold%2Ftlparse_reports%2Ftree%2Flogs%2Fjjwu%2Fcustom%2Fstuff4%2Fchromium_events.json#!/viewer?url=https%3A%2F%2Finterncache-all.fbcdn.net%2Fmanifold%2Ftlparse_reports%2Ftree%2Flogs%2Fjjwu%2Fcustom%2Fstuff4%2Fchromium_events.json&local_cache_key

New tlparse:
https://interncache-all.fbcdn.net/manifold/tlparse_reports/tree/logs/jjwu/custom/stuff4/index.html

New scuba icicle view, still good: https://fburl.com/scuba/pt2_compile_events/z6gr3z53

Differential Revision: D65832045

Pull Request resolved: https://github.com/pytorch/pytorch/pull/140431
Approved by: https://github.com/masnesral
2024-11-18 22:48:31 +00:00
Bob Ren
e1d6c08f3d Specialize symfloats when getting fake value involves complex args (#140832)
Fixed `PYTORCH_TEST_WITH_DYNAMO=1 tlp python test/test_sparse_csr.py TestSparseCSRCPU.test_sampled_addmm_cpu_complex64` when `specialize_float=False`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/140832
Approved by: https://github.com/ezyang
ghstack dependencies: #140830
2024-11-17 18:17:54 +00:00
Xuehai Pan
90d3584147 [dyanmo] support subclasses of namedtuple type (#140534)
Allow subclassing namedtuple type. Allow assign attributes to instances of these subtypes.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/140534
Approved by: https://github.com/jansel
2024-11-17 14:13:40 +00:00
Sam Larsen
e2e67a010a [logging] Add dynamo_compile fields for pre-dispatch/joint/post-dispatch times (#140306)
Tested internally: P1679622670

Differential Revision: [D65986059](https://our.internmc.facebook.com/intern/diff/D65986059)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/140306
Approved by: https://github.com/ezyang
2024-11-15 15:02:08 +00:00
Sam Larsen
b11ff3cf60 [logging] Overhaul dynamo_timed and CompilationMetrics logging. (#139849)
Here's the overview:

There's a new contextmanager singleton called MetricsContext. Entering the MetricsContext is how we demarcate the boundary on which we'll create a single CompilationMetrics object, and therefore, a single dynamo_compile log entry. While we're inside the MetricsContext, we can update/set many different metrics. Most importantly: `dynamo_timed` can also update the in-progress MetricsContext. In the proposal here, we tell `dynamo_timed` that we want it to do so by providing the name of the MetricsContext field to increment. There can be many `dynamo_timed` calls in different parts of the code updating different fields. Then when the MetricsContext exits, that's when the logging of everything gathered finally happens. One potential footgun is trying to use `dynamo_timed` when we haven't entered the MetricsContext, but we assert on that problem. Another problem is that we re-enter the context recursively, but we watch for that and do the logging only when the outermost exits.

Some specifics:
* Introduce MetricsContext - a context manager that on exit, records the CompilationMetrics (which also logs to dynamo_compile).
* Completely remove the concept of frame_phase_timing. Instead, update the MetricsContext during compilation, either directly or via dynamo_timed.
* Remove some globals we previously used to accumulate counters to later populate a CompilationMetrics. We use CompilationMetrics set/update/increment APIs instead.
* `record_compilation_metrics` is now called on exit from MetricsContext.
* Populate legacy CompilationMetrics fields right before logging, inside `record_compilation_metrics`.
* Remove the one-off `add_remote_cache_time_saved` helper; capture that timing directly into the MetricsContext.

And specifically, several changes to dynamo_timed:
* "Modernize" the parameters and update all callsites accordingly.
* Move the backwards logging of the CompilationMetrics to the backwards compile location.
* Add a parameter for which CompilationMetrics field to update

Pull Request resolved: https://github.com/pytorch/pytorch/pull/139849
Approved by: https://github.com/ezyang
2024-11-14 19:11:20 +00:00
Prajesh Praveen Anchalia
9ff368c270 [pytorch] Add logger for pt2 compile chromium events to hive (#139941)
Summary:
X-link: https://github.com/pytorch/benchmark/pull/2535

Logging raw chromium events to hive per job run enables us to build combined rank perfetto traces without having to depend on Logarithm and deal with things like rate limits etc.

We can easily build a utility to query hive and upload traces to manifold and view them on perfetto

Test Plan:
Launch a job

```
buck2 run mode/opt //aps_models/examples/dlrm:dlrm_train_app -- --config-name train_mast_fsdp_torchdynamo launcher.data_project=apf_ai_infra launcher.fbl_entitlement=ai_infra_training_rnd_tc  launcher.hardware=TC_ANY_80G
```

Local run
```
Perfetto: ['https://interncache-all.fbcdn.net/manifold/perfetto-artifacts/tree/ui/index.html?url=https://interncache-all.fbcdn.net/manifold/pt2_compile_traces_test/tree/pt2_trace_files/aps-ppanchalia-426838c277/0/0/2bc9975d-921c-4766-9cb2-e7ce9833ae96.json']
```

{F1954710538}

Differential Revision: D65525513

Pull Request resolved: https://github.com/pytorch/pytorch/pull/139941
Approved by: https://github.com/jamesjwu
2024-11-14 18:27:38 +00:00
Laith Sakka
f98c601efe Avoid logging zeros (#139968)
Summary: title

Test Plan: NA

Differential Revision: D65582953

Pull Request resolved: https://github.com/pytorch/pytorch/pull/139968
Approved by: https://github.com/zou3519
2024-11-14 15:46:49 +00:00
PyTorch MergeBot
d63eb3c46c Revert "[logging] Overhaul dynamo_timed and CompilationMetrics logging. (#139849)"
This reverts commit cb15c15157.

Reverted https://github.com/pytorch/pytorch/pull/139849 on behalf of https://github.com/kit1980 due to Breaking an internal tests + there is a bug according to the author ([comment](https://github.com/pytorch/pytorch/pull/139849#issuecomment-2474459094))
2024-11-13 18:47:51 +00:00
Sam Larsen
cb15c15157 [logging] Overhaul dynamo_timed and CompilationMetrics logging. (#139849)
Here's the overview:

There's a new contextmanager singleton called MetricsContext. Entering the MetricsContext is how we demarcate the boundary on which we'll create a single CompilationMetrics object, and therefore, a single dynamo_compile log entry. While we're inside the MetricsContext, we can update/set many different metrics. Most importantly: `dynamo_timed` can also update the in-progress MetricsContext. In the proposal here, we tell `dynamo_timed` that we want it to do so by providing the name of the MetricsContext field to increment. There can be many `dynamo_timed` calls in different parts of the code updating different fields. Then when the MetricsContext exits, that's when the logging of everything gathered finally happens. One potential footgun is trying to use `dynamo_timed` when we haven't entered the MetricsContext, but we assert on that problem. Another problem is that we re-enter the context recursively, but we watch for that and do the logging only when the outermost exits.

Some specifics:
* Introduce MetricsContext - a context manager that on exit, records the CompilationMetrics (which also logs to dynamo_compile).
* Completely remove the concept of frame_phase_timing. Instead, update the MetricsContext during compilation, either directly or via dynamo_timed.
* Remove some globals we previously used to accumulate counters to later populate a CompilationMetrics. We use CompilationMetrics set/update/increment APIs instead.
* `record_compilation_metrics` is now called on exit from MetricsContext.
* Populate legacy CompilationMetrics fields right before logging, inside `record_compilation_metrics`.
* Remove the one-off `add_remote_cache_time_saved` helper; capture that timing directly into the MetricsContext.

And specifically, several changes to dynamo_timed:
* "Modernize" the parameters and update all callsites accordingly.
* Move the backwards logging of the CompilationMetrics to the backwards compile location.
* Add a parameter for which CompilationMetrics field to update

Pull Request resolved: https://github.com/pytorch/pytorch/pull/139849
Approved by: https://github.com/ezyang
ghstack dependencies: #140094
2024-11-11 14:24:23 +00:00
Animesh Jain
86792a5a8d [invoke_subgraph] User facing API to support arbitrary args and kwargs (#139162)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/139162
Approved by: https://github.com/zou3519
2024-11-08 03:31:19 +00:00
Bob Ren
d8afa21ef2 specialize symfloats for wrapped_gradient in get_fake_value (#139935)
Fixes `PYTORCH_TEST_WITH_DYNAMO=1 python test/test_torch.py TestTorchDeviceTypeCPU.test_gradient_type_promotion_cpu` when `specialize_float=False`

Reviewers might wonder why we need to have this whitelist. Can't we rely on python_arg_parser.h to do the specialization generically? Alas this path doesn't actually FFI to C++ so we do need to do the specialization in pythonland.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/139935
Approved by: https://github.com/ezyang
ghstack dependencies: #139569, #139457, #139568, #139572, #139846, #139454, #139896
2024-11-07 20:27:02 +00:00
Oguz Ulgen
1270c78268 Add logging for num_triton_bundles (#139807)
Summary: Adding logs for number of inductor cache triton bundles

Test Plan:
Ran adhoc code and looked at dynamo_compile/sandbox

https://fburl.com/scuba/dynamo_compile/sandbox/nhktfy19

Differential Revision: D65490826

Pull Request resolved: https://github.com/pytorch/pytorch/pull/139807
Approved by: https://github.com/masnesral
2024-11-06 21:11:04 +00:00
Laith Sakka
3f248a5735 Classify miss-inplaced tensors in logs. (#139240)
Summary:
use signpost logs,
a followup is to remove the field possibly_missed_reinplacing_opportunities form dynamo compile table.

Differential Revision: D65180194

Pull Request resolved: https://github.com/pytorch/pytorch/pull/139240
Approved by: https://github.com/zou3519
2024-11-04 23:56:14 +00:00
Bob Ren
9919932783 Specialize symfloats that flow through is_integer (#139572)
Fixes `python test/dynamo/test_dynamic_shapes.py DynamicShapesFunctionTests.test_number_method_method_is_integer_num_type6_dynamic_shapes` when specialize_float = False

Pull Request resolved: https://github.com/pytorch/pytorch/pull/139572
Approved by: https://github.com/ezyang
ghstack dependencies: #139569, #139457, #139568
2024-11-04 23:35:35 +00:00
James Wu
c8a648d4df Add option to dynamo_timed and chromium_event_logger for logging pt2 compile events (#139309)
This diff considerably changes the column format of PT2 Compile Events:

- Now, instead of logging one new column per every piece of metadata, we just log a single column, "metadata". This vastly decreases the number of columns we need to log, which should help with retention.

- Now, we only log to scuba for a set of dynamo_timed() events that we actually care about aggregating. To do so, we add a boolean to dynamo_timed() that decides whether or not to log a pt2_compile_event. We'll always log a chromium_event for every dynamo_timed(), but only log a subset of those to scuba.

Differential Revision: [D65225598](https://our.internmc.facebook.com/intern/diff/D65225598/)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/139309
Approved by: https://github.com/oulgen
2024-11-01 02:40:25 +00:00
James Wu
864beebb41 [easy] Add start event metadata to collected metadata for PT2 Compile Events (#139289)
We should be logging metadata from event starts to PT2 Compile Events too.

Differential Revision: [D65070086](https://our.internmc.facebook.com/intern/diff/D65070086/)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/139289
Approved by: https://github.com/oulgen
2024-10-31 16:52:30 +00:00
Simon Fan
fd9f4e6770 Back out "[compiled autograd] tls access helpers (#138061)" and Back out "[compiled autograd] Compiled autograd configs in TLS (#137821)" (#139086)
Summary:
Original commit changeset: 9bf80c1492d7

Original Phabricator Diff: D64796226

Original commit changeset: aa1d9ef8f6e6

Original Phabricator Diff: D64796212

Differential Revision: D65072644

Pull Request resolved: https://github.com/pytorch/pytorch/pull/139086
Approved by: https://github.com/malfet
2024-10-28 23:37:05 +00:00
William Wen
35be6aef69 [dynamo] add some cpython debugging methods (#138030)
This PR enables you to inspect PyObjects in C using `INSPECT(...)` without requiring https://docs.python.org/3/howto/gdb_helpers.html. `torch._dynamo.eval_frame.raise_sigtrap` can also be used to set gdb breakpoints while running Python code, e.g.

```python
x = x + 1
torch._dynamo.eval_frame.raise_sigtrap();
# can breakpoint on ceval.c:CALL to breakpoint the `sin` call in C.
x = torch.sin(x)
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/138030
Approved by: https://github.com/jansel
2024-10-28 22:25:21 +00:00
Edward Z. Yang
bca696ae81 Switch times to us in CompilationMetrics and improvements (#138975)
Companion logger diff: https://www.internalfb.com/diff/D65012523

* Using float seconds for timestamps is bad because our internal system defaults to float32 precision and you don't even get second precision for timestamps in float32
* We decide to use microseconds instead of milliseconds because millisecond granularity you can end up with the same timestamp if compilation is happening very quickly; much better to force non-overlapping spans
* Because there are so many new fields and I don't feel like reimplementing each on BwdCompilationMetrics, BwdCompilationMetrics is no more, it's just that everything in CompilationMetrics is now optional.
* The actual frame compile times collection is not modified (still float) to reduce blast radius, so I just convert to microseconds before making the record. At float64 precision (Python's default), you get about microsecond precision on timestamps so shouldn't be a data problem (https://www.leebutterman.com/2021/02/01/store-your-unix-epoch-times-as-float64.html)
* I rename some entries for clarity. In particular, whenever a timing contains all of the its lower phases (e.g., how Inductor also contains Triton compilation) we put "cumulative" in its name.  If something doesn't happen at compile time but is delayed until we have actual real inputs, we put "runtime" in its name.

Test plan:

```
buck2 run @mode/opt @mode/inplace //scripts/oulgen:runner
```

And then inspect https://fburl.com/scuba/dynamo_compile/sandbox/mslu7f5w and verify the us columns are populated and meaningful.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/138975
Approved by: https://github.com/masnesral
2024-10-28 17:17:18 +00:00
Aaron Gokaslan
49ed365b22 [BE]: Update Typeguard to TypeIs for better type inference (#133814)
Uses TypeIs instead of TypeGuard for better inference. See https://peps.python.org/pep-0742/

Pull Request resolved: https://github.com/pytorch/pytorch/pull/133814
Approved by: https://github.com/ezyang
2024-10-26 15:07:13 +00:00
Sam Larsen
86b45bde19 [pt2] Add logger logging for remote fx graph cache get + put (#138164)
Summary: Capture the timing for the remote fx graph cache get and put operations and add them to the logger logging.

Test Plan:
1) Landed D64483593 and waited for logger actualization.
2) Ran test script on devserver: `buck2 run mode/opt scripts/slarsen/torch_compile_model:run`
3) Queried dynamo_compile/sandbox:
```
(pytorch-3.10_4) devvm2296:~/local/pytorch-3.10_4  $ scuba -e="select time,co_filename,remote_fx_graph_cache_get_time_s,remote_fx_graph_cache_put_time_s from \`dynamo_compile/sandbox\` where remote_fx_graph_cache_put_time_s is not null"
+------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------+----------------------------------+
|    time    |                                                                                    co_filename                                                                                    | remote_fx_graph_cache_get_time_s | remote_fx_graph_cache_put_time_s |
+------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------+----------------------------------+
| 1729136266 | null                                                                                                                                                                              |              0.05652284622192383 |               0.9691152572631836 |
| 1729136263 | /data/users/slarsen/fbsource/buck-out/v2/gen/fbcode/289bb46b326874c6/scripts/slarsen/torch_compile_model/__run__/run-inplace#link-tree/scripts/slarsen/torch_compile_model/run.py |               0.8298435211181641 |              0.18642282485961914 |
+------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------+----------------------------------+
```

Reviewed By: oulgen

Differential Revision: D64484025

Pull Request resolved: https://github.com/pytorch/pytorch/pull/138164
Approved by: https://github.com/jamesjwu, https://github.com/ezyang
2024-10-25 21:30:18 +00:00
Animesh Jain
cfdf658a91 [dynamo][modules] Support overridden __call__ on nn modules (#138619)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/138619
Approved by: https://github.com/williamwen42
ghstack dependencies: #138657
2024-10-24 03:49:26 +00:00
Animesh Jain
b1acd0978e [dynamo] Support range_iterator as a function input (#138657)
Fixes https://github.com/pytorch/pytorch/issues/138654

Pull Request resolved: https://github.com/pytorch/pytorch/pull/138657
Approved by: https://github.com/williamwen42, https://github.com/jansel
2024-10-24 03:49:26 +00:00
James Wu
a16476b671 Add support for adding extra metadata to chromium events, log to separate columns (#138477)
This diff does a few things:

## Add metadata to events in progress
Adds the ability to add extra metadata to Chromium Events via `add_event_data`.
Metadata can only be added to chromium events that have started, but not ended (so, in progress events)
- When you add the data, the metadata is appended to the metadata when you call log_event_end().
- The metadata appears in chromium events in tlparse. It also gets logged to scuba.

## New `dynamo` chromium event
We add a new `dynamo` chromium event to the top of the stack, where we collect various metadata found in dynamo_compile. So the new order of events goes:

```
__start__
-> dynamo (dynamo compile metrics)
-> entire_frame_compile (compile.inner)
-> backend_compile (i.e. aotdispatch)
-> create_aot_dispatch_function
-> inductor_compile
-> ...
```

BackwardCompilationMetrics doesn't have any dynamo specific information (as it's mostly inductor timings). So we don't include that here.

*FAQ: Why can't we use `entire_frame_compile` as the event?*
This is mostly due to backward compatibility with `dynamo_compile`. `dynamo_compile` collects CompilationMetrics outside of `compile.compile_inner`, and uses `dynamo_timed` to grab timings from phases of the compiler, including `entire_frame_compile`. So we don't have a CompilationMetric object until after an `entire_frame_compile` event ends! Separately, `dynamo` as a name for all of dynamo compile is more descriptive than `entire_frame_compile`, imo.

## Log metadata as separate columns
(Meta only): Separately, this also changes the `metadata` column in PT2 Compile Events. Instead of logging a single metadata column in JSON, it separates the JSON into separate columns. This is much better for data analysis. Now that this table is more mature, I think logging keys to separate columns is a better system.Differential Revision: [D64696287](https://our.internmc.facebook.com/intern/diff/D64696287/)

**NOTE FOR REVIEWERS**: This PR has internal Meta-specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D64696287/)!

Pull Request resolved: https://github.com/pytorch/pytorch/pull/138477
Approved by: https://github.com/aorenste
2024-10-22 21:17:44 +00:00
Simon Fan
5a13282c75 [compiled autograd] tls access helpers (#138061)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/138061
Approved by: https://github.com/yf225
ghstack dependencies: #137953, #137821
2024-10-22 08:03:52 +00:00
Simon Fan
49fa437097 [compiled autograd] Compiled autograd configs in TLS (#137821)
Multithreaded doesn't work yet, this adds python side TLS only for the python side state

Pull Request resolved: https://github.com/pytorch/pytorch/pull/137821
Approved by: https://github.com/jansel, https://github.com/yf225
ghstack dependencies: #137953
2024-10-22 08:03:52 +00:00
Sam Larsen
a80b87353c [pt2] Log is_forward field to dynamo_compile scuba table (#138505)
Differential Revision: [D64711721](https://our.internmc.facebook.com/intern/diff/D64711721)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/138505
Approved by: https://github.com/oulgen
2024-10-22 05:50:49 +00:00
Tom Ritchford
8ad191ae21 [dynamo] Replace __str__ with __repr__ in some places (#136316)
## The problem

In a typical debugger, `repr()` is used to display variables and not `str()`.

Several classes in Dynamo have a `__str__()` method that returns useful information and a  `__repr__()` that does not. Having to call `str(x)` or `[str(i) for i in x]` in the debugger all the time is a chore.

`str()` should be ["informal, nicely printable"](https://docs.python.org/3/library/stdtypes.html#str) and `repr()` should ["attempt to return a string that would yield an object with the same value when passed to eval()](https://docs.python.org/3/library/functions.html#repr)".

## The solution

In the Python object model, if there is no `__str__` method, `__repr__`  is used instead (but not the other way around).

So renaming `__str__` to `__repr__` in a few cases where no `__repr__` method exists now should not change observable behavior, and should make debugging easier.

The specific classes changed were all in `torch._dynamo.variables`:

* `builtin.BuiltinVariable`
* `constant.ConstantVariable`
* `constant.EnumVariable`
* `functions.UserMethodVariable`
* `lazy.LazyVariableTracker`
* `lazy.LazySymNodeFormatString`
* `misc.GetAttrVariable`
* `misc.NullVariable`
* `user_defined.UserDefinedObjectVariable`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/136316
Approved by: https://github.com/XuehaiPan, https://github.com/jansel
2024-10-21 19:50:38 +00:00
PyTorch MergeBot
32d4582e02 Revert "[BE]: Update Typeguard to TypeIs for better type inference (#133814)"
This reverts commit 16caa8c1b3.

Reverted https://github.com/pytorch/pytorch/pull/133814 on behalf of https://github.com/jeanschmidt due to checking if this will solve inductor errors ([comment](https://github.com/pytorch/pytorch/pull/133814#issuecomment-2427565425))
2024-10-21 19:40:58 +00:00
Aaron Orenstein
07cc4bd3e2 typing compile_fx.py (#138033)
Type annotations for compile_fx.
- Some of the stuff here is pretty complicated (functions which return functions that take functions) so I bailed on those and used `Any` just to get the rest landed.
- There are also changes to type signatures in other files which I did just to let mypy know more about the types in compile_fx.py.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/138033
Approved by: https://github.com/Skylion007
2024-10-21 18:14:59 +00:00
Aaron Gokaslan
16caa8c1b3 [BE]: Update Typeguard to TypeIs for better type inference (#133814)
Uses TypeIs instead of TypeGuard for better inference. See https://peps.python.org/pep-0742/

Pull Request resolved: https://github.com/pytorch/pytorch/pull/133814
Approved by: https://github.com/ezyang
2024-10-21 17:20:06 +00:00
Michael Lazos
a20a17fd6f [Dynamo] Disable torch function compilation during guard execution and in compiled bytecode (#137669)
Fixes https://github.com/pytorch/pytorch/issues/114369

Pull Request resolved: https://github.com/pytorch/pytorch/pull/137669
Approved by: https://github.com/anijain2305
2024-10-19 04:12:45 +00:00
PyTorch MergeBot
47e4045566 Revert "[pt2] Log is_forward field to dynamo_compile scuba table (#138097)"
This reverts commit 4e9273c84e.

Reverted https://github.com/pytorch/pytorch/pull/138097 on behalf of https://github.com/huydhn due to Sorry for reverting your change, but I think it has a land race with https://github.com/pytorch/pytorch/pull/137803 ([comment](https://github.com/pytorch/pytorch/pull/138097#issuecomment-2423297516))
2024-10-18 22:00:40 +00:00
James Wu
295de00908 [PT2 Compile Events] Revamp PT2 Compile/chromium event logging [1/?] (#138093)
This diff is the starting steps of https://docs.google.com/document/u/2/d/1kAEBt4AyW7HTAhXHbjoz8FBFHNyyEA2Qo2mPn7v3WUQ/edit?usp=drive_web&ouid=113555078003219714709

It implements the following changes:

- Only log spans to scuba, so no start events are ever logged
- Log events as the full event name, without "START" or "END"
- Only log to scuba major phases from chromium events. These are:
  - entire_frame_compile (dynamo)
  - backend_compile (aotdispatch)
  - inductor_compile (inductor)
  - codegen (inductor codegen)

Tlparse chromium events stay basically the same. But I implemented a few changes to clean that up as well:
- When there's a phase name available, log the phase name instead of the function name as the event name. This simplifies the trace to not have two identical rows. The fn_name is avaliable as metadata on the chromium event, if interested
- Log new events for pre and post grad passes. These do *not* log to scuba.

By making the phases much simpler in Scuba, with only categories for major phases of PT2 Compilation, we pave the way to add **much** more metadata and information to each individual event type. Diffs for that will come later.

**IMPLEMENTATION NOTES:**
- The logic for `log_chromium_event_internal` (which is the function that logs to Scuba) lives in chromium_events for now, but in the future as we add more metadata, it may belong independently in dynamo_timed or even outside of dynamo_timed. I haven't explored in detail what the refactor will look like. Once we start logging metadata for dynamo, aotdispatch, inductor, I suspect we will call log_pt2_compile_event directly, instead of making chromium event logger handle the pt2_compile_event logic. But that refactor is left for another PR on top of this one.

- There's an interesting space after pre grad passes within AOT autograd logic, that's between create_aot_dispatcher_function and pre grad passes. I'm not sure what we're spending time doing in that time, but I'll find out with a profile later.

Differential Revision: [D64479033](https://our.internmc.facebook.com/intern/diff/D64479033/)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/138093
Approved by: https://github.com/ezyang
2024-10-18 20:36:08 +00:00
Sam Larsen
4e9273c84e [pt2] Log is_forward field to dynamo_compile scuba table (#138097)
Summary: ^^

Test Plan:
Ran a test script out of fbcode: D64350202. Then:

```
(pytorch-3.10_4) devvm2296:~/fbcode  $ scuba -e="select time,co_filename,is_forward from \`dynamo_compile/sandbox\` where is_forward is not null"
+------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------+
|    time    |                                                                                    co_filename                                                                                    | is_forward |
+------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------+
| 1729032583 | /data/users/slarsen/fbsource/buck-out/v2/gen/fbcode/1638b36e975169f6/scripts/slarsen/torch_compile_model/__run__/run-inplace#link-tree/scripts/slarsen/torch_compile_model/run.py |          1 |
| 1729032583 | null                                                                                                                                                                              |          0 |
| 1729032650 | /data/users/slarsen/fbsource/buck-out/v2/gen/fbcode/1638b36e975169f6/scripts/slarsen/torch_compile_model/__run__/run-inplace#link-tree/scripts/slarsen/torch_compile_model/run.py |          1 |
| 1729032650 | null                                                                                                                                                                              |          0 |
+------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------+
4 row(s) in set (0 warnings, 131 errors, 0.80 sec)
```

Reviewed By: ezyang

Differential Revision: D64438144

Pull Request resolved: https://github.com/pytorch/pytorch/pull/138097
Approved by: https://github.com/ezyang
2024-10-18 18:48:52 +00:00
PyTorch MergeBot
361f42bc42 Revert "[compiled autograd] Compiled autograd configs in TLS (#137821)"
This reverts commit 9aba0b91c8.

Reverted https://github.com/pytorch/pytorch/pull/137821 on behalf of https://github.com/wdvr due to Reverting this for now, it is failing test_public_bindings in trunk ([comment](https://github.com/pytorch/pytorch/pull/137821#issuecomment-2417351788))
2024-10-16 16:38:29 +00:00
Simon Fan
9aba0b91c8 [compiled autograd] Compiled autograd configs in TLS (#137821)
Multithreaded doesn't work yet, this adds python side TLS only for the python side state

Pull Request resolved: https://github.com/pytorch/pytorch/pull/137821
Approved by: https://github.com/jansel, https://github.com/yf225
ghstack dependencies: #137953
2024-10-16 09:28:32 +00:00
PyTorch MergeBot
4557f6e339 Revert "[Dynamo] Disable torch function compilation during guard execution and in compiled bytecode (#137669)"
This reverts commit bf0b670598.

Reverted https://github.com/pytorch/pytorch/pull/137669 on behalf of https://github.com/huydhn due to Sorry for reverting your change, but it is failing test_public_bindings in trunk, maybe a landrace ([comment](https://github.com/pytorch/pytorch/pull/137669#issuecomment-2415331274))
2024-10-15 23:22:58 +00:00
Michael Lazos
bf0b670598 [Dynamo] Disable torch function compilation during guard execution and in compiled bytecode (#137669)
Fixes https://github.com/pytorch/pytorch/issues/114369

Pull Request resolved: https://github.com/pytorch/pytorch/pull/137669
Approved by: https://github.com/anijain2305
2024-10-15 20:52:58 +00:00
Jovian Anthony Jaison
6001b16597 Add entire _dynamo.config as a json for logging (#137216)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/137216
Approved by: https://github.com/ezyang

Co-authored-by: Aaron Gokaslan <aaronGokaslan@gmail.com>
2024-10-12 11:48:59 +00:00
William Wen
93bbc8abcc [dynamo, 3.13] use 3.13 multiline traceback in get_instruction_source_311 (#137617)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/137617
Approved by: https://github.com/jansel
2024-10-10 20:19:27 +00:00
Michael Lazos
d5785d4295 [Dynamo] Handle torch function subclass/mode dispatch on generic tensor methods (#137119)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/137119
Approved by: https://github.com/williamwen42, https://github.com/anijain2305
ghstack dependencies: #137114, #137115, #137116, #137117, #137120, #137227
2024-10-09 02:29:40 +00:00
Michael Lazos
108b469f78 [Dynamo] Remove ignored modes workaround (#135502) (#137115)
Approved by: https://github.com/anijain2305
ghstack dependencies: #134732, #133137, #135443, #135444, #135422

Pull Request resolved: https://github.com/pytorch/pytorch/pull/137115
Approved by: https://github.com/yanboliang
ghstack dependencies: #137114
2024-10-09 02:29:40 +00:00
PyTorch MergeBot
8c937445ee Revert "[Dynamo] Remove ignored modes workaround (#135502) (#137115)"
This reverts commit b1fd7708bd.

Reverted https://github.com/pytorch/pytorch/pull/137115 on behalf of https://github.com/huydhn due to The top of the stack has been reverted but it leaves trunk in a broken state, so I try to revert the rest of the stack ([comment](https://github.com/pytorch/pytorch/pull/137114#issuecomment-2400765603))
2024-10-08 20:33:17 +00:00
James Wu
3bf6594d13 Log compile ids to pt2_remote_cache and pt2_compile_events (#137431)
Log the current compilation id for all relevant samples for these two tables, so we can have a 1:1 analog with dynamo_compile.

Differential Revision: [D63900826](https://our.internmc.facebook.com/intern/diff/D63900826/)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/137431
Approved by: https://github.com/oulgen
2024-10-08 18:04:48 +00:00
PyTorch MergeBot
c88c0e6c65 Revert "[Dynamo] Handle torch function subclass/mode dispatch on generic tensor methods (#137119)"
This reverts commit d255b34c0a.

Reverted https://github.com/pytorch/pytorch/pull/137119 on behalf of https://github.com/malfet due to Need to revert to be able to revert https://github.com/pytorch/pytorch/pull/136910 ([comment](https://github.com/pytorch/pytorch/pull/137119#issuecomment-2400401262))
2024-10-08 17:09:26 +00:00
Michael Lazos
d255b34c0a [Dynamo] Handle torch function subclass/mode dispatch on generic tensor methods (#137119)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/137119
Approved by: https://github.com/williamwen42
ghstack dependencies: #137114, #137115, #137116, #137117, #137120, #137227
2024-10-07 18:55:26 +00:00
Michael Lazos
b1fd7708bd [Dynamo] Remove ignored modes workaround (#135502) (#137115)
Approved by: https://github.com/anijain2305
ghstack dependencies: #134732, #133137, #135443, #135444, #135422

Pull Request resolved: https://github.com/pytorch/pytorch/pull/137115
Approved by: https://github.com/yanboliang
ghstack dependencies: #137114
2024-10-07 18:55:26 +00:00
Jovian Anthony Jaison
59d7cf7342 Add _dynamo.config inline_inbuilt_nn_modules and specialize_float logging (#137139)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/137139
Approved by: https://github.com/ezyang
2024-10-02 19:58:38 +00:00
Animesh Jain
289df45cee Revert "[Dynamo] Trace enter/exit of TorchFunctionModes (#135422)" (#136590)
This reverts commit 7743149b2b.

Reverts
* https://github.com/pytorch/pytorch/pull/135503
* https://github.com/pytorch/pytorch/pull/135502
* https://github.com/pytorch/pytorch/pull/135422

This passes this test. Earlier, the getitem would stay like a getitem in the Fx graph. But now the fake tensor propagations fails saying that .item is called. It seems that torch function is not getting triggered while fake tensor propagation.

```
import torch
from torch.nn.attention.flex_attention import BlockMask, _mask_mod_signature, _score_mod_signature, flex_attention
from torch._inductor.lowering import make_pointwise, register_lowering
from torch._inductor.virtualized import ops
from torch.nn.attention.flex_attention import create_block_mask

torch.set_default_device('cuda')

flex_attention = torch.compile(flex_attention, dynamic=False)

prefix_lengths = torch.arange(8)
def prefix_lm(b, h, q, kv):
    return prefix_lengths[b] >= kv

mask = create_block_mask(prefix_lm, 8, None, 512, 512, _compile=True)
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/136590
Approved by: https://github.com/Chillee
2024-09-25 21:10:43 +00:00
Animesh Jain
dacf0c4884 [dynamo] Do not treat user defined nn module attributes static for dynamic shape infra (#136516)
Fixes https://github.com/pytorch/pytorch/issues/136254

Th regression was introduced in https://github.com/pytorch/pytorch/pull/132736 where originally we were trying to fix another regression. This PR and the offending PR together say - "treat user defined nn module attributes as automatic dynamic, but for cudagraphs they will be considered static". This avoid recompilations. This can lead to a cudagraph recording, which is ok. This also maintains the state before inline_inbuilt_nn_modules flag was introduced.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/136516
Approved by: https://github.com/williamwen42
2024-09-24 18:26:12 +00:00
Jovian Anthony Jaison
09715638ab Add _dynamo.config.suppress_errors logging (#136379)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/136379
Approved by: https://github.com/ezyang
2024-09-21 21:00:26 +00:00
James Wu
803ce507f1 Log structured logging overhead to dynamo compile (kinda) (#136142)
Summary:
X-link: https://github.com/pytorch/benchmark/pull/2454

This adds structured logging overhead at a per compile basis to compilation metrics.

To do so, we track the frame_id_frame_compile_id that trace_structured uses to categorize compiles, and use that as the key in our timing table.

Implementation notes:
- If there's times we call trace_structured without a compile id, the time won't be measured. Not really a good way around that today given the compile id framework of compilation metrics. Strobelight is still the best way to measure on a per job basis.
- We don't actually measure the time it takes to log the compilation metrics itself. Fundamentally, it's not possible to log this properly if we're storing the logging number *in* compilation metrics, since there's no way to measure it before we do it(unless we want discrepancies between dynamo_compile and tlparse, which seems suboptimal). Hopefully for a large job, the cost of structured_logging compilation metrics itself is small.
- I wanted to use frame_phase_timing here, but there's a bunch of ids to iron out, and I don't really want to deal with that headache. compilation_time_metrics is sort of what I want, but that isn't by frame/compile id, so it's also a bit off. Putting it into torch.logging as a separate thing so logging tracks its own overhead seems fine, though.

Test Plan:
Run benchmarks/nanogpt and staging logger. See that the new compilation metric is logged to the staged dynamo_compile table:

https://fburl.com/scuba/logger_staging_jjwu_30582a48f1ff9cf5f4ac50a4c40af/xazjg5xq

Note that the sum(structured_logging_overhead_s) / sum(entire_frame_compile_time) = 8.387 / 124.278  = 6%, which seems reasonable as the overhead for a small compilation like this.

You can also look at samples for a more detailed log of this.

Reviewed By: oulgen

Differential Revision: D62643611

Pull Request resolved: https://github.com/pytorch/pytorch/pull/136142
Approved by: https://github.com/bobrenjc93
2024-09-19 16:11:38 +00:00
Sun, Jiayi
701ba5203f [Inductor] Increase multiplier to 3 for Inductor AMP FP16 benchmark correctness check (#135932)
Fix https://github.com/pytorch/pytorch/issues/135657.
Aligned with AMP BF16, using multiplier 3 for Inductor AMP FP16 benchmark correctness check

Pull Request resolved: https://github.com/pytorch/pytorch/pull/135932
Approved by: https://github.com/CaoE, https://github.com/jgong5, https://github.com/jansel
2024-09-18 13:03:45 +00:00
Aaron Gokaslan
31715be72a [BE]: Update mypy to 1.11.2 (#133816)
Updates mypy to 1.11.1 to improve type inference

Pull Request resolved: https://github.com/pytorch/pytorch/pull/133816
Approved by: https://github.com/ezyang
2024-09-16 19:44:11 +00:00
PyTorch MergeBot
3117f2cf67 Revert "[BE]: Update mypy to 1.11.2 (#133816)"
This reverts commit 55299cfc22.

Reverted https://github.com/pytorch/pytorch/pull/133816 on behalf of https://github.com/jeanschmidt due to seems to have broken https://github.com/pytorch/pytorch/actions/runs/10865710499/job/30155699792 on main ([comment](https://github.com/pytorch/pytorch/pull/133816#issuecomment-2352377684))
2024-09-16 09:11:16 +00:00
Aaron Gokaslan
55299cfc22 [BE]: Update mypy to 1.11.2 (#133816)
Updates mypy to 1.11.1 to improve type inference

Pull Request resolved: https://github.com/pytorch/pytorch/pull/133816
Approved by: https://github.com/ezyang
2024-09-14 21:40:36 +00:00
Michael Lazos
860838e9be [Dynamo] Remove ignored modes workaround (#135502)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/135502
Approved by: https://github.com/anijain2305
ghstack dependencies: #134732, #133137, #135443, #135444, #135422
2024-09-14 18:52:22 +00:00
Michael Lazos
5c5c33ac32 [Dynamo] Trace torch function modes entered outside of torch.compile (#133137)
This PR adds initial tracing for torch function modes.

Details:
In essence, this adds tracing into the torch function of modes entered outside of the torch.compile call.
This does not yet support tracing enter/exit of a torch function mode/ tracing set_default_device properly using the new mode infra (this will be a very good stress test for modes). I am adding more PRs to this stack to support these. The overall plan is to support tracing enter/exit and handling graph breaks like we do other torch.* context managers.

Previously landed:
https://github.com/pytorch/pytorch/pull/133135
https://github.com/pytorch/pytorch/pull/133136
https://github.com/pytorch/pytorch/pull/133134
https://github.com/pytorch/pytorch/pull/133133
https://github.com/pytorch/pytorch/pull/133132
https://github.com/pytorch/pytorch/pull/133131
https://github.com/pytorch/pytorch/pull/133729
https://github.com/pytorch/pytorch/pull/133130

Pull Request resolved: https://github.com/pytorch/pytorch/pull/133137
Approved by: https://github.com/jansel, https://github.com/zou3519
ghstack dependencies: #134732
2024-09-14 18:52:22 +00:00
PyTorch MergeBot
8c8a3086a7 Revert "[Dynamo] Trace torch function modes entered outside of torch.compile (#133137)"
This reverts commit 4528777e03.

Reverted https://github.com/pytorch/pytorch/pull/133137 on behalf of https://github.com/mlazos due to broke python test/quantization/pt2e/test_numeric_debugger.py TestNumericDebugger.test_re_export_preserve_handle modified yesterday ([comment](https://github.com/pytorch/pytorch/pull/134732#issuecomment-2350937008))
2024-09-14 10:02:55 +00:00
PyTorch MergeBot
838c912502 Revert "[Dynamo] Remove ignored modes workaround (#135502)"
This reverts commit 5c67cf180e.

Reverted https://github.com/pytorch/pytorch/pull/135502 on behalf of https://github.com/mlazos due to broke python test/quantization/pt2e/test_numeric_debugger.py TestNumericDebugger.test_re_export_preserve_handle modified yesterday ([comment](https://github.com/pytorch/pytorch/pull/134732#issuecomment-2350937008))
2024-09-14 10:02:55 +00:00
Michael Lazos
5c67cf180e [Dynamo] Remove ignored modes workaround (#135502)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/135502
Approved by: https://github.com/anijain2305
ghstack dependencies: #134732, #133137, #135443, #135444, #135422
2024-09-14 02:41:16 +00:00
Michael Lazos
4528777e03 [Dynamo] Trace torch function modes entered outside of torch.compile (#133137)
This PR adds initial tracing for torch function modes.

Details:
In essence, this adds tracing into the torch function of modes entered outside of the torch.compile call.
This does not yet support tracing enter/exit of a torch function mode/ tracing set_default_device properly using the new mode infra (this will be a very good stress test for modes). I am adding more PRs to this stack to support these. The overall plan is to support tracing enter/exit and handling graph breaks like we do other torch.* context managers.

Previously landed:
https://github.com/pytorch/pytorch/pull/133135
https://github.com/pytorch/pytorch/pull/133136
https://github.com/pytorch/pytorch/pull/133134
https://github.com/pytorch/pytorch/pull/133133
https://github.com/pytorch/pytorch/pull/133132
https://github.com/pytorch/pytorch/pull/133131
https://github.com/pytorch/pytorch/pull/133729
https://github.com/pytorch/pytorch/pull/133130

Pull Request resolved: https://github.com/pytorch/pytorch/pull/133137
Approved by: https://github.com/jansel, https://github.com/zou3519
ghstack dependencies: #134732
2024-09-14 02:40:43 +00:00
James Wu
ad2f0e9f81 Add remote cache time saved to compilation metrics (#135490)
Summary:
Record remote cache time saved via frame_phase_timing

We add to the "phase" when remote cache hits and saves us time, so that we have a 1:1 correspondence between a frame and time saved.

Test Plan:
Internally run benchmark, see that it's populated in sandbox table after previous diff lands and logger config is actualized.

Show that column exists in table:

https://fburl.com/scuba/logger_staging_jjwu_30582a48f1ff9cf5f4ac50a4c40af/fp2te0ff

Note that an earlier version of D62105258 had the column as a string so the staging table is a bit messed up. But you can see the most recent samples have the column populates as a float.

Reviewed By: aorenste

Differential Revision: D62106921

Pull Request resolved: https://github.com/pytorch/pytorch/pull/135490
Approved by: https://github.com/aorenste
2024-09-13 16:35:51 +00:00
PyTorch MergeBot
eb7dd91dd1 Revert "[Dynamo] Trace torch function modes entered outside of torch.compile (#133137)"
This reverts commit fafdd588f2.

Reverted https://github.com/pytorch/pytorch/pull/133137 on behalf of https://github.com/albanD due to Broke tests on main ([comment](https://github.com/pytorch/pytorch/pull/134732#issuecomment-2348886378))
2024-09-13 12:52:58 +00:00
PyTorch MergeBot
fca58bfda1 Revert "[Dynamo] Remove ignored modes workaround (#135502)"
This reverts commit 7d5e0dd4b1.

Reverted https://github.com/pytorch/pytorch/pull/135502 on behalf of https://github.com/albanD due to Broke tests on main ([comment](https://github.com/pytorch/pytorch/pull/134732#issuecomment-2348886378))
2024-09-13 12:52:57 +00:00
Michael Lazos
7d5e0dd4b1 [Dynamo] Remove ignored modes workaround (#135502)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/135502
Approved by: https://github.com/anijain2305
ghstack dependencies: #134732, #133137, #135443, #135444, #135422
2024-09-13 08:41:32 +00:00
Michael Lazos
fafdd588f2 [Dynamo] Trace torch function modes entered outside of torch.compile (#133137)
This PR adds initial tracing for torch function modes.

Details:
In essence, this adds tracing into the torch function of modes entered outside of the torch.compile call.
This does not yet support tracing enter/exit of a torch function mode/ tracing set_default_device properly using the new mode infra (this will be a very good stress test for modes). I am adding more PRs to this stack to support these. The overall plan is to support tracing enter/exit and handling graph breaks like we do other torch.* context managers.

Previously landed:
https://github.com/pytorch/pytorch/pull/133135
https://github.com/pytorch/pytorch/pull/133136
https://github.com/pytorch/pytorch/pull/133134
https://github.com/pytorch/pytorch/pull/133133
https://github.com/pytorch/pytorch/pull/133132
https://github.com/pytorch/pytorch/pull/133131
https://github.com/pytorch/pytorch/pull/133729
https://github.com/pytorch/pytorch/pull/133130

Pull Request resolved: https://github.com/pytorch/pytorch/pull/133137
Approved by: https://github.com/jansel, https://github.com/zou3519
ghstack dependencies: #134732
2024-09-13 08:41:00 +00:00
PyTorch MergeBot
183c32fd3b Revert "[Dynamo] Trace torch function modes entered outside of torch.compile (#133137)"
This reverts commit 0d15122092.

Reverted https://github.com/pytorch/pytorch/pull/133137 on behalf of https://github.com/clee2000 due to something in this stack broke functorch/test_control_flow.py::TestControlFlow::test_scan_simple_graph [GH job link](https://github.com/pytorch/pytorch/actions/runs/10804912306/job/29980571390) [HUD commit link](444b52ff40), newly added test yesterday ([comment](https://github.com/pytorch/pytorch/pull/133137#issuecomment-2344054339))
2024-09-11 15:57:00 +00:00
Michael Lazos
0d15122092 [Dynamo] Trace torch function modes entered outside of torch.compile (#133137)
This PR adds initial tracing for torch function modes.

Details:
In essence, this adds tracing into the torch function of modes entered outside of the torch.compile call.
This does not yet support tracing enter/exit of a torch function mode/ tracing set_default_device properly using the new mode infra (this will be a very good stress test for modes). I am adding more PRs to this stack to support these. The overall plan is to support tracing enter/exit and handling graph breaks like we do other torch.* context managers.

Previously landed:
https://github.com/pytorch/pytorch/pull/133135
https://github.com/pytorch/pytorch/pull/133136
https://github.com/pytorch/pytorch/pull/133134
https://github.com/pytorch/pytorch/pull/133133
https://github.com/pytorch/pytorch/pull/133132
https://github.com/pytorch/pytorch/pull/133131
https://github.com/pytorch/pytorch/pull/133729
https://github.com/pytorch/pytorch/pull/133130

Pull Request resolved: https://github.com/pytorch/pytorch/pull/133137
Approved by: https://github.com/jansel, https://github.com/zou3519
ghstack dependencies: #134732
2024-09-11 04:18:22 +00:00
William Wen
a4030e37be [dynamo] reland map/zip iterator related changes (#135074)
Differential Revision: [D62211019](https://our.internmc.facebook.com/intern/diff/D62211019)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/135074
Approved by: https://github.com/jansel, https://github.com/anijain2305, https://github.com/mlazos
2024-09-06 20:38:02 +00:00
Animesh Jain
32f45f01a9 [dynamo] Retire CompileProfiler (#135133)
Fixes confusion in https://github.com/pytorch/pytorch/issues/113443

We have TORCH_LOGS that supersedes CompileProfiler

Pull Request resolved: https://github.com/pytorch/pytorch/pull/135133
Approved by: https://github.com/ezyang
ghstack dependencies: #135039, #135121, #135129, #135130
2024-09-05 01:08:40 +00:00
Michael Lazos
d9ae92cd6e [Dynamo] Support for proxying frozen dataclasses (#134846)
Fixes https://github.com/pytorch/pytorch/issues/133858

Details: Previously Dynamo would treat dataclasses as UserDefinedVariables. This was non-desirable if we would like to proxy the value into the graph, which is needed for TensorSubclassMetadata. To rectify this, frozen dataclasses are now able to be proxied similarly to NamedTuples. We require the object to be frozen, because if arbitrary mutation were allowed, we would need to replay those mutations in the graph after construction of the object.

For tracing construction of the variable, the generated `__init__` for the dataclass uses `object.__setattr__` because frozen dataclasses throw errors on the usual `__setattr__` invocation. With this treatment, no special handling is needed in dynamo for frozen dataclass construction.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/134846
Approved by: https://github.com/bdhirsh, https://github.com/anijain2305
2024-09-04 22:17:00 +00:00
Edward Z. Yang
0cbcef12bd Stop adding useless prefix to error message here, you're pushing the important info off the screen. (#133108)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/133108
Approved by: https://github.com/Skylion007
2024-09-01 23:11:17 +00:00
Shunting Zhang
1e92d7b688 [inductor] move loop ordering after fusion (#126254)
Restart the work from PR https://github.com/pytorch/pytorch/pull/100331 in this new PR since it's hard to rebase. It would be expected that some code is copy/pasted from the previous PR and main idea is the same.

Previously we see relatively large compilation time increase due to too many loop orders being considered. This PR tries to continue the work by doing pruning and only considering loop orders that we know for sure are relevant (i.e. do it on demand).

Some manually created cases that loop ordering matters are added as unit tests. The PR can make sure inductor does not miss fusion opportunities for them.

This PR should solve the not-able to fusion problem in https://github.com/pytorch/pytorch/issues/130015

Right now there is still significant increase of compilation time. I'll disable the feature by default. Later on after the compilation time issue is resolved, I'll enable it  by default.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/126254
Approved by: https://github.com/jansel
2024-08-29 21:50:07 +00:00
Laith Sakka
d6091c8726 Add compile time instruction count metric (#133834)
PYTHONPATH=$(pwd) python benchmarks/update_hint_benchmark.py out
as of this diff, compile_time_instruction_count counts the number of instruction from within
convert_frame.compile_inner
```
update_hint_regression,compile_time_instruction_count,10522459165
```
 will add result from CI once populated.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/133834
Approved by: https://github.com/aorenste
2024-08-27 23:29:02 +00:00
James Wu
f8fbfe5846 Always emit end events even on failure, use thread local storage for stack (#134279)
Summary:
We should always emit an end event in a finally block so that if a unit test or job fails, the stack is still correct.

Also, we use thread local storage for the stack, so that in multithreaded scenarios the stack will still be correctly added.

Test Plan:
Run benchmark and see that everything still works
Run
```
TORCH_LOGS=dynamo buck run test/functorch:test_aotdispatch -- -r test_backward_mutation_on_grad_out
```
With some extra logging to see that start events with the correct stack are emitted, and the end events are also emitted even though the test fails at runtime.

Differential Revision: D61682556

Pull Request resolved: https://github.com/pytorch/pytorch/pull/134279
Approved by: https://github.com/aorenste
2024-08-23 18:13:13 +00:00
Animesh Jain
fee677eeb6 [fbode-testing][dynamo][reland][inline-inbuilt-nn-modules] Mark attri… (#134136)
Shuai wants to test this internally before https://github.com/pytorch/pytorch/pull/133713 can go in. Creating a separate PR for ghmport.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/134136
Approved by: https://github.com/yanboliang
2024-08-22 17:54:58 +00:00
Aaron Orenstein
d95aedf5fd [BE] typing for decorators - fx/_compatibility (part 1) (#134202)
Part of #134054.

This corresponds to the pytorch mypy changes from D61493706. Updating takes so
long and touches so many files that it's impossible to land as a whole without conflicting with some other intermediate change.
So landing these 'type: ignore' for pytorch in advance of them actually being needed.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/134202
Approved by: https://github.com/Skylion007
2024-08-22 17:07:33 +00:00
James Wu
3c5485fb7f [Retry] Log chromium events to scuba (#134118)
Summary:
This diff implements a bunch of views for internal scuba viewing.

TODOS that I might punt to another diff:
- Saving cache stats via counter is definitely sus here, but there's not really a good way to track "fx graph cache hit for this compile phase" right now. Will think about this more.
- We should definitely log frame id, compile id, etc
- We should definitely be logging configs. That way, we can A/B test based on whether a config is turned on.
- idk what I'm doing with compile_uuid yet, but it's useful when you want to look at samples for a single run. I think if we had mast job info this field is not needed, but it's nice to be able to drill down to a single run and get its chrome trace view or icicle view, so idk

Test Plan:
All of the above views are run with nanogpt benchmark:

```
buck run mode/opt caffe2/benchmarks/dynamo:torchbench -- --training --backend=inductor --only nanogpt --performance
```

Differential Revision: D61603243

Pull Request resolved: https://github.com/pytorch/pytorch/pull/134118
Approved by: https://github.com/oulgen
2024-08-22 14:59:45 +00:00
PyTorch MergeBot
2db28a9611 Revert "[BE]: Update Typeguard to TypeIs for better type inference (#133814)"
This reverts commit bce0caba78.

Reverted https://github.com/pytorch/pytorch/pull/133814 on behalf of https://github.com/ezyang due to root cause of internal failures not addressed ([comment](https://github.com/pytorch/pytorch/pull/133814#issuecomment-2302466444))
2024-08-21 16:13:34 +00:00
PyTorch MergeBot
68425e68fe Revert "[dynamo][reland][inline-inbuilt-nn-modules] Mark attributes of nn mod… (#133714)"
This reverts commit e8d3c4be36.

Reverted https://github.com/pytorch/pytorch/pull/133714 on behalf of https://github.com/anijain2305 due to fails internally ([comment](https://github.com/pytorch/pytorch/pull/133714#issuecomment-2302171472))
2024-08-21 14:21:06 +00:00
Aaron Gokaslan
bce0caba78 [BE]: Update Typeguard to TypeIs for better type inference (#133814)
Uses TypeIs instead of TypeGuard for better inference. See https://peps.python.org/pep-0742/

Pull Request resolved: https://github.com/pytorch/pytorch/pull/133814
Approved by: https://github.com/ezyang
2024-08-20 17:19:57 +00:00
PyTorch MergeBot
42097f0ec1 Revert "[BE]: Update Typeguard to TypeIs for better type inference (#133814)"
This reverts commit cf60fe53a8.

Reverted https://github.com/pytorch/pytorch/pull/133814 on behalf of https://github.com/jeanschmidt due to Broke 12k internal signals/jobs, @ezyang please help get those changes merged. More details check D61488368 ([comment](https://github.com/pytorch/pytorch/pull/133814#issuecomment-2298210309))
2024-08-20 08:02:49 +00:00
Michael Lazos
c0b4aaa8c5 [Dynamo] Support pop torch function mode stack (#133131)
This PR adds support for tracing `torch._C._pop_torch_function_stack()` without graph breaking and in order to verify the state change also adds replay of mutations to the torch function mode stack via side_effects appending supplemental bytecode as we do for other python mutable objects.

Details:
To represent the torch function mode stack symbolically a deque field is added to the instruction translator. When the InstructionTranslator is initialized, all modes are read from the current torch function mode stack, and stashed in a global weak ref for later access (using existing sources) without needing to push/pop the python/cpp torch function mode stack.

During tracing, when `_pop_torch_function_stack` is encountered a value is popped from this deque and the variable tracker representing the mode is returned. To ensure the true torch function mode stack matches this state, `TorchFunctionModeStackVariable`, a singleton, is marked as mutated, this adds it to side effects, where during final codegen, side effects will codegen a call to a python helper which will update the python torch function mode stack.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/133131
Approved by: https://github.com/jansel
ghstack dependencies: #133130, #133729
2024-08-20 07:14:42 +00:00
Michael Lazos
09e366cb57 [Dynamo] Add torch function mode stack guard to dynamo (#133130)
This PR adds a guard on the torch function mode stack state at the beginning of tracing. The way this is implemented is via a new leaf guard which is passed the initial stack state at construction and compares it to the stack state at the time the guard is run.

Details:
The stack state is extracted via popping all modes, appending them to a list, and pushing all modes back. This list is stored on the output graph and read during guard construction to pass to the stack mode guard. There the length and types of the modes are recorded. Next time the guard is run it compares this recorded state to the current mode stack state.

To implement this in python a helper function was added to utils.py and this is used if cpp guards are not enabled.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/133130
Approved by: https://github.com/anijain2305
2024-08-20 07:14:33 +00:00
Animesh Jain
e8d3c4be36 [dynamo][reland][inline-inbuilt-nn-modules] Mark attributes of nn mod… (#133714)
Relands https://github.com/pytorch/pytorch/pull/132539
Relands https://github.com/pytorch/pytorch/pull/132736

Pull Request resolved: https://github.com/pytorch/pytorch/pull/133714
Approved by: https://github.com/jansel
2024-08-20 05:57:52 +00:00
Aaron Gokaslan
cf60fe53a8 [BE]: Update Typeguard to TypeIs for better type inference (#133814)
Uses TypeIs instead of TypeGuard for better inference. See https://peps.python.org/pep-0742/

Pull Request resolved: https://github.com/pytorch/pytorch/pull/133814
Approved by: https://github.com/ezyang
2024-08-18 19:10:16 +00:00
Will Feng
f57b00704e [Traceable FSDP2][Dynamo] Support reconstructing CUDA event object within Dynamo graph (#133635)
`torch.cuda.Event` objects are different from `torch.cuda.Stream` in that events are not pooled, meaning we can't look up a previously created CUDA event object by ID. This prevents CUDA event object created outside of the Dynamo graph from being used within the graph (since Dynamo needs a way to emit a `call_function` line in the graph that does the retrieval of the event object for downstream op use). This PR adds a simple object pool within Dynamo utility, to support looking up CUDA event object by ID from within the Dynamo graph.

After this PR, if a user creates a CUDA event object outside of the graph and use that event within the graph, the behavior will exactly match eager.

Test commands:
- `pytest -rA test/dynamo/test_ctx_manager.py::CtxManagerTests::test_cuda_event_created_outside_of_graph`
- `pytest -rA test/dynamo/test_ctx_manager.py::CtxManagerTests::test_cuda_event_across_graph_break`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/133635
Approved by: https://github.com/yifuwang
ghstack dependencies: #133532, #133531, #133636
2024-08-16 20:40:46 +00:00
Edward Z. Yang
90d2593b3e Revert #132806, #132736, #132539, #132487 (#133570)
This reverts commit 25df063f04.
This reverts commit de00c79583.
This reverts commit 419b76c4ac.
This reverts commit bc57d5b6ff.

Differential Revision: [D61335013](https://our.internmc.facebook.com/intern/diff/D61335013)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/133570
Approved by: https://github.com/albanD, https://github.com/jansel, https://github.com/anijain2305
2024-08-15 20:54:21 +00:00
Yanbo Liang
9de023d44d [Dynamo] Make torch.Size can be reconstructed by LOAD_CONST (#133342)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/133342
Approved by: https://github.com/mlazos, https://github.com/jansel
2024-08-13 23:18:38 +00:00
James Wu
f037803290 Add ChromiumEventLogger, log FXGraphCache and AOTAutogradCache (#132864)
This PR implements ChromiumEventLogger in all @dynamo_timed events. For each dynamo timed call, we log:
- A start event before starting the function execution
- An end event after finishing the function execution
- An extra pair of start/end events for any phase names included in dynamo.

Separately, this also gives us the ability to log instant events. I use them to log cache hits/misses as a first step. The little arrows on the bottom of the UI are cache hits/misses, and you can look at cache details by clicking each triangle.

The outputted chromium trace events can be viewed in perfetto for a timeline of an execution. Here's what it looks like for a run of nanogpt:
![image](https://github.com/user-attachments/assets/cb9e6c7a-1acf-45e6-8a27-6651d9ae6132)

And another with warm start:
![image](https://github.com/user-attachments/assets/cd9709bc-59ef-4da1-a7dd-10b1a0ab9b8f)

Trace events are based around the JSON Event format: https://docs.google.com/document/d/1CvAClvFfyA5R-PhYUmn5OOQtYMH4h6I0nSsKchNAySU/preview

We may want to switch to the less deprecated Protobuf format later, but so far I don't see any features we care about supported there.

Internal FB employees can see a link to this in the tlparse output:
https://interncache-all.fbcdn.net/manifold/tlparse_reports/tree/logs/.tmpVi1FIl/dedicated_log_torch_trace_bb4zl_bc.log/index.html

I'll also work on logging these

Pull Request resolved: https://github.com/pytorch/pytorch/pull/132864
Approved by: https://github.com/aorenste
2024-08-10 01:15:53 +00:00
Shunting Zhang
10c2168b31 [pt2-bench] use larger multiplier for smaller tensors for a few models (#132952)
Fix https://github.com/pytorch/pytorch/issues/132922  and https://github.com/pytorch/pytorch/issues/132924

Pull Request resolved: https://github.com/pytorch/pytorch/pull/132952
Approved by: https://github.com/eellison, https://github.com/jansel
2024-08-09 00:09:21 +00:00
Xuehai Pan
24dee99cb7 Populate submodules of torch._C to sys.modules recursively (#132216)
See comment:

e9d1c26275/torch/__init__.py (L938-L950)

This PR recursively sets the submodules in the C extension to `sys.modules` (e.g., `_C._dynamo.eval_frame`).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/132216
Approved by: https://github.com/ezyang
2024-08-08 10:20:25 +00:00
PyTorch MergeBot
ff81ca8e0c Revert "Populate submodules of torch._C to sys.modules recursively (#132216)"
This reverts commit 672ce4610e.

Reverted https://github.com/pytorch/pytorch/pull/132216 on behalf of https://github.com/PaliC due to was breaking internal builds ([comment](https://github.com/pytorch/pytorch/pull/132216#issuecomment-2274112397))
2024-08-07 18:45:00 +00:00
Joel Schlosser
fb146fc3c6 Only store necessary tensor_dict fields in node meta (#132805)
Fixes #132290

This PR attempts a more invasive / complete solution than the one from #132338, which removes immediate tensor fields from the `tensor_dict` copy stored in node meta. The approach taken here is to store only those fields of the `tensor_dict` which are absolutely utilized somewhere else.

So far, this appears to be limited to:
* `_dynamo_static_input_type`
* `tag` (at least in the tests). Discussion at #94080 appears to indicate this is depended on for export

(CI may point out more)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/132805
Approved by: https://github.com/mlazos
2024-08-07 13:35:16 +00:00
rzou
0d6caeb259 Add logging + counter for missed reinplacing opportunities (#132758)
Summary:
- We add Inductor logs for what tensors we tried to reinplace, what
  tensors we were unable to reinplace, and of those tensors, which of
  those might be bugs (the "missed reinplacing opportunities"). You can
  tell this by reading the Inductor output graph but the logs make it
  easier to figure out.
- Add a dynamo_compile counter for missed reinplacing opportunities. The
  goal is to see how widespread existing problems (if any) are. We've had
  trouble getting all of the edge cases for the reinplacing pass; the
  counter will help us hunt down issues.

Test Plan:
- tested locally

Pull Request resolved: https://github.com/pytorch/pytorch/pull/132758
Approved by: https://github.com/eellison
2024-08-06 23:44:24 +00:00
Animesh Jain
de00c79583 [dynamo][inline_inbuilt_nn_modules] Mark nn module tensor static for cudagraphs (#132736)
Fixes https://github.com/pytorch/pytorch/issues/132714

Pull Request resolved: https://github.com/pytorch/pytorch/pull/132736
Approved by: https://github.com/mlazos
ghstack dependencies: #132538
2024-08-06 20:13:28 +00:00
William Wen
01cdcbf7c8 [dynamo] revert map/zip iterator related changes (#132528)
Need to revert due to internal hangs: S437700

This reverts commit b6c1490cc0.

Revert "[dynamo] implement IteratorVariable and polyfill fallbacks for enumerate (#131725)"

This reverts commit 2576dbbc35.

Revert "[dynamo] add itertools repeat/count bytecode reconstruction (#131716)"

This reverts commit 35b4de32fa.

Revert "[dynamo] add lazy IteratorVariable implementations for map and zip (#131413)"

This reverts commit 7d282d8755.

Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/132528
Approved by: https://github.com/ZainRizvi
2024-08-04 18:46:55 +00:00
PyTorch MergeBot
0a25666f92 Revert "[dynamo] revert map/zip iterator related changes (#132528)"
This reverts commit e81e74ca6c.

Reverted https://github.com/pytorch/pytorch/pull/132528 on behalf of https://github.com/ZainRizvi due to This stack entered a weird state in the diff train. Reverting and relanding to clean the state ([comment](https://github.com/pytorch/pytorch/pull/132528#issuecomment-2267628475))
2024-08-04 18:26:09 +00:00
Animesh Jain
06581c277a [dynamo][stable-diffusion] Support dict(obj) on constrained subclasses of dict and OrderedDict (#132558)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/132558
Approved by: https://github.com/jansel
2024-08-03 06:31:00 +00:00
Animesh Jain
419b76c4ac [dynamo] Reland 132308, 132314, 132318, 132334 - Make builtin nn modules attributes static (#132539)
Relanding 4 PRs ending at https://github.com/pytorch/pytorch/pull/132334

Pull Request resolved: https://github.com/pytorch/pytorch/pull/132539
Approved by: https://github.com/Skylion007, https://github.com/yanboliang, https://github.com/mlazos
2024-08-03 02:08:22 +00:00
William Wen
e81e74ca6c [dynamo] revert map/zip iterator related changes (#132528)
Need to revert due to internal hangs: S437700

This reverts commit b6c1490cc0.

Revert "[dynamo] implement IteratorVariable and polyfill fallbacks for enumerate (#131725)"

This reverts commit 2576dbbc35.

Revert "[dynamo] add itertools repeat/count bytecode reconstruction (#131716)"

This reverts commit 35b4de32fa.

Revert "[dynamo] add lazy IteratorVariable implementations for map and zip (#131413)"

This reverts commit 7d282d8755.

Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/132528
Approved by: https://github.com/ZainRizvi
2024-08-02 19:40:57 +00:00
PyTorch MergeBot
193a19ee91 Revert "[dynamo] Treat attr of unspecialized buiitin nn modules as static (#132318)"
This reverts commit 7b816d7d6d.

Reverted https://github.com/pytorch/pytorch/pull/132318 on behalf of https://github.com/anijain2305 due to broke internal tests ([comment](https://github.com/pytorch/pytorch/pull/132318#issuecomment-2265945433))
2024-08-02 18:43:32 +00:00
PyTorch MergeBot
b8f7019df0 Revert "[dynamo] Track params/buffers and mark them as static (#132334)"
This reverts commit babb249a89.

Reverted https://github.com/pytorch/pytorch/pull/132334 on behalf of https://github.com/anijain2305 due to broke internal tests ([comment](https://github.com/pytorch/pytorch/pull/132334#issuecomment-2265942261))
2024-08-02 18:41:19 +00:00
Edward Z. Yang
290f09f829 Ban decorator usage of dynamo_timed (#132328)
This is a more manual version of https://github.com/pytorch/pytorch/pull/132073 that just manually creates the new function at each call site instead of magicking it with clone. Review with whitespace diffs off.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/132328
Approved by: https://github.com/albanD
2024-08-02 12:00:46 +00:00
Animesh Jain
babb249a89 [dynamo] Track params/buffers and mark them as static (#132334)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/132334
Approved by: https://github.com/ezyang, https://github.com/mlazos
2024-08-02 08:55:43 +00:00
PyTorch MergeBot
c8958f8f84 Revert "Ban decorator usage of dynamo_timed (#132328)"
This reverts commit 9853c048eb.

Reverted https://github.com/pytorch/pytorch/pull/132328 on behalf of https://github.com/clee2000 due to seems to have broken functorch/test_aotdispatch.py::TestAOTAutograd::test_input_data_and_metadata_mutation_aliases_other_input [GH job link](https://github.com/pytorch/pytorch/actions/runs/10204547165/job/28233976446) [HUD commit link](9853c048eb).  Test passed on PR, probably a landrace, base is only 10 hours old ([comment](https://github.com/pytorch/pytorch/pull/132328#issuecomment-2263909337))
2024-08-01 20:20:28 +00:00
Edward Z. Yang
9853c048eb Ban decorator usage of dynamo_timed (#132328)
This is a more manual version of https://github.com/pytorch/pytorch/pull/132073 that just manually creates the new function at each call site instead of magicking it with clone. Review with whitespace diffs off.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/132328
Approved by: https://github.com/albanD
2024-08-01 19:27:58 +00:00
Animesh Jain
7b816d7d6d [dynamo] Treat attr of unspecialized buiitin nn modules as static (#132318)
This fixes the huge increase in compile time with +dynamic with inline_inbuilt_nn_modules.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/132318
Approved by: https://github.com/yanboliang, https://github.com/mlazos, https://github.com/ezyang
ghstack dependencies: #132302, #132304, #132312, #132308, #132314
2024-08-01 17:11:18 +00:00
Xuehai Pan
672ce4610e Populate submodules of torch._C to sys.modules recursively (#132216)
See comment:

e9d1c26275/torch/__init__.py (L938-L950)

This PR recursively sets the submodules in the C extension to `sys.modules` (e.g., `_C._dynamo.eval_frame`).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/132216
Approved by: https://github.com/ezyang
2024-08-01 12:04:59 +00:00
Animesh Jain
e772547d70 [dynamo][rename/refactor] Rename guard_source NN_MODULE to SPECIALIZED_NN_MODULE (#132302)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/132302
Approved by: https://github.com/yanboliang
2024-08-01 04:35:43 +00:00
Xuehai Pan
e74ba1b34a [BE][Easy][15/19] enforce style for empty lines in import segments in torch/_d*/ (#129767)
See https://github.com/pytorch/pytorch/pull/129751#issue-2380881501. Most changes are auto-generated by linter.

You can review these PRs via:

```bash
git diff --ignore-all-space --ignore-blank-lines HEAD~1
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/129767
Approved by: https://github.com/anijain2305
2024-07-31 21:18:11 +00:00
PyTorch MergeBot
945bf78894 Revert "[BE] typing for decorators - fx/_compatibility (#131568)"
This reverts commit 193f62fde9.

Reverted https://github.com/pytorch/pytorch/pull/131568 on behalf of https://github.com/clee2000 due to same as https://github.com/pytorch/pytorch/pull/131572#issuecomment-2254328359 but I clicked the wrong link by accident.  This is where it actually starts ([comment](https://github.com/pytorch/pytorch/pull/131568#issuecomment-2254330781))
2024-07-28 03:43:39 +00:00
William Wen
7d282d8755 [dynamo] add lazy IteratorVariable implementations for map and zip (#131413)
Fixes https://github.com/pytorch/pytorch/issues/130750.

Repro of lazy/eager `map` discrepancy without `islice`:
```python
    def fn(a, b):
        y = 1

        def f(x):
            nonlocal y
            y += 1
            return x

        l = list(zip([a, b], map(f, [1, 2, 3, 4])))
        return a + y
```

The major change is that we implement `MapVariable` and `ZipVariable` based on `IteratorVariable`. Before, `map` and `zip` were being traced by immediately unpacking the result as a `TupleVariable`, which is wrong in cases such as the example above.

`MapVariable`s are not allowed to be unpacked while `ZipVariable`s can only be unpacked if all of its iterables can also be unpacked.

We also add new `[has_]force_unpack_var_sequence` methods to `VariableTracker` for the case where it is safe to unpack the entire sequence lazily, e.g., when building a list from a map (i.e. `list(map(f, ...))`).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/131413
Approved by: https://github.com/anijain2305
2024-07-26 10:47:38 +00:00
Aaron Orenstein
193f62fde9 [BE] typing for decorators - fx/_compatibility (#131568)
See #131429

Pull Request resolved: https://github.com/pytorch/pytorch/pull/131568
Approved by: https://github.com/justinchuby, https://github.com/oulgen, https://github.com/zou3519
2024-07-25 22:24:19 +00:00
Adria Orenstein
f75d724482 Updating Types in torch/_dynamo/utils.py (#131001)
Adds some type annotations to the torch/_dynamo/utils.py file.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/131001
Approved by: https://github.com/aorenste
2024-07-23 18:25:52 +00:00
Michael Lazos
470f07c840 Add guard override capability for tensor subclass metadata (#130780)
Fixes https://github.com/pytorch/pytorch/issues/114405

Pull Request resolved: https://github.com/pytorch/pytorch/pull/130780
Approved by: https://github.com/anijain2305, https://github.com/bdhirsh
ghstack dependencies: #130779
2024-07-17 19:13:53 +00:00
Aaron Orenstein
4c3348932c typing: convert_frame (#130670)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/130670
Approved by: https://github.com/Skylion007
ghstack dependencies: #130669
2024-07-16 14:31:35 +00:00
Michael Lazos
d8616eb66a Mark nn_module params and buffers as static in dynamo (#130391)
This PR marks all buffers and parameters of an NNModule as static using the `mark_static_address` API. As a result, when tensors are passed to AOT, the `tensor_dict` metadata of placeholder nodes will contain the `static_address_type` key, indicating which graph argument positions are static for cudagraphs.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/130391
Approved by: https://github.com/anijain2305
2024-07-16 00:25:23 +00:00
Alex Dennis
7d4f50de19 dynamo add support for defaultdict(set) (#130745)
Fixes #130554

Pull Request resolved: https://github.com/pytorch/pytorch/pull/130745
Approved by: https://github.com/Skylion007
2024-07-15 22:23:33 +00:00
dshi7
d727e2f2d1 add total wall time in calculate_time_spent (#130611)
Fixes #ISSUE_NUMBER

Actual wall time is fwd_entire_frame_time + bwd_inductor_compile.  `calculate_time_spent` is accessed internally for monitoring use https://fburl.com/code/iiurj5m6.  However, summing values up lose the info of fwd/bwd.

This PR adds a new key of `total_wall_time` without affecting dynamo counters.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/130611
Approved by: https://github.com/oulgen, https://github.com/Yuzhen11
2024-07-12 19:32:44 +00:00
Xuehai Pan
973037be6a [BE][Easy] apply autofix for ruff rules unnecessary-collection-call (C408): list() / tuple() / dict() (#130199)
This PR changes the empty collection factory call to Python literals:

- `list()` -> `[]`
- `tuple()` -> `()`
- `dict()` -> `{}`

The Python literals are more performant and safer. For example, the bytecode for building an empty dictionary:

```bash
$ python3 -m dis - <<EOS
import collections

d1 = {}
d2 = dict()

dict = collections.OrderedDict
d3 = dict()
EOS
```

```text
  0           0 RESUME                   0

  1           2 LOAD_CONST               0 (0)
              4 LOAD_CONST               1 (None)
              6 IMPORT_NAME              0 (collections)
              8 STORE_NAME               0 (collections)

  3          10 BUILD_MAP                0
             12 STORE_NAME               1 (d1)

  4          14 PUSH_NULL
             16 LOAD_NAME                2 (dict)
             18 CALL                     0
             26 STORE_NAME               3 (d2)

  6          28 LOAD_NAME                0 (collections)
             30 LOAD_ATTR                8 (OrderedDict)
             50 STORE_NAME               2 (dict)

  7          52 PUSH_NULL
             54 LOAD_NAME                2 (dict)
             56 CALL                     0
             64 STORE_NAME               5 (d3)
             66 RETURN_CONST             1 (None)
```

The dict literal `{}` only has one bytecode `BUILD_MAP`, while the factory call `dict()` has three `PUSH_NULL + LOAD_NAME + CALL`. Also, the factory call is not safe if users override the `dict` name in `locals` or `globals` (see the example of replacing with `OrderedDict` above).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/130199
Approved by: https://github.com/malfet
2024-07-11 17:30:28 +00:00
Animesh Jain
c5c9dbece1 [dynamo][user-defined] Simplify and improve scope of UserDefinedObject var_getattr (#130169)
Fixes https://github.com/pytorch/pytorch/issues/122649

Pull Request resolved: https://github.com/pytorch/pytorch/pull/130169
Approved by: https://github.com/jansel
ghstack dependencies: #118448, #130159
2024-07-08 04:10:56 +00:00
Shunting Zhang
c0735a3dd3 [pt2-bench] fix accuracy failure for a few models (#129941)
This PR batch the fix for a few accuracy failures issues during training by raising tolerance. I do that only for models that I think it fails not due to real issue.

## sebotnet33ts_256

The accuracy test for this model start to fail around June 05 [link](https://hud.pytorch.org/benchmark/timm_models/inductor_with_cudagraphs?dashboard=torchinductor&startTime=Sun%2C%2002%20Jun%202024%2007%3A19%3A38%20GMT&stopTime=Tue%2C%2002%20Jul%202024%2007%3A19%3A38%20GMT&granularity=day&mode=training&dtype=amp&lBranch=main&lCommit=04a0d856207d83c2031e4b9cb6825ba3e0092850&rBranch=main&rCommit=e62925930f6a62f6aeeb1fe1a661a9bd3352b53d&model=sebotnet33ts_256).

I can not repro locally, but from the log from the dashboard:
```
RMSE (res-fp64): 0.09441, (ref-fp64): 0.02971 and shape=torch.Size([1536]). res.dtype: torch.float32, multiplier: 3.000000, tol: 0.040000
```
raising the tolerance should fix it.

## DebertaForQuestionAnswering

This model fails accuracy test on the dashboard only in max-autotune mode. I can not repro locally by command:
```
TORCHINDUCTOR_MAX_AUTOTUNE=1 time python benchmarks/dynamo/huggingface.py --accuracy --no-translation-validation --training --amp --backend inductor --device cuda --only DebertaForQuestionAnswering
```

From error message on the dashboard:
```
RMSE (res-fp64): 0.01803, (ref-fp64): 0.00537 and shape=torch.Size([2]). res.dtype: torch.float32, multiplier: 3.000000, tol: 0.010000
```

0.02 tolerance should suppress this error.

## gluon_inception_v3

This model fail on the dashboard in max-autotune mode. I can not repro locally by command
```
TORCHINDUCTOR_MAX_AUTOTUNE=1 time python benchmarks/dynamo/timm_models.py --accuracy --training --amp --backend inductor --disable-cudagraphs --device cuda --only gluon_inception_v3
```

From error message on the dashboard
```
RMSE (res-fp64): 0.02798, (ref-fp64): 0.00730 and shape=torch.Size([384]). res.dtype: torch.float32, multiplier: 3.000000, tol: 0.010000
Accuracy failed for key name Mixed_7c.branch3x3dbl_3a.bn.running_var
```
raising tolerance should suppress this error.

# mobilenetv3_large_100
Fail in MA model. I can not repro locally by command
```
TORCHINDUCTOR_MAX_AUTOTUNE=1 time python benchmarks/dynamo/timm_models.py --accuracy --training --amp --backend inductor --disable-cudagraphs --device cuda --only
```
The error message on the dashboard is
```
RMSE (res-fp64): 0.29754, (ref-fp64): 0.05205 and shape=torch.Size([]). res.dtype: torch.float32, multiplier: 3.000000, tol: 0.040000
```

The tensor is so small that the noise can be high. I use larger multiplier for smaller tensor in torch._dynamo.utils.same.

# yolov3

Fail on dashboard with error
```
Error on the dashboard: RMSE (res-fp64): 0.01278, (ref-fp64): 0.00246 and shape=torch.Size([256]). res.dtype: torch.float32, multiplier: 3.000000, tol: 0.001000
```

Fix it by using a larger multiplier for smaller tensors and raising the tolereance.

# timm_efficientdet

Fail on the dashboard with error
```
E0623 18:37:43.638000 139924418725056 torch/_dynamo/utils.py:1468] RMSE (res-fp64): 0.00096, (ref-fp64): 0.00009 and shape=torch.Size([2]). res.dtype: torch.float32, multiplier: 3.000000, tol: 0.001000
```
But I can not repro locally with command
```
time python benchmarks/dynamo/torchbench.py --backend inductor --amp --performance --only timm_efficientdet  --training
```

Raise the tolerance should fix.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/129941
Approved by: https://github.com/jansel
ghstack dependencies: #129996
2024-07-05 10:26:39 +00:00
Shunting Zhang
8f1c2e1e28 [pt2-bench] pass acc test if ref is NaN (#129996)
I'm debugging the accuracy failure for training vision_maskrcnn.

Unfortunately I could not succeed to run it locally (I've check pined commits for torchbenchmars/torchvision are correct, and reinstalled torchbenchmark for mask_rcnn). I get this error:
```
eager run fail: AssertionError: targets should not be none when in training mode
```
(Command: time python benchmarks/dynamo/torchbench.py --backend inductor --amp --performance --training --only vision_maskrcnn )

But look at the log from the dashboard
```
E0623 19:17:59.085000 140114670171328 torch/_dynamo/utils.py:1468] RMSE (res-fp64): nan, (ref-fp64): nan and shape=torch.Size([1024, 256, 1, 1]). res.dtype: torch.float32, multiplier: 3.000000, tol: 0.001000
```

We can see both the reference number and the pt2 number are NaN. I change torch._dynamo.utils.same to return true if both RMSE values are NaN.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/129996
Approved by: https://github.com/jansel
2024-07-05 10:26:39 +00:00
PyTorch MergeBot
6dfa53ca76 Revert "[pt2-bench] pass acc test if ref is NaN (#129996)"
This reverts commit 51fa0bd436.

Reverted https://github.com/pytorch/pytorch/pull/129996 on behalf of https://github.com/jeanschmidt due to Seems to have introduced breakages in main cuda12 focal jobs ([comment](https://github.com/pytorch/pytorch/pull/129996#issuecomment-2209175516))
2024-07-04 14:55:38 +00:00
PyTorch MergeBot
fa3953a2e1 Revert "[pt2-bench] fix accuracy failure for a few models (#129941)"
This reverts commit dafbd603ee.

Reverted https://github.com/pytorch/pytorch/pull/129941 on behalf of https://github.com/jeanschmidt due to Seems to have introduced breakages in main cuda12 focal jobs ([comment](https://github.com/pytorch/pytorch/pull/129996#issuecomment-2209175516))
2024-07-04 14:55:38 +00:00
Animesh Jain
a7a7363be0 [dynamo] Skip side effect tracking for c wrappers/descriptors (#129914)
Fixes PYTORCH_TEST_WITH_DYNAMO=1 pytest -vs test/test_python_dispatch.py::TestPythonDispatch::test_deepcopy_wrapper_subclass

Pull Request resolved: https://github.com/pytorch/pytorch/pull/129914
Approved by: https://github.com/jansel
ghstack dependencies: #129913
2024-07-04 03:14:45 +00:00
Shunting Zhang
dafbd603ee [pt2-bench] fix accuracy failure for a few models (#129941)
This PR batch the fix for a few accuracy failures issues during training by raising tolerance. I do that only for models that I think it fails not due to real issue.

## sebotnet33ts_256

The accuracy test for this model start to fail around June 05 [link](https://hud.pytorch.org/benchmark/timm_models/inductor_with_cudagraphs?dashboard=torchinductor&startTime=Sun%2C%2002%20Jun%202024%2007%3A19%3A38%20GMT&stopTime=Tue%2C%2002%20Jul%202024%2007%3A19%3A38%20GMT&granularity=day&mode=training&dtype=amp&lBranch=main&lCommit=04a0d856207d83c2031e4b9cb6825ba3e0092850&rBranch=main&rCommit=e62925930f6a62f6aeeb1fe1a661a9bd3352b53d&model=sebotnet33ts_256).

I can not repro locally, but from the log from the dashboard:
```
RMSE (res-fp64): 0.09441, (ref-fp64): 0.02971 and shape=torch.Size([1536]). res.dtype: torch.float32, multiplier: 3.000000, tol: 0.040000
```
raising the tolerance should fix it.

## DebertaForQuestionAnswering

This model fails accuracy test on the dashboard only in max-autotune mode. I can not repro locally by command:
```
TORCHINDUCTOR_MAX_AUTOTUNE=1 time python benchmarks/dynamo/huggingface.py --accuracy --no-translation-validation --training --amp --backend inductor --device cuda --only DebertaForQuestionAnswering
```

From error message on the dashboard:
```
RMSE (res-fp64): 0.01803, (ref-fp64): 0.00537 and shape=torch.Size([2]). res.dtype: torch.float32, multiplier: 3.000000, tol: 0.010000
```

0.02 tolerance should suppress this error.

## gluon_inception_v3

This model fail on the dashboard in max-autotune mode. I can not repro locally by command
```
TORCHINDUCTOR_MAX_AUTOTUNE=1 time python benchmarks/dynamo/timm_models.py --accuracy --training --amp --backend inductor --disable-cudagraphs --device cuda --only gluon_inception_v3
```

From error message on the dashboard
```
RMSE (res-fp64): 0.02798, (ref-fp64): 0.00730 and shape=torch.Size([384]). res.dtype: torch.float32, multiplier: 3.000000, tol: 0.010000
Accuracy failed for key name Mixed_7c.branch3x3dbl_3a.bn.running_var
```
raising tolerance should suppress this error.

# mobilenetv3_large_100
Fail in MA model. I can not repro locally by command
```
TORCHINDUCTOR_MAX_AUTOTUNE=1 time python benchmarks/dynamo/timm_models.py --accuracy --training --amp --backend inductor --disable-cudagraphs --device cuda --only
```
The error message on the dashboard is
```
RMSE (res-fp64): 0.29754, (ref-fp64): 0.05205 and shape=torch.Size([]). res.dtype: torch.float32, multiplier: 3.000000, tol: 0.040000
```

The tensor is so small that the noise can be high. I use larger multiplier for smaller tensor in torch._dynamo.utils.same.

# yolov3

Fail on dashboard with error
```
Error on the dashboard: RMSE (res-fp64): 0.01278, (ref-fp64): 0.00246 and shape=torch.Size([256]). res.dtype: torch.float32, multiplier: 3.000000, tol: 0.001000
```

Fix it by using a larger multiplier for smaller tensors and raising the tolereance.

# timm_efficientdet

Fail on the dashboard with error
```
E0623 18:37:43.638000 139924418725056 torch/_dynamo/utils.py:1468] RMSE (res-fp64): 0.00096, (ref-fp64): 0.00009 and shape=torch.Size([2]). res.dtype: torch.float32, multiplier: 3.000000, tol: 0.001000
```
But I can not repro locally with command
```
time python benchmarks/dynamo/torchbench.py --backend inductor --amp --performance --only timm_efficientdet  --training
```

Raise the tolerance should fix.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/129941
Approved by: https://github.com/jansel
ghstack dependencies: #129996
2024-07-04 01:14:29 +00:00
Shunting Zhang
51fa0bd436 [pt2-bench] pass acc test if ref is NaN (#129996)
I'm debugging the accuracy failure for training vision_maskrcnn.

Unfortunately I could not succeed to run it locally (I've check pined commits for torchbenchmars/torchvision are correct, and reinstalled torchbenchmark for mask_rcnn). I get this error:
```
eager run fail: AssertionError: targets should not be none when in training mode
```
(Command: time python benchmarks/dynamo/torchbench.py --backend inductor --amp --performance --training --only vision_maskrcnn )

But look at the log from the dashboard
```
E0623 19:17:59.085000 140114670171328 torch/_dynamo/utils.py:1468] RMSE (res-fp64): nan, (ref-fp64): nan and shape=torch.Size([1024, 256, 1, 1]). res.dtype: torch.float32, multiplier: 3.000000, tol: 0.001000
```

We can see both the reference number and the pt2 number are NaN. I change torch._dynamo.utils.same to return true if both RMSE values are NaN.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/129996
Approved by: https://github.com/jansel
2024-07-04 01:14:29 +00:00
Animesh Jain
514f9279f8 [dynamo][compile-time] Manually implement nn.Module.__getattr__ to reduce compile time (#129315)
# Compile time for eager backend
## AlbertForMaskedLM
No inlining - 3.65 seconds
Inlining on main - 7.48 seconds
Inlining + this PR - 6.70 seconds

## MobileBertForMaskedLM
No inlining - 26.90 seconds
Inlining on main - 48.21 seconds
Inlining + this PR - 43.85 seconds

*Next PR in the stack makes the total compile time better/comparable to no inlining*

Pull Request resolved: https://github.com/pytorch/pytorch/pull/129315
Approved by: https://github.com/jansel
ghstack dependencies: #129316
2024-06-25 01:31:26 +00:00
Simon Fan
f0443ad174 [compiled autograd] flatten runtime inputs with fast path (#129116)
covered by test_compiled_autograd.py and test_standalone_compile.py

Pull Request resolved: https://github.com/pytorch/pytorch/pull/129116
Approved by: https://github.com/jansel
ghstack dependencies: #127960, #128905, #128982, #128987, #129181
2024-06-21 08:16:33 +00:00
Simon Fan
123812790b [compiled autograd] update benchmarks to use cli flags for fullgraph/dynamic (#127960)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/127960
Approved by: https://github.com/jansel
2024-06-21 08:16:33 +00:00
Animesh Jain
6b5fbc544e [dynamo] Use polyfill to trace through the attributes of torch.jit.* and lru_cache_wrapper (#128336)
Earlier we were taking the vt for `obj` and then monkeypatching that `vt.source` to be `obj._torchdynamo_inline`. If one accesses `obj.attr_a`, this would cause problems because Dynamo would then search it in `obj._torchdynamo_inline.attr_a`. This PR makes it more functional, so that we have different vts for obj and `ob._torchdynamo_inline`.

Fixes https://github.com/pytorch/pytorch/issues/93698

Pull Request resolved: https://github.com/pytorch/pytorch/pull/128336
Approved by: https://github.com/jansel, https://github.com/yanboliang
ghstack dependencies: #129117
2024-06-21 07:44:44 +00:00
Yanbo Liang
acefc5c016 [torch.compile] Enable bwd compilation metrics (#128973)
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/128973
Approved by: https://github.com/dshi7
2024-06-19 03:45:41 +00:00
Simon Fan
4b96575a09 [dynamo][aot autograd] Silently disable default saved tensor hooks during tracing (#123196)
FIXES #113263. Same idea as in https://github.com/pytorch/pytorch/pull/113417, but we need a more intrusive C API to silently nop default saved tensor hooks, in order to support user-code that use torch.autograd.disable_saved_tensors_hooks (see test_unpack_hooks_can_be_disabled). We mock the output of get_hooks while leaving push/pop untouched.

For compiled autograd, we're firing pack hooks once and unpack hooks twice right now, I'll look into this separately from this issue.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/123196
Approved by: https://github.com/soulitzer
2024-06-14 20:28:08 +00:00
chilli
c486e2ab64 Add coloring to fx graph print out (#128476)
Note: Won't land immediately, at least I'll need to add a color option to the field. But curious if any tests fail.

Old:
<img width="1294" alt="image" src="https://github.com/pytorch/pytorch/assets/6355099/c3a750ed-5e54-4621-b2e4-be5481be15b6">

New:
<img width="1303" alt="image" src="https://github.com/pytorch/pytorch/assets/6355099/3a1f1adc-6f3a-413e-8b87-ee53da9bf4ed">

Pull Request resolved: https://github.com/pytorch/pytorch/pull/128476
Approved by: https://github.com/ezyang
2024-06-13 23:39:04 +00:00
rzou
87072dcfdb Change Dynamo's custom ops warning message to be less spammy (#128456)
This is a short-term fix (for 2.4). In the longer term we should
fix https://github.com/pytorch/pytorch/issues/128430

The problem is that warnings.warn that are inside Dynamo print
all the time. Python warnings are supposed to print once, unless their
cache is reset: Dynamo ends up resetting that cache everytime it runs.

As a workaround we provide our own warn_once cache that is keyed on the
warning msg. I am not worried about this increasing memory usage because
that's effectively what python's warnings.warn cache does.

Test Plan:
- fix tests.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/128456
Approved by: https://github.com/anijain2305
2024-06-12 21:57:12 +00:00
Animesh Jain
c0b87afcad [RELAND2][dynamo][nn-modules] Trace through nn.Module dunder methods for UnspecializedNNModule (#126578)
Tracing through `__init__`  is important because it initializes (calls STORE_ATTR) on members. By doing that, we kick in the mutation tracking for these objects. So, things like mutating `_modules` etc is tracked automatically.

Fixes https://github.com/pytorch/pytorch/issues/111837

Pull Request resolved: https://github.com/pytorch/pytorch/pull/126578
Approved by: https://github.com/jansel
2024-06-12 04:09:23 +00:00