Commit Graph

20 Commits

Author SHA1 Message Date
bobrenjc93
1fe3af2c68 Migrate from Tuple -> tuple in torch/_dynamo (#144261)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/144261
Approved by: https://github.com/aorenste, https://github.com/zou3519
2025-01-10 07:45:57 +00:00
Oguz Ulgen
9ee242213b [RFC] Introduce cache hot loading APIs (a.k.a. "Mega-cache") (#143341)
This PR essentially introduces two new APIs
* torch.compiler.save_cache_artifacts
* torch.compiler.load_cache_artifacts

which aim to create a mega cache experience where the user can start collecting cache artifacts, and later call the save API to fetch them. In the next attempt, the user can "hot load" the cache artifacts via the load function.

This bundling approach reduces the need to rely on porting individual files one by one, or relying on many network requests.

Note that these APIs CANNOT log to structured logging as these functions will be called before and after compilation, as opposed to during compilation. Due to this limitation, the API returns a struct that the user can log with.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/143341
Approved by: https://github.com/jansel
2025-01-07 23:13:24 +00:00
James Wu
f2d6cfa677 Introduce CompileEventLogger, replace usages of metrics_context and chromium_event with it (#143420)
**Problem statement**: I want to be able to centralize and simplify the process by which people add columns/data to existing spans. We have MetricsContext and ChromiumEventLogger, and there's various choices you can make to decide where and when to log different levels of observability for your events. To resolve this, I want a central API for "adding to events under dynamo_timed".

**CompileEventLogger** is intended as a frontend for MetricsContext and ChromiumEventLogger so we can use the same class for handling everything.

CompileEventLogger is intended be used within a `dynamo_timed()` context. Its purpose is to 1. log to existing events that are in progress (i.e. within dynamo_timed), and 2. log instant events to chromium that are independent of any specific span.

CompileEventLogger has three log levels:

- CHROMIUM: Log only to chromium events, visible via tlparse.
- PT2_COMPILE: Log to chromium_events + pt2_compile_events
- COMPILATION_METRIC: Log to compilation metrics in addition to the toplevel chromium and pt2_compile_event.

In addition, we have a function CompileEventLogger.add() that automagically chooses the correct log level. For now, it is conservative, and will never automagically choose to log CompilationMetrics (though I could imagine it figuring out the metadata are all keys in CompilationMetric and therefore loggable there).

The goal here is to make one single interface to log stuff for observability reasons, and make it as easy as possible.

Not included in this diff:
- V1 of this diff will not have implementations of `increment` and `add_to_set` which MetricsContext has, so those usages are not replaced yet. But I'll add those in a followup.

- We don't handle `RuntimeMetricsContext`. It's unclear if I want that to be part of this, because under RuntimeMetricsContext there might not be a toplevel event to log to, so chromium events doesn't make sense in that context. So I might leave that separate for now.

Differential Revision: [D67346203](https://our.internmc.facebook.com/intern/diff/D67346203/)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/143420
Approved by: https://github.com/aorenste
2025-01-04 22:40:34 +00:00
Colin L. Rice
a94f259a69 pgo: Log feature use (#142819)
This will cause dynamo_compile to popualte the feature column if we have
a hit for PGO.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/142819
Approved by: https://github.com/ezyang
2024-12-20 20:22:20 +00:00
Colin L. Rice
d68403df3b filelock: Make waitcounter variant to use (#139816)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/139816
Approved by: https://github.com/ezyang
2024-12-12 01:18:34 +00:00
PyTorch MergeBot
2374d460d0 Revert "filelock: Make waitcounter variant to use (#139816)"
This reverts commit 237c4b559c.

Reverted https://github.com/pytorch/pytorch/pull/139816 on behalf of https://github.com/clee2000 due to Sorry, I need to revert this in order to revert something else.  The only thing you need to do is rebase and remerge ([comment](https://github.com/pytorch/pytorch/pull/139816#issuecomment-2536616808))
2024-12-11 17:26:46 +00:00
Colin L. Rice
237c4b559c filelock: Make waitcounter variant to use (#139816)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/139816
Approved by: https://github.com/ezyang
2024-12-10 23:02:59 +00:00
Oguz Ulgen
0f6bfc58a2 Introduce remote cache key prefix to break cache (#142148)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/142148
Approved by: https://github.com/jamesjwu, https://github.com/ezyang
2024-12-10 00:35:50 +00:00
Edward Z. Yang
114a0bc306 Make PGO work correctly with NJT inputs (#140046)
We were actually triggering a latent bug where nested ints were
uselessly being incorporated into the automatic dynamic state, even
though they were unconditionally ignored afterwards.  Now we munge
them out before putting them in.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Differential Revision: [D65623303](https://our.internmc.facebook.com/intern/diff/D65623303)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/140046
Approved by: https://github.com/jbschlosser, https://github.com/bdhirsh
ghstack dependencies: #140042
2024-11-08 04:27:39 +00:00
James Wu
c35a01173b Remove compile event logging for automatic dynamic (#139891)
Summary: These events are a pretty large portion of the table, but not really currently used. Only log to tlparse for now.

Test Plan: Unit tests

Differential Revision: D65539986

Pull Request resolved: https://github.com/pytorch/pytorch/pull/139891
Approved by: https://github.com/Skylion007, https://github.com/ezyang
2024-11-07 14:52:10 +00:00
Edward Z. Yang
349cd49406 Fix compiler collective TORCH_TRACE and improve code state printing (#139716)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/139716
Approved by: https://github.com/yf225
2024-11-05 14:32:52 +00:00
Edward Yang
639162f39a Add cache size to pt2_compile_events (#139627)
Summary:
I realized I wanted to check "are my cache entries/IO unreasonably large"
and there's no easy way to do it.  This lets me do it.

Test Plan: servicelab

Differential Revision: D65390363

Pull Request resolved: https://github.com/pytorch/pytorch/pull/139627
Approved by: https://github.com/c00w
2024-11-05 00:30:10 +00:00
Edward Z. Yang
585dbfa583 Profile guided optimization for automatic_dynamic (#139001)
Previously: https://github.com/pytorch/pytorch/pull/138052 but the implementation is done from scratch, so I open a new PR.

This implements the ability to save and load profiles of automatic dynamic decisions, so on subsequent runs we can directly make something automatically dynamic. Unlike the previous implementation, this cache is never enabled by default; instead, you have to specify a "job id" that says it's OK to share results. We will be able to automatically populate this id for internal MAST jobs but for generic OSS users you will have to explicitly opt into it.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/139001
Approved by: https://github.com/oulgen
2024-11-03 06:29:57 +00:00
PyTorch MergeBot
92d7f29e59 Revert "Profile guided optimization for automatic_dynamic (#139001)"
This reverts commit f6be44c74e.

Reverted https://github.com/pytorch/pytorch/pull/139001 on behalf of https://github.com/ezyang due to more fbcode errors ([comment](https://github.com/pytorch/pytorch/pull/139001#issuecomment-2452985581))
2024-11-02 13:11:04 +00:00
Edward Z. Yang
f6be44c74e Profile guided optimization for automatic_dynamic (#139001)
Previously: https://github.com/pytorch/pytorch/pull/138052 but the implementation is done from scratch, so I open a new PR.

This implements the ability to save and load profiles of automatic dynamic decisions, so on subsequent runs we can directly make something automatically dynamic. Unlike the previous implementation, this cache is never enabled by default; instead, you have to specify a "job id" that says it's OK to share results. We will be able to automatically populate this id for internal MAST jobs but for generic OSS users you will have to explicitly opt into it.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Differential Revision: [D65065497](https://our.internmc.facebook.com/intern/diff/D65065497)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/139001
Approved by: https://github.com/oulgen
2024-11-02 11:50:11 +00:00
PyTorch MergeBot
8d1eaa3da6 Revert "Profile guided optimization for automatic_dynamic (#139001)"
This reverts commit a6630bcf87.

Reverted https://github.com/pytorch/pytorch/pull/139001 on behalf of https://github.com/ezyang due to internal code triggers import cycle ([comment](https://github.com/pytorch/pytorch/pull/139001#issuecomment-2452833882))
2024-11-02 03:38:15 +00:00
Edward Z. Yang
a6630bcf87 Profile guided optimization for automatic_dynamic (#139001)
Previously: https://github.com/pytorch/pytorch/pull/138052 but the implementation is done from scratch, so I open a new PR.

This implements the ability to save and load profiles of automatic dynamic decisions, so on subsequent runs we can directly make something automatically dynamic. Unlike the previous implementation, this cache is never enabled by default; instead, you have to specify a "job id" that says it's OK to share results. We will be able to automatically populate this id for internal MAST jobs but for generic OSS users you will have to explicitly opt into it.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Differential Revision: [D65065497](https://our.internmc.facebook.com/intern/diff/D65065497)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/139001
Approved by: https://github.com/oulgen
2024-11-01 21:43:25 +00:00
James Wu
c8a648d4df Add option to dynamo_timed and chromium_event_logger for logging pt2 compile events (#139309)
This diff considerably changes the column format of PT2 Compile Events:

- Now, instead of logging one new column per every piece of metadata, we just log a single column, "metadata". This vastly decreases the number of columns we need to log, which should help with retention.

- Now, we only log to scuba for a set of dynamo_timed() events that we actually care about aggregating. To do so, we add a boolean to dynamo_timed() that decides whether or not to log a pt2_compile_event. We'll always log a chromium_event for every dynamo_timed(), but only log a subset of those to scuba.

Differential Revision: [D65225598](https://our.internmc.facebook.com/intern/diff/D65225598/)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/139309
Approved by: https://github.com/oulgen
2024-11-01 02:40:25 +00:00
Edward Z. Yang
c480a479b1 Make automatic_dynamic state live per CodeId, rather than on code object (#138740)
This is semantics changing as if you are dealing with multiple code objects which have exactly the same filename/firstlineno/name, but are distinct objects, and need non-aliasing automatic dynamic state. Otherwise, this should be equivalent (modulo lifetime). I want to do this because when I do PGO I can't index on code object identity, need a stable identifier.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/138740
Approved by: https://github.com/bobrenjc93
ghstack dependencies: #138693, #138717
2024-10-27 03:08:41 +00:00
Edward Z. Yang
14a45d7793 Refactor core algorithm for automatic dynamic shapes (#138717)
While working on automatic dynamic PGO (https://github.com/pytorch/pytorch/pull/138052) one abstract property I was looking for out of profile information is that it formed a semilattice: I could join together two profiles and get a merged profile that is consistent with the profiles that I saw in both cases. While working on this data structure that supported joins, I realized that the base automatic dynamic algorithm could be implemented in this way, therefore this refactor.

The basic recipe is that we now support a join operation on FrameStateSizeEntry. Intuitively, if you join two sizes that are equal, you get back that size (join(2, 2) == 2), but if you join two different sizes you get a special singleton auto_dynamic indicating that the size of the tensor is dynamic (join(2, 3) == auto_dynamic). So now, the automatic dynamic algorithm is: (1) compute the FrameStateSizeEntry that corresponds to the concrete values we've seen, and (2) join it into the ambient FrameStateSizeEntry. As a bonus, compiler collectives can buy into the same abstraction (we're simply distributing FrameStateSizeEntry from each node to every other node). For convenience, I also added the necessary `auto_unset` extra state which is the identity element (which makes our semilattice bounded from both top and bottom). Here, join(2, auto_unset) == 2.

While doing this, there was a complication: the infer stride algorithm wasn't technically a semilattice. Here, I did what I suggested in the original code review https://github.com/pytorch/pytorch/pull/130232 which is stop using a heuristic, and instead replicate the stride inference algorithm in automatic dynamic. This means that when I join strides together, I don't join their concrete values, instead, if a stride can be inferred as the contiguous stride for a particular inner dimension, then you represent it as InferStride(dim). There's an example in code which I recommend looking at.

Some other extra things that are happening in this PR:

* I tried to deduplicate the size/stride automatic dynamic logic as much as possible. So hopefully less code to review here.
* I had to reimplement all the logging. For the most part I tried to track the logging as closely to the original as possible, but I think we could be emitting less Chrome events here
* The `marked_dynamic` handling is still preserved as is, but I kind of don't like it and we should figure out how to put it somewhere else

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/138717
Approved by: https://github.com/bobrenjc93
ghstack dependencies: #138693
2024-10-27 03:08:41 +00:00