pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 12:21:27 +01:00

Author	SHA1	Message	Date
Xuehai Pan	cdde73033e	[dynamo] fix generic namedtuple support when the class is created via `class MyTuple(NamedTuple, Generic[T]): ...` (#141360 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/141360 Approved by: https://github.com/jansel	2024-11-27 00:21:58 +00:00
Sam Larsen	07906f2f2b	[logging] Move population of common MetricsContext fields to record_compilation_metrics (#141291 ) Summary: Fix outstanding TODOs related to logging of CompilationMetrics by moving the population of common fields to record_compilation_metrics() instead of populating those independently wherever we use the metrics_context contextmanager: * Keep track of start and end time in MetricsContext and pass those to record_compilation_metrics() and populate those fields in that function. * Pass exception info to record_compilation_metrics() and populate those fields in that function. * Add a new contextmanager, chromium_event_timed, to create the start/end "dynamo" event. This is important because I want this contextmanager to complete _after_ building the CompilationMetrics. * Populate the compile_id field centrally in record_compilation_metrics(). * Populate the structured_logging_overhead centrally in record_compilation_metrics(). * Add the CompilationMetrics to the current chromium event in record_compilation_metrics(), after all common fields have been added. In a future diff, I can also add _all_ compilation metrics to the chromium event. Test plan: Unit tests. Also see internal testing: * dynamo_compile: https://fburl.com/scuba/dynamo_compile/sandbox/jrascnf9 * pt2_compile_events: https://fburl.com/scuba/pt2_compile_events/l3jnla06 * tlparse: https://fburl.com/bq5a9nqs Pull Request resolved: https://github.com/pytorch/pytorch/pull/141291 Approved by: https://github.com/jamesjwu	2024-11-25 13:18:40 +00:00
Colin L. Rice	1aea642393	pytorch/feature: Record if inductor fx cache is enabled (#141059 ) This uses the underlying infrastructure and records if the fx cache is enabled. Pull Request resolved: https://github.com/pytorch/pytorch/pull/141059 Approved by: https://github.com/masnesral	2024-11-23 01:55:27 +00:00
Jovian Anthony Jaison	45d62d6fc5	[dynamo] Added cuda and triton versions to dynamo_compile (#141290 ) Opening another PR since #141140 was reverted. Pull Request resolved: https://github.com/pytorch/pytorch/pull/141290 Approved by: https://github.com/masnesral	2024-11-22 20:04:42 +00:00
Simon Fan	db4e8a1d8a	[ca] expose option to collect sizes as dynamic (#141153 ) This is to address recompiles from eager nodes that saved dynamic activations Pull Request resolved: https://github.com/pytorch/pytorch/pull/141153 Approved by: https://github.com/jansel ghstack dependencies: #141152	2024-11-22 19:26:27 +00:00
Colin L. Rice	f5d00f1456	pytorch/features: Make a feature logger and record triton bundling (#141056 ) This modifies metrics_context to allow us to store whether a feature was used or not. This also starts recording this for triton bundling. Pull Request resolved: https://github.com/pytorch/pytorch/pull/141056 Approved by: https://github.com/masnesral	2024-11-22 01:31:08 +00:00
Prajesh Praveen Anchalia	4e34fbdcbc	Add inductor_fx_graph_cache stats to dynamo_utils (#141190 ) Summary: Add the following inductor fx graph cache stats to dynamo compile - inductor_fx_cache_hit_count - inductor_fx_cache_miss_count - inductor_fx_cache_backend_type - inductor_fx_cache_hit_keys - inductor_fx_cache_miss_keys - remote_cache_version Test Plan: Run local tests and staging logger: P1683061460 Differential Revision: D66232206 Pull Request resolved: https://github.com/pytorch/pytorch/pull/141190 Approved by: https://github.com/masnesral	2024-11-21 20:59:10 +00:00
Ivan Zaitsev	149677e30c	Revert "[dynamo] Added cuda and triton versions to dynamo_compile" (#141280 ) Reverts pytorch/pytorch#141140 reason: conflicts with https://github.com/pytorch/pytorch/pull/141190 and wasn't merged using mergebot Pull Request resolved: https://github.com/pytorch/pytorch/pull/141280 Approved by: https://github.com/clee2000, https://github.com/kit1980	2024-11-21 20:50:06 +00:00
Jovian Anthony Jaison	11d0ba068f	[dynamo] Added cuda and triton versions to dynamo_compile (#141140 ) Summary: Add cuda and triton versions to dynamo_compile logging site. Test Plan: $ buck2 run mode/opt //scripts/oulgen:runner File changed: fbcode//caffe2/torch/_dynamo/convert_frame.py Buck UI: https://www.internalfb.com/buck2/1a8ada1f-d54e-44b2-a368-b2ff2030e113 Network: Up: 65KiB Down: 0B (reSessionID-8f4d1d6d-a680-4ecc-8e73-c29c932d824b) Jobs completed: 2166. Time elapsed: 7.0s. Cache hits: 0%. Commands: 3 (cached: 0, remote: 0, local: 3) BUILD SUCCEEDED ... Cuda: 12.4.0 Triton: 3.0.0 Reviewed By: masnesral Differential Revision: D66181508	2024-11-21 12:20:02 -08:00
Aaron Gokaslan	12e95aa4ee	[BE]: Apply PERF401 autofixes from ruff (#140980 ) * Automatically applies ruff rule 401. Turns loops into equivalent list comprehensions which are faster and do not leak the scope of the loop variables. * list comprehensions not only often have better typing, but are 50+% faster than for loops on overhead. They also preserve length information etc and are better for the interpreter to optimize. * Manually went back and made mypy happy after the change. * Also fixed style lints in files covered by flake8 but not by pyfmt Pull Request resolved: https://github.com/pytorch/pytorch/pull/140980 Approved by: https://github.com/justinchuby, https://github.com/malfet	2024-11-20 17:52:07 +00:00
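
The rewrite described in the entry above can be sketched as follows. The function names are made up for illustration and are not taken from the PyTorch change; the sketch only shows the loop-to-comprehension transformation and the scoping difference, and makes no claim about the size of the speedup.

```python
def squares_with_loop(values):
    # Pattern flagged by ruff's PERF401: a for loop that only appends to a list.
    result = []
    for value in values:
        result.append(value * value)
    return result


def squares_with_comprehension(values):
    # Equivalent list comprehension, the form the entry above describes.
    return [value * value for value in values]


def loop_variable_after_loop():
    # A for-loop target stays bound in the enclosing function after the loop.
    for item in range(3):
        pass
    return item


def comprehension_variable_leaks():
    # A comprehension target is scoped to the comprehension itself, so looking
    # the name up afterwards raises NameError (assuming no global `element`).
    _ = [element for element in range(3)]
    try:
        element
    except NameError:
        return False
    return True
```

Both `squares_*` functions return the same list for the same input; the last two helpers make the "does not leak the loop variable" point observable.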
Sam Larsen	ff17d2b83e	[easy][logging] Remove dynamo_timed fwd_only param (#140993 ) Summary: It's ignored; remove it Test Plan: CI Pull Request resolved: https://github.com/pytorch/pytorch/pull/140993 Approved by: https://github.com/ezyang	2024-11-20 02:31:51 +00:00
Prajesh Praveen Anchalia	1e234e63b3	[pytorch][dynamo_compile] Log inductor config to dynamo_compile (#140790 ) Summary: Scrubbed inductor config logging to dynamo_compile as json:str. Scrub RE: `r'((^TYPE_CHECKING$)|(._progress$)|(.TESTING.)|(.(rocm|halide).)|(^trace\..)|(^_))'` to save some space. Test Plan: Staging logger: https://fburl.com/data/ltkt08zm P1679697917 {F1958428018} Differential Revision: D65806399 Pull Request resolved: https://github.com/pytorch/pytorch/pull/140790 Approved by: https://github.com/masnesral	2024-11-19 02:39:33 +00:00
James Wu	8d5b3eeaa6	Remove __start__ stack, log backward compile to empty stack (#140431 ) Summary: This diff removes "__start__" from all stacks in Pt2 Compile Events, as it's unnecessary. It also starts logging events for backward compile, because otherwise we have no toplevel event representing full backward compilation. This gives us a toplevel event outside of the inductor compile. Test Plan: New chromium events: https://interncache-all.fbcdn.net/manifold/perfetto-artifacts/tree/ui/index.html?url=https%3A%2F%2Finterncache-all.fbcdn.net%2Fmanifold%2Ftlparse_reports%2Ftree%2Flogs%2Fjjwu%2Fcustom%2Fstuff4%2Fchromium_events.json#!/viewer?url=https%3A%2F%2Finterncache-all.fbcdn.net%2Fmanifold%2Ftlparse_reports%2Ftree%2Flogs%2Fjjwu%2Fcustom%2Fstuff4%2Fchromium_events.json&local_cache_key New tlparse: https://interncache-all.fbcdn.net/manifold/tlparse_reports/tree/logs/jjwu/custom/stuff4/index.html New scuba icicle view, still good: https://fburl.com/scuba/pt2_compile_events/z6gr3z53 Differential Revision: D65832045 Pull Request resolved: https://github.com/pytorch/pytorch/pull/140431 Approved by: https://github.com/masnesral	2024-11-18 22:48:31 +00:00
Bob Ren	e1d6c08f3d	Specialize symfloats when getting fake value involves complex args (#140832 ) Fixed `PYTORCH_TEST_WITH_DYNAMO=1 tlp python test/test_sparse_csr.py TestSparseCSRCPU.test_sampled_addmm_cpu_complex64` when `specialize_float=False` Pull Request resolved: https://github.com/pytorch/pytorch/pull/140832 Approved by: https://github.com/ezyang ghstack dependencies: #140830	2024-11-17 18:17:54 +00:00
Xuehai Pan	90d3584147	[dynamo] support subclasses of namedtuple type (#140534 ) Allow subclassing namedtuple types. Allow assigning attributes to instances of these subtypes. Pull Request resolved: https://github.com/pytorch/pytorch/pull/140534 Approved by: https://github.com/jansel	2024-11-17 14:13:40 +00:00
Sam Larsen	e2e67a010a	[logging] Add dynamo_compile fields for pre-dispatch/joint/post-dispatch times (#140306 ) Tested internally: P1679622670 Differential Revision: [D65986059](https://our.internmc.facebook.com/intern/diff/D65986059) Pull Request resolved: https://github.com/pytorch/pytorch/pull/140306 Approved by: https://github.com/ezyang	2024-11-15 15:02:08 +00:00
Sam Larsen	b11ff3cf60	[logging] Overhaul dynamo_timed and CompilationMetrics logging. (#139849 ) Here's the overview: There's a new contextmanager singleton called MetricsContext. Entering the MetricsContext is how we demarcate the boundary on which we'll create a single CompilationMetrics object, and therefore, a single dynamo_compile log entry. While we're inside the MetricsContext, we can update/set many different metrics. Most importantly: `dynamo_timed` can also update the in-progress MetricsContext. In the proposal here, we tell `dynamo_timed` that we want it to do so by providing the name of the MetricsContext field to increment. There can be many `dynamo_timed` calls in different parts of the code updating different fields. Then when the MetricsContext exits, that's when the logging of everything gathered finally happens. One potential footgun is trying to use `dynamo_timed` when we haven't entered the MetricsContext, but we assert on that problem. Another problem is that we re-enter the context recursively, but we watch for that and do the logging only when the outermost exits. Some specifics: * Introduce MetricsContext - a context manager that on exit, records the CompilationMetrics (which also logs to dynamo_compile). * Completely remove the concept of frame_phase_timing. Instead, update the MetricsContext during compilation, either directly or via dynamo_timed. * Remove some globals we previously used to accumulate counters to later populate a CompilationMetrics. We use CompilationMetrics set/update/increment APIs instead. * `record_compilation_metrics` is now called on exit from MetricsContext. * Populate legacy CompilationMetrics fields right before logging, inside `record_compilation_metrics`. * Remove the one-off `add_remote_cache_time_saved` helper; capture that timing directly into the MetricsContext. And specifically, several changes to dynamo_timed: * "Modernize" the parameters and update all callsites accordingly. 
* Move the backwards logging of the CompilationMetrics to the backwards compile location. * Add a parameter for which CompilationMetrics field to update Pull Request resolved: https://github.com/pytorch/pytorch/pull/139849 Approved by: https://github.com/ezyang	2024-11-14 19:11:20 +00:00
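
To make the control flow described in this entry easier to follow, here is a minimal, self-contained sketch. It is not the code from the PR: `ToyMetricsContext`, `toy_timed`, and the field names `phase_a_us` / `phase_b_us` are hypothetical stand-ins, and the real `MetricsContext` and `dynamo_timed` may differ in signature and behavior. The sketch only illustrates three ideas from the message: metrics accumulate while the context is active, a timing helper increments a named field, and logging happens once, when the outermost context exits.

```python
import time
from contextlib import contextmanager


class ToyMetricsContext:
    """Hypothetical re-entrant context that reports metrics on outermost exit."""

    def __init__(self, on_exit):
        self._on_exit = on_exit  # callback that receives the gathered metrics
        self._level = 0          # nesting depth, to detect the outermost exit
        self._metrics = {}

    def __enter__(self):
        if self._level == 0:
            self._metrics = {}   # start a fresh record at the outermost entry
        self._level += 1
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self._level -= 1
        if self._level == 0:
            self._on_exit(dict(self._metrics))
        return False             # never swallow exceptions

    def in_progress(self):
        return self._level > 0

    def increment(self, name, value):
        # Guard against the footgun of updating metrics outside the context.
        assert self.in_progress(), "metric updated outside the context"
        self._metrics[name] = self._metrics.get(name, 0) + value


@contextmanager
def toy_timed(ctx, field):
    # Add the elapsed time of the wrapped block, in microseconds, to `field`.
    start_ns = time.perf_counter_ns()
    try:
        yield
    finally:
        ctx.increment(field, (time.perf_counter_ns() - start_ns) // 1000)


records = []
ctx = ToyMetricsContext(records.append)
with ctx:
    with toy_timed(ctx, "phase_a_us"):
        with ctx:  # recursive re-entry: must not trigger a second log entry
            with toy_timed(ctx, "phase_b_us"):
                pass
    with toy_timed(ctx, "phase_a_us"):  # second call increments the same field
        pass
```

After this runs, `records` holds a single dictionary with both fields, which mirrors the "one context, one log entry" rule in the message.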
Prajesh Praveen Anchalia	9ff368c270	[pytorch] Add logger for pt2 compile chromium events to hive (#139941 ) Summary: X-link: https://github.com/pytorch/benchmark/pull/2535 Logging raw chromium events to hive per job run enables us to build combined rank perfetto traces without having to depend on Logarithm and deal with things like rate limits etc. We can easily build a utility to query hive and upload traces to manifold and view them on perfetto Test Plan: Launch a job ``` buck2 run mode/opt //aps_models/examples/dlrm:dlrm_train_app -- --config-name train_mast_fsdp_torchdynamo launcher.data_project=apf_ai_infra launcher.fbl_entitlement=ai_infra_training_rnd_tc launcher.hardware=TC_ANY_80G ``` Local run ``` Perfetto: ['https://interncache-all.fbcdn.net/manifold/perfetto-artifacts/tree/ui/index.html?url=https://interncache-all.fbcdn.net/manifold/pt2_compile_traces_test/tree/pt2_trace_files/aps-ppanchalia-426838c277/0/0/2bc9975d-921c-4766-9cb2-e7ce9833ae96.json'] ``` {F1954710538} Differential Revision: D65525513 Pull Request resolved: https://github.com/pytorch/pytorch/pull/139941 Approved by: https://github.com/jamesjwu	2024-11-14 18:27:38 +00:00
Laith Sakka	f98c601efe	Avoid logging zeros (#139968 ) Summary: title Test Plan: NA Differential Revision: D65582953 Pull Request resolved: https://github.com/pytorch/pytorch/pull/139968 Approved by: https://github.com/zou3519	2024-11-14 15:46:49 +00:00
PyTorch MergeBot	d63eb3c46c	Revert "[logging] Overhaul dynamo_timed and CompilationMetrics logging. (#139849 )" This reverts commit `cb15c15157`. Reverted https://github.com/pytorch/pytorch/pull/139849 on behalf of https://github.com/kit1980 due to Breaking internal tests + there is a bug according to the author ([comment](https://github.com/pytorch/pytorch/pull/139849#issuecomment-2474459094))	2024-11-13 18:47:51 +00:00
Sam Larsen	cb15c15157	[logging] Overhaul dynamo_timed and CompilationMetrics logging. (#139849 ) Here's the overview: There's a new contextmanager singleton called MetricsContext. Entering the MetricsContext is how we demarcate the boundary on which we'll create a single CompilationMetrics object, and therefore, a single dynamo_compile log entry. While we're inside the MetricsContext, we can update/set many different metrics. Most importantly: `dynamo_timed` can also update the in-progress MetricsContext. In the proposal here, we tell `dynamo_timed` that we want it to do so by providing the name of the MetricsContext field to increment. There can be many `dynamo_timed` calls in different parts of the code updating different fields. Then when the MetricsContext exits, that's when the logging of everything gathered finally happens. One potential footgun is trying to use `dynamo_timed` when we haven't entered the MetricsContext, but we assert on that problem. Another problem is that we re-enter the context recursively, but we watch for that and do the logging only when the outermost exits. Some specifics: * Introduce MetricsContext - a context manager that on exit, records the CompilationMetrics (which also logs to dynamo_compile). * Completely remove the concept of frame_phase_timing. Instead, update the MetricsContext during compilation, either directly or via dynamo_timed. * Remove some globals we previously used to accumulate counters to later populate a CompilationMetrics. We use CompilationMetrics set/update/increment APIs instead. * `record_compilation_metrics` is now called on exit from MetricsContext. * Populate legacy CompilationMetrics fields right before logging, inside `record_compilation_metrics`. * Remove the one-off `add_remote_cache_time_saved` helper; capture that timing directly into the MetricsContext. And specifically, several changes to dynamo_timed: * "Modernize" the parameters and update all callsites accordingly. 
* Move the backwards logging of the CompilationMetrics to the backwards compile location. * Add a parameter for which CompilationMetrics field to update Pull Request resolved: https://github.com/pytorch/pytorch/pull/139849 Approved by: https://github.com/ezyang ghstack dependencies: #140094	2024-11-11 14:24:23 +00:00
Animesh Jain	86792a5a8d	[invoke_subgraph] User facing API to support arbitrary args and kwargs (#139162 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/139162 Approved by: https://github.com/zou3519	2024-11-08 03:31:19 +00:00
Bob Ren	d8afa21ef2	specialize symfloats for wrapped_gradient in get_fake_value (#139935 ) Fixes `PYTORCH_TEST_WITH_DYNAMO=1 python test/test_torch.py TestTorchDeviceTypeCPU.test_gradient_type_promotion_cpu` when `specialize_float=False` Reviewers might wonder why we need to have this whitelist. Can't we rely on python_arg_parser.h to do the specialization generically? Alas this path doesn't actually FFI to C++ so we do need to do the specialization in pythonland. Pull Request resolved: https://github.com/pytorch/pytorch/pull/139935 Approved by: https://github.com/ezyang ghstack dependencies: #139569, #139457, #139568, #139572, #139846, #139454, #139896	2024-11-07 20:27:02 +00:00
Oguz Ulgen	1270c78268	Add logging for num_triton_bundles (#139807 ) Summary: Adding logs for number of inductor cache triton bundles Test Plan: Ran adhoc code and looked at dynamo_compile/sandbox https://fburl.com/scuba/dynamo_compile/sandbox/nhktfy19 Differential Revision: D65490826 Pull Request resolved: https://github.com/pytorch/pytorch/pull/139807 Approved by: https://github.com/masnesral	2024-11-06 21:11:04 +00:00
Laith Sakka	3f248a5735	Classify miss-inplaced tensors in logs. (#139240 ) Summary: use signpost logs, a followup is to remove the field possibly_missed_reinplacing_opportunities from the dynamo compile table. Differential Revision: D65180194 Pull Request resolved: https://github.com/pytorch/pytorch/pull/139240 Approved by: https://github.com/zou3519	2024-11-04 23:56:14 +00:00
Bob Ren	9919932783	Specialize symfloats that flow through is_integer (#139572 ) Fixes `python test/dynamo/test_dynamic_shapes.py DynamicShapesFunctionTests.test_number_method_method_is_integer_num_type6_dynamic_shapes` when specialize_float = False Pull Request resolved: https://github.com/pytorch/pytorch/pull/139572 Approved by: https://github.com/ezyang ghstack dependencies: #139569, #139457, #139568	2024-11-04 23:35:35 +00:00
James Wu	c8a648d4df	Add option to dynamo_timed and chromium_event_logger for logging pt2 compile events (#139309 ) This diff considerably changes the column format of PT2 Compile Events: - Now, instead of logging one new column per every piece of metadata, we just log a single column, "metadata". This vastly decreases the number of columns we need to log, which should help with retention. - Now, we only log to scuba for a set of dynamo_timed() events that we actually care about aggregating. To do so, we add a boolean to dynamo_timed() that decides whether or not to log a pt2_compile_event. We'll always log a chromium_event for every dynamo_timed(), but only log a subset of those to scuba. Differential Revision: [D65225598](https://our.internmc.facebook.com/intern/diff/D65225598/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/139309 Approved by: https://github.com/oulgen	2024-11-01 02:40:25 +00:00
James Wu	864beebb41	[easy] Add start event metadata to collected metadata for PT2 Compile Events (#139289 ) We should be logging metadata from event starts to PT2 Compile Events too. Differential Revision: [D65070086](https://our.internmc.facebook.com/intern/diff/D65070086/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/139289 Approved by: https://github.com/oulgen	2024-10-31 16:52:30 +00:00
Simon Fan	fd9f4e6770	Back out "[compiled autograd] tls access helpers (#138061 )" and Back out "[compiled autograd] Compiled autograd configs in TLS (#137821 )" (#139086 ) Summary: Original commit changeset: 9bf80c1492d7 Original Phabricator Diff: D64796226 Original commit changeset: aa1d9ef8f6e6 Original Phabricator Diff: D64796212 Differential Revision: D65072644 Pull Request resolved: https://github.com/pytorch/pytorch/pull/139086 Approved by: https://github.com/malfet	2024-10-28 23:37:05 +00:00
William Wen	35be6aef69	[dynamo] add some cpython debugging methods (#138030 ) This PR enables you to inspect PyObjects in C using `INSPECT(...)` without requiring https://docs.python.org/3/howto/gdb_helpers.html. `torch._dynamo.eval_frame.raise_sigtrap` can also be used to set gdb breakpoints while running Python code, e.g. ```python x = x + 1 torch._dynamo.eval_frame.raise_sigtrap(); # can breakpoint on ceval.c:CALL to breakpoint the `sin` call in C. x = torch.sin(x) ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/138030 Approved by: https://github.com/jansel	2024-10-28 22:25:21 +00:00
Edward Z. Yang	bca696ae81	Switch times to us in CompilationMetrics and improvements (#138975 ) Companion logger diff: https://www.internalfb.com/diff/D65012523 * Using float seconds for timestamps is bad because our internal system defaults to float32 precision and you don't even get second precision for timestamps in float32 * We decide to use microseconds instead of milliseconds because at millisecond granularity you can end up with the same timestamp if compilation is happening very quickly; much better to force non-overlapping spans * Because there are so many new fields and I don't feel like reimplementing each on BwdCompilationMetrics, BwdCompilationMetrics is no more, it's just that everything in CompilationMetrics is now optional. * The actual frame compile times collection is not modified (still float) to reduce blast radius, so I just convert to microseconds before making the record. At float64 precision (Python's default), you get about microsecond precision on timestamps so this shouldn't be a data problem (https://www.leebutterman.com/2021/02/01/store-your-unix-epoch-times-as-float64.html) * I rename some entries for clarity. In particular, whenever a timing contains all of its lower phases (e.g., how Inductor also contains Triton compilation) we put "cumulative" in its name. If something doesn't happen at compile time but is delayed until we have actual real inputs, we put "runtime" in its name. Test plan: ``` buck2 run @mode/opt @mode/inplace //scripts/oulgen:runner ``` And then inspect https://fburl.com/scuba/dynamo_compile/sandbox/mslu7f5w and verify the us columns are populated and meaningful. Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/138975 Approved by: https://github.com/masnesral	2024-10-28 17:17:18 +00:00
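
The precision argument in the entry above can be checked with the standard library alone. This is a rough sketch, not code from the PR: `roundtrip_float32` and `seconds_to_us` are hypothetical helper names, and the timestamp is just an example epoch value from late 2024.

```python
import math
import struct


def roundtrip_float32(value):
    # Pack as IEEE 754 single precision and unpack again to see what survives.
    return struct.unpack("f", struct.pack("f", value))[0]


def seconds_to_us(seconds):
    # Hypothetical helper: convert float seconds to integer microseconds.
    return int(seconds * 1_000_000)


timestamp_s = 1729136266.0  # example epoch timestamp, in seconds

# At this magnitude adjacent float32 values are 128 seconds apart, so the
# round trip cannot preserve the timestamp to the second.
as_float32 = roundtrip_float32(timestamp_s)
float32_error_s = abs(as_float32 - timestamp_s)

# Python floats are float64; the gap between neighbouring values here is
# well under one microsecond.
float64_spacing_s = math.ulp(timestamp_s)

# Integer microseconds sidestep the float32 problem once logged as integers.
timestamp_us = seconds_to_us(timestamp_s)
```

Comparing `float32_error_s` with `float64_spacing_s` shows why the timings can stay as Python floats internally while the logged record uses integer microseconds.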
Aaron Gokaslan	49ed365b22	[BE]: Update Typeguard to TypeIs for better type inference (#133814 ) Uses TypeIs instead of TypeGuard for better inference. See https://peps.python.org/pep-0742/ Pull Request resolved: https://github.com/pytorch/pytorch/pull/133814 Approved by: https://github.com/ezyang	2024-10-26 15:07:13 +00:00
Sam Larsen	86b45bde19	[pt2] Add logger logging for remote fx graph cache get + put (#138164 ) Summary: Capture the timing for the remote fx graph cache get and put operations and add them to the logger logging. Test Plan: 1) Landed D64483593 and waited for logger actualization. 2) Ran test script on devserver: `buck2 run mode/opt scripts/slarsen/torch_compile_model:run` 3) Queried dynamo_compile/sandbox: ``` (pytorch-3.10_4) devvm2296:~/local/pytorch-3.10_4 $ scuba -e="select time,co_filename,remote_fx_graph_cache_get_time_s,remote_fx_graph_cache_put_time_s from \`dynamo_compile/sandbox\` where remote_fx_graph_cache_put_time_s is not null" +------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------+----------------------------------+ \| time \| co_filename \| remote_fx_graph_cache_get_time_s \| remote_fx_graph_cache_put_time_s \| +------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------+----------------------------------+ \| 1729136266 \| null \| 0.05652284622192383 \| 0.9691152572631836 \| \| 1729136263 \| /data/users/slarsen/fbsource/buck-out/v2/gen/fbcode/289bb46b326874c6/scripts/slarsen/torch_compile_model/__run__/run-inplace#link-tree/scripts/slarsen/torch_compile_model/run.py \| 0.8298435211181641 \| 0.18642282485961914 \| +------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------+----------------------------------+ ``` Reviewed By: oulgen Differential Revision: D64484025 Pull Request resolved: https://github.com/pytorch/pytorch/pull/138164 Approved by: 
https://github.com/jamesjwu, https://github.com/ezyang	2024-10-25 21:30:18 +00:00
Animesh Jain	cfdf658a91	[dynamo][modules] Support overridden __call__ on nn modules (#138619 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/138619 Approved by: https://github.com/williamwen42 ghstack dependencies: #138657	2024-10-24 03:49:26 +00:00
Animesh Jain	b1acd0978e	[dynamo] Support range_iterator as a function input (#138657 ) Fixes https://github.com/pytorch/pytorch/issues/138654 Pull Request resolved: https://github.com/pytorch/pytorch/pull/138657 Approved by: https://github.com/williamwen42, https://github.com/jansel	2024-10-24 03:49:26 +00:00
James Wu	a16476b671	Add support for adding extra metadata to chromium events, log to separate columns (#138477 ) This diff does a few things: ## Add metadata to events in progress Adds the ability to add extra metadata to Chromium Events via `add_event_data`. Metadata can only be added to chromium events that have started, but not ended (so, in progress events) - When you add the data, the metadata is appended to the metadata when you call log_event_end(). - The metadata appears in chromium events in tlparse. It also gets logged to scuba. ## New `dynamo` chromium event We add a new `dynamo` chromium event to the top of the stack, where we collect various metadata found in dynamo_compile. So the new order of events goes: ``` __start__ -> dynamo (dynamo compile metrics) -> entire_frame_compile (compile.inner) -> backend_compile (i.e. aotdispatch) -> create_aot_dispatch_function -> inductor_compile -> ... ``` BackwardCompilationMetrics doesn't have any dynamo specific information (as it's mostly inductor timings). So we don't include that here. FAQ: Why can't we use `entire_frame_compile` as the event? This is mostly due to backward compatibility with `dynamo_compile`. `dynamo_compile` collects CompilationMetrics outside of `compile.compile_inner`, and uses `dynamo_timed` to grab timings from phases of the compiler, including `entire_frame_compile`. So we don't have a CompilationMetric object until after an `entire_frame_compile` event ends! Separately, `dynamo` as a name for all of dynamo compile is more descriptive than `entire_frame_compile`, imo. ## Log metadata as separate columns (Meta only): Separately, this also changes the `metadata` column in PT2 Compile Events. Instead of logging a single metadata column in JSON, it separates the JSON into separate columns. This is much better for data analysis. 
Now that this table is more mature, I think logging keys to separate columns is a better system. Differential Revision: [D64696287](https://our.internmc.facebook.com/intern/diff/D64696287/) NOTE FOR REVIEWERS: This PR has internal Meta-specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D64696287/)! Pull Request resolved: https://github.com/pytorch/pytorch/pull/138477 Approved by: https://github.com/aorenste	2024-10-22 21:17:44 +00:00
Simon Fan	5a13282c75	[compiled autograd] tls access helpers (#138061 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/138061 Approved by: https://github.com/yf225 ghstack dependencies: #137953, #137821	2024-10-22 08:03:52 +00:00
Simon Fan	49fa437097	[compiled autograd] Compiled autograd configs in TLS (#137821 ) Multithreaded doesn't work yet, this adds python side TLS only for the python side state Pull Request resolved: https://github.com/pytorch/pytorch/pull/137821 Approved by: https://github.com/jansel, https://github.com/yf225 ghstack dependencies: #137953	2024-10-22 08:03:52 +00:00
Sam Larsen	a80b87353c	[pt2] Log is_forward field to dynamo_compile scuba table (#138505 ) Differential Revision: [D64711721](https://our.internmc.facebook.com/intern/diff/D64711721) Pull Request resolved: https://github.com/pytorch/pytorch/pull/138505 Approved by: https://github.com/oulgen	2024-10-22 05:50:49 +00:00
Tom Ritchford	8ad191ae21	[dynamo] Replace __str__ with __repr__ in some places (#136316 ) ## The problem In a typical debugger, `repr()` is used to display variables and not `str()`. Several classes in Dynamo have a `__str__()` method that returns useful information and a `__repr__()` that does not. Having to call `str(x)` or `[str(i) for i in x]` in the debugger all the time is a chore. `str()` should be ["informal, nicely printable"](https://docs.python.org/3/library/stdtypes.html#str) and `repr()` should ["attempt to return a string that would yield an object with the same value when passed to eval()](https://docs.python.org/3/library/functions.html#repr)". ## The solution In the Python object model, if there is no `__str__` method, `__repr__` is used instead (but not the other way around). So renaming `__str__` to `__repr__` in a few cases where no `__repr__` method exists now should not change observable behavior, and should make debugging easier. The specific classes changed were all in `torch._dynamo.variables`: * `builtin.BuiltinVariable` * `constant.ConstantVariable` * `constant.EnumVariable` * `functions.UserMethodVariable` * `lazy.LazyVariableTracker` * `lazy.LazySymNodeFormatString` * `misc.GetAttrVariable` * `misc.NullVariable` * `user_defined.UserDefinedObjectVariable` Pull Request resolved: https://github.com/pytorch/pytorch/pull/136316 Approved by: https://github.com/XuehaiPan, https://github.com/jansel	2024-10-21 19:50:38 +00:00
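
The fallback rule the entry above relies on can be seen in a few lines of plain Python. The two classes here are hypothetical and unrelated to the Dynamo classes listed in the message; they only show that `str()` falls back to `__repr__`, while `repr()` does not fall back to `__str__`.

```python
class OnlyStr:
    # Defines __str__ only: repr() still uses the default object repr.
    def __str__(self):
        return "OnlyStr(useful)"


class OnlyRepr:
    # Defines __repr__ only: str() falls back to it automatically.
    def __repr__(self):
        return "OnlyRepr(useful)"


only_str = OnlyStr()
only_repr = OnlyRepr()

str_of_only_str = str(only_str)      # uses __str__
repr_of_only_str = repr(only_str)    # default "<... object at 0x...>" form
str_of_only_repr = str(only_repr)    # falls back to __repr__
repr_of_only_repr = repr(only_repr)  # uses __repr__

# Containers format their elements with repr(), which is also what
# debuggers typically display.
list_of_only_repr = str([only_repr])
list_of_only_str = str([only_str])
```

Only the `OnlyRepr` variant gives useful output in all four cases and inside containers, which is the motivation given in the message for the rename.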
PyTorch MergeBot	32d4582e02	Revert "[BE]: Update Typeguard to TypeIs for better type inference (#133814 )" This reverts commit `16caa8c1b3`. Reverted https://github.com/pytorch/pytorch/pull/133814 on behalf of https://github.com/jeanschmidt due to checking if this will solve inductor errors ([comment](https://github.com/pytorch/pytorch/pull/133814#issuecomment-2427565425))	2024-10-21 19:40:58 +00:00
Aaron Orenstein	07cc4bd3e2	typing compile_fx.py (#138033 ) Type annotations for compile_fx. - Some of the stuff here is pretty complicated (functions which return functions that take functions) so I bailed on those and used `Any` just to get the rest landed. - There are also changes to type signatures in other files which I did just to let mypy know more about the types in compile_fx.py. Pull Request resolved: https://github.com/pytorch/pytorch/pull/138033 Approved by: https://github.com/Skylion007	2024-10-21 18:14:59 +00:00
Aaron Gokaslan	16caa8c1b3	[BE]: Update Typeguard to TypeIs for better type inference (#133814 ) Uses TypeIs instead of TypeGuard for better inference. See https://peps.python.org/pep-0742/ Pull Request resolved: https://github.com/pytorch/pytorch/pull/133814 Approved by: https://github.com/ezyang	2024-10-21 17:20:06 +00:00
Michael Lazos	a20a17fd6f	[Dynamo] Disable torch function compilation during guard execution and in compiled bytecode (#137669 ) Fixes https://github.com/pytorch/pytorch/issues/114369 Pull Request resolved: https://github.com/pytorch/pytorch/pull/137669 Approved by: https://github.com/anijain2305	2024-10-19 04:12:45 +00:00
PyTorch MergeBot	47e4045566	Revert "[pt2] Log is_forward field to dynamo_compile scuba table (#138097 )" This reverts commit `4e9273c84e`. Reverted https://github.com/pytorch/pytorch/pull/138097 on behalf of https://github.com/huydhn due to Sorry for reverting your change, but I think it has a land race with https://github.com/pytorch/pytorch/pull/137803 ([comment](https://github.com/pytorch/pytorch/pull/138097#issuecomment-2423297516))	2024-10-18 22:00:40 +00:00
James Wu	295de00908	[PT2 Compile Events] Revamp PT2 Compile/chromium event logging [1/?] (#138093 ) This diff is the starting steps of https://docs.google.com/document/u/2/d/1kAEBt4AyW7HTAhXHbjoz8FBFHNyyEA2Qo2mPn7v3WUQ/edit?usp=drive_web&ouid=113555078003219714709 It implements the following changes: - Only log spans to scuba, so no start events are ever logged - Log events as the full event name, without "START" or "END" - Only log to scuba major phases from chromium events. These are: - entire_frame_compile (dynamo) - backend_compile (aotdispatch) - inductor_compile (inductor) - codegen (inductor codegen) Tlparse chromium events stay basically the same. But I implemented a few changes to clean that up as well: - When there's a phase name available, log the phase name instead of the function name as the event name. This simplifies the trace to not have two identical rows. The fn_name is available as metadata on the chromium event, if interested - Log new events for pre and post grad passes. These do not log to scuba. By making the phases much simpler in Scuba, with only categories for major phases of PT2 Compilation, we pave the way to add much more metadata and information to each individual event type. Diffs for that will come later. IMPLEMENTATION NOTES: - The logic for `log_chromium_event_internal` (which is the function that logs to Scuba) lives in chromium_events for now, but in the future as we add more metadata, it may belong independently in dynamo_timed or even outside of dynamo_timed. I haven't explored in detail what the refactor will look like. Once we start logging metadata for dynamo, aotdispatch, inductor, I suspect we will call log_pt2_compile_event directly, instead of making chromium event logger handle the pt2_compile_event logic. But that refactor is left for another PR on top of this one. - There's an interesting space after pre grad passes within AOT autograd logic, that's between create_aot_dispatcher_function and pre grad passes. 
I'm not sure what we're spending time doing in that time, but I'll find out with a profile later. Differential Revision: [D64479033](https://our.internmc.facebook.com/intern/diff/D64479033/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/138093 Approved by: https://github.com/ezyang	2024-10-18 20:36:08 +00:00
Sam Larsen	4e9273c84e	[pt2] Log is_forward field to dynamo_compile scuba table (#138097 ) Summary: ^^ Test Plan: Ran a test script out of fbcode: D64350202. Then: ``` (pytorch-3.10_4) devvm2296:~/fbcode $ scuba -e="select time,co_filename,is_forward from \`dynamo_compile/sandbox\` where is_forward is not null" +------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------+ \| time \| co_filename \| is_forward \| +------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------+ \| 1729032583 \| /data/users/slarsen/fbsource/buck-out/v2/gen/fbcode/1638b36e975169f6/scripts/slarsen/torch_compile_model/__run__/run-inplace#link-tree/scripts/slarsen/torch_compile_model/run.py \| 1 \| \| 1729032583 \| null \| 0 \| \| 1729032650 \| /data/users/slarsen/fbsource/buck-out/v2/gen/fbcode/1638b36e975169f6/scripts/slarsen/torch_compile_model/__run__/run-inplace#link-tree/scripts/slarsen/torch_compile_model/run.py \| 1 \| \| 1729032650 \| null \| 0 \| +------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------+ 4 row(s) in set (0 warnings, 131 errors, 0.80 sec) ``` Reviewed By: ezyang Differential Revision: D64438144 Pull Request resolved: https://github.com/pytorch/pytorch/pull/138097 Approved by: https://github.com/ezyang	2024-10-18 18:48:52 +00:00
PyTorch MergeBot	361f42bc42	Revert "[compiled autograd] Compiled autograd configs in TLS (#137821 )" This reverts commit `9aba0b91c8`. Reverted https://github.com/pytorch/pytorch/pull/137821 on behalf of https://github.com/wdvr due to Reverting this for now, it is failing test_public_bindings in trunk ([comment](https://github.com/pytorch/pytorch/pull/137821#issuecomment-2417351788))	2024-10-16 16:38:29 +00:00
Simon Fan	9aba0b91c8	[compiled autograd] Compiled autograd configs in TLS (#137821 ) Multithreaded doesn't work yet, this adds python side TLS only for the python side state Pull Request resolved: https://github.com/pytorch/pytorch/pull/137821 Approved by: https://github.com/jansel, https://github.com/yf225 ghstack dependencies: #137953	2024-10-16 09:28:32 +00:00
PyTorch MergeBot	4557f6e339	Revert "[Dynamo] Disable torch function compilation during guard execution and in compiled bytecode (#137669 )" This reverts commit `bf0b670598`. Reverted https://github.com/pytorch/pytorch/pull/137669 on behalf of https://github.com/huydhn due to Sorry for reverting your change, but it is failing test_public_bindings in trunk, maybe a landrace ([comment](https://github.com/pytorch/pytorch/pull/137669#issuecomment-2415331274))	2024-10-15 23:22:58 +00:00
Michael Lazos	bf0b670598	[Dynamo] Disable torch function compilation during guard execution and in compiled bytecode (#137669 ) Fixes https://github.com/pytorch/pytorch/issues/114369 Pull Request resolved: https://github.com/pytorch/pytorch/pull/137669 Approved by: https://github.com/anijain2305	2024-10-15 20:52:58 +00:00
Jovian Anthony Jaison	6001b16597	Add entire _dynamo.config as a json for logging (#137216 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/137216 Approved by: https://github.com/ezyang Co-authored-by: Aaron Gokaslan <aaronGokaslan@gmail.com>	2024-10-12 11:48:59 +00:00
William Wen	93bbc8abcc	[dynamo, 3.13] use 3.13 multiline traceback in get_instruction_source_311 (#137617 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/137617 Approved by: https://github.com/jansel	2024-10-10 20:19:27 +00:00
Michael Lazos	d5785d4295	[Dynamo] Handle torch function subclass/mode dispatch on generic tensor methods (#137119 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/137119 Approved by: https://github.com/williamwen42, https://github.com/anijain2305 ghstack dependencies: #137114, #137115, #137116, #137117, #137120, #137227	2024-10-09 02:29:40 +00:00
Michael Lazos	108b469f78	[Dynamo] Remove ignored modes workaround (#135502 ) (#137115 ) Approved by: https://github.com/anijain2305 ghstack dependencies: #134732, #133137, #135443, #135444, #135422 Pull Request resolved: https://github.com/pytorch/pytorch/pull/137115 Approved by: https://github.com/yanboliang ghstack dependencies: #137114	2024-10-09 02:29:40 +00:00
PyTorch MergeBot	8c937445ee	Revert "[Dynamo] Remove ignored modes workaround (#135502 ) (#137115 )" This reverts commit `b1fd7708bd`. Reverted https://github.com/pytorch/pytorch/pull/137115 on behalf of https://github.com/huydhn due to The top of the stack has been reverted but it leaves trunk in a broken state, so I try to revert the rest of the stack ([comment](https://github.com/pytorch/pytorch/pull/137114#issuecomment-2400765603))	2024-10-08 20:33:17 +00:00
James Wu	3bf6594d13	Log compile ids to pt2_remote_cache and pt2_compile_events (#137431 ) Log the current compilation id for all relevant samples for these two tables, so we can have a 1:1 analog with dynamo_compile. Differential Revision: [D63900826](https://our.internmc.facebook.com/intern/diff/D63900826/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/137431 Approved by: https://github.com/oulgen	2024-10-08 18:04:48 +00:00
PyTorch MergeBot	c88c0e6c65	Revert "[Dynamo] Handle torch function subclass/mode dispatch on generic tensor methods (#137119 )" This reverts commit `d255b34c0a`. Reverted https://github.com/pytorch/pytorch/pull/137119 on behalf of https://github.com/malfet due to Need to revert to be able to revert https://github.com/pytorch/pytorch/pull/136910 ([comment](https://github.com/pytorch/pytorch/pull/137119#issuecomment-2400401262))	2024-10-08 17:09:26 +00:00
Michael Lazos	d255b34c0a	[Dynamo] Handle torch function subclass/mode dispatch on generic tensor methods (#137119 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/137119 Approved by: https://github.com/williamwen42 ghstack dependencies: #137114, #137115, #137116, #137117, #137120, #137227	2024-10-07 18:55:26 +00:00
Michael Lazos	b1fd7708bd	[Dynamo] Remove ignored modes workaround (#135502 ) (#137115 ) Approved by: https://github.com/anijain2305 ghstack dependencies: #134732, #133137, #135443, #135444, #135422 Pull Request resolved: https://github.com/pytorch/pytorch/pull/137115 Approved by: https://github.com/yanboliang ghstack dependencies: #137114	2024-10-07 18:55:26 +00:00
Jovian Anthony Jaison	59d7cf7342	Add _dynamo.config inline_inbuilt_nn_modules and specialize_float logging (#137139 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/137139 Approved by: https://github.com/ezyang	2024-10-02 19:58:38 +00:00
Animesh Jain	289df45cee	Revert "[Dynamo] Trace enter/exit of TorchFunctionModes (#135422 )" (#136590 ) This reverts commit `7743149b2b`. Reverts * https://github.com/pytorch/pytorch/pull/135503 * https://github.com/pytorch/pytorch/pull/135502 * https://github.com/pytorch/pytorch/pull/135422 This passes this test. Earlier, the getitem would stay like a getitem in the Fx graph. But now the fake tensor propagations fails saying that .item is called. It seems that torch function is not getting triggered while fake tensor propagation. ``` import torch from torch.nn.attention.flex_attention import BlockMask, _mask_mod_signature, _score_mod_signature, flex_attention from torch._inductor.lowering import make_pointwise, register_lowering from torch._inductor.virtualized import ops from torch.nn.attention.flex_attention import create_block_mask torch.set_default_device('cuda') flex_attention = torch.compile(flex_attention, dynamic=False) prefix_lengths = torch.arange(8) def prefix_lm(b, h, q, kv): return prefix_lengths[b] >= kv mask = create_block_mask(prefix_lm, 8, None, 512, 512, _compile=True) ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/136590 Approved by: https://github.com/Chillee	2024-09-25 21:10:43 +00:00
Animesh Jain	dacf0c4884	[dynamo] Do not treat user defined nn module attributes static for dynamic shape infra (#136516 ) Fixes https://github.com/pytorch/pytorch/issues/136254 The regression was introduced in https://github.com/pytorch/pytorch/pull/132736 where originally we were trying to fix another regression. This PR and the offending PR together say - "treat user defined nn module attributes as automatic dynamic, but for cudagraphs they will be considered static". This avoids recompilations. This can lead to a cudagraph recording, which is ok. This also maintains the state before inline_inbuilt_nn_modules flag was introduced. Pull Request resolved: https://github.com/pytorch/pytorch/pull/136516 Approved by: https://github.com/williamwen42	2024-09-24 18:26:12 +00:00
Jovian Anthony Jaison	09715638ab	Add _dynamo.config.suppress_errors logging (#136379 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/136379 Approved by: https://github.com/ezyang	2024-09-21 21:00:26 +00:00
James Wu	803ce507f1	Log structured logging overhead to dynamo compile (kinda) (#136142 ) Summary: X-link: https://github.com/pytorch/benchmark/pull/2454 This adds structured logging overhead at a per compile basis to compilation metrics. To do so, we track the frame_id_frame_compile_id that trace_structured uses to categorize compiles, and use that as the key in our timing table. Implementation notes: - If there are times we call trace_structured without a compile id, the time won't be measured. Not really a good way around that today given the compile id framework of compilation metrics. Strobelight is still the best way to measure on a per job basis. - We don't actually measure the time it takes to log the compilation metrics itself. Fundamentally, it's not possible to log this properly if we're storing the logging number in compilation metrics, since there's no way to measure it before we do it (unless we want discrepancies between dynamo_compile and tlparse, which seems suboptimal). Hopefully for a large job, the cost of structured_logging compilation metrics itself is small. - I wanted to use frame_phase_timing here, but there's a bunch of ids to iron out, and I don't really want to deal with that headache. compilation_time_metrics is sort of what I want, but that isn't by frame/compile id, so it's also a bit off. Putting it into torch.logging as a separate thing so logging tracks its own overhead seems fine, though. Test Plan: Run benchmarks/nanogpt and staging logger. See that the new compilation metric is logged to the staged dynamo_compile table: https://fburl.com/scuba/logger_staging_jjwu_30582a48f1ff9cf5f4ac50a4c40af/xazjg5xq Note that the sum(structured_logging_overhead_s) / sum(entire_frame_compile_time) = 8.387 / 124.278 = 6%, which seems reasonable as the overhead for a small compilation like this. You can also look at samples for a more detailed log of this. Reviewed By: oulgen Differential Revision: D62643611 Pull Request resolved: https://github.com/pytorch/pytorch/pull/136142 Approved by: https://github.com/bobrenjc93	2024-09-19 16:11:38 +00:00
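The per-compile timing table described in #136142 can be pictured with a small sketch. Everything here is an assumption made for illustration: `overhead_s`, `measure_logging`, the injectable `clock` parameter, and the `"0_0"` key format are hypothetical names, not the PyTorch implementation. The sketch only shows the two stated behaviours: time is accumulated under a frame/compile key, and calls without a compile id are not measured.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

# Hypothetical timing table: compile key -> accumulated logging seconds.
overhead_s = defaultdict(float)


@contextmanager
def measure_logging(compile_key, clock=time.perf_counter):
    """Accumulate the time spent inside the block under compile_key."""
    if compile_key is None:
        # No compile id available: the time is simply not measured.
        yield
        return
    start = clock()
    try:
        yield
    finally:
        # Add the elapsed time even if the logging call raised.
        overhead_s[compile_key] += clock() - start


# Usage: wrap each structured-logging call made for a given compile.
with measure_logging("0_0"):
    pass  # a structured log record would be emitted here
```

Keeping the table keyed by compile id is what lets the total be attached to one compilation's metrics afterwards.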
Sun, Jiayi	701ba5203f	[Inductor] Increase multiplier to 3 for Inductor AMP FP16 benchmark correctness check (#135932 ) Fix https://github.com/pytorch/pytorch/issues/135657. Aligned with AMP BF16, using multiplier 3 for Inductor AMP FP16 benchmark correctness check Pull Request resolved: https://github.com/pytorch/pytorch/pull/135932 Approved by: https://github.com/CaoE, https://github.com/jgong5, https://github.com/jansel	2024-09-18 13:03:45 +00:00
Aaron Gokaslan	31715be72a	[BE]: Update mypy to 1.11.2 (#133816 ) Updates mypy to 1.11.1 to improve type inference Pull Request resolved: https://github.com/pytorch/pytorch/pull/133816 Approved by: https://github.com/ezyang	2024-09-16 19:44:11 +00:00
PyTorch MergeBot	3117f2cf67	Revert "[BE]: Update mypy to 1.11.2 (#133816 )" This reverts commit `55299cfc22`. Reverted https://github.com/pytorch/pytorch/pull/133816 on behalf of https://github.com/jeanschmidt due to seems to have broken https://github.com/pytorch/pytorch/actions/runs/10865710499/job/30155699792 on main ([comment](https://github.com/pytorch/pytorch/pull/133816#issuecomment-2352377684))	2024-09-16 09:11:16 +00:00
Aaron Gokaslan	55299cfc22	[BE]: Update mypy to 1.11.2 (#133816 ) Updates mypy to 1.11.1 to improve type inference Pull Request resolved: https://github.com/pytorch/pytorch/pull/133816 Approved by: https://github.com/ezyang	2024-09-14 21:40:36 +00:00
Michael Lazos	860838e9be	[Dynamo] Remove ignored modes workaround (#135502 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/135502 Approved by: https://github.com/anijain2305 ghstack dependencies: #134732, #133137, #135443, #135444, #135422	2024-09-14 18:52:22 +00:00
Michael Lazos	5c5c33ac32	[Dynamo] Trace torch function modes entered outside of torch.compile (#133137 ) This PR adds initial tracing for torch function modes. Details: In essence, this adds tracing into the torch function of modes entered outside of the torch.compile call. This does not yet support tracing enter/exit of a torch function mode/ tracing set_default_device properly using the new mode infra (this will be a very good stress test for modes). I am adding more PRs to this stack to support these. The overall plan is to support tracing enter/exit and handling graph breaks like we do other torch.* context managers. Previously landed: https://github.com/pytorch/pytorch/pull/133135 https://github.com/pytorch/pytorch/pull/133136 https://github.com/pytorch/pytorch/pull/133134 https://github.com/pytorch/pytorch/pull/133133 https://github.com/pytorch/pytorch/pull/133132 https://github.com/pytorch/pytorch/pull/133131 https://github.com/pytorch/pytorch/pull/133729 https://github.com/pytorch/pytorch/pull/133130 Pull Request resolved: https://github.com/pytorch/pytorch/pull/133137 Approved by: https://github.com/jansel, https://github.com/zou3519 ghstack dependencies: #134732	2024-09-14 18:52:22 +00:00
PyTorch MergeBot	8c8a3086a7	Revert "[Dynamo] Trace torch function modes entered outside of torch.compile (#133137 )" This reverts commit `4528777e03`. Reverted https://github.com/pytorch/pytorch/pull/133137 on behalf of https://github.com/mlazos due to broke python test/quantization/pt2e/test_numeric_debugger.py TestNumericDebugger.test_re_export_preserve_handle modified yesterday ([comment](https://github.com/pytorch/pytorch/pull/134732#issuecomment-2350937008))	2024-09-14 10:02:55 +00:00
PyTorch MergeBot	838c912502	Revert "[Dynamo] Remove ignored modes workaround (#135502 )" This reverts commit `5c67cf180e`. Reverted https://github.com/pytorch/pytorch/pull/135502 on behalf of https://github.com/mlazos due to broke python test/quantization/pt2e/test_numeric_debugger.py TestNumericDebugger.test_re_export_preserve_handle modified yesterday ([comment](https://github.com/pytorch/pytorch/pull/134732#issuecomment-2350937008))	2024-09-14 10:02:55 +00:00
Michael Lazos	5c67cf180e	[Dynamo] Remove ignored modes workaround (#135502 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/135502 Approved by: https://github.com/anijain2305 ghstack dependencies: #134732, #133137, #135443, #135444, #135422	2024-09-14 02:41:16 +00:00
Michael Lazos	4528777e03	[Dynamo] Trace torch function modes entered outside of torch.compile (#133137 ) This PR adds initial tracing for torch function modes. Details: In essence, this adds tracing into the torch function of modes entered outside of the torch.compile call. This does not yet support tracing enter/exit of a torch function mode/ tracing set_default_device properly using the new mode infra (this will be a very good stress test for modes). I am adding more PRs to this stack to support these. The overall plan is to support tracing enter/exit and handling graph breaks like we do other torch.* context managers. Previously landed: https://github.com/pytorch/pytorch/pull/133135 https://github.com/pytorch/pytorch/pull/133136 https://github.com/pytorch/pytorch/pull/133134 https://github.com/pytorch/pytorch/pull/133133 https://github.com/pytorch/pytorch/pull/133132 https://github.com/pytorch/pytorch/pull/133131 https://github.com/pytorch/pytorch/pull/133729 https://github.com/pytorch/pytorch/pull/133130 Pull Request resolved: https://github.com/pytorch/pytorch/pull/133137 Approved by: https://github.com/jansel, https://github.com/zou3519 ghstack dependencies: #134732	2024-09-14 02:40:43 +00:00
James Wu	ad2f0e9f81	Add remote cache time saved to compilation metrics (#135490 ) Summary: Record remote cache time saved via frame_phase_timing We add to the "phase" when remote cache hits and saves us time, so that we have a 1:1 correspondence between a frame and time saved. Test Plan: Internally run benchmark, see that it's populated in sandbox table after previous diff lands and logger config is actualized. Show that column exists in table: https://fburl.com/scuba/logger_staging_jjwu_30582a48f1ff9cf5f4ac50a4c40af/fp2te0ff Note that an earlier version of D62105258 had the column as a string so the staging table is a bit messed up. But you can see the most recent samples have the column populated as a float. Reviewed By: aorenste Differential Revision: D62106921 Pull Request resolved: https://github.com/pytorch/pytorch/pull/135490 Approved by: https://github.com/aorenste	2024-09-13 16:35:51 +00:00
PyTorch MergeBot	eb7dd91dd1	Revert "[Dynamo] Trace torch function modes entered outside of torch.compile (#133137 )" This reverts commit `fafdd588f2`. Reverted https://github.com/pytorch/pytorch/pull/133137 on behalf of https://github.com/albanD due to Broke tests on main ([comment](https://github.com/pytorch/pytorch/pull/134732#issuecomment-2348886378))	2024-09-13 12:52:58 +00:00
PyTorch MergeBot	fca58bfda1	Revert "[Dynamo] Remove ignored modes workaround (#135502 )" This reverts commit `7d5e0dd4b1`. Reverted https://github.com/pytorch/pytorch/pull/135502 on behalf of https://github.com/albanD due to Broke tests on main ([comment](https://github.com/pytorch/pytorch/pull/134732#issuecomment-2348886378))	2024-09-13 12:52:57 +00:00
Michael Lazos	7d5e0dd4b1	[Dynamo] Remove ignored modes workaround (#135502 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/135502 Approved by: https://github.com/anijain2305 ghstack dependencies: #134732, #133137, #135443, #135444, #135422	2024-09-13 08:41:32 +00:00
Michael Lazos	fafdd588f2	[Dynamo] Trace torch function modes entered outside of torch.compile (#133137 ) This PR adds initial tracing for torch function modes. Details: In essence, this adds tracing into the torch function of modes entered outside of the torch.compile call. This does not yet support tracing enter/exit of a torch function mode/ tracing set_default_device properly using the new mode infra (this will be a very good stress test for modes). I am adding more PRs to this stack to support these. The overall plan is to support tracing enter/exit and handling graph breaks like we do other torch.* context managers. Previously landed: https://github.com/pytorch/pytorch/pull/133135 https://github.com/pytorch/pytorch/pull/133136 https://github.com/pytorch/pytorch/pull/133134 https://github.com/pytorch/pytorch/pull/133133 https://github.com/pytorch/pytorch/pull/133132 https://github.com/pytorch/pytorch/pull/133131 https://github.com/pytorch/pytorch/pull/133729 https://github.com/pytorch/pytorch/pull/133130 Pull Request resolved: https://github.com/pytorch/pytorch/pull/133137 Approved by: https://github.com/jansel, https://github.com/zou3519 ghstack dependencies: #134732	2024-09-13 08:41:00 +00:00
PyTorch MergeBot	183c32fd3b	Revert "[Dynamo] Trace torch function modes entered outside of torch.compile (#133137 )" This reverts commit `0d15122092`. Reverted https://github.com/pytorch/pytorch/pull/133137 on behalf of https://github.com/clee2000 due to something in this stack broke functorch/test_control_flow.py::TestControlFlow::test_scan_simple_graph [GH job link](https://github.com/pytorch/pytorch/actions/runs/10804912306/job/29980571390) [HUD commit link](`444b52ff40`), newly added test yesterday ([comment](https://github.com/pytorch/pytorch/pull/133137#issuecomment-2344054339))	2024-09-11 15:57:00 +00:00
Michael Lazos	0d15122092	[Dynamo] Trace torch function modes entered outside of torch.compile (#133137 ) This PR adds initial tracing for torch function modes. Details: In essence, this adds tracing into the torch function of modes entered outside of the torch.compile call. This does not yet support tracing enter/exit of a torch function mode/ tracing set_default_device properly using the new mode infra (this will be a very good stress test for modes). I am adding more PRs to this stack to support these. The overall plan is to support tracing enter/exit and handling graph breaks like we do other torch.* context managers. Previously landed: https://github.com/pytorch/pytorch/pull/133135 https://github.com/pytorch/pytorch/pull/133136 https://github.com/pytorch/pytorch/pull/133134 https://github.com/pytorch/pytorch/pull/133133 https://github.com/pytorch/pytorch/pull/133132 https://github.com/pytorch/pytorch/pull/133131 https://github.com/pytorch/pytorch/pull/133729 https://github.com/pytorch/pytorch/pull/133130 Pull Request resolved: https://github.com/pytorch/pytorch/pull/133137 Approved by: https://github.com/jansel, https://github.com/zou3519 ghstack dependencies: #134732	2024-09-11 04:18:22 +00:00
William Wen	a4030e37be	[dynamo] reland map/zip iterator related changes (#135074 ) Differential Revision: [D62211019](https://our.internmc.facebook.com/intern/diff/D62211019) Pull Request resolved: https://github.com/pytorch/pytorch/pull/135074 Approved by: https://github.com/jansel, https://github.com/anijain2305, https://github.com/mlazos	2024-09-06 20:38:02 +00:00
Animesh Jain	32f45f01a9	[dynamo] Retire CompileProfiler (#135133 ) Fixes confusion in https://github.com/pytorch/pytorch/issues/113443 We have TORCH_LOGS that supersedes CompileProfiler Pull Request resolved: https://github.com/pytorch/pytorch/pull/135133 Approved by: https://github.com/ezyang ghstack dependencies: #135039, #135121, #135129, #135130	2024-09-05 01:08:40 +00:00
Michael Lazos	d9ae92cd6e	[Dynamo] Support for proxying frozen dataclasses (#134846 ) Fixes https://github.com/pytorch/pytorch/issues/133858 Details: Previously Dynamo would treat dataclasses as UserDefinedVariables. This was non-desirable if we would like to proxy the value into the graph, which is needed for TensorSubclassMetadata. To rectify this, frozen dataclasses are now able to be proxied similarly to NamedTuples. We require the object to be frozen, because if arbitrary mutation were allowed, we would need to replay those mutations in the graph after construction of the object. For tracing construction of the variable, the generated `__init__` for the dataclass uses `object.__setattr__` because frozen dataclasses throw errors on the usual `__setattr__` invocation. With this treatment, no special handling is needed in dynamo for frozen dataclass construction. Pull Request resolved: https://github.com/pytorch/pytorch/pull/134846 Approved by: https://github.com/bdhirsh, https://github.com/anijain2305	2024-09-04 22:17:00 +00:00
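The point in #134846 about `object.__setattr__` is standard-library behaviour and can be checked without Dynamo. In this minimal sketch the class `Meta` and its fields are made up for illustration. It shows that ordinary attribute assignment on a frozen dataclass raises, while `object.__setattr__` bypasses the frozen check, which is how the generated `__init__` fills in the fields.

```python
import dataclasses


@dataclasses.dataclass(frozen=True)
class Meta:  # hypothetical example class, not a PyTorch type
    shape: tuple
    requires_grad: bool = False


m = Meta(shape=(2, 3))

# The usual __setattr__ path is blocked on a frozen dataclass.
try:
    m.shape = (4,)
    mutated = True
except dataclasses.FrozenInstanceError:
    mutated = False

# object.__setattr__ skips the frozen check; the generated __init__
# relies on the same call to populate fields during construction.
probe = Meta(shape=(1,))
object.__setattr__(probe, "shape", (9,))
```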
Edward Z. Yang	0cbcef12bd	Stop adding useless prefix to error message here, you're pushing the important info off the screen. (#133108 ) Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/133108 Approved by: https://github.com/Skylion007	2024-09-01 23:11:17 +00:00
Shunting Zhang	1e92d7b688	[inductor] move loop ordering after fusion (#126254 ) Restart the work from PR https://github.com/pytorch/pytorch/pull/100331 in this new PR since it's hard to rebase. It would be expected that some code is copy/pasted from the previous PR and the main idea is the same. Previously we saw a relatively large compilation time increase due to too many loop orders being considered. This PR tries to continue the work by doing pruning and only considering loop orders that we know for sure are relevant (i.e. do it on demand). Some manually created cases where loop ordering matters are added as unit tests. The PR can make sure inductor does not miss fusion opportunities for them. This PR should solve the failure-to-fuse problem in https://github.com/pytorch/pytorch/issues/130015 Right now there is still a significant increase in compilation time. I'll disable the feature by default. Later on after the compilation time issue is resolved, I'll enable it by default. Pull Request resolved: https://github.com/pytorch/pytorch/pull/126254 Approved by: https://github.com/jansel	2024-08-29 21:50:07 +00:00
Laith Sakka	d6091c8726	Add compile time instruction count metric (#133834 ) PYTHONPATH=$(pwd) python benchmarks/update_hint_benchmark.py out as of this diff, compile_time_instruction_count counts the number of instruction from within convert_frame.compile_inner ``` update_hint_regression,compile_time_instruction_count,10522459165 ``` will add result from CI once populated. Pull Request resolved: https://github.com/pytorch/pytorch/pull/133834 Approved by: https://github.com/aorenste	2024-08-27 23:29:02 +00:00
James Wu	f8fbfe5846	Always emit end events even on failure, use thread local storage for stack (#134279 ) Summary: We should always emit an end event in a finally block so that if a unit test or job fails, the stack is still correct. Also, we use thread local storage for the stack, so that in multithreaded scenarios the stack will still be correctly added. Test Plan: Run benchmark and see that everything still works Run ``` TORCH_LOGS=dynamo buck run test/functorch:test_aotdispatch -- -r test_backward_mutation_on_grad_out ``` With some extra logging to see that start events with the correct stack are emitted, and the end events are also emitted even though the test fails at runtime. Differential Revision: D61682556 Pull Request resolved: https://github.com/pytorch/pytorch/pull/134279 Approved by: https://github.com/aorenste	2024-08-23 18:13:13 +00:00
Animesh Jain	fee677eeb6	[fbode-testing][dynamo][reland][inline-inbuilt-nn-modules] Mark attri… (#134136 ) Shuai wants to test this internally before https://github.com/pytorch/pytorch/pull/133713 can go in. Creating a separate PR for ghmport. Pull Request resolved: https://github.com/pytorch/pytorch/pull/134136 Approved by: https://github.com/yanboliang	2024-08-22 17:54:58 +00:00
Aaron Orenstein	d95aedf5fd	[BE] typing for decorators - fx/_compatibility (part 1) (#134202 ) Part of #134054. This corresponds to the pytorch mypy changes from D61493706. Updating takes so long and touches so many files that it's impossible to land as a whole without conflicting with some other intermediate change. So landing these 'type: ignore' for pytorch in advance of them actually being needed. Pull Request resolved: https://github.com/pytorch/pytorch/pull/134202 Approved by: https://github.com/Skylion007	2024-08-22 17:07:33 +00:00
James Wu	3c5485fb7f	[Retry] Log chromium events to scuba (#134118 ) Summary: This diff implements a bunch of views for internal scuba viewing. TODOS that I might punt to another diff: - Saving cache stats via counter is definitely sus here, but there's not really a good way to track "fx graph cache hit for this compile phase" right now. Will think about this more. - We should definitely log frame id, compile id, etc - We should definitely be logging configs. That way, we can A/B test based on whether a config is turned on. - idk what I'm doing with compile_uuid yet, but it's useful when you want to look at samples for a single run. I think if we had mast job info this field is not needed, but it's nice to be able to drill down to a single run and get its chrome trace view or icicle view, so idk Test Plan: All of the above views are run with nanogpt benchmark: ``` buck run mode/opt caffe2/benchmarks/dynamo:torchbench -- --training --backend=inductor --only nanogpt --performance ``` Differential Revision: D61603243 Pull Request resolved: https://github.com/pytorch/pytorch/pull/134118 Approved by: https://github.com/oulgen	2024-08-22 14:59:45 +00:00
PyTorch MergeBot	2db28a9611	Revert "[BE]: Update Typeguard to TypeIs for better type inference (#133814 )" This reverts commit `bce0caba78`. Reverted https://github.com/pytorch/pytorch/pull/133814 on behalf of https://github.com/ezyang due to root cause of internal failures not addressed ([comment](https://github.com/pytorch/pytorch/pull/133814#issuecomment-2302466444))	2024-08-21 16:13:34 +00:00
PyTorch MergeBot	68425e68fe	Revert "[dynamo][reland][inline-inbuilt-nn-modules] Mark attributes of nn mod… (#133714 )" This reverts commit `e8d3c4be36`. Reverted https://github.com/pytorch/pytorch/pull/133714 on behalf of https://github.com/anijain2305 due to fails internally ([comment](https://github.com/pytorch/pytorch/pull/133714#issuecomment-2302171472))	2024-08-21 14:21:06 +00:00
Aaron Gokaslan	bce0caba78	[BE]: Update Typeguard to TypeIs for better type inference (#133814 ) Uses TypeIs instead of TypeGuard for better inference. See https://peps.python.org/pep-0742/ Pull Request resolved: https://github.com/pytorch/pytorch/pull/133814 Approved by: https://github.com/ezyang	2024-08-20 17:19:57 +00:00
PyTorch MergeBot	42097f0ec1	Revert "[BE]: Update Typeguard to TypeIs for better type inference (#133814 )" This reverts commit `cf60fe53a8`. Reverted https://github.com/pytorch/pytorch/pull/133814 on behalf of https://github.com/jeanschmidt due to Broke 12k internal signals/jobs, @ezyang please help get those changes merged. More details check D61488368 ([comment](https://github.com/pytorch/pytorch/pull/133814#issuecomment-2298210309))	2024-08-20 08:02:49 +00:00
Michael Lazos	c0b4aaa8c5	[Dynamo] Support pop torch function mode stack (#133131 ) This PR adds support for tracing `torch._C._pop_torch_function_stack()` without graph breaking and in order to verify the state change also adds replay of mutations to the torch function mode stack via side_effects appending supplemental bytecode as we do for other python mutable objects. Details: To represent the torch function mode stack symbolically a deque field is added to the instruction translator. When the InstructionTranslator is initialized, all modes are read from the current torch function mode stack, and stashed in a global weak ref for later access (using existing sources) without needing to push/pop the python/cpp torch function mode stack. During tracing, when `_pop_torch_function_stack` is encountered a value is popped from this deque and the variable tracker representing the mode is returned. To ensure the true torch function mode stack matches this state, `TorchFunctionModeStackVariable`, a singleton, is marked as mutated, this adds it to side effects, where during final codegen, side effects will codegen a call to a python helper which will update the python torch function mode stack. Pull Request resolved: https://github.com/pytorch/pytorch/pull/133131 Approved by: https://github.com/jansel ghstack dependencies: #133130, #133729	2024-08-20 07:14:42 +00:00
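A rough way to picture the approach in #133131 is a symbolic copy of the stack that is mutated during tracing, with the real stack updated only when side effects are replayed. The sketch below assumes a plain Python list stands in for the real torch function mode stack. `SymbolicModeStack`, `pop`, and `replay` are hypothetical names, and none of the bytecode generation is modelled.

```python
from collections import deque


class SymbolicModeStack:
    """Hypothetical stand-in for the deque kept on the instruction translator."""

    def __init__(self, real_stack):
        self.real_stack = real_stack        # stand-in for the true mode stack
        self.symbolic = deque(real_stack)   # snapshot read once at start of tracing
        self.mutated = False                # plays the role of "marked as mutated"

    def pop(self):
        # During tracing only the symbolic copy changes.
        self.mutated = True
        return self.symbolic.pop()

    def replay(self):
        # At the end of tracing, bring the real stack in line with the symbolic one.
        if self.mutated:
            self.real_stack[:] = list(self.symbolic)


real = ["mode_a", "mode_b", "mode_c"]
tracer_view = SymbolicModeStack(real)
popped = tracer_view.pop()   # the real stack is untouched at this point
tracer_view.replay()         # now the real stack reflects the pop
```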
Michael Lazos	09e366cb57	[Dynamo] Add torch function mode stack guard to dynamo (#133130 ) This PR adds a guard on the torch function mode stack state at the beginning of tracing. The way this is implemented is via a new leaf guard which is passed the initial stack state at construction and compares it to the stack state at the time the guard is run. Details: The stack state is extracted via popping all modes, appending them to a list, and pushing all modes back. This list is stored on the output graph and read during guard construction to pass to the stack mode guard. There the length and types of the modes are recorded. Next time the guard is run it compares this recorded state to the current mode stack state. To implement this in python a helper function was added to utils.py and this is used if cpp guards are not enabled. Pull Request resolved: https://github.com/pytorch/pytorch/pull/133130 Approved by: https://github.com/anijain2305	2024-08-20 07:14:33 +00:00
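The guard described in #133130 records the length and types of the modes at trace time and compares them with the stack at guard time. In the sketch below a plain list stands in for the mode stack, and `snapshot_stack`, `ModeStackGuard`, `ModeA`, and `ModeB` are hypothetical names chosen for illustration. The sketch does not call the real guard API.

```python
def snapshot_stack(stack):
    """Pop every mode, collect them, then push them all back in order."""
    popped = []
    while stack:
        popped.append(stack.pop())
    popped.reverse()          # restore bottom-to-top order
    stack.extend(popped)      # the stack ends up unchanged
    return list(popped)


class ModeStackGuard:
    """Hypothetical leaf guard: remembers the mode types seen at construction."""

    def __init__(self, initial_stack):
        self.expected_types = [type(mode) for mode in initial_stack]

    def check(self, current_stack):
        # The guard fails if the number of modes or any mode type differs.
        if len(current_stack) != len(self.expected_types):
            return False
        return all(
            type(mode) is expected
            for mode, expected in zip(current_stack, self.expected_types)
        )


class ModeA:  # placeholder mode classes for the example
    pass


class ModeB:
    pass


stack = [ModeA(), ModeB()]
guard = ModeStackGuard(snapshot_stack(stack))
```

Comparing types rather than object identity means a fresh instance of the same mode class still passes the check in this sketch.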
Animesh Jain	e8d3c4be36	[dynamo][reland][inline-inbuilt-nn-modules] Mark attributes of nn mod… (#133714 ) Relands https://github.com/pytorch/pytorch/pull/132539 Relands https://github.com/pytorch/pytorch/pull/132736 Pull Request resolved: https://github.com/pytorch/pytorch/pull/133714 Approved by: https://github.com/jansel	2024-08-20 05:57:52 +00:00
Aaron Gokaslan	cf60fe53a8	[BE]: Update Typeguard to TypeIs for better type inference (#133814 ) Uses TypeIs instead of TypeGuard for better inference. See https://peps.python.org/pep-0742/ Pull Request resolved: https://github.com/pytorch/pytorch/pull/133814 Approved by: https://github.com/ezyang	2024-08-18 19:10:16 +00:00

538 Commits