Commit Graph

293 Commits

Author SHA1 Message Date
William Wen
cf6d1e6824 [dynamo] add generic graph break hints (#147429)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/147429
Approved by: https://github.com/jansel, https://github.com/zou3519
ghstack dependencies: #147385
2025-02-26 09:20:28 +00:00
William Wen
3fd68e4e2f [dynamo] make some more graph break messages readable in English [2/N] (#147385)
This is for "for some large number Z, make sure the error messages are readable English." - beginning to audit all `unimplemented` sites and making sure that all messages are at least English-readable. Hints may not necessarily be provided.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/147385
Approved by: https://github.com/jansel
2025-02-26 09:20:28 +00:00
Simon Fan
ed83b0b70b [ddp] decouple python reducer from compilation mode (#147123)
Current implementation reads as: we will only actually use the "python_reducer" config if the DDP forward is compiled. Otherwise, we will silently fallback to C++ reducer + no DDPOptimizer.
I'm changing this behavior to always use the python reducer if the config is specified.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/147123
Approved by: https://github.com/fegin
2025-02-19 15:51:40 +00:00
William Wen
63e8ad49b8 [dynamo] replace hardcoded eval frame control flags skip_code_recursive_flag/cache_limit_hit_flag (#146355)
This PR and the previous:
- Moves parts of `eval_frame.c` to C++.
- Reduces code duplication in `dynamo__custom_eval_frame` and makes the control flow more clear.
- Enables `convert_frame` to signal to `eval_frame.cpp` in a general manner how to evaluate this frame, recursive frames, and future frames with the same code object (default/compile, skip, run-only). e.g. this will allow us to change skipping/cache limit hit eval_frame behavior directly from convert_frame without requiring changes to C/C++.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/146355
Approved by: https://github.com/jansel
ghstack dependencies: #145603
2025-02-18 21:37:12 +00:00
zeshengzong
c6b331f7d9 Deprecate skip_code_recursive_on_cache_limit_hit config flag (#136970)
Fixes one of #136862

Make `skip_code_recursive_on_cache_limit_hit` flag deprecated.

Affected logic is in here:
6931c1644a/torch/_dynamo/convert_frame.py (L866-L876)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/136970
Approved by: https://github.com/williamwen42
2025-02-18 18:48:23 +00:00
Raymond Li
21c2565f35 Document dynamo (#146736)
Many files in dynamo are currently lacking file/module-level documentation, which makes it hard to know what they do at a glance and without digging into the code. This fixes that.

Note: documentation was AI-generated and could be incorrect, please review carefully.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/146736
Approved by: https://github.com/jansel, https://github.com/StrongerXi, https://github.com/anijain2305, https://github.com/zou3519
2025-02-13 00:02:21 +00:00
Animesh Jain
ee45ea599d [dynamo] Actionable message on recompilations for fullgraph=True (#146550)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/146550
Approved by: https://github.com/zou3519, https://github.com/StrongerXi
ghstack dependencies: #146553
2025-02-07 17:28:43 +00:00
Simon Fan
a14c780c4c [dynamo] fix dynamo_compile logging on RecompileLimitExceeded (#146544)
Logging branches based on RecompileLimitExceeded or not. If we exceed the limit, we fallback to eager before even trying to analyze the frame. We handle RecompileLimitExceeded outside of the try/catch/finally that edits the metrics context:
72405b0c0f/torch/_dynamo/convert_frame.py (L908-L935).

dynamo_config and recompile_reason are both known before we raise the RecompileLimitExceeded, so we can add them with the rest of the "common" metrics. which are logged on metric_context decorator exit and is always called

Pull Request resolved: https://github.com/pytorch/pytorch/pull/146544
Approved by: https://github.com/masnesral
2025-02-06 16:20:42 +00:00
Simon Fan
1d4adf4e1f [dynamo] log recompile reason to dynamo_compile (#146117)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/146117
Approved by: https://github.com/bobrenjc93
2025-02-03 21:04:04 +00:00
PyTorch MergeBot
3481c2aec4 Revert "[dynamo] save/restore system random state more carefully (#145750)"
This reverts commit e3d3f2b22e.

Reverted https://github.com/pytorch/pytorch/pull/145750 on behalf of https://github.com/eellison due to bisected perf regression ([comment](https://github.com/pytorch/pytorch/pull/145750#issuecomment-2620028414))
2025-01-28 20:51:07 +00:00
William Wen
e3d3f2b22e [dynamo] save/restore system random state more carefully (#145750)
Reattempt of https://github.com/pytorch/pytorch/pull/145435 since the state of the linked internal diff appears to be messed up.

Note: I have verified that the previously failing internal tests now pass internally.

Differential Revision: [D68723334](https://our.internmc.facebook.com/intern/diff/D68723334)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/145750
Approved by: https://github.com/StrongerXi
2025-01-28 01:34:13 +00:00
Aaron Orenstein
a79100ab11 PEP585 update - torch/_dynamo (#145105)
See #145101 for details.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/145105
Approved by: https://github.com/bobrenjc93
2025-01-18 20:47:11 +00:00
Laith Sakka
c3fcb3606d Profile compile_inner instead of _compile_inner (#144930)
Summary: title

Test Plan: NA

Reviewed By: jamesjwu

Differential Revision: D67990492

Pull Request resolved: https://github.com/pytorch/pytorch/pull/144930
Approved by: https://github.com/jamesjwu
2025-01-16 23:59:27 +00:00
Colin L. Rice
0cd9320c7f easy: dynamo_config: sort keys and set values (#143317)
This will create consistent ordering of keys when writing, as well as
sorting sets before serializing

Pull Request resolved: https://github.com/pytorch/pytorch/pull/143317
Approved by: https://github.com/masnesral
ghstack dependencies: #143307
2025-01-11 03:08:04 +00:00
bobrenjc93
1fe3af2c68 Migrate from Tuple -> tuple in torch/_dynamo (#144261)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/144261
Approved by: https://github.com/aorenste, https://github.com/zou3519
2025-01-10 07:45:57 +00:00
James Wu
f2d6cfa677 Introduce CompileEventLogger, replace usages of metrics_context and chromium_event with it (#143420)
**Problem statement**: I want to be able to centralize and simplify the process by which people add columns/data to existing spans. We have MetricsContext and ChromiumEventLogger, and there's various choices you can make to decide where and when to log different levels of observability for your events. To resolve this, I want a central API for "adding to events under dynamo_timed".

**CompileEventLogger** is intended as a frontend for MetricsContext and ChromiumEventLogger so we can use the same class for handling everything.

CompileEventLogger is intended be used within a `dynamo_timed()` context. Its purpose is to 1. log to existing events that are in progress (i.e. within dynamo_timed), and 2. log instant events to chromium that are independent of any specific span.

CompileEventLogger has three log levels:

- CHROMIUM: Log only to chromium events, visible via tlparse.
- PT2_COMPILE: Log to chromium_events + pt2_compile_events
- COMPILATION_METRIC: Log to compilation metrics in addition to the toplevel chromium and pt2_compile_event.

In addition, we have a function CompileEventLogger.add() that automagically chooses the correct log level. For now, it is conservative, and will never automagically choose to log CompilationMetrics (though I could imagine it figuring out the metadata are all keys in CompilationMetric and therefore loggable there).

The goal here is to make one single interface to log stuff for observability reasons, and make it as easy as possible.

Not included in this diff:
- V1 of this diff will not have implementations of `increment` and `add_to_set` which MetricsContext has, so those usages are not replaced yet. But I'll add those in a followup.

- We don't handle `RuntimeMetricsContext`. It's unclear if I want that to be part of this, because under RuntimeMetricsContext there might not be a toplevel event to log to, so chromium events doesn't make sense in that context. So I might leave that separate for now.

Differential Revision: [D67346203](https://our.internmc.facebook.com/intern/diff/D67346203/)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/143420
Approved by: https://github.com/aorenste
2025-01-04 22:40:34 +00:00
Vinayak Pandey
16a57e232c removed dead code for dynamo flag dead_code_elimination (#140938)
Fixes #136862

1.  removed dead code from torch/_dynamo/convert_frame.py
2.  ran `lintrunner -a` and all the tests passed.
3. ran the unit tests and everything seems to be in order.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/140938
Approved by: https://github.com/zou3519
2024-12-31 09:27:43 +00:00
Jason Ansel
9e5f3fdfc7 [dynamo] Shorten tracebacks for backend compiler errors (#143552)
Fixes #143406

After this PR the error for missing Triton is:
```py
Traceback (most recent call last):
  File "/home/jansel/pytorch/repro.py", line 51, in <module>
    fp32_compiled = optimized_model(low_input)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jansel/pytorch/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jansel/pytorch/torch/nn/modules/module.py", line 1750, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jansel/pytorch/torch/_dynamo/eval_frame.py", line 580, in _fn
    raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jansel/pytorch/torch/_inductor/scheduler.py", line 3624, in create_backend
    raise TritonMissing(inspect.currentframe())
torch._dynamo.exc.TritonMissing: Cannot find a working triton installation. Either the package is not installed or it is too old. More information on installing Triton can be found at: https://github.com/triton-lang/triton

Set TORCH_LOGS="+dynamo" and TORCHDYNAMO_VERBOSE=1 for more information

You can suppress this exception and fall back to eager by setting:
    import torch._dynamo
    torch._dynamo.config.suppress_errors = True
```

Setting `TORCHDYNAMO_VERBOSE=1` yields something like the old error:
```py
Traceback (most recent call last):
  File "/home/jansel/pytorch/repro.py", line 51, in <module>
    fp32_compiled = optimized_model(low_input)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jansel/pytorch/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jansel/pytorch/torch/nn/modules/module.py", line 1750, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jansel/pytorch/torch/_dynamo/eval_frame.py", line 580, in _fn
    raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jansel/pytorch/torch/_dynamo/eval_frame.py", line 576, in _fn
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "/home/jansel/pytorch/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jansel/pytorch/torch/nn/modules/module.py", line 1750, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jansel/pytorch/torch/_dynamo/convert_frame.py", line 1383, in __call__
    return self._torchdynamo_orig_callable(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jansel/pytorch/torch/_dynamo/convert_frame.py", line 1167, in __call__
    result = self._inner_convert(
             ^^^^^^^^^^^^^^^^^^^^
  File "/home/jansel/pytorch/torch/_dynamo/convert_frame.py", line 548, in __call__
    return _compile(
           ^^^^^^^^^
  File "/home/jansel/pytorch/torch/_dynamo/convert_frame.py", line 988, in _compile
    guarded_code = compile_inner(code, one_graph, hooks, transform)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jansel/pytorch/torch/_dynamo/convert_frame.py", line 716, in compile_inner
    return _compile_inner(code, one_graph, hooks, transform)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jansel/pytorch/torch/_utils_internal.py", line 95, in wrapper_function
    return function(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jansel/pytorch/torch/_dynamo/convert_frame.py", line 751, in _compile_inner
    out_code = transform_code_object(code, transform)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jansel/pytorch/torch/_dynamo/bytecode_transformation.py", line 1361, in transform_code_object
    transformations(instructions, code_options)
  File "/home/jansel/pytorch/torch/_dynamo/convert_frame.py", line 232, in _fn
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "/home/jansel/pytorch/torch/_dynamo/convert_frame.py", line 663, in transform
    tracer.run()
  File "/home/jansel/pytorch/torch/_dynamo/symbolic_convert.py", line 2870, in run
    super().run()
  File "/home/jansel/pytorch/torch/_dynamo/symbolic_convert.py", line 1053, in run
    while self.step():
          ^^^^^^^^^^^
  File "/home/jansel/pytorch/torch/_dynamo/symbolic_convert.py", line 963, in step
    self.dispatch_table[inst.opcode](self, inst)
  File "/home/jansel/pytorch/torch/_dynamo/symbolic_convert.py", line 3050, in RETURN_VALUE
    self._return(inst)
  File "/home/jansel/pytorch/torch/_dynamo/symbolic_convert.py", line 3035, in _return
    self.output.compile_subgraph(
  File "/home/jansel/pytorch/torch/_dynamo/output_graph.py", line 1102, in compile_subgraph
    self.compile_and_call_fx_graph(
  File "/home/jansel/pytorch/torch/_dynamo/output_graph.py", line 1383, in compile_and_call_fx_graph
    compiled_fn = self.call_user_compiler(gm)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jansel/pytorch/torch/_dynamo/output_graph.py", line 1433, in call_user_compiler
    return self._call_user_compiler(gm)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jansel/pytorch/torch/_dynamo/output_graph.py", line 1463, in _call_user_compiler
    compiled_fn = compiler_fn(gm, self.example_inputs())
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jansel/pytorch/torch/_dynamo/repro/after_dynamo.py", line 130, in __call__
    compiled_gm = compiler_fn(gm, example_inputs)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jansel/pytorch/torch/__init__.py", line 2314, in __call__
    return compile_fx(model_, inputs_, config_patches=self.config)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jansel/pytorch/torch/_inductor/compile_fx.py", line 1880, in compile_fx
    return aot_autograd(
           ^^^^^^^^^^^^^
  File "/home/jansel/pytorch/torch/_dynamo/backends/common.py", line 83, in __call__
    cg = aot_module_simplified(gm, example_inputs, **self.kwargs)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jansel/pytorch/torch/_functorch/aot_autograd.py", line 1145, in aot_module_simplified
    compiled_fn = AOTAutogradCache.load(
                  ^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jansel/pytorch/torch/_functorch/_aot_autograd/autograd_cache.py", line 754, in load
    compiled_fn = dispatch_and_compile()
                  ^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jansel/pytorch/torch/_functorch/aot_autograd.py", line 1131, in dispatch_and_compile
    compiled_fn, _ = create_aot_dispatcher_function(
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jansel/pytorch/torch/_functorch/aot_autograd.py", line 580, in create_aot_dispatcher_function
    return _create_aot_dispatcher_function(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jansel/pytorch/torch/_functorch/aot_autograd.py", line 830, in _create_aot_dispatcher_function
    compiled_fn, fw_metadata = compiler_fn(
                               ^^^^^^^^^^^^
  File "/home/jansel/pytorch/torch/_functorch/_aot_autograd/jit_compile_runtime_wrappers.py", line 676, in aot_dispatch_autograd
    compiled_fw_func = aot_config.fw_compiler(fw_module, adjusted_flat_args)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jansel/pytorch/torch/_functorch/aot_autograd.py", line 489, in __call__
    return self.compiler_fn(gm, example_inputs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jansel/pytorch/torch/_inductor/compile_fx.py", line 1758, in fw_compiler_base
    return inner_compile(
           ^^^^^^^^^^^^^^
  File "/home/jansel/pytorch/torch/_inductor/compile_fx.py", line 572, in compile_fx_inner
    return wrap_compiler_debug(_compile_fx_inner, compiler_name="inductor")(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jansel/pytorch/torch/_dynamo/repro/after_aot.py", line 102, in debug_wrapper
    inner_compiled_fn = compiler_fn(gm, example_inputs)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jansel/pytorch/torch/_inductor/compile_fx.py", line 686, in _compile_fx_inner
    mb_compiled_graph = fx_codegen_and_compile(
                        ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jansel/pytorch/torch/_inductor/compile_fx.py", line 1129, in fx_codegen_and_compile
    return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jansel/pytorch/torch/_inductor/compile_fx.py", line 1044, in codegen_and_compile
    compiled_fn = graph.compile_to_module().call
                  ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jansel/pytorch/torch/_inductor/graph.py", line 1975, in compile_to_module
    return self._compile_to_module()
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jansel/pytorch/torch/_inductor/graph.py", line 1981, in _compile_to_module
    self.codegen_with_cpp_wrapper() if self.cpp_wrapper else self.codegen()
                                                             ^^^^^^^^^^^^^^
  File "/home/jansel/pytorch/torch/_inductor/graph.py", line 1916, in codegen
    self.scheduler.codegen()
  File "/home/jansel/pytorch/torch/_inductor/scheduler.py", line 3667, in codegen
    return self._codegen()
           ^^^^^^^^^^^^^^^
  File "/home/jansel/pytorch/torch/_inductor/scheduler.py", line 3761, in _codegen
    if device is not None and self.get_backend(device).ready_to_flush():
                              ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jansel/pytorch/torch/_inductor/scheduler.py", line 3631, in get_backend
    self.backends[device] = self.create_backend(device)
                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jansel/pytorch/torch/_inductor/scheduler.py", line 3624, in create_backend
    raise TritonMissing(inspect.currentframe())
torch._dynamo.exc.TritonMissing: Cannot find a working triton installation. Either the package is not installed or it is too old. More information on installing Triton can be found at: https://github.com/triton-lang/triton

You can suppress this exception and fall back to eager by setting:
    import torch._dynamo
    torch._dynamo.config.suppress_errors = True
```

This PR also strips dynamo stack frames from other types of backend compile errors.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/143552
Approved by: https://github.com/yanboliang
2024-12-24 21:48:23 +00:00
Oguz Ulgen
dc55704b48 Rename cache limit to recompile limit in configs (#143709)
This PR renames every cache_limit to recompile_limit via sed.

Old config options are maintained via Config(alias='xyz')

Pull Request resolved: https://github.com/pytorch/pytorch/pull/143709
Approved by: https://github.com/jansel
2024-12-22 10:03:57 +00:00
Simon Fan
d88ebbf822 cleanup chromium event log on dynamo exit rather than on entry (#143175)
clearing at dynamo start is an issue because it throws away events from compiled autograd

Pull Request resolved: https://github.com/pytorch/pytorch/pull/143175
Approved by: https://github.com/Skylion007, https://github.com/jamesjwu
ghstack dependencies: #141907
2024-12-21 00:41:24 +00:00
Simon Fan
4ee166b82f [ca] add compiled autograd to CompileId (#141907)
tlparse PR: https://github.com/ezyang/tlparse/pull/83

Pull Request resolved: https://github.com/pytorch/pytorch/pull/141907
Approved by: https://github.com/ezyang
2024-12-21 00:41:24 +00:00
Guilherme Leobas
673cc88fd6 Add support for contextmanager in Dynamo (#136033)
Fixes #130559

* Intro

This PR adds support for `@contextmanager` in Dynamo. We chose to limit the
scope of this work to only `@contextmanager` and plan to handle generators fully
in #141055 (still in draft).

* Motivation

Dynamo lacks support for generator functions. When it encounters one, it traces
it as if it were a regular function. This is problematic because it can lead to
incorrect behavior. To illustrate, consider the test case below:

```python
import torch
import contextlib

@contextlib.contextmanager
def set_default_dtype(dtype):
    old_dtype = torch.get_default_dtype()
    try:
        torch.set_default_dtype(dtype)
        yield
    finally:
        torch.set_default_dtype(old_dtype)

@torch.compile(backend="eager", fullgraph=True)
def fn():
    with set_default_dtype(torch.float64):
        x = torch.tensor([3.0, 3.0 + 5.0j])
    return x
```

Before this work, Dynamo would not stop at the `yield`, and the graph produced
would contain both calls to `set_default_dtype` executed one after the other.
This is incorrect because the context manager should execute code before and
after the `yield`.

* List of changes

`YIELD_VALUE` now raises an exception (`YieldValueOp`) to signal that control
flow must be suspended and returned to the caller. Additionally, `RETURN_VALUE`
behaves differently in a generator function. Unlike regular functions, where
`RETURN_VALUE` indicates the final result, in generators it signifies that the
generator is exhausted and implicitly raises `StopIteration`.

A new `VariableTracker` named `FunctionDecoratedByContextlibContextManagerVariable`
was introduced to handle `@contextmanager`. This variable tracker acts not just
as a wrapper for the original function but also maintains an internal `tx`
(InstructionTranslator) object to suspend and return control flow to the parent
tracer when a `yield` is encountered.

* Corner cases

Returning a context manager from a compiled function is not supported. This
would require PyTorch to synchronize the generator state between Dynamo and the
interpreter. Any attempt to return it will result in an `IncorrectUsage`
exception.

Graph breaks require special handling as well. In the event of a graph break,
the frame associated with the context manager is skipped, and the context
manager runs in eager mode.

* This PR is breaking my code

There is a configuration flag (`enable_trace_contextlib`) that can be set to
`False` to disable tracing context managers. If this still causes crashes,
please revert this PR.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/136033
Approved by: https://github.com/zou3519
2024-12-20 12:02:20 +00:00
qiurc
90cc43f270 Support garbage collection after pt2 compilation (#143364)
Summary:
Support garbage collection after pt2 compilation.
Add jk to control the global rollout / rollback of this functionality
Add env var to control individual job's rollout

Test Plan:
Test the model training job with / without this changes

Reviewers:
@yuxihu @ezyang , @Yuzhen11 ,

Subscribers:

Tasks:

Tags:

Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/143364
Approved by: https://github.com/ezyang
2024-12-18 07:25:11 +00:00
William Wen
18261e9f39 [dynamo] implement framelocals mapping as c++ object (#140063)
Implements https://github.com/pytorch/pytorch/issues/93753 - move frame local guard accessors to C++.

Before, we used dict accessors on a Python dict representing the frame's fastlocals that we manually build. We move this accessor to C++ and additionally use the fastlocal index whenever possible.

Some implementation notes:
- `FrameLocalsMapping` is now initialized as a C++ vector of `PyObject`s. We do not just use the frame's localsplus/fastlocals buffer because we also unbox cells.
- `FrameLocalsMapping` can still be converted into a Python dict representing the frame's fastlocals, but it is done lazily.
- We update `LeafGuard`, `GuardAccessor`, and `GuardManager`'s `check_nopybind` methods to accept `FrameLocalsMapping`. By default, we convert the `FrameLocalsMapping` to a Python dict and run the original `check_nopybind` on it, but in some cases, conversion is not needed.
- We add a new guard accessor `FrameLocalsGuardAccessor`, which is similar to `DictGetItemGuardAccessor` but has special handling for `FrameLocalsMapping`. We create a separate class to emphasize different use cases, but we could probably combine these two (can do in a follow up)

dynamo_guard_eval.py microbenchmark update:
- 713.2us -> 630.0us (3.10)
- 598.8us -> 530.7us (3.12)

Other followups:
- Add `FrameLocalsMapping` version for `check_verbose_nopybind` in order to match behavior between `check_nopybind` and `check_verbose_nopybind`. This can prevent difficult debugging situations where guards fail (`check_nopybind` returns false) but no guard error message is generated (`check_verbose_nopybind` succeeds).
- Rewrite the `SHAPE_ENV` guard into C++ - it is a fairly common guard that results in `FrameLocalsMapping` needing to convert to a dict

Pull Request resolved: https://github.com/pytorch/pytorch/pull/140063
Approved by: https://github.com/jansel
ghstack dependencies: #142117, #142430
2024-12-17 18:54:27 +00:00
Edward Z. Yang
256bfd1096 Rename 'cache limit' to 'recompile limit' (#141542)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/141542
Approved by: https://github.com/oulgen, https://github.com/jansel
2024-12-11 05:05:11 +00:00
snahir
16ea0ddcdb Ignore logger methods to avoid graph breaks (#139403)
Fixes #132635

Calls to logging.logger cause a graph break, this PR allows the user to avoid these graph breaks (for specific methods) by setting DISABLE_LOGS_WHILE_COMPILING to 1.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/139403
Approved by: https://github.com/williamwen42
2024-12-05 20:12:26 +00:00
Sam Larsen
07906f2f2b [logging] Move population of common MetricsContext fields to record_compilation_metrics (#141291)
Summary: Fix outstanding TODOs related to logging of CompilationMetrics by moving the population of common fields to record_compilation_metrics() instead of populating those independently wherever we use a the metrics_context contextmanager:
* Keep track of start and end time in MetricsContext and pass those to record_compilation_metrics() and populate those fields in that function.
* Pass exception info to record_compilation_metrics() and populate those field in that function.
* Add a new contextmanager, chromium_event_timed, to create the start/end "dynamo" event. This is important because I want this contextmanager to complete _after_ building the CompilationMetrics.
* Populate the compile_id field centrally in record_compilation_metrics().
* Populate the structured_logging_overhead centrally in record_compilation_metrics().
* Add the CompilationMetrics to the current chromium event in record_compilation_metrics(), after all common fields have been added. In a future diff, I can also add _all_ compilation metrics to the chromium event.

Test plan: Unit tests. Also see internal testing:
* dynamo_compile: https://fburl.com/scuba/dynamo_compile/sandbox/jrascnf9
* pt2_compile_events: https://fburl.com/scuba/pt2_compile_events/l3jnla06
* tlparse: https://fburl.com/bq5a9nqs

Pull Request resolved: https://github.com/pytorch/pytorch/pull/141291
Approved by: https://github.com/jamesjwu
2024-11-25 13:18:40 +00:00
Atul Jangra
6a096a0b96 [PT2] Fix callbacks to account for entire execution in compilation (#141323)
Summary:
In SJD, we register the callbacks to get notified of an active compilation. Using this information, we can basically allow for an increase time for the training loop

The callbacks currently do not account for entire time and in several cases, the end callback is not called at all.

This leads to a bunch of APS jobs getting terminated incorrectly: https://fburl.com/scuba/mast_hpc_job_run_status/ondwzt2w

In this diff, we basically install a context manager which will call the start and end callbacks, similar to how we log counters and other information.

Test Plan:
```
buck2 run mode/opt //aps_models/examples/dlrm:dlrm_train_app -- --config-name train_mast_fsdp_torchdynamo launcher.data_project=apf_ai_infra launcher.fbl_entitlement=ai_infra_training_rnd_tc  launcher.hardware=TC_ANY_80G
```
Led to https://www.internalfb.com/mlhub/pipelines/runs/mast/aps-atuljangra-ef2285ba9a?job_attempt=0&version=0&env=prod

https://fburl.com/ai_infra/sv0a213y confirms that callback was correctly called and a lease was properly installed, which takes over the training loop lease.

{F1965137027}

Differential Revision: D66347023

Pull Request resolved: https://github.com/pytorch/pytorch/pull/141323
Approved by: https://github.com/ezyang
2024-11-24 22:31:04 +00:00
Bob Ren
f3f7ba5a69 Restart dynamo analysis when we fail to tensorify away all symfloat inputs (#140346)
Fixes a bunch of benchmarks that failed with cudagraph errors including `tlp python benchmarks/dynamo/timm_models.py --device cuda --inductor --accuracy --amp --training --only resmlp_12_224` when `specialize_float=False`

Also brings down number of overall failures (with keep-going) from 108 => 62. I'd estimate >80% of those 62 are wobbly expect tests.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/140346
Approved by: https://github.com/ezyang
ghstack dependencies: #140983, #141003
2024-11-20 21:20:41 +00:00
Animesh Jain
9d229f08f4 [dynamo][guards] Introduce a diff_guard_manager (#140250)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/140250
Approved by: https://github.com/jansel
ghstack dependencies: #140223
2024-11-20 17:59:30 +00:00
Animesh Jain
f4ce9ac29d [dynamo] Dont erase the cache line on invalidation (#140821)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/140821
Approved by: https://github.com/jansel
2024-11-19 19:11:10 +00:00
James Wu
8d5b3eeaa6 Remove __start__ stack, log backward compile to empty stack (#140431)
Summary:
This diff removes "__start__" from all stacks in Pt2 Compile Events, as it's unnecessary.

It also starts logging events for backward compile, because otherwise we have no toplevel event representing full backward compilation. This gives us a toplevel event outside of the inductor compile.

Test Plan:
New chromium events:

https://interncache-all.fbcdn.net/manifold/perfetto-artifacts/tree/ui/index.html?url=https%3A%2F%2Finterncache-all.fbcdn.net%2Fmanifold%2Ftlparse_reports%2Ftree%2Flogs%2Fjjwu%2Fcustom%2Fstuff4%2Fchromium_events.json#!/viewer?url=https%3A%2F%2Finterncache-all.fbcdn.net%2Fmanifold%2Ftlparse_reports%2Ftree%2Flogs%2Fjjwu%2Fcustom%2Fstuff4%2Fchromium_events.json&local_cache_key

New tlparse:
https://interncache-all.fbcdn.net/manifold/tlparse_reports/tree/logs/jjwu/custom/stuff4/index.html

New scuba icicle view, still good: https://fburl.com/scuba/pt2_compile_events/z6gr3z53

Differential Revision: D65832045

Pull Request resolved: https://github.com/pytorch/pytorch/pull/140431
Approved by: https://github.com/masnesral
2024-11-18 22:48:31 +00:00
Ryan Guo
ac6684ebbc [dynamo] Identify pre-existing captured cells by cell id rather than content id (#140436)
In `match_nested_cell`, Dynamo tried to identify pre-existing captured
cells by `(cell_name, id(cell_contents))`. This works in most cases, but
as the test added in this patch shows, it's not a complete solution.

This patch
1. changes `match_nested_cell` to `lookup_variable_for_captured_cell`,
   and does the lookup based on id of cell objects, not their contents.
   This requires plumbing a tuple of captured cell objects from
   different CPython versions all the way to
   `InstructionTranslator.__init__`, where we store a mapping from the
   ids of these cell objects, and use it later in
   `UserFunctionVariable.bind_args` to look for these unboxed cells.
2. builds off (1) -- rather than using a `VariableTracker` that
   represents the content of the unboxed cells, use `ClosureVariable`,
   which enables codegen in case these cells escape as closure of a
   `NestedUserFunctionVariable`.

The patch adds a regression test for each of the scenarios above:
1. `test_write_to_cells_with_name_shadowing` where Dynamo mistakenly
   thought the program is writing to a cell captured by root frame (which
   it doesn't support atm), which resulted in
```
  File "/Users/ryanguo99/Documents/work/pytorch/torch/_dynamo/symbolic_convert.py", line 3340, in STORE_DEREF
    unimplemented("write to __closure__ while inlining")
  File "/Users/ryanguo99/Documents/work/pytorch/torch/_dynamo/exc.py", line 313, in unimplemented
    raise Unsupported(msg, case_name=case_name)
torch._dynamo.exc.Unsupported: write to __closure__ while inlining
```
2. `test_existing_func_that_creates_capturing_nested_func` where Dynamo
   ended up trying to codegen a `NestedUserFunctionVariable` that
   captures a cell which was also captured by the root frame, so it was
   unboxed and ends up emitting `LOAD_DEREF` rather than
   `LOAD_FAST/LOAD_CLOSURE` during codegen, resulting in
```
  File "/Users/ryanguo99/Documents/work/pytorch/torch/_dynamo/variables/functions.py", line 105, in _create_nested_fn
    func = FunctionType(code, f_globals, name, defaults, closure)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: arg 5 (closure) expected cell, found int
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/140436
Approved by: https://github.com/jansel, https://github.com/williamwen42
ghstack dependencies: #140330, #140152
2024-11-15 17:17:30 +00:00
Ryan Guo
85dd7b84cf [dynamo] Add a DynamoFrameType type above Python frame object (#140330)
This patch introduces a `DynamoFrameType` to serve as a layer between
Dynamo and different versions of Python frame object. In
`DynamoFrameType`, we only register attributes Dynamo cares about (e.g.,
`f_code`, `f_locals`, etc.

This will be helpful when it comes to adding new attributes to this
`DynamoFrameType`, or dealing with Python version changes.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/140330
Approved by: https://github.com/jansel, https://github.com/williamwen42
2024-11-15 17:17:30 +00:00
Sam Larsen
b11ff3cf60 [logging] Overhaul dynamo_timed and CompilationMetrics logging. (#139849)
Here's the overview:

There's a new contextmanager singleton called MetricsContext. Entering the MetricsContext is how we demarcate the boundary on which we'll create a single CompilationMetrics object, and therefore, a single dynamo_compile log entry. While we're inside the MetricsContext, we can update/set many different metrics. Most importantly: `dynamo_timed` can also update the in-progress MetricsContext. In the proposal here, we tell `dynamo_timed` that we want it to do so by providing the name of the MetricsContext field to increment. There can be many `dynamo_timed` calls in different parts of the code updating different fields. Then when the MetricsContext exits, that's when the logging of everything gathered finally happens. One potential footgun is trying to use `dynamo_timed` when we haven't entered the MetricsContext, but we assert on that problem. Another problem is that we re-enter the context recursively, but we watch for that and do the logging only when the outermost exits.

Some specifics:
* Introduce MetricsContext - a context manager that on exit, records the CompilationMetrics (which also logs to dynamo_compile).
* Completely remove the concept of frame_phase_timing. Instead, update the MetricsContext during compilation, either directly or via dynamo_timed.
* Remove some globals we previously used to accumulate counters to later populate a CompilationMetrics. We use CompilationMetrics set/update/increment APIs instead.
* `record_compilation_metrics` is now called on exit from MetricsContext.
* Populate legacy CompilationMetrics fields right before logging, inside `record_compilation_metrics`.
* Remove the one-off `add_remote_cache_time_saved` helper; capture that timing directly into the MetricsContext.

And specifically, several changes to dynamo_timed:
* "Modernize" the parameters and update all callsites accordingly.
* Move the backwards logging of the CompilationMetrics to the backwards compile location.
* Add a parameter for which CompilationMetrics field to update

Pull Request resolved: https://github.com/pytorch/pytorch/pull/139849
Approved by: https://github.com/ezyang
2024-11-14 19:11:20 +00:00
PyTorch MergeBot
d63eb3c46c Revert "[logging] Overhaul dynamo_timed and CompilationMetrics logging. (#139849)"
This reverts commit cb15c15157.

Reverted https://github.com/pytorch/pytorch/pull/139849 on behalf of https://github.com/kit1980 due to Breaking an internal tests + there is a bug according to the author ([comment](https://github.com/pytorch/pytorch/pull/139849#issuecomment-2474459094))
2024-11-13 18:47:51 +00:00
Ryan Guo
6a821c9e6a [dynamo] Remove cell unboxing/restart optimization (#140149)
We added an unboxing optimization to avoid writes to cells that existed
before Dynamo tracing (such writes interfere with HOPs). However, the
avoided write shouldn't be there in the first place, since we were
basically creating an empty `NewCellVariable`, and then write the
pre-existing content into the variable.

This patch
1. adds logic to bypass the initial write for pre-existing cells
   without undermining correctness.
2. removes the unboxing optimization and the restart code path.

Fixes #137456, #138491; also see those issues for more historical
context.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/140149
Approved by: https://github.com/ezyang, https://github.com/jansel
ghstack dependencies: #140035, #140036
2024-11-13 03:14:23 +00:00
Sam Larsen
cb15c15157 [logging] Overhaul dynamo_timed and CompilationMetrics logging. (#139849)
Here's the overview:

There's a new contextmanager singleton called MetricsContext. Entering the MetricsContext is how we demarcate the boundary on which we'll create a single CompilationMetrics object, and therefore, a single dynamo_compile log entry. While we're inside the MetricsContext, we can update/set many different metrics. Most importantly: `dynamo_timed` can also update the in-progress MetricsContext. In the proposal here, we tell `dynamo_timed` that we want it to do so by providing the name of the MetricsContext field to increment. There can be many `dynamo_timed` calls in different parts of the code updating different fields. Then when the MetricsContext exits, that's when the logging of everything gathered finally happens. One potential footgun is trying to use `dynamo_timed` when we haven't entered the MetricsContext, but we assert on that problem. Another problem is that we re-enter the context recursively, but we watch for that and do the logging only when the outermost exits.

Some specifics:
* Introduce MetricsContext - a context manager that on exit, records the CompilationMetrics (which also logs to dynamo_compile).
* Completely remove the concept of frame_phase_timing. Instead, update the MetricsContext during compilation, either directly or via dynamo_timed.
* Remove some globals we previously used to accumulate counters to later populate a CompilationMetrics. We use CompilationMetrics set/update/increment APIs instead.
* `record_compilation_metrics` is now called on exit from MetricsContext.
* Populate legacy CompilationMetrics fields right before logging, inside `record_compilation_metrics`.
* Remove the one-off `add_remote_cache_time_saved` helper; capture that timing directly into the MetricsContext.

And specifically, several changes to dynamo_timed:
* "Modernize" the parameters and update all callsites accordingly.
* Move the backwards logging of the CompilationMetrics to the backwards compile location.
* Add a parameter for which CompilationMetrics field to update

Pull Request resolved: https://github.com/pytorch/pytorch/pull/139849
Approved by: https://github.com/ezyang
ghstack dependencies: #140094
2024-11-11 14:24:23 +00:00
Edward Z. Yang
af682f3cd7 Move put_code_state to only trigger on successful compile (#140042)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Differential Revision: [D65623081](https://our.internmc.facebook.com/intern/diff/D65623081)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/140042
Approved by: https://github.com/markkm
2024-11-08 04:19:50 +00:00
Oguz Ulgen
1270c78268 Add logging for num_triton_bundles (#139807)
Summary: Adding logs for number of inductor cache triton bundles

Test Plan:
Ran adhoc code and looked at dynamo_compile/sandbox

https://fburl.com/scuba/dynamo_compile/sandbox/nhktfy19

Differential Revision: D65490826

Pull Request resolved: https://github.com/pytorch/pytorch/pull/139807
Approved by: https://github.com/masnesral
2024-11-06 21:11:04 +00:00
Laith Sakka
3f248a5735 Classify miss-inplaced tensors in logs. (#139240)
Summary:
use signpost logs,
a followup is to remove the field possibly_missed_reinplacing_opportunities form dynamo compile table.

Differential Revision: D65180194

Pull Request resolved: https://github.com/pytorch/pytorch/pull/139240
Approved by: https://github.com/zou3519
2024-11-04 23:56:14 +00:00
Edward Z. Yang
585dbfa583 Profile guided optimization for automatic_dynamic (#139001)
Previously: https://github.com/pytorch/pytorch/pull/138052 but the implementation is done from scratch, so I open a new PR.

This implements the ability to save and load profiles of automatic dynamic decisions, so on subsequent runs we can directly make something automatically dynamic. Unlike the previous implementation, this cache is never enabled by default; instead, you have to specify a "job id" that says it's OK to share results. We will be able to automatically populate this id for internal MAST jobs but for generic OSS users you will have to explicitly opt into it.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/139001
Approved by: https://github.com/oulgen
2024-11-03 06:29:57 +00:00
PyTorch MergeBot
92d7f29e59 Revert "Profile guided optimization for automatic_dynamic (#139001)"
This reverts commit f6be44c74e.

Reverted https://github.com/pytorch/pytorch/pull/139001 on behalf of https://github.com/ezyang due to more fbcode errors ([comment](https://github.com/pytorch/pytorch/pull/139001#issuecomment-2452985581))
2024-11-02 13:11:04 +00:00
Edward Z. Yang
f6be44c74e Profile guided optimization for automatic_dynamic (#139001)
Previously: https://github.com/pytorch/pytorch/pull/138052 but the implementation is done from scratch, so I open a new PR.

This implements the ability to save and load profiles of automatic dynamic decisions, so on subsequent runs we can directly make something automatically dynamic. Unlike the previous implementation, this cache is never enabled by default; instead, you have to specify a "job id" that says it's OK to share results. We will be able to automatically populate this id for internal MAST jobs but for generic OSS users you will have to explicitly opt into it.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Differential Revision: [D65065497](https://our.internmc.facebook.com/intern/diff/D65065497)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/139001
Approved by: https://github.com/oulgen
2024-11-02 11:50:11 +00:00
PyTorch MergeBot
8d1eaa3da6 Revert "Profile guided optimization for automatic_dynamic (#139001)"
This reverts commit a6630bcf87.

Reverted https://github.com/pytorch/pytorch/pull/139001 on behalf of https://github.com/ezyang due to internal code triggers import cycle ([comment](https://github.com/pytorch/pytorch/pull/139001#issuecomment-2452833882))
2024-11-02 03:38:15 +00:00
Edward Z. Yang
a6630bcf87 Profile guided optimization for automatic_dynamic (#139001)
Previously: https://github.com/pytorch/pytorch/pull/138052 but the implementation is done from scratch, so I open a new PR.

This implements the ability to save and load profiles of automatic dynamic decisions, so on subsequent runs we can directly make something automatically dynamic. Unlike the previous implementation, this cache is never enabled by default; instead, you have to specify a "job id" that says it's OK to share results. We will be able to automatically populate this id for internal MAST jobs but for generic OSS users you will have to explicitly opt into it.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Differential Revision: [D65065497](https://our.internmc.facebook.com/intern/diff/D65065497)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/139001
Approved by: https://github.com/oulgen
2024-11-01 21:43:25 +00:00
James Wu
c8a648d4df Add option to dynamo_timed and chromium_event_logger for logging pt2 compile events (#139309)
This diff considerably changes the column format of PT2 Compile Events:

- Now, instead of logging one new column per every piece of metadata, we just log a single column, "metadata". This vastly decreases the number of columns we need to log, which should help with retention.

- Now, we only log to scuba for a set of dynamo_timed() events that we actually care about aggregating. To do so, we add a boolean to dynamo_timed() that decides whether or not to log a pt2_compile_event. We'll always log a chromium_event for every dynamo_timed(), but only log a subset of those to scuba.

Differential Revision: [D65225598](https://our.internmc.facebook.com/intern/diff/D65225598/)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/139309
Approved by: https://github.com/oulgen
2024-11-01 02:40:25 +00:00
Edward Z. Yang
ae6cbd4256 Block more keys from config serialization (#139285)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/139285
Approved by: https://github.com/jovianjaison, https://github.com/markkm, https://github.com/c00w
2024-10-30 23:05:59 +00:00
Edward Z. Yang
a4b35767cb Don't have random print in convert_frame (#139203)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/139203
Approved by: https://github.com/Skylion007
2024-10-30 09:35:37 +00:00
Edward Z. Yang
bca696ae81 Switch times to us in CompilationMetrics and improvements (#138975)
Companion logger diff: https://www.internalfb.com/diff/D65012523

* Using float seconds for timestamps is bad because our internal system defaults to float32 precision and you don't even get second precision for timestamps in float32
* We decide to use microseconds instead of milliseconds because millisecond granularity you can end up with the same timestamp if compilation is happening very quickly; much better to force non-overlapping spans
* Because there are so many new fields and I don't feel like reimplementing each on BwdCompilationMetrics, BwdCompilationMetrics is no more, it's just that everything in CompilationMetrics is now optional.
* The actual frame compile times collection is not modified (still float) to reduce blast radius, so I just convert to microseconds before making the record. At float64 precision (Python's default), you get about microsecond precision on timestamps so shouldn't be a data problem (https://www.leebutterman.com/2021/02/01/store-your-unix-epoch-times-as-float64.html)
* I rename some entries for clarity. In particular, whenever a timing contains all of the its lower phases (e.g., how Inductor also contains Triton compilation) we put "cumulative" in its name.  If something doesn't happen at compile time but is delayed until we have actual real inputs, we put "runtime" in its name.

Test plan:

```
buck2 run @mode/opt @mode/inplace //scripts/oulgen:runner
```

And then inspect https://fburl.com/scuba/dynamo_compile/sandbox/mslu7f5w and verify the us columns are populated and meaningful.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/138975
Approved by: https://github.com/masnesral
2024-10-28 17:17:18 +00:00