pytorch/torch
Sam Larsen cb15c15157 [logging] Overhaul dynamo_timed and CompilationMetrics logging. (#139849)
Here's the overview:

There's a new context-manager singleton called MetricsContext. Entering the MetricsContext demarcates the boundary within which we'll create a single CompilationMetrics object, and therefore a single dynamo_compile log entry. While inside the MetricsContext, we can set or update many different metrics. Most importantly, `dynamo_timed` can also update the in-progress MetricsContext: in the proposal here, we tell `dynamo_timed` to do so by providing the name of the MetricsContext field to increment. There can be many `dynamo_timed` calls in different parts of the code, each updating a different field. When the MetricsContext finally exits, everything gathered is logged at once. One potential footgun is using `dynamo_timed` before we've entered the MetricsContext, so we assert on that. Another is that the context can be re-entered recursively; we watch for that and log only when the outermost context exits.
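
To make the shape of this concrete, here is a minimal sketch of such a context manager. The class body, the `_depth` counter, and the `on_exit` callback are illustrative assumptions, not the actual torch._dynamo implementation:

```python
# Illustrative sketch only -- not the real torch._dynamo MetricsContext code.
class MetricsContext:
    def __init__(self, on_exit):
        self._on_exit = on_exit  # e.g., a callback that records CompilationMetrics
        self._metrics = {}
        self._depth = 0          # tracks recursive re-entry

    def __enter__(self):
        if self._depth == 0:
            self._metrics = {}   # fresh payload for this compile
        self._depth += 1
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self._depth -= 1
        if self._depth == 0:     # only the outermost exit triggers logging
            self._on_exit(self._metrics)

    def in_progress(self):
        return self._depth > 0

    def set(self, metric, value):
        assert self.in_progress(), "set() called outside of MetricsContext"
        self._metrics[metric] = value

    def increment(self, metric, value):
        assert self.in_progress(), "increment() called outside of MetricsContext"
        self._metrics[metric] = self._metrics.get(metric, 0) + value
```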

Some specifics:
* Introduce MetricsContext - a context manager that on exit, records the CompilationMetrics (which also logs to dynamo_compile).
* Completely remove the concept of frame_phase_timing. Instead, update the MetricsContext during compilation, either directly or via dynamo_timed.
* Remove some globals we previously used to accumulate counters to later populate a CompilationMetrics. We use CompilationMetrics set/update/increment APIs instead.
* `record_compilation_metrics` is now called on exit from MetricsContext.
* Populate legacy CompilationMetrics fields right before logging, inside `record_compilation_metrics`.
* Remove the one-off `add_remote_cache_time_saved` helper; capture that timing directly into the MetricsContext (a brief usage sketch follows this list).
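
As a rough usage sketch of the accumulate-then-log-on-exit flow (the field names and the body of the recording callback are illustrative; the real CompilationMetrics schema has many more fields):

```python
import json

# Hypothetical stand-in for the real record_compilation_metrics / dynamo_compile sink.
def record_compilation_metrics(metrics):
    print(json.dumps(metrics))  # one log entry per outermost context exit

metrics_context = MetricsContext(on_exit=record_compilation_metrics)

with metrics_context:
    metrics_context.set("compile_id", "0/0")                      # set once per compile
    metrics_context.increment("remote_cache_time_saved_s", 1.25)  # accumulate over the compile
    with metrics_context:                                         # recursive re-entry is tolerated
        metrics_context.increment("remote_cache_time_saved_s", 0.5)
# Logging happens here, when the outermost `with` exits.
```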

And specifically, several changes to dynamo_timed:
* "Modernize" the parameters and update all callsites accordingly.
* Move the logging of CompilationMetrics for backwards compiles to the backwards compile location.
* Add a parameter specifying which CompilationMetrics field to update (see the sketch below).
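
Sketched in the same illustrative style (the parameter name `metrics_field` and the microsecond units are assumptions, not the actual `dynamo_timed` signature), the field-updating behavior could look like:

```python
import contextlib
import time

@contextlib.contextmanager
def dynamo_timed(event_name, metrics_field=None):
    # Time the wrapped region; optionally fold the elapsed time into the
    # in-progress MetricsContext under the given field name.
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed_us = int((time.perf_counter() - start) * 1e6)
        if metrics_field is not None:
            # Footgun guard: using dynamo_timed without an active MetricsContext.
            assert metrics_context.in_progress(), (
                f"dynamo_timed({event_name!r}) used outside of MetricsContext"
            )
            metrics_context.increment(metrics_field, elapsed_us)

# Hypothetical usage at a callsite:
with metrics_context:
    with dynamo_timed("backend_compile", metrics_field="backend_compile_time_us"):
        pass  # ... compile work ...
```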

Pull Request resolved: https://github.com/pytorch/pytorch/pull/139849
Approved by: https://github.com/ezyang
ghstack dependencies: #140094
2024-11-11 14:24:23 +00:00
_awaits
_C [SymmetricMemory] improve the API for stream_write_value32 (#139934) 2024-11-11 01:54:35 +00:00
_C_flatbuffer
_custom_op
_decomp Revert "Fix split decomp returning self (#140065)" 2024-11-09 00:16:26 +00:00
_dispatch
_dynamo [logging] Overhaul dynamo_timed and CompilationMetrics logging. (#139849) 2024-11-11 14:24:23 +00:00
_export Support symbolic builtin round in export (#139549) 2024-11-07 02:49:44 +00:00
_functorch [logging] Overhaul dynamo_timed and CompilationMetrics logging. (#139849) 2024-11-11 14:24:23 +00:00
_higher_order_ops [inductor] Support autotune restore_value for user-defined Triton kernels (#139851) 2024-11-08 14:59:00 +00:00
_inductor [logging] Overhaul dynamo_timed and CompilationMetrics logging. (#139849) 2024-11-11 14:24:23 +00:00
_lazy
_library Optimize mutable torch.library.custom_op overhead (#139513) 2024-11-05 18:30:53 +00:00
_logging [logging] Overhaul dynamo_timed and CompilationMetrics logging. (#139849) 2024-11-11 14:24:23 +00:00
_numpy
_prims use torch.sym_sum instead of incremental sum in _cat_meta (#139653) 2024-11-05 07:24:24 +00:00
_prims_common check fake/real mismatches during real tensor prop (#137747) 2024-11-04 23:39:48 +00:00
_refs Revert "Fix unbind_copy and add its decomposition (#134319)" 2024-10-29 04:54:37 +00:00
_strobelight Increase default COMPILE_STROBELIGHT_MAX_STACK_LENGTH to 500 (#138006) 2024-10-17 07:31:32 +00:00
_subclasses refuse to generate a symbolic variable if a float input is inf (#139846) 2024-11-07 09:16:55 +00:00
_vendor
accelerator Introduce a device-agnostic runtime API design (#132204) 2024-10-27 10:37:09 +00:00
amp [MPS] Update error message for supported autocast type (#139192) 2024-10-30 16:48:29 +00:00
ao Revert "Tighten type hints for tensor arithmetic (#135392)" 2024-11-08 23:44:41 +00:00
autograd [Profiler] Create Auto-Trace Frontend for Trace ID (#139310) 2024-10-31 19:02:57 +00:00
backends Revert "[sparse] add search for optimal alg_id to torch.compile (#137427)" 2024-10-24 17:27:06 +00:00
compiler Profile guided optimization for automatic_dynamic (#139001) 2024-11-03 06:29:57 +00:00
contrib Remove unused Python variables in torch/[b-z]* (#136963) 2024-10-19 16:45:22 +00:00
cpu [Inductor][CPP] Add oneDNN BRGEMM config for Half cpp gemm template (#136255) 2024-11-05 05:33:29 +00:00
csrc [SymmetricMemory] improve the API for stream_write_value32 (#139934) 2024-11-11 01:54:35 +00:00
cuda Adds snapshot API for MemPools to get pool memory segments (#133601) 2024-10-29 01:01:47 +00:00
distributed [SymmetricMemory] improve the API for stream_write_value32 (#139934) 2024-11-11 01:54:35 +00:00
distributions Clarify meaning of rate parameter in Gamma distribution (#134847) 2024-11-09 00:22:13 +00:00
export [export] Dedup data-dependent errors based on stacktrace (#139540) 2024-11-05 18:16:05 +00:00
fft
func
futures
fx Fix another item memo loss location + bool specialization bug (#139587) 2024-11-09 03:11:19 +00:00
jit Remove unused Python variables in torch/[b-z]* (#136963) 2024-10-19 16:45:22 +00:00
legacy
lib
linalg
masked [BE]: Update Typeguard to TypeIs for better type inference (#133814) 2024-10-26 15:07:13 +00:00
monitor
mps
mtia
multiprocessing Remove unused Python variables in torch/[b-z]* (#136963) 2024-10-19 16:45:22 +00:00
nested Misc. non-contig NJT fixes (#140160) 2024-11-09 01:18:26 +00:00
nn Add APIs to separate norm calculation and gradient scaling in nn.utils.clip_grad_norm_ (#139662) 2024-11-07 23:13:23 +00:00
onnx [ONNX] Update TorchTensor implementation to handle fake mode (#139534) 2024-11-07 04:36:24 +00:00
optim Revert "Deprecate torch._utils.is_compiling() and torch._dynamo.external_utils.is_compiling() (#127690)" 2024-11-05 23:10:38 +00:00
package Remove unused Python variables in torch/[b-z]* (#136963) 2024-10-19 16:45:22 +00:00
profiler [fx graph cache] Support freezing with FX graph caching (#136505) 2024-11-01 18:29:29 +00:00
quantization Remove unused Python variables in torch/[b-z]* (#136963) 2024-10-19 16:45:22 +00:00
signal
sparse Revert "[sparse] add search for optimal alg_id to torch.compile (#137427)" 2024-10-24 17:27:06 +00:00
special
testing Recover non-standard bool test for msort (#139870) 2024-11-11 02:00:34 +00:00
utils Speed up AMD AOT Inductor lowering by memoizing hipify trie to regex logic (#140156) 2024-11-09 04:28:58 +00:00
xpu Add torch.xpu.get_arch_list and torch.xpu.get_gencode_flags for XPU (#137773) 2024-10-18 02:28:08 +00:00
__config__.py
__future__.py
__init__.py Removing warning for Windows Arm64 (#139746) 2024-11-08 16:23:59 +00:00
_appdirs.py
_classes.py
_compile.py
_custom_ops.py
_deploy.py
_environment.py
_guards.py [hierarchical-compilation][invoke_subgraph] Use tracing context to cache artifacts of dispatch keys (#137965) 2024-10-22 15:33:42 +00:00
_jit_internal.py
_linalg_utils.py
_lobpcg.py
_lowrank.py
_meta_registrations.py [inductor] support masked_scatter w/ unbacked sized source (#138083) 2024-11-06 02:16:25 +00:00
_namedtensor_internals.py
_ops.py remove redundant a (#139046) 2024-10-28 17:47:24 +00:00
_python_dispatcher.py
_size_docs.py
_sources.py
_storage_docs.py
_streambase.py
_tensor_docs.py
_tensor_str.py
_tensor.py Remove numpy dependency for maia serialization (#137600) 2024-10-28 20:57:35 +00:00
_thread_safe_fork.py
_torch_docs.py Fix type description of torch.chunk (#140089) 2024-11-08 15:21:13 +00:00
_utils_internal.py justknobs: Remove JustKnobsConfig and justknobs_feature (#138767) 2024-11-07 00:21:46 +00:00
_utils.py Revert "Deprecate torch._utils.is_compiling() and torch._dynamo.external_utils.is_compiling() (#127690)" 2024-11-05 23:10:38 +00:00
_VF.py
_vmap_internals.py
_weights_only_unpickler.py [DTensor][unpickler] Add DTensor related classes to allowed globals so we can still torch.load(DTensor) with weights_only=True (#139949) 2024-11-08 05:06:11 +00:00
abi-check.cpp
CMakeLists.txt Add torch.version.xpu (#139466) 2024-11-09 13:31:21 +00:00
custom_class_detail.h Remove some pre-cpp17 stuff (#138410) 2024-10-23 00:38:03 +00:00
custom_class.h Remove some pre-cpp17 stuff (#138410) 2024-10-23 00:38:03 +00:00
extension.h
functional.py Clarify opt-einsum usage, fix #127109 (#137596) 2024-10-09 20:31:24 +00:00
hub.py Remove unused Python variables in torch/[b-z]* (#136963) 2024-10-19 16:45:22 +00:00
library.h [1/N] Enable cppcoreguidelines-special-member-functions (#137405) 2024-10-23 00:16:53 +00:00
library.py no-op torch.library.custom_op APIs on torch.deploy (#139509) 2024-11-04 18:01:08 +00:00
overrides.py Add Weighted Loss Functions to PyTorch : WMSE, WMAE, and Weighted Huber Loss (#132049) 2024-10-31 21:59:43 +00:00
py.typed
quasirandom.py
random.py [Torch] Support meta device in random.fork_rng (#137715) 2024-10-16 18:00:39 +00:00
README.txt
return_types.py
script.h
serialization.py Forward fix D65441551 for T206731737 (#139767) 2024-11-05 23:19:08 +00:00
storage.py Fix .to(cpu) for Storage (#138011) 2024-10-23 01:31:48 +00:00
torch_version.py
types.py
version.py.tpl

Note [TH abstraction violation]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

TH/THC provide some hpp headers, which are proper C++ headers rather than
C headers.  These headers serve double duty: they are installed alongside
the public headers, but they are really *internal implementation detail*
headers whose contents should largely not be used by external clients.

Ideally, we would not install these headers at all; instead, you should
use public functions (in headers like `THTensor.h`, NOT `THTensor.hpp`)
to manipulate these structs.  However, there are a few places
in torch/csrc where we violate this abstraction.  They are marked with
a pointer to this note.  Each of those sites will have to be refactored
when we refactor the guts of THTensor and related structures.