pytorch/torch/_logging/structured.py
Edward Z. Yang 1a1fc1047d Add structured trace logs (#120289)
Overall design: https://docs.google.com/document/d/1CX_hJ0PNy9f3R1y8TJrfkSeLkvGjjjLU84BSXgS2AZ8/edit

How to read the diff:
* Most files are me augmenting pre-existing logging with structured variants. For the most part it's simple (esp FX graphs, which have a canonical string representation); it gets more complicated when I decided to JSON-ify some data structure instead of keeping the ad hoc printing (notably, guards and dynamo output graph sizes)
* torch/_functorch/_aot_autograd/collect_metadata_analysis.py is some unrelated fixes I noticed while auditing artifact logs
* torch/_logging/_internal.py has the actual trace log implementation. The trace logger is implement as a logger named torch.__trace which is disconnected from the logging hierarchy. It gets its own handler and formatter (TorchLogsFormatter with _is_trace True). `trace_structured` is the main way to emit a trace log. Unusually, there's a separate "metadata" and "payload" field. The metadata field should not be too long (as it is serialized as a single line) and is always JSON (we put contextual things like compile id in it); the payload field can be long and is emitted after the metadata log line and can span multiple lines.
* torch/_logging/structured.py contains some helpers for converting Python data structures into JSON form. Notably, we have a string interning implementation here, which helps reduce the cost of serializing filenames into the log.
* test/dynamo/test_structured_trace.py the tests are cribbed from test_logging.py, but all rewritten to use expect tests on munged versions of what we'd actually output. Payloads are never tested, since they tend not be very stable.

https://github.com/ezyang/tlparse is a POC Rust program that can interpret these logs.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/120289
Approved by: https://github.com/Skylion007
ghstack dependencies: #120712
2024-02-28 01:01:41 +00:00

38 lines
873 B
Python

"""
Utilities for converting data types into structured JSON for dumping.
"""
import traceback
from typing import Dict, Sequence
import torch._logging._internal
INTERN_TABLE: Dict[str, int] = {}
def intern_string(s: str) -> int:
r = INTERN_TABLE.get(s, None)
if r is None:
r = len(INTERN_TABLE)
INTERN_TABLE[s] = r
torch._logging._internal.trace_structured(
"str", lambda: (s, r), suppress_context=True
)
return r
def from_traceback(tb: Sequence[traceback.FrameSummary]) -> object:
r = []
for frame in tb:
# dict naming convention here coincides with
# python/combined_traceback.cpp
r.append(
{
"line": frame.lineno,
"name": frame.name,
"filename": intern_string(frame.filename),
}
)
return r