Summary:
X-link: https://github.com/pytorch/benchmark/pull/2454

This adds structured logging overhead, on a per-compile basis, to compilation metrics. To do so, we track the frame_id_frame_compile_id that trace_structured uses to categorize compiles and use it as the key in our timing table.

Implementation notes:
- If trace_structured is ever called without a compile id, that time won't be measured. There's no good way around that today given the compile-id framework of compilation metrics; Strobelight is still the best way to measure on a per-job basis.
- We don't measure the time it takes to log the compilation metrics themselves. Fundamentally, it's not possible to log this properly if we're storing the number *in* compilation metrics, since there's no way to measure it before we emit it (unless we want discrepancies between dynamo_compile and tlparse, which seems suboptimal). Hopefully, for a large job, the cost of structured-logging the compilation metrics themselves is small.
- I wanted to use frame_phase_timing here, but there are a bunch of ids to iron out, and I don't really want to deal with that headache. compilation_time_metrics is close to what I want, but it isn't keyed by frame/compile id, so it's also a bit off. Putting this into torch.logging as a separate thing, so that logging tracks its own overhead, seems fine.

Test Plan:
Run benchmarks/nanogpt with the staging logger and see that the new compilation metric is logged to the staged dynamo_compile table: https://fburl.com/scuba/logger_staging_jjwu_30582a48f1ff9cf5f4ac50a4c40af/xazjg5xq

Note that sum(structured_logging_overhead_s) / sum(entire_frame_compile_time) = 8.387 / 124.278 = 6%, which seems reasonable as the overhead for a small compilation like this. You can also look at samples for a more detailed log.

Reviewed By: oulgen

Differential Revision: D62643611

Pull Request resolved: https://github.com/pytorch/pytorch/pull/136142
Approved by: https://github.com/bobrenjc93
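For illustration, the bookkeeping described above could look roughly like the sketch below: a table keyed by the compile id string, a recorder that drops time when no compile id is available, and a wrapper that charges the wall time of each structured-logging emit to the current compile. This is a minimal sketch, not the actual torch._logging internals; only get_structured_logging_overhead appears in the module exported below, and the other names (record_structured_logging_overhead, timed_trace_structured, emit) are hypothetical.

import time
from collections import defaultdict
from typing import Any, Callable, Dict, Optional

# Hypothetical accumulator: total seconds of structured-logging overhead,
# keyed by the "frame_id_frame_compile_id" string of the compile being traced.
_logging_overhead_by_compile: Dict[str, float] = defaultdict(float)


def record_structured_logging_overhead(compile_id: Optional[str], seconds: float) -> None:
    # If a structured log is emitted without a compile id, the time is dropped,
    # matching the limitation noted in the summary above.
    if compile_id is not None:
        _logging_overhead_by_compile[compile_id] += seconds


def get_structured_logging_overhead(compile_id: str) -> Optional[float]:
    # Total overhead recorded for one compile, or None if nothing was recorded.
    return _logging_overhead_by_compile.get(compile_id)


def timed_trace_structured(
    compile_id: Optional[str], emit: Callable[..., Any], *args: Any, **kwargs: Any
) -> Any:
    # Wrap the real structured-logging emit call and charge its wall time to the
    # compile. The final compilation-metrics record itself is not timed, since
    # its own cost cannot be known before it is written.
    start = time.time()
    try:
        return emit(*args, **kwargs)
    finally:
        record_structured_logging_overhead(compile_id, time.time() - start)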
19 lines
776 B
Python
# Top level logging module for torch logging
# Design doc: https://docs.google.com/document/d/1ZRfTWKa8eaPq1AxaiHrq4ASTPouzzlPiuquSBEJYwS8/edit#

# Simple setup for onboarding (see above doc for more detail):
# 1. register any top-level log qualified name for your module in torch._logging._registrations (see there for examples)
# 2. register any artifacts (<artifact_name> below) in torch._logging._registrations
#    a. call getArtifactLogger(__name__, <artifact_name>) at your logging site instead of the standard logger to log your artifact

import torch._logging._registrations

from ._internal import (
    _init_logs,
    DEFAULT_LOGGING,
    get_structured_logging_overhead,
    getArtifactLogger,
    LazyString,
    set_logs,
    trace_structured,
    warning_once,
)
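As a usage illustration of the onboarding steps described in the comments above, a minimal sketch follows. It assumes an artifact named "graph_breaks" is already registered in torch._logging._registrations (as the dynamo registrations do); the exact set of keyword arguments accepted by set_logs is defined by those registrations.

import logging

import torch._logging

# Enable the top-level dynamo logger and the "graph_breaks" artifact,
# similar in effect to using the TORCH_LOGS environment variable.
torch._logging.set_logs(dynamo=logging.INFO, graph_breaks=True)

# Log through the artifact logger instead of a plain logging.getLogger(__name__),
# so output is gated by whether the artifact is enabled.
graph_break_log = torch._logging.getArtifactLogger(__name__, "graph_breaks")
graph_break_log.debug("example artifact log line")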