[AOTI][dashboard] Update how peak memory is measured (#150534)

Summary: In the dashboard measurement script, AOTI needs to run Eager first to register the output pytree, so the peak memory compression ratio reported on the dashboard is always close to 1. Update the AOTI run to use an extra warmup run, so that the peak memory compression ratio reflects memory use at run time rather than at compile time.
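The pattern can be illustrated with a small stand-in sketch. Here `tracemalloc` stands in for the CUDA memory-stats machinery, and `compile_model` / `run_model` are hypothetical placeholders for AOTI compilation and inference; this is not the dashboard script itself, only the warm peak-memory idea: do a warmup run that triggers compilation, reset the peak counter, then measure the real run.

```python
import tracemalloc

def compile_model():
    # Stand-in for AOTI compilation: allocates a large temporary buffer,
    # returns a small compiled artifact.
    scratch = bytearray(8_000_000)
    del scratch
    return bytearray(10_000)

def run_model():
    # Stand-in for inference: smaller steady-state allocation.
    return bytearray(100_000)

# Cold measurement: the peak includes compile-time allocations,
# so it dwarfs the run-time footprint.
tracemalloc.start()
compiled = compile_model()
out = run_model()
_, cold_peak = tracemalloc.get_traced_memory()
tracemalloc.stop()

# Warm measurement: the warmup run triggers compilation first,
# then the peak counter is reset before the measured run.
tracemalloc.start()
compiled = compile_model()   # warmup run: compilation happens here
out = run_model()
tracemalloc.reset_peak()     # discard the compile-time peak
out = run_model()            # measured run
_, warm_peak = tracemalloc.get_traced_memory()
tracemalloc.stop()

print(warm_peak < cold_peak)
```

With the warm measurement, the reported peak tracks only the run-time allocations, which is what the compression ratio on the dashboard is meant to compare against Eager.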

Pull Request resolved: https://github.com/pytorch/pytorch/pull/150534
Approved by: https://github.com/yushangdi
Bin Bao 2025-04-02 06:00:29 -07:00 committed by PyTorch MergeBot
parent 6fa1b17195
commit d4c30b4599

@@ -3735,6 +3735,10 @@ def run(runner, args, original_dir=None):
# AOTInductor doesn't support control flow yet
runner.skip_models.update(runner.skip_models_due_to_control_flow)
runner.skip_models.update(runner.skip_models_due_to_export_not_supported)
# For AOTI, we only measure the memory compression ratio at the run time
# instead of the compile time, so use a warmup run to trigger AOTI compilation.
args.use_warm_peak_memory = True
elif args.backend == "torchao":
assert "cuda" in args.devices, "Quantization requires CUDA device."
assert args.bfloat16, "Quantization requires dtype bfloat16."