Commit Graph

22 Commits

Author SHA1 Message Date
William Wen
e800d27b10 [dashboard] Add graphs for all summary metrics, add additional testing flags (#89580)
Title. Test post: https://github.com/pytorch/torchdynamo/issues/1831#issuecomment-1325572179

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89580
Approved by: https://github.com/davidberard98
2022-11-23 20:11:39 +00:00
William Wen
8bf8e4d71e [dashboard] Add metric graphs back to dashboard (#89531)
Title.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89531
Approved by: https://github.com/davidberard98
2022-11-22 23:42:09 +00:00
Animesh Jain
5bba783d21 [dashboard] Remove aot_cudagraphs and nvprims_nvfuser (#89514)
Helps speed up dashboard runs.

We will bring these back when the backends are ready to be tested on the full model suite.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89514
Approved by: https://github.com/SherlockNoMad
2022-11-22 22:25:30 +00:00
William Wen
77d7f2c659 [dashboard] Add commit date & fix date related issues (#89517)
Add commit date to the build summary of the dashboard. Make the date of a run reflect when the run started, not when it ended. Use PST (UTC-8) rather than GMT (UTC+0) to determine the day.
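A minimal sketch of the date rule above, assuming the run's start timestamp is available and that "PST" means a fixed UTC-8 offset with no daylight-saving adjustment. `run_date` is a hypothetical helper, not the function used in `runner.py`:

```python
from datetime import datetime, timedelta, timezone

# Fixed UTC-8 offset, as described in the commit message.
# Assumption: no daylight-saving adjustment is applied.
PST = timezone(timedelta(hours=-8))


def run_date(start_utc: datetime) -> str:
    """Return the YYYY-MM-DD day a run belongs to, based on its START time."""
    if start_utc.tzinfo is None:
        # Treat naive timestamps as UTC (an assumption of this sketch).
        start_utc = start_utc.replace(tzinfo=timezone.utc)
    return start_utc.astimezone(PST).strftime("%Y-%m-%d")


# A run starting at 03:00 UTC on Nov 23 is still Nov 22 in UTC-8.
print(run_date(datetime(2022, 11, 23, 3, 0, tzinfo=timezone.utc)))  # prints 2022-11-22
```

Keying on the start time means a long nightly run that crosses midnight is still attributed to the day it was launched.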

Test comment: https://github.com/pytorch/torchdynamo/issues/1831#issuecomment-1324176119

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89517
Approved by: https://github.com/anijain2305
2022-11-22 21:17:36 +00:00
William Wen
fa4980cd5e Add commit hash to dynamo dashboard (#89462)
Title. Also fixes a small bug in the dashboard outputs.

Sample: https://github.com/pytorch/torchdynamo/issues/1831#issuecomment-1322732698

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89462
Approved by: https://github.com/anijain2305
2022-11-21 22:56:13 +00:00
William Wen
af448e84eb Fix bug in dynamo dashboard summary stats diff (#89226)
Fixes issue where a suite may not be present in one of the logs.
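A minimal sketch of a summary-stats diff that tolerates a suite missing from one of the two logs. The nested-dict layout (suite name mapped to stat name mapped to value) and the name `diff_summary` are assumptions for illustration, not the structures in `runner.py`:

```python
def diff_summary(prev: dict, curr: dict) -> dict:
    """Diff per-suite summary stats, tolerating suites absent from either log.

    prev/curr map suite name -> {stat name -> value}, e.g.
    {"torchbench": {"passrate": 0.87, "speedup": 1.3}}.
    Returns suite -> {stat: (prev_value, curr_value)}, using None where absent.
    """
    result = {}
    # Iterate over the union of suites so a suite present in only one log
    # is reported instead of raising a KeyError.
    for suite in sorted(set(prev) | set(curr)):
        old = prev.get(suite, {})
        new = curr.get(suite, {})
        stats = sorted(set(old) | set(new))
        result[suite] = {s: (old.get(s), new.get(s)) for s in stats}
    return result
```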

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89226
Approved by: https://github.com/anijain2305
2022-11-17 19:20:49 +00:00
William Wen
640af8d70a More dynamo dashboard improvements (#89155)
A number of dashboard improvements:
- Add accuracy failures to warnings section
- Add regression detection to all metrics (speedup, compile time, peak memory), not just accuracy
- Add testing flag to update-dashboard to prevent image/comment uploads
- Add section for comparing summary statistics (passrate, speedup) between the 2 most recent reports
- Show names of reports for summary stats diff and regression detection sections
- Remove metric graphs from the comment (they can still be found in the generated text file)
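Extending regression detection from accuracy to other metrics requires knowing, per metric, which direction counts as "worse". A minimal sketch, assuming each report is a mapping from model name to a metric value and that peak memory is stored as a raw amount (lower is better). The metric keys, `regressions`, and the `tolerance` parameter are hypothetical:

```python
# Whether a larger value is better for each metric. The names and directions
# are assumptions of this sketch, not keys used by runner.py.
HIGHER_IS_BETTER = {"speedup": True, "compile_time": False, "peak_memory": False}


def regressions(prev: dict, curr: dict, metric: str, tolerance: float = 0.0) -> list:
    """Return models whose `metric` got worse from `prev` to `curr`.

    prev/curr map model name -> metric value; only models present in both
    reports are compared. `tolerance` is a relative slack (0.05 = 5%).
    """
    higher_is_better = HIGHER_IS_BETTER[metric]
    flagged = []
    for model in sorted(set(prev) & set(curr)):
        old, new = prev[model], curr[model]
        if higher_is_better:
            worse = new < old * (1 - tolerance)
        else:
            worse = new > old * (1 + tolerance)
        if worse:
            flagged.append(model)
    return flagged
```

A small relative tolerance keeps run-to-run noise from being reported as a regression.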

Sample comment: https://github.com/pytorch/torchdynamo/issues/1831#issuecomment-1317565972

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89155
Approved by: https://github.com/anijain2305
2022-11-16 21:54:27 +00:00
William Wen
45d2daaf85 Fix lookup file update in dashboard (#89024)
Lookup file should be updated before graphs are generated.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89024
Approved by: https://github.com/mlazos, https://github.com/anijain2305
2022-11-15 02:32:55 +00:00
William Wen
36d87465fb Fix long comment error on dashboard (#89002)
Fix a dashboard comment failure that produced the following trace:
```
Traceback (most recent call last):
  File "/scratch/anijain/dashboard/work/pytorch/benchmarks/dynamo/runner.py", line 1180, in <module>
    DashboardUpdater(args).update()
  File "/scratch/anijain/dashboard/work/pytorch/benchmarks/dynamo/runner.py", line 1119, in update
    self.comment_on_gh(comment)
  File "/scratch/anijain/dashboard/work/pytorch/benchmarks/dynamo/runner.py", line 1096, in comment_on_gh
    subprocess.check_call(
  File "/scratch/anijain/dashboard/env/lib/python3.9/subprocess.py", line 368, in check_call
    retcode = call(*popenargs, **kwargs)
  File "/scratch/anijain/dashboard/env/lib/python3.9/subprocess.py", line 349, in call
    with Popen(*popenargs, **kwargs) as p:
  File "/scratch/anijain/dashboard/env/lib/python3.9/subprocess.py", line 951, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "/scratch/anijain/dashboard/env/lib/python3.9/subprocess.py", line 1821, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
OSError: [Errno 7] Argument list too long: '/data/home/anijain/miniconda/bin/gh'
srun: error: a100-st-p4d24xlarge-27: task 0: Exited with exit code 1
```
That is, the `gh` command we tried to execute was too long for the OS: passing the whole comment as a command-line argument exceeded the system's argument-list size limit.
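One way to avoid `E2BIG` is to write the comment body to a temporary file and pass only the file path, using the `--body-file` flag of `gh issue comment`. This is a sketch of that approach, not necessarily what this commit does; `comment_on_gh` here is a standalone hypothetical function, and running it for real may also need `--repo OWNER/REPO` when invoked outside a clone of the target repository:

```python
import os
import subprocess
import tempfile


def build_gh_comment_cmd(issue: int, body_path: str) -> list:
    # The body is read from a file, so argv stays small regardless of how
    # long the dashboard comment is.
    return ["gh", "issue", "comment", str(issue), "--body-file", body_path]


def comment_on_gh(issue: int, comment: str, dry_run: bool = False) -> list:
    """Post `comment` to a GitHub issue via the gh CLI; returns the argv used."""
    with tempfile.NamedTemporaryFile("w", suffix=".md", delete=False) as f:
        f.write(comment)
        path = f.name
    cmd = build_gh_comment_cmd(issue, path)
    try:
        if not dry_run:
            subprocess.check_call(cmd)
    finally:
        # Clean up the temporary body file whether or not gh succeeded.
        os.remove(path)
    return cmd
```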

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89002
Approved by: https://github.com/davidberard98
2022-11-14 18:43:50 +00:00
William Wen
4bcf2c53e5 Add warnings & regressions info text (#88837)
Add text explaining what the warnings and accuracy regressions dropdowns mean.

Sample: https://github.com/pytorch/torchdynamo/issues/1831#issuecomment-1310770285

Pull Request resolved: https://github.com/pytorch/pytorch/pull/88837
Approved by: https://github.com/anijain2305
2022-11-10 19:22:09 +00:00
William Wen
0b8889c724 Do not flag models in dashboard due to NaN values (#88792)
Title.

Tested by running `python benchmarks/dynamo/runner.py --output-dir ../test-dynamo-runner-logs-4 --training --visualize_logs` on a copy of a recent set of logs.
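Missing metrics typically show up as NaN (pandas, for example, reads empty CSV cells as NaN), and whether a NaN trips a threshold check depends on how the comparison is written. A minimal sketch that skips NaN explicitly; `should_flag_speedup` and its default threshold are hypothetical:

```python
import math


def should_flag_speedup(speedup: float, threshold: float = 0.95) -> bool:
    """Flag a model whose speedup is below `threshold`, ignoring NaN values."""
    # `speedup < threshold` is already False for NaN, but a check written as
    # `not (speedup >= threshold)` is True for NaN and would flag the model.
    # Skipping NaN explicitly makes the intent clear either way.
    if math.isnan(speedup):
        return False
    return speedup < threshold
```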

Pull Request resolved: https://github.com/pytorch/pytorch/pull/88792
Approved by: https://github.com/anijain2305
2022-11-10 01:48:04 +00:00
William Wen
6e3555edea Add absolute latency to dashboard (#88790)
Add absolute latency to dashboard, as requested by https://github.com/pytorch/torchdynamo/issues/1833#issuecomment-1302742914

Tested by setting `run.sh` to
```
# Setup the output directory
rm -rf ../test-dynamo-runner-logs-7/
mkdir ../test-dynamo-runner-logs-7/

# Commands for torchbench for device=cuda, dtype=float32 for training and for performance testing
python benchmarks/dynamo/torchbench.py --performance --float32 -dcuda --output=../test-dynamo-runner-logs-7//inductor_torchbench_float32_training_cuda_performance.csv --training --inductor   --no-skip --dashboard --only mobilenet_v2 --cold_start_latency

# Commands for torchbench for device=cuda, dtype=float32 for training and for accuracy testing
python benchmarks/dynamo/torchbench.py --accuracy --float32 -dcuda --output=../test-dynamo-runner-logs-7//inductor_torchbench_float32_training_cuda_accuracy.csv --training --inductor   --no-skip --dashboard --only mobilenet_v2
```
and running `python benchmarks/dynamo/runner.py --output-dir ../test-dynamo-runner-logs-7/ --dashboard-archive-path /data/home/williamwen/dynamo-runner-logs-copy --training --run --compilers inductor --flag-compilers inductor --suites torchbench --update-dashboard` (this requires commenting out the `generate_commands` line and changing the GitHub issue ID from 681 to something else).

Sample comment: https://github.com/pytorch/torchdynamo/issues/1831#issuecomment-1309645562

NOTE: this change breaks processing of old logs.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/88790
Approved by: https://github.com/anijain2305
2022-11-10 01:45:52 +00:00
William Wen
16bd363863 Fix dynamo dashboard passrate denominator (#88777)
Before the dashboard improvements, the passrate table looked like this:
~~~
+------------------------+------------+-------------+-------------+
|        Compiler        | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
|         eager          | 98%, 54/55 | 100%, 43/43 | 100%, 61/61 |
|       aot_eager        | 95%, 52/55 | 100%, 43/43 | 97%, 59/61  |
|     aot_cudagraphs     | 75%, 41/55 | 49%, 21/43  | 38%, 23/61  |
|    nvprims_nvfuser     | 71%, 39/55 |  16%, 7/43  | 48%, 29/61  |
|        inductor        | 87%, 48/55 | 93%, 40/43  | 95%, 58/61  |
| inductor_no_cudagraphs | 93%, 51/55 | 93%, 40/43  | 95%, 58/61  |
+------------------------+------------+-------------+-------------+
~~~
After those improvements, the table looked like this:
~~~
+------------------------+------------+-------------+-------------+
|        Compiler        | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
|         eager          | 82%, 53/65 | 84%, 43/51  | 82%, 61/74  |
|       aot_eager        | 83%, 54/65 | 84%, 43/51  | 82%, 61/74  |
|     aot_cudagraphs     | 69%, 45/65 | 65%, 33/51  | 38%, 28/74  |
|    nvprims_nvfuser     | 48%, 31/65 | 78%, 40/51  | 26%, 19/74  |
|        inductor        | 75%, 49/65 | 82%, 42/51  | 81%, 60/74  |
| inductor_no_cudagraphs | 82%, 53/65 | 82%, 42/51  | 82%, 61/74  |
+------------------------+------------+-------------+-------------+
~~~
There is no actual regression; the passrate is lower only because the denominator is wrong. Check the fix by running locally (e.g. `python benchmarks/dynamo/runner.py --output-dir ../test-dynamo-runner-logs-5 --training --visualize_logs`) and comparing the passrate table output to the previously correct one.
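Each cell in the tables above is "percent, passed/total", with the percentage rounded to the nearest integer (54/55 gives 98%, 52/55 gives 95%). A minimal sketch of that format, assuming the denominator should be the number of models actually present in one suite's results for one compiler, and that a passing model is recorded with the status string `"pass"`. Both assumptions, and the helper names, are illustrative; the exact cause of the wrong denominator is not described here:

```python
def format_passrate(passed: int, total: int) -> str:
    """Format a passrate cell, e.g. format_passrate(54, 55) -> "98%, 54/55"."""
    if total == 0:
        # Avoid ZeroDivisionError for a suite with no results.
        return "0%, 0/0"
    pct = round(100 * passed / total)
    return f"{pct}%, {passed}/{total}"


def suite_passrate(statuses: dict) -> str:
    """Passrate for ONE suite and ONE compiler.

    statuses maps model name -> status string. The denominator is the number
    of models in this result set, not a count taken from a larger model list.
    """
    passed = sum(1 for status in statuses.values() if status == "pass")
    return format_passrate(passed, len(statuses))
```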

Pull Request resolved: https://github.com/pytorch/pytorch/pull/88777
Approved by: https://github.com/anijain2305
2022-11-10 00:26:58 +00:00
William Wen
0e67b2f7dd Dynamo Dashboard Improvements (#88516)
Implement various features in https://github.com/pytorch/torchdynamo/issues/1644:
- Upload nightly run logs to /fsx before parsing, as a backup in case parsing fails.
- Flag models with (1) < 0.95x speedup, (2) > 2min compile time, (3) < 0.9x compression ratio
- Flag models that were passing yesterday but failed today.
- Other small bug fixes.
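The flagging rules listed above can be sketched as plain threshold checks plus a day-over-day comparison. This assumes speedup is compiled-versus-eager, compile time is in seconds, and the compression ratio is eager peak memory divided by compiled peak memory (so lower is worse); `ModelResult`, `flag_model`, and `newly_failing` are hypothetical names:

```python
from dataclasses import dataclass


@dataclass
class ModelResult:
    name: str
    speedup: float            # compiled vs. eager, e.g. 1.3 means 1.3x faster
    compile_time_s: float     # compilation time in seconds
    compression_ratio: float  # assumed: eager peak memory / compiled peak memory


def flag_model(r: ModelResult) -> list:
    """Return the reasons a model is flagged under the thresholds above."""
    reasons = []
    if r.speedup < 0.95:
        reasons.append("speedup < 0.95x")
    if r.compile_time_s > 120:
        reasons.append("compile time > 2 min")
    if r.compression_ratio < 0.9:
        reasons.append("compression ratio < 0.9x")
    return reasons


def newly_failing(yesterday: dict, today: dict) -> list:
    """Models that passed yesterday but fail today (name -> passed bool)."""
    # Models absent from yesterday's run are not counted as regressions.
    return sorted(m for m, ok in today.items() if not ok and yesterday.get(m, False))
```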

See https://github.com/pytorch/torchdynamo/issues/1831 for sample outputs.
Also tested by running `run.sh`:
```bash
# Setup the output directory
rm -rf ../test-dynamo-runner-logs-3/
mkdir ../test-dynamo-runner-logs-3/

# Commands for torchbench for device=cuda, dtype=float32 for training and for performance testing
python benchmarks/dynamo/torchbench.py --performance --float32 -dcuda --output=../test-dynamo-runner-logs-3//inductor_torchbench_float32_training_cuda_performance.csv --training --inductor   --no-skip --dashboard --only mobilenet_v2 --cold_start_latency

# Commands for torchbench for device=cuda, dtype=float32 for training and for accuracy testing
python benchmarks/dynamo/torchbench.py --accuracy --float32 -dcuda --output=../test-dynamo-runner-logs-3//inductor_torchbench_float32_training_cuda_accuracy.csv --training --inductor   --no-skip --dashboard --only mobilenet_v2
```

with the command
`python benchmarks/dynamo/runner.py --output-dir ../test-dynamo-runner-logs-3/ --dashboard-archive-path /data/home/williamwen/dynamo-runner-logs-copy --training --run --compilers inductor --flag-compilers inductor --suites torchbench --update-dashboard` (this requires commenting out the `generate_commands` line and changing the GitHub issue ID from 681 to something else).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/88516
Approved by: https://github.com/anijain2305
2022-11-07 22:24:44 +00:00
Animesh Jain
f8b73340c8 [dashboard] Replace aot_nvfuser with nvprims_nvfuser (#88437)
@IvanYashchuk @ngimel

Pull Request resolved: https://github.com/pytorch/pytorch/pull/88437
Approved by: https://github.com/soumith
2022-11-03 19:07:03 +00:00
Yanbo Liang
72958b9665 [Dynamo] Update Dynamo benchmarks running commands (#87844)
Fixes https://github.com/pytorch/torchdynamo/issues/1761

Pull Request resolved: https://github.com/pytorch/pytorch/pull/87844
Approved by: https://github.com/jansel
2022-11-01 22:45:13 +00:00
Animesh Jain
d67b2edec3 [dynamo][dashboard] minor fixes for a clean Dashboard (#88056)
* better check for cold start latency
* sort on inductor column for better readability.

cc @mlazos @soumith @voznesenskym @yanboliang @penguinwu @EikanWang @jgong5 @Guobing-Chen @chunyuan-w @XiaobingSuper @zhuhaozhe @blzheng @Xia-Weiwen @wenzhe-nrv @jiayisunx
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88056
Approved by: https://github.com/ngimel
2022-10-31 02:30:29 +00:00
Animesh Jain
83b381d34d [dynamo] add inductor runs w/o cudagraphs (#87847)
as title

cc @jansel @mlazos @soumith @voznesenskym @yanboliang @penguinwu @EikanWang @jgong5 @Guobing-Chen @chunyuan-w @XiaobingSuper @zhuhaozhe @blzheng @Xia-Weiwen @wenzhe-nrv @jiayisunx
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87847
Approved by: https://github.com/jansel
2022-10-27 19:49:29 +00:00
William Wen
cc64863d71 Clean Inductor compilation cache during dynamo dashboard run (#87246)
Implement improvement from https://github.com/pytorch/torchdynamo/issues/1644.

Tested by running `python benchmarks/dynamo/runner.py --print_run_commands --training` and inspecting the generated `run.sh` file for the `--cold_start_latency` flag, e.g.
```
python benchmarks/dynamo/torchbench.py --performance --float32 -dcuda --output=benchmark_logs/inductor_torchbench_float32_training_cuda_performance.csv --training --inductor   --no-skip --dashboard -x fambench_xlmr -x detectron2_fasterrcnn_r_50_c4 -x detectron2_fasterrcnn_r_50_dc5 -x detectron2_maskrcnn_r_101_fpn -x detectron2_maskrcnn_r_50_fpn -x detectron2_fasterrcnn_r_50_fpn -x detectron2_maskrcnn -x detectron2_fasterrcnn_r_101_dc5 -x opacus_cifar10 -x detectron2_maskrcnn_r_101_c4 -x pyhpc_turbulent_kinetic_energy -x maml -x detectron2_fasterrcnn_r_101_fpn -x pyhpc_equation_of_state -x detectron2_fasterrcnn_r_101_c4 -x pyhpc_isoneutral_mixing --cold_start_latency
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87246
Approved by: https://github.com/anijain2305, https://github.com/jansel
2022-10-19 16:39:12 +00:00
Animesh Jain
c30cfb07ab [dynamo][dashboard] Run 2 iterations for the correctness runs (#87104)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87104
Approved by: https://github.com/soumith
2022-10-18 15:53:40 +00:00
Animesh Jain
2b558138cf [inductor] Set correct strides in fallback example run (#87049)
Helps resolve many issues seen in https://github.com/pytorch/torchdynamo/issues/1675
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87049
Approved by: https://github.com/jansel
2022-10-17 15:43:53 +00:00
Jason Ansel
c7c09722ad Move TorchDynamo into PyTorch core (#86461)
Context:
https://github.com/pytorch/torchdynamo/issues/1588

This PR moves [TorchDynamo](https://github.com/pytorch/torchdynamo) and TorchInductor into PyTorch core.
- `torchdynamo` becomes `torch._dynamo`
- `torchinductor` becomes `torch._inductor`

This PR was generated by running `copy_to_core.sh` in https://github.com/pytorch/torchdynamo/pull/1538

Pull Request resolved: https://github.com/pytorch/pytorch/pull/86461
Approved by: https://github.com/voznesenskym
2022-10-13 23:18:06 +00:00