Commit Graph

869 Commits

Author SHA1 Message Date
Edward Z. Yang
e1ed5ad5a5 Add a timeout to benchmark script (#90634)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90634
Approved by: https://github.com/voznesenskym
2022-12-11 23:12:29 +00:00
Jiong Gong
181d37475d Simple fix: add missing positional arg in init_optimizer() call (#90641)
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90641
Approved by: https://github.com/kit1980
2022-12-11 13:18:05 +00:00
Bin Bao
fd3f5d7bf7 [inductor] Update TIMM skip list (#90188)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90188
Approved by: https://github.com/anijain2305
2022-12-09 21:30:23 +00:00
Animesh Jain
d91d7a3221 [reland][dynamo] use optimizers correctly in benchmarking (#87492)
Reland https://github.com/pytorch/pytorch/pull/87311

mlazos: updated to use SGD to avoid adding a bunch of additional memory allocations (like Adam does)
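As an aside (an illustration, not code from the PR): the memory difference comes from optimizer state — SGD without momentum keeps no per-parameter buffers, while Adam keeps two per parameter.

```
import torch
import torch.nn as nn

model = nn.Linear(10, 10)

# SGD without momentum keeps no per-parameter state buffers.
sgd = torch.optim.SGD(model.parameters(), lr=0.01)

# Adam lazily allocates exp_avg and exp_avg_sq buffers for every parameter
# on its first step(), adding extra memory per parameter tensor.
adam = torch.optim.Adam(model.parameters(), lr=0.01)
```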

Pull Request resolved: https://github.com/pytorch/pytorch/pull/87492
Approved by: https://github.com/desertfire
2022-12-09 20:32:53 +00:00
Bin Bao
f7cdd3a7a0 [inductor] Use a large tolerance for botnet26t_256 (#90383)
Summary: botnet26t_256 shows random tolerance failures on CI. The root
cause of this randomness is still to be investigated, but let's use a
larger tolerance for now.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90383
Approved by: https://github.com/ezyang
2022-12-07 19:35:06 +00:00
Ram Rachum
351d73b97f Fix exception causes all over the codebase (#90271)
This is the continuation of #90134 and hopefully the final PR in this series.
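For readers unfamiliar with the pattern this series applies, explicit exception chaining looks roughly like the sketch below (my illustration, not code from the PR):

```
def load_config(path):
    try:
        with open(path) as f:
            return f.read()
    except OSError as exc:
        # "raise ... from exc" records the original error as __cause__,
        # so the traceback shows the real root cause instead of an
        # implicit "During handling of the above exception ..." context.
        raise RuntimeError(f"could not load config from {path}") from exc
```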

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90271
Approved by: https://github.com/kit1980
2022-12-07 04:29:00 +00:00
David Berard
8f079b895b [Dynamo+FSDP] Update benchmarks with use_orig_params=True (#90100)
After https://github.com/pytorch/pytorch/pull/89523, we now need to assert use_orig_params=True, even in the non-recursive case where (I think) we wouldn't otherwise need to run with use_orig_params=True.

Tested with `python benchmarks/dynamo/torchbench.py --training --accuracy --only hf_T5 --fsdp`
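For reference, a minimal sketch of the wrapping the benchmark now asserts on (my sketch, assuming a process group is already initialized, e.g. via torchrun):

```
import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

# Placeholder model; the benchmark wraps the real torchbench model instead.
model = nn.Sequential(nn.Linear(16, 16), nn.ReLU(), nn.Linear(16, 16)).cuda()

# use_orig_params=True keeps the original parameter objects visible,
# which dynamo+FSDP now requires even without recursive wrapping.
fsdp_model = FSDP(model, use_orig_params=True)
```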

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90100
Approved by: https://github.com/wconstab
2022-12-07 03:33:58 +00:00
Richard Zou
4068c5467d [Reland] Move functorch/_src to torch/_functorch (#88756) (#90091)
This will be the last disruptive functorch internals change.

Why are we moving these files?
- As a part of rationalizing functorch we are moving the code in
functorch/_src to torch/_functorch
- This is so that we can offer the functorch APIs as native PyTorch APIs
(coming soon) and resolve some internal build issues.

Why are we moving all of these files at once?
- It's better to break developers all at once rather than many times
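
Note (my illustration, not from the PR): only the private internals move; public functorch entry points keep working, e.g.:

```
import torch
from functorch import grad, vmap  # public API, unchanged by this move

# Internals formerly under functorch._src now live under torch._functorch;
# end users should not need to import them directly.
per_sample_grads = vmap(grad(lambda w, x: (w * x).sum()), in_dims=(None, 0))
print(per_sample_grads(torch.randn(3), torch.randn(5, 3)).shape)  # (5, 3)
```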

Test Plan:
- wait for tests

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90091
Approved by: https://github.com/anijain2305, https://github.com/ezyang
2022-12-03 14:17:15 +00:00
Wang, Eikan
0bde810572 Add more debug information for Inductor (#90008)
- Add the graph index to the profile information of the Inductor kernel for better debuggability.

  The generated code for different graphs can produce kernels with the same name. The side effect is that it is hard to identify each kernel's portion of E2E performance, because the profiler aggregates performance by kernel name regardless of which graph produced the kernel. Hence, this PR adds the graph index to the profile information to address this limitation.

- Label arbitrary code ranges for `eager` and `opt` modes for better debuggability (a sketch follows below).

  The profile information of the dynamo benchmarks mixes the eager mode and the opt mode, and it is hard to separate the ranges for the different modes. This PR adds eager and opt marks to the profile information to address this limitation.
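
A minimal sketch of the second point, labeling ranges so the eager and compiled portions can be separated in a trace (the label names here are illustrative, not necessarily the ones the benchmark emits):

```
import torch
import torch._dynamo
from torch.profiler import profile, record_function

model = torch.nn.Linear(64, 64)
opt_model = torch._dynamo.optimize("inductor")(model)
x = torch.randn(8, 64)

with profile() as prof:
    with record_function("eager"):  # eager-mode range
        model(x)
    with record_function("opt"):    # compiled (dynamo/inductor) range
        opt_model(x)

print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=10))
```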

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90008
Approved by: https://github.com/jgong5, https://github.com/jansel
2022-12-02 09:34:48 +00:00
Animesh Jain
3162a48a77 [dynamo][benchmarks] Call zero grad (#90026)
Hoping that it might reduce some flakiness

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90026
Approved by: https://github.com/williamwen42
2022-12-02 04:05:57 +00:00
Animesh Jain
68805b08d1 [benchmarks][dynamo] Trying CI - Set train() for TIMM models accuracy tests (#89780)
Moving to train mode for TIMM models and also raising batch size for accuracy testing.

Raising batch size seems to remove a lot of noise/instability coming from batch_norm decomposition.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89780
Approved by: https://github.com/ngimel
2022-11-30 12:57:35 +00:00
Animesh Jain
5a79144a79 [dashboard] Fix flag compilers (#89853)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89853
Approved by: https://github.com/williamwen42
2022-11-30 01:02:36 +00:00
PyTorch MergeBot
218d9c6e09 Revert "Move functorch/_src to torch/_functorch (#88756)"
This reverts commit 52bc5c1cfe.

Reverted https://github.com/pytorch/pytorch/pull/88756 on behalf of https://github.com/clee2000 due to broke imports in tests 52bc5c1cfe https://github.com/pytorch/pytorch/actions/runs/3574742513/jobs/6010814968 probably a landrace
2022-11-29 17:17:11 +00:00
Richard Zou
52bc5c1cfe Move functorch/_src to torch/_functorch (#88756)
This will be the last disruptive functorch internals change.

Why are we moving these files?
- As a part of rationalizing functorch we are moving the code in
functorch/_src to torch/_functorch
- This is so that we can offer the functorch APIs as native PyTorch APIs
(coming soon) and resolve some internal build issues.

Why are we moving all of these files at once?
- It's better to break developers all at once rather than many times

Test Plan:
- wait for tests

Pull Request resolved: https://github.com/pytorch/pytorch/pull/88756
Approved by: https://github.com/ezyang
2022-11-29 13:55:42 +00:00
Will Constable
7860fcc245 Enable DDPOptimizer by default in dynamo (#88523)
Performance benchmarks on 6 popular models (hf_Bert, hf_T5_large, hf_T5, hf_GPT2_large, timm_vision_transformer, resnet50) from 1-64 GPUs compiled with torchinductor show performance gains or parity with eager, and showed regressions without DDPOptimizer.

Note: resnet50 with a small batch size shows a regression with the optimizer, in part due to failing to compile one subgraph because of input mutation; this will be fixed.

Correctness checks are implemented in CI (test_dynamo_distributed.py),
via single-gpu benchmark scripts iterating over many models
(benchmarks/dynamo/torchbench.py/timm_models.py/huggingface.py),
and via [multi-gpu benchmark scripts in torchbench](https://github.com/pytorch/benchmark/tree/main/userbenchmark/ddp_experiments).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/88523
Approved by: https://github.com/davidberard98
2022-11-29 05:27:06 +00:00
Will Constable
77df2ca9b6 Special-case fsdp wrapped modules to be Unspecialized (#89330)
### Summary
Making dynamo treat the nn.Modules inside FSDP wrappers as 'Unspecialized'
results in dynamo-produced graphs where nn.module parameters are inputs
to the graph rather than attributes of the outer graphmodule.

This helps in FSDP since it forces dynamo to pick the latest copy
of the parameters off the user's nn.Module (which FSDP mutates every pre_forward),
solving the ordering issue in backward.

### Details
Imagine this toy model
```
class MyModule(torch.nn.Module):
    def __init__(self, a, b):
        super(MyModule, self).__init__()
        self.net = nn.Sequential(
            nn.Linear(a, b),
            nn.ReLU(),
        )
    def forward(self, x):
        return self.net(x)

class ToyModel(nn.Module):
    def __init__(self):
        super(ToyModel, self).__init__()
        self.net = nn.Sequential(
            *[MyModule(10, 10000)]
            + [MyModule(10000, 1000)]
            + [MyModule(1000, 5)]
        )

    def forward(self, x):
        return self.net(x)
```
Where FSDP is recursively wrapped around each `MyModule` and then dynamo-compiled, with dynamo already configured to skip/break in FSDP code, you'd expect to get 3 compiled AOT functions corresponding to the contents of `MyModule`, and then to see FSDP's communication ops happen in between them (eagerly). This almost happens (everything works out fine in forward), but in backward there is an ordering issue.
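
Concretely, the setup described above looks roughly like this (a sketch reusing `MyModule`/`ToyModel` from the snippet above and assuming the process group is already initialized):

```
import functools
import torch
import torch._dynamo
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from torch.distributed.fsdp.wrap import transformer_auto_wrap_policy

# Recursively wrap each MyModule instance in its own FSDP unit.
policy = functools.partial(
    transformer_auto_wrap_policy, transformer_layer_cls={MyModule}
)
fsdp_model = FSDP(ToyModel().cuda(), auto_wrap_policy=policy)

# Dynamo is configured to skip/break inside FSDP code, so each MyModule
# body becomes its own compiled AOT function, with FSDP's communication
# ops running eagerly in between.
opt_model = torch._dynamo.optimize("aot_eager")(fsdp_model)
out = opt_model(torch.randn(4, 10, device="cuda"))
```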

FSDP creates a flat buffer for all the parameters that are bucketed together, and then creates views into this buffer to replace the original parameters. On each iteration of forward, it creates a new view after 'filling' the flat buffer with data from an all-gather operation, to 'unshard' the parameters from remote devices. Dynamo traces the first such view and stores it in a compiled graphmodule.

During tracing, we see (1) a view created for the first MyModule, (2) the first MyModule compiled, (3) ... and so on for the rest of the layers.

Then during runtime, we see (A) a view created for the first MyModule (and then orphaned), (B) the first compiled MyModule executed using the old view, ...

This is a problem because we want backward hooks to run right after each compiled backward, but autograd executes those hooks in an order mirroring their execution order during forward. Since we are forever using the views created during steps (1, 3, .., N), which all happen before steps (A, B, ...), all the hooks will happen after all the compiled backwards. An illustration of the problem: a torchviz graph showing the 2 possible orderings of autograd, and a profile showing the view-backward ops happening after all the compiled backwards and before all the backward hooks.

<img width="2069" alt="image" src="https://user-images.githubusercontent.com/4984825/202828002-32dbbd15-8fc3-4281-93e9-227ab5e32683.png">
<img width="2069" alt="image" src="https://user-images.githubusercontent.com/4984825/202828632-33e40729-9a7f-4e68-9ce1-571e3a8dd2dd.png">

A solution is to make dynamo not specialize on these nn modules. It is worth pointing out that this nn.module specialization is de facto failing anyway, as we are modifying .parameters, and this bypasses dynamo's __setattr__ monkeypatch, which should have automatically kicked us out to Unspecialized and forced a recompile.

After unspecializing, the new views (created during steps A, C, ...) are actually _used_ at runtime by the module, so their creation is interleaved with execution and autograd executes their backwards interleaved as well.

The new torchviz graph (this time with names added for the view tensors):
<img width="2043" alt="image" src="https://user-images.githubusercontent.com/4984825/202828480-d30005ba-0d20-45d8-b647-30b7ff5e91d3.png">

And a new profile showing the interleaving of compiled backwards and hooks, allowing overlapping of reduce-scatter.
<img width="2293" alt="image" src="https://user-images.githubusercontent.com/4984825/202828533-bb20a041-19b8-499c-b3cf-02808933df47.png">

@jansel @davidberard98 @aazzolini @mrshenli @awgu @ezyang @soumith @voznesenskym @anijain2305

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89330
Approved by: https://github.com/davidberard98
2022-11-29 01:24:03 +00:00
William Wen
63843401f5 Fix archive issue impacting summary stat diff (#89789)
The summary stat diff was reporting the diff between the previous day and the day before that, instead of between today and the previous day. The issue was that summary stats were not uploaded to the archive before the summary stat differ was run.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89789
Approved by: https://github.com/anijain2305
2022-11-29 00:55:06 +00:00
Bin Bao
465ee7bc09 [inductor] skip dm_nfnet_f0 in TIMM model test (#89768)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89768
Approved by: https://github.com/clee2000
2022-11-28 20:08:41 +00:00
Animesh Jain
cdf4087597 [benchmarks] Disabling gradscaler (#89741)
Disabling GradScaler because:
 1) The benchmark setup runs only 2 iterations of fwd-bwd, so it is not useful.
 2) The current setup shares the grad_scaler between the eager and dynamo
 models, which is bad: GradScaler has state and can adjust the scaling
 factor between the eager and dynamo runs, making the accuracy check
 harder (see the sketch below).
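
Sketch of point 2 (an illustration, not the benchmark's code): the scale factor is mutated by update(), so a shared scaler couples the eager and dynamo runs.

```
import torch

model = torch.nn.Linear(8, 8).cuda()
opt = torch.optim.SGD(model.parameters(), lr=0.01)
scaler = torch.cuda.amp.GradScaler()

with torch.cuda.amp.autocast():
    loss = model(torch.randn(4, 8, device="cuda")).sum()
scaler.scale(loss).backward()
scaler.step(opt)
scaler.update()  # stateful: may change the scale used by the *next* run,
                 # so sharing one scaler across eager and dynamo runs
                 # couples their numerics.
```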

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89741
Approved by: https://github.com/ngimel
2022-11-28 20:08:37 +00:00
William Wen
e800d27b10 [dashboard] Add graphs for all summary metrics, add additional testing flags (#89580)
Title. Test post: https://github.com/pytorch/torchdynamo/issues/1831#issuecomment-1325572179

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89580
Approved by: https://github.com/davidberard98
2022-11-23 20:11:39 +00:00
Bin Bao
049a0f2cd5 [inductor] Update CI model tests (#89499)
Summary:
1) Add model inference test
2) Switch model training test to use AMP

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89499
Approved by: https://github.com/bertmaher
2022-11-23 18:30:51 +00:00
Edward Z. Yang
ed32511974 Don't use explain() for --explain; instead read it off the counters (#89518)
Fixes huggingface problem where example_inputs is not actually the
args.
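
A sketch of reading graph stats directly off dynamo's counters (my reading of torch._dynamo.utils.counters; the exact keys the benchmark uses may differ):

```
import torch
import torch._dynamo as dynamo
from torch._dynamo.utils import counters

def fn(x):
    if x.sum() > 0:  # data-dependent branch forces a graph break
        return x.cos()
    return x.sin()

dynamo.reset()
dynamo.optimize("eager")(fn)(torch.randn(8))

print("unique graphs:", counters["stats"]["unique_graphs"])
print("graph breaks: ", sum(counters["graph_break"].values()))
```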

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89518
Approved by: https://github.com/albanD
2022-11-23 02:43:53 +00:00
Will Constable
26322544b8 Add limited FSDP correctness to torchdynamo benchmark (#89469)
- Does not do recursive wrapping
- Only supports accuracy bench
- Mainly useful for sweeping over models for correctness, in part
  to evaluate whether dynamo support for FSDP is breaking anywhere

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89469
Approved by: https://github.com/davidberard98, https://github.com/aazzolini
2022-11-23 00:19:36 +00:00
William Wen
8bf8e4d71e [dashboard] Add metric graphs back to dashboard (#89531)
Title.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89531
Approved by: https://github.com/davidberard98
2022-11-22 23:42:09 +00:00
Animesh Jain
5bba783d21 [dashboard] Remove aot_cudagraphs and nvprims_nvfuser (#89514)
Helps speed up Dashboard runs

We will bring these back when the backends are ready to be tested on full model suite.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89514
Approved by: https://github.com/SherlockNoMad
2022-11-22 22:25:30 +00:00
William Wen
77d7f2c659 [dashboard] Add commit date & fix date related issues (#89517)
Add commit date to build summary of dashboard. Make the date of the run reflective of when the run started, not when the run ended. Use PST (UTC -8) to determine day, rather than GMT (UTC +0).

Test comment: https://github.com/pytorch/torchdynamo/issues/1831#issuecomment-1324176119

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89517
Approved by: https://github.com/anijain2305
2022-11-22 21:17:36 +00:00
Animesh Jain
f281f435a8 Fix benchmarks - xla tensor test (#89509)
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89509
Approved by: https://github.com/ngimel, https://github.com/shunting314
2022-11-22 18:42:13 +00:00
Mike Iovine
7b0650d5cf Back out "[static-runtime] change the backend for permute_copy" (#89463)
Summary: This permute copy change seems to be causing huge regressions on machines without AVX512. Revert to mitigate. This shouldn't be problematic since the improvement from changing it was super small anyway.

Differential Revision: D41450088

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89463
Approved by: https://github.com/hlu1
2022-11-22 06:26:10 +00:00
Shunting Zhang
e545caa50f dynamo/torchxla integration: trace on xla rather than eager (#88904)
In #87741 we added inference support for the dynamo/torchxla integration. Later, in #88449, we attempted to add training support. That attempt was not smooth because
- we tried 2 things together:
   1. let dynamo trace the model on xla rather than eager
   2. enable training
- it turns out neither of these two tasks is trivial.

Furthermore, item 2 (enable training) depends on item 1 (tracing on xla). We enable training via AOTAutograd, and AOTAutograd lifts all model parameters/buffers as graph inputs. Without item 1 being done, we would need to copy all graph inputs (including model parameters/buffers) from the eager device to the xla device, which hurts performance a lot. Having a cache that maps eager parameters to XLA parameters does not solve the problem either, since an update to one will not automatically sync to the other; they easily go out of sync.

This PR lets dynamo trace the model on XLA rather than eager. This is a preparation step toward enabling training.

Also, tracing on XLA makes the data movement more efficient: we see a 1.50x geomean speedup compared to the previous 1.38x.
```
+-------------------------+--------------------+-------------------------+
| Model                   |   XLA (trace once) |   XLA (trace everytime) |
+=========================+====================+=========================+
| resnet18                |            1.38    |                 1.008   |
+-------------------------+--------------------+-------------------------+
| resnet50                |            1.227   |                 0.998   |
+-------------------------+--------------------+-------------------------+
| resnext50_32x4d         |            1.544   |                 1.008   |
+-------------------------+--------------------+-------------------------+
| alexnet                 |            1.085   |                 1.045   |
+-------------------------+--------------------+-------------------------+
| mobilenet_v2            |            2.028   |                 1.013   |
+-------------------------+--------------------+-------------------------+
| mnasnet1_0              |            1.516   |                 0.995   |
+-------------------------+--------------------+-------------------------+
| squeezenet1_1           |            0.868   |                 1.01    |
+-------------------------+--------------------+-------------------------+
| vgg16                   |            1.099   |                 1.008   |
+-------------------------+--------------------+-------------------------+
| BERT_pytorch            |            3.26    |                 1.027   |
+-------------------------+--------------------+-------------------------+
| timm_vision_transformer |            2.182   |                 1.015   |
+-------------------------+--------------------+-------------------------+
| geomean                 |            1.50389 |                 1.01261 |
+-------------------------+--------------------+-------------------------+
```

Example command
```
GPU_NUM_DEVICES=1 python benchmarks/dynamo/torchbench.py --randomize-input --performance --trace-on-xla --only resnet18 --backend=torchxla_trace_once
```
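
For illustration, tracing on XLA rather than eager amounts to putting the model and its inputs on the XLA device before handing them to dynamo (a sketch assuming the torch_xla package is installed):

```
import torch
import torch._dynamo
import torch_xla.core.xla_model as xm

device = xm.xla_device()
model = torch.nn.Linear(128, 128).to(device)  # parameters live on XLA
x = torch.randn(64, 128, device=device)       # inputs live on XLA too

# No eager->XLA copies of parameters/buffers are needed at call time.
opt_model = torch._dynamo.optimize("torchxla_trace_once")(model)
out = opt_model(x)
xm.mark_step()
```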

Pull Request resolved: https://github.com/pytorch/pytorch/pull/88904
Approved by: https://github.com/wconstab, https://github.com/JackCaoG, https://github.com/jansel
2022-11-22 03:57:04 +00:00
Will Constable
7174572b1e Add torchvis support to dist bench (#89324)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89324
Approved by: https://github.com/davidberard98, https://github.com/albanD
2022-11-22 00:41:33 +00:00
William Wen
fa4980cd5e Add commit hash to dynamo dashboard (#89462)
Title - also fix a small bug with dashboard outputs.

Sample: https://github.com/pytorch/torchdynamo/issues/1831#issuecomment-1322732698

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89462
Approved by: https://github.com/anijain2305
2022-11-21 22:56:13 +00:00
Driss Guessous
1d9e1fca97 Update sdp dispatch logic to enable fused backward (#89154)
# Summary
Reorganizes how the sdp dispatch logic is done in order to enable backwards for fused kernels

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89154
Approved by: https://github.com/cpuhrsch
2022-11-21 20:02:09 +00:00
Xu Zhao
e4d9dbd7d2 Port torchdynamo's torchbench script to userbenchmark (#89239)
Summary:
This Diff ports the torchbench.py script from torchdynamo to torchbench to support the development of internal models.

Currently, this only works with the `--only` option and can only test one model at a time.

Note that the noisy logs are from upstream model code, not the benchmark code.
In the internal environment, `torch._dynamo.config.base_dir` is not writable, so we add an option to specify the output directory.

Test Plan:
```
$ buck2 run mode/opt //caffe2/benchmarks/dynamo:torchbench -- --performance --only ads_dhen_5x --part over --output-directory /tmp/tb-test/
cuda eval  ads_dhen_5x
  1/  1 +0 frames   2s  1 graphs  1 graph calls  412/ 411 = 100% ops 100% time
```

```
$  buck2 run mode/opt //caffe2/benchmarks/dynamo:torchbench -- --performance --only cmf_10x --part over --output-directory /tmp/tb-test/
cuda eval  cmf_10x
  1/  1 +0 frames   1s  1 graphs  1 graph calls  306/ 305 = 100% ops 100% time
```

Reviewed By: jansel

Differential Revision: D41294311

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89239
Approved by: https://github.com/jansel
2022-11-21 17:25:28 +00:00
PyTorch MergeBot
e1d58b1928 Revert "Update sdp dispatch logic to enable fused backward (#89154)"
This reverts commit 2e72ec7982.

Reverted https://github.com/pytorch/pytorch/pull/89154 on behalf of https://github.com/huydhn due to Sorry for reverting your PR but the new test_sdp_math_gradcheck test breaks periodic slow gradcheck, i.e. 419ef2cdcf
2022-11-20 22:14:38 +00:00
Michael Voznesensky
631baecbcd Add --explain flag to bench (#89316)
TORCHDYNAMO_DYNAMIC_SHAPES=1 AOT_DYNAMIC_SHAPES=1 time python benchmarks/dynamo/torchbench.py  --accuracy --explain  --backend aot_eager --train --only BERT_pytorch

Dynamo produced 76 graphs with 75 graph break and 198 ops

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89316
Approved by: https://github.com/ezyang
2022-11-19 03:35:09 +00:00
Driss Guessous
2e72ec7982 Update sdp dispatch logic to enable fused backward (#89154)
# Summary
Reorganizes how the sdp dispatch logic is done in order to enable backwards for fused kernels

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89154
Approved by: https://github.com/cpuhrsch
2022-11-19 02:06:27 +00:00
Animesh Jain
cad5772c2c [dashboard][huggingface] skip accuracy checks for really large models… (#89273)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89273
Approved by: https://github.com/desertfire
2022-11-19 00:22:45 +00:00
Bin Bao
19fcb80551 [inductor] Skip DALLE2_pytorch in torchbench (#89288)
Summary: DALLE2_pytorch fails in eager as well.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89288
Approved by: https://github.com/Krovatkin
2022-11-18 16:21:17 +00:00
Bin Bao
1f7c0ff6e7 [inductor] Temporarily disable functorch_dp_cifar10 test in TorchBench (#89281)
Summary: The failure wasn't caught because of a land race. Skip the test
for now.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89281
Approved by: https://github.com/Krovatkin
2022-11-18 16:07:44 +00:00
Bin Bao
31b10e7d40 Enable inductor CI for TorchBench (#87465)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87465
Approved by: https://github.com/malfet
2022-11-17 23:16:21 +00:00
William Wen
af448e84eb Fix bug in dynamo dashboard summary stats diff (#89226)
Fixes an issue where a suite may not be present in one of the logs.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89226
Approved by: https://github.com/anijain2305
2022-11-17 19:20:49 +00:00
Will Constable
bdc9911575 Fix typo in dist_util.py (#89167)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89167
Approved by: https://github.com/davidberard98
2022-11-17 08:45:27 +00:00
Animesh Jain
74610a1ced [dynamo][benchmarks] HF - Fix seq len and batch sizes (#89165)
Fixes many models in https://github.com/pytorch/torchdynamo/issues/1842
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89165
Approved by: https://github.com/ngimel
2022-11-17 06:14:24 +00:00
Will Constable
f920bfaf2a Use torchrun for dynamo/distributed.py (#89149)
Mainly wanted to confirm torchrun works fine with dynamo/ddp,
but it is also a better system than manually launching processes.

Partially addresses issue #1779

New run commands
------------

single process:
python benchmarks/dynamo/distributed.py [args]

multi-gpu (e.g. 2 gpu on one host):
torchrun --nproc_per_node 2 benchmarks/dynamo/distributed.py [args]

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89149
Approved by: https://github.com/aazzolini
2022-11-16 23:05:34 +00:00
William Wen
640af8d70a More dynamo dashboard improvements (#89155)
A number of dashboard improvements:
- Add accuracy failures to warnings section
- Add regression detection to all metrics (speedup, compile time, peak memory), not just accuracy
- Add testing flag to update-dashboard to prevent image/comment uploads
- Add section for comparing summary statistics (passrate, speedup) between 2 most recent reports
- Show names of reports for summary stats diff and regression detection sections
- Remove metric graphs from the comment (they can still be found in the generated text file)

Sample comment: https://github.com/pytorch/torchdynamo/issues/1831#issuecomment-1317565972

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89155
Approved by: https://github.com/anijain2305
2022-11-16 21:54:27 +00:00
Shunting Zhang
a13433940c allow loading model from a path in torchbench (#89028)
Sometimes it's really convenient to run simple models through the torchbench.py script rather than those from pytorch/benchmark. This PR adds the ability to run any model from a specified path by overloading the --only argument.

This PR is split out from #88904

Here is the usage:

        Specify the path and class name of the model in format like:
        --only=path:<MODEL_FILE_PATH>,class:<CLASS_NAME>

        Due to the fact that dynamo changes current working directory,
        the path should be an absolute path.

        The class should have a method get_example_inputs to return the inputs
        for the model. An example looks like
        ```
        class LinearModel(nn.Module):
            def __init__(self):
                super().__init__()
                self.linear = nn.Linear(10, 10)

            def forward(self, x):
                return self.linear(x)

            def get_example_inputs(self):
                return (torch.randn(2, 10),)
        ```

Test command:
```
# python benchmarks/dynamo/torchbench.py --performance --only=path:/pytorch/myscripts/model_collection.py,class:LinearModel --backend=eager
WARNING:common:torch.cuda.is_available() == False, using CPU
cpu  eval  LinearModel                        0.824x p=0.00
```

Content of model_collection.py
```
from torch import nn
import torch

class LinearModel(nn.Module):
    """
    AotAutogradStrategy.compile_fn ignores graphs with at most 1 call node.
    Make sure this model calls 2 linear layers to avoid being skipped.
    """
    def __init__(self, nlayer=2):
        super().__init__()
        layers = []
        for _ in range(nlayer):
            layers.append(nn.Linear(10, 10))
        self.layers = nn.Sequential(*layers)

    def forward(self, x):
        return self.layers(x)

    def get_example_inputs(self):
        return (torch.randn(2, 10),)
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89028
Approved by: https://github.com/jansel
2022-11-16 00:29:08 +00:00
William Wen
45d2daaf85 Fix lookup file update in dashboard (#89024)
Lookup file should be updated before graphs are generated.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89024
Approved by: https://github.com/mlazos, https://github.com/anijain2305
2022-11-15 02:32:55 +00:00
William Wen
36d87465fb Fix long comment error on dashboard (#89002)
Fix dashboard comment failure due to the following trace:
```
Traceback (most recent call last):
  File "/scratch/anijain/dashboard/work/pytorch/benchmarks/dynamo/runner.py", line 1180, in <module>
    DashboardUpdater(args).update()
  File "/scratch/anijain/dashboard/work/pytorch/benchmarks/dynamo/runner.py", line 1119, in update
    self.comment_on_gh(comment)
  File "/scratch/anijain/dashboard/work/pytorch/benchmarks/dynamo/runner.py", line 1096, in comment_on_gh
    subprocess.check_call(
  File "/scratch/anijain/dashboard/env/lib/python3.9/subprocess.py", line 368, in check_call
    retcode = call(*popenargs, **kwargs)
  File "/scratch/anijain/dashboard/env/lib/python3.9/subprocess.py", line 349, in call
    with Popen(*popenargs, **kwargs) as p:
  File "/scratch/anijain/dashboard/env/lib/python3.9/subprocess.py", line 951, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "/scratch/anijain/dashboard/env/lib/python3.9/subprocess.py", line 1821, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
OSError: [Errno 7] Argument list too long: '/data/home/anijain/miniconda/bin/gh'
srun: error: a100-st-p4d24xlarge-27: task 0: Exited with exit code 1
```
That is, we were trying to execute a gh command whose argument list was too long for the OS.
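
One general way around OS argv limits (an assumption about the technique, not necessarily this PR's exact fix) is to pass the comment body through a file instead of the command line:

```
import subprocess
import tempfile

def comment_on_gh(issue_url: str, comment: str) -> None:
    # Writing the body to a file keeps the gh argv short no matter how
    # long the generated dashboard comment grows.
    with tempfile.NamedTemporaryFile("w", suffix=".txt") as f:
        f.write(comment)
        f.flush()
        subprocess.check_call(
            ["gh", "issue", "comment", issue_url, "--body-file", f.name]
        )
```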

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89002
Approved by: https://github.com/davidberard98
2022-11-14 18:43:50 +00:00
Jason Ansel
8f7e519f12 Skip dynamo benchmark tests under TSAN (#88895)
Summary: Fixes T137546804

Test Plan:
```
buck2 test mode/opt-tsan //caffe2/benchmarks/dynamo:test
buck2 test mode/opt //caffe2/benchmarks/dynamo:test
```

Differential Revision: D41226384

Pull Request resolved: https://github.com/pytorch/pytorch/pull/88895
Approved by: https://github.com/anijain2305
2022-11-13 19:42:42 +00:00
Andrew Gu
4284862db6 [Dynamo][FSDP] Migrate to ModuleWrapPolicy (#88453)
Hello @wconstab! As you saw, `transformer_auto_wrap_policy()` is a misnomer and actually works for any module classes. The PR before this one tries to add a class `ModuleWrapPolicy` that takes in the `module_classes` in its constructor and works just like `transformer_auto_wrap_policy()` without requiring the `functools.partial()`. I hope you do not mind if we update the dynamo benchmarks util file with this migration.
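
The migration in the benchmarks util file is roughly of this shape (a sketch; `BlockToWrap` is a placeholder class):

```
import functools
import torch.nn as nn
from torch.distributed.fsdp.wrap import ModuleWrapPolicy, transformer_auto_wrap_policy

class BlockToWrap(nn.Module):  # placeholder for the real transformer block class
    pass

# Before: a policy function bound with functools.partial.
old_policy = functools.partial(
    transformer_auto_wrap_policy, transformer_layer_cls={BlockToWrap}
)

# After: a policy object built from the same set of module classes.
new_policy = ModuleWrapPolicy({BlockToWrap})

# Either form is passed to FSDP(..., auto_wrap_policy=policy).
```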

The PR before this one might require some back and forth among FSDP devs, so I apologize for any consequent updates to this PR, which in itself is an easy change. I will request review once we know the previous PR is good to land.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/88453
Approved by: https://github.com/wconstab
2022-11-13 14:56:30 +00:00