Commit Graph

73 Commits

Author SHA1 Message Date
Huy Do
a9c663c269 Revert "Flash Attention v2 (#105602)" (#108827)
This reverts commit add45aea1c.

There are some conflicts on some benchmark csv file https://github.com/pytorch/pytorch/pull/105602#issuecomment-1710988951 so I need to revert this manually.

The diff has been reverted internally.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108827
Approved by: https://github.com/kit1980
2023-09-08 07:43:04 +00:00
PyTorch MergeBot
e45b290127 Revert "Revert "Flash Attention v2 (#105602)" (#108827)"
This reverts commit 24e9bbe22a.

Reverted https://github.com/pytorch/pytorch/pull/108827 on behalf of https://github.com/huydhn due to I need to land this revert properly as there are new failures showing up on trunk ([comment](https://github.com/pytorch/pytorch/pull/108827#issuecomment-1711020924))
2023-09-08 03:25:45 +00:00
Huy Do
24e9bbe22a Revert "Flash Attention v2 (#105602)" (#108827)
This reverts commit add45aea1c.

There are some conflicts on some benchmark csv file https://github.com/pytorch/pytorch/pull/105602#issuecomment-1710988951 so I need to revert this manually.

The diff has been reverted internally.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108827
Approved by: https://github.com/kit1980
2023-09-08 02:54:20 +00:00
eellison
738106c1f7 Torchbench model tolerance changes (#108598)
Move detectron2_fcos_r_50_fpn to amp. The minifier showed the following snippet as causing the divergence, where inductor has better numerics than eager:

```
import torch

def foo(x):
    return x > .2

inp = torch.tensor([.2002], device="cuda", dtype=torch.bfloat16)
print(foo(inp))

print(torch.compile(foo)(inp))
```

doctr_reco_predictor had very minimal divergence (.002 vs .001 required), bumping tolerance here.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108598
Approved by: https://github.com/shunting314
2023-09-06 16:52:29 +00:00
drisspg
add45aea1c Flash Attention v2 (#105602)
# Summary
## PR Dependencies
I don't use ghstack :( this is a PR where it would have been helpful. That beings said I am going to peel off some PRs to make reviewing this easier:
- [x] Separate build flags for Flash and MemEff: #107985

### Description
This pull request updates the version of _scaled_dot_product_flash_attention from version 1 to version 2. The changes are based on the flash attention code originally authored by @tridao

### Changes Made
The majority of the changes in this pull request involve:

- Copying over the flash_attention sources.
- Updating header files.
- Removing padding and slicing code from within the flash_attention kernel and relocating it to the composite implicit region of the SDPA. This was need to make the kernel functional and appease autograd.
- Introducing a simple kernel generator to generate different instantiations of the forward and backward flash templates.
- Adding conditional compilation (ifdef) to prevent building when nvcc is invoked with gencode < sm80.
- Introducing a separate dependent option for mem_eff_attention, as flash_attention v2 lacks support for Windows and cannot be built for sm50 generation codes.
- Modifying build.sh to reduce parallelization on sm86 runners and to lower the maximum parallelization on the manywheel builds. This adjustment was made to address out-of-memory issues during the compilation of FlashAttentionV2 sources.
- Adding/Updating tests.

### Notes for Reviewers
This is not a fun review, and I apologize in advance.
Most of the files-changed are in the flash_attn/ folder. The only files of interest here IMO:
- aten/src/ATen/native/transformers/cuda/flash_attn/flash_api.cpp
- aten/src/ATen/native/transformers/cuda/flash_attn/kernels/generate_kernels.py ( this has been incorporated upstream to flash-attention github)

There are a number of files all related to avoiding OOMs in CI/CD. These are typically shell scripts.

### Follow up items
- Include the updates from e07aa036db and 9e5e8bc91e | https://github.com/pytorch/pytorch/issues/108108

### Work Items
- [x] I don't think Windows will be supported for 3.1.0 - Need to update cmakee
- [x] Let multi_query/attention pass through and test | UPDATE: I have the fast path implemented here: https://github.com/pytorch/pytorch/pull/106730 but since this will require changes to semantics of math to call repeat_interleave, I think this should be done as a followup.
- [x] Had to drop cutlass back to 3.0.0 to get it to compile. Need to figure out how to upgrade to 3.1.0 and later. Spoke with Tri and he is going to be taking a look. Note: compiling with clang currently errors for the cute headers.
- [x] Update test exercise above codepath
- [x] Still need to disable on seq_len % 128 != 0 for backward( Tri beat me to it a4f148b6ab)
- [x] Add determinism warning to BWD, Tri got to this one as well: 1c41d2b
- [x] Update dispatcher to universally prefer FlashV2
- [x] Update tests to exercise new head_dims
- [x] Move the head_dim padding from kernel to top level composite implicit function in order to make it purely functional
- [x] Create template generator script
- [x] Initial cmake support for building kernels/ folder
- [x] Replay CudaGraph changes

### Results
#### Forward only
The TFlops are reported here are on a100 that is underclocked.
![flashv2_tflops_vs_seq_len](https://github.com/pytorch/pytorch/assets/32754868/152de46d-8fa6-42f0-9a9c-ef1eb7ae29e7)

#### Forward+Backward
Ran a sweep and for large compute bound sizes we do see a ~2x performance increase for forw+back.
<img width="1684" alt="Screenshot 2023-07-20 at 3 47 47 PM" src="https://github.com/pytorch/pytorch/assets/32754868/fdd26e07-0077-4878-a417-f3a418b6fb3b">

Pull Request resolved: https://github.com/pytorch/pytorch/pull/105602
Approved by: https://github.com/huydhn, https://github.com/cpuhrsch
2023-09-01 22:14:44 +00:00
Elias Ellison
e18f512b81 Update accuracy checking for nan, floats (#108202)
Fixes inference accuracy for `doctr_reco_predictor` and `pyhpc_turbulent_kinetic_energy`.

For the `same(float, float)` comparison we weren't going through the more rigorous tensor comparison path which takes into account the fp64 base results.

Also return True when fp64 base result are not well formed (nan).

I debugged these models and the source of divergence were innocuous:
`doctr_reco_predictor` - can be fixed by turning off layout optimization, decomp for batch norm

`pyhpc_turbulent_kinetic_energy` - divergence caused because fused kernel keeps precision in fp32 instead of casting back and forth from/to fp32/bf16. Fused kernel is better precision, anyway.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108202
Approved by: https://github.com/jansel
2023-09-01 02:54:01 +00:00
Elias Ellison
63eee52ba7 Add Drq to BF16 Higher Tolernace (#108368)
This passes for me on aws gpu but not devgpu, and was already in the `REQUIRE_HIGHER_FP16_TOLERANCE` set.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108368
Approved by: https://github.com/shunting314
2023-09-01 00:29:27 +00:00
Shunting Zhang
eb8659fe81 pass inference accuracy check for detectron2_fcos_r_50_fpn (#108328)
We need a higher tolerance to pass the inference accuracy check for detectron2_fcos_r_50_fpn .

Command:
```
python benchmarks/dynamo/torchbench.py --backend inductor --bfloat16 --accuracy --only detectron2_fcos_r_50_fpn --disable-cudagraphs --inference
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108328
Approved by: https://github.com/jansel
2023-08-31 20:21:20 +00:00
Edward Z. Yang
5b04e9b6ce Install torchrec/fbgemm from source in CI (#106808)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/106808
Approved by: https://github.com/malfet, https://github.com/xuzhao9
2023-08-12 02:08:44 +00:00
Mark Saroufim
1b32ac3cab Update torchbench.txt (#106761)
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/106761
Approved by: https://github.com/malfet
2023-08-09 19:01:21 +00:00
Edward Z. Yang
c379d6283a Don't suppress ModuleNotFoundError if the failure is for an unrelated module (#106807)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/106807
Approved by: https://github.com/williamwen42, https://github.com/voznesenskym
2023-08-09 01:54:49 +00:00
Mark Saroufim
90c264c276 sd flaky on cpu skip (#106726)
waiting for update expected script

Pull Request resolved: https://github.com/pytorch/pytorch/pull/106726
Approved by: https://github.com/malfet
2023-08-08 02:44:47 +00:00
Elias Ellison
578969ca61 skip maml (#106471)
This one benchmark distorts benchmarks because it is so low (.0007, the equivalent of a 1400x speedup). It also has been flakey, which has produced a lot of noise. Disabling.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/106471
Approved by: https://github.com/anijain2305
2023-08-04 22:14:09 +00:00
Howard Huang
236eda4d51 remove jit from torchbench (#106071)
Need to remove jit arguments after changes in https://github.com/pytorch/benchmark/pull/1787

Also curious, is there is a procedure for updating torchbench version in Pytorch CI?

Pull Request resolved: https://github.com/pytorch/pytorch/pull/106071
Approved by: https://github.com/xuzhao9, https://github.com/msaroufim, https://github.com/malfet, https://github.com/lezcano
2023-08-03 21:04:43 +00:00
Mark Saroufim
6268ab2c2d torchbench pin upd: hf auth token, clip, whisper, llamav2, sd (#106009)
Includes stable diffusion, whisper, llama7b and clip

To get this to work I had to Pass in hf auth token to all ci jobs, github does not pass in secrets from parent to child automatically. There's a likelihood HF will rate limit us in case please revert this PR and I'll work on adding a cache next - cc @voznesenskym @penguinwu @anijain2305 @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @Xia-Weiwen @wenzhe-nrv @jiayisunx @ipiszy @chenyang78 @aakhundov @malfet

Something upstream changed in torchbench too where now `hf_Bert` and `hf_Bert_large` are both failing on some dynamic shape looking error which I'm not sure how to debug yet so for now felt a bit gross but added a skip since others are building on top this work @ezyang

`llamav2_7b_16h` cannot pass through accuracy checks cause it OOMs on deepcloning extra inputs this seems to make it not need to show up in expected numbers csv, will figure this when we update the pin with https://github.com/pytorch/benchmark/pull/1803 cc @H-Huang @xuzhao9 @cpuhrsch

Pull Request resolved: https://github.com/pytorch/pytorch/pull/106009
Approved by: https://github.com/malfet
2023-08-03 16:28:40 +00:00
Bin Bao
28d42e66e4 [CI] Add DALLE2_pytorch to FORCE_AMP_FOR_FP16_BF16_MODELS (#104283)
Summary: DALLE2_pytorch inference does not support bfloat16, fallback to use AMP.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104283
Approved by: https://github.com/eellison
2023-06-28 02:37:15 +00:00
Bin Bao
a2988c9e6a [CI] Switch inference accuracy and performance tests to bfloat16 (#103535)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/103535
Approved by: https://github.com/eellison
2023-06-17 00:24:37 +00:00
Edward Z. Yang
bc6ec97e02 Switch dynamic_shapes to True by default (#103597)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103597
Approved by: https://github.com/voznesenskym
2023-06-15 15:16:20 +00:00
Animesh Jain
d6da649a1b [benchmark] hf_T5_base - torchbench original batchsize too large (#103442)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/103442
Approved by: https://github.com/desertfire
2023-06-15 01:06:40 +00:00
Animesh Jain
16c2090b2d [benchmark][compile] Limit number of bounding boxes to 5 (#103413)
Depends on https://github.com/pytorch/benchmark/pull/1729

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103413
Approved by: https://github.com/ezyang
2023-06-15 01:06:40 +00:00
Animesh Jain
428bff842d [benchmarks] Torchbench llama is not suitable for training (#103094)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/103094
Approved by: https://github.com/eellison, https://github.com/desertfire
2023-06-07 01:33:07 +00:00
Animesh Jain
33a49eeae7 [benchmark] Flag to switch on activation checkpointing for HF models (#102557)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/102557
Approved by: https://github.com/ngimel, https://github.com/Chillee
2023-05-30 23:46:14 +00:00
Edward Z. Yang
22ca1a1124 Partially fix shape mismatch in vision_maskrcnn (#101477)
The bulk of the heavy lifting is happening in
https://github.com/pytorch/vision/pull/7592

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/101477
Approved by: https://github.com/voznesenskym
2023-05-21 05:20:08 +00:00
Edward Z. Yang
41468833fb vision_maskrcnn is now deterministic (#101116)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/101116
Approved by: https://github.com/ngimel
2023-05-16 21:32:17 +00:00
Edward Z. Yang
f48718f749 Update torchbench pin (#101365)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/101365
Approved by: https://github.com/albanD, https://github.com/awgu
2023-05-15 16:52:31 +00:00
Edward Z. Yang
fcf2fb273c Make missing model import error marginally better (#101221)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/101221
Approved by: https://github.com/albanD, https://github.com/anijain2305
2023-05-14 19:57:01 +00:00
Edward Z. Yang
41a4e22015 Update torchbench pin (#101071)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/101071
Approved by: https://github.com/malfet
2023-05-11 18:09:40 +00:00
Edward Z. Yang
ad070b6dfa Check canary_models for models too in torchbench.py (#101081)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/101081
Approved by: https://github.com/desertfire
2023-05-11 13:23:17 +00:00
Edward Z. Yang
d25c93f919 Remove speech_transformer workaround, torchbench handles it correctly now (#100558)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100558
Approved by: https://github.com/albanD
2023-05-04 01:14:24 +00:00
Yanbo Liang
896eb1db26 [Dynamo] Skip TB Background_Matting model eager accuracy check because of non deterministic (#100513)
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100513
Approved by: https://github.com/anijain2305
2023-05-03 07:06:50 +00:00
Yanbo Liang
3009c42e7d [CI Testing] Re-enable timm_efficientdet training (#99787)
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/99787
Approved by: https://github.com/desertfire
2023-04-24 20:05:15 +00:00
Edward Z. Yang
fc8fa6c356 Require at least one tensor to be marked dynamic with --dynamic-batch-only (#99620)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/99620
Approved by: https://github.com/voznesenskym
2023-04-21 00:17:08 +00:00
Will Constable
9ac2b041c9 Make opacus xfail instead of skip (#99380)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99380
Approved by: https://github.com/desertfire, https://github.com/anijain2305
2023-04-19 21:09:06 +00:00
Huy Do
5d395769a6 Skip vision_maskrcnn after #98923 (#99394)
This is failing in trunk as documented in https://github.com/pytorch/pytorch/issues/99438

Pull Request resolved: https://github.com/pytorch/pytorch/pull/99394
Approved by: https://github.com/desertfire
2023-04-19 17:07:07 +00:00
Bin Bao
46b9377190 [CI] Collect inductor max-autotune performance every Sunday (#99387)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99387
Approved by: https://github.com/malfet, https://github.com/huydhn
2023-04-18 13:20:13 +00:00
Will Constable
6eab5e88c8 Graph-break on allowed modules if they have hooks (#97184)
Allowed modules are stuck into dynamo's fx graph as call_module
nodes, without dynamo doing any tracing of the module.  This means
during AOT trace time, hooks will fire during tracing when the
call_module is executed, but the hooks themselves will disappear
after that and not be present in the compiled program.
  (worse, if they performed any tensor operations, those would get
   traced so you could end up with part of the hook's functionality).

To circumvent this, there are two options for 'allowed modules' with hooks.
1) don't treat them as 'allowed' - trace into them
2) graph-break, so the module is no longer part of the dynamo trace at all

(1) will fail for users that opted into allowed modules becuase they know
    their module has problems being traced by dynamo.
(2) causes graph breaks on common modules such as nn.Linear, just because they
    are marked as 'allowed'.

It would help matters if we could differentiate between types of allowed modules
  (A) allowed to avoid overheads - used for common ops like nn.Linear
  (B) allowed to avoid dynamo graphbreaks caused by unsupported code

Ideally, we'd use method (1) for group (A) and (2) for (B).

For now, graph-break on all cases of allowed modules.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/97184
Approved by: https://github.com/jansel
2023-04-15 01:46:15 +00:00
Bin Bao
5210d7c423 [CI] Mark vision_maskrcnn as NONDETERMINISTIC (#98570)
Summary: vision_maskrcnn fails eager checking, so mark it as
NONDETERMINISTIC to reduce noise on the dashboard.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/98570
Approved by: https://github.com/eellison, https://github.com/huydhn
2023-04-07 19:33:20 +00:00
Bin Bao
c4de7fdef5 [CI] Mark sebotnet33ts_256 as nondeterministic (#98356)
Summary: The goal is make sure the new dashboard doesn't give noisy
alert on this test.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/98356
Approved by: https://github.com/ezyang
2023-04-05 12:05:47 +00:00
Bin Bao
bd6db54285 [CI] Mark mobilenet_v3_large as nondeterministic (#98314)
Summary: Skip mobilenet_v3_large for accuracy checking to reduce
noise on the dashboard. The root cause still needs to be investigated.

mobilenet_v3_large shows random accuracy check failures with different
error values from time to time, and here are some examples:
```
cuda train mobilenet_v3_large                  [2023-04-04 14:54:50,990] torch._dynamo.utils: [ERROR] RMSE (res-fp64): 0.02172, (ref-fp64): 0.01068 and shape=torch.Size([960, 1, 5, 5])
[2023-04-04 14:54:50,990] torch._dynamo.utils: [ERROR] Accuracy failed for key name features.14.block.1.0.weight.grad
```
```
cuda train mobilenet_v3_large                  [2023-04-04 14:57:59,972] torch._dynamo.utils: [ERROR] RMSE (res-fp64): 0.07744, (ref-fp64): 0.03073 and shape=torch.Size([72, 1, 5, 5])
[2023-04-04 14:57:59,973] torch._dynamo.utils: [ERROR] Accuracy failed for key name features.4.block.1.0.weight.grad
```

One observation is turnning off cudnn in the eager mode with
`torch.backends.cudnn.enabled = False` makes the non-deterministic
behvior go away but meanwhile it fails accuaracy checking consistently.
Minifier didn't help to narrow down the error.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/98314
Approved by: https://github.com/huydhn
2023-04-04 21:55:23 +00:00
Bin Bao
69ff39d2e7 Skip gat, gcn and sage for TorchBench CUDA test (#98244)
Summary: The three models only support CPU for now.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/98244
Approved by: https://github.com/ezyang
2023-04-04 01:06:18 +00:00
BowenBao
60a68477a6 Bump black version to 23.1.0 (#96578)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/96578
Approved by: https://github.com/ezyang
2023-03-15 06:27:59 +00:00
Bin Bao
02792ff16f [CI] Make inductor-perf-test-nightly produce data for dashboard (#95685)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/95685
Approved by: https://github.com/ezyang, https://github.com/huydhn
2023-03-06 03:14:03 +00:00
Natalia Gimelshein
f2aee8b8d5 small fixes for mlir backend (#94717)
Fixes for skipped tests with mlir triton backend (will unskip once #94249 lands)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94717
Approved by: https://github.com/malfet, https://github.com/atalman
2023-02-13 22:42:53 +00:00
Nikita Shulga
4869929f32 Update Triton hash (#94249)
That includes MLIR + latest packaging changes (that also download ptxas from CUDA-12)
Tweak CI to install gcc-9 to build trition

Disable a few tests to make everything be correct

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94249
Approved by: https://github.com/Skylion007, https://github.com/ngimel, https://github.com/weiwangmeta
2023-02-13 13:17:36 +00:00
Xuehai Pan
8d45f555d7 [BE] [1/3] Rewrite super() calls in caffe2 and benchmarks (#94587)
Rewrite Python built-in class `super()` calls. Only non-semantic changes should be applied.

- #94587
- #94588
- #94592

Also, methods with only a `super()` call are removed:

```diff
class MyModule(nn.Module):
-   def __init__(self):
-       super().__init__()
-
    def forward(self, ...):
        ...
```

Some cases that change the semantics should be kept unchanged. E.g.:

f152a79be9/caffe2/python/net_printer.py (L184-L190)

f152a79be9/test/test_jit_fuser_te.py (L2628-L2635)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94587
Approved by: https://github.com/ezyang
2023-02-11 18:19:48 +00:00
Michael Voznesensky
333e771394 Add benchmarks.py to run all benchmarks, add new file with all torchbench model names (#94146)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94146
Approved by: https://github.com/ezyang
2023-02-08 01:18:38 +00:00
atalman
6e285c479d Remove cuda 11.6 from CI replace with 11.7 (#93406)
Remove cuda 11.6 from CI replace with 11.7
Following the Release readme here: https://github.com/pytorch/pytorch/blob/master/RELEASE.md#release-compatibility-matrix

Pull Request resolved: https://github.com/pytorch/pytorch/pull/93406
Approved by: https://github.com/malfet, https://github.com/desertfire
2023-02-02 19:16:05 +00:00
Edward Z. Yang
c52567ec18 Switch CI exclusions to use exact match. (#92761)
Since the CI exclusions are hard-coded in our script, we might as well require them to match exactly. This solved some head scratching where I was like, "this model is not obviously excluded, why is it not showing up in CI."

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/92761
Approved by: https://github.com/jansel
2023-01-22 17:10:20 +00:00
Jason Ansel
7c1c239db1 [inductor] Rewrite Triton templates + epilogue fusion (retry) (#91575)
This reverts commit 94262efc7d to reland #91105 / #90738.

Fixes https://github.com/pytorch/torchdynamo/issues/2015

Pull Request resolved: https://github.com/pytorch/pytorch/pull/91575
Approved by: https://github.com/ngimel
2023-01-11 00:08:03 +00:00
blzheng
0c1777acec Dynamo benchmark: add CPU specific changes (#88477)
This pr adds some CPU specific changes:

- Add support for IPEX backend
- https://github.com/pytorch/torchdynamo/issues/1618
- https://github.com/pytorch/torchdynamo/issues/1534
- Enable CPU launcher in runner.py.
- Fix the issue that some environment variables are not support on CPU

Pull Request resolved: https://github.com/pytorch/pytorch/pull/88477
Approved by: https://github.com/jgong5, https://github.com/jansel
2023-01-07 09:26:06 +00:00