pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 12:21:27 +01:00

Author	SHA1	Message	Date
Simon Fan	ef8d461b09	Fix torchbench --multiprocess (#109657 ) `python benchmarks/dynamo/torchbench.py --multiprocess` currently fails due to initializing distributed multiple times: ``` torch.distributed.DistNetworkError: The server socket has failed to listen on any local network address. The server socket has failed to bind to [::]:6789 (errno: 98 - Address already in use). The server socket has failed to bind to 0.0.0.0:6789 (errno: 98 - Address already in use). ``` Because torchbench calls itself via mp.spawn, there is the parent run (with `--multiprocess`) and child runs (with `--multiprocess --only <model>`). This PR addresses this by fixing two issues: 1) distributed is initialized once in parent run and once in child runs, it should be initialized only in child runs where we have accurate rank and world size info 2) torchbench overrides CUDA_VISIBLE_DEVICES/world_size sometimes, but it shouldn't for distributed use cases where we want to use all available gpus I am also adding a CI test to cover this type of issue in #109311 ### Test plan parent run test: `python benchmarks/dynamo/torchbench.py --ci --accuracy --timing --explain --inductor --device cuda --inference --bfloat16 --output /home/xmfan/local/pytorch/test/test-reports/inference_torchbench.csv --multiprocess` child run test: `python benchmarks/dynamo/torchbench.py --ci --accuracy --timing --explain --inductor --device cuda --inference --bfloat16 --output /home/xmfan/local/pytorch/test/test-reports/inference_torchbench.csv --multiprocess --only simple_gpt` Pull Request resolved: https://github.com/pytorch/pytorch/pull/109657 Approved by: https://github.com/H-Huang	2023-09-21 16:53:07 +00:00
Mark Saroufim	0ec9f59f70	Loudly Error in dynamo bench if eager fails (#109536 ) Helps debug https://github.com/pytorch/benchmark/issues/1901 I will wait until the ONNX beartype sev is fixed before merging Pull Request resolved: https://github.com/pytorch/pytorch/pull/109536 Approved by: https://github.com/xuzhao9	2023-09-19 00:40:42 +00:00
angelayi	5b13f74e9b	[export] Update how we input kwargs (#109160 ) Previously, the code for passing inputs to exported program was: ``` if kwargs: return (args, kwargs) else: return args ``` However, this causes some inconsistency where if the original input contains args and kwargs, the treespec would be a tuple containing a tuple of arguments, and a dictionary of keyword arguments. But if the original input only contained args, the treespec would just be a tuple of arguments. This inconsistency causes some inconveniences in the runtime. So I updated the code to just always keep the kwargs around. Pull Request resolved: https://github.com/pytorch/pytorch/pull/109160 Approved by: https://github.com/zhxchen17, https://github.com/avikchaudhuri	2023-09-19 00:04:32 +00:00
Animesh Jain	f786fbdebd	Reland 3rd try [finishing colesbury's PR 100642] Guard on nn.Module dicts and type (#109323 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/109323 Approved by: https://github.com/huydhn, https://github.com/voznesenskym	2023-09-15 08:44:14 +00:00
Simon Fan	54c5f474a7	Forward rank and world size info to Torchbench models when using dynamo runner (#108438 ) Adding support to pass rank and world_size to torchbench model, via its extra_args parameter: https://github.com/pytorch/benchmark/blob/main/torchbenchmark/util/model.py#L83C80-L83C90 This is used for models which distribute over multiple GPUs e.g. simple_gpt https://github.com/pytorch/benchmark/pull/1867 Also add an option to skip multiprocess only gpu models Testing via `python benchmarks/dynamo/torchbench.py -d cuda --output=benchmark_logs/performance.csv --inference --performance --timing --print-memory --multiprocess --only simple_gpt` Pull Request resolved: https://github.com/pytorch/pytorch/pull/108438 Approved by: https://github.com/Chillee	2023-09-14 21:01:20 +00:00
angelayi	c3945b5f84	Update HF version to commit hash (6c26faa) (#107400 ) Some [errors](https://ossci-raw-job-status.s3.amazonaws.com/log/15968424899) in the [torchinductor hf benchmarks](https://hud.pytorch.org/benchmark/huggingface/inductor_aot_inductor?startTime=Thu,%2010%20Aug%202023%2018:05:47%20GMT&stopTime=Thu,%2017%20Aug%202023%2018:05:47%20GMT&granularity=hour&mode=inference&dtype=bfloat16&lBranch=main&lCommit=384e0d104fd077d31efafc564129660e9b7a0f25&rBranch=main&rCommit=03414081ff7ee011e17ee10f9ddb2584811bf965) should be fixed in the most recent release (for example, this [line](`c036c814f4/src/transformers/models/opt/modeling_opt.py (L688)`) no longer exists). Additionally, I landed a [commit (6c26faa)](`6c26faa159`) to the HF transformers repo to fix one of the graph breaks. This PR results in [76% pass rate for the export + aot inductor HF benchmark!](https://hud.pytorch.org/benchmark/compilers?startTime=Thu%2C%2010%20Aug%202023%2022%3A45%3A09%20GMT&stopTime=Thu%2C%2017%20Aug%202023%2022%3A45%3A09%20GMT&granularity=hour&suite=torchbench&mode=inference&dtype=bfloat16&lBranch=angelayi/hf_version&lCommit=0accaaca2fa70ca2f78c1a587dd4b6750448dd90&rBranch=main&rCommit=03414081ff7ee011e17ee10f9ddb2584811bf965) Pull Request resolved: https://github.com/pytorch/pytorch/pull/107400 Approved by: https://github.com/ezyang, https://github.com/desertfire, https://github.com/malfet	2023-09-12 15:25:28 +00:00
PyTorch MergeBot	56c2386157	Revert "reland [finishing colesbury's PR 100642] Guard on nn.Module dicts and type (#108883 )" This reverts commit `d4230e5574`. Reverted https://github.com/pytorch/pytorch/pull/108883 on behalf of https://github.com/huydhn due to Per the discussion thread on D49122208, reverting this change ([comment](https://github.com/pytorch/pytorch/pull/108883#issuecomment-1712707853))	2023-09-10 04:40:02 +00:00
Animesh Jain	d4230e5574	reland [finishing colesbury's PR 100642] Guard on nn.Module dicts and type (#108883 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/108883 Approved by: https://github.com/voznesenskym, https://github.com/huydhn	2023-09-09 03:12:31 +00:00
Bin Bao	e91f66471c	[reland][inductor] Switch to use the runtime interface for AOTInductor testing (#108878 ) Summary: This is a reland of https://github.com/pytorch/pytorch/pull/108663 Pull Request resolved: https://github.com/pytorch/pytorch/pull/108878 Approved by: https://github.com/muchulee8	2023-09-08 17:58:35 +00:00
PyTorch MergeBot	428f5f9e7e	Revert "[inductor] Switch to use the runtime interface for AOTInductor testing (#108663 )" This reverts commit `366ce589d0`. Reverted https://github.com/pytorch/pytorch/pull/108663 on behalf of https://github.com/Chillee due to Sorry :'( Need to revert to resolve merge conflict for another revert ([comment](https://github.com/pytorch/pytorch/pull/108663#issuecomment-1711076411))	2023-09-08 05:01:27 +00:00
PyTorch MergeBot	72f24d0001	Revert "[dynamo][finishing colesbury's PR 100642] Guard on nn.Module dicts and type (#108528 )" This reverts commit `34bb74c4cf`. Reverted https://github.com/pytorch/pytorch/pull/108528 on behalf of https://github.com/huydhn due to Sorry for reverting your change, but it has some nasty merge conflicts after the revert of D48910794. I need to revert this so the conflict could be resolved. Please help rebase this tomorrow and reland the change ([comment](https://github.com/pytorch/pytorch/pull/108528#issuecomment-1711034781))	2023-09-08 03:49:41 +00:00
Bin Bao	366ce589d0	[inductor] Switch to use the runtime interface for AOTInductor testing (#108663 ) Summary: Switch AOTInductor unit tests and integration tests to invoke the same runtime interface. This is only an effort to unify the usage of the runtime. The interface scrutiny will come in later PRs. Pull Request resolved: https://github.com/pytorch/pytorch/pull/108663 Approved by: https://github.com/ezyang ghstack dependencies: #108653	2023-09-07 23:38:11 +00:00
Animesh Jain	34bb74c4cf	[dynamo][finishing colesbury's PR 100642] Guard on nn.Module dicts and type (#108528 ) This PR is a 99% copy paste of Sam Gross (@colesbury) work at https://github.com/pytorch/pytorch/pull/100642. Copied from there -------- The NN_MODULE guard now subsumes guards on Module attributes. The check_fn will fail if the module attributes are changed (such as Module.training), parameters, submodules, and buffers are added or removed, and if fields are changed on the type itself. This gives up specificity in the guard check -- if any field is changed the check_fn fails -- for faster overall checks. ----- Pull Request resolved: https://github.com/pytorch/pytorch/pull/108528 Approved by: https://github.com/ezyang	2023-09-07 01:45:47 +00:00
JackCaoG	e73ec92ad2	Minor fixs to make torchbench runable on torch/xla (#107919 ) `import torch_xla.core.xla_model as xm` no longer triggers the XLA runtime to initialize, so the device is explicitly created here. This is a workaround for https://github.com/pytorch/xla/issues/4174. The `is_correct` reference has been deleted; I think it is dead code. After this patch, I am able to run ``` python benchmarks/dynamo/torchbench.py --randomize-input --performance --trace-on-xla --training --backend=openxla --only resnet50 ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/107919 Approved by: https://github.com/shunting314, https://github.com/wconstab	2023-09-06 22:35:53 +00:00
Bin Bao	60bd30ee0b	[inductor] Move AOTInductor runtime headers (#108564 ) Summary: Move AOTInductor runtime header files into its own subdirectory, to separate them from to-be-added libtorch C interface. Reviewed By: frank-wei Differential Revision: D48905038 Pull Request resolved: https://github.com/pytorch/pytorch/pull/108564 Approved by: https://github.com/frank-wei	2023-09-06 11:50:41 +00:00
Bin Bao	28c5b62210	[inductor] Use empty_strided to create output tensors when testing AOTInductor (#108364 ) Summary: This will fix 3 fail_accuracy failures in HF. Test Plan: ``` python benchmarks/dynamo/huggingface.py --bfloat16 --accuracy --inference --device cuda --export-aot-inductor --only T5Small ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/108364 Approved by: https://github.com/angelayi ghstack dependencies: #108412	2023-09-06 02:04:32 +00:00
Yanbo Liang	ff28b4b908	Fix dynamo benchmark config --print-graph-breaks (#108584 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/108584 Approved by: https://github.com/anijain2305	2023-09-05 23:31:43 +00:00
Mark Saroufim	5f5caed25a	do not cast all inputs in benchmarks (#108456 ) Fixes stable diffusion not showing up in the inference dashboard even though it shows up in the training dashboard. The reason is that stable diffusion in torchbench has a line like `input_tensor = input_tensor.long().to(self.device)`, and if you cast this input to bfloat16, inference fails. Pull Request resolved: https://github.com/pytorch/pytorch/pull/108456 Approved by: https://github.com/cpuhrsch	2023-09-02 03:13:17 +00:00
Bin Bao	06d74e6b24	Revert "[AOTInductor] Include constants in AOTInductor .so file. (#10… (#108349 ) This reverts commit `c3239442a3` due to internal test failures. Pull Request resolved: https://github.com/pytorch/pytorch/pull/108349 Approved by: https://github.com/aakhundov, https://github.com/zhxchen17	2023-08-31 16:26:02 +00:00
Mu-Chu Lee	c3239442a3	[AOTInductor] Include constants in AOTInductor .so file. (#107718 ) Summary: Include the constants into AOTInductor .so file. We do not modify existing API signatures but create necessary format with weight lifted out instead. Test Plan: test/inductor/test_aot_inductor.py Pull Request resolved: https://github.com/pytorch/pytorch/pull/107718 Approved by: https://github.com/angelayi, https://github.com/eellison	2023-08-29 22:37:30 +00:00
PyTorch MergeBot	2f226804a0	Revert "Minor fixs to make torchbench runable on torch/xla (#107919 )" This reverts commit `ed8f21282f`. Reverted https://github.com/pytorch/pytorch/pull/107919 on behalf of https://github.com/izaitsevfb due to Conflicts with the revert of 106914 ([comment](https://github.com/pytorch/pytorch/pull/107919#issuecomment-1696662453))	2023-08-29 02:18:07 +00:00
JackCaoG	ed8f21282f	Minor fixs to make torchbench runable on torch/xla (#107919 ) `import torch_xla.core.xla_model as xm` no longer triggers the XLA runtime to initialize, so the device is explicitly created here. This is a workaround for https://github.com/pytorch/xla/issues/4174. The `is_correct` reference has been deleted; I think it is dead code. After this patch, I am able to run ``` python benchmarks/dynamo/torchbench.py --randomize-input --performance --trace-on-xla --training --backend=openxla --only resnet50 ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/107919 Approved by: https://github.com/shunting314, https://github.com/wconstab	2023-08-26 03:34:54 +00:00
blzheng	1ea83f04d2	benchmark: convert output of fp64 to torch.float64 (#107375 ) This PR converts the fp64 output to torch.float64 before checking for accuracy. Why do we need this change? The llama model in torchbench converts its output to float before returning it. `bad4e9ac19/torchbenchmark/models/llama/model.py (L241)` However, the correctness checker will not compare the res results with fp64_ref if fp64_ref.dtype is not torch.float64. So llama fails the accuracy check in the low-precision case, even though res is closer to fp64_ref than ref. `e108f33299/torch/_dynamo/utils.py (L1025)` Pull Request resolved: https://github.com/pytorch/pytorch/pull/107375 Approved by: https://github.com/jgong5, https://github.com/XiaobingSuper, https://github.com/jansel	2023-08-21 04:34:23 +00:00
Edward Z. Yang	5b9b816b17	WAR by avoid querying device before env mutation (#107301 ) We should probably fix https://github.com/pytorch/pytorch/issues/107300 properly but this works around the problem Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/107301 Approved by: https://github.com/bdhirsh, https://github.com/H-Huang, https://github.com/albanD	2023-08-17 00:31:16 +00:00
BowenBao	19a76290d8	[ONNX] Public diagnostic options for 'dynamo_export' (#106741 ) Generate diagnostic reports to monitor the internal stages of the export process. This tool aids in unblocking model exports and debugging the exporter. #### Settings ~~1. Choose if you want to produce a .sarif file and specify its location.~~ 1. Updated: saving .sarif file should be done by `export_output.save_sarif_log(dst)`, similar to saving exported onnx model `export_output.save(model_dst)`. 2. Customize diagnostic options: - Set the desired verbosity for diagnostics. - Treat warnings as errors. Pull Request resolved: https://github.com/pytorch/pytorch/pull/106741 Approved by: https://github.com/titaiwangms, https://github.com/justinchuby, https://github.com/malfet	2023-08-15 17:46:15 +00:00
Edward Z. Yang	5b04e9b6ce	Install torchrec/fbgemm from source in CI (#106808 ) Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/106808 Approved by: https://github.com/malfet, https://github.com/xuzhao9	2023-08-12 02:08:44 +00:00
Howard Huang	656412f0cb	Add multiprocess option to dynamo benchmarks (#106394 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/106394 Approved by: https://github.com/XilunWu	2023-08-11 18:34:09 +00:00
lezcano	a9dca53438	NumPy support in torch.compile (#106211 ) RFC: https://github.com/pytorch/rfcs/pull/54 First commit is the contents of https://github.com/Quansight-Labs/numpy_pytorch_interop/ We have already been using this in core for the last few months as an external dependency. This PR pulls all these into core. In the next commits, I do a number of things in this order - Fix a few small issues - Make the tests that this PR adds pass - Bend backwards until lintrunner passes - Remove the optional dependency on `torch_np` and simply rely on the upstreamed code - Fix a number of dynamo tests that were passing before (they were not testing anything I think) and are not passing now. Missing from this PR (but not blocking): - Have a flag that deactivates tracing NumPy functions and simply breaks. There used to be one but after the merge it stopped working and I removed it. @lezcano to investigate. - https://github.com/pytorch/pytorch/pull/106431#issuecomment-1667079543. @voznesenskym to submit a fix after we merge. All the tests in `tests/torch_np` take about 75s to run. This was a work by @ev-br, @rgommers, @honno and me. I did not create this PR via ghstack (which would have been convenient) as this is a collaboration, and ghstack doesn't allow for shared contributions. Pull Request resolved: https://github.com/pytorch/pytorch/pull/106211 Approved by: https://github.com/ezyang	2023-08-11 00:39:32 +00:00
angelayi	5b13c779d4	[AOTInductor] Remove call to aot_autograd when receiving ExportedProgram (#105977 ) https://github.com/pytorch/pytorch/issues/105555 Existing flow first exports and then calls torch._inductor.aot_compile. However, export calls aot_autograd with the core aten decomposition table, and then torch._inductor.aot_compile calls aot_autograd again with the inductor decomposition table. The 2nd calling of aot_autograd is supposedly causing some problems, and seems excessive, so instead we will create a new function, torch._export.aot_compiler which will export using the inductor decomposition table, pass it to inductor's compile_fx_aot, and because it has already been exported, avoid recalling aot_autograd. ``` def aot_compile( f: Callable, args: Tuple[Any], kwargs: Optional[Dict[str, Any]] = None, constraints: Optional[List[Constraint]] = None, ) -> Tuple[str, ExportedProgram]: ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/105977 Approved by: https://github.com/desertfire, https://github.com/zhxchen17, https://github.com/eellison	2023-08-04 15:35:23 +00:00
angelayi	b2d3a2f433	[inductor] Remove ReinterpretView copy_ for AOT Inductor outputs (#106564 ) Running benchmark on HF models result in 71% pass rate now: P802905571 Updated [dashboard](https://hud.pytorch.org/benchmark/compilers?startTime=Fri%2C%2028%20Jul%202023%2005%3A02%3A20%20GMT&stopTime=Fri%2C%2004%20Aug%202023%2005%3A02%3A20%20GMT&granularity=hour&suite=torchbench&mode=inference&dtype=bfloat16&lBranch=angelayi/bench&lCommit=e35a655e59b2038c0395f972a1f567f862093d9c&rBranch=main&rCommit=3e5a52cedd2d586fc6cb40a73a098252b9edc2a1) Originally, a lot of the HF export-aot-inductor tests are failing with the error message: ``` RuntimeError: unsupported operation: some elements of the input tensor and the written-to tensor refer to a single memory location. Please clone() the tensor before performing the operation. ``` I looked at the result of one of the models, AlbertForMaskedLM, and the error is due to an additional [`copy_`](https://www.internalfb.com/phabricator/paste/view/P802043305?lines=1460%2C1426%2C1438%2C1451%2C1428) being inserted at the end. Looking at the [exported graph](https://www.internalfb.com/phabricator/paste/view/P802908243?lines=1124), `buf237` in the cpp program corresponds to the `view_269` node. During inductor lowering, this `view_269` node will result in a `ir.ReinterpretView` node, and when generating code for the outputs, this [line](https://fburl.com/code/epola0di) will add an additional `copy_`. I'm unsure if removing this case will result in other errors, but it seems to raise the HF model benchmark pass rate :) Pull Request resolved: https://github.com/pytorch/pytorch/pull/106564 Approved by: https://github.com/jansel	2023-08-04 07:51:29 +00:00
Mark Saroufim	6268ab2c2d	torchbench pin upd: hf auth token, clip, whisper, llamav2, sd (#106009 ) Includes stable diffusion, whisper, llama7b and clip. To get this to work I had to pass the HF auth token to all CI jobs, since GitHub does not pass secrets from parent to child automatically. There's a likelihood HF will rate limit us; in that case please revert this PR and I'll work on adding a cache next - cc @voznesenskym @penguinwu @anijain2305 @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @Xia-Weiwen @wenzhe-nrv @jiayisunx @ipiszy @chenyang78 @aakhundov @malfet Something upstream changed in torchbench too: `hf_Bert` and `hf_Bert_large` are now both failing with a dynamic-shape-looking error which I'm not sure how to debug yet, so for now (it felt a bit gross) I added a skip since others are building on top of this work @ezyang `llamav2_7b_16h` cannot pass the accuracy checks because it OOMs on deepcloning extra inputs; this seems to mean it does not need to show up in the expected numbers csv. Will figure this out when we update the pin with https://github.com/pytorch/benchmark/pull/1803 cc @H-Huang @xuzhao9 @cpuhrsch Pull Request resolved: https://github.com/pytorch/pytorch/pull/106009 Approved by: https://github.com/malfet	2023-08-03 16:28:40 +00:00
Ubuntu	77e369b363	Run minification for TorchDynamo benchmark models that fail evaluation (#106201 ) ### Description As an alternative to PR #105774, which provides a standalone, end-to-end minification script that covers all types of failures and has more functionality, this PR adds the ability to minify models when they fail the eval loop (accuracy checks). Both this PR and the other one can be merged without issue. ### Purpose The goal is to leverage the minifier to minify models that fail accuracy checks, allowing failed models to be debugged more easily. The ideal use-case is trying to run a model suite on a backend where operator coverage is not known or is limited. If a model compiles but fails the eval loop, having the repro script for that model is valuable for any developer who is trying to fix the issue. ### Functionality - Create minify flag that minifies models when they fail accuracy check - Produce minified graph for each model, and save it into repro script - Move repro script to output directory/base Dynamo directory - Enable functionality for running an entire model suite (Hugging Face, timm, and TorchBench) by prepending model name to repro script Pull Request resolved: https://github.com/pytorch/pytorch/pull/106201 Approved by: https://github.com/ezyang	2023-08-03 03:34:04 +00:00
angelayi	6339f57fae	Update export/export-aot-inductor benchmark code (#106323 ) Update export/export-aot-inductor benchmark code to use recent changes related to kwarg inputs and dataclass outputs. Updated [dashboard](https://hud.pytorch.org/benchmark/compilers?startTime=Mon%2C%2031%20Jul%202023%2017%3A28%3A05%20GMT&stopTime=Tue%2C%2001%20Aug%202023%2017%3A28%3A05%20GMT&granularity=hour&suite=torchbench&mode=inference&dtype=bfloat16&lBranch=angelayi/benchmark&lCommit=f0987867a88b0b9510fcaf33307150e61517e7a1&rBranch=main&rCommit=f23d755e1f835485b8fef5661e7f983b520d844e) 80% pass rate on HF for export: P801372961 20% pass rate on HF for export-aot-inductor: [link](https://hud.pytorch.org/benchmark/huggingface/inductor_aot_inductor?startTime=Mon,%2031%20Jul%202023%2017:08:02%20GMT&stopTime=Tue,%2001%20Aug%202023%2017:08:02%20GMT&granularity=hour&mode=inference&dtype=bfloat16&lBranch=angelayi/benchmark&lCommit=f0987867a88b0b9510fcaf33307150e61517e7a1&rBranch=main&rCommit=f23d755e1f835485b8fef5661e7f983b520d844e) Pull Request resolved: https://github.com/pytorch/pytorch/pull/106323 Approved by: https://github.com/desertfire	2023-08-02 20:18:37 +00:00
Edward Z. Yang	0b8fbfe9de	automatic_dynamic_shapes is on by default (#106188 ) Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/106188 Approved by: https://github.com/albanD	2023-07-28 13:26:54 +00:00
Mark Saroufim	c759a57003	Skip deterministic mode for SAM (#105615 ) SAM uses cumsum, which doesn't have a deterministic mode enabled, so this is the only way I can work around this https://github.com/pytorch/pytorch/issues/89492 Pull Request resolved: https://github.com/pytorch/pytorch/pull/105615 Approved by: https://github.com/eellison, https://github.com/cpuhrsch	2023-07-21 01:52:08 +00:00
Elias Ellison	024d26208c	Add Freezing Option to Benchmarking (#105616 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/105616 Approved by: https://github.com/desertfire	2023-07-20 22:50:51 +00:00
Michael Lazos	690ea933ca	Enable more e2e foreach optimizer compilation tests (#105438 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/105438 Approved by: https://github.com/jansel	2023-07-20 02:41:19 +00:00
Justin Chu	5ef023b05a	[BE] Enable ruff's UP rules and autoformat benchmarks/ (#105429 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/105429 Approved by: https://github.com/malfet	2023-07-19 04:46:37 +00:00
Bin Bao	b10de43c0a	Add aot_inductor as a test backend for benchmarking (#105221 ) Summary: Original PR at https://github.com/pytorch/pytorch/pull/104977. Landing from fbcode instead. Add an aot_inductor backend (Export+AOTInductor) in the benchmarking harness. Note it is not a dynamo backend. Moved files from torch/_inductor/aot_inductor_include to torch/csrc/inductor as a more standard way for exposing headers Created a caching function in benchmarks/dynamo/common.py for compiling, loading and caching the .so file, as a proxy for a pure C++ deployment, but easier for benchmarking. Differential Revision: D47452591 Pull Request resolved: https://github.com/pytorch/pytorch/pull/105221 Approved by: https://github.com/jansel	2023-07-18 13:16:36 +00:00
Edward Z. Yang	10cbc9a063	Enable cuda graphs for dynamic shapes (#105064 ) The general idea is to do a separate CUDA graph for each size. Because of cuda graph trees, these graphs will all share the same memory pool, so your memory usage will only be the worst case memory usage of the biggest dynamic size you want. This requires an extra dispatch in the cudagraphified callable. You must pay for a CUDA graph recording for every dynamic size you encounter, but this is MUCH cheaper than running the entire PT2 compile stack, so I expect you to still see benefits. This was surprisingly easy to do. Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/105064 Approved by: https://github.com/voznesenskym	2023-07-14 16:13:50 +00:00
Yukio Siraichi	6abe0b2ee8	Disable translation validation on performance runs. (#104887 ) This PR disables translation validation (TV) when running the benchmark suites on performance workflows: inductor with A100s. In summary, the changes are: - Add flag for turning TV on and off on _benchmarks/dynamo/common.py_ - Turn TV on only on CI accuracy builds - Add `--no-translation-validation` target flag to _.ci/pytorch/test.sh_ Pull Request resolved: https://github.com/pytorch/pytorch/pull/104887 Approved by: https://github.com/ezyang	2023-07-11 17:30:40 +00:00
Yukio Siraichi	85cbe7e6fd	Add timeout for translation validation instances. (#104654 ) As of now, translation validation runs to its completion. However, Z3 is time consuming. PR #104464, for example, disables translation validation for a few benchmarks. Instead, this PR introduces a timeout for translation validation. In that case, Z3 will return `unknown`, since it wasn't able to prove or disprove the assertions. Then, we log it as a warning, but don't stop execution. Here's a summary of the changes: - Added an environment variable for turning translation validation on and off - Added an environment variable for setting the translation validation timeout - Possibly reverts the changes in #104464 - ~~Move from "QF_NRA" to "QF_NIRA" logic~~ - ~~It makes more sense, given the nature of the problems~~ - "QF_NRA" seems to solve more instances of _dynamo/test_dynamic_shapes.py_ Pull Request resolved: https://github.com/pytorch/pytorch/pull/104654 Approved by: https://github.com/ezyang	2023-07-08 19:19:00 +00:00
willfengg	202fb95c68	[benchmark][export] Add torch.export passrate for TB/TIMM benchmarks (#104382 ) issues resolved: https://github.com/pytorch/pytorch/issues/104294 local test on TB and TIMM * python benchmarks/dynamo/torchbench.py -d cuda --inference --accuracy --progress --export --print-dataframe-summary * python benchmarks/dynamo/timm_models.py -d cuda --inference --accuracy --progress --export --print-dataframe-summary why not HF * huggingface uses kwargs (dict) to torch.nn.module * we will need to support kwargs in torch._export.export, which is in progress local test result timm 95% pass rate (58 out of 61 passed) P781702926 * 1 x [export specific]1 x ERROR:common:Mutating module attribute rel_indices during export * 1 x[not relevant to export] Unknown model (SelecSls42b) * 1 x [not relevant to export] Failed to load model: HTTP Error 409: Public access is not permitted on this storage account torchbench 54% pass rate (41 out of 75 passed) P781690552 * 7 x ERROR:common:Dynamo input and output is a strict subset of traced input/output * 3 x ERROR:common:call_method NNModuleVariable() / UserDefinedObjectVariable * 3 x ERROR:common:Mutating module attribute {xx} during export. * 2 x ERROR:common:inline in skipfiles * 2 x ERROR:common:Consider annotating your code using constrain_as_(). It appears that you're trying 1 x ERROR:common:guard on data-dependent symbolic int/float * 1 x ERROR:common:Tensor.tolist * 1 x ERROR:common:Tensor.numpy. Turn on config.numpy_ndarray_as_tensor and install torch_np to support tensor.numpy(). [may be dev * env?] 
* 1 x ERROR:common:missing: BUILD_SET * 1 x ERROR:common:whole graph export entails exactly one guard export * 1 x ERROR:common:call_function BuiltinVariable(str) [GetAttrVariable(UserMethodVariable(<function * 1 x ERROR:common:Dynamic slicing on data-dependent value is not supported * 1 x ERROR:common:Failed running call_function <function interpolate at 0x7f60a8361ea0>((FakeTensor(..., device='cuda:0', size=(1, 3, 427, * 1 x ERROR:common:Dynamo attempts to add additional input during export: value=0.6177528500556946, source=RandomValueSource(random_call_index=0) * 1 x Found following user inputs located at [16, 17, 18, 19, 20, 21, 22] are mutated. This is currently banned in the aot_export workflow. * 1 x RuntimeError: cumsum_cuda_kernel does not have a deterministic implementation * 4 x pass_due_to_skip * 1 x eager_2nd_run_OOM * 1 x fail_accuracy Pull Request resolved: https://github.com/pytorch/pytorch/pull/104382 Approved by: https://github.com/zhxchen17	2023-07-06 17:16:07 +00:00
Yukio Siraichi	0cee4e3c32	Turn translation validation off on timeouts. (#104464 ) Follow-up to PR: #97964 After the introduction of translation validation (TV), a few TIMM and TorchBench benchmarks started failing due to TIMEOUT. This PR turns TV off for them. Pull Request resolved: https://github.com/pytorch/pytorch/pull/104464 Approved by: https://github.com/malfet	2023-07-05 19:01:50 +00:00
Yukio Siraichi	40b8d10d5e	Re-land: Turn translation validation on for tests and accuracy runs by default. (#104467 ) Re-landing: #103611 Pull Request resolved: https://github.com/pytorch/pytorch/pull/104467 Approved by: https://github.com/malfet	2023-07-05 19:01:50 +00:00
Edward Z. Yang	2385dad4b3	Enable automatic_dynamic_shapes by default (#103623 ) Some notes: * I now manually turn off `_generate` jobs from running with cudagraphs, as it is unrealistic to expect to cudagraph autoregressive generation up to max sequence length, this would imply compiling the entire unrolled sequence generation. Concretely, cm3leon_generate was timing out post this change, likely due to the compile time slowdown of dynamic shapes ON TOP OF accidentally unrolling all the loops * A few torch._dynamo.reset tactically inserted to force recompiles on tests that expected it * expectedFailureAutomaticDynamic flip into patching automatic_dynamic_shapes=False Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/103623 Approved by: https://github.com/voznesenskym	2023-07-05 00:25:02 +00:00
PyTorch MergeBot	a2a8b4d415	Revert "Turn translation validation on for tests and accuracy runs by default. (#103611 )" This reverts commit `e311bed2a8`. Reverted https://github.com/pytorch/pytorch/pull/103611 on behalf of https://github.com/malfet due to Broke inductor tests ([comment](https://github.com/pytorch/pytorch/pull/103611#issuecomment-1614850276))	2023-06-30 15:54:18 +00:00
Yukio Siraichi	e311bed2a8	Turn translation validation on for tests and accuracy runs by default. (#103611 ) This PR turns translation validation on by default for tests and accuracy benchmark runs. It also installs Z3 on CI. The main changes are: - Add `--no-translation-validation` as an option in _test/run_tests.py_ - Set `PYTORCH_TEST_WITH_TV` environment variable - Add `TEST_WITH_TV` variable in _torch/testing/_internal/common_utils.py_ - Turn translation validation on for accuracy benchmarks in _benchmarks/dynamo/common.py_ - Add Z3 installation on CI scripts Pull Request resolved: https://github.com/pytorch/pytorch/pull/103611 Approved by: https://github.com/ezyang	2023-06-30 01:32:21 +00:00
BowenBao	c1a49823cd	[ONNX] Bench torch.onnx.dynamo_export and torch.onnx.export under dynamo bench (#103135 ) - Extend dynamo bench interface with '--compilers onnx' and '--compilers dynamo-onnx' - ONNX bench exports model to onnx and runs in ONNX Runtime. - Introduce error aggregation and report. - Scripts to build ONNX deps and running ONNX bench. - Huggingface accuracy check workaround for ONNX. Pull Request resolved: https://github.com/pytorch/pytorch/pull/103135 Approved by: https://github.com/thiagocrepaldi, https://github.com/jansel	2023-06-22 01:21:09 +00:00
Bin Bao	a2988c9e6a	[CI] Switch inference accuracy and performance tests to bfloat16 (#103535 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/103535 Approved by: https://github.com/eellison	2023-06-17 00:24:37 +00:00
Edward Z. Yang	bc6ec97e02	Switch dynamic_shapes to True by default (#103597 ) Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/103597 Approved by: https://github.com/voznesenskym	2023-06-15 15:16:20 +00:00
Edward Z. Yang	5211fad738	cm3leon_generate is at edge of timeout, so bump it up (#103607 ) Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/103607 Approved by: https://github.com/malfet	2023-06-15 03:40:42 +00:00
PyTorch MergeBot	a60f6dbe69	Revert "Add groups to dynamo benchmarking output data (#103268 )" This reverts commit `455f542ed9`. Reverted https://github.com/pytorch/pytorch/pull/103268 on behalf of https://github.com/drisspg due to no longer needed ([comment](https://github.com/pytorch/pytorch/pull/103268#issuecomment-1591732331))	2023-06-14 17:50:34 +00:00
chuanqiw	3c5ac4baa4	[CI] Enable inductor dynamic accuracy test on cpu device (#103387 ) Enable the inductor dynamic accuracy test on CPU in the CI workflow to catch issues early. Pull Request resolved: https://github.com/pytorch/pytorch/pull/103387 Approved by: https://github.com/EikanWang, https://github.com/jgong5, https://github.com/desertfire	2023-06-14 06:12:41 +00:00
BowenBao	45104cb67f	Different csv headers by bench mode on infra error (#103134 ) As the title says, the CSV headers differ by bench mode. This PR is a supplement to https://github.com/pytorch/pytorch/pull/100372 to respect `performance` mode, where a numerical speedup is expected instead of status text. Pull Request resolved: https://github.com/pytorch/pytorch/pull/103134 Approved by: https://github.com/thiagocrepaldi, https://github.com/ezyang	2023-06-13 03:40:22 +00:00
Driss Guessous	455f542ed9	Add groups to dynamo benchmarking output data (#103268 ) # Summary Adds the required information to enable this issue: https://github.com/pytorch/test-infra/issues/4268 Pull Request resolved: https://github.com/pytorch/pytorch/pull/103268 Approved by: https://github.com/huydhn	2023-06-12 21:09:42 +00:00
Edward Z. Yang	54daf870bc	CUDA graphs overrides dynamic shapes and forces specialization (#103290 ) Previously, cudagraphs and dynamic_shapes were incompatible and enabling dynamic shapes would forcibly disable cudagraphs. This new strategy I think is better. The idea is essentially that cudagraphs is an "optimization" that happens to guard on every input. When cudagraphs is on, we force everything static, and this automatically does the right thing because we will force a recompile if sizes change. This obsoletes https://github.com/pytorch/pytorch/pull/101813 Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/103290 Approved by: https://github.com/voznesenskym, https://github.com/eellison	2023-06-12 20:26:55 +00:00
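The idea in the commit above, that specializing on every input size "automatically does the right thing" because a size change forces a recompile, can be pictured as a cache keyed by input sizes. The following is only a toy sketch in plain Python, not PyTorch's cudagraphs or Dynamo code; the class `ShapeSpecializedCache`, the `compile_fn` callback, and the use of list lengths as stand-ins for tensor sizes are all assumptions made for illustration.

```python
class ShapeSpecializedCache:
    """Toy model of 'guard on every input size': one compiled artifact per
    distinct tuple of input sizes, recompiling whenever the sizes change."""

    def __init__(self, compile_fn):
        self.compile_fn = compile_fn  # hypothetical: sizes -> callable
        self.cache = {}
        self.compile_count = 0

    def __call__(self, *inputs):
        # List lengths stand in for tensor sizes in this sketch.
        key = tuple(len(x) for x in inputs)
        if key not in self.cache:
            # Sizes not seen before: the guard fails, so compile a new entry.
            self.compile_count += 1
            self.cache[key] = self.compile_fn(key)
        return self.cache[key](*inputs)


def fake_compile(sizes):
    # Hypothetical "compiled" function, specialized to `sizes`.
    return lambda *xs: [sum(x) for x in xs]


run = ShapeSpecializedCache(fake_compile)
run([1, 2, 3])
run([4, 5, 6])            # same size: reuses the cached entry
count_same_size = run.compile_count
run([1, 2])               # new size: triggers a "recompile"
count_new_size = run.compile_count
```

Everything is treated as static per cache entry, so correctness never depends on reasoning about symbolic sizes; the cost is one compilation per distinct size.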
Bin Bao	141828498c	[CI] Update inference accuracy test (#103361 ) Summary: 1) Switch inference accuracy test from fp32 to amp (consistent with dashboard run, https://github.com/pytorch/pytorch/pull/103220) 2) GoogleFnet fails in eager with amp or fp16, so fall back to always using fp32. Pull Request resolved: https://github.com/pytorch/pytorch/pull/103361 Approved by: https://github.com/eellison	2023-06-12 19:34:18 +00:00
Edward Z. Yang	c3fdfca5da	Always create ShapeEnv, always apply unspec logic (#103302 ) Originally, my goal for this PR was to remove the `dynamic_shapes` tests in torch/_dynamo/variables/builder.py. However, one thing led to another, and it turns out that it was easiest to do all of the following in one go: * Unconditionally allocate a ShapeEnv, no matter if dynamic_shapes is enabled or not (torch/_dynamo/output_graph.py). There is a small adjustment to export torch/_dynamo/eval_frame.py to account for the fact that a ShapeEnv always exists, even if you're not doing symbolic export. * Remove dynamic_shapes test from unspec logic (torch/_dynamo/variables/builder.py), the original goal * Specialize strides and storage offset if all sizes are dynamic (torch/fx/experimental/symbolic_shapes.py). This is required to deal with unconditional ShapeEnv: if a ShapeEnv exists, fake tensor-ification may choose to allocate symbols. The idea is that with `automatic_dynamic_shapes == False`, Dynamo should never request dynamic sizes, but this invariant was not upheld for nontrivial strides/offset. The rest are just auxiliary fixups from the above: * Work around a bug in FakeTensorProp where sometimes it doesn't return a FakeTensor (torch/fx/passes/fake_tensor_prop.py), see https://github.com/pytorch/pytorch/pull/103395 for follow up * Make ShapeProp correctly handle int inputs (torch/fx/passes/shape_prop.py) * Disable indexing strength reduction if `assume_static_by_default` is False (torch/_inductor/codegen/triton.py) * Fix hf_T5_generate to NOT toggle `assume_static_by_default` if dynamic shapes is not enabled (benchmarks/dynamo/common.py); technically this is not necessary anymore but it's in for safety. Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/103302 Approved by: https://github.com/voznesenskym	2023-06-12 12:48:28 +00:00
Edward Z. Yang	414ec6ce97	Turn off automatic_dynamic_shapes in prep for dynamic-by-default (#103320 ) Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/103320 Approved by: https://github.com/Skylion007	2023-06-10 02:49:59 +00:00
PyTorch MergeBot	d89dd05e4d	Revert "CUDA graphs overrides dynamic shapes and forces specialization (#103290 )" This reverts commit `c760f0e4dd`. Reverted https://github.com/pytorch/pytorch/pull/103290 on behalf of https://github.com/ezyang due to to handle the other cuda graphs case ([comment](https://github.com/pytorch/pytorch/pull/103290#issuecomment-1584977767))	2023-06-09 18:25:28 +00:00
Edward Z. Yang	c760f0e4dd	CUDA graphs overrides dynamic shapes and forces specialization (#103290 ) Previously, cudagraphs and dynamic_shapes were incompatible and enabling dynamic shapes would forcibly disable cudagraphs. This new strategy I think is better. The idea is essentially that cudagraphs is an "optimization" that happens to guard on every input. When cudagraphs is on, we force everything static, and this automatically does the right thing because we will force a recompile if sizes change. This obsoletes https://github.com/pytorch/pytorch/pull/101813 Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/103290 Approved by: https://github.com/voznesenskym	2023-06-09 17:43:47 +00:00
Will Constable	39201ce025	Make dynamo bench conditionally import DDP/FSDP (#103163 ) Avoids hitting an ImportError for single-node benchmarks when running on a non-distributed build of PyTorch. Fixes #102086 Pull Request resolved: https://github.com/pytorch/pytorch/pull/103163 Approved by: https://github.com/lezcano, https://github.com/wanchaol	2023-06-08 19:10:49 +00:00
Elias Ellison	18e4a466db	fix amp in inference in benchmarking suite (#103220 ) Even if you passed in --amp we would run inference in float32. `AlbertForMaskedLM` goes from 1.305 float32 to 1.724x amp, and then again to 1.910x with freezing. Benchmark numbers for amp are about to go way up lol. Pull Request resolved: https://github.com/pytorch/pytorch/pull/103220 Approved by: https://github.com/desertfire	2023-06-08 05:16:22 +00:00
Edward Z. Yang	eeb3c62117	Add Wav2Vec2 HuggingFace support (#103009 ) This is not actually enabled in the benchmark suite as you need https://github.com/pytorch/pytorch/pull/103001 and also training is broken per https://github.com/pytorch/pytorch/issues/101160 but might as well review this part first. Contains https://github.com/pytorch/pytorch/pull/102979 but I will probably rebase past that once it lands. Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/103009 Approved by: https://github.com/Skylion007	2023-06-06 13:25:06 +00:00
Edward Z. Yang	cca7b38564	Don't allow skipping deepcopy (#102973 ) We might mutate it afterwards! This could lead to hard to understand bugs. Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/102973 Approved by: https://github.com/albanD	2023-06-05 20:01:16 +00:00
Vinay Kumar Burugu	8215468870	Feature:To add --tolerance option to benchmark scripts (#102218 ) The "tolerance" option evaluates the model on the baseline device in eager mode (default: CPU) compared to the test device (e.g., CUDA, XLA, etc.) and compares the output tensors to determine the absolute tolerance value based on the [formula](https://pytorch.org/docs/stable/generated/torch.allclose.html). It then saves the results in a CSV file. This comparison highlights the tolerance/accuracy difference between XLA and GPU/CPU devices and can also be used to evaluate newer accelerators. This feature aims to identify accuracy failures on the test device (e.g., XLA) and facilitate quick bug triaging. This feature enables the following capabilities: 1. Ability to monitor accuracy issues of backends 2. Provide more informative picture on accuracy beyond pass/fail status 3. Having a dump of accuracy information will help triage models accordingly The data generated using this feature is in the [spreadsheet](https://docs.google.com/spreadsheets/d/1A8BAzSqfAw0Q5rgzK5Gk__Uy7qhuynh8tedxKnH-t94/edit#gid=0). The spreadsheet data can be used to compile the below summary table:

| Suite | Max Tolerance | | No. of models with high inaccuracy (>=0.005) | | Mean Tolerance | |
|------------------|:-------------:|:--------:|:-------------------------------------------:|:--------:|:--------------:|:--------:|
| | xla | inductor | xla | inductor | xla | inductor |
| huggingface | 0.1169 | 0.0032 | 1 | 0 | 0.0022 | 0.0005 |
| timm_models | 0.0373 | 2.8892 | 10 | 8 | 0.0028 | 0.7044 |
| torchbench | 3.013 | 3.0381 | 6 | 2 | 0.0016 | 0.0016 |
| All models | 3.013 | 3.0381 | 17 | 10 | 0.0028 | 0.7044 |

I used PyTorch release/2.0 branch and corresponding [commit_pin](https://github.com/pytorch/pytorch/blob/release/2.0/.github/ci_commit_pins/xla.txt) for XLA to generate the above data. Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/102218 Approved by: https://github.com/jansel	2023-06-03 06:40:26 +00:00
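The absolute tolerance mentioned in the `--tolerance` commit above refers to the `torch.allclose` criterion, |input - other| <= atol + rtol * |other|. The snippet below is a rough sketch of how a smallest passing `atol` could be derived from two outputs; it is not the benchmark script's implementation. It uses plain Python lists instead of tensors, and the helper name `min_abs_tolerance`, the default `rtol`, and the sample values are assumptions made for illustration.

```python
def min_abs_tolerance(baseline, test, rtol=1e-5):
    """Smallest atol for which every element satisfies
    |test - baseline| <= atol + rtol * |baseline| (the allclose criterion)."""
    if len(baseline) != len(test):
        raise ValueError("outputs must have the same length")
    worst = 0.0
    for b, t in zip(baseline, test):
        # Part of the difference not already absorbed by the relative term.
        excess = abs(t - b) - rtol * abs(b)
        worst = max(worst, excess)
    return worst


baseline_out = [1.0, 2.0, 3.0]  # hypothetical eager output on the baseline device
test_out = [1.0, 2.5, 3.0]      # hypothetical output on the test device
atol = min_abs_tolerance(baseline_out, test_out, rtol=0.0)
```

Reporting this number per model, rather than a pass/fail flag, is what allows summaries such as a maximum or mean tolerance per suite.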
Edward Z. Yang	624257890e	Reenable hf_T5_generate (#102818 ) Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/102818 Approved by: https://github.com/albanD	2023-06-02 17:59:53 +00:00
Edward Z. Yang	7c00d45312	Reenable cm3leon_generate (#102793 ) Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/102793 Approved by: https://github.com/albanD, https://github.com/awgu	2023-06-02 15:15:26 +00:00
Animesh Jain	65631d4515	[benchmarks] Use train mode for accuracy checks for HF models (#102578 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/102578 Approved by: https://github.com/desertfire	2023-05-31 19:47:18 +00:00
Bin Bao	47b884a74c	[inductor] Revert a CI remedy for Triton compilation error (#102541 ) Summary: revert https://github.com/pytorch/pytorch/pull/91634 Pull Request resolved: https://github.com/pytorch/pytorch/pull/102541 Approved by: https://github.com/ngimel	2023-05-31 13:13:51 +00:00
Animesh Jain	33a49eeae7	[benchmark] Flag to switch on activation checkpointing for HF models (#102557 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/102557 Approved by: https://github.com/ngimel, https://github.com/Chillee	2023-05-30 23:46:14 +00:00
Horace He	e71ab21422	update triton pin (#101919 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/101919 Approved by: https://github.com/ngimel	2023-05-30 17:16:05 +00:00
Animesh Jain	040d2cc969	[dynamo] Some torchrec_dlrm related fixes (#101953 ) Issue 1 of https://github.com/pytorch/pytorch/issues/101918 Pull Request resolved: https://github.com/pytorch/pytorch/pull/101953 Approved by: https://github.com/jansel	2023-05-28 17:56:08 +00:00
Bin Bao	ee33bae5c7	Fix an issue where checking sameness throw an exception (#102279 ) Summary: currently the exception is caught by outside and marked as infra_error Pull Request resolved: https://github.com/pytorch/pytorch/pull/102279 Approved by: https://github.com/anijain2305	2023-05-25 19:49:23 +00:00
Jason Ansel	5ba16011d7	Suppress profiler spam in dynamo benchmarks (#101942 ) Makes this stuff go away: ``` STAGE:2023-05-20 20:49:34 63580:63580 ActivityProfilerController.cpp:311] Completed Stage: Warm Up STAGE:2023-05-20 20:49:34 63580:63580 ActivityProfilerController.cpp:317] Completed Stage: Collection STAGE:2023-05-20 20:49:34 63580:63580 ActivityProfilerController.cpp:321] Completed Stage: Post Processing ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/101942 Approved by: https://github.com/shunting314, https://github.com/desertfire	2023-05-22 18:32:31 +00:00
Edward Z. Yang	22ca1a1124	Partially fix shape mismatch in vision_maskrcnn (#101477 ) The bulk of the heavy lifting is happening in https://github.com/pytorch/vision/pull/7592 Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/101477 Approved by: https://github.com/voznesenskym	2023-05-21 05:20:08 +00:00
drisspg	6f13d6892a	Add meta support for multinomial (#101324 ) # Summary Found this when trying to compile the text gen loop of nanogpt here: `b33289942b/torchbenchmark/models/nanogpt_generate/model.py (L322)` Pull Request resolved: https://github.com/pytorch/pytorch/pull/101324 Approved by: https://github.com/ngimel	2023-05-19 00:04:26 +00:00
Animesh Jain	794cc3952e	adding moco to CI (#101098 ) Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/101098 Approved by: https://github.com/desertfire	2023-05-18 10:01:49 +00:00
chuanqiw	b315c9b5ab	[CI] Enlarge memory for OOM models in inductor cpu HF accuracy test (#101395 ) Change the Inductor CPU HF accuracy test node from `linux.4xlarge` (32GB) to `linux.24xlarge` (192GB) to enlarge the node memory. Also add 3 HF models back to CI test. Fixes #101390 Pull Request resolved: https://github.com/pytorch/pytorch/pull/101395 Approved by: https://github.com/EikanWang, https://github.com/desertfire, https://github.com/huydhn	2023-05-18 09:23:30 +00:00
Jason Ansel	403ce1a1c9	Fix benchmark model names printouts with tqdm (#101627 ) With the TQDM changes in #100969 -- the model names ended up hidden from the benchmark printouts. We would print the model name with no newline, then tqdm would print a `\r` and overwrite the name of the running model. Pull Request resolved: https://github.com/pytorch/pytorch/pull/101627 Approved by: https://github.com/ezyang	2023-05-17 15:31:11 +00:00
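The overwrite described in the commit above comes from how a terminal handles a carriage return: the cursor returns to column 0 and later characters replace what was already on the line. The sketch below imitates that behaviour for a single line of output. It does not use tqdm; `render_line`, the model name, and the progress-bar text are made up for illustration.

```python
def render_line(stream_text):
    """Approximate what a terminal displays for one line of output:
    '\r' moves the cursor to column 0 and later characters overwrite
    the ones already there."""
    cells = []
    col = 0
    for ch in stream_text:
        if ch == "\r":
            col = 0
        elif col < len(cells):
            cells[col] = ch
            col += 1
        else:
            cells.append(ch)
            col += 1
    return "".join(cells)


name = "resnet50 "              # hypothetical model name, printed without newline
bar = "\r100%|##########| 3/3"  # hypothetical progress-bar refresh

# Without a newline the bar overwrites the name on the same line.
hidden = render_line(name + bar)

# Ending the name with a newline puts the bar on its own line.
visible = [render_line(part) for part in (name + "\n" + bar).split("\n")]
```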
PaliC	e0fc24cdc5	add retries to inductor benchmark suite (#101019 ) This PR 1) enables retries for downloading torchbenchmark and huggingface models, similar to how we do it for timm models right now, and 2) creates a `_download_model` function for the hugging face and TIMM runners whose output I plan to use to preload the models somewhere if possible (please double check I'll be saving the right thing). Instead of retries, we plan to just add torchbench to a docker image as it is relatively small. Pull Request resolved: https://github.com/pytorch/pytorch/pull/101019 Approved by: https://github.com/huydhn, https://github.com/desertfire	2023-05-16 21:41:50 +00:00
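A retry wrapper of the kind the commit above describes might look roughly like the following. This is a generic sketch, not the code from the PR: `download_with_retries`, its parameters, and the flaky stand-in downloader are hypothetical names introduced here.

```python
import time


def download_with_retries(download_fn, attempts=3, delay_seconds=0.0):
    """Call download_fn until it succeeds or `attempts` tries are used up."""
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return download_fn()
        except Exception as exc:  # broad on purpose: any failure triggers a retry
            last_error = exc
            if attempt < attempts:
                time.sleep(delay_seconds)
    raise RuntimeError(f"download failed after {attempts} attempts") from last_error


calls = {"n": 0}


def flaky_download():
    # Hypothetical downloader that fails twice before succeeding.
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient network failure")
    return "model-weights"


result = download_with_retries(flaky_download, attempts=3)
```

Chaining the final error with `from last_error` keeps the underlying failure visible in the traceback when every attempt fails.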
Edward Z. Yang	41468833fb	vision_maskrcnn is now deterministic (#101116 ) Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/101116 Approved by: https://github.com/ngimel	2023-05-16 21:32:17 +00:00
Yanbo Liang	e4eaf33346	Re-enable detectron2_maskrcnn on CI (#100791 ) #99665 has been fixed, we can re-enable these models on CI. Pull Request resolved: https://github.com/pytorch/pytorch/pull/100791 Approved by: https://github.com/huydhn	2023-05-16 04:25:58 +00:00
Edward Z. Yang	f48718f749	Update torchbench pin (#101365 ) Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/101365 Approved by: https://github.com/albanD, https://github.com/awgu	2023-05-15 16:52:31 +00:00
Natalia Gimelshein	49578913fb	update timm commit (#100931 ) Fixes #100903 Pull Request resolved: https://github.com/pytorch/pytorch/pull/100931 Approved by: https://github.com/ezyang, https://github.com/malfet	2023-05-12 04:22:08 +00:00
Edward Z. Yang	41a4e22015	Update torchbench pin (#101071 ) Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/101071 Approved by: https://github.com/malfet	2023-05-11 18:09:40 +00:00
Jason Ansel	036a8d6b4a	Remove NullContext() from benchmark runners (#100309 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/100309 Approved by: https://github.com/Skylion007, https://github.com/anijain2305	2023-05-11 06:42:27 +00:00
XiaobingSuper	c84627c2ee	benchmarks: make --amp works for cpu path (#101057 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/101057 Approved by: https://github.com/jgong5, https://github.com/desertfire, https://github.com/jansel	2023-05-11 02:51:38 +00:00
Edward Z. Yang	c658732950	[RFC] Add tqdm to benchmarking script (#100969 ) Here's what it looks like, on a slower running benchmark: https://github.com/pytorch/pytorch/assets/13564/47c4a5bd-e963-45de-a15c-2fd943de0fa4 There's actually quite a bit of dead time, it's possible there are more spots we should add tqdm to. Looking for opinions on utility of this. Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/100969 Approved by: https://github.com/Skylion007	2023-05-10 15:39:24 +00:00
Bin Bao	76cc3ab4f3	[CI] Delete skips from https://github.com/pytorch/pytorch/issues/93847 (#96049 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/96049 Approved by: https://github.com/jansel	2023-05-10 01:27:27 +00:00
Edward Z. Yang	9eab13fc90	Reenable llama benchmark (#100877 ) Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/100877 Approved by: https://github.com/albanD	2023-05-09 01:12:54 +00:00
Natalia Gimelshein	9790f9174a	skip lcnet (#100726 ) Per title Pull Request resolved: https://github.com/pytorch/pytorch/pull/100726 Approved by: https://github.com/voznesenskym	2023-05-05 23:19:42 +00:00
Animesh Jain	3f025c607c	summarize graph breaks (#100696 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/100696 Approved by: https://github.com/yanboliang	2023-05-05 22:27:47 +00:00
Animesh Jain	8994d9e610	[dynamo] Hide guard_fail_hook behind a flag to improve cache lookup time (+10% DebertaV2) (#100590 ) For TorchDynamo eager backend, DebertaV2 speedup improves from 0.77x to 0.87x. Pull Request resolved: https://github.com/pytorch/pytorch/pull/100590 Approved by: https://github.com/voznesenskym, https://github.com/wconstab	2023-05-04 18:52:21 +00:00
Yanbo Liang	896eb1db26	[Dynamo] Skip TB Background_Matting model eager accuracy check because of non deterministic (#100513 ) Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/100513 Approved by: https://github.com/anijain2305	2023-05-03 07:06:50 +00:00
Jason Ansel	fdc853b14c	Add --baseline option to benchmark runners (#100266 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/100266 Approved by: https://github.com/ngimel	2023-05-02 02:35:11 +00:00
Edward Z. Yang	e918fd18e7	Disable densenet121 as it is flaky (#100371 ) Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/100371 Approved by: https://github.com/voznesenskym	2023-05-02 01:49:11 +00:00
Edward Z. Yang	5d93265cce	Report timeout/infra_error instead of 0.0000 on infra error (#100372 ) Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/100372 Approved by: https://github.com/Skylion007, https://github.com/albanD	2023-05-01 14:56:01 +00:00
Huy Do	9a69634b28	Skip some failing dynamic shape models on periodic (#99895 ) After some recent changes, these tests are failing in periodic trunk. So let's move them to unstable while waiting for the team to root cause the issue https://github.com/pytorch/pytorch/issues/99893. Note that a forward fix can use `ciflow/unstable` to run those unstable jobs to confirm that they are fixed. Pull Request resolved: https://github.com/pytorch/pytorch/pull/99895 Approved by: https://github.com/malfet	2023-04-25 07:05:08 +00:00
Edward Z. Yang	04e8df4dd7	Return full accuracy status for printing, not abbreviated version (#99894 ) Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/99894 Approved by: https://github.com/jansel	2023-04-25 05:17:10 +00:00
Edward Z. Yang	cd61707167	yolov3 dynamic training accuracy is fixed (#99896 ) Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/99896 Approved by: https://github.com/albanD	2023-04-25 01:15:24 +00:00
chuanqiw	e9e5ffe83e	Re-enable dynamic shapes test in dynamo benchmark (#99816 ) Set `torch._dynamo.config.assume_static_by_default = False` when the dynamic shapes flag is enabled. Fixes #99815 Pull Request resolved: https://github.com/pytorch/pytorch/pull/99816 Approved by: https://github.com/jgong5, https://github.com/ezyang	2023-04-24 20:34:52 +00:00
Edward Z. Yang	f602b3a6ae	Preserve mark_dynamic when cloning inputs (#99617 ) Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/99617 Approved by: https://github.com/ngimel, https://github.com/voznesenskym, https://github.com/anijain2305	2023-04-22 19:46:31 +00:00
Bin Bao	e09f785a72	[CI] Remove inductor skip list for Huggingface (#99375 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/99375 Approved by: https://github.com/anijain2305	2023-04-21 18:13:22 +00:00
Edward Z. Yang	fc8fa6c356	Require at least one tensor to be marked dynamic with --dynamic-batch-only (#99620 ) Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/99620 Approved by: https://github.com/voznesenskym	2023-04-21 00:17:08 +00:00
Huy Do	5315317b7b	Skip some detectron2_maskrcnn models with KeyError _ignore_torch_cuda_oom (#99599 ) These tests are failing in trunk `233cc34d3b` with `KeyError: '_ignore_torch_cuda_oom'` Pull Request resolved: https://github.com/pytorch/pytorch/pull/99599 Approved by: https://github.com/malfet	2023-04-20 18:11:35 +00:00
Jason Ansel	3233450d07	Add TorchXLA option to benchmark runner (#99505 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/99505 Approved by: https://github.com/voznesenskym	2023-04-19 22:44:52 +00:00
Will Constable	9ac2b041c9	Make opacus xfail instead of skip (#99380 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/99380 Approved by: https://github.com/desertfire, https://github.com/anijain2305	2023-04-19 21:09:06 +00:00
Michael Voznesensky	113bd11cf4	Skip levit (#99491 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/99491 Approved by: https://github.com/ezyang	2023-04-19 07:41:42 +00:00
Edward Z. Yang	039faf0dbf	Add invariant that all symbolic shapes must be bound in graph (#99089 ) Previously, we had a problem when partitioning forward-backward dynamic graphs, which is that we could end up with a backward graph that mentions a symbol in an input tensor (e.g., `f32[s0 + s1]`), but without this symbol being otherwise bound elsewhere. When this happens, we have no way of actually deriving the values of `s0` and `s1`. Our fix for this in https://github.com/pytorch/pytorch/pull/93059 was to just retrace the graph, so that s0 + s1 got allocated a new symbol s2 and everything was happy. However, this strategy had other problems, namely (1) we lost all information from the previous ShapeEnv, including guards and (2) we end up allocating a LOT of fresh new symbols in backwards. With this change, we preserve the same ShapeEnv between forward and backwards. How do we do this? We simply require that every symbol which may be present inside tensors, ALSO be a plain SymInt input to the graph. This invariant is enforced by Dynamo. Once we have done this, we can straightforwardly modify the partitioner to preserve these SymInt as saved for backwards, if they are needed in the backwards graph to preserve the invariant as well. This apparently breaks yolov3, but since everything else is OK I'm merging this as obviously good and investigating later. Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/99089 Approved by: https://github.com/voznesenskym	2023-04-16 01:48:19 +00:00
Yanbo Liang	15fe5a0798	[Dynamo] Fix benchmark --verbose error (#99224 ) Dynamo benchmark --verbose is broken: ``` Traceback (most recent call last): File "/scratch/ybliang/work/repos/pytorch/benchmarks/dynamo/torchbench.py", line 400, in <module> torchbench_main() File "/scratch/ybliang/work/repos/pytorch/benchmarks/dynamo/torchbench.py", line 396, in torchbench_main main(TorchBenchmarkRunner(), original_dir) File "/scratch/ybliang/work/repos/pytorch/benchmarks/dynamo/common.py", line 1967, in main return maybe_fresh_cache( File "/scratch/ybliang/work/repos/pytorch/benchmarks/dynamo/common.py", line 993, in inner return fn(args, *kwargs) File "/scratch/ybliang/work/repos/pytorch/benchmarks/dynamo/common.py", line 2135, in run torch._dynamo.config.log_level = logging.DEBUG File "/scratch/ybliang/work/repos/pytorch/torch/_dynamo/config_utils.py", line 67, in __setattr__ raise AttributeError(f"{self.__name__}.{name} does not exist") AttributeError: torch._dynamo.config.log_level does not exist ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/99224 Approved by: https://github.com/voznesenskym	2023-04-15 20:18:50 +00:00
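The traceback in the commit above ends in a config object whose `__setattr__` rejects names it does not know. A minimal sketch of that pattern follows; it is not the code in `torch/_dynamo/config_utils.py`, and `StrictConfig` together with its fields is a hypothetical stand-in.

```python
class StrictConfig:
    """Config object that only allows assignment to options declared up front."""

    def __init__(self, name, **defaults):
        # Bypass our own __setattr__ while creating the internal storage.
        object.__setattr__(self, "_name", name)
        object.__setattr__(self, "_values", dict(defaults))

    def __getattr__(self, key):
        # Only called when normal lookup fails, i.e. for option names.
        values = object.__getattribute__(self, "_values")
        if key in values:
            return values[key]
        raise AttributeError(f"{self._name}.{key} does not exist")

    def __setattr__(self, key, value):
        if key not in self._values:
            # A removed or misspelled option fails loudly instead of being ignored.
            raise AttributeError(f"{self._name}.{key} does not exist")
        self._values[key] = value


config = StrictConfig("demo.config", verbose=False)
config.verbose = True  # known option: accepted

try:
    config.log_level = 10  # unknown option: rejected
    message = None
except AttributeError as exc:
    message = str(exc)
```

Failing on unknown names is what surfaced the stale `log_level` assignment in the benchmark script rather than letting it silently do nothing.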
Bin Bao	34f681c13b	[CI] Remove inductor skip list for timm_models (#98840 ) Summary: check against the expected csv file instead of skipping tests Pull Request resolved: https://github.com/pytorch/pytorch/pull/98840 Approved by: https://github.com/ezyang	2023-04-15 13:54:41 +00:00
Bin Bao	e5501a967e	[inductor] Support IndexPutFallback in cpp_wrapper (#98972 ) Summary: 1) Make the fallback index_put generate the right cpp code in cpp_wrapper 2) Add a --cpp-wrapper option to common.py Pull Request resolved: https://github.com/pytorch/pytorch/pull/98972 Approved by: https://github.com/jgong5, https://github.com/jansel	2023-04-13 15:41:03 +00:00
Edward Z. Yang	b8b840be3d	Convert logging f-strings to use % format, part five (#98765 ) This does some annoying but simple cases by hand. Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/98765 Approved by: https://github.com/wanchaol	2023-04-11 13:17:59 +00:00
Edward Z. Yang	b09722f540	Convert logging f-strings to use % format, part two (#98700 ) This hits multi-line logging strings Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/98700 Approved by: https://github.com/voznesenskym	2023-04-10 12:19:31 +00:00
Edward Z. Yang	9a8f71f23e	Convert logging f-strings to use % format (#98697 ) Codemod done with https://gist.github.com/ezyang/2e8b0463cdc6be278478495b23ff0530 with assistance from ChatGPT. Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/98697 Approved by: https://github.com/voznesenskym	2023-04-10 12:19:31 +00:00
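One practical difference between the two logging styles touched by the three commits above is when the message gets formatted: with `%`-style arguments the standard `logging` module defers formatting until a record is actually emitted, while an f-string is built before the call. The sketch below shows this with the standard library only; the `Expensive` class and the logger name are made up for illustration.

```python
import logging


class Expensive:
    """Counts how often it is converted to a string."""

    def __init__(self):
        self.calls = 0

    def __str__(self):
        self.calls += 1
        return "expensive"


logger = logging.getLogger("fmt_demo")
logger.setLevel(logging.WARNING)  # DEBUG messages are disabled
logger.addHandler(logging.NullHandler())

value = Expensive()

# %-style: the argument is only formatted if the record is emitted.
logger.debug("value=%s", value)
calls_after_percent_style = value.calls

# f-string: the string is built before logger.debug is even called.
logger.debug(f"value={value}")
calls_after_fstring = value.calls
```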
Edward Z. Yang	bdb79a8f52	Turn off divisible_by_16 for dynamic shapes; support ablation (#98471 ) Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/98471 Approved by: https://github.com/ngimel, https://github.com/voznesenskym	2023-04-06 12:57:07 +00:00
Edward Z. Yang	cf1bfca2ba	Require batch dimensions to be compiled dynamically (#98334 ) Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/98334 Approved by: https://github.com/voznesenskym	2023-04-05 19:40:22 +00:00
Edward Z. Yang	b923f84805	Switch accuracy CI to dynamic batch only (#98307 ) Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/98307 Approved by: https://github.com/wconstab	2023-04-05 01:20:12 +00:00
Elias Ellison	a3365e1d0d	Increment pending forwards after invocation (#98101 ) Forwards are only pending following invocation, not before. Pull Request resolved: https://github.com/pytorch/pytorch/pull/98101 Approved by: https://github.com/ngimel	2023-04-05 00:04:39 +00:00
Bin Bao	69ff39d2e7	Skip gat, gcn and sage for TorchBench CUDA test (#98244 ) Summary: The three models only support CPU for now. Pull Request resolved: https://github.com/pytorch/pytorch/pull/98244 Approved by: https://github.com/ezyang	2023-04-04 01:06:18 +00:00
Jason Ansel	55afaa46a4	Support functools.partial and itertools.product (#98120 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/98120 Approved by: https://github.com/anijain2305	2023-04-03 18:23:25 +00:00
Bin Bao	ba7ee00f00	Add a --inference flag to dynamo benchmark script (#98173 ) Summary: When calling benchmark scripts, make it a requirement to pass --inference or --training Pull Request resolved: https://github.com/pytorch/pytorch/pull/98173 Approved by: https://github.com/huydhn	2023-04-03 17:12:28 +00:00
Jason Ansel	92b46202ef	Add --stats option to benchmark scripts (#98109 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/98109 Approved by: https://github.com/anijain2305	2023-04-02 02:23:13 +00:00
Edward Z. Yang	5df59f957f	Fix G001,G002,G003 in logs to % syntax (#97812 ) Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/97812 Approved by: https://github.com/Skylion007, https://github.com/kiukchung, https://github.com/malfet, https://github.com/mlazos	2023-04-01 01:43:33 +00:00
Bin Bao	c699ac17df	[CI] Bump up torchbench version to fix dynamo graph breaks in transformers (#98003 ) Summary: When we bumped up the torchbench version pin last time, we found there were new graph breaks introduced with the transformers version upgrade, see https://github.com/pytorch/pytorch/pull/96782. Turns out they are already fixed upstream, see https://github.com/huggingface/transformers/pull/21648 and https://github.com/pytorch/benchmark/pull/1511 Pull Request resolved: https://github.com/pytorch/pytorch/pull/98003 Approved by: https://github.com/ngimel	2023-03-31 16:52:09 +00:00
Edward Z. Yang	97fc8ea5f4	Run the benchmark suite with dynamic batch only (#97912 ) Symbolic shapes compile time on full CI with inductor is horribly long (even though our aot_eager local runs seemed to suggest that the added latency was only 10s per model.) To patch over the problem for now, run the benchmark suite with dynamic batch only. This should absolve a lot of sins. Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/97912 Approved by: https://github.com/janeyx99, https://github.com/desertfire	2023-03-30 18:04:48 +00:00
Aaron Gokaslan	47dca20d80	[BE] Enable flake8-comprehension rule C417 (#97880 ) Enables flake8-comprehension rule C417. Ruff autogenerated these fixes to the codebase. Pull Request resolved: https://github.com/pytorch/pytorch/pull/97880 Approved by: https://github.com/ezyang, https://github.com/kit1980, https://github.com/albanD	2023-03-30 14:34:24 +00:00
William Wen	b93e1f377e	[dynamo, benchmarks] Add inductor-mode (for max-autotune) and warm start options to dynamo benchmarks (#97719 ) Title. Pull Request resolved: https://github.com/pytorch/pytorch/pull/97719 Approved by: https://github.com/shunting314	2023-03-29 21:09:00 +00:00
Edward Z. Yang	f754be897a	Disable speedup_experiment_ds (#97806 ) It seems to be broken. Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/97806 Approved by: https://github.com/jansel	2023-03-29 01:27:31 +00:00
Bin Bao	a9a81ab7e3	[CI] Run benchmark test with dynamo_eager in periodic (#97543 ) Summary: The idea is to catch any dynamo_eager regression earlier, and also we can take that off the dashboard run. Pull Request resolved: https://github.com/pytorch/pytorch/pull/97543 Approved by: https://github.com/huydhn	2023-03-28 01:02:49 +00:00
Shunting Zhang	652592efa9	[inductor] use torch.profiler in the triton wrapper (#97405 ) I think it's helpful to use torch.profiler to profile the triton wrapper. E.g., I tried it for nvidia_deeprecommender's inference graph. Even with max-autotune, we see the majority of the time the GPU is running 2 mm/addmm ops. That's why max-autotune does not help for this model since tuning does not affect the external mm ops. [Screenshot 2023-03-22 at 5 49 28 PM] Next step: I'll check why the triton mm kernels are not picked. EDIT: the above screenshot was captured without max-autotune due to a typo. Below is the trace with max-autotune enabled: [Screenshot 2023-03-22 at 6 43 26 PM] Pull Request resolved: https://github.com/pytorch/pytorch/pull/97405 Approved by: https://github.com/ngimel	2023-03-27 21:54:25 +00:00
Edward Z. Yang	cff4826f28	pytorch_unet is now passing (#97309 ) Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/97309 Approved by: https://github.com/janeyx99, https://github.com/zou3519	2023-03-22 13:55:05 +00:00
Bin Bao	be49d3b170	[CI] Turn on debug logging for dla102 and gernet_l (#97307 ) Summary: Log the generated code for those two flaky tests to see if there is any codegen difference when they fail. Pull Request resolved: https://github.com/pytorch/pytorch/pull/97307 Approved by: https://github.com/ezyang	2023-03-22 13:42:13 +00:00
Natalia Gimelshein	e7d9331688	[inductor] hoist symbolic padding expressions (#97099 ) Towards fixing pnasnet5large, see #96709. The generated kernel looks much better ``` @pointwise(size_hints=[1048576], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32', 3: 'i32', 4: 'i32', 5: 'i32', 6: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': [], 'configs': [instance_descriptor(divisible_by_16=(0, 1, 6), equal_to_1=())]}) @triton.jit def triton_(in_ptr0, out_ptr0, ks0, ks1, ks2, ks3, xnumel, XBLOCK : tl.constexpr): xoffset = tl.program_id(0) * XBLOCK xindex = xoffset + tl.arange(0, XBLOCK)[:] xmask = xindex < xnumel x1 = (xindex // ks0) % ks0 x0 = xindex % ks0 x2 = (xindex // ks3) x4 = xindex tmp0 = x1 + ((-1)*ks1) tmp1 = 0 tmp2 = tmp0 >= tmp1 tmp3 = ks2 tmp4 = tmp0 < tmp3 tmp5 = x0 + ((-1)*ks1) tmp6 = tmp5 >= tmp1 tmp7 = tmp5 < tmp3 tmp8 = tmp2 & tmp4 tmp9 = tmp8 & tmp6 tmp10 = tmp9 & tmp7 tmp11 = tl.load(in_ptr0 + (x0 + ((-1)*ks1) + (ks2*x1) + (x2*(ks2*ks2)) + ((-1)*ks1*ks2) + tl.zeros([XBLOCK], tl.int32)), tmp10 & xmask, other=0) tmp12 = tl.where(tmp10, tmp11, 0.0) tl.store(out_ptr0 + (x4 + tl.zeros([XBLOCK], tl.int32)), tmp12, xmask) ``` Interestingly, removing `expand` in the index `simplify` function makes the `load` expression a little bit better, but `store` fails to simplify to a flat store in this case, so I'm leaving `expand` in. Full pnasnet still chokes on `ceiling` in batch_norm kernels; additionally, it looks like shape propagation goofs in inductor and generates overly complicated expressions, so we should switch to metadata from the fx graph. I'm still not adding `ceil` printing to triton, because we should be able to hoist all indexing expressions (and just printing ceil without converting to int64 doesn't work) Pull Request resolved: https://github.com/pytorch/pytorch/pull/97099 Approved by: https://github.com/jansel	2023-03-21 21:43:32 +00:00
Edward Z. Yang	e74c5e5637	rexnet_100 is disabled for static, does not need dynamic listing (#97100 ) Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/97100 Approved by: https://github.com/Skylion007	2023-03-19 20:57:49 +00:00
Bin Bao	577d930c39	[CI] Revert https://github.com/pytorch/pytorch/pull/96195 (#96897 ) Summary: https://github.com/pytorch/pytorch/pull/96195 was an experiment for debugging flaky failures on CI. Pull Request resolved: https://github.com/pytorch/pytorch/pull/96897 Approved by: https://github.com/ngimel	2023-03-16 06:28:18 +00:00
Edward Z. Yang	3606f59366	Default specialize_int to False (#96624 ) Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/96624 Approved by: https://github.com/janeyx99	2023-03-16 02:54:18 +00:00
Will Constable	54cd4a67d0	Output peak memory stats from dynamo torchbench perf CI (#95666 ) Adds absolute memory usage numbers (in addition to compression ratio) to performance jobs. Example output: [screenshot] Pull Request resolved: https://github.com/pytorch/pytorch/pull/95666 Approved by: https://github.com/ezyang, https://github.com/williamwen42	2023-03-15 19:24:47 +00:00
Bin Bao	33c7be360f	[reland][CI] switch torchbench to a pinned version (#96782 ) Summary: This is reland of https://github.com/pytorch/pytorch/pull/96553 Pull Request resolved: https://github.com/pytorch/pytorch/pull/96782 Approved by: https://github.com/huydhn	2023-03-15 12:46:36 +00:00
Edward Z. Yang	037acd5a22	Update CI skips (#96745 ) Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/96745 Approved by: https://github.com/wconstab	2023-03-14 22:19:10 +00:00
PyTorch MergeBot	be4eaa69c2	Revert "[CI] switch torchbench to a pinned version (#96553 )" This reverts commit `61d6ccd29a`. Reverted https://github.com/pytorch/pytorch/pull/96553 on behalf of https://github.com/desertfire due to land race	2023-03-14 21:39:45 +00:00
PyTorch MergeBot	ba4fb9b6ad	Revert "Default specialize_int to False (#96624 )" This reverts commit `1ac8782db2`. Reverted https://github.com/pytorch/pytorch/pull/96624 on behalf of https://github.com/kit1980 due to Broke inductor/test_torchinductor_dynamic_shapes.py	2023-03-14 19:43:47 +00:00
Bin Bao	61d6ccd29a	[CI] switch torchbench to a pinned version (#96553 ) Summary: Previously we were using a branch on torchbench which skips torchaudio. We should switch to make sure a good test coverage. Pull Request resolved: https://github.com/pytorch/pytorch/pull/96553 Approved by: https://github.com/huydhn, https://github.com/ezyang	2023-03-14 18:42:22 +00:00
Edward Z. Yang	1ac8782db2	Default specialize_int to False (#96624 ) Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/96624 Approved by: https://github.com/janeyx99	2023-03-14 18:37:47 +00:00
David Berard	6e3d51b08a	[inductor][CI] also skip rexnet_100 on non-dynamic shapes (#96691 ) Recent failures show rexnet_100 accuracy is flaky also on non-dynamic shapes (was already disabled for dynamic shapes in #96474). The failure occurs for the same reason (stem.bn.weight.grad). e.g. https://github.com/pytorch/pytorch/actions/runs/4402868441/jobs/7710977874 Pull Request resolved: https://github.com/pytorch/pytorch/pull/96691 Approved by: https://github.com/desertfire	2023-03-14 18:11:59 +00:00
Edward Z. Yang	ff7e510d1e	Correctly use PythonPrinter for generating wrapper code referencing sympy (#96710 ) Otherwise you get stuff like ceiling(s0) which is not valid Python code. Fixes volo_d1_224 Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/96710 Approved by: https://github.com/ngimel, https://github.com/jansel	2023-03-14 14:35:52 +00:00
Wang, Eikan	3cad8d23d0	[Inductor] Skip the hf_T5_base due to intermittent failure on CI (#96649 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/96649 Approved by: https://github.com/desertfire	2023-03-14 07:40:20 +00:00
Edward Z. Yang	507feb805f	Don't specialize torch.Size with specialize_int = False (#96419 ) Fixes https://github.com/pytorch/pytorch/issues/95868 Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/96419 Approved by: https://github.com/jansel, https://github.com/ngimel	2023-03-14 01:32:58 +00:00

401 Commits